Let's stop the madness over here too.

# Let's Design and Build a (mostly) Digital Theremin!

**Floating Point Addition**

Floating point addition is fairly nuanced. Because addition is essentially little endian, the decimals of the inputs need to be aligned before the addition can take place, which in the float world is accomplished by right shifting the significand of the input with the smaller exponent a distance equal to the difference of the exponents.

A good place to start is with normalizing the inputs before denorming the one with the smaller exponent. Zero is a denorm that can confound things here because it will test larger than a non-zero input with negative exponent, so we need to examine each input for zero significand and deal with it somehow. One way is to replace the exponent with a very negative value, so zero will always lose the battle of the exponents.

Next we need to compare the input exponents by subtracting one from the other. Because Hive shift distances are modulo, we need to somehow limit the shift range to the interval [-32:0]. One tricky way to do this is to zero out the significand if the shift distance is greater than -32. Shifting the value zero any distance will always return zero, and subsequently adding this zero to the other signifcant won't change it, which is what would happen if it were actually shifted -32 or beyond.

Then we do a signed addition on the significands with a check for over and underflow, followed by normalization of the result if necessary, and the usual bounds checks on the output exponent. A zero result should return both zero significand and zero exponent.

The output exponent is simply the largest of the two input exponents, given input and output normalization.

I'm reaching the point where a lot of this stuff is becoming boilerplate, which is good.

Found this fantastic paper this morning via Hacker News and am in the process of digesting it:

http://www.eecs.berkeley.edu/~waterman/papers/phd-thesis.pdf

Read it and weep! Laundry lists of fail regarding every popular ISA (instruction set architecture) are laid bare in plain English. Literals being given too much opcode space in almost every ISA is an issue I've really grappled with (and IMO effectively dealt with via a tiered method) in Hive. Surprisingly even ARMv8 (the ISA everyone is currently hitching their wagons to) substantially sucks. The paper rightly tears the x86 ISA a new one, it's absolutely horrid.

Like some newer programming languages which pare things down by purposely omitting various features, I think it's super important to keep the processor as simple as humanly possible. ISA design is really hard work, but it's amazingly poorly done for the vast impact computing has on the world. And we really need open standards here.

**[EDIT]** I like the way they handle different width immediates, by breaking them up and reassembling the pieces in a way that doesn't require hardware muxing (though muxing here isn't really that onerous). I'd love to use that idea but can't. They also picked little endian, which tells me they aren't completely insane.

**LOG2 Float**

Logs are exponents - it's too bad the term exponential is taken.

Log base 2 is the exponent of 2 required to produce the input value. So it's a work backwards thing:

2^0 = 1 so log2(1) = 0

2^1 = 2 so log2(2) = 1

2^2 = 4 so log2(4) = 2

Inputs less than one produce negative results:

2^-1 = 1/2 so log2(1/2) = -1

2^-2 = 1/4 so log2(1/4) = -2

It seems log2(0) equals negative infinity (negative overflow).

If the input is a float with significand S and exponent E:

log2(S * 2^E) = log2(S) + log2(2^E) = log2(S) + E * log2(2) = log2(S) + E * 1 = log2(S) + E

so:

log2(S * 2^E) = E + log2(S)

The output significand is basically the input exponent, plus a positive fractional offset caused by the input significand. The output exponent is simply the distance required to normalize the output significand. This works for any positive input.

As in the EXP2 case, observation of the LOG2 graph reveals the fact that we can scale a small portion to cover all input and output scenarios by compressing the input by appropriate powers of 2 (manipulation of the float exponent). First we subtract 1 from the input to move the curve down to the origin of both axes, and mulitply it by 2 to better fit the calcuation space. This also makes the polynomial approach fundamentally tractable. The polynomial coefficients are adjusted so that they also fit in the calculation space, which gives an polynomial output that is 1/2 the expected value.

What to do with negative inputs? I rashly decided to take the absolute value of the input. Zero input is the only scenario that can cause (negative) overflow, all other inputs should easily fit in the output space.

Here is an illustrative example of how to handle the input value 12.25 (where the significand width is 8 and the exponent width is 4):` SIG EXP`

` --- ---`

` START 01100010 0101`

` ABS & NORM 11000100 0100`

` CHVAR 10001000 0011`

` POLY_i 10001000`

` POLY_o 01001011`

` NORM (signed) 01100000 (SHL 5)`

` MOVE 01100000`

` POLY 00010010 (SHL 5-7) (7 here instead of 8 to make up for poly output of 1/2 expected value)`

` ADD 01110010`

` EXP_o 0011 (8-5)`

` END 01110010 0011`

So the result is (114 / 256) * (2^3) = 3.5625 (the actual precise result is 3.6147...)

The polynomial requires 11 terms for minimal error.

**COS - Float (via polynomial)**

Since cosine is cyclic, we have to figure out where in the particular cycle the input float value happens to be. This is accomplished by multiplying the input value by the inverse of the cycle (1/2*pi) which gives us a signed integer portion that represents the cycle count from zero, and a signed fractional portion which represents the location within the given cycle. We discard the integer portion because it gives us no intra cycle information, and present the fractional portion as input to the cosine polynomial as a signed integer. The polynomial result is also a signed integer, which is requires conversion to float.

Note that larger input values devote more information to the cycle count and less to the location within that cycle, so the significance of the result suffers. In the extreme, the result has no mathematical relation to the input. For instance, maximum input for the native cosine in Excel is (2^27)-1 and anything larger gives a #NUM! result, presumably because the underlying polynomial also employs something like 32 bit integers, and 32 - 27 = 5 remaiing bits of precision, which ain't much. In the larger sense this issue is due to the non-ideal mapping of the semi-logarithmic float domain to the linear domain of the integers and cosine. It's hard to know what to do with this scenario without signaling, but it's clear that declaring the result invalid after its precision sufficiently poops out is the right thing to do.

For implementation, I found it most straightforward to:

1. take the absolute value of the input and normalize it;

2. multiply by 2/pi in order to maximize the precision of the extended multiplication and subtract 2 from the exponent to compensate (*1/4);

3. unsigned shift the significand the distance of the exponent;

4. multiply this by the input sign in order to restore its signedness;

5. plug this into the COS2_POLY routine;

6. normalize the result as a float.

The shift in step 3 above should be limited so as not to exceed the hardware shift distances (+31,-32).

It isn't necessary to check for zero significand at the input as multiplying and shifting a zero will always yeild a zero result. However, after the normalization / conversion to float in step 6 above, the output exponent should be zeroed out if the output significand is zero (to squash the generation of "dirty zeros" here).**SIN - Float (via polynomial)**

To implement sine float, the above routine simply calls SIN2_POLY in step 5 (which offsets the polynomial input by -0x4000,0000 and calls COS2_POLY).

===================

I realize I haven't posted the spreadsheet I've been developing and testing these algorithms in:

http://www.mediafire.com/download/89q89lneraywo6y/HIVE_algorithms_2016-05-10.xls

Rather intentional as I never feel it's really complete enough, but it's getting there.

**Hive Assembly Language (HAL)**

The Hive simulator displays disassembled code as pseudo code, and I generally write code in my notebook in pseudo code form, so it seems the time is ripe to write an assembly language interface. First step is to make a *.mif to *.hal file converter (which provides fodder for the reverse conversion). Most of the functionality already exists in the simulator code, so it's easy peasy lemon squeezy (took me all of 4 hours or so, most of it spent making the output pretty). For instance, here is the unsigned integer div/mod routine in HAL converted from the MIF source:

`0x560 (s0!=0) ? pc += 0x3 // DIV_MOD_U SUB START - check a`

` P0 := 0xff // max q`

` P1 := 0 // zero r`

` pc := P7 // return`

` s2 := lzc(s0) // x := pow[lzc[a]]`

` P2 := 1 << s2`

` s0 *= 0x3f // na := -a`

` s0 *= s2 // 1x Newton Raphson`

` P0 *u= s2`

` P2 += P0`

` s0 *= s2 // 2x`

` P0 *u= s2`

` P2 += P0`

` s0 *= s2 // 3x`

` P0 *u= s2`

` P2 += P0`

` s0 *= s2 // 4x`

` P0 *u= s2`

` P2 += P0`

` P0 *= s2 // 5x`

` P0 *u= s2`

` P2 += P0`

` P2 *u= s1 // q := b * x`

` s2 *= s0 // r := b - (q * a)`

` P1 -= P2`

` s1 -= s0 // 1x adjust q & r`

` (P1<0) ? pc += 0x6`

` P1 -= s0`

` P2 += 0x1`

` s1 -= s0 // 2x`

` (P1<0) ? pc += 0x2`

` P1 -= s0`

` P2 += 0x1`

` P0 := P2 // move q`

`0x582 pc := P7 // DIV_MOD_U SUB END`

(It looks pretty, oh so pretty...)

Now for the harder part: implement a token parser for this language and have it kick out a MIF file. Though a lot of that type of code already exists in the simulator to parse MIF files and the simulator command line.

**HAL**

Two weeks later (!) I've got the HAL to MIF parser finished, which means I can now develop in Hive assembly language. This is nice because assembly is pretty much how I already write and document the code, and the manual (and therefore error-prone) translation steps can be eliminated. Along with the usual end-of-line comments, HAL also supports block comments, so the commenting can be more in-depth and descriptive, and whole chunks of code can be quickly swapped in and out for debug, or kept in reserve for future use. Currently the *.hal <=> *.mif conversion is done via command prompt, I'm not sure yet how much I'll eventually integrate it into the simulator.

The main hold-up was figuring out how to deal with the various representations of literals. A signed 8 bit value is an immediate field in the OP_BYT opcode, signed and unsigned 16 bit values are single in-line literals following the OP_LIT_S and OP_LIT_U opcodes respectively, and 32 bit values are two in-line literals following the OP_LIT opcode. I wanted to use the same HAL notation for all three (e.g.: s2 := 0x89) and let the assembler figure out the best fit based on the value, which turned out to be rather tricky. If the value fits a signed byte and there is no extra pop (OP_BYT is a single operand instruction) then OP_BYT is used. If not, then if the value fits in signed 16 bits OP_LIT_S is used. If not, then if the value fits in unsigned 16 bits OP_LIT_U is used. Otherwise it's 32 bits and OP_LIT is used. This goes smoothly from signed to unsigned to moot, so it's easy to remember. To not break existing code that may rely on relative and absolute addressing, the parser adds NOPs as needed.

Another hold-up was in moving the interrupt controller closer to the opcode decoder and removing register set support for it. It is now controlled via one-hot (one bit per thread) dedicated opcodes. The interrupt controller has been one of the most volatile elements during the design of Hive, I'm hoping it stays put now.

Finally, I got rid of the explicit OP_NOP and moved JMP_8 down to the base of the opcode space. JMP_8 with a zero immediate value gives an opcode that is zero, and a zero jump doesn't do anything at all. There are a couple of other candidates that also do nothing under various circumstances, but having unconditional JMP_8 at the bottom of the opcode space and JMP_8* conditionals at the top neatly bookends things. However, the lack of an explicit NOP threw a wrench into the simulator opcode decode, so that took some thought and work to get past.

With most of these structural changes, the core as synthesized for the FPGA has actually been shrinking a bit (utilizing fewer resources) which is not exactly expected, but is nice. I'm thinking of moving to the EP4CE10 device, which would double the main memory space (16 BRAMS => 32, for a whopping 32KB ;-) and can be had on eBay for <$40.

OK, the HAL parser has been integrated into the Hive simulator. It took me a while to figure out the best way to handle file flow. Now there is no default file at sim startup, memory is blank until a file is loaded manually via the sim command line.

This is the way file I/O now works:

- Both *.hal and *.mif files may be parsed as input, but only *.mif files may be written.

- Reading a *.hal file clears the default write file name.

- Reading a *.mif file sets the default write file name to this *.mif file name.

- Writing a *.mif file sets the default read file name to this *.mif file name, and also

writes the hive_odd.mif and hive_even.mif files for FPGA BRAM init.

I can't use *.hal as a basis for saving within the sim as block comments in the code would get discarded. So it's kind of a one-way street from *.hal to *.mif, though the reverse route is available via a stand-alone Windows command line program.

I converted the verification *.mif file to *.hal and combed through it for literal value issues. It's working now for core verification purposes, and that exercise was very much worth doing. Coding in HAL in a normal text editor seems to be much easier than editing memory contents with the simulator command line.

===========

Now I'm back to contemplating boot loaders and such. The SPI Flash device I have on hand is the Winbond W25Q80BV (got it from Adafruit). It will transfer serial data at 100MHz max in read mode, which is plenty fast even if I only use one of the serial lines. What I don't like about it is the big endian nature of the transfers. With serial protocols I seriously wish they would stick to little endian, as this keeps you from constantly consulting the datasheet to answer the question "big end of what?" Indeed, the end is only important for the address (24 bits, MSb first) and moot for everything else (byte commands and particularly byte data can be seen as endian agnostic). With little endian the answer is "little end of anything" which is nice and consistent. Software types will often state that endianness is a wash, just pick one and don't talk about it too much or you'll start a holy war - but they don't do hardware.

Anyway, this Flash device works all the way down to DC, which vastly simplifies data transfers so dedicated byte I/O serial hardware should be a snap. And since the software that deals with the serial hardware is broadly useful all the time (e.g. for reading and writing presets) it can remain in memory even when not doing software image updates and the like.

The LCD interface is stupidly complex enough that I'll be doing that entirely in software. The init process is like something from the stone age, and data transfer is amazingly slow, even at the bus transaction level.

Rotary encoders will be resynchronized in hardware, and perhaps inverted, but that's it I think. Debounce / state machine tracking is best done in software.

Now you can give us an estimated time of arrival? I would like to experience the results of your labor which can sound like anything you choose? Will your work use the name theremin or more appropriate a proximity controller? Don't make the mistake of that Tanner guy.

Christopher

*"Now you can give us an estimated time of arrival? I would like to experience the results of your labor which can sound like anything you choose? Will your work use the name theremin or more appropriate a proximity controller?" - Christopher*

No idea on ETA but I continue to work on it every chance I get. I want to get a pitch and 2 axis left hand prototype working this year. Still not sure about the synthesis method, particularly after looking into physical modeling and finding it generally to be a pile of special case hand waving nuance. Will definitely be calling it a Theremin.

You must be logged in to post a reply. Please log in or register for a new account.