Let's Design and Build a (mostly) Digital Theremin!

Posted: 9/18/2017 2:28:13 AM

From: Northern NJ, USA

Joined: 2/17/2012

32/16/8 Float

I'm going through my various floating point math subroutines and optimizing them for the unpacked float type under consideration. The float consists of a 32 bit unsigned normalized (MSb=1) magnitude with decimal place to the left of the MSb, a 16 bit signed non-offset exponent (power of 2), and an 8 bit sign (1 or -1), all stored in separate processor registers / memory slots.  Forcing the output of these subroutines to produce normalized results, plus a very well defined non-norm zero (MAG=0, EXP=-0x8FFF, SGN=1) means a lot of the up-front processing can be removed.  I'm hoping that allowing the exponent to "roam free" in the 32 bit space (in calculations between memory storage and retrieval) will reduce the cycles needed to implement the math I need to do, and I have a specific "lim_f" subroutine that reigns it back in to 16 bits when desired (7 cycles max).  Floating point multiplication now takes 8 cycles max, and floating point addition takes 16 cycles max.  I wrote functions that convert floats to / from signed ints, and they take 9 and 8 cycles max., respectively.  Onward and upward!

Posted: 9/24/2017 3:33:26 PM

From: Northern NJ, USA

Joined: 2/17/2012

Strong Typing

One way programming languages are characterized is by how strongly typed they are.  Typing is a way of assigning a label to a constant or variable so that operations on it (arithmetic, logical, storage, printing, etc.) automatically behave a certain way.  Common types are (signed) ints, unsigned ints, floats, characters, etc.  (One horrible thing about early C is the vagueness of the bit width specified by the common types of int, long, etc. as well as modulo behavior.  You often desire a certain modulo along with the natural behavior of it under 2's complement operations, and indeed newer extensions to C++ have explicit width types such as int32_t, uint32_t, etc.)

Typing is an interesting way to automate things, but you often need to override the default action, and this is done through casting, which syntactically uses the type as a sort of function.  For example, int32_t(x) returns the signed 32 bit int representation of x.  Often the value returned isn't different, only the type label has been changed so that it gets operated on as expected for that type.  (Another horrible thing about C is the way it handles signed functions, where all inputs have to be signed for the signed operation to be employed, otherwise the operation is unsigned, which leads to all kinds of unexpected error situations.)

I get the typing thing for bit width, but not so much for signed behavior.  It seems to me that the operation rather than data should be typed, and this is what I do in my assembly language.  Less than comparisons between variables are also typed (using signed or unsigned subtract, equality is sign neutral, less than comparisons to zero are always signed).


Still working on the basic math subroutines.  EXP2_F is done (35 cycles max) and I thought LOG2_F was wrapped up but now I'm seeing some wasted headroom opportunities in the polynomial coefficients.  I could improve the error by 1/2 bit or so worst case, but it might take days to tune the thing.  Same issue with the float version of COS.  There seems to be more headroom available in the COS polynomial vs. the SIN polynomial, another reason to use COS as the basis for both.

This stuff can be fairly complex at the algorithmic level, and incredibly nuanced at the numeric level.  One thing that's really paying off is the hardware limiting of shifts, that alone massively cleaned up the EXP2_F code.


In the processor I removed all the mixed signed / unsigned ops and installed a "reverse subtract" op.  Add, multiply, and, or, xor - none of these care about operand order, but subtract does.  The reverse version swaps (commutes) the operands.  This shaves off an instruction or two here and there, particularly for the immediate ops, not sure if it will stay.

Posted: 9/30/2017 3:02:18 PM

From: Northern NJ, USA

Joined: 2/17/2012


If anyone is interested in my latest Hive math algorithms, I put a spreadsheet describing them here:


I've test-benched each one, they're all polished up and ready to go.  It really helps to work out the algorithms in something interactive like Excel, you get a lot of insight by actually seeing what's going on at each step.  Coding them up in HAL and watching them run in the simulator is also quite informative.  And of course the results of both should match exactly.


# include <filename>

The best way to maintain these algorithms is to have a single copy in a file somewhere, and pull it into your program via a pre-processor directive.  C uses the "# include" syntax and that's what I've decided to do in the HAL assembler.  It's kind of thorny adding this feature because:

1. Errors can now happen in the other files that get pulled in, so the file name needs to track here for meaningful error reporting.

2. With multiple files the line number isn't unique anymore, so this also needs to track for proper error reporting.

3. There can be files within files within files etc. so we need a system to keep track of the read position in any files we will be returning to to read from some more.  This is very similar to pushing the subroutine return address.

4. The commenting syntax should also operate on these directives so they can be commented out, which means the discard of block comments, and the separation of live code and end-of-line line comments, must happen at the same time we are evaluating the directives and pulling any other file contents in.  Looking at the current code, the separation state machine can be simplified and modified to function on a per-line basis, with only a bool state (in_block_comment flag) to be preserved and passed from one line to the next.

I've been punting forever on line numbers, using blank lines in the intermediate files to preserve them and such, but it seems that now is the time to face this issue head on.  For as useful as they are from a diagnostic standpoint, I'm going to move away from intermediate files and use an indexed struct or object that has the source file name, line number, code text, and end of line comment for each source line.  This will nail down any error reporting issues for good, and will allow more flexible pre-processing as the addition of lines won't throw off the line counting and labeling system.

Use of the newer containers in C++ (vectors, hashed storage) has really streamlined the code and taken it to the next level.


Pre-processor directives aren't something you want to use in a higher level language, though C++ uses them extensively to include code, which leads to clunky directive guard tests that attempt to include the code only once.  What you want is a package system that allows you to pick and choose what functionality you want include in and expose to your code.  A package can be a single file with multiple sections, or multiple files, that can be nested.  SV has a really nice packaging system where you can stick all your magic numbers and other stuff in one place, and then refer to them anywhere as many times as you like without conflict, and limiting it to defined subsections of that shared data if desired.


In revisiting the math subroutines, I found the new byte-addressed processor refreshingly easy to program, mostly due to the optional operand move now available with all immediate operations.  Moves are inherently wasteful, so in the past I'd find myself spending hours trying to remove them, particularly from critical sections of code.  This is a huge problem with pure stack machines (Hive is a hybrid register/stack machine) and stack languages, the programmer knows in his heart that moves, swaps, dupes, drops, etc. - any stack manipulation that doesn't involve a functional change to the data itself - is an inefficiency to be minimized, but this effort isn't in any way related to the problem at hand (writing a program to do something) so it's unwelcome mental overhead.  I like puzzles as much as the next person, but not so much when they seriously impede the solving of a bigger puzzle, nor when the sub puzzle solving is an exercise in the minimization of something bad rather than the elimination of it.

The unsigned comparisons were really needed but never fit in the opcode old encoding, so it's nice to have those too.  

Posted: 10/4/2017 9:08:05 PM

From: Northern NJ, USA

Joined: 2/17/2012

Got the new #include system going several days ago.  Using a vector of structs in a LIFO configuration (C++ vectors have LIFO like controls, so I didn't need to add much code to them in order to do this) to hold the state info as it traverses the various input files.  Had to open the files in binary mode for the "tellg" and "seekg" stream pointer functions to work correctly, but other than that it ran shockingly well right off the bat.  The only real wrinkle to iron out was what to do with the #include statement lines - I was attempting to discard them, but didn't want yet another thing going on to keep track of.  It's actually much simpler to just stick them in the store and ignore / delete them later.  The data from the input files is stored in a vector of structs, with each indexed struct holding one line of info separated out (file name, file line number, code, end-of-line comment).  I keep them around until the end so the error system has data to pull from.  Now that I am explicitly tracking file and line, I was able to throw in a processing step that discards empty lines.  There are 7 steps in the assembly process, 3 to pull in the data and massage it (pre-processing), 3 label processes, and a final process that interprets the result to a model of Hive memory.

An #include system is an amazing thing!  All my math subroutines are in separate int and float files, and now I'm going through the CLI (command line interface) code and cleaning it up (byte addressing makes all kinds of things much easier).  Files can be #included inside of other files, which can be #included inside other files, ad infinitum/nauseam.  Sticking code in separate files can really clean up the view of the thing you're working on, and the separation process helps to define the overall modularity of the project.  I've got a math package (which contains int and float packages), a char package (ha ha), a string package, a CLI tokenizer file, a CLI parser file, and a CLI top file so far.  I'd forgotten how much code I'd written!  It could all use a second look and a polish anyway.

What would be really useful would be something that somehow determines what can be safely left out when pulling things in, which would automatically minimize the memory image footprint.  I suppose functions could be enclosed in brackets, and the labeling system could look for orphaned references?  Something to think about.  

I changed my string format a bit.  These are small strings for display and such so they don't need to hold a whole encyclopedia or anything.  The chars are obviously 8 bit, but at the head of the store the first byte is reserved for the fullness index (post inc write pointer), and the second byte is the fullness limit.  Having a separate limit means you don't have to allocate powers of 2 string buffers, which is nice for shorter fixed messages, and for nutty stuff like the 20 line LCD display.  Having a byte index means you can only store 255 chars max (there are 256 states associated with 255 chars, you need one more for "empty").  So this string type is "str8".  Whenever I need longer strings I can easily make a new type with indices of 16 or 32 bits and call it "str16" or "str32".

Posted: 10/9/2017 10:56:07 PM

From: Northern NJ, USA

Joined: 2/17/2012

{Fight Club}

Programming language scoping is namespace containment.  You're free to re-use variable and subroutine names when in differing scopes and you can expect those names to remain local to the scope.  Kind of like Fight Club, what happens in there stays in there.  Namespace pollution can happen quite easily in larger coding projects, and can often be tricky to debug, so you see things in C++ code like "std::cout << blah" where the cout function name is referenced to an explicit library (though I think peppering your code with "std::" pretty much ruins readability).

Scoping in C isn't super sophisticated, rather it's quite mechanically controlled by the {} braces.  An open brace { takes you into another scope, and a close brace } takes you back to the one you were in before the { came along.  So if I write:

{ int tst = 5; }

cout << tst;

I'll get an error if tst wasn't defined somewhere previously in the root scope:

int tst = 3;

{ int tst = 5; }

cout << tst;

which prints 3.  And this also works just fine:

int tst = 5;

{ cout << tst; }

The behavior then is that scopes can "see" names from their own scope all the way back to the root scope.  The determination of which one to use is simply the one in the scope closest to where it is being referenced for use:

int tst = 5; cout << tst; 

int tst = 4; { int tst = 5; cout << tst; }

int tst = 3; { int tst = 4; { int tst = 5; cout << tst; }}

will all print '5'

int tst = 4; { cout << tst; }

int tst = 3; { int tst = 4; { cout << tst; }}

will all print '4', etc.

Every time an opening brace is encountered, a new scope is created and the names within it must be unique among themselves, but not so among the names in all the scopes looking back towards the root.  All scopes share the root scope, which makes it a global resource, but other than that scoping depth isn't necessarily an indication of shared name spaces, because two scopes can have the same depth but only share the root scope:

Here scope 2 and scope 4 have the same depth but only share the root: 

scope 0 { scope 1 { scope 2 }} 

scope 0 { scope 3 { scope 4 }}

Here scope 2 and scope 3 share scope 1 and the root: 

scope 0 { scope 1 { scope 2 }} 

scope 0 { scope 1 { scope 3 }} 

Naming is one of the hardest things programmers have to do.  Names should be descriptive, yet terse, and locally unique and consistent.  Most of my programming time is spent coming up with names for things, coming up with better names for things and renaming them, etc.  Language features that reduce the amount of naming can really facilitate the coding process, and scoping is a simple and powerful way to do this.  With scoping we can use common iterator names like 'i' over and over again, and we don't have to make the names in similar subroutines different so they don't clash.

Scoping also naturally contains things like subroutine guts:

return_type sub_name (port_type port_name, ...) { sub_guts...; return(return_value) }

The port can be seen as a bridge between the inside and outside scopes.  And the return is necessary so as not to "fall through" the closing brace (all of these mechanisms are more mechanical than you might initially think).


Since my last post I've implemented scoping support in the assembler, and WOW is it useful!  Now I don't have to worry about variable names so much (because there really aren't any) but label names are a different matter.  Not having the local labels "fighting" globally with all the others anymore is an immense relief.  The labels can also now be more descriptive, so I don't need to explain as much regarding program flow control in the end-of-line comments, and anything that cuts down on the need for micro documentation is generally a step in the right direction.  These are the assembler steps:

1. Read in files, split code & comments, implement the #include directive.

2. Implement #define (forward text-based search and replace) and #undef (removal from table).

3. Convert to lower case & insert spaces around certain text groupings.

4. Pre-process certain HAL constructs to make them easier to decode later, kill blank code lines.

5. Replace all explicit labels with addresses, kill their address assignment statements.

6. Flatten label name space by traversing {scope} hierarchy and assigning globally unique aliases.

7. Kill scopes anchored by orphaned LH labels (recursively), kill all braces.

8. Process implicitly assigned label tokens, replace with physical addresses.

9. Parse final assembly to memory array.

If you notice in step 7 the process is now removing dead code, which is code that is written like this:

@some_label { some code... }

where the label isn't referenced by anything in-scope.  You have to do this recursively, or in a loop, because removing some dead code can remove the only reference to some other code, which is now dead, etc.  What's great about this is I can now have packages of subroutines that get pulled in with an include statement, but only the ones that actually get called will make it all the way through the assembly process.


I also decided not to go too crazy on pre-processor directives, and I've changed from the C/C++ syntax of #<directive name> to the SystemVerilog syntax of `<directive name>.  The SV syntax is actually safer because directive names when used must have the prefix ` so the namespace is kept separate from everything else.

Posted: 10/19/2017 6:28:41 PM

From: Northern NJ, USA

Joined: 2/17/2012


Last week I was revisiting the CLI (command line interface) HAL assembly code, puzzling over what to stick where in terms of the register stacks while doing the tokenizing and token interpretation.  And it struck me that I could easily incorporate JAL or "jump and link" opcodes - relative subroutine commands - in the processor, which would avoid the use of a register for the subroutine start address.  The subroutine start address would then be a signed offset to the current thread PC, given by immediate 8 or 16 bit value, or by the value at the top of stack B.  And I could do the same for memory accesses, which otherwise requires a register to specify the memory base address.  The inclusion of these opcodes would really help clean up the CLI assembly code, and would slightly compact it and make it run slightly faster as well.  I've read both the Hennessy & Patterson and Patterson & Hennessy texts so I have no excuse for not thinking to include JAL earlier! :-)  (Though I named my JAL "JSB" for "jump to subroutine" to match my "GSB" or "go to subroutine".  I briefly thought of renaming all my "JMP" opcodes "JTO" to match my "GTO" opcodes, but "JMP" has a clearer meaning to my eyes.)

In doing this I also re-addressed the HAL syntax for jumps and subroutines.  I removed any explicit invocation of immediates here, as calculating the PC offset manually is dangerous and prone to error.  All jumps automatically calculate the opcode immediate value given the current PC and an address (or label converted to an address), so the JMP syntax looks more like a GTO, and JSB more like a GSB.  And I decided to remove my primary JSB / GSB syntax:

  pc := @lbl, sa := pc    // jsb_8 or jsb_16

  pc := @lbl, sa := pc, pb    // jsb_8 or jsb_16, w/ pop b

  pc += sb, sa := pc    // jsb

  pc := sb, sa := pc    // gsb

and promote the secondary syntax which is looking cleaner to me now:

  sa := pc := @lbl    // jsb_8 or jsb_16

  sa := pc := @lbl, pb    // jsb_8 or jsb_16, w/ pop b

  sa := pc += sb    // jsb

  sa := pc := sb    // gsb

For the memory access opcodes, I changed the immediate offset to a strictly byte offset.  The offset used to be an increment that was sized to track and index the access width (in terms of bytes) but that hems one in too much for wider accesses relative to the PC, where one has no real control over the byte "phase" of the PC.  This actually frees things up generally in terms of grouping multiple width reserved data blocks together.

The target FPGA has limited block RAM resources, which limits the main memory address to only 12 bits wide.  This is easily reachable in a direct sense via a 16 bit immediate, but I've been studiously avoiding the inclusion of any opcode that directly addresses some subset of memory.  Offset or relative addressing is totally fair game as it operates locally and doesn't rely on anything weird (like paging mechanisms, etc.).  I don't want to paint the design of HIVE into a corner that doesn't treat all 32 bits of address space as equally as possible.

This morning I added a second highlight in the simulator listing that isn't necessary "glued to" the current thread's PC, so I can observe a section of memory while the threads are off doing their thing.  The keys up / pg-up / dn / pg-dn and the 'g' command "unglue" the highlight, hitting the home key "glues" it back.  Minor coding change but pretty handy.

You must be logged in to post a reply. Please log in or register for a new account.