Let's Design and Build a (mostly) Digital Theremin!

Posted: 7/26/2018 9:34:31 PM

From: Northern NJ, USA

Joined: 2/17/2012

Still beavering away, trying different configurations for velocity.  Rewatching Dominik's most excellent video (link) again and again for inspiration - thank you for taking the time to make and post that video Dominik!  Getting closer and have a strong feeling it will end up more or less like this:

Linear left hand position goes through the knee function, and forks to feed EXP2 and a high pass filter (HPF).  The HPF output is fed to a first order low pass filter to gain it up with some averaging of the stepping noise (LP1_GAIN), then the two branches are combined to form the system exponential control signal (~CV), which is fed to LOG2 to form the system linear control signal.  It doesn't look like much but, ye gods, the mixing of linear and exponential signals isn't something I find at all obvious as a solution to anything.

The knee probably provides enough control over the sharpness of the attack that I don't need a dedicated control for this in a formal function generator, though this perhaps precludes things like reverse envelope type effects.  The HPF frequency knob controls the exponential slew, or decay.  I may add an optional peak hold, separate rise / fall decay rates, a dedicated damping point, etc. once I'm happy with the basic functionality.

I have most of the elements of this up and running and it's quite sensitive to C changes.  I can sit ~1m away and tap my bare foot on the metal leg of the table my PC is sitting on, which causes the volume LED bar graph to jump each time.

Pissed away yesterday trying to make an exponential envelope generator based on a 1st order filter.  It mostly works but the attack phase is problematic (too dependent on accumulator initial and dynamic values). Linear slewing run through EXP2 is simpler and airtight.  It seems just about everything is easier to think about and do linearly (so it's unsurprising that the dominant patch cord domain for analog synths is linear).
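For readers following along, here is a minimal C++ sketch of the "linear slewing run through EXP2" idea: the level is slewed linearly in the log2 domain and only converted to an amplitude at the end.  Everything here is an assumption for illustration (the struct name, the floating point math, the -16 octave floor, the rate values); the actual Theremin code is integer Hive assembly and is not shown in the post.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical envelope: a linear slew in the log2 (octave) domain,
// turned into an exponential amplitude curve by exp2.
struct LinExpEnvelope {
    double level_log2 = -16.0;        // current level, in octaves below full scale
    double floor_log2 = -16.0;        // treated as silence (assumed value)
    double rise_per_sample = 0.01;    // octaves per sample while the gate is on
    double fall_per_sample = 0.001;   // octaves per sample while the gate is off

    // Advance one sample and return the amplitude in (0, 1].
    double step(bool gate) {
        assert(rise_per_sample > 0.0 && fall_per_sample > 0.0);
        if (gate) {
            level_log2 = std::fmin(level_log2 + rise_per_sample, 0.0);
        } else {
            level_log2 = std::fmax(level_log2 - fall_per_sample, floor_log2);
        }
        return std::exp2(level_log2);  // linear ramp in, exponential curve out
    }
};
```

Because the state is just a clamped linear ramp, the attack does not depend on accumulator history the way a filter-based envelope does, which is presumably what "airtight" refers to.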

Posted: 7/27/2018 7:38:42 PM

From: Northern NJ, USA

Joined: 2/17/2012

My (Least) Favorite Mistake

Still at it, but one thing is becoming quite clear: measuring and using true velocity (as the difference of two consecutive left hand position values) is a dead end for synthesis use.  The difference over a small time period (1/48kHz or ~20 us) is itself small, so gaining it up gives nasty stepping noise.  You can average the differences, but do it too much and it dulls the attack by slowing it down.  You get a much larger signal with conventional first order high pass filtering, and the high pass filter time constant can then be used to generate an exponential decay if you like.  I'm currently doing this, but running into internal limiting issues that I need to address (gained-up attack overflow is causing the envelope to peg at max for a while rather than simply peak and decay).
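To make the size difference concrete, here is a rough C++ sketch comparing a single sample difference with a first order high pass.  It uses floating point and made-up names (`Differencer`, `HighPass1`, coefficient `k`), so treat it as an illustration rather than the Theremin code.  For a steady ramp of slope s per sample the differencer settles at s, while this high pass settles at s/k, and once the hand stops the high pass output decays by a factor of (1 - k) per sample.

```cpp
#include <cassert>

// Single sample difference: y[n] = x[n] - x[n-1].
struct Differencer {
    double prev = 0.0;
    double step(double x) {
        double y = x - prev;
        prev = x;
        return y;
    }
};

// First order high pass built as "input minus one-pole low pass".
// k in (0, 1] sets the corner; k = 1 degenerates to the differencer.
struct HighPass1 {
    double k = 0.001;  // hypothetical coefficient, smaller = lower corner
    double lp = 0.0;   // low pass state
    double step(double x) {
        assert(k > 0.0 && k <= 1.0);
        double y = x - lp;   // output uses the low pass state before the update
        lp += k * (x - lp);  // move the low pass toward the input
        return y;
    }
};
```

With k = 0.01 the high pass output for a slow hand movement is about 100 times larger than the raw difference, so far less gain (and far less stepping noise) is needed afterwards.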

Sometimes you take what seems to be the most obvious and direct road, and months later you're actually farther away from the destination.

[EDIT] I suppose a lot of the directional incorrectness can be chalked up to semantics.  You go into it thinking "velocity" and so build a velocity sensor, when what you really want is the log2 of the high pass filtered position - the former is highly similar but different enough to be not the droid you're looking for (Theremin mind trick you play on yourself).

[EDIT2] Having played with this for a while it of course isn't working as well as I'd like either.  For quieter notes the attack isn't sharp enough sounding, and there isn't sufficient control over dynamics, so I think explicit generation of the attack envelope is called for.

Posted: 7/30/2018 4:18:13 PM

From: Northern NJ, USA

Joined: 2/17/2012

Not Quite...

Got this running an hour ago:

Subtracting post LOG2 is mathematically equivalent to dividing pre LOG2, but in actual use with integers it's generally better from a resolution standpoint to subtract after, and I'm using a flooring subtract here to limit the effect.  What to do with signed values pre LOG2 is a problem, do you offset the input or lop off the bottom half?  Here I'm lopping with a math diode (zero if < 0).  The HP1 corner frequency rate gives a convenient exponential decay that gets linearized via the LOG2.  So the final sum is linear and unsigned.
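A small C++ sketch of the pieces named above may help: the math diode, the flooring subtract, and the subtract-after-LOG2 trick (log2(x / 2^s) = log2(x) - s, so subtracting s after LOG2 acts as a gain of 1 / 2^s before it).  The function names, the 8.24 fixed point format, and the use of `std::log2` as a stand-in for the LOG2 block are all assumptions for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

constexpr int kFracBits = 24;  // hypothetical 8.24 fixed point format for log2 values

// "Math diode": pass positive values, replace negative ones with zero.
inline uint32_t math_diode(int32_t x) {
    return x > 0 ? static_cast<uint32_t>(x) : 0u;
}

// Flooring subtract: a - b, limited at zero instead of wrapping around.
inline uint32_t floor_sub(uint32_t a, uint32_t b) {
    uint32_t r = a > b ? a - b : 0u;
    assert(r <= a);  // a floored result can never exceed the minuend
    return r;
}

// Stand-in for LOG2: log2(x) scaled by 2^kFracBits; zero input maps to zero.
inline uint32_t log2_fix(uint32_t x) {
    if (x == 0) return 0u;
    double scaled = std::log2(static_cast<double>(x)) *
                    static_cast<double>(1u << kFracBits);
    return static_cast<uint32_t>(std::floor(scaled + 0.5));
}

// Velocity leg: rectify the high pass output, take LOG2, then subtract
// a sensitivity offset (in the same fixed point format) with flooring.
inline uint32_t velocity_term(int32_t hp_out, uint32_t sens) {
    return floor_sub(log2_fix(math_diode(hp_out)), sens);
}
```

For example, `velocity_term(1024, 8u << kFracBits)` yields 2 octaves in this format (10 - 8), and anything at or below 2^8 floors to zero.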

Does it work?  Yes, with it I can replicate what DOMINIK is doing in his video.  What I don't like about it is when the attack is made steep enough to be sharp sounding the attack amplitude then becomes somewhat more difficult to control.  And the decay gets lost if the sensitivity is set too high (if the subtracted number is too small).  But it seems to be a step in the right direction.  I believe that having the envelope generation in the velocity path rather than in the final combined path may give more opportunity for expressive control?

One behavior that seems useful is the rectification, or the use of only positive velocity.  When the hand goes through the knee zone in a positive direction the velocity is naturally higher, generating an attack.  Hand movement after this doesn't influence the decay much because the velocity gain is lower when not in the knee transition.  Moving the hand in the opposite direction will cancel or damp the decay, but not as profoundly or abruptly as the attack.  It's this kind of subtle but very useful (from a performance standpoint) behavior that I'm searching for.

I think the next obvious experiment is to move the exponential decay to a slew limiter on the linear side, and concentrate on making the attack amplitude more controllable.  Having the velocity sensing and decay combined is rather neat, but it's probably asking too much of a simple algorithm.

Posted: 8/1/2018 2:50:33 AM

From: Northern NJ, USA

Joined: 2/17/2012

The (90) 7% Solution - Volume Side

The above made me realize I should place the envelope generation in the velocity sensing leg, rather than in the final combined volume processing leg:

"...one thing is becoming quite clear: measuring and using true velocity (as the difference of two consecutive left hand position values) is a dead end for synthesis use."  - dewster

Yeah, don't listen to that guy!  What an idiot...

Knee, then lower branch is a rectified single sample difference, fed to LOG2, floor subtracted for variable gain, then the usual peak hold and envelope generator. As before, this gives a very nicely controllable (in terms of variable peak amplitude) fixed attack and decay.  If simply combined with the knee, the player must keep their hand fairly still after the envelope is triggered if either the attack or decay is set to a rather long time, otherwise the envelope can be instantly damped (particularly so if pulled back through the knee) or, to a lesser extent, accentuated.  For some things this might be exactly what you want, but if you desire retriggering without significant damping, then the "SLIM_FALL" linear decay slew limiter (which instantly tracks when the input is larger than the output, but decays at a programmable linear rate when the input is less than the output) in the upper leg can be set to a time constant somewhere near the envelope fall time to allow decent sounding uninterrupted retriggering, as well as less abrupt sounding damping.  And the lower velocity leg can be bypassed (by setting the vel subtraction to a large value) to give fixed decay envelopes off the knee via slim_fall if that's what you want.  Overall, it seems sufficiently versatile without too many knobs, and attacks can be quite sharp sounding if the envelope rise time is set to the quicker side.  It's easy to adjust and easy to play.
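The SLIM_FALL behavior described above (instant tracking upward, programmable linear decay downward) can be sketched in a few lines of C++.  The struct and field names are hypothetical, and the per-sample step stands in for whatever the real rate control is.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a linear decay slew limiter:
// follows rises immediately, falls by at most fall_step per sample.
struct SlimFall {
    uint32_t out = 0;
    uint32_t fall_step = 1;  // linear decay per sample (assumed unit)

    uint32_t step(uint32_t in) {
        if (in >= out) {
            out = in;  // input above output: track instantly
        } else {
            // Input below output: decay linearly, flooring at zero...
            uint32_t decayed = out > fall_step ? out - fall_step : 0u;
            // ...but never drop below the input itself.
            out = decayed > in ? decayed : in;
        }
        assert(out >= in);  // the output never sits below the input
        return out;
    }
};
```

With `fall_step = 10`, feeding 100 and then zeros gives 100, 90, 80, and so on, while any later input above the current output snaps it straight back up, which is what allows retriggering without the knee instantly damping the envelope.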

So that's it!  I think I'm pretty much done with the volume side.  Hope to make a video of it, or at least an MP3 or two, before the family camping trip hits this weekend.  This was a long time coming, and now it's major hump-overage for me, whew!  I suppose it doesn't seem like much, or all that different from what I had before, but finding basic combinations that work, getting them in the proper sequence, and then fine tuning the parameters, can make all the difference.  The lower branch elements had been previously fine-tuned, so it all went together fairly quickly once I finally got it though my thick skull where to put stuff.

Bit of work on the UI as well, the main power-up page now has controls for preset load/store, auto calibration, volume and pitch side null, and output volume.  Made the auto cal a knob twist rather than a knob push, as that's easier to do.  Unlike the Theremini, auto cal is instantaneous.

Posted: 8/1/2018 4:08:27 PM

From: Northern NJ, USA

Joined: 2/17/2012

Great site you can get completely ambiently lost in, not sure if I've pointed to it before: https://mynoise.net/noiseMachines.php

Under "Synthetic Noises" the "Sweep Noise" set to "Brown" sounds amazingly angry surf-like.

Posted: 8/2/2018 2:41:10 PM

From: Northern NJ, USA

Joined: 2/17/2012

Next up I'd like to nail down the filtering, which amounts to: 

1. Articulating (volume & pitch axis modulating of) the center frequencies of the formant filters (all 2nd order bandpass).
2. A separate filter for the oscillator source.
3. Optional 4th order filtering (maybe just lowpass?) for the noise and oscillator sources.
4. UI support (knobs/screens) for the above.

I notice I'm using a lot of saturating math (add, subtract, multiply) which can be rather expensive (depending on the operation and the signed/unsigned assignments of the operands: 11/4 cycles max/min, including the subroutine call) and which could conceivably be done in one cycle in the processor hardware.  I don't believe multiplication overflow can be detected early enough in the cycle to conditionally jump, and the jump approach probably wouldn't save that many cycles anyway.  But I would need to shoehorn in a lot of extra opcodes, and employ the final multiplication multiplexer to cover saturation values (0, 1, -1/2, +1/2) as it only covers zero at the moment (for shifts that slide into the abyss) so that could be a timing bottleneck.  Something to think about on vacation.  If I end up doing it that would likely be the final major change to the processor as well.

Posted: 8/3/2018 10:03:54 PM

From: Northern NJ, USA

Joined: 2/17/2012

Off on the family reunion camping trip for a week (Stokes State Forest, NJ - rent an enclosed lean-to in winter and visit High Point next door!).

Y'all take care of each other!  Cheers!

Posted: 8/11/2018 11:10:15 PM

From: Northern NJ, USA

Joined: 2/17/2012

Sisterhood of the Traveling Flags

I'm back!  In the lean-to, with notebook in hand, I had a chance to really think about the limiting [0:2^32) of unsigned results and the saturation [-2^31:2^31) of signed results.  The multiply is the pipeline bottleneck, just not enough time to both evaluate and set the output to the limits (one more stage and I could do it).  And jump testing is not the way to do this quickly.
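For reference, the two ranges can be modeled in C++ by computing the result in 64 bits and clamping afterwards.  This is only a software sketch with made-up function names; it says nothing about how the Hive pipeline does it.

```cpp
#include <cassert>
#include <cstdint>

// Clamp a 64 bit value into [lo, hi].
inline int64_t clamp64(int64_t v, int64_t lo, int64_t hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}

// Limit an extended result to the unsigned range [0, 2^32).
inline uint32_t lim_u32(int64_t wide) {
    return static_cast<uint32_t>(clamp64(wide, 0, 0xFFFFFFFFLL));
}

// Saturate an extended result to the signed range [-2^31, 2^31).
inline int32_t sat_s32(int64_t wide) {
    return static_cast<int32_t>(clamp64(wide, INT32_MIN, INT32_MAX));
}

inline uint32_t add_lim(uint32_t a, uint32_t b) {
    return lim_u32(static_cast<int64_t>(a) + static_cast<int64_t>(b));
}

inline uint32_t sub_lim(uint32_t a, uint32_t b) {
    return lim_u32(static_cast<int64_t>(a) - static_cast<int64_t>(b));
}

inline int32_t add_sat(int32_t a, int32_t b) {
    return sat_s32(static_cast<int64_t>(a) + static_cast<int64_t>(b));
}

inline int32_t mul_sat(int32_t a, int32_t b) {
    // A product of two 32 bit signed values always fits in 64 bits.
    return sat_s32(static_cast<int64_t>(a) * static_cast<int64_t>(b));
}
```

Unsigned times unsigned is left out on purpose: its full product can need all 64 bits as an unsigned value, which is exactly the case that pushes the hardware result to 65 bits.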

I believe one could use the 4 extra unused stack memory bits (each byte of width has an extra bit for parity, error correction, general use, etc.) to hold information about the extended 64 bit result.  So after each (lower, or unextended) add, subtract, or multiply these bits would be assigned various overflow results, and those results may then be used by two special opcodes (LIM, SAT) on the next cycle (or later) to test for out-of-range and do the limiting.  I have to extend the multiply result to 65 bits (+1) to cover the full unsigned and signed ranges and see if that impacts top logic clock speed too much.  

With this scheme one can't have the stacks "spill to / fill from memory" as that is 32 bits wide, but I don't find myself ever doing that as the stacks are way deeper than I really need (32 entries deep each).  Rather like the simple "zero, sign, etc." flags that get tested in a conventional register-based processor, these would be copies that "stick" or "ride along" with the stack data, neatly avoiding the addition of shared core state.  This way the thread can take an interrupt and return, with the flags for that particular stack datum intact.  I only need three bits for this, the fourth could be used for other things I suppose.  Unlike with conventional processors, I can't imagine needing instructions that explicitly manipulate (set, clear) these traveling flags.  If it works out it should be quite clean, and may resolve a couple of other long-standing awkwardnesses.
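One way to picture flags that travel with the stack datum is the C++ sketch below: step one does the ordinary 32 bit operation and records what the extended result looked like, and step two (LIM or SAT) fixes up the data from those flags.  The post only says three bits are needed, not what they encode, so the flag meanings, the names, and the choice to clear the flags afterwards are all assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits describing the extended (wide) result.
enum : uint8_t {
    F_NEG   = 1u << 0,  // extended result is negative
    F_U_OVF = 1u << 1,  // extended result is outside [0, 2^32)
    F_S_OVF = 1u << 2   // extended result is outside [-2^31, 2^31)
};

// One stack entry: 32 bits of data plus flags that ride along with it.
struct StackEntry {
    uint32_t data;
    uint8_t flags;
};

// Step 1 helper: keep the low 32 bits and note what was lost.
inline StackEntry make_entry(int64_t wide) {
    uint8_t f = 0;
    if (wide < 0) f |= F_NEG;
    if (wide < 0 || wide > 0xFFFFFFFFLL) f |= F_U_OVF;
    if (wide < INT32_MIN || wide > INT32_MAX) f |= F_S_OVF;
    return StackEntry{static_cast<uint32_t>(wide), f};
}

inline StackEntry op_add_u(uint32_t a, uint32_t b) {
    return make_entry(static_cast<int64_t>(a) + static_cast<int64_t>(b));
}
inline StackEntry op_sub_u(uint32_t a, uint32_t b) {
    return make_entry(static_cast<int64_t>(a) - static_cast<int64_t>(b));
}
inline StackEntry op_add_s(int32_t a, int32_t b) {
    return make_entry(static_cast<int64_t>(a) + static_cast<int64_t>(b));
}

// Step 2a: unsigned limiting, driven only by the traveling flags.
inline StackEntry op_lim(StackEntry e) {
    assert((e.flags & ~(F_NEG | F_U_OVF | F_S_OVF)) == 0);  // only known bits set
    if (e.flags & F_U_OVF) e.data = (e.flags & F_NEG) ? 0u : 0xFFFFFFFFu;
    e.flags = 0;  // assumed: a limited value is in range again
    return e;
}

// Step 2b: signed saturation, driven only by the traveling flags.
inline StackEntry op_sat(StackEntry e) {
    assert((e.flags & ~(F_NEG | F_U_OVF | F_S_OVF)) == 0);
    if (e.flags & F_S_OVF) e.data = (e.flags & F_NEG) ? 0x80000000u : 0x7FFFFFFFu;
    e.flags = 0;  // assumed: a saturated value is in range again
    return e;
}
```

Since the flags live in the entry rather than in shared core state, nothing needs saving or restoring if something else runs between the two steps.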

It's interesting finding two cycle solutions to opcode functionality, and the individual one cycle steps often end up being handy for other uses.

[EDIT] Worked on the multiplier, adding the extra bit hasn't noticeably slowed it down.  Also combined the logical functions with the miscellaneous functions, and added the limit and saturation opcodes to the ALU.  Still need to shuffle the opcode encode / decode and verify, maybe another couple of days worth of work ahead.  

(Moving on this kinda slow, lots of yard work after the storms last week knocked some dead trees over.  Tried MX Linux for the first time on an older laptop but it's not as good as Mint IMO.  Nothing like an SSD to really speed up old, sluggish laptops.)

[EDIT2] Now working on the sim & assembler C++ code.  Thinking of a struct (flags & data) for the stack entries.  Always rather cumbersome modeling SystemVerilog constructs and timing.  Lots of churn, hoping this will be the final major processor revision.

[EDIT3] Sim is compiling and verification file is passing.  Next up: add tests for the new functionality, then verify both sim and core.

Posted: 8/20/2018 8:19:48 PM

From: Northern NJ, USA

Joined: 2/17/2012

Past Another Hump

Woke up at 1:30 am with an idea for the multiplier and just had to see if it panned out.  Ended up using it in the sim but not the hardware core (>64 bit math is not standard to C++ so I had to generate the 65th multiplication bit manually given the various signed / unsigned input scenarios - because the inputs are 33 bit sign/zero extended, it's basically 0 for U*U, and the 64th result bit for mixed U*S, S*U, S*S).  Spent all day writing verification assembly and tracking down a few bugs.  Sim and core both verify, it's nice having the same thing described in two different languages as it catches a lot of mistakes / bad ideas (though it's a trifle disorienting switching between C++, SV, and Hive assembly).  Just finished recoding the sections of Theremin and library code which aren't supported anymore.  Notably, immediate logical AND, OR, and XOR are gone.  I never used immediate OR or XOR, and only used the immediate AND a few times here and there.  Immediate extended arithmetic operations are also gone due to infrequency of use.  Even though I used it a fair amount, I removed the reverse subtract - it always kinda bugged me, as an odd man out it never really fit in the encoding, and there are various two cycle replacements.  The core SV code compiles and meets timing, but I don't have the courage today to do the FPGA pump.  Best to quit and declare victory while one is ahead!
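The 65th bit rule in the parenthetical above can be written out in standard C++ roughly as follows.  The names (`Prod65`, `mul65`, `extend`) are invented for this sketch and it is not the actual sim code.  The low 64 bits come from an ordinary wrapping multiply of the extended operands; bit 64 is 0 when both operands are unsigned (the product is below 2^64), and a copy of bit 63 otherwise (the product then fits in 64 signed bits, so bit 64 is just sign extension).

```cpp
#include <cassert>
#include <cstdint>

// Low 64 bits of the product plus the manually derived 65th bit (bit 64).
struct Prod65 {
    uint64_t low;
    bool bit64;
};

// Sign or zero extend a 32 bit operand to 64 bits.
inline uint64_t extend(uint32_t v, bool is_signed) {
    return is_signed
        ? static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(v)))
        : static_cast<uint64_t>(v);
}

inline Prod65 mul65(uint32_t a, bool a_signed, uint32_t b, bool b_signed) {
    // Unsigned multiply wraps modulo 2^64, which gives the correct low
    // 64 bits for every signed / unsigned combination.
    uint64_t low = extend(a, a_signed) * extend(b, b_signed);
    if (a_signed && b_signed) {
        // Cross-check S*S against a plain signed multiply (always fits).
        int64_t ref = static_cast<int64_t>(static_cast<int32_t>(a)) *
                      static_cast<int64_t>(static_cast<int32_t>(b));
        assert(low == static_cast<uint64_t>(ref));
    }
    bool any_signed = a_signed || b_signed;
    // U*U: bit 64 is always 0.  Otherwise: bit 64 copies bit 63.
    bool bit64 = any_signed ? ((low >> 63) & 1u) != 0 : false;
    return Prod65{low, bit64};
}
```

The U*U case is the reason the extra bit is needed at all: 0xFFFFFFFF times 0xFFFFFFFF has bit 63 set but is still a positive number, so bit 63 alone cannot tell it apart from a negative signed product.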

With the addition of traveling flags, it feels like the core is in a basically better and more interesting place, though I haven't gone through and replaced all of the limiting / saturating subroutine code with the new and more efficient opcodes yet - will do that once I've got it all back up and running in a stable fashion.  One thing at a time, debugging time seems to increase exponentially with the number of changes applied at once.  Churn like this at the lowest level upsets the whole house of cards, and while I haven't hit a bug yet that I couldn't lick in a day or two, the possibility of that kind of bad luck is always present.  Flirting with disaster is my middle name...

[EDIT] Moron seems to be my middle name today.  Was trying to track down a stack push bug and forgot to do the "Assembler: Generate Programming Files" step in Quartus, the FPGA tool.  So it wasn't at all obvious that the software load wasn't changing with my edits.  The bug was on thread 7, which is the command line thread, so I couldn't upload new SW loads the usual way, so I had to re-pump the FPGA each time and do a non-SPI Flash boot.

In my defense, there are many steps to the SW build process:
1. Fire up the Hive simulator, or do a "cfg" command in it, which translates HAL assembly to MIF.
2. Run "copy.bat" file on my desktop to copy the 4 MIF files to the FPGA Hive source directory.
3. In Quartus do a "Processing | Update Memory Initialization Files".
4. In Quartus do a "Assembler: Generate Programming Files".  (Doh!)
5. In Quartus do a "File | Convert Programming Files..." select the *.cof file (which automates this) and click "Generate".
6. In Quartus open the programmer and run it (also largely automated).

When the command line is working I only have to do step 1, then run TeraTerm and send the generated TTL macro to the FPGA, which programs the SPI and reboots the Theremin.

I have a core version number I can read in the register set via the command line to confirm the SV firmware pump, and I really should have an easily readable software version somewhere as well.

Anyway, it's all back up and running like it was before I started monkeying with things.  Now to clean up the library code a bit.  I can't think of anything else I might want to change or add to my Hive processor, it's feeling kinda done and I'm pretty happy with it.

[EDIT2] Removed all the manually implemented saturation / limit subroutine calls, which has reduced and streamlined several functions.  Having a two step saturation / limit allows you to add, multiply, and especially subtract (which isn't commutative) with the same move flexibility as a 3 operand machine, which is quite nice.

There are real down sides to "kitchen sink"-ing the opcode space.  Leaving out certain ones (like leading zero count, byte flipping, etc.) which take many operations to implement can hurt you in specific scenarios, so you should include those (anything that speeds up floating point operations, or if you have to deal with endianness, should be included if possible).  But ones that you only use now and then, and which can be implemented in two steps with other ops, should probably be removed.  I think people often think of these things in terms of AMD vs. Intel benchmarks, and how a slight performance difference in one or more seems like certain market success / death, but they're shoveling tens of millions of transistors at those benchmarks for very small improvements, and creating something so complex no one fully understands it, so there are bugs, security holes, and assorted gotchas everywhere.  It's kind of crazy, even fairly modest software and processors are enormous shaky state machines running on enormous shaky state machines, and the whole thing is only manageable because of a huge amount of automation (it's enormous shaky state machines all the way down).

And another thing: we all do it I suppose, but isn't it weird to rely on the error reporting of the compiler when modifying your code?  It's a testament to how well error reporting works, though it's easy to miss important editing details.  My assembler halts on the first error it encounters, which generally pinpoints the issue well, though generally not when it's a scoping error (I notice the C++ compiler I'm using also has problems pinpointing scoping errors).  But scoping is much too useful to not implement / employ.  The C++ compiler doesn't halt on first error, but rather insists on finding a huge pile of them before giving up, whereupon I fix the first one and recompile.  Kind of a time waster IMO, particularly since the subsequent cascade of errors tends to be due to the first few.  Though I suppose in certain debugging scenarios it's critically important to examine as much error info as possible.

Posted: 8/22/2018 4:28:26 PM

From: Northern NJ, USA

Joined: 2/17/2012

Whistler's Poor Stepchild

What makes a human whistle?  You might think it's just noise through a resonant bandpass filter, but a second order filter rings too much and lets too much noise sound through.  A simple sine wave is amazingly close.

I recorded my own real whistling and looked at it spectrally.  Surprisingly there is a small amount of harmonic content that increases when the whistle is "pushed" rather like vocal harmonics but a lot less.  So I dialed up the all harmonics phase modulation setting with just a touch of volume axis harmonic modulation.  Adding some pitch tracking bandpass noise makes it both realer and not realer, something I need to work on.  Will be adding 4th order options to the noise and oscillator filtering, so maybe that will lead somewhere.  Anyway, here is a sample of my Theremin whistling: [MP3]

There was a blank space on the prototype main UI page so I added the oscillator octave control, which is something I use a lot.  So the main page has preset load and store, pitch and volume null, auto-calibration, global volume, and oscillator octave controls.
