Let's Design and Build a (mostly) Digital Theremin!

Posted: 2/28/2018 11:47:43 PM

From: Colmar, France

Joined: 12/31/2007

"(...) All of them I kind of get, particularly if I were making fantastic claims and refusing to describe how I did it (i.e. the norm in this field) but I'm not doing that.  And top musicians are often paid to endorse otherwise mediocre products.  I don't have the money nor the inclination to engage in the old razzle-dazzle.  It's been almost 6 years now and I'm fairly proud of how far I've been able to take the prototype, though I suppose pride is a sin for a reason..."

OK, so be it. I could not imagine that you were serious, but obviously you were. I stand corrected. Please accept my apologies for the misunderstanding!

It's perhaps a question of mentality (US American vs French/German), but I, personally, would never publicly judge the quality of my own work or compare it to other more renowned people's creations, be it in the Theremin domain or any other, but would leave that part to people with more wisdom and experience than myself.

Posted: 3/1/2018 1:39:43 AM

From: Northern NJ, USA

Joined: 2/17/2012

Thierry, apology accepted.  But I don't think the issue is cultural.  If I could submit my prototype to an impartial panel of Theremin judges I'd do it tomorrow, but that doesn't exist, so it's a chicken or egg kind of thing where people don't know about something because no one knows to tell them about it, so I suppose it's up to me to start it, but I'm highly suspect as a source of info (my baby).  But companies can talk their crap products up to the moon and no one objects, it's just business as usual. 

Certainly if you stated you'd made a Theremin that "mopped the floor with the EW-Pro" I'd sit up and take notice; your statements would be innocent until proven guilty.  But you're somebody in these circles and I'm nobody.  What's a soul to do?

Posted: 3/1/2018 3:24:31 PM

From: Northern NJ, USA

Joined: 2/17/2012

One Knob Pitch Correction / Note Quantization

You look at consumer vocal processors and have to wonder how much flexibility they give up when dumbing-down the user interface. Power users desire access to key feature parameters, but exposing them can confuse casual / low-information users who want the thing to "just work".  All users are generally happiest when the most useful functionality is "curated" and presented in the most useful manner (power users pretty much want it to "just work" as well, and only want to resort to a deep dive when it doesn't).

I've wondered how they get a single knob for pitch correction, and this morning I've achieved that, with no real loss of functionality or control.  Voila:

The 4th order LPF is gone, and the slew limited LPF has been replaced with a simpler slew limiter.  Going through the circuit:

The unsigned pitch number is tapped off and multiplied by 12 (B), which gives a [0:1) note fraction.  The fraction can be viewed as unsigned or signed, here we treat it as signed.  A logical NOT (C) flips the direction, and fractional multiplication gives the quantization strength (D).  The result is slew-limited, scaled by multiplication with 1/12, then mixed back in (E).

The graphs at the bottom show the result (E) of partial quantization strength, full strength, and no strength.  I've found partial instantaneous quantization to be pretty worthless, so I removed the quantization multiplication.  The pitch correction subroutine now checks the slew rate for a value of 0 and bypasses all processing in that case.  So low and medium slew rate settings do pitch correction, and higher settings do quantization, with time doing the quantization smoothing, rather than an explicit instantaneous smoothing function (this is key).

It seems to work as well as anything else I've tried, though it can be quite subtle, even with the slew rate cranked up.  Vibrato kind of messes with it, though it doesn't seem to mess with vibrato when set for pitch correction.  And it's really simple, with one knob giving the full range of both pitch correction and note quantization effects (including bypass).
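A minimal sketch of the one-knob signal flow described above, in floating-point Python rather than the prototype's own fixed-point code. The class name, the choice of octaves as the pitch unit, and expressing the slew rate in semitones per sample are all assumptions made for illustration.

```python
import math

class PitchCorrector:
    """One-knob pitch correction: slew_rate 0 bypasses, larger values quantize harder."""

    def __init__(self, slew_rate):
        self.slew_rate = slew_rate   # assumed unit: semitones per sample
        self.correction = 0.0        # slew limiter state, in semitones

    def process(self, pitch_octaves):
        if self.slew_rate == 0:
            return pitch_octaves                 # bypass all processing
        note = pitch_octaves * 12.0              # (B) scale octaves to semitones
        frac = note - math.floor(note)           # [0, 1) note fraction
        signed = frac - 1.0 if frac >= 0.5 else frac   # view as signed [-0.5, 0.5)
        target = -signed                         # (C) flip the direction
        step = target - self.correction
        step = max(-self.slew_rate, min(self.slew_rate, step))  # slew limit
        self.correction += step
        return pitch_octaves + self.correction / 12.0  # (E) scale by 1/12, mix in
```

With a large slew rate the output snaps to the nearest semitone on the first sample; with a small one it glides there over several samples, which is the "time does the smoothing" behavior described above.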


While pondering the above, I thought of a new interesting construct, but it's currently a solution in search of a problem.  The construct is a modulo filter or slew limiter.  Say the I/O is 3 bit unsigned and it is steady state with an input of 2.  If the input suddenly changes to 5 then the output filters / slews over time 2, 3, 4, 5.  But steady state from 2 to sudden 7 produces an output of 2, 1, 0, 7.  So if the I/O difference is greater than 1/2 full range, then the output filters / slews in the opposite direction.  My subconscious is telling my conscious mind that this is applicable to pitch correction, but my conscious mind can't see a way to it.
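The modulo slew limiter can be sketched in a few lines of Python. The function names and the `max_step` parameter are hypothetical; the rule itself (take the short way around when the difference exceeds half of full range) is the one stated above.

```python
def modulo_slew_step(current, target, bits=3, max_step=1):
    """Move current one slew step toward target, taking the shorter way around."""
    full = 1 << bits                       # full range, e.g. 8 for 3 bits
    diff = (target - current) % full       # unsigned distance going upward
    if diff > full // 2:                   # more than 1/2 full range:
        diff -= full                       # go the opposite direction instead
    step = max(-max_step, min(max_step, diff))
    return (current + step) % full

def modulo_slew(start, target, bits=3, max_step=1):
    """Collect the output sequence from a steady state until the target is reached."""
    out = [start]
    while out[-1] != target:
        out.append(modulo_slew_step(out[-1], target, bits, max_step))
    return out

print(modulo_slew(2, 5))   # [2, 3, 4, 5]
print(modulo_slew(2, 7))   # [2, 1, 0, 7]
```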

Posted: 3/2/2018 2:20:23 PM

From: Northern NJ, USA

Joined: 2/17/2012

PM Experiment

A TW thread from a couple of years ago takes you to this very nice article: http://www.channelroadamps.com/articles/theremin/.  The tone is fairly Clara-ish, and if you save the MP3 and examine it, the waveform is as described in the article: a rectified sine with a hollowed-out section, possessing mostly all harmonics and with a secondary slump in the spectra:

Though there isn't any explicit coupling in that design, coupling makes me think about phase distortion, which made me wonder what would happen if I gave the phase modulation signal feeding the sine function in my glottal oscillator a fixed phase offset.  I did so this morning and the results, which look rather dramatic in the waveform department, are sonically fairly lackluster.  Adjusting the phase offset can produce something that looks like a rectified sine, but sounds almost exactly like the glottal waveform with the harmonics turned down to 1/2.  Same for the odd harmonic settings, this method can give you an almost perfect triangle wave if you want that, but it doesn't fundamentally change the way it sounds other than reducing the overall harmonic level.  The phase offset doesn't seem to make aliasing any better or worse, and it tends to introduce a DC offset.  So, overall, meh.

This is roughly analogous to the ear being deaf to phase, but somewhat more complex and surprising.

The design in that article says the mixing product is almost a sinewave, and diodes are used to rectify it.  After that a high pass filter (0.001uF & 1Meg = 160Hz) provides the scoop-out.  I can get pretty much the same waveform by setting my oscillator filter to high pass, Q=1, frequency=800Hz:

I'm not sure why the cutoff frequency needs to be higher, and there seems to be a secondary spectral fall-off around 1kHz that I can't rationalize via his schematic, but I assume it is due to the natural roll-off of the large diameter vintage speaker.  I have a feeling a lot of the character of the sound originates from ambient miking of the amplifier & speaker in the cabinet.  The cabinet, speaker, and microphone modeling of a good guitar multi-effects pedal board might come in handy here (set to a small 10" or 12" open-backed cabinet), as might the milder distortions (and reverbs!) to be found in there.
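For reference, the 160Hz figure quoted for the article's high pass follows from the usual first order RC corner formula, f = 1 / (2*pi*R*C). A quick check in Python (the function name is just for illustration):

```python
import math

def rc_cutoff_hz(resistance_ohms, capacitance_farads):
    """-3dB corner frequency of a first order RC filter."""
    return 1.0 / (2.0 * math.pi * resistance_ohms * capacitance_farads)

# 1Meg and 0.001uF, as quoted from the article's schematic
print(round(rc_cutoff_hz(1e6, 0.001e-6), 1))   # 159.2
```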

I believe FredM once stated that he found a full-wave rectified sine to be a fairly ideal glottal source, and this experiment has really driven that home to me.

Posted: 3/7/2018 8:28:22 PM

From: Northern NJ, USA

Joined: 2/17/2012


Did a final re-tooling of the parameter subroutines and got the preset system working today.  I have a PRESET screen that has two functioning knobs, load and stor.  As usual, spinning the load knob loads various presets.  To store a preset, set the stor knob to the preset slot you want to write to and press the knob.  This dual selection facilitates copying and moving presets around in the preset slot space.

Preset slot 0 is reserved for global system parameters.  Having built the thing I know which are which, but I suppose this could be a hazy area for those getting newly acquainted.  Thinking maybe of a mechanism of storing individual parameters in a preset - whether global or not - by pressing the associated encoder on the given page, which would get around a lot of the need to have intimate knowledge of the inner workings.  We'll see, but at least now I have a way of holding onto presets and quickly selecting among them.


Had a power outage for a few days due to the previous nor'easter.  Locally it was mostly just our block, though many thousands of poor souls in NJ, PA, NY, etc. are still waiting, and my sympathy goes out to them as it must be particularly bad with the snow dump today. We hear about "rolling blackouts" and think they must be hell, but I'd take those any day over a continuous multi-day blackout, where you have to figure out what to do with all the food in the fridge, and the email backs up.  My $28USD Baofeng UV-82 was super handy, picking up the weather bands, the local fire company, and FM broadcast radio, going for many hours with hardly a dent in the battery charge.  Dug out my old Sony portable CD player to get my music fix.  Weird how quickly one reverts to "farmer's hours" when there's no artificial lighting.

I used the down time to peruse the DSP books in my library and revisit several key papers I had printed out.  The current setup I have for pitch and volume operating point capture could probably be reduced from second order CIC to first order, which would simplify and save some FPGA fabric (though I'm not really strapped in that department).  However, increasing the decimation by a factor of 16 (196.66MHz => 3kHz) could save 16x the memory in the CIC hum filters, though this would require an interpolation filter to get the operating points back into the 48kHz PCM clock domain.  I know the CIC form can do this as well, though I haven't really looked into it much.  One thread running at the 3kHz interrupt rate could easily handle much of the volume and pitch processing. This is known as multi-rate filtering and I have a new appreciation for the concept.  Getting ~1/2 of the main memory back would let me breathe a bit easier, and would allow for things like audio delays and cabinet sims and stuff.  In many ways operating point processing isn't as critical as audio processing.  [EDIT] The more I look at it, the more it seems I have to stick with second order CIC.  First order doesn't give nearly enough alias rejection.

[EDIT2] What's confusing me about CIC decimation is the apparent ability to trade aliasing for bandwidth after downsampling. This contradicts the general admonition that aliasing is generated by downsampling, and that it must be controlled via low pass filtering before any downsampling occurs.
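For readers following the CIC discussion, here is a minimal sketch of a textbook CIC decimator with differential delay 1: `order` integrators at the input rate, a downsampler, then `order` combs at the output rate. It is not the FPGA implementation; the function name and parameters are assumptions, and Python integers don't wrap the way fixed-width hardware accumulators do.

```python
def cic_decimate(samples, ratio, order=2):
    """Decimate integer samples by ratio using an order-stage CIC filter."""
    integrators = [0] * order      # running sums at the input rate
    comb_prev = [0] * order        # one-sample delays at the output rate
    out = []
    for n, x in enumerate(samples, start=1):
        acc = x
        for i in range(order):     # cascade of integrators
            integrators[i] += acc
            acc = integrators[i]
        if n % ratio == 0:         # keep every ratio-th sample
            y = acc
            for i in range(order): # cascade of combs: y[n] - y[n-1]
                y, comb_prev[i] = y - comb_prev[i], y
            out.append(y)
    return out

# DC gain is ratio ** order, so a constant 1 settles to 4 ** 2 = 16
print(cic_decimate([1] * 16, ratio=4, order=2))   # [10, 16, 16, 16]
```

Dividing the output by `ratio ** order` normalizes the gain to unity; dropping `order` to 1 gives the first order variant considered above.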

Posted: 3/10/2018 12:17:52 AM

From: Northern NJ, USA

Joined: 2/17/2012

Obligatory "Clara's Voice" Post

For analysis purposes I went through my two CDs of Clara Rockmore and extracted the portions that featured just her Theremin playing sans accompaniment: MP3 link.  There are 15 samples, the first 10 are from "The Art of the Theremin" and the last 5 are from "Clara Rockmore's Lost Theremin Album", and they are all separated by 1 second of silence.

The first CD is much "drier" and unprocessed sounding.  The second sounds fairly heavily processed (e.g. the piano sounds very compressed).

Regarding the subjective tone in general, the upper registers sound quite female vocal, the mid registers are buzzy, the bass registers super strange sounding. Here and there it almost sounds like there is damage to the speaker cone or something similarly non-linear going on.  Around 300Hz the tone (to me) starts to get objectionable, and I find 200Hz and below to be unpleasant.

If you look at the samples with an 8k FFT, for the mid and higher notes you'll see a dip around 2kHz, and a sharp drop around 4kHz with no real output beyond 4.5kHz or so.  The dip disappears for lower notes, and the 4th harmonic also is suppressed, and I suspect oscillator coupling is happening here.  The 2kHz dip is only present in the samples from "The Art of the Theremin".

One has to be very careful when analyzing this stuff for clues as to what is going on electrically to generate the timbres.  The main confounding thing is the recording is almost certainly of her open backed "monitor" speaker, which will roll off the bass and comb filter the midrange.  The response of larger drivers tends to drop like a rock (~4th order LPF) in the highs, but not before breaking up, resonating, and doing other complex things.  And of course there are room reflections and other resonances.  So, just as an electric guitar sounds utterly different when listened to straight into a mixing board vs. open back amplifier miked, something very similar is going on with the Theremin sound on these CDs.

I hacked together a very rudimentary speaker cabinet simulator using a delay (back to front baffle travel time) and small CIC (to average the various travel distances) with destructive feedforward summation (subtraction) and attempted to get something similar to Clara's sound.  I'll do a post on that soon.
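One possible reading of that structure, sketched in Python: the direct signal minus a delayed moving average of itself (a first order CIC is just a boxcar average). The function name, the parameter names, and the unity mixing gains are assumptions, not the values used in the prototype.

```python
from collections import deque

def cabinet_sim(samples, delay, avg_len):
    """y[n] = x[n] - mean(x[n-delay] ... x[n-delay-avg_len+1])."""
    size = delay + avg_len
    history = deque([0.0] * size, maxlen=size)   # history[k] holds x[n-k]
    out = []
    for x in samples:
        history.appendleft(x)                    # oldest sample falls off the end
        # back wave: delayed by the baffle travel time, smeared by the average
        back = sum(history[k] for k in range(delay, size)) / avg_len
        out.append(x - back)                     # destructive feedforward sum
    return out

# Impulse response: direct hit, then the smeared, inverted back wave
print(cabinet_sim([1.0, 0, 0, 0, 0, 0, 0], delay=3, avg_len=2))
```

A constant input cancels completely once the delay line fills, which corresponds to the bass roll-off of an open backed cabinet, while the delay produces comb filtering further up.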

Posted: 3/12/2018 11:43:25 AM

From: Scotland

Joined: 9/27/2012


Whilst I don't understand much of what you post in this thread (analogue I can almost get my head around, but digital...), I do read it and try to follow it.

Most interesting stuff on the Clara Rockmore sound extracts though :-)

(Just commenting so that you know people are reading your posts and that you are not posting into a vacuum)



Posted: 3/12/2018 2:34:23 PM

From: Northern NJ, USA

Joined: 2/17/2012

Obligatory "Clara's Cat's Voice" Post

Thanks Roy!  Posting into a vacuum is OK, though some back and forth is always quite welcome.  I worry about readers getting tired of it all, but I'll push on as long as there are developments.

Hey, while looking for Clara's Theremin voice I stumbled across Clara's cat's voice! ;-)  MP3 link.  Glottal (~ramp) waveform feeding a second order tracking bandpass filter with moderate Q and offset a couple of octaves above the fundamental.  This flattens the first 4 harmonic amplitudes.  The smooth expressiveness of the Theremin lends much of the realism to otherwise fairly basic timbres.

About Clara's Theremin voice: it's pretty easy to get female type vocal sounds at the upper range because there are so few harmonics and formants up there.  The key goal I think is to make the range below this sound like the violin family.

Something I'm currently hung up on is a more generalized audio synth signal and processing path. Trying to minimize the page & knob count while maximizing the versatility.  I believe in the near term it will end up something like this: 

  oscillator & noise (with filter) => fixed/variable 1st/2nd order filters (series) => fixed/variable formant bank (parallel)

Posted: 3/12/2018 11:14:56 PM

From: Germany

Joined: 8/30/2014

Haha, cat! Some time ago I was looking for dog vocal tract formants and, lo and behold, I found a paper about it which lists, alas only 2, formants vs. dog breed in a table.
I was not yet able to make notable use of that info, as I don't have a setup to produce the distorted, "violent" kind of vocalization dogs have, and I'm not sure I would put a huge amount of effort into it, but it would be funny ;-)

Posted: 3/14/2018 12:24:58 PM

From: Northern NJ, USA

Joined: 2/17/2012

"Some time ago I was looking for dog vocal tract formants and, lo and behold, I found a paper about it which lists, alas only 2, formants vs. dog breed in a table."  - tinkeringdude

Ha!  Animal formants aren't something I would have even thought of.  I'm not sure how dogs do the barking thing, but a lot of audio stuff, when the process is sped up, takes on an entirely different character.  So maybe it's something simple happening quickly?

Is this (link) the paper?  This (link) chapter is good too, barks don't involve the nasal resonances as that is closed off.


In an effort to modularize the code, I stuck a first order filter (low-pass, high-pass) together with a second order state variable filter (low-pass, band-pass, high-pass, notch) and made the first order operating modes negative values of a parameter, with zero a pass-through. Homing in on a single versatile filter "blob" that can do whatever I need it to do.
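A rough Python sketch of such a combined filter, using a one-pole first order section and the Chamberlin state variable form for the second order section. The class name, the specific mode numbers, and the coefficient formulas are assumptions for illustration; only the sign convention (negative = first order, zero = pass-through, positive = second order) comes from the description above.

```python
import math

class FilterBlob:
    """mode < 0: first order, mode == 0: pass-through, mode > 0: second order SVF."""
    FIRST_HP, FIRST_LP, BYPASS = -2, -1, 0          # hypothetical numbering
    SVF_LP, SVF_BP, SVF_HP, SVF_NOTCH = 1, 2, 3, 4  # hypothetical numbering

    def __init__(self, mode, freq_hz, q=0.707, fs=48000.0):
        self.mode = mode
        self.g = 1.0 - math.exp(-2.0 * math.pi * freq_hz / fs)  # one-pole coeff
        self.f = 2.0 * math.sin(math.pi * freq_hz / fs)         # SVF tuning coeff
        self.damp = 1.0 / q                                     # SVF damping
        self.lp1 = 0.0      # first order state
        self.low = 0.0      # SVF low-pass state
        self.band = 0.0     # SVF band-pass state

    def process(self, x):
        if self.mode == self.BYPASS:
            return x
        if self.mode < 0:                       # first order section
            self.lp1 += self.g * (x - self.lp1)
            return self.lp1 if self.mode == self.FIRST_LP else x - self.lp1
        self.low += self.f * self.band          # second order SVF section
        high = x - self.low - self.damp * self.band
        self.band += self.f * high
        outputs = {self.SVF_LP: self.low, self.SVF_BP: self.band,
                   self.SVF_HP: high, self.SVF_NOTCH: high + self.low}
        return outputs[self.mode]
```

The untuned `2*sin(...)` coefficient here is exactly the part that drifts sharp toward Nyquist, so a real implementation would substitute a corrected frequency mapping.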

Digital filters tend to have more and more tuning error the closer you get to Nyquist (1/2 the sampling frequency) - the second order form goes a bit sharp, while the first order goes really sharp.  Chamberlin gives formulas for correcting them, but in the end it comes down to polynomial approximation.  

After dumbly staring at the chain of transformations I had in front of the second order frequency for too long I decided to trash it and go with a single polynomial that does it all, which simplifies and speeds things up. A simple polynomial that gives almost 16 bit precision is 0.547946x - 0.0271x^3 and this takes care of the maximum frequency being C9 (8372Hz) for full-scale input as well. 
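A small Python sketch for experimenting with that polynomial. The linear coefficient 0.547946 equals pi * 8372 / 48000, which suggests (this is an assumption, the post does not say so) that the input is a linear frequency normalized to C9 and the target is sin(pi * f / fs) at fs = 48kHz, i.e. the quantity behind the Chamberlin tuning coefficient 2*sin(pi * f / fs). The exact target and input scaling used in the prototype are not given, so the sketch only reports the deviation under that assumption.

```python
import math

FS = 48000.0      # assumed sample rate (the 48kHz PCM domain)
F_MAX = 8372.0    # C9, full-scale input

def tuning_poly(x):
    """The polynomial from the post, in Horner form; x in [0, 1]."""
    return x * (0.547946 - 0.0271 * x * x)

def tuning_ref(x):
    """Assumed reference: sin(pi * f / fs) with f = x * F_MAX."""
    return math.sin(math.pi * x * F_MAX / FS)

# Worst-case deviation over a sweep of the input range
worst = max(abs(tuning_poly(i / 1000) - tuning_ref(i / 1000)) for i in range(1001))
print(f"linear term check: {math.pi * F_MAX / FS:.6f}")
print(f"worst deviation under this assumption: {worst:.2e}")
```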

For the first order filter I found the polynomial x - (x^3)/3 to give 8% or so error max (and on the low end), which sounds like a lot but there is no peaking or resonance that might reveal the exact frequency it's set to, so tuning isn't nearly as critical.  We mainly just want to tame the super sharp high end, and ideally the max error would be located there, but even if you wanted to get surgically precise, the high-pass cutoff diverges from the low pass cutoff at higher frequencies, and the high pass response oddly gains up somewhat as well.  They have their uses, but first order filters are mushy sorts of affairs, and it doesn't pay to go too crazy on them.

For just about anything that really matters in music synthesis (pitch, waveform fidelity, noise floor, etc.) I've found 16 bit precision to be a reasonable, rough-and-ready target.
