Let's Design and Build a (mostly) Digital Theremin!

Posted: 12/25/2017 10:20:29 PM 1331

From: Northern NJ, USA

Joined: 2/17/2012

"Can you feed a familiar audio signal through your volume control circuit to demonstrate response and quality of signal pass through?" - Christopher

Yes, I'll do something like that soon. Was playing today with high-pass filtering added to the volume control signal, which gives some attack envelope on fast hand moves. Just got it working a few minutes ago and need to experiment with it more (I love working for myself, I get to pick the 80 hours I wish to work each week).

"I do this in my sample which also demonstrates another feature I achieved by using my method of PWM. The outer volume field is more responsive/aggressive than the inner near the loop. This allows for a very gentle, yet wider, soft control window near the loop before off."

Yes, I need to experiment with non-linear responses more, though the linearized response with hard limits seems pretty OK to me.

"You get a freaky daily visitor count on your thread."

So I've noticed!

"I like to think it is several competing manufacturers waiting for a moment to buy out your research with a perfect offer you cannot refuse. That end packaging scares me for you."

For a Theremin? I don't think anyone outside of a few tens of people in the world could care all that much, there's no money in even the best Theremin. Though a fantastic performing digital Theremin could probably be manufactured for <$200 (but good luck getting it through emissions testing).

"Ran a little test with the little girl pic: this thread is being heavily cataloged in the search engines (Google). Most hits are landing on other pages, not the most recent. A visit is still a visit. Would be nice if you got a dollar for every hit; they all see that nice theremin ad at the top of the page."

I'd take a penny for every hit! I'd be interested in your data if you want to post it here or send it to me via email. I've no clue what's causing the hits, but this thread has been going for what seems like forever and I just figured it was a mix of bots and people who check in now and then.

Posted: 12/25/2017 10:47:58 PM 1332

oldtemecula

From: 60 Miles North of San Diego, CA

Joined: 10/1/2014


Lake Hiawatha is an unincorporated community located within Parsippany-Troy Hills in Morris County

Are you on vacation or live like Daniel Boone?

Christopher

Posted: 12/25/2017 11:25:11 PM 1333

dewster

From: Northern NJ, USA

Joined: 2/17/2012


I live in Boonton, NJ, and Parsippany is quite close, both in Morris county. It's not exactly a wilderness, but northern NJ can be a bit more lush, nature-wise, than one might think. So what's your hit data showing?

Posted: 12/26/2017 5:00:14 PM 1334

dewster

From: Northern NJ, USA

Joined: 2/17/2012


LPC Residual

Dug an old microphone out of my PC junk box. It wasn't working due to an internal cable break, so while I was fixing that I swapped the mic cartridge with a known high quality omni-directional one I bought several years back.

So I'm recording my voice and looking at it in Audition 3.0. The formants are fairly clear in the spectral view, but in the waveform view the vocal tract resonances completely obscure the shape of the generating glottal waveform.

It seems vocal synthesis researchers all run into this at some point: how to separate the source from the filters? LPC (Linear Predictive Coding) is one way they've solved it in the past. If I understand it correctly, LPC is a time-domain method of predicting the next sample value in a digital audio stream as a weighted sum of the previous samples. The weights (coefficients) are chosen to minimize the squared prediction error, commonly by solving the autocorrelation equations with the Levinson-Durbin recursion, which builds the predictor up one order at a time. Essentially, it's a trainable all-pole filter that ends up filtering like the thing it's trained on, in this case the human vocal tract. Running the signal through the inverse of that filter leaves the error - the residual - which approximates the glottal excitation. It's often replaced with a simple pulse train, which gives you the robotic, buzzy-sounding Speak & Spell type voice.
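A minimal Python sketch of the textbook autocorrelation-method LPC, for anyone who wants to try the idea above. `autocorr`, `levinson_durbin`, and `lpc_residual` are illustrative names, not from any particular library. It assumes a single block of samples with no windowing and a low order; a real speech analysis would work on short windowed frames with a higher order.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of the sample list x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients so that x[n] is predicted as
    sum(coeffs[k] * x[n - 1 - k]). Returns (coeffs, final error power)."""
    a = [0.0] * (order + 1)  # a[0] unused; a[j] weights x[n - j]
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this order
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction error power shrinks as order grows
    return a[1:], err


def lpc_residual(x, coeffs):
    """Inverse-filter x: subtract the linear prediction from each sample."""
    p = len(coeffs)
    out = []
    for n in range(len(x)):
        pred = sum(coeffs[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        out.append(x[n] - pred)
    return out


if __name__ == "__main__":
    # Toy "vocal tract": a known two-pole resonator driven by white noise
    random.seed(1)
    x = [0.0, 0.0]
    for _ in range(4000):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
    coeffs, err = levinson_durbin(autocorr(x, 2), 2)
    print(coeffs)  # should land near [1.3, -0.6]
```

The toy example uses a known resonator so the recovered coefficients can be checked against the true ones; on a voice recording, the output of `lpc_residual` is the signal that roughly approximates the glottal excitation.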

For the glottal generator I'd really like to have a physical approximation of the turbulent noise source that it really is. Though I've noticed the mutated sine (per quadrant inverse squaring) isn't all that bad, and when some noise gets non-linearly added to it (accidentally in my experiments) it can sound quite realistic.

==========

Yesterday and today I experimented with high-pass filtering of the volume side signal. Mixing it back in, I can get faster attacks and decays, and by taking the absolute value I can "bow the air" (move my hand closer & farther) to make sound. Not sure what to make of any of this yet; the audible effect isn't nearly as prominent as I expected it to be.
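A rough sketch of the two modes described above, assuming the volume side produces a normalized gain in [0, 1] at some fixed control rate. `OnePoleHighPass`, `shape_volume`, and their parameters are hypothetical, and the actual prototype's filter may well differ.

```python
import math


class OnePoleHighPass:
    """One-pole high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""

    def __init__(self, cutoff_hz, rate_hz):
        self.a = math.exp(-2.0 * math.pi * cutoff_hz / rate_hz)
        self.prev_x = 0.0
        self.prev_y = 0.0

    def process(self, x):
        y = self.a * (self.prev_y + x - self.prev_x)
        self.prev_x = x
        self.prev_y = y
        return y


def shape_volume(vol, hp, mix, bow_gain, bow):
    """Combine the volume signal with its high-passed copy.
    bow=False: mix the transient back in for snappier attacks/decays.
    bow=True: use only the magnitude of the transient, so a still hand
    is silent and moving it closer or farther makes sound."""
    out = bow_gain * abs(hp) if bow else vol + mix * hp
    return min(max(out, 0.0), 1.0)  # keep the gain in [0, 1]
```

A steady hand produces a high-pass output that decays toward zero, which is why the "bow" mode needs motion to make sound.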

Posted: 12/28/2017 3:27:23 AM 1335

dewster

From: Northern NJ, USA

Joined: 2/17/2012


Glottal Papers

Found several good papers on parameterized glottal simulation by Gunnar Fant et al. 10.1.1.643.5278.pdf ("A four-parameter model of glottal flow" - 1985) covers the early basics, and 1995_36_2-3_119-156.pdf ("The LF-model revisited" - 1995) revisits and regroups the parameters (you can find those and more here: http://www.speech.kth.se/publications/show_by_author.php?author=Fant). I really (really!) love spending time reading papers about things I'm burning to understand.

Their glottal waveform can be obtained by stimulating a state-variable filter tuned somewhat above the fundamental frequency and stomping on the wave via heavy damping somewhere mid-cycle. Played with this in Excel for several hours today but am not ready to code it up yet. A strictly math-based approach might be easier, but it pulls one away from the physics and is expensively iterative with transcendental functions. So the idea of a resonator rather than a formula as the glottal source appeals to me more. The sharp discontinuity produced either way will alias like a mofo, so I need to spend more time looking at it all. The waveform itself is the derivative of the pressure wave, which is just a somewhat mutated raised cosine. Generating this and differentiating it in real-time might be easier than directly generating the results of differentiation (though better minds than mine have studied this for decades). The needs for a single vowel singing voice are a lot more relaxed than for a fully dynamic talking vocal sim.
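As an illustration of the "generate it, then differentiate it" option, here is a sketch using the Rosenberg glottal pulse (a raised-cosine opening followed by a quarter-cosine closing) as a stand-in for the mutated raised cosine. It is not the LF model from the papers, and the `open_frac` / `close_frac` defaults are arbitrary assumptions.

```python
import math


def rosenberg_flow(phase, open_frac=0.4, close_frac=0.16):
    """Rosenberg-style glottal flow for phase in [0, 1): a raised-cosine
    opening, a quarter-cosine closing, then a closed phase at zero."""
    if phase < open_frac:
        return 0.5 * (1.0 - math.cos(math.pi * phase / open_frac))
    t = phase - open_frac
    if t < close_frac:
        return math.cos(0.5 * math.pi * t / close_frac)
    return 0.0


class GlottalDerivative:
    """Generate the flow, then differentiate it with a first difference."""

    def __init__(self, f0_hz, rate_hz, open_frac=0.4, close_frac=0.16):
        self.inc = f0_hz / rate_hz
        self.open_frac = open_frac
        self.close_frac = close_frac
        self.phase = 0.0
        self.prev = 0.0

    def step(self):
        flow = rosenberg_flow(self.phase, self.open_frac, self.close_frac)
        self.phase = (self.phase + self.inc) % 1.0
        d = flow - self.prev  # first difference, proportional to d(flow)/dt
        self.prev = flow
        return d
```

The flow ends its closing phase with a nonzero slope, so the differentiated output drops abruptly to zero at closure. That is the sharp discontinuity that aliases, and this naive version does nothing to band-limit it.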

For the filter stimulation, I tried sine, triangle, and square with band-pass output, and pulse with low-pass output. For the heinous levels of damping it seems one needs to damp both integrators simultaneously. But too much damping can easily lead to instability, what to do...
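One way to read "damp both integrators" is to multiply both state variables of a Chamberlin state-variable filter by a leak factor below 1 during the damped part of the cycle. The sketch below assumes that interpretation; `DampedSVF`, `glottal_cycle`, and the single-impulse excitation are hypothetical choices for illustration.

```python
import math


class DampedSVF:
    """Chamberlin state-variable filter with an extra leak applied to
    both integrator states (leak = 1.0 means no extra damping)."""

    def __init__(self, fc_hz, rate_hz, q_damp):
        self.f = 2.0 * math.sin(math.pi * fc_hz / rate_hz)
        self.q = q_damp  # damping term, 1/Q
        # With zero input the loop's state matrix has determinant 1 - f*q
        # and trace 2 - f*f - f*q, which is stable only while
        # f*f + 2*f*q < 4; pushing q past that makes it blow up.
        if self.f * self.f + 2.0 * self.f * self.q >= 4.0:
            raise ValueError("q_damp too large for this tuning: unstable")
        self.low = 0.0
        self.band = 0.0

    def step(self, x, leak=1.0):
        self.low += self.f * self.band
        high = x - self.low - self.q * self.band
        self.band += self.f * high
        # Scaling both states by leak < 1 shrinks the loop's eigenvalues
        # by the same factor, so it damps hard without going unstable.
        self.low *= leak
        self.band *= leak
        return self.low, self.band, high


def glottal_cycle(svf, period, damp_start, leak):
    """Hypothetical excitation scheme: one impulse at the start of the
    cycle, free ringing, then heavy damping from damp_start onward."""
    out = []
    for n in range(period):
        x = 1.0 if n == 0 else 0.0
        _, band, _ = svf.step(x, 1.0 if n < damp_start else leak)
        out.append(band)
    return out
```

The leak only helps if the undamped filter is already stable; it cannot rescue a loop whose q term has been pushed past the limit checked in the constructor.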

Posted: 12/28/2017 3:14:19 PM 1336

tinkeringdude

From: Germany

Joined: 8/30/2014


Hah! I'm even more glad now having subscribed to this thread :-D
What a coincidence. The past few weeks, I have been looking at / tinkering with putting together a (by today's standard) primitive speech synthesizer.
As a crude glottal waveform, I just used a band-limited sawtooth, which I expected to sound somewhat more bowed-string-like than voice, but perhaps close enough.
My goal actually was to get a "robotic" sounding voice, and setting the waveform to a very low freq like 65 Hz (C2), vowel formants I got from tables floating around the net for the male voice do produce an intelligible robot-ish sounding voice. (Using pulses followed by an LPF sounds rather horrible. Is that what a "pulse train" is? Just a "low duty" pulse wave?)
At 110 Hz it's still intelligible but sounds less robotic. At 220 Hz it is less intelligible, and using female formants makes it worse, not better - but it could very well be because the portion of my oscillator implementation which I wasn't too lazy to translate from my C++ synth to my C# vocal experimenting sandbox project is currently no longer band-limited at that higher freq and there might be enough aliasing to muddy things up ;-)

I'll look at the very interesting papers you mentioned, although the math is probably above my math skill set (which is rather limited, hence my tendency to use such crude methods and be somewhat happy if I get something working decently ;-) ).

Btw, there is a really nice open source program called PRAAT (Dutch for "talk"), made by phonetics professors, for students.
It is really good at tracking & visualizing formants in the spectrum of a recording (you can make short recordings from within the program and then do stuff with them, very handy, or load wav files), among other things. UI may be a bit funny, but not too bad.

EDIT #2:
Here's a quick howto: Scroll down 1/2 to see a screenshot of formant tracking (red dots)

EDIT:
Btw.:
Though a fantastic performing digital Theremin could probably be manufactured for <$200 (but good luck getting it through emissions testing).
(emphasis: mine)

Ha! I have wondered about that. Is that the reason why even Etherwave is sold as a kit? (at least where I have seen it)

Posted: 12/28/2017 4:39:15 PM 1337

dewster

From: Northern NJ, USA

Joined: 2/17/2012


Tinkeringdude, until I find something that works better I'm using a two-quadrant, repeatedly squared sine wave as a "poor man's" glottal source. It's fairly band-limited in that it doesn't noticeably alias in the normal speaking and singing registers. It gives a sort of rounded sawtooth with variable harmonic content by mutating the falling edge of a sine wave (making it "snappier", or more vertical in transition).

To form it with floats:

1. Make a sawtooth NCO by adding a phase increment (at every cycle) to a modulo accumulator.

2. Feed this to a standard sine wave function which takes [0:2*pi] range as input and gives [-1:1] range as output.

3. Make a flag that tells you if the sine wave is currently in the middle two quadrants (where it goes from 1 to -1).

4. Make a flag that tells you if the sine wave is currently negative.

5. Take the absolute value of the sine wave to give an output range of [0:1].

6. Multiply this by -1 and add +1 to flip it, to again give an output range of [0:1].

7. If the quadrant flag is true, repeatedly square the value (3 or 4 times). This again gives an output range of [0:1].

8. Multiply this by -1 and add +1 to undo the flip in step 6.

9. If the negative flag is true multiply by -1 to undo the absolute value in step 5.

It's very simple (and even simpler when doing it with 32 bit integers). If you want a poor-man's band limited square wave just set the quadrant flag to always true, which will process all quadrants the same (I implemented this with a negative parameter, where the positive parameter value only does the middle two quadrants as described above, and the absolute value of the parameter sets the number of squarings, 0 giving a sine wave). Not doing the flip / unflip gives a different "blip" type of waveform, but it doesn't have sufficient harmonics for vocal use.
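The nine steps above translate almost line for line into floating point. This Python sketch follows them, including the signed-parameter convention (positive: middle two quadrants only, negative: all quadrants, 0: plain sine). `mutated_sine` and `MutatedSineOsc` are illustrative names, and the 32-bit integer version is not attempted here.

```python
import math

TWO_PI = 2.0 * math.pi


def mutated_sine(phase, squarings):
    """Steps 2-9 for one phase value in [0, 2*pi).
    squarings > 0: mutate only the middle two quadrants (rounded saw).
    squarings < 0: mutate all quadrants (rounded square).
    squarings == 0: plain sine."""
    s = math.sin(phase)                                      # step 2
    if squarings < 0:
        in_middle = True                                     # all quadrants
    else:
        in_middle = 0.5 * math.pi <= phase < 1.5 * math.pi   # step 3
    negative = s < 0.0                                       # step 4
    v = 1.0 - abs(s)                                         # steps 5 and 6
    if in_middle:
        for _ in range(abs(squarings)):                      # step 7
            v *= v
    v = 1.0 - v                                              # step 8
    return -v if negative else v                             # step 9


class MutatedSineOsc:
    def __init__(self, freq_hz, rate_hz, squarings=3):
        self.inc = TWO_PI * freq_hz / rate_hz
        self.phase = 0.0
        self.squarings = squarings

    def step(self):
        out = mutated_sine(self.phase, self.squarings)
        self.phase = (self.phase + self.inc) % TWO_PI        # step 1: modulo NCO
        return out
```

Each squaring pushes the flipped value toward zero everywhere except near the zero crossing, so after unflipping, the falling edge hugs +1 or -1 longer and then snaps across.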

=========

For the formants, I've found you can pretty much set a common Q and common formant mix level and get good results (with my mutated sine wave glottal source anyway). The results are a little better with individual formant mix control, and a bit better with individual Q control. But the formant positions are critical, you will probably have to hand adjust them to taste, some of this likely depending on the harmonics mix of the glottal source.
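A minimal parallel formant bank in that spirit, assuming Chamberlin state-variable band-pass filters, one common Q, and one common mix level, with optional per-formant levels. The class names and any frequencies you feed it are illustrative assumptions, not tuned values; as noted above, the formant positions will need adjusting by ear.

```python
import math


class FormantFilter:
    """Chamberlin SVF band-pass; output scaled by 1/Q so the peak gain
    is roughly 1 regardless of Q."""

    def __init__(self, freq_hz, q, rate_hz):
        self.f = 2.0 * math.sin(math.pi * freq_hz / rate_hz)
        self.damp = 1.0 / q
        self.low = 0.0
        self.band = 0.0

    def step(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.damp * self.band
        self.band += self.f * high
        return self.damp * self.band


class FormantBank:
    """Parallel formants sharing one Q and one mix level, with optional
    per-formant levels for finer control."""

    def __init__(self, freqs_hz, rate_hz, q=5.0, mix=1.0, levels=None):
        self.filters = [FormantFilter(f, q, rate_hz) for f in freqs_hz]
        self.levels = list(levels) if levels else [1.0] * len(self.filters)
        self.mix = mix

    def step(self, x):
        return self.mix * sum(g * flt.step(x) for g, flt in zip(self.levels, self.filters))
```

For example, `FormantBank([730.0, 1090.0, 2440.0], 48000.0, q=5.0, mix=0.5)` feeds all three filters from one glottal source sample per call to `step`; those frequencies are only a rough neighborhood often quoted for an adult male /a/.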

=========

Wow, thanks for the pointer to PRAAT! The formant view is quite interesting.

Posted: 12/28/2017 5:24:47 PM 1338

tinkeringdude

From: Germany

Joined: 8/30/2014


Ah, thanks, I'll play with that type of source.
As for square or other regular waveforms - if I know how to produce a LUT for something, it's easy for me to use. I implemented the "crude but effective" (TM) scheme of "audio mip mapping": one waveform LUT per octave, with 2x oversampling to cover the shifted-up harmonics when the base frequency a LUT was computed for gets bent upward. Each table is built with the additive synthesis formula for the given waveform, giving an amply oversampled set of tables with harmonics up to 16k or so, and fewer harmonics in each higher-octave table. (I just omitted the mip level switching/blending in the code I ported to my C# test project here, so it's currently not alias-free at higher frequencies, but my original C++ implementation sounds just fine even at very high notes.) It is cheap enough with linear interpolation to run on a $3 or so microprocessor, which is the final target for all of this and one more reason why I'm not looking at current speech synthesis methods or existing libraries, which need at least the "steam" of e.g. a Raspberry Pi.
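A small sketch of the per-octave "audio mip mapping" idea, assuming a sawtooth, one table per octave, and harmonics chosen so they stay below Nyquist at the top of each octave (one reading of the 2x headroom described above). All names are illustrative, and the table switching/blending refinements are left out.

```python
import math


def harmonics_for_octave(octave, base_hz, rate_hz, table_size):
    """Most harmonics that stay below Nyquist at the top of the octave
    (2x the table's base frequency), capped by what the table can hold."""
    top_hz = base_hz * 2 ** (octave + 1)
    n = int((rate_hz / 2.0) / top_hz)
    return max(1, min(n, table_size // 2 - 1))


def saw_table(num_harmonics, size):
    """Additive sawtooth: (2/pi) * sum(sin(k*x)/k), k = 1..num_harmonics."""
    return [
        (2.0 / math.pi)
        * sum(math.sin(2.0 * math.pi * k * i / size) / k for k in range(1, num_harmonics + 1))
        for i in range(size)
    ]


def build_mip_tables(base_hz, num_octaves, rate_hz, size=1024):
    return [saw_table(harmonics_for_octave(o, base_hz, rate_hz, size), size)
            for o in range(num_octaves)]


def octave_index(freq_hz, base_hz, num_octaves):
    if freq_hz <= base_hz:
        return 0
    return min(int(math.log2(freq_hz / base_hz)), num_octaves - 1)


def read_table(table, phase):
    """Linear interpolation; phase in [0, 1)."""
    pos = phase * len(table)
    i = int(pos)
    frac = pos - i
    a = table[i % len(table)]
    b = table[(i + 1) % len(table)]
    return a + frac * (b - a)


class MipSawOsc:
    def __init__(self, tables, base_hz, rate_hz):
        self.tables = tables
        self.base_hz = base_hz
        self.rate_hz = rate_hz
        self.phase = 0.0

    def step(self, freq_hz):
        table = self.tables[octave_index(freq_hz, self.base_hz, len(self.tables))]
        out = read_table(table, self.phase)
        self.phase = (self.phase + freq_hz / self.rate_hz) % 1.0
        return out
```

With a 220 Hz base, a 330 Hz note reads the first (richest) table; each octave boundary roughly halves the harmonic count, which is where a crossfade between adjacent tables would go.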

You mention Q and mix level. Are you aware of tables which not only list vowel formant frequencies but also their bandwidth and relative loudness?
I only found tables with freqs, and even fewer going beyond 2 formants...
Well, my experimenting program has a bank of resonant state variable filters with level control, so I ended up fiddling Q and amplitude for every setting by hand, yielding ok results, but I guess they could be better. (perhaps looking more at spectra of my own voice, and slapping the FFT display to my test program to actually see the overall simulated "vocal tract" response will make me wiser in that regard, but that's on the TODO list ;) )

Posted: 12/28/2017 6:02:26 PM 1339

dewster

From: Northern NJ, USA

Joined: 2/17/2012


Yes, 2x additive wavetables are certainly a good solution to aliasing. I don't have the luxury of wavetables as memory is quite limited, but that's actually fine by me as it forces me to do more physical-type synthesis. It's kind of strange that filtering doesn't do much to eliminate aliasing, but mechanical things like polyBLEP do (though once a harmonic has folded back below Nyquist it sits in-band, so no filter after the fact can separate it out; polyBLEP works by suppressing that energy before it gets generated). I wish there were such a thing as a generic polyBLEP, though even that benefits from 2x sampling.
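For reference, the widely used two-sample polyBLEP correction applied to a sawtooth looks roughly like this. It is a generic sketch with illustrative names, not code from this project.

```python
def poly_blep(t, dt):
    """Two-sample polynomial BLEP residual; t is phase in [0, 1),
    dt is the phase increment per sample."""
    if t < dt:                  # sample just after the wrap
        t /= dt
        return t + t - t * t - 1.0
    if t > 1.0 - dt:            # sample just before the wrap
        t = (t - 1.0) / dt
        return t * t + t + t + 1.0
    return 0.0


class PolyBlepSaw:
    def __init__(self, freq_hz, rate_hz):
        self.dt = freq_hz / rate_hz
        self.phase = 0.0

    def step(self):
        naive = 2.0 * self.phase - 1.0       # trivial (aliasing) sawtooth
        out = naive - poly_blep(self.phase, self.dt)
        self.phase += self.dt
        if self.phase >= 1.0:
            self.phase -= 1.0
        return out
```

The correction only touches the sample on each side of the wrap, subtracting a polynomial approximation of the band-limited step residual; everywhere else the output is the trivial sawtooth.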

If you have a sample of the voice you're trying to emulate then the relative formant levels could be found by looking at a spectral view, taking into account the downward slope of the glottal harmonics. Formant Q is rather mild because of the vocal tract aspect ratio and the soft stuff it's made of. If Q is too high it clearly rings, if too low it doesn't do what it should, and setting it to some middle ground doesn't seem all that critical. I watched some videos of software vocal synths and I believe the Q was more or less fixed for the formants - the skirts all looked very similar while they were dragging them around with the mouse to change their frequency / amplitude.

I haven't tried using formant frequency tables, just adjusting things on the prototype to sound human. Minor adjustments to a single formant frequency can make things go from fairly human to fairly not, so I wouldn't expect a formant frequency table to e.g. completely nail Pavarotti, particularly if it doesn't specify the glottal harmonic parameters. And, as I mentioned earlier, squaring the formant filter frequency, Q, and level parameters seems to give more useful adjustment ranges. "Tuning" like this pretty much has to be real-time interactive, otherwise it will drive you crazy.
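Reading "squaring the parameters" as passing a linear control through a square law, a minimal sketch looks like this; the function names and ranges are hypothetical. The square law spends more of the control's travel near the bottom of each range.

```python
def squared_knob(knob, lo, hi):
    """Map a knob in [0, 1] to [lo, hi] through knob**2."""
    k = min(max(knob, 0.0), 1.0)
    return lo + (hi - lo) * k * k


# Hypothetical ranges, for illustration only
FREQ_RANGE_HZ = (200.0, 3500.0)
Q_RANGE = (1.0, 20.0)
LEVEL_RANGE = (0.0, 1.0)


def formant_params(freq_knob, q_knob, level_knob):
    """Return (frequency in Hz, Q, level) for one formant."""
    return (squared_knob(freq_knob, *FREQ_RANGE_HZ),
            squared_knob(q_knob, *Q_RANGE),
            squared_knob(level_knob, *LEVEL_RANGE))
```

With this mapping, half-travel on a knob lands at one quarter of the range.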

The glottal synth papers imply that certain generator settings will give a "breathy" type sound, which I find hard to believe without some type of added noise to the process. Am anxious to see if this is the case, but have a week or so of volunteering ahead of me that I've been putting off for far too long. Gaaa!

A strange side effect of this type of investigation is the heightened awareness you get of this part of your body: you discover you can close your nasal passage with your soft palate, close your mouth passage with your tongue, and do all sorts of things with your vocal cords, all while lying in bed trying to get some sleep. Talking to my wife about the various aspects of the vocal tract tends to creep her out (in a meat robot sense).

Posted: 12/28/2017 7:10:42 PM 1340

tinkeringdude

From: Germany

Joined: 8/30/2014


Oh, as for noise, I forgot to mention I'm actually mixing some noise into my sawtooth, as voices seem to have at least a little breathiness to them, not to mention consonants.
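A minimal sketch of that kind of noise mix. The optional gate input, which lets the noise ride on the glottal open phase, is an assumption about one way to add it non-linearly rather than a described method; `breathy_sample` is a hypothetical helper.

```python
import random


def breathy_sample(tonal, open_amount, noise_level, rng):
    """Add uniform noise to one glottal source sample.
    open_amount in [0, 1] gates the noise: pass 1.0 for a plain mix,
    or a glottal flow value so the noise follows the open phase."""
    return tonal + noise_level * open_amount * rng.uniform(-1.0, 1.0)
```

Passing a dedicated `random.Random(seed)` instance as `rng` keeps the noise reproducible between test runs.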

As for the "heightened awareness", hehe, I actually need to pay attention when recording vowels to actually make *speaking* vowels. I have already been quite aware of some of the goings-on when vocalizing, as classical singing is another one of my hobbies, so when I am about to vocalize to record it, my brain likes to switch to "vocal exercise mode" out of habit, and I end up with operatic-ish resonances :D

If talks creep her out, wait until she sees videos of the glottis in action ;)