Let's Design and Build a (mostly) Digital Theremin!

Posted: 1/7/2016 4:33:23 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

To prepare for the audio side of things I'm looking into vocal synthesis.  Not finding a whole lot on the natural excitation of the filters (à la physical modeling of brass and woodwinds, where impedance discontinuities lead to oscillation w/ noise stimulus).

Downloaded SuperCollider yesterday and ran across this nifty cymbal synthesis method: http://www.mcld.co.uk/cymbalsynthesis/.  Just 100 resonators randomly assigned above 300Hz in an exponential manner and stimulated with pink noise creates a startlingly realistic cymbal sound.

Just as PCs require a certain amount of MIPS and memory to do remarkable things, music synthesis seems to benefit greatly from complexity and horsepower.  If I never hear another standard Moog patch (VCO => VCF => VCA) it will be too soon.

Posted: 1/8/2016 12:01:24 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Ha!  A "singing vowels" vocal synthesizer for SuperCollider:

http://sccode.org/1-4Uz

The 'I' vowel sounds kind of weird, I can see why the Talking Box goes for tenor Aah!

All the formant tables are there in the code, including Q.  Something to play with.

Posted: 1/8/2016 4:52:30 AM
rkram53

From: Northern NJ, USA

Joined: 7/29/2014

Interesting. What is it about "ah"? When you sing that your vocal cavity is pretty much wide open. Is that open cavity easier to model and emulate mathematically with filters than any degree of closure, which is what you get when you sing any other vowel or diphthong? Appears so. I don't know enough to go any farther here right now.

I bet a completely closed mouth might be more doable too - like "mm". That's just a guess though. 

By the way - what's the goal for your sound engine? What types of sounds are you after? Are you thinking of doing it a number of ways - for example maybe formants for vocal stuff, samples for others, etc.? I really would like to see a theremin that could download a .WAV file for sampling (ideally wireless but maybe USB to start) and then have the theremin use that (the trick being playing it while allowing theremin vibrato and portamento to be imposed on the sample - and possibly having an envelope control).

Posted: 1/8/2016 2:35:14 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Post 2^10 (yikes!)

"What is it about "ah"? When you sing that your vocal cavity is pretty much wide open."  - rkram53

When more closed, the lips form something of a low pass filter, reflecting more energy back into the vocal tract, perhaps making resonances not modeled by a simplistic formant approach more pronounced?  I don't know exactly.

My initial direction is to find and explore synthesis methods that are resonant and that oscillate naturally when stimulated, rather than the well-worn oscillator => filter approach.  There is a waveguide model of the vocal tract done by Perry Cook (his PhD thesis) but he punted when it came to stimulus (pre-computed wavetables) probably due mainly to the wimpy computing resources available to him at the time (1991).  I don't have a ton of RAM to play with, so things like long delays and reverb are probably out, but waveguides are doable.  I see that much of the waveguide stuff was patented, but apparently the patents have all expired, so that legal minefield has thankfully been cleared.

Posted: 1/16/2016 7:03:34 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Prototype

2016 is the year of the prototype!  I hope!  Here it is so far:

Except for the aluminum flashing plate antenna inside the right pitch box, these are just empty Sterilite boxes affixed to a piece of plywood via CCTV camera mounts, on an old RS folding mic stand.  The CCTV mounts are frail and shaky even when tightened up, so I wouldn't recommend using them for the final deal.  I wound a 0.3mH coil on PVC for the pitch side but haven't experimented further.

==================

I've been spending my time looking into sound synthesis, particularly PC software that will allow me to easily experiment in floats, and to finally try things out with fixed point ints before coding the algorithms up in Hive.  SuperCollider works, but the language syntax is IMO quite arcane and so very hard to get into, and the sound synthesizer is buggy on my XP machine (lots of trial and error coaxing to get it to start up).  Haven't looked at Csound yet.  One interesting app is SynthMaker, where GUI modules are interconnected via hierarchical schematics, you can drill all the way down to code, and it will even produce stand-alone modules and plug-ins.  It seems to have morphed into something called "FlowStone": http://www.dsprobotics.com/flowstone.html where the base language is Ruby.  Even my old version of SynthMaker is quite slick and very nicely done.

==================

Anyway, I examined the SynthMaker state variable filter code for any insights it might afford.  Turns out it's a straight-up Chamberlin, but there's a polynomial sine approximation for the input tuning variable, which made me investigate the subject for a Hive subroutine.  Several days later I've got an 8th order poly cosine worksheet in Excel with +/-1.2 ppm error, and now need to work on a Hive version and get it documented.  So many digital ditches to dig!
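For anyone following along, here is a minimal Python sketch of the textbook Chamberlin state variable filter, showing where the sine enters the picture. This is the standard form, not SynthMaker's actual code (which isn't reproduced in the post); the function name, parameter names, and the default Q are all made up for illustration.

```python
import math

def chamberlin_svf(samples, cutoff_hz, sample_rate, q=0.707):
    """Textbook Chamberlin SVF; returns (lowpass, bandpass, highpass) lists.

    Hypothetical helper for illustration only. Keep cutoff_hz well below
    sample_rate / 6 or this topology becomes unstable.
    """
    # Tuning coefficient: this is the sine that a polynomial can approximate.
    f = 2.0 * math.sin(math.pi * cutoff_hz / sample_rate)
    damp = 1.0 / q          # damping is the reciprocal of Q
    low = 0.0
    band = 0.0
    lows, bands, highs = [], [], []
    for x in samples:
        low += f * band                  # integrate bandpass into lowpass
        high = x - low - damp * band     # highpass is what is left over
        band += f * high                 # integrate highpass into bandpass
        lows.append(low)
        bands.append(band)
        highs.append(high)
    return lows, bands, highs
```

Feeding it a constant input is a quick sanity check: the lowpass output should settle at the input level and the highpass output at zero.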

Posted: 1/23/2016 12:11:18 AM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Normalized Sine

It's rather eerie that a Taylor series polynomial can approximate as many sine (or cosine) cycles as you want:

  sin(x) = (x^1 / 1!) - (x^3 / 3!)  + (x^5 / 5!) - (x^7 / 7!) + ...

Because it only has odd powers of x, sine is an odd function.  So the polynomial, truncated to however many terms which provide sufficient accuracy, may be evaluated with half the usual effort.  But the coefficients will have to be carefully adjusted to best fit the truncated series that we settle on.
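To see why the raw Taylor coefficients need touching up, here is a small Python sketch that sums the truncated series as written above. The function name is hypothetical; it uses plain floats rather than anything Hive-like.

```python
import math

def taylor_sin(x, terms):
    """Sum the first `terms` odd-power terms of the Taylor series for sin(x)."""
    total = 0.0
    for k in range(terms):
        n = 2 * k + 1                                   # powers 1, 3, 5, ...
        total += (-1) ** k * x ** n / math.factorial(n)
    return total
```

With the unadjusted coefficients the error is essentially zero near x = 0 and piles up at the far end of the quadrant (x = pi/2), which is the kind of lopsided error that coefficient adjustment spreads out.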

First we have to decide what form the input and output will take.  To make the function more generally useful, and to maximize resolution, we can normalize the input angle z so that the full range of input produces one full rotation (x = z * pi / 2), and normalize the output to produce the full range of values over one rotation.  Both input and output are signed, and in a perfect world the output sign would follow the input sign naturally, but doing so wouldn't leave enough headroom for the calculations, which must instead be performed in an unsigned manner over a single quadrant.

I used trial and error to obtain the coefficients, which requires a fair amount of time and patience.  It helps to train yourself by solving the simplest case first and then working your way up.  In all cases the first "hump" in the error curve from input zero to positive should be negative, and there should be as many "turns" in the error curve as there are terms.  For a 32 bit process that employs Hive-type "extended unsigned" multiplies (i.e. the upper 32 bits of a 32 x 32 unsigned multiplication) the errors are as follows:

  one x term:   -45M
  two x terms: +/-1.3M
  three x terms: +/-175k
  four x terms: +/-15k
  five x terms: +/-10
  six x terms: +/-2

For 32 bit representations it seems worth going to 6 terms as the error here is negligible.  It might be possible to reduce the error to +/-1 as there don't seem to be many input values that give error larger than +/-1, but I ran out of patience.  One must be careful to produce a slightly negative error at the end of the polynomial or the output will overflow.
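The "extended unsigned" multiply mentioned above (upper 32 bits of a 32 x 32 unsigned product) can be modeled in Python as follows. This is only a model of the arithmetic, not Hive code, and the names are hypothetical. Treating each operand as an unsigned fraction in [0, 1), the result is the product of the two fractions, truncated.

```python
MASK32 = 0xFFFFFFFF

def mul_hi_u32(a, b):
    """Upper 32 bits of the 64-bit product of two unsigned 32-bit values."""
    return ((a & MASK32) * (b & MASK32)) >> 32
```

For example, 0x80000000 is 0.5 in this fractional view, and 0.5 times 0.5 comes out as 0x40000000 (0.25). The bits shifted away are the truncation noise that ends up in the output LSbs.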

Mathematical operations are minimized by interleaving the power operations in with the coefficient operations:

  y = Ax - Bx^3 + Cx^5 - Dx^7 + Ex^9 - Fx^11
  y = x[A - x^2(B  - x^2[C - x^2(D - x^2[E - x^2(F)])])]
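A quick Python check that the nested form above gives the same result as the expanded one. The tuned coefficients aren't listed in the post, so plain Taylor coefficients stand in for A through F here (an assumption), and the function names are made up.

```python
import math

# Stand-in coefficients A..F (Taylor values, NOT the tuned ones from the post).
COEFFS = [1.0 / math.factorial(n) for n in (1, 3, 5, 7, 9, 11)]

def sin_expanded(x, c=COEFFS):
    """y = Ax - Bx^3 + Cx^5 - Dx^7 + Ex^9 - Fx^11, every power computed separately."""
    a, b, cc, d, e, f = c
    return a * x - b * x**3 + cc * x**5 - d * x**7 + e * x**9 - f * x**11

def sin_nested(x, c=COEFFS):
    """y = x[A - x^2(B - x^2[C - x^2(D - x^2[E - x^2(F)])])]."""
    x2 = x * x                       # the single initial squaring
    acc = c[-1]                      # start from F, the innermost term
    for coef in reversed(c[:-1]):    # then E, D, C, B, A
        acc = coef - x2 * acc
    return x * acc
```

The nested version needs one squaring, five multiplies by x^2, and one final multiply by x, versus a pile of separate power calculations in the expanded form.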

Before the initial squaring, the input is shifted left two positions, which removes the extraneous quadrant info, maximizes the resolution of the calculations, and provides sufficient headroom for the A coefficient.  After this, the input signed to unsigned is handled via a subtract 1 followed by a bit NOT of the input if the second input MSb is set.  The subtract 1 step is skipped if the input is the full scale negative value, which neatly accommodates it, and conveniently in the area of the graph where it is flattest and therefore where the output is changing the least.  Output unsigned to signed is handled similarly if the input MSb is set, but here the coefficients can guarantee that there will be no overflow.
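Here is a rough Python sketch of the quadrant folding described above, for a 32 bit phase where the full range is one rotation. It is not the Hive code: math.sin stands in for the polynomial, the names are hypothetical, and the handling of the zero special case is this sketch's own guess at the intent (saturate to full scale at the peak).

```python
import math

MASK32 = 0xFFFFFFFF

def fold_phase(phase):
    """Split a 32-bit phase into (negate_output, position_within_quadrant)."""
    phase &= MASK32
    negate = bool(phase & 0x80000000)   # MSb set: second half of rotation, output negative
    mirror = bool(phase & 0x40000000)   # second MSb set: odd quadrant, runs backwards
    frac = (phase << 2) & MASK32        # shift left two: drop quadrant bits, keep resolution
    if mirror:
        if frac:
            frac = ~(frac - 1) & MASK32  # subtract 1 then NOT, i.e. 2^32 - frac
        else:
            frac = MASK32                # assumption: skip the subtract so the peak saturates
    return negate, frac

def sin_norm(phase):
    """Normalized sine using only a single-quadrant evaluation."""
    negate, frac = fold_phase(phase)
    y = math.sin(frac / 2**32 * math.pi / 2)   # stand-in for the polynomial
    return -y if negate else y
```

The saturated case lands right at the top of the hump where the curve is flattest, so being one LSb short of a true quarter turn makes no visible difference to the output.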

To get Cosine from this we simply offset the input value by +1/4 or -3/4 full scale (add 0b01 to, or subtract 0b11 from, the input MSbs).
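A small Python sketch of the offset trick, again with math.sin standing in for the polynomial and with hypothetical names. With 32 bit wraparound, adding 0b01 to the two MSbs and subtracting 0b11 from them land on the same phase.

```python
import math

MASK32 = 0xFFFFFFFF
QUARTER = 0x40000000          # 0b01 in the two MSbs (+1/4 full scale)
THREE_QUARTERS = 0xC0000000   # 0b11 in the two MSbs (3/4 full scale)

def sin_phase(phase):
    """Sine of a 32-bit phase where the full range is one rotation."""
    return math.sin(2 * math.pi * (phase & MASK32) / 2**32)

def cos_phase(phase):
    """Cosine is the same sine with the phase advanced a quarter turn."""
    return sin_phase(phase + QUARTER)
```

Which of the two offsets to use is then just a matter of whichever is cheaper in the instruction set, since modulo 2^32 they are identical.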

Note that the output is not necessarily monotonic, particularly for the higher term cases, because the LSbs have truncation noise.

I've got it coded up in the Hive simulator, and the results match the Excel spreadsheet.  24 instructions minimum, 29 maximum (depending on the input value).  Sine is one of those things where it's hard to tell when you're really done figuring out all the nuances.

Posted: 1/31/2016 5:28:08 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Normalized Cosine

I believe cosine may be slightly superior to sine in terms of truncation noise because the five term polynomial regression error "poops out" right at the threshold of 32 bit resolution.  And it is considerably easier and less dangerous to do the final cosine polynomial coefficient touch-up adjustment because the quadrant endpoint is around zero rather than at a maximum as in the case of sine, so there seems to be no possibility of overflow with mis-adjustment / noise.

I used Excel's "LINEST" function to get the polynomial coefficients, though this function sometimes fails by leaving out data it doesn't cotton to whenever it feels like it and without warning, which is basically insane (like everything else in Excel it seems).  For those situations (i.e. when the higher power coefficients start growing rather than shrinking) I obtain the coefficients from this site:

  http://polynomialregression.drque.net/online.php

I've got polynomials for EXP2 (which speeds things up by a factor of 5 over the bit-by-bit root multiplication approach and is considerably more accurate), LOG2, and TANH, but am currently looking into the best way to include a "NEG" (arithmetic negation) opcode in Hive.  One can do B*(-1); 0-B; (~B)++;  ~(--B), etc.
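The four negation options listed above can be checked against each other in Python by masking everything to 32 bits. This only models two's complement arithmetic; it says nothing about which is cheapest as a Hive opcode, and the function names are made up.

```python
MASK32 = 0xFFFFFFFF

def neg_mul(b):
    """B * (-1), where -1 is 0xFFFFFFFF in 32-bit two's complement."""
    return (b * MASK32) & MASK32

def neg_sub(b):
    """0 - B."""
    return (0 - b) & MASK32

def neg_not_inc(b):
    """(~B)++ : invert, then add one."""
    return (~b + 1) & MASK32

def neg_dec_not(b):
    """~(--B) : subtract one, then invert."""
    return ~(b - 1) & MASK32
```

All four agree for every 32 bit input, including the usual two's complement oddities: zero negates to zero, and the full scale negative value 0x80000000 negates to itself.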

Floats are another thing I'm looking into.  The EXP2 function specifically brings you face to face with their usefulness.  Though the packing & unpacking into a 32 bit space is inefficient and reduces precision.

It's never been so clear to me how math functions are used disparately by engineers and mathematicians.  Engineers often need something fast and "close enough", while mathematicians seriously tax the life out of a system by ignoring the underlying hardware and software limitations.  And base 10 is really unfortunate.  If only we had not considered our thumbs when devising our number system, much of the binary "magic" one encounters for the first time when designing digital logic (https://graphics.stanford.edu/~seander/bithacks.html) would be second nature, and there wouldn't exist non-representable fractions between the systems.  Base 4 would perhaps be best as it is square (2^2), one can readily distinguish 4 items from 3 or 5 at a glance, and times tables could be learned in kindergarten.  Probably a good thing I'm not king of the world!

Posted: 2/3/2016 12:58:10 PM
rkram53

From: Northern NJ, USA

Joined: 7/29/2014

Dewster,

Don't worry. The qubits are coming! The qubits are coming! Soon it will be time for a total rethink.

I'm saving up for a quantum theremin.
