Let's Design and Build a (mostly) Digital Theremin!

Posted: 3/4/2013 2:11:03 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

I ordered the Cyclone 4 board yesterday for $35 shipped.  It has double the block RAM and multipliers, and the fabric is faster.  Even if the Cyclone 2 board is adequate for the finished product I don't want to develop in a cramped space.  Also downloaded and installed the latest Quartus II software - at over 3 gigs it's a pig!  And they ripped out the simulator!  If you want to use the sim it's still kind of in there hiding (you can make a bat file to invoke it) but seems pure TCL or something and it is quite a bit slower than the old native simulator.  Here is a paper on how to use the new sim:

http://www.altera.com/education/univ/software/qsim/unv-qsim.html

Here is a nice quick tutorial for students on schematic entry and simulation:

http://cnx.org/content/m42302/1.3/

The controls are similar to the old sim, and it will open the old waveform stimulus files, but it's kind of awkward and slow to use.  When I was still gainfully employed the Altera rep told me almost no one used it, so there it went.

Not to make too much of this, but simulators have been dogging me to no end for what seems like forever.  The industry standard is ModelSim which is a hodgepodge of scripts, text interfaces, libraries and whatnot, and costs through the roof - I hate it.  The Quartus simulator has kept me afloat and relatively content up to now.  Po me, po me another drink.  Simulation is incredibly interesting and fun for me because I get to see my code actually working, and when I see it (inevitably) malfunction I often get a deeper insight into the code and the application.  I suspect the reason a lot of HDL coders don't like the sim phase is because of the funkiness of ModelSim.

[EDIT] I just tried to fire up my old v9 Quartus to get some work done and the webpack version 12.1 I downloaded and installed yesterday broke it!  I see that v9.1SP2 supports Cyclone 4 and has the old simulator, gonna spend another day downloading and installing that.  Due to SW churn (OSes & applications) I live in constant fear that the design chain I've become accustomed to and efficient with will break entirely.

Posted: 3/4/2013 7:54:16 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

Hi Dewster,

Yeah - I share that fear regarding the design chain going wrong!

One thing I have a problem with is probably due to my way of thinking - It would be real nice to have a small MCU inside the FPGA to manage simple tasks (like the UI) - I am fine if I can keep function blocks separated along lines of analogue, logic, and MCU .. and sometimes mix these a bit .. But I don't see any MCU per se.. Are there any standard MCU blocks (with associated ASM or C assembler / compiler / linker) one can just "drop in" to a FPGA as a "library" component?

I know you develop your own MCU - but this is way beyond my present requirements.. and digging up my PDP4 or PDP 8 schematics from the loft and trying to put these into the FPGA using schematic entry is not a task I would enjoy! ;-)

Fred.

Added ->

'Tis OK - I've just done some searching on this topic.. Looks like there's no easy "standard" MCU "drop in".. And that Altera has a big FPGA with meaty MCU for those who want it - but at a price!

I just want the smallest cheapest FPGA / CPLD I can get - That £11 Altera board looks great.. It's really now about deciding whether to go with this and a small DIL MCU, or to design my board to take a big SMD PSoC 3 or 5 .. The latter will certainly be more costly and difficult (I need to get SMD mounted by someone else, which at the PT stage is a pain) but will give me all the glue logic I need and more MCU power than I need and analogue, A/D D/A etc if I need them.. And IO can run at 5V which makes my life a lot easier...

 

Posted: 3/5/2013 2:51:11 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"Is there any standard MCU blocks (with ascociated ASM or C assembler / compiler / linker) one can just "drop in" to a FPGA as a "library" component?"  - FredM

Over at opencores.org there are hundreds of processor cores for the taking - they are free so YMMV in terms of quality and usability.  Altera's Nios economy version is free and small, but it's probably a black box kind of thing (can't open the hood on the guts).  Xilinx has the MicroBlaze which is probably similar, and they also have the PicoBlaze which is an 8 bit overgrown state machine - I've read of people using it to implement MIDI functionality (the design paper is interesting and quite tractable).  Lattice has the Mico32 and Mico8 which I believe are semi-open designs.

FPGA vendor designs will generally be much more polished and bug-free, but they are usually chained to that vendor's silicon and tool set.  Also, I wouldn't put too much stock in any speed figures they post, those are for the fastest speed grade (i.e. most expensive) devices with lots of effort put into synthesis, placement, and routing.

With my design I'm trying to sidestep the intellectual property and need-for-tools issues.  The hybrid stack/register approach reduces the need for lots of registers, allowing the use of small operand indexes in the opcode, and relieves the stack gymnastics you have to do with a single or dual stack machine.  Multi-threading removes all of the hazards from the pipeline (no stalls) and gives you as many threads (each with their own interrupt) as pipe stages.  The only down side is that an idle thread wastes valuable throughput, but more conventional designs can't keep the pipe full and happy either.  I'm also aiming for an essentially stateless design (no flags, no reserved conventional registers) so subroutines take no overhead, interrupts consume a single stall cycle, and calculations can always be done with complete disregard for what might be happening in some other context.
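
A minimal sketch of why rigid round-robin threading leaves no pipeline hazards, assuming one instruction is issued per clock and each thread gets every Nth issue slot. This is plain Python for illustration only, not the author's HDL; the function names and the 4-stage default are assumptions.

```python
# Hypothetical model of round-robin ("barrel") threading in a pipeline.
# Assumption: as many threads as pipeline stages, one issue per clock.
STAGES = 4


def pipeline_occupancy(clock, stages=STAGES):
    """Return the thread ID held in each pipeline stage at a given clock.

    The instruction entering stage 0 at time `clock` belongs to thread
    (clock mod stages); older instructions sit one stage further along.
    """
    return [(clock - stage) % stages for stage in range(stages)]


def has_hazard(clock, stages=STAGES):
    """True if any thread has two instructions in flight at the same time."""
    occupancy = pipeline_occupancy(clock, stages)
    return len(set(occupancy)) != len(occupancy)
```

Because every stage always holds a different thread, an instruction's result is written back before that thread issues its next instruction, so no stall or forwarding logic is needed in this model.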

I installed Quartus II 9.1sp2 free web edition yesterday and I recommend you do this too Fred.  The simulator works, and it supports the Cyclone families of devices.  Download and install are much less painful than for (bloated whale) v12.

Posted: 3/5/2013 3:48:04 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

Great! Thank you Dewster!

"Altera's Nios economy version is free and small, but it's probably a black box kind of thing (can't open the hood on the guts)." - Dewster

For what I need, I won't want to get inside its guts.. All I want to do is a simple graphical interface with encoder for user interface.. to allow "preset" selection etc, setting up of PWM M/S (simple D/A) to generate fixed values for the harmonic mixing or perhaps control some digital potentiometers to allow harmonic profiles to track the audio input frequency...

The rest of the FPGA will be used for miscellaneous glue functions - implementing phase comparators for analogue PLL, generating the timing for frequency to voltage conversion from multiplied difference, perhaps auto-tuning.. that sort of thing.

I really don't want to spend a lot of time in the digital / coding stuff - All the harmonic generators will be LC filters with frequency locked by simple PLL I think (not strictly required as I could use IFTs to tune the LC to the required resonance, but if I use a varicap and PLL I can implement other things in future - like expanding the series if any harmonics are not required, or doing direct register switching by re-assigning the resonant frequencies), summed to form a composite waveshape with the harmonic levels being dynamically controlled, and this (reference oscillator frequency) waveshape being sampled at the VFO frequency.. the entire audio side being analogue.

Fred.

Thanks for the Quartus tip - I will download this as soon as I have downloaded some other stuff you kindly advised about! ;-)

Posted: 3/6/2013 4:48:08 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Having played with Quartus 9.1sp2 webpack some I can report that it fully supports the Cyclone II & III families but not Cyclone IV.  The timing parameters for IV are preliminary, so it won't even simulate them (*sigh* - guess I'll have to get 10.1sp1 for builds and hope the install doesn't clobber 9.1sp2).  A III or IV -8 speed grade part seems to be able to do a two stage pipelined 33 x 33 bit signed multiply at 124 MHz (II can only muster up 90 MHz).  I'd be fairly happy if my processor could do 100 MIPS.

Posted: 3/26/2013 10:14:44 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Another few weeks spent on the processor.  Worked out my own 33 x 33 signed multiplier that will do ~130MHz in two stages, or ~200MHz in three.  Can't do three stages in a 4 stage pipelined processor (need push/pop and I/O multiplexing) so I'm considering going to 8 threads and an 8 stage pipeline.  Interstage registering would of course increase but this would be roughly the same as the top speed improvement.  No more block ram would be required (individual stack depths would go from 64 to 32 deep, which is probably sufficient).  As with 4 threads, the 8 threads could all share the same subroutine code because they would be operating in a common memory, possibly leading to further code compaction via global factoring.

My concern is that the processor is growing larger than it needs to be for many applications, and that keeping 8 threads busy might be difficult for a human programmer to manage in assembly.  But 200 MIPS in a bargain basement speed grade 8 FPGA is really hard to ignore.

While combing through the opcodes I discovered that I could use a conditional SKIP instruction, which could prove to be quite useful.  Conventional stack machines generally can't have conditional operations because operands are always consumed, so the programmer can't tell what state the stack is in after a conditional two input operand operation.  With control over the pop you can do this, as long as the stack pointers don't change during the conditional operation.  This is one big thing that always bugged me about stack machines, auto consumption often requires awkward duplication and stack manipulation before an operation if one or both of the input values need to be used again.  In my design the operands are preserved by default, and either or both can be consumed if desired by setting the pop bit(s).
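
The pop-bit idea can be sketched roughly as follows, assuming a machine with several small stacks whose tops act as registers. The `Stack` class, the `add` and `is_less` operations, and their argument names are hypothetical illustrations in Python, not the actual opcodes.

```python
# Hypothetical model of operands preserved by default, consumed via pop bits.
MASK32 = 0xFFFFFFFF


class Stack:
    """A small LIFO whose top entry can be read with or without popping."""

    def __init__(self, items=()):
        self.items = list(items)

    def read(self, pop=False):
        """Return the top value; remove it only if the pop bit is set."""
        value = self.items[-1]
        if pop:
            self.items.pop()
        return value

    def push(self, value):
        self.items.append(value)


def add(a_stack, b_stack, dest, pop_a=False, pop_b=False):
    """Two-operand add: inputs stay put unless their pop bit is set."""
    a = a_stack.read(pop_a)
    b = b_stack.read(pop_b)
    dest.push((a + b) & MASK32)


def is_less(a_stack, b_stack):
    """Conditional test that consumes nothing, so the stack state is known
    afterwards whether or not the following instruction is skipped."""
    return a_stack.read() < b_stack.read()
```

With both pop bits clear there is no need to duplicate a value before using it, which is the awkwardness of auto-consuming stack machines described above.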

Hope I'm not boring everyone, it seems this is my own personal snipe hunt!

Posted: 5/5/2013 7:31:59 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

Haven't posted to this thread in a while but not for lack of working on it.  Still down in the processor data mines but finally seeing the light at the end of the tunnel.  Went with 32 x 32 = 64 bit 3 stage signed/unsigned multiply, which means 4 registers + I/O muxing + storage/retrieval, which pretty much forces the issue of 8 threads and 8 pipeline stages.  Finally finalized the instruction set.  Finalized the ALU, which will run at nearly 200 MHz in the bargain basement Cyclone III -8 speed grade part.  All 8 threads running full out may give nearly 200 aggregate MIPS for the core, which I believe is enough to do some serious audio work.  Though the main memory will be quite constrained due to limited block RAM in the FPGA fabric.

Most of the main blocks are now done (LIFO pointer ring, program counter ring, thread ring, opcode decoder), verified to be OK, and capable of > 200 MHz operation.  The thing that's taken perhaps the most time on this mammoth detour of a side project is the logical partitioning of functionality into self-contained components that are easily verifiable / speed testable.

To do: internal register set (all over but the shouting), pull it together (data ring + control ring + register set / main memory = the core), verify basic functionality at the top level, write a paper on it, post it at opencores.org (so others may inadvertently repeat my errors until the end of time, and to stake something of an intellectual claim), and the true test of the pudding: make a Theremin (and hopefully other electronic musical instruments) with it.

Also finally settled on a name: HIVE - not an acronym, more the concept of the swarm of activity in a beehive: a bunch of threads sharing the same program and data space, beavering away individually on separate functions, but cooperating together to get larger things done.  The underlying technology is an acronym: THRASH - for THreaded Register And Stack Hybrid (please don't call it TRASH!).  (I considered name variations based on THRASH, like WHIP and LASH, but like HIVE better - I suppose a good thrashing will give you hives!  Or is that welts?)

(TMI: while researching names like "hive" I encountered the (for some reason undiagnosed) condition I experienced often as a teen: Angioedema - my knee joints and sometimes my lips would swell up when I was experiencing acute emotional stress, like the first day of school, the day of a difficult exam, etc.  It's really too bad that one of the few things that makes living worthwhile - learning - so often has unnecessary anxiety associated with it.  Testing is at best extrinsic, and IMO is prima facie evidence of poor teaching skills and/or classrooms that are much too large.  That the classical music training world embraces testing and ranking like they do strikes me as teachers doing their level best to turn students off to one of the few other reasons to live - music.  It's a double whammy of stupid. IMO.)

Posted: 5/5/2013 9:53:42 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

Dewster,

That sounds like a really exceptional processor you have developed! - The sort of core which could have wide application for audio and particularly musical instruments / synthesis.

You mention limitation on the RAM - But presumably this is just the internal fast RAM - I presume you can connect (with appropriate configuration of the FPGA) external RAM/ EEPROM with slower access..

Not sure how MIPS relates to real-world speeds.. This is one of the areas where I have fallen foul of specifications.. Particularly with RISC instruction set, where multiplication on anything other than 8 bit data required a large number of instructions (as with PSoC1 M8 8 bit processor) .. so I wonder (pure curiosity) how long will a 32*32 bit multiply take to complete with your system?

As far as implementing a theremin - From what I have understood from your disclosures, it seems to me that if you have developed a really fast processing "block" with 32 bit data stream, then you are on your way to a digital theremin that can fully implement linearization in real time at a resolution that will not be audible - That you can, in fact, make a digital theremin which will produce an analogue output which (in terms of pitch resolution and "error" components such as quantization / zipper effect etc) will be indistinguishable from an analogue theremin - but will have all the advantages (linearity / fidelity etc) that digital technology has the potential to provide.

If this can all be "packaged" (or at least all the digital stuff) on a low cost available FPGA board, then I suspect the theremin future is in your hands.

I sincerely wish you all the best - This day was coming - It just needed digital technology capable of the speeds to be available at low enough cost.. But I think you may well have pushed the applicable  digital technology ahead by a few years - 10 years from now I expected fast enough cheap off-the-shelf boards to be available to do the job with little cleverness required..

It looks to me like you have taken available technology and cleverly used this in a way that has made such a product possible now..

Fred.

Posted: 5/6/2013 12:16:37 PM
dewster

From: Northern NJ, USA

Joined: 2/17/2012

"...I presume you can connect (with apropriate configuration of the FPGA) external RAM/ EEPROM with slower access."  - FredM

It could be adapted to be connected to external RAM, but because the core needs dual port access (one port for instructions, the other for data and literals) there would have to be some kind of adapter or wholesale copying from external to internal RAM.  Due to Altera's block RAM restrictions in this device family (can't do 32 bit wide "true" dual port with separate addresses), I've limited reads and writes to 16 bits wide, and the address also to 16 bits wide, though this could be changed fairly easily to 32.  Opcodes are 16 bits wide.

"...so I wonder (pure curiosity) how long will a 32*32 bit multiply take to complete with your system?"

For a 32 bit result (via the opcode one picks the lower 32 bits or the "extended" upper 32 bits of the result) it takes one "cycle", which, per thread, takes 8 (~200 MHz) clocks.  But there are 8 threads sharing the pipeline, so if they are all multiplying (or doing anything else for that matter) the core can do one each at this rate, which adds up to ~200 MIPS.  Think of it as 8 rigidly interleaved processors, each one running at ~25 MHz, sharing the same program and data space (which facilitates parameter passing & handshaking - and clobbering from rogue algorithms!).
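
To make the numbers concrete, here is a small Python model of the multiply and of the per-thread rate. It assumes two's complement 32 bit operands and a 64 bit product from which the opcode selects one half; the function names and the `signed` / `extended` flags are hypothetical, not the real instruction encoding.

```python
# Hypothetical model of a 32 x 32 = 64 bit multiply returning one 32 bit half.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(x):
    """Interpret the low 32 bits of x as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul32(a, b, signed=True, extended=False):
    """Multiply two 32 bit values; return the lower or upper 32 bits."""
    if signed:
        a, b = to_signed32(a), to_signed32(b)
    else:
        a, b = a & MASK32, b & MASK32
    product = (a * b) & MASK64  # 64 bit two's complement product
    return (product >> 32) & MASK32 if extended else product & MASK32


def per_thread_mips(clock_mhz=200, threads=8):
    """Each thread gets one issue slot every `threads` clocks."""
    return clock_mhz / threads
```

For example, `mul32(0xFFFFFFFF, 2, signed=False, extended=True)` gives 1 while the signed version gives 0xFFFFFFFF, and `per_thread_mips()` gives 25.0, so 8 threads at about 25 MIPS each add up to the roughly 200 MIPS aggregate.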

"But I think you may well have pushed the applicable  digital technology ahead by a few years..."

Oh, I don't know, FPGA processor cores are fairly prevalent, and none of them are very fast or power efficient compared to ASIC processors.  I've just designed one that I think I can live with a bit easier, so on some level it's reinventing the wheel.  It doesn't really make sense to put a processor in an FPGA unless one needs the FPGA in the first place for an application, and there are complex functions that don't have to be performed all that quickly as part of that application (such as UI, linearizing, filtering, voice generation, etc.).

Posted: 5/6/2013 5:34:28 PM
FredM

From: Eastleigh, Hampshire, U.K. ................................... Fred Mundell. ................................... Electronics Engineer. (Primarily Analogue) .. CV Synths 1974-1980 .. Theremin developer 2007 to present .. soon to be Developing / Trading as WaveCrafter.com . ...................................

Joined: 12/7/2007

"Oh, I don't know, FPGA processor cores fairly prevalent, and none of them are very fast or power efficient compared to ASIC processors.  I've just designed one that I think I can live with a bit easier, so on some level it's reinventing the wheel. " - Dewster

At present I have not seen any core within a configurable IC which gets close to the processing speed you are talking about.. Sure, there are fast processors, DSPs etc - but these come with a much higher price tag, and generally are not as configurable as a core implemented in a FPGA I think..

So I accept what you are saying in general terms -  BUT  - When it comes to theremin development, I think it will be quite a long time before some "Arduino equivalent" board comes to market which is fast enough and easy enough to use that REAL digital theremins with acceptable to 'pro' standard functioning could be produced by "average" engineers.

What I think you may have done is to take a cheap FPGA board and create a core for it which is capable of doing the job now.

And if this is the case, you deserve recognition - There aren't that many (if any other) engineers with the ability to design a multi-threading processor into a cheap FPGA, and who are doing this primarily to implement a digital theremin! - And without you, this would probably never have been done, and those wanting digital implementation of a theremin would have needed to wait until "simple" low-cost processor boards got to a level advanced enough to undertake the job when programmed using something like C++.. Most people are not capable of the kind of "hacking" you are doing! ;-)

Fred.
