"...I presume you can connect (with appropriate configuration of the FPGA) external RAM/EEPROM with slower access." - FredM
It could be adapted to be connected to external RAM, but because the core needs dual port access (one port for instructions, the other for data and literals) there would have to be some kind of adapter or wholesale copying from external to internal RAM. Due to Altera's block RAM restrictions in this device family (can't do 32 bit wide "true" dual port with separate addresses), I've limited reads and writes to 16 bits wide, and the address also to 16 bits wide, though this could be changed fairly easily to 32. Opcodes are 16 bits wide.
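To make the dual-port point concrete, here is a rough Python model of the memory arrangement described above. It is only a sketch: the class and method names are made up for illustration, word addressing is an assumption, and the "wholesale copy" is shown as a plain loop rather than any real boot mechanism.

```python
class DualPortRam16:
    """Toy model of one internal RAM with two independent 16-bit ports.

    Hypothetical names throughout; assumes word addressing, so a 16-bit
    address reaches 65536 sixteen-bit words.
    """

    def __init__(self, size=1 << 16):
        self.words = [0] * size

    def load(self, image, base=0):
        """Wholesale copy of an image (e.g. read out of external RAM/EEPROM)."""
        for offset, word in enumerate(image):
            self.words[(base + offset) & 0xFFFF] = word & 0xFFFF

    def read_both(self, instr_addr, data_addr):
        """One access: port A fetches an opcode, port B reads data or a literal."""
        opcode = self.words[instr_addr & 0xFFFF]
        data = self.words[data_addr & 0xFFFF]
        return opcode, data

    def write(self, addr, value):
        """Writes are limited to 16 bits, so wider values are truncated here."""
        self.words[addr & 0xFFFF] = value & 0xFFFF


ram = DualPortRam16()
ram.load([0x1234, 0xABCD], base=0x10)   # pretend this came from external memory
opcode, data = ram.read_both(0x10, 0x11)
```

The point of the model is that both reads happen in the same access with separate addresses, which is what a single-port external RAM cannot offer without an adapter or a copy into internal RAM first.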
"...so I wonder (pure curiosity) how long will a 32*32 bit multiply take to complete with your system?"
For a 32 bit result (via the opcode one picks the lower 32 bits or the "extended" upper 32 bits of the result) it takes one "cycle", which, per thread, takes 8 (~200 MHz) clocks. But there are 8 threads sharing the pipeline, so if they are all multiplying (or doing anything else for that matter) the core can do one each at this rate, which adds up to ~200 MIPS. Think of it as 8 rigidly interleaved processors, each one running at ~25 MHz, sharing the same program and data space (which facilitates parameter passing & handshaking - and clobbering from rogue algorithms!).
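The arithmetic behind those figures can be sketched in a few lines of Python. This is a back-of-envelope model, not the core itself: the function names are invented for illustration, the multiply is assumed to be unsigned, and the strict round-robin slot order is an assumption about what "rigidly interleaved" means.

```python
CLOCK_HZ = 200_000_000    # ~200 MHz pipeline clock (figure from the post)
THREADS = 8               # threads sharing the pipeline
CLOCKS_PER_CYCLE = 8      # clocks per instruction "cycle" for one thread
MASK32 = 0xFFFFFFFF


def mul_lo(a, b):
    """Lower 32 bits of the 64-bit product (unsigned assumed)."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def mul_hi(a, b):
    """Upper ("extended") 32 bits of the 64-bit product (unsigned assumed)."""
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32


def per_thread_rate():
    """Instructions per second seen by a single thread."""
    return CLOCK_HZ // CLOCKS_PER_CYCLE


def aggregate_rate():
    """Instructions per second across all threads."""
    return per_thread_rate() * THREADS


def slot_owner(n_clocks):
    """Which thread owns each clock slot under strict round-robin (assumed)."""
    return [clk % THREADS for clk in range(n_clocks)]


low = mul_lo(0xFFFFFFFF, 0xFFFFFFFF)    # one opcode would pick this half
high = mul_hi(0xFFFFFFFF, 0xFFFFFFFF)   # a second opcode would pick this half
```

With these numbers each thread gets 25 million instruction slots per second and the eight together fill all 200 million, which is where the "8 processors at ~25 MHz" picture comes from.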
"But I think you may well have pushed the applicable digital technology ahead by a few years..."
Oh, I don't know, FPGA processor cores are fairly prevalent, and none of them are very fast or power efficient compared to ASIC processors. I've just designed one that I think I can live with a bit easier, so on some level it's reinventing the wheel. It doesn't really make sense to put a processor in an FPGA unless one needs the FPGA in the first place for an application, and there are complex functions that don't have to be performed all that quickly as part of that application (such as UI, linearizing, filtering, voice generation, etc.).
"Oh, I don't know, FPGA processor cores are fairly prevalent, and none of them are very fast or power efficient compared to ASIC processors. I've just designed one that I think I can live with a bit easier, so on some level it's reinventing the wheel." - Dewster
At present I have not seen any core within a configurable IC which gets close to the processing speed you are talking about. Sure, there are fast processors, DSPs, etc., but these come with a much higher price tag and are generally not as configurable as a core implemented in an FPGA, I think.
So I accept what you are saying in general terms, BUT when it comes to theremin development, I think it will be quite a long time before some "Arduino equivalent" board comes to market that is fast enough and easy enough to use that REAL digital theremins, functioning at an acceptable to 'pro' standard, could be produced by "average" engineers.
What I think you may have done is to take a cheap FPGA board and create a core for it which is capable of doing the job now.
And if this is the case, you deserve recognition. There aren't that many (if any other) engineers with the ability to design a multi-threading processor into a cheap FPGA, and who are doing this primarily to implement a digital theremin! Without you, this would probably never have been done, and those wanting a digital implementation of a theremin would have needed to wait until "simple" low-cost processor boards got to a level advanced enough to undertake the job when programmed using something like C++. Most people are not capable of the kind of "hacking" you are doing! ;-)