Go-DSP-Guitar multichannel multi-effects processor


Post by andrepxx »

I did not follow the whole conversation [..] but this caught my attention. In the jargon I am used to, sub-millisecond means "up to 1 ms worst case scenario".
The original announcement from the Google dev is here: https://groups.google.com/forum/#!msg/g ... aL0E8fAwAJ

I don't think the worst-case behaviour of the garbage collector is well-defined. I doubt that even the worst-case behaviour of the operating system's memory management (malloc, free, ...) is well-defined, which is why you're not supposed to call these from a "real-time" thread.

However, if you need "sub-millisecond" timing, you probably won't be able to run it on a general-purpose operating system anyways, including one based on the Linux kernel, since the scheduler's time slices would already be larger than a millisecond. Linux is not an RTOS. (And even a PC, from a hardware architecture perspective, is not really built for real-time. It has all sorts of things that make timing unpredictable: interrupts, caching, pipelining, speculative execution, multiple cores / hardware threads doing things concurrently, power-management techniques like dynamic frequency scaling, ...)

Even with real-time patches that give you the "fully preemptible kernel", the best you can hope for is being able to run code with hard (?) real-time guarantees in kernel space. So a device driver might be able to run in hard (?) real-time, for example to toggle an I/O pin in software and use that as a clock signal for some serial interface or to do some PWM. A user-level (POSIX) program - and this is what all common audio applications I know of are - will run under the preemptible Linux kernel and will therefore definitely not have hard real-time guarantees.
To put things into perspective, assume we are operating at a 48 kHz sample rate with a buffer size of 32 frames. That would mean that the audio callback of any application has less than 0.7 ms (32 / 48000 ≈ 0.67 ms) to process all the samples in the buffer.
I somehow doubt this is possible on a PC. I can barely run JACK at a buffer size of 64 operating at 96 kHz, but then I can do no processing, only use it as a "virtual wire". Any lower and it doesn't even start. I don't know of any "real" client I could attach to it when operating at this low of a buffer though. C code that just does a "memcpy(...)" from input to output might be fine. I haven't tried. But any real calculation, actually, nope.
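
Just to make those numbers concrete: the deadline for a process() callback is simply the buffer size divided by the sample rate. A tiny Go sketch (the helper name is mine, not anything from go-dsp-guitar):

Code: Select all

package main

import (
	"fmt"
	"time"
)

// bufferBudget returns the wall-clock duration of one buffer: the deadline
// a process() callback has to meet before the next buffer swap.
func bufferBudget(frames, sampleRate int) time.Duration {
	return time.Duration(frames) * time.Second / time.Duration(sampleRate)
}

func main() {
	fmt.Println(bufferBudget(32, 48000))  // 666.666µs, i.e. roughly 0.67 ms
	fmt.Println(bufferBudget(64, 96000))  // 666.666µs as well
	fmt.Println(bufferBudget(512, 96000)) // 5.333333ms
}
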
Like: maybe I will not be able to run go-dsp-guitar on a Raspberry Pi at 16 frames per buffer and 96 kHz with 8 channels.
A RasPi will very likely be too slow for go-dsp-guitar in real-time mode. We do distribute AArch64 binaries, but they're intended more for "batch mode". In fact, I never got JACK to work properly on a Raspberry Pi (3B), but I have to admit that I didn't try too hard.

I can do a buffer size of 512 samples at 96 kHz (which is ~5.3 ms) with 2 channels on an Intel Core i3 2310M with pre-amp and power-amp simulation using a filter order of 2048 (with 4096 being just "slightly too large" - we will probably optimize that bit away soon ;-) ), and personally, I consider that "fast enough".

With Guitarix, I can do 256 samples stable (so ~2.7 ms), and 128 samples with only rare xruns. However, their default cabinet simulation does not use convolution, so it's a bit of an apples-to-oranges comparison.

If I don't use convolution, I can go 256 samples (~2.7 ms) stable in go-dsp-guitar as well, and 128 samples with only rare xruns. Therefore, the performance of the two applications appears to be very similar. However, I'd claim that go-dsp-guitar "does more". It tends to have more sophisticated models, since we started off with a circuit-simulation approach and always put model accuracy and "physical motivation" over pure performance. Therefore, it is supposed to be quite a bit more computationally intensive. It has more inputs and outputs, it applies a "room simulation" to the signals, and it creates a stereo mixdown in the end. All calculations are done in "double precision" floating-point arithmetic. (Not sure if Guitarix uses single or double precision - not that it would matter a lot, but of course, double precision will use more cache and memory bandwidth.) And it is implemented in a higher-level language.
I do not really care, nor am I pretending that your comment means that you consider Faust totally useless, but my feeling about that attitude in general, independently of the degree to which it is manifested, is that it looks more like cultural resistance than something technically motivated.
I share your opinion that, from a user perspective, one should not really care what language some project uses to get its job done, as long as the results are fine. The language was just one of the reasons for me to start my own project instead of contributing to, say, Guitarix. I wanted to take a different route: use something more "mainstream" that I already had experience with and would not have to learn from scratch specifically for this project (and then probably produce really bad code in - at least in the beginning :mrgreen: ), and that I could hope more people would be familiar with (or, even if they're not familiar with it, at least it's similar to something they already know - folks coming from a Java or C or Python background tend to pick up Golang pretty easily).

The other thing (and that's probably the more important differentiator) is that the projects have a very different focus as well. You can remote-control go-dsp-guitar over a web interface (or from any application that can send JSON over HTTPS) and therefore run the actual signal processing on a "headless" machine. Guitarix has an optional web UI as an add-on, but you cannot control everything from it. Guitarix obviously has way more plugins than go-dsp-guitar. (It's also been around for quite a while and is therefore probably "more mature", has way more wrinkles ironed out, etc.) However, the main differentiator is, like I already mentioned, that go-dsp-guitar is built from the ground up following a rigorous simulation approach, and in fact this was the number one reason for starting a new project.

In the end, I don't like to compare go-dsp-guitar and Guitarix too much though. They're definitely different projects, which are supposed to have their own place and neither is supposed to replace the other, even though they obviously will have quite some overlap in functionality for the user. :mrgreen:

By the way, our first prototype was not implemented in Golang at all (and of course, it didn't have that project name back then), but rather in a mix of Python and C. (We tried to do the actual signal processing in C, the "control" in Python, and then do cross-language calls between them.) We did not manage to get any audio API working properly back then though. We tried PortAudio and RtAudio, but both of them gave us very obscure errors. We also tried "raw" ALSA, but that didn't work for real-time at all (latency would increase with the runtime of the application) and it also had some bugs. Two years later, we started over in Golang using JACK, and then it suddenly all worked pretty well. Of course, we still had a long way to go from there, but at least the I/O was working fine. There is actually still a repository called "audio-tools" on my GitHub, which is basically the leftover from the go-dsp-guitar Python + C "legacy". It's some tools I used to measure impulse responses. I never ported these to Golang, but they're not end-user relevant anyways. :mrgreen:
CrocoDuck wrote:After all, when we need to do mission critical DSP, we do not use computers at all but only dedicated chips. As far as DSP goes, audio is actually pretty relaxed in terms of requirements. I have worked with DSP systems that were allowed a worst case scenario latency of 20 microseconds, for example. Computers cannot keep up.
Exactly this! :mrgreen:

Regards.
Post by CrocoDuck »

CrocoDuck wrote:After all, when we need to do mission critical DSP, we do not use computers at all but only dedicated chips. As far as DSP goes, audio is actually pretty relaxed in terms of requirements. I have worked with DSP systems that were allowed a worst case scenario latency of 20 microseconds, for example. Computers cannot keep up.
Oh, by the way, to clarify: I am not saying optimisation is not important when working on computers. Quite the opposite: it is frustrating to see a web browser going slow as hell in this day and age of ludicrous computer specs. After all, writing good, optimised code (or trying to, in my case) is part of the fun. My point is more that in certain use cases the results might be equivalent.
andrepxx wrote:However, the main differentiator is, like I already mentioned, that go-dsp-guitar is built from the ground up following a rigorous simulation approach, and in fact this was the number one reason for starting a new project.
Cool stuff!

By the way, I wasn't suggesting that you should have started contributing to Guitarix. My point is more that domain-specific languages like Faust are not "evil" - mainly for the benefit of other readers. An application like the one you describe, with an emphasis on rigorous simulation, is well within the capabilities of Faust. Faust can compile to dedicated DSP hardware, like Bela boards, so I reckon it does a decent job at keeping the DSP code tight. In fact, I think it generates better C++ than I do in 99% of cases... I reckon control over the web could also be built around a Faust application, although that would mean embedding Faust-generated code into a bigger project, perhaps. But yeah, any language -> any list of pros and cons.
Post by merlyn »

andrepxx wrote:
To put things into perspective, assume we are operating at a 48 kHz sample rate with a buffer size of 32 frames. That would mean that the audio callback of any application has less than 0.7 ms (32 / 48000 ≈ 0.67 ms) to process all the samples in the buffer.
I somehow doubt this is possible on a PC. I can barely run JACK at a buffer size of 64 operating at 96 kHz, but then I can do no processing, only use it as a "virtual wire". Any lower and it doesn't even start. I don't know of any "real" client I could attach to it when operating at this low of a buffer though. C code that just does a "memcpy(...)" from input to output might be fine. I haven't tried. But any real calculation, actually, nope.
For the benefit of anyone reading: I wouldn't consider a buffer size of 32 frames at 48 kHz particularly demanding.

I have an old Athlon CPU and my system can do that. It is true that the lower the buffer size, the less processing can be done, but I can run Carla with TAL NoiseMaker at a buffer of 16 at 48 kHz.

When I was using KXStudio I could start JACK with a 16 buffer, but there was an immediate stream of xruns. My question then was: where is the bottleneck? Was it my soundcard? When KXStudio became outdated I switched to Arch Linux, and after some tweaking I could use a 16 buffer. So it wasn't my soundcard, not my CPU, not my hardware -- the bottleneck was in the configuration.

Ideally I want the CPU to be the limiting factor. This means the system will be able to make full use of new hardware when I upgrade, and then I will be able to do more with a 16 buffer. If the bottleneck lies elsewhere then there is no point upgrading my hardware.

A 64 buffer at 96kHz is OK, but if that's the best you can do it may be possible to improve the system with some configuration tweaks.
Post by andrepxx »

By the way, I wasn't suggesting that you should have started contributing to Guitarix. My point is more that domain-specific languages like Faust are not "evil" - mainly for the benefit of other readers. An application like the one you describe, with an emphasis on rigorous simulation, is well within the capabilities of Faust. Faust can compile to dedicated DSP hardware, like Bela boards, so I reckon it does a decent job at keeping the DSP code tight.
I know that this wasn't your point. I just wanted to mention why I went with a general-purpose language: mainly because (contrary to Faust) I was already confident in some of them, and I also thought that it might be useful to more people.

For example, there was no really well-performing FFT available in Golang. I could have tried creating a Golang wrapper around, say, FFTW (not sure about the performance implications of the cross-language call though), but I thought that a faster FFT implemented directly in Golang might be useful for many. So that basically "dropped out of the project" as a "side artifact" of choosing Golang.

But back to the reasons why I did not choose to use Faust ...

When I want to simulate a real system, I care about actual time, not samples, since the former is a real-world concept while the latter is an artifact of the simulation. However, I saw lots of Faust examples that were "sample-related", something like: "Here we delay the signal by 200 samples and then mix it together with the original to create a comb filter." Normally, I would not do that, since "a delay of 200 samples" does not mean a thing. Instead, I'd rather say: "Here we delay the signal by 5 ms and then mix it together with the original to create a comb filter." I had the impression that it might be hard to go "sample-agnostic" in Faust, or that I might otherwise be limited by the language in the end. (For me, "domain specific" means "not general-purpose", so it "cannot do everything". It may restrict what I am doing.) Probably I'm wrong about it and it is all possible (perhaps even relatively easy), but yeah ... I somehow decided against it at some point.
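
For what it's worth, staying "time-related" in a general-purpose language is just a conversion at the operating sample rate. A minimal Go sketch (the helper name is mine):

Code: Select all

package main

import "fmt"

// delaySamples converts a physically meaningful delay time in milliseconds
// into a delay-line length at the given sample rate (rounded to the
// nearest sample).
func delaySamples(delayMs float64, sampleRate int) int {
	return int(delayMs*float64(sampleRate)/1000.0 + 0.5)
}

func main() {
	// The same 5 ms comb-filter delay at two different sample rates:
	fmt.Println(delaySamples(5.0, 48000)) // 240 samples
	fmt.Println(delaySamples(5.0, 96000)) // 480 samples
}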

Also, I have no idea, if I were to do the signal processing in Faust and then control it from another language to provide a rather sophisticated web interface, how well that would play, how much work it would be, how much "glue code" would be required, etc.

Probably Faust is totally fine and it's just me not knowing. :mrgreen:
merlyn wrote:A 64 buffer at 96kHz is OK, but if that's the best you can do it may be possible to improve the system with some configuration tweaks.
Keep in mind that the 64 sample buffer will allow for no processing.

It's the I/O alone (switching to supervisor-mode, getting the samples from the interface over USB, getting the samples into memory, context-switching to user-mode JACKd, JACKd copying a buffer, sending samples back to the interface over USB) that takes about this amount of time.

It's not surprising, since that's only about two thirds of a millisecond (64 / 96000 ≈ 0.67 ms). Normally, scheduler time slices are already several milliseconds long (*), so what would you expect? In my opinion, this goes beyond "tweaking". The system as a whole is simply not designed for that. If it works, of course, great! However, I would expect that I have to go a good amount higher, so that the system has room to do anything useful before it has to get the results out again.

(*) There used to be a constant in the Linux kernel (HZ) that set the scheduler tick to 10 ms (HZ = 100). The constant is still in the code, but the tick frequency is configurable nowadays, and with "tickless" kernels the timer interrupt isn't even strictly periodic anymore. With a fixed 10 ms slice, audio stuff would be near impossible on Linux. :mrgreen:

To be honest, I already find it quite astonishing that a 5 ms buffer works on a PC, considering that even the "old" Line6 POD had a 10 ms buffer. Sure, that's 20th-century tech. However, it already had a dedicated DSP. And now we're already twice as fast (and sound a lot better and sample a lot higher - I'm sure the POD didn't sample at 96 kHz or beyond :mrgreen: ) on a regular PC, if we can work with a 5 ms (512 samples @ 96 kHz) buffer. And indeed, depending on the complexity of the model, I can normally go 512 or even 256 samples without problems. (*)

Man, I'm impressed! And you wanna go fifteen times faster still, with "less than 64 samples at 96 kHz"? I mean, "less than 64 samples" means 32 samples, since one normally uses a power of two, and 32 samples at 96 kHz is 1/3 of a millisecond. So that's 30 times less than the time a POD takes to process your signal. And your PC will have more than four times the amount of data to process in that time: at least double the sample rate (I doubt the original POD sampled faster than 48k) and double the channels (the POD had a single guitar input, whereas on a PC you'll normally use at least a stereo interface, and go-dsp-guitar provides two inputs per default in "real-time mode") - and probably even more than that, since we'll use a much higher bit depth on the PC as well (the POD probably had 16 bits, JACKd is 32-bit float, go-dsp-guitar is 64-bit float internally). :mrgreen:

And yeah, the complexity of the models probably doesn't compare either. :mrgreen:

Just some thoughts ...

... of course less latency is always better, but after some point, you'll get diminishing returns. My guitar teacher could tell, when he had his guitar plugged into the interface and was A/B-ing two signals - one with A/D -> D/A conversion and 64 samples of delay, the other "direct" - which of them was the "direct" signal, but this guy's a maniac. :mrgreen: For most "normal" use, I'd consider a 256 or 512 buffer size at 96 kHz totally fine, even for playing live. (It's like standing 2.5 feet or 5 feet from the amp, respectively.) For post-processing you can always go straight up to 8192 (I think that's the maximum JACK supports) or, in go-dsp-guitar, use the "batch processing mode", where there are no real-time constraints at all.

(*) Of course, you can basically make the model as complex as you like, and then it will take a long time to process. For example, go-dsp-guitar won't prevent you from applying a million-order (1048576 to be exact :mrgreen: ) FIR "filter" (at that order it would rather be a convolution reverb than a "filter", but anyhow :mrgreen: ), but of course that definitely won't run in real-time at any reasonable buffer size. :mrgreen:
Post by merlyn »

What's important in this live processing setup is the round trip latency. What you're quoting as latency is the time available to calculate a buffer. It is misleading that QjackCtl and Cadence use that as the latency figure.

Yes, at a 512 buffer there is 512/96000 = 5.333... ms available to calculate a buffer. The latency a user experiences is higher than that.

There is an input buffer and an output buffer, and at least two periods on the output buffer.

For JACK in async mode there is an extra buffer and the calculation is :

nominal latency = (number of frames per period/sample rate)*(2 + number of periods per buffer)

For 2 periods per buffer the latency is four times the latency of one buffer. In practice there is also hardware latency, so it goes up again.

Using a 512 sample buffer at 96kHz with two periods per buffer results in a round trip latency of over 20 ms: (512 / 96000) * (2 + 2) ≈ 21.3 ms.

It's possible to directly measure the round trip latency you're getting with jack_iodelay.
Post by andrepxx »

I'm pretty sure (though I must admit that I didn't measure it) that round-trip latency is exactly two buffer lengths at two periods per buffer, which is the minimum, by the way.

The reason why two is the minimum is the following. Let's just suppose for this example that the buffers contain four frames (samples) each. (One "frame" is actually one "sample per channel", but let's assume a single channel for simplicity, so one "frame" is one "sample".) For illustration, I will number the frames with a one-based index, so 1, 2, 3, ..., even though most programming languages will use a zero-based index when accessing a buffer.

I will use a drawing like this.

Code: Select all

Time (in buffers) | In front | In back | Out front | Out back
-------------------------------------------------------------
               -1 | N/A      | N/A     | N/A       | N/A
What happens is something like this.

1. Our hardware has a front and a back buffer for both input and output. The analog-digital converter (ADC) in our audio interface will measure voltage on its input and write the values into the input front buffer.

2. When it is finished doing that, it will swap front and back buffers (for both input and output, but we haven't looked at the output yet) and at the same time fire an interrupt, which triggers our processing cycle in JACK.

3. Our application will read the data from the input back buffer, process it and write the result to the output front buffer, while at the same time, the ADC will continue capturing samples into the input front buffer.

4. At the next interrupt, front and back buffers are swapped again. The digital-analog converter (DAC) in our audio interface will output the voltage levels indicated by the samples in the output back buffer, while our application processes the samples captured one cycle earlier and the ADC captures the current ones. So the delay is two buffer lengths. I will illustrate this graphically.

Before our hardware has acquired the first buffer, the input buffers are empty (uninitialized), no audio block is being processed, and the output buffers are empty (probably zeroed, but that doesn't matter) as well.

Code: Select all

Time (in buffers) | In front | In back | Out front | Out back
-------------------------------------------------------------
               -1 | N/A      | N/A     | N/A       | N/A
Now, our analog-digital converter (ADC) runs and acquires its first buffer. At each cycle of the sample clock, it stores a single voltage reading in the input front buffer. By the time our first buffer has been acquired (which took exactly four sample-clock cycles), things will look like this.

Code: Select all

Time (in buffers) | In front | In back | Out front | Out back
-------------------------------------------------------------
               -1 | N/A      | N/A     | N/A       | N/A
                0 | 1234     | N/A     | N/A       | N/A
Now, front and back buffers are swapped. (This happens "instantly", so it takes "no time".) The ADC continues capturing into what is now the front buffer, while our application processes samples from the input back buffer and writes the results into the output front buffer. The application has to finish processing the samples from the input (back) buffer and fill the output (front) buffer before the cycle ends, or we will get an xrun. Therefore, we can be certain that (unless we got an xrun) the samples that were acquired during the last cycle will have moved to the output front buffer by the time the current cycle ends, while the next samples have been captured by the ADC into the input front buffer. The situation now looks like this.

Code: Select all

Time (in buffers) | In front | In back | Out front | Out back
-------------------------------------------------------------
               -1 | N/A      | N/A     | N/A       | N/A
                0 | 1234     | N/A     | N/A       | N/A
                1 | 5678     | 1234    | 1234      | N/A
Now front and back buffers (for both input and output) are swapped again and the next interrupt fires. The ADC continues acquiring into the input front buffer; our application continues reading samples from the input back buffer, processing them and writing the result into the output front buffer; and at the same time, the DAC starts playing back the output back buffer. The ADC and DAC run from the same sample clock (they have to, otherwise they'd drift), so at the moment the ADC acquires the first sample into the input front buffer, the DAC outputs the first sample from the output back buffer, and so on.

Code: Select all

Time (in buffers) | In front | In back | Out front | Out back
-------------------------------------------------------------
               -1 | N/A      | N/A     | N/A       | N/A
                0 | 1234     | N/A     | N/A       | N/A
                1 | 5678     | 1234    | 1234      | N/A
                2 | 9012     | 5678    | 5678      | 1234
Now let's take a look at latency. Sample number 1 was captured at the beginning of cycle 0 and played back right at the beginning of cycle 2. Sample number 4 was captured at the end of cycle 0 and played back right at the end of cycle 2. So the latency is exactly two buffers.
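
If you want to convince yourself in code, here is a toy Go sketch of the cycle above ("processing" is just an identity copy; all names are mine):

Code: Select all

package main

import "fmt"

const frames = 4

func main() {
	var inFront, inBack, outFront, outBack []int

	next := 1 // next sample number the ADC will capture
	for cycle := 0; cycle <= 2; cycle++ {
		// The ADC fills the input front buffer during this cycle ...
		inFront = make([]int, frames)
		for i := range inFront {
			inFront[i] = next
			next++
		}
		// ... while process() copies the previous input (now in the back
		// buffer) to the output front buffer, and the DAC plays whatever
		// is in the output back buffer.
		outFront = append([]int(nil), inBack...)
		fmt.Printf("cycle %d: ADC captured %v, DAC played %v\n",
			cycle, inFront, outBack)

		// Interrupt: swap front and back buffers on both sides.
		inFront, inBack = inBack, inFront
		outFront, outBack = outBack, outFront
	}
	// Output: samples 1234 are captured in cycle 0 and played in cycle 2,
	// i.e. exactly two buffer lengths later.
}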

By the way, the figures I gave in my last post were buffer sizes through and through, so the corresponding round-trip latencies will be double those figures. However, I used the same "definition" throughout, so at least it's not "apples to oranges". :mrgreen:
Post by CrocoDuck »

Again, just for the benefit of other readers (not attempting any conversion here).
andrepxx wrote:When I want to simulate a real system, I care about actual time, not samples, since the former is a real-world concept while the latter is an artifact of the simulation. However, I saw lots of Faust examples that were "sample-related", something like: "Here we delay the signal by 200 samples and then mix it together with the original to create a comb filter." Normally, I would not do that, since "a delay of 200 samples" does not mean a thing. Instead, I'd rather say: "Here we delay the signal by 5 ms and then mix it together with the original to create a comb filter." I had the impression that it might be hard to go "sample-agnostic" in Faust, or that I might otherwise be limited by the language in the end.
The operating sample rate is exposed in Faust, so you can go from time to samples (and vice versa) trivially, as you would in any other language. To be honest, I'd say that samples are a unit of time as arbitrary as seconds, and in the context of DSP they typically offer a higher degree of generalization (akin to that of normalized frequency), so I do not quite see the point about the "artifact of the simulation": physics works in any system of units. Either way, it is definitely the least serious problem I can think of when modelling a continuous-time system in a discrete-time world. But I guess this was just one example.
andrepxx wrote:(For me, "domain specific" means "not general-purpose", so it "cannot do everything". It may restrict what I am doing.)
That's the idea. Faust is there to take care of the DSP side of things. In terms of DSP, I can hardly think of anything it could be restrictive for.
andrepxx wrote:Also, I have no idea, if I were to do the signal processing in Faust and then control it from another language to provide a rather sophisticated web interface, how well that would play, how much work it would be, how much "glue code" would be required, etc.
This is where my knowledge lacks the most. In the most typical use case, Faust will generate C++ classes which implement ready-to-use audio callbacks. You can use them to implement the DSP processing of your program. I think this is how it is used most often: you have a C++ application and you include in it C++ generated from Faust to handle the audio DSP. The Faust compiler can compile to all sorts of targets, including JUCE applications, SOUL code, WebAudio+wasm... But I do not think it can generate Go code. So, if you want to use Go for the rest of the application, perhaps there aren't many advantages.
andrepxx wrote:I'm pretty sure (though I must admit that I didn't measure it) that round-trip latency is exactly two buffer lengths at two periods per buffer, which is the minimum, by the way.
It is pretty much always larger than that. Some of the latency is hardware dependent. My old Scarlett 2i4 had mismatched latency between channels, for example, while a friend's old M-Audio device had a much higher latency, possibly due to being a USB 1.1 device?

Fun fact: some of the latency figure you get, whatever the measurement method, comes from the average group delay of the analog stuff included in the measurement loop - averaged across the bandwidth of the test signal.

I wrote about latency perception on my old blog, by the way:

https://thecrocoduckspond.wordpress.com ... d-mythske/
https://thecrocoduckspond.wordpress.com ... different/
https://thecrocoduckspond.wordpress.com ... ive-study/

There is some experimental evidence that latency perception is affected by many factors, including the instrument itself, for various psycho-physiological reasons. Artifacts can be heard at latencies as low as 4.5 ms when playing guitar, but it is only at around 15 ms that the delay becomes clearly noticeable. Other instruments have other time thresholds.
Post by andrepxx »

The Faust compiler can compile to all sorts of targets, including JUCE applications, SOUL code, WebAudio+wasm... But I do not think it can generate Go code. So, if you want to use Go for the rest of the application, perhaps there aren't many advantages.
Well, the advantage I would hope for when using Faust is that it would be faster, since it uses specialized abstractions for signal processing and is said to generate quite optimized code. One can call from Go into C (and the other way round - that's how the JACK binding does it, for example), so one could in principle implement the signal processing in Faust (either all of it, or just parts, for example the most "expensive" effects units, which would be the convolution-based "power-amp" simulation) while keeping the other stuff in Go.

However, a cross-language call is expensive since, as far as I'm aware, it involves a context switch.

One of the first things the Go runtime does when it gets called from another language (or when a call into another language returns) is "acquire an M". An "M" is a thing related to scheduling. Go uses the following terms when it comes to scheduling.

- A "P" is a "processor". It is how the Go developers call a hardware thread.
- An "M" is a "machine". It is how the Go developers call an operating-system thread.
- A "G" is a "goroutine". It is how the Go developers call an application-level thread.

There are at most as many running "M"-s as there are "P"-s, but of course, there can be "M"-s waiting, either blocking on a system call or waiting for the operating-system scheduler to give them a time slice. There can be fewer "M"-s than "P"-s if your application has fewer "G"-s than there are "P"-s. There can also be more "M"-s than "P"-s, since the runtime will spawn a new "M" when there are more "G"-s than running "M"-s and another "M" gets blocked, for example on a system call.

So, "acquiring an M" basically means "creating an operating-system thread" (unless there is an idle one "pooled" that can be reused, but we cannot rely on that), so that's a rather expensive operation. As a result, one would want to minimize cross-language calls. Therefore only implementing the effects units themselves in Faust and doing the rest (especially the orchestration) in Go would probably be slow and inefficient, since it would involve two context-switches for every effects unit in the signal chain.

So the first viable option would probably be to implement all effects units and also do the orchestration (and also the I/O, but I think that is generated) in Faust, so basically the entire "JACK processing callback" would be in Faust, and then just control it from Go code.

The other option would be to implement only the most computationally expensive effects unit(s) (which would be the "power amp" unit) in Faust and hope that the reduction in runtime within that unit is larger than the overhead of the cross-language call. However, that's quite an inelegant solution since, for real-time, you really want to avoid context switches, and a cross-language call will involve one.

It's said that cross-language calls between Go and C are cheap compared to, say, JNI, or calls between Python and C, but there's still considerable overhead involved.
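
If someone wants to get a feel for that overhead, here is a rough Go sketch of a timing loop (a crude measurement, not a rigorous benchmark - the C function is a made-up no-op):

Code: Select all

package main

/*
static int identity(int x) { return x; }
*/
import "C"

import (
	"fmt"
	"time"
)

func main() {
	const n = 1000000
	start := time.Now()
	for i := 0; i < n; i++ {
		C.identity(C.int(i)) // each call crosses the Go / C boundary once
	}
	fmt.Printf("%v per cgo call\n", time.Since(start)/n)
}
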
CrocoDuck wrote:I wrote about latency perception on my old blog, by the way:
That was very interesting to read, actually. Judging from these numbers, it appears that, contrary to what my guitar teacher said, I'm not too far away from achieving "good" / useful latency properties, especially keeping in mind that I'm developing / testing on a rather old system with things like networking and a graphical desktop running.

Like I said, I already identified some points for further optimization, which I hope will bring the application well within the 2.5 - 5 ms realm with convolution-based "power amp simulation" enabled, when running on the machine I use for development. And then, if one were to "throw more hardware at it" or even use a dedicated system without a UI, I'd expect one could get really good results out of it.

Perhaps also an alternative IIR-based "amp sim" would be nice to have, as it would very likely be less computationally expensive than the convolution-based solution.
Post by merlyn »

@andrepxx If I understand you correctly, you are describing the internal buffers in your application. In my model of how things work, it's JACK that determines the latency and deals with the hardware.

I have measured my latency with jack_iodelay and the formula I posted above holds good. The number of periods per buffer only applies to playback. That's why it's (2 + number of periods per buffer). To get my latency as low as possible I run JACK in sync mode, and the formula for that is :

nominal latency = (frames per period/sample rate) * (1 + number of periods per buffer)

There's one period of input buffer and two periods of output buffer. In async mode there's an extra period on the output which, according to falkTX, allows buffers from misbehaving apps to be dumped.

According to jack_iodelay I get a round trip latency of 1.138 ms with a 16 buffer at 96kHz. Does that agree with the formula?

(3 * 16) / 96000 = 0.5 ms

No, it doesn't. Where is the rest of the latency coming from? It's my hardware latency. My soundcard has 30 samples of latency on the input and 30 samples of latency on the output. Trying again including the hardware latency :

(48 + 60) / 96000 = 1.125 ms

It's not exact, but they both round to 1.1 ms, and @CrocoDuck mentioned above that the analogue part introduces some measured latency. It's out by less than one period (16 / 96000 = 0.167 ms), so the formula holds good.
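
Both formulas fit in a few lines of Go, for anyone who wants to play with the numbers (the function name is mine):

Code: Select all

package main

import "fmt"

// nominalRoundTrip implements the formulas above:
//   sync mode:  (frames per period / sample rate) * (1 + periods per buffer)
//   async mode: (frames per period / sample rate) * (2 + periods per buffer)
func nominalRoundTrip(frames, rate, periods int, async bool) float64 {
	base := 1
	if async {
		base = 2
	}
	return float64(frames) / float64(rate) * float64(base+periods)
}

func main() {
	// The measurement above: 16 frames, 2 periods, 96 kHz, sync mode,
	// plus 30 frames of hardware latency on input and 30 on output.
	nominal := nominalRoundTrip(16, 96000, 2, false)
	hardware := float64(30+30) / 96000
	fmt.Printf("nominal: %.3f ms\n", 1000*nominal)            // 0.500 ms
	fmt.Printf("with hw: %.3f ms\n", 1000*(nominal+hardware)) // 1.125 ms
}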
Post by CrocoDuck »

merlyn wrote:My soundcard has 30 samples of latency on the input and 30 samples of latency on the output.
That's interesting. How did you figure that out? In the past I could work out how many samples of mismatch different channels had, but I couldn't be sure of the input and output latency.

By the way, this uses a chirp for latency measurement: https://lsp-plug.in/?page=manuals&section=latency_meter. That means it averages the analog group delay over a broader range of frequencies (up to half the sample rate, if I remember correctly) with respect to jack_iodelay. It could be interesting to see how it compares with jack_iodelay. Good soundcards should be phase-linear, so results should not differ between Latency Meter and jack_iodelay for good soundcards (which really are the norm nowadays). I remember this was the case with my hardware while I was developing the Latency Meter.
Post by CrocoDuck »

merlyn wrote:There's one period of input buffer and two periods of output buffer. In async mode there's an extra period on the output which, according to falkTX, allows buffers from misbehaving apps to be dumped.
Ah yeah, the async thing. Got me confused when I was noob(er). Here is the thread: https://linuxmusicians.com/viewtopic.ph ... y+confused. Unfortunately falkTX's replies are gone...

EDIT: Actually in that post I was confused by something else.
Post by merlyn »

CrocoDuck wrote:That's interesting. How did you figure that out?
I was assuming that the backend arguments compensate for hardware latency. Here is the output of jack_iodelay without using extra latency :

Code: Select all

 109.250 frames      1.138 ms total roundtrip latency
	extra loopback latency: 61 frames
	use 30 for the backend arguments -I and -O Inv
I use Cadence to configure JACK and put '30' into the 'Extra Latency' fields and then get this :

Code: Select all

   110.250 frames      1.148 ms total roundtrip latency
	extra loopback latency: 2 frames
	use 1 for the backend arguments -I and -O Inv
OK, so that's new :) I was expecting zero. I'll change the extra latency to 31 frames and get this :

Code: Select all

   110.250 frames      1.148 ms total roundtrip latency
	extra loopback latency: 0 frames
	use 0 for the backend arguments -I and -O Inv
I'm assuming that the extra latency is from the hardware. Now I have 31 frames hardware latency.

I'll try the Latency Meter too. Thanks for the suggestion.
Post by CrocoDuck »

merlyn wrote:I was assuming that the backend arguments compensate for hardware latency.
Ah yeah, that's it. That's what I do too. However, there is no guarantee that the ADC and DAC latencies are the same, so it might be any combination that sums to 61. I thought you had managed to figure out exactly what your device is doing, somehow. I think equal ADC and DAC latency is an OK assumption, and it really does not matter much in practice: in the end you will compensate for the same round-trip latency anyway.
Post by tramp »

andrepxx wrote:With Guitarix, I can do 256 samples stable (so ~2.7 ms), and 128 samples with only rare xruns. However, their default cabinet simulation does not use convolution, so it's a bit of an apples-to-oranges comparison.
That's wrong: the cabinet simulation in guitarix is convolution based.
andrepxx wrote:The other thing (and that's probably the more important differentiator) is that the projects have a very different focus as well. You can remote-control go-dsp-guitar over a web interface (or from any application that can send JSON over HTTPS) and therefore run the actual signal processing on a "headless" machine. Guitarix has an optional web UI as an add-on, but you cannot control everything from it.
For guitarix we prefer MIDI CC as the control mechanism, as that allows you to use foot controllers. But anyhow, you could as well remote-control it with the default UI running on a different machine. You could even run multiple control UIs on multiple PCs to control the same engine, or multiple engines. For that you could use a Bluetooth connection, or a HTTPS connection via Avahi (handshake). The web UI is meant to be used on a tablet PC or on a phone, and is therefore designed for small interfaces.
btw, we also provide a Python interface so that people can create their own control interfaces with an Arduino or whatever they prefer. That is what most people use when running guitarix on a Pi.
andrepxx wrote:Perhaps also an alternative IIR-based "amp sim" would be nice to have, as it would very likely be less computationally expensive than the convolution-based solution.
Funny thing aside, that's what we do with our dkbuilder: it creates IIR filters based on a given circuit. Here is an example of a power-amp simulation:
https://linuxmusicians.com/viewtopic.php?f=44&t=19806
Post by andrepxx »

merlyn wrote:@andrepxx If I understand you correctly, you are describing the internal buffers in your application. In my model of how things work, it's JACK that determines the latency and deals with the hardware.
No.

go-dsp-guitar does no internal buffering. As you said, that's what JACK has to take care of. (JACK will actually "delegate" a lot of that to lower levels too, though.)

Well, of course, technically it uses internal buffers for its calculations, but these do not "delay things". When JACK calls the process() callback of go-dsp-guitar and there is a Dirac pulse at position N in the input buffer it passes in, then after process() returns, there will be a response beginning at position N: either still a Dirac pulse at position N if, for example, only gain was applied, or a "smeared" response starting at position N if (also) filtering was applied.

So go-dsp-guitar will not introduce delay on its own. (It might introduce phase shift, though. All filters do that.)

Also, JACK itself adds no delay. See here: https://jackaudio.org/faq/no_extra_latency.html

So yeah, what determines your latency is the size of your hardware buffers, which I assume is also what you control in the end when you select the buffer size in JACK. After all, that's why CPU / system load decreases when you make the buffer size larger: the hardware (audio interface) will transfer the samples to the computer in larger chunks, so the kernel gets woken up less often (and invokes client applications less often for processing). On the computer side, no additional buffering has to happen. The computer runs asynchronously to the actual audio capture / playback process (and will normally run much faster than it), so it can completely process everything and write the results back into the hardware buffers before the next "hardware buffer swap".

Also, if you chain multiple applications in JACK, their process() callbacks will all be invoked one after the other (when the first application's process() returns, the second application's process() gets invoked and is passed the buffer that the first application wrote to as its input, and so on) until the signal arrives at the edge of the graph, at a hardware output. Therefore, chaining multiple applications in JACK does not increase latency either. It just increases CPU load, as more code has to run in each cycle.

It's the same internally with go-dsp-guitar. It causes no latency, just CPU load. Of course, adding effects units to the signal chains does not increase latency either. The block of samples / frames runs through all units in a single process() cycle. This is not a "pipelined" architecture, as we want to optimize for latency, not throughput, when we do (real-time) audio.
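
A toy Go sketch of that idea ("clients" are just in-place gain stages; all names are mine):

Code: Select all

package main

import "fmt"

// processFunc mimics a JACK process() callback operating in place on a block.
type processFunc func(buf []float64)

// gain returns a callback that scales every sample by g.
func gain(g float64) processFunc {
	return func(buf []float64) {
		for i := range buf {
			buf[i] *= g
		}
	}
}

func main() {
	// An impulse at position 2 of the block.
	buf := []float64{0, 0, 1, 0}

	// One cycle: every client in the chain runs back-to-back on the same
	// block before it is handed to the hardware. Chaining adds CPU work
	// per cycle, but no extra buffers of delay.
	for _, p := range []processFunc{gain(0.5), gain(2)} {
		p(buf)
	}

	fmt.Println(buf) // [0 0 1 0]: the impulse is still at position 2
}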

As a side note, things actually work pretty similarly in the video / graphics realm. Your applications draw into frame buffers for their individual windows, the window manager composites those into a single large frame buffer for the entire screen and writes the result to video memory. The hardware then waits until the next V-sync, swaps front / back buffer and displays the new frame immediately.
tramp wrote:That's wrong: the cabinet simulation in guitarix is convolution based.
I just had a look at the code and it appears that you're right (though, obviously, I did not follow the program flow all the way through - too large a code base :mrgreen: ). I found that it uses much shorter IRs, so that's probably where much of the higher efficiency at lower buffer sizes comes from. So yeah, I stand corrected.
tramp wrote:Funny thing aside, that's what we do with our dkbuilder
That looks a lot like what I was trying to achieve with go-dsp-guitar. However, I didn't go all the way down to the component level, as it was too much work, and I suspected it wouldn't compute in real-time anyway. General-purpose circuit simulation like SPICE doesn't work in real-time, since the step size is dynamic and can become arbitrarily small, so one has to cut some corners. But huge congrats for making that work!