j_e_f_f_g wrote:
Since male can't do anything useful beyond whining "you're wrong, loser", I directly put the question to you: Do you deny that jack prefetches (in a secondary -- not the dma -- buffer) the client's data for the next dma cycle?
It is clear that you like to play games with words and meanings, so I don't want to answer your question directly as worded. Oh, all right then: yes, I deny it.
JACK is woken up by the audio interface hardware using the same mechanism that eDrummer uses (the ALSA one). The interface hardware does this because its buffer pointer has just passed a position where one period's worth of data/space is now available to user space for reading/writing. You can't change this (even if there are better ways to do it) because it is baked into ALSA's driver model.
Let's make this concrete. If you configured the device to use 2 periods of 16 samples each, there is a h/w buffer of 32 samples. Each time the h/w pointer passes the start/end or the midpoint of this buffer, it interrupts the CPU, then gets back to processing the upcoming chunk of 16 samples. It is the responsibility of the CPU/system/kernel/application code to read/write the next 16 samples while the audio interface processes the other 16. ALSA provides you with no access to the 16-sample section of the buffer that the hardware is about to process, only to the other 16 that have just been written to/read from by the hardware.
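In case it helps to see what that looks like in code, here is a rough sketch of asking ALSA for that 2-periods-of-16-frames layout via the hw_params API. The "hw:0" device name, S16 format and stereo channel count are just placeholder choices, and most real hardware will refuse periods this small and round them up to something it supports:

```c
/* Rough sketch: requesting the 2-periods-of-16-frames layout described
 * above. "hw:0", S16 and stereo are placeholder choices; most real
 * hardware will refuse sizes this small and round them up. */
#include <alsa/asoundlib.h>

int open_and_configure(snd_pcm_t **pcmp)
{
    snd_pcm_t *pcm;
    snd_pcm_hw_params_t *hw;
    snd_pcm_uframes_t period = 16;  /* frames per period */
    unsigned int periods = 2;       /* two halves of the h/w buffer -> 32 frames total */
    unsigned int rate = 48000;
    int dir = 0;

    if (snd_pcm_open(&pcm, "hw:0", SND_PCM_STREAM_PLAYBACK, 0) < 0)
        return -1;

    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);
    snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(pcm, hw, 2);
    snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, &dir);
    snd_pcm_hw_params_set_period_size_near(pcm, hw, &period, &dir);
    snd_pcm_hw_params_set_periods_near(pcm, hw, &periods, &dir);

    if (snd_pcm_hw_params(pcm, hw) < 0) {
        snd_pcm_close(pcm);
        return -1;
    }
    *pcmp = pcm;
    return 0;
}
```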
So, you now have 16 samples' worth of time to do whatever processing is necessary to get the other 16 samples set up in the DMA buffer. You can do this any way you want to, and take as much time as you want, as long as you don't exceed the time represented by 16 samples. You can copy your data a gazillion times before finally putting it in the DMA buffer, just as long as you get it there by the time the pointer hits the next period boundary. Nothing that you can do in your processing code will change the latency of the system - in 16 samples from the point of the interrupt, the h/w will start processing data that it assumes you put in the DMA buffer.
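And here, roughly, is that per-period duty cycle as a native ALSA app might write it, continuing the sketch above. render_next_period() is a made-up placeholder for "whatever processing is necessary", and real code would check more return values:

```c
/* Rough sketch of the per-period duty cycle for the setup above:
 * wake up when the kernel says a period of space is available,
 * compute the next 16 frames however you like, and hand them over
 * before the h/w pointer reaches them. render_next_period() is a
 * hypothetical placeholder for the actual processing. */
#include <alsa/asoundlib.h>

void render_next_period(short *buf, snd_pcm_uframes_t frames); /* hypothetical */

void run(snd_pcm_t *pcm)
{
    const snd_pcm_uframes_t period = 16;
    short buf[16 * 2];              /* 16 frames, 2 channels, S16 */

    for (;;) {
        /* blocks until the interrupt described above has made at
           least one period of space available to user space */
        snd_pcm_wait(pcm, -1);

        /* copy the data around as many times as you like here,
           as long as this finishes within one period's time */
        render_next_period(buf, period);

        if (snd_pcm_writei(pcm, buf, period) < 0)
            snd_pcm_prepare(pcm);   /* recover from an underrun (xrun) */
    }
}
```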
And JACK does exactly that. Yes, clients may write/read to/from all manner of other buffers during the process callback cycle, but at the end of it, the data from all clients ends up back in the DMA buffer, just as it would if they were all native ALSA applications. The interrupt comes in, JACK wakes up, does a bit of housekeeping, the backend reads the incoming data from the DMA buffer into buffers that clients can access, then wakes up all the clients, who write into other buffers, then the backend writes data from those buffers back into the DMA buffer and JACK goes back to sleep.
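For comparison, this is more or less what the client side of that cycle looks like through the JACK API - the client never sees the DMA buffer, it only fills the port buffer JACK hands it in process(), and the backend does the final copy. Client and port names here are arbitrary:

```c
/* Rough sketch of a JACK client taking part in the cycle described
 * above: it never touches the DMA buffer, it just fills the port
 * buffer JACK hands it in process(); the backend copies the result
 * back into the DMA buffer. Client and port names are arbitrary. */
#include <jack/jack.h>
#include <string.h>
#include <unistd.h>

static jack_port_t *in_port, *out_port;

static int process(jack_nframes_t nframes, void *arg)
{
    jack_default_audio_sample_t *in  = jack_port_get_buffer(in_port,  nframes);
    jack_default_audio_sample_t *out = jack_port_get_buffer(out_port, nframes);

    /* the "processing": here just a pass-through copy */
    memcpy(out, in, nframes * sizeof(jack_default_audio_sample_t));
    return 0;
}

int main(void)
{
    jack_client_t *client = jack_client_open("passthru", JackNullOption, NULL);
    if (!client)
        return 1;

    in_port  = jack_port_register(client, "in",  JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput,  0);
    out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);
    jack_set_process_callback(client, process, NULL);
    jack_activate(client);

    for (;;)
        sleep(1);   /* all real work happens in process(), driven by the h/w interrupt */
    return 0;
}
```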
No pre-fetching, no delay. Only "right now", just as with ALSA-native (or ASIO-native or CoreAudio-native or whatever-native) applications.
The term "double buffering", when used by people who understand audio device driver implementation and design, describes precisely the scheme I've outlined above. To put it in more colloquial terms: the hardware divides the h/w ("DMA") buffer into two halves. It uses one half while the CPU gets to "use" the other. Every time it crosses the boundary between the two halves, it tells the CPU. The same design has historically been used in video interfaces too.
Now, there *are* things that one could ask for in an audio interface driver design that would change this mechanism a bit. In particular, you might want to be able to write/read closer to the h/w pointer than the smallest interrupt interval that the device will support (for example, many PCI interfaces won't go below 64 samples between interrupts, which thus defines a lower limit on the latency they can support with this kind of mechanism). If you knew where the h/w pointer was at any time, not just when the device sends an interrupt, you could figure out a way to do this.
And indeed, you could decouple the wakeup from the h/w, for example by using a DLL (delay-locked loop) and a very accurate system clock. CoreAudio on OS X does this, allowing different apps to use the same hardware but be woken at different intervals, thus allowing them to have different latencies even though they all share a single hardware setup. The DLL allows you to predict/guess/know where the h/w pointer is at any point in time with great accuracy, and so rather than wait for the hardware itself to tell you "time to read/write more data", you can just wake up at various intervals and "know" exactly how much data/space there is available to read/write. Of course, CoreAudio isn't super-confident of the accuracy of its DLL, and so they add a "safety buffer" - basically they waffle a bit on where the DLL predicts the h/w pointer is, to ensure that if the DLL isn't completely accurate, underruns/overruns won't happen. Typically, safety buffer sizes are 8 samples - the size is determined by the device driver. This design means that you can safely write pretty much anywhere in the h/w ("DMA") buffer, because you know very accurately where the h/w pointer is. If the DLL and the hardware are solid enough, you could even write into the DMA buffer just 1 sample ahead of the h/w pointer, generating just 1 sample's worth of output latency. Unfortunately, for consumer-level cards this is a theoretical possibility only, because of the lower limit on the PCI burst transfer size - you need to cook up at least 64 bytes of data to do the DMA transfer.
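If you're curious what such a DLL looks like, here is an illustrative second-order delay-locked loop of the kind described in the audio literature: feed it the system time observed at each period boundary and it predicts the next one, from which you can interpolate where the h/w pointer is "right now". The constants and names below are mine, not CoreAudio's or any driver's:

```c
/* Illustrative second-order delay-locked loop: fed the system time
 * observed at each period boundary, it predicts the next boundary.
 * Constants and names are made up for this example. */
#include <math.h>

typedef struct {
    double b, c;     /* loop coefficients */
    double e2;       /* filtered period duration estimate (seconds) */
    double t0, t1;   /* estimated times of the current and next boundary */
} dll_t;

void dll_init(dll_t *d, double period_secs, double bandwidth_hz, double now)
{
    double omega = 2.0 * M_PI * bandwidth_hz * period_secs;
    d->b  = sqrt(2.0) * omega;
    d->c  = omega * omega;
    d->e2 = period_secs;
    d->t0 = now;
    d->t1 = now + period_secs;
}

/* call once per period boundary, with the system time at which it was observed */
void dll_update(dll_t *d, double now)
{
    double e = now - d->t1;          /* prediction error */
    d->t0 = d->t1;
    d->t1 += d->b * e + d->e2;
    d->e2 += d->c * e;
}

/* estimate how far through the current period the h/w pointer is (0..1) */
double dll_phase(const dll_t *d, double now)
{
    return (now - d->t0) / (d->t1 - d->t0);
}
```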
But ... ALSA doesn't offer such a mechanism. It doesn't have a DLL to predict the h/w pointer position - ALSA is entirely dependent on the interrupts to know that. Moreover, there is hardware where any attempt to access the part of the DMA buffer actively in use by the h/w will fail. And worse, there are still audio interfaces floating around out there where there is no DMA buffer at all - the device driver has to actively move data to and from the card via I/O ports, and there is no DMA access for either the kernel or user space. ALSA hides this from you in its API - you don't actually have any idea whether the "DMA" buffer that you get access to from user space is actually memory associated with DMA, or just another internal buffer in ALSA that it will eventually need to read/write to the device via some other mechanism. Thankfully, such audio interfaces are now rare, but even in the early 2000s the fact that an audio interface could be a DMA bus master was a big deal.
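From user space, about all you can ask ALSA is whether mmap-style access to the device's buffer is offered at all - and even then, the memory you map may be a driver-internal buffer rather than real DMA memory. Something like this (illustrative only):

```c
/* Illustrative only: ask ALSA whether mmap-style access is offered.
 * Even when it is, whether the mapped memory is real DMA memory or a
 * driver-internal buffer remains hidden from user space. */
#include <alsa/asoundlib.h>

int offers_mmap_access(snd_pcm_t *pcm)
{
    snd_pcm_hw_params_t *hw;
    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);
    /* returns 0 if this access type could be configured */
    return snd_pcm_hw_params_test_access(pcm, hw, SND_PCM_ACCESS_MMAP_INTERLEAVED) == 0;
}
```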
In summary, the only way you get to read/write data from/to an audio interface is when the kernel tells you it is time to do so. The kernel has its own mechanism for this, and with ALSA on Linux, that mechanism is based on interrupts from the device. When an interrupt comes in and your application is woken up to do I/O, you can only read/write the next period's worth of data if you want to read/write it directly from/into the DMA buffer. As long as you get that done before the next period elapses, everything is good - no data is lost, no clicks are heard. But this (1 period) is the lower limit for latency on an ALSA-driven system, and JACK and its clients match this limit just like any native ALSA app would (admittedly with a bit more CPU overhead) - they read/write the relevant data within the next period, and the data they generate is in the DMA buffer at the end of it, ready for use by the hardware.
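To put a number on that lower limit: with the 64-sample interrupt interval mentioned earlier and a sample rate of, say, 48 kHz, one period is 64 / 48000 ≈ 1.33 ms, and that is the best latency this mechanism can deliver on such a device - for JACK and for a native ALSA app alike.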
(Note: the above discussion does not relate to JACK2's "async" mode, which, as has been noted, is optional and not available in JACK1.)