cyclictest


merlyn
Established Member
Posts: 324
Joined: Thu Oct 11, 2018 4:13 pm

cyclictest

Postby merlyn » Tue Dec 18, 2018 2:33 pm

Have you used cyclictest?

What options do you use?

The linuxaudio.org wiki suggests this:

Code:

cyclictest -t1 -p 80 -n -i 10000 -l 10000 -m

That doesn't work with cyclictest 1.30. This does:

Code:

cyclictest -t1 -p 80 -i 10000 -l 10000 -m

The -n option is the problem. It seems clock_nanosleep is now the default, so passing -n makes the command fail.
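For anyone wondering what cyclictest actually measures: each test thread sleeps until an absolute deadline with clock_nanosleep() and records how late it really woke up. Below is a minimal C sketch of that idea, not cyclictest's own source. The function name is made up for illustration, and the sketch runs at normal priority without locking memory, whereas -p 80 and -m in the commands above add SCHED_FIFO priority and mlockall().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/* Hypothetical helper: sleep `loops` times to absolute deadlines spaced
 * `interval_us` apart on CLOCK_MONOTONIC and return the worst wake-up
 * latency seen, in microseconds, or -1 on error. */
long max_wakeup_latency_us(long interval_us, long loops)
{
    struct timespec next, now;
    int64_t worst_ns = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (long i = 0; i < loops; i++) {
        /* Advance the absolute deadline by one interval. */
        next.tv_nsec += interval_us * 1000L;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }

        /* clock_nanosleep() returns the error number instead of setting errno. */
        int rc;
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0 || clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Latency = how far past the deadline we actually woke up. */
        int64_t late_ns = ts_to_ns(&now) - ts_to_ns(&next);
        if (late_ns > worst_ns)
            worst_ns = late_ns;
    }
    return (long)(worst_ns / 1000);
}
```

With interval_us = 10000 and loops = 10000 this mirrors the -i 10000 -l 10000 in the commands above: one thread, about 100 seconds.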

khz
Established Member
Posts: 1038
Joined: Thu Apr 17, 2008 6:29 am
Location: German

Re: cyclictest

Postby khz » Tue Dec 18, 2018 2:41 pm

I wanted to post fewer links, but I'm doing it again anyway. ;-)

I hope it is useful information.
FZ - Does Humor Belong in Music?
GNU/LINUX@AUDIO ~ /Wiki $ Howto.Info && GNU/Linux Debian installing >> Linux Audio Workstation LAW
    I don't care about the freedom of speech because I have nothing to say.

Re: cyclictest

Postby merlyn » Tue Dec 18, 2018 5:17 pm

khz wrote:I hope it is useful information.


It is, thanks. There are a lot of possibilities with cyclictest. I haven't seen much about how the results relate to the performance of an audio setup.

Re: cyclictest

Postby merlyn » Sun Dec 23, 2018 5:48 pm

I found a script that draws a latency plot here.

Code:

#!/bin/bash

# 1. Run cyclictest
cyclictest -l100000000 -m -Sp90 -i200 -h400 -q >output

# 2. Get maximum latency
max=`grep "Max Latencies" output | tr " " "\n" | sort -n | tail -1 | sed s/^0*//`

# 3. Grep data lines, remove empty lines and create a common field separator
grep -v -e "^#" -e "^$" output | tr " " "\t" >histogram

# 4. Set the number of cores (cyclictest -S starts one thread per CPU)
cores=$(nproc)

# 5. Create two-column data sets with latency classes and frequency values for each core, for example
for i in `seq 1 $cores`
do
  column=`expr $i + 1`
  cut -f1,$column histogram >histogram$i
done

# 6. Create plot command header
echo -n -e "set title \"Latency plot\"\n\
set terminal png\n\
set xlabel \"Latency (us), max $max us\"\n\
set logscale y\n\
set xrange [0:400]\n\
set yrange [0.8:*]\n\
set ylabel \"Number of latency samples\"\n\
set output \"plot.png\"\n\
plot " >plotcmd

# 7. Append plot command data references
for i in `seq 1 $cores`
do
  if test $i != 1
  then
    echo -n ", " >>plotcmd
  fi
  cpuno=`expr $i - 1`
  if test $cpuno -lt 10
  then
    title=" CPU$cpuno"
   else
    title="CPU$cpuno"
  fi
  echo -n "\"histogram$i\" using 1:2 title \"$title\" with histeps" >>plotcmd
done

# 8. Execute plot command
gnuplot -persist <plotcmd


It takes about five and a half hours (100,000,000 loops × 200 us = 20,000 s). If you want to see the results more quickly, edit the 'cyclictest -l100000000' option: take a zero off and it will take about 33 minutes. Or you can press Ctrl-C at any time, and the script will draw a graph with the data it has. You need to run this script as root, or with sudo.
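The run-time arithmetic is just loop count times interval. As a tiny C helper (the name is hypothetical), assuming every thread uses the same -i interval and ignoring start-up time:

```c
/* Hypothetical helper: approximate duration of a cyclictest run in seconds,
 * i.e. the loop count (-l) times the interval (-i, in microseconds).
 * Assumes all threads use the same interval; the per-CPU threads started
 * by -S run in parallel, so the CPU count does not multiply the time. */
long long cyclictest_runtime_s(long long loops, long long interval_us)
{
    return loops * interval_us / 1000000LL;
}
```

cyclictest_runtime_s(100000000, 200) gives 20000 s (about 5.6 hours); with one zero removed it gives 2000 s, about 33 minutes.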
Last edited by merlyn on Mon Dec 24, 2018 12:16 pm, edited 1 time in total.

Re: cyclictest

Postby merlyn » Sun Dec 23, 2018 5:51 pm

I left my computer on overnight, and it drew this:
4.19.11-arch1-1-ARCH #1 SMP PREEMPT.png

Re: cyclictest

Postby merlyn » Mon Dec 24, 2018 11:48 am

I updated my kernel and ran the script again. This run shows a lower maximum latency than the previous kernel. I had hackbench running to generate load. The comparison isn't entirely fair, because in the previous test I used my computer for a bit and then left it running, whereas this time I started the test and didn't touch anything until it finished.

4.19.12-arch1-1-ARCH #1 SMP PREEMPT.png

Still, it shows the stock Arch kernel is good. Is it 'stock' though? It says 'PREEMPT' in the kernel version string, so I think it's really a low-latency kernel.

Jack Winter
Established Member
Posts: 374
Joined: Sun May 28, 2017 3:52 pm

Re: cyclictest

Postby Jack Winter » Tue Dec 25, 2018 9:19 am

The stock Archlinux kernel is indeed what other distros call lowlatency.

If "uname -v" shows PREEMPT, it's a lowlatency kernel; if it shows PREEMPT RT, it's a realtime kernel.
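The same check can be done from a program with uname(2), whose `version` field is the string `uname -v` prints. A small C sketch with hypothetical function names. One caveat: newer kernels may report PREEMPT_DYNAMIC instead, where the preemption model is chosen at boot, so the string alone is not conclusive there.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: classify a kernel version string like "#1 SMP PREEMPT ...". */
const char *preemption_model(const char *version)
{
    if (strstr(version, "PREEMPT RT") || strstr(version, "PREEMPT_RT"))
        return "realtime";
    if (strstr(version, "PREEMPT_DYNAMIC"))
        return "dynamic";     /* model picked at boot; check the boot parameters */
    if (strstr(version, "PREEMPT"))
        return "lowlatency";
    return "other";           /* no PREEMPT in the string (may still be voluntary preemption) */
}

/* Same check for the running kernel; returns NULL if uname() fails. */
const char *running_kernel_preemption_model(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return NULL;
    return preemption_model(u.version);  /* returns a string literal, not a pointer into u */
}
```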

Running hackbench with cyclictest is a good idea, as loading the system down will make problems in the kernel scheduling more apparent.

Do you have an Nvidia GPU?
Reaper/KDE/Archlinux. i7-2600k/16GB + i7-4700HQ/16GB, RME Multiface/Babyface, Behringer X32, WA273-EQ, 2 x WA-412, ADL-600, Tegeler TRC, etc 8) For REAPER on Linux information: https://wiki.cockos.com/wiki/index.php/REAPER_for_Linux

Re: cyclictest

Postby merlyn » Tue Dec 25, 2018 12:24 pm

Jack Winter wrote:Do you have an Nvidia GPU?

I do, yes.

I still have KXStudio 14.04.5 installed, so I tested the kernel that came with KX: 3.13.0-119-lowlatency #166-Ubuntu SMP PREEMPT. For a fair comparison I didn't touch anything while this was running. This plot shows a higher maximum latency than the newer kernels. Kernels appear to be getting better!

3.13.0-119-lowlatency #166-Ubuntu SMP PREEMPT.png

Re: cyclictest

Postby Jack Winter » Wed Dec 26, 2018 1:46 pm

merlyn wrote:
Jack Winter wrote:Do you have an Nvidia GPU?

I do, yes.


I asked because lately I started using my Nvidia card instead of the built-in Intel GPU, and I thought I recognized those 300-400us peaks. With Intel I get maximum latencies somewhere in the 80us range.

Re: cyclictest

Postby merlyn » Thu Jan 03, 2019 12:46 pm

Jack Winter wrote:I asked because lately I started using my Nvidia card instead of the built-in Intel GPU, and I thought I recognized those 300-400us peaks.

I've got a GTX 650 and I'm using the proprietary drivers. It seems this card isn't the best for Linux audio. If I were upgrading my hardware, I would avoid Nvidia.

Kernel 420 may be popular with a certain segment of the Linux musicians community. The maximum latency has gotten higher, just what you'd expect from 420. :D

4.20.0-arch1-1-ARCH #1 SMP PREEMPT.png

Re: cyclictest

Postby merlyn » Thu Jan 10, 2019 11:24 pm

I got my on-board AMD graphics working, so I can now use an RT kernel. Here's the latency plot of 4.19.10-rt8-rt.

4.19.10-rt8-rt #1 SMP PREEMPT RT.png

It's CPU1 (the green line) that's letting the side down. :) The other three CPUs have a really good maximum latency -- around 50 microseconds. That's 0.05 milliseconds. The CPU1 result means that, technically, this isn't any better than the 4.19.12 low latency kernel (low latency being the default on Arch), but is better than 4.20.0 low latency.
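Reading a per-CPU maximum off such a plot amounts to finding the highest latency bucket with a non-zero count in that CPU's column. A small C sketch (hypothetical helper), assuming the histogram has already been loaded into an array with one row per microsecond and one column per CPU, as in the data lines the plotting script extracts. Samples beyond the histogram range (-h400) are not counted here.

```c
/* Hypothetical helper. hist[b * ncpus + cpu] is the number of samples on
 * `cpu` with a latency of b microseconds (bucket b = row b of the
 * cyclictest -h output). Returns the highest non-empty bucket, or -1 if
 * that CPU has no samples inside the histogram range. */
int max_latency_bucket(const unsigned long *hist, int nbuckets, int ncpus, int cpu)
{
    /* Scan from the highest latency bucket downwards. */
    for (int b = nbuckets - 1; b >= 0; b--) {
        if (hist[(long)b * ncpus + cpu] != 0)
            return b;
    }
    return -1;
}
```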

raboof
Established Member
Posts: 1604
Joined: Tue Apr 08, 2008 11:58 am
Location: Deventer, NL

Re: cyclictest

Postby raboof » Fri Jan 11, 2019 9:52 am

merlyn wrote:It's CPU1 (the green line) that's letting the side down. :) The other three CPUs have a really good maximum latency -- around 50 microseconds. That's 0.05 milliseconds.


It's interesting that CPU1 is so much worse than the others. It makes you wonder whether anything is tied to CPU1. If something audio-related is tied to CPU1, perhaps this is indeed the best we can do. If, however, it is something non-audio-related, I wonder whether tying the audio-related tasks to the other CPUs would improve latency (though always at the expense of throughput).
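One place to look for what is tied to CPU1 is /proc/interrupts, which has one count column per CPU for every IRQ. Below is a minimal C sketch (hypothetical function name) that pulls the count for one CPU out of a single line, assuming the usual layout of an IRQ label ending in ':' followed by the per-CPU counts. If a busy device interrupt turns out to land on CPU1, audio threads could then be kept off that CPU with taskset or sched_setaffinity().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/interrupts line such as
 * "  16:   0   98765   123456   0   IO-APIC ...". Stores the count for
 * column `cpu` in *count and returns 0, or returns -1 if the line has no
 * label or no such column. The caller should pass cpu < number of CPUs,
 * since the text after the counts (e.g. "2-edge") could start with a digit. */
int irq_count_on_cpu(const char *line, int cpu, unsigned long long *count)
{
    const char *p = strchr(line, ':');   /* counts start after "16:" or "NMI:" */
    if (p == NULL || cpu < 0)
        return -1;
    p++;

    for (int i = 0; i <= cpu; i++) {
        char *end;
        unsigned long long v = strtoull(p, &end, 10);  /* skips leading spaces */
        if (end == p)                    /* no number: not enough CPU columns */
            return -1;
        if (i == cpu) {
            *count = v;
            return 0;
        }
        p = end;
    }
    return -1;
}
```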

gimmeapill
Established Member
Posts: 542
Joined: Thu Mar 12, 2015 8:41 am

Re: cyclictest

Postby gimmeapill » Fri Jan 11, 2019 9:55 am

I have also found (empirically) that 4.19 RT is exceptionally good, but unfortunately I don't have much time to join the benchmark fest.
Looking at the results from the old threads on LM, it seems we are getting better: most of the results two years ago were around 150us, and now we're reliably getting below 100us. That's nice!

We actually discussed automating latency benchmarking a bit on LM around 2016, but it still seems to be an open topic.

Anyway, in case this can still help, there's a relatively easy way to get the graph generated on the fly with Red Hat's Tuna/Oscilloscope:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_MRG/1.3/html-single/Tuna_User_Guide/index.html#sect-Tuna_User_Guide-Using_Testing_Tools_with_TUNA-Oscilloscope

And it should be possible to push it a bit further to identify the actual source of xruns with LatencyTOP:
https://www.openhub.net/p/latencytop

Unfortunately, both are very old projects, and I'm not sure they still build...

Some newer projects that could also be of interest are latency-tracker and LTTng:
https://lttng.org/blog/2016/01/06/monitoring-realtime-latencies/
Last edited by gimmeapill on Fri Jan 11, 2019 10:03 am, edited 1 time in total.

Re: cyclictest

Postby gimmeapill » Fri Jan 11, 2019 10:02 am

raboof wrote:
merlyn wrote:It's CPU1 (the green line) that's letting the side down. :) The other three CPUs have a really good maximum latency -- around 50 microseconds. That's 0.05 milliseconds.


It's interesting that CPU1 is so much worse than the others. It makes you wonder whether anything is tied to CPU1. If something audio-related is tied to CPU1, perhaps this is indeed the best we can do. If, however, it is something non-audio-related, I wonder whether tying the audio-related tasks to the other CPUs would improve latency (though always at the expense of throughput).


Every time I messed around with CPU affinity and tried to outsmart the kernel scheduler, it was a disaster; there are just too many unknown factors. I mean, it's worth a try, but I wouldn't expect too much...
One thing that could work is simply to use only three CPUs out of four for anything audio-related (for the programs where this is configurable).
Cyclictest should at least show if this is a viable approach.

Re: cyclictest

Postby merlyn » Fri Jan 11, 2019 3:03 pm

raboof wrote: If there's something audio-related tied to CPU1, perhaps this is indeed the best we can do.

Yes. Actually a maximum latency of 89 microseconds is good, good enough for me. As well as the raw 'maximum latency' figure there is the shape of the plot. RT kernels usually have a sharp cutoff on the x-axis, whereas most low latency kernels (4.19.12 being an exception) have a more spread out plot, with a handful of higher values scattered across the x-axis.
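To relate these numbers to an audio setup, one rough yardstick is to compare the maximum latency with the length of one JACK period, since a wake-up that is late by that much eats into the time left for processing. A C sketch of the arithmetic; the helper names and the 64 frames at 48 kHz used below are example values, not settings from this thread.

```c
/* Hypothetical helpers for a rough latency budget. */

/* Length of one JACK period in microseconds: frames / sample rate. */
double period_us(unsigned frames, unsigned sample_rate)
{
    return 1e6 * (double)frames / (double)sample_rate;
}

/* Share of one period used up by a worst-case wake-up latency
 * (values at or above 1.0 mean the whole period is gone). */
double latency_share(double max_latency_us, unsigned frames, unsigned sample_rate)
{
    return max_latency_us / period_us(frames, sample_rate);
}
```

For example, with 64 frames at 48 kHz a period is about 1333 us, so an 89 us maximum takes roughly 7% of it; the rest is left for the actual DSP work.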

What does that mean in practice? I think it means that an RT kernel is more consistent. I've also been running xruncounter, which tramp wrote, and the results have been more consistent. Part of the advantage of an RT kernel is that once you find settings where the system has no xruns, xruns never (or very rarely) happen, unless you max out your CPU. With a low-latency kernel, the occasional latency spike could cause the occasional xrun.

@gimmeapill Have you seen xruncounter? It's a more real-world test. It would be interesting to calibrate xruncounter against a real audio load.

xruncounter.c

Code:

#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <signal.h>
#include <stdlib.h>
#include <math.h>

#include <jack/jack.h>

/*   gcc -Wall xruncounter.c -lm `pkg-config --cflags --libs jack` -o xruncounter */

jack_client_t *client;
jack_port_t *in_port;
jack_port_t *out_port;

static int xruns = 0;
static int grow = 1;
static int first_x_run = 0;
static float dsp_load = 0;
static int run = 1;


void
jack_shutdown (void *arg)
{
   exit (1);
}

int
jack_xrun_callback(void *arg)
{
   /* count xruns */
   xruns += 1;
   if (xruns == 1) {
       first_x_run = grow/100;
       dsp_load = jack_cpu_load(client);
   }
   fprintf (stderr, "Xrun %i at DSP load %f\n",xruns , jack_cpu_load(client));
   if ((int)jack_cpu_load(client)>95) run = 0;
   return 0;
}

int
jack_srate_callback(jack_nframes_t samplerate, void* arg)
{
    fprintf (stderr, "Samplerate %u \n", samplerate);
    return 0;
}

int
jack_buffersize_callback(jack_nframes_t nframes, void* arg)
{
    fprintf (stderr, "Buffersize is %u \n", nframes);
    return 0;
}

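int
jack_process(jack_nframes_t nframes, void *arg)
{
    /* Burn CPU: the loop gets 100 iterations longer every period until xruns appear.
       volatile stops the compiler from optimizing the unused result away. */
    volatile double d = 0;
    for (int j = 0; j < grow; ++j) {
        d = tan(atan(tan(atan(tan(atan(tan(atan(tan(atan(123456789.123456789))))))))));
    }
    grow += 100;
    return 0;
}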

void
signal_handler (int sig)
{
   jack_client_close (client);
   fprintf (stderr, " signal %i received, exiting ...\n", sig);
   exit (0);
}

int
main (int argc, char *argv[])

{

   if ((client = jack_client_open ("xruncounter", JackNullOption, NULL)) == 0) {
      fprintf (stderr, "jack server not running?\n");
      return 1;
   }

   in_port = jack_port_register(
     client, "in_0", JACK_DEFAULT_AUDIO_TYPE, JackPortIsInput, 0);
   out_port = jack_port_register(
        client, "out_0", JACK_DEFAULT_AUDIO_TYPE, JackPortIsOutput, 0);

   signal (SIGQUIT, signal_handler);
   signal (SIGTERM, signal_handler);
   signal (SIGHUP, signal_handler);
   signal (SIGINT, signal_handler);

   jack_set_xrun_callback(client, jack_xrun_callback, 0);
   jack_set_sample_rate_callback(client, jack_srate_callback, 0);
   jack_set_buffer_size_callback(client, jack_buffersize_callback, 0);
   jack_set_process_callback(client, jack_process, 0);
   jack_on_shutdown (client, jack_shutdown, 0);

   if (jack_activate (client)) {
      fprintf (stderr, "cannot activate client\n");
      return 1;
   }
   
   if (!jack_is_realtime(client)) {
       fprintf (stderr, "jack isn't running with realtime priority\n");
   } else {
      fprintf (stderr, "jack running with realtime priority\n");
   }
   
   while (run) {
      usleep (100000);
      fprintf (stderr, "DSP load %f  \r", jack_cpu_load(client));
   }
   
   fprintf(stderr, "\ncompleted: %i xruns in %i cycles\nfirst xrun happened at DSP load %f in cycle %i\n", xruns, grow/100, dsp_load, first_x_run);

   jack_client_close (client);
   exit (0);
}

