Some performance tests on RPI4 vs RPI3

Post by sadko4u

Here are some short observations from testing the performance of LSP Plugins on the new Raspberry Pi 4 SBC against a few competing boards.
Test configurations:
  • Raspberry Pi 2B v1.2 with Cortex-A53 CPU, Raspbian stretch, GCC 6.3
  • Raspberry Pi 3B with Cortex-A53 CPU, Raspbian stretch, GCC 6.3
  • Raspberry Pi 4B with Cortex-A72 CPU, Raspbian buster, GCC 8.3
  • ASUS Tinker Board S (ATB S) with Cortex-A17 CPU (actually identified as Cortex-A12, hmmm), TinkerOS (Debian 9), GCC 6.3
Test #0: Compilation time of the development branch using two CPU cores:

time make -j2 test
  • RPI 2B v1.2 - real 9m29.012s, user 17m51.744s, sys 1m1.830s
  • RPI 3B - real 9m5.419s, user 17m5.062s, sys 0m58.891s
  • RPI 4B - real 4m3.904s, user 7m6.339s, sys 1m1.706s
  • ATB S - real 4m22.421s, user 8m3.430s, sys 0m26.740s
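
To put the build times above into proportion, here is a small sketch that converts the `time` stamps to seconds and prints each board's wall-clock speedup relative to the RPI 3B. The helper names (`to_seconds`, `speedup`) are made up for this illustration; only the timings come from the list above.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '9m29.012s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# Wall-clock ("real") build times copied from the list above.
real = {
    "RPI 2B v1.2": "9m29.012s",
    "RPI 3B": "9m5.419s",
    "RPI 4B": "4m3.904s",
    "ATB S": "4m22.421s",
}

def speedup(baseline: str, board: str) -> float:
    """How many times faster `board` finished than `baseline`."""
    return to_seconds(real[baseline]) / to_seconds(real[board])

for board in real:
    print(f"{board}: {speedup('RPI 3B', board):.2f}x vs RPI 3B")
```

By this measure the RPI 4B builds the branch a bit more than twice as fast as the RPI 3B.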
Now, the LSP performance tests. All of the following tests measure single-core performance. Note that native means the plain C implementation of a function, and neon_d32 means the NEON-optimized SIMD implementation of the same function.

Test #1: multiplication of packed complex number blocks (packed means that the real and imaginary parts of each complex number are stored next to each other in memory):

.test/lsp-plugins-test ptest dsp.pcomplex.mul3
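
For readers unfamiliar with the packed layout, here is a minimal scalar sketch of what such a multiplication does. It is not the LSP Plugins code: the signature (destination, two sources, element count) is an assumption based on the "mul3" name, and the real functions work on C float arrays rather than Python lists.

```python
def pcomplex_mul3(dst, src1, src2, count):
    """Multiply `count` packed complex numbers: dst[k] = src1[k] * src2[k].

    Each buffer holds interleaved values: [re0, im0, re1, im1, ...].
    (Hypothetical reference version; the signature is assumed.)
    """
    for k in range(count):
        re1, im1 = src1[2 * k], src1[2 * k + 1]
        re2, im2 = src2[2 * k], src2[2 * k + 1]
        dst[2 * k] = re1 * re2 - im1 * im2        # real part
        dst[2 * k + 1] = re1 * im2 + im1 * re2    # imaginary part

a = [1.0, 2.0, 0.0, 1.0]   # (1 + 2i), (0 + 1i)
b = [3.0, 4.0, 0.0, 1.0]   # (3 + 4i), (0 + 1i)
out = [0.0] * 4
pcomplex_mul3(out, a, b, 2)
print(out)  # [-5.0, 10.0, -1.0, 0.0]
```

A SIMD version can load several interleaved pairs at once and de-interleave them in registers, which is presumably where the neon_d32 variant gains its advantage.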
RPI 2B v1.2:

┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256    │   5.00│ 909000│   5.00│ 908533│181706.75│    5.5034│100.00│
│neon_d32::pcomplex_mul3 x 256  │   5.00│3580000│   5.00│3579060│715812.17│    1.3970│393.94│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512    │   5.01│ 459000│   5.00│ 458142│ 91628.60│   10.9136│100.00│
│neon_d32::pcomplex_mul3 x 512  │   5.00│1818000│   5.00│1817274│363454.98│    2.7514│396.66│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024   │   5.01│ 230000│   5.00│ 229314│ 45862.87│   21.8041│100.00│
│neon_d32::pcomplex_mul3 x 1024 │   5.00│ 941000│   5.00│ 940935│188187.02│    5.3139│410.33│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048   │   5.01│ 106000│   5.00│ 105852│ 21170.60│   47.2353│100.00│
│neon_d32::pcomplex_mul3 x 2048 │   5.01│ 491000│   5.00│ 490502│ 98100.49│   10.1936│463.38│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096   │   5.06│  53000│   5.00│  52344│ 10468.99│   95.5202│100.00│
│neon_d32::pcomplex_mul3 x 4096 │   5.00│ 189000│   5.00│ 188976│ 37795.25│   26.4584│361.02│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192   │   5.08│  27000│   5.00│  26562│  5312.56│  188.2333│100.00│
│neon_d32::pcomplex_mul3 x 8192 │   5.01│  97000│   5.00│  96891│ 19378.25│   51.6042│364.76│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384  │   5.23│  14000│   5.00│  13394│  2678.92│  373.2847│100.00│
│neon_d32::pcomplex_mul3 x 16384│   5.07│  52000│   5.00│  51265│ 10253.09│   97.5316│382.73│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768  │   5.69│   7000│   5.00│   6152│  1230.43│  812.7244│100.00│
│neon_d32::pcomplex_mul3 x 32768│   5.23│  18000│   5.00│  17222│  3444.58│  290.3112│279.95│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536  │   5.09│   3000│   5.00│   2949│   589.83│ 1695.4077│100.00│
│neon_d32::pcomplex_mul3 x 65536│   5.01│   6000│   5.00│   5984│  1196.91│  835.4813│202.93│
└───────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
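
The post does not explain the table columns, so the relationships below are inferred from the printed numbers only: Perf[i/s] is roughly Est divided by Samp[s], Cost[us/i] is one million divided by Perf, and Rel[%] is Perf relative to the slowest case in each group. The function names are hypothetical; the sample values are the first two rows of the table above.

```python
def perf_from_estimate(est: int, samp_s: float) -> float:
    """Iterations per second from the estimated iteration count."""
    return est / samp_s

def cost_us(perf: float) -> float:
    """Microseconds per iteration."""
    return 1e6 / perf

def rel_percent(perf: float, slowest_perf: float) -> float:
    """Performance relative to the slowest case in the same group."""
    return 100.0 * perf / slowest_perf

# First group of the RPI 2B v1.2 table (block size 256).
native_perf, neon_perf = 181706.75, 715812.17

print(f"perf from Est:  {perf_from_estimate(908533, 5.00):.2f} i/s")
print(f"native cost:    {cost_us(native_perf):.4f} us/i")
print(f"neon_d32 rel:   {rel_percent(neon_perf, native_perf):.2f} %")
```

The small difference between the recomputed and printed Perf values comes from Samp[s] being shown with only two decimals.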
RPI 3B:

┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256    │   5.00│1222000│   5.00│1221007│244201.42│    4.0950│100.00│
│neon_d32::pcomplex_mul3 x 256  │   5.00│4781000│   5.00│4780455│956091.20│    1.0459│391.52│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512    │   5.01│ 613000│   5.00│ 612096│122419.26│    8.1686│100.00│
│neon_d32::pcomplex_mul3 x 512  │   5.00│2426000│   5.00│2425035│485007.16│    2.0618│396.19│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024   │   5.01│ 306000│   5.00│ 305352│ 61070.58│   16.3745│100.00│
│neon_d32::pcomplex_mul3 x 1024 │   5.00│1256000│   5.00│1255695│251139.07│    3.9819│411.23│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048   │   5.02│ 146000│   5.00│ 145352│ 29070.42│   34.3992│100.00│
│neon_d32::pcomplex_mul3 x 2048 │   5.00│ 554000│   5.00│ 553900│110780.06│    9.0269│381.07│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096   │   5.04│  71000│   5.00│  70398│ 14079.68│   71.0244│100.00│
│neon_d32::pcomplex_mul3 x 4096 │   5.01│ 264000│   5.00│ 263732│ 52746.41│   18.9586│374.63│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192   │   5.11│  36000│   5.00│  35230│  7046.02│  141.9241│100.00│
│neon_d32::pcomplex_mul3 x 8192 │   5.00│ 130000│   5.00│ 129948│ 25989.71│   38.4768│368.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384  │   5.11│  18000│   5.00│  17606│  3521.39│  283.9789│100.00│
│neon_d32::pcomplex_mul3 x 16384│   5.05│  65000│   5.00│  64328│ 12865.80│   77.7255│365.36│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768  │   5.06│   8000│   5.00│   7902│  1580.50│  632.7099│100.00│
│neon_d32::pcomplex_mul3 x 32768│   5.09│  21000│   5.00│  20614│  4122.87│  242.5497│260.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536  │   5.53│   4000│   5.00│   3615│   723.09│ 1382.9450│100.00│
│neon_d32::pcomplex_mul3 x 65536│   5.04│   7000│   5.00│   6943│  1388.70│  720.0977│192.05│
└───────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 4B:

┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256    │   5.00│4766000│   5.00│4765436│ 953087.35│    1.0492│100.00│
│neon_d32::pcomplex_mul3 x 256  │   5.00│8325000│   5.00│8324159│1664831.85│    0.6007│174.68│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512    │   5.00│2419000│   5.00│2418675│ 483735.18│    2.0672│100.00│
│neon_d32::pcomplex_mul3 x 512  │   5.00│4248000│   5.00│4247392│ 849478.52│    1.1772│175.61│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024   │   5.00│1215000│   5.00│1214474│ 242894.97│    4.1170│100.00│
│neon_d32::pcomplex_mul3 x 1024 │   5.00│2146000│   5.00│2145721│ 429144.21│    2.3302│176.68│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048   │   5.00│ 596000│   5.00│ 595606│ 119121.21│    8.3948│100.00│
│neon_d32::pcomplex_mul3 x 2048 │   5.00│ 994000│   5.00│ 993814│ 198762.83│    5.0311│166.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096   │   5.02│ 301000│   5.00│ 300003│  60000.80│   16.6664│100.00│
│neon_d32::pcomplex_mul3 x 4096 │   5.01│ 516000│   5.00│ 515361│ 103072.36│    9.7019│171.78│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192   │   5.02│ 151000│   5.00│ 150344│  30068.91│   33.2569│100.00│
│neon_d32::pcomplex_mul3 x 8192 │   5.02│ 260000│   5.00│ 259211│  51842.23│   19.2893│172.41│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384  │   5.05│  76000│   5.00│  75291│  15058.28│   66.4086│100.00│
│neon_d32::pcomplex_mul3 x 16384│   5.03│ 131000│   5.00│ 130171│  26034.22│   38.4110│172.89│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768  │   5.06│  37000│   5.00│  36573│   7314.79│  136.7094│100.00│
│neon_d32::pcomplex_mul3 x 32768│   5.03│  60000│   5.00│  59631│  11926.25│   83.8487│163.04│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536  │   5.02│  17000│   5.00│  16943│   3388.71│  295.0975│100.00│
│neon_d32::pcomplex_mul3 x 65536│   5.21│  24000│   5.00│  23012│   4602.59│  217.2690│135.82│
└───────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
ATB S:

┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256    │   5.00│3571000│   5.00│3570153│ 714030.63│    1.4005│100.00│
│neon_d32::pcomplex_mul3 x 256  │   5.00│5853000│   5.00│5852174│1170434.97│    0.8544│163.92│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512    │   5.00│1830000│   5.00│1829781│ 365956.38│    2.7326│100.00│
│neon_d32::pcomplex_mul3 x 512  │   5.00│2954000│   5.00│2953243│ 590648.79│    1.6931│161.40│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024   │   5.00│ 916000│   5.00│ 915967│ 183193.51│    5.4587│100.00│
│neon_d32::pcomplex_mul3 x 1024 │   5.00│1484000│   5.00│1483181│ 296636.26│    3.3711│161.93│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048   │   5.01│ 448000│   5.00│ 447374│  89474.97│   11.1763│100.00│
│neon_d32::pcomplex_mul3 x 2048 │   5.00│ 725000│   5.00│ 724702│ 144940.43│    6.8994│161.99│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096   │   5.01│ 228000│   5.00│ 227398│  45479.80│   21.9878│100.00│
│neon_d32::pcomplex_mul3 x 4096 │   5.00│ 357000│   5.00│ 356757│  71351.55│   14.0151│156.89│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192   │   5.03│ 115000│   5.00│ 114395│  22879.07│   43.7081│100.00│
│neon_d32::pcomplex_mul3 x 8192 │   5.02│ 181000│   5.00│ 180369│  36073.87│   27.7209│157.67│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384  │   5.07│  57000│   5.00│  56168│  11233.61│   89.0186│100.00│
│neon_d32::pcomplex_mul3 x 16384│   5.02│  91000│   5.00│  90609│  18121.87│   55.1819│161.32│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768  │   5.10│  29000│   5.00│  28427│   5685.49│  175.8864│100.00│
│neon_d32::pcomplex_mul3 x 32768│   5.02│  45000│   5.00│  44844│   8968.96│  111.4956│157.75│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536  │   5.37│  14000│   5.00│  13038│   2607.62│  383.4910│100.00│
│neon_d32::pcomplex_mul3 x 65536│   5.00│  15000│   5.00│  14992│   2998.49│  333.5008│114.99│
└───────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
Test #2: HSLA -> RGBA color conversion, where each color is a set of four 32-bit floating-point numbers.

.test/lsp-plugins-test ptest dsp.graphics.hsla_to_rgba
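
As a rough picture of the work being timed, here is a sketch of the conversion over a packed buffer using Python's standard `colorsys` module. The component order (H, S, L, A) and the [0, 1] value range are assumptions for the illustration; the post does not state how LSP Plugins lays the data out.

```python
import colorsys

def hsla_to_rgba(src, count):
    """Convert `count` packed HSLA colors to packed RGBA.

    `src` holds [h0, s0, l0, a0, h1, ...], every component in [0, 1].
    (Assumed layout; a reference sketch, not the benchmarked code.)
    """
    dst = []
    for k in range(count):
        h, s, l, a = src[4 * k : 4 * k + 4]
        # colorsys expects (h, l, s), so lightness and saturation swap places.
        r, g, b = colorsys.hls_to_rgb(h, l, s)
        dst.extend((r, g, b, a))   # alpha passes through unchanged
    return dst

colors = [0.0, 1.0, 0.5, 1.0,     # fully saturated red, opaque
          0.0, 0.0, 0.25, 0.5]    # dark grey, half transparent
print(hsla_to_rgba(colors, 2))
```

The per-color branching on hue is what makes this conversion harder to vectorize than the complex multiplication in Test #1.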
RPI 2B v1.2:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64     │   5.01│1150000│   5.00│1148586│229717.36│    4.3532│100.00│
│neon_d32::hsla_to_rgba x 64   │   5.00│2470000│   5.00│2469142│493828.54│    2.0250│214.97│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128    │   5.03│ 565000│   5.00│ 561791│112358.37│    8.9001│100.00│
│neon_d32::hsla_to_rgba x 128  │   5.01│1255000│   5.00│1252281│250456.26│    3.9927│222.91│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256    │   5.08│ 285000│   5.00│ 280395│ 56079.12│   17.8319│100.00│
│neon_d32::hsla_to_rgba x 256  │   5.04│ 635000│   5.00│ 630223│126044.78│    7.9337│224.76│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512    │   5.01│ 140000│   5.00│ 139587│ 27917.41│   35.8199│100.00│
│neon_d32::hsla_to_rgba x 512  │   5.06│ 320000│   5.00│ 316267│ 63253.59│   15.8094│226.57│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024   │   5.07│  70000│   5.00│  69045│ 13809.14│   72.4158│100.00│
│neon_d32::hsla_to_rgba x 1024 │   5.06│ 160000│   5.00│ 158172│ 31634.51│   31.6110│229.08│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048   │   5.11│  35000│   5.00│  34220│  6844.04│  146.1125│100.00│
│neon_d32::hsla_to_rgba x 2048 │   5.10│  80000│   5.00│  78449│ 15689.98│   63.7349│229.25│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096   │   5.82│  20000│   5.00│  17169│  3433.97│  291.2079│100.00│
│neon_d32::hsla_to_rgba x 4096 │   5.11│  40000│   5.00│  39164│  7832.88│  127.6669│228.10│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192   │   5.84│  10000│   5.00│   8567│  1713.46│  583.6151│100.00│
│neon_d32::hsla_to_rgba x 8192 │   5.08│  20000│   5.00│  19670│  3934.14│  254.1852│229.60│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384  │   5.92│   5000│   5.00│   4223│   844.67│ 1183.8974│100.00│
│neon_d32::hsla_to_rgba x 16384│   5.12│  10000│   5.00│   9769│  1953.89│  511.7994│231.32│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768  │  12.26│   5000│   5.00│   2038│   407.71│ 2452.7014│100.00│
│neon_d32::hsla_to_rgba x 32768│   5.21│   5000│   5.00│   4798│   959.76│ 1041.9286│235.40│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536  │  24.71│   5000│   5.00│   1011│   202.35│ 4942.0450│100.00│
│neon_d32::hsla_to_rgba x 65536│  10.47│   5000│   5.00│   2387│   477.53│ 2094.0958│236.00│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 3B:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64     │   5.00│1535000│   5.00│1534953│306990.73│    3.2574│100.00│
│neon_d32::hsla_to_rgba x 64   │   5.01│3315000│   5.00│3310910│662182.07│    1.5102│215.70│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128    │   5.00│ 750000│   5.00│ 749272│149854.46│    6.6731│100.00│
│neon_d32::hsla_to_rgba x 128  │   5.00│1680000│   5.00│1679023│335804.63│    2.9779│224.09│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256    │   5.04│ 375000│   5.00│ 371678│ 74335.69│   13.4525│100.00│
│neon_d32::hsla_to_rgba x 256  │   5.03│ 850000│   5.00│ 845517│169103.41│    5.9135│227.49│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512    │   5.13│ 190000│   5.00│ 185137│ 37027.54│   27.0069│100.00│
│neon_d32::hsla_to_rgba x 512  │   5.01│ 425000│   5.00│ 424278│ 84855.71│   11.7847│229.17│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024   │   5.11│  95000│   5.00│  92925│ 18585.09│   53.8066│100.00│
│neon_d32::hsla_to_rgba x 1024 │   5.07│ 215000│   5.00│ 212132│ 42426.47│   23.5702│228.28│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048   │   5.42│  50000│   5.00│  46164│  9232.89│  108.3085│100.00│
│neon_d32::hsla_to_rgba x 2048 │   5.24│ 110000│   5.00│ 105017│ 21003.43│   47.6113│227.48│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096   │   5.43│  25000│   5.00│  23036│  4607.38│  217.0433│100.00│
│neon_d32::hsla_to_rgba x 4096 │   5.21│  55000│   5.00│  52762│ 10552.43│   94.7649│229.03│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192   │   6.52│  15000│   5.00│  11495│  2299.02│  434.9682│100.00│
│neon_d32::hsla_to_rgba x 8192 │   5.68│  30000│   5.00│  26416│  5283.28│  189.2765│229.81│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384  │   8.72│  10000│   5.00│   5730│  1146.19│  872.4523│100.00│
│neon_d32::hsla_to_rgba x 16384│   5.68│  15000│   5.00│  13195│  2639.14│  378.9107│230.25│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768  │   8.71│   5000│   5.00│   2869│   573.90│ 1742.4670│100.00│
│neon_d32::hsla_to_rgba x 32768│   7.68│  10000│   5.00│   6507│  1301.47│  768.3627│226.78│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536  │  17.61│   5000│   5.00│   1419│   283.91│ 3522.1824│100.00│
│neon_d32::hsla_to_rgba x 65536│   7.97│   5000│   5.00│   3136│   627.36│ 1593.9904│220.97│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 4B:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64     │   5.00│4635000│   5.00│4632126│ 926425.25│    1.0794│100.00│
│neon_d32::hsla_to_rgba x 64   │   5.00│6160000│   5.00│6158996│1231799.22│    0.8118│132.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 128    │   5.01│2320000│   5.00│2315033│ 463006.67│    2.1598│100.00│
│neon_d32::hsla_to_rgba x 128  │   5.00│3050000│   5.00│3048391│ 609678.33│    1.6402│131.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 256    │   5.02│1110000│   5.00│1105481│ 221096.29│    4.5229│100.00│
│neon_d32::hsla_to_rgba x 256  │   5.01│1550000│   5.00│1547897│ 309579.41│    3.2302│140.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 512    │   5.02│ 440000│   5.00│ 438303│  87660.67│   11.4076│100.00│
│neon_d32::hsla_to_rgba x 512  │   5.01│ 780000│   5.00│ 779036│ 155807.33│    6.4182│177.74│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024   │   5.05│ 150000│   5.00│ 148385│  29677.08│   33.6960│100.00│
│neon_d32::hsla_to_rgba x 1024 │   5.04│ 390000│   5.00│ 386781│  77356.23│   12.9272│260.66│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048   │   5.15│  65000│   5.00│  63161│  12632.28│   79.1623│100.00│
│neon_d32::hsla_to_rgba x 2048 │   5.10│ 195000│   5.00│ 191178│  38235.60│   26.1536│302.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096   │   5.76│  35000│   5.00│  30367│   6073.44│  164.6512│100.00│
│neon_d32::hsla_to_rgba x 4096 │   5.21│ 100000│   5.00│  95976│  19195.20│   52.0964│316.05│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192   │   6.60│  20000│   5.00│  15147│   3029.56│  330.0811│100.00│
│neon_d32::hsla_to_rgba x 8192 │   5.19│  50000│   5.00│  48143│   9628.75│  103.8556│317.83│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384  │   6.60│  10000│   5.00│   7573│   1514.63│  660.2272│100.00│
│neon_d32::hsla_to_rgba x 16384│   5.19│  25000│   5.00│  24102│   4820.42│  207.4509│318.26│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768  │   6.58│   5000│   5.00│   3801│    760.22│ 1315.4130│100.00│
│neon_d32::hsla_to_rgba x 32768│   6.22│  15000│   5.00│  12049│   2409.89│  414.9569│317.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536  │  13.22│   5000│   5.00│   1890│    378.19│ 2644.1880│100.00│
│neon_d32::hsla_to_rgba x 65536│   8.56│  10000│   5.00│   5844│   1168.85│  855.5439│309.07│
└──────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
ATB S:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64     │   5.01│2125000│   5.00│2122408│424481.79│    2.3558│100.00│
│neon_d32::hsla_to_rgba x 64   │   5.01│3455000│   5.00│3451185│690237.01│    1.4488│162.61│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128    │   5.01│1025000│   5.00│1022313│204462.79│    4.8909│100.00│
│neon_d32::hsla_to_rgba x 128  │   5.01│1740000│   5.00│1738133│347626.79│    2.8766│170.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256    │   5.01│ 480000│   5.00│ 479236│ 95847.26│   10.4333│100.00│
│neon_d32::hsla_to_rgba x 256  │   5.02│ 875000│   5.00│ 872039│174407.82│    5.7337│181.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512    │   5.07│ 235000│   5.00│ 231694│ 46338.84│   21.5802│100.00│
│neon_d32::hsla_to_rgba x 512  │   5.04│ 440000│   5.00│ 436707│ 87341.53│   11.4493│188.48│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024   │   5.17│ 115000│   5.00│ 111313│ 22262.79│   44.9180│100.00│
│neon_d32::hsla_to_rgba x 1024 │   5.04│ 220000│   5.00│ 218465│ 43693.11│   22.8869│196.26│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048   │   5.41│  60000│   5.00│  55490│ 11098.19│   90.1047│100.00│
│neon_d32::hsla_to_rgba x 2048 │   5.06│ 110000│   5.00│ 108748│ 21749.65│   45.9778│195.97│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096   │   5.46│  30000│   5.00│  27494│  5498.94│  181.8531│100.00│
│neon_d32::hsla_to_rgba x 4096 │   5.06│  55000│   5.00│  54396│ 10879.22│   91.9184│197.84│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192   │   5.48│  15000│   5.00│  13695│  2739.03│  365.0929│100.00│
│neon_d32::hsla_to_rgba x 8192 │   5.52│  30000│   5.00│  27161│  5432.34│  184.0828│198.33│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384  │   7.27│  10000│   5.00│   6875│  1375.15│  727.1949│100.00│
│neon_d32::hsla_to_rgba x 16384│   5.53│  15000│   5.00│  13567│  2713.59│  368.5155│197.33│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768  │   7.31│   5000│   5.00│   3418│   683.71│ 1462.6100│100.00│
│neon_d32::hsla_to_rgba x 32768│   7.39│  10000│   5.00│   6769│  1353.89│  738.6115│198.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536  │  14.59│   5000│   5.00│   1713│   342.60│ 2918.8394│100.00│
│neon_d32::hsla_to_rgba x 65536│   7.40│   5000│   5.00│   3378│   675.67│ 1480.0100│197.22│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Test #3: Testing biquad recursive filters.

.test/lsp-plugins-test ptest dsp.filters.static
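
For reference, a biquad is a second-order recursive (IIR) filter section. The sketch below shows one section in transposed direct form II plus a simple cascade. The coefficient sign convention and the function names are assumptions, and the reading of the case names (x1, x2, x4, x8 as the number of sections processed together, repeated so that every case covers eight sections) is a guess from the labels rather than something the post states.

```python
def biquad_process(samples, b0, b1, b2, a1, a2):
    """Filter `samples` with one biquad section (transposed direct form II).

    Difference equation (assumed sign convention):
        y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
    """
    d0 = d1 = 0.0   # the two delay elements that carry state between samples
    out = []
    for x in samples:
        y = b0 * x + d0
        d0 = b1 * x - a1 * y + d1
        d1 = b2 * x - a2 * y
        out.append(y)
    return out

def cascade(samples, sections):
    """Run the signal through several biquad sections in series."""
    for coeffs in sections:
        samples = biquad_process(samples, *coeffs)
    return samples

# One-pole low-pass written as a biquad: y[n] = 0.5*x[n] + 0.5*y[n-1]
impulse = [1.0, 0.0, 0.0, 0.0]
print(biquad_process(impulse, 0.5, 0.0, 0.0, -0.5, 0.0))  # [0.5, 0.25, 0.125, 0.0625]
```

Because each output sample depends on the previous one, a single section cannot be vectorized across time; SIMD gains come from processing several sections at once, which fits the larger NEON advantage in the x4 and x8 cases on most of the boards.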
RPI 2B v1.2:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8  │  30.87│ 240000│  30.00│ 233265│  7775.51│  128.6090│100.00│
│neon_d32::biquad_process_x1 x8│  30.31│ 330000│  30.00│ 326599│ 10886.67│   91.8555│140.01│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4  │  30.26│ 300000│  30.00│ 297430│  9914.36│  100.8638│100.00│
│neon_d32::biquad_process_x2 x4│  30.38│ 550000│  30.00│ 543070│ 18102.34│   55.2415│182.59│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2  │  30.64│ 370000│  30.00│ 362275│ 12075.85│   82.8099│100.00│
│neon_d32::biquad_process_x4 x2│  30.02│1080000│  30.00│1079230│ 35974.37│   27.7976│297.90│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1  │  30.23│ 360000│  30.00│ 357240│ 11908.01│   83.9771│100.00│
│neon_d32::biquad_process_x8 x1│  30.05│1120000│  30.00│1118058│ 37268.60│   26.8322│312.97│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 3B:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8  │  30.65│ 320000│  30.00│ 313237│ 10441.26│   95.7739│100.00│
│neon_d32::biquad_process_x1 x8│  30.14│ 440000│  30.00│ 437900│ 14596.68│   68.5087│139.80│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4  │  30.09│ 400000│  30.00│ 398797│ 13293.25│   75.2262│100.00│
│neon_d32::biquad_process_x2 x4│  30.08│ 730000│  30.00│ 728149│ 24271.67│   41.2003│182.59│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2  │  30.26│ 490000│  30.00│ 485772│ 16192.43│   61.7573│100.00│
│neon_d32::biquad_process_x4 x2│  30.05│1450000│  30.00│1447391│ 48246.37│   20.7269│297.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1  │  30.06│ 480000│  30.00│ 478974│ 15965.83│   62.6338│100.00│
│neon_d32::biquad_process_x8 x1│  30.02│1500000│  30.00│1499104│ 49970.15│   20.0119│312.98│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 4B:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8  │  30.51│ 530000│  30.00│ 521091│ 17369.71│   57.5715│147.28│
│neon_d32::biquad_process_x1 x8│  30.53│ 360000│  30.00│ 353798│ 11793.28│   84.7941│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4  │  30.06│1320000│  30.00│1317284│ 43909.48│   22.7741│288.33│
│neon_d32::biquad_process_x2 x4│  30.21│ 460000│  30.00│ 456863│ 15228.80│   65.6651│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2  │  30.15│1190000│  30.00│1184163│ 39472.12│   25.3343│138.04│
│neon_d32::biquad_process_x4 x2│  30.08│ 860000│  30.00│ 857823│ 28594.13│   34.9722│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1  │  30.28│ 820000│  30.00│ 812439│ 27081.31│   36.9258│100.00│
│neon_d32::biquad_process_x8 x1│  30.18│1530000│  30.00│1520779│ 50692.64│   19.7267│187.19│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
ATB S:

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8  │  30.52│ 360000│  30.00│ 353841│ 11794.72│   84.7837│100.00│
│neon_d32::biquad_process_x1 x8│  30.02│ 460000│  30.00│ 459745│ 15324.83│   65.2536│129.93│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4  │  30.09│ 790000│  30.00│ 787760│ 26258.67│   38.0827│100.00│
│neon_d32::biquad_process_x2 x4│  30.16│ 940000│  30.00│ 934894│ 31163.15│   32.0892│118.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2  │  30.27│ 930000│  30.00│ 921617│ 30720.59│   32.5515│100.00│
│neon_d32::biquad_process_x4 x2│  30.12│2400000│  30.00│2390643│ 79688.10│   12.5489│259.40│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1  │  30.20│ 750000│  30.00│ 744987│ 24832.92│   40.2691│100.00│
│neon_d32::biquad_process_x8 x1│  30.05│1910000│  30.00│1906587│ 63552.93│   15.7349│255.92│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Test #4: Testing FFT performance.

.test/lsp-plugins-test ptest dsp.fft.fft
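
The tables for this test list both direct_fft and packed_direct_fft. Assuming "packed" has the same meaning as in Test #1 (interleaved real and imaginary parts) and that the plain variant takes separate real and imaginary arrays, the two data layouts can be sketched as follows. The recursive radix-2 transform here is a textbook version chosen for brevity, not the algorithm LSP Plugins uses.

```python
import cmath

def fft(values):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def direct_fft(re, im):
    """Split layout (assumed): separate real and imaginary arrays in and out."""
    spectrum = fft([complex(r, i) for r, i in zip(re, im)])
    return [c.real for c in spectrum], [c.imag for c in spectrum]

def packed_direct_fft(packed):
    """Packed layout (assumed): [re0, im0, re1, im1, ...] in and out."""
    pairs = range(0, len(packed), 2)
    spectrum = fft([complex(packed[i], packed[i + 1]) for i in pairs])
    out = []
    for c in spectrum:
        out.extend((c.real, c.imag))
    return out

# A constant signal has all of its energy in bin 0.
print(direct_fft([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
```

Both wrappers compute the same spectrum; only the memory layout differs, which is why the two variants can show different cache and SIMD behaviour at large block sizes.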
RPI 2B v1.2:

┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256           │  30.03│ 984000│  30.00│ 983006│ 32766.89│   30.5186│100.72│
│native::packed_direct_fft x 256    │  30.00│ 976000│  30.00│ 975947│ 32531.58│   30.7394│100.00│
│neon_d32::direct_fft x 256         │  30.00│3172000│  30.00│3171515│105717.18│    9.4592│324.97│
│neon_d32::packed_direct_fft x 256  │  30.01│2909000│  30.00│2908396│ 96946.56│   10.3150│298.01│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512           │  30.05│ 437000│  30.00│ 436215│ 14540.52│   68.7734│100.80│
│native::packed_direct_fft x 512    │  30.02│ 433000│  30.00│ 432760│ 14425.36│   69.3224│100.00│
│neon_d32::direct_fft x 512         │  30.02│1436000│  30.00│1435029│ 47834.30│   20.9055│331.60│
│neon_d32::packed_direct_fft x 512  │  30.02│1323000│  30.00│1322334│ 44077.82│   22.6871│305.56│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024          │  30.02│ 195000│  30.00│ 194851│  6495.05│  153.9633│101.13│
│native::packed_direct_fft x 1024   │  30.05│ 193000│  30.00│ 192674│  6422.48│  155.7032│100.00│
│neon_d32::direct_fft x 1024        │  30.00│ 645000│  30.00│ 644953│ 21498.43│   46.5150│334.74│
│neon_d32::packed_direct_fft x 1024 │  30.00│ 597000│  30.00│ 596909│ 19896.99│   50.2589│309.80│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048          │  30.34│  87000│  30.00│  86025│  2867.50│  348.7354│100.00│
│native::packed_direct_fft x 2048   │  30.02│  87000│  30.00│  86932│  2897.74│  345.0969│101.05│
│neon_d32::direct_fft x 2048        │  30.09│ 276000│  30.00│ 275196│  9173.20│  109.0132│319.90│
│neon_d32::packed_direct_fft x 2048 │  30.02│ 270000│  30.00│ 269847│  8994.93│  111.1737│313.69│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096          │  30.72│  36000│  30.00│  35160│  1172.03│  853.2223│100.00│
│native::packed_direct_fft x 4096   │  30.18│  38000│  30.00│  37769│  1259.00│  794.2829│107.42│
│neon_d32::direct_fft x 4096        │  30.14│ 105000│  30.00│ 104524│  3484.14│  287.0151│297.27│
│neon_d32::packed_direct_fft x 4096 │  30.14│ 113000│  30.00│ 112474│  3749.14│  266.7282│319.88│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192          │  31.31│  15000│  30.00│  14374│   479.14│ 2087.0702│100.00│
│native::packed_direct_fft x 8192   │  30.89│  17000│  30.00│  16512│   550.42│ 1816.8099│114.88│
│neon_d32::direct_fft x 8192        │  30.53│  34000│  30.00│  33405│  1113.52│  898.0569│232.40│
│neon_d32::packed_direct_fft x 8192 │  30.33│  44000│  30.00│  43523│  1450.78│  689.2854│302.79│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384         │  31.28│   7000│  30.00│   6713│   223.77│ 4468.8017│100.00│
│native::packed_direct_fft x 16384  │  31.53│   8000│  30.00│   7611│   253.73│ 3941.2022│113.39│
│neon_d32::direct_fft x 16384       │  30.96│  16000│  30.00│  15503│   516.78│ 1935.0739│230.94│
│neon_d32::packed_direct_fft x 16384│  30.28│  20000│  30.00│  19812│   660.42│ 1514.1930│295.13│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768         │  30.36│   3000│  30.00│   2964│    98.81│10120.9080│100.00│
│native::packed_direct_fft x 32768  │  35.30│   4000│  30.00│   3399│   113.31│ 8825.7283│114.68│
│neon_d32::direct_fft x 32768       │  31.13│   7000│  30.00│   6745│   224.86│ 4447.1594│227.58│
│neon_d32::packed_direct_fft x 32768│  30.96│   9000│  30.00│   8720│   290.69│ 3440.0632│294.21│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536         │  53.35│   2000│  30.00│   1124│    37.49│26675.3555│100.00│
│native::packed_direct_fft x 65536  │  45.28│   2000│  30.00│   1325│    44.17│22641.4340│117.82│
│neon_d32::direct_fft x 65536       │  40.37│   3000│  30.00│   2229│    74.30│13458.0963│198.21│
│neon_d32::packed_direct_fft x 65536│  31.26│   3000│  30.00│   2879│    95.97│10419.9240│256.00│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 3B:

Code: Select all

┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256           │  30.01│1332000│  30.00│1331626│ 44387.56│   22.5288│101.78│
│native::packed_direct_fft x 256    │  30.02│1309000│  30.00│1308291│ 43609.72│   22.9307│100.00│
│neon_d32::direct_fft x 256         │  30.00│4265000│  30.00│4264809│142160.31│    7.0343│325.98│
│neon_d32::packed_direct_fft x 256  │  30.00│3907000│  30.00│3906824│130227.50│    7.6789│298.62│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512           │  30.05│ 590000│  30.00│ 589086│ 19636.23│   50.9263│101.46│
│native::packed_direct_fft x 512    │  30.02│ 581000│  30.00│ 580612│ 19353.74│   51.6696│100.00│
│neon_d32::direct_fft x 512         │  30.00│1926000│  30.00│1925738│ 64191.27│   15.5784│331.67│
│neon_d32::packed_direct_fft x 512  │  30.00│1774000│  30.00│1773897│ 59129.92│   16.9119│305.52│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024          │  30.05│ 264000│  30.00│ 263595│  8786.52│  113.8106│101.28│
│native::packed_direct_fft x 1024   │  30.08│ 261000│  30.00│ 260263│  8675.46│  115.2677│100.00│
│neon_d32::direct_fft x 1024        │  30.01│ 875000│  30.00│ 874755│ 29158.51│   34.2953│336.10│
│neon_d32::packed_direct_fft x 1024 │  30.03│ 806000│  30.00│ 805103│ 26836.78│   37.2623│309.34│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048          │  30.03│ 117000│  30.00│ 116864│  3895.47│  256.7081│102.71│
│native::packed_direct_fft x 2048   │  30.06│ 114000│  30.00│ 113777│  3792.58│  263.6726│100.00│
│neon_d32::direct_fft x 2048        │  30.04│ 374000│  30.00│ 373458│ 12448.62│   80.3302│328.24│
│neon_d32::packed_direct_fft x 2048 │  30.05│ 341000│  30.00│ 340423│ 11347.47│   88.1254│299.20│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096          │  30.47│  48000│  30.00│  47254│  1575.16│  634.8556│100.00│
│native::packed_direct_fft x 4096   │  30.58│  49000│  30.00│  48066│  1602.22│  624.1323│101.72│
│neon_d32::direct_fft x 4096        │  30.13│ 144000│  30.00│ 143375│  4779.19│  209.2405│303.41│
│neon_d32::packed_direct_fft x 4096 │  30.11│ 126000│  30.00│ 125535│  4184.52│  238.9759│265.66│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192          │  30.69│  19000│  30.00│  18571│   619.05│ 1615.3704│100.00│
│native::packed_direct_fft x 8192   │  31.37│  22000│  30.00│  21039│   701.33│ 1425.8718│113.29│
│neon_d32::direct_fft x 8192        │  30.28│  43000│  30.00│  42608│  1420.28│  704.0871│229.43│
│neon_d32::packed_direct_fft x 8192 │  30.30│  53000│  30.00│  52473│  1749.13│  571.7130│282.55│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384         │  32.94│   9000│  30.00│   8195│   273.18│ 3660.5484│100.00│
│native::packed_direct_fft x 16384  │  31.43│  10000│  30.00│   9545│   318.18│ 3142.8263│116.47│
│neon_d32::direct_fft x 16384       │  30.14│  20000│  30.00│  19910│   663.67│ 1506.7712│242.94│
│neon_d32::packed_direct_fft x 16384│  30.07│  24000│  30.00│  23943│   798.11│ 1252.9593│292.15│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768         │  33.49│   4000│  30.00│   3583│   119.44│ 8372.4263│100.00│
│native::packed_direct_fft x 32768  │  36.17│   5000│  30.00│   4146│   138.22│ 7234.8576│115.72│
│neon_d32::direct_fft x 32768       │  33.32│   9000│  30.00│   8104│   270.14│ 3701.7390│226.18│
│neon_d32::packed_direct_fft x 32768│  30.02│  10000│  30.00│   9993│   333.13│ 3001.8067│278.91│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536         │  48.27│   2000│  30.00│   1243│    41.43│24134.7980│100.00│
│native::packed_direct_fft x 65536  │  38.55│   2000│  30.00│   1556│    51.88│19276.3165│125.20│
│neon_d32::direct_fft x 65536       │  38.96│   3000│  30.00│   2309│    76.99│12988.2230│185.82│
│neon_d32::packed_direct_fft x 65536│  36.24│   4000│  30.00│   3310│   110.37│ 9060.7077│266.37│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
RPI 4B:

Code: Select all

┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256           │  30.01│4133000│  30.00│4132257│137741.90│    7.2600│100.00│
│native::packed_direct_fft x 256    │  30.01│4182000│  30.00│4181190│139373.02│    7.1750│101.18│
│neon_d32::direct_fft x 256         │  30.00│6437000│  30.00│6436997│214566.58│    4.6606│155.77│
│neon_d32::packed_direct_fft x 256  │  30.00│7430000│  30.00│7429785│247659.52│    4.0378│179.80│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512           │  30.02│1856000│  30.00│1855015│ 61833.85│   16.1724│100.00│
│native::packed_direct_fft x 512    │  30.02│1866000│  30.00│1865031│ 62167.73│   16.0855│100.54│
│neon_d32::direct_fft x 512         │  30.01│2969000│  30.00│2968377│ 98945.92│   10.1065│160.02│
│neon_d32::packed_direct_fft x 512  │  30.01│3339000│  30.00│3338103│111270.13│    8.9871│179.95│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024          │  30.04│ 841000│  30.00│ 840006│ 28000.21│   35.7140│100.00│
│native::packed_direct_fft x 1024   │  30.03│ 844000│  30.00│ 843123│ 28104.12│   35.5820│100.37│
│neon_d32::direct_fft x 1024        │  30.02│1372000│  30.00│1371286│ 45709.54│   21.8773│163.25│
│neon_d32::packed_direct_fft x 1024 │  30.01│1507000│  30.00│1506694│ 50223.14│   19.9111│179.37│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048          │  30.03│ 382000│  30.00│ 381644│ 12721.48│   78.6072│100.00│
│native::packed_direct_fft x 2048   │  30.05│ 384000│  30.00│ 383310│ 12777.01│   78.2656│100.44│
│neon_d32::direct_fft x 2048        │  30.03│ 630000│  30.00│ 629390│ 20979.69│   47.6651│164.92│
│neon_d32::packed_direct_fft x 2048 │  30.04│ 673000│  30.00│ 672158│ 22405.28│   44.6323│176.12│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096          │  30.04│ 167000│  30.00│ 166752│  5558.42│  179.9073│100.00│
│native::packed_direct_fft x 4096   │  30.15│ 169000│  30.00│ 168153│  5605.11│  178.4087│100.84│
│neon_d32::direct_fft x 4096        │  30.01│ 259000│  30.00│ 258899│  8629.99│  115.8750│155.26│
│neon_d32::packed_direct_fft x 4096 │  30.09│ 283000│  30.00│ 282156│  9405.23│  106.3238│169.21│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192          │  30.04│  67000│  30.00│  66918│  2230.63│  448.3037│100.00│
│native::packed_direct_fft x 8192   │  30.13│  75000│  30.00│  74669│  2488.99│  401.7690│111.58│
│neon_d32::direct_fft x 8192        │  30.19│ 105000│  30.00│ 104354│  3478.47│  287.4825│155.94│
│neon_d32::packed_direct_fft x 8192 │  30.21│ 119000│  30.00│ 118168│  3938.96│  253.8740│176.59│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384         │  30.25│  30000│  30.00│  29750│   991.68│ 1008.3872│100.00│
│native::packed_direct_fft x 16384  │  30.63│  35000│  30.00│  34281│  1142.73│  875.0958│115.23│
│neon_d32::direct_fft x 16384       │  30.12│  43000│  30.00│  42835│  1427.85│  700.3555│143.98│
│neon_d32::packed_direct_fft x 16384│  30.07│  52000│  30.00│  51883│  1729.44│  578.2207│174.39│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768         │  30.18│  14000│  30.00│  13914│   463.82│ 2156.0221│100.00│
│native::packed_direct_fft x 32768  │  31.82│  17000│  30.00│  16027│   534.24│ 1871.8109│115.18│
│neon_d32::direct_fft x 32768       │  30.76│  20000│  30.00│  19505│   650.19│ 1538.0225│140.18│
│neon_d32::packed_direct_fft x 32768│  30.09│  24000│  30.00│  23928│   797.60│ 1253.7599│171.96│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536         │  34.39│   7000│  30.00│   6107│   203.57│ 4912.2279│100.00│
│native::packed_direct_fft x 65536  │  30.49│   7000│  30.00│   6888│   229.61│ 4355.3056│112.79│
│neon_d32::direct_fft x 65536       │  32.87│   9000│  30.00│   8213│   273.80│ 3652.3288│134.50│
│neon_d32::packed_direct_fft x 65536│  32.17│  11000│  30.00│  10258│   341.95│ 2924.4389│167.97│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
ATB S:

Code: Select all

┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256           │  30.01│2738000│  30.00│2737374│ 91245.83│   10.9594│100.00│
│native::packed_direct_fft x 256    │  30.01│2842000│  30.00│2841417│ 94713.92│   10.5581│103.80│
│neon_d32::direct_fft x 256         │  30.00│5398000│  30.00│5397415│179913.84│    5.5582│197.17│
│neon_d32::packed_direct_fft x 256  │  30.00│4659000│  30.00│4658536│155284.54│    6.4398│170.18│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512           │  30.01│1221000│  30.00│1220583│ 40686.11│   24.5784│100.00│
│native::packed_direct_fft x 512    │  30.02│1265000│  30.00│1264280│ 42142.69│   23.7289│103.58│
│neon_d32::direct_fft x 512         │  30.00│2384000│  30.00│2383924│ 79464.15│   12.5843│195.31│
│neon_d32::packed_direct_fft x 512  │  30.01│2052000│  30.00│2051061│ 68368.72│   14.6266│168.04│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024          │  30.01│ 548000│  30.00│ 547797│ 18259.91│   54.7648│100.00│
│native::packed_direct_fft x 1024   │  30.02│ 566000│  30.00│ 565713│ 18857.12│   53.0304│103.27│
│neon_d32::direct_fft x 1024        │  30.03│1067000│  30.00│1066067│ 35535.60│   28.1408│194.61│
│neon_d32::packed_direct_fft x 1024 │  30.01│ 907000│  30.00│ 906618│ 30220.61│   33.0900│165.50│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048          │  30.11│ 246000│  30.00│ 245129│  8170.98│  122.3843│100.00│
│native::packed_direct_fft x 2048   │  30.01│ 253000│  30.00│ 252880│  8429.34│  118.6332│103.16│
│neon_d32::direct_fft x 2048        │  30.03│ 453000│  30.00│ 452512│ 15083.76│   66.2965│184.60│
│neon_d32::packed_direct_fft x 2048 │  30.02│ 402000│  30.00│ 401679│ 13389.31│   74.6864│163.86│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096          │  30.18│ 109000│  30.00│ 108359│  3612.00│  276.8550│100.00│
│native::packed_direct_fft x 4096   │  30.26│ 112000│  30.00│ 111036│  3701.20│  270.1826│102.47│
│neon_d32::direct_fft x 4096        │  30.09│ 189000│  30.00│ 188421│  6280.73│  159.2171│173.89│
│neon_d32::packed_direct_fft x 4096 │  30.12│ 178000│  30.00│ 177261│  5908.72│  169.2415│163.59│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192          │  30.65│  46000│  30.00│  45030│  1501.02│  666.2143│100.00│
│native::packed_direct_fft x 8192   │  30.22│  50000│  30.00│  49639│  1654.66│  604.3546│110.24│
│neon_d32::direct_fft x 8192        │  30.26│  75000│  30.00│  74350│  2478.35│  403.4943│165.11│
│neon_d32::packed_direct_fft x 8192 │  30.11│  74000│  30.00│  73731│  2457.70│  406.8841│163.74│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384         │  30.69│  21000│  30.00│  20529│   684.32│ 1461.3124│100.00│
│native::packed_direct_fft x 16384  │  30.31│  23000│  30.00│  22765│   758.85│ 1317.7867│110.89│
│neon_d32::direct_fft x 16384       │  30.63│  32000│  30.00│  31346│  1044.87│  957.0552│152.69│
│neon_d32::packed_direct_fft x 16384│  30.66│  34000│  30.00│  33266│  1108.90│  901.7948│162.04│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768         │  31.07│  10000│  30.00│   9656│   321.89│ 3106.6584│100.00│
│native::packed_direct_fft x 32768  │  31.37│  11000│  30.00│  10518│   350.61│ 2852.1370│108.92│
│neon_d32::direct_fft x 32768       │  30.60│  15000│  30.00│  14704│   490.13│ 2040.2563│152.27│
│neon_d32::packed_direct_fft x 32768│  31.33│  16000│  30.00│  15320│   510.69│ 1958.1298│158.65│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536         │  33.91│   5000│  30.00│   4422│   147.43│ 6782.8840│100.00│
│native::packed_direct_fft x 65536  │  33.31│   5000│  30.00│   4503│   150.12│ 6661.4460│101.82│
│neon_d32::direct_fft x 65536       │  31.77│   7000│  30.00│   6610│   220.36│ 4538.0359│149.47│
│neon_d32::packed_direct_fft x 65536│  31.96│   7000│  30.00│   6570│   219.00│ 4566.2017│148.55│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Short results:
Test #0: as expected, the RPI4 beat all competitors, but the Tinker Board S is very close.
Test #1: switching from GCC 6 to GCC 8 in the new Raspbian gave native code a large boost. The new RPI 4 is around 4 times faster in native code and around 2 times faster in the SIMD implementation.
Test #2: conditional native code is also faster, but the gap between the new generation and the previous one is noticeably smaller: around 2 times in the native implementation and around 1.5 times in the SIMD-optimized implementation. The Tinker Board performs at about the same level as the Raspberry Pi 3.
Test #3: the results of this test are surprising. Native code executes faster than SIMD-optimized code on the RPI4, while on the other platforms the SIMD-optimized code is faster, as expected. I need to inspect the code generated by GCC before drawing conclusions.
Test #4: the RPI 4 also seems to work better with the cache: large blocks are processed better than on the other platforms.
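
The Test #4 observation can be checked against the tables above. Here is a minimal Python sketch (not part of the LSP test suite) that takes a few Cost[us/i] values for direct_fft from the RPI 3B and RPI 4B tables and divides them. The names COST_US, speedup and SPEEDUPS are made up for this illustration.

```python
# Cost[us/i] values copied from the RPI 3B and RPI 4B tables of Test #4.
# Key: (implementation, FFT size) -> (RPI 3B cost, RPI 4B cost).
COST_US = {
    ("native::direct_fft", 256): (22.5288, 7.2600),
    ("native::direct_fft", 4096): (634.8556, 179.9073),
    ("native::direct_fft", 65536): (24134.7980, 4912.2279),
    ("neon_d32::direct_fft", 256): (7.0343, 4.6606),
    ("neon_d32::direct_fft", 4096): (209.2405, 115.8750),
    ("neon_d32::direct_fft", 65536): (12988.2230, 3652.3288),
}


def speedup(old_cost, new_cost):
    """Ratio of per-iteration costs: how many times faster the new board is."""
    return old_cost / new_cost


# Speedup of the RPI 4B over the RPI 3B for every copied row.
SPEEDUPS = {key: speedup(*costs) for key, costs in COST_US.items()}

for (impl, size), ratio in SPEEDUPS.items():
    print(f"{impl:21s} x {size:<6d} RPI 4B is {ratio:.2f}x faster than RPI 3B")
```

For both implementations the ratio grows with the block size, which is consistent with the cache remark, although the numbers alone do not prove that the cache is the cause.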

So, currently the RPI4 is a very interesting SBC platform.
LSP (Linux Studio Plugins) Developer and Maintainer.

Re: Some performance tests on RPI4 vs RPI3

Post by sadko4u »

Found the reason why the native code was faster than the SIMD-optimized code in Test #3.
On Cortex-A72, mixing loads and stores of the s0-s31 registers with arithmetic on q0-q15 and d0-d31 incurs an extra penalty.
Replacing the vld and vldm instructions with single-lane vld1.32 loads gives much neater results on the RPI 4:

Code: Select all

┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8  │  30.51│ 530000│  30.00│ 521095│ 17369.87│   57.5710│152.02│
│neon_d32::biquad_process_x1 x8│  30.63│ 350000│  30.00│ 342774│ 11425.81│   87.5211│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4  │  30.06│1320000│  30.00│1317208│ 43906.96│   22.7754│288.33│
│neon_d32::biquad_process_x2 x4│  30.21│ 460000│  30.00│ 456836│ 15227.87│   65.6691│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2  │  30.14│1190000│  30.00│1184658│ 39488.62│   25.3238│100.00│
│neon_d32::biquad_process_x4 x2│  30.09│2430000│  30.00│2422951│ 80765.06│   12.3816│204.53│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1  │  30.28│ 820000│  30.00│ 812486│ 27082.90│   36.9237│100.00│
│neon_d32::biquad_process_x8 x1│  30.03│3590000│  30.00│3586740│119558.02│    8.3641│441.45│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
But the x2 and x4 implementations are still slow.
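
For reading the table above, the columns appear to be related in a simple way: Cost[us/i] is 1e6 / Perf[i/s], and Rel[%] is each row's Perf relative to the slowest row in its group. The sketch below checks that against the copied values and prints the NEON-to-native ratio per variant. ROWS, cost_us, relative_percent and neon_vs_native are hypothetical names for this illustration. The column relationship is inferred from the numbers, not taken from the test framework's source.

```python
# (case, Perf[i/s], Cost[us/i], Rel[%]) copied from the table above.
ROWS = [
    ("native::biquad_process_x1 x8", 17369.87, 57.5710, 152.02),
    ("neon_d32::biquad_process_x1 x8", 11425.81, 87.5211, 100.00),
    ("native::biquad_process_x2 x4", 43906.96, 22.7754, 288.33),
    ("neon_d32::biquad_process_x2 x4", 15227.87, 65.6691, 100.00),
    ("native::biquad_process_x4 x2", 39488.62, 25.3238, 100.00),
    ("neon_d32::biquad_process_x4 x2", 80765.06, 12.3816, 204.53),
    ("native::biquad_process_x8 x1", 27082.90, 36.9237, 100.00),
    ("neon_d32::biquad_process_x8 x1", 119558.02, 8.3641, 441.45),
]


def cost_us(perf):
    """Microseconds per iteration for a given iterations-per-second figure."""
    return 1e6 / perf


def relative_percent(perfs):
    """Performance of each entry relative to the slowest one, in percent."""
    slowest = min(perfs)
    return [100.0 * p / slowest for p in perfs]


def neon_vs_native(rows):
    """Map each variant to Perf(neon_d32) / Perf(native); rows come in pairs."""
    ratios = {}
    for i in range(0, len(rows), 2):
        native, neon = rows[i], rows[i + 1]
        variant = native[0].split("::")[1].split()[0]
        ratios[variant] = neon[1] / native[1]
    return ratios


RATIOS = neon_vs_native(ROWS)
for variant, ratio in RATIOS.items():
    print(f"{variant}: neon_d32 runs at {ratio:.2f}x the native speed")
```

A ratio below 1 means the NEON row is slower than the native row for that variant.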

Re: Some performance tests on RPI4 vs RPI3

Post by sadko4u »

Today I installed a Manjaro AArch64 system (with GCC 9.1.0) on another Raspberry Pi with 4 GB of RAM.
Here are the same tests on the current devel branch of the LSP Plugins repository:
Test #0

Code: Select all

real    6m33.574s
user    8m43.348s
sys     1m3.176s
Test #1

Code: Select all

┌Case─────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256  │   5.00│5731000│   5.00│5730099│1146019.85│    0.8726│100.00│
│asimd::pcomplex_mul3 x 256   │   5.00│8127000│   5.00│8126270│1625254.05│    0.6153│141.82│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512  │   5.00│2894000│   5.00│2893257│ 578651.40│    1.7282│100.00│
│asimd::pcomplex_mul3 x 512   │   5.00│4035000│   5.00│4034650│ 806930.12│    1.2393│139.45│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │   5.00│1455000│   5.00│1454081│ 290816.32│    3.4386│100.00│
│asimd::pcomplex_mul3 x 1024  │   5.00│2035000│   5.00│2034188│ 406837.75│    2.4580│139.90│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │   5.00│ 691000│   5.00│ 690721│ 138144.27│    7.2388│100.00│
│asimd::pcomplex_mul3 x 2048  │   5.01│ 966000│   5.00│ 965031│ 193006.30│    5.1812│139.71│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │   5.01│ 354000│   5.00│ 353464│  70692.87│   14.1457│100.00│
│asimd::pcomplex_mul3 x 4096  │   5.00│ 497000│   5.00│ 496575│  99315.13│   10.0690│140.49│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │   5.01│ 178000│   5.00│ 177533│  35506.67│   28.1637│100.00│
│asimd::pcomplex_mul3 x 8192  │   5.01│ 248000│   5.00│ 247520│  49504.08│   20.2004│139.42│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384│   5.00│  89000│   5.00│  88956│  17791.40│   56.2069│100.00│
│asimd::pcomplex_mul3 x 16384 │   5.04│ 125000│   5.00│ 124053│  24810.79│   40.3050│139.45│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768│   5.10│  44000│   5.00│  43175│   8635.13│  115.8061│100.00│
│asimd::pcomplex_mul3 x 32768 │   5.05│  59000│   5.00│  58377│  11675.49│   85.6495│135.21│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536│   5.26│  19000│   5.00│  18077│   3615.53│  276.5848│100.00│
│asimd::pcomplex_mul3 x 65536 │   5.07│  21000│   5.00│  20704│   4140.81│  241.4989│114.53│
└─────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
Test #2

Code: Select all

┌Case────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64   │   5.01│4340000│   5.00│4335180│867036.03│    1.1534│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128  │   5.01│2220000│   5.00│2215429│443085.91│    2.2569│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256  │   5.00│ 965000│   5.00│ 964793│192958.71│    5.1825│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512  │   5.05│ 385000│   5.00│ 381368│ 76273.75│   13.1107│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │   5.18│ 145000│   5.00│ 140071│ 28014.33│   35.6960│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │   5.10│  60000│   5.00│  58856│ 11771.24│   84.9528│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │   5.31│  30000│   5.00│  28251│  5650.32│  176.9811│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │   5.35│  15000│   5.00│  14014│  2802.85│  356.7799│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384│   7.16│  10000│   5.00│   6980│  1396.10│  716.2832│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768│   7.17│   5000│   5.00│   3489│   697.83│ 1433.0150│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536│  14.40│   5000│   5.00│   1736│   347.28│ 2879.5062│100.00│
└────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Test #3

Code: Select all

┌Case────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8│  30.41│ 740000│  30.00│ 730077│ 24335.92│   41.0915│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4│  30.04│1370000│  30.00│1368072│ 45602.40│   21.9287│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2│  30.10│1330000│  30.00│1325607│ 44186.92│   22.6311│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1│  30.09│1370000│  30.00│1365703│ 45523.43│   21.9667│100.00│
└────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Test #4

Code: Select all

┌Case─────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256         │  30.00│5186000│  30.00│5185337│172844.58│    5.7855│105.36│
│native::packed_direct_fft x 256  │  30.00│4922000│  30.00│4921528│164050.96│    6.0957│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512         │  30.00│2295000│  30.00│2294629│ 76487.65│   13.0740│104.58│
│native::packed_direct_fft x 512  │  30.01│2195000│  30.00│2194084│ 73136.16│   13.6731│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024        │  30.01│1033000│  30.00│1032598│ 34419.96│   29.0529│104.01│
│native::packed_direct_fft x 1024 │  30.01│ 993000│  30.00│ 992748│ 33091.63│   30.2191│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048        │  30.06│ 391000│  30.00│ 390269│ 13008.97│   76.8700│100.00│
│native::packed_direct_fft x 2048 │  30.05│ 427000│  30.00│ 426328│ 14210.94│   70.3683│109.24│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096        │  30.01│ 181000│  30.00│ 180954│  6031.81│  165.7876│100.00│
│native::packed_direct_fft x 4096 │  30.11│ 190000│  30.00│ 189279│  6309.33│  158.4954│104.60│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192        │  30.24│  77000│  30.00│  76398│  2546.62│  392.6767│100.00│
│native::packed_direct_fft x 8192 │  30.31│  86000│  30.00│  85109│  2836.97│  352.4882│111.40│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384       │  30.64│  34000│  30.00│  33290│  1109.70│  901.1450│100.00│
│native::packed_direct_fft x 16384│  30.13│  40000│  30.00│  39828│  1327.63│  753.2210│119.64│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768       │  30.50│  16000│  30.00│  15735│   524.52│ 1906.4963│100.00│
│native::packed_direct_fft x 32768│  30.55│  19000│  30.00│  18658│   621.94│ 1607.8599│118.57│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536       │  30.52│   7000│  30.00│   6881│   229.39│ 4359.3880│100.00│
│native::packed_direct_fft x 65536│  31.08│   8000│  30.00│   7723│   257.44│ 3884.4142│112.23│
└─────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
As we can see, many of the methods provided by LSP Plugins are not yet optimized for ASIMD.
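
The same conclusion can be read mechanically from the case names in the AArch64 tables above. This is a small Python sketch, assuming that a function with no asimd:: row in the benchmark output has no ASIMD implementation in that build. CASES and missing_simd are made-up names for the illustration.

```python
# Case names as they appear in the AArch64 tables above (sizes omitted).
CASES = [
    "native::pcomplex_mul3",
    "asimd::pcomplex_mul3",
    "native::hsla_to_rgba",
    "native::biquad_process_x1",
    "native::biquad_process_x2",
    "native::biquad_process_x4",
    "native::biquad_process_x8",
    "native::direct_fft",
    "native::packed_direct_fft",
]


def missing_simd(cases, simd_prefix="asimd"):
    """Return the functions that have no row with the given SIMD prefix."""
    by_function = {}
    for case in cases:
        impl, function = case.split("::", 1)
        by_function.setdefault(function, set()).add(impl)
    return sorted(f for f, impls in by_function.items() if simd_prefix not in impls)


MISSING = missing_simd(CASES)
print("no asimd row:", ", ".join(MISSING))
```

Of the functions benchmarked here, only pcomplex_mul3 has an asimd row.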

Re: Some performance tests on RPI4 vs RPI3

Post by sadko4u »

Measurements for a Banana Pi M64 with a Cortex-A53 CPU and 2 GB of RAM on board, AArch64, Debian 9 (stretch), GCC 6.3.0.
Test #0

Code: Select all

real    8m26.972s
user    15m27.800s
sys     0m59.050s
Test #1

Code: Select all

┌Case─────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256  │   5.00│1324000│   5.00│1323665│ 264733.02│    3.7774│100.00│
│asimd::pcomplex_mul3 x 256   │   5.00│5079000│   5.00│5078430│1015686.04│    0.9846│383.66│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512  │   5.00│ 862000│   5.00│ 861580│ 172316.15│    5.8033│100.00│
│asimd::pcomplex_mul3 x 512   │   5.00│2578000│   5.00│2577158│ 515431.66│    1.9401│299.12│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │   5.01│ 430000│   5.00│ 429443│  85888.69│   11.6430│100.00│
│asimd::pcomplex_mul3 x 1024  │   5.00│1323000│   5.00│1322242│ 264448.58│    3.7815│307.90│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │   5.01│ 195000│   5.00│ 194685│  38937.09│   25.6825│100.00│
│asimd::pcomplex_mul3 x 2048  │   5.01│ 555000│   5.00│ 554254│ 110850.99│    9.0211│284.69│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │   5.01│  97000│   5.00│  96883│  19376.71│   51.6084│100.00│
│asimd::pcomplex_mul3 x 4096  │   5.00│ 263000│   5.00│ 262842│  52568.53│   19.0228│271.30│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │   5.08│  48000│   5.00│  47198│   9439.62│  105.9365│100.00│
│asimd::pcomplex_mul3 x 8192  │   5.03│ 132000│   5.00│ 131201│  26240.40│   38.1092│277.98│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384│   5.26│  25000│   5.00│  23785│   4757.04│  210.2150│100.00│
│asimd::pcomplex_mul3 x 16384 │   5.05│  63000│   5.00│  62341│  12468.27│   80.2036│262.10│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768│   5.12│  12000│   5.00│  11718│   2343.68│  426.6792│100.00│
│asimd::pcomplex_mul3 x 32768 │   5.17│  24000│   5.00│  23193│   4638.61│  215.5817│197.92│
├─────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536│   5.42│   5000│   5.00│   4612│    922.46│ 1084.0528│100.00│
│asimd::pcomplex_mul3 x 65536 │   5.59│   7000│   5.00│   6261│   1252.37│  798.4843│135.76│
└─────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
Test #2

Code: Select all

┌Case────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64   │   5.00│1305000│   5.00│1304616│260923.39│    3.8325│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128  │   5.02│ 685000│   5.00│ 682719│136543.92│    7.3237│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256  │   5.03│ 350000│   5.00│ 348095│ 69619.09│   14.3639│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512  │   5.06│ 175000│   5.00│ 173043│ 34608.67│   28.8945│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │   5.24│  90000│   5.00│  85844│ 17168.81│   58.2452│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │   5.27│  45000│   5.00│  42677│  8535.52│  117.1574│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │   5.81│  25000│   5.00│  21521│  4304.25│  232.3284│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │   6.97│  15000│   5.00│  10753│  2150.80│  464.9441│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384│   9.26│  10000│   5.00│   5400│  1080.16│  925.7896│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768│   9.34│   5000│   5.00│   2675│   535.18│ 1868.5150│100.00│
├────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536│  19.23│   5000│   5.00│   1300│   260.00│ 3846.1240│100.00│
└────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
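One way to compare rows across block sizes is to normalize the cost by the block size. The sketch below does this with the Cost[us/i] column of the hsla_to_rgba table. It assumes that "x N" is the number of elements processed per iteration, which the table does not state explicitly. The helper name `ns_per_element` is hypothetical.

```python
# Cost[us/i] copied from the hsla_to_rgba table above, keyed by block size.
cost_us = {
    64: 3.8325,
    128: 7.3237,
    256: 14.3639,
    512: 28.8945,
    1024: 58.2452,
    2048: 117.1574,
    4096: 232.3284,
    8192: 464.9441,
    16384: 925.7896,
    32768: 1868.5150,
    65536: 3846.1240,
}


def ns_per_element(cost_by_size):
    """Convert microseconds per iteration into nanoseconds per element."""
    return {n: 1000.0 * cost / n for n, cost in cost_by_size.items()}


per_elem = ns_per_element(cost_us)
for n, ns in per_elem.items():
    print(f"x {n:>5}: {ns:.1f} ns per element")
```

Under that assumption the per-element cost stays between roughly 56 and 60 ns over the whole range, so this routine scales close to linearly with block size on this machine.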
Test #3: biquad filter processing (only the native implementation is listed):

Code: Select all

┌Case────────────────────────┬Time[s]┬──Iter┬Samp[s]┬───Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8│  30.45│310000│  30.00│305404│ 10180.14│   98.2304│100.00│
├────────────────────────────┼───────┼──────┼───────┼──────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4│  30.48│430000│  30.00│423172│ 14105.75│   70.8931│100.00│
├────────────────────────────┼───────┼──────┼───────┼──────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2│  30.23│570000│  30.00│565594│ 18853.16│   53.0415│100.00│
├────────────────────────────┼───────┼──────┼───────┼──────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1│  30.45│640000│  30.00│630553│ 21018.44│   47.5773│100.00│
└────────────────────────────┴───────┴──────┴───────┴──────┴─────────┴──────────┴──────┘
Test #4: direct FFT, plain vs. packed complex layout (only the native implementations are listed):

Code: Select all

┌Case─────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256         │  30.02│1442000│  30.00│1441231│ 48041.06│   20.8155│101.64│
│native::packed_direct_fft x 256  │  30.02│1419000│  30.00│1418004│ 47266.82│   21.1565│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512         │  30.02│ 644000│  30.00│ 643669│ 21455.66│   46.6078│102.12│
│native::packed_direct_fft x 512  │  30.03│ 631000│  30.00│ 630297│ 21009.91│   47.5966│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024        │  30.08│ 289000│  30.00│ 288209│  9607.00│  104.0908│101.86│
│native::packed_direct_fft x 1024 │  30.01│ 283000│  30.00│ 282946│  9431.56│  106.0270│100.00│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048        │  30.16│ 127000│  30.00│ 126314│  4210.48│  237.5028│100.00│
│native::packed_direct_fft x 2048 │  30.02│ 127000│  30.00│ 126927│  4230.92│  236.3551│100.49│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096        │  30.06│  55000│  30.00│  54883│  1829.46│  546.6080│100.00│
│native::packed_direct_fft x 4096 │  30.43│  56000│  30.00│  55200│  1840.03│  543.4693│100.58│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192        │  30.04│  21000│  30.00│  20973│   699.12│ 1430.3764│100.00│
│native::packed_direct_fft x 8192 │  30.24│  24000│  30.00│  23810│   793.67│ 1259.9690│113.52│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384       │  31.52│  10000│  30.00│   9516│   317.21│ 3152.4880│100.00│
│native::packed_direct_fft x 16384│  30.05│  11000│  30.00│  10983│   366.12│ 2731.3645│115.42│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768       │  36.95│   5000│  30.00│   4059│   135.33│ 7389.3776│100.00│
│native::packed_direct_fft x 32768│  31.45│   5000│  30.00│   4769│   158.98│ 6290.2866│117.47│
├─────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536       │  45.39│   2000│  30.00│   1322│    44.07│22692.5830│100.00│
│native::packed_direct_fft x 65536│  37.31│   2000│  30.00│   1608│    53.60│18656.4135│121.63│
└─────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
LSP (Linux Studio Plugins) Developer and Maintainer.