Test configurations:
- Raspberry Pi 2B v1.2 with Cortex-A53 CPU, Raspbian Stretch, GCC 6.3
- Raspberry Pi 3B with Cortex-A53 CPU, Raspbian Stretch, GCC 6.3
- Raspberry Pi 4B with Cortex-A72 CPU, Raspbian Buster, GCC 8.3
- ASUS Tinker Board S with Cortex-A17 CPU (though it is actually identified as Cortex-A12, hmmm), TinkerOS (Debian 9), GCC 6.3
time make -j2 test
- RPI 2B v1.2 - real 9m29.012s, user 17m51.744s, sys 1m1.830s
- RPI 3B - real 9m5.419s, user 17m5.062s, sys 0m58.891s
- RPI 4B - real 4m3.904s, user 7m6.339s, sys 1m1.706s
- ATB S - real 4m22.421s, user 8m3.430s, sys 0m26.740s
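To compare these timings numerically, the `real`/`user` strings can be converted to seconds. The snippet below is only a rough sketch: the helper names `duration_seconds` and `busy_cores` are made up for illustration, and the "busy cores" figure simply divides user time by wall-clock time (sys time is ignored).

```cpp
#include <cstdio>

// Hypothetical helper: convert a `time`-style duration such as "9m29.012s"
// to seconds. Returns a negative value if the text does not have that shape.
double duration_seconds(const char *text) {
    int minutes = 0;
    double seconds = 0.0;
    if (std::sscanf(text, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

// Hypothetical helper: average number of busy cores during the run,
// estimated as user CPU time divided by wall-clock (real) time.
double busy_cores(const char *real_text, const char *user_text) {
    return duration_seconds(user_text) / duration_seconds(real_text);
}
```

For the RPI 2B v1.2 line, `busy_cores("9m29.012s", "17m51.744s")` comes out a bit below 2, which is what one would expect from `-j2` keeping two jobs running most of the time.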
Test #1: multiplication of packed complex number blocks (packed means that the real and imaginary parts of each complex number are stored next to each other in memory):
.test/lsp-plugins-test ptest dsp.pcomplex.mul3
┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256 │ 5.00│ 909000│ 5.00│ 908533│181706.75│ 5.5034│100.00│
│neon_d32::pcomplex_mul3 x 256 │ 5.00│3580000│ 5.00│3579060│715812.17│ 1.3970│393.94│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512 │ 5.01│ 459000│ 5.00│ 458142│ 91628.60│ 10.9136│100.00│
│neon_d32::pcomplex_mul3 x 512 │ 5.00│1818000│ 5.00│1817274│363454.98│ 2.7514│396.66│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │ 5.01│ 230000│ 5.00│ 229314│ 45862.87│ 21.8041│100.00│
│neon_d32::pcomplex_mul3 x 1024 │ 5.00│ 941000│ 5.00│ 940935│188187.02│ 5.3139│410.33│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │ 5.01│ 106000│ 5.00│ 105852│ 21170.60│ 47.2353│100.00│
│neon_d32::pcomplex_mul3 x 2048 │ 5.01│ 491000│ 5.00│ 490502│ 98100.49│ 10.1936│463.38│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │ 5.06│ 53000│ 5.00│ 52344│ 10468.99│ 95.5202│100.00│
│neon_d32::pcomplex_mul3 x 4096 │ 5.00│ 189000│ 5.00│ 188976│ 37795.25│ 26.4584│361.02│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │ 5.08│ 27000│ 5.00│ 26562│ 5312.56│ 188.2333│100.00│
│neon_d32::pcomplex_mul3 x 8192 │ 5.01│ 97000│ 5.00│ 96891│ 19378.25│ 51.6042│364.76│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384 │ 5.23│ 14000│ 5.00│ 13394│ 2678.92│ 373.2847│100.00│
│neon_d32::pcomplex_mul3 x 16384│ 5.07│ 52000│ 5.00│ 51265│ 10253.09│ 97.5316│382.73│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768 │ 5.69│ 7000│ 5.00│ 6152│ 1230.43│ 812.7244│100.00│
│neon_d32::pcomplex_mul3 x 32768│ 5.23│ 18000│ 5.00│ 17222│ 3444.58│ 290.3112│279.95│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536 │ 5.09│ 3000│ 5.00│ 2949│ 589.83│ 1695.4077│100.00│
│neon_d32::pcomplex_mul3 x 65536│ 5.01│ 6000│ 5.00│ 5984│ 1196.91│ 835.4813│202.93│
└───────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
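A note on reading the columns: the numbers in the table above look consistent with Perf[i/s] = Est / Samp[s], Cost[us/i] = 1e6 / Perf, and Rel[%] normalized so that the slowest case in each group is 100.00. That is my reading of the output, not something taken from the test framework's source, and the `Row` struct and function names below are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One benchmark row (hypothetical structure): estimated iterations ("Est")
// during the sampling window ("Samp[s]").
struct Row {
    double est_iters;
    double samp_sec;
};

// Perf[i/s]: iterations per second.
inline double perf(const Row &r) { return r.est_iters / r.samp_sec; }

// Cost[us/i]: microseconds spent per iteration.
inline double cost_us(const Row &r) { return 1e6 / perf(r); }

// Rel[%] for each row of one group, assuming the slowest row is the 100% baseline.
inline std::vector<double> rel_percent(const std::vector<Row> &group) {
    std::vector<double> out;
    if (group.empty())
        return out;
    double slowest = perf(group[0]);
    for (const Row &r : group)
        slowest = std::min(slowest, perf(r));
    for (const Row &r : group)
        out.push_back(100.0 * perf(r) / slowest);
    return out;
}
```

Feeding in the two `x 256` rows (Est 908533 and 3579060, Samp 5.00) gives roughly 5.50 us/i for the native case and about 394% for the NEON case, matching the table up to rounding of the displayed values.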
┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256 │ 5.00│1222000│ 5.00│1221007│244201.42│ 4.0950│100.00│
│neon_d32::pcomplex_mul3 x 256 │ 5.00│4781000│ 5.00│4780455│956091.20│ 1.0459│391.52│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512 │ 5.01│ 613000│ 5.00│ 612096│122419.26│ 8.1686│100.00│
│neon_d32::pcomplex_mul3 x 512 │ 5.00│2426000│ 5.00│2425035│485007.16│ 2.0618│396.19│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │ 5.01│ 306000│ 5.00│ 305352│ 61070.58│ 16.3745│100.00│
│neon_d32::pcomplex_mul3 x 1024 │ 5.00│1256000│ 5.00│1255695│251139.07│ 3.9819│411.23│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │ 5.02│ 146000│ 5.00│ 145352│ 29070.42│ 34.3992│100.00│
│neon_d32::pcomplex_mul3 x 2048 │ 5.00│ 554000│ 5.00│ 553900│110780.06│ 9.0269│381.07│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │ 5.04│ 71000│ 5.00│ 70398│ 14079.68│ 71.0244│100.00│
│neon_d32::pcomplex_mul3 x 4096 │ 5.01│ 264000│ 5.00│ 263732│ 52746.41│ 18.9586│374.63│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │ 5.11│ 36000│ 5.00│ 35230│ 7046.02│ 141.9241│100.00│
│neon_d32::pcomplex_mul3 x 8192 │ 5.00│ 130000│ 5.00│ 129948│ 25989.71│ 38.4768│368.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384 │ 5.11│ 18000│ 5.00│ 17606│ 3521.39│ 283.9789│100.00│
│neon_d32::pcomplex_mul3 x 16384│ 5.05│ 65000│ 5.00│ 64328│ 12865.80│ 77.7255│365.36│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768 │ 5.06│ 8000│ 5.00│ 7902│ 1580.50│ 632.7099│100.00│
│neon_d32::pcomplex_mul3 x 32768│ 5.09│ 21000│ 5.00│ 20614│ 4122.87│ 242.5497│260.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536 │ 5.53│ 4000│ 5.00│ 3615│ 723.09│ 1382.9450│100.00│
│neon_d32::pcomplex_mul3 x 65536│ 5.04│ 7000│ 5.00│ 6943│ 1388.70│ 720.0977│192.05│
└───────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256 │ 5.00│4766000│ 5.00│4765436│ 953087.35│ 1.0492│100.00│
│neon_d32::pcomplex_mul3 x 256 │ 5.00│8325000│ 5.00│8324159│1664831.85│ 0.6007│174.68│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512 │ 5.00│2419000│ 5.00│2418675│ 483735.18│ 2.0672│100.00│
│neon_d32::pcomplex_mul3 x 512 │ 5.00│4248000│ 5.00│4247392│ 849478.52│ 1.1772│175.61│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │ 5.00│1215000│ 5.00│1214474│ 242894.97│ 4.1170│100.00│
│neon_d32::pcomplex_mul3 x 1024 │ 5.00│2146000│ 5.00│2145721│ 429144.21│ 2.3302│176.68│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │ 5.00│ 596000│ 5.00│ 595606│ 119121.21│ 8.3948│100.00│
│neon_d32::pcomplex_mul3 x 2048 │ 5.00│ 994000│ 5.00│ 993814│ 198762.83│ 5.0311│166.86│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │ 5.02│ 301000│ 5.00│ 300003│ 60000.80│ 16.6664│100.00│
│neon_d32::pcomplex_mul3 x 4096 │ 5.01│ 516000│ 5.00│ 515361│ 103072.36│ 9.7019│171.78│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │ 5.02│ 151000│ 5.00│ 150344│ 30068.91│ 33.2569│100.00│
│neon_d32::pcomplex_mul3 x 8192 │ 5.02│ 260000│ 5.00│ 259211│ 51842.23│ 19.2893│172.41│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384 │ 5.05│ 76000│ 5.00│ 75291│ 15058.28│ 66.4086│100.00│
│neon_d32::pcomplex_mul3 x 16384│ 5.03│ 131000│ 5.00│ 130171│ 26034.22│ 38.4110│172.89│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768 │ 5.06│ 37000│ 5.00│ 36573│ 7314.79│ 136.7094│100.00│
│neon_d32::pcomplex_mul3 x 32768│ 5.03│ 60000│ 5.00│ 59631│ 11926.25│ 83.8487│163.04│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536 │ 5.02│ 17000│ 5.00│ 16943│ 3388.71│ 295.0975│100.00│
│neon_d32::pcomplex_mul3 x 65536│ 5.21│ 24000│ 5.00│ 23012│ 4602.59│ 217.2690│135.82│
└───────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
┌Case───────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::pcomplex_mul3 x 256 │ 5.00│3571000│ 5.00│3570153│ 714030.63│ 1.4005│100.00│
│neon_d32::pcomplex_mul3 x 256 │ 5.00│5853000│ 5.00│5852174│1170434.97│ 0.8544│163.92│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 512 │ 5.00│1830000│ 5.00│1829781│ 365956.38│ 2.7326│100.00│
│neon_d32::pcomplex_mul3 x 512 │ 5.00│2954000│ 5.00│2953243│ 590648.79│ 1.6931│161.40│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 1024 │ 5.00│ 916000│ 5.00│ 915967│ 183193.51│ 5.4587│100.00│
│neon_d32::pcomplex_mul3 x 1024 │ 5.00│1484000│ 5.00│1483181│ 296636.26│ 3.3711│161.93│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 2048 │ 5.01│ 448000│ 5.00│ 447374│ 89474.97│ 11.1763│100.00│
│neon_d32::pcomplex_mul3 x 2048 │ 5.00│ 725000│ 5.00│ 724702│ 144940.43│ 6.8994│161.99│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 4096 │ 5.01│ 228000│ 5.00│ 227398│ 45479.80│ 21.9878│100.00│
│neon_d32::pcomplex_mul3 x 4096 │ 5.00│ 357000│ 5.00│ 356757│ 71351.55│ 14.0151│156.89│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 8192 │ 5.03│ 115000│ 5.00│ 114395│ 22879.07│ 43.7081│100.00│
│neon_d32::pcomplex_mul3 x 8192 │ 5.02│ 181000│ 5.00│ 180369│ 36073.87│ 27.7209│157.67│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 16384 │ 5.07│ 57000│ 5.00│ 56168│ 11233.61│ 89.0186│100.00│
│neon_d32::pcomplex_mul3 x 16384│ 5.02│ 91000│ 5.00│ 90609│ 18121.87│ 55.1819│161.32│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 32768 │ 5.10│ 29000│ 5.00│ 28427│ 5685.49│ 175.8864│100.00│
│neon_d32::pcomplex_mul3 x 32768│ 5.02│ 45000│ 5.00│ 44844│ 8968.96│ 111.4956│157.75│
├───────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::pcomplex_mul3 x 65536 │ 5.37│ 14000│ 5.00│ 13038│ 2607.62│ 383.4910│100.00│
│neon_d32::pcomplex_mul3 x 65536│ 5.00│ 15000│ 5.00│ 14992│ 2998.49│ 333.5008│114.99│
└───────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
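For anyone unfamiliar with the packed layout used in the pcomplex_mul3 test above, here is a minimal scalar sketch of what such a routine computes. It is not the library's code: the name `packed_complex_mul3_sketch` and its signature are assumptions made for illustration, and the real native and NEON implementations are certainly organized differently.

```cpp
#include <cstddef>

// Hypothetical reference routine: dst[i] = a[i] * b[i] for `count` complex
// numbers, each stored packed as {re, im} pairs: re0, im0, re1, im1, ...
void packed_complex_mul3_sketch(float *dst, const float *a, const float *b,
                                std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const float ar = a[2 * i], ai = a[2 * i + 1];
        const float br = b[2 * i], bi = b[2 * i + 1];
        dst[2 * i]     = ar * br - ai * bi;  // real part
        dst[2 * i + 1] = ar * bi + ai * br;  // imaginary part
    }
}
```

With this layout a block of N complex numbers occupies 2*N floats, so the `x 256` case would touch 512 floats per buffer if the size is counted in complex numbers (that reading of the size is also an assumption).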
.test/lsp-plugins-test ptest dsp.graphics.hsla_to_rgba
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64 │ 5.01│1150000│ 5.00│1148586│229717.36│ 4.3532│100.00│
│neon_d32::hsla_to_rgba x 64 │ 5.00│2470000│ 5.00│2469142│493828.54│ 2.0250│214.97│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128 │ 5.03│ 565000│ 5.00│ 561791│112358.37│ 8.9001│100.00│
│neon_d32::hsla_to_rgba x 128 │ 5.01│1255000│ 5.00│1252281│250456.26│ 3.9927│222.91│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256 │ 5.08│ 285000│ 5.00│ 280395│ 56079.12│ 17.8319│100.00│
│neon_d32::hsla_to_rgba x 256 │ 5.04│ 635000│ 5.00│ 630223│126044.78│ 7.9337│224.76│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512 │ 5.01│ 140000│ 5.00│ 139587│ 27917.41│ 35.8199│100.00│
│neon_d32::hsla_to_rgba x 512 │ 5.06│ 320000│ 5.00│ 316267│ 63253.59│ 15.8094│226.57│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │ 5.07│ 70000│ 5.00│ 69045│ 13809.14│ 72.4158│100.00│
│neon_d32::hsla_to_rgba x 1024 │ 5.06│ 160000│ 5.00│ 158172│ 31634.51│ 31.6110│229.08│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │ 5.11│ 35000│ 5.00│ 34220│ 6844.04│ 146.1125│100.00│
│neon_d32::hsla_to_rgba x 2048 │ 5.10│ 80000│ 5.00│ 78449│ 15689.98│ 63.7349│229.25│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │ 5.82│ 20000│ 5.00│ 17169│ 3433.97│ 291.2079│100.00│
│neon_d32::hsla_to_rgba x 4096 │ 5.11│ 40000│ 5.00│ 39164│ 7832.88│ 127.6669│228.10│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │ 5.84│ 10000│ 5.00│ 8567│ 1713.46│ 583.6151│100.00│
│neon_d32::hsla_to_rgba x 8192 │ 5.08│ 20000│ 5.00│ 19670│ 3934.14│ 254.1852│229.60│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384 │ 5.92│ 5000│ 5.00│ 4223│ 844.67│ 1183.8974│100.00│
│neon_d32::hsla_to_rgba x 16384│ 5.12│ 10000│ 5.00│ 9769│ 1953.89│ 511.7994│231.32│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768 │ 12.26│ 5000│ 5.00│ 2038│ 407.71│ 2452.7014│100.00│
│neon_d32::hsla_to_rgba x 32768│ 5.21│ 5000│ 5.00│ 4798│ 959.76│ 1041.9286│235.40│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536 │ 24.71│ 5000│ 5.00│ 1011│ 202.35│ 4942.0450│100.00│
│neon_d32::hsla_to_rgba x 65536│ 10.47│ 5000│ 5.00│ 2387│ 477.53│ 2094.0958│236.00│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64 │ 5.00│1535000│ 5.00│1534953│306990.73│ 3.2574│100.00│
│neon_d32::hsla_to_rgba x 64 │ 5.01│3315000│ 5.00│3310910│662182.07│ 1.5102│215.70│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128 │ 5.00│ 750000│ 5.00│ 749272│149854.46│ 6.6731│100.00│
│neon_d32::hsla_to_rgba x 128 │ 5.00│1680000│ 5.00│1679023│335804.63│ 2.9779│224.09│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256 │ 5.04│ 375000│ 5.00│ 371678│ 74335.69│ 13.4525│100.00│
│neon_d32::hsla_to_rgba x 256 │ 5.03│ 850000│ 5.00│ 845517│169103.41│ 5.9135│227.49│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512 │ 5.13│ 190000│ 5.00│ 185137│ 37027.54│ 27.0069│100.00│
│neon_d32::hsla_to_rgba x 512 │ 5.01│ 425000│ 5.00│ 424278│ 84855.71│ 11.7847│229.17│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │ 5.11│ 95000│ 5.00│ 92925│ 18585.09│ 53.8066│100.00│
│neon_d32::hsla_to_rgba x 1024 │ 5.07│ 215000│ 5.00│ 212132│ 42426.47│ 23.5702│228.28│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │ 5.42│ 50000│ 5.00│ 46164│ 9232.89│ 108.3085│100.00│
│neon_d32::hsla_to_rgba x 2048 │ 5.24│ 110000│ 5.00│ 105017│ 21003.43│ 47.6113│227.48│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │ 5.43│ 25000│ 5.00│ 23036│ 4607.38│ 217.0433│100.00│
│neon_d32::hsla_to_rgba x 4096 │ 5.21│ 55000│ 5.00│ 52762│ 10552.43│ 94.7649│229.03│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │ 6.52│ 15000│ 5.00│ 11495│ 2299.02│ 434.9682│100.00│
│neon_d32::hsla_to_rgba x 8192 │ 5.68│ 30000│ 5.00│ 26416│ 5283.28│ 189.2765│229.81│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384 │ 8.72│ 10000│ 5.00│ 5730│ 1146.19│ 872.4523│100.00│
│neon_d32::hsla_to_rgba x 16384│ 5.68│ 15000│ 5.00│ 13195│ 2639.14│ 378.9107│230.25│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768 │ 8.71│ 5000│ 5.00│ 2869│ 573.90│ 1742.4670│100.00│
│neon_d32::hsla_to_rgba x 32768│ 7.68│ 10000│ 5.00│ 6507│ 1301.47│ 768.3627│226.78│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536 │ 17.61│ 5000│ 5.00│ 1419│ 283.91│ 3522.1824│100.00│
│neon_d32::hsla_to_rgba x 65536│ 7.97│ 5000│ 5.00│ 3136│ 627.36│ 1593.9904│220.97│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬─Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64 │ 5.00│4635000│ 5.00│4632126│ 926425.25│ 1.0794│100.00│
│neon_d32::hsla_to_rgba x 64 │ 5.00│6160000│ 5.00│6158996│1231799.22│ 0.8118│132.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 128 │ 5.01│2320000│ 5.00│2315033│ 463006.67│ 2.1598│100.00│
│neon_d32::hsla_to_rgba x 128 │ 5.00│3050000│ 5.00│3048391│ 609678.33│ 1.6402│131.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 256 │ 5.02│1110000│ 5.00│1105481│ 221096.29│ 4.5229│100.00│
│neon_d32::hsla_to_rgba x 256 │ 5.01│1550000│ 5.00│1547897│ 309579.41│ 3.2302│140.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 512 │ 5.02│ 440000│ 5.00│ 438303│ 87660.67│ 11.4076│100.00│
│neon_d32::hsla_to_rgba x 512 │ 5.01│ 780000│ 5.00│ 779036│ 155807.33│ 6.4182│177.74│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │ 5.05│ 150000│ 5.00│ 148385│ 29677.08│ 33.6960│100.00│
│neon_d32::hsla_to_rgba x 1024 │ 5.04│ 390000│ 5.00│ 386781│ 77356.23│ 12.9272│260.66│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │ 5.15│ 65000│ 5.00│ 63161│ 12632.28│ 79.1623│100.00│
│neon_d32::hsla_to_rgba x 2048 │ 5.10│ 195000│ 5.00│ 191178│ 38235.60│ 26.1536│302.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │ 5.76│ 35000│ 5.00│ 30367│ 6073.44│ 164.6512│100.00│
│neon_d32::hsla_to_rgba x 4096 │ 5.21│ 100000│ 5.00│ 95976│ 19195.20│ 52.0964│316.05│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │ 6.60│ 20000│ 5.00│ 15147│ 3029.56│ 330.0811│100.00│
│neon_d32::hsla_to_rgba x 8192 │ 5.19│ 50000│ 5.00│ 48143│ 9628.75│ 103.8556│317.83│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384 │ 6.60│ 10000│ 5.00│ 7573│ 1514.63│ 660.2272│100.00│
│neon_d32::hsla_to_rgba x 16384│ 5.19│ 25000│ 5.00│ 24102│ 4820.42│ 207.4509│318.26│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768 │ 6.58│ 5000│ 5.00│ 3801│ 760.22│ 1315.4130│100.00│
│neon_d32::hsla_to_rgba x 32768│ 6.22│ 15000│ 5.00│ 12049│ 2409.89│ 414.9569│317.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼──────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536 │ 13.22│ 5000│ 5.00│ 1890│ 378.19│ 2644.1880│100.00│
│neon_d32::hsla_to_rgba x 65536│ 8.56│ 10000│ 5.00│ 5844│ 1168.85│ 855.5439│309.07│
└──────────────────────────────┴───────┴───────┴───────┴───────┴──────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::hsla_to_rgba x 64 │ 5.01│2125000│ 5.00│2122408│424481.79│ 2.3558│100.00│
│neon_d32::hsla_to_rgba x 64 │ 5.01│3455000│ 5.00│3451185│690237.01│ 1.4488│162.61│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 128 │ 5.01│1025000│ 5.00│1022313│204462.79│ 4.8909│100.00│
│neon_d32::hsla_to_rgba x 128 │ 5.01│1740000│ 5.00│1738133│347626.79│ 2.8766│170.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 256 │ 5.01│ 480000│ 5.00│ 479236│ 95847.26│ 10.4333│100.00│
│neon_d32::hsla_to_rgba x 256 │ 5.02│ 875000│ 5.00│ 872039│174407.82│ 5.7337│181.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 512 │ 5.07│ 235000│ 5.00│ 231694│ 46338.84│ 21.5802│100.00│
│neon_d32::hsla_to_rgba x 512 │ 5.04│ 440000│ 5.00│ 436707│ 87341.53│ 11.4493│188.48│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 1024 │ 5.17│ 115000│ 5.00│ 111313│ 22262.79│ 44.9180│100.00│
│neon_d32::hsla_to_rgba x 1024 │ 5.04│ 220000│ 5.00│ 218465│ 43693.11│ 22.8869│196.26│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 2048 │ 5.41│ 60000│ 5.00│ 55490│ 11098.19│ 90.1047│100.00│
│neon_d32::hsla_to_rgba x 2048 │ 5.06│ 110000│ 5.00│ 108748│ 21749.65│ 45.9778│195.97│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 4096 │ 5.46│ 30000│ 5.00│ 27494│ 5498.94│ 181.8531│100.00│
│neon_d32::hsla_to_rgba x 4096 │ 5.06│ 55000│ 5.00│ 54396│ 10879.22│ 91.9184│197.84│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 8192 │ 5.48│ 15000│ 5.00│ 13695│ 2739.03│ 365.0929│100.00│
│neon_d32::hsla_to_rgba x 8192 │ 5.52│ 30000│ 5.00│ 27161│ 5432.34│ 184.0828│198.33│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 16384 │ 7.27│ 10000│ 5.00│ 6875│ 1375.15│ 727.1949│100.00│
│neon_d32::hsla_to_rgba x 16384│ 5.53│ 15000│ 5.00│ 13567│ 2713.59│ 368.5155│197.33│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 32768 │ 7.31│ 5000│ 5.00│ 3418│ 683.71│ 1462.6100│100.00│
│neon_d32::hsla_to_rgba x 32768│ 7.39│ 10000│ 5.00│ 6769│ 1353.89│ 738.6115│198.02│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::hsla_to_rgba x 65536 │ 14.59│ 5000│ 5.00│ 1713│ 342.60│ 2918.8394│100.00│
│neon_d32::hsla_to_rgba x 65536│ 7.40│ 5000│ 5.00│ 3378│ 675.67│ 1480.0100│197.22│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
.test/lsp-plugins-test ptest dsp.filters.static
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8 │ 30.87│ 240000│ 30.00│ 233265│ 7775.51│ 128.6090│100.00│
│neon_d32::biquad_process_x1 x8│ 30.31│ 330000│ 30.00│ 326599│ 10886.67│ 91.8555│140.01│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4 │ 30.26│ 300000│ 30.00│ 297430│ 9914.36│ 100.8638│100.00│
│neon_d32::biquad_process_x2 x4│ 30.38│ 550000│ 30.00│ 543070│ 18102.34│ 55.2415│182.59│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2 │ 30.64│ 370000│ 30.00│ 362275│ 12075.85│ 82.8099│100.00│
│neon_d32::biquad_process_x4 x2│ 30.02│1080000│ 30.00│1079230│ 35974.37│ 27.7976│297.90│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1 │ 30.23│ 360000│ 30.00│ 357240│ 11908.01│ 83.9771│100.00│
│neon_d32::biquad_process_x8 x1│ 30.05│1120000│ 30.00│1118058│ 37268.60│ 26.8322│312.97│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8 │ 30.65│ 320000│ 30.00│ 313237│ 10441.26│ 95.7739│100.00│
│neon_d32::biquad_process_x1 x8│ 30.14│ 440000│ 30.00│ 437900│ 14596.68│ 68.5087│139.80│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4 │ 30.09│ 400000│ 30.00│ 398797│ 13293.25│ 75.2262│100.00│
│neon_d32::biquad_process_x2 x4│ 30.08│ 730000│ 30.00│ 728149│ 24271.67│ 41.2003│182.59│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2 │ 30.26│ 490000│ 30.00│ 485772│ 16192.43│ 61.7573│100.00│
│neon_d32::biquad_process_x4 x2│ 30.05│1450000│ 30.00│1447391│ 48246.37│ 20.7269│297.96│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1 │ 30.06│ 480000│ 30.00│ 478974│ 15965.83│ 62.6338│100.00│
│neon_d32::biquad_process_x8 x1│ 30.02│1500000│ 30.00│1499104│ 49970.15│ 20.0119│312.98│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8 │ 30.51│ 530000│ 30.00│ 521091│ 17369.71│ 57.5715│147.28│
│neon_d32::biquad_process_x1 x8│ 30.53│ 360000│ 30.00│ 353798│ 11793.28│ 84.7941│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4 │ 30.06│1320000│ 30.00│1317284│ 43909.48│ 22.7741│288.33│
│neon_d32::biquad_process_x2 x4│ 30.21│ 460000│ 30.00│ 456863│ 15228.80│ 65.6651│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2 │ 30.15│1190000│ 30.00│1184163│ 39472.12│ 25.3343│138.04│
│neon_d32::biquad_process_x4 x2│ 30.08│ 860000│ 30.00│ 857823│ 28594.13│ 34.9722│100.00│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1 │ 30.28│ 820000│ 30.00│ 812439│ 27081.31│ 36.9258│100.00│
│neon_d32::biquad_process_x8 x1│ 30.18│1530000│ 30.00│1520779│ 50692.64│ 19.7267│187.19│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case──────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::biquad_process_x1 x8 │ 30.52│ 360000│ 30.00│ 353841│ 11794.72│ 84.7837│100.00│
│neon_d32::biquad_process_x1 x8│ 30.02│ 460000│ 30.00│ 459745│ 15324.83│ 65.2536│129.93│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x2 x4 │ 30.09│ 790000│ 30.00│ 787760│ 26258.67│ 38.0827│100.00│
│neon_d32::biquad_process_x2 x4│ 30.16│ 940000│ 30.00│ 934894│ 31163.15│ 32.0892│118.68│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x4 x2 │ 30.27│ 930000│ 30.00│ 921617│ 30720.59│ 32.5515│100.00│
│neon_d32::biquad_process_x4 x2│ 30.12│2400000│ 30.00│2390643│ 79688.10│ 12.5489│259.40│
├──────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::biquad_process_x8 x1 │ 30.20│ 750000│ 30.00│ 744987│ 24832.92│ 40.2691│100.00│
│neon_d32::biquad_process_x8 x1│ 30.05│1910000│ 30.00│1906587│ 63552.93│ 15.7349│255.92│
└──────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
.test/lsp-plugins-test ptest dsp.fft.fft
┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256 │ 30.03│ 984000│ 30.00│ 983006│ 32766.89│ 30.5186│100.72│
│native::packed_direct_fft x 256 │ 30.00│ 976000│ 30.00│ 975947│ 32531.58│ 30.7394│100.00│
│neon_d32::direct_fft x 256 │ 30.00│3172000│ 30.00│3171515│105717.18│ 9.4592│324.97│
│neon_d32::packed_direct_fft x 256 │ 30.01│2909000│ 30.00│2908396│ 96946.56│ 10.3150│298.01│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512 │ 30.05│ 437000│ 30.00│ 436215│ 14540.52│ 68.7734│100.80│
│native::packed_direct_fft x 512 │ 30.02│ 433000│ 30.00│ 432760│ 14425.36│ 69.3224│100.00│
│neon_d32::direct_fft x 512 │ 30.02│1436000│ 30.00│1435029│ 47834.30│ 20.9055│331.60│
│neon_d32::packed_direct_fft x 512 │ 30.02│1323000│ 30.00│1322334│ 44077.82│ 22.6871│305.56│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024 │ 30.02│ 195000│ 30.00│ 194851│ 6495.05│ 153.9633│101.13│
│native::packed_direct_fft x 1024 │ 30.05│ 193000│ 30.00│ 192674│ 6422.48│ 155.7032│100.00│
│neon_d32::direct_fft x 1024 │ 30.00│ 645000│ 30.00│ 644953│ 21498.43│ 46.5150│334.74│
│neon_d32::packed_direct_fft x 1024 │ 30.00│ 597000│ 30.00│ 596909│ 19896.99│ 50.2589│309.80│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048 │ 30.34│ 87000│ 30.00│ 86025│ 2867.50│ 348.7354│100.00│
│native::packed_direct_fft x 2048 │ 30.02│ 87000│ 30.00│ 86932│ 2897.74│ 345.0969│101.05│
│neon_d32::direct_fft x 2048 │ 30.09│ 276000│ 30.00│ 275196│ 9173.20│ 109.0132│319.90│
│neon_d32::packed_direct_fft x 2048 │ 30.02│ 270000│ 30.00│ 269847│ 8994.93│ 111.1737│313.69│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096 │ 30.72│ 36000│ 30.00│ 35160│ 1172.03│ 853.2223│100.00│
│native::packed_direct_fft x 4096 │ 30.18│ 38000│ 30.00│ 37769│ 1259.00│ 794.2829│107.42│
│neon_d32::direct_fft x 4096 │ 30.14│ 105000│ 30.00│ 104524│ 3484.14│ 287.0151│297.27│
│neon_d32::packed_direct_fft x 4096 │ 30.14│ 113000│ 30.00│ 112474│ 3749.14│ 266.7282│319.88│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192 │ 31.31│ 15000│ 30.00│ 14374│ 479.14│ 2087.0702│100.00│
│native::packed_direct_fft x 8192 │ 30.89│ 17000│ 30.00│ 16512│ 550.42│ 1816.8099│114.88│
│neon_d32::direct_fft x 8192 │ 30.53│ 34000│ 30.00│ 33405│ 1113.52│ 898.0569│232.40│
│neon_d32::packed_direct_fft x 8192 │ 30.33│ 44000│ 30.00│ 43523│ 1450.78│ 689.2854│302.79│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384 │ 31.28│ 7000│ 30.00│ 6713│ 223.77│ 4468.8017│100.00│
│native::packed_direct_fft x 16384 │ 31.53│ 8000│ 30.00│ 7611│ 253.73│ 3941.2022│113.39│
│neon_d32::direct_fft x 16384 │ 30.96│ 16000│ 30.00│ 15503│ 516.78│ 1935.0739│230.94│
│neon_d32::packed_direct_fft x 16384│ 30.28│ 20000│ 30.00│ 19812│ 660.42│ 1514.1930│295.13│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768 │ 30.36│ 3000│ 30.00│ 2964│ 98.81│10120.9080│100.00│
│native::packed_direct_fft x 32768 │ 35.30│ 4000│ 30.00│ 3399│ 113.31│ 8825.7283│114.68│
│neon_d32::direct_fft x 32768 │ 31.13│ 7000│ 30.00│ 6745│ 224.86│ 4447.1594│227.58│
│neon_d32::packed_direct_fft x 32768│ 30.96│ 9000│ 30.00│ 8720│ 290.69│ 3440.0632│294.21│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536 │ 53.35│ 2000│ 30.00│ 1124│ 37.49│26675.3555│100.00│
│native::packed_direct_fft x 65536 │ 45.28│ 2000│ 30.00│ 1325│ 44.17│22641.4340│117.82│
│neon_d32::direct_fft x 65536 │ 40.37│ 3000│ 30.00│ 2229│ 74.30│13458.0963│198.21│
│neon_d32::packed_direct_fft x 65536│ 31.26│ 3000│ 30.00│ 2879│ 95.97│10419.9240│256.00│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256 │ 30.01│1332000│ 30.00│1331626│ 44387.56│ 22.5288│101.78│
│native::packed_direct_fft x 256 │ 30.02│1309000│ 30.00│1308291│ 43609.72│ 22.9307│100.00│
│neon_d32::direct_fft x 256 │ 30.00│4265000│ 30.00│4264809│142160.31│ 7.0343│325.98│
│neon_d32::packed_direct_fft x 256 │ 30.00│3907000│ 30.00│3906824│130227.50│ 7.6789│298.62│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512 │ 30.05│ 590000│ 30.00│ 589086│ 19636.23│ 50.9263│101.46│
│native::packed_direct_fft x 512 │ 30.02│ 581000│ 30.00│ 580612│ 19353.74│ 51.6696│100.00│
│neon_d32::direct_fft x 512 │ 30.00│1926000│ 30.00│1925738│ 64191.27│ 15.5784│331.67│
│neon_d32::packed_direct_fft x 512 │ 30.00│1774000│ 30.00│1773897│ 59129.92│ 16.9119│305.52│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024 │ 30.05│ 264000│ 30.00│ 263595│ 8786.52│ 113.8106│101.28│
│native::packed_direct_fft x 1024 │ 30.08│ 261000│ 30.00│ 260263│ 8675.46│ 115.2677│100.00│
│neon_d32::direct_fft x 1024 │ 30.01│ 875000│ 30.00│ 874755│ 29158.51│ 34.2953│336.10│
│neon_d32::packed_direct_fft x 1024 │ 30.03│ 806000│ 30.00│ 805103│ 26836.78│ 37.2623│309.34│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048 │ 30.03│ 117000│ 30.00│ 116864│ 3895.47│ 256.7081│102.71│
│native::packed_direct_fft x 2048 │ 30.06│ 114000│ 30.00│ 113777│ 3792.58│ 263.6726│100.00│
│neon_d32::direct_fft x 2048 │ 30.04│ 374000│ 30.00│ 373458│ 12448.62│ 80.3302│328.24│
│neon_d32::packed_direct_fft x 2048 │ 30.05│ 341000│ 30.00│ 340423│ 11347.47│ 88.1254│299.20│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096 │ 30.47│ 48000│ 30.00│ 47254│ 1575.16│ 634.8556│100.00│
│native::packed_direct_fft x 4096 │ 30.58│ 49000│ 30.00│ 48066│ 1602.22│ 624.1323│101.72│
│neon_d32::direct_fft x 4096 │ 30.13│ 144000│ 30.00│ 143375│ 4779.19│ 209.2405│303.41│
│neon_d32::packed_direct_fft x 4096 │ 30.11│ 126000│ 30.00│ 125535│ 4184.52│ 238.9759│265.66│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192 │ 30.69│ 19000│ 30.00│ 18571│ 619.05│ 1615.3704│100.00│
│native::packed_direct_fft x 8192 │ 31.37│ 22000│ 30.00│ 21039│ 701.33│ 1425.8718│113.29│
│neon_d32::direct_fft x 8192 │ 30.28│ 43000│ 30.00│ 42608│ 1420.28│ 704.0871│229.43│
│neon_d32::packed_direct_fft x 8192 │ 30.30│ 53000│ 30.00│ 52473│ 1749.13│ 571.7130│282.55│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384 │ 32.94│ 9000│ 30.00│ 8195│ 273.18│ 3660.5484│100.00│
│native::packed_direct_fft x 16384 │ 31.43│ 10000│ 30.00│ 9545│ 318.18│ 3142.8263│116.47│
│neon_d32::direct_fft x 16384 │ 30.14│ 20000│ 30.00│ 19910│ 663.67│ 1506.7712│242.94│
│neon_d32::packed_direct_fft x 16384│ 30.07│ 24000│ 30.00│ 23943│ 798.11│ 1252.9593│292.15│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768 │ 33.49│ 4000│ 30.00│ 3583│ 119.44│ 8372.4263│100.00│
│native::packed_direct_fft x 32768 │ 36.17│ 5000│ 30.00│ 4146│ 138.22│ 7234.8576│115.72│
│neon_d32::direct_fft x 32768 │ 33.32│ 9000│ 30.00│ 8104│ 270.14│ 3701.7390│226.18│
│neon_d32::packed_direct_fft x 32768│ 30.02│ 10000│ 30.00│ 9993│ 333.13│ 3001.8067│278.91│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536 │ 48.27│ 2000│ 30.00│ 1243│ 41.43│24134.7980│100.00│
│native::packed_direct_fft x 65536 │ 38.55│ 2000│ 30.00│ 1556│ 51.88│19276.3165│125.20│
│neon_d32::direct_fft x 65536 │ 38.96│ 3000│ 30.00│ 2309│ 76.99│12988.2230│185.82│
│neon_d32::packed_direct_fft x 65536│ 36.24│ 4000│ 30.00│ 3310│ 110.37│ 9060.7077│266.37│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Code: Select all
┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256 │ 30.01│4133000│ 30.00│4132257│137741.90│ 7.2600│100.00│
│native::packed_direct_fft x 256 │ 30.01│4182000│ 30.00│4181190│139373.02│ 7.1750│101.18│
│neon_d32::direct_fft x 256 │ 30.00│6437000│ 30.00│6436997│214566.58│ 4.6606│155.77│
│neon_d32::packed_direct_fft x 256 │ 30.00│7430000│ 30.00│7429785│247659.52│ 4.0378│179.80│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512 │ 30.02│1856000│ 30.00│1855015│ 61833.85│ 16.1724│100.00│
│native::packed_direct_fft x 512 │ 30.02│1866000│ 30.00│1865031│ 62167.73│ 16.0855│100.54│
│neon_d32::direct_fft x 512 │ 30.01│2969000│ 30.00│2968377│ 98945.92│ 10.1065│160.02│
│neon_d32::packed_direct_fft x 512 │ 30.01│3339000│ 30.00│3338103│111270.13│ 8.9871│179.95│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024 │ 30.04│ 841000│ 30.00│ 840006│ 28000.21│ 35.7140│100.00│
│native::packed_direct_fft x 1024 │ 30.03│ 844000│ 30.00│ 843123│ 28104.12│ 35.5820│100.37│
│neon_d32::direct_fft x 1024 │ 30.02│1372000│ 30.00│1371286│ 45709.54│ 21.8773│163.25│
│neon_d32::packed_direct_fft x 1024 │ 30.01│1507000│ 30.00│1506694│ 50223.14│ 19.9111│179.37│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048 │ 30.03│ 382000│ 30.00│ 381644│ 12721.48│ 78.6072│100.00│
│native::packed_direct_fft x 2048 │ 30.05│ 384000│ 30.00│ 383310│ 12777.01│ 78.2656│100.44│
│neon_d32::direct_fft x 2048 │ 30.03│ 630000│ 30.00│ 629390│ 20979.69│ 47.6651│164.92│
│neon_d32::packed_direct_fft x 2048 │ 30.04│ 673000│ 30.00│ 672158│ 22405.28│ 44.6323│176.12│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096 │ 30.04│ 167000│ 30.00│ 166752│ 5558.42│ 179.9073│100.00│
│native::packed_direct_fft x 4096 │ 30.15│ 169000│ 30.00│ 168153│ 5605.11│ 178.4087│100.84│
│neon_d32::direct_fft x 4096 │ 30.01│ 259000│ 30.00│ 258899│ 8629.99│ 115.8750│155.26│
│neon_d32::packed_direct_fft x 4096 │ 30.09│ 283000│ 30.00│ 282156│ 9405.23│ 106.3238│169.21│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192 │ 30.04│ 67000│ 30.00│ 66918│ 2230.63│ 448.3037│100.00│
│native::packed_direct_fft x 8192 │ 30.13│ 75000│ 30.00│ 74669│ 2488.99│ 401.7690│111.58│
│neon_d32::direct_fft x 8192 │ 30.19│ 105000│ 30.00│ 104354│ 3478.47│ 287.4825│155.94│
│neon_d32::packed_direct_fft x 8192 │ 30.21│ 119000│ 30.00│ 118168│ 3938.96│ 253.8740│176.59│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384 │ 30.25│ 30000│ 30.00│ 29750│ 991.68│ 1008.3872│100.00│
│native::packed_direct_fft x 16384 │ 30.63│ 35000│ 30.00│ 34281│ 1142.73│ 875.0958│115.23│
│neon_d32::direct_fft x 16384 │ 30.12│ 43000│ 30.00│ 42835│ 1427.85│ 700.3555│143.98│
│neon_d32::packed_direct_fft x 16384│ 30.07│ 52000│ 30.00│ 51883│ 1729.44│ 578.2207│174.39│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768 │ 30.18│ 14000│ 30.00│ 13914│ 463.82│ 2156.0221│100.00│
│native::packed_direct_fft x 32768 │ 31.82│ 17000│ 30.00│ 16027│ 534.24│ 1871.8109│115.18│
│neon_d32::direct_fft x 32768 │ 30.76│ 20000│ 30.00│ 19505│ 650.19│ 1538.0225│140.18│
│neon_d32::packed_direct_fft x 32768│ 30.09│ 24000│ 30.00│ 23928│ 797.60│ 1253.7599│171.96│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536 │ 34.39│ 7000│ 30.00│ 6107│ 203.57│ 4912.2279│100.00│
│native::packed_direct_fft x 65536 │ 30.49│ 7000│ 30.00│ 6888│ 229.61│ 4355.3056│112.79│
│neon_d32::direct_fft x 65536 │ 32.87│ 9000│ 30.00│ 8213│ 273.80│ 3652.3288│134.50│
│neon_d32::packed_direct_fft x 65536│ 32.17│ 11000│ 30.00│ 10258│ 341.95│ 2924.4389│167.97│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Code: Select all
┌Case───────────────────────────────┬Time[s]┬───Iter┬Samp[s]┬────Est┬Perf[i/s]┬Cost[us/i]┬Rel[%]┐
│native::direct_fft x 256 │ 30.01│2738000│ 30.00│2737374│ 91245.83│ 10.9594│100.00│
│native::packed_direct_fft x 256 │ 30.01│2842000│ 30.00│2841417│ 94713.92│ 10.5581│103.80│
│neon_d32::direct_fft x 256 │ 30.00│5398000│ 30.00│5397415│179913.84│ 5.5582│197.17│
│neon_d32::packed_direct_fft x 256 │ 30.00│4659000│ 30.00│4658536│155284.54│ 6.4398│170.18│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 512 │ 30.01│1221000│ 30.00│1220583│ 40686.11│ 24.5784│100.00│
│native::packed_direct_fft x 512 │ 30.02│1265000│ 30.00│1264280│ 42142.69│ 23.7289│103.58│
│neon_d32::direct_fft x 512 │ 30.00│2384000│ 30.00│2383924│ 79464.15│ 12.5843│195.31│
│neon_d32::packed_direct_fft x 512 │ 30.01│2052000│ 30.00│2051061│ 68368.72│ 14.6266│168.04│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 1024 │ 30.01│ 548000│ 30.00│ 547797│ 18259.91│ 54.7648│100.00│
│native::packed_direct_fft x 1024 │ 30.02│ 566000│ 30.00│ 565713│ 18857.12│ 53.0304│103.27│
│neon_d32::direct_fft x 1024 │ 30.03│1067000│ 30.00│1066067│ 35535.60│ 28.1408│194.61│
│neon_d32::packed_direct_fft x 1024 │ 30.01│ 907000│ 30.00│ 906618│ 30220.61│ 33.0900│165.50│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 2048 │ 30.11│ 246000│ 30.00│ 245129│ 8170.98│ 122.3843│100.00│
│native::packed_direct_fft x 2048 │ 30.01│ 253000│ 30.00│ 252880│ 8429.34│ 118.6332│103.16│
│neon_d32::direct_fft x 2048 │ 30.03│ 453000│ 30.00│ 452512│ 15083.76│ 66.2965│184.60│
│neon_d32::packed_direct_fft x 2048 │ 30.02│ 402000│ 30.00│ 401679│ 13389.31│ 74.6864│163.86│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 4096 │ 30.18│ 109000│ 30.00│ 108359│ 3612.00│ 276.8550│100.00│
│native::packed_direct_fft x 4096 │ 30.26│ 112000│ 30.00│ 111036│ 3701.20│ 270.1826│102.47│
│neon_d32::direct_fft x 4096 │ 30.09│ 189000│ 30.00│ 188421│ 6280.73│ 159.2171│173.89│
│neon_d32::packed_direct_fft x 4096 │ 30.12│ 178000│ 30.00│ 177261│ 5908.72│ 169.2415│163.59│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 8192 │ 30.65│ 46000│ 30.00│ 45030│ 1501.02│ 666.2143│100.00│
│native::packed_direct_fft x 8192 │ 30.22│ 50000│ 30.00│ 49639│ 1654.66│ 604.3546│110.24│
│neon_d32::direct_fft x 8192 │ 30.26│ 75000│ 30.00│ 74350│ 2478.35│ 403.4943│165.11│
│neon_d32::packed_direct_fft x 8192 │ 30.11│ 74000│ 30.00│ 73731│ 2457.70│ 406.8841│163.74│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 16384 │ 30.69│ 21000│ 30.00│ 20529│ 684.32│ 1461.3124│100.00│
│native::packed_direct_fft x 16384 │ 30.31│ 23000│ 30.00│ 22765│ 758.85│ 1317.7867│110.89│
│neon_d32::direct_fft x 16384 │ 30.63│ 32000│ 30.00│ 31346│ 1044.87│ 957.0552│152.69│
│neon_d32::packed_direct_fft x 16384│ 30.66│ 34000│ 30.00│ 33266│ 1108.90│ 901.7948│162.04│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 32768 │ 31.07│ 10000│ 30.00│ 9656│ 321.89│ 3106.6584│100.00│
│native::packed_direct_fft x 32768 │ 31.37│ 11000│ 30.00│ 10518│ 350.61│ 2852.1370│108.92│
│neon_d32::direct_fft x 32768 │ 30.60│ 15000│ 30.00│ 14704│ 490.13│ 2040.2563│152.27│
│neon_d32::packed_direct_fft x 32768│ 31.33│ 16000│ 30.00│ 15320│ 510.69│ 1958.1298│158.65│
├───────────────────────────────────┼───────┼───────┼───────┼───────┼─────────┼──────────┼──────┤
│native::direct_fft x 65536 │ 33.91│ 5000│ 30.00│ 4422│ 147.43│ 6782.8840│100.00│
│native::packed_direct_fft x 65536 │ 33.31│ 5000│ 30.00│ 4503│ 150.12│ 6661.4460│101.82│
│neon_d32::direct_fft x 65536 │ 31.77│ 7000│ 30.00│ 6610│ 220.36│ 4538.0359│149.47│
│neon_d32::packed_direct_fft x 65536│ 31.96│ 7000│ 30.00│ 6570│ 219.00│ 4566.2017│148.55│
└───────────────────────────────────┴───────┴───────┴───────┴───────┴─────────┴──────────┴──────┘
Test #0: as expected, the RPI 4 beat all competitors, but the Tinker Board S is very close behind (about 4m22s versus 4m04s of real time).
Test #1: switching from GCC 6 to GCC 8 in the new Raspbian gave native code a big boost. The new RPI 4 is around 4 times faster than the previous generation in native code and around 2 times faster in the SIMD implementation.
Test #2: conditional native code is also faster, but the gap between the new generation and the previous one is noticeably smaller: around 2 times in the native implementation and around 1.5 times in the SIMD-optimized implementation. The Tinker Board performs at around the same level as the Raspberry Pi 3.
Test #3: the results of this test are surprising. Native code executes faster than the SIMD-optimized code on the RPI 4, while on the other platforms the SIMD-optimized code is faster, as expected. I need to inspect the code generated by GCC before drawing any conclusions.
Test #4: it also seems that the RPI 4 works better with cache: large blocks are processed better than on the other platforms.
So, currently the RPI 4 is a very interesting SBC platform.
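For anyone who wants to check the cache remark from Test #4 against the FFT tables, here is a minimal sketch of one possible way to do it. It is not part of the test suite; the helper names are made up for illustration. The idea (my assumption, not something the benchmark itself reports) is to divide the Cost[us/i] column by N*log2(N), the nominal amount of FFT work, and compare the largest block against the smallest one. A result near 1.0 would mean the large block scales as well as the small one; higher values suggest extra cost, for example from cache misses. The two sample numbers are the native::direct_fft rows for 256 and 65536 points copied from one of the tables above.

```python
import math


def cost_per_unit_work(cost_us, n):
    """Cost[us/i] divided by n*log2(n), the nominal work of an n-point FFT."""
    return cost_us / (n * math.log2(n))


def scaling_penalty(cost_small, n_small, cost_large, n_large):
    """How much more expensive one unit of work is for the large block.

    Hypothetical helper: 1.0 means ideal N*log2(N) scaling, larger values
    mean the large block is slower than its size alone would explain.
    """
    small = cost_per_unit_work(cost_small, n_small)
    large = cost_per_unit_work(cost_large, n_large)
    return large / small


# native::direct_fft, 256 and 65536 points, Cost[us/i] taken from one table
penalty = scaling_penalty(7.2600, 256, 4912.2279, 65536)
print(f"scaling penalty: {penalty:.2f}")
```

Running the same calculation on the matching rows of every table gives one number per board, which is easier to compare than the raw costs. Keep in mind that this only measures scaling within one board, not absolute speed.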