# Background
- i7-12700K = Intel Core i7-12700K (8 big cores with 2 threads each, 4 little cores with 1 thread each) running at 5 GHz
- rpi4 = Raspberry Pi 4 Rev B, with 4x Cortex-A72 running at 1.8 GHz (Broadcom BCM2711)
- rpi5 = Raspberry Pi 5, with 4x Cortex-A76 running at 2.4 GHz (Broadcom BCM2712)
- am69 = TI AM69 starter kit, with 8x Cortex-A72 running at 2.0 GHz (TI AM69)
- lpi4a = Lichee Pi 4A, with 4x RISC-V RV64GCV cores running at 1.85 GHz (Alibaba TH1520)
  - I tried both the open-source GCC and T-Head's own GCC with `-mcpu=c920` support. The latter gives slightly better results.
# Summary
| | i7-12700K | rpi4 (A72) | rpi5 (A76) | am69 (A72) | lpi4a (RV64GCV) |
| ---------------------------- | ------------- | ----------- | ------------ | ------------ | -------------------------- |
| # of threads | 20 | 4 | 4 | 8 | 4 |
| Frequency (GHz) | 5.0 (2.8x) | 1.8 (1.0x) | 2.4 (1.3x) | 2.0 (1.1x) | 1.85 (1.0x) |
| Frequency (GHz) x # of threads | 100.0 (13.9x) | 7.2 (1.0x) | 9.6 (1.3x) | 16.0 (2.2x) | 7.4 (1.0x) |
| CoreMark | 44385 (4.5x) | 9820 (1.0x) | 17665 (1.8x) | 11002 (1.1x) | 8555 (0.9x)<br>9783 (1.0x) |
| CoreMark-Pro (Multi-Core) | 78969 (16.0x) | 4939 (1.0x) | 12746 (2.6x) | 14102 (2.9x) | 4547 (0.9x) |
| CoreMark-Pro (Single-Core) | 10054 (4.5x) | 2242 (1.0x) | 5044 (2.2x) | 2550 (1.1x) | 1572 (0.7x) |
| SPECspeed 2017 int (est. peak) | 10.0 (5.7x) | 1.74 (1.0x) | | | |
| SPECspeed 2017 fp (est. peak) | 21.3 (11.7x) | 1.82 (1.0x) | | | |
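Each `(N.Nx)` figure in the table appears to be the raw value divided by the rpi4 value and rounded to one decimal. The "Frequency x # of threads" row multiplies the listed clock by the hardware thread count, which counts all 20 i7-12700K threads at 5.0 GHz. The Python sketch below reproduces the ratios under that assumption. The dictionaries are hand-copied from the table, the names `SPECS`, `SCORES` and `normalized_rows` are illustrative only, and for lpi4a only the first CoreMark value (8555) is used.

```python
"""Recompute the relative (Nx) figures of the Summary table, with rpi4 as 1.0x (a sketch)."""

BASELINE = "rpi4"

# board -> (hardware threads, clock in GHz), copied from the Summary table.
SPECS = {
    "i7-12700K": (20, 5.0),
    "rpi4": (4, 1.8),
    "rpi5": (4, 2.4),
    "am69": (8, 2.0),
    "lpi4a": (4, 1.85),
}

# Scores copied from the Summary table (lpi4a CoreMark: first value only).
SCORES = {
    "CoreMark": {"i7-12700K": 44385, "rpi4": 9820, "rpi5": 17665, "am69": 11002, "lpi4a": 8555},
    "CoreMark-Pro (Multi-Core)": {"i7-12700K": 78969, "rpi4": 4939, "rpi5": 12746, "am69": 14102, "lpi4a": 4547},
    "CoreMark-Pro (Single-Core)": {"i7-12700K": 10054, "rpi4": 2242, "rpi5": 5044, "am69": 2550, "lpi4a": 1572},
    "SPEC int (est. peak)": {"i7-12700K": 10.0, "rpi4": 1.74},
    "SPEC fp (est. peak)": {"i7-12700K": 21.3, "rpi4": 1.82},
}


def relative(row: dict[str, float]) -> dict[str, float]:
    """Divide each board's value by the baseline's value and round to one decimal."""
    base = row[BASELINE]
    return {board: round(value / base, 1) for board, value in row.items()}


def normalized_rows() -> dict[str, dict[str, float]]:
    """Return every Summary row, including the two derived from SPECS, as Nx ratios."""
    raw = {
        "Frequency": {board: ghz for board, (_, ghz) in SPECS.items()},
        # The table's own proxy: every hardware thread counted at the listed clock.
        "Frequency x # of threads": {board: n * ghz for board, (n, ghz) in SPECS.items()},
        **SCORES,
    }
    return {name: relative(row) for name, row in raw.items()}


if __name__ == "__main__":
    for name, row in normalized_rows().items():
        cells = "  ".join(f"{board}={ratio}x" for board, ratio in row.items())
        print(f"{name:<30} {cells}")
```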
# SPEC CPU 2017 (intspeed)
## i7-12700K
```
                                    Estimated                         Estimated
                   Base       Base       Base        Peak       Peak       Peak
Benchmarks      Threads   Run Time      Ratio     Threads   Run Time      Ratio
--------------- -------  ---------  ---------     -------  ---------  ---------
600.perlbench_s       4        154       11.5 *         4        143       12.5 *
602.gcc_s             4        304       13.1 *         4        286       13.9 *
605.mcf_s             4        467       10.1 *         4        434       10.9 *
620.omnetpp_s         4        316       5.17 *         4        292       5.59 *
623.xalancbmk_s       4        139       10.2 *         4        133       10.6 *
625.x264_s            4        105       16.7 *         4       96.9       18.2 *
631.deepsjeng_s       4        254       5.65 *         4        233       6.16 *
641.leela_s           4        283       6.03 *         4        261       6.54 *
648.exchange2_s       4        193       15.3 *         4        188       15.7 *
657.xz_s              4        791       7.81 *         4        774       7.99 *
=================================================================================
600.perlbench_s       4        154       11.5 *         4        143       12.5 *
602.gcc_s             4        304       13.1 *         4        286       13.9 *
605.mcf_s             4        467       10.1 *         4        434       10.9 *
620.omnetpp_s         4        316       5.17 *         4        292       5.59 *
623.xalancbmk_s       4        139       10.2 *         4        133       10.6 *
625.x264_s            4        105       16.7 *         4       96.9       18.2 *
631.deepsjeng_s       4        254       5.65 *         4        233       6.16 *
641.leela_s           4        283       6.03 *         4        261       6.54 *
648.exchange2_s       4        193       15.3 *         4        188       15.7 *
657.xz_s              4        791       7.81 *         4        774       7.99 *
Est. SPECspeed(R)2017_int_base 9.41
Est. SPECspeed(R)2017_int_peak 10.0
```
## rpi4
```
                                    Estimated                         Estimated
                   Base       Base       Base        Peak       Peak       Peak
Benchmarks      Threads   Run Time      Ratio     Threads   Run Time      Ratio
--------------- -------  ---------  ---------     -------  ---------  ---------
600.perlbench_s       4       1314       1.35 *         4       1064       1.67 *
602.gcc_s             4       57.1         RE           4        145         RE
605.mcf_s             4       3908       1.21 *         4       3661       1.29 *
620.omnetpp_s         4       1911      0.853 *         4       1667      0.979 *
623.xalancbmk_s       4        997       1.42 *         4        932       1.52 *
625.x264_s            4        798       2.21 *         4        753       2.34 *
631.deepsjeng_s       4       10.2         RE           1         --         FE
641.leela_s           4       1086       1.57 *         4        867       1.97 *
648.exchange2_s       4        896       3.28 *         4        908       3.24 *
657.xz_s              4       9.24         RE           4       8.54         RE
=================================================================================
600.perlbench_s       4       1314       1.35 *         4       1064       1.67 *
602.gcc_s                                  NR                                NR
605.mcf_s             4       3908       1.21 *         4       3661       1.29 *
620.omnetpp_s         4       1911      0.853 *         4       1667      0.979 *
623.xalancbmk_s       4        997       1.42 *         4        932       1.52 *
625.x264_s            4        798       2.21 *         4        753       2.34 *
631.deepsjeng_s                            NR                                NR
641.leela_s           4       1086       1.57 *         4        867       1.97 *
648.exchange2_s       4        896       3.28 *         4        908       3.24 *
657.xz_s                                   NR                                NR
Est. SPECspeed(R)2017_int_base 1.56
Est. SPECspeed(R)2017_int_peak 1.74
```
# SPEC CPU 2017 (fpspeed)
## i7-12700K
```
                                    Estimated                         Estimated
                   Base       Base       Base        Peak       Peak       Peak
Benchmarks      Threads   Run Time      Ratio     Threads   Run Time      Ratio
--------------- -------  ---------  ---------     -------  ---------  ---------
603.bwaves_s          4        846       69.7 *         1         --         FE
607.cactuBSSN_s       4        330       50.5 *         4        318       52.5 *
619.lbm_s             4        810       6.47 *         4        812       6.45 *
621.wrf_s             1         --         CE           1         --         CE
627.cam4_s            1         --         CE           1         --         CE
628.pop2_s            1         --         CE                                NR
638.imagick_s         4        872       16.5 *         4        724       19.9 *
644.nab_s             4        436       40.1 *         4        382       45.8 *
649.fotonik3d_s       4        642       14.2 *         4        635       14.4 *
654.roms_s            4        856       18.4 *         4       1.65         RE
=================================================================================
603.bwaves_s          4        846       69.7 *                              NR
607.cactuBSSN_s       4        330       50.5 *         4        318       52.5 *
619.lbm_s             4        810       6.47 *         4        812       6.45 *
621.wrf_s                                  NR                                NR
627.cam4_s                                 NR                                NR
628.pop2_s                                 NR                                NR
638.imagick_s         4        872       16.5 *         4        724       19.9 *
644.nab_s             4        436       40.1 *         4        382       45.8 *
649.fotonik3d_s       4        642       14.2 *         4        635       14.4 *
654.roms_s            4        856       18.4 *                              NR
Est. SPECspeed(R)2017_fp_base 23.5
Est. SPECspeed(R)2017_fp_peak 21.3
```
## rpi4
```
CPU2017 License: A000A                      Test date: Mar-2024
Test sponsor: My Corporation    Hardware availability: Mar-2024
Tested by: My Corporation       Software availability: Mar-2024
                                    Estimated                         Estimated
                   Base       Base       Base        Peak       Peak       Peak
Benchmarks      Threads   Run Time      Ratio     Threads   Run Time      Ratio
--------------- -------  ---------  ---------     -------  ---------  ---------
603.bwaves_s          4       7.36         RE           1         --         FE
607.cactuBSSN_s       4       20.2         RE           4       92.0         RE
619.lbm_s             4      11153      0.470 *         4      11184      0.468 *
621.wrf_s             4       6369       2.08 *         1         --         FE
627.cam4_s            4       11.3         RE           1         --         FE
628.pop2_s            4       6922       1.72 *         4       6922       1.72 *
638.imagick_s         4       6436         RE           4       5629         RE
644.nab_s             4       3126       5.59 *         4       2326       7.51 *
649.fotonik3d_s       4       99.5         RE           4       99.4         RE
654.roms_s            4       9.35         RE           4       9.67         RE
=================================================================================
603.bwaves_s                               NR                                NR
607.cactuBSSN_s                            NR                                NR
619.lbm_s             4      11153      0.470 *         4      11184      0.468 *
621.wrf_s             4       6369       2.08 *                              NR
627.cam4_s                                 NR                                NR
628.pop2_s            4       6922       1.72 *         4       6922       1.72 *
638.imagick_s                              NR                                NR
644.nab_s             4       3126       5.59 *         4       2326       7.51 *
649.fotonik3d_s                            NR                                NR
654.roms_s                                 NR                                NR
Est. SPECspeed(R)2017_fp_base 1.75
Est. SPECspeed(R)2017_fp_peak 1.82
```
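A SPECspeed metric is the geometric mean of the per-benchmark ratios. An "Est." score in the blocks above averages only the benchmarks that produced a ratio (those marked `*`). The Python sketch below assumes exactly that and reproduces the intspeed estimates for both machines and the fpspeed estimates for rpi4 from the printed, already-rounded ratios. It also shows how coverage differs: the i7-12700K intspeed estimate spans all 10 benchmarks, the rpi4 one spans 7, and the rpi4 fpspeed peak estimate spans only 3. The SPEC rows of the Summary table therefore compare different benchmark subsets. The `RATIOS` labels are illustrative.

```python
"""Recompute the "Est. SPECspeed(R)2017" scores as geometric means (a sketch)."""
import math

# Ratios of the benchmarks marked "*" in the result blocks above. Benchmarks that
# ended in RE/CE/FE/NR carry no ratio and are absent, so the lists differ in length.
RATIOS = {
    "i7-12700K int base": [11.5, 13.1, 10.1, 5.17, 10.2, 16.7, 5.65, 6.03, 15.3, 7.81],
    "i7-12700K int peak": [12.5, 13.9, 10.9, 5.59, 10.6, 18.2, 6.16, 6.54, 15.7, 7.99],
    "rpi4 int base": [1.35, 1.21, 0.853, 1.42, 2.21, 1.57, 3.28],
    "rpi4 int peak": [1.67, 1.29, 0.979, 1.52, 2.34, 1.97, 3.24],
    "rpi4 fp base": [0.470, 2.08, 1.72, 5.59],
    "rpi4 fp peak": [0.468, 1.72, 7.51],
}


def geomean(ratios: list[float]) -> float:
    """Geometric mean, computed as exp(mean(log r)) to stay numerically stable."""
    if not ratios:
        raise ValueError("need at least one ratio")
    return math.exp(math.fsum(math.log(r) for r in ratios) / len(ratios))


if __name__ == "__main__":
    for label, ratios in RATIOS.items():
        print(f"{label:<20} {len(ratios):2d} benchmarks  est. score {geomean(ratios):.2f}")
```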
# CoreMark
## i7-12700K
```
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 13518
Total time (secs): 13.518000
Iterations/Sec : 44385.264092
Iterations : 600000
Compiler version : GCC11.4.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xa14c
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 44385.264092 / GCC11.4.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
```
## rpi4
```
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 11202
Total time (secs): 11.202000
Iterations/Sec : 9819.675058
Iterations : 110000
Compiler version : GCC9.4.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 9819.675058 / GCC9.4.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
```
## rpi5
```
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 16983
Total time (secs): 16.983000
Iterations/Sec : 17664.723547
Iterations : 300000
Compiler version : GCC13.2.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xcc42
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 17664.723547 / GCC13.2.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
```
## am69
```
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 18178
Total time (secs): 18.178000
Iterations/Sec : 11002.310485
Iterations : 200000
Compiler version : GCC11.4.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x4983
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 11002.310485 / GCC11.4.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
```
## lpi4a
```
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 12857
Total time (secs): 12.857000
Iterations/Sec : 8555.650618
Iterations : 110000
Compiler version : GCC13.2.0
Compiler flags : -O2 -DPERFORMANCE_RUN=1 -lrt
Memory location : Please put data memory location here
(e.g. code in flash, data on heap etc)
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0x33ff
Correct operation validated. See README.md for run and reporting rules.
CoreMark 1.0 : 8555.650618 / GCC13.2.0 -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
```
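In each CoreMark log above, `Iterations/Sec` equals `Iterations` divided by `Total time (secs)`, and `Total time (secs)` equals `Total ticks / 1000`. That tick rate is inferred from these logs, not taken from CoreMark's source. The Python sketch below checks that relationship for every board. It also checks that each run lasted at least 10 seconds, the minimum duration CoreMark's README asks for. `RUNS` and `check` are illustrative names.

```python
"""Check the CoreMark logs: Iterations/Sec = Iterations / Total time (secs) (a sketch)."""

TICKS_PER_SEC = 1000  # assumption: inferred from "Total ticks" vs "Total time (secs)" above
MIN_SECONDS = 10.0    # minimum run length required by CoreMark's run rules

# board -> (Iterations, Total ticks, reported Iterations/Sec), copied from the logs.
RUNS = {
    "i7-12700K": (600000, 13518, 44385.264092),
    "rpi4": (110000, 11202, 9819.675058),
    "rpi5": (300000, 16983, 17664.723547),
    "am69": (200000, 18178, 11002.310485),
    "lpi4a": (110000, 12857, 8555.650618),
}


def iterations_per_sec(iterations: int, ticks: int) -> float:
    """Convert ticks to seconds, then divide the iteration count by that duration."""
    return iterations / (ticks / TICKS_PER_SEC)


def check(board: str) -> float:
    """Raise if a run is too short or its reported score disagrees with the recomputed one."""
    iterations, ticks, reported = RUNS[board]
    seconds = ticks / TICKS_PER_SEC
    if seconds < MIN_SECONDS:
        raise ValueError(f"{board}: run too short ({seconds:.1f} s)")
    computed = iterations_per_sec(iterations, ticks)
    if abs(computed - reported) > 1e-3:
        raise ValueError(f"{board}: computed {computed:.6f}, reported {reported:.6f}")
    return computed


if __name__ == "__main__":
    for board in RUNS:
        print(f"{board:<10} {check(board):12.6f} iter/s")
```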
# CoreMark-Pro
## i7-12700K
```
WORKLOAD RESULTS TABLE
                                                 MultiCore SingleCore
Workload Name                                     (iter/s)   (iter/s)    Scaling
----------------------------------------------- ---------- ---------- ----------
cjpeg-rose7-preset                                 1250.00     238.10       5.25
core                                                 42.55       3.42      12.44
linear_alg-mid-100x100-sp                          2083.33     367.65       5.67
loops-all-mid-10k-sp                                122.85      15.22       8.07
nnet_test                                           178.57      18.98       9.41
parser-125k                                         350.88      35.71       9.83
radix2-big-64k                                    10309.28    1270.65       8.11
sha-test                                           1538.46     303.03       5.08
zip-test                                           2500.00     250.00      10.00
MARK RESULTS TABLE
Mark Name                                        MultiCore SingleCore    Scaling
----------------------------------------------- ---------- ---------- ----------
CoreMark-PRO                                      78968.75   10054.14       7.85
```
## rpi4
```
WORKLOAD RESULTS TABLE
                                                 MultiCore SingleCore
Workload Name                                     (iter/s)   (iter/s)    Scaling
----------------------------------------------- ---------- ---------- ----------
cjpeg-rose7-preset                                  263.16      74.63       3.53
core                                                  2.71       0.69       3.93
linear_alg-mid-100x100-sp                           241.55      62.89       3.84
loops-all-mid-10k-sp                                  4.08       2.34       1.74
nnet_test                                            11.47       3.46       3.32
parser-125k                                           8.89      11.24       0.79
radix2-big-64k                                      105.29     227.63       0.46
sha-test                                            476.19     138.89       3.43
zip-test                                            137.93      43.48       3.17
MARK RESULTS TABLE
Mark Name                                        MultiCore SingleCore    Scaling
----------------------------------------------- ---------- ---------- ----------
CoreMark-PRO                                       4939.42    2241.89       2.20
```
## rpi5
```
WORKLOAD RESULTS TABLE
                                                 MultiCore SingleCore
Workload Name                                     (iter/s)   (iter/s)    Scaling
----------------------------------------------- ---------- ---------- ----------
cjpeg-rose7-preset                                  526.32     144.93       3.63
core                                                  5.34       1.34       3.99
linear_alg-mid-100x100-sp                           515.46     136.61       3.77
loops-all-mid-10k-sp                                 17.91       7.01       2.55
nnet_test                                            26.46       7.97       3.32
parser-125k                                          43.96      31.25       1.41
radix2-big-64k                                      466.64     668.00       0.70
sha-test                                            625.00     212.77       2.94
zip-test                                            285.71      90.91       3.14
MARK RESULTS TABLE
Mark Name                                        MultiCore SingleCore    Scaling
----------------------------------------------- ---------- ---------- ----------
CoreMark-PRO                                      12746.27    5044.04       2.53
```
## am69
```
WORKLOAD RESULTS TABLE
                                                 MultiCore SingleCore
Workload Name                                     (iter/s)   (iter/s)    Scaling
----------------------------------------------- ---------- ---------- ----------
cjpeg-rose7-preset                                  588.24      82.64       7.12
core                                                  6.34       0.80       7.92
linear_alg-mid-100x100-sp                           574.71      80.13       7.17
loops-all-mid-10k-sp                                 11.12       2.66       4.18
nnet_test                                            19.80       3.71       5.34
parser-125k                                          72.07      10.75       6.70
radix2-big-64k                                      805.80     314.07       2.57
sha-test                                            769.23     153.85       5.00
zip-test                                            296.30      47.62       6.22
MARK RESULTS TABLE
Mark Name                                        MultiCore SingleCore    Scaling
----------------------------------------------- ---------- ---------- ----------
CoreMark-PRO                                      14102.14    2550.47       5.53
```
## lpi4a
```
WORKLOAD RESULTS TABLE
                                                 MultiCore SingleCore
Workload Name                                     (iter/s)   (iter/s)    Scaling
----------------------------------------------- ---------- ---------- ----------
cjpeg-rose7-preset                                  263.16      75.19       3.50
core                                                  1.88       0.47       4.00
linear_alg-mid-100x100-sp                           235.85      63.69       3.70
loops-all-mid-10k-sp                                  4.77       2.06       2.32
nnet_test                                            10.26       3.08       3.33
parser-125k                                          11.56       8.55       1.35
radix2-big-64k                                      152.98      71.35       2.14
sha-test                                            192.31      57.14       3.37
zip-test                                            121.21      33.33       3.64
MARK RESULTS TABLE
Mark Name                                        MultiCore SingleCore    Scaling
----------------------------------------------- ---------- ---------- ----------
CoreMark-PRO                                       4547.26    1571.90       2.89
```
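The `Scaling` column in these tables appears to be the MultiCore throughput divided by the SingleCore throughput, rounded to two decimals. Values below 1.0 mean the multi-core run completed fewer iterations per second than the single-core run. That happens for rpi4 `parser-125k` and `radix2-big-64k`, and for rpi5 `radix2-big-64k`. The Python sketch below assumes that definition and uses a few rows copied from the tables. `ROWS` and `slower_than_single_core` are illustrative names.

```python
"""Recompute CoreMark-Pro's Scaling column as MultiCore / SingleCore (a sketch)."""

# (board, workload) -> (MultiCore iter/s, SingleCore iter/s), copied from the tables above.
ROWS = {
    ("i7-12700K", "core"): (42.55, 3.42),
    ("i7-12700K", "CoreMark-PRO"): (78968.75, 10054.14),
    ("rpi4", "parser-125k"): (8.89, 11.24),
    ("rpi4", "radix2-big-64k"): (105.29, 227.63),
    ("rpi5", "radix2-big-64k"): (466.64, 668.00),
    ("am69", "CoreMark-PRO"): (14102.14, 2550.47),
}


def scaling(multi: float, single: float) -> float:
    """Multi-core throughput relative to single-core, rounded to 2 decimals like the tables."""
    return round(multi / single, 2)


def slower_than_single_core() -> list[tuple[str, str]]:
    """Rows where running on all cores gave lower throughput than running on one."""
    return [key for key, (multi, single) in ROWS.items() if scaling(multi, single) < 1.0]


if __name__ == "__main__":
    for (board, workload), (multi, single) in ROWS.items():
        print(f"{board:<10} {workload:<16} {scaling(multi, single):6.2f}")
```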
# Ceph Perf (ceph_perf_local)
## i7-12700K
```
root@22752a5b7c8d:~/ceph/ceph.git/build/bin# ./ceph_perf_local
atomic_int_cmp          3.94ns                   atomic_t::compare_and_swap
atomic_int_inc          3.72ns                   atomic_t::inc
atomic_int_read         0.22ns                   atomic_t::read
atomic_int_set          3.49ns                   atomic_t::set
mutex_nonblock          10.51ns                  Mutex lock/unlock (no blocking)
buffer_basic            16.18ns                  buffer create, add one ptr, delete
buffer_encode_decode    100.52ns                 buffer create, encode/decode object, delete
buffer_basic_copy       57.57ns                  buffer create, copy small block, delete
buffer_copy             4.77ns                   copy out 2 small ptrs from buffer
buffer_encode10         20.64ns                  buffer encoding 10 structures onto existing ptr
buffer_iterator         314.76ns                 iterate over buffer with 5 ptrs
cond_ping_pong          2.41us                   condition variable round-trip
div32                   1.23ns                   32-bit integer division instruction
div64                   2.05ns                   64-bit integer division instruction
function_call           0.27ns                   Call a function that has not been inlined
eventcenter_poll        104.98ns                 EventCenter::process_events (no timers or events)
eventcenter_dispatch    571.12ns                 EventCenter::dispatch_event_external latency
memcpy100               0.00ns                   Copy 100 bytes with memcpy
memcpy1000              0.00ns                   Copy 1000 bytes with memcpy
memcpy10000             0.00ns                   Copy 10000 bytes with memcpy
ceph_str_hash_rjenkins  7.58ns                   rjenkins hash on 16 byte of data
ceph_str_hash_rjenkins  97.54ns                  rjenkins hash on 256 bytes of data
rdtsc                   5.37ns                   Read the fine-grain cycle counter
cycles_to_seconds       0.87ns                   Convert a rdtsc result to (double) seconds
cycles_to_seconds       1.13ns                   Convert a rdtsc result to (uint64_t) nanoseconds
prefetch                architecture nonsupport  Prefetch instruction
serialize               309.33ns                 serialize instruction
lfence                  architecture nonsupport  Lfence instruction
sfence                  architecture nonsupport  Sfence instruction
spin_lock               5.15ns                   Acquire/release SpinLock
spawn_thread            7.37us                   Start and stop a thread
perf_timer              52.17ns                  Insert and cancel a SafeTimer
throw_int               523.68ns                 Throw an int
throw_int_call          652.69ns                 Throw an int in a function call
throw_exception         654.67ns                 Throw an Exception
throw_exception_call    891.27ns                 Throw an Exception in a function call
vector_push_pop         0.55ns                   Push and pop a std::vector
ceph_clock_now          11.51ns                  ceph_clock_now function
```
## rpi4
```
atomic_int_cmp          18.39ns                  atomic_t::compare_and_swap
atomic_int_inc          15.03ns                  atomic_t::inc
atomic_int_read         4.66ns                   atomic_t::read
atomic_int_set          10.57ns                  atomic_t::set
mutex_nonblock          51.71ns                  Mutex lock/unlock (no blocking)
buffer_basic            99.47ns                  buffer create, add one ptr, delete
buffer_encode_decode    661.72ns                 buffer create, encode/decode object, delete
buffer_basic_copy       407.07ns                 buffer create, copy small block, delete
buffer_copy             42.79ns                  copy out 2 small ptrs from buffer
buffer_encode10         95.51ns                  buffer encoding 10 structures onto existing ptr
buffer_iterator         1.65us                   iterate over buffer with 5 ptrs
cond_ping_pong          21.16us                  condition variable round-trip
div32                   7.25ns                   32-bit integer division instruction
div64                   architecture nonsupport  64-bit integer division instruction
function_call           3.34ns                   Call a function that has not been inlined
eventcenter_poll        1.70us                   EventCenter::process_events (no timers or events)
eventcenter_dispatch    6.89us                   EventCenter::dispatch_event_external latency
memcpy100               11.71ns                  Copy 100 bytes with memcpy
memcpy1000              73.09ns                  Copy 1000 bytes with memcpy
memcpy10000             737.00ns                 Copy 10000 bytes with memcpy
ceph_str_hash_rjenkins  37.88ns                  rjenkins hash on 16 byte of data
ceph_str_hash_rjenkins  371.81ns                 rjenkins hash on 256 bytes of data
rdtsc                   46.21ns                  Read the fine-grain cycle counter
cycles_to_seconds       8.90ns                   Convert a rdtsc result to (double) seconds
cycles_to_seconds       8.90ns                   Convert a rdtsc result to (uint64_t) nanoseconds
prefetch                59.95ns                  Prefetch instruction
serialize               architecture nonsupport  serialize instruction
lfence                  2.80ns                   Lfence instruction
sfence                  1.67ns                   Sfence instruction
spin_lock               32.36ns                  Acquire/release SpinLock
spawn_thread            94.17us                  Start and stop a thread
perf_timer              375.65ns                 Insert and cancel a SafeTimer
throw_int               4.76us                   Throw an int
throw_int_call          6.19us                   Throw an int in a function call
throw_exception         6.10us                   Throw an Exception
throw_exception_call    8.07us                   Throw an Exception in a function call
vector_push_pop         1.49ns                   Push and pop a std::vector
ceph_clock_now          42.88ns                  ceph_clock_now function
```
# Ceph Perf (ceph_perf_objectstore)
## i7-12700K
```
root@22752a5b7c8d:~/ceph/ceph.git/build/bin# ./ceph_perf_objectstore 1000
args: [1000]
write op: 118us count: 1000
setattr op: 132us count: 2000
omap_setkeys op: 238us count: 2000
omap_rmkey op: 40us count: 1000
encode op: 492us count: 2000
decode op: 678us count: 2000
iterate op: 667us count: 2000
Total rados op 1000 run time 2746us.
```
## rpi4
```
root@ceph0d:/mnt/usb1# ~/ceph/build/bin/ceph_perf_objectstore 1000
args: [1000]
write op: 952us count: 1000
setattr op: 868us count: 2000
omap_setkeys op: 1709us count: 2000
omap_rmkey op: 318us count: 1000
encode op: 2716us count: 2000
decode op: 4127us count: 2000
iterate op: 4429us count: 2000
Total rados op 1000 run time 18000us.
```