Add the scheduling model for Neoverse N1.
Repository: rG LLVM Github Monorepo
Event Timeline
Morning. Apparently it is scheduling model season.
This looks like a nice addition. Do you have any performance numbers for it? Scheduling for out-of-order cores often doesn't give much gain, but it would be good to verify it.
The N1-basic-instructions.s test seems to need an update, according to the precommit tests.
@dmgreen, indeed, the wider the machine, the smaller the benefit of core-specific scheduling. Yet, there may still be opportunities. After all, even nowadays, performance-critical code is still hand-written in assembly, for many reasons, including optimal scheduling.
Anyways, I recorded these performance improvements using SPEC CPU2017:
- SPECspeed2017_int: +1%
- SPECrate2017_int: +1%
- SPECspeed2017_fp: +1%
- SPECrate2017_fp: -1%
I'm still investigating the one regression, by 2%, in 519.lbm_r, but methinks that this patch is good enough for consideration.
Do you have the individual benchmark results? And is the 2% loss likely a random fluctuation that was just unlucky, or do you think there might be a reason for the difference?
Yes. Please, stand by.
> And is the 2% loss likely a random fluctuation that was just unlucky, or do you think there might be a reason for the difference?
It does not seem to be a random fluctuation. I'm still investigating it.
Benchmarks | Δ% |
---|---|
600.perlbench_s | 101% |
602.gcc_s | 101% |
605.mcf_s | 101% |
620.omnetpp_s | 102% |
623.xalancbmk_s | 101% |
625.x264_s | 101% |
631.deepsjeng_s | 100% |
641.leela_s | 100% |
648.exchange2_s | 100% |
657.xz_s | 101% |
SPECspeed2017_int | 101% |

Benchmarks | Δ% |
---|---|
500.perlbench_r | 100% |
502.gcc_r | 100% |
505.mcf_r | 100% |
520.omnetpp_r | 100% |
523.xalancbmk_r | 101% |
525.x264_r | 101% |
531.deepsjeng_r | 100% |
541.leela_r | 100% |
548.exchange2_r | 101% |
557.xz_r | 101% |
SPECrate2017_int | 101% |

Benchmarks | Δ% |
---|---|
603.bwaves_s | |
607.cactuBSSN_s | 100% |
619.lbm_s | 98% |
621.wrf_s | |
627.cam4_s | |
628.pop2_s | |
638.imagick_s | 103% |
644.nab_s | 100% |
649.fotonik3d_s | |
654.roms_s | |
SPECspeed2017_fp | 100% |

Benchmarks | Δ% |
---|---|
503.bwaves_r | |
507.cactuBSSN_r | 100% |
508.namd_r | 99% |
510.parest_r | 101% |
511.povray_r | 100% |
519.lbm_r | 99% |
521.wrf_r | |
526.blender_r | 100% |
527.cam4_r | |
538.imagick_r | 102% |
544.nab_r | 101% |
549.fotonik3d_r | 102% |
554.roms_r | 100% |
SPECrate2017_fp | 100% |
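
As a sanity check on the summary rows, here is a minimal sketch that recomputes two of them as the geometric mean of the per-benchmark ratios, which is how SPEC aggregates a suite. It assumes the percentages are relative speeds (patched over baseline, higher is better) and that benchmarks with empty cells are simply left out; the helper name `geomean` and the dictionary names are illustrative only, and the inputs are the already-rounded table values, so the result is approximate.

```python
import math


def geomean(ratios):
    """Return the geometric mean of a sequence of positive ratios."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Relative speeds copied from the SPECspeed2017_int table above
# (assumed to be patched / baseline, so values above 1.0 are improvements).
specspeed_int = {
    "600.perlbench_s": 1.01,
    "602.gcc_s": 1.01,
    "605.mcf_s": 1.01,
    "620.omnetpp_s": 1.02,
    "623.xalancbmk_s": 1.01,
    "625.x264_s": 1.01,
    "631.deepsjeng_s": 1.00,
    "641.leela_s": 1.00,
    "648.exchange2_s": 1.00,
    "657.xz_s": 1.01,
}

# Relative speeds copied from the SPECrate2017_fp table above.
# Benchmarks with an empty cell (bwaves, wrf, cam4) are omitted, not guessed.
specrate_fp = {
    "507.cactuBSSN_r": 1.00,
    "508.namd_r": 0.99,
    "510.parest_r": 1.01,
    "511.povray_r": 1.00,
    "519.lbm_r": 0.99,
    "526.blender_r": 1.00,
    "538.imagick_r": 1.02,
    "544.nab_r": 1.01,
    "549.fotonik3d_r": 1.02,
    "554.roms_r": 1.00,
}

int_summary = geomean(specspeed_int.values())
fp_summary = geomean(specrate_fp.values())

print(f"SPECspeed2017_int: {int_summary:.1%}")
print(f"SPECrate2017_fp:   {fp_summary:.1%}")
```

Rounded to whole percentages, both values agree with the summary rows in the tables, which suggests that the individual results and the suite-level figures are consistent with each other.
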
Thanks. Is that speed or time? I assume speed so higher is better?
I'm just asking because I had tested on the Neoverse N1 hardware we have, as it is easy to run. It has been fairly well set up to minimize noise, and I noticed a higher time on x264 than I would have expected. It has reproduced multiple times, but there is a chance that it is just what I would call "layout noise", and the same code with slightly different options would give different results. If you don't see the same thing then it is probably OK. The other results all looked fine, all plus or minus a percent. The change in lbm I saw was smaller, for example, only 0.9%.
> Thanks. Is that speed or time? I assume speed so higher is better?
Correct.
> I'm just asking because I had tested on the Neoverse N1 hardware we have, as it is easy to run. It has been fairly well set up to minimize noise, and I noticed a higher time on x264 than I would have expected. It has reproduced multiple times, but there is a chance that it is just what I would call "layout noise", and the same code with slightly different options would give different results. If you don't see the same thing then it is probably OK. The other results all looked fine, all plus or minus a percent. The change in lbm I saw was smaller, for example, only 0.9%.
I can say that I see a consistent improvement of about 1%, or a couple of seconds, in x264 with this patch. With regard to lbm, I was being pessimistic above, as I often see a slowdown of about 1%. There is a bit of "layout noise" (I'm borrowing it!) and nothing jumps out to explain the difference.
PS: I'd love to get the recipe to minimize system noise.
OK thanks. After trying a few things I can see why it was getting worse, but agree that it looks OK overall.
LGTM. Thanks.