This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add pipeline model for Neoverse N1
Closed · Public

Authored by evandro on Jun 7 2023, 7:50 PM.

Event Timeline

evandro created this revision. Jun 7 2023, 7:50 PM
evandro requested review of this revision. Jun 7 2023, 7:50 PM
Herald added a project: Restricted Project. Jun 7 2023, 7:50 PM
dmgreen added a subscriber: dmgreen.

Morning. Apparently it is scheduling model season.

This looks like a nice addition. Do you have any performance numbers for it? Scheduling for out-of-order cores often doesn't give much gain, but it would be good to verify it.

The N1-basic-instructions.s test seems to need an update, according to the precommit tests.

@dmgreen, indeed, the wider the machine, the smaller the benefit of a specific scheduling model. Yet there may still be opportunities. After all, even today, performance-critical code is still hand-written in assembly for many reasons, including optimal scheduling.

Anyways, I recorded these performance improvements using SPEC CPU2017:

  • SPECspeed2017_int: +1%
  • SPECrate2017_int: +1%
  • SPECspeed2017_fp: +1%
  • SPECrate2017_fp: -1%

I'm still investigating the one regression, by 2%, in 519.lbm_r, but methinks that this patch is good enough for consideration.

evandro updated this revision to Diff 529618. Jun 8 2023, 8:50 AM

Do you have the individual benchmark results? And is the 2% loss likely a random fluctuation that was just unlucky, or do you think there might be a reason for the difference?

> Do you have the individual benchmark results?

Yes. Please stand by.

> And is the 2% loss likely a random fluctuation that was just unlucky, or do you think there might be a reason for the difference?

It does not seem to be a random fluctuation. I'm still investigating it.

scw added a subscriber: scw. Jun 15 2023, 3:24 PM
evandro added a comment. Edited Jun 20 2023, 5:57 PM
Benchmarks          Δ%
600.perlbench_s     101%
602.gcc_s           101%
605.mcf_s           101%
620.omnetpp_s       102%
623.xalancbmk_s     101%
625.x264_s          101%
631.deepsjeng_s     100%
641.leela_s         100%
648.exchange2_s     100%
657.xz_s            101%
SPECspeed2017_int   101%

Benchmarks          Δ%
500.perlbench_r     100%
502.gcc_r           100%
505.mcf_r           100%
520.omnetpp_r       100%
523.xalancbmk_r     101%
525.x264_r          101%
531.deepsjeng_r     100%
541.leela_r         100%
548.exchange2_r     101%
557.xz_r            101%
SPECrate2017_int    101%

Benchmarks          Δ%
603.bwaves_s
607.cactuBSSN_s     100%
619.lbm_s           98%
621.wrf_s
627.cam4_s
628.pop2_s
638.imagick_s       103%
644.nab_s           100%
649.fotonik3d_s
654.roms_s
SPECspeed2017_fp    100%

Benchmarks          Δ%
503.bwaves_r
507.cactuBSSN_r     100%
508.namd_r          99%
510.parest_r        101%
511.povray_r        100%
519.lbm_r           99%
521.wrf_r
526.blender_r       100%
527.cam4_r
538.imagick_r       102%
544.nab_r           101%
549.fotonik3d_r     102%
554.roms_r          100%
SPECrate2017_fp     100%
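
SPEC CPU2017 suite-level metrics are geometric means of the per-benchmark ratios, so the aggregate rows can be cross-checked against the individual ones. The following is only a rough sketch: the function name `geomean` is made up for illustration, and the inputs are the rounded SPECspeed2017_int percentages from the table above, so the result is approximate.

```python
import math


def geomean(ratios):
    """Geometric mean of a sequence of positive ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Rounded per-benchmark figures for SPECspeed2017_int, copied from the
# table in this thread (percent of the baseline).
specspeed_int = {
    "600.perlbench_s": 101,
    "602.gcc_s": 101,
    "605.mcf_s": 101,
    "620.omnetpp_s": 102,
    "623.xalancbmk_s": 101,
    "625.x264_s": 101,
    "631.deepsjeng_s": 100,
    "641.leela_s": 100,
    "648.exchange2_s": 100,
    "657.xz_s": 101,
}

# Convert percentages to ratios, average geometrically, convert back.
aggregate = geomean([v / 100 for v in specspeed_int.values()]) * 100
print(f"SPECspeed2017_int: {aggregate:.1f}%")
```

With these inputs the aggregate comes out just under 101%, which agrees with the SPECspeed2017_int row once rounded to a whole percent.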

Thanks. Is that speed or time? I assume speed so higher is better?

I'm just asking because I had tested on the Neoverse-N1 hardware we have, as it is easy to run. It has been fairly well set up to minimize noise, and I noticed a higher time on x264 than I would have expected. It has reproduced multiple times, but there is a chance that it is just what I would call "layout noise", and the same code with slightly different options would give different results. If you don't see the same thing then it is probably OK. The other results all looked fine, all plus or minus a percent. The change in lbm I saw was smaller, for example, only 0.9%.

@dmgreen,

> Thanks. Is that speed or time? I assume speed so higher is better?

Correct.

> I'm just asking because I had tested on the Neoverse-N1 hardware we have, as it is easy to run. It has been fairly well set up to minimize noise, and I noticed a higher time on x264 than I would have expected. It has reproduced multiple times, but there is a chance that it is just what I would call "layout noise", and the same code with slightly different options would give different results. If you don't see the same thing then it is probably OK. The other results all looked fine, all plus or minus a percent. The change in lbm I saw was smaller, for example, only 0.9%.

I can say that I see a consistent improvement of about 1%, or a couple of seconds, in x264 with this patch. With regard to lbm, I'm being pessimistic above, as I often see a slowdown of about 1%. There is a bit of "layout noise" (I'm borrowing it!) and nothing jumps out to explain the difference.
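
One way to judge whether a difference of about 1% stands out from run-to-run variation is to compare the change in median run time against the spread of repeated runs. The sketch below is a crude heuristic, not a statistical test, and it is not something used in this review; the function names are hypothetical. It also cannot detect "layout noise", which reproduces across runs of the same binary and only shows up when the build itself is perturbed.

```python
import statistics


def relative_change(baseline, patched):
    """Percent change of the median patched time relative to the median baseline time."""
    base = statistics.median(baseline)
    return (statistics.median(patched) - base) / base * 100.0


def spread(samples):
    """Run-to-run spread, (max - min) / median, in percent."""
    return (max(samples) - min(samples)) / statistics.median(samples) * 100.0


def exceeds_noise(baseline, patched):
    """True if the median change is larger than the worse spread of the two sample sets."""
    change = abs(relative_change(baseline, patched))
    return change > max(spread(baseline), spread(patched))
```

Each argument is a list of run times, in seconds, from repeated runs of the same benchmark built without and with the patch.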

PS: I'd love to get the recipe to minimize system noise.

dmgreen accepted this revision. Jun 29 2023, 5:41 AM

OK thanks. After trying a few things I can see why it was getting worse, but agree that it looks OK overall.

LGTM. Thanks.

This revision is now accepted and ready to land. Jun 29 2023, 5:41 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/AArch64/machine-licm-sub-loop.ll