This is an archive of the discontinued LLVM Phabricator instance.

[X86] Alter throughput for vpshufb/vpperm on bdver2 model to match AMD documentation (PR51539)
Abandoned · Public

Authored by RKSimon on Sep 25 2021, 9:05 AM.

Details

Summary

As reported on PR51539, the bdver2 model appears to assign higher throughput costs to codegen involving vpshufb/vpperm than the hardware is likely to exhibit.

e.g. ctpop: https://c.godbolt.org/z/4hcaMqPzd
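For context on why ctpop is sensitive to this: a common x86 lowering of vector ctpop, when no native vector popcount is available, uses vpshufb as a 16-entry nibble lookup table and issues two lookups per vector (one for the low nibbles, one for the high nibbles). Below is a minimal scalar Python sketch of that technique, assuming 128-bit vectors of bytes. The helper names `pshufb`, `ctpop_v16i8` and `NIBBLE_POPCNT` are hypothetical, and the exact instruction sequence in the godbolt example may differ.

```python
# Hypothetical scalar model of the nibble-LUT popcount technique.
# Each pshufb() call stands in for one 128-bit vpshufb instruction.

# Popcount of every 4-bit value 0..15, used as the 16-byte shuffle table.
NIBBLE_POPCNT = [bin(i).count("1") for i in range(16)]


def pshufb(table: list[int], indices: list[int]) -> list[int]:
    """Emulate 128-bit (v)pshufb: a byte is zeroed if bit 7 of its index is set,
    otherwise it is table[index & 0x0F]."""
    assert len(table) == 16 and len(indices) == 16
    return [0 if idx & 0x80 else table[idx & 0x0F] for idx in indices]


def ctpop_v16i8(vec: list[int]) -> list[int]:
    """Per-byte popcount of a 16-byte vector using two table lookups."""
    lo = [b & 0x0F for b in vec]          # low nibbles (vpand with a 0x0F splat)
    hi = [(b >> 4) & 0x0F for b in vec]   # high nibbles (vpsrlw + vpand)
    lo_cnt = pshufb(NIBBLE_POPCNT, lo)    # first vpshufb lookup
    hi_cnt = pshufb(NIBBLE_POPCNT, hi)    # second vpshufb lookup
    return [a + b for a, b in zip(lo_cnt, hi_cnt)]  # vpaddb


print(ctpop_v16i8([0xFF] * 16))  # sixteen 8s
```

Because every vector handled this way needs two vpshufb lookups, the cost model's vpshufb throughput feeds directly into the ctpop cost.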

According to the AMD Family 15h SoG (Software Optimization Guide), these are fastpath (reciprocal throughput = 1.0) but issue only on pipe 1 (XBR). Agner and InstLatX64 also agree that both the latency and the throughput are better than the model currently states.

AMD Family 15h SoG (https://www.amd.com/system/files/TechDocs/47414_15h_sw_opt_guide.pdf)
Agner Fog, instruction tables (https://agner.org/optimize/instruction_tables.pdf)
InstLatX64 (http://users.atw.hu/instlatx64/AuthenticAMD/AuthenticAMD0610F01_K15_Piledriver_InstLatX64.txt)

Should most of the other shuffles be using XBR as well?
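The single-pipe restriction matters because a block's reciprocal throughput is bounded by its busiest pipe: an op that is fastpath but can only issue on XBR costs 1.0 cycle of XBR time, and every other op routed to XBR competes for the same slot. The following is a minimal Python sketch under simplifying assumptions (each op bound to fixed pipes; dependencies, latency and decode limits ignored). Only the vpshufb/vpperm entries reflect the SoG figures quoted in the summary; the `OTHER` pipe, the vpunpcklbw costs, and the names `rthroughput_bound`, `SPLIT_PIPES` and `ALL_ON_XBR` are hypothetical.

```python
from collections import defaultdict


def rthroughput_bound(block: list[str], usage: dict[str, dict[str, float]]) -> float:
    """Lower bound on cycles per iteration: the busiest pipe's total occupancy.

    Assumes every instruction is bound to fixed pipes and ignores
    dependencies, latency and front-end limits.
    """
    pressure: dict[str, float] = defaultdict(float)
    for instr in block:
        for pipe, cycles in usage[instr].items():
            pressure[pipe] += cycles
    return max(pressure.values(), default=0.0)


# vpshufb/vpperm: 1 cycle on XBR only (per the SoG).
# vpunpcklbw on a separate "OTHER" pipe is a hypothetical placeholder.
SPLIT_PIPES = {
    "vpshufb": {"XBR": 1.0},
    "vpperm": {"XBR": 1.0},
    "vpunpcklbw": {"OTHER": 1.0},
}
# Hypothetical variant in which the other shuffle is also routed to XBR.
ALL_ON_XBR = {**SPLIT_PIPES, "vpunpcklbw": {"XBR": 1.0}}

block = ["vpshufb", "vpshufb", "vpunpcklbw"]
print(rthroughput_bound(block, SPLIT_PIPES))  # 2.0
print(rthroughput_bound(block, ALL_ON_XBR))   # 3.0
```

Under this simplified model, moving more shuffles onto XBR makes the model more faithful if the hardware really works that way, but it also raises the predicted cost of shuffle-heavy blocks, since they all contend for one pipe.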


Event Timeline

RKSimon created this revision. · Sep 25 2021, 9:05 AM
RKSimon requested review of this revision. · Sep 25 2021, 9:05 AM
Herald added a project: Restricted Project. · Sep 25 2021, 9:05 AM

@lebedev.ri Any thoughts?

Sorry, I've been meaning to verify this, but haven't gotten around to it :/

Herald added a project: Restricted Project. · Jun 14 2022, 8:28 AM
Herald added a subscriber: jsji.
RKSimon abandoned this revision. · Sep 6 2022, 9:02 AM