This is an archive of the discontinued LLVM Phabricator instance.

[x86] Teach the "generic" x86 CPU to avoid patterns that are slow on widely used processors.
Closed Public

Authored by chandlerc on Aug 20 2017, 9:46 PM.

Details

Summary

This occurred to me when I saw that we were generating 'inc' and 'dec'
even though we shouldn't for Haswell and newer. Beyond that, there were
a few other "X is slow" things that we should probably just set.

I've avoided any of the "X is fast" features because most of those would
be pretty serious regressions on processors where X isn't actually fast.
The slow things are likely to be negligible costs on processors where
these aren't slow and a significant win when they are slow.
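
The selection rule described above can be sketched roughly as follows. This is only an illustration of the reasoning, not the actual patch: the CPU names and their feature sets are hypothetical, `generic_tuning` is a made-up helper, and `SlowIncDec` is an assumed spelling for the 'inc'/'dec' case. The other feature names are the ones mentioned later in this review.

```python
# Hypothetical illustration only: the CPU names and feature sets below are
# invented for this example and are NOT LLVM's real processor definitions.
CPU_FEATURES = {
    "cpu_a": {"SlowIncDec", "FastSHLDRotate"},
    "cpu_b": {"SlowSHLD"},
    "cpu_c": {"SlowSHLD", "SlowBTMem"},
}


def generic_tuning(cpu_features):
    """Collect the 'Slow*' flags seen on any CPU and skip every 'Fast*' flag.

    Avoiding a pattern that is slow somewhere costs little where it is not
    slow, while assuming something is fast could regress CPUs where it isn't.
    """
    generic = set()
    for features in cpu_features.values():
        generic |= {f for f in features if f.startswith("Slow")}
    return generic


print(sorted(generic_tuning(CPU_FEATURES)))
```

Note that the real change only sets a few hand-picked "slow" features. Taking the union of all of them, as this sketch does, is a simplification.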

In retrospect this seems somewhat obvious. Not sure why we didn't do this a long time ago.

Event Timeline

chandlerc created this revision. Aug 20 2017, 9:46 PM
craig.topper edited edge metadata. Aug 20 2017, 10:50 PM

This seems reasonable to me. @zvi and @RKSimon, what do you think?

echristo accepted this revision. Aug 21 2017, 12:18 AM

Seems reasonable to add a comment as to what microarch features we're attempting to target as modern here.

Definitely starting to hit the point where we should verify for AMD processors though - I'm not sure how any of the new Zen-based processors fare here.

-eric

This revision is now accepted and ready to land. Aug 21 2017, 12:18 AM

Seems reasonable to add a comment as to what microarch features we're attempting to target as modern here.

Definitely starting to hit the point where we should verify for AMD processors though - I'm not sure how any of the new Zen-based processors fare here.

Yeah, I actually looked and this is as good as LLVM's existing information allows.

For example, btver2 up through znver1 have SlowSHLD, so not including FastSHLDRotate is probably good here to produce the "least bad" x86 code across microarchitectures.

Other than that though, I couldn't find any FastFoo or SlowFoo features in LLVM's AMD processor feature sets that would make sense here and aren't already covered. Really just SlowBTMem on Barcelona and fam10, and that's been long covered.

This seems really uncontroversial as it essentially just avoids some patterns. Going ahead and landing for now. Happy to revisit or enhance this as desired by others of course.

Thanks for the reviews so far (and glad you mentioned AMD processors, Eric!)

This revision was automatically updated to reflect the committed changes.