This is an archive of the discontinued LLVM Phabricator instance.

[X86] Move HasNOPL to a subtarget feature bit. Plumb MCSubtargetInfo through the MCAsmBackend constructor
ClosedPublic

Authored by craig.topper on Jan 3 2018, 5:44 PM.

Details

Summary

After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Jan 3 2018, 5:44 PM

Nice, I was half way through doing this myself....

PR22965 needs addressing as well - instead of driving off the silvermont feature bit - do we need a 'PreferNop15' feature bit for AMD (fam15/fam16/ryzen) and possibly recent Intel (???) targets and otherwise use 11 byte NOPs?

Limit NOP length to 10 on everything but AMD fam 16h and 17h. @RKSimon you mentioned fam 15h in your previous update, but the optimization manual I found for fam 15h only talks about 11 byte.

We could maybe go to 11 instead of 10. But Intel documentation only lists out to 9 bytes. And the current 10-15 byte sequences we do are different than the 10-15 byte sequences in AMD's manual. We use a CS prefix that's not mentioned in the optimization manuals. The CS prefix does appear in the binutils 10 byte sequence. I believe binutils stops at 10 bytes for all CPUs.

RKSimon added subscribers: pcordes, andreadb.

Limit NOP length to 10 on everything but AMD fam 16h and 17h. @RKSimon you mentioned fam 15h in your previous update, but the optimization manual I found for fam 15h only talks about 11 byte.

Yes, I think you're right - the fam15 sog says 3 prefix bytes are the max to avoid a decode stall, so 11 bytes should be the limit, but going to just 10 bytes is probably OK. Sorry I was going off a comment in PR22965 without rechecking.

We could maybe go to 11 instead of 10. But Intel documentation only lists out to 9 bytes. And the current 10-15 byte sequences we do are different than the 10-15 byte sequences in AMD's manual. We use a CS prefix that's not mentioned in the optimization manuals. The CS prefix does appear in the binutils 10 byte sequence. I believe binutils stops at 10 bytes for all CPUs.

If we're limiting the 11-15 byte options to the AMD targets, I think it'd be better to follow their recommendations (which luckily are the same in both the 16h and 17h sogs).

I feel guilty for asking for this change as it looks like the max nop length change should be part of a separate patch as it might have performance effects that need reviewing.

@andreadb @pcordes do you have any suggestions?

Should I go back to my first patch that just does the plumbing for now?

Should I go back to my first patch that just does the plumbing for now?

Yes please, and split off the NOP length changes to a seperate patch to allow for regression tests. That way I think this patch should have no effect on codegen and is just a cleanup?

Revert back to previous version of the patch which should be NFC

RKSimon accepted this revision.Jan 10 2018, 12:34 PM

LGTM - thanks.

This revision is now accepted and ready to land.Jan 10 2018, 12:34 PM
This revision was automatically updated to reflect the committed changes.