After D41349, we can no get a MCSubtargetInfo into the MCAsmBackend constructor. This allows us to get NOPL from a subtarget feature rather than a CPU name blacklist.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Nice, I was half way through doing this myself....
PR22965 needs addressing as well - instead of driving off the silvermont feature bit - do we need a 'PreferNop15' feature bit for AMD (fam15/fam16/ryzen) and possibly recent Intel (???) targets and otherwise use 11 byte NOPs?
Limit NOP length to 10 on everything but AMD fam 16h and 17h. @RKSimon you mentioned fam 15h in your previous update, but the optimization manual I found for fam 15h only talks about 11 byte.
We could maybe go to 11 instead of 10. But Intel documentation only lists out to 9 bytes. And the current 10-15 byte sequences we do are different than the 10-15 byte sequences in AMD's manual. We use a CS prefix that's not mentioned in the optimization manuals. The CS prefix does appear in the binutils 10 byte sequence. I believe binutils stops at 10 bytes for all CPUs.
Yes, I think you're right - the fam15 sog says 3 prefix bytes are the max to avoid a decode stall, so 11 bytes should be the limit, but going to just 10 bytes is probably OK. Sorry I was going off a comment in PR22965 without rechecking.
We could maybe go to 11 instead of 10. But Intel documentation only lists out to 9 bytes. And the current 10-15 byte sequences we do are different than the 10-15 byte sequences in AMD's manual. We use a CS prefix that's not mentioned in the optimization manuals. The CS prefix does appear in the binutils 10 byte sequence. I believe binutils stops at 10 bytes for all CPUs.
If we're limiting the 11-15 byte options to the AMD targets, I think it'd be better to follow their recommendations (which luckily are the same in both the 16h and 17h sogs).
I feel guilty for asking for this change as it looks like the max nop length change should be part of a separate patch as it might have performance effects that need reviewing.
Yes please, and split off the NOP length changes to a seperate patch to allow for regression tests. That way I think this patch should have no effect on codegen and is just a cleanup?