Alternative nops added in http://reviews.llvm.org/D14178 are not actually nops in x86-64. However all 64 bit CPUs support true long nops.
This patch makes AsmBackend to use restricted set of alternative long nops for x86-64.
Details
- Reviewers
bruno nadav • tstellarAMD echristo • ddunbar
Diff Detail
Event Timeline
Hi Andrey,
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | Any reason why this shouldn't be done as a target feature as suggested in the FIXME above? |
Hi Bruno,
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | Target feature can be used as indicator of true long nop support, but that won't fix the issue. The alternative nop instructions I introduced in http://reviews.llvm.org/D14178 are not actually nops for x86-64 arch. But all CPUs which don't support true long nops are 32-bit-only. So I propose to use true long nops if we compile for x86-64 arch for any CPU even if it's marked as 'doesn't support true long nops'. |
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | I see. However, we could have the opposite logic, something like "FeatureShortNop", and mark generic, i386, etc with them. I looked up these CPUs and none of them have Feature64Bit, so guess HasNopl will simply become !FeatureShortNop. |
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | Basing this off target features would also make it much easier to chose the optimal nop length for intel/amd (as discussed on PR22965). |
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | Oh, I think my understanding of 'generic' CPU wasn't quite right. It's not supposed to be used in 64-bit mode, right? ('x86-64' CPU seems to be 64-bit 'generic') Simon, how do you suggest to choose the optimal nop length in this case? |
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
349 | Probably provide a X86Subtarget::getMaxNopLength() helper function that uses feature bits internally. |
I made AsmBackend to use subtarget features to determine whether true long nops are supported or not, as suggested.
Yet to fix the PR we still need additional code, see lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp : 61.
The most part of the patch is adding MCSubtargetInfo as argument for AsmBackend creator functions for all targets, that's why the patch is so big. But I didn't find any way to do it better.
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
61 | This is to fix https://llvm.org/bugs/show_bug.cgi?id=26554. | |
88 | Before this patch llvm would use true long nops when CPU is not specified, so I kept the behavior unchanged. |
include/llvm/Support/TargetRegistry.h | ||
---|---|---|
118 | The threading issue is the most important but can't Triple and CPU be gotten from MCSubtargetInfo? I realise this would be even more upheaval requiring the patch to be split further. |
I modified the patch as Simon suggested: AsmBackend now gets Triple and CPU from MCSubtargetInfo.
Note that the CPU from MCSubtargetInfo is a bit different from the CPU we used to pass as argument: if -mcpu isn't specified the CPU used to be "", but now it's "generic". I think that this is actually how it should be, however the change in behavior required to fix a couple of tests by specifying -mcpu option (the tests were expecting long nop generation but "generic" doesn't support them).
I'm planning to split this change-set into five commits into LLVM:
- Use MCSubtargetInfo argument instead of CPU and Triple in create*AsmBackend functions
- Replace CPU argument with MCSubtargetInfo in all AsmBackend classes in X86AsmBackend.cpp
- Introduction of FeatureLongNop
- Introduction of FeatureFastNop7
- Fix for alternative long nop generation for x86-64
Also there is a small patch to be committed in clang: http://reviews.llvm.org/D21066
I don't think that MCAsmBackend is the right place for storing this information (and so the rest of the change is also not OK).
Taking a step back this needs to be looked at from two directions:
a) overall module/.o specific things
b) subtarget specific things
The problem is that the way the patch is written you're using the base subtarget all the time and not taking into account any other ones at the function level. So your nops won't vary depending on which function is requesting which cpu.
What I think we'll want to do is take writeNopData off of MCAsmBackend and work it into something that already has the subtarget and then for global nop data we'll use the default subtarget that comes with the module.
Make sense?
I'm not sure about threading this around would yield any unwanted dependency. @Akira and @Eric, is this a problem?