Add support of .nops on X86. This addresses llvm.org/PR45788.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Would it be possible to emit a new Fragment type and leverage the existing NOP emission code in X86AsmBackend.cpp. The code here appears to be copied from X86MCInstLower. Which would mean we would now have 3 places that do almost the same thing.
Thanks for the suggestion. Did not know there was a second copy already. Will try to reuse the code from X86AsmBackend.cpp.
I asked what the largest operand .nops supports https://sourceware.org/bugzilla/show_bug.cgi?id=25789 but did not get a clear answer. Maybe some reviewer can help me ask for a clarification...
llvm/test/MC/X86/x86-directive-nops.s | ||
---|---|---|
2 | # | |
8 | # X32 |
If NopSize is basically equivalent for NumBytes except for 2 cases, then I'd do:
if (!NopSize) { print_some_error(); return; } unsigned NopSize = NumBytes; switch (NumBytes) { ... default: NopSize = 10; ... }
So that you don't have to assign NopSize = NumBytes for each other case.
@craig.topper I tried to add a new Fragment type and use X86AsmBackend::writeNopData to insert NOP instructions. It however changed the outcome of the test cases I used, as X86AsmBackend::writeNopData only generatssingle-byte nop instructions if X86::FeatureNOPL feature bit is not true, while the maxLongNopLength in the current implementation (from X86MCInstLower) allows long nop instructions up to 10 bytes on a 64-bit processor. Which one is preferred? Also, which target triple should I use to test multiple-byte nop instructions? Thanks.
@MaskRay @nickdesaulniers Thanks for the comments. Will refactor the code once we decide which way to go with the implementation.
That seems like a bug in writeNopData. All real 64-bit CPUs should have FeatureNOPL. But I think llvm-mc and llc default to "generic" as a CPU which isn't a real CPU and doesn't have the feature. clang never uses the "generic". The default 64-bit CPU for clang is "x86-64" which does have the feature. I'll see what happens if we check for 64-bit mode explicitly in writeNopData and post a patch tonight or tomorrow.
Thanks! Checking if the CPU is 64-bit along FeatureNOPL seemed to work. Will verify more. Also I'd like to point out the difference of 32-bit CPUs.
I have a concern that we haven't figured out the largest operand .nops supports. We should clarify this (https://sourceware.org/bugzilla/show_bug.cgi?id=25789 )
(1) This is a temporary implementation as it breaks the following tests:
LLVM :: MC/COFF/align-nops.s LLVM :: MC/MachO/x86_32-optimal_nop.s LLVM :: MC/X86/AlignedBundling/misaligned-bundle-group.s LLVM :: MC/X86/AlignedBundling/misaligned-bundle.s LLVM :: MC/X86/align-branch-bundle.s LLVM :: MC/X86/align-branch-pad-max-prefix.s LLVM :: MC/X86/x86_long_nop.s LLVM :: MC/X86/x86_nop.s
The breakage was caused by checking X86::Mode32Bit and X86::Mode64Bit in the newly introduced X86AsmBackend::getMaximumNopSize() function. All the above tests would pass if I removed the two checks, but .nops calls would then emit byte-long nop instructions regardless of its second argument.
(2) Will address error messages in the next iteration.
I'm having trouble posting comments inline; it seems fabricator won't allow me to "save" comments.
Is it too painful to make the member of MCNopsFragment uint64_t? a la MCStreamer::emitFill
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
1073 | Mode32Bit is only relevant if FeatureNOPL is false. So I think it should be if (!STI.getFeatureBits()[X86::FeatureNOPL] && !STI.getFeatureBits()[X86::Mode64Bit]) return STI.getFeatureBits()[X86::Mode32Bit] ? 2 : 1; |
I suppose it's easy to do but I'll have to explicitly cast them to signed integers to check if they are smaller than 0. That is probably more error-prone IMO. WDYT?
Issuing multiple-byte NOP for 32-bit mode broke many tests and probably worth a separate patch itself. So this patch will keep the behavior on 32-bit mode unchange for now. This built and passed all the tests.
llvm/include/llvm/MC/MCFragment.h | ||
---|---|---|
358 | Can you call this MaxNopLength or something? | |
llvm/lib/MC/MCAssembler.cpp | ||
625 | When parsing asm, you reject negative lengths. Should these simply be asserts? | |
633 | Does this behaviour match existing gnu? I'd have expected the result of specifying a "too large" maximum size to simply clamp to the target's maximum. This is important as if the result is semantic, then the difference between "largest encodeable" and "largest profitable" becomes a thing the rest of the code has to care about. 15 byte nops are almost always *legal* they're just not *fast*. | |
644 | This loop is duplicated from within emitNops. Can you pass in a MaxNopLength parameter instead? | |
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
1072 | Rename this function to getMaximumProfitableNop() There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise. | |
llvm/test/MC/X86/align-branch-bundle.s | ||
9 ↗ | (On Diff #275853) | Having a test delta in a file without .nops is highly suspicious. I'd suggest splitting your patch into a trivial version which emits single byte nops, and an change which adds the multiple byte support. That would allow us to separate the directive mechanics from the interesting profit bits. |
llvm/lib/MC/MCAssembler.cpp | ||
---|---|---|
633 |
Appears so. $ cat foo.s $ gcc -c foo.s With the patch applied, | |
llvm/test/MC/X86/align-branch-bundle.s | ||
9 ↗ | (On Diff #275853) | How about we also print out instruction bytes here. If 64-bit processors can generate a two-byte long nop instruction here, shouldn't we emit that instead of two single-byte nop? Thanks. |
llvm/lib/MC/MCAssembler.cpp | ||
---|---|---|
644 | There isn't any loop in emitNops. Do you by any chance refer to the loop in writeNopData? In that case, they are not duplicate as this loop will break total bytes into nop instructions no longer than specified by the second argument of .nops if provided, while the loop in writeNopData makes sure each instruction emitted is no longer than the maximum length allowed by the target. | |
llvm/test/MC/X86/align-branch-bundle.s | ||
9 ↗ | (On Diff #275853) | On second thought, I agree that splitting the patch is the better approach in case the multiple-byte support causes any regression. Will address this in the next iteration. |
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
1072 | Any reason for not reusing maxLongNopLength() rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085 |
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
1072 | That function can't be moved as is. It uses X86Subtarget which isn't available to MC. It does something different than for 32-bit mode than what is currently in this patch as that causes additional test failures as discussed elsewhere in this review. That function also uses ProcIntelSLM instead of Feature7ByteNOP. And the FeatureFast flags being set assumes FeatureNOPL is set which is backwards of how it should be. I think the function here is closer to how it should be except for the 32-bit difference. |
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
1072 |
Yes I'm all for merging these two functions although there are some differences on both 32-bit and 64-bit mode that would break some unit tests, such as https://reviews.llvm.org/D82826?id=275853 on 64-bit mode. Maybe we can address that in a separate patch as previously discussed.
SG. Will start to work on merging them once this patch is checked in. |
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp | ||
---|---|---|
1072 | @craig.topper Okay I missed your comment. Thanks for the clarification. |
@reames @craig.topper Hi just want to double check if there are any additional issues I should address. Thanks.
I think I have a very old (I emphasized it again in another comment) comment which isn't addressed (https://reviews.llvm.org/D82826#2126406 )
We should align with GNU as on the largest operand .nops supports.
I think it's probably better to address that in a separate patch as the this patch is focused on adding support to .nops directive, similar to this comment https://reviews.llvm.org/D82826?id=275853#inline-765965.
llvm/include/llvm/MC/MCFragment.h | ||
---|---|---|
358 | Was this comment addressed? I'm not sure what the variable was called when @reames made this comment. | |
llvm/lib/MC/MCAssembler.cpp | ||
628 | Should we put Asm.getBackend().getMaximumNopSize() into a variable? We call it 3 times. | |
634 | Should we clamp the NopLength if we hit the error case? reportError won't stop immediately. Or are we relying on writeNopData to not exceed the maximum size internally? | |
637 | Maybe just use a if statement here for !NopLength | |
644 | I think he was refering to writeNopData. I suppose we could add a limit parameter to writeNopData, but every target would need to be updated for it. | |
llvm/test/MC/X86/x86-directive-nops-errors.s | ||
6 | Please use X86 rather than X32. X32 gets confusing with gnux32 where 32-bit pointers are used on a 64-bit target. |
llvm/include/llvm/MC/MCFragment.h | ||
---|---|---|
358 | It was called NopLength. The name was confusing because I called it NopLength here but the same value MaxNopLength somewhere so @reames suggested me to rename it to MaxNopLength . While working on that I realized this value was meant to specify a soft cap on the size limit of a no-op instruction, i.e. the second argument (control) as https://sourceware.org/binutils/docs/as/Nops.html#Nops specified, so I kept the name to avoid the confusion with the function name getMaximumNopSize, which returns a "hard cap" LLVM can accept. Any suggestion on what a better name could be? Also I realized MaxNopLen in the newly introduced emitNops function should be renamed once we make a final decision on the naming. Will address other comments together in the next iteration. Thanks! |
Renamed a variable name and addressed comments. @craig.topper Please let me know if it looks any better. And thanks for enabling multibyte-NOPs in 64-bit mode.
Can you call this MaxNopLength or something?