Page MenuHomePhabricator

[X86] support .nops directive
ClosedPublic

Authored by jcai19 on Jun 29 2020, 5:24 PM.

Details

Summary

Add support of .nops on X86. This addresses llvm.org/PR45788.

Diff Detail

Event Timeline

jcai19 created this revision.Jun 29 2020, 5:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2020, 5:24 PM

Would it be possible to emit a new Fragment type and leverage the existing NOP emission code in X86AsmBackend.cpp. The code here appears to be copied from X86MCInstLower. Which would mean we would now have 3 places that do almost the same thing.

jcai19 added a comment.EditedJun 29 2020, 6:04 PM

Would it be possible to emit a new Fragment type and leverage the existing NOP emission code in X86AsmBackend.cpp. The code here appears to be copied from X86MCInstLower. Which would mean we would now have 3 places that do almost the same thing.

Thanks for the suggestion. Did not know there was a second copy already. Will try to reuse the code from X86AsmBackend.cpp.

I asked what the largest operand .nops supports https://sourceware.org/bugzilla/show_bug.cgi?id=25789 but did not get a clear answer. Maybe some reviewer can help me ask for a clarification...

llvm/test/MC/X86/x86-directive-nops.s
2

#

8

# X32

If NopSize is basically equivalent for NumBytes except for 2 cases, then I'd do:

if (!NopSize) {
  print_some_error();
  return;
}
unsigned NopSize = NumBytes;
switch (NumBytes) {
...
default:
  NopSize = 10;
  ...
}

So that you don't have to assign NopSize = NumBytes for each other case.

@craig.topper I tried to add a new Fragment type and use X86AsmBackend::writeNopData to insert NOP instructions. It however changed the outcome of the test cases I used, as X86AsmBackend::writeNopData only generatssingle-byte nop instructions if X86::FeatureNOPL feature bit is not true, while the maxLongNopLength in the current implementation (from X86MCInstLower) allows long nop instructions up to 10 bytes on a 64-bit processor. Which one is preferred? Also, which target triple should I use to test multiple-byte nop instructions? Thanks.

jcai19 added a comment.EditedJun 30 2020, 10:24 PM

@MaskRay @nickdesaulniers Thanks for the comments. Will refactor the code once we decide which way to go with the implementation.

@craig.topper I tried to add a new Fragment type and use X86AsmBackend::writeNopData to insert NOP instructions. It however changed the outcome of the test cases I used, as X86AsmBackend::writeNopData only generatssingle-byte nop instructions if X86::FeatureNOPL feature bit is not true, while the maxLongNopLength in the current implementation (from X86MCInstLower) allows long nop instructions up to 10 bytes on a 64-bit processor. Which one is preferred? Also, which target triple should I use to test multiple-byte nop instructions? Thanks.

That seems like a bug in writeNopData. All real 64-bit CPUs should have FeatureNOPL. But I think llvm-mc and llc default to "generic" as a CPU which isn't a real CPU and doesn't have the feature. clang never uses the "generic". The default 64-bit CPU for clang is "x86-64" which does have the feature. I'll see what happens if we check for 64-bit mode explicitly in writeNopData and post a patch tonight or tomorrow.

Thanks! Checking if the CPU is 64-bit along FeatureNOPL seemed to work. Will verify more. Also I'd like to point out the difference of 32-bit CPUs.

writeNopsData should always use multibyte nops in 64-bit mode now.

is the 32-bit different the 2 byte case?

jcai19 added a comment.Jul 1 2020, 1:14 PM

is the 32-bit different the 2 byte case?

Yes.

I have a concern that we haven't figured out the largest operand .nops supports. We should clarify this (https://sourceware.org/bugzilla/show_bug.cgi?id=25789 )

jcai19 updated this revision to Diff 274963.Jul 1 2020, 5:05 PM

(1) This is a temporary implementation as it breaks the following tests:

LLVM :: MC/COFF/align-nops.s
LLVM :: MC/MachO/x86_32-optimal_nop.s
LLVM :: MC/X86/AlignedBundling/misaligned-bundle-group.s
LLVM :: MC/X86/AlignedBundling/misaligned-bundle.s
LLVM :: MC/X86/align-branch-bundle.s
LLVM :: MC/X86/align-branch-pad-max-prefix.s
LLVM :: MC/X86/x86_long_nop.s
LLVM :: MC/X86/x86_nop.s

The breakage was caused by checking X86::Mode32Bit and X86::Mode64Bit in the newly introduced X86AsmBackend::getMaximumNopSize() function. All the above tests would pass if I removed the two checks, but .nops calls would then emit byte-long nop instructions regardless of its second argument.

(2) Will address error messages in the next iteration.

I'm having trouble posting comments inline; it seems fabricator won't allow me to "save" comments.

Is it too painful to make the member of MCNopsFragment uint64_t? a la MCStreamer::emitFill

craig.topper added inline comments.Jul 1 2020, 5:31 PM
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1073

Mode32Bit is only relevant if FeatureNOPL is false. So I think it should be

if (!STI.getFeatureBits()[X86::FeatureNOPL] &&
    !STI.getFeatureBits()[X86::Mode64Bit])
  return STI.getFeatureBits()[X86::Mode32Bit] ? 2 : 1;
jcai19 added a comment.Jul 1 2020, 6:14 PM

I'm having trouble posting comments inline; it seems fabricator won't allow me to "save" comments.

Is it too painful to make the member of MCNopsFragment uint64_t? a la MCStreamer::emitFill

I suppose it's easy to do but I'll have to explicitly cast them to signed integers to check if they are smaller than 0. That is probably more error-prone IMO. WDYT?

jcai19 updated this revision to Diff 274981.Jul 1 2020, 6:19 PM

Refactor X86AsmBackend::getMaximumNopSize based on @craig.topper's comment.

jcai19 updated this revision to Diff 275853.Jul 6 2020, 3:47 PM

Issuing multiple-byte NOP for 32-bit mode broke many tests and probably worth a separate patch itself. So this patch will keep the behavior on 32-bit mode unchange for now. This built and passed all the tests.

reames requested changes to this revision.Jul 6 2020, 6:03 PM
reames added inline comments.
llvm/include/llvm/MC/MCFragment.h
358

Can you call this MaxNopLength or something?

llvm/lib/MC/MCAssembler.cpp
625

When parsing asm, you reject negative lengths. Should these simply be asserts?

633

Does this behaviour match existing gnu? I'd have expected the result of specifying a "too large" maximum size to simply clamp to the target's maximum.

This is important as if the result is semantic, then the difference between "largest encodeable" and "largest profitable" becomes a thing the rest of the code has to care about. 15 byte nops are almost always *legal* they're just not *fast*.

644

This loop is duplicated from within emitNops. Can you pass in a MaxNopLength parameter instead?

llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1072

Rename this function to getMaximumProfitableNop()

There's a difference between legality and profit here. As commented earlier, if that matters you'll have a harder task implementation wise.

llvm/test/MC/X86/align-branch-bundle.s
9 ↗(On Diff #275853)

Having a test delta in a file without .nops is highly suspicious.

I'd suggest splitting your patch into a trivial version which emits single byte nops, and an change which adds the multiple byte support. That would allow us to separate the directive mechanics from the interesting profit bits.

This revision now requires changes to proceed.Jul 6 2020, 6:03 PM
jcai19 marked 2 inline comments as done.Jul 7 2020, 1:43 PM
jcai19 added inline comments.
llvm/lib/MC/MCAssembler.cpp
633

Does this behaviour match existing gnu?

Appears so.

$ cat foo.s
.nops 16, 15

$ gcc -c foo.s
foo.s: Assembler messages:
foo.s:1: Error: invalid single nop size: 15 (expect within [0, 11])

With the patch applied,
$ llvm-mc -filetype=obj -triple=x86_64 foo.s
foo.s:1:1: error: illegal NOP size 15. (expected within [0, 10])
.nops 16, 15
^

llvm/test/MC/X86/align-branch-bundle.s
9 ↗(On Diff #275853)

How about we also print out instruction bytes here. If 64-bit processors can generate a two-byte long nop instruction here, shouldn't we emit that instead of two single-byte nop? Thanks.

jcai19 marked 2 inline comments as done.Jul 7 2020, 2:39 PM
jcai19 added inline comments.
llvm/lib/MC/MCAssembler.cpp
644

There isn't any loop in emitNops. Do you by any chance refer to the loop in writeNopData? In that case, they are not duplicate as this loop will break total bytes into nop instructions no longer than specified by the second argument of .nops if provided, while the loop in writeNopData makes sure each instruction emitted is no longer than the maximum length allowed by the target.

llvm/test/MC/X86/align-branch-bundle.s
9 ↗(On Diff #275853)

On second thought, I agree that splitting the patch is the better approach in case the multiple-byte support causes any regression. Will address this in the next iteration.

jcai19 updated this revision to Diff 276226.Jul 7 2020, 2:55 PM

Address some of @reames's concerns, and rename variables to avoid confusion.

aganea added inline comments.Jul 7 2020, 3:19 PM
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1072

Any reason for not reusing maxLongNopLength() rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085
That function could perhaps be moved to llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h ?

jcai19 updated this revision to Diff 276235.Jul 7 2020, 3:28 PM

Fixed an assertion message.

craig.topper added inline comments.Jul 7 2020, 3:33 PM
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1072

That function can't be moved as is. It uses X86Subtarget which isn't available to MC. It does something different than for 32-bit mode than what is currently in this patch as that causes additional test failures as discussed elsewhere in this review. That function also uses ProcIntelSLM instead of Feature7ByteNOP. And the FeatureFast flags being set assumes FeatureNOPL is set which is backwards of how it should be.

I think the function here is closer to how it should be except for the 32-bit difference.

jcai19 marked an inline comment as done.Jul 7 2020, 3:36 PM
jcai19 added inline comments.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1072

Any reason for not reusing maxLongNopLength() rather than rewriting the same thing here? https://github.com/llvm/llvm-project/blob/b2eb1c5793d78d70c1223b098aefc87050f69a8c/llvm/lib/Target/X86/X86MCInstLower.cpp#L1085

Yes I'm all for merging these two functions although there are some differences on both 32-bit and 64-bit mode that would break some unit tests, such as https://reviews.llvm.org/D82826?id=275853 on 64-bit mode. Maybe we can address that in a separate patch as previously discussed.

That function could perhaps be moved to llvm/lib/Target/X86/MCTargetDesc/X86BaseInfo.h ?

SG. Will start to work on merging them once this patch is checked in.

jcai19 marked an inline comment as done.Jul 7 2020, 3:44 PM
jcai19 added inline comments.
llvm/lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
1072

@craig.topper Okay I missed your comment. Thanks for the clarification.

@reames @craig.topper Hi just want to double check if there are any additional issues I should address. Thanks.

I think I have a very old (I emphasized it again in another comment) comment which isn't addressed (https://reviews.llvm.org/D82826#2126406 )
We should align with GNU as on the largest operand .nops supports.

I think I have a very old (I emphasized it again in another comment) comment which isn't addressed (https://reviews.llvm.org/D82826#2126406 )
We should align with GNU as on the largest operand .nops supports.

I think it's probably better to address that in a separate patch as the this patch is focused on adding support to .nops directive, similar to this comment https://reviews.llvm.org/D82826?id=275853#inline-765965.

craig.topper added inline comments.Wed, Jul 22, 5:43 PM
llvm/include/llvm/MC/MCFragment.h
358

Was this comment addressed? I'm not sure what the variable was called when @reames made this comment.

llvm/lib/MC/MCAssembler.cpp
628

Should we put Asm.getBackend().getMaximumNopSize() into a variable? We call it 3 times.

634

Should we clamp the NopLength if we hit the error case? reportError won't stop immediately. Or are we relying on writeNopData to not exceed the maximum size internally?

637

Maybe just use a if statement here for !NopLength

644

I think he was refering to writeNopData. I suppose we could add a limit parameter to writeNopData, but every target would need to be updated for it.

llvm/test/MC/X86/x86-directive-nops-errors.s
6

Please use X86 rather than X32. X32 gets confusing with gnux32 where 32-bit pointers are used on a 64-bit target.

jcai19 marked an inline comment as done.Thu, Jul 23, 3:02 PM
jcai19 added inline comments.
llvm/include/llvm/MC/MCFragment.h
358

It was called NopLength. The name was confusing because I called it NopLength here but the same value MaxNopLength somewhere so @reames suggested me to rename it to MaxNopLength . While working on that I realized this value was meant to specify a soft cap on the size limit of a no-op instruction, i.e. the second argument (control) as https://sourceware.org/binutils/docs/as/Nops.html#Nops specified, so I kept the name to avoid the confusion with the function name getMaximumNopSize, which returns a "hard cap" LLVM can accept. Any suggestion on what a better name could be? Also I realized MaxNopLen in the newly introduced emitNops function should be renamed once we make a final decision on the naming. Will address other comments together in the next iteration. Thanks!

jcai19 updated this revision to Diff 282289.Fri, Jul 31, 11:53 AM

Renamed a variable name and addressed comments. @craig.topper Please let me know if it looks any better. And thanks for enabling multibyte-NOPs in 64-bit mode.

LGTM

Thank you!

This revision was not accepted when it landed; it landed in state Needs Review.Mon, Aug 3, 11:51 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.