This is an archive of the discontinued LLVM Phabricator instance.

[X86] PR26554: Use not all set of alternative nops in 64 bit mode, but only those which are correct
AbandonedPublic

Authored by aturetsk on Feb 24 2016, 7:46 AM.

Details

Summary

Alternative nops added in http://reviews.llvm.org/D14178 are not actually nops in x86-64. However all 64 bit CPUs support true long nops.
This patch makes AsmBackend to use restricted set of alternative long nops for x86-64.

Ref: https://llvm.org/bugs/show_bug.cgi?id=26554

Diff Detail

Event Timeline

aturetsk updated this revision to Diff 48936.Feb 24 2016, 7:46 AM
aturetsk retitled this revision from to [X86] PR26554: Enable using of true long nops for x86-64 for every CPU.
aturetsk updated this object.
aturetsk added reviewers: nadav, ddunbar.
aturetsk added a subscriber: llvm-commits.
bruno edited edge metadata.May 11 2016, 9:08 AM

Hi Andrey,

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

Any reason why this shouldn't be done as a target feature as suggested in the FIXME above?

Hi Bruno,

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

Target feature can be used as indicator of true long nop support, but that won't fix the issue.

The alternative nop instructions I introduced in http://reviews.llvm.org/D14178 are not actually nops for x86-64 arch. But all CPUs which don't support true long nops are 32-bit-only. So I propose to use true long nops if we compile for x86-64 arch for any CPU even if it's marked as 'doesn't support true long nops'.
Well, we do generate 64-bit instructions for 32-bit-only CPUs when x86-64 arch is used, why not generate true long nops as well?
And I think 'generic' should use true long nops in 64-bit mode (since every real 64 bit CPU seems to support them) and use alternative long nops in 32 bit mode.

bruno added inline comments.May 25 2016, 10:47 AM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

I see. However, we could have the opposite logic, something like "FeatureShortNop", and mark generic, i386, etc with them. I looked up these CPUs and none of them have Feature64Bit, so guess HasNopl will simply become !FeatureShortNop.

RKSimon added inline comments.
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

Basing this off target features would also make it much easier to chose the optimal nop length for intel/amd (as discussed on PR22965).

aturetsk added inline comments.May 26 2016, 7:15 AM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

Oh, I think my understanding of 'generic' CPU wasn't quite right. It's not supposed to be used in 64-bit mode, right? ('x86-64' CPU seems to be 64-bit 'generic')
If that's so then I see no reason to avoid using target features and I'll rework the patch (using FeatureLongNop).

Simon, how do you suggest to choose the optimal nop length in this case?
Use different target features for different max lenghts like this:
FeatureLongNop - allow nops of any length
FeatureLongNop7 - only allow nops smaller than 7 bytes
... etc?

RKSimon added inline comments.May 28 2016, 3:48 PM
lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
351

Probably provide a X86Subtarget::getMaxNopLength() helper function that uses feature bits internally.

aturetsk updated this revision to Diff 59576.Jun 3 2016, 10:11 AM
aturetsk edited edge metadata.

Use subtarget features to determine whether true long nops are supported or not.

I made AsmBackend to use subtarget features to determine whether true long nops are supported or not, as suggested.
Yet to fix the PR we still need additional code, see lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp : 61.
The most part of the patch is adding MCSubtargetInfo as argument for AsmBackend creator functions for all targets, that's why the patch is so big. But I didn't find any way to do it better.

lib/Target/X86/MCTargetDesc/X86AsmBackend.cpp
63
89

Before this patch llvm would use true long nops when CPU is not specified, so I kept the behavior unchanged.

aturetsk retitled this revision from [X86] PR26554: Enable using of true long nops for x86-64 for every CPU to [X86] PR26554: Use not all set of alternative nops in 64 bit mode, but only those which are correct.Jun 3 2016, 10:25 AM
aturetsk edited edge metadata.
bruno added inline comments.
include/llvm/Support/TargetRegistry.h
116

I'm not sure about threading this around would yield any unwanted dependency. @Akira and @Eric, is this a problem?

lib/Target/X86/X86.td
42

These should be two distinct commits/patches, one for FeatureLongNop and another for FeatureFastNop7.

RKSimon added inline comments.Jun 5 2016, 10:51 AM
include/llvm/Support/TargetRegistry.h
116–117

The threading issue is the most important but can't Triple and CPU be gotten from MCSubtargetInfo? I realise this would be even more upheaval requiring the patch to be split further.

aturetsk updated this revision to Diff 59864.Jun 7 2016, 4:39 AM
aturetsk updated this object.

Get Triple and CPU from STI in AsmBackend.

I modified the patch as Simon suggested: AsmBackend now gets Triple and CPU from MCSubtargetInfo.
Note that the CPU from MCSubtargetInfo is a bit different from the CPU we used to pass as argument: if -mcpu isn't specified the CPU used to be "", but now it's "generic". I think that this is actually how it should be, however the change in behavior required to fix a couple of tests by specifying -mcpu option (the tests were expecting long nop generation but "generic" doesn't support them).

I'm planning to split this change-set into five commits into LLVM:

  1. Use MCSubtargetInfo argument instead of CPU and Triple in create*AsmBackend functions
  2. Replace CPU argument with MCSubtargetInfo in all AsmBackend classes in X86AsmBackend.cpp
  3. Introduction of FeatureLongNop
  4. Introduction of FeatureFastNop7
  5. Fix for alternative long nop generation for x86-64

Also there is a small patch to be committed in clang: http://reviews.llvm.org/D21066

echristo requested changes to this revision.Jun 7 2016, 2:31 PM
echristo edited edge metadata.

I don't think that MCAsmBackend is the right place for storing this information (and so the rest of the change is also not OK).

Taking a step back this needs to be looked at from two directions:

a) overall module/.o specific things
b) subtarget specific things

The problem is that the way the patch is written you're using the base subtarget all the time and not taking into account any other ones at the function level. So your nops won't vary depending on which function is requesting which cpu.

What I think we'll want to do is take writeNopData off of MCAsmBackend and work it into something that already has the subtarget and then for global nop data we'll use the default subtarget that comes with the module.

Make sense?

This revision now requires changes to proceed.Jun 7 2016, 2:31 PM