Page MenuHomePhabricator

[llvm-objdump][AMDGPU] Detect CPU string
ClosedPublic

Authored by rochauha on Jul 24 2020, 5:06 AM.

Details

Summary

AMDGPU ISA isn't backwards compatible and hence -mcpu must always be specified during disassembly.
However, the AMDGPU target CPU is stored in e_flags in the ELF object.

This patch allows targets to implement CPU string detection, and also implements it for AMDGPU by looking at e_flags.

Diff Detail

Event Timeline

rochauha created this revision.Jul 24 2020, 5:06 AM
scott.linder added inline comments.Jul 24 2020, 8:07 AM
llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
10

Would it make sense to have tests at various optimization levels? Generally we seem to prefer writing tests at the default opt level unless the test is explicitly checking something at a certain opt level.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2156

Between the lint and not knowing off-hand what getHeader returns, I'd probably just not use auto here.

I think the full type can be as simple as ELF64LE::Ehdr?

I would also prefer dropping The, unless it is there to avoid colliding with another name?

2160–2194

Can this just be StringRef ArchName = AMDGPUTargetStreamer::getArchNameFromElfMach(e_flags && EF_AMDGPU_MACH); return ArchName.empty() ? None : ArchName?

If there is an issue with making that function visible here I would propose just moving the implementation, it seems like a pretty general thing.

2174

I would sink this under the condition for MCPU.empty() if you only need it in that case.

rochauha updated this revision to Diff 280667.Jul 25 2020, 4:18 AM
rochauha marked 3 inline comments as done.

Some changes based on review by @scott.linder.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
10

I don't have any strong opinion regarding this. The kernel is empty anyways. I chose -O0 with the hope that the binary size would be slightly bigger than otherwise.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2160–2194

From what I understand, TargetStreamer is used to emit binaries. But here we are going the other way round, we need to infer the details from the binary header.

Is it possible to move getArchNameFromElfMach() to lib/Support/TargetParser.*? It would certainly make this patch cleaner. And the code would also be used across AMDGPUTargetStreamer and llvm-objdump.

MaskRay added a comment.EditedJul 26 2020, 10:38 PM

I am a bit worrisome that this patch is adding more target specific stuff to llvm-objdump. Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

If different AMDGPU CPUs are incompatible, should such detection be moved to somewhere in lib/Target/AMDGPU/?

echristo requested changes to this revision.Jul 26 2020, 10:42 PM
echristo added a subscriber: echristo.

I'd really like all of the cpu specific bits here to not reside in the objdump binary. Ideally it should be in include/ and lib/ somewhere.

This revision now requires changes to proceed.Jul 26 2020, 10:42 PM

I am a bit worrisome that this patch is adding more target specific stuff to llvm-objdump.

Agreed, I have looked at making the majority of llvm-objdump (and essentially everything under tools/) into a library at one point, but the biggest initial hurdle is the CommandLine parsing library. I think ideally all of the target-relevant parts would be in target-overridden implementations in some library, all of the llvm-objdump behavior parts (like the core symbol-to-symbol loop) would be in an llvm-objdump library which depends on the libraries with target-specific behaviors, and the tools/ directory would contain ~10 line main functions.

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

If different AMDGPU CPUs are incompatible, should such detection be moved to somewhere in lib/Target/AMDGPU/?

Yes, I think it should be done in a generic way. All other targets which don't need/want it just getting the default (existing) behavior.

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

-mcpu=pwr10 decodes more instructions than -mcpu=pwr9.
-mcpu=pwr9 decodes more instructions than -mcpu=pwr8.

I think GNU objdump just decodes all instructions it recognizes, not expecting a -mcpu option. It behaves as if the default recognized ISA level includes everything.

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

-mcpu=pwr10 decodes more instructions than -mcpu=pwr9.
-mcpu=pwr9 decodes more instructions than -mcpu=pwr8.

I think GNU objdump just decodes all instructions it recognizes, not expecting a -mcpu option. It behaves as if the default recognized ISA level includes everything.

Ah, that makes sense. For AMDGCN your earlier point (and Eric's point in email, which doesn't seem to have made it here) concerning incompatibility applies: we don't have forward or backward compatibility between ISA generations. Essentially any change to mcpu (and some features, IIUC) can make two ISAs fundamentally incompatible. I think we have attempted to approximate a "default" ISA before for things like disassembly and failed.

arsenm added a subscriber: arsenm.Jul 28 2020, 8:15 AM

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

-mcpu=pwr10 decodes more instructions than -mcpu=pwr9.
-mcpu=pwr9 decodes more instructions than -mcpu=pwr8.

I think GNU objdump just decodes all instructions it recognizes, not expecting a -mcpu option. It behaves as if the default recognized ISA level includes everything.

Ah, that makes sense. For AMDGCN your earlier point (and Eric's point in email, which doesn't seem to have made it here) concerning incompatibility applies: we don't have forward or backward compatibility between ISA generations. Essentially any change to mcpu (and some features, IIUC) can make two ISAs fundamentally incompatible. I think we have attempted to approximate a "default" ISA before for things like disassembly and failed.

For example in one minor revision, the interpretation of an opcode was changed so we have to change the name in the disassembly based on the individual feature

I am a bit worrisome that this patch is adding more target specific stuff to llvm-objdump.

Agreed, I have looked at making the majority of llvm-objdump (and essentially everything under tools/) into a library at one point, but the biggest initial hurdle is the CommandLine parsing library. I think ideally all of the target-relevant parts would be in target-overridden implementations in some library, all of the llvm-objdump behavior parts (like the core symbol-to-symbol loop) would be in an llvm-objdump library which depends on the libraries with target-specific behaviors, and the tools/ directory would contain ~10 line main functions.

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

If different AMDGPU CPUs are incompatible, should such detection be moved to somewhere in lib/Target/AMDGPU/?

Yes, I think it should be done in a generic way. All other targets which don't need/want it just getting the default (existing) behavior.

I'd like to point out that this patch correctly detects the CPU string by looking at the binary. The CPU string needs to be determined beforehand and passed on to the SubtargetInfo constructor.

To move the CPU string detection to lib/Target/AMDGPU, the target would also need to look at the binary. I think, one would have to pass ObjectFile * to the MCSubtargetInfo constructor. This allows targets to do the needful in their respective constructors. I'd like to know thoughts/opinions on this.

rochauha retitled this revision from [llvm-objdump][AMDGPU] Detect subtarget to [llvm-objdump][AMDGPU] Detect CPU string.Aug 5 2020, 2:23 AM

I am a bit worrisome that this patch is adding more target specific stuff to llvm-objdump.

Agreed, I have looked at making the majority of llvm-objdump (and essentially everything under tools/) into a library at one point, but the biggest initial hurdle is the CommandLine parsing library. I think ideally all of the target-relevant parts would be in target-overridden implementations in some library, all of the llvm-objdump behavior parts (like the core symbol-to-symbol loop) would be in an llvm-objdump library which depends on the libraries with target-specific behaviors, and the tools/ directory would contain ~10 line main functions.

Can we step back and think whether precise CPU selection is really necessary? For example, PowerPC has the needs for different ISA levels but it simply uses -mcpu=future.

I am a bit lost on what you mean; is there a significant cost to providing precise CPU selection? It doesn't make sense to require the user to e.g. run the dumper to inspect the binary in order to come up with the command-line to run the dumper in order to inspect the binary.

If different AMDGPU CPUs are incompatible, should such detection be moved to somewhere in lib/Target/AMDGPU/?

Yes, I think it should be done in a generic way. All other targets which don't need/want it just getting the default (existing) behavior.

I'd like to point out that this patch correctly detects the CPU string by looking at the binary. The CPU string needs to be determined beforehand and passed on to the SubtargetInfo constructor.

To move the CPU string detection to lib/Target/AMDGPU, the target would also need to look at the binary. I think, one would have to pass ObjectFile * to the MCSubtargetInfo constructor. This allows targets to do the needful in their respective constructors. I'd like to know thoughts/opinions on this.

Maybe lib/Target/* is not the right place for the code then? If the dependency is on Object then where would the logical place for the code to live be?

I think "anywhere but llvm-objdump.cpp" is the concern, but I don't know where the right library is for target-specific logic related to Objects.

To me, it doesn't look like this is 'real' target-specific work. For example the target triple is detected within llvm-objdump itself using the getTarget() function. I'd like to understand - what is different in case CPU string detection that it needs to reside somewhere else?

However, one place that I can think of is the Object/ELFObjectFile.h. There are functions like getFileFormatName() and getArch(). Perhaps might be reasonable to have a getCPUNameIfPossible() too?

This seems to be related to the push of the cpu-id (aka. target-id) into other places as well: e.g., D84822, D80750, probably more.
I mentioned in those reviews already, this doesn't strike me as an "AMDGPU" feature at all.
We should start the discussions around missing functionality instead of adding AMDGPU workarounds in all these places...
One example is an IR module level target cpu and/or target feature, similar to target triple.


Also, the commit doesn't have a message that explains what is going on.

This seems to be related to the push of the cpu-id (aka. target-id) into other places as well: e.g., D84822, D80750, probably more.
I mentioned in those reviews already, this doesn't strike me as an "AMDGPU" feature at all.
We should start the discussions around missing functionality instead of adding AMDGPU workarounds in all these places...
One example is an IR module level target cpu and/or target feature, similar to target triple.


Also, the commit doesn't have a message that explains what is going on.

I agree, but can we at least fix the ICE in llvm-objdump in the short-term in terms of the reality we are currently living in? I didn't look into this case closely before commenting last, but immediately above the diff in disassembleObject there is a call to ObjectFile::getFeatures, which is a pure virtual function implemented in the various concrete object file format implementations with reference to specific targets. For example, ELFObjectFileBase::getFeatures is implemented as:

SubtargetFeatures ELFObjectFileBase::getFeatures() const {
  switch (getEMachine()) {
  case ELF::EM_MIPS:
    return getMIPSFeatures();
  case ELF::EM_ARM:
    return getARMFeatures();
  case ELF::EM_RISCV:
    return getRISCVFeatures();
  default:
    return SubtargetFeatures();
  }
}

Which is just the same code Ronak is adding here, hoisted into the Object library. Again, this is not AMDGPU specific, but somewhere it has to be implemented in terms of AMDGPU (because we define the ABI of our code objects).

I worry that we will just have to live with broken tools for months while we try to fix up core parts of LLVM, and all the while we could make a small, well-defined change consistent with the existing logic in the Object library to get to something useful. The "precedent" for this sort of code in Object is then just https://reviews.llvm.org/D21125. If we come along soon after and implement a vastly superior, internally-consistent model of all of these things and update Object, we will have to have updated e.g. getFeatures anyway, so there is no loss.

rochauha updated this revision to Diff 283818.Aug 6 2020, 11:29 PM
  • Pulled CPU string detection logic into ELFObjectFileBase.
  • Added details in commit message.
scott.linder added inline comments.Aug 7 2020, 2:04 PM
llvm/include/llvm/Object/ObjectFile.h
330

Can you document this, noting that it returns an empty StringRef when no CPU can be detected? I am also curious why you dropped the Optional, it seemed like a nicer interface that made the documentation of the "empty" case part of the code. I suppose in C++ there is no way to elide the extra storage for the Optional, but I would still think it is worth using.

I would also prefer to drop Target from the name, as getArch/getFeatures do not mention {Sub}Target.

llvm/lib/Object/ELFObjectFile.cpp
369

I would prefer to just name this following the style guide, I don't think this is a case where breaking it helps readability.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2174

I don't think we need to even mention "AMDGPU" here, and the bit about an empty string being returned in the "failure" case should just be in the doc-comment on getTargetCPUName; I would instead just indicate the call is fallible by adding "try" somewhere.

All in all, I'd just replace this whole comment with // If MCPU Isn't specified, try to detect it.

2176

I think using the ternary here actually makes this longer/harder to read compared to an if (for example, you need the explicit call to .str() rather than getting the std::string(StringRef) constructor implicitly).

[Don't forget the commit message, also here in the phab review please]

rochauha added inline comments.Aug 8 2020, 12:50 AM
llvm/include/llvm/Object/ObjectFile.h
330

Since getArch / getFeatures were not using Optional, I thought maybe the convention should be maintained. But I do agree that Optional is the more self documenting interface in a way.

rochauha updated this revision to Diff 284125.Aug 8 2020, 12:51 AM
rochauha edited the summary of this revision. (Show Details)Aug 8 2020, 12:53 AM
rochauha edited the summary of this revision. (Show Details)
rochauha marked 3 inline comments as done.Aug 8 2020, 12:56 AM
rochauha added inline comments.
llvm/include/llvm/Object/ObjectFile.h
330

Changed to using Optional.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2174

Done.

Looking good to me! Some small nits, and it would be good to hear back on what the general consensus on adding this to Object is.

llvm/lib/Object/ELFObjectFile.cpp
368

Can you add assert(getEMachine() == ELF::EM_AMDGPU); ?

372

I think there should be an implicit StringRef(const char *) constructor, so you can drop the explicit call here, same for the rest of the returns here.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
11

shouldn't these specify.txt and detect.txt include %t somewhere in their path? I.e. %t-specify.txt and %t-detect.txt?

llvm/tools/llvm-objdump/llvm-objdump.cpp
2176–2177

Small nit, but I think this reads better as:

if (MCPU.empty())
  MCPU = Obj->tryGetCPUName().getValueOr("");

IMO the ternary operator just makes things more complicated, and you aren't exposing the intermediate type of tryGetCPUName anyway if you are using auto, so we don't lose anything.

jhenderson added inline comments.Aug 13 2020, 12:22 AM
llvm/include/llvm/Object/ELFObjectFile.h
91

I know it's not the way the get*Features methods are written, but perhaps this should be a private method?

scott.linder added inline comments.Aug 13 2020, 9:03 AM
llvm/include/llvm/Object/ELFObjectFile.h
91

I was thinking the same, but as you mention the convention here seems to be that these are public. If we think that is just an oversight maybe we can just follow up with a patch to hide all of them?

jhenderson added inline comments.Aug 14 2020, 12:09 AM
llvm/include/llvm/Object/ELFObjectFile.h
91

That would also be fine by me.

rochauha updated this revision to Diff 285817.Aug 14 2020, 11:45 PM
rochauha marked an inline comment as done.
rochauha marked 4 inline comments as done.Aug 14 2020, 11:47 PM
MaskRay added inline comments.Aug 15 2020, 5:40 PM
llvm/tools/llvm-objdump/llvm-objdump.cpp
2176
rochauha updated this revision to Diff 285879.Aug 15 2020, 10:03 PM
  • Removed comment and braces.
rochauha marked an inline comment as done.Aug 15 2020, 10:05 PM
scott.linder accepted this revision.Aug 17 2020, 10:18 AM

LGTM, and can you please open a follow-up review once this is committed to hide all of the Object::get<Target><Stuff> methods?

MaskRay added inline comments.Aug 17 2020, 10:40 AM
llvm/lib/Object/ELFObjectFile.cpp
382

Why is the comment not aligned with case ?

llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
2

In these binary utility directories, we usually place RUN/CHECK lines before the text.

6

In these binary utility directories, we use ;; for non-RUN-non-CHECK comments. The different comment marker makes comments stand out.

rochauha added inline comments.Aug 17 2020, 10:40 PM
llvm/lib/Object/ELFObjectFile.cpp
382

This is the result after running clang-format.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
2

Since the function is pretty small, I thought it would be helpful to have it first.
Do you think the layout should be changed?

6

I looked at a few tests in the binary utility tests but didn't find this pattern. Do you suggest that this should be changed in this test?

LGTM, and can you please open a follow-up review once this is committed to hide all of the Object::get<Target><Stuff> methods?

Yes, I intend to do this.

jhenderson added inline comments.Aug 18 2020, 2:47 AM
llvm/lib/Object/ELFObjectFile.cpp
382

Sounds like a clang-format bug that should be reported? I guess the problem is distinguishing comments for the case lines from those at the end of the previous case. Still, with the blank line before I think it's pretty clear that in this case the comment applies to the case.

llvm/test/tools/llvm-objdump/ELF/AMDGPU/subtarget.ll
6

A lot of the older tests were written before we adopted the double-comment marker style, and haven't been updated. Newer tests should be written with the new style.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 18 2020, 5:14 AM
This revision was automatically updated to reflect the committed changes.

I have manually aligned the comment to the case statement before landing the patch. I think some investigation for clang-format may be necessary.