This is an archive of the discontinued LLVM Phabricator instance.

[Disassembler] Add +all CPU feature to subtargetInfo and use it for disassembling AArch64
AbandonedPublic

Authored by treapster on Jun 14 2022, 6:32 AM.

Details

Summary

It think it's reasonable to enable all supported extensions when we are disassembling something because we don't want to hardcode it in lldb, objdump, bolt and whatever other places use disassembler in llvm. The only reason not to do it I can think of is if there are conflicting instructions with the same encoding in different extensions, but it doesn't seem possible within one archeticture. We probably should use +all on x86 and other platforms? I'm not sure how test for this should look like, because disassembler is used by at least three tools I know of(gdb, objdump, BOLT), so it's not clear where should we put it, and how to make sure that all new instructions are added to it. Waiting for your feedback.

Diff Detail

Event Timeline

treapster created this revision.Jun 14 2022, 6:32 AM
Herald added a reviewer: MaskRay. · View Herald Transcript
Herald added a reviewer: rafauler. · View Herald Transcript
Herald added a reviewer: Amir. · View Herald Transcript
Herald added a reviewer: maksfb. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
treapster requested review of this revision.Jun 14 2022, 6:32 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 14 2022, 6:32 AM
treapster edited the summary of this revision. (Show Details)Jun 14 2022, 7:51 AM
treapster added a comment.EditedJun 14 2022, 7:53 AM

Well, it broke a lot of tests that were checking that instructions are not disassembled without explicitly specified extensions, but i don't quite get the point of such tests. If new instructions were added by default in lldb, why can't we do the same with objdump? Why assert that something is not working and thus prevent adding support for it later?

treapster edited the summary of this revision. (Show Details)Jun 14 2022, 7:54 AM
DavidSpickett added subscribers: labrinea, DavidSpickett.

conflicting instructions with the same encoding in different extensions, but it doesn't seem possible within one archeticture

I seem to remember some system register names changing between architecture *profiles* not version. @labrinea may know more about that.

Well, it broke a lot of tests that were checking that instructions are not disassembled without explicitly specified extensions, but i don't quite get the point of such tests.

Without looking at them in detail I assume they are checking that the proper extension name is printed when it says "requires bla" and that each instruction is properly tied to the correct feature name. This is for when you are assembling as opposed to disassembling.

I agree that objdump should have some kind of "max" setting, even a default. However there will still need to be a way to test what I referred to.

What might be possible is to write the testing such that it doesn't use llvm-objdump. I haven't seen the structure of the tests perhaps this would mean duplicating tests all over the place.

Now I think about it, tests could do something like "-mattr=-all" to remove it, right?

I like the idea of all for llvm-objdump, but it should be a separate change, with -mattr=-all tests.

pirama added a subscriber: pirama.Jun 14 2022, 5:08 PM

We had some internal Google folks hit this exact issue recently. I don't think that the same "default" CPU should be used for debugging tools like llvm-objdump, and that is the crux of the matter here. Perhaps updating the test to specify "generic" as the CPU when passed to llvm-objdump for those tests is the right pattern? Then the default for disassembly should be to have all attributes/features enabled unless the developer specifies on the command line that they want a different target.

I agree that objdump should have some kind of "max" setting, even a default. However there will still need to be a way to test what I referred to.

As you said, instruction requires bla is part of assembler, not disassembler, so it is not affected by the patch. The problem here is that tests do CHECK-UNKNOWN: 83 a9 91 c0 <unknown> which basically checks that objdump cannot disassemble instruction that was assembled in the very same test. If we change that, we can make +all default attribute for objdump and it will disassemble everything the same way GNU objdump and lldb do.

What might be possible is to write the testing such that it doesn't use llvm-objdump. I haven't seen the structure of the tests perhaps this would mean duplicating tests all over the place.

Why not use objdump? If we only check that instructions get disassembled, it will work fine.

For now i'd like to hear more about possible clashing instruction encodings from @labrinea, and if it's not a concern, i think we can fix the tests if we remove CHECK-UNKNOWN: and only call objdump with CHECK-INST. What do you think?

Why not use objdump? If we only check that instructions get disassembled, it will work fine.

I just remember some tests using llvm-mc to disassemble and some using objdump. Istr that using llvm-mc for assemble and disassemble usually meant having two test files instead of one. Anyway, not important.

objdump with a generic cpu or with the all feature removed sgtm.

treapster abandoned this revision.Jul 5 2022, 5:46 AM

Abandoning because @MaskRay already implemented it in D128029 and D128030. Thanks for your time, @MaskRay!