This is an archive of the discontinued LLVM Phabricator instance.

[Object] Recognize ARM64EC binaries in COFFObjectFile::getMachine.
ClosedPublic

Authored by jacek on Apr 24 2023, 12:51 PM.

Details

Summary

ARM64EC/ARM64X binaries use ARM64 or AMD64 machine types, but provide additional CHPE metadata that may be used to distinguish them from pure ARM64/AMD64 binaries.

Diff Detail

Event Timeline

jacek created this revision.Apr 24 2023, 12:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2023, 12:51 PM
jacek requested review of this revision.Apr 24 2023, 12:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 24 2023, 12:51 PM

Is it possible to write tests using yaml2obj instead of lld-link?

Is it possible to write tests using yaml2obj instead of lld-link?

+1 to this if at all possible: the code under test isn't LLD specific (I assume, since it's in the LLVM header area), so sticking the testing in lld seems wrong. It might require some extra work in yaml2obj to get it to support new things, but I don't think that's a bad thing.

lld/test/COFF/arm64ec.test
43 ↗(On Diff #516509)

Nit: delete extra blank line.

jacek updated this revision to Diff 516771.Apr 25 2023, 6:31 AM

Thanks for reviews. The new version uses yaml2obj instead of lld-link. Previous tests will be useful for testing lld-link itself, but I will send it later, probably together with code range generation support.

yaml-based test is a bit less similar to real-world binaries. Linker should align code boundaries to page size, but doing that with yaml SectionData makes it hard to follow. I just used much smaller alignment, it shouldn't matter for disassembler anyway.

CHPE and load config are quite hard to read. I guess we could have some helper syntax for that in yaml2obj, but it's not clear to me how it should look like. Unlike other things like PE headers, load config and CHPE metadata are part of .data/.rdata sections (so it would need to alter SectionData content) and we'd need some way of referencing it (unless we'd still hard-code RVAs). I will investigate if further if you prefer.

I guess we could have some helper syntax for that in yaml2obj, but it's not clear to me how it should look like.

Someone can always use llvm-readobj or dumpbin to help understand the fine details if necessary, so it's not a big deal if it's only a few tests (particularly if they have comments explaining the tricky bits).

We could invent a new kind of "SectionData" that allows structured data, instead of just a string, I guess. We don't want to get too fancy here; even just representing the string as an array of 32-bit integers would be sufficient to annotate various bits using comments.

Adding a mechanism for relocations to represent RVAs seems overkill. yaml2obj is intentionally kept simple.

Just a sidenote; I think @jacek should go get himself commit access at this point.

jacek updated this revision to Diff 536259.Jun 30 2023, 8:18 AM

The new version uses yaml2obj StructuredData from D149439 and D149440.

The new yaml2obj-based test certainly looks reasonable to me. However, it looks like there are three code paths now, and the test only tests one of them. The case where CHPEMetadata is false/nullptr/whatever is possibly covered by other tests(?), but I think you also need to test the other part of the switch you've added? (I don't know anything really about the code area to give any particular guidance about it though).

jacek updated this revision to Diff 536902.Jul 3 2023, 2:29 PM

Yes, existing tests should cover missing CHPE metadata. I added a test for ARM64X.

jhenderson added inline comments.Jul 3 2023, 11:46 PM
llvm/test/tools/llvm-objdump/COFF/arm64ec.yaml
2 ↗(On Diff #536902)

Nits:

  1. Prefer double-dash options for FileCheck (--check-prefix), for consistency with the llvm-readobj command you've added.
  2. DISASM is a more common abbreviation than DISAS, so I suggest using that.
28 ↗(On Diff #536902)

Rather than having two identical(?)-barring-one-field input YAMLs between the two tests, can you use the yaml2obj -D feature to enable sharing of the two? I don't know if that feature is specific to the ELF side or not, but if it is usable, you could have both tests in the same file, and just create two input files from the same input block, which differ only by the Machine type. This would also allow you to minimise the duplication in the FileCheck CHECK patterns.

A good example of what I mean can be seen in this test: https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/file-header-machine-types.test.

jacek updated this revision to Diff 537188.Jul 4 2023, 3:59 PM

Rebased, using a single file and DISASM and double-dash for disassembler output checks.

jhenderson accepted this revision.Jul 5 2023, 1:52 AM

LGTM, with nits.

llvm/test/tools/llvm-objdump/COFF/arm64ec.yaml
1 ↗(On Diff #537188)

Nit: newer tools tests use ## for comments, to distinguish them from lit and FileCheck directives.

5 ↗(On Diff #537188)

I would consider reordering your CHECK/RUN lines for a bit of improved readability/comparability, since you're partly sharing them (I also added some extra whitespace to make things nicely line up):

# RUN: yaml2obj %s -o %t -DMACHINE=IMAGE_FILE_MACHINE_AMD64
# RUN: llvm-objdump -d %t | FileCheck --check-prefixes=DISASM,ARM64EC %s
# RUN: llvm-readobj --coff-load-config %t | FileCheck --check-prefix=CODEMAP %s

# Check that ARM64 image file with CHPE data is recognized as ARM64X.
# RUN: yaml2obj %s -o %t -DMACHINE=IMAGE_FILE_MACHINE_ARM64
# RUN: llvm-objdump -d %t | FileCheck --check-prefixes=DISASM,ARM64X %s
# RUN: llvm-readobj --coff-load-config %t | FileCheck --check-prefix=CODEMAP %s

# ARM64EC: file format coff-arm64ec
# ARM64X:  file format coff-arm64x

# DISASM: ...
This revision is now accepted and ready to land.Jul 5 2023, 1:52 AM

NB: It wouldn't hurt to get an approval from someone more familiar with COFF/ARM64EC etc.

jacek updated this revision to Diff 537334.Jul 5 2023, 6:22 AM
jacek marked 4 inline comments as done.

Reordered and using '##'. Thanks for reviews.

MaskRay accepted this revision.Jul 5 2023, 12:35 PM
mstorsjo accepted this revision.Jul 5 2023, 12:37 PM

LGTM. And I didn't know about the yaml2obj option -D, that's nice!

jhenderson accepted this revision.Jul 6 2023, 12:19 AM

LGTM again.

This revision was landed with ongoing or failed builds.Jul 24 2023, 11:42 AM
This revision was automatically updated to reflect the committed changes.
antmo added a subscriber: antmo.Jul 25 2023, 9:00 AM

Hi, I think this broke lldb-aarch64-windows : https://lab.llvm.org/buildbot/#/builders/219/builds/4366
Could you please take a look at it?

Hi, I think this broke lldb-aarch64-windows : https://lab.llvm.org/buildbot/#/builders/219/builds/4366
Could you please take a look at it?

I think I can see what's going on, sorry about that. Now that COFFObjectFile recognizes ARM64X, it reports that for system DLLs and lldb doesn't know what to do about them. I think that something like the attached patch should help, but I will need a bit to verify.