This is an archive of the discontinued LLVM Phabricator instance.

[JITLink] Initial AArch32 backend
ClosedPublic

Authored by sgraenitz on Feb 15 2023, 2:02 AM.

Details

Summary

This first version lays the foundations for AArch32 support in JITLink. ELFLinkGraphBuilder_aarch32 processes REL-type relocations and populates LinkGraphs from ELF object files for both big- and little-endian systems. The ArmCfg member controls subarchitecture-specific details throughout the linking process (i.e. it's passed to ELFJITLinker_aarch32).

Relocation types follow the ABI documentation's division into classes: Data (endian-sensitive), Arm (32-bit little-endian) and Thumb (2x 16-bit little-endian, "Thumb32" in the docs). The implementation of instruction encoding/decoding for relocation resolution is implemented symmetrically and is testable in isolation (see AArch32 category in JITLinkTests).

Callable Thumb functions are marked with a ThumbSymbol target-flag and stored in the LinkGraph with their real addresses. The thumb-bit is added back in when the owning JITDylib requests the address for such a symbol.

The StubsManager can generate (absolute) Thumb-state stubs for branch range extensions on v7+ targets. Proper GOT/PLT handling is not yet implemented.

This patch is based on the backend implementation in ez-clang and has just enough functionality to model the infrastructure and link a Thumb function main() that calls printf() to dump "Hello Arm!" on Armv7a. It was tested on Raspberry Pi with 32-bit Raspbian OS.

Diff Detail

Event Timeline

sgraenitz created this revision.Feb 15 2023, 2:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 2:02 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
sgraenitz requested review of this revision.Feb 15 2023, 2:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 2:02 AM

In order to try it yourself, build and run the llvm-jitlink-executor tool on a compatible Pi -- for instructions see: https://github.com/echtzeit-dev/llvm-jitlink-executor

On your host machine, run the test from this review and remote JIT-link the output object to your Pi with the llvm-jitlink tool:

> ninja llvm-jitlink llvm-mc llvm-config FileCheck count not
> bin/llvm-lit --filter=ELF_thumb -a test
...
> bin/llvm-jitlink --oop-executor-connect=192.168.1.105:20000 test/ExecutionEngine/JITLink/arm/Output/ELF_thumbv7_printf.s.tmp.o

With the extra output from the above fork, you should get something like this on the executor side:

> ./bin/llvm-jitlink-executor listen=0.0.0.0:20000
Listening at 0.0.0.0:20000
Connected! Starting SimpleRemoteEPCServer.
Bootstrap symbols:
  __llvm_orc_bootstrap_deregister_ehframe_section_wrapper: 0x00101ff0
  __llvm_orc_bootstrap_run_as_main_wrapper: 0x0011cb04
  __llvm_orc_bootstrap_run_as_void_function_wrapper: 0x0011cb4c
  __llvm_orc_bootstrap_mem_write_uint32s_wrapper: 0x0011ca2c
  __llvm_orc_bootstrap_register_ehframe_section_wrapper: 0x00101f34
  __llvm_orc_bootstrap_mem_write_uint16s_wrapper: 0x0011c9e4
  __llvm_orc_bootstrap_mem_write_uint8s_wrapper: 0x0011c99c
  __llvm_orc_bootstrap_mem_write_uint64s_wrapper: 0x0011ca74
  __llvm_orc_bootstrap_mem_write_buffers_wrapper: 0x0011cabc
  __llvm_orc_bootstrap_run_as_int_function_wrapper: 0x0011cb94
Calling int main(0, 0x758005f8) @0x76fc0001
Hello arm!
main @0x76fc0001 returned: 0
^C
sgraenitz added inline comments.Feb 15 2023, 2:12 AM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
473

@lhames Ideas how to get this cleaner?

I've taken a look from the Arm side, for initial support I think a lot of the comments don't need attention, but may be worth considering for future revisions. While what's here looks similar to other JITLink backends, I don't have any experience with it so I'll need to defer to someone else more familiar with its code base. Very happy to look at the Arm specifics though.

I must confess that I haven't double checked all the bit-patterns for the relocations against the Arm ARM, the bits that I can easily cross reference from LLD look like they match.

llvm/include/llvm/ExecutionEngine/JITLink/arm.h
33

Strictly speaking that would be a Data relocation (https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#5613static-data-relocations) rather than an Arm state relocation.

Not sure if you are planning Arm state instruction support, may be worth making the distinction if you are.

59

Not a strong opinion, but you may wish to use stubs throughout the implementation as I think JITLink seems to use stubs.

Veneers, Stubs and Thunks are all synonyms and although the Arm ABI calls them veneers GNU ld and LLD stick to the linker's choice.

108

IIUC this might be undefined behaviour for a 32-bit type if Hi is 31 as we'll end up with (1UL << 32) -1. Did you mean to make Val uint64_t?

129

I guess we are only supporting little-endian for now (I don't blame you!) while for v7/v8 all instructions will always be little-endian the data can be big-endian on a big-endian system.

140

It looks like this currently only handles Thumb destinations, i.e. it doesn't write a BLX instruction to Arm state if the TargetAddress has the bottom bit set.

Is this because all Arm/Thumb state transitions are veneered so we never get an opportunity to use BLX?

211

The convention for the other targets seems to be "S__STUBS" I know armlink Arm's proprietary linker uses Veneer$$Code, but may be worth keeping in sync with other JITLink targets.

218

Is this indirecting all jumps and calls through a veneer? One thing a static linker needs to be careful about is functions that do not have type STT_FUNC (https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#553symbol-values). In this case the bottom bit is alway expected to be 0, even for Thumb Targets. The current veneer would attempt to switch to Arm state for a relocation with bit 0 clear.

Non STT_FUNC symbols are expected to represent labels inside a function so they are always the same state.

It maybe that the JITLink doesn't encounter non STT_FUNC symbols at link time as it should be possible to resolve branches to internal labels at fixup time without having to expose them to the linker.

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
177

v7 without the profile attribute includes v7-m, v7-a and v7-r. If this is a problem then I recommend checking Tag_CPU_arch_profile (https://github.com/ARM-software/abi-aa/blob/main/addenda32/addenda32.rst#335target-related-attributes) and checking for A profile.

In terms of instructions v7, v7E_M, v8_M_Main, and v8-R will be compatible, although they are not likely targets for JITLink.

v9_A is also a superset of v8_A so you could include support that with no extra cost.

Very cool. Thank you for contributing this -- it's great to finally see ARM32 support in JITLink! :)

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
473

I wonder if any other architectures use the low bits of addresses for architecture specific flags? Maybe we could come up with something more general, like a triple-specific 'getRawAddressForArithmetic` function that we could use in common code.

At a minimum we could make the test in this assert conditional on the graph having a thumb triple -- that way we wouldn't affect other platforms.

llvm/include/llvm/ExecutionEngine/JITLink/arm.h
97–105

I think we should spell out the relocations with bias 8 and have the default case be an llvm_unreachable, just to be safe.

238

As an aside to this review -- I find myself doing this a lot too. I wonder if we should add some ArrayRef<char> <-> ArrayRef<uint8_t> conversion helpers in ADT?

I've also gone back and forward on whether block content should be ArrayRef<char> or ArrayRef<uint8_t>. From memory they were originally uint8_t-based and I moved them to char at some point in the hope of reducing friction, but I'm not sure it was very successful in the end. Neither type seems to be a perfect fit.

llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
113 ↗(On Diff #497595)

Is there an ELF attribute or type that we could be looking at, rather than the name?

(I'm very guilty of using section names rather than attributes in checks, but as JITLink evolves I do want us to use attributes and types more where possible)

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
180–197

Should this just be the default case, to future-proof against new ArchKinds? (Not sure whether we're expecting any at this point?)

lhames added inline comments.Feb 15 2023, 11:09 AM
llvm/include/llvm/ExecutionEngine/JITLink/arm.h
33

Are you suggesting a different prefix for the name?

Other architectures have all stuck to Delta<bitwidth>, so unless there's some reason to distinguish this as "arm" I think it would make sense to rename to Delta32.

peter.smith added inline comments.Feb 15 2023, 12:00 PM
llvm/include/llvm/ExecutionEngine/JITLink/arm.h
33

Not the Delta name, the FirstArmRelocation name. For example FirstDataRelocation. I don't think it is hugely important though.

sgraenitz added a comment.EditedFeb 16 2023, 11:41 AM

Thanks for your rich feedback! It points out a number of question that I didn't think about before. I am working on a follow-up and I hope it's ready soon :)

sgraenitz updated this revision to Diff 499974.Feb 23 2023, 2:01 PM
sgraenitz marked 10 inline comments as done.

Revisit design and address feedback

sgraenitz added a comment.EditedFeb 23 2023, 2:04 PM

The revised patch makes a lot of changes (sorry for that), but I think it's for the best. Due to its multitude of sub architectures, I think this target has the potential to grow very big. It would be great to start off with a design that supports it. Let me try and summarize the major changes.

Introduced readAddend() analog to applyFixup(). It decodes the initial addend from a REL relocation from it's fixup location and gets called from addSingleRelRelocation(). There should be no functional difference to RELA relocations.

Out-lined actual implementations from readAddend() and applyFixup() (per relocation class Data/Arm/Thumb) into aarch32.cpp. This allows to move a lot of the supporting machinery into the cpp as well. I am aware of the JITLink convention to have everything in this header in order to inline it, but I don't think it scales enough in this case. Where inlining is a must-have, please consider using LTO/Thin-LTO.

Introduced symmetric encode/decode functions for sets of fixup formats. AArch32 fixups involve complex combinations of shifts and masking. Having this symmetric and side-by-side hopefully aids comprehension and bugfixing. E.g. encodeImmBT4BlT1BlxT2_J1J2(int64_t Value) is used for Thumb_Call and Thumb_Jump24. It returns the bits that represent the immediate value for instruction formats B-T4, BL-T1 and BLX-T2 when J1/J2 branch range extensions are available. It encodes the plain operation and doesn't check for errors (responsibility of the caller). The corresponding decode function is supposed to do the exact inverse operation.

In order to use these encode/decode functions, we need to mask out preexisting bits from the immediate field(s). That's the job of the freestanding writeImmediate() function. It infers the respective mask from a FixupInfo<EdgeKind> struct that collects all constants for a particular relocation type. writeRegister(), checkRegister() and checkOpcode() follow the same approach.

LLD doesn't do this effort. It mostly just overwrites the entire instruction (including opcode, register field, etc.) with default bits. I think for JITLink it makes a lot of sense to have these tools, because we want to be able to implement relaxation and (who knows?) re-optimization at some point. Another benefit is testing. I added AArch32Relocations.Thumb_Call_J1J2 as a first example in JITLinkTests.

BTW Thanks to @kristof.beyls for your convincing arguments for renaming this to AArch32!

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
473

This is not addressed yet. Will do next!

llvm/include/llvm/ExecutionEngine/JITLink/arm.h
33

Good point. In my new patch the groups mimic relocation Classes!

59

In my new patch I turn around the naming approach: For names in actual code, I stick to the linker conventions as you propose ("stubs" in this case) and in comments I note the ARM ABI name ("veneers" in this case). That's the better fit indeed.

97–105

This was copy/paste and disappeared in the new patch.

108

Oh clearly this should have been 1ULL! Good catch. In my new patch I moved away from the extractBits() helper and stick with plain shifts and masking. It really aids symmetry in encode/decode!

129

Yes, this didn't consider big endian.

for v7/v8 all instructions will always be little-endian the data can be big-endian on a big-endian system

Thanks that's good to know. My new patch accounts for it in theory, but I have no device to test it in practice.

@peter.smith How is this different for pre-v7 CPUs? Thinking about v6m in particular? (Are there even big-endian variants?)

140

Yes, this did only implement BL-T1 to call Thumb code and assumed the LSB to be set. It didn't check whether LSB is clear and switch the instruction to BLX-T2 in order to call ARM code. Let me follow up on interworking veneers later.

211

I was a bit proud that I managed to reverse engineer that :) but you are right.

218

One more thing I will have to take a closer look later on. Not sure we are even adding such symbols to the link-graph yet :)

Non STT_FUNC symbols are expected to represent labels inside a function so they are always the same state.

Thanks for this note! It didn't make sense to me until now.

It maybe that the JITLink doesn't encounter non STT_FUNC symbols at link time as it should be possible to resolve branches to internal labels at fixup time without having to expose them to the linker.

Ok, I will try and keep that in mind.

Is this indirecting all jumps and calls through a veneer?

Yes, the convention in JITLink is to first build stubs for all edges and allow to relax them later on, e.g. in an optimization pass.

238

I'd be in favor of uint8_t for all binary memory, because it avoids this ancient weird thing that signedness of char in C is undefined and every platform/compiler has their own interpretation what that means. The good thing is that it's always in-line with their string literals and that's what I think char is best used for (exclusively).

Hex bytes and char always make trouble at some point for some people, like this: https://github.com/llvm/llvm-project/commit/417b1164c28ef526cfe3ccab70d22598b7c63624

I must have missed the uint8_t era, JITLink used char as long as I can remember. I'd vote for changing it, but it's probably a lot of work and may cause a long tail of tricky issues. Introducing better handling only supports the status quo. Not sure that's ideal.

For now my workaround is to initialize with uint8_t and reinterpret_cast<const char *>, yes.

sgraenitz retitled this revision from [JITLink] Initial 32-bit ARM backend to [JITLink] Initial AArch32 backend.Feb 23 2023, 2:05 PM
sgraenitz updated this revision to Diff 499977.Feb 23 2023, 2:15 PM

Amend all in one commit and try again

sgraenitz marked an inline comment as not done.Feb 23 2023, 2:16 PM
sgraenitz added inline comments.
llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
113 ↗(On Diff #497595)

Yes this was a quick one. Let me check if I find something better. This is still open.

sgraenitz updated this revision to Diff 500150.Feb 24 2023, 5:10 AM
sgraenitz marked 2 inline comments as done.

Polish and finish renaming Arm -> AArch32

sgraenitz updated this revision to Diff 500194.Feb 24 2023, 7:10 AM

More comments and minor fixes

sgraenitz added inline comments.Feb 24 2023, 7:12 AM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
473

I split this out into a separate review: D144714

sgraenitz updated this revision to Diff 500195.Feb 24 2023, 7:17 AM
sgraenitz marked an inline comment as done.

Match by section info instead of name in excludeSection()

@lhames I think all your feedback should be addressed now.

llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
113 ↗(On Diff #497595)

In a nutshell:

  Creating graph sections:
    ...
    4: Skipping section ".ARM.exidx" explicitly
    5: ".rel.ARM.exidx" is not an SHF_ALLOC section: No graph section will be created.
  ...
Processing relocations:
  .ARM.exidx:
    skipped (fixup section excluded explicitly)
llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
108

Hope it's fine to keep this around for a bit.

Thanks for the update. Will try and take a look next week. Ran out of time today.

tschuett added inline comments.
llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
172

Why do you wrap Èxpected with an optional?

sgraenitz updated this revision to Diff 500743.Feb 27 2023, 5:23 AM
sgraenitz marked an inline comment as done.

Make valid edge kinds a pre-condition for readAddend()/applyFixup() and revisit conversion of ELF relocation type <--> JITLink edge kind

sgraenitz added inline comments.Feb 27 2023, 5:24 AM
llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
172

Hi Thorsten, thanks for your note! The idea was that each of the readAddendX() functions can fall through to the error handling at the end of readAddend(). std::nullopt indicated that it didn't handle the case, while Expected alone would either be a llvm::Error or a int64_t.

However, it made me think again about the kind of error we face when one of the applyFixupX() or readAddendX() fails: An error for unsupported edge kinds is raised when populating the link-graph in a much earlier link phase. This is an error in the input object and thus propagated as llvm::Error. Here, at fixup-time, we should assume that all edge kinds are valid. If we still encounter an invalid one, this is an implementation error and should trigger an assertion failure/llvm_unreachable.

In my new patch, I implemented this AND dropped the optionals.

sgraenitz updated this revision to Diff 500766.Feb 27 2023, 6:35 AM

Drop unused includes

sgraenitz updated this revision to Diff 500769.Feb 27 2023, 6:39 AM

Polish unit tests and add one for encoding MovtT1MovwT3

sgraenitz marked an inline comment as done.Feb 27 2023, 8:20 AM
sgraenitz added inline comments.
llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
224

v7 without the profile attribute includes v7-m, v7-a and v7-r. If this is a problem then I recommend checking Tag_CPU_arch_profile

Good to know! For now I added a note here. I guess v7-a and v8-a will be the first with good support upstream.

A few small comments concentrating on the Arm specifics.

llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
97

At the moment this patch supports Arm v7 and V8. The J1J2 branch encoding is present in all v7 and v8. A rough rule of Thumb is All Cortex cores v6-m v7, v8 etc. and the Arm1156t2-S (Thumb-2's introduction). If older CPUs that don't support Thumb-2 are quite far down the roadmap it may be worth simplifying the patch by not including this.

This is just a suggestion, I can see that it is unit tested so no objections to leaving it in.

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
129

When -ffunction-sections is used clang will use .ARM.exidx.function_name and .ARM.extab.function_name . Not sure whether that will be the case for JITLink. If it is then it may be worth having these as prefix matches rather than a full match.

130

This will hold for .ARM.exidx sections, but not for .ARM.extab. The .ARM.extab sections are only ever referenced from .ARM.exidx sections (when the number of unwind instructions is too big to inline into 4-bytes they need to go into a separate section.

.text <- R_ARM_PREL31 (SHF_LINK_ORDER) .ARM.exidx -> R_ARM_PREL31 .ARM.extab

The .ARM.exidx sections have a specific ELF type SHT_ARM_EXIDX (https://github.com/ARM-software/abi-aa/blob/main/aaelf32/aaelf32.rst#532section-types) but .ARM.extab does not.

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
245

Looking at the stub generation code, if there is a Thumb_Call to an external edge, then a stub with LoBitH will be set will be created. This won't be a problem if the code-generator never generates BLX to an external symbol, but would be if it does.

262

Is there a reason why this isn't

LLVM_LIKELY(ArmCfg.J1J2BranchEncoding)
               ? decodeImmBT4BlT1BlxT2_J1J2(R.Hi, R.Lo)
               : decodeImmBT4BlT1BlxT2(R.Hi, R.Lo);

The J1J2 would apply here in the same way as Thumb_Call.

llvm/unittests/ExecutionEngine/JITLink/AArch32Tests.cpp
72

Big-endian for v7 and v8 (and v6 unless in legacy backwards compatible mode be32) have little-endian instructions and big-endian data. If this is just for instructions you probably don't need the template parameter.

You'll still see big-endian instructions in ELF relocatable objects, a be8 supporting linker is expected to endian-reverse instructions for the executable.

Thanks for your notes! I followed up on a few and will try to post a patch towards the end of the week.

llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
97

Ok, I see. So far I used this to look how passing through an extra config will work out, since the other backends don't do it and here it looks like we need at least some kind of config. I will remove the unused fields.

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
130

Ah right, I only checked the flags, but I should use sh_type thanks.

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
245

Yes, I didn't catch a case like this yet. But I am also not sure I fully understand: If codegen wrote a BLX, then it would be T2 right? The docs say LoBitH must be empty here:

if CurrentInstrSet() == InstrSet_ThumbEE || H == '1' then UNDEFINED;

What is the stub generation code you mean? The one here in JITLink?

262

My understanding was that CPUs without J1J2 didn't have R_ARM_THM_JUMP24 yet, so it wouldn't every happen? (Maybe that would be worth an assertion.)

llvm/unittests/ExecutionEngine/JITLink/AArch32Tests.cpp
72

Somehow I wanted to make this explicit. It's probably worth a comment.

peter.smith added inline comments.Feb 28 2023, 1:39 AM
llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
245

Yes, the code in visitEdge, quoted here. If there is a BLX instruction with a Thumb_Call relocation where the target isn't defined, it looks like a stub will be created.

bool visitEdge(LinkGraph &G, Block *B, Edge &E) {
    if (E.getTarget().isDefined())
      return false;

    switch (E.getKind()) {
    case Thumb_Call:
    case Thumb_Jump24: {
      DEBUG_WITH_TYPE("jitlink", {
        dbgs() << "  Fixing " << G.getEdgeKindName(E.getKind()) << " edge at "
               << B->getFixupAddress(E) << " (" << B->getAddress() << " + "
               << formatv("{0:x}", E.getOffset()) << ")\n";
      });
      E.setTarget(this->getEntryForTarget(G, E.getTarget()));
      return true;
    }
    }
    return false;
  }

I would not expect a code-generator to generate a BLX in this situation as it has no idea whether the target is Arm or Thumb so I would expect a BL. It is possible to write in assembly language but I don't know how well that needs to be supported in a JIT.

In static linkers an R_ARM_CALL (Arm state) or R_ARM_THM_CALL (Thumb state) relocation can be associated with a BL or BLX. As the static linker always knows the (Arm/Thumb) state of the destination it can write a BL instruction if both source and target are the same state or a BLX if they are different. This property permits code-generators to only use BL with R_ARM_CALL/R_ARM_THM_CALL, with BLX reserved for assembler. It also permits Thumb code to reuse Arm stubs.

I don't know how often this would happen in a JIT though.

262

R_ARM_THM_JUMP24 is used for the wide unconditional branch regardless of whether J1J2 encoding is available. It was introduced when the BLX instruction was made available. BL <immediate> can be converted into BLX <immediate> but B.w <immediate> cannot be converted in BX <immediate> as BX <immediate> doesn't exist (only BX <register>).

In summary R_ARM_THM_CALL is for BL and BLX instructions, R_ARM_THM_JUMP24 is for B.w instructions. The J1J2 encoding applies to both BL, BLX and B.w.

sgraenitz updated this revision to Diff 502411.Mar 5 2023, 2:45 AM
sgraenitz marked an inline comment as done.

Ignore .ARM.exidx sections by checking for type ELF::SHT_ARM_EXIDX

sgraenitz updated this revision to Diff 502412.Mar 5 2023, 2:47 AM
sgraenitz marked an inline comment as done.

Drop unused flags in ArmConfig

sgraenitz updated this revision to Diff 502413.Mar 5 2023, 2:51 AM
sgraenitz marked 3 inline comments as done.

Drop LoBitH check when reading initial addend for Thumb_Call

sgraenitz updated this revision to Diff 502414.Mar 5 2023, 2:53 AM
sgraenitz marked 2 inline comments as done.

Support J1J2BranchEncoding for Thumb_Jump24

sgraenitz updated this revision to Diff 502415.Mar 5 2023, 2:55 AM
sgraenitz marked an inline comment as done.

Add comment on instruction endianness for makeHalfWords() in AArch32Tests

sgraenitz updated this revision to Diff 502416.Mar 5 2023, 2:58 AM

Move range checks behind BL-BLX conversion for Thumb_Call

sgraenitz updated this revision to Diff 502417.Mar 5 2023, 3:03 AM

Support J1J2BranchEncoding for Thumb_Jump24 -- also in readAddend()

sgraenitz updated this revision to Diff 502418.Mar 5 2023, 3:06 AM

Raise runtime errors for unsupported edge kinds in subroutines of readAddend() and applyFixup() (instead of asserting)

Just one small comment about when traversing relocations from .ARM.exidx sections get implemented.

I'm out of comments from the Arm side. Good luck with the JITLink specific changes. I'm happy to look at the Arm side of any future updates or new changes.

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
119

I know this is a FIXME at the moment so nothing needs to be changed. For when it does get implemented, only relocations with an offset of 4 (modulo 8) should be traversed. An .ARM.exidx table entry is 8-bytes in size, made up of 2 4-byte entries:

R_ARM_PREL31 to function this is a table foreither inline unwind instructions or a R_ARM_PREL31 to a .ARM.extab section containing unwind instructions

The first word at offset 0 (modulo 8) is used to construct a table of offsets to all the functions with table entries. I expect that we would want to avoid following relocations in this context.

sgraenitz updated this revision to Diff 503310.Mar 8 2023, 4:04 AM
sgraenitz marked an inline comment as done.

Add more comprehensive TODO comment for relocations in .ARM.exidx section

Thanks @peter.smith for your help, it's very valuable! I appreciate your offer to help with future reviews as well.

I am planning to land this before end of March. Right now, it's waiting for D144714. In the meantime, if anyone has more feedback, please go ahead.

lhames added inline comments.Mar 11 2023, 3:38 PM
llvm/include/llvm/ExecutionEngine/JITLink/aarch32.h
232

We use $__STUBS elsewhere, rather than S__STUBS.

As a side note: I threw a $ in here to make a clash less likely, but I don't think it's actually an illegal symbol character in ELF or COFF so we should probably pick some universal prefix going forward. Maybe just __llvm_jitlink?

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
78–89

This should return the name of the JITLink edge enum, rather than the ELF relocation. That's not a blocker for landing though.

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
134

Could this be two types -- ThumbRelocation and MutableThumbRelocation, with ThumbRelocations constructible from MutableThumbRelocations?

245

We definitely want to support this eventually, and the JIT linker should have all the information that it needs to do so (though it's possible that we'll need to fix some plumbing in the symbol tables to transfer the thumb bit through).

I don't think we need this in the initial commit though.

sgraenitz updated this revision to Diff 504554.Mar 13 2023, 2:21 AM
sgraenitz marked an inline comment as done.

Make two distinct types for ThumbRelocation and WritableThumbRelocation

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
78–89

Do we need the name of the internal edge-kind? I moved the function here from aarch32.h to allow format-specific output. (I still have to see how it works out with intermediate edge kinds.)

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp
134

Yes, it's cleaner like that and it could be structs not classes

dkurt removed a subscriber: dkurt.Mar 13 2023, 12:47 PM
sgraenitz updated this revision to Diff 507405.Mar 22 2023, 9:45 AM
sgraenitz marked an inline comment as done.

Rename stubs section SSTUBS -> llvm_jitlink_STUBS

sgraenitz updated this revision to Diff 507407.Mar 22 2023, 9:48 AM

Extend test to check symbol address type arm/thumb

sgraenitz updated this revision to Diff 507411.Mar 22 2023, 9:54 AM

Remove thumb bit from LinkGraph addresses and store it as target flag in jitlink::Symbol

sgraenitz updated this revision to Diff 507481.Mar 22 2023, 1:16 PM

Rebase and actually run the test in a 32-bit address range

sgraenitz updated this revision to Diff 507484.Mar 22 2023, 1:21 PM

Fix test warning: missing parameter -slab-page-size

sgraenitz updated this revision to Diff 507505.Mar 22 2023, 2:15 PM

In the test, inject an absolute symbol address for printf

I successfully rebased the patch to current mainline and check-llvm is happy. From my side this is ready to land.

lhames accepted this revision.Mar 22 2023, 2:55 PM

Relocation names should be generic (at least for getEdgeKindName), but otherwise LGTM!

Thank you very much for this Stefan: it's a huge contribution, with very neat solutions to hard problems, and a big step forward for LLVM JIT support!

llvm/lib/ExecutionEngine/JITLink/ELF_aarch32.cpp
78–89

The purpose of getEdgeKindName is definitely to name the Edge::Kind enum values (to make it easy to look them up in the JITLink docs and applyRelocation), but I think it makes sense to add support for a reverse mapping to the original relocation names too. We should introduce a new function for this, but that can be done post-commit.

This revision is now accepted and ready to land.Mar 22 2023, 2:55 PM
sgraenitz updated this revision to Diff 507524.Mar 22 2023, 3:27 PM

Move getEdgeKindName() back to aarch32.h and let it return the JITLink-internal name for consistency.
Instead, feed getELFAArch32EdgeKindName() into the LinkGraph. For the moment the reverse-mapping getELFRelocationType() is only used in tests.

Thanks to everyone for their valuable input. I think this has turned out really well. It's in a good shape now to start iterating.

sgraenitz marked 2 inline comments as done.Mar 22 2023, 3:36 PM
sgraenitz edited the summary of this revision. (Show Details)Mar 23 2023, 3:09 AM

Proper GOT/PLT handling is not yet implemented.

Thank you! Just in time to avoid some confusion for future readers! :)

sgraenitz edited the summary of this revision. (Show Details)Mar 23 2023, 3:24 AM
This revision was landed with ongoing or failed builds.Mar 23 2023, 3:26 AM
This revision was automatically updated to reflect the committed changes.
gulfem added a subscriber: gulfem.Mar 23 2023, 12:05 PM

We started seeing a segmentation fault while running ExecutionEngine tests on Mac.

******************** TEST 'LLVM :: ExecutionEngine/MCJIT/2003-01-04-ArgumentBug.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /opt/s/w/ir/x/w/staging/llvm_build/bin/lli -jit-kind=mcjit /opt/s/w/ir/x/w/llvm-llvm-project/llvm/test/ExecutionEngine/MCJIT/2003-01-04-ArgumentBug.ll > /dev/null
: 'RUN: at line 2';   /opt/s/w/ir/x/w/staging/llvm_build/bin/lli /opt/s/w/ir/x/w/llvm-llvm-project/llvm/test/ExecutionEngine/MCJIT/2003-01-04-ArgumentBug.ll > /dev/null
--
Exit Code: 139

Command Output (stderr):
--
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: /opt/s/w/ir/x/w/staging/llvm_build/bin/lli /opt/s/w/ir/x/w/llvm-llvm-project/llvm/test/ExecutionEngine/MCJIT/2003-01-04-ArgumentBug.ll
 #0 0x000000010d026c4b llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x100ebfc4b)
 #1 0x000000010d024ac9 llvm::sys::RunSignalHandlers() (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x100ebdac9)
 #2 0x000000010d0272de SignalHandler(int) (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x100ec02de)
 #3 0x00007fff20919d7d (/usr/lib/system/libsystem_platform.dylib+0x7fff20353d7d)
 #4 0x0000000000000000 
 #5 0x000000010c1a3344 llvm::orc::LLJIT::deinitialize(llvm::orc::JITDylib&) (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x10003c344)
 #6 0x000000010c1a0beb runOrcJIT(char const*) (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x100039beb)
 #7 0x000000010c19bb5e main (/opt/s/w/ir/x/w/staging/llvm_build/bin/lli+0x100034b5e)
 #8 0x00007fff208eff3d (/usr/lib/system/libdyld.dylib+0x7fff20329f3d)
 #9 0x0000000000000002 
/opt/s/w/ir/x/w/staging/llvm_build/test/ExecutionEngine/MCJIT/Output/2003-01-04-ArgumentBug.ll.script: line 2: 11096 Segmentation fault: 11  /opt/s/w/ir/x/w/staging/llvm_build/bin/lli /opt/s/w/ir/x/w/llvm-llvm-project/llvm/test/ExecutionEngine/MCJIT/2003-01-04-ArgumentBug.ll > /dev/null

https://luci-milo.appspot.com/ui/p/fuchsia/builders/toolchain.ci/clang-mac-x64/b8785839382041226465/overview

Interesting, thanks for the report. I will have a look tomorrow.

This was due to my change to use a target flag to represent the thumb bit. Obviously, these are target-specific and ObjectLinkingLayer has to account for that. Relanding now with the fix.

We are seeing a build error on Windows building with MSVC. Attempt to access a private typedef.

[3510/4703] Building CXX object lib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\ELF_aarch32.cpp.obj
FAILED: lib/ExecutionEngine/JITLink/CMakeFiles/LLVMJITLink.dir/ELF_aarch32.cpp.obj
C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1428~1.299\bin\HostX64\x64\cl.exe  /nologo /TP -DGTEST_HAS_RTTI=0 -DSCE_TARGET_EAP=1 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_LIBCPP_ENABLE_ASSERTIONS -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__SCE__ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib\ExecutionEngine\JITLink -IC:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink -Iinclude -IC:\j\w\779ddbee\p\llvm\include -IC:\j\w\779ddbee\p\SIE\Telemetry\telemetry_utils\include -IC:\j\w\779ddbee\p\SIE\Telemetry\telemetry_api\include /DWIN32 /D_WINDOWS   /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Oi /bigobj /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -w14062 -we4238 /Gw /MT /O2 /Ob2 /DCHECKING  /EHs-c- /GR- -UNDEBUG /Zc:__cplusplus -std:c++17 /showIncludes /Folib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\ELF_aarch32.cpp.obj /Fdlib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\LLVMJITLink.pdb /FS -c C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2248: 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>::ELFFile': cannot access private typedef declared in class 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>'
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\ELFLinkGraphBuilder.h(58): note: see declaration of 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>::ELFFile'
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(103): note: see declaration of 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>'
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(247): note: see reference to class template instantiation 'llvm::jitlink::ELFLinkGraphBuilder_aarch32<llvm::support::little>' being compiled

Thanks for your note. Apparently MSVC refers to use the typedef in the base class https://github.com/llvm/llvm-project/blob/release/16.x/llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h#L58 while it should use llvm::object::ELFFile. Interesting behavior to bail out in this case. Let me fix that quickly.

Thank you for the swift action. Unfortunately another MSVC-ism has reared its head.

FAILED: lib/ExecutionEngine/JITLink/CMakeFiles/LLVMJITLink.dir/aarch32.cpp.obj
C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1428~1.299\bin\HostX64\x64\cl.exe  /nologo /TP -DGTEST_HAS_RTTI=0 -DSCE_TARGET_EAP=1 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_LIBCPP_ENABLE_ASSERTIONS -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__SCE__ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib\ExecutionEngine\JITLink -IC:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink -Iinclude -IC:\j\w\779ddbee\p\llvm\include -IC:\j\w\779ddbee\p\SIE\Telemetry\telemetry_utils\include -IC:\j\w\779ddbee\p\SIE\Telemetry\telemetry_api\include /DWIN32 /D_WINDOWS   /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Oi /bigobj /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -w14062 -we4238 /Gw /MT /O2 /Ob2 /DCHECKING  /EHs-c- /GR- -UNDEBUG /Zc:__cplusplus -std:c++17 /showIncludes /Folib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\aarch32.cpp.obj /Fdlib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\LLVMJITLink.pdb /FS -c C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp(160): error C2672: 'formatv': no matching overloaded function found
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp(161): error C2893: Failed to specialize function template 'llvm::formatv_object<unknown-type> llvm::formatv(const char *,Ts &&...)'
C:\j\w\779ddbee\p\llvm\include\llvm/Support/FormatVariadic.h(251): note: see declaration of 'llvm::formatv'
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp(161): note: With the following template arguments:
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp(161): note: 'Ts={const llvm::support::ulittle16_t &, const llvm::support::ulittle16_t &, const char *}'
C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp(159): error C2672: 'llvm::make_error': no matching overloaded function found

This was due to my change to use a target flag to represent the thumb bit. Obviously, these are target-specific and ObjectLinkingLayer has to account for that. Relanding now with the fix.

Thanks, and the issue is fixed in your reland.

Thank you for the swift action. Unfortunately another MSVC-ism has reared its head.

Should be fixed with 4cb0b7ce3b4987446264312d582dac9c9a98a488

The latest commit 4cb0b7ce fixed the 'formatv' error, but now there is a different error:

With VS2019, Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30148 for x64

FAILED: lib/ExecutionEngine/JITLink/CMakeFiles/LLVMJITLink.dir/aarch32.cpp.obj
C:\PROGRA~2\MICROS~1\2019\BUILDT~1\VC\Tools\MSVC\1428~1.299\bin\HostX64\x64\cl.exe /nologo /TP -DGTEST_HAS_RTTI=0 -DSCE_TARGET_EAP=1 -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_GLIBCXX_ASSERTIONS -D_HAS_EXCEPTIONS=0 -D_LIBCPP_ENABLE_ASSERTIONS -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -DSCE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -DSTDC_LIMIT_MACROS -Ilib\ExecutionEngine\JITLink -IC:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink -Iinclude -IC:\j\w\779ddbee\p\llvm\include ............ -wd4324 -w14062 -we4238 /Gw /MT /O2 /Ob2 /DCHECKING /EHs-c- /GR- -UNDEBUG /Zc:cplusplus -std:c++17 /showIncludes /Folib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\aarch32.cpp.obj /Fdlib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\LLVMJITLink.pdb /FS -c C:\j\w\779ddbee\p\llvm\lib\ExecutionEngine\JITLink\aarch32.cpp
C:\j\w\779ddbee\p\llvm\include\llvm/Support/FormatProviders.h(128): error C2872: 'detail': ambiguous symbol
C:\j\w\779ddbee\p\llvm\include\llvm/Support/Chrono.h(76): note: could be 'llvm::detail'
C:\j\w\779ddbee\p\llvm\include\llvm/Support/Endian.h(202): note: or 'llvm::support::detail'

This error is the key to these problem. There are two namespace named 'details', llvm::details and llvm::support::details. Adding 'using namespace support' introduces both at the same level, so details:: becomes ambiguous.

Perhaps moving 'using namespace support' to inside function readAddendData can do the trick. That is the only function that needs it.

The latest commit 4cb0b7ce fixed the 'formatv' error, but now there is a different error:

Is this a public facing build bot? If not I am afraid it will have to wait until next week. The public facing MSVC bots seem ok, e.g.: https://lab.llvm.org/buildbot/#/builders/123

Sunil_Srivastava added a comment.EditedMar 26 2023, 10:29 AM

Is this a public facing build bot?

No, but multiple machines in our organization, using both VS2019 and VS2022 are showing these errors. I am myself puzzled as to why no public bots are failing.

Moreover, after fixing this error (by moving 'using namespace support' inside readAddendData), we ran into another error in another file (same compiler, same options):

C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2248: 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>::ELFFile': cannot access private typedef declared in class 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELFLinkGraphBuilder.h(58): note: see declaration of 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>::ELFFile'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(103): note: see declaration of 'llvm::jitlink::ELFLinkGraphBuilder<llvm::object::ELFType<llvm::support::little,false>>'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(247): note: see reference to class template instantiation 'llvm::jitlink::ELFLinkGraphBuilder_aarch32<llvm::support::little>' being compiled
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2143: syntax error: missing ')' before '<'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2143: syntax error: missing ';' before '<'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2059: syntax error: '<'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2059: syntax error: ')'
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(188): error C2334: unexpected token(s) preceding ':'; skipping apparent function body
C:\j\w\p\cpu-toolchain-ppr\llvm\lib\ExecutionEngine\JITLink\ELF_aarch32.cpp(244): fatal error C1075: '{': no matching token found

I think the key here is the line (in ELFLinkGraphBuilder.h)

using ELFFile = object::ELFFile<ELFT>;

and then using ELFFIle as a template in a scope where this using statement is visible.

As before, I cannot explain why public bots are not showing this, but I can see it on more than one machines.

Is this a public facing build bot?

No, but multiple machines in our organization, using both VS2019 and VS2022 are showing these errors. I am myself puzzled as to why no public bots are failing.

Hi Sunil, I just ran a build with VS2022 (MSVC 19.33 on Windows 10) myself and it works without any issue. The clang-x64-windows-msvc bot I linked runs VS2019. It works as well and it did build the respective translation unit: https://lab.llvm.org/buildbot/api/v2/logs/26643253/raw

[969/2521] Building CXX object lib\Analysis\CMakeFiles\LLVMAnalysis.dir\ModuleSummaryAnalysis.cpp.obj
[970/2521] Building CXX object lib\ExecutionEngine\Interpreter\CMakeFiles\LLVMInterpreter.dir\Interpreter.cpp.obj
[971/2521] Building CXX object lib\ExecutionEngine\JITLink\CMakeFiles\LLVMJITLink.dir\aarch32.cpp.obj
[972/2521] Building CXX object lib\ExecutionEngine\CMakeFiles\LLVMExecutionEngine.dir\ExecutionEngine.cpp.obj

I think the key here is the line (in ELFLinkGraphBuilder.h)

using ELFFile = object::ELFFile<ELFT>;

and then using ELFFIle as a template in a scope where this using statement is visible.

As before, I cannot explain why public bots are not showing this, but I can see it on more than one machines.

Can you please make the necessary changes and set up a review for it? Without a public facing build bot or a way to reproduce your issue, all my attempts to fix it will be in the dark. Thanks!

llvm/lib/ExecutionEngine/JITLink/aarch32.cpp