This is an archive of the discontinued LLVM Phabricator instance.

ExecutionEngine: add preliminary support for COFF ARM64
ClosedPublic

Authored by kaadam on Oct 25 2019, 9:09 AM.

Diff Detail

Event Timeline

kaadam created this revision.Oct 25 2019, 9:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 25 2019, 9:09 AM

Hi Everyone,

Currently this patch is WIP, I need to do some clean up, and add a unittest to it. I managed to test it with some example, it seems it works. I will update it early next week.

Looks sensible overall, except for two relocations that have incorrect interpretations. You seem to be missing a couple relocation types as well; check lld/COFF/Chunks.cpp for reference on how to handle the rest of them. Some of them are not very probable to use in JITed code (as TLS code requires space for the variables to be allocated in the TLS section by the system's runtime loader; the SECREL_LOW/HIGH_12A/L relocations are used for that), but most of the remaining ones are pretty straightforward to handle anyway.

lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

No, this is wrong - and this is one of the less obvious details.

If you have an addend stored in the instruction pointed to by IMAGE_REL_ARM64_PAGEBASE_REL21, the addend is expressed in bytes, not in 4096 byte pages.

Consider you have a symbol close to the end of a page, and you want to express an offset by a few bytes (less than a page), that makes the pointed to location in another page.

If the addend would express a number of pages (as this patch expects right now), the addend here would be zero, and you'd end up with this part of the instruction pair pointing at the wrong page.

Therefore, the immediate stored in the instruction before handling relocation is expressed as a number of bytes, even though it means a number of pages after the relocation is done and the instruction is executed.

175 ↗(On Diff #226440)

Nit: The indentation of the first comment line is off here

180 ↗(On Diff #226440)

Nit: Double spaces between "store" and "or"

188 ↗(On Diff #226440)

I guess this check could also be for the relocation type IMAGE_REL_ARM64_PAGEOFFSET_12L?

210 ↗(On Diff #226440)

Would it make more sense, stylistically, to extend the ifdef around the debug statement as well? Right now it does look weird to have code referring to variables that don't exist (even though LLVM_DEBUG will make them disappear).

220 ↗(On Diff #226440)

The indentation here is weird. Please run clang-format-diff -style LLVM on the changes.

243 ↗(On Diff #226440)

This doesn't seem right. IMAGE_REL_ARM64_ADDR64 is a plain 64 bit integer (just like IMAGE_REL_ARM64_ADDR32NB), not a series of instructions that should get immediates added.

287 ↗(On Diff #226440)

This switch for checking for alignment is rather verbose - would it make sense to condense it down to a single expression for all alignments?

330 ↗(On Diff #226440)

Missing newline at end of file

kaadam marked 6 inline comments as done.Oct 29 2019, 9:54 AM

Martin, thanks for your thoughts. I'm on to fix them.

lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

Thanks for explaining this. It seems I misunderstood this behavior. To be honest I looked at how MachOAArch64 does, how they decode the addend, I compared to aarch64 reference manual to see how it works, it seemed to me I can do the same thing as MachO does.

188 ↗(On Diff #226440)

It is just for IMAGE_REL_ARM64_PAGEOFFSET_12L, we need to determine the shift value only for load/store, since for instructions ADD/ADDS (immediate) we should use zero shift.

210 ↗(On Diff #226440)

Yes, you're right, I will fix it.

220 ↗(On Diff #226440)

Thanks, I will use it.

243 ↗(On Diff #226440)

Yes, you're right. Currently I handle a long branch instruction with this relocation, so when I needed to create stub function (which generates movz/movk instruction), I just set this relocation to detect this case when we have an external symbol. When I tested it with a small examples and Swithshader It worked because I haven't got this relocation type. Of course I should use another one, maybe an internal type.
What do you think?

287 ↗(On Diff #226440)

Sure, it could be.

mstorsjo added inline comments.Oct 29 2019, 2:42 PM
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

I guess it differs a bit between how the different object file formats encode the relocations and symbol offsets. (IIRC ELF and MachO can say that a relocation is relative to a symbol with offset, while COFF only points at a symbol, and any offset must be applied via the instruction immediates.)

188 ↗(On Diff #226440)

Yes, but the relocation names already imply this. The L suffixed relocation is used for loads/stores, and the A suffixed relocation is used for add instructions. For IMAGE_REL_ARM64_PAGEOFFSET_12L we do not need to check whether the instruction is a load/store, but we can go directly to reading out the implicit shift amount, and for IMAGE_REL_ARM64_PAGEOFFSET_12A we should not read any implicit shift amount at all.

See lld/COFF/Chunks.cpp, SectionChunk::applyRelARM64. For IMAGE_REL_ARM64_PAGEOFFSET_12L we call applyArm64Ldr which reads out the shift amount and then calls applyArm64Imm, while applyArm64Imm is called directly for IMAGE_REL_ARM64_PAGEOFFSET_12A.

In general, when a linker (either dynamic or static) resolves a relocation, it should seldom need to inspect the instruction it is applied on, even though it is needed here for reading out the implicit shift amount. In general, the relocation type just encodes a specific action that should be done on that memory location with very little extra logic.

243 ↗(On Diff #226440)

Hmm, as I'm not familiar with those bits that generate it, I don't know for sure. As you're already setting a small code model for aarch64/win, doesn't that already achieve this?

Otherwise some variant of adrp+add would normally be used for forming any arbitrary address, instead of a series of mov instructions, unless the value to be formed is a constant.

If necessary I guess one could consider using private internal relocation types, but I'd at least defer those bits to a later patch where it can be discussed properly on its own, and keep this first for the official relocation types.

kaadam marked an inline comment as done.Nov 4 2019, 2:29 AM

Martin, sorry for the delayed response.

lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

Okay, I see now.
By the way I don't necessary need to decode addend in processRelocationRef function, it could be encoded before the relocation applied, am I right? Which is better? Since the inst contains the offset in the immediate part, so it have to be considered (add the offset to 'Value') right before immediate is rewritten.
I update the change later today.

mstorsjo added inline comments.Nov 5 2019, 6:38 AM
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

I presume you meant "it could be decoded before the relocation applied", not encoded?

Yes, you could do the decode+update+encode all in one step. When reviewing I noted that this followed such a two-step style, but I presumed this came from general RuntimeDyld design. In lld it's all done in one single function. I'm not familiar with RuntimeDyld to say if there's any specific needs here for it to be this way, but if other parts of RuntimeDyld does it this way (in particular, other COFF architectures) it might be good to match the style.

kaadam marked an inline comment as done.Nov 5 2019, 9:41 AM
kaadam added inline comments.
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
169 ↗(On Diff #226440)

Yes, that's what I meant. Thanks for your suggestion. Yes, I follow that style, since RuntimeDyld design follows a two-step style, but as far as I see there is no any style convention, for example in COFF Thumb a few instruction's addend are decoded, most of them are not. I don't see any significance to be this way, but I would like to do a clean approach for this.

kaadam updated this revision to Diff 228226.Nov 7 2019, 6:10 AM
kaadam edited the summary of this revision. (Show Details)

Looking pretty good now, just a few details I found. Then this needs tests to get rid of the WIP status.

lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
46 ↗(On Diff #228226)

Leftover commented out code?

57 ↗(On Diff #228226)

Is there any better error reporting mechanism available here?

The current form of assert is wrong, as the address of the string will evaluate as nonzero and the assert wouldn't trigger. Maybe assert(0 && "message") in case there's no better way of actually reporting the issue to the caller.

117 ↗(On Diff #228226)

I can't say I entirely understand this function (I understand what it tries to do, but fail to follow exactly how the details fit together), but as I see that it is very similar to the corresponding existing code for x86_64, I presume it's correct.

171 ↗(On Diff #228226)

Nit: Here and below you have pretty superfluous outer parentheses

180 ↗(On Diff #228226)

This doesn't seem to be right?

For instructions B.cond and e.g. cbz, you have 5 least significant bits of the instruction being other data than the immediate, and after that, the resulting value should be left shifted by 2.

So here, it should be Addend = ((orig & 0x00FFFFE0) >> 5) << 2; (for clarity) or Addend = (orig & 0x00FFFFE0) >> 3; (for more straightforward but less obvious code).

Something similar should be done for branch14 below as well.

LLD currently actually ignores the existing immediate in these relocations (which hasn't been an issue so far, but technically is an oversight).

205 ↗(On Diff #228226)

Do we need to support reading out the immediate from INTERNAL_REL_ARM64_LONG_BRANCH26 here? Or as the only place that generates it writes a zero immediate I guess it's not necessary?

302 ↗(On Diff #228226)

If there actually was a nonzero immediate in the instruction here from before, or32le won't do the right thing. We don't handle this in lld right now, but as this code at least tries to read out the immediate further up, it would be good for consistency to actually clear the immediate from the branch instruction here before or'ing in the final value.

338 ↗(On Diff #228226)

Hmm, where does Value end up added to this one?

I do see that the existing COFF targets does it the same way, and that code does seem to be used and have a working test, but I don't see how it works. Do you have any clue?

kaadam marked 8 inline comments as done.Nov 15 2019, 5:22 AM

Thanks for the lots of comment, it was really helpful. I update the patch, I enclose a unit_test for this. There are tree test failures (adr, b.cond, secrel.), I'm still looking for the reasons.

Expression 'decode_operand(adr1, 1) = (_const[20:0] - adr1[20:0])' is false: 0x2014 != 0x10014
Expression 'decode_operand(bcond, 1)[23:5] = (_foo - bcond)[20:2]' is false: 0x7ffff != 0x7fff8
Expression '*{4}secrel = _foo - section_addr(COFF_AArch64.obj, .text)' is false: 0xfffeffea != 0x4

lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
117 ↗(On Diff #228226)

Yes, the logic is the same as other COFF targets do. Basically we should handle an external symbol which is so far.

So first time when we detected we have an external symbol, generate a stub function (movz/movk [1]). Currently the Value contains the address of the external symbol for the original relocation branch26 (b/bl instruction). We need to replace the Value here to point to the address of the stub instead of the external symbol. That's why we need to call resolveRelocation for the original relocation, but here we will pass the stub address as Value to it. So when original branch26 relocation is resolved that will point to stub. After that we returned with our internal relocation type and stuboffset, create RelocationEntry for these data and we can use the original Value for external symbol. Finally we will resolve the relocation for this internal type, the symbol address will be encode the into movz/movk stub instructions.

all in all we're going to have two jump, bl -> stub -> external symbol.

[1]: https://github.com/llvm-mirror/llvm/blob/master/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp#L933

205 ↗(On Diff #228226)

Yes, it is not necessary since the Addend is always zero this internal type.

302 ↗(On Diff #228226)

Yes, you're right . maybe the immediate for these instruction could be cleared when the addends are decoded.

338 ↗(On Diff #228226)

Unfortunately I'm not sure what happens under the hood, but it seems it is correct to use only the addend here, which contains the offset of the item from the beginning of its section in this case.

kaadam updated this revision to Diff 229518.Nov 15 2019, 5:24 AM
kaadam retitled this revision from WIP: ExecutionEngine: add preliminary support for COFF ARM64 to ExecutionEngine: add preliminary support for COFF ARM64 .

change is updated, unit_test is attached.

mstorsjo added inline comments.Nov 15 2019, 5:48 AM
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h
117 ↗(On Diff #228226)

Yes, that much I understand (I implemented the same feature in the lld linker as well) - it was more relating to the exact details of each of the method calls here, but I don't have any concrete question regarding it right now either.

193 ↗(On Diff #229518)

You can't do the masking out of the original immediate here; updating orig has no effect at all, as that's a local variable.

Handling masking out the old value within processRelocationRef feels wrong in general (as this function only inspects and gathers info but doesn't update anything yet), I think this should be in resolveRelocation.

So then you can't use or32le there in those cases, but more something like write32le(P, (read32le(P) & ~Mask) | V); (which perhaps can warrant a helper function of its own).

kaadam updated this revision to Diff 230078.Nov 19 2019, 8:09 AM

updated the write method in branch relocations, the unit_test is also updated.

kaadam updated this revision to Diff 230091.Nov 19 2019, 9:08 AM
mstorsjo accepted this revision.Nov 19 2019, 1:57 PM

LGTM

Do you need someone to commit this for you? Do you have a preferred form for the git author line in that case ("User Name <email@address>"); with or without non-ascii diacritics?

This revision is now accepted and ready to land.Nov 19 2019, 1:57 PM

Martin, thanks for the review. Yes, I need. May I ask you to commit this change? I use "Adam Kallai <kadam@inf.u-szeged.hu> in the git author line.

mstorsjo added inline comments.Nov 20 2019, 1:00 AM
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s
127 ↗(On Diff #230091)

FWIW, I had to fix up the filename here, to COFF_AArch64.s.tmp.obj to make the tests pass. Pushed with that changed.

This revision was automatically updated to reflect the committed changes.
kaadam marked an inline comment as done.Nov 20 2019, 1:21 AM
kaadam added inline comments.
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s
127 ↗(On Diff #230091)

That's interesting. I ran tests manually, so I think that's why I didn't noticed this failure. So LLVM test infrastructure use this form of the object file? Thanks for fixing that.
There was no problem with check in line 39?

  1. rtdyld-check: decode_operand(brel, 0)[25:0] = (stub_addr(COFF_AArch64.obj/.text, dummy) - brel)[27:2]
kaadam marked an inline comment as done.Nov 20 2019, 1:24 AM
kaadam added inline comments.
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s
127 ↗(On Diff #230091)

Ahh sorry, I see now, you've modified that as well. Thanks again.

mstorsjo added inline comments.Nov 20 2019, 1:36 AM
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s
127 ↗(On Diff #230091)

That's how the test infrastructure expands it yes. The %t expands to a temporary file name based on the current file name, plus .tmp.

You can manually run the tests for just one file, by running bin/llvm-lit -v path/to/test/ExecutionEngine/Foo/file.s, for a subdirectory tree with the same command pointing to a directory, or all of them (with ninja check-llvm).

My CL is broke the buildbot, a break is necessary in default case. How could I fix it? Shall I need to update this change?

My CL is broke the buildbot, a break is necessary in default case. How could I fix it? Shall I need to update this change?

I pushed a new commit to fix this now.

You could post a new patch for fixing it, but in this case the fix was trivial enough to just do myself. If fixing takes longer, it might be good to revert the commit in the meantime, and then it might be better to update the full patch so that it can be reapplied in fixed form.

My CL is broke the buildbot, a break is necessary in default case. How could I fix it? Shall I need to update this change?

I pushed a new commit to fix this now.

You could post a new patch for fixing it, but in this case the fix was trivial enough to just do myself. If fixing takes longer, it might be good to revert the commit in the meantime, and then it might be better to update the full patch so that it can be reapplied in fixed form.

Okay, I see. Thanks Martin.

lhames added a subscriber: lhames.Nov 26 2019, 2:57 PM

Coming to this a bit late -- Thanks for working on this kaadam! And thanks for the review mstorsjo.

Does COFF/AArch64 use a GOT? I couldn't see handling for that here?