Details
Diff Detail
Event Timeline
Hi Everyone,
Currently this patch is WIP, I need to do some clean up, and add a unittest to it. I managed to test it with some example, it seems it works. I will update it early next week.
Looks sensible overall, except for two relocations that have incorrect interpretations. You seem to be missing a couple relocation types as well; check lld/COFF/Chunks.cpp for reference on how to handle the rest of them. Some of them are not very probable to use in JITed code (as TLS code requires space for the variables to be allocated in the TLS section by the system's runtime loader; the SECREL_LOW/HIGH_12A/L relocations are used for that), but most of the remaining ones are pretty straightforward to handle anyway.
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | No, this is wrong - and this is one of the less obvious details. If you have an addend stored in the instruction pointed to by IMAGE_REL_ARM64_PAGEBASE_REL21, the addend is expressed in bytes, not in 4096 byte pages. Consider you have a symbol close to the end of a page, and you want to express an offset by a few bytes (less than a page), that makes the pointed to location in another page. If the addend would express a number of pages (as this patch expects right now), the addend here would be zero, and you'd end up with this part of the instruction pair pointing at the wrong page. Therefore, the immediate stored in the instruction before handling relocation is expressed as a number of bytes, even though it means a number of pages after the relocation is done and the instruction is executed. | |
176 | Nit: The indentation of the first comment line is off here | |
181 | Nit: Double spaces between "store" and "or" | |
189 | I guess this check could also be for the relocation type IMAGE_REL_ARM64_PAGEOFFSET_12L? | |
211 | Would it make more sense, stylistically, to extend the ifdef around the debug statement as well? Right now it does look weird to have code referring to variables that don't exist (even though LLVM_DEBUG will make them disappear). | |
221 | The indentation here is weird. Please run clang-format-diff -style LLVM on the changes. | |
244 | This doesn't seem right. IMAGE_REL_ARM64_ADDR64 is a plain 64 bit integer (just like IMAGE_REL_ARM64_ADDR32NB), not a series of instructions that should get immediates added. | |
288 | This switch for checking for alignment is rather verbose - would it make sense to condense it down to a single expression for all alignments? | |
365 | Missing newline at end of file |
Martin, thanks for your thoughts. I'm on to fix them.
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | Thanks for explaining this. It seems I misunderstood this behavior. To be honest I looked at how MachOAArch64 does, how they decode the addend, I compared to aarch64 reference manual to see how it works, it seemed to me I can do the same thing as MachO does. | |
189 | It is just for IMAGE_REL_ARM64_PAGEOFFSET_12L, we need to determine the shift value only for load/store, since for instructions ADD/ADDS (immediate) we should use zero shift. | |
211 | Yes, you're right, I will fix it. | |
221 | Thanks, I will use it. | |
244 | Yes, you're right. Currently I handle a long branch instruction with this relocation, so when I needed to create stub function (which generates movz/movk instruction), I just set this relocation to detect this case when we have an external symbol. When I tested it with a small examples and Swithshader It worked because I haven't got this relocation type. Of course I should use another one, maybe an internal type. | |
288 | Sure, it could be. |
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | I guess it differs a bit between how the different object file formats encode the relocations and symbol offsets. (IIRC ELF and MachO can say that a relocation is relative to a symbol with offset, while COFF only points at a symbol, and any offset must be applied via the instruction immediates.) | |
189 | Yes, but the relocation names already imply this. The L suffixed relocation is used for loads/stores, and the A suffixed relocation is used for add instructions. For IMAGE_REL_ARM64_PAGEOFFSET_12L we do not need to check whether the instruction is a load/store, but we can go directly to reading out the implicit shift amount, and for IMAGE_REL_ARM64_PAGEOFFSET_12A we should not read any implicit shift amount at all. See lld/COFF/Chunks.cpp, SectionChunk::applyRelARM64. For IMAGE_REL_ARM64_PAGEOFFSET_12L we call applyArm64Ldr which reads out the shift amount and then calls applyArm64Imm, while applyArm64Imm is called directly for IMAGE_REL_ARM64_PAGEOFFSET_12A. In general, when a linker (either dynamic or static) resolves a relocation, it should seldom need to inspect the instruction it is applied on, even though it is needed here for reading out the implicit shift amount. In general, the relocation type just encodes a specific action that should be done on that memory location with very little extra logic. | |
244 | Hmm, as I'm not familiar with those bits that generate it, I don't know for sure. As you're already setting a small code model for aarch64/win, doesn't that already achieve this? Otherwise some variant of adrp+add would normally be used for forming any arbitrary address, instead of a series of mov instructions, unless the value to be formed is a constant. If necessary I guess one could consider using private internal relocation types, but I'd at least defer those bits to a later patch where it can be discussed properly on its own, and keep this first for the official relocation types. |
Martin, sorry for the delayed response.
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | Okay, I see now. |
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | I presume you meant "it could be decoded before the relocation applied", not encoded? Yes, you could do the decode+update+encode all in one step. When reviewing I noted that this followed such a two-step style, but I presumed this came from general RuntimeDyld design. In lld it's all done in one single function. I'm not familiar with RuntimeDyld to say if there's any specific needs here for it to be this way, but if other parts of RuntimeDyld does it this way (in particular, other COFF architectures) it might be good to match the style. |
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
170 | Yes, that's what I meant. Thanks for your suggestion. Yes, I follow that style, since RuntimeDyld design follows a two-step style, but as far as I see there is no any style convention, for example in COFF Thumb a few instruction's addend are decoded, most of them are not. I don't see any significance to be this way, but I would like to do a clean approach for this. |
Looking pretty good now, just a few details I found. Then this needs tests to get rid of the WIP status.
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
47 | Leftover commented out code? | |
58 | Is there any better error reporting mechanism available here? The current form of assert is wrong, as the address of the string will evaluate as nonzero and the assert wouldn't trigger. Maybe assert(0 && "message") in case there's no better way of actually reporting the issue to the caller. | |
118 | I can't say I entirely understand this function (I understand what it tries to do, but fail to follow exactly how the details fit together), but as I see that it is very similar to the corresponding existing code for x86_64, I presume it's correct. | |
172 | Nit: Here and below you have pretty superfluous outer parentheses | |
181 | This doesn't seem to be right? For instructions B.cond and e.g. cbz, you have 5 least significant bits of the instruction being other data than the immediate, and after that, the resulting value should be left shifted by 2. So here, it should be Addend = ((orig & 0x00FFFFE0) >> 5) << 2; (for clarity) or Addend = (orig & 0x00FFFFE0) >> 3; (for more straightforward but less obvious code). Something similar should be done for branch14 below as well. LLD currently actually ignores the existing immediate in these relocations (which hasn't been an issue so far, but technically is an oversight). | |
206 | Do we need to support reading out the immediate from INTERNAL_REL_ARM64_LONG_BRANCH26 here? Or as the only place that generates it writes a zero immediate I guess it's not necessary? | |
303 | If there actually was a nonzero immediate in the instruction here from before, or32le won't do the right thing. We don't handle this in lld right now, but as this code at least tries to read out the immediate further up, it would be good for consistency to actually clear the immediate from the branch instruction here before or'ing in the final value. | |
339 | Hmm, where does Value end up added to this one? I do see that the existing COFF targets does it the same way, and that code does seem to be used and have a working test, but I don't see how it works. Do you have any clue? |
Thanks for the lots of comment, it was really helpful. I update the patch, I enclose a unit_test for this. There are tree test failures (adr, b.cond, secrel.), I'm still looking for the reasons.
Expression 'decode_operand(adr1, 1) = (_const[20:0] - adr1[20:0])' is false: 0x2014 != 0x10014
Expression 'decode_operand(bcond, 1)[23:5] = (_foo - bcond)[20:2]' is false: 0x7ffff != 0x7fff8
Expression '*{4}secrel = _foo - section_addr(COFF_AArch64.obj, .text)' is false: 0xfffeffea != 0x4
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
118 | Yes, the logic is the same as other COFF targets do. Basically we should handle an external symbol which is so far. So first time when we detected we have an external symbol, generate a stub function (movz/movk [1]). Currently the Value contains the address of the external symbol for the original relocation branch26 (b/bl instruction). We need to replace the Value here to point to the address of the stub instead of the external symbol. That's why we need to call resolveRelocation for the original relocation, but here we will pass the stub address as Value to it. So when original branch26 relocation is resolved that will point to stub. After that we returned with our internal relocation type and stuboffset, create RelocationEntry for these data and we can use the original Value for external symbol. Finally we will resolve the relocation for this internal type, the symbol address will be encode the into movz/movk stub instructions. all in all we're going to have two jump, bl -> stub -> external symbol. [1]: https://github.com/llvm-mirror/llvm/blob/master/lib/ExecutionEngine/RuntimeDyld/RuntimeDyld.cpp#L933 | |
206 | Yes, it is not necessary since the Addend is always zero this internal type. | |
303 | Yes, you're right . maybe the immediate for these instruction could be cleared when the addends are decoded. | |
339 | Unfortunately I'm not sure what happens under the hood, but it seems it is correct to use only the addend here, which contains the offset of the item from the beginning of its section in this case. |
lib/ExecutionEngine/RuntimeDyld/Targets/RuntimeDyldCOFFAArch64.h | ||
---|---|---|
118 | Yes, that much I understand (I implemented the same feature in the lld linker as well) - it was more relating to the exact details of each of the method calls here, but I don't have any concrete question regarding it right now either. | |
194 | You can't do the masking out of the original immediate here; updating orig has no effect at all, as that's a local variable. Handling masking out the old value within processRelocationRef feels wrong in general (as this function only inspects and gathers info but doesn't update anything yet), I think this should be in resolveRelocation. So then you can't use or32le there in those cases, but more something like write32le(P, (read32le(P) & ~Mask) | V); (which perhaps can warrant a helper function of its own). |
LGTM
Do you need someone to commit this for you? Do you have a preferred form for the git author line in that case ("User Name <email@address>"); with or without non-ascii diacritics?
Martin, thanks for the review. Yes, I need. May I ask you to commit this change? I use "Adam Kallai <kadam@inf.u-szeged.hu> in the git author line.
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s | ||
---|---|---|
128 | FWIW, I had to fix up the filename here, to COFF_AArch64.s.tmp.obj to make the tests pass. Pushed with that changed. |
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s | ||
---|---|---|
128 | That's interesting. I ran tests manually, so I think that's why I didn't noticed this failure. So LLVM test infrastructure use this form of the object file? Thanks for fixing that.
|
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s | ||
---|---|---|
128 | Ahh sorry, I see now, you've modified that as well. Thanks again. |
test/ExecutionEngine/RuntimeDyld/AArch64/COFF_AArch64.s | ||
---|---|---|
128 | That's how the test infrastructure expands it yes. The %t expands to a temporary file name based on the current file name, plus .tmp. You can manually run the tests for just one file, by running bin/llvm-lit -v path/to/test/ExecutionEngine/Foo/file.s, for a subdirectory tree with the same command pointing to a directory, or all of them (with ninja check-llvm). |
My CL is broke the buildbot, a break is necessary in default case. How could I fix it? Shall I need to update this change?
I pushed a new commit to fix this now.
You could post a new patch for fixing it, but in this case the fix was trivial enough to just do myself. If fixing takes longer, it might be good to revert the commit in the meantime, and then it might be better to update the full patch so that it can be reapplied in fixed form.
Coming to this a bit late -- Thanks for working on this kaadam! And thanks for the review mstorsjo.
Does COFF/AArch64 use a GOT? I couldn't see handling for that here?
Leftover commented out code?