Instead of using constant pools, use a movw/movt pair.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/Thumb2/thumb2-execute-only-long-calls.ll | ||
---|---|---|
20 | Is there some reason we can't just generate movw r0, :lower16:bar; movt r0, :upper16:bar? |
@efriedma Thank you for your suggestion. I will remove the extra indirection.
I was wondering if you could also provide some insight into the RWPI case. I believe the same optimization also applies to RWPI. However, I actually want to store the function address in a global variable when using RWPI, because I want the address to live in RAM instead of Flash, so that I can redirect the function call at runtime for dynamic linking purposes.
Should I create a new target feature to indicate that I want to store function address in RAM?
> that I can redirect the function call at runtime, for dynamic linking purpose.
Can you describe a little more what you're trying to do here?
If you want to replace the implementation of an existing function at runtime, you'd be better off implementing the indirection as a frontend feature; by the time you get to the backend, optimizations have destroyed the semantics you want.
> Can you describe a little more what you're trying to do here?
Sure. My eventual goal is to enable fine-grained live update on ARM-based microcontrollers, which requires the system to do some relocation at runtime. Below I describe the challenge with a simple C example.
Consider the following C snippet:
    extern void global_func(void); // A global function whose symbol is exported by the system at runtime.

    static void local_func(void) { ... }

    static void main_entry(void) {
        local_func();
        global_func();
    }
I want to load and run the compiled object file at runtime, which requires two steps.
- Burn the object file into Flash storage.
- Perform a runtime symbol resolution and relocation so that global_func is set to the runtime address.
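The second step can be sketched in C roughly as follows. This is only an illustration of the idea, not code from the patch: the `exported_symbol` and `import_slot` records, `resolve_imports`, and the stand-in `system_global_func` are all hypothetical names, and the Flash programming step is not modeled at all.

```c
#include <stddef.h>
#include <string.h>

typedef void (*func_ptr)(void);

/* Hypothetical record for a symbol that the running system exports. */
struct exported_symbol {
    const char *name;
    func_ptr addr;
};

/* Hypothetical record for an import of the loaded object: the slot is a
 * word in RAM that the relocation step fills in. */
struct import_slot {
    const char *name;
    func_ptr *slot;
};

/* Fill every import slot from the export table.
 * Returns the number of imports that could not be resolved. */
static size_t resolve_imports(const struct import_slot *imports, size_t n_imports,
                              const struct exported_symbol *exports, size_t n_exports)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < n_imports; i++) {
        func_ptr found = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(imports[i].name, exports[j].name) == 0) {
                found = exports[j].addr;
                break;
            }
        }
        if (found != NULL)
            *imports[i].slot = found;
        else
            unresolved++;
    }
    return unresolved;
}

/* Stand-in for the system's global_func; counts how often it runs. */
static int global_func_calls;
static void system_global_func(void) { global_func_calls++; }

/* Resolve one import, call through the RAM slot, and report the call count. */
static int demo_resolve_and_call(void)
{
    static func_ptr global_func_slot; /* writable storage, i.e. RAM */
    const struct exported_symbol exports[] = { { "global_func", system_global_func } };
    const struct import_slot imports[] = { { "global_func", &global_func_slot } };

    global_func_calls = 0;
    if (resolve_imports(imports, 1, exports, 1) != 0)
        return -1;
    global_func_slot(); /* indirect call through the relocated slot */
    return global_func_calls;
}
```

The point of the sketch is that the relocation only ever writes to the slot, which is data, so the code itself can stay untouched in Flash.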
The reason I must store code in Flash is that the microcontroller I am using, like many other ARM-based microcontrollers, has 5x more Flash storage than RAM, and code typically runs directly from Flash.
Calling local_func requires position-independent code, which is already handled by -fropi. global_func, however, is the case I am trying to solve here.
Existing compiler options always store the address of global_func in Flash.
The default case:
    main_entry:
        bl local_func
        b.w global_func // Relative address is hardcoded in the instruction, in Flash.
When compiled with -mlong-calls:
    main_entry:
        bl local_func
        ldr r0, [pc, #4] // Load address from constant pool, still in Flash.
        bx r0
    .Lconst_pool:
        .word global_func
In the hypothetical case where the compiler chooses to use a movw/movt pair:

    main_entry:
        bl local_func
        movw r0, :lower16:global_func // Absolute address is hardcoded in the instruction, still in Flash.
        movt r0, :upper16:global_func
        bx r0
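As an aside, the way a movw/movt pair builds a 32-bit address from two 16-bit immediates can be modeled in plain C. This is a minimal sketch for illustration: the helper names `lower16`, `upper16`, `movw`, `movt`, and `materialize` are made up here, and the address used in any call is an arbitrary example value, not one taken from the patch.

```c
#include <stdint.h>

/* The immediate that :lower16: selects: the low 16 bits of the address. */
static uint16_t lower16(uint32_t addr) { return (uint16_t)(addr & 0xFFFFu); }

/* The immediate that :upper16: selects: the high 16 bits of the address. */
static uint16_t upper16(uint32_t addr) { return (uint16_t)(addr >> 16); }

/* movw writes the low half of the register and zeroes the high half. */
static uint32_t movw(uint16_t imm) { return imm; }

/* movt replaces the high half of the register and keeps the low half. */
static uint32_t movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* Rebuild a full 32-bit address the way the instruction pair would. */
static uint32_t materialize(uint32_t addr)
{
    return movt(movw(lower16(addr)), upper16(addr));
}
```

Both immediates are encoded in the instructions themselves, which is why the address still ends up in Flash but no data load from the code region is needed.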
I was hoping to use the "side effect" of -mexecute-only, which promotes constant pools to global variables, to achieve my goal of having the function address live in RAM.
    main_entry:
        bl local_func
        movw r0, :lower16:.const_pool(sbrel)
        movt r0, :upper16:.const_pool(sbrel)
        // Also using RWPI so that the jump table can be placed anywhere in RAM pointed by r9.
        ldr r0, [r9, r0] // Absolute address is held in RAM now.
        bx r0
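A rough C model of the r9-relative sequence above, showing why a RAM-resident slot allows the call to be redirected at runtime. Everything here is an assumption for illustration: `ram_slots` stands in for the writable data that r9 points to, and `impl_v1`/`impl_v2` are hypothetical old and new implementations of the callee.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(void);

/* Hypothetical old and new implementations of the called function. */
static int impl_v1(void) { return 1; }
static int impl_v2(void) { return 2; }

/* Models "ldr r0, [r9, r0]; bx r0": load the target address from
 * static_base + offset, then branch to it. */
static int call_via_static_base(const unsigned char *static_base, size_t offset)
{
    func_ptr target;
    memcpy(&target, static_base + offset, sizeof target);
    return target();
}

/* Writable table standing in for the RAM region that r9 points to. */
static func_ptr ram_slots[2] = { impl_v1, impl_v1 };

/* Call through slot 1, rewrite the slot, and call again.
 * Encodes both results as before * 10 + after. */
static int demo_redirect(void)
{
    const unsigned char *r9 = (const unsigned char *)ram_slots;
    size_t offset = 1 * sizeof(func_ptr);

    ram_slots[1] = impl_v1;                    /* initial relocation */
    int before = call_via_static_base(r9, offset);
    ram_slots[1] = impl_v2;                    /* runtime redirect, RAM write only */
    int after = call_via_static_base(r9, offset);
    return before * 10 + after;
}
```

The calling code is identical for both calls; only the slot contents change, which is the property the constant-pool and movw/movt variants lack.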
As you have already pointed out, in the normal case when we do not need to put the address in RAM, the extra indirection is unnecessary and slows down the code.
But if I have a use case like above where I need to store the address in RAM, could you enlighten me about the best approach to achieve my goal?
The construct you want is pretty similar to a GOT. If you compile with -fPIE -fsemantic-interposition, you get basically the code you want, except that the compiler uses a PLT by default instead of a GOT. If we supported -fno-plt for ARM, it would be almost exactly what you want. That said, that won't work with -frwpi... maybe we need some new kind of relocation to represent that.
Unfortunately, -fPIE seems not to be generating the PLT on LLVM for embedded ARM.
C source file (test.c):
    extern void bar(void);

    void foo(void) {
        bar();
    }
LLVM with clang -O2 -fPIE -fsemantic-interposition -mlong-calls --target=armv7em-none-eabi -c test.c:
    00000000 <foo>:
       0: 4800      ldr r0, [pc, #0] ; (4 <foo+0x4>)
       2: 4700      bx r0
       4: 00000000  .word 0x00000000
ARM GNU with arm-none-eabi-gcc -O2 -fPIE -mlong-calls -msingle-pic-base -mcpu=cortex-m4 -c test.c:
    00000000 <foo>:
       0: 4b01      ldr r3, [pc, #4] ; (8 <foo+0x8>)
       2: f859 3003 ldr.w r3, [r9, r3]
       6: 4718      bx r3
       8: 00000000  .word 0x00000000
One, -mlong-calls isn't currently compatible with PIE. Two, on ARM, there are no special plt relocations; the linker just takes care of it. (You can see the differences if you try to take the address of a function without calling it.)
I have updated the diff to avoid the extra indirection. I am thinking about adding a new option, say -mgot-calls to allow code generation with the extra indirection. Is it sensible and shall I create another diff to discuss that?
> I am thinking about adding a new option, say -mgot-calls to allow code generation with the extra indirection. Is it sensible and shall I create another diff to discuss that?
That probably makes sense, yes.
llvm/lib/Target/ARM/ARMISelLowering.cpp | ||
---|---|---|
2655 | Can we directly check that movw/movt is available? I think that's what we do in other places? (Then just assert we aren't execute-only in the non-movw path.) |
> Then just assert we aren't execute-only in the non-movw path.
When we are not execute-only, existing code handles it by using constant pools and we are all good.
In the case where we are execute-only and long-calls at the same time, we assert that we have movt like in other places in the same source file.
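The decision described in this reply can be summarized by a small C model. It is a deliberate simplification and not the actual code in ARMISelLowering.cpp: the enum, the function name `choose_long_call_lowering`, and its two boolean parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two ways a long call can obtain its target. */
enum long_call_lowering {
    LOWER_CONSTANT_POOL, /* ldr from a literal pool next to the code */
    LOWER_MOVW_MOVT      /* address encoded in a movw/movt pair */
};

/* Simplified model of the choice discussed in the review. */
static enum long_call_lowering choose_long_call_lowering(bool execute_only, bool has_movt)
{
    if (!execute_only)
        return LOWER_CONSTANT_POOL; /* existing path, unchanged by the patch */

    /* Execute-only code cannot read a literal pool, so movw/movt is required. */
    assert(has_movt && "execute-only long calls require movw/movt");
    return LOWER_MOVW_MOVT;
}
```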
LGTM with one small change.
clang/lib/Driver/ToolChains/Arch/ARM.cpp | ||
---|---|---|
779 | Fix this comment? |
Updated the comment to reflect that now we allow using -mlong-calls with -mexecute-only.
Just in case you assume that I have push permission, unfortunately I do not. Could you help me merge the patch in? Thanks.