Extend describeLoadedValue with support for target specific ARM and AArch64 instruction interpretation. Some of the instructions can operate with global string addresses or constant pool indexes but such cases are omitted since we currently lack flexible support for processing such operands at dwarf generating backed.
Details
Diff Detail
Event Timeline
In general, is it really safe to describe loaded values at the call sites?
For example, consider the following program:
caller.c:
#include <stdint.h> extern int callee(uint64_t); extern void escape(uint64_t *); uint64_t value = 0x1234; int main() { uint64_t local = value; escape(&local); callee(local); return 0; }
callee.c:
#include <stdint.h> extern void clobber(void); static uint64_t *ptr; void escape(uint64_t *p) { ptr = p; } int callee(uint64_t p) { *ptr = 0xdead; clobber(); // Print parameter here. return 0; }
compiled using -O3 -g -Xclang -femit-debug-entry-values --target=aarch64 results in the following call site entry for callee's parameter:
0x000000cf: DW_TAG_GNU_call_site_parameter DW_AT_location (DW_OP_reg0 W0) DW_AT_GNU_call_site_value (DW_OP_breg31 WSP+1, DW_OP_deref)
In this case it seems that the parameter would be printed incorrectly due to deref'ing local, which has been modified after the call to callee began? I unfortunately don't have any ARM machine/qemu up and running, so I can't verify this.
(A side-comment, but shouldn't the offset be 8 here? The value corresponds to a ldr x0, [sp, #8].)
I think that the same concern holds about the MI.hasOneMemOperand() part of TargetInstrInfo's implementation of describeLoadedValue().
The DWARF standard (Sec. 3.4.2) states that call site values should be omitted if they may be clobbered by the call:
If it is not possible to avoid registers or memory locations that might be clobbered by the call in the expression, then the DW_AT_call_value attribute should not be provided. The reason for the restriction is that the value of the parameter may be needed in the midst of the callee, where the call clobbered registers or memory might be already clobbered, and if the consumer is not assured by the producer it can safely use those values, the consumer can not safely use the values at all.
Or am I overlooking something here?
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
891 | spelling: Immediate | |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
5327 | nit: describe | |
5348 | nit: also |
@dstenb you are right! I've tried this example and it is as you expected. We should avoid dereferencing memory locations. I will address those issues in separate patch.
@dstenb @NikolaPrica, thanks for sharing, that's a great example. I've filed llvm.org/PR43343 so we can track the issue and discuss solutions.
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
---|---|---|
682–684 | Any sense of what it would take to add support for such entry values? Would it be enough to store the expression alongside the new register in the work list, and then concatenate the two expressions when finding an instruction that describes the new register? Or must we consider how the expressions look in some way? |
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | ||
---|---|---|
682–684 | I believe that such pair of register and expression would be enough for generating new entry value. For now, there are simple cases such as DW_OP_plus/minus over the parameter forwarding registers. This new restriction came from instructions such as "$w0 = nsw ADDWri $w0, 2, 0". |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | I wonder if the hook should allow the source and destination to be different, as we then for example could describe cases like this: $reg0 = add $frame-ptr, -13 If so, would it then make sense to move the LEA part of X86's describeLoadedValue() hook into this hook instead? |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | If so, should we perhaps also consider generalizing the hook so that it has a Destination out-parameter, e.g. same as isCopyInstr()? That could probably be helpful if we make the describeLoadedValue() hook aware of which register it should describe, as we discussed in D67225. |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 |
In fact we should only relay on situations were source and destination operands are different. Such restriction should be used at general part of describeLoadedValue(). There is no use of describing situatios like $reg0 = add $reg0, 4 This case would require recursive description of $reg0. Describing such instruction is a different story.
The LEA instruction is more complex than add immidiate instruction. It could be observed as add immidiate for one case but it could also have addition of multiple source registers with some multiplication operations. IMHO it would be better to keep this API function clean in sense of recognizing only clear add immidiate instruction for purpose of further usage.
In the first version of this function I've added a destination operand but I've removed it since there was no current use of it. But for further flexibilty I will add it. |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 |
In previous revisions of the downstream target we develop for we had to resort to: $reg0 = mov $frame-ptr $reg0 = add $reg0, $offset instead of loading the frame pointer with an offset in one instruction. Perhaps there is some upstream target that requires the same?
Is that due to the issue with expressions in collectCallSiteParameters() which we discussed earlier in this patch?
Okay, that sounds fair. Moving some parts to the LEA implementation to this hook, and keeping the rest in describeLoadedValue() would probably not be ideal.
Okay, thanks! |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 |
Yes. You are right. Such cases should be handled the way we discussed there. But until such support is provided such instructions ($reg0 = add $reg0, 4) should be omitted. I will emphasize that as TODO comment. |
-Include Destination operand in isAddImmediate()
-Exclude description of add immediate instructions where source and destination registers are the same.
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 |
This part of the comment needs to be updated to reflect that the destination and source can be any operands. | |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
5335 | nit: theirs -> their | |
test/DebugInfo/MIR/AArch64/dbgcall-site-interpretation.mir | ||
3 | I'm sorry for such a late comment about this, but if you still have the C reproducers readily available I think it would be helpful to add them to the tests (assuming that the IR or MIR haven't been substantially modified). |
Thanks for adding the source-level reproducers!
LGTM except for the some minor inline comments. I think it would be good to let the others have their say about this also.
lib/CodeGen/TargetInstrInfo.cpp | ||
---|---|---|
1138–1141 | Would it be safe to return such values without this check due to the getNumElements() == 0 check in collectCallSiteParameters()? If so, perhaps we should just consider this a limitation in collectCallSiteParameters(), and let the hook describe the values even though they currently can't be used by the caller. | |
lib/Target/AArch64/AArch64InstrInfo.cpp | ||
5821–5823 | nit: Indentation. There are some minor indentation things inside the switch case also (e.g. space before parenthesis in assert (). | |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
5335 | That typo is still here. |
lib/CodeGen/TargetInstrInfo.cpp | ||
---|---|---|
1138–1141 | It would be safe. getNumElements() == 0 would cover this case. But I wanted to explicitly document such behaviour in form of code block and TODO. Although in describeLoadedValue () we choose what we describe it may be better to move such verification to collectCallSiteParameters ()? |
lib/CodeGen/TargetInstrInfo.cpp | ||
---|---|---|
1138–1141 | Sorry for the delay! Moving the verification to collectCallSiteParameters() sounds good to me. |
-Rebase
-Move register identity transformation check to DwarfDebug::collectCallSiteParameters()
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | nit: \c Source |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | Sorry, I'm not sure what are you aiming at. Do you think I should replace the order of Source and Destination? |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | Looking at the Doxygen documentation, it appears that @ is an alias for \, i.e. a command prefix, so I think this says that Source and Destination are Doxygen commands. \c Source would print the variable with the typewriter font, which I guess was the intention here? |
include/llvm/CodeGen/TargetInstrInfo.h | ||
---|---|---|
888 | Correct. We generally prefer the \ version over @ in LLVM. |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
---|---|---|
5323 | Sorry for being late in the game. I find it weird for a function called is to have inout parameters. What about doing something along the lines of: struct DestSourcePair {const MachineOperand *Destination, *Source;}; Optional<DestSource> isAddImmediate(MI); |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
---|---|---|
5323 | That sound better. But since isCopyInstr and isCopyInstrImpl have similar signature maybe I should post such change as a separate NFC patch? |
lib/Target/ARM/ARMBaseInstrInfo.cpp | ||
---|---|---|
5323 | Sure. Either way works. |
I wonder if the hook should allow the source and destination to be different, as we then for example could describe cases like this:
If so, would it then make sense to move the LEA part of X86's describeLoadedValue() hook into this hook instead?