This is an archive of the discontinued LLVM Phabricator instance.

[ARM][AArch64][DebugInfo] Improve call site instruction interpretation
ClosedPublic

Authored by NikolaPrica on Sep 13 2019, 8:27 AM.

Details

Summary

Extend describeLoadedValue with support for target specific ARM and AArch64 instruction interpretation. Some of the instructions can operate with global string addresses or constant pool indexes but such cases are omitted since we currently lack flexible support for processing such operands at dwarf generating backed.

Diff Detail

Event Timeline

NikolaPrica created this revision.Sep 13 2019, 8:27 AM

In general, is it really safe to describe loaded values at the call sites?

For example, consider the following program:

caller.c:

#include <stdint.h>

extern int callee(uint64_t);
extern void escape(uint64_t *);

uint64_t value = 0x1234;

int main() {
  uint64_t local = value;
  escape(&local);
  callee(local);
  return 0;
}

callee.c:

#include <stdint.h>

extern void clobber(void);

static uint64_t *ptr;

void escape(uint64_t *p) {
  ptr = p;
}

int callee(uint64_t p) {
  *ptr = 0xdead;
  clobber();
  // Print parameter here.
  return 0;
}

compiled using -O3 -g -Xclang -femit-debug-entry-values --target=aarch64 results in the following call site entry for callee's parameter:

0x000000cf:       DW_TAG_GNU_call_site_parameter
                    DW_AT_location	(DW_OP_reg0 W0)
                    DW_AT_GNU_call_site_value	(DW_OP_breg31 WSP+1, DW_OP_deref)

In this case it seems that the parameter would be printed incorrectly due to deref'ing local, which has been modified after the call to callee began? I unfortunately don't have any ARM machine/qemu up and running, so I can't verify this.

(A side-comment, but shouldn't the offset be 8 here? The value corresponds to a ldr x0, [sp, #8].)

I think that the same concern holds about the MI.hasOneMemOperand() part of TargetInstrInfo's implementation of describeLoadedValue().

The DWARF standard (Sec. 3.4.2) states that call site values should be omitted if they may be clobbered by the call:

If it is not possible to avoid registers or memory locations that might be clobbered by the call in the expression, then the DW_AT_call_value attribute should not be provided. The reason for the restriction is that the value of the parameter may be needed in the midst of the callee, where the call clobbered registers or memory might be already clobbered, and if the consumer is not assured by the producer it can safely use those values, the consumer can not safely use the values at all.

Or am I overlooking something here?

include/llvm/CodeGen/TargetInstrInfo.h
891 ↗(On Diff #220110)

spelling: Immediate

lib/Target/ARM/ARMBaseInstrInfo.cpp
5327 ↗(On Diff #220110)

nit: describe

5348 ↗(On Diff #220110)

nit: also

-Address spelling issues.
-Dispatch memory loading instruction descriptions.

@dstenb you are right! I've tried this example and it is as you expected. We should avoid dereferencing memory locations. I will address those issues in separate patch.

-Remove unnecessary MIR test elements.

-Remove redundant comment.

vsk added a comment.Sep 17 2019, 2:17 PM

@dstenb you are right! I've tried this example and it is as you expected. We should avoid dereferencing memory locations. I will address those issues in separate patch.

@dstenb @NikolaPrica, thanks for sharing, that's a great example. I've filed llvm.org/PR43343 so we can track the issue and discuss solutions.

dstenb added inline comments.Sep 19 2019, 3:56 AM
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
682–684 ↗(On Diff #220487)

Any sense of what it would take to add support for such entry values? Would it be enough to store the expression alongside the new register in the work list, and then concatenate the two expressions when finding an instruction that describes the new register? Or must we consider how the expressions look in some way?

NikolaPrica marked an inline comment as done.Sep 19 2019, 6:47 AM
NikolaPrica added inline comments.
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
682–684 ↗(On Diff #220487)

I believe that such pair of register and expression would be enough for generating new entry value. For now, there are simple cases such as DW_OP_plus/minus over the parameter forwarding registers. This new restriction came from instructions such as "$w0 = nsw ADDWri $w0, 2, 0".

dstenb added inline comments.Sep 30 2019, 4:22 AM
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

I wonder if the hook should allow the source and destination to be different, as we then for example could describe cases like this:

$reg0 = add $frame-ptr, -13

If so, would it then make sense to move the LEA part of X86's describeLoadedValue() hook into this hook instead?

dstenb added inline comments.Oct 3 2019, 8:05 AM
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

If so, should we perhaps also consider generalizing the hook so that it has a Destination out-parameter, e.g. same as isCopyInstr()? That could probably be helpful if we make the describeLoadedValue() hook aware of which register it should describe, as we discussed in D67225.

NikolaPrica marked an inline comment as done.Oct 7 2019, 5:20 AM
NikolaPrica added inline comments.
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

I wonder if the hook should allow the source and destination to be different

In fact we should only relay on situations were source and destination operands are different. Such restriction should be used at general part of describeLoadedValue(). There is no use of describing situatios like

$reg0 = add $reg0, 4

This case would require recursive description of $reg0. Describing such instruction is a different story.

If so, would it then make sense to move the LEA part of X86's describeLoadedValue() hook into this hook instead?

The LEA instruction is more complex than add immidiate instruction. It could be observed as add immidiate for one case but it could also have addition of multiple source registers with some multiplication operations. IMHO it would be better to keep this API function clean in sense of recognizing only clear add immidiate instruction for purpose of further usage.

If so, should we perhaps also consider generalizing the hook so that it has a Destination out-parameter, e.g. same as isCopyInstr()? That could probably be helpful if we make the describeLoadedValue() hook aware of which register it should describe, as we discussed in D67225.

In the first version of this function I've added a destination operand but I've removed it since there was no current use of it. But for further flexibilty I will add it.

dstenb added inline comments.Oct 7 2019, 6:53 AM
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

In fact we should only relay on situations were source and destination operands are different. Such restriction should be used at general part of describeLoadedValue(). There is no use of describing situatios like

$reg0 = add $reg0, 4

In previous revisions of the downstream target we develop for we had to resort to:

$reg0 = mov $frame-ptr
$reg0 = add $reg0, $offset

instead of loading the frame pointer with an offset in one instruction. Perhaps there is some upstream target that requires the same?

This case would require recursive description of $reg0. Describing such instruction is a different story.

Is that due to the issue with expressions in collectCallSiteParameters() which we discussed earlier in this patch?

The LEA instruction is more complex than add immidiate instruction. It could be observed as add immidiate for one case but it could also have addition of multiple source registers with some multiplication operations. IMHO it would be better to keep this API function clean in sense of recognizing only clear add immidiate instruction for purpose of further usage.

Okay, that sounds fair. Moving some parts to the LEA implementation to this hook, and keeping the rest in describeLoadedValue() would probably not be ideal.

In the first version of this function I've added a destination operand but I've removed it since there was no current use of it. But for further flexibilty I will add it.

Okay, thanks!

NikolaPrica marked an inline comment as done.Oct 7 2019, 7:13 AM
NikolaPrica added inline comments.
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

Is that due to the issue with expressions in collectCallSiteParameters() which we discussed earlier in this patch?

Yes. You are right. Such cases should be handled the way we discussed there. But until such support is provided such instructions ($reg0 = add $reg0, 4) should be omitted. I will emphasize that as TODO comment.

-Include Destination operand in isAddImmediate()
-Exclude description of add immediate instructions where source and destination registers are the same.

dstenb added inline comments.Oct 9 2019, 1:36 AM
include/llvm/CodeGen/TargetInstrInfo.h
888 ↗(On Diff #220487)

[...] adds an immediate value to its first operand and stores it in the first, [...]

This part of the comment needs to be updated to reflect that the destination and source can be any operands.

lib/Target/ARM/ARMBaseInstrInfo.cpp
5335 ↗(On Diff #223848)

nit: theirs -> their

test/DebugInfo/MIR/AArch64/dbgcall-site-interpretation.mir
2 ↗(On Diff #223848)

I'm sorry for such a late comment about this, but if you still have the C reproducers readily available I think it would be helpful to add them to the tests (assuming that the IR or MIR haven't been substantially modified).

-Update addImmediate() commet.
-Add source code producer for test case.

Any additional comments?

Thanks for adding the source-level reproducers!

LGTM except for the some minor inline comments. I think it would be good to let the others have their say about this also.

lib/CodeGen/TargetInstrInfo.cpp
1137–1140 ↗(On Diff #224077)

Would it be safe to return such values without this check due to the getNumElements() == 0 check in collectCallSiteParameters()? If so, perhaps we should just consider this a limitation in collectCallSiteParameters(), and let the hook describe the values even though they currently can't be used by the caller.

lib/Target/AArch64/AArch64InstrInfo.cpp
5698–5700 ↗(On Diff #224077)

nit: Indentation. There are some minor indentation things inside the switch case also (e.g. space before parenthesis in assert ().

lib/Target/ARM/ARMBaseInstrInfo.cpp
5335 ↗(On Diff #223848)

That typo is still here.

NikolaPrica marked an inline comment as done.

-AArch64InstrInfo::addImmediate() indentation

Thanks for useful comments @dstenb!

NikolaPrica marked an inline comment as not done.Oct 14 2019, 8:33 AM
NikolaPrica added inline comments.
lib/CodeGen/TargetInstrInfo.cpp
1137–1140 ↗(On Diff #224077)

It would be safe. getNumElements() == 0 would cover this case. But I wanted to explicitly document such behaviour in form of code block and TODO. Although in describeLoadedValue () we choose what we describe it may be better to move such verification to collectCallSiteParameters ()?

dstenb added inline comments.Oct 16 2019, 5:48 AM
lib/CodeGen/TargetInstrInfo.cpp
1137–1140 ↗(On Diff #224077)

Sorry for the delay!

Moving the verification to collectCallSiteParameters() sounds good to me.

-Rebase
-Move register identity transformation check to DwarfDebug::collectCallSiteParameters()

aprantl added inline comments.Oct 17 2019, 4:34 PM
include/llvm/CodeGen/TargetInstrInfo.h
947 ↗(On Diff #225430)

nit: \c Source

NikolaPrica marked an inline comment as done.Oct 22 2019, 1:45 AM
NikolaPrica added inline comments.
include/llvm/CodeGen/TargetInstrInfo.h
947 ↗(On Diff #225430)

Sorry, I'm not sure what are you aiming at. Do you think I should replace the order of Source and Destination?

dstenb added inline comments.Oct 23 2019, 5:34 AM
include/llvm/CodeGen/TargetInstrInfo.h
947 ↗(On Diff #225430)

Looking at the Doxygen documentation, it appears that @ is an alias for \, i.e. a command prefix, so I think this says that Source and Destination are Doxygen commands. \c Source would print the variable with the typewriter font, which I guess was the intention here?

aprantl added inline comments.Oct 23 2019, 12:13 PM
include/llvm/CodeGen/TargetInstrInfo.h
947 ↗(On Diff #225430)

Correct. We generally prefer the \ version over @ in LLVM.

NikolaPrica marked an inline comment as not done.

@ -> \c

aprantl added inline comments.Oct 25 2019, 8:36 AM
lib/Target/ARM/ARMBaseInstrInfo.cpp
5355 ↗(On Diff #226415)

Sorry for being late in the game. I find it weird for a function called is to have inout parameters. What about doing something along the lines of:

struct DestSourcePair {const MachineOperand *Destination, *Source;};
Optional<DestSource> isAddImmediate(MI);
NikolaPrica marked an inline comment as done.Oct 27 2019, 11:59 PM
NikolaPrica added inline comments.
lib/Target/ARM/ARMBaseInstrInfo.cpp
5355 ↗(On Diff #226415)

That sound better. But since isCopyInstr and isCopyInstrImpl have similar signature maybe I should post such change as a separate NFC patch?

aprantl added inline comments.Oct 28 2019, 12:31 PM
lib/Target/ARM/ARMBaseInstrInfo.cpp
5355 ↗(On Diff #226415)

Sure. Either way works.

NikolaPrica marked an inline comment as not done.Oct 29 2019, 9:31 AM

Is this good to go now?

vsk accepted this revision.Oct 29 2019, 10:11 AM

Thanks, lgtm. This seems to be in great shape.

This revision is now accepted and ready to land.Oct 29 2019, 10:11 AM
dstenb accepted this revision.Oct 29 2019, 11:48 AM

Yes, thanks!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptOct 30 2019, 6:04 AM
Herald added a subscriber: hiraditya. · View Herald Transcript