Page MenuHomePhabricator

[2/4] [MC][NFC] Add an optional PreviousInst argument to MCInstrAnalysis::evaluateBranch
Needs ReviewPublic

Authored by asb on Jan 5 2022, 10:33 AM.

Details

Summary

evaluateBranch currently takes a single instruction. This works for most
targets, but is insufficient for e.g. auipc+jalr pairs on RISC-V. This
commit adds a new optional argument for the previous instruction,
allowing these targets to implement evaluateBranch for such instruction
pairs.

This patch simply adds the argument, and modifies llvm-objdump to pass
it.

This isn't a pretty solution, so if you have alternate proposals I'd be keen to discuss.

Diff Detail

Event Timeline

asb created this revision.Jan 5 2022, 10:33 AM
asb requested review of this revision.Jan 5 2022, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 5 2022, 10:34 AM

Checking the consecutive pair of instructions likely works for the majority of cases for some architectures. It fails when a RISC architecture interleaves the high and low parts with an unrelated instruction. This pattern is pretty common on AArch64.
Ideally that case should be handled as well.

% cat a.c
int a, b;
int foo() { return a + b; }
% clang --target=aarch64-linux-gnu -nostdlib -fuse-ld=lld -O1 a.c -o a
% llvm-objdump -d a
...
000000000021022c <foo>:
  21022c: 88 00 00 90   adrp    x8, 0x220000 <foo+0x40>
  210230: 89 00 00 90   adrp    x9, 0x220000 <foo+0x44>
  210234: 08 45 42 b9   ldr     w8, [x8, #580]
  210238: 29 49 42 b9   ldr     w9, [x9, #584]
  21023c: 20 01 08 0b   add     w0, w9, w8
  210240: c0 03 5f d6   ret

To obtain the addresses of a and b, we need to have states, which can be in MCInstrAnalysis itself.
llvm-objdump output is a disassembly dead listing. It is not a recursive disassembler and there is no basic block information.
So something may be fundamentally difficult to do, e.g. we may not reliably clear states when a new instruction is at the start of a new basic block...

asb added a comment.Jan 13 2022, 3:24 AM

Thanks for the review @MaskRay, I had thought about options such as adding state to MCInstrAnalysis, but perhaps was too quick to dismiss it (obviously partly influenced by the fact that the current RISC-V backend doesn't seem to generate output where just checking instruction pairs doesn't work!). As you suggest, an issue will be around tracking/clearing state upon basic block boundaries, but perhaps a simple heuristic will work well enough.

So perhaps:

  • Keep the existing evaluateBranch interface (i.e. pass a single MCInst)
  • Add a reset() method to MCInstrAnalysis to allow users to reset state as desired
  • Adding state to RISCVMCInstrAnalysis, and having evaluateBranch manipulate it.
  • Have evaluateBranch reset the state whenever encountering a control flow instruction. This is imprecise - it will be overly conservative in some potential cases and too restrictive in others, but hopefully on real-world code it means simple cases are appropriately annotated, and there are few/no instances of confusing, incorrect output.

What do you think? (Especially on the last bulletpoint)?

Thanks for the review @MaskRay, I had thought about options such as adding state to MCInstrAnalysis, but perhaps was too quick to dismiss it (obviously partly influenced by the fact that the current RISC-V backend doesn't seem to generate output where just checking instruction pairs doesn't work!). As you suggest, an issue will be around tracking/clearing state upon basic block boundaries, but perhaps a simple heuristic will work well enough.

So perhaps:

  • Keep the existing evaluateBranch interface (i.e. pass a single MCInst)
  • Add a reset() method to MCInstrAnalysis to allow users to reset state as desired
  • Adding state to RISCVMCInstrAnalysis, and having evaluateBranch manipulate it.
  • Have evaluateBranch reset the state whenever encountering a control flow instruction. This is imprecise - it will be overly conservative in some potential cases and too restrictive in others, but hopefully on real-world code it means simple cases are appropriately annotated, and there are few/no instances of confusing, incorrect output.

What do you think? (Especially on the last bulletpoint)?

Sounds good!