This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Account for implicit IT when calculating inline asm size
ClosedPublic

Authored by peter.smith on Oct 3 2018, 9:37 AM.

Details

Summary

When deciding if it is safe to optimize a conditional branch to a CBZ or CBNZ the offsets of the BasicBlocks from the start of the function are estimated. For inline assembly the generic getInlineAsmLength() function is used to get a worst case estimate of the inline assembly by multiplying the number of instructions by the max instruction size of 4 bytes. This unfortunately doesn't take into account the generation of Thumb implicit IT instructions. In edge cases such as when all the instructions in the block are 4-bytes in size and there is an implicit IT then the size is underestimated. This can cause an out of range CBZ or CBNZ to be generated.

The patch takes a conservative approach and assumes that every instruction in the inline assembly block may have an implicit IT.

Fixes pr31805 https://bugs.llvm.org/show_bug.cgi?id=31805 which seems to be intermittently causing the Linux Kernel to fail to build on Arm.

Possible alternatives:

  • Be less conservative and a small constant, say 4-bytes to account for any overspill from implicit IT creation, this could still fail but would probably work well in practice.
  • The Arm MaxInstLength could be increased to 6, but this may not be right for Arm as there is some code in the ConstantIslands pass that assumes 4-byte alignment of Arm instructions.
  • Reimplement getInlineAsmLength() or some similar function to estimate how many implicit IT instructions are needed. Doing this accurately may require a lot of the functionality of an assembly parser though.

Diff Detail

Repository
rL LLVM

Event Timeline

peter.smith created this revision.Oct 3 2018, 9:37 AM

The Arm MaxInstLength could be increased to 6

This seems like a better solution.

but this may not be right for Arm as there is some code in the ConstantIslands pass that assumes 4-byte alignment of Arm instructions.

We can just round up the size of an inline asm to a multiple of 4 if the function is in ARM mode.

lib/Target/ARM/ARMComputeBlockSize.cpp
62 ↗(On Diff #168125)

Doing the adjustment *here* is pretty clearly wrong; getInstSizeInBytes is supposed to return a conservative estimate of the instruction size. If it isn't conservative enough, we should fix it *there*, rather than try to hack around it afterwards.

Thanks for the comments. I've made an attempt at a diff where the MaxInstLength is increased to 6. This moves all the significant code to getInstSizeInBytes(). I've applied the change to all Targets (Elf, Macho, Windows ...) as it is possible to use implicit-IT on all of them.

Code changes seem fine.

test/CodeGen/ARM/cbz-implicit-it-range.ll
2 ↗(On Diff #168278)

The second RUN line here doesn't actually run anything?

Updated diff to make the second run: command actually run and test. My apologies for missing that.

This revision is now accepted and ready to land.Oct 5 2018, 12:55 PM
This revision was automatically updated to reflect the committed changes.