LSR sometimes changes how the operand of the ARM VCTP intrinsic is calculated. The change is correct, but it can completely block tail-predication, for instance by defining the operand inside the loop.
This patch fixes the issue by adding a new TTI hook that tells LSR to leave some instructions alone (for now, only the VCTP intrinsic on ARM).
This patch adds:
- A new TTI hook: bool canLSRFixupInstruction(Instruction *I), which returns false when LSR shouldn't change I's operands.
- A new function in LSR called FilterOutUndesirableUses, which calls the new TTI hook on the UserInst of every LSRFixup in each LSRUse, and deletes the LSRUse if the hook returns false for any of those instructions.
- An implementation of this TTI hook for ARM, which returns false for VCTP intrinsics.
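
To make the filtering rule concrete, here is a minimal standalone sketch of the logic described in the list above. It is not the code from this patch: the types `Instruction`, `LSRFixup`, `LSRUse`, `TargetInfo` and `ARMTargetInfo` are simplified, hypothetical stand-ins for the LLVM classes, and the `IsVCTP` flag is an assumption that replaces the real intrinsic check. Only the hook name and the "drop the whole use if any fixup is rejected" rule come from the description.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT LLVM's classes.
struct Instruction {
  std::string Name;
  bool IsVCTP; // assumption: stands in for "is a call to an ARM VCTP intrinsic"
};

struct LSRFixup {
  const Instruction *UserInst; // the instruction whose operand LSR would rewrite
};

struct LSRUse {
  std::vector<LSRFixup> Fixups;
};

// Stand-in for the TTI hook: the default lets LSR fix up any instruction.
struct TargetInfo {
  virtual ~TargetInfo() = default;
  virtual bool canLSRFixupInstruction(const Instruction *) const {
    return true;
  }
};

// Stand-in for the ARM implementation: refuse fixups on VCTP.
struct ARMTargetInfo : TargetInfo {
  bool canLSRFixupInstruction(const Instruction *I) const override {
    return !I->IsVCTP;
  }
};

// Remove every use that has at least one fixup the target rejects.
void filterOutUndesirableUses(std::vector<LSRUse> &Uses,
                              const TargetInfo &TTI) {
  auto IsUndesirable = [&TTI](const LSRUse &LU) {
    return std::any_of(LU.Fixups.begin(), LU.Fixups.end(),
                       [&TTI](const LSRFixup &F) {
                         return !TTI.canLSRFixupInstruction(F.UserInst);
                       });
  };
  Uses.erase(std::remove_if(Uses.begin(), Uses.end(), IsUndesirable),
             Uses.end());
}
```

In this sketch a use is dropped as a whole rather than having only the offending fixup removed, which matches the behaviour described for FilterOutUndesirableUses: one rejected user instruction is enough to take the entire use out of LSR's consideration.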
Note that I'm unsure about these changes. Do you feel like this is an appropriate fix for this issue?
Allowing LSR to do its thing and fixing it ourselves later in a backend pass is tricky and fragile, so I personally feel that fixing the problem in LSR directly is the best course of action.
Is this necessary? If this is executed in the loop, then I don't see the value in optimising the first iteration.