Page MenuHomePhabricator

[LSR][ARM] Add new TTI hook to mark some LSR chains as profitable

Authored by Pierre-vh on May 5 2020, 7:06 AM.



Sometimes, LSR may change the way the ARM VCTP intrinsic's operand is calculated, and the change, while correct, will completely block tail-predication, for instance by defining the operand inside the loop.
This patch aims to fix this issue by adding a new TTI hook to tell LSR to ignore some instructions (for now, only the VCTP intrinsic on ARM).

This patch adds:

  • A new TTI hook: bool canLSRFixupInstruction(Instruction *I), which returns false when LSR shouldn't change I's operands.
  • A new function in LSR called FilterOutUndesirableUses, which calls this new TTI hook on every LSRUse's LSRFixup UserInst, and deletes the LSRUse if the hook returns false for one of the instructions.
  • An impl of this TTI hook for ARM, which returns false for VCTP intrinsics.

Note that I'm unsure about these changes. Do you feel like this is an appropriate fix for this issue?
Allowing LSR to do its thing and fixing it ourselves later in a backend pass is tricky and fragile, so I personally feel that fixing the problem in LSR directly is the best course of action.

Diff Detail

Event Timeline

Pierre-vh created this revision.May 5 2020, 7:06 AM
samparker added inline comments.May 5 2020, 11:32 PM

How about doing the filtering straight away, even before 'OptimizeShadowIV'? Can we just remove the instruction before we generate any formulae?

Pierre-vh updated this revision to Diff 262637.May 7 2020, 6:38 AM
Pierre-vh retitled this revision from [LSR][ARM] Add new TTI hook to discard unwanted LSRUses to [LSR][ARM] Add new TTI hook to mark some LSR chains as profitable.

Changing the implementation of the patch.

  • TTI hook renamed to isProfitableLSRChainElement
    • It now returns true for the VCTP
  • Removed FilterOutUndesirableUses
  • Now isProfitableLSRChainElement is called in LSR's isProfitableChain function. If it returns true for one of the chain's UserInst, the chain will be considered profitable and will not be optimized by LSR.

Looks good. Would you mind also adding a codegen test so we can see this going through the whole pipeline, successfully being tail predicated?


Is this necessary? If this is executed in the loop then I don't see the worth of optimising the first iteration.

Pierre-vh updated this revision to Diff 263435.May 12 2020, 7:36 AM
Pierre-vh marked 2 inline comments as done.
  • Simplified the vctp-chains.ll test: removed useless attributes
  • Added codegen test to test the whole pipeline (based on the vctpi32 test from vctp-chains.ll)

It's not executed in the loop below, that one skips the first element, so the first element has to be handled separately.

See IVChain's implementation:

struct IVChain {
  SmallVector<IVInc, 1> Incs;
   /* ... */

  using const_iterator = SmallVectorImpl<IVInc>::const_iterator;

  // Return the first increment in the chain.
  const_iterator begin() const {
    return std::next(Incs.begin());
  const_iterator end() const {
    return Incs.end();

  /* ... */
samparker accepted this revision.May 12 2020, 11:39 PM

LGTM, cheers.

This revision is now accepted and ready to land.May 12 2020, 11:39 PM
This revision was automatically updated to reflect the committed changes.