This is an archive of the discontinued LLVM Phabricator instance.

[X86] Memory folding for commutative instructions.
ClosedPublic

Authored by RKSimon on Oct 9 2014, 6:04 AM.

Details

Summary

This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning.

This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.) but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.

Unlike the X86InstrInfo::optimizeLoadInstr implementation it uses findCommutedOpIndices instead of hard coded commute operand indices.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 14650.Oct 9 2014, 6:04 AM
RKSimon retitled this revision from to [X86] Memory folding for commutative instructions..
RKSimon updated this object.
RKSimon edited the test plan for this revision. (Show Details)
RKSimon added reviewers: nadav, andreadb, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added subscribers: alexr, Unknown Object (MLST).
RKSimon updated this revision to Diff 14665.Oct 9 2014, 11:29 AM

Tidied up check for commutative instructions - we can avoid isCommutable and use findCommutedOpIndices directly and then fold anything that is commutable under certain cases (e.g. FMA instructions).

Added the missing test case.

Hi Simon

lib/Target/X86/X86InstrInfo.cpp
4218 ↗(On Diff #14665)

I am not sure this is what we want generally speaking.
Indeed, if you look at the code you modified earlier (BTW having the full context would be easier for the review), we may not want to keep the commuted instruction unless the commute is in place:

MachineInstr *NewMI = commuteInstruction(MI, false);
Unable to commute.
if (!NewMI) return 0;
if (NewMI != MI) {
<——— here if the instruction is different we do not commute.

// New instruction. It doesn't need to be kept.
NewMI->eraseFromParent();
return 0;

}
Should we provide more arguments to have a finer control?

4219 ↗(On Diff #14665)

This assert is not valid.
This is legal for commuteInstruction to return nullptr.
Make an early exit here.
See r208371 for more details.

4228 ↗(On Diff #14665)

Ditto.

4218 ↗(On Diff #14650)

I am not sure this is what we want generally speaking.
Indeed, if you look at the code you modified earlier (BTW having the full context would be easier for the review), we may not want to keep the commuted instruction unless the commute is in place:

MachineInstr *NewMI = commuteInstruction(MI, false);
// Unable to commute.
if (!NewMI) return 0;
if (NewMI != MI) { // <——— here if the instruction is different we do not commute.
  // New instruction. It doesn't need to be kept.
  NewMI->eraseFromParent();
  return 0;
}

Should we provide more arguments to have a finer control?

4219 ↗(On Diff #14650)

This assert is not valid.
This is legal for commuteInstruction to return nullptr.
Make an early exit here.
See r208371 for more details.

4228 ↗(On Diff #14650)

Ditto.

RKSimon updated this revision to Diff 14722.Oct 10 2014, 6:56 AM

Hi Quentin,

I've added error handling code for commuteInstruction returning nullptr or a new MachineInstr*, both before and after the commuted folding attempt. This appears to be enough and we don't need to alter either the commute or folding call arguments to support it - we can keep to in-place instruction commutes only.

Simon.

qcolombet accepted this revision.Oct 10 2014, 9:35 AM
qcolombet added a reviewer: qcolombet.

Hi Simon,

LGTM.

As a side question, do you see any performance difference with that patch?

Thanks,
-Quentin

This revision is now accepted and ready to land.Oct 10 2014, 9:35 AM

As a side question, do you see any performance difference with that patch?

Thanks Quentin, on Jaguar I'm seeing a definite gain on (VEX encoded) SSE heavy code loops (physics, animation and vectormath), completely due to stack reload folding. On my Sandy Bridge Xeon the difference is a lot smaller / negligible. Although minor there are also improvements to code size / instruction packing.

Simon.

RKSimon closed this revision.Oct 12 2014, 4:03 AM
RKSimon updated this revision to Diff 14775.

Closed by commit rL219584 (authored by @RKSimon).

Excuse me, I have reverted this in r219595.
It broke i686 builders.

I'll send a reproducible testcase.ll later.