This is an archive of the discontinued LLVM Phabricator instance.

[MachineScheduler] Allow clustering mem ops with complex addresses
ClosedPublic

Authored by foad on Dec 18 2019, 4:27 AM.

Details

Summary

The generic BaseMemOpClusterMutation calls into TargetInstrInfo to
analyze the address of each load/store instruction, and again to decide
whether two instructions should be clustered. Previously this had to
represent each address as a single base operand plus a constant byte
offset. This patch extends it to support any number of base operands.

The old target hook getMemOperandWithOffset is now a convenience
function for callers that are only prepared to handle a single base
operand. It calls the new more general target hook
getMemOperandsWithOffset.

The only requirements for the base operands returned by
getMemOperandsWithOffset are:

  • they can be sorted by MemOpInfo::Compare, such that clusterable ops

get sorted next to each other, and

  • shouldClusterMemOps knows what they mean.

One simple follow-on is to enable clustering of AMDGPU FLAT instructions
with both vaddr and saddr (base register + offset register). I've left
a FIXME in the code for this case.

Diff Detail

Event Timeline

foad created this revision.Dec 18 2019, 4:27 AM
foad added a comment.Dec 20 2019, 4:29 AM

The old target hook TargetInstrInfo::getMemOperandWithOffset has been
renamed and split into three:

getMemBaseRegWithOffset: returns a base register plus byte offset,
default implementation calls...

getMemBaseOpWithOffset: returns a base operand (register or frame index)
plus byte offset, default implementation calls...

[...]

I could do the renaming and splitting into getMemBaseRegWithOffset and getMemBaseOpWithOffset as a preparatory patch if that would make the review easier. It's mechanical and uncontroversial (I think). The only reason for reintroducing the getMemBaseRegWithOffset variant (which is all we had prior to D54846, when it was called getMemOpBaseRegImmOfs) is that it simplifies some call sites.

LGTM for AMDGPU part, but you need review for each affected target.

I've wanted this for a long time

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1267

Why doesn't this one return bool for failure?

foad marked an inline comment as done.Dec 23 2019, 2:45 AM
foad added inline comments.
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1267

Returning without adding anything to AddrOps signifies failure. I'm happy to change it to return a bool if you think it'd be clearer.

I've been pondering other changes to this API like:

  • Returning the offset in bytes as an int64_t, like the getMemBase* functions do. Then AddrOps would only be for base (register/FI) operands.
  • Also return the width in bytes of the memory access. Several targets already have internal functions that do this.
foad updated this revision to Diff 236830.Jan 8 2020, 7:53 AM

New simpler approach. This version only introduces one new target hook,
getMemOperandsWithOffset, and leaves the old getMemOperandWithOffset as
a convenience function. It also leaves the immediate Offset as an
int64_t value in bytes, instead of trying to represent it as one or more
MachineOperands.

Adding target owners to review the (mostly mechanical) changes to affected targets.

foad marked an inline comment as done.Jan 8 2020, 7:57 AM
arsenm added a comment.Jan 8 2020, 8:17 AM

Mostly LGTM, except this diff seems to have absorbed a number of recent changes into the diff

llvm/include/llvm/CodeGen/TargetInstrInfo.h
1748

Somehow absorbed some of a diff from a few weeks ago

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2035–2038

Fold into return BaseOp->isReg() || BaseOp->isFI();

5880–5888

Looks like an unrelated change?

Unit tests: pass. 61316 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

foad added a comment.Jan 8 2020, 8:58 AM

Mostly LGTM, except this diff seems to have absorbed a number of recent changes into the diff

I forgot to mention that the new patch has also been rebased on current master, so if you're looking at the diffs between patch 1 and patch 2 then you'll see lots of spurious stuff.

foad edited the summary of this revision. (Show Details)Jan 8 2020, 9:01 AM
foad added a reviewer: MatzeB.Jan 8 2020, 9:05 AM

LG for Lanai changes too

lenary removed a subscriber: lenary.Jan 20 2020, 7:07 AM
arsenm accepted this revision.Jan 20 2020, 5:18 PM

LGTM

This revision is now accepted and ready to land.Jan 20 2020, 5:18 PM
This revision was automatically updated to reflect the committed changes.