This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add custom isel for (add X, imm) used by load/stores.
ClosedPublic

Authored by craig.topper on May 27 2022, 3:48 PM.

Details

Summary

If the imm is out of range for an ADDI, we will materialize it in
a register using multiple instructions. If the ADD is used by a
load/store, doPeepholeLoadStoreADDI can try to pull an ADDI from
the constant materialization into the load/store offset. This only
works if the ADD has a single use, otherwise the peephole would have
to rebuild multiple nodes.

This patch instead tries to solve the problem when the add is selected.
We check that the add is only used by loads/stores and if it is
we will select it to (ADDI (ADD X, Imm-Lo12), Lo12). This will enable
the simple case in doPeepholeLoadStoreADDI that can bypass an ADDI
used as a pointer. As a result we can remove the more complicated
peephole from doPeepholeLoadStoreADDI.

Diff Detail

Event Timeline

craig.topper created this revision.May 27 2022, 3:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 3:48 PM
craig.topper requested review of this revision.May 27 2022, 3:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2022, 3:48 PM

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

Where do you propose to canonicalize it? Another peephole between isel and doPeepholeLoadStoreADDI?

StephenFan added inline comments.May 29 2022, 9:34 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
703

(ADDI (ADD X, Imm-Hi), Imm-Lo12) ?

craig.topper added inline comments.May 29 2022, 11:10 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
703

That - was meant as subtraction.

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

Where do you propose to canonicalize it? Another peephole between isel and doPeepholeLoadStoreADDI?

It turns out I had a wrong mental model on SDAG handling of constants. I had been thinking we'd expanded the constant materialization sequences during legalization; looking at an example with debug output, this turns out not to be the case. As such, my assumption that this was just a DAGCombine rule is off base.

Given that, yeah, the structure would seem to require a separate peephole between ISEL and the current address matching. ("between" might be slightly the wrong word here as we could probably fixed point the later two.)

Just a thought on an alternate approach. Feel free to ignore.

Given:
addi a2, a2, Low12C
add a2, a0, a2

Is there any reason we shouldn't canonicalize toward:
add a2, a0, a2
addi a2, a2, Low12C

That is, try to push the add with immediate towards the users? (Assume a one use restriction on the original addi.)

This isn't an optimization per se, but if we could treat it as a canonicalization, I think it simplifies the address matching problem significantly.

Where do you propose to canonicalize it? Another peephole between isel and doPeepholeLoadStoreADDI?

It turns out I had a wrong mental model on SDAG handling of constants. I had been thinking we'd expanded the constant materialization sequences during legalization; looking at an example with debug output, this turns out not to be the case. As such, my assumption that this was just a DAGCombine rule is off base.

Given that, yeah, the structure would seem to require a separate peephole between ISEL and the current address matching. ("between" might be slightly the wrong word here as we could probably fixed point the later two.)

I'm not sure it would significantly simplify much over this patch. I would still want to check the memory folding opportunity before moving the ADDI across the ADD. Some targets may support LUI+ADDI macrofusion so we shouldn't split those up unless the ADDI is guaranteed to be removed.

Doing it as part of isel means we strip the lower bits off before constant materialization instead of needing to pattern match the LUI+ADDIW in a peephole. Though I guess maybe that could be avoided if we used LUI+ADDI instead of LUI+ADDIW when possible.

reames accepted this revision.Jun 2 2022, 12:06 PM

LGTM

Not entirely thrilled with this, but don't want perfection to be the enemy of the good here. We can take this and continue to think about other approaches to the problem.

This revision is now accepted and ready to land.Jun 2 2022, 12:06 PM

LGTM

Not entirely thrilled with this, but don't want perfection to be the enemy of the good here. We can take this and continue to think about other approaches to the problem.

Part of me wonders if we should move load/store addressing match to ComplexPat that finds the register and offset. Similar to how we do address matching on X86. I think that would allow us to remove the late peephole. Does that sound like a direction I should investigate?

LGTM

Not entirely thrilled with this, but don't want perfection to be the enemy of the good here. We can take this and continue to think about other approaches to the problem.

Part of me wonders if we should move load/store addressing match to ComplexPat that finds the register and offset. Similar to how we do address matching on X86. I think that would allow us to remove the late peephole. Does that sound like a direction I should investigate?

Honestly, not sure. I don't yet have enough context to have a gut feel to this.

This revision was landed with ongoing or failed builds.Jun 2 2022, 1:45 PM
This revision was automatically updated to reflect the committed changes.