This is an archive of the discontinued LLVM Phabricator instance.

[Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack
ClosedPublic

Authored by lebedev.ri on Dec 23 2022, 2:57 PM.

Details

Summary

https://reviews.llvm.org/D140493 is going to teach SROA how to promote allocas
that have variably-indexed loads. That does bring up questions of cost model,
since that requires creating wide shifts.

Indeed, our legalization for them is not optimal.
We either split it into parts, or lower it into a libcall.
But if the shift amount is by a multiple of CHAR_BIT,
we can also legalize it throught stack.

The basic idea is very simple:

  1. Get a stack slot 2x the width of the shift type
  2. store the value we are shifting into one half of the slot
  3. pad the other half of the slot. for logical shifts, with zero, for arithmetic shift with signbit
  4. index into the slot (starting from the base half into which we spilled, either upwards or downwards)
  5. load
  6. split loaded integer

This works for both little-endian and big-endian machines:
https://alive2.llvm.org/ce/z/YNVwd5

And better yet, if the original shift amount was not a multiple of CHAR_BIT,
we can just shift by that remainder afterwards: https://alive2.llvm.org/ce/z/pz5G-K

I think, if we are going perform shift->shift-by-parts expansion more than once,
we should instead go through stack, which is what this patch does.

Diff Detail

Event Timeline

lebedev.ri created this revision.Dec 23 2022, 2:57 PM
lebedev.ri requested review of this revision.Dec 23 2022, 2:57 PM
lebedev.ri edited the summary of this revision. (Show Details)Dec 23 2022, 3:34 PM
arsenm added a subscriber: arsenm.Dec 23 2022, 3:38 PM

Can I convince you to repeat this for globalisel

craig.topper added inline comments.Dec 23 2022, 3:51 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
187 ↗(On Diff #485161)

Unused?

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4150

Unused variable in release builds

4150

unsigned

4158

Can these be unsigned? I'm not sure why getScalarSizeInBits returns uint64_t. The SizeInBits methods in Datalayout and Type return unsigned.

4195

Can this be SRA of Shiftee by VTBits-1?

4212

Can this be clipped by masking with AND?

lebedev.ri added inline comments.Dec 23 2022, 3:55 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4218

This is a bug, it should be sext, patch incoming.

lebedev.ri marked 6 inline comments as done.

@craig.topper thank you for taking a look!
Addressing all review notes.

Correct alive2 link: https://alive2.llvm.org/ce/z/x9gu7H

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4195

Right, i was not sure what should we emit here.
SETCC to MVT::i1 is a bit questionable this late, so let's indeed go with ashr.

4212

Right. I was just looking into that. We just need to call clampDynamicVectorIndex().

Now with a RISCV test, too.

Do conjure MachinePointerInfo.

Do conjure MachinePointerInfo.

Isnt pointerinfo for stack automatically inferred?

lebedev.ri edited the summary of this revision. (Show Details)

Looks like i'm overengineering this.
The logic is even simpler: https://alive2.llvm.org/ce/z/YNVwd5
Instead of manually picking what goes into which half of stack slot,
just extend+position the value that is being shifted,
and just spill it. That allows the slot to be always filled in-order.

Do conjure MachinePointerInfo.

Isnt pointerinfo for stack automatically inferred?

Not for variable indexes into stack slot.

lebedev.ri retitled this revision from [Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts w/ shift amount by a multiple of CHAR_BIT to [Codegen][LegalizeIntegerTypes] New legalization strategy for scalar shifts: shift through stack.
lebedev.ri edited the summary of this revision. (Show Details)

Aha, and i'm missing the point.
This *nicely* generalizes to non-CHAR_BIT-multiple shift amounts: https://alive2.llvm.org/ce/z/pz5G-K

(The patch without context in test diff's is already 7.5Mb, phab does not take patches that are larger than 8Mb.)

Actually, code changes don't need full context, and that significantly reduces patch size,
allowing for full context in test changes. Should have thought of that first.

A few header changes seem to be missing? I can't find a definition of ShiftLegalizationStrategy.

It might be worth implementing a strategy that avoids unaligned loads (by splitting the shift amount by the native register width instead of CHAR_BIT). On targets that don't have native unaligned loads, they're pretty expensive. Even on targets that do have unaligned loads, an aligned load can reduce the cost of the store forwarding stall. (But on targets with fast unaligned loads, they're probably worth using if the shift amount is known to be a multiple of CHAR_BIT.)

lebedev.ri updated this revision to Diff 486115.Jan 3 2023, 4:32 PM

Rebased.

A few header changes seem to be missing? I can't find a definition of ShiftLegalizationStrategy.

It might be worth implementing a strategy that avoids unaligned loads (by splitting the shift amount by the native register width instead of CHAR_BIT). On targets that don't have native unaligned loads, they're pretty expensive. Even on targets that do have unaligned loads, an aligned load can reduce the cost of the store forwarding stall. (But on targets with fast unaligned loads, they're probably worth using if the shift amount is known to be a multiple of CHAR_BIT.)

@efriedma thank you for taking a look!
Indeed, the fact that loads are unaligned has occurred to me.
But, this patch is already much bigger than i already planned.
Honestly, initially i only looked to handle lshr by multiple of CHAR_BIT,
and *everything* ontop of that just kinda kept being added.

I may be interested in looking into further follow-up improvements,
as long as the original patch (in general) does not result in really dragged our review,
but i really don't feel it's reasonable to do even more stuff here.
WDYT?

I won't insist on implementing everything in one patch. I was going to say we need a bailout for targets that don't have unaligned loads... but looking at llvm/test/CodeGen/RISCV/shifts.ll, I guess the previous codegen is terrible enough that it's an improvement even if we expand unaligned loads to byte loads. So I guess that's fine.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

MachinePointerInfo::getUnknownStack instead of MachinePointerInfo()? Or actually, DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR does the following:

auto FrameIndex = cast<FrameIndexSDNode>(StackPtr.getNode())->getIndex();
auto PtrInfo = MachinePointerInfo::getFixedStack(MF, FrameIndex);
lebedev.ri marked an inline comment as done.Jan 3 2023, 5:24 PM

I won't insist on implementing everything in one patch. I was going to say we need a bailout for targets that don't have unaligned loads... but looking at llvm/test/CodeGen/RISCV/shifts.ll, I guess the previous codegen is terrible enough that it's an improvement even if we expand unaligned loads to byte loads. So I guess that's fine.

That was precisely my point. It's already not great, and this does not seem to regress it.

llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

As per @craig.topper, and looking at -debug,
we get right MachinePointerInfo for stack-based
memory operations automatically. Not doing that manually
frees us from having to track alignment, too.

efriedma added inline comments.Jan 3 2023, 7:11 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

I guess optimizations kick in, sure, so you don't need to compute the exact pointer info.

But please at least use getUnknownStack(); a default-constructed MachinePointerInfo implicitly points to address-space zero.

craig.topper added inline comments.Jan 3 2023, 8:38 PM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

Isn't there a check for null pointer info in getLoad/getStore to infer the pointer info? Will that still trigger with getUnknownStack()?

efriedma added inline comments.Jan 4 2023, 10:24 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

Having an interface that says "the address-space of the store is based on the MachinePointerInfo()... unless the address refers to a frame index, in which case we ignore the MachinePointerInfo() you passed" is not at all intuitive. The fact that it works that way looks like a historical accident. Please don't abuse this behavior; if you want a more convenient interface for a store to a frame index, please add a new interface.

lebedev.ri marked 2 inline comments as done.Jan 4 2023, 10:31 AM
lebedev.ri added inline comments.
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4160

FWIW, originally i did exactly what DAGTypeLegalizer::SplitVecRes_INSERT_SUBVECTOR() did,
but @craig.topper noted that it happens automatically.

lebedev.ri marked an inline comment as done.

Undo changes that were previously requested.

@efriedma @craig.topper does anyone have any further thoughts here? Is this waiting on me? If not, i'd like to get this going and look into follow-ups.

craig.topper added inline comments.Jan 13 2023, 10:46 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4134

Is this ByteVecVT only used to make clampDynamicVectorIndex work? It won't cause vector instructions to be generated from scalar code will it?

4164

I think the alignment is incorrect on this store.

4176

Is this always going to create an AND? Trying to decide if bringing "Vector" into this made this more confusing.

4190

Nit: Use curly braces for consistency with else

4212

Does this shift end up in ExpandShiftWithKnownAmountBit because of the AND?

lebedev.ri marked 5 inline comments as done.

@craig.topper thank you for taking a look!
Addressing all review notes.

lebedev.ri added inline comments.Jan 13 2023, 11:44 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp
4134

Yes, only for clampDynamicVectorIndex().

4164

Bingo!

4212

That's the goal, yes.

craig.topper added inline comments.Jan 13 2023, 11:50 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
9352 ↗(On Diff #489087)

We can drop this now right?

lebedev.ri marked an inline comment as done.

Drop clampDynamicVectorIndex() changes.

This revision is now accepted and ready to land.Jan 13 2023, 11:29 PM

LGTM

Thank you for the review!

This revision was landed with ongoing or failed builds.Jan 14 2023, 8:13 AM
This revision was automatically updated to reflect the committed changes.