This is an archive of the discontinued LLVM Phabricator instance.

GlobalISel: Lower funnel shifts
ClosedPublic

Authored by arsenm on Mar 20 2020, 7:59 AM.

Details

Summary

128-bit cases are broken, tests need completion and fixes, and divide-by-constant optimizations are missing (these would also apply after the legalizer, not during as would be required)

Diff Detail

Event Timeline

arsenm created this revision.Mar 20 2020, 7:59 AM
foad added a subscriber: foad.Mar 20 2020, 8:17 AM
foad added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5104–5105

This won't work when Z is 0 if shifting by BW is undefined.

arsenm marked an inline comment as done.Mar 20 2020, 8:44 AM
arsenm added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5104–5105

That's handled below but not included in the high level comment

foad added inline comments.Mar 20 2020, 9:23 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

It would be slicker to implement (X << (BW - (Z % BW))) as (X << 1 << ((BW - 1) - (Z % BW))).

arsenm marked an inline comment as done.Mar 30 2020, 11:45 AM
arsenm added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

I copied this from the DAG version. Maybe that should be updated first? (or maybe it happens to turn into this anyway from other combines)

arsenm marked an inline comment as done.Mar 30 2020, 7:31 PM
arsenm added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

I don't actually see how this is better? It doesn't eliminate the urem, and adds an extra instruction?

foad added inline comments.Mar 31 2020, 12:36 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

Adding one instruction here (the << 1) avoids the need for the compare and select at the end to handle the shift-by-zero case.

RKSimon added inline comments.May 26 2020, 2:11 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5127

@arsenm In D80489 for SelectionDAG we changed the pow2 InvShAmt case to ~ShAmt & Mask to help targets that can only use the bottom bits of the shift amount.

Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2020, 2:11 AM

Is this still being worked on?

Is this still being worked on?

Yes, I have an updated version I haven't posted yet. The problem I ended up blocked on was the fact that we don't have the divide-by-constant optimizations, combined with 128-bit division legalization not working

arsenm updated this revision to Diff 295041.Sep 29 2020, 10:36 AM
arsenm retitled this revision from GlobalISel: Lower funnel shifts to WIP: GlobalISel: Lower funnel shifts.
arsenm edited the summary of this revision. (Show Details)
foad added inline comments.Sep 30 2020, 1:17 AM
llvm/include/llvm/CodeGen/GlobalISel/Utils.h
311

Comment about nullptr seems wrong. Match doesn't take any pointer arguments.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5109

Should be C && C->getValue().urem(BW) != 0.

5213–5214

Please add some kind of comment to the effect that these expansions are not quite right, cos they might try to shift by BW.

llvm/lib/CodeGen/GlobalISel/Utils.cpp
924

Could return a bit earlier with:

if (Def->getOpcode() == TargetOpcode::G_IMPLICIT_DEF)
  return AllowUndefs;
944

You need this return true inside the body of the if and a return false at the end of the function. Alternatively, return early as soon as we know that Def is not G_BUILD_VECTOR*.

Is this still being worked on?

Yes, I have an updated version I haven't posted yet. The problem I ended up blocked on was the fact that we don't have the divide-by-constant optimizations, combined with 128-bit division legalization not working

Anything we can help on? For the divide by constant opts, I guess this is: X / C -> ashr X, log2(C)? Anything else?

Is this still being worked on?

Yes, I have an updated version I haven't posted yet. The problem I ended up blocked on was the fact that we don't have the divide-by-constant optimizations, combined with 128-bit division legalization not working

Anything we can help on? For the divide by constant opts, I guess this is: X / C -> ashr X, log2(C)? Anything else?

It's a urem here, I think this was the important one:

// fold (urem x, pow2) -> (and x, pow2-1)

// fold (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1))
foad added a comment.Mar 18 2021, 4:43 AM

Reverse ping! We are now seeing funnel shifts in the wild, because InstCombine has started generating them. Can this patch go in as-is? Are there correctness problems or is it just a matter of optimizing the udiv/urem into shifts?

arsenm updated this revision to Diff 332044.Mar 19 2021, 6:04 PM
arsenm retitled this revision from WIP: GlobalISel: Lower funnel shifts to GlobalISel: Lower funnel shifts.

Fix i128 case and rebase. matchUnaryPredicate was broken so it was emitting urem in cases where it wasn't needed. We're still emitting worse code in the end from urem expansion though, but I think everything works

foad added inline comments.Mar 22 2021, 4:37 AM
llvm/include/llvm/CodeGen/GlobalISel/Utils.h
331–333

Nit: "AllowUndefs"

Comment about nullptr seems wrong. Match doesn't take any pointer arguments.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5109

I still think this check for !C is either wrong or redundant, depending on what the behaviour of matchUnaryPredicate is supposed to be.

5133–5134

Using ~Z here only works if BitWidth is a power of two. Otherwise you need something like BitWidth - 1 - (Z % BitWidth).

llvm/lib/CodeGen/GlobalISel/Utils.cpp
953

At the top level, matchUnaryPredicate only calls Match for G_CONSTANTs (not for G_IMPLICIT_DEF despite what the comment in Utils.h says).

Inside a G_BUILD_VECTOR(_TRUNC) you call Match on anything. Did you mean to have a check for G_CONSTANT here too, like the SelectionDAG version does?

llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fshl.mir
49

Not related to this patch, but I think it would work out better for AMDGPU to implement 32-bit G_FSHL as a 64-bit left shift and then extract the high part of the result.

arsenm marked 2 inline comments as done.Mar 22 2021, 12:42 PM
arsenm added inline comments.
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5109

This is is correct because in the variable case, you have to assume it's a non-zero mod

5133–5134

Isn't that what the isNonZeroModBitWidthOrUndef check is for? I just directly copied this from the DAG, so this should match the behavior

arsenm updated this revision to Diff 332401.Mar 22 2021, 12:43 PM

Address comments

foad added inline comments.Mar 22 2021, 1:28 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

DAG only tries to use a shift in the other direction if isPowerOf2_32(BW) (TargetLowering.cpp line 6456).

foad added inline comments.Mar 22 2021, 1:32 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5109

Huh? In the variable case you can't assume it's a non-zero mod, because it might not be. This function is really isKnownToBeNonZeroModBitWidthOrUndef, and the safe default is to return false.

arsenm updated this revision to Diff 332425.Mar 22 2021, 2:32 PM

Make matchUnaryPredicate behave like the DAG version, and only handle constant or undef

foad accepted this revision.Mar 23 2021, 4:25 AM

LGTM, thanks!

This revision is now accepted and ready to land.Mar 23 2021, 4:25 AM
foad added inline comments.Mar 25 2021, 6:26 AM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134

I realise I approved the patch but I still think this needs fixing. The ~Z thing only works if BW is a power of two.

arsenm added inline comments.Mar 29 2021, 2:28 PM
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
5133–5134
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp