128-bit cases are broken, tests need completion and fixes, and divide-by-constant optimizations are missing (these would also apply after the legalizer, not during as would be required)
Details
Diff Detail
Event Timeline
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4167–4168 | This won't work when Z is 0 if shifting by BW is undefined. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4167–4168 | That's handled below but not included in the high level comment |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | It would be slicker to implement (X << (BW - (Z % BW))) as (X << 1 << ((BW - 1) - (Z % BW))). |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | I copied this from the DAG version. Maybe that should be updated first? (or maybe it happens to turn into this anyway from other combines) |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | I don't actually see how this is better? It doesn't eliminate the urem, and adds an extra instruction? |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | Adding one instruction here (the << 1) avoids the need for the compare and select at the end to handle the shift-by-zero case. |
Yes, I have an updated version I haven't posted yet. The problem I ended up blocked on was the fact that we don't have the divide-by-constant optimizations, combined with 128-bit division legalization not working
llvm/include/llvm/CodeGen/GlobalISel/Utils.h | ||
---|---|---|
249 ↗ | (On Diff #295041) | Comment about nullptr seems wrong. Match doesn't take any pointer arguments. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
4172 | Should be C && C->getValue().urem(BW) != 0. | |
4276–4277 | Please add some kind of comment to the effect that these expansions are not quite right, cos they might try to shift by BW. | |
llvm/lib/CodeGen/GlobalISel/Utils.cpp | ||
750 ↗ | (On Diff #295041) | Could return a bit earlier with: if (Def->getOpcode() == TargetOpcode::G_IMPLICIT_DEF) return AllowUndefs; |
770 ↗ | (On Diff #295041) | You need this return true inside the body of the if and a return false at the end of the function. Alternatively, return early as soon as we know that Def is not G_BUILD_VECTOR*. |
Anything we can help on? For the divide by constant opts, I guess this is: X / C -> ashr X, log2(C)? Anything else?
It's a urem here, I think this was the important one:
// fold (urem x, pow2) -> (and x, pow2-1) // fold (urem x, (shl pow2, y)) -> (and x, (add (shl pow2, y), -1))
Reverse ping! We are now seeing funnel shifts in the wild, because InstCombine has started generating them. Can this patch go in as-is? Are there correctness problems or is it just a matter of optimizing the udiv/urem into shifts?
Fix i128 case and rebase. matchUnaryPredicate was broken so it was emitting urem in cases where it wasn't needed. We're still emitting worse code in the end from urem expansion though, but I think everything works
llvm/include/llvm/CodeGen/GlobalISel/Utils.h | ||
---|---|---|
328–330 ↗ | (On Diff #332044) | Nit: "AllowUndefs" Comment about nullptr seems wrong. Match doesn't take any pointer arguments. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
4172 | I still think this check for !C is either wrong or redundant, depending on what the behaviour of matchUnaryPredicate is supposed to be. | |
4196–4197 | Using ~Z here only works if BitWidth is a power of two. Otherwise you need something like BitWidth - 1 - (Z % BitWidth). | |
llvm/lib/CodeGen/GlobalISel/Utils.cpp | ||
878 ↗ | (On Diff #332044) | At the top level, matchUnaryPredicate only calls Match for G_CONSTANTs (not for G_IMPLICIT_DEF despite what the comment in Utils.h says). Inside a G_BUILD_VECTOR(_TRUNC) you call Match on anything. Did you mean to have a check for G_CONSTANT here too, like the SelectionDAG version does? |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-fshl.mir | ||
49 | Not related to this patch, but I think it would work out better for AMDGPU to implement 32-bit G_FSHL as a 64-bit left shift and then extract the high part of the result. |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | DAG only tries to use a shift in the other direction if isPowerOf2_32(BW) (TargetLowering.cpp line 6456). |
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4172 | Huh? In the variable case you can't assume it's a non-zero mod, because it might not be. This function is really isKnownToBeNonZeroModBitWidthOrUndef, and the safe default is to return false. |
Make matchUnaryPredicate behave like the DAG version, and only handle constant or undef
llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp | ||
---|---|---|
4196–4197 | I realise I approved the patch but I still think this needs fixing. The ~Z thing only works if BW is a power of two. |
This won't work when Z is 0 if shifting by BW is undefined.