If the shift is NUW/NSW, then it does demand the high bits.
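Concretely, this is the intended demanded-bits computation, shown as a minimal sketch using LLVM's APInt (demandedOperandBits is a hypothetical helper written only for illustration, not an existing API):

#include "llvm/ADT/APInt.h"
using llvm::APInt;

// Hypothetical helper: which bits of the shl operand are demanded.
APInt demandedOperandBits(const APInt &DemandedBits, unsigned ShAmt,
                          bool HasNoWrapFlags) {
  // Without flags, only the bits that land in demanded result positions
  // matter.
  APInt InDemandedMask = DemandedBits.lshr(ShAmt);
  // With nuw/nsw, the ShAmt bits shifted out the top decide whether the
  // result is poison, so they are demanded as well.
  if (HasNoWrapFlags)
    InDemandedMask.setHighBits(ShAmt);
  return InDemandedMask;
}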
In fact, I would prefer the following modification:
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index dbf318a85e9e..76aa3c6d7623 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1761,6 +1761,8 @@ bool TargetLowering::SimplifyDemandedBits(
     }

     APInt InDemandedMask = DemandedBits.lshr(ShAmt);
+    if (Op.getNode()->getFlags().hasNoUnsignedWrap())
+      InDemandedMask.setHighBits(ShAmt);
     if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                              Depth + 1))
       return true;
But this modification is not friendly for x86. If there are other suggestions, please let me know.
What prevents SimplifyDemandedBits from turning this back into an aextload and causing an infinite loop?
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  Line 10313: Need to check that the SHL is the only user of the load or this will duplicate a load.
  Line 10317: You need to replace the chain output of the original load with the chain output of the new load.
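For reference, a minimal sketch of what those two review points might look like in SelectionDAG code (hypothetical: Load, DL, VT, and DAG are assumed from the surrounding combine; this is not the actual patch):

// Only fold when the shift is the load's sole user, otherwise the load
// would be duplicated.
if (!Load->hasOneUse())
  return SDValue();
// Create the replacement zextload from the same address and memory operand.
SDValue NewLoad =
    DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, Load->getChain(),
                   Load->getBasePtr(), Load->getMemoryVT(),
                   Load->getMemOperand());
// Redirect users of the original load's chain output (value #1) to the
// new load's chain so memory ordering is preserved.
DAG.ReplaceAllUsesOfValueWith(SDValue(Load, 1), NewLoad.getValue(1));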
@spatel @RKSimon @lebedev.ri is it a bug that we are neither demanding the upper bits for nuw/nsw nor removing the poison flags if SimplifyDemandedBits returns true?
FMF are pretty rare in DAG, so this could be a simple oversight. It does seem like we should drop poison flags in that case, yes.
I think it shouldn't demand them. The only reason it "needs" them is to satisfy its poison-generating flags. I think it would be better to keep not demanding them, and drop the poison-generating flags.
For example, consider:

  %y = and %x, 15
  %z = shl nuw %y, 4
=>
  %z = shl %x, 4
Now if you demand the high bits (and only because of nuw!), you suddenly can't look past the and.
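Concretely, the suggested alternative would look something like this sketch against the shl handling in TargetLowering::SimplifyDemandedBits (hypothetical placement; only the flag-dropping is new relative to the existing code quoted above):

APInt InDemandedMask = DemandedBits.lshr(ShAmt);
if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                         Depth + 1)) {
  // The simplified operand may no longer satisfy nuw/nsw, so the flags
  // must be dropped to keep the node's semantics sound.
  SDNodeFlags Flags = Op.getNode()->getFlags();
  Flags.setNoUnsignedWrap(false);
  Flags.setNoSignedWrap(false);
  Op.getNode()->setFlags(Flags);
  return true;
}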
You do have a point; in fact, I just want to optimize RISCV. I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support; can you help explain why?
Having re-read the whole thread, it is still not obvious to me what original problem this is trying to solve.
> I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support; can you help explain why?
Optimizations in the back-end, including this one, mainly deal with optimization opportunities that arise during instruction lowering/legalization, and with other sequences that are not well-exploitable in generic IR. OTOH, the middle-end is where optimizations should happen in general.
It's likely the InstCombineSimplifyDemanded code has the same bug.
It's a bug if we are both propagating flags and not accounting for them - we've definitely caught cases like that in IR.
I agree that it seems like the better choice would be to ignore/drop flags (the alternative is that flags could penalize optimization). But it's also possible that changing it could cause missed folds because we lost information by dropping flags. We probably just need to experiment and see what falls out from it.
I'm not sure what this is trying to fix originally, but I think the test changes clearly show that we should be going in the opposite direction.
llvm/test/CodeGen/AArch64/arm64-shifted-sext.ll
  Line 199 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/AMDGPU/shl.ll
  Line 492 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/RISCV/rv64i-complex-float.ll
  Line 23 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/parity.ll
  Line 404 (On Diff #485325): This seems to be a massive regression
  Line 533 (On Diff #485325): This seems to be a massive regression
  Line 657 (On Diff #485325): This seems to be a massive regression
llvm/test/CodeGen/X86/setcc.ll
  Line 76 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/split-store.ll
  Line 176 (On Diff #485325): This seems to be a regression
  Line 197 (On Diff #485325): This seems to be a regression
  Line 217 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
  Line 151 (On Diff #485325): This seems to be a regression
I'll try to fill in the original problem. See the test in aext-to-zext.ll.

We initially have this DAG:
Initial selection DAG: %bb.0 'read:entry'
SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t3: i64 = Constant<0>
  t11: i16 = Constant<8>
  t8: i64 = add nuw t2, Constant:i64<1>
  t9: i8,ch = load<(load (s8) from %ir.arrayidx1)> t0, t8, undef:i64
  t10: i16 = zero_extend t9
  t13: i16 = shl nuw t10, Constant:i64<8>
  t5: i8,ch = load<(load (s8) from %ir.adr)> t0, t2, undef:i64
  t6: i16 = zero_extend t5
  t14: i16 = or t13, t6
  t15: i64 = zero_extend t14
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
Part way through the first DAGCombine, t10 zero_extend is replaced by an any_extend because the extended bits aren't demanded. The users are a shl by 8 and a zero_extend that reads bits 63:16 of the shl.
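That zext-to-aext relaxation comes from the ZERO_EXTEND handling in SimplifyDemandedBits; roughly, as a paraphrased sketch of the idea (not the verbatim LLVM code):

case ISD::ZERO_EXTEND: {
  SDValue Src = Op.getOperand(0);
  unsigned InBits = Src.getScalarValueSizeInBits();
  // If none of the extended top bits are demanded, the zeros produced by
  // zero_extend are never observed, so an any_extend suffices.
  if (DemandedBits.getActiveBits() <= InBits)
    return TLO.CombineTo(
        Op, TLO.DAG.getNode(ISD::ANY_EXTEND, dl, Op.getValueType(), Src));
  break;
}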
The extends get combined with the loads, leaving this DAG:

Optimized lowered selection DAG: %bb.0 'read:entry'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i64 = add nuw t2, Constant:i64<1>
  t20: i16,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
  t13: i16 = shl nuw t20, Constant:i64<8>
  t21: i16,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
  t14: i16 = or t13, t21
  t15: i64 = zero_extend t14
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
We type legalize and optimize to this:

SelectionDAG has 16 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i64 = add nuw t2, Constant:i64<1>
  t22: i64,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
  t23: i64 = shl nuw t22, Constant:i64<8>
  t24: i64,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
  t25: i64 = or t23, t24
  t27: i64 = and t25, Constant:i64<65535>
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t27
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
The t27 'and' would be unnecessary if we had two zextloads instead of a zextload and an aextload.
This patch happens to prevent the original zero_extend from becoming an any_extend, which solves the problem, but I don't think it's robust.