If the shift is NUW/NSW, then it does demand the high bits.
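Concretely, this is the intended demanded-bits computation, shown as a minimal sketch using LLVM's APInt (demandedOperandBits is a hypothetical helper written only for illustration, not an existing API):

#include "llvm/ADT/APInt.h"
using llvm::APInt;

// Hypothetical helper: which bits of the shl operand are demanded.
APInt demandedOperandBits(const APInt &DemandedBits, unsigned ShAmt,
                          bool HasNoWrapFlags) {
  // Without flags, only the bits that land in demanded result positions
  // matter.
  APInt InDemandedMask = DemandedBits.lshr(ShAmt);
  // With nuw/nsw, the ShAmt bits shifted out the top decide whether the
  // result is poison, so they are demanded as well.
  if (HasNoWrapFlags)
    InDemandedMask.setHighBits(ShAmt);
  return InDemandedMask;
}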
In fact, I would prefer the following modification:
diff --git a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
index dbf318a85e9e..76aa3c6d7623 100644
--- a/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
@@ -1761,6 +1761,8 @@ bool TargetLowering::SimplifyDemandedBits(
     }

     APInt InDemandedMask = DemandedBits.lshr(ShAmt);
+    if (Op.getNode()->getFlags().hasNoUnsignedWrap())
+      InDemandedMask.setHighBits(ShAmt);
     if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                              Depth + 1))
       return true;
But this modification is not friendly for x86. If there are other suggestions, please let me know.
What prevents SimplifyDemandedBits from turning this back into an aextload and causing an infinite loop?
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
  Line 10313: Need to check that the SHL is the only user of the load or this will duplicate a load.
  Line 10317: You need to replace the chain output of the original load with the chain output of the new load.
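For reference, a minimal sketch of what those two review points might look like in SelectionDAG code (hypothetical: Load, DL, VT, and DAG are assumed from the surrounding combine; this is not the actual patch):

// Only fold when the shift is the load's sole user, otherwise the load
// would be duplicated.
if (!Load->hasOneUse())
  return SDValue();
// Create the replacement zextload from the same address and memory operand.
SDValue NewLoad =
    DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, Load->getChain(),
                   Load->getBasePtr(), Load->getMemoryVT(),
                   Load->getMemOperand());
// Redirect users of the original load's chain output (value #1) to the
// new load's chain so memory ordering is preserved.
DAG.ReplaceAllUsesOfValueWith(SDValue(Load, 1), NewLoad.getValue(1));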
@spatel @RKSimon @lebedev.ri is it a bug that we are neither demanding the upper bits for nuw/nsw nor removing the poison flags if SimplifyDemandedBits returns true?
FMF are pretty rare in DAG, so this could be a simple oversight. It does seem like we should drop poison flags in that case, yes.
I think it shouldn't demand them. The only reason it "needs" them is to satisfy its poison-generating flags. I think it would be better to keep not demanding them, and drop the poison-generating flags.
For example, consider:

  %y = and %x, 15
  %z = shl nuw %y, 4
=>
  %z = shl %x, 4
Now if you demand the high bits (and only because of nuw!), you suddenly can't look past the and.
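Concretely, the suggested alternative would look something like this sketch against the shl handling in TargetLowering::SimplifyDemandedBits (hypothetical placement; only the flag-dropping is new relative to the existing code quoted above):

APInt InDemandedMask = DemandedBits.lshr(ShAmt);
if (SimplifyDemandedBits(Op0, InDemandedMask, DemandedElts, Known, TLO,
                         Depth + 1)) {
  // The simplified operand may no longer satisfy nuw/nsw, so the flags
  // must be dropped to keep the node's semantics sound.
  SDNodeFlags Flags = Op.getNode()->getFlags();
  Flags.setNoUnsignedWrap(false);
  Flags.setNoSignedWrap(false);
  Op.getNode()->setFlags(Flags);
  return true;
}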
You do have a point; in fact, I just want to optimize RISCV. I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support; can you help explain why?
Having re-read the whole thread, it is still not obvious to me what original problem this is trying to solve.
> I have a question now: I see InstCombineSimplifyDemanded.cpp has similar support; can you help explain why?
Optimizations in the back-end, including this one, mainly deal with optimization opportunities that arise during instruction lowering/legalization, and with other sequences that are not well-exploitable in generic IR. OTOH, the middle-end is where optimizations should happen in general.
It's likely the InstCombineSimplifyDemanded code has the same bug.
It's a bug if we are both propagating flags and not accounting for them - we've definitely caught cases like that in IR.
I agree that it seems like the better choice would be to ignore/drop flags (the alternative is that flags could penalize optimization). But it's also possible that changing it could cause missed folds because we lost information by dropping flags. We probably just need to experiment and see what falls out from it.
I'm not sure what this is trying to fix originally, but I think the test changes clearly show that we should be going in the opposite direction.
llvm/test/CodeGen/AArch64/arm64-shifted-sext.ll
  Line 199 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/AMDGPU/shl.ll
  Line 492 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/RISCV/rv64i-complex-float.ll
  Line 23 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/parity.ll
  Line 404 (On Diff #485325): This seems to be a massive regression
  Line 533 (On Diff #485325): This seems to be a massive regression
  Line 657 (On Diff #485325): This seems to be a massive regression
llvm/test/CodeGen/X86/setcc.ll
  Line 76 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/split-store.ll
  Line 176 (On Diff #485325): This seems to be a regression
  Line 197 (On Diff #485325): This seems to be a regression
  Line 217 (On Diff #485325): This seems to be a regression
llvm/test/CodeGen/X86/vector-shuffle-combining-avx512bwvl.ll
  Line 151 (On Diff #485325): This seems to be a regression
I'll try to fill in the original problem. See the test in aext-to-zext.ll.

We initially have this DAG:
Initial selection DAG: %bb.0 'read:entry'
SelectionDAG has 19 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t3: i64 = Constant<0>
  t11: i16 = Constant<8>
  t8: i64 = add nuw t2, Constant:i64<1>
  t9: i8,ch = load<(load (s8) from %ir.arrayidx1)> t0, t8, undef:i64
  t10: i16 = zero_extend t9
  t13: i16 = shl nuw t10, Constant:i64<8>
  t5: i8,ch = load<(load (s8) from %ir.adr)> t0, t2, undef:i64
  t6: i16 = zero_extend t5
  t14: i16 = or t13, t6
  t15: i64 = zero_extend t14
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
Part way through the first DAGCombine, t10 zero_extend is replaced by an any_extend because the extended bits aren't demanded. The users are a shl by 8 and a zero_extend that reads bits 63:16 of the shl.
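That zext-to-aext relaxation comes from the ZERO_EXTEND handling in SimplifyDemandedBits; roughly, as a paraphrased sketch of the idea (not the verbatim LLVM code):

case ISD::ZERO_EXTEND: {
  SDValue Src = Op.getOperand(0);
  unsigned InBits = Src.getScalarValueSizeInBits();
  // If none of the extended top bits are demanded, the zeros produced by
  // zero_extend are never observed, so an any_extend suffices.
  if (DemandedBits.getActiveBits() <= InBits)
    return TLO.CombineTo(
        Op, TLO.DAG.getNode(ISD::ANY_EXTEND, dl, Op.getValueType(), Src));
  break;
}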
The extends get combined with the loads, leaving this DAG:

Optimized lowered selection DAG: %bb.0 'read:entry'
SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i64 = add nuw t2, Constant:i64<1>
  t20: i16,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
  t13: i16 = shl nuw t20, Constant:i64<8>
  t21: i16,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
  t14: i16 = or t13, t21
  t15: i64 = zero_extend t14
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t15
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
We type legalize and optimize to this:

SelectionDAG has 16 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i64 = add nuw t2, Constant:i64<1>
  t22: i64,ch = load<(load (s8) from %ir.arrayidx1), anyext from i8> t0, t8, undef:i64
  t23: i64 = shl nuw t22, Constant:i64<8>
  t24: i64,ch = load<(load (s8) from %ir.adr), zext from i8> t0, t2, undef:i64
  t25: i64 = or t23, t24
  t27: i64 = and t25, Constant:i64<65535>
  t17: ch,glue = CopyToReg t0, Register:i64 $x10, t27
  t18: ch = RISCVISD::RET_FLAG t17, Register:i64 $x10, t17:1
The t27 'and' would be unnecessary if we had two zextloads instead of a zextload and an aextload.
This patch happens to prevent the original zero_extend from becoming an any_extend, which solves the problem, but I don't think it's robust.