This is an archive of the discontinued LLVM Phabricator instance.

Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Closed · Public

Authored by wdng on Aug 31 2017, 12:27 PM.


Details

Reviewers

arsenm
b-sumner
t-tye
kzhuravl
rampitec

Commits

rG5676acad9e0b: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Summary

During the DAGCombine phase, LLVM converts ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ, which is then expanded during legalization; this prevents generation of the v_ffbl_b32 instruction. This patch implements custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Aug 31 2017, 12:27 PM

Herald added a subscriber: nhaehnle. Aug 31 2017, 12:28 PM

arsenm added inline comments. Aug 31 2017, 1:34 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	The existing code was the correct way to do this

arsenm added a subscriber: llvm-commits. Aug 31 2017, 1:34 PM

wdng added inline comments. Aug 31 2017, 1:54 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

16680 ↗	(On Diff #113451)

With the original code, we get the following transformations:

Initial selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 50 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t3: i64 = Constant<0>
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
  t6: i64,ch = merge_values t5, t5:1
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t10: i64,ch = merge_values t9, t9:1
  t11: ch = TokenFactor t6:1, t10:1
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t20: i16,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>)> t11, t19, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t27: i64,ch = load<LD8[%11(addrspace=2)](tbaa=<0x4435518>)> t11, t25, undef:i64
  t29: i64 = Constant<32>
              t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
              t21: i32 = zero_extend t20
            t22: i32 = mul t17, t21
            t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
          t23: i32 = add t22, t15 
        t26: i64 = zero_extend t23 
      t28: i64 = add t27, t26 
    t31: i64 = shl t28, Constant:i32<32>
  t32: i64 = sra t31, Constant:i32<32>
    t33: i64 = add t6, t32 
  t34: i8,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>)> t11, t33, undef:i64
  t35: i32 = zero_extend t34 
    t39: i1 = setcc t35, Constant:i32<0>, setne:ch
    t36: i32 = cttz_zero_undef t35
  t40: i32 = select t39, t36, Constant:i32<32>
  t43: i1 = setcc Constant:i32<8>, t40, setult:ch
      t47: ch = TokenFactor t20:1, t27:1, t34:1
        t44: i32 = umin t40, Constant:i32<8>
      t45: i8 = truncate t44 
      t46: i64 = add t10, t32 
    t48: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>)> t47, t45, t46, undef:i64
  t49: ch = ENDPGM t48

Optimized lowered selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 35 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t11: ch = TokenFactor t5:1, t9:1
    t33: i64 = add t5, t63 
  t54: i32,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>), zext from i8> t11, t33, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t62: i32,ch = load<LD4[%11(addrspace=2)](align=8)(tbaa=<0x4435518>)> t11, t25, undef:i64
          t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
        t22: i32 = mul t17, t64 
        t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
      t23: i32 = add t22, t15
    t60: i32 = add t62, t23
  t63: i64 = sign_extend t60, ValueType:ch:i32
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t64: i32,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>), zext from i16> t11, t19, undef:i64
      t47: ch = TokenFactor t64:1, t62:1, t54:1
        t53: i32 = cttz t54
      t44: i32 = umin t53, Constant:i32<8>
      t46: i64 = add t9, t63
    t50: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>), trunc to i8> t47, t44, t46, undef:i64
  t49: ch = ENDPGM t50

We won't be able to generate s/v_ffbl instructions. I found that all llvm.cttz.i32 calls have been converted to cttz_zero_undef instead of cttz.

If we don't want to change the original implementation, should we instead custom-lower ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF in the AMDGPU backend?
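The fold visible in the dumps above (t39/t36/t40 in the initial DAG collapsing into the single t53 `cttz` node) is valid because a `select` that substitutes 32 for a zero input computes exactly the fully defined `cttz`. A minimal C++ sketch of that equivalence, modelling the DAG nodes as plain 32-bit functions; `cttz32`, `cttzZeroUndef32`, `selectPattern` and `foldIsValid` are hypothetical names for illustration, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Model of ISD::CTTZ: trailing zero count, defined as 32 for x == 0.
uint32_t cttz32(uint32_t x) {
  if (x == 0)
    return 32;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Model of ISD::CTTZ_ZERO_UNDEF: the result is only meaningful for x != 0.
uint32_t cttzZeroUndef32(uint32_t x) {
  assert(x != 0 && "cttz_zero_undef is undefined for a zero input");
  return cttz32(x);
}

// The pattern from the initial DAG:
//   t39 = setcc x, 0, setne
//   t36 = cttz_zero_undef x
//   t40 = select t39, t36, 32
uint32_t selectPattern(uint32_t x) {
  return x != 0 ? cttzZeroUndef32(x) : 32u;
}

// True if the select pattern matches cttz on every sample input.
bool foldIsValid() {
  const uint32_t Samples[] = {0u,    1u,     2u,          8u,
                              0x80u, 0x100u, 0x80000000u, 0xFFFFFFFFu,
                              0x12345678u};
  for (uint32_t X : Samples)
    if (selectPattern(X) != cttz32(X))
      return false;
  return true;
}
```

Once DAGCombine has produced the bare `cttz`, whether v_ffbl_b32 gets selected depends entirely on how the target handles ISD::CTTZ.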

Ping.

I think the actual problem is that the implementation of ISD::CTTZ doesn't use v_ffbl, not this transformation.

If v_ffbl produces a defined result equal to the bit width for an input of 0, then you want to match it with cttz and set the operation action for cttz_zero_undef to Expand. That will turn all cttz_zero_undef calls into cttz.

If v_ffbl cannot handle zero, then you want cttz_zero_undef set to Legal and cttz set to Expand, which will use cttz_zero_undef and a select. Alternatively, you can make cttz Custom and do your own lowering.

I think the instruction returns -1 for a 0 input. IIRC we already handle this and fold it for ctlz, just not for cttz.
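A hedged C++ model of the behaviour described here, assuming both v_ffbh_u32 and v_ffbl_b32 return -1 (all ones) for a zero input, as the AMDGPUISD enum comments ("ctlz with -1 if input is zero", "cttz with -1 if input is zero") state. Under that assumption `select (x == 0), -1, cttz_zero_undef(x)` is a single FFBL_B32, and a fully defined CTTZ needs only one compare and select around it. All function names are illustrative, not LLVM APIs:

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for ISD::CTLZ_ZERO_UNDEF / ISD::CTTZ_ZERO_UNDEF (x must be non-zero).
uint32_t ctlzZeroUndef(uint32_t x) {
  assert(x != 0 && "ctlz_zero_undef is undefined for a zero input");
  uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

uint32_t cttzZeroUndef(uint32_t x) {
  assert(x != 0 && "cttz_zero_undef is undefined for a zero input");
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Assumed instruction semantics: all ones (-1) for a zero input. Each body is
// exactly "select (x == 0), -1, *_zero_undef(x)", which is why that select can
// be folded into a single FFBH_U32 / FFBL_B32 node.
uint32_t ffbh_u32(uint32_t x) { return x == 0 ? 0xFFFFFFFFu : ctlzZeroUndef(x); }
uint32_t ffbl_b32(uint32_t x) { return x == 0 ? 0xFFFFFFFFu : cttzZeroUndef(x); }

// Fully defined ISD::CTLZ / ISD::CTTZ (32 for zero): one compare + select.
uint32_t ctlzFromFfbh(uint32_t x) { return x == 0 ? 32u : ffbh_u32(x); }
uint32_t cttzFromFfbl(uint32_t x) { return x == 0 ? 32u : ffbl_b32(x); }
```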

Just add a custom lowering from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF.

In D37348#859119, @wdng wrote:

Just add a custom lowering from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF.

I don't think that will help. Why not follow exactly how CTLZ* is handled now and implement AMDGPUTargetLowering::LowerCTTZ making use of ffbl?

wdng updated this revision to Diff 114215. Sep 7 2017, 11:13 AM

wdng retitled this revision from "Tighten conditions for converting ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ" to "Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ."

wdng added a reviewer: craig.topper. Sep 7 2017, 11:36 AM

Ping.

wdng added a reviewer: t-tye. Sep 8 2017, 12:10 PM

arsenm added inline comments. Sep 8 2017, 1:22 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2784–2793	I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	This should definitely remain legal
2032	You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
lib/Target/AMDGPU/AMDGPUInstrInfo.td
301–302	This isn't a signed/unsigned operation. There is just one v_ffbl_b32.

craig.topper resigned from this revision. Sep 8 2017, 10:48 PM

wdng marked 2 inline comments as done. Sep 11 2017, 9:05 AM

wdng added inline comments.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	I don't think it matters that it's defined as Custom, because it is converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. If we define it as Legal, however, we will have duplicate patterns (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I'm missing that it has to be Legal?

arsenm added inline comments. Sep 12 2017, 7:05 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
417	We should probably fix this at some point to be legal
1113–1114	Also need the select with -1 optimization (and corresponding tests) for cttz
2024	This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
test/CodeGen/AMDGPU/cttz_zero_undef.ll
107	Need i64 tests
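The i64 case requested here can share one helper between ctlz and cttz, since the two differ only in which 32-bit half is examined first. A minimal C++ sketch under stated assumptions: each half uses a fully defined 32-bit count (32 for zero), and `CountDir`, `count32` and `count64` are hypothetical names, not the helpers in the patch:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical direction flag shared by the ctlz and cttz paths.
enum class CountDir { Leading, Trailing };

// 32-bit count in the given direction, defined as 32 for a zero input.
uint32_t count32(uint32_t x, CountDir Dir) {
  if (x == 0)
    return 32;
  const uint32_t Probe = Dir == CountDir::Leading ? 0x80000000u : 1u;
  uint32_t n = 0;
  while ((x & Probe) == 0) {
    x = Dir == CountDir::Leading ? (x << 1) : (x >> 1);
    ++n;
  }
  return n;
}

// Common 64-bit helper built from the two 32-bit halves:
//   ctlz(x) = hi == 0 ? 32 + ctlz(lo) : ctlz(hi)
//   cttz(x) = lo == 0 ? 32 + cttz(hi) : cttz(lo)
uint32_t count64(uint64_t x, CountDir Dir) {
  const uint32_t Lo = static_cast<uint32_t>(x);
  const uint32_t Hi = static_cast<uint32_t>(x >> 32);
  // "Near" is the half the count starts from; if it is all zeros, the
  // result is 32 plus the count within the other ("Far") half.
  const uint32_t Near = Dir == CountDir::Leading ? Hi : Lo;
  const uint32_t Far = Dir == CountDir::Leading ? Lo : Hi;
  return Near == 0 ? 32 + count32(Far, Dir) : count32(Near, Dir);
}
```

Because each half returns 32 for zero, a zero source naturally yields 64; a lowering built on the *_ZERO_UNDEF forms of the halves would need a separate check for that case.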

Changes based on code review feedback.

Upload a full diff.

Missing performCtlzCombine equivalent

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	I don't see this changed
test/CodeGen/AMDGPU/cttz_zero_undef.ll
112	Missing scalar version

Address code reviews.

Fix variable names that were not capitalized.

wdng marked 2 inline comments as done. Sep 14 2017, 2:50 PM

Ping.

arsenm added inline comments. Sep 15 2017, 10:17 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2038–2044	Indentation wrong
2045	llvm_unreachable
2063–2066	Select between Zero and One as input to getSetCC
3019–3020	You could just pass in the new opcode directly rather than selecting it again
3021	Commented out code
3042	You didn't add tests for this part
lib/Target/AMDGPU/AMDGPUISelLowering.h
374	Should be named FFBL_B32 to match the instruction
test/CodeGen/AMDGPU/cttz_zero_undef.ll
134	Also should have some tests with i8/i16

I will open a separate ticket to fix v_ffbl_sdwa instruction generation.

Ping.

wdng edited reviewers, added: kzhuravl; removed: craig.topper. Sep 22 2017, 12:29 PM

Ping.

Needs more comprehensive check lines. Just checking the instructions won't demonstrate that the extra instructions you're trying to avoid aren't there

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3028	Don't include AMDGPUISD in the name of this
3053	Ditto
test/CodeGen/AMDGPU/cttz_zero_undef.ll
85	This needs to check more
97	This needs to check more
121–122	Ditto

Address code reviews.

Ping.

wdng added a reviewer: rampitec. Oct 10 2017, 11:10 AM

arsenm added inline comments. Oct 11 2017, 11:05 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2031	Extra space after ::
2040	Don't include AMDGPUISD in the variable name
2041	Missing space before {
2073	Double // and missing closing )
2077	Double //
test/CodeGen/AMDGPU/cttz_zero_undef.ll
0–1	Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken
171–172	This isn't checking the outputs and select
184	Using undefined VAL
198	Undefined VAL

Address code reviews.

wdng marked 3 inline comments as done. Oct 11 2017, 4:14 PM

craig.topper removed a subscriber: craig.topper. Oct 11 2017, 4:16 PM

arsenm added inline comments. Oct 11 2017, 4:24 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
175–176	Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one

wdng added inline comments. Oct 11 2017, 4:27 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
175–176	No, it won't work if I remove the -DAG, because the generated instructions are interleaved with each other.

Remove duplicate check lines.

Removed -DAG checks completely.

LGTM

This revision is now accepted and ready to land. Oct 12 2017, 10:39 AM

Closed by commit rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. (authored by wdng). Oct 12 2017, 12:37 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Target/TargetSelectionDAG.td	2 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	19 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h	7 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	100 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.td	2 lines
lib/Target/AMDGPU/EvergreenInstructions.td	2 lines
lib/Target/AMDGPU/SOPInstructions.td	5 lines
test/CodeGen/AMDGPU/cttz_zero_undef.ll	194 lines

Diff 118718

include/llvm/Target/TargetSelectionDAG.td

def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;		]>;
def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.		def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>
]>;		]>;
def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.		def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>
]>;		]>;
def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz		def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz, cttz
SDTCisSameAs<0, 1>, SDTCisInt<0>		SDTCisSameAs<0, 1>, SDTCisInt<0>
]>;		]>;
def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext		def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc		def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

case ISD::CTPOP: {

return Op;		return Op;
}		}
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
// This trivially expands to CTLZ.		// This trivially expands to CTLZ.
return DAG.getNode(ISD::CTLZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTLZ, dl, Op.getValueType(), Op);
case ISD::CTLZ: {		case ISD::CTLZ: {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
unsigned len = VT.getSizeInBits();		unsigned Len = VT.getSizeInBits();

if (TLI.isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) {		if (TLI.isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) {
EVT SetCCVT = getSetCCResultType(VT);		EVT SetCCVT = getSetCCResultType(VT);
SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op);		SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op);
SDValue Zero = DAG.getConstant(0, dl, VT);		SDValue Zero = DAG.getConstant(0, dl, VT);
SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
DAG.getConstant(len, dl, VT), CTLZ);		DAG.getConstant(Len, dl, VT), CTLZ);
}		}

// for now, we do this:		// for now, we do this:
// x = x | (x >> 1);		// x = x | (x >> 1);
// x = x | (x >> 2);		// x = x | (x >> 2);
// ...		// ...
// x = x | (x >>16);		// x = x | (x >>16);
// x = x | (x >>32); // for 64-bit input		// x = x | (x >>32); // for 64-bit input
// return popcount(~x);		// return popcount(~x);
//		//
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());		EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
for (unsigned i = 0; (1U << i) <= (len / 2); ++i) {		for (unsigned i = 0; (1U << i) <= (Len / 2); ++i) {
SDValue Tmp3 = DAG.getConstant(1ULL << i, dl, ShVT);		SDValue Tmp3 = DAG.getConstant(1ULL << i, dl, ShVT);
Op = DAG.getNode(ISD::OR, dl, VT, Op,		Op = DAG.getNode(ISD::OR, dl, VT, Op,
DAG.getNode(ISD::SRL, dl, VT, Op, Tmp3));		DAG.getNode(ISD::SRL, dl, VT, Op, Tmp3));
}		}
Op = DAG.getNOT(dl, Op, VT);		Op = DAG.getNOT(dl, Op, VT);
return DAG.getNode(ISD::CTPOP, dl, VT, Op);		return DAG.getNode(ISD::CTPOP, dl, VT, Op);
}		}
case ISD::CTTZ_ZERO_UNDEF:		case ISD::CTTZ_ZERO_UNDEF:
// This trivially expands to CTTZ.		// This trivially expands to CTTZ.
return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);
case ISD::CTTZ: {		case ISD::CTTZ: {
		EVT VT = Op.getValueType();
		unsigned Len = VT.getSizeInBits();

		if (TLI.isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
		EVT SetCCVT = getSetCCResultType(VT);
		SDValue CTTZ = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, dl, VT, Op);
		SDValue Zero = DAG.getConstant(0, dl, VT);
		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
		DAG.getConstant(Len, dl, VT), CTTZ);
		}
		arsenm: I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.

// for now, we use: { return popcount(~x & (x - 1)); }		// for now, we use: { return popcount(~x & (x - 1)); }
// unless the target has ctlz but not ctpop, in which case we use:		// unless the target has ctlz but not ctpop, in which case we use:
// { return 32 - nlz(~x & (x-1)); }		// { return 32 - nlz(~x & (x-1)); }
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT VT = Op.getValueType();
SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,		SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,
DAG.getNOT(dl, Op, VT),		DAG.getNOT(dl, Op, VT),
DAG.getNode(ISD::SUB, dl, VT, Op,		DAG.getNode(ISD::SUB, dl, VT, Op,
DAG.getConstant(1, dl, VT)));		DAG.getConstant(1, dl, VT)));
// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.		// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&		if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&
TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))		TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))
return DAG.getNode(ISD::SUB, dl, VT,		return DAG.getNode(ISD::SUB, dl, VT,
		arsenm: OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
		arsenm: I don't see this changed
DAG.getConstant(VT.getSizeInBits(), dl, VT),		DAG.getConstant(VT.getSizeInBits(), dl, VT),
DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));		DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));
return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);		return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);
}		}
}		}
}		}
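For reference, the generic expansions in the hunk above can be checked outside LLVM. This C++ sketch mirrors the comments in that code (the Hacker's Delight OR-smear for CTLZ, `popcount(~x & (x - 1))` for CTTZ, and `32 - nlz(~x & (x - 1))` when CTLZ is available but CTPOP is not) on plain `uint32_t` values; the helper names are hypothetical and the loop-based popcount merely stands in for ISD::CTPOP:

```cpp
#include <cassert>
#include <cstdint>

// Portable population count, standing in for ISD::CTPOP.
uint32_t popcount32(uint32_t x) {
  uint32_t n = 0;
  for (; x != 0; x &= x - 1) // clears the lowest set bit each iteration
    ++n;
  return n;
}

// ISD::CTLZ expansion: smear the highest set bit into every lower position
// (shifts of 1, 2, 4, 8, 16), then count the zeros left above it.
uint32_t ctlzExpand(uint32_t x) {
  const unsigned Len = 32;
  for (unsigned i = 0; (1u << i) <= (Len / 2); ++i)
    x |= x >> (1u << i);
  return popcount32(~x);
}

// ISD::CTTZ expansion: ~x & (x - 1) has ones exactly at the trailing-zero
// positions of x (all 32 bits when x == 0), so its popcount is cttz(x).
uint32_t cttzExpandPopcount(uint32_t x) {
  return popcount32(~x & (x - 1));
}

// Variant used when CTLZ is legal but CTPOP is not: 32 - nlz(~x & (x - 1)).
uint32_t cttzExpandCtlz(uint32_t x) {
  return 32 - ctlzExpand(~x & (x - 1));
}

// True if both CTTZ expansions agree on a few representative inputs.
bool cttzExpansionsAgree() {
  const uint32_t Samples[] = {0u, 1u, 6u, 0x80u, 0x10000u, 0x80000000u,
                              0xFFFFFFFFu};
  for (uint32_t X : Samples)
    if (cttzExpandPopcount(X) != cttzExpandCtlz(X))
      return false;
  return true;
}
```

Each expansion costs several operations (a subtract, a NOT/AND, and a population or leading-zero count), which gives some intuition for why the patched code tries the compare+select form first whenever CTTZ_ZERO_UNDEF is legal or custom.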

bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {		bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {

lib/Target/AMDGPU/AMDGPUISelLowering.h

struct ArgDescriptor;		struct ArgDescriptor;

class AMDGPUTargetLowering : public TargetLowering {		class AMDGPUTargetLowering : public TargetLowering {
private:		private:
/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been		/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been
/// legalized from a smaller type VT. Need to match pre-legalized type because		/// legalized from a smaller type VT. Need to match pre-legalized type because
/// the generic legalization inserts the add/sub between the select and		/// the generic legalization inserts the add/sub between the select and
/// compare.		/// compare.
SDValue getFFBH_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL) const;		SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;

public:		public:
static bool isOrEquivalentToAdd(SelectionDAG &DAG, SDValue Op);		static bool isOrEquivalentToAdd(SelectionDAG &DAG, SDValue Op);

protected:		protected:
const AMDGPUSubtarget *Subtarget;		const AMDGPUSubtarget *Subtarget;
AMDGPUAS AMDGPUASI;		AMDGPUAS AMDGPUASI;

SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
/// \brief Split a vector store into multiple scalar stores.		/// \brief Split a vector store into multiple scalar stores.
/// \returns The resulting chain.		/// \returns The resulting chain.

SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFCEIL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFCEIL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFTRUNC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFTRUNC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerCTLZ(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTLZ_CTTZ(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;
SDValue splitBinaryBitConstantOpImpl(DAGCombinerInfo &DCI, const SDLoc &SL,
uint32_t ValLo, uint32_t ValHi) const;		uint32_t ValLo, uint32_t ValHi) const;
SDValue performShlCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performShlCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performSraCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSraCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performSrlCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSrlCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulhuCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulhuCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulLoHi24Combine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulLoHi24Combine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performCtlzCombine(const SDLoc &SL, SDValue Cond, SDValue LHS,		SDValue performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond, SDValue LHS,
SDValue RHS, DAGCombinerInfo &DCI) const;		SDValue RHS, DAGCombinerInfo &DCI) const;
SDValue performSelectCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSelectCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performFNegCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performFNegCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performFAbsCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performFAbsCombine(SDNode *N, DAGCombinerInfo &DCI) const;

static EVT getEquivalentMemType(LLVMContext &Context, EVT VT);		static EVT getEquivalentMemType(LLVMContext &Context, EVT VT);

virtual SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,		virtual SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,
enum NodeType : unsigned {
CARRY,		CARRY,
BORROW,		BORROW,
BFE_U32, // Extract range of bits with zero extension to 32-bits.		BFE_U32, // Extract range of bits with zero extension to 32-bits.
BFE_I32, // Extract range of bits with sign extension to 32-bits.		BFE_I32, // Extract range of bits with sign extension to 32-bits.
BFI, // (src0 & src1) | (~src0 & src2)		BFI, // (src0 & src1) | (~src0 & src2)
BFM, // Insert a range of bits into a 32-bit word.		BFM, // Insert a range of bits into a 32-bit word.
FFBH_U32, // ctlz with -1 if input is zero.		FFBH_U32, // ctlz with -1 if input is zero.
FFBH_I32,		FFBH_I32,
		FFBL_B32, // cttz with -1 if input is zero.
		arsenm: Should be named FFBL_B32 to match the instruction
MUL_U24,		MUL_U24,
MUL_I24,		MUL_I24,
MULHI_U24,		MULHI_U24,
MULHI_I24,		MULHI_I24,
MAD_U24,		MAD_U24,
MAD_I24,		MAD_I24,
MUL_LOHI_I24,		MUL_LOHI_I24,
MUL_LOHI_U24,		MUL_LOHI_U24,

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);		setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);

setOperationAction(ISD::SMIN, MVT::i32, Legal);		setOperationAction(ISD::SMIN, MVT::i32, Legal);
setOperationAction(ISD::UMIN, MVT::i32, Legal);		setOperationAction(ISD::UMIN, MVT::i32, Legal);
setOperationAction(ISD::SMAX, MVT::i32, Legal);		setOperationAction(ISD::SMAX, MVT::i32, Legal);
setOperationAction(ISD::UMAX, MVT::i32, Legal);		setOperationAction(ISD::UMAX, MVT::i32, Legal);

if (Subtarget->hasFFBH())		if (Subtarget->hasFFBH())
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);
		arsenm: We should probably fix this at some point to be legal

if (Subtarget->hasFFBL())		if (Subtarget->hasFFBL())
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Legal);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);
arsenm: This should definitely remain legal
wdng (author): I don't think it matters that it's defined as Custom, because it is converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. If we define it as Legal, however, we will have duplicate patterns (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I'm missing that it has to be Legal?

		setOperationAction(ISD::CTTZ, MVT::i64, Custom);
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Custom);
setOperationAction(ISD::CTLZ, MVT::i64, Custom);		setOperationAction(ISD::CTLZ, MVT::i64, Custom);
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);

// We only really have 32-bit BFE instructions (and 16-bit on VI).		// We only really have 32-bit BFE instructions (and 16-bit on VI).
//		//
// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any		// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
// effort to match them now. We want this to be false for i64 cases when the		// effort to match them now. We want this to be false for i64 cases when the
// extraction isn't restricted to the upper or lower half. Ideally we would		// extraction isn't restricted to the upper or lower half. Ideally we would
SDValue AMDGPUTargetLowering::LowerOperation(SDValue Op,
case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);		case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);
case ISD::FROUND: return LowerFROUND(Op, DAG);		case ISD::FROUND: return LowerFROUND(Op, DAG);
case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);		case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);
case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);		case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);
case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);		case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);
case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);		case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);
case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);		case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);		case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
		case ISD::CTTZ:
		case ISD::CTTZ_ZERO_UNDEF:
case ISD::CTLZ:		case ISD::CTLZ:
		arsenm: Also need the select with -1 optimization (and corresponding tests) for cttz
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
return LowerCTLZ(Op, DAG);		return LowerCTLZ_CTTZ(Op, DAG);
case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);		case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);
}		}
return Op;		return Op;
}		}

void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,		void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,		SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
SDValue AMDGPUTargetLowering::LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const {
SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);		SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);
SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);		SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);

SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);		SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);
// TODO: Should this propagate fast-math-flags?		// TODO: Should this propagate fast-math-flags?
return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);		return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);
}		}

SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {		static bool isCtlzOpc(unsigned Opc) {
		return Opc == ISD::CTLZ || Opc == ISD::CTLZ_ZERO_UNDEF;
		arsenm: This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
		}

		static bool isCttzOpc(unsigned Opc) {
		return Opc == ISD::CTTZ || Opc == ISD::CTTZ_ZERO_UNDEF;
		}

		SDValue AMDGPUTargetLowering::LowerCTLZ_CTTZ(SDValue Op, SelectionDAG &DAG) const {
		arsenm: Extra space after ::
SDLoc SL(Op);		SDLoc SL(Op);
		arsenm: You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
SDValue Src = Op.getOperand(0);		SDValue Src = Op.getOperand(0);
bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;		bool ZeroUndef = Op.getOpcode() == ISD::CTTZ_ZERO_UNDEF ||
		Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;

		unsigned ISDOpc, NewOpc;
		if (isCtlzOpc(Op.getOpcode())) {
		ISDOpc = ISD::CTLZ_ZERO_UNDEF;
		NewOpc = AMDGPUISD::FFBH_U32;
		arsenm (Done): Don't include AMDGPUISD in the variable name.
		} else if (isCttzOpc(Op.getOpcode())) {
		arsenm (Done): Missing space before {
		ISDOpc = ISD::CTTZ_ZERO_UNDEF;
		NewOpc = AMDGPUISD::FFBL_B32;
		} else
		arsenm (Done): Indentation wrong
		llvm_unreachable("Unexpected OPCode!!!");
		arsenm (Done): llvm_unreachable


if (ZeroUndef && Src.getValueType() == MVT::i32)		if (ZeroUndef && Src.getValueType() == MVT::i32)
return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);		return DAG.getNode(NewOpc, SL, MVT::i32, Src);

SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);		SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);

const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);		const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
const SDValue One = DAG.getConstant(1, SL, MVT::i32);		const SDValue One = DAG.getConstant(1, SL, MVT::i32);

SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);
SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);

EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),
*DAG.getContext(), MVT::i32);		*DAG.getContext(), MVT::i32);

SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);		SDValue ZeroOrOne = isCtlzOpc(Op.getOpcode()) ? Zero : One;
		SDValue HiOrLo = isCtlzOpc(Op.getOpcode()) ? Hi : Lo;
		SDValue Hi0orLo0 = DAG.getSetCC(SL, SetCCVT, HiOrLo, ZeroOrOne, ISD::SETEQ);

SDValue CtlzLo = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Lo);		SDValue OprLo = DAG.getNode(ISDOpc, SL, MVT::i32, Lo);
		arsenm (Done): Select between Zero and One as input to getSetCC
SDValue CtlzHi = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Hi);		SDValue OprHi = DAG.getNode(ISDOpc, SL, MVT::i32, Hi);

const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);
SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CtlzLo, Bits32);		SDValue Add, NewOpr;
		if (isCtlzOpc(Op.getOpcode())) {
		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprLo, Bits32);
// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))		// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
		arsenm (Done): Double // and missing closing )
SDValue NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CtlzHi);		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprHi);
		} else {
		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprHi, Bits32);
		// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
		arsenm (Done): Double //
		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprLo);
		}

if (!ZeroUndef) {		if (!ZeroUndef) {
// Test if the full 64-bit input is zero.		// Test if the full 64-bit input is zero.

// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,		// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,
// which we probably don't want.		// which we probably don't want.
SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);		SDValue LoOrHi = isCtlzOpc(Op.getOpcode()) ? Lo : Hi;
SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);		SDValue Lo0OrHi0 = DAG.getSetCC(SL, SetCCVT, LoOrHi, ZeroOrOne, ISD::SETEQ);
		SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0OrHi0, Hi0orLo0);

// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction		// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction
// with the same cycles, otherwise it is slower.		// with the same cycles, otherwise it is slower.
// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,		// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,
// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);		// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);

const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);

// The instruction returns -1 for 0 input, but the defined intrinsic		// The instruction returns -1 for 0 input, but the defined intrinsic
// behavior is to return the number of bits.		// behavior is to return the number of bits.
NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32,		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32,
SrcIsZero, Bits32, NewCtlz);		SrcIsZero, Bits32, NewOpr);
}		}

return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewCtlz);		return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewOpr);
}		}
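The i64 path of LowerCTLZ_CTTZ can be sanity-checked with a small host-side model. The C++ sketch below assumes a hypothetical ffbl_b32 helper that returns the index of the lowest set bit, or 0xFFFFFFFF (-1) for a zero input, as the patch comments describe for the native instructions. The names ffbl_b32 and cttz64_via_halves are illustrative only and are not LLVM APIs. Where the DAG builds SETCC and SELECT nodes, the sketch uses plain comparisons and the ternary operator. It follows cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x)), plus the extra select that returns 64 for x == 0 when zero is not undef.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 32-bit "find first bit low" instruction:
// index of the lowest set bit, or 0xFFFFFFFF (-1) when x == 0.
uint32_t ffbl_b32(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Model of the i64 cttz expansion from two 32-bit halves:
//   cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
// When zero_undef is false, a final select returns the bit width (64)
// for x == 0, matching llvm.cttz with is_zero_undef = false.
uint32_t cttz64_via_halves(uint64_t x, bool zero_undef) {
  const uint32_t lo = static_cast<uint32_t>(x);
  const uint32_t hi = static_cast<uint32_t>(x >> 32);
  // Unsigned arithmetic: for x == 0 the "+ 32" wraps. That is harmless,
  // because the value is undefined when zero_undef is true and is
  // replaced by the select below otherwise.
  uint32_t result = (lo == 0) ? ffbl_b32(hi) + 32u : ffbl_b32(lo);
  if (!zero_undef && x == 0)
    result = 64;
  return result;
}
```
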

SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,		SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,
bool Signed) const {		bool Signed) const {
// Unsigned		// Unsigned
// cul2f(ulong u)		// cul2f(ulong u)
//{		//{
// uint lz = clz(u);		// uint lz = clz(u);
(895 unchanged lines not shown)
}		}

static bool isNegativeOne(SDValue Val) {		static bool isNegativeOne(SDValue Val) {
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))
return C->isAllOnesValue();		return C->isAllOnesValue();
return false;		return false;
}		}

static bool isCtlzOpc(unsigned Opc) {		SDValue AMDGPUTargetLowering::getFFBX_U32(SelectionDAG &DAG,
return Opc == ISD::CTLZ \|\| Opc == ISD::CTLZ_ZERO_UNDEF;
}

SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,
SDValue Op,		SDValue Op,
const SDLoc &DL) const {		const SDLoc &DL,
		unsigned Opc) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);		EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);
		arsenm (Done): You could just pass in the new opcode directly rather than selecting it again
if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&		if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&
		arsenm (Done): Commented out code
LegalVT != MVT::i16))		LegalVT != MVT::i16))
return SDValue();		return SDValue();

if (VT != MVT::i32)		if (VT != MVT::i32)
Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);		Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);

SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);		SDValue FFBX = DAG.getNode(Opc, DL, MVT::i32, Op);
		arsenm (Done): Don't include AMDGPUISD in the name of this
if (VT != MVT::i32)		if (VT != MVT::i32)
FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);		FFBX = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBX);

return FFBH;		return FFBX;
}		}
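getFFBX_U32 handles types narrower than i32 in three steps: it zero-extends to i32, applies the 32-bit node, and truncates the result. For the low-bit search (FFBL_B32) this is safe for two reasons:

- Zero-extension only adds zero bits above the value, so the lowest set bit keeps its index.
- The all-ones result for a zero input truncates to all ones, which is -1 in the narrow type.

A minimal C++ sketch for i8 follows. ffbl_b32, ffbl_i8_via_i32 and select_pattern_i8 are hypothetical helpers written for illustration. The instruction's zero-input behavior is assumed from the patch comments.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the 32-bit low-bit search: index of the lowest
// set bit, or 0xFFFFFFFF (-1) when the input is zero.
uint32_t ffbl_b32(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Mirrors the zero_extend -> 32-bit op -> truncate sequence for an i8.
uint8_t ffbl_i8_via_i32(uint8_t x) {
  const uint32_t wide = static_cast<uint32_t>(x);  // ISD::ZERO_EXTEND to i32
  const uint32_t r = ffbl_b32(wide);               // 32-bit low-bit search
  return static_cast<uint8_t>(r);                  // ISD::TRUNCATE back to i8
}

// The i8 select pattern targeted by the combine, evaluated directly:
//   select (x == 0), -1, cttz_zero_undef(x)
uint8_t select_pattern_i8(uint8_t x) {
  if (x == 0)
    return 0xFF;  // -1 as an i8
  uint8_t n = 0;
  while ((x & 1u) == 0) {
    x = static_cast<uint8_t>(x >> 1);
    ++n;
  }
  return n;
}
```
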

// The native instructions return -1 on 0 input. Optimize out a select that		// The native instructions return -1 on 0 input. Optimize out a select that
// produces -1 on 0.		// produces -1 on 0.
//		//
// TODO: If zero is not undef, we could also do this if the output is compared		// TODO: If zero is not undef, we could also do this if the output is compared
// against the bitwidth.		// against the bitwidth.
//		//
// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.		// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.
SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,		SDValue AMDGPUTargetLowering::performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond,
		arsenm (Not Done): You didn't add tests for this part
SDValue LHS, SDValue RHS,		SDValue LHS, SDValue RHS,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));		ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));
if (!CmpRhs || !CmpRhs->isNullValue())		if (!CmpRhs || !CmpRhs->isNullValue())
return SDValue();		return SDValue();

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();		ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
SDValue CmpLHS = Cond.getOperand(0);		SDValue CmpLHS = Cond.getOperand(0);

		unsigned Opc = isCttzOpc(RHS.getOpcode()) ? AMDGPUISD::FFBL_B32 :
		arsenm (Done): Ditto
		AMDGPUISD::FFBH_U32;

// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x		// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x
		// select (setcc x, 0, eq), -1, (cttz_zero_undef x) -> ffbl_b32 x
if (CCOpcode == ISD::SETEQ &&		if (CCOpcode == ISD::SETEQ &&
isCtlzOpc(RHS.getOpcode()) &&		(isCtlzOpc(RHS.getOpcode()) || isCttzOpc(RHS.getOpcode())) &&
RHS.getOperand(0) == CmpLHS &&		RHS.getOperand(0) == CmpLHS &&
isNegativeOne(LHS)) {		isNegativeOne(LHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBX_U32(DAG, CmpLHS, SL, Opc);
}		}

// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x		// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x
		// select (setcc x, 0, ne), (cttz_zero_undef x), -1 -> ffbl_b32 x
if (CCOpcode == ISD::SETNE &&		if (CCOpcode == ISD::SETNE &&
isCtlzOpc(LHS.getOpcode()) &&		(isCtlzOpc(LHS.getOpcode()) || isCttzOpc(RHS.getOpcode())) &&
LHS.getOperand(0) == CmpLHS &&		LHS.getOperand(0) == CmpLHS &&
isNegativeOne(RHS)) {		isNegativeOne(RHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBX_U32(DAG, CmpLHS, SL, Opc);
}		}

return SDValue();		return SDValue();
}		}
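Both folds in performCtlz_CttzCombine rely on the note above that the native instructions return -1 for a zero input. Once the select supplies -1 exactly when x == 0, the select and the zero-undef count together compute the same value as the single instruction. The C++ sketch below compares both select forms against a hypothetical ffbl_b32 model. The model is an assumption about the instruction, taken from the comment above, and not a real API; cttz_nonzero, select_eq_form and select_ne_form are likewise illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of v_ffbl_b32: lowest set bit index, or -1 for zero.
uint32_t ffbl_b32(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Stand-in for cttz_zero_undef: only meaningful for x != 0.
// Must not be called with x == 0, because the loop would never terminate.
uint32_t cttz_nonzero(uint32_t x) {
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// select (setcc x, 0, eq), -1, (cttz_zero_undef x)
uint32_t select_eq_form(uint32_t x) {
  return (x == 0) ? 0xFFFFFFFFu : cttz_nonzero(x);
}

// select (setcc x, 0, ne), (cttz_zero_undef x), -1
uint32_t select_ne_form(uint32_t x) {
  return (x != 0) ? cttz_nonzero(x) : 0xFFFFFFFFu;
}
```
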

static SDValue distributeOpThroughSelect(TargetLowering::DAGCombinerInfo &DCI,		static SDValue distributeOpThroughSelect(TargetLowering::DAGCombinerInfo &DCI,
unsigned Op,		unsigned Op,
const SDLoc &SL,		const SDLoc &SL,
(116 unchanged lines not shown)	if (VT == MVT::f32 && Subtarget->hasFminFmaxLegacy()) {
= combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);		= combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);
// Revisit this node so we can catch min3/max3/med3 patterns.		// Revisit this node so we can catch min3/max3/med3 patterns.
//DCI.AddToWorklist(MinMax.getNode());		//DCI.AddToWorklist(MinMax.getNode());
return MinMax;		return MinMax;
}		}
}		}

// There's no reason to not do this if the condition has other uses.		// There's no reason to not do this if the condition has other uses.
return performCtlzCombine(SDLoc(N), Cond, True, False, DCI);		return performCtlz_CttzCombine(SDLoc(N), Cond, True, False, DCI);
}		}

static bool isConstantFPZero(SDValue N) {		static bool isConstantFPZero(SDValue N) {
if (const ConstantFPSDNode *C = isConstOrConstSplatFP(N))		if (const ConstantFPSDNode *C = isConstOrConstSplatFP(N))
return C->isZero() && !C->isNegative();		return C->isZero() && !C->isNegative();
return false;		return false;
}		}

(571 unchanged lines not shown)	const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CARRY)		NODE_NAME_CASE(CARRY)
NODE_NAME_CASE(BORROW)		NODE_NAME_CASE(BORROW)
NODE_NAME_CASE(BFE_U32)		NODE_NAME_CASE(BFE_U32)
NODE_NAME_CASE(BFE_I32)		NODE_NAME_CASE(BFE_I32)
NODE_NAME_CASE(BFI)		NODE_NAME_CASE(BFI)
NODE_NAME_CASE(BFM)		NODE_NAME_CASE(BFM)
NODE_NAME_CASE(FFBH_U32)		NODE_NAME_CASE(FFBH_U32)
NODE_NAME_CASE(FFBH_I32)		NODE_NAME_CASE(FFBH_I32)
		NODE_NAME_CASE(FFBL_B32)
NODE_NAME_CASE(MUL_U24)		NODE_NAME_CASE(MUL_U24)
NODE_NAME_CASE(MUL_I24)		NODE_NAME_CASE(MUL_I24)
NODE_NAME_CASE(MULHI_U24)		NODE_NAME_CASE(MULHI_U24)
NODE_NAME_CASE(MULHI_I24)		NODE_NAME_CASE(MULHI_I24)
NODE_NAME_CASE(MUL_LOHI_U24)		NODE_NAME_CASE(MUL_LOHI_U24)
NODE_NAME_CASE(MUL_LOHI_I24)		NODE_NAME_CASE(MUL_LOHI_I24)
NODE_NAME_CASE(MAD_U24)		NODE_NAME_CASE(MAD_U24)
NODE_NAME_CASE(MAD_I24)		NODE_NAME_CASE(MAD_I24)
(190 unchanged lines not shown)

lib/Target/AMDGPU/AMDGPUInstrInfo.td

	(292 unchanged lines not shown)
	def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;			def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;			def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;

	def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;			def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;
	def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;			def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;

				def AMDGPUffbl_b32 : SDNode<"AMDGPUISD::FFBL_B32", SDTIntUnaryOp>;

				arsenm (Done): This isn't a signed/unsigned operation. There is just one v_ffbl_b32.
	// Signed and unsigned 24-bit multiply. The highest 8-bits are ignored			// Signed and unsigned 24-bit multiply. The highest 8-bits are ignored
	// when performing the multiply. The result is a 32-bit value.			// when performing the multiply. The result is a 32-bit value.
	def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,			def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,			def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	(109 unchanged lines not shown)

lib/Target/AMDGPU/EvergreenInstructions.td

	(436 unchanged lines not shown)

	def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;			def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
	def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;			def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;

	def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;			def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;
	def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;			def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;
	def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;			def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
	def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;			def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;
	def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;			def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", AMDGPUffbl_b32, VecALU>;

	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;			def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;
	}			}

	def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {			def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
	let Pattern = [];			let Pattern = [];
	let Itinerary = AnyALU;			let Itinerary = AnyALU;
	(313 unchanged lines not shown)

lib/Target/AMDGPU/SOPInstructions.td

	(153 unchanged lines not shown)
	def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",			def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",
	[(set i32:$sdst, (ctpop i32:$src0))]			[(set i32:$sdst, (ctpop i32:$src0))]
	>;			>;
	def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;			def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;
	} // End Defs = [SCC]			} // End Defs = [SCC]

	def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;			def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;
	def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;			def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;
				def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",			def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",
	[(set i32:$sdst, (cttz_zero_undef i32:$src0))]			[(set i32:$sdst, (AMDGPUffbl_b32 i32:$src0))]
	>;			>;
	def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",			def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",
	[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]
	>;			>;

	def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;			def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;
	def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",			def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",
	[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]
	(1,137 unchanged lines not shown)

test/CodeGen/AMDGPU/cttz_zero_undef.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI -check-prefix=SI-NOSDWA -check-prefix=FUNC %s
				arsenm (Not Done): Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken.
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI -check-prefix=SI-SDWA -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=EG -check-prefix=FUNC %s

				declare i7 @llvm.cttz.i7(i7, i1) nounwind readnone
				declare i8 @llvm.cttz.i8(i8, i1) nounwind readnone
				declare i16 @llvm.cttz.i16(i16, i1) nounwind readnone
	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone			declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone
				declare i64 @llvm.cttz.i64(i64, i1) nounwind readnone
	declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone			declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone
	declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone			declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone
	declare i32 @llvm.r600.read.tidig.x() nounwind readnone			declare i32 @llvm.r600.read.tidig.x() nounwind readnone

	; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:			; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:
	; SI: s_load_dword [[VAL:s[0-9]+]],			; SI: s_load_dword [[VAL:s[0-9]+]],
	; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]			; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]
	; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]			; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]
	(57 unchanged lines not shown)
	define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {			define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
	%tid = call i32 @llvm.r600.read.tidig.x()			%tid = call i32 @llvm.r600.read.tidig.x()
	%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid			%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid
	%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16			%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16
	%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone			%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone
	store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i8_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				arsenm (Done): This needs to check more
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @s_cttz_zero_undef_i8_with_select(i8 addrspace(1)* noalias %out, i8 %val) nounwind {
				%cttz = tail call i8 @llvm.cttz.i8(i8 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i8 %val, 0
				%ret = select i1 %cttz_ret, i8 %cttz, i8 32
				store i8 %cttz, i8 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i16_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				arsenm (Done): This needs to check more
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @s_cttz_zero_undef_i16_with_select(i16 addrspace(1)* noalias %out, i16 %val) nounwind {
				%cttz = tail call i16 @llvm.cttz.i16(i16 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i16 %val, 0
				%ret = select i1 %cttz_ret, i16 %cttz, i16 32
				store i16 %cttz, i16 addrspace(1)* %out, align 4
				ret void
				}

				arsenm (Done): Need i64 tests
				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32_with_select:
				; SI: s_ff1_i32_b32
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				; EG: FFBL_INT {{\? }}[[RESULT]]
				define amdgpu_kernel void @s_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 %val) nounwind {
				arsenm (Done): Missing scalar version
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %cttz, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i64_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				arsenm (Done): Ditto
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @s_cttz_zero_undef_i64_with_select(i64 addrspace(1)* noalias %out, i64 %val) nounwind {
				%cttz = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i64 %val, 0
				%ret = select i1 %cttz_ret, i64 %cttz, i64 32
				store i64 %cttz, i64 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i8_with_select:
				; SI-NOSDWA: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI-SDWA: v_ffbl_b32_sdwa
				arsenm (Done): Also should have some tests with i8/i16
				; EG: MEM_RAT MSKOR
				define amdgpu_kernel void @v_cttz_zero_undef_i8_with_select(i8 addrspace(1)* noalias %out, i8 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i8, i8 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i8 @llvm.cttz.i8(i8 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i8 %val, 0
				%ret = select i1 %cttz_ret, i8 %cttz, i8 32
				store i8 %ret, i8 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i16_with_select:
				; SI-NOSDWA: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI-SDWA: v_ffbl_b32_sdwa
				; EG: MEM_RAT MSKOR
				define amdgpu_kernel void @v_cttz_zero_undef_i16_with_select(i16 addrspace(1)* noalias %out, i16 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i16, i16 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i16 @llvm.cttz.i16(i16 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i16 %val, 0
				%ret = select i1 %cttz_ret, i16 %cttz, i16 32
				store i16 %ret, i16 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i32_with_select:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %ret, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i64_with_select:
				; SI: v_or_b32_e32
				; SI: v_or_b32_e32
				arsenm (Done): This isn't checking the outputs and select
				; SI-NOSDWA-DAG: v_or_b32_e32
				; SI-NOSDWA-DAG: v_or_b32_e32
				; SI-SDWA-DAG: v_or_b32_sdwa
				; SI-SDWA-DAG: v_or_b32_sdwa
				arsenm (Done): Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one.
				wdng (author, Done): No, it won't work if I remove the -DAG, because the generated instructions get interleaved with each other.
				; SI: v_or_b32_e32 [[VAL1:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
				; SI: v_or_b32_e32 [[VAL2:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
				; SI-DAG: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL1]]
				; SI-DAG: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL2]]
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i64_with_select(i64 addrspace(1)* noalias %out, i64 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i64, i64 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) nounwind readnone
				arsenm (Done): Using undefined VAL
				%cttz_ret = icmp ne i64 %val, 0
				%ret = select i1 %cttz_ret, i64 %cttz, i64 32
				store i64 %ret, i64 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_eq_neg1:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL:v[0-9]+]]
				; SI: v_cmp_ne_u32_e32 vcc, 0, [[VAL]]
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_eq_neg1(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				arsenm (Done): Undefined VAL
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp eq i32 %val, 0
				%sel = select i1 %cmp, i32 -1, i32 %ctlz
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_ne_neg1:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL:v[0-9]+]]
				; SI: v_cmp_ne_u32_e32 vcc, 0, [[VAL]]
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_ne_neg1(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp ne i32 %val, 0
				%sel = select i1 %cmp, i32 %ctlz, i32 -1
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_ne_bitwidth:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI: v_cmp
				; SI: v_cndmask
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_ne_bitwidth(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp ne i32 %ctlz, 32
				%sel = select i1 %cmp, i32 %ctlz, i32 -1
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i8_sel_eq_neg1:
				; SI: {{buffer|flat}}_load_ubyte
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i8_sel_eq_neg1(i8 addrspace(1)* noalias %out, i8 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i8, i8 addrspace(1)* %arrayidx, align 1
				%ctlz = call i8 @llvm.cttz.i8(i8 %val, i1 false) nounwind readnone
				%cmp = icmp eq i8 %val, 0
				%sel = select i1 %cmp, i8 -1, i8 %ctlz
				store i8 %sel, i8 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i16_sel_eq_neg1:
				; SI: {{buffer|flat}}_load_ubyte
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI: buffer_store_short
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i16_sel_eq_neg1(i16 addrspace(1)* noalias %out, i16 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i16, i16 addrspace(1)* %arrayidx, align 1
				%ctlz = call i16 @llvm.cttz.i16(i16 %val, i1 false) nounwind readnone
				%cmp = icmp eq i16 %val, 0
				%sel = select i1 %cmp, i16 -1, i16 %ctlz
				store i16 %sel, i16 addrspace(1)* %out
				ret void
				}

Revision Contents

Diff 118718

include/llvm/Target/TargetSelectionDAG.td

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.td

lib/Target/AMDGPU/EvergreenInstructions.td

lib/Target/AMDGPU/SOPInstructions.td

test/CodeGen/AMDGPU/cttz_zero_undef.ll
