This is an archive of the discontinued LLVM Phabricator instance.

Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Closed, Public

Authored by wdng on Aug 31 2017, 12:27 PM.


Details

Reviewers

arsenm
b-sumner
t-tye
kzhuravl
rampitec

Commits

rG5676acad9e0b: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Summary

During the DAGCombine phase, LLVM converts ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ, which is then expanded during legalization. This prevents generation of the v_ffbl_b32 instruction. This patch implements custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision.Aug 31 2017, 12:27 PM

Herald added a subscriber: nhaehnle. Aug 31 2017, 12:28 PM

arsenm added inline comments.Aug 31 2017, 1:34 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	The existing code was the correct way to do this

arsenm added a subscriber: llvm-commits.Aug 31 2017, 1:34 PM

wdng added inline comments.Aug 31 2017, 1:54 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	With the original code, we get the following code transformations:

Initial selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 50 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t3: i64 = Constant<0>
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
  t6: i64,ch = merge_values t5, t5:1
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t10: i64,ch = merge_values t9, t9:1
  t11: ch = TokenFactor t6:1, t10:1
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t20: i16,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>)> t11, t19, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t27: i64,ch = load<LD8[%11(addrspace=2)](tbaa=<0x4435518>)> t11, t25, undef:i64
  t29: i64 = Constant<32>
              t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
              t21: i32 = zero_extend t20
            t22: i32 = mul t17, t21
            t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
          t23: i32 = add t22, t15 
        t26: i64 = zero_extend t23 
      t28: i64 = add t27, t26 
    t31: i64 = shl t28, Constant:i32<32>
  t32: i64 = sra t31, Constant:i32<32>
    t33: i64 = add t6, t32 
  t34: i8,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>)> t11, t33, undef:i64
  t35: i32 = zero_extend t34 
    t39: i1 = setcc t35, Constant:i32<0>, setne:ch
    t36: i32 = cttz_zero_undef t35
  t40: i32 = select t39, t36, Constant:i32<32>
  t43: i1 = setcc Constant:i32<8>, t40, setult:ch
      t47: ch = TokenFactor t20:1, t27:1, t34:1
        t44: i32 = umin t40, Constant:i32<8>
      t45: i8 = truncate t44 
      t46: i64 = add t10, t32 
    t48: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>)> t47, t45, t46, undef:i64
  t49: ch = ENDPGM t48

Optimized lowered selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 35 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t11: ch = TokenFactor t5:1, t9:1
    t33: i64 = add t5, t63 
  t54: i32,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>), zext from i8> t11, t33, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t62: i32,ch = load<LD4[%11(addrspace=2)](align=8)(tbaa=<0x4435518>)> t11, t25, undef:i64
          t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
        t22: i32 = mul t17, t64 
        t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
      t23: i32 = add t22, t15
    t60: i32 = add t62, t23
  t63: i64 = sign_extend t60, ValueType:ch:i32
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t64: i32,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>), zext from i16> t11, t19, undef:i64
      t47: ch = TokenFactor t64:1, t62:1, t54:1
        t53: i32 = cttz t54
      t44: i32 = umin t53, Constant:i32<8>
      t46: i64 = add t9, t63
    t50: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>), trunc to i8> t47, t44, t46, undef:i64
  t49: ch = ENDPGM t50

We won't be able to generate the s/v_ffbl instructions. I found that all llvm.cttz.i32 calls have been converted to cttz_zero_undef instead of 'cttz'.

If we don't want to change the original implementation, should we instead add a custom lowering in the AMDGPU backend from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF?

Ping.

I think the actual problem is the implementation of ISD::CTTZ not using v_ffbl and not this transformation.

If v_ffbl is able to produce a defined answer of bit width for 0, then you want to match it with cttz and have the operation action for cttz_zero_undef set to Expand. That will turn all cttz_zero_undef calls into cttz.

If v_ffbl is not capable of handling zero, then you want cttz_zero_undef set to Legal, and cttz set to Expand which will make use of cttz_zero_undef and a select. Or you can make cttz Custom and do your own lowering.

I think the instruction behavior is to return -1 on 0 input. IIRC we handle this and fold that for ctlz already, just not cttz.
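
To make the zero-input behavior discussed above concrete, here is a minimal standalone C++ sketch. It is not LLVM code: ffblB32 is a hypothetical software model of v_ffbl_b32 that assumes, as recalled in the comment above, that the instruction returns -1 for a zero input. The names cttzZeroUndef, cttz32 and selectMinusOne are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of v_ffbl_b32 (assumption: -1 on zero input).
constexpr uint32_t ffblB32(uint32_t X) {
  if (X == 0)
    return UINT32_MAX; // -1 viewed as an unsigned 32-bit value
  uint32_t N = 0;
  while ((X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// Model of ISD::CTTZ_ZERO_UNDEF: the result for zero is unspecified, so this
// model returns an arbitrary placeholder (0) for that case.
constexpr uint32_t cttzZeroUndef(uint32_t X) {
  return X == 0 ? 0u : ffblB32(X);
}

// Model of ISD::CTTZ: zero is defined and yields the bit width.
constexpr uint32_t cttz32(uint32_t X) {
  return X == 0 ? 32u : cttzZeroUndef(X);
}

// The "select with -1" pattern: (x == 0) ? -1 : cttz_zero_undef(x).
// Under the assumption above, this computes the same value as ffblB32.
constexpr uint32_t selectMinusOne(uint32_t X) {
  return X == 0 ? UINT32_MAX : cttzZeroUndef(X);
}

static_assert(ffblB32(0) == UINT32_MAX, "zero input gives -1 in this model");
static_assert(ffblB32(8) == 3, "lowest set bit of 0b1000 is bit 3");
static_assert(cttz32(0) == 32, "cttz is defined as the bit width for zero");

// Spot-check that the select pattern agrees with the ffbl model.
inline void checkSelectFold() {
  const uint32_t Samples[] = {0u, 1u, 2u, 40u, 0x80000000u, UINT32_MAX};
  for (uint32_t X : Samples)
    assert(selectMinusOne(X) == ffblB32(X));
}
```

If the instruction really behaves this way, a select that produces -1 for a zero input can be replaced by the instruction alone, while a plain cttz still needs one select to turn -1 into the bit width.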

Just add a custom lowering of ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

In D37348#859119, @wdng wrote:

Just add a custom lowering of ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

I don't think that will help. Why not follow exactly how CTLZ* is handled now and implement AMDGPUTargetLowering::LowerCTTZ making use of ffbl?

wdng updated this revision to Diff 114215.Sep 7 2017, 11:13 AM

wdng retitled this revision from Tighten conditions for converting ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ to Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ..

wdng added a reviewer: craig.topper.Sep 7 2017, 11:36 AM

Ping.

wdng added a reviewer: t-tye.Sep 8 2017, 12:10 PM

arsenm added inline comments.Sep 8 2017, 1:22 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2784–2793 ↗	(On Diff #114215)	I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420 ↗	(On Diff #114215)	This should definitely remain legal
2029 ↗	(On Diff #114215)	You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
lib/Target/AMDGPU/AMDGPUInstrInfo.td
301–302 ↗	(On Diff #114215)	This isn't a signed/unsigned operation. There is just one v_ffbl_b32.
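
For reference, the i64 path mentioned in the comments above works by splitting the 64-bit value into two 32-bit halves. The following standalone C++ sketch shows only that arithmetic, following the formulas in the diff's comments (ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x)), and the mirrored form for cttz). It is not the SelectionDAG code, and all function names are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit helpers standing in for the *_ZERO_UNDEF nodes. The result for a
// zero input is unspecified; these models happen to return 0 in that case.
constexpr uint32_t cttz32ZeroUndef(uint32_t X) {
  uint32_t N = 0;
  while (X != 0 && (X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

constexpr uint32_t ctlz32ZeroUndef(uint32_t X) {
  uint32_t N = 0;
  while (X != 0 && (X & 0x80000000u) == 0) {
    X <<= 1;
    ++N;
  }
  return N;
}

// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
// plus a final select that returns 64 when the whole input is zero.
constexpr uint32_t cttz64(uint64_t X) {
  const uint32_t Lo = static_cast<uint32_t>(X);
  const uint32_t Hi = static_cast<uint32_t>(X >> 32);
  if (Lo == 0 && Hi == 0)
    return 64;
  return Lo == 0 ? cttz32ZeroUndef(Hi) + 32 : cttz32ZeroUndef(Lo);
}

// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
// plus the same final select for a zero input.
constexpr uint32_t ctlz64(uint64_t X) {
  const uint32_t Lo = static_cast<uint32_t>(X);
  const uint32_t Hi = static_cast<uint32_t>(X >> 32);
  if (Lo == 0 && Hi == 0)
    return 64;
  return Hi == 0 ? ctlz32ZeroUndef(Lo) + 32 : ctlz32ZeroUndef(Hi);
}

static_assert(cttz64(1ull << 40) == 40, "only bit 40 set");
static_assert(ctlz64(1ull << 40) == 23, "63 - 40 leading zeros");
static_assert(cttz64(0) == 64 && ctlz64(0) == 64, "zero gives the bit width");
```

The two functions differ only in which half is tested first and which half gets the +32 adjustment, which is why the review asks for the ctlz and cttz lowering to share a common helper.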

craig.topper resigned from this revision.Sep 8 2017, 10:48 PM

wdng marked 2 inline comments as done.Sep 11 2017, 9:05 AM

wdng added inline comments.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420 ↗	(On Diff #114215)	I don't think it matters if we define it as Custom, because it will be converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. However, if we define it as Legal, we will have a duplicate pattern (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I am missing for why it has to be Legal?

arsenm added inline comments.Sep 12 2017, 7:05 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806 ↗	(On Diff #114215)	OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
417 ↗	(On Diff #114215)	We should probably fix this at some point to be legal
1109–1110 ↗	(On Diff #114215)	Also need the select with -1 optimization (and corresponding tests) as cttz
2021 ↗	(On Diff #114215)	This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
test/CodeGen/AMDGPU/cttz_zero_undef.ll
103 ↗	(On Diff #114215)	Need i64 tests
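
The expansions compared in the LegalizeDAG.cpp comment above can be checked against each other outside of LLVM. This standalone C++ sketch models the compare+select form and the two forms quoted in the existing source comments, popcount(~x & (x - 1)) and 32 - nlz(~x & (x - 1)). The helper names are hypothetical, and the sketch says nothing about which form is cheaper on a given target.

```cpp
#include <cassert>
#include <cstdint>

// Reference helpers (illustrative, not LLVM APIs).
constexpr uint32_t popcount32(uint32_t X) {
  uint32_t N = 0;
  while (X != 0) {
    X &= X - 1; // clear the lowest set bit
    ++N;
  }
  return N;
}

constexpr uint32_t ctlz32(uint32_t X) {
  if (X == 0)
    return 32;
  uint32_t N = 0;
  while ((X & 0x80000000u) == 0) {
    X <<= 1;
    ++N;
  }
  return N;
}

// Stand-in for CTTZ_ZERO_UNDEF; the value returned for zero is arbitrary.
constexpr uint32_t cttzZeroUndef(uint32_t X) {
  uint32_t N = 0;
  while (X != 0 && (X & 1u) == 0) {
    X >>= 1;
    ++N;
  }
  return N;
}

// Compare+select form: (x == 0) ? 32 : cttz_zero_undef(x).
constexpr uint32_t cttzViaSelect(uint32_t X) {
  return X == 0 ? 32u : cttzZeroUndef(X);
}

// popcount(~x & (x - 1)): the mask covers exactly the trailing zero bits.
constexpr uint32_t cttzViaPopcount(uint32_t X) {
  return popcount32(~X & (X - 1u));
}

// 32 - nlz(~x & (x - 1)): the same mask, measured with a leading-zero count.
constexpr uint32_t cttzViaCtlz(uint32_t X) {
  return 32u - ctlz32(~X & (X - 1u));
}

// Spot-check that all three forms agree, including for zero.
inline void checkExpansionsAgree() {
  const uint32_t Samples[] = {0u, 1u, 8u, 40u, 0x80000000u, UINT32_MAX};
  for (uint32_t X : Samples) {
    assert(cttzViaSelect(X) == cttzViaPopcount(X));
    assert(cttzViaSelect(X) == cttzViaCtlz(X));
  }
}
```

In the diff, the compare+select form is only attempted when isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT) holds; otherwise the popcount or ctlz based expansion is used as before.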

Changes based on code review feedback.

Upload a full diff.

Missing performCtlzCombine equivalent

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806 ↗	(On Diff #114215)	I don't see this changed
test/CodeGen/AMDGPU/cttz_zero_undef.ll
109 ↗	(On Diff #115231)	Missing scalar version

Address code reviews.

Fix the issues that variables are not capitalized.

wdng marked 2 inline comments as done.Sep 14 2017, 2:50 PM

Ping.

arsenm added inline comments.Sep 15 2017, 10:17 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2036–2042 ↗	(On Diff #115296)	Indentation wrong
2043 ↗	(On Diff #115296)	llvm_unreachable
2061–2064 ↗	(On Diff #115296)	Select between Zero and One as input to getSetCC
3008–3009 ↗	(On Diff #115296)	You could just pass in the new opcode directly rather than selecting it again
3010 ↗	(On Diff #115296)	Commented out code
3035 ↗	(On Diff #115296)	You didn't add tests for this part
lib/Target/AMDGPU/AMDGPUISelLowering.h
374 ↗	(On Diff #115296)	Should be named FFBL_B32 to match the instruction
test/CodeGen/AMDGPU/cttz_zero_undef.ll
131 ↗	(On Diff #115296)	Also should have some tests with i8/i16

Will create another separate ticket to fix the v_ffbl_sdwa instruction generation.

Ping.

wdng edited reviewers, added: kzhuravl; removed: craig.topper.Sep 22 2017, 12:29 PM

Ping.

Needs more comprehensive check lines. Just checking the instructions won't demonstrate that the extra instructions you're trying to avoid aren't there

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3028 ↗	(On Diff #115908)	Don't include AMDGPUISD in the name of this
3053 ↗	(On Diff #115908)	Ditto
test/CodeGen/AMDGPU/cttz_zero_undef.ll
85 ↗	(On Diff #115908)	This needs to check more
97 ↗	(On Diff #115908)	This needs to check more
121–122 ↗	(On Diff #115908)	Ditto

Address code reviews.

Ping.

wdng added a reviewer: rampitec.Oct 10 2017, 11:10 AM

arsenm added inline comments.Oct 11 2017, 11:05 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2031 ↗	(On Diff #118277)	Extra space after ::
2040 ↗	(On Diff #118277)	Don't include AMDGPUISD in variable name
2041 ↗	(On Diff #118277)	Missing space before {
2073 ↗	(On Diff #118277)	Double // and missing closing )
2077 ↗	(On Diff #118277)	Double //
test/CodeGen/AMDGPU/cttz_zero_undef.ll
1 ↗	(On Diff #118277)	Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken
171–172 ↗	(On Diff #118277)	This isn't checking the outputs and select
184 ↗	(On Diff #118277)	Using undefined VAL
198 ↗	(On Diff #118277)	Undefined VAL

Address code reviews.

wdng marked 3 inline comments as done.Oct 11 2017, 4:14 PM

craig.topper removed a subscriber: craig.topper.Oct 11 2017, 4:16 PM

arsenm added inline comments.Oct 11 2017, 4:24 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
175–176 ↗	(On Diff #118718)	Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one

wdng added inline comments.Oct 11 2017, 4:27 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
175–176 ↗	(On Diff #118718)	No, it won't work if I remove the -DAG. As the generated instructions get interleaved with each other.

Remove duplicate check lines.

Removed -DAG checks completely.

LGTM

This revision is now accepted and ready to land.Oct 12 2017, 10:39 AM

Closed by commit rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. (authored by wdng). Oct 12 2017, 12:37 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/include/llvm/Target/TargetSelectionDAG.td	2 lines
llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	19 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h	7 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp	100 lines
llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td	2 lines
llvm/trunk/lib/Target/AMDGPU/EvergreenInstructions.td	2 lines
llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td	5 lines
llvm/trunk/test/CodeGen/AMDGPU/cttz_zero_undef.ll	194 lines

Diff 118831

llvm/trunk/include/llvm/Target/TargetSelectionDAG.td

Show First 20 Lines • Show All 126 Lines • ▼ Show 20 Lines	def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;		]>;
def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.		def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>
]>;		]>;
def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.		def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>
]>;		]>;
def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz		def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz, cttz
SDTCisSameAs<0, 1>, SDTCisInt<0>		SDTCisSameAs<0, 1>, SDTCisInt<0>
]>;		]>;
def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext		def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc		def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
▲ Show 20 Lines • Show All 1,045 Lines • Show Last 20 Lines

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

Show First 20 Lines • Show All 2,767 Lines • ▼ Show 20 Lines	case ISD::CTPOP: {

return Op;		return Op;
}		}
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
// This trivially expands to CTLZ.		// This trivially expands to CTLZ.
return DAG.getNode(ISD::CTLZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTLZ, dl, Op.getValueType(), Op);
case ISD::CTLZ: {		case ISD::CTLZ: {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
unsigned len = VT.getSizeInBits();		unsigned Len = VT.getSizeInBits();

if (TLI.isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) {		if (TLI.isOperationLegalOrCustom(ISD::CTLZ_ZERO_UNDEF, VT)) {
EVT SetCCVT = getSetCCResultType(VT);		EVT SetCCVT = getSetCCResultType(VT);
SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op);		SDValue CTLZ = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, dl, VT, Op);
SDValue Zero = DAG.getConstant(0, dl, VT);		SDValue Zero = DAG.getConstant(0, dl, VT);
SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
DAG.getConstant(len, dl, VT), CTLZ);		DAG.getConstant(Len, dl, VT), CTLZ);
}		}

// for now, we do this:		// for now, we do this:
// x = x | (x >> 1);		// x = x | (x >> 1);
// x = x | (x >> 2);		// x = x | (x >> 2);
// ...		// ...
// x = x | (x >>16);		// x = x | (x >>16);
// x = x | (x >>32); // for 64-bit input		// x = x | (x >>32); // for 64-bit input
// return popcount(~x);		// return popcount(~x);
//		//
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());		EVT ShVT = TLI.getShiftAmountTy(VT, DAG.getDataLayout());
for (unsigned i = 0; (1U << i) <= (len / 2); ++i) {		for (unsigned i = 0; (1U << i) <= (Len / 2); ++i) {
SDValue Tmp3 = DAG.getConstant(1ULL << i, dl, ShVT);		SDValue Tmp3 = DAG.getConstant(1ULL << i, dl, ShVT);
Op = DAG.getNode(ISD::OR, dl, VT, Op,		Op = DAG.getNode(ISD::OR, dl, VT, Op,
DAG.getNode(ISD::SRL, dl, VT, Op, Tmp3));		DAG.getNode(ISD::SRL, dl, VT, Op, Tmp3));
}		}
Op = DAG.getNOT(dl, Op, VT);		Op = DAG.getNOT(dl, Op, VT);
return DAG.getNode(ISD::CTPOP, dl, VT, Op);		return DAG.getNode(ISD::CTPOP, dl, VT, Op);
}		}
case ISD::CTTZ_ZERO_UNDEF:		case ISD::CTTZ_ZERO_UNDEF:
// This trivially expands to CTTZ.		// This trivially expands to CTTZ.
return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);
case ISD::CTTZ: {		case ISD::CTTZ: {
		EVT VT = Op.getValueType();
		unsigned Len = VT.getSizeInBits();

		if (TLI.isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
		EVT SetCCVT = getSetCCResultType(VT);
		SDValue CTTZ = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, dl, VT, Op);
		SDValue Zero = DAG.getConstant(0, dl, VT);
		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
		DAG.getConstant(Len, dl, VT), CTTZ);
		}

// for now, we use: { return popcount(~x & (x - 1)); }		// for now, we use: { return popcount(~x & (x - 1)); }
// unless the target has ctlz but not ctpop, in which case we use:		// unless the target has ctlz but not ctpop, in which case we use:
// { return 32 - nlz(~x & (x-1)); }		// { return 32 - nlz(~x & (x-1)); }
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT VT = Op.getValueType();
SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,		SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,
DAG.getNOT(dl, Op, VT),		DAG.getNOT(dl, Op, VT),
DAG.getNode(ISD::SUB, dl, VT, Op,		DAG.getNode(ISD::SUB, dl, VT, Op,
DAG.getConstant(1, dl, VT)));		DAG.getConstant(1, dl, VT)));
// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.		// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&		if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&
TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))		TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))
return DAG.getNode(ISD::SUB, dl, VT,		return DAG.getNode(ISD::SUB, dl, VT,
▲ Show 20 Lines • Show All 1,863 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h

Show All 26 Lines
struct ArgDescriptor;		struct ArgDescriptor;

class AMDGPUTargetLowering : public TargetLowering {		class AMDGPUTargetLowering : public TargetLowering {
private:		private:
/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been		/// \returns AMDGPUISD::FFBH_U32 node if the incoming \p Op may have been
/// legalized from a smaller type VT. Need to match pre-legalized type because		/// legalized from a smaller type VT. Need to match pre-legalized type because
/// the generic legalization inserts the add/sub between the select and		/// the generic legalization inserts the add/sub between the select and
/// compare.		/// compare.
SDValue getFFBH_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL) const;		SDValue getFFBX_U32(SelectionDAG &DAG, SDValue Op, const SDLoc &DL, unsigned Opc) const;

public:		public:
static bool isOrEquivalentToAdd(SelectionDAG &DAG, SDValue Op);		static bool isOrEquivalentToAdd(SelectionDAG &DAG, SDValue Op);

protected:		protected:
const AMDGPUSubtarget *Subtarget;		const AMDGPUSubtarget *Subtarget;
AMDGPUAS AMDGPUASI;		AMDGPUAS AMDGPUASI;

SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerEXTRACT_SUBVECTOR(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const;
/// \brief Split a vector store into multiple scalar stores.		/// \brief Split a vector store into multiple scalar stores.
/// \returns The resulting chain.		/// \returns The resulting chain.

SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFCEIL(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFCEIL(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFTRUNC(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFTRUNC(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerCTLZ(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTLZ_CTTZ(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;
Show All 14 Lines	SDValue splitBinaryBitConstantOpImpl(DAGCombinerInfo &DCI, const SDLoc &SL,
uint32_t ValLo, uint32_t ValHi) const;		uint32_t ValLo, uint32_t ValHi) const;
SDValue performShlCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performShlCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performSraCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSraCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performSrlCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSrlCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulhsCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulhuCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulhuCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performMulLoHi24Combine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performMulLoHi24Combine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performCtlzCombine(const SDLoc &SL, SDValue Cond, SDValue LHS,		SDValue performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond, SDValue LHS,
SDValue RHS, DAGCombinerInfo &DCI) const;		SDValue RHS, DAGCombinerInfo &DCI) const;
SDValue performSelectCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performSelectCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performFNegCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performFNegCombine(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue performFAbsCombine(SDNode *N, DAGCombinerInfo &DCI) const;		SDValue performFAbsCombine(SDNode *N, DAGCombinerInfo &DCI) const;

static EVT getEquivalentMemType(LLVMContext &Context, EVT VT);		static EVT getEquivalentMemType(LLVMContext &Context, EVT VT);

virtual SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,		virtual SDValue LowerGlobalAddress(AMDGPUMachineFunction *MFI, SDValue Op,
▲ Show 20 Lines • Show All 266 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
CARRY,		CARRY,
BORROW,		BORROW,
BFE_U32, // Extract range of bits with zero extension to 32-bits.		BFE_U32, // Extract range of bits with zero extension to 32-bits.
BFE_I32, // Extract range of bits with sign extension to 32-bits.		BFE_I32, // Extract range of bits with sign extension to 32-bits.
BFI, // (src0 & src1) | (~src0 & src2)		BFI, // (src0 & src1) | (~src0 & src2)
BFM, // Insert a range of bits into a 32-bit word.		BFM, // Insert a range of bits into a 32-bit word.
FFBH_U32, // ctlz with -1 if input is zero.		FFBH_U32, // ctlz with -1 if input is zero.
FFBH_I32,		FFBH_I32,
		FFBL_B32, // cttz with -1 if input is zero.
MUL_U24,		MUL_U24,
MUL_I24,		MUL_I24,
MULHI_U24,		MULHI_U24,
MULHI_I24,		MULHI_I24,
MAD_U24,		MAD_U24,
MAD_I24,		MAD_I24,
MUL_LOHI_I24,		MUL_LOHI_I24,
MUL_LOHI_U24,		MUL_LOHI_U24,
▲ Show 20 Lines • Show All 70 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Show First 20 Lines • Show All 411 Lines • ▼ Show 20 Lines	AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::UMIN, MVT::i32, Legal);		setOperationAction(ISD::UMIN, MVT::i32, Legal);
setOperationAction(ISD::SMAX, MVT::i32, Legal);		setOperationAction(ISD::SMAX, MVT::i32, Legal);
setOperationAction(ISD::UMAX, MVT::i32, Legal);		setOperationAction(ISD::UMAX, MVT::i32, Legal);

if (Subtarget->hasFFBH())		if (Subtarget->hasFFBH())
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);

if (Subtarget->hasFFBL())		if (Subtarget->hasFFBL())
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Legal);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);

		setOperationAction(ISD::CTTZ, MVT::i64, Custom);
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Custom);
setOperationAction(ISD::CTLZ, MVT::i64, Custom);		setOperationAction(ISD::CTLZ, MVT::i64, Custom);
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);

// We only really have 32-bit BFE instructions (and 16-bit on VI).		// We only really have 32-bit BFE instructions (and 16-bit on VI).
//		//
// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any		// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
// effort to match them now. We want this to be false for i64 cases when the		// effort to match them now. We want this to be false for i64 cases when the
// extraction isn't restricted to the upper or lower half. Ideally we would		// extraction isn't restricted to the upper or lower half. Ideally we would
▲ Show 20 Lines • Show All 678 Lines • ▼ Show 20 Lines	SDValue AMDGPUTargetLowering::LowerOperation(SDValue Op,
case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);		case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);
case ISD::FROUND: return LowerFROUND(Op, DAG);		case ISD::FROUND: return LowerFROUND(Op, DAG);
case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);		case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);
case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);		case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);
case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);		case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);
case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);		case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);
case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);		case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);		case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
		case ISD::CTTZ:
		case ISD::CTTZ_ZERO_UNDEF:
case ISD::CTLZ:		case ISD::CTLZ:
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
return LowerCTLZ(Op, DAG);		return LowerCTLZ_CTTZ(Op, DAG);
case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);		case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);
}		}
return Op;		return Op;
}		}

void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,		void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,		SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
▲ Show 20 Lines • Show All 1,022 Lines • ▼ Show 20 Lines	SDValue AMDGPUTargetLowering::LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const {
SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);		SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);
SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);		SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);

SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);		SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);
// TODO: Should this propagate fast-math-flags?		// TODO: Should this propagate fast-math-flags?
return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);		return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);
}		}

SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {		static bool isCtlzOpc(unsigned Opc) {
		return Opc == ISD::CTLZ || Opc == ISD::CTLZ_ZERO_UNDEF;
		}

		static bool isCttzOpc(unsigned Opc) {
		return Opc == ISD::CTTZ || Opc == ISD::CTTZ_ZERO_UNDEF;
		}

		SDValue AMDGPUTargetLowering::LowerCTLZ_CTTZ(SDValue Op, SelectionDAG &DAG) const {
SDLoc SL(Op);		SDLoc SL(Op);
SDValue Src = Op.getOperand(0);		SDValue Src = Op.getOperand(0);
bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;		bool ZeroUndef = Op.getOpcode() == ISD::CTTZ_ZERO_UNDEF ||
		Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;

		unsigned ISDOpc, NewOpc;
		if (isCtlzOpc(Op.getOpcode())) {
		ISDOpc = ISD::CTLZ_ZERO_UNDEF;
		NewOpc = AMDGPUISD::FFBH_U32;
		} else if (isCttzOpc(Op.getOpcode())) {
		ISDOpc = ISD::CTTZ_ZERO_UNDEF;
		NewOpc = AMDGPUISD::FFBL_B32;
		} else
		llvm_unreachable("Unexpected OPCode!!!");


if (ZeroUndef && Src.getValueType() == MVT::i32)		if (ZeroUndef && Src.getValueType() == MVT::i32)
return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);		return DAG.getNode(NewOpc, SL, MVT::i32, Src);

SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);		SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);

const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);		const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
const SDValue One = DAG.getConstant(1, SL, MVT::i32);		const SDValue One = DAG.getConstant(1, SL, MVT::i32);

SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);
SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);

EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),
*DAG.getContext(), MVT::i32);		*DAG.getContext(), MVT::i32);

SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);		SDValue ZeroOrOne = isCtlzOpc(Op.getOpcode()) ? Zero : One;
		SDValue HiOrLo = isCtlzOpc(Op.getOpcode()) ? Hi : Lo;
		SDValue Hi0orLo0 = DAG.getSetCC(SL, SetCCVT, HiOrLo, ZeroOrOne, ISD::SETEQ);

SDValue CtlzLo = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Lo);		SDValue OprLo = DAG.getNode(ISDOpc, SL, MVT::i32, Lo);
SDValue CtlzHi = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Hi);		SDValue OprHi = DAG.getNode(ISDOpc, SL, MVT::i32, Hi);

const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);
SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CtlzLo, Bits32);		SDValue Add, NewOpr;
		if (isCtlzOpc(Op.getOpcode())) {
		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprLo, Bits32);
// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))		// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
SDValue NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CtlzHi);		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprHi);
		} else {
		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprHi, Bits32);
		// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0orLo0, Add, OprLo);
		}

if (!ZeroUndef) {		if (!ZeroUndef) {
// Test if the full 64-bit input is zero.		// Test if the full 64-bit input is zero.

// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,		// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,
// which we probably don't want.		// which we probably don't want.
SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);		SDValue LoOrHi = isCtlzOpc(Op.getOpcode()) ? Lo : Hi;
SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);		SDValue Lo0OrHi0 = DAG.getSetCC(SL, SetCCVT, LoOrHi, ZeroOrOne, ISD::SETEQ);
		SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0OrHi0, Hi0orLo0);

// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction		// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction
// with the same cycles, otherwise it is slower.		// with the same cycles, otherwise it is slower.
// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,		// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,
// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);		// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);

const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);

// The instruction returns -1 for 0 input, but the defined intrinsic		// The instruction returns -1 for 0 input, but the defined intrinsic
// behavior is to return the number of bits.		// behavior is to return the number of bits.
NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32,		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32,
SrcIsZero, Bits32, NewCtlz);		SrcIsZero, Bits32, NewOpr);
}		}

return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewCtlz);		return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewOpr);
}		}

SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,		SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,
bool Signed) const {		bool Signed) const {
// Unsigned		// Unsigned
// cul2f(ulong u)		// cul2f(ulong u)
//{		//{
// uint lz = clz(u);		// uint lz = clz(u);
▲ Show 20 Lines • Show All 895 Lines • ▼ Show 20 Lines
}		}

static bool isNegativeOne(SDValue Val) {		static bool isNegativeOne(SDValue Val) {
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))
return C->isAllOnesValue();		return C->isAllOnesValue();
return false;		return false;
}		}

static bool isCtlzOpc(unsigned Opc) {		SDValue AMDGPUTargetLowering::getFFBX_U32(SelectionDAG &DAG,
return Opc == ISD::CTLZ || Opc == ISD::CTLZ_ZERO_UNDEF;
}

SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,
SDValue Op,		SDValue Op,
const SDLoc &DL) const {		const SDLoc &DL,
		unsigned Opc) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);		EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);
if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&		if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&
LegalVT != MVT::i16))		LegalVT != MVT::i16))
return SDValue();		return SDValue();

if (VT != MVT::i32)		if (VT != MVT::i32)
Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);		Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);

SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);		SDValue FFBX = DAG.getNode(Opc, DL, MVT::i32, Op);
if (VT != MVT::i32)		if (VT != MVT::i32)
FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);		FFBX = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBX);

return FFBH;		return FFBX;
}		}

// The native instructions return -1 on 0 input. Optimize out a select that		// The native instructions return -1 on 0 input. Optimize out a select that
// produces -1 on 0.		// produces -1 on 0.
//		//
// TODO: If zero is not undef, we could also do this if the output is compared		// TODO: If zero is not undef, we could also do this if the output is compared
// against the bitwidth.		// against the bitwidth.
//		//
// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.		// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.
SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,		SDValue AMDGPUTargetLowering::performCtlz_CttzCombine(const SDLoc &SL, SDValue Cond,
SDValue LHS, SDValue RHS,		SDValue LHS, SDValue RHS,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));		ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));
if (!CmpRhs || !CmpRhs->isNullValue())		if (!CmpRhs || !CmpRhs->isNullValue())
return SDValue();		return SDValue();

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();		ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
SDValue CmpLHS = Cond.getOperand(0);		SDValue CmpLHS = Cond.getOperand(0);

		unsigned Opc = isCttzOpc(RHS.getOpcode()) ? AMDGPUISD::FFBL_B32 :
		AMDGPUISD::FFBH_U32;

// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x		// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x
		// select (setcc x, 0, eq), -1, (cttz_zero_undef x) -> ffbl_u32 x
if (CCOpcode == ISD::SETEQ &&		if (CCOpcode == ISD::SETEQ &&
isCtlzOpc(RHS.getOpcode()) &&		(isCtlzOpc(RHS.getOpcode()) || isCttzOpc(RHS.getOpcode())) &&
RHS.getOperand(0) == CmpLHS &&		RHS.getOperand(0) == CmpLHS &&
isNegativeOne(LHS)) {		isNegativeOne(LHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBX_U32(DAG, CmpLHS, SL, Opc);
}		}

// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x		// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x
		// select (setcc x, 0, ne), (cttz_zero_undef x), -1 -> ffbl_u32 x
if (CCOpcode == ISD::SETNE &&		if (CCOpcode == ISD::SETNE &&
isCtlzOpc(LHS.getOpcode()) &&		(isCtlzOpc(LHS.getOpcode()) || isCttzOpc(RHS.getOpcode())) &&
LHS.getOperand(0) == CmpLHS &&		LHS.getOperand(0) == CmpLHS &&
isNegativeOne(RHS)) {		isNegativeOne(RHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBX_U32(DAG, CmpLHS, SL, Opc);
}		}

return SDValue();		return SDValue();
}		}

static SDValue distributeOpThroughSelect(TargetLowering::DAGCombinerInfo &DCI,		static SDValue distributeOpThroughSelect(TargetLowering::DAGCombinerInfo &DCI,
unsigned Op,		unsigned Op,
const SDLoc &SL,		const SDLoc &SL,
▲ Show 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	if (VT == MVT::f32 && Subtarget->hasFminFmaxLegacy()) {
= combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);		= combineFMinMaxLegacy(SDLoc(N), VT, LHS, RHS, True, False, CC, DCI);
// Revisit this node so we can catch min3/max3/med3 patterns.		// Revisit this node so we can catch min3/max3/med3 patterns.
//DCI.AddToWorklist(MinMax.getNode());		//DCI.AddToWorklist(MinMax.getNode());
return MinMax;		return MinMax;
}		}
}		}

// There's no reason to not do this if the condition has other uses.		// There's no reason to not do this if the condition has other uses.
return performCtlzCombine(SDLoc(N), Cond, True, False, DCI);		return performCtlz_CttzCombine(SDLoc(N), Cond, True, False, DCI);
}		}

static bool isConstantFPZero(SDValue N) {		static bool isConstantFPZero(SDValue N) {
if (const ConstantFPSDNode *C = isConstOrConstSplatFP(N))		if (const ConstantFPSDNode *C = isConstOrConstSplatFP(N))
return C->isZero() && !C->isNegative();		return C->isZero() && !C->isNegative();
return false;		return false;
}		}

▲ Show 20 Lines • Show All 571 Lines • ▼ Show 20 Lines	const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CARRY)		NODE_NAME_CASE(CARRY)
NODE_NAME_CASE(BORROW)		NODE_NAME_CASE(BORROW)
NODE_NAME_CASE(BFE_U32)		NODE_NAME_CASE(BFE_U32)
NODE_NAME_CASE(BFE_I32)		NODE_NAME_CASE(BFE_I32)
NODE_NAME_CASE(BFI)		NODE_NAME_CASE(BFI)
NODE_NAME_CASE(BFM)		NODE_NAME_CASE(BFM)
NODE_NAME_CASE(FFBH_U32)		NODE_NAME_CASE(FFBH_U32)
NODE_NAME_CASE(FFBH_I32)		NODE_NAME_CASE(FFBH_I32)
		NODE_NAME_CASE(FFBL_B32)
NODE_NAME_CASE(MUL_U24)		NODE_NAME_CASE(MUL_U24)
NODE_NAME_CASE(MUL_I24)		NODE_NAME_CASE(MUL_I24)
NODE_NAME_CASE(MULHI_U24)		NODE_NAME_CASE(MULHI_U24)
NODE_NAME_CASE(MULHI_I24)		NODE_NAME_CASE(MULHI_I24)
NODE_NAME_CASE(MUL_LOHI_U24)		NODE_NAME_CASE(MUL_LOHI_U24)
NODE_NAME_CASE(MUL_LOHI_I24)		NODE_NAME_CASE(MUL_LOHI_I24)
NODE_NAME_CASE(MAD_U24)		NODE_NAME_CASE(MAD_U24)
NODE_NAME_CASE(MAD_I24)		NODE_NAME_CASE(MAD_I24)
▲ Show 20 Lines • Show All 190 Lines • Show Last 20 Lines
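
The 64-bit lowering in AMDGPUISelLowering.cpp above builds ctlz and cttz from two 32-bit halves, following the formulas in its comments. A minimal sketch of that decomposition in plain C++, not part of the patch: the helper names `ffbl32`, `ffbh32`, `cttz64` and `ctlz64` are hypothetical, and the model assumes the native find-first-bit operation returns -1 on zero input, as the comment in the diff states. It compares the selected half against zero, following the formulas in the comments.

```cpp
#include <cstdint>

// Hypothetical model of a 32-bit "find first bit low": index of the lowest
// set bit, or all ones (-1) when the input is zero.
static uint32_t ffbl32(uint32_t x) {
  if (x == 0)
    return UINT32_MAX;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Hypothetical model of a 32-bit "find first bit high": number of leading
// zeros, or all ones (-1) when the input is zero.
static uint32_t ffbh32(uint32_t x) {
  if (x == 0)
    return UINT32_MAX;
  uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
static uint64_t cttz64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  // Defined (non-ZERO_UNDEF) behaviour: a zero input yields the bit width.
  if (lo == 0 && hi == 0)
    return 64;
  return lo == 0 ? ffbl32(hi) + 32u : ffbl32(lo);
}

// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
static uint64_t ctlz64(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  if (lo == 0 && hi == 0)
    return 64;
  return hi == 0 ? ffbh32(lo) + 32u : ffbh32(hi);
}
```

The explicit zero check mirrors the `!ZeroUndef` branch of the lowering, which selects 64 when both halves are zero instead of passing through the -1 the instruction would produce.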

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td

	Show First 20 Lines • Show All 292 Lines • ▼ Show 20 Lines
	def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;			def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;			def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;

	def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;			def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;
	def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;			def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;

				def AMDGPUffbl_b32 : SDNode<"AMDGPUISD::FFBL_B32", SDTIntUnaryOp>;

	// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore			// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore
	// when performing the mulitply. The result is a 32-bit value.			// when performing the mulitply. The result is a 32-bit value.
	def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,			def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,			def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	▲ Show 20 Lines • Show All 109 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/EvergreenInstructions.td

	Show First 20 Lines • Show All 443 Lines • ▼ Show 20 Lines

	def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;			def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
	def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;			def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;

	def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;			def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;
	def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;			def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;
	def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;			def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
	def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;			def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;
	def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;			def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", AMDGPUffbl_b32, VecALU>;

	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;			def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;
	}			}

	def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {			def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
	let Pattern = [];			let Pattern = [];
	let Itinerary = AnyALU;			let Itinerary = AnyALU;
	▲ Show 20 Lines • Show All 313 Lines • Show Last 20 Lines

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

	Show First 20 Lines • Show All 153 Lines • ▼ Show 20 Lines
	def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",			def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",
	[(set i32:$sdst, (ctpop i32:$src0))]			[(set i32:$sdst, (ctpop i32:$src0))]
	>;			>;
	def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;			def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;
	} // End Defs = [SCC]			} // End Defs = [SCC]

	def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;			def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;
	def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;			def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;
				def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",			def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",
	[(set i32:$sdst, (cttz_zero_undef i32:$src0))]			[(set i32:$sdst, (AMDGPUffbl_b32 i32:$src0))]
	>;			>;
	def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",			def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",
	[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]
	>;			>;

	def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;			def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;
	def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",			def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",
	[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]
	▲ Show 20 Lines • Show All 1,138 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/AMDGPU/cttz_zero_undef.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI -check-prefix=SI-NOSDWA -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=SI -check-prefix=SI-SDWA -check-prefix=FUNC %s
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -enable-var-scope -check-prefix=EG -check-prefix=FUNC %s

				declare i7 @llvm.cttz.i7(i7, i1) nounwind readnone
				declare i8 @llvm.cttz.i8(i8, i1) nounwind readnone
				declare i16 @llvm.cttz.i16(i16, i1) nounwind readnone
	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone			declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone
				declare i64 @llvm.cttz.i64(i64, i1) nounwind readnone
	declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone			declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone
	declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone			declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone
	declare i32 @llvm.r600.read.tidig.x() nounwind readnone			declare i32 @llvm.r600.read.tidig.x() nounwind readnone

	; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:			; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:
	; SI: s_load_dword [[VAL:s[0-9]+]],			; SI: s_load_dword [[VAL:s[0-9]+]],
	; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]			; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]
	; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]			; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]
	▲ Show 20 Lines • Show All 57 Lines • ▼ Show 20 Lines
	define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {			define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
	%tid = call i32 @llvm.r600.read.tidig.x()			%tid = call i32 @llvm.r600.read.tidig.x()
	%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid			%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid
	%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16			%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16
	%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone			%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone
	store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i8_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @s_cttz_zero_undef_i8_with_select(i8 addrspace(1)* noalias %out, i8 %val) nounwind {
				%cttz = tail call i8 @llvm.cttz.i8(i8 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i8 %val, 0
				%ret = select i1 %cttz_ret, i8 %cttz, i8 32
				store i8 %cttz, i8 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i16_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @s_cttz_zero_undef_i16_with_select(i16 addrspace(1)* noalias %out, i16 %val) nounwind {
				%cttz = tail call i16 @llvm.cttz.i16(i16 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i16 %val, 0
				%ret = select i1 %cttz_ret, i16 %cttz, i16 32
				store i16 %cttz, i16 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32_with_select:
				; SI: s_ff1_i32_b32
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				; EG: FFBL_INT {{\? }}[[RESULT]]
				define amdgpu_kernel void @s_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 %val) nounwind {
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %cttz, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i64_with_select:
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				; SI: s_ff1_i32_b32 s{{[0-9]+}}, s{{[0-9]+}}
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @s_cttz_zero_undef_i64_with_select(i64 addrspace(1)* noalias %out, i64 %val) nounwind {
				%cttz = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i64 %val, 0
				%ret = select i1 %cttz_ret, i64 %cttz, i64 32
				store i64 %cttz, i64 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i8_with_select:
				; SI-NOSDWA: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI-SDWA: v_ffbl_b32_sdwa
				; EG: MEM_RAT MSKOR
				define amdgpu_kernel void @v_cttz_zero_undef_i8_with_select(i8 addrspace(1)* noalias %out, i8 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i8, i8 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i8 @llvm.cttz.i8(i8 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i8 %val, 0
				%ret = select i1 %cttz_ret, i8 %cttz, i8 32
				store i8 %ret, i8 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i16_with_select:
				; SI-NOSDWA: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI-SDWA: v_ffbl_b32_sdwa
				; EG: MEM_RAT MSKOR
				define amdgpu_kernel void @v_cttz_zero_undef_i16_with_select(i16 addrspace(1)* noalias %out, i16 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i16, i16 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i16 @llvm.cttz.i16(i16 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i16 %val, 0
				%ret = select i1 %cttz_ret, i16 %cttz, i16 32
				store i16 %ret, i16 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i32_with_select:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %ret, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i64_with_select:
				; SI-NOSDWA: v_or_b32_e32
				; SI-NOSDWA: v_or_b32_e32
				; SI-NOSDWA: v_or_b32_e32
				; SI-SDWA: v_or_b32_sdwa
				; SI-NOSDWA: v_or_b32_e32
				; SI-SDWA: v_or_b32_sdwa
				; SI: v_or_b32_e32 [[VAL1:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
				; SI: v_or_b32_e32 [[VAL2:v[0-9]+]], v{{[0-9]+}}, v{{[0-9]+}}
				; SI-DAG: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL1]]
				; SI-DAG: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL2]]
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i64_with_select(i64 addrspace(1)* noalias %out, i64 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i64, i64 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i64 %val, 0
				%ret = select i1 %cttz_ret, i64 %cttz, i64 32
				store i64 %ret, i64 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_eq_neg1:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL:v[0-9]+]]
				; SI: v_cmp_ne_u32_e32 vcc, 0, [[VAL]]
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_eq_neg1(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp eq i32 %val, 0
				%sel = select i1 %cmp, i32 -1, i32 %ctlz
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_ne_neg1:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, [[VAL:v[0-9]+]]
				; SI: v_cmp_ne_u32_e32 vcc, 0, [[VAL]]
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_ne_neg1(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp ne i32 %val, 0
				%sel = select i1 %cmp, i32 %ctlz, i32 -1
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i32_sel_ne_bitwidth:
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI: v_cmp
				; SI: v_cndmask
				; SI: s_endpgm
				; EG: MEM_RAT_CACHELESS STORE_RAW
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i32_sel_ne_bitwidth(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%ctlz = call i32 @llvm.cttz.i32(i32 %val, i1 false) nounwind readnone
				%cmp = icmp ne i32 %ctlz, 32
				%sel = select i1 %cmp, i32 %ctlz, i32 -1
				store i32 %sel, i32 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i8_sel_eq_neg1:
				; SI: {{buffer|flat}}_load_ubyte
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i8_sel_eq_neg1(i8 addrspace(1)* noalias %out, i8 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i8, i8 addrspace(1)* %arrayidx, align 1
				%ctlz = call i8 @llvm.cttz.i8(i8 %val, i1 false) nounwind readnone
				%cmp = icmp eq i8 %val, 0
				%sel = select i1 %cmp, i8 -1, i8 %ctlz
				store i8 %sel, i8 addrspace(1)* %out
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_i16_sel_eq_neg1:
				; SI: {{buffer|flat}}_load_ubyte
				; SI: v_ffbl_b32_e32 v{{[0-9]+}}, v{{[0-9]+}}
				; SI: buffer_store_short
				; EG: MEM_RAT MSKOR
				; EG: FFBL_INT
				define amdgpu_kernel void @v_cttz_i16_sel_eq_neg1(i16 addrspace(1)* noalias %out, i16 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i16, i16 addrspace(1)* %arrayidx, align 1
				%ctlz = call i16 @llvm.cttz.i16(i16 %val, i1 false) nounwind readnone
				%cmp = icmp eq i16 %val, 0
				%sel = select i1 %cmp, i16 -1, i16 %ctlz
				store i16 %sel, i16 addrspace(1)* %out
				ret void
				}
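
The `sel_eq_neg1` and `sel_ne_neg1` tests above exercise the select combine from the lowering change: a select that yields -1 for a zero input and the trailing-zero count otherwise can be replaced by the native instruction alone, because that instruction already returns -1 on zero. A minimal sketch of the equivalence in plain C++, assuming that -1-on-zero behaviour; `ffbl32`, `cttzNonZero`, `selectEq` and `selectNe` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical model of the native instruction: index of the lowest set bit,
// or all ones (-1) when the input is zero.
static uint32_t ffbl32(uint32_t x) {
  for (uint32_t i = 0; i < 32; ++i)
    if ((x >> i) & 1u)
      return i;
  return UINT32_MAX;
}

// Hypothetical model of cttz_zero_undef: only meaningful for x != 0, so
// callers must guard the zero case themselves.
static uint32_t cttzNonZero(uint32_t x) {
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// select (setcc x, 0, eq), -1, (cttz_zero_undef x)
static uint32_t selectEq(uint32_t x) {
  return x == 0 ? UINT32_MAX : cttzNonZero(x);
}

// select (setcc x, 0, ne), (cttz_zero_undef x), -1
static uint32_t selectNe(uint32_t x) {
  return x != 0 ? cttzNonZero(x) : UINT32_MAX;
}
```

Both select forms agree with `ffbl32` for every input, which is why the combine can drop the compare and select when the fallback value is -1. A fallback of the bit width, as in `v_cttz_i32_sel_ne_bitwidth`, does not match and keeps the compare and select in the expected output.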

Revision Contents

Diff 118831

llvm/trunk/include/llvm/Target/TargetSelectionDAG.td

llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.h

llvm/trunk/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

llvm/trunk/lib/Target/AMDGPU/AMDGPUInstrInfo.td

llvm/trunk/lib/Target/AMDGPU/EvergreenInstructions.td

llvm/trunk/lib/Target/AMDGPU/SOPInstructions.td

llvm/trunk/test/CodeGen/AMDGPU/cttz_zero_undef.ll
