This is an archive of the discontinued LLVM Phabricator instance.

Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Closed · Public

Authored by wdng on Aug 31 2017, 12:27 PM.


Details

Reviewers

arsenm
b-sumner
t-tye
kzhuravl
rampitec

Commits

rG5676acad9e0b: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Summary

During the DAGCombine optimization phase, the LLVM compiler converts ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ and then expands during the Legalization phase, which prevents the v_ffbl_b32 instruction generation. This patch implements custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. · Aug 31 2017, 12:27 PM

Herald added a subscriber: nhaehnle. · Aug 31 2017, 12:28 PM

arsenm added inline comments. · Aug 31 2017, 1:34 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	The existing code was the correct way to do this

arsenm added a subscriber: llvm-commits. · Aug 31 2017, 1:34 PM

wdng added inline comments. · Aug 31 2017, 1:54 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	With the original code, we get the following transformation:

Initial selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 50 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t3: i64 = Constant<0>
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
  t6: i64,ch = merge_values t5, t5:1
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t10: i64,ch = merge_values t9, t9:1
  t11: ch = TokenFactor t6:1, t10:1
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t20: i16,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>)> t11, t19, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t27: i64,ch = load<LD8[%11(addrspace=2)](tbaa=<0x4435518>)> t11, t25, undef:i64
  t29: i64 = Constant<32>
              t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
              t21: i32 = zero_extend t20
            t22: i32 = mul t17, t21
            t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
          t23: i32 = add t22, t15 
        t26: i64 = zero_extend t23 
      t28: i64 = add t27, t26 
    t31: i64 = shl t28, Constant:i32<32>
  t32: i64 = sra t31, Constant:i32<32>
    t33: i64 = add t6, t32 
  t34: i8,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>)> t11, t33, undef:i64
  t35: i32 = zero_extend t34 
    t39: i1 = setcc t35, Constant:i32<0>, setne:ch
    t36: i32 = cttz_zero_undef t35
  t40: i32 = select t39, t36, Constant:i32<32>
  t43: i1 = setcc Constant:i32<8>, t40, setult:ch
      t47: ch = TokenFactor t20:1, t27:1, t34:1
        t44: i32 = umin t40, Constant:i32<8>
      t45: i8 = truncate t44 
      t46: i64 = add t10, t32 
    t48: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>)> t47, t45, t46, undef:i64
  t49: ch = ENDPGM t48

Optimized lowered selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 35 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t11: ch = TokenFactor t5:1, t9:1
    t33: i64 = add t5, t63 
  t54: i32,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>), zext from i8> t11, t33, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t62: i32,ch = load<LD4[%11(addrspace=2)](align=8)(tbaa=<0x4435518>)> t11, t25, undef:i64
          t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
        t22: i32 = mul t17, t64 
        t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
      t23: i32 = add t22, t15
    t60: i32 = add t62, t23
  t63: i64 = sign_extend t60, ValueType:ch:i32
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t64: i32,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>), zext from i16> t11, t19, undef:i64
      t47: ch = TokenFactor t64:1, t62:1, t54:1
        t53: i32 = cttz t54
      t44: i32 = umin t53, Constant:i32<8>
      t46: i64 = add t9, t63
    t50: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>), trunc to i8> t47, t44, t46, undef:i64
  t49: ch = ENDPGM t50

We won't be able to generate s/v_ffbl instructions. I found that every llvm.cttz.i32 has been converted to cttz_zero_undef instead of 'cttz'.

If we don't want to change the original implementation, should we instead add a custom lowering from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF in the AMDGPU backend?

Ping.

I think the actual problem is the implementation of ISD::CTTZ not using v_ffbl and not this transformation.

If v_ffbl is able to produce a defined answer of bit width for 0, then you want to match it with cttz and have the operation action for cttz_zero_undef set to Expand. That will turn all cttz_zero_undef calls into cttz.

If v_ffbl is not capable of handling zero, then you want cttz_zero_undef set to Legal, and cttz set to Expand which will make use of cttz_zero_undef and a select. Or you can make cttz Custom and do your own lowering.

I think the instruction behavior is to return -1 on 0 input. IIRC we handle this and fold that for ctlz already, just not cttz.
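
A small standalone C++ sketch may help make the semantics under discussion concrete. It assumes, as suggested above, that v_ffbl_b32 returns -1 for a zero input. The names `ffbl_b32`, `cttz32`, and `checkCttzModel` are hypothetical and exist only for illustration; they are not LLVM APIs. The sketch models ISD::CTTZ, which is defined to return the bit width on zero, as a compare and select around the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of v_ffbl_b32 (an assumption based on this
// thread): index of the lowest set bit, or -1 (all ones) for a zero input.
inline uint32_t ffbl_b32(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// ISD::CTTZ semantics: a zero input yields the bit width (32 here), so the
// raw instruction result has to be patched with a compare and select.
inline uint32_t cttz32(uint32_t x) {
  return x == 0 ? 32u : ffbl_b32(x);
}

// Checks the model on zero and on every single-bit input.
inline void checkCttzModel() {
  assert(cttz32(0) == 32u);
  for (uint32_t i = 0; i < 32; ++i)
    assert(cttz32(1u << i) == i);
}
```

Under this model, a select that produces -1 on zero around a zero-undef cttz is equivalent to the bare instruction, so the select would be redundant. The thread describes a fold of that kind as already existing for ctlz.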

Just add a custom lowering from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

In D37348#859119, @wdng wrote:

Just add a custom lowering from ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

I don't think that will help. Why not follow exactly how CTLZ* is handled now and implement AMDGPUTargetLowering::LowerCTTZ making use of ffbl?

wdng updated this revision to Diff 114215. · Sep 7 2017, 11:13 AM

wdng retitled this revision from "Tighten conditions for converting ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ" to "Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.".

wdng added a reviewer: craig.topper. · Sep 7 2017, 11:36 AM

Ping.

wdng added a reviewer: t-tye. · Sep 8 2017, 12:10 PM

arsenm added inline comments. · Sep 8 2017, 1:22 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2784–2793	I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	This should definitely remain legal
2030	You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
lib/Target/AMDGPU/AMDGPUInstrInfo.td
301–302	This isn't a signed/unsigned operation. There is just one v_ffbl_b32.

craig.topper resigned from this revision. · Sep 8 2017, 10:48 PM

wdng marked 2 inline comments as done. · Sep 11 2017, 9:05 AM

wdng added inline comments.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	I don't think it matters if we define it as Custom, because it will be converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. However, if we define it as Legal, we will have a "duplicate" or "extra" pattern (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I am overlooking for why it has to be Legal?

arsenm added inline comments. · Sep 12 2017, 7:05 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
417	We should probably fix this at some point to be legal
1111–1112	Also need the select with -1 optimization (and corresponding tests) as cttz
2022	This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
test/CodeGen/AMDGPU/cttz_zero_undef.ll
104	Need i64 tests
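
The expansions being compared for LegalizeDAG.cpp can be written out as a standalone C++ sketch: the two bit tricks quoted in the existing code, popcount(~x & (x - 1)) and 32 - nlz(~x & (x - 1)), next to the compare+select form. The helper names are hypothetical, and the bit counts are computed with plain loops rather than any LLVM API. The point is only that all three forms agree, including on a zero input.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (clears the lowest set bit each iteration).
inline uint32_t popcount32(uint32_t x) {
  uint32_t n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// Number of leading zeros; returns 32 for a zero input.
inline uint32_t nlz32(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// cttz(x) = popcount(~x & (x - 1))
inline uint32_t cttzViaCtpop(uint32_t x) { return popcount32(~x & (x - 1)); }

// cttz(x) = 32 - nlz(~x & (x - 1)), for targets with ctlz but no ctpop.
inline uint32_t cttzViaCtlz(uint32_t x) { return 32u - nlz32(~x & (x - 1)); }

// Compare+select form: (x == 0) ? 32 : cttz_zero_undef(x).
inline uint32_t cttzViaSelect(uint32_t x) {
  if (x == 0)
    return 32u;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// Confirms the three forms agree on a handful of inputs, including zero.
inline void checkExpansionsAgree() {
  const uint32_t samples[] = {0u, 1u, 2u, 8u, 0x00F0F000u,
                              0x80000000u, 0xFFFFFFFFu};
  for (uint32_t x : samples) {
    assert(cttzViaCtpop(x) == cttzViaSelect(x));
    assert(cttzViaCtlz(x) == cttzViaSelect(x));
  }
}
```

Which form is cheapest depends on what the target supports natively, which is the trade-off the comment on lines 2803–2806 raises.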

Changes based on code review feedback.

Upload a full diff.

Missing performCtlzCombine equivalent

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	I don't see this changed
test/CodeGen/AMDGPU/cttz_zero_undef.ll
109	Missing scalar version

Address code reviews.

Fix the issues that variables are not capitalized.

wdng marked 2 inline comments as done. · Sep 14 2017, 2:50 PM

Ping.

arsenm added inline comments. · Sep 15 2017, 10:17 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2032–2038	Indentation wrong
2039	llvm_unreachable
2055–2058	Select between Zero and One as input to getSetCC
3002–3003	You could just pass in the new opcode directly rather than selecting it again
3004	Commented out code
3024	You didn't add tests for this part
lib/Target/AMDGPU/AMDGPUISelLowering.h
374	Should be named FFBL_B32 to match the instruction
test/CodeGen/AMDGPU/cttz_zero_undef.ll
131	Also should have some tests with i8/i16
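
For the i64 path that several comments touch on (the getSetCC operand and the requested i64 tests), the split into 32-bit halves can be sketched in plain C++. This is only an illustration that follows the formula in the diff's cttz comment, cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x)); it is not the code under review, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 32-bit cttz_zero_undef node. The result for a
// zero input is unspecified; this model returns all ones, like the -1 the
// thread attributes to v_ffbl_b32.
inline uint32_t cttz32ZeroUndef(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
// Only meaningful for x != 0, matching the zero-undef variant.
inline uint32_t cttz64ZeroUndef(uint64_t x) {
  const uint32_t lo = static_cast<uint32_t>(x);
  const uint32_t hi = static_cast<uint32_t>(x >> 32);
  return lo == 0 ? cttz32ZeroUndef(hi) + 32u : cttz32ZeroUndef(lo);
}

// Defined-at-zero variant: one more select returns 64 when the whole input
// is zero.
inline uint32_t cttz64(uint64_t x) {
  return x == 0 ? 64u : cttz64ZeroUndef(x);
}

// Checks zero and every single-bit 64-bit input.
inline void checkCttz64() {
  assert(cttz64(0) == 64u);
  for (uint32_t i = 0; i < 64; ++i)
    assert(cttz64(uint64_t{1} << i) == i);
}
```

Note that the select condition for cttz tests the low half against zero, whereas the ctlz formula tests the high half.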

Will create another separate ticket to fix the v_ffbl_sdwa instruction generation.

Ping.

wdng edited reviewers, added: kzhuravl; removed: craig.topper. · Sep 22 2017, 12:29 PM

Ping.

Needs more comprehensive check lines. Just checking the instructions won't demonstrate that the extra instructions you're trying to avoid aren't there

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3010	Don't include AMDGPUISD in the name of this
3035	Ditto
test/CodeGen/AMDGPU/cttz_zero_undef.ll
82	This needs to check more
94	This needs to check more
118–119	Ditto

Address code reviews.

Ping.

wdng added a reviewer: rampitec. · Oct 10 2017, 11:10 AM

arsenm added inline comments. · Oct 11 2017, 11:05 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2029	Extra space after ::
2034	Don't include AMDGPUISD in variable name
2035	Missing space before {
2067	Double // and missing closing )
2071	Double //
test/CodeGen/AMDGPU/cttz_zero_undef.ll
2	Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken
168–169	This isn't checking the outputs and select
181	Using undefined VAL
195	Undefined VAL

Address code reviews.

wdng marked 3 inline comments as done. · Oct 11 2017, 4:14 PM

craig.topper removed a subscriber: craig.topper. · Oct 11 2017, 4:16 PM

arsenm added inline comments. · Oct 11 2017, 4:24 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
172–173	Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one

wdng added inline comments. · Oct 11 2017, 4:27 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
172–173	No, it won't work if I remove the -DAG. As the generated instructions get interleaved with each other.

Remove duplicate check lines.

Removed -DAG checks completely.

LGTM

This revision is now accepted and ready to land. · Oct 12 2017, 10:39 AM

Closed by commit rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. (authored by wdng). · Oct 12 2017, 12:37 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Target/TargetSelectionDAG.td	2 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	13 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h	3 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	63 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.td	2 lines
lib/Target/AMDGPU/EvergreenInstructions.td	2 lines
lib/Target/AMDGPU/SOPInstructions.td	5 lines
test/CodeGen/AMDGPU/cttz_zero_undef.ll	39 lines

Diff 115231

include/llvm/Target/TargetSelectionDAG.td

[126 lines not shown]
def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;		]>;
def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.		def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>
]>;		]>;
def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.		def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>
]>;		]>;
def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz		def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz, cttz
SDTCisSameAs<0, 1>, SDTCisInt<0>		SDTCisSameAs<0, 1>, SDTCisInt<0>
]>;		]>;
def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext		def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc		def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
[1,045 lines not shown]

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[2,774 lines not shown]
case ISD::CTLZ: {
}		}
Op = DAG.getNOT(dl, Op, VT);		Op = DAG.getNOT(dl, Op, VT);
return DAG.getNode(ISD::CTPOP, dl, VT, Op);		return DAG.getNode(ISD::CTPOP, dl, VT, Op);
}		}
case ISD::CTTZ_ZERO_UNDEF:		case ISD::CTTZ_ZERO_UNDEF:
// This trivially expands to CTTZ.		// This trivially expands to CTTZ.
return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);
case ISD::CTTZ: {		case ISD::CTTZ: {
		EVT VT = Op.getValueType();
		unsigned len = VT.getSizeInBits();

		if (TLI.isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
		EVT SetCCVT = getSetCCResultType(VT);
		SDValue CTTZ = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, dl, VT, Op);
		SDValue Zero = DAG.getConstant(0, dl, VT);
		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
		DAG.getConstant(len, dl, VT), CTTZ);
		}
		[Inline comment, arsenm, Done] I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already, i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.

// for now, we use: { return popcount(~x & (x - 1)); }		// for now, we use: { return popcount(~x & (x - 1)); }
// unless the target has ctlz but not ctpop, in which case we use:		// unless the target has ctlz but not ctpop, in which case we use:
// { return 32 - nlz(~x & (x-1)); }		// { return 32 - nlz(~x & (x-1)); }
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT VT = Op.getValueType();
SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,		SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,
DAG.getNOT(dl, Op, VT),		DAG.getNOT(dl, Op, VT),
DAG.getNode(ISD::SUB, dl, VT, Op,		DAG.getNode(ISD::SUB, dl, VT, Op,
DAG.getConstant(1, dl, VT)));		DAG.getConstant(1, dl, VT)));
// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.		// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&		if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&
TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))		TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))
return DAG.getNode(ISD::SUB, dl, VT,		return DAG.getNode(ISD::SUB, dl, VT,
		[Inline comment, arsenm, Done] OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
		[Inline comment, arsenm, Done] I don't see this changed
DAG.getConstant(VT.getSizeInBits(), dl, VT),		DAG.getConstant(VT.getSizeInBits(), dl, VT),
DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));		DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));
return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);		return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);
}		}
}		}
}		}

bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {		bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {
[1,855 lines not shown]

lib/Target/AMDGPU/AMDGPUISelLowering.h

[51 lines not shown]
protected:
SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFRINT(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFNEARBYINT(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND32_16(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND64(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFROUND(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerCTLZ(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerCTTZ_CTLZ(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerINT_TO_FP64(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSINT_TO_FP(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;		SDValue LowerFP64_TO_INT(SDValue Op, SelectionDAG &DAG, bool Signed) const;
SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerFP_TO_FP16(SDValue Op, SelectionDAG &DAG) const;
[297 lines not shown]
enum NodeType : unsigned {
CARRY,		CARRY,
BORROW,		BORROW,
BFE_U32, // Extract range of bits with zero extension to 32-bits.		BFE_U32, // Extract range of bits with zero extension to 32-bits.
BFE_I32, // Extract range of bits with sign extension to 32-bits.		BFE_I32, // Extract range of bits with sign extension to 32-bits.
BFI, // (src0 & src1) \| (~src0 & src2)		BFI, // (src0 & src1) \| (~src0 & src2)
BFM, // Insert a range of bits into a 32-bit word.		BFM, // Insert a range of bits into a 32-bit word.
FFBH_U32, // ctlz with -1 if input is zero.		FFBH_U32, // ctlz with -1 if input is zero.
FFBH_I32,		FFBH_I32,
		FFBL_U32, // cttz with -1 if input is zero.
		[Inline comment, arsenm, Done] Should be named FFBL_B32 to match the instruction
MUL_U24,		MUL_U24,
MUL_I24,		MUL_I24,
MULHI_U24,		MULHI_U24,
MULHI_I24,		MULHI_I24,
MAD_U24,		MAD_U24,
MAD_I24,		MAD_I24,
MUL_LOHI_I24,		MUL_LOHI_I24,
MUL_LOHI_U24,		MUL_LOHI_U24,
[70 lines not shown]

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[408 lines not shown]
AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);		setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);

setOperationAction(ISD::SMIN, MVT::i32, Legal);		setOperationAction(ISD::SMIN, MVT::i32, Legal);
setOperationAction(ISD::UMIN, MVT::i32, Legal);		setOperationAction(ISD::UMIN, MVT::i32, Legal);
setOperationAction(ISD::SMAX, MVT::i32, Legal);		setOperationAction(ISD::SMAX, MVT::i32, Legal);
setOperationAction(ISD::UMAX, MVT::i32, Legal);		setOperationAction(ISD::UMAX, MVT::i32, Legal);

if (Subtarget->hasFFBH())		if (Subtarget->hasFFBH())
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);
		[Inline comment, arsenm, Not Done] We should probably fix this at some point to be legal

if (Subtarget->hasFFBL())		if (Subtarget->hasFFBL())
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Legal);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);
[Inline comment, arsenm, Done] This should definitely remain legal
[Inline comment, wdng (author), Not Done] I don't think it matters if we define it as Custom, because it will be converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. However, if we define it as Legal, we will have a "duplicate" or "extra" pattern (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I am overlooking for why it has to be Legal?

		setOperationAction(ISD::CTTZ, MVT::i64, Custom);
		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i64, Custom);
setOperationAction(ISD::CTLZ, MVT::i64, Custom);		setOperationAction(ISD::CTLZ, MVT::i64, Custom);
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);

// We only really have 32-bit BFE instructions (and 16-bit on VI).		// We only really have 32-bit BFE instructions (and 16-bit on VI).
//		//
// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any		// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
// effort to match them now. We want this to be false for i64 cases when the		// effort to match them now. We want this to be false for i64 cases when the
// extraction isn't restricted to the upper or lower half. Ideally we would		// extraction isn't restricted to the upper or lower half. Ideally we would
[670 lines not shown]
SDValue AMDGPUTargetLowering::LowerOperation(SDValue Op,
case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);		case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);
case ISD::FROUND: return LowerFROUND(Op, DAG);		case ISD::FROUND: return LowerFROUND(Op, DAG);
case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);		case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);
case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);		case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);
case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);		case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);
case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);		case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);
case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);		case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);		case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
		case ISD::CTTZ:
		case ISD::CTTZ_ZERO_UNDEF:
case ISD::CTLZ:		case ISD::CTLZ:
		[Inline comment, arsenm, Not Done] Also need the select with -1 optimization (and corresponding tests) as cttz
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
return LowerCTLZ(Op, DAG);		return LowerCTTZ_CTLZ(Op, DAG);
case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);		case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);
}		}
return Op;		return Op;
}		}

void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,		void AMDGPUTargetLowering::ReplaceNodeResults(SDNode *N,
SmallVectorImpl<SDValue> &Results,		SmallVectorImpl<SDValue> &Results,
SelectionDAG &DAG) const {		SelectionDAG &DAG) const {
[890 lines not shown]
SDValue AMDGPUTargetLowering::LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const {
SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);		SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);
SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);		SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);

SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);		SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);
// TODO: Should this propagate fast-math-flags?		// TODO: Should this propagate fast-math-flags?
return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);		return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);
}		}

SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {		static bool isCtlzOpc(unsigned Opc) {
		return Opc == ISD::CTLZ \|\| Opc == ISD::CTLZ_ZERO_UNDEF;
		[Inline comment, arsenm, Done] This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
		}

		SDValue AMDGPUTargetLowering:: LowerCTTZ_CTLZ(SDValue Op, SelectionDAG &DAG) const {
SDLoc SL(Op);		SDLoc SL(Op);
SDValue Src = Op.getOperand(0);		SDValue Src = Op.getOperand(0);
bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;		bool ZeroUndef = Op.getOpcode() == ISD::CTTZ_ZERO_UNDEF \|\|
		Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;
		[Inline comment, arsenm, Done] Extra space after ::

		[Inline comment, arsenm, Not Done] You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
		unsigned ISDOpc, AMDGPUISDOpc;
		if (isCtlzOpc(Op.getOpcode())) {
		ISDOpc = ISD::CTLZ_ZERO_UNDEF;
		AMDGPUISDOpc = AMDGPUISD::FFBH_U32;
		[Inline comment, arsenm, Done] Don't include AMDGPUISD in variable name
		} else {
		[Inline comment, arsenm, Done] Missing space before {
		ISDOpc = ISD::CTTZ_ZERO_UNDEF;
		AMDGPUISDOpc = AMDGPUISD::FFBL_U32;
		}
		[Inline comment, arsenm, Done] Indentation wrong

		[Inline comment, arsenm, Done] llvm_unreachable
if (ZeroUndef && Src.getValueType() == MVT::i32)		if (ZeroUndef && Src.getValueType() == MVT::i32)
return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);		return DAG.getNode(AMDGPUISDOpc, SL, MVT::i32, Src);

SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);		SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);

const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);		const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
const SDValue One = DAG.getConstant(1, SL, MVT::i32);		const SDValue One = DAG.getConstant(1, SL, MVT::i32);

SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);
SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);

EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),
*DAG.getContext(), MVT::i32);		*DAG.getContext(), MVT::i32);

SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);		SDValue Hi0;
		if (isCtlzOpc(Op.getOpcode()))
		Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);
		else
		Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, One, ISD::SETEQ);
		[Inline comment, arsenm, Done] Select between Zero and One as input to getSetCC

SDValue CtlzLo = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Lo);		SDValue OprLo = DAG.getNode(ISDOpc, SL, MVT::i32, Lo);
SDValue CtlzHi = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Hi);		SDValue OprHi = DAG.getNode(ISDOpc, SL, MVT::i32, Hi);

const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);
SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CtlzLo, Bits32);		SDValue Add, NewOpr;
		if (isCtlzOpc(Op.getOpcode())) {
// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprLo, Bits32);
SDValue NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CtlzHi);		//// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x)
		[Inline comment, arsenm, Done] Double // and missing closing )
		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, OprHi);
		} else {
		Add = DAG.getNode(ISD::ADD, SL, MVT::i32, OprHi, Bits32);
		//// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
		[Inline comment, arsenm, Done] Double //
		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, OprLo);
		}

if (!ZeroUndef) {		if (!ZeroUndef) {
// Test if the full 64-bit input is zero.		// Test if the full 64-bit input is zero.

// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,		// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,
// which we probably don't want.		// which we probably don't want.
SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);		SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);
SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);		SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);

// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction		// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction
// with the same cycles, otherwise it is slower.		// with the same cycles, otherwise it is slower.
// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,		// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,
// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);		// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);

const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);

// The instruction returns -1 for 0 input, but the defined intrinsic		// The instruction returns -1 for 0 input, but the defined intrinsic
// behavior is to return the number of bits.		// behavior is to return the number of bits.
NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32,		NewOpr = DAG.getNode(ISD::SELECT, SL, MVT::i32,
SrcIsZero, Bits32, NewCtlz);		SrcIsZero, Bits32, NewOpr);
}		}

return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewCtlz);		return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewOpr);
}		}

SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,		SDValue AMDGPUTargetLowering::LowerINT_TO_FP32(SDValue Op, SelectionDAG &DAG,
bool Signed) const {		bool Signed) const {
// Unsigned		// Unsigned
// cul2f(ulong u)		// cul2f(ulong u)
//{		//{
// uint lz = clz(u);		// uint lz = clz(u);
[885 lines not shown]
}		}

static bool isNegativeOne(SDValue Val) {		static bool isNegativeOne(SDValue Val) {
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))		if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Val))
return C->isAllOnesValue();		return C->isAllOnesValue();
return false;		return false;
}		}

static bool isCtlzOpc(unsigned Opc) {
return Opc == ISD::CTLZ \|\| Opc == ISD::CTLZ_ZERO_UNDEF;
}

SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,		SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,
SDValue Op,		SDValue Op,
const SDLoc &DL) const {		const SDLoc &DL) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);		EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);
if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&		if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&
		arsenm (Done): You could just pass in the new opcode directly rather than selecting it again.
LegalVT != MVT::i16))		LegalVT != MVT::i16))
		arsenm (Done): Commented out code.
return SDValue();		return SDValue();

if (VT != MVT::i32)		if (VT != MVT::i32)
Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);		Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);

SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);		SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);
		arsenm (Done): Don't include AMDGPUISD in the name of this.
if (VT != MVT::i32)		if (VT != MVT::i32)
FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);		FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);

return FFBH;		return FFBH;
}		}

// The native instructions return -1 on 0 input. Optimize out a select that		// The native instructions return -1 on 0 input. Optimize out a select that
// produces -1 on 0.		// produces -1 on 0.
//		//
// TODO: If zero is not undef, we could also do this if the output is compared		// TODO: If zero is not undef, we could also do this if the output is compared
// against the bitwidth.		// against the bitwidth.
//		//
// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.		// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.
SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,		SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,
		arsenm (Not Done): You didn't add tests for this part.
SDValue LHS, SDValue RHS,		SDValue LHS, SDValue RHS,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));		ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));
if (!CmpRhs || !CmpRhs->isNullValue())		if (!CmpRhs || !CmpRhs->isNullValue())
return SDValue();		return SDValue();

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();		ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
SDValue CmpLHS = Cond.getOperand(0);		SDValue CmpLHS = Cond.getOperand(0);

// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x		// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x
		arsenm (Done): Ditto.
if (CCOpcode == ISD::SETEQ &&		if (CCOpcode == ISD::SETEQ &&
isCtlzOpc(RHS.getOpcode()) &&		isCtlzOpc(RHS.getOpcode()) &&
RHS.getOperand(0) == CmpLHS &&		RHS.getOperand(0) == CmpLHS &&
isNegativeOne(LHS)) {		isNegativeOne(LHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBH_U32(DAG, CmpLHS, SL);
}		}

// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x		// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x
[717 lines not shown]
		const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CARRY)		NODE_NAME_CASE(CARRY)
NODE_NAME_CASE(BORROW)		NODE_NAME_CASE(BORROW)
NODE_NAME_CASE(BFE_U32)		NODE_NAME_CASE(BFE_U32)
NODE_NAME_CASE(BFE_I32)		NODE_NAME_CASE(BFE_I32)
NODE_NAME_CASE(BFI)		NODE_NAME_CASE(BFI)
NODE_NAME_CASE(BFM)		NODE_NAME_CASE(BFM)
NODE_NAME_CASE(FFBH_U32)		NODE_NAME_CASE(FFBH_U32)
NODE_NAME_CASE(FFBH_I32)		NODE_NAME_CASE(FFBH_I32)
		NODE_NAME_CASE(FFBL_U32)
NODE_NAME_CASE(MUL_U24)		NODE_NAME_CASE(MUL_U24)
NODE_NAME_CASE(MUL_I24)		NODE_NAME_CASE(MUL_I24)
NODE_NAME_CASE(MULHI_U24)		NODE_NAME_CASE(MULHI_U24)
NODE_NAME_CASE(MULHI_I24)		NODE_NAME_CASE(MULHI_I24)
NODE_NAME_CASE(MUL_LOHI_U24)		NODE_NAME_CASE(MUL_LOHI_U24)
NODE_NAME_CASE(MUL_LOHI_I24)		NODE_NAME_CASE(MUL_LOHI_I24)
NODE_NAME_CASE(MAD_U24)		NODE_NAME_CASE(MAD_U24)
NODE_NAME_CASE(MAD_I24)		NODE_NAME_CASE(MAD_I24)
[160 lines not shown]
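The select patterns quoted in the performCtlzCombine comments earlier in this file can be checked against a small software model. This is only a sketch: `ffbh32`, `ctlzZeroUndef`, `selectEqForm`, and `selectNeForm` are hypothetical names, and the model assumes the native operation returns -1 for a zero input, as the comment there states. It shows why a select that produces -1 on zero around a zero-undef count is equivalent to the native operation alone.

```cpp
#include <cstdint>

// Hypothetical model of the native "find first bit high" operation:
// counts leading zeros and returns all ones (-1) for a zero input.
static uint32_t ffbh32(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 0x80000000u) == 0) {
    x <<= 1;
    ++n;
  }
  return n;
}

// Model of ctlz_zero_undef. Precondition: x != 0. The result for zero is
// undefined, so callers in this sketch never pass zero.
static uint32_t ctlzZeroUndef(uint32_t x) {
  uint32_t n = 0;
  for (uint32_t mask = 0x80000000u; (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// select (setcc x, 0, eq), -1, (ctlz_zero_undef x)
static uint32_t selectEqForm(uint32_t x) {
  return x == 0 ? 0xFFFFFFFFu : ctlzZeroUndef(x);
}

// select (setcc x, 0, ne), (ctlz_zero_undef x), -1
static uint32_t selectNeForm(uint32_t x) {
  return x != 0 ? ctlzZeroUndef(x) : 0xFFFFFFFFu;
}
```

Both select forms agree with `ffbh32` for every input in this model, which is the property the combine relies on when it replaces the select with a single node.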

lib/Target/AMDGPU/AMDGPUInstrInfo.td

	[292 lines not shown]
	def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;			def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;			def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;

	def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;			def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;
	def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;			def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;

				def AMDGPUffbl_u32 : SDNode<"AMDGPUISD::FFBL_U32", SDTIntUnaryOp>;

				arsenm (Done): This isn't a signed/unsigned operation. There is just one v_ffbl_b32.
	// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore			// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore
	// when performing the mulitply. The result is a 32-bit value.			// when performing the mulitply. The result is a 32-bit value.
	def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,			def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,			def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	[109 lines not shown]

lib/Target/AMDGPU/EvergreenInstructions.td

	[436 lines not shown]

	def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;			def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
	def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;			def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;

	def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;			def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;
	def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;			def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;
	def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;			def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
	def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;			def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;
	def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;			def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", AMDGPUffbl_u32, VecALU>;

	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;			def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;
	}			}

	def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {			def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
	let Pattern = [];			let Pattern = [];
	let Itinerary = AnyALU;			let Itinerary = AnyALU;
	[313 lines not shown]

lib/Target/AMDGPU/SOPInstructions.td

	[153 lines not shown]
	def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",			def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",
	[(set i32:$sdst, (ctpop i32:$src0))]			[(set i32:$sdst, (ctpop i32:$src0))]
	>;			>;
	def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;			def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;
	} // End Defs = [SCC]			} // End Defs = [SCC]

	def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;			def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;
	def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;			def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;
				def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",			def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",
	[(set i32:$sdst, (cttz_zero_undef i32:$src0))]			[(set i32:$sdst, (AMDGPUffbl_u32 i32:$src0))]
	>;			>;
	def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",			def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",
	[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]
	>;			>;

	def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;			def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;
	def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",			def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",
	[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]
	[1,137 lines not shown]

test/CodeGen/AMDGPU/cttz_zero_undef.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
				arsenm (Not Done): Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken.
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone			declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone
				declare i64 @llvm.cttz.i64(i64, i1) nounwind readnone
	declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone			declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone
	declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone			declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone
	declare i32 @llvm.r600.read.tidig.x() nounwind readnone			declare i32 @llvm.r600.read.tidig.x() nounwind readnone

	; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:			; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:
	; SI: s_load_dword [[VAL:s[0-9]+]],			; SI: s_load_dword [[VAL:s[0-9]+]],
	; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]			; SI: s_ff1_i32_b32 [[SRESULT:s[0-9]+]], [[VAL]]
	; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]			; SI: v_mov_b32_e32 [[VRESULT:v[0-9]+]], [[SRESULT]]
	[57 lines not shown]
	define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {			define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
	%tid = call i32 @llvm.r600.read.tidig.x()			%tid = call i32 @llvm.r600.read.tidig.x()
	%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid			%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid
	%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16			%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16
	%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone			%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone
	store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32_with_select:
				; SI: s_ff1_i32_b32
				arsenm (Done): This needs to check more.
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				; EG: FFBL_INT {{\? }}[[RESULT]]
				define amdgpu_kernel void @s_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 %val) nounwind {
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %cttz, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i32_with_select:
				; SI: v_ffbl_b32_e32
				arsenm (Done): This needs to check more.
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %ret, i32 addrspace(1)* %out, align 4
				ret void
				}

				arsenm (Done): Need i64 tests.
				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i64_with_select:
				; SI: v_ffbl_b32_e32
				; SI: v_ffbl_b32_e32
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i64_with_select(i64 addrspace(1)* noalias %out, i64 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				arsenm (Done): Missing scalar version.
				%val = load i64, i64 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i64 @llvm.cttz.i64(i64 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i64 %val, 0
				%ret = select i1 %cttz_ret, i64 %cttz, i64 32
				store i64 %ret, i64 addrspace(1)* %out, align 4
				ret void
				}

				arsenm (Done): Also should have some tests with i8/i16.
				arsenm (Done): Ditto.
				arsenm (Done): This isn't checking the outputs and select.
				arsenm (Done): Using undefined VAL.
				arsenm (Done): Undefined VAL.
				arsenm (Done): Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one.
				wdng (author, Done): No, it won't work if I remove the -DAG, as the generated instructions get interleaved with each other.

Revision Contents

Diff 115231

include/llvm/Target/TargetSelectionDAG.td

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.td

lib/Target/AMDGPU/EvergreenInstructions.td

lib/Target/AMDGPU/SOPInstructions.td

test/CodeGen/AMDGPU/cttz_zero_undef.ll