This is an archive of the discontinued LLVM Phabricator instance.

Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
Closed, Public

Authored by wdng on Aug 31 2017, 12:27 PM.

Details

Reviewers

arsenm
b-sumner
t-tye
kzhuravl
rampitec

Commits

rG5676acad9e0b: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.
rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Summary

During the DAGCombine optimization phase, the LLVM compiler converts ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ and then expands it during the Legalization phase, which prevents generation of the v_ffbl_b32 instruction. This patch implements custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng created this revision. Aug 31 2017, 12:27 PM

Herald added a subscriber: nhaehnle. Aug 31 2017, 12:28 PM

arsenm added inline comments. Aug 31 2017, 1:34 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
16680 ↗	(On Diff #113451)	The existing code was the correct way to do this

arsenm added a subscriber: llvm-commits. Aug 31 2017, 1:34 PM

wdng added inline comments. Aug 31 2017, 1:54 PM

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

16680 ↗	(On Diff #113451)

With the original code, we get the following transformations:

Initial selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 50 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t3: i64 = Constant<0>
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
  t6: i64,ch = merge_values t5, t5:1
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t10: i64,ch = merge_values t9, t9:1
  t11: ch = TokenFactor t6:1, t10:1
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t20: i16,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>)> t11, t19, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t27: i64,ch = load<LD8[%11(addrspace=2)](tbaa=<0x4435518>)> t11, t25, undef:i64
  t29: i64 = Constant<32>
              t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
              t21: i32 = zero_extend t20
            t22: i32 = mul t17, t21
            t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
          t23: i32 = add t22, t15 
        t26: i64 = zero_extend t23 
      t28: i64 = add t27, t26 
    t31: i64 = shl t28, Constant:i32<32>
  t32: i64 = sra t31, Constant:i32<32>
    t33: i64 = add t6, t32 
  t34: i8,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>)> t11, t33, undef:i64
  t35: i32 = zero_extend t34 
    t39: i1 = setcc t35, Constant:i32<0>, setne:ch
    t36: i32 = cttz_zero_undef t35
  t40: i32 = select t39, t36, Constant:i32<32>
  t43: i1 = setcc Constant:i32<8>, t40, setult:ch
      t47: ch = TokenFactor t20:1, t27:1, t34:1
        t44: i32 = umin t40, Constant:i32<8>
      t45: i8 = truncate t44 
      t46: i64 = add t10, t32 
    t48: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>)> t47, t45, t46, undef:i64
  t49: ch = ENDPGM t48

Optimized lowered selection DAG: BB#0 'sample_test:entry'
SelectionDAG has 35 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg2
  t5: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t2, undef:i64
    t8: i64 = add t2, Constant:i64<8>
  t9: i64,ch = load<LD8[undef(addrspace=2)](nontemporal)(dereferenceable)(invariant)> t0, t8, undef:i64
  t11: ch = TokenFactor t5:1, t9:1
    t33: i64 = add t5, t63 
  t54: i32,ch = load<LD1[%arrayidx(addrspace=1)](tbaa=<0x4435498>), zext from i8> t11, t33, undef:i64
    t25: i64 = llvm.amdgcn.implicitarg.ptr TargetConstant:i32<460>
  t62: i32,ch = load<LD4[%11(addrspace=2)](align=8)(tbaa=<0x4435518>)> t11, t25, undef:i64
          t17: i32 = llvm.amdgcn.workgroup.id.x TargetConstant:i32<505>
        t22: i32 = mul t17, t64 
        t15: i32 = llvm.amdgcn.workitem.id.x TargetConstant:i32<508>
      t23: i32 = add t22, t15
    t60: i32 = add t62, t23
  t63: i64 = sign_extend t60, ValueType:ch:i32
      t13: i64 = llvm.amdgcn.dispatch.ptr TargetConstant:i32<359>
    t19: i64 = add t13, Constant:i64<4>
  t64: i32,ch = load<LD2[%4(addrspace=2)](align=4)(tbaa=<0x4436db8>), zext from i16> t11, t19, undef:i64
      t47: ch = TokenFactor t64:1, t62:1, t54:1
        t53: i32 = cttz t54
      t44: i32 = umin t53, Constant:i32<8>
      t46: i64 = add t9, t63
    t50: ch = store<ST1[%arrayidx3(addrspace=1)](tbaa=<0x4435498>), trunc to i8> t47, t44, t46, undef:i64
  t49: ch = ENDPGM t50

We won't be able to generate s/v_ffbl instructions. I found that every llvm.cttz.i32 has been converted to cttz_zero_undef instead of 'cttz'.

If we don't want to change the original implementation, we may want to do a custom lowering of ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF in the AMDGPU backend?

Ping.

I think the actual problem is the implementation of ISD::CTTZ not using v_ffbl and not this transformation.

If v_ffbl is able to produce a defined answer of bit width for 0, then you want to match it with cttz and have the operation action for cttz_zero_undef set to Expand. That will turn all cttz_zero_undef calls into cttz.

If v_ffbl is not capable of handling zero, then you want cttz_zero_undef set to Legal, and cttz set to Expand which will make use of cttz_zero_undef and a select. Or you can make cttz Custom and do your own lowering.

I think the instruction behavior is to return -1 on 0 input. IIRC we handle this and fold that for ctlz already, just not cttz.
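For reference, the semantics discussed in the comments above can be sketched in standalone C++ (this is not LLVM code). It assumes, as recalled in the previous comment, that v_ffbl_b32 returns the index of the lowest set bit and -1 for a zero input. The names ffblModel, cttzDefined and selectMinusOneOnZero are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of v_ffbl_b32 as described in this thread: the index of
// the lowest set bit, or -1 (all ones) when the input is zero.
inline std::uint32_t ffblModel(std::uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  std::uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// ISD::CTTZ semantics: a zero input yields the bit width (32), so the -1
// coming out of the instruction has to be replaced by a compare and select.
inline std::uint32_t cttzDefined(std::uint32_t x) {
  return x == 0 ? 32u : ffblModel(x);
}

// The "select with -1" pattern: select(x == 0, -1, cttz_zero_undef(x)).
// Under the model above it equals ffblModel(x) for every x, which is why
// the compare and select could be folded into the bare instruction.
inline std::uint32_t selectMinusOneOnZero(std::uint32_t x) {
  return x == 0 ? 0xFFFFFFFFu : ffblModel(x);
}

// Small self-check of the cases discussed above.
inline void checkFfblModel() {
  assert(ffblModel(0) == 0xFFFFFFFFu);
  assert(ffblModel(8) == 3u);
  assert(cttzDefined(0) == 32u);
  assert(selectMinusOneOnZero(0) == ffblModel(0));
  assert(selectMinusOneOnZero(40) == ffblModel(40));
}
```

Under these assumptions, a cttz_zero_undef node can map to the instruction directly, while a plain cttz needs the extra select unless a combine proves the -1 result is what the surrounding code wants.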

Just add a custom lowering of ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

In D37348#859119, @wdng wrote:

Just add a custom lowering of ISD::CTTZ to ISD::CTTZ_ZERO_UNDEF

I don't think that will help. Why not follow exactly how CTLZ* is handled now and implement AMDGPUTargetLowering::LowerCTTZ making use of ffbl?

wdng updated this revision to Diff 114215. Sep 7 2017, 11:13 AM

wdng retitled this revision from "Tighten conditions for converting ISD::CTTZ_ZERO_UNDEF to ISD::CTTZ" to "Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ."

wdng added a reviewer: craig.topper. Sep 7 2017, 11:36 AM

Ping.

wdng added a reviewer: t-tye. Sep 8 2017, 12:10 PM

arsenm added inline comments. Sep 8 2017, 1:22 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2784–2793	I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	This should definitely remain legal
2029	You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.
lib/Target/AMDGPU/AMDGPUInstrInfo.td
301–302	This isn't a signed/unsigned operation. There is just one v_ffbl_b32.
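The i64 path flagged in the comment on line 2029 splits the value into two 32-bit halves. A minimal standalone C++ sketch of that idea follows, using the formula from the patch's own comment, cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x)), plus the whole-value zero check for the defined (non zero-undef) case. The helper names are hypothetical, and this is plain C++ rather than SelectionDAG code.

```cpp
#include <cassert>
#include <cstdint>

// Stands in for a 32-bit cttz_zero_undef: only meaningful for x != 0.
inline std::uint32_t cttz32ZeroUndef(std::uint32_t x) {
  std::uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// 64-bit cttz built from the two 32-bit halves.
inline std::uint32_t cttz64FromHalves(std::uint64_t x) {
  const std::uint32_t lo = static_cast<std::uint32_t>(x);
  const std::uint32_t hi = static_cast<std::uint32_t>(x >> 32);

  // Defined behaviour for a zero input: return the bit width.
  if (lo == 0 && hi == 0)
    return 64u;

  // If the low half is empty, the first set bit is in the high half.
  return lo == 0 ? cttz32ZeroUndef(hi) + 32u : cttz32ZeroUndef(lo);
}

// Small self-check covering both halves and the zero input.
inline void checkCttz64FromHalves() {
  assert(cttz64FromHalves(0) == 64u);
  assert(cttz64FromHalves(1) == 0u);
  assert(cttz64FromHalves(1ull << 32) == 32u);
  assert(cttz64FromHalves(1ull << 63) == 63u);
}
```

Note that the half selected by the comparison is the low one; this mirrors the CTLZ lowering, where the comparison is on the high half instead.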

craig.topper resigned from this revision. Sep 8 2017, 10:48 PM

wdng marked 2 inline comments as done. Sep 11 2017, 9:05 AM

wdng added inline comments.

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
420	I think it doesn't matter if we define it as Custom, because it will be converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. However, if we define it as Legal, we will have a "duplicate" or "extra" pattern (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I am overlooking for why it has to be defined as Legal?

arsenm added inline comments. Sep 12 2017, 7:05 PM

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
417	We should probably fix this at some point to be legal
1109–1110	Also need the select with -1 optimization (and corresponding tests) as cttz
2021	This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
test/CodeGen/AMDGPU/cttz_zero_undef.ll
103	Need i64 tests
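To make the two expansions compared in the comment on lines 2803–2806 concrete, here is a small standalone C++ sketch (not LLVM code). It shows the sub/ctpop form quoted in LegalizeDAG.cpp, popcount(~x & (x - 1)), its 32 - nlz(~x & (x - 1)) variant, and the compare+select form built on a zero-undef count. All function names are hypothetical stand-ins for the corresponding ISD nodes.

```cpp
#include <cassert>
#include <cstdint>

// Portable bit count (stands in for ISD::CTPOP).
inline std::uint32_t popcount32(std::uint32_t x) {
  std::uint32_t n = 0;
  for (; x != 0; x &= x - 1u) // clears the lowest set bit each iteration
    ++n;
  return n;
}

// Leading-zero count (stands in for ISD::CTLZ); returns 32 for zero.
inline std::uint32_t nlz32(std::uint32_t x) {
  std::uint32_t n = 0;
  for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0;
       mask >>= 1)
    ++n;
  return n;
}

// Stands in for ISD::CTTZ_ZERO_UNDEF: only meaningful for x != 0.
inline std::uint32_t cttzZeroUndefModel(std::uint32_t x) {
  std::uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// sub/ctpop expansion: ~x & (x - 1) keeps exactly the trailing zero bits.
inline std::uint32_t cttzViaPopcount(std::uint32_t x) {
  return popcount32(~x & (x - 1u));
}

// Variant for targets with ctlz but no ctpop.
inline std::uint32_t cttzViaNlz(std::uint32_t x) {
  return 32u - nlz32(~x & (x - 1u));
}

// compare+select expansion on top of the zero-undef operation.
inline std::uint32_t cttzViaSelect(std::uint32_t x) {
  return x == 0 ? 32u : cttzZeroUndefModel(x);
}

// All three forms agree, including on the zero input.
inline void checkCttzExpansions() {
  const std::uint32_t samples[] = {0u, 1u, 8u, 40u, 0x80000000u};
  for (std::uint32_t x : samples) {
    assert(cttzViaPopcount(x) == cttzViaSelect(x));
    assert(cttzViaNlz(x) == cttzViaSelect(x));
  }
}
```

Which form is cheaper depends on what the target supports natively, which is the point under discussion in the review.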

Changes based on code review feedback.

Upload a full diff.

Missing performCtlzCombine equivalent

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
2803–2806	I don't see this changed
test/CodeGen/AMDGPU/cttz_zero_undef.ll
108	Missing scalar version

Address code reviews.

Fix the issue of variables not being capitalized.

wdng marked 2 inline comments as done. Sep 14 2017, 2:50 PM

Ping.

arsenm added inline comments. Sep 15 2017, 10:17 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2082–2088	Indentation wrong
2089	llvm_unreachable
2095–2098	Select between Zero and One as input to getSetCC
3036–3037	You could just pass in the new opcode directly rather than selecting it again
3038	Commented out code
3058	You didn't add tests for this part
lib/Target/AMDGPU/AMDGPUISelLowering.h
375	Should be named FFBL_B32 to match the instruction
test/CodeGen/AMDGPU/cttz_zero_undef.ll
130	Also should have some tests with i8/i16

Will create a separate ticket to fix the v_ffbl_sdwa instruction generation.

Ping.

wdng edited reviewers, added: kzhuravl; removed: craig.topper. Sep 22 2017, 12:29 PM

Ping.

Needs more comprehensive check lines. Just checking the instructions won't demonstrate that the extra instructions you're trying to avoid aren't there

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
3044	Don't include AMDGPUISD in the name of this
3069	Ditto
test/CodeGen/AMDGPU/cttz_zero_undef.ll
81	This needs to check more
93	This needs to check more
117–118	Ditto

Address code reviews.

Ping.

wdng added a reviewer: rampitec. Oct 10 2017, 11:10 AM

arsenm added inline comments. Oct 11 2017, 11:05 AM

lib/Target/AMDGPU/AMDGPUISelLowering.cpp
2083	Extra space after ::
2084	Don't include AMDGPUISD in variable name
2085	Missing space before {
2106	Double // and missing closing )
2110	Double //
test/CodeGen/AMDGPU/cttz_zero_undef.ll
2	Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken
167–168	This isn't checking the outputs and select
180	Using undefined VAL
194	Undefined VAL

Address code reviews.

wdng marked 3 inline comments as done. Oct 11 2017, 4:14 PM

craig.topper removed a subscriber: craig.topper. Oct 11 2017, 4:16 PM

arsenm added inline comments. Oct 11 2017, 4:24 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
171–172	Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one

wdng added inline comments. Oct 11 2017, 4:27 PM

test/CodeGen/AMDGPU/cttz_zero_undef.ll
171–172	No, it won't work if I remove the -DAG, as the generated instructions get interleaved with each other.

Remove duplicate check lines.

Removed -DAG checks completely.

LGTM

This revision is now accepted and ready to land. Oct 12 2017, 10:39 AM

Closed by commit rL315610: Implement custom lowering for ISD::CTTZ_ZERO_UNDEF and ISD::CTTZ. (authored by wdng). Oct 12 2017, 12:37 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
include/llvm/Target/TargetSelectionDAG.td	2 lines
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp	13 lines
lib/Target/AMDGPU/AMDGPUISelLowering.h	3 lines
lib/Target/AMDGPU/AMDGPUISelLowering.cpp	62 lines
lib/Target/AMDGPU/AMDGPUInstrInfo.td	3 lines
lib/Target/AMDGPU/EvergreenInstructions.td	2 lines
lib/Target/AMDGPU/SOPInstructions.td	5 lines
test/CodeGen/AMDGPU/cttz_zero_undef.ll	25 lines

Diff 114215

include/llvm/Target/TargetSelectionDAG.td

[...]	def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;		]>;
def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.		def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>
]>;		]>;
def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.		def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>
]>;		]>;
def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz		def SDTIntUnaryOp : SDTypeProfile<1, 1, [ // ctlz, cttz
SDTCisSameAs<0, 1>, SDTCisInt<0>		SDTCisSameAs<0, 1>, SDTCisInt<0>
]>;		]>;
def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext		def SDTIntExtendOp : SDTypeProfile<1, 1, [ // sext, zext, anyext
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<1, 0>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;
def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc		def SDTIntTruncOp : SDTypeProfile<1, 1, [ // trunc
SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>		SDTCisInt<0>, SDTCisInt<1>, SDTCisOpSmallerThanOp<0, 1>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

[...]	case ISD::CTLZ: {
}		}
Op = DAG.getNOT(dl, Op, VT);		Op = DAG.getNOT(dl, Op, VT);
return DAG.getNode(ISD::CTPOP, dl, VT, Op);		return DAG.getNode(ISD::CTPOP, dl, VT, Op);
}		}
case ISD::CTTZ_ZERO_UNDEF:		case ISD::CTTZ_ZERO_UNDEF:
// This trivially expands to CTTZ.		// This trivially expands to CTTZ.
return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);		return DAG.getNode(ISD::CTTZ, dl, Op.getValueType(), Op);
case ISD::CTTZ: {		case ISD::CTTZ: {
		EVT VT = Op.getValueType();
		unsigned len = VT.getSizeInBits();

		if (TLI.isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
		EVT SetCCVT = getSetCCResultType(VT);
		SDValue CTTZ = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, dl, VT, Op);
		SDValue Zero = DAG.getConstant(0, dl, VT);
		SDValue SrcIsZero = DAG.getSetCC(dl, SetCCVT, Op, Zero, ISD::SETEQ);
		return DAG.getNode(ISD::SELECT, dl, VT, SrcIsZero,
		DAG.getConstant(len, dl, VT), CTTZ);
		}
arsenm (Done): I don't understand why you have this or most of the other changes. This shouldn't be substantially different from how we handle ctlz already. i.e. I would expect to see another version of AMDGPUTargetLowering::performCtlzCombine that does essentially the same thing for CTTZ.

// for now, we use: { return popcount(~x & (x - 1)); }		// for now, we use: { return popcount(~x & (x - 1)); }
// unless the target has ctlz but not ctpop, in which case we use:		// unless the target has ctlz but not ctpop, in which case we use:
// { return 32 - nlz(~x & (x-1)); }		// { return 32 - nlz(~x & (x-1)); }
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
EVT VT = Op.getValueType();
SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,		SDValue Tmp3 = DAG.getNode(ISD::AND, dl, VT,
DAG.getNOT(dl, Op, VT),		DAG.getNOT(dl, Op, VT),
DAG.getNode(ISD::SUB, dl, VT, Op,		DAG.getNode(ISD::SUB, dl, VT, Op,
DAG.getConstant(1, dl, VT)));		DAG.getConstant(1, dl, VT)));
// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.		// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&		if (!TLI.isOperationLegalOrCustom(ISD::CTPOP, VT) &&
TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))		TLI.isOperationLegalOrCustom(ISD::CTLZ, VT))
return DAG.getNode(ISD::SUB, dl, VT,		return DAG.getNode(ISD::SUB, dl, VT,
arsenm (Done): OK, I see the default expansion here isn't the compare and select like I expected. Since the compare+select implementation is likely more instructions with the compare than the sub/ctpop implementation, that one should be tried first.
arsenm (Done): I don't see this changed
DAG.getConstant(VT.getSizeInBits(), dl, VT),		DAG.getConstant(VT.getSizeInBits(), dl, VT),
DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));		DAG.getNode(ISD::CTLZ, dl, VT, Tmp3));
return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);		return DAG.getNode(ISD::CTPOP, dl, VT, Tmp3);
}		}
}		}
}		}

bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {		bool SelectionDAGLegalize::ExpandNode(SDNode *Node) {

lib/Target/AMDGPU/AMDGPUISelLowering.h

[...]	protected:

/// \brief Split a vector store into 2 stores of half the vector.		/// \brief Split a vector store into 2 stores of half the vector.
SDValue SplitVectorStore(SDValue Op, SelectionDAG &DAG) const;		SDValue SplitVectorStore(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerUDIVREM(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;		SDValue LowerDIVREM24(SDValue Op, SelectionDAG &DAG, bool sign) const;
		SDValue LowerCTTZ(SDValue Op, SelectionDAG &DAG) const;
void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,		void LowerUDIVREM64(SDValue Op, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &Results) const;		SmallVectorImpl<SDValue> &Results) const;
void analyzeFormalArgumentsCompute(CCState &State,		void analyzeFormalArgumentsCompute(CCState &State,
const SmallVectorImpl<ISD::InputArg> &Ins) const;		const SmallVectorImpl<ISD::InputArg> &Ins) const;
public:		public:
AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);		AMDGPUTargetLowering(const TargetMachine &TM, const AMDGPUSubtarget &STI);

bool mayIgnoreSignedZero(SDValue Op) const {		bool mayIgnoreSignedZero(SDValue Op) const {
[...]	enum NodeType : unsigned {
CARRY,		CARRY,
BORROW,		BORROW,
BFE_U32, // Extract range of bits with zero extension to 32-bits.		BFE_U32, // Extract range of bits with zero extension to 32-bits.
BFE_I32, // Extract range of bits with sign extension to 32-bits.		BFE_I32, // Extract range of bits with sign extension to 32-bits.
BFI, // (src0 & src1) \| (~src0 & src2)		BFI, // (src0 & src1) \| (~src0 & src2)
BFM, // Insert a range of bits into a 32-bit word.		BFM, // Insert a range of bits into a 32-bit word.
FFBH_U32, // ctlz with -1 if input is zero.		FFBH_U32, // ctlz with -1 if input is zero.
FFBH_I32,		FFBH_I32,
		FFBL_U32, // cttz with -1 if input is zero.
arsenm (Done): Should be named FFBL_B32 to match the instruction
		FFBL_I32,
MUL_U24,		MUL_U24,
MUL_I24,		MUL_I24,
MULHI_U24,		MULHI_U24,
MULHI_I24,		MULHI_I24,
MAD_U24,		MAD_U24,
MAD_I24,		MAD_I24,
MUL_LOHI_I24,		MUL_LOHI_I24,
MUL_LOHI_U24,		MUL_LOHI_U24,

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

[...]	AMDGPUTargetLowering::AMDGPUTargetLowering(const TargetMachine &TM,
setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);		setOperationAction(ISD::SELECT_CC, MVT::i64, Expand);

setOperationAction(ISD::SMIN, MVT::i32, Legal);		setOperationAction(ISD::SMIN, MVT::i32, Legal);
setOperationAction(ISD::UMIN, MVT::i32, Legal);		setOperationAction(ISD::UMIN, MVT::i32, Legal);
setOperationAction(ISD::SMAX, MVT::i32, Legal);		setOperationAction(ISD::SMAX, MVT::i32, Legal);
setOperationAction(ISD::UMAX, MVT::i32, Legal);		setOperationAction(ISD::UMAX, MVT::i32, Legal);

if (Subtarget->hasFFBH())		if (Subtarget->hasFFBH())
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i32, Custom);
arsenm (Not Done): We should probably fix this at some point to be legal

if (Subtarget->hasFFBL())		if (Subtarget->hasFFBL())
setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Legal);		setOperationAction(ISD::CTTZ_ZERO_UNDEF, MVT::i32, Custom);
arsenm (Done): This should definitely remain legal
wdng (Author, Not Done): I think it doesn't matter if we define it as Custom, because it will be converted to FFBL_U32 during custom lowering and then pattern-matched to the ffbl instruction at the end anyway. However, if we define it as Legal, we will have a "duplicate" or "extra" pattern (FFBL_U32 and CTTZ_ZERO_UNDEF) for generating the ffbl instruction. Is there a specific reason I am overlooking for why it has to be defined as Legal?

setOperationAction(ISD::CTLZ, MVT::i64, Custom);		setOperationAction(ISD::CTLZ, MVT::i64, Custom);
setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);		setOperationAction(ISD::CTLZ_ZERO_UNDEF, MVT::i64, Custom);

// We only really have 32-bit BFE instructions (and 16-bit on VI).		// We only really have 32-bit BFE instructions (and 16-bit on VI).
//		//
// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any		// On SI+ there are 64-bit BFEs, but they are scalar only and there isn't any
// effort to match them now. We want this to be false for i64 cases when the		// effort to match them now. We want this to be false for i64 cases when the
[...]	SDValue AMDGPUTargetLowering::LowerOperation(SDValue Op,
case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);		case ISD::FNEARBYINT: return LowerFNEARBYINT(Op, DAG);
case ISD::FROUND: return LowerFROUND(Op, DAG);		case ISD::FROUND: return LowerFROUND(Op, DAG);
case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);		case ISD::FFLOOR: return LowerFFLOOR(Op, DAG);
case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);		case ISD::SINT_TO_FP: return LowerSINT_TO_FP(Op, DAG);
case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);		case ISD::UINT_TO_FP: return LowerUINT_TO_FP(Op, DAG);
case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);		case ISD::FP_TO_FP16: return LowerFP_TO_FP16(Op, DAG);
case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);		case ISD::FP_TO_SINT: return LowerFP_TO_SINT(Op, DAG);
case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);		case ISD::FP_TO_UINT: return LowerFP_TO_UINT(Op, DAG);
		case ISD::CTTZ:
		case ISD::CTTZ_ZERO_UNDEF:
		return LowerCTTZ(Op, DAG);
arsenm (Not Done): Also need the select with -1 optimization (and corresponding tests) as cttz
case ISD::CTLZ:		case ISD::CTLZ:
case ISD::CTLZ_ZERO_UNDEF:		case ISD::CTLZ_ZERO_UNDEF:
return LowerCTLZ(Op, DAG);		return LowerCTLZ(Op, DAG);
case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);		case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG);
}		}
return Op;		return Op;
}		}

[...]	SDValue AMDGPUTargetLowering::LowerFFLOOR(SDValue Op, SelectionDAG &DAG) const {
SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);		SDValue NeTrunc = DAG.getSetCC(SL, SetCCVT, Src, Trunc, ISD::SETONE);
SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);		SDValue And = DAG.getNode(ISD::AND, SL, SetCCVT, Lt0, NeTrunc);

SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);		SDValue Add = DAG.getNode(ISD::SELECT, SL, MVT::f64, And, NegOne, Zero);
// TODO: Should this propagate fast-math-flags?		// TODO: Should this propagate fast-math-flags?
return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);		return DAG.getNode(ISD::FADD, SL, MVT::f64, Trunc, Add);
}		}


		SDValue AMDGPUTargetLowering:: LowerCTTZ(SDValue Op, SelectionDAG &DAG) const {
arsenm (Done): This is mostly copy-paste from LowerCTLZ. These should be factored into a common helper.
		SDLoc SL(Op);
		SDValue Src = Op.getOperand(0);
		bool ZeroUndef = Op.getOpcode() == ISD::CTTZ_ZERO_UNDEF;

		if (ZeroUndef && Src.getValueType() == MVT::i32)
		return DAG.getNode(AMDGPUISD::FFBL_U32, SL, MVT::i32, Src);

		SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);
arsenm (Not Done): You didn't enable custom lowering for i64, so this is dead code. You also didn't add any tests for it. In either case, it should be in a separate patch from the i32 handling.

		const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
		const SDValue One = DAG.getConstant(1, SL, MVT::i32);

		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);
		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);

		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),
		*DAG.getContext(), MVT::i32);

		SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, One, ISD::SETEQ);

		SDValue CttzLo = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, SL, MVT::i32, Lo);
		SDValue CttzHi = DAG.getNode(ISD::CTTZ_ZERO_UNDEF, SL, MVT::i32, Hi);

		const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);
		SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CttzHi, Bits32);

		// cttz(x) = lo_32(x) == 0 ? cttz(hi_32(x)) + 32 : cttz(lo_32(x))
		SDValue NewCttz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CttzLo);

		if (!ZeroUndef) {
		// Test if the full 64-bit input is zero.

		// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,
		// which we probably don't want.
		SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);
		SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);

		// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction
		// with the same cycles, otherwise it is slower.
		// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,
		// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);

		const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);

		// The instruction returns -1 for 0 input, but the defined intrinsic
		// behavior is to return the number of bits.
		NewCttz = DAG.getNode(ISD::SELECT, SL, MVT::i32,
		SrcIsZero, Bits32, NewCttz);
		}

		return DAG.getNode(ISD::ZERO_EXTEND, SL, MVT::i64, NewCttz);
		}

SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {		SDValue AMDGPUTargetLowering::LowerCTLZ(SDValue Op, SelectionDAG &DAG) const {
SDLoc SL(Op);		SDLoc SL(Op);
SDValue Src = Op.getOperand(0);		SDValue Src = Op.getOperand(0);
bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;		bool ZeroUndef = Op.getOpcode() == ISD::CTLZ_ZERO_UNDEF;

if (ZeroUndef && Src.getValueType() == MVT::i32)		if (ZeroUndef && Src.getValueType() == MVT::i32)
return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);		return DAG.getNode(AMDGPUISD::FFBH_U32, SL, MVT::i32, Src);

SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);		SDValue Vec = DAG.getNode(ISD::BITCAST, SL, MVT::v2i32, Src);
arsenm (Done): Extra space after ::

arsenm (Done): Don't include AMDGPUISD in variable name
const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);		const SDValue Zero = DAG.getConstant(0, SL, MVT::i32);
arsenm (Done): Missing space before {
const SDValue One = DAG.getConstant(1, SL, MVT::i32);		const SDValue One = DAG.getConstant(1, SL, MVT::i32);

SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);		SDValue Lo = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, Zero);
arsenm (Done): Indentation wrong
SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);		SDValue Hi = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, SL, MVT::i32, Vec, One);
arsenm (Done): llvm_unreachable

EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),		EVT SetCCVT = getSetCCResultType(DAG.getDataLayout(),
*DAG.getContext(), MVT::i32);		*DAG.getContext(), MVT::i32);

SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);		SDValue Hi0 = DAG.getSetCC(SL, SetCCVT, Hi, Zero, ISD::SETEQ);

SDValue CtlzLo = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Lo);		SDValue CtlzLo = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Lo);
SDValue CtlzHi = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Hi);		SDValue CtlzHi = DAG.getNode(ISD::CTLZ_ZERO_UNDEF, SL, MVT::i32, Hi);

arsenm (Done): Select between Zero and One as input to getSetCC
const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(32, SL, MVT::i32);
SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CtlzLo, Bits32);		SDValue Add = DAG.getNode(ISD::ADD, SL, MVT::i32, CtlzLo, Bits32);

// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))		// ctlz(x) = hi_32(x) == 0 ? ctlz(lo_32(x)) + 32 : ctlz(hi_32(x))
SDValue NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CtlzHi);		SDValue NewCtlz = DAG.getNode(ISD::SELECT, SL, MVT::i32, Hi0, Add, CtlzHi);

if (!ZeroUndef) {		if (!ZeroUndef) {
// Test if the full 64-bit input is zero.		// Test if the full 64-bit input is zero.
arsenm (Done): Double // and missing closing )

// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,		// FIXME: DAG combines turn what should be an s_and_b64 into a v_or_b32,
// which we probably don't want.		// which we probably don't want.
SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);		SDValue Lo0 = DAG.getSetCC(SL, SetCCVT, Lo, Zero, ISD::SETEQ);
arsenm (Done): Double //
SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);		SDValue SrcIsZero = DAG.getNode(ISD::AND, SL, SetCCVT, Lo0, Hi0);

// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction		// TODO: If i64 setcc is half rate, it can result in 1 fewer instruction
// with the same cycles, otherwise it is slower.		// with the same cycles, otherwise it is slower.
// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,		// SDValue SrcIsZero = DAG.getSetCC(SL, SetCCVT, Src,
// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);		// DAG.getConstant(0, SL, MVT::i64), ISD::SETEQ);

const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);		const SDValue Bits32 = DAG.getConstant(64, SL, MVT::i32);
[...]
static bool isCtlzOpc(unsigned Opc) {		static bool isCtlzOpc(unsigned Opc) {
return Opc == ISD::CTLZ \|\| Opc == ISD::CTLZ_ZERO_UNDEF;		return Opc == ISD::CTLZ \|\| Opc == ISD::CTLZ_ZERO_UNDEF;
}		}

SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,		SDValue AMDGPUTargetLowering::getFFBH_U32(SelectionDAG &DAG,
SDValue Op,		SDValue Op,
const SDLoc &DL) const {		const SDLoc &DL) const {
EVT VT = Op.getValueType();		EVT VT = Op.getValueType();
EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);		EVT LegalVT = getTypeToTransformTo(*DAG.getContext(), VT);
if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&		if (LegalVT != MVT::i32 && (Subtarget->has16BitInsts() &&
		arsenm (Done): You could just pass in the new opcode directly rather than selecting it again
LegalVT != MVT::i16))		LegalVT != MVT::i16))
		arsenm (Done): Commented out code
return SDValue();		return SDValue();

if (VT != MVT::i32)		if (VT != MVT::i32)
Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);		Op = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i32, Op);

SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);		SDValue FFBH = DAG.getNode(AMDGPUISD::FFBH_U32, DL, MVT::i32, Op);
		arsenm (Done): Don't include AMDGPUISD in the name of this
if (VT != MVT::i32)		if (VT != MVT::i32)
FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);		FFBH = DAG.getNode(ISD::TRUNCATE, DL, VT, FFBH);

return FFBH;		return FFBH;
}		}

// The native instructions return -1 on 0 input. Optimize out a select that		// The native instructions return -1 on 0 input. Optimize out a select that
// produces -1 on 0.		// produces -1 on 0.
//		//
// TODO: If zero is not undef, we could also do this if the output is compared		// TODO: If zero is not undef, we could also do this if the output is compared
// against the bitwidth.		// against the bitwidth.
//		//
// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.		// TODO: Should probably combine against FFBH_U32 instead of ctlz directly.
SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,		SDValue AMDGPUTargetLowering::performCtlzCombine(const SDLoc &SL, SDValue Cond,
		arsenm (Not Done): You didn't add tests for this part
SDValue LHS, SDValue RHS,		SDValue LHS, SDValue RHS,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));		ConstantSDNode *CmpRhs = dyn_cast<ConstantSDNode>(Cond.getOperand(1));
if (!CmpRhs || !CmpRhs->isNullValue())		if (!CmpRhs || !CmpRhs->isNullValue())
return SDValue();		return SDValue();

SelectionDAG &DAG = DCI.DAG;		SelectionDAG &DAG = DCI.DAG;
ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();		ISD::CondCode CCOpcode = cast<CondCodeSDNode>(Cond.getOperand(2))->get();
SDValue CmpLHS = Cond.getOperand(0);		SDValue CmpLHS = Cond.getOperand(0);

// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x		// select (setcc x, 0, eq), -1, (ctlz_zero_undef x) -> ffbh_u32 x
		arsenm (Done): Ditto
if (CCOpcode == ISD::SETEQ &&		if (CCOpcode == ISD::SETEQ &&
isCtlzOpc(RHS.getOpcode()) &&		isCtlzOpc(RHS.getOpcode()) &&
RHS.getOperand(0) == CmpLHS &&		RHS.getOperand(0) == CmpLHS &&
isNegativeOne(LHS)) {		isNegativeOne(LHS)) {
return getFFBH_U32(DAG, CmpLHS, SL);		return getFFBH_U32(DAG, CmpLHS, SL);
}		}

// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x		// select (setcc x, 0, ne), (ctlz_zero_undef x), -1 -> ffbh_u32 x
[... 717 lines not shown ...]
const char* AMDGPUTargetLowering::getTargetNodeName(unsigned Opcode) const {
NODE_NAME_CASE(CARRY)		NODE_NAME_CASE(CARRY)
NODE_NAME_CASE(BORROW)		NODE_NAME_CASE(BORROW)
NODE_NAME_CASE(BFE_U32)		NODE_NAME_CASE(BFE_U32)
NODE_NAME_CASE(BFE_I32)		NODE_NAME_CASE(BFE_I32)
NODE_NAME_CASE(BFI)		NODE_NAME_CASE(BFI)
NODE_NAME_CASE(BFM)		NODE_NAME_CASE(BFM)
NODE_NAME_CASE(FFBH_U32)		NODE_NAME_CASE(FFBH_U32)
NODE_NAME_CASE(FFBH_I32)		NODE_NAME_CASE(FFBH_I32)
		NODE_NAME_CASE(FFBL_U32)
		NODE_NAME_CASE(FFBL_I32)
NODE_NAME_CASE(MUL_U24)		NODE_NAME_CASE(MUL_U24)
NODE_NAME_CASE(MUL_I24)		NODE_NAME_CASE(MUL_I24)
NODE_NAME_CASE(MULHI_U24)		NODE_NAME_CASE(MULHI_U24)
NODE_NAME_CASE(MULHI_I24)		NODE_NAME_CASE(MULHI_I24)
NODE_NAME_CASE(MUL_LOHI_U24)		NODE_NAME_CASE(MUL_LOHI_U24)
NODE_NAME_CASE(MUL_LOHI_I24)		NODE_NAME_CASE(MUL_LOHI_I24)
NODE_NAME_CASE(MAD_U24)		NODE_NAME_CASE(MAD_U24)
NODE_NAME_CASE(MAD_I24)		NODE_NAME_CASE(MAD_I24)
[... 160 lines not shown ...]

lib/Target/AMDGPU/AMDGPUInstrInfo.td

	[... 292 lines not shown ...]
	def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_u32 : SDNode<"AMDGPUISD::BFE_U32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;			def AMDGPUbfe_i32 : SDNode<"AMDGPUISD::BFE_I32", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;			def AMDGPUbfi : SDNode<"AMDGPUISD::BFI", AMDGPUDTIntTernaryOp>;
	def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;			def AMDGPUbfm : SDNode<"AMDGPUISD::BFM", SDTIntBinOp>;

	def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;			def AMDGPUffbh_u32 : SDNode<"AMDGPUISD::FFBH_U32", SDTIntUnaryOp>;
	def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;			def AMDGPUffbh_i32 : SDNode<"AMDGPUISD::FFBH_I32", SDTIntUnaryOp>;

				def AMDGPUffbl_u32 : SDNode<"AMDGPUISD::FFBL_U32", SDTIntUnaryOp>;
				def AMDGPUffbl_i32 : SDNode<"AMDGPUISD::FFBL_I32", SDTIntUnaryOp>;
				arsenm (Done): This isn't a signed/unsigned operation. There is just one v_ffbl_b32.

	// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore			// Signed and unsigned 24-bit multiply. The highest 8-bits are ignore
	// when performing the mulitply. The result is a 32-bit value.			// when performing the mulitply. The result is a 32-bit value.
	def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,			def AMDGPUmul_u24 : SDNode<"AMDGPUISD::MUL_U24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,			def AMDGPUmul_i24 : SDNode<"AMDGPUISD::MUL_I24", SDTIntBinOp,
	[SDNPCommutative, SDNPAssociative]			[SDNPCommutative, SDNPAssociative]
	>;			>;
	[... 109 lines not shown ...]

lib/Target/AMDGPU/EvergreenInstructions.td

	[... 436 lines not shown ...]

	def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;			def ADDC_UINT : R600_2OP_Helper <0x52, "ADDC_UINT", AMDGPUcarry>;
	def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;			def SUBB_UINT : R600_2OP_Helper <0x53, "SUBB_UINT", AMDGPUborrow>;

	def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;			def FLT32_TO_FLT16 : R600_1OP_Helper <0xA2, "FLT32_TO_FLT16", AMDGPUfp_to_f16, VecALU>;
	def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;			def FLT16_TO_FLT32 : R600_1OP_Helper <0xA3, "FLT16_TO_FLT32", f16_to_fp, VecALU>;
	def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;			def BCNT_INT : R600_1OP_Helper <0xAA, "BCNT_INT", ctpop, VecALU>;
	def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;			def FFBH_UINT : R600_1OP_Helper <0xAB, "FFBH_UINT", AMDGPUffbh_u32, VecALU>;
	def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", cttz_zero_undef, VecALU>;			def FFBL_INT : R600_1OP_Helper <0xAC, "FFBL_INT", AMDGPUffbl_u32, VecALU>;

	let hasSideEffects = 1 in {			let hasSideEffects = 1 in {
	def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;			def MOVA_INT_eg : R600_1OP <0xCC, "MOVA_INT", [], VecALU>;
	}			}

	def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {			def FLT_TO_INT_eg : FLT_TO_INT_Common<0x50> {
	let Pattern = [];			let Pattern = [];
	let Itinerary = AnyALU;			let Itinerary = AnyALU;
	[... 313 lines not shown ...]

lib/Target/AMDGPU/SOPInstructions.td

	[... 153 lines not shown ...]
	def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",			def S_BCNT1_I32_B32 : SOP1_32 <"s_bcnt1_i32_b32",
	[(set i32:$sdst, (ctpop i32:$src0))]			[(set i32:$sdst, (ctpop i32:$src0))]
	>;			>;
	def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;			def S_BCNT1_I32_B64 : SOP1_32_64 <"s_bcnt1_i32_b64">;
	} // End Defs = [SCC]			} // End Defs = [SCC]

	def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;			def S_FF0_I32_B32 : SOP1_32 <"s_ff0_i32_b32">;
	def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;			def S_FF0_I32_B64 : SOP1_32_64 <"s_ff0_i32_b64">;
				def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",			def S_FF1_I32_B32 : SOP1_32 <"s_ff1_i32_b32",
	[(set i32:$sdst, (cttz_zero_undef i32:$src0))]			[(set i32:$sdst, (AMDGPUffbl_u32 i32:$src0))]
	>;			>;
	def S_FF1_I32_B64 : SOP1_32_64 <"s_ff1_i32_b64">;

	def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",			def S_FLBIT_I32_B32 : SOP1_32 <"s_flbit_i32_b32",
	[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_u32 i32:$src0))]
	>;			>;

	def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;			def S_FLBIT_I32_B64 : SOP1_32_64 <"s_flbit_i32_b64">;
	def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",			def S_FLBIT_I32 : SOP1_32 <"s_flbit_i32",
	[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]			[(set i32:$sdst, (AMDGPUffbh_i32 i32:$src0))]
	[... 1,137 lines not shown ...]

test/CodeGen/AMDGPU/cttz_zero_undef.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -march=amdgcn -mcpu=tonga -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=SI -check-prefix=FUNC %s
				arsenm (Not Done): Add -enable-var-scope to all of the FileCheck lines. Several of these tests are broken
	; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s			; RUN: llc -march=r600 -mcpu=cypress -verify-machineinstrs < %s | FileCheck -check-prefix=EG -check-prefix=FUNC %s

	declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone			declare i32 @llvm.cttz.i32(i32, i1) nounwind readnone
	declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone			declare <2 x i32> @llvm.cttz.v2i32(<2 x i32>, i1) nounwind readnone
	declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone			declare <4 x i32> @llvm.cttz.v4i32(<4 x i32>, i1) nounwind readnone
	declare i32 @llvm.r600.read.tidig.x() nounwind readnone			declare i32 @llvm.r600.read.tidig.x() nounwind readnone

	; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:			; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32:
	[... 60 lines not shown ...]
	define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {			define amdgpu_kernel void @v_cttz_zero_undef_v4i32(<4 x i32> addrspace(1)* noalias %out, <4 x i32> addrspace(1)* noalias %valptr) nounwind {
	%tid = call i32 @llvm.r600.read.tidig.x()			%tid = call i32 @llvm.r600.read.tidig.x()
	%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid			%in.gep = getelementptr <4 x i32>, <4 x i32> addrspace(1)* %valptr, i32 %tid
	%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16			%val = load <4 x i32>, <4 x i32> addrspace(1)* %in.gep, align 16
	%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone			%cttz = call <4 x i32> @llvm.cttz.v4i32(<4 x i32> %val, i1 true) nounwind readnone
	store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16			store <4 x i32> %cttz, <4 x i32> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

				; FUNC-LABEL: {{^}}s_cttz_zero_undef_i32_with_select:
				; SI: s_ff1_i32_b32
				arsenm (Done): This needs to check more
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				; EG: FFBL_INT {{\? }}[[RESULT]]
				define amdgpu_kernel void @s_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 %val) nounwind {
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %cttz, i32 addrspace(1)* %out, align 4
				ret void
				}

				; FUNC-LABEL: {{^}}v_cttz_zero_undef_i32_with_select:
				; SI: v_ffbl_b32_e32
				arsenm (Done): This needs to check more
				; EG: MEM_RAT_CACHELESS STORE_RAW [[RESULT:T[0-9]+\.[XYZW]]]
				define amdgpu_kernel void @v_cttz_zero_undef_i32_with_select(i32 addrspace(1)* noalias %out, i32 addrspace(1)* nocapture readonly %arrayidx) nounwind {
				%val = load i32, i32 addrspace(1)* %arrayidx, align 1
				%cttz = tail call i32 @llvm.cttz.i32(i32 %val, i1 true) nounwind readnone
				%cttz_ret = icmp ne i32 %val, 0
				%ret = select i1 %cttz_ret, i32 %cttz, i32 32
				store i32 %ret, i32 addrspace(1)* %out, align 4
				ret void
				}

				arsenm (Done): Need i64 tests
				arsenm (Done): Missing scalar version
				arsenm (Done): Also should have some tests with i8/i16
				arsenm (Done): Ditto
				arsenm (Done): This isn't checking the outputs and select
				arsenm (Done): Using undefined VAL
				arsenm (Done): Undefined VAL
				arsenm (Done): Using 2 -DAGs with identical lines doesn't do anything. It will pass with only one
				wdng (Author, Done): No, it won't work if I remove the -DAG, as the generated instructions get interleaved with each other.
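
For readers following the review, here is a rough C++ model of the semantics being discussed. It is only a sketch, not the patch's code: model_ffbl, model_cttz32, select_minus_one and model_cttz64 are hypothetical names invented for illustration. It assumes the find-first-bit-low operation returns -1 (all ones) on a zero input, in line with the "native instructions return -1 on 0 input" comment above, and that the 64-bit case can be split into two 32-bit halves in the same way as the ctlz lowering shown earlier in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software model of a find-first-bit-low operation:
// index of the lowest set bit, or 0xFFFFFFFF (-1) when the input is zero.
// The -1-on-zero behaviour is an assumption taken from the source comment.
uint32_t model_ffbl(uint32_t x) {
  if (x == 0)
    return 0xFFFFFFFFu;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// ISD::CTTZ semantics: a zero input is defined and yields the bit width.
uint32_t model_cttz32(uint32_t x) {
  return x == 0 ? 32u : model_ffbl(x);
}

// The pattern "select (x != 0), cttz_zero_undef(x), -1". model_ffbl stands in
// for cttz_zero_undef on the non-zero path. Under the assumption above, the
// whole select computes the same value as a single model_ffbl(x).
uint32_t select_minus_one(uint32_t x) {
  return x != 0 ? model_ffbl(x) : 0xFFFFFFFFu;
}

// 64-bit count built from two 32-bit halves: use the low half when it has a
// set bit, otherwise 32 plus the count in the high half; 64 for a zero input.
uint32_t model_cttz64(uint64_t x) {
  if (x == 0)
    return 64u;
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  return lo != 0 ? model_ffbl(lo) : 32u + model_ffbl(hi);
}
```

In this model select_minus_one(x) equals model_ffbl(x) for every x, which is the kind of select a combine can remove. A select against the bit width, as in the new tests, still needs its own handling.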

Revision Contents

Diff 114215

include/llvm/Target/TargetSelectionDAG.td

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp

lib/Target/AMDGPU/AMDGPUISelLowering.h

lib/Target/AMDGPU/AMDGPUISelLowering.cpp

lib/Target/AMDGPU/AMDGPUInstrInfo.td

lib/Target/AMDGPU/EvergreenInstructions.td

lib/Target/AMDGPU/SOPInstructions.td

test/CodeGen/AMDGPU/cttz_zero_undef.ll
