This is an archive of the discontinued LLVM Phabricator instance.

[PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation
Needs Revision (Public)

Authored by ashahid on Apr 15 2015, 8:01 AM.

Details

Summary

http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/81724

This is with reference to the X86 PSAD instruction generation discussion that happened at the above link.

This patch contains the CodeGen changes. This change introduces an llvm intrinsic called @llvm.sad() which takes two integer vector inputs and returns an integer result. An ISD::SAD node corresponding to this intrinsic is also added. The loop vectorizer or SLP vectorizer can use this intrinsic only after querying the target.
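As a rough illustration (the name mangling and exact signature here are assumptions for the sketch, not copied from the patch, and the intrinsic does not exist in upstream LLVM), a call to the proposed intrinsic on one 16 x i8 block might look like this in IR:

; Hypothetical spelling of the proposed intrinsic: two byte vectors in,
; a single integer sum-of-absolute-differences out.
declare i32 @llvm.sad.v16i8(<16 x i8>, <16 x i8>)

define i32 @sad_block(<16 x i8> %a, <16 x i8> %b) {
  %r = call i32 @llvm.sad.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret i32 %r
}

As stated above, the vectorizers would only emit such a call after querying the target through the accompanying TTI hook.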

Diff Detail

Event Timeline

ashahid retitled this revision from to [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation.
ashahid updated this object.
ashahid edited the test plan for this revision. (Show Details)
ashahid set the repository for this revision to rL LLVM.
ab edited edge metadata. Apr 15 2015, 11:56 AM

A few general notes:

  • A LangRef description of the intrinsic would be useful for reviewing (and of course necessary for committing).
  • Also, the TTI parts are independent of the intrinsic/building/legalization parts; perhaps submit those with the patch that introduces their usage? (the vectorizer I guess)
  • I'm a bit confused about the lowering: if you only implement the v16i8 legal case, I don't expect to see anything other than a .td pattern and the SelectionDAGBuilder code.
  • In general, I think this will eventually need much stronger legalization: once something is in IR, people *will* use it, and IMO, if there isn't a good reason, it shouldn't fail to select or crash the compiler.

Thanks for working on this, I'm excited about the vectorizer parts!
-Ahmed

include/llvm/IR/Intrinsics.td
589

The opposite question was asked on the thread about i32, but can we constrain the return type more than this? (open question, I don't think so)

590

No spaces after '[' and before ']'.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1291–1295

This basically bypasses the legalizer for SAD, for no good reason. It should just be handled by the default case, and the legalization below.

2741–2760

I'm confused: what does this achieve?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
5096–5100

I would inline ExpandSad here, like bswap right above.

lib/Target/X86/X86ISelDAGToDAG.cpp
2087–2097

Why not a .td pattern? Should be a one-liner (+ the SDNode in TargetSelectionDAG.td).

2089

I think getMachineNode has an overload for two operands you can use.

2091

Two things:

  • SDM says SSE1 supports this on v8i8, which isn't really legal.
  • Why not a single check? For instance, is hasAVX() useful here?
lib/Target/X86/X86ISelLowering.cpp
915

Why not Legal?

16928–16929

Why not a single SDValue?

16930–16936

The original ISD::SAD building returns an integer (matching the intrinsic). Why v2i64 + extract + trunc here, instead of just doing DAG.getNode(ISD::SAD, dl, Op.getValueType(), ...)?

16932–16933

There's a getNode overload you can use for two operands: getNode(..., Op0, Op1), like you do right below.

lib/Target/X86/X86TargetTransformInfo.cpp
1143–1144

Why not "return false"? It doesn't make much sense to call it on any other type, but I think that's the point of the function.

test/CodeGen/X86/sad_intrinsic.ll
1

Why AVX? This is all SSE2, no?

5

Explicitly checking the movd would be useful, I think.

Hi Ahmed,

Thanks for looking into it.

jmolloy requested changes to this revision. Apr 16 2015, 2:29 AM
jmolloy edited edge metadata.

Hi,

Thanks for continuing to work on this!

From a high level, I still stand by the comments I made on your LoopVectorizer review that this is a bit too Intel-specific. ARM and AArch64's SAD instructions do not behave as you model here. You model SAD as a per-element subtract, abs, then a cross-lane (horizontal) summation. The VABD instruction in ARM does not do the last step - that is assumed to be done in a separate instruction.

Now, we can model your intrinsic semantics in ARM using:

val2 = ABD val0, val1
val3 = ADDV val2

However, that isn't very efficient - the sum-across-lanes is expensive. Ideally, in a loop, we'd do something like this:

loop:
  reduction1 = ABD array0[i], array1[i]
  br loop
val = ADDV reduction1

So we'd only do the horizontal step once, not on every iteration. That is why I think having the intrinsic represent just the per-element operations, not the horizontal part, would be the lowest common denominator between our two targets. It is easier to pattern match two nodes than to split one node apart and LICM part of it.
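As a hedged sketch of that shape in IR (written with today's opaque-pointer syntax and llvm.vector.reduce.add, both of which post-date this review; at the time the final sum would have been an explicit shuffle/add tree, and the expanded sub/abs sequence below could equally be a dedicated per-element intrinsic):

define i32 @sad_loop(ptr %a, ptr %b, i32 %n) {
entry:
  br label %loop

loop:
  ; The per-element absolute difference stays inside the loop; only the
  ; wide vector accumulator is carried across iterations.
  %i   = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %acc = phi <16 x i32> [ zeroinitializer, %entry ], [ %acc.next, %loop ]
  %pa  = getelementptr <16 x i8>, ptr %a, i32 %i
  %pb  = getelementptr <16 x i8>, ptr %b, i32 %i
  %va  = load <16 x i8>, ptr %pa
  %vb  = load <16 x i8>, ptr %pb
  %wa  = zext <16 x i8> %va to <16 x i32>
  %wb  = zext <16 x i8> %vb to <16 x i32>
  %d   = sub <16 x i32> %wa, %wb
  %neg = sub <16 x i32> zeroinitializer, %d
  %pos = icmp sgt <16 x i32> %d, zeroinitializer
  %abd = select <16 x i1> %pos, <16 x i32> %d, <16 x i32> %neg
  %acc.next = add <16 x i32> %acc, %abd
  %i.next   = add i32 %i, 1
  %cond = icmp slt i32 %i.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ; The cross-lane (horizontal) sum happens exactly once, after the loop.
  %sum = call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> %acc)
  ret i32 %sum
}

declare i32 @llvm.vector.reduce.add.v16i32(<16 x i32>)

The point of this shape is that only %acc is live across iterations; the expensive horizontal step is paid once.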

Similarly, should the loop vectorizer emit the above construction, it is not very good for you: you would never be able to match your PSAD instruction. So my suggestion is this:

  • The intrinsic only models the per-element operations, not the horizontal part.
  • The X86 backend pattern matches the sad intrinsic plus a horizontal reduction to its PSAD node.
  • The Loop Vectorizer has code to emit *either* the per-element-only or the per-element-plus-horizontal SAD as part of its reduction logic.
  • The TargetTransformInfo hook is extended to return an enum SADType, in much the same way that the popcnt type is handled currently.

How does this sound?

Cheers,

James

This revision now requires changes to proceed. Apr 16 2015, 2:29 AM
ab added a comment. Apr 16 2015, 10:49 AM

FWIW I agree with James' concerns: I thought the intrinsic had already been agreed upon, but that doesn't seem to be the case.

This got me thinking, and perhaps that's what you're saying, James: if the SAD intrinsic is too X86-specific, why do we even need an intrinsic? We already express horizontal reductions using the expanded form (if that's not ideal, then that's a separate topic); the SUB is just a "sub <16 x i8>".

The tricky part is the ABS, which AFAIK we express using the expanded form as well (InstCombine seems to agree).

If the only reason for the intrinsic is making it easier on the vectorizer (I haven't really followed that discussion), it sounds like there's something to be done for the more general problem (e.g. MAX, or SAT, which I have not gotten back to yet).

Am I making sense?

-Ahmed

Hi James,

Thanks for your explanation.

With regard to custom lowering, I think you have misinterpreted me. What I was saying was that if the intrinsic is defined as "sum( abs( *a++ - *b++ ) )", non-X86 backends could custom lower it as something like "ABD + ADDV" (absolute difference, sum-of-all-lanes). However, you'd end up with the sum-of-all-lanes unnecessarily being inside the loop! By the time the intrinsic is expanded, it may be difficult to determine that the sum can be moved outside the loop.

If the horizontal sum is an intrinsic, how will LICM happen on it?

Also, you haven't answered my question about signedness, which I've raised several times.

Currently the SAD intrinsic is modeled for unsigned data types. AFAIK, signedness matters only when an arithmetic operation results in a carry or overflow. We have not seen a use case for signed data types yet. Even in the case of ARM, “SABD” (if I am correct) does not set the overflow flag.

Regards,
Shahid

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Thursday, April 16, 2015 9:42 PM
To: Shahid, Asghar-ahmad; reviews+D9029+public+eee34c83a2c6f996@reviews.llvm.org; hfinkel@anl.gov; spatel@rotateright.com; aschwaighofer@apple.com; ahmed.bougacha@gmail.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation

Hi Asghar-ahmad,

Thanks for responding. I'll try and explain in more detail what I mean. I agree that we can custom lower things and that we could implement your intrinsic on our and other architecture. That is not in question. What is in question is whether the definition of the intrinsic and behaviour as-is would allow *efficient* implementation on multiple target architectures.

To reiterate the examples from earlier, there are seemingly two different approaches for lowering a sum-of-absolute-differences loop. Assume 'a' and 'b' are the two input arrays, as some pointer to vector type.

1:
int r = 0;
for (i = ...) {
  r += sum( abs( *a++ - *b++ ) );
}
// r holds the sum-of-absolute-differences

2:
vector int r = {0};
for (i = ...) {
  r += abs( *a++ - *b++ );
}
// r holds partial sums.
int sad = sum(r);
// sad holds the sum-of-absolute-differences

The most efficient form of lowering for X86 may possibly be (1), where a PSAD instruction can be used (although for non-i8 types perhaps not?). For ARM, AArch64 and, according to an appnote I found [0], Altivec (I couldn't find anything about MIPS), (2) is going to be better.

So the goal as I see it is to define these intrinsics and IR idioms such that both forms can be generated depending on the target (and/or datatype - you don't have PSAD for floating point types, so if someone does a non-int SAD loop the most efficient form for you would be (2)).
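For contrast, here is a hedged IR sketch of form (1) using the fused intrinsic proposed in this patch (again, the intrinsic spelling and the opaque-pointer syntax are illustrative assumptions): because the horizontal sum is implied by the intrinsic, it happens on every iteration.

declare i32 @llvm.sad.v16i8(<16 x i8>, <16 x i8>)

define i32 @sad_form1(ptr %a, ptr %b, i32 %n) {
entry:
  br label %loop

loop:
  %i  = phi i32 [ 0, %entry ], [ %i.next, %loop ]
  %r  = phi i32 [ 0, %entry ], [ %r.next, %loop ]
  %pa = getelementptr <16 x i8>, ptr %a, i32 %i
  %pb = getelementptr <16 x i8>, ptr %b, i32 %i
  %va = load <16 x i8>, ptr %pa
  %vb = load <16 x i8>, ptr %pb
  ; Absolute difference *and* cross-lane sum in one operation, per iteration.
  %sad = call i32 @llvm.sad.v16i8(<16 x i8> %va, <16 x i8> %vb)
  %r.next = add i32 %r, %sad
  %i.next = add i32 %i, 1
  %cond = icmp slt i32 %i.next, %n
  br i1 %cond, label %loop, label %exit

exit:
  ret i32 %r
}

This is the shape where a single PSAD-style instruction maps naturally; the hoisted form sketched earlier in the thread is the ARM-friendly one.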

With regard to custom lowering, I think you have misinterpreted me. What I was saying was that if the intrinsic is defined as "sum( abs( *a++ - *b++ ) )", non-X86 backends could custom lower it as something like "ABD + ADDV" (absolute difference, sum-of-all-lanes). However, you'd end up with the sum-of-all-lanes unnecessarily being inside the loop! By the time the intrinsic is expanded, it may be difficult to determine that the sum can be moved outside the loop.

Conversely, if we defined the intrinsic as "abs( *a++ - *b++ )", we could still easily generate loop type (1) by adding a sum() around it. As it is easier to match a pattern than to split a pattern apart and move it around (ISel is made for pattern matching!), this is the implementation I am suggesting.

Yes, you're right, this means the name "SAD" for the node may be a misnomer. What I've asked for is the splitting apart of an opaque intrinsic into a smaller opaque intrinsic and generic support IR, which is something we try to do where possible elsewhere in the compiler. I hope I've explained why the node as you've described it may not be useful for any non-X86 target.

Also, you haven't answered my question about signedness, which I've raised several times.

Cheers,

[0] http://www.freescale.com/webapp/sps/download/license.jsp?colCode=AVEC_SAD

Hi James,

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Monday, April 20, 2015 2:03 PM
To: Shahid, Asghar-ahmad; reviews+D9029+public+eee34c83a2c6f996@reviews.llvm.org; hfinkel@anl.gov; spatel@rotateright.com; aschwaighofer@apple.com; ahmed.bougacha@gmail.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation

Hi Shahid,

No matter how horizontal sums are modelled, LICM will always have great difficulty recognizing that it can be hoisted. I think this is beyond LICM's abilities, which is why I suggested we allow the Loop Vectorizer to create several different idioms, one of which has the sum already hoisted. The Loop Vectorizer has all the knowledge to know that this is legal and profitable to do.
So you mean to create different idioms suited to different targets, such as (1) for X86 and (2) for ARM from your example below.

I don't think you're right there with regard to signedness. The conceptual operation of SAD is this:

  1. Extend inputs to the output size (i8 -> i32 in PSAD's case)
  2. Subtract the inputs
  3. abs() the result
  4. (optionally) sum the abs()s.

I think steps (2) and (4) are signedness-independent. I think steps (1) and (3) are not, and step (1) is where ARM's SABD differs from UABD.
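To make the signedness point concrete, here is one hedged per-element spelling of those steps (illustrative only, not code from the patch); in this particular spelling the choice surfaces in how step (1) widens the inputs. For example, a = 0x80 and b = 0x7f give |128 - 127| = 1 in the unsigned flavour but |-128 - 127| = 255 in the signed flavour.

define i32 @sad_elem_unsigned(i8 %a, i8 %b) {
  %wa  = zext i8 %a to i32                   ; step (1), unsigned flavour
  %wb  = zext i8 %b to i32
  %d   = sub i32 %wa, %wb                    ; step (2)
  %neg = sub i32 0, %d
  %pos = icmp sgt i32 %d, 0
  %abs = select i1 %pos, i32 %d, i32 %neg    ; step (3)
  ret i32 %abs                               ; step (4) would add these across lanes
}

define i32 @sad_elem_signed(i8 %a, i8 %b) {
  %wa  = sext i8 %a to i32                   ; step (1), signed flavour
  %wb  = sext i8 %b to i32
  %d   = sub i32 %wa, %wb
  %neg = sub i32 0, %d
  %pos = icmp sgt i32 %d, 0
  %abs = select i1 %pos, i32 %d, i32 %neg
  ret i32 %abs
}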
I had referred to the ARMv8 architecture manual for SABD & UABD, but I could not find any difference in their semantics; both use unsigned input data. Is it the right doc to refer to?

X86's PSAD also extends the inputs from i8 to i32, so I don't think PSAD is signedness independent either.
I don’t think this is correct; PSAD operates on 8 unsigned byte operands of source and destination.

Cheers,

James

Are you sure about that? I thought your psad had this signature:
i32 @psad(i8* %a, i8* %b)

Oops, it was a typo, please read it as “both the operands source1 and source2”. Below is the link for reference.
https://www.yumpu.com/en/document/view/7897009/ia-32-intelr-architecture-software-developers-manual-volume-2/643

Regards,
Shahid

From: James Molloy [mailto:james@jamesmolloy.co.uk]
Sent: Monday, April 20, 2015 10:05 PM
To: Shahid, Asghar-ahmad; reviews+D9029+public+eee34c83a2c6f996@reviews.llvm.org; hfinkel@anl.gov; spatel@rotateright.com; aschwaighofer@apple.com; ahmed.bougacha@gmail.com
Cc: llvm-commits@cs.uiuc.edu
Subject: Re: [PATCH] [PATCH][CodeGen] Adding "llvm.sad" intrinsic and corresponding ISD::SAD node for "Sum Of Absolute Differences" operation

Hi,
So you mean to create different idioms suited to different targets, such as (1) for X86 and (2) for ARM from your example below.
Yes; although (2) will apply to more than just ARM - there may be examples where such an idiom is profitable for X86 too, such as when you have i16 datatypes.

I had referred to the ARMv8 architecture manual for SABD & UABD, but I could not find any difference in their semantics; both use unsigned input data. Is it the right doc to refer to?

That is the correct document. There's a knack to reading it: both SABD and UABD have a shared decode and semantics, but if you notice, the "unsigned" pseudo-variable is set from the "U" bit of the instruction, which is 1 for UABD and 0 for SABD. So they do treat their inputs as differently signed.

I don’t think this is correct; PSAD operates on 8 unsigned byte operands of source and destination.

Are you sure about that? I thought your psad had this signature:

i32 @psad(i8* %a, i8* %b)

Are you saying instead it has:

i8 @psad(i8* %a, i8* %b)

Because that contradicts what you've said earlier in these threads and also intuition - what is the point of such an instruction, as it can overflow during the summation step?

Cheers,

James

Hi James,

We agree with your logic of providing two intrinsics, for ‘absolute difference’ and ‘horizontal add’. I will soon send a description of that.
However, for targets supporting specialized SAD instructions, keeping the proposed llvm.sad() intrinsic (which of course needs to be extended to model data types & signedness) still makes sense because:

  1. Cost modeling would be easier and more efficient for SAD idioms.
  2. No DAG combine would be required (absd() + hadd() -> sad()); see the sketch below.
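A hedged sketch of what point 2 refers to: the @llvm.absd and @llvm.hadd names below are made up purely for illustration; only @llvm.sad follows this patch's proposal, and its mangling here is an assumption. The split form is what a backend would have to recombine; the fused form is what the patch proposes.

declare <16 x i8> @llvm.absd.v16i8(<16 x i8>, <16 x i8>)   ; hypothetical per-element abs-diff
declare i32 @llvm.hadd.v16i8(<16 x i8>)                    ; hypothetical widening horizontal add
declare i32 @llvm.sad.v16i8(<16 x i8>, <16 x i8>)          ; fused form as proposed in this patch

define i32 @split_form(<16 x i8> %a, <16 x i8> %b) {
  %abd = call <16 x i8> @llvm.absd.v16i8(<16 x i8> %a, <16 x i8> %b)
  %sum = call i32 @llvm.hadd.v16i8(<16 x i8> %abd)
  ret i32 %sum
}

define i32 @fused_form(<16 x i8> %a, <16 x i8> %b) {
  %sad = call i32 @llvm.sad.v16i8(<16 x i8> %a, <16 x i8> %b)
  ret i32 %sad
}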

Thoughts?

Regards,
Shahid


spatel resigned from this revision. Nov 25 2015, 9:39 AM
spatel removed a reviewer: spatel.

Looks like this is proceeding with a different implementation (for x86 at least):
http://reviews.llvm.org/D14897
http://reviews.llvm.org/D14840