- Tablegen patterns exist to select 'xtn' and 'uzp1' for trunc [1]. Cost-table entries are updated based on the actual number of {xtn, uzp1} instructions generated.
- Without this, an IR instruction like `trunc <8 x i16> %v to <8 x i8>` is considered free and may be sunk into other basic blocks. As a result, the sunk 'trunc' ends up in a different basic block from its (usually not-free) vector operand and misses the chance to be combined during instruction selection. (examples in [2])
- It would take a lot of effort to teach CodeGenPrepare.cpp to sink the operand of trunc without introducing regressions, since the instruction computing the trunc operand could be faster (e.g., in throughput) than the instruction selected for "trunc (bin-vector-op)".
- For instance, in [3], sinking %1 (the trunc operand) into bb.1 and bb.2 means replacing 2 xtn with 2 shrn (shrn has a throughput of 1 and only utilizes the V1 pipeline), which is not necessarily a win, especially since the ushr result still needs to be preserved for the store in bb.0. Meanwhile, it is too optimistic (for the CodeGenPrepare pass) to assume that machine-cse will always be able to de-duplicate the shrn instructions from the various basic blocks into one shrn.
[1] Patterns for {v8i16->v8i8, v4i32->v4i16, v2i64->v2i32}, and a pattern for concat(trunc, trunc) -> uzp1
[2] A pattern for trunc(umin(X, 255)) -> UQXTN v8i8 (and other {u,s}{min,max} patterns for v8i16 operands), and patterns for shrn (v8i16->v8i8, v2i64->v2i32)
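The trunc patterns referenced in [1] look roughly like the following TableGen sketch; this is illustrative of the shape of such patterns, not a verbatim copy of AArch64InstrInfo.td:

```tablegen
// Sketch: select xtn for a simple narrowing vector trunc.
def : Pat<(v8i8  (trunc (v8i16 V128:$Vn))), (XTNv8i8  V128:$Vn)>;
def : Pat<(v4i16 (trunc (v4i32 V128:$Vn))), (XTNv4i16 V128:$Vn)>;
def : Pat<(v2i32 (trunc (v2i64 V128:$Vn))), (XTNv2i32 V128:$Vn)>;

// Sketch: a concat of two truncs selects a single uzp1 on the wide inputs,
// taking the even-indexed bytes of both source registers in one instruction.
def : Pat<(v16i8 (concat_vectors (v8i8 (trunc (v8i16 V128:$Vn))),
                                 (v8i8 (trunc (v8i16 V128:$Vm))))),
          (UZP1v16i8 V128:$Vn, V128:$Vm)>;
```

Because each IR trunc maps to exactly one xtn (or one uzp1 for the concat form), the cost-table entries in this change record a cost of that many instructions rather than zero.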
[3]
```llvm
; instruction latency / throughput / pipeline on `neoverse-n1`
bb.0:
  %1 = lshr <8 x i16> %10, <i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4, i16 4> ; ushr, latency 2, throughput 1, pipeline V1
  %2 = trunc <8 x i16> %1 to <8 x i8>                                               ; xtn, latency 2, throughput 2, pipeline V
  store <8 x i16> %1, ptr %addr
  br i1 %cond, label %bb.1, label %bb.2

bb.1:
  %4 = trunc <8 x i16> %1 to <8 x i8>                                               ; xtn

bb.2:
  %5 = trunc <8 x i16> %1 to <8 x i8>                                               ; xtn
```
It looks like the throughput is already 2 here: https://godbolt.org/z/T4rTqf1Tx. I guess it is not very reliable at the moment.