This is an archive of the discontinued LLVM Phabricator instance.


Differential D127115

[DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Closed, Public

Authored by deadalnix on Jun 6 2022, 7:49 AM.

Details

Reviewers

efriedma
craig.topper
spatel
foad
pengfei
lebedev.ri
RKSimon

Commits

rGa70d5e25f32e: [DAGCombine] Make sure combined nodes are added back to the worklist in…
rGe69fa03ddd85: [DAGCombine] Make sure combined nodes are added back to the worklist in…

Summary

Currently, a node and its users are added back to the worklist in reverse topological order after it is combined. This diff changes that order to be topological. This is part of a larger migration to get the DAGCombiner to process nodes in topological order.
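
As a rough illustration of the ordering change described in the summary, the sketch below models a combined node and its users being pushed onto a worklist. It assumes, purely for illustration, that the worklist is drained from the back; `ToyNode`, `pushNodeThenUsers`, `pushUsersThenNode` and `drainFromBack` are hypothetical names and are not the DAGCombiner API.

```cpp
#include <cassert>
#include <vector>

// Toy model: a combined node and the ids of the nodes that use it.
// All names here are hypothetical; this is not the DAGCombiner API.
struct ToyNode {
  int Id;
  std::vector<int> Users; // ids of nodes that use this node as an operand
};

// Push the node first and its users afterwards.
std::vector<int> pushNodeThenUsers(const ToyNode &N) {
  std::vector<int> Worklist;
  Worklist.push_back(N.Id);
  for (int U : N.Users)
    Worklist.push_back(U);
  return Worklist;
}

// Push the users first and the node last.
std::vector<int> pushUsersThenNode(const ToyNode &N) {
  std::vector<int> Worklist;
  for (int U : N.Users)
    Worklist.push_back(U);
  Worklist.push_back(N.Id);
  return Worklist;
}

// Assumption: the worklist is drained from the back (LIFO).
std::vector<int> drainFromBack(std::vector<int> Worklist) {
  std::vector<int> Visited;
  while (!Worklist.empty()) {
    Visited.push_back(Worklist.back());
    Worklist.pop_back();
  }
  return Visited;
}
```

Under that LIFO assumption, pushing the users before the node means the node is visited before any of its users (operands before users, i.e. topological order), whereas pushing the node first visits the users before the node.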

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,170 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vloxseg.c
60,230 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/non-overloaded::vluxseg.c
60,180 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vloxseg.c
60,260 ms	x64 debian > Clang.CodeGen/RISCV/rvv-intrinsics-autogenerated/policy/overloaded::vluxseg.c
180 ms	x64 debian > LLVM.CodeGen/X86::bitreverse.ll

Full test results: 9 failed.

Event Timeline

There are a very large number of changes, so older changes are hidden.

RKSimon mentioned this in rG556c94e73ed0: [DAG] visitINSERT_VECTOR_ELT - use mergeEltWithShuffle to merge inserted vector…. Jan 22 2023, 9:20 AM

Rebase on top of @RKSimon's element shuffle work.

deadalnix added inline comments. Jan 22 2023, 6:36 PM

llvm/test/CodeGen/X86/v8i1-masks.ll
149	We have a new regression here :(

Harbormaster completed remote builds in B209264: Diff 491218. Jan 22 2023, 8:40 PM

RKSimon added inline comments. Jan 23 2023, 2:03 AM

llvm/test/CodeGen/X86/v8i1-masks.ll
149	Same regression - I just refactored the file recently to add AVX512 coverage

RKSimon mentioned this in rG0c69cb226a57: [X86] Add test coverage for and(ext(and(x, c1)),c2) patterns. Jan 23 2023, 4:08 AM

RKSimon mentioned this in rGd1426cd4848b: [DAG] visitAnd - fold (and (ext (and V, c1)), c2) -> (and (ext V), (and c1…. Jan 23 2023, 6:29 AM

RKSimon added inline comments. Jan 23 2023, 9:28 AM

llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll
3671	We still need to look at this - `vbroadcastf128 {{.*#+}} ymm1 = mem[0,1,0,1]` and `vmovdqa (%rdi), %xmm2` have been split but should be able to share the same load.

RKSimon added inline comments. Jan 23 2023, 9:29 AM

llvm/test/CodeGen/X86/insertelement-var-index.ll
2338	we're inserting the same load that we've already broadcast to the entire zmm?

Rebase on top of the fix for v8i1-masks.ll

Harbormaster completed remote builds in B209570: Diff 491668. Jan 24 2023, 2:54 AM

RKSimon mentioned this in D142536: [X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef elements and don't just copy the repeated mask. Jan 25 2023, 6:35 AM

It might be interesting to make the previous order optional with a flag to llc for testing purposes, just to check which transforms are flaky/dependent on happenstance codes.

In D127115#4081978, @goldstein.w.n wrote:

It might be interesting to make the previous order optional with a flag to llc for testing purposes, just to check which transforms are flaky/dependent on happenstance codes.

This would be useful for the initial stages when this is committed to trunk as I'm guessing we'll be fighting regressions for a while (which is why I reckon getting this committed shortly after the 16.0 cherry picks quieten down would be ideal) - hopefully it'd never get to a release branch though, or used in any test files.

RKSimon mentioned this in rG37bc62ed0a24: [X86] lowerShuffleAsLanePermuteAndRepeatedMask - retain the per-lane undef…. Jan 29 2023, 3:04 AM

One more rebase and upgrading tests:

Several tests were converted to the opaque pointer type.
rG37bc62ed0a24303aa572155009358b8937ab8b4c

Harbormaster completed remote builds in B210761: Diff 493290. Jan 30 2023, 7:32 AM

Matt added a subscriber: Matt. Feb 1 2023, 8:33 PM

RKSimon mentioned this in rG9ffe58dc273f: [PowerPC] aix32-cc-abi-vaarg.ll - improve DAG checks. Feb 4 2023, 3:18 AM

rebase

Herald added a subscriber: jobnoorman. Feb 7 2023, 3:48 PM

remove erroneous autogen note.

Harbormaster completed remote builds in B212487: Diff 495668. Feb 7 2023, 5:20 PM

chfast added inline comments. Feb 8 2023, 12:19 AM

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
2694	Looks like fixed now.

RKSimon mentioned this in rG0b0a38a7a229: [X86] combineX86ShufflesRecursively - don't widen shuffle subvector inputs. Feb 11 2023, 5:23 AM

Rebase

deadalnix added inline comments. Feb 14 2023, 1:42 PM

llvm/test/CodeGen/X86/pr53419.ll
102	This file has now regressed :'(

Harbormaster completed remote builds in B213728: Diff 497430. Feb 14 2023, 2:44 PM

Looks like we're very close to finally getting this in - @kazu @goldstein.w.n do you recognize any of the remaining regressions?

(temporarily commandeering to rebase patch) @deadalnix please take this back when you're about

rebase

RKSimon added inline comments. Mar 17 2023, 5:02 AM

llvm/test/CodeGen/X86/add-and-not.ll
305 ↗	(On Diff #506040)

SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t15: i32 = truncate t2
          t16: i32 = xor t15, Constant:i32<-1>
        t19: i32 = and t16, Constant:i32<1>
      t20: i64 = zero_extend t19
    t6: i64 = add t2, t20
  t9: ch,glue = CopyToReg t0, Register:i64 $rax, t6
  t10: ch = X86ISD::RET_FLAG t9, TargetConstant:i32<0>, Register:i64 $rax, t9:1

llvm/test/CodeGen/X86/dagcombine-select.ll

217

Looks like the truncate means we're now failing to call foldBinOpIntoSelect before:

SelectionDAG has 15 nodes:
  t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t3: i8 = truncate t2
          t4: i1 = truncate t2
        t7: i32 = select t4, Constant:i32<2>, Constant:i32<3>
      t9: i8 = truncate t7
    t10: i32 = shl Constant:i32<1>, t9
  t13: ch,glue = CopyToReg t0, Register:i32 $eax, t10
  t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1

becomes:

SelectionDAG has 14 nodes:
  t0: ch,glue = EntryToken
            t2: i32,ch = CopyFromReg t0, Register:i32 %0
          t23: i8 = truncate t2
        t25: i8 = and t23, Constant:i8<1>
      t22: i8 = xor t25, Constant:i8<3>
    t10: i32 = shl Constant:i32<1>, t22
  t13: ch,glue = CopyToReg t0, Register:i32 $eax, t10
  t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1
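
To make the two dumps easier to compare, the sketch below evaluates them on plain integers. `shiftOfSelect` follows the first dump and `shiftOfXor` the second; `selectOfShifts` is the shape one would get if the shift were folded into the select arms, which is an inference from the dumps and not output quoted in this review. All three function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// First dump: shl 1, (trunc (select (trunc x to i1), 2, 3)).
uint32_t shiftOfSelect(uint32_t X) {
  bool Cond = (X & 1u) != 0;               // i1 = truncate x
  uint32_t Sel = Cond ? 2u : 3u;           // select Cond, 2, 3
  uint8_t Amt = static_cast<uint8_t>(Sel); // i8 = truncate
  return 1u << Amt;
}

// Second dump: shl 1, (xor (and (trunc x to i8), 1), 3).
uint32_t shiftOfXor(uint32_t X) {
  uint8_t Low = static_cast<uint8_t>(static_cast<uint8_t>(X) & 1u);
  uint8_t Amt = static_cast<uint8_t>(Low ^ 3u);
  return 1u << Amt;
}

// Shift folded into the select arms: select Cond, (1 << 2), (1 << 3).
uint32_t selectOfShifts(uint32_t X) {
  return (X & 1u) ? 4u : 8u;
}
```

All three agree for every input, so the difference between the dumps is only in which shape the combiner reaches first, not in the computed value.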

Harbormaster completed remote builds in B220030: Diff 506040. Mar 17 2023, 5:23 AM

address regressions in foldBinOpIntoSelect when handling shift(x, trunc/zext(y)) patterns

shift amount type canonicalization was preventing foldBinOpIntoSelect combines before other select-of-constant combines occurred

Unfortunately I haven't found a good way to test the fix separately from the main patch.

Harbormaster completed remote builds in B220423: Diff 506561. Mar 20 2023, 7:21 AM

RKSimon mentioned this in rGa6a788bdfb39: [DAG] foldBinOpIntoSelect - use FoldConstantArithmetic instead of getNode() +…. Mar 21 2023, 5:59 AM

RKSimon mentioned this in D146121: [DAG] Move lshr narrowing from visitANDLike to SimplifyDemandedBits. Mar 29 2023, 5:51 AM

rebase

Harbormaster completed remote builds in B223663: Diff 510916. Apr 4 2023, 4:20 PM

deadalnix mentioned this in D147821: [DAG] Peek through zext/trunc in haveNoCommonBitsSet. Apr 7 2023, 3:46 PM

@RKSimon You did well to take this over.

llvm/test/CodeGen/X86/add-and-not.ll
305 ↗	(On Diff #510916)	I was able to repro this one standalone in D147821. For some reason, this fix works on 32 bits, but there is still a missing piece for 64 bits.

deadalnix mentioned this in D147827: [DAG] Peek through zext/trunc when matching (or (and X, (not Y)), Y). Apr 7 2023, 4:33 PM

deadalnix added inline comments. Apr 7 2023, 4:53 PM

llvm/test/CodeGen/X86/add-and-not.ll
305 ↗	(On Diff #510916)	D147827 sorts this out.
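
For readers following the add-and-not.ll thread, the sketch below replays the 64-bit dump quoted earlier (`add x, (zext (and (xor (trunc x), -1), 1))`) on plain integers. It checks that the two add operands never share a set bit, which is the property `haveNoCommonBitsSet` is named after, and that the add then equals `x | 1`. The `x | 1` form is an inference from the `(or (and X, (not Y)), Y)` pattern in the D147827 title with `X = 1`; the review does not quote the resulting DAG, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// zext (and (xor (trunc x to i32), -1), 1) to i64, as in the dump.
uint64_t maskedNotLowBit(uint64_t X) {
  uint32_t Trunc = static_cast<uint32_t>(X); // i32 = truncate x
  uint32_t Not = Trunc ^ 0xFFFFFFFFu;        // xor with -1
  uint32_t Bit = Not & 1u;                   // and with 1
  return static_cast<uint64_t>(Bit);         // zero_extend to i64
}

// The add from the dump: add x, (zext ...).
uint64_t addForm(uint64_t X) { return X + maskedNotLowBit(X); }

// True when the two add operands have no set bit in common.
bool operandsShareNoBits(uint64_t X) {
  return (X & maskedNotLowBit(X)) == 0;
}

// With disjoint operands the add is an or, and (1 & ~x) | x is x | 1.
uint64_t orForm(uint64_t X) { return X | 1u; }
```

The point of peeking through the `zext`/`trunc` pair is that the disjointness is only visible once the low bit of the truncated value is related back to `x` itself.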

deadalnix added inline comments. Apr 8 2023, 7:40 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2433–2439	Shouldn't this come in its own patch?

RKSimon added inline comments. Apr 10 2023, 9:23 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2433–2439	Yes, I just hadn't found a good way to test with trunk so far.

Amaury Séchet <deadalnix@gmail.com> mentioned this in rG9041e1fa29a4: [DAG] Peek through zext/trunc in haveNoCommonBitsSet. Apr 11 2023, 4:44 AM

Amaury Séchet <deadalnix@gmail.com> mentioned this in rG91105df3dfeb: [DAG] Peek through zext/trunc when matching (or (and X, (not Y)), Y). Apr 11 2023, 6:48 AM

deadalnix commandeered this revision. Apr 11 2023, 3:40 PM

deadalnix edited reviewers, added: RKSimon; removed: deadalnix.

Rebase

Harbormaster completed remote builds in B224890: Diff 512609. Apr 11 2023, 4:15 PM

Regen LLVM.CodeGen/PowerPC::select_const.ll

Harbormaster completed remote builds in B224943: Diff 512679. Apr 12 2023, 1:44 AM

RKSimon added inline comments. Apr 12 2023, 3:05 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47324 ↗	(On Diff #512679)	I've removed this assertion in rGb20c1ffe8f3e as part of PR60007

deadalnix added inline comments. Apr 12 2023, 4:15 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
47324 ↗	(On Diff #512679)	Will rebase.

Rebase on top of rGb20c1ffe8f3e

Harbormaster completed remote builds in B225112: Diff 512888. Apr 12 2023, 11:19 AM

Rebase. Unfortunately, we have a big regression on RISCV/pr58511.ll

Harbormaster completed remote builds in B227725: Diff 516403. Apr 24 2023, 8:08 AM

RKSimon mentioned this in rG17dd1ad14be7: [X86] lowerShuffleAsElementInsertion - fold to or(vzext_movl(scalar_to_vector…. May 7 2023, 12:58 PM

RKSimon added inline comments. May 7 2023, 12:59 PM

llvm/test/CodeGen/X86/insert-into-constant-vector.ll
28	These should be fixed by rG17dd1ad14be77c722f7c7c1e4fa273c6f170abea

rebase

Harbormaster completed remote builds in B230871: Diff 520699. May 9 2023, 8:27 AM

Fix RISCV's pr58511.ll regression

Herald added a subscriber: arichardson. May 18 2023, 12:11 PM

deadalnix added inline comments. May 18 2023, 12:23 PM

llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
507–508 ↗	(On Diff #523499)	Do we care about these? They replace a load with a materialization via vpbroadcast, and it's not clear to me that is actually worse. It's not clear to me either when one option is picked over the other.

Harbormaster completed remote builds in B232966: Diff 523499. May 18 2023, 1:32 PM

RKSimon added inline comments. May 18 2023, 1:36 PM

llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
507–508 ↗	(On Diff #523499)	Don't worry about these - it's an existing problem with build vector constants

deadalnix added inline comments. May 19 2023, 1:47 PM

llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
508 ↗	(On Diff #523499)	Ok, what outstanding issue do we have left? It seems like this is close to good to go.

Yes, I think this is very close now - please can you get some test-suite numbers to highlight any perf differences?

In D127115#4358686, @RKSimon wrote:

Yes, I think this is very close now - please can you get some test-suite numbers to highlight any perf differences?

I ran the test suite with and without the patch. The perf difference is well within the measurement noise if there is one at all.

craig.topper mentioned this in rG2e6bfa8ed08c: [RISCV] Update pr58511.ll to not use mul by constant that can be converted to…. May 20 2023, 7:08 PM

Rebase on top of rG2e6bfa8ed08c94e2edbe673379c90012efc95abb

Harbormaster completed remote builds in B233446: Diff 524126. May 21 2023, 1:38 PM

Thanks, we're definitely very close now - please can you update the summary to something closer to a commit message

deadalnix edited the summary of this revision. May 23 2023, 4:10 PM

deadalnix edited the summary of this revision.

FYI I ran this patch on an internal corpus of AMDGPU graphics shaders and didn't see anything alarming. There was a slight change in the way fmul+fadd are combined into fma, but I don't think it was consistently worse - just different. Anyway I will work on that, but I do not think it should block this patch.

In D127115#4367640, @foad wrote:

FYI I ran this patch on an internal corpus of AMDGPU graphics shaders and didn't see anything alarming. There was a slight change in the way fmul+fadd are combined into fma, but I don't think it was consistently worse - just different. Anyway I will work on that, but I do not think it should block this patch.

The tests I ran also looked fine. Nothing looked worrying in what I tried.

RKSimon retitled this revision from [RFC][DAGCombine] Make sure combined nodes are added back to the worklist in topological order. to [DAGCombine] Make sure combined nodes are added back to the worklist in topological order. May 24 2023, 6:00 AM

Rebase and fix tests

Harbormaster completed remote builds in B235151: Diff 526435. May 29 2023, 6:51 AM

@deadalnix Please can you rebase again?

FYI I ran this patch on an internal corpus of AMDGPU graphics shaders and didn't see anything alarming. There was a slight change in the way fmul+fadd are combined into fma, but I don't think it was consistently worse - just different. Anyway I will work on that, but I do not think it should block this patch.

I added a test case @fma_vs_output_modifier in test/CodeGen/AMDGPU/dagcombine-fma-fmad.ll so if you rebase now you will see a new regression. D151890 fixes it.

I think the next step for this patch is to decide how to get it committed, I'm expecting there will be a few perf regressions that the current tests miss (or only vaguely hint at), but IMO these are grossly outweighed by the benefits we're seeing.

But I'm worried it will end up being stuck in a reversion/re-commit loop for every report.

Does anyone else have any thoughts on this? @foad @nikic @craig.topper ?

Does anyone else have any thoughts on this? @foad @nikic @craig.topper ?

Not really :/ For my use cases (graphics shaders on AMDGPU) I am not too concerned, but to be safe we could run some fairly extensive performance tests. It would probably take about a week to get results from that.

I think you can just go ahead and land this. At this point it doesn't seem like this is going to cause systematic regressions -- and for individual cases we can fix forward.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2257	For truncates, don't we need to make sure no significant bits of the shamt get truncated?

RKSimon added inline comments. Jun 1 2023, 8:24 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2257	Yes, we should probably drop this change from the patch, accept the regression and address it in a followup

deadalnix added inline comments. Jun 1 2023, 9:52 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2257	Fair enough.
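
The truncation concern raised in this thread can be made concrete with plain integers: a wide shift amount such as 258 becomes 2 once truncated to 8 bits, so a fold that looks through the truncate and reasons about the wide value would be reasoning about a different shift. The sketch below is only an illustration of that arithmetic; `truncateTo` and `truncationIsLossless` are hypothetical helpers and not the check used in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Keep only the low NarrowBits bits of a shift amount, which is what a
// truncate to an integer type of that width does.
// Assumption: NarrowBits is in the range 1..63.
uint64_t truncateTo(uint64_t Amt, unsigned NarrowBits) {
  return Amt & ((uint64_t{1} << NarrowBits) - 1);
}

// True when truncating to NarrowBits drops no set bit of Amt.
bool truncationIsLossless(uint64_t Amt, unsigned NarrowBits) {
  return truncateTo(Amt, NarrowBits) == Amt;
}
```

For example, `truncationIsLossless(3, 8)` holds while `truncationIsLossless(258, 8)` does not, which is the kind of case the question about significant bits is pointing at.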

Rebase and remove the "Peek through any trunc/zext to shift amount type." change.

deadalnix added a child revision: D151916: [DAG] Peek through trunc when combining select into shifts. Jun 1 2023, 11:29 AM

deadalnix added inline comments. Jun 1 2023, 11:29 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2257	I extracted this into D151916

If people are not too worried, I think we should land this, because it's time consuming to keep it up to date as there are a lot of conflicts when rebasing.

Fix CodeGen/PowerPC/select_const.ll

Harbormaster completed remote builds in B235939: Diff 527548. Jun 1 2023, 1:51 PM

LGTM - I think this is ready to land now

This revision is now accepted and ready to land. Jun 4 2023, 11:06 AM

This revision was landed with ongoing or failed builds. Jun 5 2023, 4:09 AM

Closed by commit rGe69fa03ddd85: [DAGCombine] Make sure combined nodes are added back to the worklist in… (authored by Amaury Séchet <deadalnix@gmail.com>).

This revision was automatically updated to reflect the committed changes.

Amaury Séchet <deadalnix@gmail.com> added a commit: rGe69fa03ddd85: [DAGCombine] Make sure combined nodes are added back to the worklist in….

deadalnix added inline comments. Jun 5 2023, 4:12 AM

llvm/test/CodeGen/Hexagon/autohvx/mulh.ll
19	@kparzysz This has been landed, your move now :)

@deadalnix congratulations and good luck! :)

It seems that this broke the AMDGPU OpenMP runtime buildbot (https://lab.llvm.org/buildbot/#/builders/193/builds/32362).
I reverted locally and the build works again.
The issue I'm seeing is that the build of the file kmp_affinity.cpp

I also see build timeouts on my builds

jplehr added a reverting change: rGc9998ec14597: Revert "[DAGCombine] Make sure combined nodes are added back to the worklist in…. Jun 5 2023, 7:57 AM

thanks @jplehr

In D127115#4395664, @jplehr wrote:

It seems that this broke the AMDGPU OpenMP runtime buildbot (https://lab.llvm.org/buildbot/#/builders/193/builds/32362).
I reverted locally and the build works again.
The issue I'm seeing is that the build of the file kmp_affinity.cpp

In a 2-stage build, D127115 caused Clang to get stuck in an infinite loop when compiling some files too.
This applies to a few other files, including llvm/lib/IR/AsmWriter.cpp

Investigate infinite loops in stage2 compilers

This revision is now accepted and ready to land. Jun 5 2023, 2:28 PM

RKSimon requested changes to this revision. Jun 5 2023, 2:28 PM

This revision now requires changes to proceed. Jun 5 2023, 2:28 PM

nikic mentioned this in rGe1aa91b36325: [InstCombine] Use KnownBits::shl() in SimplifyDemandedBits(). Jun 6 2023, 12:59 AM

@deadalnix I've reproduced the stage2 infinite loop on AsmWriter.cpp - currently trying to get bugpoint to reduce it.

define void @_ZN12_GLOBAL__N_114AssemblyWriterC2ERN4llvm21formatted_raw_ostreamERNS1_11SlotTrackerEPKNS1_6ModuleEPNS1_24AssemblyAnnotationWriterEbb() {
entry:
  %__end1.sroa.5.0.copyload = load ptr, ptr poison, align 8
  %__end1.sroa.6.0.copyload = load ptr, ptr null, align 8
  %cmp.i.i.i8.i.i = icmp ne ptr null, %__end1.sroa.6.0.copyload
  %cmp.i.i.i.i9.i.i = icmp ne ptr null, %__end1.sroa.5.0.copyload
  %.not.i = select i1 %cmp.i.i.i8.i.i, i1 true, i1 %cmp.i.i.i.i9.i.i
  %.not.i.fr = freeze i1 %.not.i
  br i1 %.not.i.fr, label %for.cond.us, label %if.end.split

for.cond.us:                                      ; preds = %entry
  unreachable

if.end.split:                                     ; preds = %entry
  ret void
}

We've also identified this issue in an internal codebase built with this patch, the problem is that // X != Y --> (X^Y) in TargetLowering::SimplifySetCC and Transform (brcond (xor x, y)) -> (brcond (setcc, x, y, ne)) in DAGCombiner::rebuildSetCC keep undoing each other without an end:

Combining: t1517: ch = brcond t0, t1520, BasicBlock:ch<if.end.split 0x1ae28ac5070>
Creating new node: t1522: i1 = setcc t1521, Constant:i1<-1>, setne:ch
Creating new node: t1523: ch = brcond t0, t1522, BasicBlock:ch<if.end.split 0x1ae28ac5070>
 ... into: t1523: ch = brcond t0, t1522, BasicBlock:ch<if.end.split 0x1ae28ac5070>

Combining: t1521: i1 = freeze t9

Combining: t1523: ch = brcond t0, t1522, BasicBlock:ch<if.end.split 0x1ae28ac5070>

Combining: t1522: i1 = setcc t1521, Constant:i1<-1>, setne:ch
Creating new node: t1524: i1 = setcc t9, Constant:i1<-1>, setne:ch
Creating new node: t1525: i1 = freeze t1524
 ... into: t1525: i1 = freeze t1524

Combining: t1525: i1 = freeze t1524

Combining: t1524: i1 = setcc t9, Constant:i1<-1>, setne:ch
Creating new node: t1526: i1 = xor t9, Constant:i1<-1>
 ... into: t1526: i1 = xor t9, Constant:i1<-1>

Combining: t1526: i1 = xor t9, Constant:i1<-1>

Combining: t1525: i1 = freeze t1526
Creating new node: t1527: i1 = freeze t9
 ... into: t1526: i1 = xor t1527, Constant:i1<-1>

Combining: t1526: i1 = xor t1527, Constant:i1<-1>

Combining: t1527: i1 = freeze t9

Combining: t1523: ch = brcond t0, t1526, BasicBlock:ch<if.end.split 0x1ae28ac5070> --------> Back to where we started.
Creating new node: t1528: i1 = setcc t1527, Constant:i1<-1>, setne:ch
Creating new node: t1529: ch = brcond t0, t1528, BasicBlock:ch<if.end.split 0x1ae28ac5070>
 ... into: t1529: ch = brcond t0, t1528, BasicBlock:ch<if.end.split 0x1ae28ac5070>
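
The trace above shows two rewrites that each undo the other. A minimal toy model of that situation is sketched below: the branch condition is reduced to one of two shapes, one rule mirrors `X != Y --> (X^Y)` and the other mirrors `(brcond (xor x, y)) -> (brcond (setcc x, y, ne))`. Everything here (`CondShape`, `setccToXor`, `xorToSetcc`, `reachesFixpoint`, the step budget) is hypothetical and only illustrates why the combiner never reaches a fixpoint when both rules fire; it is not how either LLVM function is implemented.

```cpp
#include <cassert>

// Two shapes of the same branch condition. Hypothetical toy model only.
enum class CondShape { SetccNe, Xor };

// Mirrors "X != Y --> (X^Y)": rewrite the compare as an xor.
bool setccToXor(CondShape &C) {
  if (C != CondShape::SetccNe)
    return false;
  C = CondShape::Xor;
  return true;
}

// Mirrors "(brcond (xor x, y)) -> (brcond (setcc x, y, ne))".
bool xorToSetcc(CondShape &C) {
  if (C != CondShape::Xor)
    return false;
  C = CondShape::SetccNe;
  return true;
}

// Apply the enabled rules until nothing changes or the budget runs out.
// Returns true when a fixpoint was reached within MaxSteps rewrites.
bool reachesFixpoint(CondShape C, bool EnableXorToSetcc, unsigned MaxSteps) {
  for (unsigned Step = 0; Step < MaxSteps; ++Step) {
    bool Changed = setccToXor(C);
    if (!Changed && EnableXorToSetcc)
      Changed = xorToSetcc(C);
    if (!Changed)
      return true;
  }
  return false;
}
```

With both rules enabled the toy never settles, while disabling either direction lets it terminate after one rewrite, which is why the thread goes on to ask which of the two forms should be the canonical one.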

In D127115#4399535, @n-omer wrote:

We've also identified this issue in an internal codebase built with this patch, the problem is that // X != Y --> (X^Y) in TargetLowering::SimplifySetCC and Transform (brcond (xor x, y)) -> (brcond (setcc, x, y, ne)) in DAGCombiner::rebuildSetCC keep undoing each other without an end:

That seems relatively easy to fix, the question is, which one do we want?

In D127115#4399236, @RKSimon wrote:

define void @_ZN12_GLOBAL__N_114AssemblyWriterC2ERN4llvm21formatted_raw_ostreamERNS1_11SlotTrackerEPKNS1_6ModuleEPNS1_24AssemblyAnnotationWriterEbb() {
entry:
  %__end1.sroa.5.0.copyload = load ptr, ptr poison, align 8
  %__end1.sroa.6.0.copyload = load ptr, ptr null, align 8
  %cmp.i.i.i8.i.i = icmp ne ptr null, %__end1.sroa.6.0.copyload
  %cmp.i.i.i.i9.i.i = icmp ne ptr null, %__end1.sroa.5.0.copyload
  %.not.i = select i1 %cmp.i.i.i8.i.i, i1 true, i1 %cmp.i.i.i.i9.i.i
  %.not.i.fr = freeze i1 %.not.i
  br i1 %.not.i.fr, label %for.cond.us, label %if.end.split

for.cond.us:                                      ; preds = %entry
  unreachable

if.end.split:                                     ; preds = %entry
  ret void
}

I am failing to see how this generates an infinite loop. llc compiles it just fine. Would you have more details on how to repro?

In D127115#4403083, @deadalnix wrote:

I am failing to see how this generates an infinite loop. llc compiles it just fine. Would you have more details on how to repro?

With this patch applied on top of main@2011ad0cbbf52a6f3b7bf76aa40578d3ff9fd60d I'm able to reproduce the infinite loop with llc.

I have this one, which reproduces the issue and is simpler:

define i64 @foo(i1 %0) {
  br label %2

2:
  %3 = select i1 %0, i1 %0, i1 false
  %4 = freeze i1 %3
  br i1 %4, label %5, label %6

5:
  br label %6

6:
  %7 = phi i64 [ 0, %5 ], [ 1, %2 ]
  ret i64 %7
}

I'll figure out a fix. That doesn't sound complicated.

Amaury Séchet <deadalnix@gmail.com> mentioned this in rGaa5a1eaa38bb: [NFC] Add regression tests for an infinite loop caused by D127115. Jun 8 2023, 2:13 AM

I can confirm D152430 indeed fixes the infinite loop.

Rebase on top of D152430

deadalnix added a parent revision: D152430: [DAG] Peek through freeze when deciding whether we should convert setcc to math or not. Jun 9 2023, 6:07 AM

Harbormaster completed remote builds in B237746: Diff 529937. Jun 9 2023, 7:20 AM

Are you now able to create a stage2 build?

RKSimon mentioned this in D152544: [DAGCombine] Move setcc of freeze fold to brcond. Jun 9 2023, 9:55 AM

@deadalnix Please can you pre-commit the brcond regression test(s) and rebase on D152544 / 5c6ff3a6025570479da5b72fcd02ca93b470683b

RKSimon mentioned this in D152430: [DAG] Peek through freeze when deciding whether we should convert setcc to math or not. Jun 12 2023, 3:33 AM

In D127115#4413265, @RKSimon wrote:

@deadalnix Please can you pre-commit the brcond regression test(s) and rebase on D152544 / 5c6ff3a6025570479da5b72fcd02ca93b470683b

The regression test has been precommitted already in aa5a1eaa38bbcff64e22a0e0662843d119d3d71f

I'm rebasing this one now.

Rebase on top of D152544

As far as I know, the infinite loop problem is solved now, do we have any outstanding issue? Shall we give this another try?

LGTM

This revision is now accepted and ready to land. Jun 12 2023, 6:30 AM

Harbormaster completed remote builds in B238169: Diff 530482. Jun 12 2023, 6:30 AM

rebase

Harbormaster completed remote builds in B238407: Diff 530797. Jun 13 2023, 1:22 AM

Closed by commit rGa70d5e25f32e: [DAGCombine] Make sure combined nodes are added back to the worklist in… (authored by Amaury Séchet <deadalnix@gmail.com>). Jun 13 2023, 2:15 AM

This revision was automatically updated to reflect the committed changes.

Amaury Séchet <deadalnix@gmail.com> added a commit: rGa70d5e25f32e: [DAGCombine] Make sure combined nodes are added back to the worklist in….

deadalnix mentioned this in D152928: [RFC][DAG] Initially add nodes in the worklist in topological order. Jun 14 2023, 8:52 AM

Thanks for working on this and good luck.

I've been fuzzing this change since one day before it landed (for the second time) hoping to find some other compilation hangs. So far so good.

kparzysz added inline comments. Jun 16 2023, 1:01 PM

llvm/test/CodeGen/Hexagon/autohvx/mulh.ll
19	Excellent. Working on it.

Herald added a subscriber: wangpc. Jun 16 2023, 1:01 PM

Large Diff

This large diff affects 113 files.

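Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	8 lines
llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll	10 lines
llvm/test/CodeGen/AMDGPU/dagcombine-setcc-select.ll	8 lines
llvm/test/CodeGen/AMDGPU/ds-alignment.ll	117 lines
llvm/test/CodeGen/AMDGPU/ds_write2.ll	16 lines
llvm/test/CodeGen/AMDGPU/idot4u.ll	10 lines
llvm/test/CodeGen/AMDGPU/idot8s.ll	44 lines
llvm/test/CodeGen/AMDGPU/idot8u.ll	42 lines
llvm/test/CodeGen/AMDGPU/load-local-redundant-copies.ll	14 lines
llvm/test/CodeGen/AMDGPU/store-local.128.ll	383 lines
llvm/test/CodeGen/AMDGPU/store-local.96.ll	211 lines
llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll	3 lines
llvm/test/CodeGen/ARM/addsubcarry-promotion.ll	27 lines
llvm/test/CodeGen/ARM/icmp-shift-opt.ll	9 lines
llvm/test/CodeGen/ARM/reg_sequence.ll	3 lines
llvm/test/CodeGen/Hexagon/autohvx/isel-vpackew.ll	2 lines
llvm/test/CodeGen/Hexagon/autohvx/mulh.ll	40 lines
llvm/test/CodeGen/PowerPC/aix32-cc-abi-vaarg.ll	52 lines
llvm/test/CodeGen/PowerPC/combine-fneg.ll	8 lines
llvm/test/CodeGen/PowerPC/select_const.ll	77 lines
llvm/test/CodeGen/RISCV/mul.ll	18 lines
llvm/test/CodeGen/RISCV/pr58511.ll	16 lines
llvm/test/CodeGen/SystemZ/pr36164.ll	4 lines
llvm/test/CodeGen/Thumb2/mve-vst3.ll	14 lines
llvm/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll	10 lines
llvm/test/CodeGen/X86/2012-08-07-CmpISelBug.ll	8 lines
llvm/test/CodeGen/X86/addcarry.ll	38 lines
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll	128 lines
llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll	20 lines
llvm/test/CodeGen/X86/avx512-mask-op.ll	3 lines
llvm/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll	28 lines
llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll	90 lines
llvm/test/CodeGen/X86/const-shift-of-constmasked.ll	12 lines
llvm/test/CodeGen/X86/dagcombine-cse.ll	41 lines
llvm/test/CodeGen/X86/dagcombine-select.ll	29 lines
llvm/test/CodeGen/X86/field-extract-use-trunc.ll	2 lines
llvm/test/CodeGen/X86/horizontal-sum.ll	149 lines
llvm/test/CodeGen/X86/icmp-shift-opt.ll	6 lines
llvm/test/CodeGen/X86/insert-into-constant-vector.ll	10 lines
llvm/test/CodeGen/X86/insertelement-var-index.ll	77 lines
llvm/test/CodeGen/X86/is_fpclass-fp80.ll	5 lines
llvm/test/CodeGen/X86/isel-blendi-gettargetconstant.ll	7 lines
llvm/test/CodeGen/X86/masked_store.ll	19 lines
llvm/test/CodeGen/X86/movmsk-cmp.ll	36 lines
llvm/test/CodeGen/X86/mulvi32.ll	30 lines
llvm/test/CodeGen/X86/nontemporal-3.ll	208 lines
llvm/test/CodeGen/X86/pmulh.ll	72 lines
llvm/test/CodeGen/X86/popcnt.ll	4 lines
llvm/test/CodeGen/X86/pr53419.ll	45 lines
llvm/test/CodeGen/X86/promote-vec3.ll	6 lines
llvm/test/CodeGen/X86/psubus.ll	56 lines
llvm/test/CodeGen/X86/shift-mask.ll	6 lines
llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll	6 lines
llvm/test/CodeGen/X86/single_elt_vector_memory_operation.ll	11 lines
llvm/test/CodeGen/X86/smax.ll	6 lines
llvm/test/CodeGen/X86/smin.ll	6 lines
llvm/test/CodeGen/X86/umax.ll	6 lines
llvm/test/CodeGen/X86/umin.ll	6 lines
llvm/test/CodeGen/X86/v8i1-masks.ll	8 lines
llvm/test/CodeGen/X86/vector-fshl-256.ll	12 lines
llvm/test/CodeGen/X86/vector-fshl-512.ll	4 lines
llvm/test/CodeGen/X86/vector-fshl-rot-256.ll	10 lines
llvm/test/CodeGen/X86/vector-fshl-rot-512.ll	4 lines
llvm/test/CodeGen/X86/vector-fshr-256.ll	12 lines
llvm/test/CodeGen/X86/vector-fshr-512.ll	4 lines
llvm/test/CodeGen/X86/vector-fshr-rot-256.ll	10 lines
llvm/test/CodeGen/X86/vector-fshr-rot-512.ll	4 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll	62 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll	626 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll	2026 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll	3523 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll	5024 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll	8021 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll	244 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll	32 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll	362 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-4.ll	354 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll	3119 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll	3497 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll	6617 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll	406 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll	32 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-3.ll	162 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-4.ll	2340 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-5.ll	1314 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-6.ll	1087 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll	2036 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-8.ll	1674 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-2.ll	46 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-5.ll	58 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll	48 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll	2497 lines
llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll	559 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll	1312 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-5.ll	503 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll	712 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll	557 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-8.ll	1542 lines
llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll	78 lines
llvm/test/CodeGen/X86/vector-reduce-and-cmp.ll	34 lines
llvm/test/CodeGen/X86/vector-reduce-and.ll	46 lines
llvm/test/CodeGen/X86/vector-reduce-or.ll	46 lines
llvm/test/CodeGen/X86/vector-reduce-xor.ll	46 lines
llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll	246 lines
llvm/test/CodeGen/X86/vector-rotate-256.ll	10 lines
llvm/test/CodeGen/X86/vector-rotate-512.ll	4 lines
llvm/test/CodeGen/X86/vector-shuffle-combining.ll	54 lines
llvm/test/CodeGen/X86/vector-shuffle-concatenation.ll	2 lines
llvm/test/CodeGen/X86/vector-shuffle-sse4a.ll	2 lines
llvm/test/CodeGen/X86/vector-zext.ll	12 lines
llvm/test/CodeGen/X86/widen-load-of-small-alloca-with-zero-upper-half.ll	18 lines
llvm/test/CodeGen/X86/xor.ll	2 lines
llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll	56 lines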

Diff 489859

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


Show First 20 Lines • Show All 1,615 Lines • ▼ Show 20 Lines	if (LegalDAG) {
continue;		continue;
}		}

LLVM_DEBUG(dbgs() << "\nCombining: "; N->dump(&DAG));		LLVM_DEBUG(dbgs() << "\nCombining: "; N->dump(&DAG));

// Add any operands of the new node which have not yet been combined to the		// Add any operands of the new node which have not yet been combined to the
// worklist as well. Because the worklist uniques things already, this		// worklist as well. Because the worklist uniques things already, this
// won't repeatedly process the same operand.		// won't repeatedly process the same operand.
CombinedNodes.insert(N);
for (const SDValue &ChildN : N->op_values())		for (const SDValue &ChildN : N->op_values())
if (!CombinedNodes.count(ChildN.getNode()))		if (!CombinedNodes.count(ChildN.getNode()))
AddToWorklist(ChildN.getNode());		AddToWorklist(ChildN.getNode());

		CombinedNodes.insert(N);
SDValue RV = combine(N);		SDValue RV = combine(N);

if (!RV.getNode())		if (!RV.getNode())
continue;		continue;

++NodesCombined;		++NodesCombined;

// If we get back the same node we passed in, rather than a new node or		// If we get back the same node we passed in, rather than a new node or
Show All 17 Lines	else {
DAG.ReplaceAllUsesWith(N, &RV);		DAG.ReplaceAllUsesWith(N, &RV);
}		}

// Push the new node and any users onto the worklist. Omit this if the		// Push the new node and any users onto the worklist. Omit this if the
// new node is the EntryToken (e.g. if a store managed to get optimized		// new node is the EntryToken (e.g. if a store managed to get optimized
// out), because re-visiting the EntryToken and its users will not uncover		// out), because re-visiting the EntryToken and its users will not uncover
// any additional opportunities, but there may be a large number of such		// any additional opportunities, but there may be a large number of such
// users, potentially causing compile time explosion.		// users, potentially causing compile time explosion.
if (RV.getOpcode() != ISD::EntryToken) {		if (RV.getOpcode() != ISD::EntryToken)
AddToWorklist(RV.getNode());		AddToWorklistWithUsers(RV.getNode());
AddUsersToWorklist(RV.getNode());
}

// Finally, if the node is now dead, remove it from the graph. The node		// Finally, if the node is now dead, remove it from the graph. The node
// may not be dead if the replacement process recursively simplified to		// may not be dead if the replacement process recursively simplified to
// something else needing this node. This will also take care of adding any		// something else needing this node. This will also take care of adding any
// operands which have lost a user to the worklist.		// operands which have lost a user to the worklist.
recursivelyDeleteUnusedNodes(N);		recursivelyDeleteUnusedNodes(N);
}		}

▲ Show 20 Lines • Show All 577 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::foldBinOpIntoSelect(SDNode *BO) {
SDValue Sel = BO->getOperand(0);		SDValue Sel = BO->getOperand(0);
if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {		if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse()) {
SelOpNo = 1;		SelOpNo = 1;
Sel = BO->getOperand(1);		Sel = BO->getOperand(1);
}		}

if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())		if (Sel.getOpcode() != ISD::SELECT || !Sel.hasOneUse())
return SDValue();		return SDValue();

		nikic: For truncates, don't we need to make sure no significant bits of the shamt get truncated?
		RKSimon: Yes, we should probably drop this change from the patch, accept the regression and address it in a followup.
		deadalnix (Author): Fair enough.
		deadalnix (Author): I extracted this into D151916.
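		The truncation concern raised above can be illustrated in isolation. The following is only a sketch on plain integers, not the code from this patch or from D151916; the helper name `shiftAmountSurvivesTruncation` is hypothetical. It checks whether narrowing a constant shift amount would drop any set bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true if truncating `amt`
// to `narrowBits` bits leaves its value unchanged, i.e. no significant
// bits of the shift amount are lost.
bool shiftAmountSurvivesTruncation(uint64_t amt, unsigned narrowBits) {
  assert(narrowBits >= 1 && narrowBits <= 64);
  if (narrowBits == 64)
    return true; // Nothing is dropped when the width does not shrink.
  uint64_t mask = (uint64_t{1} << narrowBits) - 1;
  return (amt & mask) == amt;
}
```

		For example, a shift amount of 256 narrowed to 8 bits becomes 0, so a fold that truncates it without such a check would change the shift being performed.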
SDValue CT = Sel.getOperand(1);		SDValue CT = Sel.getOperand(1);
if (!isConstantOrConstantVector(CT, true) &&		if (!isConstantOrConstantVector(CT, true) &&
!DAG.isConstantFPBuildVectorOrConstantFP(CT))		!DAG.isConstantFPBuildVectorOrConstantFP(CT))
return SDValue();		return SDValue();

SDValue CF = Sel.getOperand(2);		SDValue CF = Sel.getOperand(2);
if (!isConstantOrConstantVector(CF, true) &&		if (!isConstantOrConstantVector(CF, true) &&
!DAG.isConstantFPBuildVectorOrConstantFP(CF))		!DAG.isConstantFPBuildVectorOrConstantFP(CF))
▲ Show 20 Lines • Show All 159 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitADDLike(SDNode *N) {
if (N0.isUndef())		if (N0.isUndef())
return N0;		return N0;
if (N1.isUndef())		if (N1.isUndef())
return N1;		return N1;

// fold (add c1, c2) -> c1+c2		// fold (add c1, c2) -> c1+c2
if (SDValue C = DAG.FoldConstantArithmetic(ISD::ADD, DL, VT, {N0, N1}))		if (SDValue C = DAG.FoldConstantArithmetic(ISD::ADD, DL, VT, {N0, N1}))
return C;		return C;

// canonicalize constant to RHS		// canonicalize constant to RHS
if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&		if (DAG.isConstantIntBuildVectorOrConstantInt(N0) &&
!DAG.isConstantIntBuildVectorOrConstantInt(N1))		!DAG.isConstantIntBuildVectorOrConstantInt(N1))
return DAG.getNode(ISD::ADD, DL, VT, N1, N0);		return DAG.getNode(ISD::ADD, DL, VT, N1, N0);

// fold vector ops		// fold vector ops
		deadalnix (Author): Shouldn't this come in its own patch?
		RKSimon: Yes, I just hadn't found a good way to test with trunk so far.
if (VT.isVector()) {		if (VT.isVector()) {
if (SDValue FoldedVOp = SimplifyVBinOp(N, DL))		if (SDValue FoldedVOp = SimplifyVBinOp(N, DL))
return FoldedVOp;		return FoldedVOp;

// fold (add x, 0) -> x, vector edition		// fold (add x, 0) -> x, vector edition
if (ISD::isConstantSplatVectorAllZeros(N1.getNode()))		if (ISD::isConstantSplatVectorAllZeros(N1.getNode()))
return N0;		return N0;
}		}
▲ Show 20 Lines • Show All 23,854 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll

llvm/test/CodeGen/AMDGPU/dagcombine-setcc-select.ll

llvm/test/CodeGen/AMDGPU/ds-alignment.ll

llvm/test/CodeGen/AMDGPU/ds_write2.ll

llvm/test/CodeGen/AMDGPU/idot4u.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/load-local-redundant-copies.ll

llvm/test/CodeGen/AMDGPU/store-local.128.ll

llvm/test/CodeGen/AMDGPU/store-local.96.ll

llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll

	Show First 20 Lines • Show All 228 Lines • ▼ Show 20 Lines
	; VI-LABEL: widen_v2i8_constant_load:			; VI-LABEL: widen_v2i8_constant_load:
	; VI: ; %bb.0:			; VI: ; %bb.0:
	; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24			; VI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x24
	; VI-NEXT: v_mov_b32_e32 v0, 44			; VI-NEXT: v_mov_b32_e32 v0, 44
	; VI-NEXT: v_mov_b32_e32 v1, 3			; VI-NEXT: v_mov_b32_e32 v1, 3
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_load_dword s0, s[0:1], 0x0			; VI-NEXT: s_load_dword s0, s[0:1], 0x0
	; VI-NEXT: s_waitcnt lgkmcnt(0)			; VI-NEXT: s_waitcnt lgkmcnt(0)
	; VI-NEXT: s_add_i32 s1, s0, 12			; VI-NEXT: s_and_b32 s1, s0, 0xffff
				foad: Given this DAG:
				    SelectionDAG has 34 nodes:
				    t0: ch,glue = EntryToken
				    t94: i64,ch = CopyFromReg t0, Register:i64 %1
				    t10: i64 = AssertAlign<16> t94
				    t11: i64 = add nuw t10, Constant:i64<36>
				    t91: v2i32,ch = load<(dereferenceable invariant load (s64) from %ir.arg.kernarg.offset, align 4, addrspace 4)> t0, t11, undef:i64
				    t92: i64 = bitcast t91
				    t99: i32,ch = load<(invariant load (s32) from %ir.arg.load, addrspace 4)> t0, t92, undef:i64
				    t101: i32 = and t99, Constant:i32<65535>
				    t102: i16 = truncate t101
				    t49: i32 = any_extend t102
				    t79: i32 = add t49, Constant:i32<12>
				    t83: i32 = or t79, Constant:i32<4>
				    t98: i16 = truncate t83
				    t71: i16 = and t98, Constant:i16<255>
				    t38: i16 = srl t102, Constant:i16<8>
				    t46: i32 = any_extend t38
				    t81: i32 = add t46, Constant:i32<44>
				    t84: i32 = or t81, Constant:i32<3>
				    t95: i16 = truncate t84
				    t61: i16 = shl t95, Constant:i16<8>
				    t62: i16 = or t71, t61
				    t34: ch = store<(store (s16) into `ptr addrspace(1) null`, addrspace 1)> t0, t62, Constant:i64<0>, undef:i64
				    t28: ch = ENDPGM t34
				If the first thing we do is call simplifyDemandedBits on t101, then it will be removed (replaced with t99) since the upper 16 bits are not demanded. But if the first thing we do is combine t49 then it will be replaced with t101 (since any_extend of a truncate is a no-op), losing the fact that we didn't care about the upper 16 bits, and then t101 can no longer be removed.
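				The claim that the AND in t101 is removable when only the low 16 bits are demanded can be checked on plain integers. This is a simplified sketch, not LLVM code: it models only the t101 -> t79 -> t83 -> t98 chain from the dump, treats the any_extend as passing the value through, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Models and (t101), add 12 (t79), or 4 (t83), truncate to i16 (t98).
uint16_t withMask(uint32_t loaded) {
  uint32_t masked = loaded & 0xffffu;
  uint32_t sum = masked + 12u;
  uint32_t ored = sum | 4u;
  return static_cast<uint16_t>(ored);
}

// The same chain with the AND removed, as demanded bits would allow.
uint16_t withoutMask(uint32_t loaded) {
  uint32_t sum = loaded + 12u;
  uint32_t ored = sum | 4u;
  return static_cast<uint16_t>(ored);
}

// The low 16 bits of an add or an or depend only on the low 16 bits of the
// inputs, so both chains agree after the final truncate.
bool maskIsRedundant(uint32_t loaded) {
  return withMask(loaded) == withoutMask(loaded);
}

// Spot-checks a few inputs, including ones with high bits set.
void checkSamples() {
  assert(maskIsRedundant(0u));
  assert(maskIsRedundant(0x0001fff8u));
  assert(maskIsRedundant(0xffffffffu));
}
```

				The updated VI check lines in this file keep the `s_and_b32 ... 0xffff`, which is the missed simplification being discussed.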
				deadalnix (Author): Funny enough, this is exactly why we want to move the DAG to be fully processed in topological order. But as we enforce topological processing on some part of the DAG, it is always possible that in other parts where it is implementation defined, it stops being done in that order. Eventually, all of it will be done in this order.
				On that one specifically, t49 eventually sinks to `t98: i16 = truncate t83`, so simplifyDemandedBits still has the information it needs in the DAG. Have you looked at why it isn't finding it? Is it because it's simply too deep, and going that deep would be too costly?
				On a side note, because order is implementation defined, there are likely instances of similar patterns in the wild where this doesn't get optimized.
				foad: If I understand correctly:
				1. we visit t49 and combine it into t101
				2. we visit t79 and do nothing
				3. because that did nothing, we do not revisit t83 and t98
				4. if we did revisit t98 I think demanded bits would allow us to remove the AND in t101
				deadalnix (Author): I think so, unless it is too deep.
				foad: Is this a known deficiency in the DAGCombiner algorithm? I guess when we modify a node we add its immediate users to the worklist, but not all of its users-of-users and so on. Are there ways to work around this?
				deadalnix (Author): Right now, the order in which nodes are processed is implementation defined. This is a limitation of the current DAGCombiner. This patch is part of an effort to make the order topological.
				foad: I understand that. Maybe I'm not phrasing my question very well. Here's my understanding of the current behaviour even with your patch applied. We start off with:
				    t49: i32 = any_extend t102
				    t79: i32 = add t49, Constant:i32<12>
				    t83: i32 = or t79, Constant:i32<4>
				    t98: i16 = truncate t83
				We process t49, and we are able to simplify it, so we add its user t79 to the worklist. Then we process t79 but are not able to simplify it, so we do not add t83 (or t98) to the worklist.
				But there is a problem here: the fact that we simplify t49 means that we could now make some progress if we tried to combine t98, because that would call SimplifyDemandedBits which can peek several levels "down" into the DAG, and simplify t102 or t101.
				So there's a missed opportunity here. The fact that we simplified t49 unlocks a potential further simplification three levels "up" in the DAG, but we only ever add immediate users (one level "up") to the worklist when we simplify something.
				deadalnix (Author): I'm not sure where we are talking past each other, so let me step back and explain how DAGCombine works and what we want to change about it.
				First, the DAGCombiner adds all the nodes in the DAG to its worklist, in an implementation-defined order. The actual order depends on the specific operations performed on the DAG before it reaches the combiner. In your example, that would include both t49 and t98.
				Then it goes over all nodes and tries to combine them. When a node is combined, it re-adds the result of the combine to the worklist, as well as its users.
				In this case, the combiner visits t98 at some point, then t49, which it combines into t101, but then it never revisits t98. Visiting the DAG in topological order, you'd combine t49 before going over t98.
				This diff does part of the job, but clearly isn't the end of the story. However, it's one of the most disruptive steps.
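				The ordering effect described in this thread can be reproduced with a toy worklist. This is a deliberately simplified model, not LLVM's DAGCombiner: `ToyDAG`, `runWorklist`, and the two hard-coded "combines" are assumptions made for illustration. The node named t49 simplifies when visited, and t98 can simplify only if t49 was already simplified (standing in for SimplifyDemandedBits peeking down the chain). After a successful combine, only immediate users are re-added.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only. users[n] lists the immediate users of node n.
struct ToyDAG {
  std::map<std::string, std::vector<std::string>> users;
};

// Processes nodes in the given initial order and returns true if the
// "deep" combine on t98 fired at some point.
bool runWorklist(const ToyDAG &dag,
                 const std::vector<std::string> &initialOrder) {
  std::deque<std::string> worklist(initialOrder.begin(), initialOrder.end());
  std::set<std::string> inWorklist(initialOrder.begin(), initialOrder.end());
  bool t49Simplified = false;
  bool t98Simplified = false;

  while (!worklist.empty()) {
    std::string n = worklist.front();
    worklist.pop_front();
    inWorklist.erase(n);

    bool changed = false;
    if (n == "t49" && !t49Simplified) {
      t49Simplified = changed = true;
    } else if (n == "t98" && t49Simplified && !t98Simplified) {
      t98Simplified = changed = true;
    }
    if (!changed)
      continue; // Unchanged nodes do not re-queue their users.

    // Re-add immediate users only (one level "up"), skipping queued ones.
    auto it = dag.users.find(n);
    if (it == dag.users.end())
      continue;
    for (const std::string &u : it->second)
      if (inWorklist.insert(u).second)
        worklist.push_back(u);
  }
  return t98Simplified;
}

// The t49 -> t79 -> t83 -> t98 chain from the thread.
ToyDAG makeThreadDAG() {
  ToyDAG dag;
  dag.users["t49"] = {"t79"};
  dag.users["t79"] = {"t83"};
  dag.users["t83"] = {"t98"};
  return dag;
}

// Compares a users-first order with an operands-first (topological) order.
void checkBothOrders() {
  ToyDAG dag = makeThreadDAG();
  assert(!runWorklist(dag, {"t98", "t83", "t79", "t49"})); // missed
  assert(runWorklist(dag, {"t49", "t79", "t83", "t98"}));  // found
}
```

				In the users-first order, t98 is visited before t49 changes and is never queued again, because t79 does not change and so does not re-queue t83 or t98. In the operands-first order, t49 has already been simplified by the time t98 is reached.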
	; VI-NEXT: v_mov_b32_e32 v2, s0			; VI-NEXT: v_mov_b32_e32 v2, s0
				; VI-NEXT: s_add_i32 s1, s1, 12
	; VI-NEXT: v_add_u32_sdwa v0, vcc, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_1			; VI-NEXT: v_add_u32_sdwa v0, vcc, v0, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:BYTE_1
	; VI-NEXT: s_or_b32 s0, s1, 4			; VI-NEXT: s_or_b32 s0, s1, 4
	; VI-NEXT: v_or_b32_sdwa v2, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v2, v0, v1 dst_sel:BYTE_1 dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:DWORD
	; VI-NEXT: v_mov_b32_e32 v3, s0			; VI-NEXT: v_mov_b32_e32 v3, s0
	; VI-NEXT: v_mov_b32_e32 v0, 0			; VI-NEXT: v_mov_b32_e32 v0, 0
	; VI-NEXT: v_mov_b32_e32 v1, 0			; VI-NEXT: v_mov_b32_e32 v1, 0
	; VI-NEXT: v_or_b32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD			; VI-NEXT: v_or_b32_sdwa v2, v3, v2 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
	; VI-NEXT: flat_store_short v[0:1], v2			; VI-NEXT: flat_store_short v[0:1], v2
	▲ Show 20 Lines • Show All 247 Lines • Show Last 20 Lines

llvm/test/CodeGen/ARM/addsubcarry-promotion.ll

llvm/test/CodeGen/ARM/icmp-shift-opt.ll

llvm/test/CodeGen/ARM/reg_sequence.ll

llvm/test/CodeGen/Hexagon/autohvx/isel-vpackew.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=hexagon < %s | FileCheck %s			; RUN: llc -march=hexagon < %s | FileCheck %s

	define void @f0(ptr %a0, ptr %a1, ptr %a2) #0 {			define void @f0(ptr %a0, ptr %a1, ptr %a2) #0 {
	; CHECK-LABEL: f0:			; CHECK-LABEL: f0:
	; CHECK: // %bb.0: // %b0			; CHECK: // %bb.0: // %b0
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: r7 = #-4			; CHECK-NEXT: r7 = #124
				kparzysz: This is fine.
	; CHECK-NEXT: v0 = vmem(r0+#0)			; CHECK-NEXT: v0 = vmem(r0+#0)
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: v1 = vmem(r1+#0)			; CHECK-NEXT: v1 = vmem(r1+#0)
	; CHECK-NEXT: }			; CHECK-NEXT: }
	; CHECK-NEXT: {			; CHECK-NEXT: {
	; CHECK-NEXT: v1:0.w = vmpy(v0.h,v1.h)			; CHECK-NEXT: v1:0.w = vmpy(v0.h,v1.h)
	; CHECK-NEXT: }			; CHECK-NEXT: }
	Show All 36 Lines

llvm/test/CodeGen/Hexagon/autohvx/mulh.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=hexagon -mattr=+hvxv60,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V60 %s			; RUN: llc -march=hexagon -mattr=+hvxv60,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V60 %s
	; RUN: llc -march=hexagon -mattr=+hvxv65,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V65 %s			; RUN: llc -march=hexagon -mattr=+hvxv65,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V65 %s
	; RUN: llc -march=hexagon -mattr=+hvxv69,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V69 %s			; RUN: llc -march=hexagon -mattr=+hvxv69,+hvx-length128b,-packets < %s | FileCheck --check-prefix=V69 %s

	define <64 x i16> @mulhs16(<64 x i16> %a0, <64 x i16> %a1) #0 {			define <64 x i16> @mulhs16(<64 x i16> %a0, <64 x i16> %a1) #0 {
	; V60-LABEL: mulhs16:			; V60-LABEL: mulhs16:
	; V60: // %bb.0:			; V60: // %bb.0:
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: v1:0.w = vmpy(v1.h,v0.h)			; V60-NEXT: v1:0.w = vmpy(v1.h,v0.h)
	; V60-NEXT: }			; V60-NEXT: }
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: v0.h = vshuffo(v1.h,v0.h)			; V60-NEXT: r7 = #124
				; V60-NEXT: }
				; V60-NEXT: {
				; V60-NEXT: v1:0 = vshuff(v1,v0,r7)
				; V60-NEXT: }
				; V60-NEXT: {
				; V60-NEXT: v0.h = vpacko(v1.w,v0.w)
				deadalnix (Author): How do we get some Hexagon expert to look at this? @MaskRay ? @kparzysz ?
				kparzysz: This isn't good. I'll take a look.
				deadalnix (Author): Thanks.
				kparzysz: This will be easy to fix. Please ping me once this patch is merged, and I'll take care of it. For the purpose of this patch, this is not an issue.
				deadalnix (Author): OK, thanks for looking into this.
				deadalnix (Author): @kparzysz This has been landed, your move now :)
				kparzysz: Excellent. Working on it.
	; V60-NEXT: }			; V60-NEXT: }
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: jumpr r31			; V60-NEXT: jumpr r31
	; V60-NEXT: }			; V60-NEXT: }
	;			;
	; V65-LABEL: mulhs16:			; V65-LABEL: mulhs16:
	; V65: // %bb.0:			; V65: // %bb.0:
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: v1:0.w = vmpy(v1.h,v0.h)			; V65-NEXT: v1:0.w = vmpy(v1.h,v0.h)
	; V65-NEXT: }			; V65-NEXT: }
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: v0.h = vshuffo(v1.h,v0.h)			; V65-NEXT: r7 = #124
				; V65-NEXT: }
				; V65-NEXT: {
				; V65-NEXT: v1:0 = vshuff(v1,v0,r7)
				; V65-NEXT: }
				; V65-NEXT: {
				; V65-NEXT: v0.h = vpacko(v1.w,v0.w)
	; V65-NEXT: }			; V65-NEXT: }
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: jumpr r31			; V65-NEXT: jumpr r31
	; V65-NEXT: }			; V65-NEXT: }
	;			;
	; V69-LABEL: mulhs16:			; V69-LABEL: mulhs16:
	; V69: // %bb.0:			; V69: // %bb.0:
	; V69-NEXT: {			; V69-NEXT: {
	; V69-NEXT: v1:0.w = vmpy(v1.h,v0.h)			; V69-NEXT: v1:0.w = vmpy(v1.h,v0.h)
	; V69-NEXT: }			; V69-NEXT: }
	; V69-NEXT: {			; V69-NEXT: {
	; V69-NEXT: v0.h = vshuffo(v1.h,v0.h)			; V69-NEXT: r7 = #124
				; V69-NEXT: }
				; V69-NEXT: {
				; V69-NEXT: v1:0 = vshuff(v1,v0,r7)
				; V69-NEXT: }
				; V69-NEXT: {
				; V69-NEXT: v0.h = vpacko(v1.w,v0.w)
	; V69-NEXT: }			; V69-NEXT: }
	; V69-NEXT: {			; V69-NEXT: {
	; V69-NEXT: jumpr r31			; V69-NEXT: jumpr r31
	; V69-NEXT: }			; V69-NEXT: }
	%v0 = sext <64 x i16> %a0 to <64 x i32>			%v0 = sext <64 x i16> %a0 to <64 x i32>
	%v1 = sext <64 x i16> %a1 to <64 x i32>			%v1 = sext <64 x i16> %a1 to <64 x i32>
	%v2 = mul <64 x i32> %v0, %v1			%v2 = mul <64 x i32> %v0, %v1
	%v3 = lshr <64 x i32> %v2, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>			%v3 = lshr <64 x i32> %v2, <i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16, i32 16>
	%v4 = trunc <64 x i32> %v3 to <64 x i16>			%v4 = trunc <64 x i32> %v3 to <64 x i16>
	ret <64 x i16> %v4			ret <64 x i16> %v4
	}			}

	define <64 x i16> @mulhu16(<64 x i16> %a0, <64 x i16> %a1) #0 {			define <64 x i16> @mulhu16(<64 x i16> %a0, <64 x i16> %a1) #0 {
	; V60-LABEL: mulhu16:			; V60-LABEL: mulhu16:
	; V60: // %bb.0:			; V60: // %bb.0:
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: v1:0.uw = vmpy(v1.uh,v0.uh)			; V60-NEXT: v1:0.uw = vmpy(v1.uh,v0.uh)
	; V60-NEXT: }			; V60-NEXT: }
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: v0.h = vshuffo(v1.h,v0.h)			; V60-NEXT: r7 = #124
				; V60-NEXT: }
				; V60-NEXT: {
				; V60-NEXT: v1:0 = vshuff(v1,v0,r7)
				; V60-NEXT: }
				; V60-NEXT: {
				; V60-NEXT: v0.h = vpacko(v1.w,v0.w)
	; V60-NEXT: }			; V60-NEXT: }
	; V60-NEXT: {			; V60-NEXT: {
	; V60-NEXT: jumpr r31			; V60-NEXT: jumpr r31
	; V60-NEXT: }			; V60-NEXT: }
	;			;
	; V65-LABEL: mulhu16:			; V65-LABEL: mulhu16:
	; V65: // %bb.0:			; V65: // %bb.0:
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: v1:0.uw = vmpy(v1.uh,v0.uh)			; V65-NEXT: v1:0.uw = vmpy(v1.uh,v0.uh)
	; V65-NEXT: }			; V65-NEXT: }
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: v0.h = vshuffo(v1.h,v0.h)			; V65-NEXT: r7 = #124
				; V65-NEXT: }
				; V65-NEXT: {
				; V65-NEXT: v1:0 = vshuff(v1,v0,r7)
				; V65-NEXT: }
				; V65-NEXT: {
				; V65-NEXT: v0.h = vpacko(v1.w,v0.w)
	; V65-NEXT: }			; V65-NEXT: }
	; V65-NEXT: {			; V65-NEXT: {
	; V65-NEXT: jumpr r31			; V65-NEXT: jumpr r31
	; V65-NEXT: }			; V65-NEXT: }
	;			;
	; V69-LABEL: mulhu16:			; V69-LABEL: mulhu16:
	; V69: // %bb.0:			; V69: // %bb.0:
	; V69-NEXT: {			; V69-NEXT: {
	▲ Show 20 Lines • Show All 195 Lines • Show Last 20 Lines

llvm/test/CodeGen/PowerPC/aix32-cc-abi-vaarg.ll

	Show First 20 Lines • Show All 253 Lines • ▼ Show 20 Lines

	; 32BIT-LABEL: stack:			; 32BIT-LABEL: stack:
	; 32BIT-DAG: - { id: 0, name: arg1, type: default, offset: 0, size: 4			; 32BIT-DAG: - { id: 0, name: arg1, type: default, offset: 0, size: 4
	; 32BIT-DAG: - { id: 1, name: arg2, type: default, offset: 0, size: 4			; 32BIT-DAG: - { id: 1, name: arg2, type: default, offset: 0, size: 4

	; 32BIT-LABEL: body: |			; 32BIT-LABEL: body: |
	; 32BIT-DAG: liveins: $f1, $r5, $r6, $r7, $r8, $r9, $r10			; 32BIT-DAG: liveins: $f1, $r5, $r6, $r7, $r8, $r9, $r10
	; 32BIT-DAG: renamable $r3 = ADDI %fixed-stack.0, 0			; 32BIT-DAG: renamable $r3 = ADDI %fixed-stack.0, 0
				; 32BIT-DAG: STW killed renamable $r7, 8, %fixed-stack.0 :: (store (s32), align 8)
	; 32BIT-DAG: STW renamable $r5, 0, %fixed-stack.0 :: (store (s32) into %fixed-stack.0, align 16)			; 32BIT-DAG: STW renamable $r5, 0, %fixed-stack.0 :: (store (s32) into %fixed-stack.0, align 16)
	; 32BIT-DAG: STW renamable $r6, 4, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 4)			; 32BIT-DAG: STW renamable $r6, 4, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 4)
	; 32BIT-DAG: STW killed renamable $r7, 8, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 8, align 8)
	; 32BIT-DAG: STW killed renamable $r8, 12, %fixed-stack.0 :: (store (s32))			; 32BIT-DAG: STW killed renamable $r8, 12, %fixed-stack.0 :: (store (s32))
	; 32BIT-DAG: STW killed renamable $r9, 16, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 16, align 16)			; 32BIT-DAG: STW killed renamable $r9, 16, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 16, align 16)
	; 32BIT-DAG: STW killed renamable $r10, 20, %fixed-stack.0 :: (store (s32))			; 32BIT-DAG: STW killed renamable $r10, 20, %fixed-stack.0 :: (store (s32))
	; 32BIT-DAG: STW renamable $r3, 0, %stack.0.arg1 :: (store (s32) into %ir.0)			; 32BIT-DAG: STW renamable $r3, 0, %stack.0.arg1 :: (store (s32) into %ir.0)
	; 32BIT-DAG: STW killed renamable $r3, 0, %stack.1.arg2 :: (store (s32) into %ir.1)			; 32BIT-DAG: STW killed renamable $r3, 0, %stack.1.arg2 :: (store (s32) into %ir.1)
				; 32BIT-DAG: STW renamable $r5, 0, %stack.2 :: (store (s32) into %stack.2, align 8)
				; 32BIT-DAG: STW renamable $r6, 4, %stack.2 :: (store (s32) into %stack.2 + 4)
				; 32BIT-DAG: renamable $f0 = LFD 0, %stack.2 :: (load (s64) from %stack.2)
				; 32BIT-DAG: STW killed renamable $r5, 0, %stack.3 :: (store (s32) into %stack.3, align 8)
				; 32BIT-DAG: STW killed renamable $r6, 4, %stack.3 :: (store (s32) into %stack.3 + 4)
				; 32BIT-DAG: renamable $f2 = LFD 0, %stack.3 :: (load (s64) from %stack.3)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
				; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f2, renamable $f2, implicit $rm
				; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
	; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1			; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1
				RKSimon:
				    ; 32BIT-LABEL: body: |
				    ; 32BIT-DAG: liveins: $f1, $r5, $r6, $r7, $r8, $r9, $r10
				    ; 32BIT-DAG: renamable $r3 = ADDI %fixed-stack.0, 0
				    ; 32BIT-DAG: STW killed renamable $r7, 8, %fixed-stack.0 :: (store (s32), align 8)
				    ; 32BIT-DAG: STW renamable $r5, 0, %fixed-stack.0 :: (store (s32) into %fixed-stack.0, align 16)
				    ; 32BIT-DAG: STW renamable $r6, 4, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 4)
				    ; 32BIT-DAG: STW killed renamable $r8, 12, %fixed-stack.0 :: (store (s32))
				    ; 32BIT-DAG: STW killed renamable $r9, 16, %fixed-stack.0 :: (store (s32) into %fixed-stack.0 + 16, align 16)
				    ; 32BIT-DAG: STW killed renamable $r10, 20, %fixed-stack.0 :: (store (s32))
				    ; 32BIT-DAG: STW renamable $r3, 0, %stack.0.arg1 :: (store (s32) into %ir.0)
				    ; 32BIT-DAG: STW killed renamable $r3, 0, %stack.1.arg2 :: (store (s32) into %ir.1)
				    ; 32BIT-DAG: STW renamable $r5, 0, %stack.2 :: (store (s32) into %stack.2, align 8)
				    ; 32BIT-DAG: STW renamable $r6, 4, %stack.2 :: (store (s32) into %stack.2 + 4)
				    ; 32BIT-DAG: renamable $f0 = LFD 0, %stack.2 :: (load (s64) from %stack.2)
				    ; 32BIT-DAG: STW killed renamable $r5, 0, %stack.3 :: (store (s32) into %stack.3, align 8)
				    ; 32BIT-DAG: STW killed renamable $r6, 4, %stack.3 :: (store (s32) into %stack.3 + 4)
				    ; 32BIT-DAG: renamable $f2 = LFD 0, %stack.3 :: (load (s64) from %stack.3)
				    ; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
				    ; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f2, renamable $f2, implicit $rm
				    ; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
				    ; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1

	define double @double_stack_va_arg(double %one, double %two, double %three, double %four, double %five, double %six, double %seven, double %eight, double %nine, double %ten, double %eleven, double %twelve, double %thirteen, ...) local_unnamed_addr {			define double @double_stack_va_arg(double %one, double %two, double %three, double %four, double %five, double %six, double %seven, double %eight, double %nine, double %ten, double %eleven, double %twelve, double %thirteen, ...) local_unnamed_addr {
	; ASM32-LABEL: double_stack_va_arg:			; ASM32-LABEL: double_stack_va_arg:
	; ASM32: # %bb.0: # %entry			; ASM32: # %bb.0: # %entry
	; ASM32-NEXT: fadd 0, 1, 2			; ASM32-NEXT: fadd 0, 1, 2
	; ASM32-NEXT: addi 4, 1, 128			; ASM32-NEXT: addi 3, 1, 128
	; ASM32-NEXT: lwz 3, 132(1)			; ASM32-NEXT: lwz 4, 132(1)
	; ASM32-NEXT: fadd 0, 0, 3			; ASM32-NEXT: fadd 0, 0, 3
	; ASM32-NEXT: stw 4, -4(1)			; ASM32-NEXT: stw 3, -4(1)
	; ASM32-NEXT: fadd 0, 0, 4			; ASM32-NEXT: fadd 0, 0, 4
	; ASM32-NEXT: lwz 4, 128(1)			; ASM32-NEXT: lwz 3, 128(1)
	; ASM32-NEXT: fadd 0, 0, 5			; ASM32-NEXT: fadd 0, 0, 5
	; ASM32-NEXT: stw 3, -12(1)			; ASM32-NEXT: stw 3, -16(1)
	; ASM32-NEXT: fadd 0, 0, 6			; ASM32-NEXT: fadd 0, 0, 6
	; ASM32-NEXT: stw 4, -16(1)			; ASM32-NEXT: stw 4, -12(1)
	; ASM32-NEXT: fadd 0, 0, 7			; ASM32-NEXT: fadd 0, 0, 7
	; ASM32-NEXT: lfd 1, -16(1)			; ASM32-NEXT: lfd 1, -16(1)
	; ASM32-NEXT: fadd 0, 0, 8			; ASM32-NEXT: fadd 0, 0, 8
	; ASM32-NEXT: stw 3, -20(1)			; ASM32-NEXT: stw 3, -24(1)
	; ASM32-NEXT: fadd 0, 0, 9			; ASM32-NEXT: fadd 0, 0, 9
	; ASM32-NEXT: stw 4, -24(1)			; ASM32-NEXT: stw 4, -20(1)
	; ASM32-NEXT: fadd 0, 0, 10			; ASM32-NEXT: fadd 0, 0, 10
	; ASM32-NEXT: fadd 0, 0, 11			; ASM32-NEXT: fadd 0, 0, 11
	; ASM32-NEXT: fadd 0, 0, 12			; ASM32-NEXT: fadd 0, 0, 12
	; ASM32-NEXT: fadd 0, 0, 13			; ASM32-NEXT: fadd 0, 0, 13
	; ASM32-NEXT: fadd 0, 0, 1			; ASM32-NEXT: fadd 0, 0, 1
	; ASM32-NEXT: lfd 1, -24(1)			; ASM32-NEXT: lfd 1, -24(1)
	; ASM32-NEXT: fadd 1, 1, 1			; ASM32-NEXT: fadd 1, 1, 1
	; ASM32-NEXT: fadd 1, 0, 1			; ASM32-NEXT: fadd 1, 0, 1
	▲ Show 20 Lines • Show All 55 Lines • ▼ Show 20 Lines
	; 32BIT-LABEL: stack:			; 32BIT-LABEL: stack:
	; 32BIT-DAG: - { id: 0, name: arg1, type: default, offset: 0, size: 4, alignment: 4,			; 32BIT-DAG: - { id: 0, name: arg1, type: default, offset: 0, size: 4, alignment: 4,
	; 32BIT-DAG: - { id: 1, name: arg2, type: default, offset: 0, size: 4, alignment: 4,			; 32BIT-DAG: - { id: 1, name: arg2, type: default, offset: 0, size: 4, alignment: 4,
	; 32BIT-DAG: - { id: 2, name: '', type: default, offset: 0, size: 8, alignment: 8,			; 32BIT-DAG: - { id: 2, name: '', type: default, offset: 0, size: 8, alignment: 8,
	; 32BIT-DAG: - { id: 3, name: '', type: default, offset: 0, size: 8, alignment: 8,			; 32BIT-DAG: - { id: 3, name: '', type: default, offset: 0, size: 8, alignment: 8,

	; 32BIT-LABEL: body: \|			; 32BIT-LABEL: body: \|
	; 32BIT-DAG: liveins: $f1, $f2, $f3, $f4, $f5, $f6, $f7, $f8, $f9, $f10, $f11, $f12, $f13			; 32BIT-DAG: liveins: $f1, $f2, $f3, $f4, $f5, $f6, $f7, $f8, $f9, $f10, $f11, $f12, $f13
	; 32BIT-DAG: renamable $r4 = ADDI %fixed-stack.0, 0			; 32BIT-DAG: renamable $r3 = ADDI %fixed-stack.0, 0
	; 32BIT-DAG: STW killed renamable $r4, 0, %stack.0.arg1 :: (store (s32) into %ir.0)			; 32BIT-DAG: STW killed renamable $r3, 0, %stack.0.arg1 :: (store (s32) into %ir.0)
	; 32BIT-DAG: renamable $r4 = LWZ 0, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142, align 16)			; 32BIT-DAG: renamable $r3 = LWZ 0, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142, align 16)
	; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f1, killed renamable $f2, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f1, killed renamable $f2, implicit $rm
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f3, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f3, implicit $rm
				; 32BIT-DAG: STW renamable $r3, 0, %stack.2 :: (store (s32) into %stack.2, align 8)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f4, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f4, implicit $rm
				; 32BIT-DAG: renamable $r4 = LWZ 4, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142 + 4)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f5, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f5, implicit $rm
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f6, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f6, implicit $rm
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f7, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f7, implicit $rm
				; 32BIT-DAG: STW renamable $r4, 4, %stack.2 :: (store (s32) into %stack.2 + 4)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f8, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f8, implicit $rm
				; 32BIT-DAG: renamable $f1 = LFD 0, %stack.2 :: (load (s64) from %stack.2)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f9, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f9, implicit $rm
				; 32BIT-DAG: STW killed renamable $r3, 0, %stack.3 :: (store (s32) into %stack.3, align 8)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f10, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f10, implicit $rm
				; 32BIT-DAG: STW killed renamable $r4, 4, %stack.3 :: (store (s32) into %stack.3 + 4)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f11, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f11, implicit $rm
				; 32BIT-DAG: renamable $f2 = LFD 0, %stack.3 :: (load (s64) from %stack.3)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f12, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f12, implicit $rm
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f13, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f13, implicit $rm
	; 32BIT-DAG: renamable $r3 = LWZ 4, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142 + 4)
	; 32BIT-DAG: STW renamable $r3, 4, %stack.2 :: (store (s32) into %stack.2 + 4)
	; 32BIT-DAG: renamable $f1 = LFD 0, %stack.2 :: (load (s64) from %stack.2)
	; 32BIT-DAG: STW killed renamable $r4, 0, %stack.3 :: (store (s32) into %stack.3, align 8)
	; 32BIT-DAG: STW killed renamable $r3, 4, %stack.3 :: (store (s32) into %stack.3 + 4)
	; 32BIT-DAG: renamable $f2 = LFD 0, %stack.3 :: (load (s64) from %stack.3)
	; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm			; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
	; 32BIT-DAG: STW renamable $r4, 0, %stack.2 :: (store (s32) into %stack.2, align 8)
	; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f2, renamable $f2, implicit $rm			; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f2, renamable $f2, implicit $rm
				; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
	; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1			; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1

				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				RKSimon: Remove this - it's just the update script getting confused by the manual -DAG checks.
				; 32BIT: {{.*}}
				RKSimon:
				```
				; 32BIT-LABEL: body: \|
				; 32BIT-DAG: liveins: $f1, $f2, $f3, $f4, $f5, $f6, $f7, $f8, $f9, $f10, $f11, $f12, $f13
				; 32BIT-DAG: renamable $r3 = ADDI %fixed-stack.0, 0
				; 32BIT-DAG: STW killed renamable $r3, 0, %stack.0.arg1 :: (store (s32) into %ir.0)
				; 32BIT-DAG: renamable $r3 = LWZ 0, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142, align 16)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f1, killed renamable $f2, implicit $rm
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f3, implicit $rm
				; 32BIT-DAG: STW renamable $r3, 0, %stack.2 :: (store (s32) into %stack.2, align 8)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f4, implicit $rm
				; 32BIT-DAG: renamable $r4 = LWZ 4, %fixed-stack.0 :: (load (s32) from %ir.argp.cur142 + 4)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f5, implicit $rm
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f6, implicit $rm
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f7, implicit $rm
				; 32BIT-DAG: STW renamable $r4, 4, %stack.2 :: (store (s32) into %stack.2 + 4)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f8, implicit $rm
				; 32BIT-DAG: renamable $f1 = LFD 0, %stack.2 :: (load (s64) from %stack.2)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f9, implicit $rm
				; 32BIT-DAG: STW killed renamable $r3, 0, %stack.3 :: (store (s32) into %stack.3, align 8)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f10, implicit $rm
				; 32BIT-DAG: STW killed renamable $r4, 4, %stack.3 :: (store (s32) into %stack.3 + 4)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f11, implicit $rm
				; 32BIT-DAG: renamable $f2 = LFD 0, %stack.3 :: (load (s64) from %stack.3)
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f12, implicit $rm
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f13, implicit $rm
				; 32BIT-DAG: renamable $f0 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
				; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f2, renamable $f2, implicit $rm
				; 32BIT-DAG: renamable $f1 = nofpexcept FADD killed renamable $f0, killed renamable $f1, implicit $rm
				; 32BIT-DAG: BLR implicit $lr, implicit $rm, implicit $f1
				```

llvm/test/CodeGen/PowerPC/combine-fneg.ll

llvm/test/CodeGen/PowerPC/select_const.ll

llvm/test/CodeGen/RISCV/mul.ll

llvm/test/CodeGen/RISCV/pr58511.ll

llvm/test/CodeGen/SystemZ/pr36164.ll

llvm/test/CodeGen/Thumb2/mve-vst3.ll

llvm/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll

llvm/test/CodeGen/X86/2012-08-07-CmpISelBug.ll

llvm/test/CodeGen/X86/addcarry.ll

llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll

llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll

	Show First 20 Lines • Show All 2,837 Lines • ▼ Show 20 Lines
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
	; AVX-NEXT: vmovdqa %xmm1, (%rdx)			; AVX-NEXT: vmovdqa %xmm1, (%rdx)
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX2-LABEL: vec384_i16_widen_to_i32_factor2_broadcast_to_v12i32_factor12:			; AVX2-LABEL: vec384_i16_widen_to_i32_factor2_broadcast_to_v12i32_factor12:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vpbroadcastw (%rdi), %xmm0			; AVX2-NEXT: vpbroadcastw (%rdi), %xmm0
	; AVX2-NEXT: vmovdqa 48(%rdi), %xmm1			; AVX2-NEXT: vmovdqa 48(%rdi), %xmm1
	; AVX2-NEXT: vpshufb {{.*#+}} ymm1 = ymm1[2,3,6,7,10,11,14,15,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u]			; AVX2-NEXT: vpshufb {{.*#+}} ymm1 = ymm1[2,3,6,7,10,11,14,15,u,u,u,u,u,u,u,u,18,19,22,23,26,27,30,31,u,u,u,u,u,u,u,u]
	; AVX2-NEXT: vbroadcasti128 {{.*#+}} ymm2 = mem[0,1,0,1]			; AVX2-NEXT: vbroadcasti128 {{.*#+}} ymm2 = mem[0,1,0,1]
	; AVX2-NEXT: vpshuflw {{.*#+}} ymm2 = ymm2[0,0,0,0,4,5,6,7,8,8,8,8,12,13,14,15]			; AVX2-NEXT: vpshuflw {{.*#+}} ymm2 = ymm2[0,0,0,0,4,5,6,7,8,8,8,8,12,13,14,15]
	; AVX2-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm2[0],ymm1[0],ymm2[1],ymm1[1],ymm2[2],ymm1[2],ymm2[3],ymm1[3],ymm2[8],ymm1[8],ymm2[9],ymm1[9],ymm2[10],ymm1[10],ymm2[11],ymm1[11]			; AVX2-NEXT: vpunpcklwd {{.*#+}} ymm1 = ymm2[0],ymm1[0],ymm2[1],ymm1[1],ymm2[2],ymm1[2],ymm2[3],ymm1[3],ymm2[8],ymm1[8],ymm2[9],ymm1[9],ymm2[10],ymm1[10],ymm2[11],ymm1[11]
	; AVX2-NEXT: vpaddb (%rsi), %ymm1, %ymm1			; AVX2-NEXT: vpaddb (%rsi), %ymm1, %ymm1
	; AVX2-NEXT: vpaddb 32(%rsi), %ymm0, %ymm0			; AVX2-NEXT: vpaddb 32(%rsi), %ymm0, %ymm0
	; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)			; AVX2-NEXT: vmovdqa %ymm0, 32(%rdx)
	; AVX2-NEXT: vmovdqa %ymm1, (%rdx)			; AVX2-NEXT: vmovdqa %ymm1, (%rdx)
	; AVX2-NEXT: vzeroupper			; AVX2-NEXT: vzeroupper
	▲ Show 20 Lines • Show All 793 Lines • ▼ Show 20 Lines
	; SSE42-NEXT: paddb 32(%rsi), %xmm0			; SSE42-NEXT: paddb 32(%rsi), %xmm0
	; SSE42-NEXT: movdqa %xmm0, 32(%rdx)			; SSE42-NEXT: movdqa %xmm0, 32(%rdx)
	; SSE42-NEXT: movdqa %xmm2, 16(%rdx)			; SSE42-NEXT: movdqa %xmm2, 16(%rdx)
	; SSE42-NEXT: movdqa %xmm1, (%rdx)			; SSE42-NEXT: movdqa %xmm1, (%rdx)
	; SSE42-NEXT: retq			; SSE42-NEXT: retq
	;			;
	; AVX-LABEL: vec384_i32_widen_to_i128_factor4_broadcast_to_v3i128_factor3:			; AVX-LABEL: vec384_i32_widen_to_i128_factor4_broadcast_to_v3i128_factor3:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vbroadcastf128 {{.*#+}} ymm0 = mem[0,1,0,1]			; AVX-NEXT: vmovaps 48(%rdi), %xmm0
	; AVX-NEXT: vblendps {{.*#+}} xmm1 = xmm0[0],mem[1,2,3]			; AVX-NEXT: vbroadcastf128 {{.*#+}} ymm1 = mem[0,1,0,1]
	; AVX-NEXT: vpaddb (%rsi), %xmm1, %xmm1			; AVX-NEXT: vblendps {{.*#+}} ymm0 = ymm1[0],ymm0[1,2,3],ymm1[4],ymm0[5,6,7]
				; AVX-NEXT: vextractf128 $1, %ymm0, %xmm1
				; AVX-NEXT: vpaddb 16(%rsi), %xmm1, %xmm1
				; AVX-NEXT: vpaddb (%rsi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa (%rdi), %xmm2			; AVX-NEXT: vmovdqa (%rdi), %xmm2
	; AVX-NEXT: vmovdqa 16(%rdi), %xmm3			; AVX-NEXT: vmovdqa 16(%rdi), %xmm3
	; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3			; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3
	; AVX-NEXT: vpaddb 16(%rsi), %xmm0, %xmm0
	; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2			; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
	; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)			; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
	; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)			; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)
	; AVX-NEXT: vmovdqa %xmm1, (%rdx)			; AVX-NEXT: vmovdqa %xmm0, (%rdx)
				; AVX-NEXT: vmovdqa %xmm1, 16(%rdx)
	; AVX-NEXT: vzeroupper			; AVX-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX-NEXT: retq
				RKSimon: We still need to look at this - `vbroadcastf128 {{.*#+}} ymm1 = mem[0,1,0,1]` and `vmovdqa (%rdi), %xmm2` have been split but should be able to share the same load.
	;			;
	; AVX2-LABEL: vec384_i32_widen_to_i128_factor4_broadcast_to_v3i128_factor3:			; AVX2-LABEL: vec384_i32_widen_to_i128_factor4_broadcast_to_v3i128_factor3:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vmovdqa (%rdi), %ymm0			; AVX2-NEXT: vmovdqa (%rdi), %ymm0
	; AVX2-NEXT: vmovdqa 48(%rdi), %xmm1			; AVX2-NEXT: vmovdqa 48(%rdi), %xmm1
	; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm0[0,1,0,1]			; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm0[0,1,0,1]
	; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3],ymm2[4],ymm1[5,6,7]			; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm2[0],ymm1[1,2,3],ymm2[4],ymm1[5,6,7]
	; AVX2-NEXT: vpaddb (%rsi), %ymm1, %ymm1			; AVX2-NEXT: vpaddb (%rsi), %ymm1, %ymm1
	▲ Show 20 Lines • Show All 162 Lines • ▼ Show 20 Lines
	; AVX-LABEL: vec384_i64_widen_to_i128_factor2_broadcast_to_v3i128_factor3:			; AVX-LABEL: vec384_i64_widen_to_i128_factor2_broadcast_to_v3i128_factor3:
	; AVX: # %bb.0:			; AVX: # %bb.0:
	; AVX-NEXT: vbroadcastf128 {{.*#+}} ymm0 = mem[0,1,0,1]			; AVX-NEXT: vbroadcastf128 {{.*#+}} ymm0 = mem[0,1,0,1]
	; AVX-NEXT: vblendps {{.*#+}} xmm1 = xmm0[0,1],mem[2,3]			; AVX-NEXT: vblendps {{.*#+}} xmm1 = xmm0[0,1],mem[2,3]
	; AVX-NEXT: vpaddb (%rsi), %xmm1, %xmm1			; AVX-NEXT: vpaddb (%rsi), %xmm1, %xmm1
	; AVX-NEXT: vmovdqa (%rdi), %xmm2			; AVX-NEXT: vmovdqa (%rdi), %xmm2
	; AVX-NEXT: vmovdqa 16(%rdi), %xmm3			; AVX-NEXT: vmovdqa 16(%rdi), %xmm3
	; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3			; AVX-NEXT: vpaddb 48(%rsi), %xmm3, %xmm3
	; AVX-NEXT: vpaddb 16(%rsi), %xmm0, %xmm0
	; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2			; AVX-NEXT: vpaddb 32(%rsi), %xmm2, %xmm2
	; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)			; AVX-NEXT: vpaddb 16(%rsi), %xmm0, %xmm0
	; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)			; AVX-NEXT: vmovdqa %xmm0, 16(%rdx)
				; AVX-NEXT: vmovdqa %xmm2, 32(%rdx)
	; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)			; AVX-NEXT: vmovdqa %xmm3, 48(%rdx)
	; AVX-NEXT: vmovdqa %xmm1, (%rdx)			; AVX-NEXT: vmovdqa %xmm1, (%rdx)
	; AVX-NEXT: vzeroupper			; AVX-NEXT: vzeroupper
	; AVX-NEXT: retq			; AVX-NEXT: retq
	;			;
	; AVX2-LABEL: vec384_i64_widen_to_i128_factor2_broadcast_to_v3i128_factor3:			; AVX2-LABEL: vec384_i64_widen_to_i128_factor2_broadcast_to_v3i128_factor3:
	; AVX2: # %bb.0:			; AVX2: # %bb.0:
	; AVX2-NEXT: vmovdqa (%rdi), %ymm0			; AVX2-NEXT: vmovdqa (%rdi), %ymm0
	▲ Show 20 Lines • Show All 1,504 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/avx512-mask-op.ll

llvm/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll

	Show First 20 Lines • Show All 2,685 Lines • ▼ Show 20 Lines
	}			}


	define zeroext i4 @test_vpcmpeqq_v2i1_v4i1_mask(<2 x i64> %__a, <2 x i64> %__b) local_unnamed_addr {			define zeroext i4 @test_vpcmpeqq_v2i1_v4i1_mask(<2 x i64> %__a, <2 x i64> %__b) local_unnamed_addr {
	; VLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask:			; VLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vpcmpeqq %xmm1, %xmm0, %k0			; VLX-NEXT: vpcmpeqq %xmm1, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
				chfast: Is this a regression?
				deadalnix (author): Yes, I think so. On the other hand, the NoVLX case improved :)
				RKSimon: Looking at this - it looks like combineScalarAndWithMaskSetcc needs to be tweaked to peek through any_extend() nodes.
				chfast: Looks like it is fixed now.
	;			;
	; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}
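
Note on the NoVLX columns above: the left side clears the upper mask bits with a `kshiftlw $14` / `kshiftrw $14` pair before `kmovw`, while the right side does the `kmovw` first and then applies `andl $3`. The following is only a minimal sketch, assuming the mask register can be modelled as a 16-bit unsigned integer; the function names are hypothetical and not part of the patch. It checks that both sequences keep only the low two bits.

```python
def mask_via_kshifts(k: int) -> int:
    """Model of the old sequence: kshiftlw $14, then kshiftrw $14 (16-bit mask)."""
    shifted_left = (k << 14) & 0xFFFF  # bits shifted past bit 15 are discarded
    return shifted_left >> 14          # logical right shift brings the low 2 bits back


def mask_via_and(k: int) -> int:
    """Model of the new sequence: kmovw to a GPR, then andl $3."""
    return k & 0x3


# Exhaustively compare both models over every possible 16-bit mask value.
all_equal = all(mask_via_kshifts(k) == mask_via_and(k) for k in range(1 << 16))
```

Under that assumption the two forms produce the same value in `%eax`; whether one is preferable in practice is a separate question that the sketch does not address.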

	define zeroext i4 @test_vpcmpeqq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vpcmpeqq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem:			; VLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vpcmpeqq (%rdi), %xmm0, %k0			; VLX-NEXT: vpcmpeqq (%rdi), %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rdi), %xmm1			; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	Show All 10 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>			%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	Show All 12 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpeqq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	Show All 11 Lines
	; VLX-NEXT: vpcmpeqq (%rdi){1to2}, %xmm0, %k0			; VLX-NEXT: vpcmpeqq (%rdi){1to2}, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpeqq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpeqq (%rdi){1to8}, %zmm0, %k0			; NoVLX-NEXT: vpcmpeqq (%rdi){1to8}, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	Show All 10 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_masked_vpcmpeqq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpeqq (%rsi){1to8}, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpeqq (%rsi){1to8}, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp eq <2 x i64> %0, %1			%2 = icmp eq <2 x i64> %0, %1
	▲ Show 20 Lines • Show All 4,640 Lines • ▼ Show 20 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}

	define zeroext i4 @test_vpcmpsgtq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vpcmpsgtq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem:			; VLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vpcmpgtq (%rdi), %xmm0, %k0			; VLX-NEXT: vpcmpgtq (%rdi), %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rdi), %xmm1			; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	Show All 10 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>			%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	Show All 12 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpgtq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	Show All 11 Lines
	; VLX-NEXT: vpcmpgtq (%rdi){1to2}, %xmm0, %k0			; VLX-NEXT: vpcmpgtq (%rdi){1to2}, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpsgtq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpgtq (%rdi){1to8}, %zmm0, %k0			; NoVLX-NEXT: vpcmpgtq (%rdi){1to8}, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	Show All 10 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_masked_vpcmpsgtq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpgtq (%rsi){1to8}, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpgtq (%rsi){1to8}, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sgt <2 x i64> %0, %1			%2 = icmp sgt <2 x i64> %0, %1
	▲ Show 20 Lines • Show All 4,700 Lines • ▼ Show 20 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}

	define zeroext i4 @test_vpcmpsgeq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vpcmpsgeq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem:			; VLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vpcmpnltq (%rdi), %xmm0, %k0			; VLX-NEXT: vpcmpnltq (%rdi), %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rdi), %xmm1			; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	Show All 10 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>			%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	Show All 12 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpnltq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	Show All 11 Lines
	; VLX-NEXT: vpcmpnltq (%rdi){1to2}, %xmm0, %k0			; VLX-NEXT: vpcmpnltq (%rdi){1to2}, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpsgeq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpnltq (%rdi){1to8}, %zmm0, %k0			; NoVLX-NEXT: vpcmpnltq (%rdi){1to8}, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	Show All 10 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_masked_vpcmpsgeq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpnltq (%rsi){1to8}, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpnltq (%rsi){1to8}, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp sge <2 x i64> %0, %1			%2 = icmp sge <2 x i64> %0, %1
	▲ Show 20 Lines • Show All 4,720 Lines • ▼ Show 20 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}

	define zeroext i4 @test_vpcmpultq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vpcmpultq_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:			; VLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vpcmpltuq (%rdi), %xmm0, %k0			; VLX-NEXT: vpcmpltuq (%rdi), %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rdi), %xmm1			; NoVLX-NEXT: vmovdqa (%rdi), %xmm1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	Show All 10 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask:			; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%1 = bitcast <2 x i64> %__b to <2 x i64>			%1 = bitcast <2 x i64> %__b to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>			%extract.i = shufflevector <8 x i1> %3, <8 x i1> undef, <2 x i32> <i32 0, i32 1>
	Show All 12 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovdqa (%rsi), %xmm1			; NoVLX-NEXT: vmovdqa (%rsi), %xmm1
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x i64>			%1 = bitcast <2 x i64> %load to <2 x i64>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	%3 = bitcast i8 %__u to <8 x i1>			%3 = bitcast i8 %__u to <8 x i1>
	Show All 11 Lines
	; VLX-NEXT: vpcmpltuq (%rdi){1to2}, %xmm0, %k0			; VLX-NEXT: vpcmpltuq (%rdi){1to2}, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vpcmpultq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vpcmpltuq (%rdi){1to8}, %zmm0, %k0			; NoVLX-NEXT: vpcmpltuq (%rdi){1to8}, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	Show All 10 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_masked_vpcmpultq_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vpcmpltuq (%rsi){1to8}, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vpcmpltuq (%rsi){1to8}, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x i64>			%0 = bitcast <2 x i64> %__a to <2 x i64>
	%load = load i64, ptr %__b			%load = load i64, ptr %__b
	%vec = insertelement <2 x i64> undef, i64 %load, i32 0			%vec = insertelement <2 x i64> undef, i64 %load, i32 0
	%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x i64> %vec, <2 x i64> undef, <2 x i32> <i32 0, i32 0>
	%2 = icmp ult <2 x i64> %0, %1			%2 = icmp ult <2 x i64> %0, %1
	▲ Show 20 Lines • Show All 3,667 Lines • ▼ Show 20 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%1 = bitcast <2 x i64> %__b to <2 x double>			%1 = bitcast <2 x i64> %__b to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}

	define zeroext i4 @test_vcmpoeqpd_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vcmpoeqpd_v2i1_v4i1_mask_mem(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:			; VLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vcmpeqpd (%rdi), %xmm0, %k0			; VLX-NEXT: vcmpeqpd (%rdi), %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vmovapd (%rdi), %xmm1			; NoVLX-NEXT: vmovapd (%rdi), %xmm1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x double>			%1 = bitcast <2 x i64> %load to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>			%3 = shufflevector <2 x i1> %2, <2 x i1> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	%4 = bitcast <4 x i1> %3 to i4			%4 = bitcast <4 x i1> %3 to i4
	ret i4 %4			ret i4 %4
	}			}

	define zeroext i4 @test_vcmpoeqpd_v2i1_v4i1_mask_mem_b(<2 x i64> %__a, ptr %__b) local_unnamed_addr {			define zeroext i4 @test_vcmpoeqpd_v2i1_v4i1_mask_mem_b(<2 x i64> %__a, ptr %__b) local_unnamed_addr {
	; VLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:			; VLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:
	; VLX: # %bb.0: # %entry			; VLX: # %bb.0: # %entry
	; VLX-NEXT: vcmpeqpd (%rdi){1to2}, %xmm0, %k0			; VLX-NEXT: vcmpeqpd (%rdi){1to2}, %xmm0, %k0
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_vcmpoeqpd_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: vcmpeqpd (%rdi){1to8}, %zmm0, %k0			; NoVLX-NEXT: vcmpeqpd (%rdi){1to8}, %zmm0, %k0
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load double, ptr %__b			%load = load double, ptr %__b
	%vec = insertelement <2 x double> undef, double %load, i32 0			%vec = insertelement <2 x double> undef, double %load, i32 0
	%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	Show All 11 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask:			; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1			; NoVLX-NEXT: # kill: def $xmm1 killed $xmm1 def $zmm1
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%1 = bitcast <2 x i64> %__b to <2 x double>			%1 = bitcast <2 x i64> %__b to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = bitcast i2 %__u to <2 x i1>			%3 = bitcast i2 %__u to <2 x i1>
	%4 = and <2 x i1> %2, %3			%4 = and <2 x i1> %2, %3
	Show All 11 Lines
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask_mem:			; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask_mem:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vmovapd (%rsi), %xmm1			; NoVLX-NEXT: vmovapd (%rsi), %xmm1
	; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd %zmm1, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load <2 x i64>, ptr %__b			%load = load <2 x i64>, ptr %__b
	%1 = bitcast <2 x i64> %load to <2 x double>			%1 = bitcast <2 x i64> %load to <2 x double>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	%3 = bitcast i2 %__u to <2 x i1>			%3 = bitcast i2 %__u to <2 x i1>
	Show All 11 Lines
	; VLX-NEXT: kmovb %k0, %eax			; VLX-NEXT: kmovb %k0, %eax
	; VLX-NEXT: retq			; VLX-NEXT: retq
	;			;
	; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask_mem_b:			; NoVLX-LABEL: test_masked_vcmpoeqpd_v2i1_v4i1_mask_mem_b:
	; NoVLX: # %bb.0: # %entry			; NoVLX: # %bb.0: # %entry
	; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0			; NoVLX-NEXT: # kill: def $xmm0 killed $xmm0 def $zmm0
	; NoVLX-NEXT: kmovw %edi, %k1			; NoVLX-NEXT: kmovw %edi, %k1
	; NoVLX-NEXT: vcmpeqpd (%rsi){1to8}, %zmm0, %k0 {%k1}			; NoVLX-NEXT: vcmpeqpd (%rsi){1to8}, %zmm0, %k0 {%k1}
	; NoVLX-NEXT: kshiftlw $14, %k0, %k0
	; NoVLX-NEXT: kshiftrw $14, %k0, %k0
	; NoVLX-NEXT: kmovw %k0, %eax			; NoVLX-NEXT: kmovw %k0, %eax
				; NoVLX-NEXT: andl $3, %eax
	; NoVLX-NEXT: vzeroupper			; NoVLX-NEXT: vzeroupper
	; NoVLX-NEXT: retq			; NoVLX-NEXT: retq
	entry:			entry:
	%0 = bitcast <2 x i64> %__a to <2 x double>			%0 = bitcast <2 x i64> %__a to <2 x double>
	%load = load double, ptr %__b			%load = load double, ptr %__b
	%vec = insertelement <2 x double> undef, double %load, i32 0			%vec = insertelement <2 x double> undef, double %load, i32 0
	%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>			%1 = shufflevector <2 x double> %vec, <2 x double> undef, <2 x i32> <i32 0, i32 0>
	%2 = fcmp oeq <2 x double> %0, %1			%2 = fcmp oeq <2 x double> %0, %1
	▲ Show 20 Lines • Show All 2,173 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/const-shift-of-constmasked.ll


llvm/test/CodeGen/X86/dagcombine-cse.ll


llvm/test/CodeGen/X86/dagcombine-select.ll

Show First 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	; CHECK-NEXT: retq
%sel = select i1 %cond, i32 2, i32 3		%sel = select i1 %cond, i32 2, i32 3
%bo = shl i32 %sel, 8		%bo = shl i32 %sel, 8
ret i32 %bo		ret i32 %bo
}		}

define i32 @shl_constant_sel_constants(i1 %cond) {		define i32 @shl_constant_sel_constants(i1 %cond) {
; CHECK-LABEL: shl_constant_sel_constants:		; CHECK-LABEL: shl_constant_sel_constants:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: notb %dil		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: movzbl %dil, %eax		; CHECK-NEXT: andb $1, %cl
; CHECK-NEXT: andl $1, %eax		; CHECK-NEXT: xorb $3, %cl
; CHECK-NEXT: leal 4(,%rax,4), %eax		; CHECK-NEXT: movl $1, %eax
		; CHECK-NEXT: # kill: def $cl killed $cl killed $ecx
		; CHECK-NEXT: shll %cl, %eax
		deadalnix (Author): Looking into this guy, it is not obvious if the backend is the right place to fix it. Previously, the DAG looked like:

		    SelectionDAG has 17 nodes:
		      t0: ch = EntryToken
		      t2: i32,ch = CopyFromReg t0, Register:i32 %0
		      t32: i8 = truncate t2
		      t38: i8 = xor t32, Constant:i8<-1>
		      t34: i32 = any_extend t38
		      t36: i32 = and t34, Constant:i32<1>
		      t30: i32 = shl t36, Constant:i8<2>
		      t25: i32 = add t30, Constant:i32<4>
		      t13: ch,glue = CopyToReg t0, Register:i32 $eax, t25
		      t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1

		Now it looks like:

		    SelectionDAG has 14 nodes:
		      t0: ch = EntryToken
		      t2: i32,ch = CopyFromReg t0, Register:i32 %0
		      t22: i8 = truncate t2
		      t24: i8 = and t22, Constant:i8<1>
		      t21: i8 = sub Constant:i8<3>, t24
		      t10: i32 = shl Constant:i32<1>, t21
		      t13: ch,glue = CopyToReg t0, Register:i32 $eax, t10
		      t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1

		It must be noted that opt will transform this IR into:

		    define i32 @shl_constant_sel_constants(i1 %cond) local_unnamed_addr #0 {
		      %bo = select i1 %cond, i32 4, i32 8
		      ret i32 %bo
		    }

		Which, regardless of this patch, compiles to:

		    shl_constant_sel_constants: # @shl_constant_sel_constants
		    # %bb.0:
		      notb %dil
		      movzbl %dil, %eax
		      andl $1, %eax
		      leal 4(,%rax,4), %eax
		      retq

		So it seems that this is fine. How do we proceed in such a case? Simply add a second version of that test case with the optimized IR?

		RKSimon: Last person to touch this was @laytonio in D90349 - it looks like this is the same codegen as BEFORE that patch.

		spatel: D128080 might help this example.

		deadalnix (Author): So I was able to find a solution for this, by matching select of constant equivalents. However, it is creating a couple of infinite loops at the moment, so we'll see how it goes.

		deadalnix (Author): I submitted an RFC of the approach in D130675 to get feedback. It's not fully fledged, but it shows some potential.
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%sel = select i1 %cond, i32 2, i32 3		%sel = select i1 %cond, i32 2, i32 3
%bo = shl i32 1, %sel		%bo = shl i32 1, %sel
ret i32 %bo		ret i32 %bo
}		}
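
As a sanity check on the inline thread above, the two instruction sequences in this hunk can be modeled arithmetically. The sketch below is illustrative Python and not part of the patch: the function names are made up for this note, and it assumes only the low bit of `%cond` is meaningful. It models the left-hand `notb`/`leal` sequence and the right-hand `xorb`/`shll` sequence, and compares both against the IR semantics of `shl i32 1, (select i1 %cond, i32 2, i32 3)`.

```python
MASK32 = 0xFFFFFFFF  # results are i32


def reference(cond: int) -> int:
    """IR semantics: %sel = select i1 %cond, 2, 3; %bo = shl i32 1, %sel."""
    sel = 2 if cond & 1 else 3
    return (1 << sel) & MASK32


def old_codegen(cond: int) -> int:
    """Left column: notb %dil; movzbl; andl $1; leal 4(,%rax,4)."""
    rax = (~cond) & 1          # inverted low bit of the condition
    return (4 + 4 * rax) & MASK32


def new_codegen(cond: int) -> int:
    """Right column: andb $1; xorb $3; movl $1, %eax; shll %cl, %eax."""
    cl = (cond & 1) ^ 3        # 2 when cond is set, 3 otherwise
    return (1 << cl) & MASK32
```

Under these assumptions both sequences return 4 when the condition is set and 8 otherwise, so the change in this test is a matter of instruction count rather than correctness.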

define i32 @lshr_constant_sel_constants(i1 %cond) {		define i32 @lshr_constant_sel_constants(i1 %cond) {
; CHECK-LABEL: lshr_constant_sel_constants:		; CHECK-LABEL: lshr_constant_sel_constants:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: andl $1, %edi		; CHECK-NEXT: andb $1, %cl
; CHECK-NEXT: leal 8(,%rdi,8), %eax		; CHECK-NEXT: xorb $3, %cl
		; CHECK-NEXT: movl $64, %eax
		; CHECK-NEXT: # kill: def $cl killed $cl killed $ecx
		; CHECK-NEXT: shrl %cl, %eax
		RKSimon [Not Done]: Looks like the truncate means we're now failing to call foldBinOpIntoSelect. Before:
		    SelectionDAG has 15 nodes:
		      t0: ch,glue = EntryToken
		      t2: i32,ch = CopyFromReg t0, Register:i32 %0
		      t3: i8 = truncate t2
		      t4: i1 = truncate t2
		      t7: i32 = select t4, Constant:i32<2>, Constant:i32<3>
		      t9: i8 = truncate t7
		      t10: i32 = shl Constant:i32<1>, t9
		      t13: ch,glue = CopyToReg t0, Register:i32 $eax, t10
		      t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1
		becomes:
		    SelectionDAG has 14 nodes:
		      t0: ch,glue = EntryToken
		      t2: i32,ch = CopyFromReg t0, Register:i32 %0
		      t23: i8 = truncate t2
		      t25: i8 = and t23, Constant:i8<1>
		      t22: i8 = xor t25, Constant:i8<3>
		      t10: i32 = shl Constant:i32<1>, t22
		      t13: ch,glue = CopyToReg t0, Register:i32 $eax, t10
		      t14: ch = X86ISD::RET_FLAG t13, TargetConstant:i32<0>, Register:i32 $eax, t13:1
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%sel = select i1 %cond, i32 2, i32 3		%sel = select i1 %cond, i32 2, i32 3
%bo = lshr i32 64, %sel		%bo = lshr i32 64, %sel
ret i32 %bo		ret i32 %bo
}		}
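
For the lshr and ashr tests, the left-hand CHECK lines compute the result directly from the condition bit (a scaled `leal`), while the right-hand lines first build the shift amount as `(cond & 1) ^ 3` and then shift the constant. The sketch below models both sequences in C++ to show they return the same values; the function names are hypothetical and `x` again stands for the incoming register, of which only the low bit is assumed meaningful.

```cpp
#include <cassert>
#include <cstdint>

// Shift amount as built by the right-hand sequence: andb $1, %cl ; xorb $3, %cl.
// Yields 2 when the condition bit is set and 3 otherwise.
constexpr std::uint32_t shift_amount_xor(std::uint32_t x) {
  return (x & 1u) ^ 3u;
}

// lshr i32 64, %sel
constexpr std::uint32_t lshr_folded(std::uint32_t x) {   // leal 8(,%rdi,8), %eax
  return 8u + ((x & 1u) << 3);
}
constexpr std::uint32_t lshr_unfolded(std::uint32_t x) { // movl $64, %eax ; shrl %cl, %eax
  return 64u >> shift_amount_xor(x);
}

// ashr i32 128, %sel (128 is non-negative, so a logical shift gives the same result)
constexpr std::uint32_t ashr_folded(std::uint32_t x) {   // shll $4, %edi ; leal 16(%rdi), %eax
  return 16u + ((x & 1u) << 4);
}
constexpr std::uint32_t ashr_unfolded(std::uint32_t x) { // movl $128, %eax ; shrl %cl, %eax
  return 128u >> shift_amount_xor(x);
}

// Compares the folded and unfolded forms for every 8-bit input.
inline void check_shr_equivalence() {
  for (std::uint32_t x = 0; x < 256; ++x) {
    assert(lshr_folded(x) == lshr_unfolded(x));
    assert(ashr_folded(x) == ashr_unfolded(x));
  }
}
```

The results agree, so the difference between the two columns is instruction count rather than correctness, which is consistent with the comment that the binary operation is no longer being folded into the select.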

define i32 @ashr_constant_sel_constants(i1 %cond) {		define i32 @ashr_constant_sel_constants(i1 %cond) {
; CHECK-LABEL: ashr_constant_sel_constants:		; CHECK-LABEL: ashr_constant_sel_constants:
; CHECK: # %bb.0:		; CHECK: # %bb.0:
; CHECK-NEXT: # kill: def $edi killed $edi def $rdi		; CHECK-NEXT: movl %edi, %ecx
; CHECK-NEXT: andl $1, %edi		; CHECK-NEXT: andb $1, %cl
; CHECK-NEXT: shll $4, %edi		; CHECK-NEXT: xorb $3, %cl
; CHECK-NEXT: leal 16(%rdi), %eax		; CHECK-NEXT: movl $128, %eax
		; CHECK-NEXT: # kill: def $cl killed $cl killed $ecx
		; CHECK-NEXT: shrl %cl, %eax
; CHECK-NEXT: retq		; CHECK-NEXT: retq
%sel = select i1 %cond, i32 2, i32 3		%sel = select i1 %cond, i32 2, i32 3
%bo = ashr i32 128, %sel		%bo = ashr i32 128, %sel
ret i32 %bo		ret i32 %bo
}		}

define double @fsub_constant_sel_constants(i1 %cond) {		define double @fsub_constant_sel_constants(i1 %cond) {
; CHECK-LABEL: fsub_constant_sel_constants:		; CHECK-LABEL: fsub_constant_sel_constants:
[... 235 lines not shown ...]

llvm/test/CodeGen/X86/field-extract-use-trunc.ll


llvm/test/CodeGen/X86/horizontal-sum.ll

[... 26 lines not shown ...]
;		;
; SSSE3-FAST-LABEL: pair_sum_v4f32_v4f32:		; SSSE3-FAST-LABEL: pair_sum_v4f32_v4f32:
; SSSE3-FAST: # %bb.0:		; SSSE3-FAST: # %bb.0:
; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0		; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0
; SSSE3-FAST-NEXT: haddps %xmm3, %xmm2		; SSSE3-FAST-NEXT: haddps %xmm3, %xmm2
; SSSE3-FAST-NEXT: haddps %xmm2, %xmm0		; SSSE3-FAST-NEXT: haddps %xmm2, %xmm0
; SSSE3-FAST-NEXT: retq		; SSSE3-FAST-NEXT: retq
;		;
; AVX1-SLOW-LABEL: pair_sum_v4f32_v4f32:		; AVX-SLOW-LABEL: pair_sum_v4f32_v4f32:
; AVX1-SLOW: # %bb.0:		; AVX-SLOW: # %bb.0:
; AVX1-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0		; AVX-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX1-SLOW-NEXT: vhaddps %xmm2, %xmm2, %xmm1		; AVX-SLOW-NEXT: vhaddps %xmm2, %xmm2, %xmm1
; AVX1-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,2],xmm1[0,1]		; AVX-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,2],xmm1[0,1]
; AVX1-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[1,3],xmm1[1,1]		; AVX-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[1,3],xmm1[1,1]
; AVX1-SLOW-NEXT: vhaddps %xmm3, %xmm3, %xmm1		; AVX-SLOW-NEXT: vhaddps %xmm3, %xmm3, %xmm1
; AVX1-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]		; AVX-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
; AVX1-SLOW-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0,1,2],xmm1[0]		; AVX-SLOW-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0,1,2],xmm1[0]
; AVX1-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0		; AVX-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
; AVX1-SLOW-NEXT: retq		; AVX-SLOW-NEXT: retq
;		;
; AVX-FAST-LABEL: pair_sum_v4f32_v4f32:		; AVX-FAST-LABEL: pair_sum_v4f32_v4f32:
; AVX-FAST: # %bb.0:		; AVX-FAST: # %bb.0:
; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0		; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX-FAST-NEXT: vhaddps %xmm3, %xmm2, %xmm1		; AVX-FAST-NEXT: vhaddps %xmm3, %xmm2, %xmm1
; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0		; AVX-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX-FAST-NEXT: retq		; AVX-FAST-NEXT: retq
;
; AVX2-SLOW-LABEL: pair_sum_v4f32_v4f32:
; AVX2-SLOW: # %bb.0:
; AVX2-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vhaddps %xmm2, %xmm2, %xmm1
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm0[0,2],xmm1[0,3]
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[1,3],xmm1[1,3]
; AVX2-SLOW-NEXT: vhaddps %xmm3, %xmm3, %xmm1
; AVX2-SLOW-NEXT: vblendps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
; AVX2-SLOW-NEXT: vinsertps {{.*#+}} xmm1 = xmm2[0,1,2],xmm1[0]
; AVX2-SLOW-NEXT: vaddps %xmm0, %xmm1, %xmm0
; AVX2-SLOW-NEXT: retq
%5 = shufflevector <4 x float> %0, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%5 = shufflevector <4 x float> %0, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%6 = shufflevector <4 x float> %0, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%6 = shufflevector <4 x float> %0, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%7 = fadd <2 x float> %5, %6		%7 = fadd <2 x float> %5, %6
%8 = shufflevector <2 x float> %7, <2 x float> poison, <2 x i32> <i32 1, i32 undef>		%8 = shufflevector <2 x float> %7, <2 x float> poison, <2 x i32> <i32 1, i32 undef>
%9 = fadd <2 x float> %7, %8		%9 = fadd <2 x float> %7, %8
%10 = shufflevector <4 x float> %1, <4 x float> poison, <2 x i32> <i32 0, i32 2>		%10 = shufflevector <4 x float> %1, <4 x float> poison, <2 x i32> <i32 0, i32 2>
%11 = shufflevector <4 x float> %1, <4 x float> poison, <2 x i32> <i32 1, i32 3>		%11 = shufflevector <4 x float> %1, <4 x float> poison, <2 x i32> <i32 1, i32 3>
%12 = fadd <2 x float> %10, %11		%12 = fadd <2 x float> %10, %11
[... 48 lines not shown ...]
; AVX1-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX1-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX1-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX1-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX1-SLOW-NEXT: vphaddd %xmm3, %xmm3, %xmm1		; AVX1-SLOW-NEXT: vphaddd %xmm3, %xmm3, %xmm1
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[0,0,0,0]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[0,0,0,0]
; AVX1-SLOW-NEXT: vpaddd %xmm1, %xmm2, %xmm1		; AVX1-SLOW-NEXT: vpaddd %xmm1, %xmm2, %xmm1
; AVX1-SLOW-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5],xmm1[6,7]		; AVX1-SLOW-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5],xmm1[6,7]
; AVX1-SLOW-NEXT: retq		; AVX1-SLOW-NEXT: retq
;		;
; AVX1-FAST-LABEL: pair_sum_v4i32_v4i32:		; AVX-FAST-LABEL: pair_sum_v4i32_v4i32:
; AVX1-FAST: # %bb.0:		; AVX-FAST: # %bb.0:
; AVX1-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0		; AVX-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX1-FAST-NEXT: vphaddd %xmm3, %xmm2, %xmm1		; AVX-FAST-NEXT: vphaddd %xmm3, %xmm2, %xmm1
; AVX1-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0		; AVX-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX1-FAST-NEXT: retq		; AVX-FAST-NEXT: retq
;		;
; AVX2-SLOW-LABEL: pair_sum_v4i32_v4i32:		; AVX2-SLOW-LABEL: pair_sum_v4i32_v4i32:
; AVX2-SLOW: # %bb.0:		; AVX2-SLOW: # %bb.0:
; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,1,3,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,3,1,3]
		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,1,3]
; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vphaddd %xmm2, %xmm2, %xmm1		; AVX2-SLOW-NEXT: vphaddd %xmm2, %xmm2, %xmm1
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[1,1,1,1]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[1,1,1,1]
; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,3]		; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX2-SLOW-NEXT: vphaddd %xmm3, %xmm3, %xmm1		; AVX2-SLOW-NEXT: vphaddd %xmm3, %xmm3, %xmm1
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm2 = xmm1[1,1,1,1]		; AVX2-SLOW-NEXT: vpbroadcastd %xmm1, %xmm2
; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm2, %xmm1
; AVX2-SLOW-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[0]		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
; AVX2-SLOW-NEXT: retq		; AVX2-SLOW-NEXT: retq
;
; AVX2-FAST-LABEL: pair_sum_v4i32_v4i32:
; AVX2-FAST: # %bb.0:
; AVX2-FAST-NEXT: vphaddd %xmm3, %xmm2, %xmm2
; AVX2-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX2-FAST-NEXT: vphaddd %xmm2, %xmm0, %xmm0
; AVX2-FAST-NEXT: retq
%5 = shufflevector <4 x i32> %0, <4 x i32> poison, <2 x i32> <i32 0, i32 2>		%5 = shufflevector <4 x i32> %0, <4 x i32> poison, <2 x i32> <i32 0, i32 2>
%6 = shufflevector <4 x i32> %0, <4 x i32> poison, <2 x i32> <i32 1, i32 3>		%6 = shufflevector <4 x i32> %0, <4 x i32> poison, <2 x i32> <i32 1, i32 3>
%7 = add <2 x i32> %5, %6		%7 = add <2 x i32> %5, %6
%8 = shufflevector <2 x i32> %7, <2 x i32> poison, <2 x i32> <i32 1, i32 undef>		%8 = shufflevector <2 x i32> %7, <2 x i32> poison, <2 x i32> <i32 1, i32 undef>
%9 = add <2 x i32> %7, %8		%9 = add <2 x i32> %7, %8
%10 = shufflevector <4 x i32> %1, <4 x i32> poison, <2 x i32> <i32 0, i32 2>		%10 = shufflevector <4 x i32> %1, <4 x i32> poison, <2 x i32> <i32 0, i32 2>
%11 = shufflevector <4 x i32> %1, <4 x i32> poison, <2 x i32> <i32 1, i32 3>		%11 = shufflevector <4 x i32> %1, <4 x i32> poison, <2 x i32> <i32 1, i32 3>
%12 = add <2 x i32> %10, %11		%12 = add <2 x i32> %10, %11
[... 21 lines not shown ...]
; SSSE3-SLOW-LABEL: pair_sum_v8f32_v4f32:		; SSSE3-SLOW-LABEL: pair_sum_v8f32_v4f32:
; SSSE3-SLOW: # %bb.0:		; SSSE3-SLOW: # %bb.0:
; SSSE3-SLOW-NEXT: haddps %xmm1, %xmm0		; SSSE3-SLOW-NEXT: haddps %xmm1, %xmm0
; SSSE3-SLOW-NEXT: movaps %xmm0, %xmm1		; SSSE3-SLOW-NEXT: movaps %xmm0, %xmm1
; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3],xmm0[1,3]		; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm1 = xmm1[1,3],xmm0[1,3]
; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]		; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,1,3]
; SSSE3-SLOW-NEXT: addps %xmm1, %xmm0		; SSSE3-SLOW-NEXT: addps %xmm1, %xmm0
; SSSE3-SLOW-NEXT: haddps %xmm3, %xmm2		; SSSE3-SLOW-NEXT: haddps %xmm3, %xmm2
; SSSE3-SLOW-NEXT: movaps %xmm5, %xmm1		; SSSE3-SLOW-NEXT: haddps %xmm4, %xmm5
; SSSE3-SLOW-NEXT: haddps %xmm4, %xmm1		; SSSE3-SLOW-NEXT: haddps %xmm5, %xmm2
; SSSE3-SLOW-NEXT: haddps %xmm1, %xmm2
; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,1,3,2]		; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[0,1,3,2]
; SSSE3-SLOW-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]		; SSSE3-SLOW-NEXT: movlhps {{.*#+}} xmm0 = xmm0[0],xmm2[0]
; SSSE3-SLOW-NEXT: haddps %xmm7, %xmm6		; SSSE3-SLOW-NEXT: haddps %xmm7, %xmm6
; SSSE3-SLOW-NEXT: haddps %xmm5, %xmm4		; SSSE3-SLOW-NEXT: haddps %xmm6, %xmm6
; SSSE3-SLOW-NEXT: haddps %xmm6, %xmm4		; SSSE3-SLOW-NEXT: shufps {{.*#+}} xmm2 = xmm2[2,3],xmm6[0,1]
; SSSE3-SLOW-NEXT: movaps %xmm4, %xmm1		; SSSE3-SLOW-NEXT: movaps %xmm2, %xmm1
; SSSE3-SLOW-NEXT: retq		; SSSE3-SLOW-NEXT: retq
;		;
; SSSE3-FAST-LABEL: pair_sum_v8f32_v4f32:		; SSSE3-FAST-LABEL: pair_sum_v8f32_v4f32:
; SSSE3-FAST: # %bb.0:		; SSSE3-FAST: # %bb.0:
; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0		; SSSE3-FAST-NEXT: haddps %xmm1, %xmm0
; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0		; SSSE3-FAST-NEXT: haddps %xmm0, %xmm0
; SSSE3-FAST-NEXT: haddps %xmm3, %xmm2		; SSSE3-FAST-NEXT: haddps %xmm3, %xmm2
; SSSE3-FAST-NEXT: haddps %xmm5, %xmm4		; SSSE3-FAST-NEXT: haddps %xmm5, %xmm4
[... 50 lines not shown ...]
;		;
; AVX2-SLOW-LABEL: pair_sum_v8f32_v4f32:		; AVX2-SLOW-LABEL: pair_sum_v8f32_v4f32:
; AVX2-SLOW: # %bb.0:		; AVX2-SLOW: # %bb.0:
; AVX2-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[1,3,1,3]		; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm1 = xmm0[1,3,1,3]
; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3]		; AVX2-SLOW-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,2,1,3]
; AVX2-SLOW-NEXT: vaddps %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vaddps %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vhaddps %xmm4, %xmm4, %xmm1		; AVX2-SLOW-NEXT: vhaddps %xmm4, %xmm4, %xmm1
; AVX2-SLOW-NEXT: vhaddps %xmm5, %xmm5, %xmm4		; AVX2-SLOW-NEXT: vhaddps %xmm5, %xmm5, %xmm8
; AVX2-SLOW-NEXT: vhaddps %xmm3, %xmm2, %xmm2		; AVX2-SLOW-NEXT: vhaddps %xmm3, %xmm2, %xmm2
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm1[0,3]		; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm1 = xmm2[0,2],xmm1[0,1]
; AVX2-SLOW-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0,1,2],xmm4[0]		; AVX2-SLOW-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm8[0]
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,3]		; AVX2-SLOW-NEXT: vhaddps %xmm4, %xmm5, %xmm3
; AVX2-SLOW-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]		; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm2 = xmm2[1,3],xmm3[3,1]
; AVX2-SLOW-NEXT: vaddps %xmm1, %xmm3, %xmm1		; AVX2-SLOW-NEXT: vaddps %xmm2, %xmm1, %xmm1
; AVX2-SLOW-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]		; AVX2-SLOW-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]
; AVX2-SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0		; AVX2-SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX2-SLOW-NEXT: vmovddup {{.*#+}} xmm1 = xmm1[0,0]		; AVX2-SLOW-NEXT: vmovddup {{.*#+}} xmm1 = xmm1[0,0]
; AVX2-SLOW-NEXT: vhaddps %xmm7, %xmm6, %xmm2		; AVX2-SLOW-NEXT: vhaddps %xmm7, %xmm6, %xmm2
; AVX2-SLOW-NEXT: vhaddps %xmm2, %xmm2, %xmm2		; AVX2-SLOW-NEXT: vhaddps %xmm2, %xmm2, %xmm2
; AVX2-SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1		; AVX2-SLOW-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX2-SLOW-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[2]		; AVX2-SLOW-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[2]
; AVX2-SLOW-NEXT: retq		; AVX2-SLOW-NEXT: retq
;		;
; AVX2-FAST-LABEL: pair_sum_v8f32_v4f32:		; AVX2-FAST-LABEL: pair_sum_v8f32_v4f32:
; AVX2-FAST: # %bb.0:		; AVX2-FAST: # %bb.0:
; AVX2-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0		; AVX2-FAST-NEXT: vhaddps %xmm1, %xmm0, %xmm0
; AVX2-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0		; AVX2-FAST-NEXT: vhaddps %xmm0, %xmm0, %xmm0
; AVX2-FAST-NEXT: vhaddps %xmm4, %xmm4, %xmm1		; AVX2-FAST-NEXT: vhaddps %xmm4, %xmm4, %xmm1
; AVX2-FAST-NEXT: vhaddps %xmm5, %xmm5, %xmm4		; AVX2-FAST-NEXT: vhaddps %xmm5, %xmm5, %xmm8
; AVX2-FAST-NEXT: vhaddps %xmm3, %xmm2, %xmm2		; AVX2-FAST-NEXT: vhaddps %xmm3, %xmm2, %xmm2
; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm1[0,3]		; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm1 = xmm2[0,2],xmm1[0,1]
; AVX2-FAST-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0,1,2],xmm4[0]		; AVX2-FAST-NEXT: vinsertps {{.*#+}} xmm1 = xmm1[0,1,2],xmm8[0]
; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,3]		; AVX2-FAST-NEXT: vhaddps %xmm4, %xmm5, %xmm3
; AVX2-FAST-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]		; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm2 = xmm2[1,3],xmm3[3,1]
; AVX2-FAST-NEXT: vaddps %xmm1, %xmm3, %xmm1		; AVX2-FAST-NEXT: vaddps %xmm2, %xmm1, %xmm1
; AVX2-FAST-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]		; AVX2-FAST-NEXT: vpermilpd {{.*#+}} xmm2 = xmm1[1,0]
; AVX2-FAST-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0		; AVX2-FAST-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
; AVX2-FAST-NEXT: vmovddup {{.*#+}} xmm1 = xmm1[0,0]		; AVX2-FAST-NEXT: vmovddup {{.*#+}} xmm1 = xmm1[0,0]
; AVX2-FAST-NEXT: vhaddps %xmm7, %xmm6, %xmm2		; AVX2-FAST-NEXT: vhaddps %xmm7, %xmm6, %xmm2
; AVX2-FAST-NEXT: vhaddps %xmm0, %xmm2, %xmm2		; AVX2-FAST-NEXT: vhaddps %xmm0, %xmm2, %xmm2
; AVX2-FAST-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1		; AVX2-FAST-NEXT: vinsertf128 $1, %xmm2, %ymm1, %ymm1
; AVX2-FAST-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[2]		; AVX2-FAST-NEXT: vshufpd {{.*#+}} ymm0 = ymm0[0],ymm1[1],ymm0[2],ymm1[2]
; AVX2-FAST-NEXT: retq		; AVX2-FAST-NEXT: retq
[... 130 lines not shown ...]
; AVX2-SLOW: # %bb.0:		; AVX2-SLOW: # %bb.0:
; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,3,1,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm0[1,3,1,3]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,1,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,2,1,3]
; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vphaddd %xmm4, %xmm4, %xmm1		; AVX2-SLOW-NEXT: vphaddd %xmm4, %xmm4, %xmm1
; AVX2-SLOW-NEXT: vphaddd %xmm5, %xmm5, %xmm4		; AVX2-SLOW-NEXT: vphaddd %xmm5, %xmm5, %xmm4
; AVX2-SLOW-NEXT: vphaddd %xmm3, %xmm2, %xmm2		; AVX2-SLOW-NEXT: vphaddd %xmm3, %xmm2, %xmm2
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm1[0,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm3 = xmm2[0,2,2,3]
; AVX2-SLOW-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0,1,2],xmm4[0]		; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm1[0]
; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,3]		; AVX2-SLOW-NEXT: vpbroadcastd %xmm4, %xmm5
		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm3 = xmm3[0,1,2],xmm5[3]
		; AVX2-SLOW-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,1]
; AVX2-SLOW-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]		; AVX2-SLOW-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]
; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm3, %xmm1		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm3, %xmm1
; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
; AVX2-SLOW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0		; AVX2-SLOW-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX2-SLOW-NEXT: vphaddd %xmm7, %xmm6, %xmm1		; AVX2-SLOW-NEXT: vphaddd %xmm7, %xmm6, %xmm1
; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm1, %xmm1		; AVX2-SLOW-NEXT: vphaddd %xmm1, %xmm1, %xmm1
; AVX2-SLOW-NEXT: vpbroadcastq %xmm1, %ymm1		; AVX2-SLOW-NEXT: vpbroadcastq %xmm1, %ymm1
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3,4,5],ymm1[6,7]		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3,4,5],ymm1[6,7]
; AVX2-SLOW-NEXT: retq		; AVX2-SLOW-NEXT: retq
;		;
; AVX2-FAST-LABEL: pair_sum_v8i32_v4i32:		; AVX2-FAST-LABEL: pair_sum_v8i32_v4i32:
; AVX2-FAST: # %bb.0:		; AVX2-FAST: # %bb.0:
; AVX2-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0		; AVX2-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm0
; AVX2-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0		; AVX2-FAST-NEXT: vphaddd %xmm0, %xmm0, %xmm0
; AVX2-FAST-NEXT: vphaddd %xmm4, %xmm4, %xmm1		; AVX2-FAST-NEXT: vphaddd %xmm4, %xmm4, %xmm1
; AVX2-FAST-NEXT: vphaddd %xmm5, %xmm5, %xmm4		; AVX2-FAST-NEXT: vphaddd %xmm5, %xmm5, %xmm4
; AVX2-FAST-NEXT: vphaddd %xmm3, %xmm2, %xmm2		; AVX2-FAST-NEXT: vphaddd %xmm3, %xmm2, %xmm2
; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm3 = xmm2[0,2],xmm1[0,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm3 = xmm2[0,2,2,3]
; AVX2-FAST-NEXT: vinsertps {{.*#+}} xmm3 = xmm3[0,1,2],xmm4[0]		; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm3 = xmm3[0],xmm1[0]
; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,3]		; AVX2-FAST-NEXT: vpbroadcastd %xmm4, %xmm5
		; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm3 = xmm3[0,1,2],xmm5[3]
		; AVX2-FAST-NEXT: vshufps {{.*#+}} xmm1 = xmm2[1,3],xmm1[1,1]
; AVX2-FAST-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]		; AVX2-FAST-NEXT: vblendps {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]
; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm3, %xmm1		; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm3, %xmm1
; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
; AVX2-FAST-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0		; AVX2-FAST-NEXT: vinserti128 $1, %xmm1, %ymm0, %ymm0
; AVX2-FAST-NEXT: vphaddd %xmm7, %xmm6, %xmm1		; AVX2-FAST-NEXT: vphaddd %xmm7, %xmm6, %xmm1
; AVX2-FAST-NEXT: vphaddd %xmm0, %xmm1, %xmm1		; AVX2-FAST-NEXT: vphaddd %xmm0, %xmm1, %xmm1
; AVX2-FAST-NEXT: vpbroadcastq %xmm1, %ymm1		; AVX2-FAST-NEXT: vpbroadcastq %xmm1, %ymm1
[... 271 lines not shown ...]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[1,1,1,1]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm2[1,1,1,1]
; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1		; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1
; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm4[0],xmm1[0]		; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm4[0],xmm1[0]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm2[3,3,3,3]		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm2[3,3,3,3]
; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm4[0]		; AVX2-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm4[0]
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm2 = xmm5[0,1],xmm2[2,3]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm3[1,1,1,1]
; AVX2-SLOW-NEXT: vpbroadcastd %xmm3, %xmm5
; AVX2-SLOW-NEXT: vpaddd %xmm5, %xmm4, %xmm4
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]
; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm3[2,2,2,2]
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm2 = xmm2[0,1,2],xmm4[3]
; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm3[3]
; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm0, %xmm0		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm1 = xmm5[0,1],xmm2[2,3]
		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm0, %xmm0
		; AVX2-SLOW-NEXT: vpbroadcastq %xmm3, %xmm1
		; AVX2-SLOW-NEXT: vpbroadcastd %xmm3, %xmm2
		; AVX2-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm3[2,2,2,2]
		; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm4, %xmm2
		; AVX2-SLOW-NEXT: vpaddd %xmm1, %xmm3, %xmm1
		; AVX2-SLOW-NEXT: vpaddd %xmm2, %xmm1, %xmm1
		; AVX2-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
; AVX2-SLOW-NEXT: retq		; AVX2-SLOW-NEXT: retq
;		;
; AVX2-FAST-LABEL: sequential_sum_v4i32_v4i32:		; AVX2-FAST-LABEL: sequential_sum_v4i32_v4i32:
; AVX2-FAST: # %bb.0:		; AVX2-FAST: # %bb.0:
; AVX2-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm4		; AVX2-FAST-NEXT: vphaddd %xmm1, %xmm0, %xmm4
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm4 = xmm4[0,2,2,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm4 = xmm4[0,2,2,3]
; AVX2-FAST-NEXT: vpunpckhdq {{.*#+}} xmm5 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; AVX2-FAST-NEXT: vpunpckhdq {{.*#+}} xmm5 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[3,3,3,3]
; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]		; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
; AVX2-FAST-NEXT: vphaddd %xmm2, %xmm2, %xmm1		; AVX2-FAST-NEXT: vphaddd %xmm2, %xmm2, %xmm1
; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm4[0],xmm1[0]		; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm1 = xmm4[0],xmm1[0]
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm4 = xmm2[3,3,3,3]		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm4 = xmm2[3,3,3,3]
; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm4[0]		; AVX2-FAST-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm4[0]
; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm2 = xmm5[0,1],xmm2[2,3]
; AVX2-FAST-NEXT: vphaddd %xmm3, %xmm3, %xmm4
; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm5 = xmm3[2,2,2,2]
; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm2 = xmm2[0,1,2],xmm5[3]
; AVX2-FAST-NEXT: vpbroadcastd %xmm4, %xmm4
; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0,1,2],xmm4[3]
; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm3[3]
; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm0, %xmm0		; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm0, %xmm0
; AVX2-FAST-NEXT: vpaddd %xmm2, %xmm0, %xmm0		; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm1 = xmm5[0,1],xmm2[2,3]
		; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm0, %xmm0
		; AVX2-FAST-NEXT: vphaddd %xmm3, %xmm3, %xmm1
		; AVX2-FAST-NEXT: vpshufd {{.*#+}} xmm2 = xmm3[2,2,2,2]
		; AVX2-FAST-NEXT: vpbroadcastd %xmm1, %xmm1
		; AVX2-FAST-NEXT: vpaddd %xmm1, %xmm3, %xmm1
		; AVX2-FAST-NEXT: vpaddd %xmm2, %xmm1, %xmm1
		; AVX2-FAST-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
; AVX2-FAST-NEXT: retq		; AVX2-FAST-NEXT: retq
%5 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 0, i32 4>		%5 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 0, i32 4>
%6 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 1, i32 5>		%6 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 1, i32 5>
%7 = add <2 x i32> %5, %6		%7 = add <2 x i32> %5, %6
%8 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 2, i32 6>		%8 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 2, i32 6>
%9 = add <2 x i32> %8, %7		%9 = add <2 x i32> %8, %7
%10 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 3, i32 7>		%10 = shufflevector <4 x i32> %0, <4 x i32> %1, <2 x i32> <i32 3, i32 7>
%11 = add <2 x i32> %10, %9		%11 = add <2 x i32> %10, %9
[... 247 lines not shown ...]	; AVX-FAST-NEXT: retq
%9 = insertelement <4 x float> undef, float %5, i32 0		%9 = insertelement <4 x float> undef, float %5, i32 0
%10 = insertelement <4 x float> %9, float %6, i32 1		%10 = insertelement <4 x float> %9, float %6, i32 1
%11 = insertelement <4 x float> %10, float %7, i32 2		%11 = insertelement <4 x float> %10, float %7, i32 2
%12 = insertelement <4 x float> %11, float %8, i32 3		%12 = insertelement <4 x float> %11, float %8, i32 3
ret <4 x float> %12		ret <4 x float> %12
}		}

define <4 x i32> @reduction_sum_v4i32_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) {		define <4 x i32> @reduction_sum_v4i32_v4i32(<4 x i32> %0, <4 x i32> %1, <4 x i32> %2, <4 x i32> %3) {
; SSSE3-SLOW-LABEL: reduction_sum_v4i32_v4i32:		; SSSE3-SLOW-LABEL: reduction_sum_v4i32_v4i32:
		deadalnix (Author) [Done]: What about this one?
; SSSE3-SLOW: # %bb.0:		; SSSE3-SLOW: # %bb.0:
; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm4 = xmm0[2,3,2,3]		; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm4 = xmm0[2,3,2,3]
; SSSE3-SLOW-NEXT: paddd %xmm4, %xmm0		; SSSE3-SLOW-NEXT: paddd %xmm4, %xmm0
; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm4 = xmm0[1,1,1,1]		; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm4 = xmm0[1,1,1,1]
; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm5 = xmm1[2,3,2,3]		; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm5 = xmm1[2,3,2,3]
; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm5		; SSSE3-SLOW-NEXT: paddd %xmm1, %xmm5
; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm5[1,1,1,1]		; SSSE3-SLOW-NEXT: pshufd {{.*#+}} xmm1 = xmm5[1,1,1,1]
; SSSE3-SLOW-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm1[0],xmm4[1],xmm1[1]		; SSSE3-SLOW-NEXT: punpckldq {{.*#+}} xmm4 = xmm4[0],xmm1[0],xmm4[1],xmm1[1]
[... 23 lines not shown ...]
; SSSE3-FAST-NEXT: pshufd {{.*#+}} xmm2 = xmm3[2,3,2,3]		; SSSE3-FAST-NEXT: pshufd {{.*#+}} xmm2 = xmm3[2,3,2,3]
; SSSE3-FAST-NEXT: paddd %xmm3, %xmm2		; SSSE3-FAST-NEXT: paddd %xmm3, %xmm2
; SSSE3-FAST-NEXT: phaddd %xmm2, %xmm1		; SSSE3-FAST-NEXT: phaddd %xmm2, %xmm1
; SSSE3-FAST-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]		; SSSE3-FAST-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2],xmm1[0,2]
; SSSE3-FAST-NEXT: retq		; SSSE3-FAST-NEXT: retq
;		;
; AVX1-SLOW-LABEL: reduction_sum_v4i32_v4i32:		; AVX1-SLOW-LABEL: reduction_sum_v4i32_v4i32:
; AVX1-SLOW: # %bb.0:		; AVX1-SLOW: # %bb.0:
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm0[2,3,2,3]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm0[2,3,2,3]
		deadalnix (Author) [Done]: This was a win that gets reverted by D127595.
; AVX1-SLOW-NEXT: vpaddd %xmm4, %xmm0, %xmm0		; AVX1-SLOW-NEXT: vpaddd %xmm4, %xmm0, %xmm0
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm0[1,1,1,1]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm4 = xmm0[1,1,1,1]
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm1[2,3,2,3]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm1[2,3,2,3]
; AVX1-SLOW-NEXT: vpaddd %xmm5, %xmm1, %xmm1		; AVX1-SLOW-NEXT: vpaddd %xmm5, %xmm1, %xmm1
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm1[1,1,1,1]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm1[1,1,1,1]
; AVX1-SLOW-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]		; AVX1-SLOW-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm2[2,3,2,3]		; AVX1-SLOW-NEXT: vpshufd {{.*#+}} xmm5 = xmm2[2,3,2,3]
; AVX1-SLOW-NEXT: vpaddd %xmm5, %xmm2, %xmm2		; AVX1-SLOW-NEXT: vpaddd %xmm5, %xmm2, %xmm2
[... 59 lines not shown ...]

llvm/test/CodeGen/X86/icmp-shift-opt.ll


llvm/test/CodeGen/X86/insert-into-constant-vector.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+sse2 \| FileCheck %s --check-prefixes=X86-SSE,X86-SSE2			; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+sse2 \| FileCheck %s --check-prefixes=X86-SSE,X86-SSE2
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+sse2 \| FileCheck %s --check-prefixes=X64-SSE,X64-SSE2			; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+sse2 \| FileCheck %s --check-prefixes=X64-SSE,X64-SSE2
	; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+sse4.1 \| FileCheck %s --check-prefixes=X86-SSE,X86-SSE4			; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+sse4.1 \| FileCheck %s --check-prefixes=X86-SSE,X86-SSE4
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 \| FileCheck %s --check-prefixes=X64-SSE,X64-SSE4			; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+sse4.1 \| FileCheck %s --check-prefixes=X64-SSE,X64-SSE4
	; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX1			; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX1
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX1			; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX1
	; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX2			; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX2
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX2			; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx2 \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX2
	; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx512f \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX512F			; RUN: llc < %s -disable-peephole -mtriple=i686-unknown-unknown -mattr=+avx512f \| FileCheck %s --check-prefixes=X86-AVX,X86-AVX512F
	; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx512f \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX512F			; RUN: llc < %s -disable-peephole -mtriple=x86_64-unknown-unknown -mattr=+avx512f \| FileCheck %s --check-prefixes=X64-AVX,X64-AVX512F

	define <16 x i8> @elt0_v16i8(i8 %x) {			define <16 x i8> @elt0_v16i8(i8 %x) {
	; X86-SSE2-LABEL: elt0_v16i8:			; X86-SSE2-LABEL: elt0_v16i8:
	; X86-SSE2: # %bb.0:			; X86-SSE2: # %bb.0:
	; X86-SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero			; X86-SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero
	; X86-SSE2-NEXT: andps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE2-NEXT: movaps {{.*#+}} xmm0 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
				; X86-SSE2-NEXT: andnps %xmm1, %xmm0
	; X86-SSE2-NEXT: orps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0			; X86-SSE2-NEXT: orps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
	; X86-SSE2-NEXT: retl			; X86-SSE2-NEXT: retl
	;			;
	; X64-SSE2-LABEL: elt0_v16i8:			; X64-SSE2-LABEL: elt0_v16i8:
	; X64-SSE2: # %bb.0:			; X64-SSE2: # %bb.0:
	; X64-SSE2-NEXT: movd %edi, %xmm0			; X64-SSE2-NEXT: movd %edi, %xmm1
	; X64-SSE2-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE2-NEXT: movdqa {{.*#+}} xmm0 = [0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255]
				; X64-SSE2-NEXT: pandn %xmm1, %xmm0
	; X64-SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0			; X64-SSE2-NEXT: por {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; X64-SSE2-NEXT: retq			; X64-SSE2-NEXT: retq
				RKSimon: These should be fixed by rG17dd1ad14be77c722f7c7c1e4fa273c6f170abea
	;			;
	; X86-SSE4-LABEL: elt0_v16i8:			; X86-SSE4-LABEL: elt0_v16i8:
	; X86-SSE4: # %bb.0:			; X86-SSE4: # %bb.0:
	; X86-SSE4-NEXT: movdqa {{.*#+}} xmm0 = <u,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>			; X86-SSE4-NEXT: movdqa {{.*#+}} xmm0 = <u,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15>
	; X86-SSE4-NEXT: pinsrb $0, {{[0-9]+}}(%esp), %xmm0			; X86-SSE4-NEXT: pinsrb $0, {{[0-9]+}}(%esp), %xmm0
	; X86-SSE4-NEXT: retl			; X86-SSE4-NEXT: retl
	;			;
	; X64-SSE4-LABEL: elt0_v16i8:			; X64-SSE4-LABEL: elt0_v16i8:
	▲ Show 20 Lines • Show All 480 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/insertelement-var-index.ll

	Show First 20 Lines • Show All 2,282 Lines • ▼ Show 20 Lines
	; SSE-NEXT: cmovnsl %eax, %ecx			; SSE-NEXT: cmovnsl %eax, %ecx
	; SSE-NEXT: andl $-2147483648, %ecx # imm = 0x80000000			; SSE-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
	; SSE-NEXT: addl %eax, %ecx			; SSE-NEXT: addl %eax, %ecx
	; SSE-NEXT: # kill: def $eax killed $eax killed $rax			; SSE-NEXT: # kill: def $eax killed $eax killed $rax
	; SSE-NEXT: xorl %edx, %edx			; SSE-NEXT: xorl %edx, %edx
	; SSE-NEXT: divl %ecx			; SSE-NEXT: divl %ecx
	; SSE-NEXT: retq			; SSE-NEXT: retq
	;			;
	; AVX1OR2-LABEL: PR44139:			; AVX1-LABEL: PR44139:
	; AVX1OR2: # %bb.0:			; AVX1: # %bb.0:
	; AVX1OR2-NEXT: vbroadcastsd (%rdi), %ymm0			; AVX1-NEXT: vbroadcastsd (%rdi), %ymm0
	; AVX1OR2-NEXT: movl (%rdi), %eax			; AVX1-NEXT: vpinsrq $1, (%rdi), %xmm0, %xmm1
	; AVX1OR2-NEXT: vmovaps %ymm0, 64(%rdi)			; AVX1-NEXT: vblendps {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm0[4,5,6,7]
	; AVX1OR2-NEXT: vmovaps %ymm0, 96(%rdi)			; AVX1-NEXT: vmovaps %ymm0, 64(%rdi)
	; AVX1OR2-NEXT: vmovaps %ymm0, (%rdi)			; AVX1-NEXT: vmovaps %ymm0, 96(%rdi)
	; AVX1OR2-NEXT: vmovaps %ymm0, 32(%rdi)			; AVX1-NEXT: vmovaps %ymm0, 32(%rdi)
	; AVX1OR2-NEXT: leal 2147483647(%rax), %ecx			; AVX1-NEXT: movl (%rdi), %eax
	; AVX1OR2-NEXT: testl %eax, %eax			; AVX1-NEXT: vmovaps %ymm1, (%rdi)
	; AVX1OR2-NEXT: cmovnsl %eax, %ecx			; AVX1-NEXT: leal 2147483647(%rax), %ecx
	; AVX1OR2-NEXT: andl $-2147483648, %ecx # imm = 0x80000000			; AVX1-NEXT: testl %eax, %eax
	; AVX1OR2-NEXT: addl %eax, %ecx			; AVX1-NEXT: cmovnsl %eax, %ecx
	; AVX1OR2-NEXT: # kill: def $eax killed $eax killed $rax			; AVX1-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
	; AVX1OR2-NEXT: xorl %edx, %edx			; AVX1-NEXT: addl %eax, %ecx
	; AVX1OR2-NEXT: divl %ecx			; AVX1-NEXT: # kill: def $eax killed $eax killed $rax
	; AVX1OR2-NEXT: vzeroupper			; AVX1-NEXT: xorl %edx, %edx
	; AVX1OR2-NEXT: retq			; AVX1-NEXT: divl %ecx
				; AVX1-NEXT: vzeroupper
				; AVX1-NEXT: retq
				;
				; AVX2-LABEL: PR44139:
				; AVX2: # %bb.0:
				; AVX2-NEXT: vpbroadcastq (%rdi), %ymm0
				; AVX2-NEXT: vpinsrq $1, (%rdi), %xmm0, %xmm1
				; AVX2-NEXT: vpblendd {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm0[4,5,6,7]
				; AVX2-NEXT: vmovdqa %ymm0, 64(%rdi)
				; AVX2-NEXT: vmovdqa %ymm0, 96(%rdi)
				; AVX2-NEXT: vmovdqa %ymm0, 32(%rdi)
				; AVX2-NEXT: movl (%rdi), %eax
				; AVX2-NEXT: vmovdqa %ymm1, (%rdi)
				; AVX2-NEXT: leal 2147483647(%rax), %ecx
				; AVX2-NEXT: testl %eax, %eax
				; AVX2-NEXT: cmovnsl %eax, %ecx
				; AVX2-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
				; AVX2-NEXT: addl %eax, %ecx
				; AVX2-NEXT: # kill: def $eax killed $eax killed $rax
				; AVX2-NEXT: xorl %edx, %edx
				; AVX2-NEXT: divl %ecx
				; AVX2-NEXT: vzeroupper
				; AVX2-NEXT: retq
	;			;
	; AVX512-LABEL: PR44139:			; AVX512-LABEL: PR44139:
	; AVX512: # %bb.0:			; AVX512: # %bb.0:
	; AVX512-NEXT: vbroadcastsd (%rdi), %zmm0			; AVX512-NEXT: vmovdqa64 (%rdi), %zmm0
	; AVX512-NEXT: movl (%rdi), %eax			; AVX512-NEXT: vpbroadcastq (%rdi), %zmm1
	; AVX512-NEXT: vmovaps %zmm0, (%rdi)			; AVX512-NEXT: vpmovqd %zmm0, %ymm0
	; AVX512-NEXT: vmovaps %zmm0, 64(%rdi)			; AVX512-NEXT: vpinsrq $1, (%rdi), %xmm1, %xmm2
				RKSimon: We're inserting the same load that we've already broadcast to the entire zmm?
				; AVX512-NEXT: vinserti32x4 $0, %xmm2, %zmm1, %zmm2
				; AVX512-NEXT: vmovdqa64 %zmm1, 64(%rdi)
				; AVX512-NEXT: vmovdqa64 %zmm2, (%rdi)
				; AVX512-NEXT: vmovd %xmm0, %eax
	; AVX512-NEXT: leal 2147483647(%rax), %ecx			; AVX512-NEXT: leal 2147483647(%rax), %ecx
	; AVX512-NEXT: testl %eax, %eax			; AVX512-NEXT: testl %eax, %eax
	; AVX512-NEXT: cmovnsl %eax, %ecx			; AVX512-NEXT: cmovnsl %eax, %ecx
	; AVX512-NEXT: andl $-2147483648, %ecx # imm = 0x80000000			; AVX512-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
	; AVX512-NEXT: addl %eax, %ecx			; AVX512-NEXT: addl %eax, %ecx
	; AVX512-NEXT: # kill: def $eax killed $eax killed $rax			; AVX512-NEXT: # kill: def $eax killed $eax killed $rax
	; AVX512-NEXT: xorl %edx, %edx			; AVX512-NEXT: xorl %edx, %edx
	; AVX512-NEXT: divl %ecx			; AVX512-NEXT: divl %ecx
	; AVX512-NEXT: vzeroupper			; AVX512-NEXT: vzeroupper
	; AVX512-NEXT: retq			; AVX512-NEXT: retq
	;			;
	; X86AVX2-LABEL: PR44139:			; X86AVX2-LABEL: PR44139:
	; X86AVX2: # %bb.0:			; X86AVX2: # %bb.0:
	; X86AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx			; X86AVX2-NEXT: movl {{[0-9]+}}(%esp), %ecx
	; X86AVX2-NEXT: movl (%ecx), %eax
	; X86AVX2-NEXT: vbroadcastsd (%ecx), %ymm0			; X86AVX2-NEXT: vbroadcastsd (%ecx), %ymm0
				; X86AVX2-NEXT: vunpcklpd {{.*#+}} xmm1 = xmm0[0],mem[0]
				; X86AVX2-NEXT: vblendps {{.*#+}} ymm1 = ymm1[0,1,2,3],ymm0[4,5,6,7]
	; X86AVX2-NEXT: vmovaps %ymm0, 64(%ecx)			; X86AVX2-NEXT: vmovaps %ymm0, 64(%ecx)
	; X86AVX2-NEXT: vmovaps %ymm0, 96(%ecx)			; X86AVX2-NEXT: vmovaps %ymm0, 96(%ecx)
	; X86AVX2-NEXT: vmovaps %ymm0, (%ecx)
	; X86AVX2-NEXT: vmovaps %ymm0, 32(%ecx)			; X86AVX2-NEXT: vmovaps %ymm0, 32(%ecx)
				; X86AVX2-NEXT: movl (%ecx), %eax
				; X86AVX2-NEXT: vmovaps %ymm1, (%ecx)
	; X86AVX2-NEXT: leal 2147483647(%eax), %ecx			; X86AVX2-NEXT: leal 2147483647(%eax), %ecx
	; X86AVX2-NEXT: testl %eax, %eax			; X86AVX2-NEXT: testl %eax, %eax
	; X86AVX2-NEXT: cmovnsl %eax, %ecx			; X86AVX2-NEXT: cmovnsl %eax, %ecx
	; X86AVX2-NEXT: andl $-2147483648, %ecx # imm = 0x80000000			; X86AVX2-NEXT: andl $-2147483648, %ecx # imm = 0x80000000
	; X86AVX2-NEXT: addl %eax, %ecx			; X86AVX2-NEXT: addl %eax, %ecx
	; X86AVX2-NEXT: xorl %edx, %edx			; X86AVX2-NEXT: xorl %edx, %edx
	; X86AVX2-NEXT: divl %ecx			; X86AVX2-NEXT: divl %ecx
	; X86AVX2-NEXT: vzeroupper			; X86AVX2-NEXT: vzeroupper
	Show All 14 Lines

llvm/test/CodeGen/X86/is_fpclass-fp80.ll

llvm/test/CodeGen/X86/isel-blendi-gettargetconstant.ll

llvm/test/CodeGen/X86/masked_store.ll

llvm/test/CodeGen/X86/movmsk-cmp.ll

llvm/test/CodeGen/X86/mulvi32.ll

llvm/test/CodeGen/X86/nontemporal-3.ll

llvm/test/CodeGen/X86/pmulh.ll

llvm/test/CodeGen/X86/popcnt.ll

llvm/test/CodeGen/X86/pr53419.ll

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	bb:
%lhs = load <4 x i8>, ptr %arg1, align 1		%lhs = load <4 x i8>, ptr %arg1, align 1
%rhs = load <4 x i8>, ptr %arg, align 1		%rhs = load <4 x i8>, ptr %arg, align 1
%cmp = icmp eq <4 x i8> %lhs, %rhs		%cmp = icmp eq <4 x i8> %lhs, %rhs
%all_eq = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %cmp)		%all_eq = call i1 @llvm.vector.reduce.and.v4i1(<4 x i1> %cmp)
ret i1 %all_eq		ret i1 %all_eq
}		}

define i1 @intrinsic_v8i8(ptr align 1 %arg, ptr align 1 %arg1) {		define i1 @intrinsic_v8i8(ptr align 1 %arg, ptr align 1 %arg1) {
; SSE-LABEL: intrinsic_v8i8:		; SSE2-LABEL: intrinsic_v8i8:
; SSE: # %bb.0: # %bb		; SSE2: # %bb.0: # %bb
; SSE-NEXT: movq {{.*#+}} xmm0 = mem[0],zero		; SSE2-NEXT: movq {{.*#+}} xmm0 = mem[0],zero
; SSE-NEXT: movq {{.*#+}} xmm1 = mem[0],zero		; SSE2-NEXT: movq {{.*#+}} xmm1 = mem[0],zero
; SSE-NEXT: pcmpeqb %xmm0, %xmm1		; SSE2-NEXT: pcmpeqb %xmm0, %xmm1
; SSE-NEXT: pmovmskb %xmm1, %eax		; SSE2-NEXT: pmovmskb %xmm1, %eax
; SSE-NEXT: cmpb $-1, %al		; SSE2-NEXT: cmpb $-1, %al
; SSE-NEXT: sete %al		; SSE2-NEXT: sete %al
; SSE-NEXT: retq		; SSE2-NEXT: retq
		;
		; SSE42-LABEL: intrinsic_v8i8:
		; SSE42: # %bb.0: # %bb
		; SSE42-NEXT: pmovzxbw {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
		; SSE42-NEXT: pmovzxbw {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
		; SSE42-NEXT: pcmpeqw %xmm0, %xmm1
		; SSE42-NEXT: packsswb %xmm1, %xmm1
		; SSE42-NEXT: pmovmskb %xmm1, %eax
		; SSE42-NEXT: cmpb $-1, %al
		; SSE42-NEXT: sete %al
		; SSE42-NEXT: retq
;		;
; AVX-LABEL: intrinsic_v8i8:		; AVX-LABEL: intrinsic_v8i8:
; AVX: # %bb.0: # %bb		; AVX: # %bb.0: # %bb
; AVX-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; AVX-NEXT: vpmovzxbw {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; AVX-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero		; AVX-NEXT: vpmovzxbw {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; AVX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0		; AVX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
		; AVX-NEXT: vpacksswb %xmm0, %xmm0, %xmm0
; AVX-NEXT: vpmovmskb %xmm0, %eax		; AVX-NEXT: vpmovmskb %xmm0, %eax
; AVX-NEXT: cmpb $-1, %al		; AVX-NEXT: cmpb $-1, %al
; AVX-NEXT: sete %al		; AVX-NEXT: sete %al
; AVX-NEXT: retq		; AVX-NEXT: retq
;		;
; X86-LABEL: intrinsic_v8i8:		; X86-LABEL: intrinsic_v8i8:
; X86: # %bb.0: # %bb		; X86: # %bb.0: # %bb
; X86-NEXT: movl {{[0-9]+}}(%esp), %eax		; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx		; X86-NEXT: movl {{[0-9]+}}(%esp), %ecx
; X86-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero		; X86-NEXT: vpmovzxbw {{.*#+}} xmm0 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; X86-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero		; X86-NEXT: vpmovzxbw {{.*#+}} xmm1 = mem[0],zero,mem[1],zero,mem[2],zero,mem[3],zero,mem[4],zero,mem[5],zero,mem[6],zero,mem[7],zero
; X86-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0		; X86-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
		; X86-NEXT: vpacksswb %xmm0, %xmm0, %xmm0
		deadalnix (Author): This file has now regressed :'(
; X86-NEXT: vpmovmskb %xmm0, %eax		; X86-NEXT: vpmovmskb %xmm0, %eax
; X86-NEXT: cmpb $-1, %al		; X86-NEXT: cmpb $-1, %al
; X86-NEXT: sete %al		; X86-NEXT: sete %al
; X86-NEXT: retl		; X86-NEXT: retl
bb:		bb:
%lhs = load <8 x i8>, ptr %arg1, align 1		%lhs = load <8 x i8>, ptr %arg1, align 1
%rhs = load <8 x i8>, ptr %arg, align 1		%rhs = load <8 x i8>, ptr %arg, align 1
%cmp = icmp eq <8 x i8> %lhs, %rhs		%cmp = icmp eq <8 x i8> %lhs, %rhs
▲ Show 20 Lines • Show All 123 Lines • ▼ Show 20 Lines
; X86-NEXT: sete %al		; X86-NEXT: sete %al
; X86-NEXT: retl		; X86-NEXT: retl
bb:		bb:
%lhs = load i32, ptr %arg1, align 1		%lhs = load i32, ptr %arg1, align 1
%rhs = load i32, ptr %arg, align 1		%rhs = load i32, ptr %arg, align 1
%all_eq = icmp eq i32 %lhs, %rhs		%all_eq = icmp eq i32 %lhs, %rhs
ret i1 %all_eq		ret i1 %all_eq
}		}
		;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
		; SSE: {{.*}}

llvm/test/CodeGen/X86/promote-vec3.ll

llvm/test/CodeGen/X86/psubus.ll

Show First 20 Lines • Show All 739 Lines • ▼ Show 20 Lines
; SSE2-NEXT: pcmpgtd %xmm3, %xmm5		; SSE2-NEXT: pcmpgtd %xmm3, %xmm5
; SSE2-NEXT: pxor %xmm5, %xmm4		; SSE2-NEXT: pxor %xmm5, %xmm4
; SSE2-NEXT: pand %xmm1, %xmm5		; SSE2-NEXT: pand %xmm1, %xmm5
; SSE2-NEXT: por %xmm4, %xmm5		; SSE2-NEXT: por %xmm4, %xmm5
; SSE2-NEXT: pslld $16, %xmm5		; SSE2-NEXT: pslld $16, %xmm5
; SSE2-NEXT: psrad $16, %xmm5		; SSE2-NEXT: psrad $16, %xmm5
; SSE2-NEXT: packssdw %xmm6, %xmm5		; SSE2-NEXT: packssdw %xmm6, %xmm5
; SSE2-NEXT: psubusw %xmm5, %xmm0		; SSE2-NEXT: psubusw %xmm5, %xmm0
; SSE2-NEXT: retq		; SSE2-NEXT: retq
		RKSimon: Looks like we've managed to break DAGCombiner::foldSubToUSubSat / getTruncatedUSUBSAT somewhere.
;		;
; SSSE3-LABEL: test13:		; SSSE3-LABEL: test13:
; SSSE3: # %bb.0: # %vector.ph		; SSSE3: # %bb.0: # %vector.ph
; SSSE3-NEXT: movdqa {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]		; SSSE3-NEXT: movdqa {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
; SSSE3-NEXT: movdqa %xmm2, %xmm4		; SSSE3-NEXT: movdqa %xmm2, %xmm4
; SSSE3-NEXT: pxor %xmm3, %xmm4		; SSSE3-NEXT: pxor %xmm3, %xmm4
; SSSE3-NEXT: movdqa {{.*#+}} xmm5 = [2147549183,2147549183,2147549183,2147549183]		; SSSE3-NEXT: movdqa {{.*#+}} xmm5 = [2147549183,2147549183,2147549183,2147549183]
; SSSE3-NEXT: movdqa %xmm5, %xmm6		; SSSE3-NEXT: movdqa %xmm5, %xmm6
▲ Show 20 Lines • Show All 1,032 Lines • ▼ Show 20 Lines	vector.ph:
%res = trunc <8 x i64> %sub to <8 x i16>		%res = trunc <8 x i64> %sub to <8 x i16>
ret <8 x i16> %res		ret <8 x i16> %res
}		}

define <16 x i16> @psubus_16i32_max(<16 x i16> %x, <16 x i32> %y) nounwind {		define <16 x i16> @psubus_16i32_max(<16 x i16> %x, <16 x i32> %y) nounwind {
; SSE2OR3-LABEL: psubus_16i32_max:		; SSE2OR3-LABEL: psubus_16i32_max:
; SSE2OR3: # %bb.0: # %vector.ph		; SSE2OR3: # %bb.0: # %vector.ph
; SSE2OR3-NEXT: movdqa {{.*#+}} xmm7 = [2147483648,2147483648,2147483648,2147483648]		; SSE2OR3-NEXT: movdqa {{.*#+}} xmm7 = [2147483648,2147483648,2147483648,2147483648]
; SSE2OR3-NEXT: movdqa %xmm3, %xmm8		; SSE2OR3-NEXT: movdqa %xmm5, %xmm8
; SSE2OR3-NEXT: pxor %xmm7, %xmm8		; SSE2OR3-NEXT: pxor %xmm7, %xmm8
; SSE2OR3-NEXT: movdqa {{.*#+}} xmm6 = [2147549183,2147549183,2147549183,2147549183]		; SSE2OR3-NEXT: movdqa {{.*#+}} xmm6 = [2147549183,2147549183,2147549183,2147549183]
; SSE2OR3-NEXT: movdqa %xmm6, %xmm9		; SSE2OR3-NEXT: movdqa %xmm6, %xmm9
; SSE2OR3-NEXT: pcmpgtd %xmm8, %xmm9		; SSE2OR3-NEXT: pcmpgtd %xmm8, %xmm9
; SSE2OR3-NEXT: pcmpeqd %xmm8, %xmm8		; SSE2OR3-NEXT: pcmpeqd %xmm8, %xmm8
		; SSE2OR3-NEXT: pand %xmm9, %xmm5
		; SSE2OR3-NEXT: pxor %xmm8, %xmm9
		; SSE2OR3-NEXT: por %xmm5, %xmm9
		; SSE2OR3-NEXT: pslld $16, %xmm9
		; SSE2OR3-NEXT: psrad $16, %xmm9
		; SSE2OR3-NEXT: movdqa %xmm4, %xmm10
		; SSE2OR3-NEXT: pxor %xmm7, %xmm10
		; SSE2OR3-NEXT: movdqa %xmm6, %xmm5
		; SSE2OR3-NEXT: pcmpgtd %xmm10, %xmm5
		; SSE2OR3-NEXT: pand %xmm5, %xmm4
		; SSE2OR3-NEXT: pxor %xmm8, %xmm5
		; SSE2OR3-NEXT: por %xmm4, %xmm5
		; SSE2OR3-NEXT: pslld $16, %xmm5
		; SSE2OR3-NEXT: psrad $16, %xmm5
		; SSE2OR3-NEXT: packssdw %xmm9, %xmm5
		; SSE2OR3-NEXT: movdqa %xmm3, %xmm4
		; SSE2OR3-NEXT: pxor %xmm7, %xmm4
		; SSE2OR3-NEXT: movdqa %xmm6, %xmm9
		; SSE2OR3-NEXT: pcmpgtd %xmm4, %xmm9
; SSE2OR3-NEXT: pand %xmm9, %xmm3		; SSE2OR3-NEXT: pand %xmm9, %xmm3
; SSE2OR3-NEXT: pxor %xmm8, %xmm9		; SSE2OR3-NEXT: pxor %xmm8, %xmm9
; SSE2OR3-NEXT: por %xmm3, %xmm9		; SSE2OR3-NEXT: por %xmm3, %xmm9
; SSE2OR3-NEXT: pslld $16, %xmm9		; SSE2OR3-NEXT: pslld $16, %xmm9
; SSE2OR3-NEXT: psrad $16, %xmm9		; SSE2OR3-NEXT: psrad $16, %xmm9
; SSE2OR3-NEXT: movdqa %xmm2, %xmm3		; SSE2OR3-NEXT: pxor %xmm2, %xmm7
; SSE2OR3-NEXT: pxor %xmm7, %xmm3
; SSE2OR3-NEXT: movdqa %xmm6, %xmm10
; SSE2OR3-NEXT: pcmpgtd %xmm3, %xmm10
; SSE2OR3-NEXT: pand %xmm10, %xmm2
; SSE2OR3-NEXT: pxor %xmm8, %xmm10
; SSE2OR3-NEXT: por %xmm2, %xmm10
; SSE2OR3-NEXT: pslld $16, %xmm10
; SSE2OR3-NEXT: psrad $16, %xmm10
; SSE2OR3-NEXT: packssdw %xmm9, %xmm10
; SSE2OR3-NEXT: psubusw %xmm10, %xmm0
; SSE2OR3-NEXT: movdqa %xmm5, %xmm2
; SSE2OR3-NEXT: pxor %xmm7, %xmm2
; SSE2OR3-NEXT: movdqa %xmm6, %xmm3
; SSE2OR3-NEXT: pcmpgtd %xmm2, %xmm3
; SSE2OR3-NEXT: pand %xmm3, %xmm5
; SSE2OR3-NEXT: pxor %xmm8, %xmm3
; SSE2OR3-NEXT: por %xmm5, %xmm3
; SSE2OR3-NEXT: pslld $16, %xmm3
; SSE2OR3-NEXT: psrad $16, %xmm3
; SSE2OR3-NEXT: pxor %xmm4, %xmm7
; SSE2OR3-NEXT: pcmpgtd %xmm7, %xmm6		; SSE2OR3-NEXT: pcmpgtd %xmm7, %xmm6
; SSE2OR3-NEXT: pxor %xmm6, %xmm8		; SSE2OR3-NEXT: pxor %xmm6, %xmm8
; SSE2OR3-NEXT: pand %xmm4, %xmm6		; SSE2OR3-NEXT: pand %xmm2, %xmm6
; SSE2OR3-NEXT: por %xmm8, %xmm6		; SSE2OR3-NEXT: por %xmm8, %xmm6
; SSE2OR3-NEXT: pslld $16, %xmm6		; SSE2OR3-NEXT: pslld $16, %xmm6
; SSE2OR3-NEXT: psrad $16, %xmm6		; SSE2OR3-NEXT: psrad $16, %xmm6
; SSE2OR3-NEXT: packssdw %xmm3, %xmm6		; SSE2OR3-NEXT: packssdw %xmm9, %xmm6
; SSE2OR3-NEXT: psubusw %xmm6, %xmm1		; SSE2OR3-NEXT: psubusw %xmm6, %xmm0
		; SSE2OR3-NEXT: psubusw %xmm5, %xmm1
; SSE2OR3-NEXT: retq		; SSE2OR3-NEXT: retq
;		;
; SSE41-LABEL: psubus_16i32_max:		; SSE41-LABEL: psubus_16i32_max:
; SSE41: # %bb.0: # %vector.ph		; SSE41: # %bb.0: # %vector.ph
; SSE41-NEXT: movdqa {{.*#+}} xmm6 = [65535,65535,65535,65535]		; SSE41-NEXT: movdqa {{.*#+}} xmm6 = [65535,65535,65535,65535]
		; SSE41-NEXT: pminud %xmm6, %xmm5
		; SSE41-NEXT: pminud %xmm6, %xmm4
		; SSE41-NEXT: packusdw %xmm5, %xmm4
; SSE41-NEXT: pminud %xmm6, %xmm3		; SSE41-NEXT: pminud %xmm6, %xmm3
; SSE41-NEXT: pminud %xmm6, %xmm2		; SSE41-NEXT: pminud %xmm6, %xmm2
; SSE41-NEXT: packusdw %xmm3, %xmm2		; SSE41-NEXT: packusdw %xmm3, %xmm2
; SSE41-NEXT: psubusw %xmm2, %xmm0		; SSE41-NEXT: psubusw %xmm2, %xmm0
; SSE41-NEXT: pminud %xmm6, %xmm5
; SSE41-NEXT: pminud %xmm6, %xmm4
; SSE41-NEXT: packusdw %xmm5, %xmm4
; SSE41-NEXT: psubusw %xmm4, %xmm1		; SSE41-NEXT: psubusw %xmm4, %xmm1
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX1-LABEL: psubus_16i32_max:		; AVX1-LABEL: psubus_16i32_max:
; AVX1: # %bb.0: # %vector.ph		; AVX1: # %bb.0: # %vector.ph
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3		; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3
; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [65535,65535,65535,65535]		; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [65535,65535,65535,65535]
; AVX1-NEXT: vpminud %xmm4, %xmm3, %xmm3		; AVX1-NEXT: vpminud %xmm4, %xmm3, %xmm3
▲ Show 20 Lines • Show All 1,284 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/shift-mask.ll

llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll

llvm/test/CodeGen/X86/single_elt_vector_memory_operation.ll

llvm/test/CodeGen/X86/smax.ll

llvm/test/CodeGen/X86/smin.ll

llvm/test/CodeGen/X86/umax.ll

llvm/test/CodeGen/X86/umin.ll

llvm/test/CodeGen/X86/v8i1-masks.ll

	Show First 20 Lines • Show All 140 Lines • ▼ Show 20 Lines
	; X64-NEXT: vpxor %xmm2, %xmm2, %xmm2			; X64-NEXT: vpxor %xmm2, %xmm2, %xmm2
	; X64-NEXT: vpcmpeqd %xmm2, %xmm1, %xmm1			; X64-NEXT: vpcmpeqd %xmm2, %xmm1, %xmm1
	; X64-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0			; X64-NEXT: vpcmpeqd %xmm2, %xmm0, %xmm0
	; X64-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0			; X64-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
	; X64-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; X64-NEXT: vandps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-AVX2-LABEL: and_mask_constant:			; X32-AVX2-LABEL: and_mask_constant:
	; X32-AVX2: ## %bb.0:			; X32-AVX2: ## %bb.0:
				deadalnix (Author): We have a new regression here :(
				RKSimon: Same regression; I recently refactored the file to add AVX512 coverage.
	; X32-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X32-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X32-AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0			; X32-AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0
	; X32-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}, %ymm0, %ymm0			; X32-AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3],ymm1[4],ymm0[5,6],ymm1[7]
				; X32-AVX2-NEXT: vpbroadcastd {{.*#+}} ymm1 = [1,1,1,1,1,1,1,1]
				; X32-AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
				deadalnix (Author): This is new :(
				deadalnix (Author): @RKSimon I was able to reduce this to D129641. This regression didn't exist before D129641, but it does now.
				RKSimon: Sorry, I should have said: yes, this is caused by more yak shaving. I have a large number of other patches in flight that should help, but I'll add this one to the list.
				deadalnix (Author): Maybe I can help review some?
	; X32-AVX2-NEXT: retl			; X32-AVX2-NEXT: retl
	;			;
	; X64-AVX2-LABEL: and_mask_constant:			; X64-AVX2-LABEL: and_mask_constant:
	; X64-AVX2: ## %bb.0:			; X64-AVX2: ## %bb.0:
	; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-AVX2-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0			; X64-AVX2-NEXT: vpcmpeqd %ymm1, %ymm0, %ymm0
	; X64-AVX2-NEXT: vpand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %ymm0, %ymm0			; X64-AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2],ymm0[3],ymm1[4],ymm0[5,6],ymm1[7]
				; X64-AVX2-NEXT: vpbroadcastd {{.*#+}} ymm1 = [1,1,1,1,1,1,1,1]
				; X64-AVX2-NEXT: vpand %ymm1, %ymm0, %ymm0
	; X64-AVX2-NEXT: retq			; X64-AVX2-NEXT: retq
	%m = icmp eq <8 x i32> %v0, zeroinitializer			%m = icmp eq <8 x i32> %v0, zeroinitializer
	%mand = and <8 x i1> %m, <i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false>			%mand = and <8 x i1> %m, <i1 true, i1 false, i1 false, i1 true, i1 false, i1 true, i1 true, i1 false>
	%r = zext <8 x i1> %mand to <8 x i32>			%r = zext <8 x i1> %mand to <8 x i32>
	ret <8 x i32> %r			ret <8 x i32> %r
	}			}

	define <8 x i32> @two_ands(<8 x float> %x) local_unnamed_addr #0 {			define <8 x i32> @two_ands(<8 x float> %x) local_unnamed_addr #0 {
	▲ Show 20 Lines • Show All 933 Lines • Show Last 20 Lines

llvm/test/CodeGen/X86/vector-fshl-256.ll

llvm/test/CodeGen/X86/vector-fshl-512.ll

llvm/test/CodeGen/X86/vector-fshl-rot-256.ll

llvm/test/CodeGen/X86/vector-fshl-rot-512.ll

llvm/test/CodeGen/X86/vector-fshr-256.ll

llvm/test/CodeGen/X86/vector-fshr-512.ll

llvm/test/CodeGen/X86/vector-fshr-rot-256.ll

llvm/test/CodeGen/X86/vector-fshr-rot-512.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-6.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-8.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-2.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-3.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-4.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-5.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-6.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i32-stride-8.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-2.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-3.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-4.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-5.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-6.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i64-stride-8.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-2.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-5.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-6.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i8-stride-8.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i32-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-5.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-6.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-7.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i64-stride-8.ll

llvm/test/CodeGen/X86/vector-interleaved-store-i8-stride-7.ll

llvm/test/CodeGen/X86/vector-reduce-and-cmp.ll

llvm/test/CodeGen/X86/vector-reduce-and.ll

llvm/test/CodeGen/X86/vector-reduce-or.ll

llvm/test/CodeGen/X86/vector-reduce-xor.ll

llvm/test/CodeGen/X86/vector-replicaton-i1-mask.ll

llvm/test/CodeGen/X86/vector-rotate-256.ll

llvm/test/CodeGen/X86/vector-rotate-512.ll

llvm/test/CodeGen/X86/vector-shuffle-combining.ll

Show First 20 Lines • Show All 2,985 Lines • ▼ Show 20 Lines	; AVX-NEXT: retq
%6 = insertelement <8 x i16> %5, i16 %a4, i32 6		%6 = insertelement <8 x i16> %5, i16 %a4, i32 6
%7 = insertelement <8 x i16> %6, i16 %b15, i32 7		%7 = insertelement <8 x i16> %6, i16 %b15, i32 7
ret <8 x i16> %7		ret <8 x i16> %7
}		}

define <8 x i16> @shuffle_extract_concat_insert(<4 x i16> %lhsa, <4 x i16> %rhsa, <8 x i16> %b) {		define <8 x i16> @shuffle_extract_concat_insert(<4 x i16> %lhsa, <4 x i16> %rhsa, <8 x i16> %b) {
; SSE2-LABEL: shuffle_extract_concat_insert:		; SSE2-LABEL: shuffle_extract_concat_insert:
; SSE2: # %bb.0:		; SSE2: # %bb.0:
; SSE2-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE2-NEXT: movd %xmm1, %eax
; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[0,2,2,3,4,5,6,7]		; SSE2-NEXT: pextrw $2, %xmm1, %ecx
; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,6,6,7]		; SSE2-NEXT: pextrw $5, %xmm2, %edx
		; SSE2-NEXT: pextrw $7, %xmm2, %esi
		; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
		; SSE2-NEXT: pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,7,6,7]
; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]		; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,2,2,3]
; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[1,0,3,2,4,5,6,7]		; SSE2-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,0,3,4,5,6,7]
; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm2[0,3,2,3,4,5,6,7]		; SSE2-NEXT: pinsrw $4, %ecx, %xmm0
; SSE2-NEXT: pshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,7,5,6,7]		; SSE2-NEXT: pinsrw $5, %edx, %xmm0
; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,2,2,3]		; SSE2-NEXT: pinsrw $6, %eax, %xmm0
; SSE2-NEXT: pshuflw {{.*#+}} xmm1 = xmm1[0,1,3,2,4,5,6,7]		; SSE2-NEXT: pinsrw $7, %esi, %xmm0
; SSE2-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; SSE2-NEXT: retq		; SSE2-NEXT: retq
;		;
; SSSE3-LABEL: shuffle_extract_concat_insert:		; SSSE3-LABEL: shuffle_extract_concat_insert:
; SSSE3: # %bb.0:		; SSSE3: # %bb.0:
; SSSE3-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSSE3-NEXT: pextrw $2, %xmm1, %eax
; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[4,5,0,1,12,13,8,9,u,u,u,u,u,u,u,u]		; SSSE3-NEXT: pextrw $5, %xmm2, %ecx
; SSSE3-NEXT: pshufb {{.*#+}} xmm2 = xmm2[0,1,6,7,10,11,14,15,u,u,u,u,u,u,u,u]		; SSSE3-NEXT: movd %xmm1, %edx
		; SSSE3-NEXT: pextrw $7, %xmm2, %esi
; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSSE3-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
		; SSSE3-NEXT: pshufb {{.*#+}} xmm0 = xmm0[8,9,2,3,0,1,14,15,u,u,u,u,u,u,u,u]
		; SSSE3-NEXT: pinsrw $4, %eax, %xmm0
		; SSSE3-NEXT: pinsrw $5, %ecx, %xmm0
		; SSSE3-NEXT: pinsrw $6, %edx, %xmm0
		; SSSE3-NEXT: pinsrw $7, %esi, %xmm0
; SSSE3-NEXT: retq		; SSSE3-NEXT: retq
;		;
; SSE41-LABEL: shuffle_extract_concat_insert:		; SSE41-LABEL: shuffle_extract_concat_insert:
; SSE41: # %bb.0:		; SSE41: # %bb.0:
; SSE41-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; SSE41-NEXT: pextrw $2, %xmm1, %eax
; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[4,5,0,1,12,13,8,9,u,u,u,u,u,u,u,u]
; SSE41-NEXT: pshufb {{.*#+}} xmm2 = xmm2[0,1,6,7,10,11,14,15,u,u,u,u,u,u,u,u]
; SSE41-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]		; SSE41-NEXT: punpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
		; SSE41-NEXT: pshufb {{.*#+}} xmm0 = xmm0[8,9,2,3,0,1,14,15,u,u,u,u,12,13,14,15]
		; SSE41-NEXT: movd %xmm1, %ecx
		; SSE41-NEXT: pinsrw $4, %eax, %xmm0
		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4],xmm2[5],xmm0[6,7]
		; SSE41-NEXT: pinsrw $6, %ecx, %xmm0
		; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6],xmm2[7]
; SSE41-NEXT: retq		; SSE41-NEXT: retq
;		;
; AVX-LABEL: shuffle_extract_concat_insert:		; AVX-LABEL: shuffle_extract_concat_insert:
; AVX: # %bb.0:		; AVX: # %bb.0:
; AVX-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]		; AVX-NEXT: vpextrw $2, %xmm1, %eax
; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[4,5,0,1,12,13,8,9,u,u,u,u,u,u,u,u]		; AVX-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1],xmm0[2],xmm2[2],xmm0[3],xmm2[3]
; AVX-NEXT: vpshufb {{.*#+}} xmm1 = xmm2[0,1,6,7,10,11,14,15,u,u,u,u,u,u,u,u]		; AVX-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[8,9,2,3,0,1,14,15,u,u,u,u,12,13,14,15]
; AVX-NEXT: vpunpcklwd {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3]		; AVX-NEXT: vmovd %xmm1, %ecx
		; AVX-NEXT: vpinsrw $4, %eax, %xmm0, %xmm0
		; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4],xmm2[5],xmm0[6,7]
		; AVX-NEXT: vpinsrw $6, %ecx, %xmm0, %xmm0
		; AVX-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3,4,5,6],xmm2[7]
; AVX-NEXT: retq		; AVX-NEXT: retq
		RKSimon: It looks like this no longer folds with DAGCombiner::combineInsertEltToShuffle
%a = shufflevector <4 x i16> %lhsa, <4 x i16> %rhsa, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>		%a = shufflevector <4 x i16> %lhsa, <4 x i16> %rhsa, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
%a0 = extractelement <8 x i16> %a, i32 0		%a0 = extractelement <8 x i16> %a, i32 0
%a4 = extractelement <8 x i16> %a, i32 4		%a4 = extractelement <8 x i16> %a, i32 4
%a6 = extractelement <8 x i16> %a, i32 6		%a6 = extractelement <8 x i16> %a, i32 6
%b11 = extractelement <8 x i16> %b, i32 3		%b11 = extractelement <8 x i16> %b, i32 3
%b13 = extractelement <8 x i16> %b, i32 5		%b13 = extractelement <8 x i16> %b, i32 5
%b15 = extractelement <8 x i16> %b, i32 7		%b15 = extractelement <8 x i16> %b, i32 7
%1 = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 2, i32 8, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>		%1 = shufflevector <8 x i16> %a, <8 x i16> %b, <8 x i32> <i32 2, i32 8, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>

llvm/test/CodeGen/X86/vector-shuffle-concatenation.ll

llvm/test/CodeGen/X86/vector-shuffle-sse4a.ll

	;			;

	; Out of range.			; Out of range.
	define <16 x i8> @shuffle_8_18_uuuuuuuuuuuuuu(<16 x i8> %a, <16 x i8> %b) {			define <16 x i8> @shuffle_8_18_uuuuuuuuuuuuuu(<16 x i8> %a, <16 x i8> %b) {
	; AMD10H-LABEL: shuffle_8_18_uuuuuuuuuuuuuu:			; AMD10H-LABEL: shuffle_8_18_uuuuuuuuuuuuuu:
	; AMD10H: # %bb.0:			; AMD10H: # %bb.0:
	; AMD10H-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]			; AMD10H-NEXT: movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
	; AMD10H-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]			; AMD10H-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,2,2,3]
				; AMD10H-NEXT: andps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				chfast: I don't understand why this change happened and if it is for better.
				RKSimon: It looks like the change in combine order has affected when canonicalizeShuffleWithBinOps was applied - I'm not very concerned about this change - we already had the domain crossing penalty. More annoying is why does this pre-SSSE3 target not end up with the same codegen as the post-SSSE3 targets below?
	; AMD10H-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,2,3,4,5,6,7]			; AMD10H-NEXT: pshuflw {{.*#+}} xmm0 = xmm0[2,1,2,3,4,5,6,7]
	; AMD10H-NEXT: pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
	; AMD10H-NEXT: packuswb %xmm0, %xmm0			; AMD10H-NEXT: packuswb %xmm0, %xmm0
	; AMD10H-NEXT: retq			; AMD10H-NEXT: retq
	;			;
	; BTVER1-LABEL: shuffle_8_18_uuuuuuuuuuuuuu:			; BTVER1-LABEL: shuffle_8_18_uuuuuuuuuuuuuu:
	; BTVER1: # %bb.0:			; BTVER1: # %bb.0:
	; BTVER1-NEXT: psrld $16, %xmm1			; BTVER1-NEXT: psrld $16, %xmm1
	; BTVER1-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]			; BTVER1-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,2,3]
	; BTVER1-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]			; BTVER1-NEXT: punpcklbw {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1],xmm0[2],xmm1[2],xmm0[3],xmm1[3],xmm0[4],xmm1[4],xmm0[5],xmm1[5],xmm0[6],xmm1[6],xmm0[7],xmm1[7]

llvm/test/CodeGen/X86/vector-zext.ll

llvm/test/CodeGen/X86/widen-load-of-small-alloca-with-zero-upper-half.ll

llvm/test/CodeGen/X86/xor.ll

	; X64-LIN-LABEL: PR17487:			; X64-LIN-LABEL: PR17487:
	; X64-LIN: # %bb.0:			; X64-LIN: # %bb.0:
	; X64-LIN-NEXT: movl %edi, %eax			; X64-LIN-NEXT: movl %edi, %eax
	; X64-LIN-NEXT: andl $1, %eax			; X64-LIN-NEXT: andl $1, %eax
	; X64-LIN-NEXT: retq			; X64-LIN-NEXT: retq
	;			;
	; X64-WIN-LABEL: PR17487:			; X64-WIN-LABEL: PR17487:
	; X64-WIN: # %bb.0:			; X64-WIN: # %bb.0:
				; X64-WIN-NEXT: andb $1, %cl
				chfast: Is this expected preference of having the instruction with smaller operand?
				deadalnix (author): I'd assume this is target dependent.
	; X64-WIN-NEXT: movzbl %cl, %eax			; X64-WIN-NEXT: movzbl %cl, %eax
	; X64-WIN-NEXT: andl $1, %eax
	; X64-WIN-NEXT: retq			; X64-WIN-NEXT: retq
	%tmp = insertelement <2 x i1> undef, i1 %tobool, i32 1			%tmp = insertelement <2 x i1> undef, i1 %tobool, i32 1
	%tmp1 = zext <2 x i1> %tmp to <2 x i64>			%tmp1 = zext <2 x i1> %tmp to <2 x i64>
	%tmp2 = xor <2 x i64> %tmp1, <i64 1, i64 1>			%tmp2 = xor <2 x i64> %tmp1, <i64 1, i64 1>
	%tmp3 = extractelement <2 x i64> %tmp2, i32 1			%tmp3 = extractelement <2 x i64> %tmp2, i32 1
	%add = add nsw i64 0, %tmp3			%add = add nsw i64 0, %tmp3
	%cmp6 = icmp ne i64 %add, 1			%cmp6 = icmp ne i64 %add, 1
	%conv7 = zext i1 %cmp6 to i32			%conv7 = zext i1 %cmp6 to i32

llvm/test/CodeGen/X86/zero_extend_vector_inreg_of_broadcast.ll

[DAGCombine] Make sure combined nodes are added back to the worklist in topological order.
Closed, Public

Unit Tests: Failed

Diff 489859

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

llvm/test/CodeGen/AMDGPU/combine-cond-add-sub.ll

llvm/test/CodeGen/AMDGPU/dagcombine-setcc-select.ll

llvm/test/CodeGen/AMDGPU/ds-alignment.ll

llvm/test/CodeGen/AMDGPU/ds_write2.ll

llvm/test/CodeGen/AMDGPU/idot4u.ll

llvm/test/CodeGen/AMDGPU/idot8s.ll

llvm/test/CodeGen/AMDGPU/idot8u.ll

llvm/test/CodeGen/AMDGPU/load-local-redundant-copies.ll

llvm/test/CodeGen/AMDGPU/store-local.128.ll

llvm/test/CodeGen/AMDGPU/store-local.96.ll

llvm/test/CodeGen/AMDGPU/widen-smrd-loads.ll

llvm/test/CodeGen/ARM/addsubcarry-promotion.ll

llvm/test/CodeGen/ARM/icmp-shift-opt.ll

llvm/test/CodeGen/ARM/reg_sequence.ll

llvm/test/CodeGen/Hexagon/autohvx/isel-vpackew.ll

llvm/test/CodeGen/Hexagon/autohvx/mulh.ll

llvm/test/CodeGen/PowerPC/aix32-cc-abi-vaarg.ll

llvm/test/CodeGen/PowerPC/combine-fneg.ll

llvm/test/CodeGen/PowerPC/select_const.ll

llvm/test/CodeGen/RISCV/mul.ll

llvm/test/CodeGen/RISCV/pr58511.ll

llvm/test/CodeGen/SystemZ/pr36164.ll

llvm/test/CodeGen/Thumb2/mve-vst3.ll

llvm/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll

llvm/test/CodeGen/X86/2012-08-07-CmpISelBug.ll

llvm/test/CodeGen/X86/addcarry.ll

llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast.ll

llvm/test/CodeGen/X86/any_extend_vector_inreg_of_broadcast_from_memory.ll

llvm/test/CodeGen/X86/avx512-mask-op.ll

llvm/test/CodeGen/X86/avx512bw-intrinsics-upgrade.ll

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll

llvm/test/CodeGen/X86/const-shift-of-constmasked.ll

llvm/test/CodeGen/X86/dagcombine-cse.ll

llvm/test/CodeGen/X86/dagcombine-select.ll

llvm/test/CodeGen/X86/field-extract-use-trunc.ll

llvm/test/CodeGen/X86/horizontal-sum.ll

llvm/test/CodeGen/X86/icmp-shift-opt.ll

llvm/test/CodeGen/X86/insert-into-constant-vector.ll

llvm/test/CodeGen/X86/insertelement-var-index.ll

llvm/test/CodeGen/X86/is_fpclass-fp80.ll

llvm/test/CodeGen/X86/isel-blendi-gettargetconstant.ll

llvm/test/CodeGen/X86/masked_store.ll

llvm/test/CodeGen/X86/movmsk-cmp.ll

llvm/test/CodeGen/X86/mulvi32.ll

llvm/test/CodeGen/X86/nontemporal-3.ll

llvm/test/CodeGen/X86/pmulh.ll

llvm/test/CodeGen/X86/popcnt.ll

llvm/test/CodeGen/X86/pr53419.ll

llvm/test/CodeGen/X86/promote-vec3.ll

llvm/test/CodeGen/X86/psubus.ll

llvm/test/CodeGen/X86/shift-mask.ll

llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll

llvm/test/CodeGen/X86/single_elt_vector_memory_operation.ll

llvm/test/CodeGen/X86/smax.ll

llvm/test/CodeGen/X86/smin.ll

llvm/test/CodeGen/X86/umax.ll

llvm/test/CodeGen/X86/umin.ll

llvm/test/CodeGen/X86/v8i1-masks.ll

llvm/test/CodeGen/X86/vector-fshl-256.ll

llvm/test/CodeGen/X86/vector-fshl-512.ll

llvm/test/CodeGen/X86/vector-fshl-rot-256.ll

llvm/test/CodeGen/X86/vector-fshl-rot-512.ll

llvm/test/CodeGen/X86/vector-fshr-256.ll

llvm/test/CodeGen/X86/vector-fshr-512.ll

llvm/test/CodeGen/X86/vector-fshr-rot-256.ll

llvm/test/CodeGen/X86/vector-fshr-rot-512.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-2.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-3.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-4.ll

llvm/test/CodeGen/X86/vector-interleaved-load-i16-stride-5.ll
