This is an archive of the discontinued LLVM Phabricator instance.

Differential D59990

AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence.
Closed, Public

Authored by alex-t on Mar 29 2019, 6:40 AM.

Details

Reviewers

rampitec
nhaehnle
arsenm
vpykhtin
efriedma

Commits

rGdffedea01482: [AMDGPU] Divergence driven ISel. Assign register class for cross block values…
rL361644: [AMDGPU] Divergence driven ISel. Assign register class for cross block values…

Summary

To make instruction selection truly divergence-driven, it is necessary to assign the correct register classes to cross-block values beforehand.
On divergent targets, the same value type requires different register classes depending on the value's divergence.
For example, a uniform i32 is SReg_32RegClass, but a divergent one is VReg_32RegClass.
Unfortunately, the TargetLowering::getRegClassFor function relies on a simple array indexed by value type.
Hence, we only have one register class for a given value type. To work around this, I had to override this method in the target.
I also added a boolean argument to designate the value's divergence.
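A minimal, self-contained sketch of the idea, assuming simplified stand-ins for the LLVM types: `MVT`, `TargetRegisterClass`, the two register-class objects, and both class names below are hypothetical mock-ups, not the real LLVM declarations. The base class keeps the single per-type table and ignores the new flag; the target override consults the `isDivergent` argument first.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for LLVM's MVT and TargetRegisterClass.
enum class MVT { i1, i32, i64, NumTypes };

struct TargetRegisterClass {
  std::string Name;
};

// Hypothetical register classes named after the ones in the summary.
inline const TargetRegisterClass SReg_32{"SReg_32"};
inline const TargetRegisterClass VReg_32{"VReg_32"};

class TargetLoweringSketch {
public:
  virtual ~TargetLoweringSketch() = default;

  // Generic behaviour: one register class per value type. The divergence
  // flag is accepted but ignored.
  virtual const TargetRegisterClass *
  getRegClassFor(MVT VT, bool isDivergent = false) const {
    (void)isDivergent;
    const TargetRegisterClass *RC = RegClassForVT[static_cast<std::size_t>(VT)];
    assert(RC && "This value type is not natively supported!");
    return RC;
  }

protected:
  // Value-initialized, so every entry starts as nullptr.
  std::array<const TargetRegisterClass *,
             static_cast<std::size_t>(MVT::NumTypes)>
      RegClassForVT{};
};

// A divergent target overrides the hook: a uniform i32 stays in a scalar
// class, a divergent i32 goes to a vector class.
class SITargetLoweringSketch : public TargetLoweringSketch {
public:
  SITargetLoweringSketch() {
    RegClassForVT[static_cast<std::size_t>(MVT::i32)] = &SReg_32;
  }

  const TargetRegisterClass *
  getRegClassFor(MVT VT, bool isDivergent = false) const override {
    if (VT == MVT::i32 && isDivergent)
      return &VReg_32;
    return TargetLoweringSketch::getRegClassFor(VT, isDivergent);
  }
};
```

One design detail worth keeping in mind: default arguments of virtual functions are taken from the static type of the call, so the override repeats the same `= false` default to avoid surprises when called through either type.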

This review has passed pre-check-in testing.

However, it was created as a starting point for a wider discussion to work out the best approach.

Diff Detail

Event Timeline

alex-t created this revision. Mar 29 2019, 6:40 AM

Herald added subscribers: jdoerfert, jfb, t-tye and 9 others. Mar 29 2019, 6:40 AM

alex-t updated this revision to Diff 192828. Mar 29 2019, 8:15 AM

alex-t edited the summary of this revision.

rampitec added inline comments. Mar 29 2019, 4:50 PM

include/llvm/CodeGen/FunctionLoweringInfo.h
60	Please follow formatting rules with pointers, in all places.
include/llvm/CodeGen/TargetLowering.h
647	Indent. Also please write a comment what is this function about.
include/llvm/CodeGen/TargetRegisterInfo.h
523	80 chars per line. Also please write a comment what is this function about.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
13921–13922	Can you run clang-format please?
lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
382	80 chars.
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
229	Switch the condition order. Node->isDivergent() is less expensive.
404	Switch the condition order.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
636	!isSGPRReg()
lib/Target/AMDGPU/SIISelLowering.cpp
10169	!isSGPRClass()
lib/Target/AMDGPU/SIRegisterInfo.h
200	!isSGPRClass()

Changed according to the reviewer's request.

alex-t marked 10 inline comments as done. Apr 2 2019, 6:00 AM

alex-t added a reviewer: efriedma.

rampitec added inline comments. Apr 2 2019, 10:55 AM

lib/Target/AMDGPU/SIFixSGPRCopies.cpp
635	I think you can decrease nesting here. At the very least join individual "if" conditions with "&&".

efriedma added inline comments. Apr 2 2019, 12:25 PM

include/llvm/CodeGen/TargetLowering.h
648	I'm confused what this means. A value is either divergent, or not divergent, and the premise of this patch is that it isn't appropriate to put a divergent value in a uniform register. But this patch forces the value into a uniform register anyway?
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
592	Weird indentation.
lib/Target/AMDGPU/SIISelLowering.cpp
10207	It's probably not a good idea to traverse the use-list of anything that isn't an instruction here; you could end up finding uses in a different function. I don't really understand what you're trying to do here; is this going to force an arbitrary tree of instructions into uniform registers?

alex-t marked 2 inline comments as done. Apr 3 2019, 4:34 AM

alex-t added inline comments.

include/llvm/CodeGen/TargetLowering.h
648	There is no direct one-to-one mapping between the divergence of the high-level IR value and the target-specific register class. So each target needs a specific hook for selecting the register class for a given value. And yes, I agree that the function name is misleading; I should rename it. For instance, we have a naturally divergent value that is lowered to 64 bits, one bit for each thread in the wavefront. Obviously, we want it to live in a 64-bit scalar register.
lib/Target/AMDGPU/SIISelLowering.cpp
10207	Hmm... I was pretty sure that the UseList is per function... Is it really shared across the whole module? That looks weird to me. As for what I'm doing here: the whole program slice that produces or consumes the 64-bit mask for the SI control flow intrinsics should be forced into SGPRs. It is NOT an arbitrary set. I start from the value in question and run a DFS over the def-use tree until I reach SI_IF_BREAK or the end. If I reach SI_IF_BREAK, the whole tree requires SGPRs for those specific uses. Let's say we have some instruction that produces the value, an SI_IF_BREAK that finally uses the derived value, and some other instruction in between that is naturally VALU. If this VALU instruction's description does not allow an SGPR in this exact operand position, we'll have to insert an SGPR-to-VGPR COPY. Once more, this is a target-specific interpretation. Divergence of the high-level value does not strictly require it to be assigned to a VGPR. It may be a 64-bit mask.
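To make the "64-bit mask" point concrete, here is a hedged sketch that packs a per-lane (divergent) i1 into a single 64-bit value, one bit per lane of a 64-wide wavefront. The function name `ballot`, the `WaveSize` constant, and the `std::array<bool, 64>` lane model are illustrative assumptions, not LLVM code. The result is one value shared by the whole wave, which is why it fits a 64-bit scalar register even though the i1 it encodes is divergent.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr unsigned WaveSize = 64; // assumption: wave64

// Hypothetical model: LaneValues[i] is the i1 condition as seen by lane i.
// Bit i of the result is set iff lane i is active and its condition is true.
std::uint64_t ballot(const std::array<bool, WaveSize> &LaneValues,
                     std::uint64_t ExecMask) {
  std::uint64_t Mask = 0;
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (((ExecMask >> Lane) & 1) && LaneValues[Lane])
      Mask |= std::uint64_t{1} << Lane;
  return Mask;
}
```

Every lane computes the same `Mask`, so the packed value is uniform while the per-lane i1 is divergent; this is the kind of case where divergence alone does not determine the register bank.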

efriedma added inline comments. Apr 3 2019, 11:55 AM

lib/Target/AMDGPU/SIISelLowering.cpp
10207	Yes, use-lists are global, in general. For Instructions and Arguments specifically, all the uses are required to be in the same function as the definition, though, so maybe you're fine here. If I'm following correctly, amdgcn_if_break are specialized intrinsics which are generated just before isel, and there's a specific set of PHI nodes and intrinsics that need to remain in SGPRs? I'm not quite sure I follow why you can't solve the issue by making divergence analysis treat amdgcn_if_break as a non-divergent operation. But I'll let a reviewer more familiar with the target handle that. Given the way the input is currently structured, the check is reasonable, I guess.

alex-t marked an inline comment as done. Apr 4 2019, 7:50 AM

alex-t added inline comments.

lib/Target/AMDGPU/SIISelLowering.cpp
10207	Why can't we just add an exception to the DA for SI_IF_BREAK? Typically, all the CF intrinsics are connected by a 64-bit mask that is defined by one (SI_IF, for instance) and then used by another (SI_Else and SI_END_CF, for instance). This mask is always required to reside in a 64-bit SGPR. At the same time, the condition is an ordinary boolean value and can be either divergent or not. If we force the whole intrinsic to be uniform, we would spoil the divergence propagation for values that are control-dependent on the divergent CF. That's why the following code: if ((Intrinsic->getIntrinsicID() == Intrinsic::amdgcn_if_break) && (V == U->getOperand(1))) Result = true; explicitly checks which operand is the use.
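The check discussed in this thread (walk the def-use graph from a value and report whether it reaches the mask operand, index 1, of `amdgcn_if_break`) can be sketched as below. This is a hedged, self-contained toy model, not the code in SIISelLowering.cpp: `Node`, `Use`, `Opcode`, and `requiresUniformRegisterSketch` are hypothetical names standing in for LLVM's `Value`/`User` use lists. A visited set keeps the walk linear and stops cycles through PHIs; the walk does not continue through an if_break reached via its condition operand, since that operand may legitimately stay divergent.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical opcode tags for the toy IR.
enum class Opcode { Other, Phi, IfBreak };

struct Node;

// One use: the user node and the operand index this value occupies in it.
struct Use {
  Node *User;
  unsigned OperandNo;
};

struct Node {
  Opcode Op = Opcode::Other;
  std::vector<Use> Uses;
};

// Returns true if V, directly or through a chain of users, feeds operand 1
// (the 64-bit mask) of an if_break.
bool reachesIfBreakMask(const Node *V,
                        std::unordered_set<const Node *> &Visited) {
  if (!Visited.insert(V).second)
    return false; // already explored; also breaks PHI cycles
  for (const Use &U : V->Uses) {
    if (U.User->Op == Opcode::IfBreak) {
      // Only the mask operand pins the value to SGPRs; the condition
      // operand may stay divergent, so stop here either way.
      if (U.OperandNo == 1)
        return true;
      continue;
    }
    if (reachesIfBreakMask(U.User, Visited))
      return true;
  }
  return false;
}

bool requiresUniformRegisterSketch(const Node *V) {
  std::unordered_set<const Node *> Visited;
  return reachesIfBreakMask(V, Visited);
}
```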

efriedma added inline comments. Apr 4 2019, 12:04 PM

lib/Target/AMDGPU/SIISelLowering.cpp
10207	Okay, that makes sense.

Adding llvm-commits to the CC. Please be more careful about that in the future... see http://llvm.org/docs/Phabricator.html

In D59990#1455330, @efriedma wrote:

Adding llvm-commits to the CC. Please be more careful about that in the future... see http://llvm.org/docs/Phabricator.html

Okay, thanks a lot :)

From the point of view of the design of all these interfaces, it's too bad we can't fix this in post. From an overall standpoint, it's actually better to get the register classes from the beginning, so sure, let's go with this kind of approach.

After a bit of reflection, I think that in part what's happening here is that the uniform/divergent axis and the register bank axis is getting confused. See @efriedma's question and my comment on isDivergentRegClass. I wonder if some parts of this change would not be better expressed in terms of register banks. For example, in the uses of isDivergentRegClass, aren't we really looking for another register class in the same bank (SGPR vs. VGPR)? Similarly, maybe requiresUniformRegister can be rephrased as returning a required register bank (and nullptr by default)?
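The register-bank alternative suggested here could take roughly the following shape: a hook that returns a required bank, with `nullptr` meaning "no requirement", and generic code falling back to divergence otherwise. Everything in this sketch (`RegisterBankSketch`, `ValueInfo`, `requiredRegisterBank`, `chooseBank`, the SGPR/VGPR bank objects) is hypothetical and only illustrates the interface shape; it is not an LLVM API.

```cpp
#include <cassert>
#include <string>

// Hypothetical register bank descriptor (SGPR vs. VGPR on AMDGPU).
struct RegisterBankSketch {
  std::string Name;
};

inline const RegisterBankSketch SGPRBank{"SGPR"};
inline const RegisterBankSketch VGPRBank{"VGPR"};

// Hypothetical per-value facts the target would inspect.
struct ValueInfo {
  bool IsDivergent = false;
  bool FeedsControlFlowMask = false; // e.g. reaches an if_break mask operand
};

class TargetLoweringBankSketch {
public:
  virtual ~TargetLoweringBankSketch() = default;
  // Default: no bank requirement; generic code decides from divergence.
  virtual const RegisterBankSketch *
  requiredRegisterBank(const ValueInfo &) const {
    return nullptr;
  }
};

class SIBankSketch : public TargetLoweringBankSketch {
public:
  const RegisterBankSketch *
  requiredRegisterBank(const ValueInfo &V) const override {
    // A lane mask is divergent per lane but held as one scalar value.
    if (V.FeedsControlFlowMask)
      return &SGPRBank;
    return nullptr;
  }
};

// Generic side: an explicit requirement wins; otherwise divergence decides.
const RegisterBankSketch &chooseBank(const TargetLoweringBankSketch &TLI,
                                     const ValueInfo &V) {
  if (const RegisterBankSketch *RB = TLI.requiredRegisterBank(V))
    return *RB;
  return V.IsDivergent ? VGPRBank : SGPRBank;
}
```

Compared with a boolean `requiresUniformRegister`, returning a bank keeps "which bank" separate from "is the value divergent", which is the distinction the comment above draws.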

Why do we still need to move PHIs to VALU after this change? (Looking at SIFixSGPRCopies) Shouldn't the PHI be selected with the correct register class already? What's an example where this doesn't happen?

How are values handled which are uniform inside a loop but divergent for outside uses due to a divergent exit condition?

include/llvm/CodeGen/TargetRegisterInfo.h
524–526	This function is problematic because we can't actually tell for a given register class whether the underlying value is divergent or not. Specifically, 64-bit SGPRs can be either uniform or divergent depending on whether it's the lowering of an i1 or an i64.
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
556	The formatting looks off here.
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
726	(On Diff #193265)	This looks like it belongs in a separate patch.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
630	const auto &
lib/Target/AMDGPU/SIISelLowering.cpp
10167–10168	Can we not keep uniform i1 in 32-bit registers? Maybe at least mark this as a TODO?

In D59990#1457613, @nhaehnle wrote:

After a bit of reflection, I think that in part what's happening here is that the uniform/divergent axis and the register bank axis is getting confused. See @efriedma's question and my comment on isDivergentRegClass. I wonder if some parts of this change would not be better expressed in terms of register banks. For example, in the uses of isDivergentRegClass, aren't we really looking for another register class in the same bank (SGPR vs. VGPR)? Similarly, maybe requiresUniformRegister can be rephrased as returning a required register bank (and nullptr by default)?

Why do we still need to move PHIs to VALU after this change? (Looking at SIFixSGPRCopies) Shouldn't the PHI be selected with the correct register class already? What's an example where this doesn't happen?

Trying to answer all of the above now... The reason for the isDivergentRegClass hook and for moving PHIs to VALU is the same.
Divergence-driven ISel would work fine if we really had an SALU alternative for every VALU instruction. Unfortunately, we don't.
This leads to the insertion of unnecessary moves and v_readfirstlane instructions around naturally uniform code.

Let's imagine we have a uniform loop that multiplies and adds floating point array elements.
We only have a VALU floating-point v_fma that accepts VGPRs and produces a VGPR. Yes, all the values are uniform, but they still need to be in VGPRs!
If we keep the PHI in the loop header uniform, we end up copying SGPRs to VGPRs before the v_fma and then applying v_readfirstlane to the resulting VGPR, just so it can go around the loop and be moved back to a VGPR.

So, we need some interface to query if the given register class is considered uniform or divergent in the given target.

Look at how we use it:

const TargetRegisterClass *RC =
    TRI->getAllocatableClass(TII->getRegClass(II, i, TRI, *MF));

if (i < NumResults && TLI->isTypeLegal(Node->getSimpleValueType(i))) {
  const TargetRegisterClass *VTRC = TLI->getRegClassFor(
      Node->getSimpleValueType(i),
      Node->isDivergent() || (RC && TRI->isDivergentRegClass(RC)));
  // ...
}

RC above is the register class retrieved from the instruction description. If it is a VGPR class, we must have a VGPR operand in this exact position.
Thus, the condition passed to getRegClassFor really means "the SDNode itself is divergent, or we have to assign a VGPR because only a VALU form of the instruction exists".
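A hedged, self-contained model of the decision described above. `RegClassSketch`, `TargetRegisterInfoSketch`, `SIRegisterInfoSketch`, and `chooseResultIsDivergent` are hypothetical names. The SI-style override (a class counts as divergent when it is not an SGPR class) follows the `!isSGPRClass()` hint from the inline review, but it is an assumption, not the committed code.

```cpp
#include <cassert>

// Hypothetical register class with the single property used here.
struct RegClassSketch {
  bool IsSGPR;
};

class TargetRegisterInfoSketch {
public:
  virtual ~TargetRegisterInfoSketch() = default;
  // Generic default: the target treats no class as divergent.
  virtual bool isDivergentRegClass(const RegClassSketch *) const {
    return false;
  }
};

class SIRegisterInfoSketch : public TargetRegisterInfoSketch {
public:
  // Assumption: any class that is not an SGPR class is "divergent".
  bool isDivergentRegClass(const RegClassSketch *RC) const override {
    return !RC->IsSGPR;
  }
};

// Mirrors the argument passed to getRegClassFor in the excerpt: request a
// divergent (VGPR) class if the node is divergent, or if the selected
// instruction only accepts a divergent class in this operand position.
bool chooseResultIsDivergent(const TargetRegisterInfoSketch &TRI,
                             bool NodeIsDivergent,
                             const RegClassSketch *InstrOperandRC) {
  return NodeIsDivergent ||
         (InstrOperandRC && TRI.isDivergentRegClass(InstrOperandRC));
}
```

In the v_fma loop example, the node is uniform (`NodeIsDivergent == false`) but the instruction's operand class is a VGPR class, so the second half of the `||` selects a VGPR and avoids the SGPR/VGPR round trip.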

Same for the PHIs. By the time SIFixSGPRCopies runs, we have already selected everything according to divergence.
If a uniform PHI has a VGPR input, or a user of the uniform PHI requires a VGPR, that means there is simply no SALU form for the defining instruction or for the user instruction.
In this case, we have to convert the PHI back to VALU.
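The PHI rule can be restated as a small predicate. In this sketch (hypothetical names, toy data model) a uniform PHI is moved to VALU when any incoming value is defined in a VGPR or any user can only accept a VGPR operand in that position. It restates the rule above; it is not a reproduction of SIFixSGPRCopies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical summary of what the pass knows about one PHI.
struct PhiInfo {
  bool IsUniform;                     // selected as SGPR from divergence
  std::vector<bool> IncomingIsVGPR;   // one flag per incoming value
  std::vector<bool> UserRequiresVGPR; // one flag per using operand
};

bool anySet(const std::vector<bool> &Flags) {
  return std::any_of(Flags.begin(), Flags.end(), [](bool F) { return F; });
}

// A divergent PHI is already VALU, so there is nothing to convert. A uniform
// one is converted back when a defining or using instruction has no SALU form.
bool shouldMovePhiToVALU(const PhiInfo &Phi) {
  if (!Phi.IsUniform)
    return false;
  return anySet(Phi.IncomingIsVGPR) || anySet(Phi.UserRequiresVGPR);
}
```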

I understand that my solution looks disgusting. And yes, I'm thinking of further changing the getRegClassFor interface to move all the target-related hacks into target-specific code.
The main problem is that in several places in the LLVM core code, getRegClassFor is called from a context that only has a type or a register class, but no Value.

alex-t marked an inline comment as done. Apr 8 2019, 8:49 AM

alex-t added inline comments.

include/llvm/CodeGen/TargetRegisterInfo.h
524–526	This is not about the underlying value at all. It is a way to ask the target whether it considers a given register class uniform or divergent. In other words, we cannot expose concrete register class properties to the common code. On the other hand, the instruction description structure is common, and it maps operands to register classes. While emitting the instruction, we want to ask the target whether the given operand is required to be assigned a divergent (aka VGPR) register. This is not because of the value's divergence but because of the selected instruction.

Okay, you've convinced me. I only hope we can move forward with GlobalISel and do it right there.

There are still some formatting issues, but apart from that I think the patch is good.

include/llvm/CodeGen/TargetRegisterInfo.h
524–526	Another way to look at it is that my misunderstanding of the point of the function is precisely why the name is so misleading :)

Added fixes after extended testing. Also GFX10 related update.

LGTM apart from a bunch of formatting issues. I haven't marked all of them, please just run clang-format or clang-format-diff.

include/llvm/CodeGen/TargetLowering.h
646–647	Please run clang-format.

formatting etc

rampitec added inline comments. May 15 2019, 8:14 AM

include/llvm/CodeGen/FunctionLoweringInfo.h
246	Formatting.
248	Formatting.
lib/CodeGen/SelectionDAG/InstrEmitter.cpp
293–294	Formatting.
592	Formatting.
lib/Target/AMDGPU/SIFixSGPRCopies.cpp
635	You can still decrease nesting.
lib/Target/AMDGPU/SIISelLowering.cpp
9645–9646	Formatting.
lib/Target/AMDGPU/SIISelLowering.h
370–373	Formatting.
lib/Target/ARM/ARMISelLowering.cpp
1432–1434	Formatting.
lib/Target/ARM/ARMISelLowering.h
459–460	Formatting.

more formatting + new test updated

LGTM. Let's finish with internal integration and testing before proceeding.

rebased

LGTM

This revision is now accepted and ready to land. May 23 2019, 11:39 AM

Closed by commit rL361644: [AMDGPU] Divergence driven ISel. Assign register class for cross block values… (authored by alex-t). May 24 2019, 8:33 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. May 24 2019, 8:33 AM

Hi Alexander, unfortunately I needed to revert this in rL361688 because it broke two sanitizer bots.

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

In D59990#1517891, @hakzsam wrote:

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

Yes. Sure.
I'll try to fix this within 3 days, and revert if I don't succeed.

In D59990#1517891, @hakzsam wrote:

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

In D59990#1521409, @alex-t wrote:

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

This patch fixes the CTS failures on my side. I have just tried the latest version.

In D59990#1527275, @hakzsam wrote:

This patch fixes the CTS failures on my side. I have just tried the latest version.

Err, only a subset is fixed actually.

A number of Mesa piglit tests are also affected at least on Bonaire (but it seems to not be GPU-specific, I haven't had a chance to look at it further).

- bin/ext_transform_feedback-order elements triangles
- bin/ext_transform_feedback-order elements points
- bin/ext_transform_feedback-order elements lines
- bin/ext_transform_feedback-order arrays triangles
- bin/ext_transform_feedback-order arrays points
- bin/ext_transform_feedback-order arrays lines
- arb_clear_buffer_object-formats (96-bit clears)

It's unclear whether the regression is caused by this particular commit or by the subsequent ASAN fix.

In D59990#1527276, @hakzsam wrote:

Err, only a subset is fixed actually.

I updated the change https://reviews.llvm.org/D62614 this Sunday.
The new version takes a completely different approach. I'd very much appreciate it if you could try it.

In D59990#1527631, @nhaehnle wrote:
A number of Mesa piglit tests are also affected at least on Bonaire (but it seems to not be GPU-specific, I haven't had a chance to look at it further).
- bin/ext_transform_feedback-order elements triangles
- bin/ext_transform_feedback-order elements points
- bin/ext_transform_feedback-order elements lines
- bin/ext_transform_feedback-order arrays triangles
- bin/ext_transform_feedback-order arrays points
- bin/ext_transform_feedback-order arrays lines
- arb_clear_buffer_object-formats (96-bit clears)
It's unclear whether the regression is caused by this particular commit or by the subsequent ASAN fix.

Could you please try the newest fix, which prevents SGPR-to-VGPR copies from sinking out of the loop?
If it does not help, I will revert the change.

I updated the change https://reviews.llvm.org/D62614 this Sunday.
The new version takes a completely different approach. I'd very much appreciate it if you could try it.

D62614 doesn't fix the issue.

In D59990#1530511, @hakzsam wrote:

D62614 doesn't fix the issue.

Okay. I'm about to start a partial revert of the change.
Could you please provide me with test cases so that I can check whether my further fixes help?

In D59990#1530603, @alex-t wrote:

Okay. I'm about to start a partial revert of the change.
Could you please provide me with test cases so that I can check whether my further fixes help?

See below the good and bad outputs for one CTS failure:

GOOD: https://hastebin.com/muwuwivofu
BAD: https://hastebin.com/gofawejoku

Thanks again for looking into this.

Note that this change also breaks https://bugs.freedesktop.org/show_bug.cgi?id=110811

In D59990#1530704, @hakzsam wrote:

See below the good and bad outputs for one CTS failure:

GOOD: https://hastebin.com/muwuwivofu
BAD: https://hastebin.com/gofawejoku

Thanks again for looking into this.

Note that this change also breaks https://bugs.freedesktop.org/show_bug.cgi?id=110811

I investigated the failing case. The cause is again the use of a value that is uniform inside the loop, where the loop has a divergent exit.
We rely on LCSSA PHIs to handle this. Unfortunately, the Early CSE pass mistakenly removes them.
I have a one-line fix, and the review for it: https://reviews.llvm.org/D63489
Could you possibly check whether it helps in this particular case? If so, it may be worth checking the others...
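Why a value can be uniform inside a loop yet divergent after it: with a divergent exit, lanes leave the loop on different iterations, so each lane observes a different final value. The simulation below is an illustrative model (4 lanes, hypothetical names `valueAfterLoop` and `ExitAt`), not compiler code. The `Out[L] = I` assignment plays the role of the LCSSA PHI at the loop exit, which is the point where the divergence becomes visible and why losing that PHI matters here.

```cpp
#include <array>
#include <cassert>

constexpr int NumLanes = 4; // toy wavefront

// Every active lane sees the same counter I in a given iteration (uniform
// inside the loop), but each lane exits when its own condition fires
// (divergent exit). ExitAt[L] is the iteration at which lane L leaves.
std::array<int, NumLanes> valueAfterLoop(const std::array<int, NumLanes> &ExitAt) {
  for (int E : ExitAt)
    assert(E >= 0 && "exit iteration must be non-negative");
  std::array<int, NumLanes> Out{};
  std::array<bool, NumLanes> Active;
  Active.fill(true);
  for (int I = 0;; ++I) {
    bool AnyActive = false;
    for (int L = 0; L < NumLanes; ++L) {
      if (!Active[L])
        continue;
      if (I == ExitAt[L]) { // per-lane exit condition
        Out[L] = I;         // value captured at the exit, per lane
        Active[L] = false;
      } else {
        AnyActive = true;
      }
    }
    if (!AnyActive)
      return Out;
  }
}
```

With `ExitAt = {1, 3, 3, 2}` the lanes end up holding `{1, 3, 3, 2}`: the counter was uniform on every iteration, but the value used after the loop is divergent.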

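Revision Contents

Path	Size
include/llvm/CodeGen/FunctionLoweringInfo.h	11 lines
include/llvm/CodeGen/SelectionDAG.h	1 line
include/llvm/CodeGen/TargetLowering.h	11 lines
include/llvm/CodeGen/TargetRegisterInfo.h	5 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	6 lines
lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp	14 lines
lib/CodeGen/SelectionDAG/InstrEmitter.h	2 lines
lib/CodeGen/SelectionDAG/InstrEmitter.cpp	33 lines
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	4 lines
lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp	2 lines
lib/Target/AMDGPU/SIFixSGPRCopies.cpp	114 lines
lib/Target/AMDGPU/SIISelLowering.h	5 lines
lib/Target/AMDGPU/SIISelLowering.cpp	91 lines
lib/Target/AMDGPU/SIInstrInfo.cpp	13 lines
lib/Target/AMDGPU/SIRegisterInfo.h	5 lines
lib/Target/ARM/ARMISelLowering.h	3 lines
lib/Target/ARM/ARMISelLowering.cpp	4 lines
test/CodeGen/AMDGPU/atomicrmw-nand.ll	12 lines
test/CodeGen/AMDGPU/branch-relaxation.ll	3 lines
test/CodeGen/AMDGPU/branch-uniformity.ll	4 lines
test/CodeGen/AMDGPU/control-flow-fastregalloc.ll	7 lines
test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll	55 lines
test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll	6 lines
test/CodeGen/AMDGPU/fabs.ll	12 lines
test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll	58 lines
test/CodeGen/AMDGPU/fmin_legacy.ll	8 lines
test/CodeGen/AMDGPU/fneg-fabs.ll	16 lines
test/CodeGen/AMDGPU/fsub.ll	12 lines
test/CodeGen/AMDGPU/i1-copy-from-loop.ll	10 lines
test/CodeGen/AMDGPU/i1-copy-phi-uniform-branch.ll	1 line
test/CodeGen/AMDGPU/insert_vector_elt.ll	6 lines
test/CodeGen/AMDGPU/llvm.amdgcn.div.scale.ll	2 lines
test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll	8 lines
test/CodeGen/AMDGPU/llvm.amdgcn.mov.dpp.ll	2 lines
test/CodeGen/AMDGPU/llvm.amdgcn.mqsad.pk.u16.u8.ll	2 lines
test/CodeGen/AMDGPU/llvm.amdgcn.qsad.pk.u16.u8.ll	2 lines
test/CodeGen/AMDGPU/loop_break.ll	8 lines
test/CodeGen/AMDGPU/madak.ll	12 lines
test/CodeGen/AMDGPU/mubuf-legalize-operands.ll	5 lines
test/CodeGen/AMDGPU/multilevel-break.ll	5 lines
test/CodeGen/AMDGPU/select-opt.ll	4 lines
test/CodeGen/AMDGPU/sgpr-control-flow.ll	3 lines
test/CodeGen/AMDGPU/si-fix-sgpr-copies.mir	2 lines
test/CodeGen/AMDGPU/smrd.ll	1 line
test/CodeGen/AMDGPU/subreg-coalescer-undef-use.ll	53 lines
test/CodeGen/AMDGPU/uniform-loop-inside-nonuniform.ll	5 lines
test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll	9 lines
test/CodeGen/AMDGPU/valu-i1.ll	6 lines
test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll	1 line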

Diff 200995

include/llvm/CodeGen/FunctionLoweringInfo.h

 //===- FunctionLoweringInfo.h - Lower functions from LLVM IR ---- C++ ---===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This implements routines for translating functions from LLVM IR into
 // Machine IR.
 //
 //===----------------------------------------------------------------------===//
 
 #ifndef LLVM_CODEGEN_FUNCTIONLOWERINGINFO_H
 #define LLVM_CODEGEN_FUNCTIONLOWERINGINFO_H
 
 #include "llvm/ADT/APInt.h"
 #include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/IndexedMap.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/LegacyDivergenceAnalysis.h"
 #include "llvm/CodeGen/ISDOpcodes.h"
 #include "llvm/CodeGen/MachineBasicBlock.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/Value.h"
 #include "llvm/Support/KnownBits.h"
 #include <cassert>
@@ (20 lines not shown) @@
 ///
 class FunctionLoweringInfo {
 public:
   const Function *Fn;
   MachineFunction *MF;
   const TargetLowering *TLI;
   MachineRegisterInfo *RegInfo;
   BranchProbabilityInfo *BPI;
+  const LegacyDivergenceAnalysis *DA;
   /// CanLowerReturn - true iff the function's return value can be lowered to
   /// registers.
   bool CanLowerReturn;
 
   /// True if part of the CSRs will be handled via explicit copies.
   bool SplitCSR;
 
   /// DemoteRegister - if CanLowerReturn is false, DemoteRegister is a vreg
@@ (167 lines not shown) @@ public:
   void clear();
 
   /// isExportedInst - Return true if the specified value is an instruction
   /// exported from its block.
   bool isExportedInst(const Value *V) {
     return ValueMap.count(V);
   }
 
-  unsigned CreateReg(MVT VT);
+  unsigned CreateReg(MVT VT, bool isDivergent = false);
 
+  unsigned CreateRegs(const Value *V);
+
-  unsigned CreateRegs(Type *Ty);
+  unsigned CreateRegs(Type *Ty, bool isDivergent = false);
 
   unsigned InitializeRegForValue(const Value *V) {
     // Tokens never live in vregs.
     if (V->getType()->isTokenTy())
       return 0;
     unsigned &R = ValueMap[V];
     assert(R == 0 && "Already initialized this value register!");
     assert(VirtReg2Value.empty());
-    return R = CreateRegs(V->getType());
+    return R = CreateRegs(V);
   }
 
   /// GetLiveOutRegInfo - Gets LiveOutInfo for a register, returning NULL if the
   /// register is a PHI destination and the PHI's LiveOutInfo is not valid.
   const LiveOutInfo *GetLiveOutRegInfo(unsigned Reg) {
     if (!LiveOutRegInfo.inBounds(Reg))
       return nullptr;
 
@@ (68 lines not shown) @@

include/llvm/CodeGen/SelectionDAG.h

@@ (400 lines not shown) @@ public:
   const Pass *getPass() const { return SDAGISelPass; }
 
   const DataLayout &getDataLayout() const { return MF->getDataLayout(); }
   const TargetMachine &getTarget() const { return TM; }
   const TargetSubtargetInfo &getSubtarget() const { return MF->getSubtarget(); }
   const TargetLowering &getTargetLoweringInfo() const { return *TLI; }
   const TargetLibraryInfo &getLibInfo() const { return *LibInfo; }
   const SelectionDAGTargetInfo &getSelectionDAGInfo() const { return *TSI; }
+  const LegacyDivergenceAnalysis *getDivergenceAnalysis() const { return DA; }
   LLVMContext *getContext() const {return Context; }
   OptimizationRemarkEmitter &getORE() const { return *ORE; }
 
   /// Pop up a GraphViz/gv window with the DAG rendered using 'dot'.
   void viewGraph(const std::string &Title);
   void viewGraph();
 
 #ifndef NDEBUG
@@ (1,314 lines not shown) @@

include/llvm/CodeGen/TargetLowering.h

Show First 20 Lines • Show All 630 Lines • ▼ Show 20 Lines	public:
/// for different nodes. This function returns the preference (or none) for		/// for different nodes. This function returns the preference (or none) for
/// the given node.		/// the given node.
virtual Sched::Preference getSchedulingPreference(SDNode *) const {		virtual Sched::Preference getSchedulingPreference(SDNode *) const {
return Sched::None;		return Sched::None;
}		}

/// Return the register class that should be used for the specified value		/// Return the register class that should be used for the specified value
/// type.		/// type.
virtual const TargetRegisterClass *getRegClassFor(MVT VT) const {		virtual const TargetRegisterClass *getRegClassFor(MVT VT, bool isDivergent = false) const {
		(void)isDivergent;
const TargetRegisterClass *RC = RegClassForVT[VT.SimpleTy];		const TargetRegisterClass *RC = RegClassForVT[VT.SimpleTy];
assert(RC && "This value type is not natively supported!");		assert(RC && "This value type is not natively supported!");
return RC;		return RC;
}		}

		/// Allows target to decide about the register class of the
		/// specific value that is live outside the defining block.
		rampitec: Indent. Also, please write a comment describing what this function does.
		nhaehnle: Please run clang-format.
		/// Returns true if the value needs uniform register class.
		efriedma: I'm confused what this means. A value is either divergent, or not divergent, and the premise of this patch is that it isn't appropriate to put a divergent value in a uniform register. But this patch forces the value into a uniform register anyway?
		alex-t: There is no direct one-to-one mapping between the divergence of a high-level IR value and a target-specific register class, so each target needs a specific hook to select the register class for a given value. And yes, I agree that the function name is misleading; I should rename it. For instance, we have a naturally divergent value that is lowered to 64 bits, one bit for each thread in the wavefront. Obviously, we want it to live in a 64-bit scalar register.
		virtual bool requiresUniformRegister(MachineFunction &MF,
		const Value *) const {
		return false;
		}

/// Return the 'representative' register class for the specified value		/// Return the 'representative' register class for the specified value
/// type.		/// type.
///		///
/// The 'representative' register class is the largest legal super-reg		/// The 'representative' register class is the largest legal super-reg
/// register class for the register class of the value type. For example, on		/// register class for the register class of the value type. For example, on
/// i386 the rep register class for i8, i16, and i32 are GR32; while the rep		/// i386 the rep register class for i8, i16, and i32 are GR32; while the rep
/// register class is GR64 on x86_64.		/// register class is GR64 on x86_64.
virtual const TargetRegisterClass *getRepRegClassFor(MVT VT) const {		virtual const TargetRegisterClass *getRepRegClassFor(MVT VT) const {
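
A minimal, self-contained C++ model of the extended `getRegClassFor` hook may help when reading the call sites in this diff. The types (`VT`, `RegClass`, `LoweringModel`, `GPULoweringModel`) are hypothetical stand-ins, not LLVM classes. The scalar/vector mapping is an illustrative assumption, not the AMDGPU override from this patch. The generic version accepts `isDivergent` and ignores it, as the patched default does. A GPU-style override uses the flag to pick a per-lane vector class for divergent values.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for llvm::MVT and llvm::TargetRegisterClass.
// Only the shape of the hook is modeled; these are not LLVM types.
enum class VT { i1, i32, i64 };

struct RegClass {
  std::string Name;
};

// Models the generic hook: a per-type table lookup that accepts the new
// isDivergent flag but deliberately ignores it.
class LoweringModel {
public:
  virtual ~LoweringModel() = default;

  void addRegisterClass(VT Ty, const RegClass *RC) { RegClassForVT[Ty] = RC; }

  virtual const RegClass *getRegClassFor(VT Ty,
                                         bool isDivergent = false) const {
    (void)isDivergent; // unused by the generic implementation
    auto It = RegClassForVT.find(Ty);
    const RegClass *RC = It == RegClassForVT.end() ? nullptr : It->second;
    assert(RC && "This value type is not natively supported!");
    return RC;
  }

private:
  std::map<VT, const RegClass *> RegClassForVT;
};

// Illustrative scalar/vector classes; the names echo the SGPR/VGPR split,
// but this mapping is an assumption made for the sketch.
const RegClass ScalarReg32{"SReg_32"};
const RegClass VectorReg32{"VGPR_32"};

// Hypothetical GPU-style override: uniform i32 values keep the scalar
// class, and divergent ones move to the per-lane vector class.
class GPULoweringModel : public LoweringModel {
public:
  GPULoweringModel() { addRegisterClass(VT::i32, &ScalarReg32); }

  const RegClass *getRegClassFor(VT Ty,
                                 bool isDivergent = false) const override {
    const RegClass *RC = LoweringModel::getRegClassFor(Ty, isDivergent);
    if (isDivergent && RC == &ScalarReg32)
      return &VectorReg32;
    return RC;
  }
};
```

Because `isDivergent` defaults to `false`, callers that pass only a type keep the old behavior.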

include/llvm/CodeGen/TargetRegisterInfo.h

Show First 20 Lines • Show All 514 Lines • ▼ Show 20 Lines	virtual bool isAsmClobberable(const MachineFunction &MF,
unsigned PhysReg) const {		unsigned PhysReg) const {
return true;		return true;
}		}

/// Returns true if PhysReg is unallocatable and constant throughout the		/// Returns true if PhysReg is unallocatable and constant throughout the
/// function. Used by MachineRegisterInfo::isConstantPhysReg().		/// function. Used by MachineRegisterInfo::isConstantPhysReg().
virtual bool isConstantPhysReg(unsigned PhysReg) const { return false; }		virtual bool isConstantPhysReg(unsigned PhysReg) const { return false; }

		/// Returns true if the register class is considered divergent.
		rampitec: 80 chars per line. Also, please write a comment describing what this function does.
		virtual bool isDivergentRegClass(const TargetRegisterClass *RC) const {
		return false;
		}
		nhaehnle: This function is problematic because we can't actually tell for a given register class whether the underlying value is divergent or not. Specifically, 64-bit SGPRs can be either uniform or divergent depending on whether it's the lowering of an i1 or an i64.
		alex-t: This is not about the underlying value at all. It is a way to ask the target whether it considers a given register class uniform or divergent. In other words, we cannot expose concrete register class properties to the common code. On the other hand, the instruction description structure is common, and it maps each operand to a register class. While emitting the instruction, we want to consult the target on whether the given operand must be assigned a divergent (aka VGPR) register. This is not because of the value's divergence but because of the selected instruction.
		nhaehnle: Another way to look at it is that my misunderstanding of the point of the function is precisely why the name is so misleading :)

/// Physical registers that may be modified within a function but are		/// Physical registers that may be modified within a function but are
/// guaranteed to be restored before any uses. This is useful for targets that		/// guaranteed to be restored before any uses. This is useful for targets that
/// have call sequences where a GOT register may be updated by the caller		/// have call sequences where a GOT register may be updated by the caller
/// prior to a call and is guaranteed to be restored (also by the caller)		/// prior to a call and is guaranteed to be restored (also by the caller)
/// after the call.		/// after the call.
virtual bool isCallerPreservedPhysReg(unsigned PhysReg,		virtual bool isCallerPreservedPhysReg(unsigned PhysReg,
const MachineFunction &MF) const {		const MachineFunction &MF) const {
return false;		return false;
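
Following alex-t's explanation, `isDivergentRegClass` answers a question about a register class: does the target treat it as a per-lane, VGPR-like class? It says nothing about the value stored in that class. The sketch below models this with a hypothetical `IsVector` flag on a stand-in `RegClass`; it is not LLVM code. Under this reading, a 64-bit scalar class is reported as non-divergent even when it holds a per-lane `i1` mask, which is the case nhaehnle raised.

```cpp
#include <cassert>
#include <string>

// Hypothetical register-class descriptor. IsVector marks classes whose
// registers hold one value per lane (VGPR-like). Not an LLVM type.
struct RegClass {
  std::string Name;
  bool IsVector;
};

// Models the new TargetRegisterInfo hook: by default no class is divergent.
class RegisterInfoModel {
public:
  virtual ~RegisterInfoModel() = default;
  virtual bool isDivergentRegClass(const RegClass *RC) const {
    (void)RC;
    return false;
  }
};

// Hypothetical GPU override. It answers from the class itself, never from
// a value, so a 64-bit scalar class is non-divergent whatever it holds.
class GPURegisterInfoModel : public RegisterInfoModel {
public:
  bool isDivergentRegClass(const RegClass *RC) const override {
    return RC && RC->IsVector;
  }
};
```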

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 13,912 Lines • ▼ Show 20 Lines	bool canMergeExpensiveCrossRegisterBankCopy() const {
if (!Inst || !Inst->hasOneUse())		if (!Inst || !Inst->hasOneUse())
return false;		return false;
SDNode *Use = *Inst->use_begin();		SDNode *Use = *Inst->use_begin();
if (Use->getOpcode() != ISD::BITCAST)		if (Use->getOpcode() != ISD::BITCAST)
return false;		return false;
assert(DAG && "Missing context");		assert(DAG && "Missing context");
const TargetLowering &TLI = DAG->getTargetLoweringInfo();		const TargetLowering &TLI = DAG->getTargetLoweringInfo();
EVT ResVT = Use->getValueType(0);		EVT ResVT = Use->getValueType(0);
const TargetRegisterClass *ResRC = TLI.getRegClassFor(ResVT.getSimpleVT());		const TargetRegisterClass *ResRC =
		TLI.getRegClassFor(ResVT.getSimpleVT(), Use->isDivergent());
		rampitec: Can you run clang-format please?
const TargetRegisterClass *ArgRC =		const TargetRegisterClass *ArgRC =
TLI.getRegClassFor(Use->getOperand(0).getValueType().getSimpleVT());		TLI.getRegClassFor(Use->getOperand(0).getValueType().getSimpleVT(),
		Use->getOperand(0)->isDivergent());
if (ArgRC == ResRC || !TLI.isOperationLegal(ISD::LOAD, ResVT))		if (ArgRC == ResRC || !TLI.isOperationLegal(ISD::LOAD, ResVT))
return false;		return false;

// At this point, we know that we perform a cross-register-bank copy.		// At this point, we know that we perform a cross-register-bank copy.
// Check if it is expensive.		// Check if it is expensive.
const TargetRegisterInfo *TRI = DAG->getSubtarget().getRegisterInfo();		const TargetRegisterInfo *TRI = DAG->getSubtarget().getRegisterInfo();
// Assume bitcasts are cheap, unless both register classes do not		// Assume bitcasts are cheap, unless both register classes do not
// explicitly share a common sub class.		// explicitly share a common sub class.
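
The DAGCombiner change only affects which register classes get compared. The rough sketch below assumes a two-bank target where uniform values map to a scalar class and divergent values map to a vector class. All names in it are hypothetical, not LLVM's. It shows the early exit: when operand and result map to the same class, the bitcast is not a cross-register-bank copy. A bitcast node normally inherits its operand's divergence, so the two flags usually agree and same-bank bitcasts still take the early return.

```cpp
#include <cassert>

// Hypothetical two-bank model: uniform values live in scalar registers,
// divergent values in vector registers. These are not LLVM types.
enum class RegClass { Scalar32, Vector32 };

RegClass regClassFor(bool isDivergent) {
  return isDivergent ? RegClass::Vector32 : RegClass::Scalar32;
}

// Mirrors the ArgRC == ResRC early exit: a bitcast whose operand and
// result map to the same class is not a cross-register-bank copy.
bool isCrossBankBitcast(bool ArgDivergent, bool ResDivergent) {
  return regClassFor(ArgDivergent) != regClassFor(ResDivergent);
}
```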

lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp

Show First 20 Lines • Show All 79 Lines • ▼ Show 20 Lines
void FunctionLoweringInfo::set(const Function &fn, MachineFunction &mf,		void FunctionLoweringInfo::set(const Function &fn, MachineFunction &mf,
SelectionDAG *DAG) {		SelectionDAG *DAG) {
Fn = &fn;		Fn = &fn;
MF = &mf;		MF = &mf;
TLI = MF->getSubtarget().getTargetLowering();		TLI = MF->getSubtarget().getTargetLowering();
RegInfo = &MF->getRegInfo();		RegInfo = &MF->getRegInfo();
const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();		const TargetFrameLowering *TFI = MF->getSubtarget().getFrameLowering();
unsigned StackAlign = TFI->getStackAlignment();		unsigned StackAlign = TFI->getStackAlignment();
		DA = DAG->getDivergenceAnalysis();

// Check whether the function can return without sret-demotion.		// Check whether the function can return without sret-demotion.
SmallVector<ISD::OutputArg, 4> Outs;		SmallVector<ISD::OutputArg, 4> Outs;
CallingConv::ID CC = Fn->getCallingConv();		CallingConv::ID CC = Fn->getCallingConv();

GetReturnInfo(CC, Fn->getReturnType(), Fn->getAttributes(), Outs, *TLI,		GetReturnInfo(CC, Fn->getReturnType(), Fn->getAttributes(), Outs, *TLI,
mf.getDataLayout());		mf.getDataLayout());
CanLowerReturn =		CanLowerReturn =
▲ Show 20 Lines • Show All 244 Lines • ▼ Show 20 Lines	void FunctionLoweringInfo::clear() {
RegFixups.clear();		RegFixups.clear();
RegsWithFixups.clear();		RegsWithFixups.clear();
StatepointStackSlots.clear();		StatepointStackSlots.clear();
StatepointSpillMaps.clear();		StatepointSpillMaps.clear();
PreferredExtendType.clear();		PreferredExtendType.clear();
}		}

/// CreateReg - Allocate a single virtual register for the given type.		/// CreateReg - Allocate a single virtual register for the given type.
unsigned FunctionLoweringInfo::CreateReg(MVT VT) {		unsigned FunctionLoweringInfo::CreateReg(MVT VT, bool isDivergent) {
return RegInfo->createVirtualRegister(		return RegInfo->createVirtualRegister(
MF->getSubtarget().getTargetLowering()->getRegClassFor(VT));		MF->getSubtarget().getTargetLowering()->getRegClassFor(VT, isDivergent));
}		}

/// CreateRegs - Allocate the appropriate number of virtual registers of		/// CreateRegs - Allocate the appropriate number of virtual registers of
/// the correctly promoted or expanded types. Assign these registers		/// the correctly promoted or expanded types. Assign these registers
/// consecutive vreg numbers and return the first assigned number.		/// consecutive vreg numbers and return the first assigned number.
///		///
/// In the case that the given value has struct or array type, this function		/// In the case that the given value has struct or array type, this function
/// will assign registers for each member or element.		/// will assign registers for each member or element.
///		///
unsigned FunctionLoweringInfo::CreateRegs(Type *Ty) {		unsigned FunctionLoweringInfo::CreateRegs(Type *Ty, bool isDivergent) {
const TargetLowering *TLI = MF->getSubtarget().getTargetLowering();		const TargetLowering *TLI = MF->getSubtarget().getTargetLowering();

SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
ComputeValueVTs(*TLI, MF->getDataLayout(), Ty, ValueVTs);		ComputeValueVTs(*TLI, MF->getDataLayout(), Ty, ValueVTs);

unsigned FirstReg = 0;		unsigned FirstReg = 0;
for (unsigned Value = 0, e = ValueVTs.size(); Value != e; ++Value) {		for (unsigned Value = 0, e = ValueVTs.size(); Value != e; ++Value) {
EVT ValueVT = ValueVTs[Value];		EVT ValueVT = ValueVTs[Value];
MVT RegisterVT = TLI->getRegisterType(Ty->getContext(), ValueVT);		MVT RegisterVT = TLI->getRegisterType(Ty->getContext(), ValueVT);

unsigned NumRegs = TLI->getNumRegisters(Ty->getContext(), ValueVT);		unsigned NumRegs = TLI->getNumRegisters(Ty->getContext(), ValueVT);
for (unsigned i = 0; i != NumRegs; ++i) {		for (unsigned i = 0; i != NumRegs; ++i) {
unsigned R = CreateReg(RegisterVT);		unsigned R = CreateReg(RegisterVT, isDivergent);
if (!FirstReg) FirstReg = R;		if (!FirstReg) FirstReg = R;
}		}
}		}
return FirstReg;		return FirstReg;
}		}

		unsigned FunctionLoweringInfo::CreateRegs(const Value *V) {
		return CreateRegs(V->getType(), DA && !TLI->requiresUniformRegister(*MF, V) &&
		rampitec: 80 chars.
		DA->isDivergent(V));
		}

/// GetLiveOutRegInfo - Gets LiveOutInfo for a register, returning NULL if the		/// GetLiveOutRegInfo - Gets LiveOutInfo for a register, returning NULL if the
/// register is a PHI destination and the PHI's LiveOutInfo is not valid. If		/// register is a PHI destination and the PHI's LiveOutInfo is not valid. If
/// the register's LiveOutInfo is for a smaller bit width, it is extended to		/// the register's LiveOutInfo is for a smaller bit width, it is extended to
/// the larger bit width by zero extension. The bit width must be no smaller		/// the larger bit width by zero extension. The bit width must be no smaller
/// than the LiveOutInfo's existing bit width.		/// than the LiveOutInfo's existing bit width.
const FunctionLoweringInfo::LiveOutInfo *		const FunctionLoweringInfo::LiveOutInfo *
FunctionLoweringInfo::GetLiveOutRegInfo(unsigned Reg, unsigned BitWidth) {		FunctionLoweringInfo::GetLiveOutRegInfo(unsigned Reg, unsigned BitWidth) {
if (!LiveOutRegInfo.inBounds(Reg))		if (!LiveOutRegInfo.inBounds(Reg))
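
The new `CreateRegs(const Value *)` overload reduces to one boolean, which is forwarded to `getRegClassFor` through `CreateReg`. The sketch below isolates that expression using hypothetical stand-ins (`Value`, `DivergenceInfo`, `useDivergentRegs`), not the LLVM types. Two cases keep the value in a uniform class: a null analysis, and a target that answers `true` from `requiresUniformRegister` (for example, the per-lane `i1` mask alex-t describes). Only a value the analysis marks divergent gets a divergent class.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-ins: a Value is just a name, and the divergence
// analysis is the set of divergent value names. Not LLVM's classes.
struct Value {
  std::string Name;
};

struct DivergenceInfo {
  std::set<std::string> Divergent;
  bool isDivergent(const Value &V) const {
    return Divergent.count(V.Name) != 0;
  }
};

// Models the flag computed in CreateRegs(const Value *V):
//   DA && !TLI->requiresUniformRegister(*MF, V) && DA->isDivergent(V)
// With no analysis (DA == nullptr) every value is treated as uniform.
bool useDivergentRegs(const DivergenceInfo *DA, bool RequiresUniform,
                      const Value &V) {
  return DA && !RequiresUniform && DA->isDivergent(V);
}
```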

lib/CodeGen/SelectionDAG/InstrEmitter.h

Show First 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	void AddOperand(MachineInstrBuilder &MIB,
const MCInstrDesc *II,		const MCInstrDesc *II,
DenseMap<SDValue, unsigned> &VRBaseMap,		DenseMap<SDValue, unsigned> &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned);		bool IsDebug, bool IsClone, bool IsCloned);

/// ConstrainForSubReg - Try to constrain VReg to a register class that		/// ConstrainForSubReg - Try to constrain VReg to a register class that
/// supports SubIdx sub-registers. Emit a copy if that isn't possible.		/// supports SubIdx sub-registers. Emit a copy if that isn't possible.
/// Return the virtual register to use.		/// Return the virtual register to use.
unsigned ConstrainForSubReg(unsigned VReg, unsigned SubIdx, MVT VT,		unsigned ConstrainForSubReg(unsigned VReg, unsigned SubIdx, MVT VT,
const DebugLoc &DL);		bool isDivergent, const DebugLoc &DL);

/// EmitSubregNode - Generate machine code for subreg nodes.		/// EmitSubregNode - Generate machine code for subreg nodes.
///		///
void EmitSubregNode(SDNode *Node, DenseMap<SDValue, unsigned> &VRBaseMap,		void EmitSubregNode(SDNode *Node, DenseMap<SDValue, unsigned> &VRBaseMap,
bool IsClone, bool IsCloned);		bool IsClone, bool IsCloned);

/// EmitCopyToRegClassNode - Generate machine code for COPY_TO_REGCLASS nodes.		/// EmitCopyToRegClassNode - Generate machine code for COPY_TO_REGCLASS nodes.
/// COPY_TO_REGCLASS is just a normal copy, except that the destination		/// COPY_TO_REGCLASS is just a normal copy, except that the destination

lib/CodeGen/SelectionDAG/InstrEmitter.cpp

Show First 20 Lines • Show All 99 Lines • ▼ Show 20 Lines	EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone, bool IsCloned,
// If the node is only used by a CopyToReg and the dest reg is a vreg, use		// If the node is only used by a CopyToReg and the dest reg is a vreg, use
// the CopyToReg'd destination register instead of creating a new vreg.		// the CopyToReg'd destination register instead of creating a new vreg.
bool MatchReg = true;		bool MatchReg = true;
const TargetRegisterClass *UseRC = nullptr;		const TargetRegisterClass *UseRC = nullptr;
MVT VT = Node->getSimpleValueType(ResNo);		MVT VT = Node->getSimpleValueType(ResNo);

// Stick to the preferred register classes for legal types.		// Stick to the preferred register classes for legal types.
if (TLI->isTypeLegal(VT))		if (TLI->isTypeLegal(VT))
UseRC = TLI->getRegClassFor(VT);		UseRC = TLI->getRegClassFor(VT, Node->isDivergent());

if (!IsClone && !IsCloned)		if (!IsClone && !IsCloned)
for (SDNode *User : Node->uses()) {		for (SDNode *User : Node->uses()) {
bool Match = true;		bool Match = true;
if (User->getOpcode() == ISD::CopyToReg &&		if (User->getOpcode() == ISD::CopyToReg &&
User->getOperand(2).getNode() == Node &&		User->getOperand(2).getNode() == Node &&
User->getOperand(2).getResNo() == ResNo) {		User->getOperand(2).getResNo() == ResNo) {
unsigned DestReg = cast<RegisterSDNode>(User->getOperand(1))->getReg();		unsigned DestReg = cast<RegisterSDNode>(User->getOperand(1))->getReg();
▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone, bool IsCloned,
// Figure out the register class to create for the destreg.		// Figure out the register class to create for the destreg.
if (VRBase) {		if (VRBase) {
DstRC = MRI->getRegClass(VRBase);		DstRC = MRI->getRegClass(VRBase);
} else if (UseRC) {		} else if (UseRC) {
assert(TRI->isTypeLegalForClass(*UseRC, VT) &&		assert(TRI->isTypeLegalForClass(*UseRC, VT) &&
"Incompatible phys register def and uses!");		"Incompatible phys register def and uses!");
DstRC = UseRC;		DstRC = UseRC;
} else {		} else {
DstRC = TLI->getRegClassFor(VT);		DstRC = TLI->getRegClassFor(VT, Node->isDivergent());
}		}

// If all uses are reading from the src physical register and copying the		// If all uses are reading from the src physical register and copying the
// register is either impossible or very expensive, then don't create a copy.		// register is either impossible or very expensive, then don't create a copy.
if (MatchReg && SrcRC->getCopyCost() < 0) {		if (MatchReg && SrcRC->getCopyCost() < 0) {
VRBase = SrcReg;		VRBase = SrcReg;
} else {		} else {
// Create the reg, emit the copy.		// Create the reg, emit the copy.
▲ Show 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	for (unsigned i = 0; i < II.getNumDefs(); ++i) {
unsigned VRBase = 0;		unsigned VRBase = 0;
const TargetRegisterClass *RC =		const TargetRegisterClass *RC =
TRI->getAllocatableClass(TII->getRegClass(II, i, TRI, *MF));		TRI->getAllocatableClass(TII->getRegClass(II, i, TRI, *MF));
// Always let the value type influence the used register class. The		// Always let the value type influence the used register class. The
// constraints on the instruction may be too lax to represent the value		// constraints on the instruction may be too lax to represent the value
// type correctly. For example, a 64-bit float (X86::FR64) can't live in		// type correctly. For example, a 64-bit float (X86::FR64) can't live in
// the 32-bit float super-class (X86::FR32).		// the 32-bit float super-class (X86::FR32).
if (i < NumResults && TLI->isTypeLegal(Node->getSimpleValueType(i))) {		if (i < NumResults && TLI->isTypeLegal(Node->getSimpleValueType(i))) {
const TargetRegisterClass *VTRC =		const TargetRegisterClass *VTRC = TLI->getRegClassFor(
TLI->getRegClassFor(Node->getSimpleValueType(i));		Node->getSimpleValueType(i),
		rampitec: Switch the condition order. Node->isDivergent() is less expensive.
		(Node->isDivergent() || (RC && TRI->isDivergentRegClass(RC))));
if (RC)		if (RC)
VTRC = TRI->getCommonSubClass(RC, VTRC);		VTRC = TRI->getCommonSubClass(RC, VTRC);
if (VTRC)		if (VTRC)
RC = VTRC;		RC = VTRC;
}		}

if (II.OpInfo[i].isOptionalDef()) {		if (II.OpInfo[i].isOptionalDef()) {
// Optional def must be a physical register.		// Optional def must be a physical register.
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	unsigned InstrEmitter::getVR(SDValue Op,
DenseMap<SDValue, unsigned> &VRBaseMap) {		DenseMap<SDValue, unsigned> &VRBaseMap) {
if (Op.isMachineOpcode() &&		if (Op.isMachineOpcode() &&
Op.getMachineOpcode() == TargetOpcode::IMPLICIT_DEF) {		Op.getMachineOpcode() == TargetOpcode::IMPLICIT_DEF) {
// Add an IMPLICIT_DEF instruction before every use.		// Add an IMPLICIT_DEF instruction before every use.
unsigned VReg = getDstOfOnlyCopyToRegUse(Op.getNode(), Op.getResNo());		unsigned VReg = getDstOfOnlyCopyToRegUse(Op.getNode(), Op.getResNo());
// IMPLICIT_DEF can produce any type of result so its MCInstrDesc		// IMPLICIT_DEF can produce any type of result so its MCInstrDesc
// does not include operand register class info.		// does not include operand register class info.
if (!VReg) {		if (!VReg) {
const TargetRegisterClass *RC =		const TargetRegisterClass *RC = TLI->getRegClassFor(
TLI->getRegClassFor(Op.getSimpleValueType());		Op.getSimpleValueType(), Op.getNode()->isDivergent());
		rampitec: Formatting.
VReg = MRI->createVirtualRegister(RC);		VReg = MRI->createVirtualRegister(RC);
}		}
BuildMI(*MBB, InsertPos, Op.getDebugLoc(),		BuildMI(*MBB, InsertPos, Op.getDebugLoc(),
TII->get(TargetOpcode::IMPLICIT_DEF), VReg);		TII->get(TargetOpcode::IMPLICIT_DEF), VReg);
return VReg;		return VReg;
}		}

DenseMap<SDValue, unsigned>::iterator I = VRBaseMap.find(Op);		DenseMap<SDValue, unsigned>::iterator I = VRBaseMap.find(Op);
▲ Show 20 Lines • Show All 88 Lines • ▼ Show 20 Lines	AddRegisterOperand(MIB, Op, IIOpNum, II, VRBaseMap,
IsDebug, IsClone, IsCloned);		IsDebug, IsClone, IsCloned);
} else if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op)) {		} else if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op)) {
MIB.addImm(C->getSExtValue());		MIB.addImm(C->getSExtValue());
} else if (ConstantFPSDNode *F = dyn_cast<ConstantFPSDNode>(Op)) {		} else if (ConstantFPSDNode *F = dyn_cast<ConstantFPSDNode>(Op)) {
MIB.addFPImm(F->getConstantFPValue());		MIB.addFPImm(F->getConstantFPValue());
} else if (RegisterSDNode *R = dyn_cast<RegisterSDNode>(Op)) {		} else if (RegisterSDNode *R = dyn_cast<RegisterSDNode>(Op)) {
unsigned VReg = R->getReg();		unsigned VReg = R->getReg();
MVT OpVT = Op.getSimpleValueType();		MVT OpVT = Op.getSimpleValueType();
const TargetRegisterClass *OpRC =
TLI->isTypeLegal(OpVT) ? TLI->getRegClassFor(OpVT) : nullptr;
const TargetRegisterClass *IIRC =		const TargetRegisterClass *IIRC =
II ? TRI->getAllocatableClass(TII->getRegClass(II, IIOpNum, TRI, MF))		II ? TRI->getAllocatableClass(TII->getRegClass(II, IIOpNum, TRI, MF))
: nullptr;		: nullptr;
		const TargetRegisterClass *OpRC =
		TLI->isTypeLegal(OpVT)
		? TLI->getRegClassFor(OpVT,
		rampitec: Switch the condition order.
		Op.getNode()->isDivergent() ||
		(IIRC && TRI->isDivergentRegClass(IIRC)))
		: nullptr;

if (OpRC && IIRC && OpRC != IIRC &&		if (OpRC && IIRC && OpRC != IIRC &&
TargetRegisterInfo::isVirtualRegister(VReg)) {		TargetRegisterInfo::isVirtualRegister(VReg)) {
unsigned NewVReg = MRI->createVirtualRegister(IIRC);		unsigned NewVReg = MRI->createVirtualRegister(IIRC);
BuildMI(*MBB, InsertPos, Op.getNode()->getDebugLoc(),		BuildMI(*MBB, InsertPos, Op.getNode()->getDebugLoc(),
TII->get(TargetOpcode::COPY), NewVReg).addReg(VReg);		TII->get(TargetOpcode::COPY), NewVReg).addReg(VReg);
VReg = NewVReg;		VReg = NewVReg;
}		}
▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines	assert(Op.getValueType() != MVT::Other &&
Op.getValueType() != MVT::Glue &&		Op.getValueType() != MVT::Glue &&
"Chain and glue operands should occur at end of operand list!");		"Chain and glue operands should occur at end of operand list!");
AddRegisterOperand(MIB, Op, IIOpNum, II, VRBaseMap,		AddRegisterOperand(MIB, Op, IIOpNum, II, VRBaseMap,
IsDebug, IsClone, IsCloned);		IsDebug, IsClone, IsCloned);
}		}
}		}

unsigned InstrEmitter::ConstrainForSubReg(unsigned VReg, unsigned SubIdx,		unsigned InstrEmitter::ConstrainForSubReg(unsigned VReg, unsigned SubIdx,
MVT VT, const DebugLoc &DL) {		MVT VT, bool isDivergent, const DebugLoc &DL) {
const TargetRegisterClass *VRC = MRI->getRegClass(VReg);		const TargetRegisterClass *VRC = MRI->getRegClass(VReg);
const TargetRegisterClass *RC = TRI->getSubClassWithSubReg(VRC, SubIdx);		const TargetRegisterClass *RC = TRI->getSubClassWithSubReg(VRC, SubIdx);

// RC is a sub-class of VRC that supports SubIdx. Try to constrain VReg		// RC is a sub-class of VRC that supports SubIdx. Try to constrain VReg
// within reason.		// within reason.
if (RC && RC != VRC)		if (RC && RC != VRC)
RC = MRI->constrainRegClass(VReg, RC, MinRCSize);		RC = MRI->constrainRegClass(VReg, RC, MinRCSize);

// VReg has been adjusted. It can be used with SubIdx operands now.		// VReg has been adjusted. It can be used with SubIdx operands now.
if (RC)		if (RC)
return VReg;		return VReg;

// VReg couldn't be reasonably constrained. Emit a COPY to a new virtual		// VReg couldn't be reasonably constrained. Emit a COPY to a new virtual
// register instead.		// register instead.
RC = TRI->getSubClassWithSubReg(TLI->getRegClassFor(VT), SubIdx);		RC = TRI->getSubClassWithSubReg(TLI->getRegClassFor(VT, isDivergent), SubIdx);
assert(RC && "No legal register class for VT supports that SubIdx");		assert(RC && "No legal register class for VT supports that SubIdx");
unsigned NewReg = MRI->createVirtualRegister(RC);		unsigned NewReg = MRI->createVirtualRegister(RC);
BuildMI(*MBB, InsertPos, DL, TII->get(TargetOpcode::COPY), NewReg)		BuildMI(*MBB, InsertPos, DL, TII->get(TargetOpcode::COPY), NewReg)
.addReg(VReg);		.addReg(VReg);
return NewReg;		return NewReg;
}		}

/// EmitSubregNode - Generate machine code for subreg nodes.		/// EmitSubregNode - Generate machine code for subreg nodes.
Show All 18 Lines	void InstrEmitter::EmitSubregNode(SDNode *Node,
}		}

if (Opc == TargetOpcode::EXTRACT_SUBREG) {		if (Opc == TargetOpcode::EXTRACT_SUBREG) {
// EXTRACT_SUBREG is lowered as %dst = COPY %src:sub. There are no		// EXTRACT_SUBREG is lowered as %dst = COPY %src:sub. There are no
// constraints on the %dst register, COPY can target all legal register		// constraints on the %dst register, COPY can target all legal register
// classes.		// classes.
unsigned SubIdx = cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();		unsigned SubIdx = cast<ConstantSDNode>(Node->getOperand(1))->getZExtValue();
const TargetRegisterClass *TRC =		const TargetRegisterClass *TRC =
TLI->getRegClassFor(Node->getSimpleValueType(0));		TLI->getRegClassFor(Node->getSimpleValueType(0), Node->isDivergent());

unsigned Reg;		unsigned Reg;
MachineInstr *DefMI;		MachineInstr *DefMI;
RegisterSDNode *R = dyn_cast<RegisterSDNode>(Node->getOperand(0));		RegisterSDNode *R = dyn_cast<RegisterSDNode>(Node->getOperand(0));
if (R && TargetRegisterInfo::isPhysicalRegister(R->getReg())) {		if (R && TargetRegisterInfo::isPhysicalRegister(R->getReg())) {
Reg = R->getReg();		Reg = R->getReg();
DefMI = nullptr;		DefMI = nullptr;
} else {		} else {
Show All 17 Lines	if (DefMI &&
MRI->clearKillFlags(SrcReg);		MRI->clearKillFlags(SrcReg);
} else {		} else {
// Reg may not support a SubIdx sub-register, and we may need to		// Reg may not support a SubIdx sub-register, and we may need to
// constrain its register class or issue a COPY to a compatible register		// constrain its register class or issue a COPY to a compatible register
// class.		// class.
if (TargetRegisterInfo::isVirtualRegister(Reg))		if (TargetRegisterInfo::isVirtualRegister(Reg))
Reg = ConstrainForSubReg(Reg, SubIdx,		Reg = ConstrainForSubReg(Reg, SubIdx,
Node->getOperand(0).getSimpleValueType(),		Node->getOperand(0).getSimpleValueType(),
Node->getDebugLoc());		Node->isDivergent(), Node->getDebugLoc());
		nhaehnle: The formatting looks off here.

// Create the destreg if it is missing.		// Create the destreg if it is missing.
if (VRBase == 0)		if (VRBase == 0)
VRBase = MRI->createVirtualRegister(TRC);		VRBase = MRI->createVirtualRegister(TRC);

// Create the extract_subreg machine instruction.		// Create the extract_subreg machine instruction.
MachineInstrBuilder CopyMI =		MachineInstrBuilder CopyMI =
BuildMI(*MBB, InsertPos, Node->getDebugLoc(),		BuildMI(*MBB, InsertPos, Node->getDebugLoc(),
TII->get(TargetOpcode::COPY), VRBase);		TII->get(TargetOpcode::COPY), VRBase);
Show All 18 Lines	if (Opc == TargetOpcode::EXTRACT_SUBREG) {
//		//
// is lowered by TwoAddressInstructionPass to:		// is lowered by TwoAddressInstructionPass to:
//		//
// %dst = COPY %src		// %dst = COPY %src
// %dst:SubIdx = COPY %sub		// %dst:SubIdx = COPY %sub
//		//
// There is no constraint on the %src register class.		// There is no constraint on the %src register class.
//		//
const TargetRegisterClass *SRC = TLI->getRegClassFor(Node->getSimpleValueType(0));		const TargetRegisterClass *SRC =
		TLI->getRegClassFor(Node->getSimpleValueType(0), Node->isDivergent());
		efriedma: Weird indentation.
		rampitec: Formatting.
SRC = TRI->getSubClassWithSubReg(SRC, SubIdx);		SRC = TRI->getSubClassWithSubReg(SRC, SubIdx);
assert(SRC && "No register class supports VT and SubIdx for INSERT_SUBREG");		assert(SRC && "No register class supports VT and SubIdx for INSERT_SUBREG");

if (VRBase == 0 || !SRC->hasSubClassEq(MRI->getRegClass(VRBase)))		if (VRBase == 0 || !SRC->hasSubClassEq(MRI->getRegClass(VRBase)))
VRBase = MRI->createVirtualRegister(SRC);		VRBase = MRI->createVirtualRegister(SRC);

// Create the insert_subreg or subreg_to_reg machine instruction.		// Create the insert_subreg or subreg_to_reg machine instruction.
MachineInstrBuilder MIB =		MachineInstrBuilder MIB =
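
In the two InstrEmitter.cpp call sites that consult `isDivergentRegClass`, the flag passed to `getRegClassFor` combines two checks: the node's own divergence, and the target's view of the instruction's operand class. rampitec asked for the cheap node flag to be tested first. The sketch below uses a hypothetical `RegClass` stand-in and a query counter, neither of which is part of LLVM, to show that short-circuit ordering. The class query only runs when the node is uniform and a class is available.

```cpp
#include <cassert>

// Hypothetical stand-in; IsVector plays the role of the target's answer
// to TRI->isDivergentRegClass(RC). Not an LLVM type.
struct RegClass {
  bool IsVector;
};

// Counts how often the (virtual, in LLVM) class query actually runs.
int ClassQueries = 0;

bool isDivergentRegClass(const RegClass *RC) {
  assert(RC && "caller must check for a null class first");
  ++ClassQueries;
  return RC->IsVector;
}

// Same shape as: Node->isDivergent() || (RC && TRI->isDivergentRegClass(RC))
// The node flag is checked first, so a divergent node never pays for the
// class query.
bool wantDivergentClass(bool NodeIsDivergent, const RegClass *RC) {
  return NodeIsDivergent || (RC && isDivergentRegClass(RC));
}
```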

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 9,846 Lines • ▼ Show 20 Lines	for (const PHINode &PN : SuccBB->phis()) {
continue;		continue;

unsigned Reg;		unsigned Reg;
const Value *PHIOp = PN.getIncomingValueForBlock(LLVMBB);		const Value *PHIOp = PN.getIncomingValueForBlock(LLVMBB);

if (const Constant *C = dyn_cast<Constant>(PHIOp)) {		if (const Constant *C = dyn_cast<Constant>(PHIOp)) {
unsigned &RegOut = ConstantsOut[C];		unsigned &RegOut = ConstantsOut[C];
if (RegOut == 0) {		if (RegOut == 0) {
RegOut = FuncInfo.CreateRegs(C->getType());		RegOut = FuncInfo.CreateRegs(C);
CopyValueToVirtualRegister(C, RegOut);		CopyValueToVirtualRegister(C, RegOut);
}		}
Reg = RegOut;		Reg = RegOut;
} else {		} else {
DenseMap<const Value *, unsigned>::iterator I =		DenseMap<const Value *, unsigned>::iterator I =
FuncInfo.ValueMap.find(PHIOp);		FuncInfo.ValueMap.find(PHIOp);
if (I != FuncInfo.ValueMap.end())		if (I != FuncInfo.ValueMap.end())
Reg = I->second;		Reg = I->second;
else {		else {
assert(isa<AllocaInst>(PHIOp) &&		assert(isa<AllocaInst>(PHIOp) &&
FuncInfo.StaticAllocaMap.count(cast<AllocaInst>(PHIOp)) &&		FuncInfo.StaticAllocaMap.count(cast<AllocaInst>(PHIOp)) &&
"Didn't codegen value into a register!??");		"Didn't codegen value into a register!??");
Reg = FuncInfo.CreateRegs(PHIOp->getType());		Reg = FuncInfo.CreateRegs(PHIOp);
CopyValueToVirtualRegister(PHIOp, Reg);		CopyValueToVirtualRegister(PHIOp, Reg);
}		}
}		}

// Remember that this register needs to added to the machine PHI node as		// Remember that this register needs to added to the machine PHI node as
// the input for this MBB.		// the input for this MBB.
SmallVector<EVT, 4> ValueVTs;		SmallVector<EVT, 4> ValueVTs;
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

Show First 20 Lines • Show All 1,733 Lines • ▼ Show 20 Lines	if (FastIS) {
}		}

reportFastISelFailure(MF, ORE, R, EnableFastISelAbort > 2);		reportFastISelFailure(MF, ORE, R, EnableFastISelAbort > 2);

if (!Inst->getType()->isVoidTy() && !Inst->getType()->isTokenTy() &&		if (!Inst->getType()->isVoidTy() && !Inst->getType()->isTokenTy() &&
!Inst->use_empty()) {		!Inst->use_empty()) {
unsigned &R = FuncInfo->ValueMap[Inst];		unsigned &R = FuncInfo->ValueMap[Inst];
if (!R)		if (!R)
R = FuncInfo->CreateRegs(Inst->getType());		R = FuncInfo->CreateRegs(Inst);
}		}

bool HadTailCall = false;		bool HadTailCall = false;
MachineBasicBlock::iterator SavedInsertPt = FuncInfo->InsertPt;		MachineBasicBlock::iterator SavedInsertPt = FuncInfo->InsertPt;
SelectBasicBlock(Inst->getIterator(), BI, HadTailCall);		SelectBasicBlock(Inst->getIterator(), BI, HadTailCall);

// If the call was emitted as a tail call, we're done with the block.		// If the call was emitted as a tail call, we're done with the block.
// We also need to delete any previously emitted instructions.		// We also need to delete any previously emitted instructions.
…
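In the two hunks above, the calls to FunctionLoweringInfo::CreateRegs now pass the IR value (C, PHIOp, Inst) instead of its type, so the register class can depend on the value's divergence and not only on its type. The following is a minimal toy model of that idea, not LLVM code: RegBank, ValueFacts and VRegTable are hypothetical names, and the model assumes the choice reduces to "divergent and not required to be uniform means a vector register, otherwise a scalar one", with the uniform override supplied by a target hook such as the requiresUniformRegister declared later in this diff.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register banks standing in for SGPR/VGPR register classes.
enum class RegBank { Scalar, Vector };

// Hypothetical per-value facts that a value-based CreateRegs could consult.
struct ValueFacts {
  bool divergent;        // result of a divergence analysis
  bool requiresUniform;  // target override, e.g. a lane mask feeding control flow
};

class VRegTable {
public:
  // Allocates a fresh virtual register number and records its bank.
  unsigned createRegs(const ValueFacts &V) {
    bool UseVector = V.divergent && !V.requiresUniform;
    Banks.push_back(UseVector ? RegBank::Vector : RegBank::Scalar);
    return static_cast<unsigned>(Banks.size() - 1);
  }

  RegBank bankOf(unsigned Reg) const {
    assert(Reg < Banks.size() && "unknown virtual register");
    return Banks[Reg];
  }

private:
  std::vector<RegBank> Banks;  // indexed by virtual register number
};
```

With only a Type, as in the old calls, a uniform i32 and a divergent i32 are indistinguishable at register-creation time, which is why the signature has to change before cross-block values can get divergence-based classes.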

lib/Target/AMDGPU/SIFixSGPRCopies.cpp

…	for (MachineBasicBlock::iterator I = MBB.begin(), E = MBB.end();
TII->moveToVALU(MI, MDT);		TII->moveToVALU(MI, MDT);
} else if (isSGPRToVGPRCopy(SrcRC, DstRC, *TRI)) {		} else if (isSGPRToVGPRCopy(SrcRC, DstRC, *TRI)) {
tryChangeVGPRtoSGPRinCopy(MI, TRI, TII);		tryChangeVGPRtoSGPRinCopy(MI, TRI, TII);
}		}

break;		break;
}		}
case AMDGPU::PHI: {		case AMDGPU::PHI: {
unsigned Reg = MI.getOperand(0).getReg();		unsigned hasVGPRUses = 0;
if (!TRI->isSGPRClass(MRI.getRegClass(Reg)))		SetVector<const MachineInstr *> worklist;
break;		worklist.insert(&MI);
		while (!worklist.empty()) {
		const MachineInstr *Instr = worklist.pop_back_val();
		unsigned Reg = Instr->getOperand(0).getReg();
		for (const auto &Use : MRI.use_operands(Reg)) {
		nhaehnle [Done]: const auto &
		const MachineInstr *UseMI = Use.getParent();
		if (UseMI->isCopy() || UseMI->isRegSequence()) {
		if (UseMI->isCopy() &&
		TRI->isPhysicalRegister(UseMI->getOperand(0).getReg()) &&
		!TRI->isSGPRReg(MRI, UseMI->getOperand(0).getReg())) {
		rampitec [Done]: I think you can decrease nesting here. At the very least join individual "if" conditions with "&&".
		rampitec [Done]: You can still decrease nesting.
		hasVGPRUses++;
		rampitec [Done]: !isSGPRReg()
		}
		worklist.insert(UseMI);
		continue;
		}

// We don't need to fix the PHI if the common dominator of the		if (UseMI->isPHI()) {
// two incoming blocks terminates with a uniform branch.		if (!TRI->isSGPRReg(MRI, Use.getReg()))
bool HasVGPROperand = phiHasVGPROperands(MI, MRI, TRI, TII);		hasVGPRUses++;
if (MI.getNumExplicitOperands() == 5 && !HasVGPROperand) {		continue;
MachineBasicBlock *MBB0 = MI.getOperand(2).getMBB();		}
MachineBasicBlock *MBB1 = MI.getOperand(4).getMBB();

if (!predsHasDivergentTerminator(MBB0, TRI) &&		unsigned OpNo = UseMI->getOperandNo(&Use);
!predsHasDivergentTerminator(MBB1, TRI)) {		const MCInstrDesc &Desc = TII->get(UseMI->getOpcode());
LLVM_DEBUG(dbgs()		if (Desc.OpInfo && Desc.OpInfo[OpNo].RegClass != -1) {
<< "Not fixing PHI for uniform branch: " << MI << '\n');		const TargetRegisterClass *OpRC =
		TRI->getRegClass(Desc.OpInfo[OpNo].RegClass);
		if (!TRI->isSGPRClass(OpRC) && OpRC != &AMDGPU::VS_32RegClass &&
		OpRC != &AMDGPU::VS_64RegClass) {
		hasVGPRUses++;
		}
		}
		}
		}
		bool hasVGPRInput = false;
		for (unsigned i = 1; i < MI.getNumOperands(); i += 2) {
		unsigned InputReg = MI.getOperand(i).getReg();
		MachineInstr *Def = MRI.getVRegDef(InputReg);
		if (TRI->isVGPR(MRI, InputReg)) {
		if (Def->isCopy()) {
		unsigned SrcReg = Def->getOperand(1).getReg();
		const TargetRegisterClass *RC =
		TRI->isVirtualRegister(SrcReg) ? MRI.getRegClass(SrcReg)
		: TRI->getPhysRegClass(SrcReg);
		if (TRI->isSGPRClass(RC))
		continue;
		}
		hasVGPRInput = true;
		break;
		} else if (Def->isCopy() &&
		TRI->isVGPR(MRI, Def->getOperand(1).getReg())) {
		hasVGPRInput = true;
break;		break;
}		}
}		}
		unsigned PHIRes = MI.getOperand(0).getReg();
		const TargetRegisterClass *RC0 = MRI.getRegClass(PHIRes);

// If a PHI node defines an SGPR and any of its operands are VGPRs,		if ((!TRI->isVGPR(MRI, PHIRes) && RC0 != &AMDGPU::VReg_1RegClass) &&
// then we need to move it to the VALU.		(hasVGPRInput || hasVGPRUses > 1)) {
//		TII->moveToVALU(MI);
// Also, if a PHI node defines an SGPR and has all SGPR operands		} else {
// we must move it to the VALU, because the SGPR operands will		TII->legalizeOperands(MI, MDT);
// all end up being assigned the same register, which means
// there is a potential for a conflict if different threads take
// different control flow paths.
//
// For Example:
//
// sgpr0 = def;
// ...
// sgpr1 = def;
// ...
// sgpr2 = PHI sgpr0, sgpr1
// use sgpr2;
//
// Will Become:
//
// sgpr2 = def;
// ...
// sgpr2 = def;
// ...
// use sgpr2
//
// The one exception to this rule is when one of the operands
// is defined by a SI_BREAK, SI_IF_BREAK, or SI_ELSE_BREAK
// instruction. In this case, there we know the program will
// never enter the second block (the loop) without entering
// the first block (where the condition is computed), so there
// is no chance for values to be over-written.

SmallSet<unsigned, 8> Visited;
if (HasVGPROperand \|\| !phiHasBreakDef(MI, MRI, Visited)) {
LLVM_DEBUG(dbgs() << "Fixing PHI: " << MI);
TII->moveToVALU(MI, MDT);
}		}

break;		break;
}		}
case AMDGPU::REG_SEQUENCE:		case AMDGPU::REG_SEQUENCE:
if (TRI->hasVGPRs(TII->getOpRegClass(MI, 0)) ||		if (TRI->hasVGPRs(TII->getOpRegClass(MI, 0)) ||
!hasVGPROperands(MI, TRI)) {		!hasVGPROperands(MI, TRI)) {
foldVGPRCopyIntoRegSequence(MI, TRI, TII, MRI);		foldVGPRCopyIntoRegSequence(MI, TRI, TII, MRI);
continue;		continue;
}		}
…
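The rewritten PHI case in SIFixSGPRCopies drops the old uniform-branch and SI_BREAK heuristics in favour of a use count: it walks the PHI result's users through COPY and REG_SEQUENCE, counts the uses that need a VGPR, checks whether any input is (or is copied from) a VGPR, and calls moveToVALU only when the PHI does not already define a VGPR or VReg_1 value and there is either a VGPR input or more than one VGPR use; otherwise it only legalizes the operands. The sketch below models just that decision on a hypothetical toy graph. Kind, Use, Instr, countVGPRUses and shouldMoveToVALU are not LLVM API, and each use carries one precomputed needsVGPR flag in place of the patch's separate physical-copy, PHI and operand-class checks.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy model, not LLVM API.
enum class Kind { Copy, RegSequence, Phi, Other };

struct Use {
  int user;        // index of the using instruction
  bool needsVGPR;  // Copy: destination is a non-SGPR physical register;
                   // Phi/Other: this use cannot read an SGPR;
                   // RegSequence: ignored
};

struct Instr {
  Kind kind;
  std::vector<Use> uses;  // users of the value this instruction defines
};

// Counts uses of Instrs[Root]'s result that need a VGPR, looking through
// COPY and REG_SEQUENCE users as the patch's worklist does.
unsigned countVGPRUses(const std::vector<Instr> &Instrs, int Root) {
  unsigned Count = 0;
  std::vector<int> Worklist{Root};
  std::set<int> Visited{Root};
  while (!Worklist.empty()) {
    int Cur = Worklist.back();
    Worklist.pop_back();
    for (const Use &U : Instrs[Cur].uses) {
      assert(U.user >= 0 && U.user < static_cast<int>(Instrs.size()));
      Kind K = Instrs[U.user].kind;
      if (K == Kind::Copy || K == Kind::RegSequence) {
        if (K == Kind::Copy && U.needsVGPR)
          ++Count;
        if (Visited.insert(U.user).second)
          Worklist.push_back(U.user);  // keep following the copied value
        continue;
      }
      if (U.needsVGPR)
        ++Count;  // PHI or ordinary user; not looked through
    }
  }
  return Count;
}

// Final test on the PHI: move it to the VALU only if it does not already
// define a VGPR or VReg_1 value and it has a VGPR input or >1 VGPR uses.
bool shouldMoveToVALU(bool ResultIsVGPR, bool ResultIsVReg1,
                      bool HasVGPRInput, unsigned VGPRUses) {
  return !ResultIsVGPR && !ResultIsVReg1 && (HasVGPRInput || VGPRUses > 1);
}
```

The "> 1" threshold presumably means that a single VGPR use is left to an SGPR-to-VGPR copy instead of moving the whole PHI, and everything feeding it, to the VALU.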

lib/Target/AMDGPU/SIISelLowering.h

…	bool isCanonicalized(SelectionDAG &DAG, SDValue Op,
unsigned MaxDepth = 5) const;		unsigned MaxDepth = 5) const;
bool denormalsEnabledForType(EVT VT) const;		bool denormalsEnabledForType(EVT VT) const;

bool isKnownNeverNaNForTargetNode(SDValue Op,		bool isKnownNeverNaNForTargetNode(SDValue Op,
const SelectionDAG &DAG,		const SelectionDAG &DAG,
bool SNaN = false,		bool SNaN = false,
unsigned Depth = 0) const override;		unsigned Depth = 0) const override;
AtomicExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *) const override;		AtomicExpansionKind shouldExpandAtomicRMWInIR(AtomicRMWInst *) const override;
		virtual const TargetRegisterClass *
		getRegClassFor(MVT VT, bool isDivergent) const override;
		virtual bool requiresUniformRegister(MachineFunction &MF,
		const Value *V) const override;
		rampitec [Done]: Formatting.
unsigned getPrefLoopAlignment(MachineLoop *ML) const override;		unsigned getPrefLoopAlignment(MachineLoop *ML) const override;
};		};

} // End namespace llvm		} // End namespace llvm

#endif		#endif
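The header now declares a divergence-aware getRegClassFor(MVT VT, bool isDivergent). Its definition in SIISelLowering.cpp starts from the type's default class, returns SReg_64 for a uniform VReg_1 (i1) value, swaps a non-SGPR class to its equivalent SGPR class for uniform values, and swaps an SGPR class to its equivalent VGPR class for divergent ones. A minimal standalone model of those branches follows; the five-member RC enum and the hand-written equivalence tables are hypothetical stand-ins for TargetRegisterClass objects and for SIRegisterInfo::getEquivalentSGPRClass/getEquivalentVGPRClass.

```cpp
#include <cassert>

// Hypothetical stand-ins for a few AMDGPU register classes (not LLVM API).
enum class RC { SReg_32, SReg_64, VGPR_32, VReg_64, VReg_1 };

bool isSGPRClass(RC C) { return C == RC::SReg_32 || C == RC::SReg_64; }

// Assumed equivalence tables, for illustration only.
RC equivalentSGPR(RC C) {
  switch (C) {
  case RC::VGPR_32: return RC::SReg_32;
  case RC::VReg_64: return RC::SReg_64;
  default: return C;
  }
}

RC equivalentVGPR(RC C) {
  switch (C) {
  case RC::SReg_32: return RC::VGPR_32;
  case RC::SReg_64: return RC::VReg_64;
  default: return C;
  }
}

// Follows the branch order of SITargetLowering::getRegClassFor in the patch.
RC regClassFor(RC Default, bool IsDivergent) {
  if (Default == RC::VReg_1 && !IsDivergent)
    return RC::SReg_64;  // uniform i1 goes to a 64-bit scalar register
  if (!isSGPRClass(Default) && !IsDivergent)
    return equivalentSGPR(Default);
  if (isSGPRClass(Default) && IsDivergent)
    return equivalentVGPR(Default);
  return Default;  // already matches the divergence
}
```

A divergent VReg_1 falls through every branch and keeps VReg_1. The uniform-i1-to-SReg_64 mapping is the case nhaehnle's inline comment asks about (keeping uniform i1 in 32-bit registers instead).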

lib/Target/AMDGPU/SIISelLowering.cpp

…	case AMDGPU::V_DIV_SCALE_F64: {
SDValue Src2 = Node->getOperand(2);		SDValue Src2 = Node->getOperand(2);

if ((Src0.isMachineOpcode() &&		if ((Src0.isMachineOpcode() &&
Src0.getMachineOpcode() != AMDGPU::IMPLICIT_DEF) &&		Src0.getMachineOpcode() != AMDGPU::IMPLICIT_DEF) &&
(Src0 == Src1 || Src0 == Src2))		(Src0 == Src1 || Src0 == Src2))
break;		break;

MVT VT = Src0.getValueType().getSimpleVT();		MVT VT = Src0.getValueType().getSimpleVT();
const TargetRegisterClass *RC = getRegClassFor(VT);		const TargetRegisterClass *RC =
		getRegClassFor(VT, Src0.getNode()->isDivergent());
		rampitec [Done]: Formatting.

MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo();		MachineRegisterInfo &MRI = DAG.getMachineFunction().getRegInfo();
SDValue UndefReg = DAG.getRegister(MRI.createVirtualRegister(RC), VT);		SDValue UndefReg = DAG.getRegister(MRI.createVirtualRegister(RC), VT);

SDValue ImpDef = DAG.getCopyToReg(DAG.getEntryNode(), SDLoc(Node),		SDValue ImpDef = DAG.getCopyToReg(DAG.getEntryNode(), SDLoc(Node),
UndefReg, Src0, SDValue());		UndefReg, Src0, SDValue());

// src0 must be the same register as src1 or src2, even if the value is		// src0 must be the same register as src1 or src2, even if the value is
…
TargetLowering::AtomicExpansionKind		TargetLowering::AtomicExpansionKind
SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {		SITargetLowering::shouldExpandAtomicRMWInIR(AtomicRMWInst *RMW) const {
switch (RMW->getOperation()) {		switch (RMW->getOperation()) {
case AtomicRMWInst::FAdd: {		case AtomicRMWInst::FAdd: {
Type *Ty = RMW->getType();		Type *Ty = RMW->getType();

// We don't have a way to support 16-bit atomics now, so just leave them		// We don't have a way to support 16-bit atomics now, so just leave them
// as-is.		// as-is.
if (Ty->isHalfTy())		if (Ty->isHalfTy())
return AtomicExpansionKind::None;		return AtomicExpansionKind::None;
		nhaehnle [Not Done]: Can we not keep uniform i1 in 32-bit registers? Maybe at least mark this as a TODO?

		rampitec [Done]: !isSGPRClass()
if (!Ty->isFloatTy())		if (!Ty->isFloatTy())
return AtomicExpansionKind::CmpXChg;		return AtomicExpansionKind::CmpXChg;

// TODO: Do have these for flat. Older targets also had them for buffers.		// TODO: Do have these for flat. Older targets also had them for buffers.
unsigned AS = RMW->getPointerAddressSpace();		unsigned AS = RMW->getPointerAddressSpace();
return (AS == AMDGPUAS::LOCAL_ADDRESS && Subtarget->hasLDSFPAtomics()) ?		return (AS == AMDGPUAS::LOCAL_ADDRESS && Subtarget->hasLDSFPAtomics()) ?
AtomicExpansionKind::None : AtomicExpansionKind::CmpXChg;		AtomicExpansionKind::None : AtomicExpansionKind::CmpXChg;
}		}
default:		default:
break;		break;
}		}

return AMDGPUTargetLowering::shouldExpandAtomicRMWInIR(RMW);		return AMDGPUTargetLowering::shouldExpandAtomicRMWInIR(RMW);
}		}

		const TargetRegisterClass *
		SITargetLowering::getRegClassFor(MVT VT, bool isDivergent) const {
		const TargetRegisterClass *RC = TargetLoweringBase::getRegClassFor(VT, false);
		const SIRegisterInfo *TRI = Subtarget->getRegisterInfo();
		if (RC == &AMDGPU::VReg_1RegClass && !isDivergent)
		return &AMDGPU::SReg_64RegClass;
		if (!TRI->isSGPRClass(RC) && !isDivergent)
		return TRI->getEquivalentSGPRClass(RC);
		else if (TRI->isSGPRClass(RC) && isDivergent)
		return TRI->getEquivalentVGPRClass(RC);

		return RC;
		}

		static bool hasIfBreakUser(const Value *V, SetVector<const Value *> &Visited) {
		if (Visited.count(V))
		return false;
		Visited.insert(V);
		bool Result = false;
		for (auto U : V->users()) {
		if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(U)) {
		if ((Intrinsic->getIntrinsicID() == Intrinsic::amdgcn_if_break) &&
		(V == U->getOperand(1)))
		efriedma [Not Done]: It's probably not a good idea to traverse the use-list of anything that isn't an instruction here; you could end up finding uses in a different function. I don't really understand what you're trying to do here; is this going to force an arbitrary tree of instructions into uniform registers?
		alex-t (author) [Done]: Hmm... I was pretty sure that the UseList is per function... is it really the same for the whole module? That looks weird to me. As for what I'm doing here: the whole program slice that produces/consumes the 64-bit mask for the SI control-flow stuff should be forced into SGPRs. It is NOT an arbitrary set. I start from the value in question and do a DFS on the def-use tree until I meet SI_IF_BREAK or reach the end. If I meet SI_IF_BREAK, the whole tree requires SGPRs for those concrete uses. Let's say we have some instruction that produces the value, an SI_IF_BREAK that finally uses the derived value, and some other instruction in between that is naturally VALU. If this VALU instruction's description does not allow an SGPR in that exact operand position, we'll have to insert an SGPR-to-VGPR COPY. Once more, this is a target-specific interpretation: divergence of the high-level value does not strictly require it to be assigned to a VGPR. It may be a 64-bit mask.
		efriedma [Not Done]: Yes, use-lists are global, in general. For Instructions and Arguments specifically, all the uses are required to be in the same function as the definition, though, so maybe you're fine here. If I'm following correctly, amdgcn_if_break are specialized intrinsics which are generated just before isel, and there's a specific set of PHI nodes and intrinsics that need to remain in SGPRs? I'm not quite sure I follow why you can't solve the issue by making divergence analysis treat amdgcn_if_break as a non-divergent operation. But I'll let a reviewer more familiar with the target handle that. Given the way the input is currently structured, the check is reasonable, I guess.
		alex-t (author) [Done]: Why can't we just add the exception to the DA for SI_IF_BREAK? Typically all the CF intrinsics are connected by the 64-bit mask that is defined by one (SI_IF, for instance) and then used by another (SI_Else and SI_END_CF, for instance). This mask is always required to reside in a 64-bit SGPR. At the same time, the condition is a usual boolean value and can be either divergent or not. If we force the whole intrinsic to be uniform, we would spoil the divergence propagation for those values that are control-dependent on the divergent CF. That's why the following code: if ((Intrinsic->getIntrinsicID() == Intrinsic::amdgcn_if_break) && (V == U->getOperand(1))) Result = true; explicitly checks which operand is the use.
		efriedma [Not Done]: Okay, that makes sense.
		Result = true;
		} else {
		Result = hasIfBreakUser(U, Visited);
		}
		if (Result)
		break;
		}
		return Result;
		}

		bool SITargetLowering::requiresUniformRegister(MachineFunction &MF,
		const Value *V) const {
		if (const IntrinsicInst *Intrinsic = dyn_cast<IntrinsicInst>(V)) {
		switch (Intrinsic->getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::amdgcn_if_break:
		return true;
		}
		}
		if (const ExtractValueInst *ExtValue = dyn_cast<ExtractValueInst>(V)) {
		if (const IntrinsicInst *Intrinsic =
		dyn_cast<IntrinsicInst>(ExtValue->getOperand(0))) {
		switch (Intrinsic->getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::amdgcn_if:
		case Intrinsic::amdgcn_else: {
		ArrayRef<unsigned> Indices = ExtValue->getIndices();
		if (Indices.size() == 1 && Indices[0] == 1) {
		return true;
		}
		}
		}
		}
		}
		if (const CallInst *CI = dyn_cast<CallInst>(V)) {
		if (isa<InlineAsm>(CI->getCalledValue())) {
		const SIRegisterInfo *SIRI = Subtarget->getRegisterInfo();
		ImmutableCallSite CS(CI);
		TargetLowering::AsmOperandInfoVector TargetConstraints = ParseConstraints(
		MF.getDataLayout(), Subtarget->getRegisterInfo(), CS);
		for (auto &TC : TargetConstraints) {
		if (TC.Type == InlineAsm::isOutput) {
		ComputeConstraintToUse(TC, SDValue());
		unsigned AssignedReg;
		const TargetRegisterClass *RC;
		std::tie(AssignedReg, RC) = getRegForInlineAsmConstraint(
		SIRI, TC.ConstraintCode,
		getSimpleValueType(MF.getDataLayout(), CS.getType()));
		if (RC) {
		MachineRegisterInfo &MRI = MF.getRegInfo();
		if (AssignedReg != 0 && SIRI->isSGPRReg(MRI, AssignedReg))
		return true;
		else if (SIRI->isSGPRClass(RC))
		return true;
		}
		}
		}
		}
		}
		SetVector<const Value *> Visited;
		return hasIfBreakUser(V, Visited);
		}
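The helper hasIfBreakUser implements the search alex-t describes in the thread above: a depth-first walk from a value over its users that returns true once it reaches an amdgcn_if_break using the value as operand 1 (from the discussion, apparently the mask input rather than the boolean condition). requiresUniformRegister falls back to this walk after its intrinsic and inline-asm checks. The toy version below reproduces the same traversal rules on a hypothetical index-based graph; NodeKind, UserEdge and Node are illustrative types, not LLVM's. As in the patch, users that are other intrinsics are neither matched nor looked through.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy IR (not LLVM API).
enum class NodeKind { Plain, IfBreakIntrinsic, OtherIntrinsic };

struct UserEdge {
  int user;            // index of the using node
  unsigned operandNo;  // operand slot in which the value is used
};

struct Node {
  NodeKind kind;
  std::vector<UserEdge> users;
};

// Depth-first walk over users following the rules of the patch's
// hasIfBreakUser. Visited stops re-entry, e.g. through a loop PHI cycle.
bool hasIfBreakUser(const std::vector<Node> &G, int V, std::set<int> &Visited) {
  if (!Visited.insert(V).second)
    return false;  // already explored from here
  for (const UserEdge &E : G[V].users) {
    const Node &U = G[E.user];
    if (U.kind == NodeKind::IfBreakIntrinsic) {
      if (E.operandNo == 1)
        return true;  // value reaches the if_break mask operand
    } else if (U.kind == NodeKind::Plain) {
      if (hasIfBreakUser(G, E.user, Visited))
        return true;
    }
    // OtherIntrinsic users are neither matched nor traversed.
  }
  return false;
}
```

Checking the operand number rather than just the user is what keeps a divergent boolean condition feeding the same if_break out of SGPRs, which is the point of alex-t's reply to efriedma.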

lib/Target/AMDGPU/SIInstrInfo.cpp

…	if (Src2->isReg() && Src2->getReg() == Reg) {
AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2));		AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2));

// ChangingToImmediate adds Src2 back to the instruction.		// ChangingToImmediate adds Src2 back to the instruction.
Src2->ChangeToImmediate(Imm);		Src2->ChangeToImmediate(Imm);

// These come before src2.		// These come before src2.
removeModOperands(UseMI);		removeModOperands(UseMI);
UseMI.setDesc(get(NewOpc));		UseMI.setDesc(get(NewOpc));
		// It might happen that UseMI was commuted
		// and we now have SGPR as SRC1. If so 2 inlined
		// constant and SGPR are illegal.
		legalizeOperands(UseMI);

bool DeleteDef = MRI->hasOneNonDBGUse(Reg);		bool DeleteDef = MRI->hasOneNonDBGUse(Reg);
if (DeleteDef)		if (DeleteDef)
DefMI.eraseFromParent();		DefMI.eraseFromParent();

return true;		return true;
}		}
}		}
…	void SIInstrInfo::legalizeGenericOperand(MachineBasicBlock &InsertMBB,
Op.setReg(DstReg);		Op.setReg(DstReg);
Op.setSubReg(0);		Op.setSubReg(0);

MachineInstr *Def = MRI.getVRegDef(OpReg);		MachineInstr *Def = MRI.getVRegDef(OpReg);
if (!Def)		if (!Def)
return;		return;

// Try to eliminate the copy if it is copying an immediate value.		// Try to eliminate the copy if it is copying an immediate value.
if (Def->isMoveImmediate())		if (Def->isMoveImmediate() && DstRC != &AMDGPU::VReg_1RegClass)
FoldImmediate(Copy, Def, OpReg, &MRI);		FoldImmediate(Copy, Def, OpReg, &MRI);
}		}

// Emit the actual waterfall loop, executing the wrapped instruction for each		// Emit the actual waterfall loop, executing the wrapped instruction for each
// unique value of \p Rsrc across all lanes. In the best case we execute 1		// unique value of \p Rsrc across all lanes. In the best case we execute 1
// iteration, in the worst case we execute 64 (once per lane).		// iteration, in the worst case we execute 64 (once per lane).
static void		static void
emitLoadSRsrcFromVGPRLoop(const SIInstrInfo &TII, MachineRegisterInfo &MRI,		emitLoadSRsrcFromVGPRLoop(const SIInstrInfo &TII, MachineRegisterInfo &MRI,
…	if (MI.getOpcode() == AMDGPU::PHI) {
}		}

// If any of the operands are VGPR registers, then they all most be		// If any of the operands are VGPR registers, then they all most be
// otherwise we will create illegal VGPR->SGPR copies when legalizing		// otherwise we will create illegal VGPR->SGPR copies when legalizing
// them.		// them.
if (VRC || !RI.isSGPRClass(getOpRegClass(MI, 0))) {		if (VRC || !RI.isSGPRClass(getOpRegClass(MI, 0))) {
if (!VRC) {		if (!VRC) {
assert(SRC);		assert(SRC);
		if (getOpRegClass(MI, 0) == &AMDGPU::VReg_1RegClass) {
		VRC = &AMDGPU::VReg_1RegClass;
		} else
VRC = RI.getEquivalentVGPRClass(SRC);		VRC = RI.getEquivalentVGPRClass(SRC);
}		}
RC = VRC;		RC = VRC;
} else {		} else {
RC = SRC;		RC = SRC;
}		}

// Update all the operands so they have the same type.		// Update all the operands so they have the same type.
for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {		for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
…	const TargetRegisterClass *SIInstrInfo::getDestEquivalentVGPRClass(
// class associated with the operand, so we need to find an equivalent VGPR		// class associated with the operand, so we need to find an equivalent VGPR
// register class in order to move the instruction to the VALU.		// register class in order to move the instruction to the VALU.
case AMDGPU::COPY:		case AMDGPU::COPY:
case AMDGPU::PHI:		case AMDGPU::PHI:
case AMDGPU::REG_SEQUENCE:		case AMDGPU::REG_SEQUENCE:
case AMDGPU::INSERT_SUBREG:		case AMDGPU::INSERT_SUBREG:
case AMDGPU::WQM:		case AMDGPU::WQM:
case AMDGPU::WWM:		case AMDGPU::WWM:
if (RI.hasVGPRs(NewDstRC))		if (RI.hasVGPRs(NewDstRC) || NewDstRC == &AMDGPU::VReg_1RegClass)
return nullptr;		return nullptr;

NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);		NewDstRC = RI.getEquivalentVGPRClass(NewDstRC);
if (!NewDstRC)		if (!NewDstRC)
return nullptr;		return nullptr;
return NewDstRC;		return NewDstRC;
default:		default:
return NewDstRC;		return NewDstRC;
…

lib/Target/AMDGPU/SIRegisterInfo.h

…	public:

unsigned getSGPRPressureSet() const { return SGPRSetID; };		unsigned getSGPRPressureSet() const { return SGPRSetID; };
unsigned getVGPRPressureSet() const { return VGPRSetID; };		unsigned getVGPRPressureSet() const { return VGPRSetID; };

const TargetRegisterClass *getRegClassForReg(const MachineRegisterInfo &MRI,		const TargetRegisterClass *getRegClassForReg(const MachineRegisterInfo &MRI,
unsigned Reg) const;		unsigned Reg) const;
bool isVGPR(const MachineRegisterInfo &MRI, unsigned Reg) const;		bool isVGPR(const MachineRegisterInfo &MRI, unsigned Reg) const;

		virtual bool
		isDivergentRegClass(const TargetRegisterClass *RC) const override {
		return !isSGPRClass(RC);
		rampitec [Done]: !isSGPRClass()
		}

bool isSGPRPressureSet(unsigned SetID) const {		bool isSGPRPressureSet(unsigned SetID) const {
return SGPRPressureSets.test(SetID) && !VGPRPressureSets.test(SetID);		return SGPRPressureSets.test(SetID) && !VGPRPressureSets.test(SetID);
}		}
bool isVGPRPressureSet(unsigned SetID) const {		bool isVGPRPressureSet(unsigned SetID) const {
return VGPRPressureSets.test(SetID) && !SGPRPressureSets.test(SetID);		return VGPRPressureSets.test(SetID) && !SGPRPressureSets.test(SetID);
}		}

ArrayRef<int16_t> getRegSplitParts(const TargetRegisterClass *RC,		ArrayRef<int16_t> getRegSplitParts(const TargetRegisterClass *RC,
…

lib/Target/ARM/ARMISelLowering.h

…	public:
}		}

const ARMSubtarget* getSubtarget() const {		const ARMSubtarget* getSubtarget() const {
return Subtarget;		return Subtarget;
}		}

/// getRegClassFor - Return the register class that should be used for the		/// getRegClassFor - Return the register class that should be used for the
/// specified value type.		/// specified value type.
const TargetRegisterClass *getRegClassFor(MVT VT) const override;		const TargetRegisterClass *
		getRegClassFor(MVT VT, bool isDivergent = false) const override;
		rampitec [Done]: Formatting.

/// Returns true if a cast between SrcAS and DestAS is a noop.		/// Returns true if a cast between SrcAS and DestAS is a noop.
bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override {		bool isNoopAddrSpaceCast(unsigned SrcAS, unsigned DestAS) const override {
// Addrspacecasts are always noops.		// Addrspacecasts are always noops.
return true;		return true;
}		}

bool shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,		bool shouldAlignPointerArgs(CallInst *CI, unsigned &MinSize,
…

lib/Target/ARM/ARMISelLowering.cpp

…	EVT ARMTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &,
EVT VT) const {		EVT VT) const {
if (!VT.isVector())		if (!VT.isVector())
return getPointerTy(DL);		return getPointerTy(DL);
return VT.changeVectorElementTypeToInteger();		return VT.changeVectorElementTypeToInteger();
}		}

/// getRegClassFor - Return the register class that should be used for the		/// getRegClassFor - Return the register class that should be used for the
/// specified value type.		/// specified value type.
const TargetRegisterClass *ARMTargetLowering::getRegClassFor(MVT VT) const {		const TargetRegisterClass *
		ARMTargetLowering::getRegClassFor(MVT VT, bool isDivergent) const {
		(void)isDivergent;
		rampitec [Done]: Formatting.
// Map v4i64 to QQ registers but do not make the type legal. Similarly map		// Map v4i64 to QQ registers but do not make the type legal. Similarly map
// v8i64 to QQQQ registers. v4i64 and v8i64 are only used for REG_SEQUENCE to		// v8i64 to QQQQ registers. v4i64 and v8i64 are only used for REG_SEQUENCE to
// load / store 4 to 8 consecutive D registers.		// load / store 4 to 8 consecutive D registers.
if (Subtarget->hasNEON()) {		if (Subtarget->hasNEON()) {
if (VT == MVT::v4i64)		if (VT == MVT::v4i64)
return &ARM::QQPRRegClass;		return &ARM::QQPRRegClass;
if (VT == MVT::v8i64)		if (VT == MVT::v8i64)
return &ARM::QQQQPRRegClass;		return &ARM::QQQQPRRegClass;
…

test/CodeGen/AMDGPU/atomicrmw-nand.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	define i32 @atomic_nand_i32_lds(i32 addrspace(3)* %ptr) nounwind {			define i32 @atomic_nand_i32_lds(i32 addrspace(3)* %ptr) nounwind {
	; GCN-LABEL: atomic_nand_i32_lds:			; GCN-LABEL: atomic_nand_i32_lds:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: ds_read_b32 v2, v0			; GCN-NEXT: ds_read_b32 v1, v0
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: BB0_1: ; %atomicrmw.start			; GCN-NEXT: BB0_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v2, v1
	; GCN-NEXT: v_not_b32_e32 v1, v2			; GCN-NEXT: v_not_b32_e32 v1, v2
	; GCN-NEXT: v_or_b32_e32 v1, -5, v1			; GCN-NEXT: v_or_b32_e32 v1, -5, v1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: ds_cmpst_rtn_b32 v1, v0, v2, v1			; GCN-NEXT: ds_cmpst_rtn_b32 v1, v0, v2, v1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v1, v2			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v1, v2
	; GCN-NEXT: v_mov_b32_e32 v2, v1
	; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]			; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]			; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]
	; GCN-NEXT: s_cbranch_execnz BB0_1			; GCN-NEXT: s_cbranch_execnz BB0_1
	; GCN-NEXT: ; %bb.2: ; %atomicrmw.end			; GCN-NEXT: ; %bb.2: ; %atomicrmw.end
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v0, v1			; GCN-NEXT: v_mov_b32_e32 v0, v1
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%result = atomicrmw nand i32 addrspace(3)* %ptr, i32 4 seq_cst			%result = atomicrmw nand i32 addrspace(3)* %ptr, i32 4 seq_cst
	ret i32 %result			ret i32 %result
	}			}

	define i32 @atomic_nand_i32_global(i32 addrspace(1)* %ptr) nounwind {			define i32 @atomic_nand_i32_global(i32 addrspace(1)* %ptr) nounwind {
	; GCN-LABEL: atomic_nand_i32_global:			; GCN-LABEL: atomic_nand_i32_global:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: global_load_dword v3, v[0:1], off			; GCN-NEXT: global_load_dword v2, v[0:1], off
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: BB1_1: ; %atomicrmw.start			; GCN-NEXT: BB1_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: v_not_b32_e32 v2, v3			; GCN-NEXT: v_not_b32_e32 v2, v3
	; GCN-NEXT: v_or_b32_e32 v2, -5, v2			; GCN-NEXT: v_or_b32_e32 v2, -5, v2
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc			; GCN-NEXT: global_atomic_cmpswap v2, v[0:1], v[2:3], off glc
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]			; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]			; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]
	; GCN-NEXT: s_cbranch_execnz BB1_1			; GCN-NEXT: s_cbranch_execnz BB1_1
	; GCN-NEXT: ; %bb.2: ; %atomicrmw.end			; GCN-NEXT: ; %bb.2: ; %atomicrmw.end
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v0, v2			; GCN-NEXT: v_mov_b32_e32 v0, v2
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%result = atomicrmw nand i32 addrspace(1)* %ptr, i32 4 seq_cst			%result = atomicrmw nand i32 addrspace(1)* %ptr, i32 4 seq_cst
	ret i32 %result			ret i32 %result
	}			}

	define i32 @atomic_nand_i32_flat(i32* %ptr) nounwind {			define i32 @atomic_nand_i32_flat(i32* %ptr) nounwind {
	; GCN-LABEL: atomic_nand_i32_flat:			; GCN-LABEL: atomic_nand_i32_flat:
	; GCN: ; %bb.0:			; GCN: ; %bb.0:
	; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GCN-NEXT: flat_load_dword v3, v[0:1]			; GCN-NEXT: flat_load_dword v2, v[0:1]
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: BB2_1: ; %atomicrmw.start			; GCN-NEXT: BB2_1: ; %atomicrmw.start
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
				; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: v_not_b32_e32 v2, v3			; GCN-NEXT: v_not_b32_e32 v2, v3
	; GCN-NEXT: v_or_b32_e32 v2, -5, v2			; GCN-NEXT: v_or_b32_e32 v2, -5, v2
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc			; GCN-NEXT: flat_atomic_cmpswap v2, v[0:1], v[2:3] glc
	; GCN-NEXT: s_waitcnt vmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0)
	; GCN-NEXT: buffer_wbinvl1_vol			; GCN-NEXT: buffer_wbinvl1_vol
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3			; GCN-NEXT: v_cmp_eq_u32_e32 vcc, v2, v3
	; GCN-NEXT: v_mov_b32_e32 v3, v2
	; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]			; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]
	; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]			; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]
	; GCN-NEXT: s_cbranch_execnz BB2_1			; GCN-NEXT: s_cbranch_execnz BB2_1
	; GCN-NEXT: ; %bb.2: ; %atomicrmw.end			; GCN-NEXT: ; %bb.2: ; %atomicrmw.end
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
	; GCN-NEXT: v_mov_b32_e32 v0, v2			; GCN-NEXT: v_mov_b32_e32 v0, v2
	; GCN-NEXT: s_setpc_b64 s[30:31]			; GCN-NEXT: s_setpc_b64 s[30:31]
	%result = atomicrmw nand i32* %ptr, i32 4 seq_cst			%result = atomicrmw nand i32* %ptr, i32 4 seq_cst
	ret i32 %result			ret i32 %result
	}			}

test/CodeGen/AMDGPU/branch-relaxation.ll

	…

	bb3:			bb3:
	store volatile i32 %cnd, i32 addrspace(1)* %arg			store volatile i32 %cnd, i32 addrspace(1)* %arg
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}uniform_conditional_min_long_forward_vcnd_branch:			; GCN-LABEL: {{^}}uniform_conditional_min_long_forward_vcnd_branch:
	; GCN: s_load_dword [[CND:s[0-9]+]]			; GCN: s_load_dword [[CND:s[0-9]+]]
	; GCN-DAG: v_mov_b32_e32 [[V_CND:v[0-9]+]], [[CND]]
	; GCN-DAG: v_cmp_eq_f32_e64 [[UNMASKED:s\[[0-9]+:[0-9]+\]]], [[CND]], 0			; GCN-DAG: v_cmp_eq_f32_e64 [[UNMASKED:s\[[0-9]+:[0-9]+\]]], [[CND]], 0
	; GCN-DAG: s_and_b64 vcc, exec, [[UNMASKED]]			; GCN-DAG: s_and_b64 vcc, exec, [[UNMASKED]]
	; GCN: s_cbranch_vccz [[LONGBB:BB[0-9]+_[0-9]+]]			; GCN: s_cbranch_vccz [[LONGBB:BB[0-9]+_[0-9]+]]

	; GCN-NEXT: [[LONG_JUMP:BB[0-9]+_[0-9]+]]: ; %bb0			; GCN-NEXT: [[LONG_JUMP:BB[0-9]+_[0-9]+]]: ; %bb0
	; GCN-NEXT: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}			; GCN-NEXT: s_getpc_b64 s{{\[}}[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]{{\]}}
	; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], [[ENDBB:BB[0-9]+_[0-9]+]]-([[LONG_JUMP]]+4)			; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], [[ENDBB:BB[0-9]+_[0-9]+]]-([[LONG_JUMP]]+4)
	; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], 0			; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], 0
	; GCN-NEXT: s_setpc_b64 s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}			; GCN-NEXT: s_setpc_b64 s{{\[}}[[PC_LO]]:[[PC_HI]]{{\]}}

	; GCN-NEXT: [[LONGBB]]:			; GCN-NEXT: [[LONGBB]]:
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64
	; GCN: v_nop_e64			; GCN: v_nop_e64

	; GCN: [[ENDBB]]:			; GCN: [[ENDBB]]:
				; GCN: v_mov_b32_e32 [[V_CND:v[0-9]+]], [[CND]]
	; GCN: buffer_store_dword [[V_CND]]			; GCN: buffer_store_dword [[V_CND]]
	; GCN: s_endpgm			; GCN: s_endpgm
	define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(float addrspace(1)* %arg, float %cnd) #0 {			define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(float addrspace(1)* %arg, float %cnd) #0 {
	bb0:			bb0:
	%cmp = fcmp oeq float %cnd, 0.0			%cmp = fcmp oeq float %cnd, 0.0
	br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch			br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch

	bb2:			bb2:

test/CodeGen/AMDGPU/branch-uniformity.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck %s

	; The branch instruction in LOOP49 has a uniform condition, but PHI instructions			; The branch instruction in LOOP49 has a uniform condition, but PHI instructions
	; introduced by the structurizecfg pass previously caused a false divergence			; introduced by the structurizecfg pass previously caused a false divergence
	; which ended up in an assertion (or incorrect code) because			; which ended up in an assertion (or incorrect code) because
	; SIAnnotateControlFlow and structurizecfg had different ideas about which			; SIAnnotateControlFlow and structurizecfg had different ideas about which
	; branches are uniform.			; branches are uniform.
	;			;
	; CHECK-LABEL: {{^}}main:			; CHECK-LABEL: {{^}}main:
	; CHECK: ; %LOOP49			; CHECK: ; %LOOP49
	; CHECK: v_cmp_ne_u32_e32 vcc,			; CHECK: s_cmp_lg_u32 s{{[0-9]+}}, 0
	; CHECK: s_cbranch_vccnz			; CHECK: s_cbranch_scc1
	; CHECK: ; %ENDIF53			; CHECK: ; %ENDIF53
	define amdgpu_vs float @main(i32 %in) {			define amdgpu_vs float @main(i32 %in) {
	main_body:			main_body:
	%cmp = mul i32 %in, 2			%cmp = mul i32 %in, 2
	br label %LOOP			br label %LOOP

	LOOP: ; preds = %ENDLOOP48, %main_body			LOOP: ; preds = %ENDLOOP48, %main_body
	%counter = phi i32 [ 0, %main_body ], [ %counter.next, %ENDLOOP48 ]			%counter = phi i32 [ 0, %main_body ], [ %counter.next, %ENDLOOP48 ]

test/CodeGen/AMDGPU/control-flow-fastregalloc.ll

	endif:			endif:
	%tmp4 = phi i32 [ %val, %if ], [ 0, %entry ]			%tmp4 = phi i32 [ %val, %if ], [ 0, %entry ]
	store i32 %tmp4, i32 addrspace(1)* %out			store i32 %tmp4, i32 addrspace(1)* %out
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}divergent_loop:			; GCN-LABEL: {{^}}divergent_loop:
	; VGPR: workitem_private_segment_byte_size = 16{{$}}			; VGPR: workitem_private_segment_byte_size = 12{{$}}

	; GCN: {{^}}; %bb.0:			; GCN: {{^}}; %bb.0:

	; GCN: s_mov_b32 m0, -1			; GCN: s_mov_b32 m0, -1
	; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]			; GCN: ds_read_b32 [[LOAD0:v[0-9]+]]

	; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0			; GCN: v_cmp_eq_u32_e64 [[CMP0:s\[[0-9]+:[0-9]\]]], s{{[0-9]+}}, v0

	; GCN-NEXT: ; mask branch [[END:BB[0-9]+_[0-9]+]]			; GCN-NEXT: ; mask branch [[END:BB[0-9]+_[0-9]+]]
	; GCN-NEXT: s_cbranch_execz [[END]]			; GCN-NEXT: s_cbranch_execz [[END]]


	; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:			; GCN: [[LOOP:BB[0-9]+_[0-9]+]]:
	; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload			; GCN: buffer_load_dword v[[VAL_LOOP_RELOAD:[0-9]+]], off, s[0:3], s7 offset:[[LOAD0_OFFSET]] ; 4-byte Folded Reload
	; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]			; GCN: v_subrev_i32_e32 [[VAL_LOOP:v[0-9]+]], vcc, v{{[0-9]+}}, v[[VAL_LOOP_RELOAD]]
	; GCN: v_cmp_ne_u32_e32 vcc,			; GCN: s_cmp_lg_u32
	; GCN: s_and_b64 vcc, exec, vcc
	; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill			; GCN: buffer_store_dword [[VAL_LOOP]], off, s[0:3], s7 offset:[[VAL_SUB_OFFSET:[0-9]+]] ; 4-byte Folded Spill
	; GCN-NEXT: s_cbranch_vccnz [[LOOP]]			; GCN-NEXT: s_cbranch_scc1 [[LOOP]]


	; GCN: [[END]]:			; GCN: [[END]]:
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_LO:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_LO_LANE]]
	; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]			; VGPR: v_readlane_b32 s[[S_RELOAD_SAVEEXEC_HI:[0-9]+]], [[SPILL_VGPR]], [[SAVEEXEC_HI_LANE]]

	; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload			; VMEM: buffer_load_dword v[[V_RELOAD_SAVEEXEC_LO:[0-9]+]], off, s[0:3], s7 offset:24 ; 4-byte Folded Reload
	; VMEM: s_waitcnt vmcnt(0)			; VMEM: s_waitcnt vmcnt(0)

test/CodeGen/AMDGPU/divergent-branch-uniform-condition.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs < %s | FileCheck %s

	; This module creates a divergent branch. The branch is marked as divergent by			; This module creates a divergent branch. The branch is marked as divergent by
	; the divergence analysis but the condition is not. This test ensures that the			; the divergence analysis but the condition is not. This test ensures that the
	; divergence of the branch is tested, not its condition, so that branch is			; divergence of the branch is tested, not its condition, so that branch is
	; correctly emitted as divergent.			; correctly emitted as divergent.

	target triple = "amdgcn-mesa-mesa3d"			target triple = "amdgcn-mesa-mesa3d"

	define amdgpu_ps void @main(i32, float) {			define amdgpu_ps void @main(i32, float) {
	; CHECK-LABEL: main:			; CHECK-LABEL: main:
	; CHECK: ; %bb.0: ; %start			; CHECK: ; %bb.0: ; %start
	; CHECK-NEXT: v_readfirstlane_b32 s0, v0			; CHECK-NEXT: v_readfirstlane_b32 s0, v0
	; CHECK-NEXT: s_mov_b32 m0, s0			; CHECK-NEXT: s_mov_b32 m0, s0
	; CHECK-NEXT: s_mov_b64 s[4:5], 0			; CHECK-NEXT: s_mov_b32 s0, 0
	; CHECK-NEXT: v_interp_p1_f32_e32 v0, v1, attr0.x			; CHECK-NEXT: v_interp_p1_f32_e32 v0, v1, attr0.x
	; CHECK-NEXT: v_cmp_nlt_f32_e64 s[0:1], 0, v0			; CHECK-NEXT: v_cmp_nlt_f32_e32 vcc, 0, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: s_mov_b64 s[2:3], 0
	; CHECK-NEXT: ; implicit-def: $sgpr2_sgpr3			; CHECK-NEXT: ; implicit-def: $sgpr4_sgpr5
	; CHECK-NEXT: ; implicit-def: $sgpr6_sgpr7
	; CHECK-NEXT: BB0_1: ; %loop			; CHECK-NEXT: BB0_1: ; %loop
	; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: v_cmp_gt_u32_e32 vcc, 32, v1			; CHECK-NEXT: s_or_b64 s[4:5], s[4:5], exec
	; CHECK-NEXT: s_and_b64 vcc, exec, vcc			; CHECK-NEXT: s_cmp_lt_u32 s0, 32
	; CHECK-NEXT: s_or_b64 s[6:7], s[6:7], exec			; CHECK-NEXT: s_mov_b64 s[6:7], -1
	; CHECK-NEXT: s_or_b64 s[2:3], s[2:3], exec			; CHECK-NEXT: s_cbranch_scc0 BB0_5
	; CHECK-NEXT: s_cbranch_vccz BB0_5
	; CHECK-NEXT: ; %bb.2: ; %endif1			; CHECK-NEXT: ; %bb.2: ; %endif1
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: s_mov_b64 s[6:7], -1			; CHECK-NEXT: s_mov_b64 s[4:5], -1
	; CHECK-NEXT: s_and_saveexec_b64 s[8:9], s[0:1]			; CHECK-NEXT: s_and_saveexec_b64 s[6:7], vcc
	; CHECK-NEXT: s_xor_b64 s[8:9], exec, s[8:9]			; CHECK-NEXT: s_xor_b64 s[6:7], exec, s[6:7]
	; CHECK-NEXT: ; mask branch BB0_4			; CHECK-NEXT: ; mask branch BB0_4
	; CHECK-NEXT: BB0_3: ; %endif2			; CHECK-NEXT: BB0_3: ; %endif2
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: v_add_u32_e32 v1, 1, v1			; CHECK-NEXT: s_add_i32 s0, s0, 1
	; CHECK-NEXT: s_xor_b64 s[6:7], exec, -1			; CHECK-NEXT: s_xor_b64 s[4:5], exec, -1
	; CHECK-NEXT: BB0_4: ; %Flow1			; CHECK-NEXT: BB0_4: ; %Flow1
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]			; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]
	; CHECK-NEXT: s_andn2_b64 s[2:3], s[2:3], exec			; CHECK-NEXT: s_mov_b64 s[6:7], 0
	; CHECK-NEXT: s_branch BB0_6			; CHECK-NEXT: BB0_5: ; %Flow
	; CHECK-NEXT: BB0_5: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: ; implicit-def: $vgpr1
	; CHECK-NEXT: BB0_6: ; %Flow
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: s_and_b64 s[8:9], exec, s[6:7]			; CHECK-NEXT: s_and_b64 s[8:9], exec, s[4:5]
	; CHECK-NEXT: s_or_b64 s[8:9], s[8:9], s[4:5]			; CHECK-NEXT: s_or_b64 s[8:9], s[8:9], s[2:3]
	; CHECK-NEXT: s_mov_b64 s[4:5], s[8:9]			; CHECK-NEXT: s_mov_b64 s[2:3], s[8:9]
	; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]			; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
	; CHECK-NEXT: s_cbranch_execnz BB0_1			; CHECK-NEXT: s_cbranch_execnz BB0_1
	; CHECK-NEXT: ; %bb.7: ; %Flow2			; CHECK-NEXT: ; %bb.6: ; %Flow2
	; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]			; CHECK-NEXT: s_or_b64 exec, exec, s[8:9]
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; this is the divergent branch with the condition not marked as divergent			; CHECK-NEXT: s_and_saveexec_b64 s[0:1], s[6:7]
	; CHECK-NEXT: s_and_saveexec_b64 s[0:1], s[2:3]			; CHECK-NEXT: ; mask branch BB0_8
	; CHECK-NEXT: ; mask branch BB0_9			; CHECK-NEXT: BB0_7: ; %if1
	; CHECK-NEXT: BB0_8: ; %if1
	; CHECK-NEXT: v_sqrt_f32_e32 v1, v0			; CHECK-NEXT: v_sqrt_f32_e32 v1, v0
	; CHECK-NEXT: BB0_9: ; %endloop			; CHECK-NEXT: BB0_8: ; %endloop
	; CHECK-NEXT: s_or_b64 exec, exec, s[0:1]			; CHECK-NEXT: s_or_b64 exec, exec, s[0:1]
	; CHECK-NEXT: exp mrt0 v1, v1, v1, v1 done vm			; CHECK-NEXT: exp mrt0 v1, v1, v1, v1 done vm
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
				; this is the divergent branch with the condition not marked as divergent
	start:			start:
	%v0 = call float @llvm.amdgcn.interp.p1(float %1, i32 0, i32 0, i32 %0)			%v0 = call float @llvm.amdgcn.interp.p1(float %1, i32 0, i32 0, i32 %0)
	br label %loop			br label %loop

	loop:			loop:
	%v1 = phi i32 [ 0, %start ], [ %v5, %endif2 ]			%v1 = phi i32 [ 0, %start ], [ %v5, %endif2 ]
	%v2 = icmp ugt i32 %v1, 31			%v2 = icmp ugt i32 %v1, 31
	br i1 %v2, label %if1, label %endif1			br i1 %v2, label %if1, label %endif1
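Several updated checks in this diff follow from the patch's central idea: a cross-block value that divergence analysis proves uniform can be given a scalar (SGPR) register class, while a divergent value needs a VGPR. In the test above, the loop counter moves from v_add_u32/v_cmp_gt_u32 on v1 to s_add_i32/s_cmp_lt_u32 on s0, and in branch-uniformity.ll v_cmp_ne_u32 becomes s_cmp_lg_u32. The C++ below is a toy sketch of that decision under stated assumptions, not LLVM code: ToyValue, computeDivergence and assignRegClasses are hypothetical names, and divergence is propagated only through data dependences (operands and phi incomings). LLVM's divergence analysis also tracks divergence caused by divergent branches, which this sketch deliberately ignores.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not LLVM's API): each value records whether it is an
// inherently per-lane input and which values it is computed from.
struct ToyValue {
  std::string name;
  bool perLaneSource;                 // e.g. a workitem id or a VGPR argument
  std::vector<std::size_t> operands;  // operand indices, phi incomings included
};

enum class RegClass { SGPR, VGPR };

// Data-dependence-only divergence: a value is divergent if it is a per-lane
// source or any of its operands is divergent. Iterating to a fixpoint handles
// loop-carried phis whose incoming value is defined later in the list.
std::vector<bool> computeDivergence(const std::vector<ToyValue>& values) {
  std::vector<bool> divergent(values.size(), false);
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t i = 0; i < values.size(); ++i) {
      if (divergent[i]) continue;
      bool d = values[i].perLaneSource;
      for (std::size_t op : values[i].operands) d = d || divergent[op];
      if (d) {
        divergent[i] = true;
        changed = true;
      }
    }
  }
  return divergent;
}

// Uniform values may live in scalar registers; divergent ones need VGPRs.
std::vector<RegClass> assignRegClasses(const std::vector<ToyValue>& values) {
  std::vector<RegClass> classes;
  classes.reserve(values.size());
  for (bool d : computeDivergence(values))
    classes.push_back(d ? RegClass::VGPR : RegClass::SGPR);
  return classes;
}
```

In this model a loop counter fed only by constants stays SGPR even though it is a phi, which mirrors the s0 counter in the new checks; mixing in a per-lane value such as a workitem id turns the result, and anything carried through the phi, into a VGPR.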

test/CodeGen/AMDGPU/extract_subvector_vec4_vec3.ll

	; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 < %s -stop-after=amdgpu-isel | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-mesa-mesa3d -mcpu=gfx900 < %s -stop-after=amdgpu-isel | FileCheck -check-prefix=GCN %s

	; We want to see a BUFFER_LOAD, some register shuffling, and a BUFFER_STORE.			; We want to see a BUFFER_LOAD, some register shuffling, and a BUFFER_STORE.
	; Specifically, we do not want to see a BUFFER_STORE that says "store into			; Specifically, we do not want to see a BUFFER_STORE that says "store into
	; stack" in the middle.			; stack" in the middle.

	define amdgpu_hs void @main([0 x i8] addrspace(6)* inreg %arg) {			define amdgpu_hs void @main([0 x i8] addrspace(6)* inreg %arg) {
	; GCN-LABEL: name: main			; GCN-LABEL: name: main
	; GCN: bb.0.main_body:			; GCN: bb.0.main_body:
	; GCN: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0 = S_MOV_B32 0			; GCN: [[S_MOV_B32_:%[0-9]+]]:sreg_32_xm0 = S_MOV_B32 0
	; GCN: [[DEF:%[0-9]+]]:sreg_32_xm0 = IMPLICIT_DEF			; GCN: [[DEF:%[0-9]+]]:sreg_32_xm0 = IMPLICIT_DEF
	; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]			; GCN: [[COPY:%[0-9]+]]:vgpr_32 = COPY [[DEF]]
	; GCN: [[DEF1:%[0-9]+]]:sreg_128 = IMPLICIT_DEF			; GCN: [[DEF1:%[0-9]+]]:sreg_128 = IMPLICIT_DEF
	; GCN: [[BUFFER_LOAD_DWORDX4_OFFEN:%[0-9]+]]:vreg_128 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY]], [[DEF1]], [[S_MOV_B32_]], 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 16 from custom TargetCustom7, align 1, addrspace 4)			; GCN: [[BUFFER_LOAD_DWORDX4_OFFEN:%[0-9]+]]:vreg_128 = BUFFER_LOAD_DWORDX4_OFFEN [[COPY]], [[DEF1]], [[S_MOV_B32_]], 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable load 16 from custom TargetCustom7, align 1, addrspace 4)
	; GCN: [[COPY1:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub2			; GCN: [[COPY1:%[0-9]+]]:sgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub2
	; GCN: [[COPY2:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub1			; GCN: [[COPY2:%[0-9]+]]:sgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub1
	; GCN: [[COPY3:%[0-9]+]]:vgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub0			; GCN: [[COPY3:%[0-9]+]]:sgpr_32 = COPY [[BUFFER_LOAD_DWORDX4_OFFEN]].sub0
	; GCN: [[REG_SEQUENCE:%[0-9]+]]:sgpr_96 = REG_SEQUENCE killed [[COPY3]], %subreg.sub0, killed [[COPY2]], %subreg.sub1, killed [[COPY1]], %subreg.sub2			; GCN: [[REG_SEQUENCE:%[0-9]+]]:sgpr_96 = REG_SEQUENCE killed [[COPY3]], %subreg.sub0, killed [[COPY2]], %subreg.sub1, killed [[COPY1]], %subreg.sub2
	; GCN: [[COPY4:%[0-9]+]]:vreg_96 = COPY [[REG_SEQUENCE]]			; GCN: [[COPY4:%[0-9]+]]:vreg_96 = COPY [[REG_SEQUENCE]]
	; GCN: [[DEF2:%[0-9]+]]:sreg_32_xm0 = IMPLICIT_DEF			; GCN: [[DEF2:%[0-9]+]]:sreg_32_xm0 = IMPLICIT_DEF
	; GCN: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[DEF2]]			; GCN: [[COPY5:%[0-9]+]]:vgpr_32 = COPY [[DEF2]]
	; GCN: [[DEF3:%[0-9]+]]:sreg_128 = IMPLICIT_DEF			; GCN: [[DEF3:%[0-9]+]]:sreg_128 = IMPLICIT_DEF
	; GCN: BUFFER_STORE_DWORDX3_OFFEN_exact killed [[COPY4]], [[COPY5]], [[DEF3]], [[S_MOV_B32_]], 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 12 into custom TargetCustom7, align 1, addrspace 4)			; GCN: BUFFER_STORE_DWORDX3_OFFEN_exact killed [[COPY4]], [[COPY5]], [[DEF3]], [[S_MOV_B32_]], 0, 0, 0, 0, 0, implicit $exec :: (dereferenceable store 12 into custom TargetCustom7, align 1, addrspace 4)
	; GCN: S_ENDPGM 0			; GCN: S_ENDPGM 0
	main_body:			main_body:

test/CodeGen/AMDGPU/fabs.ll

define amdgpu_kernel void @s_fabs_f32(float addrspace(1)* %out, float %in) {
store float %fabs, float addrspace(1)* %out		store float %fabs, float addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}fabs_v2f32:		; FUNC-LABEL: {{^}}fabs_v2f32:
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|

; GCN: v_and_b32		; GCN: s_and_b32
; GCN: v_and_b32		; GCN: s_and_b32
define amdgpu_kernel void @fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {		define amdgpu_kernel void @fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)		%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)
store <2 x float> %fabs, <2 x float> addrspace(1)* %out		store <2 x float> %fabs, <2 x float> addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}fabs_v4f32:		; FUNC-LABEL: {{^}}fabs_v4f32:
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|
; R600: |{{(PV|T[0-9])\.[XYZW]}}|		; R600: |{{(PV|T[0-9])\.[XYZW]}}|

; GCN: v_and_b32		; GCN: s_and_b32
; GCN: v_and_b32		; GCN: s_and_b32
; GCN: v_and_b32		; GCN: s_and_b32
; GCN: v_and_b32		; GCN: s_and_b32
define amdgpu_kernel void @fabs_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {		define amdgpu_kernel void @fabs_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {
%fabs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)		%fabs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)
store <4 x float> %fabs, <4 x float> addrspace(1)* %out		store <4 x float> %fabs, <4 x float> addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}fabs_fn_fold:		; GCN-LABEL: {{^}}fabs_fn_fold:
; SI: s_load_dwordx2 s{{\[}}[[ABS_VALUE:[0-9]+]]:[[MUL_VAL:[0-9]+]]{{\]}}, s[{{[0-9]+:[0-9]+}}], 0xb		; SI: s_load_dwordx2 s{{\[}}[[ABS_VALUE:[0-9]+]]:[[MUL_VAL:[0-9]+]]{{\]}}, s[{{[0-9]+:[0-9]+}}], 0xb
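The fabs_v2f32 and fabs_v4f32 checks expect llvm.fabs to lower to one 32-bit AND per element; because the kernel argument %in is uniform, the new checks expect the scalar s_and_b32 instead of v_and_b32. Bit-wise, fabs on an IEEE-754 single only clears the sign bit. The C++ below sketches that effect; fabsViaMask is a hypothetical helper, and the mask 0x7fffffff is the standard sign-bit mask assumed here, since the immediate operand is not visible in this excerpt.

```cpp
#include <cstdint>
#include <cstring>

// Clears the IEEE-754 sign bit of a float, which is what a 32-bit AND with
// 0x7fffffff does to each element in the checks above.
float fabsViaMask(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun
  bits &= 0x7fffffffu;                  // keep exponent and mantissa only
  float out;
  std::memcpy(&out, &bits, sizeof out);
  return out;
}
```

Since the operation is a plain integer AND, it can run on whichever register file holds the value, which is why only the s_/v_ prefix changes in this diff.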

test/CodeGen/AMDGPU/fdiv32-to-rcp-folding.ll

define amdgpu_kernel void @div_minus_1_by_minus_x_25ulp(float addrspace(1)* %arg) {
%neg = fsub float -0.000000e+00, %load		%neg = fsub float -0.000000e+00, %load
%div = fdiv float -1.000000e+00, %neg, !fpmath !0		%div = fdiv float -1.000000e+00, %neg, !fpmath !0
store float %div, float addrspace(1)* %arg, align 4		store float %div, float addrspace(1)* %arg, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}div_v4_1_by_x_25ulp:		; GCN-LABEL: {{^}}div_v4_1_by_x_25ulp:
; GCN-DAG: s_load_dwordx4 s{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]], s[{{[0-9:]+}}], 0x0{{$}}		; GCN-DAG: s_load_dwordx4 s{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]], s[{{[0-9:]+}}], 0x0{{$}}
; GCN-DENORM-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
define amdgpu_kernel void @div_v4_1_by_x_25ulp(<4 x float> addrspace(1)* %arg) {		define amdgpu_kernel void @div_v4_1_by_x_25ulp(<4 x float> addrspace(1)* %arg) {
%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16		%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16
%div = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %load, !fpmath !0		%div = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %load, !fpmath !0
store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16		store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16
ret void		ret void
}		}
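In the GCN-DENORM checks above, each element's magnitude is compared against 0x6f800000 (2^96), and v_cndmask selects either 0x2f800000 (2^-32) or 1.0 as a scale factor that feeds v_mul and v_rcp. The patch only changes where operands live: the |x| operand is now an SGPR and the threshold is materialized with v_mov_b32 rather than s_mov_b32. The C++ below is a hedged sketch of that scaling pattern, not the SIISelLowering code: fdivScaledRcp is a hypothetical name, 1.0f / x stands in for v_rcp_f32, the exact multiply order is an assumption, and the assumed rationale is that pre-scaling very large denominators keeps the intermediate reciprocal out of the denormal range the hardware reciprocal may flush, with the final multiply by the same scale restoring the magnitude.

```cpp
#include <cmath>

// Sketch of the range-scaled reciprocal division suggested by the checks:
// if |den| > 2^96, scale den down by 2^-32 before taking the reciprocal,
// then multiply the result by the same factor to undo the scaling.
float fdivScaledRcp(float num, float den) {
  const float kThreshold = std::ldexp(1.0f, 96);   // bit pattern 0x6f800000
  const float kDownScale = std::ldexp(1.0f, -32);  // bit pattern 0x2f800000
  float scale = std::fabs(den) > kThreshold ? kDownScale : 1.0f;  // v_cndmask
  float rcp = 1.0f / (den * scale);  // stands in for v_rcp_f32
  return scale * (num * rcp);        // undo the pre-scaling
}
```

Multiplying by an exact power of two does not change the significand, so for normal inputs the scaling costs no precision beyond the reciprocal itself.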

; GCN-LABEL: {{^}}div_v4_minus_1_by_x_25ulp:		; GCN-LABEL: {{^}}div_v4_minus_1_by_x_25ulp:
; GCN-DENORM-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
define amdgpu_kernel void @div_v4_minus_1_by_x_25ulp(<4 x float> addrspace(1)* %arg) {		define amdgpu_kernel void @div_v4_minus_1_by_x_25ulp(<4 x float> addrspace(1)* %arg) {
%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16		%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16
%div = fdiv <4 x float> <float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00>, %load, !fpmath !0		%div = fdiv <4 x float> <float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00>, %load, !fpmath !0
store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16		store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16
ret void		ret void
}		}

; GCN-LABEL: {{^}}div_v4_1_by_minus_x_25ulp:		; GCN-LABEL: {{^}}div_v4_1_by_minus_x_25ulp:
; GCN-DENORM-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
define amdgpu_kernel void @div_v4_1_by_minus_x_25ulp(<4 x float> addrspace(1)* %arg) {
%neg = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %load		%neg = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %load
%div = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %neg, !fpmath !0		%div = fdiv <4 x float> <float 1.000000e+00, float 1.000000e+00, float 1.000000e+00, float 1.000000e+00>, %neg, !fpmath !0
store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16		store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16
ret void		ret void
}		}

; GCN-LABEL: {{^}}div_v4_minus_1_by_minus_x_25ulp:		; GCN-LABEL: {{^}}div_v4_minus_1_by_minus_x_25ulp:
; GCN-DAG: s_load_dwordx4 s{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]], s[{{[0-9:]+}}], 0x0{{$}}		; GCN-DAG: s_load_dwordx4 s{{\[}}[[VAL0:[0-9]+]]:[[VAL3:[0-9]+]]], s[{{[0-9:]+}}], 0x0{{$}}
; GCN-DENORM-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000		; GCN-DENORM-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DENORM-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DENORM-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
define amdgpu_kernel void @div_v4_minus_1_by_minus_x_25ulp(<4 x float> addrspace(1)* %arg) {
%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16		%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16
%neg = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %load		%neg = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %load
%div = fdiv <4 x float> <float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00>, %neg, !fpmath !0		%div = fdiv <4 x float> <float -1.000000e+00, float -1.000000e+00, float -1.000000e+00, float -1.000000e+00>, %neg, !fpmath !0
store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16		store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16
ret void		ret void
}		}

; GCN-LABEL: {{^}}div_v4_c_by_x_25ulp:		; GCN-LABEL: {{^}}div_v4_c_by_x_25ulp:
; GCN-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000
; GCN-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, 2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32

; GCN-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
		; GCN-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000

		; GCN-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc

; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, s{{[0-9]+}}, -v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP1:v[0-9]+]], v{{[0-9]+}}		; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP1:v[0-9]+]], v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP1]]		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP1]]
; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP2:v[0-9]+]], v{{[0-9]+}}		; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP2:v[0-9]+]], v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP2]]		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP2]]
define amdgpu_kernel void @div_v4_c_by_x_25ulp(<4 x float> addrspace(1)* %arg) {		define amdgpu_kernel void @div_v4_c_by_x_25ulp(<4 x float> addrspace(1)* %arg) {
%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16		%load = load <4 x float>, <4 x float> addrspace(1)* %arg, align 16
%div = fdiv <4 x float> <float 2.000000e+00, float 1.000000e+00, float -1.000000e+00, float -2.000000e+00>, %load, !fpmath !0		%div = fdiv <4 x float> <float 2.000000e+00, float 1.000000e+00, float -1.000000e+00, float -2.000000e+00>, %load, !fpmath !0
store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16		store <4 x float> %div, <4 x float> addrspace(1)* %arg, align 16
ret void		ret void
}		}
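The div_*_25ulp checks match two magic constants. Decoded as IEEE-754 single-precision values, 0x6f800000 is 2^96 and 0x2f800000 is 2^-32. Each v_cmp_gt_f32 / v_cndmask_b32 pair selects the 2^-32 scale when |denominator| exceeds 2^96 and 1.0 otherwise, presumably so that the reciprocal of a very large denominator stays out of the denormal range. The sketch below assumes the 2.5 ULP expansion has the shape scale * (num * rcp(den * scale)) that those checks suggest; `f32_from_bits` and `fast_fdiv` are hypothetical helper names, and Python's double-precision division stands in for the approximate hardware v_rcp_f32.

```python
import struct

def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

LIMIT = f32_from_bits(0x6F800000)  # 2**96: threshold compared against |den|
SCALE = f32_from_bits(0x2F800000)  # 2**-32: pre-scale for very large denominators

def fast_fdiv(num: float, den: float) -> float:
    # Mirrors: v_cmp_gt_f32 vcc, |den|, LIMIT ; v_cndmask_b32 s, 1.0, SCALE, vcc
    scale = SCALE if abs(den) > LIMIT else 1.0
    # Shrink the denominator before taking its reciprocal, then apply the
    # same factor to the quotient so the two scalings cancel out.
    rcp = 1.0 / (den * scale)  # stands in for v_rcp_f32
    return scale * (num * rcp)

print(LIMIT == 2.0**96, SCALE == 2.0**-32)  # True True
print(fast_fdiv(-2.0, 4.0))                 # -0.5
```

Because scale * num / (den * scale) equals num / den, the scaling changes only the range the reciprocal is computed in, not the mathematical result.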

; GCN-LABEL: {{^}}div_v4_c_by_minus_x_25ulp:		; GCN-LABEL: {{^}}div_v4_c_by_minus_x_25ulp:
; GCN-DAG: s_mov_b32 [[L:s[0-9]+]], 0x6f800000
; GCN-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}		; GCN-DENORM-DAG: v_div_scale_f32 {{.*}}, -2.0{{$}}
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32
; GCN-DENORM-DAG: v_rcp_f32_e32		; GCN-DENORM-DAG: v_rcp_f32_e32

; GCN-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DAG: v_mov_b32_e32 [[L:v[0-9]+]], 0x6f800000
		; GCN-DAG: v_mov_b32_e32 [[S:v[0-9]+]], 0x2f800000

		; GCN-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc
; GCN-DAG: v_cmp_gt_f32_e64 vcc, |v{{[0-9]+}}|, [[L]]		; GCN-DAG: v_cmp_gt_f32_e64 vcc, |s{{[0-9]+}}|, [[L]]
; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc		; GCN-DAG: v_cndmask_b32_e32 v{{[0-9]+}}, 1.0, [[S]], vcc

; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}		; GCN-DENORM-DAG: v_mul_f32_e64 v{{[0-9]+}}, -s{{[0-9]+}}, v{{[0-9]+}}
; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP1:v[0-9]+]], v{{[0-9]+}}		; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP1:v[0-9]+]], v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP1]]		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP1]]
; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP2:v[0-9]+]], v{{[0-9]+}}		; GCN-DENORM-DAG: v_rcp_f32_e32 [[RCP2:v[0-9]+]], v{{[0-9]+}}
; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP2]]		; GCN-DENORM-DAG: v_mul_f32_e32 v{{[0-9]+}}, v{{[0-9]+}}, [[RCP2]]

test/CodeGen/AMDGPU/fmin_legacy.ll

Show All 27 Lines	define amdgpu_kernel void @s_test_fmin_legacy_subreg_inputs_f32(float addrspace(1)* %out, <4 x float> %reg0) #0 {
%r3 = select i1 %r2, float %r1, float %r0		%r3 = select i1 %r2, float %r1, float %r0
store float %r3, float addrspace(1)* %out		store float %r3, float addrspace(1)* %out
ret void		ret void
}		}

; FUNC-LABEL: {{^}}s_test_fmin_legacy_ule_f32:		; FUNC-LABEL: {{^}}s_test_fmin_legacy_ule_f32:
; GCN-DAG: s_load_dwordx2 s{{\[}}[[A:[0-9]+]]:[[B:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, {{0xb|0x2c}}		; GCN-DAG: s_load_dwordx2 s{{\[}}[[A:[0-9]+]]:[[B:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, {{0xb|0x2c}}

; GCN-DAG: v_mov_b32_e32 [[VB:v[0-9]+]], s[[B]]		; SI-SAFE: v_mov_b32_e32 [[VA:v[0-9]+]], s[[A]]

; SI-SAFE: v_min_legacy_f32_e64 {{v[0-9]+}}, [[VB]], s[[A]]		; GCN-NONAN: v_mov_b32_e32 [[VB:v[0-9]+]], s[[B]]

		; VI-SAFE: v_mov_b32_e32 [[VB:v[0-9]+]], s[[B]]

		; SI-SAFE: v_min_legacy_f32_e32 {{v[0-9]+}}, s[[B]], [[VA]]

; VI-SAFE: v_mov_b32_e32 [[VA:v[0-9]+]], s[[A]]		; VI-SAFE: v_mov_b32_e32 [[VA:v[0-9]+]], s[[A]]
; VI-SAFE: v_cmp_ngt_f32_e32 vcc, s[[A]], [[VB]]		; VI-SAFE: v_cmp_ngt_f32_e32 vcc, s[[A]], [[VB]]
; VI-SAFE: v_cndmask_b32_e32 v{{[0-9]+}}, [[VB]], [[VA]]		; VI-SAFE: v_cndmask_b32_e32 v{{[0-9]+}}, [[VB]], [[VA]]

; GCN-NONAN: v_min_f32_e32 {{v[0-9]+}}, s[[A]], [[VB]]		; GCN-NONAN: v_min_f32_e32 {{v[0-9]+}}, s[[A]], [[VB]]
define amdgpu_kernel void @s_test_fmin_legacy_ule_f32(float addrspace(1)* %out, float %a, float %b) #0 {		define amdgpu_kernel void @s_test_fmin_legacy_ule_f32(float addrspace(1)* %out, float %a, float %b) #0 {
%cmp = fcmp ule float %a, %b		%cmp = fcmp ule float %a, %b

test/CodeGen/AMDGPU/fneg-fabs.ll

	; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s			; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=SI -check-prefix=FUNC %s
	; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=FUNC %s			; RUN: llc -amdgpu-scalarize-global-loads=false -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VI -check-prefix=FUNC %s
	; RUN: llc -amdgpu-scalarize-global-loads=false -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=R600 -check-prefix=FUNC %s			; RUN: llc -amdgpu-scalarize-global-loads=false -march=r600 -mcpu=redwood < %s | FileCheck -check-prefix=R600 -check-prefix=FUNC %s

	; FUNC-LABEL: {{^}}fneg_fabs_fadd_f32:			; FUNC-LABEL: {{^}}fneg_fabs_fadd_f32:
	; SI-NOT: and			; SI-NOT: and
	; SI: v_sub_f32_e64 {{v[0-9]+}}, {{v[0-9]+}}, |{{s[0-9]+}}|			; SI: v_sub_f32_e64 {{v[0-9]+}}, {{s[0-9]+}}, |{{v[0-9]+}}|
	define amdgpu_kernel void @fneg_fabs_fadd_f32(float addrspace(1)* %out, float %x, float %y) {			define amdgpu_kernel void @fneg_fabs_fadd_f32(float addrspace(1)* %out, float %x, float %y) {
	%fabs = call float @llvm.fabs.f32(float %x)			%fabs = call float @llvm.fabs.f32(float %x)
	%fsub = fsub float -0.000000e+00, %fabs			%fsub = fsub float -0.000000e+00, %fabs
	%fadd = fadd float %y, %fsub			%fadd = fadd float %y, %fsub
	store float %fadd, float addrspace(1)* %out, align 4			store float %fadd, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}fneg_fabs_fmul_f32:			; FUNC-LABEL: {{^}}fneg_fabs_fmul_f32:
	; SI-NOT: and			; SI-NOT: and
	; SI: v_mul_f32_e64 {{v[0-9]+}}, {{v[0-9]+}}, -|{{s[0-9]+}}|			; SI: v_mul_f32_e64 {{v[0-9]+}}, {{s[0-9]+}}, -|{{v[0-9]+}}|
	; SI-NOT: and			; SI-NOT: and
	define amdgpu_kernel void @fneg_fabs_fmul_f32(float addrspace(1)* %out, float %x, float %y) {			define amdgpu_kernel void @fneg_fabs_fmul_f32(float addrspace(1)* %out, float %x, float %y) {
	%fabs = call float @llvm.fabs.f32(float %x)			%fabs = call float @llvm.fabs.f32(float %x)
	%fsub = fsub float -0.000000e+00, %fabs			%fsub = fsub float -0.000000e+00, %fabs
	%fmul = fmul float %y, %fsub			%fmul = fmul float %y, %fsub
	store float %fmul, float addrspace(1)* %out, align 4			store float %fmul, float addrspace(1)* %out, align 4
	ret void			ret void
	}			}
	; FUNC-LABEL: {{^}}fneg_fabs_v2f32:			; FUNC-LABEL: {{^}}fneg_fabs_v2f32:
	; R600: |{{(PV|T[0-9])\.[XYZW]}}|			; R600: |{{(PV|T[0-9])\.[XYZW]}}|
	; R600: -PV			; R600: -PV
	; R600: |{{(PV|T[0-9])\.[XYZW]}}|			; R600: |{{(PV|T[0-9])\.[XYZW]}}|
	; R600: -PV			; R600: -PV

	; FIXME: In this case two uses of the constant should be folded			; FIXME: In this case two uses of the constant should be folded
	; SI: s_brev_b32 [[SIGNBITK:s[0-9]+]], 1{{$}}			; SI: s_brev_b32 [[SIGNBITK:s[0-9]+]], 1{{$}}
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	define amdgpu_kernel void @fneg_fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {			define amdgpu_kernel void @fneg_fabs_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %in) {
	%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)			%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %in)
	%fsub = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %fabs			%fsub = fsub <2 x float> <float -0.000000e+00, float -0.000000e+00>, %fabs
	store <2 x float> %fsub, <2 x float> addrspace(1)* %out			store <2 x float> %fsub, <2 x float> addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}fneg_fabs_v4f32:			; FUNC-LABEL: {{^}}fneg_fabs_v4f32:
	; SI: s_brev_b32 [[SIGNBITK:s[0-9]+]], 1{{$}}			; SI: s_brev_b32 [[SIGNBITK:s[0-9]+]], 1{{$}}
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	; SI: v_or_b32_e32 v{{[0-9]+}}, [[SIGNBITK]], v{{[0-9]+}}			; SI: s_or_b32 s{{[0-9]+}}, s{{[0-9]+}}, [[SIGNBITK]]
	define amdgpu_kernel void @fneg_fabs_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {			define amdgpu_kernel void @fneg_fabs_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %in) {
	%fabs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)			%fabs = call <4 x float> @llvm.fabs.v4f32(<4 x float> %in)
	%fsub = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %fabs			%fsub = fsub <4 x float> <float -0.000000e+00, float -0.000000e+00, float -0.000000e+00, float -0.000000e+00>, %fabs
	store <4 x float> %fsub, <4 x float> addrspace(1)* %out			store <4 x float> %fsub, <4 x float> addrspace(1)* %out
	ret void			ret void
	}			}

	declare float @fabs(float) readnone			declare float @fabs(float) readnone
	declare float @llvm.fabs.f32(float) readnone			declare float @llvm.fabs.f32(float) readnone
	declare <2 x float> @llvm.fabs.v2f32(<2 x float>) readnone			declare <2 x float> @llvm.fabs.v2f32(<2 x float>) readnone
	declare <4 x float> @llvm.fabs.v4f32(<4 x float>) readnone			declare <4 x float> @llvm.fabs.v4f32(<4 x float>) readnone
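In the fneg_fabs_v2f32 and fneg_fabs_v4f32 checks, s_brev_b32 [[SIGNBITK]], 1 materializes the sign-bit mask: reversing the bits of 1 yields 0x80000000. OR-ing that mask into a float's bit pattern computes fneg(fabs(x)) with no floating-point instruction at all, which is why the checks expect one OR per element (v_or_b32_e32 in the old column, s_or_b32 in the new one). A minimal Python sketch of that bit-level identity; `brev32` and `fneg_fabs` are hypothetical helper names.

```python
import struct

def brev32(x: int) -> int:
    """Reverse the bit order of a 32-bit value (the operation s_brev_b32 performs)."""
    return int(f"{x & 0xFFFFFFFF:032b}"[::-1], 2)

def fneg_fabs(x: float) -> float:
    # Force the IEEE-754 sign bit of the single-precision value on: -(|x|).
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits | brev32(1)))[0]

print(hex(brev32(1)))                   # 0x80000000
print(fneg_fabs(3.5), fneg_fabs(-2.0))  # -3.5 -2.0
```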

test/CodeGen/AMDGPU/fsub.ll

Show All 21 Lines	define amdgpu_kernel void @s_fsub_f32(float addrspace(1)* %out, float %a, float %b) {
store float %sub, float addrspace(1)* %out, align 4		store float %sub, float addrspace(1)* %out, align 4
ret void		ret void
}		}

; FUNC-LABEL: {{^}}fsub_v2f32:		; FUNC-LABEL: {{^}}fsub_v2f32:
; R600-DAG: ADD {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, -KC0[3].Z		; R600-DAG: ADD {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[3].X, -KC0[3].Z
; R600-DAG: ADD {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, -KC0[3].Y		; R600-DAG: ADD {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].W, -KC0[3].Y

; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
define amdgpu_kernel void @fsub_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {		define amdgpu_kernel void @fsub_v2f32(<2 x float> addrspace(1)* %out, <2 x float> %a, <2 x float> %b) {
%sub = fsub <2 x float> %a, %b		%sub = fsub <2 x float> %a, %b
store <2 x float> %sub, <2 x float> addrspace(1)* %out, align 8		store <2 x float> %sub, <2 x float> addrspace(1)* %out, align 8
ret void		ret void
}		}

; FUNC-LABEL: {{^}}v_fsub_v4f32:		; FUNC-LABEL: {{^}}v_fsub_v4f32:
; R600: ADD {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}		; R600: ADD {{\** *}}T{{[0-9]+\.[XYZW], T[0-9]+\.[XYZW], -T[0-9]+\.[XYZW]}}
Show All 10 Lines	define amdgpu_kernel void @v_fsub_v4f32(<4 x float> addrspace(1)* %out, <4 x float> addrspace(1)* %in) {
%a = load <4 x float>, <4 x float> addrspace(1)* %in, align 16		%a = load <4 x float>, <4 x float> addrspace(1)* %in, align 16
%b = load <4 x float>, <4 x float> addrspace(1)* %b_ptr, align 16		%b = load <4 x float>, <4 x float> addrspace(1)* %b_ptr, align 16
%result = fsub <4 x float> %a, %b		%result = fsub <4 x float> %a, %b
store <4 x float> %result, <4 x float> addrspace(1)* %out, align 16		store <4 x float> %result, <4 x float> addrspace(1)* %out, align 16
ret void		ret void
}		}

; FUNC-LABEL: {{^}}s_fsub_v4f32:		; FUNC-LABEL: {{^}}s_fsub_v4f32:
; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; SI: v_subrev_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}		; SI: v_sub_f32_e32 {{v[0-9]+}}, {{s[0-9]+}}, {{v[0-9]+}}
; SI: s_endpgm		; SI: s_endpgm
define amdgpu_kernel void @s_fsub_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %a, <4 x float> %b) {		define amdgpu_kernel void @s_fsub_v4f32(<4 x float> addrspace(1)* %out, <4 x float> %a, <4 x float> %b) {
%result = fsub <4 x float> %a, %b		%result = fsub <4 x float> %a, %b
store <4 x float> %result, <4 x float> addrspace(1)* %out, align 16		store <4 x float> %result, <4 x float> addrspace(1)* %out, align 16
ret void		ret void
}		}
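The fsub_v2f32 and s_fsub_v4f32 checks switch from v_subrev_f32_e32 to v_sub_f32_e32 while keeping the same s/v operand pattern. In the VOP2 (_e32) encoding only src0 may be an SGPR; v_sub_f32 computes src0 - src1, whereas v_subrev_f32 computes src1 - src0. The opcode therefore reveals which IR operand of %a - %b ended up in the SGPR: v_sub means %a, v_subrev means %b. A small Python model of the two opcodes (the function names are illustrative, not an API):

```python
def v_sub_f32(src0: float, src1: float) -> float:
    # V_SUB_F32: D = S0 - S1
    return src0 - src1

def v_subrev_f32(src0: float, src1: float) -> float:
    # V_SUBREV_F32: D = S1 - S0 (operands reversed)
    return src1 - src0

a, b = 5.0, 3.0
# a - b with %a in the SGPR (src0) and %b in a VGPR (src1):
print(v_sub_f32(a, b))     # 2.0
# a - b with %b in the SGPR (src0) and %a in a VGPR (src1):
print(v_subrev_f32(b, a))  # 2.0
```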

; FUNC-LABEL: {{^}}v_fneg_fsub_f32:		; FUNC-LABEL: {{^}}v_fneg_fsub_f32:

test/CodeGen/AMDGPU/i1-copy-from-loop.ll

	; RUN: llc -mtriple=amdgcn-- -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -mtriple=amdgcn-- -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s
	; RUN: llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s			; RUN: llc -mtriple=amdgcn-- -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

	; SI-LABEL: {{^}}i1_copy_from_loop:			; SI-LABEL: {{^}}i1_copy_from_loop:
	;			;
	; SI: ; %for.body			; SI: ; %for.body
	; SI: v_cmp_gt_u32_e64 [[CC_SREG:s\[[0-9]+:[0-9]+\]]], 4,			; SI: v_cmp_lt_u32_e64 [[CC_SREG:s\[[0-9]+:[0-9]+\]]], s{{[0-9+]}}, 4
	; SI-DAG: s_andn2_b64 [[CC_ACCUM:s\[[0-9]+:[0-9]+\]]], [[CC_ACCUM]], exec
	; SI-DAG: s_and_b64 [[CC_MASK:s\[[0-9]+:[0-9]+\]]], [[CC_SREG]], exec
	; SI: s_or_b64 [[CC_ACCUM]], [[CC_ACCUM]], [[CC_MASK]]

	; SI: ; %Flow1
	; SI: s_or_b64 [[CC_ACCUM]], [[CC_ACCUM]], exec

	; SI: ; %Flow			; SI: ; %Flow
	; SI-DAG: s_andn2_b64 [[LCSSA_ACCUM:s\[[0-9]+:[0-9]+\]]], [[LCSSA_ACCUM]], exec			; SI-DAG: s_andn2_b64 [[LCSSA_ACCUM:s\[[0-9]+:[0-9]+\]]], [[LCSSA_ACCUM]], exec
	; SI-DAG: s_and_b64 [[CC_MASK2:s\[[0-9]+:[0-9]+\]]], [[CC_ACCUM]], exec			; SI-DAG: s_and_b64 [[CC_MASK2:s\[[0-9]+:[0-9]+\]]], [[CC_SREG]], exec
	; SI: s_or_b64 [[LCSSA_ACCUM]], [[LCSSA_ACCUM]], [[CC_MASK2]]			; SI: s_or_b64 [[LCSSA_ACCUM]], [[LCSSA_ACCUM]], [[CC_MASK2]]

	; SI: ; %for.end			; SI: ; %for.end
	; SI: s_and_saveexec_b64 {{s\[[0-9]+:[0-9]+\]}}, [[LCSSA_ACCUM]]			; SI: s_and_saveexec_b64 {{s\[[0-9]+:[0-9]+\]}}, [[LCSSA_ACCUM]]

	define amdgpu_ps void @i1_copy_from_loop(<4 x i32> inreg %rsrc, i32 %tid) {			define amdgpu_ps void @i1_copy_from_loop(<4 x i32> inreg %rsrc, i32 %tid) {
	entry:			entry:
	br label %for.body			br label %for.body

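The i1_copy_from_loop checks follow a lane-mask merge idiom for an i1 value that is live across blocks and held in a 64-bit SGPR pair: s_andn2_b64 clears the bits of the currently active lanes, s_and_b64 keeps the new condition only for active lanes, and s_or_b64 combines the two, so inactive lanes keep their previous value. The same three-instruction shape appears for INNER_MASK in loop_break.ll. A minimal Python sketch, with plain integers standing in for SGPR pairs and `merge_lane_mask` as a hypothetical name:

```python
MASK64 = (1 << 64) - 1  # one bit per lane of a 64-wide wavefront

def merge_lane_mask(accum: int, cond: int, exec_mask: int) -> int:
    kept = accum & ~exec_mask & MASK64  # s_andn2_b64 accum, accum, exec
    new = cond & exec_mask              # s_and_b64 mask, cond, exec
    return kept | new                   # s_or_b64 accum, accum, mask

# Lanes 1 and 2 are active and take their bits from cond;
# inactive lanes 0 and 3 keep their old bits from accum.
print(bin(merge_lane_mask(0b1100, 0b1010, 0b0110)))  # 0b1010
```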
test/CodeGen/AMDGPU/i1-copy-phi-uniform-branch.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}test_dont_clobber_scc:			; GCN-LABEL: {{^}}test_dont_clobber_scc:

	; GCN: ; %entry			; GCN: ; %entry
	; GCN: s_cmp_eq_u32 s0, 0			; GCN: s_cmp_eq_u32 s0, 0
	; GCN: s_cbranch_scc1 [[PREEXIT:BB[0-9_]+]]			; GCN: s_cbranch_scc1 [[PREEXIT:BB[0-9_]+]]

	; GCN: ; %blocka			; GCN: ; %blocka
	; GCN: s_xor_b64 s[{{[0-9:]+}}], exec, -1
	; GCN: s_cmp_eq_u32 s1, 0			; GCN: s_cmp_eq_u32 s1, 0
	; GCN: s_cbranch_scc1 [[EXIT:BB[0-9_]+]]			; GCN: s_cbranch_scc1 [[EXIT:BB[0-9_]+]]

	; GCN: [[PREEXIT]]:			; GCN: [[PREEXIT]]:
	; GCN: [[EXIT]]:			; GCN: [[EXIT]]:

	define amdgpu_vs float @test_dont_clobber_scc(i32 inreg %uni, i32 inreg %uni2) #0 {			define amdgpu_vs float @test_dont_clobber_scc(i32 inreg %uni, i32 inreg %uni2) #0 {
	entry:			entry:

test/CodeGen/AMDGPU/insert_vector_elt.ll

	; RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-flat-for-global,+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,GCN-NO-TONGA %s			; RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=kaveri -mattr=-flat-for-global,+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,SI,GCN-NO-TONGA %s
	; RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=tonga -mattr=-flat-for-global -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,GCN-TONGA %s			; RUN: llc -verify-machineinstrs -mtriple=amdgcn-amd-amdhsa -mcpu=tonga -mattr=-flat-for-global -mattr=+max-private-element-size-16 < %s | FileCheck -enable-var-scope -check-prefixes=GCN,VI,GCN-TONGA %s

	; FIXME: Broken on evergreen			; FIXME: Broken on evergreen
	; FIXME: For some reason the 8 and 16 vectors are being stored as			; FIXME: For some reason the 8 and 16 vectors are being stored as
	; individual elements instead of 128-bit stores.			; individual elements instead of 128-bit stores.


	; FIXME: Why is the constant moved into the intermediate register and			; FIXME: Why is the constant moved into the intermediate register and
	; not just directly into the vector component?			; not just directly into the vector component?

	; GCN-LABEL: {{^}}insertelement_v4f32_0:			; GCN-LABEL: {{^}}insertelement_v4f32_0:
	; GCN: s_load_dwordx4			; GCN: s_load_dwordx4
				; GCN-DAG: s_mov_b32 [[CONSTREG:s[0-9]+]], 0x40a00000
				; GCN-DAG: v_mov_b32_e32 v[[LOW_REG:[0-9]+]], [[CONSTREG]]

	; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}			; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}
	; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}			; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}
	; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}			; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}
	; GCN-DAG: v_mov_b32_e32 v{{[0-9]+}}, s{{[0-9]+}}
	; GCN-DAG: s_mov_b32 [[CONSTREG:s[0-9]+]], 0x40a00000
	; GCN-DAG: v_mov_b32_e32 v[[LOW_REG:[0-9]+]], [[CONSTREG]]
	; GCN: buffer_store_dwordx4 v{{\[}}[[LOW_REG]]:			; GCN: buffer_store_dwordx4 v{{\[}}[[LOW_REG]]:
	define amdgpu_kernel void @insertelement_v4f32_0(<4 x float> addrspace(1)* %out, <4 x float> %a) nounwind {			define amdgpu_kernel void @insertelement_v4f32_0(<4 x float> addrspace(1)* %out, <4 x float> %a) nounwind {
	%vecins = insertelement <4 x float> %a, float 5.000000e+00, i32 0			%vecins = insertelement <4 x float> %a, float 5.000000e+00, i32 0
	store <4 x float> %vecins, <4 x float> addrspace(1)* %out, align 16			store <4 x float> %vecins, <4 x float> addrspace(1)* %out, align 16
	ret void			ret void
	}			}

	; GCN-LABEL: {{^}}insertelement_v4f32_1:			; GCN-LABEL: {{^}}insertelement_v4f32_1:

test/CodeGen/AMDGPU/llvm.amdgcn.div.scale.ll

Show First 20 Lines • Show All 381 Lines • ▼ Show 20 Lines	define amdgpu_kernel void @test_div_scale_f32_undef_val_val(float addrspace(1)* %out) #0 {
%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)		%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float 8.0, i1 false)
%result0 = extractvalue { float, i1 } %result, 0		%result0 = extractvalue { float, i1 } %result, 0
store float %result0, float addrspace(1)* %out, align 4		store float %result0, float addrspace(1)* %out, align 4
ret void		ret void
}		}

; SI-LABEL: {{^}}test_div_scale_f32_undef_undef_val:		; SI-LABEL: {{^}}test_div_scale_f32_undef_undef_val:
; SI-NOT: v0		; SI-NOT: v0
; SI: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, v0, v0, v0		; SI: v_div_scale_f32 v{{[0-9]+}}, s{{\[[0-9]+:[0-9]+\]}}, s0, s0, v0
define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {		define amdgpu_kernel void @test_div_scale_f32_undef_undef_val(float addrspace(1)* %out) #0 {
%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)		%result = call { float, i1 } @llvm.amdgcn.div.scale.f32(float undef, float undef, i1 false)
%result0 = extractvalue { float, i1 } %result, 0		%result0 = extractvalue { float, i1 } %result, 0
store float %result0, float addrspace(1)* %out, align 4		store float %result0, float addrspace(1)* %out, align 4
ret void		ret void
}		}

; SI-LABEL: {{^}}test_div_scale_f64_val_undef_val:		; SI-LABEL: {{^}}test_div_scale_f64_val_undef_val:

test/CodeGen/AMDGPU/llvm.amdgcn.fmed3.ll

define amdgpu_kernel void @test_fabs_fmed3(float addrspace(1)* %out, float %src0, float %src1, float %src2) #1 {		define amdgpu_kernel void @test_fabs_fmed3(float addrspace(1)* %out, float %src0, float %src1, float %src2) #1 {
%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float %src2)		%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float %src2)
%fabs.med3 = call float @llvm.fabs.f32(float %med3)		%fabs.med3 = call float @llvm.fabs.f32(float %med3)
store float %fabs.med3, float addrspace(1)* %out		store float %fabs.med3, float addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fneg_fmed3_rr_0:		; GCN-LABEL: {{^}}test_fneg_fmed3_rr_0:
; GCN: s_brev_b32 [[NEG0:s[0-9]+]], 1		; GCN: v_bfrev_b32_e32 [[NEG0:v[0-9]+]], 1
; GCN: v_med3_f32 v{{[0-9]+}}, -v{{[0-9]+}}, -v{{[0-9]+}}, [[NEG0]]		; GCN: v_med3_f32 v{{[0-9]+}}, -s{{[0-9]+}}, -v{{[0-9]+}}, [[NEG0]]
define amdgpu_kernel void @test_fneg_fmed3_rr_0(float addrspace(1)* %out, float %src0, float %src1) #1 {		define amdgpu_kernel void @test_fneg_fmed3_rr_0(float addrspace(1)* %out, float %src0, float %src1) #1 {
%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float 0.0)		%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float %src1, float 0.0)
%neg.med3 = fsub float -0.0, %med3		%neg.med3 = fsub float -0.0, %med3
store float %neg.med3, float addrspace(1)* %out		store float %neg.med3, float addrspace(1)* %out
ret void		ret void
}		}

; FIXME: Worse off from folding this		; FIXME: Worse off from folding this
Show All 17 Lines	define amdgpu_kernel void @test_fneg_fmed3_r_inv2pi_0(float addrspace(1)* %out, float %src0) #1 {
%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float 0x3FC45F3060000000, float 0.0)		%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float 0x3FC45F3060000000, float 0.0)
%neg.med3 = fsub float -0.0, %med3		%neg.med3 = fsub float -0.0, %med3
store float %neg.med3, float addrspace(1)* %out		store float %neg.med3, float addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_fneg_fmed3_r_inv2pi_0_foldable_user:		; GCN-LABEL: {{^}}test_fneg_fmed3_r_inv2pi_0_foldable_user:
; GCN-DAG: v_bfrev_b32_e32 [[NEG0:v[0-9]+]], 1		; GCN-DAG: v_bfrev_b32_e32 [[NEG0:v[0-9]+]], 1
; GCN-DAG: s_mov_b32 [[NEG_INV:s[0-9]+]], 0xbe22f983		; GCN-DAG: v_mov_b32_e32 [[NEG_INV:v[0-9]+]], 0xbe22f983
; GCN: v_med3_f32 [[MED3:v[0-9]+]], -v{{[0-9]+}}, [[NEG_INV]], [[NEG0]]		; GCN: v_med3_f32 [[MED3:v[0-9]+]], -s{{[0-9]+}}, [[NEG_INV]], [[NEG0]]
; GCN: v_mul_f32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[MED3]]		; GCN: v_mul_f32_e32 v{{[0-9]+}}, s{{[0-9]+}}, [[MED3]]
define amdgpu_kernel void @test_fneg_fmed3_r_inv2pi_0_foldable_user(float addrspace(1)* %out, float %src0, float %mul.arg) #1 {		define amdgpu_kernel void @test_fneg_fmed3_r_inv2pi_0_foldable_user(float addrspace(1)* %out, float %src0, float %mul.arg) #1 {
%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float 0x3FC45F3060000000, float 0.0)		%med3 = call float @llvm.amdgcn.fmed3.f32(float %src0, float 0x3FC45F3060000000, float 0.0)
%neg.med3 = fsub float -0.0, %med3		%neg.med3 = fsub float -0.0, %med3
%mul = fmul float %neg.med3, %mul.arg		%mul = fmul float %neg.med3, %mul.arg
store float %mul, float addrspace(1)* %out		store float %mul, float addrspace(1)* %out
ret void		ret void
}		}
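In test_fneg_fmed3_r_inv2pi_0_foldable_user, the IR constant 0x3FC45F3060000000 (LLVM IR writes hexadecimal float constants as double bit patterns) is 1/(2π) rounded to single precision, and 0xbe22f983 in the checks is the single-precision bit pattern of its negation, i.e. the fneg has been folded into the constant. The Python sketch below checks that relationship; `double_from_bits` and `f32_bits` are illustrative helper names.

```python
import math
import struct

def double_from_bits(bits: int) -> float:
    """Reinterpret a 64-bit pattern as an IEEE-754 double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def f32_bits(x: float) -> int:
    # Round to single precision and return the raw 32-bit pattern.
    return struct.unpack("<I", struct.pack("<f", x))[0]

inv2pi = double_from_bits(0x3FC45F3060000000)
print(abs(inv2pi - 1 / (2 * math.pi)) < 1e-7)  # True
print(hex(f32_bits(-inv2pi)))                  # 0xbe22f983
```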

declare float @llvm.amdgcn.fmed3.f32(float, float, float) #0		declare float @llvm.amdgcn.fmed3.f32(float, float, float) #0
declare float @llvm.fabs.f32(float) #0		declare float @llvm.fabs.f32(float) #0

attributes #0 = { nounwind readnone }		attributes #0 = { nounwind readnone }
attributes #1 = { nounwind }		attributes #1 = { nounwind }

test/CodeGen/AMDGPU/llvm.amdgcn.mov.dpp.ll

Show All 36 Lines	define amdgpu_kernel void @dpp_wait_states(i32 addrspace(1)* %out, i32 %in) {
ret void		ret void
}		}

; VI-LABEL: {{^}}dpp_first_in_bb:		; VI-LABEL: {{^}}dpp_first_in_bb:
; VI: ; %endif		; VI: ; %endif
; VI-OPT: s_mov_b32		; VI-OPT: s_mov_b32
; VI-OPT: s_mov_b32		; VI-OPT: s_mov_b32
; VI-NOOPT: s_waitcnt		; VI-NOOPT: s_waitcnt
		; VI-NOOPT-NEXT: v_mov_b32_e32
		; VI-NOOPT-NEXT: s_nop 0
; VI-NOOPT-NEXT: s_nop 0		; VI-NOOPT-NEXT: s_nop 0
; VI: v_mov_b32_dpp [[VGPR0:v[0-9]+]], v{{[0-9]+}} quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0		; VI: v_mov_b32_dpp [[VGPR0:v[0-9]+]], v{{[0-9]+}} quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
; VI-OPT: s_nop 1		; VI-OPT: s_nop 1
; VI: v_mov_b32_dpp [[VGPR1:v[0-9]+]], [[VGPR0]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0		; VI: v_mov_b32_dpp [[VGPR1:v[0-9]+]], [[VGPR0]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0
; VI-OPT: s_nop 1		; VI-OPT: s_nop 1
; VI-NOOPT: s_nop 0		; VI-NOOPT: s_nop 0
; VI-NOOPT: s_nop 0		; VI-NOOPT: s_nop 0
; VI: v_mov_b32_dpp v{{[0-9]+}}, [[VGPR1]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0		; VI: v_mov_b32_dpp v{{[0-9]+}}, [[VGPR1]] quad_perm:[1,0,0,0] row_mask:0x1 bank_mask:0x1 bound_ctrl:0

test/CodeGen/AMDGPU/llvm.amdgcn.mqsad.pk.u16.u8.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	declare i64 @llvm.amdgcn.mqsad.pk.u16.u8(i64, i32, i64) #0			declare i64 @llvm.amdgcn.mqsad.pk.u16.u8(i64, i32, i64) #0

	; GCN-LABEL: {{^}}v_mqsad_pk_u16_u8:			; GCN-LABEL: {{^}}v_mqsad_pk_u16_u8:
	; GCN: v_mqsad_pk_u16_u8 v[0:1], v[4:5], v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}]			; GCN: v_mqsad_pk_u16_u8 v[0:1], v[4:5], s{{[0-9]+}}, v[{{[0-9]+:[0-9]+}}]
	; GCN-DAG: v_mov_b32_e32 v5, v1			; GCN-DAG: v_mov_b32_e32 v5, v1
	; GCN-DAG: v_mov_b32_e32 v4, v0			; GCN-DAG: v_mov_b32_e32 v4, v0
	define amdgpu_kernel void @v_mqsad_pk_u16_u8(i64 addrspace(1)* %out, i64 %src) {			define amdgpu_kernel void @v_mqsad_pk_u16_u8(i64 addrspace(1)* %out, i64 %src) {
	%tmp = call i64 asm "v_lsrlrev_b64 $0, $1, 1", "={v[4:5]},v"(i64 %src) #0			%tmp = call i64 asm "v_lsrlrev_b64 $0, $1, 1", "={v[4:5]},v"(i64 %src) #0
	%tmp1 = call i64 @llvm.amdgcn.mqsad.pk.u16.u8(i64 %tmp, i32 100, i64 100) #0			%tmp1 = call i64 @llvm.amdgcn.mqsad.pk.u16.u8(i64 %tmp, i32 100, i64 100) #0
	%tmp2 = call i64 asm ";; force constraint", "=v,{v[4:5]}"(i64 %tmp1) #0			%tmp2 = call i64 asm ";; force constraint", "=v,{v[4:5]}"(i64 %tmp1) #0
	store i64 %tmp2, i64 addrspace(1)* %out, align 4			store i64 %tmp2, i64 addrspace(1)* %out, align 4
	ret void			ret void

test/CodeGen/AMDGPU/llvm.amdgcn.qsad.pk.u16.u8.ll

	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s
	; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -mcpu=fiji -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	declare i64 @llvm.amdgcn.qsad.pk.u16.u8(i64, i32, i64) #0			declare i64 @llvm.amdgcn.qsad.pk.u16.u8(i64, i32, i64) #0

	; GCN-LABEL: {{^}}v_qsad_pk_u16_u8:			; GCN-LABEL: {{^}}v_qsad_pk_u16_u8:
	; GCN: v_qsad_pk_u16_u8 v[0:1], v[4:5], v{{[0-9]+}}, s[{{[0-9]+:[0-9]+}}]			; GCN: v_qsad_pk_u16_u8 v[0:1], v[4:5], s{{[0-9]+}}, v[{{[0-9]+:[0-9]+}}]
	; GCN-DAG: v_mov_b32_e32 v5, v1			; GCN-DAG: v_mov_b32_e32 v5, v1
	; GCN-DAG: v_mov_b32_e32 v4, v0			; GCN-DAG: v_mov_b32_e32 v4, v0
	define amdgpu_kernel void @v_qsad_pk_u16_u8(i64 addrspace(1)* %out, i64 %src) {			define amdgpu_kernel void @v_qsad_pk_u16_u8(i64 addrspace(1)* %out, i64 %src) {
	%tmp = call i64 asm "v_lsrlrev_b64 $0, $1, 1", "={v[4:5]},v"(i64 %src) #0			%tmp = call i64 asm "v_lsrlrev_b64 $0, $1, 1", "={v[4:5]},v"(i64 %src) #0
	%tmp1 = call i64 @llvm.amdgcn.qsad.pk.u16.u8(i64 %tmp, i32 100, i64 100) #0			%tmp1 = call i64 @llvm.amdgcn.qsad.pk.u16.u8(i64 %tmp, i32 100, i64 100) #0
	%tmp2 = call i64 asm ";; force constraint", "=v,{v[4:5]}"(i64 %tmp1) #0			%tmp2 = call i64 asm ";; force constraint", "=v,{v[4:5]}"(i64 %tmp1) #0
	store i64 %tmp2, i64 addrspace(1)* %out, align 4			store i64 %tmp2, i64 addrspace(1)* %out, align 4
	ret void			ret void

test/CodeGen/AMDGPU/loop_break.ll


	; OPT: bb9:			; OPT: bb9:
	; OPT: call void @llvm.amdgcn.end.cf(i64			; OPT: call void @llvm.amdgcn.end.cf(i64

	; GCN-LABEL: {{^}}break_loop:			; GCN-LABEL: {{^}}break_loop:
	; GCN: s_mov_b64 [[OUTER_MASK:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; GCN: s_mov_b64 [[OUTER_MASK:s\[[0-9]+:[0-9]+\]]], 0{{$}}

	; GCN: [[LOOP_ENTRY:BB[0-9]+_[0-9]+]]: ; %bb1			; GCN: [[LOOP_ENTRY:BB[0-9]+_[0-9]+]]: ; %bb1
	; GCN: v_cmp_lt_i32_e32 vcc, -1
	; GCN: s_and_b64 vcc, exec, vcc
	; GCN: s_or_b64 [[INNER_MASK:s\[[0-9]+:[0-9]+\]]], [[INNER_MASK]], exec			; GCN: s_or_b64 [[INNER_MASK:s\[[0-9]+:[0-9]+\]]], [[INNER_MASK]], exec
	; GCN: s_cbranch_vccnz [[FLOW:BB[0-9]+_[0-9]+]]			; GCN: s_cmp_gt_i32 s4, -1
				; GCN: s_cbranch_scc1 [[FLOW:BB[0-9]+_[0-9]+]]

	; GCN: ; %bb4			; GCN: ; %bb4
	; GCN: buffer_load_dword			; GCN: buffer_load_dword
	; GCN: v_cmp_ge_i32_e32 vcc,			; GCN: v_cmp_ge_i32_e32 vcc,
	; GCN: s_andn2_b64 [[INNER_MASK]], [[INNER_MASK]], exec			; GCN: s_andn2_b64 [[INNER_MASK]], [[INNER_MASK]], exec
	; GCN: s_and_b64 [[TMP0:s\[[0-9]+:[0-9]+\]]], vcc, exec			; GCN: s_and_b64 [[TMP0:s\[[0-9]+:[0-9]+\]]], vcc, exec
	; GCN: s_or_b64 [[INNER_MASK]], [[INNER_MASK]], [[TMP0]]			; GCN: s_or_b64 [[INNER_MASK]], [[INNER_MASK]], [[TMP0]]

	; GCN: [[FLOW]]: ; %Flow			; GCN: [[FLOW]]: ; %Flow
				; GCN: ; in Loop: Header=BB0_1 Depth=1
	; GCN: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[INNER_MASK]]			; GCN: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]], exec, [[INNER_MASK]]
	; GCN: s_or_b64 [[TMP1]], [[TMP1]], [[OUTER_MASK]]			; GCN: s_or_b64 [[TMP1]], [[TMP1]], [[OUTER_MASK]]
	; GCN: s_mov_b64 [[OUTER_MASK]], [[TMP1]]			; GCN: s_mov_b64 [[OUTER_MASK]], [[TMP1]]
	; GCN: s_andn2_b64 exec, exec, [[TMP1]]			; GCN: s_andn2_b64 exec, exec, [[TMP1]]
	; GCN-NEXT: s_cbranch_execnz [[LOOP_ENTRY]]			; GCN-NEXT: s_cbranch_execnz [[LOOP_ENTRY]]

	; GCN: ; %bb.4: ; %bb9			; GCN: ; %bb.4: ; %bb9
	; GCN-NEXT: s_endpgm			; GCN-NEXT: s_endpgm

test/CodeGen/AMDGPU/madak.ll

; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX6,GFX6_8_9,MAD %s		; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX6,GFX6_8_9,MAD %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX8,GFX6_8_9,GFX8_9,GFX8_9_10,MAD %s		; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefixes=GCN,GFX8,GFX6_8_9,GFX8_9,GFX8_9_10,MAD %s
; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX9,GFX6_8_9,GFX8_9,GFX8_9_10,MAD %s		; RUN: llc -march=amdgcn -mcpu=gfx900 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX9,GFX6_8_9,GFX8_9,GFX8_9_10,MAD %s
; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX10,GFX8_9_10,MAD,GFX10-MAD %s		; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX10,GFX8_9_10,GFX10-MAD %s
; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -fp-contract=fast -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX10,GFX8_9_10,FMA %s		; RUN: llc -march=amdgcn -mcpu=gfx1010 -verify-machineinstrs -fp-contract=fast -amdgpu-enable-global-sgpr-addr < %s | FileCheck -check-prefixes=GCN,GFX10,GFX8_9_10,FMA %s

declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		declare i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
declare float @llvm.fabs.f32(float) nounwind readnone		declare float @llvm.fabs.f32(float) nounwind readnone

; GCN-LABEL: {{^}}madak_f32:		; GCN-LABEL: {{^}}madak_f32:
; GFX6: buffer_load_dword [[VA:v[0-9]+]]		; GFX6: buffer_load_dword [[VA:v[0-9]+]]
; GFX6: buffer_load_dword [[VB:v[0-9]+]]		; GFX6: buffer_load_dword [[VB:v[0-9]+]]
; GFX8: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX8: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; GFX8: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX8: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX9: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX9: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX9: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX9: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; GFX10: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX10: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX10: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX10: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; MAD: v_madak_f32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000		; MAD: v_madak_f32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000
		; GFX10-MAD: v_madak_f32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000
; FMA: v_fmaak_f32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000		; FMA: v_fmaak_f32 {{v[0-9]+}}, [[VA]], [[VB]], 0x41200000
define amdgpu_kernel void @madak_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {		define amdgpu_kernel void @madak_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {
%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid		%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid
%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid		%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid

%a = load float, float addrspace(1)* %in.a.gep, align 4		%a = load float, float addrspace(1)* %in.a.gep, align 4
(46 lines not shown; context: define amdgpu_kernel void @madak_2_use_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in) nounwind)
store volatile float %madak0, float addrspace(1)* %out.gep.0, align 4		store volatile float %madak0, float addrspace(1)* %out.gep.0, align 4
store volatile float %madak1, float addrspace(1)* %out.gep.1, align 4		store volatile float %madak1, float addrspace(1)* %out.gep.1, align 4
ret void		ret void
}		}

; GCN-LABEL: {{^}}madak_m_inline_imm_f32:		; GCN-LABEL: {{^}}madak_m_inline_imm_f32:
; GCN: {{buffer|flat|global}}_load_dword [[VA:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[VA:v[0-9]+]]
; MAD: v_madak_f32 {{v[0-9]+}}, 4.0, [[VA]], 0x41200000		; MAD: v_madak_f32 {{v[0-9]+}}, 4.0, [[VA]], 0x41200000
		; GFX10-MAD: v_madak_f32 {{v[0-9]+}}, 4.0, [[VA]], 0x41200000
; FMA: v_fmaak_f32 {{v[0-9]+}}, 4.0, [[VA]], 0x41200000		; FMA: v_fmaak_f32 {{v[0-9]+}}, 4.0, [[VA]], 0x41200000
define amdgpu_kernel void @madak_m_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a) nounwind {		define amdgpu_kernel void @madak_m_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a) nounwind {
%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid		%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid

%a = load float, float addrspace(1)* %in.a.gep, align 4		%a = load float, float addrspace(1)* %in.a.gep, align 4

(11 lines not shown)
; GFX6: buffer_load_dword [[VB:v[0-9]+]]		; GFX6: buffer_load_dword [[VB:v[0-9]+]]
; GFX8: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX8: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; GFX8: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX8: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX9: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX9: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX9: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX9: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; GFX10: {{flat|global}}_load_dword [[VA:v[0-9]+]]		; GFX10: {{flat|global}}_load_dword [[VA:v[0-9]+]]
; GFX10: {{flat|global}}_load_dword [[VB:v[0-9]+]]		; GFX10: {{flat|global}}_load_dword [[VB:v[0-9]+]]
; MAD: v_mad_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0		; MAD: v_mad_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0
		; GFX10-MAD: v_mad_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0
; FMA: v_fma_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0		; FMA: v_fma_f32 {{v[0-9]+}}, [[VA]], [[VB]], 4.0
define amdgpu_kernel void @madak_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {		define amdgpu_kernel void @madak_inline_imm_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind {
%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone		%tid = tail call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid		%in.a.gep = getelementptr float, float addrspace(1)* %in.a, i32 %tid
%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid		%in.b.gep = getelementptr float, float addrspace(1)* %in.b, i32 %tid
%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid		%out.gep = getelementptr float, float addrspace(1)* %out, i32 %tid

%a = load float, float addrspace(1)* %in.a.gep, align 4		%a = load float, float addrspace(1)* %in.a.gep, align 4
(112 lines not shown; context: define amdgpu_kernel void @no_madak_src1_modifier_f32(float addrspace(1)* noalias %out, float addrspace(1)* noalias %in.a, float addrspace(1)* noalias %in.b) nounwind)
ret void		ret void
}		}

; SIFoldOperands should not fold the SGPR copy into the instruction before GFX10		; SIFoldOperands should not fold the SGPR copy into the instruction before GFX10
; because the implicit immediate already uses the constant bus.		; because the implicit immediate already uses the constant bus.
; On GFX10+ we can use two scalar operands.		; On GFX10+ we can use two scalar operands.
; GCN-LABEL: {{^}}madak_constant_bus_violation:		; GCN-LABEL: {{^}}madak_constant_bus_violation:
; GCN: s_load_dword [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0x12|0x48}}		; GCN: s_load_dword [[SGPR0:s[0-9]+]], s{{\[[0-9]+:[0-9]+\]}}, {{0x12|0x48}}
; GCN: v_mov_b32_e32 [[SGPR0_VCOPY:v[0-9]+]], [[SGPR0]]
; GCN: {{buffer|flat|global}}_load_dword [[VGPR:v[0-9]+]]		; GCN: {{buffer|flat|global}}_load_dword [[VGPR:v[0-9]+]]
; MAD: v_madak_f32 [[MADAK:v[0-9]+]], 0.5, [[SGPR0_VCOPY]], 0x42280000		; MAD: v_mov_b32_e32 [[MADAK:v[0-9]+]], 0x42280000
		; MAD: v_mac_f32_e64 [[MADAK]], [[SGPR0]], 0.5
		; GFX10: v_mov_b32_e32 [[SGPR0_VCOPY:v[0-9]+]], [[SGPR0]]
		; GFX10-MAD: v_madak_f32 [[MADAK:v[0-9]+]], 0.5, [[SGPR0_VCOPY]], 0x42280000
; FMA: v_fmaak_f32 [[MADAK:v[0-9]+]], 0.5, [[SGPR0_VCOPY]], 0x42280000		; FMA: v_fmaak_f32 [[MADAK:v[0-9]+]], 0.5, [[SGPR0_VCOPY]], 0x42280000
; GCN: v_mul_f32_e32 [[MUL:v[0-9]+]], [[MADAK]], [[VGPR]]		; GCN: v_mul_f32_e32 [[MUL:v[0-9]+]], [[MADAK]], [[VGPR]]
; GFX6: buffer_store_dword [[MUL]]		; GFX6: buffer_store_dword [[MUL]]
; GFX8_9_10: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[MUL]]		; GFX8_9_10: {{flat|global}}_store_dword v[{{[0-9:]+}}], [[MUL]]
define amdgpu_kernel void @madak_constant_bus_violation(i32 %arg1, [8 x i32], float %sgpr0, float %sgpr1) #0 {		define amdgpu_kernel void @madak_constant_bus_violation(i32 %arg1, [8 x i32], float %sgpr0, float %sgpr1) #0 {
bb:		bb:
%tmp = icmp eq i32 %arg1, 0		%tmp = icmp eq i32 %arg1, 0
br i1 %tmp, label %bb3, label %bb4		br i1 %tmp, label %bb3, label %bb4
(15 lines not shown)
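The comment on madak_constant_bus_violation says the SGPR copy cannot be folded into v_madak_f32 before GFX10, because the instruction's literal K already occupies the constant bus. GFX10 allows two scalar operands. The Python sketch below models that rule of thumb under simplifying assumptions:

- Each distinct SGPR and each distinct literal costs one constant-bus read.
- Inline constants, such as 0.5 or 4.0 in these tests, cost nothing.
- The limit is 1 before GFX10 and 2 from GFX10 on.

The operand encoding and function names are hypothetical. The real checks live in the AMDGPU backend and include opcode-specific exceptions that this sketch ignores.

```python
from typing import List, Tuple

# Hypothetical operand encoding: (kind, payload), kind in {"vgpr", "sgpr", "inline", "literal"}.
Operand = Tuple[str, object]


def constant_bus_reads(operands: List[Operand]) -> int:
    """Count constant-bus reads: one per distinct SGPR, one per distinct literal."""
    sgprs = {payload for kind, payload in operands if kind == "sgpr"}
    literals = {payload for kind, payload in operands if kind == "literal"}
    return len(sgprs) + len(literals)


def constant_bus_limit(gfx10_plus: bool) -> int:
    # Simplification: GFX10 raises the limit to 2 for most VALU opcodes;
    # some opcodes keep a limit of 1 there, which this sketch does not model.
    return 2 if gfx10_plus else 1


def fits_constant_bus(operands: List[Operand], gfx10_plus: bool) -> bool:
    return constant_bus_reads(operands) <= constant_bus_limit(gfx10_plus)


# v_madak_f32 dst, 0.5, src, K with K = 0x42280000 encoded as a literal.
MADAK_WITH_SGPR = [("inline", 0.5), ("sgpr", "s0"), ("literal", 0x42280000)]
MADAK_WITH_VCOPY = [("inline", 0.5), ("vgpr", "v1"), ("literal", 0x42280000)]
# v_mac_f32_e64 v2, s0, 0.5 after K was materialized in v2 by v_mov_b32.
MAC_WITH_SGPR = [("sgpr", "s0"), ("inline", 0.5), ("vgpr", "v2")]
```

Under this model, the updated MAD checks avoid the problem differently: K is moved into a VGPR with v_mov_b32_e32, and v_mac_f32_e64 then reads the SGPR directly. This needs only one constant-bus read.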

test/CodeGen/AMDGPU/mubuf-legalize-operands.ll

	(149 lines not shown)
	; CHECK-O0: s_and_b64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]			; CHECK-O0: s_and_b64 [[CMP:s\[[0-9]+:[0-9]+\]]], [[CMP0]], [[CMP1]]
	; CHECK-O0: s_and_saveexec_b64 [[CMP]], [[CMP]]			; CHECK-O0: s_and_saveexec_b64 [[CMP]], [[CMP]]
	; CHECK-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s[0:3], s5 offset:[[IDX_OFF]] ; 4-byte Folded Reload			; CHECK-O0: buffer_load_dword [[IDX:v[0-9]+]], off, s[0:3], s5 offset:[[IDX_OFF]] ; 4-byte Folded Reload
	; CHECK-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s{{\[}}[[SRSRC0]]:[[SRSRC3]]{{\]}}, {{.*}} idxen			; CHECK-O0: buffer_load_format_x [[RES:v[0-9]+]], [[IDX]], s{{\[}}[[SRSRC0]]:[[SRSRC3]]{{\]}}, {{.*}} idxen
	; CHECK-O0: s_waitcnt vmcnt(0)			; CHECK-O0: s_waitcnt vmcnt(0)
	; CHECK-O0: buffer_store_dword [[RES]], off, s[0:3], s5 offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill			; CHECK-O0: buffer_store_dword [[RES]], off, s[0:3], s5 offset:[[RES_OFF_TMP:[0-9]+]] ; 4-byte Folded Spill
	; CHECK-O0: s_xor_b64 exec, exec, [[CMP]]			; CHECK-O0: s_xor_b64 exec, exec, [[CMP]]
	; CHECK-O0-NEXT: s_cbranch_execnz [[LOOPBB0]]			; CHECK-O0-NEXT: s_cbranch_execnz [[LOOPBB0]]
				; CHECK-O0: v_readlane_b32 s[[S1:[0-9]+]], v{{[0-9]+}}, 4
	; CHECK-O0: s_mov_b64 exec, [[SAVEEXEC]]			; CHECK-O0: v_readlane_b32 s[[S2:[0-9]+]], v{{[0-9]+}}, 5
				; CHECK-O0: s_mov_b64 exec, s{{\[}}[[S1]]:[[S2]]{{\]}}
	; CHECK-O0: buffer_load_dword [[RES:v[0-9]+]], off, s[0:3], s5 offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload			; CHECK-O0: buffer_load_dword [[RES:v[0-9]+]], off, s[0:3], s5 offset:[[RES_OFF_TMP]] ; 4-byte Folded Reload
	; CHECK-O0: buffer_store_dword [[RES]], off, s[0:3], s5 offset:[[RES_OFF:[0-9]+]] ; 4-byte Folded Spill			; CHECK-O0: buffer_store_dword [[RES]], off, s[0:3], s5 offset:[[RES_OFF:[0-9]+]] ; 4-byte Folded Spill
	; CHECK-O0: s_cbranch_execz [[TERMBB:BB[0-9]+_[0-9]+]]			; CHECK-O0: s_cbranch_execz [[TERMBB:BB[0-9]+_[0-9]+]]

	; CHECK-O0: BB{{[0-9]+_[0-9]+}}:			; CHECK-O0: BB{{[0-9]+_[0-9]+}}:
	; CHECK-O0-DAG: s_mov_b64 s{{\[}}[[SAVEEXEC0:[0-9]+]]:[[SAVEEXEC1:[0-9]+]]{{\]}}, exec			; CHECK-O0-DAG: s_mov_b64 s{{\[}}[[SAVEEXEC0:[0-9]+]]:[[SAVEEXEC1:[0-9]+]]{{\]}}, exec
	; CHECK-O0-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s5 offset:[[IDX_OFF:[0-9]+]] ; 4-byte Folded Spill			; CHECK-O0-DAG: buffer_store_dword {{v[0-9]+}}, off, s[0:3], s5 offset:[[IDX_OFF:[0-9]+]] ; 4-byte Folded Spill
	; CHECK-O0: v_writelane_b32 [[VSAVEEXEC:v[0-9]+]], s[[SAVEEXEC0]], [[SAVEEXEC_IDX0:[0-9]+]]			; CHECK-O0: v_writelane_b32 [[VSAVEEXEC:v[0-9]+]], s[[SAVEEXEC0]], [[SAVEEXEC_IDX0:[0-9]+]]
	(63 lines not shown)
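The -O0 checks in mubuf-legalize-operands.ll follow a waterfall loop. A buffer resource descriptor held in VGPRs may differ per lane, so the loop works through it one value at a time:

1. Take the first active lane's value.
2. Enable only the lanes holding that same value (s_and_saveexec_b64).
3. Issue the buffer access for those lanes.
4. Drop them from exec (s_xor_b64 exec, exec, ...).
5. Repeat while any lane is left (s_cbranch_execnz).

The Python sketch below models only this lane bookkeeping, assuming wave masks stored as ints. `waterfall` and its arguments are hypothetical names. The per-lane compare is written as a plain loop rather than the compare sequence the backend emits.

```python
from typing import Callable, List


def waterfall(lane_values: List[int], exec_mask: int,
              body: Callable[[int, int], None]) -> List[int]:
    """Run `body(uniform_value, lanes)` once per distinct value among active lanes.

    Returns the uniform values in the order they were processed.
    """
    remaining = exec_mask
    processed = []
    while remaining:                                      # s_cbranch_execnz
        first_lane = (remaining & -remaining).bit_length() - 1
        uniform = lane_values[first_lane]                 # read the first active lane
        same = 0
        for lane, value in enumerate(lane_values):        # per-lane compare
            if (remaining >> lane) & 1 and value == uniform:
                same |= 1 << lane
        body(uniform, same)                               # exec == same inside the body
        remaining &= ~same                                # s_xor_b64 exec, exec, saved
        processed.append(uniform)
    return processed


calls = []
order = waterfall([7, 7, 3, 7], 0b1111, lambda v, lanes: calls.append((v, lanes)))
# order == [7, 3]; calls == [(7, 0b1011), (3, 0b0100)]
```

After the loop, the original exec is restored. In the updated checks, the saved mask is read back from VGPR lanes with v_readlane_b32 before `s_mov_b64 exec, ...`, because at -O0 it had been spilled with v_writelane_b32.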

test/CodeGen/AMDGPU/multilevel-break.ll

	(90 lines not shown)

	; GCN-LABEL: {{^}}multi_if_break_loop:			; GCN-LABEL: {{^}}multi_if_break_loop:
	; GCN: s_mov_b64 [[LEFT:s\[[0-9]+:[0-9]+\]]], 0{{$}}			; GCN: s_mov_b64 [[LEFT:s\[[0-9]+:[0-9]+\]]], 0{{$}}

	; GCN: [[LOOP:BB[0-9]+_[0-9]+]]: ; %bb1{{$}}			; GCN: [[LOOP:BB[0-9]+_[0-9]+]]: ; %bb1{{$}}
	; GCN: s_mov_b64 [[OLD_LEFT:s\[[0-9]+:[0-9]+\]]], [[LEFT]]			; GCN: s_mov_b64 [[OLD_LEFT:s\[[0-9]+:[0-9]+\]]], [[LEFT]]

	; GCN: ; %LeafBlock1			; GCN: ; %LeafBlock1
	; GCN: s_mov_b64
	; GCN: s_mov_b64 [[BREAK:s\[[0-9]+:[0-9]+\]]], -1{{$}}			; GCN: s_mov_b64 [[BREAK:s\[[0-9]+:[0-9]+\]]], -1{{$}}

	; GCN: ; %case1			; GCN: ; %case1
	; GCN: buffer_load_dword [[LOAD2:v[0-9]+]],			; GCN: buffer_load_dword [[LOAD2:v[0-9]+]],
	; GCN: v_cmp_ge_i32_e32 vcc, {{v[0-9]+}}, [[LOAD2]]			; GCN: v_cmp_ge_i32_e32 vcc, {{v[0-9]+}}, [[LOAD2]]
	; GCN: s_orn2_b64 [[BREAK]], vcc, exec			; GCN: s_orn2_b64 [[BREAK]], vcc, exec

	; GCN: ; %Flow3			; GCN: ; %Flow3
	; GCN: s_branch [[FLOW:BB[0-9]+_[0-9]+]]			; GCN: s_branch [[FLOW:BB[0-9]+_[0-9]+]]

	; GCN: s_mov_b64 [[BREAK]], -1{{$}}			; GCN: s_mov_b64 [[BREAK]], -1{{$}}

	; GCN: [[FLOW]]: ; %Flow

	; GCN: ; %case0			; GCN: ; %case0
	; GCN: buffer_load_dword [[LOAD1:v[0-9]+]],			; GCN: buffer_load_dword [[LOAD1:v[0-9]+]],
	; GCN-DAG: s_andn2_b64 [[BREAK]], [[BREAK]], exec			; GCN-DAG: s_andn2_b64 [[BREAK]], [[BREAK]], exec
	; GCN-DAG: v_cmp_ge_i32_e32 vcc, {{v[0-9]+}}, [[LOAD1]]			; GCN-DAG: v_cmp_ge_i32_e32 vcc, {{v[0-9]+}}, [[LOAD1]]
	; GCN-DAG: s_and_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], vcc, exec			; GCN-DAG: s_and_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], vcc, exec
	; GCN: s_or_b64 [[BREAK]], [[BREAK]], [[TMP]]			; GCN: s_or_b64 [[BREAK]], [[BREAK]], [[TMP]]

	; GCN: ; %Flow4			; GCN: [[FLOW]]: ; %Flow4
	; GCN: s_and_b64 [[BREAK]], exec, [[BREAK]]			; GCN: s_and_b64 [[BREAK]], exec, [[BREAK]]
	; GCN: s_or_b64 [[LEFT]], [[BREAK]], [[OLD_LEFT]]			; GCN: s_or_b64 [[LEFT]], [[BREAK]], [[OLD_LEFT]]
	; GCN: s_andn2_b64 exec, exec, [[LEFT]]			; GCN: s_andn2_b64 exec, exec, [[LEFT]]
	; GCN-NEXT: s_cbranch_execnz			; GCN-NEXT: s_cbranch_execnz

	define amdgpu_kernel void @multi_if_break_loop(i32 %arg) #0 {			define amdgpu_kernel void @multi_if_break_loop(i32 %arg) #0 {
	bb:			bb:
	%id = call i32 @llvm.amdgcn.workitem.id.x()			%id = call i32 @llvm.amdgcn.workitem.id.x()
	(31 lines not shown)

test/CodeGen/AMDGPU/select-opt.ll

(129 lines not shown; context: define amdgpu_kernel void @opt_select_i64_or_cmp_f32(i64 addrspace(1)* %out, float %a, float %b, float %c, i64 %x, i64 %y) #0)
%or = or i1 %fcmp0, %fcmp1		%or = or i1 %fcmp0, %fcmp1
%select = select i1 %or, i64 %x, i64 %y		%select = select i1 %or, i64 %x, i64 %y
store i64 %select, i64 addrspace(1)* %out		store i64 %select, i64 addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}regression:		; GCN-LABEL: {{^}}regression:
; GCN: v_cmp_neq_f32_e64 s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}, 1.0		; GCN: v_cmp_neq_f32_e64 s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}, 1.0
; GCN: v_cmp_neq_f32_e32 vcc, 0, v{{[0-9]+}}		; GCN: v_cmp_neq_f32_e64 s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}, 0
; GCN: v_cmp_eq_f32_e32 vcc, 0, v{{[0-9]+}}		; GCN: v_cmp_eq_f32_e64 s{{\[[0-9]+:[0-9]+\]}}, s{{[0-9]+}}, 0

define amdgpu_kernel void @regression(float addrspace(1)* %out, float %c0, float %c1) #0 {		define amdgpu_kernel void @regression(float addrspace(1)* %out, float %c0, float %c1) #0 {
entry:		entry:
%cmp0 = fcmp oeq float %c0, 1.0		%cmp0 = fcmp oeq float %c0, 1.0
br i1 %cmp0, label %if0, label %endif		br i1 %cmp0, label %if0, label %endif

if0:		if0:
%cmp1 = fcmp oeq float %c1, 0.0		%cmp1 = fcmp oeq float %c1, 0.0
(14 lines not shown)

test/CodeGen/AMDGPU/sgpr-control-flow.ll

(98 lines not shown; context: endif:)
store i32 %tmp4, i32 addrspace(1)* %out		store i32 %tmp4, i32 addrspace(1)* %out
ret void		ret void
}		}

; SI-LABEL: {{^}}sgpr_if_else_valu_cmp_phi_br:		; SI-LABEL: {{^}}sgpr_if_else_valu_cmp_phi_br:

; SI: ; %else		; SI: ; %else
; SI: buffer_load_dword [[AVAL:v[0-9]+]]		; SI: buffer_load_dword [[AVAL:v[0-9]+]]
; SI: v_cmp_gt_i32_e64 [[PHI:s\[[0-9]+:[0-9]+\]]], 0, [[AVAL]]		; SI: v_cmp_gt_i32_e32 vcc, 0, [[AVAL]]
		; SI: s_and_b64 [[PHI:s\[[0-9]+:[0-9]+\]]], vcc, exec

; SI: ; %if		; SI: ; %if
; SI: buffer_load_dword [[AVAL:v[0-9]+]]		; SI: buffer_load_dword [[AVAL:v[0-9]+]]
; SI-DAG: v_cmp_eq_u32_e32 [[CMP_ELSE:vcc]], 0, [[AVAL]]		; SI-DAG: v_cmp_eq_u32_e32 [[CMP_ELSE:vcc]], 0, [[AVAL]]
; SI-DAG: s_andn2_b64 [[PHI]], [[PHI]], exec		; SI-DAG: s_andn2_b64 [[PHI]], [[PHI]], exec
; SI-DAG: s_and_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], [[CMP_ELSE]], exec		; SI-DAG: s_and_b64 [[TMP:s\[[0-9]+:[0-9]+\]]], [[CMP_ELSE]], exec
; SI: s_or_b64 [[PHI]], [[PHI]], [[TMP]]		; SI: s_or_b64 [[PHI]], [[PHI]], [[TMP]]

(31 lines not shown)

test/CodeGen/AMDGPU/si-fix-sgpr-copies.mir

(10 lines not shown; context: registers:)
- { id: 7, class: vgpr_32 }		- { id: 7, class: vgpr_32 }
- { id: 8, class: sreg_32_xm0 }		- { id: 8, class: sreg_32_xm0 }
- { id: 9, class: vgpr_32 }		- { id: 9, class: vgpr_32 }
- { id: 10, class: sreg_64 }		- { id: 10, class: sreg_64 }
- { id: 11, class: sreg_32_xm0 }		- { id: 11, class: sreg_32_xm0 }

body: |		body: |
; GCN-LABEL: name: phi_visit_order		; GCN-LABEL: name: phi_visit_order
; GCN: V_ADD_I32		; GCN: S_ADD_I32
bb.0:		bb.0:
liveins: $vgpr0		liveins: $vgpr0
%7 = COPY $vgpr0		%7 = COPY $vgpr0
%8 = S_MOV_B32 0		%8 = S_MOV_B32 0

bb.1:		bb.1:
%0 = PHI %8, %bb.0, %0, %bb.1, %2, %bb.2		%0 = PHI %8, %bb.0, %0, %bb.1, %2, %bb.2
%9 = V_MOV_B32_e32 9, implicit $exec		%9 = V_MOV_B32_e32 9, implicit $exec
(47 lines not shown)

test/CodeGen/AMDGPU/smrd.ll

(565 lines not shown; context: main_body:)
call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r.1, float %r.1, float %r.1, float %r.2, i1 true, i1 true) #0		call void @llvm.amdgcn.exp.f32(i32 0, i32 15, float %r.1, float %r.1, float %r.1, float %r.2, i1 true, i1 true) #0
ret void		ret void
}		}

; GCN-LABEL: {{^}}smrd_uniform_loop:		; GCN-LABEL: {{^}}smrd_uniform_loop:
;		;
; TODO: we should keep the loop counter in an SGPR		; TODO: we should keep the loop counter in an SGPR
;		;
; GCN: v_readfirstlane_b32
; GCN: s_buffer_load_dword		; GCN: s_buffer_load_dword
define amdgpu_ps float @smrd_uniform_loop(<4 x i32> inreg %desc, i32 %bound) #0 {		define amdgpu_ps float @smrd_uniform_loop(<4 x i32> inreg %desc, i32 %bound) #0 {
main_body:		main_body:
br label %loop		br label %loop

loop:		loop:
%counter = phi i32 [ 0, %main_body ], [ %counter.next, %loop ]		%counter = phi i32 [ 0, %main_body ], [ %counter.next, %loop ]
%sum = phi float [ 0.0, %main_body ], [ %sum.next, %loop ]		%sum = phi float [ 0.0, %main_body ], [ %sum.next, %loop ]
(129 lines not shown)

test/CodeGen/AMDGPU/subreg-coalescer-undef-use.ll

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -mcpu=tahiti -amdgpu-dce-in-ra=0 -o - %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=tahiti -amdgpu-dce-in-ra=0 -o - %s | FileCheck %s
	; Don't crash when the use of an undefined value is only detected by the			; Don't crash when the use of an undefined value is only detected by the
	; register coalescer because it is hidden with subregister insert/extract.			; register coalescer because it is hidden with subregister insert/extract.
	target triple="amdgcn--"			target triple="amdgcn--"

				define amdgpu_kernel void @foobar(float %a0, float %a1, float addrspace(1)* %out) nounwind {
	; CHECK-LABEL: foobar:			; CHECK-LABEL: foobar:
	; CHECK: s_load_dwordx2 s[4:5], s[0:1], 0x9			; CHECK: ; %bb.0: ; %entry
				; CHECK-NEXT: s_load_dwordx2 s[4:5], s[0:1], 0x9
	; CHECK-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb			; CHECK-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0xb
	; CHECK-NEXT: v_mbcnt_lo_u32_b32_e64			; CHECK-NEXT: v_mbcnt_lo_u32_b32_e64 v0, -1, 0
	; CHECK-NEXT: s_mov_b32 s2, -1
	; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0			; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 0, v0
				; CHECK-NEXT: s_mov_b32 s2, -1
	; CHECK-NEXT: s_waitcnt lgkmcnt(0)			; CHECK-NEXT: s_waitcnt lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v1, s5
	; CHECK-NEXT: s_and_saveexec_b64 s[4:5], vcc

	; CHECK: BB0_1:			; FIXME: The change related to the fact that
	; CHECK-NEXT: ; kill: def $vgpr0_vgpr1 killed $sgpr4_sgpr5 killed $exec			; DetectDeadLanes pass hit "Copy across incompatible class" SGPR -> VGPR in analysis
	; CHECK-NEXT: ; implicit-def: $vgpr0_vgpr1_vgpr2_vgpr3			; and hence it cannot derive the fact that the vector element is unused.
				; Such a copies appear because the float4 vectors and their elements in the test are uniform
				; but the PHI node in "ife" block is divergent because of the CF dependency (divergent branch in bb0)

	; CHECK: BB0_2:			; CHECK-NEXT: v_mov_b32_e32 v0, s4
	; CHECK: s_or_b64 exec, exec, s[4:5]			; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: v_mov_b32_e32 v2, s6
				; CHECK-NEXT: v_mov_b32_e32 v3, s7

				; CHECK-NEXT: s_and_saveexec_b64 s[6:7], vcc
				; CHECK-NEXT: ; mask branch BB0_2
				; CHECK-NEXT: BB0_1: ; %ift
				; CHECK-NEXT: s_mov_b32 s4, s5
				; CHECK-NEXT: v_mov_b32_e32 v0, s4
				; CHECK-NEXT: v_mov_b32_e32 v1, s5
				; CHECK-NEXT: v_mov_b32_e32 v2, s6
				; CHECK-NEXT: v_mov_b32_e32 v3, s7
				; CHECK-NEXT: BB0_2: ; %ife
				; CHECK-NEXT: s_or_b64 exec, exec, s[6:7]
	; CHECK-NEXT: s_mov_b32 s3, 0xf000			; CHECK-NEXT: s_mov_b32 s3, 0xf000
	; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], 0			; CHECK-NEXT: buffer_store_dword v1, off, s[0:3], 0
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	define amdgpu_kernel void @foobar(float %a0, float %a1, float addrspace(1)* %out) nounwind {
	entry:			entry:
	%v0 = insertelement <4 x float> undef, float %a0, i32 0			%v0 = insertelement <4 x float> undef, float %a0, i32 0
	%tid = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0			%tid = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
	%cnd = icmp eq i32 %tid, 0			%cnd = icmp eq i32 %tid, 0
	br i1 %cnd, label %ift, label %ife			br i1 %cnd, label %ift, label %ife

	ift:			ift:
	%v1 = insertelement <4 x float> undef, float %a1, i32 0			%v1 = insertelement <4 x float> undef, float %a1, i32 0
	(12 lines not shown)
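The FIXME in subreg-coalescer-undef-use.ll describes the situation this patch targets. The float4 values are uniform. The PHI in %ife is divergent because the branch in the entry block depends on %tid, a per-lane value, so different lanes reach %ife along different edges. When cross-block values take their register class from divergence, that PHI gets a VGPR class while its uniform inputs stay in SGPRs. This produces the SGPR-to-VGPR copies the comment mentions.

The Python sketch below shows the shape of that reasoning with a toy divergence analysis:

- Data dependence propagates divergence through operands.
- A PHI at the join point of a divergent branch is divergent even when all of its incoming values are uniform (sync dependence).

This is an assumed, simplified model, not LLVM's divergence analysis. "SGPR" and "VGPR" stand in for the real register classes, and the PHI name `val.ife` is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Value:
    operands: List[str] = field(default_factory=list)
    lane_varying_source: bool = False   # e.g. llvm.amdgcn.mbcnt.lo, workitem.id.x
    phi_block: Optional[str] = None     # block name if this value is a PHI
    is_bool: bool = False               # i1 values are handled separately


@dataclass
class Branch:
    cond: str   # name of the i1 branch condition
    join: str   # block where the two paths re-converge


def divergent_values(values: Dict[str, Value], branches: List[Branch]) -> Set[str]:
    """Fixed-point propagation of divergence (toy model)."""
    divergent = {name for name, v in values.items() if v.lane_varying_source}
    changed = True
    while changed:
        changed = False
        divergent_joins = {b.join for b in branches if b.cond in divergent}
        for name, v in values.items():
            if name in divergent:
                continue
            data_dep = any(op in divergent for op in v.operands)
            sync_dep = v.phi_block is not None and v.phi_block in divergent_joins
            if data_dep or sync_dep:
                divergent.add(name)
                changed = True
    return divergent


def register_classes(values: Dict[str, Value], branches: List[Branch]) -> Dict[str, str]:
    div = divergent_values(values, branches)
    # Divergent i1 values are kept as wave-wide lane masks in SGPR pairs,
    # so this toy mapping leaves booleans out.
    return {name: ("VGPR" if name in div else "SGPR")
            for name, v in values.items() if not v.is_bool}


# The foobar test, reduced: a0 and a1 are kernel arguments, hence uniform.
foobar = {
    "a0": Value(),
    "a1": Value(),
    "v0": Value(operands=["a0"]),
    "tid": Value(lane_varying_source=True),
    "cnd": Value(operands=["tid"], is_bool=True),
    "v1": Value(operands=["a1"]),
    "val.ife": Value(operands=["v0", "v1"], phi_block="ife"),
}
classes = register_classes(foobar, [Branch(cond="cnd", join="ife")])
# classes["v0"] == classes["v1"] == "SGPR"; classes["val.ife"] == "VGPR"
```

If the branch condition were uniform, the same PHI would stay in the SGPR class. That is the case the patch lets instruction selection keep scalar.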

test/CodeGen/AMDGPU/uniform-loop-inside-nonuniform.ll

	; RUN: llc -march=amdgcn -mcpu=verde < %s | FileCheck %s			; RUN: llc -march=amdgcn -mcpu=verde < %s | FileCheck %s

	; Test a simple uniform loop that lives inside non-uniform control flow.			; Test a simple uniform loop that lives inside non-uniform control flow.

	; CHECK-LABEL: {{^}}test1:			; CHECK-LABEL: {{^}}test1:
	; CHECK: v_cmp_ne_u32_e32 vcc, 0			; CHECK: v_cmp_ne_u32_e32 vcc, 0
	; CHECK: s_and_saveexec_b64			; CHECK: s_and_saveexec_b64
	; CHECK-NEXT: ; mask branch			; CHECK-NEXT: ; mask branch
	; CHECK-NEXT: s_cbranch_execz BB{{[0-9]+_[0-9]+}}			; CHECK-NEXT: s_cbranch_execz BB{{[0-9]+_[0-9]+}}
	; CHECK-NEXT: BB{{[0-9]+_[0-9]+}}: ; %loop_body.preheader

	; CHECK: [[LOOP_BODY_LABEL:BB[0-9]+_[0-9]+]]:			; CHECK: [[LOOP_BODY_LABEL:BB[0-9]+_[0-9]+]]: ; %loop_body
	; CHECK: s_cbranch_vccz [[LOOP_BODY_LABEL]]			; CHECK: s_cbranch_scc0 [[LOOP_BODY_LABEL]]

	; CHECK: s_endpgm			; CHECK: s_endpgm
	define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <2 x i32> %addr.base, i32 %y, i32 %p) {			define amdgpu_ps void @test1(<8 x i32> inreg %rsrc, <2 x i32> %addr.base, i32 %y, i32 %p) {
	main_body:			main_body:
	%cc = icmp eq i32 %p, 0			%cc = icmp eq i32 %p, 0
	br i1 %cc, label %out, label %loop_body			br i1 %cc, label %out, label %loop_body

	loop_body:			loop_body:
	(48 lines not shown)

test/CodeGen/AMDGPU/use-sgpr-multiple-times.ll

(220 lines not shown; context: define amdgpu_kernel void @test_literal_use_twice_ternary_op_s_k_k_x2(float addrspace(1)* %out, float %a, float %b) #0)
store volatile float %fma0, float addrspace(1)* %out		store volatile float %fma0, float addrspace(1)* %out
store volatile float %fma1, float addrspace(1)* %out		store volatile float %fma1, float addrspace(1)* %out
ret void		ret void
}		}

; GCN-LABEL: {{^}}test_s0_s1_k_f32:		; GCN-LABEL: {{^}}test_s0_s1_k_f32:
; SI-DAG: s_load_dwordx2 s{{\[}}[[SGPR0:[0-9]+]]:[[SGPR1:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xb		; SI-DAG: s_load_dwordx2 s{{\[}}[[SGPR0:[0-9]+]]:[[SGPR1:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0xb
; VI-DAG: s_load_dwordx2 s{{\[}}[[SGPR0:[0-9]+]]:[[SGPR1:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x2c		; VI-DAG: s_load_dwordx2 s{{\[}}[[SGPR0:[0-9]+]]:[[SGPR1:[0-9]+]]{{\]}}, s{{\[[0-9]+:[0-9]+\]}}, 0x2c
; GCN-DAG: s_mov_b32 [[SK0:s[0-9]+]], 0x44800000		; GCN-DAG: v_mov_b32_e32 [[VK0:v[0-9]+]], 0x44800000
; GCN-DAG: v_mov_b32_e32 [[VS1:v[0-9]+]], s[[SGPR1]]		; GCN-DAG: v_mov_b32_e32 [[VS1:v[0-9]+]], s[[SGPR1]]
; GCN-DAG: v_mov_b32_e32 [[VS0:v[0-9]+]], s[[SGPR0]]

; GCN-DAG: v_fma_f32 [[RESULT0:v[0-9]+]], [[VS0]], [[VS1]], [[SK0]]		; GCN-DAG: v_fma_f32 [[RESULT0:v[0-9]+]], s[[SGPR0]], [[VS1]], [[VK0]]
; GCN-DAG: s_mov_b32 [[SK1:s[0-9]+]], 0x45800000		; GCN-DAG: v_mov_b32_e32 [[VK1:v[0-9]+]], 0x45800000
; GCN-DAG: v_fma_f32 [[RESULT1:v[0-9]+]], [[VS0]], [[VS1]], [[SK1]]		; GCN-DAG: v_fma_f32 [[RESULT1:v[0-9]+]], s[[SGPR0]], [[VS1]], [[VK1]]

; GCN: buffer_store_dword [[RESULT0]]		; GCN: buffer_store_dword [[RESULT0]]
; GCN: buffer_store_dword [[RESULT1]]		; GCN: buffer_store_dword [[RESULT1]]
define amdgpu_kernel void @test_s0_s1_k_f32(float addrspace(1)* %out, float %a, float %b) #0 {		define amdgpu_kernel void @test_s0_s1_k_f32(float addrspace(1)* %out, float %a, float %b) #0 {
%fma0 = call float @llvm.fma.f32(float %a, float %b, float 1024.0) #1		%fma0 = call float @llvm.fma.f32(float %a, float %b, float 1024.0) #1
%fma1 = call float @llvm.fma.f32(float %a, float %b, float 4096.0) #1		%fma1 = call float @llvm.fma.f32(float %a, float %b, float 4096.0) #1
store volatile float %fma0, float addrspace(1)* %out		store volatile float %fma0, float addrspace(1)* %out
store volatile float %fma1, float addrspace(1)* %out		store volatile float %fma1, float addrspace(1)* %out
(30 lines not shown)

test/CodeGen/AMDGPU/valu-i1.ll

	(159 lines not shown)
	; SI-NEXT: ; mask branch			; SI-NEXT: ; mask branch
	; SI-NEXT: s_cbranch_execz [[LABEL_EXIT:BB[0-9]+_[0-9]+]]			; SI-NEXT: s_cbranch_execz [[LABEL_EXIT:BB[0-9]+_[0-9]+]]

	; SI: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, 0{{$}}			; SI: s_mov_b64 {{s\[[0-9]+:[0-9]+\]}}, 0{{$}}

	; SI: [[LABEL_LOOP:BB[0-9]+_[0-9]+]]:			; SI: [[LABEL_LOOP:BB[0-9]+_[0-9]+]]:
	; SI: buffer_load_dword			; SI: buffer_load_dword
	; SI-DAG: buffer_store_dword			; SI-DAG: buffer_store_dword
	; SI-DAG: v_cmp_eq_u32_e32 vcc, 0x100			; SI-DAG: s_cmpk_eq_i32 s{{[0-9+]}}, 0x100
	; SI: s_cbranch_vccz [[LABEL_LOOP]]			; SI: s_cbranch_scc0 [[LABEL_LOOP]]
	; SI: [[LABEL_EXIT]]:			; SI: [[LABEL_EXIT]]:
	; SI: s_endpgm			; SI: s_endpgm

	define amdgpu_kernel void @simple_test_v_loop(i32 addrspace(1)* %dst, i32 addrspace(1)* %src) #1 {			define amdgpu_kernel void @simple_test_v_loop(i32 addrspace(1)* %dst, i32 addrspace(1)* %src) #1 {
	entry:			entry:
	%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone			%tid = call i32 @llvm.amdgcn.workitem.id.x() nounwind readnone
	%is.0 = icmp ne i32 %tid, 0			%is.0 = icmp ne i32 %tid, 0
	%limit = add i32 %tid, 64			%limit = add i32 %tid, 64
	(31 lines not shown)
	; Clear exec bits for workitems that load -1s			; Clear exec bits for workitems that load -1s
	; SI: [[LABEL_LOOP:BB[0-9]+_[0-9]+]]:			; SI: [[LABEL_LOOP:BB[0-9]+_[0-9]+]]:
	; SI: buffer_load_dword [[B:v[0-9]+]]			; SI: buffer_load_dword [[B:v[0-9]+]]
	; SI: buffer_load_dword [[A:v[0-9]+]]			; SI: buffer_load_dword [[A:v[0-9]+]]
	; SI-DAG: v_cmp_ne_u32_e64 [[NEG1_CHECK_0:s\[[0-9]+:[0-9]+\]]], -1, [[A]]			; SI-DAG: v_cmp_ne_u32_e64 [[NEG1_CHECK_0:s\[[0-9]+:[0-9]+\]]], -1, [[A]]
	; SI-DAG: v_cmp_ne_u32_e32 [[NEG1_CHECK_1:vcc]], -1, [[B]]			; SI-DAG: v_cmp_ne_u32_e32 [[NEG1_CHECK_1:vcc]], -1, [[B]]
	; SI: s_and_b64 [[ORNEG1:s\[[0-9]+:[0-9]+\]]], [[NEG1_CHECK_1]], [[NEG1_CHECK_0]]			; SI: s_and_b64 [[ORNEG1:s\[[0-9]+:[0-9]+\]]], [[NEG1_CHECK_1]], [[NEG1_CHECK_0]]
	; SI: s_and_saveexec_b64 [[ORNEG2:s\[[0-9]+:[0-9]+\]]], [[ORNEG1]]			; SI: s_and_saveexec_b64 [[ORNEG2:s\[[0-9]+:[0-9]+\]]], [[ORNEG1]]
	; SI: s_cbranch_execz [[LABEL_FLOW:BB[0-9]+_[0-9]+]]			; SI: ; mask branch [[LABEL_FLOW:BB[0-9]+_[0-9]+]]

	; SI: BB{{[0-9]+_[0-9]+}}: ; %bb20			; SI: BB{{[0-9]+_[0-9]+}}: ; %bb20
	; SI: buffer_store_dword			; SI: buffer_store_dword

	; SI: [[LABEL_FLOW]]:			; SI: [[LABEL_FLOW]]:
	; SI-NEXT: ; in Loop: Header=[[LABEL_LOOP]]			; SI-NEXT: ; in Loop: Header=[[LABEL_LOOP]]
	; SI-NEXT: s_or_b64 exec, exec, [[ORNEG2]]			; SI-NEXT: s_or_b64 exec, exec, [[ORNEG2]]
	; SI-NEXT: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]],			; SI-NEXT: s_and_b64 [[TMP1:s\[[0-9]+:[0-9]+\]]],
	(45 lines not shown)

test/CodeGen/AMDGPU/vgpr-spill-emergency-stack-slot-compute.ll

				; XFAIL: *
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=SIMESA %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=tahiti -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=SIMESA %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=VIMESA %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=fiji -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=VIMESA %s
	; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=GFX9MESA %s			; RUN: llc -march=amdgcn -mtriple=amdgcn-- -mcpu=gfx900 -mattr=-flat-for-global -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=GCNMESA -check-prefix=GFX9MESA %s
	; RUN: llc -march=amdgcn -mcpu=hawaii -mtriple=amdgcn-unknown-amdhsa -mattr=-code-object-v3 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=CIHSA -check-prefix=HSA %s			; RUN: llc -march=amdgcn -mcpu=hawaii -mtriple=amdgcn-unknown-amdhsa -mattr=-code-object-v3 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=CIHSA -check-prefix=HSA %s
	; RUN: llc -march=amdgcn -mcpu=fiji -mtriple=amdgcn-unknown-amdhsa -mattr=-code-object-v3 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VIHSA -check-prefix=HSA %s			; RUN: llc -march=amdgcn -mcpu=fiji -mtriple=amdgcn-unknown-amdhsa -mattr=-code-object-v3 -verify-machineinstrs < %s | FileCheck -check-prefix=GCN -check-prefix=VIHSA -check-prefix=HSA %s

	; This ends up using all 256 registers and requires register			; This ends up using all 256 registers and requires register
	; scavenging which will fail to find an unsued register.			; scavenging which will fail to find an unsued register.
	(607 lines not shown)

Diff 200995
