Implements OptimizationRemarkEmitter (ORE) in AtomicExpandPass to report atomics that generate a compare-and-swap (CAS) loop.
Details
- Reviewers
  arsenm, yaxunl, rampitec, b-sumner, t-tye, nikic
- Commits
  - rGf22ba5187350: [Remarks] Emit optimization remarks for atomics generating CAS loop
  - rG435785214f73: [Remarks] Emit optimization remarks for atomics generating CAS loop
  - rGc4e5425aa579: [Remarks] Emit optimization remarks for atomics generating CAS loop
Diff Detail
- Repository
  rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 12120: Do not pass it there. Turn reportAtomicExpand into a lambda.
llvm/lib/CodeGen/AtomicExpandPass.cpp, lines 631-637: This is an AMDGPU-specific message/restriction. The floating-point operation isn't relevant, and you don't even know the specific reason at this point.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 633: Still the same problem.
- eliminated unsafe hardware remarks in SIISelLowering.cpp
- updated the CAS loop remark and corresponding tests
Most of this patch is not needed now. You do not need to pass ORE to targets; that is part of the next patch.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Need tests for all scopes.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 585: I do not see why you need this function and all its arguments now. You can just call ORE->emit() directly.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: That does not help with target-defined scope names, such as our "one-as", for example.
llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll, line 5: You need to write tests for all scopes.
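The line-585 suggestion (calling ORE->emit() directly rather than going through a reporting helper) might look roughly like the following sketch; ORE, AI, and the message text are placeholders for the emitter, the atomicrmw instruction being expanded, and whatever wording the patch settles on, and DEBUG_TYPE is the pass's usual debug-name macro.

```cpp
// Minimal sketch, not the patch's literal code: emit the remark inline.
// Requires #include "llvm/Analysis/OptimizationRemarkEmitter.h".
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic operation";
});
```

The lambda keeps the remark construction lazy, so nothing is built unless remarks are actually enabled for this pass.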
You also need to retitle it now; it is not about AMDGPU and not about FP.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 634: Need to name the operation.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: How can I get target-defined scope names?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: It is right on the instruction: %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: Sorry, I meant from the LLVM API.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: LLVMContext::getSyncScopeNames()
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: I think that gives me all sync scopes available for the target. If not, which sync scope in the vector corresponds to the instruction I am dealing with?
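In the LLVM API, the vector that LLVMContext::getSyncScopeNames() fills is indexed by SyncScope::ID, so the instruction's own ID selects its name, and the entry for the default scope is the empty string. A sketch under that assumption (AI again stands for the atomicrmw being expanded; mapping the empty name to "system" is a presentation choice, not something the API mandates):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// Sketch: recover the textual scope name for an atomic instruction.
static StringRef getMemScopeName(const AtomicRMWInst *AI,
                                 SmallVectorImpl<StringRef> &SSNs) {
  AI->getContext().getSyncScopeNames(SSNs); // index == SyncScope::ID
  StringRef Name = SSNs[AI->getSyncScopeID()];
  return Name.empty() ? "system" : Name;    // default scope has an empty name
}
```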
Reviewer requested changes to this revision.
- added memory scope tests and updated remarks and tests accordingly
- still working on clang/test/CodeGenCUDA/fp-atomics-optremarks.cu and clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: __atomic_fetch_add does not take a scope as an argument; how could I add tests with different scopes?
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 25 (on Diff #366112): For some reason, remarks are not emitted here. The RUN command right above looks correct...
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: At least in the IR test.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: You need to test all of that. If you cannot write a proper .cu test, then write an IR test and run llc.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 631: I thought you wanted to cache it. But really, just declare it here.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: The operation to report is AI->getOperation(). The spacing is wrong; "operation" is your text.
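If naming the operation this way is acceptable, AtomicRMWInst::getOperationName() turns AI->getOperation() into a printable string; a sketch that also reuses the scope-name lookup from the earlier sketch (MemScopeName is assumed to hold that result, and the message wording is a placeholder):

```cpp
// Sketch: name the concrete RMW operation (fadd, xchg, umax, ...) and the
// memory scope in the remark instead of the generic word "operation".
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic "
         << AtomicRMWInst::getOperationName(AI->getOperation())
         << " operation at memory scope " << MemScopeName;
});
```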
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Should I discard this test then, since the test fp-atomics-remarks-gfx90a.ll already covers it?
- corrected the remarks by naming the actual operation and updated tests accordingly
- code format
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Alright, I am not sure how I can test for the other scopes, though.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 600: https://llvm.org/docs/Remarks.html. Since this is an informative remark and not a report that the pass failed to optimize, the "Passed" argument is used. I will move it down; I thought it might be useful in the future for other operations. It's better below for now anyway.
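One detail worth noting about the remarks API: the passed/missed/analysis classification comes from the diagnostic class being constructed, while the "Passed" string above is only the remark's name as recorded in the serialized output. A small sketch of that distinction, with the same placeholder names as before:

```cpp
// Sketch: the class, not the name string, decides how the remark is filtered.
//   OptimizationRemark         -> shown with -pass-remarks=<pass regex>
//   OptimizationRemarkMissed   -> shown with -pass-remarks-missed=<pass regex>
//   OptimizationRemarkAnalysis -> shown with -pass-remarks-analysis=<pass regex>
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI) // remark name: "Passed"
         << "A compare and swap loop was generated for an atomic operation";
});
```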
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 600: Actually, I am getting a runtime error at the line where I declare Remark when I bring it down.
- added clang/test/CodeGenCUDA/fp-atomics-optremarks.cu back
- moved Remark declaration into the else block
Please retitle it without AMDGPU and remove the changes that pass ORE to targets. They are not part of this change; they belong to the follow-up target-specific change.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: Missing the word "atomic"?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: It's already part of the OperationName.
Please restore the OpenCL test.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: Matt is right; the word "atomic" is missing.
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 32 (on Diff #366343): It is not system scope as the test name suggests. Just rename it to atomic_cas and add calls with all the other scopes to the same function.
- added more tests in clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl for various scopes; memory_scope_work_item is rejected as invalid by the compiler, so it is excluded
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 33 (on Diff #366349): Just combine all the calls into a single function.
Compile-time regressions, especially for -O0 -g, are higher than expected with this patch.
- changed the type of ORE from OptimizationRemarkEmitter * to std::shared_ptr<OptimizationRemarkEmitter> and constructed it within AtomicExpandPass; this is meant to address the regressions in many backends caused by the prerequisite analysis passes
I don't think constructing it in the pass is the solution. Why exactly is this introducing such a big slowdown?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: There's basically never a reason to use shared_ptr over unique_ptr.
I will actually revert my changes, keeping only the test updates, to see whether the times are reasonable.
Also, @nikic suggested constructing ORE here if we cannot usefully preserve the analyses. I am not sure whether preserving that information is useful, though.
@xbolva00: I timed the X86/opt-pipeline.ll passes, and DTC (Dominator Tree Construction) executed in 0.1% of the total compile time.
The reason is the additional DominatorTree and LoopInfo calculations. These have a big impact at O0. These analysis calculations are caused by a deficiency in the legacy pass manager, which is still used for the codegen pipeline: even though these analyses are only needed if diagnostic hotness is used, the pass manager requires them to be scheduled unconditionally.
The way to avoid this is to construct ORE without going through the pass manager. Grep for "OptimizationRemarkEmitter ORE" to find the various passes that already do this (though the reason is different in some cases -- there are some analysis preservation concerns with loop passes).
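A sketch of that shape for the legacy AtomicExpand pass, assuming the ORE requirement is dropped from getAnalysisUsage() and the emitter is built from the function already being processed (the function name below is illustrative, not the patch's actual code):

```cpp
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch: with no AU.addRequired<OptimizationRemarkEmitterWrapperPass>(), the
// legacy pass manager never schedules DominatorTree/LoopInfo just for remarks;
// the emitter is constructed directly from the Function instead.
static bool runOnFunctionSketch(Function &F) {
  OptimizationRemarkEmitter ORE(&F); // built here, not via getAnalysis()
  // ... atomic expansion work; emit remarks through ORE where needed ...
  return false;
}
```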
Okay, sorry about that. Thanks for reverting my commit. I will use a unique_ptr and wait for another approval.
- changed the type of ORE from OptimizationRemarkEmitter * to std::unique_ptr<OptimizationRemarkEmitter>, constructing it in AtomicExpand to avoid the DTC and LI overhead
You also need to drop the getAnalysisUsage() change; otherwise the pass manager will still create the ORE itself.
PS: This revision still has incorrectly configured permissions, which prevents me from leaving line comments. Revisions on this Phabricator instance should always be publicly writable.
- fixed breaking tests by eliminating passes that are no longer in the pass pipelines
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: Is there a reason to construct it upfront and not just use a local variable only when needed? Like in StackProtector.cpp, for example.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: We can certainly implement it as a local variable as long as we have access to the function this pass is operating on. I was thinking of its potential use throughout this pass in the future.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: You have access to the function: AI->getParent()->getParent().
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: Sounds like a plan!
- ORE does not need to be a pointer anymore; with this patch it is constructed as a local variable, as requested by the reviewer
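Putting those pieces together, the local-variable form described in the update above could look roughly like this sketch (placeholder names again; the exact message and variable names are whatever the patch uses):

```cpp
// Sketch: construct the emitter only where the CAS-loop remark is emitted,
// recovering the Function from the atomic instruction itself.
Function *F = AI->getParent()->getParent();
OptimizationRemarkEmitter ORE(F);
ORE.emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic operation";
});
```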
Need tests for all scopes.