This is an archive of the discontinued LLVM Phabricator instance.

[Remarks] Emit optimization remarks for atomics generating CAS loop
ClosedPublic

Authored by gandhi21299 on Jul 27 2021, 9:51 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rampitec added inline comments.Aug 6 2021, 2:19 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
12121

Do not pass it there. Turn reportAtomicExpand into lambda.

gandhi21299 marked an inline comment as done.
  • turned reportAtomicExpand(...) into a lambda as requested

refreshing patch

  • emit CAS loop remark only if the instruction is of floating point type
arsenm added inline comments.Aug 9 2021, 10:57 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
631–637

This is an AMDGPU specific message/restriction. The floating point operation isn't relevant, and you don't even know the specific reason at this point.

  • floating point check is not required for cas loop check, getting rid of it
gandhi21299 marked an inline comment as done.Aug 9 2021, 3:06 PM
  • restricting remarks emission on AMDGPU targets only
rampitec added inline comments.Aug 10 2021, 10:46 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
633

Still the same problem.

gandhi21299 marked an inline comment as done.

requested change

@rampitec besides the remarks, am I missing anything else in the patch?

@rampitec besides the remarks, am I missing anything else in the patch?

You should not use AMD specific code in the common code.

  • reverted the patch to the previous one
  • eliminated unsafe hardware remarks in SIISelLowering.cpp
  • updated cas loop remark and corresponding tests

removed AMDGPU check

  • eliminated unsafe hardware remarks in SIISelLowering.cpp

Most of this patch is not needed now. You do not need to pass ORE to targets, it is a part of the next patch.

clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

Need tests for all scopes.

llvm/lib/CodeGen/AtomicExpandPass.cpp
585

I do not see why do you need this function and all its arguments now. You can just call ORE->emit() directly.

636

That does not help with target defined scope names, such as our "one-as" for example.

llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll
4 ↗(On Diff #365859)

You need to write tests for all scopes.

You also need to retitle it now, it is not about AMDGPU and not about FP.

llvm/lib/CodeGen/AtomicExpandPass.cpp
634

Need to name the operation.

gandhi21299 retitled this revision from [AMDGPU] [Remarks] Emit optimization remarks for FP atomics to [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop.Aug 12 2021, 8:02 AM
gandhi21299 edited the summary of this revision. (Show Details)
gandhi21299 marked 2 inline comments as done.Aug 12 2021, 9:01 AM
gandhi21299 added inline comments.
llvm/lib/CodeGen/AtomicExpandPass.cpp
636

How can I get target defined scope names?

rampitec added inline comments.Aug 12 2021, 9:04 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
636

It is right on the instruction:

%result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
gandhi21299 added inline comments.Aug 12 2021, 9:09 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
636

Sorry, I meant from the LLVM API.

rampitec added inline comments.Aug 12 2021, 9:25 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
636

LLVMContext::getSyncScopeNames()

gandhi21299 added inline comments.Aug 12 2021, 11:27 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
636

I think that gives me all sync scopes available for the target. If not, which sync scope in the vector corresponds to the instruction I am dealing with?

gandhi21299 marked 4 inline comments as done.Aug 12 2021, 1:01 PM
gandhi21299 marked an inline comment as done.

requested changes from reviewer

  • added memory scope tests and updated remarks and tests accordingly
  • still working on clang/test/CodeGenCUDA/fp-atomics-optremarks.cu and clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
rampitec added inline comments.Aug 12 2021, 2:57 PM
llvm/lib/CodeGen/AtomicExpandPass.cpp
631

Only if SSNs.empty().

637

I believe getOpcodeName() will return "atomicrmw" instead of the operation. Also missing space after it.

gandhi21299 added inline comments.Aug 12 2021, 2:58 PM
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

__atomic_fetch_add does not take scope as an argument, how could I add tests with different scopes?

clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
25 ↗(On Diff #366112)

For some reason, remarks are not emitted here. The command to run looks right above...

rampitec added inline comments.Aug 12 2021, 3:00 PM
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

At least in the IR test.

gandhi21299 added inline comments.Aug 12 2021, 3:01 PM
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

What do you mean by that?

llvm/lib/CodeGen/AtomicExpandPass.cpp
631

Sorry, what do you mean? SSN will be empty at that point.

637

getOpcodeName() returns atomicrmwoperation, as per the tests the spacing looks correct to me.

rampitec added inline comments.Aug 12 2021, 3:07 PM
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

You need to test all of that. If you cannot write a proper .cu test, then write an IR test and run llc.

llvm/lib/CodeGen/AtomicExpandPass.cpp
631

I thought want to cache it. But really just declare it here.

637

The operation to report is AI->getOperation(). Spacing is wrong, "operation" is your text.

gandhi21299 marked 4 inline comments as done.Aug 12 2021, 3:17 PM
gandhi21299 added inline comments.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

Should I discard this test then since the test fp-atomics-remarks-gfx90a.ll already satisfies?

  • corrected remarks by replacing the operation name and updated tests accordingly
  • code format
  • corrected atomics-remarks-gfx90a.cl test to emit remark as well
rampitec added inline comments.Aug 12 2021, 3:29 PM
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

CU test is still needed. You also need it in the .cl test below.

llvm/lib/CodeGen/AtomicExpandPass.cpp
600

What should this "Passed" do and why wouldn't just declare it where you use it?

gandhi21299 marked 3 inline comments as done.

no way to pass memory_scope in __atomic_fetch_add(...), discarded the test.

gandhi21299 marked an inline comment as done.Aug 12 2021, 3:35 PM
gandhi21299 added inline comments.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu
10 ↗(On Diff #365859)

Alright, I am not sure how I can test for the other scopes though.

llvm/lib/CodeGen/AtomicExpandPass.cpp
600

https://llvm.org/docs/Remarks.html

Since this is an informative pass and not that pass failed to optimize, the "Passed" argument is used. I will move it downwards, I thought it might be useful in the future for other operations. Its better below for now anyways.

gandhi21299 marked an inline comment as done.Aug 12 2021, 3:48 PM
gandhi21299 added inline comments.
llvm/lib/CodeGen/AtomicExpandPass.cpp
600

Actually I am getting a runtime error at the line where I declare Remark when I bring it down.

  • added clang/test/CodeGenCUDA/fp-atomics-optremarks.cu back
  • moved Remark declaration into the else block
gandhi21299 marked an inline comment as done.Aug 13 2021, 8:30 AM
  • rebased against main branch
  • cleaned up code

Please retitle it without AMDGPU and remove the changes to pass ORE to targets. It is not a part of this change, it is a part of the folloup target specific change.

gandhi21299 retitled this revision from [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop to [Remarks] Emit optimization remarks for atomics generating CAS loop.Aug 13 2021, 10:00 AM
arsenm added inline comments.Aug 13 2021, 10:20 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
637

Missing word atomic?

gandhi21299 added inline comments.Aug 13 2021, 10:27 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
637

Its already part of the OperationName

  • eliminated irrelevant changes to this patch

Please restore opencl test.

llvm/lib/CodeGen/AtomicExpandPass.cpp
637

Matt is right, missing "atomic" word.

  • replaced the OpenCL test
  • renamed filenames
  • added 'atomic' to the remark and tests
gandhi21299 marked 2 inline comments as done.Aug 13 2021, 1:04 PM
  • removed StringExtras.h inclusion
rampitec added inline comments.Aug 13 2021, 1:46 PM
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
32 ↗(On Diff #366343)

It is not system as test name suggests. Just rename to atomic_cas and add calls with all other scopes into the same function.

gandhi21299 marked an inline comment as done.
  • adding more tests in clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl for various scopes, memory_scope_work_item is called out as invalid by the compiler so excluded that
rampitec added inline comments.Aug 13 2021, 2:29 PM
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
33 ↗(On Diff #366349)

Just combine all the calls into a single function.

gandhi21299 marked an inline comment as done.
  • combined all tests into one
This revision is now accepted and ready to land.Aug 13 2021, 2:37 PM

Thanks a lot for the review! I will merge this patch in as soon as the CI passes.

This revision was landed with ongoing or failed builds.Aug 13 2021, 9:44 PM
This revision was automatically updated to reflect the committed changes.
gandhi21299 reopened this revision.Aug 13 2021, 11:01 PM
This revision is now accepted and ready to land.Aug 13 2021, 11:01 PM
  • changed type of ORE from OptimizationRemarkEmitter* to std::shared_ptr<OptimizationRemarkEmitter> and construct it within AtomicExpandPass, this solution is implemented to address for the regressions in many backends due to prerequisite passes
  • fixed breaking tests

I don’t think constructing in the pass is the solution. Why exactly is this introducing such a big slowdown?

llvm/lib/CodeGen/AtomicExpandPass.cpp
182

There’s basically never a reason to use shared_ptr over unique_ptr

gandhi21299 marked an inline comment as done.Aug 14 2021, 4:15 PM

@xbolva00 is concerned about Dominator Tree Construction

I will actually revert my changes back with only the tests updated to see if the times are reasonable.

Also, @nikic suggested to construct ORE here if we cannot usefully preserve them. I am not sure if preserving the information is useful though.

reverting changes back to declaring ORE using getAnalysis

@xbolva00 I timed X86/opt-pipeline.ll passes and DTC executed in 0.1% of the total compile time.

sqlite3 +1.3% increase of compile time for -O0 -g is simply not acceptable.

nikic added a comment.Aug 15 2021, 6:37 AM

I don’t think constructing in the pass is the solution. Why exactly is this introducing such a big slowdown?

The reason are the additional DominatorTree and LoopInfo calculations. These have a big impact at O0. These analysis calculations are caused by a deficiency in the legacy pass manager, which is still used for the codegen pipeline: Even though these analyses are only needed if diagnostic hotness is used, the pass manager requires them to be scheduled unconditionally.

The way to avoid this is to construct ORE without going through the pass manager. Grep for OptimizationRemarkEmitter ORE for various passes that already do this (though the reason is different in some cases -- there are some analysis preservation concerns with loop passes).

Okay, sorry about that. Thanks for reverting my commit. I will use a unique_ptr and wait for another approval.

reverting type of ORE from OptimizationRemarkEmitter * back to std::unique_ptr<OptimizationRemarkEmitter> and constructing it in AtomicExpand to avoid DTC and LI overhead.

nikic added a comment.Aug 15 2021, 9:19 AM

You also need to drop the getAnalysisUsage() change, otherwise the pass manager will still create the ORE itself.

PS: This revision still has incorrectly configured permissions, which prevents me from leaving line comments. Revisions on this Phabricator instance should always be publicly writable.

gandhi21299 added a reviewer: nikic.
  • removing analysis requirement as requested

+ nikic

  • fixing breaking tests by eliminating passes that are no longer in the pass pipelines
gandhi21299 reopened this revision.Aug 15 2021, 11:36 AM
This revision is now accepted and ready to land.Aug 15 2021, 11:36 AM

Please drop a change in PowerPC/O3-pipeline.ll

  • eliminated changes in PowerPC/O3-pipeline.ll, as requested

Any ideas on what could be causing the failure in windows pre-merge checks?

Not related to your patch, feel free to ignore

Alright, please let me know if this patch is good for merge at your convenience.

rampitec added inline comments.Aug 16 2021, 10:23 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
182

Is there a reason to construct it upfront and not just use a local variable only when needed? Like in StackProtector.cpp for example.

gandhi21299 added inline comments.Aug 16 2021, 10:30 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
182

We can certainly implement it as a local variable as long as we have access to the function this pass is operating on. I was thinking of its potential use throughout this pass in the future.

rampitec added inline comments.Aug 16 2021, 11:10 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
182

You have access to the function, AI->getParent()->getParent().
You also will not need to pass ORE everywhere in the subsequent patch, just construct it in target in place.

gandhi21299 added inline comments.Aug 16 2021, 11:12 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
182

Sounds like a plan!

  • ORE does not need to be a pointer anymore, it is constructed as local variable with this patch as requested by reviewer.
gandhi21299 marked 2 inline comments as done.Aug 16 2021, 11:24 AM
rampitec accepted this revision.Aug 16 2021, 11:28 AM

LGTM, but please wait for others too.

Will do, thanks!

arsenm accepted this revision.Aug 16 2021, 1:15 PM

LGTM