Implements OptimizationRemarkEmitter (ORE) in AtomicExpandPass to report atomics that generate a compare-and-swap (CAS) loop.
Details
- Reviewers
  arsenm, yaxunl, rampitec, b-sumner, t-tye, nikic
- Commits
  - rGf22ba5187350: [Remarks] Emit optimization remarks for atomics generating CAS loop
  - rG435785214f73: [Remarks] Emit optimization remarks for atomics generating CAS loop
  - rGc4e5425aa579: [Remarks] Emit optimization remarks for atomics generating CAS loop
Diff Detail
- Repository
  rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp, line 12120: Do not pass it there. Turn reportAtomicExpand into a lambda.
llvm/lib/CodeGen/AtomicExpandPass.cpp, lines 631-637: This is an AMDGPU-specific message/restriction. The floating-point operation isn't relevant, and you don't even know the specific reason at this point.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 633: Still the same problem.
- eliminated unsafe hardware remarks in SIISelLowering.cpp
- updated the CAS loop remark and corresponding tests
Most of this patch is not needed now. You do not need to pass ORE to targets; that is part of the next patch.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Need tests for all scopes.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 585: I do not see why you need this function and all its arguments now. You can just call ORE->emit() directly.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: That does not help with target-defined scope names, such as our "one-as", for example.
llvm/test/CodeGen/AMDGPU/fp-atomics-remarks-gfx90a.ll, line 5: You need to write tests for all scopes.
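The line-585 suggestion (calling ORE->emit() directly rather than going through a reporting helper) might look roughly like the following sketch; ORE, AI, and the message text are placeholders for the emitter, the atomicrmw instruction being expanded, and whatever wording the patch settles on, and DEBUG_TYPE is the pass's usual debug-name macro.

```cpp
// Minimal sketch, not the patch's literal code: emit the remark inline.
// Requires #include "llvm/Analysis/OptimizationRemarkEmitter.h".
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic operation";
});
```

The lambda keeps the remark construction lazy, so nothing is built unless remarks are actually enabled for this pass.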
You also need to retitle it now; it is not about AMDGPU and not about FP.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 634: Need to name the operation.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: How can I get target-defined scope names?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: It is right on the instruction: %result = atomicrmw fadd float addrspace(1)* %ptr, float 4.0 syncscope("one-as") seq_cst
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: Sorry, I meant from the LLVM API.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: LLVMContext::getSyncScopeNames()
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 636: I think that gives me all sync scopes available for the target. If not, which sync scope in the vector corresponds to the instruction I am dealing with?
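In the LLVM API, the vector that LLVMContext::getSyncScopeNames() fills is indexed by SyncScope::ID, so the instruction's own ID selects its name, and the entry for the default scope is the empty string. A sketch under that assumption (AI again stands for the atomicrmw being expanded; mapping the empty name to "system" is a presentation choice, not something the API mandates):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

// Sketch: recover the textual scope name for an atomic instruction.
static StringRef getMemScopeName(const AtomicRMWInst *AI,
                                 SmallVectorImpl<StringRef> &SSNs) {
  AI->getContext().getSyncScopeNames(SSNs); // index == SyncScope::ID
  StringRef Name = SSNs[AI->getSyncScopeID()];
  return Name.empty() ? "system" : Name;    // default scope has an empty name
}
```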
Reviewer requested changes to this revision.
- added memory scope tests and updated remarks and tests accordingly
- still working on clang/test/CodeGenCUDA/fp-atomics-optremarks.cu and clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: __atomic_fetch_add does not take a scope as an argument; how could I add tests with different scopes?
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 25 (on Diff #366112): For some reason, remarks are not emitted here. The RUN command right above looks correct...
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: At least in the IR test.
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: You need to test all of that. If you cannot write a proper .cu test, then write an IR test and run llc.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 631: I thought you wanted to cache it. But really, just declare it here.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: The operation to report is AI->getOperation(). The spacing is wrong; "operation" is your text.
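If naming the operation this way is acceptable, AtomicRMWInst::getOperationName() turns AI->getOperation() into a printable string; a sketch that also reuses the scope-name lookup from the earlier sketch (MemScopeName is assumed to hold that result, and the message wording is a placeholder):

```cpp
// Sketch: name the concrete RMW operation (fadd, xchg, umax, ...) and the
// memory scope in the remark instead of the generic word "operation".
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic "
         << AtomicRMWInst::getOperationName(AI->getOperation())
         << " operation at memory scope " << MemScopeName;
});
```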
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Should I discard this test then, since the test fp-atomics-remarks-gfx90a.ll already covers it?
- corrected the remarks by naming the actual operation and updated tests accordingly
- code format
clang/test/CodeGenCUDA/fp-atomics-optremarks.cu, line 11: Alright, I am not sure how I can test for the other scopes, though.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 600: https://llvm.org/docs/Remarks.html. Since this is an informative remark and not a report that the pass failed to optimize, the "Passed" argument is used. I will move it down; I thought it might be useful in the future for other operations. It's better below for now anyway.
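One detail worth noting about the remarks API: the passed/missed/analysis classification comes from the diagnostic class being constructed, while the "Passed" string above is only the remark's name as recorded in the serialized output. A small sketch of that distinction, with the same placeholder names as before:

```cpp
// Sketch: the class, not the name string, decides how the remark is filtered.
//   OptimizationRemark         -> shown with -pass-remarks=<pass regex>
//   OptimizationRemarkMissed   -> shown with -pass-remarks-missed=<pass regex>
//   OptimizationRemarkAnalysis -> shown with -pass-remarks-analysis=<pass regex>
ORE->emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI) // remark name: "Passed"
         << "A compare and swap loop was generated for an atomic operation";
});
```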
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 600: Actually, I am getting a runtime error at the line where I declare Remark when I bring it down.
- added clang/test/CodeGenCUDA/fp-atomics-optremarks.cu back
- moved Remark declaration into the else block
Please retitle it without AMDGPU and remove the changes that pass ORE to targets. They are not part of this change; they belong to the follow-up target-specific change.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: Missing the word "atomic"?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: It's already part of the OperationName.
Please restore the OpenCL test.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 637: Matt is right; the word "atomic" is missing.
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 32 (on Diff #366343): It is not system scope as the test name suggests. Just rename it to atomic_cas and add calls with all the other scopes to the same function.
- added more tests in clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl for various scopes; memory_scope_work_item is rejected as invalid by the compiler, so it is excluded
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl, line 33 (on Diff #366349): Just combine all the calls into a single function.
Compile-time regressions, especially for -O0 -g, are higher than expected with this patch.
- changed the type of ORE from OptimizationRemarkEmitter * to std::shared_ptr<OptimizationRemarkEmitter> and constructed it within AtomicExpandPass; this is meant to address the regressions in many backends caused by the prerequisite analysis passes
I don't think constructing it in the pass is the solution. Why exactly is this introducing such a big slowdown?
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: There's basically never a reason to use shared_ptr over unique_ptr.
I will actually revert my changes, keeping only the test updates, to see whether the times are reasonable.
Also, @nikic suggested constructing ORE here if we cannot usefully preserve the analyses. I am not sure whether preserving that information is useful, though.
@xbolva00: I timed the X86/opt-pipeline.ll passes, and DTC (Dominator Tree Construction) executed in 0.1% of the total compile time.
The reason is the additional DominatorTree and LoopInfo calculations. These have a big impact at O0. These analysis calculations are caused by a deficiency in the legacy pass manager, which is still used for the codegen pipeline: even though these analyses are only needed if diagnostic hotness is used, the pass manager requires them to be scheduled unconditionally.
The way to avoid this is to construct ORE without going through the pass manager. Grep for "OptimizationRemarkEmitter ORE" to find the various passes that already do this (though the reason is different in some cases -- there are some analysis preservation concerns with loop passes).
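A sketch of that shape for the legacy AtomicExpand pass, assuming the ORE requirement is dropped from getAnalysisUsage() and the emitter is built from the function already being processed (the function name below is illustrative, not the patch's actual code):

```cpp
#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Sketch: with no AU.addRequired<OptimizationRemarkEmitterWrapperPass>(), the
// legacy pass manager never schedules DominatorTree/LoopInfo just for remarks;
// the emitter is constructed directly from the Function instead.
static bool runOnFunctionSketch(Function &F) {
  OptimizationRemarkEmitter ORE(&F); // built here, not via getAnalysis()
  // ... atomic expansion work; emit remarks through ORE where needed ...
  return false;
}
```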
Okay, sorry about that. Thanks for reverting my commit. I will use a unique_ptr and wait for another approval.
- changed the type of ORE from OptimizationRemarkEmitter * to std::unique_ptr<OptimizationRemarkEmitter>, constructing it in AtomicExpand to avoid the DTC and LI overhead
You also need to drop the getAnalysisUsage() change; otherwise the pass manager will still create the ORE itself.
PS: This revision still has incorrectly configured permissions, which prevents me from leaving line comments. Revisions on this Phabricator instance should always be publicly writable.
- fixed breaking tests by eliminating passes that are no longer in the pass pipelines
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: Is there a reason to construct it upfront and not just use a local variable only when needed? Like in StackProtector.cpp, for example.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: We can certainly implement it as a local variable as long as we have access to the function this pass is operating on. I was thinking of its potential use throughout this pass in the future.
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: You have access to the function: AI->getParent()->getParent().
llvm/lib/CodeGen/AtomicExpandPass.cpp, line 182: Sounds like a plan!
- ORE does not need to be a pointer anymore; with this patch it is constructed as a local variable, as requested by the reviewer
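Putting those pieces together, the local-variable form described in the update above could look roughly like this sketch (placeholder names again; the exact message and variable names are whatever the patch uses):

```cpp
// Sketch: construct the emitter only where the CAS-loop remark is emitted,
// recovering the Function from the atomic instruction itself.
Function *F = AI->getParent()->getParent();
OptimizationRemarkEmitter ORE(F);
ORE.emit([&]() {
  return OptimizationRemark(DEBUG_TYPE, "Passed", AI)
         << "A compare and swap loop was generated for an atomic operation";
});
```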
Need tests for all scopes.