Page MenuHomePhabricator

[Remarks] Emit optimization remarks for atomics generating CAS loop
ClosedPublic

Authored by gandhi21299 on Jul 27 2021, 9:51 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Please retitle it without AMDGPU and remove the changes to pass ORE to targets. It is not a part of this change, it is a part of the folloup target specific change.

gandhi21299 retitled this revision from [AMDGPU] [Remarks] Emit optimization remarks for atomics generating CAS loop to [Remarks] Emit optimization remarks for atomics generating CAS loop.Aug 13 2021, 10:00 AM
arsenm added inline comments.Aug 13 2021, 10:20 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
613

Missing word atomic?

gandhi21299 added inline comments.Aug 13 2021, 10:27 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
613

Its already part of the OperationName

  • eliminated irrelevant changes to this patch

Please restore opencl test.

llvm/lib/CodeGen/AtomicExpandPass.cpp
613

Matt is right, missing "atomic" word.

  • replaced the OpenCL test
  • renamed filenames
  • added 'atomic' to the remark and tests
gandhi21299 marked 2 inline comments as done.Aug 13 2021, 1:04 PM
  • removed StringExtras.h inclusion
rampitec added inline comments.Aug 13 2021, 1:46 PM
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
33

It is not system as test name suggests. Just rename to atomic_cas and add calls with all other scopes into the same function.

gandhi21299 marked an inline comment as done.
  • adding more tests in clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl for various scopes, memory_scope_work_item is called out as invalid by the compiler so excluded that
rampitec added inline comments.Aug 13 2021, 2:29 PM
clang/test/CodeGenOpenCL/atomics-remarks-gfx90a.cl
34

Just combine all the calls into a single function.

gandhi21299 marked an inline comment as done.
  • combined all tests into one
This revision is now accepted and ready to land.Aug 13 2021, 2:37 PM

Thanks a lot for the review! I will merge this patch in as soon as the CI passes.

This revision was landed with ongoing or failed builds.Aug 13 2021, 9:44 PM
This revision was automatically updated to reflect the committed changes.
gandhi21299 reopened this revision.Aug 13 2021, 11:01 PM
This revision is now accepted and ready to land.Aug 13 2021, 11:01 PM
  • changed type of ORE from OptimizationRemarkEmitter* to std::shared_ptr<OptimizationRemarkEmitter> and construct it within AtomicExpandPass, this solution is implemented to address for the regressions in many backends due to prerequisite passes
  • fixed breaking tests

I don’t think constructing in the pass is the solution. Why exactly is this introducing such a big slowdown?

llvm/lib/CodeGen/AtomicExpandPass.cpp
174

There’s basically never a reason to use shared_ptr over unique_ptr

gandhi21299 marked an inline comment as done.Aug 14 2021, 4:15 PM

@xbolva00 is concerned about Dominator Tree Construction

I will actually revert my changes back with only the tests updated to see if the times are reasonable.

Also, @nikic suggested to construct ORE here if we cannot usefully preserve them. I am not sure if preserving the information is useful though.

reverting changes back to declaring ORE using getAnalysis

@xbolva00 I timed X86/opt-pipeline.ll passes and DTC executed in 0.1% of the total compile time.

sqlite3 +1.3% increase of compile time for -O0 -g is simply not acceptable.

nikic added a comment.Aug 15 2021, 6:37 AM

I don’t think constructing in the pass is the solution. Why exactly is this introducing such a big slowdown?

The reason are the additional DominatorTree and LoopInfo calculations. These have a big impact at O0. These analysis calculations are caused by a deficiency in the legacy pass manager, which is still used for the codegen pipeline: Even though these analyses are only needed if diagnostic hotness is used, the pass manager requires them to be scheduled unconditionally.

The way to avoid this is to construct ORE without going through the pass manager. Grep for OptimizationRemarkEmitter ORE for various passes that already do this (though the reason is different in some cases -- there are some analysis preservation concerns with loop passes).

Okay, sorry about that. Thanks for reverting my commit. I will use a unique_ptr and wait for another approval.

reverting type of ORE from OptimizationRemarkEmitter * back to std::unique_ptr<OptimizationRemarkEmitter> and constructing it in AtomicExpand to avoid DTC and LI overhead.

nikic added a comment.Aug 15 2021, 9:19 AM

You also need to drop the getAnalysisUsage() change, otherwise the pass manager will still create the ORE itself.

PS: This revision still has incorrectly configured permissions, which prevents me from leaving line comments. Revisions on this Phabricator instance should always be publicly writable.

gandhi21299 added a reviewer: nikic.
  • removing analysis requirement as requested

+ nikic

  • fixing breaking tests by eliminating passes that are no longer in the pass pipelines
gandhi21299 reopened this revision.Aug 15 2021, 11:36 AM
This revision is now accepted and ready to land.Aug 15 2021, 11:36 AM

Please drop a change in PowerPC/O3-pipeline.ll

  • eliminated changes in PowerPC/O3-pipeline.ll, as requested

Any ideas on what could be causing the failure in windows pre-merge checks?

Not related to your patch, feel free to ignore

Alright, please let me know if this patch is good for merge at your convenience.

rampitec added inline comments.Aug 16 2021, 10:23 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
174

Is there a reason to construct it upfront and not just use a local variable only when needed? Like in StackProtector.cpp for example.

gandhi21299 added inline comments.Aug 16 2021, 10:30 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
174

We can certainly implement it as a local variable as long as we have access to the function this pass is operating on. I was thinking of its potential use throughout this pass in the future.

rampitec added inline comments.Aug 16 2021, 11:10 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
174

You have access to the function, AI->getParent()->getParent().
You also will not need to pass ORE everywhere in the subsequent patch, just construct it in target in place.

gandhi21299 added inline comments.Aug 16 2021, 11:12 AM
llvm/lib/CodeGen/AtomicExpandPass.cpp
174

Sounds like a plan!

  • ORE does not need to be a pointer anymore, it is constructed as local variable with this patch as requested by reviewer.
gandhi21299 marked 2 inline comments as done.Aug 16 2021, 11:24 AM
rampitec accepted this revision.Aug 16 2021, 11:28 AM

LGTM, but please wait for others too.

Will do, thanks!

arsenm accepted this revision.Aug 16 2021, 1:15 PM

LGTM