Page MenuHomePhabricator

[AMDGPU] [IndirectCalls] Don't propagate attributes to address taken functions and their callees
ClosedPublic

Authored by madhur13490 on May 25 2021, 10:50 PM.

Details

Summary

Don't propagate launch bound related attributes to
address taken functions and their callees. The idea
is to do a traversal over the call graph starting at
address taken functions and erase the attributes
set by previous logic i.e. process().

This two phase approach makes sure that we don't
miss out on deep nested callees from address taken
functions as a function might be called directly as
well as indirectly.

This patch is also reattempt to D94585 as latent issues
are fixed in hasAddressTaken function in the recent
past.

Diff Detail

Event Timeline

madhur13490 created this revision.May 25 2021, 10:50 PM
madhur13490 requested review of this revision.May 25 2021, 10:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2021, 10:50 PM

Going back to remove the attributes feels more confusing than just not setting them in the first place

llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
235–238

Just assert, the return is dead

llvm/test/CodeGen/AMDGPU/propagate-attributes-launch-bounds.ll
8

Add a comment that there should be no attribute set

rampitec added inline comments.May 26 2021, 12:04 PM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
41

Restore blank line here.

223

\n in single quotes.

address review comments

madhur13490 marked 3 inline comments as done.May 26 2021, 11:17 PM
madhur13490 added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
235–238

I'd say its a style choice. Preferring llvm_unreachable over assert.

foad added inline comments.May 27 2021, 2:10 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
118

Add the nullptr initializion here instead of in the constructor?

235–238

Then it's a bad choice :) I agree with Matt. assert(CG) or assert(CG && "message") are shorter and simpler and more familiar because they're already ubiquitous throughout llvm.

248

It's more common to use a SmallVector for this, with push_back and pop_back_val.

madhur13490 added inline comments.May 27 2021, 2:28 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
118

May I know why?

235–238

As per coding standards, llvm_unreachable seems more preferred.
https://llvm.org/docs/CodingStandards.html#assert-liberally

@rampitec What do you think? Internally, I changed this from assert.

248

I need a queue. This is implementing BFS.

foad added inline comments.May 27 2021, 2:37 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
118

It's slightly better style because if you have more than one constructor, you don't have to repeat the same initialization value for every constructor. It doesn't really matter if you only have one constructor.

use NSDMI for ModuleCG

madhur13490 marked an inline comment as done.May 27 2021, 3:11 AM

Generally LGTM. Please wait other reviewers too.

llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
235–238

Internally if was "if() assert(false ...)". assert(false) shall never be used, that's why I suggested llvm_unreachange() instead. But "assert(ModuleCG);" is a more common thing.

madhur13490 added inline comments.May 28 2021, 7:38 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
235–238

There are ways to implement this:
A.

if (!ModuleCG) { 
   assert(false, "ModuleCG not present"); 
   return; 
} 
<rest code>

B. Just

assert(ModuleCG && "Module CG not present"); 
< rest code>

C.

 if (!ModuleCG) { 
      llvm_unreachable("ModuleCG"); 
      return; 
 } 
<rest code>

Matt suggests B but in release builds the code would just crash because it would end up accessing nullptr.
I had A initially because it asserts as well as returns gracefully so it does not crash. C is similar to A just llvm_unreachable replaces assert.

I prefer but C because it follows coding standard and also returns gracefully returns in release builds.

Which one we agree the best here?

madhur13490 added inline comments.May 28 2021, 7:40 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
235–238

Typo: I prefer C

foad added a comment.May 28 2021, 8:01 AM

You cannot recover after an "unreachable" or a failed assertion.

C [...] returns gracefully returns in release builds.

No it doesn't. In a release build llvm_unreachable will usually expand to __builtin_unreachable() or __assume(false) and the optimising compiler will remove any code after that point because it is allowed to assume that it is not reachable.

You cannot recover after an "unreachable" or a failed assertion.

C [...] returns gracefully returns in release builds.

No it doesn't. In a release build llvm_unreachable will usually expand to __builtin_unreachable() or __assume(false) and the optimising compiler will remove any code after that point because it is allowed to assume that it is not reachable.

Ok,, B seems common in code base too.

Fixed one more test after rebase.

madhur13490 marked 2 inline comments as done.May 28 2021, 12:29 PM

add comment about keeping indirect calls handling separate

arsenm accepted this revision.Jun 1 2021, 11:52 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
221

I don't think null F can reach here

235–238

C doesn't follow the coding standard and adds dead code. You also should NOT be attempting to gracefully handle invalid cases in a release build. The optimizer will happily delete this null check anyway

This revision is now accepted and ready to land.Jun 1 2021, 11:52 AM
madhur13490 added inline comments.Jun 1 2021, 11:56 AM
llvm/lib/Target/AMDGPU/AMDGPUPropagateAttributes.cpp
221

One of the test cases from AOMP reached here with null F.

This patch causes the amdgpu-propagate-attributes-late pass to loop forever when the input IR has a recursive function call (and a couple of other minor things). Reduced test case:

target triple = "amdgcn-amd-amdhsa"

define i64 @f(i64 %x)  {
entry:
  br i1 undef, label %return, label %if.end

if.end:
  %call = call i64 @f(i64 undef) 
  unreachable

return:
  ret i64 1
}

define i8* @unrelated() {
entry:
  ret i8* bitcast (i64 (i64)* @f to i8*)
}

declare amdgpu_kernel void @has_a_kernel(i32)

Reverting this patch fixes that - compilation proceeds correctly, instead of looping forever.

I believe this patch is an optimisation. Tests don't cover functional changes, there is no reference to a bug report and I can't think of anything this might be fixing. The commit message doesn't say what motivates the change. I'm therefore going to revert it on the basis that it is broken.

The attributor framework was written to factor out code very like this, and it won't fall over on a recursive function call, so my preference remains to change over to that instead of doing amdgpu-only callgraph walking stuff.

llvm/test/CodeGen/AMDGPU/propagate-attributes-common-callees.ll
1

I can't tell what this test is looking for. What do you mean by common callees here?