This is an archive of the discontinued LLVM Phabricator instance.

[HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
ClosedPublic

Authored by gandhi21299 on Sep 13 2021, 11:09 AM.

Details

Summary

By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options -amdgpu-early-inline-all=true
and -amdgpu-function-calls=false. This issue is resolved by only inlining global values
with internal linkage. The getCalleeFunction() in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.

Diff Detail

Event Timeline

gandhi21299 created this revision.Sep 13 2021, 11:09 AM
gandhi21299 requested review of this revision.Sep 13 2021, 11:09 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 13 2021, 11:09 AM
gandhi21299 edited the summary of this revision. (Show Details)Sep 13 2021, 11:12 AM

While I think the early inliner is largely obsolete, it should still handle aliases correctly

arsenm added inline comments.Sep 13 2021, 11:15 AM
clang/lib/Driver/ToolChains/Clang.cpp
5089

This looks like an unrelated change?

We cannot disable early inline all since this will cause performance regressions. Instead, the early inline all pass should be fixed so it does not remove aliases.

should the amdgpu-early-inline-all flag be deleted?

should the amdgpu-early-inline-all flag be deleted?

No. It is still used by HIP. We will deprecate it in the feature, but it is not ready yet.

gandhi21299 marked an inline comment as done.Sep 13 2021, 12:33 PM
gandhi21299 added inline comments.
clang/lib/Driver/ToolChains/Clang.cpp
5089

Ahh yes, I will get rid of it.

gandhi21299 marked an inline comment as done.
  • set GlobalOpt parameter to false by default to disallow alias elimination when the options EarlyInlineAll and EnableFunctionCalls are true and false, respectively.
clang/lib/Driver/ToolChains/Clang.cpp
5089

This was part of a revert that is required for this patch to function.

gandhi21299 edited the summary of this revision. (Show Details)Sep 14 2021, 12:24 PM
gandhi21299 added a reviewer: arsenm.
  • added the include header for HIP runtime
  • converted the HIP test into a CUDA test

@yaxunl Under what criteria should an alias not be removed?

@yaxunl Under what criteria should an alias not be removed?

If the linkage of the alias is not internal, then it should not be removed.

Internal linkage detection works great for our purposes but it causes a failure in llvm/test/CodeGen/AMDGPU/inline-calls.ll due to @func_alias unable to be casted into a Function. If we pass through that, the @kernel3 causes the error: scalar registers (98) exceeds limit (96) in function 'kernel3'.

@yaxunl I think we have two ways to go from here:

  1. If appropriate, reset the maximum number of scalar registers allowed in @kernel3 (inline-calls.ll) to fix the test.
  2. Determine a stronger condition for inlining.

Internal linkage detection works great for our purposes but it causes a failure in llvm/test/CodeGen/AMDGPU/inline-calls.ll due to @func_alias unable to be casted into a Function. If we pass through that, the @kernel3 causes the error: scalar registers (98) exceeds limit (96) in function 'kernel3'.

That almost sounds like using the wrong subtarget for the alias

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
694

This needs a backend test

gandhi21299 marked an inline comment as done.
  • Prevent removing alias if the GlobalAlias does not have internal linkage
  • replaced a cast with a dyn_cast since the return value from getCalleeFunction() is not always a Function
  • RUN on line 2 was causing 2 more scalar registers to be used on tonga due to @func_alias not being inlined, hence I eliminated that test
  • RUN on line 3 generated a call instruction to an aliased function which is not supported on r600 (according to @arsenm ), hence I eliminated that test as well
gandhi21299 edited the summary of this revision. (Show Details)Sep 22 2021, 1:44 PM
  • refreshing patch
arsenm added inline comments.Sep 23 2021, 9:40 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
65 ↗(On Diff #374369)

I think this is not the right place for this. If we can determine the callee function, we should have directly set it in the instruction during call lowering

gandhi21299 added inline comments.Sep 23 2021, 10:05 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
65 ↗(On Diff #374369)

Which file would that be in?

arsenm added inline comments.Sep 23 2021, 10:06 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
65 ↗(On Diff #374369)

SIISelLowering and AMDGPUCallLowering

  • Declare an unhandled call lowering in SelectionDAG when a callee is encountered which cannot be casted into a Function
  • I am still investigating the effects on GlobalISel side of things, there seems to be a problem when lowering a call to @func in @kernel as well.
  • inline-calls.ll is expected to fail with this patch, we could turn it into a negative test depending on how the work goes.

It does not look like function calls are supported yet in AMDGPUCallLowering, is that correct?

Pls update the description of the patch and also make sure it passes internal CI.

clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu
2

this test needs

// REQUIRES: amdgpu-registered-target, clang-driver

llvm/test/CodeGen/AMDGPU/inline-calls.ll
2–3 ↗(On Diff #375079)

why these two lines are removed?

  • replaced a cast with a dyn_cast since the return value from getCalleeFunction() is not always a Function
  • RUN on line 2 was causing 2 more scalar registers to be used on tonga due to @func_alias not being inlined, hence I eliminated that test
  • RUN on line 3 generated a call instruction to an aliased function which is not supported on r600 (according to @arsenm ), hence I eliminated that test as well

@yaxunl

gandhi21299 marked an inline comment as done.
  • added the REQUIRES line as requested by Sam
gandhi21299 edited the summary of this revision. (Show Details)Sep 27 2021, 9:08 AM

@yaxunl Should inline-calls.ll be converted into an expected failing test or removed? (to avoid cast failure in AMDGPUResourceAnalysis to break the test)

gandhi21299 added a reviewer: tstellar.
  • declare failure when lowering an accessor of a callee which is not a function, in GlobalISel
yaxunl added inline comments.Sep 28 2021, 11:59 AM
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
96–97 ↗(On Diff #375655)

If we do this for older GPU's (e.g. Tonga/redwood), IR's using aliases will fail on them. I don't think it is acceptable.

Is it possible to restrict this change to gfx9 and above? Or should we introduce some feature to indicate 'alias support' and use that to restrict this change to subtargets supporting this feature.

llvm/test/CodeGen/AMDGPU/inline-calls.ll
1 ↗(On Diff #375655)

need to add check for gfx906 and gfx1030

gandhi21299 added inline comments.Sep 28 2021, 12:01 PM
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
96–97 ↗(On Diff #375655)

Restricting this change to gfx9 and above sounds simpler and more relevant with the problem as well.

gandhi21299 abandoned this revision.Sep 30 2021, 9:59 AM
gandhi21299 reclaimed this revision.

Sorry, that was a mistake.

inline-calls.ll failed on gfx908 due to the change in SIISelLowering.cpp, line 3015. Without the change, there is a failure in AMDGPUResourceAnalysis.cpp, line 65 because Op.getGlobal() is not a Function.

  • Since callees may alias to a function pointer, it makes sense for getCalleeFunction(...) to return a Function which is a cast of the operand of a GlobalAlias.
  • eliminated changes in SIISelLowering
gandhi21299 edited the summary of this revision. (Show Details)Oct 1 2021, 1:00 PM
gandhi21299 added a reviewer: Restricted Project.Oct 7 2021, 8:09 AM

refreshing patch

added -nogpulib and -nogpuinc flags to amdgpu-alias-undef-symbols.cu

Passed internal CI

gandhi21299 added inline comments.Oct 12 2021, 1:16 PM
llvm/test/CodeGen/AMDGPU/inline-calls.ll
3 ↗(On Diff #378218)

@tstellar Is there a way to restrict the AlwaysInliner to only run on amdgcn architecture?

add a restrictions to what architecture AlwaysInliner should run on, updated the inline-calls.ll test.

yaxunl accepted this revision.Oct 15 2021, 8:45 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Oct 15 2021, 8:45 AM
thakis added a subscriber: thakis.Oct 15 2021, 12:04 PM

This breaks tests on Mac: http://45.33.8.238/mac/37119/step_7.txt

Please take a look and revert for now if it takes a while to fix.

@gandhi21299 you may need to add "-target x86_64-unknown-linux-gnu" to your codegen test to avoid issue with Darwin toolchain.

gandhi21299 reopened this revision.Oct 15 2021, 3:25 PM
This revision is now accepted and ready to land.Oct 15 2021, 3:25 PM

added -target option in the test amdgpu-alias-undef-symbols.cu

@thakis can you please check if this solution is sufficient? Thanks for bringing it up

MaskRay added inline comments.
clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu
4

non-driver tests prefer %clang_cc1.

%clang invokes the driver and has varying behaviors on different platforms. Include paths/resource dir may be quite different.

gandhi21299 added inline comments.Oct 16 2021, 12:33 PM
clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu
4

Alias is not generated when I make the change to:

// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx906 -aux-triple x86_64-unknown-linux-gnu \
// RUN:   -x hip -fcuda-is-device -fgpu-rdc -O3 -mllvm -amdgpu-early-inline-all=true \
// RUN:   -mllvm -amdgpu-function-calls=false -emit-llvm %s -o - | FileCheck %s
arsenm added inline comments.Oct 19 2021, 1:16 PM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
67 ↗(On Diff #380537)

I thought aliases could include embedded bitcasts of the function type, so the function wouldn't directly appear here

gandhi21299 added inline comments.Oct 20 2021, 10:59 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
67 ↗(On Diff #380537)

Can you please elaborate on "include embedded bitcasts of the function type"? It's a consequence of the AlwaysInliner where the callee gets replaced by the alias to a function, ie. @func_alias gets replaced by @func in the inline-calls.ll test.

arsenm added inline comments.Oct 20 2021, 11:15 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
67 ↗(On Diff #380537)

Something like this where the alias changes the type from the original function:

@add1alias3 = alias float (float), bitcast (i32 (i32)* @add1 to float(float)*)
gandhi21299 added inline comments.Oct 20 2021, 11:31 AM
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
67 ↗(On Diff #380537)

I see, that will probably break the compiler since a bitcast expression is not a Function.