By default clang emits complete contructors as alias of base constructors if they are the same.
The backend is supposed to emit symbols for the alias, otherwise it causes undefined symbols.
@yaxunl observed that this issue is related to the llvm options -amdgpu-early-inline-all=true
and -amdgpu-function-calls=false. This issue is resolved by only inlining global values
with internal linkage. The getCalleeFunction() in AMDGPUResourceUsageAnalysis also had
to be extended to support aliases to functions. inline-calls.ll was corrected appropriately.
Details
- Reviewers
yaxunl arsenm nhaustov tstellar - Group Reviewers
Restricted Project - Commits
- rG0567f0333176: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
rG03375a3fb33b: [HIP] [AlwaysInliner] Disable AlwaysInliner to eliminate undefined symbols
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
While I think the early inliner is largely obsolete, it should still handle aliases correctly
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
5094 | This looks like an unrelated change? |
We cannot disable early inline all since this will cause performance regressions. Instead, the early inline all pass should be fixed so it does not remove aliases.
No. It is still used by HIP. We will deprecate it in the feature, but it is not ready yet.
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
5094 | Ahh yes, I will get rid of it. |
- set GlobalOpt parameter to false by default to disallow alias elimination when the options EarlyInlineAll and EnableFunctionCalls are true and false, respectively.
clang/lib/Driver/ToolChains/Clang.cpp | ||
---|---|---|
5094 | This was part of a revert that is required for this patch to function. |
I think you may try fixing the following line:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp#L97
Internal linkage detection works great for our purposes but it causes a failure in llvm/test/CodeGen/AMDGPU/inline-calls.ll due to @func_alias unable to be casted into a Function. If we pass through that, the @kernel3 causes the error: scalar registers (98) exceeds limit (96) in function 'kernel3'.
@yaxunl I think we have two ways to go from here:
- If appropriate, reset the maximum number of scalar registers allowed in @kernel3 (inline-calls.ll) to fix the test.
- Determine a stronger condition for inlining.
That almost sounds like using the wrong subtarget for the alias
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp | ||
---|---|---|
694 ↗ | (On Diff #372573) | This needs a backend test |
- replaced a cast with a dyn_cast since the return value from getCalleeFunction() is not always a Function
- RUN on line 2 was causing 2 more scalar registers to be used on tonga due to @func_alias not being inlined, hence I eliminated that test
- RUN on line 3 generated a call instruction to an aliased function which is not supported on r600 (according to @arsenm ), hence I eliminated that test as well
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
68 | I think this is not the right place for this. If we can determine the callee function, we should have directly set it in the instruction during call lowering |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
68 | Which file would that be in? |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
68 | SIISelLowering and AMDGPUCallLowering |
- Declare an unhandled call lowering in SelectionDAG when a callee is encountered which cannot be casted into a Function
- I am still investigating the effects on GlobalISel side of things, there seems to be a problem when lowering a call to @func in @kernel as well.
- inline-calls.ll is expected to fail with this patch, we could turn it into a negative test depending on how the work goes.
It does not look like function calls are supported yet in AMDGPUCallLowering, is that correct?
@yaxunl Should inline-calls.ll be converted into an expected failing test or removed? (to avoid cast failure in AMDGPUResourceAnalysis to break the test)
- declare failure when lowering an accessor of a callee which is not a function, in GlobalISel
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
98–99 | If we do this for older GPU's (e.g. Tonga/redwood), IR's using aliases will fail on them. I don't think it is acceptable. Is it possible to restrict this change to gfx9 and above? Or should we introduce some feature to indicate 'alias support' and use that to restrict this change to subtargets supporting this feature. | |
llvm/test/CodeGen/AMDGPU/inline-calls.ll | ||
0–3 | need to add check for gfx906 and gfx1030 |
llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp | ||
---|---|---|
98–99 | Restricting this change to gfx9 and above sounds simpler and more relevant with the problem as well. |
inline-calls.ll failed on gfx908 due to the change in SIISelLowering.cpp, line 3015. Without the change, there is a failure in AMDGPUResourceAnalysis.cpp, line 65 because Op.getGlobal() is not a Function.
- Since callees may alias to a function pointer, it makes sense for getCalleeFunction(...) to return a Function which is a cast of the operand of a GlobalAlias.
add a restrictions to what architecture AlwaysInliner should run on, updated the inline-calls.ll test.
This breaks tests on Mac: http://45.33.8.238/mac/37119/step_7.txt
Please take a look and revert for now if it takes a while to fix.
@gandhi21299 you may need to add "-target x86_64-unknown-linux-gnu" to your codegen test to avoid issue with Darwin toolchain.
@thakis can you please check if this solution is sufficient? Thanks for bringing it up
clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu | ||
---|---|---|
4 | non-driver tests prefer %clang_cc1. %clang invokes the driver and has varying behaviors on different platforms. Include paths/resource dir may be quite different. |
clang/test/CodeGenCUDA/amdgpu-alias-undef-symbols.cu | ||
---|---|---|
4 | Alias is not generated when I make the change to: // RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -target-cpu gfx906 -aux-triple x86_64-unknown-linux-gnu \ // RUN: -x hip -fcuda-is-device -fgpu-rdc -O3 -mllvm -amdgpu-early-inline-all=true \ // RUN: -mllvm -amdgpu-function-calls=false -emit-llvm %s -o - | FileCheck %s |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
67 | I thought aliases could include embedded bitcasts of the function type, so the function wouldn't directly appear here |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
67 | Can you please elaborate on "include embedded bitcasts of the function type"? It's a consequence of the AlwaysInliner where the callee gets replaced by the alias to a function, ie. @func_alias gets replaced by @func in the inline-calls.ll test. |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
67 | Something like this where the alias changes the type from the original function: @add1alias3 = alias float (float), bitcast (i32 (i32)* @add1 to float(float)*) |
llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp | ||
---|---|---|
67 | I see, that will probably break the compiler since a bitcast expression is not a Function. |
This looks like an unrelated change?