After generated call for ctor/dtor for entry, global variable for ctor/dtor are useless.
Remove them for non-lib profiles.
Lib profile still need these in case export function used the global variable which require ctor/dtor.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
We should also have a library test case that verifies that the global_dtors variable is kept.
clang/lib/CodeGen/CGHLSLRuntime.h | ||
---|---|---|
50 | You don't need to pass the triple in here. If you have the module, you have the triple. |
clang/lib/CodeGen/CGHLSLRuntime.cpp | ||
---|---|---|
208 | I question whether we should do this early or late. It feels wrong to me to have clang modifying IR. There are places where clang does this sort of thing so it isn't unprecedented, but it feels wrong. | |
clang/lib/Sema/SemaDeclAttr.cpp | ||
6898 ↗ | (On Diff #461271) | This change should be separate. It doesn't directly relate to the change you've described in the description. I understand you need it to use the same test source, but don't group unrelated functional changes. |
clang/test/CodeGenHLSL/GlobalDestructorsLib.hlsl | ||
1 ↗ | (On Diff #461271) | The source of this test and the test it was copied from are almost identical. It feels very much like they should be the same file with different check prefixes. You can even have them share a check prefix for the things that are common and only have a differing prefix for the things that are different. |
clang/lib/CodeGen/CGHLSLRuntime.cpp | ||
---|---|---|
208 | The reason I want to do it is that with the global variable for ctors/dtors, the ctor/dtor functions will not be removed in optimization passes. Also, we insert the call of dtor/ctors to entry which already did what the global variable for ctor/dtor asked. If the global variable for ctor/dtor is still kept, other backends might call the ctor/dtor again. | |
clang/lib/Sema/SemaDeclAttr.cpp | ||
6898 ↗ | (On Diff #461271) | I'll create another PR for this. |
clang/test/CodeGenHLSL/GlobalDestructorsLib.hlsl | ||
1 ↗ | (On Diff #461271) | Will merge it. |
clang/lib/CodeGen/CGHLSLRuntime.cpp | ||
---|---|---|
208 | That makes sense, and is a fair reasoning. This makes me wonder if we would benefit from having a target-specific hook to add early IR optimization passes, but that is a whole different discussion... |
clang/lib/CodeGen/CGHLSLRuntime.cpp | ||
---|---|---|
208 | And a related discussion. Maybe we should not insert call of ctor/dtor to entries inside a library for the same reason. We cannot remove the global variable for ctor/dtor when things need to be initialized for exported functions other than entries. |
Why are we using different mechanisms for global constructors in "libraries" vs. other code? If we need a mechanism in LLVM already, we might as well use it all the time?
For library, even though we could inline all constructors for entry functions in the library, the exported functions which is not entry function could also require the constructors to be called in the entry function after linking. So we cannot remove the global variable for the constructors in library.
For non-library cases, there should be only 1 entry function and the constructors are already called in the entry function. Keeping the global variable for the constructors will make the global variable has more than 1 store in some case which will disable some optimization opportunity. Also, backends might call the constructors again because the global variable still exists to mark the constructors.
Then we go to the question of why call constructors in clang codeGen. The reason for that is the old version of DXIL does not allow function calls. Call constructors in clang codeGen could help to remove function calls early. Also, it helps to get better optimization results.
To elaborate here in a different wording... You can kinda think of HLSL "library" builds a lot like static archive linking in C/C++, so for the most part library shaders just work like C++.
The non-library mode is _very_ different. For non-library builds there is no linker, we emit fully contained executable code from the backend. In non-library code we need to inline global constructors into the entry function, but we don't need to keep the global variables around because we have no dynamic linker or loader, shader execution is much more similar to baremetal targets in that regard.
It seems like it would be better to put the code in an LLVM IR transform pass, if we can. It separates the concerns more clearly, and it would make life easier for other compilers. (If you need it to run early, see registerPipelineEarlySimplificationEPCallback.)
I won't block the patch on that, though.
@efriedma, I think that's a great suggestion. @python3kgae, if you don't adjust this patch, can you file an issue for that so that we don't lose that feedback?
Thanks for the review.
I like the idea to move code to LLVM IR pass.
I'll commit this PR as is, and create another one to move the code.
You don't need to pass the triple in here. If you have the module, you have the triple.