This is an archive of the discontinued LLVM Phabricator instance.

[HLSL] Remove global ctor/dtor variable for non-lib profile.
ClosedPublic

Authored by python3kgae on Sep 15 2022, 4:53 PM.

Details

Summary

After generated call for ctor/dtor for entry, global variable for ctor/dtor are useless.
Remove them for non-lib profiles.
Lib profile still need these in case export function used the global variable which require ctor/dtor.

Diff Detail

Event Timeline

python3kgae created this revision.Sep 15 2022, 4:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 4:53 PM
Herald added a subscriber: Anastasia. · View Herald Transcript
python3kgae requested review of this revision.Sep 15 2022, 4:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2022, 4:53 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
beanz added a comment.Sep 19 2022, 8:26 AM

We should also have a library test case that verifies that the global_dtors variable is kept.

clang/lib/CodeGen/CGHLSLRuntime.h
50 ↗(On Diff #461014)

You don't need to pass the triple in here. If you have the module, you have the triple.

Add test for dtor in lib.

python3kgae marked an inline comment as done.Sep 19 2022, 10:46 AM
beanz added inline comments.Sep 20 2022, 2:22 PM
clang/lib/CodeGen/CGHLSLRuntime.cpp
218

I question whether we should do this early or late. It feels wrong to me to have clang modifying IR. There are places where clang does this sort of thing so it isn't unprecedented, but it feels wrong.

clang/lib/Sema/SemaDeclAttr.cpp
6898 ↗(On Diff #461271)

This change should be separate. It doesn't directly relate to the change you've described in the description. I understand you need it to use the same test source, but don't group unrelated functional changes.

clang/test/CodeGenHLSL/GlobalDestructorsLib.hlsl
1 ↗(On Diff #461271)

The source of this test and the test it was copied from are almost identical. It feels very much like they should be the same file with different check prefixes. You can even have them share a check prefix for the things that are common and only have a differing prefix for the things that are different.

python3kgae marked 3 inline comments as done.Sep 20 2022, 2:38 PM
python3kgae added inline comments.
clang/lib/CodeGen/CGHLSLRuntime.cpp
218

The reason I want to do it is that with the global variable for ctors/dtors, the ctor/dtor functions will not be removed in optimization passes.
As a result, the global variable will have 2 stores ( 1 for the ctor, 1 for the inlined ctor ). That might cause optimizations to go a different path.

Also, we insert the call of dtor/ctors to entry which already did what the global variable for ctor/dtor asked. If the global variable for ctor/dtor is still kept, other backends might call the ctor/dtor again.

clang/lib/Sema/SemaDeclAttr.cpp
6898 ↗(On Diff #461271)

I'll create another PR for this.

clang/test/CodeGenHLSL/GlobalDestructorsLib.hlsl
1 ↗(On Diff #461271)

Will merge it.

beanz added inline comments.Sep 20 2022, 2:48 PM
clang/lib/CodeGen/CGHLSLRuntime.cpp
218

That makes sense, and is a fair reasoning. This makes me wonder if we would benefit from having a target-specific hook to add early IR optimization passes, but that is a whole different discussion...

python3kgae marked 4 inline comments as done.Sep 20 2022, 3:00 PM
python3kgae added inline comments.
clang/lib/CodeGen/CGHLSLRuntime.cpp
218

And a related discussion.

Maybe we should not insert call of ctor/dtor to entries inside a library for the same reason.

We cannot remove the global variable for ctor/dtor when things need to be initialized for exported functions other than entries.
If we insert call of ctor to entires, there will be multiple stores for the global variable( 1 for the ctor, 1 or more for the inlined ctor).
And other backends might call ctor/dtor again when linking.

python3kgae marked an inline comment as done.

Rebase for allow SV_GroupIndex in lib profile

Switching up code reviewers for the codegen questions.

Why are we using different mechanisms for global constructors in "libraries" vs. other code? If we need a mechanism in LLVM already, we might as well use it all the time?

Why are we using different mechanisms for global constructors in "libraries" vs. other code? If we need a mechanism in LLVM already, we might as well use it all the time?

For library, even though we could inline all constructors for entry functions in the library, the exported functions which is not entry function could also require the constructors to be called in the entry function after linking. So we cannot remove the global variable for the constructors in library.
For non-library cases, there should be only 1 entry function and the constructors are already called in the entry function. Keeping the global variable for the constructors will make the global variable has more than 1 store in some case which will disable some optimization opportunity. Also, backends might call the constructors again because the global variable still exists to mark the constructors.

Then we go to the question of why call constructors in clang codeGen. The reason for that is the old version of DXIL does not allow function calls. Call constructors in clang codeGen could help to remove function calls early. Also, it helps to get better optimization results.

beanz added a comment.Oct 6 2022, 11:25 AM

Why are we using different mechanisms for global constructors in "libraries" vs. other code? If we need a mechanism in LLVM already, we might as well use it all the time?

To elaborate here in a different wording... You can kinda think of HLSL "library" builds a lot like static archive linking in C/C++, so for the most part library shaders just work like C++.

The non-library mode is _very_ different. For non-library builds there is no linker, we emit fully contained executable code from the backend. In non-library code we need to inline global constructors into the entry function, but we don't need to keep the global variables around because we have no dynamic linker or loader, shader execution is much more similar to baremetal targets in that regard.

efriedma accepted this revision.Oct 6 2022, 12:13 PM

It seems like it would be better to put the code in an LLVM IR transform pass, if we can. It separates the concerns more clearly, and it would make life easier for other compilers. (If you need it to run early, see registerPipelineEarlySimplificationEPCallback.)

I won't block the patch on that, though.

This revision is now accepted and ready to land.Oct 6 2022, 12:13 PM
beanz added a comment.Oct 6 2022, 12:19 PM

@efriedma, I think that's a great suggestion. @python3kgae, if you don't adjust this patch, can you file an issue for that so that we don't lose that feedback?

It seems like it would be better to put the code in an LLVM IR transform pass, if we can. It separates the concerns more clearly, and it would make life easier for other compilers. (If you need it to run early, see registerPipelineEarlySimplificationEPCallback.)

I won't block the patch on that, though.

Thanks for the review.
I like the idea to move code to LLVM IR pass.
I'll commit this PR as is, and create another one to move the code.