This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add GlobalDCE before internalization pass
ClosedPublic

Authored by yaxunl on Mar 17 2021, 8:19 AM.

Details

Summary

The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Diff Detail

Event Timeline

yaxunl requested review of this revision.Mar 17 2021, 8:19 AM
yaxunl created this revision.
tra added a comment.Mar 17 2021, 10:09 AM

Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up the consequences of a problem that happened somewhere else.

tra added a comment.Mar 17 2021, 10:15 AM

I do not see any hanging ASCs in the generated IR with clang@HEAD: https://godbolt.org/z/TE4Yhr

What do I miss?

In D98783#2632153, @tra wrote:

I do not see any hanging ASCs in the generated IR with clang@HEAD: https://godbolt.org/z/TE4Yhr

What do I miss?

The unused casts are invisible since they are not used by any visible IR's. To see them, you need to dump the users of these global variables explicitly.

In D98783#2632143, @tra wrote:

Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up the consequences of a problem that happened somewhere else.

DCE does not eliminate these unused ASCs because normally they should not exist. That's why we do not want them to be kept in the IR emitted by clang.

yaxunl updated this revision to Diff 331363.Mar 17 2021, 1:39 PM

bug fix. only remove unused casts.

tra added a comment.Mar 17 2021, 4:34 PM
In D98783#2632143, @tra wrote:

Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up the consequences of a problem that happened somewhere else.

DCE does not eliminate these unused ASCs because normally they should not exist. That's why we do not want them to be kept in the IR emitted by clang.

TBH, these 'invisible' ASCs do bother me. I wonder what else is present in the IR that we can't easily examine with existing tools?
In principle I'm OK with eliminating such dangling ASCs here as a short-term workaround, but I think it's potentially a more general issue which needs a more general solution.

One way to consistently deal with something like that is to codegen something using those ASCs, but which would be considered unused and later DCE'd along with unused ASCs.
We could use some sort of counterpart for @llvm.used only for potentially unused IR we create.
Tying them to such @llvm_maybe_unused would avoid the problem.

In D98783#2633261, @tra wrote:
In D98783#2632143, @tra wrote:

Shouldn't the unused ASCs be eliminated by DCE? We seem to be cleaning up the consequences of a problem that happened somewhere else.

DCE does not eliminate these unused ASCs because normally they should not exist. That's why we do not want them to be kept in the IR emitted by clang.

TBH, these 'invisible' ASCs do bother me. I wonder what else is present in the IR that we can't easily examine with existing tools?
In principle I'm OK with eliminating such dangling ASCs here as a short-term workaround, but I think it's potentially a more general issue which needs a more general solution.

The invisible LLVM constants only happen when they are created but not used later, which is rare since usually a constant is created and used immediately, making them visible.

One way to consistently deal with something like that is to codegen something using those ASCs, but which would be considered unused and later DCE'd along with unused ASCs.
We could use some sort of counterpart for @llvm.used only for potentially unused IR we create.
Tying them to such @llvm_maybe_unused would avoid the problem.

That's one solution. Another solution is to add a member function to LLVM module to clean up unused LLVM constants and add check to LLVM IR verifier to make sure there is no unused constants.

tra added a subscriber: rsmith.Mar 22 2021, 12:08 PM

We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. Or both.

@rjmccall, @rsmith -- do you have any suggestions -- either on the subject of the invisible dangling IR or on who may be the right person to talk to?

In D98783#2642269, @tra wrote:

We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. Or both.

@rjmccall, @rsmith -- do you have any suggestions -- either on the subject of the invisible dangling IR or on who may be the right person to talk to?

If Clang is creating constants unnecessarily, we should try to avoid that on general compile time / memory usage grounds unless doing so is a serious problem. But I don't think it should be our responsibility to GC unused constant expressions in order to make DCE work.

It's extremely common to create bitcast global constants. So the fact that this happens with addrspacecasts but hasn't been a persistent problem with bitcasts makes more suspect that DCE actually tries to handle this, but something about what it's doing isn't aware of addrspacecast. And indeed, LLVM's GlobalDCE seems to call a method called removeDeadConstantUsers() before concluding that a constant can't be thrown away. So either you're using a different transform that needs to do the same thing, or something is stopping removeDeadConstantUsers() from eliminating this addrspacecast.

In D98783#2642269, @tra wrote:

We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. Or both.

@rjmccall, @rsmith -- do you have any suggestions -- either on the subject of the invisible dangling IR or on who may be the right person to talk to?

If Clang is creating constants unnecessarily, we should try to avoid that on general compile time / memory usage grounds unless doing so is a serious problem. But I don't think it should be our responsibility to GC unused constant expressions in order to make DCE work.

It's extremely common to create bitcast global constants. So the fact that this happens with addrspacecasts but hasn't been a persistent problem with bitcasts makes more suspect that DCE actually tries to handle this, but something about what it's doing isn't aware of addrspacecast. And indeed, LLVM's GlobalDCE seems to call a method called removeDeadConstantUsers() before concluding that a constant can't be thrown away. So either you're using a different transform that needs to do the same thing, or something is stopping removeDeadConstantUsers() from eliminating this addrspacecast.

You are right. GlobalDCE is able to remove the dead addr space casts added by clang. The reason why the useless global variables are not internalized is that the internalization pass is done before GlobalDCE, whereas the internalization pass only internalizes useless global variables. To be able to internalize global variables with dead users, I need to add another GlobalDCE before internalization pass.

yaxunl updated this revision to Diff 337825.Apr 15 2021, 10:54 AM
yaxunl retitled this revision from [CUDA][HIP] Remove unused addr space casts to [AMDGPU] Add GlobalDCE before internalization pass.
yaxunl edited the summary of this revision. (Show Details)
yaxunl added a reviewer: rampitec.

Add GlobalDCE before internalization pass.

tra accepted this revision.Apr 15 2021, 12:56 PM

LGTM with a test nit.

clang/test/CodeGenCUDA/unused-global-var.cu
24–30

Mixing CHECK and CHECK-NOT is tricky and, in general, only works if things are always in the same order.

E.g. if does v3 get emitted after v4, the test will still pass.

One way to deal with that would be to split the positive and negative checks into separate runs.
First one would check the variables we do want to keep with CHECK-DAG.
The other one would only check for the absence of the variables with -NOT.

This revision is now accepted and ready to land.Apr 15 2021, 12:56 PM
arsenm added inline comments.Apr 15 2021, 12:57 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
581

Should we move where the internalize pass is added instead?

yaxunl marked 2 inline comments as done.Apr 15 2021, 2:01 PM
yaxunl added inline comments.
clang/test/CodeGenCUDA/unused-global-var.cu
24–30

will do

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
581

we need two global dce, one before internalize, one after internalize.

the first global dce will eliminate the dead users of the global vars, but not the vars themselves since they are external.

the internalize pass will make the useless vars internal.

the second global dec will eliminate the useless internal global vars. Without the second global dce, the useless global var will not be eliminated.

yaxunl updated this revision to Diff 337900.Apr 15 2021, 2:25 PM
yaxunl marked 2 inline comments as done.

revised tests by Artem's comments.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2021, 8:25 AM