This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Expand all ConstantExpr users of LDS variables in instructions
ClosedPublic

Authored by JonChesterfield on Sep 7 2022, 6:44 AM.

Details

Summary

Bug noted in D112717 can be sidestepped with this change.

Expanding all ConstantExpr involved with LDS up front makes the variable specialisation simpler. Excludes ConstantExpr that don't access LDS to avoid disturbing codegen elsewhere.

Diff Detail

Event Timeline

JonChesterfield created this revision.Sep 7 2022, 6:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 6:44 AM
JonChesterfield requested review of this revision.Sep 7 2022, 6:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 6:44 AM

Haven't landed this previously because of the test churn. check-openmp is fine, check-llvm needs manual patch up.

Failed Tests (9):
  LLVM :: CodeGen/AMDGPU/ds_read2.ll
  LLVM :: CodeGen/AMDGPU/ds_write2.ll
  LLVM :: CodeGen/AMDGPU/loop_break.ll
  LLVM :: CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll
  LLVM :: CodeGen/AMDGPU/lower-kernel-lds-super-align.ll
  LLVM :: CodeGen/AMDGPU/lower-lds-struct-aa-memcpy.ll
  LLVM :: CodeGen/AMDGPU/lower-lds-struct-aa-merge.ll
  LLVM :: CodeGen/AMDGPU/lower-lds-struct-aa.ll
  LLVM :: CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
  • update three tests
  • update some more tests
  • rebase, two more tests. Codegen improvement is coincidental
JonChesterfield added inline comments.Sep 9 2022, 8:12 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
308

Replacing the ConstantExpr with Instructions is not necessary since the replacement is itself a ConstantExpr. However, if a nexted ConstantExpr is used from multiple functions, the control flow to trace back to the corresponding instruction in a given function is awkward. It's probably better to unconditionally replace every constexpr in every instruction with more instructions, at least as far as LDS lowering is concerned.

  • Change strategy, always expand constantexpr on lds
JonChesterfield added a comment.EditedSep 9 2022, 2:08 PM

The current revision will fix the O0 codegen bug @carlo.bertolli is looking at.

It's possible, though awkward, to keep the original idea of deleting the call to replaceConstantUsesInFunction if the lambda passed to replaceLDSVariablesWithStruct works much harder to identify the function corresponding to a Use. That's approximately as complicated as expanding the ConstantExpr up front. Choosing this approach because it's convenient for the other WIP on LDS which replaces some expressions with function calls and thus also needs to expand ConstantExpr.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
208

This needs a test case, though the implementation that was called into by replaceConstantUsesInFunction unconditionally dereferences it which suggests empty blocks don't get here in practice.

JonChesterfield retitled this revision from [amdgpu] Remove unnecessary removal of constantexpr, expected to fix O0 problem to [amdgpu] Expand all ConstantExpr users of LDS variables in instructions.Sep 9 2022, 2:09 PM
JonChesterfield edited the summary of this revision. (Show Details)
JonChesterfield edited the summary of this revision. (Show Details)Sep 12 2022, 6:35 AM
JonChesterfield added reviewers: arsenm, rampitec.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
208

Oh, right. Basic blocks can't be empty - closest one can get is a branch, and in that case we have an instruction to insert before.

  • blocks aren't empty, contain at least a br
rampitec added inline comments.Sep 12 2022, 11:11 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
165

Should this be a SetVector?

foad added a comment.Sep 13 2022, 1:39 AM

Bug noted in D112717 can be sidestepped with this change.

Can you add a regression test for that bug?

Passed internal epsdb in 733447

Bug noted in D112717 can be sidestepped with this change.

Can you add a regression test for that bug?

I could - would have to check the exact failure mode and write a fairly complicated test case. This code can't fall victim to the same bug, as it emits instructions for each phi constantexpr, and the currently broken code is best deleted if/after this patch lands. I'm not sure the test would be value adding but will write it if necessary.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
165

I don't think so. That would avoid pushing constantexpr that use N distinct global variables N times, but I can't think of one that looks like that. The LDS variables tend to be used by addrspacecast or bitcast nodes at the root. So I think we'd gain nothing from the set behaviour.

If I've missed a case, multiple copies of the same constantexpr here don't affect correctness anyway as there's set semantics further down.

This revision is now accepted and ready to land.Sep 13 2022, 1:23 PM
  • Add a regression test for duplicate constantexpr in phi arguments

Bug noted in D112717 can be sidestepped with this change.

Can you add a regression test for that bug?

I could - would have to check the exact failure mode and write a fairly complicated test case

Test case was easier than anticipated - the constantexpr didn't have to be nested and block order doesn't make a difference. Added before commit. Fails the verifier before this patch, correct after.

This revision was landed with ongoing or failed builds.Sep 13 2022, 11:56 PM
This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.Sep 14 2022, 10:17 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
166

Needs a bunch of braces throughout

foad added inline comments.Nov 17 2022, 1:46 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
160

What is replaceConstantExprInFunction?