This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Handle memset users in PromoteAlloca
ClosedPublic

Authored by Pierre-vh on Mar 16 2023, 6:35 AM.

Details

Summary

Allows allocas with memset users to be promoted.

This is intended to prevent patterns such as memset(&alloca, 0, sizeof(alloca)) (which I think can be emitted by frontends) from preventing a vectorization of allocas.

Fixes SWDEV-388784

Diff Detail

Event Timeline

Pierre-vh created this revision.Mar 16 2023, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2023, 6:35 AM
Pierre-vh requested review of this revision.Mar 16 2023, 6:35 AM
arsenm added inline comments.Mar 16 2023, 6:42 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
397

You can hide the intrinsic checks by dyn_casting to MemSetInst

405–407

I don't see why you need this restriction but if you want to lift it in a separate patch that's fine

558–561

I'd expect this to be up with the MemTransferInst handling, dyn_casting to MemSetInst

llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
32

Not checks are fragile,, might as well generate the checks here

62

.double?

arsenm added inline comments.Mar 16 2023, 6:44 AM
llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
2

Probably should also run SROA to make sure the end result is correct

Pierre-vh updated this revision to Diff 505798.Mar 16 2023, 6:59 AM
Pierre-vh marked 6 inline comments as done.

Comments

arsenm added inline comments.Mar 16 2023, 7:07 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
392

I'd expect the input to be the MemSetInst and the dyn_cast would be on the caller side. Also, use TypeSize?

398

Move the size query here?

502

getTypeStoreSize?

637

store size

Why is this not part and done by sroa?

Pierre-vh updated this revision to Diff 505801.Mar 16 2023, 7:13 AM
Pierre-vh marked 4 inline comments as done.

Address comments

Why is this not part and done by sroa?

Do you mean the PromoteAlloca pass? I'm not sure

Why is this not part and done by sroa?

@lebedev.ri had some patches to do a much less aggressive vector formation during SROA

jdoerfert added inline comments.Mar 16 2023, 11:59 AM
llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
32

the store of val is dead and SROA drops it. What is the point?
Fixing the test and I see reasonable SROA output, what is missing? https://godbolt.org/z/Pacce89fY

Pierre-vh added inline comments.Mar 17 2023, 12:46 AM
llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
32

What I'm trying to do here is verify our assumption that the memset is correctly interpreted by SROA & removed. (Which is verified by the presence of the zeroinitializer in the insertelement)
The interesting check line is the OPT one, the SROA check is just to ensure the alloca is removed after.

Why is this not part and done by sroa?

@lebedev.ri had some patches to do a much less aggressive vector formation during SROA

(Unrelated to the review) do you have a link to those patches? I was planning to do some PromoteAlloca improvements later but if we're planning to merge it with SROA later then I suppose time would be better spent moving in that direction and helping these patches land

arsenm added inline comments.Mar 17 2023, 7:17 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
397

I null check redundant with one in caller

632

Can dyn_cast and use below instead of casting again

llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
32

Might as well another reload and store to another output

arsenm added inline comments.Mar 17 2023, 8:17 AM
llvm/test/CodeGen/AMDGPU/promote-alloca-memset.ll
4

Don't see the point of separating the run lines, it's really only the aggregate result that matters

Pierre-vh marked 5 inline comments as done.

Comments

arsenm added inline comments.Mar 27 2023, 5:25 AM
llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
399

!I->isVolatile() would be clearer

636

I don't see how this could differ with the current code

Pierre-vh updated this revision to Diff 508607.Mar 27 2023, 5:43 AM
Pierre-vh marked an inline comment as done.

Comments

llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
636

I think t's better to be safe especially since this is (I think) cheap to do, but I could replace it with an assertion if you prefer.

Pierre-vh added a reviewer: nhaehnle.
Pierre-vh added a reviewer: rampitec.
arsenm accepted this revision.Mar 28 2023, 5:28 AM
This revision is now accepted and ready to land.Mar 28 2023, 5:28 AM
This revision was landed with ongoing or failed builds.Mar 28 2023, 6:02 AM
This revision was automatically updated to reflect the committed changes.