We are seeing significant performance loss when an alloca fails to get promoted
to register. I have observed that this is due to the common type found when
attempting to rewrite partition users being unviable for promotion. While if we
would have continue looking for a type, we would have found a subtype in the
original allocated type that would have enabled promotion. Thus first check if
the initial common type found is promotion viable and if not then continue
looking instead of stopping with the initial common type found.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/sroa-common-type-fail-promotion.ll | ||
---|---|---|
2 ↗ | (On Diff #437969) | test should go in test/Transforms/SROA, not CodeGen |
llvm/lib/Transforms/Scalar/SROA.cpp | ||
---|---|---|
4268–4269 | Using isVectorTy + VectorType is ugly. Use dyn_cast for the type check too? |
I'm not sure I understand how this change works. isVectorPromotionViable() doesn't seem to use SliceTy at all, it computes its own candidate types. Why does the picked slice type matter for vector promotion?
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll | ||
---|---|---|
3 | -mtriple=amdgcn-amd-amdhsa -mcpu=gfx908 is likely unnecessary here. -opaque-pointers can also be dropped (it's the default). | |
21 | The address spaces don't look essential for the test and add a lot of noise -- can probably be dropped. |
The picked slice type matters because rewritePartition() will attempt to convert the slice users to that type. This won't matter for isVectorPromotionViable() during the first pass over the alloca partition but during the second pass. The second pass over the alloca partition, isVectorPromotionViable() will check the type of any loads or stores that are the exact size of the partition. If we were able to convert those loads and stores into our picked slice type of the first iteration, then it will be used here.
llvm/lib/Transforms/Scalar/SROA.cpp | ||
---|---|---|
4268 | You didn't re-use the dyn_cast value |
llvm/lib/Transforms/Scalar/SROA.cpp | ||
---|---|---|
4268–4278 | From the flow here, it looks like it's trying to split vectors before arrays. Would this work better if moved after the array handling? Can you add some additional tests with structs and arrays of vectors? |
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll | ||
---|---|---|
338–342 | I'm not seeing why these allocas were not eliminated |
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll | ||
---|---|---|
338–342 | My change only works if there is a common type found and that common type happens to be a vectortype. In the other tests the common vectortype was found from a store <4 x float> ... instruction. If we remove such instruction then there is no common vectortype and my changes to look at the original allocated type and vector promotion check is not enabled. From what I have observed, SROA fails to promote these allocas because it found common slicety of float while there are load halfs. This causes some offset issues and failure to promote when visitLoadInst() is called. If we change float type to half type by adjusting the stores to also be half type then we would have no issue promoting the allocas. |
LGTM
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll | ||
---|---|---|
338–342 | Can you look into handling these cases in a follow on patch? |
Lowercase Check