This is an archive of the discontinued LLVM Phabricator instance.

[SROA] Create additional vector type candidates based on store and load slices
ClosedPublic

Authored by zhuhan0 on Feb 2 2023, 3:45 PM.

Details

Summary

Second try at A-Wadhwani's https://reviews.llvm.org/D132096, which was reverted. The original patch had three issues:

  • https://reviews.llvm.org/D134032, which bjope kindly fixed. That patch is merged into this one.
  • GHI #57796. Fixed and added a test.
  • GHI #57821. I believe this is an undefined behavior which is not the fault of the original patch. Please see the issue for more details.

Original diff summary:

This patch adds additional vector types to be considered when doing promotion in SROA, based on the types of the store and load slices. This provides more promotion opportunities, by potentially using an optimal "intermediate" vector type.

For example, the following code would currently not be promoted to a vector, since __m128i is a <2 x i64> vector.

#include <xmmintrin.h>

__m128i packfoo0(int a, int b, int c, int d) {
  int r[4] = {a, b, c, d};
  __m128i rm;
  std::memcpy(&rm, r, sizeof(rm));
  return rm;
}
packfoo0(int, int, int, int):
        mov     dword ptr [rsp - 24], edi
        mov     dword ptr [rsp - 20], esi
        mov     dword ptr [rsp - 16], edx
        mov     dword ptr [rsp - 12], ecx
        movaps  xmm0, xmmword ptr [rsp - 24]
        ret

By also considering the types of the elements, we could find that the <4 x i32> type would be valid for promotion, hence removing the memory accesses for this function. In other words, we can explore other new vector types, with the same size but different element types based on the load and store instructions from the Slices, which can provide us more promotion opportunities.

Additionally, the step for removing duplicate elements from the CandidateTys vector was not using an equality comparator, which has been fixed.

Diff Detail

Event Timeline

zhuhan0 created this revision.Feb 2 2023, 3:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2023, 3:45 PM
zhuhan0 requested review of this revision.Feb 2 2023, 3:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2023, 3:45 PM
zhuhan0 edited the summary of this revision. (Show Details)Feb 2 2023, 3:56 PM
zhuhan0 added reviewers: MatzeB, bjope, dyung, lebedev.ri, aprantl.
zhuhan0 added a subscriber: A-Wadhwani.
zhuhan0 edited the summary of this revision. (Show Details)Feb 2 2023, 4:01 PM
arsenm added a subscriber: arsenm.Feb 2 2023, 5:45 PM
arsenm added inline comments.
llvm/lib/Transforms/Scalar/SROA.cpp
1960

The first isa<VectorType> should be redundant with the isValidElementType check

llvm/test/Transforms/SROA/vector-promotion.ll
960

Should add some vector of pointer tests

Is one of the tests testing that debug info is updated correctly in the new code path?

Is one of the tests testing that debug info is updated correctly in the new code path?

I don't think so. Not familiar with debug info TBH. Let me take a stab at that.

zhuhan0 updated this revision to Diff 494742.Feb 3 2023, 2:42 PM
  • Remove redundant isa<VectorType>. Add some vector of ptr tests.
  • Add opt -passes=-debugify,sroa test to vector-promotion.ll
zhuhan0 added inline comments.Feb 3 2023, 2:43 PM
llvm/test/Transforms/SROA/vector-promotion.ll
4

@aprantl How do you feel about this? I can pre-commit this to show the changes if that's better.

zhuhan0 updated this revision to Diff 495892.Feb 8 2023, 10:57 AM

Pre-committed tests in vector-promotion.ll

Can I get some review on the newer version please? Thanks in advance!

vangthao added inline comments.Feb 14 2023, 2:50 PM
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll
26

There seems to be a regression here.

zhuhan0 added inline comments.Feb 14 2023, 3:40 PM
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll
26

Hi, this is actually the result of a bug fix that we happened to put in in this diff (line 2001 - 2028 in the new version), not related with the optimization change. There was a bug where RankVectorTypes which is not an equivalence relation but used in std::unique. And when a non-equivalence relation is used, the behavior is undefined. Take test_zeroinit as an example, there are two candidates <4 x i32> and <8 x i16> during isVectorPromotionViable. But due to this bug, <8 x i16> was removed by CandidateTys.erase and <4 x i32> is not viable so the transformation failed unexpectedly. I could separate the bug fix if that's better. On the other hand, I don't think this bitcast addition is going to show up in the end after backend lowering anyways.

mingmingl added inline comments.
llvm/lib/Transforms/Scalar/SROA.cpp
1952–1959

(Take the liberty to chime in for my understanding)
Here Ty is inserted even if it doesn't have the exact size with P. Does it mean (in this patch) that slices (S) are considered for vector SROA even if it has different size than P?

zhuhan0 added inline comments.Feb 14 2023, 4:38 PM
llvm/lib/Transforms/Scalar/SROA.cpp
1952–1959

This Ty is not a candidate type. It's a type that shows up in load or store instructions and it'll be used later for considering other types. For example, if one Ty is i16 and there is one candidate type <2 x i32>, the inner loop will be able to add another candidate type <4 x i16>. Slices are still only considered if they have the same size as P, because only two places insert candidate types and they both enforce such requirement (line 1954 - 1955 and 1969 - 1970.

mingmingl added inline comments.Feb 14 2023, 10:00 PM
llvm/lib/Transforms/Scalar/SROA.cpp
1952–1959

many thanks for the example! Running this over https://godbolt.org/z/1rqGd9z3E and was able to understand how it helps (basically having more candidates vector types for checkVectorTypeForPromotion, not only the vector types seen in original partition slices).

zhuhan0 updated this revision to Diff 498169.Feb 16 2023, 3:26 PM

Separate RankVectorTypes bug fix into another diff.

MatzeB accepted this revision.Mar 6 2023, 4:30 PM

There is a curious change in types for test_memset, test_array_vector and test_array_vector2.

This appears to stem from the code added in https://github.com/llvm/llvm-project/commit/529eafd9beff233ba8debfc73e0b5c04cac36835 which I don't fully understand the intention right now. @lebedev.ri do you think it is good (or at least not bad) to use 8 x i16 in those tests?

Either way this seems outside the scope of this diff to discuss as it is pre-existing code. The change LGTM.

This revision is now accepted and ready to land.Mar 6 2023, 4:30 PM

Seems there's no objection. Landing this one.

This is causing crashes when building Skia. See https://reviews.llvm.org/D141188#4179444

This is causing crashes when building Skia. See https://reviews.llvm.org/D141188#4179444

Sorry about that. Do you have steps to repro?

This is causing crashes when building Skia. See https://reviews.llvm.org/D141188#4179444

Sorry about that. Do you have steps to repro?

Here are the files that clang diagnostic created for me:
https://drive.google.com/file/d/1GAQhpwv2UYwJYdVO9L5GHLbC244W37FO/view?usp=sharing

You'll have to tweak paths in the .sh file, but that should be enough to reproduce on your end.

This is causing crashes when building Skia. See https://reviews.llvm.org/D141188#4179444

Sorry about that. Do you have steps to repro?

Here are the files that clang diagnostic created for me:
https://drive.google.com/file/d/1GAQhpwv2UYwJYdVO9L5GHLbC244W37FO/view?usp=sharing

You'll have to tweak paths in the .sh file, but that should be enough to reproduce on your end.

Thanks! Fixed with https://github.com/llvm/llvm-project/commit/68a07c156e90e76a2129308fe28815e709276424.

dyung added a comment.Mar 9 2023, 6:57 PM

Hi @zhuhan0 one of our internal tests started to fail and I bisected it back to your change. I've put details and repro up in issue 61318, can you take a look?

dyung added a comment.Mar 9 2023, 7:17 PM

Hi @zhuhan0 one of our internal tests started to fail and I bisected it back to your change. I've put details and repro up in issue 61318, can you take a look?

Nevermind, I see this is the same issue I previously filed as issue #57821 which you believe to be due to undefined behavior in the spec. I'll have to discuss this further internally to see if perhaps an update to the test is warranted.

Hi @zhuhan0 one of our internal tests started to fail and I bisected it back to your change. I've put details and repro up in issue 61318, can you take a look?

Nevermind, I see this is the same issue I previously filed as issue #57821 which you believe to be due to undefined behavior in the spec. I'll have to discuss this further internally to see if perhaps an update to the test is warranted.

Hi yes I have the investigation details in the old issue. Let me know if there's any question.