This is an archive of the discontinued LLVM Phabricator instance.

[SLPVectorizer] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles
Needs ReviewPublic

Authored by dtemirbulatov on Sep 8 2017, 4:08 PM.

Details

Summary

This is the second part of PR21780 fix, here is part one https://reviews.llvm.org/D37579

This change allows to restore Load, Insert, GEP instructions removed by InstCombine pass. Here I am using intrinsic to keep information in the IR as it was discussed in http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html.

Before restoring instructions we have have to check existing instructions for belonging to our basic block, also we check the shuffle mask and all offset, ...

Diff Detail

Event Timeline

dtemirbulatov created this revision.Sep 8 2017, 4:08 PM
hfinkel edited edge metadata.Sep 8 2017, 4:59 PM

Thanks for working on this. Please post an RFC on llvm-dev about the intrinsic and how you intend to use it. You can reference this review, but we should make sure that we have design consensus before proceeding here.

RKSimon edited edge metadata.Sep 10 2017, 3:50 AM

Some more style comment as with D37579

lib/Transforms/Vectorize/SLPVectorizer.cpp
4223

clang-format - braces

5685

It'd be much clearer to early out.

if (!C || C->getZExtValue() > MaxOffset)
  return nullptr;

// We found the first InsertElem in this chain.
uint64_t Offset = C->getZExtValue();
...
5738
for (unsigned i = 0, e = MaxOffset + 1; i < e; i++)
5749
if (!Use || Use->getParent() != BB)
5773
for (unsigned i = 0, e = MaxOffset + 1; i < e; i++)
5783

You're accessing GEPs[i - 1] multiple times - pull out?

Thanks for working on this. Please post an RFC on llvm-dev about the intrinsic and how you intend to use it. You can reference this review, but we should make sure that we have design consensus before proceeding here.

Here is the thread: http://lists.llvm.org/pipermail/llvm-dev/2017-September/117381.html

Fix review remarks and add an aggregate pointer as intrinsic's second parameter, remove getNextNode() usage.

Thanks for working on this. Please post an RFC on llvm-dev about the intrinsic and how you intend to use it. You can reference this review, but we should make sure that we have design consensus before proceeding here.

Here is the thread: http://lists.llvm.org/pipermail/llvm-dev/2017-September/117381.html

Yes. Are you planing on updating that thread? Last message I see says:

but I am looking now at "speculation_marker" metadata and I am still not sure how to implement it better.

are you still looking at some other kind of metadata solution? Did you mean for "metadata" here to mean the intrinsic you're proposing?

are you still looking at some other kind of metadata solution? Did you mean for "metadata" here to mean the intrinsic you're proposing?

No, I am happy with intrinsic solution.

reames added a subscriber: reames.Sep 26 2017, 11:11 AM
reames added inline comments.
test/Transforms/SLPVectorizer/X86/pr21780.ll
23

I think something might be missing here. You're forming a 4x wide load, but you've only proven dereferenceability for offsets 0, 1, 3. (i.e. not 2). How do we know it's safe to dereference between the two elements 1 & 3?

dtemirbulatov added a comment.EditedSep 26 2017, 3:48 PM

I think something might be missing here. You're forming a 4x wide load, but you've only proven dereferenceability for offsets 0, 1, 3. (i.e. not 2). How do we know it's safe to dereference between the two elements 1 & 3?

well, it is a maximum reference that was recorded there in InstCombine, that implies 0, 1, 2 and 3. And if you are saying that 0, 1, 3 dereferenceable and 2-nd is not at the same time then this solution is not correct.

Update for solution, here it uses metadata "speculation.marker" with set of offsets instead of maximum offset as before, thread for discussion http://lists.llvm.org/pipermail/llvm-dev/2017-September/117782.html.

Update for solution, here it uses metadata "speculation.marker" with set of offsets instead of maximum offset as before, thread for discussion http://lists.llvm.org/pipermail/llvm-dev/2017-September/117782.html.

Can you please update the thread with your new plan for this metadata (what adds it, when, and what consumes it), including the proposed definition for the LangRef, so we can all see exactly what you're proposing?

Add LangRef.rst change. Small changes in SLPVectorizer.cpp.

RKSimon added inline comments.Oct 26 2017, 2:32 AM
docs/LangRef.rst
4893

"consists"

4894

Is it a set or a min/max range? The examples only ever have 2 entries. If a set do they have to be sorted?

4896

"is to keep track of dereferanceable memory locations after load operations to that memory were deleted, as it might be beneficial for future optimizations."

include/llvm/IR/LLVMContext.h
104

// alignment

include/llvm/Transforms/Vectorize/SLPVectorizer.h
154

Unnecessary whitespace change

Rebase and fix remarks.

Update description in LangRef.rst.

@dtemirbulatov What is the status of this patch please?

@dtemirbulatov What is the status of this patch please?

@RKSimon I think it needs review. I will rebase proposed change.

Abandon this? I think this is being handled by the (WIP) Attributor work to support dereference

Abandon this? I think this is being handled by the (WIP) Attributor work to support dereference

no, I think I can rebase the change.

Abandon this? I think this is being handled by the (WIP) Attributor work to support dereference

no, I think I can rebase the change.

I suspect @RKSimon is might be right here, it may be better to redesign the feature with nowadays best practices in mind (attributor).

RKSimon added a subscriber: jdoerfert.

Adding @jdoerfert who can probably advise regarding whether Attributor will handle all of this

With D65402 (and later D65593), the Attributor will determine dereferenceability based on accesses that "must be executed".
Since the Attributor runs early in the pipeline (for now only then), this will "generally" solve the issue as unused loads are still present.
However, we lack the ability to annotate dereferenceable for pointers that are not arguments, function return values, call site arguments, or call site return values.
That means, we might determine dereferenceability early but fail to manifest it in the IR or loose the information during inlining. Though, I am unsure how much of
a problem that really is. If it turns out to be bad, I'd suggest to add an intrinsic to describe dereferenceability (among other things [alignment, max object size, ...]) of a pointer.

Long story short, I'd recommend investing the time in reviewing and testing the Attributor infrastructure and potentially thinking about a pointer property intrinsic.

If I misunderstood what is going on here, please let me know.