This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles
Needs RevisionPublic

Authored by dtemirbulatov on Sep 7 2017, 11:29 AM.

Details

Summary

This is a first part of PR21780 fix.

This change allows to keep information about removed Load instruction by InstCombine pass in order to restore those later. Here I am using Intrinsics to keep information in the IR as it was discussed in http://lists.llvm.org/pipermail/llvm-dev/2017-July/115730.html. Before that, I used metadata for that purpose https://reviews.llvm.org/D35139.

I introduced phantom_mem intrinsic that consist of two parameters the first is pointer and the second is maximum offset for that pointer. In the Instcombine pass, while constructing such intrisics, I try to avoid case if GEP's origin is in a different basic block and GEP itself is in a loop, but it is ok if both GEP and the pointer origin are in the loop and both belong to the same basic block.

Diff Detail

Event Timeline

dtemirbulatov created this revision.Sep 7 2017, 11:29 AM
RKSimon edited edge metadata.Sep 10 2017, 3:42 AM

Some style comments as the poor bracing/indentation makes it difficult to grok - more thorough review needs to wait until you've addressed @hfinkel's comments on D37648 about phantom mem intrinsics

lib/Transforms/InstCombine/InstructionCombining.cpp
1521

clang-format this - your indentation is broken.

2990

clang-format - braces and indentation are really bad

sanjoy added a subscriber: sanjoy.Sep 12 2017, 2:28 PM

Why not leave the original load in place, and tag the load metadata like !speculation_marker ? In either case, this needs a LangRef entry.

Why not leave the original load in place, and tag the load metadata like !speculation_marker ? In either case, this needs a LangRef entry.

Re-reading, I messed up the phrasing there. I meant to say two things:

  • You could consider leaving the original load in place, and tag it with something like !speculation_marker. This metadata would indicate to passes like DCE that said load should not be eliminated till the very end of the optimization pipeline (so that we keep the optimization information for as long as possible). This does block optimizations though (like you will not be able to sink stores across this now present load, something you may have been able to do with the intrinsic you're introducing, though not without figuring out a way to denote it as not reading memory), but it has the upside of not introducing a new intrinsic.
  • Irrespective of whether you decide to go with the intrinsic or the metadata approach, please also update the language reference with an entry describing the semantics of this new thing.

Some style comments as the poor bracing/indentation makes it difficult to grok - more thorough review needs to wait until you've addressed @hfinkel's comments on D37648 about phantom mem intrinsics

I think I did, it is here http://lists.llvm.org/pipermail/llvm-dev/2017-September/117381.html

Why not leave the original load in place, and tag the load metadata like !speculation_marker ? In either case, this needs a LangRef entry.

Re-reading, I messed up the phrasing there. I meant to say two things:

  • You could consider leaving the original load in place, and tag it with something like !speculation_marker. This metadata would indicate to passes like DCE that said load should not be eliminated till the very end of the optimization pipeline (so that we keep the optimization information for as long as possible). This does block optimizations though (like you will not be able to sink stores across this now present load, something you may have been able to do with the intrinsic you're introducing, though not without figuring out a way to denote it as not reading memory), but it has the upside of not introducing a new intrinsic.
  • Irrespective of whether you decide to go with the intrinsic or the metadata approach, please also update the language reference with an entry describing the semantics of this new thing.

Thanks, for me, implementation with intrinsic looks better since it is detached from the actual instruction, since later this instruction might end up in a loop, etc.

Fix review remarks and add an aggregate pointer intrinsic's second parameter.

Fix recordDeadLoad() parameter lining with clang-format.

add missed test in the last change.

Update for solution, here it uses metadata "speculation.marker" with set of offsets instead of maximum offset as before, thread for discussion http://lists.llvm.org/pipermail/llvm-dev/2017-September/117782.html.

hfinkel edited edge metadata.Oct 25 2017, 7:52 AM

See my comment in D37648.

Add LangRef.rst change, update test to check metadata values.

Rebase and fix remarks.

Update description in LangRef.rst.

reames requested changes to this revision.Nov 27 2017, 11:28 AM
reames added a subscriber: reames.

Initial round of comments focused on the metadata semantics only. I completely skipped the instcombine changes.

You also need verifier tests.

docs/LangRef.rst
4919

This needs a name change.

We already have dereferenceable attribute which marks arguments and return values as being *globally* deferenceable. This is marking a memory location as being dereferenceable at a particular contextual location.

Maybe: dereferenceable_offsets?

4935

More information on the format is needed. Specific missing pieces:

  • what do both fields mean?
  • i64 or any constant?
  • signed or unsigned?
  • is it a list?

I'd also suggest requiring sorting for easy of access.

test/Transforms/InstCombine/speculation_marker.ll
4

Just to comment: in your test case, we can prove that the loads post dominate the entry to the function and could update the argument with the existing dereferenceability attribute. This might be an alternate approach and separately worth implementation.

This revision now requires changes to proceed.Nov 27 2017, 11:28 AM
RKSimon added a subscriber: jdoerfert.

Adding @jdoerfert who can probably advise regarding whether Attributor will handle all of this

sanjoy removed a subscriber: sanjoy.Sep 2 2019, 9:20 AM

Adding @jdoerfert who can probably advise regarding whether Attributor will handle all of this

First, sorry for not reacting sooner.


Long story short, I would prefer a more generic solution.

The way I understand this patch is that it tries to "remember" knowledge we are about to loose due to a transformation, correct? (If not, my next paragraphs are probably way off.)

To answer the Attributor question:
The Attributor will tie dereferenceable and nonnull knowledge to arguments and call site return values (especially the latter is not properly working due to some stuff I don't want to get into).
That means, if the base pointer is an argument (like in the test cases) and the accesses are in the must-be-executed-context of the argument (like in the test cases) it will work. If either is
not the case, it won't. That is the problem I would fix.

A similar situation arose recently in D69477. While we got a temporary solution in there, I think my response to it is adequate here as well https://reviews.llvm.org/D69477#1724399 .
I am in contact with @Tyker on this but I haven't found the time to write an RFC. If more people are interested in a generic way to preserve information I'd be happy to share my thoughts so far an help on a design.

To give you an idea, I was currently thinking about something along the lines of: llvm.assume() ["derefereceable"(%p, 8), "derefereceable"(%gep, 4), "align"(%p, 16) ]