This is an archive of the discontinued LLVM Phabricator instance.

Convert a masked.load of a dereferenceable address to an unconditional load
ClosedPublic

Authored by reames on Mar 22 2019, 9:56 AM.

Details

Summary

If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.

Diff Detail

Event Timeline

reames created this revision.Mar 22 2019, 9:56 AM
reames updated this revision to Diff 191903.Mar 22 2019, 9:57 AM
reames added a reviewer: spatel.

missed a todo

I'm not really an expert in InstCombine but it looks good to me. Someone else should probably take a look as well. I added some minor comments though.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1169

Isn't that implemented below (in this patch)?

1247

I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base.

test/Transforms/InstCombine/masked_intrinsics.ll
211

The zero here is the lane, correct? It looks "like a special case", even if it's not.

Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part.

Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask.

These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.

reames planned changes to this revision.Apr 22 2019, 10:18 AM
reames marked 3 inline comments as done.

These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.

You can view them that way, or not. I was looking at this as "what was needed to decompose a single element gather from a dereferenceable address", but I can see your point. I'll split.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1169

Nope, that's the gather case. Analogous, but distinct intrinsics.

1247

I'm happy to change to explicit types, which did you find non-obvious?

test/Transforms/InstCombine/masked_intrinsics.ll
211

Will do.

reames updated this revision to Diff 196105.Apr 22 2019, 11:32 AM
reames retitled this revision from Optimize masked.loads and masked.gathers with a single active lane to Convert a masked.load of a dereferenceable address to an unconditional load.
reames edited the summary of this revision. (Show Details)

Split the patch. This review is now only the masked.load part.

spatel accepted this revision.Apr 22 2019, 2:13 PM

LGTM. This canonicalization is reverse of the typical (shorter code is preferred), but the reasoning in the summary makes sense: we have lots of IR optimizer logic for selects, and this should be reversible in the backend.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1183

typo: sensative

1187

I think LLVM standards prefer to make this an explicit "Value *" rather than "auto *":
http://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable

This revision is now accepted and ready to land.Apr 22 2019, 2:13 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2019, 8:25 AM