This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine][SSE] Demanded vector elements for scalar intrinsics
ClosedPublic

Authored by RKSimon on Feb 21 2016, 8:56 AM.

Details

Summary

This patch improves support for determining the demanded vector elements through SSE scalar intrinsics:

1 - recognise that we only need the lowest element of the second input for binary scalar operations (and all the elements of the first input)

2 - recognise that the roundss/roundsd intrinsics use the lowest element of the second input and the remaining elements from the first input

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 48623.Feb 21 2016, 8:56 AM
RKSimon retitled this revision from to [InstCombine][SSE] Demanded vector elements for scalar intrinsics.
RKSimon updated this object.
RKSimon added reviewers: majnemer, craig.topper, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
spatel edited edge metadata.Apr 18 2016, 10:39 AM

4 - addss gets simplified to a fadd call if we aren't interested in the pass through elements (can we do this for div/sqrt?)

I think so - please add a TODO comment. But this change really should be a separate patch review because it has FP exception behavior implications, doesn't it?

I'm happy to split this into multiple commits if anyone thinks its necessary.

Yes, this should definitely be split up for committing for easier tracking and reversion if needed. I'd argue that it should be done for review purposes too as there are so many test cases to consider.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1243–1258 ↗(On Diff #48623)

Don't these also demand the lowest element of the first input vector?

1275–1278 ↗(On Diff #48623)

Instead of returning immediately, isn't it more efficient to fall-through to the check of the 2nd arg? Ie, replace both args in one shot if possible. That's what the code in SimplifyDemandedVectorElts() is doing if I'm reading it correctly.

Cheers Sanjay, I'll split off the InstCombineSimplifyDemanded.cpp diffs into a separate patch

RKSimon updated this revision to Diff 54186.Apr 19 2016, 6:33 AM
RKSimon updated this object.
RKSimon edited edge metadata.

Split based on Sanjay's comments

RKSimon added inline comments.Apr 19 2016, 6:37 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
1374–1389 ↗(On Diff #54186)

For the binary scalar intrinsics we need all the elements of the first input: the lowest is used in the operation and the remaining are all 'passed through' to the result.

1406–1409 ↗(On Diff #54186)

I've included the 'MadeChange' bool so we can combine both in one pass. I've also done this in the existing similar COMI/UCOMI combine above.

spatel accepted this revision.Apr 19 2016, 9:09 AM
spatel edited edge metadata.

LGTM - see inline for 1 minor change to the patch itself.

The rest of the comments are just 'thinking out loud' sort of questions so we have a paper trail for how we got here.
On that note, if you know of bug reports that track any of this, please do include them in the commit message.
https://llvm.org/bugs/show_bug.cgi?id=22206 is one for the sqrt intrinsic for whenever that part of the patch set lands.

lib/Transforms/InstCombine/InstCombineCalls.cpp
1374–1389 ↗(On Diff #54186)

Got it - it was the code comment that threw me off. Please change to something like:
"These intrinsics demand all elements of the first input because the high elements of that input are passed through. The low elements of both inputs are used for the actual binary op."

test/Transforms/InstCombine/x86-sse.ll
134–136 ↗(On Diff #54186)

The next patch will delete these extra inserts?

test/Transforms/InstCombine/x86-sse2.ll
72–78 ↗(On Diff #54186)

This case is, "If a tree falls in the IEEE754 woods..." right? Ie, with fast-math, we zap the whole thing, but I don't think we can under default settings.

105–118 ↗(On Diff #54186)

This isn't part of this patch of course, but this isn't always right, is it? See previous comment about default behavior under IEEE754. To maintain correct exception behavior, we should have subtracted %b from %a. But I suppose this is ok because Clang/LLVM don't support changing the FP env?

The patch that converts the intrinsic to regular IR will allow this to become "ret double 1.0"?

This revision is now accepted and ready to land.Apr 19 2016, 9:09 AM
RKSimon added inline comments.Apr 20 2016, 5:21 AM
test/Transforms/InstCombine/x86-sse.ll
134–136 ↗(On Diff #54186)

Yes D19318 handles those.

test/Transforms/InstCombine/x86-sse2.ll
105–118 ↗(On Diff #54186)

Yes, this is what D19318 is handling - we shouldn't call the scalar intrinsics if we don't actually use the result and the demanded-elts logic has nixed the lowest element inputs.

This revision was automatically updated to reflect the committed changes.