Page MenuHomePhabricator

[InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
ClosedPublic

Authored by RKSimon on Sep 28 2020, 7:56 AM.

Details

Summary

Complete PR46895 by refactoring D87452/D88402 to allow us to match non-uniform constant values.

Diff Detail

Event Timeline

RKSimon created this revision.Sep 28 2020, 7:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2020, 7:56 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon requested review of this revision.Sep 28 2020, 7:56 AM
nikic added a comment.EditedSep 28 2020, 9:40 AM

To handle the various undef cases this ended up being a lot bulkier than I hoped - I'd love to hear if anyone has any suggestions on how we could use ConstantExpr/PatternMatch to simplify this as I imagine we could use such functionality in other places as we improve non-uniform vector support.

Would something along the lines of ConstantExpr::getAdd(LC, RC) and then matching the result against a Width splat work? As C + undef already gets folded to undef.

I did wonder if a ConstantExpr::isUndef() help could be added to return a bool/vectorbool that could be then used to select the 'matching' values (e.g. undef + 1 -> 31 + 1 for i32 rotate).

RKSimon planned changes to this revision.Sep 30 2020, 2:12 PM
RKSimon updated this revision to Diff 296914.Oct 8 2020, 3:36 AM
RKSimon retitled this revision from [InstCombine] matchRotate - support non-uniform constant vector rotation amounts (PR46895) to [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895).
RKSimon edited the summary of this revision. (Show Details)
RKSimon removed a reviewer: majnemer.

I've removed the undef handling from this patch - we can revisit that later once we have a better idea of how to achieve it (see D88687).

This revision is now accepted and ready to land.Oct 8 2020, 3:42 AM

It is a miscompile--or, at least a fingerprint is being miscomputed. The failure check is here:

https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/hrss/hrss_test.cc#165

But that is obviously somewhat removed from where things actually went wrong, which is what I'm (rather slowly, unfortunately) trying to figure out.

It is a miscompile--or, at least a fingerprint is being miscomputed. The failure check is here:

https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/hrss/hrss_test.cc#165

But that is obviously somewhat removed from where things actually went wrong, which is what I'm (rather slowly, unfortunately) trying to figure out.

I think there were some related fixes afterwards. Is the issue still present in master?

MaskRay added subscribers: nemanjai, MaskRay.EditedOct 16 2020, 10:35 AM

It is a miscompile--or, at least a fingerprint is being miscomputed. The failure check is here:

https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/hrss/hrss_test.cc#165

But that is obviously somewhat removed from where things actually went wrong, which is what I'm (rather slowly, unfortunately) trying to figure out.

I think there were some related fixes afterwards. Is the issue still present in master?

The powerpc64le issue is still present as of 51ff04567b2f8d06b2062bd3ed72eab2e93e4466.

Adding @nemanjai in case he can find anything obvious from this commit or D88834

Here is a relatively simple, self-contained testcase. Of interest is that the entire loop needs to be present, and completely unwound for this bug to show. It looks to me like either next_carry or carry is mis-calculated or stale on the ninth iteration which fills the second-to-last element.

Failed
    Correct          Actual
 0: c71c71c71c71c9fc   c71c71c71c71c9fc
 1: 5b91f6e791f6e791   5b91f6e791f6e791
 2: 6000000000000005   6000000000000005
 3: 9e79e79e79e79ed8   9e79e79e79e79ed8
 4: 4bf2c78af2c7816e   4bf2c78af2c7816e
 5: a53d3d3d3d3d3d3d   a53d3d3d3d3d3d3d
 6: 7555555555555558   7555555555555558
 7: cf3cf3cf3cf3cfcd   cf3cf3cf3cf3cfcd
 8: d3cfa5effb5effcf   d3cfa5effb5effcf
 9: 0000000000000000   0000000000000001
10: 0000000000000000   0000000000000000