This is an archive of the discontinued LLVM Phabricator instance.

[x86] enhance matching of pmaddwd
ClosedPublic

Authored by spatel on Mar 29 2021, 12:12 PM.

Details

Summary

This was crashing with the example from:
https://llvm.org/PR49716
...and I hopefully stubbed that out with a283d7258360 , but as we can see from the SSE vs. AVX code, I think we need to try harder to match the pattern.

This matcher code was adapted from another pmadd pattern match in D49636, but it needs different ops to deal with size mismatches.

Diff Detail

Event Timeline

spatel created this revision.Mar 29 2021, 12:12 PM
spatel requested review of this revision.Mar 29 2021, 12:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2021, 12:12 PM

This seems reasonable. Is it possible we also need to handle the case where all the extract indices from a different subvector of the input? It's hard to say from the test case that we started from since it appears to have a logic mistake that exposed the bug in the first place. It used "sizeof sizeof(r) in its loop control which caused the loop bounds to be 4. Seems like it should have been sizeof(r)/sizeof(int16_t).

This seems reasonable. Is it possible we also need to handle the case where all the extract indices from a different subvector of the input? It's hard to say from the test case that we started from since it appears to have a logic mistake that exposed the bug in the first place. It used "sizeof sizeof(r) in its loop control which caused the loop bounds to be 4. Seems like it should have been sizeof(r)/sizeof(int16_t).

Right - that source looks unintended.
We could go further in matching, but it gets harder to keep track of the indices and then calculate the offsets, and it doesn't seem as likely to come up in practice. This case seemed plausible from valid source even (!), so that's why I figured we should try to get it.

craig.topper accepted this revision.Mar 29 2021, 1:40 PM

This seems reasonable. Is it possible we also need to handle the case where all the extract indices from a different subvector of the input? It's hard to say from the test case that we started from since it appears to have a logic mistake that exposed the bug in the first place. It used "sizeof sizeof(r) in its loop control which caused the loop bounds to be 4. Seems like it should have been sizeof(r)/sizeof(int16_t).

Right - that source looks unintended.
We could go further in matching, but it gets harder to keep track of the indices and then calculate the offsets, and it doesn't seem as likely to come up in practice. This case seemed plausible from valid source even (!), so that's why I figured we should try to get it.

I'm inclined to agree. So this patch LGTM.

This revision is now accepted and ready to land.Mar 29 2021, 1:40 PM
This revision was automatically updated to reflect the committed changes.