This is an archive of the discontinued LLVM Phabricator instance.

[x86] fix mappings of cvttp2si x86 intrinsics to x86-specific nodes and isel patterns (PR37551)
ClosedPublic

Authored by craig.topper on Jun 10 2018, 11:09 AM.

Details

Summary

The tests in:
https://bugs.llvm.org/show_bug.cgi?id=37751
...show miscompiles because we wrongly mapped and folded x86-specific intrinsics into generic DAG nodes.

This patch corrects the mappings in X86IntrinsicsInfo.h and adds isel matching corresponding to the new patterns. The complete tests for the failure cases should be in avx-cvttp2si.ll and sse-cvttp2si.ll (let me know if you see any missing possibilities/targets). This could be 3 independent reviews/commits if that's preferred.

We mentioned reordering the existing defs in PR37751, but I've postponed that potential change because we might prefer the way we're handling the codegen for AVX512 as shown here. In particular, we're seeing improvements in avx512vl-intrinsics-upgrade.ll because we're able to fold the 2 conversion calls together and then use a simpler move+mask. This is because I didn't exclude AVX512 from the new tablegen patterns, but the existing defs have 'NoVLX' predicates. I don't know AVX512, so I'm not sure if the existing defs are ideal or not.

Diff Detail

Event Timeline

spatel created this revision.Jun 10 2018, 11:09 AM

For avx512f I've thought for a little that we may want to add one use checks to the masked patterns for avx512 for the longer latency/low throughput instructions.

spatel added inline comments.Jun 10 2018, 11:24 AM
test/CodeGen/X86/avx-cvttp2si.ll
19–20

Missing load folding for AVX512?

craig.topper added inline comments.Jun 10 2018, 11:32 AM
test/CodeGen/X86/avx-cvttp2si.ll
19–20

Looks like we're missing isel folding patterns with loads for both avx and avx512. But avx is being fixed by peephole. AVX512 isn't because the load folding tables are incomplete.

RKSimon added inline comments.Jun 10 2018, 11:49 AM
test/CodeGen/X86/sse-cvttp2si.ll
7

I don't think they are affected, but please can you add the cvttss2si(64)/cvttsd2si(64) test cases as well to this and the avx file?

spatel updated this revision to Diff 150745.Jun 11 2018, 7:02 AM
spatel marked an inline comment as done.

Patch updated:
Rebased after adding SSE scalar intrinsics with rL334404. There are no extra AVX1/2 intrinsics of this sort AFAICT, but let me know if I've missed something. As expected, this patch doesn't change any of the new scalar tests.

@craig.topper What do you want to do for the avx512 equivalents in X86IntrinsicsInfo.h

e.g.

X86_INTRINSIC_DATA(avx512_mask_cvttpd2dq_128, INTR_TYPE_1OP_MASK, X86ISD::CVTTP2SI, 0),
X86_INTRINSIC_DATA(avx512_mask_cvttpd2dq_512, INTR_TYPE_1OP_MASK, ISD::FP_TO_SINT, X86ISD::CVTTP2SI_RND),
X86_INTRINSIC_DATA(avx512_mask_cvttpd2qq_128, INTR_TYPE_1OP_MASK, ISD::FP_TO_SINT, 0),
X86_INTRINSIC_DATA(avx512_mask_cvttpd2qq_256, INTR_TYPE_1OP_MASK, ISD::FP_TO_SINT, 0),
X86_INTRINSIC_DATA(avx512_mask_cvttpd2qq_512, INTR_TYPE_1OP_MASK, ISD::FP_TO_SINT, X86ISD::CVTTP2SI_RND),

I think I've fixed the missing load folding in r334460.

For avx512 I think we should also update those table entries to use X86ISD::CVTTP2SI

spatel updated this revision to Diff 150998.Jun 12 2018, 12:40 PM

Patch updated:
Rebased after load folding fixes.

I think I've fixed the missing load folding in r334460.

For avx512 I think we should also update those table entries to use X86ISD::CVTTP2SI

Is it correct that those updates would be independent of the current patch? If so, I'd like to move ahead with this part as-is.

I don't know anything about AVX512, so I don't want to jeopardize these fixes (assuming they're good) with more diffs.

Alternatively, if someone wants to commandeer this and add the AVX512 part, I'll gladly go hack on other stuff. :)

craig.topper commandeered this revision.Jun 12 2018, 1:17 PM
craig.topper edited reviewers, added: spatel; removed: craig.topper.

I'll fix avx512 including fp_to_uint usages which I assume have the same bug.

-Use X86cvttp2si and X86cvttp2usi in the instruction patterns. Add fp_to_sint/fp_to_uint as Pat patterns.
-Add NoVLX to the new patterns, this means we go back to duplicating the operation when we have masked and unmasked version in the same block. But that's a general problem and not something we should tackle here.
-Updated for new test file that tests all the avx512 specific intrinsics that are subject to the same bug.

I'm considering a follow up to replace fp_to_sint/fp_to_uint with X86ISD nodes in PreprocessISelDAG to cut down the number of patterns(and add the missing masked patterns) while still giving DAG combine plenty of freedom.

spatel added inline comments.Jun 13 2018, 7:30 AM
test/CodeGen/X86/avx512-cvttp2i.ll
7–22 ↗(On Diff #151045)

I'm going to need new glasses after this review.

There are 16 intrinsics here, but there are 17 code diffs in the intrinsics header.
Are we missing tests for:
declare <8 x i32> @llvm.x86.avx512.mask.cvttpd2dq.512(<8 x double>, <8 x i32>, i8, i32)

RKSimon added inline comments.Jun 13 2018, 8:13 AM
test/CodeGen/X86/mmx-cvt.ll
159

You can fix this regression by updating the fp_to_sint pattern at the bottom of X86InstrMMX.td to take X86cvttp2si

Add additional MMX pattern. Update for the 17th avx512 intrinsic

spatel accepted this revision.Jun 13 2018, 2:18 PM

LGTM.

Might want to get a 2nd opinion on 512 though since I'm not as familiar with that.

This revision is now accepted and ready to land.Jun 13 2018, 2:18 PM
This revision was automatically updated to reflect the committed changes.