This is an archive of the discontinued LLVM Phabricator instance.

[x86] vectorize cast ops in lowering to avoid register file transfers
ClosedPublic

Authored by spatel on Jan 17 2019, 9:30 AM.

Details

Summary

The proposal in D56796 may cross the line because we're trying to avoid vectorization transforms in generic DAG combining. So this is an alternate, later, x86-specific translation of that patch.

I've avoided all potentially controversial transforms such as extraction from a non-zero element of a vector, so all test diffs here are a clear win AFAIK.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Jan 17 2019, 9:30 AM
spatel marked an inline comment as done.Jan 17 2019, 9:36 AM
spatel added inline comments.
test/CodeGen/X86/vec_int_to_fp.ll
5685–5686 ↗(On Diff #182314)

Not sure yet how this case became a "shuffle first and extract from 0 element", but we probably want to do that more generally to enable this transform more often.

We still seem to be missing many x86_64 cases?

We still seem to be missing many x86_64 cases?

In the knownbits tests, there are other problems. We have some other op(s) sitting between the extract and convert. Some of that would be fixed with a change for PR39975:
https://bugs.llvm.org/show_bug.cgi?id=39975
...but we may need more patches before that to avoid yet more regressions.

Do we have a SSE2/AVX1 cvtdq2pd test case?

spatel marked an inline comment as done.Jan 21 2019, 10:50 AM

Do we have a SSE2/AVX1 cvtdq2pd test case?

See inline comment for 'vec_int_to_fp.ll: extract0_sitofp_v4i32_f64().' Let me know if you're thinking of a different pattern.

test/CodeGen/X86/vec_int_to_fp.ll
5581 ↗(On Diff #182314)

We miss this with SSE because the v4f64 type is not legal. We need to add another check to allow conversion to v2f64 directly if we're extracting from the zero or low elements of a 128-bit source vector.

spatel marked an inline comment as done.Jan 24 2019, 5:55 PM
spatel added inline comments.
test/CodeGen/X86/vec_int_to_fp.ll
5581 ↗(On Diff #182314)

I have an ugly draft of a patch that would handle that case. It requires that we produce a X86ISD::CVTSI2P node rather than the generic SINT_TO_FP and that we return/adjust the destination type rather than assuming it's a vector with the same number of elements.

I'd prefer to do that in a follow-up commit to reduce risk (assuming we're ok with this general direction).

RKSimon added inline comments.Jan 25 2019, 12:20 AM
lib/Target/X86/X86ISelLowering.cpp
17412 ↗(On Diff #182314)

They're less common but add a TODO about smaller (extended) integer types?

17415 ↗(On Diff #182314)

Is the one use necessary? This combine should replace the scalar conversion with a vector, whether there are other uses of the scalar isn't necessarily relevant (but maybe extra instructions if we support shuffles in the future)?

test/CodeGen/X86/vec_int_to_fp.ll
5685–5686 ↗(On Diff #182314)

SSE2 only supports extractelement from index #0 so the shuffle gets added to move the element there.

spatel marked 3 inline comments as done.Feb 5 2019, 4:26 PM
spatel added inline comments.
lib/Target/X86/X86ISelLowering.cpp
17415 ↗(On Diff #182314)

Yes, I was being conservative, but avoiding the register transfer is probably still enough to justify the transform regardless of other uses.

We don't have any existing regression tests to cover that pattern, so I added some with rL353249

spatel updated this revision to Diff 185438.Feb 5 2019, 4:28 PM
spatel marked an inline comment as done.

Patch updated:

  1. Added TODO for handling smaller integers.
  2. Removed one-use restriction (still looks worthwhile to avoid the transfer).
RKSimon accepted this revision.Feb 6 2019, 3:12 AM

LGTM thanks

This revision is now accepted and ready to land.Feb 6 2019, 3:12 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 6:59 AM