This is an archive of the discontinued LLVM Phabricator instance.

[X86] Attempt to pre-truncate arithmetic operations if useful
ClosedPublic

Authored by RKSimon on Jan 2 2017, 3:15 PM.

Details

Summary

In some cases it's more efficient to combine TRUNC( BINOP( X, Y ) ) --> BINOP( TRUNC( X ), TRUNC( Y ) ) if the binop is legal for the truncated types.

This is true for vector integer multiplication (especially vXi64), as well as ADD/AND/XOR/OR in cases where we only need to truncate one of the inputs at runtime (e.g. a duplicated input or a one-use constant we can fold).
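
As a scalar illustration of why the fold is value-preserving, here is a minimal C++ sketch, not code from the patch. It assumes a 64-bit to 32-bit truncation and unsigned wrap-around arithmetic; the names BinOp, applyBinOp, truncAfter and truncBefore are hypothetical and are not LLVM APIs. It shows only that the low bits of the result depend only on the low bits of the inputs, and says nothing about when the fold is profitable.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; none of these are LLVM APIs.
enum class BinOp { Add, Mul, And, Or, Xor };

// Apply the binop at the width of T (unsigned, so it wraps modulo 2^N).
template <typename T>
T applyBinOp(BinOp Op, T X, T Y) {
  switch (Op) {
  case BinOp::Add: return static_cast<T>(X + Y);
  case BinOp::Mul: return static_cast<T>(X * Y);
  case BinOp::And: return static_cast<T>(X & Y);
  case BinOp::Or:  return static_cast<T>(X | Y);
  case BinOp::Xor: return static_cast<T>(X ^ Y);
  }
  return 0;
}

// TRUNC( BINOP( X, Y ) ): do the operation at 64 bits, then truncate.
uint32_t truncAfter(BinOp Op, uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(applyBinOp<uint64_t>(Op, X, Y));
}

// BINOP( TRUNC( X ), TRUNC( Y ) ): truncate first, then operate at 32 bits.
uint32_t truncBefore(BinOp Op, uint64_t X, uint64_t Y) {
  return applyBinOp<uint32_t>(Op, static_cast<uint32_t>(X),
                              static_cast<uint32_t>(Y));
}
```

For any of these five opcodes, truncAfter(Op, X, Y) and truncBefore(Op, X, Y) return the same value. The same does not hold for opcodes such as division or right shifts, where high input bits affect low result bits.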

Further work could be done here - scalar cases (especially i64) could often benefit (if we avoid partial registers etc.) and other opcodes possibly too.

I have considered implementing this for all targets within the DAGCombiner, but I wasn't sure we could devise a suitable cost model that would give us the range we need.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 82820.Jan 2 2017, 3:15 PM
RKSimon retitled this revision from to [X86] Attempt to pre-truncate arithmetic operations if useful.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
delena edited edge metadata.Jan 3 2017, 1:00 AM

On some targets pmulld is significantly slower than pmullw (FeatureSlowPMULLD). I assume that truncating even different (variable) inputs should be profitable in this case.

And your transformation may be profitable for this scenario:

trunc(binop(s/zext(x), s/zext(y))) when x and y are different variables.
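
A minimal scalar sketch of that scenario, under the assumption that the truncated type matches the pre-extension type (i32 here): the truncates then cancel the extends, so the narrow operation can use x and y directly. The function names are hypothetical and this is not code from the patch.

```cpp
#include <cstdint>

// Hypothetical names for illustration only; not LLVM APIs.

// trunc(mul(sext(x), sext(y))): sign-extend to 64 bits, multiply, truncate.
uint32_t mulViaSignExtend(int32_t X, int32_t Y) {
  uint64_t WideX = static_cast<uint64_t>(static_cast<int64_t>(X)); // sext
  uint64_t WideY = static_cast<uint64_t>(static_cast<int64_t>(Y)); // sext
  return static_cast<uint32_t>(WideX * WideY);                     // trunc
}

// trunc(mul(zext(x), zext(y))): zero-extend to 64 bits, multiply, truncate.
uint32_t mulViaZeroExtend(uint32_t X, uint32_t Y) {
  uint64_t WideX = X; // zext
  uint64_t WideY = Y; // zext
  return static_cast<uint32_t>(WideX * WideY); // trunc
}

// mul(x, y) at the narrow width; unsigned arithmetic wraps modulo 2^32.
uint32_t mulNarrow(uint32_t X, uint32_t Y) {
  return X * Y;
}
```

Both wide forms return the same low 32 bits as mulNarrow on the original inputs, even when x and y are different variables.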

lib/Target/X86/X86ISelLowering.cpp
31795 ↗(On Diff #82820)

Should you check hasOneUse() here ?

RKSimon updated this revision to Diff 82926.Jan 3 2017, 11:58 AM
RKSimon edited edge metadata.

Updated based on Elena's feedback

RKSimon marked an inline comment as done.Jan 3 2017, 12:02 PM

On some targets pmulld is significantly slower than pmullw (FeatureSlowPMULLD). I assume that truncating even different (variable) inputs should be profitable in this case.

And your transformation may be profitable for this scenario:

trunc(binop(s/zext(x), s/zext(y))) when x and y are different variables.

If it's all right, I can add support for both of these in follow-up patches. I've added TODOs to describe possible areas for improvement.

delena accepted this revision.Jan 3 2017, 10:40 PM
delena edited edge metadata.
This revision is now accepted and ready to land.Jan 3 2017, 10:40 PM
This revision was automatically updated to reflect the committed changes.