This is an archive of the discontinued LLVM Phabricator instance.

[x86] Correct the implementation of isTruncateFree to be more accurate
Changes PlannedPublic

Authored by craig.topper on Sep 26 2017, 4:52 PM.

Details

Reviewers
RKSimon
zvi
spatel
Summary

Currently we returned true as long as the source type is larger than the dest type, but truncates are only "free" if we can use a subregister extract. This corrects the implementation to match that.

It looks like the EVT signature was also running the check on vectors which was probably unintentional. So I've corrected that here. I think this may have exposed some missing cases in the cost model.

The avx512-mask-op.ll changed because we previously promoted the load to 32-bits under the assumption that truncating from i32 to i1 is free. This ultimately allowed the two ands to be CSEd by the DAG since there were then both i32. Now we have one in i32 and one in i8.

Diff Detail

Event Timeline

craig.topper created this revision.Sep 26 2017, 4:52 PM
RKSimon accepted this revision.Sep 28 2017, 2:24 AM

LGTM - the avx512-mask-op.ll regression is an annoyance but a separate problem IMO.

This revision is now accepted and ready to land.Sep 28 2017, 2:24 AM

I'm investigating a perf loss on one our benchmarks before committing this.

craig.topper planned changes to this revision.Oct 4 2017, 3:43 PM

I can't seem to fix the perf loss on this one benchmark. So I need to hold off on this.

It looks to be that in this particular case, the SLP vectorize was previously creating a v16i64->v16i8 truncate in AVX2 and we are no longer doing that. Attempts at fudging new entries into the cost model to lower the truncate cost (which was calculated at 11) haven't worked so far.

RKSimon added inline comments.Dec 9 2017, 8:55 AM
test/Analysis/CostModel/X86/trunc.ll
50

missing AVX512 cost?

94

missing AVX512 cost?

The CHECK lines have been added to the test after the last time this review was updated.

@craig.topper Did you ever get to the cause of the regression you saw?