This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fix the cost model for v16i16->v16i32 zero_extend/sign_extend with AVX2
Closed · Public

Authored by craig.topper on Jan 29 2020, 10:27 AM.

Details

Summary

We seem to be inheriting the cost from SSE4.1. With 256-bit registers (AVX2), we should be able to do this with just one extract to split the v16i16 input, plus two v8i16->v8i32 extend operations, so the cost should be 3, not 4.
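The count of 3 can be checked with a small scalar model of the lowering described above. This is only a sketch, not LLVM code: the type aliases, the OpCounter struct, and all function names are hypothetical. It assumes the low 128-bit half of a 256-bit register is readable as a subregister at no cost, that the extract corresponds to something like vextracti128, and that each widening step corresponds to something like vpmovzxwd. The sign_extend case has the same shape, with a sign-extending widen in place of the zero-extending one.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the vector types named in the summary.
using V8i16 = std::array<std::uint16_t, 8>;
using V8i32 = std::array<std::uint32_t, 8>;
using V16i16 = std::array<std::uint16_t, 16>;
using V16i32 = std::array<std::uint32_t, 16>;

// Counts the modeled instructions; purely illustrative.
struct OpCounter {
  int ops = 0;
};

// Low half: assumed free, since it is a subregister of the 256-bit input.
V8i16 low_half(const V16i16 &v) {
  V8i16 out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = v[i];
  return out;
}

// High half: one extract instruction (assumed: vextracti128).
V8i16 extract_high(const V16i16 &v, OpCounter &c) {
  ++c.ops;
  V8i16 out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = v[i + 8];
  return out;
}

// One v8i16 -> v8i32 zero extend (assumed: vpmovzxwd ymm, xmm).
V8i32 zext_v8i16(const V8i16 &v, OpCounter &c) {
  ++c.ops;
  V8i32 out{};
  for (std::size_t i = 0; i < 8; ++i) out[i] = v[i];
  return out;
}

// v16i16 -> v16i32: the result occupies two 256-bit registers, so no
// instruction is counted for recombining the halves.
V16i32 zext_v16i16(const V16i16 &v, OpCounter &c) {
  const V8i32 lo = zext_v8i16(low_half(v), c);
  const V8i32 hi = zext_v8i16(extract_high(v, c), c);
  V16i32 out{};
  for (std::size_t i = 0; i < 8; ++i) {
    out[i] = lo[i];
    out[i + 8] = hi[i];
  }
  return out;
}
```

Calling zext_v16i16 once increments the counter three times: one extract and two extends.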

Event Timeline

craig.topper created this revision. · Jan 29 2020, 10:27 AM
Herald added a project: Restricted Project. · Jan 29 2020, 10:27 AM
Herald added a subscriber: hiraditya.

I guess the cost should be 3 if we count the extract we need to split the v16i16 input. New patch coming in a moment.

craig.topper edited the summary of this revision.
This revision is now accepted and ready to land. · Jan 29 2020, 10:52 AM
This revision was automatically updated to reflect the committed changes.