This is an archive of the discontinued LLVM Phabricator instance.

[X86][AVX] Eliminate redundant movzbl instruction.
ClosedPublic

Authored by LuoYuanke on Aug 1 2022, 6:30 PM.

Details

Summary

The movzbl instruction can be combined to vpinsrb or vmovd, when it is actual lowered from anyext.

Diff Detail

Event Timeline

LuoYuanke created this revision.Aug 1 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 6:30 PM
LuoYuanke requested review of this revision.Aug 1 2022, 6:30 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 1 2022, 6:30 PM
LuoYuanke retitled this revision from [X86] Eliminate redundant movzbl instruction. to [X86] [WIP] Eliminate redundant movzbl instruction..Aug 1 2022, 6:35 PM
LuoYuanke updated this revision to Diff 464907.Oct 4 2022, 1:01 AM

Add pattern to combine copy mask register to gpr

LuoYuanke retitled this revision from [X86] [WIP] Eliminate redundant movzbl instruction. to [X86] Eliminate redundant movzbl instruction..Oct 15 2022, 5:00 PM
LuoYuanke added inline comments.Oct 15 2022, 5:03 PM
llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
2141

@craig.topper, I am wondering why we keep movzbl before vpinsrb and vmovd. Is it used deliberately to eliminate partially register stall?

craig.topper added inline comments.Oct 15 2022, 6:11 PM
llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
2141

Yes I think that was the original reason. With 64-bit we almost never use ah,by,ch,dh. And Intel CPUs since SNB don’t have merges unless the H registers have been written. So it probably doesn’t matter much anymore.

we've lost some checks, probably due to the shared check-prefixes not working anymore

llvm/test/CodeGen/X86/load-scalar-as-vector.ll
5

I think you need to splt the AVX into:

--check-prefixes=AVX,AVX1
--check-prefixes=AVX,AVX2
llvm/test/CodeGen/X86/sse41-intrinsics-fast-isel.ll
6

you may need to add X86-AVX1 (and X64-AVX512 below)

Make it clear in the summary this is for AVX+ targets only?

Make it clear in the summary this is for AVX+ targets only?

I'll add the pattern for avx512f and avx512bw as well.

LuoYuanke updated this revision to Diff 468758.Oct 18 2022, 6:47 PM

Rebase and address Simon's comments.

LuoYuanke updated this revision to Diff 468764.Oct 18 2022, 7:28 PM

Move pattern that require VK register from SSE.td to AVX512.td

RKSimon retitled this revision from [X86] Eliminate redundant movzbl instruction. to [X86][AVX] Eliminate redundant movzbl instruction..Oct 19 2022, 2:45 AM
RKSimon edited the summary of this revision. (Show Details)

Have you seen any cases where we need movzwl?

llvm/lib/Target/X86/X86InstrAVX512.td
11720

I don't think you need HasAVX?

Have you seen any cases where we need movzwl?

I didn't see the case. Maybe can eliminate movzwl too.

LuoYuanke updated this revision to Diff 468869.Oct 19 2022, 4:39 AM

Address Simon's comments.

LuoYuanke marked an inline comment as done.Oct 19 2022, 4:39 AM
RKSimon accepted this revision.Oct 20 2022, 9:20 AM

LGTM (it doesn't look like movzwl is an issue given we try so hard to avoid 16-bit ops)

This revision is now accepted and ready to land.Oct 20 2022, 9:20 AM
This revision was automatically updated to reflect the committed changes.