This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add X86ISD::VSHLV and X86ISD::VSRLV nodes for psllv and psrlv
ClosedPublic

Authored by craig.topper on Jan 14 2019, 6:46 PM.

Details

Summary

Previously we used ISD::SHL and ISD::SRL to represent these in SelectionDAG. ISD::SHL/SRL interpret an out of range shift amount as undefined behavior and will constant fold to undef. While the intrinsics are defined to return 0 for out of range shift amounts. A previous patch added a special node for VPSRAV to produce all sign bits.

This was previously believed safe because undefs frequently get turned into 0 either from the constant pool or a desire to not have a false register dependency. But undef is treated specially in some optimizations. For example, its ignored in detection of vector splats. So if the ISD::SHL/SRL can be constant folded and all of the elements with in bounds shift amounts are the same, we might fold it to single element broadcast from the constant pool. This would not put 0s in the elements with out of bounds shift amounts.

Diff Detail

Event Timeline

zhutianyang created this revision.Jan 14 2019, 6:46 PM

I've committed the tests along with some formatting fixes, removal of unused operands, and a fix to minimize the diffs in avx-intrinsics-x86.ll. Can you rebase this patch and regenerate the test checks?

RKSimon added inline comments.Jan 15 2019, 1:25 AM
lib/Target/X86/X86InstrAVX512.td
6517

Duplication - wrap in defm macro?

lib/Target/X86/X86InstrSSE.td
8402

Worth wrapping these in a defm macro to reduce duplication?

zhutianyang marked an inline comment as done.Jan 15 2019, 3:27 AM

I've committed the tests along with some formatting fixes, removal of unused operands, and a fix to minimize the diffs in avx-intrinsics-x86.ll. Can you rebase this patch and regenerate the test checks?

Ok, I will

lib/Target/X86/X86InstrAVX512.td
6517

Could you share a demo to explain how to wrap it? thanks

RKSimon added inline comments.Jan 15 2019, 3:42 AM
lib/Target/X86/X86InstrAVX512.td
6517

Something like (sorry not fully tested):

multiclass avx512_var_shift_int<string InstrStr, SDNode OpNode> {
  defm : avx512_var_shift_int_lowering<InstrStr#"W", OpNode, v8i16x_info, [HasVLX, HasBWI]>;
  defm : avx512_var_shift_int_lowering<InstrStr#"W", OpNode, v16i16x_info, [HasVLX, HasBWI]>;
  defm : avx512_var_shift_int_lowering<InstrStr#"W", OpNode, v32i16_info, [HasBWI]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"D", OpNode, v4i32x_info, [HasVLX]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"D", OpNode, v8i32x_info, [HasVLX]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"D", OpNode, v16i32_info, [HasAVX512]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"Q", OpNode, v2i64x_info, [HasVLX]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"Q", OpNode, v4i64x_info, [HasVLX]>;
  defm : avx512_var_shift_int_lowering_mb<InstrStr#"Q", OpNode, v8i64_info, [HasAVX512]>;
}
defm : avx512_var_shift_int<"VPSRAV", X86vsrav>;
zhutianyang marked an inline comment as done and an inline comment as not done.Jan 15 2019, 4:22 AM
zhutianyang added inline comments.
lib/Target/X86/X86InstrAVX512.td
6517

Thanks a lot. it is a clean method.

RKSimon added inline comments.Jan 15 2019, 5:12 AM
lib/Target/X86/X86InstrSSE.td
8402

We could even move the patterns inside avx2_var_shift now that all users want them.

zhutianyang added inline comments.Jan 15 2019, 5:23 AM
lib/Target/X86/X86InstrSSE.td
8402

I think you have some good ideas, could you share a demo again?
To be frankly, I am not familial with td programming, Thanks.

RKSimon added inline comments.Jan 15 2019, 7:30 AM
lib/Target/X86/X86InstrSSE.td
8402

Actually this isn't going to work as VPSRAVQ doesn't exist on AVX2 - let's leave it as this.

craig.topper commandeered this revision.Jan 15 2019, 12:22 PM
craig.topper edited reviewers, added: zhutianyang; removed: craig.topper.

Commandeering so I can rebase the tests in the interest of getting this through review quicker.

Rebase tests.

Move patterns into avx2_var_shift. Add multiclass in AVX512 to wrap the vector 3 vector lengths.

RKSimon accepted this revision.Jan 16 2019, 1:19 PM

LGTM - this should also be merged into the 8.00 release branch, please create a merge request bug and block PR40331

This revision is now accepted and ready to land.Jan 16 2019, 1:19 PM
This revision was automatically updated to reflect the committed changes.
craig.topper added a comment.EditedJan 16 2019, 1:52 PM

Merge request PR40343