AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer Truncate with/without saturation
Added tests for DAG lowering ,encoding and intrinsics
Details
- Reviewers
AsafBadouh delena - Commits
- rG074a64e72c31: AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer…
rGda1b2ea9557d: AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer…
rL243122: AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer…
rL242990: AVX-512: Implemented encoding , DAG lowering and intrinsics for Integer…
Diff Detail
Event Timeline
include/llvm/IR/IntrinsicsX86.td | ||
---|---|---|
5822 | So, this file is starting to get pretty unwieldy. Why not use multiclasses? | |
include/llvm/Target/TargetSelectionDAG.td | ||
496 | I'm not a fan of this intermediate state. Can you add the same comment that's above ld/st ("Don't use this directly..."), and/or move this somewhere else? Basically I'd prefer masked_load and masked_store to be adjacent. | |
497 | Indentation? | |
lib/Target/X86/X86ISelLowering.cpp | ||
1371–1372 | From my understanding, BWI doesn't imply VLX (even though SKX has both). So, shouldn't these be predicated by hasVLX() as well? I mention the same issue in another comment, so I might be missing something. | |
12514 | "vpmovqwb" ? | |
12515–12523 | I see that 512 bit vectors are handled in l12497, so the added block is only useful for smaller VTRUNCs. Two questions:
So, would it make sense to replace l12491-12494 with: if (Subtarget->hasVLX() && !InVT.is512BitVector() && (InVT.getVectorElementType() != MVT::i16 || Subtarget->hasBWI()) | |
15251–15252 | How about: necessary: - casting ... - extend ... | |
15257 | capitalization? extend -> Extend (here and elsewhere) But I'm not convinced it's a good idea to reuse the same function. Perhaps it's different enough to warrant another function for masking truncstores? | |
15268–15282 | Would it be wrong to do this all the time, even for non-truncstores? So that you can remove extendMask completely, and just decide based on MaskVT/Mask.getVT() ? | |
15274 | } else{ -> } else { | |
15293 | Why? I see the patterns match X86select, but what's wrong with ISD::VSELECT? | |
15451 | true -> /*ExtendMask=*/true | |
24190 | unnecessary whitespace, -> if (TLI | |
lib/Target/X86/X86InstrAVX512.td | ||
5762–5764 | Multiclass? !add(0x10) or something for the opcode. | |
lib/Target/X86/X86InstrFragmentsSIMD.td | ||
117–124 | Vtunc -> Vtrunc Why "Op"? Also, is there a better place for this? Perhaps with the other SDTPs (or the lone one in this file)? | |
lib/Target/X86/X86IntrinsicsInfo.h | ||
178 | What about the saturating _mem variants? If you add them, would that let you remove the avx512_trunc_mr_lowering patterns? | |
test/CodeGen/X86/avx512-trunc-ext.ll | ||
0 | Do you mind splitting the file in trunc and ext tests? | |
9 | readnone isn't useful; also, can you use an attribute group? |
include/llvm/IR/IntrinsicsX86.td | ||
---|---|---|
5822 | Ahmed, we don't want to change format of this file now. We need to add hundreds of intrinsics and we generate these lines. Multiclasses will require manual code. It will significantly slow down the process. |
include/llvm/IR/IntrinsicsX86.td | ||
---|---|---|
5822 | Could you contribute the code that you use to generate the intrinsic definitions? I'm not entirely sure that a multiclass would help here (as I recall, TableGen does not have a 'map' data structure, and that's probably what you want here). However, to the extent possible, we should have human-maintainable code. TableGen alone might not currently be the best tool for this right now, but if so, we should have a detailed understanding of why. |
include/llvm/IR/IntrinsicsX86.td | ||
---|---|---|
5822 | We can't contribute the generator. It is an informal semiautomatic process based on parsing intrinscs headers. But it significantly simplifies our life when we need to add such amount of intrinsics. |
include/llvm/IR/IntrinsicsX86.td | ||
---|---|---|
5822 | At least some of it looks easily multiclass-able, no? For instance, the _mem_ variants, or the saturation kind. That's already a factor of 6. We can also have a multiclass for the size variants. So, even without being smart with types, you can do: multiclass int_avx512_mask_pmov_base<string SatKind, string Size, LLVMType InVT, LLVMType OutVT, LLVMType ScVT> { def int_x86_avx512_mask_pmov#SatKind#_#NAME#_#Size : GCCBuiltin<"__builtin_ia32_pmov"#SatKind#NAME#Size#"_mask">, Intrinsic<[OutVT], [InVT, OutVT, ScVT], [IntrNoMem]>; def int_x86_avx512_mask_pmov#SatKind#_#NAME#_mem_#Size : GCCBuiltin<"__builtin_ia32_pmov"#SatKind#NAME#Size#"mem_mask">, Intrinsic<[], [llvm_ptr_ty, InVT, ScVT], [IntrReadWriteArgMem]>; } multiclass int_x86_avx512_mask_pmov_base_sized<string Size, LLVMType InVT, LLVMType OutVT, LLVMType ScVT> { defm NAME#"" : int_avx512_mask_pmov_base< "", Size, InVT, OutVT, ScVT>; defm NAME#"" : int_avx512_mask_pmov_base< "s", Size, InVT, OutVT, ScVT>; defm NAME#"" : int_avx512_mask_pmov_base<"us", Size, InVT, OutVT, ScVT>; } multiclass int_x86_avx512_mask_pmov_all<LLVMType ScVT128, LLVMType InVT128, LLVMType OutVT128, LLVMType ScVT256, LLVMType InVT256, LLVMType OutVT256, LLVMType ScVT512, LLVMType InVT512, LLVMType OutVT512> { defm NAME : int_x86_avx512_mask_pmov_base_sized<"128", InVT128, OutVT128, ScVT128>; defm NAME : int_x86_avx512_mask_pmov_base_sized<"256", InVT256, OutVT256, ScVT256>; defm NAME : int_x86_avx512_mask_pmov_base_sized<"512", InVT512, OutVT512, ScVT512>; } let TargetPrefix = "x86" in { defm qb : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v2i64_ty, llvm_v16i8_ty, llvm_i8_ty, llvm_v4i64_ty, llvm_v16i8_ty, llvm_i8_ty, llvm_v8i64_ty, llvm_v16i8_ty>; defm qw : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v2i64_ty, llvm_v8i16_ty, llvm_i8_ty, llvm_v4i64_ty, llvm_v8i16_ty, llvm_i8_ty, llvm_v8i64_ty, llvm_v8i16_ty>; defm qd : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v2i64_ty, llvm_v4i32_ty, llvm_i8_ty, llvm_v4i64_ty, llvm_v4i32_ty, llvm_i8_ty, llvm_v8i64_ty, llvm_v8i32_ty>; defm db : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v4i32_ty, llvm_v16i8_ty, llvm_i8_ty, llvm_v8i32_ty, llvm_v16i8_ty, llvm_i16_ty, llvm_v16i32_ty, llvm_v16i8_ty>; defm dw : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v4i32_ty, llvm_v8i16_ty, llvm_i8_ty, llvm_v8i32_ty, llvm_v8i16_ty, llvm_i16_ty, llvm_v16i32_ty, llvm_v16i16_ty>; defm wb : int_x86_avx512_mask_pmov_all<llvm_i8_ty, llvm_v8i16_ty, llvm_v16i8_ty, llvm_i16_ty, llvm_v16i16_ty, llvm_v16i8_ty, llvm_i32_ty, llvm_v32i16_ty, llvm_v32i8_ty>; } Which makes the differences more obvious, I think (though it's not ideal either) (and I think we can handle types by faking maps (using !cast<>(##)) and just passing the scalar types). I admit trunc/ext are one of the worst examples though (but only because of the widening of the smaller types, I think). For instance, a few hundred lines above, the arithmetic instructions seem even more easily factorizable. |
(Note that you can also reduce noise by passing ValueTypes, and building LLVMTypes in the base multiclass. I hit another tblgen bug while trying that though, I'll have to look into it some more ;)
Ahmed,Thank you very much for your review! I have updated the patch according to your comments
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
15257 | extendMask flag removed | |
15293 | ISD::VSELECT can be lower and combined later to different sequence ( broadcast for example), to prevent this behavior i use X86ISD::SELECT | |
15451 | case removed | |
lib/Target/X86/X86InstrAVX512.td | ||
5762–5764 | I think that in this case a multiclass would not help here, it will not decrease the code lines numbers and most probably make code more difficult for understanding. | |
lib/Target/X86/X86IntrinsicsInfo.h | ||
178 | Yes , you are correct, it will also simplify the implementation . But in order to do this i need to extend StoreSDNode and MaskedStoreSDNode class to hold additional information - saturation type. |
Elena,Thank you very much for your review! I have updated the patch according to your comments.
lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
15293 | X86ISD::SELECT We can't use ISD::VSELECT here because it is not allays "Legal" for the destination type. | |
16159 | 1-2 lines of comments before function, please. | |
16331 | remove {} | |
24189 | The comments here are also should be fixed as bellow. | |
24190 | remove {} | |
24307 | Comments: | |
24308 | remove {} |
new diff uploaded.
previous commit was reverted (tablegen failed, Sparc target).
New changes was made, part of implementation was moved from TargetSelectionDAG.td to X86InstrFragmentsSIMD.td
So, this file is starting to get pretty unwieldy. Why not use multiclasses?