The previous patch, https://reviews.llvm.org/D93594, scalarized only the tilezero, tileload, tilestore and tiledpbssd intrinsics. This patch also scalarizes the tdpbf16ps intrinsic.
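
For readers unfamiliar with the operation being scalarized, here is a minimal scalar sketch of a bf16 dot-product accumulate written as a plain loop nest. It is not the code in this patch: the names `bf16ToFloat` and `dpbf16psRef`, the row-major `std::vector` layout, and the explicit `Rows`/`Cols`/`K` parameters are assumptions made for illustration, and the sketch ignores the hardware's rounding and denormal behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Widen a bf16 value to float. bf16 is the upper 16 bits of an IEEE-754
// binary32, so shifting left by 16 and reinterpreting the bits is enough.
static float bf16ToFloat(uint16_t V) {
  uint32_t Bits = static_cast<uint32_t>(V) << 16;
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  return F;
}

// Hypothetical reference loop nest (illustrative layout, not the pass's):
//   A is Rows x (2*K) bf16, B is K x (2*Cols) bf16, C is Rows x Cols float.
// Each step multiplies one pair of bf16 values from A with one pair from B
// and accumulates both products into a single float element of C.
static void dpbf16psRef(std::vector<float> &C, const std::vector<uint16_t> &A,
                        const std::vector<uint16_t> &B, unsigned Rows,
                        unsigned Cols, unsigned K) {
  assert(C.size() == static_cast<size_t>(Rows) * Cols);
  assert(A.size() == static_cast<size_t>(Rows) * 2 * K);
  assert(B.size() == static_cast<size_t>(K) * 2 * Cols);
  for (unsigned M = 0; M < Rows; ++M)
    for (unsigned N = 0; N < Cols; ++N)
      for (unsigned I = 0; I < K; ++I) {
        float A0 = bf16ToFloat(A[M * 2 * K + 2 * I]);
        float A1 = bf16ToFloat(A[M * 2 * K + 2 * I + 1]);
        float B0 = bf16ToFloat(B[I * 2 * Cols + 2 * N]);
        float B1 = bf16ToFloat(B[I * 2 * Cols + 2 * N + 1]);
        C[M * Cols + N] += A0 * B0 + A1 * B1;
      }
}
```

The accumulator stays in `float` throughout; only the inputs are bf16, which is why the element type of the C tile comes up in the review comments.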
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/test/CodeGen/X86/AMX/amx-low-intrinsics.ll | ||
|---|---|---|
| 174–175 ↗ | (On Diff #321675) | Can we use a shuffle instruction? |
| llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp | ||
|---|---|---|
| 383–385 | Would it be more concise to use the following? `template <Intrinsic::ID IntrID> typename std::enable_if_t<IntrID == Intrinsic::x86_tdpbssd_internal \|\| IntrID == Intrinsic::x86_tdpbf16ps_internal, bool> lowerTileDP(Instruction *TileDP);` | |
| 389–390 | Can we create vecC with <256 x float>? | |
| 412 | better to use EltCF32 or CF32 | |
| 419 | ditto | |
| 420 | Better to define a variable for it and reuse. | |
| llvm/lib/Target/X86/X86LowerAMXIntrinsics.cpp | ||
|---|---|---|
| 389–390 | In fact, we are trying to find a bitcast whose operand is <256 x i32>, as shown at line 229. | |
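
The `lowerTileDP` suggestion in the comment on lines 383–385 can be illustrated with a small standalone sketch. This is not LLVM code: the `IntrID` enum stands in for `Intrinsic::ID`, its enumerator names are hypothetical, and the function returns a descriptive string rather than `bool` so that the dispatch is visible.

```cpp
#include <string>
#include <type_traits>

// Stand-in for llvm::Intrinsic::ID; these enumerators are hypothetical.
enum class IntrID { TileZero, TileDPBSSD, TileDPBF16PS };

// One template covers both dot-product intrinsics. enable_if_t removes the
// function from overload resolution for any other ID, so an accidental
// lowerTileDP<IntrID::TileZero>() is rejected at compile time.
template <IntrID ID>
std::enable_if_t<ID == IntrID::TileDPBSSD || ID == IntrID::TileDPBF16PS,
                 std::string>
lowerTileDP() {
  // The shared loop-nest construction would live here; only the innermost
  // multiply-accumulate differs between the two intrinsics.
  if (ID == IntrID::TileDPBSSD)
    return "i32 dot product";
  return "f32 dot product";
}
```

With this shape, `lowerTileDP<IntrID::TileDPBSSD>()` and `lowerTileDP<IntrID::TileDPBF16PS>()` share one body, and the per-intrinsic difference is confined to a compile-time branch.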