Enable FP16 complex FMA instructions.
- Repository: rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/X86/X86ScheduleZnver3.td | ||
|---|---|---|
| 64 ↗ | (On Diff #355794) | You could avoid this change if you add a scheduler class to whatever instruction is complaining? |
| llvm/lib/Target/X86/X86ScheduleZnver3.td | ||
|---|---|---|
| 64 ↗ | (On Diff #355794) | /me goes to add herald rule that i forgot to add |
| clang/lib/Headers/avx512fp16intrin.h | ||
|---|---|---|
| 2955 | Outer brackets | |
- Rebase.
- Add _mm_mask3_fcmadd_sch and _mm_mask3_fcmadd_round_sch.
- Address comments from Yuanke and Simon.
| clang/test/CodeGen/X86/avx512fp16-builtins.c | ||
|---|---|---|
| 4223 | MADD? | |
| 4315 | MADD? | |
| llvm/include/llvm/IR/IntrinsicsX86.td | ||
| 5736 | _cph? | |
| 5800 | _csh? | |
| llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp | ||
| 3903 | "b" means rounding. Right? | |
| 3949 | Sorry, I didn't find the constraint in the spec. | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
| 47414 | Could swapping LHS and RHS remove some redundant code? | |
| 47421 | The lambda seems to be called only once. | |
| 47423 | Is it possible that fast and non-fast instructions get mixed due to inlining? Should we check the instruction's AllowContract flag? | |
| 47436 | Merge it into the previous line. | |
| llvm/lib/Target/X86/X86InstrAVX512.td | ||
| 5772 | Would moving ClobberConstraint before IsCommutable save the code for the default value? | |
| 13593 | The name does not seem accurate. Is it cfmop for mul and cfmaop for fma? | |
| 13629 | I didn't see this flag on other scalar instructions. Why do we need it for complex instructions? | |
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
| 1852 | Why is an FR32X version not needed for complex scalar instructions? | |
| llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp | ||
|---|---|---|
| 3903 | broadcasting | |
| 3949 | #UD if (dest_reg == src1_reg) or (dest_reg == src2_reg) | |
| llvm/lib/Target/X86/X86InstrAVX512.td | ||
| 13629 | Because all complex instructions have the constraint "dst != src1 && dst != src2". We use earlyclobber to prevent dst from being assigned to the same register as src1 or src2. | |
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
|---|---|---|
| 1852 | Do you mean complex ss/sd? We don't have these instructions. | |
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
|---|---|---|
| 1852 | No, I mean we have both X86::XXX and X86::XXX_Int for other instructions. One is FR16X which can be unfolded, one is VR128X which can't. For example, VFNMADD213SHZm and VFNMADD213SHZm_Int. | |
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
|---|---|---|
| 1852 | The VFCMULCSHZrr instructions produce two 16-bit values packed into the lower 32 bits. That would mean we would need an FR32X result, but it couldn't interact meaningfully with any other FR32X instruction since it's really two values. I think we only have FR32/FR64 instructions for things that have generic IR equivalents or that we create from other generic IR operations. For example, I think we have FR32 RCP and RSQRT because we can convert a float division or 1/sqrt to them. | |
| llvm/lib/Target/X86/X86InstrFoldTables.cpp | ||
|---|---|---|
| 1852 | Thanks, Craig. I understand now. :) | |
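The point about the scalar complex result being "really two values" can be sketched as follows. This is only an illustration: the helper names are hypothetical and are not LLVM or intrinsic APIs, and the layout (real part in the low 16 bits, imaginary part in the high 16 bits) is an assumption based on the description above of two 16-bit values packed into the lower 32 bits.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only (not part of LLVM or the
// intrinsic headers). Assumed layout: real part in bits [15:0],
// imaginary part in bits [31:16] of the low 32 bits of the register.
constexpr std::uint32_t packComplexHalf(std::uint16_t reBits,
                                        std::uint16_t imBits) {
  return static_cast<std::uint32_t>(reBits) |
         (static_cast<std::uint32_t>(imBits) << 16);
}

// Extract the raw FP16 bit pattern of the real part.
constexpr std::uint16_t realBits(std::uint32_t packed) {
  return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

// Extract the raw FP16 bit pattern of the imaginary part.
constexpr std::uint16_t imagBits(std::uint32_t packed) {
  return static_cast<std::uint16_t>(packed >> 16);
}
```

With the FP16 bit patterns 0x3C00 (1.0) and 0x4000 (2.0), the packed word is 0x40003C00. Read as a single 32-bit float, those bits would be an unrelated number, which is why an FR32X-typed result would not compose with other FR32X instructions.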
| llvm/lib/Target/X86/X86InstrAVX512.td | ||
|---|---|---|
| 13629 | Got it. Thanks! | |
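The operand constraint discussed in this thread (#UD when the destination register equals either source, hence the earlyclobber on the destination) can be modelled with a small check. This is a minimal sketch under stated assumptions: the struct and function names are hypothetical, registers are represented as plain integers, and this is not the actual X86AsmParser or register-allocator code.

```cpp
// Hypothetical model for illustration only; not the code in this patch.
// Registers are represented by their index (e.g. 0 for xmm0).
struct ComplexFP16Operands {
  unsigned dest;
  unsigned src1;
  unsigned src2;
};

// Returns true when the operand combination would raise #UD according to
// the rule quoted in the review: dest == src1 or dest == src2.
constexpr bool violatesDestConstraint(ComplexFP16Operands ops) {
  return ops.dest == ops.src1 || ops.dest == ops.src2;
}
```

For example, `violatesDestConstraint({1, 1, 2})` is true, while `violatesDestConstraint({0, 1, 2})` and `violatesDestConstraint({0, 1, 1})` are false, since the two sources may still share a register. Marking the destination earlyclobber asks the register allocator to avoid the first kind of assignment, while an assembler-side check like this one would catch hand-written assembly.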
Thanks for the review.
| clang/test/CodeGen/X86/avx512fp16-builtins.c | ||
|---|---|---|
| 4223 | They are markers used when adding tests. We can remove them now. | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 47544 | Sorry, I don't understand the comments. What does FMF mean? | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 47544 | fast math flags? | |
| llvm/lib/Target/X86/X86ISelLowering.cpp | ||
|---|---|---|
| 47544 | I understand now. Thanks, Simon. :) | |
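The earlier concern about fast and non-fast instructions being mixed after inlining amounts to checking fast-math flags (FMF) per instruction rather than per function. The sketch below shows one conservative policy, assuming fusion is allowed only when both the multiply and the add permit contraction, or when a global fast-fusion option is set. The type and function names are hypothetical, and the thread does not state which exact check the patch ended up using.

```cpp
// Hypothetical model for illustration only; not the X86ISelLowering code.
// Each floating-point operation carries its own fast-math flags, so code
// inlined from a non-fast function keeps allowContract == false.
struct FPOpFlags {
  bool allowContract;
};

// Conservative policy (an assumption, not confirmed by the review): fuse a
// complex multiply and an add into a complex FMA only if both operations
// allow contraction, or if fusion is enabled globally.
constexpr bool mayFuseToComplexFMA(FPOpFlags mul, FPOpFlags add,
                                   bool globalFastFusion) {
  return globalFastFusion || (mul.allowContract && add.allowContract);
}
```

Under this policy a multiply inlined from a strict function blocks the combine even when the surrounding add is marked fast.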
| llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp | ||
|---|---|---|
| 3903 | Sorry, my mistake. Here b is supposed to represent the EVEX.b bit in the encoding. | |