This is an archive of the discontinued LLVM Phabricator instance.

[X86] AVX512FP16 instructions enabling 6/6
ClosedPublic

Authored by pengfei on Jul 1 2021, 12:46 AM.

Diff Detail

Event Timeline

pengfei created this revision.Jul 1 2021, 12:46 AM
pengfei requested review of this revision.Jul 1 2021, 12:46 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 1 2021, 12:46 AM
pengfei edited the summary of this revision. (Show Details)Jul 1 2021, 1:00 AM
RKSimon added inline comments.Jul 3 2021, 6:48 AM
llvm/lib/Target/X86/X86ScheduleZnver3.td
64 ↗(On Diff #355794)

You could avoid this change if you add a scheduler class to whatever instruction is complaining?

lebedev.ri added inline comments.
llvm/lib/Target/X86/X86ScheduleZnver3.td
64 ↗(On Diff #355794)

/me goes to add herald rule that i forgot to add
Yep, please just add the new sched class here as unsupported.

pengfei updated this revision to Diff 356373.Jul 4 2021, 4:14 AM

Add the missing scheduler info. Thanks Simon and Roman.

pengfei updated this revision to Diff 356602.Jul 5 2021, 10:32 PM

Fix the capitalization mismatch in tests.

LuoYuanke added inline comments.Jul 14 2021, 7:41 PM
llvm/test/CodeGen/X86/avx512cfma-intrinsics.ll
4

Do we miss broadcast test case?

llvm/test/CodeGen/X86/avx512cfmul-intrinsics.ll
4

Do we miss broadcast test case?

RKSimon added inline comments.Aug 17 2021, 2:42 AM
clang/lib/Headers/avx512fp16intrin.h
2955

Outer brackets

pengfei updated this revision to Diff 366892.Aug 17 2021, 7:32 AM
pengfei marked 3 inline comments as done.
  1. Rebase.
  2. Add _mm_mask3_fcmadd_sch and _mm_mask3_fcmadd_round_sch.
  3. Address comments from Yuanke and Simon.
LuoYuanke added inline comments.Aug 26 2021, 7:27 AM
clang/test/CodeGen/X86/avx512fp16-builtins.c
4223

MADD?

4315

MADD?

llvm/include/llvm/IR/IntrinsicsX86.td
5736

_cph?

5800

_csh?

llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3903

"b" means rounding. Right?

3949

Sorry, I didn't find the constrain in the spec.

llvm/lib/Target/X86/X86ISelLowering.cpp
47414

Can swap LHS and RHS reduce some redundant code?

47421

The lambda seems only be called once.

47423

Is it possible fast and non-fast instruction is mixed due to inline? Shall we check the instruction AllowContract flag?

47436

Merge it to previous line.

llvm/lib/Target/X86/X86InstrAVX512.td
5772

Moving ClobberConstraint before IsCommutable saves the code for default value?

13593

The name seems not accurate. Is it cfmop for mul and cfmaop for fma?

13629

I didn't see this flag for other scalar instructions, why we need it for complex instruction?

llvm/lib/Target/X86/X86InstrFoldTables.cpp
1852

Why FR32X version is not needed for complex scalar instructions?

pengfei added inline comments.Aug 26 2021, 8:03 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3903

broadcasting

3949

#UD if (dest_reg == src1_reg) or ( dest_reg == src2_reg)

llvm/lib/Target/X86/X86InstrAVX512.td
13629

Because all complex instructions have constrains "dst != src1 && dst != src2". We use earlyclobber to avoid the dst been assigned to src1 or src2.

pengfei added inline comments.Aug 26 2021, 8:06 AM
llvm/lib/Target/X86/X86InstrFoldTables.cpp
1852

Do you mean complex ss/sd? We don't have these instructions.

LuoYuanke added inline comments.Aug 26 2021, 4:47 PM
llvm/lib/Target/X86/X86InstrFoldTables.cpp
1852

No, I mean we have both X86::XXX and X86::XXX_Int for other instructions. One is FR16X which can be unfolded, one is VR128X which can't. For example, VFNMADD213SHZm and VFNMADD213SHZm_Int.

craig.topper added inline comments.Aug 26 2021, 5:00 PM
llvm/lib/Target/X86/X86InstrFoldTables.cpp
1852

The VFCMULCSHZrr instructions produce two 16-bit values packed into the lower 32 bits. That would mean we would need a FR32X result, but it couldn't interact meaningfully with any other FR32X instruction since its really two values.

I think we only have FR32/FR64 instructions for things that have generic IR equivalents or that we create from other generic IR operations. Like I think we have an FR32 RCP and RSQRT because we can convert float div or 1/sqrt to them.

LuoYuanke added inline comments.Aug 26 2021, 6:26 PM
llvm/lib/Target/X86/X86InstrFoldTables.cpp
1852

Thanks, Craig. I understand now. :)

LuoYuanke added inline comments.Aug 26 2021, 6:27 PM
llvm/lib/Target/X86/X86InstrAVX512.td
13629

Got it. Thanks!

pengfei updated this revision to Diff 369049.Aug 27 2021, 2:30 AM
pengfei marked 9 inline comments as done.

Address Yuanke's comments.

Thanks for the review.

clang/test/CodeGen/X86/avx512fp16-builtins.c
4223

They are marks used when adding tests. We can remove them now.

LuoYuanke added inline comments.Aug 27 2021, 6:09 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47544

Sorry, I don't understand the comments. What does FMF mean?

RKSimon added inline comments.Aug 27 2021, 6:29 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47544

fast math flags?

LuoYuanke added inline comments.Aug 27 2021, 6:47 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
47544

I understand now. Thanks, Simon. :)

pengfei updated this revision to Diff 369093.Aug 27 2021, 7:11 AM

Update comments. NFC.

llvm/lib/Target/X86/X86ISelLowering.cpp
47544

Yes.

LuoYuanke accepted this revision.Aug 27 2021, 5:20 PM

LGTM, but may wait 1 or 2 days for the comments from others.

This revision is now accepted and ready to land.Aug 27 2021, 5:20 PM
This revision was landed with ongoing or failed builds.Aug 29 2021, 11:58 PM
This revision was automatically updated to reflect the committed changes.
pengfei added inline comments.Aug 31 2021, 12:11 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3903

Sorry, my mistake. Here b supposes to represent EVEX.b bit in the encoding.
It's used as broadcast control only in memory variants in vector instructions.
When it is used in register variants, it enables rounding control and SAE.