This is an archive of the discontinued LLVM Phabricator instance.

[X86] AVX512FP16 instructions enabling 2/6
ClosedPublic

Authored by pengfei on Jun 30 2021, 11:08 PM.

Diff Detail

Event Timeline

pengfei created this revision.Jun 30 2021, 11:08 PM
pengfei requested review of this revision.Jun 30 2021, 11:08 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 30 2021, 11:08 PM
pengfei edited the summary of this revision. (Show Details)Jul 1 2021, 1:00 AM
RKSimon added inline comments.Jul 3 2021, 6:42 AM
llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
6–8

I think you need to add a fallback prefix for AVX512 without FP16

llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
7

I think you need to add a fallback prefix for AVX512 without FP16

pengfei updated this revision to Diff 356364.Jul 3 2021, 8:20 PM
pengfei marked 2 inline comments as done.

Add prefix AVX512BW for non FP16 cases.
Thanks Simon for review.

LuoYuanke added inline comments.Aug 10 2021, 5:37 AM
clang/include/clang/Basic/BuiltinsX86.def
1860

Why there is no 256 and 128 version for addph, subph, mulph, divph?

clang/lib/Headers/avx512fp16intrin.h
310

_CMP_NEQ_OS?

316

Why it is OQ not UQ? Ditto for all other ucomi intrinsics.

514

Why there is rounding control for max/min operation?

667

Will it be combined to one instruction? If __B[0] is 0, and mask[0] is 0, there is no exception?

696

Do we have rounding control for min?

755

This name may be misleading, it means suppress exception. Right?

950

It seems there is no mask for reduce operation.

961

Not sure if there is any room to optimize. The operation for element 2, 3 is unnecessary.

clang/lib/Headers/avx512vlfp16intrin.h
365

Ditto

393

Ditto.

clang/test/CodeGen/X86/avx512fp16-builtins.c
639

Need a blank line?

645

Ditto.

pengfei updated this revision to Diff 365738.Aug 11 2021, 7:01 AM
pengfei marked 6 inline comments as done.
  1. Rebase to the first merged FP16 patch.
  2. Address Yuanke's comments.
  3. Add parentheses around casts.
  4. Remove OptForSize predicate for vmovsh.
  5. Add more immediate value for encoding tests of vcmpph/sh.
clang/include/clang/Basic/BuiltinsX86.def
1860

Because they don't support rounding control, thus we can directly use instructions like fadd, fsub etc.

clang/lib/Headers/avx512fp16intrin.h
310

_CMP_NEQ_US is correct, see the operation in intrinsic guide (we should update for sh):
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_comineq_ss&expand=1365,1366

316

OQ requires both inputs are non-nan. This is the same as the intrinsic's behavior:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_ucomieq_ss&expand=1354,7232

The "u" in intrinsic name corresponds to not "U" but "Q" in the macro name.

514

They still need to control the exception.

667

We should consider this case, but only under strict FP mode. We have ss/sd defined in this way too. We should fix them in future.

696

No. But we need control exception.

755

Yes. But we are widely using it for SAE in existing code :)

Thanks for the review.

LuoYuanke added inline comments.Aug 11 2021, 7:48 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3197

There is no cmpsh?

3199

We only have vcmpph?

llvm/lib/Target/X86/X86ISelLowering.cpp
1869

Is this related to FP16?

llvm/lib/Target/X86/X86InstrAVX512.td
2670

X86cmpms (

2671

CommutableCMPCC:$cc

pengfei updated this revision to Diff 365895.Aug 11 2021, 6:47 PM
pengfei marked 3 inline comments as done.

Address Yuanke's comments.

pengfei added inline comments.Aug 11 2021, 6:49 PM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3197

Good catch!

llvm/lib/Target/X86/X86ISelLowering.cpp
1869

Yes, but it's better to move them together with other FP16 settings.

pengfei updated this revision to Diff 365899.Aug 11 2021, 7:22 PM

Update missing changes.

LuoYuanke added inline comments.Aug 11 2021, 8:04 PM
llvm/lib/Target/X86/X86InstrFoldTables.cpp
4838

Is this because intrinsics always assume the arguments are passed in register?

llvm/test/CodeGen/X86/avx512fp16-fmaxnum.ll
27

Is it legal without avx512vl?

llvm/test/CodeGen/X86/avx512fp16-fold-load-binops.ll
8

Any case for max/min?

pengfei updated this revision to Diff 365995.Aug 12 2021, 7:37 AM
pengfei marked an inline comment as done.

Address Yuanke's comments.

llvm/lib/Target/X86/X86InstrFoldTables.cpp
4838

No. Because the register size of scalar intrinsics is larger than corresponding memory size:
https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/X86FoldTablesEmitter.cpp#L493

llvm/test/CodeGen/X86/avx512fp16-fmaxnum.ll
27

avx512fp16 implies avx512vl.

LuoYuanke added inline comments.Aug 12 2021, 7:18 PM
clang/lib/Headers/avx512vlfp16intrin.h
367
LuoYuanke accepted this revision.Aug 12 2021, 7:19 PM

LGTM, but may wait 1 or 2 days for the comments from others.

This revision is now accepted and ready to land.Aug 12 2021, 7:19 PM
pengfei updated this revision to Diff 366194.Aug 12 2021, 11:04 PM

Address Yuanke's comment.

Thanks for the review!

clang/lib/Headers/avx512vlfp16intrin.h
367

Yes. Cood catch.

This revision was landed with ongoing or failed builds.Aug 14 2021, 5:57 PM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Aug 15 2021, 9:51 AM
llvm/lib/Target/X86/AsmParser/X86AsmParser.cpp
3197

Changing this created bug where this is now accepted

cmpeqsh %xmm0, %xmm1, %k1

llvm/lib/Target/X86/MCTargetDesc/X86ATTInstPrinter.cpp
169

What about the equivalent code in the Intel printer?

llvm/lib/Target/X86/X86InstrAVX512.td
2670

This should be synchronized with the CMPSS/CMPSD code.

Thanks Craig!