This is an archive of the discontinued LLVM Phabricator instance.

[X86] AVX512FP16 instructions enabling 3/6
ClosedPublic

Authored by pengfei on Jun 30 2021, 11:31 PM.

Diff Detail

Event Timeline

pengfei created this revision.Jun 30 2021, 11:31 PM
pengfei requested review of this revision.Jun 30 2021, 11:31 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 30 2021, 11:31 PM
pengfei updated this revision to Diff 355786.Jun 30 2021, 11:58 PM

Remove unexpected intrinsics.

pengfei edited the summary of this revision. (Show Details)Jul 1 2021, 1:00 AM
pengfei updated this revision to Diff 357722.Jul 10 2021, 8:03 AM

Add tests for extendhfxf2 and truncxfhf2.

LuoYuanke added inline comments.Aug 12 2021, 6:29 AM
clang/lib/Headers/avx512fp16intrin.h
1746

Does it also return i32 in x86_64 platform? We may unify the intrinsic both for x86 and x86_x64 to return i32.

1872

VCVTPH2PSX support broadcast compared to VCVTPH2PS, but for intrinsics there is no difference. Do we need to add the new intrinsics? Ditto for its variants.

pengfei added inline comments.Aug 12 2021, 7:15 PM
clang/lib/Headers/avx512fp16intrin.h
1746

Yes. This is used for both x86 and x86_x64.

1872

Yes. The difference is the type. We previously use __m256i for the half vector since _Float16 is not a legal type then.

craig.topper added inline comments.Aug 12 2021, 7:32 PM
clang/lib/Headers/avx512fp16intrin.h
951

Put extra parentheses here to avoid repeating PR51324. Same for any macro that has a typecast of builtin result.

pengfei updated this revision to Diff 366470.Aug 14 2021, 7:07 PM
pengfei marked an inline comment as done.

Rebased.
Add extra parentheses for macro.

LuoYuanke added inline comments.Aug 15 2021, 2:03 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
1949

Sorry, I'm just confused on why the type is the same for ISD::SINT_TO_FP and ISD::FP_TO_SINT? The legalization use src type for ISD::SINT_TO_FP and dst type for ISD::FP_TO_SINT? Why not unify to dst type.

1993

How do we know it covert to v16f16? Is it possible convert to v16f32?

2051

Why it is not v2i16?

20078

Should this node be chained to Op.getOperand(0) for strict FP and convert operation be chained to this node?

22163

InChain for strict FP?

22174

Is there any case for v3f16?

31324

Is it possible the type is i3/i5/i7?

31516

Where is vXi16 handle? Is it promoted to vXi32 finally?

31531

Isn't the result type changed to v8f16? Why we don't extract sub-vector here?

49678

Need to check Subtarget.hasFP16() ?

llvm/lib/Target/X86/X86InstrAVX512.td
8188

Why null_frag instead of X86vfpround?

8194

What is the alias used for? Can't it be distinguished from operand?

craig.topper added inline comments.Aug 15 2021, 12:58 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
1949

This is the rules for how LegalizeDAG.cpp and LegalizeVectorOps looks up the operation action for these nodes. It always use the integer type.

I believe this is because for scalar types, there are more legal integer types than there are legal conversions. The FP types were already checked for legality by the type legalizer. This has gotten a lot more complicated for vectors.

craig.topper added inline comments.Aug 15 2021, 1:11 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
31324

We should only get here for integer type that we enabled Custom handling for.

llvm/lib/Target/X86/X86InstrAVX512.td
8188

The masked forms require X86vmfpround and the unmasked forms require X86any_vfpround. Template doesn't handle that. So there are extra patterns later

8194

The memory form can't be distinquished by operand and requires a suffix. This alias exists so that the register form can also use a suffix even though it isn't required. It allows an "rm" constraint to be used for a suffixed mnemonic in inline assembly without needing to know if the compiler will pick register or memory.

Thank Craig for the clarification!

pengfei updated this revision to Diff 366630.Aug 16 2021, 7:54 AM
  1. Address Yuanke's comments.
  2. Add missed strict FP handling.
  3. Refactor the repeated declarations for strict FP.
llvm/lib/Target/X86/X86ISelLowering.cpp
1993

No. Because v16f32 is not a legal type on X86.

2051

This is used to customize vector widen, which always check the action of result type.

20078

Not, we just chain FP nodes together.

22163

Good catch.

22174

No, v3f16 will be widen to v4f16 first.

31516

No. i16 and f16 has the same element size. So we don't need to replace them with custom nodes.

31531

Yes. The common widen code widen both src and dst elements as the same size. We are customizing to different size here so than we can always select the 128 bit instructions. Result type larger than 128 bits doesn't have this problem.

49678

No. We can't go to here without feature FP16 enabled.

pengfei added inline comments.Aug 16 2021, 5:46 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
1993

Sorry, I mistook it with v32f32.
The answer is still no. But because we extend v16i16 to v16i32 in combineS(U)IntToFP at the begining. So we don't need to worry about there's no a legal v16i16 to v16f32.

LuoYuanke accepted this revision.Aug 16 2021, 6:40 PM

LGTM, but wait 1 or 2 days for the comments from others.

This revision is now accepted and ready to land.Aug 16 2021, 6:40 PM
This revision was landed with ongoing or failed builds.Aug 17 2021, 6:35 PM
This revision was automatically updated to reflect the committed changes.
This comment was removed by samitolvanen.

Thanks @vitalybuka for the information. I didn't receive this buildbot failure notice. I found the latest build has turned green, but I didn't find which commit fixed it. I'll keep watching it for a while.