This is an archive of the discontinued LLVM Phabricator instance.

[X86] AVX512FP16 instructions enabling 3/6
ClosedPublic

Authored by pengfei on Jun 30 2021, 11:31 PM.

Diff Detail

Event Timeline

pengfei created this revision.Jun 30 2021, 11:31 PM
pengfei requested review of this revision.Jun 30 2021, 11:31 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 30 2021, 11:31 PM
pengfei updated this revision to Diff 355786.Jun 30 2021, 11:58 PM

Remove unexpected intrinsics.

pengfei edited the summary of this revision. (Show Details)Jul 1 2021, 1:00 AM
pengfei updated this revision to Diff 357722.Jul 10 2021, 8:03 AM

Add tests for extendhfxf2 and truncxfhf2.

LuoYuanke added inline comments.Aug 12 2021, 6:29 AM
clang/lib/Headers/avx512fp16intrin.h
1709

Does it also return i32 in x86_64 platform? We may unify the intrinsic both for x86 and x86_x64 to return i32.

1835

VCVTPH2PSX support broadcast compared to VCVTPH2PS, but for intrinsics there is no difference. Do we need to add the new intrinsics? Ditto for its variants.

pengfei added inline comments.Aug 12 2021, 7:15 PM
clang/lib/Headers/avx512fp16intrin.h
1709

Yes. This is used for both x86 and x86_x64.

1835

Yes. The difference is the type. We previously use __m256i for the half vector since _Float16 is not a legal type then.

craig.topper added inline comments.Aug 12 2021, 7:32 PM
clang/lib/Headers/avx512fp16intrin.h
914

Put extra parentheses here to avoid repeating PR51324. Same for any macro that has a typecast of builtin result.

pengfei updated this revision to Diff 366470.Aug 14 2021, 7:07 PM
pengfei marked an inline comment as done.

Rebased.
Add extra parentheses for macro.

LuoYuanke added inline comments.Aug 15 2021, 2:03 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
1955

Sorry, I'm just confused on why the type is the same for ISD::SINT_TO_FP and ISD::FP_TO_SINT? The legalization use src type for ISD::SINT_TO_FP and dst type for ISD::FP_TO_SINT? Why not unify to dst type.

1996

How do we know it covert to v16f16? Is it possible convert to v16f32?

2054

Why it is not v2i16?

19996

Should this node be chained to Op.getOperand(0) for strict FP and convert operation be chained to this node?

22001

InChain for strict FP?

22012

Is there any case for v3f16?

31082

Is it possible the type is i3/i5/i7?

31260

Where is vXi16 handle? Is it promoted to vXi32 finally?

31275

Isn't the result type changed to v8f16? Why we don't extract sub-vector here?

49324

Need to check Subtarget.hasFP16() ?

llvm/lib/Target/X86/X86InstrAVX512.td
8193

Why null_frag instead of X86vfpround?

8199

What is the alias used for? Can't it be distinguished from operand?

craig.topper added inline comments.Aug 15 2021, 12:58 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
1955

This is the rules for how LegalizeDAG.cpp and LegalizeVectorOps looks up the operation action for these nodes. It always use the integer type.

I believe this is because for scalar types, there are more legal integer types than there are legal conversions. The FP types were already checked for legality by the type legalizer. This has gotten a lot more complicated for vectors.

craig.topper added inline comments.Aug 15 2021, 1:11 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
31082

We should only get here for integer type that we enabled Custom handling for.

llvm/lib/Target/X86/X86InstrAVX512.td
8193

The masked forms require X86vmfpround and the unmasked forms require X86any_vfpround. Template doesn't handle that. So there are extra patterns later

8199

The memory form can't be distinquished by operand and requires a suffix. This alias exists so that the register form can also use a suffix even though it isn't required. It allows an "rm" constraint to be used for a suffixed mnemonic in inline assembly without needing to know if the compiler will pick register or memory.

Thank Craig for the clarification!

pengfei updated this revision to Diff 366630.Aug 16 2021, 7:54 AM
  1. Address Yuanke's comments.
  2. Add missed strict FP handling.
  3. Refactor the repeated declarations for strict FP.
llvm/lib/Target/X86/X86ISelLowering.cpp
1996

No. Because v16f32 is not a legal type on X86.

2054

This is used to customize vector widen, which always check the action of result type.

19996

Not, we just chain FP nodes together.

22001

Good catch.

22012

No, v3f16 will be widen to v4f16 first.

31260

No. i16 and f16 has the same element size. So we don't need to replace them with custom nodes.

31275

Yes. The common widen code widen both src and dst elements as the same size. We are customizing to different size here so than we can always select the 128 bit instructions. Result type larger than 128 bits doesn't have this problem.

49324

No. We can't go to here without feature FP16 enabled.

pengfei added inline comments.Aug 16 2021, 5:46 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
1996

Sorry, I mistook it with v32f32.
The answer is still no. But because we extend v16i16 to v16i32 in combineS(U)IntToFP at the begining. So we don't need to worry about there's no a legal v16i16 to v16f32.

LuoYuanke accepted this revision.Aug 16 2021, 6:40 PM

LGTM, but wait 1 or 2 days for the comments from others.

This revision is now accepted and ready to land.Aug 16 2021, 6:40 PM
This revision was landed with ongoing or failed builds.Aug 17 2021, 6:35 PM
This revision was automatically updated to reflect the committed changes.
This comment was removed by samitolvanen.

Thanks @vitalybuka for the information. I didn't receive this buildbot failure notice. I found the latest build has turned green, but I didn't find which commit fixed it. I'll keep watching it for a while.