This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add AVX-NE-CONVERT instructions.
ClosedPublic

Authored by FreddyYe on Oct 13 2022, 7:10 PM.

Diff Detail

Event Timeline

FreddyYe created this revision.Oct 13 2022, 7:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 7:10 PM
FreddyYe requested review of this revision.Oct 13 2022, 7:10 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 13 2022, 7:10 PM
FreddyYe retitled this revision from Add AVX-NE-CONVERT instructions. to [X86] Add AVX-NE-CONVERT instructions..Oct 13 2022, 8:17 PM
LuoYuanke added inline comments.Oct 13 2022, 8:41 PM
clang/lib/Basic/Targets/X86.cpp
795

Do we need it here?

craig.topper added inline comments.
clang/lib/Headers/immintrin.h
262

Is this FIXME still relevant? Don't we support _Float16 with SSE2 now?

llvm/include/llvm/Support/X86TargetParser.def
206

Extra space before "avxneconvert"

pengfei added inline comments.Oct 13 2022, 10:18 PM
clang/lib/Basic/Targets/X86.cpp
795

We don't need it.

clang/lib/Headers/immintrin.h
262

_Float16 is supported with SSE2, but maybe we need to move __m128h, __m256h out of avx512fp16intrin.h

pengfei added inline comments.Oct 13 2022, 10:22 PM
clang/lib/Headers/avxneconvertintrin.h
48

I think the bf16 vector type may have the same problem with FP16. When need to move them out of avx512vlbf16intrin.h
Another issue is we want to switch them to __bf16 vector. Hope D132329 can be landed first.

RKSimon added inline comments.Oct 14 2022, 1:28 AM
clang/test/CodeGen/X86/avxneconvert-builtins.c
3

32-bit test coverage?

FreddyYe updated this revision to Diff 468158.Oct 17 2022, 4:22 AM

Address part of comments.

FreddyYe marked 5 inline comments as done.Oct 17 2022, 4:23 AM

THX for reviews!

clang/lib/Headers/immintrin.h
262

Yes. This is a redundant FIXME.

FreddyYe marked 2 inline comments as done.Oct 17 2022, 4:24 AM
RKSimon added inline comments.Oct 17 2022, 6:20 AM
llvm/test/MC/X86/avx-ne-convert-att.s
1 ↗(On Diff #468158)

merge the att + intel test files and use --check-prefixes to test both

merge att/intel test coverage files and rename the 32/64 bit files so that they are close together in the file lists

Matt added a subscriber: Matt.Oct 19 2022, 5:03 PM
pengfei added inline comments.Oct 19 2022, 11:19 PM
clang/lib/Headers/immintrin.h
262

I have moved FP16/BF16 vector types out of original header files. rGe0fb01e9
There should be no dependency to FP16 and BF16 feature now.

pengfei added inline comments.Oct 19 2022, 11:22 PM
clang/test/CodeGen/X86/avxneconvert-builtins.c
3

This should be removed now.

llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll
3

Do we have real dependency to FP16?

4

ditto.

pengfei added inline comments.Oct 25 2022, 7:58 PM
clang/include/clang/Basic/BuiltinsX86.def
2131–2132

These should be shared with AVX512-BF16.

clang/lib/Headers/avxneconvertintrin.h
87–95

Add unified intrinsics like AVXVNNI.

Possibly rename the x86-64-* test files to *-64 (and *-32 equivalent) so that the 32/64 bit files are closer together for tracking (and to help avoid bitrot).

clang/lib/Headers/immintrin.h
262

Update to this?

#if !(defined(_MSC_VER) || defined(__SCE__)) || __has_feature(modules) ||      \
    (defined(__AVXNECONVERT__) && defined(__AVX512FP16__))
llvm/test/MC/X86/x86-64-avx-ne-convert-att.s
1 ↗(On Diff #468158)

x86-64-avx-ne-convert-intel.s ?

FreddyYe updated this revision to Diff 471408.Oct 27 2022, 11:41 PM
FreddyYe marked 10 inline comments as done.

Address comments. THX for review!

FreddyYe added inline comments.Oct 28 2022, 12:15 AM
llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll
5

Need to add +avx512bf16,+avx512vl tests for shared builtin intrinsic. I just found it crashed for lacking new patterns for avx512bf16. I'll update ASAP.

RKSimon added inline comments.Oct 28 2022, 8:43 AM
clang/lib/Headers/avx512vlbf16intrin.h
164

Is there no way for attribute to allow different attribute permutations?

Also, can we keep the __builtin_ia32_cvtneps2bf16_128 naming convention?

pengfei added inline comments.Oct 28 2022, 9:12 AM
clang/lib/Headers/avx512vlbf16intrin.h
164

Is there no way for attribute to allow different attribute permutations?

We have discussed this problem with GCC folks. There are two problems here:

  1. Unlike builtins, function attributes are more generic. It may introduce a lot of checks between callers and callees. I had a research to limit it to __always_inline__ functions only. However, Clang handles inlining in middle-end, we don't have such information in the front-end. Besides, we don't know how to merge different permutations if they are inlining to the same function.
  2. We don't know how to put the permutations into IR's function attributes. We need to preserve all permutations for inlining reference, but the backend needs a determine feature list rather than selective.
FreddyYe updated this revision to Diff 471710.Oct 28 2022, 9:22 PM

Fix crash to compile avx512vlbf16 intrinsics.

pengfei added inline comments.Oct 29 2022, 7:02 AM
clang/include/clang/Driver/Options.td
4599–4600

Need to move it before mavxvnniint8 .

clang/lib/Basic/Targets/X86.cpp
1034

Move it ahead.

clang/lib/Headers/avx512vlbf16intrin.h
164

It's better to use __builtin_ia32_cvtneps2bf16_128.

clang/lib/Headers/avxneconvertintrin.h
107

VBCSTNESH2PS

140

VBCSTNESH2PS

208

16

274

16

340

16

406

16

clang/test/Preprocessor/x86_target_features.c
593–599

Should we check __AVX2__ like we did for AVXVNNI?

llvm/lib/Support/Host.cpp
1819

Move it ahead and remove the blank line.

llvm/lib/Target/X86/X86ISelLowering.cpp
2181–2198

How about merge it here?

llvm/lib/Target/X86/X86InstrSSE.td
8260–8261

This can be f16 mem now.

8264–8265

f128mem, f256mem

8268–8269

ditto.

llvm/test/CodeGen/X86/avx512bf16-vl-intrinsics.ll
129–140 ↗(On Diff #471710)

You don't need to add them here, just another RUN in below file should be enough, e.g.,

; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=x86_64-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
; RUN: llc < %s -O0 -verify-machineinstrs -mtriple=i686-unknown-unknown --show-mc-encoding -mattr=+avx512bf16,+avx512vl | FileCheck %s --check-prefix=AVX512BF16
llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll
3

--check-prefixes=CHECK,X64

4

--check-prefixes=CHECK,X86

FreddyYe updated this revision to Diff 471920.Oct 31 2022, 1:29 AM
FreddyYe marked 19 inline comments as done.

Address comments. THX for review.

FreddyYe marked an inline comment as done.Oct 31 2022, 1:29 AM
FreddyYe added inline comments.
clang/lib/Headers/avx512vlbf16intrin.h
164

I think __builtin_ia32_vcvtneps2bf16128 is also a "right" name.

See builtin_ia32_vfmaddsubph256, builtin_ia32_minph256...

And I admit naming conventions of clang builtins as well as LLVM IR builtins are confusing right now.

pengfei added inline comments.Oct 31 2022, 2:12 AM
clang/lib/Headers/avx512vlbf16intrin.h
164

The problem here is 16128 is a bit confusing, a _ breaks it into 2 number.
But I'm not insist on it :)

FreddyYe marked an inline comment as done.Oct 31 2022, 6:56 AM
FreddyYe added inline comments.
clang/lib/Headers/avx512vlbf16intrin.h
164

I did a try but found __builtin_ia32_cvtneps2bf16_256 existed for avx512bf16, and it's used for mask intrinsic lowering currently. What about not change this time? We can do a refine patch later for avx512bf16 builtins since they also have some redundant FE/codegen logics for 256/512 mask intrinsics.

LGTM.

clang/lib/Headers/avx512vlbf16intrin.h
164

No problem.

llvm/test/CodeGen/X86/avxneconvert-intrinsics-shared.ll
3

Remove -O0

llvm/test/CodeGen/X86/avxneconvert-intrinsics.ll
3

ditto.

pengfei accepted this revision.Oct 31 2022, 7:46 AM
This revision is now accepted and ready to land.Oct 31 2022, 7:46 AM
This revision was landed with ongoing or failed builds.Oct 31 2022, 8:43 AM
This revision was automatically updated to reflect the committed changes.
FreddyYe marked an inline comment as done.