This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

-
clang/
-
docs/
8/8
LanguageExtensions.rst
-
include/clang/
-
clang/
-
Basic/
-
DiagnosticSemaKinds.td
-
FPOptions.def
1/1
LangOptions.def
-
TargetInfo.h
-
Driver/
-
Options.td
-
lib/
-
AST/
1/1
Type.cpp
-
Basic/
-
TargetInfo.cpp
-
Targets/
2/2
AMDGPU.h
-
ARM.cpp
-
NVPTX.h
-
X86.h
9/9
X86.cpp
-
CodeGen/
-
CGExprScalar.cpp
-
Driver/ToolChains/
-
ToolChains/
-
Clang.cpp
-
Sema/
-
SemaCast.cpp
-
SemaExpr.cpp
2/2
SemaOverload.cpp
-
test/
-
CodeGen/X86/
-
X86/
-
avx512bf16-error.c
-
bfloat-mangle.cpp
3/4
bfloat16.cpp
3/3
fexcess-precision-bfloat16.c
-
CodeGenCUDA/
-
amdgpu-bf16.cu
-
bf16.cu
-
Driver/
-
fexcess-precision.c
-
Sema/
-
arm-bf16-forbidden-ops.c
-
arm-bf16-forbidden-ops.cpp
1/1
arm-bfloat.cpp
-
SemaCUDA/
-
amdgpu-bf16.cu
-
bf16.cu

Differential D150913

[Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and extend excess precision support.
ClosedPublic

Authored by codemzs on May 18 2023, 3:27 PM.


Details

Reviewers

tahonermann
rjmccall
zahiraam
stuij
pengfei
erichkeane

Commits

rGe62175736551: [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and…

Summary

Pursuant to RFC discussions, this change enhances the handling of the __bf16 type in Clang.

Firstly, it upgrades __bf16 from a storage-only type to an arithmetic type.
Secondly, it changes the mangling of __bf16 to DF16b on all architectures except ARM. This change has been made in accordance with the finalization of the mangling for the std::bfloat16_t type, as discussed at https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
Finally, this commit extends the existing excess precision support to the __bf16 type. This applies to hardware architectures that do not natively support bfloat16 arithmetic.

Appropriate tests have been added to verify the effects of these changes and ensure no regressions in other areas of the compiler.


Event Timeline

codemzs created this revision. May 18 2023, 3:27 PM

Herald added a project: Restricted Project. May 18 2023, 3:27 PM

Herald added subscribers: mattd, gchakrabarti, asavonic and 3 others.

codemzs requested review of this revision. May 18 2023, 3:27 PM

Herald added a project: Restricted Project. May 18 2023, 3:27 PM

Herald added subscribers: cfe-commits, MaskRay, jholewinski.

Misc style improvement.

clang/lib/AST/Type.cpp
2200	Remove the tab whitespace.
clang/lib/Sema/SemaOverload.cpp
2053–2054
clang/test/Sema/arm-bfloat.cpp
45–55	Remove newline.

codemzs mentioned this in D149573: [Clang][C++23] Implement core language changes from P1467R9 extended floating-point types and standard names. May 18 2023, 3:34 PM

Harbormaster completed remote builds in B233031: Diff 523582. May 18 2023, 10:53 PM

Great work! Thanks for the patch!

clang/include/clang/AST/ASTContext.h
1102 ↗	(On Diff #523582)	I haven't looked at ISO/IEC/IEEE 60559, but I suspect BF16 is still not an IEEE type for now.
clang/lib/Basic/Targets/AMDGPU.h
120–121	I think it's time to bring D139608 back with this patch :)
clang/lib/Basic/Targets/X86.cpp
362–363	This may not be needed.
390	I'm not sure I understand the meaning of `HasFullBFloat16`. If it is used for targets that support arithmetic `__bf16`, we should not use `+fullbf16` but always enable it for SSE2, i.e., `HasFullBFloat16 = SSELevel >= SSE2`, because X86 GCC already supports arithmetic for `__bf16`. If it is used in the same way as `HasLegalHalfType`, we should enable it once we have a full BF16 ISA on X86. `fullbf16` doesn't make much sense to me.
1131	ditto.
clang/test/CodeGen/X86/bfloat16.cpp
3–4	The backend has already support lowering of `bfloat`, I don't think it's necessary to do extra work in FE unless for excess-precision.
clang/test/CodeGen/X86/fexcess-precision-bfloat16.c
8	The tests here suggest that you want to use `fullbf16` in the same way as `HasLegalHalfType`.

rjmccall added inline comments. May 19 2023, 9:49 AM

clang/lib/Basic/Targets/X86.cpp
390	At the moment, we haven't done the work to emulate BFloat16 arithmetic in any of the three ways we can do that: Clang doesn't promote it in IRGen, LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it. If we emit these instructions, they'll just sail through LLVM and fail in the backend. So in the short term, we have to restrict this to targets that directly support BFloat16 arithmetic in hardware, which doesn't include x86. Once we have that emulation support, I agree that the x86 targets should enable this whenever they would enable `__bf16`.

I believe I had updated the __bf16 documentation in /llvm-project/clang/docs/LanguageExtensions.rst, but it appears to have been omitted in this patch. I assure you, I'll rectify this in the next iteration.

clang/include/clang/AST/ASTContext.h
1102 ↗	(On Diff #523582)	You are correct: it isn't officially part of the ISO/IEEE standard, though I think it implements the properties specified by the standard. In any case, I will remove the comment, as it could be misleading.
clang/lib/Basic/Targets/AMDGPU.h
120–121	I'm inclined to establish a default value, overridden only for ARM, to avoid repetition. If there are no objections, I plan to implement this change in the next iteration.
clang/lib/Basic/Targets/X86.cpp
390	@rjmccall, I concur and just wanted to confirm this change indeed intends to provide `BFloat16` emulation support, utilizing excess precision for promotion to `float`. The `HasFullBFloat16` switch is designed to determine excess precision support automatically when the hardware does not natively support `bfloat16` arithmetic.

pengfei added inline comments. May 19 2023, 11:37 PM

clang/lib/Basic/Targets/X86.cpp
390	> LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it.
That's not true: https://godbolt.org/z/jxf5E83vG.
> The `HasFullBFloat16` switch is designed to determine excess precision support automatically when the hardware does not natively support bfloat16 arithmetic.
Makes sense to me.

zahiraam added inline comments. May 21 2023, 11:27 AM

clang/lib/Basic/Targets/X86.cpp
390	> At the moment, we haven't done the work to emulate BFloat16 arithmetic in any of the three ways we can do that: Clang doesn't promote it in IRGen, LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it. If we emit these instructions, they'll just sail through LLVM and fail in the backend. So in the short term, we have to restrict this to targets that directly support BFloat16 arithmetic in hardware, which doesn't include x86. Once we have that emulation support, I agree that the x86 targets should enable this whenever they would enable `__bf16`.
Would be nice to add a comment to clarify it.
clang/test/CodeGen/X86/bfloat16.cpp
3–4	> The backend has already support lowering of `bfloat`, I don't think it's necessary to do extra work in FE unless for excess-precision.
+1.
clang/test/CodeGen/X86/fexcess-precision-bfloat16.c
361	Fix this.

pengfei mentioned this in D136919: [X86][RFC] Change mangle name of __bf16 from u6__bf16 to DF16b. May 21 2023, 5:59 PM

asb mentioned this in D150929: [RISCV][BF16] Enable __bf16 for riscv targets. May 22 2023, 9:11 AM

@pengfei, @zahiraam, I appreciate your feedback.

@pengfei, the HasFullBFloat16 flag is primarily for identifying hardware with native bfloat16 support to facilitate automatic excess precision support. I concur that since x86 possesses backend bfloat16 emulation (as noted in D126953), front-end emulation might not be necessary. The test's purpose was to provide coverage for this change. However, I am open to either removing it entirely or relocating it to a more suitable target as per your recommendation.

codemzs added inline comments. May 22 2023, 1:48 PM

clang/lib/Basic/Targets/X86.cpp
362–363	Clarified on the other thread but if you have questions please feel free to post here and I will address them.
390	@pengfei, you're right. As part of D126953, the x86 backend received `bfloat16` emulation support. Also, I hope my explanation about the `HasFullBFloat16` flag addressed your questions. Please let me know if further clarification/change is needed.
clang/test/CodeGen/X86/bfloat16.cpp
3–4	@pengfei @zahiraam I added this test to verify bfloat16 IR gen functionality, considering both scenarios: with and without native bfloat16 support. However, if you believe it's more beneficial to omit it, I'm open to doing so. Happy to also move this test to another target that doesn't have backend support for emulation.
clang/test/CodeGen/X86/fexcess-precision-bfloat16.c
8	Yes, that is correct; it is just to exercise the correct IR gen as if x86 had native support. Happy to remove these tests if you feel that is better.

zahiraam added inline comments. May 22 2023, 1:55 PM

clang/test/CodeGen/X86/bfloat16.cpp
3–4	I think that's fine. You can leave it.

Harbormaster completed remote builds in B233697: Diff 524467. May 22 2023, 6:51 PM

LGTM.

LGTM. Just a minor comment.

clang/include/clang/Basic/LangOptions.def
321	Maybe differentiate the description from the previous line?

Apologies for misunderstanding what this patch was doing. This all seems reasonable, and the code changes look good. I think the documentation needs significant reorganization; I've attached a draft. Please review for correctness.

clang/docs/LanguageExtensions.rst

852

Suggested rework:

Clang supports three half-precision (16-bit) floating point types: ``__fp16``,
``_Float16`` and ``__bf16``.  These types are supported in all language
modes, but not on all targets:

- ``__fp16`` is supported on every target.

- ``_Float16`` is currently supported on the following targets:
  * 32-bit ARM (natively on some architecture versions)
  * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
  * AMDGPU (natively)
  * SPIR (natively)
  * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)

- ``__bf16`` is currently supported on the following targets:
  * 32-bit ARM
  * 64-bit ARM (AArch64)
  * X86 (when SSE2 is available)

(For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)

``__fp16`` and ``_Float16`` both use the binary16 format from IEEE
754-2008, which provides a 5-bit exponent and an 11-bit significand
(counting the implicit leading 1).  ``__bf16`` uses the `bfloat16
<https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format,
which provides an 8-bit exponent and an 8-bit significand; this is the same
exponent range as `float`, just with greatly reduced precision.

``_Float16`` and ``__bf16`` follow the usual rules for arithmetic
floating-point types.  Most importantly, this means that arithmetic operations
on operands of these types are formally performed in the type and produce
values of the type.  ``__fp16`` does not follow those rules: most operations
immediately promote operands of type ``__fp16`` to ``float``, and so
arithmetic operations are defined to be performed in ``float`` and so result in
a value of type ``float`` (unless further promoted because of other operands).
See below for more information on the exact specifications of these types.

Only some of the supported processors for ``__fp16`` and ``__bf16`` offer
native hardware support for arithmetic in their corresponding formats.
The exact conditions are described in the lists above.  When compiling for a
processor without native support, Clang will perform the arithmetic in
``float``, inserting extensions and truncations as necessary.  This can be
done in a way that exactly emulates the behavior of hardware support for
arithmetic, but it can require many extra operations.  By default, Clang takes
advantage of the C standard's allowances for excess precision in intermediate
operands in order to eliminate intermediate truncations within statements.
This is generally much faster but can generate different results from strict
operation-by-operation emulation.

The use of excess precision can be independently controlled for these two
types with the ``-ffloat16-excess-precision=`` and
``-fbfloat16-excess-precision=`` options.  Valid values include:
- ``none`` (meaning to perform strict operation-by-operation emulation)
- ``standard`` (meaning that excess precision is permitted under the rules
  described in the standard, i.e. never across explicit casts or statements)
- ``fast`` (meaning that excess precision is permitted whenever the
  optimizer sees an opportunity to avoid truncations; currently this has no
  effect beyond ``standard``)

The ``_Float16`` type is an interchange floating type specified in
 ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C").  It will
be supported on more targets as they define ABIs for it.

The ``__bf16`` type is a non-standard extension, but it generally follows
the rules for arithmetic interchange floating types from ISO/IEC TS
18661-3:2015.  In previous versions of Clang, it was a storage-only type
that forbade arithmetic operations.  It will be supported on more targets
as they define ABIs for it.

The ``__fp16`` type was originally an ARM extension and is specified
by the `ARM C Language Extensions <https://github.com/ARM-software/acle/releases>`_.
Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``,
not the ARM alternative format.  Operators that expect arithmetic operands
immediately promote ``__fp16`` operands to ``float``.

It is recommended that portable code use ``_Float16`` instead of ``__fp16``,
as it has been defined by the C standards committee and has behavior that is
more familiar to most programmers.

Because ``__fp16`` operands are always immediately promoted to ``float``, the
common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual
arithmetic conversions is ``float``.

A literal can be given ``_Float16`` type using the suffix ``f16``. For example,
``3.14f16``.

Because default argument promotion only applies to the standard floating-point
types, ``_Float16`` values are not promoted to ``double`` when passed as variadic
or untyped arguments.  As a consequence, some caution must be taken when using
certain library facilities with ``_Float16``; for example, there is no ``printf`` format
specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
``double`` when passed to ``printf``, so the programmer must explicitly cast it to
``double`` before using it with an ``%f`` or similar specifier.
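
The explicit-cast advice in the last paragraph of the draft above can be sketched in C as follows. This is only an illustration and not part of the patch: `half_t` and `format_half` are hypothetical names introduced here, and the `__FLT16_MANT_DIG__` check is an assumption about how to detect `_Float16` availability (to my knowledge GCC and Clang predefine it where the type is supported). The `float` fallback exists only so the sketch still compiles on other toolchains.

```c
#include <assert.h>
#include <stdio.h>

#if defined(__FLT16_MANT_DIG__)
typedef _Float16 half_t; /* real _Float16 where the compiler provides it */
#else
typedef float half_t;    /* assumption: fallback so the sketch still compiles */
#endif

/* Formats a half-precision value into buf and returns snprintf's result.
   Hypothetical helper, not part of Clang or the C library. */
int format_half(char *buf, size_t size, half_t value) {
    assert(buf != NULL && size > 0);
    /* _Float16 is not default-promoted when passed through "...", so
       convert it to double explicitly before pairing it with %f. */
    return snprintf(buf, size, "%f", (double)value);
}
```

Calling `format_half(buf, sizeof buf, (half_t)1.5f)` writes `1.500000` into `buf`; passing the `_Float16` value to `snprintf` directly, without the cast, would not match the `%f` specifier.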

clang/lib/Sema/SemaOverload.cpp

1998

pengfei added inline comments. May 24 2023, 5:25 AM

clang/docs/LanguageExtensions.rst
852	> Only some of the supported processors for ``__fp16`` and ``__bf16`` offer native hardware support for arithmetic in their corresponding formats.
Do you mean `_Float16`?
> The exact conditions are described in the lists above. When compiling for a processor without native support, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary.
This conflicts a bit with `These types are supported in all language modes, but not on all targets`. Why do we need to emulate a type that is not necessarily supported on all targets?
My understanding is that inserting extensions and truncations serves two purposes:
1. Supporting a type that is designed to be available on all targets. For now, this is only used for __fp16.
2. Supporting excess-precision=`standard`. This applies to both _Float16 and __bf16.

rjmccall added inline comments. May 24 2023, 11:07 AM

clang/docs/LanguageExtensions.rst
852	> Do you mean `_Float16`?
Yes, thank you. I knew I'd screw that up somewhere.
> Why do we need to emulate for a type that doesn't necessarily support on all target?
Would this be clearer?

Arithmetic on ``_Float16`` and ``__bf16`` is enabled on some targets that don't provide native architectural support for arithmetic on these formats. These targets are noted in the lists of supported targets above. On these targets, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary.

> My understand is that inserting extensions and truncations are used for 2 purposes:
No, I believe we always insert extensions and truncations. The cases you're describing are places we insert extensions and truncations in the frontend, so that the backend doesn't see operations on `half` / `bfloat` at all. But when these operations do make it to the backend, and there's no direct architectural support for them on the target, the backend still just inserts extensions and truncations so it can do the arithmetic in `float`.

This is clearest in the ARM codegen (https://godbolt.org/z/q9KoGEYqb) because the conversions are just instructions, but you can also see it in the X86 codegen (https://godbolt.org/z/ejdd4P65W): all the runtime functions are just extensions/truncations, and the actual arithmetic is done with `mulss` and `addss`. This frontend/backend distinction is not something that matters to users, so the documentation glosses over the difference.

I haven't done an exhaustive investigation, so it's possible that there are types and targets where we emit a compiler-rt call to do each operation instead, but those compiler-rt functions almost certainly just do an extension to float in the same way, so I don't think the documentation as written would be misleading for those targets, either.

Incorporates suggestions provided by @rjmccall, @pengfei, and @zahiraam.

@rjmccall, your thorough restructuring of the floating-point types documentation is highly appreciated. Thank you.

Harbormaster completed remote builds in B234295: Diff 525298. May 24 2023, 8:53 PM

pengfei added inline comments. May 25 2023, 12:19 AM

clang/docs/LanguageExtensions.rst
852	Thanks for the explanation! Sorry, I failed to make the distinction between "support" and "natively support"; I guess users may be confused at first too. I agree the documentation should explain the whole behavior of the compiler to users. I think there are three aspects we want to tell users:
1. Whether a type is an arithmetic type or not, and whether it is (natively) supported by all targets or just a few;
2. The result of a type may not be consistent across different targets and/or excess-precision values;
3. The excess-precision control doesn't take effect if a type is natively supported by the target.
It would be clearer if we could give such a summary before the detailed explanation.

codemzs added inline comments. May 25 2023, 11:15 AM

clang/docs/LanguageExtensions.rst
852	Does adding the below to the top of the description make it more clear?

Half-Precision Floating Point

Clang supports three half-precision (16-bit) floating point types: ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all language modes, but their support differs across targets. Here, it's important to understand the difference between "support" and "natively support":

- A type is "supported" if the compiler can handle code using that type, which might involve translating operations into an equivalent code that the target hardware understands.
- A type is "natively supported" if the hardware itself understands the type and can perform operations on it directly. This typically yields better performance and more accurate results.

Another crucial aspect to note is the consistency of the result of a type across different targets and excess-precision values. Different hardware (targets) might produce slightly different results due to the level of precision they support and how they handle excess-precision values. It means the same code can yield different results when compiled for different hardware.

Finally, note that the control of excess-precision does not take effect if a type is natively supported by targets. If the hardware supports the type directly, the compiler does not need to (and cannot) use excess precision to potentially speed up the operations.

Given these points, here is the detailed support for each type:

- ``__fp16`` is supported on every target.

- ``_Float16`` is currently supported on the following targets:
  * 32-bit ARM (natively on some architecture versions)
  * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
  * AMDGPU (natively)
  * SPIR (natively)
  * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)

- ``__bf16`` is currently supported on the following targets:
  * 32-bit ARM
  * 64-bit ARM (AArch64)
  * X86 (when SSE2 is available)

... ...

rjmccall added inline comments. May 25 2023, 12:05 PM

clang/docs/LanguageExtensions.rst

852

I think that's a good basic idea, but it's okay to leave some of the detail for later. How about this:

Clang supports three half-precision (16-bit) floating point types: ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all language modes, but their support differs between targets.  A target is said to have "native support" for a type if the target processor offers instructions for directly performing basic arithmetic on that type.  In the absence of native support, a type can still be supported if the compiler can emulate arithmetic on the type by promoting to ``float``; see below for more information on this emulation.

* ``__fp16`` is supported on all targets.  The special semantics of this type mean that no arithmetic is ever performed directly on ``__fp16`` values; see below.

* ``_Float16`` is supported on the following targets: (...)

* ``__bf16`` is supported on the following targets (currently never natively): (...)

And then below we can adjust the paragraph about emulation:

When compiling arithmetic on ``_Float16`` and ``__bf16`` for a target without
native support, Clang will perform the arithmetic in ``float``, inserting extensions
and truncations as necessary.  This can be done in a way that exactly matches the
operation-by-operation behavior of native support, but that can require many
extra truncations and extensions.  By default, when emulating ``_Float16`` and
``__bf16`` arithmetic using ``float``, Clang does not truncate intermediate operands
back to their true type unless the operand is the result of an explicit cast or
assignment.  This is generally much faster but can generate different results from
strict operation-by-operation emulation.  (Usually the results are more precise.)
This is permitted by the C and C++ standards under the rules for excess precision
in intermediate operands; see the discussion of evaluation formats in the C
standard and [expr.pre] in the C++ standard.
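
The difference between strict operation-by-operation emulation and excess-precision emulation described above can be modeled in portable C. The sketch below is a software model only: it does not use `__bf16` or any Clang option, and `round_to_bf16`, `mul_add_strict` and `mul_add_excess` are hypothetical helpers introduced here for illustration. It assumes `float` is IEEE 754 binary32 and models the final truncation as round-to-nearest-even.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rounds a float to the nearest bfloat16 value (ties to even) and returns
   the result widened back to float. NaN inputs are outside this sketch. */
float round_to_bf16(float x) {
    uint32_t bits;
    assert(x == x); /* reject NaN; it would need separate handling */
    memcpy(&bits, &x, sizeof bits);
    bits += 0x7FFFu + ((bits >> 16) & 1u); /* round to nearest, ties to even */
    bits &= 0xFFFF0000u;                   /* keep the upper 16 bits only */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Strict emulation: truncate back to bfloat16 after every operation. */
float mul_add_strict(float a, float b, float c) {
    float product = round_to_bf16(a * b);
    return round_to_bf16(product + c);
}

/* Excess precision: keep the intermediate product in float and truncate
   only the final result. */
float mul_add_excess(float a, float b, float c) {
    return round_to_bf16(a * b + c);
}
```

With `a = b = 1.0078125f` and `c = -1.015625f` (all exactly representable in bfloat16), the strict version returns `0` because the product `1.01568603515625` rounds to `1.015625` before the addition, while the excess-precision version keeps the low-order bit and returns `2^-14`. This is the "usually the results are more precise" case mentioned in the draft.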

pengfei added inline comments. May 25 2023, 7:16 PM

clang/docs/LanguageExtensions.rst
852	This revision looks better. The contents are rather clear to me. Thanks!

Addresses feedback on extended floating type documentation from @rjmccall and @pengfei

Harbormaster completed remote builds in B234749: Diff 525920. May 25 2023, 10:23 PM

One slight miscommunication. Otherwise this LGTM, thank you.

clang/docs/LanguageExtensions.rst
880	You can drop this paragraph, it's no longer necessary. I should've been clearer that I was suggesting this, sorry.

Addresses @rjmccall suggestions.

rjmccall accepted this revision. May 26 2023, 9:31 AM

This revision is now accepted and ready to land. May 26 2023, 9:31 AM

Hi @rjmccall, @pengfei, and @zahiraam,

Thank you for your valuable review and acceptance of my patch. As I lack commit access, could I kindly request one of you to perform the commit on my behalf? Please use the following command: git commit --amend --author="M. Zeeshan Siddiqui <mzs@microsoft.com>".

git commit message:

[Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling,
and extend excess precision support

Pursuant to discussions at
https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22,
this commit enhances the handling of the __bf16 type in Clang.
- Firstly, it upgrades __bf16 from a storage-only type to an arithmetic
  type.
- Secondly, it changes the mangling of __bf16 to DF16b on all
  architectures except ARM. This change has been made in
  accordance with the finalization of the mangling for the
  std::bfloat16_t type, as discussed at
  https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
- Finally, this commit extends the existing excess precision support to
  the __bf16 type. This applies to hardware architectures that do not
  natively support bfloat16 arithmetic.
Appropriate tests have been added to verify the effects of these
changes and ensure no regressions in other areas of the compiler.

Reviewed By: rjmccall, pengfei, zahiraam

Differential Revision: https://reviews.llvm.org/D150913

I would like to add that I have rebased this patch on LLVM main as of just now and also applied clang formatting to this patch. However, to maintain consistency and respect untouched lines, I opted to reverse certain clang-format changes, which might result in a clang format failure on Debian. Should you deem it necessary for me to apply clang formatting across all lines regardless, I am open to revising the format accordingly.

Your assistance is greatly appreciated.

Harbormaster completed remote builds in B234869: Diff 526072. May 26 2023, 12:21 PM

This revision was landed with ongoing or failed builds. May 26 2023, 10:34 PM

Closed by commit rGe62175736551: [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and… (authored by codemzs, committed by pengfei).

This revision was automatically updated to reflect the committed changes.

pengfei added a commit: rGe62175736551: [Clang][BFloat16] Upgrade __bf16 to arithmetic type, change mangling, and….

codemzs mentioned this in D150291: [Clang] Rename internal type identifier(s) for __bf16 to BF16Ty. May 29 2023, 11:48 PM

I'm late to review and can no longer stamp an approval on this, but I'll note for the historical record that the changes look good to me.

Peter9606 added a subscriber: Peter9606. Jul 20 2023, 5:28 PM

Revision Contents

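Path	Size
clang/docs/LanguageExtensions.rst	4 lines
clang/include/clang/Basic/DiagnosticSemaKinds.td	2 lines
clang/include/clang/Basic/FPOptions.def	1 line
clang/include/clang/Basic/LangOptions.def	1 line
clang/include/clang/Basic/TargetInfo.h	15 lines
clang/include/clang/Driver/Options.td	9 lines
clang/lib/AST/Type.cpp	21 lines
clang/lib/Basic/TargetInfo.cpp	1 line
clang/lib/Basic/Targets/ (five entries, file names not listed)	1 line, 3 lines, 1 line, 1 line, 12 lines
clang/lib/CodeGen/CGExprScalar.cpp	16 lines
clang/lib/Driver/ToolChains/Clang.cpp	5 lines
clang/lib/Sema/SemaCast.cpp	14 lines
clang/lib/Sema/SemaExpr.cpp	4 lines
clang/lib/Sema/SemaOverload.cpp	10 lines
clang/test/CodeGen/X86/avx512bf16-error.c	1 line
clang/test/CodeGen/X86/bfloat-mangle.cpp	2 lines
clang/test/CodeGen/X86/bfloat16.cpp	145 lines
clang/test/CodeGen/X86/fexcess-precision-bfloat16.c	360 lines
clang/test/CodeGenCUDA/amdgpu-bf16.cu	10 lines
clang/test/CodeGenCUDA/bf16.cu	22 lines
clang/test/Driver/fexcess-precision.c	4 lines
clang/test/Sema/arm-bf16-forbidden-ops.c
clang/test/Sema/arm-bf16-forbidden-ops.cpp
clang/test/Sema/arm-bfloat.cpp	58 lines
clang/test/SemaCUDA/amdgpu-bf16.cu	44 lines
clang/test/SemaCUDA/bf16.cu	30 lines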

Diff 524467

clang/docs/LanguageExtensions.rst

[793 lines not shown]
 performing the operation, and then truncating to ``_Float16``. When doing this
 emulation, Clang defaults to following the C standard's rules for excess
 precision arithmetic, which avoids intermediate truncations within statements
 and may generate different results from a strict operation-by-operation
 emulation.

 ``_Float16`` will be supported on more targets as they define ABIs for it.

-``__bf16`` is purely a storage format; it is currently only supported on the following targets:
+``__bf16`` is an arithmetic type that utilizes native support if available,
+else promotes to ``float`` for arithmetic operations and truncates back to
+``__bf16``, akin to ``_Float16``. It's currently supported on:"

 * 32-bit ARM
 * 64-bit ARM (AArch64)
 * X86 (see below)

 On X86 targets, ``__bf16`` is supported as long as SSE2 is available, which
 includes all 64-bit and all recent 32-bit processors.

[31 lines not shown]
 ``3.14f16``.

 Because default argument promotion only applies to the standard floating-point
 types, ``_Float16`` values are not promoted to ``double`` when passed as variadic
 or untyped arguments. As a consequence, some caution must be taken when using
 certain library facilities with ``_Float16``; for example, there is no ``printf`` format
 specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to
 ``double`` when passed to ``printf``, so the programmer must explicitly cast it to
 ``double`` before using it with an ``%f`` or similar specifier.
rjmccall: Suggested rework:

Clang supports three half-precision (16-bit) floating point types: ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all language modes, but not on all targets:

- ``__fp16`` is supported on every target.

- ``_Float16`` is currently supported on the following targets:

  * 32-bit ARM (natively on some architecture versions)
  * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
  * AMDGPU (natively)
  * SPIR (natively)
  * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)

- ``__bf16`` is currently supported on the following targets:

  * 32-bit ARM
  * 64-bit ARM (AArch64)
  * X86 (when SSE2 is available)

(For X86, SSE2 is available on 64-bit and all recent 32-bit processors.)

``__fp16`` and ``_Float16`` both use the binary16 format from IEEE 754-2008, which provides a 5-bit exponent and an 11-bit significand (counting the implicit leading 1). ``__bf16`` uses the `bfloat16 <https://en.wikipedia.org/wiki/Bfloat16_floating-point_format>`_ format, which provides an 8-bit exponent and an 8-bit significand; this is the same exponent range as `float`, just with greatly reduced precision.

``_Float16`` and ``__bf16`` follow the usual rules for arithmetic floating-point types. Most importantly, this means that arithmetic operations on operands of these types are formally performed in the type and produce values of the type. ``__fp16`` does not follow those rules: most operations immediately promote operands of type ``__fp16`` to ``float``, and so arithmetic operations are defined to be performed in ``float`` and so result in a value of type ``float`` (unless further promoted because of other operands). See below for more information on the exact specifications of these types.

Only some of the supported processors for ``__fp16`` and ``__bf16`` offer native hardware support for arithmetic in their corresponding formats. The exact conditions are described in the lists above. When compiling for a processor without native support, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary. This can be done in a way that exactly emulates the behavior of hardware support for arithmetic, but it can require many extra operations. By default, Clang takes advantage of the C standard's allowances for excess precision in intermediate operands in order to eliminate intermediate truncations within statements. This is generally much faster but can generate different results from strict operation-by-operation emulation.

The use of excess precision can be independently controlled for these two types with the ``-ffloat16-excess-precision=`` and ``-fbfloat16-excess-precision=`` options. Valid values include:

- ``none`` (meaning to perform strict operation-by-operation emulation)
- ``standard`` (meaning that excess precision is permitted under the rules described in the standard, i.e. never across explicit casts or statements)
- ``fast`` (meaning that excess precision is permitted whenever the optimizer sees an opportunity to avoid truncations; currently this has no effect beyond ``standard``)

The ``_Float16`` type is an interchange floating type specified in ISO/IEC TS 18661-3:2015 ("Floating-point extensions for C"). It will be supported on more targets as they define ABIs for it.

The ``__bf16`` type is a non-standard extension, but it generally follows the rules for arithmetic interchange floating types from ISO/IEC TS 18661-3:2015. In previous versions of Clang, it was a storage-only type that forbade arithmetic operations. It will be supported on more targets as they define ABIs for it.

The ``__fp16`` type was originally an ARM extension and is specified by the `ARM C Language Extensions <https://github.com/ARM-software/acle/releases>`_. Clang uses the ``binary16`` format from IEEE 754-2008 for ``__fp16``, not the ARM alternative format. Operators that expect arithmetic operands immediately promote ``__fp16`` operands to ``float``.

It is recommended that portable code use ``_Float16`` instead of ``__fp16``, as it has been defined by the C standards committee and has behavior that is more familiar to most programmers.

Because ``__fp16`` operands are always immediately promoted to ``float``, the common real type of ``__fp16`` and ``_Float16`` for the purposes of the usual arithmetic conversions is ``float``.

A literal can be given ``_Float16`` type using the suffix ``f16``. For example, ``3.14f16``.

Because default argument promotion only applies to the standard floating-point types, ``_Float16`` values are not promoted to ``double`` when passed as variadic or untyped arguments. As a consequence, some caution must be taken when using certain library facilities with ``_Float16``; for example, there is no ``printf`` format specifier for ``_Float16``, and (unlike ``float``) it will not be implicitly promoted to ``double`` when passed to ``printf``, so the programmer must explicitly cast it to ``double`` before using it with an ``%f`` or similar specifier.
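To make the format description above concrete, here is a minimal sketch of converting between ``float`` and the bfloat16 bit pattern. It is not taken from Clang or compiler-rt: the helper names `float_to_bf16_bits`, `bf16_bits_to_float` and `bf16_format_demo` are hypothetical, and the sketch assumes round-to-nearest, ties-to-even, with simplified NaN handling. It relies only on the fact stated above that bfloat16 shares the exponent range of ``float``, so a bfloat16 value is the upper 16 bits of the corresponding ``float``.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a Clang API): round a float to the nearest
// bfloat16 value, ties to even, and return the 16-bit pattern.
inline std::uint16_t float_to_bf16_bits(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: truncate and set a mantissa bit so the result stays a NaN.
    return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
  }
  // Adding 0x7fff plus the lowest kept bit rounds to nearest, ties to even.
  u += 0x7fffu + ((u >> 16) & 1u);
  return static_cast<std::uint16_t>(u >> 16);
}

// Hypothetical helper: widen a bfloat16 bit pattern back to float (exact).
inline float bf16_bits_to_float(std::uint16_t bits) {
  std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Usage example: 3.14159f keeps only 8 significant bits in bfloat16.
inline void bf16_format_demo() {
  assert(float_to_bf16_bits(1.0f) == 0x3f80);
  assert(float_to_bf16_bits(3.14159f) == 0x4049);
  assert(bf16_bits_to_float(0x4049) == 3.140625f);
}
```

The widening direction is always exact, which is why emulating ``__bf16`` arithmetic in ``float`` only loses information at the truncation steps.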
pengfei:

> Only some of the supported processors for ``__fp16`` and ``__bf16`` offer native hardware support for arithmetic in their corresponding formats.

Do you mean `_Float16`?

> The exact conditions are described in the lists above. When compiling for a processor without native support, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary.

This conflicts a bit with `These types are supported in all language modes, but not on all targets`. Why do we need to emulate for a type that doesn't necessarily support on all target?

My understand is that inserting extensions and truncations are used for 2 purposes:

1. A type that is designed to be supported on all targets. For now, this is only used for __fp16.
2. Support for excess-precision=`standard`. This applies to both _Float16 and __bf16.
rjmccall:

> Do you mean `_Float16`?

Yes, thank you. I knew I'd screw that up somewhere.

> Why do we need to emulate for a type that doesn't necessarily support on all target?

Would this be clearer?

  Arithmetic on ``_Float16`` and ``__bf16`` is enabled on some targets that don't provide native architectural support for arithmetic on these formats. These targets are noted in the lists of supported targets above. On these targets, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary.

> My understand is that inserting extensions and truncations are used for 2 purposes:

No, I believe we always insert extensions and truncations. The cases you're describing are places we insert extensions and truncations in the frontend, so that the backend doesn't see operations on `half` / `bfloat` at all. But when these operations do make it to the backend, and there's no direct architectural support for them on the target, the backend still just inserts extensions and truncations so it can do the arithmetic in `float`. This is clearest in the ARM codegen (https://godbolt.org/z/q9KoGEYqb) because the conversions are just instructions, but you can also see it in the X86 codegen (https://godbolt.org/z/ejdd4P65W): all the runtime functions are just extensions/truncations, and the actual arithmetic is done with `mulss` and `addss`. This frontend/backend distinction is not something that matters to users, so the documentation glosses over the difference.

I haven't done an exhaustive investigation, so it's possible that there are types and targets where we emit a compiler-rt call to do each operation instead, but those compiler-rt functions almost certainly just do an extension to float in the same way, so I don't think the documentation as written would be misleading for those targets, either.
pengfei: Thanks for the explanation! Sorry, I failed to make the distinction between "support" and "natively support"; I guess users may be confused at first too.

I agree the documentation is there to explain the whole behavior of the compiler to users. I think there are 3 aspects that we want to tell users:

1. Whether a type is an arithmetic type or not, and whether it is (natively) supported by all targets or just a few;
2. The result of a type may not be consistent across different targets and/or excess-precision values;
3. The excess-precision control doesn't take effect if a type is natively supported by the target.

It would be clearer if we could give such a summary before the detailed explanation.
codemzs: Does adding the below to the top of the description make it more clear?

  Half-Precision Floating Point

  Clang supports three half-precision (16-bit) floating point types: ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all language modes, but their support differs across targets. Here, it's important to understand the difference between "support" and "natively support":

  - A type is "supported" if the compiler can handle code using that type, which might involve translating operations into an equivalent code that the target hardware understands.
  - A type is "natively supported" if the hardware itself understands the type and can perform operations on it directly. This typically yields better performance and more accurate results.

  Another crucial aspect to note is the consistency of the result of a type across different targets and excess-precision values. Different hardware (targets) might produce slightly different results due to the level of precision they support and how they handle excess-precision values. It means the same code can yield different results when compiled for different hardware.

  Finally, note that the control of excess-precision does not take effect if a type is natively supported by targets. If the hardware supports the type directly, the compiler does not need to (and cannot) use excess precision to potentially speed up the operations.

  Given these points, here is the detailed support for each type:

  - `__fp16` is supported on every target.
  - `_Float16` is currently supported on the following targets:

    * 32-bit ARM (natively on some architecture versions)
    * 64-bit ARM (AArch64) (natively on ARMv8.2a and above)
    * AMDGPU (natively)
    * SPIR (natively)
    * X86 (if SSE2 is available; natively if AVX512-FP16 is also available)

  - `__bf16` is currently supported on the following targets:

    * 32-bit ARM
    * 64-bit ARM (AArch64)
    * X86 (when SSE2 is available)

  ...
rjmccall: I think that's a good basic idea, but it's okay to leave some of the detail for later. How about this:

  Clang supports three half-precision (16-bit) floating point types: ``__fp16``, ``_Float16`` and ``__bf16``. These types are supported in all language modes, but their support differs between targets. A target is said to have "native support" for a type if the target processor offers instructions for directly performing basic arithmetic on that type. In the absence of native support, a type can still be supported if the compiler can emulate arithmetic on the type by promoting to ``float``; see below for more information on this emulation.

  * ``__fp16`` is supported on all targets. The special semantics of this type mean that no arithmetic is ever performed directly on ``__fp16`` values; see below.
  * ``_Float16`` is supported on the following targets: (...)
  * ``__bf16`` is supported on the following targets (currently never natively): (...)

And then below we can adjust the paragraph about emulation:

  When compiling arithmetic on ``_Float16`` and ``__bf16`` for a target without native support, Clang will perform the arithmetic in ``float``, inserting extensions and truncations as necessary. This can be done in a way that exactly matches the operation-by-operation behavior of native support, but that can require many extra truncations and extensions. By default, when emulating ``_Float16`` and ``__bf16`` arithmetic using ``float``, Clang does not truncate intermediate operands back to their true type unless the operand is the result of an explicit cast or assignment. This is generally much faster but can generate different results from strict operation-by-operation emulation. (Usually the results are more precise.) This is permitted by the C and C++ standards under the rules for excess precision in intermediate operands; see the discussion of evaluation formats in the C standard and [expr.pre] in the C++ standard.
pengfei: This revision looks better. The contents are rather clear to me. Thanks!
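The difference between strict operation-by-operation emulation and the default excess-precision behavior discussed in this thread can be illustrated with a small simulation. This is only a sketch under stated assumptions: it models ``__bf16`` values as ``float`` values rounded to 8 significant bits rather than using the real type, and the names `round_to_bf16`, `mul_add_strict`, `mul_add_excess` and `excess_precision_demo` are hypothetical, not Clang or compiler-rt functions. NaN inputs are not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a (non-NaN) float to the nearest value
// representable in bfloat16, ties to even, and return it as a float.
inline float round_to_bf16(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  u += 0x7fffu + ((u >> 16) & 1u); // round to nearest, ties to even
  u &= 0xffff0000u;                // drop the low 16 bits
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Strict emulation: truncate back to bfloat16 after every operation.
inline float mul_add_strict(float a, float b, float c) {
  float prod = round_to_bf16(a * b);
  return round_to_bf16(prod + c);
}

// Excess precision: keep the intermediate product in float and truncate
// only once, at the end of the expression.
inline float mul_add_excess(float a, float b, float c) {
  float prod = a * b;
  float sum = prod + c;
  return round_to_bf16(sum);
}

// Usage example with inputs that are exactly representable in bfloat16.
inline void excess_precision_demo() {
  const float a = 1.0078125f; // 1 + 2^-7
  const float c = -1.015625f; // -(1 + 2^-6)
  assert(mul_add_strict(a, a, c) == 0.0f);
  assert(mul_add_excess(a, a, c) == 6.103515625e-05f); // 2^-14
}
```

Here the exact product is 1 + 2^-6 + 2^-14. Strict emulation rounds away the 2^-14 term before the addition and returns 0, while the excess-precision path preserves it, matching the remark above that the results are usually more precise.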

Messages on ``deprecated`` and ``unavailable`` Attributes		Messages on ``deprecated`` and ``unavailable`` Attributes
=========================================================		=========================================================

An optional string message can be added to the ``deprecated`` and		An optional string message can be added to the ``deprecated`` and
``unavailable`` attributes. For example:		``unavailable`` attributes. For example:

.. code-block:: c++		.. code-block:: c++
Show All 11 Lines	harmless.c:4:3: warning: 'explode' is deprecated: extremely unsafe, use 'combust' instead!!!
^		^

Query for this feature with		Query for this feature with
``__has_extension(attribute_deprecated_with_message)`` and		``__has_extension(attribute_deprecated_with_message)`` and
``__has_extension(attribute_unavailable_with_message)``.		``__has_extension(attribute_unavailable_with_message)``.

Attributes on Enumerators		Attributes on Enumerators
=========================		=========================

rjmccall: You can drop this paragraph, it's no longer necessary. I should've been clearer that I was suggesting this, sorry.
Clang allows attributes to be written on individual enumerators. This allows		Clang allows attributes to be written on individual enumerators. This allows
enumerators to be deprecated, made unavailable, etc. The attribute must appear		enumerators to be deprecated, made unavailable, etc. The attribute must appear
after the enumerator name and before any initializer, like so:		after the enumerator name and before any initializer, like so:

.. code-block:: c++		.. code-block:: c++

enum OperationMode {		enum OperationMode {
OM_Invalid,		OM_Invalid,
▲ Show 20 Lines • Show All 4,114 Lines • Show Last 20 Lines

clang/include/clang/Basic/DiagnosticSemaKinds.td


Show First 20 Lines • Show All 8,742 Lines • ▼ Show 20 Lines	def warn_cast_function_type : Warning<
InGroup<CastFunctionType>, DefaultIgnore;		InGroup<CastFunctionType>, DefaultIgnore;
def warn_cast_function_type_strict : Warning<warn_cast_function_type.Summary>,		def warn_cast_function_type_strict : Warning<warn_cast_function_type.Summary>,
InGroup<CastFunctionTypeStrict>, DefaultIgnore;		InGroup<CastFunctionTypeStrict>, DefaultIgnore;
def err_cast_pointer_to_non_pointer_int : Error<		def err_cast_pointer_to_non_pointer_int : Error<
"pointer cannot be cast to type %0">;		"pointer cannot be cast to type %0">;
def err_nullptr_cast : Error<		def err_nullptr_cast : Error<
"cannot cast an object of type %select{'nullptr_t' to %1|%1 to 'nullptr_t'}0"		"cannot cast an object of type %select{'nullptr_t' to %1|%1 to 'nullptr_t'}0"
>;		>;
def err_cast_to_bfloat16 : Error<"cannot type-cast to __bf16">;
def err_cast_from_bfloat16 : Error<"cannot type-cast from __bf16">;
def err_typecheck_expect_scalar_operand : Error<		def err_typecheck_expect_scalar_operand : Error<
"operand of type %0 where arithmetic or pointer type is required">;		"operand of type %0 where arithmetic or pointer type is required">;
def err_typecheck_cond_incompatible_operands : Error<		def err_typecheck_cond_incompatible_operands : Error<
"incompatible operand types%diff{ ($ and $)|}0,1">;		"incompatible operand types%diff{ ($ and $)|}0,1">;
def err_typecheck_expect_flt_or_vector : Error<		def err_typecheck_expect_flt_or_vector : Error<
"invalid operand of type %0 where floating, complex or "		"invalid operand of type %0 where floating, complex or "
"a vector of such types is required">;		"a vector of such types is required">;
def err_cast_selector_expr : Error<		def err_cast_selector_expr : Error<
▲ Show 20 Lines • Show All 3,084 Lines • Show Last 20 Lines

clang/include/clang/Basic/FPOptions.def

	Show All 20 Lines
	OPTION(AllowFPReassociate, bool, 1, AllowFEnvAccess)			OPTION(AllowFPReassociate, bool, 1, AllowFEnvAccess)
	OPTION(NoHonorNaNs, bool, 1, AllowFPReassociate)			OPTION(NoHonorNaNs, bool, 1, AllowFPReassociate)
	OPTION(NoHonorInfs, bool, 1, NoHonorNaNs)			OPTION(NoHonorInfs, bool, 1, NoHonorNaNs)
	OPTION(NoSignedZero, bool, 1, NoHonorInfs)			OPTION(NoSignedZero, bool, 1, NoHonorInfs)
	OPTION(AllowReciprocal, bool, 1, NoSignedZero)			OPTION(AllowReciprocal, bool, 1, NoSignedZero)
	OPTION(AllowApproxFunc, bool, 1, AllowReciprocal)			OPTION(AllowApproxFunc, bool, 1, AllowReciprocal)
	OPTION(FPEvalMethod, LangOptions::FPEvalMethodKind, 2, AllowApproxFunc)			OPTION(FPEvalMethod, LangOptions::FPEvalMethodKind, 2, AllowApproxFunc)
	OPTION(Float16ExcessPrecision, LangOptions::ExcessPrecisionKind, 2, FPEvalMethod)			OPTION(Float16ExcessPrecision, LangOptions::ExcessPrecisionKind, 2, FPEvalMethod)
				OPTION(BFloat16ExcessPrecision, LangOptions::ExcessPrecisionKind, 2, FPEvalMethod)
	#undef OPTION			#undef OPTION

clang/include/clang/Basic/LangOptions.def

	Show First 20 Lines • Show All 312 Lines • ▼ Show 20 Lines
	COMPATIBLE_LANGOPT(CLFiniteMathOnly , 1, 0, "__FINITE_MATH_ONLY__ predefined macro")			COMPATIBLE_LANGOPT(CLFiniteMathOnly , 1, 0, "__FINITE_MATH_ONLY__ predefined macro")
	/// FP_CONTRACT mode (on/off/fast).			/// FP_CONTRACT mode (on/off/fast).
	BENIGN_ENUM_LANGOPT(DefaultFPContractMode, FPModeKind, 2, FPM_Off, "FP contraction type")			BENIGN_ENUM_LANGOPT(DefaultFPContractMode, FPModeKind, 2, FPM_Off, "FP contraction type")
	COMPATIBLE_LANGOPT(ExpStrictFP, 1, false, "Enable experimental strict floating point")			COMPATIBLE_LANGOPT(ExpStrictFP, 1, false, "Enable experimental strict floating point")
	BENIGN_LANGOPT(RoundingMath, 1, false, "Do not assume default floating-point rounding behavior")			BENIGN_LANGOPT(RoundingMath, 1, false, "Do not assume default floating-point rounding behavior")
	BENIGN_ENUM_LANGOPT(FPExceptionMode, FPExceptionModeKind, 2, FPE_Default, "FP Exception Behavior Mode type")			BENIGN_ENUM_LANGOPT(FPExceptionMode, FPExceptionModeKind, 2, FPE_Default, "FP Exception Behavior Mode type")
	BENIGN_ENUM_LANGOPT(FPEvalMethod, FPEvalMethodKind, 2, FEM_UnsetOnCommandLine, "FP type used for floating point arithmetic")			BENIGN_ENUM_LANGOPT(FPEvalMethod, FPEvalMethodKind, 2, FEM_UnsetOnCommandLine, "FP type used for floating point arithmetic")
	ENUM_LANGOPT(Float16ExcessPrecision, ExcessPrecisionKind, 2, FPP_Standard, "Intermediate truncation behavior for floating point arithmetic")			ENUM_LANGOPT(Float16ExcessPrecision, ExcessPrecisionKind, 2, FPP_Standard, "Intermediate truncation behavior for floating point arithmetic")
				ENUM_LANGOPT(BFloat16ExcessPrecision, ExcessPrecisionKind, 2, FPP_Standard, "Intermediate truncation behavior for floating point arithmetic")
				zahiraam: Maybe differentiate the description from the previous line?
	LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")			LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")
	LANGOPT(HexagonQdsp6Compat , 1, 0, "hexagon-qdsp6 backward compatibility")			LANGOPT(HexagonQdsp6Compat , 1, 0, "hexagon-qdsp6 backward compatibility")
	LANGOPT(ObjCAutoRefCount , 1, 0, "Objective-C automated reference counting")			LANGOPT(ObjCAutoRefCount , 1, 0, "Objective-C automated reference counting")
	LANGOPT(ObjCWeakRuntime , 1, 0, "__weak support in the ARC runtime")			LANGOPT(ObjCWeakRuntime , 1, 0, "__weak support in the ARC runtime")
	LANGOPT(ObjCWeak , 1, 0, "Objective-C __weak in ARC and MRC files")			LANGOPT(ObjCWeak , 1, 0, "Objective-C __weak in ARC and MRC files")
	LANGOPT(ObjCSubscriptingLegacyRuntime , 1, 0, "Subscripting support in legacy ObjectiveC runtime")			LANGOPT(ObjCSubscriptingLegacyRuntime , 1, 0, "Subscripting support in legacy ObjectiveC runtime")
	BENIGN_LANGOPT(CompatibilityQualifiedIdBlockParamTypeChecking, 1, 0,			BENIGN_LANGOPT(CompatibilityQualifiedIdBlockParamTypeChecking, 1, 0,
	"compatibility mode for type checking block parameters "			"compatibility mode for type checking block parameters "
	▲ Show 20 Lines • Show All 148 Lines • Show Last 20 Lines

clang/include/clang/Basic/TargetInfo.h

Show First 20 Lines • Show All 213 Lines • ▼ Show 20 Lines	protected:
bool VLASupported;		bool VLASupported;
bool NoAsmVariants; // True if {|} are normal characters.		bool NoAsmVariants; // True if {|} are normal characters.
bool HasLegalHalfType; // True if the backend supports operations on the half		bool HasLegalHalfType; // True if the backend supports operations on the half
// LLVM IR type.		// LLVM IR type.
bool HalfArgsAndReturns;		bool HalfArgsAndReturns;
bool HasFloat128;		bool HasFloat128;
bool HasFloat16;		bool HasFloat16;
bool HasBFloat16;		bool HasBFloat16;
		bool HasFullBFloat16; // True if the backend supports native bfloat16
		// arithmetic. Used to determine excess precision
		// support in the frontend.
bool HasIbm128;		bool HasIbm128;
bool HasLongDouble;		bool HasLongDouble;
bool HasFPReturn;		bool HasFPReturn;
bool HasStrictFP;		bool HasStrictFP;

unsigned char MaxAtomicPromoteWidth, MaxAtomicInlineWidth;		unsigned char MaxAtomicPromoteWidth, MaxAtomicInlineWidth;
std::string DataLayoutString;		std::string DataLayoutString;
const char *UserLabelPrefix;		const char *UserLabelPrefix;
▲ Show 20 Lines • Show All 413 Lines • ▼ Show 20 Lines	public:

/// Determine whether the __float128 type is supported on this target.		/// Determine whether the __float128 type is supported on this target.
virtual bool hasFloat128Type() const { return HasFloat128; }		virtual bool hasFloat128Type() const { return HasFloat128; }

/// Determine whether the _Float16 type is supported on this target.		/// Determine whether the _Float16 type is supported on this target.
virtual bool hasFloat16Type() const { return HasFloat16; }		virtual bool hasFloat16Type() const { return HasFloat16; }

/// Determine whether the _BFloat16 type is supported on this target.		/// Determine whether the _BFloat16 type is supported on this target.
virtual bool hasBFloat16Type() const { return HasBFloat16; }		virtual bool hasBFloat16Type() const {
		return HasBFloat16 || HasFullBFloat16;
		}

		/// Determine whether the BFloat type is fully supported on this target, i.e.,
		/// arithmetic operations.
		virtual bool hasFullBFloat16Type() const { return HasFullBFloat16; }

/// Determine whether the __ibm128 type is supported on this target.		/// Determine whether the __ibm128 type is supported on this target.
virtual bool hasIbm128Type() const { return HasIbm128; }		virtual bool hasIbm128Type() const { return HasIbm128; }

/// Determine whether the long double type is supported on this target.		/// Determine whether the long double type is supported on this target.
virtual bool hasLongDoubleType() const { return HasLongDouble; }		virtual bool hasLongDoubleType() const { return HasLongDouble; }

/// Determine whether return of a floating point value is supported		/// Determine whether return of a floating point value is supported
▲ Show 20 Lines • Show All 91 Lines • ▼ Show 20 Lines	public:
virtual const char *getFloat128Mangling() const { return "g"; }		virtual const char *getFloat128Mangling() const { return "g"; }

/// Return the mangled code of __ibm128.		/// Return the mangled code of __ibm128.
virtual const char *getIbm128Mangling() const {		virtual const char *getIbm128Mangling() const {
llvm_unreachable("ibm128 not implemented on this target");		llvm_unreachable("ibm128 not implemented on this target");
}		}

/// Return the mangled code of bfloat.		/// Return the mangled code of bfloat.
virtual const char *getBFloat16Mangling() const {		virtual const char *getBFloat16Mangling() const { return "DF16b"; }
llvm_unreachable("bfloat not implemented on this target");
}

/// Return the value for the C99 FLT_EVAL_METHOD macro.		/// Return the value for the C99 FLT_EVAL_METHOD macro.
virtual LangOptions::FPEvalMethodKind getFPEvalMethod() const {		virtual LangOptions::FPEvalMethodKind getFPEvalMethod() const {
return LangOptions::FPEvalMethodKind::FEM_Source;		return LangOptions::FPEvalMethodKind::FEM_Source;
}		}

virtual bool supportSourceEvalMethod() const { return true; }		virtual bool supportSourceEvalMethod() const { return true; }

▲ Show 20 Lines • Show All 964 Lines • Show Last 20 Lines
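As a rough illustration of what the new `getBFloat16Mangling()` default means for symbol names, the sketch below builds the Itanium-style mangled name of a free function taking one builtin-type parameter. It is a deliberate simplification (global namespace, no substitutions, one parameter), and `Arch`, `bfloat16_mangling`, `mangle_unary_function` and `bf16_mangling_demo` are hypothetical names, not Clang APIs. It assumes that ARM keeps the vendor-extended spelling `u6__bf16`, which is the string the removed AMDGPU and NVPTX overrides in this diff returned.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the target distinction made in the summary:
// DF16b everywhere except ARM.
enum class Arch { ARM, Other };

inline const char *bfloat16_mangling(Arch arch) {
  // Assumption: ARM retains the vendor-extended type name "u6__bf16".
  return arch == Arch::ARM ? "u6__bf16" : "DF16b";
}

// Simplified Itanium mangling for `void name(T)` in the global namespace:
// "_Z" + <length of name> + <name> + <type code of T>.
inline std::string mangle_unary_function(const std::string &name,
                                         const std::string &param_code) {
  return "_Z" + std::to_string(name.size()) + name + param_code;
}

// Usage example: `void foo(__bf16)` under the two manglings.
inline void bf16_mangling_demo() {
  assert(mangle_unary_function("foo", bfloat16_mangling(Arch::Other)) ==
         "_Z3fooDF16b");
  assert(mangle_unary_function("foo", bfloat16_mangling(Arch::ARM)) ==
         "_Z3foou6__bf16");
}
```

Because the type code is part of every symbol that mentions `__bf16`, changing it is an ABI-visible change for the affected targets.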

clang/include/clang/Driver/Options.td


Show First 20 Lines • Show All 1,636 Lines • ▼ Show 20 Lines	def ffloat16_excess_precision_EQ : Joined<["-"], "ffloat16-excess-precision=">,
Group<f_Group>, Flags<[CC1Option, NoDriverOption]>,		Group<f_Group>, Flags<[CC1Option, NoDriverOption]>,
HelpText<"Allows control over excess precision on targets where native "		HelpText<"Allows control over excess precision on targets where native "
"support for Float16 precision types is not available. By default, excess "		"support for Float16 precision types is not available. By default, excess "
"precision is used to calculate intermediate results following the "		"precision is used to calculate intermediate results following the "
"rules specified in ISO C99.">,		"rules specified in ISO C99.">,
Values<"standard,fast,none">, NormalizedValuesScope<"LangOptions">,		Values<"standard,fast,none">, NormalizedValuesScope<"LangOptions">,
NormalizedValues<["FPP_Standard", "FPP_Fast", "FPP_None"]>,		NormalizedValues<["FPP_Standard", "FPP_Fast", "FPP_None"]>,
MarshallingInfoEnum<LangOpts<"Float16ExcessPrecision">, "FPP_Standard">;		MarshallingInfoEnum<LangOpts<"Float16ExcessPrecision">, "FPP_Standard">;
		def fbfloat16_excess_precision_EQ : Joined<["-"], "fbfloat16-excess-precision=">,
		Group<f_Group>, Flags<[CC1Option, NoDriverOption]>,
		HelpText<"Allows control over excess precision on targets where native "
		"support for BFloat16 precision types is not available. By default, excess "
		"precision is used to calculate intermediate results following the "
		"rules specified in ISO C99.">,
		Values<"standard,fast,none">, NormalizedValuesScope<"LangOptions">,
		NormalizedValues<["FPP_Standard", "FPP_Fast", "FPP_None"]>,
		MarshallingInfoEnum<LangOpts<"BFloat16ExcessPrecision">, "FPP_Standard">;
def : Flag<["-"], "fexpensive-optimizations">, Group<clang_ignored_gcc_optimization_f_Group>;		def : Flag<["-"], "fexpensive-optimizations">, Group<clang_ignored_gcc_optimization_f_Group>;
def : Flag<["-"], "fno-expensive-optimizations">, Group<clang_ignored_gcc_optimization_f_Group>;		def : Flag<["-"], "fno-expensive-optimizations">, Group<clang_ignored_gcc_optimization_f_Group>;
def fextdirs_EQ : Joined<["-"], "fextdirs=">, Group<f_Group>;		def fextdirs_EQ : Joined<["-"], "fextdirs=">, Group<f_Group>;
def : Flag<["-"], "fdefer-pop">, Group<clang_ignored_gcc_optimization_f_Group>;		def : Flag<["-"], "fdefer-pop">, Group<clang_ignored_gcc_optimization_f_Group>;
def : Flag<["-"], "fno-defer-pop">, Group<clang_ignored_gcc_optimization_f_Group>;		def : Flag<["-"], "fno-defer-pop">, Group<clang_ignored_gcc_optimization_f_Group>;
def : Flag<["-"], "fextended-identifiers">, Group<clang_ignored_f_Group>;		def : Flag<["-"], "fextended-identifiers">, Group<clang_ignored_f_Group>;
def : Flag<["-"], "fno-extended-identifiers">, Group<f_Group>, Flags<[Unsupported]>;		def : Flag<["-"], "fno-extended-identifiers">, Group<f_Group>, Flags<[Unsupported]>;
def fhosted : Flag<["-"], "fhosted">, Group<f_Group>;		def fhosted : Flag<["-"], "fhosted">, Group<f_Group>;
▲ Show 20 Lines • Show All 5,606 Lines • Show Last 20 Lines

clang/lib/AST/Type.cpp

Show First 20 Lines • Show All 1,481 Lines • ▼ Show 20 Lines	return Ctx.getObjCObjectType(baseType, objType->getTypeArgsAsWritten(),
/*isKindOf=*/false);		/*isKindOf=*/false);
}		}
};		};

} // namespace		} // namespace

bool QualType::UseExcessPrecision(const ASTContext &Ctx) {		bool QualType::UseExcessPrecision(const ASTContext &Ctx) {
const BuiltinType *BT = getTypePtr()->getAs<BuiltinType>();		const BuiltinType *BT = getTypePtr()->getAs<BuiltinType>();
if (BT) {		if (!BT) {
		const VectorType *VT = getTypePtr()->getAs<VectorType>();
		if (VT) {
		QualType ElementType = VT->getElementType();
		return ElementType.UseExcessPrecision(Ctx);
		}
		} else {
switch (BT->getKind()) {		switch (BT->getKind()) {
case BuiltinType::Kind::Float16: {		case BuiltinType::Kind::Float16: {
const TargetInfo &TI = Ctx.getTargetInfo();		const TargetInfo &TI = Ctx.getTargetInfo();
if (TI.hasFloat16Type() && !TI.hasLegalHalfType() &&		if (TI.hasFloat16Type() && !TI.hasLegalHalfType() &&
Ctx.getLangOpts().getFloat16ExcessPrecision() !=		Ctx.getLangOpts().getFloat16ExcessPrecision() !=
Ctx.getLangOpts().ExcessPrecisionKind::FPP_None)		Ctx.getLangOpts().ExcessPrecisionKind::FPP_None)
return true;		return true;
return false;		return false;
}		} break;
		case BuiltinType::Kind::BFloat16: {
		const TargetInfo &TI = Ctx.getTargetInfo();
		if (TI.hasBFloat16Type() && !TI.hasFullBFloat16Type() &&
		Ctx.getLangOpts().getBFloat16ExcessPrecision() !=
		Ctx.getLangOpts().ExcessPrecisionKind::FPP_None)
		return true;
		return false;
		} break;
default:		default:
return false;		return false;
}		}
}		}
return false;		return false;
}		}

/// Substitute the given type arguments for Objective-C type		/// Substitute the given type arguments for Objective-C type
▲ Show 20 Lines • Show All 670 Lines • ▼ Show 20 Lines	bool Type::isRealType() const {
if (const auto *ET = dyn_cast<EnumType>(CanonicalType))		if (const auto *ET = dyn_cast<EnumType>(CanonicalType))
return ET->getDecl()->isComplete() && !ET->getDecl()->isScoped();		return ET->getDecl()->isComplete() && !ET->getDecl()->isScoped();
return isBitIntType();		return isBitIntType();
}		}

bool Type::isArithmeticType() const {		bool Type::isArithmeticType() const {
if (const auto *BT = dyn_cast<BuiltinType>(CanonicalType))		if (const auto *BT = dyn_cast<BuiltinType>(CanonicalType))
return BT->getKind() >= BuiltinType::Bool &&		return BT->getKind() >= BuiltinType::Bool &&
BT->getKind() <= BuiltinType::Ibm128 &&		BT->getKind() <= BuiltinType::Ibm128;
		codemzs: Remove the tab whitespace.
BT->getKind() != BuiltinType::BFloat16;
if (const auto *ET = dyn_cast<EnumType>(CanonicalType))		if (const auto *ET = dyn_cast<EnumType>(CanonicalType))
// GCC allows forward declaration of enum types (forbid by C99 6.7.2.3p2).		// GCC allows forward declaration of enum types (forbid by C99 6.7.2.3p2).
// If a body isn't seen by the time we get here, return false.		// If a body isn't seen by the time we get here, return false.
//		//
// C++0x: Enumerations are not arithmetic types. For now, just return		// C++0x: Enumerations are not arithmetic types. For now, just return
// false for scoped enumerations since that will disable any		// false for scoped enumerations since that will disable any
// unwanted implicit conversions.		// unwanted implicit conversions.
return !ET->getDecl()->isScoped() && ET->getDecl()->isComplete();		return !ET->getDecl()->isScoped() && ET->getDecl()->isComplete();
▲ Show 20 Lines • Show All 2,501 Lines • Show Last 20 Lines
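The `BFloat16` case added to `QualType::UseExcessPrecision` above combines three inputs: whether the target has the type at all, whether it has native ("full") arithmetic, and the excess-precision mode. The following standalone model restates that condition outside of Clang. `MockTarget`, `has_bfloat16_type`, `use_excess_precision_for_bf16` and `bf16_policy_demo` are hypothetical names introduced for illustration; only the boolean logic mirrors the diff.

```cpp
#include <cassert>

// Mirrors the three values accepted by -fbfloat16-excess-precision=.
enum class ExcessPrecisionKind { Standard, Fast, None };

// Hypothetical stand-in for the two TargetInfo flags used by the diff.
struct MockTarget {
  bool has_bf16;      // models HasBFloat16
  bool has_full_bf16; // models HasFullBFloat16 (native arithmetic)
};

// Models TargetInfo::hasBFloat16Type() as changed in this diff.
inline bool has_bfloat16_type(const MockTarget &t) {
  return t.has_bf16 || t.has_full_bf16;
}

// Models the BFloat16 case: excess precision applies only when the type is
// supported, arithmetic is not native, and the mode is not "none".
inline bool use_excess_precision_for_bf16(const MockTarget &t,
                                          ExcessPrecisionKind mode) {
  return has_bfloat16_type(t) && !t.has_full_bf16 &&
         mode != ExcessPrecisionKind::None;
}

// Usage example covering the interesting combinations.
inline void bf16_policy_demo() {
  const MockTarget emulated{true, false};
  const MockTarget native{true, true};
  assert(use_excess_precision_for_bf16(emulated, ExcessPrecisionKind::Standard));
  assert(!use_excess_precision_for_bf16(emulated, ExcessPrecisionKind::None));
  assert(!use_excess_precision_for_bf16(native, ExcessPrecisionKind::Standard));
}
```

This matches the point raised in the review thread that the excess-precision control has no effect on targets with native support.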

clang/lib/Basic/TargetInfo.cpp

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	TargetInfo::TargetInfo(const llvm::Triple &T) : Triple(T) {
VLASupported = true;		VLASupported = true;
NoAsmVariants = false;		NoAsmVariants = false;
HasLegalHalfType = false;		HasLegalHalfType = false;
HalfArgsAndReturns = false;		HalfArgsAndReturns = false;
HasFloat128 = false;		HasFloat128 = false;
HasIbm128 = false;		HasIbm128 = false;
HasFloat16 = false;		HasFloat16 = false;
HasBFloat16 = false;		HasBFloat16 = false;
		HasFullBFloat16 = false;
HasLongDouble = true;		HasLongDouble = true;
HasFPReturn = true;		HasFPReturn = true;
HasStrictFP = false;		HasStrictFP = false;
PointerWidth = PointerAlign = 32;		PointerWidth = PointerAlign = 32;
BoolWidth = BoolAlign = 8;		BoolWidth = BoolAlign = 8;
IntWidth = IntAlign = 32;		IntWidth = IntAlign = 32;
LongWidth = LongAlign = 32;		LongWidth = LongAlign = 32;
LongLongWidth = LongLongAlign = 64;		LongLongWidth = LongLongAlign = 64;
▲ Show 20 Lines • Show All 917 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/AMDGPU.h

Show First 20 Lines • Show All 111 Lines • ▼ Show 20 Lines	public:
uint64_t getPointerAlignV(LangAS AddrSpace) const override {		uint64_t getPointerAlignV(LangAS AddrSpace) const override {
return getPointerWidthV(AddrSpace);		return getPointerWidthV(AddrSpace);
}		}

uint64_t getMaxPointerWidth() const override {		uint64_t getMaxPointerWidth() const override {
return getTriple().getArch() == llvm::Triple::amdgcn ? 64 : 32;		return getTriple().getArch() == llvm::Triple::amdgcn ? 64 : 32;
}		}

bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }		bool hasBFloat16Type() const override { return isAMDGCN(getTriple()); }
const char *getBFloat16Mangling() const override { return "u6__bf16"; };

		pengfei: I think it's time to bring D139608 back with this patch :)
		codemzs: I'm inclined to establish a default value, overridden only for ARM, to avoid repetition. If there are no objections, I plan to implement this change in the next iteration.
std::string_view getClobbers() const override { return ""; }		std::string_view getClobbers() const override { return ""; }

ArrayRef<const char *> getGCCRegNames() const override;		ArrayRef<const char *> getGCCRegNames() const override;

ArrayRef<TargetInfo::GCCRegAlias> getGCCRegAliases() const override {		ArrayRef<TargetInfo::GCCRegAlias> getGCCRegAliases() const override {
return std::nullopt;		return std::nullopt;
}		}

▲ Show 20 Lines • Show All 348 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/ARM.cpp

Show First 20 Lines • Show All 508 Lines • ▼ Show 20 Lines	bool ARMTargetInfo::handleTargetFeatures(std::vector<std::string> &Features,
HWDiv = 0;		HWDiv = 0;
DotProd = 0;		DotProd = 0;
HasMatMul = 0;		HasMatMul = 0;
HasPAC = 0;		HasPAC = 0;
HasBTI = 0;		HasBTI = 0;
HasFloat16 = true;		HasFloat16 = true;
ARMCDECoprocMask = 0;		ARMCDECoprocMask = 0;
HasBFloat16 = false;		HasBFloat16 = false;
		HasFullBFloat16 = false;
FPRegsDisabled = false;		FPRegsDisabled = false;

// This does not diagnose illegal cases like having both		// This does not diagnose illegal cases like having both
// "+vfpv2" and "+vfpv3" or having "+neon" and "-fp64".		// "+vfpv2" and "+vfpv3" or having "+neon" and "-fp64".
for (const auto &Feature : Features) {		for (const auto &Feature : Features) {
if (Feature == "+soft-float") {		if (Feature == "+soft-float") {
SoftFloat = true;		SoftFloat = true;
} else if (Feature == "+vfp2sp" || Feature == "+vfp2") {		} else if (Feature == "+vfp2sp" || Feature == "+vfp2") {
[66 lines hidden] Context: if (Feature == "+soft-float") {
ARMCDECoprocMask |= (1U << Coproc);		ARMCDECoprocMask |= (1U << Coproc);
} else if (Feature == "+bf16") {		} else if (Feature == "+bf16") {
HasBFloat16 = true;		HasBFloat16 = true;
} else if (Feature == "-fpregs") {		} else if (Feature == "-fpregs") {
FPRegsDisabled = true;		FPRegsDisabled = true;
} else if (Feature == "+pacbti") {		} else if (Feature == "+pacbti") {
HasPAC = 1;		HasPAC = 1;
HasBTI = 1;		HasBTI = 1;
		} else if (Feature == "+fullbf16") {
		HasFullBFloat16 = true;
}		}
}		}

HalfArgsAndReturns = true;		HalfArgsAndReturns = true;

switch (ArchVersion) {		switch (ArchVersion) {
case 6:		case 6:
if (ArchProfile == llvm::ARM::ProfileKind::M)		if (ArchProfile == llvm::ARM::ProfileKind::M)
[833 lines hidden]

clang/lib/Basic/Targets/NVPTX.h

[175 lines hidden] Context: CallingConvCheckResult checkCallingConvention(CallingConv CC) const override {
// a host function.		// a host function.
if (HostTarget)		if (HostTarget)
return HostTarget->checkCallingConvention(CC);		return HostTarget->checkCallingConvention(CC);
return CCCR_Warning;		return CCCR_Warning;
}		}

bool hasBitIntType() const override { return true; }		bool hasBitIntType() const override { return true; }
bool hasBFloat16Type() const override { return true; }		bool hasBFloat16Type() const override { return true; }
const char *getBFloat16Mangling() const override { return "u6__bf16"; };
};		};
} // namespace targets		} // namespace targets
} // namespace clang		} // namespace clang
#endif // LLVM_CLANG_LIB_BASIC_TARGETS_NVPTX_H		#endif // LLVM_CLANG_LIB_BASIC_TARGETS_NVPTX_H

clang/lib/Basic/Targets/X86.h

[411 lines hidden] Context: if (TargetAddrSpace == ptr64)
return 64;		return 64;
return PointerWidth;		return PointerWidth;
}		}

uint64_t getPointerAlignV(LangAS AddrSpace) const override {		uint64_t getPointerAlignV(LangAS AddrSpace) const override {
return getPointerWidthV(AddrSpace);		return getPointerWidthV(AddrSpace);
}		}

const char *getBFloat16Mangling() const override { return "u6__bf16"; };
};		};

// X86-32 generic target		// X86-32 generic target
class LLVM_LIBRARY_VISIBILITY X86_32TargetInfo : public X86TargetInfo {		class LLVM_LIBRARY_VISIBILITY X86_32TargetInfo : public X86TargetInfo {
public:		public:
X86_32TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)		X86_32TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
: X86TargetInfo(Triple, Opts) {		: X86TargetInfo(Triple, Opts) {
DoubleAlign = LongLongAlign = 32;		DoubleAlign = LongLongAlign = 32;
[566 lines hidden]

clang/lib/Basic/Targets/X86.cpp

[353 lines hidden] Context: for (const auto &Feature : Features) {
} else if (Feature == "+tsxldtrk") {		} else if (Feature == "+tsxldtrk") {
HasTSXLDTRK = true;		HasTSXLDTRK = true;
} else if (Feature == "+uintr") {		} else if (Feature == "+uintr") {
HasUINTR = true;		HasUINTR = true;
} else if (Feature == "+crc32") {		} else if (Feature == "+crc32") {
HasCRC32 = true;		HasCRC32 = true;
} else if (Feature == "+x87") {		} else if (Feature == "+x87") {
HasX87 = true;		HasX87 = true;
		} else if (Feature == "+fullbf16") {
		HasFullBFloat16 = true;
pengfei (done): Maybe not need it.
codemzs (author, done): Clarified on the other thread but if you have questions please feel free to post here and I will address them.
}		}

X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature)		X86SSEEnum Level = llvm::StringSwitch<X86SSEEnum>(Feature)
.Case("+avx512f", AVX512F)		.Case("+avx512f", AVX512F)
.Case("+avx2", AVX2)		.Case("+avx2", AVX2)
.Case("+avx", AVX)		.Case("+avx", AVX)
.Case("+sse4.2", SSE42)		.Case("+sse4.2", SSE42)
.Case("+sse4.1", SSE41)		.Case("+sse4.1", SSE41)
.Case("+ssse3", SSSE3)		.Case("+ssse3", SSSE3)
.Case("+sse3", SSE3)		.Case("+sse3", SSE3)
.Case("+sse2", SSE2)		.Case("+sse2", SSE2)
.Case("+sse", SSE1)		.Case("+sse", SSE1)
.Default(NoSSE);		.Default(NoSSE);
SSELevel = std::max(SSELevel, Level);		SSELevel = std::max(SSELevel, Level);

HasFloat16 = SSELevel >= SSE2;		HasFloat16 = SSELevel >= SSE2;

		// X86 target has bfloat16 emulation support in the backend, where
		// bfloat16 is treated as a 32-bit float, arithmetic operations are
		// performed in 32-bit, and the result is converted back to bfloat16.
		// Truncation and extension between bfloat16 and 32-bit float are supported
		// by the compiler-rt library. However, native bfloat16 support is currently
		// not available in the X86 target. Hence, HasFullBFloat16 will be false
		// until native bfloat16 support is available. HasFullBFloat16 is used to
		// determine whether to automatically use excess floating point precision
		// for bfloat16 arithmetic operations in the front-end.
HasBFloat16 = SSELevel >= SSE2;		HasBFloat16 = SSELevel >= SSE2;
pengfei (done): I'm not sure if I understand the meaning of `HasFullBFloat16`. If it is used for target that supports arithmetic `__bf16`, we should not use `+fullbf16` but always enable it for SSE2, i.e., `HasFullBFloat16 = SSELevel >= SSE2`. Because X86 GCC already supports arithmetic for `__bf16`. If this is used in the way like `HasLegalHalfType`, we should enable it once we have a full BF16 ISA on X86. `fullbf16` doesn't make much sense to me.

rjmccall (done): At the moment, we haven't done the work to emulate BFloat16 arithmetic in any of the three ways we can do that: Clang doesn't promote it in IRGen, LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it. If we emit these instructions, they'll just sail through LLVM and fail in the backend. So in the short term, we have to restrict this to targets that directly support BFloat16 arithmetic in hardware, which doesn't include x86. Once we have that emulation support, I agree that the x86 targets should enable this whenever they would enable `__bf16`.

codemzs (author, done): @rjmccall, I concur and just wanted to confirm this change indeed intends to provide `BFloat16` emulation support, utilizing excess precision for promotion to `float`. The `HasFullBFloat16` switch is designed to determine excess precision support automatically when the hardware does not natively support `bfloat16` arithmetic.

pengfei (done):
> LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it.
That's not true: https://godbolt.org/z/jxf5E83vG.
> The `HasFullBFloat16` switch is designed to determine excess precision support automatically when the hardware does not natively support bfloat16 arithmetic.
Makes sense to me.

codemzs (author, done): @pengfei, you're right. As part of D126953, the x86 backend received `bfloat16` emulation support. Also, I hope my explanation about the `HasFullBFloat16` flag addressed your questions. Please let me know if further clarification/change is needed.

zahiraam (done):
> At the moment, we haven't done the work to emulate BFloat16 arithmetic in any of the three ways we can do that: Clang doesn't promote it in IRGen, LLVM doesn't promote it in legalization, and we don't have compiler-rt functions for it. If we emit these instructions, they'll just sail through LLVM and fail in the backend. So in the short term, we have to restrict this to targets that directly support BFloat16 arithmetic in hardware, which doesn't include x86. Once we have that emulation support, I agree that the x86 targets should enable this whenever they would enable `__bf16`.
Would be nice to add a comment to clarify it.
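
The emulation scheme this thread settles on (extend each `__bf16` operand to `float`, do the arithmetic in `float`, truncate the result back) can be modelled in portable C++. The sketch below is illustrative only: `BF16`, `extendToFloat`, `truncateToBF16`, and `addBF16` are hypothetical names, not Clang or compiler-rt functions, and round-to-nearest-even truncation is an assumption, not something this review states.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical software model: a bfloat16 value is the upper 16 bits of an
// IEEE-754 binary32 value (1 sign bit, 8 exponent bits, 7 mantissa bits).
struct BF16 {
  std::uint16_t bits;
};

// Extension is exact: place the 16 bits in the high half of a float.
float extendToFloat(BF16 v) {
  std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
  float f;
  std::memcpy(&f, &u, sizeof f);
  return f;
}

// Truncation with round-to-nearest-even (an assumption of this sketch).
BF16 truncateToBF16(float f) {
  std::uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  if ((u & 0x7fffffffu) > 0x7f800000u) {
    // NaN: keep the sign and exponent, force a quiet-NaN mantissa bit.
    return BF16{static_cast<std::uint16_t>((u >> 16) | 0x0040u)};
  }
  // Add 0x7fff plus the lowest kept bit so that exact ties round to even.
  std::uint32_t bias = 0x7fffu + ((u >> 16) & 1u);
  return BF16{static_cast<std::uint16_t>((u + bias) >> 16)};
}

// Mirrors the fpext / fadd float / fptrunc pattern: promote, operate, demote.
BF16 addBF16(BF16 a, BF16 b) {
  return truncateToBF16(extendToFloat(a) + extendToFloat(b));
}
```

With native bfloat16 arithmetic the operation would stay in the 16-bit type; the model above corresponds to the case where `HasFullBFloat16` is false.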

MMX3DNowEnum ThreeDNowLevel = llvm::StringSwitch<MMX3DNowEnum>(Feature)		MMX3DNowEnum ThreeDNowLevel = llvm::StringSwitch<MMX3DNowEnum>(Feature)
.Case("+3dnowa", AMD3DNowAthlon)		.Case("+3dnowa", AMD3DNowAthlon)
.Case("+3dnow", AMD3DNow)		.Case("+3dnow", AMD3DNow)
.Case("+mmx", MMX)		.Case("+mmx", MMX)
.Default(NoMMX3DNow);		.Default(NoMMX3DNow);
MMX3DNowLevel = std::max(MMX3DNowLevel, ThreeDNowLevel);		MMX3DNowLevel = std::max(MMX3DNowLevel, ThreeDNowLevel);

[724 lines hidden] Context: return llvm::StringSwitch<bool>(Feature)
.Case("x86_32", getTriple().getArch() == llvm::Triple::x86)		.Case("x86_32", getTriple().getArch() == llvm::Triple::x86)
.Case("x86_64", getTriple().getArch() == llvm::Triple::x86_64)		.Case("x86_64", getTriple().getArch() == llvm::Triple::x86_64)
.Case("x87", HasX87)		.Case("x87", HasX87)
.Case("xop", XOPLevel >= XOP)		.Case("xop", XOPLevel >= XOP)
.Case("xsave", HasXSAVE)		.Case("xsave", HasXSAVE)
.Case("xsavec", HasXSAVEC)		.Case("xsavec", HasXSAVEC)
.Case("xsaves", HasXSAVES)		.Case("xsaves", HasXSAVES)
.Case("xsaveopt", HasXSAVEOPT)		.Case("xsaveopt", HasXSAVEOPT)
		.Case("fullbf16", HasFullBFloat16)
pengfei (done): ditto.
.Default(false);		.Default(false);
}		}

// We can't use a generic validation scheme for the features accepted here		// We can't use a generic validation scheme for the features accepted here
// versus subtarget features accepted in the target attribute because the		// versus subtarget features accepted in the target attribute because the
// bitfield structure that's initialized in the runtime only supports the		// bitfield structure that's initialized in the runtime only supports the
// below currently rather than the full range of subtarget features. (See		// below currently rather than the full range of subtarget features. (See
// X86TargetInfo::hasFeature for a somewhat comprehensive list).		// X86TargetInfo::hasFeature for a somewhat comprehensive list).
[494 lines hidden]

clang/lib/CodeGen/CGExprScalar.cpp

[808 lines hidden] Context: public:
LValue EmitCompoundAssignLValue(const CompoundAssignOperator *E,		LValue EmitCompoundAssignLValue(const CompoundAssignOperator *E,
Value *(ScalarExprEmitter::*F)(const BinOpInfo &),		Value *(ScalarExprEmitter::*F)(const BinOpInfo &),
Value *&Result);		Value *&Result);

Value *EmitCompoundAssign(const CompoundAssignOperator *E,		Value *EmitCompoundAssign(const CompoundAssignOperator *E,
Value *(ScalarExprEmitter::*F)(const BinOpInfo &));		Value *(ScalarExprEmitter::*F)(const BinOpInfo &));

QualType getPromotionType(QualType Ty) {		QualType getPromotionType(QualType Ty) {
		const auto &Ctx = CGF.getContext();
if (auto *CT = Ty->getAs<ComplexType>()) {		if (auto *CT = Ty->getAs<ComplexType>()) {
QualType ElementType = CT->getElementType();		QualType ElementType = CT->getElementType();
if (ElementType.UseExcessPrecision(CGF.getContext()))		if (ElementType.UseExcessPrecision(Ctx))
return CGF.getContext().getComplexType(CGF.getContext().FloatTy);		return Ctx.getComplexType(Ctx.FloatTy);
}		}
if (Ty.UseExcessPrecision(CGF.getContext()))
return CGF.getContext().FloatTy;		if (Ty.UseExcessPrecision(Ctx)) {
		if (auto *VT = Ty->getAs<VectorType>()) {
		unsigned NumElements = VT->getNumElements();
		return Ctx.getVectorType(Ctx.FloatTy, NumElements, VT->getVectorKind());
		}
		return Ctx.FloatTy;
		}

return QualType();		return QualType();
}		}
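
The updated `getPromotionType` keeps the shape of the operand type and swaps only the element type for `float`: complex stays complex, a vector keeps its lane count, a scalar becomes plain `float`. A toy model of that decision is sketched below. `ToyType`, `Promotion`, and `getPromotionTypeToy` are hypothetical stand-ins for Clang's `QualType` machinery, and the rule "promote whenever the element is bfloat16 and the target lacks full bfloat16 support" is a simplification that ignores the excess-precision command-line setting.

```cpp
#include <cassert>

// Hypothetical, heavily simplified type model (not Clang's QualType).
enum class Elem { Float, BFloat16 };
enum class Shape { Scalar, Complex, Vector };

struct ToyType {
  Elem elem;
  Shape shape;
  unsigned numElements; // lane count; only meaningful for Shape::Vector
};

struct Promotion {
  bool needed;  // false plays the role of the null QualType()
  ToyType type; // promoted type when needed is true
};

// Assumption: excess precision is used exactly when the element type is
// bfloat16 and the target has no native bfloat16 arithmetic.
Promotion getPromotionTypeToy(ToyType ty, bool targetHasFullBFloat16) {
  bool useExcessPrecision =
      ty.elem == Elem::BFloat16 && !targetHasFullBFloat16;
  if (!useExcessPrecision)
    return Promotion{false, ty};
  // Same shape and lane count, float elements.
  return Promotion{true, ToyType{Elem::Float, ty.shape, ty.numElements}};
}
```

For an 8-lane bfloat16 vector on a target without native support, the model yields an 8-lane `float` vector, which corresponds to the new `VectorType` branch on the right-hand side of the diff.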

// Binary operators and binary compound assignment operators.		// Binary operators and binary compound assignment operators.
#define HANDLEBINOP(OP) \		#define HANDLEBINOP(OP) \
Value *VisitBin##OP(const BinaryOperator *E) { \		Value *VisitBin##OP(const BinaryOperator *E) { \
QualType promotionTy = getPromotionType(E->getType()); \		QualType promotionTy = getPromotionType(E->getType()); \
auto result = Emit##OP(EmitBinOps(E, promotionTy)); \		auto result = Emit##OP(EmitBinOps(E, promotionTy)); \
[4,584 lines hidden]

clang/lib/Driver/ToolChains/Clang.cpp

[2,768 lines hidden] Context: static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
StringRef FPContract;		StringRef FPContract;
StringRef LastSeenFfpContractOption;		StringRef LastSeenFfpContractOption;
bool SeenUnsafeMathModeOption = false;		bool SeenUnsafeMathModeOption = false;
if (!JA.isDeviceOffloading(Action::OFK_Cuda) &&		if (!JA.isDeviceOffloading(Action::OFK_Cuda) &&
!JA.isOffloading(Action::OFK_HIP))		!JA.isOffloading(Action::OFK_HIP))
FPContract = "on";		FPContract = "on";
bool StrictFPModel = false;		bool StrictFPModel = false;
StringRef Float16ExcessPrecision = "";		StringRef Float16ExcessPrecision = "";
		StringRef BFloat16ExcessPrecision = "";

if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {		if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {
CmdArgs.push_back("-mlimit-float-precision");		CmdArgs.push_back("-mlimit-float-precision");
CmdArgs.push_back(A->getValue());		CmdArgs.push_back(A->getValue());
}		}

for (const Arg *A : Args) {		for (const Arg *A : Args) {
auto optID = A->getOption().getID();		auto optID = A->getOption().getID();
[199 lines hidden] Context: case options::OPT_fexcess_precision_EQ: {
else		else
D.Diag(diag::err_drv_unsupported_option_argument)		D.Diag(diag::err_drv_unsupported_option_argument)
<< A->getSpelling() << Val;		<< A->getSpelling() << Val;
} else {		} else {
if (!(Val.equals("standard") || Val.equals("fast")))		if (!(Val.equals("standard") || Val.equals("fast")))
D.Diag(diag::err_drv_unsupported_option_argument)		D.Diag(diag::err_drv_unsupported_option_argument)
<< A->getSpelling() << Val;		<< A->getSpelling() << Val;
}		}
		BFloat16ExcessPrecision = Float16ExcessPrecision;
break;		break;
}		}
case options::OPT_ffinite_math_only:		case options::OPT_ffinite_math_only:
HonorINFs = false;		HonorINFs = false;
HonorNaNs = false;		HonorNaNs = false;
break;		break;
case options::OPT_fno_finite_math_only:		case options::OPT_fno_finite_math_only:
HonorINFs = true;		HonorINFs = true;
[159 lines hidden] Context: CmdArgs.push_back(Args.MakeArgString("-ffp-exception-behavior=" +
FPExceptionBehavior));		FPExceptionBehavior));

if (!FPEvalMethod.empty())		if (!FPEvalMethod.empty())
CmdArgs.push_back(Args.MakeArgString("-ffp-eval-method=" + FPEvalMethod));		CmdArgs.push_back(Args.MakeArgString("-ffp-eval-method=" + FPEvalMethod));

if (!Float16ExcessPrecision.empty())		if (!Float16ExcessPrecision.empty())
CmdArgs.push_back(Args.MakeArgString("-ffloat16-excess-precision=" +		CmdArgs.push_back(Args.MakeArgString("-ffloat16-excess-precision=" +
Float16ExcessPrecision));		Float16ExcessPrecision));
		if (!BFloat16ExcessPrecision.empty())
		CmdArgs.push_back(Args.MakeArgString("-fbfloat16-excess-precision=" +
		BFloat16ExcessPrecision));
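
Taken together, the driver hunks above forward one excess-precision setting to two cc1 flags, because `BFloat16ExcessPrecision` is copied from `Float16ExcessPrecision`. A minimal sketch of that forwarding follows, assuming the value has already been validated by the hidden part of the switch. `renderExcessPrecisionArgs` is a hypothetical helper, not a function in the Clang driver.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: given the already-validated excess-precision value,
// produce the cc1 arguments for both 16-bit floating-point types.
std::vector<std::string> renderExcessPrecisionArgs(const std::string &value) {
  std::vector<std::string> cc1Args;
  if (!value.empty()) {
    // The same value drives both flags, as in the diff.
    cc1Args.push_back("-ffloat16-excess-precision=" + value);
    cc1Args.push_back("-fbfloat16-excess-precision=" + value);
  }
  return cc1Args;
}
```

An empty value produces no arguments, matching the two `empty()` guards in the diff.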

ParseMRecip(D, Args, CmdArgs);		ParseMRecip(D, Args, CmdArgs);

// -ffast-math enables the __FAST_MATH__ preprocessor macro, but check for the		// -ffast-math enables the __FAST_MATH__ preprocessor macro, but check for the
// individual features enabled by -ffast-math instead of the option itself as		// individual features enabled by -ffast-math instead of the option itself as
// that's consistent with gcc's behaviour.		// that's consistent with gcc's behaviour.
if (!HonorINFs && !HonorNaNs && !MathErrno && AssociativeMath && ApproxFunc &&		if (!HonorINFs && !HonorNaNs && !MathErrno && AssociativeMath && ApproxFunc &&
ReciprocalMath && !SignedZeros && !TrappingMath && !RoundingFPMath) {		ReciprocalMath && !SignedZeros && !TrappingMath && !RoundingFPMath) {
[5,426 lines hidden]

clang/lib/Sema/SemaCast.cpp

[3,086 lines hidden] Context: void CastOperation::CheckCStyleCast() {
// Note that member pointers were filtered out with C++, above.		// Note that member pointers were filtered out with C++, above.

if (isa<ObjCSelectorExpr>(SrcExpr.get())) {		if (isa<ObjCSelectorExpr>(SrcExpr.get())) {
Self.Diag(SrcExpr.get()->getExprLoc(), diag::err_cast_selector_expr);		Self.Diag(SrcExpr.get()->getExprLoc(), diag::err_cast_selector_expr);
SrcExpr = ExprError();		SrcExpr = ExprError();
return;		return;
}		}

// Can't cast to or from bfloat
if (DestType->isBFloat16Type() && !SrcType->isBFloat16Type()) {
Self.Diag(SrcExpr.get()->getExprLoc(), diag::err_cast_to_bfloat16)
<< SrcExpr.get()->getSourceRange();
SrcExpr = ExprError();
return;
}
if (SrcType->isBFloat16Type() && !DestType->isBFloat16Type()) {
Self.Diag(SrcExpr.get()->getExprLoc(), diag::err_cast_from_bfloat16)
<< SrcExpr.get()->getSourceRange();
SrcExpr = ExprError();
return;
}

// If either type is a pointer, the other type has to be either an		// If either type is a pointer, the other type has to be either an
// integer or a pointer.		// integer or a pointer.
if (!DestType->isArithmeticType()) {		if (!DestType->isArithmeticType()) {
if (!SrcType->isIntegralType(Self.Context) && SrcType->isArithmeticType()) {		if (!SrcType->isIntegralType(Self.Context) && SrcType->isArithmeticType()) {
Self.Diag(SrcExpr.get()->getExprLoc(),		Self.Diag(SrcExpr.get()->getExprLoc(),
diag::err_cast_pointer_from_non_pointer_int)		diag::err_cast_pointer_from_non_pointer_int)
<< SrcType << SrcExpr.get()->getSourceRange();		<< SrcType << SrcExpr.get()->getSourceRange();
SrcExpr = ExprError();		SrcExpr = ExprError();
[235 lines hidden]

clang/lib/Sema/SemaExpr.cpp

[10,779 lines hidden] Context: QualType Sema::CheckVectorOperands(ExprResult &LHS, ExprResult &RHS,
// For example, "const float" and "float" are equivalent.		// For example, "const float" and "float" are equivalent.
QualType LHSType = LHS.get()->getType().getUnqualifiedType();		QualType LHSType = LHS.get()->getType().getUnqualifiedType();
QualType RHSType = RHS.get()->getType().getUnqualifiedType();		QualType RHSType = RHS.get()->getType().getUnqualifiedType();

const VectorType *LHSVecType = LHSType->getAs<VectorType>();		const VectorType *LHSVecType = LHSType->getAs<VectorType>();
const VectorType *RHSVecType = RHSType->getAs<VectorType>();		const VectorType *RHSVecType = RHSType->getAs<VectorType>();
assert(LHSVecType || RHSVecType);		assert(LHSVecType || RHSVecType);

if ((LHSVecType && LHSVecType->getElementType()->isBFloat16Type()) ||
(RHSVecType && RHSVecType->getElementType()->isBFloat16Type()))
return ReportInvalid ? InvalidOperands(Loc, LHS, RHS) : QualType();

// AltiVec-style "vector bool op vector bool" combinations are allowed		// AltiVec-style "vector bool op vector bool" combinations are allowed
// for some operators but not others.		// for some operators but not others.
if (!AllowBothBool &&		if (!AllowBothBool &&
LHSVecType && LHSVecType->getVectorKind() == VectorType::AltiVecBool &&		LHSVecType && LHSVecType->getVectorKind() == VectorType::AltiVecBool &&
RHSVecType && RHSVecType->getVectorKind() == VectorType::AltiVecBool)		RHSVecType && RHSVecType->getVectorKind() == VectorType::AltiVecBool)
return ReportInvalid ? InvalidOperands(Loc, LHS, RHS) : QualType();		return ReportInvalid ? InvalidOperands(Loc, LHS, RHS) : QualType();

// This operation may not be performed on boolean vectors.		// This operation may not be performed on boolean vectors.
[10,614 lines hidden]

clang/lib/Sema/SemaOverload.cpp

[1,989 lines hidden]
if (S.Context.hasSameUnqualifiedType(FromType, ToType)) {
// Complex-real conversions (C99 6.3.1.7)
SCS.Second = ICK_Complex_Real;
FromType = ToType.getUnqualifiedType();
} else if (FromType->isRealFloatingType() && ToType->isRealFloatingType()) {
// FIXME: disable conversions between long double, __ibm128 and __float128
// if their representation is different until there is back end support
// We of course allow this conversion if long double is really double.
// Conversions between bfloat and other floats are not permitted.
// Conversions between bfloat16 and float16 is currently not supported.

rjmccall (suggested edit, done):
  // We of course allow this conversion if long double is really double.
- // Conversions between bfloat16 and float16 is currently not supported.
+ // Conversions between bfloat16 and float16 are currently not supported.
  if ((FromType->isBFloat16Type() &&

if (FromType == S.Context.BFloat16Ty || ToType == S.Context.BFloat16Ty)
if ((FromType->isBFloat16Type() &&
(ToType->isFloat16Type() || ToType->isHalfType())) ||
(ToType->isBFloat16Type() &&
(FromType->isFloat16Type() || FromType->isHalfType())))
return false;
// Conversions between IEEE-quad and IBM-extended semantics are not
// permitted.
const llvm::fltSemantics &FromSem =
S.Context.getFloatTypeSemantics(FromType);
const llvm::fltSemantics &ToSem = S.Context.getFloatTypeSemantics(ToType);
if ((&FromSem == &llvm::APFloat::PPCDoubleDouble() &&
&ToSem == &llvm::APFloat::IEEEquad()) ||
(&FromSem == &llvm::APFloat::IEEEquad() &&
&ToSem == &llvm::APFloat::PPCDoubleDouble()))
return false;
// Floating point conversions (C++ 4.8).
SCS.Second = ICK_Floating_Conversion;
FromType = ToType.getUnqualifiedType();
} else if ((FromType->isRealFloatingType() &&
ToType->isIntegralType(S.Context)) ||
(FromType->isIntegralOrUnscopedEnumerationType() &&
ToType->isRealFloatingType())) {
// Conversions between bfloat and int are not permitted.
if (FromType->isBFloat16Type() || ToType->isBFloat16Type())
return false;
// Floating-integral conversions (C++ 4.9).
SCS.Second = ICK_Floating_Integral;
FromType = ToType.getUnqualifiedType();
} else if (S.IsBlockPointerConversion(FromType, ToType, FromType)) {
SCS.Second = ICK_Block_Pointer_Conversion;
} else if (AllowObjCWritebackConversion &&
S.isObjCWritebackConversion(FromType, ToType, FromType)) {
[14 lines hidden]
if (S.Context.hasSameUnqualifiedType(FromType, ToType)) {
FromType = ToType.getUnqualifiedType();
} else if (!S.getLangOpts().CPlusPlus &&
S.Context.typesAreCompatible(ToType, FromType)) {
// Compatible conversions (Clang extension for C function overloading)
SCS.Second = ICK_Compatible_Conversion;
FromType = ToType.getUnqualifiedType();
} else if (IsTransparentUnionStandardConversion(S, From, ToType,
InOverloadResolution,
SCS, CStyle)) {
SCS.Second = ICK_TransparentUnionConversion;

codemzs (author, suggested edit, done):
  FromType = ToType.getUnqualifiedType();
- } else if (IsTransparentUnionStandardConversion(
- S, From, ToType, InOverloadResolution, SCS, CStyle)) {
+ } else if (IsTransparentUnionStandardConversion(S, From, ToType,
+ InOverloadResolution,
  SCS.Second = ICK_TransparentUnionConversion;

FromType = ToType;
} else if (tryAtomicConversion(S, From, ToType, InOverloadResolution, SCS,
CStyle)) {
// tryAtomicConversion has updated the standard conversion sequence
// appropriately.
return true;
} else if (ToType->isEventT() &&
From->isIntegerConstantExpr(S.getASTContext()) &&
[13,562 lines hidden]

clang/test/CodeGen/X86/avx512bf16-error.c

	// RUN: %clang_cc1 -fsyntax-only -verify -ffreestanding -triple x86_64-linux-pc %s			// RUN: %clang_cc1 -fsyntax-only -verify -ffreestanding -triple x86_64-linux-pc %s

	// expected-error@+1 3 {{unknown type name '__bfloat16'}}			// expected-error@+1 3 {{unknown type name '__bfloat16'}}
	__bfloat16 foo(__bfloat16 a, __bfloat16 b) {			__bfloat16 foo(__bfloat16 a, __bfloat16 b) {
	return a + b;			return a + b;
	}			}

	#include <immintrin.h>			#include <immintrin.h>

	// expected-error@+4 {{invalid operands to binary expression ('__bfloat16' (aka '__bf16') and '__bfloat16')}}
	// expected-warning@+2 3 {{'__bfloat16' is deprecated: use __bf16 instead}}			// expected-warning@+2 3 {{'__bfloat16' is deprecated: use __bf16 instead}}
	// expected-note@* 3 {{'__bfloat16' has been explicitly marked deprecated here}}			// expected-note@* 3 {{'__bfloat16' has been explicitly marked deprecated here}}
	__bfloat16 bar(__bfloat16 a, __bfloat16 b) {			__bfloat16 bar(__bfloat16 a, __bfloat16 b) {
	return a + b;			return a + b;
	}			}

clang/test/CodeGen/X86/bfloat-mangle.cpp

	// RUN: %clang_cc1 -triple i386-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX			// RUN: %clang_cc1 -triple i386-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX
	// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX			// RUN: %clang_cc1 -triple x86_64-unknown-unknown -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=LINUX
	// RUN: %clang_cc1 -triple i386-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS			// RUN: %clang_cc1 -triple i386-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS
	// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS			// RUN: %clang_cc1 -triple x86_64-windows-msvc -target-feature +sse2 -emit-llvm -o - %s | FileCheck %s --check-prefixes=WINDOWS

	// LINUX: define {{.*}}void @_Z3foou6__bf16(bfloat noundef %b)			// LINUX: define {{.*}}void @_Z3fooDF16b(bfloat noundef %b)
	// WINDOWS: define {{.*}}void @"?foo@@YAXU__bf16@__clang@@@Z"(bfloat noundef %b)			// WINDOWS: define {{.*}}void @"?foo@@YAXU__bf16@__clang@@@Z"(bfloat noundef %b)
	void foo(__bf16 b) {}			void foo(__bf16 b) {}
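
The LINUX check line changes from `_Z3foou6__bf16` to `_Z3fooDF16b`: the old name uses the Itanium vendor-extended form (`u`, then the length-prefixed name `6__bf16`), while the new one uses the `DF16b` code adopted for `std::bfloat16_t`. How the two symbols are assembled can be sketched as below. `mangleSimpleFunction` is a hypothetical helper that only covers a global, non-template function taking a single parameter whose type code is supplied by the caller; it is not a general Itanium mangler.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: "_Z" + <length-prefixed function name> + <type code>.
// Covers only a global function with a single parameter.
std::string mangleSimpleFunction(const std::string &name,
                                 const std::string &paramTypeCode) {
  return "_Z" + std::to_string(name.size()) + name + paramTypeCode;
}
```

Calling `mangleSimpleFunction("foo", "u6__bf16")` reproduces the old symbol and `mangleSimpleFunction("foo", "DF16b")` the new one, matching the left and right columns of the LINUX check line.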

clang/test/CodeGen/X86/bfloat16.cpp

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 2
				// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -target-feature +fullbf16 -S -emit-llvm %s -o - | FileCheck %s
				// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -S -emit-llvm %s -o - | FileCheck -check-prefix=CHECK-NBF16 %s

pengfei (done): The backend has already support lowering of `bfloat`, I don't think it's necessary to do extra work in FE unless for excess-precision.
zahiraam (done):
> The backend has already support lowering of `bfloat`, I don't think it's necessary to do extra work in FE unless for excess-precision.
+1.
codemzs (author, done): @pengfei @zahiraam I added this test to verify bfloat16 IR gen functionality, considering both scenarios: with and without native bfloat16 support. However, if you believe it's more beneficial to omit it, I'm open to doing so. Happy to also move this test to another target that doesn't have backend support for emulation.
zahiraam (not done): I think that's fine. You can leave it.
				// CHECK-LABEL: define dso_local void @_Z11test_scalarDF16bDF16b
				// CHECK-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NEXT: [[C:%.*]] = alloca bfloat, align 2
				// CHECK-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NEXT: [[ADD:%.*]] = fadd bfloat [[TMP0]], [[TMP1]]
				// CHECK-NEXT: store bfloat [[ADD]], ptr [[C]], align 2
				// CHECK-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NEXT: [[SUB:%.*]] = fsub bfloat [[TMP2]], [[TMP3]]
				// CHECK-NEXT: store bfloat [[SUB]], ptr [[C]], align 2
				// CHECK-NEXT: [[TMP4:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NEXT: [[TMP5:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NEXT: [[MUL:%.*]] = fmul bfloat [[TMP4]], [[TMP5]]
				// CHECK-NEXT: store bfloat [[MUL]], ptr [[C]], align 2
				// CHECK-NEXT: [[TMP6:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NEXT: [[TMP7:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NEXT: [[DIV:%.*]] = fdiv bfloat [[TMP6]], [[TMP7]]
				// CHECK-NEXT: store bfloat [[DIV]], ptr [[C]], align 2
				// CHECK-NEXT: ret void
				//
				// CHECK-NBF16-LABEL: define dso_local void @_Z11test_scalarDF16bDF16b
				// CHECK-NBF16-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-NBF16: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NBF16-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NBF16-NEXT: [[C:%.*]] = alloca bfloat, align 2
				// CHECK-NBF16-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-NBF16-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT:%.*]] = fpext bfloat [[TMP0]] to float
				// CHECK-NBF16-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT1:%.*]] = fpext bfloat [[TMP1]] to float
				// CHECK-NBF16-NEXT: [[ADD:%.*]] = fadd float [[EXT]], [[EXT1]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION:%.*]] = fptrunc float [[ADD]] to bfloat
				// CHECK-NBF16-NEXT: store bfloat [[UNPROMOTION]], ptr [[C]], align 2
				// CHECK-NBF16-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT2:%.*]] = fpext bfloat [[TMP2]] to float
				// CHECK-NBF16-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT3:%.*]] = fpext bfloat [[TMP3]] to float
				// CHECK-NBF16-NEXT: [[SUB:%.*]] = fsub float [[EXT2]], [[EXT3]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION4:%.*]] = fptrunc float [[SUB]] to bfloat
				// CHECK-NBF16-NEXT: store bfloat [[UNPROMOTION4]], ptr [[C]], align 2
				// CHECK-NBF16-NEXT: [[TMP4:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT5:%.*]] = fpext bfloat [[TMP4]] to float
				// CHECK-NBF16-NEXT: [[TMP5:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT6:%.*]] = fpext bfloat [[TMP5]] to float
				// CHECK-NBF16-NEXT: [[MUL:%.*]] = fmul float [[EXT5]], [[EXT6]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION7:%.*]] = fptrunc float [[MUL]] to bfloat
				// CHECK-NBF16-NEXT: store bfloat [[UNPROMOTION7]], ptr [[C]], align 2
				// CHECK-NBF16-NEXT: [[TMP6:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT8:%.*]] = fpext bfloat [[TMP6]] to float
				// CHECK-NBF16-NEXT: [[TMP7:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NBF16-NEXT: [[EXT9:%.*]] = fpext bfloat [[TMP7]] to float
				// CHECK-NBF16-NEXT: [[DIV:%.*]] = fdiv float [[EXT8]], [[EXT9]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION10:%.*]] = fptrunc float [[DIV]] to bfloat
				// CHECK-NBF16-NEXT: store bfloat [[UNPROMOTION10]], ptr [[C]], align 2
				// CHECK-NBF16-NEXT: ret void
				//
				void test_scalar(__bf16 a, __bf16 b) {
				__bf16 c;
				c = a + b;
				c = a - b;
				c = a * b;
				c = a / b;
				}

				typedef __bf16 v8bfloat16 __attribute__((__vector_size__(16)));

				// CHECK-LABEL: define dso_local void @_Z11test_vectorDv8_DF16bS_
				// CHECK-SAME: (<8 x bfloat> noundef [[A:%.*]], <8 x bfloat> noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK: [[A_ADDR:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NEXT: [[B_ADDR:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NEXT: [[C:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NEXT: store <8 x bfloat> [[A]], ptr [[A_ADDR]], align 16
				// CHECK-NEXT: store <8 x bfloat> [[B]], ptr [[B_ADDR]], align 16
				// CHECK-NEXT: [[TMP0:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NEXT: [[TMP1:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NEXT: [[ADD:%.*]] = fadd <8 x bfloat> [[TMP0]], [[TMP1]]
				// CHECK-NEXT: store <8 x bfloat> [[ADD]], ptr [[C]], align 16
				// CHECK-NEXT: [[TMP2:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NEXT: [[TMP3:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NEXT: [[SUB:%.*]] = fsub <8 x bfloat> [[TMP2]], [[TMP3]]
				// CHECK-NEXT: store <8 x bfloat> [[SUB]], ptr [[C]], align 16
				// CHECK-NEXT: [[TMP4:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NEXT: [[TMP5:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NEXT: [[MUL:%.*]] = fmul <8 x bfloat> [[TMP4]], [[TMP5]]
				// CHECK-NEXT: store <8 x bfloat> [[MUL]], ptr [[C]], align 16
				// CHECK-NEXT: [[TMP6:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NEXT: [[TMP7:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NEXT: [[DIV:%.*]] = fdiv <8 x bfloat> [[TMP6]], [[TMP7]]
				// CHECK-NEXT: store <8 x bfloat> [[DIV]], ptr [[C]], align 16
				// CHECK-NEXT: ret void
				//
				// CHECK-NBF16-LABEL: define dso_local void @_Z11test_vectorDv8_DF16bS_
				// CHECK-NBF16-SAME: (<8 x bfloat> noundef [[A:%.*]], <8 x bfloat> noundef [[B:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-NBF16: [[A_ADDR:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NBF16-NEXT: [[B_ADDR:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NBF16-NEXT: [[C:%.*]] = alloca <8 x bfloat>, align 16
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[A]], ptr [[A_ADDR]], align 16
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[B]], ptr [[B_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[TMP0:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT:%.*]] = fpext <8 x bfloat> [[TMP0]] to <8 x float>
				// CHECK-NBF16-NEXT: [[TMP1:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT1:%.*]] = fpext <8 x bfloat> [[TMP1]] to <8 x float>
				// CHECK-NBF16-NEXT: [[ADD:%.*]] = fadd <8 x float> [[EXT]], [[EXT1]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION:%.*]] = fptrunc <8 x float> [[ADD]] to <8 x bfloat>
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[UNPROMOTION]], ptr [[C]], align 16
				// CHECK-NBF16-NEXT: [[TMP2:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT2:%.*]] = fpext <8 x bfloat> [[TMP2]] to <8 x float>
				// CHECK-NBF16-NEXT: [[TMP3:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT3:%.*]] = fpext <8 x bfloat> [[TMP3]] to <8 x float>
				// CHECK-NBF16-NEXT: [[SUB:%.*]] = fsub <8 x float> [[EXT2]], [[EXT3]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION4:%.*]] = fptrunc <8 x float> [[SUB]] to <8 x bfloat>
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[UNPROMOTION4]], ptr [[C]], align 16
				// CHECK-NBF16-NEXT: [[TMP4:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT5:%.*]] = fpext <8 x bfloat> [[TMP4]] to <8 x float>
				// CHECK-NBF16-NEXT: [[TMP5:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT6:%.*]] = fpext <8 x bfloat> [[TMP5]] to <8 x float>
				// CHECK-NBF16-NEXT: [[MUL:%.*]] = fmul <8 x float> [[EXT5]], [[EXT6]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION7:%.*]] = fptrunc <8 x float> [[MUL]] to <8 x bfloat>
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[UNPROMOTION7]], ptr [[C]], align 16
				// CHECK-NBF16-NEXT: [[TMP6:%.*]] = load <8 x bfloat>, ptr [[A_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT8:%.*]] = fpext <8 x bfloat> [[TMP6]] to <8 x float>
				// CHECK-NBF16-NEXT: [[TMP7:%.*]] = load <8 x bfloat>, ptr [[B_ADDR]], align 16
				// CHECK-NBF16-NEXT: [[EXT9:%.*]] = fpext <8 x bfloat> [[TMP7]] to <8 x float>
				// CHECK-NBF16-NEXT: [[DIV:%.*]] = fdiv <8 x float> [[EXT8]], [[EXT9]]
				// CHECK-NBF16-NEXT: [[UNPROMOTION10:%.*]] = fptrunc <8 x float> [[DIV]] to <8 x bfloat>
				// CHECK-NBF16-NEXT: store <8 x bfloat> [[UNPROMOTION10]], ptr [[C]], align 16
				// CHECK-NBF16-NEXT: ret void
				//
				void test_vector(v8bfloat16 a, v8bfloat16 b) {
				v8bfloat16 c;
				c = a + b;
				c = a - b;
				c = a * b;
				c = a / b;
				}

clang/test/CodeGen/X86/fexcess-precision-bfloat16.c

This file was added.

				// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --version 2
				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast -target-feature +fullbf16 \
				// RUN: -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK-NO-EXT %s
				pengfei: The tests here make me guess you want to use `fullbf16` the same as `HasLegalHalfType`.
				codemzs: Yes that is correct it is just to emulate the correct IR gen if x86 were to have native support. Happy to remove these tests if you feel that is better?

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard -target-feature +fullbf16 \
				// RUN: -emit-llvm -o - %s | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=source -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-NO-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=double -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=fast -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=standard -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -emit-llvm -ffp-eval-method=extended -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-EXT-FP80 %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -ffp-contract=on -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -ffp-contract=on -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=source -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=source -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=double -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=double -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT-DBL %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=extended -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -fmath-errno -ffp-contract=on -fno-rounding-math \
				// RUN: -ffp-eval-method=extended -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-CONTRACT-EXT %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none \
				// RUN: -fapprox-func -fmath-errno -fno-signed-zeros -mreassociate \
				// RUN: -freciprocal-math -ffp-contract=on -fno-rounding-math \
				// RUN: -funsafe-math-optimizations -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-UNSAFE %s

				// RUN: %clang_cc1 -triple x86_64-unknown-unknown \
				// RUN: -fbfloat16-excess-precision=none -target-feature +fullbf16 \
				// RUN: -fapprox-func -fmath-errno -fno-signed-zeros -mreassociate \
				// RUN: -freciprocal-math -ffp-contract=on -fno-rounding-math \
				// RUN: -funsafe-math-optimizations -emit-llvm -o - %s \
				// RUN: | FileCheck -check-prefixes=CHECK-UNSAFE %s

				// CHECK-EXT-LABEL: define dso_local bfloat @f
				// CHECK-EXT-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-EXT-NEXT: entry:
				// CHECK-EXT-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-EXT-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-EXT-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-EXT-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-EXT-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-EXT-NEXT: [[EXT:%.*]] = fpext bfloat [[TMP0]] to float
				// CHECK-EXT-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-EXT-NEXT: [[EXT1:%.*]] = fpext bfloat [[TMP1]] to float
				// CHECK-EXT-NEXT: [[MUL:%.*]] = fmul float [[EXT]], [[EXT1]]
				// CHECK-EXT-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-EXT-NEXT: [[EXT2:%.*]] = fpext bfloat [[TMP2]] to float
				// CHECK-EXT-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-EXT-NEXT: [[EXT3:%.*]] = fpext bfloat [[TMP3]] to float
				// CHECK-EXT-NEXT: [[MUL4:%.*]] = fmul float [[EXT2]], [[EXT3]]
				// CHECK-EXT-NEXT: [[ADD:%.*]] = fadd float [[MUL]], [[MUL4]]
				// CHECK-EXT-NEXT: [[UNPROMOTION:%.*]] = fptrunc float [[ADD]] to bfloat
				// CHECK-EXT-NEXT: ret bfloat [[UNPROMOTION]]
				//
				// CHECK-NO-EXT-LABEL: define dso_local bfloat @f
				// CHECK-NO-EXT-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-NO-EXT-NEXT: entry:
				// CHECK-NO-EXT-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NO-EXT-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NO-EXT-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NO-EXT-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-NO-EXT-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: [[MUL:%.*]] = fmul bfloat [[TMP0]], [[TMP1]]
				// CHECK-NO-EXT-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-NO-EXT-NEXT: [[MUL1:%.*]] = fmul bfloat [[TMP2]], [[TMP3]]
				// CHECK-NO-EXT-NEXT: [[ADD:%.*]] = fadd bfloat [[MUL]], [[MUL1]]
				// CHECK-NO-EXT-NEXT: ret bfloat [[ADD]]
				//
				// CHECK-EXT-DBL-LABEL: define dso_local bfloat @f
				// CHECK-EXT-DBL-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-EXT-DBL-NEXT: entry:
				// CHECK-EXT-DBL-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-DBL-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-DBL-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-DBL-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-DBL-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: [[CONV:%.*]] = fpext bfloat [[TMP0]] to double
				// CHECK-EXT-DBL-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: [[CONV1:%.*]] = fpext bfloat [[TMP1]] to double
				// CHECK-EXT-DBL-NEXT: [[MUL:%.*]] = fmul double [[CONV]], [[CONV1]]
				// CHECK-EXT-DBL-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: [[CONV2:%.*]] = fpext bfloat [[TMP2]] to double
				// CHECK-EXT-DBL-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-EXT-DBL-NEXT: [[CONV3:%.*]] = fpext bfloat [[TMP3]] to double
				// CHECK-EXT-DBL-NEXT: [[MUL4:%.*]] = fmul double [[CONV2]], [[CONV3]]
				// CHECK-EXT-DBL-NEXT: [[ADD:%.*]] = fadd double [[MUL]], [[MUL4]]
				// CHECK-EXT-DBL-NEXT: [[CONV5:%.*]] = fptrunc double [[ADD]] to bfloat
				// CHECK-EXT-DBL-NEXT: ret bfloat [[CONV5]]
				//
				// CHECK-EXT-FP80-LABEL: define dso_local bfloat @f
				// CHECK-EXT-FP80-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-EXT-FP80-NEXT: entry:
				// CHECK-EXT-FP80-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-FP80-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-FP80-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-FP80-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-EXT-FP80-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: [[CONV:%.*]] = fpext bfloat [[TMP0]] to x86_fp80
				// CHECK-EXT-FP80-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: [[CONV1:%.*]] = fpext bfloat [[TMP1]] to x86_fp80
				// CHECK-EXT-FP80-NEXT: [[MUL:%.*]] = fmul x86_fp80 [[CONV]], [[CONV1]]
				// CHECK-EXT-FP80-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: [[CONV2:%.*]] = fpext bfloat [[TMP2]] to x86_fp80
				// CHECK-EXT-FP80-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-EXT-FP80-NEXT: [[CONV3:%.*]] = fpext bfloat [[TMP3]] to x86_fp80
				// CHECK-EXT-FP80-NEXT: [[MUL4:%.*]] = fmul x86_fp80 [[CONV2]], [[CONV3]]
				// CHECK-EXT-FP80-NEXT: [[ADD:%.*]] = fadd x86_fp80 [[MUL]], [[MUL4]]
				// CHECK-EXT-FP80-NEXT: [[CONV5:%.*]] = fptrunc x86_fp80 [[ADD]] to bfloat
				// CHECK-EXT-FP80-NEXT: ret bfloat [[CONV5]]
				//
				// CHECK-CONTRACT-LABEL: define dso_local bfloat @f
				// CHECK-CONTRACT-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-CONTRACT-NEXT: entry:
				// CHECK-CONTRACT-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-NEXT: [[MUL1:%.*]] = fmul bfloat [[TMP2]], [[TMP3]]
				// CHECK-CONTRACT-NEXT: [[TMP4:%.*]] = call bfloat @llvm.fmuladd.bf16(bfloat [[TMP0]], bfloat [[TMP1]], bfloat [[MUL1]])
				// CHECK-CONTRACT-NEXT: ret bfloat [[TMP4]]
				//
				// CHECK-CONTRACT-DBL-LABEL: define dso_local bfloat @f
				// CHECK-CONTRACT-DBL-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-CONTRACT-DBL-NEXT: entry:
				// CHECK-CONTRACT-DBL-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-DBL-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-DBL-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-DBL-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-DBL-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: [[CONV:%.*]] = fpext bfloat [[TMP0]] to double
				// CHECK-CONTRACT-DBL-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: [[CONV1:%.*]] = fpext bfloat [[TMP1]] to double
				// CHECK-CONTRACT-DBL-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: [[CONV2:%.*]] = fpext bfloat [[TMP2]] to double
				// CHECK-CONTRACT-DBL-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-DBL-NEXT: [[CONV3:%.*]] = fpext bfloat [[TMP3]] to double
				// CHECK-CONTRACT-DBL-NEXT: [[MUL4:%.*]] = fmul double [[CONV2]], [[CONV3]]
				// CHECK-CONTRACT-DBL-NEXT: [[TMP4:%.*]] = call double @llvm.fmuladd.f64(double [[CONV]], double [[CONV1]], double [[MUL4]])
				// CHECK-CONTRACT-DBL-NEXT: [[CONV5:%.*]] = fptrunc double [[TMP4]] to bfloat
				// CHECK-CONTRACT-DBL-NEXT: ret bfloat [[CONV5]]
				//
				// CHECK-CONTRACT-EXT-LABEL: define dso_local bfloat @f
				// CHECK-CONTRACT-EXT-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-CONTRACT-EXT-NEXT: entry:
				// CHECK-CONTRACT-EXT-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-EXT-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-EXT-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-EXT-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-CONTRACT-EXT-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: [[CONV:%.*]] = fpext bfloat [[TMP0]] to x86_fp80
				// CHECK-CONTRACT-EXT-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: [[CONV1:%.*]] = fpext bfloat [[TMP1]] to x86_fp80
				// CHECK-CONTRACT-EXT-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: [[CONV2:%.*]] = fpext bfloat [[TMP2]] to x86_fp80
				// CHECK-CONTRACT-EXT-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-CONTRACT-EXT-NEXT: [[CONV3:%.*]] = fpext bfloat [[TMP3]] to x86_fp80
				// CHECK-CONTRACT-EXT-NEXT: [[MUL4:%.*]] = fmul x86_fp80 [[CONV2]], [[CONV3]]
				// CHECK-CONTRACT-EXT-NEXT: [[TMP4:%.*]] = call x86_fp80 @llvm.fmuladd.f80(x86_fp80 [[CONV]], x86_fp80 [[CONV1]], x86_fp80 [[MUL4]])
				// CHECK-CONTRACT-EXT-NEXT: [[CONV5:%.*]] = fptrunc x86_fp80 [[TMP4]] to bfloat
				// CHECK-CONTRACT-EXT-NEXT: ret bfloat [[CONV5]]
				//
				// CHECK-UNSAFE-LABEL: define dso_local bfloat @f
				// CHECK-UNSAFE-SAME: (bfloat noundef [[A:%.*]], bfloat noundef [[B:%.*]], bfloat noundef [[C:%.*]], bfloat noundef [[D:%.*]]) #[[ATTR0:[0-9]+]] {
				// CHECK-UNSAFE-NEXT: entry:
				// CHECK-UNSAFE-NEXT: [[A_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-UNSAFE-NEXT: [[B_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-UNSAFE-NEXT: [[C_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-UNSAFE-NEXT: [[D_ADDR:%.*]] = alloca bfloat, align 2
				// CHECK-UNSAFE-NEXT: store bfloat [[A]], ptr [[A_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: store bfloat [[B]], ptr [[B_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: store bfloat [[C]], ptr [[C_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: store bfloat [[D]], ptr [[D_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[A_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[B_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[C_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: [[TMP3:%.*]] = load bfloat, ptr [[D_ADDR]], align 2
				// CHECK-UNSAFE-NEXT: [[MUL1:%.*]] = fmul reassoc nsz arcp afn bfloat [[TMP2]], [[TMP3]]
				// CHECK-UNSAFE-NEXT: [[TMP4:%.*]] = call reassoc nsz arcp afn bfloat @llvm.fmuladd.bf16(bfloat [[TMP0]], bfloat [[TMP1]], bfloat [[MUL1]])
				// CHECK-UNSAFE-NEXT: ret bfloat [[TMP4]]
				//
				__bf16 f(__bf16 a, __bf16 b, __bf16 c, __bf16 d) {
				return a * b + c * d;
				}
				zahiraam: Fix this.
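
For readers comparing the CHECK-EXT and CHECK-NO-EXT expectations for `f`, the difference between truncating once and rounding after every operation can be sketched in plain C. This is an illustration only, not code from the patch: `round_to_bf16`, `f_promoted`, `f_per_op`, and `excess_precision_demo` are hypothetical names, and bfloat16 is modeled by rounding a `float` to its upper 16 bits with round-to-nearest-even (NaN handling omitted). How a given backend actually lowers `fptrunc` to `bfloat` is not something this sketch claims to reproduce.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round a float to the nearest bfloat16 value
   (ties to even on the upper 16 bits) and return the result as a float.
   NaN inputs are not handled; this is a sketch, not a conversion routine. */
static float round_to_bf16(float x) {
  uint32_t bits;
  memcpy(&bits, &x, sizeof bits);
  uint32_t lsb = (bits >> 16) & 1u;   /* lowest bit that survives */
  bits += 0x7FFFu + lsb;              /* round to nearest, ties to even */
  bits &= 0xFFFF0000u;                /* drop the low 16 bits */
  float r;
  memcpy(&r, &bits, sizeof r);
  return r;
}

/* Same shape as CHECK-EXT: compute in float, truncate once at the end. */
static float f_promoted(float a, float b, float c, float d) {
  return round_to_bf16(a * b + c * d);
}

/* Same shape as CHECK-NO-EXT: every operation yields a bfloat16 value. */
static float f_per_op(float a, float b, float c, float d) {
  float ab = round_to_bf16(a * b);
  float cd = round_to_bf16(c * d);
  return round_to_bf16(ab + cd);
}

/* Returns 1 when the two strategies disagree for the sample inputs. */
static int excess_precision_demo(void) {
  const float a = 1.0078125f, b = 1.0078125f; /* 1 + 2^-7, exact in bf16 */
  const float c = -1.0f, d = 1.015625f;       /* 1 + 2^-6, exact in bf16 */
  float wide = f_promoted(a, b, c, d);
  float narrow = f_per_op(a, b, c, d);
  assert(wide == 0x1p-14f);  /* the 2^-14 term of a*b survives */
  assert(narrow == 0.0f);    /* a*b was rounded to 1 + 2^-6 before the add */
  return wide != narrow;
}
```

The sample inputs are chosen so that every intermediate value is exact in `float`, which keeps the result independent of whether the host compiler contracts `a * b + c * d` into a fused multiply-add.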

clang/test/CodeGenCUDA/amdgpu-bf16.cu

	// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py			// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target

	// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "amdgcn-amd-amdhsa" \			// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "amdgcn-amd-amdhsa" \
	// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -emit-llvm -o - %s | FileCheck %s			// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -emit-llvm -o - %s | FileCheck %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// CHECK-LABEL: @_Z8test_argPu6__bf16u6__bf16(			// CHECK-LABEL: @_Z8test_argPDF16bDF16b(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
	// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[BF16:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[BF16:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr			// CHECK-NEXT: [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr
	// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr			// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr
	// CHECK-NEXT: [[BF16_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[BF16]] to ptr			// CHECK-NEXT: [[BF16_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[BF16]] to ptr
	// CHECK-NEXT: store ptr [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8			// CHECK-NEXT: store ptr [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8
	// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: store bfloat [[TMP0]], ptr [[BF16_ASCAST]], align 2			// CHECK-NEXT: store bfloat [[TMP0]], ptr [[BF16_ASCAST]], align 2
	// CHECK-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[BF16_ASCAST]], align 2			// CHECK-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[BF16_ASCAST]], align 2
	// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[OUT_ADDR_ASCAST]], align 8			// CHECK-NEXT: [[TMP2:%.*]] = load ptr, ptr [[OUT_ADDR_ASCAST]], align 8
	// CHECK-NEXT: store bfloat [[TMP1]], ptr [[TMP2]], align 2			// CHECK-NEXT: store bfloat [[TMP1]], ptr [[TMP2]], align 2
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	__device__ void test_arg(__bf16 *out, __bf16 in) {			__device__ void test_arg(__bf16 *out, __bf16 in) {
	__bf16 bf16 = in;			__bf16 bf16 = in;
	*out = bf16;			*out = bf16;
	}			}

	// CHECK-LABEL: @_Z9test_loadPu6__bf16S_(			// CHECK-LABEL: @_Z9test_loadPDF16bS_(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[OUT_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
	// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)			// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca ptr, align 8, addrspace(5)
	// CHECK-NEXT: [[BF16:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[BF16:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr			// CHECK-NEXT: [[OUT_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[OUT_ADDR]] to ptr
	// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr			// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr
	// CHECK-NEXT: [[BF16_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[BF16]] to ptr			// CHECK-NEXT: [[BF16_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[BF16]] to ptr
	// CHECK-NEXT: store ptr [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8			// CHECK-NEXT: store ptr [[OUT:%.*]], ptr [[OUT_ADDR_ASCAST]], align 8
	// CHECK-NEXT: store ptr [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 8			// CHECK-NEXT: store ptr [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 8
	// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[IN_ADDR_ASCAST]], align 8			// CHECK-NEXT: [[TMP0:%.*]] = load ptr, ptr [[IN_ADDR_ASCAST]], align 8
	// CHECK-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[TMP0]], align 2			// CHECK-NEXT: [[TMP1:%.*]] = load bfloat, ptr [[TMP0]], align 2
	// CHECK-NEXT: store bfloat [[TMP1]], ptr [[BF16_ASCAST]], align 2			// CHECK-NEXT: store bfloat [[TMP1]], ptr [[BF16_ASCAST]], align 2
	// CHECK-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[BF16_ASCAST]], align 2			// CHECK-NEXT: [[TMP2:%.*]] = load bfloat, ptr [[BF16_ASCAST]], align 2
	// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[OUT_ADDR_ASCAST]], align 8			// CHECK-NEXT: [[TMP3:%.*]] = load ptr, ptr [[OUT_ADDR_ASCAST]], align 8
	// CHECK-NEXT: store bfloat [[TMP2]], ptr [[TMP3]], align 2			// CHECK-NEXT: store bfloat [[TMP2]], ptr [[TMP3]], align 2
	// CHECK-NEXT: ret void			// CHECK-NEXT: ret void
	//			//
	__device__ void test_load(__bf16 *out, __bf16 *in) {			__device__ void test_load(__bf16 *out, __bf16 *in) {
	__bf16 bf16 = *in;			__bf16 bf16 = *in;
	*out = bf16;			*out = bf16;
	}			}

	// CHECK-LABEL: @_Z8test_retu6__bf16(			// CHECK-LABEL: @_Z8test_retDF16b(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[RETVAL:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[RETVAL:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr			// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
	// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr			// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr
	// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: ret bfloat [[TMP0]]			// CHECK-NEXT: ret bfloat [[TMP0]]
	//			//
	__device__ __bf16 test_ret( __bf16 in) {			__device__ __bf16 test_ret( __bf16 in) {
	return in;			return in;
	}			}

	// CHECK-LABEL: @_Z9test_callu6__bf16(			// CHECK-LABEL: @_Z9test_callDF16b(
	// CHECK-NEXT: entry:			// CHECK-NEXT: entry:
	// CHECK-NEXT: [[RETVAL:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[RETVAL:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)			// CHECK-NEXT: [[IN_ADDR:%.*]] = alloca bfloat, align 2, addrspace(5)
	// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr			// CHECK-NEXT: [[RETVAL_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[RETVAL]] to ptr
	// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr			// CHECK-NEXT: [[IN_ADDR_ASCAST:%.*]] = addrspacecast ptr addrspace(5) [[IN_ADDR]] to ptr
	// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: store bfloat [[IN:%.*]], ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2			// CHECK-NEXT: [[TMP0:%.*]] = load bfloat, ptr [[IN_ADDR_ASCAST]], align 2
	// CHECK-NEXT: [[CALL:%.*]] = call contract noundef bfloat @_Z8test_retu6__bf16(bfloat noundef [[TMP0]]) #[[ATTR1:[0-9]+]]			// CHECK-NEXT: [[CALL:%.*]] = call contract noundef bfloat @_Z8test_retDF16b(bfloat noundef [[TMP0]]) #[[ATTR1:[0-9]+]]
	// CHECK-NEXT: ret bfloat [[CALL]]			// CHECK-NEXT: ret bfloat [[CALL]]
	//			//
	__device__ __bf16 test_call( __bf16 in) {			__device__ __bf16 test_call( __bf16 in) {
	return test_ret(in);			return test_ret(in);
	}			}


	// CHECK-LABEL: @_Z15test_vec_assignv(			// CHECK-LABEL: @_Z15test_vec_assignv(
	[44 lines collapsed]

clang/test/CodeGenCUDA/bf16.cu

	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target

	// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "nvptx64-nvidia-cuda" \			// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "nvptx64-nvidia-cuda" \
	// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -S -o - %s | FileCheck %s			// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -S -o - %s | FileCheck %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// CHECK-LABEL: .visible .func _Z8test_argPu6__bf16u6__bf16(			// CHECK-LABEL: .visible .func _Z8test_argPDF16bDF16b(
	// CHECK: .param .b64 _Z8test_argPu6__bf16u6__bf16_param_0,			// CHECK: .param .b64 _Z8test_argPDF16bDF16b_param_0,
	// CHECK: .param .b16 _Z8test_argPu6__bf16u6__bf16_param_1			// CHECK: .param .b16 _Z8test_argPDF16bDF16b_param_1
	//			//
	__device__ void test_arg(__bf16 *out, __bf16 in) {			__device__ void test_arg(__bf16 *out, __bf16 in) {
	// CHECK: ld.param.b16 %{{h.*}}, [_Z8test_argPu6__bf16u6__bf16_param_1];			// CHECK: ld.param.b16 %{{h.*}}, [_Z8test_argPDF16bDF16b_param_1];
	__bf16 bf16 = in;			__bf16 bf16 = in;
	*out = bf16;			*out = bf16;
	// CHECK: st.b16			// CHECK: st.b16
	// CHECK: ret;			// CHECK: ret;
	}			}


	// CHECK-LABEL: .visible .func (.param .b32 func_retval0) _Z8test_retu6__bf16(			// CHECK-LABEL: .visible .func (.param .b32 func_retval0) _Z8test_retDF16b(
	// CHECK: .param .b16 _Z8test_retu6__bf16_param_0			// CHECK: .param .b16 _Z8test_retDF16b_param_0
	__device__ __bf16 test_ret( __bf16 in) {			__device__ __bf16 test_ret( __bf16 in) {
	// CHECK: ld.param.b16 %h{{.*}}, [_Z8test_retu6__bf16_param_0];			// CHECK: ld.param.b16 %h{{.*}}, [_Z8test_retDF16b_param_0];
	return in;			return in;
	// CHECK: st.param.b16 [func_retval0+0], %h			// CHECK: st.param.b16 [func_retval0+0], %h
	// CHECK: ret;			// CHECK: ret;
	}			}

	// CHECK-LABEL: .visible .func (.param .b32 func_retval0) _Z9test_callu6__bf16(			// CHECK-LABEL: .visible .func (.param .b32 func_retval0) _Z9test_callDF16b(
	// CHECK: .param .b16 _Z9test_callu6__bf16_param_0			// CHECK: .param .b16 _Z9test_callDF16b_param_0
	__device__ __bf16 test_call( __bf16 in) {			__device__ __bf16 test_call( __bf16 in) {
	// CHECK: ld.param.b16 %h{{.*}}, [_Z9test_callu6__bf16_param_0];			// CHECK: ld.param.b16 %h{{.*}}, [_Z9test_callDF16b_param_0];
	// CHECK: st.param.b16 [param0+0], %h2;			// CHECK: st.param.b16 [param0+0], %h2;
	// CHECK: .param .b32 retval0;			// CHECK: .param .b32 retval0;
	// CHECK: call.uni (retval0),			// CHECK: call.uni (retval0),
	// CHECK-NEXT: _Z8test_retu6__bf16,			// CHECK-NEXT: _Z8test_retDF16b,
	// CHECK-NEXT: (			// CHECK-NEXT: (
	// CHECK-NEXT: param0			// CHECK-NEXT: param0
	// CHECK-NEXT );			// CHECK-NEXT );
	// CHECK: ld.param.b16 %h{{.*}}, [retval0+0];			// CHECK: ld.param.b16 %h{{.*}}, [retval0+0];
	return test_ret(in);			return test_ret(in);
	// CHECK: st.param.b16 [func_retval0+0], %h			// CHECK: st.param.b16 [func_retval0+0], %h
	// CHECK: ret;			// CHECK: ret;
	}			}
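The mangling change visible in the CHECK lines above can be sketched as a small string builder. This is a minimal illustration only, not clang's mangler: `ParamType`, `bf16_code`, and `mangle_bf16_function` are hypothetical names, only `__bf16` and `__bf16 *` parameters are modelled, and substitutions such as the `S_` in `test_load` are deliberately not handled.

```cpp
#include <string>
#include <vector>

// Hypothetical parameter description for illustration; not a clang type.
struct ParamType {
    bool is_pointer;  // true for `__bf16 *`, false for `__bf16`
};

// Encoding of __bf16 as it appears in the CHECK lines:
// "u6__bf16" before this change, "DF16b" after it.
std::string bf16_code(bool new_mangling) {
    return new_mangling ? "DF16b" : "u6__bf16";
}

// Build "_Z" + <length><name> + parameter encodings for a free function
// whose parameters are all __bf16 or pointer-to-__bf16.
// Assumption: no repeated compound types, so no substitutions are emitted.
std::string mangle_bf16_function(const std::string& name,
                                 const std::vector<ParamType>& params,
                                 bool new_mangling) {
    std::string out = "_Z" + std::to_string(name.size()) + name;
    if (params.empty()) {
        out += "v";  // an empty parameter list is encoded as void
    }
    for (const ParamType& p : params) {
        if (p.is_pointer) {
            out += "P";  // pointer prefix
        }
        out += bf16_code(new_mangling);
    }
    return out;
}
```

For `test_arg(__bf16 *, __bf16)` this yields `_Z8test_argPu6__bf16u6__bf16` with the old encoding and `_Z8test_argPDF16bDF16b` with the new one, matching the left and right columns of the diff.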

clang/test/Driver/fexcess-precision.c

	[56 lines collapsed]
	// RUN: | FileCheck --check-prefix=CHECK-ERR-16 %s			// RUN: | FileCheck --check-prefix=CHECK-ERR-16 %s

	// RUN: %clang -### -target aarch64 -fexcess-precision=none -c %s 2>&1 \			// RUN: %clang -### -target aarch64 -fexcess-precision=none -c %s 2>&1 \
	// RUN: | FileCheck --check-prefix=CHECK-ERR-NONE %s			// RUN: | FileCheck --check-prefix=CHECK-ERR-NONE %s
	// RUN: %clang_cl -### -target aarch64 -fexcess-precision=none -c -- %s 2>&1 \			// RUN: %clang_cl -### -target aarch64 -fexcess-precision=none -c -- %s 2>&1 \
	// RUN: | FileCheck --check-prefix=CHECK-ERR-NONE %s			// RUN: | FileCheck --check-prefix=CHECK-ERR-NONE %s

	// CHECK-FAST: "-ffloat16-excess-precision=fast"			// CHECK-FAST: "-ffloat16-excess-precision=fast"
				// CHECK-FAST: "-fbfloat16-excess-precision=fast"
	// CHECK-STD: "-ffloat16-excess-precision=standard"			// CHECK-STD: "-ffloat16-excess-precision=standard"
				// CHECK-STD: "-fbfloat16-excess-precision=standard"
	// CHECK-NONE: "-ffloat16-excess-precision=none"			// CHECK-NONE: "-ffloat16-excess-precision=none"
				// CHECK-NONE: "-fbfloat16-excess-precision=none"
	// CHECK-ERR-NONE: unsupported argument 'none' to option '-fexcess-precision='			// CHECK-ERR-NONE: unsupported argument 'none' to option '-fexcess-precision='
	// CHECK: "-cc1"			// CHECK: "-cc1"
	// CHECK-NOT: "-ffloat16-excess-precision=fast"			// CHECK-NOT: "-ffloat16-excess-precision=fast"
				// CHECK-NOT: "-fbfloat16-excess-precision=fast"
	// CHECK-ERR-16: unsupported argument '16' to option '-fexcess-precision='			// CHECK-ERR-16: unsupported argument '16' to option '-fexcess-precision='
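To make concrete why an excess precision setting matters for `__bf16`, the following sketch emulates bfloat16 rounding in plain `float` arithmetic. It is an assumption-laden illustration, not clang's code generation: `round_to_bf16`, `sum3_no_excess`, and `sum3_with_excess` are hypothetical helpers, rounding is round-to-nearest-even on the upper 16 bits of an IEEE-754 `float`, and NaN inputs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Round a float to the nearest bfloat16 value (ties to even) and return the
// result widened back to float. Assumption: NaN is not treated specially.
float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    // Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    bits += bias;
    bits &= 0xFFFF0000u;  // keep only the 16 bits that bfloat16 stores
    std::memcpy(&x, &bits, sizeof bits);
    return x;
}

// No excess precision: every intermediate result is rounded to bfloat16.
float sum3_no_excess(float a, float b, float c) {
    float t = round_to_bf16(a + b);
    return round_to_bf16(t + c);
}

// With excess precision: intermediates stay in float, and rounding to
// bfloat16 happens once at the end.
float sum3_with_excess(float a, float b, float c) {
    return round_to_bf16((a + b) + c);
}
```

With `a = 1.0f` and `b = c = 0.00390625f` (2^-8), `sum3_no_excess` returns `1.0f` because each partial sum is a tie that rounds back to 1.0, while `sum3_with_excess` returns `1.0078125f` (1 + 2^-7). The two evaluation strategies can therefore give observably different results.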

clang/test/Sema/arm-bf16-forbidden-ops.c

This file was deleted.

	// RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64 -target-feature +bf16 %s
	// RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64 -target-feature -bf16 %s

	__bf16 test_cast_from_float(float in) {
	return (__bf16)in; // expected-error {{cannot type-cast to __bf16}}
	}

	__bf16 test_cast_from_float_literal(void) {
	return (__bf16)1.0f; // expected-error {{cannot type-cast to __bf16}}
	}

	__bf16 test_cast_from_int(int in) {
	return (__bf16)in; // expected-error {{cannot type-cast to __bf16}}
	}

	__bf16 test_cast_from_int_literal(void) {
	return (__bf16)1; // expected-error {{cannot type-cast to __bf16}}
	}

	__bf16 test_cast_bfloat(__bf16 in) {
	return (__bf16)in; // this one should work
	}

	float test_cast_to_float(__bf16 in) {
	return (float)in; // expected-error {{cannot type-cast from __bf16}}
	}

	int test_cast_to_int(__bf16 in) {
	return (int)in; // expected-error {{cannot type-cast from __bf16}}
	}

	__bf16 test_implicit_from_float(float in) {
	return in; // expected-error {{returning 'float' from a function with incompatible result type '__bf16'}}
	}

	__bf16 test_implicit_from_float_literal(void) {
	return 1.0f; // expected-error {{returning 'float' from a function with incompatible result type '__bf16'}}
	}

	__bf16 test_implicit_from_int(int in) {
	return in; // expected-error {{returning 'int' from a function with incompatible result type '__bf16'}}
	}

	__bf16 test_implicit_from_int_literal(void) {
	return 1; // expected-error {{returning 'int' from a function with incompatible result type '__bf16'}}
	}

	__bf16 test_implicit_bfloat(__bf16 in) {
	return in; // this one should work
	}

	float test_implicit_to_float(__bf16 in) {
	return in; // expected-error {{returning '__bf16' from a function with incompatible result type 'float'}}
	}

	int test_implicit_to_int(__bf16 in) {
	return in; // expected-error {{returning '__bf16' from a function with incompatible result type 'int'}}
	}

	__bf16 test_cond(__bf16 a, __bf16 b, _Bool which) {
	// Conditional operator _should_ be supported, without nonsense
	// complaints like 'types __bf16 and __bf16 are not compatible'
	return which ? a : b;
	}

	__bf16 test_cond_float(__bf16 a, __bf16 b, _Bool which) {
	return which ? a : 1.0f; // expected-error {{incompatible operand types ('__bf16' and 'float')}}
	}

	__bf16 test_cond_int(__bf16 a, __bf16 b, _Bool which) {
	return which ? a : 1; // expected-error {{incompatible operand types ('__bf16' and 'int')}}
	}

clang/test/Sema/arm-bf16-forbidden-ops.cpp

This file was deleted.

	// RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64 -target-feature +bf16 %s
	// RUN: %clang_cc1 -fsyntax-only -verify -triple aarch64 -target-feature -bf16 %s

	__bf16 test_static_cast_from_float(float in) {
	return static_cast<__bf16>(in); // expected-error {{static_cast from 'float' to '__bf16' is not allowed}}
	}

	__bf16 test_static_cast_from_float_literal(void) {
	return static_cast<__bf16>(1.0f); // expected-error {{static_cast from 'float' to '__bf16' is not allowed}}
	}

	__bf16 test_static_cast_from_int(int in) {
	return static_cast<__bf16>(in); // expected-error {{static_cast from 'int' to '__bf16' is not allowed}}
	}

	__bf16 test_static_cast_from_int_literal(void) {
	return static_cast<__bf16>(1); // expected-error {{static_cast from 'int' to '__bf16' is not allowed}}
	}

	__bf16 test_static_cast_bfloat(__bf16 in) {
	return static_cast<__bf16>(in); // this one should work
	}

	float test_static_cast_to_float(__bf16 in) {
	return static_cast<float>(in); // expected-error {{static_cast from '__bf16' to 'float' is not allowed}}
	}

	int test_static_cast_to_int(__bf16 in) {
	return static_cast<int>(in); // expected-error {{static_cast from '__bf16' to 'int' is not allowed}}
	}

	__bf16 test_implicit_from_float(float in) {
	return in; // expected-error {{cannot initialize return object of type '__bf16' with an lvalue of type 'float'}}
	}

	__bf16 test_implicit_from_float_literal() {
	return 1.0f; // expected-error {{cannot initialize return object of type '__bf16' with an rvalue of type 'float'}}
	}

	__bf16 test_implicit_from_int(int in) {
	return in; // expected-error {{cannot initialize return object of type '__bf16' with an lvalue of type 'int'}}
	}

	__bf16 test_implicit_from_int_literal() {
	return 1; // expected-error {{cannot initialize return object of type '__bf16' with an rvalue of type 'int'}}
	}

	__bf16 test_implicit_bfloat(__bf16 in) {
	return in; // this one should work
	}

	float test_implicit_to_float(__bf16 in) {
	return in; // expected-error {{cannot initialize return object of type 'float' with an lvalue of type '__bf16'}}
	}

	int test_implicit_to_int(__bf16 in) {
	return in; // expected-error {{cannot initialize return object of type 'int' with an lvalue of type '__bf16'}}
	}

	__bf16 test_cond(__bf16 a, __bf16 b, bool which) {
	// Conditional operator _should_ be supported, without nonsense
	// complaints like 'types __bf16 and __bf16 are not compatible'
	return which ? a : b;
	}

	__bf16 test_cond_float(__bf16 a, __bf16 b, bool which) {
	return which ? a : 1.0f; // expected-error {{incompatible operand types ('__bf16' and 'float')}}
	}

	__bf16 test_cond_int(__bf16 a, __bf16 b, bool which) {
	return which ? a : 1; // expected-error {{incompatible operand types ('__bf16' and 'int')}}
	}

clang/test/Sema/arm-bfloat.cpp

	// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \			// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \
	// RUN: -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \			// RUN: -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \
	// RUN: -target-feature +bf16 -target-feature +neon %s			// RUN: -target-feature +bf16 -target-feature +neon -Wno-unused %s
	// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \			// RUN: %clang_cc1 -fsyntax-only -verify=scalar,neon -std=c++11 \
	// RUN: -triple arm-arm-none-eabi -target-cpu cortex-a53 \			// RUN: -triple arm-arm-none-eabi -target-cpu cortex-a53 \
	// RUN: -target-feature +bf16 -target-feature +neon %s			// RUN: -target-feature +bf16 -target-feature +neon -Wno-unused %s

	// The types should be available under AArch64 even without the bf16 feature			// The types should be available under AArch64 even without the bf16 feature
	// RUN: %clang_cc1 -fsyntax-only -verify=scalar -DNONEON -std=c++11 \			// RUN: %clang_cc1 -fsyntax-only -verify=scalar -DNONEON -std=c++11 \
	// RUN: -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \			// RUN: -triple aarch64-arm-none-eabi -target-cpu cortex-a75 \
	// RUN: -target-feature -bf16 -target-feature +neon %s			// RUN: -target-feature -bf16 -target-feature +neon -Wno-unused %s

	// REQUIRES: aarch64-registered-target || arm-registered-target			// REQUIRES: aarch64-registered-target || arm-registered-target

	void test(bool b) {			void test(bool b) {
	__bf16 bf16;			__bf16 bf16;

	bf16 + bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 + bf16;
	bf16 - bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 - bf16;
	bf16 * bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 * bf16;
	bf16 / bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 / bf16;

	__fp16 fp16;			__fp16 fp16;

	bf16 + fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 + fp16;
	fp16 + bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 + bf16;
	bf16 - fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 - fp16;
	fp16 - bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 - bf16;
	bf16 * fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 * fp16;
	fp16 * bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 * bf16;
	bf16 / fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 / fp16;
	fp16 / bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 / bf16;
	bf16 = fp16; // scalar-error {{assigning to '__bf16' from incompatible type '__fp16'}}			bf16 = fp16; // scalar-error {{assigning to '__bf16' from incompatible type '__fp16'}}
	fp16 = bf16; // scalar-error {{assigning to '__fp16' from incompatible type '__bf16'}}			fp16 = bf16; // scalar-error {{assigning to '__fp16' from incompatible type '__bf16'}}
	bf16 + (b ? fp16 : bf16); // scalar-error {{incompatible operand types ('__fp16' and '__bf16')}}			bf16 + (b ? fp16 : bf16);
	}			}

	#ifndef NONEON			#ifndef NONEON

	#include <arm_neon.h>			#include <arm_neon.h>

	void test_vector(bfloat16x4_t a, bfloat16x4_t b, float16x4_t c) {			void test_vector(bfloat16x4_t a, bfloat16x4_t b, float16x4_t c) {
	a + b; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'bfloat16x4_t')}}			a + b;
	a - b; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'bfloat16x4_t')}}			a - b;
	a * b; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'bfloat16x4_t')}}			a * b;
	a / b; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'bfloat16x4_t')}}			a / b;

	a + c; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'float16x4_t' (vector of 4 'float16_t' values))}}			a + c;
	a - c; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'float16x4_t' (vector of 4 'float16_t' values))}}			a - c;
	a * c; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'float16x4_t' (vector of 4 'float16_t' values))}}			a * c;
	a / c; // neon-error {{invalid operands to binary expression ('bfloat16x4_t' (vector of 4 'bfloat16_t' values) and 'float16x4_t' (vector of 4 'float16_t' values))}}			a / c;
	c + b; // neon-error {{invalid operands to binary expression ('float16x4_t' (vector of 4 'float16_t' values) and 'bfloat16x4_t' (vector of 4 'bfloat16_t' values))}}			c + b;
	c - b; // neon-error {{invalid operands to binary expression ('float16x4_t' (vector of 4 'float16_t' values) and 'bfloat16x4_t' (vector of 4 'bfloat16_t' values))}}			c - b;
	c * b; // neon-error {{invalid operands to binary expression ('float16x4_t' (vector of 4 'float16_t' values) and 'bfloat16x4_t' (vector of 4 'bfloat16_t' values))}}			c * b;
	c / b; // neon-error {{invalid operands to binary expression ('float16x4_t' (vector of 4 'float16_t' values) and 'bfloat16x4_t' (vector of 4 'bfloat16_t' values))}}			c / b;
				codemzs (inline comment): Remove newline.
	}			}
	#endif			#endif
	No newline at end of file			No newline at end of file

clang/test/SemaCUDA/amdgpu-bf16.cu

	// REQUIRES: amdgpu-registered-target			// REQUIRES: amdgpu-registered-target
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target

	// RUN: %clang_cc1 "-triple" "x86_64-unknown-linux-gnu" "-aux-triple" "amdgcn-amd-amdhsa"\
	// RUN: "-target-cpu" "x86-64" -fsyntax-only -verify=amdgcn %s
	// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "amdgcn-amd-amdhsa"\
	// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -fsyntax-only -verify=amdgcn %s

	// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "r600-unknown-unknown"\			// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "r600-unknown-unknown"\
	// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -fsyntax-only -verify=amdgcn,r600 %s			// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -fsyntax-only -verify=r600 %s

	// AMDGCN has storage-only support for bf16. R600 does not support it should error out when			// AMDGCN has storage-only support for bf16. R600 does not support it should error out when
	// it's the main target.			// it's the main target.

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	// There should be no errors on using the type itself, or when loading/storing values for amdgcn.			// There should be no errors on using the type itself, or when loading/storing values for amdgcn.
	// r600 should error on all uses of the type.			// r600 should error on all uses of the type.

	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(2))) __bf16 bf16_x2;			typedef __attribute__((ext_vector_type(2))) __bf16 bf16_x2;
	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(4))) __bf16 bf16_x4;			typedef __attribute__((ext_vector_type(4))) __bf16 bf16_x4;
	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(8))) __bf16 bf16_x8;			typedef __attribute__((ext_vector_type(8))) __bf16 bf16_x8;
	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(16))) __bf16 bf16_x16;			typedef __attribute__((ext_vector_type(16))) __bf16 bf16_x16;

	// r600-error@+1 2 {{__bf16 is not supported on this target}}			// r600-error@+1 2 {{__bf16 is not supported on this target}}
	__device__ void test(bool b, __bf16 *out, __bf16 in) {			__device__ void test(bool b, __bf16 *out, __bf16 in) {
	__bf16 bf16 = in; // r600-error {{__bf16 is not supported on this target}}			__bf16 bf16 = in; // r600-error {{__bf16 is not supported on this target}}

	bf16 + bf16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
	bf16 - bf16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
	bf16 * bf16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}
	bf16 / bf16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}

	__fp16 fp16;

	bf16 + fp16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
	fp16 + bf16; // amdgcn-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
	bf16 - fp16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
	fp16 - bf16; // amdgcn-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
	bf16 * fp16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
	fp16 * bf16; // amdgcn-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
	bf16 / fp16; // amdgcn-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}
	fp16 / bf16; // amdgcn-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}
	bf16 = fp16; // amdgcn-error {{assigning to '__bf16' from incompatible type '__fp16'}}
	fp16 = bf16; // amdgcn-error {{assigning to '__fp16' from incompatible type '__bf16'}}
	bf16 + (b ? fp16 : bf16); // amdgcn-error {{incompatible operand types ('__fp16' and '__bf16')}}
	*out = bf16;			*out = bf16;

	// amdgcn-error@+1 {{static_cast from '__bf16' to 'unsigned short' is not allowed}}
	unsigned short u16bf16 = static_cast<unsigned short>(bf16);
	// amdgcn-error@+2 {{C-style cast from 'unsigned short' to '__bf16' is not allowed}}
	// r600-error@+1 {{__bf16 is not supported on this target}}
	bf16 = (__bf16)u16bf16;

	// amdgcn-error@+1 {{static_cast from '__bf16' to 'float' is not allowed}}
	float f32bf16 = static_cast<float>(bf16);
	// amdgcn-error@+2 {{C-style cast from 'float' to '__bf16' is not allowed}}
	// r600-error@+1 {{__bf16 is not supported on this target}}
	bf16 = (__bf16)f32bf16;

	// amdgcn-error@+1 {{static_cast from '__bf16' to 'double' is not allowed}}
	double f64bf16 = static_cast<double>(bf16);
	// amdgcn-error@+2 {{C-style cast from 'double' to '__bf16' is not allowed}}
	// r600-error@+1 {{__bf16 is not supported on this target}}
	bf16 = (__bf16)f64bf16;

	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(2))) __bf16 bf16_x2;			typedef __attribute__((ext_vector_type(2))) __bf16 bf16_x2;
	bf16_x2 vec2_a, vec2_b;			bf16_x2 vec2_a, vec2_b;
	vec2_a = vec2_b;			vec2_a = vec2_b;

	// r600-error@+1 {{__bf16 is not supported on this target}}			// r600-error@+1 {{__bf16 is not supported on this target}}
	typedef __attribute__((ext_vector_type(4))) __bf16 bf16_x4;			typedef __attribute__((ext_vector_type(4))) __bf16 bf16_x4;
	bf16_x4 vec4_a, vec4_b;			bf16_x4 vec4_a, vec4_b;
	[21 lines collapsed]

clang/test/SemaCUDA/bf16.cu

	// REQUIRES: nvptx-registered-target			// REQUIRES: nvptx-registered-target
	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target

	// RUN: %clang_cc1 "-triple" "x86_64-unknown-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda" \			// RUN: %clang_cc1 "-triple" "x86_64-unknown-linux-gnu" "-aux-triple" "nvptx64-nvidia-cuda" \
	// RUN: "-target-cpu" "x86-64" -fsyntax-only -verify=scalar %s			// RUN: "-target-cpu" "x86-64" -fsyntax-only -verify=scalar -Wno-unused %s
	// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "nvptx64-nvidia-cuda" \			// RUN: %clang_cc1 "-aux-triple" "x86_64-unknown-linux-gnu" "-triple" "nvptx64-nvidia-cuda" \
	// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -fsyntax-only -verify=scalar %s			// RUN: -fcuda-is-device "-aux-target-cpu" "x86-64" -fsyntax-only -verify=scalar -Wno-unused %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	__device__ void test(bool b, __bf16 *out, __bf16 in) {			__device__ void test(bool b, __bf16 *out, __bf16 in) {
	__bf16 bf16 = in; // No error on using the type itself.			__bf16 bf16 = in; // No error on using the type itself.

	bf16 + bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 + bf16;
	bf16 - bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 - bf16;
	bf16 * bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 * bf16;
	bf16 / bf16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__bf16')}}			bf16 / bf16;

	__fp16 fp16;			__fp16 fp16;

	bf16 + fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 + fp16;
	fp16 + bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 + bf16;
	bf16 - fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 - fp16;
	fp16 - bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 - bf16;
	bf16 * fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 * fp16;
	fp16 * bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 * bf16;
	bf16 / fp16; // scalar-error {{invalid operands to binary expression ('__bf16' and '__fp16')}}			bf16 / fp16;
	fp16 / bf16; // scalar-error {{invalid operands to binary expression ('__fp16' and '__bf16')}}			fp16 / bf16;
	bf16 = fp16; // scalar-error {{assigning to '__bf16' from incompatible type '__fp16'}}			bf16 = fp16; // scalar-error {{assigning to '__bf16' from incompatible type '__fp16'}}
	fp16 = bf16; // scalar-error {{assigning to '__fp16' from incompatible type '__bf16'}}			fp16 = bf16; // scalar-error {{assigning to '__fp16' from incompatible type '__bf16'}}
	bf16 + (b ? fp16 : bf16); // scalar-error {{incompatible operand types ('__fp16' and '__bf16')}}			bf16 + (b ? fp16 : bf16);
	*out = bf16;			*out = bf16;
	}			}

Diff 524467