This is an archive of the discontinued LLVM Phabricator instance.


Differential D99675

[llvm][clang] Create new intrinsic llvm.arithmetic.fence to control FP optimization at expression level
ClosedPublic

Authored by mibintc on Mar 31 2021, 11:26 AM.


Details

Reviewers

andrew.w.kaylor
pengfei
kbsmith1
kpn
cameron.mcinally
uweigand
LuoYuanke
LiuChen3
craig.topper

Commits

rG931e95687d6d: [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP…

Summary

This patch adds a new llvm intrinsic, llvm.arith.fence. The purpose is to provide fine control, at the expression level, over floating point optimization when -ffast-math (-ffp-model=fast) is enabled. We are also proposing a new clang builtin that provides access to this intrinsic, as well as a new clang command line option -fprotect-parens that will be implemented using this intrinsic.

This patch is authored by @pengfei

Rationale

Some expression transformations that are mathematically correct, such as reassociation and distribution, may be incorrect when dealing with finite precision floating point. For example, these two expressions,

(a + b) + c
a + (b + c)

are equivalent mathematically, and in integer arithmetic, but not in floating point, where every operation rounds its result. In some floating point (FP) models, the compiler is allowed to make these value-unsafe transformations for performance reasons, even when the programmer uses parentheses explicitly. But the compiler must always honor the grouping imposed by llvm.arith.fence, regardless of the FP model settings.
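As a minimal illustration of why regrouping is value-unsafe, the following Python sketch evaluates both groupings. Python floats are IEEE 754 doubles; the operand values are chosen for the example and do not come from the patch.

```python
# Floating-point addition is not associative: regrouping changes where
# rounding happens. Values are illustrative, not taken from the patch.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b is exactly 0.0, so the sum is 1.0
right = a + (b + c)  # b + c rounds back to -1e16, so the sum is 0.0

print(left, right)
```

Under -ffp-model=fast the compiler may pick either grouping; a fence around (a + b) pins the first one.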

Under -ffp-model=fast, llvm.arith.fence provides a way to partially enforce ordering in an FP expression.

Original expression	Transformed expression	Permitted?
(a + b) + c	a + (b + c)	Yes!
llvm.arith.fence(a + b) + c	a + (b + c)	No!

NOTE: llvm.arith.fence serves no purpose in value-safe FP modes like -ffp-model=precise: FP expressions are already strictly ordered.

The new llvm intrinsic also enables the implementation of the option -fprotect-parens which is available in gfortran as well as the Intel C++ and Fortran compilers: icc and ifort.

Proposed llvm IR changes

Requirements for llvm.arith.fence:

There is one operand. The input to the intrinsic is an llvm::Value and must be scalar floating point or vector floating point.
The return type is the same as the operand type.
The return value is equivalent to the operand.

Optimizing llvm.arith.fence

Constant folding may substitute the constant value of the llvm.arith.fence operand for the value of fence itself in the case where the operand is constant.
CSE Detection: No special changes needed: if E1 and E2 are CSE, then llvm.arith.fence(E1) and llvm.arith.fence(E2) are CSE.
FMA transformation should be enabled, at least in the -ffp-model=fast case.
- The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:

llvm.arith.fence(a * b) + c  →  llvm.arith.fence(FMA(a, b, c))

In the -ffp-model=fast case, FMA formation doesn’t happen until ISel, so we just need to add the llvm.arith.fence cases to ISel pattern matching.
There are some choices around the FMA optimization. For this example:

%t1 = fmul double %x, %y
%t2 = call double @llvm.arith.fence.f64(double %t1)
%t3 = fadd contract double %t2, %z

1. FMA is allowed across an arith.fence if and only if the FMF contract flag is set for the llvm.arith.fence operand. After review discussion, we are convinced this choice doesn't work.
2. FMA is not allowed across a fence. We are recommending this choice.
3. The FMF contract flag should be set on the llvm.arith.fence intrinsic call if contraction should be enabled.
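To see why contracting across a fence can change the result, recall that a fused multiply-add rounds once, while a separate multiply and add round twice. The sketch below is an illustration only: fma_exact is a hypothetical helper that emulates a fused operation with exact rational arithmetic, and the operand values are chosen for the example rather than taken from the review.

```python
from fractions import Fraction


def fma_exact(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, then round once.

    Hypothetical helper for illustration; not an LLVM or Python API.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

# Separate operations: the exact product 1 - 2**-54 rounds to 1.0 first.
unfused = (a * b) + c

# Fused operation: the exact product feeds the addition before any rounding.
fused = fma_exact(a, b, c)

print(unfused, fused)
```

With choice 2, the fenced multiply keeps its own rounding step, so the unfused value is the one the programmer gets.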
Fast Math Optimization:
- The result of a llvm.arith.fence can participate in fast math optimizations. For example:

// This transformation is legal:
w + llvm.arith.fence(x + y) + z   →   w + z + llvm.arith.fence(x + y)

The operand of a llvm.arith.fence can participate in fast math optimizations. For example:

// This transformation is legal:
llvm.arith.fence((x+y)+z)   →   llvm.arith.fence(x+(y+z))

NOTE: We want fast-math optimization within the fence, but not across the fence.
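The "within but not across" rule can be sketched with a toy expression tree. This is an assumption-laden model, not LLVM's reassociation pass: Var, Add, Fence, and reassociate are hypothetical names, and the "optimization" is just sorting the terms of a sum.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Add:
    lhs: "Expr"
    rhs: "Expr"


@dataclass(frozen=True)
class Fence:
    operand: "Expr"


Expr = Union[Var, Add, Fence]


def flatten_add(e):
    """Collect the terms of a chain of additions, stopping at fences."""
    if isinstance(e, Add):
        return flatten_add(e.lhs) + flatten_add(e.rhs)
    if isinstance(e, Fence):
        # Reassociate inside the fence, but keep the fence as one opaque term.
        return [Fence(reassociate(e.operand))]
    return [e]


def sort_key(e):
    # Variables sort by name; fenced terms sort after all variables.
    return e.name if isinstance(e, Var) else "~"


def reassociate(e):
    """Toy fast-math pass: rebuild a sum with its terms in sorted order."""
    terms = sorted(flatten_add(e), key=sort_key)
    result = terms[0]
    for term in terms[1:]:
        result = Add(result, term)
    return result


# w + fence(y + x) + z: the fence moves as a unit, and its operand is
# reordered only internally.
fenced = reassociate(Add(Add(Var("w"), Fence(Add(Var("y"), Var("x")))), Var("z")))

# Without the fence, x and y are free to mix with w and z.
unfenced = reassociate(Add(Add(Var("w"), Add(Var("y"), Var("x"))), Var("z")))

print(fenced)
print(unfenced)
```

Treating the fence as an opaque leaf in the outer sum, while still recursing into its operand, is the design choice that mirrors the NOTE above.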

MIR Optimization:
- The use of a pseudo-operation in the MIR serves the same purpose as the intrinsic in the IR, since all the optimizations are based on pattern matching of known DAGs/MIs.
- The backend simply respects the llvm.arith.fence intrinsic, builds an llvm.arith.fence node during DAG/ISel, and emits a pseudo arithmetic_fence MI after it.
- The pseudo arithmetic_fence MI turns into a comment when emitting assembly.

Other llvm changes needed -- utility functions

The ValueTracking utilities will need to be taught to handle the new intrinsic. For example, there are utility functions like isKnownNeverNaN() and CannotBeOrderedLessThanZero() that will need to “look through” the intrinsic.
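What "looking through" the intrinsic means can be sketched on a toy value model. The classes and the function below are hypothetical stand-ins written for this illustration; they borrow the name of isKnownNeverNaN() but do not reproduce LLVM's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Unknown:
    name: str


@dataclass(frozen=True)
class Fence:
    operand: object


def is_known_never_nan(v) -> bool:
    """Toy analysis: answer for a fence is the answer for its operand."""
    if isinstance(v, Fence):
        # The fence returns its operand unchanged, so look through it.
        return is_known_never_nan(v.operand)
    if isinstance(v, Const):
        return v.value == v.value  # NaN is the only value unequal to itself
    return False  # nothing is known about other values


print(is_known_never_nan(Fence(Const(1.0))))
print(is_known_never_nan(Fence(Unknown("x"))))
```

Looking through is safe for value queries like this one because the fence restricts only how the expression may be regrouped, not the value it returns.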

A simple example

// llvm IR, llvm.arith.fence over addition.
 %4 = load double, double* %A, align 8
 %5 = load double, double* %B, align 8
 %add1 = fadd fast double %4, %5
 %6 = call double @llvm.arith.fence.f64(double %add1)
 %7 = load double, double* %C, align 8
 %mul = fmul fast double %6, %7
 store double %mul, double* %A, align 8

Example, llvm.arith.fence over memory operand

Consider this similar example, which illustrates how ‘x’ can be optimized while ‘z’ is fenced. An unfenced expression such as q = a + b - a would be simplified to q = b, but ‘z’ isn’t simplified because of the fence.

// llvm IR
  define dso_local float @f(float %a, float %b) local_unnamed_addr #0 {
    %x = fadd fast float %b, %a
    %tmp = call fast float @llvm.arith.fence.f32(float %x)
    %z = fsub fast float %tmp, %a
    %result = call fast float @llvm.maxnum.f32(float %z, float %b)
    ret float %result
  }
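The difference between the fenced and the simplified computation can be checked numerically. This is a sketch under stated assumptions: it uses Python doubles rather than the 32-bit floats of the IR, the operand values are chosen for the example, and Python's max stands in for llvm.maxnum (no NaNs are involved).

```python
# Illustrative values, not taken from the patch.
a, b = 1e16, 3.0

x = b + a           # doubles near 1e16 are spaced 2 apart, so this rounds
z_fenced = x - a    # value computed when (b + a) is kept as written
z_folded = b        # value after the algebraic rewrite (b + a) - a -> b

result_fenced = max(z_fenced, b)
result_folded = max(z_folded, b)

print(z_fenced, z_folded, result_fenced, result_folded)
```

The rewrite is exact in real arithmetic, yet it changes the returned value here, which is the behavior the fence prevents.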

Clang changes to take advantage of this intrinsic

Add new clang builtin __arithmetic_fence
- Add builtin definition
  - There is one operand. Any kind of expression, including memory operand.
  - The return type is the same as the operand type. The result of the intrinsic is the value of its rvalue operand.
  - The operand type can be any scalar floating point type, complex, or vector with float or complex element type.
  - The invocation of __arithmetic_fence is not a C/C++ constant expression, even if the operands are constant.

- Add semantic checks and test cases
- Modify clang/codegen to generate the llvm.arith.fence intrinsic
Add support for a new command-line option -fprotect-parens, which honors parentheses within a floating point expression. The default is -fno-protect-parens. For example,

// Compile with -ffast-math
double A,B,C;
A = __arithmetic_fence(A+B)*C;

// llvm IR
 %4 = load double, double* %A, align 8
 %5 = load double, double* %B, align 8
 %add1 = fadd fast double %4, %5
 %6 = call double @llvm.arith.fence.f64(double %add1)
 %7 = load double, double* %C, align 8
 %mul = fmul fast double %6, %7
 store double %mul, double* %A, align 8

Motivation: the new clang builtin provides clang compatibility with the Intel C++ compiler builtin __fence which has similar semantics, and likewise enables implementation of the option -fprotect-parens. The new builtin provides the clang programmer control over floating point optimizations at the expression level.

Pros & Cons

1. Pros
Increases expressiveness and precise control over floating point calculations.
Provides a desirable compatibility feature from industrial compilers.
2. Cons
Intrinsic bloat.
Some of LLVM's optimizations need to understand the llvm.arith.fence semantics in order to retain optimization capabilities. This will require at least some engineering effort.
Any target that wants to support this has to make modifications to their back-end.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mibintc created this revision. Mar 31 2021, 11:26 AM

Herald added subscribers: dexonsmith, jfb, hiraditya. Mar 31 2021, 11:26 AM

mibintc requested review of this revision. Mar 31 2021, 11:26 AM

Herald added a project: Restricted Project. Mar 31 2021, 11:26 AM

Herald added a subscriber: jdoerfert.

Harbormaster completed remote builds in B96567: Diff 334487. Mar 31 2021, 12:21 PM

riccibruno added a subscriber: riccibruno. Mar 31 2021, 1:16 PM

mibintc edited the summary of this revision. Mar 31 2021, 2:26 PM

mibintc edited the summary of this revision. Mar 31 2021, 2:29 PM

mibintc edited the summary of this revision. Mar 31 2021, 2:32 PM

dexonsmith removed a subscriber: dexonsmith. Mar 31 2021, 4:04 PM

spatel added a subscriber: spatel. Apr 6 2021, 8:33 AM

The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:

llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))

Does this actually block later transforms from unpacking the FMA? Maybe if the FMA isn't marked "fast"...

How is llvm.arith.fence() different from using "freeze" on a floating-point value? The goal isn't really the same, sure, but the effects seem similar at first glance.

In D99675#2671924, @efriedma wrote:

The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:

llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))

Does this actually block later transforms from unpacking the FMA? Maybe if the FMA isn't marked "fast"...

I think we could define llvm.arith.fence to be such that this FMA contraction isn't legal/correct, or it could be left as is. In the implementation that was used for the Intel compiler, FMA contraction did not occur across an __fence boundary. It is unclear whether that was intended as the semantic, or if we just never bothered to implement that contraction.
Not allowing the FMA contraction across the llvm.arith.fence would make unpacking an FMA allowed under the same circumstances that LLVM currently allows that.

How is llvm.arith.fence() different from using "freeze" on a floating-point value? The goal isn't really the same, sure, but the effects seem similar at first glance.

They are similar. However, freeze is a no-op if the operand can be proven not to be undef or poison, and in such circumstances could be removed by an optimizer. llvm.arith.fence cannot be removed by an optimizer, because doing so might allow instructions that were "outside" the fence to be reassociated/distributed with the instructions/operands that were inside the fence.

In D99675#2671924, @efriedma wrote:

The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:

llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))

Does this actually block later transforms from unpacking the FMA? Maybe if the FMA isn't marked "fast"...

I'd like @pengfei to reply to this question. I think the overall idea is that many of the optimizations are pattern based, and the existing pattern wouldn't match the new intrinsic.

How is llvm.arith.fence() different from using "freeze" on a floating-point value? The goal isn't really the same, sure, but the effects seem similar at first glance.

Initially we thought the intrinsic "ssa.copy" could serve. However ssa.copy is for a different purpose and it gets optimized away. We want arith.fence to survive through codegen, that's one reason why we think a new intrinsic is needed.

In D99675#2671924, @efriedma wrote:

The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:

llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))

Does this actually block later transforms from unpacking the FMA? Maybe if the FMA isn't marked "fast"...

Later transforms could unpack the FMA, but the result would be fenced. The intent isn't so much to prevent the FMA from being unpacked as to prevent losing the original fence semantics. That said, it doesn't quite work. For example, you might have this:

%mul = fmul fast float %a, %b
%fenced_mul = call float @llvm.arith.fence.f32(float %mul)
%result = fadd fast float %fenced_mul, %c

If there are no other uses of %fenced_mul, that could become

%tmp = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
%result = call float @llvm.arith.fence.f32(float %tmp)

If a later optimization decided to unpack this, it would become this:

%mul = fmul fast float %a, %b
%tmp = fadd fast float %mul, %c
%result = call float @llvm.arith.fence.f32(float %tmp)

I suggested this as a way of enabling the FMA optimization. It brings the fadd into the fence, but still protects the fmul from being reassociated or otherwise transformed with other operations outside the fence. In a purely practical sense, this would probably work. In a more strict sense, though, I now see that it has the problem that you could legally distribute the addition within the fence. I can't see a practical reason anyone would do that, but the semantics would allow it. The same ("legal but not practical") is true of forming the fmuladd intrinsic before codegen, I think.

So, no, I don't think this works the way it was intended.

That might push us back to Kevin's suggestion of just not allowing the FMA optimization across a fence.

JonChesterfield added a subscriber: JonChesterfield. Apr 8 2021, 4:27 AM

mibintc mentioned this in D100118: [clang] Add support for new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens. Apr 8 2021, 8:45 AM

mibintc added a child revision: D100118: [clang] Add support for new builtin __arithmetic_fence to control floating point optimization, and new clang option fprotect-parens. Apr 8 2021, 8:47 AM

In D99675#2672138, @kbsmith1 wrote:

In D99675#2671924, @efriedma wrote:

How is llvm.arith.fence() different from using "freeze" on a floating-point value? The goal isn't really the same, sure, but the effects seem similar at first glance.

They are similar. However, freeze is a no-op if the operand can be proven not to be undef or poison, and in such circumstances could be removed by an optimizer. llvm.arith.fence cannot be removed by an optimizer, because doing so might allow instructions that were "outside" the fence to be reassociated/distributed with the instructions/operands that were inside the fence.

Okay. In practice, it's basically impossible for us to prove that the result of "fast" arithmetic isn't poison, given the way ninf/nnan are defined, but depending on that would be fragile.

cameron.mcinally added a subscriber: cameron.mcinally. Apr 12 2021, 2:53 PM

rscottmanley added a subscriber: rscottmanley. Apr 12 2021, 3:38 PM

This is a minor update from @pengfei which allows simple test cases to run end-to-end with clang.
I also changed the "summary" to reflect the review discussion around the FMA optimization, choosing "FMA is not allowed across a fence".

Harbormaster completed remote builds in B99005: Diff 337879. Apr 15 2021, 2:03 PM

I accidentally dropped the test case in the previous commit. Just adding it back in, under the llvm/test directory (previously it was in the wrong location).

Harbormaster completed remote builds in B99165: Diff 338099. Apr 16 2021, 7:58 AM

What changes are needed for a backend, and what happens if they aren't done?

In D99675#2695424, @kpn wrote:

What changes are needed for a backend, and what happens if they aren't done?

In the clang patch, I'm planning to add into TargetInfo a function like "does the target support __arithmetic_fence"?
In the llvm patch, the fallback implementation could be to merely ignore the call, and pass through the operand value. Is that adequate?

In D99675#2695480, @mibintc wrote:

In D99675#2695424, @kpn wrote:

What changes are needed for a backend, and what happens if they aren't done?

In the clang patch, I'm planning to add into TargetInfo a function like "does the target support __arithmetic_fence"?
In the llvm patch, the fallback implementation could be to merely ignore the call, and pass through the operand value. Is that adequate?

If clang is the only compiler to ever emit this new intrinsic then, yes, that's perfectly fine.

If a front-end other than clang uses the new fence then I'm nervous about having the fence just vanish. If the fence is used then it must be for correctness, right? Having something needed for correctness silently not work seems ... sub-optimal. It's the sort of thing that might not get caught in testing, and then you've got end-users running software that silently lacks something needed for correctness. That makes me nervous. I'd rather LLVM bomb instead of silently dropping this fence. Then developers know they have a problem before a product goes out the door.

But if I'm the only one that's nervous then that's OK and clang rejecting the compile would be sufficient.

Has this sort of issue come up in the past? How was it handled?

In D99675#2695424, @kpn wrote:

What changes are needed for a backend, and what happens if they aren't done?

As far as I understand it, the backend does optimizations based on patterns of known nodes and MIs. Inserting a new node/MI will block any optimizations across the fence. So it respects the semantics of the intrinsic without target-specific changes.
I'm not sure if there's room for optimization across the arithmetic.fence. If there is, and no changes are made for it, the backend may lose some performance under these circumstances.

Having something needed for correctness silently not work seems ... sub-optimal.

I think the backend is conservative about optimizations when the intrinsic is used. It won't silently have correctness issues, but it might lose performance.

In D99675#2696327, @pengfei wrote:

In D99675#2695424, @kpn wrote:

What changes are needed for a backend, and what happens if they aren't done?

As far as I understand it, the backend does optimizations based on patterns of known nodes and MIs. Inserting a new node/MI will block any optimizations across the fence. So it respects the semantics of the intrinsic without target-specific changes.
I'm not sure if there's room for optimization across the arithmetic.fence. If there is, and no changes are made for it, the backend may lose some performance under these circumstances.

Having something needed for correctness silently not work seems ... sub-optimal.

I think the backend is conservative about optimizations when the intrinsic is used. It won't silently have correctness issues, but it might lose performance.

OK, that sounds fine, then.

tschuett added a subscriber: tschuett. May 18 2021, 8:11 AM

Matt added a subscriber: Matt. May 19 2021, 11:13 AM

sidorovd added a subscriber: sidorovd. May 20 2021, 4:00 AM

Rebased to ToT. It fixes the previous illegal type lowering problems. It also updates the tests to show the functionality in a better way as well as fixes a newly found problem.

Ready for your code review and +1

We think this patch provides basic functionality for the intrinsic, and enhancements can be added in future patches.

Thanks!

Harbormaster completed remote builds in B106347: Diff 348046. May 26 2021, 12:34 PM

pengfei added reviewers: kpn, cameron.mcinally, uweigand. May 26 2021, 6:36 PM

pengfei added reviewers: LuoYuanke, LiuChen3. Jun 1 2021, 5:38 PM

pengfei added a reviewer: craig.topper. Jun 3 2021, 12:18 AM

We may add a description of the intrinsic in docs/LangRef.rst.

craig.topper added inline comments. Jun 3 2021, 10:15 AM

llvm/include/llvm/IR/IRBuilder.h
911	Do you really need curly braces around DstType and Val? A single value should be implicitly convertible to ArrayRef.
llvm/include/llvm/IR/Intrinsics.td
1341	This comment got duplicated.
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1336	I think you should check isVerbose() before printing this.
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
3151	What about splitting a vector like v8f32 on SSE2?
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6296	There's already a variable called sdl that contains this. It's used in the surrounding cases.
6299	Why isn't this just Val.getValueType()?

This patch addresses all of @craig.topper's comments and adds documentation for the new intrinsic to the language reference, as requested by @LuoYuanke.

Harbormaster completed remote builds in B107999: Diff 350307. Jun 7 2021, 9:14 AM

pengfei added inline comments. Jun 7 2021, 10:28 PM

llvm/docs/LangRef.rst
21457	Should be equal to the text?

I corrected the error in the LangRef documentation that @pengfei pointed out.

Harbormaster completed remote builds in B108275: Diff 350700. Jun 8 2021, 1:34 PM

pengfei added inline comments. Jun 9 2021, 7:34 AM

llvm/docs/LangRef.rst
21457	Yeah, a good catch. But I initially meant `^^^` should be equal to the title. :)

Correct a small formatting issue in LangRef.rst, thanks @pengfei.

Harbormaster completed remote builds in B108427: Diff 350912. Jun 9 2021, 9:40 AM

pengfei mentioned this in D104247: [DAGCombine] reassoc flag shouldn't enable contract. Jun 18 2021, 7:13 PM

LGTM, but please wait 1 or 2 days to see if there are any more comments.

This revision is now accepted and ready to land. Jun 23 2021, 5:10 PM

Rebasing. Hope this clears lit fails.

Harbormaster completed remote builds in B110860: Diff 354300. Jun 24 2021, 11:21 AM

This revision was landed with ongoing or failed builds. Jun 28 2021, 9:27 AM

Closed by commit rG931e95687d6d: [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP… (authored by mibintc).

This revision was automatically updated to reflect the committed changes.

mibintc added a commit: rG931e95687d6d: [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP….

skan mentioned this in D112127: [Codegen] Set ARITH_FENCE as meta-instruction. Oct 20 2021, 2:19 AM

skan mentioned this in rGedff0070a126: [Codegen] Set ARITH_FENCE as meta-instruction. Oct 20 2021, 7:19 PM

aaron.ballman mentioned this in D155430: [clang][Interp] Implement __arithmethic_fence for floating types. Jul 31 2023, 6:01 AM

Revision Contents

Path	Size
llvm/docs/LangRef.rst	36 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	1 line
llvm/include/llvm/CodeGen/BasicTTIImpl.h	1 line
llvm/include/llvm/CodeGen/ISDOpcodes.h	4 lines
llvm/include/llvm/CodeGen/SelectionDAGISel.h	1 line
llvm/include/llvm/IR/IRBuilder.h	7 lines
llvm/include/llvm/IR/Intrinsics.td	3 lines
llvm/include/llvm/Support/TargetOpcodes.def	3 lines
llvm/include/llvm/Target/Target.td	7 lines
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp	4 lines
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	3 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	6 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp	8 lines
llvm/test/CodeGen/X86/arithmetic_fence.ll	161 lines
llvm/test/CodeGen/X86/arithmetic_fence2.ll	170 lines

Diff 354931

llvm/docs/LangRef.rst


 - If the given pointer is not associated with the given type metadata

 2. If the function has a non-void return type, a pointer to a function that
 returns an unspecified value without causing side effects.

 If the function's return value's second element is false, the value of the
 first element is undefined.


+'``llvm.arithmetic.fence``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+declare <type>
+@llvm.arithmetic.fence(<type> <op>)
+
+Overview:
+"""""""""
+
+The purpose of the ``llvm.arithmetic.fence`` intrinsic
+is to prevent the optimizer from performaing fast-math optimizations,
+particularly reassociation,
+between the argument and the expression that contains the argument.
+It can be used to preserve the parentheses in the source language.
+
+Arguments:
+""""""""""
+
+The ``llvm.arithmetic.fence`` intrinsic takes only one argument.
+The argument and the return value are floating-point numbers,
+or vector floating-point numbers, of the same type.
+
+Semantics:
+""""""""""
+
+This intrinsic returns the value of its operand. The optimizer can optimize
+the argument, but the optimizer cannot hoist any component of the operand
+to the containing context, and the optimizer cannot move the calculation of
+any expression in the containing context into the operand.
+
+
 '``llvm.donothing``' Intrinsic
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Syntax:
 """""""

 ::

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

   InstructionCost getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
                                         TTI::TargetCostKind CostKind) const {
     switch (ICA.getID()) {
     default:
       break;
     case Intrinsic::annotation:
     case Intrinsic::assume:
     case Intrinsic::sideeffect:
     case Intrinsic::pseudoprobe:
+    case Intrinsic::arithmetic_fence:
     case Intrinsic::dbg_declare:
     case Intrinsic::dbg_value:
     case Intrinsic::dbg_label:
     case Intrinsic::invariant_start:
     case Intrinsic::invariant_end:
     case Intrinsic::launder_invariant_group:
     case Intrinsic::strip_invariant_group:
     case Intrinsic::is_constant:

llvm/include/llvm/CodeGen/BasicTTIImpl.h

 getTypeBasedIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,

     case Intrinsic::experimental_constrained_fmuladd:
       ISDs.push_back(ISD::STRICT_FMA);
       break;
     // FIXME: We should return 0 whenever getIntrinsicCost == TCC_Free.
     case Intrinsic::lifetime_start:
     case Intrinsic::lifetime_end:
     case Intrinsic::sideeffect:
     case Intrinsic::pseudoprobe:
+    case Intrinsic::arithmetic_fence:
       return 0;
     case Intrinsic::masked_store: {
       Type *Ty = Tys[0];
       Align TyAlign = thisT()->DL.getABITypeAlign(Ty);
       return thisT()->getMaskedMemoryOpCost(Instruction::Store, Ty, TyAlign, 0,
                                             CostKind);
     }
     case Intrinsic::masked_load: {

llvm/include/llvm/CodeGen/ISDOpcodes.h

   UBSANTRAP,

   /// PREFETCH - This corresponds to a prefetch intrinsic. The first operand
   /// is the chain. The other operands are the address to prefetch,
   /// read / write specifier, locality specifier and instruction / data cache
   /// specifier.
   PREFETCH,

+  /// ARITH_FENCE - This corresponds to a arithmetic fence intrinsic. Both its
+  /// operand and output are the same floating type.
+  ARITH_FENCE,
+
   /// OUTCHAIN = ATOMIC_FENCE(INCHAIN, ordering, scope)
   /// This corresponds to the fence instruction. It takes an input chain, and
   /// two integer constants: an AtomicOrdering and a SynchronizationScope.
   ATOMIC_FENCE,

   /// Val, OUTCHAIN = ATOMIC_LOAD(INCHAIN, ptr)
   /// This corresponds to "load atomic" instruction.
   ATOMIC_LOAD,

llvm/include/llvm/CodeGen/SelectionDAGISel.h

 private:
   // Calls to these functions are generated by tblgen.
   void Select_INLINEASM(SDNode *N);
   void Select_READ_REGISTER(SDNode *Op);
   void Select_WRITE_REGISTER(SDNode *Op);
   void Select_UNDEF(SDNode *N);
   void CannotYetSelect(SDNode *N);

   void Select_FREEZE(SDNode *N);
+  void Select_ARITH_FENCE(SDNode *N);

 private:
   void DoInstructionSelection();
   SDNode *MorphNode(SDNode *Node, unsigned TargetOpc, SDVTList VTList,
                     ArrayRef<SDValue> Ops, unsigned EmitNodeInfo);

   /// Prepares the landing pad to take incoming values or do other EH
   /// personality specific tasks. Returns true if the block should be

llvm/include/llvm/IR/IRBuilder.h

   CallInst *CreateMinimum(Value *LHS, Value *RHS, const Twine &Name = "") {
     return CreateBinaryIntrinsic(Intrinsic::minimum, LHS, RHS, nullptr, Name);
   }

   /// Create call to the maximum intrinsic.
   CallInst *CreateMaximum(Value *LHS, Value *RHS, const Twine &Name = "") {
     return CreateBinaryIntrinsic(Intrinsic::maximum, LHS, RHS, nullptr, Name);
   }

+  /// Create a call to the arithmetic_fence intrinsic.
+  CallInst *CreateArithmeticFence(Value *Val, Type *DstType,
+                                  const Twine &Name = "") {
+    return CreateIntrinsic(Intrinsic::arithmetic_fence, DstType, Val, nullptr,
+                           Name);
+  }
+
   /// Create a call to the experimental.vector.extract intrinsic.
   CallInst *CreateExtractVector(Type *DstType, Value *SrcVec, Value *Idx,
                                 const Twine &Name = "") {
     return CreateIntrinsic(Intrinsic::experimental_vector_extract,
                            {DstType, SrcVec->getType()}, {SrcVec, Idx}, nullptr,
                            Name);
   }

llvm/include/llvm/IR/Intrinsics.td

// The pseudoprobe intrinsic works as a place holder to the block it probes.
// Like the sideeffect intrinsic defined above, this intrinsic is treated by the
// optimizer as having opaque side effects so that it won't be get rid of or moved
// out of the block it probes.
def int_pseudoprobe : Intrinsic<[], [llvm_i64_ty, llvm_i64_ty, llvm_i32_ty, llvm_i64_ty],
[IntrInaccessibleMemOnly, IntrWillReturn]>;

		// Arithmetic fence intrinsic.
		def int_arithmetic_fence : Intrinsic<[llvm_anyfloat_ty], [LLVMMatchType<0>], [IntrNoMem]>;

// Intrinsics to support half precision floating point format
		craig.topper (inline comment): This comment got duplicated.
let IntrProperties = [IntrNoMem, IntrWillReturn] in {
def int_convert_to_fp16 : DefaultAttrsIntrinsic<[llvm_i16_ty], [llvm_anyfloat_ty]>;
def int_convert_from_fp16 : DefaultAttrsIntrinsic<[llvm_anyfloat_ty], [llvm_i16_ty]>;
}

// Saturating floating point to integer intrinsics
let IntrProperties = [IntrNoMem, IntrSpeculatable, IntrWillReturn] in {
def int_fptoui_sat : DefaultAttrsIntrinsic<[llvm_anyint_ty], [llvm_anyfloat_ty]>;

llvm/include/llvm/Support/TargetOpcodes.def

/// Lifetime markers.
HANDLE_TARGET_OPCODE(LIFETIME_START)
HANDLE_TARGET_OPCODE(LIFETIME_END)

/// Pseudo probe
HANDLE_TARGET_OPCODE(PSEUDO_PROBE)

		/// Arithmetic fence.
		HANDLE_TARGET_OPCODE(ARITH_FENCE)

/// A Stackmap instruction captures the location of live variables at its
/// position in the instruction stream. It is followed by a shadow of bytes
/// that must lie within the function and not contain another stackmap.
HANDLE_TARGET_OPCODE(STACKMAP)

/// FEntry all - This is a marker instruction which gets translated into a raw fentry call.
HANDLE_TARGET_OPCODE(FENTRY_CALL)

llvm/include/llvm/Target/Target.td

def LIFETIME_END : StandardPseudoInstruction {
let hasSideEffects = false;
}
def PSEUDO_PROBE : StandardPseudoInstruction {
let OutOperandList = (outs);
let InOperandList = (ins i64imm:$guid, i64imm:$index, i8imm:$type, i32imm:$attr);
let AsmString = "PSEUDO_PROBE";
let hasSideEffects = 1;
}
		def ARITH_FENCE : StandardPseudoInstruction {
		let OutOperandList = (outs unknown:$dst);
		let InOperandList = (ins unknown:$src);
		let AsmString = "";
		let hasSideEffects = false;
		let Constraints = "$src = $dst";
		}

def STACKMAP : StandardPseudoInstruction {
let OutOperandList = (outs);
let InOperandList = (ins i64imm:$id, i32imm:$nbytes, variable_ops);
let hasSideEffects = true;
let isCall = true;
let mayLoad = true;
let usesCustomInserter = true;

llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp

for (auto &MI : MBB) {
if (isVerbose()) emitImplicitDef(&MI);
break;
case TargetOpcode::KILL:
if (isVerbose()) emitKill(&MI, *this);
break;
case TargetOpcode::PSEUDO_PROBE:
emitPseudoProbe(MI);
break;
		case TargetOpcode::ARITH_FENCE:
		if (isVerbose())
		craig.topper (inline comment): I think you should check isVerbose() before printing this.
		OutStreamer->emitRawComment("ARITH_FENCE");
		break;
default:
emitInstruction(&MI);
if (CanDoExtraAnalysis) {
MCInst MCI;
MCI.setOpcode(MI.getOpcode());
auto Name = OutStreamer->getMnemonic(MCI);
auto I = MnemonicCounts.insert({Name, 0u});
I.first->second++;

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

#endif
case ISD::FEXP2:
case ISD::FFLOOR:
case ISD::FLOG:
case ISD::FLOG10:
case ISD::FLOG2:
case ISD::FNEARBYINT:
case ISD::FNEG:
case ISD::FREEZE:
		case ISD::ARITH_FENCE:
case ISD::FP_EXTEND:
case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT:
case ISD::FRINT:
case ISD::FROUND:
case ISD::FROUNDEVEN:
case ISD::FSIN:
case ISD::FSQRT:

#endif
case ISD::FEXP2:
case ISD::FFLOOR:
case ISD::FLOG:
case ISD::FLOG10:
case ISD::FLOG2:
case ISD::FNEARBYINT:
case ISD::FNEG:
case ISD::FREEZE:
		case ISD::ARITH_FENCE:
case ISD::FP_EXTEND:
case ISD::FP_ROUND:
case ISD::FP_TO_SINT:
case ISD::FP_TO_UINT:
case ISD::FRINT:
case ISD::FROUND:
case ISD::FROUNDEVEN:
case ISD::FSIN:

#include "llvm/IR/ConstrainedOps.def"
case ISD::BSWAP:
case ISD::CTLZ:
case ISD::CTLZ_ZERO_UNDEF:
case ISD::CTPOP:
case ISD::CTTZ:
case ISD::CTTZ_ZERO_UNDEF:
case ISD::FNEG:
case ISD::FREEZE:
		case ISD::ARITH_FENCE:
		craig.topper (inline comment): What about splitting a vector like v8f32 on SSE2?
case ISD::FCANONICALIZE:
Res = WidenVecRes_Unary(N);
break;
case ISD::FMA:
case ISD::FSHL:
case ISD::FSHR:
Res = WidenVecRes_Ternary(N);
break;

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

setValue(&I, DAG.getNode(ISD::FMAXIMUM, sdl,
getValue(I.getArgOperand(1)), Flags));
return;
case Intrinsic::copysign:
setValue(&I, DAG.getNode(ISD::FCOPYSIGN, sdl,
getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)), Flags));
return;
		case Intrinsic::arithmetic_fence: {
		setValue(&I, DAG.getNode(ISD::ARITH_FENCE, sdl,
		craig.topper (inline comment): There's already a variable called sdl that contains this. It's used in the surrounding cases.
		getValue(I.getArgOperand(0)).getValueType(),
		getValue(I.getArgOperand(0)), Flags));
		return;
		craig.topper (inline comment): Why isn't this just Val.getValueType()?
		}
case Intrinsic::fma:
setValue(&I, DAG.getNode(
ISD::FMA, sdl, getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)), getValue(I.getArgOperand(1)),
getValue(I.getArgOperand(2)), Flags));
return;
#define INSTRUCTION(NAME, NARG, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:

llvm/lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

void SelectionDAGISel::Select_FREEZE(SDNode *N) {
// TODO: We don't have FREEZE pseudo-instruction in MachineInstr-level now.
// If FREEZE instruction is added later, the code below must be changed as
// well.
CurDAG->SelectNodeTo(N, TargetOpcode::COPY, N->getValueType(0),
N->getOperand(0));
}

		void SelectionDAGISel::Select_ARITH_FENCE(SDNode *N) {
		CurDAG->SelectNodeTo(N, TargetOpcode::ARITH_FENCE, N->getValueType(0),
		N->getOperand(0));
		}

/// GetVBR - decode a vbr encoding whose top bit is set.
LLVM_ATTRIBUTE_ALWAYS_INLINE static uint64_t
GetVBR(uint64_t Val, const unsigned char *MatcherTable, unsigned &Idx) {
assert(Val >= 128 && "Not a VBR");
Val &= 127; // Remove first vbr bit.

unsigned Shift = 7;
uint64_t NextBits;

case ISD::WRITE_REGISTER:
Select_WRITE_REGISTER(NodeToMatch);
return;
case ISD::UNDEF:
Select_UNDEF(NodeToMatch);
return;
case ISD::FREEZE:
Select_FREEZE(NodeToMatch);
return;
		case ISD::ARITH_FENCE:
		Select_ARITH_FENCE(NodeToMatch);
		return;
}

assert(!NodeToMatch->isMachineOpcode() && "Node already selected!");

// Set up the node stack with NodeToMatch as the only node on the stack.
SmallVector<SDValue, 8> NodeStack;
SDValue N = SDValue(NodeToMatch, 0);
NodeStack.push_back(N);

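One way to picture why a distinct fence node matters: a pattern-matching fold only fires when it sees the operand shapes it expects, so a fence node sitting between two operations hides the pattern. The toy expression tree below is purely illustrative. `Node`, `foldQuadruple`, and every other name in it are hypothetical; none of them are LLVM data structures, and this is not the DAG combiner's actual logic. It mimics the `(a + a) + (a + a)` case exercised by the tests in this revision.

```cpp
#include <memory>
#include <string>
#include <utility>

// Toy expression tree. All names are hypothetical, not LLVM types.
struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node {
    enum class Kind { Var, Add, Mul4, Fence };
    Kind kind;
    std::string name;  // used by Var only
    NodePtr lhs, rhs;  // Add uses both; Mul4 and Fence use lhs only
};

NodePtr var(std::string n) {
    return std::make_shared<Node>(Node{Node::Kind::Var, std::move(n), nullptr, nullptr});
}
NodePtr add(NodePtr a, NodePtr b) {
    return std::make_shared<Node>(Node{Node::Kind::Add, "", std::move(a), std::move(b)});
}
NodePtr fence(NodePtr a) {
    return std::make_shared<Node>(Node{Node::Kind::Fence, "", std::move(a), nullptr});
}

// True when n has the shape x + x for the variable named x.
bool isDouble(const NodePtr &n, const std::string &x) {
    return n->kind == Node::Kind::Add &&
           n->lhs->kind == Node::Kind::Var && n->lhs->name == x &&
           n->rhs->kind == Node::Kind::Var && n->rhs->name == x;
}

// Fold (x + x) + (x + x) into 4 * x. A Fence operand is opaque to the
// pattern match, so the fold does not fire through it.
NodePtr foldQuadruple(const NodePtr &n) {
    if (n->kind != Node::Kind::Add)
        return n;
    const NodePtr &l = n->lhs;
    const NodePtr &r = n->rhs;
    if (l->kind == Node::Kind::Add && l->lhs->kind == Node::Kind::Var &&
        isDouble(l, l->lhs->name) && isDouble(r, l->lhs->name))
        return std::make_shared<Node>(Node{Node::Kind::Mul4, "", l->lhs, nullptr});
    return n;
}

// Render a tree as text so the effect of the fold is visible.
std::string print(const NodePtr &n) {
    switch (n->kind) {
    case Node::Kind::Var:   return n->name;
    case Node::Kind::Add:   return "(" + print(n->lhs) + " + " + print(n->rhs) + ")";
    case Node::Kind::Mul4:  return "(4 * " + print(n->lhs) + ")";
    case Node::Kind::Fence: return "fence(" + print(n->lhs) + ")";
    }
    return "";
}
```

Folding `add(add(a, a), add(a, a))` yields the multiply, while `add(fence(add(a, a)), add(a, a))` comes back unchanged because the left operand is no longer an `Add` from the matcher's point of view.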
llvm/test/CodeGen/X86/arithmetic_fence.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+fma | FileCheck %s --check-prefix=X86
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+fma | FileCheck %s --check-prefix=X64

				define float @f1(float %a, float %b, float %c) {
				; X86-LABEL: f1:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero
				; X86-NEXT: vfmadd213ss {{.*#+}} xmm1 = (xmm0 * xmm1) + mem
				; X86-NEXT: vmovss %xmm1, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f1:
				; X64: # %bb.0:
				; X64-NEXT: vfmadd213ss {{.*#+}} xmm0 = (xmm1 * xmm0) + xmm2
				; X64-NEXT: retq
				%mul = fmul fast float %b, %a
				%add = fadd fast float %mul, %c
				ret float %add
				}

				define float @f2(float %a, float %b, float %c) {
				; X86-LABEL: f2:
				; X86: # %bb.0:
				; X86-NEXT: pushl %eax
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero
				; X86-NEXT: vmulss {{[0-9]+}}(%esp), %xmm0, %xmm0
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: vaddss {{[0-9]+}}(%esp), %xmm0, %xmm0
				; X86-NEXT: vmovss %xmm0, (%esp)
				; X86-NEXT: flds (%esp)
				; X86-NEXT: popl %eax
				; X86-NEXT: .cfi_def_cfa_offset 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f2:
				; X64: # %bb.0:
				; X64-NEXT: vmulss %xmm0, %xmm1, %xmm0
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: vaddss %xmm2, %xmm0, %xmm0
				; X64-NEXT: retq
				%mul = fmul fast float %b, %a
				%tmp = call float @llvm.arithmetic.fence.f32(float %mul)
				%add = fadd fast float %tmp, %c
				ret float %add
				}

				define double @f3(double %a) {
				; X86-LABEL: f3:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: vmovsd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f3:
				; X64: # %bb.0:
				; X64-NEXT: vmulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast double %a, %a
				%2 = fadd fast double %a, %a
				%3 = fadd fast double %1, %2
				ret double %3
				}

				define double @f4(double %a) {
				; X86-LABEL: f4:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: vaddsd %xmm0, %xmm0, %xmm0
				; X86-NEXT: vmovapd %xmm0, %xmm1
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: vaddsd %xmm0, %xmm1, %xmm0
				; X86-NEXT: vmovsd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f4:
				; X64: # %bb.0:
				; X64-NEXT: vaddsd %xmm0, %xmm0, %xmm0
				; X64-NEXT: vmovapd %xmm0, %xmm1
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: vaddsd %xmm0, %xmm1, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast double %a, %a
				%t = call double @llvm.arithmetic.fence.f64(double %1)
				%2 = fadd fast double %a, %a
				%3 = fadd fast double %t, %2
				ret double %3
				}

				define <2 x float> @f5(<2 x float> %a) {
				; X86-LABEL: f5:
				; X86: # %bb.0:
				; X86-NEXT: vmulps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0, %xmm0
				; X86-NEXT: retl
				;
				; X64-LABEL: f5:
				; X64: # %bb.0:
				; X64-NEXT: vmulps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast <2 x float> %a, %a
				%2 = fadd fast <2 x float> %a, %a
				%3 = fadd fast <2 x float> %1, %2
				ret <2 x float> %3
				}

				define <2 x float> @f6(<2 x float> %a) {
				; X86-LABEL: f6:
				; X86: # %bb.0:
				; X86-NEXT: vaddps %xmm0, %xmm0, %xmm0
				; X86-NEXT: vmovaps %xmm0, %xmm1
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: vaddps %xmm0, %xmm1, %xmm0
				; X86-NEXT: retl
				;
				; X64-LABEL: f6:
				; X64: # %bb.0:
				; X64-NEXT: vaddps %xmm0, %xmm0, %xmm0
				; X64-NEXT: vmovaps %xmm0, %xmm1
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: vaddps %xmm0, %xmm1, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast <2 x float> %a, %a
				%t = call <2 x float> @llvm.arithmetic.fence.v2f32(<2 x float> %1)
				%2 = fadd fast <2 x float> %a, %a
				%3 = fadd fast <2 x float> %t, %2
				ret <2 x float> %3
				}

				declare float @llvm.arithmetic.fence.f32(float)
				declare double @llvm.arithmetic.fence.f64(double)
				declare <2 x float> @llvm.arithmetic.fence.v2f32(<2 x float>)
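The f1/f2 pair above shows the fence keeping a multiply and an add apart: f1 is contracted into a single `vfmadd213ss`, while f2 emits `vmulss`, the `#ARITH_FENCE` marker, and `vaddss`. The difference is observable because a fused multiply-add rounds once and a separate multiply and add round twice. The sketch below illustrates that with plain C++ doubles (the test uses float). The function names are made up for this example, and the `volatile` temporary is only a crude stand-in for a fence, not how the intrinsic is implemented.

```cpp
#include <cmath>

// Multiply, round to double, then add: two roundings. The volatile
// temporary forces the product to be rounded and stored before the add,
// so the compiler cannot contract the pair into a fused operation.
double mulThenAdd(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c computed with a single rounding.
double fusedMulAdd(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 in double precision. Under IEEE 754 round-to-nearest, `mulThenAdd` therefore returns 0 while `fusedMulAdd` returns -2^-60.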

llvm/test/CodeGen/X86/arithmetic_fence2.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=i686-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X86
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 | FileCheck %s --check-prefix=X64

				define double @f1(double %a) {
				; X86-LABEL: f1:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
				; X86-NEXT: movsd %xmm0, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f1:
				; X64: # %bb.0:
				; X64-NEXT: mulsd {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; X64-NEXT: retq
				%1 = fadd fast double %a, %a
				%2 = fadd fast double %a, %a
				%3 = fadd fast double %1, %2
				ret double %3
				}

				define double @f2(double %a) {
				; X86-LABEL: f2:
				; X86: # %bb.0:
				; X86-NEXT: pushl %ebp
				; X86-NEXT: .cfi_def_cfa_offset 8
				; X86-NEXT: .cfi_offset %ebp, -8
				; X86-NEXT: movl %esp, %ebp
				; X86-NEXT: .cfi_def_cfa_register %ebp
				; X86-NEXT: andl $-8, %esp
				; X86-NEXT: subl $8, %esp
				; X86-NEXT: movsd {{.*#+}} xmm0 = mem[0],zero
				; X86-NEXT: addsd %xmm0, %xmm0
				; X86-NEXT: movapd %xmm0, %xmm1
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: addsd %xmm0, %xmm1
				; X86-NEXT: movsd %xmm1, (%esp)
				; X86-NEXT: fldl (%esp)
				; X86-NEXT: movl %ebp, %esp
				; X86-NEXT: popl %ebp
				; X86-NEXT: .cfi_def_cfa %esp, 4
				; X86-NEXT: retl
				;
				; X64-LABEL: f2:
				; X64: # %bb.0:
				; X64-NEXT: addsd %xmm0, %xmm0
				; X64-NEXT: movapd %xmm0, %xmm1
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: addsd %xmm0, %xmm1
				; X64-NEXT: movapd %xmm1, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast double %a, %a
				%t = call double @llvm.arithmetic.fence.f64(double %1)
				%2 = fadd fast double %a, %a
				%3 = fadd fast double %t, %2
				ret double %3
				}

				define <2 x float> @f3(<2 x float> %a) {
				; X86-LABEL: f3:
				; X86: # %bb.0:
				; X86-NEXT: mulps {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
				; X86-NEXT: retl
				;
				; X64-LABEL: f3:
				; X64: # %bb.0:
				; X64-NEXT: mulps {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
				; X64-NEXT: retq
				%1 = fadd fast <2 x float> %a, %a
				%2 = fadd fast <2 x float> %a, %a
				%3 = fadd fast <2 x float> %1, %2
				ret <2 x float> %3
				}

				define <2 x float> @f4(<2 x float> %a) {
				; X86-LABEL: f4:
				; X86: # %bb.0:
				; X86-NEXT: addps %xmm0, %xmm0
				; X86-NEXT: movaps %xmm0, %xmm1
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: addps %xmm0, %xmm1
				; X86-NEXT: movaps %xmm1, %xmm0
				; X86-NEXT: retl
				;
				; X64-LABEL: f4:
				; X64: # %bb.0:
				; X64-NEXT: addps %xmm0, %xmm0
				; X64-NEXT: movaps %xmm0, %xmm1
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: addps %xmm0, %xmm1
				; X64-NEXT: movaps %xmm1, %xmm0
				; X64-NEXT: retq
				%1 = fadd fast <2 x float> %a, %a
				%t = call <2 x float> @llvm.arithmetic.fence.v2f32(<2 x float> %1)
				%2 = fadd fast <2 x float> %a, %a
				%3 = fadd fast <2 x float> %t, %2
				ret <2 x float> %3
				}

				define <8 x float> @f5(<8 x float> %a) {
				; X86-LABEL: f5:
				; X86: # %bb.0:
				; X86-NEXT: movaps {{.*#+}} xmm2 = [4.0E+0,4.0E+0,4.0E+0,4.0E+0]
				; X86-NEXT: mulps %xmm2, %xmm0
				; X86-NEXT: mulps %xmm2, %xmm1
				; X86-NEXT: retl
				;
				; X64-LABEL: f5:
				; X64: # %bb.0:
				; X64-NEXT: movaps {{.*#+}} xmm2 = [4.0E+0,4.0E+0,4.0E+0,4.0E+0]
				; X64-NEXT: mulps %xmm2, %xmm0
				; X64-NEXT: mulps %xmm2, %xmm1
				; X64-NEXT: retq
				%1 = fadd fast <8 x float> %a, %a
				%2 = fadd fast <8 x float> %a, %a
				%3 = fadd fast <8 x float> %1, %2
				ret <8 x float> %3
				}

				define <8 x float> @f6(<8 x float> %a) {
				; X86-LABEL: f6:
				; X86: # %bb.0:
				; X86-NEXT: addps %xmm0, %xmm0
				; X86-NEXT: addps %xmm1, %xmm1
				; X86-NEXT: movaps %xmm1, %xmm2
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: movaps %xmm0, %xmm3
				; X86-NEXT: #ARITH_FENCE
				; X86-NEXT: addps %xmm0, %xmm3
				; X86-NEXT: addps %xmm1, %xmm2
				; X86-NEXT: movaps %xmm3, %xmm0
				; X86-NEXT: movaps %xmm2, %xmm1
				; X86-NEXT: retl
				;
				; X64-LABEL: f6:
				; X64: # %bb.0:
				; X64-NEXT: addps %xmm0, %xmm0
				; X64-NEXT: addps %xmm1, %xmm1
				; X64-NEXT: movaps %xmm1, %xmm2
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: movaps %xmm0, %xmm3
				; X64-NEXT: #ARITH_FENCE
				; X64-NEXT: addps %xmm0, %xmm3
				; X64-NEXT: addps %xmm1, %xmm2
				; X64-NEXT: movaps %xmm3, %xmm0
				; X64-NEXT: movaps %xmm2, %xmm1
				; X64-NEXT: retq
				%1 = fadd fast <8 x float> %a, %a
				%t = call <8 x float> @llvm.arithmetic.fence.v8f32(<8 x float> %1)
				%2 = fadd fast <8 x float> %a, %a
				%3 = fadd fast <8 x float> %t, %2
				ret <8 x float> %3
				}

				declare float @llvm.arithmetic.fence.f32(float)
				declare double @llvm.arithmetic.fence.f64(double)
				declare <2 x float> @llvm.arithmetic.fence.v2f32(<2 x float>)
				declare <8 x float> @llvm.arithmetic.fence.v8f32(<8 x float>)
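The unfenced functions in these tests are rewritten freely under `fast` flags, while the fenced ones keep the written grouping. The general hazard being guarded against is that floating-point addition is not associative. A minimal sketch of that, assuming IEEE 754 doubles, round-to-nearest, and a build without fast-math flags (the function names are illustrative only):

```cpp
// (a + b) + c, evaluated exactly as written.
double sumLeftFirst(double a, double b, double c) {
    return (a + b) + c;
}

// a + (b + c), the regrouped form a fast-math optimizer is allowed to pick.
double sumRightFirst(double a, double b, double c) {
    return a + (b + c);
}
```

For a = 1e16, b = -1e16, c = 1.0 the first grouping gives 1.0. In the second, `b + c` is not representable (doubles near 1e16 are spaced 2 apart) and rounds back to -1e16, so the result is 0.0.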
