This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Generate constrained fp intrinsics depending on FPOptions
Needs Review · Public

Authored by sepavloff on Aug 12 2019, 9:25 AM.

Details

Summary

If the value of FPOptions is modified, for example by using the pragma
'clang fp', create calls to constrained fp intrinsics with metadata
arguments corresponding to the selected rounding mode and exception
behavior.
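
For illustration, a rough sketch (value names hypothetical; the exception-behavior argument depends on the selected FPOptions) of the IR this change is meant to emit for a division performed under the pragma, instead of a plain fdiv:

; Source:
;   #pragma clang fp rounding(downward)
;   x = y / z;
;
; The metadata arguments record the rounding mode and exception behavior
; in effect at this point.
%div = call double @llvm.experimental.constrained.fdiv.f64(
           double %y, double %z,
           metadata !"round.downward", metadata !"fpexcept.strict")

declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)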

Event Timeline

sepavloff created this revision. Aug 12 2019, 9:25 AM
Herald added a project: Restricted Project. Aug 12 2019, 9:25 AM
kpn added a comment. Aug 12 2019, 9:35 AM

Does this work for anything that uses TreeTransform, like C++ templates?

Also, if any constrained intrinsics are used in a function then the entire function needs to be constrained. Is this handled anywhere?

sepavloff updated this revision to Diff 214857. Aug 13 2019, 9:31 AM

Added tests for 'pragma clang fp' in template instantiations

In D66092#1625380, @kpn wrote:

Does this work for anything that uses TreeTransform, like C++ templates?

Added such tests.

Also, if any constrained intrinsics are used in a function then the entire function needs to be constrained. Is this handled anywhere?

If we decided to make the entire function constrained, it should be done somewhere in IR transformations, because inlining may mix function bodies with different fp options.

In D66092#1625380, @kpn wrote:

Also, if any constrained intrinsics are used in a function then the entire function needs to be constrained. Is this handled anywhere?

If we decided to make the entire function constrained, it should be done somewhere in IR transformations, because inlining may mix function bodies with different fp options.

Kevin is right. We have decided that if constrained intrinsics are used anywhere in a function they must be used throughout the function. Otherwise, there would be nothing to prevent the non-constrained FP operations from migrating across constrained operations and the handling could get botched. The "relaxed" arguments ("round.tonearest" and "fpexcept.ignore") should be used where the default settings would apply. The front end should also be setting the "strictfp" attribute on calls within a constrained scope and, I think, functions that contain constrained intrinsics.

We will need to teach the inliner to enforce this rule if it isn't already doing so, but if things aren't correct coming out of the front end an incorrect optimization could already happen before we get to the inliner. We always rely on the front end producing IR with fully correct semantics.
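
A hand-written sketch of what that rule implies (function and value names hypothetical, and assuming the 'strictfp' attribute mentioned above; this is not output of the current patch): constrained intrinsics are used throughout the function, with the relaxed arguments where the default settings apply:

define double @f(double %y, double %z, double %b, double %c) strictfp {
  ; inside the constrained scope: the selected rounding mode
  %x = call double @llvm.experimental.constrained.fdiv.f64(
           double %y, double %z,
           metadata !"round.downward", metadata !"fpexcept.strict")
  ; under the default settings: still constrained, but with the
  ; "relaxed" arguments
  %a = call double @llvm.experimental.constrained.fdiv.f64(
           double %b, double %c,
           metadata !"round.tonearest", metadata !"fpexcept.ignore")
  %r = call double @llvm.experimental.constrained.fadd.f64(
           double %x, double %a,
           metadata !"round.tonearest", metadata !"fpexcept.ignore")
  ret double %r
}

declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)
declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)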

Replacement of floating point operations with constrained intrinsics seems more an optimization helper than a semantic requirement. IR in which constrained operations are mixed with unconstrained ones is still valid in the sense of the IR specification. Tools that use IR for something other than code generation may not need such a replacement. If the replacement is made by a separate pass, such a tool can turn it off, but if it is part of clang codegen, there is no simple solution; the tool must be reworked.

Another issue is non-standard rounding. It can be represented only by constrained intrinsics. The rounding itself does not require restrictions on code motion, so a mixture of constrained and unconstrained operations is OK. Replacing all operations with constrained intrinsics would give poorly optimized code, because the compiler does not optimize them. It would be a bad thing if a user adds the pragma to execute a statement with a specific rounding mode and loses optimization.

Using a dedicated pass to shape fp operations seems a flexible solution. It would allow implementing things like #pragma STDC FENV_ROUND without teaching all passes to work with constrained intrinsics.

Replacement of floating point operations with constrained intrinsics seems more an optimization helper than a semantic requirement. IR in which constrained operations are mixed with unconstrained ones is still valid in the sense of the IR specification.

The thing that makes the IR semantically incomplete is that there is nothing there to prevent incorrect code motion of the non-constrained operations. Consider this case:

if (someCondition) {
  #pragma clang fp rounding(downward)
  fesetround(FE_DOWNWARD);
  x = y/z;
  fesetround(FE_TONEAREST);
}
a = b/c;

If you generate a regular fdiv instruction for the 'a = b/c;' statement, there is nothing that would prevent it from being hoisted above the call to fesetround() and so it might be rounded incorrectly.
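
In IR terms the hazard looks roughly like this (a hypothetical rendering; the @fesetround argument values are target-specific):

; inside the 'if' block: the constrained division is assumed to access
; the FP environment, so it cannot be reordered with the fesetround calls
%c1 = call i32 @fesetround(i32 1024)   ; FE_DOWNWARD (target-specific value)
%x  = call double @llvm.experimental.constrained.fdiv.f64(
          double %y, double %z,
          metadata !"round.downward", metadata !"fpexcept.strict")
%c2 = call i32 @fesetround(i32 0)      ; FE_TONEAREST

; after the block: a plain fdiv has no side effects the optimizer knows
; about, so nothing prevents hoisting it above the @fesetround calls
%a  = fdiv double %b, %c

declare i32 @fesetround(i32)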

Another issue is non-standard rounding. It can be represented only by constrained intrinsics. The rounding itself does not require restrictions on code motion, so a mixture of constrained and unconstrained operations is OK. Replacing all operations with constrained intrinsics would give poorly optimized code, because the compiler does not optimize them. It would be a bad thing if a user adds the pragma to execute a statement with a specific rounding mode and loses optimization.

I agree that loss of optimization would be a bad thing, but I think it's unavoidable. By using non-default rounding modes the user is implicitly accepting some loss of optimization. This may be more than they would have expected, but I can't see any way around it.

The thing that makes the IR semantically incomplete is that there is nothing there to prevent incorrect code motion of the non-constrained operations. Consider this case:

if (someCondition) {
  #pragma clang fp rounding(downward)
  fesetround(FE_DOWNWARD);
  x = y/z;
  fesetround(FE_TONEAREST);
}
a = b/c;

If you generate a regular fdiv instruction for the 'a = b/c;' statement, there is nothing that would prevent it from being hoisted above the call to fesetround() and so it might be rounded incorrectly.

This is a good example, as it demonstrates the intended usage of the pragma: there is a big program in which some small pieces must be executed in a special way. Some notes:

  • The user expects that a small change confined to a selected block is local: it does not affect the code outside the block. The specification of the pragma ensures exactly that. If the change affects the entire function (and possibly other functions that call it), it feels like something is wrong.
  • The pragma usage here is different from the intended one. The purpose of #pragma clang fp rounding is to model the C2x #pragma STDC FENV_ROUND (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2347.pdf, 7.6.2). Such a pragma sets the rounding mode at the beginning of the block and restores the previous state at the end, so the code should look like:
if (someCondition) {
  #pragma clang fp rounding(downward)
  x = y/z;
}
a = b/c;
  • What is the issue with moving a = b/c? If it moves ahead of the if statement, that seems OK, because the rounding mode is the same at that point. It cannot be moved inside the block (where the rounding mode is different) because that would break the semantics. Consider another example:
for (i = …) {
  #pragma clang fp rounding(downward)
  a[i] = x/y;
}

If x and y are loop invariants, x/y could be hoisted out of the loop. However, at the IR level it would be moved as a constrained intrinsic, so the semantics would be preserved (see the IR sketch after this list).
The issue arises only when an expression is moved inside a block where a specific rounding mode is in effect. Something like this:

z = x*y;
for (i = …) {
  #pragma clang fp rounding(downward)
  a[i] += z;
}

Suppose that for some reason z = x*y is sunk into the loop. In such cases the node that comes from outside the block must be transformed.

  • There must be more than one way to prevent undesirable moves. For instance, a fence node could be extended so that it prevents moving floating point operations across it; such fences could be used to delimit a region where a specific floating point environment is in effect.
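
For the first loop example above (hoisting x/y), a hypothetical before/after sketch at the IR level; the metadata travels with the call, so the selected rounding is preserved:

; before: the loop-invariant division sits in the loop body
loop:
  %q = call double @llvm.experimental.constrained.fdiv.f64(
           double %x, double %y,
           metadata !"round.downward", metadata !"fpexcept.ignore")
  ; ... a[i] = %q; increment i; branch back to %loop ...

; after hoisting: the whole call moves to the preheader and still
; carries !"round.downward", so the semantics are preserved
preheader:
  %q = call double @llvm.experimental.constrained.fdiv.f64(
           double %x, double %y,
           metadata !"round.downward", metadata !"fpexcept.ignore")
  br label %loop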

Another issue is non-standard rounding. It can be represented only by constrained intrinsics. The rounding itself does not require restrictions on code motion, so a mixture of constrained and unconstrained operations is OK. Replacing all operations with constrained intrinsics would give poorly optimized code, because the compiler does not optimize them. It would be a bad thing if a user adds the pragma to execute a statement with a specific rounding mode and loses optimization.

I agree that loss of optimization would be a bad thing, but I think it's unavoidable. By using non-default rounding modes the user is implicitly accepting some loss of optimization. This may be more than they would have expected, but I can't see any way around it.

Nowadays there are many architectures designed for machine learning tasks. They usually operate on short data types (half, bfloat16, etc.), in which precision is relatively low. Rounding control in this case is much more important than on big cores. Kernel writers do fancy things, using appropriate rounding modes for different pieces of code to gain accuracy. Such processors may encode the rounding mode in their instructions, so the cost of using a specific rounding mode is zero. Loss of performance in this use case is not excusable.

In any case, the impact on performance must be minimized.

  • What is the issue with moving a = b/c? If it moves ahead of the if statement, that seems OK, because the rounding mode is the same at that point. It cannot be moved inside the block (where the rounding mode is different) because that would break the semantics.

It may be that the optimizer can prove that 'someCondition' is always true, in which case it will eliminate the if statement, and then there is nothing to prevent the operation from migrating between the calls that change the rounding mode.

This is my main point -- "call i32 @fesetround" does not act as a barrier to an fdiv instruction (for example), but it does act as a barrier to a constrained FP intrinsic. It is not acceptable, for performance reasons in the general case, to have calls act as barriers to unconstrained FP operations. Therefore, to keep everything semantically correct, it is necessary to use constrained intrinsics in any function where the floating point environment may be changed.

I agree that impact on performance must be minimized, but this is necessary for correctness.
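
To restate the contrast as a fragment (hypothetical; "round.dynamic" says the rounding mode is not known statically): a plain fdiv here would be free to move across the @fesetround call, while the constrained form is pinned:

define double @g(double %b, double %c) strictfp {
  %c1 = call i32 @fesetround(i32 0)
  ; with a plain 'fdiv double %b, %c' nothing models a dependence on the
  ; call above; the constrained intrinsic does carry that dependence
  %a = call double @llvm.experimental.constrained.fdiv.f64(
           double %b, double %c,
           metadata !"round.dynamic", metadata !"fpexcept.strict")
  ret double %a
}

declare i32 @fesetround(i32)
declare double @llvm.experimental.constrained.fdiv.f64(double, double, metadata, metadata)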

It took some digging, but I finally found the e-mail thread where we initially agreed that we can't mix constrained FP intrinsics and non-constrained FP operations within a function. Here it is: http://lists.llvm.org/pipermail/cfe-dev/2017-August/055325.html