This is an archive of the discontinued LLVM Phabricator instance.

Experimental pass to convert all floating point operations to the equivalent constrained intrinsics
Needs ReviewPublic

Authored by andrew.w.kaylor on Feb 9 2018, 2:02 PM.

Download Raw Diff

Details

Reviewers

spatel
hfinkel
scanon

Summary

I'm not sure this should actually be committed. I'm mostly putting it here as a way to make it available to anyone who is interested.

If anyone would like to see it added to the code base, feel free to make comments and I'll clean it up accordingly.

Right now, it only supports the legacy pass manager. Otherwise, I think it works OK.

Diff Detail

Event Timeline

andrew.w.kaylor created this revision.Feb 9 2018, 2:02 PM

Herald added subscribers: hintonda, mgorny, mehdi_amini. · View Herald TranscriptFeb 9 2018, 2:02 PM

kpn added a subscriber: kpn.Feb 20 2018, 9:32 AM

Isn't this pass needed since we don't have a way to prevent code movement between blocks using the pragma FENV_ACCESS ON and blocks that aren't? I suggest having it check for use of constrained intrinsics before making any changes.

And if it was a function pass we could rerun it after certain optimizations like, say, the inliner.

Thoughts? Anyone?

I don't think this will work. The ConstantFolder could fold away traps early in the IRBuilder. We also wouldn't catch the FNeg/FSub case either.

I think Clang would need to explicitly generate the constrained intrinsics.

Here's a quick example. You can see in the IR that the ConstantFolder picks up the FDiv as undefined:

#include <stdio.h>

int main() {
  float x = 5.0;
  float y = 0.0;
  float z = x/y;
  return z;
}

and

; Function Attrs: norecurse nounwind readnone uwtable
define dso_local i32 @main() local_unnamed_addr #0 !dbg !6 {
entry:
  ret i32 undef, !dbg !8
}

Oh, of course FNeg will need to be emitted by the front end. And couldn't we turn off constant folding in the IRBuilder when necessary?

Plus, we still have to deal with inlining being done llvm-side. We don't know in clang what llvm will inline. And we don't want an inlined function's instructions moved in between use of constrained intrinsics, for example.

In D43142#1284190, @kpn wrote:

And couldn't we turn off constant folding in the IRBuilder when necessary?

I don't think that would work in all cases. E.g. a binary operator with two constant operands.

int main() {
  float z = 5.0/0.0;
  return z;
}

Well, not without a considerable effort to rework the IRBuilder at least. I'm fairly sure we can't produce an Instruction with two Constant operands right now.

Plus, we still have to deal with inlining being done llvm-side. We don't know in clang what llvm will inline. And we don't want an inlined function's instructions moved in between use of constrained intrinsics, for example.

That's undefined behavior. A translation unit that checks or modifies the FP Environment must set the FENV_ACCESS pragma to ON. When it is set to ON, only constrained intrinsics should be generated from Clang. If it is unset or set to OFF, any FP Environment access causes undefined behavior.

It's unclear to me how LTO (or other cross file inlining) would work here. I haven't given it much though until now. My knee-jerk reaction is that we shouldn't be inlining from a FENV_ACCESS=OFF to FENV_ACCESS=ON location.

In D43142#1284271, @cameron.mcinally wrote:
In D43142#1284190, @kpn wrote:

And couldn't we turn off constant folding in the IRBuilder when necessary?

I don't think that would work in all cases. E.g. a binary operator with two constant operands.
int main() {
  float z = 5.0/0.0;
  return z;
}
Well, not without a considerable effort to rework the IRBuilder at least. I'm fairly sure we can't produce an Instruction with two Constant operands right now.

Then don't we have to deal with this sometime? If #pragma FENV_ACCESS ON is set for the main() example above then is constant folding allowed?

In D43142#1284296, @kpn wrote:

Then don't we have to deal with this sometime? If #pragma FENV_ACCESS ON is set for the main() example above then is constant folding allowed?

We do not want to constant fold when FENV_ACCESS=ON. We want the hardware instruction to execute and set the flags at runtime.

It's not really a problem though, since the constrained intrinsics won't constant fold.

In D43142#1284271, @cameron.mcinally wrote:

In D43142#1284190, @kpn wrote:

...

It's unclear to me how LTO (or other cross file inlining) would work here. I haven't given it much though until now. My knee-jerk reaction is that we shouldn't be inlining from a FENV_ACCESS=OFF to FENV_ACCESS=ON location.

I'm pretty sure that we decided that we couldn't (because once you inline the regular FP ops, there's no way to restrict their movement relative to the constrained intrinsics at the IR level).

In D43142#1284135, @cameron.mcinally wrote:

I don't think this will work. The ConstantFolder could fold away traps early in the IRBuilder. We also wouldn't catch the FNeg/FSub case either.

I think Clang would need to explicitly generate the constrained intrinsics.

To be clear, this is something that we need to address even if it seems possible to do it another way. A core part of our project development practice is that we never leave the IR, in between passes, in a semantically invalid or inconsistent state. We always have a single IR semantic interpretation and phase ordering concerns matter only for performance, and effectiveness, not correctness. If we do something else, that should be an explicit decision (that should be discussed on the dev list).

In D43142#1284800, @hfinkel wrote:

In D43142#1284271, @cameron.mcinally wrote:

In D43142#1284190, @kpn wrote:

...

It's unclear to me how LTO (or other cross file inlining) would work here. I haven't given it much though until now. My knee-jerk reaction is that we shouldn't be inlining from a FENV_ACCESS=OFF to FENV_ACCESS=ON location.

I'm pretty sure that we decided that we couldn't (because once you inline the regular FP ops, there's no way to restrict their movement relative to the constrained intrinsics at the IR level).

I thought what we said was that if constrained FP intrinsics were used anywhere in a function then they had to be used everywhere in the function. So if you inline a function with FENV_ACCESS=ON into a function with FENV_ACCESS=OFF then the FP operations in the destination function need to be converted to constrained intrinsics, and if you inline a function with FENV_ACCESS=OFF into a function with FENV_ACCESS=ON then the FP operations in the function being inlined need to be converted. This conversion will have a detrimental effect on optimizations, so it would be nice to factor that into the inling cost, but at this point I'm mostly concerned about generating correct code.

arsenm added a subscriber: arsenm.Nov 1 2018, 6:36 PM

arsenm added inline comments.

lib/Transforms/IPO/ForceConstrainedFP.cpp
59	Why isn't this a function pass?
83–84	I would expect we would have opcode->intrinsic and intrinsic->opcode functions in utilities, and then this pass wouldn't need to be aware of the current set of constrained intrinsics
lib/Transforms/IPO/PassManagerBuilder.cpp
407–410	I would drop this part

In D43142#1284977, @andrew.w.kaylor wrote:

I thought what we said was that if constrained FP intrinsics were used anywhere in a function then they had to be used everywhere in the function. So if you inline a function with FENV_ACCESS=ON into a function with FENV_ACCESS=OFF then the FP operations in the destination function need to be converted to constrained intrinsics, and if you inline a function with FENV_ACCESS=OFF into a function with FENV_ACCESS=ON then the FP operations in the function being inlined need to be converted. This conversion will have a detrimental effect on optimizations, so it would be nice to factor that into the inling cost, but at this point I'm mostly concerned about generating correct code.

I'm realizing that FENV_ACCESS is poorly designed.

Let's take this small example:

#include <fenv.h>
#include <stdio.h>

void bar();

#pragma STDC FENV_ACCESS ON
int main() {
  feenableexcept(FE_DIVBYZERO);
  bar();
  float x = 1.0/0.0;
  printf("main x=%lf\n", x);
}

#pragma STDC FENV_ACCESS OFF
void bar() {
  float x = 1.0/0.0;
  printf("bar x=%lf\n", x);
}

The FDiv in bar() would access the FP environment since main() enabled exceptions. But, since bar() is preceded with FENV_ACCESS=OFF, the Standard says that accessing the FP environment in bar() is undefined behavior. Since it's undefined behavior, the compiler can optimize as it wishes, including inlining bar() without modifying the FDiv.

It would really be the user's responsibility to make bar() safe. E.g.:

#pragma STDC FENV_ACCESS OFF
void bar() {
  fedisableexcept(FE_DIVBYZERO);
  float x = 1.0/0.0;
  printf("bar x=%lf\n", x);
  feenableexcept(FE_DIVBYZERO);
}

In that case, we would be free to inline bar() assuming that the feenableexcept/fedisableexcept functions act as barriers. However, explicitly making each call safe seems like an unreasonable burden on the user. Imagine the mental gymnastics necessary for a large call tree.

I suppose the compiler could automatically manage this for the user, by saving/restoring the FP environment in the prolog/epilog of FENV_ACCESS=OFF functions. But, there would be a non-trivial cost for that. A priori reasoning tells me that it's not worth the trouble.

Can anyone poke holes in this?

In D43142#1285069, @cameron.mcinally wrote:
In D43142#1284977, @andrew.w.kaylor wrote:

I thought what we said was that if constrained FP intrinsics were used anywhere in a function then they had to be used everywhere in the function. So if you inline a function with FENV_ACCESS=ON into a function with FENV_ACCESS=OFF then the FP operations in the destination function need to be converted to constrained intrinsics, and if you inline a function with FENV_ACCESS=OFF into a function with FENV_ACCESS=ON then the FP operations in the function being inlined need to be converted. This conversion will have a detrimental effect on optimizations, so it would be nice to factor that into the inling cost, but at this point I'm mostly concerned about generating correct code.

I'm realizing that FENV_ACCESS is poorly designed.

Let's take this small example:
#include <fenv.h>
#include <stdio.h>

void bar();

#pragma STDC FENV_ACCESS ON
int main() {
  feenableexcept(FE_DIVBYZERO);
  bar();
  float x = 1.0/0.0;
  printf("main x=%lf\n", x);
}

#pragma STDC FENV_ACCESS OFF
void bar() {
  float x = 1.0/0.0;
  printf("bar x=%lf\n", x);
}
The FDiv in bar() would access the FP environment since main() enabled exceptions. But, since bar() is preceded with FENV_ACCESS=OFF, the Standard says that accessing the FP environment in bar() is undefined behavior. Since it's undefined behavior, the compiler can optimize as it wishes, including inlining bar() without modifying the FDiv.

If all optimizations including constant folding, or at least optimizations on floating point, are delayed until after inlining then there's no problem. And the optimizations applied to an inlined functions don't have to be the same as the optimizations on the standalone function. The copy of bar() inlined into main() can inherit the FENV_ACCESS ON of main() despite the standalone version of bar() having it OFF.

In D43142#1284977, @andrew.w.kaylor wrote:

In D43142#1284800, @hfinkel wrote:

In D43142#1284271, @cameron.mcinally wrote:

In D43142#1284190, @kpn wrote:

...

It's unclear to me how LTO (or other cross file inlining) would work here. I haven't given it much though until now. My knee-jerk reaction is that we shouldn't be inlining from a FENV_ACCESS=OFF to FENV_ACCESS=ON location.

I'm pretty sure that we decided that we couldn't (because once you inline the regular FP ops, there's no way to restrict their movement relative to the constrained intrinsics at the IR level).

I thought what we said was that if constrained FP intrinsics were used anywhere in a function then they had to be used everywhere in the function. So if you inline a function with FENV_ACCESS=ON into a function with FENV_ACCESS=OFF then the FP operations in the destination function need to be converted to constrained intrinsics, and if you inline a function with FENV_ACCESS=OFF into a function with FENV_ACCESS=ON then the FP operations in the function being inlined need to be converted. This conversion will have a detrimental effect on optimizations, so it would be nice to factor that into the inling cost, but at this point I'm mostly concerned about generating correct code.

This. My suggestion to add this pass came from this context right here as described by Andy.

The inlining case inserting regular FP ops into a function is no different from a case where FENV_ACCESS is ON in one block inside a function but OFF in the next block inside that same function. I don't think disabling inlining makes sense because of that.

In D43142#1285565, @kpn wrote:

If all optimizations including constant folding, or at least optimizations on floating point, are delayed until after inlining then there's no problem.

I'll add that this is a ton of work. A binary Instruction can't currently have two Constant operands. So, ConstantFolding is baked into the Instruction implementation right now. If I'm mistaken, someone please correct me.

I'm not an expert on inlining, but I imagine there are challenges moving it to 1st in the pass order too. I could see it being difficult to analyze cost on an unoptimized, non-canonical IR.

There are probably other optimizations, like InstCombine, that we're not considering also.

And the optimizations applied to an inlined functions don't have to be the same as the optimizations on the standalone function. The copy of bar() inlined into main() can inherit the >FENV_ACCESS ON of main() despite the standalone version of bar() having it OFF.

At its core, this code has undefined behavior. So, your suggestion is one possible solution.

If my reasoning in my last comment is correct, we don't have to worry about inlining FENV_ACCESS=OFF functions. Since it's undefined behavior, we're free to inline that function with no modifications. It's the user's responsibility to make sure that a FENV_ACCESS=OFF function isn't called by a FENV_ACCESS=ON function without explicitly modifying the FP environment.

Well, at least that's my current understanding of the problem. Does anyone see problems with my initial assumptions?

In D43142#1285602, @cameron.mcinally wrote:

In D43142#1285565, @kpn wrote:

If all optimizations including constant folding, or at least optimizations on floating point, are delayed until after inlining then there's no problem.

I'll add that this is a ton of work. A binary Instruction can't currently have two Constant operands. So, ConstantFolding is baked into the Instruction implementation right now. If I'm mistaken, someone please correct me.

Surely this can be worked around with assigning to compiler temporaries?

Dealing with statics may be an issue. This was mentioned to me at lunch by our front end guy.

I'm not an expert on inlining, but I imagine there are challenges moving it to 1st in the pass order too. I could see it being difficult to analyze cost on an unoptimized, non-canonical IR.

There are probably other optimizations, like InstCombine, that we're not considering also.

OK, that sounds like something that may get dealt with when we have time to polish things into an ideal state. Maybe never.

If my reasoning in my last comment is correct, we don't have to worry about inlining FENV_ACCESS=OFF functions. Since it's undefined behavior, we're free to inline that function with no modifications. It's the user's responsibility to make sure that a FENV_ACCESS=OFF function isn't called by a FENV_ACCESS=ON function without explicitly modifying the FP environment.

Hmmm. A clang option to turn on FENV_ACCESS for entire compilations would work around this issue. So the behavior you are describing here would work for me.

In D43142#1285692, @kpn wrote:

Hmmm. A clang option to turn on FENV_ACCESS for entire compilations would work around this issue. So the behavior you are describing here would work for me.

This is interesting. I was thinking the same thing (and maybe even changing the Standard to match).

I'm struggling to find a good use case for dynamically changing FENV_ACCESS in a translation unit. My current belief is that if a user wants to run with traps, they would want trap-safe code everywhere. If that was the case, the user would still be able to toggle signals with the feenableexcept() and friends functions (similar for rounding mode too).

In D43142#1285069, @cameron.mcinally wrote:

I'm realizing that FENV_ACCESS is poorly designed.

Yes, this is true. It's difficult to understand and difficult to implement. Personally, I'm not as worried about precisely supporting FENV_ACCESS as I am supporting the code behavior that it is intended to enable. The FENV_ACCESS pragma is useful as a reference, but for me this is really about being able to control the optimizer and allow a way to restrict what it does with FP operations. The optimizer has no business knowing anything about pragmas in the source code and what any given language standard says about them. It just needs to be internally consistent and follow the rules we set for the IR language.

But getting back to FENV_ACCESS (because I would like to see us provide a correct implementation of the standard) as it applies to your example, I believe the standard says that the compiler should assume the default environment for any region marked as FENV_ACCESS OFF. The behavior is only undefined if the FP environment doesn't match the defaults when that code is executed. This puts the burden of making sure the environment is correct for transitions between regions with different access settings. That's a bit of a nightmare for the programmer if he decides to mix FENV_ACCESS modes across a program, but it's fairly simple for the compiler. In your original example, we are free to constant fold the division in bar and not worry about the fact that the FP exception won't occur.

Furthermore, if we inline bar into main, we can use "round.tonearest" as the rounding mode argument and "fpexcept.ignore" as the exception behavior argument, so we could still constant fold the division if we hadn't done so previously for whatever reason. This is because the contents of bar are still conceptually in the FENV_ACCESS OFF scope as the user originally specified. The fact that we have chosen to inline the function doesn't change the semantics. The only thing we need to worry about is making sure that FP operations don't migrate across calls or other FP operations that would break these semantics. Using the intrinsics after inlining takes care of that.

In D43142#1285069, @cameron.mcinally wrote:
Let's take this small example:
#include <fenv.h>
#include <stdio.h>

void bar();

#pragma STDC FENV_ACCESS ON
int main() {
  feenableexcept(FE_DIVBYZERO);
  bar();
  float x = 1.0/0.0;
  printf("main x=%lf\n", x);
}

#pragma STDC FENV_ACCESS OFF
void bar() {
  float x = 1.0/0.0;
  printf("bar x=%lf\n", x);
}
The FDiv in bar() would access the FP environment since main() enabled exceptions. But, since bar() is preceded with FENV_ACCESS=OFF, the Standard says that accessing the FP environment in bar() is undefined behavior. Since it's undefined behavior, the compiler can optimize as it wishes, including inlining bar() without modifying the FDiv.

It would really be the user's responsibility to make bar() safe. E.g.:
#pragma STDC FENV_ACCESS OFF
void bar() {
  fedisableexcept(FE_DIVBYZERO);
  float x = 1.0/0.0;
  printf("bar x=%lf\n", x);
  feenableexcept(FE_DIVBYZERO);
}
In that case, we would be free to inline bar() assuming that the feenableexcept/fedisableexcept functions act as barriers. However, explicitly making each call safe seems like an unreasonable burden on the user. Imagine the mental gymnastics necessary for a large call tree.

I suppose the compiler could automatically manage this for the user, by saving/restoring the FP environment in the prolog/epilog of FENV_ACCESS=OFF functions. But, there would be a non-trivial cost for that. A priori reasoning tells me that it's not worth the trouble.

Can anyone poke holes in this?

In D43142#1285602, @cameron.mcinally wrote:

In D43142#1285565, @kpn wrote:

If all optimizations including constant folding, or at least optimizations on floating point, are delayed until after inlining then there's no problem.

I'll add that this is a ton of work. A binary Instruction can't currently have two Constant operands. So, ConstantFolding is baked into the Instruction implementation right now. If I'm mistaken, someone please correct me.

I'm not an expert on inlining, but I imagine there are challenges moving it to 1st in the pass order too. I could see it being difficult to analyze cost on an unoptimized, non-canonical IR.

We really can't defer optimizations that occur before inlining. The legacy pass manager is specifically structured so that the function simplification passes and the inlining pass are layered such that the function simplification passes are run on a function before we decide whether or not to inline it and then run again on the calling functions so that if we did inline the code can be simplified again in its new context before we decide whether or not to inline that function. I'm not sure how the new pass manager manages this dance, but I'm certain that it must have the same effective ordering.

In D43142#1285708, @andrew.w.kaylor wrote:

The fact that we have chosen to inline the function doesn't change the semantics.

I agree with this. Inlined or a call, it should be the same semantics.

The only thing we need to worry about is making sure that FP operations don't migrate across calls or other FP operations that would break these semantics. Using the intrinsics after inlining takes care of that.

I don't agree with this. This is changing the semantics if we choose to inline a function by converting some operations into constrained intrinsics. In other words, an executable will have different behavior if we choose to inline or not. That's not ok.

We're in undefined behavior territory if the user doesn't explicitly reset the FP Env to the default when moving to a FENV_ACCESS=OFF function. If we treat the feenableexcept() and friends as barriers (which we must anyway), then we can freely inline (or call) a function as we wish with no change to semantics.

In D43142#1285789, @cameron.mcinally wrote:

In D43142#1285708, @andrew.w.kaylor wrote:

The only thing we need to worry about is making sure that FP operations don't migrate across calls or other FP operations that would break these semantics. Using the intrinsics after inlining takes care of that.

I don't agree with this. This is changing the semantics if we choose to inline a function by converting some operations into constrained intrinsics. In other words, an executable will have different behavior if we choose to inline or not. That's not ok.

Well, yes and no. In a region defined as FENV_ACCESS off the user has granted the compiler explicit permission to ignore FP status bits and exception behavior. Depending on how we optimize these regions the FP status bits may come out differently and exceptions may or may not be raised, but the user has promised not to unmask exceptions or look at status bits in that region. So if that part of the behavior of the code changes depending on whether or not we inline that's OK because that's what the user signed up for. In that sense the behavior hasn't changed any more than it does between -O0 and -O3.

Now maybe someone might have code that calls an FENV_ACCESS OFF function from an FENV_ACCESS ON function and then looks at FP status bits after the call returns, but the standard explicitly says that when controls transfers from a scope with FENV_ACCESS off to a scope with FENV_ACCESS on the state of the FP status flags is undefined.

So, yes, technically the behavior may change, but that's OK.

In D43142#1285825, @andrew.w.kaylor wrote:

In D43142#1285789, @cameron.mcinally wrote:

In D43142#1285708, @andrew.w.kaylor wrote:

The only thing we need to worry about is making sure that FP operations don't migrate across calls or other FP operations that would break these semantics. Using the intrinsics after inlining takes care of that.

I don't agree with this. This is changing the semantics if we choose to inline a function by converting some operations into constrained intrinsics. In other words, an executable will have different behavior if we choose to inline or not. That's not ok.

Well, yes and no. In a region defined as FENV_ACCESS off the user has granted the compiler explicit permission to ignore FP status bits and exception behavior. Depending on how we optimize these regions the FP status bits may come out differently and exceptions may or may not be raised, but the user has promised not to unmask exceptions or look at status bits in that region. So if that part of the behavior of the code changes depending on whether or not we inline that's OK because that's what the user signed up for. In that sense the behavior hasn't changed any more than it does between -O0 and -O3.

More specifically, in an all FENV_ACCESS=OFF world we _already_ have different results that come from the compiler choosing to inline or not. So different results from inlining a FENV_ACCESS=OFF function into a FENV_ACCESS=ON function is just more of the same.

To the point about barriers, converting to use the constrained intrinsics functions as a barrier whether or not the =OFF function calls fpenableexcept() and friends.

Now maybe someone might have code that calls an FENV_ACCESS OFF function from an FENV_ACCESS ON function and then looks at FP status bits after the call returns, but the standard explicitly says that when controls transfers from a scope with FENV_ACCESS off to a scope with FENV_ACCESS on the state of the FP status flags is undefined.

So, yes, technically the behavior may change, but that's OK.

Agreed.

In D43142#1285825, @andrew.w.kaylor wrote:

Well, yes and no. In a region defined as FENV_ACCESS off the user has granted the compiler explicit permission to ignore FP status bits and exception behavior. Depending on how we optimize these regions the FP status bits may come out differently and exceptions may or may not be raised, but the user has promised not to unmask exceptions or look at status bits in that region. So if that part of the behavior of the code changes depending on whether or not we inline that's OK because that's what the user signed up for. In that sense the behavior hasn't changed any more than it does between -O0 and -O3.

That's well put. I agree with that. I was probably overreaching in my last comment.

Now maybe someone might have code that calls an FENV_ACCESS OFF function from an FENV_ACCESS ON function and then looks at FP status bits after the call returns, but the standard explicitly says that when controls transfers from a scope with FENV_ACCESS off to a scope with FENV_ACCESS on the state of the FP status flags is undefined.

So, yes, technically the behavior may change, but that's OK.

I think we both agree that it's undefined behavior. If that's correct, then it doesn't really matter what we do after we hit it. So any solution is acceptable...

In D43142#1285856, @cameron.mcinally wrote:

I think we both agree that it's undefined behavior. If that's correct, then it doesn't really matter what we do after we hit it. So any solution is acceptable...

We still need to control code motion. For instance, if the code checks the status bits before calling the FENV_ACCESS OFF function we can't have FP operations that are inlined hoisted above the status bit check.

In D43142#1285974, @andrew.w.kaylor wrote:

We still need to control code motion. For instance, if the code checks the status bits before calling the FENV_ACCESS OFF function we can't have FP operations that are inlined hoisted above the status bit check.

How are you envisioning the status bits are checked?

If through fegetenv() and friends, those should probably be barriers.

cameron.mcinally mentioned this in D53157: Teach the IRBuilder about constrained fadd and friends.Nov 17 2018, 2:16 PM

uweigand added a subscriber: uweigand.Dec 3 2018, 8:08 AM

pengfei added a subscriber: pengfei.Aug 14 2019, 5:24 PM

kpn mentioned this in D69272: Enable '#pragma STDC FENV_ACCESS' in frontend.Oct 22 2019, 7:33 AM

Revision Contents

Path

Size

include/

llvm/

InitializePasses.h

1 line

Transforms/

IPO.h

3 lines

lib/

Transforms/

IPO/

CMakeLists.txt

1 line

ForceConstrainedFP.cpp

193 lines

IPO.cpp

1 line

PassManagerBuilder.cpp

8 lines

test/

Transforms/

force-constrained-fp.ll

91 lines

Diff 133682

include/llvm/InitializePasses.h

	Show First 20 Lines • Show All 136 Lines • ▼ Show 20 Lines
	void initializeExpandMemCmpPassPass(PassRegistry&);			void initializeExpandMemCmpPassPass(PassRegistry&);
	void initializeExpandPostRAPass(PassRegistry&);			void initializeExpandPostRAPass(PassRegistry&);
	void initializeExpandReductionsPass(PassRegistry&);			void initializeExpandReductionsPass(PassRegistry&);
	void initializeExternalAAWrapperPassPass(PassRegistry&);			void initializeExternalAAWrapperPassPass(PassRegistry&);
	void initializeFEntryInserterPass(PassRegistry&);			void initializeFEntryInserterPass(PassRegistry&);
	void initializeFinalizeMachineBundlesPass(PassRegistry&);			void initializeFinalizeMachineBundlesPass(PassRegistry&);
	void initializeFlattenCFGPassPass(PassRegistry&);			void initializeFlattenCFGPassPass(PassRegistry&);
	void initializeFloat2IntLegacyPassPass(PassRegistry&);			void initializeFloat2IntLegacyPassPass(PassRegistry&);
				void initializeForceConstrainedFPPassPass(PassRegistry&);
	void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);			void initializeForceFunctionAttrsLegacyPassPass(PassRegistry&);
	void initializeForwardControlFlowIntegrityPass(PassRegistry&);			void initializeForwardControlFlowIntegrityPass(PassRegistry&);
	void initializeFuncletLayoutPass(PassRegistry&);			void initializeFuncletLayoutPass(PassRegistry&);
	void initializeFunctionImportLegacyPassPass(PassRegistry&);			void initializeFunctionImportLegacyPassPass(PassRegistry&);
	void initializeGCMachineCodeAnalysisPass(PassRegistry&);			void initializeGCMachineCodeAnalysisPass(PassRegistry&);
	void initializeGCModuleInfoPass(PassRegistry&);			void initializeGCModuleInfoPass(PassRegistry&);
	void initializeGCOVProfilerLegacyPassPass(PassRegistry&);			void initializeGCOVProfilerLegacyPassPass(PassRegistry&);
	void initializeGVNHoistLegacyPassPass(PassRegistry&);			void initializeGVNHoistLegacyPassPass(PassRegistry&);
	▲ Show 20 Lines • Show All 242 Lines • Show Last 20 Lines

include/llvm/Transforms/IPO.h

	Show First 20 Lines • Show All 269 Lines • ▼ Show 20 Lines
	// IR metadata to reflect the profile.			// IR metadata to reflect the profile.
	ModulePass *createSampleProfileLoaderPass();			ModulePass *createSampleProfileLoaderPass();
	ModulePass *createSampleProfileLoaderPass(StringRef Name);			ModulePass *createSampleProfileLoaderPass(StringRef Name);

	/// Write ThinLTO-ready bitcode to Str.			/// Write ThinLTO-ready bitcode to Str.
	ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str,			ModulePass *createWriteThinLTOBitcodePass(raw_ostream &Str,
	raw_ostream *ThinLinkOS = nullptr);			raw_ostream *ThinLinkOS = nullptr);

				/// Experimental pass to force all FP ops to constrained intrinsics
				ModulePass *createForceConstrainedFPPass();

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

lib/Transforms/IPO/CMakeLists.txt

	add_llvm_library(LLVMipo			add_llvm_library(LLVMipo
	AlwaysInliner.cpp			AlwaysInliner.cpp
	ArgumentPromotion.cpp			ArgumentPromotion.cpp
	BarrierNoopPass.cpp			BarrierNoopPass.cpp
	BlockExtractor.cpp			BlockExtractor.cpp
	CalledValuePropagation.cpp			CalledValuePropagation.cpp
	ConstantMerge.cpp			ConstantMerge.cpp
	CrossDSOCFI.cpp			CrossDSOCFI.cpp
	DeadArgumentElimination.cpp			DeadArgumentElimination.cpp
	ElimAvailExtern.cpp			ElimAvailExtern.cpp
	ExtractGV.cpp			ExtractGV.cpp
				ForceConstrainedFP.cpp
	ForceFunctionAttrs.cpp			ForceFunctionAttrs.cpp
	FunctionAttrs.cpp			FunctionAttrs.cpp
	FunctionImport.cpp			FunctionImport.cpp
	GlobalDCE.cpp			GlobalDCE.cpp
	GlobalOpt.cpp			GlobalOpt.cpp
	GlobalSplit.cpp			GlobalSplit.cpp
	IPConstantPropagation.cpp			IPConstantPropagation.cpp
	IPO.cpp			IPO.cpp
	Show All 24 Lines

lib/Transforms/IPO/ForceConstrainedFP.cpp

				//===----- ForceConstrainedFP.cpp - Convert all FP ops to constrained -----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines a pass that converts all floating point operations to
				/// the equivalent constrained intrinsic.
				///
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/ArrayRef.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/IR/InstrTypes.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Pass.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Transforms/IPO.h"
				using namespace llvm;

				#define DEBUG_TYPE "force-constrained-fp"

				STATISTIC(NumFPOpsConverted, "Number of FP ops converted to intrinsics");

				namespace {

				class ForceConstrainedFPPass : public ModulePass {
				public:
				static char ID;

				ForceConstrainedFPPass() : ModulePass(ID) {
				initializeForceConstrainedFPPassPass(*PassRegistry::getPassRegistry());
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesAll();
				}

				bool runOnModule(Module &M) override;

				private:
				bool processIntrinsicCall(IntrinsicInst* IntrinCall);
				bool processBinaryOperator(BinaryOperator *BinOp);

				void replaceFPInstrWithIntrinsic(Instruction *I, Intrinsic::ID IID,
				ArrayRef<Value*> Operands);
				};

				bool ForceConstrainedFPPass::runOnModule(Module &M) {
				bool Changed = false;

				for (auto &F : M.functions()) {
				arsenmUnsubmitted Not Done Reply Inline Actions Why isn't this a function pass? arsenm: Why isn't this a function pass?
				for (auto &BB : F) {
				// Because we're erasing instructions, we need to pre-increment
				// the iterator.
				auto It = BB.begin();
				while (It != BB.end()) {
				Instruction &I = *It;
				++It;
				if (auto *BinOp = dyn_cast<BinaryOperator>(&I))
				Changed \|= processBinaryOperator(BinOp);
				else if (auto *IntrinCall = dyn_cast<IntrinsicInst>(&I))
				Changed \|= processIntrinsicCall(IntrinCall);
				}
				}
				}

				return Changed;
				}

				bool ForceConstrainedFPPass::processBinaryOperator(BinaryOperator* BinOp) {
				Intrinsic::ID IID;
				switch (BinOp->getOpcode()) {
				default:
				return false;
				case Instruction::FAdd:
				IID = Intrinsic::experimental_constrained_fadd;
				arsenmUnsubmitted Not Done Reply Inline Actions I would expect we would have opcode->intrinsic and intrinsic->opcode functions in utilities, and then this pass wouldn't need to be aware of the current set of constrained intrinsics arsenm: I would expect we would have opcode->intrinsic and intrinsic->opcode functions in utilities…
				break;
				case Instruction::FSub:
				IID = Intrinsic::experimental_constrained_fsub;
				break;
				case Instruction::FMul:
				IID = Intrinsic::experimental_constrained_fmul;
				break;
				case Instruction::FDiv:
				IID = Intrinsic::experimental_constrained_fdiv;
				break;
				case Instruction::FRem:
				IID = Intrinsic::experimental_constrained_frem;
				break;
				}
				replaceFPInstrWithIntrinsic(BinOp, IID, { BinOp->getOperand(0),
				BinOp->getOperand(1) });
				return true;
				}

				bool ForceConstrainedFPPass::processIntrinsicCall(IntrinsicInst* IntrinCall) {
				Intrinsic::ID IID;
				switch (IntrinCall->getIntrinsicID()) {
				default:
				return false;
				case Intrinsic::fma:
				IID = Intrinsic::experimental_constrained_fma;
				break;
				case Intrinsic::sqrt:
				IID = Intrinsic::experimental_constrained_sqrt;
				break;
				case Intrinsic::pow:
				IID = Intrinsic::experimental_constrained_pow;
				break;
				case Intrinsic::powi:
				IID = Intrinsic::experimental_constrained_powi;
				break;
				case Intrinsic::sin:
				IID = Intrinsic::experimental_constrained_sin;
				break;
				case Intrinsic::cos:
				IID = Intrinsic::experimental_constrained_cos;
				break;
				case Intrinsic::exp:
				IID = Intrinsic::experimental_constrained_exp;
				break;
				case Intrinsic::exp2:
				IID = Intrinsic::experimental_constrained_exp2;
				break;
				case Intrinsic::log:
				IID = Intrinsic::experimental_constrained_log;
				break;
				case Intrinsic::log10:
				IID = Intrinsic::experimental_constrained_log10;
				break;
				case Intrinsic::log2:
				IID = Intrinsic::experimental_constrained_log2;
				break;
				case Intrinsic::rint:
				IID = Intrinsic::experimental_constrained_rint;
				break;
				case Intrinsic::nearbyint:
				IID = Intrinsic::experimental_constrained_nearbyint;
				break;
				}
				SmallVector<Value *, 4> Args(IntrinCall->arg_operands());
				replaceFPInstrWithIntrinsic(IntrinCall, IID, Args);
				return true;
				}

				void
				ForceConstrainedFPPass::replaceFPInstrWithIntrinsic(Instruction *I,
				Intrinsic::ID IID,
				ArrayRef<Value*> Operands) {
				Module *M = I->getModule();
				LLVMContext &Context = I->getContext();
				Value *RoundingMode = MetadataAsValue::get(Context,
				MDString::get(Context,
				"round.dynamic"));
				Value *ExBehavior = MetadataAsValue::get(Context,
				MDString::get(Context,
				"fpexcept.strict"));
				Function *IntrinFn = Intrinsic::getDeclaration(M, IID,
				I->getType());

				SmallVector<Value *, 5> Args;
				for (auto *Op : Operands)
				Args.push_back(Op);
				Args.push_back(RoundingMode);
				Args.push_back(ExBehavior);

				Value *FPOp = CallInst::Create(IntrinFn, Args, I->getName() + ".strict", I);

				DEBUG(dbgs() << "ForceConstrainedFP: Replacing:\n " << *I
				<< "\n with:\n " << *FPOp);
				++NumFPOpsConverted;

				I->replaceAllUsesWith(FPOp);
				I->eraseFromParent();
				}

				} // End anonymous namespace

				char ForceConstrainedFPPass::ID = 0;
				INITIALIZE_PASS(ForceConstrainedFPPass, "force-constrained-fp",
				"Force constrained floating point", false, false)

				ModulePass *llvm::createForceConstrainedFPPass() {
				return new ForceConstrainedFPPass();
				}

lib/Transforms/IPO/IPO.cpp

	Show All 24 Lines

	void llvm::initializeIPO(PassRegistry &Registry) {			void llvm::initializeIPO(PassRegistry &Registry) {
	initializeArgPromotionPass(Registry);			initializeArgPromotionPass(Registry);
	initializeCalledValuePropagationLegacyPassPass(Registry);			initializeCalledValuePropagationLegacyPassPass(Registry);
	initializeConstantMergeLegacyPassPass(Registry);			initializeConstantMergeLegacyPassPass(Registry);
	initializeCrossDSOCFIPass(Registry);			initializeCrossDSOCFIPass(Registry);
	initializeDAEPass(Registry);			initializeDAEPass(Registry);
	initializeDAHPass(Registry);			initializeDAHPass(Registry);
				initializeForceConstrainedFPPassPass(Registry);
	initializeForceFunctionAttrsLegacyPassPass(Registry);			initializeForceFunctionAttrsLegacyPassPass(Registry);
	initializeGlobalDCELegacyPassPass(Registry);			initializeGlobalDCELegacyPassPass(Registry);
	initializeGlobalOptLegacyPassPass(Registry);			initializeGlobalOptLegacyPassPass(Registry);
	initializeGlobalSplitPass(Registry);			initializeGlobalSplitPass(Registry);
	initializeIPCPPass(Registry);			initializeIPCPPass(Registry);
	initializeAlwaysInlinerLegacyPassPass(Registry);			initializeAlwaysInlinerLegacyPassPass(Registry);
	initializeSimpleInlinerPass(Registry);			initializeSimpleInlinerPass(Registry);
	initializeInferFunctionAttrsLegacyPassPass(Registry);			initializeInferFunctionAttrsLegacyPassPass(Registry);
	▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

lib/Transforms/IPO/PassManagerBuilder.cpp

Show First 20 Lines • Show All 138 Lines • ▼ Show 20 Lines	static cl::opt<bool>
EnableSimpleLoopUnswitch("enable-simple-loop-unswitch", cl::init(false),		EnableSimpleLoopUnswitch("enable-simple-loop-unswitch", cl::init(false),
cl::Hidden,		cl::Hidden,
cl::desc("Enable the simple loop unswitch pass."));		cl::desc("Enable the simple loop unswitch pass."));

static cl::opt<bool> EnableGVNSink(		static cl::opt<bool> EnableGVNSink(
"enable-gvn-sink", cl::init(false), cl::Hidden,		"enable-gvn-sink", cl::init(false), cl::Hidden,
cl::desc("Enable the GVN sinking pass (default = off)"));		cl::desc("Enable the GVN sinking pass (default = off)"));

		static cl::opt<bool>
		ForceConstrainedFP("convert-fp-to-constrained", cl::init(false), cl::Hidden,
		cl::desc("Convert all FP ops to constrained intrinsics"));

PassManagerBuilder::PassManagerBuilder() {		PassManagerBuilder::PassManagerBuilder() {
OptLevel = 2;		OptLevel = 2;
SizeLevel = 0;		SizeLevel = 0;
LibraryInfo = nullptr;		LibraryInfo = nullptr;
Inliner = nullptr;		Inliner = nullptr;
DisableUnrollLoops = false;		DisableUnrollLoops = false;
SLPVectorize = RunSLPVectorization;		SLPVectorize = RunSLPVectorization;
LoopVectorize = RunLoopVectorization;		LoopVectorize = RunLoopVectorization;
▲ Show 20 Lines • Show All 240 Lines • ▼ Show 20 Lines	void PassManagerBuilder::populateModulePassManager(
if (!PGOSampleUse.empty()) {		if (!PGOSampleUse.empty()) {
MPM.add(createPruneEHPass());		MPM.add(createPruneEHPass());
MPM.add(createSampleProfileLoaderPass(PGOSampleUse));		MPM.add(createSampleProfileLoaderPass(PGOSampleUse));
}		}

// Allow forcing function attributes as a debugging and tuning aid.		// Allow forcing function attributes as a debugging and tuning aid.
MPM.add(createForceFunctionAttrsLegacyPass());		MPM.add(createForceFunctionAttrsLegacyPass());

		// Experimental pass to exercise the constrained FP intrinsics.
		if (ForceConstrainedFP)
		MPM.add(createForceConstrainedFPPass());

		arsenmUnsubmitted Not Done Reply Inline Actions I would drop this part arsenm: I would drop this part
// If all optimizations are disabled, just run the always-inline pass and,		// If all optimizations are disabled, just run the always-inline pass and,
// if enabled, the function merging pass.		// if enabled, the function merging pass.
if (OptLevel == 0) {		if (OptLevel == 0) {
addPGOInstrPasses(MPM);		addPGOInstrPasses(MPM);
if (Inliner) {		if (Inliner) {
MPM.add(Inliner);		MPM.add(Inliner);
Inliner = nullptr;		Inliner = nullptr;
}		}
▲ Show 20 Lines • Show All 609 Lines • Show Last 20 Lines

test/Transforms/force-constrained-fp.ll

				; RUN: opt < %s -force-constrained-fp -S 2>&1 \| FileCheck %s

				define void @test(float %f, double %d) {
				%t1 = fadd float %f, %f
				; CHECK: %t1.strict = call float @llvm.experimental.constrained.fadd.f32(
				; CHECK: float %f, float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t2 = call double @llvm.sqrt.f64(double %d)
				; CHECK: %t2.strict = call double @llvm.experimental.constrained.sqrt.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t3 = fsub double %t2, %d
				; CHECK: %t3.strict = call double @llvm.experimental.constrained.fsub.f64(
				; CHECK: double %t2.strict, double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t4 = fmul float %t1, %f
				; CHECK: %t4.strict = call float @llvm.experimental.constrained.fmul.f32(
				; CHECK: float %t1.strict, float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t5 = fdiv double %t2, %t3
				; CHECK: %t5.strict = call double @llvm.experimental.constrained.fdiv.f64(
				; CHECK: double %t2.strict, double %t3.strict,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t6 = frem double %t2, %t3
				; CHECK: %t6.strict = call double @llvm.experimental.constrained.frem.f64(
				; CHECK: double %t2.strict, double %t3.strict,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t7 = call float @llvm.fma.f32(float %t1, float %f, float %t4)
				; CHECK: %t7.strict = call float @llvm.experimental.constrained.fma.f32(
				; CHECK: float %t1.strict, float %f, float %t4.strict,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t8 = call double @llvm.pow.f64(double %d, double %t3)
				; CHECK: %t8.strict = call double @llvm.experimental.constrained.pow.f64(
				; CHECK: double %d, double %t3.strict,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t9 = call float @llvm.powi.f32(float %f, i32 4)
				; CHECK: %t9.strict = call float @llvm.experimental.constrained.powi.f32(
				; CHECK: float %f, i32 4,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t10 = call double @llvm.sin.f64(double %d)
				; CHECK: %t10.strict = call double @llvm.experimental.constrained.sin.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t11 = call double @llvm.cos.f64(double %d)
				; CHECK: %t11.strict = call double @llvm.experimental.constrained.cos.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t12 = call float @llvm.exp.f32(float %f)
				; CHECK: %t12.strict = call float @llvm.experimental.constrained.exp.f32(
				; CHECK: float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t13 = call float @llvm.exp2.f32(float %f)
				; CHECK: %t13.strict = call float @llvm.experimental.constrained.exp2.f32(
				; CHECK: float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t14 = call double @llvm.log.f64(double %d)
				; CHECK: %t14.strict = call double @llvm.experimental.constrained.log.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t15 = call double @llvm.log10.f64(double %d)
				; CHECK: %t15.strict = call double @llvm.experimental.constrained.log10.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t16 = call double @llvm.log2.f64(double %d)
				; CHECK: %t16.strict = call double @llvm.experimental.constrained.log2.f64(
				; CHECK: double %d,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t17 = call float @llvm.rint.f32(float %f)
				; CHECK: %t17.strict = call float @llvm.experimental.constrained.rint.f32(
				; CHECK: float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				%t18 = call float @llvm.nearbyint.f32(float %f)
				; CHECK: %t18.strict = call float @llvm.experimental.constrained.nearbyint.f32(
				; CHECK: float %f,
				; CHECK: metadata !"round.dynamic", metadata !"fpexcept.strict")
				ret void
				}

				declare double @llvm.sqrt.f64(double)
				declare float @llvm.fma.f32(float, float, float)
				declare double @llvm.pow.f64(double, double)
				declare float @llvm.powi.f32(float, i32)
				declare double @llvm.sin.f64(double)
				declare double @llvm.cos.f64(double)
				declare float @llvm.exp.f32(float)
				declare float @llvm.exp2.f32(float)
				declare double @llvm.log.f64(double)
				declare double @llvm.log10.f64(double)
				declare double @llvm.log2.f64(double)
				declare float @llvm.rint.f32(float)
				declare float @llvm.nearbyint.f32(float)

This is an archive of the discontinued LLVM Phabricator instance.

Experimental pass to convert all floating point operations to the equivalent constrained intrinsicsNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 133682

include/llvm/InitializePasses.h

include/llvm/Transforms/IPO.h

lib/Transforms/IPO/CMakeLists.txt

lib/Transforms/IPO/ForceConstrainedFP.cpp

lib/Transforms/IPO/IPO.cpp

lib/Transforms/IPO/PassManagerBuilder.cpp

test/Transforms/force-constrained-fp.ll

Experimental pass to convert all floating point operations to the equivalent constrained intrinsics
Needs ReviewPublic