This is an archive of the discontinued LLVM Phabricator instance.

"float2int": Add a new pass to demote from float to int where possible.
ClosedPublic

Authored by jmolloy on Feb 20 2015, 7:58 AM.

Details

Reviewers
hfinkel
Summary

It is possible to have code that converts from integer to float, performs
operations then converts back, and the result is provably the same as if
integers were used.

This can come from different sources, but the most obvious is a helper function
that uses floats but the arguments given at an inlined callsites are integers.

This pass considers all integers requiring a bitwidth less than or equal to
the bitwidth of the mantissa of a floating point type (23 for floats, 52 for
doubles) as exactly representable in floating point.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 20404.Feb 20 2015, 7:58 AM
jmolloy retitled this revision from to "float2int": Add a new pass to demote from float to int where possible..
jmolloy updated this object.
jmolloy edited the test plan for this revision. (Show Details)
jmolloy added a reviewer: hfinkel.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: Unknown Object (MLST).
hfinkel edited edge metadata.Feb 20 2015, 12:40 PM

This is neat, thanks for posting it!

lib/Transforms/Scalar/Float2Int.cpp
47

Make this a command-line option.

157

You can also handle Select here (and PHIs, although that might require more work?)

224

This is directly recursive; that's likely a bad idea (especially considering that it has no depth cutoff). Can you make this be worklist-based instead. We don't need a small cutoff if we have a worklist.

256

Why is the FIXME here? Do you want to mark these as unsigned under some circumstances?

281

What's this FIXME for? You'd want to mark the full range? (that would make the unioning unnecessary ;) )

318

Why are you checking for isFullSet() here? What about if you only have i8 (or something definitively smaller than the mantissa)?

333

Hrmm, we have other floating-point types in the IR. You need to either exclude them somewhere, or handle them.

Either way, the real way to do this is to call:

... = APFloat::semanticsPrecision(ConvertedToTy->getFltSemantics()) - 1;
341

Exactly where does 23 come from here? I understand that is the number of bits in a single-precision mantissa, but what does it have to do with picking the integer type?

Can you use make use of Type::getIntNTy?

385

You should not need to explicitly handle the equal-types case here. IRB::CreateZExtOrTrunc, will also check this and do nothing if the incoming and outgoing bitwidths are equal.

(this obviously applies to the checks below as well).

442

Hrmm. To have converted an instruction, you must have converted all uses, right? If we know these need to be dead, and we just don't know the order, we can use the same technique employed by ADCE:

  1. First, loop over all instructions calling I->dropAllReferences
  2. Then, loop over them again, erasing them as you do here.
test/Transforms/Float2Int/basic.ll
4

Please move these descriptions to be with each associated test.

jmolloy updated this revision to Diff 20682.Feb 25 2015, 9:01 AM
jmolloy edited edge metadata.

Hi Hal,

Thanks for the quick review! Sorry for not getting back to it for a couple of days - I was at a conference.

I've uploaded a new patch addressing most of your comments.

You can also handle Select here (and PHIs, although that might require more work?)

Yes, I think I can. I've added a FIXME for this and will do (at least the select work) in a followup, if that's OK? PHIs I think will certainly need more thought.

Why is the FIXME here? Do you want to mark these as unsigned under some circumstances?

The FIXME was (a) not meant to make it to upstream code review :) and (b) because I hadn't clocked the sense of APSFloat's constructor - I thought it was "isSigned" rather than "isUnsigned", and my current code felt like it was doing the wrong thing. It's actually right.

Why are you checking for isFullSet() here? What about if you only have i8 (or something definitively smaller than the mantissa)?

R's underlying type is i65 (MaxBitwidth + 1), so isFullSet() will only trigger on either an i64 multiplication/addition or a poison value (which I define as "i65 full-set"). 0..255 should never trigger isFullSet here, unless I'm misunderstanding how ConstantRange works.

Hrmm, we have other floating-point types in the IR. You need to either exclude them somewhere, or handle them.

Thanks! I wasn't aware of the fltSemantics precision. Updated.

Exactly where does 23 come from here?

Thanks for noticing that. It's wrong, it should of course be 32. I don't really want to use getIntNTy() because I don't want to get an illegal type - I've restricted myself to i32 or i64 for the moment. I could grab TLI and query that I suppose?

Hrmm. To have converted an instruction, you must have converted all uses, right?

In fact, because I'm using a MapVector and reversing through it I hit all nodes after their uses. So I just don't need the use_empty() check.

jmolloy updated this revision to Diff 20683.Feb 25 2015, 9:14 AM

This is directly recursive; that's likely a bad idea (especially considering that it has no depth cutoff). Can you make this be worklist-based instead. We don't need a small cutoff if we have a worklist.

Hrmm, I agree. It's slightly awkward because I do need the return value of traverse() right there and then, so converting it to be breadth first is going to be awkward.

I think I'll separate these out into two parts - first, scan up the use-def graph using a worklist to find all the nodes that I want to visit. Then, so a linear walk of the result back-to-front calculating node values.

For the moment I've added a cutoff of 32 instructions; I'll go work on reworking the algorithm now.

jmolloy updated this revision to Diff 20686.Feb 25 2015, 9:42 AM
reames added a subscriber: reames.Feb 25 2015, 11:22 AM

I'm going to deref to Hal on the code for the moment. It sounds like things are still in progress. Once you've got the approach nailed down, I'll try to take another look for nits.

At a high level, why are you restricting yourself to entire sequences? It seems like this would be profitable even if all you did was push the float to int conversion back a bit or the int to float forward a bit. This might even simplify the code.

Is the concern here that the backends might not make appropriate use of the floating point unit for integer ops?

lib/Transforms/Scalar/Float2Int.cpp
441

Can this be a range loop? Can you use a utility to delete things which are recursively trivially dead?

jmolloy updated this revision to Diff 20772.Feb 26 2015, 9:18 AM

Hi Hal, Philip,

This latest revision removes the recursion, replacing it with a slightly more complicated but still understandable two-phase approach.

Cheers,

James

Hi Philip,

I'm going to deref to Hal on the code for the moment. It sounds like things are still in progress. Once you've got the approach nailed down, I'll try to take another look for nits.

Thanks!

At a high level, why are you restricting yourself to entire sequences? It seems like this would be profitable even if all you did was push the float to int conversion back a bit or the int to float forward a bit. This might even simplify the code.

The short answer is "first do no harm". I flirted with the idea of doing what you suggest in previous patch incarnations, and had trouble working out when it was profitable.

In the current incarnation, the heuristic is "yes", which simplifies things quite a bit. By converting an entire chain, we:

  • Never add any instructions.
  • Never move any instructions.

The former might be required if we had to push a cast up past a PHI, and the latter might cause issues if we pushed a cold cast into a hot region. We'd have to be very careful to only move casts outside of loops, never inside, for example (unless we could remove them completely). This means that an iterative,greedy method is really out of the question and a whole-function analysis is required.

I still hope to be able to do what you want, maybe at a (much?) later stage. I hope that the current infrastructure is a good staging point for such a transform - really it's just adding a weighting to each node and a global sum determining if the transform is worth it. The mechanism shouldn't have to change, I think.

Can this be a range loop? Can you use a utility to delete things which are recursively trivially dead?

In this latest incarnation it's now a trivial loop, so I think using a utility isn't really required. Also, I know which nodes are dead; getting a utility to discover them isn't really required.

It can't be a range loop because I need to reverse-iterate over the container to ensure I hit uses before defs.

Cheers,

James

Hi Hal,

Gentle ping.

Cheers,

James

mkuper added a subscriber: mkuper.Mar 4 2015, 10:44 PM

Hi James,

Just so we have a record of what we talked about on IRC (and can give Hal a chance to disagree :-) ).
On x86, vector i64 muls can be much worse than vector double muls. Since this is pre-LoopV, and we don't know if we'll end up with vector or scalar code, I think the safe thing to do on x86 would be to disable this for cases where we'll do a double -> i64 transformation.

This means we should probably have a target hook for that that x86 can override.

hfinkel added inline comments.Mar 5 2015, 7:29 AM
lib/Transforms/Scalar/Float2Int.cpp
49

Don't put the "-" at the beginning of "-float2int-max-integer-bw".

66

Please make all of the functions here start with a lowercase letter.

75

Remove extra space before >

165

I don't like the terminology of calling this a 'forward' walk. It is walking instructions from the roots (which are at the end of the execution sequence), through their operands. I call this a 'backward' walk.

170

If you visit defs before uses, that seems 'forward' to me.

231

Remove blank line.

296

Remove blank line.

300

Remove blank line.

302

Remove blank line.

339

Remove blank line.

348

Remove blank line.

419

Where do you filter out MinBW > 64?

422

{ } not needed here.

473

Unnecessary line break?

test/Transforms/Float2Int/basic.ll
120

Please add negative test using a large integer type (i128, etc.).

Hi James,

Just so we have a record of what we talked about on IRC (and can give Hal a chance to disagree :-)

Good; I disagree :-)

The first question is answer is: What is the most useful and reasonable canonical form? The reason I support running this pass early in the pipeline is because I believe that demoting these int -> fp -> int sequences to int sequences, when semantically equivalent, is the most useful canonical form.

If it is useful, because of microarchitectural features, to use FP vector ops instead of integer vector ops, then that should be 'actively' handled later (instead of just taking advantage of it when it happens to happen).

So I think that this should run early by default, x86 included. We should also reverse the transformation later, perhaps within the vectorizer, using an actual cost model, if that proves useful.

).

On x86, vector i64 muls can be much worse than vector double muls. Since this is pre-LoopV, and we don't know if we'll end up with vector or scalar code, I think the safe thing to do on x86 would be to disable this for cases where we'll do a double -> i64 transformation.

This means we should probably have a target hook for that that x86 can override.

jmolloy updated this revision to Diff 22181.Mar 18 2015, 7:40 AM

Hi Hal,

Thanks for the review, sorry for the long round trip time. All comments should be fixed.

Cheers,

James

hfinkel accepted this revision.Mar 23 2015, 11:39 AM
hfinkel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Mar 23 2015, 11:39 AM
jmolloy closed this revision.Jul 17 2015, 2:21 AM