[InstCombine] fadd double (sitofp x), y check that the promotion is valid
ClosedPublic

Authored by apilipenko on Mar 21 2017, 5:00 AM.

Details

Summary

Doing these transformations check that the result of integer addition is representable in the FP type.
(fadd double (sitofp x), fpcst) --> (sitofp (add int x, intcst))
(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))

This is a fix for https://bugs.llvm.org//show_bug.cgi?id=27036

Diff Detail

Repository
rL LLVM
apilipenko created this revision.Mar 21 2017, 5:00 AM
boris.ulasevich added inline comments.
test/Transforms/InstCombine/add-sitofp.ll
16 ↗(On Diff #92458)

Why do we need lshr here? %n + 1 = ((%m >>> 24) && %b) + 1 takes range 1..256 - it fits quite well to 23-bit single precision mantissa, and optimisation can be applied here, isn't it?
Does it make sense for you? Do I miss something?

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

arsenm added a subscriber: arsenm.Mar 21 2017, 8:04 AM

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

I would say no. For example on AMDGPU a f32 add is exactly as fast as an i32 add, but the i32 add has an additional carry output constraint to deal with

The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

No. There exist GPU architectures where a 32b integer addition is a more expensive operation than a 32b FP add.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1401 ↗(On Diff #92458)

s/mantissa/significand/ (https://en.wikipedia.org/wiki/Significand#Use_of_.22mantissa.22)

Also, semanticsPrecision returns the number of bits in the significand (which includes the implicit "integral" bit when present). There's no "plus one for the sign bit".

scanon added inline comments.Mar 21 2017, 9:38 AM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1429 ↗(On Diff #92458)

Aren't we missing a check that RHSIntVal can be exactly in the floating-point type here? I.e. if (RHSIntVal->getType()->getIntegerBitWidth() <= MaxRepresentableBits), mirroring the check for LHS above?

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗(On Diff #92458)

This check seems more conservative than we'd want it to be. For instance, in the example in the comments we know that 'x & 1234' is can be represented as a 12-bit signed integer, but if x is i32 this code will return 32 for LHSIntVal->getType()->getIntegerBitWidth(). This directly relates to Boris' comment on the test case.

1410 ↗(On Diff #92458)

The MaxRepresentableBits calculation and comparison won't be necessary if neither this condition nor the condition at line 1424 is true. I think the code should be organized to check these dyn_cast conditions first, particularly if you are going to do something more complicated with the number of bits required.

1413 ↗(On Diff #92458)

This condition will block either transformation and so it could be checked at line 1395 and save us a bit of work sometimes.

test/Transforms/InstCombine/add-sitofp.ll
21 ↗(On Diff #92458)

Can you add a negative test for the '(fadd double (sitofp x), (sitofp y)) --> (sitofp (add int x, y))' case also?

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1414 ↗(On Diff #92458)

It looks like getFPToSI and getSIToFP will end up using the APFloat class. Wouldn't it be better to just use APFloat here and call APFloat::isInteger()?

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper the second combine also saves us one sitofp conversion. It should be also taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding on the profitability of these combines I decided to go ahead with the fix rather then removing them for example.

Before getting to any details about the patch, we need to address the question raised in PR27036: why are we doing this transform in InstCombine at all? The assumption is that an integer add is more canonical and/or cheaper than an FP add. Is that universally true?

It's difficult to tell. Even if FP addition is cheaper the second combine also saves us one sitofp conversion. It should be also taken into account.

These combines are buggy and there is a simple fix. Since I don't have a good understanding on the profitability of these combines I decided to go ahead with the fix rather then removing them for example.

Agreed - solving the miscompiling is the most important thing. Since we can do that quickly with a small patch, let's go ahead. We can decide what is canonical form as a next step, but please add a 'TODO' comment about that.

This example leads into the same gray area we have with several instcombine folds: it's not clear if the value-track-ability is enough to justify the transform, whether a backend should be responsible for fixing something that is not optimal for all targets, if the removal of an IR instruction overrides either of those...

apilipenko marked 3 inline comments as done.

Address review comments.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗(On Diff #92458)

Given that we are uncertain if we need this transform at all (see the discussion above) I'd prefer not to make it too smart without a good reason. So, unless we have a clear reason why we want this code to be more aggressive I'd prefer to keep this fix as simple as it is now.

For more aggressive version of this transform we should rely on Float2Int pass and make it smarter if needed.

1413 ↗(On Diff #92458)

The second transform looks for either RHSConv or LHSConv to have one use, so it doesn't always block the second transform.

1414 ↗(On Diff #92458)

It sounds like a separate refactoring to the existing code.

1429 ↗(On Diff #92458)

We should be fine as is. I assume that the LHS, RHS of the addition have the same types (there can be an assert) and there is also a check below that integer types are the same.

test/Transforms/InstCombine/add-sitofp.ll
16 ↗(On Diff #92458)

I just duplicated the example above. I agree it is confusing in this context. I'll simplify the negative test case to make it more clear that the result of the operation is not guaranteed to fit into float type.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1404 ↗(On Diff #92458)

That seems reasonable, but can you add a comment indicating that it's being conservative and that the example transformation might not happen?

1413 ↗(On Diff #92458)

I see. I missed that.

1414 ↗(On Diff #92458)

Yes, that's true.

apilipenko added a comment.EditedThu, Mar 30, 2:46 AM

Ping.

Update, I didn't see that there was a request to add a comment. Will do and update the patch,

This addresses my concerns. Gentle ping to the other reviewers.

See inline for some nits.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1402–1403 ↗(On Diff #92843)

Is that clang-formatted?

Please upload the version with the TODO/FIXME comments.

test/Transforms/InstCombine/add-sitofp.ll
17 ↗(On Diff #92843)

"highes" -> "highest"; same typo is repeated below.

Also, I appreciate changing this test file to use FileCheck, but could you run utils/update_test_checks.py instead of hand-editing the CHECK lines? That makes the assertions stronger and easier to update (we're pretty sure we'll want to update these one way or another).

apilipenko updated this revision to Diff 95113.Thu, Apr 13, 6:06 AM

Add the comment about overly conservative legality check, add the test demonstrating it.

apilipenko marked 3 inline comments as done.Thu, Apr 13, 6:07 AM

The code changes look good to me, but I'm not sure if the tests cover the case that Andy requested, so I'll let him comment.

@andrew.w.kaylor - ping, do you have any comments about the latest revision?

andrew.w.kaylor accepted this revision.Fri, Apr 21, 10:32 AM

I'm happy with this. Thanks for the improvements!

This revision is now accepted and ready to land.Fri, Apr 21, 10:32 AM
This revision was automatically updated to reflect the committed changes.