This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Simplify and correct folding fcmps with the same children
ClosedPublic

Authored by timshen on Jun 27 2016, 5:11 PM.

Details

Summary

Take advantage of FCmpInst::Predicate's bit pattern and handle (fcmp *, x, y) | (fcmp *, x, y) and (fcmp *, x, y) & (fcmp *, x, y) more consistently. Also fold more FCmpInst::FCMP_FALSE and FCmpInst::FCMP_TRUE to constants.

Currently InstCombine wrongly folds (fcmp ogt, x, y) | (fcmp ord, x, y) to (fcmp ogt, x, y); this patch also fixes that.

Diff Detail

Repository
rL LLVM

Event Timeline

timshen updated this revision to Diff 62046.Jun 27 2016, 5:11 PM
timshen retitled this revision from to [InstCombine] Simplify and correct folding fcmps with the same children.
timshen updated this object.
timshen added a reviewer: spatel.
timshen added subscribers: echristo, iteratee, llvm-commits.
spatel edited edge metadata.Jun 28 2016, 11:55 AM

This looks fantastic...if it's true. :)

So that's my problem: before we even get to this patch, we only have ~13 tests covering the 256 possible (16 * 16 predicates * 2 (and+or) / 2 for commutation) combinations of predicates.

Is there some clever way to get more coverage from those tests? If there's nothing obvious, we might as well just copy/paste the entire set into the existing regression test files. It's cheap. That way, we're certain that we're choosing the right predicate in all cases. The regression tests would then also serve as an unofficial assert that the enum values never change from their perfectly chosen values. IMO, an official assert and some loud comments in the code are warranted too.

Whether or not we agree that we should have the full regression test coverage, please note that there's a script to generate exact checks for regression tests:
$ utils/update_test_checks.py --help

This removes the need for CHECK-NOTs.

I've updated the old tests using that script in rL274046 and rL274047 .

If there's nothing obvious, we might as well just copy/paste the entire set into the existing regression test files.

I guess I missed a bit of context. Copy from where? By saying regression test files do you mean test/Transforms/InstCombine/and-fcmp.ll and test/Transforms/InstCombine/or-fcmp.ll?

Besides, auto-generate the CHECKs (assertions?) won't help with the LLVM code coverage, will it? If not, should we generate all 512 (regardless of the commutations) cases, since it's not that many? But then we have to test the generator... :P.

If there's nothing obvious, we might as well just copy/paste the entire set into the existing regression test files.

I guess I missed a bit of context. Copy from where? By saying regression test files do you mean test/Transforms/InstCombine/and-fcmp.ll and test/Transforms/InstCombine/or-fcmp.ll?

Yes, the files under the 'test' dir are the regression tests:
http://llvm.org/docs/TestingGuide.html#llvm-testing-infrastructure-organization

Sorry, 'copy/paste' wasn't the best phrase. I just meant we should stamp out all N of the logical combinations of predicates. Right now, we don't really know what the existing behavior for each combination looks like.

Besides, auto-generate the CHECKs (assertions?) won't help with the LLVM code coverage, will it? If not, should we generate all 512 (regardless of the commutations) cases, since it's not that many? But then we have to test the generator... :P.

I don't understand what you mean by 'test the generator'?

Here's how I would proceed:

  1. Create all N tests in the existing test files. I don't think there's much value in the commuted variants, so this would actually be N = 16*17 = 272 I think.
  2. Run the script to generate the CHECK lines.
  3. Commit these test changes to trunk.
  4. Now, we know what predicates the existing code produces (hopefully there are no existing bugs).
  5. Apply your code patch.
  6. Regenerate the CHECK lines using the script.
  7. Update the patch here with those diffs, so we can see any functional differences from your change in logic.
  1. Create all N tests in the existing test files. I don't think there's much value in the commuted variants, so this would actually be N = 16*17 = 272 I think.
  2. Run the script to generate the CHECK lines.
  3. Commit these test changes to trunk.

I prefer not to commit this to the trunk, but only locally, since @t6 in or-fcmp.ll is testing an actual existing bug (sorry for not put that explicitly). It'd be weird to check-in something that is known wrong, and make them pass later.

How about I compare the auto-generated CHECKs and post the functional differences here in the comments, we inspect it, and then check-in the patch with all N tests (with correct CHECKs) together?

timshen updated this object.Jun 28 2016, 5:40 PM
timshen edited edge metadata.

The functional change is here:
https://gist.github.com/timshen91/51b209c8d1be22ef96a1a9fd4131595d#file-fcmp-diff-diff

I verified the differences, and considered them valid. PTAL. Specifically, there are bug fixes from line 2586 to line 2643; others are just optimizations.

timshen updated this revision to Diff 62162.Jun 28 2016, 6:33 PM

Updated the patch with all 272 tests.

Sounds good. Thanks for checking.

timshen updated this revision to Diff 62196.EditedJun 29 2016, 3:15 AM

Updated comments, and move the changed (perfectly handled actually) solutions prior to other heuristics. The move catches two more optimizations, see a local diff:

diff --git a/test/Transforms/InstCombine/and-fcmp.ll b/test/Transforms/InstCombine/and-fcmp.ll
index 8ad6779..fa9e441 100644
--- a/test/Transforms/InstCombine/and-fcmp.ll
+++ b/test/Transforms/InstCombine/and-fcmp.ll
@@ -546,10 +546,8 @@ bb:
 define i1 @auto_gen_35(double %a, double %b) {
 ; CHECK-LABEL: @auto_gen_35(
 ; CHECK-NEXT:  bb:
-; CHECK-NEXT:    [[CMP:%.*]] = fcmp ord double %a, %b
-; CHECK-NEXT:    [[CMP1:%.*]] = fcmp ord double %a, %b
-; CHECK-NEXT:    [[RETVAL:%.*]] = and i1 [[CMP]], [[CMP1]]
-; CHECK-NEXT:    ret i1 [[RETVAL]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp ord double %a, %b
+; CHECK-NEXT:    ret i1 [[TMP0]]
 ;
 bb:
   %cmp = fcmp ord double %a, %b
diff --git a/test/Transforms/InstCombine/or-fcmp.ll b/test/Transforms/InstCombine/or-fcmp.ll
index f961cfa..9d743c7 100644
--- a/test/Transforms/InstCombine/or-fcmp.ll
+++ b/test/Transforms/InstCombine/or-fcmp.ll
@@ -1579,10 +1579,8 @@ bb:
 define i1 @auto_gen_119(double %a, double %b) {
 ; CHECK-LABEL: @auto_gen_119(
 ; CHECK-NEXT:  bb:
-; CHECK-NEXT:    [[CMP:%.*]] = fcmp uno double %a, %b
-; CHECK-NEXT:    [[CMP1:%.*]] = fcmp uno double %a, %b
-; CHECK-NEXT:    [[RETVAL:%.*]] = or i1 [[CMP]], [[CMP1]]
-; CHECK-NEXT:    ret i1 [[RETVAL]]
+; CHECK-NEXT:    [[TMP0:%.*]] = fcmp uno double %a, %b
+; CHECK-NEXT:    ret i1 [[TMP0]]
 ;
 bb:
   %cmp = fcmp uno double %a, %b
  1. Create all N tests in the existing test files. I don't think there's much value in the commuted variants, so this would actually be N = 16*17 = 272 I think.
  2. Run the script to generate the CHECK lines.
  3. Commit these test changes to trunk.

I prefer not to commit this to the trunk, but only locally, since @t6 in or-fcmp.ll is testing an actual existing bug (sorry for not put that explicitly). It'd be weird to check-in something that is known wrong, and make them pass later.

How about I compare the auto-generated CHECKs and post the functional differences here in the comments, we inspect it, and then check-in the patch with all N tests (with correct CHECKs) together?

I started looking over the diffs in the gist, but I'm going to suggest again that you check in the tests with current output as a preliminary step to this patch. IMO, the fact that we have miscompiles is more reason to do it that way. The advantages are:

  1. There's a direct before/after comparison in the commit log of the functional changes from this patch.
  2. Post-commit reviewers can easily parse those diffs inline in email.
  3. It's also easier for us to review the test diffs here in Phab in one shot alongside the code changes.

There should be no concern about documenting buggy behavior in a regression test; I do this all the time. Then there's no guesswork about the old codegen. Just add "; FIXME: miscompile" or "; FIXME: missed fold" before tests that you know are misbehaving.

To substantially reduce the size of the patch: remove the "bb:" label from each test and the corresponding CHECK line. These are all single basic-block tests, so there's no value in those lines.

spatel added inline comments.Jun 29 2016, 11:21 AM
test/Transforms/InstCombine/and-fcmp.ll
17–18 ↗(On Diff #62239)

As you noted, these renaming changes in the scripted output are caused because the script uses the variable names in the IR to create FileCheck variable names.

The reason the IR names change with your patch is because the old code would reuse existing instructions, for example:

if (Op0CC == FCmpInst::FCMP_TRUE)
  return RHS;

...but the new code always creates a new instruction. So the old checks show that the instruction has the same variable name as the original test case, but the new checks show a new temp name (%1).

For the sake of cleanliness (and I think Eric was suggesting the same thing), can you remove these diffs? This could be done by either checking in the NFC changes before this patch or removing these hunks from this patch.

if (Op0CC == FCmpInst::FCMP_TRUE)

return RHS;

I see. I created D21855.

spatel accepted this revision.Jun 29 2016, 12:27 PM
spatel edited edge metadata.

Some good boolean logic puzzles in these changes. :)
See inline comments for some nits, then LGTM.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
46 ↗(On Diff #62263)

Take the advantage -> Take advantage

49 ↗(On Diff #62263)

I would remove the message parameter from all of these static_asserts to tighten it up. If someone tries to change these values, it should be obvious why we don't want that. On that note, can you add a comment to the definition of CmpInst::Predicate in IntsrTypes.h that says something like, "These enum values have the magical property of matching the logical and/or of the predicates. Any changes to these values will require changes to InstCombine."

96 ↗(On Diff #62263)

...or a constant true/false value.

1125–1133 ↗(On Diff #62263)

This comment and the similar one below didn't make anything clearer to me. I would leave it out, but that's just my suggestion.

This revision is now accepted and ready to land.Jun 29 2016, 12:27 PM
timshen updated this revision to Diff 62272.Jun 29 2016, 12:56 PM
timshen marked 4 inline comments as done.
timshen edited edge metadata.

Updated comments as suggested.

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
1125–1133 ↗(On Diff #62263)

The boolean logic deduction is quite different though. :)

I removed the first duplicated paragraph in FoldOrOfFCmps.