Page MenuHomePhabricator

[instcombine] Optimise for zero initialisation of product given fast flags are enabled
ClosedPublic

Authored by zjaffal on Aug 11 2022, 5:34 AM.

Details

Summary

Currently, clang ignores the 0 initialisation in finite math
For example:

double f_prod = 0;
double arr[1000];
for (size_t i = 0; i < 1000; i++) {
  f_prod *= arr[i];
 }

Clang will ignore that f_prod is set to zero and it will generate assembly to iterate over the loop.

Diff Detail

Event Timeline

zjaffal created this revision.Aug 11 2022, 5:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 11 2022, 5:34 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
zjaffal requested review of this revision.Aug 11 2022, 5:34 AM
fhahn added inline comments.
llvm/test/Transforms/InstCombine/remove-loop-phi-fastmul.ll
10

It would be good to split off the test changes to a separate patch that just adds the tests.

20

Would be good to use 'rotated' version of the loop to keep things simpler.

37

We should also have tests where all the incoming values are instructions or arguments. We also need some tests with phis with more than 2 incoming values.

zjaffal retitled this revision from [opt] Optimise for zero initialisation of product given finite math in Clang to [instcombine] Optimise for zero initialisation of product given finite math in Clang.Aug 12 2022, 3:33 AM
spatel added inline comments.Aug 12 2022, 4:55 AM
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1476 ↗(On Diff #451814)
zjaffal added inline comments.Aug 15 2022, 3:02 AM
llvm/lib/Transforms/InstCombine/InstCombinePHI.cpp
1476 ↗(On Diff #451814)

I tested using llvm::matchSimpleRecurrence. Here is what I found

  1. We need to move the optimisation into InstCombineMulDivRem instead which is fine.
  2. We need to add a case into llvm::matchSimpleRecurrence to handle FMul
  3. The main issue of using matchSimpleRecurrence is that it doesn't handle phi with more than two operands.

I am not sure how common phi's with more than two operands are.

zjaffal updated this revision to Diff 452620.Aug 15 2022, 3:26 AM
  • Move functionality to InstCombineMulDivRem.cpp
  • Use matchSimpleRecurrence
spatel added inline comments.Aug 15 2022, 6:14 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
489

Checking for no-inf is correct, but not strictly necessary since 0.0 * Inf = NaN and we check 'nnan' (see InstSimplify for the existing constraints for the fold of X*0.0 to 0.0).

491

We don't need to walk through the phi operands again - the zero we want had to be captured as "Start"?

if (matchSimpleRecurrence(&I, Phi, Start, Step) && match(Start, m_Zero()) &&
    I.hasNoInfs() && I.hasNoNaNs() && I.hasNoSignedZeros())
  return replaceInstUsesWith(I, Start);
llvm/test/Transforms/InstCombine/remove-loop-phi-fastmul.ll
23

This test should have the minimal FMF needed to trigger the transform.

130–131

I don't think this test is adding value. It just shows the expected X*0.0 simplification to 0.0?

All of the tests should be reduced to the minimum IR needed to show the transform. Looking at existing tests in the file "recurrence.ll" might be useful.

If you want to verify that a larger example reduces with the usual opt pipeline, it would be better to add a test under "test/Transforms/PhaseOrdering".

zjaffal updated this revision to Diff 452718.Aug 15 2022, 9:42 AM

Updating D131672: [instcombine] Optimise for zero initialisation of product given finite math in Clang

zjaffal retitled this revision from [instcombine] Optimise for zero initialisation of product given finite math in Clang to [instcombine] Optimise for zero initialisation of product given fast flags are enabled.Aug 15 2022, 10:10 AM
zjaffal marked 7 inline comments as done.
spatel added inline comments.Aug 15 2022, 11:37 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492

Was there a reason to not use the shorter and suggested m_Zero() matcher?
It probably doesn't make a difference for this particular pattern, but that code handles things like vector constants, so we won't miss any unusual types.

zjaffal added inline comments.Aug 15 2022, 11:47 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492

I didn't know it existed. I will push a change now

zjaffal updated this revision to Diff 452759.Aug 15 2022, 11:48 AM

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

zjaffal marked an inline comment as done.Aug 16 2022, 12:45 AM
zjaffal updated this revision to Diff 452934.Aug 16 2022, 2:54 AM

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

spatel accepted this revision.Aug 16 2022, 4:55 AM

LGTM - see inline comment for formatting nit.
The tests seem fine too, but let @fhahn have another look in case there are any other suggestions.

I'm not sure what source causes this pattern, but there could be related fast-math simplifications like X * 1.0 --> X.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
492

Remove braces around one-line 'if' clause to be consistent with surrounding code.

This revision is now accepted and ready to land.Aug 16 2022, 4:55 AM
zjaffal updated this revision to Diff 452993.Aug 16 2022, 7:04 AM

remove brackets around if statement

fhahn added a comment.Aug 16 2022, 7:53 AM

Thanks for the update! Could you also update the description to describe the transformation implemented, with the legality considerations as well?

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
487

Could you add a comment explaining the transform we are applying here? Also, it would probably be good to move this more towards the end of the function, so cheaper patterns are tried first.

zjaffal updated this revision to Diff 453038.Aug 16 2022, 9:14 AM

Updating D131672: [instcombine] Optimise for zero initialisation of product given fast flags are enabled

@zjaffal Please can you rebase against trunk again? D131838 managed to affect some of your tests

fhahn accepted this revision.Aug 17 2022, 2:35 AM

LGTM, thanks!

LGTM - see inline comment for formatting nit.
The tests seem fine too, but let @fhahn have another look in case there are any other suggestions.

I'm not sure what source causes this pattern, but there could be related fast-math simplifications like X * 1.0 --> X.

The source was a user report. I am not sure about FMUL recurrences that have 1.0 as start value. But I think we might want the same fold for integer multiply as follow up.

This revision was landed with ongoing or failed builds.Aug 17 2022, 3:12 AM
This revision was automatically updated to reflect the committed changes.