This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix compile time issue in recursive CTR analysis code
ClosedPublic

Authored by tejohnson on Mar 6 2020, 5:53 PM.

Details

Summary

Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.

The start of the expression (created by IndVarSimplify) looked like:

%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...

with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.

Diff Detail

Event Timeline

tejohnson created this revision.Mar 6 2020, 5:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 6 2020, 5:53 PM
MaskRay added a subscriber: MaskRay.Mar 6 2020, 6:36 PM
MaskRay added inline comments.
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
508

How about SmallPtrSet<const Value *, 4> (or 4 -> another integer)?

tejohnson marked an inline comment as done.Mar 6 2020, 7:35 PM
tejohnson updated this revision to Diff 248880.Mar 6 2020, 7:35 PM

Address comment

jsji added a reviewer: Restricted Project.Mar 6 2020, 7:52 PM
jsji added a project: Restricted Project.
MaskRay added a comment.EditedMar 10 2020, 1:25 PM

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

tejohnson retitled this revision from [PowerPC] Fix compile time issue to [PowerPC] Fix compile time issue in recursive CTR analysis code.Mar 10 2020, 1:44 PM

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

Updated the title.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

No, the constant itself was quite simple. Importing it simply enabled some additional optimization (in IndVarSimplify) that created a huge instruction expression. It was iterating that expression, not the constant itself, that is so slow.

MaskRay accepted this revision.Mar 10 2020, 2:18 PM
This revision is now accepted and ready to land.Mar 10 2020, 2:18 PM

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

Updated the title.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

No, the constant itself was quite simple. Importing it simply enabled some additional optimization (in IndVarSimplify) that created a huge instruction expression. It was iterating that expression, not the constant itself, that is so slow.

It'd be nice to give a (reduced) example in the description, even if testing it reliably may be unrealistic.

tejohnson edited the summary of this revision. (Show Details)Mar 10 2020, 7:06 PM

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

Updated the title.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

No, the constant itself was quite simple. Importing it simply enabled some additional optimization (in IndVarSimplify) that created a huge instruction expression. It was iterating that expression, not the constant itself, that is so slow.

It'd be nice to give a (reduced) example in the description, even if testing it reliably may be unrealistic.

I showed the start of the expression that causes the analysis to re-analyze the same operand many times.

This revision was automatically updated to reflect the committed changes.