This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use Demanded Bits and ValueTracking for reduction type-shrinking
ClosedPublic

Authored by mcrosier on Jan 19 2018, 11:20 AM.

Details

Summary

The type-shrinking logic in reduction detection, although narrow in scope, is also rather ad-hoc, which has led to bugs (e.g., PR35734). This patch modifies the approach to rely on the demanded bits and value tracking analyses, if available. We currently perform type-shrinking separately for reductions and other instructions in the loop. Long-term, we should probably think about computing minimal bit widths in a more complete way for the loops we want to vectorize.

Reference: https://bugs.llvm.org/show_bug.cgi?id=35734

Diff Detail

Event Timeline

mssimpso created this revision. Jan 19 2018, 11:20 AM
mssimpso edited the summary of this revision. (Show Details)

Hi Matthew,

thanks for taking care of this! I like the general idea of the fix but I have a concern regarding the TODO in line 408:

// TODO: We should not rely on InstCombine to rewrite the reduction in the
//       smaller type. We should just generate a correctly typed expression
//       to begin with.

The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
That would better align cost modeling with LV code generation, which is one of the major concerns with the current infrastructure.

In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
Diego

lib/Transforms/Utils/LoopUtils.cpp
168–169

(I don't know DB/ValueTracking in depth so maybe I'm missing something.)
If I understand correctly, we would try with value tracking in the following scenario:

117    MaxBitWidth = 64
...
127    MaxBitWidth = 33
...
129    MaxBitWidth = 64
...
132    First condition is true
}

But we wouldn't in the following one:

117    MaxBitWidth = 64
...
127    MaxBitWidth = 31
...
129    MaxBitWidth = 32
...
132    First condition is false
}

Is this the expected behavior? In other words, if DB returns a width narrower than the original one and it's later rounded up to the original width (first scenario), could value tracking return an even narrower width?

If so, shouldn't we always try value tracking, even when MaxBitWidth != DL.getTypeSizeInBits?
Otherwise, shouldn't we skip value tracking in those cases (first scenario)? We could invoke isPowerOf2_64 only once, at the end of the function.

386

may been -> may have been?

Hi Matthew,

thanks for taking care of this! I like the general idea of the fix but I have a concern regarding the TODO in line 408:

// TODO: We should not rely on InstCombine to rewrite the reduction in the
//       smaller type. We should just generate a correctly typed expression
//       to begin with.

The cost model is relying on an optimization that will hopefully be applied by InstCombine. I wonder if it would be too complicated to implement the actual type shrinking in LV code gen as part of the fix.
That would better align cost modeling with LV code generation, which is one of the major concerns with the current infrastructure.

In the future VPlan infrastructure, we will definitely need to address this optimization as a VPlan-to-VPlan transformation. Thanks for bringing up this issue.
Diego

Yes, my comment was intended to draw attention to something that needs to be addressed in the future VPlan work. I think we will want to generate the actual type-shrunk code instead of inserting trunc/ext pairs and assuming InstCombine will do what we want. We also generate trunc/ext pairs in SLP for type-shrinking, which I think needs to be addressed as well. As you point out, this will better align the cost model with code generation.

lib/Transforms/Utils/LoopUtils.cpp
168–169

Demanded bits and value tracking will report different bit widths, in general. My thinking was that if demanded bits is able to limit the bit width at all, we use that width. Otherwise, we can try to do something with value tracking, which I think can be a more expensive analysis.

So I think what we should do there is move the isPowerOf2_64 round-ups to the very end. The MaxBitWidth == DL.getTypeSizeInBits(Exit->getType()) condition will then check if demanded bits was able to tell us anything before we do the rounding.

386

Nice catch!

Hi Chad,

I was waiting for the comments in lines 132 and 389 to be addressed.
Other than that, LGTM.

Thanks,
Diego

mcrosier commandeered this revision.Feb 2 2018, 8:08 AM
mcrosier updated this revision to Diff 132601.
mcrosier added a reviewer: mssimpso.

Address Diego's comments.

mcrosier marked 4 inline comments as done. Edited Feb 2 2018, 8:09 AM

Hi Diego,
Matt will be out of the office for a few weeks and he asked me to follow up on this patch. Hopefully, the new version addresses all of your concerns.

Regards,
Chad

dcaballe accepted this revision.Feb 2 2018, 2:12 PM

LGTM

This revision is now accepted and ready to land. Feb 2 2018, 2:12 PM
This revision was automatically updated to reflect the committed changes.