Page MenuHomePhabricator

Add getMostFrequentByte and use for isBytewiseValue implementation
Needs ReviewPublic

Authored by vitalybuka on Jul 2 2019, 4:16 PM.

Details

Reviewers
pcc
eugenis
Summary

getMostFrequentByte is needed to optimize emitStoresForConstant in CGDecl.cpp for pattern initialization.
It should not affect compiled code. Different algorithm but isBytewiseValue behavior should be unchanged.

From @pcc prototype for -ftrivial-auto-var-init=pattern optimizations

Event Timeline

vitalybuka created this revision.Jul 2 2019, 4:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2019, 4:16 PM
vitalybuka updated this revision to Diff 207654.Jul 2 2019, 4:26 PM

Fix compilation warnings

bjope added a subscriber: bjope.Jul 3 2019, 9:49 AM

What is the idea here? Just a refactoring to get rid of recursive calls? Or are you going to use the histogram for some other purposes?
(the patch is lacking description, but perhaps it's just a temporary hack an not something that is subject for submitting?)

FWIW, downstream we have 16-bit addressable units and we are tweaking isBytewiseValue to also support "16-bit bytes" in our fork. So far it has been quite simple to do such adjustments, and maybe it will be doable by using a 65536+1 entries large histogram as well.
I just got curious to what the plan is here (to understand if it is worth spending time on continuing to adapt to the upstream changes or if we simply should keep our current version that should be working for 16-bit splats).

vitalybuka edited the summary of this revision. (Show Details)Jul 3 2019, 9:57 AM
vitalybuka edited the summary of this revision. (Show Details)
vitalybuka edited the summary of this revision. (Show Details)
bjope added inline comments.Jul 3 2019, 10:04 AM
llvm/lib/Analysis/ValueTracking.cpp
3206

Not sure that we are guaranteed to get Size number of zeroes here. Not unless we also add a check for DL.typeSizeEqualsStoreSize(V->getType). Consider if we for example have an i12, then only 12 bits out of 16 is known to be zero.

(this might be a bug on trunk as well, or maybe I've missed some condition somewhere regarding how this is used)

What is the idea here? Just a refactoring to get rid of recursive calls? Or are you going to use the histogram for some other purposes?
(the patch is lacking description, but perhaps it's just a temporary hack an not something that is subject for submitting?)

Added comment into description, I hope to upload the patch which is going to use new function today.

FWIW, downstream we have 16-bit addressable units and we are tweaking isBytewiseValue to also support "16-bit bytes" in our fork. So far it has been quite simple to do such adjustments, and maybe it will be doable by using a 65536+1 entries large histogram as well.
I just got curious to what the plan is here (to understand if it is worth spending time on continuing to adapt to the upstream changes or if we simply should keep our current version that should be working for 16-bit splats).

I don't plan other changes in this code after this patch stack.

bjope added a comment.Jul 3 2019, 10:06 AM

What is the idea here? Just a refactoring to get rid of recursive calls? Or are you going to use the histogram for some other purposes?
(the patch is lacking description, but perhaps it's just a temporary hack an not something that is subject for submitting?)

Added comment into description, I hope to upload the patch which is going to use new function today.

FWIW, downstream we have 16-bit addressable units and we are tweaking isBytewiseValue to also support "16-bit bytes" in our fork. So far it has been quite simple to do such adjustments, and maybe it will be doable by using a 65536+1 entries large histogram as well.
I just got curious to what the plan is here (to understand if it is worth spending time on continuing to adapt to the upstream changes or if we simply should keep our current version that should be working for 16-bit splats).

I don't plan other changes in this code after this patch stack.

Ok. Thanks!

vitalybuka marked an inline comment as done.Jul 3 2019, 10:36 AM
vitalybuka added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3206

yes, it's not precise, as not precise the ConstantFP handling below and isSplat(8) handling in ConstantInt
I guess it does not change behavior of IsBytewiseValueTest, and it's good enough for my future patch.

Also I think this is the best behavior for current users. If we memset zero there then we should be good. Same does not work below for (CI->getBitWidth() % 8) and for non zero initiazer, we had to give up and count it against Other.

If we count non aligned isNullValue against "Other", it will change behavior of IsBytewiseValueTest.

vitalybuka edited the summary of this revision. (Show Details)Jul 10 2019, 4:02 PM
vitalybuka edited the summary of this revision. (Show Details)

I'm a bit worried about performance implications.
isBytewiseValue is now doing some clearly unnecessary work.
Could you sanity check that it does not affect compilation times?

llvm/lib/Analysis/ValueTracking.cpp
3172

use std::array

jfb added a subscriber: jfb.Jul 12 2019, 12:42 PM
jfb added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3182

Should Undef count as Other? Or have its own count?

3206

I think it's worth a comment on over-estimate, but it seems fine as-is.

vitalybuka marked an inline comment as done.Jul 12 2019, 1:15 PM
vitalybuka added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3182

Undef means we don't care what it's there, so we can fill we any pattern.
Other counts bytes which are different from most frequent value and will have to be fixed, e.g. after memset

eugenis added inline comments.Jul 12 2019, 1:59 PM
llvm/lib/Analysis/ValueTracking.cpp
3182

Should we count undefs separately in the histogram? It seems like it would not add any complexity or runtime cost, and the caller can just add the two number if that's what they need.

jfb added inline comments.Jul 12 2019, 2:12 PM
llvm/lib/Analysis/ValueTracking.cpp
3182

Right, it seems like we might want the following information:

  • Value is one of 256 bytes were tracking.
  • Value wasn't something we looked into, for X bytes.
  • Value was undef.
  • Value was padding (same as undef).

We can figure out some of this, but not all, from the current patch. Is it useful to know the rest? I'm not sure! I haven't inspected all uses to be able to convince myself.

vitalybuka marked an inline comment as done.Jul 12 2019, 3:20 PM
vitalybuka added inline comments.
llvm/lib/Analysis/ValueTracking.cpp
3182

Right now I only use "count" and "other". Future users can easily count and return undefs and update interface if needed. But I would prefer to do so when needed. Also maybe at some point it would be useful to return entire histogram.

But I am OK to include undef if you believe it could be used soon.