This is an archive of the discontinued LLVM Phabricator instance.

[MemCpyOpt] Use MaxIntSize in byte instead of bit
ClosedPublic

Authored by junbuml on May 11 2016, 10:40 AM.

Details

Summary

This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.

Diff Detail

Repository
rL LLVM

Event Timeline

junbuml updated this revision to Diff 56938.May 11 2016, 10:40 AM
junbuml retitled this revision from to [MemCpyOpt] Use MaxIntSize in byte instead of bit.
junbuml updated this object.
junbuml added reviewers: arsenm, mcrosier, mehdi_amini.
junbuml added a subscriber: llvm-commits.
mcrosier edited edge metadata.EditedMay 13 2016, 5:55 AM

In your test case aren't we merging these stores into a memset to be later lowered to a series of stores (or single 64-bit store) on most targets? The final code generated will be better in this case, because it can merge the 3 stores. In other cases I'm concerned the rather small memset might perturb optimizations that don't know how to deal with memsets. Don't we have a pass that merges stores into wider stores (e.g., SLP vectorizer or the counterpart to the LoadMerge pass or something)?

junbuml added a comment.EditedMay 13 2016, 8:01 AM

In your test case aren't we merging these stores into a memset to be later lowered to a series of stores (or single 64-bit store) on most targets? The final code generated will be better in this case, because it can merge the 3 stores.

Yes, for even small number of stores, isProfitableToUseMemset() try to see if we can reduce the total number of stores by merge them to memset which will be lowed later. But MaxIntSize should be in byte to make the heuristic correct.

In other cases I'm concerned the rather small memset might perturb optimizations that don't know how to deal with memsets.

Not sure exactly which pass, but I think it could be, if a pass gives up it's optimization when encountering a call instruction. If we concern merging small number of stores into memset here, we might need to discuss if doing this itself is profitable or not.

Don't we have a pass that merges stores into wider stores (e.g., SLP vectorizer or the counterpart to the LoadMerge pass or something)?

For sequential stores, it seams that the narrow stores are combined in DAG level :

store i16 0, i16* %0
store i16 0, i16* %1

into

STRWui %vreg1, %vreg0, 0;

As far as I check, however, this doesn't seem to handle the test case in this change, where I mixed the stored size : 2byte, 4byte, 2byte.

mehdi_amini added inline comments.May 13 2016, 8:33 AM
lib/Transforms/Scalar/MemCpyOptimizer.cpp
188 ↗(On Diff #56938)

Good catch!
Looks like a misnamed method, should be getLargestLegalIntTypeSizeInBits()

mcrosier accepted this revision.May 13 2016, 8:42 AM
mcrosier edited edge metadata.

LGTM. Thanks for the clarification, Jun.

My comments weren't in opposition to this patch, I was just commenting on peculiarities of the test case. Feel free to post a patch to rename the function as Mendi suggests. You might also investigate why the stores aren't being merged.

This revision is now accepted and ready to land.May 13 2016, 8:42 AM

My comments weren't in opposition to this patch, I was just commenting on peculiarities of the test case. Feel free to post a patch to rename the function as Mendi suggests. You might also investigate why the stores aren't being merged.

Thanks Chad. I see your point. From what you pointed out, I found potential improvement in DAG combine which merge :

store i16 0, i16* %0
store i16 0, i16* %1

into

STRWui

but

store i16 0, i16* %0
store i16 0, i16* %1
store i32 0, i32* %2

into

STRHHui
STRHHui
STRWui
This revision was automatically updated to reflect the committed changes.