This change fix the bug in isProfitableToUseMemset() where MaxIntSize shoule be in byte, not bit.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
In your test case aren't we merging these stores into a memset to be later lowered to a series of stores (or single 64-bit store) on most targets? The final code generated will be better in this case, because it can merge the 3 stores. In other cases I'm concerned the rather small memset might perturb optimizations that don't know how to deal with memsets. Don't we have a pass that merges stores into wider stores (e.g., SLP vectorizer or the counterpart to the LoadMerge pass or something)?
In your test case aren't we merging these stores into a memset to be later lowered to a series of stores (or single 64-bit store) on most targets? The final code generated will be better in this case, because it can merge the 3 stores.
Yes, for even small number of stores, isProfitableToUseMemset() try to see if we can reduce the total number of stores by merge them to memset which will be lowed later. But MaxIntSize should be in byte to make the heuristic correct.
In other cases I'm concerned the rather small memset might perturb optimizations that don't know how to deal with memsets.
Not sure exactly which pass, but I think it could be, if a pass gives up it's optimization when encountering a call instruction. If we concern merging small number of stores into memset here, we might need to discuss if doing this itself is profitable or not.
Don't we have a pass that merges stores into wider stores (e.g., SLP vectorizer or the counterpart to the LoadMerge pass or something)?
For sequential stores, it seams that the narrow stores are combined in DAG level :
store i16 0, i16* %0 store i16 0, i16* %1
into
STRWui %vreg1, %vreg0, 0;
As far as I check, however, this doesn't seem to handle the test case in this change, where I mixed the stored size : 2byte, 4byte, 2byte.
lib/Transforms/Scalar/MemCpyOptimizer.cpp | ||
---|---|---|
188 ↗ | (On Diff #56938) | Good catch! |
LGTM. Thanks for the clarification, Jun.
My comments weren't in opposition to this patch, I was just commenting on peculiarities of the test case. Feel free to post a patch to rename the function as Mendi suggests. You might also investigate why the stores aren't being merged.
My comments weren't in opposition to this patch, I was just commenting on peculiarities of the test case. Feel free to post a patch to rename the function as Mendi suggests. You might also investigate why the stores aren't being merged.
Thanks Chad. I see your point. From what you pointed out, I found potential improvement in DAG combine which merge :
store i16 0, i16* %0 store i16 0, i16* %1
into
STRWui
but
store i16 0, i16* %0 store i16 0, i16* %1 store i32 0, i32* %2
into
STRHHui STRHHui STRWui