This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Make sure instructions are ordered when computing spill cost.
ClosedPublic

Authored by fhahn on Jun 24 2020, 2:37 AM.

Details

Summary

The entries in VectorizableTree are not necessarily ordered by their
position in basic blocks. Collect them and order them by dominance so
later instructions are guaranteed to be visited first. For instructions
in different basic blocks, we only scan to the beginning of the block,
so their order does not matter, as long as all instructions in a basic
block are grouped together. Using dominance ensures a deterministic order.

The modified test case contains an example where we compute a wrong
spill cost (2) without this patch, even though there is no call between
any instruction in the bundle.

This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

Diff Detail

Event Timeline

fhahn created this revision.Jun 24 2020, 2:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 2:37 AM
ABataev added inline comments.Jun 24 2020, 5:23 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3769–3778

Can we use a set instead of the vector with sort?

fhahn marked an inline comment as done.Jun 25 2020, 5:52 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3769–3778

We could use set with dominates() as comparator I think. I am not sure if there would be a benefit of doing so for the use locally in the function, but maybe we could use it to keep VectorizableTree in order to start with? Not entirely sure what impact that would have at other places, but it might be worth exploring as follow-up?

nikic added a subscriber: nikic.Jul 3 2020, 5:07 AM

This seems to have limited practical impact, .e.g on X86 with a recent
Intel Xeon CPU with -O3 -march=native -flto on MultiSource,SPEC2000,SPEC2006
there are no binary changes.

That's not particularly surprising, as the spill cost is always zero on X86. AArch64 is the only architecture that defines getCostOfKeepingLiveOverCall().

I had a different patch for this issue in D64523, but your solution to the problem is much simpler :)

This revision is now accepted and ready to land.Jul 3 2020, 7:05 AM
vdmitrie added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3777

There is a problem with this predicate function.
MSFT STL implementation of stable_sort asserts that if predicate returned true
then it must return false when operands are swapped.
But (A dom B) == false does not necessarily mean that (B dom A) == true.
Instead: (A dom B) ==true means that (B dom A) == false.
Rewriting it like this solves this issue:

return DT->dominates(B, A);
xbolva00 added inline comments.Aug 8 2020, 6:06 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3777

@fhahn try to fix it and reland.

Seems like this is a reason why a win buildbot failed.

xbolva00 added a comment.EditedAug 11 2020, 2:39 AM

Seems like ClamAV has now compile time regression ~ 0.1-0.2 % (http://llvm-compile-time-tracker.com/compare.php?from=3ce57e012110519c1d3a49fc98959a64634d5d8f&to=36e1fc5f68e918ba69ccd9033b38240265617c4e&stat=instructions&details=on)

CMakeFiles/clamscan.dir/libclamav_mspack.c.o 6432M 6472M (+0.62%)
CMakeFiles/clamscan.dir/libclamav_nsis_LZMADecode.c.o 730M 738M (+1.09%)

CMakeFiles/clamscan.dir/libclamav_upx.c.o 1213M 1272M (+4.87%)

cc @nikic