This is an archive of the discontinued LLVM Phabricator instance.

SLPVectorizer.cpp: Ensure SLPVectorizer can visit each block by dominated order
Needs RevisionPublic

Authored by chapuni on Oct 5 2017, 7:45 AM.

Details

Summary

This causes different output between libstdc++ and libc++.

Consider A,B,C,D,E, and F.
DT->properlyDominates() says;

  • D dominates A
  • E dominates C
  • F dominates B

libstdc++'s stable_sort did;

  1. D (dominates A)
  2. A
  3. B
  4. E (dominates C)
  5. C
  6. F (oughta dominate B...)

F was not hoisted.

libc++ didn't modify order, oops.

I suggest to introduce stupid sort here. Please suggest me if you had better algorithm.

Diff Detail

Repository
rL LLVM

Event Timeline

chapuni created this revision.Oct 5 2017, 7:45 AM
davide requested changes to this revision.Oct 5 2017, 1:38 PM
davide added a subscriber: davide.

The issue here is that properlyDominates only captures the parent->child relation in the dom, and siblings in the {post}dominator tree can appear in any order.
I don't think stupid sort is the best way forward, as it seems to be badly O(N^2) in the number of nodes in the dom (i.e. the number of reachable blocks in the CFG).
You may want to instead tighten the check to have a deterministic ordering for siblings).
For example, if you do an RPO walk of the CFG for your function F, you can assign an integer (unique) starting from zero, everytime you visit a new block.
This is is a start, and it's O(N), but you want probably try to be smarter and piggy-back on some other BB scan (like the one happening before this sorting, although I haven't checked the details).

This revision now requires changes to proceed.Oct 5 2017, 1:38 PM
dberlin edited edge metadata.Oct 5 2017, 3:01 PM

We sort siblings by RPO in newgvn. If you want to guarantee parent before child, however, just sort by the dominator tree dfs in and out numbers.

davide added a comment.Oct 5 2017, 3:04 PM

We sort siblings by RPO in newgvn. If you want to guarantee parent before child, however, just sort by the dominator tree dfs in and out numbers.

+1, yes, that's where I got the idea from :)

dberlin requested changes to this revision.Oct 5 2017, 3:07 PM

Also this needs a test

chapuni updated this revision to Diff 118120.Oct 6 2017, 6:54 PM
chapuni edited edge metadata.
chapuni retitled this revision from SLPVectorizer.cpp: Avoid std::stable_sort() due to asymmetric comparator to SLPVectorizer.cpp: Ensure SLPVectorizer can visit each block by dominated order.

The test added. This comes from EmitAtomicUpdateValue() in CGAtomic.cpp.
I confirmed it fails on both libstdc++ and libc++ to me.

I will attempt to get rid of O(N^2) there, but I think it'd not be big deal unless preds exceeded hundreds.

davide added a comment.Oct 6 2017, 8:46 PM

The test added. This comes from EmitAtomicUpdateValue() in CGAtomic.cpp.
I confirmed it fails on both libstdc++ and libc++ to me.

I will attempt to get rid of O(N^2) there, but I think it'd not be big deal unless preds exceeded hundreds.

I really think we should try to at least make an attempt to get rid of the O(N^2) behaviour as it doesn't seem terribly complicated to do so _and_ it may bite us in the future in many unexpected ways. The test seems fine, but you can probably clean it up a bit (and run metarenamer on it for shorter names)

davide added inline comments.Oct 6 2017, 10:53 PM
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
3347–3351

this is going to be stale once you have a value numbering to fix this in linear time.

llvm/trunk/test/Transforms/SLPVectorizer/X86/visit-dominated.ll
2 ↗(On Diff #118120)

I'm not sure I understand this comment?

4 ↗(On Diff #118120)

what do you mean by may be inducted ?

chapuni added inline comments.Oct 7 2017, 12:54 AM
llvm/trunk/lib/Transforms/Vectorize/SLPVectorizer.cpp
3347–3351

The issues is not "stability", but properlyDominates(A,B) would inapplicable to sorting.

llvm/trunk/test/Transforms/SLPVectorizer/X86/visit-dominated.ll
2 ↗(On Diff #118120)

How about? "Ensure each dominator block comes first in advance of its users"

4 ↗(On Diff #118120)

It might be superfluous. I'd like to explain why

[[QUAL1:%.+]] = extractelement <2 x i64> [[VEC_VALUE_QUALTYPE]], i32 1

appears in a few place.

Preds will in fact be thousands sometimes with switch statements. If you don't want to fix it, let me know and I'll fix the sort on Monday

chapuni updated this revision to Diff 118146.Oct 7 2017, 6:23 PM

@dberlin, I got rid of std::stable_sort() itself. Confirmed all tests (with mine) passed.

chapuni updated this revision to Diff 118147.Oct 7 2017, 6:32 PM

Forgot to reformat.

davide requested changes to this revision.EditedOct 7 2017, 9:24 PM

hmmm, I'm still not entirely convinced, and I'd like somebody who knows the vectorizer to chime in.
This is not a NFC change, at least in theory, from what I understand (feel free to prove me wrong :)
The code as it was was sorting the CSE blocks in dominator order (and visiting them in function order), now, you're visiting them in RPO and not sorting them.
About the first change, I'm not sure what are the effects. Probably if the vectorizer is formulated as a forward data flow problem this isn't necessarily a bad thing (in general, visiting BBs in function order is not guaranteed to be optimal, that's why we have other visiting schemes).

The second part is the one that worries me. Dominator tree order and RPO aren't strictly the same, e.g. when you have a basic block with multiple successors, as far as I can tell.
If you don't need dominator ordering and you can get away with RPO, fine, but you should at least explain why (and add a comment).

This revision now requires changes to proceed.Oct 7 2017, 9:24 PM
davide added a comment.Oct 7 2017, 9:56 PM

yes, presumably that would work.
Takumi, can you please trying to update the code with Danny's suggestion?

Does this test pass to you?

Excuse me, I'm off.

The issue is that std::sort is insufficient with properlyDominates() as comparator.
So, I have to propose to get rid of std::sort there.

Test code: https://gist.github.com/chapuni/9c5659610f5cd3c0933708bd43f32cdb

Although I am not special to vectorizer, I would like to fix my issue. See,
http://bb.pgr.jp/builders/bootstrap-clang-libcxx-lld-i686-linux/builds/459 (Applied this)
http://bb.pgr.jp/builders/bootstrap-clang-libcxx-lld-i686-linux/builds/460 (Reverted)

Is it inefficient to use RPOT there? I think better than sorting.

kuhar resigned from this revision.Sep 30 2019, 9:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2019, 9:47 AM
Herald added a subscriber: jfb. · View Herald Transcript