This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Support internal users of splat loads
ClosedPublic

Authored by vporpo on Mar 17 2022, 11:46 AM.

Details

Summary

Until now we would only accept a broadcast load pattern if it is only used
by a single vector of instructions.
This patch relaxes this, and allows for the broadcast to have more than one
user vector, as long as all of its uses are internal to the SLP graph and
vectorized.

Diff Detail

Event Timeline

vporpo created this revision.Mar 17 2022, 11:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 11:46 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
vporpo requested review of this revision.Mar 17 2022, 11:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 11:46 AM
ABataev added inline comments.Mar 17 2022, 4:27 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1175–1190

Can this be moved to getSplatScore anyhow?

vporpo added inline comments.Apr 15 2022, 12:49 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1175–1190

getSplatScore() does not seem to check if V1 and V2 form a splat. As far as I understand it checks for a potential splat across all lanes and returns an extra score in that case. Not sure how I could combine this code with getSplatScore().

ABataev added inline comments.Apr 18 2022, 5:32 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1179

Why 3, could you give some extra comments here?

1180

Same question here

vporpo added inline comments.Apr 18 2022, 7:03 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1180

This sets the limit of users of a broadcast that will be checked. Instead of having a fixed hard limit of say 10 users, this is taking into account the number of lanes, because the more the lanes, the higher the limit should be to avoid it being over-conservative. For example, if we only have 2 lanes then this corresponds to a limit of 7, but if we have 4 lanes then this becomes 13.
Anyway, I don't think this is super important, we could just set a hard limit of 10. What do you think?

ABataev added inline comments.Apr 18 2022, 7:39 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1180

The limit here is just to reduce number of checks, probably some hardcoded constant value would be better, otherwise we may have to many checks for very big NumLanes values.

vporpo updated this revision to Diff 423397.Apr 18 2022, 7:55 AM

Replaced the users limit with a hard-coded value.

ABataev added inline comments.Apr 18 2022, 8:38 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1179

That's too much, I assume. Better to have 4-8, it may require to many iterations in the all_of function.

vporpo updated this revision to Diff 423413.Apr 18 2022, 9:24 AM

Changed limit to 8.

This revision is now accepted and ready to land.Apr 18 2022, 11:22 AM
This revision was landed with ongoing or failed builds.Apr 18 2022, 12:01 PM
This revision was automatically updated to reflect the committed changes.
vdmitrie added inline comments.
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1169

New arguments are not described.

1371

ditto: not all arguments described to this function.

vporpo added inline comments.Apr 19 2022, 10:48 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1169

Good point. U1 and U2 correspond to the users of V1 and V2 as we are walking up the operands. Let me fix this in a new patch.

vdmitrie added inline comments.Apr 19 2022, 11:17 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1176

BTW NumLanes capture is not used in the lambda.

vporpo added inline comments.Apr 19 2022, 11:23 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
1176

Yeah, it was breaking some builds, so I reverted it and recommitted the patch with the fix.