This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Generate shuffles if we can reorder an existing node
Needs ReviewPublic

Authored by dmgreen on Apr 12 2022, 4:50 AM.

Details

Summary

Whilst vectorizing a tree element, if we detect a list of nodes which is a reordering of an existing tree element, we can create a shuffle instead of falling back to a "build-vector" (i.e inserts that we extract from a vector instruction). This can help reduce the number of insert/extracts sequences that need to be cleaned up by instcombine.

Diff Detail

Event Timeline

dmgreen created this revision.Apr 12 2022, 4:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 4:50 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
dmgreen requested review of this revision.Apr 12 2022, 4:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 4:50 AM

Just FYI, there is D116312 and D110978 with more general improvements, I assume

Oh yeah. I'm not sure about the first, but D110978 seems like it might be a larger change that handles this case and a lot more. Apparently it's 1200 lines of changes? That sounds like a lot! It tries to re-build the shuffles after from the extracts?
In this case I "just" need vector combine to see the shuffle, not the insert/extract, as instcombine doesn't run between the two.

Oh yeah. I'm not sure about the first, but D110978 seems like it might be a larger change that handles this case and a lot more. Apparently it's 1200 lines of changes? That sounds like a lot! It tries to re-build the shuffles after from the extracts?
In this case I "just" need vector combine to see the shuffle, not the insert/extract, as instcombine doesn't run between the two.

No, it tries to do the same as your patch - find reused scalars in the previously build nodes and emit shuffles, if possible, instead of bunch of extracts.
The problem is that we need to track the dependencies between nodes, that's why it is so big.

No, it tries to do the same as your patch - find reused scalars in the previously build nodes and emit shuffles, if possible, instead of bunch of extracts.
The problem is that we need to track the dependencies between nodes, that's why it is so big.

Do you think this patch is incorrect then? Or just handling a subset of D110978? As far as I understand it should be safe.

I think this is the last puzzle piece I need, to allow vectorcombine to see shuffles as opposed to insert/extracts. What do you think of giving this patch a go in the meantime, before the more general case is handled in D110978?

No, it tries to do the same as your patch - find reused scalars in the previously build nodes and emit shuffles, if possible, instead of bunch of extracts.
The problem is that we need to track the dependencies between nodes, that's why it is so big.

Do you think this patch is incorrect then? Or just handling a subset of D110978? As far as I understand it should be safe.

I think this is the last puzzle piece I need, to allow vectorcombine to see shuffles as opposed to insert/extracts. What do you think of giving this patch a go in the meantime, before the more general case is handled in D110978?

Yes, your patch is a subset of D110978. It can handle scalars from different nodes.