This is an archive of the discontinued LLVM Phabricator instance.

[mlir][spirv][vector] Enable vector3 when converting to dot product
Closed, Public

Authored by antiagainst on Apr 18 2023, 9:41 AM.

Details

Summary

Such cases are common for contractions that come from convolutions
with 3 input channels. Although this does not use all 4 lanes of the
dot product, it should still be better than performing the multiply
and reduction separately.
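
To illustrate the idea, here is a minimal C++ sketch, not code from this patch. The names `dot4` and `dot3ViaDot4` are hypothetical, and the 8-bit integer element type is an assumption made for the example. It models a 4-lane dot product and reuses it for 3-element inputs by padding with a zero lane, which contributes nothing to the sum:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 4-lane dot product instruction.
// The i8 element type is an assumption made for illustration only.
inline std::int32_t dot4(const std::array<std::int8_t, 4> &a,
                         const std::array<std::int8_t, 4> &b) {
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < 4; ++i)
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  return acc;
}

// Pads both 3-element inputs with a zero lane so the 4-lane dot product
// can be reused; the padded lane multiplies to 0 and leaves the sum unchanged.
inline std::int32_t dot3ViaDot4(const std::array<std::int8_t, 3> &a,
                                const std::array<std::int8_t, 3> &b) {
  const std::array<std::int8_t, 4> a4{a[0], a[1], a[2], 0};
  const std::array<std::int8_t, 4> b4{b[0], b[1], b[2], 0};
  return dot4(a4, b4);
}
```

One lane is wasted, but the multiply and the reduction collapse into a single operation, which is the trade-off the summary describes.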

Event Timeline

antiagainst created this revision. Apr 18 2023, 9:41 AM
antiagainst requested review of this revision. Apr 18 2023, 9:41 AM
kuhar accepted this revision. Apr 18 2023, 12:23 PM

LGTM but I wonder if we should generalize this to support any vector width > 2. For widths > 4, we could unroll it into a chain of 4-element dot products.

This revision is now accepted and ready to land. Apr 18 2023, 12:23 PM

> LGTM but I wonder if we should generalize this to support any vector width > 2. For widths > 4, we could unroll it into a chain of 4-element dot products.

For vectors with a size larger than 4, we could already rely on unrolling at the vector level, right?

kuhar added a comment. Apr 18 2023, 1:59 PM

>> LGTM but I wonder if we should generalize this to support any vector width > 2. For widths > 4, we could unroll it into a chain of 4-element dot products.

> For vectors with a size larger than 4, we could already rely on unrolling at the vector level, right?

That's independent of the unrolling IMO -- whatever unrolling scheme is used, we can efficiently lower any reduction of vectors of size >= 3.

>>> LGTM but I wonder if we should generalize this to support any vector width > 2. For widths > 4, we could unroll it into a chain of 4-element dot products.

>> For vectors with a size larger than 4, we could already rely on unrolling at the vector level, right?

> That's independent of the unrolling IMO -- whatever unrolling scheme is used, we can efficiently lower any reduction of vectors of size >= 3.

Yeah, agreed. We can do that when a use case arises later.
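
The generalization discussed in this thread (any width of 3 or more, lowered as a chain of 4-element dot products) can be sketched in C++ as follows. This is only an illustration under stated assumptions, not an implementation from the patch or a follow-up: `dot4Chunk` and `dotChained` are hypothetical names, the i8 element type is assumed, and both inputs are assumed to have the same length.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 4-lane dot product (i8 elements are an assumption).
inline std::int32_t dot4Chunk(const std::array<std::int8_t, 4> &a,
                              const std::array<std::int8_t, 4> &b) {
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < 4; ++i)
    acc += static_cast<std::int32_t>(a[i]) * static_cast<std::int32_t>(b[i]);
  return acc;
}

// Accumulates 4-lane dot products over consecutive chunks of the inputs.
// The last chunk is zero-padded when the length is not a multiple of 4.
// Assumes a.size() == b.size().
inline std::int32_t dotChained(const std::vector<std::int8_t> &a,
                               const std::vector<std::int8_t> &b) {
  std::int32_t acc = 0;
  for (std::size_t base = 0; base < a.size(); base += 4) {
    std::array<std::int8_t, 4> ca{};  // value-initialized: all lanes are 0
    std::array<std::int8_t, 4> cb{};
    for (std::size_t i = 0; i < 4 && base + i < a.size(); ++i) {
      ca[i] = a[base + i];
      cb[i] = b[base + i];
    }
    acc += dot4Chunk(ca, cb);
  }
  return acc;
}
```

For a 7-element input this issues two 4-lane dot products, the second with one padded lane, instead of seven multiplies followed by a separate reduction.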