This is very much a WIP, but I'd like feedback on whether this is the correct way to implement it. As per Sean Silva's advice, this uses the group size as a second batch dimension in both the input and the weight.
I suspect the g dimension should be contiguous with the C dimension, and I'm not sure if it should be major or minor to it.
mlir/python/mlir/dialects/linalg/opdsl/ops/core_named_ops.py
Line 364: Mods and divs in the indexing map are going to be problematic. I'd expect the input shape to be [NG][N][CG][...][..], where NG is assumed to be c/G and CG is assumed to be c%G. We cannot enforce those conditions; they are expected to hold.
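A small sketch of the decomposition the comment above describes: splitting a flat channel index c into a group index and a within-group index via div/mod is exactly what a reshape with the group dimension major to the per-group channels gives you for free, with no mods or divs in the indexing map. The names (N, C, G, CG) follow the discussion; the tensor itself is illustrative.

```python
import numpy as np

# Illustrative only: with the group dimension major to the per-group
# channels, a flat channel index c decomposes as c = g * CG + cg,
# i.e. g = c // CG and cg = c % CG (CG = C // G channels per group).
N, C, G = 2, 6, 3
CG = C // G

x_flat = np.arange(N * C).reshape(N, C)   # shape [N, C], flat channels
x_grouped = x_flat.reshape(N, G, CG)      # shape [N, G, CG], group major

# The reshape realizes the div/mod decomposition without any indexing-map
# arithmetic: element c of the flat layout is element (g, cg) here.
for c in range(C):
    g, cg = divmod(c, CG)
    assert np.array_equal(x_grouped[:, g, cg], x_flat[:, c])
```

This is why expressing the op over an already-reshaped [G]-major input avoids the problematic mods and divs, at the cost of making G | C an unenforced precondition.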
It's not really something we get to choose -- the op definition in frontends determines whether the groups are major or minor to the "real channels". I think it is major, but please verify that. If we don't get this right, we have to transpose before calling this op, which is wasteful.
It looks like PyTorch in the slow CPU backend implements this with batches of convolutions rather than a batch dimension. After some digging through mkldnn, it looks like group is major there, although that's opaque to the user. So for PyTorch at least it looks like [N, G, C, ...] is the way to go.
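To make the group-major claim concrete, here is a hedged NumPy sketch (not the actual op or PyTorch internals; `conv1d_valid` is a hypothetical helper) showing that if groups are major, a grouped 1-D convolution over the flat [N, C, W] tensor using contiguous channel slices matches the same computation over the reshaped [N, G, CG, W] layout:

```python
import numpy as np

def conv1d_valid(x, w):
    """Naive valid-mode 1-D conv: x is [Cin, W], w is [Cout, Cin, K]."""
    Cout, Cin, K = w.shape
    W = x.shape[1]
    out = np.zeros((Cout, W - K + 1))
    for o in range(Cout):
        for i in range(Cin):
            for t in range(W - K + 1):
                out[o, t] += np.dot(x[i, t:t + K], w[o, i])
    return out

rng = np.random.default_rng(0)
N, G, CG, W, K = 2, 3, 2, 8, 3
C = G * CG
x = rng.standard_normal((N, C, W))        # flat channels, group major
w = rng.standard_normal((G, CG, CG, K))   # per-group [Cout_g, Cin_g, K]

# (a) grouped conv as "batches of convolutions" over contiguous
#     channel slices of the flat tensor (group-major assumption)
out_slice = np.stack([
    np.stack([conv1d_valid(x[n, g * CG:(g + 1) * CG], w[g])
              for g in range(G)])
    for n in range(N)
])

# (b) identical computation after reshaping to [N, G, CG, W]
xg = x.reshape(N, G, CG, W)
out_grouped = np.stack([
    np.stack([conv1d_valid(xg[n, g], w[g]) for g in range(G)])
    for n in range(N)
])

assert np.allclose(out_slice, out_grouped)
```

If groups were instead minor to the channels, the slices in (a) would be strided rather than contiguous and the reshape in (b) would not line up, which is why getting the major/minor question right matters for avoiding transposes.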
Finally finished the bugfixing, with a lot of help from Mahesh. This should be ready to merge unless anyone has more requests.