Most Torch computer vision models use NCHW/NCW convolutions. A previous patch added a decomposition from conv2dNchw to conv1dNcw. To improve performance on Torch models, this patch adds a vectorization pattern for conv1dNcw, which consequently also improves the performance of conv2dNchw.
On IREE + Intel Xeon 8360 + ResNet50, we measured a ~7x speedup, from ~880 ms to 126 ms.
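For illustration only, here is a minimal pure-Python sketch of why improving conv1dNcw also helps conv2dNchw. It assumes the decomposition applies when the kernel height and the input/output height are both 1 (stride 1, no padding, no dilation). The helper names `conv1d_ncw`, `conv2d_nchw`, and `squeeze_h` are hypothetical and are not part of the patch or of any MLIR/IREE API.

```python
def conv1d_ncw(x, w):
    """Reference NCW convolution: x is [N][C][W], w is [F][C][KW]."""
    n, c, width = len(x), len(x[0]), len(x[0][0])
    f, kw = len(w), len(w[0][0])
    ow = width - kw + 1  # stride 1, no padding (assumption)
    return [
        [
            [
                sum(x[b][ch][o + k] * w[of][ch][k]
                    for ch in range(c) for k in range(kw))
                for o in range(ow)
            ]
            for of in range(f)
        ]
        for b in range(n)
    ]


def conv2d_nchw(x, w):
    """Reference NCHW convolution: x is [N][C][H][W], w is [F][C][KH][KW]."""
    n, c, h, width = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    f, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, width - kw + 1  # stride 1, no padding (assumption)
    return [
        [
            [
                [
                    sum(x[b][ch][i + p][j + q] * w[of][ch][p][q]
                        for ch in range(c)
                        for p in range(kh)
                        for q in range(kw))
                    for j in range(ow)
                ]
                for i in range(oh)
            ]
            for of in range(f)
        ]
        for b in range(n)
    ]


def squeeze_h(t):
    """Drop a unit height dimension: [A][B][1][W] -> [A][B][W]."""
    return [[row[0] for row in a] for a in t]


# N=1, C=2, H=1, W=4 input and F=1, C=2, KH=1, KW=2 filter.
x2d = [[[[1, 2, 3, 4]], [[5, 6, 7, 8]]]]
w2d = [[[[1, 0]], [[0, 1]]]]

out2d = conv2d_nchw(x2d, w2d)
out1d = conv1d_ncw(squeeze_h(x2d), squeeze_h(w2d))

# With unit height, the 2-D result is the 1-D result plus a size-1 H dimension.
assert squeeze_h(out2d) == out1d
```

Because the two results match once the unit height dimension is dropped, any speedup in the 1-D path carries over to 2-D convolutions that are decomposed this way.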
can we rephrase to avoid newlines?