In the stride == 1 case, conv1d reads contiguous data along the input dimension. This can be exploited to perform memory transfers and compute in bulk while avoiding unrolling. Experimentally, this can yield speedups of up to 50%.
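As a rough illustration of why stride 1 helps, here is a minimal pure-Python sketch. It is not the code from this patch: the function names `conv1d_scalar` and `conv1d_stride1_bulk` are hypothetical, and list slices merely stand in for bulk (vector) loads. With stride 1, the inputs needed for one kernel tap across all outputs form a single contiguous slice, so the loop over output positions can be replaced by one bulk read per tap.

```python
def conv1d_scalar(inp, kernel, stride=1):
    """Reference version: one multiply-accumulate per (output, tap) pair."""
    kw = len(kernel)
    out_len = (len(inp) - kw) // stride + 1
    out = [0.0] * out_len
    for w in range(out_len):
        for k in range(kw):
            out[w] += inp[w * stride + k] * kernel[k]
    return out


def conv1d_stride1_bulk(inp, kernel):
    """Stride-1 version: one contiguous input slice per kernel tap."""
    kw = len(kernel)
    out_len = len(inp) - kw + 1
    out = [0.0] * out_len
    for k in range(kw):
        # With stride 1, outputs 0..out_len-1 need inputs k..k+out_len-1,
        # which is a contiguous range (a stand-in for a bulk load).
        window = inp[k:k + out_len]
        tap = kernel[k]
        out = [acc + x * tap for acc, x in zip(out, window)]
    return out
```

For stride 1 both functions compute the same result; the second only iterates over the kernel width, handling all output positions per tap at once.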
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo