The AVX2 lowering for transpose operations is only applicable to f32 vector types.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Comment Actions
Just curious if F32 data movement would not just work for any 32-bit entity, but change looks good to me.
Make sure presubmit is green before committing, though.
Comment Actions
Thanks!
> Just curious if F32 data movement would not just work for any 32-bit entity, but change looks good to me.
The existing patterns currently use f32-specific instructions (note the Ps suffix):
  Value t0 = mm256UnpackLoPs(ib, vs[0], vs[1]);
  Value t1 = mm256UnpackHiPs(ib, vs[0], vs[1]);
  Value t2 = mm256UnpackLoPs(ib, vs[2], vs[3]);
  Value t3 = mm256UnpackHiPs(ib, vs[2], vs[3]);
  Value t4 = mm256UnpackLoPs(ib, vs[4], vs[5]);
  Value t5 = mm256UnpackHiPs(ib, vs[4], vs[5]);
  Value t6 = mm256UnpackLoPs(ib, vs[6], vs[7]);
  Value t7 = mm256UnpackHiPs(ib, vs[6], vs[7]);
It should be relatively easy to enable them on i32, since i32 and f32 elements have the same size.
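To illustrate why the same patterns should carry over to i32, here is a minimal scalar sketch of the unpack-lo/unpack-hi data movement on 8 x 32-bit vectors. It is not the MLIR lowering from this patch: `Vec8`, `unpackLo`, `unpackHi`, `bitcast`, and `unpackLoI32ViaF32` are hypothetical names used only for this example, and routing i32 through the f32 path with a bit-level reinterpretation is just one assumed way the extension could be done.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 256-bit vector of eight 32-bit elements.
template <typename T>
using Vec8 = std::array<T, 8>;

// Scalar model of the unpack-lo shuffle: within each 128-bit lane
// (4 elements), interleave the two low elements of a and b.
template <typename T>
Vec8<T> unpackLo(const Vec8<T>& a, const Vec8<T>& b) {
  static_assert(sizeof(T) == 4, "model covers 32-bit elements only");
  Vec8<T> r{};
  for (int lane = 0; lane < 2; ++lane) {
    const int base = lane * 4;
    r[base + 0] = a[base + 0];
    r[base + 1] = b[base + 0];
    r[base + 2] = a[base + 1];
    r[base + 3] = b[base + 1];
  }
  return r;
}

// Scalar model of the unpack-hi shuffle: same as above, but for the two
// high elements of each 128-bit lane.
template <typename T>
Vec8<T> unpackHi(const Vec8<T>& a, const Vec8<T>& b) {
  static_assert(sizeof(T) == 4, "model covers 32-bit elements only");
  Vec8<T> r{};
  for (int lane = 0; lane < 2; ++lane) {
    const int base = lane * 4;
    r[base + 0] = a[base + 2];
    r[base + 1] = b[base + 2];
    r[base + 2] = a[base + 3];
    r[base + 3] = b[base + 3];
  }
  return r;
}

// Reinterpret the bits of a vector as another same-sized element type.
template <typename To, typename From>
Vec8<To> bitcast(const Vec8<From>& v) {
  static_assert(sizeof(To) == sizeof(From), "element sizes must match");
  Vec8<To> r{};
  std::memcpy(r.data(), v.data(), sizeof(r));
  return r;
}

// Assumed approach: reuse the f32 shuffle for i32 by casting in and out.
Vec8<int32_t> unpackLoI32ViaF32(const Vec8<int32_t>& a,
                                const Vec8<int32_t>& b) {
  return bitcast<int32_t>(unpackLo(bitcast<float>(a), bitcast<float>(b)));
}

// Checks that the f32 round trip matches a direct i32 shuffle.
void selfCheck() {
  const Vec8<int32_t> a{0, 1, 2, 3, 4, 5, 6, 7};
  const Vec8<int32_t> b{10, 11, 12, 13, 14, 15, 16, 17};
  assert(unpackLoI32ViaF32(a, b) == unpackLo(a, b));
}
```

Because the shuffle only moves 32-bit elements and never computes on them, the result is the same whether the elements are treated as f32 or i32, which is the property the extension would rely on.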