This adds a fold of add(x, shuffle(x, <1,0,3,2,5,4,...>)) into shuffle(addp(x), <0,0,1,1,2,2,...>). The ADDP instruction takes two vectors and returns one, adding adjacent pairs, so we match x in a custom combine as it is lowered from a v8i32. The original code would be 2 rev64 and 2 add; the new code is a single addp with a zip1;zip2 shuffle, producing smaller code.
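As a rough illustration of why the two forms are equivalent, here is a scalar model in Python. It is only a sketch and not code from the patch: the function names are hypothetical, vectors are modelled as plain lists, and the v8i32 input is assumed to be split into two v4i32 halves that feed the two operands of ADDP.

```python
def swap_add(x):
    """Original pattern: add(x, shuffle(x, <1,0,3,2,5,4,...>))."""
    # i ^ 1 swaps each even lane with its odd neighbour.
    swapped = [x[i ^ 1] for i in range(len(x))]
    return [a + b for a, b in zip(x, swapped)]


def addp(lo, hi):
    """Scalar model of ADDP: adjacent-pair sums of lo, followed by those of hi."""
    cat = lo + hi
    return [cat[i] + cat[i + 1] for i in range(0, len(cat), 2)]


def folded(x):
    """Folded pattern: shuffle(addp(x), <0,0,1,1,2,2,...>)."""
    half = len(x) // 2
    pairs = addp(x[:half], x[half:])
    # Duplicating every lane corresponds to the zip1;zip2 of pairs with itself.
    return [pairs[i // 2] for i in range(len(x))]


x = [1, 2, 3, 4, 5, 6, 7, 8]
print(swap_add(x))
print(folded(x))
```

Both calls produce the same list for any even-length input, because every lane of the result holds the sum of its adjacent pair, which is exactly what ADDP computes once per pair.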
Details
Diff Detail
Event Timeline
Yeah sure, that sounds good, but in a separate patch. I was attempting to get sub to work in the same way too, using a mul <0,-1>, but it's much harder to make that profitable without using demanded elements.