This is a one-line-of-code patch for:
The reasoning, which I hope holds, is that we shouldn't distinguish a memset from a memcpy at this level because both have exactly the same load/store instruction type requirements.
But as the test cases show, there's some ugliness here:
- The i386 (Windows) test now expands to 32 stores instead of a 'rep stosl'. Is that better or worse? (I don't yet know why this change happens at all.)
- The memset-2.ll tests look quite awkward in the way they splat the byte value into an XMM reg; imul isn't generally cheap.
- Why do the memset-nonzero.ll tests for an AVX1 target not use vbroadcast like the AVX2 target?
- Why does the machine scheduler reorder the DAG nodes? In all cases, we create the store nodes in low-to-high memory address order, but that's not the order in which the machine instructions come out.
I don't think any of the above are big enough problems to prevent this patch from going in first, but we should improve them afterwards.
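
For reference on the memset-2.ll point: the multiply is presumably the usual byte-splat idiom, where a zero-extended byte is multiplied by a constant of repeated 0x01 bytes. Below is a minimal C++ sketch of that arithmetic next to a shift/or alternative. The function names are hypothetical and this is not the code the backend emits; it only illustrates why an imul shows up when the memset value is not a constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: splat a byte across 32 bits with one multiply.
// 0xAB * 0x01010101 == 0xABABABAB, and the product always fits in 32 bits.
constexpr std::uint32_t splatByte32Mul(std::uint8_t b) {
  return std::uint32_t{b} * 0x01010101u;
}

// Hypothetical helper: the same idea for a 64-bit scalar.
constexpr std::uint64_t splatByte64Mul(std::uint8_t b) {
  return std::uint64_t{b} * 0x0101010101010101ull;
}

// Alternative without a multiply: double the filled width with shift + or.
constexpr std::uint32_t splatByte32ShiftOr(std::uint8_t b) {
  std::uint32_t v = b;
  v |= v << 8;   // low 16 bits filled
  v |= v << 16;  // all 32 bits filled
  return v;
}

// Check that both formulations agree for every possible byte value.
inline void checkSplats() {
  for (unsigned i = 0; i < 256; ++i) {
    const auto b = static_cast<std::uint8_t>(i);
    assert(splatByte32Mul(b) == splatByte32ShiftOr(b));
  }
}
```

Whether the shift/or form (or a vector shuffle once the byte is in an XMM register) would actually be cheaper than the imul depends on the target, so this is only a starting point for the follow-up.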