Simplify and generalize chain handling and search for 64-bit load-store pairs.
The nontemporal test now converts the 64-bit integer load-store pair into an f64 load-store, which is realized directly instead of being split into two i32 pairs.
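As a rough illustration of the two lowerings the summary contrasts, the sketch below copies a 64-bit integer either as two 32-bit halves or through a single 64-bit floating-point slot. It is not code from the patch: the function names `copy_as_two_i32` and `copy_as_one_f64` are hypothetical, and the nontemporal hint is not modeled at all. It only shows that both routes move the same bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: move the value as two independent 32-bit pieces,
// roughly the shape produced when the i64 load/store pair is split.
void copy_as_two_i32(const std::uint64_t *src, std::uint64_t *dst) {
  const unsigned char *in = reinterpret_cast<const unsigned char *>(src);
  unsigned char *out = reinterpret_cast<unsigned char *>(dst);
  std::uint32_t first_half, second_half;
  std::memcpy(&first_half, in, sizeof(first_half));        // first i32 load
  std::memcpy(&second_half, in + 4, sizeof(second_half));  // second i32 load
  std::memcpy(out, &first_half, sizeof(first_half));       // first i32 store
  std::memcpy(out + 4, &second_half, sizeof(second_half)); // second i32 store
}

// Hypothetical helper: move the value through one 64-bit slot typed as
// double, roughly the shape of a single f64 load followed by an f64 store.
// memcpy is used so the bits are copied without any numeric conversion.
void copy_as_one_f64(const std::uint64_t *src, std::uint64_t *dst) {
  double slot;
  std::memcpy(&slot, src, sizeof(slot)); // one 8-byte load
  std::memcpy(dst, &slot, sizeof(slot)); // one 8-byte store
}
```

Calling either helper on the same source value should leave the destination bit-identical to the source; the difference the review cares about is how many memory operations are emitted and whether they keep the nontemporal property.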
Repository: rL LLVM
Inline comment on llvm/lib/Target/X86/X86ISelLowering.cpp, line 34373 (on Diff #125793):
Is NewChain an unused variable now?
Are we supposed to be losing the nontemporal property during this transformation?
It's still listed as an 8-byte nontemporal store in the DAG post-transformation. IIUC this should be MOVNTQ.
What about the i32 and i64 stores from the original IR? It looks like we emitted a movl and a movsd?
Previously this transformation missed the i64 load/nontemporal store pair, and as a result the pair was split into i32 operations. Because this aligns with the nontemporal store of the doubleword value at 12(%ebp), another transformation elides that pair (which, given what this test looks at, probably indicates that the test needs some sort of fencing added).
I was wondering why we weren't using movntsd, but I forgot that's an AMD SSE4A instruction.
LGTM, but I wonder if we should be avoiding nontemporal stores if we can't preserve that property in the generated instructions. I assume that issue exists for other cases even without this patch though.