The current DAG combine turns a sequence of extracts from <4 x i32> followed by zexts into a store followed by scalar loads.
According to measurements by Martin Krastev (see PR 21269), on x86-64 a sequence of an extract, movs, and shifts performs better. On 32-bit x86, however, the store-and-loads sequence still seems better.
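To make the two lowerings concrete, here is a rough scalar model in C++. It is only a sketch of the idea, not the code in the patch: the type aliases and function names are hypothetical, the 64-bit "extract" is modelled with ordinary integer arithmetic, and nothing here claims to match the exact instructions the backend emits.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins: Vec4 models <4 x i32>, Wide4 models the four
// zero-extended i64 results.
using Vec4 = std::array<std::uint32_t, 4>;
using Wide4 = std::array<std::uint64_t, 4>;

// Models the existing lowering: one vector store to a stack slot,
// followed by four scalar loads that are zero-extended.
Wide4 extend_via_store_and_loads(const Vec4& v) {
    std::uint32_t slot[4];
    std::memcpy(slot, v.data(), sizeof slot);     // the vector store
    return {slot[0], slot[1], slot[2], slot[3]};  // scalar loads, widened
}

// Models the alternative: pull out each 64-bit half once, then split it.
// A 32-bit mov keeps the low lane (upper bits cleared); a shift right
// by 32 yields the high lane.
Wide4 extend_via_extract_and_shifts(const Vec4& v) {
    // Model of a 64-bit element extract covering lanes 0-1 and lanes 2-3.
    const std::uint64_t lo = std::uint64_t{v[0]} | (std::uint64_t{v[1]} << 32);
    const std::uint64_t hi = std::uint64_t{v[2]} | (std::uint64_t{v[3]} << 32);
    return {
        static_cast<std::uint32_t>(lo),  // mov: low 32 bits, zero-extended
        lo >> 32,                        // shift: high 32 bits
        static_cast<std::uint32_t>(hi),
        hi >> 32,
    };
}
```

Both functions return the same values for any input; the difference the summary is concerned with is purely which instruction sequence is cheaper on a given target.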
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Hi Michael,
This LGTM with one request.
Could you add a test case where we fall back to the old sequence of store + loads?
No need to send an updated patch.
Thanks,
-Quentin
| lib/Target/X86/X86ISelLowering.cpp | | |
|---|---|---|
| 22656 | (On Diff #16860) | Period at the end of the comment. |
Thanks, Quentin!
Sure, will add that.
It'll have to be slightly more contrived because it needs to have an explicit zext, as opposed to an implicit extension coming from a GEP, but that's no big deal.