I don't think the semantics of the LLVM masked gather intrinsic care
about the order in which the elements are loaded. For example, type
legalization by splitting will chain them in parallel. This is different
from scatter, which we do chain in order.
Repository: rG LLVM Github Monorepo
Event Timeline
I'm glad you brought this up. I remember having my doubts when first doing the lowering for MGATHER. I can't remember if we had a discussion around that or not but it was on my mind. I think at some point in development I went with the conservative option and it wasn't caught in review. I suspect it's okay to use unordered loads. Do you know if other targets do?
X86 documentation for gather says "The values may be read from memory in any order. Memory ordering with other instructions follows the Intel64 memory-ordering model."
ARM's documentation makes no mention of ordering that I can find.
I think the difference between an ordered and an unordered load is when the exception on a memory access happens,
and the documentation says "However, using this intrinsic prevents exceptions on memory access to masked-off lanes."
IMO, it makes sense to assume mgather is an unordered access, because users of the intrinsic already have to mask off lanes to avoid exceptions.
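To make the argument concrete, here is a minimal scalar sketch of how I read the gather semantics. It is not LLVM code: `maskedGather` and its `reverse` flag are hypothetical names made up for illustration, and it assumes plain, non-volatile memory that nothing else modifies during the gather. Masked-off lanes are never dereferenced, and visiting the active lanes forward or backward produces the same result.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of an N-lane masked gather (illustration only).
// Lanes with mask[lane] == false keep the passthru value and their pointer
// is never dereferenced, so they cannot fault.
template <std::size_t N>
std::array<int, N> maskedGather(const std::array<const int *, N> &ptrs,
                                const std::array<bool, N> &mask,
                                const std::array<int, N> &passthru,
                                bool reverse) {
  std::array<int, N> result = passthru;
  for (std::size_t i = 0; i < N; ++i) {
    // Visit lanes in forward or reverse order; each lane is independent.
    std::size_t lane = reverse ? N - 1 - i : i;
    if (mask[lane])
      result[lane] = *ptrs[lane];
  }
  return result;
}
```

With pointers `{&a, nullptr, &b, &c}` and mask `{true, false, true, true}`, both visiting orders return the same vector and the null pointer in lane 1 is never touched. A scatter model would not have this property, since two lanes storing to the same address make the final value depend on the order.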