This is an archive of the discontinued LLVM Phabricator instance.

[mlir][llvm] Add memcpy support for mem2reg/sroa.
ClosedPublic

Authored by Moxinilian on Jun 14 2023, 4:31 AM.

Details

Summary

This revision introduces SROA and mem2reg support for the family of
memcpy-like intrinsics (memcpy, memcpy.inline and memmove).

The mem2reg implementation transforms memcpys of full types into loads
and store. Memcpy between two promotable slots always disappear.

The SROA implementation transforms memcpys of *entire* aggregate types
into memcpys of all of their fields.

Diff Detail

Event Timeline

Moxinilian created this revision.Jun 14 2023, 4:31 AM
Herald added a reviewer: dcaballe. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
Moxinilian requested review of this revision.Jun 14 2023, 4:32 AM

Looks good!

I added some nit comments and questions.

mlir/lib/Dialect/LLVMIR/IR/LLVMMemorySlot.cpp
436

ultra nit: Can you add a short doc string to the specialization as well.

458

Question: Isn't this guaranteed by the current GEP implementation? Should we make this an assert instead?

675

Can you explain why we are not checking here if the full slot is accessed, similar to memcpyCanRewire?

May be worth adding a comment!

730–731

ultra nit: I would probably promote this to a function doc string?

748

nit: the LLVM way of writing the loop would be

for (size_t i = 0, e = slot.elementPtrs.size(); i != e; i++) {
752

nit: does it make sense to spell out the type here? I would assume it is a MemorySlot?

mlir/test/Dialect/LLVMIR/mem2reg-intrinsics.mlir
226

Should there be a test for memcopy inline since it behaves a little different?

mlir/test/Dialect/LLVMIR/sroa-intrinsics.mlir
250

nit: copy pasta left over here and below?

273

Cant we generate SLOT_IN_OTHER1, SLOT_IN_OTHER2, etc? Or are the indexes not generated in order here?

330

Maybe also use memmove / memcopy inline instead of memcpy everywhere?

tschuett added inline comments.
mlir/lib/Dialect/LLVMIR/IR/LLVMMemorySlot.cpp
429

Nit: please return std::nullopt for optionals.

627

Nit: most of these function should be marked static instead of hiding them in an anonymous namespace.

Moxinilian marked 12 inline comments as done.
Moxinilian edited the summary of this revision. (Show Details)

Address comments.

mlir/lib/Dialect/LLVMIR/IR/LLVMMemorySlot.cpp
429

So I feel like if I do this I it will become very inconsistent with the rest of the dialect. Maybe a follow-up patch could change this everywhere instead?

458

While that is the case, I still need to check that the data given to memcpy is well behaved. A third-party operation could have declared a memory slot with non-i32 accesses that uses an llvm.ptr as a backend. A bit far-fetched, but could happen.

675

The reason is because we are not checking that we can rewire but simply that the access is safe. My rewiring logic only supports rewiring full copies, but what defines as a safe access is any size within the slot. I added a comment to clarify.

mlir/test/Dialect/LLVMIR/sroa-intrinsics.mlir
273

While the order is predictable (3, 2, 1, 0), it is not specified (which is fine because memcpy does not specify its copy logic). I think it is thus preferable to not expect any ordering. What do you think?

Moxinilian added a comment.EditedJun 15 2023, 5:56 AM

@tschuett Templated functions with specialization cannot be static. I made the non-specialized templates static, but for the specialized ones this is impossible. (reference: http://eel.is/c++draft/temp.expl.spec#2)

gysit accepted this revision.Jun 15 2023, 6:21 AM

Thanks LGTM!

mlir/lib/Dialect/LLVMIR/IR/LLVMMemorySlot.cpp
458

Ok I understand. I was mostly worried things may suddenly stop to mem2reg/sroa without an apparent reason. In this case an assert or some error message may be helpful.

mlir/test/Dialect/LLVMIR/sroa-intrinsics.mlir
273

I think most places I know actually hardcode the order in that case. The drawback of this is that you need to fix the test if the actual implementation changes. I think in this case we are mostly interested in seeing all the ops with the correct indexes. In that sense your solution is fine.

This revision is now accepted and ready to land.Jun 15 2023, 6:21 AM
This revision was automatically updated to reflect the committed changes.