The purpose of patch is to learn Loop idiom recognition pass how to recognize simple memmove patterns
in similar way like GCC does: https://godbolt.org/z/fh95e83od
LoopIdiomRecognize already has machinery for memset and memcpy recognition, patch tries to extend existing capabilities with minimal effort.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1287 | Finally, someone trying to add this transform figured out this check... I think this is the fourth patch trying to add memmove to loopidiom. Do you need to compare the offset to the size of the load/store operations? I think you might get some funny behavior if the operations overlap. | |
1345 | Do we need to make sure we don't fall into this codepath? |
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1345 | Yes, now this branch is ommited. Added tests to catch such mistakes. |
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1287 |
Well, hopefully there won't be fifth one :)
Yes, it would be nice to prevent overlapping. I'm going to add protection against it and UT reproducer. |
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1287 | Now load and store can't overlap each other and their sizes must be equal. However I'm not sure how reproducer should look like. Currently isLegalStore reject everything which has abs(Stride) != StoreSize so my understanding is that only one load and one store in loop are allowed. I ended up with following reproducer: define void @do_not_form_memmove_for_overlapped_access(i32* %s, i64 %size) { entry: %end.idx = add i64 %size, -1 %end.ptr = getelementptr inbounds i32, i32* %s, i64 %end.idx br label %while.body while.body: %phi.ptr = phi i32* [ %s, %entry ], [ %next.ptr, %while.body ] %next = bitcast i32* %phi.ptr to i16* %src.ptr = getelementptr i16, i16* %next, i64 1 %src.ptr2 = bitcast i16* %src.ptr to i32* ; below misaligned load is overlapped with store. %val = load i32, i32* %src.ptr2, align 4 %dst.ptr = getelementptr i32, i32* %phi.ptr, i64 0 store i32 %val, i32* %dst.ptr, align 4 %next.ptr = getelementptr i32, i32* %phi.ptr, i64 1 %cmp = icmp eq i32* %next.ptr, %end.ptr br i1 %cmp, label %exit, label %while.body exit: ret void } The thing is that I'm not sure whether or not above snippet is correct IR since we do misaligned load. |
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1287 | Ok, I checked documentation and mailing list for better understanding. Changing load just to "%val = load i32, i32* %src.ptr2, align 2" should be fine and give legal IR with underaligned access. Added improved overlapping access UT reproducer. |
@efriedma : Thank you for review! Now all comments are addressed and change pass on CI.
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1380–1381 | Don't touch this counter when memmove is emitted. |
llvm/lib/Transforms/Scalar/LoopIdiomRecognize.cpp | ||
---|---|---|
1380 | Ok, I will add this counter. |
Can you add a testcase for a case where both IsMemCpy and UseMemMove are true? I think you'll end up bailing out due to the alias check, but I'd like to be sure.
llvm/test/Transforms/LoopIdiom/basic.ll | ||
---|---|---|
1269 | Fix test to emit memmove. |
Wouldn't you mind if I added memcpy handling in follow-up change? While working on memcpy transformation I ended up with many unit tests (almost every memmove load-store UT has memcpy equivalent) and my change still gets bigger.
Current patch bailing out on memcpy should be enough to satisfy PR. I don't insist but if you agree I will leave fixme on do_not_form_memmove4 UT and will open follow-up review transforming memcpy.
Also you can add NumMemMove..