- When memcpy is lowered in SelectionDAG, similar to inlining a function, scoped AA metadata should be generated and attached on the lowered loads/stores following 'noalias' arguments.
|80 ms||x64 debian > LLVM.CodeGen/AArch64::memcpy-scoped-aa.ll|
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=aarch64-linux-gnu -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll
|150 ms||x64 debian > LLVM.CodeGen/AMDGPU::memcpy-scoped-aa.ll|
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll
|80 ms||x64 debian > LLVM.CodeGen/X86::memcpy-scoped-aa.ll|
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=x86_64-linux-gnu -stop-after=finalize-isel -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/X86/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=MIR /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/X86/memcpy-scoped-aa.ll
|130 ms||x64 windows > LLVM.CodeGen/AArch64::memcpy-scoped-aa.ll|
Script: -- : 'RUN: at line 1'; c:\ws\w2\llvm-project\premerge-checks\build\bin\llc.exe -mtriple=aarch64-linux-gnu -o - C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\memcpy-scoped-aa.ll | c:\ws\w2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\memcpy-scoped-aa.ll
|160 ms||x64 windows > LLVM.CodeGen/AMDGPU::memcpy-scoped-aa.ll|
Script: -- : 'RUN: at line 1'; c:\ws\w2\llvm-project\premerge-checks\build\bin\llc.exe -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\memcpy-scoped-aa.ll | c:\ws\w2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\memcpy-scoped-aa.ll
|View Full Test Results (6 Failed)|
With that extra scoped AA info, this store is freely scheduled ahead to minimize register pressure as cyclone disables the latency heuristic.
Similar change due to 'cyclone' schedule favors register pressure over latency.
functionality wise the same. The sequence of offset looks better in terms of the locality.
The same from the functionality wise but loads from that memcpy are scheduled ahead in the favor of latency hiding.
Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:
The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.
The langref says "The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.". It says that src and dst are either equal or non-overlapping. For the equal case, they should be eliminated earlier. Even if there are remaining ones, claiming them 'NoAlias' doesn't have correctness issue when copying them. It won't cause correctness issue if we schedule those loads and stores freely.
In addition, the extra AA scope metadata here only has impact on this instance of memcpy.
It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.
Given the very late introduction of new !alias.scope and !noalias metadata, is their a way to have a testcase look at a machine ir dump, together with the metadata output at that phase ?
Generally, you do not know whether they are equal or not.
Even if there are remaining ones, claiming them 'NoAlias' doesn't have correctness issue when copying them. It won't cause correctness issue if we schedule those loads and stores freely.
I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.
The proposed version introduces 2 scopes, assigning all store to one scope and all loads to another.
I order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes..
This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.
// memcpy(dst, src, 23) becomes: store i64 dst+0, (load i64 src+0) // scope 0 store i64 dst+8, (load i64, src+8) // scope 1, not overlapping with previous pair store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair
Not being able to catch that meeting. I am not sure the current SDAG would generate that. Let me check. If we did that in the current SDAG implementation, we may remove scoped AA metadata for that trailing (or maybe the heading) loads as the conservative solution before we introduce new scopes.
Just read the relevant threads and bugs reported on the change of allowing exact-overlap on llvm.memcpy. See the reference list at the end. Personally, I think it's OK to assume NoAlias added here. By allowing exact-overlap in llvm.memcpy, the most significant change is on the basic-aa, which must consider the case where the source and destination of llvm.memcpy is the same. The make senses at the IR level, where llvm.memcpy is treated as a single op as the exact-overlap means the copy is a no-op and won't always overwrite the destination memory. But, where we lower that copy into loads/stores, we say that loads/stores won't alias. That's fine as the order between loads and stores on the same offset (or location for the exact-overlap case) is established through the data dependency. In addition, the no-alias established here is a scoped one, which only applies to loads/stored from this llvm.memcpy only. It won't affect the AA result between them to loads/stores out of the scope. (This patch depends on https://reviews.llvm.org/D102215, which propagates scoped AA on mem ops into loads/stores after lowering.)
But, there are definitely issues where the backend may split/merge/narrow/widen memory operations. For split and narrow, they seem fine. I started to address the merge issue in https://reviews.llvm.org/D102821 to ensure the correct scoped AA is used on the merged stores or loads. For widen, mostly load widening, we have a quite tricky issue as the widening result, in most cases, doesn't care about extra content loaded from memory. They want to choose the widened one to maximize performance but we have the risk that simply reusing the scoped AA metadata may result in undesired behavior.
As mentioned in D102255#2766721, we still would need to use separate scopes for ever load/store pair.