Page MenuHomePhabricator

[SelectionDAG] Generate scoped AA metadata when lowering memcpy.
Needs ReviewPublic

Authored by hliao on May 11 2021, 10:03 AM.

Details

Summary
  • When memcpy is lowered in SelectionDAG, similar to inlining a function, scoped AA metadata should be generated and attached on the lowered loads/stores following 'noalias' arguments.

Diff Detail

Unit TestsFailed

TimeTest
80 msx64 debian > LLVM.CodeGen/AArch64::memcpy-scoped-aa.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=aarch64-linux-gnu -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AArch64/memcpy-scoped-aa.ll
150 msx64 debian > LLVM.CodeGen/AMDGPU::memcpy-scoped-aa.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/AMDGPU/memcpy-scoped-aa.ll
80 msx64 debian > LLVM.CodeGen/X86::memcpy-scoped-aa.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llc -mtriple=x86_64-linux-gnu -stop-after=finalize-isel -o - /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/X86/memcpy-scoped-aa.ll | /var/lib/buildkite-agent/builds/llvm-project/build/bin/FileCheck --check-prefix=MIR /var/lib/buildkite-agent/builds/llvm-project/llvm/test/CodeGen/X86/memcpy-scoped-aa.ll
130 msx64 windows > LLVM.CodeGen/AArch64::memcpy-scoped-aa.ll
Script: -- : 'RUN: at line 1'; c:\ws\w2\llvm-project\premerge-checks\build\bin\llc.exe -mtriple=aarch64-linux-gnu -o - C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\memcpy-scoped-aa.ll | c:\ws\w2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AArch64\memcpy-scoped-aa.ll
160 msx64 windows > LLVM.CodeGen/AMDGPU::memcpy-scoped-aa.ll
Script: -- : 'RUN: at line 1'; c:\ws\w2\llvm-project\premerge-checks\build\bin\llc.exe -mtriple=amdgcn-amd-amdhsa -mcpu=gfx1010 -o - C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\memcpy-scoped-aa.ll | c:\ws\w2\llvm-project\premerge-checks\build\bin\filecheck.exe C:\ws\w2\llvm-project\premerge-checks\llvm\test\CodeGen\AMDGPU\memcpy-scoped-aa.ll
View Full Test Results (6 Failed)

Event Timeline

hliao created this revision.May 11 2021, 10:03 AM
hliao requested review of this revision.May 11 2021, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 10:03 AM
hliao added inline comments.May 11 2021, 10:14 AM
llvm/test/CodeGen/AArch64/arm64-2012-05-07-MemcpyAlignBug.ll
12

With that extra scoped AA info, this store is freely scheduled ahead to minimize register pressure as cyclone disables the latency heuristic.

llvm/test/CodeGen/AArch64/arm64-memcpy-inline.ll
33

Similar change due to 'cyclone' schedule favors register pressure over latency.

llvm/test/CodeGen/AArch64/arm64-misaligned-memcpy-inline.ll
36

functionality wise the same. The sequence of offset looks better in terms of the locality.

llvm/test/CodeGen/PowerPC/pr45301.ll
29

The same from the functionality wise but loads from that memcpy are scheduled ahead in the favor of latency hiding.

nikic added a subscriber: nikic.

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

hliao added a comment.EditedMay 11 2021, 1:17 PM

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

The langref says "The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.". It says that src and dst are either equal or non-overlapping. For the equal case, they should be eliminated earlier. Even if there are remaining ones, claiming them 'NoAlias' doesn't have correctness issue when copying them. It won't cause correctness issue if we schedule those loads and stores freely.
In addition, the extra AA scope metadata here only has impact on this instance of memcpy.

It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.

Given the very late introduction of new !alias.scope and !noalias metadata, is their a way to have a testcase look at a machine ir dump, together with the metadata output at that phase ?

hliao added a comment.May 11 2021, 2:11 PM

It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.

Given the very late introduction of new !alias.scope and !noalias metadata, is their a way to have a testcase look at a machine ir dump, together with the metadata output at that phase ?

That's on my plan to enable the dumping of scoped AA in MIR. These AA metadata are generated within the backend and need special support to dump them friendly.

gchatelet retitled this revision from [SelectionDAG] Generate scoped AA metadata when loweing memcpy. to [SelectionDAG] Generate scoped AA metadata when lowering memcpy..May 12 2021, 12:27 AM
nikic added a comment.May 12 2021, 1:48 PM

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

The langref says "The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping. It copies “len” bytes of memory over. If the argument is known to be aligned to some boundary, this can be specified as an attribute on the argument.". It says that src and dst are either equal or non-overlapping. For the equal case, they should be eliminated earlier.

Generally, you do not know whether they are equal or not.

Even if there are remaining ones, claiming them 'NoAlias' doesn't have correctness issue when copying them. It won't cause correctness issue if we schedule those loads and stores freely.

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

I propose to discuss this in tomorrow's aa tech call.

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all store to one scope and all loads to another.

I order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes..

jeroen.dobbelaere added a comment.EditedTue, May 18, 12:06 PM

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all store to one scope and all loads to another.

I order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes..

This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.

possible example:

// memcpy(dst, src, 23) becomes:
store i64 dst+0, (load i64 src+0)    // scope 0
store i64 dst+8, (load i64, src+8)   // scope 1, not overlapping with previous pair
store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair
hliao updated this revision to Diff 346540.Wed, May 19, 1:02 PM

Add MIR printer and parser support for scoped AA metadata generated in the backend.

hliao added a comment.Wed, May 19, 1:08 PM

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all store to one scope and all loads to another.

I order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes..

This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.

possible example:

// memcpy(dst, src, 23) becomes:
store i64 dst+0, (load i64 src+0)    // scope 0
store i64 dst+8, (load i64, src+8)   // scope 1, not overlapping with previous pair
store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair

Not being able to catch that meeting. I am not sure the current SDAG would generate that. Let me check. If we did that in the current SDAG implementation, we may remove scoped AA metadata for that trailing (or maybe the heading) loads as the conservative solution before we introduce new scopes.

Just read the relevant threads and bugs reported on the change of allowing exact-overlap on llvm.memcpy. See the reference list at the end. Personally, I think it's OK to assume NoAlias added here. By allowing exact-overlap in llvm.memcpy, the most significant change is on the basic-aa, which must consider the case where the source and destination of llvm.memcpy is the same. The make senses at the IR level, where llvm.memcpy is treated as a single op as the exact-overlap means the copy is a no-op and won't always overwrite the destination memory. But, where we lower that copy into loads/stores, we say that loads/stores won't alias. That's fine as the order between loads and stores on the same offset (or location for the exact-overlap case) is established through the data dependency. In addition, the no-alias established here is a scoped one, which only applies to loads/stored from this llvm.memcpy only. It won't affect the AA result between them to loads/stores out of the scope. (This patch depends on https://reviews.llvm.org/D102215, which propagates scoped AA on mem ops into loads/stores after lowering.)

But, there are definitely issues where the backend may split/merge/narrow/widen memory operations. For split and narrow, they seem fine. I started to address the merge issue in https://reviews.llvm.org/D102821 to ensure the correct scoped AA is used on the merged stores or loads. For widen, mostly load widening, we have a quite tricky issue as the widening result, in most cases, doesn't care about extra content loaded from memory. They want to choose the widened one to maximize performance but we have the risk that simply reusing the scoped AA metadata may result in undesired behavior.

https://lists.llvm.org/pipermail/cfe-dev/2020-August/066614.html

Add MIR printer and parser support for scoped AA metadata generated in the backend.

Hi Michael, I think it would be better (easier to review) if you could put this part in a separate patch.

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

hliao added a comment.Tue, May 25, 1:31 PM

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

OK, I need to add unit tests to verify that as we won't be able to generate machine metadata within the current backend. I will prepare that this night.

hliao added a comment.Wed, May 26, 1:44 PM

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

OK, I need to add unit tests to verify that as we won't be able to generate machine metadata within the current backend. I will prepare that this night.

MIR printer change is split into https://reviews.llvm.org/D103205

hliao updated this revision to Diff 348369.Thu, May 27, 1:14 PM

Clean up after splitting MIR printer and parser changes.

hliao added a comment.Thu, May 27, 1:56 PM

Clean up after splitting MIR printer and parser changes.

here's the printer change (D103205) and parse change (D103282)

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
6399–6422

As mentioned in D102255#2766721, we still would need to use separate scopes for ever load/store pair.
That's the only way to avoid a possible 'noalias' result when the source and destination happen to be identical.
(based on the AA Techcall discussion, such a situation was considered to be not acceptable).