This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Fix DAG combine store elimination, different address space.
ClosedPublic

Authored by hgreving on May 7 2021, 4:18 PM.

Details

Summary
Fixes a bug in the DAG combiner that eliminates the stores because it missed
to inspect the address space of the pointers.

%v = load %ptr_as1
// no chain side effect
store %v, %ptr_as2

As well as

store %v, %ptr_as1
store %v, %ptr_as2

Adds a test for above in X86.

Diff Detail

Event Timeline

hgreving created this revision.May 7 2021, 4:18 PM
hgreving requested review of this revision.May 7 2021, 4:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 7 2021, 4:18 PM
hgreving edited the summary of this revision. (Show Details)May 7 2021, 4:18 PM
myhsu added a subscriber: myhsu.May 9 2021, 9:30 PM

This patch looks legit to me.
I'll suggest you to put people that have been actively working on this part as reviewers. They might have some other comments.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17920

just curious, is the code here formatted by clang-format?

llvm/test/CodeGen/X86/fs_gs_dag_combine.ll
18 ↗(On Diff #343779)

copy_fs_diff and copy_gs_diff seem a little redundant to me: The address spaces are already different, why should we care about address? Can you elaborate a little more on the purpose of these two functions?

What about DeadStoreElimination ? It can also merge stores.. maybe dse needs similar fix.

RKSimon added inline comments.May 10 2021, 4:47 AM
llvm/test/CodeGen/X86/fs_gs_dag_combine.ll
3 ↗(On Diff #343779)

very pedantic but maybe call this file dagcombine-dead-store.ll? That matches similar files in the folder. Then improve the comment here to explain that the address spaces are different and the stores shouldn't be eliminated?

hgreving updated this revision to Diff 344066.May 10 2021, 8:27 AM
hgreving marked an inline comment as done.
hgreving added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17920

Yes.

llvm/test/CodeGen/X86/fs_gs_dag_combine.ll
18 ↗(On Diff #343779)

I've added a comment. I thought the two more tests won't hurt, I'm fine removing the :gs versions if considered unnecessary.

What about DeadStoreElimination ? It can also merge stores.. maybe dse needs similar fix.

https://www.godbolt.org/z/aqncWfb4c seems to suggest maybe, though haven't debugged whether it's DSE or DAG combine again who does it.

What about DeadStoreElimination ? It can also merge stores.. maybe dse needs similar fix.

https://www.godbolt.org/z/aqncWfb4c seems to suggest maybe, though haven't debugged whether it's DSE or DAG combine again who does it.

If you show the IR for that example, it looks ok:
https://www.godbolt.org/z/55v7anfa3
I don't see any explicit references to address spaces within DSE itself, so I assume the underlying ValueTracking and/or Alias Analysis is ok here.

What about DeadStoreElimination ? It can also merge stores.. maybe dse needs similar fix.

https://www.godbolt.org/z/aqncWfb4c seems to suggest maybe, though haven't debugged whether it's DSE or DAG combine again who does it.

If you show the IR for that example, it looks ok:
https://www.godbolt.org/z/55v7anfa3

Can you add that example to the test file and commit that file as a preliminary step?
Please use "utils/update_llc_test_checks.py" if possible to auto-generate the FileCheck lines.

I've found the same bug in DAG combine, this time for store to store as somebody has already suspected above. I will add a fix + test this to this patch.

hgreving updated this revision to Diff 344090.May 10 2021, 9:32 AM

Extended the scope of this a little bit. The same bug was present for store to store opts as well.
Adds more tests and elaborates more in the comment.

arsenm requested changes to this revision.May 10 2021, 10:02 AM
arsenm added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17934

You don't need the getPointerInfo, StoreSDNode directly provides getAddressSpace

This revision now requires changes to proceed.May 10 2021, 10:02 AM
hgreving edited the summary of this revision. (Show Details)May 10 2021, 10:07 AM
hgreving updated this revision to Diff 344109.May 10 2021, 10:27 AM
hgreving marked an inline comment as done.

s/->getPointerInfo()->getAddrSpace()/->getAddressSpace()/

s/->getPointerInfo()->getAddrSpace()/->getAddressSpace()/

hgreving updated this revision to Diff 344131.May 10 2021, 11:34 AM

clang-format missed for last update.

arsenm accepted this revision.May 10 2021, 12:11 PM
This revision is now accepted and ready to land.May 10 2021, 12:11 PM

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

I did miss this comment. Could you clarify, do you want me to commit the failing test, or a passing version with a FIXME?

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

I did miss this comment. Could you clarify, do you want me to commit the failing test, or a passing version with a FIXME?

@spatel meant the passing version (current trunk codegen) with a FIXME - you then rebase so this patch shows the codegen diffs - its easier to handle if you auto generate the checks with the update scripts.

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

I did miss this comment. Could you clarify, do you want me to commit the failing test, or a passing version with a FIXME?

@spatel meant the passing version (current trunk codegen) with a FIXME - you then rebase so this patch shows the codegen diffs - its easier to handle if you auto generate the checks with the update scripts.

Right. It would be nice to commit the baseline (current) output - it will look something like this:

define i32 @copy_fs_same() {
; CHECK-LABEL: copy_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl 1, %eax
; CHECK-NEXT:    retl
  %t0 = load i32, i32* inttoptr (i64 1 to i32*), align 4
  store i32 %t0, i32 addrspace(257)* inttoptr (i64 1 to i32 addrspace(257)*), align 4
  ret i32 %t0
}

One more test question - is it possible to add a test that exercises the indexed address code path?
Something like this might do it?

define void @output_fs_same_indexed(i32 %v, i32* %b) {
; CHECK-LABEL: output_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
; CHECK-NEXT:    movl %eax, 168(%ecx)
; CHECK-NEXT:    retl
  %p = getelementptr i32, i32* %b, i64 42
  %pa = addrspacecast i32* %p to i32 addrspace(257)*
  store i32 %v, i32* %p, align 4
  store i32 %v, i32 addrspace(257)* %pa, align 4
  ret void
}

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

I did miss this comment. Could you clarify, do you want me to commit the failing test, or a passing version with a FIXME?

@spatel meant the passing version (current trunk codegen) with a FIXME - you then rebase so this patch shows the codegen diffs - its easier to handle if you auto generate the checks with the update scripts.

Right. It would be nice to commit the baseline (current) output - it will look something like this:

define i32 @copy_fs_same() {
; CHECK-LABEL: copy_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl 1, %eax
; CHECK-NEXT:    retl
  %t0 = load i32, i32* inttoptr (i64 1 to i32*), align 4
  store i32 %t0, i32 addrspace(257)* inttoptr (i64 1 to i32 addrspace(257)*), align 4
  ret i32 %t0
}

One more test question - is it possible to add a test that exercises the indexed address code path?
Something like this might do it?

define void @output_fs_same_indexed(i32 %v, i32* %b) {
; CHECK-LABEL: output_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
; CHECK-NEXT:    movl %eax, 168(%ecx)
; CHECK-NEXT:    retl
  %p = getelementptr i32, i32* %b, i64 42
  %pa = addrspacecast i32* %p to i32 addrspace(257)*
  store i32 %v, i32* %p, align 4
  store i32 %v, i32 addrspace(257)* %pa, align 4
  ret void
}

WB https://reviews.llvm.org/D102243 ?
Let me add the suggested test changes.

hgreving updated this revision to Diff 344434.May 11 2021, 9:20 AM

Removed :gs versions in the test.
Added indexed store tests.

Code changes LG.
I'd still prefer for the tests to go in first, so we have a regression test record of the miscompiles. Let me know if the suggestion was missed or isn't clear.

I did miss this comment. Could you clarify, do you want me to commit the failing test, or a passing version with a FIXME?

@spatel meant the passing version (current trunk codegen) with a FIXME - you then rebase so this patch shows the codegen diffs - its easier to handle if you auto generate the checks with the update scripts.

Right. It would be nice to commit the baseline (current) output - it will look something like this:

define i32 @copy_fs_same() {
; CHECK-LABEL: copy_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl 1, %eax
; CHECK-NEXT:    retl
  %t0 = load i32, i32* inttoptr (i64 1 to i32*), align 4
  store i32 %t0, i32 addrspace(257)* inttoptr (i64 1 to i32 addrspace(257)*), align 4
  ret i32 %t0
}

One more test question - is it possible to add a test that exercises the indexed address code path?
Something like this might do it?

define void @output_fs_same_indexed(i32 %v, i32* %b) {
; CHECK-LABEL: output_fs_same:
; CHECK:       # %bb.0:
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %ecx
; CHECK-NEXT:    movl %eax, 168(%ecx)
; CHECK-NEXT:    retl
  %p = getelementptr i32, i32* %b, i64 42
  %pa = addrspacecast i32* %p to i32 addrspace(257)*
  store i32 %v, i32* %p, align 4
  store i32 %v, i32 addrspace(257)* %pa, align 4
  ret void
}

Done.. I've removed the :gs versions, they didn't add any value.

hgreving updated this revision to Diff 344449.May 11 2021, 9:50 AM

Used update_llc_test_checks.py for CHECKs.

hgreving updated this revision to Diff 344465.May 11 2021, 10:30 AM

Fix test's comment gibberish.

If you update/rebase this patch after landing D102243, the test file should have just a few different CHECK lines.

spatel accepted this revision.May 11 2021, 10:31 AM

LGTM - thanks!

hgreving updated this revision to Diff 344471.May 11 2021, 10:32 AM
jrtc27 added a subscriber: jrtc27.May 11 2021, 10:33 AM
jrtc27 added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

Should this be weakened using isNoopAddrSpaceCast both(?) ways round?

hgreving marked an inline comment as done.May 11 2021, 10:53 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

I am actually not sure about the _exact_ semantics of isNoopAddrSpaceCast(). Does it imply that if true, it is a complete nop w.r.t. the memory model?

jrtc27 added inline comments.May 11 2021, 10:56 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

memcpy/memmove/memset libcall lowering at least relies on being able to lower calls with any pointers that can be no-op cast to address space 0 (see checkAddrSpaceIsValidForLibcall). CodeGenPrepare also peeks through them though haven't looked exactly how.

hgreving added inline comments.May 11 2021, 11:20 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

My take on this would be to stay rather slightly conservative and fix this bug, which can lead to clear miscompiles, unless somebody jumps in and confirms the exact semantics of isNoopAddrSpaceCast(). The comment just says "is a nop". This doesn't directly mean to me that one can eliminate a store like that. Then OTOH, if the two stores are in any way different, then the cast should not be a nop either, so maybe yes?

Should I add a TODO comment here maybe?

arsenm added inline comments.May 11 2021, 1:38 PM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

I do not think this should check isNoopAddrSpaceCast. This is mostly a question of whether an addrspacecast changes the bitpattern of the pointer. The memory accesses may still have different properties and isn't related to these combines

Is this ok pushing? There was an open question about isNoopAddrSpaceCast(). I would suggest somebody would look at that post this patch as a possible minor improvement?

hgreving marked an inline comment as done.May 11 2021, 7:46 PM
hgreving added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17919

Makes sense.

RKSimon accepted this revision.May 12 2021, 1:56 AM

LGTM with a few minors (update the patch just before commit - no need to get the TODO comments reviewed again)

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17915

Add an additional TODO comment about adding a isNoopAddrSpaceCast check

17928

Add an additional TODO comment

17940

Add a TODO comment

arsenm added inline comments.May 12 2021, 5:27 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
17915

No, do not add this todo. It does not make any sense. There is no addrspacecasting going on in this context

This revision was landed with ongoing or failed builds.May 12 2021, 7:22 AM
This revision was automatically updated to reflect the committed changes.
hgreving marked an inline comment as done.