
[GVN] Clobber partially aliased loads.
ClosedPublic

Authored by dfukalov on Jan 27 2021, 10:17 AM.

Details

Summary

Use the clobber offsets stored in AliasResult, as implemented in D98718.

Updated with a fix for the issue reported in https://reviews.llvm.org/D95543#2745161
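For context, the transform this patch enables is forwarding a narrower load from a wider, partially overlapping one: once the byte offset between the two accesses is known, the narrow value can be recovered with shift-and-truncate arithmetic, which is exactly the `lshr`/`trunc` sequences visible in the test cases below. A minimal sketch of that bit math (the function name and LE/BE handling are illustrative, not LLVM API):

```python
def forward_from_wider_load(wide_value: int, wide_size: int,
                            offset: int, narrow_size: int,
                            little_endian: bool = True) -> int:
    """Extract the value a narrow load at `offset` bytes into a wider
    access would observe, given the wider load's integer value.
    Mirrors the lshr/trunc sequences GVN emits in the tests."""
    assert 0 <= offset and offset + narrow_size <= wide_size
    if little_endian:
        shift = 8 * offset
    else:
        shift = 8 * (wide_size - offset - narrow_size)
    return (wide_value >> shift) & ((1 << (8 * narrow_size)) - 1)

# Memory bytes [0x11, 0x22, 0x33, 0x44]; the i32 LE value is 0x44332211.
# An i8 load at byte offset 1 must see 0x22 (shift by 8, then truncate).
assert forward_from_wider_load(0x44332211, 4, 1, 1) == 0x22
# Big-endian i32 value of the same bytes is 0x11223344; byte offset 1
# now sits 16 bits above the LSB, matching the BE check lines' shifts.
assert forward_from_wider_load(0x11223344, 4, 1, 1, little_endian=False) == 0x22
```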

Diff Detail

Event Timeline

dfukalov requested review of this revision.Jan 27 2021, 10:17 AM
dfukalov created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2021, 10:17 AM
dfukalov planned changes to this revision.Jan 27 2021, 10:18 AM
dfukalov updated this revision to Diff 327811.Mar 3 2021, 8:33 AM

Implementation.

dfukalov retitled this revision from [WIP][GVN] Clobber partially aliased loads. to [GVN] Clobber partially aliased loads..Mar 3 2021, 8:39 AM
dfukalov edited the summary of this revision. (Show Details)
dfukalov added reviewers: nikic, asbirlea, fhahn.
dfukalov added a subscriber: fvrmatteo.
bmahjour removed a subscriber: bmahjour.Mar 3 2021, 8:40 AM
dfukalov planned changes to this revision.Mar 16 2021, 9:19 AM

The patch should be updated after converting AliasResult.

dfukalov updated this revision to Diff 335902.Apr 7 2021, 12:26 PM

Rebased to use reworked AliasResult in D98718.

dfukalov edited the summary of this revision. (Show Details)Apr 7 2021, 12:27 PM

Performance-wise this looks good.
Please wait to see if the other reviewers have comments on this.

dfukalov updated this revision to Diff 336377.Apr 9 2021, 3:33 AM

Rebased after preparation changes (D98027+D98718) committed.

dfukalov updated this revision to Diff 336379.Apr 9 2021, 3:38 AM

Removed unneeded change in MemorySSATest.cpp.

nikic added inline comments.Apr 11 2021, 2:10 PM
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
518

Do you have any information on why this is not a concern anymore? The comment was introduced in https://github.com/llvm/llvm-project/commit/a471751c24324e7ba6ac5c612dbedb16c644fc44, unfortunately without a test case :(

522

Say we have a load from X, then a load from a PartialAlias of X without a known offset, then another load from X. I think after your change we will no longer forward from the first load, because we'll return a clobber for the PartialAlias in between, even though it cannot be used.

Can you please test this situation? Maybe we should not return a clobber if no offset is available?
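The scenario above can be modeled as a toy backwards scan over prior accesses (the names and tuple shapes are hypothetical, not the MemDep API): stopping at a PartialAlias clobber with no usable offset hides the earlier MustAlias load the query could otherwise forward from.

```python
def scan_for_dep(prior_accesses):
    """Toy model of a MemDep-style backwards scan for a load.  Each
    prior access is (name, alias_kind_vs_queried_pointer, offset),
    nearest access last.  Not the LLVM API -- just the concern above."""
    for name, kind, offset in reversed(prior_accesses):
        if kind == "MustAlias":
            return ("def", name)                  # value can be forwarded
        if kind == "PartialAlias" and offset is None:
            return ("clobber", name)              # blocks forwarding
        if kind == "PartialAlias":
            return ("clobber_with_offset", name)  # forwardable via offset
        # NoAlias accesses are skipped and the scan continues.
    return ("nonlocal", None)

# load X; load Y (PartialAlias of X, offset unknown); then load X again.
# The scan stops at the useless clobber and never reaches the first load:
accesses = [("first load of X", "MustAlias", None),
            ("load of Y", "PartialAlias", None)]
assert scan_for_dep(accesses) == ("clobber", "load of Y")
```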

llvm/test/Transforms/GVN/clobber-partial-alias.ll
1

update_test_checks please.

dfukalov updated this revision to Diff 336899.Apr 12 2021, 10:47 AM

Addressing comments.

dfukalov marked 2 inline comments as done.Apr 12 2021, 10:59 AM
dfukalov added inline comments.
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
518

Yes, I saw that change, and because of this comment I added a VisitedPhiBBs check when we return PartialAlias with an offset in BasicAAResult::aliasGEP.

522

Yes, you're right: there is no reason to return PartialAlias here if we don't know the offset.

dfukalov updated this revision to Diff 339022.Tue, Apr 20, 3:54 PM
dfukalov marked an inline comment as done.

Added test for partially aliased loads with phi translation in address.

dfukalov marked an inline comment as done.Tue, Apr 20, 3:56 PM
dfukalov added inline comments.
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
518

I added test load_load_partial_alias_cross_block_phi_trans to check this.

nikic added inline comments.Thu, Apr 22, 2:27 PM
llvm/test/Transforms/GVN/PRE/rle.ll
736 ↗(On Diff #339022)

I believe for phi translation to work you need to have the translated geps in the predecessor. This test case worked for me:

define i32 @load_load_partial_alias_cross_block_phi_trans(i8* %P) nounwind {
; LE-LABEL: @load_load_partial_alias_cross_block_phi_trans(
; LE-NEXT:  entry:
; LE-NEXT:    [[XX:%.*]] = bitcast i8* [[P:%.*]] to i32*
; LE-NEXT:    [[X1:%.*]] = load i32, i32* [[XX]], align 4
; LE-NEXT:    [[CMP:%.*]] = icmp eq i32 [[X1]], 127
; LE-NEXT:    [[TMP0:%.*]] = lshr i32 [[X1]], 16
; LE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; LE-NEXT:    [[TMP2:%.*]] = lshr i32 [[X1]], 8
; LE-NEXT:    [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
; LE-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[ELSE:%.*]]
; LE:       if:
; LE-NEXT:    br label [[JOIN:%.*]]
; LE:       else: 
; LE-NEXT:    br label [[JOIN]]
; LE:       join: 
; LE-NEXT:    [[TMP5:%.*]] = phi i8 [ [[TMP3]], [[IF]] ], [ [[TMP1]], [[ELSE]] ]
; LE-NEXT:    [[CONV6:%.*]] = zext i8 [[TMP5]] to i32
; LE-NEXT:    ret i32 [[CONV6]]
; LE:       if.end:
; LE-NEXT:    ret i32 52 
; 
; BE-LABEL: @load_load_partial_alias_cross_block_phi_trans(
; BE-NEXT:  entry:
; BE-NEXT:    [[XX:%.*]] = bitcast i8* [[P:%.*]] to i32*
; BE-NEXT:    [[X1:%.*]] = load i32, i32* [[XX]], align 4
; BE-NEXT:    [[CMP:%.*]] = icmp eq i32 [[X1]], 127
; BE-NEXT:    [[TMP0:%.*]] = lshr i32 [[X1]], 8
; BE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; BE-NEXT:    [[TMP2:%.*]] = lshr i32 [[X1]], 16
; BE-NEXT:    [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
; BE-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[ELSE:%.*]]
; BE:       if:
; BE-NEXT:    br label [[JOIN:%.*]]
; BE:       else: 
; BE-NEXT:    br label [[JOIN]]
; BE:       join: 
; BE-NEXT:    [[TMP5:%.*]] = phi i8 [ [[TMP3]], [[IF]] ], [ [[TMP1]], [[ELSE]] ]
; BE-NEXT:    [[CONV6:%.*]] = zext i8 [[TMP5]] to i32
; BE-NEXT:    ret i32 [[CONV6]]
; BE:       if.end:
; BE-NEXT:    ret i32 52
; 
entry:
  %xx = bitcast i8* %P to i32*
  %x1 = load i32, i32* %xx, align 4
  %cmp = icmp eq i32 %x1, 127
  br i1 %cmp, label %if, label %else

if:
  %arrayidx.if = getelementptr inbounds i8, i8* %P, i64 1
  br label %join

else:
  %arrayidx.else = getelementptr inbounds i8, i8* %P, i64 2
  br label %join
  
join: 
  %idx = phi i64 [ 1, %if ], [ 2, %else ]
  %arrayidx4 = getelementptr inbounds i8, i8* %P, i64 %idx
  %tmp5 = load i8, i8* %arrayidx4, align 1
  %conv6 = zext i8 %tmp5 to i32
  ret i32 %conv6

if.end:
  ret i32 52
}

I think the actually problematic case probably has something to do with phi translation in loops (where you translate into the same block, but in a different iteration). I'll try to play with that tomorrow, but if I don't find anything let's just land this and see if any miscompiles turn up :)

nikic accepted this revision.Fri, Apr 23, 10:02 AM

LGTM, you might want to pick up the additional test cases.

llvm/lib/Transforms/Scalar/GVN.cpp
1009

I don't think we can really do anything with negative offsets, unless we want to do something like shorten the second load and combine it with the value of the first load. That's unlikely to be profitable.
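The coverage condition behind this remark can be written as a small predicate (a sketch with hypothetical names, not LLVM code): forwarding by shift-and-truncate only works when the later load's bytes lie entirely inside the clobbering value, and a negative offset means the later load starts before the clobbering access, so its low bytes are simply not present.

```python
def can_forward(clobber_offset: int, wide_size: int, narrow_size: int) -> bool:
    """The clobbering access covers bytes [0, wide_size); the later load
    reads [clobber_offset, clobber_offset + narrow_size).  Forwarding is
    only possible when that range is fully contained in the wider value."""
    return clobber_offset >= 0 and clobber_offset + narrow_size <= wide_size

assert can_forward(1, 4, 1)        # i8 at offset 1 inside an i32: OK
assert not can_forward(-1, 4, 1)   # starts one byte before the i32
assert not can_forward(3, 4, 2)    # runs one byte past the end of the i32
```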

llvm/test/Transforms/GVN/PRE/rle.ll
736 ↗(On Diff #339022)

I constructed a case that involves a loop, but everything seems to work fine in that case as well:

define void @load_load_partial_alias_loop(i8* %P) {
; LE-LABEL: @load_load_partial_alias_loop(
; LE-NEXT:  entry:
; LE-NEXT:    [[P_1:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 1
; LE-NEXT:    [[V_1:%.*]] = load i8, i8* [[P_1]], align 1
; LE-NEXT:    call void @use.i8(i8 [[V_1]])
; LE-NEXT:    [[P_1_32:%.*]] = bitcast i8* [[P_1]] to i32*
; LE-NEXT:    [[V_1_32:%.*]] = load i32, i32* [[P_1_32]], align 4
; LE-NEXT:    call void @use.i32(i32 [[V_1_32]])
; LE-NEXT:    [[TMP0:%.*]] = trunc i32 [[V_1_32]] to i8
; LE-NEXT:    br label [[LOOP:%.*]]
; LE:       loop:
; LE-NEXT:    [[V_I:%.*]] = phi i8 [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP2:%.*]], [[LOOP_LOOP_CRIT_EDGE:%.*]] ]
; LE-NEXT:    [[I:%.*]] = phi i64 [ 1, [[ENTRY]] ], [ [[I_INC:%.*]], [[LOOP_LOOP_CRIT_EDGE]] ]
; LE-NEXT:    [[P_I:%.*]] = getelementptr i8, i8* [[P]], i64 [[I]]
; LE-NEXT:    call void @use.i8(i8 [[V_I]])
; LE-NEXT:    [[P_I_32:%.*]] = bitcast i8* [[P_I]] to i32*
; LE-NEXT:    [[V_I_32:%.*]] = load i32, i32* [[P_I_32]], align 4
; LE-NEXT:    call void @use.i32(i32 [[V_I_32]])
; LE-NEXT:    [[I_INC]] = add i64 [[I]], 1
; LE-NEXT:    [[CMP:%.*]] = icmp ne i64 [[I_INC]], 64
; LE-NEXT:    [[TMP1:%.*]] = lshr i32 [[V_I_32]], 8
; LE-NEXT:    [[TMP2]] = trunc i32 [[TMP1]] to i8
; LE-NEXT:    br i1 [[CMP]], label [[LOOP_LOOP_CRIT_EDGE]], label [[EXIT:%.*]]
; LE:       loop.loop_crit_edge:
; LE-NEXT:    br label [[LOOP]]
; LE:       exit:
; LE-NEXT:    ret void
;
; BE-LABEL: @load_load_partial_alias_loop(
; BE-NEXT:  entry:
; BE-NEXT:    [[P_1:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 1
; BE-NEXT:    [[V_1:%.*]] = load i8, i8* [[P_1]], align 1
; BE-NEXT:    call void @use.i8(i8 [[V_1]])
; BE-NEXT:    [[P_1_32:%.*]] = bitcast i8* [[P_1]] to i32*
; BE-NEXT:    [[V_1_32:%.*]] = load i32, i32* [[P_1_32]], align 4
; BE-NEXT:    call void @use.i32(i32 [[V_1_32]])
; BE-NEXT:    [[TMP0:%.*]] = lshr i32 [[V_1_32]], 24
; BE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; BE-NEXT:    br label [[LOOP:%.*]]
; BE:       loop:
; BE-NEXT:    [[V_I:%.*]] = phi i8 [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP3:%.*]], [[LOOP_LOOP_CRIT_EDGE:%.*]] ]
; BE-NEXT:    [[I:%.*]] = phi i64 [ 1, [[ENTRY]] ], [ [[I_INC:%.*]], [[LOOP_LOOP_CRIT_EDGE]] ]
; BE-NEXT:    [[P_I:%.*]] = getelementptr i8, i8* [[P]], i64 [[I]]
; BE-NEXT:    call void @use.i8(i8 [[V_I]])
; BE-NEXT:    [[P_I_32:%.*]] = bitcast i8* [[P_I]] to i32*
; BE-NEXT:    [[V_I_32:%.*]] = load i32, i32* [[P_I_32]], align 4
; BE-NEXT:    call void @use.i32(i32 [[V_I_32]])
; BE-NEXT:    [[I_INC]] = add i64 [[I]], 1
; BE-NEXT:    [[CMP:%.*]] = icmp ne i64 [[I_INC]], 64
; BE-NEXT:    [[TMP2:%.*]] = lshr i32 [[V_I_32]], 16
; BE-NEXT:    [[TMP3]] = trunc i32 [[TMP2]] to i8
; BE-NEXT:    br i1 [[CMP]], label [[LOOP_LOOP_CRIT_EDGE]], label [[EXIT:%.*]]
; BE:       loop.loop_crit_edge:
; BE-NEXT:    br label [[LOOP]]
; BE:       exit:
; BE-NEXT:    ret void
;
entry:
  %P.1 = getelementptr i8, i8* %P, i64 1
  %v.1 = load i8, i8* %P.1
  call void @use.i8(i8 %v.1)
  %P.1.32 = bitcast i8* %P.1 to i32*
  %v.1.32 = load i32, i32* %P.1.32
  call void @use.i32(i32 %v.1.32)
  br label %loop

loop:
  %i = phi i64 [ 1, %entry ], [ %i.inc, %loop ]
  %P.i = getelementptr i8, i8* %P, i64 %i
  %v.i = load i8, i8* %P.i
  call void @use.i8(i8 %v.i)
  %P.i.32 = bitcast i8* %P.i to i32*
  %v.i.32 = load i32, i32* %P.i.32
  call void @use.i32(i32 %v.i.32)
  %i.inc = add i64 %i, 1
  %cmp = icmp ne i64 %i.inc, 64
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

declare void @use.i8(i8) readnone
declare void @use.i32(i32) readnone

Now, I have some doubts that these transforms are profitable, but at least they're not wrong.

This revision is now accepted and ready to land.Fri, Apr 23, 10:02 AM
This revision was automatically updated to reflect the committed changes.

We're seeing some issues with this patch, potentially a miscompile. When we enable assertions we get an error about widening atomic loads -- I'm not sure this is the source of the miscompile we're seeing, but it certainly looks related (I think this is causing corruption and leading to issues elsewhere).

$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
#10 0x000000000768c138 llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst*, unsigned int, llvm::Type*, llvm::Instruction*, llvm::DataLayout const&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:497:5
#11 0x00000000070a01ad llvm::gvn::AvailableValue::MaterializeAdjustedValue(llvm::LoadInst*, llvm::Instruction*, llvm::GVN&) const /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:902:11
#12 0x00000000070ae067 llvm::gvn::AvailableValueInBlock::MaterializeAdjustedValue(llvm::LoadInst*, llvm::GVN&) const /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:281:5
#13 0x00000000070a1d11 ConstructSSAForLoadSet(llvm::LoadInst*, llvm::SmallVectorImpl<llvm::gvn::AvailableValueInBlock>&, llvm::GVN&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:874:40
#14 0x00000000070a3ce6 llvm::GVN::processNonLocalLoad(llvm::LoadInst*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:1614:12
#15 0x00000000070a610a llvm::GVN::processLoad(llvm::LoadInst*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:1866:5
#16 0x00000000070a7398 llvm::GVN::processInstruction(llvm::Instruction*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2292:9
#17 0x00000000070a8175 llvm::GVN::processBlock(llvm::BasicBlock*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2490:24
#18 0x00000000070a7c4f llvm::GVN::iterateOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2837:16
#19 0x000000000709fdd6 llvm::GVN::runImpl(llvm::Function&, llvm::AssumptionCache&, llvm::DominatorTree&, llvm::TargetLibraryInfo const&, llvm::AAResults&, llvm::MemoryDependenceResults*, llvm::LoopInfo*, llvm::OptimizationRemarkEmitter*, llvm::MemorySSA*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2436:20
#20 0x000000000709fa6e llvm::GVN::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:674:8
#21 0x0000000007873fb7 llvm::detail::PassModel<llvm::Function, llvm::GVN, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:85:17
#22 0x000000000691574c llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManager.h:517:16
#23 0x0000000004617e77 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:85:17

> We're seeing some issues with this patch, potentially a miscompile. When we enable assertions we get an error about widening atomic loads -- I'm not sure this is the source of the miscompile we're seeing, but it certainly looks related (I think this is causing corruption and leading to issues elsewhere).

@rupprecht thanks for the report, I'll take a look.

Thanks for taking a look. Would it be all right to revert in the meantime? (I'm assuming the fix is not trivial)

Temporarily reverted in fec2945998947f04d672e9c5f33b57f7177474c0 to keep trunk clean.

I'm not a domain expert here, but I can test out a fix for this patch when you have one ready.

Hi @nikic, I've prepared updated patch with fix for the issue reported recently. Should I create new review request or reopen this one and update the diff?

nikic added a comment.Wed, May 12, 4:57 AM

> Hi @nikic, I've prepared updated patch with fix for the issue reported recently. Should I create new review request or reopen this one and update the diff?

Either way is fine. If you reopen this one, please also mark it as "review requested", so it goes back to the queue.

dfukalov reopened this revision.Wed, May 12, 5:55 AM
This revision is now accepted and ready to land.Wed, May 12, 5:55 AM
dfukalov updated this revision to Diff 344792.Wed, May 12, 5:56 AM
dfukalov edited the summary of this revision. (Show Details)

Atomic loads are stored in the MemDep caches as clobber dependencies, so GVN picked up a garbage offset for the pair of an unaliased load and an atomic load in the test case.

Added storage map for offsets of different dependent loads, updated tests.

dfukalov requested review of this revision.Wed, May 12, 5:57 AM

Updated patch LG -- the unreduced test passes now (and has no assertion errors). Thanks! (Deferring to @nikic or others for actual re-review).

nikic added a comment.Thu, May 13, 3:36 AM

Something that's not quite clear to me is why the Offset = None at the start of getDependency() wasn't sufficient. For the atomic load clobber case, wouldn't it keep that None value?

The getDependency() call is just the start of processing in GVN, and MemoryDependenceResults contains a number of the load's dependencies. One of them was a NonLocal clobbering dependency between %tmp4.1 = load i8... and %tmp6.1 = load atomic i8..., since the latter is atomic (see MemoryDependenceAnalysis.cpp, lines 494-500). And GVN got this clobbering dependency not from getDependency().

Let's see what happens when GVN processes %tmp4.1 = load i8, i8* %tmp3.1, align 4 in the following IR:

define void @load_load_partial_alias_atomic(i8* %arg) {
bb:
  %tmp1.1 = getelementptr inbounds i8, i8* %arg, i64 0
  %tmp2.1 = getelementptr inbounds i8, i8* %arg, i64 1
  %tmp2.2 = bitcast i8* %tmp2.1 to i64*
  %tmp2.3 = load i64, i64* %tmp2.2, align 4
  %tmp2.4 = icmp ugt i64 %tmp2.3, 1

  %tmp3.1 = getelementptr inbounds i8, i8* %arg, i64 2
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp4.1 = load i8, i8* %tmp3.1, align 4
  %tmp6.1 = load atomic i8, i8* getelementptr inbounds (i8, i8* @global, i64 0) acquire, align 4
  %tmp7.1 = add i8 %tmp6.1, %tmp4.1
  store i8 %tmp7.1, i8* %tmp1.1
  br label %bb5

}

At first, getDependency() returns just a NonLocal dependency, so GVN falls through to processNonLocalLoad() (GVN.cpp, lines 1849-1853).
Then getNonLocalPointerDependency() returns both NonLocal dependencies (on %tmp2.3 = load i64... and on %tmp6.1 = load atomic i8...).
AnalyzeLoadAvailability() then tries to process them and asks MDA for clobber offsets, without calling getDependency() between processing the two.
So GVN thought there were valid clobbering offsets for both of them and tried to optimize the wrong pair of loads (the one involving the atomic load).
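The fix described in the update (a storage map for offsets of different dependent loads) can be sketched as follows; the class and method names are hypothetical, not the actual patch. Keying each recorded offset by the (query load, clobbering instruction) pair means an offset computed for one dependency can never be mistakenly applied to another, such as an atomic load that merely acts as a clobber.

```python
class OffsetCache:
    """Sketch of a per-dependency clobber-offset map.  A lookup miss
    returns None, meaning "no known offset": the caller must not try to
    forward or widen in that case."""
    def __init__(self):
        self._offsets = {}

    def record(self, query_load, clobber, offset):
        # Offset is tied to the specific (load, clobber) pair.
        self._offsets[(query_load, clobber)] = offset

    def lookup(self, query_load, clobber):
        return self._offsets.get((query_load, clobber))

cache = OffsetCache()
cache.record("load_i8", "load_i64", 1)
assert cache.lookup("load_i8", "load_i64") == 1
# The atomic-load dependency has no recorded offset, so no stale value
# can leak across dependencies:
assert cache.lookup("load_i8", "atomic_load") is None
```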

nikic accepted this revision.Thu, May 13, 1:10 PM

Thanks for the explanation! Probably the best solution would be the same one we used in AA, which is to embed the offset in MemDepResult; then we couldn't have this kind of "out of sync" issue by design. But this also LGTM.

This revision is now accepted and ready to land.Thu, May 13, 1:10 PM
This revision was automatically updated to reflect the committed changes.
dfukalov added a comment (edited).Fri, May 14, 1:22 AM

Yes, I thought about packing the offset into MemDepResult, but it is already a pointer packed with other info, and it seems overkill to double its size for a quite rarely present offset value.