
[GVN] Clobber partially aliased loads.
ClosedPublic

Authored by dfukalov on Jan 27 2021, 10:17 AM.

Details

Summary

Use the offsets stored in AliasResult, as implemented in D98718.
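
In effect, when BasicAA reports PartialAlias with a known constant offset, MemDep can report the earlier, wider load as a clobber that GVN is then able to turn into a forwarded value. A minimal sketch of the kind of IR this enables (hypothetical example; names are illustrative and a little-endian layout is assumed):

define i8 @forward_from_wide_load(i32* %p) {
  %wide = load i32, i32* %p, align 4       ; covers bytes [0, 4) of %p
  %q0 = bitcast i32* %p to i8*
  %q1 = getelementptr inbounds i8, i8* %q0, i64 1
  %narrow = load i8, i8* %q1, align 1      ; covers byte [1, 2): PartialAlias, offset 1
  ret i8 %narrow
}

With the offset recorded in the AliasResult, GVN can rewrite %narrow as trunc(lshr %wide, 8) instead of treating the earlier load as an opaque clobber.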

Diff Detail

Event Timeline

dfukalov requested review of this revision.Jan 27 2021, 10:17 AM
dfukalov created this revision.
Herald added a project: Restricted Project.Jan 27 2021, 10:17 AM
dfukalov planned changes to this revision.Jan 27 2021, 10:18 AM
dfukalov updated this revision to Diff 327811.Mar 3 2021, 8:33 AM

Implementation.

dfukalov retitled this revision from [WIP][GVN] Clobber partially aliased loads. to [GVN] Clobber partially aliased loads..Mar 3 2021, 8:39 AM
dfukalov edited the summary of this revision.
dfukalov added reviewers: nikic, asbirlea, fhahn.
dfukalov added a subscriber: fvrmatteo.
bmahjour removed a subscriber: bmahjour.Mar 3 2021, 8:40 AM
dfukalov planned changes to this revision.Mar 16 2021, 9:19 AM

The patch should be updated after the AliasResult conversion (D98718) lands.

dfukalov updated this revision to Diff 335902.Apr 7 2021, 12:26 PM

Rebased to use the reworked AliasResult from D98718.

dfukalov edited the summary of this revision.Apr 7 2021, 12:27 PM

Performance-wise this looks good.
Please wait to see if the other reviewers have comments on this.

dfukalov updated this revision to Diff 336377.Apr 9 2021, 3:33 AM

Rebased after the preparation changes (D98027 and D98718) were committed.

dfukalov updated this revision to Diff 336379.Apr 9 2021, 3:38 AM

Removed an unneeded change in MemorySSATest.cpp.

nikic added inline comments.Sun, Apr 11, 2:10 PM
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
528

Do you have any information on why this is not a concern anymore? The comment was introduced in https://github.com/llvm/llvm-project/commit/a471751c24324e7ba6ac5c612dbedb16c644fc44, unfortunately without a test case :(

532

Say we have a load from X, then a load from PartialAlias of X without offset, then another load from X. I think after your change, we will no longer forward from the first load, because we'll return a clobber for the PartialAlias in between, even though it cannot be used.

Can you please test this situation? Maybe we should not return a clobber if no offset is available?
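
For concreteness, a hypothetical IR shape of that situation (this assumes BasicAA classifies the middle access as PartialAlias of X with no recorded offset; the names are illustrative):

declare void @use.i8(i8)

define i32 @repeated_load_with_unknown_offset(i8* %base, i64 %i) {
  %p = bitcast i8* %base to i32*
  %x1 = load i32, i32* %p, align 4     ; first load from X
  %q = getelementptr i8, i8* %base, i64 %i
  %y = load i8, i8* %q, align 1        ; assumed: PartialAlias of X, offset unknown
  call void @use.i8(i8 %y)
  %x2 = load i32, i32* %p, align 4     ; should still be forwarded from %x1
  %r = add i32 %x1, %x2
  ret i32 %r
}

If %y is reported as a clobber, the dependency walk for %x2 stops there, and the forwarding from %x1 is lost even though the clobber's value cannot be used.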

llvm/test/Transforms/GVN/clobber-partial-alias.ll
2

update_test_checks please.

dfukalov updated this revision to Diff 336899.Mon, Apr 12, 10:47 AM

Addressing comments.

dfukalov marked 2 inline comments as done.Mon, Apr 12, 10:59 AM
dfukalov added inline comments.
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
528

Yes, I saw this change, and because of this comment I added a VisitedPhiBBs check for the case where we return PartialAlias with an offset in BasicAAResult::aliasGEP.

532

Yes, you're right, there is no reason to return PartialAlias here if we don't know the offset.

dfukalov updated this revision to Diff 339022.Tue, Apr 20, 3:54 PM
dfukalov marked an inline comment as done.

Added a test for partially aliased loads with phi translation in the address.

dfukalov marked an inline comment as done.Tue, Apr 20, 3:56 PM
dfukalov added inline comments.
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
528

I added the test load_load_partial_alias_cross_block_phi_trans to check this.

nikic added inline comments.Thu, Apr 22, 2:27 PM
llvm/test/Transforms/GVN/PRE/rle.ll
736 (On Diff #339022)

I believe that for phi translation to work, you need to have the translated GEPs in the predecessors. This test case worked for me:

define i32 @load_load_partial_alias_cross_block_phi_trans(i8* %P) nounwind {
; LE-LABEL: @load_load_partial_alias_cross_block_phi_trans(
; LE-NEXT:  entry:
; LE-NEXT:    [[XX:%.*]] = bitcast i8* [[P:%.*]] to i32*
; LE-NEXT:    [[X1:%.*]] = load i32, i32* [[XX]], align 4
; LE-NEXT:    [[CMP:%.*]] = icmp eq i32 [[X1]], 127
; LE-NEXT:    [[TMP0:%.*]] = lshr i32 [[X1]], 16
; LE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; LE-NEXT:    [[TMP2:%.*]] = lshr i32 [[X1]], 8
; LE-NEXT:    [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
; LE-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[ELSE:%.*]]
; LE:       if:
; LE-NEXT:    br label [[JOIN:%.*]]
; LE:       else: 
; LE-NEXT:    br label [[JOIN]]
; LE:       join: 
; LE-NEXT:    [[TMP5:%.*]] = phi i8 [ [[TMP3]], [[IF]] ], [ [[TMP1]], [[ELSE]] ]
; LE-NEXT:    [[CONV6:%.*]] = zext i8 [[TMP5]] to i32
; LE-NEXT:    ret i32 [[CONV6]]
; LE:       if.end:
; LE-NEXT:    ret i32 52 
; 
; BE-LABEL: @load_load_partial_alias_cross_block_phi_trans(
; BE-NEXT:  entry:
; BE-NEXT:    [[XX:%.*]] = bitcast i8* [[P:%.*]] to i32*
; BE-NEXT:    [[X1:%.*]] = load i32, i32* [[XX]], align 4
; BE-NEXT:    [[CMP:%.*]] = icmp eq i32 [[X1]], 127
; BE-NEXT:    [[TMP0:%.*]] = lshr i32 [[X1]], 8
; BE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; BE-NEXT:    [[TMP2:%.*]] = lshr i32 [[X1]], 16
; BE-NEXT:    [[TMP3:%.*]] = trunc i32 [[TMP2]] to i8
; BE-NEXT:    br i1 [[CMP]], label [[IF:%.*]], label [[ELSE:%.*]]
; BE:       if:
; BE-NEXT:    br label [[JOIN:%.*]]
; BE:       else: 
; BE-NEXT:    br label [[JOIN]]
; BE:       join: 
; BE-NEXT:    [[TMP5:%.*]] = phi i8 [ [[TMP3]], [[IF]] ], [ [[TMP1]], [[ELSE]] ]
; BE-NEXT:    [[CONV6:%.*]] = zext i8 [[TMP5]] to i32
; BE-NEXT:    ret i32 [[CONV6]]
; BE:       if.end:
; BE-NEXT:    ret i32 52
; 
entry:
  %xx = bitcast i8* %P to i32*
  %x1 = load i32, i32* %xx, align 4
  %cmp = icmp eq i32 %x1, 127
  br i1 %cmp, label %if, label %else

if:
  %arrayidx.if = getelementptr inbounds i8, i8* %P, i64 1
  br label %join

else:
  %arrayidx.else = getelementptr inbounds i8, i8* %P, i64 2
  br label %join
  
join: 
  %idx = phi i64 [ 1, %if ], [ 2, %else ]
  %arrayidx4 = getelementptr inbounds i8, i8* %P, i64 %idx
  %tmp5 = load i8, i8* %arrayidx4, align 1
  %conv6 = zext i8 %tmp5 to i32
  ret i32 %conv6

if.end:
  ret i32 52
}

I think the truly problematic case probably has something to do with phi translation in loops (where you translate into the same block, but in a different iteration). I'll try to play with that tomorrow, but if I don't find anything, let's just land this and see if any miscompiles turn up :)

nikic accepted this revision.Fri, Apr 23, 10:02 AM

LGTM, you might want to pick up the additional test cases.

llvm/lib/Transforms/Scalar/GVN.cpp
1008

I don't think we can really do anything with negative offsets, unless we want to do something like shorten the second load and combine it with the value of the first load. That's unlikely to be profitable.
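
A hypothetical illustration (value names are illustrative) of why a negative offset is not usable:

define i32 @load_load_negative_offset(i8* %base) {
  %p1 = getelementptr inbounds i8, i8* %base, i64 1
  %w1 = bitcast i8* %p1 to i32*
  %wide = load i32, i32* %w1, align 1   ; covers bytes [1, 5)
  %w0 = bitcast i8* %base to i32*
  %v = load i32, i32* %w0, align 1      ; covers bytes [0, 4): PartialAlias, offset -1
  %r = add i32 %wide, %v
  ret i32 %r
}

Byte 0 of %v is not covered by %wide, so no shift/truncate of %wide can materialize %v; recovering it would require shortening the second load to the overlapping bytes and combining them with the first load's value, as described above.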

llvm/test/Transforms/GVN/PRE/rle.ll
736 (On Diff #339022)

I constructed a case that involves a loop, but everything seems to work fine in that case as well:

define void @load_load_partial_alias_loop(i8* %P) {
; LE-LABEL: @load_load_partial_alias_loop(
; LE-NEXT:  entry:
; LE-NEXT:    [[P_1:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 1
; LE-NEXT:    [[V_1:%.*]] = load i8, i8* [[P_1]], align 1
; LE-NEXT:    call void @use.i8(i8 [[V_1]])
; LE-NEXT:    [[P_1_32:%.*]] = bitcast i8* [[P_1]] to i32*
; LE-NEXT:    [[V_1_32:%.*]] = load i32, i32* [[P_1_32]], align 4
; LE-NEXT:    call void @use.i32(i32 [[V_1_32]])
; LE-NEXT:    [[TMP0:%.*]] = trunc i32 [[V_1_32]] to i8
; LE-NEXT:    br label [[LOOP:%.*]]
; LE:       loop:
; LE-NEXT:    [[V_I:%.*]] = phi i8 [ [[TMP0]], [[ENTRY:%.*]] ], [ [[TMP2:%.*]], [[LOOP_LOOP_CRIT_EDGE:%.*]] ]
; LE-NEXT:    [[I:%.*]] = phi i64 [ 1, [[ENTRY]] ], [ [[I_INC:%.*]], [[LOOP_LOOP_CRIT_EDGE]] ]
; LE-NEXT:    [[P_I:%.*]] = getelementptr i8, i8* [[P]], i64 [[I]]
; LE-NEXT:    call void @use.i8(i8 [[V_I]])
; LE-NEXT:    [[P_I_32:%.*]] = bitcast i8* [[P_I]] to i32*
; LE-NEXT:    [[V_I_32:%.*]] = load i32, i32* [[P_I_32]], align 4
; LE-NEXT:    call void @use.i32(i32 [[V_I_32]])
; LE-NEXT:    [[I_INC]] = add i64 [[I]], 1
; LE-NEXT:    [[CMP:%.*]] = icmp ne i64 [[I_INC]], 64
; LE-NEXT:    [[TMP1:%.*]] = lshr i32 [[V_I_32]], 8
; LE-NEXT:    [[TMP2]] = trunc i32 [[TMP1]] to i8
; LE-NEXT:    br i1 [[CMP]], label [[LOOP_LOOP_CRIT_EDGE]], label [[EXIT:%.*]]
; LE:       loop.loop_crit_edge:
; LE-NEXT:    br label [[LOOP]]
; LE:       exit:
; LE-NEXT:    ret void
;
; BE-LABEL: @load_load_partial_alias_loop(
; BE-NEXT:  entry:
; BE-NEXT:    [[P_1:%.*]] = getelementptr i8, i8* [[P:%.*]], i64 1
; BE-NEXT:    [[V_1:%.*]] = load i8, i8* [[P_1]], align 1
; BE-NEXT:    call void @use.i8(i8 [[V_1]])
; BE-NEXT:    [[P_1_32:%.*]] = bitcast i8* [[P_1]] to i32*
; BE-NEXT:    [[V_1_32:%.*]] = load i32, i32* [[P_1_32]], align 4
; BE-NEXT:    call void @use.i32(i32 [[V_1_32]])
; BE-NEXT:    [[TMP0:%.*]] = lshr i32 [[V_1_32]], 24
; BE-NEXT:    [[TMP1:%.*]] = trunc i32 [[TMP0]] to i8
; BE-NEXT:    br label [[LOOP:%.*]]
; BE:       loop:
; BE-NEXT:    [[V_I:%.*]] = phi i8 [ [[TMP1]], [[ENTRY:%.*]] ], [ [[TMP3:%.*]], [[LOOP_LOOP_CRIT_EDGE:%.*]] ]
; BE-NEXT:    [[I:%.*]] = phi i64 [ 1, [[ENTRY]] ], [ [[I_INC:%.*]], [[LOOP_LOOP_CRIT_EDGE]] ]
; BE-NEXT:    [[P_I:%.*]] = getelementptr i8, i8* [[P]], i64 [[I]]
; BE-NEXT:    call void @use.i8(i8 [[V_I]])
; BE-NEXT:    [[P_I_32:%.*]] = bitcast i8* [[P_I]] to i32*
; BE-NEXT:    [[V_I_32:%.*]] = load i32, i32* [[P_I_32]], align 4
; BE-NEXT:    call void @use.i32(i32 [[V_I_32]])
; BE-NEXT:    [[I_INC]] = add i64 [[I]], 1
; BE-NEXT:    [[CMP:%.*]] = icmp ne i64 [[I_INC]], 64
; BE-NEXT:    [[TMP2:%.*]] = lshr i32 [[V_I_32]], 16
; BE-NEXT:    [[TMP3]] = trunc i32 [[TMP2]] to i8
; BE-NEXT:    br i1 [[CMP]], label [[LOOP_LOOP_CRIT_EDGE]], label [[EXIT:%.*]]
; BE:       loop.loop_crit_edge:
; BE-NEXT:    br label [[LOOP]]
; BE:       exit:
; BE-NEXT:    ret void
;
entry:
  %P.1 = getelementptr i8, i8* %P, i64 1
  %v.1 = load i8, i8* %P.1
  call void @use.i8(i8 %v.1)
  %P.1.32 = bitcast i8* %P.1 to i32*
  %v.1.32 = load i32, i32* %P.1.32
  call void @use.i32(i32 %v.1.32)
  br label %loop

loop:
  %i = phi i64 [ 1, %entry ], [ %i.inc, %loop ]
  %P.i = getelementptr i8, i8* %P, i64 %i
  %v.i = load i8, i8* %P.i
  call void @use.i8(i8 %v.i)
  %P.i.32 = bitcast i8* %P.i to i32*
  %v.i.32 = load i32, i32* %P.i.32
  call void @use.i32(i32 %v.i.32)
  %i.inc = add i64 %i, 1
  %cmp = icmp ne i64 %i.inc, 64
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

declare void @use.i8(i8) readnone
declare void @use.i32(i32) readnone

Now, I have some doubts that these transforms are profitable, but at least they're not wrong.

This revision is now accepted and ready to land.Fri, Apr 23, 10:02 AM
This revision was automatically updated to reflect the committed changes.

rupprecht added a comment.

We're seeing some issues with this patch, potentially a miscompile. When we enable assertions, we get an error about widening atomic loads -- I'm not sure this is the source of the miscompile we're seeing, but it certainly looks related (I think it is causing corruption that leads to issues elsewhere).

$ cat repro.ll
; ModuleID = 'repro.ll'
source_filename = "repro.ll"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.widget = type { i32 }
%struct.baz = type { i32, %struct.snork }
%struct.snork = type { %struct.spam }
%struct.spam = type { i32, i32 }

@global = external local_unnamed_addr global %struct.widget, align 4
@global.1 = external local_unnamed_addr global i8, align 1
@global.2 = external local_unnamed_addr global i32, align 4

define void @zot(%struct.baz* %arg) local_unnamed_addr align 2 {
bb:
  %tmp = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1
  %tmp1 = bitcast %struct.snork* %tmp to i64*
  %tmp2 = load i64, i64* %tmp1, align 4
  %tmp3 = getelementptr inbounds %struct.baz, %struct.baz* %arg, i64 0, i32 1, i32 0, i32 1
  %tmp4 = icmp ugt i64 %tmp2, 4294967295
  br label %bb5

bb5:                                              ; preds = %bb14, %bb
  %tmp6 = load i32, i32* %tmp3, align 4
  %tmp7 = icmp ne i32 %tmp6, 0
  %tmp8 = select i1 %tmp7, i1 %tmp4, i1 false
  %tmp9 = zext i1 %tmp8 to i8
  store i8 %tmp9, i8* @global.1, align 1
  %tmp10 = load i32, i32* @global.2, align 4
  switch i32 %tmp10, label %bb11 [
    i32 1, label %bb12
    i32 2, label %bb12
  ]

bb11:                                             ; preds = %bb5
  br label %bb14

bb12:                                             ; preds = %bb5, %bb5
  %tmp13 = load atomic i32, i32* getelementptr inbounds (%struct.widget, %struct.widget* @global, i64 0, i32 0) acquire, align 4
  br label %bb14

bb14:                                             ; preds = %bb12, %bb11
  br label %bb5
}
$ opt -O2 repro.ll -disable-output
opt: /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:496: llvm::Value *llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst *, unsigned int, llvm::Type *, llvm::Instruction *, const llvm::DataLayout &): Assertion `SrcVal->isSimple() && "Cannot widen volatile/atomic load!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /home/rupprecht/dev/opt -O2 repro.ll -disable-output
...
#10 0x000000000768c138 llvm::VNCoercion::getLoadValueForLoad(llvm::LoadInst*, unsigned int, llvm::Type*, llvm::Instruction*, llvm::DataLayout const&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Utils/VNCoercion.cpp:497:5
#11 0x00000000070a01ad llvm::gvn::AvailableValue::MaterializeAdjustedValue(llvm::LoadInst*, llvm::Instruction*, llvm::GVN&) const /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:902:11
#12 0x00000000070ae067 llvm::gvn::AvailableValueInBlock::MaterializeAdjustedValue(llvm::LoadInst*, llvm::GVN&) const /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:281:5
#13 0x00000000070a1d11 ConstructSSAForLoadSet(llvm::LoadInst*, llvm::SmallVectorImpl<llvm::gvn::AvailableValueInBlock>&, llvm::GVN&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:874:40
#14 0x00000000070a3ce6 llvm::GVN::processNonLocalLoad(llvm::LoadInst*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:1614:12
#15 0x00000000070a610a llvm::GVN::processLoad(llvm::LoadInst*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:1866:5
#16 0x00000000070a7398 llvm::GVN::processInstruction(llvm::Instruction*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2292:9
#17 0x00000000070a8175 llvm::GVN::processBlock(llvm::BasicBlock*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2490:24
#18 0x00000000070a7c4f llvm::GVN::iterateOnFunction(llvm::Function&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2837:16
#19 0x000000000709fdd6 llvm::GVN::runImpl(llvm::Function&, llvm::AssumptionCache&, llvm::DominatorTree&, llvm::TargetLibraryInfo const&, llvm::AAResults&, llvm::MemoryDependenceResults*, llvm::LoopInfo*, llvm::OptimizationRemarkEmitter*, llvm::MemorySSA*) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:2436:20
#20 0x000000000709fa6e llvm::GVN::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/lib/Transforms/Scalar/GVN.cpp:674:8
#21 0x0000000007873fb7 llvm::detail::PassModel<llvm::Function, llvm::GVN, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:85:17
#22 0x000000000691574c llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManager.h:517:16
#23 0x0000000004617e77 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) /home/rupprecht/src/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:85:17


@rupprecht thanks for the report, I'll take a look.