This is an archive of the discontinued LLVM Phabricator instance.


Differential D111186

[SCEV] Infer flags from add/gep in any block
Closed (Public)

Authored by reames on Oct 5 2021, 3:03 PM.


Details

Reviewers

nikic
efriedma
mkazantsev

Commits

rG0658bab870c8: [SCEV] Infer flags from add/gep in any block

Summary

This patch removes a compile-time restriction from isSCEVExprNeverPoison. We've strengthened our ability to reason about flags on scopes other than addrecs, and this bailout prevents us from using it. The comment is also suspect, in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyway*.

@nikic If you can easily collect compile-time impact data, a sanity check on this patch is probably worthwhile.
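The gating change described in the summary can be pictured as two predicates. This is a minimal C++ sketch, not the LLVM code: `InstFacts`, `passesEarlyChecksBefore` and `passesEarlyChecksAfter` are hypothetical names, and the two booleans stand in for the `LoopInfo` lookup and the `programUndefinedIfPoison` query that `isSCEVExprNeverPoison` performs.

```cpp
// Hypothetical toy model; the real isSCEVExprNeverPoison queries LoopInfo
// and ValueTracking instead of reading precomputed booleans.
struct InstFacts {
  bool inInnermostLoopHeader;  // false if I is outside every loop
  bool undefinedIfPoison;      // stands in for programUndefinedIfPoison(I)
};

// Before D111186: a cheap bailout rejects I unless it sits in the header of
// its innermost loop; only then does the poison check run.
bool passesEarlyChecksBefore(const InstFacts &I) {
  if (!I.inInnermostLoopHeader)
    return false;
  return I.undefinedIfPoison;
}

// After D111186: the header bailout is gone, so I may be in any block.
// Later checks in isSCEVExprNeverPoison (not modeled here) still apply.
bool passesEarlyChecksAfter(const InstFacts &I) {
  return I.undefinedIfPoison;
}
```

A straight-line function such as `@add-basic` in flags-from-poison.ll corresponds to `InstFacts{false, true}`: it fails the old predicate (there is no containing loop) and passes the new one.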


Event Timeline

reames created this revision. (Oct 5 2021, 3:03 PM)

Herald added subscribers: bmahjour, javed.absar, bollu and 2 others. (Oct 5 2021, 3:03 PM)

reames requested review of this revision. (Oct 5 2021, 3:03 PM)

Herald added a project: Restricted Project. (Oct 5 2021, 3:03 PM)

reames added a child revision: D111180: The result of a function with noundef return attribute must be well defined. (Oct 5 2021, 3:04 PM)

reames mentioned this in D111180: The result of a function with noundef return attribute must be well defined.

Harbormaster completed remote builds in B127164: Diff 377361. (Oct 5 2021, 3:49 PM)

I'm fine with that, provided that CT impact is good.

This revision is now accepted and ready to land. (Oct 5 2021, 10:46 PM)

Compile-time impact looks acceptable: https://llvm-compile-time-tracker.com/compare.php?from=91d15aa0b8bff10bd1ccf279418560d17fea52ff&to=e9112b9b93ef7d96468bae3168c0d96c35d190c3&stat=instructions

The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

I can only make a guess here, but what this might be referring to is the fact that SCEV construction from IR will coalesce add/sub and mul expressions into a single call of getAddExpr/getMulExpr rather than building up a chain of binary adds/muls. Effectively, the change you do here defeats that (for the case where the IR instruction has flags, even if they are inapplicable), because you will end up calling getSCEV on each individual add due to the operand fetch in the poison check.
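To make the coalescing point concrete, here is a toy C++ model, assuming a left-leaning chain of binary adds such as `((a + b) + c)`. `Value` and `collectAddChain` are hypothetical names; the sketch only shows that, when the chain is flattened into one n-ary call, the interior add never gets an operand list of its own. The real construction code in ScalarEvolution.cpp handles more cases (subtraction, already-computed SCEVs, flags) than this.

```cpp
#include <string>
#include <vector>

// Hypothetical toy IR value: a named leaf, or a binary add of lhs and rhs.
struct Value {
  std::string name;
  const Value *lhs;  // null for leaves
  const Value *rhs;  // null for leaves
  bool isAdd() const { return lhs != nullptr && rhs != nullptr; }
};

// Walk the left-leaning chain of adds rooted at V and return the operands
// that would be handed to a single n-ary getAddExpr-style call.
std::vector<std::string> collectAddChain(const Value *V) {
  std::vector<std::string> Ops;
  while (V->isAdd()) {
    Ops.push_back(V->rhs->name);  // right operand kept as an opaque value
    V = V->lhs;                   // keep following the chain to the left
  }
  Ops.push_back(V->name);  // the chain bottoms out at a non-add value
  return Ops;
}
```

For `%y = add (add %a, %b), %c` the sketch returns `{"c", "b", "a"}`: a single operand list, with the interior add folded away.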

Inline comment on llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll, line 14: Comment needs update.

In D111186#3045847, @nikic wrote:

The comment is also suspect as well in that we're in the middle of constructing a SCEV for I. As such, we're going to visit all operands *anyways*.

I can only make a guess here, but what this might be referring to is the fact that SCEV construction from IR will coalesce add/sub and mul expressions into a single call of getAddExpr/getMulExpr rather than building up a chain of binary adds/muls. Effectively, the change you do here defeats that (for the case where the IR instruction has flags, even if they are inapplicable), because you will end up calling getSCEV on each individual add due to the operand fetch in the poison check.

So, while reasonable, this isn't quite right, or at least isn't any longer. When constructing an add reduction tree, we will aggressively collapse into a single add node, dropping flags as needed. This is done inside getAddExpr. The expressions built during parsing, by code that goes out of its way to separate by flag type, appear to just be canonicalized back into the flattened form (if not all flags match). Now, maybe there's a difference in the number of nodes from the interior of the tree formed, but that cost should be *very* minimal. (For one thing, how would you construct an add tree with a non-SCEVable operand type?)

("add" can be replaced with "arithmetic operation" in the above to generalize)
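The canonicalization described above can be modeled as merging nested add nodes into one flattened node while intersecting their no-wrap flags. This is a simplified C++ sketch under that assumption; `Flags`, `AddNode` and `collapse` are illustrative names, and the real `getAddExpr` applies more rules than a plain intersection.

```cpp
#include <string>
#include <vector>

// Hypothetical toy no-wrap flags, loosely mirroring SCEV's NUW/NSW bits.
enum Flags : unsigned { AnyWrap = 0, NUW = 1, NSW = 2 };

// Hypothetical toy n-ary add node: its operands plus the flags it carries.
struct AddNode {
  std::vector<std::string> ops;
  unsigned flags;
};

// Collapse several add nodes into one flattened node. Assumption for this
// sketch: a flag survives only if every merged node carried it.
AddNode collapse(const std::vector<AddNode> &nodes) {
  AddNode out{{}, NUW | NSW};
  for (const AddNode &n : nodes) {
    out.ops.insert(out.ops.end(), n.ops.begin(), n.ops.end());
    out.flags &= n.flags;  // drop any flag that some merged node lacks
  }
  return out;
}
```

Merging `%x = add %a, %b` (no flags) with `%y = add nuw %x, %c` yields a three-operand node with no flags, which is the "dropping flags as needed" behavior in this simplified form.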

Closed by commit rG0658bab870c8: [SCEV] Infer flags from add/gep in any block (authored by reames). (Oct 6 2021, 11:12 AM)

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG0658bab870c8: [SCEV] Infer flags from add/gep in any block.

In D111186#3045982, @reames wrote:

So, while reasonable, this isn't quite right, or at least isn't any longer. When constructing an add reduction tree, we will aggressively collapse into a single add node, dropping flags as needed. This is done inside getAddExpr. The expressions built during parsing, by code that goes out of its way to separate by flag type, appear to just be canonicalized back into the flattened form (if not all flags match). Now, maybe there's a difference in the number of nodes from the interior of the tree formed, but that cost should be *very* minimal. (For one thing, how would you construct an add tree with a non-SCEVable operand type?)

I'm not sure I understand what you're saying here. Maybe easier to talk about an example:

%x = add i32 %a, %b
%y = add nuw i32 %x, %c

Let's say we call getSCEV(%y) in a position where flags could not be transferred. Prior to this change, this would call getAddExpr(S(%a), S(%b), S(%c)). After this change it will additionally call getAddExpr(S(%a), S(%b)) as part of the poison check, when evaluating the %x operand. This additional expression will go unused (beyond the poison check). Previously, this would only happen if the flags could actually be transferred.
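The example above can be traced with a call log. This is a hypothetical C++ sketch that only replays the two call sequences described in the comment; `CallLog` and `simulateGetSCEVofY` are invented names, and the strings stand in for SCEV operands.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recorder: logs each simulated getAddExpr call as "a+b+c".
struct CallLog {
  std::vector<std::string> calls;
  void getAddExpr(const std::vector<std::string> &ops) {
    std::string joined;
    for (std::size_t i = 0; i < ops.size(); ++i)
      joined += (i ? "+" : "") + ops[i];
    calls.push_back(joined);
  }
};

// Replays getSCEV(%y) for
//   %x = add i32 %a, %b
//   %y = add nuw i32 %x, %c
// when the nuw on %y cannot be transferred. With the operand poison check,
// getSCEV(%x) runs first and builds an expression used only by the check.
CallLog simulateGetSCEVofY(bool withOperandPoisonCheck) {
  CallLog log;
  if (withOperandPoisonCheck)
    log.getAddExpr({"a", "b"});     // getSCEV(%x), only used by the check
  log.getAddExpr({"a", "b", "c"});  // flattened expression for %y
  return log;
}
```

The extra `a+b` entry in the second trace is the otherwise unused expression the comment refers to.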

Revision Contents

Path | Size
llvm/lib/Analysis/ScalarEvolution.cpp | 10 lines
llvm/test/Analysis/DependenceAnalysis/Preliminary.ll | 2 lines
llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll | 8 lines
llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll | 21 lines

Diff 377361

llvm/lib/Analysis/ScalarEvolution.cpp


(6,639 lines of unchanged context hidden)
   if (BLoop && BLoop->getHeader() == B->getParent() &&
       ::isGuaranteedToTransferExecutionToSuccessor(B->getParent()->begin(),
                                                    B->getIterator()))
     return true;
   return false;
 }

 bool ScalarEvolution::isSCEVExprNeverPoison(const Instruction *I) {
-  // Here we check that I is in the header of the innermost loop containing I,
-  // since we only deal with instructions in the loop header. The actual loop we
-  // need to check later will come from an add recurrence, but getting that
-  // requires computing the SCEV of the operands, which can be expensive. This
-  // check we can do cheaply to rule out some cases early.
-  Loop *InnermostContainingLoop = LI.getLoopFor(I->getParent());
-  if (InnermostContainingLoop == nullptr ||
-      InnermostContainingLoop->getHeader() != I->getParent())
-    return false;
-
   // Only proceed if we can prove that I does not yield poison.
   if (!programUndefinedIfPoison(I))
     return false;

   // At this point we know that if I is executed, then it does not wrap
   // according to at least one of NSW or NUW. If I is not executed, then we do
   // not know if the calculation that I represents would wrap. Multiple
   // instructions can map to the same SCEV. If we apply NSW or NUW from I to
(7,088 lines of unchanged context hidden)

llvm/test/Analysis/DependenceAnalysis/Preliminary.ll

(617 lines of unchanged context hidden)
 define void @p9(i32* %A, i32* %B, i32 %n) nounwind uwtable ssp {
 entry:
   %idxprom = sext i32 %n to i64
   %arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom
   store i32 0, i32* %arrayidx, align 4

 ; CHECK-LABEL: p9
 ; CHECK: da analyze - none!
-; CHECK: da analyze - flow [|<]!
+; CHECK: da analyze - none!
 ; CHECK: da analyze - confused!
 ; CHECK: da analyze - none!
 ; CHECK: da analyze - confused!
 ; CHECK: da analyze - none!

   %add = add nsw i32 %n, 1
   %idxprom1 = sext i32 %add to i64
   %arrayidx2 = getelementptr inbounds i32, i32* %A, i64 %idxprom1
(77 lines of unchanged context hidden)
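The Preliminary.ll update (one `flow [|<]` result becoming `none`) is consistent with SCEV now keeping the `nsw` from `%add = add nsw i32 %n, 1` outside a loop header, which allows `sext(n + 1)` to be treated as `sext(n) + 1`. That link is our reading of the test change, not something stated in the review. The C++ sketch below only checks the underlying arithmetic identity; all function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: sext(n + 1) as the IR computes it, i.e. a 32-bit add
// with two's-complement wraparound, then sign extension to 64 bits. (The
// uint32_t -> int32_t conversion is two's complement on mainstream compilers
// and is guaranteed to be since C++20.)
std::int64_t sextOfAddOne(std::int32_t n) {
  std::uint32_t wrapped = static_cast<std::uint32_t>(n) + 1u;
  return static_cast<std::int64_t>(static_cast<std::int32_t>(wrapped));
}

// Hypothetical helper: sext(n) + 1, computed directly in 64 bits; this
// cannot overflow for any i32 input.
std::int64_t sextPlusOne(std::int32_t n) {
  return static_cast<std::int64_t>(n) + 1;
}

// `add nsw i32 %n, 1` promises no signed wrap, which holds for every n
// except INT32_MAX; under that promise the two helpers above agree.
bool addOneIsNsw(std::int32_t n) {
  return n != std::numeric_limits<std::int32_t>::max();
}
```

Only at `n == INT32_MAX` do the two forms differ (`INT32_MIN` versus `2147483648`), and that is exactly the input `nsw` rules out.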

llvm/test/Analysis/ScalarEvolution/flags-from-poison.ll

(1,622 lines of unchanged context hidden)

 ; TODO: once D111180 lands, remove the udiv from these *-basic tests.
 ; noundef really should be enough

 define noundef i32 @add-basic(i32 %a, i32 %b) {
 ; CHECK-LABEL: 'add-basic'
 ; CHECK-NEXT: Classifying expressions for: @add-basic
 ; CHECK-NEXT: %res = add nuw nsw i32 %a, %b
-; CHECK-NEXT: --> (%a + %b) U: full-set S: full-set
+; CHECK-NEXT: --> (%a + %b)<nuw><nsw> U: full-set S: full-set
 ; CHECK-NEXT: %res2 = udiv i32 255, %res
-; CHECK-NEXT: --> (255 /u (%a + %b)) U: [0,256) S: [0,256)
+; CHECK-NEXT: --> (255 /u (%a + %b)<nuw><nsw>) U: [0,256) S: [0,256)
 ; CHECK-NEXT: Determining loop execution counts for: @add-basic
 ;
   %res = add nuw nsw i32 %a, %b
   %res2 = udiv i32 255, %res
   ret i32 %res2
 }

 define noundef i32 @sub-basic(i32 %a, i32 %b) {
(9 lines of unchanged context hidden)
 ;
   %res2 = udiv i32 255, %res
   ret i32 %res2
 }

 define noundef i32 @mul-basic(i32 %a, i32 %b) {
 ; CHECK-LABEL: 'mul-basic'
 ; CHECK-NEXT: Classifying expressions for: @mul-basic
 ; CHECK-NEXT: %res = mul nuw nsw i32 %a, %b
-; CHECK-NEXT: --> (%a * %b) U: full-set S: full-set
+; CHECK-NEXT: --> (%a * %b)<nuw><nsw> U: full-set S: full-set
 ; CHECK-NEXT: %res2 = udiv i32 255, %res
-; CHECK-NEXT: --> (255 /u (%a * %b)) U: [0,256) S: [0,256)
+; CHECK-NEXT: --> (255 /u (%a * %b)<nuw><nsw>) U: [0,256) S: [0,256)
 ; CHECK-NEXT: Determining loop execution counts for: @mul-basic
 ;
   %res = mul nuw nsw i32 %a, %b
   %res2 = udiv i32 255, %res
   ret i32 %res2
 }

 @gA = external global i32
(128 lines of unchanged context hidden)
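In the updated flags-from-poison.ll checks, `<nuw><nsw>` on `(%a + %b)` and `(%a * %b)` records that the i32 operation wraps in neither the unsigned nor the signed interpretation. The tests presumably keep the `udiv` because a poison divisor makes the program undefined, which is what lets the flags transfer; the TODO notes that `noundef` alone should eventually suffice. The C++ sketch below spells out the two no-wrap conditions for concrete i32 values; the function names are hypothetical.

```cpp
#include <cstdint>
#include <limits>

using std::int32_t;
using std::int64_t;
using std::uint32_t;
using std::uint64_t;

// Hypothetical check for nuw: the exact sum of the unsigned operands still
// fits in 32 bits.
bool addIsNuw(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a) + b <= std::numeric_limits<uint32_t>::max();
}

// Hypothetical check for nsw: the exact sum of the signed operands stays
// within the i32 range.
bool addIsNsw(int32_t a, int32_t b) {
  int64_t sum = static_cast<int64_t>(a) + b;
  return sum >= std::numeric_limits<int32_t>::min() &&
         sum <= std::numeric_limits<int32_t>::max();
}

// The same two conditions for multiplication (the @mul-basic case); 64-bit
// intermediates are wide enough to hold any product of two 32-bit values.
bool mulIsNuw(uint32_t a, uint32_t b) {
  return static_cast<uint64_t>(a) * b <= std::numeric_limits<uint32_t>::max();
}

bool mulIsNsw(int32_t a, int32_t b) {
  int64_t product = static_cast<int64_t>(a) * b;
  return product >= std::numeric_limits<int32_t>::min() &&
         product <= std::numeric_limits<int32_t>::max();
}
```

The two flags are independent: `0x7FFFFFFF + 1` satisfies `nuw` but not `nsw`, so SCEV can only print `<nuw><nsw>` when both conditions are known to hold.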

llvm/test/Transforms/SLPVectorizer/X86/consecutive-access.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt < %s -basic-aa -slp-vectorizer -S | FileCheck %s
 target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-apple-macosx10.9.0"

 @A = common global [2000 x double] zeroinitializer, align 16
 @B = common global [2000 x double] zeroinitializer, align 16
 @C = common global [2000 x float] zeroinitializer, align 16
 @D = common global [2000 x float] zeroinitializer, align 16

 ; Currently SCEV isn't smart enough to figure out that accesses
 ; A[3i], A[3i+1] and A[3*i+2] are consecutive, but in future
 ; that would hopefully be fixed. For now, check that this isn't
 ; vectorized.
 ; Function Attrs: nounwind ssp uwtable
 define void @foo_3double(i32 %u) #0 {
 ; CHECK-LABEL: @foo_3double(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[U_ADDR:%.*]] = alloca i32, align 4
 ; CHECK-NEXT: store i32 [[U:%.*]], i32* [[U_ADDR]], align 4
 ; CHECK-NEXT: [[MUL:%.*]] = mul nsw i32 [[U]], 3
 ; CHECK-NEXT: [[IDXPROM:%.*]] = sext i32 [[MUL]] to i64
 ; CHECK-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM]]
-; CHECK-NEXT: [[TMP0:%.*]] = load double, double* [[ARRAYIDX]], align 8
 ; CHECK-NEXT: [[ARRAYIDX4:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM]]
-; CHECK-NEXT: [[TMP1:%.*]] = load double, double* [[ARRAYIDX4]], align 8
-; CHECK-NEXT: [[ADD5:%.*]] = fadd double [[TMP0]], [[TMP1]]
-; CHECK-NEXT: store double [[ADD5]], double* [[ARRAYIDX]], align 8
 ; CHECK-NEXT: [[ADD11:%.*]] = add nsw i32 [[MUL]], 1
 ; CHECK-NEXT: [[IDXPROM12:%.*]] = sext i32 [[ADD11]] to i64
 ; CHECK-NEXT: [[ARRAYIDX13:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM12]]
-; CHECK-NEXT: [[TMP2:%.*]] = load double, double* [[ARRAYIDX13]], align 8
+; CHECK-NEXT: [[TMP0:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
+; CHECK-NEXT: [[TMP1:%.*]] = load <2 x double>, <2 x double>* [[TMP0]], align 8
 ; CHECK-NEXT: [[ARRAYIDX17:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM12]]
-; CHECK-NEXT: [[TMP3:%.*]] = load double, double* [[ARRAYIDX17]], align 8
-; CHECK-NEXT: [[ADD18:%.*]] = fadd double [[TMP2]], [[TMP3]]
-; CHECK-NEXT: store double [[ADD18]], double* [[ARRAYIDX13]], align 8
+; CHECK-NEXT: [[TMP2:%.*]] = bitcast double* [[ARRAYIDX4]] to <2 x double>*
+; CHECK-NEXT: [[TMP3:%.*]] = load <2 x double>, <2 x double>* [[TMP2]], align 8
+; CHECK-NEXT: [[TMP4:%.*]] = fadd <2 x double> [[TMP1]], [[TMP3]]
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast double* [[ARRAYIDX]] to <2 x double>*
+; CHECK-NEXT: store <2 x double> [[TMP4]], <2 x double>* [[TMP5]], align 8
 ; CHECK-NEXT: [[ADD24:%.*]] = add nsw i32 [[MUL]], 2
 ; CHECK-NEXT: [[IDXPROM25:%.*]] = sext i32 [[ADD24]] to i64
 ; CHECK-NEXT: [[ARRAYIDX26:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @A, i32 0, i64 [[IDXPROM25]]
-; CHECK-NEXT: [[TMP4:%.*]] = load double, double* [[ARRAYIDX26]], align 8
+; CHECK-NEXT: [[TMP6:%.*]] = load double, double* [[ARRAYIDX26]], align 8
 ; CHECK-NEXT: [[ARRAYIDX30:%.*]] = getelementptr inbounds [2000 x double], [2000 x double]* @B, i32 0, i64 [[IDXPROM25]]
-; CHECK-NEXT: [[TMP5:%.*]] = load double, double* [[ARRAYIDX30]], align 8
-; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP4]], [[TMP5]]
+; CHECK-NEXT: [[TMP7:%.*]] = load double, double* [[ARRAYIDX30]], align 8
+; CHECK-NEXT: [[ADD31:%.*]] = fadd double [[TMP6]], [[TMP7]]
 ; CHECK-NEXT: store double [[ADD31]], double* [[ARRAYIDX26]], align 8
 ; CHECK-NEXT: ret void
 ;
 entry:
   %u.addr = alloca i32, align 4
   store i32 %u, i32* %u.addr, align 4
   %mul = mul nsw i32 %u, 3
   %idxprom = sext i32 %mul to i64
(537 lines of unchanged context hidden)