
[LoopCacheAnalysis] Enable delinearization of fixed sized arrays
ClosedPublic

Authored by congzhe on Mar 31 2022, 7:01 PM.

Details

Summary

This is a follow-up to the discussion during the LoopOptWG meeting.

Currently, loop cache cost (LCC) cannot analyze fixed-size arrays because it cannot delinearize them. This patch adds the capability to delinearize fixed-size arrays to LCC.

Please note that the function newly added in this patch, i.e., IndexedReference::tryDelinearizeFixedSize(), duplicates some logic in DependenceInfo::tryDelinearizeFixedSize() in DependenceAnalysis.cpp. In DependenceAnalysis we work on two memory accesses, Src and Dst, when we delinearize, while in LCC we work on only one memory access; therefore I did not directly reuse DependenceInfo::tryDelinearizeFixedSize() in LCC. However, if desired, I could extract IndexedReference::tryDelinearizeFixedSize() as a utility function, move it to DependenceAnalysis.cpp, and reuse it in DependenceInfo::tryDelinearizeFixedSize(), thus simplifying the latter. This refactoring will take some amount of work, so I would first like to ask for feedback on the overall functionality of the current patch. If the current patch looks okay to everyone, I can move on with the aforementioned refactoring if needed.

Another note is that this patch does not do range checks after delinearization (DependenceAnalysis does this check unless DisableDelinearizationChecks is set). However, since in LCC the delinearization of parametric-size arrays never performed this check in the first place, for now I have omitted the check for delinearization of fixed-size arrays as well.

Diff Detail

Event Timeline

congzhe created this revision.Mar 31 2022, 7:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2022, 7:01 PM
congzhe requested review of this revision.Mar 31 2022, 7:01 PM
congzhe updated this revision to Diff 419605.Mar 31 2022, 7:29 PM
congzhe edited the summary of this revision. (Show Details)
congzhe edited the summary of this revision. (Show Details)Mar 31 2022, 7:32 PM
congzhe added reviewers: bmahjour, Whitney, Restricted Project.
congzhe added a project: Restricted Project.
congzhe edited the summary of this revision. (Show Details)Mar 31 2022, 7:39 PM
congzhe edited the summary of this revision. (Show Details)
Meinersbur added inline comments.
llvm/include/llvm/Analysis/LoopCacheAnalysis.h
148–149

In principle, each dimension can be dynamic or fixed size separately. For instance using C99 VLAs:

void func(int n, double A[][128][n]);

has 3 dimensions of which only the middle one is fixed size. However, DependenceInfo may not support them either?

llvm/lib/Analysis/LoopCacheAnalysis.cpp
345–346

Did you consider one of stripPointerCasts, stripPointerCastsAndAliases, stripPointerCastsSameRepresentation, stripPointerCastsForAliasAnalysis?

Meinersbur retitled this revision from [LoopCacheAnlysis] Enable delinearization of fixed sized arrays to [LoopCacheAnalysis] Enable delinearization of fixed sized arrays.Apr 5 2022, 11:14 AM
nikic added a subscriber: nikic.Apr 5 2022, 11:40 AM
nikic added inline comments.
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

So, I guess this is something of a pre-existing issue, but this API looks completely broken to me. Deriving information from a GEP source element type is blatantly illegal under IR semantics -- in particular, the array bounds specified therein may be completely arbitrary.

The correct way to derive such information is from SCEV addrec expressions. For example in your first test you have %arrayidx10 with SCEV {{(8 + %a)<nuw>,+,8192}<nuw><%for.body>,+,4}<nuw><%for.body4>, and the steps of those addrecs are semantically meaningful.
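As an illustrative aside, the idea of reading dimension extents off addrec steps can be sketched like this: for the quoted SCEV the byte strides are 8192 (outer loop) and 4 (inner loop), so the inner dimension would span 8192 / 4 = 2048 elements. This is a hypothetical helper for exposition, not LLVM's actual API:

```cpp
#include <cstddef>
#include <vector>

// Given the byte strides (addrec steps) of a loop nest's access,
// ordered from outermost to innermost loop, the ratio of consecutive
// strides is the extent of each inner array dimension.
// Hypothetical helper, for illustration only.
std::vector<long> extentsFromSteps(const std::vector<long> &StepsInBytes) {
  std::vector<long> Extents;
  for (std::size_t I = 0; I + 1 < StepsInBytes.size(); ++I)
    Extents.push_back(StepsInBytes[I] / StepsInBytes[I + 1]);
  return Extents;
}
```

For the addrec steps 8192 and 4 above, this yields a single inner extent of 2048 elements; the element size itself is given by the innermost step.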

Meinersbur added inline comments.Apr 5 2022, 4:55 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

It should be only a heuristic, using the fact that these indices are usually something sensible from the source code. In the case of DependenceInfo, there is an additional check whether the derived indices are actually legal (unless --da-disable-delinearization-checks=true). LoopCacheAnalysis is entirely a heuristic analysis, deriving a heuristic cost function without any claim about semantics.

The SCEV addrec approach is already used by LoopCacheAnalysis in the form of llvm::delinearize.

nikic added inline comments.Apr 6 2022, 1:10 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

Okay, this is slightly less bad than I thought then -- but GEP type structure should still not be used for heuristic purposes either.

If the SCEV addrec approach is already used in llvm::delinearize(), then why is this additional code needed? The addrecs involved in the motivational test cases contain all the necessary information, so there should be no need to look at GEP structure.

congzhe added inline comments.Apr 6 2022, 7:42 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

Thanks for the comment! The current SCEV addrec approach that is already used in llvm::delinearize() cannot handle fixed-size arrays; it can only handle parametric-size arrays. That is why we treat delinearization of fixed-size and parametric-size arrays separately in DependenceAnalysis, and I followed the same fashion here in LoopCacheAnalysis.

congzhe added inline comments.Apr 6 2022, 11:33 AM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

Could you please elaborate a bit more on the reason that "GEP type structure should still not be used for heuristic purposes"? Is it because in the longer term we envision GEP instructions will not have type information anymore?

nikic added inline comments.Apr 6 2022, 12:28 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

Could you please elaborate a bit more on the reason that "GEP type structure should still not be used for heuristic purposes"? Is it because in the longer term we envision GEP instructions will not have type information anymore?

That's part of it, but also more generally that you're relying on the frontend encoding things in a specific way here. Maybe the array is actually wrapped inside a struct. Maybe the array decays into a slice ([0 x ...] style) or a pointer. Maybe the user is an old-school C programmer and uses ary[y * SIZE + x] instead of ary[y][x]. Matching GEP structure locks you into matching a very specific code pattern, and does not support code that is semantically equivalent, but represented slightly differently in IR.
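To sketch the pattern-dependence point concretely, the two functions below address the same element of a 64x128 array, but a front end would typically lower the first through the [64 x [128 x double]] array type and the second as a flat offset from a decayed pointer -- different GEP structure, identical semantics. All names here are hypothetical:

```cpp
#include <cstddef>

// A fixed-size 2-D array and two semantically equivalent accesses to it.
constexpr std::size_t Rows = 64;
constexpr std::size_t Cols = 128;
double Grid[Rows][Cols];

double *typedAccess(std::size_t Y, std::size_t X) {
  return &Grid[Y][X]; // multi-dimensional subscript: typed GEP
}

double *linearizedAccess(std::size_t Y, std::size_t X) {
  double *Flat = &Grid[0][0];  // the array "decays" into a plain pointer
  return &Flat[Y * Cols + X];  // old-school manual linearization
}
```

Both return the same address for any in-bounds (Y, X), yet only the first matches the GEP shape a fixed-size delinearizer keyed on type structure would recognize.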

You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.

congzhe added inline comments.Apr 7 2022, 2:37 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
335

Thanks for your explanation on GEP type structures.

Currently the SCEV-based approach is more like a symbolic technique, and if it does not find any parameters (symbols) in the SCEV expression, it just bails out. I'm looking into whether we could incorporate fixed-size array delinearization into the SCEV-based approach and trying to prototype a solution if possible. I will get back to this thread once I get some results.

That's part of it, but also more generally that you're relying on the frontend encoding things in a specific way here. Maybe the array is actually wrapped inside a struct. Maybe the array decays into a slice ([0 x ...] style) or a pointer. Maybe the user is an old-school C programmer and uses ary[y * SIZE + x] instead of ary[y][x]. Matching GEP structure locks you into matching a very specific code pattern, and does not support code that is semantically equivalent, but represented slightly differently in IR.

There are, and always will be, different optimizations depending on how the code is written, even when it is semantically equivalent. While I agree that it is nicer to canonicalize this away, it's an unrealistic goal to expect that everywhere.

I assume you are also referring to "Future: Type-less GEP" from https://llvm.org/devmtg/2022-04-03/slides/keynote.Opaque.Pointers.Are.Coming.pdf. I think there are problems with that (e.g. no inrange and inbounds qualifiers), but even if it happens, we could as well add metadata to the memory accesses giving the optimizer hints about what the dimensions are.

You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.

The current delinearizer assumes that the size of a dimension is represented by a SCEVUnknown, such as %n. When that's the case, it will occur in every access function as long as it's not zero (but the approach breaks when the actual size is an expression such as n+2). The stride depends on what the index expression is; e.g. A[2*i][j] has a stride twice as large as the "real" size of an inner subarray. What's worse is that different memory accesses may result in different interpretations of the array sizes for each access, which is hard to cope with. Symbolic sizes are comparatively stable.

At least this is what I assume is why the current algorithm is designed as it is, and there might be a lot of room for improvement.

bmahjour added a comment.EditedApr 11 2022, 1:23 PM

You mention that the SCEV-based approach cannot handle fixed-sized arrays. Could you please explain in more detail why? It's not really obvious to me.

To add to the above explanations, the algorithm for recovering subscripts (delinearization) is described in this paper, and the implementation in LLVM is based on it. In section 3.1 they describe a generic example where they try to recover the subscripts of A[i+o0][j+o1][k+o2] from an access function that has been linearized in the IR. The original (linearized) polynomial in that example is (n2*(n1*o0 + o1) + o2) + n1*n2*i + n2*j + k. They apply a heuristic to recover the size of each dimension of the array (in this case A[][n1][n2]). They then take the polynomial and divide it by the size of the innermost dimension, n2. The remainder of that division (i.e. o2 + k) forms the subscript for the innermost dimension, and the quotient (i.e. n1*o0 + o1 + n1*i + j) is further divided by n1 to recover o1 + j for the middle dimension, and so on.

If the sizes of each dimension of the array were compile-time constants, and some of the offsets (i.e. o0, o1, or o2) were also constant, those constants would get folded together in a way that makes them indistinguishable from each other, and the algorithm would not be able to correctly recover the original subscripts. (E.g. if n1=10, n2=10, o0=1, o1=2, o2=3, then the original linearized expression would be 123 + 100*i + 10*j + k. Even if the algorithm knew the sizes of each dimension of the array (i.e. 10x10), dividing this expression by 10 would yield a remainder of 123 + k, which would not be a valid subscript for the inner dimension!)

Hopefully this example clarifies why the same algorithm cannot be used to delinearize subscripts in a fixed size array.
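As an illustrative sketch, the successive-division scheme described above does work on concrete integers, where a folded constant can be split by ordinary integer division; the difficulty in the discussion is that SCEV must divide symbolically, where 123 cannot be decomposed back into n2*(n1*o0 + o1) + o2. This is a hypothetical helper, not the LLVM implementation:

```cpp
#include <vector>

// Recover subscripts from a linearized offset by successive division,
// given the known sizes of the inner dimensions (outermost first).
// Dividing by each inner size, innermost first: the remainder is that
// dimension's subscript and the quotient carries to the next dimension.
std::vector<long> delinearize(long Offset, const std::vector<long> &Sizes) {
  std::vector<long> Subscripts;
  for (auto It = Sizes.rbegin(); It != Sizes.rend(); ++It) {
    Subscripts.insert(Subscripts.begin(), Offset % *It);
    Offset /= *It;
  }
  Subscripts.insert(Subscripts.begin(), Offset); // outermost subscript
  return Subscripts;
}
```

With the values from the example above (n1 = n2 = 10, o0=1, o1=2, o2=3) and i=4, j=5, k=6, the linearized offset is 123 + 100*4 + 10*5 + 6 = 579, and delinearize(579, {10, 10}) recovers the subscripts {5, 7, 9}, i.e. {i+o0, j+o1, k+o2}.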

@nikic it may be a good idea to have a discussion about this (impact of opaque pointers on delinearization) at one of the upcoming LoopOptWG calls. If you care to join us, please let me know or add an agenda item to https://docs.google.com/document/d/1sdzoyB11s0ccTZ3fobqctDpgJmRoFcz0sviKxqczs4g/edit so we can discuss this in more depth. Thanks!

bmahjour added inline comments.Apr 11 2022, 2:07 PM
llvm/include/llvm/Analysis/LoopCacheAnalysis.h
148–149

The existing delinearization algorithms only work if all sizes are parametric or if all are constant. A in the example above will be a plain double* pointer, so getIndexExpressionsFromGEP() won't be able to recover much from it.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
355

can use a range-based loop here

358

[nit] this assert should be done before the copy into Sizes.

390

Why do we need to store IsFixedSize as a data member?

congzhe updated this revision to Diff 422727.Apr 13 2022, 8:07 PM

Addressed comments from Michael @Meinersbur and Bardia @bmahjour.

If the current patch looks okay, the next step would be the refactoring, i.e., moving IndexedReference::tryDelinearizeFixedSize(ScalarEvolution *SE, Instruction *Src, const SCEV *SrcAccessFn, SmallVectorImpl<const SCEV *> &SrcSubscripts) to DependenceInfo and reusing it in both LoopCacheAnalysis.cpp and DependenceAnalysis.cpp.

Currently tryDelinearizeFixedSize() in DependenceAnalysis.cpp is used as tryDelinearizeFixedSize(Src, Dst, SrcAccessFn, DstAccessFn, SrcSubscripts, DstSubscripts). What the refactoring would do is to replace it by tryDelinearizeFixedSize(Src, SrcAccessFn, SrcSubscripts) and tryDelinearizeFixedSize(Dst, DstAccessFn, DstSubscripts).

I am leaning towards doing the refactoring work in a separate patch after the current patch is landed, which may make the process cleaner and more trackable. Nevertheless, I am open to other approaches as well (such as putting the refactoring work in this patch too).

congzhe added inline comments.Apr 13 2022, 8:11 PM
llvm/include/llvm/Analysis/LoopCacheAnalysis.h
148–149

Thanks for the explanation, that is exactly what would happen.

llvm/lib/Analysis/LoopCacheAnalysis.cpp
355

Please correct me if I'm wrong -- here we need the index (i) to index into both Subscripts and SrcSizes. If we used a range-based loop like for (const SCEV *S : Subscripts), we would not be able to get the elements from SrcSizes?
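For illustration, the situation in question looks like the sketch below: walking two parallel containers needs the same index into both, which a plain range-based loop over one of them does not provide. (LLVM's llvm::zip / llvm::enumerate in llvm/ADT/STLExtras.h would offer a range-based alternative; plain standard C++ is used here so the sketch is self-contained, and the function name is hypothetical.)

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combine paired elements of two parallel containers; the same index I
// is needed to access both, so an indexed loop is used rather than a
// range-based loop over just one container.
long sumPairedProducts(const std::vector<long> &Subscripts,
                       const std::vector<long> &Sizes) {
  assert(Subscripts.size() == Sizes.size());
  long Sum = 0;
  for (std::size_t I = 0; I < Subscripts.size(); ++I)
    Sum += Subscripts[I] * Sizes[I]; // index I used for both containers
  return Sum;
}
```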

358

Thanks for the comment, I've placed the assert before the copy into Sizes.

390

Thanks, you are right that it does not necessarily need to be a class member. I've updated it to be a local variable.

congzhe added inline comments.Apr 13 2022, 8:15 PM
llvm/lib/Analysis/LoopCacheAnalysis.cpp
345–346

Thanks for the comment! Per your suggestion, this piece of code in DependenceAnalysis.cpp was updated to use stripPointerCasts() in D123559, which has now landed, and I've made the same update here.

@congzhe Could you write here what your plans are? Continue with this patch? Improve the SCEV-based algorithm? Introduce array size hint metadata?

@congzhe Could you write here what your plans are? Continue with this patch? Improve the SCEV-based algorithm? Introduce array size hint metadata?

I think I will look into how I can improve the SCEV-based algorithm to incorporate fixed-size array delinearization.

I wonder if you could clarify what you meant by "continue with this patch"? I would certainly like to continue with the current patch if possible, but due to opaque pointers it seems that the current approach in this patch becomes fragile and may not be viable?

I wonder if you could clarify what you meant by "continue with this patch"? I would certainly like to continue with the current patch if possible, but due to opaque pointers it seems that the current approach in this patch becomes fragile and may not be viable?

I would not block this based only on a possible future development, taking into account that there are already uses of getIndexExpressionsFromGEP (e.g. by DA) and there is a possible workaround.

I wonder if you could clarify what you meant by "continue with this patch"? I would certainly like to continue with the current patch if possible, but due to opaque pointers it seems that the current approach in this patch becomes fragile and may not be viable?

I would not block this based only on a possible future development, taking into account that there are already uses of getIndexExpressionsFromGEP (e.g. by DA) and there is a possible workaround.

Thanks Michael, that does sound good to me. I've already addressed the comments in this patch in its current form. Moving forward I guess we could do one of the following:

  1. go ahead trying to merge it and do the refactoring (as mentioned in https://reviews.llvm.org/D122857#3450657) in the next patch, or
  2. modify this patch to do refactoring in the current patch.

I am leaning towards (1) since it might be easier to track the changes, as compared to doing a whole lot of work in one patch. But I don't have a strong preference and I'll be glad to take either option (and any other possible option).

Meinersbur accepted this revision.Mon, Apr 25, 10:18 AM

I am OK with doing the refactoring later.

This revision is now accepted and ready to land.Mon, Apr 25, 10:18 AM
This revision was landed with ongoing or failed builds.Fri, Apr 29, 1:19 PM
This revision was automatically updated to reflect the committed changes.

Landed this patch, will work on the refactoring in the next patch.

The test case added here appears to sometimes fail due to output ordering differences. E.g. https://lab.llvm.org/buildbot/#/builders/16/builds/28225

That failure is blamed on my revert commit -- but in my local build, the test isn't failing. Perhaps there's some nondeterminism in processing order? I might switch these to CHECK-DAG, though I don't know if that's just covering up some other underlying bug...

congzhe added a comment.EditedFri, Apr 29, 3:13 PM

The test case added here appears to sometimes fail due to output ordering differences. E.g. https://lab.llvm.org/buildbot/#/builders/16/builds/28225

That failure is blamed on my revert commit -- but in my local build, the test isn't failing. Perhaps there's some nondeterminism in processing order? I might switch these to CHECK-DAG, though I don't know if that's just covering up some other underlying bug...

Thanks James, I've noticed that as well - In my local build it does pass. But I should have used CHECK-DAG. Let me commit a small patch that does this.

The test case added here appears to sometimes fail due to output ordering differences. E.g. https://lab.llvm.org/buildbot/#/builders/16/builds/28225

That failure is blamed on my revert commit -- but in my local build, the test isn't failing. Perhaps there's some nondeterminism in processing order? I might switch these to CHECK-DAG, though I don't know if that's just covering up some other underlying bug...

Thanks James, I've noticed that as well - In my local build it does pass. But I should have used CHECK-DAG. Let me commit a small patch that does this.

Fixed in https://github.com/llvm/llvm-project/commit/97b8a54b25f326cac8324e0ee3adde271799951c

Is it possible to make the print order deterministic instead?

fhahn added a subscriber: fhahn.Sat, Apr 30, 4:03 AM

The test case added here appears to sometimes fail due to output ordering differences. E.g. https://lab.llvm.org/buildbot/#/builders/16/builds/28225

That failure is blamed on my revert commit -- but in my local build, the test isn't failing. Perhaps there's some nondeterminism in processing order? I might switch these to CHECK-DAG, though I don't know if that's just covering up some other underlying bug...

Thanks James, I've noticed that as well - In my local build it does pass. But I should have used CHECK-DAG. Let me commit a small patch that does this.

Fixed in https://github.com/llvm/llvm-project/commit/97b8a54b25f326cac8324e0ee3adde271799951c

CHECK-DAG works around the non-determinism. The analysis should probably be fixed to print results in a deterministic fashion.

The test case added here appears to sometimes fail due to output ordering differences. E.g. https://lab.llvm.org/buildbot/#/builders/16/builds/28225

That failure is blamed on my revert commit -- but in my local build, the test isn't failing. Perhaps there's some nondeterminism in processing order? I might switch these to CHECK-DAG, though I don't know if that's just covering up some other underlying bug...

Thanks James, I've noticed that as well - In my local build it does pass. But I should have used CHECK-DAG. Let me commit a small patch that does this.

Fixed in https://github.com/llvm/llvm-project/commit/97b8a54b25f326cac8324e0ee3adde271799951c

CHECK-DAG works around the non-determinism. The analysis should probably be fixed to print results in a deterministic fashion.

@nikic @fhahn Thanks for the comments. I posted patch D124725 to fix the nondeterministic print order; I'd appreciate your comments there.