This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
-
BlockGenerators.cpp

Differential D25095

[Polly] Ignore removed memory access. NFC.
AbandonedPublic

Authored by Meinersbur on Sep 30 2016, 8:10 AM.

Download Raw Diff

Details

Reviewers

etherzhhb
grosser
gareevroman
jdoerfert

Summary

The DeLICM (and Known) pass will remove redundant explicit memory accesses. This patch prepares the code generator to ignore such removed access. The code currently has no effect because there is no pass except Invariant Load Hoisting yet that removes accesses. The hoisted loads are skipped because their value is already found in GlobalMap.

Opening for review since Tobias mentioned concerns about generateArrayLoad returning Undef.

Diff Detail

Event Timeline

Meinersbur updated this revision to Diff 73054.Sep 30 2016, 8:10 AM

Meinersbur retitled this revision from to [Polly] Ignore removed memory access. NFC..

Meinersbur updated this object.

Meinersbur added reviewers: grosser, jdoerfert, etherzhhb.

Meinersbur added a project: Restricted Project.

Meinersbur added subscribers: pollydev, llvm-commits.

Hi Michael,

I am worried about a situation where we have sth like:

%val = load %A
%div = div %val, 0
store %div

If we drop the div, we might trigger undefined behavior. To my understanding this can never happen, as you will only drop direct copy sequences, right? I am probably too careful here, but if you could quickly check (assert) that the only use of %val is in a store instruction, my worries would disappear.

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

Best,
Tobias

In D25095#557608, @grosser wrote:

I am worried about a situation where we have sth like:

%val = load %A
%div = div %val, 0
store %div

If we drop the div, we might trigger undefined behavior.

Did you mean "drop the load"? This patch does not influence non-access instruction handling.

To my understanding this can never happen, as you will only drop direct copy sequences, right? I am probably too careful here, but if you could quickly check (assert) that the only use of %val is in a store instruction, my worries would disappear.

The correctness of removing a memory access is a concern of the the code that removes them. At codegen-time it is very difficult or even impossible to retroactively check whether a removal was correct. There is no exhaustive list of reasons why it could be removed. The detection mechanism you mentioned is neither sufficient nor mandatory even for just DeLICM. It is invalid if the store, which could also be an implicit store, is not also removed. Uses can also be in other statements that get their llvm::Value from a scalar load, but have no effect on the removal of such a load.

Also consider that not generating a store is just as harmful for the generated code as replacing a load by an Undef. The missing store would be even harder to find.

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

The json importer currently does not support removing MemoryAccesses. Implementing that will require much more code and certainly also its own review.

In D25095#557608, @grosser wrote:

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

Another patch would introduce ScopInfo::removeSingleMemoryAccess(MemoryAccess*). FWIW I could bundle that patch, this patch, and json importer support for removing accesses.

Adding Roman who might have an opinion here as well.

In D25095#557969, @Meinersbur wrote:

In D25095#557608, @grosser wrote:

I am worried about a situation where we have sth like:

%val = load %A
%div = div %val, 0
store %div

If we drop the div, we might trigger undefined behavior.

Did you mean "drop the load"? This patch does not influence non-access instruction handling.

Right.

To my understanding this can never happen, as you will only drop direct copy sequences, right? I am probably too careful here, but if you could quickly check (assert) that the only use of %val is in a store instruction, my worries would disappear.

The correctness of removing a memory access is a concern of the the code that removes them. At codegen-time it is very difficult or even impossible to retroactively check whether a removal was correct. There is no exhaustive list of reasons why it could be removed. The detection mechanism you mentioned is neither sufficient nor mandatory even for just DeLICM. It is invalid if the store, which could also be an implicit store, is not also removed. Uses can also be in other statements that get their llvm::Value from a scalar load, but have no effect on the removal of such a load.

I understand that in the end passes such as DeLICM need to decide what is save to do and what not.

Let me just write down some related, but unfinished thoughts I have, that may give some context of what I am thinking about and where I believe there are possibilities to improve the current design of Polly.

I have been thinking for a while, if we really want to remove loads and stores from the polyhedral representation and to which extend we want to do this. I generally have the feeling that removing memory accesses by dropping the full polly::MemoryAccess is not what we actually want, as it results in an inconsistency between the IR and the polyhedral model. We already discussed this earlier, when we concluded that the MemoryAccess is currently holding two kinds of information. The information about the original access and the information about how we transform the memory access. If we drop the full memory access structure we loose the information about the original memory access. Obviously, there is precedence with Johannes' load hoisting, which introduced this behavior, but I hope we can improve this design in the future by making the separation between new and original memory access behavior more clear

I see different ways to improve this situation. I could e.g. see the load hoisting being expressed by modifying the access functions similar to how we do this today in Roman's and your work. We also have since a long time on the Polly todo list the introduction of ScopStmts that are defined on instruction trees, rather than basic blocks. As part of this, I could imaging that we automatically detect pure 'copy-only' statements and match them to the Copy_Stmt Roman recently introduced.

Assuming this direction is where we want to go (and this probably needs more discussion -- maybe next phone call?), I was wondering where DeLICM fits in here. If DeLICM for example only removes copy statements maybe we could in the future avoid the removal of memory accesses in DeLICM and instead remove full copy statements.

Now going back to this actual patch. I don't think it is necessary to today go full speed on these design changes, but am OK to continue removing full polly::MemoryAccesses for a while. Still, I think to make future changes easier it would be great if we could limit the generality of such changes to what is needed for DeLICM today. As said earlier, I understand that we cannot validate DeLICM retrospectively. But maybe we can check what kind of load/store memory accesses we allow to be removed.

I went back to learn which memory accesses DeLICM removes today and it seems
delicm's cleanup routine does not drop loads at all and only drops stores that store a value that has been loaded before (so no div in between). It seems the load part of this patch can today not be
reached, right? According to the comments in cleanup, you are planning to also remove the corresponding loads, but such loads would only be limited to the ones , where all uses are direct uses by store statements in the same basic block?Does your comment above refer to this situation, when mentioning that the vectorizer would crash if the load is not dropped? Or does this require more than one compute instruction between load and store?

Also consider that not generating a store is just as harmful for the generated code as replacing a load by an Undef. The missing store would be even harder to find.

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

The json importer currently does not support removing MemoryAccesses. Implementing that will require much more code and certainly also its own review.

I suggested this as a way to obtain test coverage for this patch without the need for delicm yet. I would hope that JSON is easy to extend (e.g. just mark a memory access as 'deleted').

Another option would be to commit this change together with the delicm cleanup() routine after DELICM. We can then run delicm to test the codegen change and it also becomes more clear what
are the actual uses. I am in fact starting to prefer this approach slightly.

Best,
Tobias

In D25095#558309, @grosser wrote:

Now going back to this actual patch. I don't think it is necessary to today go full speed on these design changes, but am OK to continue removing full polly::MemoryAccesses for a while. Still, I think to make future changes easier it would be great if we could limit the generality of such changes to what is needed for DeLICM today. As said earlier, I understand that we cannot validate DeLICM retrospectively. But maybe we can check what kind of load/store memory accesses we allow to be removed.

I went back to learn which memory accesses DeLICM removes today and it seems
delicm's cleanup routine does not drop loads at all and only drops stores that store a value that has been loaded before (so no div in between). It seems the load part of this patch can today not be
reached, right? According to the comments in cleanup, you are planning to also remove the corresponding loads, but such loads would only be limited to the ones , where all uses are direct uses by store statements in the same basic block?Does your comment above refer to this situation, when mentioning that the vectorizer would crash if the load is not dropped? Or does this require more than one compute instruction between load and store?

the cleanup() currently implemented in DeLICM does the base minimum: Remove stores to array that a not necessary, but would impact the freedom of the scheduler. Removing loads as well would require something like a reference counter for loads (or just looking for remaining valid uses), which I might still implement. I think there were also situations were stores that are dead a left over, but I can't recall details at the moment. These are concerns of DeLICM, and I would be uncomfortable with adding these details to an unrelated pass such as CodeGen.

If the load is not dropped, it might read an location that has never been written to because all writes to it have been redirected, ie. and Undef value. Without the check added in this patch whether the loads has been dropped, it would probably crash because it required details from an deleted MemoryAccess. If there is a compute instruction in between in the same statement, then nothing should be removed in the first place.

Also consider that not generating a store is just as harmful for the generated code as replacing a load by an Undef. The missing store would be even harder to find.

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

The json importer currently does not support removing MemoryAccesses. Implementing that will require much more code and certainly also its own review.

I suggested this as a way to obtain test coverage for this patch without the need for delicm yet. I would hope that JSON is easy to extend (e.g. just mark a memory access as 'deleted').

Another option would be to commit this change together with the delicm cleanup() routine after DELICM. We can then run delicm to test the codegen change and it also becomes more clear what
are the actual uses. I am in fact starting to prefer this approach slightly.

Committing with DeLICM would be fine with me, but doesn't solve the question of returning undef.

Do implicitly suggest to commit cleanup() separately from the main part?

Michael

In D25095#558989, @Meinersbur wrote:

In D25095#558309, @grosser wrote:

Now going back to this actual patch. I don't think it is necessary to today go full speed on these design changes, but am OK to continue removing full polly::MemoryAccesses for a while. Still, I think to make future changes easier it would be great if we could limit the generality of such changes to what is needed for DeLICM today. As said earlier, I understand that we cannot validate DeLICM retrospectively. But maybe we can check what kind of load/store memory accesses we allow to be removed.

I went back to learn which memory accesses DeLICM removes today and it seems
delicm's cleanup routine does not drop loads at all and only drops stores that store a value that has been loaded before (so no div in between). It seems the load part of this patch can today not be
reached, right? According to the comments in cleanup, you are planning to also remove the corresponding loads, but such loads would only be limited to the ones , where all uses are direct uses by store statements in the same basic block?Does your comment above refer to this situation, when mentioning that the vectorizer would crash if the load is not dropped? Or does this require more than one compute instruction between load and store?

the cleanup() currently implemented in DeLICM does the base minimum: Remove stores to array that a not necessary, but would impact the freedom of the scheduler. Removing loads as well would require something like a reference counter for loads (or just looking for remaining valid uses), which I might still implement. I think there were also situations were stores that are dead a left over, but I can't recall details at the moment. These are concerns of DeLICM, and I would be uncomfortable with adding these details to an unrelated pass such as CodeGen.

OK. So it seems today (and in the immediate future) DeLICM will only drop stores, but loads are left untouched. Is this right?

If the load is not dropped, it might read an location that has never been written to because all writes to it have been redirected, ie. and Undef value. Without the check added in this patch whether the loads has been dropped, it would probably crash because it required details from an deleted MemoryAccess. If there is a compute instruction in between in the same statement, then nothing should be removed in the first place.

Does this not mean that load memory accesses will never be removed by DeLICM and that consequently the "if (!Stmt.getArrayAccessOrNULLFor(Load))" is never false, due to DeLICM? If I add an assert before the "createUNDEF" none of the current DeLICM test cases fails, except the ones that already fail due to the missing "known" pass.

It seems as if this code currently is not needed. If this is the case, maybe we should postpone this discussion until we actually decide to remove loads in DeLICM?

Also consider that not generating a store is just as harmful for the generated code as replacing a load by an Undef. The missing store would be even harder to find.

Also, if you commit this separately, which certainly would not hurt, a JSON test case would be amazing.

The json importer currently does not support removing MemoryAccesses. Implementing that will require much more code and certainly also its own review.

I suggested this as a way to obtain test coverage for this patch without the need for delicm yet. I would hope that JSON is easy to extend (e.g. just mark a memory access as 'deleted').

Another option would be to commit this change together with the delicm cleanup() routine after DELICM. We can then run delicm to test the codegen change and it also becomes more clear what
are the actual uses. I am in fact starting to prefer this approach slightly.

Committing with DeLICM would be fine with me, but doesn't solve the question of returning undef.

Do implicitly suggest to commit cleanup() separately from the main part?

Yes. The cleanup routine is certainly an independent change that would make sense to split off.

I personally prefer small changes similar to how Sven is looking for patch sets that are split up in independent changes. Optimally changes of commonly around 30-50 lines, with around 200 lines max. I understand that this takes additional work to split up patches, but in my experience it saves a lot of time in discussions. Obviously, if I am open to discuss different strategies to upstream such large patches.

(In general I prefer to avoid to build up such large patches in the first place, as working with them always adds a lot of overhead).

OK, let's postpone this until the cleanup() patch.

Thinking about this change, we could possibly also commit a minimal DeLICM pass which only runs cleanup(). In some way, this is also a trivial change, which could even be tested on an appropriately designed .ll input file. It would allow us to get all the basic infrastructure in place.

In D25095#560645, @grosser wrote:

Thinking about this change, we could possibly also commit a minimal DeLICM pass which only runs cleanup(). In some way, this is also a trivial change, which could even be tested on an appropriately designed .ll input file. It would allow us to get all the basic infrastructure in place.

If we want to do this, I'd prefer to put cleanup() into a separate pass.

Revision Contents

Path

Size

lib/

CodeGen/

BlockGenerators.cpp

16 lines

Diff 73054

lib/CodeGen/BlockGenerators.cpp

	Show First 20 Lines • Show All 229 Lines • ▼ Show 20 Lines
	}			}

	Value BlockGenerator::generateArrayLoad(ScopStmt &Stmt, LoadInst Load,			Value BlockGenerator::generateArrayLoad(ScopStmt &Stmt, LoadInst Load,
	ValueMapT &BBMap, LoopToScevMapT &LTS,			ValueMapT &BBMap, LoopToScevMapT &LTS,
	isl_id_to_ast_expr *NewAccesses) {			isl_id_to_ast_expr *NewAccesses) {
	if (Value *PreloadLoad = GlobalMap.lookup(Load))			if (Value *PreloadLoad = GlobalMap.lookup(Load))
	return PreloadLoad;			return PreloadLoad;

				// Optimization passes (Invariant Load Hoisting, DeLICM) can remove loads when
				// replaced by a load at another location. The invariant load case is caught
				// by the lookup in GlobalMap which contains the value. In other cases the
				// optimization pass is responsible to ensure that the load's value is not
				// used, either by replacing all uses or it wasn't used in the first place. We
				// need to return some value because VectorBlockGenerator might multiplex it
				// into a vector register, even though the vector component is not used.
				if (!Stmt.getArrayAccessOrNULLFor(Load))
				return UndefValue::get(Load->getType());

	Value *NewPointer =			Value *NewPointer =
	generateLocationAccessed(Stmt, Load, BBMap, LTS, NewAccesses);			generateLocationAccessed(Stmt, Load, BBMap, LTS, NewAccesses);
	Value *ScalarLoad = Builder.CreateAlignedLoad(			Value *ScalarLoad = Builder.CreateAlignedLoad(
	NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");			NewPointer, Load->getAlignment(), Load->getName() + "_p_scalar_");

	if (DebugPrinting)			if (DebugPrinting)
	RuntimeDebugBuilder::createCPUPrinter(Builder, "Load from ", NewPointer,			RuntimeDebugBuilder::createCPUPrinter(Builder, "Load from ", NewPointer,
	": ", ScalarLoad, "\n");			": ", ScalarLoad, "\n");

	return ScalarLoad;			return ScalarLoad;
	}			}

	void BlockGenerator::generateArrayStore(ScopStmt &Stmt, StoreInst *Store,			void BlockGenerator::generateArrayStore(ScopStmt &Stmt, StoreInst *Store,
	ValueMapT &BBMap, LoopToScevMapT &LTS,			ValueMapT &BBMap, LoopToScevMapT &LTS,
	isl_id_to_ast_expr *NewAccesses) {			isl_id_to_ast_expr *NewAccesses) {
				// Optimization passes (DeLICM) can remove stores when deemed unnecessary or
				// replaced by a store at another location. In this case do not generate a
				// StoreInst.
				if (!Stmt.getArrayAccessOrNULLFor(Store))
				return;

	Value *NewPointer =			Value *NewPointer =
	generateLocationAccessed(Stmt, Store, BBMap, LTS, NewAccesses);			generateLocationAccessed(Stmt, Store, BBMap, LTS, NewAccesses);
	Value *ValueOperand = getNewValue(Stmt, Store->getValueOperand(), BBMap, LTS,			Value *ValueOperand = getNewValue(Stmt, Store->getValueOperand(), BBMap, LTS,
	getLoopForStmt(Stmt));			getLoopForStmt(Stmt));

	if (DebugPrinting)			if (DebugPrinting)
	RuntimeDebugBuilder::createCPUPrinter(Builder, "Store to ", NewPointer,			RuntimeDebugBuilder::createCPUPrinter(Builder, "Store to ", NewPointer,
	": ", ValueOperand, "\n");			": ", ValueOperand, "\n");
	▲ Show 20 Lines • Show All 1,208 Lines • Show Last 20 Lines