This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Transforms/Scalar/
-
Transforms/
-
Scalar/
-
LICM.cpp
-
test/Transforms/LICM/
-
Transforms/
-
LICM/
-
hoist-invariant-load-twice.ll

Differential D26501

[LICM] Retain load instruction invariant metadata when hoisting
AbandonedPublic

Authored by whitequark on Nov 10 2016, 6:57 AM.

Download Raw Diff

Details

Reviewers

reames
sanjoy
hfinkel

Summary

Currently, when hoisting a load marked invariant, LICM will drop
all non-debug metadata, including the !invariant.load metadata.
Most crucially, this prevents LICM from hoisting it again from
an outerwards loop, and potentially leading to a significant
performance hit in case where the inner loop is very short.

This commit alters LICM to preserve !invariant.load metadata since
"the optimizer may assume the memory location referenced by the load
contains the same value at all points in the program where the memory
location is known to be dereferenceable".

Diff Detail

Repository: rL LLVM

Event Timeline

whitequark updated this revision to Diff 77479.Nov 10 2016, 6:57 AM

whitequark retitled this revision from to [LICM] Retain load instruction metadata when hoisting.

whitequark updated this object.

whitequark added reviewers: hfinkel, sanjoy, reames.

whitequark set the repository for this revision to rL LLVM.

junbuml added a subscriber: junbuml.Nov 10 2016, 7:10 AM

Loads of pointers are not hoisted for some reason, so keeping the pointer-related metadata makes no sense for now. Amended patch to be less aggressive and only keep !invariant.load.

Actually updated the diff this time.

We can do this only if we alter the LangRef to make it clear that we're not allowed to have control dependencies on the "invariantness" of the load. The current text says, "If a load instruction tagged with the !invariant.load metadata is executed, the optimizer may assume the memory location referenced by the load contains the same value at all points in the program where the memory location is known to be dereferenceable." It is the "if a load... is executed" part that is the problem here. It implies that the "invariantness" of the load can depend on any and all control dependencies on the load instruction itself.

I think that this is true for current use cases (i.e. pointers inside vtbls), but we may need to be careful about breaking existing users. Also, I believe that our interpretation of "invariantness" at the MI level ignores control dependencies (and also is intertwined with dereferenceability, but that's another matter). Maybe ask on llvm-dev?

Another option is that relevant use cases here really want some other kind of metadata (noalias or something new).

@hfinkel OK, I see. I wonder if the desired optimizations here as well as in D18738 could be more easily handled by loop peeling.

but we may need to be careful about breaking existing users

This definitely breaks us. We use this for marking loads of array lengths, and we will often see cases like this:

void f(java.lang.Foo s) {
  // s is known to be dereferenceable(32), say
  k0 = s.field;
  foo()
  k1 = s.field;
  if (s instanceof int[]) {  // actually always false at runtime, but we've not folded this yet
    k2 = s.length // same offset as k.field, and marked as invariant.load, so the load looks the same in IR
  }
}

We can't hoist k2 out of the control flow to the beginning of the function, and then CSE k0 and k1 since we do really have a control dependency here -- k2 is invariant only if s is an array.

I also want to point out that when introducing a non-control dependent variant of invariant.load we need to be careful to avoid weird situations where dead code can affect behavior of a program. That is, the behavior of:

foo;
bar;

should not be different from

foo;
bar;
if (<false at runtime>)
  k = ...; !invariant_load

In D26501#591939, @whitequark wrote:

@hfinkel OK, I see. I wonder if the desired optimizations here as well as in D18738 could be more easily handled by loop peeling.

What do you mean? I don't see how loop peeling generally solves this hosting-out-of-nested-loops problem.

In D26501#591952, @hfinkel wrote:

In D26501#591939, @whitequark wrote:

@hfinkel OK, I see. I wonder if the desired optimizations here as well as in D18738 could be more easily handled by loop peeling.

What do you mean? I don't see how loop peeling generally solves this hosting-out-of-nested-loops problem.

Sorry, the idea I had would not actually help.

In D26501#591948, @sanjoy wrote:

but we may need to be careful about breaking existing users

This definitely breaks us.

OK, I see. Do you perhaps also feel a need for a control-independent (or less control-dependent) variation on !invariant.load? Or is this too niche?

In D26501#591957, @whitequark wrote:

In D26501#591952, @hfinkel wrote:

In D26501#591939, @whitequark wrote:

@hfinkel OK, I see. I wonder if the desired optimizations here as well as in D18738 could be more easily handled by loop peeling.

What do you mean? I don't see how loop peeling generally solves this hosting-out-of-nested-loops problem.

Sorry, the idea I had would not actually help.

In D26501#591948, @sanjoy wrote:

but we may need to be careful about breaking existing users

This definitely breaks us.

OK, I see. Do you perhaps also feel a need for a control-independent (or less control-dependent) variation on !invariant.load? Or is this too niche?

I think that we should have one; it is a property of the pointer itself, not the individual accesses, however (since they can always be hoisted to be where the pointer is generated, assuming we also have dereferenceability), and so we should do this by adding a function-attribute/load-metadata pair, like the !unconditionally_dereferenceable proposed in D18738.

whitequark abandoned this revision.May 18 2019, 3:45 PM

Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2019, 3:45 PM

Herald added subscribers: llvm-commits, asbirlea. · View Herald Transcript

Revision Contents

Path

Size

lib/

Transforms/

Scalar/

LICM.cpp

3 lines

test/

Transforms/

LICM/

hoist-invariant-load-twice.ll

36 lines

Diff 77486

lib/Transforms/Scalar/LICM.cpp

Context not available.
	// time in isGuaranteedToExecute if we don't actually have anything to	// time in isGuaranteedToExecute if we don't actually have anything to
	// drop. It is a compile time optimization, not required for correctness.	// drop. It is a compile time optimization, not required for correctness.
	!isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo))	!isGuaranteedToExecute(I, DT, CurLoop, SafetyInfo))
	I.dropUnknownNonDebugMetadata();	I.dropUnknownNonDebugMetadata(
		LLVMContext::MD_invariant_load);

	// Move the new node to the Preheader, before its terminator.	// Move the new node to the Preheader, before its terminator.
	I.moveBefore(Preheader->getTerminator());	I.moveBefore(Preheader->getTerminator());
Context not available.

test/Transforms/LICM/hoist-invariant-load-twice.ll

This file was added.

				; RUN: opt -S -licm < %s \| FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				declare void @work(i32, i32, i32)

				define void @loop(i32* dereferenceable(4) %ptr) {
				; CHECK: entry:
				; CHECK-NEXT: %val = load i32, i32* %ptr, !invariant.load !0
				; CHECK-NEXT: br label %loop1
				entry:
				br label %loop1

				loop1:
				%ind1 = phi i32 [ 0, %entry ], [ %indnext1, %loop2 ]
				%cont1 = icmp eq i32 %ind1, 10
				br i1 %cont1, label %loop2, label %exit

				loop2:
				%ind2 = phi i32 [ 0, %loop1 ], [ %indnext2, %work ]
				%indnext1 = add i32 %ind1, 1
				%cont2 = icmp eq i32 %ind2, 15
				br i1 %cont2, label %work, label %loop1

				work:
				%indnext2 = add i32 %ind2, 1
				%val = load i32, i32* %ptr, !invariant.load !0
				call void @work(i32 %ind1, i32 %ind2, i32 %val)
				br label %loop2

				exit:
				ret void
				}

				!0 = !{}
				!1 = !{i64 4}