
Details

Reviewers

uweigand
eastig
chandlerc
efriedma
fhahn
craig.topper
hfinkel
RKSimon

Commits

rL360970: [CodeMetrics] Don't let extends of i1 be free.

Summary

getUserCost() currently returns TCC_Free for any extend of a compare (i1) result. This seems to hold only in a limited number of cases, for example when two compares are chained. Even there, the extend is unlikely to be free in general, although it may be in some cases.

This patch therefore proposes removing this special handling of casts of i1. No tests fail because of this change. What remains is to make sure that it does not introduce any performance regressions. On SystemZ, and in a preliminary run on X86, this looks OK. Additional benchmarking on more targets (including X86) would be welcome.

If a target actually wants the old behavior, it can override getUserCost().
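As a rough illustration only, the change can be modelled on hypothetical toy types. `Op`, `Inst` and `userCost` below are invented for this sketch and are not LLVM's classes or API.

The old default made a cast free whenever its operand was a compare. The patch drops that case, so the cast falls through to the normal cost, modelled here as TCC_Basic. An extend of an i1 that does not come from a compare, such as a function argument, costs the same under both rules.

```cpp
#include <cassert>

// Hypothetical toy IR: only the opcode and the first operand matter here.
enum class Op { Arg, ICmp, FCmp, ZExt, SExt, UIToFP, SIToFP };

struct Inst {
  Op Opcode;
  const Inst *Operand0; // nullptr if the value has no operand (e.g. Arg)
};

// Values chosen to mirror TTI::TCC_Free and TTI::TCC_Basic.
constexpr int TCC_Free = 0;
constexpr int TCC_Basic = 1;

bool isCast(Op O) {
  return O == Op::ZExt || O == Op::SExt || O == Op::UIToFP ||
         O == Op::SIToFP;
}

bool isCmp(Op O) { return O == Op::ICmp || O == Op::FCmp; }

// OldSpecialCase == true models the default getUserCost() before the patch;
// false models it afterwards. The TCC_Basic fallback stands in for the real
// getExtCost()/getOperationCost() defaults.
int userCost(const Inst &I, bool OldSpecialCase) {
  if (OldSpecialCase && isCast(I.Opcode) && I.Operand0 &&
      isCmp(I.Operand0->Opcode))
    return TCC_Free; // the special case this patch removes
  return TCC_Basic;
}
```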

(This was originally https://reviews.llvm.org/D53373 -- a bit of discussion there)

Diff Detail

Event Timeline

jonpa created this revision. Nov 20 2018, 12:23 AM

As was mentioned in the original patch, this does still need tests. There are tests here, for example in test/Analysis/CostModel/.... You should be able to see some differences in costs, especially for architectures where the resulting cost computation differs significantly.

include/llvm/Analysis/TargetTransformInfoImpl.h
809–810	You don't need this line at all -- the checks below are sufficient.
811–815	The intent (that seems pretty clear from this comment) was not to necessarily say that all i1 extends were free, but that `cmp(ext(cmp(...)))` is free. That is, an extend which only exists to chain one comparison to another. That seems much more plausible than saying that any extend of an i1 is free. However, there is no way of modeling such a "free" extend in this area.... At least, not with the cost model as we have it set up. We could potentially model this by saying that the second `cmp` is free and clearly documenting that the reason is because we will have already accounted for the cost when visiting the extend. That still leaves open the question: are chained comparisons actually free on all targets? My guess is that they are not. For example, even on x86, one of the most CISCy targets, chained comparisons of this kind will rarely if ever be "free" in any meaningful sense.... If that is what you want to correct by making all of this go away (which does seem reasonable to me) I think you need to much more clearly explain this in the patch description to avoid confusion.
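The alternative accounting described above can be sketched on hypothetical toy types. This is not LLVM code; `costExtOfCmpFree`, `costChainedCmpFree` and `totalCost` are invented names.

Rather than making the extend free, this scheme charges it and makes a compare free when its operand is `ext(cmp(...))`. Both schemes give the same total for a genuine `cmp(ext(cmp(...)))` chain. Only the second stops discounting an extend whose result feeds something other than a compare. Neither settles whether chained compares are really free on a given target, which is the open question raised in the comment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR.
enum class Op { Arg, Cmp, Ext, Ret };

struct Inst {
  Op Opcode;
  const Inst *Operand0;
};

constexpr int TCC_Free = 0;
constexpr int TCC_Basic = 1;

bool isExtOfCmp(const Inst *I) {
  return I && I->Opcode == Op::Ext && I->Operand0 &&
         I->Operand0->Opcode == Op::Cmp;
}

// Old scheme: the extend of a compare is free; everything else is basic.
int costExtOfCmpFree(const Inst &I) {
  return isExtOfCmp(&I) ? TCC_Free : TCC_Basic;
}

// Alternative: the extend is charged, and a compare whose operand is
// ext(cmp(...)) is free because its cost was already counted at the extend.
int costChainedCmpFree(const Inst &I) {
  if (I.Opcode == Op::Cmp && isExtOfCmp(I.Operand0))
    return TCC_Free;
  return TCC_Basic;
}

// Sums the per-instruction costs of a straight-line sequence.
int totalCost(const std::vector<const Inst *> &Insts,
              int (*Cost)(const Inst &)) {
  int Sum = 0;
  for (const Inst *I : Insts)
    Sum += Cost(*I);
  return Sum;
}
```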

This revision now requires changes to proceed. Nov 20 2018, 1:16 AM

> As was mentioned in the original patch, this does still need tests. There are tests here, for example in test/Analysis/CostModel/.... You should be able to see some differences in costs, especially for architectures where the resulting cost computation differs significantly.

Putting back my test case, hoping it will serve the purpose.

Please also see inline comments.

include/llvm/Analysis/TargetTransformInfoImpl.h
809–810	OK, moved the cast to the return statement.
811–815	> However, there is no way of modeling such a "free" extend in this area.... At least, not with the cost model as we have it set up.
IIUC, passes that want more accurate cost values could instead call getCastInstrCost() or getCmpSelInstrCost() and pass the Instruction pointer. The code could then be analyzed to isolate cases where, e.g., the second compare is free. A target could even override getUserCost() and do something similar, since the User pointer is available, although I think getUserCost() is supposed to be simple and quick.
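The context-sensitive check described above can be sketched on hypothetical toy types. `Inst`, its `Users` list and `contextualExtCost` are invented for illustration; this is not the signature of getCastInstrCost() or any other TTI hook. With the instruction itself in hand, a cost function can look at the extend's users and treat the extend as free only when every user is a compare.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy IR with explicit use lists.
enum class Op { Arg, Cmp, Ext, Ret };

struct Inst {
  Op Opcode;
  const Inst *Operand0;
  std::vector<const Inst *> Users;
};

constexpr int TCC_Free = 0;
constexpr int TCC_Basic = 1;

// An extend is treated as free only if it extends a compare result and
// every one of its users is itself a compare (a pure compare chain).
int contextualExtCost(const Inst &Ext) {
  if (!Ext.Operand0 || Ext.Operand0->Opcode != Op::Cmp || Ext.Users.empty())
    return TCC_Basic;
  for (const Inst *U : Ext.Users)
    if (U->Opcode != Op::Cmp)
      return TCC_Basic;
  return TCC_Free;
}
```

Whether such a chain is actually free is still a per-target question; the sketch only shows where the extra context would come from.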

jonpa edited the summary of this revision. Nov 20 2018, 3:48 AM

Hi Jonas,

I ran some benchmarks on ARM64, Cortex-A57. I see no regressions.

BTW, there is a way to test getUserCost via Inliner. See test/Transforms/Inline/AArch64/gep-cost.ll. Cost-free instructions increase NumInstructionsSimplified.
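Below is a simplified model of how such an inliner-based test observes getUserCost(). The names are invented: `analyzeBlock`, `InlineCostSummary` and `kInstrCost` are hypothetical, and the charge of 5 per instruction is an assumption of this sketch.

Every instruction reported as TCC_Free bumps the simplified-instruction counter. In this model, every other instruction adds a fixed cost. Removing the free extend therefore shows up as one fewer simplified instruction.

```cpp
#include <cassert>
#include <vector>

constexpr int TCC_Free = 0;

// Hypothetical fixed charge per non-free instruction (assumption).
constexpr int kInstrCost = 5;

struct InlineCostSummary {
  int NumInstructionsSimplified = 0;
  int Cost = 0;
};

// Takes the getUserCost()-style result for each instruction in a block.
InlineCostSummary analyzeBlock(const std::vector<int> &UserCosts) {
  InlineCostSummary S;
  for (int C : UserCosts) {
    if (C == TCC_Free)
      ++S.NumInstructionsSimplified; // free: counted as simplified
    else
      S.Cost += kInstrCost;          // otherwise: charged a fixed cost
  }
  return S;
}
```

For an `icmp` followed by a `zext` of its result, the old rule yields costs {1, 0} and the new rule {1, 1}.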

Thanks,
Evgeny Astigeevich

> I ran some benchmarks on ARM64, Cortex-A57. I see no regressions.

Great!

> BTW, there is a way to test getUserCost via Inliner. See test/Transforms/Inline/AArch64/gep-cost.ll. Cost-free instructions increase NumInstructionsSimplified.

Ah, thanks. I removed my initial test and made a new one the same way using the inliner.

Herald added a subscriber: eraman. Nov 22 2018, 12:37 AM

+1, LGTM
I'd like someone from the X86 world to approve this as well.

Thanks,
Evgeny Astigeevich

ping!

> I'd like someone from the X86 world to approve this as well.

RKSimon added inline comments. Dec 3 2018, 5:20 AM

include/llvm/Analysis/TargetTransformInfoImpl.h
809–810	Please can you add a comment here that describes the recommended approaches if someone needs to get the old behaviour.

Added comment as requested.

jonpa marked 2 inline comments as done. Dec 3 2018, 6:46 AM

@chandlerc Are you happy with this now?

In D54742#1316748, @RKSimon wrote:

> @chandlerc Are you happy with this now?

Not really. The test is still too indirect and complex IMO.

As I said in my comments, there are *direct* tests of the cost model, including the getUserCost result. Not sure why that got ignored, I gave the path for them. A concrete example:

llvm/test/Analysis/CostModel/X86/costmodel.ll

I don't think this should be tested by complex inlining tests. Instead, we have a direct and obvious way to test this stuff and we should be using it. The fact that other architectures are *only* testing the latency or reciprocal throughput cost model, and not the code size cost model, is not a reason to bypass the clear and obvious testing layer we have here. Instead we should add exactly the test for the changed cost model that this change is proposing.

This revision now requires changes to proceed. Dec 4 2018, 12:54 AM

In D54742#1318046, @chandlerc wrote:

> In D54742#1316748, @RKSimon wrote:
>
> > @chandlerc Are you happy with this now?
>
> Not really. The test is still too indirect and complex IMO.
>
> As I said in my comments, there are *direct* tests of the cost model, including the getUserCost result. Not sure why that got ignored, I gave the path for them. A concrete example:
> llvm/test/Analysis/CostModel/X86/costmodel.ll
> I don't think this should be tested by complex inlining tests. Instead, we have a direct and obvious way to test this stuff and we should be using it. The fact that other architectures are *only* testing the latency or reciprocal throughput cost model, and not the code size cost model, is not a reason to bypass the clear and obvious testing layer we have here. Instead we should add exactly the test for the changed cost model that this change is proposing.

Sorry, but I'm confused. My understanding is that a test that uses '-cost-model -analyze' (like the one you reference) is using *CostModel*. For this simple test function

define i64 @fun(i64 %Arg1, i64 %Arg2) {
  %Cmp = icmp eq i64 %Arg1, %Arg2
  %Res = zext i1 %Cmp to i64
  ret i64 %Res
}

CostModel will not call getUserCost() here, but getCastInstrCost() for the zext instead. So my patch does not affect this test, and I don't know how to write a direct test that calls getUserCost() as you propose.

ping!

ping! @eastig? @RKSimon? @chandlerc?

In D54742#1334276, @jonpa wrote:

> ping! @eastig? @RKSimon? @chandlerc?
>
> CostModel will not call getUserCost(), but instead getCastInstrCost() for the zext. So my patch does not affect this test, and I don't know how to make a direct test that calls getUserCost() like you propose.

Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost(). Note the implementation of:

int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const {
  switch (kind){
  case TCK_RecipThroughput:
    return getInstructionThroughput(I);

  case TCK_Latency:
    return getInstructionLatency(I);

  case TCK_CodeSize:
    return getUserCost(I);
  }
  llvm_unreachable("Unknown instruction cost kind");
}

In D54742#1335204, @hfinkel wrote:

> In D54742#1334276, @jonpa wrote:
>
> > ping! @eastig? @RKSimon? @chandlerc?
> >
> > CostModel will not call getUserCost(), but instead getCastInstrCost() for the zext. So my patch does not affect this test, and I don't know how to make a direct test that calls getUserCost() like you propose.
>
> Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost(). Note the implementation of:
> int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const {
>   switch (kind){
>   case TCK_RecipThroughput:
>     return getInstructionThroughput(I);
>
>   case TCK_Latency:
>     return getInstructionLatency(I);
>
>   case TCK_CodeSize:
>     return getUserCost(I);
>   }
>   llvm_unreachable("Unknown instruction cost kind");
> }

I also gave an example that does this in my suggestion.

Test updated per review so that the costs of extensions of i1 are tested directly.

> Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost().

Ah, thanks, and apologies: I was not aware of the -cost-kind=code-size flag.

> I also gave an example that does this in my suggestion.

Yes, indeed, sorry.

> I'd like someone from the X86 world to approve this as well. (@eastig)

Did anyone check this patch on X86 (in addition to my own preliminary run on my laptop)?

The differential's name is slightly misleading. Extends from i1 weren't all free before, only extensions of an icmp result.

Other arches are probably also affected by this (are any tests failing?), so it would probably be a good idea to have the same test for them too (x86, aarch64?).

Also, it would be awesome to precommit the adjusted tests, to see the change :)

include/llvm/Analysis/TargetTransformInfoImpl.h
813	'this' method? `getGEPCost()` or `getExtCost()` ?
test/Analysis/CostModel/SystemZ/ext-i1-cost.ll
4–5	(On Diff #180470)	This is slightly misleading. The patch does not change the cost of an extension from i1. It changes the cost of an extension of i1 returned from `icmp`. So:
1. Add one test where i1 was an input to the function
2. Rename test file to `ext-of-icmp-cost.ll` or something
3. Adjust comment

Thanks for quick review!

> Probably other arches are also affected by this (any tests failing?),
> And, would be awesome to precommit the adjusted tests, to see the change :)

No, no tests are failing!

> would probably be a good idea to have the same test for them too (x86, aarch64?).

Since it is the default implementation that is affected, wouldn't that be redundant?

> 1. Add one test where i1 was an input to the function
> 2. Rename test file to ext-of-icmp-cost.ll or something
> 3. Adjust comment

OK, done.

ping!

include/llvm/Analysis/TargetTransformInfoImpl.h
813	I was thinking about the same function as is changed here - getUserCost(). Comment updated.

Ping!

Tests using '-cost-model -cost-kind=code-size' in the "direct" way requested in earlier review...

PING!

Herald added a subscriber: jdoerfert. Feb 20 2019, 6:35 PM

Ping!

There are still no tests failing with this patch, and I do think I have met all requests from reviewers...

LGTM.

We should watch for perf regressions on x86, and other targets.

Thanks for review.

Committed as r360970.

kristina added a commit: rL360970: [CodeMetrics] Don't let extends of i1 be free. May 16 2019, 8:49 PM

This seems to be causing multiple performance regressions across several bots in compile time and execution time tests.

More specifically these builders are the most affected: clang-cmake-x86_64-avx2-linux, clang-cmake-x86_64-avx2-linux-perf, clang-cmake-x86_64-sde-avx512-linux.

My bad, I didn't check well enough, it seems an unrelated patch made certain tests crash due to asserts. Thanks to @craig.topper for pointing that out.

RKSimon resigned from this revision. May 17 2019, 12:23 AM

Diff 180482

include/llvm/Analysis/TargetTransformInfoImpl.h

...
     if (auto CS = ImmutableCallSite(U)) {
...
         Type *FTy = CS.getCalledValue()->getType()->getPointerElementType();
         return static_cast<T *>(this)
             ->getCallCost(cast<FunctionType>(FTy), CS.arg_size());
       }
 
       SmallVector<const Value *, 8> Arguments(CS.arg_begin(), CS.arg_end());
       return static_cast<T *>(this)->getCallCost(F, Arguments);
     }
 
-    if (const CastInst *CI = dyn_cast<CastInst>(U)) {
-      // Result of a cmp instruction is often extended (to be used by other
-      // cmp instructions, logical or return instructions). These are usually
-      // nop on most sane targets.
-      if (isa<CmpInst>(CI->getOperand(0)))
-        return TTI::TCC_Free;
-      if (isa<SExtInst>(CI) || isa<ZExtInst>(CI) || isa<FPExtInst>(CI))
-        return static_cast<T *>(this)->getExtCost(CI, Operands.back());
-    }
+    if (isa<SExtInst>(U) || isa<ZExtInst>(U) || isa<FPExtInst>(U))
+      // The old behaviour of generally treating extensions of icmp to be free
+      // has been removed. A target that needs it should override getUserCost().
+      return static_cast<T *>(this)->getExtCost(cast<Instruction>(U),
+                                                Operands.back());
 
     return static_cast<T *>(this)->getOperationCost(
         Operator::getOpcode(U), U->getType(),
         U->getNumOperands() == 1 ? U->getOperand(0)->getType() : nullptr);
   }
 
   int getInstructionLatency(const Instruction *I) {
     SmallVector<const Value *, 4> Operands(I->value_op_begin(),
...

test/Analysis/CostModel/SystemZ/ext-of-icmp-cost.ll

This file was added.

; RUN: opt < %s -cost-model -cost-kind=code-size -analyze \
; RUN: -mtriple=s390x-unknown-linux -mcpu=z13 | FileCheck %s
;
; Check that getUserCost() does not return TCC_Free for extensions of
; i1 returned from icmp.

define i64 @fun1(i64 %v) {
; CHECK-LABEL: 'fun1'
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %cmp = icmp eq i64 %v, 0
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %z = zext i1 %cmp to i64
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: ret i64 %z
  %cmp = icmp eq i64 %v, 0
  %z = zext i1 %cmp to i64
  ret i64 %z
}

define i64 @fun2(i64 %v) {
; CHECK-LABEL: 'fun2'
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %cmp = icmp eq i64 %v, 0
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %z = sext i1 %cmp to i64
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: ret i64 %z
  %cmp = icmp eq i64 %v, 0
  %z = sext i1 %cmp to i64
  ret i64 %z
}

define double @fun3(i64 %v) {
; CHECK-LABEL: 'fun3'
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %cmp = icmp eq i64 %v, 0
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %z = uitofp i1 %cmp to double
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: ret double %z
  %cmp = icmp eq i64 %v, 0
  %z = uitofp i1 %cmp to double
  ret double %z
}

define double @fun4(i64 %v) {
; CHECK-LABEL: 'fun4'
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %cmp = icmp eq i64 %v, 0
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %z = sitofp i1 %cmp to double
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: ret double %z
  %cmp = icmp eq i64 %v, 0
  %z = sitofp i1 %cmp to double
  ret double %z
}

define i64 @fun5(i1 %v) {
; CHECK-LABEL: 'fun5'
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: %z = zext i1 %v to i64
; CHECK: Cost Model: Found an estimated cost of 1 for instruction: ret i64 %z
  %z = zext i1 %v to i64
  ret i64 %z
}

This is an archive of the discontinued LLVM Phabricator instance.

[CodeMetrics] Don't let extends of i1 be free.
Needs Review · Public
