Page MenuHomePhabricator

[CodeMetrics] Don't let extends of i1 be free.
Needs ReviewPublic

Authored by jonpa on Nov 20 2018, 12:23 AM.

Details

Summary

getUserCost() currently returns TCC_Free for any extend of a compare (i1) result. It seems this is only true in a limited number of cases where for example two compares are chained. Even in those types of cases it seems unlikely that they are generally free, while they may be in some cases.

This patch therefore suggests removing this special handling of cast of i1. No tests are failing because of this. What remains is to make sure that this does not introduce any performance regressions. It seems that on SystemZ and also preliminary on X86 this looks ok. Some additional benchmarking on more targets (and X86) would be welcome.

If some target actually would want the old behavior, it could override getUserCost().

(This was originally https://reviews.llvm.org/D53373 -- a bit of discussion there)

Diff Detail

Event Timeline

jonpa created this revision.Nov 20 2018, 12:23 AM
chandlerc requested changes to this revision.Nov 20 2018, 1:16 AM

As was mentioned in the original patch, this does still need tests. There are tests here, for example in test/Analysis/CostModel/.... You should be able to see some differences in costs, especially for architectures where the resulting cost computation differs significantly.

include/llvm/Analysis/TargetTransformInfoImpl.h
809–810

You don't need this line at all -- the checks below are sufficient.

811–815

The intent (that seems pretty clear from this comment) was not to necessarily say that *all* i1 extends were free, but that cmp(ext(cmp(...))) is free. That is, an extend which only exists to chain one comparison to another. That seems much more plausible than saying that *any* extend of an i1 is free.

However, there is no way of modeling such a "free" extend in this area.... At least, not with the cost model as we have it set up.

We could potentially model this by saying that the second cmp is free and clearly documenting that the reason is because we will have already accounted for the cost when visiting the extend.

That still leaves open the question: are chained comparisons actually free on all targets? My guess is that they are not. For example, even on x86, one of the most CISCy targets, chained comparisons of this kind will rarely if ever be "free" in any meaningful sense.... If that is what you want to correct by making *all* of this go away (which does seem reasonable to me) I think you need to much more clearly explain this in the patch description to avoid confusion.

This revision now requires changes to proceed.Nov 20 2018, 1:16 AM
jonpa updated this revision to Diff 174745.Nov 20 2018, 3:40 AM
jonpa marked an inline comment as done.

As was mentioned in the original patch, this does still need tests. There are tests here, for example in test/Analysis/CostModel/.... You should be able to see som\

e differences in costs, especially for architectures where the resulting cost computation differs significantly.

Putting back my test case hoping it will serve the purpose

Please also see inline comments.

include/llvm/Analysis/TargetTransformInfoImpl.h
809–810

OK, moved the cast to the return statement.

811–815

However, there is no way of modeling such a "free" extend in this area.... At least, not with the cost model as we have it set up.

IIUC, those passes that want a higher level of accuracy for the cost values, could instead call getCastInstrCost() or getCmpSelInstrCost() and pass the Instruction pointer to then analyze the code and isolate cases where e.g. the second compare is free.

It would even be possible for a target to override getUserCost() and perhaps do something similar, since the User pointer is available, although getUserCost() is supposed to be simple and quick, I think.

jonpa edited the summary of this revision. (Show Details)Nov 20 2018, 3:48 AM

Hi Jonas,

I ran some benchmarks on ARM64, Cortex-A57. I see no regressions.

BTW, there is a way to test getUserCost via Inliner. See test/Transforms/Inline/AArch64/gep-cost.ll. Cost-free instructions increase NumInstructionsSimplified.

Thanks,
Evgeny Astigeevich

jonpa updated this revision to Diff 175017.Nov 22 2018, 12:37 AM

I ran some benchmarks on ARM64, Cortex-A57. I see no regressions.

great!

BTW, there is a way to test getUserCost via Inliner. See test/Transforms/Inline/AArch64/gep-cost.ll. Cost-free instructions increase NumInstructionsSimplified.

Ah, thanks. I removed my initial test and made a new one the same way using the inliner.

eastig accepted this revision.Nov 23 2018, 3:13 AM

+1, LGTM
I'd like someone from the X86 world to approve this as well.

Thanks,
Evgeny Astigeevich

ping!

I'd like someone from the X86 world to approve this as well.

RKSimon added inline comments.Dec 3 2018, 5:20 AM
include/llvm/Analysis/TargetTransformInfoImpl.h
809–810

Please can you add a comment here that describes the recommended approaches if someone needs to get the old behaviour.

jonpa updated this revision to Diff 176390.Dec 3 2018, 6:46 AM

Added comment as requested.

jonpa marked 2 inline comments as done.Dec 3 2018, 6:46 AM

@chandlerc Are you happy with this now?

chandlerc requested changes to this revision.Dec 4 2018, 12:54 AM

@chandlerc Are you happy with this now?

Not really. The test is still too indirect and complex IMO.

As I said in my comments, there are *direct* tests of the cost model, including the getUserCost result. Not sure why that got ignored, I gave the path for them. A concrete example:

llvm/test/Analysis/CostModel/X86/costmodel.ll

I don't think this should be tested by complex inlining tests. Instead, we have a direct and obvious way to test this stuff and we should be using it. The fact that other architectures are *only* testing the latency or reciprocal throughput cost model, and not the code size cost model, is not a reason to bypass the clear and obvious testing layer we have here. Instead we should add exactly the test for the changed cost model that this change is proposing.

This revision now requires changes to proceed.Dec 4 2018, 12:54 AM
jonpa added a comment.Dec 4 2018, 3:29 AM

@chandlerc Are you happy with this now?

Not really. The test is still too indirect and complex IMO.

As I said in my comments, there are *direct* tests of the cost model, including the getUserCost result. Not sure why that got ignored, I gave the path for them. A concrete example:

llvm/test/Analysis/CostModel/X86/costmodel.ll

I don't think this should be tested by complex inlining tests. Instead, we have a direct and obvious way to test this stuff and we should be using it. The fact that other architectures are *only* testing the latency or reciprocal throughput cost model, and not the code size cost model, is not a reason to bypass the clear and obvious testing layer we have here. Instead we should add exactly the test for the changed cost model that this change is proposing.

Sorry, but I'm confused. My understanding is that a test that uses '-cost-model -analyze' (like the one you reference) is using *CostModel*. For this simple test function

define i64 @fun(i64 %Arg1, i64 %Arg2) {
  %Cmp = icmp eq i64 %Arg1, %Arg2
  %Res = zext i1 %Cmp to i64
  ret i64 %Res
}

, CostModel will not call getUserCost(), but instead getCastInstrCost() for the zext. So my patch does not affect this test, and I don't know how to make a direct test that calls getUserCost() like you propose.

, CostModel will not call getUserCost(), but instead getCastInstrCost() for the zext. So my patch does not affect this test, and I don't know how to make a direct test that calls getUserCost() like you propose.

Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost(). Note the implementation of:

int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const {
  switch (kind){
  case TCK_RecipThroughput:
    return getInstructionThroughput(I);

  case TCK_Latency:
    return getInstructionLatency(I);

  case TCK_CodeSize:
    return getUserCost(I);
  }
  llvm_unreachable("Unknown instruction cost kind");
}

, CostModel will not call getUserCost(), but instead getCastInstrCost() for the zext. So my patch does not affect this test, and I don't know how to make a direct test that calls getUserCost() like you propose.

Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost(). Note the implementation of:

int getInstructionCost(const Instruction *I, enum TargetCostKind kind) const {
  switch (kind){
  case TCK_RecipThroughput:
    return getInstructionThroughput(I);
 
  case TCK_Latency:
    return getInstructionLatency(I);
 
  case TCK_CodeSize:
    return getUserCost(I);
  }
  llvm_unreachable("Unknown instruction cost kind");
}

I also gave an example that does this in my suggestion.

jonpa updated this revision to Diff 180470.Jan 7 2019, 5:46 AM

Test updated per review so that the costs of extensions of i1 are tested directly.

Sure it does. Run the -cost-model analysis with -cost-kind=code-size and it will call getUserCost().

Ah, thanks and apologies- I was not aware of the -cost-kind=code-size flag.

I also gave an example that does this in my suggestion.

Yes, indeed, sorry.

jonpa added a comment.Jan 7 2019, 5:51 AM

I'd like someone from the X86 world to approve this as well. (@eastig)

Did anyone check this patch on X86 (in addition to my own preliminary run on my laptop)?

Differential's name is slightly misleading.
Extends from i1 weren't all free before, only if they were an extensions of an icmp result.

Probably other arches are also affected by this (any tests failing?),
would probably be a good idea to have the same test for them too (x86, aarch64?).

And, would be awesome to precommit the adjusted tests, to see the change :)

include/llvm/Analysis/TargetTransformInfoImpl.h
813

'this' method? getGEPCost() or getExtCost() ?

test/Analysis/CostModel/SystemZ/ext-i1-cost.ll
4–5 ↗(On Diff #180470)

This is slightly misleading.
The patch does not change the cost of an extension from i1.
It changes the cost of an extension of i1 returned from icmp.

So

  1. Add one test where i1 was an input to the function
  2. Rename test file to ext-of-icmp-cost.ll or something
  3. Adjust comment
jonpa updated this revision to Diff 180482.Jan 7 2019, 6:26 AM
jonpa marked 2 inline comments as done.

Thanks for quick review!

Probably other arches are also affected by this (any tests failing?),
And, would be awesome to precommit the adjusted tests, to see the change :)

No - no tests are failing!

would probably be a good idea to have the same test for them too (x86, aarch64?).

Since this is the default implementation which is affected, wouldn't that be redundant?

  1. Add one test where i1 was an input to the function
  2. Rename test file to ext-of-icmp-cost.ll or something 3.Adjust comment

ok - done.

jonpa added a comment.Jan 22 2019, 5:55 AM

ping!

include/llvm/Analysis/TargetTransformInfoImpl.h
813

I was thinking about the same function as is changed here - getUserCost(). Comment updated.

jonpa added a comment.Jan 31 2019, 8:18 AM

Ping!

Tests using '-cost-model -cost-kind=code-size' in the "direct" way requested in earlier review...

jonpa added a comment.May 14 2019, 8:38 AM

Ping!

There are still no tests failing with this patch, and I do think I have met all requests from reviewers...

hfinkel accepted this revision.May 14 2019, 9:50 AM

LGTM.

We should watch for perf regressions on x86, and other targets.

jonpa added a comment.May 16 2019, 6:32 PM

Thanks for review.

committed as r360970.

kristina added a subscriber: kristina.EditedMay 16 2019, 8:56 PM

This seems to be causing multiple performance regressions across several bots in compile time and execution time tests.

More specifically these builders are the most affected: clang-cmake-x86_64-avx2-linux, clang-cmake-x86_64-avx2-linux-perf, clang-cmake-x86_64-sde-avx512-linux.

My bad, I didn't check well enough, it seems an unrelated patch made certain tests crash due to asserts. Thanks to @craig.topper for pointing that out.

RKSimon resigned from this revision.May 17 2019, 12:23 AM