This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Remove unused llvm.experimental.noalias.scope.decl
ClosedPublic

Authored by jeroen.dobbelaere on Jan 21 2021, 7:50 AM.

Details

Summary

A @llvm.experimental.noalias.scope.decl is only useful if there is !alias.scope and !noalias metadata that uses the declared scope.
When that is not the case for at least one of the two, the intrinsic call can as well be removed.

Diff Detail

Event Timeline

jeroen.dobbelaere requested review of this revision.Jan 21 2021, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2021, 7:50 AM
nikic added a comment.Jan 21 2021, 8:02 AM

InstSimplify is not a good place to put this (it's mostly a testing pass -- you'll find that it's only scheduled once in a standard pipeline). InstCombine is a better place. I would collect scopes during initial worklist population and then delete when the declaration is visited.

jeroen.dobbelaere retitled this revision from [InstSimplifyPass] Remove unused llvm.experimental.noalias.scope.decl to [InstCombine] Remove unused llvm.experimental.noalias.scope.decl.

Moved to InstCombine.

nikic added inline comments.Jan 21 2021, 12:33 PM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3777

Why only MD_alias_scope and not also MD_noalias?

jeroen.dobbelaere added inline comments.
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3777

MD_noalias is telling what scopes (variables) are 'active'/'visible' for the memory instruction, but to which it will not alias.
MD_alias_scopeis telling what 'variable(s)' is/are associated to the memory instruction (to which it _will_ alias).

A no-aliasing relationship between two instructions can be deduced by checking if the !alias.scope of one instruction is a subset of the !noalias of the other instruction (also taking into account the 'domain').

Once a scope is not any more used in !alias.scope (aka, those memory instructions have been optimized away, or they lost the metadata), the location of that scope declaration has no influence any more and can be omitted.
(With the full restrict patches, this corresponds the usage count of the declaration becoming 0)

Of course the same reasoning can be done for the !noalias metadata: once a declared scope is not used any more in !noalias you can come to the same conclusion and omit that declaration.

hmm..

My reasoning was that we are more likely to hit this with the '!alias.scope'. But a more complete solution can also track the UsedNoAliasScopes. If the declared scope is not found in one of the two, it (and its declaration) become useless and we can omit it. Removing the declaration is easy and safe. I am not sure if removing the metadata associated with that scope is also easy and safe to do. Shared metadata between functions could make it tricky, although that should only happen in testcases as far as I am aware.

So, the short answer is: focusing on MD_alias_scope is likely the most effective (and corresponds to how the full restrict patches handle it)

But, extending the analysis to also track MD_noalias can provide a more complete solution, at a slightly higher cost.

@hfinkel, do you agree with my reasoning ?

@nikic, @jdoerfert shall I add the UsedNoAliasScopes tracking as well ?
Whatever we choose, some extra explanation why this is valid should also be added.

nikic added inline comments.Jan 23 2021, 2:11 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3777

At least longer term, I would like that every scope use also has a corresponding decl. With that in mind, I don't really like the idea of leading behind "dangling" scope references, even if it is technically correct.

If it doesn't lose much practical optimization power, I would prefer it if we only dropped decls for scopes that are used in neither alias.scope nor noalias. But if this is a significant restriction, then I think your current approach of looking at alias.scope is fine (as it is annotated to loads, and those are much more likely to be eliminated than stores).

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3777

I understand.

Again, the original idea is based on the full restrict version with the 'usage based' elimination of the scope declarations: when no actual pointer is based on the scope declaration associated with the restrict pointer, that declaration is not needed any more. A 'completely clean' version could combine this with going over _all_ instructions in the function and removing that scope from the metadata. With full restrict, that cost was considered to be too high. There, it is also possible to have multiple declarations using the same scope, but associated to different 'restrict pointers' sharing that scope.

IMHO, it does not make sense to only remove the declaration if the scope is not used at all. Once it is not used by either all !alias.scope, or all !noalias, it can be removed. A further refinement can indeed spend time in actually removing the unused/ineffective scope from the metadata. But I would rather do that as a followup change.

I noticed that D93040 added a decent 2% penalty for trampd3d-v4. In https://llvm-compile-time-tracker.com/index.php?config=O3&stat=instructions&branch=dobbelaj-snps/perf/D95141_20210123_01 you can see that the cleanup wins back most of this regression.

xbolva00 added a subscriber: xbolva00.EditedJan 23 2021, 5:57 AM

I noticed that D93040 added a decent 2% penalty for trampd3d-v4. In https://llvm-compile-time-tracker.com/index.php?config=O3&stat=instructions&branch=dobbelaj-snps/perf/D95141_20210123_01 you can see that the cleanup wins back most of this regression.

4% for LTO.
It would be good to revert D93040 maybe.. Or be sure this fix goes in before llvm 12.
It is kinda unfortunate to introduce new regressions few days before rc1.

I noticed that D93040 added a decent 2% penalty for trampd3d-v4. In https://llvm-compile-time-tracker.com/index.php?config=O3&stat=instructions&branch=dobbelaj-snps/perf/D95141_20210123_01 you can see that the cleanup wins back most of this regression.

It would be good to revert D93040 maybe.. Or be sure this fix goes in before llvm 12.

The goal is to have everything in before llvm-12 (unless other big blockers pop up)

jeroen.dobbelaere edited the summary of this revision. (Show Details)

Look at both !alias.scope and !noalias to decide when to omit a declaration.
During the analysis of !alias.scope and !nolias : when a 'list of scopes' was already analysed we do not need to go over it again.

See https://llvm-compile-time-tracker.com/index.php?config=O3&stat=instructions&branch=dobbelaj-snps/perf/D95141_20210123_02 for the compile time effect.

Logic seems sounds to me. Code as well, one minor nit below. I have no objections to this.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3789

Style: I would prefer early exits but feel free to ignore if you think its not better.

const auto *MDScopeList = dyn_cast_or_null<MDNode>(ScopeList);
if (!MDScopeList || !Container.insert(MDSCopeList).second)
  return;
for (auto &MDOperand : MDScopeList->operands())
  if (auto *MDScope = dyn_cast<MDNode>(MDOperand))
    Container.insert(MDScope);

Is it legal that the cast fails? If the verifier would complain we might just assume it is an MDNode.

llvm/test/Transforms/Coroutines/ex4.ll
56

I'm confused where these come from in the first place but that is unrelated.

nikic added a comment.Jan 23 2021, 1:52 PM

During the analysis of !alias.scope and !nolias : when a 'list of scopes' was already analysed we do not need to go over it again.

I would expect that analysing scope list multiple times isn't really what's causing the compile-time hit here. It's probably the fact that you're looking up two kinds of metadata for each instruction, and metadata lookup is pretty expensive. You might want to check what the impact is if you only analyze mayReadOrWriteMemory() instructions.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3804

I've just pushed a NoAliasScopeDeclInst class to make this kind of code nicer.

auto *Decl = dyn_cast<NoAliasScopeDeclInst>(Inst);
if (!Decl)
  return false;

const MDNode *MD = Decl->getScopeList();
nikic added a comment.Jan 23 2021, 2:07 PM

During the analysis of !alias.scope and !nolias : when a 'list of scopes' was already analysed we do not need to go over it again.

I would expect that analysing scope list multiple times isn't really what's causing the compile-time hit here. It's probably the fact that you're looking up two kinds of metadata for each instruction, and metadata lookup is pretty expensive. You might want to check what the impact is if you only analyze mayReadOrWriteMemory() instructions.

Though possibly that won't help, as it's exactly loads, stores and calls that typically carry metadata anyway. Maybe just skipping DbgInfoIntrinsic would help to avoid a larger hit for -g builds...

During the analysis of !alias.scope and !nolias : when a 'list of scopes' was already analysed we do not need to go over it again.

I would expect that analysing scope list multiple times isn't really what's causing the compile-time hit here. It's probably the fact that you're looking up two kinds of metadata for each instruction, and metadata lookup is pretty expensive. You might want to check what the impact is if you only analyze mayReadOrWriteMemory() instructions.

Though possibly that won't help, as it's exactly loads, stores and calls that typically carry metadata anyway. Maybe just skipping DbgInfoIntrinsic would help to avoid a larger hit for -g builds...

Thanks for the feedback ! I'll check some variants tomorrow (Sunday). First getting some sleep ;)

llvm/test/Transforms/Coroutines/ex4.ll
56

I would assume the inlining of a function that returns a struct. Those typically have a noalias attribute for the return argument.

Use NoAliasScopeDeclInst. Only analyze non-debug instructions with metadata. (*1)

This seems to be faster than analyzing non-debug instruction that are reading or writing memory. (*2)

See https://llvm-compile-time-tracker.com/index.php?config=O3&stat=instructions&branch=dobbelaj-snps/perf/D95141_20210124_01 for some comparisons:

  • *1 corresponds to: 374c7514b58d963cbab82341ad8383fcc89f588d
  • *2 corresponds to: 26ed00968ed9925ead692dfd8a75bdf8dc4a3492
nikic accepted this revision.Jan 24 2021, 4:37 AM

LGTM

This revision is now accepted and ready to land.Jan 24 2021, 4:37 AM
jeroen.dobbelaere marked an inline comment as done.Jan 24 2021, 4:43 AM

Thanks. I'll still do the small refactoring recommended by @jdoerfert and then do the commits.

llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3789

ok.

I don't think the verifier is already complaining. I am all for making this more strict, but rather do this in a followup patch.

Small refactoring as recommended by @jdoerfert

jeroen.dobbelaere marked an inline comment as done.Jan 24 2021, 4:46 AM