This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Cache the vectors of records returned by getAllDerivedDefinitions().
ClosedPublic

Authored by Paul-C-Anagnostopoulos on Dec 4 2020, 11:19 AM.

Details

Summary

In RecordKeeper: The vectors of records returned by getAllDerivedDefinitions(). Many backends request the same record vectors multiple times. This includes big ones like the records inheriting from the Instruction class.

My timings indicate that this speeds up most backends by about 2%, for not much effort.

A couple of backends are full of extra phase timers. I will get rid of those before pushing this revision.

Diff Detail

Event Timeline

Paul-C-Anagnostopoulos requested review of this revision.Dec 4 2020, 11:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2020, 11:19 AM

Can you split this up? The namespace caching change seems quite a bit simpler and easy to approve. The other change makes me think we should return an ArrayRef instead of a vector from getAllDerivedDefinitions. Otherwise we're still making a copy on every call. I also wonder if it should use StringMap instead of std::unordered_map.

Which backends are requesting Instructions multiple times? Don't most backends go through getInstructionsByEnumValue that already caches?

@dblaikie will attest to the fact that I still don't understand the C++ baroque memory management model. It took me awhile to figure out how to use the std::unordered_map.

I presume the extra copy is made each time one of the vectors is returned? Is there an cleverer way to fix this than changing all 221 calls to getAllDerivedDefinitions?

Any backend that works on the instructions and then, in a second step, works with CodeGenTarget, ends up asking for the instruction vector at least twice. One of them, I think InstrInfoEmitter, asks for it three times. getInstructionsByEnumValue helps, but doesn't completely solve the problem. And there are various other subsets of the records, some large, that are requested multiple times: PatFrags, AtomicNoRet, VOP, Commutable_REV.

I should note that timing these changes is rather dicey. The timings vary so much from one run to the next that it's tough to decide whether I'm actually getting anywhere. To convince myself these changes actually helped, I ran various backends at least 30 times each.

I will break out the instruction namespace change, submit a second Phabricator revision, and remove it from this revision. Probably not until Sunday, so let's carry on with this discussion.

It occurs to me that we may have to stick with copying the vector of records each time it is returned. There is nothing stopping backends from modifying the vector. I'll do some research.

I should have realized this earlier.

Paul-C-Anagnostopoulos retitled this revision from [TableGen] Cache two things to improve speed. to [TableGen] Cache the vectors of records returned by getAllDerivedDefinitions()..
Paul-C-Anagnostopoulos edited the summary of this revision. (Show Details)

I removed the namespace caching and submitted a separate revision: https://reviews.llvm.org/D92722

dblaikie added inline comments.Dec 5 2020, 10:27 AM
llvm/lib/TableGen/Record.cpp
2602

I'd probably write this as std::string ClassNameString = ClassName.str() but could be written as std::string ClassNameString(ClassName); as well, if you prefer, but as-written (std::string ClassNameString = std::string(ClassName);) it's a bit overly verbose/redundant.

2602–2610

This causes a few redundant map lookups on the expensive path-, but probably the overall best solution would be using StringMap for this anyway (no need to do conversions via std::string in that case, for instance)

auto p = m.try_emplace(ClassNameString);
if (p.second) {
  p.first->second = getAllDerivedDefinitions(makeArrayRef(ClassName);
}
return p.first->second;

If this continues to use std::unordered_map, the code should be the same except for the need to call "str()" on the use of ClassNameString: auto p = m.try_emplace(ClassNameString.str());

I will play with this some more on Monday and take your advice, David.

The StringMap sounds interesting.

I've taken David's advice and now used StringMap. This time I measured and got about a 3% improvement with this caching.

llvm/include/llvm/TableGen/Record.h
37

Remove this include.

llvm/lib/TableGen/Record.cpp
2609

Remove these old lines.

Nice! StringMap is definitely a good way to go here, the timings in that talk don't look right at all.

llvm/lib/TableGen/Record.cpp
2602

Because you're using StringMap you don't need to convert this to an std::string. try_emplace works with StringRef.

Remove the cruft. This is now ready to push.

LGTM thanks paul

This revision is now accepted and ready to land.Dec 7 2020, 3:30 PM
dblaikie added inline comments.Dec 9 2020, 11:24 AM
llvm/lib/TableGen/Record.cpp
2600

Now that the result is cached, perhaps this function could return by const ref instead of by value, to avoid unnecessary copies of the resulting vector?

llvm/lib/TableGen/Record.cpp
2600

The vector must be copied, because there is nothing stopping the caller from modifying it.

Do you think it's worth the effort to establish the rule that it cannot be modified and then investigate every call to see what's going on?

craig.topper added inline comments.Dec 9 2020, 11:58 AM
llvm/lib/TableGen/Record.cpp
2600

If the caller needs to modify, the caller can make a copy explicitly. I doubt most callers modify it. I think if you just change it to return const std::vector<Record*> & any code that does

std::vector<Record*> Foo = getAllDerivedDefinitions

will still compile. A copy will just be made by the assignment. This should cover most callers I would guess. Then we can through and individually change the callers to use "const std::vector &" to remove the copies.

llvm/lib/TableGen/Record.cpp
2600

That sounds like a good plan.

Some day I will get used to all the copying. I think.