This is an archive of the discontinued LLVM Phabricator instance.

Changes to use a string pool for symbols
AcceptedPublic

Authored by jaredwy on Aug 30 2022, 7:24 PM.

Details

Reviewers
lhames
ributzka

Diff Detail

Event Timeline

jaredwy created this revision.Aug 30 2022, 7:24 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: hiraditya. · View Herald Transcript
jaredwy requested review of this revision.Aug 30 2022, 7:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2022, 7:24 PM
jaredwy edited reviewers, added: lhames; removed: ributzka.Aug 30 2022, 7:25 PM
jaredwy removed a subscriber: hiraditya.
jaredwy removed a subscriber: StephenFan.

Hi Jared, thanks for working on this! Your patch looks really good. Two minor comments inline.

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
428–429

Is there a special reason for these methods to take RValue references? Otherwise, these look like classical sink arguments that should be moved in and taken by value (just remove the &&).

1038

Do we have a definition for the dtor here?

jaredwy added inline comments.Sep 6 2022, 1:08 AM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
428–429

No not really and in fact started life that way, just was looking to avoid the overhead of the add/decrement of the ownership counter when I noticed every usage was taking ownership, will remove :)

1038

Sorry don't follow the question? Implementation is further down if that's what you meant? Did you want me to move it inline here?

dblaikie added inline comments.
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
428–429

If you std::move at the caller and in the callee, you shouldn't need to add/decrement ownership counters even when passing by value. (it'll add an extra move, but no copies - copies are what cause add/decrement), so this should be good

lhames accepted this revision.Sep 8 2022, 6:30 PM

Dave's suggestion about pass-by-value should be implemented, but otherwise LGTM. Thanks Jared!

There's a part 2 to this bug, which I thought was mentioned in http://llvm.org/PR55533, but I now realize is not: We want to change the interface of LinkGraph and associated classes to use SymbolStringPtr rather than StringRef. That will make it cheap to transfer/copy/compare strings between the current link and the surrounding ORC session, which is something that plugins (and the ObjectLinkingLayer) do quite a lot.

Jared -- do you want to tackle that under http://llvm.org/PR55533? Otherwise we can close that (once you land this patch) and file a follow-up issue.

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
428–429

Yep -- the various rvalue reference arguments should all be changed to pass-by-value with the caller using std::move.

1026–1032

What happens if you ditch this overload and require a SymbolStringPool for construction? Does it break many uses (outside the unit tests)?

I guess it's just the jitlink::link API (and various implementations of that) that we'd need to update?

This revision is now accepted and ready to land.Sep 8 2022, 6:30 PM
jaredwy updated this revision to Diff 465653.Oct 5 2022, 9:36 PM

Review comments addressed and also just went and removed the getName returning a stringref.

dblaikie added inline comments.Oct 6 2022, 2:24 PM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
516–517

This could return by const-ref (the same way a large member like a std::vector would return from an accessor by const ref) & then isNamed could be removed & the usage code kept the same? (I assume isNamed was added to avoid the cost of copying the SymbolStringPtr - and if it returns by const ref there's no added cost? Oh, I guess callers still have to add the * in to dereference? yeah, but still maybe better than adding this isNamed function that's a bit quirky - since it only allows the one operation (equality) without the SymbolStringPtr copy, which is a bit niche/special case/inflexible)

519–521

I think this can be removed now?

jaredwy added inline comments.Oct 6 2022, 3:39 PM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
516–517

It was mostly added to aid in readability of the code and avoiding the unnecessary copy but I am not attached to it, I am new to these lands so if it is not the LLVM way to add utility functions i can remove

519–521

It could.

jaredwy updated this revision to Diff 465919.Oct 6 2022, 4:17 PM

Removing an uncessary if statement

dblaikie added inline comments.Oct 6 2022, 5:46 PM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
516–517

Could you show/compare the readability issues you had in mind?

Would getName returning by const-ref address the unnecessary copy concerns?

sgraenitz added inline comments.Oct 7 2022, 9:21 AM
llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
1038

Ah you are right, sorry missed that.

jaredwy updated this revision to Diff 466205.Oct 7 2022, 3:48 PM

Removing a isNamed in favour of just using the comparisons

lhames added a comment.Oct 7 2022, 6:15 PM

I love seeing all that BlockDependenciesMap code disappearing as a result of this change. :D

Lots of minor changes, but the broad direction of this is _great_! Once this patch lands the next big target should be LinkGraph::LookupMap -- we should be able to ditch the StringRefs and pass the SymbolStringPtrs right through to the context. That'll avoid a tonne of re-interning on the ObjectLinkingLayer side.

Very cool work!

llvm/include/llvm/ExecutionEngine/JITLink/JITLink.h
516–517

I'd be in favor of having getName() return by const-ref and dropping isNamed -- this would be consistent with the existing getters/setters.

1232–1233

This and the other "add" methods can take SymbolStringPtrs now, rather than StringRefs.

This is a place where I wouldn't mind a convenience overload for the StringRef though. I.e.:

Symbol &addExternalSymbol(SymbolStringPtr Name, orc::ExecutorAddrDiff Size, Linkage L) {
  assert(llvm::count(ExternalSymbols, Name) == 0 && "Duplicate external symbol");
    auto &Sym = Symbol::constructExternal(
        Allocator, createAddressable(orc::ExecutorAddr(), false),
        std::move(Name), Size, L);
    ExternalSymbols.insert(&Sym);
    return Sym;
}

Symbol &addExternalSymbol(StringRef Name, orc::ExecutorAddrDiff Size, Linkage L) {
  return addExternalSymbol(SSP->intern(Name), Size, L);
}
llvm/include/llvm/ExecutionEngine/JITLink/TableManager.h
42

Are the parens needed here? I think * binds tight enough that you can omit them.

llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
244–245

I'd include intern convenience methods on both the LinkGraph and the builder, and then use them to avoid naming SymbolStringPtrs that only have one use. I.e.

Gym = &createDLLImportEntry(intern(SymbolName), *ExternalSym);

OTOH it's always worth naming a SymbolStringPtr that gets used more than once (otherwise you'd have to intern it more than once).

llvm/lib/ExecutionEngine/JITLink/EHFrameSupport.cpp
377

I think you can drop the outer parens here.

530

I think you can drop them outer parens here too.

llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp
257–258

Could lazily intern ELFGOTSymbolName into an ELFJITLinker_x86_64 member variable to speed up repeat comparisons?

No need to do that for this patch, but maybe add a FIXME to follow up on the idea?

llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
202–203

I think you can now just assert(Sym->getName() && "Externals must be named").

207

We should key UnresolvedExternals on the SymbolStringPtr now, rather than the StringRef.

llvm/lib/ExecutionEngine/JITLink/PerGraphGOTAndPLTStubsBuilder.h
73

I think GOTEntries should be able to be keyed on the SymbolStringPtr too.

95

PLTStubs should likewise be keyed on the SymbolStringPtr.

llvm/lib/ExecutionEngine/Orc/COFFPlatform.cpp
68

We need to use the ExecutionSession's SymbolStringPool here. I'd update this to take the ExecutionSession& -- COFFPlatform has an ES member handy that you can pass in.

807

Once everyone's sharing the same pool this can be a direct SymbolStringPtr value comparison.

llvm/lib/ExecutionEngine/Orc/ELFNixPlatform.cpp
655

Can be a SSP comparison once the pool is shared.

803–811

These tls symbol strings should be interned and stored in the plugin -- that'll save re-interning them for each graph.

llvm/lib/ExecutionEngine/Orc/ExecutionUtils.cpp
577–580

We shouldn't need to allocate a string on the LinkGraph for this any more. Instead, you should just be able to say:

Ptr.setName(G->intern(getImplPrefix() + *KV.first));
llvm/lib/ExecutionEngine/Orc/MachOPlatform.cpp
885

Can compare SSP values once the pool is shared.

1061–1065

We should intern these tlv strings and store them as members in the plugin to avoid redundant interning.

1340–1343

More strings to be interned and stored in the plugin.

llvm/lib/ExecutionEngine/Orc/ObjectLinkingLayer.cpp
233–240

Once Symbol::getName() returns by const-ref I think you can replace InternedName with Sym->getName() everywhere here.

245–246

Ditto here.

431–435

I'd drop Name and just use Sym->getName() here.

486–490

Drop SymName and just use Sym->getName().

llvm/unittests/ExecutionEngine/JITLink/LinkGraphTests.cpp
77–91

Once you drop the convenience method this shouldn't be needed.

847

Missing newline. :)

jaredwy updated this revision to Diff 523650.May 18 2023, 9:31 PM

This further uses symbol ptrs across jitlink, i think in most places now we use it for the symbol name, there is some clean up to go, but this is a good starting point

lhames added a comment.Aug 6 2023, 3:32 PM

Re-reviewing, since I know you're working on a rebase at the moment. Mostly cosmetic comments, but a handful of functional ones too.

llvm/include/llvm/ExecutionEngine/Orc/Layer.h
169 ↗(On Diff #523650)

This shouldn't be needed: Clients can access getExecutionSession instead.

llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
305–308

Variable names should start with uppercase (according to LLVM coding standards).

You'll have to copy the SymbolStringPtr either way, so you can just drop the move down to the point where you use it as a map key:

auto InternedName = G->intern(S);
auto Symbol = &G->addExternalSymbol(InternedName, 0, false);
Symbol->setLive(true);
ExternalSymbols[std::move(InternedName)] = Symbol;
362–363

This should probably be done before the patch lands.

368–369

You can std::move these map keys to save a ref-count. :)

378–386

sym should be capitalized to Sym.

llvm/lib/ExecutionEngine/JITLink/COFF_x86_64.cpp
197

imageBaseName should be capitalized to ImageBaseName.

llvm/lib/ExecutionEngine/JITLink/ELF_i386.cpp
86

What about

if (Sym->hasName() *Sym->getName() == ELFGOTSymbolName)

?

llvm/lib/ExecutionEngine/JITLink/ELF_x86_64.cpp
257–259

This could be if (Sym.hasName() && *Sym.getName() == ELFGOTSymbolName) too.

284

if (Sym.hasName() && *Sym.getName() == ELFGOTSymbolName) could be used here too.

llvm/lib/ExecutionEngine/JITLink/JITLinkGeneric.cpp
203

Better yet, assert(Sym->hasName() && "Externals must be named"). Sorry I didn't spot that earlier.

llvm/lib/ExecutionEngine/Orc/ExecutionUtils.cpp
578–580

Twine::getSingleStringRef is only valid if the twine is actually representing a single StringRef, but here it's a concatenation.

I think you want to replace:

auto name = getImpPrefix() + *KV.first;
Ptr.setName(G->intern(name.getSingleStringRef()));

with

Ptr.setName(G->intern((getImpPrefix() + *KV.first).str()));
jaredwy updated this revision to Diff 557985.Nov 2 2023, 2:46 AM

Responding to review comments