This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Keep track info of lazy-emitted symbols in ModuleBuilder
ClosedPublic

Authored by junaire on Jun 1 2022, 7:44 AM.

Details

Summary

The intent of this patch is to selectively carry some states over to
the Builder so we won't lose the information of the previous symbols.

This used to be several downstream patches of Cling, it aims to fix
errors in Clang Interpreter when trying to use inline functions.
Before this patch:

clang-repl> inline int foo() { return 42;}
clang-repl> int x = foo();

JIT session error: Symbols not found: [ _Z3foov ]
error: Failed to materialize symbols:
{ (main, { x, $.incr_module_1.__inits.0, __orc_init_func.incr_module_1 }) }

Co-authored-by: Axel Naumann <Axel.Naumann@cern.ch>
Signed-off-by: Jun Zhang <jun@junz.org>

Diff Detail

Event Timeline

junaire created this revision.Jun 1 2022, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 7:44 AM
junaire requested review of this revision.Jun 1 2022, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2022, 7:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
junaire updated this revision to Diff 433607.Jun 1 2022, 6:27 PM

Fix the broken build

junaire updated this revision to Diff 433610.Jun 1 2022, 6:30 PM

Only use heplers

junaire updated this revision to Diff 433623.Jun 1 2022, 7:11 PM

Fix the build

junaire updated this revision to Diff 433624.Jun 1 2022, 7:13 PM

Forget to use helpers again...

junaire updated this revision to Diff 433659.Jun 1 2022, 11:26 PM

Fix the unittest failure

junaire updated this revision to Diff 433694.Jun 2 2022, 2:32 AM

also swap TBAA

You haven't really described the issue you're having. The code looks like it's maintaining a map from symbol names to the GlobalDecl from which it'll be emitted; can you explain what Cling wants this for?

Thanks for the prompt reply.

We have seen that when there is an inline/weak symbol with no use CodeGen decides not to emit that symbol and moves it into a map. Upon a use it would lazily emit it. However, in cling/clang-repl the use might come on the next line, which means that we called CodeGeneratorImpl::StartModule which has called CodeGeneratorImpl::Initialize and reset Builder. By that time we have lost the information if we emitted the symbol or not.

The intent of this patch is to selectively carry some state over to the Builder so that we can keep track of the lazy decisions the previous builder took. If we eagerly emit declarations we have no problems clang-repl -Xcc -Xclang -Xcc -femit-all-decls "inline int f() { return 1; }" "int i = f();" makes it work. However it seems an overkill.

A second goal of this patch is to start a discussion if/how we could do it better as carrying state is error-prone...

Okay, I understand. So, first off, I wouldn't really call that a "weak" symbol rather than, say, a lazily-emitted symbol; "weak" already has plenty of different senses, and we should try to avoid coining more. Also, your patch description makes it sound like there's a general bug you're fixing rather than specifically an issue with re-using a single CodeGenerator to generate a succession of modules; please remember that people reading your commit messages don't necessarily know you or contextualize your patches by knowing that you work on Cling.

On a more technical note, it's now clear that the main thrust of your patch is the state persistence in StartModule. Your patch is effectively adding a feature where StartModule can be invoked multiple times (assuming it's been appropriately finalized from earlier invocations). I think that's fine, although I imagine it will need a lot of further changes to allow linkage between these modules. That doesn't really explain why you needed to add a new field to the CodeGenModule, though.

junaire updated this revision to Diff 433965.Jun 2 2022, 11:44 PM

remove EmittedDeferredDecls

junaire updated this revision to Diff 433976.Jun 3 2022, 12:23 AM

Update commit message

junaire retitled this revision from [CodeGen] Correctly handle weak symbols in the codegen to [CodeGen] Keep track info of lazy-emitted symbols in ModuleBuilder.Jun 3 2022, 12:25 AM
junaire edited the summary of this revision. (Show Details)

Okay, I understand. So, first off, I wouldn't really call that a "weak" symbol rather than, say, a lazily-emitted symbol; "weak" already has plenty of different senses, and we should try to avoid coining more. Also, your patch description makes it sound like there's a general bug you're fixing rather than specifically an issue with re-using a single CodeGenerator to generate a succession of modules; please remember that people reading your commit messages don't necessarily know you or contextualize your patches by knowing that you work on Cling.

Yes, you are right. We will update the patch title and summary.

On a more technical note, it's now clear that the main thrust of your patch is the state persistence in StartModule. Your patch is effectively adding a feature where StartModule can be invoked multiple times (assuming it's been appropriately finalized from earlier invocations).

Yes!

I think that's fine, although I imagine it will need a lot of further changes to allow linkage between these modules.

Currently, we do that outside of CodeGen in a separate module pass which looks like this:


bool runOnGlobal(GlobalValue& GV) {
  if (GV.isDeclaration())
    return false; // no change.

  // GV is a definition.

  // It doesn't make sense to keep unnamed constants, we wouldn't know how
  // to reference them anyway.
  if (!GV.hasName())
    return false;

  if (GV.getName().startswith(".str"))
    return false;

  llvm::GlobalValue::LinkageTypes LT = GV.getLinkage();
  if (!GV.isDiscardableIfUnused(LT))
    return false;

  if (LT == llvm::GlobalValue::InternalLinkage
      || LT == llvm::GlobalValue::PrivateLinkage) {
    // We want to keep this GlobalValue around, but have to tell the JIT
    // linker that it should not error on duplicate symbols.
    // FIXME: Ideally the frontend would never emit duplicate symbols and
    // we could just use the old version of saying:
    // GV.setLinkage(llvm::GlobalValue::ExternalLinkage);
    GV.setLinkage(llvm::GlobalValue::WeakAnyLinkage);
    return true; // a change!
  }
  return false;
}

The new JIT infrastructure does not ignore duplicate symbols anymore which makes it more challenging to adjust linkage. Ideally, we should CodeGen know what's needed to make code in two different llvm::Modules work... I do not have a feeling if that requires a lot of changes or even redesign. I think this might become clearer when we start upstreaming the code "undo" patches where CodeGen's state should be reverted to a previous partial translation unit. @junaire, maybe it is a good time to make your patch in that area public and add the current reviewers.

That doesn't really explain why you needed to add a new field to the CodeGenModule, though.

Good question. The original idea of EmittedDeferredDecls (patch was developed in 2017) was to track deferred decls that were emitted in a previous llvm::Module. We checked and it is not needed to fix the current test case. We also are checking on the bigger infrastructure if we can drop this field.

rsmith added inline comments.Jun 3 2022, 12:36 PM
clang/lib/CodeGen/CodeGenModule.h
1480–1482

This function name doesn't match its implementation.

clang/lib/CodeGen/ModuleBuilder.cpp
136–165

Please can you move all of this logic into a method on CodeGenModule rather than exposing its implementation details to ModuleBuilder?

junaire updated this revision to Diff 434236.Jun 3 2022, 8:08 PM
junaire retitled this revision from [CodeGen] Keep track info of lazy-emitted symbols in ModuleBuilder to [CodeGen] Keep track info of lazy-emitted symbols in ModuleBuilder.
junaire edited the summary of this revision. (Show Details)

Address @rsmith 's comments.

junaire marked 2 inline comments as done.Jun 3 2022, 8:19 PM
junaire updated this revision to Diff 434238.Jun 3 2022, 8:45 PM

Fix the broken build

junaire updated this revision to Diff 434251.Jun 4 2022, 12:45 AM

I think std::swap makes the logic more clear

v.g.vassilev added inline comments.Jun 6 2022, 6:34 AM
clang/lib/CodeGen/CodeGenModule.h
1506

Would it make sense to make the RAII a friend to avoid adding the accessors of various structures?

clang/lib/CodeGen/ModuleBuilder.cpp
142

Maybe we could move this call into Initialize.

junaire updated this revision to Diff 434473.Jun 6 2022, 7:46 AM

friend class comes to the rescue!

junaire updated this revision to Diff 434474.Jun 6 2022, 7:47 AM
junaire marked an inline comment as done.

fix a typo

junaire added inline comments.Jun 6 2022, 7:49 AM
clang/lib/CodeGen/ModuleBuilder.cpp
142

I don't know why but moving this to Initialize causes a crash...

junaire updated this revision to Diff 434479.Jun 6 2022, 8:00 AM

Remove extra brackets

rjmccall added inline comments.Jun 6 2022, 8:14 AM
clang/lib/CodeGen/CodeGenModule.h
1482

I think a RAII object is an odd way to express this API; there's not really a natural reason you would scope this. It would be better to just have a method on CodeGenModule that steals the internal lazy-emission state from a different instance.

junaire added inline comments.Jun 6 2022, 9:07 AM
clang/lib/CodeGen/CodeGenModule.h
1482

I don't have a better way to do this in mind, We need to restore the states after resetting the Builder right? So I guess RAII should be fine? Can you please provide some hints about how to do this more elegantly? Thx!

rjmccall added inline comments.Jun 6 2022, 9:22 AM
clang/lib/CodeGen/CodeGenModule.h
1482

I don't know what you mean. You're not restoring any old state, you're moving state into the new CGM. The whole thing is an imperative operation, it's just being done in a destructor for no particular reason.

std::unique_ptr<CodeGenModule> newBuilder(new CodeGenModule(...));
Builder->moveLazyEmissionStateInto(newBuilder.get());
Builder = newBuilder;
junaire updated this revision to Diff 434714.Jun 7 2022, 12:46 AM

Try to address @rjmccall 's comments

junaire added inline comments.Jun 7 2022, 12:54 AM
clang/lib/CodeGen/ModuleBuilder.cpp
165

I left I may doing something wrong here, after invoking Initialize we set Builder = std::move(NewBuilder);, so these lines are useless now? I'm quite confused though... @rjmccall

rjmccall added inline comments.Jun 7 2022, 2:20 PM
clang/lib/CodeGen/CodeGenModule.h
1482

I liked all the assertions you had here. It was good to validate that we weren't overwriting useful state in the new builder.

clang/lib/CodeGen/ModuleBuilder.cpp
165

Ah, I see. Probably the most reasonable thing would be to do something like this in StartModule:

auto OldBuilder = std::move(Builder);
Initialize(Ctx);
if (OldBuilder) {
  OldBuilder->moveLazyEmissionStateInto(*Builder);
}
junaire updated this revision to Diff 435000.Jun 7 2022, 5:18 PM

Address @rjmccall 's comments

junaire updated this revision to Diff 435004.Jun 7 2022, 5:30 PM

I believe that OldBuilder is always valid in the normal cases, so
remove the if condition and replace it with an assertion.

I believe that OldBuilder is always valid in the normal cases, so
remove the if condition and replace it with an assertion.

Is StartModule not called for the first time we start emitting a module, or do we pointlessly create a CGM and then immediately replace it?

junaire updated this revision to Diff 435460.Jun 9 2022, 2:21 AM

OK, then we'd better to roll the condition back.

rjmccall accepted this revision.Jun 9 2022, 7:21 AM

Okay, LGTM

This revision is now accepted and ready to land.Jun 9 2022, 7:21 AM

Okay, LGTM

Thanks for the review ☺️

v.g.vassilev added inline comments.Jun 9 2022, 9:46 AM
clang/lib/CodeGen/CodeGenModule.h
1500

It seems that we forgot to move the WeakRefReferences?

junaire added inline comments.Jun 9 2022, 10:14 PM
clang/lib/CodeGen/CodeGenModule.h
1500
hctim added a subscriber: hctim.Jun 13 2022, 9:54 AM

Looks like this broke the ASan buildbot (and may have been missed because the bot was already red at the time):

https://lab.llvm.org/buildbot/#/builders/5/builds/24588

Instructions on how to repro the bot are here: https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild

FAIL: Clang :: Interpreter/execute.cpp (9159 of 66378)
******************** TEST 'Clang :: Interpreter/execute.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/clang-repl "int i = 10;" 'extern "C" int printf(const char*,...);'             'auto r1 = printf("i = %d\n", i);' | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/FileCheck --check-prefix=CHECK-DRIVER /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/execute.cpp
: 'RUN: at line 6';   cat /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/execute.cpp | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/clang-repl | /b/sanitizer-x86_64-linux-fast/build/llvm_build_asan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/test/Interpreter/execute.cpp
--
Exit Code: 2
Command Output (stderr):
--
=================================================================
==58455==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 3912 byte(s) in 3 object(s) allocated from:
    #0 0x55f45fa2983d in operator new(unsigned long) /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0x55f4600f75ad in llvm::RegisterTargetMachine<llvm::X86TargetMachine>::Allocator(llvm::Target const&, llvm::Triple const&, llvm::StringRef, llvm::StringRef, llvm::TargetOptions const&, llvm::Optional<llvm::Reloc::Model>, llvm::Optional<llvm::CodeModel::Model>, llvm::CodeGenOpt::Level, bool) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:1306:12
    #2 0x55f46267c8bb in createTargetMachine /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/MC/TargetRegistry.h:466:12
    #3 0x55f46267c8bb in CreateTargetMachine /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:560:23
    #4 0x55f46267c8bb in EmitAssembly /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1012:3
    #5 0x55f46267c8bb in clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream>>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/CodeGen/BackendUtil.cpp:1182:13
    #6 0x55f4626736e5 in clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/CodeGen/CodeGenAction.cpp:379:7
    #7 0x55f4615be6cc in clang::IncrementalParser::ParseOrWrapTopLevelDecl() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/IncrementalParser.cpp:218:13
    #8 0x55f4615c0e13 in clang::IncrementalParser::Parse(llvm::StringRef) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/IncrementalParser.cpp:265:14
    #9 0x55f4615bb814 in clang::Interpreter::Parse(llvm::StringRef) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/Interpreter.cpp:207:22
    #10 0x55f45fa30581 in clang::Interpreter::ParseAndExecute(llvm::StringRef) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/include/clang/Interpreter/Interpreter.h:64:16
    #11 0x55f45fa2e95c in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/tools/clang-repl/ClangRepl.cpp:97:30
    #12 0x7fde05cc509a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) (BuildId: eb6a5dd378d22b1e695984462a799cd4c81cdc22)

Indirect leak of 32776 byte(s) in 1 object(s) allocated from:
    #0 0x55f45f9f8c88 in calloc /b/s/w/ir/cache/builder/src/third_party/llvm/compiler-rt/lib/asan/asan_malloc_linux.cpp:77:3
    #1 0x55f460cf72a2 in safe_calloc /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/Support/MemAlloc.h:40:18
    #2 0x55f460cf72a2 in AllocateBuckets /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/FoldingSet.cpp:173:40
    #3 0x55f460cf72a2 in llvm::FoldingSetBase::FoldingSetBase(unsigned int) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Support/FoldingSet.cpp:187:13
    #4 0x55f46a156d6a in FoldingSetImpl /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/FoldingSet.h:438:9
    #5 0x55f46a156d6a in ContextualFoldingSet /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/ADT/FoldingSet.h:620:9
    #6 0x55f46a156d6a in clang::ASTContext::ASTContext(clang::LangOptions&, clang::SourceManager&, clang::IdentifierTable&, clang::SelectorTable&, clang::Builtin::Context&, clang::TranslationUnitKind) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/AST/ASTContext.cpp:976:7
    #7 0x55f460f2c1eb in clang::CompilerInstance::createASTContext() /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:554:23
    #8 0x55f4612e7d71 in clang::FrontendAction::BeginSourceFile(clang::CompilerInstance&, clang::FrontendInputFile const&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/FrontendAction.cpp:918:10
    #9 0x55f460f37655 in clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Frontend/CompilerInstance.cpp:1032:13
    #10 0x55f4615bd1af in clang::IncrementalParser::IncrementalParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, llvm::LLVMContext&, llvm::Error&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/IncrementalParser.cpp:130:7
    #11 0x55f4615bb00c in make_unique<clang::IncrementalParser, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance> >, llvm::LLVMContext &, llvm::Error &> /b/sanitizer-x86_64-linux-fast/build/libcxx_build_asan/include/c++/v1/__memory/unique_ptr.h:717:32
    #12 0x55f4615bb00c in clang::Interpreter::Interpreter(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, llvm::Error&) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/Interpreter.cpp:179:16
    #13 0x55f4615bb4a6 in clang::Interpreter::create(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/lib/Interpreter/Interpreter.cpp:189:40
    #14 0x55f45fa2e1e8 in main /b/sanitizer-x86_64-linux-fast/build/llvm-project/clang/tools/clang-repl/ClangRepl.cpp:85:27
    #15 0x7fde05cc509a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a) (BuildId: eb6a5dd378d22b1e695984462a799cd4c81cdc22)

< ... snip, many more objects leaked ... >