This is an archive of the discontinued LLVM Phabricator instance.

[ObjC] Encode pointers to C++ classes as "^v" if the encoded string would otherwise include template specialization types
ClosedPublic

Authored by ahatanak on Feb 16 2021, 2:11 PM.

Details

Summary

This helps reduce the size of the encoded C++ type strings in the binary.

This is enabled by default only on Darwin, but can be enabled/disabled via command line options.

rdar://63288571

Diff Detail

Event Timeline

ahatanak created this revision.Feb 16 2021, 2:11 PM
ahatanak requested review of this revision.Feb 16 2021, 2:11 PM

This patch focuses only on reducing the encoded size of classes with template specialization, but if the information about pointed-to types isn't used at all, clang can always emit "^v" for C++ pointer types. Or it might make more sense to base the decision on whether the type is POD rather than on whether the string includes template specialization.

rjmccall accepted this revision.Feb 16 2021, 4:10 PM

You might want to test nested classes of class templates. I don't know if we encode nested classes in any reasonable way in the first place.

I'm generally okay with being aggressive about stripping this kind of information from encodings, since encodings are not good enough to do type reflection in ObjC++ anyway. If you did decide you wanted to preserve more information from the encoding, though, you could consider only applying this at the innermost level that mentions a template specialization type, rather than the outermost. That is, the new logic seems to not expand the X in X* if X recursively has a field that's a pointer to a template specialization, rather than expanding X and then not expanding the type of that one field. Definitely not something I think you need to do now, though.

This revision is now accepted and ready to land.Feb 16 2021, 4:10 PM
theraven requested changes to this revision.Feb 17 2021, 1:20 AM

I'm broadly in favour of this change, but with the GNUstep ABIs this is an ABI-breaking change and so should not be on by default. The type encoding leaks into the ABI in two ways:

  • Dispatch is dependent on type, which avoids some stack corruption bugs on NeXT-style runtimes. This means that the types of the selector in the caller and the selector in the method definition must match. I don't see a problem with turning this change off for this specific case, because it's incredibly rare to pass complex C++ template types as arguments to Objective-C methods.
  • The variable that contains the offset of ivars includes the type encoding, so that changing the type of a public ivar breaks linkage, rather than just subtly corrupting data later. We could leave this change on for any @private or @package ivars, but changing the type encoding for public ivars will break code linking with existing frameworks.

I'm quite happy with this to be opt-in for non-Apple runtimes and opt-out for Apple runtimes.

This revision now requires changes to proceed.Feb 17 2021, 1:20 AM

Since this is just in pointee types, I suspect this is fine for dispatch outside of targets (like that one old Cray) where sizeof(void*) != sizeof(some_struct*), which I don't believe we generally support. But yeah, if this affects GNU-runtime ivar mangling, we need a way to not disturb this for that case.

ahatanak updated this revision to Diff 324515.Feb 17 2021, 9:17 PM
ahatanak edited the summary of this revision. (Show Details)

Enable this optimization only on Darwin.

ahatanak updated this revision to Diff 324519.Feb 17 2021, 9:23 PM

Upload patch with context.

ahatanak updated this revision to Diff 324540.Feb 17 2021, 11:45 PM

Add test cases for nested class.

theraven added inline comments.Feb 18 2021, 1:39 AM
clang/include/clang/Driver/Options.td
2107

I think there's some magic in Options.td for defining the thing and its no- variant together. I don't think you want the f prefix in the name either if it's an f-option.

clang/lib/Driver/ToolChains/Clang.cpp
5796

I don't think this should be guarded on Darwin, it should be guarded on whether the ObjCRuntime is a NeXT-family runtime. It might be fine to enable this for all non-GNUstep runtimes (the GCC ABI has typed selectors, but doesn't use the type information for dispatch and doesn't have non-fragile ivars. I believe the ObjFW runtime is the same).

ahatanak updated this revision to Diff 324628.Feb 18 2021, 7:29 AM
ahatanak marked 2 inline comments as done.
  • Use BoolFOption to define both the positive and negative options.
  • Enable the optimization if the ObjC runtime is a NeXT-family runtime.
theraven accepted this revision.Feb 18 2021, 7:42 AM

Thanks!

This revision is now accepted and ready to land.Feb 18 2021, 7:42 AM
This revision was landed with ongoing or failed builds.Feb 18 2021, 9:38 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 18 2021, 10:35 AM

Very cool change, I wanted something like this for a long time :)

It seems to break Index/preamble-reparse-changed-module.m on some platforms:
http://45.33.8.238/linux/39727/step_7.txt
http://45.33.8.238/macm1/3788/step_6.txt

Does the failure make sense to you?

Very cool change, I wanted something like this for a long time :)

It seems to break Index/preamble-reparse-changed-module.m on some platforms:
http://45.33.8.238/linux/39727/step_7.txt
http://45.33.8.238/macm1/3788/step_6.txt

Does the failure make sense to you?

I don't know why the test is failing. It doesn't fail on my machine.

This is also broken: http://green.lab.llvm.org/green/job/lldb-cmake/29029/

I'm not sure if it's related to the one you saw.

I ssh'd into one of the failing boxes and ran the failing command without piping it into filecheck. Here's the output:

thakis@dotc:~/src/llvm-project$ env CINDEXTEST_EDITING=1 CINDEXTEST_EXECUTE_COMMAND="cp /usr/local/google/home/thakis/src/llvm-project/clang/test/Index/Inputs/preamble-reparse-changed-module/new-head.h /usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp/mod/head.h" CINDEXTEST_EXECUTE_AFTER_TRIAL=1      /usr/local/google/home/thakis/src/llvm-project/out/gn/bin/c-index-test -test-load-source-reparse 3 local /usr/local/google/home/thakis/src/llvm-project/clang/test/Index/preamble-reparse-changed-module.m -I /usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp -I /usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp/mod -fmodules -fmodules-cache-path=/usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp/mcp 
c-index-test: ../../llvm/include/llvm/ADT/SmallVector.h:281: llvm::SmallVectorTemplateCommon::const_reference llvm::SmallVectorTemplateCommon<unsigned long, void>::operator[](llvm::SmallVectorTemplateCommon::size_type) const [T = unsigned long]: Assertion `idx < size()' failed.
libclang: crash detected during parsing: {
  'source_filename' : '(null)'
  'command_line_args' : ['clang', '/usr/local/google/home/thakis/src/llvm-project/clang/test/Index/preamble-reparse-changed-module.m', '-I', '/usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp', '-I', '/usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp/mod', '-fmodules', '-fmodules-cache-path=/usr/local/google/home/thakis/src/llvm-project/out/gn/obj/clang/test/Index/Output/preamble-reparse-changed-module.m.tmp/mcp'],
  'unsaved_files' : [],
  'options' : 13,
}
Unable to load translation unit!
Failure: libclang crashed

Do you build without assertions locally maybe?

#3  0x00007ffff78ee662 in __GI___assert_fail (assertion=0x90165c "idx < size()", 
    file=0xc60efe "../../llvm/include/llvm/ADT/SmallVector.h", line=281, 
    function=0xa8780d "llvm::SmallVectorTemplateCommon::const_reference llvm::SmallVectorTemplateCommon<unsigned long, void>::operator[](llvm::SmallVectorTemplateCommon::size_type) const [T = unsigned long]") at assert.c:101
#4  0x000000000444401a in clang::ASTReader::ParseLanguageOptions(llvm::SmallVector<unsigned long, 64u> const&, bool, clang::ASTReaderListener&, bool) ()
#5  0x000000000443eb62 in clang::ASTReader::ReadOptionsBlock(llvm::BitstreamCursor&, unsigned int, bool, clang::ASTReaderListener&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#6  0x00000000044474c5 in clang::ASTReader::ReadControlBlock(clang::serialization::ModuleFile&, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, clang::serialization::ModuleFile const*, unsigned int) ()
#7  0x000000000444a901 in clang::ASTReader::ReadASTCore(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, clang::serialization::ModuleFile*, llvm::SmallVectorImpl<clang::ASTReader::ImportedModule>&, long, long, clang::ASTFileSignature, unsigned int) ()
#8  0x00000000044584f8 in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule>*) ()
#9  0x00000000042c07a5 in clang::CompilerInstance::findOrCompileModuleAndReadAST(llvm::StringRef, clang::SourceLocation, clang::SourceLocation, bool) ()
#10 0x00000000042c4650 in clang::CompilerInstance::loadModule(clang::SourceLocation, llvm::ArrayRef<std::pair<clang::IdentifierInfo*, clang::SourceLocation> >, clang::Module::NameVisibilityKind, bool) ()
#11 0x00000000049713c6 in clang::Preprocessor::LexAfterModuleImport(clang::Token&) ()
#12 0x0000000004972afb in clang::Preprocessor::Lex(clang::Token&) ()
#13 0x0000000004e6421d in clang::Parser::ParseModuleName(clang::SourceLocation, llvm::SmallVectorImpl<std::pair<clang::IdentifierInfo*, clang::SourceLocation> >&, bool) ()
#14 0x0000000004e5c9f5 in clang::Parser::ParseModuleImport(clang::SourceLocation) ()
#15 0x0000000004ed2e23 in clang::Parser::ParseObjCAtDirectives(clang::Parser::ParsedAttributesWithRange&) ()
#16 0x0000000004e5d802 in clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) ()
#17 0x0000000004e5b7e6 in clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) ()
#18 0x0000000004e5af77 in clang::Parser::ParseFirstTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&) ()
#19 0x0000000004e5632d in clang::ParseAST(clang::Sema&, bool, bool) ()
#20 0x000000000434e3a3 in clang::FrontendAction::Execute() ()
#21 0x0000000004261cc6 in clang::ASTUnit::Parse(std::shared_ptr<clang::PCHContainerOperations>, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) ()
#22 0x0000000004265d8d in clang::ASTUnit::LoadFromCompilerInvocation(std::shared_ptr<clang::PCHContainerOperations>, unsigned int, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) ()
#23 0x0000000004267090 in clang::ASTUnit::LoadFromCommandLine(char const**, char const**, std::shared_ptr<clang::PCHContainerOperations>, llvm::IntrusiveRefCntPtr<clang::DiagnosticsEngine>, llvm::StringRef, bool, clang::CaptureDiagsKind, llvm::ArrayRef<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, llvm::MemoryBuffer*> >, bool, unsigned int, clang::TranslationUnitKind, bool, bool, bool, clang::SkipFunctionBodiesScope, bool, bool, bool, bool, llvm::Optional<llvm::StringRef>, std::unique_ptr<clang::ASTUnit, std::default_delete<clang::ASTUnit> >*, llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem>) ()
#24 0x00000000045c7b05 in void llvm::function_ref<void ()>::callback_fn<clang_parseTranslationUnit2FullArgv::$_1>(long) ()
#25 0x000000000464d955 in llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) ()
#26 0x000000000464db04 in RunSafelyOnThread_Dispatch(void*) ()

Maybe the problem is that the test reuses a serialized module from an earlier run, and this changes serialization stuff without bumping the serialization version?

See maybe also https://reviews.llvm.org/D73202 . Do we need to bump serialization::VERSION_MINOR ?

See maybe also https://reviews.llvm.org/D73202 . Do we need to bump serialization::VERSION_MINOR ?

I gave

See maybe also https://reviews.llvm.org/D73202 . Do we need to bump serialization::VERSION_MINOR ?

I gave that a try in http://reviews.llvm.org/rG063236646849564094f5fcfc947ad36dba0efedb . Looks like it did the trick: http://45.33.8.238/linux/39740/step_7.txt Some other test is also broken, but the breakage caused by this change is fixed in that build. So the failure happened with LLVM_APPEND_VC_REV=NO.

I still don't understand why the test was failing, but did we have to bump the version because a new LangOpt (EncodeCXXClassTemplateSpec) was added?

I still don't understand why the test was failing, but did we have to bump the version because a new LangOpt (EncodeCXXClassTemplateSpec) was added?

Here are the RUN lines:

// RUN: mkdir -p %t/mod
// RUN: touch %t/empty.h
// RUN: cp %S/Inputs/preamble-reparse-changed-module/module.modulemap %t/mod
// RUN: cp %S/Inputs/preamble-reparse-changed-module/head.h %t/mod

// RUN: env CINDEXTEST_EDITING=1 CINDEXTEST_EXECUTE_COMMAND="cp %S/Inputs/preamble-reparse-changed-module/new-head.h %t/mod/head.h" CINDEXTEST_EXECUTE_AFTER_TRIAL=1 \
// RUN:     c-index-test -test-load-source-reparse 3 local %s -I %t -I %t/mod -fmodules -fmodules-cache-path=%t/mcp 2>&1 | FileCheck %s

It wouldn't have failed in a clean build.

The test is reusing the module cache path %t/mcp from the previous run. I think most implicit modules tests make themselves robust to incremental builds by adding something like this before the first use of the cache:

// RUN: rm -rf %t/mcp

I still don't understand why the test was failing, but did we have to bump the version because a new LangOpt (EncodeCXXClassTemplateSpec) was added?

Here are the RUN lines:

// RUN: mkdir -p %t/mod
// RUN: touch %t/empty.h
// RUN: cp %S/Inputs/preamble-reparse-changed-module/module.modulemap %t/mod
// RUN: cp %S/Inputs/preamble-reparse-changed-module/head.h %t/mod

// RUN: env CINDEXTEST_EDITING=1 CINDEXTEST_EXECUTE_COMMAND="cp %S/Inputs/preamble-reparse-changed-module/new-head.h %t/mod/head.h" CINDEXTEST_EXECUTE_AFTER_TRIAL=1 \
// RUN:     c-index-test -test-load-source-reparse 3 local %s -I %t -I %t/mod -fmodules -fmodules-cache-path=%t/mcp 2>&1 | FileCheck %s

It wouldn't have failed in a clean build.

The test is reusing the module cache path %t/mcp from the previous run. I think most implicit modules tests make themselves robust to incremental builds by adding something like this before the first use of the cache:

// RUN: rm -rf %t/mcp

Right; see also description of https://reviews.llvm.org/D73202

However, while that makes the test more robust, the test kind of found a real bug that actual users could've hit (...but only uses who build with LLVM_APPEND_VC_REV=NO), since users usually don't manually manage cache dirs.

I managed to reproduce the failure by building everything with -DLLVM_APPEND_VC_REV=NO.

What I found was that any BENIGN_LANGOPT added to LangOptions.def would cause the test to fail since CompilerInvocation::getModuleHash doesn't use the option to compute the hash if it's a BENIGN_LANGOPT. The test was added a month ago, so I think my commit was the first to cause the failure. If I use LANGOPT instead of BENIGN_LANGOPT, the test passes.

The description of BENIGN_LANGOPT says "for options that don't affect the construction of the AST in any way" and since the option can change the AST (e.g., the value of unsigned length = sizeof(@encode(S)); will differ depending on whether the member for the option is set or not), I think I should have used LANGOPT instead of BENIGN_LANGOPT.

Since this is something that is easy to overlook, should there be a comment to remind people to update the version whenever they add a BENIGN_LANGOPT?