This is an archive of the discontinued LLVM Phabricator instance.

[Clang][CodeGen][CXX]: Workaround __attribute((no_destroy)) crash/incorrect code generation.
ClosedPublic

Authored by kristina on Nov 9 2018, 12:14 PM.

Details

Summary

Difficult to reproduce crash, attempted it in the test, it appears to not trigger it. I can consistently reproduce it when building a large project with this attribute added and can consistently crash it without the fix in place, however.

This avoids the codepath that should not be executed in the first place by rechecking NoDestroyAttr and bailing out instead of emitting incorrect code or causing an assert in assert enabled builds. This does not appear to cause any regressions for x86_64 test suite and my test is useless since it does not offer the needed coverage.

clang-8: /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:106: static bool llvm::isa_impl_cl<clang::CXXConstructorDecl, const clang::CXXMethodDecl *>::doit(const From *) [To = clang::CXXConstructorDecl, From = const clang::CXXMethodDecl *]: Assertion `Val && "isa<> used on a null pointer"' failed.

-- asserts here, rest is signal handler/stack trace --

#9 0x0000000000d5a35d toCXXCtorType /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenTypes.h:66:3
#10 0x0000000000d5a35d clang::CodeGen::CodeGenModule::getAddrOfCXXStructor(clang::CXXMethodDecl const*, clang::CodeGen::StructorType, clang::CodeGen::CGFunctionInfo const*, llvm::FunctionType*, bool, clang::CodeGen::ForDefinition_t) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CGCXX.cpp:237:0
#11 0x0000000000e05c9b Address /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/Address.h:31:5
#12 0x0000000000e05c9b ConstantAddress /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/Address.h:78:0
#13 0x0000000000e05c9b clang::CodeGen::CodeGenFunction::EmitCXXGlobalVarDeclInit(clang::VarDecl const&, llvm::Constant*, bool) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CGDeclCXX.cpp:179:0
#14 0x0000000000e07b7a clang::CodeGen::CodeGenFunction::GenerateCXXGlobalVarDeclInitFunc(llvm::Function*, clang::VarDecl const*, llvm::GlobalVariable*, bool) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CGDeclCXX.cpp:608:3
#15 0x0000000000e06fc8 clang::CodeGen::CodeGenModule::EmitCXXGlobalVarDeclInitFunc(clang::VarDecl const*, llvm::GlobalVariable*, bool) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CGDeclCXX.cpp:441:3
#16 0x0000000000b6596f clang::CodeGen::CodeGenModule::EmitGlobalVarDefinition(clang::VarDecl const*, bool) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenModule.cpp:3650:5
#17 0x0000000000b5c66b clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenModule.cpp:0:12
#18 0x0000000000b67dfb getKind /SourceCache/llvm-trunk-8.0/tools/clang/include/clang/AST/DeclBase.h:421:51
#19 0x0000000000b67dfb classof /SourceCache/llvm-trunk-8.0/tools/clang/include/clang/AST/DeclCXX.h:3875:0
#20 0x0000000000b67dfb doit /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:59:0
#21 0x0000000000b67dfb doit /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:107:0
#22 0x0000000000b67dfb doit /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:133:0
#23 0x0000000000b67dfb doit /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:123:0
#24 0x0000000000b67dfb isa<clang::DecompositionDecl, clang::Decl *> /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:143:0
#25 0x0000000000b67dfb dyn_cast<clang::DecompositionDecl, clang::Decl> /SourceCache/llvm-trunk-8.0/include/llvm/Support/Casting.h:334:0
#26 0x0000000000b67dfb clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenModule.cpp:4783:0
#27 0x0000000000b6bf7b getPointer /SourceCache/llvm-trunk-8.0/include/llvm/ADT/PointerIntPair.h:56:58
#28 0x0000000000b6bf7b getNextDeclInContext /SourceCache/llvm-trunk-8.0/tools/clang/include/clang/AST/DeclBase.h:424:0
#29 0x0000000000b6bf7b operator++ /SourceCache/llvm-trunk-8.0/tools/clang/include/clang/AST/DeclBase.h:1973:0
#30 0x0000000000b6bf7b clang::CodeGen::CodeGenModule::EmitDeclContext(clang::DeclContext const*) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenModule.cpp:4744:0
#31 0x000000000115a470 _ZN12_GLOBAL__N_117CodeGeneratorImpl18HandleTopLevelDeclEN5clang12DeclGroupRefE$7cf5ba2c3e38da85712ae9dd9c2a6e32 /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/ModuleBuilder.cpp:159:73
#32 0x000000000115833d clang::BackendConsumer::HandleTopLevelDecl(clang::DeclGroupRef) /SourceCache/llvm-trunk-8.0/tools/clang/lib/CodeGen/CodeGenAction.cpp:171:11
#33 0x00000000011642f4 clang::ParseAST(clang::Sema&, bool, bool) /SourceCache/llvm-trunk-8.0/tools/clang/lib/Parse/ParseAST.cpp:161:11
#34 0x00000000010c9d83 shouldBuildGlobalModuleIndex /SourceCache/llvm-trunk-8.0/tools/clang/lib/Frontend/CompilerInstance.cpp:81:11
#35 0x00000000010c9d83 clang::FrontendAction::Execute() /SourceCache/llvm-trunk-8.0/tools/clang/lib/Frontend/FrontendAction.cpp:920:0
#36 0x0000000001028d21 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /SourceCache/llvm-trunk-8.0/tools/clang/lib/Frontend/CompilerInstance.cpp:968:11
#37 0x00000000011536fa clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /SourceCache/llvm-trunk-8.0/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:266:25
#38 0x0000000000b0e77a cc1_main(llvm::ArrayRef<char const*>, char const*, void*) /SourceCache/llvm-trunk-8.0/tools/clang/tools/driver/cc1_main.cpp:218:13
#39 0x0000000000b0c5c0 ExecuteCC1Tool /SourceCache/llvm-trunk-8.0/tools/clang/tools/driver/driver.cpp:310:12
#40 0x0000000000b0c5c0 main /SourceCache/llvm-trunk-8.0/tools/clang/tools/driver/driver.cpp:382:0
#41 0x00007f2e66060830 __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/../csu/libc-start.c:325:0
#42 0x0000000000b09029 _start (/usr/local/sdk/llvm-8.0.4141/bin/clang-8+0xb09029)
clang-8: error: unable to execute command: Aborted (core dumped)
clang-8: error: clang frontend command failed due to signal (use -v to see invocation)
Kristina's toolchain (8.0.4141+Asserts+Modular+ThinLTO+Tests+Debug) clang version 8.0.4141 (https://llvm.googlesource.com/clang acc4a4c5ae74512d3a5bed39b207aa536f2debbe) (https://llvm.googlesource.com/llvm 10ce7e317240f9a45f85317712dd0f8dbbc98fab) (based on LLVM 8.0.4141svn)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/local/sdk/nightly/bin

My test would be adding this to no_destroy.cpp except unfortunately that does not manage to trigger it and thus passes, so I'm unable to include a suitable test:

namespace my_ns {
	class with_nontrivial {
	public:
		with_nontrivial() { SomeValue[2] = 3; }
		~with_nontrivial() { SomeValue[2] = 4; }
	protected:
		int SomeValue[10];
	};
	
	class my_class : public with_nontrivial {
	public:
		my_class(const char* cstr) {}
	};
}

namespace {
	constexpr char ce_var[] = "string literal";
	__attribute((no_destroy)) static my_ns::my_class s_var(ce_var);
}

Diff Detail

Event Timeline

kristina created this revision.Nov 9 2018, 12:14 PM
kristina updated this revision to Diff 173411.Nov 9 2018, 12:17 PM

Revised (style/ordering).

erik.pilkington added a subscriber: erik.pilkington.

Have you tried running creduce on the preprocessed source? We should really have a real reproducer for this, otherwise its really hard to tell what the underlying problem is.

Have you tried running creduce on the preprocessed source? We should really have a real reproducer for this, otherwise its really hard to tell what the underlying problem is.

Unable to build it with current Clang/LLVM headers, attribute is a very recent addition as well as general changes. I just get a flurry of errors like this:

/src/clwn/creduce/clang_delta/InstantiateTemplateTypeParamToInt.cpp:187:19: error: no member named 'getLocStart' in 'clang::TemplateTypeParmTypeLoc'
  void *Ptr = Loc.getLocStart().getPtrEncoding();

Have you tried running creduce on the preprocessed source? We should really have a real reproducer for this, otherwise its really hard to tell what the underlying problem is.

I'm going to echo this request -- without a test case, it's hard to know whether this actually fixes anything or just moves the issue elsewhere.

JonasToth resigned from this revision.Nov 9 2018, 1:46 PM

Have you tried running creduce on the preprocessed source? We should really have a real reproducer for this, otherwise its really hard to tell what the underlying problem is.

I'm going to echo this request -- without a test case, it's hard to know whether this actually fixes anything or just moves the issue elsewhere.

Running the creduce on it now, it's slowly chugging along, will see what happens, only 15112502 bytes to go, I'll get back to this in a few days and see what/if anything it comes up with. I do hope I launched it right.

kristina added a comment.EditedNov 9 2018, 11:09 PM

This seems to be a peculiar side effect of weird combinations of previous uses of attributes no_destroy, used, and section, as well as complex macros used to create metaclass-like systems (through the use of said attributes). It seems that there is a serious bug somewhere, the repeated use of metaclass-like macros seem to break Clang's lexer/parser and leave it in a state where it starts behaving erratically, with this being a major symptom of it.

The trace state at the time of the crash usually resembles this:

1.      /SourceCache/ips-1.0.0/os/credentials.cpp:20:41: current parser token ';'
2.      LLVM IR generation of declaration '<using-directive>'
3.      /SourceCache/ips-1.0.0/os/credentials.cpp:20:1 <Spelling=/SourceCache/ips-1.0.0/os/credentials.cpp:20:17>: Generating code for declaration '(anonymous namespace)::s_log'

When using static instead of an anonymous namespace the last token parsed is static instead of ;. I also seem to have found a bug in implementation of no_destroy, fixing which would be considered an actual "fix" instead of a "workaround". I gave several other approaches of replicating this state, closest one being:

class global_ns_class {
public:
	global_ns_class(int some_value) { SomeValue[2] = some_value; }
	~global_ns_class() { SomeValue[2] = 1; }
protected:
	volatile int SomeValue[10];
};

#define MAKE_GVAR(_sfx, _init_val) \
global_ns_class g_var##_sfx(_init_val); \
__attribute((no_destroy, used, section("namedsect"))) \
global_ns_class* g_var##_sfx##_ref = &g_var##_sfx;

MAKE_GVAR(ONE,   1);
MAKE_GVAR(TWO,   2);
MAKE_GVAR(THREE, 3);

namespace my_ns {
	class with_nontrivial {
	public:
		with_nontrivial() { SomeValue[2] = 3; }
		~with_nontrivial() { SomeValue[2] = 4; }
	protected:
		int SomeValue[10];
	};
	
	class my_class : public with_nontrivial {
	public:
		my_class(const char* cstr) {}
	};
}

#define MAKE_VAR(_x,_y) \
	namespace { constexpr char _$__##_x[] = _y; \
	__attribute((no_destroy)) my_ns::my_class _x (_$__##_x); }
	
MAKE_VAR(s_var, "test str lit");

Which seems to force Clang to take an incorrect code path but not trip the assert, unlike in the project where the assert is reliably tripped likely due to a manifestation of another, far more bening bug which is far beyond the scope of my understanding of Clang internals (it seems to be rooted in the preprocessor<->sema interactions). The 400k line preprocessed file is able to trigger it with 100% reliability, the preconditions for triggering the bug require the code to be semantically valid, which makes it hard to fuzz.

There seem to be too many variables to take into account to reliably reproduce it with the assert firing off, some manifestations just generate subtly-incorrect code (by emitting __cxa_atexit even with attribute present). The libc atexit fallback in handling of that attribute also seems like it could lead to issue but at this point this is just speculation.

I'm honestly not sure what to make of it at this point. Considering it's an immensely useful attribute for eliminating destructors of objects with "infinite" lifetime (I first hit this bug when I added it to all top level loggers which are created through a macro similar to one above), given that the x86_64 test suite does pass with it, I still think some kind of workaround is worth it (with reference to this issue)? I'd be happy to revert it if/once a concrete cause is found but so far I seem to be hitting dead ends.

I added a logger for when this specific issue is triggered to my downstream fork and it seems in most cases it causes "bening" behavior of attribute being ineffective. The assert is tripped when the attribute is partially ineffective, leading to Clang trying to emit code to register a destructor that doesn't exist with cxxabi.

Considering this is something I will likely have to dig through by hand, can anyone help me brainstorm how to approach this besides fuzzing it which I don't think will work due to the precondition of the code being semantically valid (not being diagnosed with an error)?

I suspect I could also try to trace attributes and attempt to reproduce parts that use attributes alone, in hopes of getting Clang in that state, not sure if that will be effective but it seems like a decent bet.

I've managed to isolate enough to make for a testcase. Is that too broad or is it okay to attach as is?

The following triggers the assert:

namespace util {
	template <typename T>
	class test_vector
	{
	public:
		test_vector(unsigned c)
			: Used(0), TotalSize(sizeof(T) * c)
		{
			// Intentional
			Storage = (T*)(new T[TotalSize]);
		}
		~test_vector() {
			delete[] Storage;
		}
		char* data() {
			return Storage;
		}
		unsigned size() {
			return TotalSize;
		}
		void push_back(T one_thing) {
			Storage[Used++] = one_thing;
		}
	private:
		unsigned Used;
		unsigned TotalSize;
		T*       Storage;
	};
}

volatile int do_not_optimize_me = 0;

namespace os {
	using char_vec_t = util::test_vector<char>;
	
	class logger_base
	{
	public:
		constexpr logger_base() = default;
	protected:
		char_vec_t* NamePtr = nullptr;
		char_vec_t  Name    = char_vec_t(10);
	};
	
	class logger : public logger_base
	{
	public:
		void stuff(const char* fmt, int something) {
			do_not_optimize_me = something + fmt[0];
		}
		logger()
		{
			Name = char_vec_t(8);
			Name.push_back('a');
		}
		
		logger(const char* name)
		{
			Name = util::test_vector<char>(__builtin_strlen(name));
			Name.push_back(name[0]);
			Name.push_back(name[1]);
		}
	};
}

#define TOPLEVEL_LOGGER(_var_name, _category_const)               \
	namespace { constexpr char $__##_var_name[] = _category_const; \
	__attribute((no_destroy)) os::logger _var_name ($__##_var_name) ; }

TOPLEVEL_LOGGER(s_log, "something");

class some_class {
public:
	some_class(int some_value) {
		do_not_optimize_me = some_value;
		s_log.stuff("wawawa", 5 + do_not_optimize_me);
	}
	~some_class() = default;
};

static some_class s_ctor(1);
kristina updated this revision to Diff 173503.EditedNov 10 2018, 3:14 AM
kristina set the repository for this revision to rC Clang.

Revised, I think I worked out what was happening, there's an explanation in the comment in the test. This is a relatively new attribute so the coverage for it did not test this specific corner case, I've managed to manually narrow it down and write a reliable regression test. The core issue boils down to having a non-trivially destructible member in a superclass that lacks a destructor and a subclass that lacks a destructor, in which case, a different path was taken to balance out the static storage duration class' initializer call and the __cxa_atexit registration. This adds an explicit check for the attribute and avoids balancing out the constructor as intended by the attribute.

The new test successfully replicates the assertion failure and should fail for the above cases in assertion builds. Without assertions the generated code produces undefined behavior if the above conditions are met. There is a separate test for this attribute that provides the coverage for its functionality, this is a much more narrower test for a very specific scenario in which it was possible to cause an improperly balanced constructor call followed by a emission of a call to a destructor that would never be emitted due to the attribute, thus tripping the assert. Now no attempt to call the destructor is made if the declaration is marked as no_destroy.

Test without patch:

=> ninja check-clang-semacxx

FAIL: Clang :: SemaCXX/attr-no-destroy-d54344.cpp (164 of 818)
******************** TEST 'Clang :: SemaCXX/attr-no-destroy-d54344.cpp' FAILED ********************

(--- stack trace and the expected assertion failure ---)

/q/org.llvm.caches/llvm-8.0/4141/tools/clang/test/SemaCXX/Output/attr-no-destroy-d54344.cpp.script: line 1: 31356 Aborted (core dumped)

--

********************
Testing Time: 5.03s
********************
Failing Tests (1):
    Clang :: SemaCXX/attr-no-destroy-d54344.cpp

  Expected Passes    : 816
  Expected Failures  : 1
  Unexpected Failures: 1
FAILED: tools/clang/test/CMakeFiles/check-clang-semacxx

ninja: build stopped: subcommand failed.

Patch applied:

=> ninja check-clang-semacxx

Testing Time: 6.28s
  Expected Passes    : 817
  Expected Failures  : 1
lebedev.ri added inline comments.
test/SemaCXX/attr-no-destroy-d54344.cpp
93 ↗(On Diff #173503)

Please add new line.

kristina updated this revision to Diff 173507.Nov 10 2018, 3:29 AM

Revised, added a newline to the testcase.

kristina marked an inline comment as done.Nov 10 2018, 3:29 AM

I have a few nits, but I think this looks about right.

I reduced this testcase with creduce, can you use the minimized version? Also, I think it makes more sense to put the test into test/CodeGenCXX and verify the -emit-llvm-only output is correct with FileCheck. You can check out pretty much any test in that directory for an example of how to do this if you don't know already.

class a {
public:
  ~a();
};
class logger_base {
  a d;
};
class e : logger_base {};
__attribute((no_destroy)) e g;
lib/CodeGen/CGDeclCXX.cpp
72

You should do this with D.isNoDestroy(CGF.getContext()), I suspect that this will still crash with -fno-c++-static-destructors without an attribute, and that function checks for the existence of that flag. Can you add a test for that too?

kristina updated this revision to Diff 173530.Nov 10 2018, 2:24 PM
kristina set the repository for this revision to rC Clang.

@erik.pilkington Fantastic catch, I totally forgot about the global flag. Revised, moved to CodeGenCXX, added a test to verify ctor/dtor is balanced under normal circumstances and two tests to ensure ctor is not balanced with either flag or attribute.

=> ninja check-clang-codegencxx

Testing Time: 7.68s
  Expected Passes    : 927
  Expected Failures  : 2
  Unsupported Tests  : 5

The attribute was actually missing tests for IR generation so thank you for suggesting moving them there, I think this covers all cases for this bug (balanced regression test without attr or flag, unbalanced test because of attr, unbalanced test because of global flag).

kristina marked an inline comment as done.Nov 10 2018, 2:26 PM
aaron.ballman accepted this revision.Nov 10 2018, 2:52 PM

LGTM aside from some small commenting nits. However, I leave it to @erik.pilkington to make the final sign-off.

lib/CodeGen/CGDeclCXX.cpp
68

non-existant -> nonexistent

72

Drop the (D54344) from the comment.

This revision is now accepted and ready to land.Nov 10 2018, 2:52 PM
kristina updated this revision to Diff 173534.EditedNov 10 2018, 3:15 PM

Revised to address nitpicks. And yeah, I'll wait for Erik to sign off on it before I commit.

kristina marked 2 inline comments as done.Nov 10 2018, 3:18 PM
erik.pilkington accepted this revision.Nov 11 2018, 4:39 PM

LGTM too, with one small fix in test. Thanks for working on this!

test/CodeGenCXX/attr-no-destroy-d54344.cpp
4 ↗(On Diff #173534)

Why do you require asserts here? If this test still passes regardless of whether clang was compiled with assertions enabled, we should always run it.

LGTM too, with one small fix in test. Thanks for working on this!

Well, I figured since the assertion being tripped was (before investigation) the only reliable way to notice the bug it makes sense for it to stay in, main concern being that should anything regress and assertions are off, the code generation is essentially undefined. Though a good argument for taking it out would be the fact that currently it's the only test that verifies IR generated with the attribute (last time I checked), but I would also imagine most people running tests would have assertion enabled (or debug) builds.

Though I'm happy to revise the test to remove the assertion requirement, deferring to your judgement with regards to above.

Aside from that, are you happy for me to land this after the revision?

kristina updated this revision to Diff 173602.Nov 11 2018, 5:02 PM

Revised to address comments.

Though on the other hand it makes no sense to skip it with asserts off either, that just clicked in my head, sorry.

LGTM too, with one small fix in test. Thanks for working on this!

Well, I figured since the assertion being tripped was (before investigation) the only reliable way to notice the bug it makes sense for it to stay in, main concern being that should anything regress and assertions are off, the code generation is essentially undefined. Though a good argument for taking it out would be the fact that currently it's the only test that verifies IR generated with the attribute (last time I checked), but I would also imagine most people running tests would have assertion enabled (or debug) builds.

clang/test/CodeGenCXX/{always,no}_destroy.cpp also test the generated IR. I would prefer always running it, its an interesting code path that might break in other ways, so even if we can't test as thoroughly without assertions, its still worth it to run.

Though I'm happy to revise the test to remove the assertion requirement, deferring to your judgement with regards to above.

Aside from that, are you happy for me to land this after the revision?

Yep, land away! Thanks again for fixing this.

This revision was automatically updated to reflect the committed changes.