Add guards to the asm writer to prevent crashing
when dumping an instruction that has no basic
block.
Details
Diff Detail
Event Timeline
lib/IR/AsmWriter.cpp | ||
---|---|---|
3157 | Why do you need TheModule here? |
This test case is quite large, can it be made smaller?
Also there are no changes to InstCombine, why are you running the pass?
David: I have the same question. It turns out that to reproduce you need to print an instruction that has a metadata attached, but does not have a parent.
Here is the offending code in InstCombine (with comment from me on what happens):
DEBUG(raw_string_ostream SS(OrigI); I->print(SS); OrigI = SS.str();); DEBUG(dbgs() << "IC: Visiting: " << OrigI << '\n'); if (Instruction *Result = visit(*I)) { // <-- Result does not have a parent, neither a dbgloc at this point ++NumCombined; // Should we replace the old instruction with a new one? if (Result != I) { DEBUG(dbgs() << "IC: Old = " << *I << '\n' << " New = " << *Result << '\n'); // <-- This print succeed because there is no debug location if (I->getDebugLoc()) Result->setDebugLoc(I->getDebugLoc()); // <-- here we set the debug location, still no parent // Everything uses the new instruction now. I->replaceAllUsesWith(Result); // <- we are in a state where we have user but still no parent, is it valid? // Move the name to the new instruction first. Result->takeName(I); // Push the new instruction and any users onto the worklist. Worklist.Add(Result); // <- This is where we will try to print again, but this time with a debug location and no parent (->crash) Worklist.AddUsersToWorkList(*Result); // Insert the new instruction into the basic block... BasicBlock *InstParent = I->getParent(); BasicBlock::iterator InsertPos = I->getIterator();
This test case is quite large, can it be made smaller?
Most of the size is from the "-g" metadata, however the crash does not happen without metadata (in particular, compiling without "-g" makes the problem go away). I could try to edit the metadata by hand -- are there any guidelines or suggestions you could offer with that regard?
Also there are no changes to InstCombine, why are you running the pass?
The test is intended to make sure that you can run instcombine with "-debug" and not crash.
@joker.eph: I agree with your assessment.
lib/IR/AsmWriter.cpp | ||
---|---|---|
3157 | TheModule is a null pointer in the case where the instruction has no parent BB. |
You can use bugpoint to reduce the size of the code (I expect one basic block and 3 instructions to be enough). For the metadata, you'll need only the one attached to the faulty instruction.
However, as Duncan said, I don't think this is the test we want and a simple C++ unitest seems less fragile to assert that "we can print a value that has no parent but has a metadata ".
lib/IR/AsmWriter.cpp | ||
---|---|---|
3157 | This does not answer my question. Why do you need this check *here*? A few lines above, TheModule was dereferenced, it makes sense. Here it does not seem to be the case. |
However, as Duncan said, I don't think this is the test we want and a simple C++ unitest seems less fragile to assert that "we can print a value that has no parent but has a metadata ".
You and Duncan are right -- a C++ unit test would be smaller/better. I'll see about writing one.
Thanks for the comments.
lib/IR/AsmWriter.cpp | ||
---|---|---|
3157 | You are right -- not sure what my thinking was there. I'll take it out in a subsequent patch. Thanks. |
Updated patch with C++ unit test instead of *.ll test.
Question: I looked around for a style guide or "how to" for writing new C++ unit tests but could not find one. Did I miss it, or maybe such a guide doesn't exist?
I had to choose between A) constructing the IR directly with IRBuilder or equivalent and B) writing assembly, invoking parseAssemblyString, then grabbing pointers to instructions. I chose option "A", hope that is ok.
unittests/IR/AsmWriterTest.cpp | ||
---|---|---|
49 | It seems a bit overkill to me to have to create a function and multiple basic blocks, I'd expect this to be close to minimal: // PR24852: Ensure that an instruction can be printed even when it // has no parent. LLVMContext Ctx; auto Ty = Type::getInt32Ty(Ctx) auto Undef = UndefValue::get(Ty); std::unique_ptr<BinaryOperator> Add(BinaryOperator::CreateAdd(Undef, Undef)); Add.setDebugLoc(DebugLoc()); std::string S; raw_string_ostream OS(S); Add->print(OS) ASSERT_EQ(OS.str(), "<insert correct string here>"); } |
unittests/IR/AsmWriterTest.cpp | ||
---|---|---|
49 | It's possible that the empty DebugLoc won't be enough and a metadata is needed I don't know. Your original example can be fine, but add an ASSERT_EQ on the expected outputs. |
Hmm, I think I may have jumped the gun-- the reduced test case doesn't actually trigger the crash. I'll go back to my original code...
Revert to previous test code (br with weights).
Add EXPECT_EQ tests for results of print() calls.
This is what I was fearing, it needs a real metadata attached.
Replacing the setDebugLoc with
Add->setMetadata( "", MDNode::get(Ctx, {ConstantAsMetadata::get(ConstantInt::get(Ty, 1))}));
seems to do the trick though.
—
Mehdi
Thanks for the helpful review suggestions.
I am a newcomer to LLVM and do not have commit permissions. Would someone be willing to help me out by submitting this change for me?
Would the following be a better way to address this:
diff --git a/lib/IR/AsmWriter.cpp b/lib/IR/AsmWriter.cpp index 185db47..949b710 100644 --- a/lib/IR/AsmWriter.cpp +++ b/lib/IR/AsmWriter.cpp @@ -2052,7 +2052,8 @@ private: /// \brief Print out metadata attachments. void printMetadataAttachments( const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs, - StringRef Separator); + StringRef Separator, + LLVMContext &Context); // printInfoComment - Print a little comment after the instruction indicating // which slot it occupies. @@ -2627,7 +2628,7 @@ void AssemblyWriter::printFunction(const Function *F) { SmallVector<std::pair<unsigned, MDNode *>, 4> MDs; F->getAllMetadata(MDs); - printMetadataAttachments(MDs, " "); + printMetadataAttachments(MDs, " ", F->getContext()); if (F->isDeclaration()) { Out << '\n'; @@ -3111,7 +3112,7 @@ void AssemblyWriter::printInstruction(const Instruction &I) { // Print Metadata info. SmallVector<std::pair<unsigned, MDNode *>, 4> InstMD; I.getAllMetadata(InstMD); - printMetadataAttachments(InstMD, ", "); + printMetadataAttachments(InstMD, ", ", I.getType()->getContext()); // Print a nice comment. printInfoComment(I); @@ -3119,12 +3120,13 @@ void AssemblyWriter::printInstruction(const Instruction &I) { void AssemblyWriter::printMetadataAttachments( const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs, - StringRef Separator) { + StringRef Separator, + LLVMContext &Context) { if (MDs.empty()) return; if (MDNames.empty()) - TheModule->getMDKindNames(MDNames); + Context.getMDKindNames(MDNames); for (const auto &I : MDs) { unsigned Kind = I.first;
That way you'd still get the names in the printout.
Since the mailing list comments don't show up here, please note Mehdi Amini's suggestion to use MDs[0].second->getContext().
Since the mailing list comments don't show up here, please note Mehdi Amini's suggestion to use MDs[0].second->getContext().
Agreed.
Why do you need TheModule here?