This is an archive of the discontinued LLVM Phabricator instance.

Fix for Bug 24852 (crash with -debug -instcombine)
ClosedPublic

Authored by thanm on Dec 28 2015, 12:40 PM.

Details

Summary

Add guards to the asm writer to prevent crashing
when dumping an instruction that has no basic
block.

Diff Detail

Event Timeline

thanm updated this revision to Diff 43704.Dec 28 2015, 12:40 PM
thanm retitled this revision from to Fix for Bug 24852 (crash with -debug -instcombine).
thanm updated this object.
thanm added a subscriber: llvm-commits.
thanm added a subscriber: davidxl.
thanm added a reviewer: sanjoy.Jan 4 2016, 5:24 AM
thanm added a comment.Jan 6 2016, 1:18 PM

Ping. Let me know if there are any questions.

What instructions in the test trigger the crash?

mehdi_amini added inline comments.
lib/IR/AsmWriter.cpp
3157

Why do you need TheModule here?

This test case is quite large, can it be made smaller?

Also there are no changes to InstCombine, why are you running the pass?

The problem gets triggered by instcomine? Perhaps it exposes a problem there?

David: I have the same question. It turns out that to reproduce you need to print an instruction that has a metadata attached, but does not have a parent.

Here is the offending code in InstCombine (with comment from me on what happens):

DEBUG(raw_string_ostream SS(OrigI); I->print(SS); OrigI = SS.str(););
DEBUG(dbgs() << "IC: Visiting: " << OrigI << '\n');

if (Instruction *Result = visit(*I)) {   // <-- Result does not have a parent, neither a dbgloc at this point
  ++NumCombined;
  // Should we replace the old instruction with a new one?
  if (Result != I) {
    DEBUG(dbgs() << "IC: Old = " << *I << '\n'
                 << "    New = " << *Result << '\n');      // <-- This print succeed because there is no debug location

    if (I->getDebugLoc())
      Result->setDebugLoc(I->getDebugLoc());   // <-- here we set the debug location, still no parent
    // Everything uses the new instruction now.
    I->replaceAllUsesWith(Result);    // <- we are in a state where we have user but still no parent, is it valid?

    // Move the name to the new instruction first.
    Result->takeName(I);

    // Push the new instruction and any users onto the worklist.
    Worklist.Add(Result);               // <- This is where we will try to print again, but this time with a debug location and no parent (->crash)
    Worklist.AddUsersToWorkList(*Result);

    // Insert the new instruction into the basic block...
    BasicBlock *InstParent = I->getParent();
    BasicBlock::iterator InsertPos = I->getIterator();
thanm added a comment.Jan 6 2016, 2:44 PM

This test case is quite large, can it be made smaller?

Most of the size is from the "-g" metadata, however the crash does not happen without metadata (in particular, compiling without "-g" makes the problem go away). I could try to edit the metadata by hand -- are there any guidelines or suggestions you could offer with that regard?

Also there are no changes to InstCombine, why are you running the pass?

The test is intended to make sure that you can run instcombine with "-debug" and not crash.

@joker.eph: I agree with your assessment.

lib/IR/AsmWriter.cpp
3157

TheModule is a null pointer in the case where the instruction has no parent BB.

You can use bugpoint to reduce the size of the code (I expect one basic block and 3 instructions to be enough). For the metadata, you'll need only the one attached to the faulty instruction.

However, as Duncan said, I don't think this is the test we want and a simple C++ unitest seems less fragile to assert that "we can print a value that has no parent but has a metadata ".

lib/IR/AsmWriter.cpp
3157

This does not answer my question. Why do you need this check *here*? A few lines above, TheModule was dereferenced, it makes sense. Here it does not seem to be the case.

thanm added a comment.Jan 6 2016, 3:38 PM

However, as Duncan said, I don't think this is the test we want and a simple C++ unitest seems less fragile to assert that "we can print a value that has no parent but has a metadata ".

You and Duncan are right -- a C++ unit test would be smaller/better. I'll see about writing one.

Thanks for the comments.

lib/IR/AsmWriter.cpp
3157

You are right -- not sure what my thinking was there. I'll take it out in a subsequent patch. Thanks.

thanm updated this revision to Diff 44215.Jan 7 2016, 8:29 AM

Remove *.ll test; add C++ unit test.
Take out unnecessary guard in metadata printer.

thanm added a comment.Jan 7 2016, 8:36 AM

Updated patch with C++ unit test instead of *.ll test.

Question: I looked around for a style guide or "how to" for writing new C++ unit tests but could not find one. Did I miss it, or maybe such a guide doesn't exist?

I had to choose between A) constructing the IR directly with IRBuilder or equivalent and B) writing assembly, invoking parseAssemblyString, then grabbing pointers to instructions. I chose option "A", hope that is ok.

mehdi_amini added inline comments.Jan 7 2016, 9:00 AM
unittests/IR/AsmWriterTest.cpp
50

It seems a bit overkill to me to have to create a function and multiple basic blocks, I'd expect this to be close to minimal:

// PR24852: Ensure that an instruction can be printed even when it
// has no parent.
LLVMContext Ctx;
auto Ty = Type::getInt32Ty(Ctx)
auto Undef = UndefValue::get(Ty);
std::unique_ptr<BinaryOperator> Add(BinaryOperator::CreateAdd(Undef, Undef));
Add.setDebugLoc(DebugLoc());
std::string S;
raw_string_ostream OS(S);
Add->print(OS)
ASSERT_EQ(OS.str(), "<insert correct string here>");
}
mehdi_amini added inline comments.Jan 7 2016, 9:02 AM
unittests/IR/AsmWriterTest.cpp
50

It's possible that the empty DebugLoc won't be enough and a metadata is needed I don't know.

Your original example can be fine, but add an ASSERT_EQ on the expected outputs.

thanm updated this revision to Diff 44221.Jan 7 2016, 9:11 AM

Reduced unit test based on guidance from joker.eph.

thanm added a comment.Jan 7 2016, 9:16 AM

Hmm, I think I may have jumped the gun-- the reduced test case doesn't actually trigger the crash. I'll go back to my original code...

thanm updated this revision to Diff 44225.Jan 7 2016, 9:35 AM

Revert to previous test code (br with weights).
Add EXPECT_EQ tests for results of print() calls.

This is what I was fearing, it needs a real metadata attached.
Replacing the setDebugLoc with

Add->setMetadata(
    "", MDNode::get(Ctx, {ConstantAsMetadata::get(ConstantInt::get(Ty, 1))}));

seems to do the trick though.


Mehdi

thanm updated this revision to Diff 44233.Jan 7 2016, 11:01 AM

Incorporate another suggestion from joker.eph.

mehdi_amini accepted this revision.Jan 7 2016, 11:13 AM
mehdi_amini added a reviewer: mehdi_amini.

LGTM.

unittests/IR/AsmWriterTest.cpp
24

"has a metadata attached but no parent."

This revision is now accepted and ready to land.Jan 7 2016, 11:13 AM
thanm updated this revision to Diff 44236.Jan 7 2016, 11:19 AM
thanm edited edge metadata.

Update comment.

thanm added a comment.Jan 7 2016, 11:50 AM

Thanks for the helpful review suggestions.

I am a newcomer to LLVM and do not have commit permissions. Would someone be willing to help me out by submitting this change for me?

Would the following be a better way to address this:

diff --git a/lib/IR/AsmWriter.cpp b/lib/IR/AsmWriter.cpp
index 185db47..949b710 100644
--- a/lib/IR/AsmWriter.cpp
+++ b/lib/IR/AsmWriter.cpp
@@ -2052,7 +2052,8 @@ private:
   /// \brief Print out metadata attachments.
   void printMetadataAttachments(
       const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs,
-      StringRef Separator);
+      StringRef Separator,
+      LLVMContext &Context);
 
   // printInfoComment - Print a little comment after the instruction indicating
   // which slot it occupies.
@@ -2627,7 +2628,7 @@ void AssemblyWriter::printFunction(const Function *F) {
 
   SmallVector<std::pair<unsigned, MDNode *>, 4> MDs;
   F->getAllMetadata(MDs);
-  printMetadataAttachments(MDs, " ");
+  printMetadataAttachments(MDs, " ", F->getContext());
 
   if (F->isDeclaration()) {
     Out << '\n';
@@ -3111,7 +3112,7 @@ void AssemblyWriter::printInstruction(const Instruction &I) {
   // Print Metadata info.
   SmallVector<std::pair<unsigned, MDNode *>, 4> InstMD;
   I.getAllMetadata(InstMD);
-  printMetadataAttachments(InstMD, ", ");
+  printMetadataAttachments(InstMD, ", ", I.getType()->getContext());
 
   // Print a nice comment.
   printInfoComment(I);
@@ -3119,12 +3120,13 @@ void AssemblyWriter::printInstruction(const Instruction &I) {
 
 void AssemblyWriter::printMetadataAttachments(
     const SmallVectorImpl<std::pair<unsigned, MDNode *>> &MDs,
-    StringRef Separator) {
+    StringRef Separator,
+    LLVMContext &Context) {
   if (MDs.empty())
     return;
 
   if (MDNames.empty())
-    TheModule->getMDKindNames(MDNames);
+    Context.getMDKindNames(MDNames);
 
   for (const auto &I : MDs) {
     unsigned Kind = I.first;

That way you'd still get the names in the printout.

thanm added a comment.Jan 7 2016, 12:02 PM

Good suggestion-- let me pull that into the patch.

Since the mailing list comments don't show up here, please note Mehdi Amini's suggestion to use MDs[0].second->getContext().

thanm added a comment.Jan 7 2016, 12:06 PM

Since the mailing list comments don't show up here, please note Mehdi Amini's suggestion to use MDs[0].second->getContext().

Agreed.

thanm updated this revision to Diff 44247.Jan 7 2016, 12:16 PM

Incorporate joker.eph's suggestion to use MDs[0].second->getContext().

mehdi_amini closed this revision.Jan 7 2016, 12:18 PM

r257094. Thanks.