This is an archive of the discontinued LLVM Phabricator instance.

Fix incorrect Twine usage in CFGPrinter
ClosedPublic

Authored by mcopik on Oct 5 2018, 6:00 AM.

Details

Summary

CFGPrinter (-view-cfg, -dot-cfg) invokes an undefined behaviour (dangling pointer to rvalue) on IR files with branch weights. This patch fixes the problem caused by Twine initialization and string conversion split into two statements.

This change fixes the bug 37019. A similar patch to this problem was provided in the llvmlite project

Diff Detail

Event Timeline

mcopik created this revision.Oct 5 2018, 6:00 AM
kristina requested changes to this revision.EditedOct 6 2018, 1:08 AM
kristina added a reviewer: kristina.
kristina added a subscriber: kristina.

I don't see any issue, both call Twine::str() which collapses the Twine into an std::string at which point it's returning a std::string by copy or eliding it (not sure which), either of which are well defined. The Twine itself can accept an integer argument and can live on the stack as long as the scope its within is alive.

From a cosmetic viewpoint, I think your change does make it look better though, FWIW.

This revision now requires changes to proceed.Oct 6 2018, 1:08 AM

Thank you @kristina for the review. I must admit I do not fully understand in which way this is broken, but the attached test case clearly shows there is some memory issue. My (limited) understanding is that Twine(Weight->getZExtValue()) will be destroyed at the end of line 175 which means that already in line 176 we may potentially reference invalid memory.
http://llvm.org/docs/ProgrammersManual.html#dss-twine has a similar example and explicitly warns of such uses:

"Because Twine is constructed with temporary objects on the stack, and because these instances are destroyed at the end of the current statement, it is an inherently dangerous API. For example, this simple variant contains undefined behavior and will probably crash:

void foo(const Twine &T);

StringRef X = ...
unsigned i = ...
const Twine &Tmp = X + "." + Twine(i);
foo(Tmp);

because the temporaries are destroyed before the call. That said, Twine’s are much more efficient than intermediate std::string temporaries, and they work really well with StringRef. Just be aware of their limitations."

This change also fixes https://bugs.llvm.org/PR37019.

Marcin, twines are very hard to understand. It might make sense to add both a link the the programmers manual reference I just cited as well as a link to the bug report that is addressed with this patch. We should also acknowledge that a similar patch has already been written in the context of the LLVM lite project.

@kristina, what exactly do you suggest should be changed?

This talks about UB, but i fail to find any ASAN/UBSan output.
Is there ASAN/UBSan output for this? Can you post it?

Nontheless, if the code is identical, then the produced assembly should be mostly the same, too, right?
https://godbolt.org/z/vyEKNo <- i wouldn't say the produced string is the same.
E.g., where does the .asciz "label=\"W:" go?

include/llvm/Analysis/CFGPrinter.h
176

Reading the http://llvm.org/docs/ProgrammersManual.html#dss-twine, i *do* think the fix is correct.
The old code would call str() on a Twine stored on a stack,
which is clearly stated as broken in the manual.

JonasToth added inline comments.
include/llvm/Analysis/CFGPrinter.h
176

The pattern for Twine would be Twine("label=\"W:") + Weight->getZExtValue() + "\"").str(), why use Twine for the second value instead of the first one?
The fix itself should still be correct.

lebedev.ri added inline comments.Oct 7 2018, 4:09 AM
include/llvm/Analysis/CFGPrinter.h
176

That doesn't compile https://godbolt.org/z/7UhuUJ

mcopik added a comment.Oct 7 2018, 4:20 AM

@kristina thank you for the comment. I think that @grosser has already explain the problem in details, including the reference to docs which explicitly warns before storing Twines constructed with temporaries. Furthermore, Twine documentation in Twine.h states: " A Twine is not intended for use directly and should not be stored, its implementation relies on the ability to store pointers to temporary stack objects which may be deallocated at the end of a statement.". I verified this statement by inspecting Twine implementation and I found out that Twine does not capture nor copy the provided objects but always takes a pointer .

As a result, the statement in line 175 creates a Twine object which internally holds pointers to a Twine instance for weight value which was a temporary object on the stack that might be destroyed after execution moves to statement in line 176.

This bus was also discovered by someone else before: https://bugs.llvm.org/show_bug.cgi?id=37019 and it was fixed in a similar fashion in the llvmlite project https://github.com/numba/llvmlite/pull/343

JonasToth added inline comments.Oct 7 2018, 4:22 AM
include/llvm/Analysis/CFGPrinter.h
176

Fair point :D

mcopik edited the summary of this revision. (Show Details)Oct 7 2018, 4:22 AM
grosser accepted this revision.EditedOct 7 2018, 4:27 AM

Thanks Marcin for improving the commit message. We all seem to agree that this is the right correction. Hence, I feel this is ready to upstream from my side. As there have been quite some opinions, it would be great to hear if there are any open concerns that should be adressed or if others also feel this LGT(them). We especially still need feedback from @kristina!

As a minor remark. I would drop the comment about the twine stuff. We cannot have these comments spread all over the code base. Having this documented in the commit message seems sufficient from my perspective.

kristina accepted this revision.EditedOct 7 2018, 11:03 AM

My bad, I must have misunderstood the inner workings of Twine. I was under the impression that it would end up holding a reference in the LHS object since all of them were just constructed and Twine objects chain together. I know they're unsafe to store anywhere aside from the stack but I didn't realize this behavior was UB. My apologies, majority of my experience with Twines was using them as an input parameter instead of a StringRef for example.

Surprised this didn't get picked off by any sanitizer earlier though I'm guessing due to the location of it the compiler likely optimized it to the version in the patch, so it went unnoticed for a very long time.

Anyway, again, apologies for misunderstanding the semantics of it. LGTM!

This revision is now accepted and ready to land.Oct 7 2018, 11:03 AM

Thanks Marcin for improving the commit message. We all seem to agree that this is the right correction. Hence, I feel this is ready to upstream from my side. As there have been quite some opinions, it would be great to hear if there are any open concerns that should be adressed or if others also feel this LGT(them). We especially still need feedback from @kristina!

As a minor remark. I would drop the comment about the twine stuff. We cannot have these comments spread all over the code base. Having this documented in the commit message seems sufficient from my perspective.

Also agree on that point, the fix is good but it's probably best to drop the comment itself, if anything, documentation should likely be updated to clearly label such behavior as undefined as a followup.

Marcin, let us know if you agree with the comment fix. I can then commit the patch, when it's ready.

mcopik updated this revision to Diff 168630.Oct 8 2018, 3:21 AM

@grosser @kristina The superfluous comment is removed.

include/llvm/Analysis/CFGPrinter.h
176

@JonasToth I didn't want to change the original code and only remove the dangling pointer to a stack temporary object. I see that @lebedev.ri already answered your question; I think the problem is that addition operators for Twine are defined only for Twine and StringRef objects. I think there's a conversion to StringRef via ctor for c-string but not for uint64_t.

kristina accepted this revision.Oct 8 2018, 4:08 AM

LGTM and good catch, I wonder if there are any other similar improper usages of Twine going unnoticed. (Also, if you don't have commit access, please mention it so someone can land the patch on your behalf).

JonasToth added inline comments.Oct 8 2018, 5:05 AM
include/llvm/Analysis/CFGPrinter.h
176

Yeah. The constructor for Twine is overloaded for all common types, I kinda expected that for the concatentation functionalities, too. Everything is fine as is ;)

mcopik added a comment.Oct 8 2018, 8:08 AM

@kristina @grosser Could you ask you to commit the patch? I don't have the access.

@kristina @grosser Could you ask you to commit the patch? I don't have the access.

Will do so in a moment, my main buildbot just finishing running something before I can quickly build this with the patch.

This revision was automatically updated to reflect the committed changes.