This is an archive of the discontinued LLVM Phabricator instance.

Runtime stack overflow prevention for Twines
Needs ReviewPublic

Authored by marcodiiga on May 19 2014, 12:58 AM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

As the documentation stands, Twines are inherently dangerous and they should only be used as temporary arguments to a function. No temporaries should be stored as in the following

void foo(const Twine &T);
...
StringRef X = ...
unsigned i = ...
const Twine &Tmp = X + "." + Twine(i);
foo(Tmp);

anyway there's a different case much more subtle that doesn't seem to create problems at first and (tested on a Win8 machine - VS2012) sometimes only shows up in different project configurations (e.g. Release but not in Debug):

const char *ptr1 = "_text1";
const char *ptr2 = "_text2";
const char *ptr3 = "_text3";
llvm::Twine data = llvm::Twine("data") + ptr1 + ptr2 + ptr3;
function_call(data); // function_call(Twine& tw);

This is a totally different case where data consistency is put at risk and only at runtime one might encounter (or not) a quite substantial variety of errors, mostly stack overflows.

The reason is explained by the following C++ test-case which causes a runtime stack-overflow on a Win8 VS2012 system

#include <iostream>
using namespace std;

class Twine;

enum Kind {twineType, stringType, emptyKind};

union Child {
  const char *cStringPtr;
  const Twine* twine;
};

// Simplified model of the Twine class - relevant excerpts
class Twine {
public:

  Child LHS;
  Kind LHSKind;
  Child RHS;
  Kind RHSKind;

  Twine(const char* pt) {
    LHSKind = Kind::stringType;
    LHS.cStringPtr = pt;
    RHSKind = Kind::emptyKind;
  }

  explicit Twine(Child _LHS, Kind _LHSKind,
    Child _RHS, Kind _RHSKind)
    : LHS(_LHS), RHS(_RHS), LHSKind(_LHSKind), RHSKind(_RHSKind) {
  }

  Twine Twine::concat(const Twine &Suffix) const {
    Child NewLHS, NewRHS;
    NewLHS.twine = this;
    NewRHS.twine = &Suffix;
    Kind NewLHSKind = Kind::twineType, NewRHSKind = Kind::twineType;

    if(RHSKind == Kind::emptyKind) { // If LHS is unused, just copy the POD stuff and forget about the object (save indirection)
      NewLHS = LHS;
      NewLHSKind = LHSKind;
    }
    if(Suffix.RHSKind == Kind::emptyKind) { // If RHS is unused, just copy the POD stuff and forget about the object (save indirection)
      NewRHS = Suffix.LHS;
      NewRHSKind = Suffix.LHSKind;
    }
    return Twine(NewLHS, NewLHSKind, NewRHS, NewRHSKind);
  }

};

Twine operator+(const Twine &LHS, const Twine &RHS) {
  return LHS.concat(RHS);
}

int main(int argc, char* argv[])
{

  Twine dt("A");
  dt = dt + "B"; // First time this works since the temporary Child data is entirely copied (POD = deep copy)

  dt = dt + "C"; // Second time this doesn't work! dt's Child LHS (which contained the pointer to the data) is 
                 // substituted with a pointer to itself. No more data.

  // dt.str(); // Infinite recursion when exploring LHS and stack overflow

  return 0;
}

Whether or not this is considered a bug, it can introduce subtle memory problems in an application and aside from the "Twine is inherently dangerous" line, there's no mention of it. The proposed fix attempts to avoid such a (likely) unintended use.

As for the "Rule of Three", I believe the issue only stands for already-initialized reuse of objects, thus: only for assignment operators.

Diff Detail

Event Timeline

marcodiiga updated this revision to Diff 9522.May 19 2014, 12:58 AM
marcodiiga retitled this revision from to Runtime stack overflow prevention for Twines.
marcodiiga updated this object.
marcodiiga edited the test plan for this revision. (Show Details)
marcodiiga added a reviewer: deleted.
marcodiiga added a subscriber: Unknown Object (MLST).

The test case you explained is really good. Is it possible to include the test case with this patch?

dblaikie added a subscriber: dblaikie.

(removing the "llvm" group from the review to reduce the number of 'to' people (that way every reply won't require moderation for having too many people on the 'to' line))