This is an archive of the discontinued LLVM Phabricator instance.

Add a c_str() member to StringLiteral
AbandonedPublic

Authored by zturner on Dec 13 2016, 10:30 AM.

Details

Summary

string literals are necessarily null-terminated, so it makes sense to provide a c_str() method to treat it as a null terminated string.

In practice this arises because you might imagine a global table that contains option names which you want to pass to something like getopt_long_only, but in other situations you might want to do some comparisons on the option name (for example, is --foo in the options table), where you want to make use of StringRef comparison operators. Of course, there are tons of other system calls or library calls which we don't have control over that take const char* and where we pull arguments for these functiosn from a global table of const char *, so this doesn't seem like just a one-off case.

Writing .str().c_str() for these cases is problematic since it means we cannot assume the storage will live longer than the current statement, plus it's just inefficient since we're unnecessarily incurring a copy.

Diff Detail

Event Timeline

zturner updated this revision to Diff 81254.Dec 13 2016, 10:30 AM
zturner retitled this revision from to Add a c_str() member to StringLiteral.
zturner updated this object.
zturner added a subscriber: llvm-commits.
zturner updated this revision to Diff 81319.Dec 13 2016, 3:50 PM

rsmith showed me some magic to make clang error on constructing a StringLiteral with a non-literal. Unfortunately it still doesn't prevent someone declaring one in a non constexpr context, i.e. you can still write: StringLiteral S = "foo"; instead of constexpr StringLiteral S = "foo";, I don't think anyone has come up with a way to prevent the former yet, but this is at least better than before.

ruiu added a subscriber: ruiu.Dec 13 2016, 4:01 PM

As to StringRefZ, I was about to propose StringRef to compute string length lazily, so that I could use it to represent strings in string table. I didn't because it would things hard to debug (because you get errors lazily.) But maybe it makes sense?

In D27721#621660, @ruiu wrote:

As to StringRefZ, I was about to propose StringRef to compute string length lazily, so that I could use it to represent strings in string table. I didn't because it would things hard to debug (because you get errors lazily.) But maybe it makes sense?

Maybe I'd need to see an actual proposal, but I'm having trouble imagining what you had in mind. Would you just call strlen on every call to StringRef::size()? Or store an extra bool in the class that says whether the length is known?

What would be the advantage and why does this laziness help in a string table?

ruiu added a comment.Dec 13 2016, 4:13 PM

We instantiate a StringRefZ for each string in a string table, but most of them are not going to be used because not all symbols are used for symbol resolution. So, most strlen() are going to be just waste of CPU time. However, still, StringRefZ is useful because it is a uniform representation of a string there. If StringRef computes string size lazily if it doesn't know it at compile time, we can just use StringRef there.

Probably should take this discussion to a thread specifically about StringRefZ, but fwiw I think that would complicate the logic of StringRef considerably while also introducing thread synchronization issues and potentially increasing the size of a StringRef. It's possible seeing the actual implementation would change my mind, but so far it doesn't sound like a good candidate for StringRef.

Back to this patch, anyone have any other thoughts? FWIW I changed my usage so I'm now storing a StringRef instead of a StringLiteral (because it actually needed to be constructed without a literal in some scenarios), but the API still seems to make sense here to me (and the attribute seems like a good addition regardless)

zturner abandoned this revision.Dec 15 2016, 11:19 AM