This is an archive of the discontinued LLVM Phabricator instance.

Add a user-defined literal for StringRef
AbandonedPublic

Authored by malcolm.parsons on Nov 6 2016, 1:02 AM.

Details

Summary

User-defined literals are supported by GCC 4.7, Clang 3.1 and MSVC 2015.

A user-defined literal for StringRef makes it easy to create StringRefs
containing embedded nulls.
It is also constexpr, so global tables of StringRef don't need a static
initializer.

Inspired by D25639.

Event Timeline

malcolm.parsons retitled this revision from to Add a user-defined literal for StringRef.
malcolm.parsons updated this object.
malcolm.parsons added a subscriber: llvm-commits.
idlecode added inline comments.Nov 6 2016, 4:46 AM
include/llvm/ADT/StringRef.h
882

Does this operator need it's own namespace?

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1221 ↗(On Diff #76984)

getOrCreateSymbol takes Twine as argument so StringRef conversion is not needed.
There is few other such use-cases but it would be quite tedious to fix them all manually - maybe clang-tidy could take care of it in the future.

mehdi_amini edited edge metadata.

+chandler as a reviewer for the usage of user string literals.

mehdi_amini added inline comments.Nov 6 2016, 8:56 AM
include/llvm/ADT/StringRef.h
74

Is this related to this patch?

86–88

same.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
1221 ↗(On Diff #76984)

Are you sure that using a Twine with a const char * is equivalent? I wonder if there won't be an extra call to strlen at the point where the Twine is serialized.

lib/CodeGen/AsmPrinter/DIEHash.cpp
41 ↗(On Diff #76984)

Why not just return "";?

lib/IR/Value.cpp
197 ↗(On Diff #76984)

same.

lib/MC/MCAsmStreamer.cpp
350 ↗(On Diff #76984)

Same in this context.

malcolm.parsons added inline comments.Nov 6 2016, 9:14 AM
include/llvm/ADT/StringRef.h
74

No.

86–88

Yes. This constructor needs to be constexpr, so that the constexpr UDL can call it. It can't use assert in C++11.

lib/CodeGen/AsmPrinter/DIEHash.cpp
41 ↗(On Diff #76984)

I wonder this about the original.
These transformations were made with sed.

dtzWill added a subscriber: dtzWill.Nov 6 2016, 1:10 PM

The LLVM coding standards don't mention UDL.

The Google C++ Style Guide says
Pro: User-defined literals are a very concise notation for creating objects of user-defined types.
Con: User-defined literals allow the creation of new syntactic forms that are unfamiliar even to experienced C++ programmers.
Decision: Do not overload operator"", i.e. do not introduce user-defined literals

http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0403r0.html proposes a UDL for std::string_view.

I think the pros outweigh the cons.

chandlerc edited edge metadata.Nov 22 2016, 3:25 PM

The LLVM coding standards don't mention UDL.

The Google C++ Style Guide says
Pro: User-defined literals are a very concise notation for creating objects of user-defined types.
Con: User-defined literals allow the creation of new syntactic forms that are unfamiliar even to experienced C++ programmers.
Decision: Do not overload operator"", i.e. do not introduce user-defined literals

http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0403r0.html proposes a UDL for std::string_view.

I think the pros outweigh the cons.

FWIW, I disagree.

I think that UDLs introduce a surprising and novel syntax for calling functions. If there is an overwhelming desire to add UDLs for StringRef, we could do so, but so far I find the benefit provided relatively small and increasing the syntax scope of the language a high price to pay.

For example, in the overwhelming majority of the cases in this patch, I suspect the construction of the StringRef could be skipped entirely, and at most some missing overloads added to cause the code to work as-is without any need for further syntactic contortions.

I think that UDLs introduce a surprising and novel syntax for calling functions.

Calling methods on a literal StringRef is indeed surprising and novel.
I see some of those in this patch; they were not intentional and I'm removing them.

In the overwhelming majority of the cases in this patch, I suspect the construction of the StringRef could be skipped entirely, and at most some missing overloads added to cause the code to work as-is without any need for further syntactic contortions.

I'll remove the cases that don't have embedded nulls and add some that are constructed statically (PR11944).

malcolm.parsons edited edge metadata.

Make namespace inline.
Only use UDL where useful.

zturner edited edge metadata.Nov 23 2016, 9:01 AM

If the goal here is to achieve constexprness so it doesn't require a static initializer, how about reviving Mehdi's patch which make the const char * constructor explicit, and then adding a constexpr char array constructor?

There are a bunch of unrelated cleanups here. I don't know if those are good or not, but they make it really hard to even read this patch.

As several have said, I don't think continuing to update this patch is the right path forward. IMO you should:

  1. take any pure cleanup (IE, unrelated to UDLs) parts of this patch, factor them into nice small patches that touch an isolated part of LLVM and send those out for review.
  1. send a mail to llvm-dev explaining *why* you think we need a UDL for StringRef (what problems it solves, why the issues raised by myself and Pete don't apply) and why other solutions don't work.

Then #2 can resolve the actual debate. Note that if the "why" is what Zach hypothesizes, I would much prefer the approach he suggested to a UDL.

-Chandler

If the goal here is to achieve constexprness so it doesn't require a static initializer, how about reviving Mehdi's patch which make the const char * constructor explicit, and then adding a constexpr char array constructor?

A char array constructor cannot tell whether the array is a literal or not.
A UDL can only be called with a literal.

If the goal here is to achieve constexprness so it doesn't require a static initializer, how about reviving Mehdi's patch which make the const char * constructor explicit, and then adding a constexpr char array constructor?

A char array constructor cannot tell whether the array is a literal or not.
A UDL can only be called with a literal.

I think as Chandler said, perhaps taking this to llvm-dev where more people will have a chance to chime in might be a good path forward

malcolm.parsons abandoned this revision.Nov 25 2016, 4:14 AM