This is an archive of the discontinued LLVM Phabricator instance.

Add Formatv() versions of all our printf style formatting functions
ClosedPublic

Authored by zturner on Dec 9 2016, 1:47 PM.

Details

Summary

We have various functions like Stream::Printf(), and Error::SetErrorStringWithFormat(), Log::Printf(), and various others. I added functions that delegate to formatv in case people want to see some actual usage in practice.

I also converted a few various Printfs, etc to this syntax just to demonstrate what it looks like. I tried to find callsites that would illustrate how it makes things simpler, such as :

  • not needing to call .c_str() and being able to pass std::string and/or StringRef directly as the argument.
  • places where the same argument is printed multiple times, only having to pass it once (incidentally this fixed a spelling mistake since when passing the argument the 2nd time, there had been a typo. So this also serves as an example of how this is safer than Printf, since less code typed = fewer chances to mess up).
  • places where complex format strings are constructed, such as using the printf width specifier * or PRIx64 are used and now we can simply pass whatever type we have directly, even if it's architecture dependent like size_t.

Various other uses.

Obviously a change like this is too big to do all in one pass, since it would have a roughly 100% chance of introducing new bugs and be impossible to bisect. This is basically just to get the hookups in place and have a single-digit number of illustrative use cases.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner updated this revision to Diff 80949.Dec 9 2016, 1:47 PM
zturner retitled this revision from to Add Formatv() versions of all our printf style formatting functions.
zturner updated this object.
zturner added reviewers: labath, clayborg, jingham.
zturner added a subscriber: lldb-commits.
clayborg edited edge metadata.Dec 9 2016, 4:02 PM

It is nice to be able to use StringRef and std::string as is. I was wondering if you have:

StringRef null;
StringRef empty("");
StringRef hello("hello")

What does this show:

formatv("{0} {1} {2}", null, empty, hello);

I would hope for:

(null) "" "hello"

Where "(null)" is used for the empty StringRef since it doesn't have a value. This is what printf does if you pass nullptr in for a %s. The other string has a pointer, but no size, so it should show the empty string. Also for std::string and for StringRef if would be great if we can show binary values since a StringRef and std::string can contain NULL characters and also non printable characters. One option that would be really cool for both StringRef and std::string would be the memory view stuff we did a month back! So we could do:

StringRef bytes("\x01\x02\x03");
formatv("{0:memory}", bytes);

And get out a memory dump.

source/Target/Target.cpp
1558–1559 ↗(On Diff #80949)

Can we add lldb_private::FileSpec support with this patch as well? Then the code can be:

error.SetErrorStringWithFormatv(
            "{0}[{1:x+}] can't be resolved, {0;filename} is not currently loaded",
            addr_module_sp->GetFileSpec(), resolved_addr.GetFileAddress());
clayborg requested changes to this revision.Dec 9 2016, 4:03 PM
clayborg edited edge metadata.

I would like to see the FileSpec having its formatv stuff done for this since it will start a whole slew of people wanting to use it and we should have an example of a class that show how to add your own formatting options.

This revision now requires changes to proceed.Dec 9 2016, 4:03 PM
zturner updated this revision to Diff 80992.Dec 9 2016, 9:16 PM
zturner edited edge metadata.

Added a format provider for FileSpec. Style syntax is documented in this patch. Added some unit tests so you can see the output.

To answer Greg's earlier question about printing c-strings, formatv("'{0}' '{1}' '{2}'", (const char *)nullptr, "", "test"); would print "'' '' 'test'";. So yes that means that nullptr doesn't print "(null)". It could be made to do so if desired, but if it's any consolation the goal is really to move away from c-strings, and StringRef doesn't have any concept of null. There's just empty and not empty.

One nice thing about is that you can easily change the behavior of existing formatters using a mechanism which I've called "adapters". You can see an example of a builtin adapter in this patch, where I use fmt_repeat to repeat a character 7 times. To write an adapter for const char * that prints (null), you could do this:

struct fmt_or_null {
  explicit fmt_or_null(const char *s) : s(s) {}
  void format(llvm::raw_ostream &Stream, StringRef Style) const {
    if (!s)
      Stream << "(null)";  // Override the default behavior for nullptr;
    else
      llvm::format_provider<const char *>::format(s, Stream, Style);  // Otherwise just use the default;
  }
  const char *s;
};

void foo() {
  const char *s = nullptr;
  std::string result = llvm::formatv("{0}", fmt_or_null(s));
}
labath edited edge metadata.Dec 10 2016, 1:03 AM

This looks like a step in the right direction. I trust you won't go on a reformatting spree of the log messages until we settle on the syntax there. ;)

include/lldb/Core/ModuleSpec.h
244 ↗(On Diff #80992)

you can delete the uint64_t cast now. It was only there to make this portably printable.

include/lldb/Host/FileSpec.h
784 ↗(On Diff #80992)

I am wondering.. what is the preferred style for adding formatting capability. I sort of assumed that the format_provider thingy was reserved for cases where you had no control over the object being formatted (e.g. because it comes from a third party library), and that for cases like this we would use the member format function (or, to put it differently: if we're going to be using the format_provider everywhere, what's the point of having the member functions in the first place?).

Also, my preference would be to not have the F and D styles for FileSpec. I think they are redundant, as we have FileSpec::GetFilename and FileSpec::GetDirectory, and also they are not compiler-enforced --
Formatv('{0}', fs.GetFilename()) is slightly longer than Formatv('{0:F}', fs), but if I make a typo in the former one, it will be a compiler error, while the other one won't. I suppose it could be useful, if one wants to format say an array of FileSpecs using the array format syntax, but I'm not sure how often will someone want to do that *and* display only the filenames.

I don't care strongly about any of these things, but I want to know what to expect.

Looking good. A few little changes and possibly supporting long option names that can be minimally specified. See inlined comments.

source/Host/common/FileSpec.cpp
1394 ↗(On Diff #80992)

If we are going to not care about case I would say we should use lower case in our initial usages of this formatter.

This leads to another question: should we allow format strings to be specified with longer names and that all formatv strings can be minimally specified like the LLDB command names? Then we can choose more descriptive names like "filename" and "directory" which can be shortened to "f" and "d" since they minimally match. The main issue with this is if you wanted to add another format flavor that starts with any letter that already matches, like say we wanted to later add "filename-no-extension". We could say that the first in the list that partially matches would always win. This would mean that when adding support for a new flavor you would always need to add it to the end of the list. This can help keeps things more readable in the ::format functions while allowing users to also be explicit if they need to.

It would be nice to avoid asserts if possible in the logging ::format code as if we ever open up the logging to users somehow (uncontrolled input), we don't want to crash. I would rather see something in the output like "invalid format style 'ffff' for lldb_private::FileSpec" just to be safe. The user will see this output and it would be nice not to crash.

source/Target/Target.cpp
1558 ↗(On Diff #80992)

I would prefer us to use lower case format characters in our example code if possible, so 'f' instead of 'F'.

zturner added inline comments.Dec 12 2016, 9:27 AM
source/Host/common/FileSpec.cpp
1394 ↗(On Diff #80992)

I'm sort of ambivalent about long option names in the way you suggest. It definitely allows for less cryptic format strings, but on the other hand with our 80-column limit, with run the risk of leading to additional levels of wrapping which could reduce readability. That said, it's easy to modify these formatters, so maybe we could start with short only, and then re-evaluate after it starts getting some usage? Would give people a chance to build up a stronger opinion one way or the other.

As for asserts, I don't think it should be an issue here. The reason I say this is because the assert is compiled out in release mode, so the assert itself will never be the cause of a crash for the user. The only way it could crash is if the code that followed did not perform gracefully under the condition that triggered the assert to fire in the first place.

The assert here is used only to notify the programmer that they've made a mistake in their debug build, but in a normal build it will fall back to using the default Dir + Separator + Filename format if anything in the format string is unrecognized.

I prefer this to adding additional conditional branches, because printing something sane is better than printing "Unknown Format String" in a log file.

clayborg added inline comments.Dec 12 2016, 10:04 AM
source/Host/common/FileSpec.cpp
1394 ↗(On Diff #80992)

I am saying that we add long format name support in now only in the ::format() functions but use them with the minimal characters on our code when logging. This would avoid any 80 char limitations when using them, but keep the ::format() functions more readable.

I am fine with the assert being there if the code still is designed to function when the assert isn't there. LLVM tends to say asserts are invariants and the code isn't expect to function when the asserts are off, so as long as we avoid avoid this in LLDB I am fine (use asserts for alerting, but code around it).

1401 ↗(On Diff #80992)

Do we want all strings and classes that don't have valid state to print "(empty)"? If so maybe we build in something into formatv like:

Stream << formatv::empty();

This would allow us to get a consistent output from everyone when their objects are invalid.

zturner added inline comments.Dec 12 2016, 10:31 AM
source/Host/common/FileSpec.cpp
1394 ↗(On Diff #80992)

Ahh I see what you mean. I wouldn't want us to do anything like that across the board, because we can't foresee all possible ways we might want to use it, and I don't think we'd be able to apply this rule consistently across all types and all formatters we'd ever make in LLDB. For example, I can imagine a hypothetical situation where you might do llvm::formatv("{0:pxest}") where p, x, e, s, and t are "codes" for some specific field, and it just strings all the results together. Then there are some situations where case matters, like when printing integers X` means hex with uppercase digits and x means hex with lowercase digits.

clayborg accepted this revision.Dec 12 2016, 1:05 PM
clayborg edited edge metadata.

So we need to formalize this in the formatv documentation before we begin wide adoption otherwise people can do what ever they want. It would be good to write this up so we get consistent implementations. But that is in here looks good enough to start with.

This revision is now accepted and ready to land.Dec 12 2016, 1:05 PM

Thanks. Not going to commit this just yet, as I know there was at least one concern over on llvm-dev about adopting this. So I'll give it another few days to see if the complaints remain.

Sounds good. Take it as my vote this is ok to start with if we get consensus

This revision was automatically updated to reflect the committed changes.