This is an archive of the discontinued LLVM Phabricator instance.

Add a more succinct logging syntax
ClosedPublic

Authored by labath on Dec 6 2016, 8:22 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

labath updated this revision to Diff 80423.Dec 6 2016, 8:22 AM
labath retitled this revision from to Straw-man proposal for new logging syntax.
labath updated this object.
labath added a subscriber: lldb-commits.
zturner added a subscriber: zturner.Dec 6 2016, 8:32 AM

A couple comments:

  1. If we're going to use formatv (yay!) then let's standardize. No point mising formatv and streaming, especially when the former will almost always be less verbose.
  1. One of the concerns that came up last time was that of evaluating the arguments to the logging functions even in the case where it didn't get logged, because sometimes just calling the function to evaluate the parameter can be expensive. To get around that you could pass the arguments directly to the macro instead of streaming them, and the macro could wrap everything in a lambda.

Combining #1 and #2 above, maybe you end up with something like this (not tested, probably needs some tweaking to work):

#define LLDB_LOG(...) LLDB_LOG_([]() { return llvm::formatv(__VA_ARGS__); }

LLDB_LOG("pid = {0}: tracking new thread tid {1}", GetID(), tid);
labath added a comment.Dec 6 2016, 8:48 AM

A couple comments:

  1. If we're going to use formatv (yay!) then let's standardize. No point mising formatv and streaming, especially when the former will almost always be less verbose.

In this change, I deliberately tried to mix styles, to show different ways of doing things. I also wanted to avoid prescribing the right API to use, as I've found llvm::formatv to be longer in the simple cases. However, hiding it in a macro is a good idea, and would remove that problem.

  1. One of the concerns that came up last time was that of evaluating the arguments to the logging functions even in the case where it didn't get logged, because sometimes just calling the function to evaluate the parameter can be expensive. To get around that you could pass the arguments directly to the macro instead of streaming them, and the macro could wrap everything in a lambda.

The macro expands to something like if (!log) /*do nothing*/; else log << whatever, so the streaming part only gets executed if logging is actually enabled. That said, I like the API you propose as well (and it could probably use something similar under the hood, to avoid messing around with lambdas).

Part of the reason I like the current "if (log) log->Printf()" is that it doesn't cost you much if logging isn't enabled. So if you have a log line like:

if (log)
  log->Printf("%s %s %s", myStringRef1.str().c_str(), myStringRef2.str().c_str(), myStringRef3.str().c_str());

And switch over to using:

LLDB_LOG() << myStringRef1 << " " << myStringRef2 << " " << myStringRef2;

You end up doing some work with all of the types regardless of wether the logging is enabled. We tend to do pretty heavy logging in places and I really don't want performance impacted (the expression parser emits a full novel worth of information when logging is enabled), nor would I want people to log less because they are worried about affecting performance.

So any solution should have very minimal cost if logging isn't enabled. We also do log channels, so any design will need to include the ability to log to a channel, any channel, or only if all channels are enabled.

The other thing I don't like about streams is they are harder to read over a printf style log string. I wrote all of dwarfdump and used the full power off C++ streams and I regret doing this as all of the code is really hard to read. I also didn't like the fact that streams maintain state and can affect things down the line. If we adopt LLVM style streams I believe they don't suffer from the C++ streams having state issue. But with C++ you can do this:

std::cout << std::dec << 12;
std::cout << 24;

And then you add some code in the middle:

std::cout << std::dec << 12;
std::cout << std::hex << 0x1234;
std::cout << 24;

Now your "24" comes out as hex instead of decimal. This means you have to start always specifying your state:

std::cout << std::dec << 12;
std::cout << std::hex << 0x1234;
std::cout << std::dec << 24;

So my main objection is the cost of logging when it isn't enable with this approach. I am going to assume we won't use any state in streams if we do ever make something like this. I would still also like the ability to use Printf in the new logs if we do come up with a solution.

How can be solve the performance problem with this approach?

Does llvm::formatv allow you to specify the base of integers, or just where they go? Can you give it width, number base, leading characters and other things like a lot of what printf does?

I am fine with us changing logging, just want to make sure it works well. Ideally it would include:

  • almost zero cost when logging disabled. No arguments should be passed to any functions and then determine that logging is disabled, it should check if logging is enabled and not do any of the work if it isn't.
  • no state in streams so adding logging before other logging can't affect future output
  • the ability to some form if format strings (printf for formatv style strings) so not everything has to be done with streams and control codes that are sent into the stream to change from decimal to hex etc
  • I would rather avoid macros if possible, though we might have to use them to get the performance

Part of the reason I like the current "if (log) log->Printf()" is that it doesn't cost you much if logging isn't enabled. So if you have a log line like:

if (log)
  log->Printf("%s %s %s", myStringRef1.str().c_str(), myStringRef2.str().c_str(), myStringRef3.str().c_str());

And switch over to using:

LLDB_LOG() << myStringRef1 << " " << myStringRef2 << " " << myStringRef2;

You end up doing some work with all of the types regardless of wether the logging is enabled. We tend to do pretty heavy logging in places and I really don't want performance impacted (the expression parser emits a full novel worth of information when logging is enabled), nor would I want people to log less because they are worried about affecting performance.

So any solution should have very minimal cost if logging isn't enabled. We also do log channels, so any design will need to include the ability to log to a channel, any channel, or only if all channels are enabled.

The other thing I don't like about streams is they are harder to read over a printf style log string. I wrote all of dwarfdump and used the full power off C++ streams and I regret doing this as all of the code is really hard to read. I also didn't like the fact that streams maintain state and can affect things down the line. If we adopt LLVM style streams I believe they don't suffer from the C++ streams having state issue. But with C++ you can do this:

Take a look at llvm::formatv. I wrote this in LLVM specifically to be the best of both worlds. You get all the flexibility and conciseness of printf with all the type safety of streams. Example:

// Printf style
log->Printf("%d %s %z", 12, "test", (size_t)7);

// Stream style
*log << 12 << "test" << (size_t)7;

// formatv style
*log << llvm::formatv("{0} {1} {2}", 12, "test", (size_t)7);

// alternative formatv sytle
log->Formatv("{0} {1} {2}", 12, "test", (size_t)7);

As you can see, you don't have to specify the type. All the ugliness about PRIx64, compiler inconsistencies, platform-dependent sizes, etc all disappear because it deduces the type. And it's even better than this, because you can write custom formatters for your own types. For examples, LLDB has a class called AddressRange. With printf you would have to write this log->Printf("%llu - %llu", range.first(), range.second())', but with formatv` you can write this: log->Formatv("{0}", range); as long as it can find a formatter somewhere. And formatters can define arbitrary style options too, so you could write something like log->Formatv("{0:abc}", range); and this string "abc" would get passed to the formatter's format function. This is useful when a type is often formatted different ways (precision, optional fields, etc).

I strongly recommend moving to this for all printf style formatting moving forward.

Does llvm::formatv allow you to specify the base of integers, or just where they go? Can you give it width, number base, leading characters and other things like a lot of what printf does?

You can specify hex and decimal. No other bases, although that could be added. To the other questions, yes. Here are some examples:

std::string S = formatv("{0}", 7);   // S = "7"
S = formatv("{0,-3}", 7);   // S = "7  "  (-3 indicates left align inside a field of width 3)
S = formatv("{0,=3}", 7);  // S = " 7 "  (=3 indicates center align inside a field of width 3)
S = formatv("{0,=10; x}, 23);  // S = "   0x17   "  (=10 indicates center align inside a field of width 10, x indicates hex.   , is the alignment marker ; is the style marker
S = formatv("{0;P}", 0.12345);  // S = "12.35%"  (P indicates to format as a percentage, only valid for floating point types.  Default precision is 2
S = formatv("{0;3P}", 0.12345);  // S = "12.345%"  (P indicates to format as a percentage, only valid for floating point types.  Explicitly specify precision of 3.

There are lots of other things you can do, and since I wrote / designed it, I'm happy to add additional features that you or others request.

jingham added a subscriber: jingham.Dec 6 2016, 9:57 AM

I not infrequently have some non-trivial code in the "if (log)" block that gathers the information that I am then going to print, or loops over entities printing them. Putting more complex code like that inside a macro looks awful and is hard to debug. I don't think that argues for not using the macros in the case where you are logging a simple string, but it does argue for keeping the old way available.

Also, some extant log messages don't have file and function because they are making structured readable output and the function noise is just that, and makes the logs harder to read. It's worse when constructing logs that span functions because you would have to work way too hard to line things up nicely. If we're going to keep the old log mechanism around, that's okay, you can just continue to use that for these purposes. Otherwise, you might want have decorated and non-decorated variants.

This discussion has diverged across lldb-dev and lldb-commits. I think we should move it to one place. I propose lldb-dev, as that seems a better place for high-level discussions.

labath updated this revision to Diff 80744.Dec 8 2016, 4:19 AM

Update the example to the formatv-based API

Seems weird that we are a C++ codebase and we fall back to C macros. We currently need the macro only for file + line and to also not have to worry about doing "if (log) log->". I am not a big fan of macros, but I am open to it if everyone else wants it.

include/lldb/Core/Log.h
218–219 ↗(On Diff #80744)

This is hard coding in the file + line all the time. I would like this to be an option. We also need to be able to optionally enable the pid/tid, thread name, stack backtrace and all the other options that are currently supported by "log enable". Seems like we need a log function that exists in Log.cpp to take a log mutex, add the extra log prefix stuff (file + line, pid/tid, thread name, timestamp, delta since last log time, stack etc) and then call the llvm::formatv() one or more times and then release the mutex.

include/lldb/Host/FileSpec.h
675–677 ↗(On Diff #80744)

For a good example of what objects can do to help sell this proposal we should handle different values for the Options string. Maybe this to support the following:

Options is empty -> print out full path
Options == "filename" -> print out file name only with extension
Options == "basename" -> print out file name only without extension
Options == "directory" -> print out file directory only
Options == "extension" -> print out file extension only
Options == "volume" -> print out file volume only which will probably only work on windows paths currently
Options == "fullname" -> print out volume + directory + filename, same as default empty string

Then we can format like:

FileSpec file("/tmp/foo.txt");
llvm::formatv("{0; filename}", file);

A couple of thoughts / two cents.

I don't mind the current if (log) { ... } style of logging, even with the PRIx64's and having to do filepath.GetPath().c_str() and all that. I like being able to do extra work in a if (log) block of code -- create a SymbolContext for something or whatever, and as I'm debugging that section of code I like to be able to put a breakpoint there and look more closely. If this is all in a preprocessor macro, a breakpoint is going to stop me on the if() and then I'm trying to step through a macro? And if my computation is down in a lambda I need to figure out how to put a breakpoint in that or step into that somehow to look at the values being computed. How do I create a SymbolContext my_symctx(...); and then include that in what I log if I have to do all of this in the formatv arguments?

I'm not thrilled with the formatv reinvention of format specification. The printf formatters are a bizarre little invention, but it's a bizarre invention that we've all worked with for decades and everyone who works in C knows well. The formatv format specification may be superior, but I don't want to see innovation here, I want clarity with the linga franca that every C programmer knows. I think the barrier to adopting something non-standard is very high.

The formatv format specification string seems to explicitly mention the ordering of the arguments, like llvm::formatv("{0} {1} {2}", first, second, third). I'm guessing this allows you to print the arguments in different order than they appear? Like llvm::formatv("{0} {2} {2} {1}", first, second, third)? What's the benefit of this vrs. the uncertainty of which arguments are used in different parts of the string (short of counting them out by hand)? If I have a formatv format specification like "{0} {1} {2} {3} {4}" and I want to insert an argument between 2 and 3, I need to renumber 3 and 4 now? Or do I do "{0} {1} {2} {5} {3} {4}" and throw my argument on to the end?

Simply *ability* to put the arguments in any order is my concern. Of course I'm never going to do it, but I'm going to have to work on code where other people have done it.

I don't personally want to see the file / function / line numbers in my logging. It's fine if that's an option that other people can enable if they want it. There's going to be a growth in the size of our text for the constant strings for every log point having its filename and function, but it will probably not be significant (maybe the strings can be coalesced, I wouldn't bet one way or the other.)

If formatv could take something a LOT closer to printf, I'd be better with it. Something like the python string formatters would be a real improvement - just %d for base 10 numbers, not having to worry about the size of the argument in the format string, for instance, and would still be readable & understandable by any C programmer out there.

If we're going to start using this alternate logging scheme, eventually there will be a push to use it everywhere, or someone will go through all the other classes in the code base and switch it out without consultation. We need to all sign on to this idea before it goes into the code base at all.

labath added a comment.Dec 9 2016, 3:26 AM

I'll respond to your comments on the lldb-dev thread.

include/lldb/Core/Log.h
218–219 ↗(On Diff #80744)

I am planning to make the source information optional. This diff is mainly here to demonstrate the target syntax from the users point of view. The changes are made here are not to be taken as examples of good coding practices - it is just the minimal set of changes I needed to do to make things work. When we agree on the final syntax, I'll implement this more nicely.

emaste added a subscriber: emaste.Dec 9 2016, 10:22 AM
labath updated this revision to Diff 83967.Jan 11 2017, 6:08 AM
  • Replace the hacky proof-of-concept implementation with something more production-ready
  • Add a option to control adding of source information (defaulting to on)
  • Add some unit tests
  • Add an example of how would ProcessWindowsLog look like with this syntax
clayborg requested changes to this revision.Jan 11 2017, 9:48 AM
clayborg added a reviewer: clayborg.

Looks fine except I would rather not have file + line on by default.

source/Commands/CommandObjectLog.cpp
51–52 ↗(On Diff #83967)

This shouldn't be on by default. Maybe it should be renamed to "file-line".

103 ↗(On Diff #83967)

Don't default this to on.

135–136 ↗(On Diff #83967)

Reverse

This revision now requires changes to proceed.Jan 11 2017, 9:48 AM
labath updated this revision to Diff 84108.Jan 12 2017, 5:02 AM
labath edited edge metadata.
  • Several people expressed wishes to have this off by default, so I have done that.
  • Renamed the log option to --file-function (Note I am deliberately not printing the line numbers -- they tend to change all the time and it's more consistent with what the code is currently doing)
  • I am planning to submit the core changes only, the refactors are here just for illustration, and I'll handle them separately.
labath updated this revision to Diff 84690.Jan 17 2017, 9:48 AM

Final version. No real changes, just removed usage changes in NativeProcessLinux
I will put in as sepearate patches.

labath retitled this revision from Straw-man proposal for new logging syntax to Add a more succinct logging syntax.Jan 17 2017, 9:49 AM
labath edited the summary of this revision. (Show Details)
clayborg accepted this revision.Jan 17 2017, 10:54 AM

Looks good.

This revision is now accepted and ready to land.Jan 17 2017, 10:54 AM
dancol added a subscriber: dancol.Jan 17 2017, 11:03 AM
dancol added inline comments.
source/Core/Log.cpp
78 ↗(On Diff #84690)

I usually implement printf-into-std::string by using vsnprintf to figure out how many characters we generate, using std::string::resize to create a buffer of that many characters (unfortunately, zero-filling them), then vsnprintf directly into that buffer. This way, you only need one allocation.

The currant approach involves at least three allocations: first, the string generated by vasprintf. Second, the internal stringstream buffer. Third, the copy of the buffer that std::stringstream::str generates.

It's more expensive that it needs to be.

zturner added inline comments.Jan 17 2017, 11:11 AM
source/Core/Log.cpp
78 ↗(On Diff #84690)

To be fair, we should really be deleting these methods long term, and using formatv. This way you often end up with 0 allocations (for short messages), and on top of that, only one underlying format call (as opposed to wasting time calling vasprintf twice here).

labath added inline comments.Jan 18 2017, 2:59 AM
source/Core/Log.cpp
78 ↗(On Diff #84690)

The currant approach involves at least three allocations: first, the string generated by vasprintf. Second, the internal stringstream buffer. Third, the copy of the buffer that std::stringstream::str generates.

str() returns a reference to the underlying buffer (this is an llvm stream, not std::stringstream), so there is no copy there. Since this is not really performance critical, I'd prefer to keep the code simpler (plus, I think the extra vsnprintf call balances the extra malloc, roughly speaking).

And, same as Zachary, I hope this code will go away eventually.

This revision was automatically updated to reflect the committed changes.