This is an archive of the discontinued LLVM Phabricator instance.

[libc] add core parsing for printf
AbandonedPublic

Authored by michaelrj on Jan 13 2022, 3:10 PM.

Details

Reviewers
sivachandra
lntue
Summary

Add the core parsing logic to handle printf, as well as an entrypoint
for sprintf. Most of the conversions are not yet done (the only ones
that are complete are % c s and n), but adding them should be simple
from the printf side, though the business logic will be complex.

Diff Detail

Event Timeline

michaelrj created this revision.Jan 13 2022, 3:10 PM
michaelrj requested review of this revision.Jan 13 2022, 3:10 PM
lntue added inline comments.Jan 14 2022, 6:18 AM
libc/src/stdio/printf_conv_core.h
26

Use enum class.

65

Mark this method const.

74

Use enum class.

libc/src/stdio/printf_impl.h
2

Why are these functions implemented in the header?

michaelrj updated this revision to Diff 400182.Jan 14 2022, 4:14 PM
michaelrj marked 3 inline comments as done.

address a few comments, and fix a buffer overflow that was causing crashes during testing.

michaelrj edited the summary of this revision. (Show Details)Jan 14 2022, 4:14 PM
michaelrj updated this revision to Diff 401038.Jan 18 2022, 4:45 PM

add tests for string precision and change char conversion to use the string converter

michaelrj updated this revision to Diff 401437.Jan 19 2022, 4:27 PM
michaelrj marked an inline comment as done.

add index mode parsing and tests, as well as tests for char conversion and %%

libc/src/stdio/printf_impl.h
2

Because the libc machinery most easily works with this as a header library.

michaelrj retitled this revision from [WIP][libc] add core parsing for printf to [libc] add core parsing for printf.Jan 19 2022, 4:28 PM

I have not read through everything yet. But, I think we should have a high level design which makes use of C++ facilities where possible to make the whole system neater and manageable.

  1. Separate the actions of parsing, formatting and production.
  2. Viewing the parsing aspect as a separate action is straightforward. The parsing system itself needs a design of its own. I think we can tackle it after we tackle the formatting and production actions.
  3. With respect to formatting, we should have a Stream class like this:
typedef void BufferWriter(void *, char *, size_t);

class Stream {
public:
  Stream(void *buffer, BufferWriter *writer);

  template <typename T>
  Stream &operator<<(T);
};

There should be various specializations of the operator<< method. In fact, considering we need to accommodate a number of formatting features, we should imitate much of the C++ std::ostream functionality (like being able to change encoding with std::hex, being able to set minimum width, etc.) Also, some of the specializations should probably live in their own object files so that they can be selectively excluded. For example, we might want to exclude floating point number formatters.

  1. The production aspect refers to the buffer and the writer function. For example, buffer can be a FILE * or a char *. The writer functions for these would be something like:
void char_writer(void *buf, char *data, size_t size) {
  char *dst = static_cast<char *>(buf);
  // Write |size| bytes from |data| to |dst|
}

void file_writer(void *bug, char*data, size_t size) {
  FILE *file = static_cast<FILE *>(buf);
  // Write |size| bytes from |data| to |file|
}

Note that the Stream class and the buffer writers can be written and tested without requiring a parser.

michaelrj updated this revision to Diff 402139.Jan 21 2022, 4:56 PM

minor formatting fixes

I think that the ideas you have proposed are good ones, but I think that sticking with the function oriented approach is better than changing to the stream oriented approach.

To start off with, the parsing, formatting, and production functions are all already separate (see parse_format_specifier_sequential, string_conversion, and the OutputBuffer class respectively). These can be tested separately, and I plan on making the future conversions tested separately by default, although I haven't set up separate testing for the string conversions since to test the completed sprintf function we need at least one conversion, and so using the string conversion for testing there works nicely.

In general concept, the stream oriented and function oriented approaches are basically the same, the main difference being that for the existing function oriented approach there isn't an overarching stream class storing the flags and options, they're just passed as arguments to each function. For production the OutputBuffer class takes strings and appends them to the end of its buffer, the only difference being that it uses the write function instead of the stream operators.

The most important part, in my opinion, is that the function oriented approach is already implemented, and for your other suggestions I can add them to the existing implementation (e.g. changing the way that the buffer is stored in the OutputBuffer). I don't see any particular improvements by adopting the stream approach that make it worth refactoring the implementation.

michaelrj abandoned this revision.Feb 11 2022, 11:41 AM

This version of printf has issues that are not easily fixed. It is being redesigned, and a new patch will be made.