This is an archive of the discontinued LLVM Phabricator instance.

[obj2yaml] - Better indentations in the ELF output.
AbandonedPublic

Authored by grimar on Jan 20 2020, 7:15 AM.

Details

Summary

The current obj2yaml's output has an issue: it places
a lot of an excessive indentation spaces between keys and values.
Then we have to remove them manually to improve the readability of
test cases. I see such requests from reviewers regularly around
and it is not ideal.

The issue happens because Output::paddedKey has a trivial logic which
hardcodes the maximum of spaces used to align the values column:

const char *spaces = "                ";
if (key.size() < strlen(spaces))
  Padding = &spaces[key.size()];

I've experimented with a ways to improve this situation and
seems finally found the solution, though it is not that simple
as I've hoped when started.

The main problem is that until we performed a mapping of
fields, we know nothing about their names and hence
can't use this information to tweak the indentations on the fly.
At the same time we write the output during doing the mapping,
so it is too late to do something on this step.

So. In this patch I've introduced a "pre-mapping" pass. This is a
cheap "mapping" that is done right before the real mapping.
This pass only scans the keys which are on the current level of the output
and finds the maximum key length. This information is then used
to calculate and print a better indentation during the real mapping pass.

Given that the approach is not that simple, I'd like to discuss before
doing additional polishing. Perhaps the main question is: do we want
such complexity to improve indentations?

Diff Detail

Event Timeline

grimar created this revision.Jan 20 2020, 7:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 20 2020, 7:15 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
grimar marked an inline comment as done.Jan 20 2020, 7:20 AM
grimar added inline comments.
llvm/test/tools/obj2yaml/elf-output-indentation.yaml
19

Here the additional indentation appears because of AddressAlign, which is
the largest key among others. The same situation is shown below.
I did not investigate how much hard to to improve such places too yet.

I'll give this a more thorough look at some point hopefully, but my instinct is the current situation is the worst of the three cases "good padding"/"excessive padding"/"no padding". I think a little bit of code complexity would be okay to improve the excessive padding situation. I'm not sure instinctively whether this is too much however. I'll give it some thought.

llvm/test/tools/obj2yaml/elf-output-indentation.yaml
19

I definitely find it weird that .text is not indented to the same extend as .data.

20–24

I'm guessing this block is indented because of Relocations? This doesn't seem great, given that the Reloctions line doesn't have a direct value itself.

I agree that a preflight pass is unavoidable if we want to improve indentation. This patch does not seem to bring much complexity, so I am in favor of such a change. I'll need to dive deeply into the logic.

grimar marked 2 inline comments as done.EditedJan 21 2020, 1:20 AM

I agree that a preflight pass is unavoidable if we want to improve indentation.

There was one more way I though about.
For each type handler, like:

void MappingTraits<ELFYAML::Object>::mapping(IO &IO, ELFYAML::Object &Object) {
...
  IO.mapTag("!ELF", true);
  IO.mapRequired("FileHeader", Object.Header);
  IO.mapOptional("ProgramHeaders", Object.ProgramHeaders);
  IO.mapOptional("Sections", Object.Chunks);
  IO.mapOptional("Symbols", Object.Symbols);
  IO.mapOptional("DynamicSymbols", Object.DynamicSymbols);
....
}

We could calculate the indentation in its code, like:
(pseudocode)

void MappingTraits<ELFYAML::Object>::mapping(IO &IO, ELFYAML::Object &Object) {
  int MaxKey = sizeof("FileHeader") or sizeof("ProgramHeaders") if exist and not empty or ...;
  IO.setMaxPadding(MaxKey);
...
  IO.mapTag("!ELF", true);
  IO.mapRequired("FileHeader", Object.Header);

It could give a nice results, it is simple and does not require pre-mapping, but
it is not a generic approach, that is why I supposed it is not an option.

llvm/test/tools/obj2yaml/elf-output-indentation.yaml
19

Yes, though it just follows the simple rule: "find the largest key size and align others".
I am not sure what can be a good generic heuristic without knowing the context.

20–24

Yeah, but the problem here is that preflightKey which is used to scan keys:

bool preflightKey(const char *Key, bool Required, bool SameAsDefault, bool &,
                   void *&) override {
   if (Required || !SameAsDefault)
     MaxKeyLen = std::max(MaxKeyLen, strlen(Key));
   return false;
 }

does not have any information about values (how to know that Relocations is an empty list?)
Accessing to the values never happens,
because of return false, which prevents any more logic from execution.
I.e. The code only scans over keys on the current level and does not allow going deeper.
If it did than much more logic would be needed to be implemented for this "pre-mapping" pass.

The solutions I see are: either adopting the code in yaml::Output to teach it about possibility of "pre-mapping"
(the idea of this patch was to avoid touching Output code, I've only fixed a minor misbehaviors in processKeyWithDefault)
and/or implementing a smarter (and larger) PreMappingOutput class.

Okay, I'm happy with the concept, and I don't think it's too complex, but I do think my inline comments about the formatting for lists (members, and also other keys in the list owner) need resolving for this to be properly worthwhile.

llvm/include/llvm/Support/YAMLTraits.h
1133

Some doxygen-style comments on this and the other new functions and classes are needed.

1139

Should this be a size_t? Negative padding doesn't make much sense...

1206

MaxPaddingsStack -> MaxPaddingStack

1217

This seems like a weird way of structuring the code. A PreMappingOutput is a subclass of Output, but that means it also contains a unique pointer to itself? That seems like a recipe for destruction order issues.

llvm/lib/Support/YAMLTraits.cpp
833

Why not just StringRef Spaces = " ";?

836

My immediate thought here was "what if the key is longer than the spaces". I imagine it's handled, but it's not obvious.

838

clang-format?

llvm/test/tools/obj2yaml/elf-output-indentation.yaml
19

Is it not possible to iterate over all list elements to find their individual required padding sizes, and then set them all to the max of the list members?

20–24

I'm not really an expert in how all this code fits together, so I don't know what's possible. However, I do think that indenting a value like this looks bad, and will regularly leave us with extra spurious spaces again. Is it possible to tell what kind of mapping keys are (e.g. lists versus strings/numbers) at all? I do think some extra smarts to solve this would be wise if needed.

grimar planned changes to this revision.Jan 29 2020, 3:55 AM
grimar abandoned this revision.EditedFeb 18 2020, 6:31 AM

I'll continue D74611 which is alternative.