This is an archive of the discontinued LLVM Phabricator instance.

[clang][lex] Ensure minimizer output is never larger than input
AbandonedPublic

Authored by jansvoboda11 on Jun 17 2021, 6:34 AM.

Details

Summary

This patch ensures the output of source minimizer is never larger than the input. This property will be handy in a follow up patch that makes it possible to pad the minimized output to the size of the original input.

Diff Detail

Event Timeline

jansvoboda11 requested review of this revision.Jun 17 2021, 6:34 AM
jansvoboda11 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJun 17 2021, 6:34 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Fix assert wording

This patch ensures the output of source minimizer is never larger than the input. This property will be handy in a follow up patch that makes it possible to pad the minimized output to the size of the original input.

I suppose when I wrote first wrote this I was thinking of a secondary purpose of canonicalizing the sources; in which case, adding a trailing newline (for example) seems like feature rather than a bug. I'm not too attached to that, but I am curious for more context about why it's important to be able to pad the file to the same size, and whether that indicates a bug that should be fixed somewhere else instead. (I'll read the follow-up patches in the stack...)

clang/unittests/Lex/DependencyDirectivesSourceMinimizerTest.cpp
149

I'm not comfortable with this change... the idea of the (/*invalid*/ was to guarantee that an invalid macro argument list didn't get minimized to something valid.

Perhaps the result could be "#define MACRO(\n", and just strip the comment?

Another thing to consider: a client that wants the minimized source to be "no bigger than" the original source can use the original source if it ends up growing. For example, https://reviews.llvm.org/D104462 could check the resulting size, and just return the original source in the (extremely unlikely?) corner case where the minimized sources are bigger.

In that case, we wouldn't need this patch's assertion.

I bet some of these changes are good to do anyway (like dropping the /*invalid*/ comment, which was useful as a canary-in-the-coal-mine when testing the prototype on large bodies of sources (and maybe has outlived its usefulness)), but you could skip the fiddly changes here that complicate the logic and make the output less canonical and harder to read, like "maintain 0 whitespace after macro names" and "maintain missing newline at EOF".

jansvoboda11 abandoned this revision.Jul 15 2021, 2:08 PM