This is an archive of the discontinued LLVM Phabricator instance.

[LLVM] Fix handling of escaped closing brace `}` in `formatv` (under an option)
AbandonedPublic

Authored by jurahul on Jul 14 2020, 6:03 PM.

Details

Summary
  • formatv was not handling closing braces as documented and treating them as unescaped. This also means that cases like "{{}}" would, post-formatting, be "{}}" which is not correct per the documented behavior.
  • Fix the string parsing and analysis code to treat closing braces as requiring escape and assert of we find a closing brace not paired with an opening one.
  • Unfortunately, many tests and code rely on the existing behavior, so introduce a FormatEscapeMode enum to switch between existing (legacy) and new (strict) mode. Also added a formatv_strict() function corresponding to the standalone formatv() function.
  • Update some existing unit tests that to use strict mode.

Diff Detail

Event Timeline

jurahul created this revision.Jul 14 2020, 6:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2020, 6:03 PM

I ran into this issue when trying to attach default values to a parameter in the code generated by TableGen, which uses formatv(). Adding just {} does not work, so I thought {{}} should work based on the comments, but the code does not handle this. When testing, I found that there is a lot of existing code and tests that rely on this current (legacy) behavior, so decided to only enable the new mode optionally. Ideally, we should get rid of the legacy behavior, but it seems there are too many tests to fix (in clang, clangd etc), so I am not handling that with this change.

I'm sympathetic to wanting to clean this up, but having two modes without a plan to remove one doesn't seem ideal.

What do you think about adjusting the documentation to describe the actual behaviour instead? It doesn't seem unreasonable - it requires { to be doubled but not }, and there's no ambiguity because } cannot appear inside a replacement sequence therefore must close one.

That's a reasonable course of action as well. I could look into fixing all the in-tree uses of formatv() to conform to the strict mode, but I do have concern about out-of-tree uses breaking with this and leading to unnecessary API churn, so for that purpose we would potentially need to keep the legacy mode. So overall, I think updating the documentation seems reasonable. I'll update the review for that.

jurahul abandoned this revision.Jul 15 2020, 9:54 AM

Abandoning this one, will start a new one for the documentation changes and some simplification to the code as well.

Thanks. I do think the strict behavior you describe here is a little bit better, but it's really hard to completely get there (not least because format strings often aren't tested...).
Formalizing the existing behavior is better than the halfway state.