This is an archive of the discontinued LLVM Phabricator instance.


Differential D81667

[FileCheck] Add precision to format specifier
ClosedPublic

Authored by thopre on Jun 11 2020, 8:42 AM.


Details

Reviewers

jhenderson
jdenny
probinson
grimar
arichardson

Commits

rG998709b7d553: [FileCheck] Add precision to format specifier

Summary

Add a printf-style precision specifier that pads numbers to a given number of
digits when matching them, if the value has fewer digits than the given
precision. This works both with an empty numeric expression (e.g. a variable
definition from input) and when matching a numeric expression. The
syntax is as follows:

[[#%.<precision><format specifier>, ...]]

where <format specifier> is optional and ... may or may not contain a variable
definition, and may or may not contain an expression. In the absence of a
precision specifier, a variable definition will accept leading zeros.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

thopre created this revision. Jun 11 2020, 8:42 AM

Herald added a subscriber: hiraditya. Jun 11 2020, 8:42 AM

Harbormaster failed remote builds in B59987: Diff 270159! Jun 11 2020, 10:28 AM

I think I agree with your conclusions. More generally, I think we should be permissive, where permissiveness is not going to be surprising (i.e. no explicit format specifier seems reasonable in the general context), and should follow scanf style format specifiers where reasonable. If I follow it right, it would therefore be possible to specify a 16-digit hex field with %.16x, right? Could you clarify what the motivation of the "with empty expression" bit is for? Is that just because when there is an empty expression, your regex is incorrect, or something else?

The code change in general seems simple enough to support the proposal too, though I haven't reviewed it in detail. I'll wait until you've added documentation/tests etc, so that I can review it all at once.

In D81667#2092260, @jhenderson wrote:

I think I agree with your conclusions. More generally, I think we should be permissive, where permissiveness is not going to be surprising (i.e. no explicit format specifier seems reasonable in the general context), and should follow scanf style format specifiers where reasonable. If I follow it right, it would therefore be possible to specify a 16-digit hex field with %.16x, right? Could you clarify what the motivation of the "with empty expression" bit is for? Is that just because when there is an empty expression, your regex is incorrect, or something else?

When only using a numeric expression, numeric substitution blocks behave like printf: they print a value as text to be matched against the input. When defining a variable with an empty expression (the majority of definition cases), they behave more like scanf. Only printf supports a precision in its syntax; scanf doesn't. This is the main reason why I ask the question. It is also the case that we currently allow leading zeros when matching an unknown numeric value for a numeric variable definition with an empty expression.

I think the case of a variable defined from an expression is a special case since you are matching something specific so in itself doesn't mandate extending the precision to variable definition with empty expression. However I think allowing a precision when matching an unknown variable is both useful and makes for syntax consistency.

To answer your earlier question, yes it'll be possible to match a 16-digit hex with #%.16x,VAR1: or an 8-digit hex with #%.8x, VAR2:. However #%.8, VAR1 will print all 16 digits of VAR1, same as printf. Does that seem reasonable or should we deviate from printf and give an error in such a case?
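As a point of comparison for the printf analogy above, here is a minimal sketch of how printf itself treats a precision on an integer conversion: the precision is a minimum digit count, so narrower values are zero-padded and wider values are printed in full. The helper name `formatHexWithPrecision` is hypothetical and only wraps `std::snprintf`; it is not FileCheck code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of FileCheck): format Value as lowercase hex
// with a printf precision, i.e. the equivalent of "%.<Precision>llx".
// This sketch assumes 0 <= Precision < 64 so the result fits in the buffer.
std::string formatHexWithPrecision(unsigned long long Value, int Precision) {
  assert(Precision >= 0 && Precision < 64 && "sketch only handles small precisions");
  char Buffer[64];
  // "%.*llx" reads the precision from the int argument preceding the value.
  int Written = std::snprintf(Buffer, sizeof(Buffer), "%.*llx", Precision, Value);
  assert(Written >= 0 && Written < static_cast<int>(sizeof(Buffer)));
  (void)Written; // Silence unused-variable warnings when asserts are disabled.
  return std::string(Buffer);
}
```

With a precision of 8, the value 0x1234 comes out as `00001234`, while 0xFFFFFFFF12345678 keeps all 16 of its digits, which is the printf behaviour referred to in the comment above.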

The code change in general seems simple enough to support the proposal too, though I haven't reviewed it in detail. I'll wait until you've added documentation/tests etc, so that I can review it all at once.

jhenderson mentioned this in D81144: [MC] Generate .debug_line in the 64-bit DWARF format [2/7]. Jun 16 2020, 12:15 AM

In D81667#2092827, @thopre wrote:

In D81667#2092260, @jhenderson wrote:

I think I agree with your conclusions. More generally, I think we should be permissive, where permissiveness is not going to be surprising (i.e. no explicit format specifier seems reasonable in the general context), and should follow scanf style format specifiers where reasonable. If I follow it right, it would therefore be possible to specify a 16-digit hex field with %.16x, right? Could you clarify what the motivation of the "with empty expression" bit is for? Is that just because when there is an empty expression, your regex is incorrect, or something else?

When only using a numeric expression, numeric substitution blocks behave like printf: they print a value as text to be matched against the input. When defining a variable with an empty expression (the majority of definition cases), they behave more like scanf. Only printf supports a precision in its syntax; scanf doesn't. This is the main reason why I ask the question. It is also the case that we currently allow leading zeros when matching an unknown numeric value for a numeric variable definition with an empty expression.

I think the case of a variable defined from an expression is a special case since you are matching something specific so in itself doesn't mandate extending the precision to variable definition with empty expression. However I think allowing a precision when matching an unknown variable is both useful and makes for syntax consistency.

To answer your earlier question, yes it'll be possible to match a 16-digit hex with #%.16x,VAR1: or an 8-digit hex with #%.8x, VAR2:. However #%.8, VAR1 will print all 16 digits of VAR1, same as printf. Does that seem reasonable or should we deviate from printf and give an error in such a case?

I think that seems reasonable to me overall. Thanks for explaining.

ikudrin added a subscriber: ikudrin. Jun 16 2020, 4:33 AM

MaskRay added a subscriber: MaskRay. Jun 16 2020, 9:53 PM

MaskRay added inline comments.

llvm/lib/Support/FileCheckImpl.h
56	Prefer default member initializer (`unsigned Precision = 0;`)

Should the regex wildcard for a numeric variable definition with empty expression also respect the precision, i.e. #%.5u, VAR2: would be matched by (([1-9][0-9]+)? [0-9]{1,5})

I believe I followed the comments about matching behavior for an empty expression (scanf-like) vs. an expression (printf-like). So the above question is about whether, in the empty-expression case, it's worthwhile to support a precision specified by . even though scanf does not support that. Right?

I don't understand the above regex due to the space character after the ?. Was that intended?

Can you give some example inputs and explain the intended matching behavior for #%.5u, VAR2:? Why is this behavior needed in FileCheck but not in scanf?

thopre edited the summary of this revision. Jun 17 2020, 9:53 AM

In D81667#2098443, @jdenny wrote:

Should the regex wildcard for a numeric variable definition with empty expression also respect the precision, i.e. #%.5u, VAR2: would be matched by (([1-9][0-9]+)? [0-9]{1,5})

I believe I followed the comments about matching behavior for an empty expression (scanf-like) vs. an expression (printf-like). So the above question is about whether, in the empty-expression case, it's worthwhile to support a precision specified by . even though scanf does not support that. Right?

Correct.

I don't understand the above regex due to the space character after the ?. Was that intended?

No, fixed now.

Can you give some example inputs and explain the intended matching behavior for #%.5u, VAR2:? Why is this behavior needed in FileCheck but not in scanf?

Say the directive is:

CHECK: Address #%.8x,ADDR: is aligned

and the input text is:

Address 12345678 is aligned

I'd expect the directive to match and the value in ADDR to be 0x12345678. Now if the input text was:

Address FFFFFFFF12345678

I'd expect the directive to fail. If the directive was #%x, ADDR: the first input would have led to the same outcome, but the second input would have led to the directive matching and the value in ADDR being 0xFFFFFFFF12345678.

Besides whether this is a useful feature, it makes for easier parsing and consistency in the syntax (no difference between variables defined from an expression where the precision would be allowed and variables defined from an empty expression where precision would not be allowed).

In D81667#2098537, @thopre wrote:

In D81667#2098443, @jdenny wrote:

Can you give some example inputs and explain the intended matching behavior for #%.5u, VAR2:? Why is this behavior needed in FileCheck but not in scanf?

Besides whether this is a useful feature, it makes for easier parsing and consistency in the syntax (no difference between variables defined from an expression where the precision would be allowed and variables defined from an empty expression where precision would not be allowed).

I forgot to mention that scanf doesn't need this because it's separate from printf (weaker need for consistency) and I guess aims at parsing some value more than checking format.

I don't understand the above regex due to the space character after the ?. Was that intended?

No, fixed now.

It now says #%.5u, VAR2: matches (([1-9][0-9]+)?[0-9]{1,5}), but that matches 123456789. I think that's unintended.

Can you give some example inputs and explain the intended matching behavior for #%.5u, VAR2:? Why is this behavior needed in FileCheck but not in scanf?

Say the directive is:

CHECK: Address #%.8x,ADDR: is aligned

and the input text is:

Address 12345678 is aligned

I'd expect the directive to match and the value in ADDR to be 0x12345678. Now if the input text was:

Address FFFFFFFF12345678

I'd expect the directive to fail.

You mean fail to match and continue searching? Or fail immediately?

So, %.8x is a maximum? For printf, it's a minimum. scanf's %8x (no .) feels more like what you're going for except that it discards additional digits instead of failing to match.

In D81667#2098625, @jdenny wrote:

I don't understand the above regex due to the space character after the ?. Was that intended?

No, fixed now.

It now says #%.5u, VAR2: matches (([1-9][0-9]+)?[0-9]{1,5}), but that matches 123456789. I think that's unintended.

Can you give some example inputs and explain the intended matching behavior for #%.5u, VAR2:? Why is this behavior needed in FileCheck but not in scanf?

Say the directive is:

CHECK: Address #%.8x,ADDR: is aligned

and the input text is:

Address 12345678 is aligned

I'd expect the directive to match and the value in ADDR to be 0x12345678. Now if the input text was:

Address FFFFFFFF12345678

I'd expect the directive to fail.

You mean fail to match and continue searching? Or fail immediately?

So, %.8x is a maximum? For printf, it's a minimum. scanf's %8x (no .) feels more like what you're going for except that it discards additional digits instead of failing to match.

My bad, my example was completely wrong. My personal motivation is consistency in the syntax. New example:

I'd expect 0x[[#%.8x, ADDR:]] to match 00001234 or FFFFFFFF12345678 but not 1234 due to there not being enough digits. I guess it could be useful to check alignment in a tool but as I said my main motivation is keeping a common format specifier syntax for all numeric substitution blocks. Note that my regex was indeed wrong anyway, it should be (([1-9][0-9]+)?[0-9]{5}).

thopre edited the summary of this revision. Jun 17 2020, 10:57 AM

I'd expect 0x[[#%.8x, ADDR:]] to match 00001234 or FFFFFFFF12345678 but not 1234 due to there not being enough digits.

OK, it would expect a value that could have been printed by printf with %.8x.

I guess it could be useful to check alignment in a tool but as I said my main motivation is keeping a common format specifier syntax for all numeric substitution blocks. Note that my regex was indeed wrong anyway, it should be (([1-9][0-9]+)?[0-9]{5}).

I think you want + to be * to permit 123456.

What would happen on 012345? Would it match 01234 and leave 5 for a later directive, or would FileCheck fail immediately?
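The three candidate wildcards for #%.5u discussed so far can be compared with a small sketch. It uses `std::regex` as a stand-in for FileCheck's own regex engine, which is an assumption: the two engines may differ in other cases. The helper names `matchesWhole` and `firstMatch` are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The candidate wildcards for #%.5u that appear in the discussion.
const char *const AtMostFive = "([1-9][0-9]+)?[0-9]{1,5}";
const char *const PlusVariant = "([1-9][0-9]+)?[0-9]{5}";
const char *const StarVariant = "([1-9][0-9]*)?[0-9]{5}";

// Hypothetical helper: does Pattern match the whole of Text?
bool matchesWhole(const std::string &Pattern, const std::string &Text) {
  assert(!Pattern.empty() && "expected a non-empty pattern");
  return std::regex_match(Text, std::regex(Pattern));
}

// Hypothetical helper: the first substring of Text matched by Pattern,
// or an empty string if nothing matches.
std::string firstMatch(const std::string &Pattern, const std::string &Text) {
  std::smatch Match;
  if (std::regex_search(Text, Match, std::regex(Pattern)))
    return Match.str();
  return "";
}
```

Under these assumptions, `AtMostFive` accepts all of 123456789, `PlusVariant` rejects the six-digit 123456, and `StarVariant` accepts 123456 and 00042 but not 1234. Searching 012345 with `StarVariant` yields 01234, leaving the trailing 5 unmatched, which is the situation the last question asks about.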

In D81667#2098944, @jdenny wrote:

I'd expect 0x[[#%.8x, ADDR:]] to match 00001234 or FFFFFFFF12345678 but not 1234 due to there not being enough digits.

OK, it would expect a value that could have been printed by printf with %.8x.

FWIW, this is what I'm imagining the overall behaviour to be. If printf could have produced the output for a given format specifier, we should accept it, and conversely if it can't produce the output for a given format specifier, we shouldn't accept it.

I'm not sure whether we should consume all digits before applying the precision check or not though. I can see benefits for either side.

In D81667#2100049, @jhenderson wrote:

In D81667#2098944, @jdenny wrote:

I'd expect 0x[[#%.8x, ADDR:]] to match 00001234 or FFFFFFFF12345678 but not 1234 due to there not being enough digits.

OK, it would expect a value that could have been printed by printf with %.8x.

FWIW, this is what I'm imagining the overall behaviour to be. If printf could have produced the output for a given format specifier, we should accept it, and conversely if it can't produce the output for a given format specifier, we shouldn't accept it.

I'm not sure whether we should consume all digits before applying the precision check or not though. I can see benefits for either side.

We currently accept numbers with leading zeroes but printf would not produce those without a precision. Should we start by fixing this then?

In D81667#2100079, @thopre wrote:

In D81667#2100049, @jhenderson wrote:

In D81667#2098944, @jdenny wrote:

I'd expect 0x[[#%.8x, ADDR:]] to match 00001234 or FFFFFFFF12345678 but not 1234 due to there not being enough digits.

OK, it would expect a value that could have been printed by printf with %.8x.

FWIW, this is what I'm imagining the overall behaviour to be. If printf could have produced the output for a given format specifier, we should accept it, and conversely if it can't produce the output for a given format specifier, we shouldn't accept it.

I'm not sure whether we should consume all digits before applying the precision check or not though. I can see benefits for either side.

We currently accept numbers with leading zeroes but printf would not produce those without a precision. Should we start by fixing this then?

I think we need leading zeros to be accepted until we have an alternative in place. Otherwise, there may be existing tests that rely on the current behaviour which we can't migrate. I think that means a rough order of: 1) Add precision support; 2) Migrate existing tests to use it where needed; 3) Stop accepting leading zeros except via precision. 2) and 3) can probably be done at the same time. We should only do them as part of 1) if it's harder to keep them separate, in my opinion.

I want to raise one point. Some people may expect the format specifier to be similar to scanf, instead of printf. scanf uses similar but less powerful format specifiers than printf. For instance, . is not valid in scanf. %.4u should fail (though glibc appears to do weird things; musl is good). In scanf, %4u reads at most 4 digits, not exactly 4 digits. The only way is %4c plus a conversion -> this is certainly not suitable in FileCheck. Anyway %.4u still looks good to me.

If no variable is captured, is the syntax [[#%.4u:]]?
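The scanf field-width behaviour mentioned above can be checked with a short sketch: `%4u` consumes at most four digits and does not fail on shorter input. The helper name `readAtMostFourDigits` is hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: parse Text with scanf's "%4u" and return the value,
// or -1 if no conversion took place.
long readAtMostFourDigits(const char *Text) {
  assert(Text && "expected a non-null string");
  unsigned Value = 0;
  // A field width of 4 is a maximum: extra digits are left unread.
  if (std::sscanf(Text, "%4u", &Value) != 1)
    return -1;
  return static_cast<long>(Value);
}
```

Given 123456 this returns 1234, and given 12 it returns 12, so the width neither rejects longer input nor requires a minimum number of digits.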

In D81667#2102317, @MaskRay wrote:

I want to raise one point. Some people may expect the format specifier to be similar to scanf, instead of printf. scanf uses similar but less powerful format specifiers than printf. For instance, . is not valid in scanf. %.4u should fail (though glibc appears to do weird things; musl is good). In scanf, %4u reads at most 4 digits, not exactly 4 digits. The only way is %4c plus a conversion -> this is certainly not suitable in FileCheck. Anyway %.4u still looks good to me.

That's exactly the point of the second question in the description. Capturing a variable feels more like scanf but I think a unified syntax makes more sense. This is where we need to diverge from the printf/scanf analogy. Since the accepted format is defined explicitly in the documentation I don't think it's a big problem.

If no variable is captured, is the syntax [[#%.4u:]]?

It would be #%.4u or simply #%.4 since u is the default format specifier.

In D81667#2103464, @thopre wrote:

In D81667#2102317, @MaskRay wrote:

I want to raise one point. Some people may expect the format specifier to be similar to scanf, instead of printf. scanf uses similar but less powerful format specifiers than printf. For instance, . is not valid in scanf. %.4u should fail (though glibc appears to do weird things; musl is good). In scanf, %4u reads at most 4 digits, not exactly 4 digits. The only way is %4c plus a conversion -> this is certainly not suitable in FileCheck. Anyway %.4u still looks good to me.

That's exactly the point of the second question in the description. Capturing a variable feels more like scanf but I think a unified syntax makes more sense. This is where we need to diverge from the printf/scanf analogy. Since the accepted format is defined explicitly in the documentation I don't think it's a big problem.

If no variable is captured, is the syntax [[#%.4u:]]?

It would be #%.4u or simply #%.4 since u is the default format specifier.

Nice. [[#%.4u]] (non-capturing) and [[#%.4u,ADDR:]] (capturing) looks good to me. Might be worth noting that it is not a scanf-supported specifier.

Finish implementation based on consensus reached on questions raised by the proof of concept version.

Harbormaster completed remote builds in B67601: Diff 284150. Aug 8 2020, 4:19 PM

Add example of precision in documentation

Harbormaster completed remote builds in B67647: Diff 284233. Aug 9 2020, 2:19 PM

Functionality looks reasonable, although I haven't checked the testing yet.

llvm/docs/CommandGuide/FileCheck.rst
738 ↗	(On Diff #284233)	If we expand this out, the full syntax is apparently `[[#%.<precision><precision><conversion specifier>,<NUMVAR:]]`, which I don't think is what you mean :-)
742–743 ↗	(On Diff #284233)	Should we say something about leading zeros beyond those required by the precision value?
746 ↗	(On Diff #284233)	Nit: There's a double space after "to".
765 ↗	(On Diff #284233)
llvm/lib/Support/FileCheck.cpp
47	`StringRef`?
737	Can you fix the case of `fmtloc` whilst you're modifying this line, please?
llvm/lib/Support/FileCheckImpl.h
89

Address most comments

llvm/docs/CommandGuide/FileCheck.rst
742–743 ↗	(On Diff #284233)	Is that what you expected?
llvm/lib/Support/FileCheck.cpp
47	ostringstream below does not understand StringRef so I would need to do .str() which can be expensive. Any reason not to keep const char*?

grimar added inline comments. Aug 11 2020, 3:23 AM

llvm/lib/Support/FileCheck.cpp
59	Perhaps, it might be simpler just to merge switches and write the logic here as:

    Expected<std::string> ExpressionFormat::getWildcardRegex() const {
      if (Value == Kind::NoFormat)
        return createStringError(std::errc::invalid_argument,
                                 "trying to match value with invalid format");
      switch (Value) {
      case Kind::Unsigned:
        if (Precision)
          return ("-?([1-9][0-9]*)?[0-9]{" + Twine(Precision) + "}").str();
        return std::string("[0-9]+");
      case Kind::Signed:
         ...
      default:
        llvm_unreachable("....");
      }
    }
68	Seems you should be able to do the following instead? return (RegexPrefix + Twine(Precision) + "}").str();
699	Use `trim`? FormatExpr.trim(SpaceChars)
llvm/unittests/Support/FileCheckTest.cpp
165 ↗	(On Diff #284233)	This will fail if `NumStr` is empty. Is it OK (I guess so), though perhaps a bit cleaner would be to use `StringRef::startswith`.
171 ↗	(On Diff #284233)	PaddedStr = "-";

Address more review comments

thopre added inline comments. Aug 11 2020, 3:40 AM

llvm/lib/Support/FileCheck.cpp
59	I'm not a big fan of repeating the formatting logic for the Precision case so I've kept that bit as is. What do you think of the result?

grimar added inline comments. Aug 11 2020, 3:56 AM

llvm/lib/Support/FileCheck.cpp
59	I see 2 possible improvements:

1) When you have a dedicated `RegexPrefix` variable, you postpone the return and have to add `break`s everywhere. If you just do not want to repeat the formatting logic, I'd suggest adding a little helper. E.g.:

    auto CreatePrecisionRegex = [](StringRef S) -> std::string {
      return (S + Twine(Precision) + "}").str();
    };

    switch (Value) {
    case Kind::Unsigned:
      if (Precision)
        return CreatePrecisionRegex("-?([1-9][0-9]*)?[0-9]{");
      return std::string("[0-9]+");
    default:
      llvm_unreachable("ddd");
    }

The main benefit is that you can return early and avoid having one more variable.

2) Perhaps it doesn't make much sense to use `createStringError` for the `default` case? It is unreachable now and can't be tested either (I believe). So I'd either remove the `if (Value == Kind::NoFormat)` block and handle the error in the `default`, like you initially did, or keep it and switch to using `llvm_unreachable` in `default`.

grimar added inline comments. Aug 11 2020, 3:58 AM

llvm/lib/Support/FileCheck.cpp
59	Oh, and for `1)` there is no need to use `-> std::string`:

    auto CreatePrecisionRegex = [](StringRef S) {
      return (S + Twine(Precision) + "}").str();
    };

Harbormaster completed remote builds in B67868: Diff 284632. Aug 11 2020, 4:04 AM

Harbormaster completed remote builds in B67870: Diff 284635. Aug 11 2020, 4:09 AM

Add review comments

thopre added inline comments. Aug 11 2020, 4:23 AM

llvm/lib/Support/FileCheck.cpp
59	Ah yes, I started doing it your way and changed in the middle. I'll remove the top if block

Harbormaster completed remote builds in B67874: Diff 284647. Aug 11 2020, 4:25 AM

jdenny added inline comments. Aug 11 2020, 9:26 AM

llvm/docs/CommandGuide/FileCheck.rst
732 ↗	(On Diff #284647)	"`%<fmtspec>` is an optional" -> "`%<fmtspec>,` is an optional"? That is, you must either have `%<fmtspec>` and `,` or neither, right? "the what" -> "what"
733 ↗	(On Diff #284647)	"how many leading zeros" -> "how many digits" given that you can directly specify the latter (as a minimum) but not the former?
758 ↗	(On Diff #284647)	`IMM`->`ADDR` The documentation above says 8 is the minimum, but `F0F0` has 4 digits.
761 ↗	(On Diff #284647)	Isn't `:` supposed to be `,`? That's how the tests seem to work, and FileCheck complains when I try this syntax with `:`.
769 ↗	(On Diff #284647)	"variable" -> "variables,"
776 ↗	(On Diff #284647)	When `<expr>` is empty (here or in the variable definition syntax), then the precision specifier specifies the minimum number of digits to be matched, right? When `<expr>` is non-empty, then the precision specifier combined with the actual value of the expression specifies an exact number of digits to be matched, right? I understand that the precision is a minimum here too, but I think it's a printing/substitution minimum not a matching/capturing minimum. My point is that this case is a bit hard to follow. It seems to me that the numeric substitution syntax with no `<expr>` is actually more like a variable definition syntax with no variable (and thus no `:`): there's no existing value to match against, so there's nothing to "substitute". Instead you're capturing a new value and either saving it as a variable or discarding it. Can we document it that way? If so, instead of calling the first syntax "The syntax to define a numeric variable", you might call it "The syntax to capture a numeric value". It can optionally define a numeric variable.

jhenderson added inline comments. Aug 13 2020, 2:10 AM

llvm/docs/CommandGuide/FileCheck.rst
742–743 ↗	(On Diff #284647)	I think that is much simpler.
llvm/lib/Support/FileCheck.cpp
46	This doesn't compile. I don't think you can use `->` in a capture list. You just need to specify `this` and then use appropriately below.
llvm/test/FileCheck/numeric-expression.txt
147 ↗	(On Diff #284647)	Same goes elsewhere.
llvm/unittests/Support/FileCheckTest.cpp
142 ↗	(On Diff #284647)	I think you could simplify this code by starting with `std::string ExtendedInput = Input;` and then just using `ExtendedInput` in the checks below.
153–162 ↗	(On Diff #284647)	It sounds to me like this is really just two completely different functions. I'd recommend splitting.

Address all remaining review comments

llvm/docs/CommandGuide/FileCheck.rst
776 ↗	(On Diff #284647)	I like the idea of distinguishing between capturing a value and substituting a value. Good call.
llvm/lib/Support/FileCheck.cpp
46	It's what I found out before I submitted this diff, I must have forgotten to undo the change. Sorry about that.
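The capture-list point raised for FileCheck.cpp line 46 can be illustrated with a standalone sketch. `FormatSketch` is a hypothetical stand-in that uses `std::string` instead of LLVM's `StringRef`, `Twine` and `Expected`. The decimal patterns follow the shape discussed in this review; the hex patterns are my assumption, extending the same shape, and none of this is presented as the committed code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ExpressionFormat, reduced to the regex logic.
class FormatSketch {
public:
  enum class Kind { Unsigned, Signed, HexUpper, HexLower };

  explicit FormatSketch(Kind K, unsigned P = 0) : Value(K), Precision(P) {}

  std::string getWildcardRegex() const {
    // Precision is a data member, so the lambda must capture `this`.
    // An empty capture list, or `this->Precision` in the list, does not compile.
    auto CreatePrecisionRegex = [this](const std::string &Prefix) {
      return Prefix + "{" + std::to_string(Precision) + "}";
    };

    switch (Value) {
    case Kind::Unsigned:
      if (Precision)
        return CreatePrecisionRegex("([1-9][0-9]*)?[0-9]");
      return "[0-9]+";
    case Kind::Signed:
      if (Precision)
        return CreatePrecisionRegex("-?([1-9][0-9]*)?[0-9]");
      return "-?[0-9]+";
    case Kind::HexUpper:
      if (Precision)
        return CreatePrecisionRegex("([1-9A-F][0-9A-F]*)?[0-9A-F]");
      return "[0-9A-F]+";
    case Kind::HexLower:
      if (Precision)
        return CreatePrecisionRegex("([1-9a-f][0-9a-f]*)?[0-9a-f]");
      return "[0-9a-f]+";
    }
    assert(false && "unknown format kind");
    return "";
  }

private:
  Kind Value;
  unsigned Precision;
};
```

Returning early from each case avoids the extra prefix variable and the `break` statements mentioned earlier in the review, while the helper keeps the `{<precision>}` suffix in one place.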

Harbormaster completed remote builds in B68258: Diff 285356. Aug 13 2020, 7:28 AM

thopre retitled this revision from [RFC, FileCheck] Add precision to format specifier to [FileCheck] Add precision to format specifier. Aug 19 2020, 8:49 AM

I think this is basically ready now, barring my example comment.

llvm/docs/CommandGuide/FileCheck.rst
754–757 ↗	(On Diff #285356)	If this example is meant to demonstrate the precision as well as conversion, it probably makes sense to say something like "but would not match `mov r5, 0x00F0F0FEFE`" and/or change the example to `mov r5, 0x0000F0F0`, so that it shows the precision behaviour.

Better demonstrate precision in documentation

llvm/docs/CommandGuide/FileCheck.rst
754–757 ↗	(On Diff #285356)	Good point.

LGTM, but best wait for someone else to confirm too.

This revision is now accepted and ready to land. Aug 20 2020, 1:43 AM

Harbormaster completed remote builds in B68991: Diff 286738. Aug 20 2020, 2:21 AM

In D81667#2227851, @jhenderson wrote:

LGTM, but best wait for someone else to confirm too.

Ping anyone else?

I've debugged this and it LGTM.
Have a few minor suggestions about the code (up to you).

llvm/lib/Support/FileCheck.cpp
47	Perhaps, a bit cleaner would be to add the "{" right here.
73	You can just use the value you have already.
92	You can combine these cases I think:

    case Kind::HexUpper:
    case Kind::HexLower:
      AbsoluteValueStr = utohexstr(AbsoluteValue, Value == Kind::HexLower);
      break;
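The value-to-text direction that these suggestions touch (convert the absolute value, then pad to the precision and prepend the sign) can be sketched in standalone C++. The names `NumberKind`, `toDigits` and `getMatchingStringSketch` are hypothetical, and error handling is reduced to asserts; this is an illustration of the padding rule under those assumptions, not the code under review.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class NumberKind { Unsigned, Signed, HexUpper, HexLower };

// Hypothetical helper: render AbsoluteValue in base 10 or 16.
std::string toDigits(std::uint64_t AbsoluteValue, unsigned Base, bool LowerCase) {
  assert((Base == 10 || Base == 16) && "only decimal and hex are sketched");
  const char *Digits = LowerCase ? "0123456789abcdef" : "0123456789ABCDEF";
  std::string Result;
  do {
    Result.insert(Result.begin(), Digits[AbsoluteValue % Base]);
    AbsoluteValue /= Base;
  } while (AbsoluteValue != 0);
  return Result;
}

// Hypothetical sketch: sign prefix, then zero padding up to Precision digits.
std::string getMatchingStringSketch(std::int64_t Value, NumberKind Format,
                                    unsigned Precision) {
  bool Negative = Value < 0;
  assert((!Negative || Format == NumberKind::Signed) &&
         "negative values are only sketched for the signed format");
  // Negate in unsigned arithmetic so INT64_MIN does not overflow.
  std::uint64_t AbsoluteValue = Negative ? 0 - static_cast<std::uint64_t>(Value)
                                         : static_cast<std::uint64_t>(Value);

  std::string AbsoluteValueStr;
  switch (Format) {
  case NumberKind::Unsigned:
  case NumberKind::Signed:
    AbsoluteValueStr = toDigits(AbsoluteValue, 10, /*LowerCase=*/false);
    break;
  case NumberKind::HexUpper:
  case NumberKind::HexLower:
    // Both hex kinds share one branch, as in the suggestion above.
    AbsoluteValueStr =
        toDigits(AbsoluteValue, 16, Format == NumberKind::HexLower);
    break;
  }

  // The precision is a minimum: pad only when there are too few digits.
  std::string Padding;
  if (Precision > AbsoluteValueStr.size())
    Padding.assign(Precision - AbsoluteValueStr.size(), '0');
  return (Negative ? "-" : "") + Padding + AbsoluteValueStr;
}
```

The padding goes between the sign and the digits, so -42 with a precision of 5 becomes `-00042`, and a value that already has more digits than the precision is left unchanged.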

Herald added a subscriber: danielkiss. Aug 30 2020, 2:39 AM

Closed by commit rG998709b7d553: [FileCheck] Add precision to format specifier (authored by thopre). Aug 30 2020, 11:40 AM

This revision was automatically updated to reflect the committed changes.

thopre marked 3 inline comments as done.

thopre added a commit: rG998709b7d553: [FileCheck] Add precision to format specifier.

Revision Contents

Path                                Size
llvm/lib/Support/FileCheck.cpp      56 lines
llvm/lib/Support/FileCheckImpl.h    12 lines

Diff 270159

llvm/lib/Support/FileCheck.cpp

   case Kind::HexUpper:
     return StringRef("%X");
   case Kind::HexLower:
     return StringRef("%x");
   }
   llvm_unreachable("unknown expression format");
 }
 Expected<StringRef> ExpressionFormat::getWildcardRegex() const {
   switch (Value) {

jhenderson:

This doesn't compile. I don't think you can use `->` in a capture list. You just need to specify `this` and then use appropriately below.

thopre (author):

It's what I found out before I submitted this diff, I must have forgotten to undo the change. Sorry about that.

   case Kind::Unsigned:

jhenderson:

`StringRef`?

thopre (author):

ostringstream below does not understand StringRef so I would need to do .str() which can be expensive. Any reason not to keep const char*?

grimar:

   auto CreatePrecisionRegex = [this](StringRef S) {
-    return (S + Twine(Precision) + "}").str();
+    return (S + "{" + Twine(Precision) + "}").str();
   };
   switch (Value) {

Perhaps, a bit cleaner would be to add the "{" right here.

     return StringRef("[0-9]+");
   case Kind::Signed:
     return StringRef("-?[0-9]+");
   case Kind::HexUpper:
     return StringRef("[0-9A-F]+");
   case Kind::HexLower:
     return StringRef("[0-9a-f]+");
   default:
     return createStringError(std::errc::invalid_argument,
                              "trying to match value with invalid format");
   }

grimar:

Perhaps, it might be simpler just to merge switches and write the logic here as:

Expected<std::string> ExpressionFormat::getWildcardRegex() const {
  if (Value == Kind::NoFormat)
    return createStringError(std::errc::invalid_argument,
                             "trying to match value with invalid format");
  switch (Value) {
  case Kind::Unsigned:
    if (Precision)
      return ("-?([1-9][0-9]*)?[0-9]{" + Twine(Precision) + "}").str();
    return std::string("[0-9]+");
  case Kind::Signed:
     ...
  default:
    llvm_unreachable("....");
  }
}

thopre (author):

I'm not a big fan of repeating the formatting logic for the Precision case so I've kept that bit as is. What do you think of the result?

grimar:

I see 2 possible improvements:

1) When you have a dedicated `RegexPrefix` variable, you postpone the return and have to add `break`s everywhere. If you just do not want to repeat the formatting logic, I'd suggest adding a little helper. E.g.:

auto CreatePrecisionRegex = [](StringRef S) -> std::string {
  return (S + Twine(Precision) + "}").str();
};

switch (Value) {
case Kind::Unsigned:
  if (Precision)
    return CreatePrecisionRegex("-?([1-9][0-9]*)?[0-9]{");
  return std::string("[0-9]+");
default:
  llvm_unreachable("ddd");
}

The main benefit is that you can return early and avoid having one more variable.

2) Perhaps it doesn't make much sense to use `createStringError` for the `default` case? It is unreachable now and can't be tested either (I believe).

So I'd either remove the `if (Value == Kind::NoFormat)` block and handle the error in the `default`, like you initially did, or keep it and switch to using `llvm_unreachable` in `default`.

grimar:

Oh, and for `1)` there is no need to use `-> std::string`:

auto CreatePrecisionRegex = [](StringRef S) {
  return (S + Twine(Precision) + "}").str();
};

thopre (author):

Ah yes, I started doing it your way and changed in the middle. I'll remove the top if block

 Expected<std::string>
 ExpressionFormat::getMatchingString(ExpressionValue IntegerValue) const {
+  uint64_t AbsoluteValue;
+  StringRef SignPrefix = IntegerValue.isNegative() ? "-" : "";
   if (Value == Kind::Signed) {
     Expected<int64_t> SignedValue = IntegerValue.getSignedValue();
     if (!SignedValue)

grimar (inline comment, marked Done):

Seems you should be able to do the following instead?

return (RegexPrefix + Twine(Precision) + "}").str();


        return SignedValue.takeError();
-     return itostr(*SignedValue);
-   }
+     if (*SignedValue < 0)
+       AbsoluteValue = cantFail(IntegerValue.getAbsolute().getUnsignedValue());
+     else
+       AbsoluteValue = cantFail(IntegerValue.getSignedValue());

grimar (inline comment, marked Done):

else

- AbsoluteValue = cantFail(IntegerValue.getSignedValue());

+ AbsoluteValue = *SignedValue;

} else {

You can just use the value you have already.
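The suggestion covers the non-negative branch, where the already-checked signed value converts to unsigned directly. The negative branch needs more care, because negating the most negative 64-bit value overflows in signed arithmetic. A standalone sketch of both branches, assuming a hypothetical absoluteValue helper that is not part of the patch or of LLVM:

```cpp
#include <cstdint>

// Hypothetical helper: magnitude of a signed 64-bit value as an unsigned one.
// The negation is done in unsigned arithmetic, which wraps by definition, so
// INT64_MIN is handled without signed overflow.
uint64_t absoluteValue(int64_t V) {
  if (V < 0)
    return uint64_t{0} - static_cast<uint64_t>(V);
  // Non-negative: reuse the value as-is, as suggested in the comment above.
  return static_cast<uint64_t>(V);
}
```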


+   } else {
      Expected<uint64_t> UnsignedValue = IntegerValue.getUnsignedValue();
      if (!UnsignedValue)
        return UnsignedValue.takeError();
+     AbsoluteValue = *UnsignedValue;
+   }
+   std::string AbsoluteValueStr;
    switch (Value) {
    case Kind::Unsigned:
-     return utostr(*UnsignedValue);
+   case Kind::Signed:
+     AbsoluteValueStr = utostr(AbsoluteValue);
+     break;
    case Kind::HexUpper:
-     return utohexstr(*UnsignedValue, /*LowerCase=*/false);
+     AbsoluteValueStr = utohexstr(AbsoluteValue, /*LowerCase=*/false);
+     break;
    case Kind::HexLower:
-     return utohexstr(*UnsignedValue, /*LowerCase=*/true);
+     AbsoluteValueStr = utohexstr(AbsoluteValue, /*LowerCase=*/true);
+     break;

grimar (inline comment, marked Done):

You can combine these cases I think:

case Kind::HexUpper:
case Kind::HexLower:
  AbsoluteValueStr = utohexstr(AbsoluteValue, Value == Kind::HexLower);
  break;


    default:
      return createStringError(std::errc::invalid_argument,
                               "trying to match value with invalid format");
    }
+   if (Precision > AbsoluteValueStr.size()) {
+     unsigned LeadingZeros = Precision - AbsoluteValueStr.size();
+     return (Twine(SignPrefix) + std::string(LeadingZeros, '0') +
+             AbsoluteValueStr)
+         .str();
+   }
+   return (Twine(SignPrefix) + AbsoluteValueStr).str();
  }
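The padding step at the end of getMatchingString can be illustrated with a standalone sketch. formatWithPrecision is a hypothetical helper written for this example, with a hand-rolled digit conversion standing in for utostr and utohexstr; it assumes Base is 10 or 16.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper, not an LLVM API: renders Magnitude in the given base
// (10 or 16), left-pads it with zeros up to Precision digits, and prepends
// '-' when Negative is set. The sign does not count towards the precision.
std::string formatWithPrecision(uint64_t Magnitude, bool Negative,
                                unsigned Base, bool LowerCase,
                                unsigned Precision) {
  const char *Digits = LowerCase ? "0123456789abcdef" : "0123456789ABCDEF";

  // Build the digit string from the least significant digit upwards.
  std::string Str;
  do {
    Str.insert(Str.begin(), Digits[Magnitude % Base]);
    Magnitude /= Base;
  } while (Magnitude != 0);

  // Pad only when the value has fewer digits than the precision; longer
  // values are left untouched.
  if (Precision > Str.size())
    Str = std::string(Precision - Str.size(), '0') + Str;

  return (Negative ? "-" : "") + Str;
}
```

For instance, formatWithPrecision(42, false, 10, false, 5) yields "00042", while a precision of 0 leaves the digits unpadded.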

  Expected<ExpressionValue>
  ExpressionFormat::valueFromStringRepr(StringRef StrVal,
                                        const SourceMgr &SM) const {
    bool ValueIsSigned = Value == Kind::Signed;
    StringRef OverflowErrorStr = "unable to represent numeric value";
    if (ValueIsSigned) {

[...] Expected<std::unique_ptr<Expression>> Pattern::parseNumericSubstitutionBlock(
    // Parse format specifier (NOTE: ',' is also an argument seperator).
    size_t FormatSpecEnd = Expr.find(',');
    size_t FunctionStart = Expr.find('(');
    if (FormatSpecEnd != StringRef::npos && FormatSpecEnd < FunctionStart) {
      Expr = Expr.ltrim(SpaceChars);
      if (!Expr.consume_front("%"))
        return ErrorDiagnostic::get(
            SM, Expr, "invalid matching format specification in expression");

grimar (inline comment, marked Done):

Use trim?

FormatExpr.trim(SpaceChars)


+     // Parse precision.
+     unsigned Precision = 0;
+     if (Expr.consume_front(".")) {
+       if (Expr.consumeInteger(10, Precision))
+         return ErrorDiagnostic::get(SM, Expr,
+                                     "invalid precision in format specifier");
+     }
      // Check for unknown matching format specifier and set matching format in
      // class instance representing this expression.
      SMLoc fmtloc = SMLoc::getFromPointer(Expr.data());
      switch (popFront(Expr)) {
      case 'u':
-       ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::Unsigned);
+       ExplicitFormat =
+           ExpressionFormat(ExpressionFormat::Kind::Unsigned, Precision);
        break;
      case 'd':
-       ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::Signed);
+       ExplicitFormat =
+           ExpressionFormat(ExpressionFormat::Kind::Signed, Precision);
        break;
      case 'x':
-       ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::HexLower);
+       ExplicitFormat =
+           ExpressionFormat(ExpressionFormat::Kind::HexLower, Precision);
        break;
      case 'X':
-       ExplicitFormat = ExpressionFormat(ExpressionFormat::Kind::HexUpper);
+       ExplicitFormat =
+           ExpressionFormat(ExpressionFormat::Kind::HexUpper, Precision);
        break;
      default:
        return ErrorDiagnostic::get(SM, fmtloc,
                                    "invalid format specifier in expression");
      }
      Expr = Expr.ltrim(SpaceChars);
      if (!Expr.consume_front(","))
        return ErrorDiagnostic::get(
            SM, Expr, "invalid matching format specification in expression");
    }
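The parsing order in the hunk above (a '%', then an optional '.' followed by a decimal precision, then the conversion character) can be sketched with the standard library alone. FormatSpec and parseFormatSpec are hypothetical names for this example; the sketch requires a conversion character, as this revision of the patch does, and ignores integer overflow in the precision for brevity. It needs C++17 for std::optional and std::string_view.

```cpp
#include <cctype>
#include <optional>
#include <string_view>

// Hypothetical result type: the conversion character and the parsed precision
// (0 when no precision was given).
struct FormatSpec {
  char Conversion;
  unsigned Precision;
};

// Parses "%[.<precision>]<u|d|x|X>" and returns std::nullopt on any
// malformed input.
std::optional<FormatSpec> parseFormatSpec(std::string_view Spec) {
  if (Spec.empty() || Spec.front() != '%')
    return std::nullopt;
  Spec.remove_prefix(1);

  unsigned Precision = 0;
  if (!Spec.empty() && Spec.front() == '.') {
    Spec.remove_prefix(1);
    // A '.' must be followed by at least one decimal digit.
    if (Spec.empty() || !std::isdigit(static_cast<unsigned char>(Spec.front())))
      return std::nullopt;
    while (!Spec.empty() &&
           std::isdigit(static_cast<unsigned char>(Spec.front()))) {
      Precision = Precision * 10 + static_cast<unsigned>(Spec.front() - '0');
      Spec.remove_prefix(1);
    }
  }

  // Exactly one conversion character must remain.
  if (Spec.size() != 1)
    return std::nullopt;
  switch (Spec.front()) {
  case 'u':
  case 'd':
  case 'x':
  case 'X':
    return FormatSpec{Spec.front(), Precision};
  default:
    return std::nullopt;
  }
}
```

With this sketch, "%.8x" parses to conversion 'x' with precision 8, and "%.x" is rejected because no digits follow the dot.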

jhenderson (inline comment, marked Done):

Can you fix the case of fmtloc whilst you're modifying this line, please?


    // Save variable definition expression if any.
    size_t DefEnd = Expr.find(':');
    if (DefEnd != StringRef::npos) {
      DefExpr = Expr.substr(0, DefEnd);
      Expr = Expr.substr(DefEnd + 1);
    }
[...]

llvm/lib/Support/FileCheckImpl.h

[...] enum class Kind {
      /// Value should be printed as an uppercase hex number.
      HexUpper,
      /// Value should be printed as a lowercase hex number.
      HexLower
    };

  private:
    Kind Value;
+   unsigned Precision;

MaskRay (inline comment, marked Done):

Prefer default member initializer (unsigned Precision = 0;)
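A default member initializer removes the need to repeat Precision(0) in every constructor that does not take a precision. The sketch below shows the idea on a reduced, hypothetical Format class; it mirrors the shape of ExpressionFormat but is not the LLVM class, and the getPrecision accessor exists only for the example.

```cpp
// Hypothetical stand-in for ExpressionFormat::Kind.
enum class Kind { NoFormat, Unsigned, Signed, HexUpper, HexLower };

class Format {
  Kind Value = Kind::NoFormat;
  // Default member initializer: used by every constructor that does not
  // mention Precision in its own initializer list.
  unsigned Precision = 0;

public:
  Format() = default;
  explicit Format(Kind Value) : Value(Value) {}
  Format(Kind Value, unsigned Precision)
      : Value(Value), Precision(Precision) {}

  // Formats are equal if neither is NoFormat and both kind and precision
  // match, following the comparison in the patch.
  bool operator==(const Format &Other) const {
    return Value != Kind::NoFormat && Value == Other.Value &&
           Precision == Other.Precision;
  }
  bool operator!=(const Format &Other) const { return !(*this == Other); }

  unsigned getPrecision() const { return Precision; }
};
```

Note that under this equality a NoFormat value never compares equal, even to itself, which matches the comparison shown in the diff.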


  public:
    /// Evaluates a format to true if it can be used in a match.
    explicit operator bool() const { return Value != Kind::NoFormat; }

    /// Define format equality: formats are equal if neither is NoFormat and
-   /// their kinds are the same.
+   /// their kinds and precision are the same.
    bool operator==(const ExpressionFormat &Other) const {
-     return Value != Kind::NoFormat && Value == Other.Value;
+     return Value != Kind::NoFormat && Value == Other.Value &&
+            Precision == Other.Precision;
    }

    bool operator!=(const ExpressionFormat &Other) const {
      return !(*this == Other);
    }

    bool operator==(Kind OtherValue) const { return Value == OtherValue; }

    bool operator!=(Kind OtherValue) const { return !(*this == OtherValue); }

    /// \returns the format specifier corresponding to this format as a string.
    StringRef toString() const;

-   ExpressionFormat() : Value(Kind::NoFormat){};
+   ExpressionFormat() : Value(Kind::NoFormat), Precision(0){};
-   explicit ExpressionFormat(Kind Value) : Value(Value){};
+   explicit ExpressionFormat(Kind Value) : Value(Value), Precision(0){};
+   explicit ExpressionFormat(Kind Value, unsigned Precision)
+       : Value(Value), Precision(Precision){};

    /// \returns a wildcard regular expression StringRef that matches any value
    /// in the format represented by this instance, or an error if the format is
    /// NoFormat.
    Expected<StringRef> getWildcardRegex() const;

jhenderson (inline comment, marked Done):

/// \returns a wildcard regular expression string that matches any value in

- /// the format represented by this instance and none other value, or an error

+ /// the format represented by this instance and no other value, or an error

/// if the format is NoFormat.


    /// \returns the string representation of \p Value in the format represented
    /// by this instance, or an error if conversion to this format failed or the
    /// format is NoFormat.
    Expected<std::string> getMatchingString(ExpressionValue Value) const;

    /// \returns the value corresponding to string representation \p StrVal
    /// according to the matching format represented by this instance or an error
    /// with diagnostic against \p SM if \p StrVal does not correspond to a valid
[...]
