This is an archive of the discontinued LLVM Phabricator instance.

Add `StringRef::consumeInteger`
ClosedPublic

Authored by zturner on Sep 20 2016, 12:50 PM.

Details

Summary

StringRef::getAsInteger() is our answer to the C-apis strtoll, strtoull, etc. Those functions have a signature that looks like this:

unsigned long long int strtoull (const char* str, char** endptr, int radix)

The second argument is (optionally) used to return the location in the input string where the returned number stopped. So, for example if you had a string such as `"23908234 Integers", then you could do the following:

const char * str = "23908234 Integers";
unsigned result = strtoul(str, &str, 10);
// str == " Integers";

However, regardless of the second argument, these functions only assume that the input string begins with a number, whereas our getAsInteger functions consider it a failure unless the entire string is a number.

LLDB depends on the ability to extract integers from the beginning of strings quite commonly, and there is no good solution to this in LLVM presently. So I'm introducing the consumeInteger() function which consumes a number from the beginning of the string, failing only if the string does not BEGIN with a valid number of the specified radix, and upon success modifying the original StringRef to chop off the consumed portion. This is similar in spirit to how the consume_front() and consume_back() functions work.

getAsInteger() is updated to internally use the consume functions so that the logic is not duplicated, and unit tests are added for consume behavior.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner updated this revision to Diff 71968.Sep 20 2016, 12:50 PM
zturner retitled this revision from to Add `StringRef::consumeInteger`.
zturner updated this object.
zturner added reviewers: beanz, mehdi_amini.
zturner added a subscriber: llvm-commits.

BTW, I couldn't decide what was better for the return value.

getAsInteger() returns true for failure and false for success, but it is difficult to change this as it the function is in widespread use throughout the codebase.

consume_front() and consume_back() return true for success and false for failure, which is much more sensible.

So do we have consumeInteger() maintain consistency with consume_front() and consume_back() or with getAsInteger()? I'm not sure. I made it consistent with getAsInteger(), but if anyone feels it should be changed, let me know. Ideally we would change getAsInteger() to use a more sensible return value.

mehdi_amini edited edge metadata.Sep 21 2016, 12:54 PM

Can you add a test that shows what happen with an explicit radix? (And a mismatch like base 10 and hexadecimal for example)

include/llvm/ADT/StringRef.h
414 ↗(On Diff #71968)

Is the intent to provide 64 bits here? If so why not using int64_t?

zturner added inline comments.Sep 21 2016, 12:59 PM
include/llvm/ADT/StringRef.h
414 ↗(On Diff #71968)

Good question. I based this off the implementation in getAsInteger above. I don't know if it makes a difference. If you think int64_t is better LMK.

zturner updated this revision to Diff 72115.Sep 21 2016, 3:17 PM
zturner edited edge metadata.

Added a few test cases showing what happens in the case of a radix mismatch (TL;DR - it stops consuming once it reaches a digit of an invalid radix).

Also this caught a few bugs in weird edge cases, which I fixed, so thanks for the suggestion

include/llvm/ADT/StringRef.h
75 ↗(On Diff #72115)

Sorry, this is unrelated to this CL. I was going to submit this separately without review.

mehdi_amini accepted this revision.Sep 21 2016, 3:43 PM
mehdi_amini edited edge metadata.

LGTM. Thanks.

This revision is now accepted and ready to land.Sep 21 2016, 3:43 PM
beanz accepted this revision.Sep 21 2016, 3:54 PM
beanz edited edge metadata.

LGTM too!

This revision was automatically updated to reflect the committed changes.
amccarth added inline comments.
llvm/trunk/include/llvm/ADT/StringRef.h
29

Please indicate what true and false returns mean. Consider making them private if these aren't the droids a general user is looking for.

llvm/trunk/lib/Support/StringRef.cpp
364

Probably outside the scope of this patch, but the proposal for 0o for C++ allows the O to be uppercase as well, just as with the X and B for the other radixes.

384

true means invalid?! Wow! I did not see that coming.

I believe this check is unnecessary now, since this patch adds a check near the end to see if any characters were consumed.

llvm/trunk/unittests/ADT/StringRefTest.cpp
581

Please add a case or two for a radix prefix followed by no valid digits. For example:

`"0b", "0XQ"

mehdi_amini added inline comments.Sep 22 2016, 11:56 AM
llvm/trunk/lib/Support/StringRef.cpp
384

This is a widespread convention in LLVM APIs that returning "true" means an error occurred (and I'm always confused).

The new Error class style settles this issue, we should just be more consistent on using it.