Previously, strtol/ll/ul/ull would return a pointer to the end of its
parsing, regardless of if it detected a number. Now it will return a
length of 0 when it doesn't find a number.
Details
- Reviewers
sivachandra - Commits
- rG53804d4eb286: [libc] fix strtol returning the wrong length
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
add tests for lone prefixes
libc/src/__support/str_conv_utils.h | ||
---|---|---|
83 | it's to handle the case where the number is just "0" and the base is set to automatic. Without this, then that would be parsed as an octal number with no digits, as opposed to a decimal number with one digit. I've added a test to check this. |
libc/src/__support/str_conv_utils.h | ||
---|---|---|
83 | May be infer_base is incorrect then? Consider this: const char *n = "08"; char *next; long i = ::strtol(n, &next, 0); Since "08" is an invalid number, should we want the above strtolcall to succeed or a fail? If it is a failure, should next point to 8 or 0? Should it be the same if the base was explicitly specified to be 8? I also think we can ask the same questions when base is 16. My reading of the standard says that "08" is an invalid number so no conversion should be performed. Which means, str_end should point to original_src. Likewise, "0xZ" with base 16 is an invalid number so no conversion should be performed and str_end should point to original_src. |
libc/src/__support/str_conv_utils.h | ||
---|---|---|
83 | infer_base was incorrect, and I have now fixed it so that it doesn't skip over the leading octal 0, and removed this condition. From what I can tell, in the case of both "08" and "0xZ", when parsed with an automatic base, what we have is a solo octal 0 (and this matches what I see from other libc implementations). In the C standard §6.4.4.1 it specifies that "A decimal constant begins with a nonzero digit", meaning that those numbers cannot be assumed to be base 10, but that "An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only". This means that the longest valid interpretation of "08" is that it is an octal 0 followed by some value that is not a valid digit (equivalent to "0Z"). For "0xZ" it is similar, a hexadecimal number has to have a valid hexadecimal digit after the "0x" so this is just 0. |
libc/src/__support/str_conv_utils.h | ||
---|---|---|
45 | Replace This will advance the string pointer. with This function will advance |src| to the first valid digit in the inferred base. | |
83 | Ah, yes! I am reading two different parts of the standard in isolation. 6.4.4.1 Has this text: An octal constant consists of the prefix 0 optionally followed by a sequence of the digits 0 through 7 only. A hexadecimal constant consists of the prefix 0x or 0X followed by a sequence of the decimal digits and the letters a (or A) through f (or F) with values 10 through 15 respectively. Which makes 08 an invalid literal. But, the spec for strto<int> functions also says: decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string. So, in the string "08", '8' is the first unrecognized character, so the string consists of only one valid base 8 character "0". Combining the above two parts of the standard, "08" should be interpreted as just the octal zero. Which is what you are saying I think. This same argument does not hold for "0x" or "0xZ" because just the prefix 0x or the number 0xZ are not hex numbers at all. But, the octal argument kicks in here as well: so "0x" is octal zero, and "0xZ" is also octal zero. I just checked what other libcs do and it looks like glibc atleast follows this argument. In both the cases, the end_ptr is updated by only 1 implying that only one valid character was read. If you are convinced with this, can you update infer_base to reflect all of this. Also add comments quoting the appropriate sections in the standard. |
Replace
with