I need this new API for wscanf/wprintf checker support
Details
Diff Detail
Event Timeline
Once these issues are fixed this should be fine.
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
148 | I believe this actually needs an additional check that that SrcBytes.size() is < 4 and return false. | |
172 | This is technically correct, but it's implicit in that the max number of UTF8 code units per code point is the same as sizeof(UTF32). Would be nice to have a comment. |
Adding some more Unicode folks to the reviewers just in case, but I think this is reasonable.
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
148 | Should we be testing that the input array length is a multiple of 4 as well? |
This is true, but this patch is matching the existing API. I don't like the current API, but I don't think this patch needs to fix that, and specifically shouldn't diverge from the existing API.
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
148 | no, that assert doesn't do that. so you are right something is missing here - and on second thought, it probably should not be an assert, just a check in that if |
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
148 | I'm happy with the alignment assertion, but I was asking for assert(SrcBytes.size() % sizeof(UTF32) == 0); to ensure nobody does something silly like pass in 13 bytes of data and expect good things to happen. |
Beside the missing comment Michael wanted, I think this looks consistent with the existing functions
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
172 | Nit: The comment still doesn't say that we assume there can only be 4 bytes per utf-8 code units - which would not be the case if the utf-8 comes for non-iso10646 conforming android environments for example |
llvm/lib/Support/ConvertUTFWrapper.cpp | ||
---|---|---|
172 | I was confused by this, Unicode limits Codepoints to 0x10FFFF so the maximum number of UTF-8 codeunits is 4. I mean, I can still put the comment in, but it seems pointless? https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf#I1.36559 This limit, of 0x10FFFF has been in place since the year 2000 |
Just noticed that @cor3ntin created his phabricator account the day before I did, idk why but I think that's pretty cool.
LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?
Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail
thanks :)
Thanks for the patch! I've landed on your behalf in c3460689288abc98c91d8d6bffa74be9eb16c74d. (I can't close the review because @Bigcheese still has marked as requesting changes.)
Unfortunately, I had to revert in a6beb18b845ca8548319d08df9eea46c87e1e533 because the commit broke at least one of the build bots: https://lab.llvm.org/buildbot#builders/100/builds/13947. Can you investigate?
I changed the test strings to match the other tests, and i tested the changes with ninja check-llvm-support, it should work 🤞
BOM was missing from the test strings, again tested with ninja check-llvm-support and it's again saying everything passed 🤷
@MarcusJohnson91 The failure looks unrelated. The test looks good, I think we can try again. I'd rather wait for @aaron.ballman in case you need to revert again though!
Agreed that the precommit CI failure looks unrelated and that the test seems reasonable. I've landed again in d14ccbc2e88d2cebd27cdd829879ded10ba0c9ea and we'll see how happy the build bots are.
Nope, because there's a reviewer still marked as blocking. I'll try removing them as reviewer, perhaps then Phab will let me close.
I believe this actually needs an additional check that that SrcBytes.size() is < 4 and return false.