This is an archive of the discontinued LLVM Phabricator instance.

ConvertUTF, new wrapper API
ClosedPublic

Authored by MarcusJohnson91 on Nov 21 2021, 6:13 PM.

Details

Summary

I need this new API for wscanf/wprintf checker support

Diff Detail

Event Timeline

MarcusJohnson91 requested review of this revision.Nov 21 2021, 6:13 PM
MarcusJohnson91 edited the summary of this revision. (Show Details)Nov 21 2021, 6:16 PM

Rebased on main

Bigcheese requested changes to this revision.Jan 21 2022, 6:29 PM

Once these issues are fixed this should be fine.

llvm/lib/Support/ConvertUTFWrapper.cpp
148

I believe this actually needs an additional check that that SrcBytes.size() is < 4 and return false.

172

This is technically correct, but it's implicit in that the max number of UTF8 code units per code point is the same as sizeof(UTF32). Would be nice to have a comment.

This revision now requires changes to proceed.Jan 21 2022, 6:29 PM
aaron.ballman added a subscriber: aaron.ballman.

Adding some more Unicode folks to the reviewers just in case, but I think this is reasonable.

llvm/lib/Support/ConvertUTFWrapper.cpp
148

Should we be testing that the input array length is a multiple of 4 as well?

I though modern LLVM is to return llvm::Expected<std::string>?!?

I though modern LLVM is to return llvm::Expected<std::string>?!?

This is true, but this patch is matching the existing API. I don't like the current API, but I don't think this patch needs to fix that, and specifically shouldn't diverge from the existing API.

cor3ntin added inline comments.Jan 25 2022, 4:18 AM
llvm/lib/Support/ConvertUTFWrapper.cpp
148

There is an assert just below, I suppose that's fine

163

SrcEnd = Src + ByteSwapped.size() may be easier to read.

cor3ntin added inline comments.Jan 25 2022, 5:33 AM
llvm/lib/Support/ConvertUTFWrapper.cpp
148

no, that assert doesn't do that. so you are right something is missing here - and on second thought, it probably should not be an assert, just a check in that if

aaron.ballman added inline comments.Jan 25 2022, 5:35 AM
llvm/lib/Support/ConvertUTFWrapper.cpp
148

I'm happy with the alignment assertion, but I was asking for assert(SrcBytes.size() % sizeof(UTF32) == 0); to ensure nobody does something silly like pass in 13 bytes of data and expect good things to happen.

Implemented feedback

Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 3:01 PM
cor3ntin accepted this revision.Mar 21 2022, 10:55 AM

Beside the missing comment Michael wanted, I think this looks consistent with the existing functions

llvm/lib/Support/ConvertUTFWrapper.cpp
172

Nit: The comment still doesn't say that we assume there can only be 4 bytes per utf-8 code units - which would not be the case if the utf-8 comes for non-iso10646 conforming android environments for example

llvm/lib/Support/ConvertUTFWrapper.cpp
172

I was confused by this, Unicode limits Codepoints to 0x10FFFF so the maximum number of UTF-8 codeunits is 4.

I mean, I can still put the comment in, but it seems pointless?

https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf#I1.36559

This limit, of 0x10FFFF has been in place since the year 2000

https://www.unicode.org/L2/L2000/00079-n2175.htm

Added comment @cor3ntin requested

Just noticed that @cor3ntin created his phabricator account the day before I did, idk why but I think that's pretty cool.

aaron.ballman accepted this revision.Mar 22 2022, 4:54 AM

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

MarcusJohnson91 added a comment.EditedMar 22 2022, 9:16 AM

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail

thanks :)

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail

thanks :)

Thanks for the patch! I've landed on your behalf in c3460689288abc98c91d8d6bffa74be9eb16c74d. (I can't close the review because @Bigcheese still has marked as requesting changes.)

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail

thanks :)

Thanks for the patch! I've landed on your behalf in c3460689288abc98c91d8d6bffa74be9eb16c74d. (I can't close the review because @Bigcheese still has marked as requesting changes.)

Thanks so much, I'll try to close it :)

MarcusJohnson91 accepted this revision.Mar 22 2022, 11:30 AM

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail

thanks :)

Thanks for the patch! I've landed on your behalf in c3460689288abc98c91d8d6bffa74be9eb16c74d. (I can't close the review because @Bigcheese still has marked as requesting changes.)

Thanks so much, I'll try to close it :)

Unfortunately, I had to revert in a6beb18b845ca8548319d08df9eea46c87e1e533 because the commit broke at least one of the build bots: https://lab.llvm.org/buildbot#builders/100/builds/13947. Can you investigate?

LGTM, @MarcusJohnson91, do you need someone to land this on your behalf? If so, what name and email address would you like used for patch attribution?

Hey Aaron, yes, let's go with my real name, Marcus Johnson and MarcusLJohnson1991@gmail

thanks :)

Thanks for the patch! I've landed on your behalf in c3460689288abc98c91d8d6bffa74be9eb16c74d. (I can't close the review because @Bigcheese still has marked as requesting changes.)

Thanks so much, I'll try to close it :)

Unfortunately, I had to revert in a6beb18b845ca8548319d08df9eea46c87e1e533 because the commit broke at least one of the build bots: https://lab.llvm.org/buildbot#builders/100/builds/13947. Can you investigate?

Awww

I'll check into it.

MarcusJohnson91 updated this revision to Diff 417499.EditedMar 22 2022, 10:17 PM

I changed the test strings to match the other tests, and i tested the changes with ninja check-llvm-support, it should work 🤞

Why does ninja check-llvm-support work on my machine but fail every single time here

BOM was missing from the test strings, again tested with ninja check-llvm-support and it's again saying everything passed 🤷

cor3ntin accepted this revision.Mar 23 2022, 2:28 AM

@MarcusJohnson91 The failure looks unrelated. The test looks good, I think we can try again. I'd rather wait for @aaron.ballman in case you need to revert again though!

@MarcusJohnson91 The failure looks unrelated. The test looks good, I think we can try again. I'd rather wait for @aaron.ballman in case you need to revert again though!

Agreed that the precommit CI failure looks unrelated and that the test seems reasonable. I've landed again in d14ccbc2e88d2cebd27cdd829879ded10ba0c9ea and we'll see how happy the build bots are.

@MarcusJohnson91 The failure looks unrelated. The test looks good, I think we can try again. I'd rather wait for @aaron.ballman in case you need to revert again though!

Agreed that the precommit CI failure looks unrelated and that the test seems reasonable. I've landed again in d14ccbc2e88d2cebd27cdd829879ded10ba0c9ea and we'll see how happy the build bots are.

Seems like they were happy after all? 🤞

@MarcusJohnson91 The failure looks unrelated. The test looks good, I think we can try again. I'd rather wait for @aaron.ballman in case you need to revert again though!

Agreed that the precommit CI failure looks unrelated and that the test seems reasonable. I've landed again in d14ccbc2e88d2cebd27cdd829879ded10ba0c9ea and we'll see how happy the build bots are.

Seems like they were happy after all? 🤞

I believe so -- I've not seen any related failures come across my inbox, at least.

@aaron.ballman Can you close this?

Nope, because there's a reviewer still marked as blocking. I'll try removing them as reviewer, perhaps then Phab will let me close.

This revision is now accepted and ready to land.Apr 2 2022, 7:00 AM