Page MenuHomePhabricator

[llvm-ar] Simplify Windows comparePaths NFCI
ClosedPublic

Authored by andrewng on Feb 12 2020, 3:55 AM.

Details

Summary

Replace use of widenPath in comparePaths with UTF8ToUTF16. widenPath
does a lot more than just conversion from UTF-8 to UTF-16. This is not
necessary for CompareStringOrdinal and could possibly even cause
problems.

Diff Detail

Event Timeline

andrewng created this revision.Feb 12 2020, 3:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 12 2020, 3:56 AM

This is not necessary for CompareStringOrdinal and could possibly even cause problems.

Perhaps I am not a good person to review this patch. Have a question though: by "cause problems" do you know something that can be wrapped into a test case?

grimar edited reviewers, added: MaskRay, gbreynoo; removed: grimar.Feb 12 2020, 5:47 AM
grimar removed a subscriber: gbreynoo.
grimar added a subscriber: grimar.

I've added people who seems did the change in this area or around. Hope it might help...

This is not necessary for CompareStringOrdinal and could possibly even cause problems.

Perhaps I am not a good person to review this patch. Have a question though: by "cause problems" do you know something that can be wrapped into a test case?

No, I don't know of an example problem and there might not be a case where a problem could occur. The behaviour of widenPath is dependent on the input length. So if there is a possibility that two strings of different length can be "equal" according to CompareStringOrdinal, then widenPath could cause a problem. What makes the situation a little more likely to be problematic, is that the current implementation of widenPath bases it's decisions on the length of the UTF-8 input.

I felt sad about this original case-insensitive patch D68033. I'll not make objections if this is the consensus. I feel that I don't know enough about Windows to review this patch.

MaskRay added a subscriber: MaskRay.
andrewng edited reviewers, added: ruiu, mstorsjo; removed: gbreynoo.Feb 13 2020, 2:54 AM
andrewng added a subscriber: gbreynoo.

Just to add a bit more context for this change, I have a patch in progress to fix Windows UNC path handling in widenPath and that's how I stumbled upon this usage in llvm-ar. In my modified version of widenPath, I use the length of the input as UTF-16 to decide on whether the path needs to be "expanded", which should reduce the scope for problems in this llvm-ar use case. However, it really isn't needed here and hence this patch.

rupprecht resigned from this revision.Feb 13 2020, 10:23 AM

In my modified version of widenPath...

I second grimar's request for an associated test that covers this behavior, but are you saying that the issue is your modified version of sys::path::widenPath breaks some llvm-ar tests, and switching to sys::windows::UTF8ToUTF16 will keep them passing? (If so, submitting w/o an additional test sounds fine to me)

Anyway -- I trust all the other reviewers to do a better review, as I barely ever touch windows.

rnk accepted this revision.Feb 13 2020, 10:33 AM

lgtm

So, I don't see how adjusting the UNC handling in widenPath will matter. If the two strings are equal, widenPath should probably treat them the same, and add the UNC prefix to LHS iff it does to RHS. But, the UNC handling is clearly unnecessary to do a simple case insensitive string comparison, so this change seems correct even if UNC handling were not an issue.

This revision is now accepted and ready to land.Feb 13 2020, 10:33 AM

In my modified version of widenPath...

I second grimar's request for an associated test that covers this behavior, but are you saying that the issue is your modified version of sys::path::widenPath breaks some llvm-ar tests, and switching to sys::windows::UTF8ToUTF16 will keep them passing? (If so, submitting w/o an additional test sounds fine to me)

My modified version of sys::path::widenPath does not affect any llvm-ar tests and actually should be less likely to cause any problems.

The key point is that the use of sys::path::widenPath is not required here and could possibly cause problems, where as using sys::windows::UTF8ToUTF16 is simpler and more appropriate.

Just curious: Do we know that the paths stored in the archive are actually UTF-8 and not just MBCS using whatever code page the user had when they created the archive?

This revision was automatically updated to reflect the committed changes.

Just curious: Do we know that the paths stored in the archive are actually UTF-8 and not just MBCS using whatever code page the user had when they created the archive?

I don't believe that this is well defined as far as any "ar" file format standard is concerned. However, llvm-ar does make the assumption of UTF-8 encoding and it's command line input's are in UTF-8. This is probably the only sensible default given that there is no indication of what code page a user might have been using.

Thanks. I had missed that InitLLVM converts the command line arguments to UTF-8.