This is an archive of the discontinued LLVM Phabricator instance.

Differential D74477

[llvm-ar] Simplify Windows comparePaths NFCI
Closed · Public

Authored by andrewng on Feb 12 2020, 3:55 AM.

Details

Reviewers

ruiu
mstorsjo
rnk
rupprecht

Commits

rG430fc538e6dc: [llvm-ar] Simplify Windows comparePaths NFCI

Summary

Replace use of widenPath in comparePaths with UTF8ToUTF16. widenPath
does a lot more than just conversion from UTF-8 to UTF-16. This is not
necessary for CompareStringOrdinal and could possibly even cause
problems.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

andrewng created this revision. Feb 12 2020, 3:55 AM

Herald added a project: Restricted Project. Feb 12 2020, 3:56 AM

This is not necessary for CompareStringOrdinal and could possibly even cause problems.

Perhaps I am not a good person to review this patch. Have a question though: by "cause problems" do you know something that can be wrapped into a test case?

I've added people who seem to have made changes in or around this area. Hope it might help...

In D74477#1872033, @grimar wrote:

This is not necessary for CompareStringOrdinal and could possibly even cause problems.

Perhaps I am not a good person to review this patch. Have a question though: by "cause problems" do you know something that can be wrapped into a test case?

No, I don't know of an example problem and there might not be a case where a problem could occur. The behaviour of widenPath is dependent on the input length. So if there is a possibility that two strings of different length can be "equal" according to CompareStringOrdinal, then widenPath could cause a problem. What makes the situation a little more likely to be problematic is that the current implementation of widenPath bases its decisions on the length of the UTF-8 input.

I felt sad about this original case-insensitive patch D68033. I'll not make objections if this is the consensus. I feel that I don't know enough about Windows to review this patch.

MaskRay removed a reviewer: MaskRay. Feb 12 2020, 11:08 AM

MaskRay added a subscriber: MaskRay.

andrewng edited reviewers, added: ruiu, mstorsjo; removed: gbreynoo. Feb 13 2020, 2:54 AM

andrewng added a subscriber: gbreynoo.

Just to add a bit more context for this change, I have a patch in progress to fix Windows UNC path handling in widenPath and that's how I stumbled upon this usage in llvm-ar. In my modified version of widenPath, I use the length of the input as UTF-16 to decide on whether the path needs to be "expanded", which should reduce the scope for problems in this llvm-ar use case. However, it really isn't needed here and hence this patch.
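The idea of deciding on the UTF-16 length rather than the UTF-8 length can be sketched as follows. This is only an illustration, not the in-progress widenPath patch: utf16Units and needsExpansion are hypothetical names, the limit is a caller-supplied parameter, and the counting assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string_view>

// Counts the UTF-16 code units needed for a UTF-8 string.
// Assumes the input is well-formed UTF-8 (no validation is done).
std::size_t utf16Units(std::string_view Utf8) {
  std::size_t Units = 0;
  for (unsigned char C : Utf8) {
    if ((C & 0xC0) == 0x80)
      continue; // continuation byte, already counted with its lead byte
    // Lead bytes 0xF0 and above start 4-byte sequences, which encode code
    // points beyond the BMP and need a surrogate pair in UTF-16.
    Units += (C >= 0xF0) ? 2 : 1;
  }
  return Units;
}

// Hypothetical decision helper: expand only when the UTF-16 form is too long.
bool needsExpansion(std::string_view Utf8, std::size_t MaxUnits) {
  return utf16Units(Utf8) > MaxUnits;
}
```

With this measure, a 2-byte and a 3-byte UTF-8 letter each count as one unit, so two strings that differ only in such letters get the same decision.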

mstorsjo added a reviewer: rnk. Feb 13 2020, 5:50 AM

In D74477#1873998, @andrewng wrote:

In my modified version of widenPath...

I second grimar's request for an associated test that covers this behavior, but are you saying that the issue is your modified version of sys::path::widenPath breaks some llvm-ar tests, and switching to sys::windows::UTF8ToUTF16 will keep them passing? (If so, submitting w/o an additional test sounds fine to me)

Anyway -- I trust all the other reviewers to do a better review, as I barely ever touch windows.

Herald added a subscriber: rupprecht. Feb 13 2020, 10:23 AM

lgtm

So, I don't see how adjusting the UNC handling in widenPath will matter. If the two strings are equal, widenPath should probably treat them the same, and add the UNC prefix to LHS iff it does to RHS. But, the UNC handling is clearly unnecessary to do a simple case insensitive string comparison, so this change seems correct even if UNC handling were not an issue.

This revision is now accepted and ready to land. Feb 13 2020, 10:33 AM

In D74477#1874772, @rupprecht wrote:

In D74477#1873998, @andrewng wrote:

In my modified version of widenPath...

I second grimar's request for an associated test that covers this behavior, but are you saying that the issue is your modified version of sys::path::widenPath breaks some llvm-ar tests, and switching to sys::windows::UTF8ToUTF16 will keep them passing? (If so, submitting w/o an additional test sounds fine to me)

My modified version of sys::path::widenPath does not affect any llvm-ar tests and actually should be less likely to cause any problems.

The key point is that the use of sys::path::widenPath is not required here and could possibly cause problems, whereas using sys::windows::UTF8ToUTF16 is simpler and more appropriate.

Just curious: Do we know that the paths stored in the archive are actually UTF-8 and not just MBCS using whatever code page the user had when they created the archive?

Closed by commit rG430fc538e6dc: [llvm-ar] Simplify Windows comparePaths NFCI (authored by andrewng). Feb 14 2020, 3:24 AM

This revision was automatically updated to reflect the committed changes.

In D74477#1875164, @amccarth wrote:

Just curious: Do we know that the paths stored in the archive are actually UTF-8 and not just MBCS using whatever code page the user had when they created the archive?

I don't believe that this is well defined as far as any "ar" file format standard is concerned. However, llvm-ar does make the assumption of UTF-8 encoding, and its command-line inputs are in UTF-8. This is probably the only sensible default given that there is no indication of what code page a user might have been using.

Thanks. I had missed that InitLLVM converts the command line arguments to UTF-8.

Revision Contents

Path: llvm/tools/llvm-ar/llvm-ar.cpp
Size: 5 lines

Diff 244606

llvm/tools/llvm-ar/llvm-ar.cpp

[15 lines not shown]
 #include "llvm/ADT/Triple.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/Object/Archive.h"
 #include "llvm/Object/ArchiveWriter.h"
 #include "llvm/Object/MachO.h"
 #include "llvm/Object/ObjectFile.h"
 #include "llvm/Support/Chrono.h"
 #include "llvm/Support/CommandLine.h"
+#include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/Errc.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Format.h"
 #include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/InitLLVM.h"
 #include "llvm/Support/LineIterator.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
[481 lines not shown]

 static bool comparePaths(StringRef Path1, StringRef Path2) {
 // When on Windows this function calls CompareStringOrdinal
 // as Windows file paths are case-insensitive.
 // CompareStringOrdinal compares two Unicode strings for
 // binary equivalence and allows for case insensitivity.
 #ifdef _WIN32
   SmallVector<wchar_t, 128> WPath1, WPath2;
-  failIfError(sys::path::widenPath(normalizePath(Path1), WPath1));
-  failIfError(sys::path::widenPath(normalizePath(Path2), WPath2));
+  failIfError(sys::windows::UTF8ToUTF16(normalizePath(Path1), WPath1));
+  failIfError(sys::windows::UTF8ToUTF16(normalizePath(Path2), WPath2));

   return CompareStringOrdinal(WPath1.data(), WPath1.size(), WPath2.data(),
                               WPath2.size(), true) == CSTR_EQUAL;
 #else
   return normalizePath(Path1) == normalizePath(Path2);
 #endif
 }

[696 lines not shown]
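For readers without an LLVM checkout, the shape of the post-patch comparison can be approximated in a standalone sketch. This is not the llvm-ar code: equalPathsIgnoreCase and utf8ToUtf16 are hypothetical names, the Windows branch calls the Win32 functions MultiByteToWideChar and CompareStringOrdinal directly instead of the LLVM helpers, and the non-Windows branch is an ASCII-only folding stand-in added purely so the sketch can be exercised elsewhere (llvm-ar itself compares exactly on non-Windows hosts).

```cpp
#include <cstddef>
#include <string_view>

#ifdef _WIN32
#include <vector>
#include <windows.h>

// Plain UTF-8 to UTF-16 conversion with no path rewriting of any kind.
bool utf8ToUtf16(std::string_view In, std::vector<wchar_t> &Out) {
  Out.clear();
  if (In.empty())
    return true; // MultiByteToWideChar rejects zero-length input
  const int InLen = static_cast<int>(In.size());
  const int OutLen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                         In.data(), InLen, nullptr, 0);
  if (OutLen <= 0)
    return false; // invalid UTF-8 or another conversion failure
  Out.resize(static_cast<std::size_t>(OutLen));
  return MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, In.data(), InLen,
                             Out.data(), OutLen) == OutLen;
}
#else
// ASCII-only case folding, used by the non-Windows stand-in below.
char foldAscii(char C) {
  return (C >= 'A' && C <= 'Z') ? static_cast<char>(C - 'A' + 'a') : C;
}
#endif

// Case-insensitive equality of two UTF-8 strings (hypothetical helper).
bool equalPathsIgnoreCase(std::string_view A, std::string_view B) {
#ifdef _WIN32
  std::vector<wchar_t> WA, WB;
  if (!utf8ToUtf16(A, WA) || !utf8ToUtf16(B, WB))
    return false; // this sketch treats conversion failure as "not equal"
  if (WA.empty() || WB.empty())
    return WA.empty() && WB.empty(); // avoid passing null buffers
  return CompareStringOrdinal(WA.data(), static_cast<int>(WA.size()),
                              WB.data(), static_cast<int>(WB.size()),
                              TRUE) == CSTR_EQUAL;
#else
  if (A.size() != B.size())
    return false;
  for (std::size_t I = 0; I < A.size(); ++I)
    if (foldAscii(A[I]) != foldAscii(B[I]))
      return false;
  return true;
#endif
}
```

Note that the conversion step only changes the encoding. Nothing depends on the length of the input, which is the property the patch is after.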