This is an archive of the discontinued LLVM Phabricator instance.

Teach llvm's path library to support both windows and posix paths at the same time.
ClosedPublic

Authored by zturner on Mar 10 2017, 9:37 PM.

Details

Summary

I expect this to be wildly controversial, but here goes.

In LLDB we often are referring to path's on a remote host. You're remote debugging a Windows process from a Linux machine, for example. You need to be able to refer to files on the target machine by typing their paths. Currently LLDB has a TON of custom logic to do all this, and it doesn't handle all the obscure edge cases as well as LLVM does.

99% of LLVM's path handling code is already syntax-agnostic and there are literally just a handful of instructions that are platform specific. So supporting both at the same time should only increase code size by a small fraction. All of the runtime checks are trivially inlined. Path handling is hard to get right, so it's a shame we can't take advantage of the 99% not-syntax-aware code just because of the 1% that is.

Although I expect LLDB to be the biggest consumer, it seems possible this functionality could find use elsewhere. We read and write paths to and from binary files on arbitrary platforms, so it seems reasonable that someday one of these files might contain a Windows path and we try to read it on a Linux machine (or vice versa) and want to do something meaningful with the path.

All path functions are updated to take a Style argument which is defaulted to native, so existing behavior is unchanged. However, I've updated the test suite to remove as many #ifdefs as possible to prove that we can manipulate windows paths on linux, and vice versa.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Mar 10 2017, 9:37 PM
labath edited edge metadata.Mar 13 2017, 3:28 AM

Lldb has a copy of a part of this code with the same type of #ifdef -> if changes applied, so it would be great it we could get rid of that.

What concerns me about this from a usability point of view is that the path style is not stored explicitly anywhere -- it just floats around implicitly. For working with remote paths, you're going to need some sort of an object which encapsulates this information. I suppose this could live in lldb's FileSpec as it did until now, but I'm wondering whether we shouldn't add something like that here, if we're going to start advertising foreign path support. I am asking that now because that would affect the signature of all of these functions, and it's probably better to avoid changing them twice.

llvm/unittests/Support/Path.cpp
62 ↗(On Diff #91451)

stray comma

@zturner said (in email)

I actually think that should live in lldb. All of llvm's path handling code is stateless which makes the most sense from an api perspective. Lldb is just "special" in that regard

I think that makes sense when you don't have to worry about path syntax, as then there isn't actually any state to maintain. But if you support different path syntaxes, then it sounds to me like you should maintain that explicitly, as without it, there is no way to interpret the path correctly. I was thinking of having a class (let's call it FileSpec for lack of a better name), which would basically be a glorified std::pair<SmallString, Style> wrapper.

All the manipulation functions would remain freestanding as they are now, but they would accept either a SmallString (in which case they assume native path syntax) or the FileSpec object, where the syntax is stored explicitly. (this would probably require a helper type FileSpecRef, constructible from both, to enable code sharing in the implementation of these functions.)

I can live with this implementation, if that's what people consider better, but I think I'd prefer something where you have the path syntax stored explicitly.

@zturner said (in email)

I actually think that should live in lldb. All of llvm's path handling code is stateless which makes the most sense from an api perspective. Lldb is just "special" in that regard

I think that makes sense when you don't have to worry about path syntax, as then there isn't actually any state to maintain. But if you support different path syntaxes, then it sounds to me like you should maintain that explicitly, as without it, there is no way to interpret the path correctly. I was thinking of having a class (let's call it FileSpec for lack of a better name), which would basically be a glorified std::pair<SmallString, Style> wrapper.

All the manipulation functions would remain freestanding as they are now, but they would accept either a SmallString (in which case they assume native path syntax) or the FileSpec object, where the syntax is stored explicitly. (this would probably require a helper type FileSpecRef, constructible from both, to enable code sharing in the implementation of these functions.)

I can live with this implementation, if that's what people consider better, but I think I'd prefer something where you have the path syntax stored explicitly.

Well, yea. I just think that's exactly what LLDB's FileSpec class already is (or should be anyway, after removing all the excess stuff, which this patch will enable) ;-) And I don't think anyone outside of LLDB will ever need it.

In any case, is anyone opposed to enabling dynamic path syntax support instead of static?

All the manipulation functions would remain freestanding as they are now, but they would accept either a SmallString (in which case they assume native path syntax) or the FileSpec object, where the syntax is stored explicitly. (this would probably require a helper type FileSpecRef, constructible from both, to enable code sharing in the implementation of these functions.)

I can live with this implementation, if that's what people consider better, but I think I'd prefer something where you have the path syntax stored explicitly.

Well, yea. I just think that's exactly what LLDB's FileSpec class already is (or should be anyway, after removing all the excess stuff, which this patch will enable) ;-) And I don't think anyone outside of LLDB will ever need it.

Ok, that makes sense. I guess we can pull FileSpec down to llvm if/when another user of the functionality comes up.

In any case, is anyone opposed to enabling dynamic path syntax support instead of static?

No objection from me.

This revision was automatically updated to reflect the committed changes.