Page MenuHomePhabricator

Add Hurd triplet to LLVMSupport
ClosedPublic

Authored by sthibaul on Nov 10 2018, 5:31 AM.

Details

Summary

This introduces GNU Hurd as a new target OS.

Diff Detail

Repository
rL LLVM

Event Timeline

sthibaul created this revision.Nov 10 2018, 5:31 AM

I had no idea Hurd was still alive, but hey!

I'm curious to what the triple would be. On Linux, we can have for example "target-gnu-linux", on hurd, would that be "target-gnu-gnu"? Sounds weird...

Also, this will need tests (unit tests in LLVM and parsing tests in clang).

sthibaul updated this revision to Diff 173511.Nov 10 2018, 5:41 AM

The GNU/Hurd triple is target-gnu, simply.

Is the latest version a proper test for unit test?

Yes, the clang part is coming next.

rengolin accepted this revision.Nov 10 2018, 5:46 AM

The GNU/Hurd triple is target-gnu, simply.

I suppose there will never be a non-Gnu Hurd. :)

Is the latest version a proper test for unit test?

Yes it is. LGTM, thanks!

This revision is now accepted and ready to land.Nov 10 2018, 5:46 AM

(I can not commit this myself)

(I can not commit this myself)

No problems, I'll commit that for you. Just need to finish building/testing.

Hm, shouldn't it be in line with most other triples?

I'm not sure how GCC does it but this seems pretty out of line in comparison to all other targets, GNU makes sense to indicate that the target uses a GNU (usually with Glibc) userland but there's no indication of the kernel, which would be something like mk or mach. In which case I think i386-pc-mach-gnu or something along the lines would make sense, unless this is absolutely needed for compatibility reasons?

I'm not sure how GCC does it but this seems pretty out of line in comparison to all other targets, GNU makes sense to indicate that the target uses a GNU (usually with Glibc) userland but there's no indication of the kernel, which would be something like mk or mach. In which case I think i386-pc-mach-gnu or something along the lines would make sense, unless this is absolutely needed for compatibility reasons?

Hi Kristina,

Triples are not always triples, they're actually tuples, and that raises a huge number of issues, for example, in "target-pc-gnu", is that the environment, the OS or both?

Unfortunately, this mess exists for a very long time in toolchains that cannot be rebuilt in OSs that cannot choose a triple that is actually a quadruple.

From Hurd's own porting guidelines [1], "target-pc-gnu" is the triple they use, so there's not much we can do to change that.

The only problem from this patch is if people forget the OS part, for example, write "target-gnu" instead of "target-linux-gnu". Before, that'd throw an error, now it'll match with Hurd. It's hard to avoid hand typos in triples and have a sane code, but automated systems could end up generating "target--gnu" which is still parsed as Hurd.

Honestly, I don't think there is a way to fix this one other than asking Hurd to change their triple to "target-pc-hurd-gnu", but I don't think that would spawn a conversation that would end well. :)

cheers,
--renato

[1] https://www.gnu.org/software/hurd/hurd/porting/guidelines.html

I'm not sure how GCC does it but this seems pretty out of line in comparison to all other targets, GNU makes sense to indicate that the target uses a GNU (usually with Glibc) userland but there's no indication of the kernel, which would be something like mk or mach. In which case I think i386-pc-mach-gnu or something along the lines would make sense, unless this is absolutely needed for compatibility reasons?

Hi Kristina,

Triples are not always triples, they're actually tuples, and that raises a huge number of issues, for example, in "target-pc-gnu", is that the environment, the OS or both?

Unfortunately, this mess exists for a very long time in toolchains that cannot be rebuilt in OSs that cannot choose a triple that is actually a quadruple.

From Hurd's own porting guidelines [1], "target-pc-gnu" is the triple they use, so there's not much we can do to change that.

The only problem from this patch is if people forget the OS part, for example, write "target-gnu" instead of "target-linux-gnu". Before, that'd throw an error, now it'll match with Hurd. It's hard to avoid hand typos in triples and have a sane code, but automated systems could end up generating "target--gnu" which is still parsed as Hurd.

Honestly, I don't think there is a way to fix this one other than asking Hurd to change their triple to "target-pc-hurd-gnu", but I don't think that would spawn a conversation that would end well. :)

cheers,
--renato

[1] https://www.gnu.org/software/hurd/hurd/porting/guidelines.html

Yes but my concern comes from possible confusion with Glibc based Linux targets which are one of the "main" targets supported by LLVM. Either adding hurd or mach somewhere would be sufficient to disambiguate it from others, and yes I'm aware that they have a somewhat free format but they convey some sort of information, ie. with Linux targets gnu implies GNU/Glibc-based userland, so going by that convention one could easily assume that said triple simply implies GNU/Glibc/ELF userland. I think simply using gnu is too ambiguous, especially considering this is being added as an experimental target.

I understand your concerns, but that's really not something that we can change at this point, there is far too much software which has written i386-pc-gnu in their source code for us to change all that.

I understand your concerns, but that's really not something that we can change at this point, there is far too much software which has written i386-pc-gnu in their source code for us to change all that.

I'm getting an immense amount of errors, did you run make check-all on this patch?

rengolin requested changes to this revision.Nov 10 2018, 7:49 AM

Yes but my concern comes from possible confusion with Glibc based Linux targets which are one of the "main" targets supported by LLVM. Either adding hurd or mach somewhere would be sufficient to disambiguate it from others, and yes I'm aware that they have a somewhat free format but they convey some sort of information, ie. with Linux targets gnu implies GNU/Glibc-based userland, so going by that convention one could easily assume that said triple simply implies GNU/Glibc/ELF userland. I think simply using gnu is too ambiguous, especially considering this is being added as an experimental target.

I agree with you completely, but as Samuel said, they can't change the triple (no one can, really).

GCC gets away with this because the triple is part of the build process and doesn't mean a lot, but LLVM relies heavily on it.

So we're really between accepting the patch or not supporting Hurd at all (which is not that big of a problem, I get it :).

I think the issues I'm seeing in the tests are related to the confusion you mention, so I'm retracting my approval until it can be proven correct (and everyone agrees).

This revision now requires changes to proceed.Nov 10 2018, 7:49 AM

not supporting Hurd at all (which is not that big of a problem, I get it :).

Well, it definitely is for GNU/Hurd, now that very basic software pieces such as librsvg start _depending_ on rust, which is available only through llvm.

if people forget the OS part, for example, write "target-gnu" instead of "target-linux-gnu".

Is that really supposed to work? I thought the rules was that the cpu parts and the vendor parts mustn't contain '-', and thus the splitting is never ambiguous, and the os part can contain '-' (heavily used by GNU/Linux, GNU/kFreeBSD, etc.)

Well, it definitely is for GNU/Hurd, now that very basic software pieces such as librsvg start _depending_ on rust, which is available only through llvm.

That was an old Hurd joke (that it would never be used by anyone), thus the smiley face. Jokes don't work well on the Internet, sorry.

I'm getting an immense amount of errors, did you run make check-all on this patch?

To be honest, I hadn't even imagined that the patch could have consequences on non-Hurd systems.

“StartsWith("gnu", Triple::Hurd)”

is really supposed to be the way to detect the GNU (aka GNU/Hurd) triplet (we usually have e.g. i386-unknown-gnu0.3)

Well, it definitely is for GNU/Hurd, now that very basic software pieces such as librsvg start _depending_ on rust, which is available only through llvm.

That was an old Hurd joke (that it would never be used by anyone), thus the smiley face. Jokes don't work well on the Internet, sorry.

Ok :)

I guess I'm just getting too used to seeing a lot of people refuse to move a finger for the GNU/Hurd port.

Is that really supposed to work? I thought the rules was that the cpu parts and the vendor parts mustn't contain '-', and thus the splitting is never ambiguous, and the os part can contain '-' (heavily used by GNU/Linux, GNU/kFreeBSD, etc.)

That was my initial comment to Kristina. With triples, there is no must. It is what it is because it was so before. The reason you cannot add "hurd" to your triple is the same we cannot add extra dashes or reorder / insert, for public facing triples.

Internally, Clang normalises everything (sometimes destroying the meaning of the triple, because the previous meaning is "encoded" into the IR generated, and the IR triple is now for the back-end consumption).

So, there you have it, a triple is a bunch of things for different meanings and it gets completely decimated along its path through the compiler.

Getting even simple things right can go wrong in horrible ways. I believe that was Kristina original worry and I completely agree with her. It seems I was a bit more optimistic than I should today and Hurd made me smile.

Now, for the next steps, we'll have to go the hard way.

We'll need:

  • complete tests, building Clang+LLVM, make check-all passing on at least one major arch
  • Explanation of all variations a Hurd triple can take (additional flags, variants from other compilers)
  • What does the OS/Env "gnu" means (does it mandate glibc, can it be used with some other libraries, etc)

And we'd also have to understand the actual conflict of the OS/Env "gnu" alternatives.

You could use a different entry point for the driver similar to clang-cl or do what clang does on Darwin systems, that would avoid using this triple. What goes into clang driver is translated into an invocation of clang -cc1 frontend, which is where the -triple option is passed through. If you mess with the driver or just build your own invocation, and translate your triple to something that makes more sense internally (ie. i386-pc-mach-gnu) you have more leeway. Here however you're starting the process of adding a new experimental target to LLVM and in general my advice on new targets is to get some form of blessing from @chandlerc since I'd prefer for him to quickly glance over this. That's not to say I'm against adding support for GNU Hurd, I'm moreso implying that regardless of what GNU Hurd guidelines say, if they may cause problems in LLVM ecosystem, especially as far as core targets (or rather platforms) go, you have to consider the large ecosystems that use Clang and LLVM and sometimes adapt to avoid causing issues downstream (for example with distro vendors with Glibc/GNU Linux being the core of main variations of server and desktop Linux distros).

I really do find the break with convention here somewhat deeply unfortunate. It makes parsing and recognizing triples reliably much harder IMO. This really should be 'hurd-gnu' or 'mach-gnu' (or however you want to spell the 'OS' here).

If the only way to support this is to use the existing name, I still think that accepting arbitrary suffixes is a mistake. What do they mean? What do we do with them when canonicalizing? If there are suffixes, we need to know how to parse them and to do so accurately.

To be honest, I hadn't even imagined that the patch could have consequences on non-Hurd systems.

Me neither, that's why I was so optimistic. :/

“StartsWith("gnu", Triple::Hurd)”

is really supposed to be the way to detect the GNU (aka GNU/Hurd) triplet (we usually have e.g. i386-unknown-gnu0.3)

There are some implicit assumptions about the environment on "non-4th" position, meaning the dashes don't guarantee much.

The main problem is not what's "right" and "wrong", but what will work with everything else and what won't.

As I explained before, GCC gets away with quirky triples by not relying on them much, which we can't do.

That's not to say I'm against adding support for GNU Hurd, I'm moreso implying that regardless of what GNU Hurd guidelines say, if they may cause problems in LLVM ecosystem, especially as far as core targets (or rather platforms) go, you have to consider the large ecosystems that use Clang and LLVM and sometimes adapt to avoid causing issues downstream (for example with distro vendors with Glibc/GNU Linux being the core of main variations of server and desktop Linux distros).

Yup.

If you absolutely must use this triple, however, I would suggest handling it in the Clang driver, you have slightly more leeway there, and there isn't going to be as much resistance since one of the main reasons we have the driver (aside from the obvious) is translating certain flags from what people are used to (or what's needed for compatibility) to an internal compiler command-line, aka -cc1 (or -cc1as). Again, support for GNU Hurd is very welcome here, it's just that we also have to make sure everyone else is happy, which includes downstream GNU Linux variations and various Glibc maintainers. I'm merely suggesting not registering the target with that triple in LLVMSupport which is kind of core to *all* parts of LLVM, while your interactions with Clang are mostly governed by the driver (you don't usually avoid the driver since it builds/adds/translates a lot of internal but required flags before invoking the frontend).

I really do find the break with convention here somewhat deeply unfortunate. It makes parsing and recognizing triples reliably much harder IMO. This really should be 'hurd-gnu' or 'mach-gnu' (or however you want to spell the 'OS' here).

It's already too hard as it is. I was trying to grep lines from Triple.cpp to see if we already had conflicting names between OS and Env, but wasn't very successful.

Glancing with the eye, I didn't find anything conflicting, <unrelated> but there seems to be a "cloudabi" in the OS type, which is weird. </unrelated>

If the only way to support this is to use the existing name, I still think that accepting arbitrary suffixes is a mistake. What do they mean? What do we do with them when canonicalizing? If there are suffixes, we need to know how to parse them and to do so accurately.

I don't want to create another target parser... :(

The TargetParser was introduced for Arm because we have a similar problem and it's is a big mess. But those names are used pretty much everywhere and we have no other choice (like why Windows9 had to become Windows10).

On the other hand, as Kristina said, Hurd's *IS* an experimental OS. I don't think it ever will _not_ be. The weight it has on the complexity of parsing should be considered from that point of view.

Hurd's parsing itself may not be complicated (assuming only version numbers can come after, etc), "gnu" starts a good number of other environments, which are already carefully ordered in parseEnvironment.

sthibaul added a comment.EditedNov 10 2018, 10:52 AM

For sure I consider that not breaking anything on other archs is a must :)

accepting arbitrary suffixes is a mistake

Well, I was actually only talking about what the usual autoconf-provided config.guess script returns: i686-unknown-gnu0.9. The real triplet that we use in cross-compilation toolchains etc. is i386-pc-gnu or i386-unknown-gnu.

If you absolutely must use this triple

Mapping i386-pc-gnu to i386-pc-hurd-gnu internally is fine for me. Actually strictly speaking it should be i386-pc-gnu-gnu, because the system is really called GNU. That's what we use inside glibc for instance. Would that be enough to keep the current strategy, or should it be still kept inside the clang driver? I'm asking mostly because since I know next to nothing on the llvm architecture, I don't really know how to do that other approach.

Explanation of all variations a Hurd triple can take (additional flags, variants from other compilers)

I don't think there is currently any variant beyond i[3456]86-{pc,unknown}-gnu[0-9.]*

What does the OS/Env "gnu" means (does it mandate glibc, can it be used with some other libraries, etc)

Currently it mandates glibc, yes, there is no plan to make the GNU system use another library than the GNU library :)

And we'd also have to understand the actual conflict of the OS/Env "gnu" alternatives.

I at least know about {linux,window,kfreebsd}-gnu{,eabi,eabihf,-elf}

sthibaul updated this revision to Diff 173541.Nov 10 2018, 4:41 PM

Perhaps we could go this way with additional code in cmake/config-ix.cmake, to get "hurd-gnu" as soon as possible just after detection and keep everything else inside llvm straight with llvm conventions?

(as expected, I could run make check-all fine on an x86_64 linux host)

kristina requested changes to this revision.Nov 10 2018, 4:49 PM

Perhaps we could go this way with additional code in cmake/config-ix.cmake, to get "hurd-gnu" as soon as possible just after detection and keep everything else inside llvm straight with llvm conventions?

I really don't think this should go into the build system, again, I would suggest using one of the suggested triples for LLVM's internal use and handling your special case inside the Clang driver. I'm just suggesting the path of least resistance, so to speak, involving the build system as well is definitely not the way to go. I would suggest using the triple you have at the moment and then dealing with it in the driver in the revision I linked to this.

This revision now requires changes to proceed.Nov 10 2018, 4:49 PM

I'm not sure to understand: do you mean to drop this patch entirely, so that llvm considers i386-pc-gnu as just a GNU-ish platform, and then Driver::getToolChain should special-case the Triple::UnknownOS and parse Triple::getTriple itself to call the Hurd code instead of the default toolchains::Generic_GCC case?

Or perhaps at least keep a Triplet::isOSHurd() which does the hurd-specific triplet parsing, instead of adding a function to clang to be used wherever we want to check for that?

I'm not sure to understand: do you mean to drop this patch entirely, so that llvm considers i386-pc-gnu as just a GNU-ish platform, and then Driver::getToolChain should special-case the Triple::UnknownOS and parse Triple::getTriple itself to call the Hurd code instead of the default toolchains::Generic_GCC case?

No, this patch is fine, aside from the part that involves the build system. i386-pc-hurd-gnu is fine as is, however, it seems that you want to be able to use i386-pc-gnu from Clang, if I understand correctly.

Take a look at: https://github.com/llvm-project/llvm-project-20170507/blob/master/clang/lib/Driver/

There's a lot of workarounds for a lot of various and bizzare cases, and while it is cluttered, you can resort to something similiar to what LLVM tools do and check Argv0 and have clang-hurd or something along the lines, you can later pick that up and if your triple was specified, rewrite the -triple argument to the internal one used by LLVM. Or you can just do what I mentioned without any of that, though it may receive more pushback. In either case to invoke the compiler, the driver will construct a -triple argument which doesn't have to be the same as the one given to the driver. You have several opportunities to emit your LLVM triple there. There are some rather creative (read that as hacky) workarounds inside parts of the driver, and while I don't encourage going overboard with hacky approaches, it's possible to make something reasonable, like what I mentioned. You can definitely find a reasonable approach that works for you and does not cause problems.

i386-pc-hurd-gnu is fine as is, however, it seems that you want to be able to use i386-pc-gnu from Clang, if I understand correctly.

Aah, ok, so llvm itself would use the cmake-provided target i386-unknown-gnu0.9 which makes it only behave GNU-ish, but the driver part of clang can provide an explicit target to llvm, *before* getOSTypeName gets called. That's what I wasn't understanding, I thought the clang Driver things were after llvm processing.

Take a look at: https://github.com/llvm-project/llvm-project-20170507/blob/master/clang/lib/Driver/

Well, it's 13875 lines of code... Considering that you mentioned Darwin, do you mean tinkering with Driver.cpp's computeTargetTriple?

check Argv0 and have clang-hurd or something along the lines

Err, we do not want a special clang invocation program, we just want usual clang calls running on Hurd to use the i386-pc-hurd-gnu target by default, I guess that's what computeTargetTriple would allow to achieve?

sthibaul updated this revision to Diff 173600.Nov 11 2018, 4:45 PM

So we are back with this.

kristina accepted this revision.Nov 11 2018, 6:19 PM

Alright, as far as this part goes everything LGTM.

Would prefer to wait for @rengolin to sign off on this as well.

kristina retitled this revision from Add Hurd triplet to Add Hurd triplet to LLVMSupport.Nov 11 2018, 6:21 PM
kristina edited the summary of this revision. (Show Details)
echristo accepted this revision.Nov 13 2018, 4:50 PM

I can sign off on this.

Thanks!

-eric

rengolin accepted this revision.Nov 13 2018, 11:26 PM
This revision is now accepted and ready to land.Nov 13 2018, 11:26 PM
This revision was automatically updated to reflect the committed changes.