
For Linux/gnu compatibility, preinclude <stdc-predef.h> if the file is available
Closed, Public

Authored by mibintc on Jun 13 2017, 11:57 AM.

Details

Summary

As reported in LLVM Bugzilla 32377.
Here's a patch to add a preinclude of stdc-predef.h.

The gcc documentation says “On GNU/Linux, <stdc-predef.h> is pre-included.” See https://gcc.gnu.org/gcc-4.8/porting_to.html.

The preinclude is inhibited with -ffreestanding.

Basically, I fixed the failing test cases by adding -ffreestanding, which inhibits this behavior.

I fixed all the failing tests, including some in extra/test, there's a separate patch for that which is linked here

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld added inline comments.Jun 27 2017, 11:12 PM
test/Driver/gcc-predef.c
9

The driver will only include stdc-predef.h if it can be found in the system. With that, the current version of this test will only run on such a Linux system. Maybe add a basic tree in test/Driver/Inputs and test that the corresponding header is included?

I will take a look at the final version tomorrow.

Fedor, let me address the comments from Jonas (with another revision!) before you take a look.

mibintc planned changes to this revision.Jun 28 2017, 9:22 PM

I'm planning to rework this patch again. Sorry for the churn.

mibintc updated this revision to Diff 104911.Jun 30 2017, 1:27 PM
mibintc added a subscriber: ilya-biryukov.

The previous (n-1) version didn't work: I wanted to iterate through the system include directories to check for the existence of stdc-predef.h, but clang uses a different kind of system include option for /usr and /usr/include, so the iterator didn't find any system includes at all; I don't think there is an iterator that walks these special include directories. [Kind of embarrassing that I posted a patch which didn't work at all, sorry about that.]

This one behaves more like gcc anyway: gcc passes the -include option regardless, and to suppress the behavior you use the -ffreestanding option.

I responded to the remarks from Jonas Hahnfeld, making the patch only for Linux.

I'm having trouble with one of the existing tests. These are the ones that fail. I don't know how to fix the test. It's a google test and it is using a dummy file system with whitelisted directories. @ilya-biryukov Is there a way to pass in a command line argument? If so, I could fix these failures by adding the -ffreestanding option which suppresses the new behavior. That's how I fixed the many test failures which I encountered with this new behavior.

Extra Tools Unit Tests :: clangd/ClangdTests/ClangdVFSTest.Parse
Extra Tools Unit Tests :: clangd/ClangdTests/ClangdVFSTest.ParseWithHeader
Extra Tools Unit Tests :: clangd/ClangdTests/ClangdVFSTest.Reparse

The other test that fails is my own new test! It fails because I don't know how to set it up so the test thinks it has a gcc toolchain with version > 4.8. I tried using gcc-toolchain set to various other Linux toolchains that I see in test/Driver/Inputs - none of them cause the gcc version to be in the range. I also tried using -ccc-installation=Inputs/ which I see being used for gcc version parsing. How can I set up the test so that the GCCInstallation has a Version >= 4.8? I tested the new functionality from the console on Linux and can confirm it's working.

Failing Tests (4): -- the 3 listed above, as well as

Clang :: Driver/gcc-predef.c -- my new test, the 3rd run which confirms that the include is added

Here's a patch to fix ClangdTests: https://reviews.llvm.org/D34936
We should probably land it before your patch to avoid ClangdTests failures between those llvm and clang-tools-extra revisions.

The other test that fails is my own new test! It fails because I don't know how to set it up so the test thinks it has a gcc toolchain with version > 4.8. I tried using gcc-toolchain set to various other Linux toolchains that I see in test/Driver/Inputs - none of them cause the gcc version to be in the range. I also tried using -ccc-installation=Inputs/ which I see being used for gcc version parsing. How can I set up the test so that the GCCInstallation has a Version >= 4.8? I tested the new functionality from the console on Linux and can confirm it's working.

You could try adding a completely new directory in Inputs/. Add the directory usr/lib/gcc/x86_64-unknown-linux/4.9.0 underneath, which should be recognized as the version. Maybe you have to add some more files; I'm not really familiar with the detection mechanism...

lib/Driver/ToolChains/Gnu.cpp
2332–2343

I think this can still be moved to Linux...

test/Driver/gcc-predef.c
28

Maybe it's better to run clang with -### and check that it passes -include to cc1?

Then you can have:

// CHECK-PREDEF: "-include" "stdc-predef.h"
// CHECK-NO-PREDEF-NOT: "-include" "stdc-predef.h"

and pass the corresponding prefixes to FileCheck. Could you also add a test that the file is not included if it does not exist? This can be done with the current basic_linux_tree.

The other test that fails is my own new test! It fails because I don't know how to set it up so the test thinks it has a gcc toolchain with version > 4.8. I tried using gcc-toolchain set to various other Linux toolchains that I see in test/Driver/Inputs - none of them cause the gcc version to be in the range. I also tried using -ccc-installation=Inputs/ which I see being used for gcc version parsing. How can I set up the test so that the GCCInstallation has a Version >= 4.8? I tested the new functionality from the console on Linux and can confirm it's working.

You could try adding a completely new directory in Inputs/. Add the directory usr/lib/gcc/x86_64-unknown-linux/4.9.0 underneath, which should be recognized as the version. Maybe you have to add some more files; I'm not really familiar with the detection mechanism...

Thanks. I did that by copying the directory from fake_install_tree into a new install directory named stdc-predef. However, this modification didn't alter the macro values for gnu-major and gnu-minor. On the other hand, it is necessary to have a gcc installation in the --sysroot directory: the test didn't pass if the gcc installation wasn't present in the Inputs directory. I used the method you suggested and modified the test to use -### and check for the presence or absence of -include stdc-predef.h. Now these new tests are passing. Thank you very much for the suggestion.

mibintc marked an inline comment as done.Jul 5 2017, 1:35 PM
mibintc updated this revision to Diff 105329.Jul 5 2017, 1:48 PM

I responded to the review from Jonas Hahnfeld: I moved the entire change into ToolChains/Linux and removed the modifications to Gnu.h and Gnu.cpp; I modified the new tests to use -### with FileCheck, and added a tree in Inputs/ for the new tests. Jonas, please see my inline reply concerning whether (or how) to check for the existence of stdc-predef.h.

Also, note that there are some tests from clang-tidy that need to be updated. I have a separate patch for that since it's in a different repo. I assume it would be good to get those modifications landed before this patch.

Jonas asked about adding a new test to ensure that "-include stdc-predef.h" does not get added if the file doesn't exist.

I added a reply to that but I can't see where it went. So I'm writing the reply again.

The current version of the patch doesn't check for the existence of stdc-predef.h. As far as I understand, this is consistent with gcc behavior. It is expected that if you are on a system without stdc-predef.h then you can add -ffreestanding.

In a prior revision of this patch (see Diff 4), I attempted to iterate through the system directories to check for the existence of the file before adding -include with a full path name. However that patch didn't work. I was using this call to iterate through the system includes: getAllArgValues(options::OPT_isystem). However, the directory where stdc-predef.h is located, /usr/include, is added into the search path by using the option -internal-externc-isystem. getAllArgValues(options::OPT_isystem) does not iterate through the include directories which were added via -internal-externc-isystem. [Note: is this a bug?]. There is no enumeration value within options::OPT_? which corresponds to -internal-externc-isystem.

I could check for the existence of this file: /usr/include/stdc-predef.h; I don't know whether this would be acceptable or if it's correct.

jyknight added inline comments.
lib/Driver/ToolChains/Linux.cpp
755

I don't see why it makes any sense to condition this based on a GCC installation existing and being >= 4.8, since this header comes from libc, and is necessary for standards compliance even if there's absolutely no GCC installed or involved.

Additionally, something like this -- a way for the libc to be involved in setting up predefined macros -- seems probably necessary for *any* libc, if they want to be strictly standards-compliant. Some of the required-to-be-predefined macros like __STDC_ISO_10646__ can only really be provided by the libc, since the appropriate value is dependent on the locale implementation in the libc, and not on the compiler at all.

It's possible that glibc on Linux is the only one who's cared to implement it thus far (and, only when you're using GCC, at that, hence this change...). That seems likely, really, since mostly the impacted macros are nearly useless, so who cares... :) However, if others want to be conforming, I expect they _ought_ to be using this mechanism, as well...

Thus, perhaps it ought to be an unconditional behavior of clang to include the file if it exists, unless -ffreestanding is passed.

mibintc updated this revision to Diff 105526.Jul 6 2017, 1:55 PM

I rewrote the patch the way that Richard suggested, adding a cc1 option "finclude-if-exists" and injecting #if __has_include etc. [OK to use finclude? include-if-exists preferred?]
I responded to James' remarks and removed the gcc version check.
Does this look better?

mibintc edited the summary of this revision. (Show Details)Jul 6 2017, 1:55 PM
jyknight added inline comments.Jul 6 2017, 2:55 PM
lib/Driver/ToolChains/Linux.cpp
755

This is still checking GCCInstallation.isValid(), OPT_nostdinc, and only attempting to load the header on Linux OSes.

I don't think it should be conditioned by the former two, and I think it should attempt to include stdc-predef.h if it exists on *all* OSes, not just Linux (for the reasons described in the above comment).

mibintc updated this revision to Diff 105671.Jul 7 2017, 11:57 AM

I updated the patch as James directed, moving the preinclude of stdc-predef.h into clang.cpp and out of the Linux toolchain entirely. I also needed to update a couple more tests in tools/extra, so I will need to update that related differential as well.

mibintc retitled this revision from "to support gcc 4.8 (and newer) compatibility on Linux, preinclude <stdc-predef.h>" to "For standards compatibility, preinclude <stdc-predef.h> if the file is available".Jul 10 2017, 8:00 AM
mibintc edited the summary of this revision. (Show Details)

This version is still disabling upon -nostdinc, which doesn't make sense to me.

You didn't remove that, nor respond explaining why you think it does make sense?

This version is still disabling upon -nostdinc, which doesn't make sense to me.

I agree. If I pass -nostdinc, so that I then arrange include paths for my own libc headers, I'd then like to pick up the predef header from that library (if available). There's no clearly right answer for whether __STDC_NO_THREADS__ is defined, for example, regardless of whether we're using -nostdinc.

This version is still disabling upon -nostdinc, which doesn't make sense to me.

You didn't remove that, nor respond explaining why you think it does make sense?

You're right, I should remove the check on nostdinc. Sorry I missed that remark from you initially; that's why I didn't reply. I will upload another patch which removes the check on nostdinc.

mibintc updated this revision to Diff 105904.Jul 10 2017, 1:08 PM
mibintc edited the summary of this revision. (Show Details)

I removed the check on -nostdinc and made some updates to the test cases.

mibintc edited the summary of this revision. (Show Details)Jul 10 2017, 1:09 PM

OK folks, I was off the grid last week but I'm back now, and at my grindstone again.
I haven't received any comments since I updated the patch to remove the checks on "-nostdinc".
Look OK to commit?
Many thanks for all your reviews
--Melanie

mibintc planned changes to this revision.Jul 27 2017, 10:27 AM

I'm going to rebase the patch to latest.

mibintc updated this revision to Diff 108519.Jul 27 2017, 1:38 PM

Here is an updated diff which is rebased to current trunk per @fedor.sergeev's suggestion (thank you). make check-all and check-clang are working for me on Linux with no unexpected failures.

The associated patch for test cases in the tools/extra directory is still valid, it doesn't need updating.

fedor.sergeev added a comment.EditedJul 27 2017, 3:58 PM

Hmm... I tried this patch and now the following worries me:

  • it passes -finclude-if-exists stdc-predef.h on all platforms (say, including my Solaris platform that has no system stdc-predef.h)
  • it searches all the paths, not just "system include" ones

That essentially disallows a user from having a stdc-predef.h header in their own project, since there is a chance that this user header
will be accidentally included by this hidden machinery.

] cat stdc-predef.h
#error I was not expecting to see this!
] bin/clang hello-world.c
In file included from <built-in>:2:
./stdc-predef.h:1:2: error: I was not expecting to see this!
#error I was not expecting to see this!
 ^
1 error generated.
]

Hmm... I tried this patch and now the following worries me:

  • it passes -finclude-if-exists stdc-predef.h on all platforms (say, including my Solaris platform that has no system stdc-predef.h)

Right, but Solaris probably _ought_ to add one as well, to define those macros.

  • it searches all the paths, not just "system include" ones

    That essentially disallows user to have stdc-predef.h include in my own project, since there is a chance that this user header will be accidentally included by this hidden machinery.

IMO, this is a fairly negligible issue, and so we go *shrug* oh well.

+1 for using a <> include -- that does seem better.

But, note, that will have no effect w.r.t. this issue for most users, since typically people use "cc -Isomepath", which adds 'somepath' to the list which gets searched by both <> and "" includes. Hardly anyone uses -iquote.

Hmm... I tried this patch and now the following worries me:

  • it passes -finclude-if-exists stdc-predef.h on all platforms (say, including my Solaris platform that has no system stdc-predef.h)

Right, but Solaris probably _ought_ to add one as well, to define those macros.

Point taken, started internal discussion with Solaris header folks.

+1 for using a <> include -- that does seem better.

This is the only remaining issue that I would like to see fixed here.

mibintc planned changes to this revision.Jul 28 2017, 8:44 AM

@erichkeane contacted me offline and pointed out I'm Twine-ing with " not <. I'm planning to change the option name from "finclude-if-exists" to "fsystem-include-if-exists", then I'll quote with < not ". I hope to get this updated patch into review later today. Let me know if you think I'm off track. Thanks for all your careful review. I knew there was something wrong with the include, like Fedor pointed out, but I couldn't see where I went wrong.

mibintc updated this revision to Diff 108669.Jul 28 2017, 10:12 AM

Here's an updated patch which uses angle brackets to do the include, so the search for stdc-predef.h is limited to system directories. Also, my previous revision was missing the new test cases since I had gotten a new sandbox but forgot to "svn add". This patch shows the new test cases.

fedor.sergeev added inline comments.Jul 31 2017, 10:13 AM
test/Driver/stdc-predef.c
15 ↗(On Diff #108669)

I would rather see a real check on include file inclusion (say, checking -H output) than a check for a macro.
The exact macro guard name is not a public interface at all. You might be lucky with the current stdc-predef.h on Linux, but on other platforms it could be named differently.

mibintc updated this revision to Diff 108999.Jul 31 2017, 2:19 PM

This patch responds to @fedor.sergeev's feedback from earlier today. It changes the test case test/Driver/stdc-predef.c for the situation where the system does not supply a version of stdc-predef.h. Fedor had suggested an option like -H, which traces the include files, but stdc-predef.h does not appear in the -H include tracing even when it is available on the host system. Instead, for this case I preprocess the simple test file, with sysroot pointing to a filesystem missing stdc-predef.h, and verify with FileCheck that the string stdc-predef.h doesn't appear in the preprocessed output.

joerg added a subscriber: joerg.Aug 1 2017, 4:13 AM

I had a long discussion with James about this on IRC without reaching a clear consensus. I consider forcing this behavior on all targets to be a major bug. It should be opt-in and opt-in only:

(1) The header name is not mandated by any standard. It is not in any namespace generally accepted as implementation-owned.
(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.
(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well integrated tool chain. Most other platforms have a single canonical libc, libm and libpthread implementation and can as such directly define all the relevant macros directly in the driver. Given that many of the macros involved are already reflected by the compiler behavior anyway, they can't be decoupled. I.e. the questionable concept of locale-independent wchar_t is already hard-coded in the front end as soon as any long character literals are used.

As such, please move the command line additions back into the target-specific files for targets that actually want to get this behavior.

mibintc planned changes to this revision.Aug 1 2017, 5:52 AM

I will prepare another patch responding to joerg's comment:

Quoted Text

I had a long discussion with James about this on IRC without reaching a clear consensus. I consider forcing this behavior on all targets to be a major bug. It should be opt-in and opt-in only:

(1) The header name is not mandated by any standard. It is not in any namespace generally accepted as implementation-owned.
(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.
(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well integrated tool chain. Most other platforms have a single canonical libc, libm and libpthread implementation and can as such directly define all the relevant macros directly in the driver. Given that many of the macros involved are already reflected by the compiler behavior anyway, they can't be decoupled. I.e. the questionable concept of locale-independent wchar_t is already hard-coded in the front end as soon as any long character literals are used.

As such, please move the command line additions back into the target-specific files for targets that actually want to get this behavior.

mibintc retitled this revision from "For standards compatibility, preinclude <stdc-predef.h> if the file is available" to "For Linux/gnu compatibility, preinclude <stdc-predef.h> if the file is available".Aug 1 2017, 5:53 AM
mibintc edited the summary of this revision. (Show Details)
mibintc updated this revision to Diff 110055.Aug 7 2017, 12:54 PM

In the last review, it was deemed less controversial if I move these updates back into the gnu/Linux tool chain. This revision is responding to that feedback. I also simplified the test case. I tested on Linux with check-all and check-clang and found no unexpected failures.

The other test case modifications are to fix tests which are broken with this change due to unexpected changes to the preprocessing. I changed all these to pass -ffreestanding which inhibits the new behavior.

In fact I did have trouble writing the new test case to pass with the gnu/Linux toolchain. In lib/Driver/ToolChains/Linux.cpp, the function AddGnuIncludeArgs checks GCCInstallation.isValid(). I don't know how to set up a Driver/Inputs sysroot tree so that the isValid check passes. (Does anyone know?) I experimented with various existing trees and had success with basic_cross_linux_tree, so I used that one in the test case.

However, basic_cross_linux_tree doesn't contain stdc-predef.h. Would it be OK to add an include file to that tree? (I don't know if it's acceptable to add a new sysroot tree: it is a lot of new files and directories. On the other hand, I don't know if it's OK to co-opt a tree which was added by a different test case.) Perhaps you want me to add another test case which verifies that fsystem-include-if-exists works generally.

Currently I've added an affirmative test to see that -fsystem-include-if-exists is on the command line, a test that -ffreestanding inhibits it, and a negative test that, when given a sysroot without the file, the preprocessed output shows no indication of stdc-predef.h.

In fact I did have trouble writing the new test case to pass with the gnu/Linux toolchain. In the file lib/Driver/ToolChains/Linux.cpp function AddGnuIncludeArgs checks if GCCInstallation.isValid().

You should not be doing stdc-predef.h under GCCInstallation.isValid().
You are handling interaction with libc, so it has nothing to do with the presence or absence of gcc toolchain.

mibintc updated this revision to Diff 110193.Aug 8 2017, 7:30 AM

Responding to @fedor.sergeev's comment. This is an updated patch which pulls out the "isValid" check on GCCInstallation. Also, I'm putting back the dummy sysroot tree which contains stdc-predef.h and adding a new test run to confirm that the new option fsystem-include-if-exists actually works.

I had a long discussion with James about this on IRC without reaching a clear consensus. I consider forcing this behavior on all targets to be a major bug. It should be opt-in and opt-in only:

(1) The header name is not mandated by any standard. It is not in any namespace generally accepted as implementation-owned.

This is a point. I didn't think it was a big deal, but if you want to argue a different name should be used, that's a reasonable argument. If we can get some agreement amongst other libc vendors to use some more agreeable alternative name, and keep a fallback on linux-only for the "stdc-predef.h" name, I'd consider that as a great success.

(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.

If this is a problem, making it be Linux-only does _nothing_ to solve it. But I don't actually see how this is a substantively new problem? Compiling with plain -c before would get #defines for those predefined macros that the compiler sets, even though you may not have wanted those. Is this fundamentally different?

(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well integrated tool chain. Most other platforms have a single canonical libc, libm and libpthread implementation and can as such directly define all the relevant macros directly in the driver.

I don't think this is accurate. There are many platforms out there, and for almost none of them do we have exact knowledge of the features of the libc encoded into the compiler. I'd note that not only do you need this for every (OS, libc) combination, you'd need it for every (OS, libc-VERSION) combination.

Given that many of the macros involved are already reflected by the compiler behavior anyway, they can't be decoupled. I.e. the questionable concept of locale-independent wchar_t is already hard-coded in the front end as soon as any long character literals are used.

AFAICT, this example is not actually the case. The frontend only needs to know *a* valid encoding for wide-character literals in some implementation-defined locale (for example, it might always emit them as unicode codepoints, as clang does). Standard says: "the array elements [...] are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as defined by the mbstowcs function with an implementation defined current locale."

On the other hand, I believe the intent of this macro is to guarantee that _no matter what_ the locale is, that a U+0100 character (say, translated with mbrtowc from the locale encoding) gets represented as 0x100.

Thus, it's "fine" for the frontend to always emit wchar_t literals as unicode, yet, the libc may sometimes use an arbitrary different internal encoding, depending on the locale used at runtime. (Obviously using wide character literals with such a libc would be a poor user experience, and such a libc probably ought to reconsider its choices, but that's a different discussion.)

As such, please move the command line additions back into the target-specific files for targets that actually want to get this behavior.

Without even a suggestion of a better solution to use for other targets, I find it to be a real shame to push for this to be linux-only, and leave everyone else hanging. I find that _most_ of these defines are effectively library decisions. I further claim that these are likely to vary over the lifetime of even a single libc, and that as such we would be better served by allowing the libc to set them directly, rather than encoding it into the compiler.

TTBOMK, no targets except linux/glibc/gcc actually comply with this part of the C99/C11 standard today, and so this feature would be useful to have available across all targets.

(I very much dislike that the C standard has started adding all these new predefined macros, instead of exposing them from a header, but there's not much to be done about that...)

Going through the various macros:
__STDC_ISO_10646__:
As discussed above, this is effectively entirely up to the libc. The compiler only need support one possible encoding for wchar_t, and clang always chooses unicode. But it can't define this because the libc might use others as well.

__STDC_MB_MIGHT_NEQ_WC__:
As with __STDC_ISO_10646__, this is up to the libc not the compiler. (At least, I think so... I note that clang currently sets this for freebsd, with a FIXME next to it saying it's only intended to apply to wide character literals. I don't see that the standard says that, however, so, IMO, having it set for freebsd was and is correct).

__STDC_UTF16__, __STDC_UTF32__:
Again, analogous to __STDC_ISO_10646__, except dealing with char16_t/char32_t. This should be set if the libc guarantees that mbrtoc16/mbrtoc32 generate utf16/32-encoded data.

__STDC_ANALYZABLE__
Possibly(?) entirely on the compiler side. (I'm going to bet that nobody will ever implement this, though.)

__STDC_IEC_559__, __STDC_IEC_559_COMPLEX__:
Requires cooperation between library and compiler; some parts of compliance are in the compiler, and some are in the libm routines. GCC defines __GCC_IEC_559 to indicate when the compiler intends to comply, and glibc then sets __STDC_IEC_559__ depending on the GCC define, to indicate the system as a whole is intended to comply.

__STDC_LIB_EXT1__:
Entirely a library issue, nothing to do with the compiler.

__STDC_NO_COMPLEX__:
Given that clang supports complex, what remains is: does the libc provide complex.h and the functions within?

__STDC_NO_THREADS__:
Given that clang supports _Atomic (and provides stdatomic.h), what remains is: does the libc provide threads.h and the functions within?

__STDC_NO_VLA__:
Entirely on the compiler, nothing to do with libc.

I had a long discussion with James about this on IRC without reaching a clear consensus. I consider forcing this behavior on all targets to be a major bug. It should be opt-in and opt-in only:

(1) The header name is not mandated by any standard. It is not in any namespace generally accepted as implementation-owned.

This is a point. I didn't think it was a big deal, but if you want to argue a different name should be used, that's a reasonable argument.
If we can get some agreement amongst other libc vendors to use some more agreeable alternative name, and keep a fallback on linux-only for the "stdc-predef.h" name, I'd consider that as a great success.

Perhaps not a big deal yet, but as I have recently described the stdc-predef.h idea to Oracle Solaris libc/headers/compiler folks, they generally welcomed the idea.

(3) ...Most other platforms have a single canonical libc, libm and libpthread implementation and can as such directly define all the relevant macros directly in the driver.

I don't think this is accurate. There are many platforms out there, and for almost none of them do we have exact knowledge of the features of the libc encoded into the compiler.

Solaris is a direct example of that...

joerg added a comment.Aug 9 2017, 12:08 PM

(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.

If this is a problem, making it be Linux-only does _nothing_ to solve it. But I don't actually see how this is a substantively new problem? Compiling with plain -c before
would get #defines for those predefined macros that the compiler sets, even though you may not have wanted those. Is this fundamentally different?

It makes it a linux-only problem. As such, it is something *I* only care about secondarily. A typical use case I care about a lot is pulling the crash report sources from my (NetBSD) build machine, extracting the original command line to rerun the normal compilation with -save-temps. I don't necessarily have the (same) system headers on the machine I use for debugging, and that's exactly the kind of use case this change breaks. All other predefined macros are driven by the target triple and remain stable.

(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well integrated tool chain. Most other platforms have a single canonical
libc, libm and libpthread implementation and can as such directly define all the relevant macros directly in the driver.

I don't think this is accurate. There are many platforms out there, and for almost none of them do we have exact knowledge of the features of the libc encoded
into the compiler. I'd note that not only do you need this for every (OS, libc) combination, you'd need it for every (OS, libc-VERSION) combination.

Not really. The feature set is rarely changing and generally will have only a cut-off version.

Given that many of the macros involved are already reflected by the compiler behavior anyway, they can't be decoupled. I.e. the questionable
concept of locale-independent wchar_t is already hard-coded in the front end as soon as any long character literals are used.

AFAICT, this example is not actually the case. The frontend only needs to know *a* valid encoding for wide-character literals in some
implementation-defined locale (for example, it might always emit them as unicode codepoints, as clang does). Standard says:
"the array elements [...] are initialized with the sequence of wide characters corresponding to the multibyte character sequence, as
defined by the mbstowcs function with an implementation defined current locale."

I know what the standard says. It doesn't make much sense if you do not have a fixed wchar_t encoding.

On the other hand, I believe the intent of this macro is to guarantee that _no matter what_ the locale is,
that a U+0100 character (say, translated with mbrtowc from the locale encoding) gets represented as 0x100.
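
A minimal C sketch of that guarantee (clang-specific in that clang always encodes wide literals as Unicode code points; a libc merely advertises this via __STDC_ISO_10646__):

```c
#include <wchar.h>

/* Returns nonzero when the wide literal for U+0100 is stored as the
 * Unicode code point 0x100, which is exactly what __STDC_ISO_10646__
 * promises holds in every locale. clang always encodes wide literals
 * this way, regardless of whether the libc defines the macro. */
int wide_literal_is_unicode(void) {
    wchar_t c = L'\u0100'; /* LATIN CAPITAL LETTER A WITH MACRON */
    return c == 0x100;
}
```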

Yes, so it is essentially "we have screwed up by creating a language mechanism that adds a major constraint, so let's go all the way".

Thus, it's "fine" for the frontend to always emit wchar_t literals as unicode, yet, the libc may sometimes use an arbitrary different
internal encoding, depending on the locale used at runtime. (Obviously using wide character literals with such a libc would be a poor
user experience, and such a libc probably ought to reconsider its choices, but that's a different discussion.)

One of the biggest opponents of that was Itojun. It's not a decision that should be made here as it is only indirectly related to this discussion.

As such, please move the command line additions back into the target-specific files for targets that actually want to get this behavior.

Without even a suggestion of a better solution to use for other targets, I find it is to be a real shame to push for this to be linux-only,
and leave everyone else hanging. I find that that _most_ of these defines are effectively library decisions. I further would claim that these
are likely to vary over the lifetime of even a single libc, and that as such we would be better served by allowing the libc to set them directly, rather than encoding it into the compiler.

TTBOMK, no targets except linux/glibc/gcc actually comply with this part of the C99/C11 standard today, and so this feature would be useful to have available across all targets.

(I very much dislike that the C standard has started adding all these new predefined macros, instead of exposing them from a header, but there's not much to be done about that...)

Exactly. It's not like this is a lot of target logic. It should be a single call for targets that want to get this functionality. But that's my point -- it should be opt-in, not opt-out.
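
For context, the header at the center of this thread is tiny. A hypothetical stdc-predef.h in the spirit of glibc's (the macro names are the standard ones; the values here are illustrative, not copied from any real libc):

```c
/* Hypothetical stdc-predef.h sketch. glibc installs a file like this in
 * /usr/include, and GCC on GNU/Linux preincludes it into every
 * non-freestanding translation unit, so the libc, not the compiler,
 * answers these C99/C11 feature questions. */
#ifndef _STDC_PREDEF_H
#define _STDC_PREDEF_H 1

/* IEC 60559 (IEEE 754) floating-point arithmetic is supported */
#define __STDC_IEC_559__        1

/* wchar_t values are ISO 10646 code points; value is a standard's date */
#define __STDC_ISO_10646__      201706L

/* this hypothetical libc ships no <threads.h> implementation */
#define __STDC_NO_THREADS__     1

#endif /* _STDC_PREDEF_H */
```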

(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.

If this is a problem, making it be Linux-only does _nothing_ to solve it. But I don't actually see how this is a substantively new problem? Compiling with plain -c before
would get #defines for those predefined macros that the compiler sets, even though you may not have wanted those. Is this fundamentally different?

It makes it a Linux-only problem. As such, it is something *I* only care about secondarily. A typical use case I care about a lot is pulling the crash report sources from my (NetBSD) build machine,
extracting the original command line to rerun the normal compilation with -save-temps. I don't necessarily have the (same) system headers on the machine I use for debugging and that's exactly
the kind of use case this change breaks. All other predefined macros are driven by the target triple and remain stable.

Don't you use preprocessed source files from a crash?

Just to restate: the ideal outcome of this discussion for me would be to resolve things such that _ALL_ libc implementations will feel comfortable using this technique to provide the C11-required predefined macros.

I'd love for linux, freebsd, macos, solaris, etc etc libc to all conform to the C standard in this regards, and do so in a common way, without the need to encode information about each libc version into the compiler. I _really_ don't think that scales well.

So I take your comments from FreeBSD's point of view seriously, and would very much like to understand and hopefully resolve them.

(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.

If this is a problem, making it be Linux-only does _nothing_ to solve it. But I don't actually see how this is a substantively new problem? Compiling with plain -c before
would get #defines for those predefined macros that the compiler sets, even though you may not have wanted those. Is this fundamentally different?

It makes it a Linux-only problem. As such, it is something *I* only care about secondarily. A typical use case I care about a lot is pulling the crash report sources from my (NetBSD) build machine,
extracting the original command line to rerun the normal compilation with -save-temps. I don't necessarily have the (same) system headers on the machine I use for debugging and that's exactly
the kind of use case this change breaks. All other predefined macros are driven by the target triple and remain stable.

"it's Linux only so I don't care if it's broken." is still not very helpful. :)

But I do think I understand what you're saying now, so thanks for the elaboration.

Firstly, let's consider a "clang foo.i" or "clang -x cpp-output foo.c" compilation. In that case, it *clearly* should not be including the predef file. I think the patch as it stands may not do this properly. A test needs to be added for this to this patch, and perhaps the behavior needs to be fixed as well.

(Sidenote: clang doesn't support preprocessed input properly, but that's another bug, and we certainly ought not make it worse. Check out e.g. "int main() { return __GNUC__; }". It should report that __GNUC__ is undeclared, but instead it compiles into a program that returns 4.)
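
That sidenote is easy to reproduce (the value 4 is clang's GCC-compatibility major version; gcc itself returns its own major version):

```c
/* Save a file containing this as bug.i and compile it with "clang bug.i":
 * because .i means already-preprocessed input, __GNUC__ arguably should be
 * diagnosed as an undeclared identifier, yet clang still expands its
 * predefined macros and the program compiles. Compiled as a normal .c
 * file, the expansion is of course the expected behavior. */
int gnuc_major(void) { return __GNUC__; }
```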

But, that's not the case you're talking about above -- you're not talking about compiling preprocessed output, you're talking about taking output that comes from -frewrite-includes.

Let me recap the scenario:

  1. Start with a source file foo.c, with this content:

#include <stdlib.h>
#pragma clang __debug parser_crash

  2. Run "clang foo.c". It crashes, and dumps a /tmp/foo-XXX.c and a /tmp/foo-XXX.sh script.

The .c file is generated via -frewrite-includes, so it's _not_ already preprocessed, it simply has all includes pulled into a single file. It also _doesn't_ insert the compiler-predefined macros at the top, but it _will_ include the content of this stdc-predef.h file.

OK, so then...
The generated script invokes a -cc1 command line, with all the include arguments stripped out of the command. (TO FIX: We should be stripping the new arg as well: add "-fsystem-include-if-exists" argument to the list of include things in the skipArgs() function in lib/Driver/Job.cpp). Even without that change, it's actually already fine, as there is no include path specified in which to find the file -- but it's cleaner to strip it, so let's do that. The reproducer script will thus run correctly, and not include the file.
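
A sketch of the skipArgs() change being described (simplified; the real function in lib/Driver/Job.cpp handles many more options and joined spellings):

```c
#include <string.h>

/* Sketch of the skipArgs() idea in lib/Driver/Job.cpp: when clang builds
 * the crash-reproducer command line, options that name include paths are
 * stripped together with their values, and the suggestion is to add the
 * new -fsystem-include-if-exists flag to that list. */
static int skip_args(const char *flag, int *num_to_skip) {
    if (strcmp(flag, "-include") == 0 || strcmp(flag, "-isystem") == 0 ||
        strcmp(flag, "-fsystem-include-if-exists") == 0) {
        *num_to_skip = 2; /* drop the flag and the path that follows it */
        return 1;
    }
    *num_to_skip = 0; /* keep this argument */
    return 0;
}
```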

Now, the "/tmp/foo-XXX.sh" also has a line labeled "Driver args: " with the original command line on it. If I understand correctly, you would then take this simpler driver command line and edit it manually: add -save-temps, change the input filename to the "/tmp/foo-XXX.c" file, and run that, instead of actually invoking the reproducer foo-XXX.sh.

Since stdc-predef.h is included automatically, it will now be present twice -- first, it will read the one from your system's /usr/include, and then the copy inlined into the /tmp/foo-XXX.c file. That's not what you desired. You wanted nothing from your /usr/include to be used.

The fix for the end-user here is easy: you can add -nostdinc which will suppress all the default include paths, and thus it will not find this predef file from your system include dir.

I'll note that you'd also have had an issue if the original driver command-line had a "-include" option in it, which you would have needed to edit out manually as well. (But I understand that is less common.)

Have I correctly described the situation? I guess my feeling here is that this is somewhat of an edge-case, and that the workaround (-nostdinc) is a sufficient fix.

(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well-integrated toolchain. Most other platforms have a single canonical
libc, libm, and libpthread implementation and as such can define all the relevant macros directly in the driver.

I don't think this is accurate. There's many platforms out there, and for almost none of them do we have exact knowledge of the features of the libc encoded
into the compiler. I'd note that not only do you need this for every (OS, libc) combination, you'd need it for every (OS, libc-VERSION) combination.

Not really. The feature set is rarely changing and generally will have only a cut-off version.

Yes, but this is information that the compiler has no real need to know, and that for many platforms would be difficult and problematic to coordinate.

(2) It adds magic behavior that can make debugging more difficult. Partially preprocessed sources for example could be compiled with plain -c before, now they need a different command line.

If this is a problem, making it be Linux-only does _nothing_ to solve it. But I don't actually see how this is a substantively new problem? Compiling with plain -c before
would get #defines for those predefined macros that the compiler sets, even though you may not have wanted those. Is this fundamentally different?

It makes it a Linux-only problem. As such, it is something *I* only care about secondarily. A typical use case I care about a lot is pulling the crash report sources from my (NetBSD) build machine,
extracting the original command line to rerun the normal compilation with -save-temps. I don't necessarily have the (same) system headers on the machine I use for debugging and that's exactly
the kind of use case this change breaks. All other predefined macros are driven by the target triple and remain stable.

Don't you use preprocessed source files from a crash?

The crash rewrite tool creates semi-preprocessed output. It resolves includes along all code branches, but leaves macros and CPP conditionals alone.
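
Concretely, -frewrite-includes output for the foo.c recap above has roughly this shape (abbreviated and hypothetical; the exact guard comments and line markers may differ):

```c
/* foo-XXX.c, as produced by -frewrite-includes: the original #include
 * directive is preserved inside a disabled block, the header contents are
 * inlined behind it, and macros/conditionals are left unexpanded for a
 * later preprocessor run. */
#if 0 /* expanded by -frewrite-includes */
#include <stdlib.h>
#endif
# 1 "/usr/include/stdlib.h" 1
/* ...entire contents of stdlib.h, inlined but NOT macro-expanded... */
# 2 "foo.c" 2
#pragma clang __debug parser_crash
```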

Now, the "/tmp/foo-XXX.sh" also has a line labeled "Driver args: " with the original command line on it. If I understand correctly, you would then take this
simpler driver command line and edit it manually: add -save-temps, change the input filename to the "/tmp/foo-XXX.c" file, and run that, instead
of actually invoking the reproducer foo-XXX.sh.

It's not so much that it is simpler, but that it is the only reasonably easy way to force preprocessor output to be written. That output is much nicer for passing to creduce, for example (after stripping the remaining # lines).

Since stdc-predef.h is included automatically, it will now be present twice -- first, it will read the one from your system's /usr/include, and then the copy inlined into the
/tmp/foo-XXX.c file. That's not what you desired. You wanted nothing from your /usr/include to be used.

Correct. Worse, the include may fail depending on the host system and give different results from the original build, resulting in annoying debugging sessions.

The fix for the end-user here is easy: you can add -nostdinc which will suppress all the default include paths, and thus it will not find this predef file from your system include dir.

Yeah, except you have to remember that when it was never needed before.

I'll note that you'd also have had an issue if the original driver command-line had a "-include" option in it, which you would have needed to edit out
manually as well. (But I understand that is less common.)

Yes, -include is rarely used in the wild.

Have I correctly described the situation? I guess my feeling here is that this is somewhat of an edge-case, and that the workaround (-nostdinc) is a sufficient fix.

(3) It seems to me that the GNU userland (and maybe Windows) is the exception to a well-integrated toolchain. Most other platforms have a single canonical
libc, libm, and libpthread implementation and as such can define all the relevant macros directly in the driver.

I don't think this is accurate. There's many platforms out there, and for almost none of them do we have exact knowledge of the features of the libc encoded
into the compiler. I'd note that not only do you need this for every (OS, libc) combination, you'd need it for every (OS, libc-VERSION) combination.

Not really. The feature set is rarely changing and generally will have only a cut-off version.

Yes, but this is information that the compiler has no real need to know, and that for many platforms would be difficult and problematic to coordinate.

That's kind of my point. There is little coordination involved for almost all platforms since they provide a consistent user environment as a single package. Pushing the changes into Clang (or GCC for that matter) is trivial if you do not have to deal with multiple different versions of libc, the kernel and whatever in a mix-and-match scenario.

mibintc updated this revision to Diff 110627.Aug 10 2017, 1:17 PM

This patch is responding to @jyknight 's concern about preprocessed input. The patch as it stands doesn't have this issue. I added 2 test cases, one using option -x cpp-output, and another for a source file suffixed with .i

Quoting James: "Firstly, let's consider a "clang foo.i" or "clang -x cpp-output foo.c" compilation. In that case, it *clearly* should not be including the predef file. I think the patch as it stands may not do this properly. A test needs to be added for this to this patch, and perhaps the behavior needs to be fixed as well."

This patch is responding to @jyknight 's concern about preprocessed input. The patch as it stands doesn't have this issue. I added 2 test cases, one using option -x cpp-output, and another for a source file suffixed with .i

Quoting James: "Firstly, let's consider a "clang foo.i" or "clang -x cpp-output foo.c" compilation. In that case, it *clearly* should not be including the predef file. I think the patch as it stands may not do this properly. A test needs to be added for this to this patch, and perhaps the behavior needs to be fixed as well."

Thanks for the test. It wasn't obvious from the code, so I'm glad to hear it was already correct. :)

Did you see the other suggestion I cleverly hid within a big block of commentary? "(TO FIX: We should be stripping the new arg as well: add "-fsystem-include-if-exists" argument to the list of include things in the skipArgs() function in lib/Driver/Job.cpp)"

Hahnfeld resigned from this revision.Aug 10 2017, 11:05 PM
mibintc planned changes to this revision.Aug 11 2017, 6:48 AM

Need TO FIX: We should be stripping the new arg as well: add "-fsystem-include-if-exists" argument to the list of include things in the skipArgs() function in lib/Driver/Job.cpp

mibintc updated this revision to Diff 111019.Aug 14 2017, 10:06 AM

@jyknight recommended adding the new option to skipArgs in Job.cpp. This revision makes that change and tests that in crash-report.c; look OK?

Another ping. Is it possible to approve this modification, or alternatively, let me know that it won't be approved indefinitely? I understand that it's a controversial change. Thanks!

mibintc updated this revision to Diff 114820.Sep 12 2017, 6:35 AM

I heard that @jyknight is still interested in this functionality, updating the patch to latest sources, to do this i needed to make a small change to a couple of driver tests. Other than that it's the same as previous submission.

This version is fine with me. The only contentious part is whether it should be opt-in or opt-out for platforms, so getting this version in and revisiting the issue again later is OK from my perspective.

Hey @jyknight I heard from @erichkeane that you may want a couple more changes to this patch. Please let me know if you have some changes to recommend. @joerg thought this was OK for submission. Thanks --Melanie

I'd still _very_ much like a solution that is acceptable for all libc to use, and have that on by default.

However, I guess this is fine.

Only concern I have is that it seems odd that so many test-cases needed to be changed. What fails without those test changes?

I'd still _very_ much like a solution that is acceptable for all libc to use, and have that on by default.

However, I guess this is fine.

Only concern I have is that it seems odd that so many test-cases needed to be changed. What fails without those test changes?

Those tests fail because they generate preprocessed output and then check carefully for precise results. Since the preprocessed output is different, the comparison fails. I "fixed" the tests by adding the freestanding option which inhibits the new behavior. I'll have to check out and rebuild everything so I can show you exactly the failure mode, I expect I can post details tomorrow.

mibintc added a comment.EditedOct 25 2017, 7:36 AM

I'd still _very_ much like a solution that is acceptable for all libc to use, and have that on by default.

However, I guess this is fine.

Only concern I have is that it seems odd that so many test-cases needed to be changed. What fails without those test changes?

Those tests fail because they generate preprocessed output and then check carefully for precise results. Since the preprocessed output is different, the comparison fails. I "fixed" the tests by adding the freestanding option which inhibits the new behavior. I'll have to check out and rebuild everything so I can show you exactly the failure mode, I expect I can post details tomorrow.

I created a fresh workspace and applied the patch, then ran "make check-clang". I saw one test fail: Driver/crash-report.c. (BTW I need to recheck all the tests and make sure the test modifications are still needed, and likewise in the "extra" tests directory which is in the separate patch file.) This is the failure mode for the Driver crash report. Is this sufficient detail to understand the test issue? If not, what else can I provide?

llvm/tools/clang/test/Driver/crash-report.c
Exit Code: 1

Command Output (stderr):
--
/site/spt/usr20/mblower/sptel9-ws/llorg/llvm/tools/clang/test/Driver/crash-report.c:20:11: error: expected string not found in input
// CHECK: Preprocessed source(s) and associated run script(s) are located at:

          ^
<stdin>:1:1: note: scanning from here
<built-in>:1:10: fatal error: '-fobjc-runtime=gcc' file not found
^

If I make this change, adding the freestanding option, and rebuild "check-clang", then no tests are failing:
diff --git a/test/Driver/crash-report.c b/test/Driver/crash-report.c
index a3f1f9e..526e4dd 100644
--- a/test/Driver/crash-report.c
+++ b/test/Driver/crash-report.c
@@ -2,7 +2,7 @@
 // RUN: mkdir %t
 // RUN: not env TMPDIR=%t TEMP=%t TMP=%t RC_DEBUG_OPTIONS=1              \
 // RUN:  CC_PRINT_HEADERS=1 CC_LOG_DIAGNOSTICS=1                         \
-// RUN:  %clang -fsyntax-only %s                                         \
+// RUN:  %clang -fsyntax-only -ffreestanding %s                          \
 // RUN:  -F/tmp/ -I /tmp/ -idirafter /tmp/ -iquote /tmp/ -isystem /tmp/  \
 // RUN:  -iprefix /the/prefix -iwithprefix /tmp -iwithprefixbefore /tmp/ \
 // RUN:  -Xclang -internal-isystem -Xclang /tmp/                         \
This comment was removed by mibintc.
mibintc updated this revision to Diff 120581.Oct 27 2017, 6:11 AM

The patch to clang is the same as before. The modifications are to the test cases, I verified that all these test cases need modifications. I will also be updating the associated diff for clang/tools/extra, please review that as well. Without the test patches, these are the tests that fail (using check-all) . The failure mode is as described in a note I posted earlier this week, please let me know if you need further details. @jyknight Looking for your review and approval, thanks!

Clang :: Driver/crash-report-header.h
Clang :: Driver/crash-report-spaces.c
Clang :: Driver/crash-report.c
Clang :: Driver/rewrite-map-in-diagnostics.c
Clang :: Driver/stdc-predef.c
Clang :: Index/IBOutletCollection.m
Clang :: Index/annotate-macro-args.m
Clang :: Index/annotate-tokens-pp.c
Clang :: Index/annotate-tokens.c
Clang :: Index/c-index-getCursor-test.m
Clang :: Index/get-cursor-macro-args.m
Clang :: Index/get-cursor.cpp
Clang :: Preprocessor/ignore-pragmas.c
Clang Tools :: pp-trace/pp-trace-pragma-general.cpp
Clang Tools :: pp-trace/pp-trace-pragma-opencl.cpp
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.BasicTest1
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.BasicTest2
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.BasicTest3
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.IfBlock1
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.IfBlock2
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.IfBlock3
Clang-Unit :: Tooling/./ToolingTests/CommentHandlerTest.PPDirectives
mibintc reopened this revision.Nov 22 2017, 8:59 AM

Hi.

I also posted this question on IRC

On Monday, @erichkeane pushed a fix for D34158 which, on Linux, does a preinclude of stdc-predef.h. Later on Monday he pulled it out because it caused 3 test failures, and there was word of mouth that it also caused the Chromium build to fail. I haven't been watching this IRC so I didn't catch the details about the Chromium build failure.

If you know, can you send me details about how/why the Chromium build fails?

Also, the 3 tests (2 lit tests, 1 google test) which failed on the buildbot do not fail on the Linux servers in-house at Intel. Any clue about why that would be? I have a patched compiler and have verified that the new functionality is present. I build check-clang and there are no failures.

I looked with llvm-lit at each of the 2 failing lit tests and they don't fail; I checked the google test libclang "skip ranges" and it doesn't fail either.

Also, maybe I should redirect this question elsewhere; don't hesitate to let me know.

Also, I have used the self-build for D34158 and likewise those 3 tests do not fail.

I used "llvm-lit -a" on one of the failing tests, test/Modules/crash-vfs-path-symlink-component.m, to see why it was failing, but I was not able to run the commands directly from the Linux shell.
The lit command that didn't work starts with "not env FORCE_CLANG_DIAGNOSTICS_CRASH= ...";
it's the "not" which is not understood by the shell. Does that ring a bell? It works OK within llvm-lit.
The check runs sptel9-ws/llorg1/builds/llorgefi2linux_debug/llvm/bin/count 1; the buildbot is showing 2, not 1.

This revision was automatically updated to reflect the committed changes.