This is an archive of the discontinued LLVM Phabricator instance.

[libc] Use #undef isascii in specific header
ClosedPublic

Authored by mcgrathr on Jan 13 2021, 5:24 PM.

Details

Summary

Standard C allows all standard headers to declare macros for all
their functions. So after possibly including any standard header
like <ctype.h>, it's perfectly normal for any and all of the
functions it declares to be defined as macros. Standard C requires
explicit #undef before using that identifier in a way that is not
compatible with function-like macro definitions.

The C standard's rules for this are extended to POSIX as well for
the interfaces it defines, and it's the expected norm for
nonstandard extensions declared by standard C library headers too.

So far the only place this has come up for llvm-libc's code is with
the isascii function in Fuchsia's libc. But other cases can arise
for any standard (or common extension) function names that source
code in llvm-libc is using in nonstandard ways, i.e. as C++
identifiers.

The only correct and robust way to handle the possible inclusion of
standard C library headers when building llvm-libc source code is to
use #undef explicitly for each identifier before using it. The
easy and obvious place to do that is in the per-function header.
This requires that all code, such as test code, that might include
any standard C library headers, e.g. via utils/UnitTest/Test.h, make
sure to include those *first* before the per-function header.

This change does that for isascii and its test. But it should be
done uniformly for all the code and documented as a consistent
convention so new implementation files are sure to get this right.

Diff Detail

Event Timeline

mcgrathr created this revision.Jan 13 2021, 5:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 13 2021, 5:24 PM
mcgrathr requested review of this revision.Jan 13 2021, 5:24 PM
sivachandra accepted this revision.EditedJan 13 2021, 10:03 PM

I am to blame for this. I did consider the approach you are suggesting, and also few other less ideal approaches. What you are proposing is the most appropriate way I would think but I did not like it previously for a few reasons.

  1. I felt that instead of introducing a one off pattern, we should have a more uniform pattern. As in, we should have the undef in all of the internal headers.
  2. Even with that approach, it is still sensitive to the order of include files as you have mentioned. Fixing the order breaks the current include order style (which was inherited from the general LLVM style.)

Both the above problems are addressable I think but quickly go beyond the scope of the problem that was to be addressed. So, considering the problem is only showing up for Fuchsia today, I proposed to put the undef next to the Fuchsia includes so that it is clear why we have them at all.

But, now that you have brought it up, I think we should solve this more thoroughly. What I mean is, while your change LGTM, I would like to eliminate these one-off patterns and put in place a more general set of rules/solutions even if it requires breaking away from the general LLVM style for the libc. What do you think of a plan like this, which is mostly along the ideas you have proposed:

  1. For the undef in the internal header: It would be nice to fold the #undef <function name> inside of a macro like this (which doesn't work and my preprocessor knowledge is not enough to come up with alternates) but may be not important because of helper libraries (see #2 below).
// This macro is to be used inside of the namespace __llvm_libc
#define LLVM_LIBC_FUNCTION_DECL(name) \
  #undef name
  decltype(::name) name;
  1. We will change the include order rule to:
    1. First come public libc headers
    2. Then are non-libc headers. So, in unittests, the test framework headers come next.
    3. Next are libc-internal headers. We will need to define what are libc-internal headers. All headers in src/... are internal headers. But, headers from helper libraries outside of the src/... directory should also be considered as internal. We currently only have two such libraries which are header only. All internal headers should #undefpublic libc names used by them in non-standard ways.

Each of the above groups should be lexicographically sorted.

  1. The enforcement of these rules should be tool driven.

Let me know what you think about this.

This revision is now accepted and ready to land.Jan 13 2021, 10:03 PM

clang-format supports include blocks:

BasedOnStyle: LLVM
IncludeBlocks: Regroup
IncludeCategories:
  - Regex:           '^<.*\.h>'
    Priority:        2
  - Regex:           '^<.*'
    Priority:        6
  - Regex:           '^"(llvm|llvm-c)/'
    Priority:        5
  - Regex:           '^"(clang|clang-c)/'
    Priority:        4
  - Regex:           '^"\.\./'
    Priority:        3
  - Regex:           '.*'
    Priority:        1

That plan sounds good to me. There is no way to fold #undef into a macro, so you can't avoid a little bit of boilerplate unless you generate the source code.

This revision was automatically updated to reflect the committed changes.