This is an archive of the discontinued LLVM Phabricator instance.

[libc] Allow regex in entrypoint list
AbandonedPublic

Authored by abrachet on Jan 5 2022, 10:36 AM.

Details

Summary

Don't need to list out entrypoints one by one, we can now glob for them.

The glob is kind of a hack, libc.src.ctype.* works because we conveniently have the .. it would be intuitive if you wanted all the is* functions in ctype to write libc.src.ctype.is* but that would not work. I don't imagine this is a situation that would ever come up though.

A current issue is that string functions have many different targets like there is libc.src.string.bcmp and libc.src.string.bcmp_x86_64_opt_sse2, both define the bcmp symbol. I'm not sure how we want to deal with this. At present the targets with different compile options do not make their way to libllvmlibc.a.

Diff Detail

Event Timeline

abrachet created this revision.Jan 5 2022, 10:36 AM
abrachet requested review of this revision.Jan 5 2022, 10:36 AM

If this patch is going to land, there are a few things to fix first, specifically the entrypoints files for the other targets (aarch64 and windows) and the cmake file for the tests (line 54 in libc/test/src/CMakeLists.txt references TARGET_LLVMLIBC_ENTRYPOINTS specifically and throws errors when I try to build with your patch).

But I'm not sure that this patch matches with the design of the entrypoints file, specifically because not all of the functions are currently available for all platforms. For example, the malloc dependant functions aren't available on AArch64 yet, meaning they'd need to be specifically excluded. This means that we'd effectively be swapping a list of functions which are included for a list of functions which are excluded. While this doesn't particularly cause problems for already established platforms, I think it would make bringing up a new platform harder.

I do think having support for globbing could be useful though, specifically for the functions that are very platform independent, such as the ctype.h functions, so perhaps using both globs and explicit lists will be the eventual solution.

abrachet updated this revision to Diff 397737.Jan 5 2022, 6:01 PM
  • Fix failing build when building with LLVM_LIBC_FULL_BUILD defined
  • Add include subdirectory after source directories are finished adding all symbols, so all used symbols can be aggregated before header generation
  • Don't require HdrGen to need one or more --e flags. src/threads/linux/thread_start_args.h is generated under src/ but doesn't need any symbols, so it doesn't need --e.

If this patch is going to land, there are a few things to fix first, specifically the entrypoints files for the other targets (aarch64 and windows)

The other targets don't _need_ to be updated, they still work as is. I think it makes more sense to only change the one file, to limit the scope of this patch. But it is trivial to change the other platforms here if we want that.

and the cmake file for the tests (line 54 in libc/test/src/CMakeLists.txt references TARGET_LLVMLIBC_ENTRYPOINTS specifically and throws errors when I try to build with your patch).

Fixed

But I'm not sure that this patch matches with the design of the entrypoints file, specifically because not all of the functions are currently available for all platforms. For example, the malloc dependant functions aren't available on AArch64 yet, meaning they'd need to be specifically excluded. This means that we'd effectively be swapping a list of functions which are included for a list of functions which are excluded. While this doesn't particularly cause problems for already established platforms, I think it would make bringing up a new platform harder.

Malloc and friends are not included on aarch64 because I only touched the x86 config file. Excluding in that file would be the same as what is done here, list(APPEND TARGET_LIBC_OMIT_ENTRYPOINTS ${MALLOC_DEPENDANT_ENTRYPOINTS})

I'm not sure I see why it makes it harder for other platforms. On the contrary I made this change to make it easier for us to support other platforms and add new symbols without needing to remember to add it to every platform where it should be on.

The patch is better, although there are still a few things that need fixing:

The malloc dependent functions don't work, specifically when building with -DLLVM_LIBC_INCLUDE_SCUDO=ON then strdup gets the error no member named 'malloc' in the global namespace.

On Aarch64 the unmodified entrypoints.txt seems to be causing issues, I've copied the error below, I think the problem is that specifically listed functions aren't being properly included (and this is true on x86_64 as well).

CMake Error at ~/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:88 (add_library):
  Error evaluating generator expression:

    $<TARGET_OBJECTS:libc.src.math.aarch64.ceil>

  Objects of target "libc.src.math.aarch64.ceil" referenced but is not an
  OBJECT library.
Call Stack (most recent call first):
  ~/llvm-project/libc/lib/CMakeLists.txt:3 (add_entrypoint_library)


CMake Error at ~/llvm-project/libc/cmake/modules/LLVMLibCLibraryRules.cmake:88 (add_library):
  No SOURCES given to target: llvmlibc
Call Stack (most recent call first):
  ~/llvm-project/libc/lib/CMakeLists.txt:3 (add_entrypoint_library)

I htink I understand what your intent is a bit better, and I agree that this patch is useful, but I would like to adjust which headers are marked as "all" vs which have their functions listed. I think that headers that are safely and trivially cross-platform can be marked as all, such as ctype.h, in addition I think that headers that have significant platform differences should have their functions listed, such as math.h. The reason behind this is that I agree that not having to add the functions to each list specifically would be nice, but I also think that for functions that may have cross-platform difficulties then adding that function to a platform that you haven't tested is likely to cause problems. With the current setup, if I want to land a function on just x86_64 because it will require tweaking for aarch64 and I want to do that separately, then I just need to remember to add the function to x86_64, which is hard to miss because if I don't then I can't test it. If we switch all functions to being enabled by default on all platforms, then I won't have to remember to add my function to the allow list for x86_64, but I will have to add it to the omit list for all of the other platforms if I don't want to break those platforms' builds.

libc/cmake/modules/LLVMLibCObjectRules.cmake
287

I think you need to add set_property(GLOBAL APPEND PROPERTY TARGET_LLVMLIBC_USED_ENTRYPOINTS ${fq_target_name}) to this function.

abrachet updated this revision to Diff 398393.Jan 8 2022, 9:14 PM
  • Fix bug when building with -DLLVM_LIBC_INCLUDE_SCUDO=ON
  • Fix aliasee targets not being included
  • Only use target globbing for libc.src.ctype.*

@michaelrj I concur. I think it's better to be explicit and list every single functions rather than globing.
Globing simplifies the configuration a bit but I suspect it will have surprising effect in the future.

@abrachet is there anything else you're trying to achieve with this patch?

Globing simplifies the configuration a bit but I suspect it will have surprising effect in the future.

If we don't think the tradeoff is worth it, which it may not be, I'm fine abandoning this patch. I do agree that this could potentially bite us later.

However I think that the current situation is also error prone. For example for our Linux x86 and ARM64 configurations,

memccpy
mempcpy
stpcpy
stpncpy
strncat
atof
strtod
strtof
strtold

Are all missing for ARM despite having pure C++ implementations for these.

For Windows it's worse, granted I don't know how supported Windows even is. All of the above symbols plus these, which are OS agnostic are missing.

imaxdiv
abs
bsearch
div
labs
ldiv
llabs
lldiv
qsort
fedisableexcept
feenableexcept
fegetexcept
cos
llrint
llrintf
llrintl
lrint
lrintf
lrintl
sin
tan

@abrachet is there anything else you're trying to achieve with this patch?

No, we're just planning on adding a new OS target in the future and this would make it simpler to maintain as we increase the number of OS+architecture configurations. Moreover these is no canonical list of all entrypoints, linux/x86_64 essentially acts like this, but it isn't programmatically guaranteed.

I am with @gchatelet and @michaelrj wrt scalability issues and wrt globbing causing subtle issues in future. I agree it is not ideal and may be a bit too verbose. I don't necessarily view this verbosity as a problem as I prefer explicit vs implicit in such matters. That said, if you want to redesign this setup at a higher level, feel free to propose alternates.

An alternative is having a ground truth list of entrypoints that we update explicitly, basically what linux/x86_64/entrypoints.txt is now, and then the OS/arch specific files could use regex. That way the list of entrypoints is created before add_entrypoint_object What you all think about that? Or should we stick with how things are now?

While I'm not opposed to having a ground truth list of entrypoints, in fact I think it would be a good idea, I am concerned that if we're doing regexes against that then the end result might be the same as just using regexes as was previously proposed, except that there's another new place to remember to add the entrypoint.

abrachet abandoned this revision.Apr 18 2022, 8:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2022, 8:47 AM