This is an archive of the discontinued LLVM Phabricator instance.

[libc] Enable linux directory entries syscalls in riscv64
ClosedPublic

Authored by mikhail.ramalho on Apr 6 2023, 12:56 PM.

Details

Summary

This patch updates the struct dirent to be on par with glibc (by adding
a missing d_type member) and update the readdir call to use SYS_getdents64
instead of SYS_getdents.

The change to SYS_getdents64 was necessary because SYS_getdents returns
a slightly different struct (without the d_type member), so when
returning a pointer to our internal buffer holding the directory entries,
the contents of the buffer would be off-by-one, resulting in garbage.

To support SYS_getdents, buffer is rewritten everytime the syscall is
called: as per the documentation, d_type is always the last byte in the
structure, so the rewrite:

  1. gets d_type from the last byte in the structure
  2. shifts d_name by one
  3. stores d_type where d_name was previously
  4. clears previous d_type location

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 6 2023, 12:56 PM

Add support for SYS_getdents

mikhail.ramalho published this revision for review.Apr 7 2023, 11:07 AM
mikhail.ramalho edited the summary of this revision. (Show Details)
mikhail.ramalho edited the summary of this revision. (Show Details)

Fix error message

michaelrj added inline comments.Apr 11 2023, 3:45 PM
libc/src/__support/File/dir.cpp
45

this struct should be moved to its own file in include/llvm-libc-types/ to match struct dirent

51

you should probably add a comment clarifying that the 1 here represents the minimum width of this string, and that this struct has a variable length.

libc/src/__support/File/dir.cpp
45

struct linux_dirent is kinda an oddball... glibc never defines it, see https://man7.org/linux/man-pages/man2/getdents.2.html:
Note: There is no definition of struct linux_dirent in glibc; see NOTES.

The notes mention:

Library support for getdents64() was added in glibc 2.30; Glibc
does not provide a wrapper for getdents(); call getdents() (or
getdents64() on earlier glibc versions) using syscall(2).  In
that case you will need to define the linux_dirent or
linux_dirent64 structure yourself.

Maybe I can add a comment just explaining that?

I also doesn't seem to be exposed by the kernel: https://elixir.bootlin.com/linux/v4.7/source/fs/readdir.c#L150

51

will do!

sivachandra added inline comments.Apr 13 2023, 12:57 AM
libc/include/llvm-libc-types/ino64_t.h
12 ↗(On Diff #511741)

As far as I can tell, this is exactly the same as ino_t. Why do we need this?

libc/include/llvm-libc-types/struct_dirent.h
12

This will pollute user namespace.

17

This type definition should live in llvm-libc-types/ino_t.h. Similar structuring for off_t;

21

Have you checked to see if this breaks x86_64?

libc/src/__support/File/dir.cpp
41

This file is supposed to contain the platform independent implementation. Can we improve the platform abstraction to push variance within Linux into the Linux implementation?

81

Calling memcpy on overlapping source and destination memory is undefined.

mikhail.ramalho marked 4 inline comments as done.Apr 17 2023, 6:35 AM
mikhail.ramalho added inline comments.
libc/include/llvm-libc-types/ino64_t.h
12 ↗(On Diff #511741)

I'm following glibc here, as it defines both types separately, despite them being the same. Do you think it's better to remove it and just typedef ino64_t to be ino_t?

libc/include/llvm-libc-types/struct_dirent.h
12

Do you have any suggestions on how to have SYS_getdents64 available here?

21

No regressions in x86_64.

libc/src/__support/File/dir.cpp
41

do you mean creating a linux/ dir with the Dir::read() implementation?

81

changed to memmove.

mikhail.ramalho marked 2 inline comments as done.
  • Added comment about string lenght
  • changed memcopy to memmove to prevent UB
jrtc27 requested changes to this revision.Apr 17 2023, 7:01 AM
jrtc27 added inline comments.
libc/include/llvm-libc-types/ino64_t.h
12 ↗(On Diff #514209)

uintptr_t is most certainly not the same thing as ino64_t. I remember complaining about ino_t and time_t(?) being typedef'ed to that a while back but nobody fixed it. NAK on this though; use proper types.

This revision now requires changes to proceed.Apr 17 2023, 7:01 AM
mikhail.ramalho marked an inline comment as done.Apr 17 2023, 7:32 AM
mikhail.ramalho added inline comments.
libc/include/llvm-libc-types/ino64_t.h
12 ↗(On Diff #514209)

I just double-checked and in glibc it's defined as __uint64_t in 32 bits and unsigned long int in 64 bits.

We can use __UINT64_TYPE__ then (long long unsigned int in 32 bits and long unsigned int in 64 bits).

mikhail.ramalho marked an inline comment as done.

Changed ino_t and ino64_t type to UINT64_TYPE

jrtc27 added inline comments.Apr 17 2023, 7:43 AM
libc/include/llvm-libc-types/ino64_t.h
12 ↗(On Diff #514209)

FYI Darwin (unnecessarily? for consistency?) uses unsigned long long even on 64-bit for uint64_t but that's fine, it does still define ino_t as uint64_t.

michaelrj added inline comments.Apr 18 2023, 4:52 PM
libc/include/llvm-libc-types/struct_dirent.h
17

since ino64_t and ino_t are both typedef'd to uint64_t now, this should be unnecessary. Same for off64_t and off_t.

libc/src/__support/File/dir.cpp
45

glibc not providing the struct publicly doesn't mean that we can't. I think it's cleaner to have the type defined in llvm-libc-types and then use #ifdef to include the appropriate struct.

mikhail.ramalho marked 5 inline comments as done.
  • Added new struct linux_dirent to llvm-libc-types
  • Removed both ino64_t and off64_t
libc/src/__support/File/dir.cpp
45

got it

Add linux_dirent to dirent.h only if unix is defined

michaelrj added inline comments.Apr 20 2023, 11:50 AM
libc/include/dirent.h.def
14 ↗(On Diff #515327)

if this is a linux type then it probably makes more sense for this to be __linux__

libc/include/llvm-libc-types/off64_t.h
12 ↗(On Diff #515327)

I didn't mean you should delete off64_t completely, only that you didn't need it in that specific place.

libc/src/__support/File/dir.cpp
14–15

now that we've moved to cpp/new.h for allocation, you should be able to remove the stdlib include as well as the string include.

47

We generally avoid calling public names of libc function in our internal code, if you want to call strlen you should use the internal implementation in string/string_utils.h called string_length.

65

You should also use the internal memmove which is in src/string/memory_utils/memmove_implementations.h and memset (same pattern)

mikhail.ramalho marked 3 inline comments as done.Apr 21 2023, 7:17 AM
mikhail.ramalho added inline comments.
libc/include/llvm-libc-types/off64_t.h
12 ↗(On Diff #515327)

We don't actually need it since, as you mentioned, off_t and off64_t are the same types (the same for ino_t and ino64_t). Do we want to keep them both in the code base?

  • Changed ifdef unix to linux
  • Changed strlen, memset and memmove to use libc's internal implementation

ping.

@sivachandra can you clarify what you mean by "This file is supposed to contain the platform independent implementation. Can we improve the platform abstraction to push variance within Linux into the Linux implementation?"?

sivachandra added inline comments.Apr 26 2023, 9:50 AM
libc/src/__support/File/dir.cpp
41

This file is a platform independent abstraction for the dirent API. So, all references to Linux pieces and variances should move to linux_dir.cpp. Which means that we should not have the #ifdef for Linux syscalls here. I have not looked in to it in detail, but what I was trying to say that, if are unable to fit these variances in the current platform abstraction, we should enhance the abstraction instead of leaking platform specific details in to this file.

Moved linux specific implementation to linux_dir.cpp

mikhail.ramalho marked 2 inline comments as done.Apr 27 2023, 11:52 AM

@sivachandra does my last chance address your comment?

The Linux-specific handling was moved to the linux_dir.cpp file.

@sivachandra does my last chance address your comment?

That particular comment was addressed I believe. I want to do some homework wrt the types used in this patch. I will get back shortly.

libc/include/llvm-libc-types/struct_linux_dirent.h
1 ↗(On Diff #517365)

This is an internal type so should not be define in a public header file.

mikhail.ramalho added inline comments.May 1 2023, 1:26 PM
libc/include/llvm-libc-types/struct_linux_dirent.h
1 ↗(On Diff #517365)

This sounds contradictory to what @michaelrj suggested in one of his comments:

glibc not providing the struct publicly doesn't mean that we can't. I think it's cleaner to have the type defined in llvm-libc-types and then use #ifdef to include the appropriate struct.

michaelrj added inline comments.May 1 2023, 1:32 PM
libc/include/llvm-libc-types/struct_linux_dirent.h
1 ↗(On Diff #517365)

From what I can tell from the man page, the linux_dirent struct is somewhat standard, but also often not defined by the C library. Given that, I personally prefer defining it like the other dirent struct as a public type. If we decide not to define it publicly, I think we should still define it in a separate header and use ifdef to include it as needed. That header might end up in __support/File/linux.

sivachandra added inline comments.May 1 2023, 1:37 PM
libc/include/llvm-libc-types/struct_linux_dirent.h
1 ↗(On Diff #517365)

Sorry, I missed @michaelrj's comment - AFAICT, linux_dirent is only relevant for the syscall, so we should not expose it publicly. We have examples of other such types in the libc: https://github.com/llvm/llvm-project/blob/main/libc/src/signal/linux/signal_utils.h#L29. And since it would be an internal type, you should name it LinuxDirent.

sivachandra added inline comments.May 1 2023, 1:43 PM
libc/include/llvm-libc-types/struct_linux_dirent.h
1 ↗(On Diff #517365)

From what I can tell from the man page, the linux_dirent struct is somewhat standard, but also often not defined by the C library.

You are perhaps looking at https://man7.org/linux/man-pages/man2/getdents.2.html, which says:

These are not the interfaces you are interested in.  Look at readdir(3) for the POSIX-conforming C library interface.  This page documents the bare kernel system call interfaces.
  • make struct linux_dirent a private header
  • rebased

We can land this after my comments from this round are addressed.

libc/include/llvm-libc-types/ino_t.h
12 ↗(On Diff #518727)

Is this required for this change? If not, remove it. If yes, can explain why?

Defining ino_t to be a typedef of __UINTPTR_TYPE__ might not be the best but it was done because the size of ino_t matches the pointer size on most platforms. If have ways to do this better, feel free to send a separate patch.

libc/include/llvm-libc-types/off64_t.h
12 ↗(On Diff #515327)

Linux man pages for fopencookie explicitly specify off64_t so we should keep it.

libc/src/__support/File/linux_dir.cpp
39

Do we need this alternate path at all for the platforms we care about now? For example, if none of the builders exercise this path, then it is dead code. The four platforms I have checked and care for at this time, x86_64, aarch64, arm32 and riscv64, all have SYS_getdents64. If this alternate path is not an immediate necessity, we should skip it.

libc/src/__support/File/linux_dir.cpp
39

@sivachandra, SYS_getdents doesn't seem to be defined in riscv64, so we need to add the patch to support SYS_getdents64 instead. Please, check: https://github.com/riscv-collab/riscv-gnu-toolchain/blob/master/linux-headers/include/asm-generic/unistd.h

May I ask how you are testing this? I tested cross-compiling libc with clang, and building it natively in qemu, and SYS_getdents was not defined in either case.

sivachandra added inline comments.May 4 2023, 12:14 PM
libc/src/__support/File/linux_dir.cpp
39

I think you understood my comment in the other way - I was suggesting the removal of the SYS_getdents path and to keep only the new SYS_getdents64 path.

mikhail.ramalho marked an inline comment as done.May 4 2023, 12:22 PM
mikhail.ramalho added inline comments.
libc/src/__support/File/linux_dir.cpp
39

Oh, I see; riscv32 doesn't seem to have SYS_getdents defined either.

I would only argue to keep this for completeness' sake, but I can remove it if you prefer.

sivachandra added inline comments.May 4 2023, 12:30 PM
libc/src/__support/File/linux_dir.cpp
39

It would be dead code not tested anywhere. So, I would suggest to just remove it.

addressed comments

sivachandra accepted this revision.May 4 2023, 1:08 PM
This revision was not accepted when it landed; it landed in state Needs Review.May 4 2023, 3:07 PM
This revision was automatically updated to reflect the committed changes.