This is an archive of the discontinued LLVM Phabricator instance.

[LTO] Fix assertion failed when flushing bitcode incrementally for LTO output.
ClosedPublic

Authored by Enna1 on Oct 22 2021, 2:01 AM.

Details

Summary

In https://reviews.llvm.org/D86905, we introduce an optimization, when lld emits LLVM bitcode,
we allow bitcode writer flush data to disk early when buffered data size is above some threshold.

But when --plugin-opt=emit-llvm and -o /dev/null are used,
lld will trigger assertion BytesRead >= 0 && static_cast<size_t>(BytesRead) == BytesFromDisk.
When we write output to /dev/null, BytesRead is zero, but at this program point BytesFromDisk is always non-zero.

This patch fix this assertion failed

Diff Detail

Unit TestsFailed

Event Timeline

Enna1 created this revision.Oct 22 2021, 2:01 AM
Enna1 requested review of this revision.Oct 22 2021, 2:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 22 2021, 2:01 AM
Enna1 edited the summary of this revision. (Show Details)Oct 22 2021, 2:02 AM

Can you add a test?

Enna1 added a comment.Oct 22 2021, 7:35 AM

On default, command line option FlushThreshold is 512(M) defined in llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

static cl::opt<uint32_t> FlushThreshold(
    "bitcode-flush-threshold", cl::Hidden, cl::init(512),
    cl::desc("The threshold (unit M) for flushing LLVM bitcode."));

and AFAIK, there is no command line option for lld to modify FlushThreshold value.
So to reproduce this assertion failed in a tiny testcase, we need modify FlushThreshold to 0 manually in BitcodeWriter constructor:

BitcodeWriter::BitcodeWriter(SmallVectorImpl<char> &Buffer, raw_fd_stream *FS)
    : Buffer(Buffer), Stream(new BitstreamWriter(Buffer, FS, /*FlushThreshold*/0)) {
  writeBitcodeHeader(*Stream);
}

Then we can trigger this assertion failed:

$ clang -c -flto tu1.cpp -o tu1.o
$ clang -c -flto tu2.cpp -o tu2.o
$ clang -fuse-ld=lld -flto -Wl,--plugin-opt=emit-llvm tu1.o tu2.o -o /dev/null

The content of tu1.cpp and tu2.cpp:

// tu1.cpp
int unused(int a);
int probably_inlined(int a);
int main(int argc, const char *argv[]) {
  return probably_inlined(argc);
}
// tu2.cpp
int unused(int a) {
  return a + 1;
}
int probably_inlined(int a) {
  return a + 2;
}

On default, command line option FlushThreshold is 512(M) defined in llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

static cl::opt<uint32_t> FlushThreshold(
    "bitcode-flush-threshold", cl::Hidden, cl::init(512),
    cl::desc("The threshold (unit M) for flushing LLVM bitcode."));

and AFAIK, there is no command line option for lld to modify FlushThreshold value.

You can set internal an llvm option such as this from lld via -mllvm -bitcode-flush-threshold=0, or from the clang driver via -Wl,-mllvm,-bitcode-flush-threshold=0

Enna1 updated this revision to Diff 381696.Oct 22 2021, 7:13 PM

Add a test.

MaskRay added a comment.EditedOct 27 2021, 4:01 PM

Before openLTOOutputFile is called, the output may be non-existent.
Then if (!sys::fs::is_regular_file(file)) will disable the D86905 memory optimization.

The D86905 memory optimization relies on read(2) reading back bytes in the file (for the BackpatchWord operation).
However, read(2) on /dev/null just returns 0 and breaks the D86905 assumption.

Now the optimization looks subtle to me. The special file opening operation probably does not belong to lld/ELF. All LTO users need to write some code like this.
So hope @stephan.yichao.zhao can move the subtle code to BitstreamWriter.h.

MaskRay requested changes to this revision.Oct 27 2021, 4:03 PM
This revision now requires changes to proceed.Oct 27 2021, 4:03 PM

Before openLTOOutputFile is called, the output may be non-existent.
Then if (!sys::fs::is_regular_file(file)) will disable the D86905 memory optimization.

The code eventually uses this check to decide if a file is seekable.
The windows branch checks if a file is a regular file because of pipe. For non-linux, I guess the lseek(/dev/null, 0) does not return -1 for some reason.
Is it possible to put the check here?

Thanks for the pointer. Do you have suggestion on how/where to fix the issue leveraging SupportsSeeking?

Thanks for the pointer. Do you have suggestion on how/where to fix the issue leveraging SupportsSeeking?

The spec of lseek says

The behavior of lseek() on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined.

Is /dev/null a device-like file? If so, using lseek is not a stable approach.

Maybe using the following for both WIN32 and non-WIN32 is better

SupportsSeeking = !EC && Status.type() == sys::fs::file_type::regular_file;
stephan.yichao.zhao added a comment.EditedOct 27 2021, 5:38 PM

To limit the restriction to only devices-like file, this says character and block types are devices, using LLVM enums, it could be like this around here:

sys::fs::file_status Status;
std::error_code EC = status(FD, Status);
if (EC ||
    Status.type() == sys::fs::file_type::block_file || 
    Status.type() == sys::fs::file_type::character_file) {
  SupportsSeeking = false;
} else {
  off_t loc = ::lseek(FD, 0, SEEK_CUR);
  SupportsSeeking = loc != (off_t)-1;
}
MaskRay added a comment.EditedOct 27 2021, 9:31 PM

Thanks for the pointer. Do you have suggestion on how/where to fix the issue leveraging SupportsSeeking?

The spec of lseek says

The behavior of lseek() on devices which are incapable of seeking is implementation-defined. The value of the file offset associated with such a device is undefined.

Is /dev/null a device-like file? If so, using lseek is not a stable approach.

Maybe using the following for both WIN32 and non-WIN32 is better

SupportsSeeking = !EC && Status.type() == sys::fs::file_type::regular_file;

Agreed. Does this change address this patch's issue (ld.lld --plugin-opt=emit-llvm -mllvm -bitcode-flush-threshold=0 -o /dev/null %t.o)?

Can you post a patch? (Patching lld/ELF/LTO.cpp gives me a feeling working around a pitfall which might well hit others, so fixing the source would be nice.)

Agreed. Does this change address this patch's issue (ld.lld --plugin-opt=emit-llvm -mllvm -bitcode-flush-threshold=0 -o /dev/null %t.o)?

Can you post a patch? (Patching lld/ELF/LTO.cpp gives me a feeling working around a pitfall which might well hit others, so fixing the source would be nice.)

It needs some test to confirm. I probably do not have time to try this until this weekend.
@Enna1: If you have time, can you please verify if the suggested change works with your local test cases, and update this change if it works?

Enna1 added a comment.EditedOct 27 2021, 9:53 PM

Agreed. Does this change address this patch's issue (ld.lld --plugin-opt=emit-llvm -mllvm -bitcode-flush-threshold=0 -o /dev/null %t.o)?

Can you post a patch? (Patching lld/ELF/LTO.cpp gives me a feeling working around a pitfall which might well hit others, so fixing the source would be nice.)

It needs some test to confirm. I probably do not have time to try this until this weekend.
@Enna1: If you have time, can you please verify if the suggested change works with your local test cases, and update this change if it works?

Sure, I'll verify if the suggested change can fix the assertion failure with my local testcases.
If it works , I'll update this change.
Thanks @stephan.yichao.zhao @MaskRay

Enna1 updated this revision to Diff 382950.Oct 28 2021, 1:51 AM
Enna1 edited the summary of this revision. (Show Details)

Update patch as @stephan.yichao.zhao @MaskRay suggested.
For both WIN32 and non-WIN32, use the following code to determine SupportsSeeking

off_t loc = ::lseek(FD, 0, SEEK_CUR);
sys::fs::file_status Status;
std::error_code EC = status(FD, Status);
if (EC ||
    Status.type() == sys::fs::file_type::block_file || 
    Status.type() == sys::fs::file_type::character_file) {
  SupportsSeeking = false;
} else {
  SupportsSeeking = loc != (off_t)-1;
}
Enna1 added a comment.Oct 28 2021, 2:51 AM

The latest change fails two tests:

  • raw_pwrite_ostreamTest.TestDevNull
  • LTO/ARM/inline-asm.ll

It seems making raw_fd_ostream(/dev/null) do not support seeking breaks logic elsewhere ...

Thank you for testing this.

The test cases fail because the code seeks on /dev/null. Somehow /dev/null can be seeked to the current position, but not others, while BitcodeWriter fails because it seeks /dev/null to non-current position.

BitcodeWriter relies on raw_fd_stream. If the creation of raw_fd_stream returns an error, BitcodeWriter will gets only raw_fd_ostream. raw_fd_stream uses supportsSeek to check if returning an error.

raw_fd_stream was introduced for BitcodeWriter to support that optimization. So I am suggesting we put your original change here like this

sys::fs::file_status Status;
std::error_code EC = status(get_fd(), Status);
if (EC || Status.type() != sys::fs::file_type::regular_file)
    EC = std::make_error_code(std::errc::invalid_argument);
Enna1 updated this revision to Diff 383221.Oct 28 2021, 7:54 PM
Enna1 edited the summary of this revision. (Show Details)

Update patch.
Introduce an is_regular_file check in raw_fd_stream constructor, if not a regular file, return an error code. And BitcodeWriter will fallback use raw_fd_ostream.
This looks good under my tests.
Thanks @stephan.yichao.zhao

LGTM. Thank you! will leave this to @MaskRay for the final check.

Enna1 edited the summary of this revision. (Show Details)Oct 28 2021, 8:16 PM
steven_wu added inline comments.Oct 29 2021, 12:27 PM
llvm/lib/Support/raw_ostream.cpp
919

Would it be better to cache the result in the constructor for raw_fd_ostream? Adding another status call might not be optimal.

@dexonsmith Do you have any concerns over this?

dexonsmith added inline comments.Oct 29 2021, 2:07 PM
llvm/lib/Support/raw_ostream.cpp
919

Agreed that adding a stat is potentially expensive; is there already one in the other raw_fd_ostream constructor? Why not reuse it?

Enna1 added a comment.EditedNov 1 2021, 8:00 PM

Thanks for your review @steven_wu @dexonsmith

What about add a bool IsRegularFile field like bool SupportsSeeking in raw_fd_ostream, and change the code around here like this:

  // Get the starting position.
  off_t loc = ::lseek(FD, 0, SEEK_CUR);
  sys::fs::file_status Status;
  std::error_code EC = status(FD, Status);
  IsRegularFile = Status.type() == sys::fs::file_type::regular_file;
#ifdef _WIN32
  // MSVCRT's _lseek(SEEK_CUR) doesn't return -1 for pipes.
  SupportsSeeking = !EC && IsRegularFile;
#else
  SupportsSeeking = loc != (off_t)-1;
#endif

So we can directly use IsRegularFile in raw_fd_stream constructor.

Enna1 added a comment.Nov 5 2021, 2:53 AM

friendly ping :)

MaskRay added a comment.EditedNov 5 2021, 10:26 AM

You can upload a new diff.

I support that SupportsSeeking = additionally checks Status.type() == sys::fs::file_type::regular_file, so that Windows/non-Windows code paths are unified.

Enna1 updated this revision to Diff 385337.Nov 7 2021, 6:15 AM

Update patch.
Add a bool IsRegularFile field like bool SupportsSeeking in raw_fd_ostream.
Cache the result Status.type() == sys::fs::file_type::regular_file in the constructor for raw_fd_ostream,
and replace checking SupportsSeeking with checking IsRegularFile in raw_fd_stream constructor, if not a regular file, return an error code.

We can not add an additional check Status.type() == sys::fs::file_type::regular_file for SupportsSeeking, this will fail another two tests, @stephan.yichao.zhao wrote an explanation for this.

You can upload a new diff.

I support that SupportsSeeking = additionally checks Status.type() == sys::fs::file_type::regular_file, so that Windows/non-Windows code paths are unified.

Here :)

Thank you for testing this.

The test cases fail because the code seeks on /dev/null. Somehow /dev/null can be seeked to the current position, but not others, while BitcodeWriter fails because it seeks /dev/null to non-current position.

BitcodeWriter relies on raw_fd_stream. If the creation of raw_fd_stream returns an error, BitcodeWriter will gets only raw_fd_ostream. raw_fd_stream uses supportsSeek to check if returning an error.

raw_fd_stream was introduced for BitcodeWriter to support that optimization. So I am suggesting we put your original change here like this

sys::fs::file_status Status;
std::error_code EC = status(get_fd(), Status);
if (EC || Status.type() != sys::fs::file_type::regular_file)
    EC = std::make_error_code(std::errc::invalid_argument);
Enna1 marked 2 inline comments as done.Dec 15 2021, 2:52 AM
MaskRay added inline comments.Dec 29 2021, 6:42 PM
lld/test/ELF/lto/emit-llvm_assertion_failure.ll
7 ↗(On Diff #385337)

ld.lld has very little to do with this LTO regression. It'd be good to use llvm-lto2 or llvm-lto to test this, not ld.lld

llvm/lib/Support/raw_ostream.cpp
651

Why not SupportsSeeking = !EC && IsRegularFile; for non-Windows as well?

Enna1 updated this revision to Diff 396741.Dec 30 2021, 6:41 PM

Update Patch.

Why not SupportsSeeking = !EC && IsRegularFile; for non-Windows as well?

Use SupportsSeeking = !EC && IsRegularFile; for both Windows and non-Windows.

ld.lld has very little to do with this LTO regression. It'd be good to use llvm-lto2 or llvm-lto to test this, not ld.lld

Cause https://reviews.llvm.org/D86905 the special file opening operation is only used in lld/ELF, so for now we can only test this bug in ld.lld.

MaskRay added inline comments.Dec 30 2021, 7:19 PM
lld/test/ELF/lto/emit-llvm_assertion_failture.ll
1 ↗(On Diff #396741)

;;

1 ↗(On Diff #396741)

Use - as filename separator

Suggest: move the RUN line and the comment into emit-llvm.ll. No need for a new test file.

15 ↗(On Diff #396741)

delete trailing blank line

Enna1 updated this revision to Diff 396745.Dec 30 2021, 8:00 PM

Address review comments.

MaskRay accepted this revision.Dec 30 2021, 8:01 PM

Thanks!

This revision is now accepted and ready to land.Dec 30 2021, 8:01 PM
Enna1 updated this revision to Diff 396759.Dec 30 2021, 11:43 PM

Revert using SupportsSeeking = !EC && IsRegularFile; for both Windows and non-Windows, which makes raw_pwrite_ostreamTest.TestDevNull and LTO/ARM/inline-asm.ll failed.

This revision was landed with ongoing or failed builds.Jan 4 2022, 9:40 PM
This revision was automatically updated to reflect the committed changes.