This is an archive of the discontinued LLVM Phabricator instance.

[IPSCCP] Add assertion to surface cases where we zap returns with overdefined users.
ClosedPublic

Authored by fhahn on Jul 24 2019, 9:57 AM.

Details

Summary

We should only zap returns in functions, where all live users have a
replace-able value (are not overdefined). Unused return values should be
undefined.

This should make it easier to detect bugs like in PR42738.

Alternatively we could bail out of zapping the function returns, but I
think it would be better to address those divergences between function
and call-site values where they are actually caused.

Event Timeline

fhahn created this revision.Jul 24 2019, 9:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 24 2019, 9:57 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
This revision is now accepted and ready to land.Jul 24 2019, 2:31 PM
davide accepted this revision.Jul 24 2019, 5:12 PM

LGTM.

This revision was automatically updated to reflect the committed changes.

A couple of our CI jobs have found this assertion triggering. The first is when compiling the linux kernel using allmodconfig. The second is on this reduced example using LTO:

// test1.cpp
// clang --target=armv7a-linux-gnueabihf test1.cpp -O2 -flto -fuse-ld=lld
struct b {
  double a;
};
void d();
short e(b) {
  try {
    d();
  } catch (short) {
  }
}
int g() {
  b f;
  try {
    e(f);
  } catch (short) {
  }
}
int main() {
  try {
    try {
      g();
    } catch (char) {
    }
  } catch (char) {
  }
}

Results in:

ld.lld: /work/llvm/src/llvm/lib/Transforms/Scalar/SCCP.cpp:1940: void findReturnsToZap(llvm::Function &, SmallVector<llvm::ReturnInst *, 8> &, (anonymous namespace)::SCCPSolver &): Assertion `all_of(F.users(), [&Solver](User *U) { if (isa<Instruction>(U) && !Solver.isBlockExecutable(cast<Instruction>(U)->getParent())) return false; if (U->getType()->isStructTy()) { return all_of( Solver.getStructLatticeValueFor(U), [](const LatticeVal &LV) { return !LV.isOverdefined(); }); } return !Solver.getLatticeValueFor(U).isOverdefined(); }) && "We can only zap functions where all live users have a concrete value"' failed.
Stack dump:
0.      Program arguments: /work/llvm/build/bin/ld.lld -EL -z relro -X --hash-style=gnu --eh-frame-hdr -m armelf_linux_eabi -dynamic-linker /lib/ld-linux-armhf.so.3 -o a.out /usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../arm-linux-gnueabihf/lib/../lib/crt1.o /usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../arm-linux-gnueabihf/lib/../lib/crti.o /usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/crtbegin.o -L/usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0 -L/usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../arm-linux-gnueabihf/lib/../lib -L/usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../lib -L/work/llvm/build/bin/../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/arm-linux-gnueabihf/../../lib -L/usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../arm-linux-gnueabihf/lib -L/usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../.. -L/work/llvm/build/bin/../lib -L/lib -L/usr/lib -plugin /work/llvm/build/bin/../lib/LLVMgold.so -plugin-opt=mcpu=generic -plugin-opt=O2 /tmp/test1-87b8ab.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/crtend.o /usr/lib/gcc-cross/arm-linux-gnueabihf/7.3.0/../../../../arm-linux-gnueabihf/lib/../lib/crtn.o 
1.      Running pass 'Interprocedural Sparse Conditional Constant Propagation' on module 'ld-temp.o'.
 #0 0x0000000001bdc184 PrintStackTraceSignalHandler(void*) (/work/llvm/build/bin/ld.lld+0x1bdc184)
 #1 0x0000000001bd9eee llvm::sys::RunSignalHandlers() (/work/llvm/build/bin/ld.lld+0x1bd9eee)
 #2 0x0000000001bdc718 SignalHandler(int) (/work/llvm/build/bin/ld.lld+0x1bdc718)
 #3 0x00007feade941890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
 #4 0x00007feadd029e97 raise /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
 #5 0x00007feadd02b801 abort /build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c:81:0
 #6 0x00007feadd01b39a __assert_fail_base /build/glibc-OTsEL5/glibc-2.27/assert/assert.c:89:0
 #7 0x00007feadd01b412 (/lib/x86_64-linux-gnu/libc.so.6+0x30412)
 #8 0x0000000003884f75 (/work/llvm/build/bin/ld.lld+0x3884f75)
 #9 0x00000000038819ef llvm::runIPSCCP(llvm::Module&, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::function_ref<llvm::AnalysisResultsForFn (llvm::Function&)>) (/work/llvm/build/bin/ld.lld+0x38819ef)
#10 0x000000000366acec (anonymous namespace)::IPSCCPLegacyPass::runOnModule(llvm::Module&) (/work/llvm/build/bin/ld.lld+0x366acec)
#11 0x000000000408b76d llvm::legacy::PassManagerImpl::run(llvm::Module&) (/work/llvm/build/bin/ld.lld+0x408b76d)
#12 0x00000000031b4d6b (anonymous namespace)::opt(llvm::lto::Config&, llvm::TargetMachine*, unsigned int, llvm::Module&, bool, llvm::ModuleSummaryIndex*, llvm::ModuleSummaryIndex const*) (/work/llvm/build/bin/ld.lld+0x31b4d6b)
#13 0x00000000031b38e6 llvm::lto::backend(llvm::lto::Config&, std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, unsigned int, std::unique_ptr<llvm::Module, std::default_delete<llvm::Module> >, llvm::ModuleSummaryIndex&) (/work/llvm/build/bin/ld.lld+0x31b38e6)
#14 0x00000000031a91bb llvm::lto::LTO::runRegularLTO(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>) (/work/llvm/build/bin/ld.lld+0x31a91bb)
#15 0x00000000031a8b71 llvm::lto::LTO::run(std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)>, std::function<std::function<std::unique_ptr<llvm::lto::NativeObjectStream, std::default_delete<llvm::lto::NativeObjectStream> > (unsigned int)> (unsigned int, llvm::StringRef)>) (/work/llvm/build/bin/ld.lld+0x31a8b71)
#16 0x0000000001d2cfa0 lld::elf::BitcodeCompiler::compile() (/work/llvm/build/bin/ld.lld+0x1d2cfa0)
#17 0x0000000001cb2226 void lld::elf::LinkerDriver::compileBitcodeFiles<llvm::object::ELFType<(llvm::support::endianness)1, false> >() (/work/llvm/build/bin/ld.lld+0x1cb2226)
#18 0x0000000001ca678a void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::support::endianness)1, false> >(llvm::opt::InputArgList&) (/work/llvm/build/bin/ld.lld+0x1ca678a)
#19 0x0000000001c9ebb4 lld::elf::LinkerDriver::main(llvm::ArrayRef<char const*>) (/work/llvm/build/bin/ld.lld+0x1c9ebb4)
#20 0x0000000001c9d0f7 lld::elf::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&) (/work/llvm/build/bin/ld.lld+0x1c9d0f7)
#21 0x0000000001b7d838 main (/work/llvm/build/bin/ld.lld+0x1b7d838)
#22 0x00007feadd00cb97 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:344:0
#23 0x0000000001b7d02a _start (/work/llvm/build/bin/ld.lld+0x1b7d02a)
clang-10: error: unable to execute command: Aborted (core dumped)
clang-10: error: linker command failed due to signal (use -v to see invocation)

I do have a mail from our CI system with a series of reproducible steps that can clone and checkout revisions of LLVM and Linux that can cross-compile the linux kernel for aarch64 on an x86_64 linux machine that I'll forward on to you. I also have the preprocessed source of kernel/signal.c that failed that I'll include in the mail but I can also probably add to a PR if I compress it.

Would you be able to take a look to see if the failures are genuine bugs in clang that the assertion has uncovered or whether it is triggering by mistake?

It is

A couple of our CI jobs have found this assertion triggering. The first is when compiling the linux kernel using allmodconfig. The second is on this reduced example using LTO:

Thanks, that's interesting, I'll take a look. This is also triggering a different assertion in this job: https://ci.linaro.org/job/tcwg_kernel-build-llvm-master-aarch64-mainline-allmodconfig/401/ (@nickdesaulniers made me aware) and I think I should have a reproducer for that issue as well shortly. I should know what's going on on Monday the latest. If it is blocking your jobs, should we revert it in the meantime?

A couple of our CI jobs have found this assertion triggering. The first is when compiling the linux kernel using allmodconfig. The second is on this reduced example using LTO:

Thanks, that's interesting, I'll take a look. This is also triggering a different assertion in this job: https://ci.linaro.org/job/tcwg_kernel-build-llvm-master-aarch64-mainline-allmodconfig/401/ (@nickdesaulniers made me aware) and I think I should have a reproducer for that issue as well shortly. I should know what's going on on Monday the latest. If it is blocking your jobs, should we revert it in the meantime?

Thanks for taking a look. From a Linaro perspective there isn't anything that depends on this that needs to run over the weekend so there is no immediate need to revert if you are going to look at it next week. Nick might have more urgent jobs that are affected though so I'm happy to let his opinion take precedence.

fhahn added a comment.Jul 27 2019, 7:03 AM

I do have a mail from our CI system with a series of reproducible steps that can clone and checkout revisions of LLVM and Linux that can cross-compile the linux kernel for aarch64 on an x86_64 linux machine that I'll forward on to you. I also have the preprocessed source of kernel/signal.c that failed that I'll include in the mail but I can also probably add to a PR if I compress it.

Would you be able to take a look to see if the failures are genuine bugs in clang that the assertion has uncovered or whether it is triggering by mistake?

Would it be possible to provide the bitcode file before LTO without too much effort? The reproducer builds fine on macOS without -fuse-ld=lld (fuse-ld=lld fails with an unrelated error) but unfortunately I do not have access to a suitable Linux system for cross-compiling to armv7a-linux-gnueabihf

We at least have to check if the user is a blockaddress. If they do not have any users in the IR, we won't have lattice values for them.

I do have a mail from our CI system with a series of reproducible steps that can clone and checkout revisions of LLVM and Linux that can cross-compile the linux kernel for aarch64 on an x86_64 linux machine that I'll forward on to you. I also have the preprocessed source of kernel/signal.c that failed that I'll include in the mail but I can also probably add to a PR if I compress it.

Would you be able to take a look to see if the failures are genuine bugs in clang that the assertion has uncovered or whether it is triggering by mistake?

Would it be possible to provide the bitcode file before LTO without too much effort? The reproducer builds fine on macOS without -fuse-ld=lld (fuse-ld=lld fails with an unrelated error) but unfortunately I do not have access to a suitable Linux system for cross-compiling to armv7a-linux-gnueabihf

I can get the bitcode object out of lld before the code-generator is run, but as there is only one input this is pretty much identical to the output of the compiler.

I'm struggling to reproduce the failure outside of lld. I've tried:

  • -fuse-ld=gold with the llvmLTO.so plugin
  • Extracting what looks to be the equivalent command line options that llvm would pass to the lto plugin --data-sections --function-sections --relax-elf-relocations --relocation-model=static --addrsig --disable-verify -O3 -mcpu=generic

I can only think that there is some way that lld is setting up the code generator via the API that I can't capture with the external tools.

I will send you (via email due to file size) the output of lld --reproduce which should capture all the dependencies in an archive. I've modified the example a bit to add a test2.cpp to provide a definition of void d() {} this needs to be in a different source file as the test will pass if I define in test1.cpp.

I've checked that the reproducer works on my machine (TM).