Page MenuHomePhabricator

[SCCP] Remove forcedconstant, go to overdefined instead
ClosedPublic

Authored by fhahn on Apr 30 2019, 7:54 AM.

Details

Summary

This patch removes forcedconstant to simplify things for the
move to ValueLattice, which includes constant ranges, but no
forced constants.

This patch removes forcedconstant and changes ResolvedUndefsIn
to mark instructions with unknown operands as overdefined. This
means we do not do simplifications based on undef directly in SCCP
any longer, but this seems to hardly come up in practice (see stats
below), presumably because InstCombine & others take care
of most of the relevant folds already.

It is still beneficial to keep ResolvedUndefIn, as it allows us delaying
going to overdefined until we propagated all known information.

I also built MultiSource, SPEC2000 and SPEC2006 and compared
sccp.IPNumInstRemoved and sccp.NumInstRemoved. It looks like the impact
is quite low:

Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.IPNumInstRemoved

Program base patch diff
test-suite...arks/VersaBench/dbms/dbms.test 4.00 3.00 -25.0%
test-suite...TimberWolfMC/timberwolfmc.test 38.00 34.00 -10.5%
test-suite...006/453.povray/453.povray.test 158.00 155.00 -1.9%
test-suite.../CINT2000/176.gcc/176.gcc.test 668.00 668.00 0.0%
test-suite.../CINT2006/403.gcc/403.gcc.test 1209.00 1209.00 0.0%
test-suite...arks/mafft/pairlocalalign.test 76.00 76.00 0.0%

Tests: 244
Same hash: 238 (filtered out)
Remaining: 6
Metric: sccp.NumInstRemoved

Program base patch diff
test-suite...arks/mafft/pairlocalalign.test 185.00 175.00 -5.4%
test-suite.../CINT2006/403.gcc/403.gcc.test 2059.00 2056.00 -0.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 2358.00 2357.00 -0.0%
test-suite...006/453.povray/453.povray.test 317.00 317.00 0.0%
test-suite...TimberWolfMC/timberwolfmc.test 12.00 12.00 0.0%

Diff Detail

Event Timeline

fhahn created this revision.Apr 30 2019, 7:54 AM
fhahn updated this revision to Diff 235030.Dec 21 2019, 2:42 PM

Rebased

Unit tests: fail. 61088 tests passed, 1 failed and 728 were skipped.

failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_recursive/lock.pass.cpp

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

fhahn updated this revision to Diff 235431.Dec 27 2019, 8:39 AM

Another rebase after splitting up patches.

Unit tests: fail. 61133 tests passed, 1 failed and 728 were skipped.

failed: Clang.SemaOpenCL/numbered-address-space.cl

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

You aren't really removing forcedconstant, here; instead, you're basically making every constant behave like a forcedconstant (it can be merged with another constant). I'm not very confident our extension of the SCCP algorithm to include forcedconstants is truly sound. And I think constructing ConstantRanges out of forcedconstants makes the whole foundation shakier; I'm afraid you could end up constructing a contradiction because we don't ever force the value of the forcedconstant.

I'd be much more comfortable with pretending that undef is an opaque constant, like a GlobalValue. That makes the whole algorithm much closer to a textbook algorithm, and it should be essentially sound, I think. If we don't manipulate UndefValues except through ConstantExpr folding, it should be sound as long as ConstantExpr folding is itself sound. And we don't have to do any extra work as undef gets replaced with poison. (Once we have poison constants, I think we can treat them as Unknown without any special logic.)

I'm not planning to continue reviewing the rest of the patches in this series until my comment here is addressed. Does that make sense?

I'm not planning to continue reviewing the rest of the patches in this series until my comment here is addressed. Does that make sense?

Yes that makes a lot of sense, thank you! IIUC your suggestion is independent of the ConstantRange extension and I'll try to put up a patch on top of current master.

fhahn updated this revision to Diff 243011.Feb 6 2020, 2:33 PM

Update the patch to remove forcedconstant and only folding explicit undefs.

I'll update the description. Eli, it would be great if you could have a look to check if that makes sense to you now.

fhahn retitled this revision from [SCCP] Remove forcedconstant and replace some uses with constant ranges. to [SCCP] Remove forcedconstant, use InstSimplify to fold undef values..Feb 6 2020, 2:34 PM
fhahn edited the summary of this revision. (Show Details)

I'd prefer to get rid of ResolveUndefsIn completely, if possible. If we don't special-case UndefValue anywhere in SCCP, that should just work, I think? SCCP should still be correct even though UndefValue isn't a "normal" constant; it doesn't try to make assumptions based on the pointer equivalence of two "constant" values, except for joins like PHI nodes where it shouldn't matter. Am I missing something? How much optimization power do we lose that way, versus this patch's approach?

The approach here is a little different from that. I'm understanding it correctly, the idea is that in cases where we can't prove a forced constant is actually constant, we instead go straight to overdefined? So we sort of keep our "undef=>undefined" extension of the SCCP model, but get rid of the shakiest part of it? That should work, I think... but I'm not sure the whole ResolvedUndefsIn is giving you any significant benefit at that point. At least, I can't think of a case off the top of my head where it helps. Maybe I'm forgetting something.

I'm not completely confident calling Simplify* with operands that aren't Constants does the right thing here; that's sort of an extension to the algorithm. I guess it should work, though?

Does this patch in its current form still depend on any of the other patches?

Oh, hmm, in terms of optimization I guess you get some benefit from making assumptions about the behavior of undef, instead of only interacting with it through constant folding. You get better results by implicitly unifying UndefValue with other constants in PHI nodes and call arguments, and forcing branches one way instead of both ways. Maybe this is better than completely dropping ResolveUndefsIn. Maybe this approach is fine, then.

I'd prefer to finish and land this first, and let it bake a little while, before landing the other pieces. It seems a little riskier than the rest, since it's really hard to reason about undef-related changes.

fhahn updated this revision to Diff 243456.Feb 9 2020, 11:58 AM

Removed simplifications based on undef, just go to overdefined directly.

I'd prefer to get rid of ResolveUndefsIn completely, if possible. If we don't special-case UndefValue anywhere in SCCP, that should just work, I think? SCCP should still be correct even though UndefValue isn't a "normal" constant; it doesn't try to make assumptions based on the pointer equivalence of two "constant" values, except for joins like PHI nodes where it shouldn't matter. Am I missing something? How much optimization power do we lose that way, versus this patch's approach?

The approach here is a little different from that. I'm understanding it correctly, the idea is that in cases where we can't prove a forced constant is actually constant, we instead go straight to overdefined? So we sort of keep our "undef=>undefined" extension of the SCCP model, but get rid of the shakiest part of it? That should work, I think... but I'm not sure the whole ResolvedUndefsIn is giving you any significant benefit at that point. At least, I can't think of a case off the top of my head where it helps. Maybe I'm forgetting something.

I think this and the first paragraph are addressed in the comments later on.

I'm not completely confident calling Simplify* with operands that aren't Constants does the right thing here; that's sort of an extension to the algorithm. I guess it should work, though?

The calls to Simplify* mostly took care of folding existing undef constants. There are a few tests that check for that, but in practice it seems to not really pop up, as other passes should clean up those. I've also compared the stats to the previous version of the patch, and there are no changes.

Does this patch in its current form still depend on any of the other patches?

No, it does not. I've removed the dependencies.

Oh, hmm, in terms of optimization I guess you get some benefit from making assumptions about the behavior of undef, instead of only interacting with it through constant folding. You get better results by implicitly unifying UndefValue with other constants in PHI nodes and call arguments, and forcing branches one way instead of both ways. Maybe this is better than completely dropping ResolveUndefsIn. Maybe this approach is fine, then.

Yes I think there is a benefit of delaying resolving undefs till we finished with an iteration, as we might discover a concrete value later, e.g. for PHIs where we discover that some predecessors are executable later in the iteration. As is I think the patch removes all forced-constant related machinery, which should be the most problematic part of ResolvedUndefsIn.

I'd prefer to finish and land this first, and let it bake a little while, before landing the other pieces. It seems a little riskier than the rest, since it's really hard to reason about undef-related changes.

Sounds good! What do you think of the latest version of the patch? The remainder of ResolvedUndefsIn deals with undef constants as branch conditions. I think keeping those should be save, as they do not really interact with the lattice values.

efriedma accepted this revision.Feb 10 2020, 5:27 PM

LGTM

I'm surprised this works so well, but I'm very happy to get rid of forcedconstant if there isn't a practical performance impact.

This revision is now accepted and ready to land.Feb 10 2020, 5:27 PM
fhahn retitled this revision from [SCCP] Remove forcedconstant, use InstSimplify to fold undef values. to [SCCP] Remove forcedconstant, go to overdefined instead.Feb 11 2020, 7:04 AM
fhahn edited the summary of this revision. (Show Details)
fhahn added a comment.Feb 11 2020, 7:31 AM

LGTM

I'm surprised this works so well, but I'm very happy to get rid of forcedconstant if there isn't a practical performance impact.

Thanks Eli! I'll collected the stats again and the impact seems very low, as indicated, but I'll make sure to keep an eye out for any performance regressions.

This revision was automatically updated to reflect the committed changes.

The Linux kernel hits the assert in markConstant on arm32 defconfig in net/mac80211/sta_info.o, as reported by Linaro's CI:

00:14:30 clang: /home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-src/lib/Transforms/Scalar/SCCP.cpp:131: bool {anonymous}::LatticeVal::markConstant(llvm::Constant*): Assertion `isUnknown()' failed.
00:14:30 Stack dump:
00:14:30 0.	Program arguments: /home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang -Wp,-MD,net/mac80211/.sta_info.o.d -nostdinc -isystem /home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/lib/clang/11.0.0/include -I./arch/arm/include -I./arch/arm/include/generated -I./include -I./arch/arm/include/uapi -I./arch/arm/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -mbig-endian -Qunused-arguments -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 --target=arm-linux-gnueabihf --prefix=/usr/bin/ --gcc-toolchain=/usr -no-integrated-as -Werror=unknown-warning-option -fno-dwarf2-cfi-asm -mabi=aapcs-linux -mfpu=vfp -funwind-tables -meabi gnu -marm -Wa,-mno-warn-deprecated -D__LINUX_ARM_ARCH__=6 -march=armv6k -mtune=arm1136j-s -msoft-float -Uarm -fno-delete-null-pointer-checks -Wno-address-of-packed-member -O2 -Wframe-larger-than=1024 -fstack-protector-strong -Wno-format-invalid-specifier -Wno-gnu -Wno-tautological-compare -mno-global-merge -Wno-unused-const-variable -ftrivial-auto-var-init=pattern -pg -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -fno-strict-overflow -fno-merge-all-constants -fno-stack-check -Werror=date-time -Werror=incompatible-pointer-types -fmacro-prefix-map=./= -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -DDEBUG -fsanitize-coverage=trace-pc -fsanitize-coverage=trace-cmp -DKBUILD_MODFILE="net/mac80211/mac80211" -DKBUILD_BASENAME="sta_info" -DKBUILD_MODNAME="mac80211" -c -o net/mac80211/sta_info.o net/mac80211/sta_info.c 
00:14:30 1.	<eof> parser at end of file
00:14:30 2.	Per-module optimization passes
00:14:30 3.	Running pass 'Interprocedural Sparse Conditional Constant Propagation' on module 'net/mac80211/sta_info.c'.
00:14:31  #0 0x000055995af27e4a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1b48e4a)
00:14:31  #1 0x000055995af25944 llvm::sys::RunSignalHandlers() (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1b46944)
00:14:31  #2 0x000055995af25bb5 llvm::sys::CleanupOnSignal(unsigned long) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1b46bb5)
00:14:31  #3 0x000055995ae9cf38 CrashRecoverySignalHandler(int) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1abdf38)
00:14:31  #4 0x00007f9138484890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
00:14:31  #5 0x00007f9137135e97 raise /build/glibc-OTsEL5/glibc-2.27/signal/../sysdeps/unix/sysv/linux/raise.c:51:0
00:14:31  #6 0x00007f9137137801 abort /build/glibc-OTsEL5/glibc-2.27/stdlib/abort.c:81:0
00:14:31  #7 0x00007f913712739a __assert_fail_base /build/glibc-OTsEL5/glibc-2.27/assert/assert.c:89:0
00:14:31  #8 0x00007f9137127412 (/lib/x86_64-linux-gnu/libc.so.6+0x30412)
00:14:31  #9 0x000055995ae0aced (anonymous namespace)::LatticeVal::markConstant(llvm::Constant*) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a2bced)
00:14:31 #10 0x000055995ae0cf33 (anonymous namespace)::SCCPSolver::markConstant(llvm::Value*, llvm::Constant*) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a2df33)
00:14:31 #11 0x000055995ae12875 llvm::InstVisitor<(anonymous namespace)::SCCPSolver, void>::visit(llvm::Instruction&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a33875)
00:14:31 #12 0x000055995ae133de (anonymous namespace)::SCCPSolver::markUsersAsChanged(llvm::Value*) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a343de)
00:14:31 #13 0x000055995ae1389c (anonymous namespace)::SCCPSolver::Solve() (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a3489c)
00:14:31 #14 0x000055995ae1511f llvm::runIPSCCP(llvm::Module&, llvm::DataLayout const&, std::function<llvm::TargetLibraryInfo const& (llvm::Function&)>, llvm::function_ref<llvm::AnalysisResultsForFn (llvm::Function&)>) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1a3611f)
00:14:31 #15 0x000055995aa73269 (anonymous namespace)::IPSCCPLegacyPass::runOnModule(llvm::Module&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1694269)
00:14:31 #16 0x000055995a9342b1 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x15552b1)
00:14:31 #17 0x000055995b14d70d (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1d6e70d)
00:14:31 #18 0x000055995b14f195 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1d70195)
00:14:31 #19 0x000055995bc60bf5 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2881bf5)
00:14:31 #20 0x000055995c572379 clang::ParseAST(clang::Sema&, bool, bool) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x3193379)
00:14:31 #21 0x000055995bc60d78 clang::CodeGenAction::ExecuteAction() (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2881d78)
00:14:31 #22 0x000055995b694419 clang::FrontendAction::Execute() (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x22b5419)
00:14:31 #23 0x000055995b653352 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2274352)
00:14:31 #24 0x000055995b743b0c clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2364b0c)
00:14:31 #25 0x0000559959f99d14 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0xbbad14)
00:14:31 #26 0x0000559959f950c9 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0xbb60c9)
00:14:31 #27 0x000055995b535e55 void llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const::'lambda'()>(long) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2156e55)
00:14:31 #28 0x000055995ae9d013 llvm::CrashRecoveryContext::RunSafely(llvm::function_ref<void ()>) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x1abe013)
00:14:31 #29 0x000055995b536938 clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, bool*) const (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2157938)
00:14:31 #30 0x000055995b512215 clang::driver::Compilation::ExecuteCommand(clang::driver::Command const&, clang::driver::Command const*&) const (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2133215)
00:14:31 #31 0x000055995b512cd1 clang::driver::Compilation::ExecuteJobs(clang::driver::JobList const&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) const (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x2133cd1)
00:14:31 #32 0x000055995b51b129 clang::driver::Driver::ExecuteCompilation(clang::driver::Compilation&, llvm::SmallVectorImpl<std::pair<int, clang::driver::Command const*> >&) (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0x213c129)
00:14:31 #33 0x0000559959f29a7f main (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0xb4aa7f)
00:14:31 #34 0x00007f9137118b97 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:344:0
00:14:31 #35 0x0000559959f94c2a _start (/home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin/clang+0xbb5c2a)
00:14:31 clang-11: error: clang frontend command failed due to signal (use -v to see invocation)
00:14:31 clang version 11.0.0 
00:14:31 Target: arm-unknown-linux-gnueabihf
00:14:31 Thread model: posix
00:14:31 InstalledDir: /home/tcwg-buildslave/workspace/tcwg_kernel_1/llvm-install/bin
00:14:31 clang-11: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
00:14:31 clang-11: note: diagnostic msg: 
00:14:31 ********************
00:14:31 
00:14:31 PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
00:14:31 Preprocessed source(s) and associated run script(s) are located at:
00:14:31 clang-11: note: diagnostic msg: /tmp/sta_info-eeebaf.c
00:14:31 clang-11: note: diagnostic msg: /tmp/sta_info-eeebaf.sh
00:14:31 clang-11: note: diagnostic msg: 
00:14:31 
00:14:31 ********************
00:14:31 LLVM ERROR: out of memory
00:14:31 Aborted (core dumped)

I've creduce'd it down to

enum { a };
enum b { c, d };
e;
static _Bool g(struct f *h, enum b i) {
  i &&j();
  return a;
}
static k(char h, enum b i) {
  _Bool l = g(e, i);
  l;
}
m(h) {
  k(h, c);
  g(h, d);
}

The full preprocessed file and interestingness test are available here: https://github.com/nathanchance/creduce-files/tree/872520243c62229267aa0ad0b091a5f4e6e1e2be/aadb635e04854220064b77cc10d0e6772f5492fd

The Linux kernel hits the assert in markConstant on arm32 defconfig in net/mac80211/sta_info.o, as reported by Linaro's CI:

FWIW, I also hit this assert, when building Qt in qfontsubset.cpp for x86_64 and aarch64 mingw.

fhahn added a comment.Feb 12 2020, 1:42 AM

Thanks, I've managed to reproduce the issue and reverted the change while I investigate.

I've recommitted the change with a fix as rGbb310b3f73dd . The issue was related to the fact that we skip calls to tracked functions, but we might mark one of the users of the result as overdefined. If we later discover that the result of the call is constant, we try in some cases to mark an overdefined value as constant. I think in theory we also had the same issue without the patch, but got lucky that in case we mark a user of a function result as overdefined the corresponding visitor bails out early if the instruction is overdefined.

As a fix, we can skip instructions depending on tracked and unknown calls. Once we resolve the call result, we propagate it to the users. This also allows us the handle a few additional cases.

I was thinking about this a bit more, and I'm not sure this is enough to allow value ranges to work correctly around undef.

The problem essentially boils down to this: what happens when an instruction that might return undef is marked "constantrange"? If an instruction is marked "constant", it doesn't really matter if it really returns "constant or undef": SCCP will RAUW the instruction away, so it can't produce undef after SCCP runs. If it's a constantrange, though, we never RAUW the value, so the "undef" case never goes away. And if it's undef, it could be outside the computed range, and we could drop necessary arithmetic/comparisons. For example, suppose we compute a value X is in range 0-255, and then we "and X, 255". It looks like we could drop the "and"... but we never actually eliminated the possibility that the value was undef. So now we've transformed "and undef, 255" -> "undef", which is wrong.

This problem specifically comes up because we resolve undefs "late", I think: if we treated undef as an overdefined value at the point where we initially analyzed it, this case wouldn't come up.

fhahn added a comment.Feb 17 2020, 2:36 PM

I was thinking about this a bit more, and I'm not sure this is enough to allow value ranges to work correctly around undef.

The problem essentially boils down to this: what happens when an instruction that might return undef is marked "constantrange"? If an instruction is marked "constant", it doesn't really matter if it really returns "constant or undef": SCCP will RAUW the instruction away, so it can't produce undef after SCCP runs. If it's a constantrange, though, we never RAUW the value, so the "undef" case never goes away. And if it's undef, it could be outside the computed range, and we could drop necessary arithmetic/comparisons. For example, suppose we compute a value X is in range 0-255, and then we "and X, 255". It looks like we could drop the "and"... but we never actually eliminated the possibility that the value was undef. So now we've transformed "and undef, 255" -> "undef", which is wrong.

I thought a bit more about the cases where we generate ranges:

  1. Ops with all operands either constant/constant range (e.g. add, sub, mul, select)
  2. Range restricting ops with overdefined, constant/constant range operands (e.g. and overdefined, [0, 255])
  3. Phis with multiple live incoming values
  4. Function arguments with multiple call sites.

I think for 1. and 2., we should never generate a concrete range, if any of the operands may be undef as in your example. As long as any operand is unknown/undef, we wait until ResolvedUndefsIn.

In 3. and 4. we could however generate concrete ranges when one of the incoming values/concrete arguments may be undef. For phi nodes for example, the current implementation just skips incoming unknown/undefined values. If one of the incoming values is a range 0-255 and the other incoming value is undef, we end up with a value X as you described. I think we could however handle this directly in visitPhiNode: if some of the incoming values are unknown and some others are non-singleton constant ranges we have to go to overdefined.

I think we can deal with 4. in a similar fashion: when merging function arguments, we need to go to overdefined once we merge in an unknown/undef value and a (non singleton) constant range.

I think that should ensure we never construct a concrete range for a value that might be undef. I hope I didn't miss any cases. What do you think?

fhahn added a comment.Feb 18 2020, 9:22 AM

I was thinking about this a bit more, and I'm not sure this is enough to allow value ranges to work correctly around undef.

The problem essentially boils down to this: what happens when an instruction that might return undef is marked "constantrange"? If an instruction is marked "constant", it doesn't really matter if it really returns "constant or undef": SCCP will RAUW the instruction away, so it can't produce undef after SCCP runs. If it's a constantrange, though, we never RAUW the value, so the "undef" case never goes away. And if it's undef, it could be outside the computed range, and we could drop necessary arithmetic/comparisons. For example, suppose we compute a value X is in range 0-255, and then we "and X, 255". It looks like we could drop the "and"... but we never actually eliminated the possibility that the value was undef. So now we've transformed "and undef, 255" -> "undef", which is wrong.

I thought a bit more about the cases where we generate ranges:

  1. Ops with all operands either constant/constant range (e.g. add, sub, mul, select)
  2. Range restricting ops with overdefined, constant/constant range operands (e.g. and overdefined, [0, 255])
  3. Phis with multiple live incoming values
  4. Function arguments with multiple call sites.

    I think for 1. and 2., we should never generate a concrete range, if any of the operands may be undef as in your example. As long as any operand is unknown/undef, we wait until ResolvedUndefsIn.

    In 3. and 4. we could however generate concrete ranges when one of the incoming values/concrete arguments may be undef. For phi nodes for example, the current implementation just skips incoming unknown/undefined values. If one of the incoming values is a range 0-255 and the other incoming value is undef, we end up with a value X as you described. I think we could however handle this directly in visitPhiNode: if some of the incoming values are unknown and some others are non-singleton constant ranges we have to go to overdefined.

    I think we can deal with 4. in a similar fashion: when merging function arguments, we need to go to overdefined once we merge in an unknown/undef value and a (non singleton) constant range.

    I think that should ensure we never construct a concrete range for a value that might be undef. I hope I didn't miss any cases. What do you think?

Coincidentally a (I think) related bug just got filed, where LVI gets this wrong for PHIs, as you described. I think to fix is have ValueLattice::mergeIn([x, y], undef) -> overdefined instead of the current result [x, y]
(https://bugs.llvm.org/show_bug.cgi?id=44949)

I think we could however handle this directly in visitPhiNode: if some of the incoming values are unknown and some others are non-singleton constant ranges we have to go to overdefined.

If you treat unknown as if it were overdefined, you lose some of the nice properties of the sparse conditional propagation: in particular, that you get the same result regardless of the visitation order. You'll get weird results in cases that don't actually involve any undef values.

I think we could however handle this directly in visitPhiNode: if some of the incoming values are unknown and some others are non-singleton constant ranges we have to go to overdefined.

If you treat unknown as if it were overdefined, you lose some of the nice properties of the sparse conditional propagation: in particular, that you get the same result regardless of the visitation order. You'll get weird results in cases that don't actually involve any undef values.

Right, I hoped the way we mark blocks/edges as executable should take care of most cases not involving actual undef values. But maybe it is better to just treat undef constants as overdefined at construction. I initially was not sure if that would be enough, but I think it should be. Either way, I think both approaches should rule out the problem outline I think. I'll check the impact of both approaches and I;ll put up a follow up patch. Does that sounds good?

Yes, that sounds fine

fhahn added a comment.Feb 20 2020, 1:57 PM

Yes, that sounds fine

I'll probably will get the results early next week. I first have to add handling for AND to limit the ranges.

I also pushed a small follow up fix 99809f98d7bb which excludes loads from being marked as overdefined in ResolvedUndefsIn, as this breaks an invariant that all non-store users of unknown constants are replaced and causes a crash. I think that should be fine, especially as it is not related to forcedconstants, but please let me know if you have any concerns, then I'll revert this again.

fhahn added a comment.Feb 24 2020, 8:29 AM

I've added another patch that implements AND overdefined, constant range -> constant range (D75054). With that, I checked compared the impact of

  1. making ValueLatticeElement::mergeIn go to overdefined, if a constant range is merged with an undefined value
  2. Using overdefined for undef constant when we create the ValueLatticeElement.

Comparing 1. and 2. for MultiSource, SPEC2000, SPEC2006 with -O3 & LTO below (sorry for the weird formatting, not sure how to post the output of compare.py better on phab). It seems like the impact of 1. is lower, presumably because we skip incoming values from unreachable blocks in PHIs. I've also added some stats for the impact on CorrelatedValuePropagation in the description of D75054.

it would be great if you could take another look and let me know which direction you prefer.

Sum of all SCCP stats (using cat report.json | grep sccp | cut -d" " -f 10 | cut -d',' -f 1 | awk '{s+=$1} END {print s}')

Base: 220147

  1. : 220062
  2. : 218205

Program 1. 2.
test-suite...T2000/300.twolf/300.twolf.test 179.00 64.00 -64.2%
test-suite...stones-3.1/fhourstones3.1.test 8.00 3.00 -62.5%
test-suite...lications/minisat/minisat.test 7.00 3.00 -57.1%
test-suite...abench/jpeg/jpeg-6a/cjpeg.test 39.00 19.00 -51.3%
test-suite...ocBench/espresso/espresso.test 131.00 64.00 -51.1%
test-suite...nsumer-jpeg/consumer-jpeg.test 42.00 22.00 -47.6%
test-suite...nchmarks/McCat/09-vor/vor.test 39.00 21.00 -46.2%
test-suite...cations/hexxagon/hexxagon.test 28.00 16.00 -42.9%
test-suite...ve-susan/automotive-susan.test 67.00 43.00 -35.8%
test-suite...Source/Benchmarks/sim/sim.test 24.00 16.00 -33.3%
test-suite...129.compress/129.compress.test 90.00 64.00 -28.9%
test-suite...urce/Applications/hbd/hbd.test 30.00 23.00 -23.3%
test-suite.../Benchmarks/Bullet/bullet.test 233.00 186.00 -20.2%
test-suite...langs-C/unix-tbl/unix-tbl.test 12.00 10.00 -16.7%
test-suite...INT95/132.ijpeg/132.ijpeg.test 100.00 84.00 -16.0%
test-suite...lications/ClamAV/clamscan.test 854.00 722.00 -15.5%
test-suite.../CINT2000/252.eon/252.eon.test 349.00 297.00 -14.9%
test-suite...lications/SIBsim4/SIBsim4.test 16.00 14.00 -12.5%
test-suite...CFP2000/177.mesa/177.mesa.test 256.00 227.00 -11.3%
test-suite...lications/sqlite3/sqlite3.test 473.00 422.00 -10.8%
test-suite...oxyApps-C++/miniFE/miniFE.test 28.00 25.00 -10.7%
test-suite.../CINT2000/176.gcc/176.gcc.test 1015.00 920.00 -9.4%
test-suite...TimberWolfMC/timberwolfmc.test 43.00 39.00 -9.3%
test-suite.../CINT95/134.perl/134.perl.test 86.00 78.00 -9.3%
test-suite...T2006/445.gobmk/445.gobmk.test 1563.00 1422.00 -9.0%
test-suite...SPEC/CINT95/130.li/130.li.test 103.00 94.00 -8.7%
test-suite...006/447.dealII/447.dealII.test 449.00 411.00 -8.5%
test-suite...ce/Applications/Burg/burg.test 26.00 24.00 -7.7%
test-suite...marks/SciMark2-C/scimark2.test 26.00 24.00 -7.7%
test-suite.../CINT2006/403.gcc/403.gcc.test 3684.00 3438.00 -6.7%
test-suite...pplications/oggenc/oggenc.test 222.00 208.00 -6.3%
test-suite...urce/Applications/lua/lua.test 263.00 247.00 -6.1%
test-suite...plications/d/make_dparser.test 51.00 48.00 -5.9%
test-suite...3.xalancbmk/483.xalancbmk.test 3807.00 3604.00 -5.3%
test-suite...nsumer-lame/consumer-lame.test 192.00 182.00 -5.2%
test-suite...CFP2006/444.namd/444.namd.test 195.00 186.00 -4.6%
test-suite...yApps-C++/PENNANT/PENNANT.test 204.00 195.00 -4.4%
test-suite...encode/alacconvert-encode.test 52.00 50.00 -3.8%
test-suite...decode/alacconvert-decode.test 52.00 50.00 -3.8%
test-suite...Applications/kimwitu++/kc.test 646.00 626.00 -3.1%
test-suite...0.perlbench/400.perlbench.test 1580.00 1538.00 -2.7%
test-suite...ProxyApps-C++/CLAMR/CLAMR.test 312.00 304.00 -2.6%
test-suite...marks/7zip/7zip-benchmark.test 4790.00 4670.00 -2.5%
test-suite...CFP2006/433.milc/433.milc.test 201.00 196.00 -2.5%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 87.00 85.00 -2.3%
test-suite...chmarks/MallocBench/gs/gs.test 397.00 388.00 -2.3%
test-suite...006/450.soplex/450.soplex.test 426.00 418.00 -1.9%
test-suite...ications/JM/ldecod/ldecod.test 152.00 150.00 -1.3%
test-suite.../Applications/SPASS/SPASS.test 2019.00 1996.00 -1.1%
test-suite...0/253.perlbmk/253.perlbmk.test 944.00 935.00 -1.0%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 342.00 339.00 -0.9%
test-suite...6/471.omnetpp/471.omnetpp.test 418.00 415.00 -0.7%
test-suite...:: External/Povray/povray.test 1438.00 1436.00 -0.1%
test-suite...-typeset/consumer-typeset.test 3172.00 3169.00 -0.1%
test-suite...ications/JM/lencod/lencod.test 8296.00 8291.00 -0.1%
test-suite...lFlow-dbl/ControlFlow-dbl.test 520.00 520.00 0.0%

Metric: sccp.NumInstRemoved

Program 1. 2.
test-suite...:: External/Povray/povray.test 191.00 179.00 -6.3%
test-suite...marks/7zip/7zip-benchmark.test 1154.00 1109.00 -3.9%
test-suite...chmarks/MallocBench/gs/gs.test 159.00 154.00 -3.1%
test-suite...006/453.povray/453.povray.test 608.00 590.00 -3.0%
test-suite.../Benchmarks/Bullet/bullet.test 585.00 576.00 -1.5%
test-suite...urce/Applications/lua/lua.test 431.00 427.00 -0.9%
test-suite.../CINT2000/252.eon/252.eon.test 235.00 233.00 -0.9%
test-suite...lications/ClamAV/clamscan.test 1116.00 1111.00 -0.4%
test-suite...0.perlbench/400.perlbench.test 1143.00 1141.00 -0.2%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 6745.00 6737.00 -0.1%
test-suite...CFP2000/177.mesa/177.mesa.test 1641.00 1640.00 -0.1%
test-suite...006/447.dealII/447.dealII.test 12050.00 12043.00 -0.1%
test-suite.../CINT2006/403.gcc/403.gcc.test 7210.00 7211.00 0.0%

presumably because we skip incoming values from unreachable blocks in PHIs

That doesn't make any sense. We should skip incoming values from unreachable blocks no matter what they contain. I assume the significant difference between the two approaches is that one is special-casing merging undef and a constant.

I'm curious about two other possibilities:

  1. Use overdefined for undef constants, but add a special case for PHI nodes with a literal "undef" operand.
  2. Treat undef as if it were an opaque constant, like a constant expression you can't analyze, instead of going straight to overdefined, then again add a special case for PHI nodes with a constant undef operand.
fhahn added a comment.Feb 25 2020, 8:32 AM

presumably because we skip incoming values from unreachable blocks in PHIs

That doesn't make any sense. We should skip incoming values from unreachable blocks no matter what they contain. I assume the significant difference between the two approaches is that one is special-casing merging undef and a constant.

Ah yes, my wording got mixed up here, sorry! The point I wanted to make is that we seem to not loose much in terms of precision by mergeIn(ConstantRange, unknown) -> overdefined.

I'm curious about two other possibilities:

  1. Use overdefined for undef constants, but add a special case for PHI nodes with a literal "undef" operand.

This seems to have no impact. I've changed getValueState to mark UndefValue constants as overdefined and then added a special case to visitPHINode, which skips incoming UndefValues if the PHI has a constant value. getValueState. (D75122) I think should be along the lines you suggested here.

  1. Treat undef as if it were an opaque constant, like a constant expression you can't analyze, instead of going straight to overdefined, then again add a special case for PHI nodes with a constant undef operand.

I started with this approach, but it seemed like it would require updating lots of places where we handle constants which do not expect it to be undef.

I've tried a slightly different approach (see D75120), which I think is quite similar in spirit: add a new undef lattice state, which can be merged with constants, but merging it with non-singleton ranges goes to overdefined. I think that makes it easier to update the code. Most places in SCCP can treat it similar to unknown, in that we can delay marking instructions with undef operands as overdefined until we finished solving. Is that approach roughly along the direction you suggested? For cases like %p = phi [undef], [0] %foo = add %p, 1, it is important to not mark %foo overdefined before we merged all incoming values of %p. Without delaying solving instructions with undef operands, we would have to mark %foo overdefined once we have %p == undef.

Also, from the comments in ValueLattice.h and in LVI, it seems the code expects unknown to mean unreachable/dead, which is slightly different to undef and the new state would help to clarify this.

Adding an explicit "undef" lattice state makes sense, I think; it should allow modeling everything we need correctly.

Also, from the comments in ValueLattice.h and in LVI, it seems the code expects unknown to mean unreachable/dead, which is slightly different to undef and the new state would help to clarify this.

Yes, I'm happier reserving unknown for unvisited values. It wouldn't be too much of a stretch to also use it for poison values, since they have similar properties in terms of the lattice. (Granted, we don't try to model poison in SCCP at the moment.)