This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add `foldCSELOfCSEl` DAG combine
ClosedPublic

Authored by Kmeakin on May 12 2022, 2:47 PM.

Diff Detail

Event Timeline

Kmeakin created this revision.May 12 2022, 2:47 PM
Kmeakin requested review of this revision.May 12 2022, 2:47 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 12 2022, 2:47 PM

Maybe this should be done in InstCombine instead?

Maybe this should be done in InstCombine instead?

It may be worth adding a similar transform to InstCombine. However InstCombine cannot catch all instances of CSEL, because some LLVM IR intrinsics (eg llvm.uadd.sat) expand to CSEL during the lowering from LLVM IR to SelectionDAG IR: too late to be handled by InstCombine.

Oh, okay, yeah, I hadn't noticed the types that you're dealing with here!

samparker accepted this revision.Aug 15 2022, 2:25 AM

This looks okay to me.

This revision is now accepted and ready to land.Aug 15 2022, 2:25 AM
This revision was landed with ongoing or failed builds.Aug 16 2022, 4:50 AM
This revision was automatically updated to reflect the committed changes.
vitalybuka added a comment.EditedAug 16 2022, 8:33 PM

https://lab.llvm.org/staging/#/builders/224/builds/6/steps/10/logs/stdio

affected tests:
llvm/test/tools/dsymutil/X86/dwarf4-linetable.test
llvm/test/tools/dsymutil/X86/odr-fwd-declaration.cpp
llvm/test/tools/dsymutil/X86/generate-empty-CU.test
llvm/test/tools/dsymutil/X86/custom-line-table.test
llvm/test/tools/dsymutil/X86/modules.m
llvm/test/tools/dsymutil/X86/odr-anon-namespace.cpp
llvm/test/tools/dsymutil/X86/modules-empty.m
llvm/test/tools/dsymutil/X86/dwarf5-linetable.test
llvm/test/tools/dsymutil/X86/submodules.m
llvm/test/tools/dsymutil/X86/inlined-static-variable.cpp
llvm/test/tools/dsymutil/ARM/scattered.c
llvm/test/tools/dsymutil/X86/mismatch.m
llvm/test/tools/dsymutil/ARM/fat-arch-not-found.test
llvm/test/tools/dsymutil/X86/dead-stripped.cpp
llvm/test/tools/dsymutil/X86/fat-archive-input-i386.test
llvm/test/tools/dsymutil/X86/empty_range.s
llvm/test/tools/dsymutil/ARM/inlined-low_pc.c
llvm/test/tools/dsymutil/X86/odr-member-functions.cpp
llvm/test/tools/dsymutil/X86/fat-object-input-x86_64.test
llvm/test/tools/dsymutil/X86/fat-object-input-x86_64h.test
llvm/test/tools/dsymutil/X86/modules-pruning.cpp
llvm/test/tools/dsymutil/X86/label.test
llvm/test/tools/dsymutil/X86/odr-fwd-declaration2.cpp
llvm/test/tools/dsymutil/X86/odr-uniquing.cpp

reproducer:

# GCE T2A instance
# 5.15.0-1016-gcp #21-Ubuntu SMP Fri Aug 5 12:20:06 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
# deps from https://github.com/google/sanitizers/blob/master/buildbot/install_deps.sh

mkdir mybuild && cd mybuild
git clone https://github.com/llvm/llvm-project.git
(
  mkdir llvm_build0 && cd llvm_build0
  cmake -DLLVM_LIT_ARGS=-v -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF -DLLVM_ENABLE_PROJECTS="clang;compiler-rt;lld" -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_CCACHE_BUILD=ON ../llvm-project/llvm
ninja
)


# I don't know if this reproduces without instrumented libc++.
(
  mkdir libcxx_build_ubsan && cd libcxx_build_ubsan
  cmake -DLLVM_LIT_ARGS=-v -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF -DCMAKE_C_COMPILER=$(readlink -f ../llvm_build0/bin)/clang -DCMAKE_CXX_COMPILER=$(readlink -f ../llvm_build0/bin)/clang++ -DLLVM_USE_LINKER=lld '-DLLVM_ENABLE_PROJECTS=libcxx;libcxxabi' -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_SANITIZER=Undefined '-DCMAKE_C_FLAGS=-fsanitize=undefined ' '-DCMAKE_CXX_FLAGS=-fsanitize=undefined '  ../llvm-project/llvm
ninja cxx cxxabi
)

(
  mkdir llvm_build_ubsan && cd llvm_build_ubsan
  cmake -DLLVM_LIT_ARGS=-v -GNinja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_ENABLE_PER_TARGET_RUNTIME_DIR=OFF -DCMAKE_C_COMPILER=$(readlink -f ..)/llvm_build0/bin/clang -DCMAKE_CXX_COMPILER=$(readlink -f ..)/llvm_build0/bin/clang++ -DLLVM_USE_LINKER=lld '-DLLVM_ENABLE_PROJECTS='\''clang;lld;clang-tools-extra;mlir'\''' -DCMAKE_BUILD_TYPE=Release -DLLVM_USE_SANITIZER=Undefined -DLLVM_ENABLE_LIBCXX=ON '-DCMAKE_C_FLAGS=-nostdinc++ -isystem $(readlink -f ..)/libcxx_build_ubsan/include -isystem $(readlink -f ..)/libcxx_build_ubsan/include/c++/v1 -fsanitize=undefined -lc++abi -Wl,--rpath=$(readlink -f ..)/libcxx_build_ubsan/lib -L$(readlink -f ..)/libcxx_build_ubsan/lib -w' '-DCMAKE_CXX_FLAGS=-nostdinc++ -isystem $(readlink -f ..)/libcxx_build_ubsan/include -isystem $(readlink -f ..)/libcxx_build_ubsan/include/c++/v1 -fsanitize=undefined -lc++abi -Wl,--rpath=$(readlink -f ..)/libcxx_build_ubsan/lib -L$(readlink -f ..)/libcxx_build_ubsan/lib -w' '-DCMAKE_EXE_LINKER_FLAGS=-lc++abi -Wl,--rpath=$(readlink -f ..)/libcxx_build_ubsan/lib -L$(readlink -f ..)/libcxx_build_ubsan/lib' ../llvm-project/llvm
)

git -C llvm-project/ checkout 6f9423ef06926a70af84b77cb290c91214cf791a
ninja -C llvm_build0 && ninja -C llvm_build_ubsan/ -t clean && LIT_FILTER=fat-object-input-x86_64 ninja -C llvm_build_ubsan/ check-llvm
# RESULT: hangs

git -C llvm-project/ checkout 6f9423ef06926a70af84b77cb290c91214cf791a^
ninja -C llvm_build0 && ninja -C llvm_build_ubsan/ -t clean && LIT_FILTER=fat-object-input-x86_64 ninja -C llvm_build_ubsan/ check-llvm
# RESULT: pass
-- Testing: 2 of 44592 tests, 2 workers --
PASS: LLVM :: tools/dsymutil/X86/fat-object-input-x86_64h.test (1 of 2)
PASS: LLVM :: tools/dsymutil/X86/fat-object-input-x86_64.test (2 of 2)

git -C llvm-project/ checkout origin/main
git -C llvm-project/ revert 6f9423ef06926a70af84b77cb290c91214cf791a
ninja -C llvm_build0 && ninja -C llvm_build_ubsan/ -t clean && LIT_FILTER=fat-object-input-x86_64 ninja -C llvm_build_ubsan/ check-llvm
# RESULT: also pass
vitalybuka reopened this revision.Aug 16 2022, 8:33 PM
This revision is now accepted and ready to land.Aug 16 2022, 8:33 PM

I also ran into a case where this commit broke things. One of the miscompiled files is https://martin.st/temp/bprint-preproc.c, compiled with clang -target aarch64-linux-gnu bprint-preproc.c -O3, if that helps identifying what the issue is.

I also ran into a case where this commit broke things. One of the miscompiled files is https://martin.st/temp/bprint-preproc.c, compiled with clang -target aarch64-linux-gnu bprint-preproc.c -O3, if that helps identifying what the issue is.

Could you elaborate on what broke? What output were you expecting, and how did it change after this patch?

I also ran into a case where this commit broke things. One of the miscompiled files is https://martin.st/temp/bprint-preproc.c, compiled with clang -target aarch64-linux-gnu bprint-preproc.c -O3, if that helps identifying what the issue is.

Could you elaborate on what broke? What output were you expecting, and how did it change after this patch?

I didn't dive into it in detail to see what actually behaves differently. But it's fairly straightforward to repro it on aarch64-linux like this:

$ git clone git://source.ffmpeg.org/ffmpeg
$ cd ffmpeg
# Build everything with a good clang, run tests which succeed:
$ ./configure --samples=../samples --cc=clang
$ make fate-rsync
$ make -j$(nproc) fate-fits-demux # succeeds
# Update clang in $PATH to one with this patch applied.
# Rebuild libavutil/bprint.o:
$ rm libavutil/bprint.o 
$ make V=1 -j$(nproc) fate-fits-demux # fails (with a segfault)
Kmeakin updated this revision to Diff 453649.Aug 18 2022, 6:47 AM

Make foldCSELOfCSEL more conservative

Thanks, this version seems to work for all the tests that were broken previously.

Kmeakin closed this revision.Aug 18 2022, 5:32 PM

I was investigating this.

(CSEL l r EQ (CMP (CSEL x y cc2 cond) x)) => (CSEL l r cc2 cond)

Could it be said as "Reduction of (CMP(CSEL)), regardless of outer CSEL"?

// Where x and y are constants

I am dubious prerequisites. Is it sufficient?

cc = ((cc2 ? x : y) == x)

This can be transformed to;

cc = (cc2 || y == x)

I am afraid if correlation between x and y would be missed.
I think such a transformation may be applied to (CMP(CSEL(CMP))).

I was investigating this.

(CSEL l r EQ (CMP (CSEL x y cc2 cond) x)) => (CSEL l r cc2 cond)

Could it be said as "Reduction of (CMP(CSEL)), regardless of outer CSEL"?

I suppose we could do that. Can you produce any tests that would benefit from a more generalised transform?

// Where x and y are constants

I am dubious prerequisites. Is it sufficient?

cc = ((cc2 ? x : y) == x)

This can be transformed to;

cc = (cc2 || y == x)

I am afraid if correlation between x and y would be missed.
I think such a transformation may be applied to (CMP(CSEL(CMP))).

I am not sure I understand what you mean by this. Could you give an example?