Now that DAGCombiner may merge multiple non-consecutive stores in a single pass, the simple one-use analysis for when to split a store
into stores of XZR for pairing purposes is no longer sufficient. Improve check to capture these cases.
Details
- Reviewers
t.p.northover gberry
Diff Detail
- Build Status
Buildable 6732 Build 6732: arc lint + arc unit
Event Timeline
Hi Nirav,
I did a test run with this patch to see what the extent of the changes are and it seems to get into an infinite loop during ISel in quite a few cases. For example, when compiling a test from the llvm test-suite like so:
clang -DSMALL_PROBLEM_SIZE=1 -DNDEBUG -O3 -c MultiSource/Benchmarks/TSVC/Packing-dbl/tsc.c
Also, could you fix the summary of this patch? The first sentence seems incomplete.
There are some simplifications to BaseOffsetDifference16 which can be done once cleanups, but I'd like to get this in before that.
This doesn't even compile for me:
FAILED: /local/mnt/gberry/infra/tools/clang-tools/clang-x86-494-x++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D_DEBUG -D_GNU_SOURCE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__\
STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/local/mnt/gberry/src/llvm/lib/Target/AArch64 -Iinclude -I/local/mnt/gberry/src/llvm/include -O2 -fPIC -fPIC -fvisibility-inlines-hidden -Werror -W\
error=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdel\
ete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -ffunction-sections -fdata-sections -O3 -UNDEBUG -fno-exceptions -fno-rtti -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch6\
4CodeGen.dir/AArch64ISelLowering.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelLowering.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64IS\
elLowering.cpp.o -c /local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
/local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9298:52: error: too few arguments to function call, expected 2, have 1
auto AddrDecomp = BaseIndexOffset::match(Addr); ~~~~~~~~~~~~~~~~~~~~~~ ^
/local/mnt/gberry/src/llvm/include/llvm/CodeGen/SelectionDAGAddressAnalysis.h:60:3: note: 'match' declared here
static BaseIndexOffset match(SDValue Ptr, const SelectionDAG &DAG); ^
/local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9300:64: error: too few arguments to function call, expected 2, have 1
auto OtherAddrDecomp = BaseIndexOffset::match(OtherAddr); ~~~~~~~~~~~~~~~~~~~~~~ ^
/local/mnt/gberry/src/llvm/include/llvm/CodeGen/SelectionDAGAddressAnalysis.h:60:3: note: 'match' declared here
static BaseIndexOffset match(SDValue Ptr, const SelectionDAG &DAG); ^
I noticed a few code size regressions looking through the diffs of generated code for this change which were caused by:
- the offsets of the stores not having the right alignment to allow for stp x opcodes to be formed
- the offsets of the stores being too large for stp x, but not too large for stp q
- the offsets of the stores being so large that adds are needed to compute the address for str x, but not so large that adds are needed to compute the address for str q
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9335 | Would it not be better to store the BaseIndexOffset in STAddrs instead of re-computing it for every iteration of the StVal->uses() loop? Also, do you have a use case/benchmark where this helps performance/code size significantly to justify the n^2 checking here? |
Hmm. Dealing with offsets makes this harder. It seems like we could achieve the same rough goal in a more straight forward-manner post-DAG after load-store pairing has happened.
Would it not be better to store the BaseIndexOffset in STAddrs instead of re-computing it for every iteration of the StVal->uses() loop?
Also, do you have a use case/benchmark where this helps performance/code size significantly to justify the n^2 checking here?