This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix stores of zero values
AbandonedPublic

Authored by niravd on May 24 2017, 1:35 PM.

Details

Summary

Now that DAGCombiner may merge multiple non-consecutive stores in a single pass, the simple one-use analysis for when to split a store
into stores of XZR for pairing purposes is no longer sufficient. Improve check to capture these cases.

Event Timeline

niravd created this revision.May 24 2017, 1:35 PM
gberry requested changes to this revision.Jun 12 2017, 12:29 PM

Hi Nirav,

I did a test run with this patch to see what the extent of the changes are and it seems to get into an infinite loop during ISel in quite a few cases. For example, when compiling a test from the llvm test-suite like so:
clang -DSMALL_PROBLEM_SIZE=1 -DNDEBUG -O3 -c MultiSource/Benchmarks/TSVC/Packing-dbl/tsc.c

Also, could you fix the summary of this patch? The first sentence seems incomplete.

This revision now requires changes to proceed.Jun 12 2017, 12:29 PM
niravd edited the summary of this revision. (Show Details)Jun 12 2017, 1:01 PM
niravd updated this revision to Diff 102242.Jun 12 2017, 1:50 PM
niravd edited edge metadata.

Fix infinite loop

niravd planned changes to this revision.Jun 13 2017, 9:26 AM

Fix infinite loop

niravd requested review of this revision.Jun 15 2017, 8:58 AM

There are some simplifications to BaseOffsetDifference16 which can be done once cleanups, but I'd like to get this in before that.

niravd updated this revision to Diff 103513.Jun 21 2017, 7:48 PM

Rewrite to use BaseIndexOffset

gberry requested changes to this revision.Jul 13 2017, 1:06 PM

This doesn't even compile for me:

FAILED: /local/mnt/gberry/infra/tools/clang-tools/clang-x86-494-x++ -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D_DEBUG -D_GNU_SOURCE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__\
STDC_LIMIT_MACROS -Ilib/Target/AArch64 -I/local/mnt/gberry/src/llvm/lib/Target/AArch64 -Iinclude -I/local/mnt/gberry/src/llvm/include -O2 -fPIC -fPIC -fvisibility-inlines-hidden -Werror -W\
error=date-time -std=c++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdel\
ete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -ffunction-sections -fdata-sections -O3 -UNDEBUG -fno-exceptions -fno-rtti -MMD -MT lib/Target/AArch64/CMakeFiles/LLVMAArch6\
4CodeGen.dir/AArch64ISelLowering.cpp.o -MF lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64ISelLowering.cpp.o.d -o lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64IS\
elLowering.cpp.o -c /local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
/local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9298:52: error: too few arguments to function call, expected 2, have 1

auto AddrDecomp = BaseIndexOffset::match(Addr);
                  ~~~~~~~~~~~~~~~~~~~~~~     ^

/local/mnt/gberry/src/llvm/include/llvm/CodeGen/SelectionDAGAddressAnalysis.h:60:3: note: 'match' declared here

static BaseIndexOffset match(SDValue Ptr, const SelectionDAG &DAG);
^

/local/mnt/gberry/src/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9300:64: error: too few arguments to function call, expected 2, have 1

auto OtherAddrDecomp = BaseIndexOffset::match(OtherAddr);
                       ~~~~~~~~~~~~~~~~~~~~~~          ^

/local/mnt/gberry/src/llvm/include/llvm/CodeGen/SelectionDAGAddressAnalysis.h:60:3: note: 'match' declared here

static BaseIndexOffset match(SDValue Ptr, const SelectionDAG &DAG);
^
This revision now requires changes to proceed.Jul 13 2017, 1:06 PM
niravd updated this revision to Diff 106639.Jul 14 2017, 7:11 AM
niravd edited edge metadata.

Rebase to current.

I noticed a few code size regressions looking through the diffs of generated code for this change which were caused by:

  • the offsets of the stores not having the right alignment to allow for stp x opcodes to be formed
  • the offsets of the stores being too large for stp x, but not too large for stp q
  • the offsets of the stores being so large that adds are needed to compute the address for str x, but not so large that adds are needed to compute the address for str q
lib/Target/AArch64/AArch64ISelLowering.cpp
9335

Would it not be better to store the BaseIndexOffset in STAddrs instead of re-computing it for every iteration of the StVal->uses() loop?

Also, do you have a use case/benchmark where this helps performance/code size significantly to justify the n^2 checking here?

I noticed a few code size regressions looking through the diffs of generated code for this change which were caused by:

  • the offsets of the stores not having the right alignment to allow for stp x opcodes to be formed
  • the offsets of the stores being too large for stp x, but not too large for stp q
  • the offsets of the stores being so large that adds are needed to compute the address for str x, but not so large that adds are needed to compute the address for str q

Hmm. Dealing with offsets makes this harder. It seems like we could achieve the same rough goal in a more straight forward-manner post-DAG after load-store pairing has happened.

niravd abandoned this revision.Aug 8 2017, 1:16 PM