This is an archive of the discontinued LLVM Phabricator instance.

Remove sometimes faulty rewrite of memcpy in instcombine.
ClosedPublic

Authored by uabelho on Feb 22 2017, 6:32 AM.

Details

Summary

Solves PR 31990.

The bad rewrite could replace a memcpy of one word with
store i4 -1
while it should actually be
store i8 -1

Hopefully opt and llc have improved enough that the original optimization
done by this code isn't needed anymore.

One existing testcase is affected. It originally tested that
the memcpy was replaced with
load double
but since we now remove that rewrite, it will be
load i64
instead.
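
For illustration, a hedged sketch of the shape of IR that testcase exercises (function and value names are made up, not copied from memcpy-to-load.ll): a fixed 8-byte memcpy between double pointers, which instcombine still collapses to a single load/store pair, just with the generic integer type now:

declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1)

define void @copy8(double* %dst, double* %src) {
  %d = bitcast double* %dst to i8*
  %s = bitcast double* %src to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 8, i32 8, i1 false)
  ret void
}

; before this patch the memcpy became roughly:  %v = load double, ... ; store double %v, ...
; after this patch it becomes:                  %v = load i64, ...    ; store i64 %v, ...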

Patch suggestion by Eli Friedman.

Diff Detail

Event Timeline

uabelho created this revision. Feb 22 2017, 6:32 AM

What do you think, is it ok to remove the offending code as I've done?

What about the test memcpy-to-load.ll, is the update ok or should I remove it?

efriedma added inline comments.
test/Transforms/InstCombine/memcpy-to-load.ll:15

We already produce this result on 64-bit targets (combineLoadToOperationType transforms double load/store to i64 load/store); the interesting piece is what happens on targets where i64 isn't legal.
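
As a hedged illustration of that canonicalization (illustrative function and value names, not taken from the test; it assumes a target with native 64-bit integer registers):

define void @fwd(double* %dst, double* %src) {
  %v = load double, double* %src
  store double %v, double* %dst
  ret void
}

; Since %v is only copied and never used as a floating-point value,
; combineLoadToOperationType rewrites the pair (through pointer bitcasts)
; into an equivalent i64 load and i64 store.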

In general, it shouldn't matter what we produce here; we should always eventually rewrite loads/stores to the appropriate register type for the target. And other optimization passes don't really care about the types of loads and stores anyway (SROA is a lot more flexible than it used to be). So if this does in fact expose a performance regression, we should fix it elsewhere.

That said, it would be nice if you could give testsuite performance numbers on some 32-bit target to make sure we aren't missing some important optimization.

uabelho added inline comments. Feb 23 2017, 1:35 AM
test/Transforms/InstCombine/memcpy-to-load.ll:15

Ok, for the performance numbers you'd have to give me some guidance then.

Are these the tests you want me to run?
http://llvm.org/docs/TestingGuide.html#test-suite-quickstart

I have a 64-bit Linux Ubuntu 14.04 machine on which I think I managed to run the test-suite the LNT way, according to:
http://llvm.org/docs/lnt/quickstart.html

I ran them two times without the patch and then two times with it, imported all the results into a database, and started looking through the web UI, but to be honest I don't know what I'm looking for and I'm quite confused.

So please give me some guidance on which numbers you want me to dig up.

Are these the tests you want me to run?
http://llvm.org/docs/TestingGuide.html#test-suite-quickstart

Yes, that's right.

In terms of the results, you should be seeing a page like http://llvm.org/perf/db_default/v4/nts/109028?compare_to=109007 . On the sidebar on the left, there are two dates in bold: those are the two runs you're comparing. The differences between the two runs are listed under the heading "Run-Over-Run Changes Detail". The important number is the first one: the percentage change between the two runs. If it's negative (green), there's a performance improvement; if it's positive (red), there's a performance regression. (If there are no significant changes between the two runs, this section might be empty.)

For this patch, please make sure you're comparing the performance of 32-bit binaries (built with "-m32" in CFLAGS/CXXFLAGS).

Alright!

So I've done four runs of

lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-compiler> --test-suite <path-to-test-suite>

Two runs (base1 and base2) on two commits without the patch, where the only difference between the commits is
a whitespace change in CODE_OWNERS.txt.

Then two runs (patch1 and patch2) on two other commits containing the patch, where the only difference
between the commits is the same whitespace change in CODE_OWNERS.txt.

If I compare the two runs without the patch, there are Execution Time regressions of up to 98.36% and Compile
Time regressions of up to 219.18%, for a total of 191 reported performance regressions.

I did the above just to get an indication of how stable the numbers are when nothing significant at all has changed.

I have a hard time telling whether the patch changes anything significantly, so I've put up a couple of screenshots of
different comparisons from the web UI:
http://imgur.com/a/SVg08

There we have base2 vs base1, i.e. the two runs without the patch compared against each other.

Then four pictures where the two patch runs are compared to the two base runs.

Can you make anything out of this?

(Btw, with --cflag=-m32 three tests seem to fail:

FAIL: MultiSource/Applications/ClamAV/clamscan.compile_time (1 of 2465)
FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (494 of 2465)
FAIL: MultiSource/Benchmarks/DOE-ProxyApps-C/XSBench/XSBench.execution_time (495 of 2465)

These fail both with and without the patch. I don't see those failures without -m32.)

Those results are much noisier than they should be... if you're seeing run-to-run performance differences of over a second for the same binary, something has gone very wrong. Please ask on llvm-dev for help with that.

Not sure what's causing the test failures... make sure you're passing --cc, --cxx, --cflags, and --cxxflags?

Ok, with help from llvm-dev I now get more stable benchmark numbers.
I don't know if this is still too messy to draw any conclusions from, but
at least it's significantly more stable than before.

I've run the benchmarks with

lnt runtest nt --sandbox SANDBOX --cflag=-m32 --cc <path-to-compiler> --test-suite <path-to-test-suite> --threads 1 --build-threads 12 --benchmarking-only --use-perf=1 --make-param="RUNUNDER=taskset -c 1" --multisample=3

lnt runtest nt doesn't have any "--cxxflags" flag, so I'm only using
--cflag=-m32.

(The failing tests I got with -m32 seem to be connected to zconf.h,
maybe https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1155307
I tried the workaround suggested at the end, but it failed in some
unclear way, so I simply hope that those three tests aren't vital for
the results we're looking for.)

Base vs base comparison (only improvements):
http://i.imgur.com/6eTlc7H.png

Patch vs base comparison (also only improvements):
http://i.imgur.com/ye8EDSl.png

Ok?

efriedma accepted this revision. Feb 28 2017, 12:56 PM

Numbers are good enough; LGTM.

This revision is now accepted and ready to land. Feb 28 2017, 12:56 PM
This revision was automatically updated to reflect the committed changes.