Wed, Mar 27
LGTM, thanks for fixing this!
This might be a more general problem; other passes might expect a normalized form as well, such as UnrollAndJam and LoopDistribute.
For LoopVectorizer, I am somewhat surprised. It calls simplifyLoop itself (instead of relying on LoopSimplifyPass). Could the same be done for LoopRotation?
This makes sense to me, given that loops with bottom checks are a precondition for LV. I think it would be good to also update the section about llvm.loop.vectorize.enable in LangRef to say it might also enable transformations like LoopRotate, in preparation for LV.
In case LV fails to vectorize the loop, it might be a bit surprising the loop has been rotated. But the way things are structured at the moment, there's nothing we can do about that (we cannot tell in LoopRotate if LV will be able to vectorize). By documenting that behavior, we can push the responsibility for that to the user. Maybe a remark in LoopRotate would be helpful to indicate that we only rotated because of the metadata.
Tue, Mar 26
Mar 14 2019
Feb 26 2019
Feb 8 2019
Feb 6 2019
LGTM with the comment.
Jan 25 2019
This is great, please add some tests or check for remarks in existing tests (e.g. for the recursive case).
Awesome and thanks for the test. LGTM!
Jan 24 2019
Can you please describe the user experience?
Dec 13 2018
Actually this has been failing for 8 hours, so reverted in r349117. Also reverted your attempt to update the test; it wasn't updating the right test: r349118
Nov 27 2018
Nov 26 2018
Hi Florian, are you saying that in this case (known unsafe dep) we would still vectorize the loop (and always fail at run-time)?
Jul 20 2018
LGTM. You have trailing whitespace in one of the hunks; please clang-format.
Jul 16 2018
May 17 2018
Mostly nits. LGTM with the requested changes.
May 7 2018
Mostly small things except for the question on whether we should only compute this when the remark is actually enabled.
Apr 17 2018
Looks pretty straight-forward.
Mar 20 2018
While it's preferred to use ORE as an analysis pass, sometimes that's hard (e.g. because it's a function pass, or simply because it's hard to thread the ORE instance through the many layers). In these cases it's fine to construct one inline. When remarks are requested, this amounts to repopulating BFI for the function as the ORE instance is created.
Mar 13 2018
@inglorion, I am inclined to recommit this unless I hear from you in a few days:
Mar 12 2018
Mar 7 2018
@inglorion Is this from a bot? I didn't see any failures. A bit more info would be helpful.
Mar 6 2018
Feb 26 2018
Feb 24 2018
Feb 21 2018
Seeing such major swings, my preference would be to revert and put the new version up for review (I think your hack works). Then commit the new combined version after a few days, so that the perf bots get a chance to recover. What do you think?
Feb 20 2018
Please revert until these things get worked out so that we can properly track performance. We are seeing many regressions including 17% on 444.namd and 12% on 482.sphinx3 in SPECfp 2006.
Hi Adam -
Rather than reverting for all targets, can we just hack ARM/AArch with patches like this (if we can add at least one test to show what changed, that would be better of course):

Index: lib/Target/ARM/ARMTargetTransformInfo.cpp
===================================================================
--- lib/Target/ARM/ARMTargetTransformInfo.cpp (revision 325579)
+++ lib/Target/ARM/ARMTargetTransformInfo.cpp (working copy)
@@ -514,6 +514,13 @@
   int Cost = BaseT::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info,
                                            Opd1PropInfo, Opd2PropInfo);
+  // Assume that floating point arithmetic operations cost twice as much as
+  // integer operations.
+  // FIXME: This is a win on several perf benchmarks running on CPU model ???,
+  // but there are no regression tests that show why or how this is good.
+  if (Ty->isFPOrFPVectorTy())
+    Cost *= 2;
+
   // This is somewhat of a hack. The problem that we are facing is that SROA
   // creates a sequence of shift, and, or instructions to construct values.
   // These sequences are recognized by the ISel and have zero-cost. Not so for
The patch caused regressions in the LLVM benchmarks and in Spec2k/Spec2k6 benchmarks on AArch64 Cortex-A53:
The regression of SingleSource/Benchmarks/Misc/matmul_f64_4x4 can also be seen on the public bot: http://lnt.llvm.org/db_default/v4/nts/90636
The regression there is 128.85%.
The main difference in generated code is FMUL (FP, scalar) instead of FMUL (SIMD, scalar):

  fmul d20, d16, d2

instead of

  fmul v17.2d, v1.2d, v5.2d
This also caused a code size increase of 6.04% in SingleSource/Benchmarks/Misc/matmul_f64_4x4.
I am working on a reproducer.
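For context (a generic sketch, not part of the original report): regression figures like the 128.85% and 6.04% above are relative changes against a baseline measurement, along these lines:

```python
def regression_pct(baseline: float, current: float) -> float:
    """Relative change of a measurement vs. its baseline, in percent.

    Positive values are regressions, assuming lower-is-better metrics
    such as execution time or code size.
    """
    return (current - baseline) / baseline * 100.0

# A runtime going from 1.0s to 2.2885s is a 128.85% regression;
# code size going from 100 KB to 106.04 KB is a 6.04% increase.
print(round(regression_pct(1.0, 2.2885), 2))    # -> 128.85
print(round(regression_pct(100.0, 106.04), 2))  # -> 6.04
```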
Thanks. We knew this change was likely to cause perf regressions based on some of the x86 diffs, so having those reductions will help tune the models in general and specifically for AArch64.
That is, we should be able to solve the AArch64 problems with AArch64-specific cost model changes rather than reverting this. For example, as @fhahn mentioned, we might want to make the int-to-FP cost ratio 3:2 for some cores. Another possibility is overriding the AArch64 fmul/fsub/fadd costs to be more realistic (as we probably also have to do for x86).
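A minimal, hypothetical sketch of that 3:2 int-to-FP cost idea (illustrative names only; this is not LLVM's actual TargetTransformInfo interface):

```python
BASE_COST = 1  # nominal cost of a simple integer arithmetic instruction

def arithmetic_cost(is_fp: bool, base: int = BASE_COST) -> int:
    """Charge FP arithmetic 3/2 the cost of the integer equivalent.

    Rounds up so an FP op never looks cheaper than its integer
    counterpart; a 2x ratio would model the ARM hack quoted earlier.
    """
    if is_fp:
        return (base * 3 + 1) // 2
    return base

print(arithmetic_cost(is_fp=False))  # integer add -> 1
print(arithmetic_cost(is_fp=True))   # fadd -> 2
```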
Feb 12 2018
Jan 20 2018
This is used by Swift. Providing a macro instead of a static value may be a better solution. If you are willing to do that, I am OK with it, but I strongly object to simply removing this.
Jan 9 2018
Jan 5 2018
LGTM too. Thanks for getting back to this!
Jan 2 2018
Dec 20 2017
Dec 14 2017
I had to further tweak this in rL320725. Let me know if you see any issues.
Dec 6 2017
Dec 1 2017
Looks like it's a test problem. When I tweak the sample profile file according to https://clang.llvm.org/docs/UsersManual.html#sample-profile-text-format, I do get hotness on the remarks.
@modocache, @davide, are you guys sure this feature is working? The test does not actually check whether hotness is included in the remarks, and when I run it manually it is missing. In D40678, I am filtering out remarks with no hotness whenever a threshold is set; as a result, all the remarks in this new test are filtered out.
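The filtering behavior described above can be sketched roughly like this (hypothetical data shapes and helper, not the actual D40678 code):

```python
def filter_remarks(remarks, threshold=None):
    """Keep remarks whose hotness meets the threshold.

    Remarks that carry no hotness at all are dropped whenever a
    threshold is set -- so a test whose remarks lack hotness ends up
    with an empty result.
    """
    if threshold is None:
        return list(remarks)
    return [r for r in remarks
            if r.get("hotness") is not None and r["hotness"] >= threshold]

remarks = [{"name": "Vectorized"}, {"name": "Inlined", "hotness": 300}]
print(len(filter_remarks(remarks)))       # no threshold -> 2
print(len(filter_remarks(remarks, 100)))  # threshold set -> 1
print(len(filter_remarks([{"name": "NoHotness"}], 1)))  # -> 0
```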
Nov 30 2017
Nov 29 2017
Nov 28 2017
Nov 27 2017
Thanks, Chris! This moves the cmake bits to config-ix.cmake.
Nov 17 2017
Nov 15 2017
Nov 14 2017
I get two failures, can you please take a look?