In D102861#2816862, @fhahn wrote: After this change, building the LLVM test-suite for X86_64h seems to fail http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64h-O3/9586/ :
Thanks so much, I have reverted.
I'm hoping this revision will work well for all architectures @LuoYuanke
I will submit an updated patch that unconditionally adds -ffp-contract=off to the failing tests (rather than checking ARCH).
In D104055#2811694, @rengolin wrote: Probably cleaner to revert, let the bots calm down, then reapply the right patch.
I would like to submit this patch as well as the dependent clang patch, hopefully over this weekend when it would be less disruptive.
In llorg currently, -ffp-model=precise implies -ffp-contract=fast. My clang patch changes this to -ffp-contract=on. My LNT patch merely aims to "do what you did before, so the test has the expected behavior".
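A minimal sketch of why enabling contraction changes test output (and therefore the output hashes): a fused multiply-add rounds a*b + c once, while a separate multiply and add round twice. It assumes a target that evaluates double arithmetic in double precision (FLT_EVAL_METHOD == 0, e.g. SSE-based x86-64 or AArch64); the function names are hypothetical and std::fma stands in for the FMA the compiler would form.

```cpp
#include <cassert>
#include <cmath>

// Evaluates a*b + c with two roundings: the product is stored to a volatile
// double first, so the compiler cannot contract the multiply and add.
double mul_add_unfused(double a, double b, double c) {
    volatile double product = a * b;  // first rounding
    return product + c;               // second rounding
}

// Evaluates a*b + c with a single rounding, as a contracted FMA would.
double mul_add_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product 1 - 2^-54 is a tie between two doubles and rounds to 1.0, so mul_add_unfused returns 0.0, while mul_add_fused returns the exact result -2^-54. A last-bit difference like this is enough to break an exact hash comparison.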
No, I was wrong: the LNT bot was updated with my patch, but the Broadwell tests now fail on execution_time, whereas in the previous run they failed because the hash code did not match.
When I committed the clang patch, the same tests also failed on Arm. I'd like to disable contraction on those tests unconditionally, not just for specific architectures.
Also, I'm not sure my architecture check is working correctly, because my patch didn't "work" for Broadwell. I'm confused: is my patch incorrect for x86, or did it simply not reach the bot?
The bot is showing a fail due to this patch, see https://lab.llvm.org/buildbot#builders/110/builds/4007
Corrected a small formatting issue in LangRef.rst; thanks @pengfei.
I corrected the error in the LangRef documentation that @pengfei pointed out.
This patch addresses all of @craig.topper's comments and adds documentation for the new intrinsic to the language reference, as requested by @LuoYuanke.
question for @aaron.ballman
I plan to push this in a couple of days. Do the LNT bots pull the source from the latest revision, or is a special action needed to feed the updated tests into the bots?
I changed ActOnParenExpr to check that the parenthesized expression is not an lvalue before inserting the call to the builtin __arithmetic_fence.
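A rough sketch of the rule, using a hypothetical toy expression type rather than clang's Expr. The presumed reason for the lvalue check is that a builtin call yields an rvalue, so wrapping an lvalue would break code such as "(x) = 1.0" or "&(x)". The fenceEnabled parameter is a hypothetical stand-in for whatever option turns the feature on.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified expression node (not clang's Expr).
struct ToyExpr {
    std::string text;
    bool isFloating;  // has floating-point type
    bool isLValue;    // designates an object (e.g. a variable), not a value
};

// Only a parenthesized floating-point rvalue gets wrapped in the fence;
// lvalues keep plain parentheses so they remain assignable/addressable.
ToyExpr actOnParenExprSketch(const ToyExpr& inner, bool fenceEnabled) {
    if (fenceEnabled && inner.isFloating && !inner.isLValue) {
        return {"__arithmetic_fence(" + inner.text + ")", true, false};
    }
    return {"(" + inner.text + ")", inner.isFloating, inner.isLValue};
}
```

For example, "(a + b)" becomes "__arithmetic_fence(a + b)", while "(x)" for a variable x stays "(x)".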
I corrected the test output, and the tests pass now. This patch defines FMA_DISABLED=1 and -ffp-contract=off for the 20 test cases (polybench and others) that fail with D74436 applied. The remaining polybench test cases are built without those additional options, so their additional kernel invocation with StrictFP still occurs. These are the 20 tests:
SingleSource/Benchmarks/Linpack
SingleSource/Benchmarks/Misc-C++/Large
SingleSource/Benchmarks/Polybench/datamining/covariance
SingleSource/Benchmarks/Polybench/datamining/correlation
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trisolv
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemver
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt
SingleSource/Benchmarks/Polybench/stencils/adi
MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR
MultiSource/Benchmarks/DOE-ProxyApps-C++/HPCCG
MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE
MultiSource/Benchmarks/Rodinia/srad
MultiSource/Applications/oggenc
MicroBenchmarks/ImageProcessing/BilateralFiltering
MicroBenchmarks/ImageProcessing/Blur
In D74436#2785730, @lebedev.ri wrote: No real comments from me.
I assume, the errors are because -ffp-contract=on actually results in *less* error?
Hoping @lebedev.ri will take a look since he requested changes, thanks!
Here's the patch with -ffp-contract=off for the 20 tests that failed on X86, plus modifications to the test sources to suppress the second kernel execution under #if !FMA_DISABLED.
In response to the review comments, I'm updating the patch as follows. It's not complete; it just shows the idea.
In addition to adding -ffp-contract=off to the failing tests, add -DFMA_DISABLED=1
Then in the test itself, if FMA is disabled, don't do the duplicate run of the kernel.
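The plan above can be sketched roughly as follows. This is not the test-suite code: kernel, kernel_strict and run_benchmark are hypothetical names, and the real polybench tests may handle their StrictFP results differently. It only shows how building with -DFMA_DISABLED=1 compiles the duplicate kernel run out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: the affected tests are built with -ffp-contract=off -DFMA_DISABLED=1;
// all other builds get this default.
#ifndef FMA_DISABLED
#define FMA_DISABLED 0
#endif

// Hypothetical kernel: depending on -ffp-contract, the compiler may contract
// a[i] * b[i] + c[i] into a fused multiply-add. Inputs must have equal sizes.
void kernel(const std::vector<double>& a, const std::vector<double>& b,
            const std::vector<double>& c, std::vector<double>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = a[i] * b[i] + c[i];
}

#if !FMA_DISABLED
// Hypothetical strict reference: the volatile temporary forces the product to
// be rounded on its own, so this loop is never contracted.
void kernel_strict(const std::vector<double>& a, const std::vector<double>& b,
                   const std::vector<double>& c, std::vector<double>& out) {
    for (std::size_t i = 0; i < out.size(); ++i) {
        volatile double product = a[i] * b[i];
        out[i] = product + c[i];
    }
}
#endif

// Runs the kernel; unless FMA is disabled, also runs the strict version and
// returns how many elements differ between the two runs.
std::size_t run_benchmark(const std::vector<double>& a, const std::vector<double>& b,
                          const std::vector<double>& c) {
    std::vector<double> out(a.size());
    kernel(a, b, c, out);
    std::size_t mismatches = 0;
#if !FMA_DISABLED
    std::vector<double> ref(a.size());
    kernel_strict(a, b, c, ref);
    for (std::size_t i = 0; i < out.size(); ++i)
        if (out[i] != ref[i]) ++mismatches;
#endif
    return mismatches;
}
```

With -ffp-contract=off the main kernel already rounds the product separately, so the second run would only repeat the same computation; that is why skipping it is safe for these tests.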
Rebased the patch. The parent patch is updated and corrected as well, and tests can run end-to-end.
Rebased to ToT. This fixes the earlier illegal-type lowering problems, updates the tests to better demonstrate the functionality, and fixes a newly found problem.
@rengolin
Thanks, I appreciate the issues you raised.
This revision of the patch adds ffp-contract=off into the build line for the 20 tests which are failing on X86 due to the dependent clang patch.
In D102861#2774403, @andrew.w.kaylor wrote: This seems OK as a short term solution, but it is also problematic in that it prevents FMA from being used in these performance measurements. I understand that it's not trivial to allow FP-related error tolerance in the test, but (as a future patch) it would be nice to at least have a way to turn off the exact result checking on FP tests so that the performance could be measured with fast-math and fp-contract enabled, and that would require having a way to re-enable FMA for these tests.
Would it be better to add "-ffp-contract=off" to the test build lines for these cases (and any others that are already using the pragma)?
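As a sketch of the FP error tolerance mentioned in the review above, a comparison could accept results within an absolute or relative bound instead of requiring exact equality. The helper name and its parameters are hypothetical; they are not the test-suite's own tooling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: x and y count as equal when their difference is within
// an absolute tolerance, or within a relative tolerance of the larger magnitude.
// NaN never compares equal.
bool nearly_equal(double x, double y, double abs_tol, double rel_tol) {
    if (x == y) return true;  // exact match, including equal infinities
    double diff = std::fabs(x - y);
    if (diff <= abs_tol) return true;
    return diff <= rel_tol * std::max(std::fabs(x), std::fabs(y));
}
```

An absolute tolerance is needed for results near zero (such as the 0.0 vs -2^-54 difference an FMA can introduce), where any relative bound is effectively zero.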
In D102861#2771946, @jdoerfert wrote: I think this is generally reasonable, haven't looked at all benchmarks and I would prefer some other opinions.
I created https://reviews.llvm.org/D102861 to make changes to the failing LNT tests. Hoping to push this commit after the test changes in D102862 are approved.
In D74436#2190566, @lebedev.ri wrote: In D74436#2190492, @lebedev.ri wrote: <...>
And by codegen changes I mostly mean newly-set/now-unset fp fast-math instruction flags.
let 'er rip
There are three clang settings for -ffp-contract (on, off, fast), but the fast-math flags (FMF) have only a single "allow contract" bit. Clang sets the "allow contract" bit in the IRBuilder only when -ffp-contract=fast.
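A simplified model of how three settings map onto one flag: with "on", clang contracts only within a single expression by emitting llvm.fmuladd; with "fast", it sets the contract flag and leaves fusion to the backend; with "off", it does neither. The enum, struct and function below are hypothetical names for illustration; the real logic in clang's CodeGen is more involved (for example, it also honors pragmas).

```cpp
#include <cassert>

// The three -ffp-contract settings discussed above.
enum class FPContract { On, Off, Fast };

// What codegen does with an expression like a * b + c (simplified).
struct ContractLowering {
    bool emitFmulAdd;      // emit llvm.fmuladd for a*b+c within one expression
    bool setContractFlag;  // set the 'contract' fast-math flag on fmul/fadd
};

ContractLowering lowerFor(FPContract mode) {
    switch (mode) {
    case FPContract::On:   return {true, false};   // fuse within a statement only
    case FPContract::Fast: return {false, true};   // backend may fuse anywhere
    case FPContract::Off:  return {false, false};  // never fuse
    }
    return {false, false};  // unreachable; silences missing-return warnings
}
```

This is why "on" cannot be expressed with the FMF bit alone: it needs the separate llvm.fmuladd form to confine contraction to one expression.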
I rebased and enhanced the test case clang/test/CodeGen/ffp-contract-option.c to show the effect of each combination of -ffp-contract={on,fast,off} with -ffast-math on and off, in response to the request from @lebedev.ri. Sorry for leaving this hanging for so long; I wasn't sure what to do about the optimization discrepancies, but I have a plan now. We think they could be caused by a particular nuance of Broadwell.
In my premerge testing I see this failure, which looks like an incorrect check string:
Corrected the failing lit test and applied the clang-format patch
In D101640#2757316, @aaron.ballman wrote: In D101640#2757178, @mibintc wrote: In D101640#2757039, @aaron.ballman wrote: Thanks! The last remaining question to me is whether this should be a target-specific option or whether it makes sense to allow it as an option for any target.
I thought the patch may be more acceptable to the community if we restricted it.
I think we usually prefer more general solutions to target-specific ones. Given that the functionality is generally useful for easing the porting of projects from 32 to 64 bits, and that the code is simpler without the target-specific bits, my preference is to go with the general solution unless someone else has objections.
In D101640#2757039, @aaron.ballman wrote: Thanks! The last remaining question to me is whether this should be a target-specific option or whether it makes sense to allow it as an option for any target.
I responded to almost all of @aaron.ballman's review comments; the text may need a little more wordsmithing.
Respond to @aaron.ballman's review
I rebased the patch and responded to review comments from @aaron.ballman and @jansvoboda11
Responded to suggestions from @jansvoboda11
Respond to review comments from @jansvoboda11
@erichkeane Can you suggest reviewers for this patch?
There was no resolution on which option name would be acceptable.
some inline comments for reviewers
I think this patch is complete, except that it needs to wait until the parent patch is finished. Also, some refactoring may be desirable; I'll add some inline comments about that.
Wow, thanks for doing this! I worked on it for a couple of days a while ago, but I abandoned the effort and went back to my day job. It seems like preprocessing ought to be something like a "state machine", but I couldn't figure out the mechanism. Would it make sense to add some kind of high-level description of the components, now that you've gone to the [presumably massive] effort of understanding it? Just a couple of small comments above.
The diff appears to contain two separate commits, so at first glance this is only patching the test files. Usually, when I am working on a patch and have responded to comments, I squash the patch and updates into a single commit (git rebase -i) before creating a diff to upload to Phabricator.
In D99675#2695424, @kpn wrote: What changes are needed for a backend, and what happens if they aren't done?
I accidentally dropped the test case in the previous commit. I'm adding it back, this time under the llvm/test directory (previously it was in the wrong location).
This is a minor update with formatting changes only. This patch is not yet ready for review, only for discussion.
Together with the llvm parent patch, this simple program can now run end-to-end
This is a minor update from @pengfei which allows simple test cases to run end-to-end with clang.
I also changed the summary to reflect the review discussion about the FMA optimization, settling on "FMA is not allowed across a fence".
I added the InGroup rule for the new warning diagnostic, as Aaron requested.
I removed the diagnostic from InGroup<Extra>, that's the only change from previous revision
In D97764#2685742, @craig.topper wrote: In D97764#2685655, @mibintc wrote: I received a bug report that this patch creates error diagnostics for calls to builtins. A call to 'abort', 'exit', or a target builtin like __builtin_ia32_packssw should be allowed without a remark, but this patch causes the compilation to fail. We could require that all builtins be declared with "no caller saved reg", but that's a big modification. I'm planning to make a change that ignores builtin calls but continues to error on any implicitly declared or user-declared function without the "no caller saved reg" attribute.
Builtins like __builtin_ia32_packssw aren't usually called directly. The user should be calling the always_inline wrapper functions in x86intrin.h. Would that also fail?
I received a bug report that this patch creates error diagnostics for calls to builtins. A call to 'abort', 'exit', or a target builtin like __builtin_ia32_packssw should be allowed without a remark, but this patch causes the compilation to fail. We could require that all builtins be declared with "no caller saved reg", but that's a big modification. I'm planning to make a change that ignores builtin calls but continues to error on any implicitly declared or user-declared function without the "no caller saved reg" attribute.
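A minimal sketch of the planned rule for a call site that the patch checks. CalleeInfo and shouldDiagnoseCall are hypothetical names, not clang's Sema API; hasNoCallerSavedRegs stands for the callee being declared with __attribute__((no_caller_saved_registers)).

```cpp
#include <cassert>

// Hypothetical summary of the callee at a checked call site.
struct CalleeInfo {
    bool isBuiltin;             // e.g. abort, exit, __builtin_ia32_packssw
    bool hasNoCallerSavedRegs;  // declared with no_caller_saved_registers
};

// Builtins are skipped; any other callee (implicitly or user-declared)
// is diagnosed when it lacks the attribute.
bool shouldDiagnoseCall(const CalleeInfo& callee) {
    if (callee.isBuiltin) return false;
    return !callee.hasNoCallerSavedRegs;
}
```

Under this rule alone, the always_inline wrapper functions from x86intrin.h that craig.topper asks about are ordinary functions rather than builtins, so they would presumably still be diagnosed unless they are handled separately.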
In D99675#2671924, @efriedma wrote: The expression “llvm.arith.fence(a * b) + c” means that “a * b” must happen before “+ c” and FMA guarantees that, but to prevent later optimizations from unpacking the FMA the correct transformation needs to be:
llvm.arith.fence(a * b) + c → llvm.arith.fence(FMA(a, b, c))
Does this actually block later transforms from unpacking the FMA? Maybe if the FMA isn't marked "fast"...
I'd like @pengfei to reply to this question. I think the overall idea is that many of the optimizations are pattern based, and the existing pattern wouldn't match the new intrinsic.
How is llvm.arith.fence() different from using "freeze" on a floating-point value? The goal isn't really the same, sure, but the effects seem similar at first glance.
Initially we thought the intrinsic "ssa.copy" could serve. However, ssa.copy is intended for a different purpose and gets optimized away. We want arith.fence to survive through codegen, which is one reason we think a new intrinsic is needed.