avt77 (Andrew V. Tischenko)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 11 2016, 3:46 AM (58 w, 5 d)

Recent Activity

Yesterday

avt77 committed rL303985: The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too….
Fri, May 26, 6:23 AM

Thu, May 25

avt77 updated the diff for D32352: Go to eleven.

Hi All,
I merged with trunk and launched "check-all": everything works without any issue.
Craig, Zvi - could you give me LGTM?

Thu, May 25, 8:17 AM
avt77 updated the diff for D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

I restored the condition like here:

Thu, May 25, 7:29 AM

Wed, May 24

avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Wed, May 24, 11:58 PM
avt77 updated the diff for D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions.

I've fixed all issues raised by Simon. In addition I re-checked all numbers: it seems they are correct now.

Wed, May 24, 6:32 AM

Wed, May 17

avt77 added inline comments to D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions.
Wed, May 17, 8:47 AM

Mon, May 15

avt77 updated the diff for D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions.

I slightly changed the throughput calculation: if the instruction scheduling model has no cycles for the given instruction but the entry is valid, then the throughput is taken to be equal to the latency.
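
A minimal sketch of that fallback, assuming a simplified record per instruction. The `SchedInfo` struct and the `throughputOrLatency` function are hypothetical names used only for illustration; they are not taken from the patch.

```cpp
#include <cassert>

// Hypothetical, simplified view of one instruction's scheduling entry.
struct SchedInfo {
  bool Valid;              // the model has an entry for this instruction
  unsigned Latency;        // latency in cycles
  unsigned ResourceCycles; // 0 means the model lists no cycles
};

// Returns the value to report as throughput for a valid entry.
double throughputOrLatency(const SchedInfo &SI) {
  assert(SI.Valid && "caller must skip instructions without a valid entry");
  if (SI.ResourceCycles != 0)
    return static_cast<double>(SI.ResourceCycles);
  // No cycles in the model: fall back to the latency.
  return static_cast<double>(SI.Latency);
}
```

For a valid entry with latency 3 and no cycles, the sketch reports 3; once cycles are present they take precedence.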

Mon, May 15, 11:52 PM
avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Mon, May 15, 10:13 AM
avt77 created D33203: Add scheduler classes to integer/float horizontal operations.
Mon, May 15, 9:23 AM
avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Mon, May 15, 3:28 AM

Fri, May 12

avt77 updated the diff for D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions.

It seems I fixed all known issues except proper support of vzeroupper and vzeroall: will try to do it in the next patch.

Fri, May 12, 5:02 AM

Thu, May 11

avt77 added reviewers for D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions: RKSimon, spatel, dtemirbulatov.
Thu, May 11, 6:29 AM
avt77 created D33099: AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions.
Thu, May 11, 6:08 AM

Apr 27 2017

avt77 committed rL301529: 2 tests that were lost in rL301390.
Apr 27 2017, 3:33 AM

Apr 26 2017

avt77 updated the diff for D32352: Go to eleven.

Now we have 3 different versions of test_mul_spec.

Apr 26 2017, 10:03 AM
avt77 updated the diff for D32352: Go to eleven.

I removed redundant local variables.

Apr 26 2017, 7:44 AM
avt77 committed rL301390: PR31007 and PR27884 will be closed: a possibility to compile constants like 0bH….
Apr 26 2017, 3:10 AM
avt77 updated the diff for D32352: Go to eleven.

The issues with break-return are fixed.

Apr 26 2017, 3:05 AM
avt77 updated the diff for D32352: Go to eleven.

Lambdas refactoring: in fact I tried to do it from the very beginning but I got the error message:

/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp: In function ‘llvm::SDValue combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)’:
/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30955:3: error: conversion from ‘combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)::<lambda(int, int, bool)>’ to non-scalar type ‘llvm::SDValue’ requested

};
^

/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30973:55: error: no match for call to ‘(llvm::SDValue) (int, int, bool)’

Result = combineMulShlAddOrSub(5, 1, /*isAdd*/true);
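
The two diagnostics above look like what GCC reports when a lambda initializes a variable declared with a non-closure type (here `llvm::SDValue`) and that variable is then called like a function. The sketch below shows the `auto` form on plain integers, so it compiles without LLVM. The function name `mulSpecialSketch`, the integer stand-ins, and the cases other than 11 are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for combineMulSpecial that works on integers instead of SDValue
// nodes (an assumption made so the sketch is self-contained).
uint64_t mulSpecialSketch(uint64_t MulAmt, uint64_t X) {
  // `auto` keeps the closure type. Writing `uint64_t combineMulShlAddOrSub = ...`
  // would instead ask for a conversion from the lambda to uint64_t.
  auto combineMulShlAddOrSub = [&](int Mult, int Shift, bool isAdd) {
    uint64_t Result = X * static_cast<uint64_t>(Mult); // e.g. one LEA for 3, 5, 9
    Result <<= Shift;                                  // shift left
    return isAdd ? Result + X : Result - X;            // final add or sub of X
  };

  switch (MulAmt) {
  case 11: // ((X * 5) << 1) + X
    return combineMulShlAddOrSub(5, 1, /*isAdd*/ true);
  case 13: // ((X * 3) << 2) + X, illustrative case
    return combineMulShlAddOrSub(3, 2, /*isAdd*/ true);
  case 23: // ((X * 3) << 3) - X, illustrative case
    return combineMulShlAddOrSub(3, 3, /*isAdd*/ false);
  default: // no special sequence: plain multiply
    assert(MulAmt != 11 && MulAmt != 13 && MulAmt != 23);
    return X * MulAmt;
  }
}
```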
Apr 26 2017, 2:33 AM

Apr 25 2017

avt77 added a comment to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

The initial problem was described in PR22004. The problem arises because this check does not allow identifiers that start with a dot. But such IDs are legal in asm, e.g. as local labels. As a result we get (in this particular case) a binary expression without an RHS operand, which means "too few operands", as the assertion says. If I comment out the check then everything works without any problems, including all regression tests. Could I remove the check from the code?
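
A toy model of the failure mode described above, assuming a parser that pushes accepted identifiers on an operand stack and silently drops rejected ones. This is not the X86AsmParser logic; `isIdentifier` and `hasBothOperands` are hypothetical helpers.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical classifier: optionally rejects identifiers that start with '.'.
bool isIdentifier(const std::string &Tok, bool AllowLeadingDot) {
  if (Tok.empty())
    return false;
  char C = Tok[0];
  if (C == '.')
    return AllowLeadingDot && Tok.size() > 1;
  return std::isalpha(static_cast<unsigned char>(C)) || C == '_';
}

// Returns true when a binary operator in Toks ends up with two operands.
bool hasBothOperands(const std::vector<std::string> &Toks,
                     bool AllowLeadingDot) {
  std::vector<std::string> OperandStack;
  bool SawOperator = false;
  for (const std::string &T : Toks) {
    if (T == "+" || T == "-") {
      SawOperator = true;
      continue;
    }
    if (isIdentifier(T, AllowLeadingDot))
      OperandStack.push_back(T);
    // A rejected token is dropped, so the operator loses its RHS operand.
  }
  // The condition below mirrors the quoted assertion: OperandStack.size() > 1.
  return !SawOperator || OperandStack.size() > 1;
}
```

With the tokens `foo`, `+`, `.L1`, the stack holds one operand when leading dots are rejected and two when they are accepted.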

Apr 25 2017, 9:47 AM
avt77 updated the diff for D32352: Go to eleven.

The issue with MulConstantOptimization was fixed.

Apr 25 2017, 9:33 AM
avt77 added inline comments to D32352: Go to eleven.
Apr 25 2017, 9:22 AM
avt77 updated the diff for D32352: Go to eleven.

I implemented the requests from Zvi, Isaba and RKSimon. Please, review again.

Apr 25 2017, 7:57 AM
avt77 updated the diff for D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

I simply commented out the check because it hides the identifier case. No regression tests fail, so I hope it's acceptable.

Apr 25 2017, 6:53 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I inhibited the default constructor ParseStatementInfo::ParseStatementInfo() because it can't work without proper initialization.

Apr 25 2017, 5:17 AM
avt77 abandoned D30572: Remove equal BBs from a function.
Apr 25 2017, 4:26 AM
avt77 added inline comments to D32352: Go to eleven.
Apr 25 2017, 3:11 AM
avt77 added a comment to D32352: Go to eleven.

It's already limited:

// An imul is usually smaller than the alternative sequence.
if (DAG.getMachineFunction().getFunction()->optForMinSize())

Ah, sorry I missed that. The fact that it is "MinSize" highlights that we're in a gray area for the DAG. That is, it's hard to know what the best sequence will be without looking at the instruction timing. Given that, we need to know if converting these muls is generally good. Do you have real or synthetic benchmark info for these cases? Is there a perf difference, for example, between Jaguar and Haswell (since those CPUs are specified in the tests)? Is the codegen ever different for those CPUs? If not, why are we adding different RUNs for them in this patch?

Apr 25 2017, 3:08 AM
avt77 added a comment to D30572: Remove equal BBs from a function.

Hi All,
Reading the sources of the TailMerging pass, I discovered that it has a special switch, "tail-merge-size", that resolves the issue from the loop-serch.ll test. The default value of the switch is 3, but if I change it to 2 then everything works fine.
Because of that I decided to abandon this review :-(
I'm going to investigate whether the default value can be changed. If that is not allowed for any reason (compile time, target-specific requirements, etc.), I'll implement a special hook in Target, as suggested in the sources.

Apr 25 2017, 1:13 AM

Apr 24 2017

avt77 added a comment to D32352: Go to eleven.

Is this or should this be limited when optimizing for size? I didn't count the instruction bytes...it might depend on the multiplier constant which version is smaller?

Apr 24 2017, 8:05 AM
avt77 updated the diff for D32352: Go to eleven.

I implemented code reuse to support different constants. In addition I slightly changed 2 tests to deal with latency/throughput numbers. BTW, it is not clear at the moment how to use those numbers for 32-bit. What CPU should we use?

Apr 24 2017, 5:00 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

Test function "foo" was renamed as "PR31007" to show its origin.

Apr 24 2017, 4:54 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I moved the inline-0bh.ll test into the test/Codegen/X86 folder.

Apr 24 2017, 2:40 AM

Apr 22 2017

avt77 added inline comments to rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Apr 22 2017, 12:42 AM

Apr 21 2017

avt77 created D32352: Go to eleven.
Apr 21 2017, 8:02 AM
avt77 added a comment to D32219: [X86][SSE] Improve DIV/SQRT throughput estimates for SB/HW schedule models.

What are your plans here? I've just checked (with help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong if we compare with numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step-by-step?

Apr 21 2017, 2:51 AM

Apr 20 2017

avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 20 2017, 1:41 AM

Apr 19 2017

avt77 updated the diff for D32162: Inline asm 0bH conflict.

I added required comments and one additional tiny fix to cover PR27884: now it works properly. The corresponding regression test was added as well.

Apr 19 2017, 5:50 AM
avt77 created D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Apr 19 2017, 4:34 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 19 2017, 4:15 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 19 2017, 12:38 AM

Apr 18 2017

avt77 added reviewers for D32162: Inline asm 0bH conflict: RKSimon, spatel, dtemirbulatov, zizhar.
Apr 18 2017, 5:54 AM
avt77 created D32162: Inline asm 0bH conflict.
Apr 18 2017, 5:50 AM

Apr 14 2017

avt77 committed rL300314: Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG..
Apr 14 2017, 2:30 AM
avt77 committed rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Apr 14 2017, 12:57 AM

Apr 12 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I implemented all requirements from hfinkel.
Please, review again.

Apr 12 2017, 8:15 AM

Apr 11 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I fixed the latest requirements from RKSimon. Please, give me your feedback.

Apr 11 2017, 12:23 AM

Apr 7 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I hope I addressed all comments raised by RKSimon.
hfinkel, what do you think?

Apr 7 2017, 9:54 AM
avt77 added inline comments to D30941: Better testing of schedule model instruction latencies/throughputs.
Apr 7 2017, 9:51 AM

Apr 4 2017

avt77 added a comment to D31668: Fix PR30562.

You should use the following command to generate diff:

Apr 4 2017, 11:46 PM

Mar 31 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

According to the requirements from Simon, I inserted the prefix "sched: " for scheduler comments and made "false" the default value of the -print-schedule option. As a result I restored the original versions of all X86 tests except 2, which demonstrate the changes. Now we don't have any failing tests.

Mar 31 2017, 3:14 AM

Mar 30 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

The problem with the failed tests arose because of new comment lines added as a result of this patch. I was wrong when I said that FileCheck does not allow adding new comments at EOL.
I redesigned the patch to make it possible to add Latency:Throughput at the end of an existing comment (if any). As a result I was forced to change the API of EmitInstruction from MCStreamer. I don't like this change because there are a lot of successors of MCStreamer, but it works perfectly and may be useful for other targets.
I regenerated (with the help of update_llc_test_checks.py) 34 tests and now we have only 16 failed tests: I'm going to fix them asap.
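
A rough sketch of appending the scheduling note to an existing end-of-line comment. The "sched: " prefix comes from the review discussion; the bracketed `[latency:throughput]` layout, the two-decimal throughput, and the function name `appendSchedComment` are assumptions for illustration, and the real change lives in the MCStreamer code path.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Appends "sched: [Latency:Throughput]" to an existing comment, if any.
std::string appendSchedComment(const std::string &Existing, unsigned Latency,
                               double Throughput) {
  assert(Throughput >= 0.0 && "throughput cannot be negative");
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "sched: [%u:%.2f]", Latency, Throughput);
  if (Existing.empty())
    return Buf;               // no earlier comment: the note stands alone
  return Existing + " " + Buf; // keep the earlier comment and append the note
}
```

Because the note is appended on the same line, no new comment lines appear between the instructions that the regenerated CHECK lines cover.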

Mar 30 2017, 10:01 AM

Mar 22 2017

avt77 added a comment to D30572: Remove equal BBs from a function.

Matthias,
Thank you for the fast reply.

Mar 22 2017, 7:58 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false because otherwise we have "Unexpected Failures: 530", and that's X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

Mar 22 2017, 7:30 AM
avt77 added a reviewer for D30572: Remove equal BBs from a function: MatzeB.
Mar 22 2017, 7:02 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false because otherwise we have "Unexpected Failures: 530", and that's X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

Mar 22 2017, 6:24 AM

Mar 21 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

Hal,
I removed the special option (-print-schedule) and tried to run check-all. The result was very unpleasant but predictable:

Mar 21 2017, 5:53 AM

Mar 17 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

It is not clear to me why you won't just always do this when in verbose-asm mode. Thoughts on not having a separate option at all?

Mar 17 2017, 9:41 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

Throughput calculation is implemented.

Mar 17 2017, 9:37 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

Mar 17 2017, 1:33 AM

Mar 16 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

The implementation was moved to target independent area and all Hal's comments were applied. I did not do anything with Throughput: it will be done in the patch.

Mar 16 2017, 5:52 AM

Mar 15 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, could you help me? First of all could you give me a link(s) to any doc(s) related to our MCSchedModel except sources?

Next, I was told that ResourceCycles here:

class ProcWriteResources<list<ProcResourceKind> resources> {

list<ProcResourceKind> ProcResources = resources;
list<int> ResourceCycles = [];
int Latency = 1;
int NumMicroOps = 1;

could be used as Throughput of the given instruction. Is it right? Does it mean I could include it in generated comment as well? If YES I suppose it should be the max of the Cycles, right?
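
The question above can be sketched as follows, assuming the answer is "take the maximum of the listed cycles". The `WriteResources` struct mirrors only the fields quoted from the TableGen class, and `maxResourceCycles` is a hypothetical helper. Whether this matches how the scheduling model defines throughput is exactly what the comment asks.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain C++ mirror of the quoted TableGen fields (illustrative only).
struct WriteResources {
  std::vector<int> ResourceCycles; // one entry per processor resource used
  int Latency = 1;
  int NumMicroOps = 1;
};

// Returns the largest per-resource cycle count, or 0 if none are listed.
int maxResourceCycles(const WriteResources &WR) {
  if (WR.ResourceCycles.empty())
    return 0;
  int Max = *std::max_element(WR.ResourceCycles.begin(),
                              WR.ResourceCycles.end());
  assert(Max >= 0 && "cycle counts are expected to be non-negative");
  return Max;
}
```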

Mar 15 2017, 3:17 AM

Mar 14 2017

avt77 created D30941: Better testing of schedule model instruction latencies/throughputs.
Mar 14 2017, 7:51 AM

Mar 13 2017

avt77 updated the diff for D30572: Remove equal BBs from a function.

I have now found and fixed the issues raised during bootstrap. The bootstrap works perfectly now. And I got some numbers: on huge files such as clang, llc, etc. the file size decreases by about 0.1%. At the moment I'm using the most conservative approach possible, but the collected numbers show that we can certainly get better results.

Mar 13 2017, 7:57 AM

Mar 10 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Mar 10 2017, 12:09 AM

Mar 7 2017

avt77 updated the diff for D30572: Remove equal BBs from a function.

I implemented the requirements raised by Davide and fixed an issue with bootstrap: now it works properly. Next steps: I'm going to collect statistics on bootstrap such as the number of removed BBs and instructions inside those BBs, the total size change, and the compile time. When that's done I'm going to extend the transformation to other kinds of BBs (e.g. parts of EH, -O1 (size) optimizations) and/or terminators.

Mar 7 2017, 5:54 AM

Mar 3 2017

avt77 created D30572: Remove equal BBs from a function.
Mar 3 2017, 6:41 AM

Mar 2 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I committed PIC-related test in trunk and updated this patch to be able to compare it with new code generation.

Mar 2 2017, 6:53 AM
avt77 committed rL296746: Added special test covering a problem with PIC relocation model on SLM….
Mar 2 2017, 5:59 AM

Feb 21 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

Guy Blank found a problem with PIC relocation model on SLM architecture. I fixed it and added the corresponding test.

Feb 21 2017, 10:29 AM

Feb 14 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Hi All,
Do we expect anything more here?
It seems I fixed all requirements. Maybe it's time for LGTM?

Feb 14 2017, 4:20 AM

Feb 13 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

Committed revision 294936

Feb 13 2017, 1:55 AM
avt77 committed rL294936: Compile time decreasing in the case we're dealing with Machine Combiner. .
Feb 13 2017, 1:55 AM

Feb 10 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

Just for your info: I collected the perf numbers.

Feb 10 2017, 10:05 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed all known issues:

  • AVX512 is now again supported by DAGCombiner
  • FMA instructions are being used when FMA is enabled

This version clearly shows the advantage of using the sched model: it selects reciprocal code only when it's profitable (e.g. compare v8f32_one_step and v8f32_one_step_2_divs, etc.)

Feb 10 2017, 8:31 AM
avt77 added a comment to rL294128: [X86][SSE] Add target cpu specific reciprocal tests.

Could you add some new tests like these:

Feb 10 2017, 6:32 AM

Feb 9 2017

avt77 added a comment to rL294128: [X86][SSE] Add target cpu specific reciprocal tests.

I'd like to update these tests again: see the attach. We should add FMA stuff.

Could you do it? Or could I simply commit the new version?

Feb 9 2017, 4:59 AM

Feb 7 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

I uploaded the test for review

Feb 7 2017, 3:01 AM
avt77 created D29627: Compile time decreasing in the case we're dealing with Machine Combiner.
Feb 7 2017, 3:00 AM

Feb 3 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Feb 3 2017, 3:03 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed the issue with the compile time increase - see the usage of MinInstr->getTrace(MBB). Now we get the trace only when we really need it. As a result the execution profile changed completely and the compile time is now even less than it was in DAG Combiner - about 1.5 s on my laptop (I'm speaking about our worst case test only).

Feb 3 2017, 3:02 AM

Jan 31 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 31 2017, 1:46 AM

Jan 27 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I updated the recip-fastmath2.ll test according to Simon's recommendations. Now it includes special checks for different CPUs: SandyBridge, Haswell and btver2. These new checks demonstrate that the alternative sequence of instructions is selected when it's really cheaper than the single fdiv instruction. (Obviously we should change the cost numbers for SandyBridge because they are too small.)

Jan 27 2017, 4:39 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I got the first profiling data. In fact it's the same that was described by Sanjay:

Jan 27 2017, 3:45 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I think the only issue that needs to be addressed is (finally!) sharing perf data. This has been raised at least 3 times. The possible compile-time implication, the speciality of the application (fast-math) etc are well understood.

Gerolf

Jan 27 2017, 1:07 AM

Jan 26 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Hi All,
I found a really "stress" test for div operations (see the attachment)

(tnx to Sanjay Patel). The test shows maybe the worst case of the possible degradation because of this patch. I used the following command with 2 different compilers:

Jan 26 2017, 12:27 AM

Jan 19 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Chandler, in fact this patch should not show any improvement in the generated code. If you look at the changes made in the tests you'll see that the newly generated code is almost identical to the previous one (only some names, the order of instructions, etc. differ). The idea of the patch is to move this kind of optimization from a rather high level (DAGCombiner) to a really low level (MachineCombiner). Here we see real target machine instructions and as a result we can use a real cost model to estimate the real cost of a possible transformation (in the given case the transformation is the replacement of one instruction (div) with a sequence of instructions). The transformation itself already exists inside Clang; the patch only suggests implementing it in another place. If we agree with this new place of implementation then it will be the base for possible future similar optimizations like rsqrt, etc. In addition, this patch (and follow-up ones) will allow us to remove 'fake' subtarget features like FeatureFastScalarFSQRT / FeatureFastVectorFSQRT, etc. The question from Gerolf was not about the quality of the generated code (it's the same as we have now) but only about the compilation time.
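
For readers unfamiliar with the transformation, here is a scalar sketch of replacing a division with a reciprocal estimate refined by Newton-Raphson steps, which is the usual shape of such a sequence. The initial estimate is passed in as a parameter because the hardware estimate instruction is not modelled here; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement step for 1/D: X1 = X0 * (2 - D * X0).
float refineReciprocal(float D, float Estimate) {
  return Estimate * (2.0f - D * Estimate);
}

// Approximates N / D as N * (1/D), refining a caller-supplied estimate.
float divideViaReciprocal(float N, float D, float InitialEstimate, int Steps) {
  assert(D != 0.0f && "division by zero is not handled in this sketch");
  float R = InitialEstimate;
  for (int I = 0; I < Steps; ++I)
    R = refineReciprocal(D, R);
  return N * R;
}
```

Each step costs two multiplies and a subtract, so whether the sequence beats a single divide depends on the target's latencies, which is why a real cost model matters for the decision.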

Jan 19 2017, 3:04 AM

Jan 18 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

What is "Eigen project"? Could you point me to it?

Jan 18 2017, 9:26 AM
avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 18 2017, 1:03 AM

Jan 16 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 16 2017, 3:22 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed everything except one comment (see below). And I collected new perf numbers. Now I used the following command for bootstrap building:

Jan 16 2017, 3:18 AM

Jan 9 2017

avt77 added inline comments to D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf.
Jan 9 2017, 1:35 AM

Dec 28 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed the last issues raised by Gerolf except one related to special case of "if" because the suggested change breaks the current logic.

Dec 28 2016, 9:35 AM

Dec 24 2016

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I made new experiments but now I use a dedicated computer for it:

Dec 24 2016, 6:35 AM

Dec 23 2016

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I'll let Simon decide but these numbers are iffy. I can't necessarily conclude your patch increases compile time but I can't conclude anything else either. In particular, the stock clang measurement have a variance of 20% between consecutive runs, so I have very little faith in the numbers collected.
Rafael recent('ish)ly published a set of suggestions/knob to turn on to get relatively stable numbers on a Linux machine. I'm also pretty sure the topic of how to get {reliable numbers, numbers you can have faith in} has been discussed multiple times (look at the archives, Sean has generally pretty informative posts/insights on the topic).

Dec 23 2016, 9:01 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Two more comments:

  1. I did not update the patch according to Gerolf's latest comments: I'll do it asap
  2. Gerolf asked: "Perhaps I missed it but I expected the optimization to kick in only under fast math. I saw 'fast' in the test cases, but didn't see a check in the code."
Dec 23 2016, 8:53 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Yes, I've just got the numbers. I created 2 versions of clang compiler: directly from trunk and with my patch applied. Then with help of these compilers I created 2 new compilers with the following configuration:

Dec 23 2016, 8:34 AM

Dec 22 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.
  1. The current trunk already has changes in Machine::print, etc. similar to those in my initial patch. Because of that I removed all corresponding changes and did not reply to the corresponding comments.
  2. It seems I fixed all other requirements from Gerolf
  3. But the main question is the same: should we continue with the effort?
Dec 22 2016, 8:12 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

What do you mean when you speak about "automation"? Do you mean a possibility to describe alternative sequences with tools like TableGen? If yes I'm afraid it'll require some real time to implement. That's why from my point of view the hand-made patterns similar to the given one could be really useful in future. But of course it'd be really interesting to launch such a project. Right?

Dec 22 2016, 4:38 AM

Dec 19 2016

avt77 updated the diff for D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf.

I restored the check if (ICS->hasNoNaNs()) but for non-vector operations only because vector instructions work correctly with invalid input values. To make it possible I changed the signature of llvm::getIntrinsicForCallSite. Now it knows the required intrinsic target.

Dec 19 2016, 3:46 AM