avt77 (Andrew V. Tischenko)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 11 2016, 3:46 AM (54 w, 1 d)

Recent Activity

Today

avt77 updated the diff for D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

I simply commented out the check because it hides the identifier case. No regression tests fail, so I hope this is acceptable.

Tue, Apr 25, 6:53 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I disabled the default constructor ParseStatementInfo::ParseStatementInfo() because it can't work without proper initialization.

Tue, Apr 25, 5:17 AM
avt77 abandoned D30572: Remove equal BBs from a function.
Tue, Apr 25, 4:26 AM
avt77 added inline comments to D32352: Go to eleven.
Tue, Apr 25, 3:11 AM
avt77 added a comment to D32352: Go to eleven.

It's already limited:

// An imul is usually smaller than the alternative sequence.
if (DAG.getMachineFunction().getFunction()->optForMinSize())

Ah, sorry I missed that. The fact that it is "MinSize" highlights that we're in a gray area for the DAG. That is, it's hard to know what the best sequence will be without looking at the instruction timing. Given that, we need to know if converting these muls is generally good. Do you have real or synthetic benchmark info for these cases? Is there a perf difference, for example, between Jaguar and Haswell (since those CPUs are specified in the tests)? Is the codegen ever different for those CPUs? If not, why are we adding different RUNs for them in this patch?

Tue, Apr 25, 3:08 AM
avt77 added a comment to D30572: Remove equal BBs from a function.

Hi All,
While reading the sources of the TailMerging pass, I discovered that it has a special switch, "tail-merge-size", that resolves the issue from the loop-search.ll test. The default value of the switch is 3, but if I change it to 2, everything works fine.
Because of that, I decided to abandon this review :-(
I'm going to investigate whether the default value can be changed. If that is not allowed for any reason (compile time, target-specific requirements, etc.), I'll implement a special hook in Target, as suggested in the sources.
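
The effect of the threshold can be sketched as follows. This is only a simplified model of the idea, not the pass itself: the function names and the instruction strings are hypothetical, and the real pass applies further heuristics beyond a plain length comparison.

```python
def common_tail_length(block_a, block_b):
    """Count the identical trailing instructions shared by two blocks."""
    n = 0
    for a, b in zip(reversed(block_a), reversed(block_b)):
        if a != b:
            break
        n += 1
    return n


def should_merge(block_a, block_b, min_tail_size=3):
    """Merge only if the common tail is at least min_tail_size long.

    min_tail_size plays the role of the "tail-merge-size" switch
    (default 3, per the comment above).
    """
    return common_tail_length(block_a, block_b) >= min_tail_size


# Hypothetical blocks that share a two-instruction tail.
bb1 = ["movl $1, %eax", "addl %ecx, %eax", "retq"]
bb2 = ["movl $2, %eax", "addl %ecx, %eax", "retq"]

print(common_tail_length(bb1, bb2))             # length of the shared tail
print(should_merge(bb1, bb2))                   # default threshold of 3
print(should_merge(bb1, bb2, min_tail_size=2))  # lowered threshold
```

With a shared tail of two instructions, the default threshold rejects the merge and a threshold of 2 accepts it, which matches the behaviour described for the test.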

Tue, Apr 25, 1:13 AM

Yesterday

avt77 added a comment to D32352: Go to eleven.

Is this or should this be limited when optimizing for size? I didn't count the instruction bytes...it might depend on the multiplier constant which version is smaller?

Mon, Apr 24, 8:05 AM
avt77 updated the diff for D32352: Go to eleven.

I implemented code reuse to support different constants. In addition, I slightly changed 2 tests to deal with latency/throughput numbers. BTW, it is not clear at the moment how to use those numbers for 32-bit. What CPU should we use?

Mon, Apr 24, 5:00 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

Test function "foo" was renamed as "PR31007" to show its origin.

Mon, Apr 24, 4:54 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I moved the inline-0bh.ll test to the test/CodeGen/X86 folder.

Mon, Apr 24, 2:40 AM

Sat, Apr 22

avt77 added inline comments to rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Sat, Apr 22, 12:42 AM

Fri, Apr 21

avt77 created D32352: Go to eleven.
Fri, Apr 21, 8:02 AM
avt77 added a comment to D32219: [X86][SSE] Improve DIV/SQRT throughput estimates for SB/HW schedule models.

What are your plans here? I've just checked (with the help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong compared with the numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step by step?

Fri, Apr 21, 2:51 AM

Thu, Apr 20

avt77 added inline comments to D32162: Inline asm 0bH conflict.
Thu, Apr 20, 1:41 AM

Wed, Apr 19

avt77 updated the diff for D32162: Inline asm 0bH conflict.

I added required comments and one additional tiny fix to cover PR27884: now it works properly. The corresponding regression test was added as well.

Wed, Apr 19, 5:50 AM
avt77 created D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Wed, Apr 19, 4:34 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Wed, Apr 19, 4:15 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Wed, Apr 19, 12:38 AM

Tue, Apr 18

avt77 added reviewers for D32162: Inline asm 0bH conflict: RKSimon, spatel, dtemirbulatov, zizhar.
Tue, Apr 18, 5:54 AM
avt77 created D32162: Inline asm 0bH conflict.
Tue, Apr 18, 5:50 AM

Fri, Apr 14

avt77 committed rL300314: Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG..
Fri, Apr 14, 2:30 AM
avt77 committed rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Fri, Apr 14, 12:57 AM

Wed, Apr 12

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I implemented all requirements from hfinkel.
Please review again.

Wed, Apr 12, 8:15 AM

Tue, Apr 11

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I fixed the latest requirements from RKSimon. Please, give me your feedback.

Tue, Apr 11, 12:23 AM

Fri, Apr 7

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I hope I have addressed all comments raised by RKSimon.
hfinkel, what do you think?

Fri, Apr 7, 9:54 AM
avt77 added inline comments to D30941: Better testing of schedule model instruction latencies/throughputs.
Fri, Apr 7, 9:51 AM

Tue, Apr 4

avt77 added a comment to D31668: Fix PR30562.

You should use the following command to generate diff:

Tue, Apr 4, 11:46 PM

Fri, Mar 31

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

According to Simon's requirements, I inserted the prefix "sched: " for scheduler comments and made "false" the default value of the -print-schedule option. As a result, I restored the original versions of all X86 tests except 2, which demonstrate the changes. Now we don't have any failing tests.

Fri, Mar 31, 3:14 AM

Thu, Mar 30

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

The problem with the failed tests arose because of the new comment lines added as a result of this patch. I was wrong when I said that FileCheck does not allow adding new comments at EOL.
I redesigned the patch to make it possible to add Latency:Throughput at the end of an existing comment (if any). As a result, I was forced to change the API of EmitInstruction in MCStreamer. I don't like this change because MCStreamer has a lot of subclasses, but it works perfectly and may be useful for other targets.
I regenerated 34 tests (with the help of update_llc_test_checks.py) and now we have only 16 failing tests: I'm going to fix them asap.
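
Why an end-of-line comment is harmless can be sketched with a simplified model of a check line. It assumes plain substring matching (FileCheck's default when full-line matching is not requested) and ignores regexes and variables. The function names and the exact text of the appended comment are illustrative assumptions.

```python
def check_matches(pattern, line):
    """Simplified default behaviour: the pattern may match anywhere in the line."""
    return pattern in line


def check_matches_full_line(pattern, line):
    """Stricter variant: the pattern must cover the whole (stripped) line."""
    return pattern == line.strip()


pattern = "vextractf128 $1, %ymm2, %xmm5"
plain = "\tvextractf128 $1, %ymm2, %xmm5"
# Illustrative annotation; the real format is whatever the patch emits.
annotated = plain + " # sched: [1:1.00]"

print(check_matches(pattern, plain))
print(check_matches(pattern, annotated))
print(check_matches_full_line(pattern, annotated))
```

Under substring matching the annotated line still satisfies the existing check, whereas an extra comment line between two instructions would break a check that expects the very next line.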

Thu, Mar 30, 10:01 AM

Mar 22 2017

avt77 added a comment to D30572: Remove equal BBs from a function.

Matthias,
Thank you for the fast reply.

Mar 22 2017, 7:58 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we get "Unexpected Failures: 530", and that's for X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

Mar 22 2017, 7:30 AM
avt77 added a reviewer for D30572: Remove equal BBs from a function: MatzeB.
Mar 22 2017, 7:02 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we get "Unexpected Failures: 530", and that's for X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

Mar 22 2017, 6:24 AM

Mar 21 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

Hal,
I removed the special option (-print-schedule) and tried to run check-all. The result was very unpleasant but predictable:

Mar 21 2017, 5:53 AM

Mar 17 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes, I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option to something like "-print-schedule"?

It is not clear to me why you won't just always do this when in verbose-asm mode. Thoughts on not having a separate option at all?

Mar 17 2017, 9:41 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

Throughput calculation is implemented.

Mar 17 2017, 9:37 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes, I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option to something like "-print-schedule"?

Mar 17 2017, 1:33 AM

Mar 16 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

The implementation was moved to target independent area and all Hal's comments were applied. I did not do anything with Throughput: it will be done in the patch.

Mar 16 2017, 5:52 AM

Mar 15 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, could you help me? First of all could you give me a link(s) to any doc(s) related to our MCSchedModel except sources?

Next, I was told that ResourceCycles here:

class ProcWriteResources<list<ProcResourceKind> resources> {
  list<ProcResourceKind> ProcResources = resources;
  list<int> ResourceCycles = [];
  int Latency = 1;
  int NumMicroOps = 1;

could be used as Throughput of the given instruction. Is it right? Does it mean I could include it in generated comment as well? If YES I suppose it should be the max of the Cycles, right?
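
The "max of the Cycles" idea from the question can be sketched like this. It only illustrates the proposal as asked, not how LLVM actually derives throughput: it assumes one unit per resource and ignores everything except the per-resource cycle counts. The function name and the sample numbers are hypothetical.

```python
def bottleneck_cycles(resource_cycles):
    """Return the largest per-resource cycle count, or None for an empty list.

    Assumption: the busiest resource limits how often the instruction
    can issue, so its cycle count serves as a reciprocal-throughput estimate.
    """
    if not resource_cycles:
        return None
    return max(resource_cycles)


# Hypothetical instruction occupying two resources for 1 and 14 cycles.
print(bottleneck_cycles([1, 14]))
# An empty ResourceCycles list (the default above) gives no estimate.
print(bottleneck_cycles([]))
```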

Mar 15 2017, 3:17 AM

Mar 14 2017

avt77 created D30941: Better testing of schedule model instruction latencies/throughputs.
Mar 14 2017, 7:51 AM

Mar 13 2017

avt77 updated the diff for D30572: Remove equal BBs from a function.

I have now found and fixed the issues raised during bootstrap. The bootstrap works perfectly now. I also got some numbers: on huge files such as clang, llc, etc., the file size decreases by about 0.1%. At the moment I'm using the most conservative approach possible, but the collected numbers show that we can certainly get better results.

Mar 13 2017, 7:57 AM

Mar 10 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Mar 10 2017, 12:09 AM

Mar 7 2017

avt77 updated the diff for D30572: Remove equal BBs from a function.

I implemented the requirements raised by Davide and fixed an issue with bootstrap: now it works properly. Next steps: I'm going to collect bootstrap statistics such as the number of removed BBs and the instructions inside those BBs, the total size change, and compile time. When that's done, I'm going to extend the transformation to other kinds of BBs (e.g. parts of EH, -O1 (size) optimizations) and/or terminators.

Mar 7 2017, 5:54 AM

Mar 3 2017

avt77 created D30572: Remove equal BBs from a function.
Mar 3 2017, 6:41 AM

Mar 2 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I committed PIC-related test in trunk and updated this patch to be able to compare it with new code generation.

Mar 2 2017, 6:53 AM
avt77 committed rL296746: Added special test covering a problem with PIC relocation model on SLM….
Mar 2 2017, 5:59 AM

Feb 21 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

Guy Blank found a problem with PIC relocation model on SLM architecture. I fixed it and added the corresponding test.

Feb 21 2017, 10:29 AM

Feb 14 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Hi All,
Do we expect anything more here?
It seems I have addressed all requirements. Maybe it's time for LGTM?

Feb 14 2017, 4:20 AM

Feb 13 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

Committed revision 294936

Feb 13 2017, 1:55 AM
avt77 committed rL294936: Compile time decreasing in the case we're dealing with Machine Combiner. .
Feb 13 2017, 1:55 AM

Feb 10 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

Just for your info: I collected the perf numbers.

Feb 10 2017, 10:05 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed all known issues:

  • AVX512 is now again supported by DAGCombiner
  • FMA instructions are being used when FMA is enabled

This version clearly shows the advantage of using the sched model: it selects reciprocal code only when it's profitable (e.g. compare v8f32_one_step and v8f32_one_step_2_divs, etc.)

Feb 10 2017, 8:31 AM
avt77 added a comment to rL294128: [X86][SSE] Add target cpu specific reciprocal tests.

Could you add some new tests like these:

Feb 10 2017, 6:32 AM

Feb 9 2017

avt77 added a comment to rL294128: [X86][SSE] Add target cpu specific reciprocal tests.

I'd like to update these tests again: see the attach. We should add FMA stuff.

Could you do it? Or Could I simply commit the new version?

Feb 9 2017, 4:59 AM

Feb 7 2017

avt77 added a comment to D29627: Compile time decreasing in the case we're dealing with Machine Combiner.

I uploaded the test for review

Feb 7 2017, 3:01 AM
avt77 created D29627: Compile time decreasing in the case we're dealing with Machine Combiner.
Feb 7 2017, 3:00 AM

Feb 3 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Feb 3 2017, 3:03 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed the issue with the compile time increase: see the usage of MinInstr->getTrace(MBB). Now we get the trace only when we really need it. As a result, the execution profile changed completely and the compile time is now even lower than it was with DAG Combiner: about 1.5 s on my laptop (I'm speaking about our worst-case test only).

Feb 3 2017, 3:02 AM

Jan 31 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 31 2017, 1:46 AM

Jan 27 2017

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I updated the recip-fastmath2.ll test according to Simon's recommendations. Now it includes special checks for different CPUs: SandyBridge, Haswell and btver2. These new checks demonstrate that the alternative sequence of instructions is selected when it's really cheaper than the single fdiv instruction. (Obviously we should change the cost numbers for SandyBridge because they are too small.)

Jan 27 2017, 4:39 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I got the first profiling data. In fact it's the same that was described by Sanjay:

Jan 27 2017, 3:45 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I think the only issue that needs to be addressed is (finally!) sharing perf data. This has been raised at least 3 times. The possible compile-time implication, the speciality of the application (fast-math) etc are well understood.

Gerolf

Jan 27 2017, 1:07 AM

Jan 26 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Hi All,
I found a really "stress" test for div operations (see the attachment)

(tnx to Sanjay Patel). The test shows maybe the worst case of the possible degradation because of this patch. I used the following command with 2 different compilers:

Jan 26 2017, 12:27 AM

Jan 19 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Chandler, in fact this patch should not show any improvement in the generated code. If you look at the changes made in the tests, you'll see that the newly generated code is almost identical to the previous one (only some names, the order of instructions, etc. differ). The idea of the patch is to move this kind of optimization from a rather high level (DAGCombiner) to a really low level (MachineCombiner). There we see real target machine instructions and, as a result, we can use a real cost model to estimate the real cost of a possible transformation (in this case the transformation is the replacement of one instruction (div) with a sequence of instructions). The transformation itself already exists inside Clang; the patch suggests implementing it in another place, and that's it. If we agree on this new place of implementation, it will be the base for possible future similar optimizations like rsqrt, etc. In addition, this patch (and its follow-ups) will allow us to remove 'fake' subtarget features like FeatureFastScalarFSQRT / FeatureFastVectorFSQRT, etc. The question from Gerolf was not about the quality of the generated code (it's the same as what we have now) but only about the compilation time.
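
The cost-model decision described here can be sketched as a comparison between the single instruction and its replacement sequence. This is a rough illustration only: the latency tables are invented, the opcode names are placeholders, and summing latencies assumes the replacement forms one serial dependency chain, which a real scheduling model would not assume.

```python
def pick_cheaper(original, replacement, latency):
    """Return whichever instruction sequence has the lower summed latency."""
    original_cost = sum(latency[op] for op in original)
    replacement_cost = sum(latency[op] for op in replacement)
    return replacement if replacement_cost < original_cost else original


# Placeholder opcodes: one division versus a reciprocal estimate,
# one refinement step (mul, sub, mul), and a final multiply.
division = ["div"]
reciprocal_sequence = ["rcp", "mul", "sub", "mul", "mul"]

# Hypothetical per-CPU latencies, not taken from any real model.
slow_div_cpu = {"div": 30, "rcp": 5, "mul": 5, "sub": 3}
fast_div_cpu = {"div": 10, "rcp": 5, "mul": 5, "sub": 3}

print(pick_cheaper(division, reciprocal_sequence, slow_div_cpu))
print(pick_cheaper(division, reciprocal_sequence, fast_div_cpu))
```

With these invented numbers the sequence costs 23 cycles, so it replaces the division only on the CPU where the division is slower than that.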

Jan 19 2017, 3:04 AM

Jan 18 2017

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

What is "Eigen project"? Could you point me to it?

Jan 18 2017, 9:26 AM
avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 18 2017, 1:03 AM

Jan 16 2017

avt77 added inline comments to D26855: New unsafe-fp-math implementation for X86 target.
Jan 16 2017, 3:22 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed everything except one comment (see below). And I collected new perf numbers. Now I used the following command for bootstrap building:

Jan 16 2017, 3:18 AM

Jan 9 2017

avt77 added inline comments to D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf.
Jan 9 2017, 1:35 AM

Dec 28 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed the last issues raised by Gerolf except one related to special case of "if" because the suggested change breaks the current logic.

Dec 28 2016, 9:35 AM

Dec 24 2016

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I ran new experiments, this time on a dedicated computer:

Dec 24 2016, 6:35 AM

Dec 23 2016

avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

I'll let Simon decide but these numbers are iffy. I can't necessarily conclude your patch increases compile time but I can't conclude anything else either. In particular, the stock clang measurement have a variance of 20% between consecutive runs, so I have very little faith in the numbers collected.
Rafael recent('ish)ly published a set of suggestions/knob to turn on to get relatively stable numbers on a Linux machine. I'm also pretty sure the topic of how to get {reliable numbers, numbers you can have faith in} has been discussed multiple times (look at the archives, Sean has generally pretty informative posts/insights on the topic).

Dec 23 2016, 9:01 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Two more comments:

  1. I did not update the patch according to Gerolf's latest comments: I'll do it asap
  2. Gerolf asked: "Perhaps I missed it but I expected the optimization to kick in only under fast math. I saw 'fast' in the test cases, but didn't see a check in the code."
Dec 23 2016, 8:53 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

Yes, I've just got the numbers. I created 2 versions of clang compiler: directly from trunk and with my patch applied. Then with help of these compilers I created 2 new compilers with the following configuration:

Dec 23 2016, 8:34 AM

Dec 22 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.
  1. The current trunk already has changes in Machine::print, etc. similar to those in my initial patch. Because of that, I removed all the corresponding changes and did not reply to the corresponding comments.
  2. It seems I fixed all other requirements from Gerolf
  3. But the main question is the same: should we continue with the effort?
Dec 22 2016, 8:12 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

What do you mean when you speak about "automation"? Do you mean a possibility to describe alternative sequences with tools like TableGen? If yes I'm afraid it'll require some real time to implement. That's why from my point of view the hand-made patterns similar to the given one could be really useful in future. But of course it'd be really interesting to launch such a project. Right?

Dec 22 2016, 4:38 AM

Dec 19 2016

avt77 updated the diff for D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf.

I restored the check if (ICS->hasNoNaNs()) but for non-vector operations only because vector instructions work correctly with invalid input values. To make it possible I changed the signature of llvm::getIntrinsicForCallSite. Now it knows the required intrinsic target.

Dec 19 2016, 3:46 AM

Dec 16 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I've updated the reciprocal-related tests to show the difference between the old and new code gen more clearly. In fact there is no real difference, but the new approach allows us to take the schedule cost model into account when we deal with different machine code patterns. This patch should become the first step toward similar future optimizations like rsqrt, etc.

Dec 16 2016, 4:38 AM
avt77 committed rL289931: Extra coverage tests to demonstrate fixes in D72618 and D26855.
Dec 16 2016, 2:06 AM

Dec 9 2016

avt77 added reviewers for D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf: RKSimon, spatel, ABataev.
Dec 9 2016, 7:05 AM
avt77 retitled D27618: Failure to vectorize __builtin_sqrt/__builtin_sqrtf from to Failure to vectorize __builtin_sqrt/__builtin_sqrtf.
Dec 9 2016, 7:04 AM

Dec 7 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed all requirements raised by Simon and Alexey. The special test is here as well.

Dec 7 2016, 5:41 AM

Dec 6 2016

avt77 added reviewers for D26855: New unsafe-fp-math implementation for X86 target: hfinkel, Gerolf, dsanders.
Dec 6 2016, 3:01 AM
avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I updated the test related to reciprocal code gen. Now it shows most of possible variants of dividend (not only 1.0 as it was before). Now the patch is ready for final review.

Dec 6 2016, 2:58 AM

Dec 2 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

All failed assertions were resolved. The ability to use old reciprocal implementation for non-x86 platforms was restored. Now we have only one failed test CodeGen/X86/recip-fastmath.ll but that's exactly what we were expecting because I changed the code generation here. I hope that's 99.99% of the patch - please review.

Dec 2 2016, 6:53 AM

Dec 1 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

I fixed one assertion failure (but there is another one at the moment), switched to range-based for loops, reformatted the code and made some cosmetic changes.

Dec 1 2016, 5:53 AM

Nov 30 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

The new version for next discussions

Nov 30 2016, 8:33 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

In fact if you try something like

Nov 30 2016, 4:05 AM
avt77 added a comment to D26855: New unsafe-fp-math implementation for X86 target.

The new reciprocal implementation is more than 97% done: we still don't have public tests, but the code changes are almost complete. Please review and send me your comments.

I applied the patch to trunk, but every example that I tried after that crashed:
Assertion failed: (i < getNumOperands() && "getOperand() out of range!"), function getOperand, file llvm/include/llvm/CodeGen/MachineInstr.h, line 280.

Is it correct that this patch will allow us to remove 'fake' subtarget features like FeatureFastScalarFSQRT / FeatureFastVectorFSQRT ?

Nov 30 2016, 12:38 AM

Nov 28 2016

avt77 updated the diff for D26855: New unsafe-fp-math implementation for X86 target.

The new reciprocal implementation is more than 97% done: we still don't have public tests, but the code changes are almost complete. Please review and send me your comments.

Nov 28 2016, 9:04 AM

Nov 18 2016

avt77 retitled D26855: New unsafe-fp-math implementation for X86 target from to New unsafe-fp-math implementation for X86 target.
Nov 18 2016, 7:24 AM

Nov 2 2016

avt77 added a comment to D24760: Failure to hoist constant out of loop.

Looks like the hoisting issue described in PR27136 is already fixed at head? My guess is that the fix is from https://reviews.llvm.org/rL284757

Nov 2 2016, 3:03 AM

Oct 31 2016

avt77 updated the diff for D25722: Improved cost model for FDIV and FSQRT.

FSQRT changes: SSE1 Cost table updated with Pentium III numbers; SSE42 cost table added with Nehalem numbers

Oct 31 2016, 2:35 AM

Oct 29 2016

avt77 updated the diff for D25722: Improved cost model for FDIV and FSQRT.

Haswell numbers added for AVX2

Oct 29 2016, 3:40 AM

Oct 27 2016

avt77 updated the diff for D25722: Improved cost model for FDIV and FSQRT.

The wrong SNB numbers were fixed (tnx to Simon Pilgrim)

Oct 27 2016, 8:10 AM
avt77 updated the diff for D25722: Improved cost model for FDIV and FSQRT.

All numbers from IACA were replaced with Agner's numbers

Oct 27 2016, 6:37 AM
avt77 added inline comments to D25722: Improved cost model for FDIV and FSQRT.
Oct 27 2016, 6:35 AM
avt77 added a comment to D24760: Failure to hoist constant out of loop.

I don't see any changes to ifcvt-rescan-diamonds.ll

Oct 27 2016, 1:04 AM

Oct 26 2016

avt77 updated the diff for D24760: Failure to hoist constant out of loop.

I've fixed all failing tests. Test owners, please review my changes: they are rather small, so it should not take much of your time.

Oct 26 2016, 7:35 AM
avt77 added a comment to D25722: Improved cost model for FDIV and FSQRT.

If I understood correctly, I should replace all IACA numbers with Agner's numbers, right? OK, I'll do it.
JFYI, I haven't worked at Intel since July, but of course I know a lot of people at Intel and I'll try to ask them about IACA's future.

Oct 26 2016, 12:43 AM