avt77 (Andrew V. Tischenko)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 11 2016, 3:46 AM (66 w, 5 d)

Recent Activity

Yesterday

avt77 updated the diff for D35115: X86 Asm should produce error messages instead of assertions when it's possible.

This patch replaces assertions with normal diagnostics where possible. Now we see normal error messages instead of compiler crashes. It should close PR33861, PR33862 and PR33661. In addition, the patch prepares tests for D35621.

Fri, Jul 21, 8:13 AM
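The general idea of reporting an error instead of asserting can be sketched as below. This is only an illustration, not the code from D35115: `EvalResult` and `applyBinaryOp` are hypothetical names, and the operand stack is a plain `std::vector` rather than the parser's real data structure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: either a value or a human-readable error.
struct EvalResult {
  bool Ok;
  long Value;
  std::string Error;
};

// Applies a binary '+' or '-' to the top two entries of an operand stack.
// Instead of asserting that two operands exist, it returns an error that
// the caller can turn into a normal diagnostic.
EvalResult applyBinaryOp(std::vector<long> &OperandStack, char Op) {
  if (Op != '+' && Op != '-')
    return {false, 0, "unknown operator"};
  if (OperandStack.size() < 2)
    return {false, 0, "too few operands"};
  long RHS = OperandStack.back();
  OperandStack.pop_back();
  long LHS = OperandStack.back();
  OperandStack.pop_back();
  return {true, Op == '+' ? LHS + RHS : LHS - RHS, ""};
}
```

With this shape, malformed input produces a message for the user, while an `assert` would abort a debug build and be compiled out of a release build.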

Thu, Jul 20

avt77 updated the diff for D35621: X86 Asm can't work properly with symbolic Scale.

I updated 2 tests in trunk to demonstrate the difference in code generation after applying this patch. Now we can clearly see the issues and how this patch resolves them.

Thu, Jul 20, 7:46 AM
avt77 committed rL308609: This patch added some test cases to demonsrate the issues described in Bug….
Thu, Jul 20, 5:46 AM

Wed, Jul 19

avt77 created D35621: X86 Asm can't work properly with symbolic Scale.
Wed, Jul 19, 7:22 AM

Fri, Jul 14

avt77 added inline comments to D35115: X86 Asm should produce error messages instead of assertions when it's possible.
Fri, Jul 14, 1:47 AM
avt77 updated the diff for D35202: [X86] lea rdx, [rax - one] adds one instead of subtracts when one is a symbol that has been .set (PR33667).

I fixed Simon's comments.

Fri, Jul 14, 1:43 AM

Thu, Jul 13

avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

I merged this patch with trunk. Now it's part 2 of the initial patch.

Thu, Jul 13, 11:57 PM

Tue, Jul 11

avt77 added inline comments to D35115: X86 Asm should produce error messages instead of assertions when it's possible.
Tue, Jul 11, 8:21 AM
avt77 added inline comments to D35115: X86 Asm should produce error messages instead of assertions when it's possible.
Tue, Jul 11, 3:43 AM

Mon, Jul 10

avt77 committed rL307552: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1….
Mon, Jul 10, 9:36 AM
avt77 closed D35198: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 by committing rL307552: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1….
Mon, Jul 10, 9:36 AM
avt77 created D35202: [X86] lea rdx, [rax - one] adds one instead of subtracts when one is a symbol that has been .set (PR33667).
Mon, Jul 10, 8:37 AM
avt77 created D35198: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler Part-1 .
Mon, Jul 10, 6:07 AM

Fri, Jul 7

avt77 updated the diff for D35115: X86 Asm should produce error messages instead of assertions when it's possible.

I removed NFCs from this patch.

Fri, Jul 7, 7:55 AM
avt77 committed rL307397: NFC: I simply added CHECK-LABEL to prevent false matches in the tests..
Fri, Jul 7, 6:42 AM
avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

Simon, thank you for all these catches: I fixed them.

Fri, Jul 7, 4:00 AM
avt77 created D35115: X86 Asm should produce error messages instead of assertions when it's possible.
Fri, Jul 7, 3:20 AM

Thu, Jul 6

avt77 retitled D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573) from AMD Jaguar scheduler doesn't correctly model 256-bit AVX instructions to [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).
Thu, Jul 6, 7:24 AM
avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

We have now only 256-bit ops: it makes the patch smaller.

Thu, Jul 6, 2:59 AM

Jun 13 2017

avt77 updated the diff for D34056: Tail merge size.

The comments were fixed.
New reviewers added.

Jun 13 2017, 9:26 AM

Jun 9 2017

avt77 created D34056: Tail merge size.
Jun 9 2017, 9:01 AM

Jun 8 2017

avt77 committed rL304986: Add scheduler classes to integer/float horizontal operations..
Jun 8 2017, 9:44 AM
avt77 closed D33203: Add scheduler classes to integer/float horizontal operations by committing rL304986: Add scheduler classes to integer/float horizontal operations..
Jun 8 2017, 9:44 AM
avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

All notes from Simon were resolved. In addition I fixed numbers for some XMM versions of VMOVxxxx instructions.

Jun 8 2017, 7:22 AM
avt77 added inline comments to D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).
Jun 8 2017, 4:41 AM
avt77 updated the diff for D33203: Add scheduler classes to integer/float horizontal operations.

Some renames and other minor updates.

Jun 8 2017, 3:48 AM
avt77 committed rL304972: This patch closes PR28513: an optimization of multiplication by different….
Jun 8 2017, 3:20 AM

Jun 7 2017

avt77 created D33991: Improved throughput calculation.
Jun 7 2017, 6:55 AM
avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

I removed all changes related to throughput calculations. And I made all updates suggested by Simon.

Jun 7 2017, 5:50 AM
avt77 updated the diff for D33203: Add scheduler classes to integer/float horizontal operations.

I re-implemented all things mentioned by Simon and Gadi.

Jun 7 2017, 2:46 AM

Jun 6 2017

avt77 added a comment to D33897: [X86][SandyBridge,Haswell] Updating information on each instruction in HSW and SNB about latency, number of uOps and used ports.

Why don't you use regular expressions instead of simple list of instructions?

Jun 6 2017, 6:36 AM

Jun 1 2017

avt77 updated the diff for D33203: Add scheduler classes to integer/float horizontal operations.

I removed default values defined for SB model (required by javed.absar).

Jun 1 2017, 9:14 AM
avt77 updated the diff for D33203: Add scheduler classes to integer/float horizontal operations.

I fixed all Simon's notes.

Jun 1 2017, 9:01 AM
avt77 added a comment to D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

I really don't understand why you are having to change the throughput calculation as part of this - split this as another patch?

Jun 1 2017, 5:53 AM
avt77 added inline comments to D33203: Add scheduler classes to integer/float horizontal operations.
Jun 1 2017, 5:47 AM
avt77 added inline comments to D33203: Add scheduler classes to integer/float horizontal operations.
Jun 1 2017, 5:43 AM

May 30 2017

avt77 closed D32352: Go to eleven.

Committed revision 304209

May 30 2017, 6:01 AM
avt77 committed rL304209: This patch closes PR28513: an optimization of multiplication by different….
May 30 2017, 6:01 AM
avt77 updated the diff for D33203: Add scheduler classes to integer/float horizontal operations.

I redesigned the implementation according to Simon's requirements. Now it's done in a general way, and every X86 model should support modeling of horizontal operations. I did not check the numbers for SB and SLM: I simply kept the current ones. I also separated the Ymm version from the Xmm version to be able to model the corresponding throughput difference for Jaguar.

May 30 2017, 3:42 AM

May 29 2017

avt77 closed D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

The revision was committed.

May 29 2017, 7:29 AM

May 26 2017

avt77 committed rL303985: The fix for PR22004: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too….
May 26 2017, 6:23 AM

May 25 2017

avt77 updated the diff for D32352: Go to eleven.

Hi All,
I merged with trunk and launched "check-all": everything works without any issue.
Craig, Zvi - could you give me LGTM?

May 25 2017, 8:17 AM
avt77 updated the diff for D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

I restored the condition like here:

May 25 2017, 7:29 AM

May 24 2017

avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
May 24 2017, 11:58 PM
avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

I've fixed all issues raised by Simon. In addition I re-checked all numbers: it seems they are correct now.

May 24 2017, 6:32 AM

May 17 2017

avt77 added inline comments to D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).
May 17 2017, 8:47 AM

May 15 2017

avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

I slightly changed the throughput calculation algorithm: if the instruction scheduling model does not have cycles for the given instruction but the instruction is valid, then throughput is equal to latency.

May 15 2017, 11:52 PM
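The fallback rule described in this update can be sketched as follows. This is a simplified illustration rather than the patch's code: the `SchedEntry` struct and its fields are hypothetical, and the non-fallback branch (resource cycles divided by the number of units) is an assumption about how a throughput figure might otherwise be derived.

```cpp
#include <cassert>

// Hypothetical, simplified view of one scheduling-model entry.
struct SchedEntry {
  bool Valid;              // the model describes this instruction
  unsigned Latency;        // latency in cycles
  unsigned ResourceCycles; // cycles on the busiest resource, 0 if not given
  unsigned NumUnits;       // number of units of that resource
};

// Returns a throughput estimate in cycles per instruction,
// or -1.0 when the entry is not valid.
double estimateThroughput(const SchedEntry &E) {
  if (!E.Valid)
    return -1.0;
  // Fallback from the update: no cycle data, so use the latency.
  if (E.ResourceCycles == 0 || E.NumUnits == 0)
    return static_cast<double>(E.Latency);
  // Assumed normal case: resource cycles spread over the available units.
  return static_cast<double>(E.ResourceCycles) / E.NumUnits;
}
```

For example, a valid entry with latency 3 and no cycle data yields 3.0, while an entry with 1 resource cycle on 2 units yields 0.5.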
avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
May 15 2017, 10:13 AM
avt77 created D33203: Add scheduler classes to integer/float horizontal operations.
May 15 2017, 9:23 AM
avt77 added inline comments to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
May 15 2017, 3:28 AM

May 12 2017

avt77 updated the diff for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).

It seems I fixed all known issues except proper support of vzeroupper and vzeroall: will try to do it in the next patch.

May 12 2017, 5:02 AM

May 11 2017

avt77 added reviewers for D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573): RKSimon, spatel, dtemirbulatov.
May 11 2017, 6:29 AM
avt77 created D33099: [X86] Model 256-bit AVX instructions in the AMD Jaguar scheduler (PR28573).
May 11 2017, 6:08 AM

Apr 27 2017

avt77 committed rL301529: 2 tests that were lost in rL301390.
Apr 27 2017, 3:33 AM

Apr 26 2017

avt77 updated the diff for D32352: Go to eleven.

Now we have 3 different versions of test_mul_spec.

Apr 26 2017, 10:03 AM
avt77 updated the diff for D32352: Go to eleven.

I removed redundant local variables.

Apr 26 2017, 7:44 AM
avt77 committed rL301390: PR31007 and PR27884 will be closed: a possibility to compile constants like 0bH….
Apr 26 2017, 3:10 AM
avt77 updated the diff for D32352: Go to eleven.

The issues with break-return fixed.

Apr 26 2017, 3:05 AM
avt77 updated the diff for D32352: Go to eleven.

Lambdas refactoring: in fact I tried to do it from the very beginning, but I got the error message:

/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp: In function ‘llvm::SDValue combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)’:
/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30955:3: error: conversion from ‘combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)::<lambda(int, int, bool)>’ to non-scalar type ‘llvm::SDValue’ requested

};
^

/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30973:55: error: no match for call to ‘(llvm::SDValue) (int, int, bool)’

Result = combineMulShlAddOrSub(5, 1, /*isAdd*/true);
Apr 26 2017, 2:33 AM
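Both errors above are what GCC reports when a lambda is stored in a variable declared with a concrete type (here `llvm::SDValue`) instead of `auto`. The sketch below shows the `auto` form on plain integers. It is an illustration only: `mulSpecial` is a hypothetical stand-in that does not build SelectionDAG nodes, and the reading of `combineMulShlAddOrSub(Mult, Shift, isAdd)` as `((x * Mult) << Shift) +/- x` is an assumption based on the call shown in the error output.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical integer stand-in for the DAG combine.
uint64_t mulSpecial(uint64_t X, uint64_t MulAmt) {
  // 'auto' is required: a lambda has its own unnamed closure type and
  // cannot initialize a variable declared with the lambda's return type.
  auto combineMulShlAddOrSub = [&](int Mult, int Shift,
                                   bool isAdd) -> uint64_t {
    uint64_t Result = (X * static_cast<uint64_t>(Mult)) << Shift;
    return isAdd ? Result + X : Result - X;
  };

  switch (MulAmt) {
  case 11: // ((x * 5) << 1) + x
    return combineMulShlAddOrSub(5, 1, /*isAdd*/ true);
  case 23: // ((x * 3) << 3) - x, an illustrative subtract case
    return combineMulShlAddOrSub(3, 3, /*isAdd*/ false);
  default: // anything else keeps the plain multiply
    return X * MulAmt;
  }
}
```

On x86 the small multiplies by 3, 5 or 9 map naturally onto a single LEA, which is why decompositions of this shape are attractive.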

Apr 25 2017

avt77 added a comment to D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

The initial problem was described in PR22004. The problem arises because this check does not allow identifiers that start with a dot. But such IDs are legal in asm, e.g. as local labels. As a result we have (in this exact case) a binary expression without an RHS operand. That means "too few operands", as the assertion says. If I comment out the check then everything works without any problems, including all regression tests. Could I remove the check from the code?

Apr 25 2017, 9:47 AM
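To make the point about dot-prefixed identifiers concrete, here is a minimal sketch of an identifier check that accepts a leading dot. It is not the X86AsmParser code: the function names are hypothetical, and the accepted character set (letters, digits, `_`, `.`, `$`) is an assumption modeled on common assembler symbol rules.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// A symbol may start with a letter, '_' or '.', so local labels such as
// ".Ltmp0" are accepted.
bool isIdentStart(char C) {
  return std::isalpha(static_cast<unsigned char>(C)) || C == '_' || C == '.';
}

// Later characters may additionally be digits or '$'.
bool isIdentBody(char C) {
  return isIdentStart(C) || std::isdigit(static_cast<unsigned char>(C)) ||
         C == '$';
}

bool isAsmIdentifier(const std::string &S) {
  if (S.empty() || !isIdentStart(S[0]))
    return false;
  for (std::string::size_type I = 1; I < S.size(); ++I)
    if (!isIdentBody(S[I]))
      return false;
  return true;
}
```

A parser that rejects the leading dot would drop the right-hand operand of an expression like `a + .Ltmp0`, leaving the binary operator with a single operand.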
avt77 updated the diff for D32352: Go to eleven.

The issue with MulConstantOptimization was fixed.

Apr 25 2017, 9:33 AM
avt77 added inline comments to D32352: Go to eleven.
Apr 25 2017, 9:22 AM
avt77 updated the diff for D32352: Go to eleven.

I implemented the requests from Zvi, Isaba and RKSimon. Please, review again.

Apr 25 2017, 7:57 AM
avt77 updated the diff for D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".

I simply commented out the check because it hides the identifier case. No regression tests fail, which is why I hope it's acceptable.

Apr 25 2017, 6:53 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I inhibited the default constructor ParseStatementInfo::ParseStatementInfo() because it can't work without proper initialization.

Apr 25 2017, 5:17 AM
avt77 abandoned D30572: Remove equal BBs from a function.
Apr 25 2017, 4:26 AM
avt77 added inline comments to D32352: Go to eleven.
Apr 25 2017, 3:11 AM
avt77 added a comment to D32352: Go to eleven.

It's already limited:

// An imul is usually smaller than the alternative sequence.
if (DAG.getMachineFunction().getFunction()->optForMinSize())

Ah, sorry I missed that. The fact that it is "MinSize" highlights that we're in a gray area for the DAG. That is, it's hard to know what the best sequence will be without looking at the instruction timing. Given that, we need to know if converting these muls is generally good. Do you have real or synthetic benchmark info for these cases? Is there a perf difference, for example, between Jaguar and Haswell (since those CPUs are specified in the tests)? Is the codegen ever different for those CPUs? If not, why are we adding different RUNs for them in this patch?

Apr 25 2017, 3:08 AM
avt77 added a comment to D30572: Remove equal BBs from a function.

Hi All,
Reading the sources of the TailMerging pass, I discovered that it has a special switch, "tail-merge-size", that resolves the issue from the loop-search.ll test. The default value of the switch is 3, but if I change it to 2 then everything works fine.
Because of that I decided to abandon this review :-(
I'm going to investigate the possibility of changing the default value. If that is not allowed for any reason (compile time, target-specific requirements, etc.), I'll implement a special hook in Target, as suggested in the sources.

Apr 25 2017, 1:13 AM
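The effect of the threshold can be illustrated with a small sketch. It assumes the switch acts as a minimum length for the common tail of two blocks; the function names are hypothetical, instructions are modeled as plain strings, and the real pass applies further heuristics not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> Block;

// Counts how many trailing instructions two blocks have in common.
std::size_t commonTailLength(const Block &A, const Block &B) {
  std::size_t N = 0;
  while (N < A.size() && N < B.size() &&
         A[A.size() - 1 - N] == B[B.size() - 1 - N])
    ++N;
  return N;
}

// A pair of blocks is a merge candidate only if the shared tail reaches
// the threshold (3 by default according to the comment above).
bool worthTailMerging(const Block &A, const Block &B,
                      std::size_t MinCommonTail) {
  return commonTailLength(A, B) >= MinCommonTail;
}
```

Two blocks that share only their last two instructions are skipped with a threshold of 3 but merged with a threshold of 2, which matches the behavior described in the comment.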

Apr 24 2017

avt77 added a comment to D32352: Go to eleven.

Is this or should this be limited when optimizing for size? I didn't count the instruction bytes...it might depend on the multiplier constant which version is smaller?

Apr 24 2017, 8:05 AM
avt77 updated the diff for D32352: Go to eleven.

I implemented code reuse to support different constants. In addition, I slightly changed 2 tests to deal with latency/throughput numbers. BTW, it is not clear at the moment how to use those numbers for 32-bit. What CPU should we use?

Apr 24 2017, 5:00 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

Test function "foo" was renamed to "PR31007" to show its origin.

Apr 24 2017, 4:54 AM
avt77 updated the diff for D32162: Inline asm 0bH conflict.

I moved the inline-0bh.ll test to the test/CodeGen/X86 folder.

Apr 24 2017, 2:40 AM

Apr 22 2017

avt77 added inline comments to rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Apr 22 2017, 12:42 AM

Apr 21 2017

avt77 created D32352: Go to eleven.
Apr 21 2017, 8:02 AM
avt77 added a comment to D32219: [X86][SSE] Improve DIV/SQRT throughput estimates for SB/HW schedule models.

What are your plans here? I've just checked (with help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong if we compare with numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step-by-step?

Apr 21 2017, 2:51 AM

Apr 20 2017

avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 20 2017, 1:41 AM

Apr 19 2017

avt77 updated the diff for D32162: Inline asm 0bH conflict.

I added required comments and one additional tiny fix to cover PR27884: now it works properly. The corresponding regression test was added as well.

Apr 19 2017, 5:50 AM
avt77 created D32218: X86AsmParser.cpp asserts: OperandStack.size() > 1 && "Too few operands.".
Apr 19 2017, 4:34 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 19 2017, 4:15 AM
avt77 added inline comments to D32162: Inline asm 0bH conflict.
Apr 19 2017, 12:38 AM

Apr 18 2017

avt77 added reviewers for D32162: Inline asm 0bH conflict: RKSimon, spatel, dtemirbulatov, zizhar.
Apr 18 2017, 5:54 AM
avt77 created D32162: Inline asm 0bH conflict.
Apr 18 2017, 5:50 AM

Apr 14 2017

avt77 committed rL300314: Fix for PR#30562: Selection DAG error: Detected cycle in SelectionDAG..
Apr 14 2017, 2:30 AM
avt77 committed rL300311: This patch closes PR#32216: Better testing of schedule model instruction….
Apr 14 2017, 12:57 AM

Apr 12 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I implemented all requirements from hfinkel.
Please review again.

Apr 12 2017, 8:15 AM

Apr 11 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I fixed the latest requirements from RKSimon. Please, give me your feedback.

Apr 11 2017, 12:23 AM

Apr 7 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I hope I addressed all comments raised by RKSimon.
hfinkel, what do you think?

Apr 7 2017, 9:54 AM
avt77 added inline comments to D30941: Better testing of schedule model instruction latencies/throughputs.
Apr 7 2017, 9:51 AM

Apr 4 2017

avt77 added a comment to D31668: Fix PR30562.

You should use the following command to generate diff:

Apr 4 2017, 11:46 PM

Mar 31 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

According to Simon's requirements, I inserted the prefix "sched: " for scheduler comments and made "false" the default value of the -print-schedule option. As a result, I restored the original versions of all X86 tests except 2, which demonstrate the changes. Now we don't have any failing tests.

Mar 31 2017, 3:14 AM

Mar 30 2017

avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

The problem with the failing tests arose because of the new comment lines added by this patch. I was wrong when I said that FileCheck does not allow adding new comments at EOL.
I redesigned the patch to make it possible to add Latency:Throughput at the end of an existing comment (if any). As a result, I was forced to change the API of EmitInstruction in MCStreamer. I don't like this change because MCStreamer has a lot of subclasses, but it works perfectly and may be useful for other targets.
I regenerated 34 tests (with the help of update_llc_test_checks.py) and now we have only 16 failing tests: I'm going to fix them ASAP.

Mar 30 2017, 10:01 AM
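Appending the scheduling information to an existing end-of-line comment can be sketched as below. This is only an illustration, not the MCStreamer change itself: `appendSchedComment` is a hypothetical helper, and the exact text layout `sched: [latency:throughput]` is an assumption built on the "sched: " prefix mentioned in the Mar 31 update.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the comment text for one instruction: the existing comment (if
// any) followed by the scheduling info, so no extra output line is needed.
std::string appendSchedComment(const std::string &Existing, unsigned Latency,
                               double Throughput) {
  char Buf[64];
  std::snprintf(Buf, sizeof(Buf), "sched: [%u:%.2f]", Latency, Throughput);
  if (Existing.empty())
    return Buf;
  return Existing + " " + Buf;
}
```

Keeping everything on the instruction's own line means the instruction text that existing CHECK patterns look for still appears where they expect it.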

Mar 22 2017

avt77 added a comment to D30572: Remove equal BBs from a function.

Matthias,
Thank you for the fast reply.

Mar 22 2017, 7:58 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we have "Unexpected Failures: 530", and that's X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

Mar 22 2017, 7:30 AM
avt77 added a reviewer for D30572: Remove equal BBs from a function: MatzeB.
Mar 22 2017, 7:02 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything according to Hal's requirements except one: the default value of the "print-schedule" switch is false, because otherwise we have "Unexpected Failures: 530", and that's X86 tests only. The problem is very simple: update_llc_test_checks.py generates CHECKs like this:

Mar 22 2017, 6:24 AM

Mar 21 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

Hal,
I removed the special option (-print-schedule) and tried to run check-all. The result was very unpleasant but predictable:

Mar 21 2017, 5:53 AM

Mar 17 2017

avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

It is not clear to me why you won't just always do this when in verbose-asm mode. Thoughts on not having a separate option at all?

Mar 17 2017, 9:41 AM
avt77 updated the diff for D30941: Better testing of schedule model instruction latencies/throughputs.

Throughput calculation is implemented.

Mar 17 2017, 9:37 AM
avt77 added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?

Mar 17 2017, 1:33 AM