- User Since
- Apr 11 2016, 3:46 AM (66 w, 5 d)
This patch replaces assertions with normal diagnostic when it's possible. Now we see normal error messages instead of compiler crashes. It should close PR33861, PR33862 and PR33661. In addition the patch prepares tests for D35621.
Thu, Jul 20
I updated 2 tests in trunk to demonstarte the difference in code generation after this patch aplying. Now we can clearly see the issues and how they could be resolved with help of this patch.
Wed, Jul 19
Fri, Jul 14
I fixed Simon's comments.
Thu, Jul 13
I merged this patch with trunk. Now it's a part 2 othe initial patch.
Tue, Jul 11
Mon, Jul 10
Fri, Jul 7
I removed NFCs from this patch.
Simon, thank you for all these catches: I fixed them.
Thu, Jul 6
We have now only 256-bit ops: it makes the patch smaller.
Jun 13 2017
The comments fixed.
New reviewers added.
Jun 9 2017
Jun 8 2017
All notes from Simon were resolved. In addition I fixed numbers for some XMM versions of VMOVxxxx instructions.
Some renames and other minor updates.
Jun 7 2017
I removed all changes related to throughput calculations. And I made all updates suggested by Simon.
I re-implemented all things mentioned by Simon and Gadi.
Jun 6 2017
Why don't you use regular expressions instead of simple list of instructions?
Jun 1 2017
I removed default values defined for SB model (required by javed.absar).
I fixed all Simon's notes.
May 30 2017
I redesigned the implementation accordingly to Simon requirements. Now it's done in general way and every X86 should support horizontal operations modeling. I did not check the numbers for SB and SLM: I simply kept the current ones. And I separated Ymm version from Xmm version to be able to model the corresponding throughput difference for Jaguar.
May 29 2017
The revision was committed.
May 26 2017
May 25 2017
I merged with trunk and launched "check-all": everything works without any issue.
Craig, Zvi - could you give me LGTM?
I restored the condition like here:
May 24 2017
I've fixed all issues raised by Simon. In addition I re-checked all numbers: it seems they are correct now.
May 17 2017
May 15 2017
I slightly changed the algorithm of throughput calculation: if the instr sched model does not have cycles for the given instruction but it's valid then throughput is equal to lattency.
May 12 2017
It seems I fixed all known issues except proper support of vzeroupper and vzeroall: will try to do it in the next patch.
May 11 2017
Apr 27 2017
Apr 26 2017
Now we have 3 different versions of test_mul_spec.
I removed redundant local variables.
The issues with break-return fixed.
**Lambdas refactoring: in fact I tried to do it from the very beginning but I got the error message:
/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp: In function ‘llvm::SDValue combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)’:
/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30955:3: error: conversion from ‘combineMulSpecial(uint64_t, llvm::SDNode*, llvm::SelectionDAG&, llvm::EVT, llvm::SDLoc)::<lambda(int, int, bool)>’ to non-scalar type ‘llvm::SDValue’ requested
/home/atischenko/workspaces/lea-mult-DAG/llvm/lib/Target/X86/X86ISelLowering.cpp:30973:55: error: no match for call to ‘(llvm::SDValue) (int, int, bool)’
Result = combineMulShlAddOrSub(5, 1, /*isAdd*/true);
Apr 25 2017
The initial problem was described in PR22004. The problem raises because this check does not allow using of identifiers starting from dot. But such IDs are legal for asm, e.g as local labels. As result we have (in this exactly case) the binary expression without RHS operand. It means "too few operands" like assertion says. If I comment out the check then everything works without any problems including all regression tests. Could I remove the check from the code?
The issue with MulConstantOptimization was fixed.
I implemented the requests from Zvi, Isaba and RKSimon. Please, review again.
I simply commented the check becuase it hides the case of identificator. There are no regression failed tests that's why I hope it's acceptable.
I inhibitted the default constructor for ParseStatementInfo::ParseStatementInfo() because it can't work without proper initialization.
It's already limited:// An imul is usually smaller than the alternative sequence. if (DAG.getMachineFunction().getFunction()->optForMinSize())
Ah, sorry I missed that. The fact that it is "MinSize" highlights that we're in a gray area for the DAG. That is, it's hard to know what the best sequence will be without looking at the instruction timing. Given that, we need to know if converting these muls is generally good. Do you have real or synthetic benchmark info for these cases? Is there a perf difference, for example, between Jaguar and Haswell (since those CPUs are specified in the tests)? Is the codegen ever different for those CPUs? If not, why are we adding different RUNs for them in this patch?
Reading the sources of TailMerging Pass I discovered that it has special switch "tail-merge-size" allowing to resolve the issue from loop-serch.ll test. The default value of the switch is 3 but if I change it as 2 then everything works fine.
Because of that I decided to abandon this review :-(
I'm going to investigate the possibility to change the default value. If it is not allowed for any reasons (compile time, target specific requirements, etc.) I'll implement special hook in Target as it's suggested in sources.
Apr 24 2017
I implemented code reuse for different constants support. In addition I slightly changed 2 tests to deal with latency/throughput numbers. BTW, it is not clear at the moment how to use those numbers for 32-bit? What cpu should we use?
Test function "foo" was renamed as "PR31007" to show its origin.
I moved inline-0bh.ll test in test/Codegen/X86 folder.
Apr 22 2017
Apr 21 2017
What are your plans here? I've just checked (with help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong if we compare with numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step-by-step?
Apr 20 2017
Apr 19 2017
I added required comments and one additional tiny fix to cover PR27884: now it works properly. The corresponding regression test was added as well.
Apr 18 2017
Apr 14 2017
Apr 12 2017
I implemeted all requirements from hfinkel.
Please, review again.
Apr 11 2017
I fixed the latest requirements from RKSimon. Please, give me your feedback.
Apr 7 2017
Hope, I fixed all comments raised by RKSimon.
hfinkel, what do you think about?
Apr 4 2017
You should use the following command to generate diff:
Mar 31 2017
Accordingly to requirements from Simon I inserted prefix "sched: " for scheduler comments and made "false" as default value for -print-schedule option. As result I restored original versions of all X86-tests excepting 2 ones to demonstrate the changes. Now we don't have any failed test.
Mar 30 2017
The problem with failed tests raised because of new lines of comments added as result of this patch. I was wrong when I told that FileCheck does not allow adding of new comments at EOL.
I redesigned the patch to make it possible to add Latency:Throughput at the end of exisiting comment (if any). As result I was forced to change API of EmitInstruction from MCStreamer. I don't like this change because there are a lot of successors of MCStreamer but it works perfectly and maybe useful for other targets.
I regenerated (with help of update_llc_test_checks.py) 34 tests and now we have only 16 failed tests: I'm going to fix them asap.
Mar 22 2017
Thank you for the fast reply.
I did everything accordingly to Hal's requirements except one: the default value of "print-schedule" switch is false because otherewise we have "Unexpected Failures: 530" and it's X86 tests ony. The problem is very simple: update_llc_test_checks.py generates CHECKs like here
Mar 21 2017
I removed the special option (-print-schedule) and tried to check-all. The result was very unpleseant but predictable:
Mar 17 2017
Throughput calculation is implemented.
hfinkel, yes I mean I'll do it in the next version of this patch soon.
rksimon, do you mean we should rename the compiler option like "-print-schedule"?