v_klochkov (Vyacheslav Klochkov)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 16 2015, 12:01 PM (226 w, 4 d)

Recent Activity

May 30 2019

vladimirlaz <vladimir.lazarev@intel.com> committed rG7e5a7aab09ba: [SYCL] Common Reference Semantics for accessor class (authored by v_klochkov).
[SYCL] Common Reference Semantics for accessor class
May 30 2019, 8:08 AM
vladimirlaz <vladimir.lazarev@intel.com> committed rGedb3d6dcd905: [SYCL] multi_ptr class: fixes for conversion to void multi_ptr (authored by v_klochkov).
[SYCL] multi_ptr class: fixes for conversion to void multi_ptr
May 30 2019, 8:07 AM
Vladimir Lazarev <vladimir.lazarev@intel.com> committed rGaf2061c7aec0: [SYCL] event: use only the immediate deps in wait_and_throw()/get_wait_list()… (authored by v_klochkov).
[SYCL] event: use only the immediate deps in wait_and_throw()/get_wait_list()…
May 30 2019, 8:06 AM
Vladimir Lazarev <vladimir.lazarev@intel.com> committed rGd3db9581e3ec: [SYCL] Implemented some conversion and non-member functions for multi_ptr class (authored by v_klochkov).
[SYCL] Implemented some conversion and non-member functions for multi_ptr class
May 30 2019, 8:05 AM
Vladimir Lazarev <vladimir.lazarev@intel.com> committed rGd32a9f76cc88: [SYCL] Changed the interface of the method scheduler::getDepEventsRecusruve() (authored by v_klochkov).
[SYCL] Changed the interface of the method scheduler::getDepEventsRecusruve()
May 30 2019, 8:05 AM
vladimirlaz <vladimir.lazarev@intel.com> committed rG2f3d5ff59146: [SYCL] Added missing wait(), wait_and_throw(), get_wait_list() methods to event… (authored by v_klochkov).
[SYCL] Added missing wait(), wait_and_throw(), get_wait_list() methods to event…
May 30 2019, 8:02 AM

Mar 9 2018

v_klochkov abandoned D26436: Fix for lost FastMathFlags in 4 optimizations..

Three of the four optimizations mentioned here (Reassociate, SLP, GVN) have been fixed by me in separate, smaller patches.

Mar 9 2018, 3:53 PM
v_klochkov accepted D44324: [TwoAddressInstructionPass] Improve tryInstructionCommute X86 FMA and vpternlog instructions.

Hi Craig,
This change-set looks good to me.
-Vyacheslav

Mar 9 2018, 3:12 PM

Jan 9 2017

v_klochkov committed rL291473: X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB..
X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB.
Jan 9 2017, 12:37 PM
v_klochkov closed D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB by committing rL291473: X86-specific path: Implemented the fusing of MUL+ADDSUB to FMADDSUB..
Jan 9 2017, 12:37 PM

Jan 6 2017

v_klochkov added a comment to D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.

Elena, thank you for the review and for the comments.
I made additional changes according to your recommendations.

Jan 6 2017, 4:42 PM
v_klochkov updated the diff for D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.
  • removed 'fast' attributes from the LIT test;
  • fixed the function names (the first letter must be lowercase);
  • removed unused 'DAG' parameter from isAddSub() function;
  • made a few other very minor changes.
Jan 6 2017, 4:28 PM

Jan 5 2017

v_klochkov updated the diff for D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.

Sorry, just a minor fix for a harmless misprint in the LIT test (replaced 'FMAx3_512' with 'FMA3_512')

Jan 5 2017, 2:42 AM
v_klochkov updated the diff for D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.
  1. Restructured the ADDSUB idiom recognition. As a result, the 512-bit FMADDSUB idiom can now be recognized and X86ISD::FMADDSUB is generated.
Jan 5 2017, 2:30 AM
v_klochkov added a comment to D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.

Does it make sense to convert a 512-bit ADDSUB operation to FMADDSUB with an all-ones multiplier? What does ICC do?
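
To make the question concrete, here is a rough lane-by-lane emulation in plain C++. It assumes the usual convention that ADDSUB and FMADDSUB subtract in even lanes and add in odd lanes. The names Vec, addsub, fmaddsub and allOnes are hypothetical helpers for illustration, not LLVM or intrinsic APIs.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical fixed-width vector of floats; N = 16 would model a ZMM register.
template <std::size_t N> using Vec = std::array<float, N>;

// ADDSUB emulation: even lanes compute A - B, odd lanes compute A + B.
template <std::size_t N>
Vec<N> addsub(const Vec<N> &A, const Vec<N> &B) {
  Vec<N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = (I % 2 == 0) ? A[I] - B[I] : A[I] + B[I];
  return R;
}

// FMADDSUB emulation: even lanes compute A*B - C, odd lanes compute A*B + C,
// each with a single rounding (std::fma).
template <std::size_t N>
Vec<N> fmaddsub(const Vec<N> &A, const Vec<N> &B, const Vec<N> &C) {
  Vec<N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = std::fma(A[I], B[I], (I % 2 == 0) ? -C[I] : C[I]);
  return R;
}

// The "all-ones multiplier": every lane holds 1.0f.
template <std::size_t N>
Vec<N> allOnes() {
  Vec<N> R{};
  R.fill(1.0f);
  return R;
}
```

Under these assumptions, fmaddsub(A, allOnes<N>(), C) yields the same lanes as addsub(A, C), because multiplying by 1.0 is exact. Whether that rewrite is profitable for 512-bit vectors is a separate question that the sketch does not answer.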

Jan 5 2017, 2:21 AM

Jan 1 2017

v_klochkov planned changes to D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.

I see my mistake now. The CPU ISA says that the ADDSUBPD/PS instructions are available for 128- and 256-bit vectors only.
They are not available for ZMM registers.

Jan 1 2017, 10:56 PM

Dec 29 2016

v_klochkov updated the diff for D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB.

Hi Simon,

Dec 29 2016, 6:04 PM

Dec 23 2016

v_klochkov retitled D28087 to X86 instr selection: combine ADDSUB + MUL to FMADDSUB.
Dec 23 2016, 9:27 PM

Dec 7 2016

v_klochkov accepted D27144: [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics.

LGTM from me.
I think it also makes sense to wait for Zvi to complete his code-review of this patch.

Dec 7 2016, 4:02 PM
v_klochkov added a comment to D27144: [AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics.

Hi Craig,

Dec 7 2016, 3:47 PM

Nov 22 2016

v_klochkov committed rL287700: Fixed the lost FastMathFlags in GVN(Global Value Numbering)..
Fixed the lost FastMathFlags in GVN(Global Value Numbering).
Nov 22 2016, 1:03 PM
v_klochkov closed D26952: Fix for lost FastMathFlags in GVN by committing rL287700: Fixed the lost FastMathFlags in GVN(Global Value Numbering)..
Nov 22 2016, 1:02 PM
v_klochkov committed rL287695: Fixed the lost FastMathFlags in Reassociate optimization..
Fixed the lost FastMathFlags in Reassociate optimization.
Nov 22 2016, 12:33 PM
v_klochkov closed D26957: Fix for lost FastMathFlags in Reassociate optimization by committing rL287695: Fixed the lost FastMathFlags in Reassociate optimization..
Nov 22 2016, 12:33 PM
v_klochkov retitled D26957 to Fix for lost FastMathFlags in Reassociate optimization.
Nov 22 2016, 12:07 AM

Nov 21 2016

v_klochkov updated the diff for D26952: Fix for lost FastMathFlags in GVN.

I just tightened the checks in the LIT test.

Nov 21 2016, 10:31 PM
v_klochkov retitled D26952 to Fix for lost FastMathFlags in GVN.
Nov 21 2016, 7:05 PM

Nov 15 2016

v_klochkov added a comment to D26575: Fix for lost FastMathFlags in SLPVectorizer (intrinsic calls).

Michael, thank you for the quick code-review!

Nov 15 2016, 5:13 PM
v_klochkov committed rL287064: Fixed the lost FastMathFlags for CALL operations in SLPVectorizer..
Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Nov 15 2016, 5:05 PM
v_klochkov closed D26575: Fix for lost FastMathFlags in SLPVectorizer (intrinsic calls) by committing rL287064: Fixed the lost FastMathFlags for CALL operations in SLPVectorizer..
Nov 15 2016, 5:05 PM

Nov 11 2016

v_klochkov added a comment to D26543: Fix for lost FastMathFlags in SLPVectorizer.

OK, done. A similar fix for intrinsic calls is submitted here: https://reviews.llvm.org/D26575

Nov 11 2016, 10:49 PM
v_klochkov retitled D26575 to Fix for lost FastMathFlags in SLPVectorizer (intrinsic calls).
Nov 11 2016, 10:47 PM
v_klochkov added a comment to D26543: Fix for lost FastMathFlags in SLPVectorizer.

Thank you, Michael,

Nov 11 2016, 12:22 PM
v_klochkov committed rL286626: Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer..
Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer.
Nov 11 2016, 12:05 PM
v_klochkov closed D26543: Fix for lost FastMathFlags in SLPVectorizer by committing rL286626: Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer..
Nov 11 2016, 12:05 PM
v_klochkov added a comment to D26436: Fix for lost FastMathFlags in 4 optimizations..

Thank you for the code review and comments.
I decided to use Davide's advice and split this patch into 4 smaller and independent patches.

Nov 11 2016, 3:16 AM
v_klochkov retitled D26543 to Fix for lost FastMathFlags in SLPVectorizer.
Nov 11 2016, 3:09 AM

Nov 10 2016

v_klochkov planned changes to D26436: Fix for lost FastMathFlags in 4 optimizations..

OK, thank you for the response.
Initially I did not know how to test those test cases, but now I have some ideas.
Hopefully the -run-pass switch will help me test a single pass and check that FastMathFlags do not get lost in the affected opt passes.

Nov 10 2016, 9:54 AM

Nov 8 2016

v_klochkov retitled D26436 to Fix for lost FastMathFlags in 4 optimizations..
Nov 8 2016, 5:47 PM

Sep 21 2016

v_klochkov accepted D24447: [AVX-512] Add support for commuting VPTERNLOG..

Hi Craig,

Sep 21 2016, 4:09 PM

Aug 30 2016

v_klochkov updated subscribers of D23909: [X86] Remove DenseMap for storing FMA3 grouping information.
Aug 30 2016, 3:37 PM
v_klochkov added a comment to D23909: [X86] Remove DenseMap for storing FMA3 grouping information.

Your comment about using the base opcode byte was a pleasant surprise to me.
I did not realize that it is possible to use it and that it is always available in TSFlags.

Aug 30 2016, 3:34 PM

Aug 29 2016

v_klochkov added a comment to D23909: [X86] Remove DenseMap for storing FMA3 grouping information.

Hi Craig,

Aug 29 2016, 4:23 PM

Aug 11 2016

v_klochkov committed rL278431: X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes..
X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes.
Aug 11 2016, 3:15 PM
v_klochkov closed D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes. by committing rL278431: X86-FMA3: Implemented commute transformation for EVEX/AVX512 FMA3 opcodes..
Aug 11 2016, 3:15 PM

Aug 9 2016

v_klochkov added a comment to D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..

Craig, thank you for reviewing the changes.
I moved the new files to llvm/lib/Target/X86.

Aug 9 2016, 3:38 PM
v_klochkov updated the diff for D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..

Moved the 2 new files X86InstrFMA3Info.{cpp,h} from llvm/lib/Target/X86/Utils to llvm/lib/Target/X86

Aug 9 2016, 3:33 PM

Aug 8 2016

v_klochkov added a comment to D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..

Hi Craig,

Aug 8 2016, 12:50 PM
v_klochkov updated the diff for D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..

Cancelled the changes in 2 places unrelated to the new classes and their uses (both are in isNonFoldablePartialRegisterLoad()).

Aug 8 2016, 12:13 PM

Aug 3 2016

v_klochkov retitled D23108 to Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..
Aug 3 2016, 12:15 AM

Jul 27 2016

v_klochkov accepted D22799: [X86] Remove CustomInserter for FMA3 instructions..

Hi Craig,
LGTM. It is good to see that the register coalescer works fine for FMAs and that custom-inserter code can be just removed.

Jul 27 2016, 5:10 PM

Apr 15 2016

v_klochkov added a comment to D18751: [MachineCombiner] Support for floating-point FMA on ARM64.

I worked on X86-FMA optimization in another compiler and switched to the LLVM project only recently.

Apr 15 2016, 1:50 PM

Apr 6 2016

v_klochkov added a comment to D18751: [MachineCombiner] Support for floating-point FMA on ARM64.

I did just a very quick scan through the change-set
and added a few minor comments related to styling.

Apr 6 2016, 3:12 PM
v_klochkov updated subscribers of D18751: [MachineCombiner] Support for floating-point FMA on ARM64.
Apr 6 2016, 11:40 AM

Dec 8 2015

v_klochkov committed rL255080: X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes..
X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Dec 8 2015, 4:15 PM
v_klochkov closed D15317: X86-FMA3: Defined ExeDomain for Scalar FMA3 opcodes by committing rL255080: X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes..
Dec 8 2015, 4:15 PM

Dec 7 2015

v_klochkov retitled D15317 to X86-FMA3: Defined ExeDomain for Scalar FMA3 opcodes.
Dec 7 2015, 4:28 PM

Nov 25 2015

v_klochkov committed rL254140: X86-FMA3: Improved/enabled the memory folding optimization for scalar loads.
X86-FMA3: Improved/enabled the memory folding optimization for scalar loads
Nov 25 2015, 11:48 PM
v_klochkov closed D14762: X86-FMA3: Memory folding for scalar loads + FMA3 by committing rL254140: X86-FMA3: Improved/enabled the memory folding optimization for scalar loads.
Nov 25 2015, 11:48 PM

Nov 24 2015

v_klochkov added a comment to D14762: X86-FMA3: Memory folding for scalar loads + FMA3.

Thank you for the review!

Nov 24 2015, 12:09 AM
v_klochkov updated the diff for D14762: X86-FMA3: Memory folding for scalar loads + FMA3.

Updated the unit test.

Nov 24 2015, 12:03 AM

Nov 23 2015

v_klochkov added a comment to D14762: X86-FMA3: Memory folding for scalar loads + FMA3.

Hi David,

Nov 23 2015, 1:26 AM
v_klochkov updated the diff for D14762: X86-FMA3: Memory folding for scalar loads + FMA3.

Fixed the misprints and updated the unit test.

Nov 23 2015, 1:14 AM

Nov 17 2015

v_klochkov retitled D14762 to X86-FMA3: Memory folding for scalar loads + FMA3.
Nov 17 2015, 3:15 PM

Nov 12 2015

v_klochkov committed rL252973: X86-FMA3: Implemented commute transformations FMA*_Int instructions..
X86-FMA3: Implemented commute transformations FMA*_Int instructions.
Nov 12 2015, 4:10 PM
v_klochkov closed D14550: X86-FMA3: Implemented commute transformations for FMA*_Int instructions by committing rL252973: X86-FMA3: Implemented commute transformations FMA*_Int instructions..
Nov 12 2015, 4:10 PM
v_klochkov committed rL252940: My first/test commit. Removed a trailing whitespace..
My first/test commit. Removed a trailing whitespace.
Nov 12 2015, 12:14 PM

Nov 11 2015

v_klochkov added a comment to D14550: X86-FMA3: Implemented commute transformations for FMA*_Int instructions.

Thank you for the comments and for the approval!
-Slava

Nov 11 2015, 8:04 PM
v_klochkov added a comment to D14550: X86-FMA3: Implemented commute transformations for FMA*_Int instructions.

Hi Quentin,

Nov 11 2015, 3:25 PM
v_klochkov updated the diff for D14550: X86-FMA3: Implemented commute transformations for FMA*_Int instructions.

Did additional refactoring suggested by Quentin:

  • added an optional parameter IsIntrinsic to isFMA3()
  • removed the duplicating code (loop) from getFMA3OpcodeToCommuteOperands().
Nov 11 2015, 3:22 PM

Nov 10 2015

v_klochkov retitled D14550 to X86-FMA3: Implemented commute transformations for FMA*_Int instructions.
Nov 10 2015, 12:43 PM

Nov 5 2015

v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

Please accept my apologies for adding noise to the code-review process.
OK, no more cleanups at the late code-review phases in the future.

Nov 5 2015, 5:38 PM
v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

I created the FMA*_Int opcodes in the patch D13710, and it has been committed to LLVM trunk.
That patch conflicted with the changes I made here in X86InstrFMA.td.
Thus, I had to update my local workspace and upload a new, rebased patch this time.

Nov 5 2015, 12:27 PM
v_klochkov updated the diff for D13269: Improved X86-FMA3 mem-folding & coalescing.

Resolved the conflicts in X86InstrFMA.td and updated the fma-commute-x86.ll test.

Nov 5 2015, 12:25 PM

Nov 3 2015

v_klochkov added a comment to D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Please see the answers to your questions.

Nov 3 2015, 11:48 AM

Nov 2 2015

v_klochkov added a comment to D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Hi Elena,

Nov 2 2015, 2:27 PM
v_klochkov updated the diff for D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Updated the test fma-intrinsics-x86.ll:

  • added Windows target code-gen;
  • added checks for the memory-folding optimization of FMAs generated for intrinsics.
Nov 2 2015, 2:03 PM

Oct 21 2015

v_klochkov added a comment to D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Thank you for the comments. Please review the new/updated test.

Oct 21 2015, 7:14 PM
v_klochkov updated the diff for D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Updated the unit test fma-intrinsics-phi-213-to-231.ll:

  • cancelled the insertion of new scalar test cases (they were added in the previous version of the patch);
  • added 128-bit packed test cases;
  • tightened the checks.
Oct 21 2015, 7:10 PM

Oct 14 2015

v_klochkov added a comment to D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..

Hi Quentin,

Oct 14 2015, 2:58 PM

Oct 13 2015

v_klochkov updated subscribers of D13710: New X86 FMA3*_Int opcodes for scalar FMA intrinsics..
Oct 13 2015, 4:07 PM
v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

Hi Ahmed,

Oct 13 2015, 4:04 PM
v_klochkov retitled D13710 to New X86 FMA3*_Int opcodes for scalar FMA intrinsics..
Oct 13 2015, 4:00 PM

Oct 12 2015

v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

Hi Quentin,

Oct 12 2015, 4:34 PM

Oct 5 2015

v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

Hi Quentin,

Oct 5 2015, 5:07 PM
v_klochkov updated the diff for D13269: Improved X86-FMA3 mem-folding & coalescing.

Additional minor changes + code-restructuring in getFMA3OpcodeToCommuteOperands().

Oct 5 2015, 5:02 PM

Oct 1 2015

v_klochkov added a comment to D13269: Improved X86-FMA3 mem-folding & coalescing.

I also made additional changes according to the reviewers' recommendations.

Oct 1 2015, 3:25 PM
v_klochkov updated the diff for D13269: Improved X86-FMA3 mem-folding & coalescing.

Ahmed, Quentin,
Thank you for the quick code-review.

Oct 1 2015, 3:23 PM

Sep 29 2015

v_klochkov retitled D13269 to Improved X86-FMA3 mem-folding & coalescing.
Sep 29 2015, 1:50 PM

Sep 18 2015

v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Thank you for the comments.
I fixed the "immeditate" misprint,
replaced the if statement: if (!(A && B) && !(C && D)) --> if ((!A || !B) && (!C || !D))
and removed the 'FIXME' word in SIFoldOperands.cpp (according to the recommendation from Quentin Colombet).
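
The rewritten 'if' condition is an application of De Morgan's law, so the two forms should be interchangeable. A small sketch that checks this exhaustively is given here; A, B, C and D are treated as plain booleans, and the function names are hypothetical and not taken from the patch.

```cpp
// Condition as originally written: !(A && B) && !(C && D).
bool originalForm(bool A, bool B, bool C, bool D) {
  return !(A && B) && !(C && D);
}

// Condition after the rewrite: (!A || !B) && (!C || !D).
bool rewrittenForm(bool A, bool B, bool C, bool D) {
  return (!A || !B) && (!C || !D);
}

// Compare both forms over all 16 combinations of the four inputs.
bool formsAgree() {
  for (int Bits = 0; Bits < 16; ++Bits) {
    const bool A = (Bits & 1) != 0;
    const bool B = (Bits & 2) != 0;
    const bool C = (Bits & 4) != 0;
    const bool D = (Bits & 8) != 0;
    if (originalForm(A, B, C, D) != rewrittenForm(A, B, C, D))
      return false;
  }
  return true;
}
```

The check only covers side-effect-free boolean operands. If A..D were expressions with side effects, short-circuit evaluation order would also need a look.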

Sep 18 2015, 10:13 AM

Sep 17 2015

v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Mr. Stellard,
Please review and approve the changes in 3 files owned by the AMDGPU target:

llvm/lib/Target/AMDGPU/SIFoldOperands.cpp
llvm/lib/Target/AMDGPU/SIInstrInfo.h
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

All the changes (14 files, including AMDGPU) have been reviewed by Quentin Colombet.

Also, below I have attached the e-mails I sent to your address at amd.com.
Please see them for more details.

Sep 17 2015, 11:36 AM
v_klochkov added a reviewer for D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing.: tstellarAMD.
Sep 17 2015, 11:22 AM

Aug 28 2015

v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Please see my answer regarding the new assert in the AMDGPU version of the commuteInstructionImpl() method.
Thanks,
Slava

Aug 28 2015, 5:04 PM
v_klochkov updated the diff for D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Fixed coding style/standard violations such as too long lines, indentations, etc.
using the recommendations from 'clang-format-diff.py' tool.

Aug 28 2015, 4:48 PM
v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

The code 'if (MI->IsCommutable() && TII->commuteInstruction(MI))' works differently before and after the changes in the commuteInstruction() method.

Aug 28 2015, 1:49 PM

Aug 27 2015

v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Hi Quentin,

Aug 27 2015, 3:53 PM
v_klochkov updated the diff for D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

I removed the X86 FMA-specific changes and left here only the interface changes
for the commuteInstruction() and findCommutedOpIndices() methods, according to Quentin's request.

Aug 27 2015, 3:45 PM

Aug 20 2015

v_klochkov added a comment to D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

Hi Quentin,

Aug 20 2015, 11:25 AM

Aug 19 2015

v_klochkov updated the diff for D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

This change-set (3rd revision) follows Quentin's suggestion to have a 'protected virtual commuteInstructionImpl()' method
and to make the other method, 'commuteInstruction()', non-virtual. The latter can accept CommuteAnyOperandIndex arguments.
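
The shape of that interface can be sketched roughly as below. This is a toy model only: ToyInstr and ToyInstrInfo are hypothetical stand-ins, the signatures are simplified and are not the real TargetInstrInfo ones, and the default rule that only operands 0 and 1 commute is an assumption made for the example.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for a machine instruction: just a list of operands.
struct ToyInstr {
  std::vector<int> Operands;
  bool Commutable;
};

class ToyInstrInfo {
public:
  // Sentinel meaning "let the target pick this operand index".
  static constexpr unsigned CommuteAnyOperandIndex = ~0U;

  virtual ~ToyInstrInfo() = default;

  // Non-virtual entry point: resolves the indices, then delegates the swap.
  bool commuteInstruction(ToyInstr &MI,
                          unsigned Idx1 = CommuteAnyOperandIndex,
                          unsigned Idx2 = CommuteAnyOperandIndex) const {
    if (!MI.Commutable || !findCommutedOpIndices(MI, Idx1, Idx2))
      return false;
    return commuteInstructionImpl(MI, Idx1, Idx2);
  }

protected:
  // Toy default: only operands 0 and 1 are commutable. Fills in any index
  // left as CommuteAnyOperandIndex and rejects everything else.
  virtual bool findCommutedOpIndices(const ToyInstr &MI, unsigned &Idx1,
                                     unsigned &Idx2) const {
    if (MI.Operands.size() < 2)
      return false;
    const bool Any1 = Idx1 == CommuteAnyOperandIndex;
    const bool Any2 = Idx2 == CommuteAnyOperandIndex;
    if ((!Any1 && Idx1 > 1) || (!Any2 && Idx2 > 1))
      return false;
    if (Any1 && Any2) {
      Idx1 = 0;
      Idx2 = 1;
    } else if (Any1) {
      Idx1 = 1 - Idx2;
    } else if (Any2) {
      Idx2 = 1 - Idx1;
    }
    return Idx1 != Idx2;
  }

  // Target hook: performs the swap for indices that are already resolved.
  virtual bool commuteInstructionImpl(ToyInstr &MI, unsigned Idx1,
                                      unsigned Idx2) const {
    std::swap(MI.Operands[Idx1], MI.Operands[Idx2]);
    return true;
  }
};
```

With this split, callers only ever see the non-virtual commuteInstruction(), while a target would override the protected hooks to customize index selection or the swap itself.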

Aug 19 2015, 6:18 PM

Aug 10 2015

v_klochkov updated the diff for D11370: Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing..

I updated the change-set according to the comments and recommendations from reviewers.

Aug 10 2015, 2:19 PM