
andreadb (Andrea Di Biagio)
User

Projects

User does not belong to any projects.

User Details

User Since
May 9 2013, 11:10 AM (292 w, 2 d)

Recent Activity

Fri, Dec 14

andreadb accepted D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.

LGTM

Fri, Dec 14, 3:20 AM

Thu, Dec 13

andreadb accepted D55655: [DAGCombiner] after simplifying demanded elements of vector operand of extract, revisit the extract.

LGTM

Thu, Dec 13, 7:33 AM
andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Thu, Dec 13, 6:44 AM
andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Thu, Dec 13, 5:43 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Let's not lose sight of the big picture here. If uarch problems exist, are they *worse* than the cost of calling memcmp()?

Almost certainly no, even for memcpy where potential store-forwarding stalls or 4k aliasing are a pretty minor concern most of the time.

I pointed those things out so the new unaligned load/store code-gen can be the best it can be while people are working on that code anyway, *not* because I think there's a risk of overall regressions.

Thu, Dec 13, 3:35 AM
andreadb accepted D55557: [llvm-mca] Move llvm-mca library to llvm/lib/MCA..

This patch LGTM.

Thu, Dec 13, 3:27 AM

Wed, Dec 12

andreadb added inline comments to D55600: [TargetLowering] Add ISD::OR + ISD::XOR handling to SimplifyDemandedVectorElts.
Wed, Dec 12, 10:03 AM
andreadb accepted D55426: [SelectionDAG] Add a generic isSplatValue function..

Thanks Simon.

Wed, Dec 12, 9:55 AM
andreadb added inline comments to D55426: [SelectionDAG] Add a generic isSplatValue function..
Wed, Dec 12, 8:51 AM
andreadb added a comment to D55565: [X86] Don't emit MULX by default with BMI2.

I don't have a strong opinion on this.

Wed, Dec 12, 4:58 AM
andreadb accepted D55558: [TargetLowering] Add ISD::AND handling to SimplifyDemandedVectorElts.

Looks good to me.

Wed, Dec 12, 4:34 AM
andreadb accepted D55414: [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input..

LGTM too.

Wed, Dec 12, 4:10 AM

Tue, Dec 11

andreadb added a comment to D55494: [x86] allow 8-bit adds to be promoted by convertToThreeAddress() to form LEA.

This patch looks good to me.

Tue, Dec 11, 4:21 AM
andreadb added a comment to rL348114: [ARM][MC] Move information about variadic register defs into tablegen.

In the ARM backend, we only use variable_ops for the LDM and STM instructions, where the variadic operands are either all uses or all defs. I'm less familiar with the other backends, but all of the other cases where variable_ops is used look similar.

If a target does need both variable uses and defs on the same instruction, I assume that we'd need to add a way to represent that to the MCInst or MCOperand, as the sequence of uses and defs would vary between different instances of the instruction.

Tue, Dec 11, 2:57 AM

Mon, Dec 10

andreadb accepted D55345: [AArch64] Refactor the Exynos scheduling predicates.

Thanks Evandro.

Mon, Dec 10, 9:10 AM
andreadb accepted D55507: [DAGCombiner] Use the result value type in visitCONCAT_VECTORS.

I only have one comment. Otherwise, this LGTM. Thanks.

Mon, Dec 10, 5:13 AM

Fri, Dec 7

andreadb added a comment to D55414: [X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input..

Hi Craig,

Fri, Dec 7, 11:48 PM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

I went through the various changes, and I didn’t find anything that looks obviously wrong. The new predicates look fine to me.

Fri, Dec 7, 8:31 AM

Thu, Dec 6

andreadb added inline comments to D55375: [AArch64] Refactor the scheduling predicates.
Thu, Dec 6, 3:52 PM
andreadb accepted D55375: [AArch64] Refactor the scheduling predicates.

I only have one question. Otherwise, the new predicates LGTM (their logic seems to match what you wrote in the code comments).

Thu, Dec 6, 12:43 PM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Sorry, I was juggling too many patches out of order.

I'd appreciate your reviewing the new predicates and their application while I work on test cases.

Thank you.

Thu, Dec 6, 11:18 AM
andreadb added a comment to D55375: [AArch64] Refactor the scheduling predicates.

Note that I made this patch a predecessor for D55345.

Thu, Dec 6, 10:38 AM
andreadb added inline comments to D55375: [AArch64] Refactor the scheduling predicates.
Thu, Dec 6, 9:42 AM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Rebase the patch.

Thu, Dec 6, 9:42 AM
andreadb added a comment to D55345: [AArch64] Refactor the Exynos scheduling predicates.

Hi Evandro,

Thu, Dec 6, 6:18 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

I just looked over the codegen changes so far, but I want to add some more knowledgeable x86 hackers to have a look too. There are 2 concerns:
<snip>

  1. Are there any known uarch problems with unaligned accesses (either scalar or SSE)?

*Unaligned* loads are a potential minor slowdown if they cross cache-line boundaries. (Or on AMD, maybe even 32-byte or even 16-byte boundaries). There is literally zero penalty when they don't cross any relevant boundary on modern CPUs (on Intel, that's 64-byte cache lines).
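The boundary condition described above can be written down as a small check. This is only an illustrative sketch: the helper name `crossesBoundary` and the constant `kCacheLineBytes` are made up for this example, and the 64-byte default simply reflects the Intel cache-line size quoted in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed line size for illustration; 64 bytes is the Intel figure quoted above.
constexpr std::uintptr_t kCacheLineBytes = 64;

// Returns true if an access of `size` bytes starting at address `addr`
// touches two different `boundary`-sized blocks (hypothetical helper).
inline bool crossesBoundary(std::uintptr_t addr, std::size_t size,
                            std::uintptr_t boundary = kCacheLineBytes) {
  assert(size > 0 && boundary > 0);
  // Compare the block index of the first and the last byte accessed.
  return addr / boundary != (addr + size - 1) / boundary;
}
```

Passing 32 or 16 as `boundary` models the smaller boundaries mentioned as a possibility on AMD.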

Thu, Dec 6, 4:54 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

One of my coworkers did an informal test last year and saw that the optimized REP string instructions on newer Intel CPUs were faster than SSE2 (he used large data sizes, not anything in the shorter ranges this patch deals with). Is that something that should be looked at? (Or has somebody done that examination already?)

Yes, I'm planning to work on this next :) It should go in SelectionDAGTargetInfo::EmitTargetCodeForMemcmp(), similar to what we did for memcpy and memset though.

Thu, Dec 6, 3:35 AM
andreadb added a comment to D55263: [CodeGen][ExpandMemcmp] Add an option for allowing overlapping loads..

Here's a basic benchmark for memcmp(a, b, N) where N is a compile-time constant, and a and b differ first at character M:

The change makes the impacted values 2.5 - 3x as fast.
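For readers unfamiliar with the idea behind "overlapping loads", here is a minimal sketch of an equality-only comparison of 7 bytes using two 4-byte loads that overlap by one byte. It is not the benchmark referred to above and not the code LLVM emits; the names `load32` and `equal7` are hypothetical, and only the `memcmp(a, b, 7) == 0` case is covered (an ordered result would also need byte swapping on little-endian targets).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Unaligned 4-byte load expressed through memcpy, which compilers
// typically lower to a single load instruction.
static std::uint32_t load32(const unsigned char *p) {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// Equivalent to memcmp(a, b, 7) == 0 (hypothetical helper).
// Bytes 0..3 and 3..6 are loaded; byte 3 is simply compared twice.
bool equal7(const void *a, const void *b) {
  const auto *pa = static_cast<const unsigned char *>(a);
  const auto *pb = static_cast<const unsigned char *>(b);
  std::uint32_t lo = load32(pa) ^ load32(pb);         // bytes 0..3
  std::uint32_t hi = load32(pa + 3) ^ load32(pb + 3); // bytes 3..6
  return (lo | hi) == 0;
}
```

Without overlapping, the same 7 bytes would need a 4-byte, a 2-byte, and a 1-byte load per buffer.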

Thu, Dec 6, 3:15 AM

Wed, Dec 5

andreadb updated the diff for D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..

Patch updated.

Wed, Dec 5, 11:15 AM
andreadb added inline comments to D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..
Wed, Dec 5, 7:58 AM
andreadb updated the diff for D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..

Thanks Simon for the feedback.
My new combine logic was indeed very similar to the existing logic in visitCONCAT_VECTORS.
This patch reuses that logic and introduces a new rule for the case where the first operand of the concat_vector is a scalar_to_vector.

Wed, Dec 5, 7:37 AM

Tue, Dec 4

andreadb created D55274: [DagCombiner][X86] Simplify a ConcatVectors of a scalar_to_vector with undef..
Tue, Dec 4, 8:12 AM

Fri, Nov 30

andreadb accepted D55089: [TableGen] Fix negation of simple predicates.

I'm all for reusing code, but this is not a sophisticated algorithm, merely streaming of data. I'd rather keep the additional methods and simplify the existing ones to eliminate their special cases for the sake of maintenance. After all, complicating them is what resulted in this issue.

Fri, Nov 30, 12:07 PM
andreadb added a comment to D55089: [TableGen] Fix negation of simple predicates.

Hi Evandro.

Fri, Nov 30, 3:33 AM

Thu, Nov 29

andreadb added a comment to D54640: [DAGCombiner] narrow truncated binops.

LGTM.

At the beginning, I was a bit concerned by the increase in the number of partial register writes.
So I started experimenting with your patch for a bit. In practice, I couldn't come up with an obvious case where your patch introduces extra partial register stalls. Also, none of the tests modified by your patch really suffers from partial register stalls.
The checks performed by your transform are quite conservative (the new combine rule is limited to pre-legalization, and it only affects binary opcodes where one of the operands is a constant). So, I am happy with it.

Thanks everyone for the reviews and comments!
I haven't thought about partial reg stalls in a while, but IIRC, AMD doesn't care in cases like these because it doesn't rename 8/16-bit regs, and Intel should be protected because we have a machine pass (-x86-fixup-bw-insts) to prevent the problem.

Thu, Nov 29, 12:47 PM
andreadb added inline comments to D54640: [DAGCombiner] narrow truncated binops.
Thu, Nov 29, 11:02 AM
andreadb added inline comments to D54640: [DAGCombiner] narrow truncated binops.
Thu, Nov 29, 10:38 AM
andreadb accepted D54640: [DAGCombiner] narrow truncated binops.
Thu, Nov 29, 10:08 AM
andreadb added a comment to D54640: [DAGCombiner] narrow truncated binops.

At the beginning, I was a bit concerned by the increase in the number of partial register writes.
So I started experimenting with your patch for a bit. In practice, I couldn't come up with an obvious case where your patch introduces extra partial register stalls. Also, none of the tests modified by your patch really suffers from partial register stalls.
The checks performed by your transform are quite conservative (the new combine rule is limited to pre-legalization, and it only affects binary opcodes where one of the operands is a constant). So, I am happy with it.

Thu, Nov 29, 10:08 AM

Wed, Nov 28

andreadb updated the diff for D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..

Address review comments.

Wed, Nov 28, 11:14 AM
andreadb added a comment to D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..

Thank you, looks good in general!
Some very contrived nits:

Wed, Nov 28, 10:18 AM
andreadb closed D55000: [llvm-mca] Return the total number of cycles from method Pipeline::run()..

Committed at revision 347767.

Wed, Nov 28, 8:28 AM
andreadb created D55000: [llvm-mca] Return the total number of cycles from method Pipeline::run()..
Wed, Nov 28, 5:36 AM
andreadb updated the diff for D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..

Patch updated.

Wed, Nov 28, 4:19 AM

Tue, Nov 27

andreadb added inline comments to D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..
Tue, Nov 27, 10:44 AM
andreadb added a reviewer for D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666).: lebedev.ri.
Tue, Nov 27, 9:51 AM
andreadb created D54957: [llvm-mca][MC] Add the ability to declare which processor resources model load/store queues (PR36666)..
Tue, Nov 27, 9:51 AM

Fri, Nov 23

andreadb accepted D54822: [AArch64] Refactor the scheduling predicates (3/3) (NFC).

LGTM. Thanks Evandro!

Fri, Nov 23, 4:49 PM
andreadb accepted D54820: [AArch64] Refactor the scheduling predicates (2/3) (NFC).

Only a minor nit. Otherwise, it LGTM. Thanks!

Fri, Nov 23, 4:49 PM
andreadb accepted D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).

LGTM. Thanks!

Fri, Nov 23, 4:46 PM
andreadb added inline comments to D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).
Fri, Nov 23, 3:21 PM
andreadb added inline comments to D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).
Fri, Nov 23, 3:00 PM
andreadb added reviewers for D54603: [llvm-mca][RFC] Adding binary support to llvm-mca.: courbet, gchatelet, RKSimon, spatel, craig.topper.
Fri, Nov 23, 7:10 AM

Thu, Nov 22

andreadb added inline comments to D54820: [AArch64] Refactor the scheduling predicates (2/3) (NFC).
Thu, Nov 22, 3:31 AM
andreadb added inline comments to D54822: [AArch64] Refactor the scheduling predicates (3/3) (NFC).
Thu, Nov 22, 3:27 AM
andreadb added inline comments to D54822: [AArch64] Refactor the scheduling predicates (3/3) (NFC).
Thu, Nov 22, 3:26 AM
andreadb added inline comments to D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).
Thu, Nov 22, 3:25 AM

Wed, Nov 21

andreadb added a comment to D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).

Question: the code beginning at MCOpcodeSwitchStatement above cannot be used as a regular MCSchedPredicate too. If so, how can I avoid writing the same statement twice, since this condition is used both in AArch64InstrInfo.cpp and in AArch64Sched*.td?

Wed, Nov 21, 1:43 PM
andreadb accepted D54648: [TableGen] Emit more variant transitions.

The issue that I'm trying to avoid is that it's not enough for me to add predicates based on MCSchedPredicate for Exynos processors if other processors don't. Then, if an instruction that I model by using a variant schedule is also modeled by another processor, TableGen will emit no solution at all for the instruction. This patch, which I recognize is just an attempt, aims at allowing the proper solution for a processor using such predicates, while admittedly resulting in a clumsy solution for the scheduling of the same instruction on other processors.

The issue is that it's virtually impossible at the moment to model AArch64 without llvm-mca giving right up. I was thinking that instead of giving up, llvm-mca should resort to a reasonable default and highlight it in its result. I proposed NoSchedPred as this default, but, though we can discuss what the default should be, I think that having no default does not make sense as is.

Wed, Nov 21, 12:28 PM
andreadb added a comment to D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC).

Hi Evandro.

Wed, Nov 21, 4:02 AM
andreadb added a comment to D54648: [TableGen] Emit more variant transitions.

Hi Evandro,

Wed, Nov 21, 3:38 AM
andreadb added a reviewer for D54777: [AArch64] Refactor the scheduling predicates (1/3) (NFC): andreadb.
Wed, Nov 21, 12:15 AM

Nov 14 2018

andreadb accepted D54095: [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`.

P.S.: I wonder if the MOVZ would go away if the input operand is declared as "zeroext i8". We should check it (maybe not in this patch).

Nov 14 2018, 11:16 AM
andreadb added a comment to D54095: [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`.

On these tests, I would even say this new MOVZ+OR variant is better.

Nov 14 2018, 6:38 AM

Nov 13 2018

andreadb accepted D54095: [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`.

In general, I tend to prefer a sequence of MOVZ + OR over a partial register write introduced by a byte move.
That being said, I don't have a strong opinion on this.

AMD processors don't tend to rename portions of a 32/64 bit GPR. So, the partial write introduced by your byte move is not renamed, and there is a false dependency on the input register operand. On the other hand, if you use a MOVZ, you end up with one extra instruction to dispatch/execute. However, in practice the latency of that extra MOVZ can be hidden, as you can see from the throughput numbers generated by llvm-mca for your examples.

Partial register moves may not be a problem on modern Intel CPUs.
Intel processors know how to rename portions of a GPR, so false dependencies can be removed that way. According to Agner: since Haswell, the processor solves this problem without any visible performance penalties. Perhaps it makes dual bookkeeping of both the partial register and the full register. (cit. "Haswell and Broadwell pipeline" and "Skylake pipeline").

There is still a small penalty for SandyBridge/Ivybridge. On those processors, a partial write can be renamed. However, when the full register is read later on, a merge opcode is issued to join the data from the different writes. That being said, the latency of that merge opcode should be relatively small.

Note however that, according to Agner (see chapter "Sandybridge and Ivybridge pipeline"), a zero-extending move from an 8-bit register to a 32-bit or 64-bit register can also be eliminated at the register renaming stage. That seems to be only true for Sandybridge and Ivybridge. As far as I am aware, no AMD processor allows zero-extending move elimination.
That being said, a solution that uses a MOVZ + OR may be better (or equivalent) on Sandybridge/Ivybridge.

Nov 13 2018, 7:10 AM
andreadb added a comment to D54095: [X86] X86DAGToDAGISel::matchBitExtract(): extract 'lshr' from `X`.

In general, I tend to prefer a sequence of MOVZ + OR over a partial register write introduced by a byte move.
That being said, I don't have a strong opinion on this.

Nov 13 2018, 5:41 AM

Nov 8 2018

andreadb added a comment to D54268: [llvm-mca] PR39261: Rename FetchStage to EntryStage..

Thanks Matt, I will apply the suggested changes and then commit.

Nov 8 2018, 9:34 AM
andreadb created D54268: [llvm-mca] PR39261: Rename FetchStage to EntryStage..
Nov 8 2018, 9:25 AM

Nov 7 2018

andreadb accepted D54179: [llvm-mca] Move the AssembleInput logic into its own class..

LGTM.

Nov 7 2018, 10:36 AM
andreadb added inline comments to D54179: [llvm-mca] Move the AssembleInput logic into its own class..
Nov 7 2018, 3:47 AM
andreadb added a comment to D54179: [llvm-mca] Move the AssembleInput logic into its own class..

I started thinking about another way of organizing this patch, to reduce all of the potential extra files. I can only really see two CodeRegion generators: one for ASM input, and the other for object file input... once we add the binary support. Perhaps, we just have a CodeRegionGenerator.{h,cpp} and put the generators for ASM and object files in those instead of having separate header/implementations for ASM and Object files. Just a thought.

Nov 7 2018, 3:32 AM

Nov 1 2018

andreadb created D53976: [llvm-mca] Add extra counters for move elimination in view RegisterFileStatistics..
Nov 1 2018, 7:16 AM

Oct 30 2018

andreadb created D53880: [tblgen][PredicateExpander] Add the ability to describe more complex constraints on instruction operands..
Oct 30 2018, 12:04 PM
andreadb accepted D53407: [llvm-mca] Move namespace mca inside llvm::.

LGTM. Thanks!

Oct 30 2018, 4:09 AM

Oct 29 2018

andreadb added a comment to D53407: [llvm-mca] Move namespace mca inside llvm::.

I am okay with this change provided that you only change files in llvm-mca/include or in llvm-mca/lib.

What is the difference between llvm-mca/include/* and llvm-mca/Views/*?

If llvm-mca/Views/* names want to stay in namespace mca, as they reference llvm-mca/include/*, they need to be written as:

namespace mca {
using namespace llvm::mca;

or use qualified llvm::mca::* (which is clumsy)

Having both namespace mca and namespace llvm::mca will make it easy to cause name conflicts.
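To make the two-namespace arrangement concrete, here is a small self-contained sketch. `Stage`, `View`, and `demo` are stand-in names invented for this illustration and are not the real llvm-mca classes; the point is only how `using namespace llvm::mca;` inside a separate top-level `mca` namespace resolves names, and where the conflict risk comes from.

```cpp
#include <cassert>
#include <string>

// Stand-in for a library type that lives in llvm::mca (hypothetical).
namespace llvm {
namespace mca {
struct Stage {
  std::string name() const { return "llvm::mca::Stage"; }
};
} // namespace mca
} // namespace llvm

// Stand-in for tool-side code kept in a separate top-level namespace mca.
namespace mca {
using namespace llvm::mca; // unqualified Stage now finds llvm::mca::Stage

struct View {
  std::string describe(const Stage &S) const { return "view of " + S.name(); }
};
} // namespace mca

// Note: a file that also says `using namespace llvm;` at global scope would
// make a plain `mca::View` ambiguous, because both ::mca and llvm::mca
// become candidates for the name `mca`.
std::string demo() {
  llvm::mca::Stage S;
  mca::View V;
  return V.describe(S);
}
```

Moving everything into `llvm::mca` avoids having two namespaces that share the name `mca`.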

Oct 29 2018, 5:17 AM

Oct 25 2018

andreadb added a comment to D52932: [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel..

@courbet: Your commit seems to cause build issues:

http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/838

Thanks

Oct 25 2018, 6:00 AM

Oct 24 2018

andreadb accepted D52932: [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel..

Move pfm counters to ExegesisTarget. There is now a separate Exegesis TableGen backend.

Oct 24 2018, 11:02 AM
andreadb added a comment to D53407: [llvm-mca] Move namespace mca inside llvm::.

Friendly ping :)

Oct 24 2018, 4:11 AM
andreadb added a comment to D52779: AMD BdVer2 (Piledriver) Initial Scheduler model.

I forgot to write:
Please make sure to add all the bdver2 tests to llvm-mca - just copy the relevant ISA test files and update the RUN tags.
Also, a test that shows the problematic store throughput would be nice.

Oct 24 2018, 3:53 AM
andreadb accepted D52779: AMD BdVer2 (Piledriver) Initial Scheduler model.

I am okay with accepting this patch, provided that you fix the store throughput. If not in this patch, then it should definitely be addressed by a follow-up patch.
At the moment, your model assumes a maximum throughput of two store operations per cycle, which is incorrect.

Oct 24 2018, 3:42 AM
andreadb added a comment to D53585: [llvm-mca] Improved error handling and error reporting from class InstrBuilder..

Thanks for the review Matt.

Oct 24 2018, 3:07 AM

Oct 23 2018

andreadb created D53585: [llvm-mca] Improved error handling and error reporting from class InstrBuilder..
Oct 23 2018, 9:33 AM

Oct 12 2018

andreadb added a comment to D53055: [MCA] Limit the number of bytes fetched per cycle..

Now that llvm-mca is a library, people can define their own custom pipeline without having to modify the "default pipeline stages".
In particular, I don't want to introduce any frontend concepts in the default pipeline of llvm-mca.
For now, any frontend simulation should be implemented by stages that are not part of the default pipeline.

I don't have a particular opinion about whether this should be part of the default pipeline or not, but I think modeling the frontend is very important.
This article from a couple years ago analyzes the typical workloads on a google datacenter. While most of the stalls are from the backend, the frontend has a significant contribution:

Oct 12 2018, 3:36 AM

Oct 11 2018

andreadb added inline comments to D53134: [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen..
Oct 11 2018, 11:32 AM
andreadb added a comment to D53055: [MCA] Limit the number of bytes fetched per cycle..

Hi Andrea,

There is already bug https://bugs.llvm.org/show_bug.cgi?id=36665, which is about adding support for simulating the hardware frontend logic.
I know that @courbet and his team would like to work on it. So, you can probably try to work with them on this.
Unfortunately, that bugzilla must be updated. There is not enough information there (I suggested to send a detailed RFC upstream in case).

I strongly suggest that you, your team, and Clement's team work together on that task. I am afraid that people may be working on the same tasks in parallel. That has to be avoided.
You can use that bugzilla to coordinate your work upstream on this.

Let me clarify this: Owen is working with us :) He has taken over the genetic scheduler work I presented at EuroLLVM. One of the bottlenecks we had was the frontend, hence the change. I agree that this should have been made clearer (@owenrodley, can you create a bugzilla account and assign the bug to yourself?)

Oct 11 2018, 9:44 AM
andreadb accepted D53095: [x86] add and use fast horizontal vector math subtarget feature.

Thanks Sanjay!

Oct 11 2018, 8:53 AM
andreadb updated the diff for D53134: [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen..

Patch updated:

Oct 11 2018, 8:09 AM
andreadb created D53134: [tblgen][llvm-mca] Add the ability to describe move elimination candidates via tablegen..
Oct 11 2018, 5:26 AM

Oct 10 2018

andreadb added reviewers for D53055: [MCA] Limit the number of bytes fetched per cycle.: courbet, gchatelet, RKSimon, atrick.

Hi Owen,

Oct 10 2018, 3:23 AM

Oct 9 2018

andreadb added a comment to D52932: [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel..

Thanks for the description Clement,

Oct 9 2018, 7:56 AM
andreadb added a comment to D52997: [x86] allow single source horizontal op matching (PR39195).

So 2 options for moving forward:

  1. Allow this transform as shown here because it is mostly just restoring the behavior of last week. Follow that up with a subtarget feature to prevent the transform (not ideal, but the alternative 'undo' is much harder).
  2. Limit this transform to 'optsize' right now because it's a size win in all cases.

I'd vote for (1) for this patch - optsize + HasFastHorinzontalOp might be necessary depending on how soon we can agree on a scheduler model driven mechanism that re-expands HADD later on (per Andrea's suggestion - but hopefully we can discuss that at the devmtg)

Oct 9 2018, 5:35 AM
andreadb added a comment to D52932: [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel..

Thanks for waiting Clement.

Oct 9 2018, 4:27 AM
andreadb added a comment to D52997: [x86] allow single source horizontal op matching (PR39195).

Hi Sanjay,

Oct 9 2018, 3:23 AM

Oct 5 2018

andreadb added a comment to D52932: [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel..

I’m not at work today, but I’d like a bit of time to review this patch.

Oct 5 2018, 8:03 AM
andreadb added inline comments to D52779: AMD BdVer2 (Piledriver) Initial Scheduler model.
Oct 5 2018, 4:07 AM

Oct 4 2018

andreadb added a comment to D52886: [X86] Move ReadAfterLd functionality into X86FoldableSchedWrite (PR36957).

Thanks Simon!

Oct 4 2018, 8:06 AM
andreadb added a comment to D52779: AMD BdVer2 (Piledriver) Initial Scheduler model.

Hi Roman,

Oct 4 2018, 5:59 AM

Oct 1 2018

andreadb accepted D46662: [X86] condition branches folding for three-way conditional codes.

Thanks.
I don’t have other comments.

Oct 1 2018, 12:24 PM

Sep 29 2018

andreadb updated the diff for D52663: [X86][BtVer2] Teach how to identify zero-idiom VPERM2F128rr instructions..

Addressed review comments.

Sep 29 2018, 6:48 AM
andreadb added a comment to D46662: [X86] condition branches folding for three-way conditional codes.
In D46662#1248780, @xur wrote:

Using new SubtargetFeature method (suggested by Andrea) to make this pass opt-in for subtargets.
Changed the tests accordingly.

Sep 29 2018, 6:45 AM