Page MenuHomePhabricator

andreadb (Andrea Di Biagio)
User

Projects

User does not belong to any projects.

User Details

User Since
May 9 2013, 11:10 AM (413 w, 2 d)

Recent Activity

Wed, Apr 7

andreadb accepted D98174: [MCA] Add tests for IPC on Cortex-A55.

LGTM

Wed, Apr 7, 1:15 PM · Restricted Project

Fri, Mar 26

andreadb added a comment to D98174: [MCA] Add tests for IPC on Cortex-A55.

For the non XFAIL tests, I suggest to use the usual python script to auto generate CHECK directives.

Fri, Mar 26, 8:32 AM · Restricted Project

Thu, Mar 25

andreadb added inline comments to D99339: [MCA] Support carry-over instructions for in-order processors.
Thu, Mar 25, 1:03 PM · Restricted Project
andreadb accepted D99339: [MCA] Support carry-over instructions for in-order processors.

LGTM modulo a few nits (see comments below).

Thu, Mar 25, 8:02 AM · Restricted Project

Wed, Mar 24

andreadb committed rG97a00b7b2083: [MCA] Fix for uninitialised member in constructor. NFC (authored by andreadb).
[MCA] Fix for uninitialised member in constructor. NFC
Wed, Mar 24, 4:23 AM

Tue, Mar 23

andreadb accepted D98628: [MCA] Disable RCU for InOrderIssueStage.

LGTM. Thanks Andrew!

Tue, Mar 23, 1:03 PM · Restricted Project
andreadb added a comment to D98628: [MCA] Disable RCU for InOrderIssueStage.

Hey Andrew,

Tue, Mar 23, 8:26 AM · Restricted Project
andreadb committed rGf5bdc88e4d2b: [MCA] Improved handling of negative read-advance cycles. (authored by andreadb).
[MCA] Improved handling of negative read-advance cycles.
Tue, Mar 23, 7:49 AM

Mon, Mar 22

andreadb added inline comments to D98628: [MCA] Disable RCU for InOrderIssueStage.
Mon, Mar 22, 9:36 AM · Restricted Project
andreadb added a comment to D98906: [X86] Improve lowering of the unrolled inline-asm probing.

Be careful about referencing stack locations below the red-zone.

The x86-64 ABI mandates a 128 bytes red zone. Your new instruction would effectively write a zero at a location below the red-zone.
That region of memory is not reserved, and should be considered volatile. For example: signals and interrupt handlers are allowed to modify it.

Isn't the volatility unimportant for the purposes of stack probing? The write exists only to poke the page (lightly) to ensure there's no page that is not rw, probing does not particularly care if the data at the relevant addresses are overwritten, or even that the byte is written to the address in the first place (as long as it triggers page permission checks). The only concern could be that some signal handler or an interrupt put some data there, but when these are running the probing itself is suspended, isn't it?

Mon, Mar 22, 9:12 AM · Restricted Project
andreadb added a comment to D98906: [X86] Improve lowering of the unrolled inline-asm probing.

Be careful about referencing stack locations below the red-zone.

Mon, Mar 22, 8:01 AM · Restricted Project

Fri, Mar 19

andreadb accepted D98628: [MCA] Disable RCU for InOrderIssueStage.

LGTM.

Fri, Mar 19, 12:24 PM · Restricted Project
andreadb added a comment to D98628: [MCA] Disable RCU for InOrderIssueStage.

I personally don't like the idea of making the RCU optional in the RetireStage. The is no point in having a RetireStage if there is no RCU.

Fri, Mar 19, 3:22 AM · Restricted Project

Wed, Mar 17

andreadb accepted D98604: [MCA] Ensure that writes occur in-order.

Thanks a lot Andrew.

Wed, Mar 17, 3:57 AM · Restricted Project

Tue, Mar 16

andreadb added a comment to D98604: [MCA] Ensure that writes occur in-order.

Hi Andrew,

Tue, Mar 16, 5:18 AM · Restricted Project

Mon, Mar 15

andreadb requested changes to D98628: [MCA] Disable RCU for InOrderIssueStage.

There is nothing wrong with using a dummy RCU for the InOrderIssue stage.

Mon, Mar 15, 5:37 AM · Restricted Project
andreadb added inline comments to D98604: [MCA] Ensure that writes occur in-order.
Mon, Mar 15, 4:33 AM · Restricted Project

Mar 11 2021

andreadb added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

OK. So X86 isn't a concern because it isn't using MachineScheduler (at least not post-RA).

Indeed, after looking a bit to brainstorm how MachineScheduler would change, I note that only when BufferSize of a resource is 0 does MachineScheduler update any of the resource usage (ReservedCycles). And with a bit more digging, there is currently only one upstream scheduling model which uses both BufferSize = 0 and resource groups -- and the resource group itself doesn't have BufferSize=0. So, a change to MachineScheduler to respect groups when BufferSize = 0 doesn't look like it's likely to cause much disruption.

I'll go try that out and see what happens.

Mar 11 2021, 9:35 AM · Restricted Project
andreadb added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, the logic that does the bookkeping of resource cycles for groups).

More in general, an algorithm cannot ignore the resource cycle contributions of individual units to a group. Otherwise, group latencies are incorrectly computed.

That is certainly what I would have expected resource groups to do.

I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles. And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW. What it cares about is the list of resources and cycles in the WriteProcResTable.

The number of cycles reported in WriteProcResTable is not a problem here. In fact, it is actually correct and it should be 2.
For groups, the number of resource cycles reported by WriteProcResTable doesn't necessarily translate to actual latency. Some of (if not all) the resource cycles consumed by a group may often map to the same runtime cycle. That is because each individual unit starts consumption at relative cycle #0. So there is clearly an overlap. It implies that resource cycles for groups can be consumed in parallel.

In WriteProcResTable, group resource cycles are computed by summing all the individual contributions from all the resource units (that, plus any extra cycles explicitly declared for the group). That's how you end up with 2cy for M7UnitVPort, and that is correct.

That is the computation I'm seeing.

If MachineScheduler believes that those 2 resource cycles translates to a 2cy latency, then that's a bug in MachineScheduler.

In your case, group M7UnitVPort is consumed for 2 "resource cycles". However, it doesn't mean that the group will only be available every other cycle. Those two resource cycles are contributed by M7UnitVPort0 and M7UnitVPort1 (one resource cycle each), and the resource unit consumption always happens at relative cycle #0. In reality, those two resource cycles are effectively the same cycle (i.e. units are consumed in parallel for 1cy).

That's what I don't see happening in MachineScheduler. It sees the group as if it were a separate resource which is consumed for two cycles and doesn't try to find parallel units within the group to provide those two cycles of resource consumption. As far as I can tell, the concept of groups does not exist at all in MachineScheduler.

Again, apologies if I still don't get the full picture. But I strongly believe at this point that there might be a wrong assumption in the MachineScheduler on how resource cycles are set for groups.

I think where we're getting to is that MachineScheduler has never been made to work "as expected" with groups. If groups are the accepted way to specify this sort of resource usage, then making MachineScheduler use group information would seem to be preferable to adding a new annotation. However, I do have one reservation. Groups are used fairly widely at present -- I see them in PowerPC, X86, AArch64, and ARM. I am not at all sanguine about changing something which would have such widespread effects without more people chiming in. Any idea who else should be part of this discussion?

Mar 11 2021, 4:28 AM · Restricted Project

Mar 10 2021

andreadb added a comment to D98356: [MCA] Support in-order CPUs with MicroOpBufferSize=1.

is this patch needed because otherwise you see the following crash/assertion failure?

Yes, that was the failure, and I just did the minimum work I could to make MicroOpBufferSize=1 work like MicroOpBufferSize=0. I don't really know what reorder buffers and retire control units are -- they are not part of my mental model of how the AMDGPU processors work.

Mar 10 2021, 12:49 PM · Restricted Project
andreadb accepted D98356: [MCA] Support in-order CPUs with MicroOpBufferSize=1.

Hi Jay,

Mar 10 2021, 12:09 PM · Restricted Project

Mar 9 2021

andreadb added a comment to D98174: [MCA] Add tests for IPC on Cortex-A55.

One last question.
What is the plan with the SDIV test cases? I don't think that there is anything that we can do to improve that simulation, since it would require knowledge that isn't available at simulation time. The risk is to end up with a test which isn't very useful in practice (it will always be marked as XFAIL). In which case, I suggest to remove those DIV tests entirely.

They are not very useful as "tests", I agree, but they can be useful as a documentation,
highlighting the cases where MCA and hardware do not match. Although there is no point
in having two tests for the same issue, so we can remove one of them.

Mar 9 2021, 7:53 AM · Restricted Project
andreadb added a comment to D98174: [MCA] Add tests for IPC on Cortex-A55.

Ops.. I have accidentally "accepted" this patch.
Basically it LGTM if you remove the SDIV tests for now as I don't think that there is value in having them (unless you prove me wrong).

Mar 9 2021, 6:20 AM · Restricted Project
andreadb accepted D98174: [MCA] Add tests for IPC on Cortex-A55.

One last question.
What is the plan with the SDIV test cases? I don't think that there is anything that we can do to improve that simulation, since it would require knowledge that isn't available at simulation time. The risk is to end up with a test which isn't very useful in practice (it will always be marked as XFAIL). In which case, I suggest to remove those DIV tests entirely.

Mar 9 2021, 6:19 AM · Restricted Project
andreadb added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

Mar 9 2021, 3:15 AM · Restricted Project
andreadb added inline comments to D98174: [MCA] Add tests for IPC on Cortex-A55.
Mar 9 2021, 1:44 AM · Restricted Project

Mar 8 2021

andreadb added inline comments to D98174: [MCA] Add tests for IPC on Cortex-A55.
Mar 8 2021, 6:50 AM · Restricted Project

Mar 2 2021

andreadb added inline comments to D94928: [llvm-mca] Add support for in-order CPUs.
Mar 2 2021, 11:34 AM · Restricted Project
andreadb accepted D94928: [llvm-mca] Add support for in-order CPUs.

LGTM.

Mar 2 2021, 7:30 AM · Restricted Project
andreadb added inline comments to D94928: [llvm-mca] Add support for in-order CPUs.
Mar 2 2021, 5:48 AM · Restricted Project
andreadb added a comment to D94928: [llvm-mca] Add support for in-order CPUs.

Now that https://reviews.llvm.org/D95954 has been fixed, there are only a few things left to do:

  1. Disable the bottleneck analysis for in-order processors.
  2. Add a couple of lines to the release notes describing this new feature.
  3. Update the llvm-mca docs.

Thank you! Can you please check if wording in the docs is ok?

Regarding the bottleneck analysis: it seems to work for some cases
(A55-all-views.s), but not for others (A55-out-of-order-retire.s).
I will check what is wrong here. The feature is disabled for now as you
suggested.

Mar 2 2021, 5:30 AM · Restricted Project

Feb 22 2021

andreadb closed D86177: [llvm-mca][NFC] Separate calculation of display data from its display in the summary and instruction info views.

Not sure why this review wasn't automatically closed.
This was committed back in August as cf6adecd6a8718ee2737ca55e4cd938364b984cc

Feb 22 2021, 2:57 AM · Restricted Project

Feb 19 2021

andreadb added inline comments to D94928: [llvm-mca] Add support for in-order CPUs.
Feb 19 2021, 3:10 AM · Restricted Project

Feb 17 2021

andreadb added a comment to D94928: [llvm-mca] Add support for in-order CPUs.

Now that https://reviews.llvm.org/D95954 has been fixed, there are only a few things left to do:

Feb 17 2021, 6:12 AM · Restricted Project

Feb 4 2021

andreadb added a comment to D95997: [NFC][llvm-mca] Fix compiler warning.

No problem. Your change was good.
Thanks for fixing the warning.

Feb 4 2021, 1:49 PM · Restricted Project
andreadb added a comment to D95997: [NFC][llvm-mca] Fix compiler warning.

Not sure I understand why you would need an approval now. You have already committed this change.
See: https://reviews.llvm.org/rG91e7a17133324ac4beaf6ed45c170436c2d91c98

Feb 4 2021, 10:49 AM · Restricted Project
andreadb added a comment to D95321: [NFC] Fixing build warning with llvm-mca.

@plotfi could you please commit this?

Feb 4 2021, 1:11 AM · Restricted Project
andreadb added a comment to D95997: [NFC][llvm-mca] Fix compiler warning.

This is a duplicate of D95321.
No idea why it is taking so long for that other patch to go in.

Feb 4 2021, 1:11 AM · Restricted Project

Jan 27 2021

andreadb added a comment to D94928: [llvm-mca] Add support for in-order CPUs.

Thanks for the updated patch.

Jan 27 2021, 9:29 AM · Restricted Project

Jan 25 2021

andreadb added a comment to D95321: [NFC] Fixing build warning with llvm-mca.

LGTM.

For the record, that warning was firstly introduced by another commit which was meant to be an NFC refactoring
in preparation for the most recent work that added support for the generation of JSON files.

commit cf6adecd6a8718ee2737ca55e4cd938364b984cc

Citation needed.
Even the warning says otherwise.

Jan 25 2021, 4:12 AM · Restricted Project
andreadb added a comment to D95321: [NFC] Fixing build warning with llvm-mca.

BTW, https://reviews.llvm.org/D86644?id=318269 likely added a few more of these bugs.

Jan 25 2021, 3:53 AM · Restricted Project
andreadb accepted D95321: [NFC] Fixing build warning with llvm-mca.

For the record, that warning was firstly introduced by another commit which was meant to be an NFC refactoring
in preparation for the most recent work that added support for the generation of JSON files.

Jan 25 2021, 3:45 AM · Restricted Project

Jan 21 2021

andreadb accepted D86644: [llvm-mca] Initial implementation of output serialization using JSON.

LGTM.

Jan 21 2021, 11:42 AM · Restricted Project
andreadb added inline comments to D86644: [llvm-mca] Initial implementation of output serialization using JSON.
Jan 21 2021, 4:33 AM · Restricted Project
andreadb added a comment to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

I can see that InstructionView has been moved outside of View.h into its own file, but I cannot see that new file anywhere in this diff.

Jan 21 2021, 4:19 AM · Restricted Project

Jan 20 2021

andreadb added inline comments to D94928: [llvm-mca] Add support for in-order CPUs.
Jan 20 2021, 1:02 PM · Restricted Project
andreadb added a comment to D94928: [llvm-mca] Add support for in-order CPUs.

Thanks for the review Andrea!

Your model assumes an unbounded queue of instructions (something like rudimental reservation station) where to store dispatched instructions.

If you mean InstQueue, then it is bounded by Bandwidth variable - the
maximum number of instructions that can be issued in the next cycle.

Correct me if I am wrong, but in-order processor don't use a reservation station.
In the absence of structural hazards, if data dependencies are met, then uOPs are directly issued to the underlying execution units.
So the dispatch event is not decoupled from the issue event.

The fact that your patch adds an unbounded queue sounds a bit strange to me. Not sure what @dmgreen
thinks about it. But this basically means that dispatch and issue are different events.

That is true. However, the problem here is that MCA timeline view counts stalls
as a number of cycles between dispatch and issue events. If dispatch and issue
always happen in the same cycle, stalls are not displayed:

[0,3]     .  DeeER  .    .    add	w13, w30, #1
[0,4]     .  DeeeER .    .    smulh	x30, x29, x28
[0,5]     .     DeeeER   .    smulh	x27, x30, x28
[0,6]     .        DeeeER.    smulh	xzr, x27, x26
[0,7]     .    .    DeeeER    umulh	x30, x29, x28

To avoid this, the implementation emits a dispatch event for instructions that
should be executed in the next cycle. If an instruction is unable to execute due
to a hazard, it is delayed and a stall is displayed starting from the dispatch
event:

[0,3]     . DeeeER  .    .    add	w13, w30, #4095, lsl #12
[0,4]     . DeeeeER .    .    smulh	x30, x29, x28
[0,5]     .  D==eeeeER   .    smulh	x27, x30, x28
[0,6]     .  D=====eeeeER.    smulh	xzr, x27, x26
[0,7]     .    .  D=eeeeER    umulh	x30, x29, x28

I remember that I did this intentionally, but now I'm not really convinced that
this difference is worth extra complexity. Let me know what you think about
this.

Jan 20 2021, 10:37 AM · Restricted Project

Jan 19 2021

andreadb added a comment to D94928: [llvm-mca] Add support for in-order CPUs.

Thanks for working on this feature!

Jan 19 2021, 8:19 AM · Restricted Project
andreadb added reviewers for D94928: [llvm-mca] Add support for in-order CPUs: dmgreen, evgeny777.
Jan 19 2021, 5:59 AM · Restricted Project
andreadb added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see example below)

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.

Unfortunately, I have tried doing this with a resource group with no success. ExpandProcResources ends up marking the resource group as used for multiple cycles:

From CortexM7ModelSchedClasses:

{DBGFIELD("IIC_fpFMAC64_WriteFPMAC64_ReadFPMAC_ReadFPMUL_ReadFPMUL") 1, true, false, 161, 4, 795, 1, 132, 3}, // #136

From ARMWriteProcResTable:

{ 9,  1}, // #161
{10,  2}, // #162
{11,  1}, // #163
{12,  1}, // #164

From CortexM7ModelProcResources:

{"M7UnitVFP",       1, 0, 0, nullptr}, // #9
{"M7UnitVPort",     2, 0, 0, CortexM7ModelProcResourceSubUnits + 1}, // #10
{"M7UnitVPort0",    1, 0, 0, nullptr}, // #11
{"M7UnitVPort1",    1, 0, 0, nullptr}, // #12

In the end, the test in lines 1139-1140 of SubTargetEmitter.cpp forces multiple uses of a resource -- whether they be explicitly stated in an InstRW, implied by using different resources in a resource group, or hierarchically stated as using subunits of the resource -- to take multiple cycles. That test seems so fundamental to the way that current schedule descriptions work that it seemed better to introduce the additional Uses notation than to change it.

Jan 19 2021, 5:05 AM · Restricted Project

Jan 18 2021

andreadb added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

Jan 18 2021, 3:31 AM · Restricted Project

Jan 16 2021

andreadb added a comment to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

I really like this JSON format. Thanks a lot!

Jan 16 2021, 5:11 AM · Restricted Project

Dec 18 2020

andreadb requested changes to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

The json output is better now.
However there are things about the implementation that must be fixed/changed (see my comments below).

Dec 18 2020, 6:03 AM · Restricted Project
andreadb added inline comments to D86644: [llvm-mca] Initial implementation of output serialization using JSON.
Dec 18 2020, 5:31 AM · Restricted Project

Nov 25 2020

andreadb accepted D92026: [TableGen][SchedModels] Get rid of hasVariant. NFC.

Thanks for checking.

Nov 25 2020, 2:57 AM · Restricted Project

Nov 24 2020

andreadb added a comment to D92026: [TableGen][SchedModels] Get rid of hasVariant. NFC.

When you say that " Subtarget files are unchanged by the patch", does it mean that you have checked all the targets or just ARM?

I checked ARM, AArch64 and X86. Does any other have scheduling enabled?

Nov 24 2020, 9:01 AM · Restricted Project
andreadb added a comment to D92026: [TableGen][SchedModels] Get rid of hasVariant. NFC.

I've only skimmed through the code, but it looks like a nice improvement.

Nov 24 2020, 7:33 AM · Restricted Project
andreadb accepted D91704: [llvm-mca] Fix processing thumb instruction set.

LGTM

Nov 24 2020, 7:16 AM · Restricted Project
andreadb added a comment to D91704: [llvm-mca] Fix processing thumb instruction set.

A couple of minor nits. Otherwise the llvm-mca change looks good.
Thanks.

Nov 24 2020, 1:51 AM · Restricted Project

Nov 23 2020

andreadb added a comment to D91704: [llvm-mca] Fix processing thumb instruction set.

Thanks for the updated patch.

Nov 23 2020, 4:17 AM · Restricted Project

Nov 18 2020

andreadb added a comment to D91410: [llvm][clang][mlir] Add checks for the return values from Target::createXXX to prevent protential null deref.

LGTM for llvm-mca.

Nov 18 2020, 4:36 PM · Restricted Project, Restricted Project, Restricted Project
andreadb added inline comments to D91704: [llvm-mca] Fix processing thumb instruction set.
Nov 18 2020, 4:34 PM · Restricted Project

Oct 31 2020

andreadb committed rG0e20666db3ac: [MCA][LSUnit] Correctly update the internal group flags on store barrier… (authored by andreadb).
[MCA][LSUnit] Correctly update the internal group flags on store barrier…
Oct 31 2020, 5:01 AM

Oct 26 2020

andreadb added a comment to D90150: [ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate.

Much better. Thanks a lot :)

Oct 26 2020, 7:46 AM · Restricted Project
andreadb added a comment to D90150: [ARM][SchedModels] Convert IsLdstsoScaledNotOptimalPred to MCSchedPredicate.

The new predicate looks good. I only have a couple of nits (see below).

Oct 26 2020, 7:08 AM · Restricted Project

Oct 23 2020

andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

I'm happy with that explanation, and would be happy with this patch going ahead. The mir test above seems to show it failing before and working now.

Oct 23 2020, 10:41 AM · Restricted Project
andreadb accepted D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .
Oct 23 2020, 10:25 AM · Restricted Project
andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

So, it doesn't look like the lowering is doing anything odd with the operand sequence. It should be fine.
At this point, the only explanation is that the original predicate was never really executed for that write variant. So it was never tested. I don't see other alternatives honestly.

Oct 23 2020, 10:17 AM · Restricted Project
andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

(paranoia level).
Are we sure that the lowering of MachineInstr to MCInst is preserving the operand sequence? Can it be that the immediate is at position 4 for the MCInst only?
I have no idea how an ldrbt looks like as a MachineInstr. The original check should have triggered an assertion too for MachineInstr then...

I was not aware that was a thing that could happen!

I'm unsure if anything would ever generate an ldrbt with negative postinc reg and shift from codegen, so it might be difficult to test. And may never have been tested in the past.

Oct 23 2020, 9:51 AM · Restricted Project
andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

Should it always have been checking operand 4 then? I think that makes sense

IMO, it was incorrectly checking for operand 3. I've got an assertion when switched to MC pred

Oct 23 2020, 8:56 AM · Restricted Project
andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

Should it always have been checking operand 4 then? I think that makes sense

(More of a commit message might make it more obvious why changes are being made ;) )

Oct 23 2020, 8:47 AM · Restricted Project
andreadb added a comment to D90045: [ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate.

Nice!
The predicates look good to me.

Oct 23 2020, 8:23 AM · Restricted Project
andreadb added inline comments to D90045: [ARM][SchedModels] Convert IsLdrAm3NegRegOffPred to MCSchedPredicate.
Oct 23 2020, 7:32 AM · Restricted Project
andreadb added a comment to D90029: [ARM][SchedModels] Convert IsLdstsoMinusRegPred to MCSchedPredicate.

The new predicate looks good to me.

Oct 23 2020, 7:22 AM · Restricted Project
andreadb added a comment to D90024: [ARM][SchedModels] Get rid of IsLdrAm2ScaledPred .

Could you please add more context to the diff? I don't remember how IsLdstsoScaledPredX2 was defined.

Oct 23 2020, 7:18 AM · Restricted Project
andreadb accepted D90017: [ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC.

LGTM.

Oct 23 2020, 3:07 AM · Restricted Project

Oct 22 2020

andreadb added a comment to D89939: [ARM][SchedModels] Convert IsLdstsoScaledPred to MCSchedPredicate.

Nice patch!
The new predicate looks good.

Oct 22 2020, 2:37 AM · Restricted Project

Oct 21 2020

andreadb added a comment to D89876: [ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate.

Addressed

Oct 21 2020, 9:48 AM · Restricted Project
andreadb added inline comments to D89876: [ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate.
Oct 21 2020, 9:30 AM · Restricted Project
andreadb added a comment to D89876: [ARM][SchedModels] Convert IsLdrAm3RegOffPred to MCSchedPredicate.

Hi Eugene,

Oct 21 2020, 7:47 AM · Restricted Project

Oct 19 2020

andreadb added inline comments to D89460: [ARM][SchedModels] Convert IsCPSRDefinedPred to MCSchedPredicate.
Oct 19 2020, 3:02 AM · Restricted Project

Oct 17 2020

andreadb added a comment to D89458: [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate.

I have been thinking a bit more about this patch.
D89553 is surprisingly less complicated than I have originally thought it was. It definitely touches more files. However the majority of the changes are very mechanical (mostly boilerplate required for the new MCSchedPredicate to work). It is also a non-controversial change, since it doesn't modify the layout of MCInst.
Bonus point: by changing the signature of resolveVariantSchedClass() we potentially enable future developments in the area of MCSchedPredicate (and mca). Basically that change would make it possible for users to declare new predicates on MCinstrDesc (which is definitely not a bad thing!).

Oct 17 2020, 4:10 AM · Restricted Project
andreadb accepted D89553: [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate (alternative approach).

The patch is mostly a mechanical change (i.e. a lot of boilerplate to introduce a new scheduling predicate).

Oct 17 2020, 2:58 AM · Restricted Project

Oct 16 2020

andreadb added a comment to D89458: [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate.

For the record, this is how your patch would look like if you decided to add a new predicate.


I am just trying to be helpful by showing how the alternative approach would look like (in case, if people decide that it is not a good idea to add an extra field to MCInst).

Oct 16 2020, 5:36 AM · Restricted Project
andreadb added inline comments to D89458: [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate.
Oct 16 2020, 1:46 AM · Restricted Project

Oct 15 2020

andreadb added a comment to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

Hey Wolfgang,
are you still looking at this patch? Do you have any updates?

Oct 15 2020, 3:27 PM · Restricted Project
andreadb added inline comments to D89458: [ARM][SchedModels] Convert IsPredicatedPred to MCSchedPredicate.
Oct 15 2020, 3:23 PM · Restricted Project

Sep 2 2020

andreadb accepted D87033: [APInt] New member function setBitVal.

Looks good to me.

Sep 2 2020, 12:22 PM · Restricted Project

Aug 28 2020

andreadb added a comment to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

Can it simply be autogenerated, without omitting details?

It would have to be auto-generated based on the readable output, then, using some sort of (python) script that creates the structure independently, or did you have something else in mind?
Aug 28 2020, 8:47 AM · Restricted Project

Aug 26 2020

andreadb added a reviewer for D86644: [llvm-mca] Initial implementation of output serialization using JSON: RKSimon.
Aug 26 2020, 12:32 PM · Restricted Project
andreadb added a comment to D86644: [llvm-mca] Initial implementation of output serialization using JSON.

Thanks Wolfgang for working on this.

Aug 26 2020, 12:26 PM · Restricted Project

Aug 25 2020

andreadb added inline comments to D86390: [llvm-mca][NFC] Refactor instruction printing.
Aug 25 2020, 1:38 AM · Restricted Project
andreadb accepted D86390: [llvm-mca][NFC] Refactor instruction printing.

Looks good to me module a few nits (see below).

Aug 25 2020, 1:06 AM · Restricted Project

Aug 21 2020

andreadb accepted D86177: [llvm-mca][NFC] Separate calculation of display data from its display in the summary and instruction info views.

LGTM

Aug 21 2020, 8:59 AM · Restricted Project

Aug 20 2020

andreadb added a comment to D86177: [llvm-mca][NFC] Separate calculation of display data from its display in the summary and instruction info views.

Nice!

Thank you Andrea!

I agree with Roman in that we should not guarantee stability of data and/or structure.
Also, changes to the output structure should always be advertised (for example, by adding a line in the release notes), so that people are always aware of it.

I must confess I am not very informed about the issues regarding stability. Doesn't data stability depend on the scheduling model? I don't see how llvm-mca could guarantee it if that changes.

Aug 20 2020, 1:25 AM · Restricted Project

Aug 19 2020

andreadb added a comment to D86177: [llvm-mca][NFC] Separate calculation of display data from its display in the summary and instruction info views.

Thanks Wolfgang! I always wanted to add a structured output to llvm-mca.
Thanks for working on this.

Aug 19 2020, 2:04 AM · Restricted Project

Aug 12 2020

andreadb added a comment to D85165: [X86][MC][Target] Initial backend support a tune CPU to support -mtune.

Cheers. Now I have a better understanding of why you wanted to check latencies.

Aug 12 2020, 12:31 PM · Restricted Project
andreadb added a comment to D85165: [X86][MC][Target] Initial backend support a tune CPU to support -mtune.

@andreadb @RKSimon or @efriedma do any of you have suggestions for simple scheduler tests for this? I was hoping I could use -print-schedule like we used to but that no longer exists.

Aug 12 2020, 11:12 AM · Restricted Project

May 19 2020

andreadb committed rG0980c9c6f155: [X86] Split masked integer vector stores into vXi32/vXi64 variants (PR45975). (authored by andreadb).
[X86] Split masked integer vector stores into vXi32/vXi64 variants (PR45975).
May 19 2020, 10:23 AM
andreadb closed D80201: [X86] Split masked integer vector stores into vXi32/vXi64 variants (PR45975). NFC.
May 19 2020, 10:22 AM · Restricted Project