This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Cortex-M4 schedule
ClosedPublic

Authored by dmgreen on Nov 6 2018, 2:46 AM.

Details

Summary

This patch adds a simple Cortex-M4 schedule, renaming the existing M3
schedule to M4 and filling in the latencies as-per the Cortex-M4 TRM:
https://developer.arm.com/docs/ddi0439/latest

Most of these are 1, with the important exception being loads taking 2
cycles. A few others are also higher, but I don't believe they make a
large difference. I've repurposed the M3 schedule as the latencies are
mostly the same between the two cores, with the M4 having more FP and
DSP instructions. We also turn on MISched and UseAA for the cores that
now use this.

It also adds some schedule Write's to various instruction to make things
simpler.

Diff Detail

Repository
rL LLVM

Event Timeline

dmgreen created this revision.Nov 6 2018, 2:46 AM

Unfortunately, this also increased codesize a little at -Oz, which I will have to look into.

test/CodeGen/Thumb2/ifcvt-no-branch-predictor.ll
122 ↗(On Diff #172722)

This test in the past, seemed to not take into account that the loads would take two cycles. (Or alternatively, the cycle count for this isn't taking into account that the load will be pipelined, only taking a single cycle). I've changed things to an add to hopefully keep the intent of the test.

javed.absar added inline comments.Nov 6 2018, 3:46 AM
lib/Target/ARM/ARM.td
980 ↗(On Diff #172722)

Would it be better to rename CortexM4Model as something more generic e.g. CortexMEfficientModel if it really benefits more than just M4.

lib/Target/ARM/ARMScheduleM4.td
24 ↗(On Diff #172722)

wrong indentation for let

26 ↗(On Diff #172722)

Is this (BufferSize = 0) necessary here for in-order (given, MicroOpBufferSize = 0)?

46 ↗(On Diff #172722)

You might be able to simplify the code below by creating a new local class as:
class Lat1Type<SchedWrite write> : WriteRes<write, [M4Unit]>;
def : Lat1Type<WriteALU>;
def : Lat1Type<WriteALUsi>;
..
OR
foreach write = [WriteALU, Write ....] in

def : Lay1Type<write>;
dmgreen updated this revision to Diff 172913.Nov 7 2018, 3:07 AM
dmgreen marked an inline comment as done.

Cleanup using tablegen classes.

lib/Target/ARM/ARM.td
980 ↗(On Diff #172722)

I think M4 makes sense. It does use the latencies from that core, and the others are just close-enough that it still makes sense to use it. In the same way that we still use the A57 schedule for the A72. I don't believe this model would be used for v6m targets like the M0/M23, or for larger things like the M7.

lib/Target/ARM/ARMScheduleM4.td
24 ↗(On Diff #172722)

I think this is the same as for other schedules, such as the R52 or the AArch64's A53. Unless I am missing what you mean? I will move the M4Unit above it though.

26 ↗(On Diff #172722)

It seems that if BufferSize is left off, it gets a default value of -1 in the generated ARMGenSubtargetInfo file. I added a comment.

javed.absar accepted this revision.Nov 9 2018, 5:52 AM

LGTM. Thanks for this.

lib/Target/ARM/ARMScheduleM4.td
57 ↗(On Diff #172913)

unnecessary extra line

This revision is now accepted and ready to land.Nov 9 2018, 5:52 AM
fhahn added a comment.Mar 7 2019, 9:53 AM

Hi Dave, do you plan to commit this anytime soon?

Hello. This was increasing codesize by more that I'd like, which was why I never committed it. The performance results would probably make it worthwhile, but at -Oz, it's not something people should be paying for.

I had a look into turning this off at minsize, but that felt like a bit of a hack and IIRC there was a test failing because of it. I just re-ran the results and there are some performance regressions I'd like to look into before committing too.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMay 15 2019, 5:39 AM
llvm/trunk/lib/Target/ARM/ARMInstrThumb2.td