This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][llvm-mca] Use LMUL Instruments to provide more accurate reports on RISCV
ClosedPublic

Authored by michaelmaitland on Nov 4 2022, 9:37 AM.

Details

Summary

On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a neon instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which leads to eight 16-bit lanes in parallel. This kind
of information impacts how the instruction takes to execute and what dependencies this may cause.

On RISCV however, the data that impacts scheduling is encoded in CSR registers such as vtype or
vl, in addition with the instruction itself. But MCA does not track or use the data in these
registers. This patch fixes this problem.

Consider the following examples:

  1. No vset{i}vl{i}:
vadd.vv v12, v12, v12

MCA is expectd to work on snippets of programs. This means it is okay for an llvm-mca user
to pass in a program without a vset{i}vl{i} as such. But the user may still know the LMUL
that this instruction was executed under. As a solution they can instrument the program
with this information as follows:

# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12

It was considered as a design to use a vset{i}vl{i} instead of a comment. The problem with this is that
the vset{i}vl{i} may take physical cycles, and artifically inserting it would have tangible impact on
the performance analysis.

  1. Program with vset{i}vl{i}
vsetvli zero, a0, e8, m1, tu, mu
vadd.vv v12, v12, v12

This program supplies the information to MCA about how the vtype register is being set,
but MCA is built to ignore the values of registers and immediates. Additionally, we're
not sure what vsetvli sets vl as since this is implementation specific.
As a result, I propose that a vset{i}vl{i} is considered in the same was as it was before
and we still rely on instruments placed to track the vtype state:

vsetvli zero, a0, e8, m1, tu, mu
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
  1. Multiple vset{i}vl{i}
vsetvli zero, a0, e8, m1, tu, mu
vadd.vv v12, v12, v12
vsetvli zero, a0, e8, m8, tu, mu
vadd.vv v12, v12, v12

I had considered using a command line option to set the relevant vtype register contents.
But the data in the register can change over the life of the program as demonstrated above.
Using instrument comments also solves this problem:

vsetvli zero, a0, e8, m1, tu, mu
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
vsetvli zero, a0, e8, m8, tu, mu
# LLVM-MCA-RISCV-LMUL M8
vadd.vv v12, v12, v12

In this example we get a better understanding of the behavior of instrument regions: an instrument
is active until the end of the program, or until another instrument of the same type (in this case
RISCV-LMUL type). This means a programmer only needs to insert instruments after vset{i}vl{i}
instructions, or when a vset{i}vl{i} instruction above another instruction in the sequence was
not included in the program.

  1. vsetvl
vsetvl rd, rs1, rs2
vadd.vv v12, v12, v12
vsetvl rd, rs1, rs2
vadd.vv v12, v12, v12

Example 2 proposed that we could use the values in the immediates instead of using instrument comments.
But vsetvl reads registers, not immediates, so we need to rely on instruments anyway.

vsetvl rd, rs1, rs2
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
vsetvl rd, rs1, rs2
# LLVM-MCA-RISCV-LMUL M8
vadd.vv v12, v12, v12
  • Replace CodeRegions with AnalysisRegions
  • Add Instrument and InstrumentManager
  • Add InstrumentRegions
  • Add RISCV Instrument and InstrumentationManager
  • Parse Instruments in driver
  • Use instruments to override schedule class
  • RISCV use lmul instrument to override schedule class
  • Fix unit tests to pass empty instruments
  • Add -ignore-im clopt to disable this change

Diff Detail

Event Timeline

michaelmaitland created this revision.Nov 4 2022, 9:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2022, 9:37 AM
michaelmaitland requested review of this revision.Nov 4 2022, 9:37 AM

Fix description of InstrumentManager

Hi Michael,

Thanks a lot for this patch!

Two requests:

  1. Please update the offical docs by adding a small paragraph (like we did for CustomBehaviour) describing this new feature and how to use it.
  2. Please add some tests to verify that your new custom behaviour works as intended.

About 2.: you could reuse the code snippets from the summary of this review.
For each test, you can provide two (-instruction-tables) RUN lines:

  • A RUN line with flag -disable-im.
  • A RUN line without flag -disable-im.

That way, we can see how resource pressure changes when instruments are used to override the scheduling class IDs.

I also suggest to add a test where your special "instrument comments" are used in the presence of multiple nested code regions.
This last test is to ensure that lifetime of instruments is correctly handled, and that top-level instruments are correctly associated to nested regions too.

Thanks
-Andrea

llvm/include/llvm/MCA/CustomBehaviour.h
119

These fields should be private.

140–141

This comment should be improved.

This class allows targets to optionally customize the logic that resolves scheduling class IDs.
Targets can use information encoded in so-called Instrument objects to make more informed scheduling decisions.

Something like that... Essentially, a brief description of what this class does.

llvm/tools/llvm-mca/CodeRegion.h
29–55

Could you please also update the llvm-mca documentation?

llvm/tools/llvm-mca/llvm-mca.cpp
234–239

Please update the llvm-mca docs too.

myhsu added a reviewer: myhsu.Nov 5 2022, 12:14 PM
myhsu added a subscriber: myhsu.

I read your patch descriptions with great interest and kudos on listing all cases. I generally agree with your instrument comment / analysis region. I think it's the most practical solution we have for now and also opens a door for creating other kinds of MCA extensions in the future.

On the technical side I dropped some drive-by comments, I'll have a deeper look later. As Andrea mentioned, please add some tests first. I think it's not hard in your case as you already had plenty of examples.

llvm/include/llvm/MCA/CustomBehaviour.h
121

you're already in llvm namespace so you can drop "llvm::". Ditto for rest of the changes.

130

Instrument() = default or even without one should suffice

llvm/lib/Target/RISCV/MCA/RISCVCustomBehaviour.cpp
50

nit: maybe replace this with StringSwitch?

58

okay these lines should definitely be replaced with StringSwitch

79

nit: use == instead? (unlike Java StringRef::operator== and StringRef::equals are the same)

llvm/tools/llvm-mca/CodeRegionGenerator.h
40

nit: add const qualifier

  • Updated docs
  • Made fields in Instrument private
  • Improved classdoc for InstrumentManager
  • Removed llvm namespace qualifier where it wasn't needed
  • Simplified Instrument default constructor
  • Use StringSwitch in RISCVCustomBehaviour
  • Use == instead of .equals on StringRefs
  • add const qualifier on MCACommentConsumer::hadErr

With respect to the request that tests are added:

There is no RISCV vector scheduler model that exists upstream. As a result, any tests I would add
will fail because processor does not support the vector instructions. I propose that we should create
a generic RISCV vector processor that exists upstream in a seperate patch. I think that this
would be beneficial for MCA so I can include those tests, but also for llvm contributors in areas
such as academia. What do you think about this?

  • Updated docs
  • Made fields in Instrument private
  • Improved classdoc for InstrumentManager
  • Removed llvm namespace qualifier where it wasn't needed
  • Simplified Instrument default constructor
  • Use StringSwitch in RISCVCustomBehaviour
  • Use == instead of .equals on StringRefs
  • add const qualifier on MCACommentConsumer::hadErr

With respect to the request that tests are added:

There is no RISCV vector scheduler model that exists upstream. As a result, any tests I would add
will fail because processor does not support the vector instructions. I propose that we should create
a generic RISCV vector processor that exists upstream in a seperate patch. I think that this
would be beneficial for MCA so I can include those tests, but also for llvm contributors in areas
such as academia. What do you think about this?

Sounds like a good idea. Thanks!

I don't have other comments. Patch looks good to me. @myhsu what do you think?

myhsu added a comment.Nov 7 2022, 10:38 PM

I only have minor comments. Otherwise LGTM. Cheers :-)

Also it will be great if you can mark inline comments as done upon addressing them, so that it's easier for both parties to navigate outstanding comments by simply clicking on "X comments" button at the top right corner. Another related tip: you can check those "done" boxes before updating a new diff. Phabricator will automatically submit them after uploading the diff.

llvm/lib/Target/RISCV/MCA/RISCVCustomBehaviour.cpp
99

this creates a copy for each element in IVec and brings some overhead due to extra bookkeeping upon copying a shared_ptr (and when I goes out of scope). I think you can auto &I : IVec (or even const auto& I : IVec) in this case.

106
llvm/lib/Target/RISCV/MCA/RISCVCustomBehaviour.h
33

remove "llvm::". ditto for other occurrences in this patch

llvm/tools/llvm-mca/CodeRegion.h
169

nit: const qualifier

174

nit: use struct instead and remove "public:". Ditto for InstrumentRegions

Longer term, I really do think we need to be able to parse LMUL from vsetvli instructions, and have that influence scheduling of following instructions. I want to be able to take a program trace, and run it through llvm-mca without having to parse the trace and add annotations before doing so. Having said that, I have zero objection to the annotation mechanism being proposed here, and think it is useful. For the actual review, I'll defer to already active reviewers who are a lot more familiar with the internals of MCA that I am. :)

Longer term, I really do think we need to be able to parse LMUL from vsetvli instructions, and have that influence scheduling of following instructions. I want to be able to take a program trace, and run it through llvm-mca without having to parse the trace and add annotations before doing so. Having said that, I have zero objection to the annotation mechanism being proposed here, and think it is useful. For the actual review, I'll defer to already active reviewers who are a lot more familiar with the internals of MCA that I am. :)

In the future, I am not opposed to adding support to use the immediate of the vsetvli instruction so annotations are not needed. But there are a few things we must consider:

  • I plan on adding the ability to generate assembly that contains the annotations (using llc for example), so you wouldn't need to add annotations yourself on code that you generated anyway. This does not mean reading immediate of vsetvli can't happen. In fact, reading immediates would be helpful if you have assembly you did not generate yourself or didn't generate with the expectation to use it with llvm-mca.
  • We would need to be more careful on programs that contained both vsetvli and vsetvl.
  • We would need to consider the case where there was an annotation that had one LMUL value but the vsetvli instruction that was associated with the instrument had a different LMUL value. Of course maybe a user messed up writing instrument by hand, but the other alternative is that they originally emitted code that had one LMUL but they wanted to see what the impact is if they used a different LMUL. I think we would be able to control this behavior with a clopt though.

Overall, I think that this suggestion would make using llvm-mca an even better experience on RISCV vector programs, and I'm confident that all the concerns above can be solved.

andreadb accepted this revision.Nov 10 2022, 2:54 PM

Longer term, I really do think we need to be able to parse LMUL from vsetvli instructions, and have that influence scheduling of following instructions. I want to be able to take a program trace, and run it through llvm-mca without having to parse the trace and add annotations before doing so. Having said that, I have zero objection to the annotation mechanism being proposed here, and think it is useful. For the actual review, I'll defer to already active reviewers who are a lot more familiar with the internals of MCA that I am. :)

In the future, I am not opposed to adding support to use the immediate of the vsetvli instruction so annotations are not needed. But there are a few things we must consider:

  • I plan on adding the ability to generate assembly that contains the annotations (using llc for example), so you wouldn't need to add annotations yourself on code that you generated anyway. This does not mean reading immediate of vsetvli can't happen. In fact, reading immediates would be helpful if you have assembly you did not generate yourself or didn't generate with the expectation to use it with llvm-mca.
  • We would need to be more careful on programs that contained both vsetvli and vsetvl.
  • We would need to consider the case where there was an annotation that had one LMUL value but the vsetvli instruction that was associated with the instrument had a different LMUL value. Of course maybe a user messed up writing instrument by hand, but the other alternative is that they originally emitted code that had one LMUL but they wanted to see what the impact is if they used a different LMUL. I think we would be able to control this behavior with a clopt though.

Overall, I think that this suggestion would make using llvm-mca an even better experience on RISCV vector programs, and I'm confident that all the concerns above can be solved.

Sounds like a good plan.

Thanks a lot for working on this.

Patch LGTM.

This revision is now accepted and ready to land.Nov 10 2022, 2:54 PM
michaelmaitland marked 15 inline comments as done.Nov 10 2022, 4:37 PM
  • drop llvm:: where possible
  • iterate over reference instead of copy of IVec
  • simplify nullptr check
  • add const qualifier
  • change public InstrumentRegion and AnalysisRegion classes to struct
myhsu accepted this revision.Nov 10 2022, 9:26 PM

LGTM thanks

This revision was landed with ongoing or failed builds.Nov 15 2022, 7:55 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Nov 15 2022, 8:02 AM

This doesn't build on Windows: http://45.33.8.238/win/70012/step_4.txt

Please take a look and revert for now if it takes a while to fix.

Apparently doesn't build on linux either: http://45.33.8.238/linux/91480/step_4.txt

Maybe it's "doesn't' build with clang as host compiler"?

Anyways, thanks for the quick revert!

Apparently doesn't build on linux either: http://45.33.8.238/linux/91480/step_4.txt

Maybe it's "doesn't' build with clang as host compiler"?

Anyways, thanks for the quick revert!

It was because building with ninja all didn't build the x86 unit tests so I didn't catch it. I have fixed the issue on my local machine. What is standard procedure after reverting commit? Do I make an entirely new revision or modify this one?

I think most people modify the existing review.

michaelmaitland reopened this revision.Nov 15 2022, 9:12 AM

Reopening so I can fix test cases.

This revision is now accepted and ready to land.Nov 15 2022, 9:12 AM

Fix unit tests so it builds

andreadb accepted this revision.Nov 15 2022, 11:08 AM

Fix unit tests so it builds

LGTM

This revision was landed with ongoing or failed builds.Nov 15 2022, 5:47 PM
This revision was automatically updated to reflect the committed changes.
michaelmaitland reopened this revision.Nov 17 2022, 4:13 PM

Reopening because there was still problems with the build. Not seeing these errors locally nor on the pre-commit checks. Going to investigate and will update once I resolve the issues.

This revision is now accepted and ready to land.Nov 17 2022, 4:13 PM
michaelmaitland marked an inline comment as not done.Nov 17 2022, 5:03 PM
michaelmaitland added inline comments.
llvm/include/llvm/MCA/CustomBehaviour.h
130

I cannot use this constructor form because N4659[dcl.init]p7.4), since:

  • the default constructor for StringRef is not user-provided, following N4659[dcl.fct.def.default]p5
  • the field Desc and Data are not const-default-constructible and has no default member initializer.
michaelmaitland marked an inline comment as not done.
  • Fix Instrument constructor
  • SmallVector<SharedInstrument, 2> -> SmallVector<SharedInstrument>
michaelmaitland marked an inline comment as done.Nov 17 2022, 5:05 PM

I accidentally bled some other changes into here from an unrelated change. I will remove those.

Remove unrelated changes

myhsu accepted this revision.Nov 17 2022, 9:30 PM

I think you'd addressed all errors from the buildbot failure.

llvm/include/llvm/MCA/CustomBehaviour.h
130

thanks for the investigation!

@michaelmaitland I've raised https://github.com/llvm/llvm-project/issues/59091 as we're seeing some MSVC warnings due to multiple inheritance