On x86 and Arm, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which gives eight 16-bit lanes operating in parallel. This kind
of information affects how long the instruction takes to execute and what dependencies it may cause.
On RISC-V, however, the data that affects scheduling is encoded in CSRs such as vtype and
vl, in addition to the instruction itself. MCA does not track or use the data in these
registers. This patch fixes this problem.
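To illustrate why this state matters for scheduling, here is a small sketch (not part of the patch) of how many elements a single vector instruction covers. It uses the RVV relationship VLMAX = LMUL * VLEN / SEW. The VLEN of 128 bits is an assumption chosen for illustration, since VLEN is implementation-defined, and the function name is hypothetical.

```python
from fractions import Fraction


def vlmax(vlen_bits: int, sew_bits: int, lmul: Fraction) -> int:
    """Maximum elements per vector instruction: VLMAX = LMUL * VLEN / SEW."""
    return int(lmul * vlen_bits / sew_bits)


# NEON-style fixed encoding: 128-bit Q register, 16-bit elements.
neon_lanes = 128 // 16

# RISC-V: the same vadd.vv encoding covers a different number of elements
# depending on vtype. VLEN=128 is assumed here for illustration only.
m1_elems = vlmax(128, 8, Fraction(1))  # e8, m1
m8_elems = vlmax(128, 8, Fraction(8))  # e8, m8
```

With these assumed parameters, the m8 case covers eight times as many elements as the m1 case, which is the kind of difference a scheduling model needs to know about.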
Consider the following examples:
- No vset{i}vl{i}:
vadd.vv v12, v12, v12
MCA is expected to work on snippets of programs, so it is fine for an llvm-mca user
to pass in a program without a vset{i}vl{i}, as above. The user may still know the LMUL
under which this instruction executes. In that case they can instrument the program
with this information as follows:
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
One design considered was to use a vset{i}vl{i} instead of a comment. The problem is that
the vset{i}vl{i} may take physical cycles, so artificially inserting it would have a tangible impact on
the performance analysis.
- Program with vset{i}vl{i}
vsetvli zero, a0, e8, m1, tu, mu
vadd.vv v12, v12, v12
This program tells MCA how the vtype register is being set,
but MCA is built to ignore the values of registers and immediates. Additionally, we
cannot be sure what value vsetvli writes to vl, since this is implementation specific.
As a result, I propose that a vset{i}vl{i} is treated in the same way as it was before,
and that we still rely on instruments to track the vtype state:
vsetvli zero, a0, e8, m1, tu, mu
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
- Multiple vset{i}vl{i}
vsetvli zero, a0, e8, m1, tu, mu
vadd.vv v12, v12, v12
vsetvli zero, a0, e8, m8, tu, mu
vadd.vv v12, v12, v12
I had considered using a command line option to set the relevant vtype register contents.
But the data in the register can change over the life of the program as demonstrated above.
Using instrument comments also solves this problem:
vsetvli zero, a0, e8, m1, tu, mu
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
vsetvli zero, a0, e8, m8, tu, mu
# LLVM-MCA-RISCV-LMUL M8
vadd.vv v12, v12, v12
This example shows how instrument regions behave: an instrument
is active until the end of the program, or until another instrument of the same type (in this case
the RISCV-LMUL type) is encountered. This means a programmer only needs to insert an instrument after each
vset{i}vl{i} instruction, or at the start of a snippet when the vset{i}vl{i} that governs its
instructions was not included in the program.
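The region rule described above can be sketched as follows. This is a simplified illustration and not the parser added by this patch: the function name, the regular expression, and the dictionary representation are all hypothetical. It only shows the "active until replaced by an instrument of the same type" behavior.

```python
import re

# Matches comments such as "# LLVM-MCA-RISCV-LMUL M1" (assumed simplified form).
INSTRUMENT_RE = re.compile(r"#\s*LLVM-MCA-(?P<kind>[A-Z0-9-]+)\s+(?P<data>\S+)")


def annotate(asm: str):
    """Pair each instruction with the instruments active when it is reached."""
    active = {}  # instrument type -> data, e.g. {"RISCV-LMUL": "M1"}
    annotated = []
    for raw in asm.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = INSTRUMENT_RE.match(line)
        if match and match.group("kind") not in ("BEGIN", "END"):
            # A new instrument replaces any earlier one of the same type.
            active[match.group("kind")] = match.group("data")
            continue
        if line.startswith("#"):
            continue  # other comments carry no instrument data in this sketch
        annotated.append((line, dict(active)))
    return annotated


program = """
vsetvli zero, a0, e8, m1, tu, mu
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
vsetvli zero, a0, e8, m8, tu, mu
# LLVM-MCA-RISCV-LMUL M8
vadd.vv v12, v12, v12
"""
regions = annotate(program)
```

In this sketch the first vadd.vv is paired with M1 and the second with M8, while the first vsetvli has no active instrument because none precedes it.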
- vsetvl
vsetvl rd, rs1, rs2
vadd.vv v12, v12, v12
vsetvl rd, rs1, rs2
vadd.vv v12, v12, v12
Example 2 proposed that we could use the values in the immediates instead of using instrument comments.
But vsetvl reads registers, not immediates, so we need to rely on instruments anyway.
vsetvl rd, rs1, rs2
# LLVM-MCA-RISCV-LMUL M1
vadd.vv v12, v12, v12
vsetvl rd, rs1, rs2
# LLVM-MCA-RISCV-LMUL M8
vadd.vv v12, v12, v12
- Replace CodeRegions with AnalysisRegions
- Add Instrument and InstrumentManager
- Add InstrumentRegions
- Add RISCV Instrument and InstrumentationManager
- Parse Instruments in driver
- Use instruments to override schedule class
- RISCV: use LMUL instrument to override schedule class
- Fix unit tests to pass empty instruments
- Add -ignore-im clopt to disable this change
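As a rough illustration of the "override schedule class" items in the list above, the sketch below picks an LMUL-specific scheduling class when an instrument is active and falls back to the default otherwise. Everything in it is hypothetical: the class names, the tables, and the `use_instruments` flag (standing in for the option that disables this change) are illustrative and do not reflect the data structures in the patch.

```python
# Hypothetical scheduling-class names, for illustration only.
DEFAULT_SCHED_CLASS = {"vadd.vv": "VALU_default"}
LMUL_SCHED_CLASS = {
    ("vadd.vv", "M1"): "VALU_M1",
    ("vadd.vv", "M8"): "VALU_M8",
}


def sched_class(mnemonic: str, instruments: dict, use_instruments: bool = True) -> str:
    """Return the scheduling class, overridden by an active LMUL instrument."""
    default = DEFAULT_SCHED_CLASS[mnemonic]
    if not use_instruments:
        # Instruments disabled: behave as before the change.
        return default
    lmul = instruments.get("RISCV-LMUL")
    # Fall back to the default when no instrument (or an unknown LMUL) is active.
    return LMUL_SCHED_CLASS.get((mnemonic, lmul), default)
```

The fallback keeps programs without instruments working exactly as before, which matches the intent that instruments only refine the analysis when the user supplies them.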
These fields should be private.