Page MenuHomePhabricator

[RISCV] Add DAG nodes to represent read/write CSR
Needs ReviewPublic

Authored by sepavloff on Nov 5 2020, 8:03 AM.

Details

Summary

Two custom DAG nodes, READ_CSR and WRITE_CSR, were added. They represent
read and write operations on CSR.

This change also added additional instruction patterns to represent write-only
variants of CSRRW. They are encoded by putting X0 as destination register, so
actually such instruction does not define its destination register. Generic
read-and-write form of CSRRW is not suitable in this case, as it always
produce produce output value.

Diff Detail

Unit TestsFailed

TimeTest
300 msx64 debian > libarcher.races::task-dependency.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-dependency.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-dependency.c
300 msx64 debian > libarcher.races::task-taskgroup-unrelated.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskgroup-unrelated.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskgroup-unrelated.c
270 msx64 debian > libarcher.races::task-taskwait-nested.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-taskwait-nested.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-taskwait-nested.c
240 msx64 debian > libarcher.races::task-two.c
Script: -- : 'RUN: at line 13'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/deflake.bash /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/task-two.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/races/task-two.c
270 msx64 debian > libarcher.task::task-barrier.c
Script: -- : 'RUN: at line 15'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -Wl,-rpath,/mnt/disks/ssd0/agent/llvm-project/build/lib /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp 2>&1 | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/tools/archer/tests/task/Output/task-barrier.c.tmp.log | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/tools/archer/tests/task/task-barrier.c
View Full Test Results (13 Failed)

Event Timeline

sepavloff created this revision.Nov 5 2020, 8:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 5 2020, 8:03 AM
sepavloff requested review of this revision.Nov 5 2020, 8:03 AM

What's the motivation for this?

llvm/lib/Target/RISCV/RISCVInstrInfo.td
409

CSRW_ir might be a better name (though CSRW could be confused with the pseudoinstruction), not a huge fan of _wo.

411

This one has ReadCSR but CSR_ii_wo doesn't; which is it?

1269–1274

Why do we need CSR_i[ir]_wo but not CSR_i[ir]_ro? Either both are needed if you need to be able to specify scheduling for instructions that only do one of read or write, or you don't need either of them, surely?

sepavloff updated this revision to Diff 304443.Nov 11 2020, 2:40 AM

Updated patch

sepavloff marked an inline comment as done.Nov 11 2020, 4:08 AM

What's the motivation for this?

I am working on the implementation of SET_ROUNDIND on RISCV, it is D91242. To set rounding mode it is sufficient to write a proper value to corresponding CSR. CSRRW or CSRRS may be used for that but they set output register. It is not clear how to set a physical register (X0) as an output without making a new class. Moreover it is incorrect to have output register at all in this case because such instruction does not set any register. As read-write and write-only instructions differ in number of outputs, they must be different instructions in MIR. In contrast to them read-only instructions do not need separate MachineInstr as X0 may be specified as input operand.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
409

Changed to CSRW_*.

411

Yes, updated it.

1269–1274

Why do we need CSR_i[ir]_wo but not CSR_i[ir]_ro?

We can specify X0 as an input operand, but it seems there is no simple way to specify X0 as output.

Either both are needed if you need to be able to specify scheduling for instructions that only do one of read or write

It depends on the implementation of the scheduler. Scheduling probably does not depend on the output register, as the hardware instruction in both cases is the same.

I’m pretty sure you can use X0 as the output...

sepavloff marked an inline comment as done.Nov 11 2020, 6:30 AM

I’m pretty sure you can use X0 as the output...

Could you please tell me how I can do that?

Even if there were a way to specify particular register in outputs, fake writes to X0 would create false output dependencies, which would require specific treatment. Using instructions without output is a natural way to represent such cases.

Even if there were a way to specify particular register in outputs

You mean make RISCV::X0 the first operand? I don’t understand what the problem is.

, fake writes to X0 would create false output dependencies, which would require specific treatment. Using instructions without output is a natural way to represent such cases.

LLVM knows it’s a constant register. The correct thing to do IMO is fix any places in LLVM that don’t account for constant registers, if there are any, and then have a generic solution, rather than try and work around any deficiencies by adding special cases to backends every single time it comes up.

, fake writes to X0 would create false output dependencies, which would require specific treatment. Using instructions without output is a natural way to represent such cases.

LLVM knows it’s a constant register. The correct thing to do IMO is fix any places in LLVM that don’t account for constant registers, if there are any, and then have a generic solution, rather than try and work around any deficiencies by adding special cases to backends every single time it comes up.

The fact that X0 is used in destination register field of an instruction to produce write-only variant is peculiarity of RICSV encoding. It does not mean that X0 is defined by the instruction. DAG is a higher layer it tries to abstracts from particular ISA. Results of a DAG node are treated much like results of function calls. So write-only CSR instructions should be represented by different nodes in DAG because they have different number of produced values.

rogfer01 added a comment.EditedJan 6 2021, 10:33 AM

Hi Serge, would it make sense to use a Pseudo for those specific cases with a custom inserter? (usesCustomInserter = 1 in the tablegen definition of the Pseudo)

This way you could have PseudoCSRW and PseudoCSRWI (it looks to me you do not need the other cases, did I get that right?) that you can use in the patterns. Then you can expand them to the existing MachineInstructions CSRRW, CSRRWI, respectively (hope I didn't get the names wrong), that use X0 as the destination register in RISCVTargetLowering::EmitInstrWithCustomInserter.

I understand your concern with X0 potentially defining a false write dependency, but I too understand that we should fix any case in LLVM where constant registers are not handled correctly.

Hi Serge, would it make sense to use a Pseudo for those specific cases with a custom inserter? (usesCustomInserter = 1 in the tablegen definition of the Pseudo)

This way you could have PseudoCSRW and PseudoCSRWI (it looks to me you do not need the other cases, did I get that right?) that you can use in the patterns. Then you can expand them to the existing MachineInstructions CSRRW, CSRRWI, respectively (hope I didn't get the names wrong), that use X0 as the destination register in RISCVTargetLowering::EmitInstrWithCustomInserter.

You describe how to make X0 an output register. I agree, it look like using custom inserter is the only way to make such instruction. This way is substantially more complex than just adding 3 new auxiliary instructions. But not complexity is the main concern.

Making X0 an output register contradicts with design of both DAG and MIR. DAG node and MachineInstr behave as functions, they have clear distinction of input and output operands. Even if a register is both input and output, it is represented by two operands. Functional nature of DAG and MIR is used in many cases, where use-def chains are examined, like lifetime analysis or loop invariant movement. Adding "definitions" for register that is immutable would break the functional nature and might require multiple changes in various parts of compiler.

I understand your concern with X0 potentially defining a false write dependency, but I too understand that we should fix any case in LLVM where constant registers are not handled correctly.

It creates also true dependencies, any use of X0 after CSR write would depend on the latter. For example, if a loop contains an invariant expression that uses X0, it would not be moved if the loop contains CSR write.

Using X0 as output is just a trick to have a new instruction without spending opcode. Actually such instruction does not define X0. What is the benefit of exposing this low-level encoding feature in high-level structures?

craig.topper added inline comments.Jan 9 2021, 10:52 PM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
577

There is no write only version of CSRS/CSRC/CSRSI/CSRCI. There's a read/write and read only version of those.

sepavloff added inline comments.Jan 10 2021, 3:52 AM
llvm/lib/Target/RISCV/RISCVInstrInfo.td
577

Indeed. Thank you for the catch. I will remove them.

Hi Serge,

Using X0 as output is just a trick to have a new instruction without spending opcode. Actually such instruction does not define X0. What is the benefit of exposing this low-level encoding feature in high-level structures?

My suggestion was to avoid the situation where we have two machine instructions that overlap in their semantics. This entails that a later pass that analyses CSRs should take into account those write only forms in addition to the actual instructions. However, maybe this is not a practical issue. The number of CSR instructions is not large. It may also happen that SelectionDAG will never select a CSR write instruction that writes to X0. Or if it does, we would always use the new write-only form that you suggest.

I'm not sure if we would want to prefix those write-only versions with Pseudo? (like it happens with other instructions that exist for the purpose of Codegen).

lenary resigned from this revision.Jan 14 2021, 9:46 AM
sepavloff updated this revision to Diff 320433.Feb 1 2021, 4:08 AM

Removed read-only variants of CSRRC and CSRRS. Rebased.

sepavloff edited the summary of this revision. (Show Details)Feb 1 2021, 4:09 AM

Hi Serge,

Using X0 as output is just a trick to have a new instruction without spending opcode. Actually such instruction does not define X0. What is the benefit of exposing this low-level encoding feature in high-level structures?

My suggestion was to avoid the situation where we have two machine instructions that overlap in their semantics. This entails that a later pass that analyses CSRs should take into account those write only forms in addition to the actual instructions. However, maybe this is not a practical issue. The number of CSR instructions is not large. It may also happen that SelectionDAG will never select a CSR write instruction that writes to X0. Or if it does, we would always use the new write-only form that you suggest.

The new instruction patterns are marked as isCodegenOnly, so they won't appear in MC layer. For example, output of disassembler may not contain them. Any analysis made at this level may be unaware of the new patterns. It is only codegen that must take into account the new pattern. But for it these instructions indeed are different, they have different fundamental properties.

It would be, of course, convenient to have close correspondence between instruction bit representation and instruction object used in internal representation. It is however sometimes not possible and codegens often use isCodegenOnly patterns to cope with gap between encoding and machine IR representation. For example X86 defines different patterns for XOR8 depending on whether it is prefixed with REX prefix. Any analysis made at MachineInstr level need to take into account all variants. And X86 is not an exception.

I'm not sure if we would want to prefix those write-only versions with Pseudo? (like it happens with other instructions that exist for the purpose of Codegen).

Defining isCodegenOnly pattern is a clear and compact solution, Pseudo would require additional code to expand it. Besides it is not clear what could be a replacement in this case. It can't be CSRRW with X0 as defined register.