This patch follows the approach also used for MIPS, where we decode the offending instruction to determine if the fault was caused by a read or write operation, as that seems to be the only relevant information we have in the signal context structure to determine that.
Details
Diff Detail
Event Timeline
There are some cases where you need to perform more checking in order to be future proof for possible new standard extensions. Specifically, review the RVC Instruction Set Listings in the RISC-V spec and check for where an encoding is marked as "RES" (reserved). e.g. C.LWSP is valid only if rd!=0, and the version with rd=0 is a reserved encoding. Similar with C.LDSP. I think it's just those two.
I'm not sure that's something we want to do. Those are reserved for HINT instructions. The spec says:
This HINT encoding has been chosen so that simple implementations can ignore HINTs alto-
gether, and instead execute a HINT as a regular computational instruction that happens not to
mutate the architectural state.
If the core truly supports those HINTs then it shouldn't trap. If we are trapping presumably it's because the core is just executing the HINT as a plain load/store. That would typically be a NOP, but the address must be invalid and so we are trapping. In that case we are correctly identifying it as a READ or WRITE.
Compressed instruction encodings marked 'RES' are not the same as those marked 'HINT', so I think my reasoning for checking for those reserved encodings remains valid. From the spec:
Several instructions are only valid for certain operands; when invalid, they are marked either RES to indicate that the opcode is reserved for future standard extensions; NSE to indicate that the opcode is reserved for custom extensions; or HINT to indicate that the opcode is reserved for microarchitectural hints
../../../../libsanitizer/sanitizer_common/sanitizer_linux.cpp:1880:16: error: missing terminating ' character
1880 | case 0b10'010: // c.lwsp (rd != x0)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
That sounds like you are compiling with a compiler that doesn't recognize the ' digit separator. That (and the binary literals) should be supported since C++14. Let me know if that addresses the issue.
The LLVM documentation says:
Unless otherwise documented, LLVM subprojects are written using standard C++14 code and avoid unnecessary vendor-specific extensions.
I'm not finding any indication that compiler-rt or the sanitizers differ from that.
If you wish to argue for a different policy regarding the adoption of C++ standards the correct place for that is the llvm-dev mailing list.