This is an archive of the discontinued LLVM Phabricator instance.

[sanitizer][RISCV] Implement SignalContext::GetWriteFlag for RISC-V
ClosedPublic

Authored by luismarques on Feb 26 2020, 5:08 AM.

Details

Summary

This patch follows the approach also used for MIPS, where we decode the offending instruction to determine if the fault was caused by a read or write operation, as that seems to be the only relevant information we have in the signal context structure to determine that.

Diff Detail

Event Timeline

luismarques created this revision.Feb 26 2020, 5:08 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 26 2020, 5:08 AM
Herald added subscribers: llvm-commits, Restricted Project, evandro and 13 others. · View Herald Transcript

Fix preprocessor conditional compilation block conditions.

asb added a comment.Mar 5 2020, 9:49 AM

There are some cases where you need to perform more checking in order to be future proof for possible new standard extensions. Specifically, review the RVC Instruction Set Listings in the RISC-V spec and check for where an encoding is marked as "RES" (reserved). e.g. C.LWSP is valid only if rd!=0, and the version with rd=0 is a reserved encoding. Similar with C.LDSP. I think it's just those two.

In D75168#1907941, @asb wrote:

There are some cases where you need to perform more checking in order to be future proof for possible new standard extensions. Specifically, review the RVC Instruction Set Listings in the RISC-V spec and check for where an encoding is marked as "RES" (reserved). e.g. C.LWSP is valid only if rd!=0, and the version with rd=0 is a reserved encoding. Similar with C.LDSP. I think it's just those two.

I'm not sure that's something we want to do. Those are reserved for HINT instructions. The spec says:

This HINT encoding has been chosen so that simple implementations can ignore HINTs alto-
gether, and instead execute a HINT as a regular computational instruction that happens not to
mutate the architectural state.

If the core truly supports those HINTs then it shouldn't trap. If we are trapping presumably it's because the core is just executing the HINT as a plain load/store. That would typically be a NOP, but the address must be invalid and so we are trapping. In that case we are correctly identifying it as a READ or WRITE.

asb added a comment.Mar 19 2020, 6:37 AM
In D75168#1907941, @asb wrote:

There are some cases where you need to perform more checking in order to be future proof for possible new standard extensions. Specifically, review the RVC Instruction Set Listings in the RISC-V spec and check for where an encoding is marked as "RES" (reserved). e.g. C.LWSP is valid only if rd!=0, and the version with rd=0 is a reserved encoding. Similar with C.LDSP. I think it's just those two.

I'm not sure that's something we want to do. Those are reserved for HINT instructions. The spec says:

This HINT encoding has been chosen so that simple implementations can ignore HINTs alto-
gether, and instead execute a HINT as a regular computational instruction that happens not to
mutate the architectural state.

If the core truly supports those HINTs then it shouldn't trap. If we are trapping presumably it's because the core is just executing the HINT as a plain load/store. That would typically be a NOP, but the address must be invalid and so we are trapping. In that case we are correctly identifying it as a READ or WRITE.

Compressed instruction encodings marked 'RES' are not the same as those marked 'HINT', so I think my reasoning for checking for those reserved encodings remains valid. From the spec:

Several instructions are only valid for certain operands; when invalid, they are marked either RES to indicate that the opcode is reserved for future standard extensions; NSE to indicate that the opcode is reserved for custom extensions; or HINT to indicate that the opcode is reserved for microarchitectural hints

Properly handle compressed reserved instructions.

Fix copy paste issue in the updated patch.

asb accepted this revision.Mar 26 2020, 4:58 AM

This LGTM, thanks Luis!

This revision is now accepted and ready to land.Mar 26 2020, 4:58 AM
This revision was automatically updated to reflect the committed changes.
schwab added a subscriber: schwab.Jun 7 2020, 12:29 AM

../../../../libsanitizer/sanitizer_common/sanitizer_linux.cpp:1880:16: error: missing terminating ' character
1880 | case 0b10'010: // c.lwsp (rd != x0)

|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~

../../../../libsanitizer/sanitizer_common/sanitizer_linux.cpp:1880:16: error: missing terminating ' character
1880 | case 0b10'010: // c.lwsp (rd != x0)

|                ^~~~~~~~~~~~~~~~~~~~~~~~~~~

That sounds like you are compiling with a compiler that doesn't recognize the ' digit separator. That (and the binary literals) should be supported since C++14. Let me know if that addresses the issue.

schwab added a comment.Jun 7 2020, 3:57 AM

C++14 is much too recent.

C++14 is much too recent.

The LLVM documentation says:

Unless otherwise documented, LLVM subprojects are written using standard C++14 code and avoid unnecessary vendor-specific extensions.

I'm not finding any indication that compiler-rt or the sanitizers differ from that.

schwab added a comment.Jun 7 2020, 6:01 AM

It's breaking users.

It's breaking users.

If you wish to argue for a different policy regarding the adoption of C++ standards the correct place for that is the llvm-dev mailing list.