Detect identifiers with right-to-left ordering through clang-tidy.
It detects a class of attack similar to the ones described in https://www.trojansource.codes/trojan-source.pdf
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Looks good as far as it goes (though I've not checked your classification functions are correct).
We should have some detection for RTL characters in string literals and comments too, at least when there is no subsequent character with strong LTR directionality. Neither " nor */ has strong directionality, so both א" and א*/ can potentially leave us in an RTL state. For example:
// This is "א" then + then "ג". "א" + "ג" // This is `a` then `/*א*/` then `*` then `-` then `/*ג*/` then `b`, aka "a * - b" not "a - * b". a /*א*/ * - /*ג*/ b
clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.cpp | ||
---|---|---|
2 | Please fix | |
19 | (throughout). | |
134–135 | As in the other patch, we could recover here by skipping until we find a valid UTF-8 lead byte. It's not important here since invalid UTF-8 won't be accepted in an identifier anyway, but might be important if we start checking strings and comments for RTL characters too. |
- fix formatting
- added documentation
- *not* doing the extra work wrt. unicode error recovery
- Add check to list of checks: https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/docs/clang-tidy/checks/list.rst
- Mention check in the Release Notes: https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/docs/ReleaseNotes.rst
AKAIU, the licensing issue doesn't impact that particular review, only the one on confusable identifiers.
Ok! I don't really know what applies when you take part of a file, so I'll leave that up to people who know. I don't know how to remove the "Requested changes" from here so I'll just remove myself from reviewer.
PS: I don't know what AKAIU means, nor can I find the answer in Google :)
That's a valid question, but I guess that using spec content is ok... not 100% sure though.
PS: I don't know what AKAIU means, nor can I find the answer in Google :)
I meant AFAIU, I'm so skilled I can make typo in acronyms :-)
clang-tools-extra/clang-tidy/misc/MisleadingIdentifier.h | ||
---|---|---|
3 | This needs fixing too. |
@serge-sans-paille Any luck with a fix for all the buildbot failures or should we revert ? https://lab.llvm.org/buildbot/#/builders/109/builds/25932
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst | ||
---|---|---|
12 ↗ | (On Diff #385805) | Sphinx can't parse that (perhaps unsurprisingly), could you declare that as having no language? Warning, treated as error: /home/buildbot/llvm-build-dir/clang-tools-sphinx-docs/llvm/src/clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst:12:Could not lex literal_block as "c". Highlighting skipped. Should be reproducible by building docs-clang-tools-html (might need -j1). |
The test seems to trigger an assert: http://45.33.8.238/linux/60293/step_9.txt
Please take a look!
clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp | ||
---|---|---|
4 | @serge-sans-paille Any chance you can remove the include and just forward declare printf()? |
clang-tools-extra/test/clang-tidy/checkers/misc-misleading-identifier.cpp | ||
---|---|---|
4 | Sure, I'll prepare a patch. |
clang-tools-extra/docs/clang-tidy/checks/misc-misleading-identifier.rst | ||
---|---|---|
12 ↗ | (On Diff #385805) | I fixed this in ac8c813b89f62efcaff4d0c92fdf1a5dd29d795d as the bot was red. |
This needs fixing too.