The code was using the standard isalnum function which doesn't handle values outside the non-ascii range. Switching to using llvm::isAlnum instead ensures we don't crash.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
C11 says
The header <ctype.h> declares several functions useful for classifying and mapping characters.198) In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
The crash problem you described may be similar to https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html
With certain values (negative ones?) glibc may crash.
llvm/tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
1293–1295 | may need an unsigned char cast. |
Right, the the problem with the current code before the patch - by using a non-ascii character, some of the bytes of a multi-byte character invariably end up in the undefined range that causes problems with isalnum. It's basically the same as other issues I've been involved with in the past. Using llvm::isAlnum fixes the issue, since it doesn't have the same undefined behaviour.
llvm/tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
1293–1295 | Why? The signature for isAlnum is bool isAlnum(char C). |
llvm/tools/llvm-objcopy/ELF/Object.cpp | ||
---|---|---|
1292–1293 | FYI - I've fixed this linter issue locally, just haven't bothered uploading a new diff to resolve it. |
OK, I did not check that llvm::isAlnum accepts char instead of int.
Still worth clarifying that this is fixing UB, which may be a crash on some implementations (glibc? Though I cannot reproduce it).
Okay, I'll update the commit message. For reference, this crashed on Windows using VS2017 for me.
FYI - I've fixed this linter issue locally, just haven't bothered uploading a new diff to resolve it.