This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Fix crash for binary input files with non-ascii names
ClosedPublic

Authored by jhenderson on Mar 1 2021, 1:09 AM.

Details

Summary

The code was using the standard isalnum function which doesn't handle values outside the non-ascii range. Switching to using llvm::isAlnum instead ensures we don't crash.

Diff Detail

Event Timeline

jhenderson created this revision.Mar 1 2021, 1:09 AM
jhenderson requested review of this revision.Mar 1 2021, 1:09 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2021, 1:09 AM

C11 says

The header <ctype.h> declares several functions useful for classifying and mapping characters.198) In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

The crash problem you described may be similar to https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html
With certain values (negative ones?) glibc may crash.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1293–1295

may need an unsigned char cast.

C11 says

The header <ctype.h> declares several functions useful for classifying and mapping characters.198) In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

The crash problem you described may be similar to https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html
With certain values (negative ones?) glibc may crash.

Right, the the problem with the current code before the patch - by using a non-ascii character, some of the bytes of a multi-byte character invariably end up in the undefined range that causes problems with isalnum. It's basically the same as other issues I've been involved with in the past. Using llvm::isAlnum fixes the issue, since it doesn't have the same undefined behaviour.

llvm/tools/llvm-objcopy/ELF/Object.cpp
1293–1295

Why? The signature for isAlnum is bool isAlnum(char C).

jhenderson added inline comments.Mar 4 2021, 2:57 AM
llvm/tools/llvm-objcopy/ELF/Object.cpp
1292–1293

FYI - I've fixed this linter issue locally, just haven't bothered uploading a new diff to resolve it.

MaskRay accepted this revision.Mar 4 2021, 2:33 PM

C11 says

The header <ctype.h> declares several functions useful for classifying and mapping characters.198) In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.

The crash problem you described may be similar to https://drewdevault.com/2020/09/25/A-story-of-two-libcs.html
With certain values (negative ones?) glibc may crash.

Right, the the problem with the current code before the patch - by using a non-ascii character, some of the bytes of a multi-byte character invariably end up in the undefined range that causes problems with isalnum. It's basically the same as other issues I've been involved with in the past. Using llvm::isAlnum fixes the issue, since it doesn't have the same undefined behaviour.

OK, I did not check that llvm::isAlnum accepts char instead of int.

Still worth clarifying that this is fixing UB, which may be a crash on some implementations (glibc? Though I cannot reproduce it).

This revision is now accepted and ready to land.Mar 4 2021, 2:33 PM

Still worth clarifying that this is fixing UB, which may be a crash on some implementations (glibc? Though I cannot reproduce it).

Okay, I'll update the commit message. For reference, this crashed on Windows using VS2017 for me.

This revision was landed with ongoing or failed builds.Mar 5 2021, 1:03 AM
This revision was automatically updated to reflect the committed changes.