This is an archive of the discontinued LLVM Phabricator instance.

[MC] - Don't assert when non-english characters are used.
ClosedPublic

Authored by grimar on Oct 2 2017, 8:07 AM.

Details

Summary

I found that llvm-mc does not like non-english characters even in comments,
which it tries to tokenize.

Problem happens because of functions like isdigit(), isalnum() which takes
int argument and expects it is not negative.
But at the same time MCParser uses char* to store input buffer poiner, char has signed value,
so it is possible to pass negative value to one of functions from above and
that triggers an assert.
Testcase for demonstration is provided.

To fix the issue helper functions were introduced in StringExtras.h.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.Oct 2 2017, 8:07 AM
grimar retitled this revision from [MC] - Don't crash when non-english characters are used. to [MC] - Don't assert when non-english characters are used..
davide edited edge metadata.Oct 2 2017, 8:15 AM

Indeed isdigit should have its argument converted to uchar proor to the call but, is the result you get from mc correct? http://en.cppreference.com/w/cpp/string/byte/isdigit

If so, can you please check the output in your test?

grimar updated this revision to Diff 117356.Oct 2 2017, 8:45 AM

Yes, output is correct. Updated testcase as requested.

rnk edited edge metadata.Oct 2 2017, 9:24 AM

cppreference suggests writing our own wrapper to avoid this issue: http://en.cppreference.com/w/cpp/string/byte/isdigit

I'd suggest adding llvm::isDigit and llvm::isHexDigit in StringExtras.h next to our other hex digit conversion routines, so we don't have to repeat this surprising cast everywhere.

davide added a comment.Oct 2 2017, 9:31 AM
In D38461#885923, @rnk wrote:

cppreference suggests writing our own wrapper to avoid this issue: http://en.cppreference.com/w/cpp/string/byte/isdigit

I'd suggest adding llvm::isDigit and llvm::isHexDigit in StringExtras.h next to our other hex digit conversion routines, so we don't have to repeat this surprising cast everywhere.

Yes, that sounds the best solution actually.

grimar updated this revision to Diff 117503.Oct 3 2017, 5:13 AM
  • Added helpers to StringExtras as was suggested.
ruiu accepted this revision.Oct 3 2017, 4:32 PM
ruiu added a subscriber: ruiu.

LGTM

This revision is now accepted and ready to land.Oct 3 2017, 4:32 PM
grimar edited the summary of this revision. (Show Details)Oct 4 2017, 1:50 AM
This revision was automatically updated to reflect the committed changes.