This is an archive of the discontinued LLVM Phabricator instance.

[FileCheck] Fix isalpha/isalnum calls
ClosedPublic

Authored by jdenny on May 12 2020, 2:50 PM.

Details

Summary

D79276 caused the following builder to fail:

http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/23489

Specifically, FileCheck dumped stack in the following tests:

LLVM :: MC/Mips/micromips-jump-pc-region.s
LLVM :: MC/Mips/mips-jump-pc-region.s

Those tests contained characters encoded as 160 but that render (at
least for me in vim) like a single space (32). Those characters
appeared between the # and RUN: on several lines, and D79276
caused FileCheck to process those lines differently: RUN: is a
comment directive. As a result, D79276 caused FileCheck to start
calling is isalnum on those characters.

The problem is that FileCheck calls isalnum on type char without
casting to unsigned char first, so it sign-extends 160 beyond what
unsigned char or EOF can represent. C says that has undefined
behavior. This problem is general to FileCheck's prefix parsing and
so exists independently of D79276.

524457edbc3d fixed the above tests. This patch changes FileCheck to
use LLVM's replacements for ctype.h functions, and it adds tests for
cases that are representative with or without D79276.

Diff Detail

Event Timeline

jdenny created this revision.May 12 2020, 2:50 PM
jhenderson accepted this revision.May 13 2020, 1:37 AM

LGTM, but with one suggestion.

llvm/test/FileCheck/bad-char.txt
34–37

Nit: I have a marginal preference for where CHECKs like this aren't shared to interleave them with the RUN lines, i.e.

RUN: ... --check-prefix=CHECK1

CHECK1: ...

RUN: ... --check-prefix=CHECK2

CHECK2: ...

It makes it slightly easier to follow each individual case. I might be inclined to move the input similarly, but I don't feel strongly about that one either way.

This revision is now accepted and ready to land.May 13 2020, 1:37 AM
thopre accepted this revision.May 13 2020, 6:24 AM

LGTM with James comment addressed. It seems I'm responsible for this mess since I introduced the first such patterns. My apologies for not paying enough attention to the type in the API.

jdenny marked an inline comment as done.May 13 2020, 7:09 AM

It seems I'm responsible for this mess since I introduced the first such patterns. My apologies for not paying enough attention to the type in the API.

Well, it seems no one who reviewed spotted it, and I doubt I would have thought of it either. :-)

llvm/test/FileCheck/bad-char.txt
34–37

I usually do that and originally wrote this one that way too. However, due to the symmetry among the cases here, I decided the way I have it now would be easier to read.

But it seems I'm outvoted two to one, and it don't feel strongly about it, especially given that my opinion might flip if the test evolves later. If no one objects, I'll change to the following form, with horizontal rules to guide the eye:

//------------------------------------------------
RUN: ... --check-prefix=CHECK1

input1 lines

CHECK1: ...

//------------------------------------------------
RUN: ... --check-prefix=CHECK2

input2 lines

CHECK2: ...

Please use llvm::isAlpha/llvm::isAlnum instead of the functions from ctype.h; ctype.h is locale-dependent.

jdenny updated this revision to Diff 264040.May 14 2020, 11:16 AM
jdenny edited the summary of this revision. (Show Details)

Rebased.
Reformatted test as discussed.
Switched to llvm::isAlpha and llvm::isAlnum.

jdenny marked an inline comment as done.May 14 2020, 11:18 AM

Please use llvm::isAlpha/llvm::isAlnum instead of the functions from ctype.h; ctype.h is locale-dependent.

Thanks! I wasn't aware of those.

This revision was automatically updated to reflect the committed changes.