This is an archive of the discontinued LLVM Phabricator instance.

AsmParser: Require null terminator in the created memory buffer.
ClosedPublic

Authored by arphaman on May 20 2015, 9:59 AM.

Details

Summary

This patch modifies the memory buffer creation in the AsmParser library so that is requires a terminating null character.
This change is needed because LLLexer in AsmParser checks for EOF when it sees the null character, but the memory buffer
that's being created with the source assumes that the terminating null character isn't required.

I was motivated to submit this patch as my initial MIR commit had the problem where the input to the AsmParser wasn't null
terminated and it would have been better to catch it with an assertion early on.

Diff Detail

Repository
rL LLVM

Event Timeline

arphaman updated this revision to Diff 26156.May 20 2015, 9:59 AM
arphaman retitled this revision from to AsmParser: Require null terminator in the created memory buffer..
arphaman updated this object.
arphaman edited the test plan for this revision. (Show Details)
arphaman added reviewers: dexonsmith, rafael.
arphaman set the repository for this revision to rL LLVM.
arphaman added a subscriber: Unknown Object (MLST).
MatzeB added a subscriber: MatzeB.May 20 2015, 11:11 AM

That's unfortunate. It's probably better to change LLLexer to not require a 0 at the end. From what I can see the interface takes a StringRef which has a length and is not necessarily 0-terminated anyway.

arphaman updated this revision to Diff 26170.May 20 2015, 12:36 PM
arphaman retitled this revision from AsmParser: Require null terminator in the created memory buffer. to AsmParser: Use range check instead of null character check for EOF while tokenizing. .

I updated the patch to use a range check in the LLLexer instead of relying on the null character for EOF.

I think I'm wrong here actually, while this does work, it's not enough as a
lot of other code in LLLexer as it assumes no range checks.
It would probably be better to go with the original patch with a null
terminating character, as the LLLexer will require a lot of range
checks to be bug free when working with the non-null terminated input.

2015-05-20 12:36 GMT-07:00 Alex Lorenz <arphaman@gmail.com>:

I updated the patch to use a range check in the LLLexer instead of relying
on the null character for EOF.

REPOSITORY

rL LLVM

http://reviews.llvm.org/D9883

Files:

lib/AsmParser/LLLexer.cpp
unittests/AsmParser/AsmParserTest.cpp
unittests/AsmParser/CMakeLists.txt
unittests/AsmParser/Makefile
unittests/CMakeLists.txt
unittests/Makefile

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/
arphaman updated this revision to Diff 26172.May 20 2015, 1:03 PM
arphaman retitled this revision from AsmParser: Use range check instead of null character check for EOF while tokenizing. to AsmParser: Require null terminator in the created memory buffer..

I went back to the patch that requires a terminating null character again, as the range
checks in LLLexer would have to be introduced in a lot of places.

I updated the unittest to include a passing testcase as Duncan suggested.

MatzeB accepted this revision.May 20 2015, 1:27 PM
MatzeB added a reviewer: MatzeB.

Thanks for checking. This LGTM if Matthias agrees.

I still think LLLexer should be changed in the long term to use getNextChar() consistently and a range check included in it. While technically it can be indeed faster to check for a sentinel value like '\0' in this case the check was not included in the relevant switches but was a separate check in getNextChar() anyway, so I don't think it would have had any impact; Even if we would get rid of the check I'd be surprised if you could actually measure the difference.

Anyway I can understand that we may not want to change it right now, so this patch LGTM.

This revision is now accepted and ready to land.May 20 2015, 1:27 PM
This revision was automatically updated to reflect the committed changes.

2015-05-20 13:27 GMT-07:00 Matthias Braun <matze@braunis.de>:

Thanks for checking. This LGTM if Matthias agrees.

I still think LLLexer should be changed in the long term to use
getNextChar() consistently and a range check included in it. While
technically it can be indeed faster to check for a sentinel value like '\0'
in this case the check was not included in the relevant switches but was a
separate check in getNextChar() anyway, so I don't think it would have had
any impact; Even if we would get rid of the check I'd be surprised if you
could actually measure the difference.

Anyway I can understand that we may not want to change it right now, so
this patch LGTM.

Thanks, I committed the patch in r237833.

REPOSITORY

rL LLVM

http://reviews.llvm.org/D9883

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/