This is an archive of the discontinued LLVM Phabricator instance.

[TableGen] Make the NUL character invalid in .td files
ClosedPublic

Authored by Paul-C-Anagnostopoulos on May 5 2021, 10:03 AM.

Details

Summary

This little revision makes the NUL character officially invalid in .td files.

Note that the TableGen lexer does not consistently use getNextChar() to obtain characters from the source files. So some NULs, such as those in // comments, are simply ignored.

Diff Detail

Event Timeline

Paul-C-Anagnostopoulos requested review of this revision.May 5 2021, 10:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2021, 10:04 AM

Hmm. I swore I deleted all the //// lines. Hang on . . .

Now the //// comments should be gone.

dblaikie added inline comments.May 5 2021, 1:06 PM
llvm/lib/TableGen/TGLexer.cpp
421–425

So previously a null character would terminate a comment - and with this change it won't terminate the comment anymore? Maybe this would be better in a separate patch (& maybe with a test demonstrating this behavior)

llvm/lib/TableGen/TGLexer.cpp
421–425

It didn't terminate the comment, but rather was included in the comment. Just as the new code does.

Good idea to write a test for this. I will write one tomorrow.

I added a test. There are NULs in the .td file.

Hmm, guess there's no way to review the nul character test file? (I can't download the patch, can't view it in phab)

Maybe it'd make sense to separate the nul char file from the text file that has the RUN and CHECK lines? /maybe/ it could be a plain text file that gets modified to include a null character (eg: a RUN line that uses 'cat' to add a nul character at the start of the file?)?

There are four or five NULs in the test file, to check them in various positions. Would it be helpful to add a second nul-char.txt file that is a duplicate, except that it has an at sign (@) where every null is? I could refer to it with a comment in nul-char.td.

There are four or five NULs in the test file, to check them in various positions. Would it be helpful to add a second nul-char.txt file that is a duplicate, except that it has an at sign (@) where every null is? I could refer to it with a comment in nul-char.td.

Maybe? But could you look around at other tests, see if maybe there's a way to construct this test while having it still be a single text file? I'm wondering if any existing tests use the unix tr program or something to do specific replacements.

If there's no reasonable way to make this test work as a text file - maybe it'll be best to include the binary file and then a separate text file with the RUN/CHECK lines - and, yeah, maybe it'd be alright/good to include the text of the binary file with @ instead of null where the nulls would be as the text inside the RUN/CHECK file.

Okay, sounds good. I'll see what I can find.

There are test files that use sed to do replacements. I'll ask on llvm dev how to create a RUN: line that does the right thing. It has to replace @ with NUL and get rid of the sed command so it doesn't recurse forever. My Unix command knowledge is meager.

There are test files that use sed to do replacements. I'll ask on llvm dev how to create a RUN: line that does the right thing. It has to replace @ with NUL and get rid of the sed command so it doesn't recurse forever. My Unix command knowledge is meager.

Sorry, I missed the second step - why would it need to get rid of the sed command?

I'd expect something like:

; RUN: sed -e 's/@/NUL/g' %s > %t.input
; RUN: llvm-tblgen %t.input | FileCheck

Stuff @ more stuff

Oh, you mean the problem that the sed line itself would get replaced - then there'd be a null character in the RUN line? Maybe you can strip that off with head -n +1 or something like that?

No, I was confused, thinking that the modified file would go back through Lit. But of course it doesn't. I will set up something like you suggest. I'm sure there is a way to use an escape sequence to represent the NUL character in sed . . . yes, \000.

First I have to get sed for Windows.

No, I was confused, thinking that the modified file would go back through Lit. But of course it doesn't. I will set up something like you suggest. I'm sure there is a way to use an escape sequence to represent the NUL character in sed . . . yes, \000.

First I have to get sed for Windows.

How does the existing sed test work on your machine? Or is the test flagged as "REQUIRES: shell" and so is unsupported on your system? (in which case getting sed won't be enough to make the test run, perhaps?)

Hang on, there is already a TableGen test that uses sed: intrin-properties.td. So obviously I have sed in the right place.

I forgot to submit this!

I used David's trick to eliminate the NUL characters in the test file. They are now represented by at signs (@) and translated on the fly.

dblaikie accepted this revision.May 10 2021, 9:43 AM

Looks good - please address the clang-format issues before submitting (there's a clang-format script that can automatically apply changes to only the things touched by your patch (from the root of the monorepo: ./clang/tools/clang-format/git-clang-format origin)

This revision is now accepted and ready to land.May 10 2021, 9:43 AM
This revision was landed with ongoing or failed builds.May 11 2021, 6:21 AM
This revision was automatically updated to reflect the committed changes.