This is an archive of the discontinued LLVM Phabricator instance.

Introduce FileEdit utility
Needs ReviewPublic

Authored by zturner on Jun 28 2017, 10:12 AM.

Details

Summary

This introduces a new utility, which I'm calling FileEdit, which is intended to increase portability in tests across platforms and shells, while adding some additional functionality that enables us to write more concise, easily-understandable, and easily-creatable tests.

To elaborate on the first point, a (somewhat) common pattern in tests is to run sed against the test file, replacing certain fixed strings in the check statements with values that are known only during the run line. For example, you might compile something and get some debug info with a path relative to the test output directory, and you want to write a check statement that checks the path is correct. Or you might want to run the same command multiple times where only the value of a single command line argument is changed, which would cause some small sectino of the output to be different but the test to otherwise be identical. In these and other cases, you may wish to use the same CHECK prefix with a replacement variable.

sed solves this, but a) it's needlessly complicated and not everyone is a Level 99 sed wizard, and b) There are real portability issues when it comes to escaping and slashes that are not easily solvable.

For the second point, lit tests are convenient to write when only have 1 input file to whatever tool we wish to exercise, because we can inline the text of the input file within the test file itself, meaning a test is entirely self contained.

This is not true anymore when a test requires multiple input files (for example, consider the case where you want to compile 2 .cpp files into separate object files and link them together. It would be nice we could have the same level of test-isolation for these kinds of tests as well.

This patch addresses both of the above issues, but one could easily imagine other use cases for a tool such as this, so it has potential for a larger feature set.

It addresses the first by having a command line option --replace=<string>, where <string> is of the form s/pattern/replace/ and behaves in a mostly intuitive way -- it replaces all occurrences of pattern (quirk: pattern is not a regex, it is a string literal).

It addresses the second by scanning lines for a directive that identifies the beginning of a file, and it copies everything that follows (until the next directive) to the corresponding file. For example,

{!-- Foo.cpp
int main(int argc, char **argv) {
  return 0;
}

would create a file named Foo.cpp with the aforementioned text, which can then be referenced in a run line.

Diff Detail

Event Timeline

zturner created this revision.Jun 28 2017, 10:12 AM
rnk edited edge metadata.Jun 28 2017, 10:18 AM

Why {!--? It seems inconsistent with the usual LLVM thing of an all caps directive like CHECK:, RUN:, or a Unixy-EOF. Unbalanced braces will make tools like clang-format misbehave.

In D34764#794038, @rnk wrote:

Why {!--? It seems inconsistent with the usual LLVM thing of an all caps directive like CHECK:, RUN:, or a Unixy-EOF. Unbalanced braces will make tools like clang-format misbehave.

clang-format doesn't run against .test files anyway, so I don't think that should be a problem. Also, all-caps directives are already reserved check prefixes, and overloading it here would not make the distinctino easily identifiable. Am I looking at a check prefix or am I looking at a file directive? I chose something that would stand out. I could use a square brace instead of a curly brace, or even an angle bracket (although <!-- marks the beginning of an xml comment, not sure if that's a good or bad thing).

chandlerc edited edge metadata.Jun 28 2017, 10:28 AM

First point, I feel like these are two very different pieces of functionality. My thought sare pretty specific to the particular tool.

Regarding replacing the 'sed' like functionality: many tests that use 'sed' really shouldn't. FileCheck variables and other techniques tend to be much better IMO. But if we need a portable replacement, sure. I don't have strong opinions about the exact way we replace it.

Regarding the separate file creation thing... I'm much less enthusiastic about this. I *do* want access to tools like clang-format while editing bits of C++ code. I don't think embedding them into the .test file is a good idea.

For almost all cases, I think we either *should* keep things in separate files and then we should just have actual separate files that are easy to edit, or we should use existing ways to put things into a single file. The use case you brought up: compile the same source file N different tims with a different macro to create the N object files. Use the existing preprocessor syntax to partition the file contents.

We used to have the problem of needing lots of separate files for C++ modules tests and are fixing that by allowing us to build a multi-file module from a single source input.

Technically, we can even do this with a single IR file, running llvm-extract to separate it, and then processing the separate files.

Anyways, I understand that we'll always need *some* support for separate files, but i think the Inputs directory and actual separate files is a reasonable mechanism.

rnk added a comment.Jun 28 2017, 10:47 AM

The tests in llvm/test/tools/llvm-symbolizer are a good place to use this. See the ADDR: sed+grep pattern I used to extract some hex numbers from the file.

rnk added a comment.Jun 28 2017, 10:56 AM

@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:

# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc

That's pretty unreadable. I would much rather write:

# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...

Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.

ruiu added a subscriber: ruiu.Jun 28 2017, 11:15 AM

How about adding here document to lit command? I mean it is more readable if you can write something like this

llvm-mc <<END -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
END

than

  1. RUN: FileEdit %s %t
  2. RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
  3. SPLIT: a.s .section .bss,"bw",discard,main_global .globl main_global main_global: .long 0

Also people are already familiar with the concept of here document, so it should be easy to digest and use.

In D34764#794086, @rnk wrote:

@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:

# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc

That's pretty unreadable.

FWIW, I agree this is awful.

I'm not as fussed as you seem to be about having separate files... but OK.

I would much rather write:

# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...

Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.

I just don't think we need a new syntax or tool here.

We can run the preprocessor to create a header file, and then include it. We can use the preprocessor to have a single source file and compile it in N different ways.

We could even use preprocessed assembly to solve the issue you cite in the lld test suite. I'd honestly rather continue using the *existing* file manipulation syntaxes rather than adding yet another magic comment that actually does semantic splitting.

In D34764#794070, @rnk wrote:

The tests in llvm/test/tools/llvm-symbolizer are a good place to use this. See the ADDR: sed+grep pattern I used to extract some hex numbers from the file.

I think this is the closest to a good example honestly.

Notably, there is *no* file format or syntax here. Nothing to use. We just need to extract a prefixed set of lines from comment markers. *that* seems like a nice tool to add. FileExtract or something that does the exact same prefix-in-comment searching as FileCheck and lit, and extracts those lines (minus prefix) to a new file or to stdout.

But I would specifically advocate against using this heavily for things like assembly or C code. There, I would much rather a single file with preprocessing to split, or N files so that tools that are expected to work on those files continues to work.

In D34764#794086, @rnk wrote:

@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:

# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc

That's pretty unreadable.

FWIW, I agree this is awful.

I'm not as fussed as you seem to be about having separate files... but OK.

I would much rather write:

# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...

Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.

I just don't think we need a new syntax or tool here.

We can run the preprocessor to create a header file, and then include it. We can use the preprocessor to have a single source file and compile it in N different ways.

We could even use preprocessed assembly to solve the issue you cite in the lld test suite. I'd honestly rather continue using the *existing* file manipulation syntaxes rather than adding yet another magic comment that actually does semantic splitting.

When I went to edit some of the tests, I came up with a pattern that might be more agreeable than having a magic directive to spit out files? I ended up taking this test:

; RUN: sed -e s/.T1:// %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK1 %s
; RUN: sed -e s/.T2:// %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK2 %s

target triple = "x86_64-pc-windows-msvc"

declare void @f()

;T1: declare i8* @llvm.eh.exceptionpointer.p0i8(i32)
;T1:
;T1: define void @test1() personality i32 (...)* @__CxxFrameHandler3 {
;T1:   call i8* @llvm.eh.exceptionpointer.p0i8(i32 0)
;T1:   ret void
;T1: }
;CHECK1: Intrinsic has incorrect argument type!
;CHECK1-NEXT: i8* (i32)* @llvm.eh.exceptionpointer.p0i8

;T2: declare i8* @llvm.eh.exceptionpointer.p0i8(token)
;T2:
;T2: define void @test2() personality i32 (...)* @__CxxFrameHandler3 {
;T2:   call i8* @llvm.eh.exceptionpointer.p0i8(token undef)
;T2:   ret void
;T2: }
;CHECK2: eh.exceptionpointer argument must be a catchpad
;CHECK2-NEXT:  call i8* @llvm.eh.exceptionpointer.p0i8(token undef)

declare i32 @__CxxFrameHandler3(...)

and re-writing it as:

; RUN: FileEdit -keep-prefix=TEST1: %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK1 %s
; RUN: FileEdit -keep-prefix=TEST2: %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK2 %s

TEST1: target triple = "x86_64-pc-windows-msvc"
TEST1: declare void @f()
TEST1: declare i8* @llvm.eh.exceptionpointer.p0i8(i32)
TEST1: 
TEST1: define void @test1() personality i32 (...)* @__CxxFrameHandler3 {
TEST1:   call i8* @llvm.eh.exceptionpointer.p0i8(i32 0)
TEST1:   ret void
TEST1: }
TEST1: declare i32 @__CxxFrameHandler3(...)

CHECK1: Intrinsic has incorrect argument type!
CHECK1-NEXT: i8* (i32)* @llvm.eh.exceptionpointer.p0i8

TEST2: target triple = "x86_64-pc-windows-msvc"
TEST2: declare void @f()
TEST2: declare i8* @llvm.eh.exceptionpointer.p0i8(token)
TEST2: 
TEST2: define void @test2() personality i32 (...)* @__CxxFrameHandler3 {
TEST2:   call i8* @llvm.eh.exceptionpointer.p0i8(token undef)
TEST2:   ret void
TEST2: }
TEST2: declare i32 @__CxxFrameHandler3(...)

CHECK2: eh.exceptionpointer argument must be a catchpad
CHECK2-NEXT:  call i8* @llvm.eh.exceptionpointer.p0i8(token undef)

Perhaps just having a standardized way of saying "drop everything in the file that doesn't start with X, and for those that do, also drop the X" is sufficient?

rnk added a comment.Jun 28 2017, 1:53 PM

But I would specifically advocate against using this heavily for things like assembly or C code. There, I would much rather a single file with preprocessing to split, or N files so that tools that are expected to work on those files continues to work.

I don't believe llvm-mc is capable of doing this preprocessing, and lld doesn't depend on clang, so we can't use the C pre-processor there. Really, I just want a tool like the C preprocessor that works on plain text files. That's kind of how I envision FileEdit: it splits files (#ifdef) and does string substitutions (-D macros). If you think using the C preprocessor for this kind of stuff is OK and not confusing, then I don't see how a standalone tool is different.

I really don't want to use prefixes to describe multiple input files. It breaks all the normal editor highlighting and indentation that you normally get for free.

Also, users already submit bugs reduced multi-TU bugs to us in approximately this format. Examples:
https://bugs.llvm.org/show_bug.cgi?id=33598
https://bugs.llvm.org/show_bug.cgi?id=33542
https://bugs.llvm.org/show_bug.cgi?id=33624

I prefer bug reports like this over ones with an attached tarball or zip archive. If we agree (maybe we don't) that this is a good way to describe a multi-TU bug to a developer, then it follows that our test suite should support a test format that looks like these bug reports.


I have used the C pre-processor as you describe to write ABI handshake tests. They looked kind of like this:

// RUN: %clang -c %s -o t1.obj
// RUN: %clang -c %s -o t2.obj -DFOO
// RUN: %clang t1.obj t2.obj -o t.exe && ./t.exe
struct NonTrivial { int x; NonTrivial(const NonTrivial &); };
int foo(NonTrivial o);
#ifdef FOO
int foo(NonTrivial o) { return o.x; }
#else
int main() {
  NonTrivial o;
  o.x = 42;
  return foo(o);
}
#endif

... and I'd like to have that capability for non-C tests that need multiple input files.

rnk added a subscriber: rsmith.Jun 28 2017, 1:58 PM

I'm not as fussed as you seem to be about having separate files... but OK.

Fair. :)

It reminds me of how every compiler person I know has different ways of feeding small test cases to the compiler. On the one hand, you have @rsmith who does echo 'int evil_test() {...}' | clang -cc1 -x c++ - ... in zsh, and then you have me perpetually compiling t.cpp, and still other people create unique directories for every interesting test case they come across. I can't say what's right, but personally, not having to chase a filepath into %S/Inputs is one less step that I need to take to understand a test. For some people, that's not even having to think about giving filenames to their tests because they pipe them directly to the compiler.