This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
CMakeLists.txt
-
test/
-
CMakeLists.txt
-
FileEdit/
-
Inputs/
-
edit.in
-
multiple-edits-same-line.test
-
one-file.test
-
simple-edits.test
-
two-files.test
-
utils/FileEdit/
-
FileEdit/
-
CMakeLists.txt
-
FileEdit.cpp

Differential D34764

Introduce FileEdit utility
Needs ReviewPublic

Authored by zturner on Jun 28 2017, 10:12 AM.

Download Raw Diff

Details

Reviewers

rnk
chandlerc
mehdi_amini
dblaikie

Summary

This introduces a new utility, which I'm calling FileEdit, which is intended to increase portability in tests across platforms and shells, while adding some additional functionality that enables us to write more concise, easily-understandable, and easily-creatable tests.

To elaborate on the first point, a (somewhat) common pattern in tests is to run sed against the test file, replacing certain fixed strings in the check statements with values that are known only during the run line. For example, you might compile something and get some debug info with a path relative to the test output directory, and you want to write a check statement that checks the path is correct. Or you might want to run the same command multiple times where only the value of a single command line argument is changed, which would cause some small sectino of the output to be different but the test to otherwise be identical. In these and other cases, you may wish to use the same CHECK prefix with a replacement variable.

sed solves this, but a) it's needlessly complicated and not everyone is a Level 99 sed wizard, and b) There are real portability issues when it comes to escaping and slashes that are not easily solvable.

For the second point, lit tests are convenient to write when only have 1 input file to whatever tool we wish to exercise, because we can inline the text of the input file within the test file itself, meaning a test is entirely self contained.

This is not true anymore when a test requires multiple input files (for example, consider the case where you want to compile 2 .cpp files into separate object files and link them together. It would be nice we could have the same level of test-isolation for these kinds of tests as well.

This patch addresses both of the above issues, but one could easily imagine other use cases for a tool such as this, so it has potential for a larger feature set.

It addresses the first by having a command line option --replace=<string>, where <string> is of the form s/pattern/replace/ and behaves in a mostly intuitive way -- it replaces all occurrences of pattern (quirk: pattern is not a regex, it is a string literal).

It addresses the second by scanning lines for a directive that identifies the beginning of a file, and it copies everything that follows (until the next directive) to the corresponding file. For example,

{!-- Foo.cpp
int main(int argc, char **argv) {
  return 0;
}

would create a file named Foo.cpp with the aforementioned text, which can then be referenced in a run line.

Diff Detail

Event Timeline

zturner created this revision.Jun 28 2017, 10:12 AM

Herald added subscribers: aheejin, aprantl, mgorny. · View Herald TranscriptJun 28 2017, 10:12 AM

Why {!--? It seems inconsistent with the usual LLVM thing of an all caps directive like CHECK:, RUN:, or a Unixy-EOF. Unbalanced braces will make tools like clang-format misbehave.

In D34764#794038, @rnk wrote:

Why {!--? It seems inconsistent with the usual LLVM thing of an all caps directive like CHECK:, RUN:, or a Unixy-EOF. Unbalanced braces will make tools like clang-format misbehave.

clang-format doesn't run against .test files anyway, so I don't think that should be a problem. Also, all-caps directives are already reserved check prefixes, and overloading it here would not make the distinctino easily identifiable. Am I looking at a check prefix or am I looking at a file directive? I chose something that would stand out. I could use a square brace instead of a curly brace, or even an angle bracket (although <!-- marks the beginning of an xml comment, not sure if that's a good or bad thing).

First point, I feel like these are two very different pieces of functionality. My thought sare pretty specific to the particular tool.

Regarding replacing the 'sed' like functionality: many tests that use 'sed' really shouldn't. FileCheck variables and other techniques tend to be much better IMO. But if we need a portable replacement, sure. I don't have strong opinions about the exact way we replace it.

Regarding the separate file creation thing... I'm much less enthusiastic about this. I *do* want access to tools like clang-format while editing bits of C++ code. I don't think embedding them into the .test file is a good idea.

For almost all cases, I think we either *should* keep things in separate files and then we should just have actual separate files that are easy to edit, or we should use existing ways to put things into a single file. The use case you brought up: compile the same source file N different tims with a different macro to create the N object files. Use the existing preprocessor syntax to partition the file contents.

We used to have the problem of needing lots of separate files for C++ modules tests and are fixing that by allowing us to build a multi-file module from a single source input.

Technically, we can even do this with a single IR file, running llvm-extract to separate it, and then processing the separate files.

Anyways, I understand that we'll always need *some* support for separate files, but i think the Inputs directory and actual separate files is a reasonable mechanism.

The tests in llvm/test/tools/llvm-symbolizer are a good place to use this. See the ADDR: sed+grep pattern I used to extract some hex numbers from the file.

@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:

# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc

That's pretty unreadable. I would much rather write:

# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...

Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.

How about adding here document to lit command? I mean it is more readable if you can write something like this

llvm-mc <<END -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
END

than

RUN: FileEdit %s %t
RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
SPLIT: a.s .section .bss,"bw",discard,main_global .globl main_global main_global: .long 0

Also people are already familiar with the concept of here document, so it should be easy to digest and use.

In D34764#794086, @rnk wrote:
@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:
# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc
That's pretty unreadable.

FWIW, I agree this is awful.

I'm not as fussed as you seem to be about having separate files... but OK.

I would much rather write:

# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...

Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.

I just don't think we need a new syntax or tool here.

We can run the preprocessor to create a header file, and then include it. We can use the preprocessor to have a single source file and compile it in N different ways.

We could even use preprocessed assembly to solve the issue you cite in the lld test suite. I'd honestly rather continue using the *existing* file manipulation syntaxes rather than adding yet another magic comment that actually does semantic splitting.

In D34764#794070, @rnk wrote:

The tests in llvm/test/tools/llvm-symbolizer are a good place to use this. See the ADDR: sed+grep pattern I used to extract some hex numbers from the file.

I think this is the closest to a good example honestly.

Notably, there is *no* file format or syntax here. Nothing to use. We just need to extract a prefixed set of lines from comment markers. *that* seems like a nice tool to add. FileExtract or something that does the exact same prefix-in-comment searching as FileCheck and lit, and extracts those lines (minus prefix) to a new file or to stdout.

But I would specifically advocate against using this heavily for things like assembly or C code. There, I would much rather a single file with preprocessing to split, or N files so that tools that are expected to work on those files continues to work.

In D34764#794154, @chandlerc wrote:
In D34764#794086, @rnk wrote:
@chandlerc I think we have use cases for file splitting in the linker. Many interesting test cases require two object files, and it's annoying to have to make a separate file in %S/Inputs just for this. Consider lld/test/COFF/reloc-discarded.s. I recently added this and I need to make a COFF object that defines a global in a comdat. This is the RUN: line I used to do that:
# RUN: echo -e '.section .bss,"bw",discard,main_global\n.global main_global\n main_global:\n .long 0' | \
# RUN:     llvm-mc - -filetype=obj -o %t1.obj -triple x86_64-windows-msvc
That's pretty unreadable.
FWIW, I agree this is awful.

I'm not as fussed as you seem to be about having separate files... but OK.
I would much rather write:
# RUN: FileEdit %s %t
# RUN: llvm-mc %t/a.s -o %t/a.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: llvm-mc %t/b.s -o %t/b.obj -filetype=obj -triple x86_64-windows-msvc
# RUN: lld-link %t/a.obj %t/b.obj -entry ...

# SPLIT: a.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0

# SPLIT: b.s
.section .bss,"bw",discard,main_global
.globl main_global
main_global:
    .long 0
...
Editors would enable assembly syntax highlighting, even. You can apply the same technique to Clang tests that need headers.
I just don't think we need a new syntax or tool here.

We can run the preprocessor to create a header file, and then include it. We can use the preprocessor to have a single source file and compile it in N different ways.

We could even use preprocessed assembly to solve the issue you cite in the lld test suite. I'd honestly rather continue using the *existing* file manipulation syntaxes rather than adding yet another magic comment that actually does semantic splitting.

When I went to edit some of the tests, I came up with a pattern that might be more agreeable than having a magic directive to spit out files? I ended up taking this test:

; RUN: sed -e s/.T1:// %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK1 %s
; RUN: sed -e s/.T2:// %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK2 %s

target triple = "x86_64-pc-windows-msvc"

declare void @f()

;T1: declare i8* @llvm.eh.exceptionpointer.p0i8(i32)
;T1:
;T1: define void @test1() personality i32 (...)* @__CxxFrameHandler3 {
;T1:   call i8* @llvm.eh.exceptionpointer.p0i8(i32 0)
;T1:   ret void
;T1: }
;CHECK1: Intrinsic has incorrect argument type!
;CHECK1-NEXT: i8* (i32)* @llvm.eh.exceptionpointer.p0i8

;T2: declare i8* @llvm.eh.exceptionpointer.p0i8(token)
;T2:
;T2: define void @test2() personality i32 (...)* @__CxxFrameHandler3 {
;T2:   call i8* @llvm.eh.exceptionpointer.p0i8(token undef)
;T2:   ret void
;T2: }
;CHECK2: eh.exceptionpointer argument must be a catchpad
;CHECK2-NEXT:  call i8* @llvm.eh.exceptionpointer.p0i8(token undef)

declare i32 @__CxxFrameHandler3(...)

and re-writing it as:

; RUN: FileEdit -keep-prefix=TEST1: %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK1 %s
; RUN: FileEdit -keep-prefix=TEST2: %s | not opt -lint -disable-output 2>&1 | FileCheck --check-prefix=CHECK2 %s

TEST1: target triple = "x86_64-pc-windows-msvc"
TEST1: declare void @f()
TEST1: declare i8* @llvm.eh.exceptionpointer.p0i8(i32)
TEST1: 
TEST1: define void @test1() personality i32 (...)* @__CxxFrameHandler3 {
TEST1:   call i8* @llvm.eh.exceptionpointer.p0i8(i32 0)
TEST1:   ret void
TEST1: }
TEST1: declare i32 @__CxxFrameHandler3(...)

CHECK1: Intrinsic has incorrect argument type!
CHECK1-NEXT: i8* (i32)* @llvm.eh.exceptionpointer.p0i8

TEST2: target triple = "x86_64-pc-windows-msvc"
TEST2: declare void @f()
TEST2: declare i8* @llvm.eh.exceptionpointer.p0i8(token)
TEST2: 
TEST2: define void @test2() personality i32 (...)* @__CxxFrameHandler3 {
TEST2:   call i8* @llvm.eh.exceptionpointer.p0i8(token undef)
TEST2:   ret void
TEST2: }
TEST2: declare i32 @__CxxFrameHandler3(...)

CHECK2: eh.exceptionpointer argument must be a catchpad
CHECK2-NEXT:  call i8* @llvm.eh.exceptionpointer.p0i8(token undef)

Perhaps just having a standardized way of saying "drop everything in the file that doesn't start with X, and for those that do, also drop the X" is sufficient?

a.elovikov added a subscriber: a.elovikov.Jun 28 2017, 11:36 AM

In D34764#794163, @chandlerc wrote:

But I would specifically advocate against using this heavily for things like assembly or C code. There, I would much rather a single file with preprocessing to split, or N files so that tools that are expected to work on those files continues to work.

I don't believe llvm-mc is capable of doing this preprocessing, and lld doesn't depend on clang, so we can't use the C pre-processor there. Really, I just want a tool like the C preprocessor that works on plain text files. That's kind of how I envision FileEdit: it splits files (#ifdef) and does string substitutions (-D macros). If you think using the C preprocessor for this kind of stuff is OK and not confusing, then I don't see how a standalone tool is different.

I really don't want to use prefixes to describe multiple input files. It breaks all the normal editor highlighting and indentation that you normally get for free.

Also, users already submit bugs reduced multi-TU bugs to us in approximately this format. Examples:
https://bugs.llvm.org/show_bug.cgi?id=33598
https://bugs.llvm.org/show_bug.cgi?id=33542
https://bugs.llvm.org/show_bug.cgi?id=33624

I prefer bug reports like this over ones with an attached tarball or zip archive. If we agree (maybe we don't) that this is a good way to describe a multi-TU bug to a developer, then it follows that our test suite should support a test format that looks like these bug reports.

I have used the C pre-processor as you describe to write ABI handshake tests. They looked kind of like this:

// RUN: %clang -c %s -o t1.obj
// RUN: %clang -c %s -o t2.obj -DFOO
// RUN: %clang t1.obj t2.obj -o t.exe && ./t.exe
struct NonTrivial { int x; NonTrivial(const NonTrivial &); };
int foo(NonTrivial o);
#ifdef FOO
int foo(NonTrivial o) { return o.x; }
#else
int main() {
  NonTrivial o;
  o.x = 42;
  return foo(o);
}
#endif

... and I'd like to have that capability for non-C tests that need multiple input files.

In D34764#794154, @chandlerc wrote:

I'm not as fussed as you seem to be about having separate files... but OK.

Fair. :)

It reminds me of how every compiler person I know has different ways of feeding small test cases to the compiler. On the one hand, you have @rsmith who does echo 'int evil_test() {...}' | clang -cc1 -x c++ - ... in zsh, and then you have me perpetually compiling t.cpp, and still other people create unique directories for every interesting test case they come across. I can't say what's right, but personally, not having to chase a filepath into %S/Inputs is one less step that I need to take to understand a test. For some people, that's not even having to think about giving filenames to their tests because they pipe them directly to the compiler.

Revision Contents

Path

Size

llvm/

CMakeLists.txt

1 line

test/

CMakeLists.txt

1 line

FileEdit/

Inputs/

edit.in

3 lines

multiple-edits-same-line.test

16 lines

one-file.test

12 lines

simple-edits.test

15 lines

two-files.test

26 lines

utils/

FileEdit/

CMakeLists.txt

5 lines

FileEdit.cpp

235 lines

Diff 104449

llvm/CMakeLists.txt

	Show First 20 Lines • Show All 848 Lines • ▼ Show 20 Lines
	endif(LLVM_ENABLE_MODULES)			endif(LLVM_ENABLE_MODULES)

	add_subdirectory(include/llvm)			add_subdirectory(include/llvm)

	add_subdirectory(lib)			add_subdirectory(lib)

	if( LLVM_INCLUDE_UTILS )			if( LLVM_INCLUDE_UTILS )
	add_subdirectory(utils/FileCheck)			add_subdirectory(utils/FileCheck)
				add_subdirectory(utils/FileEdit)
	add_subdirectory(utils/PerfectShuffle)			add_subdirectory(utils/PerfectShuffle)
	add_subdirectory(utils/count)			add_subdirectory(utils/count)
	add_subdirectory(utils/not)			add_subdirectory(utils/not)
	add_subdirectory(utils/llvm-lit)			add_subdirectory(utils/llvm-lit)
	add_subdirectory(utils/yaml-bench)			add_subdirectory(utils/yaml-bench)
	else()			else()
	if ( LLVM_INCLUDE_TESTS )			if ( LLVM_INCLUDE_TESTS )
	message(FATAL_ERROR "Including tests when not building utils will not work.			message(FATAL_ERROR "Including tests when not building utils will not work.
	▲ Show 20 Lines • Show All 149 Lines • Show Last 20 Lines

llvm/test/CMakeLists.txt

Show All 23 Lines	if(NOT LLVM_BUILD_TOOLS)
set(EXCLUDE_FROM_ALL ON)		set(EXCLUDE_FROM_ALL ON)
endif()		endif()

# Set the depends list as a variable so that it can grow conditionally.		# Set the depends list as a variable so that it can grow conditionally.
# NOTE: Sync the substitutions in test/lit.cfg when adding to this list.		# NOTE: Sync the substitutions in test/lit.cfg when adding to this list.
set(LLVM_TEST_DEPENDS		set(LLVM_TEST_DEPENDS
BugpointPasses		BugpointPasses
FileCheck		FileCheck
		FileEdit
LLVMHello		LLVMHello
UnitTests		UnitTests
bugpoint		bugpoint
count		count
llc		llc
lli		lli
lli-child-target		lli-child-target
llvm-ar		llvm-ar
▲ Show 20 Lines • Show All 122 Lines • Show Last 20 Lines

llvm/test/FileEdit/Inputs/edit.in

This file was added.

				CHECK: Test - foo - Test
				CHECK: Test - bar - Test
				CHECK: Test - baz - Test

llvm/test/FileEdit/multiple-edits-same-line.test

This file was added.

				; RUN: FileEdit -dest=%t_dir -replace=s/THETYPE/int/ -replace=s/THEFILE/%T/ %s \| FileCheck %s


				{!-- TheFile.txt
				#include <iostream>
				THETYPE main(THETYPE argc, char **argv) {
				std::cout << "THEFILE" << std::endl;
				return 0;
				}

				{!--
				CHECK: #include <iostream>
				CHECK-NEXT: int main(int argc, char **argv) {
				CHECK-NEXT: std::cout << "{{.multiple-edits-same-line.}}" << std::endl;
				CHECK-NEXT: return 0;
				CHECK-NEXT: }

llvm/test/FileEdit/one-file.test

This file was added.

				; RUN: FileEdit -dest=%t_dir %s
				; RUN: FileCheck -input-file=%t_dir/TheFile.txt %s

				{!-- TheFile.txt
				int main(int argc, char **argv) {
				return 0;
				}

				{!--
				CHECK: int main(int argc, char **argv) {
				CHECK-NEXT: return 0;
				CHECK-NEXT: }

llvm/test/FileEdit/simple-edits.test

This file was added.

				; RUN: FileEdit %s \| FileCheck --check-prefix=NONE %s
				; RUN: FileEdit -replace=s/foo/bar/ %s \| FileCheck --check-prefix=ONE %s
				; RUN: FileEdit -replace=s/foo/bar/ -replace=s/bar/baz/ %s \| FileCheck --check-prefix=TWO %s

				NONE: CHECK: Test - foo - Test
				NONE-NEXT: CHECK: Test - bar - Test
				NONE-NEXT: CHECK: Test - baz - Test

				ONE: CHECK: Test - bar - Test
				ONE-NEXT: CHECK: Test - bar - Test
				ONE-NEXT: CHECK: Test - baz - Test

				TWO: CHECK: Test - bar - Test
				TWO-NEXT: CHECK: Test - baz - Test
				TWO-NEXT: CHECK: Test - baz - Test

llvm/test/FileEdit/two-files.test

This file was added.

				; RUN: FileEdit -dest=%t_dir %s
				; RUN: FileCheck -input-file=%t_dir/File1.cpp -check-prefix=FILE1 %s
				; RUN: FileCheck -input-file=%t_dir/File2.cpp -check-prefix=FILE2 %s

				{!-- File1.cpp
				// File1.cpp
				int main(int argc, char **argv) {
				return 0;
				}

				{!-- File2.cpp
				// File2.cpp
				int main(int argc, char **argv) {
				return 0;
				}

				{!--
				FILE1: // File1.cpp
				FILE1-NEXT: int main(int argc, char **argv) {
				FILE1-NEXT: return 0;
				FILE1-NEXT: }

				FILE2: // File2.cpp
				FILE2-NEXT: int main(int argc, char **argv) {
				FILE2-NEXT: return 0;
				FILE2-NEXT: }

llvm/utils/FileEdit/CMakeLists.txt

This file was added.

				add_llvm_utility(FileEdit
				FileEdit.cpp
				)

				target_link_libraries(FileEdit LLVMSupport)

llvm/utils/FileEdit/FileEdit.cpp

This file was added.

				//===- FileEdit.cpp - Split and transform file into multiple sub-files ----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// FileEdit takes an input file and performs various operations on the file,
				// such as replacing certain sequences or splitting the file into multiple
				// sub-files. This is useful for keeping tests self-contained, as it allows us
				// to write a single file containing both test input and test check lines even
				// when the test input itself must span multiple files (e.g. a test which tests
				// some functionality involving compiling and linking 2 separate compilation
				// units).
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallString.h"
				#include "llvm/ADT/StringExtras.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileSystem.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/Support/LineIterator.h"
				#include "llvm/Support/Path.h"
				#include "llvm/Support/PrettyStackTrace.h"
				#include "llvm/Support/Regex.h"
				#include "llvm/Support/Signals.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace llvm;

				namespace {
				enum class EditResult { Discard, UseModified, UseUnmodified };
				struct LineReplacement {
				StringRef Pattern;
				StringRef Repl;
				};
				} // namespace

				static cl::opt<std::string>
				InputFilename(cl::Positional, cl::desc("input file (defaults to stdin)"),
				cl::Required);

				static cl::list<std::string> StripPrefixes(
				"strip-prefix", cl::CommaSeparated,
				cl::desc("Exclude lines beginning with the given prefix from the output"));

				static cl::list<std::string> Replacements(
				"replace", cl::ZeroOrMore,
				cl::desc("Apply the specified replacement regex to each line"));

				static cl::opt<std::string>
				DestFolder("dest", cl::desc("Write split files to the specified folder"));

				static size_t findNextUnescapedSlash(StringRef Str) {
				size_t Pos = 0;
				while (Pos != StringRef::npos) {
				Pos = Str.find_first_of('/');
				if (Pos == StringRef::npos)
				return Pos;
				if (Pos == 0 \|\| Str[Pos - 1] != '\\')
				return Pos;
				}
				return StringRef::npos;
				}

				static bool parseOneReplacementItem(StringRef Str, StringRef &Pattern,
				StringRef &Repl) {
				if (!Str.consume_front("s/"))
				return false;

				for (StringRef *Dest : {&Pattern, &Repl}) {
				size_t NextSlash = findNextUnescapedSlash(Str);
				if (NextSlash == StringRef::npos)
				return false;
				*Dest = Str.take_front(NextSlash);
				Str = Str.drop_front(NextSlash + 1);
				}

				return Str.empty();
				}

				static void buildReplacementVector(std::vector<LineReplacement> &Repls) {
				std::string UniversalRegexStr;
				std::vector<StringRef> Patterns;
				for (StringRef S : Replacements) {
				StringRef Pattern;
				StringRef Repl;
				if (!parseOneReplacementItem(S, Pattern, Repl)) {
				outs() << formatv("FileEdit: Invalid replacement string `{0}`", S);
				continue;
				}

				Patterns.push_back(Pattern);
				Repls.push_back({Pattern, Repl});
				}
				}

				static bool handleFileDirective(StringRef Line, StringRef &OutFile) {
				if (!Line.consume_front("{!--"))
				return false;
				OutFile = Line.trim();
				return true;
				}

				static std::error_code writeOutput(StringRef FileName, StringRef Text) {
				if (FileName.empty()) {
				llvm::outs() << Text;
				return std::error_code();
				}

				SmallString<128> Path;
				if (DestFolder.getNumOccurrences() == 0)
				llvm::sys::fs::current_path(Path);
				else
				Path = DestFolder;
				llvm::sys::path::append(Path, FileName);
				llvm::sys::path::native(Path);
				auto Directory = llvm::sys::path::parent_path(Path);
				if (auto EC = llvm::sys::fs::create_directories(Directory, true))
				return EC;

				int FD;
				if (auto EC =
				llvm::sys::fs::openFileForWrite(Path, FD, llvm::sys::fs::F_Text))
				return EC;
				llvm::raw_fd_ostream OS(FD, true);
				OS << Text;
				OS.flush();
				return std::error_code();
				}

				static EditResult editLine(StringRef Line,
				ArrayRef<LineReplacement> Replacements,
				SmallVectorImpl<char> &ModifiedLine) {
				if (Line.empty())
				return EditResult::UseUnmodified;

				for (StringRef Prefix : StripPrefixes) {
				if (Line.startswith(Prefix))
				return EditResult::Discard;
				}

				// Perform all replacements from left to right in the input string.
				ModifiedLine.clear();
				bool WasModified = false;
				while (!Line.empty()) {
				const LineReplacement *UseRepl = nullptr;
				size_t FirstPos = StringRef::npos;

				// Find the leftmost replacement in the input string, and map it back to
				// the text that should replace it.
				for (const LineReplacement &LR : Replacements) {
				size_t Pos = Line.find(LR.Pattern);
				if (Pos < FirstPos) {
				UseRepl = &LR;
				FirstPos = Pos;
				}
				}
				if (!UseRepl) {
				if (!WasModified) {
				// If we didn't find anything to replace on this or any previous
				// iteration we can just exit and use the original line.
				return EditResult::UseUnmodified;
				}

				ModifiedLine.append(Line.begin(), Line.end());
				return EditResult::UseModified;
				}

				// The line can be split into 3 groups. Line = Left \| Repl \| Right. Append
				// `Left` as is, use the replacement value for `Repl`, and keep `Right` as
				// the remainder of the string for further replacement.
				StringRef Left = Line.take_front(FirstPos);

				ModifiedLine.append(Left.begin(), Left.end());
				Line = Line.drop_front(FirstPos);

				ModifiedLine.append(UseRepl->Repl.begin(), UseRepl->Repl.end());
				WasModified = true;
				Line = Line.drop_front(UseRepl->Pattern.size());
				}
				return EditResult::UseModified;
				}

				int main(int argc, char **argv) {
				sys::PrintStackTraceOnErrorSignal(argv[0]);
				PrettyStackTraceProgram X(argc, argv);
				cl::ParseCommandLineOptions(argc, argv);

				// Read the expected strings from the check file.
				ErrorOr<std::unique_ptr<MemoryBuffer>> InFileOrErr =
				MemoryBuffer::getFileOrSTDIN(InputFilename);
				if (std::error_code EC = InFileOrErr.getError()) {
				errs() << "Could not open input file '" << InputFilename
				<< "': " << EC.message() << '\n';
				return 1;
				}
				MemoryBuffer &InFile = *InFileOrErr.get();

				std::vector<LineReplacement> Replacements;

				buildReplacementVector(Replacements);

				SmallString<4096> InputFileBuffer;

				line_iterator Iter(InFile, false);
				line_iterator End;
				StringRef OutFile;
				std::string OutputText;
				SmallString<80> ModifiedLine;
				for (StringRef Line : make_range(Iter, End)) {
				StringRef NewOutFile;
				if (handleFileDirective(Line, NewOutFile)) {
				writeOutput(OutFile, OutputText);
				OutFile = NewOutFile;
				OutputText.clear();
				continue;
				}

				EditResult Result = editLine(Line, Replacements, ModifiedLine);
				if (Result == EditResult::Discard)
				continue;
				if (Result == EditResult::UseModified)
				Line = ModifiedLine;
				if (!OutputText.empty())
				OutputText += "\n";
				OutputText += Line;
				}

				writeOutput(OutFile, OutputText);
				return 0;
				}

This is an archive of the discontinued LLVM Phabricator instance.

Introduce FileEdit utilityNeeds ReviewPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 104449

llvm/CMakeLists.txt

llvm/test/CMakeLists.txt

llvm/test/FileEdit/Inputs/edit.in

llvm/test/FileEdit/multiple-edits-same-line.test

llvm/test/FileEdit/one-file.test

llvm/test/FileEdit/simple-edits.test

llvm/test/FileEdit/two-files.test

llvm/utils/FileEdit/CMakeLists.txt

llvm/utils/FileEdit/FileEdit.cpp

Introduce FileEdit utility
Needs ReviewPublic