This is an archive of the discontinued LLVM Phabricator instance.

Paths

lld/test/CMakeLists.txt
lld/test/ELF/linkerscript/noload.s
llvm/docs/TestingGuide.rst
llvm/test/CMakeLists.txt
llvm/test/lit.cfg.py
llvm/test/tools/gold/X86/multiple-sections.ll
llvm/test/tools/llvm-strings/radix.test
llvm/test/tools/split-file/Inputs/basic-aa.txt
llvm/test/tools/split-file/Inputs/basic-bb.txt
llvm/test/tools/split-file/Inputs/basic-cc.txt
llvm/test/tools/split-file/basic.test
llvm/test/tools/split-file/empty.test
llvm/test/tools/split-file/error.test
llvm/test/tools/split-file/help.test
llvm/test/tools/split-file/no-leading-lines.test
llvm/test/tools/split-file/output-is-special.test
llvm/tools/split-file/.clang-tidy
llvm/tools/split-file/CMakeLists.txt
llvm/tools/split-file/split-file.cpp
llvm/utils/gn/secondary/lld/test/BUILD.gn
llvm/utils/gn/secondary/llvm/test/BUILD.gn
llvm/utils/gn/secondary/llvm/tools/split-file/BUILD.gn

Differential D83834

Add test utility 'split-file'
ClosedPublic

Authored by MaskRay on Jul 14 2020, 5:31 PM.

Details

Reviewers

dblaikie
echristo
grimar
jhenderson
probinson
rsmith
• espindola
alexander-shaposhnikov
rupprecht
lattner

Commits

rGbcea3a7a288e: Add test utility 'split-file'
rGd054c7ee2e9f: Add test utility 'extract'

Summary

See https://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html
"[llvm-dev] Multiple documents in one test file" for some discussions.

This patch has explored several alternatives. The current semantics are similar to
what @dblaikie proposed.
The command "split-file input output" splits the input file into multiple parts separated by
lines matching the regex ^(.|//)--- filename and writes each part to the file output/filename
(filename can include path separators).
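
To make the splitting rule concrete, here is a minimal Python sketch of it. It is not the split-file implementation (the tool itself is C++). The name split_parts and the sample input are made up for illustration, and duplicate-part errors and line-number padding are left out.

```python
import re

# Marker regex from the summary: any single character or "//",
# followed by "--- " and the part name.
MARKER = re.compile(r"^(?:.|//)--- (.+)$")


def split_parts(text):
    """Return {part name: part contents}; lines before the first marker are dropped."""
    parts = {}
    current = None
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            current = match.group(1).strip()
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {name: "".join(l + "\n" for l in lines) for name, lines in parts.items()}


# Hypothetical combined test file with two parts.
sample = "# RUN: split-file %s %t\n#--- asm\nnop\n//--- sub/lds\nSECTIONS {}\n"
parts = split_parts(sample)
```

The RUN line is not mistaken for a marker because "--- " must follow the one-character (or "//") comment prefix directly.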

Use case A (organizing input of different formats (e.g. linker
script+assembly) in one file).

# RUN: split-file %s %t
# RUN: llvm-mc %t/asm -o %t.o
# RUN: ld.lld -T %t/lds %t.o -o %t
This is sometimes better than the %S/Inputs/ approach because the reader
can see the auxiliary files immediately and doesn't have to open another file.

#--- asm
...
#--- lds
...

Use case B (for utilities which don't have built-in input splitting
feature):

// RUN: split-file %s %t
// RUN: llc < %t/1.ll | FileCheck %s --check-prefix=CASE1
// RUN: llc < %t/2.ll | FileCheck %s --check-prefix=CASE2
Combining tests prudently can improve readability.
For example, when testing parsing errors where error recovery isn't possible,
grouping the tests in one file makes the test coverage and strategy easier to see.

//--- 1.ll
...
//--- 2.ll
...

Since this is a new utility, there are no git history concerns about
UpperCase variable names. I use lowerCase variable names, as mlir and lld do.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. · Jul 14 2020, 5:31 PM

Herald added a project: Restricted Project. · Jul 14 2020, 5:31 PM

Herald added subscribers: llvm-commits, stephenneuendorffer, rriddle, mgorny.

MaskRay mentioned this in D83725: [llvm-mc] Add --doc-id=<id> to support multiple documents in a file. · Jul 14 2020, 5:33 PM

Harbormaster failed remote builds in B64265: Diff 278028! · Jul 14 2020, 5:43 PM

lkail added a subscriber: lkail. · Jul 14 2020, 6:34 PM

gargaroff added a subscriber: gargaroff. · Jul 15 2020, 12:16 AM

I wonder if this utility deserves a small number of tests to show the behaviour is exactly what we want? I think it's fine, but it's not obviously correct that the utility is doing what we want.

llvm/tools/extract/extract.cpp
24 ↗	(On Diff #278028)	`toolName` Also, should this be derived from `argv[0]`? That way if the tool gets renamed, the error message still uses the actual executable name.
26 ↗	(On Diff #278028)	Is there a way of telling clang-tidy to allow a different naming style?
42 ↗	(On Diff #278028)	`const Twine &`?
47 ↗	(On Diff #278028)	Ditto.
53 ↗	(On Diff #278028)	I might use `fileBegin` and `fileEnd`.
78 ↗	(On Diff #278028)	was not found
87 ↗	(On Diff #278028)	If I read this correctly, this will print a series of empty lines before the output. I'm not sure I see the benefit of that? What about empty lines after the contents?
101 ↗	(On Diff #278028)	`bufferOrErr`

jhenderson added inline comments. · Jul 15 2020, 1:09 AM

llvm/tools/extract/extract.cpp
87 ↗	(On Diff #278028)	Oh, I just read the mailing list comment saying this is to preserve line numbers. It might be helpful having a comment here in the code explaining this.
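
The line-number point can be sketched in a few lines of Python. This illustrates the idea only: the helper name pad_to_original_line is hypothetical, and whether the real tool pads in exactly this way is an assumption. Each extracted part is prefixed with blank lines so that its first line keeps the line number it had in the combined file, which keeps diagnostics pointing at the right line.

```python
def pad_to_original_line(part_lines, first_line_no):
    """Prefix blank lines so part_lines[0] sits on line first_line_no (1-based)."""
    padding = "\n" * (first_line_no - 1)
    return padding + "".join(line + "\n" for line in part_lines)


# A part whose first line was line 4 of the combined test file.
padded = pad_to_original_line([".warning", "nop"], 4)

# Map each non-empty line of the padded output to its line number.
line_numbers = {
    text: number
    for number, text in enumerate(padded.splitlines(), start=1)
    if text
}
```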

Just wanted to say I agree this can be better than the Inputs/ idiom, especially for comparatively short inputs. We have some downstream tests that use a bunch of echo commands to generate a second file, but an extract utility would be cleaner IMO.

+1 to porting some examples to help show the benefit of such a tool (& seed the usage so hopefully folks are more likely to be aware of/find/use this in the future when looking at existing tests to draw inspiration from)

+1 to a comment explaining why the blank newlines are present (do we have any formats where the blank newlines could be semantically significant/problematic? I don't think so)

& not exactly relevant to the tool itself, but do common editors support any way to annotate regions of code as being for different file formats? It'd be good to still get syntax highlighting and auto-indent for different input formats mixed in a file - by adding comments that tell the editor what format to interpret the next set of lines as, etc. For the single format cases that's not an issue.

Address comments. Add some examples

Herald added a reviewer: • espindola. · Jul 15 2020, 11:30 AM

Herald added a reviewer: alexander-shaposhnikov.

Herald added a reviewer: rupprecht.

Herald added subscribers: rupprecht, aheejin, emaste.

MaskRay added inline comments. · Jul 15 2020, 11:30 AM

llvm/tools/extract/extract.cpp
26 ↗	(On Diff #278028)	Added `.clang-tidy` (like lldb/.clang-tidy and lld/.clang-tidy)
53 ↗	(On Diff #278028)	Changed to `partBegin`/`partEnd`. Hope that is clearer.

MaskRay marked an inline comment as done. · Jul 15 2020, 11:33 AM

Harbormaster failed remote builds in B64395: Diff 278259! · Jul 15 2020, 12:06 PM

Fix OVERVIEW.

Update TestingGuide.rst

Harbormaster failed remote builds in B64418: Diff 278299! · Jul 15 2020, 2:38 PM

LGTM. Thanks for doing this!

This revision is now accepted and ready to land. · Jul 16 2020, 12:03 AM

jhenderson mentioned this in D83520: [llvm-libtool-darwin] Allow flattening archives. · Jul 16 2020, 12:37 AM

grimar added inline comments. · Jul 16 2020, 1:37 AM

lld/test/ELF/linkerscript/noload.s
1–2	This new style in LLD tests looks much better to me!
llvm/test/tools/extract/no-leading-lines.s
4 ↗	(On Diff #278299)	I wonder if it is better to use a tool that doesn't require "REQUIRES: x86-registered-target". E.g. it probably could be yaml2obj that reported a syntax error.
llvm/test/tools/llvm-objcopy/ELF/strip-symbol.test
14 ↗	(On Diff #278299)	`#` -> `##` while you are here?
llvm/tools/extract/extract.cpp
43 ↗	(On Diff #278299)	This is used only once I think. Perhaps it can be merged with the `error()` below: LLVM_ATTRIBUTE_NORETURN static void error(StringRef filename, const Twine &message) { if (filename.empty()) ... else ... exit(1);
63 ↗	(On Diff #278299)	This supports `//---`, but it doesn't seem you have a test for it? I'd also add a comment saying it tries to find a `^(.|//)--- <part>` line. Perhaps this piece is easier to read as: if (line.size() <= markerLen || !line.substr(markerLen - 4).startswith("--- ")) continue;
68 ↗	(On Diff #278299)	test?
69 ↗	(On Diff #278299)	You do not need this line, `error()` calls `exit(1)` inside.
81 ↗	(On Diff #278299)	test?
111 ↗	(On Diff #278299)	test?

Address comments

llvm/test/tools/extract/no-leading-lines.s
4 ↗	(On Diff #278299)	For this one, we can drop -triple=x86_64. `#` in the line beginning is a universal comment among targets. For others, I want to avoid syntax errors because that would change the exit code of the tool.
llvm/test/tools/llvm-objcopy/ELF/strip-symbol.test
14 ↗	(On Diff #278299)	Since this is about the input, `#` works better, i.e. the syntax does not require two `##` to start a comment.
llvm/tools/extract/extract.cpp
63 ↗	(On Diff #278299)	basic.s has `//--- cc`
68 ↗	(On Diff #278299)	basic.s has `# DUP:`
81 ↗	(On Diff #278299)	basic.s has `was not found`
111 ↗	(On Diff #278299)	`basic.s` has `# NO_INPUT:`

Oh, forgot to git add basic.s in a previous revision:/

Harbormaster failed remote builds in B64537: Diff 278504! · Jul 16 2020, 9:24 AM

Harbormaster failed remote builds in B64538: Diff 278505! · Jul 16 2020, 9:42 AM

Sorry, I should have paid more attention when doing my previous review.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Do you actually need to use llvm-mc at all? It seems a bit heavy duty. You could just use `FileCheck` directly on the output of `extract`, e.g. # RUN: extract aa %s | FileCheck %s --check-prefix=AA --implicit-check-not=bb ... # AA: {{^}}aa{{$}} #--- aa aa I suppose that doesn't cover the line numbering, but perhaps that should be a different test?
llvm/tools/extract/extract.cpp
26 ↗	(On Diff #278028)	I'm not seeing the `.clang-tidy` in the file list?

MaskRay marked 3 inline comments as done. · Jul 17 2020, 12:38 AM

MaskRay added inline comments.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	I suppose the only problem is `# REQUIRES: x86-registered-target`, but this is the simplest tool I can find which can print line number information, and we do need to test several forms and ensure each form gets correct line numbers. If you have a suggestion for another tool that does not need `# REQUIRES: x86-registered-target`, I'd happily change. Otherwise I'd prefer sticking with llvm-mc.
llvm/tools/extract/extract.cpp
26 ↗	(On Diff #278028)	It is in the file list: `A M llvm/tools/extract/.clang-tidy (19 lines)`

MaskRay marked an inline comment as done. · Jul 17 2020, 12:38 AM

grimar added inline comments. · Jul 17 2020, 3:24 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Perhaps you could use LLVM's `count` tool to count the number of lines?

grimar added inline comments. · Jul 17 2020, 3:32 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Another way I can think of is to add more sample files to compare with. I.e. you can create sample.a, sample.b etc. and then compare the output of extract with those samples. Will this work? I also just do not think that using `llvm-mc` is a good idea for a tool that splits text lines...

MaskRay marked 2 inline comments as done. · Jul 17 2020, 9:08 AM

MaskRay added inline comments.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	"Perhaps, you could use the LLVM's count tool to count the number of lines?" This seems like an indirect way to test something. "I.e. you can create sample.a, sample.b etc. And then compare the output of extract with those samples." This degrades readability. Adding a line means all Inputs/ files need updating. `.warning` has a well-defined and very stable interface. We use nothing but a very basic feature set of its lexing. No assembly, no disassembly. I don't see a problem using it for testing.

dblaikie added inline comments. · Jul 17 2020, 9:25 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	"I.e. you can create sample.a, sample.b etc. And then compare the output of extract with those samples. This degrades readability. Adding a line means all Inputs/ files need updating." I think this (having separate test files, diffing them exactly) is the right way to test this tool - the nature of low-level tools that are designed to improve the readability of higher level tests is that testing them will be less elegant. (eg: testing yaml2obj if we intend to use it to test llvm-dwarfdump means yaml2obj tests can't use llvm-dwarfdump to test them, etc) Indeed, presumably we want to use this tool when testing llvm-mc, so we should not use llvm-mc to test the tool - otherwise the testing is circular. Please don't use llvm-mc or similar things to test this very low-level piece of test infrastructure like this.

llvm-mc -> FileCheck + count

dblaikie added inline comments. · Jul 17 2020, 10:19 AM

llvm/test/tools/extract/basic.test
21–23 ↗	(On Diff #278826)	Does this test that the desired lines are at the desired line numbers? It seems not. The padding lines could be incorrectly printed at the end of the output & it would still pass, right? I think it'd be better to write this test with exact expected output files & run diff against them. This is a case, in my opinion, where simple golden files seem entirely appropriate - the point of the tool is to produce very specific output, unlike something like opt/llc/etc that produce one of a set of equivalent outputs.
llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	Why does this project have a different format compared to the rest of LLVM?
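
On the golden-file suggestion for basic.test above, the difference between a line-count check and an exact comparison can be shown with a small Python sketch. The helper name golden_diff and both sample strings are invented for illustration. The sketch uses Python's difflib as a stand-in for running diff in a lit test.

```python
import difflib


def golden_diff(actual, expected):
    """Return unified-diff lines; an empty list means the output matches exactly."""
    return list(
        difflib.unified_diff(
            expected.splitlines(keepends=True),
            actual.splitlines(keepends=True),
            fromfile="expected",
            tofile="actual",
        )
    )


expected = "\n\n\naa\n"   # padding first, then the part
misplaced = "aa\n\n\n\n"  # same lines, padding wrongly at the end

# A count-style check cannot tell the two outputs apart.
same_line_count = len(expected.splitlines()) == len(misplaced.splitlines())

# An exact comparison against the golden text reports the difference.
mismatch = golden_diff(misplaced, expected)
```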

Harbormaster failed remote builds in B64711: Diff 278826! · Jul 17 2020, 10:49 AM

Add two golden files

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	Adopt docs/Proposals/VariableNames.rst (proposal, not an agreement. Not agreed because people don't want VariableName -> variableName churn. There is no such concern for a standalone fresh utility & this is the style adopted by lld & mlir)

dblaikie added inline comments. · Jul 17 2020, 11:28 AM

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	For distinct top-level projects that seems sort of OK - but I don't think it's appropriate to have divergent styles within a single top-level project, such as LLVM.

Harbormaster failed remote builds in B64729: Diff 278859! · Jul 17 2020, 11:57 AM

MaskRay marked an inline comment as done. · Jul 17 2020, 12:08 PM

MaskRay added subscribers: michaelplatings, lattner.

MaskRay added inline comments.

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	"For distinct top-level projects that seems sort of OK - but I don't think it's appropriate to have divergent styles within a single top-level project, such as LLVM." I don't think people said this was not OK in the previous discussion. @michaelplatings @lattner

lit substitution: extract -> %extract

Unfortunately, 'extract' in 'llvm-xray extract' will be substituted, due to D83578

MaskRay edited the summary of this revision. · Jul 23 2020, 8:29 AM

Harbormaster failed remote builds in B65386: Diff 280141! · Jul 23 2020, 9:26 AM

Use %extract only in not %extract

Closed by commit rGd054c7ee2e9f: Add test utility 'extract' (authored by MaskRay). · Jul 23 2020, 7:15 PM

This revision was automatically updated to reflect the committed changes.

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collision is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract)

In D83834#2171208, @dblaikie wrote:

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collision is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])

Any tool in this list is special. It can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecate this behavior.
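
To illustrate why substitution in any position collides with 'llvm-xray extract', here is a small Python sketch. It is not lit's actual substitution code. The pattern, the helper name substitute_tool, the RUN lines and the /bin/extract path are all assumptions made for the example.

```python
import re


def substitute_tool(run_line, tool, path):
    """Replace every standalone occurrence of tool on a RUN line with path."""
    # Hypothetical pattern, not lit's: the name must not touch word
    # characters, '-' or '.', but may appear at any position on the line.
    pattern = r"(?<![\w.-])" + re.escape(tool) + r"(?![\w.-])"
    return re.sub(pattern, lambda _match: path, run_line)


# 'extract' in command position is rewritten; 'llvm-extract' is left alone.
as_command = substitute_tool(
    "extract %s | llvm-extract -func=f", "extract", "/bin/extract"
)

# 'extract' as a subcommand argument is rewritten too, which breaks the line.
as_argument = substitute_tool(
    "llvm-xray extract %t -o out", "extract", "/bin/extract"
)
```

Under these assumptions the second RUN line becomes 'llvm-xray /bin/extract %t -o out', which is the collision described above.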

Harbormaster completed remote builds in B65483: Diff 280314. · Jul 23 2020, 7:44 PM

In D83834#2171222, @MaskRay wrote:
Any tool in this list is special. It can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecate this behavior.

This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

In D83834#2171217, @MaskRay wrote:

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract)

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collision with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

In D83834#2171252, @dblaikie wrote:
This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

Things like not and count are not in the list. I don't think using %extract only for its own test is a problem. Nobody uses not %extract in tests. It is there just to test the tool itself.

In D83834#2171217, @MaskRay wrote:

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract)

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collision with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

'split' is a utility specified by POSIX.1-2017 Base.

"write all the file fragments into the specified directory, named after the fragment name" is a good idea. If it is implemented as a special mode, it can be: extract all %s -o %t.dir or extract all %s -o %t. -o specifies a directory name.

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

In D83834#2172384, @lattner wrote:

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

I considered the choice. mlir-opt/mlir-translate's -split-input-file/clang -verify is suitable when the input is homogeneous and the test is performed by the tool itself.
For heterogeneous needs (llvm-objcopy --strip-symbols %t-list.txt %t %t5 (symbol list), ld.lld -T %t.lds %t.o (linker script)), or when the tool output is inspected by another tool, a standalone utility is likely more useful.

So far I like @dblaikie's semantics. I'd like syntax like extract %s -o %t.dir. For example the following will create %t.dir/{a.txt,b.ll,c.c}

#--- a.txt
0
;--- b.ll
1
//--- c.c
2

I am not settled on completely deleting the previous extract part %s syntax, or renaming 'extract' to something else.
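
The directory-style semantics in the three-part example above can be sketched in Python as follows. This is an illustration under assumptions, not the eventual tool: split_to_dir is a hypothetical name, a temporary directory stands in for %t.dir, and error handling is omitted.

```python
import re
import tempfile
from pathlib import Path

# Marker: any single character or "//", then "--- " and the part name.
MARKER = re.compile(r"^(?:.|//)--- (.+)$")

# The three-part example from the comment above.
SAMPLE = "#--- a.txt\n0\n;--- b.ll\n1\n//--- c.c\n2\n"


def split_to_dir(text, out_dir):
    """Write every part of text below out_dir; return the sorted part names."""
    parts = {}
    current = None
    for line in text.splitlines(keepends=True):
        match = MARKER.match(line.rstrip("\n"))
        if match:
            current = match.group(1).strip()
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    for name, lines in parts.items():
        target = Path(out_dir) / name
        # Part names may contain path separators, so create parent directories.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("".join(lines))
    return sorted(parts)


with tempfile.TemporaryDirectory() as tmp:
    written = split_to_dir(SAMPLE, tmp)
    b_ll = (Path(tmp) / "b.ll").read_text()
```

One invocation produces all parts, so a test needs a single RUN line for splitting regardless of how many fragments it contains.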

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxiliary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

In D83834#2172415, @lattner wrote:

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

How about a short name, e.g. split-file? We have many lit utilities not named llvm-*: not, count, FileCheck. The utility is more of their league.

In D83834#2172423, @lattner wrote:

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxiliary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

Separate files are considered. Actually that motivated the standalone utility https://reviews.llvm.org/D83725#2149490

"I end up doing one of three things in this situation: 1) adding a separate file in the "Inputs" directory - this is not great because the test input is far away from the actual test (i.e. not in the same file), making it harder to follow; 2) echoing the second and later inputs to separate files at runtime - this is not great because it has a runtime cost; ..."

I don't see any reason to prefer a short name here; this isn't a simple tool like 'opt', it takes command line flags etc.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

In D83834#2172461, @lattner wrote:

I don't see any reason to prefer a short name here; this isn't a simple tool like 'opt', it takes command line flags etc.

not, count, FileCheck, update_*_test_checks.py. These utilities set a precedent that these auxiliary test-only utilities are not named llvm-*.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this falls into subjective points of view and people may have different opinions. For me, non-trivial separate files (not creatable with a one-line echo) have caused enough pain. It is not unusual for me to open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I am not miscounting, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(Binary files are sometimes useful, e.g. when testing compatibility of LLVM IR: a pre-built file in llvm/test/Bitcode/ ensures that compatibility is retained. However, their use cases are very narrow. In 99% of cases textual formats will be a superior replacement. I hope we don't let a 1% inapplicable use case be the reason that a general purpose testing utility should not be introduced.)

dblaikie added a comment. · Jul 24 2020, 9:51 AM

This comment was removed by dblaikie.

I'm not sure how Google's test runner works or how it differs. I recall many historical examples (in clang in particular), but it is possible they all got changed.

There's no need to speculate though :-), I'd just try an example to see if it works in practice in your environment.

In D83834#2172461, @lattner wrote:

I don't see any reason to prefer a short name here; this isn't a simple tool like 'opt', it takes command line flags etc.

I think if it's going to be invoked once per fragment, inline with the destination (extract %s test1 | llc ... ) then having a shorter name is nice. If it's going to be invoked once to split everything up - that'd be on a separate RUN line anyway, and having a longer name seems fine by me. Whether or not it has an llvm- prefix I don't feel /too/ strongly about, but it does make clear that this isn't some unix vocabulary program (like not, count, etc), but a specific LLVM thing.

Name your test something like "foo.ll", then have the auxiliary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.
...
Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

What's the precedent you're referring to here? Could you point me to some examples? Because my understanding was that we have a fairly strong precedent for input files being in an Inputs directory (in fact, I think Google's internal test runner relies on this convention) - having other files immediately next to test files I don't think is done & if it was done, I'd worry about the suffixes potentially colliding with valid test prefixes, then the auxiliary files would be incorrectly run as independent tests & probably fail due to lack of RUN lines, etc.

(3) is certainly compelling to me - being able to copy/paste lines from a test to reproduce is nice - though lots of tests are stateful - generating something, then doing a thing with it (running objdump and dwarfdump to inspect different features of the output - so you can't just copy/paste the dwarfdump+filecheck, you have to rerun the right command that generated the input too, etc).

I guess one thing we could do with this functionality that might make tests more reliable - this tool could delete the contents of the target directory before extracting the files (maybe this would be conflating responsibilities a bit and catch someone by surprise, though) reducing the incidence of tests being polluted by previous executions.

To throw in another naming suggestion: clang uses the name "unbundle".

Also see http://lists.llvm.org/pipermail/llvm-dev/2020-July/143568.html

In D83834#2172509, @MaskRay wrote:

In D83834#2172461, @lattner wrote:

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this is subjective and that people may have different opinions. For me, non-trivial separate files (ones not creatable with a one-line echo) have caused enough pain. It is not unusual for me to open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I am not miscounting, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(Binary files are sometimes useful, e.g. when testing compatibility of LLVM IR: a pre-built file in llvm/test/Bitcode/ ensures that compatibility is retained. However, their use cases are very narrow. In 99% of cases a textual format will be a superior replacement. I hope we don't let the 1% of inapplicable use cases be the reason that a general-purpose testing utility is not introduced.)

I have a strong +1 to have a way to keep inputs and test in a single file. In my experience it significantly improves readability. E.g. we have the --docnum=x option for yaml2obj, and it is very common for llvm-readelf tests to have a few little YAML inputs in the same file. We recently introduced macro support for yaml2obj (-DMACRO=<val>) along with default values for macros, and this also helps to reuse YAML descriptions from the same file, which generally helps to isolate test cases, reuse them for similar tests and keep the whole file compact.
If we had separate inputs, I believe it would be very hard to maintain those tests.

Initially D84054 suggested the --doc-id=<id> option for llvm-mc, which could let asm tests avoid splitting their inputs into separate files. I think it was a good solution by itself, but a separate tool is probably a more general way to achieve the same thing.

I don't think I have much new to add that hasn't been said already. My personal opinions are:

Naming - I don't really mind whether it has a long or short name. Given it's usually going to be on a separate line anyway, I suspect, rather than writing stuff to stdout for feeding into stdin, I don't think a longer name is an issue. The point about it not being a unix tool equivalent makes some sense to me for adding an llvm- prefix, although there are counter-examples for this. I also have no preference on the exact name, as long as it conveys meaning well enough ([llvm-]split-file seems clear enough for example).
I want my test inputs to be all in one file where possible, as it makes it much easier to iterate on a test, understand it, etc, by keeping everything in the single file. There is a place for separate inputs, where they are going to be commonly reused across many tests (see for example the recent llvm-libtool-darwin testing, which reuses the same YAML repeatedly), but I think these are more the exception than the rule. It seems to me that the %s.xxx approach is not much better than %S/Inputs in this regard.
@dblaikie's concern about possible extension collisions with the %s.xxx approach is one I'd share. One example might be an lld or llvm-objdump test consuming multiple asm files. A naive user might call them %s.1.s, %s.2.s etc and, if they're just running the one test, won't notice that those files are picked up as tests when the whole directory is run, resulting in lit failures.
An "extract all" option would be useful. In fact, assuming we don't start adding other functionality to the tool (such as string substitution), I suspect the extract everything approach is going to be almost universally the better option, so might want to be the default behaviour.

Higuoxing added a subscriber: Higuoxing.Jul 28 2020, 1:38 AM

Hi All,

I'm concerned that this patch was just landed despite the open discussion about problems. As one example, it doesn't make any sense to me that we now have llvm/tools/extract and llvm/tools/llvm-extract. Please revert until the details are sorted out and we get to convergence on the issues.

-Chris

MaskRay reopened this revision.Jul 28 2020, 1:26 PM

This revision is now accepted and ready to land.Jul 28 2020, 1:26 PM

MaskRay added a reverting change: rGdd405f1a5397: Revert D83834 "Add test utility 'extract'".Jul 28 2020, 1:26 PM

MaskRay planned changes to this revision.Jul 28 2020, 1:26 PM

ychen added a subscriber: ychen.Jul 28 2020, 1:30 PM

Thank you Fangrui!

NeHuang mentioned this in D83669: [PowerPC] Support for R_PPC64_REL24_NOTOC calls where the caller has no TOC and the callee is not DSO local.Jul 28 2020, 3:09 PM

Rename extract to split-file and change the interface

This revision is now accepted and ready to land.Jul 29 2020, 10:31 AM

MaskRay edited the summary of this revision. Jul 29 2020, 10:32 AM

Harbormaster completed remote builds in B66245: Diff 281660.Jul 29 2020, 10:37 AM

MaskRay added inline comments.Jul 29 2020, 12:44 PM

llvm/test/tools/split-file/basic.test
11	I'll enhance the test a bit:
## Test that we will delete the output if it is a file, so that we can create
## a directory.
# RUN: touch %t

jhenderson added inline comments.Jul 30 2020, 4:57 AM

llvm/tools/split-file/split-file.cpp
64	Whilst it's probably harmless, I don't think this should be an `int` (and similar comments for other line number variables, e.g. `lineNo` in `handle`), since it can never be negative. `size_t` or possibly `unsigned` seem more appropriate for the context.
97	I'm not sure it's immediately obvious here what the type of `cur` is, so I'd prefer no `auto`.
114	The name `it` implies to me that this is an iterator, but it's actually a pair, right? Perhaps best to change the name to something (e.g. `keyValue` or whatever)
118	You already have an `ec` local variable here. Perhaps this should be renamed and/or the earlier `ec` declaration moved to where it is actually used.
158	`sys::fs::remove` returns a `std::error_code`. I'm not sure we want to be ignoring it - I'd expect some sort of error in that case to be printed.

Address comments.
Enhance tests.

MaskRay marked an inline comment as done.Jul 30 2020, 9:25 AM

MaskRay added inline comments.Jul 30 2020, 9:28 AM

llvm/tools/split-file/split-file.cpp
64	Emm. I think it is debatable whether `unsigned` is a suitable type here. See comments starting from https://reviews.llvm.org/D82594#2127217 for some discussions. I actually perform arithmetic near zero below (`i.line_number() - 1`). int gives me more confidence that things don't go off.

MaskRay added inline comments.Jul 30 2020, 9:30 AM

llvm/test/tools/split-file/output-is-special.test
5	I'll change this comment to:
## Don't delete the output if it is special, otherwise root may accidentally
## remove important special files.

Harbormaster completed remote builds in B66413: Diff 281958.Jul 30 2020, 9:47 AM

LGTM, with or without the suggestion, but since others had comments about this, it would be good to get another pair of eyes to give it another look over.

llvm/tools/split-file/split-file.cpp
64	It turns out that `line_number` is an `int64_t`. So I drop my point about `unsigned` or `size_t` (at least for this case, but I'm hardly convinced on the general case discussed on that or previous threads). However, perhaps `leadingLines` etc should be `int64_t` to match and avoid any truncation issues (noting in particular that the value could be anything in the range due to how EOF is handled)?
127	Another possible `int64_t` site.

Thank you for renaming this!

int -> int64_t

Harbormaster completed remote builds in B66799: Diff 282671.Aug 3 2020, 11:40 AM

MaskRay edited the summary of this revision. Aug 3 2020, 2:43 PM

Closed by commit rGbcea3a7a288e: Add test utility 'split-file' (authored by MaskRay). Aug 3 2020, 8:43 PM

This revision was automatically updated to reflect the committed changes.

MaskRay added a commit: rGbcea3a7a288e: Add test utility 'split-file'.

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

This is strange. A Windows bot I usually look at was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

Some of the new tests FAIL when run in the same tree a second time:

LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test

The failure mode is always the same, e.g.

split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists

This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

In D83834#2201665, @MaskRay wrote:

This is strange. A Windows bot I usually look at was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

FWIW, I've tested this (on windows) and all tests in \llvm\test\tools\split-file folder pass fine for me.

MaskRay mentioned this in D83852: [llvm-profdata] Implement llvm-profdata overlap for sample profiles.Aug 8 2020, 3:18 PM

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

In D83834#2201952, @ro wrote:
Some of the new tests FAIL when run in the same tree a second time:
LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test
The failure mode is always the same, e.g.
split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists
This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

I just tried running the tests twice on my Windows machine, and they pass (no cleaning in between). At a guess, this is something different in the Solaris OS behaviour?

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

In D83834#2206034, @jhenderson wrote:

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

In D83834#2206034, @jhenderson wrote:

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

According to http://lists.llvm.org/pipermail/llvm-dev/2020-August/143995.html

llvm/utils. -> this builds some executables like tablegen

llvm/lib. -> This is all libraries, shouldn’t include executables.

llvm/tools. -> Generally executables, also some libraries that are tool specific.

tests.. -> Things that depend on the above.

I am on the fence about which of llvm/tools/split-file and llvm/utils/split-file makes more sense. I'd like to collect a few more opinions.

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

In D83834#2208059, @MaskRay wrote:

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

Here's what I found: the failing call to split-file is like this:

./bin/split-file --no-leading-lines /vol/llvm/src/llvm-project/local/llvm/test/tools/split-file/empty.test /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp

Running under truss shows

19028:  fstatat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", 0xFFFFFFFF7FFFE380, AT_SYMLINK_NOFOLLOW) = 0
19028:      d=0x0000010600010012 i=212407 m=0042750 l=2  u=2110  g=4620  sz=3
19028:          at = Aug 10 11:21:41 MEST 2020  [ 1597051301.681593645 ]
19028:          mt = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:          ct = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:      bsz=512   blks=3     fs=zfs
19028:  unlinkat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", AT_REMOVEDIR) Err#17 EEXIST

i.e. it tries to remove a directory. However, this cannot work since the directory in question isn't empty:

$ ls -la /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp
total 4
drwxr-s--- 2 ro gcc  3 Aug 10 01:46 ./
drwxr-sr-x 5 ro gcc 11 Aug 10 18:32 ../
-rw-r--r-- 1 ro gcc  0 Aug 10 01:46 empty

When I run split-file in gdb with a breakpoint in rmdir, I get

#0  0xffffffff7eeb7388 in rmdir () from /lib/64/libc.so.1
#1  0xffffffff7ee2c47c in remove () from /lib/64/libc.so.1
#2  0x000000010025069c in llvm::sys::fs::remove (path=..., 
    IgnoreNonExisting=true)
    at /vol/llvm/src/llvm-project/local/llvm/lib/Support/Unix/Path.inc:442
#3  0x00000001001821ac in main (argc=4, argv=0xffffffff7fffeb38)
    at /vol/llvm/src/llvm-project/local/llvm/tools/split-file/split-file.cpp:168

So llvm::sys::fs::remove tries to call ::remove on a non-empty directory, while the man page clearly states:

The  remove() function causes the file or empty directory whose name is
the string pointed to by path to be no longer accessible by that  name.
A  subsequent  attempt  to  open  that  file using that name will fail,
unless the file is created anew.

For files, remove() is identical to unlink(). For directories, remove()
is identical to rmdir().

and in turn rmdir(2) says

The  rmdir()  function  removes  the  directory  named by the path name
pointed to by path. The directory must not have any entries other  than
"." and "..".

This just cannot work!

In D83834#2207311, @aeubanks wrote:

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

Might still be worth either making this test more general and/or setting this file to be binary rather than text, so it doesn't trip over this issue?

(from the Getting Started page: "Passing --config core.autocrlf=false should not be required in the future after we adjust the .gitattribute settings correctly, but is required for Windows users at the time of this writing.")

In D83834#2208107, @ro wrote:

This just cannot work!

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());

In D83834#2208178, @MaskRay wrote:

In D83834#2208107, @ro wrote:

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());

On Solaris it's EEXIST/file_exists, as documented on the Solaris rmdir(2) man page and explicitly allowed by POSIX/XPG7:

[EEXIST] or [ENOTEMPTY]
    The path argument names a directory that is not an empty directory, or there are hard links to the directory other than dot or a single entry in dot-dot.

MaskRay mentioned this in rGdbc468dc3199: [split-file] Fix sys::fs::remove() on Solaris after D83834.Aug 11 2020, 8:06 AM

jansvoboda11 mentioned this in D118586: [C++20][Modules][3/8] Initial handling for module partitions..Feb 21 2022, 11:38 PM

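Revision Contents

Path	Size
lld/test/CMakeLists.txt	2 lines
lld/test/ELF/linkerscript/noload.s	21 lines
llvm/docs/TestingGuide.rst	23 lines
llvm/test/CMakeLists.txt	1 line
llvm/test/lit.cfg.py	1 line
llvm/test/tools/gold/X86/multiple-sections.ll	14 lines
llvm/test/tools/llvm-strings/radix.test	45 lines
llvm/test/tools/split-file/Inputs/basic-aa.txt	2 lines
llvm/test/tools/split-file/Inputs/basic-bb.txt	6 lines
llvm/test/tools/split-file/Inputs/basic-cc.txt	8 lines
llvm/test/tools/split-file/basic.test	40 lines
llvm/test/tools/split-file/empty.test	4 lines
llvm/test/tools/split-file/error.test	16 lines
llvm/test/tools/split-file/help.test	6 lines
llvm/test/tools/split-file/no-leading-lines.test	10 lines
llvm/test/tools/split-file/output-is-special.test	8 lines
llvm/tools/split-file/.clang-tidy	19 lines
llvm/tools/split-file/CMakeLists.txt	7 lines
llvm/tools/split-file/split-file.cpp	172 lines
llvm/utils/gn/secondary/lld/test/BUILD.gn	1 line
llvm/utils/gn/secondary/llvm/test/BUILD.gn	1 line
llvm/utils/gn/secondary/llvm/tools/split-file/BUILD.gn	4 lines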

Diff 282793

lld/test/CMakeLists.txt

Show All 22 Lines	configure_lit_site_cfg(
)		)

set(LLD_TEST_DEPS lld)		set(LLD_TEST_DEPS lld)
if (NOT LLD_BUILT_STANDALONE)		if (NOT LLD_BUILT_STANDALONE)
list(APPEND LLD_TEST_DEPS		list(APPEND LLD_TEST_DEPS
FileCheck count llc llvm-ar llvm-as llvm-bcanalyzer llvm-config llvm-cvtres		FileCheck count llc llvm-ar llvm-as llvm-bcanalyzer llvm-config llvm-cvtres
llvm-dis llvm-dwarfdump llvm-lib llvm-lipo llvm-mc llvm-nm llvm-objcopy		llvm-dis llvm-dwarfdump llvm-lib llvm-lipo llvm-mc llvm-nm llvm-objcopy
llvm-objdump llvm-pdbutil llvm-readelf llvm-readobj llvm-strip not obj2yaml		llvm-objdump llvm-pdbutil llvm-readelf llvm-readobj llvm-strip not obj2yaml
opt yaml2obj		opt split-file yaml2obj
)		)
endif()		endif()

if (LLVM_INCLUDE_TESTS)		if (LLVM_INCLUDE_TESTS)
list(APPEND LLD_TEST_DEPS LLDUnitTests)		list(APPEND LLD_TEST_DEPS LLDUnitTests)
endif()		endif()

add_lit_testsuite(check-lld "Running lld test suite"		add_lit_testsuite(check-lld "Running lld test suite"
Show All 17 Lines

lld/test/ELF/linkerscript/noload.s

	# REQUIRES: x86			# REQUIRES: x86
	# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o			# RUN: split-file %s %t
				grimar: This new style in LLD tests looks much better to me!
	# RUN: echo "SECTIONS { \			# RUN: llvm-mc -filetype=obj -triple=x86_64 %t/asm -o %t.o
	# RUN: .data_noload_a (NOLOAD) : { *(.data_noload_a) } \			# RUN: ld.lld --script %t/lds %t.o -o %t/out
	# RUN: .data_noload_b (0x10000) (NOLOAD) : { *(.data_noload_b) } \			# RUN: llvm-readelf -S -l %t/out | FileCheck %s
	# RUN: .no_input_sec_noload (NOLOAD) : { . += 1; } \
	# RUN: .text (0x20000) : { *(.text) } };" > %t.script
	# RUN: ld.lld -o %t --script %t.script %t.o
	# RUN: llvm-readelf -S -l %t | FileCheck %s

	# CHECK: Name Type Address Off Size			# CHECK: Name Type Address Off Size
	# CHECK: .data_noload_a NOBITS 0000000000000000 [[OFF:[0-9a-f]+]] 001000			# CHECK: .data_noload_a NOBITS 0000000000000000 [[OFF:[0-9a-f]+]] 001000
	# CHECK-NEXT: .data_noload_b NOBITS 0000000000010000 [[OFF]] 001000			# CHECK-NEXT: .data_noload_b NOBITS 0000000000010000 [[OFF]] 001000
	# CHECK-NEXT: .no_input_sec_noload NOBITS 0000000000011000 [[OFF]] 000001			# CHECK-NEXT: .no_input_sec_noload NOBITS 0000000000011000 [[OFF]] 000001

	# CHECK: Type Offset VirtAddr PhysAddr			# CHECK: Type Offset VirtAddr PhysAddr
	# CHECK-NEXT: LOAD 0x001000 0x0000000000020000 0x0000000000020000			# CHECK-NEXT: LOAD 0x001000 0x0000000000020000 0x0000000000020000

				#--- asm
	.section .text,"ax",@progbits			.section .text,"ax",@progbits
	nop			nop

	.section .data_noload_a,"aw",@progbits			.section .data_noload_a,"aw",@progbits
	.zero 4096			.zero 4096

	.section .data_noload_b,"aw",@progbits			.section .data_noload_b,"aw",@progbits
	.zero 4096			.zero 4096

				#--- lds
				SECTIONS {
				.data_noload_a (NOLOAD) : { *(.data_noload_a) }
				.data_noload_b (0x10000) (NOLOAD) : { *(.data_noload_b) }
				.no_input_sec_noload (NOLOAD) : { . += 1; }
				.text (0x20000) : { *(.text) }
				}

llvm/docs/TestingGuide.rst

	Show First 20 Lines • Show All 265 Lines • ▼ Show 20 Lines

	Put related tests into a single file rather than having a separate file per			Put related tests into a single file rather than having a separate file per
	test. Check if there are files already covering your feature and consider			test. Check if there are files already covering your feature and consider
	adding your code there instead of creating a new file.			adding your code there instead of creating a new file.

	Extra files			Extra files
	-----------			-----------

	If your test requires extra files besides the file containing the ``RUN:``			If your test requires extra files besides the file containing the ``RUN:`` lines
	lines, the idiomatic place to put them is in a subdirectory ``Inputs``.			and the extra files are small, consider specifying them in the same file and
				using ``split-file`` to extract them. For example,

				.. code-block:: llvm

				; RUN: split-file %s %t
				; RUN: llvm-link -S %t/a.ll %t/b.ll | FileCheck %s

				; CHECK: ...

				;--- a.ll
				...
				;--- b.ll
				...

				The parts are separated by the regex ``^(.|//)--- <part>``. By default the
				extracted content has leading empty lines to preserve line numbers. Specify
				``--no-leading-lines`` to drop leading lines.

				If the extra files are large, the idiomatic place to put them is in a subdirectory ``Inputs``.
	You can then refer to the extra files as ``%S/Inputs/foo.bar``.			You can then refer to the extra files as ``%S/Inputs/foo.bar``.

	For example, consider ``test/Linker/ident.ll``. The directory structure is			For example, consider ``test/Linker/ident.ll``. The directory structure is
	as follows::			as follows::

	test/			test/
	Linker/			Linker/
	ident.ll			ident.ll
	▲ Show 20 Lines • Show All 339 Lines • Show Last 20 Lines

llvm/test/CMakeLists.txt

Show First 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	set(LLVM_TEST_DEPENDS
llvm-tblgen		llvm-tblgen
llvm-undname		llvm-undname
llvm-xray		llvm-xray
not		not
obj2yaml		obj2yaml
opt		opt
sancov		sancov
sanstats		sanstats
		split-file
verify-uselistorder		verify-uselistorder
yaml-bench		yaml-bench
yaml2obj		yaml2obj
)		)

if(TARGET llvm-lto)		if(TARGET llvm-lto)
set(LLVM_TEST_DEPENDS ${LLVM_TEST_DEPENDS} llvm-lto)		set(LLVM_TEST_DEPENDS ${LLVM_TEST_DEPENDS} llvm-lto)
endif()		endif()
▲ Show 20 Lines • Show All 82 Lines • Show Last 20 Lines

llvm/test/lit.cfg.py

Show First 20 Lines • Show All 135 Lines • ▼ Show 20 Lines	tools = [
ToolSubst('%gold', config.gold_executable, unresolved='ignore'),		ToolSubst('%gold', config.gold_executable, unresolved='ignore'),
ToolSubst('%ld64', ld64_cmd, unresolved='ignore'),		ToolSubst('%ld64', ld64_cmd, unresolved='ignore'),
ToolSubst('%ocamlc', ocamlc_command, unresolved='ignore'),		ToolSubst('%ocamlc', ocamlc_command, unresolved='ignore'),
ToolSubst('%ocamlopt', ocamlopt_command, unresolved='ignore'),		ToolSubst('%ocamlopt', ocamlopt_command, unresolved='ignore'),
ToolSubst('%opt-viewer', opt_viewer_cmd),		ToolSubst('%opt-viewer', opt_viewer_cmd),
ToolSubst('%llvm-objcopy', FindTool('llvm-objcopy')),		ToolSubst('%llvm-objcopy', FindTool('llvm-objcopy')),
ToolSubst('%llvm-strip', FindTool('llvm-strip')),		ToolSubst('%llvm-strip', FindTool('llvm-strip')),
ToolSubst('%llvm-install-name-tool', FindTool('llvm-install-name-tool')),		ToolSubst('%llvm-install-name-tool', FindTool('llvm-install-name-tool')),
		ToolSubst('%split-file', FindTool('split-file')),
]		]

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?		# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([		tools.extend([
'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',		'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',		'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',		'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',		'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
▲ Show 20 Lines • Show All 211 Lines • Show Last 20 Lines

llvm/test/tools/gold/X86/multiple-sections.ll

	; RUN: echo ".text.tin" > %t_order_lto.txt			; RUN: split-file %s %t
	; RUN: echo ".text._start" >> %t_order_lto.txt			; RUN: llvm-as %t/a.ll -o %t.o
	; RUN: echo ".text.pat" >> %t_order_lto.txt
	; RUN: llvm-as %s -o %t.o
	; RUN: %gold -plugin %llvmshlibdir/LLVMgold%shlibext \			; RUN: %gold -plugin %llvmshlibdir/LLVMgold%shlibext \
	; RUN: -m elf_x86_64 -o %t.exe %t.o \			; RUN: -m elf_x86_64 -o %t.exe %t.o \
	; RUN: --section-ordering-file=%t_order_lto.txt			; RUN: --section-ordering-file=%t/order
	; RUN: llvm-readelf -s %t.exe | FileCheck %s			; RUN: llvm-readelf -s %t.exe | FileCheck %s

	; Check that the order of the sections is tin -> _start -> pat.			; Check that the order of the sections is tin -> _start -> pat.

	; CHECK: 00000000004000d0 1 FUNC LOCAL DEFAULT 1 pat			; CHECK: 00000000004000d0 1 FUNC LOCAL DEFAULT 1 pat
	; CHECK: 00000000004000b0 1 FUNC LOCAL DEFAULT 1 tin			; CHECK: 00000000004000b0 1 FUNC LOCAL DEFAULT 1 tin
	; CHECK: 00000000004000c0 15 FUNC GLOBAL DEFAULT 1 _start			; CHECK: 00000000004000c0 15 FUNC GLOBAL DEFAULT 1 _start

				;--- order
				.text.tin
				.text._start
				.text.pat

				;--- a.ll
	target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	define void @pat() #0 {			define void @pat() #0 {
	ret void			ret void
	}			}

	define void @tin() #0 {			define void @tin() #0 {

llvm/test/tools/llvm-strings/radix.test

	## Show that llvm-strings can handle the -t/--radix switch properly.			## Show that llvm-strings can handle the -t/--radix switch properly.

	RUN: echo one > %t			RUN: split-file --no-leading-lines %s %t
	RUN: echo two >> %t			#--- a.txt
	RUN: echo three >> %t			one
	RUN: echo four >> %t			two
	RUN: echo five >> %t			three
	RUN: echo six >> %t			four
	RUN: echo seven >> %t			five
	RUN: echo eight >> %t			six
	RUN: echo nine >> %t			seven
	RUN: echo ten >> %t			eight
				nine
	RUN: llvm-strings %t | FileCheck %s -check-prefix CHECK-NONE --implicit-check-not={{.}}			ten
	RUN: llvm-strings -t d %t | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace --implicit-check-not={{.}}			#--- end
	RUN: llvm-strings -t o %t | FileCheck %s -check-prefix CHECK-OCT --strict-whitespace --implicit-check-not={{.}}
	RUN: llvm-strings -t x %t | FileCheck %s -check-prefix CHECK-HEX --strict-whitespace --implicit-check-not={{.}}			RUN: llvm-strings %t/a.txt | FileCheck %s -check-prefix CHECK-NONE --implicit-check-not={{.}}
				RUN: llvm-strings -t d %t/a.txt | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace --implicit-check-not={{.}}
				RUN: llvm-strings -t o %t/a.txt | FileCheck %s -check-prefix CHECK-OCT --strict-whitespace --implicit-check-not={{.}}
				RUN: llvm-strings -t x %t/a.txt | FileCheck %s -check-prefix CHECK-HEX --strict-whitespace --implicit-check-not={{.}}

	## Show --radix works too.			## Show --radix works too.
	RUN: llvm-strings --radix d %t | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace			RUN: llvm-strings --radix d %t/a.txt | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace
	RUN: llvm-strings --radix o %t | FileCheck %s -check-prefix CHECK-OCT --strict-whitespace			RUN: llvm-strings --radix o %t/a.txt | FileCheck %s -check-prefix CHECK-OCT --strict-whitespace
	RUN: llvm-strings --radix x %t | FileCheck %s -check-prefix CHECK-HEX --strict-whitespace			RUN: llvm-strings --radix x %t/a.txt | FileCheck %s -check-prefix CHECK-HEX --strict-whitespace

	## Show different syntaxes work.			## Show different syntaxes work.
	RUN: llvm-strings --radix=d %t | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace			RUN: llvm-strings --radix=d %t/a.txt | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace
	RUN: llvm-strings -t=d %t | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace			RUN: llvm-strings -t=d %t/a.txt | FileCheck %s -check-prefix CHECK-DEC --strict-whitespace

	CHECK-NONE: {{^}}three			CHECK-NONE: {{^}}three
	CHECK-NONE: {{^}}four			CHECK-NONE: {{^}}four
	CHECK-NONE: {{^}}five			CHECK-NONE: {{^}}five
	CHECK-NONE: {{^}}seven			CHECK-NONE: {{^}}seven
	CHECK-NONE: {{^}}eight			CHECK-NONE: {{^}}eight
	CHECK-NONE: {{^}}nine			CHECK-NONE: {{^}}nine

	CHECK-HEX: {{^}} 8 three			CHECK-HEX: {{^}} 8 three
	CHECK-HEX: {{^}} e four			CHECK-HEX: {{^}} e four
	CHECK-HEX: {{^}} 13 five			CHECK-HEX: {{^}} 13 five
	CHECK-HEX: {{^}} 1c seven			CHECK-HEX: {{^}} 1c seven
	CHECK-HEX: {{^}} 22 eight			CHECK-HEX: {{^}} 22 eight
	CHECK-HEX: {{^}} 28 nine			CHECK-HEX: {{^}} 28 nine

	## Show that an invalid value is rejected.			## Show that an invalid value is rejected.
	RUN: not llvm-strings --radix z %t 2>&1 | FileCheck %s --check-prefix=INVALID			RUN: not llvm-strings --radix z %t/a.txt 2>&1 | FileCheck %s --check-prefix=INVALID
	INVALID: llvm-strings{{.*}}: for the --radix option: Cannot find option named 'z'!			INVALID: llvm-strings{{.*}}: for the --radix option: Cannot find option named 'z'!

llvm/test/tools/split-file/Inputs/basic-aa.txt

This file was added.


				aa

llvm/test/tools/split-file/Inputs/basic-bb.txt

This file was added.




				; Comments are preserved.
				bb

llvm/test/tools/split-file/Inputs/basic-cc.txt

This file was added.








				cc

llvm/test/tools/split-file/basic.test

This file was added.

				#--- aa
				aa
				;--- bb
				; Comments are preserved.
				bb

				//--- subdir/cc
				cc
				//--- end

				# RUN: rm -rf %t
				MaskRay (inline comment): I'll enhance the test a bit: `## Test that we will delete the output if it is a file, so that we can create` `## a directory.` `# RUN: touch %t`
				# RUN: split-file %s %t
				# RUN: diff %S/Inputs/basic-aa.txt %t/aa
				# RUN: diff %S/Inputs/basic-bb.txt %t/bb
				# RUN: diff %S/Inputs/basic-cc.txt %t/subdir/cc
				# RUN: FileCheck %s --check-prefix=END < %t/end

				## Can be called on a non-empty directory.
				# RUN: split-file %s %t
				# RUN: diff %S/Inputs/basic-aa.txt %t/aa

				## Test that we will delete the output if it is a file, so that we can create
				## a directory.
				# RUN: rm -rf %t && touch %t
				# RUN: split-file %s %t
				# RUN: diff %S/Inputs/basic-aa.txt %t/aa

				# END: RUN: split-file %s %t

				# RUN: not %split-file 2>&1 | FileCheck %s --check-prefix=NO_INPUT

				# NO_INPUT: split-file: error: input filename is not specified

				# RUN: not %split-file %s '' 2>&1 | FileCheck %s --check-prefix=NO_OUTPUT

				# NO_OUTPUT: split-file: error: output directory is not specified

				# RUN: not %split-file %S/Inputs/basic-aa.txt %t 2>&1 | FileCheck %s --check-prefix=NOT_EXIST

				# NOT_EXIST: split-file: error: {{.*}}.txt: no part separator was found
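
For illustration, the separator lines used in basic.test (`#--- aa`, `;--- bb`, `//--- subdir/cc`) can be modelled with a few lines of Python. This is a sketch of the check described in the summary (regex `^(.|//)--- `), not code from the patch; `part_name` is a hypothetical helper name.

```python
def part_name(line: str):
    """Return the part name if `line` is a separator line, else None.

    A separator is a one-character comment marker (or the two-character
    "//") followed by "--- " and the part name.
    """
    marker_len = 6 if line.startswith("//") else 5
    if not line[marker_len - 4:].startswith("--- "):
        return None
    return line[marker_len:]


if __name__ == "__main__":
    for line in ["#--- aa", ";--- bb", "//--- subdir/cc", "# RUN: rm -rf %t"]:
        print(repr(line), "->", part_name(line))
```

Because any single character is accepted as the marker, the same file can mix `#`, `;` and `//` separators, which is what basic.test exercises.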

llvm/test/tools/split-file/empty.test

This file was added.

				# RUN: split-file --no-leading-lines %s %t
				# RUN: count 0 < %t/empty

				#--- empty

llvm/test/tools/split-file/error.test

This file was added.

				# RUN: not %split-file %s %t 2>&1 | FileCheck %s
				# RUN: not ls %t/dup

				# CHECK: {{.*}}.test:[[#@LINE+1]]: error: empty part name
				//---

				# CHECK: {{.*}}.test:[[#@LINE+1]]: error: part name cannot have leading or trailing space
				//---  leading_space

				# CHECK: {{.*}}.test:[[#@LINE+1]]: error: part name cannot have leading or trailing space
				//--- trailing_space

				;--- dup

				# CHECK: {{.*}}.test:[[#@LINE+1]]: error: ';--- dup' occurs more than once
				;--- dup
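
The three diagnostics checked above (empty name, leading or trailing space, duplicate part) can be sketched in Python as a single pass over the input. This is an illustrative model under the assumption that the separator regex is `^(.|//)--- ` as stated in the summary; `scan` and `SEPARATOR` are hypothetical names, and the real tool reports errors through its own `error()` helper.

```python
import re

# Separator: any one-character marker, or "//", followed by "--- ".
SEPARATOR = re.compile(r"^(.|//)--- (.*)$")


def scan(text: str):
    """Return (accepted part names, [(line number, message), ...])."""
    names, errors = [], []
    for line_no, line in enumerate(text.split("\n"), start=1):
        m = SEPARATOR.match(line)
        if not m:
            continue
        marker, name = m.group(1), m.group(2)
        if not name:
            errors.append((line_no, "empty part name"))
        elif name[0].isspace() or name[-1].isspace():
            errors.append(
                (line_no, "part name cannot have leading or trailing space"))
        elif name in names:
            errors.append(
                (line_no, f"'{marker}--- {name}' occurs more than once"))
        else:
            names.append(name)
    return names, errors


if __name__ == "__main__":
    sample = "//--- \n//---  leading\n//--- trailing \n;--- dup\n;--- dup\n"
    print(scan(sample))
```

All errors are collected before giving up, so one run reports every bad separator, which matches how error.test checks several diagnostics from a single invocation.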

llvm/test/tools/split-file/help.test

This file was added.

				RUN: split-file --help 2>&1 | FileCheck --implicit-check-not='General Options:' %s
				CHECK: OVERVIEW: Split input {{.*}}
				CHECK: USAGE: split-file [options] filename directory
				CHECK: Generic Options:
				CHECK: split-file Options:
				CHECK: --no-leading-lines

llvm/test/tools/split-file/no-leading-lines.test

This file was added.

				## With --no-leading-lines, don't add leading lines (which is used to preserve line numbers).

				# RUN: split-file --no-leading-lines %s %t
				# RUN: count 1 < %t/a.txt
				# RUN: FileCheck %s < %t/a.txt

				# CHECK: input

				#--- a.txt
				input

llvm/test/tools/split-file/output-is-special.test

This file was added.

				# UNSUPPORTED: system-windows
				# REQUIRES: shell

				## Don't delete the output if it is special, otherwise root may accidentally
				## remove important special files.
				MaskRay (inline comment): I'll change this comment to: `## Don't delete the output if it is special, otherwise root may accidentally` `## remove important special files.`
				# RUN: not split-file %s /dev/null 2>&1 | FileCheck %s

				# CHECK: error: /dev/null: output cannot be a special file
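
The output-path policy tested here and in basic.test (replace a regular file or an empty directory, reuse a non-empty directory, refuse special files) can be sketched with Python's standard library. This is a model of the behaviour, not the patch's code; `prepare_output` is a hypothetical name, and raising `ValueError` stands in for the tool's fatal error.

```python
import errno
import os
import stat


def prepare_output(path: str) -> None:
    """Clear `path` so a directory can be created there, if that is safe."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return  # Nothing exists yet; the directory can be created later.
    if stat.S_ISREG(mode):
        os.remove(path)
    elif stat.S_ISDIR(mode):
        try:
            os.rmdir(path)  # Succeeds only for an empty directory.
        except OSError as e:
            # A non-empty directory is kept and reused.
            if e.errno not in (errno.ENOTEMPTY, errno.EEXIST):
                raise
    else:
        # Character devices, FIFOs, sockets, ...: never delete these.
        raise ValueError(f"{path}: output cannot be a special file")
```

Refusing special files matters mostly when tests run as root, where deleting something like `/dev/null` would otherwise succeed.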

llvm/tools/split-file/.clang-tidy

This file was added.

				# Almost identical to the top-level .clang-tidy, except that {Member,Parameter,Variable}Case use camelBack.
				Checks: '-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes,readability-identifier-naming'
				CheckOptions:
				  - key: readability-identifier-naming.ClassCase
				    value: CamelCase
				  - key: readability-identifier-naming.EnumCase
				    value: CamelCase
				  - key: readability-identifier-naming.FunctionCase
				    value: camelBack
				  - key: readability-identifier-naming.MemberCase
				    value: camelBack
				  - key: readability-identifier-naming.ParameterCase
				    value: camelBack
				  - key: readability-identifier-naming.UnionCase
				    value: CamelCase
				  - key: readability-identifier-naming.VariableCase
				    value: camelBack
				  - key: readability-identifier-naming.IgnoreMainLikeFunctions
				    value: 1

llvm/tools/split-file/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				Support
				)

				add_llvm_tool(split-file
				split-file.cpp
				)

llvm/tools/split-file/split-file.cpp

This file was added.

				//===- split-file.cpp - Input splitting utility ---------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Split input into multiple parts separated by regex '^(.|//)--- ' and extract
				// the specified part.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/StringExtras.h"
				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileOutputBuffer.h"
				#include "llvm/Support/LineIterator.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/Path.h"
				#include "llvm/Support/ToolOutputFile.h"
				#include "llvm/Support/WithColor.h"
				#include <string>
				#include <system_error>

				using namespace llvm;

				static cl::OptionCategory cat("split-file Options");

				static cl::opt<std::string> input(cl::Positional, cl::desc("filename"),
				cl::cat(cat));

				static cl::opt<std::string> output(cl::Positional, cl::desc("directory"),
				cl::value_desc("directory"), cl::cat(cat));

				static cl::opt<bool> noLeadingLines("no-leading-lines",
				cl::desc("Don't preserve line numbers"),
				cl::cat(cat));

				static StringRef toolName;
				static int errorCount;

				LLVM_ATTRIBUTE_NORETURN static void fatal(StringRef filename,
				const Twine &message) {
				if (filename.empty())
				WithColor::error(errs(), toolName) << message << '\n';
				else
				WithColor::error(errs(), toolName) << filename << ": " << message << '\n';
				exit(1);
				}

				static void error(StringRef filename, int64_t line, const Twine &message) {
				++errorCount;
				errs() << filename << ':' << line << ": ";
				WithColor::error(errs()) << message << '\n';
				}

				namespace {
				struct Part {
				const char *begin = nullptr;
				const char *end = nullptr;
				int64_t leadingLines = 0;
				};
				jhenderson (inline comment): Whilst it's probably harmless, I don't think this should be an `int` (and similar comments for other line number variables, e.g. `lineNo` in `handle`), since it can never be negative. `size_t` or possibly `unsigned` seem more appropriate for the context.
				MaskRay (inline comment): Emm. I think it is debatable whether `unsigned` is a suitable type here. See comments starting from https://reviews.llvm.org/D82594#2127217 for some discussions. I actually perform arithmetic near zero below (`i.line_number() - 1`). int gives me more confidence that things don't go off.
				jhenderson (inline comment): It turns out that `line_number` is an `int64_t`. So I drop my point about `unsigned` or `size_t` (at least for this case, but I'm hardly convinced on the general case discussed on that or previous threads). However, perhaps `leadingLines` etc should be `int64_t` to match and avoid any truncation issues (noting in particular that the value could be anything in the range due to how EOF is handled)?
				} // namespace

				static int handle(MemoryBuffer &inputBuf, StringRef input) {
				DenseMap<StringRef, Part> partToBegin;
				StringRef lastPart, separator;
				for (line_iterator i(inputBuf, /*SkipBlanks=*/false, '\0'); !i.is_at_eof();) {
				const int64_t lineNo = i.line_number();
				const StringRef line = *i++;
				const size_t markerLen = line.startswith("//") ? 6 : 5;
				if (!(line.size() >= markerLen &&
				line.substr(markerLen - 4).startswith("--- ")))
				continue;
				separator = line.substr(0, markerLen);
				const StringRef partName = line.substr(markerLen);
				if (partName.empty()) {
				error(input, lineNo, "empty part name");
				continue;
				}
				if (isSpace(partName.front()) || isSpace(partName.back())) {
				error(input, lineNo, "part name cannot have leading or trailing space");
				continue;
				}

				auto res = partToBegin.try_emplace(partName);
				if (!res.second) {
				error(input, lineNo,
				"'" + separator + partName + "' occurs more than once");
				continue;
				}
				if (!lastPart.empty())
				partToBegin[lastPart].end = line.data();
				Part &cur = res.first->second;
				if (!i.is_at_eof())
				jhenderson (inline comment): I'm not sure it's immediately obvious here what the type of `cur` is, so I'd prefer no `auto`.
				cur.begin = i->data();
				// If --no-leading-lines is specified, leadingLines is 0. Otherwise, prepend
				// newlines so that the extracted part preserves line numbers.
				cur.leadingLines = noLeadingLines ? 0 : i.line_number() - 1;

				lastPart = partName;
				}
				if (lastPart.empty())
				fatal(input, "no part separator was found");
				if (errorCount)
				return 1;
				partToBegin[lastPart].end = inputBuf.getBufferEnd();

				std::vector<std::unique_ptr<ToolOutputFile>> outputFiles;
				SmallString<256> partPath;
				for (auto &keyValue : partToBegin) {
				partPath.clear();
				jhenderson (inline comment): The name `it` implies to me that this is an iterator, but it's actually a pair, right? Perhaps best to change the name to something (e.g. `keyValue` or whatever)
				sys::path::append(partPath, output, keyValue.first);
				std::error_code ec =
				sys::fs::create_directories(sys::path::parent_path(partPath));
				if (ec)
				jhenderson (inline comment): You already have an `ec` local variable here. Perhaps this should be renamed and/or the earlier `ec` declaration moved to where it is actually used.
				fatal(input, ec.message());
				auto f = std::make_unique<ToolOutputFile>(partPath.str(), ec,
				llvm::sys::fs::OF_None);
				if (ec) // make_unique never returns null; the constructor reports failure via ec.
				fatal(input, ec.message());

				Part &part = keyValue.second;
				for (int64_t i = 0; i != part.leadingLines; ++i)
				(*f).os().write('\n');
				jhenderson (inline comment): Another possible `int64_t` site.
				if (part.begin)
				(*f).os().write(part.begin, part.end - part.begin);
				outputFiles.push_back(std::move(f));
				}

				for (std::unique_ptr<ToolOutputFile> &outputFile : outputFiles)
				outputFile->keep();
				return 0;
				}

				int main(int argc, const char **argv) {
				toolName = sys::path::stem(argv[0]);
				cl::HideUnrelatedOptions({&cat});
				cl::ParseCommandLineOptions(
				argc, argv,
				"Split input into multiple parts separated by regex '^(.|//)--- ' and "
				"extract the part specified by '^(.|//)--- <part>'\n",
				nullptr,
				/*EnvVar=*/nullptr,
				/*LongOptionsUseDoubleDash=*/true);

				if (input.empty())
				fatal("", "input filename is not specified");
				if (output.empty())
				fatal("", "output directory is not specified");
				ErrorOr<std::unique_ptr<MemoryBuffer>> bufferOrErr =
				MemoryBuffer::getFileOrSTDIN(input);
				if (std::error_code ec = bufferOrErr.getError())
				fatal(input, ec.message());

				// Delete output if it is a file or an empty directory, so that we can create
				jhenderson (inline comment): `sys::fs::remove` returns a `std::error_code`. I'm not sure we want to be ignoring it - I'd expect some sort of error in that case to be printed.
				// a directory.
				sys::fs::file_status status;
				if (std::error_code ec = sys::fs::status(output, status))
				if (ec.value() != static_cast<int>(std::errc::no_such_file_or_directory))
				fatal(output, ec.message());
				if (status.type() != sys::fs::file_type::file_not_found &&
				status.type() != sys::fs::file_type::directory_file &&
				status.type() != sys::fs::file_type::regular_file)
				fatal(output, "output cannot be a special file");
				if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
				if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
				fatal(output, ec.message());
				return handle(**bufferOrErr, input);
				}

llvm/utils/gn/secondary/lld/test/BUILD.gn

deps = [
"//llvm/tools/llvm-mc",		"//llvm/tools/llvm-mc",
"//llvm/tools/llvm-nm:symlinks",		"//llvm/tools/llvm-nm:symlinks",
"//llvm/tools/llvm-objcopy:symlinks",		"//llvm/tools/llvm-objcopy:symlinks",
"//llvm/tools/llvm-objdump:symlinks",		"//llvm/tools/llvm-objdump:symlinks",
"//llvm/tools/llvm-pdbutil",		"//llvm/tools/llvm-pdbutil",
"//llvm/tools/llvm-readobj:symlinks",		"//llvm/tools/llvm-readobj:symlinks",
"//llvm/tools/obj2yaml",		"//llvm/tools/obj2yaml",
"//llvm/tools/opt",		"//llvm/tools/opt",
		"//llvm/tools/split-file",
"//llvm/tools/yaml2obj",		"//llvm/tools/yaml2obj",
"//llvm/utils/FileCheck",		"//llvm/utils/FileCheck",
"//llvm/utils/count",		"//llvm/utils/count",
"//llvm/utils/llvm-lit",		"//llvm/utils/llvm-lit",
"//llvm/utils/not",		"//llvm/utils/not",
]		]
testonly = true		testonly = true
}		}

llvm/utils/gn/secondary/llvm/test/BUILD.gn

deps = [
"//llvm/tools/llvm-symbolizer:symlinks",		"//llvm/tools/llvm-symbolizer:symlinks",
"//llvm/tools/llvm-undname",		"//llvm/tools/llvm-undname",
"//llvm/tools/llvm-xray",		"//llvm/tools/llvm-xray",
"//llvm/tools/lto",		"//llvm/tools/lto",
"//llvm/tools/obj2yaml",		"//llvm/tools/obj2yaml",
"//llvm/tools/opt",		"//llvm/tools/opt",
"//llvm/tools/sancov",		"//llvm/tools/sancov",
"//llvm/tools/sanstats",		"//llvm/tools/sanstats",
		"//llvm/tools/split-file",
"//llvm/tools/verify-uselistorder",		"//llvm/tools/verify-uselistorder",
"//llvm/tools/yaml2obj",		"//llvm/tools/yaml2obj",
"//llvm/unittests",		"//llvm/unittests",
"//llvm/utils/FileCheck",		"//llvm/utils/FileCheck",
"//llvm/utils/TableGen:llvm-tblgen",		"//llvm/utils/TableGen:llvm-tblgen",
"//llvm/utils/count",		"//llvm/utils/count",
"//llvm/utils/llvm-lit",		"//llvm/utils/llvm-lit",
"//llvm/utils/not",		"//llvm/utils/not",

llvm/utils/gn/secondary/llvm/tools/split-file/BUILD.gn

This file was added.

				executable("split-file") {
				deps = [ "//llvm/lib/Support" ]
				sources = [ "split-file.cpp" ]
				}

Diff 282793
