This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
test/
-
CMakeLists.txt
-
lit.cfg.py
-
tools/extract/
-
extract/
-
CMakeLists.txt
23/23
extract.cpp

Differential D83834

Add test utility 'split-file'
ClosedPublic

Authored by MaskRay on Jul 14 2020, 5:31 PM.

Download Raw Diff

Details

Reviewers

dblaikie
echristo
grimar
jhenderson
probinson
rsmith
• espindola
alexander-shaposhnikov
rupprecht
lattner

Commits

rGbcea3a7a288e: Add test utility 'split-file'
rGd054c7ee2e9f: Add test utility 'extract'

Summary

See https://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html
"[llvm-dev] Multiple documents in one test file" for some discussions.

This patch has explored several alternatives. The current semantics are similar to
what @dblaikie proposed.
split-file filename output splits the input file into multiple parts separated by
regex ^(.|//)--- filename and write each part to the file output/filename
(filename can include path separators).

Use case A (organizing input of different formats (e.g. linker
script+assembly) in one file).

# RUN: split-file %s %t
# RUN: llvm-mc %t/asm -o %t.o
# RUN: ld.lld -T %t/lds %t.o -o %t
This is sometimes better than the %S/Inputs/ approach because the user
can see the auxiliary files immediately and don't have to open another file.

# asm
...
# lds
...

Use case B (for utilities which don't have built-in input splitting
feature):

// RUN: split-file %s %t
// RUN: llc < %t/1.ll | FileCheck %s --check-prefix=CASE1
// RUN: llc < %t/2.ll | FileCheck %s --check-prefix=CASE2
Combing tests prudently can improve readability.
For example, when testing parsing errors if the recovery mechanism isn't possible,
grouping the tests in one file can more readily see test coverage/strategy.

//--- 1.ll
...
//--- 2.ll
...

Since this is a new utility, there is no git history concerns for
UpperCase variable names. I use lowerCase variable names like mlir/lld.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision.Jul 14 2020, 5:31 PM

Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2020, 5:31 PM

Herald added subscribers: llvm-commits, stephenneuendorffer, rriddle, mgorny. · View Herald Transcript

MaskRay mentioned this in D83725: [llvm-mc] Add --doc-id=<id> to support multiple documents in a file.Jul 14 2020, 5:33 PM

Harbormaster failed remote builds in B64265: Diff 278028!Jul 14 2020, 5:43 PM

lkail added a subscriber: lkail.Jul 14 2020, 6:34 PM

gargaroff added a subscriber: gargaroff.Jul 15 2020, 12:16 AM

I wonder if this utility deserves a small number of tests to show the behaviour is exactly what we want? I think it's fine, but it's not obviously correct that the utility is doing what we want.

llvm/tools/extract/extract.cpp
24	`toolName` Also, should this be derived from `argv[0]`? That way if the tool gets renamed, the error message still uses the actual executable name.
26	Is there a way of telling clang-tidy to allow a different naming style?
42	`const Twine &`?
47	Ditto.
53	I might use `fileBegin` and `fileEnd`.
78	was not found
87	If I read this correctly, this will print a series of empty lines before the output. I'm not sure I see the benefit of that? What about empty lines after the contents?
101	`bufferOrErr`

jhenderson added inline comments.Jul 15 2020, 1:09 AM

llvm/tools/extract/extract.cpp
87	Oh, I just read the mailing list comment saying this is to preserve line numbers. It might be helpful having a comment here in the code explaining this.

Just wanted to say I agree this can be better than the Inputs/ idiom, especially for comparatively short inputs. We have some downstream tests that use a bunch of echo commands to generate a second file, but an extract utility would be cleaner IMO.

+1 to porting some examples to help show the benefit of such a tool (& seed the usage so hopefully folks are more likely to be aware of/find/use this in the future when looking at existing tests to draw inspiration from)

+1 to a comment explaining why the blank newlines are present (do we have any formats where the blank newlines could be semantically significant/problematic? I don't think so)

& not exactly relevant to the tool itself, but do common editors support any way to annotate regions of code as being for different file formats? It'd be good to still get syntax highlighting and auto-indent for different input formats mixed in a file - by adding comments that tell the editor what format to interpret the next set of lines as, etc. For the single format cases that's not an issue.

Address comments. Add some examples

Herald added a reviewer: • espindola. · View Herald TranscriptJul 15 2020, 11:30 AM

Herald added a reviewer: alexander-shaposhnikov. · View Herald Transcript

Herald added a reviewer: rupprecht. · View Herald Transcript

Herald added subscribers: rupprecht, aheejin, emaste. · View Herald Transcript

MaskRay added inline comments.Jul 15 2020, 11:30 AM

llvm/tools/extract/extract.cpp
26	Added `.clang-tidy` (like lldb/.clang-tidy and lld/.clang-tidy)
53	Changed to `partBegin`/`partEnd`. Hope that is clearer.

MaskRay marked an inline comment as done.Jul 15 2020, 11:33 AM

Harbormaster failed remote builds in B64395: Diff 278259!Jul 15 2020, 12:06 PM

Fix OVERVIEW.

Update TestingGuide.rst

Harbormaster failed remote builds in B64418: Diff 278299!Jul 15 2020, 2:38 PM

LGTM. Thanks for doing this!

This revision is now accepted and ready to land.Jul 16 2020, 12:03 AM

jhenderson mentioned this in D83520: [llvm-libtool-darwin] Allow flattening archives.Jul 16 2020, 12:37 AM

grimar added inline comments.Jul 16 2020, 1:37 AM

lld/test/ELF/linkerscript/noload.s
1 ↗	(On Diff #278299)	This new style in LLD tests looks much better to me!
llvm/test/tools/extract/no-leading-lines.s
4 ↗	(On Diff #278299)	I wonder if it is better to use a tool that doesn't require "REQUIRES: x86-registered-target". E.g. it probably could be yaml2obj that reported a syntax error.
llvm/test/tools/llvm-objcopy/ELF/strip-symbol.test
14 ↗	(On Diff #278299)	`#` -> `##` while you are here?
llvm/tools/extract/extract.cpp
44	This is used only once I think. Perhaps it can be merged with the `error()` below: LLVM_ATTRIBUTE_NORETURN static void error(StringRef filename, const Twine &message) { if (filename.empty()) ... else ... exit(1);
64	This supports `//---`, but doesn't seem you have a test for it? I'd also add a comment saying it tries to find `^(.\|//)--- <part>` line. Perhaps, it is easier to read this piece when it is: if (line.size() <= markerLen \|\| !line.substr(markerLen - 4).startswith("--- "))) continue;
69	test?
70	You do not need this line, `error()` calls `exit(1)` inside.
82	test?
112	test?

Address comments

llvm/test/tools/extract/no-leading-lines.s
4 ↗	(On Diff #278299)	For this one, we can drop -triple=x86_64. `#` in the line beginning is a universal comment among targets. For others, I want to avoid syntax errors because that would change the exist code of the tool.
llvm/test/tools/llvm-objcopy/ELF/strip-symbol.test
14 ↗	(On Diff #278299)	Since this is about the input, `#` works better, i.e. the syntax does not require two `##` to start a comment..
llvm/tools/extract/extract.cpp
64	basic.s has `//--- cc`
69	basic.s has `# DUP:`
82	basic.s has `was not found`
112	`basic.s` has `# NO_INPUT:`

Oh, forgot to git add basic.s in a previous revision:/

Harbormaster failed remote builds in B64537: Diff 278504!Jul 16 2020, 9:24 AM

Harbormaster failed remote builds in B64538: Diff 278505!Jul 16 2020, 9:42 AM

Sorry, I should have paid more attention when doing my previous review.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Do you actually need to use llvm-mc at all? It seems a bit heavy duty. You could just use `FileCheck` directly on the output of `extract`, e.g. # RUN: extract aa %s \| FileCheck %s --check-prefix=AA --implicit-check-not=bb ... # AA: {{^}}aa{{$}} #--- aa aa I suppose that doesn't cover the line numbering, but perhaps that should be a different test?
llvm/tools/extract/extract.cpp
26	I'm not seeing the `.clang-tidy` in the file list?

MaskRay marked 3 inline comments as done.Jul 17 2020, 12:38 AM

MaskRay added inline comments.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	I suppose the only problem is `# REQUIRES: x86-registered-target` but this is the simplest tool (I can find) which can print line number information, and we do need to test several forms and ensure each form can get correct line numbers. If you have suggestion for another tool without a need to add `# REQUIRES: x86-registered-target`, I'd happily change. But I'd prefer sticking with llvm-mc otherwise.
llvm/tools/extract/extract.cpp
26	It is in the file list: `A M llvm/tools/extract/.clang-tidy (19 lines)`

MaskRay marked an inline comment as done.Jul 17 2020, 12:38 AM

grimar added inline comments.Jul 17 2020, 3:24 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Perhaps, you could use the LLVM`s `count` tool to count the number of lines?

grimar added inline comments.Jul 17 2020, 3:32 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Another way I can think of is to add more sample files to compare with. I.e. you can create sample.a, sample.b etc. And then compare the output of extract with those samples. Will this work? I just also do not think that using of `llvm-mc` is a good idea for the tool that splits the text lines...

MaskRay marked 2 inline comments as done.Jul 17 2020, 9:08 AM

MaskRay added inline comments.

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	Perhaps, you could use the LLVM`s count tool to count the number of lines? This seems like an indirect way to test something. I.e. you can create sample.a, sample.b etc. And then compare the output of extract with those samples. This degrades readability. Adding a line means all Inputs/ files need updating. .warning has well defined and very stable interface. We use nothing but a very basic feature set of lexing of it. No assembly, no disassembly. I don't see a problem using it for testing.

dblaikie added inline comments.Jul 17 2020, 9:25 AM

llvm/test/tools/extract/basic.s
2 ↗	(On Diff #278505)	I.e. you can create sample.a, sample.b etc. And then compare the output of extract with those samples. This degrades readability. Adding a line means all Inputs/ files need updating. I think this (having separate test files, diffing them exactly) is the right way to test this tool - the nature of low-level tools that are designed to improve the readability of higher level tests is that testing them will be less elegant. (eg: testing yaml2obj if we intend to use it to test llvm-dwarfdump means yaml2obj tests can't use llvm-dwarfdump to test them, etc) Indeed, presumably we want to use this tool when testing llvm-mc, so we should not use llvm-mc to test the tool - otherwise the testing is circular. Please don't use llvm-mc or similar things to test this very low-level piece of test infrastructure like this.

llvm-mc -> FileCheck + count

dblaikie added inline comments.Jul 17 2020, 10:19 AM

llvm/test/tools/extract/basic.test
21–23 ↗	(On Diff #278826)	Does this test that the desired lines are at the desired line numbers? It seems not. The padding lines could be incorrectly printed at the end of the output & it would still pass, right? I think it'd be better to write this test with exact expected output files & run diff against them. This is a case, in my opinion, where simple golden files seem entirely appropriate - the point of the tool is to produce very specific output, unlike something like opt/llc/etc that produce one of a set of equivalent outputs.
llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	Why does this project have a different format compared to the rest of LLVM?

Harbormaster failed remote builds in B64711: Diff 278826!Jul 17 2020, 10:49 AM

Add two golden files

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	Adopt docs/Proposals/VariableNames.rst (proposal, not an agreement. Not agreed because people don't want VariableName -> variableName churn. There is no such concern for a standalone fresh utility & this is the style adopted by lld & mlir)

dblaikie added inline comments.Jul 17 2020, 11:28 AM

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	For distinct top-level projects that seems sort of OK - but I don't think it's appropriate to have divergent styles within a single top-level project, such as LLVM.

Harbormaster failed remote builds in B64729: Diff 278859!Jul 17 2020, 11:57 AM

MaskRay marked an inline comment as done.Jul 17 2020, 12:08 PM

MaskRay added subscribers: michaelplatings, lattner.

MaskRay added inline comments.

llvm/tools/extract/.clang-tidy
1 ↗	(On Diff #278826)	For distinct top-level projects that seems sort of OK - but I don't think it's appropriate to have divergent styles within a single top-level project, such as LLVM. I don't think people expressed that this is not OK for the previous discussion. @michaelplatings @lattner

lit substitution: extract -> %extract

Unfortunately, 'extract' in 'llvm-xray extract' will be substituted, due to D83578

MaskRay edited the summary of this revision. (Show Details)Jul 23 2020, 8:29 AM

Harbormaster failed remote builds in B65386: Diff 280141!Jul 23 2020, 9:26 AM

Use %extract only in not %extract

Closed by commit rGd054c7ee2e9f: Add test utility 'extract' (authored by MaskRay). · Explain WhyJul 23 2020, 7:15 PM

This revision was automatically updated to reflect the committed changes.

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

In D83834#2171208, @dblaikie wrote:

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])

Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.

Harbormaster completed remote builds in B65483: Diff 280314.Jul 23 2020, 7:44 PM

In D83834#2171222, @MaskRay wrote:
In D83834#2171208, @dblaikie wrote:

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py
# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])
Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.

This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

In D83834#2171217, @MaskRay wrote:

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collission with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

In D83834#2171252, @dblaikie wrote:
In D83834#2171222, @MaskRay wrote:
In D83834#2171208, @dblaikie wrote:

In D83834#2171199, @MaskRay wrote:

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py
# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])
Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.
This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

Things like not and count are not in the list. I don't think using %extract only for its own test a problem. Nobody uses not %extract in tests. It is there just to test the tool itself.

In D83834#2171217, @MaskRay wrote:

In D83834#2171214, @thakis wrote:

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collission with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

'split' is a utility specified by POSIX.1-2017 Base.

"write all the file fragments into the specified directory, named after the fragment name" is a good idea. If it is implemented as a special mode, it can be: extract all %s -o %t.dir or extract all %s -o %t. -o specifies a directory name.

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This is will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

In D83834#2172384, @lattner wrote:

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This is will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

I considered the choice. mlir-opt/mlir-translate's -split-input-file/clang -verify is suitable when the input is homogeneous and the test is performed the tool itself.
For heterogeneous needs (llvm-objcopy --strip-symbols %t-list.txt %t %t5 (symbol list), ld.lld -T %t.lds %t.o (linker script)), or when the tool output is inspected by another tool, a standalone utility is likely more useful.

So far I like @dblaikie's semantics. I'd like syntax like extract %s -o %t.dir. For example the following will create %t.dir/{a.txt,b.ll,c.c}

#--- a.txt
0
;--- b.ll
1
//--- c.c
2

I am not settled on completely deleting the previous extract part %s syntax, or renaming 'extract' to something else.

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

In D83834#2172415, @lattner wrote:

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

How about a short name, e.g. split-file? We have many lit utilities not named llvm-*: not, count, FileCheck. The utility is more of their league.

In D83834#2172423, @lattner wrote:

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

Separate files are considered. Actually that motivated the standalone utility https://reviews.llvm.org/D83725#2149490

"I end up doing one of three things in this situation: 1) adding a separate file in the "Inputs" directory - this is not great because the test input is far away from the actual test (i.e. not in the same file), making it harder to follow; 2) echoing the second and later inputs to separate files at runtime - this is not great because it has a runtime cost; ..."

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

In D83834#2172461, @lattner wrote:

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

not, count, FileCheck, update_*_test_checks.py. These utilities set a precedent that these auxiliary test-only utilities are not named llvm-*.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this falls into subjective points of view and people may have different opinions. For me, non-trivial separate files (not creatable with one-line echo) have caused enough pain to me. It is not unusual for me open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I don't count wrong, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(For binary files, they are sometimes useful, e.g. when testing compatibility of LLVM IR. a llvm/test/Bitcode/ pre-built file ensures that compatibility is retained. However, their use cases are very narrow. In 99% cases textual formats will be a superior replacement. I hope we don't let 1% inapplicable use case to be the reason that a general purpose testing utility should not be introduced. )

dblaikie added a comment.Jul 24 2020, 9:51 AM

This comment was removed by dblaikie.

I'm not sure how googles test runner works or how it differs. I recall many historical examples (in clang in particular), but it is possible they all got changed.

There's no need to speculate though :-), I'd just try an example to see if it works in practice in your environment.

In D83834#2172461, @lattner wrote:

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

I think if it's going to be invoked once per fragment, inline with the destination (extract %s test1 | llc ... ) then having a shorter name is nice. If it's going to be invoked once to split everything up - that'd be on a separate RUN line anyway, and having a longer name seems fine by me. Whether or not it has an llvm- prefix I don't feel /too/ strongly about, but it does make clear that this isn't some unix vocabulary program (like not, count, etc), but a specific LLVM thing.

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.
...
Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

What's the precedent you're referring to here? Could you point me to some examples? Because my understanding was that we have a fairly strong precedent for input files being in an Inputs directory (in fact, I think Google's internal test runner relies on this convention) - having other files immediately next to test files I don't think is done & if it was done, I'd worry about the suffixes potentially colliding with valid test prefixes, then the auxiliary files would be incorrectly run as independent tests & probably fail due to lack of RUN lines, etc.

(3) is certainly compelling to me - being able to copy/paste lines from a test to reproduce is nice - though lots of tests are stateful - generating something, then doing a thing with it (running objdump and dwarfdump to inspect different features of the output - so you can't just copy/paste the dwarfdump+filecheck, you have to rerun the right command that generated the input too, etc).

I guess one thing we could do with this functionality that might make tests more reliable - this tool could delete the contents of the target directory before extracting the files (maybe this would be conflating responsibilities a bit and catch someone by surprise, though) reducing the incidence of tests being polluted by previous executions.

To throw-in another naming suggestion, clang uses that name "unbundle".

Also see http://lists.llvm.org/pipermail/llvm-dev/2020-July/143568.html

In D83834#2172509, @MaskRay wrote:

In D83834#2172461, @lattner wrote:

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this falls into subjective points of view and people may have different opinions. For me, non-trivial separate files (not creatable with one-line echo) have caused enough pain to me. It is not unusual for me open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I don't count wrong, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(For binary files, they are sometimes useful, e.g. when testing compatibility of LLVM IR. a llvm/test/Bitcode/ pre-built file ensures that compatibility is retained. However, their use cases are very narrow. In 99% cases textual formats will be a superior replacement. I hope we don't let 1% inapplicable use case to be the reason that a general purpose testing utility should not be introduced. )

I have a strong +1 to have a way to keep inputs and test in a single file. It significantly improves readability from my experience. E.g. we have --docnum=x
option for yaml2obj and it is very common for llvm-readelf tests to have a few little YAML inputs in the same file. We introduced macros support for yaml2obj (-DMACRO=<val>)
and default values for them recently and this also helps to reuse YAML descriptions from the same file what generally helps to isolate test cases, reuse them for similar tests and keep the whole file compact.
If we had separate inputs it would be very hard to mantain those tests I believe.

Initially D84054 suggested the --doc-id=<id> option for llvm-mc which could help to solve a problem for asm files to avoid splitting. I think it was good solution by itself,
but also, having a separate tool is probably a more general solution for achieving the same.

I don't think I have much new to add that hasn't been said already. My personal opinions are:

Naming - I don't really mind whether it has a long or short name. Given it's usually going to be on a separate line anyway, I suspect, rather than writing stuff to stdout for feeding into stdin, I don't think a longer name is an issue. The point about it not being a unix tool equivalent makes some sense to me for adding an llvm- prefix, although there are counter-examples for this. I also have no preference on the exact name, as long as it conveys meaning well enough ([llvm-]split-file seems clear enough for example).
I want my test inputs to be all in one file where possible, as it makes it much easier to iterate on a test, understand it, etc, by keeping everything in the single file. There is a place for separate inputs, where they are going to be commonly reused across many tests (see for example the recent llvm-libtool-darwin testing, which reuses the same YAML repeatedly), but I think these are more the exception than the rule. It seems to me that the %s.xxx is not much better than being in %S\Inputs in this regards.
@dblaikie's concern about possible extension collisions with the %s.xxx approach is one I'd share. One example might be an lld or llvm-objdump test consuming multiple asm files. A naive user might call them %s.1.s, %s.2.s etc and if they're just writing the one test, won't notice that when the whole directory is tested, including those files, resulting in lit failures.
An "extract all" option would be useful. In fact, assuming we don't start adding other functionality to the tool (such as string substitution), I suspect the extract everything approach is going to be almost universally the better option, so might want to be the default behaviour.

Higuoxing added a subscriber: Higuoxing.Jul 28 2020, 1:38 AM

Hi All,

I'm concerned that this patch was just landed despite the open discussion about problems. As one example, it doesn't make any sense to me that we now have llvm/tools/extract and llvm/tools/llvm-extract. Please revert until the details are sorted out and we get to convergence on the issues.

-Chris

MaskRay reopened this revision.Jul 28 2020, 1:26 PM

This revision is now accepted and ready to land.Jul 28 2020, 1:26 PM

MaskRay added a reverting change: rGdd405f1a5397: Revert D83834 "Add test utility 'extract'".Jul 28 2020, 1:26 PM

MaskRay planned changes to this revision.Jul 28 2020, 1:26 PM

ychen added a subscriber: ychen.Jul 28 2020, 1:30 PM

Thank you Fangrui!

NeHuang mentioned this in D83669: [PowerPC] Support for R_PPC64_REL24_NOTOC calls where the caller has no TOC and the callee is not DSO local.Jul 28 2020, 3:09 PM

Rename extract to split-file and change the interface

This revision is now accepted and ready to land.Jul 29 2020, 10:31 AM

MaskRay edited the summary of this revision. (Show Details)Jul 29 2020, 10:32 AM

Harbormaster completed remote builds in B66245: Diff 281660.Jul 29 2020, 10:37 AM

MaskRay added inline comments.Jul 29 2020, 12:44 PM

llvm/test/tools/split-file/basic.test
10 ↗	(On Diff #281660)	I'll enhance the test a bit: ## Test that we will delete the output if it is a file, so that we can create ## a directory. # RUN: touch %t

jhenderson added inline comments.Jul 30 2020, 4:57 AM

llvm/tools/split-file/split-file.cpp
63 ↗	(On Diff #281660)	Whilst it's probably harmless, I don't think this should be an `int` (and similar comments for other liner number variables, e.g. `lineNo` in `handle`), since it can never be negative. `size_t` or possibly `unsigned` seem more appropriate for the context.
96 ↗	(On Diff #281660)	I'm not sure it's immediately obvious here what the type of `cur` is, so I'd prefer no `auto`.
113 ↗	(On Diff #281660)	The name `it` implies to me that this is an iterator, but it's actually a pair, right? Perhaps best to change the name to something (e.g. `keyValue` or whatever)
117 ↗	(On Diff #281660)	You already have an `ec` local variable here. Perhaps this should be renamed and/or the earlier `ec` declaration moved to where it is actually used.
157 ↗	(On Diff #281660)	`sys::fs::remove` returns a `std::error_code`. I'm not sure we want to be ignoring it - I'd expect some sort of error in that case to be printed.

Address comments.
Enhance tests.

MaskRay marked an inline comment as done.Jul 30 2020, 9:25 AM

MaskRay added inline comments.Jul 30 2020, 9:28 AM

llvm/tools/split-file/split-file.cpp
63 ↗	(On Diff #281660)	Emm. I think it is debatable whether `undefined` is a suitable type here. See comments starting from https://reviews.llvm.org/D82594#2127217 for some discussions. I actually perform arithmetic near zero below (`i.line_number() - 1`). int gives me more confidence that things don't go off.

MaskRay added inline comments.Jul 30 2020, 9:30 AM

llvm/test/tools/split-file/output-is-special.test
4 ↗	(On Diff #281958)	I'll change this comment to: ## Don't delete the output if it is special, otherwise root may accidentally ## remove important special files.

Harbormaster completed remote builds in B66413: Diff 281958.Jul 30 2020, 9:47 AM

LGTM, with or without the suggestion, but since others had comments about this, it would be good to get another pair of eyes to give it another look over.

llvm/tools/split-file/split-file.cpp
126 ↗	(On Diff #281958)	Another possible `int64_t` site.
63 ↗	(On Diff #281660)	It turns out that `line_number` is an `int64_t`. So I drop my point about `unsigned` or `size_t` (at least for this case, but I'm hardly convinced on the general case discussed on that or previous threads). However, perhaps `leadingLines` etc should be `int64_t` to match and avoid any truncation issues (noting in particular that the value could be anything in the range due to how EOF is handled)?

Thank you for renaming this!

int -> int64_t

Harbormaster completed remote builds in B66799: Diff 282671.Aug 3 2020, 11:40 AM

MaskRay edited the summary of this revision. (Show Details)Aug 3 2020, 2:43 PM

Closed by commit rGbcea3a7a288e: Add test utility 'split-file' (authored by MaskRay). · Explain WhyAug 3 2020, 8:43 PM

This revision was automatically updated to reflect the committed changes.

MaskRay added a commit: rGbcea3a7a288e: Add test utility 'split-file'.

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

This is strange. A Windows bot I usually look was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

Some of the new tests FAIL when run in the same tree a second time:

LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test

The failure mode is always the same, e.g.

split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists

This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

In D83834#2201665, @MaskRay wrote:

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

This is strange. A Windows bot I usually look was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

FWIW, I've tested this (on windows) and all tests in \llvm\test\tools\split-file folder pass fine for me.

MaskRay mentioned this in D83852: [llvm-profdata] Implement llvm-profdata overlap for sample profiles.Aug 8 2020, 3:18 PM

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

In D83834#2201952, @ro wrote:
Some of the new tests FAIL when run in the same tree a second time:
LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test
The failure mode is always the same, e.g.
split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists
This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

I just tried running the tests twice on my Windows machine, and they pass (no cleaning in between). At a guess, this is something different in the Solaris OS behaviour?

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

In D83834#2206034, @jhenderson wrote:

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

In D83834#2206034, @jhenderson wrote:
In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?
In D83834#2201952, @ro wrote:
Some of the new tests FAIL when run in the same tree a second time:
LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test
The failure mode is always the same, e.g.
split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists
This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.
I just tried running the tests twice on my Windows machine, and they pass (no cleaning in between). At a guess, this is something different in the Solaris OS behaviour?

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

According to http://lists.llvm.org/pipermail/llvm-dev/2020-August/143995.html

llvm/utils. -> this builds some executables like tablegen

llvm/lib. -> This is all libraries, shouldn’t include executables.

llvm/tools. -> Generally executables, also some libraries that are tool specific.

tests.. -> Things that depend on the above.

I am on the fence which of llvm/tools/split-file and llvm/utils/split-file makes more sense. I'd likely to collect a bit more opinions.

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

In D83834#2208059, @MaskRay wrote:

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

Here's what I found: the failing call to split-file is like this:

./bin/split-file --no-leading-lines /vol/llvm/src/llvm-project/local/llvm/test/tools/split-file/empty.test /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp

Running under truss shows

19028:  fstatat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", 0xFFFFFFFF7FFFE380, AT_SYMLINK_NOFOLLOW) = 0
19028:      d=0x0000010600010012 i=212407 m=0042750 l=2  u=2110  g=4620  sz=3
19028:          at = Aug 10 11:21:41 MEST 2020  [ 1597051301.681593645 ]
19028:          mt = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:          ct = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:      bsz=512   blks=3     fs=zfs
19028:  unlinkat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", AT_REMOVEDIR) Err#17 EEXIST

i.e. it tries to remove a directory. However, this cannot work since the directory in question isn't empty:

$ ls -la /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp
total 4
drwxr-s--- 2 ro gcc  3 Aug 10 01:46 ./
drwxr-sr-x 5 ro gcc 11 Aug 10 18:32 ../
-rw-r--r-- 1 ro gcc  0 Aug 10 01:46 empty

When I run split-file in gdb with a breakpoint in rmdir, I get

#0  0xffffffff7eeb7388 in rmdir () from /lib/64/libc.so.1
#1  0xffffffff7ee2c47c in remove () from /lib/64/libc.so.1
#2  0x000000010025069c in llvm::sys::fs::remove (path=..., 
    IgnoreNonExisting=true)
    at /vol/llvm/src/llvm-project/local/llvm/lib/Support/Unix/Path.inc:442
#3  0x00000001001821ac in main (argc=4, argv=0xffffffff7fffeb38)
    at /vol/llvm/src/llvm-project/local/llvm/tools/split-file/split-file.cpp:168

So llvm::sys::fs::remove tries to call ::remove on a non-empty directory, while the man page clearly states:

The  remove() function causes the file or empty directory whose name is
the string pointed to by path to be no longer accessible by that  name.
A  subsequent  attempt  to  open  that  file using that name will fail,
unless the file is created anew.

For files, remove() is identical to unlink(). For directories, remove()
is identical to rmdir().

and in turn rmdir(2) says

The  rmdir()  function  removes  the  directory  named by the path name
pointed to by path. The directory must not have any entries other  than
"." and "..".

This just cannot work!

In D83834#2207311, @aeubanks wrote:

In D83834#2206034, @jhenderson wrote:

In D83834#2201444, @aeubanks wrote:

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

Might still be worth either making this test more general and/or setting this file to be binary rather than text, so it doesn't trip over this issue?

(from the Getting Started page: "Passing --config core.autocrlf=false should not be required in the future after we adjust the .gitattribute settings correctly, but is required for Windows users at the time of this writing.")

In D83834#2208107, @ro wrote:

In D83834#2208059, @MaskRay wrote:

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

Here's what I found: the failing call to split-file is like this:

./bin/split-file --no-leading-lines /vol/llvm/src/llvm-project/local/llvm/test/tools/split-file/empty.test /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp

Running under truss shows

19028:  fstatat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", 0xFFFFFFFF7FFFE380, AT_SYMLINK_NOFOLLOW) = 0
19028:      d=0x0000010600010012 i=212407 m=0042750 l=2  u=2110  g=4620  sz=3
19028:          at = Aug 10 11:21:41 MEST 2020  [ 1597051301.681593645 ]
19028:          mt = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:          ct = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:      bsz=512   blks=3     fs=zfs
19028:  unlinkat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", AT_REMOVEDIR) Err#17 EEXIST

i.e. it tries to remove a directory. However, this cannot work since the directory in question isn't empty:

$ ls -la /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp
total 4
drwxr-s--- 2 ro gcc  3 Aug 10 01:46 ./
drwxr-sr-x 5 ro gcc 11 Aug 10 18:32 ../
-rw-r--r-- 1 ro gcc  0 Aug 10 01:46 empty

When I run split-file in gdb with a breakpoint in rmdir, I get

#0  0xffffffff7eeb7388 in rmdir () from /lib/64/libc.so.1
#1  0xffffffff7ee2c47c in remove () from /lib/64/libc.so.1
#2  0x000000010025069c in llvm::sys::fs::remove (path=..., 
    IgnoreNonExisting=true)
    at /vol/llvm/src/llvm-project/local/llvm/lib/Support/Unix/Path.inc:442
#3  0x00000001001821ac in main (argc=4, argv=0xffffffff7fffeb38)
    at /vol/llvm/src/llvm-project/local/llvm/tools/split-file/split-file.cpp:168

So llvm::sys::fs::remove tries to call ::remove on a non-empty directory, while the man page clearly states:

The  remove() function causes the file or empty directory whose name is
the string pointed to by path to be no longer accessible by that  name.
A  subsequent  attempt  to  open  that  file using that name will fail,
unless the file is created anew.

For files, remove() is identical to unlink(). For directories, remove()
is identical to rmdir().

and in turn rmdir(2) says

The  rmdir()  function  removes  the  directory  named by the path name
pointed to by path. The directory must not have any entries other  than
"." and "..".

This just cannot work!

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());

In D83834#2208178, @MaskRay wrote:

In D83834#2208107, @ro wrote:

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());

On Solaris it's EEXIST/file_exists, as documented on the Solaris rmdir(2) man page and explicitly allowed by POSIX/XPG7:

[EEXIST] or [ENOTEMPTY]
    The path argument names a directory that is not an empty directory, or there are hard links to the directory other than dot or a single entry in dot-dot.

MaskRay mentioned this in rGdbc468dc3199: [split-file] Fix sys::fs::remove() on Solaris after D83834.Aug 11 2020, 8:06 AM

jansvoboda11 mentioned this in D118586: [C++20][Modules][3/8] Initial handling for module partitions..Feb 21 2022, 11:38 PM

Revision Contents

Path

Size

llvm/

test/

CMakeLists.txt

1 line

lit.cfg.py

2 lines

tools/

extract/

CMakeLists.txt

7 lines

extract.cpp

106 lines

Diff 278028

llvm/test/CMakeLists.txt

	Show First 20 Lines • Show All 45 Lines • ▼ Show 20 Lines
	# NOTE: Sync the substitutions in test/lit.cfg when adding to this list.			# NOTE: Sync the substitutions in test/lit.cfg when adding to this list.
	set(LLVM_TEST_DEPENDS			set(LLVM_TEST_DEPENDS
	BugpointPasses			BugpointPasses
	FileCheck			FileCheck
	LLVMHello			LLVMHello
	UnitTests			UnitTests
	bugpoint			bugpoint
	count			count
				extract
	llc			llc
	lli			lli
	lli-child-target			lli-child-target
	llvm-addr2line			llvm-addr2line
	llvm-ar			llvm-ar
	llvm-as			llvm-as
	llvm-bcanalyzer			llvm-bcanalyzer
	llvm-c-test			llvm-c-test
	▲ Show 20 Lines • Show All 148 Lines • Show Last 20 Lines

llvm/test/lit.cfg.py

Show First 20 Lines • Show All 139 Lines • ▼ Show 20 Lines	tools = [
ToolSubst('%opt-viewer', opt_viewer_cmd),		ToolSubst('%opt-viewer', opt_viewer_cmd),
ToolSubst('%llvm-objcopy', FindTool('llvm-objcopy')),		ToolSubst('%llvm-objcopy', FindTool('llvm-objcopy')),
ToolSubst('%llvm-strip', FindTool('llvm-strip')),		ToolSubst('%llvm-strip', FindTool('llvm-strip')),
ToolSubst('%llvm-install-name-tool', FindTool('llvm-install-name-tool')),		ToolSubst('%llvm-install-name-tool', FindTool('llvm-install-name-tool')),
]		]

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?		# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([		tools.extend([
'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',		'dsymutil', 'extract', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',		'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',		'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',		'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',		'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',		'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',		'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',		'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',		'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
▲ Show 20 Lines • Show All 203 Lines • Show Last 20 Lines

llvm/tools/extract/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				Support
				)

				add_llvm_tool(extract
				extract.cpp
				)

llvm/tools/extract/extract.cpp

This file was added.

				//===- extract.cpp - Input splitting utility ------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Split input into multipe parts separated by regex '^(.\|//)--- ' and extract
				// the specified part.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/StringRef.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/FileOutputBuffer.h"
				#include "llvm/Support/LineIterator.h"
				#include "llvm/Support/MemoryBuffer.h"
				#include "llvm/Support/WithColor.h"
				#include <string>

				using namespace llvm;

				const char tool_name[] = "extract";
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'tool_name' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'tool_name' [readability-identifier…
				jhendersonUnsubmitted Done Reply Inline Actions `toolName` Also, should this be derived from `argv[0]`? That way if the tool gets renamed, the error message still uses the actual executable name. jhenderson: `toolName` Also, should this be derived from `argv[0]`? That way if the tool gets renamed, the…

				static cl::OptionCategory cat("extract Options");
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'cat' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'cat' [readability-identifier-naming]…
				jhendersonUnsubmitted Done Reply Inline Actions Is there a way of telling clang-tidy to allow a different naming style? jhenderson: Is there a way of telling clang-tidy to allow a different naming style?
				MaskRayAuthorUnsubmitted Done Reply Inline Actions Added `.clang-tidy` (like lldb/.clang-tidy and lld/.clang-tidy) MaskRay: Added `.clang-tidy` (like lldb/.clang-tidy and lld/.clang-tidy)
				jhendersonUnsubmitted Done Reply Inline Actions I'm not seeing the `.clang-tidy` in the file list? jhenderson: I'm not seeing the `.clang-tidy` in the file list?
				MaskRayAuthorUnsubmitted Done Reply Inline Actions It is in the file list: `A M llvm/tools/extract/.clang-tidy (19 lines)` MaskRay: It is in the file list: `A M llvm/tools/extract/.clang-tidy (19 lines)`

				static cl::opt<std::string> part(
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'part' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'part' [readability-identifier-naming]…
				cl::Positional,
				cl::desc(
				"Split input into multiple parts separated by regex '^(.\|//)--- ' and "
				"extract the specified part"),
				cl::cat(cat));

				static cl::opt<std::string> input(cl::Positional, cl::desc("filename"),
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'input' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'input' [readability-identifier-naming]…
				cl::cat(cat));

				static cl::opt<std::string> output("o", cl::desc("Output filename"),
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'output' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'output' [readability-identifier-naming]…
				cl::value_desc("filename"), cl::init("-"),
				cl::cat(cat));

				LLVM_ATTRIBUTE_NORETURN static void error(Twine message) {
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: twine variables are prone to use-after-free bugs [llvm-twine-local] not useful clang-tidy: warning: invalid case style for parameter 'message' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: twine variables are prone to use-after-free bugs [llvm-twine-local]…
				jhendersonUnsubmitted Done Reply Inline Actions `const Twine &`? jhenderson: `const Twine &`?
				WithColor::error(errs(), tool_name) << message << '\n';
				exit(1);
				grimarUnsubmitted Done Reply Inline Actions This is used only once I think. Perhaps it can be merged with the `error()` below: LLVM_ATTRIBUTE_NORETURN static void error(StringRef filename, const Twine &message) { if (filename.empty()) ... else ... exit(1); grimar: This is used only once I think. Perhaps it can be merged with the `error()` below: ```…
				}

				LLVM_ATTRIBUTE_NORETURN static void error(StringRef filename, Twine message) {
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for parameter 'filename' [readability-identifier-naming] not useful clang-tidy: warning: twine variables are prone to use-after-free bugs [llvm-twine-local] not useful clang-tidy: warning: invalid case style for parameter 'message' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'filename' [readability-identifier…
				jhendersonUnsubmitted Done Reply Inline Actions Ditto. jhenderson: Ditto.
				WithColor::error(errs(), tool_name) << filename << ": " << message << '\n';
				exit(1);
				}

				static void handle(MemoryBuffer &inputBuf, StringRef input) {
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for parameter 'inputBuf' [readability-identifier-naming] not useful clang-tidy: warning: invalid case style for parameter 'input' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'inputBuf' [readability-identifier…
				const char docBegin = nullptr, docEnd = nullptr;
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'docBegin' [readability-identifier-naming] not useful clang-tidy: warning: invalid case style for variable 'docEnd' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'docBegin' [readability-identifier-naming]…
				jhendersonUnsubmitted Done Reply Inline Actions I might use `fileBegin` and `fileEnd`. jhenderson: I might use `fileBegin` and `fileEnd`.
				MaskRayAuthorUnsubmitted Done Reply Inline Actions Changed to `partBegin`/`partEnd`. Hope that is clearer. MaskRay: Changed to `partBegin`/`partEnd`. Hope that is clearer.
				int numEmptyLines = 0;
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'numEmptyLines' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'numEmptyLines' [readability-identifier…
				StringRef separator;
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'separator' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'separator' [readability-identifier…
				for (line_iterator i(inputBuf, /SkipBlanks=/false, '\0'); !i.is_at_eof();) {
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'i' [readability-identifier-naming]…
				StringRef line = *i++;
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'line' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'line' [readability-identifier-naming]…
				size_t markerLen = line.startswith("//") ? 6 : 5;
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'markerLen' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'markerLen' [readability-identifier…
				if (!(line.size() > markerLen &&
				line.substr(markerLen - 4).startswith("--- ")))
				continue;
				separator = line.substr(0, markerLen);
				StringRef cur = line.substr(markerLen);
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'cur' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'cur' [readability-identifier-naming]…
				if (cur == part) {
				grimarUnsubmitted Done Reply Inline Actions This supports `//---`, but doesn't seem you have a test for it? I'd also add a comment saying it tries to find `^(.\|//)--- <part>` line. Perhaps, it is easier to read this piece when it is: if (line.size() <= markerLen \|\| !line.substr(markerLen - 4).startswith("--- "))) continue; grimar: This supports `//---`, but doesn't seem you have a test for it? I'd also add a comment saying…
				MaskRayAuthorUnsubmitted Done Reply Inline Actions basic.s has `//--- cc` MaskRay: basic.s has `//--- cc`
				if (docBegin) {
				error(input, "'" + separator + cur + "' occurs more than once");
				exit(1);
				}
				numEmptyLines = i.line_number() - 1;
				grimarUnsubmitted Done Reply Inline Actions test? grimar: test?
				MaskRayAuthorUnsubmitted Done Reply Inline Actions basic.s has `# DUP:` MaskRay: basic.s has `# DUP:`
				if (i.is_at_eof())
				grimarUnsubmitted Done Reply Inline Actions You do not need this line, `error()` calls `exit(1)` inside. grimar: You do not need this line, `error()` calls `exit(1)` inside.
				break;
				docBegin = i->data();
				} else if (docBegin && !docEnd) {
				docEnd = line.data();
				}
				}
				if (!docBegin)
				error(input, "'" + separator + part + "' is not found");
				jhendersonUnsubmitted Done Reply Inline Actions was not found jhenderson: was not found
				if (!docEnd)
				docEnd = inputBuf.getBufferEnd();

				Expected<std::unique_ptr<FileOutputBuffer>> outputBuf =
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'outputBuf' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'outputBuf' [readability-identifier…
				grimarUnsubmitted Done Reply Inline Actions test? grimar: test?
				MaskRayAuthorUnsubmitted Done Reply Inline Actions basic.s has `was not found` MaskRay: basic.s has `was not found`
				FileOutputBuffer::create(output, numEmptyLines + (docEnd - docBegin));
				if (!outputBuf)
				error(input, toString(outputBuf.takeError()));
				uint8_t buf = (outputBuf)->getBufferStart();
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'buf' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'buf' [readability-identifier-naming]…
				std::fill_n(buf, numEmptyLines, '\n');
				jhendersonUnsubmitted Done Reply Inline Actions If I read this correctly, this will print a series of empty lines before the output. I'm not sure I see the benefit of that? What about empty lines after the contents? jhenderson: If I read this correctly, this will print a series of empty lines before the output. I'm not…
				jhendersonUnsubmitted Done Reply Inline Actions Oh, I just read the mailing list comment saying this is to preserve line numbers. It might be helpful having a comment here in the code explaining this. jhenderson: Oh, I just read the mailing list comment saying this is to preserve line numbers. It might be…
				std::copy(docBegin, docEnd, buf + numEmptyLines);
				if (Error e = (*outputBuf)->commit())
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'e' [readability-identifier-naming]…
				error(input, toString(std::move(e)));
				}

				int main(int argc, const char **argv) {
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for parameter 'argc' [readability-identifier-naming] not useful clang-tidy: warning: invalid case style for parameter 'argv' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for parameter 'argc' [readability-identifier-naming]…
				cl::HideUnrelatedOptions({&cat});
				cl::ParseCommandLineOptions(argc, argv, "extract\n", nullptr,
				/EnvVar=/nullptr,
				/LongOptionsUseDoubleDash=/true);

				if (input.empty())
				error("input filename is not specified");
				ErrorOr<std::unique_ptr<MemoryBuffer>> buffer_or_err =
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'buffer_or_err' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'buffer_or_err' [readability-identifier…
				jhendersonUnsubmitted Done Reply Inline Actions `bufferOrErr` jhenderson: `bufferOrErr`
				MemoryBuffer::getFileOrSTDIN(input);
				if (std::error_code ec = buffer_or_err.getError())
				Lint: Pre-merge checks Inline Actions clang-tidy: warning: invalid case style for variable 'ec' [readability-identifier-naming] not useful Lint: Pre-merge checks: clang-tidy: warning: invalid case style for variable 'ec' [readability-identifier-naming]…
				error(input, ec.message());
				handle(**buffer_or_err, input);
				}
				grimarUnsubmitted Done Reply Inline Actions test? grimar: test?
				MaskRayAuthorUnsubmitted Done Reply Inline Actions `basic.s` has `# NO_INPUT:` MaskRay: `basic.s` has `# NO_INPUT:`

This is an archive of the discontinued LLVM Phabricator instance.

Add test utility 'split-file'ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 278028

llvm/test/CMakeLists.txt

llvm/test/lit.cfg.py

llvm/tools/extract/CMakeLists.txt

llvm/tools/extract/extract.cpp

Add test utility 'split-file'
ClosedPublic