Page MenuHomePhabricator

Add test utility 'split-file'
ClosedPublic

Authored by MaskRay on Jul 14 2020, 5:31 PM.

Details

Summary

See https://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html
"[llvm-dev] Multiple documents in one test file" for some discussions.

This patch has explored several alternatives. The current semantics are similar to
what @dblaikie proposed.
split-file filename output splits the input file into multiple parts separated by
regex ^(.|//)--- filename and write each part to the file output/filename
(filename can include path separators).

Use case A (organizing input of different formats (e.g. linker
script+assembly) in one file).

# RUN: split-file %s %t
# RUN: llvm-mc %t/asm -o %t.o
# RUN: ld.lld -T %t/lds %t.o -o %t
This is sometimes better than the %S/Inputs/ approach because the user
can see the auxiliary files immediately and don't have to open another file.

# asm
...
# lds
...

Use case B (for utilities which don't have built-in input splitting
feature):

// RUN: split-file %s %t
// RUN: llc < %t/1.ll | FileCheck %s --check-prefix=CASE1
// RUN: llc < %t/2.ll | FileCheck %s --check-prefix=CASE2
Combing tests prudently can improve readability.
For example, when testing parsing errors if the recovery mechanism isn't possible,
grouping the tests in one file can more readily see test coverage/strategy.

//--- 1.ll
...
//--- 2.ll
...

Since this is a new utility, there is no git history concerns for
UpperCase variable names. I use lowerCase variable names like mlir/lld.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
MaskRay marked an inline comment as done.Jul 17 2020, 12:08 PM
MaskRay added subscribers: michaelplatings, lattner.
MaskRay added inline comments.
llvm/tools/extract/.clang-tidy
1 ↗(On Diff #278826)

For distinct top-level projects that seems sort of OK - but I don't think it's appropriate to have divergent styles within a single top-level project, such as LLVM.

I don't think people expressed that this is not OK for the previous discussion.

@michaelplatings @lattner

MaskRay updated this revision to Diff 280141.Jul 23 2020, 8:29 AM

lit substitution: extract -> %extract

Unfortunately, 'extract' in 'llvm-xray extract' will be substituted, due to D83578

MaskRay edited the summary of this revision. (Show Details)Jul 23 2020, 8:29 AM
MaskRay updated this revision to Diff 280314.Jul 23 2020, 7:12 PM
MaskRay edited the summary of this revision. (Show Details)

Use %extract only in not %extract

This revision was automatically updated to reflect the committed changes.

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

  1. If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])

Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])

Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.

This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

  1. If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collission with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

Use %extract only in not %extract

So does 'extract' not work the same as all other tools in LLVM? It doesn't get substituted in non-starting locations?

That inconsistency seems undesirable/best avoided. The xray collission is problematic, to be sure - might indicate that another name should be used, or perhaps some escaping to disable substitution would be useful (quotation marks seem like a fairly accessible/simple syntax to disable substitution, for instance)

llvm/test/lit.cfg.py

# FIXME: Why do we have both `lli` and `%lli` that do slightly different things?
tools.extend([
    'dsymutil', 'lli', 'lli-child-target', 'llvm-ar', 'llvm-as',
    'llvm-addr2line', 'llvm-bcanalyzer', 'llvm-config', 'llvm-cov',
    'llvm-cxxdump', 'llvm-cvtres', 'llvm-diff', 'llvm-dis', 'llvm-dwarfdump',
    'llvm-exegesis', 'llvm-extract', 'llvm-isel-fuzzer', 'llvm-ifs',
    'llvm-install-name-tool', 'llvm-jitlink', 'llvm-opt-fuzzer', 'llvm-lib',
    'llvm-link', 'llvm-lto', 'llvm-lto2', 'llvm-mc', 'llvm-mca',
    'llvm-modextract', 'llvm-nm', 'llvm-objcopy', 'llvm-objdump',
    'llvm-pdbutil', 'llvm-profdata', 'llvm-ranlib', 'llvm-rc', 'llvm-readelf',
    'llvm-readobj', 'llvm-rtdyld', 'llvm-size', 'llvm-split', 'llvm-strings',
    'llvm-strip', 'llvm-tblgen', 'llvm-undname', 'llvm-c-test', 'llvm-cxxfilt',
    'llvm-xray', 'yaml2obj', 'obj2yaml', 'yaml-bench', 'verify-uselistorder',
    'bugpoint', 'llc', 'llvm-symbolizer', 'opt', 'sancov', 'sanstats'])

Any tool in this list is special. They can be substituted in any position on a RUN line, either a 'command' position or an 'argument' position. I hope that we can deprecated this behavior.

This looks like a fairly comprehensive list of llvm utilities - seems like "extract" should be included in that list until that deprecation design discussion reaches an agreement. I don't think it's suitable to start down that path before an agreement is reached in an ongoing design discussion.

Things like not and count are not in the list. I don't think using %extract only for its own test a problem. Nobody uses not %extract in tests. It is there just to test the tool itself.

It's maybe a bit surprising that we have both llvm-extract and extract that are unrelated binaries?

Yes. Seems that nobody is unhappy, though http://lists.llvm.org/pipermail/llvm-dev/2020-July/143373.html

  1. If we add a standalone utility, how shall we name it? (Note that llvm-extract exists, but people can probably distinguish 'extract' from llvm-extract

Fairly small sample of folks in that thread - designing for all the rest of the LLVM developers isn't necessarily "what no one personally objects to".

(though the naming collission with xray extract did get me thinking about naming in general, but also the semantics - would it make sense to call this something like "fragment" (or other synonyms to "split", etc) and have the semantics be more like "write all the file fragments into the specified directory, named after the fragment name" - so you only have to run this tool once, rather than once for every fragment you have?)

'split' is a utility specified by POSIX.1-2017 Base.

"write all the file fragments into the specified directory, named after the fragment name" is a good idea. If it is implemented as a special mode, it can be: extract all %s -o %t.dir or extract all %s -o %t. -o specifies a directory name.

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This is will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

Instead of adding an explicit command 'extract', did you consider building this into llvm-mc and similar tools? This is will lead to more efficient tests and scales better to things that have diagnostic verification and other sorts of checks. This is also more consistent with the precedent MLIR has set here.

I considered the choice. mlir-opt/mlir-translate's -split-input-file/clang -verify is suitable when the input is homogeneous and the test is performed the tool itself.
For heterogeneous needs (llvm-objcopy --strip-symbols %t-list.txt %t %t5 (symbol list), ld.lld -T %t.lds %t.o (linker script)), or when the tool output is inspected by another tool, a standalone utility is likely more useful.

So far I like @dblaikie's semantics. I'd like syntax like extract %s -o %t.dir. For example the following will create %t.dir/{a.txt,b.ll,c.c}

#--- a.txt
0
;--- b.ll
1
//--- c.c
2

I am not settled on completely deleting the previous extract part %s syntax, or renaming 'extract' to something else.

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

Ok fair enough. I'd recommend a name like llvm-split-file or something like that given that we already have an llvm-extract tool that is very different.

How about a short name, e.g. split-file? We have many lit utilities not named llvm-*: not, count, FileCheck. The utility is more of their league.

Also, it is worth pointing out that we already have an (arguably much better) way to handle the problem you're trying to solve here:

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.

The only downside I see to this is the creation of more small files, but I think it is a good tradeoff to not introduce a new way of doing things here that is less general.

Separate files are considered. Actually that motivated the standalone utility https://reviews.llvm.org/D83725#2149490

"I end up doing one of three things in this situation: 1) adding a separate file in the "Inputs" directory - this is not great because the test input is far away from the actual test (i.e. not in the same file), making it harder to follow; 2) echoing the second and later inputs to separate files at runtime - this is not great because it has a runtime cost; ..."

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

not, count, FileCheck, update_*_test_checks.py. These utilities set a precedent that these auxiliary test-only utilities are not named llvm-*.

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this falls into subjective points of view and people may have different opinions. For me, non-trivial separate files (not creatable with one-line echo) have caused enough pain to me. It is not unusual for me open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I don't count wrong, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(For binary files, they are sometimes useful, e.g. when testing compatibility of LLVM IR. a llvm/test/Bitcode/ pre-built file ensures that compatibility is retained. However, their use cases are very narrow. In 99% cases textual formats will be a superior replacement. I hope we don't let 1% inapplicable use case to be the reason that a general purpose testing utility should not be introduced. )

This comment was removed by dblaikie.

I'm not sure how googles test runner works or how it differs. I recall many historical examples (in clang in particular), but it is possible they all got changed.

There's no need to speculate though :-), I'd just try an example to see if it works in practice in your environment.

I don't see any reason to prefer a short name here, this isn't a simple too like 'opt' it takes command line flags etc.

I think if it's going to be invoked once per fragment, inline with the destination (extract %s test1 | llc ... ) then having a shorter name is nice. If it's going to be invoked once to split everything up - that'd be on a separate RUN line anyway, and having a longer name seems fine by me. Whether or not it has an llvm- prefix I don't feel /too/ strongly about, but it does make clear that this isn't some unix vocabulary program (like not, count, etc), but a specific LLVM thing.

Name your test something like "foo.ll", then have the auxilary files be named "foo.ll.xyz". You can then refer to them directly in the test with "%s.xyz". This is better because 1) it doesn't introduce another micro tool, 2) it is general to non text files, 3) it is easy to work with on the command line when a test breaks, and 4) this makes it easier for multiple tests to share the same file.
...
Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

What's the precedent you're referring to here? Could you point me to some examples? Because my understanding was that we have a fairly strong precedent for input files being in an Inputs directory (in fact, I think Google's internal test runner relies on this convention) - having other files immediately next to test files I don't think is done & if it was done, I'd worry about the suffixes potentially colliding with valid test prefixes, then the auxiliary files would be incorrectly run as independent tests & probably fail due to lack of RUN lines, etc.

(3) is certainly compelling to me - being able to copy/paste lines from a test to reproduce is nice - though lots of tests are stateful - generating something, then doing a thing with it (running objdump and dwarfdump to inspect different features of the output - so you can't just copy/paste the dwarfdump+filecheck, you have to rerun the right command that generated the input too, etc).

I guess one thing we could do with this functionality that might make tests more reliable - this tool could delete the contents of the target directory before extracting the files (maybe this would be conflating responsibilities a bit and catch someone by surprise, though) reducing the incidence of tests being polluted by previous executions.

To throw-in another naming suggestion, clang uses that name "unbundle".

Also see http://lists.llvm.org/pipermail/llvm-dev/2020-July/143568.html

grimar added a comment.EditedJul 25 2020, 1:50 AM

Unrelatedly, I don't agree with your rationale for having a separate tool. You're right that moving files to an Inputs directory moves them further away, but that isn't what I was suggesting. Also, there is great precedent across the testsuite for this, and the proposed tool isn't a general solution to the problem (e.g. binary files etc).

I acknowledge that this falls into subjective points of view and people may have different opinions. For me, non-trivial separate files (not creatable with one-line echo) have caused enough pain to me. It is not unusual for me open two or three auxiliary files in Inputs/ to understand the purpose of a test. If I don't count wrong, at least @grimar, @jhenderson and @probinson hold a similar viewpoint.

(For binary files, they are sometimes useful, e.g. when testing compatibility of LLVM IR. a llvm/test/Bitcode/ pre-built file ensures that compatibility is retained. However, their use cases are very narrow. In 99% cases textual formats will be a superior replacement. I hope we don't let 1% inapplicable use case to be the reason that a general purpose testing utility should not be introduced. )

I have a strong +1 to have a way to keep inputs and test in a single file. It significantly improves readability from my experience. E.g. we have --docnum=x
option for yaml2obj and it is very common for llvm-readelf tests to have a few little YAML inputs in the same file. We introduced macros support for yaml2obj (-DMACRO=<val>)
and default values for them recently and this also helps to reuse YAML descriptions from the same file what generally helps to isolate test cases, reuse them for similar tests and keep the whole file compact.
If we had separate inputs it would be very hard to mantain those tests I believe.

Initially D84054 suggested the --doc-id=<id> option for llvm-mc which could help to solve a problem for asm files to avoid splitting. I think it was good solution by itself,
but also, having a separate tool is probably a more general solution for achieving the same.

I don't think I have much new to add that hasn't been said already. My personal opinions are:

  1. Naming - I don't really mind whether it has a long or short name. Given it's usually going to be on a separate line anyway, I suspect, rather than writing stuff to stdout for feeding into stdin, I don't think a longer name is an issue. The point about it not being a unix tool equivalent makes some sense to me for adding an llvm- prefix, although there are counter-examples for this. I also have no preference on the exact name, as long as it conveys meaning well enough ([llvm-]split-file seems clear enough for example).
  2. I want my test inputs to be all in one file where possible, as it makes it much easier to iterate on a test, understand it, etc, by keeping everything in the single file. There is a place for separate inputs, where they are going to be commonly reused across many tests (see for example the recent llvm-libtool-darwin testing, which reuses the same YAML repeatedly), but I think these are more the exception than the rule. It seems to me that the %s.xxx is not much better than being in %S\Inputs in this regards.
  3. @dblaikie's concern about possible extension collisions with the %s.xxx approach is one I'd share. One example might be an lld or llvm-objdump test consuming multiple asm files. A naive user might call them %s.1.s, %s.2.s etc and if they're just writing the one test, won't notice that when the whole directory is tested, including those files, resulting in lit failures.
  4. An "extract all" option would be useful. In fact, assuming we don't start adding other functionality to the tool (such as string substitution), I suspect the extract everything approach is going to be almost universally the better option, so might want to be the default behaviour.

Hi All,

I'm concerned that this patch was just landed despite the open discussion about problems. As one example, it doesn't make any sense to me that we now have llvm/tools/extract and llvm/tools/llvm-extract. Please revert until the details are sorted out and we get to convergence on the issues.

-Chris

MaskRay reopened this revision.Jul 28 2020, 1:26 PM
This revision is now accepted and ready to land.Jul 28 2020, 1:26 PM
MaskRay planned changes to this revision.Jul 28 2020, 1:26 PM
ychen added a subscriber: ychen.Jul 28 2020, 1:30 PM

Thank you Fangrui!

MaskRay updated this revision to Diff 281660.Jul 29 2020, 10:31 AM
MaskRay retitled this revision from Add test utility 'extract' to Add test utility 'split-file'.
MaskRay edited the summary of this revision. (Show Details)

Rename extract to split-file and change the interface

This revision is now accepted and ready to land.Jul 29 2020, 10:31 AM
MaskRay edited the summary of this revision. (Show Details)Jul 29 2020, 10:32 AM
MaskRay added inline comments.Jul 29 2020, 12:44 PM
llvm/test/tools/split-file/basic.test
11

I'll enhance the test a bit:

## Test that we will delete the output if it is a file, so that we can create
## a directory.
# RUN: touch %t
jhenderson added inline comments.Jul 30 2020, 4:57 AM
llvm/tools/split-file/split-file.cpp
64

Whilst it's probably harmless, I don't think this should be an int (and similar comments for other liner number variables, e.g. lineNo in handle), since it can never be negative. size_t or possibly unsigned seem more appropriate for the context.

97

I'm not sure it's immediately obvious here what the type of cur is, so I'd prefer no auto.

114

The name it implies to me that this is an iterator, but it's actually a pair, right? Perhaps best to change the name to something (e.g. keyValue or whatever)

118

You already have an ec local variable here. Perhaps this should be renamed and/or the earlier ec declaration moved to where it is actually used.

158

sys::fs::remove returns a std::error_code. I'm not sure we want to be ignoring it - I'd expect some sort of error in that case to be printed.

MaskRay updated this revision to Diff 281958.Jul 30 2020, 9:24 AM
MaskRay marked 3 inline comments as done.

Address comments.
Enhance tests.

MaskRay marked an inline comment as done.Jul 30 2020, 9:25 AM
MaskRay added inline comments.Jul 30 2020, 9:28 AM
llvm/tools/split-file/split-file.cpp
64

Emm. I think it is debatable whether undefined is a suitable type here. See comments starting from https://reviews.llvm.org/D82594#2127217 for some discussions.

I actually perform arithmetic near zero below (i.line_number() - 1). int gives me more confidence that things don't go off.

MaskRay added inline comments.Jul 30 2020, 9:30 AM
llvm/test/tools/split-file/output-is-special.test
5

I'll change this comment to:

## Don't delete the output if it is special, otherwise root may accidentally
## remove important special files.
jhenderson accepted this revision.Jul 31 2020, 12:37 AM

LGTM, with or without the suggestion, but since others had comments about this, it would be good to get another pair of eyes to give it another look over.

llvm/tools/split-file/split-file.cpp
64

It turns out that line_number is an int64_t. So I drop my point about unsigned or size_t (at least for this case, but I'm hardly convinced on the general case discussed on that or previous threads). However, perhaps leadingLines etc should be int64_t to match and avoid any truncation issues (noting in particular that the value could be anything in the range due to how EOF is handled)?

127

Another possible int64_t site.

lattner accepted this revision.Aug 2 2020, 12:47 PM

Thank you for renaming this!

MaskRay updated this revision to Diff 282671.Aug 3 2020, 10:18 AM
MaskRay marked 3 inline comments as done.

int -> int64_t

MaskRay edited the summary of this revision. (Show Details)Aug 3 2020, 2:43 PM
This revision was automatically updated to reflect the committed changes.

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

This is strange. A Windows bot I usually look was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

ro added a subscriber: ro.Aug 7 2020, 12:07 AM

Some of the new tests FAIL when run in the same tree a second time:

LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test

The failure mode is always the same, e.g.

split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists

This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

grimar added a comment.EditedAug 7 2020, 1:10 AM

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

This is strange. A Windows bot I usually look was good: http://45.33.8.238/win/summary.html

Can you please check why '\r' is printed? The output is opened with OF_None (not OF_Text), so I don't expect '\r' to be printed.

At the least diff -b can be used.

FWIW, I've tested this (on windows) and all tests in \llvm\test\tools\split-file folder pass fine for me.

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

In D83834#2201952, @ro wrote:

Some of the new tests FAIL when run in the same tree a second time:

LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test

The failure mode is always the same, e.g.

split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists

This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

I just tried running the tests twice on my Windows machine, and they pass (no cleaning in between). At a guess, this is something different in the Solaris OS behaviour?

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

MaskRay added a comment.EditedAug 10 2020, 1:35 PM

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

In D83834#2201952, @ro wrote:

Some of the new tests FAIL when run in the same tree a second time:

LLVM :: tools/llvm-strings/radix.test
LLVM :: tools/split-file/basic.test
LLVM :: tools/split-file/empty.test
LLVM :: tools/split-file/no-leading-lines.test

The failure mode is always the same, e.g.

split-file: error: /var/llvm/local-sparcv9-relwithdebinfo-A/test/tools/llvm-strings/Output/radix.test.tmp: File exists

This affects the Solaris buildbots (e.g. the Solaris/sparcv9 one) and all others that run with clean=False.

I just tried running the tests twice on my Windows machine, and they pass (no cleaning in between). At a guess, this is something different in the Solaris OS behaviour?

@MaskRay, I just noticed that this new code is in llvm/tools, but I think it belongs in llvm/utils, because it is more like tools like FileCheck, not, count etc intended for internal testing. What do you think?

According to http://lists.llvm.org/pipermail/llvm-dev/2020-August/143995.html

  1. llvm/utils. -> this builds some executables like tablegen
  2. llvm/lib. -> This is all libraries, shouldn’t include executables.
  3. llvm/tools. -> Generally executables, also some libraries that are tool specific.
  4. tests.. -> Things that depend on the above.

I am on the fence which of llvm/tools/split-file and llvm/utils/split-file makes more sense. I'd likely to collect a bit more opinions.

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

ro added a comment.Aug 10 2020, 1:57 PM

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

Here's what I found: the failing call to split-file is like this:

./bin/split-file --no-leading-lines /vol/llvm/src/llvm-project/local/llvm/test/tools/split-file/empty.test /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp

Running under truss shows

19028:  fstatat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", 0xFFFFFFFF7FFFE380, AT_SYMLINK_NOFOLLOW) = 0
19028:      d=0x0000010600010012 i=212407 m=0042750 l=2  u=2110  g=4620  sz=3
19028:          at = Aug 10 11:21:41 MEST 2020  [ 1597051301.681593645 ]
19028:          mt = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:          ct = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:      bsz=512   blks=3     fs=zfs
19028:  unlinkat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", AT_REMOVEDIR) Err#17 EEXIST

i.e. it tries to remove a directory. However, this cannot work since the directory in question isn't empty:

$ ls -la /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp
total 4
drwxr-s--- 2 ro gcc  3 Aug 10 01:46 ./
drwxr-sr-x 5 ro gcc 11 Aug 10 18:32 ../
-rw-r--r-- 1 ro gcc  0 Aug 10 01:46 empty

When I run split-file in gdb with a breakpoint in rmdir, I get

#0  0xffffffff7eeb7388 in rmdir () from /lib/64/libc.so.1
#1  0xffffffff7ee2c47c in remove () from /lib/64/libc.so.1
#2  0x000000010025069c in llvm::sys::fs::remove (path=..., 
    IgnoreNonExisting=true)
    at /vol/llvm/src/llvm-project/local/llvm/lib/Support/Unix/Path.inc:442
#3  0x00000001001821ac in main (argc=4, argv=0xffffffff7fffeb38)
    at /vol/llvm/src/llvm-project/local/llvm/tools/split-file/split-file.cpp:168

So llvm::sys::fs::remove tries to call ::remove on a non-empty directory, while the man page clearly states:

The  remove() function causes the file or empty directory whose name is
the string pointed to by path to be no longer accessible by that  name.
A  subsequent  attempt  to  open  that  file using that name will fail,
unless the file is created anew.

For files, remove() is identical to unlink(). For directories, remove()
is identical to rmdir().

and in turn rmdir(2) says

The  rmdir()  function  removes  the  directory  named by the path name
pointed to by path. The directory must not have any entries other  than
"." and "..".

This just cannot work!

There seem to be newline issues on Windows, causing llvm/test/tools/llvm-strings/radix.test and llvm/test/tools/split-file/basic.test to fail.

$ xxd llvm/test/tools/split-file/Inputs/basic-aa.txt
00000000: 0d0a 6161 0d0a ..aa..
$ xxd build_debug/obj/llvm/test/tools/split-file/Output/basic.test.tmp/aa
00000000: 0a61 610d 0a .aa..

Any simple fix?

Have you got your git line-endings checkout settings correct? It's possible the \r is being added when you check out a file somewhere, I guess?

Yup that's it. https://llvm.org/docs/GettingStarted.html#checkout-llvm-from-git says to use autocrlf=false which I wasn't doing. I had to delete and restore those files to get the proper line endings, but now it's good, thanks!

Might still be worth either making this test more general and/or setting this file to be binary rather than text, so it doesn't trip over this issue?

(from the Getting Started page: "Passing --config core.autocrlf=false should not be required in the future after we adjust the .gitattribute settings correctly, but is required for Windows users at the time of this writing.")

In D83834#2208107, @ro wrote:

@ro Can you set a breakpoint while running split-file, checking whether split-file can delete an existing file and replace it with a directory on Solaris?
This behavior is intentionally added to support changing %t from a file to a directory. It appears to work well on macOS, Linux and Windows.

Here's what I found: the failing call to split-file is like this:

./bin/split-file --no-leading-lines /vol/llvm/src/llvm-project/local/llvm/test/tools/split-file/empty.test /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp

Running under truss shows

19028:  fstatat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", 0xFFFFFFFF7FFFE380, AT_SYMLINK_NOFOLLOW) = 0
19028:      d=0x0000010600010012 i=212407 m=0042750 l=2  u=2110  g=4620  sz=3
19028:          at = Aug 10 11:21:41 MEST 2020  [ 1597051301.681593645 ]
19028:          mt = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:          ct = Aug 10 01:46:42 MEST 2020  [ 1597016802.271083280 ]
19028:      bsz=512   blks=3     fs=zfs
19028:  unlinkat(AT_FDCWD, "/var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp", AT_REMOVEDIR) Err#17 EEXIST

i.e. it tries to remove a directory. However, this cannot work since the directory in question isn't empty:

$ ls -la /var/llvm/local-sparcv9-A/test/tools/split-file/Output/empty.test.tmp
total 4
drwxr-s--- 2 ro gcc  3 Aug 10 01:46 ./
drwxr-sr-x 5 ro gcc 11 Aug 10 18:32 ../
-rw-r--r-- 1 ro gcc  0 Aug 10 01:46 empty

When I run split-file in gdb with a breakpoint in rmdir, I get

#0  0xffffffff7eeb7388 in rmdir () from /lib/64/libc.so.1
#1  0xffffffff7ee2c47c in remove () from /lib/64/libc.so.1
#2  0x000000010025069c in llvm::sys::fs::remove (path=..., 
    IgnoreNonExisting=true)
    at /vol/llvm/src/llvm-project/local/llvm/lib/Support/Unix/Path.inc:442
#3  0x00000001001821ac in main (argc=4, argv=0xffffffff7fffeb38)
    at /vol/llvm/src/llvm-project/local/llvm/tools/split-file/split-file.cpp:168

So llvm::sys::fs::remove tries to call ::remove on a non-empty directory, while the man page clearly states:

The  remove() function causes the file or empty directory whose name is
the string pointed to by path to be no longer accessible by that  name.
A  subsequent  attempt  to  open  that  file using that name will fail,
unless the file is created anew.

For files, remove() is identical to unlink(). For directories, remove()
is identical to rmdir().

and in turn rmdir(2) says

The  rmdir()  function  removes  the  directory  named by the path name
pointed to by path. The directory must not have any entries other  than
"." and "..".

This just cannot work!

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());
ro added a comment.Aug 10 2020, 2:46 PM
In D83834#2208107, @ro wrote:

std::errc::directory_not_empty is intentionally excluded. What error code is it on Solaris?

if (std::error_code ec = sys::fs::remove(output, /*IgnoreNonExisting=*/true))
  if (ec.value() != static_cast<int>(std::errc::directory_not_empty))
    fatal(output, ec.message());

On Solaris it's EEXIST/file_exists, as documented on the Solaris rmdir(2) man page and explicitly allowed by POSIX/XPG7:

[EEXIST] or [ENOTEMPTY]
    The path argument names a directory that is not an empty directory, or there are hard links to the directory other than dot or a single entry in dot-dot.