This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj] - Stop using precompiled objects in file-headers.test
ClosedPublic

Authored by grimar on Jul 16 2019, 8:15 AM.

Diff Detail

Event Timeline

grimar created this revision.Jul 16 2019, 8:15 AM

I feel look we've lost some test coverage here: a number of the checked fields have gone from being non-zero to being zero. This means that we're no longer testing that the property is actually read and printed correctly, so I think the YAML needs updating in these cases.

MaskRay added inline comments.Jul 17 2019, 7:25 AM
test/tools/llvm-readobj/file-headers.test
1

%t1.coff-arm

Super nit: it seems the object files are differentiated by their extensions, so the serial numbers 1, 2, 3, 4, etc can be deleted.

I feel look we've lost some test coverage here: a number of the checked fields have gone from being non-zero to being zero. This means that we're no longer testing that the property is actually read and printed correctly, so I think the YAML needs updating in these cases.

Let me clarify. Do you mean, for example, that for COFF-ARM64, SectionCount became 0, but was non-zero before?
Such things is a result of YAML simplifications and was my intention actually. I tried to make YAMLs as minimal as possible to show
how the headers are dumped. Like now, COFF-ARM64 check that when there is no sections then we print zero as expected, what is also a reasonable check I think.
I could just insert the unsimplified YAMLs instead, but that would make this test really huge and actually dirty. The binaries we have seems have too
much excessive content. Also that anyways would not actually test the each possible field. Because for full testing of that we would need to have at least 2 tests I think,
like with 0 sections and with N sections to show the difference. That applies for each field.

I am OK to insert unmodified or partially simplified YAMLs if you want. (I do not think I am ready to make a huge amount of work to add complete tests for the each field though)
What kind of YAMLs would make you happy enough? :)

(Btw, my motivation of doing this patch was: I want to simplify another test by moving its parts to another places,
like this one and eliminate. Generally I found very usefull to have no binaries in the any inputs, removing them should allow us to significantly
cleanup and improve the existent tests and also the tools).

I feel look we've lost some test coverage here: a number of the checked fields have gone from being non-zero to being zero. This means that we're no longer testing that the property is actually read and printed correctly, so I think the YAML needs updating in these cases.

Let me clarify. Do you mean, for example, that for COFF-ARM64, SectionCount became 0, but was non-zero before?

Yes, that's exactly what I was talking about.

(Btw, my motivation of doing this patch was: I want to simplify another test by moving its parts to another places,
like this one and eliminate. Generally I found very usefull to have no binaries in the any inputs, removing them should allow us to significantly
cleanup and improve the existent tests and also the tools).

No problem with the motivation at all. Having canned inputs can cause headaches too.

I think part of the problem is that there are WAY too many test cases in this single test file. I think it would be much simpler if we split it up into e.g. file-headers-coff.test, file-headers-elf.test etc (or similar). Further, I think they could be further divided to separate out the machine type testing from testing the other fields (i.e. having a test for file-headers-elf-machine.test or similar, which tests just the Format and Arch values). This would be a much smaller test. Only when an architecture has special behaviour for certain fields should we consider testing it more in detail. Each of the fields could then be tested individually in this way (or to a limited extent at least). It would lead to many more tests, but each individual test case would be much smaller, as it only needs to check one small thing.

I do think there is value in both 0 and non-zero values for most fields. These would become much simpler if the test cases were split up as I've described. Thus there could be elf-file-headers-section-count.s, elf-file-headers-shentsize.s, elf-file-headers-shstrndx.s, coff-file-headers-section-count.s etc.

Does that make sense? I know it's more work, and perhaps this is an okay change as an interim level, as long as we're moving towards the suggestion, and long-term don't lose test coverage.

grimar planned changes to this revision.EditedJul 18 2019, 2:04 AM

I feel look we've lost some test coverage here: a number of the checked fields have gone from being non-zero to being zero. This means that we're no longer testing that the property is actually read and printed correctly, so I think the YAML needs updating in these cases.

Let me clarify. Do you mean, for example, that for COFF-ARM64, SectionCount became 0, but was non-zero before?

Yes, that's exactly what I was talking about.

(Btw, my motivation of doing this patch was: I want to simplify another test by moving its parts to another places,
like this one and eliminate. Generally I found very usefull to have no binaries in the any inputs, removing them should allow us to significantly
cleanup and improve the existent tests and also the tools).

No problem with the motivation at all. Having canned inputs can cause headaches too.

I think part of the problem is that there are WAY too many test cases in this single test file. I think it would be much simpler if we split it up into e.g. file-headers-coff.test, file-headers-elf.test etc (or similar). Further, I think they could be further divided to separate out the machine type testing from testing the other fields (i.e. having a test for file-headers-elf-machine.test or similar, which tests just the Format and Arch values). This would be a much smaller test. Only when an architecture has special behaviour for certain fields should we consider testing it more in detail. Each of the fields could then be tested individually in this way (or to a limited extent at least). It would lead to many more tests, but each individual test case would be much smaller, as it only needs to check one small thing.

I do think there is value in both 0 and non-zero values for most fields. These would become much simpler if the test cases were split up as I've described. Thus there could be elf-file-headers-section-count.s, elf-file-headers-shentsize.s, elf-file-headers-shstrndx.s, coff-file-headers-section-count.s etc.

Does that make sense? I know it's more work, and perhaps this is an okay change as an interim level, as long as we're moving towards the suggestion, and long-term don't lose test coverage.

OK, let me do a split for this one for start and then we'll see where it goes.

grimar updated this revision to Diff 210572.Jul 18 2019, 8:03 AM

I splitted the huge test into a few little,
used -DFILE to test file names, used
--strict-whitespace and --match-full-lines.

Does it look fine for the first steps
(or what else can I do in this patch)?

MaskRay added inline comments.Jul 18 2019, 7:17 PM
test/tools/llvm-readobj/coff-file-headers.test
33 ↗(On Diff #210572)

Due to --strict-whitespace, there are now no space after NEXT:. Just a question: do you find anything that requires --strict-whitespace?

test/tools/llvm-readobj/elf-file-headers.test
2 ↗(On Diff #210572)

Nit: --check-prefix since you use the double-dash form for --strict-whitespace and --match-full-lines

grimar marked an inline comment as done.Jul 19 2019, 2:43 AM
grimar added inline comments.
test/tools/llvm-readobj/coff-file-headers.test
33 ↗(On Diff #210572)

In this patch we are testing how the headers are dumped.
And this is the main and the only test set atm for headers output I think.
I supposed that we might want to check that do not print excessive spaces for example somewhere.
My feeling is not strong here though. I am OK to remove --strict-whitespace to make the tests nicer probably.

jhenderson added inline comments.Jul 19 2019, 4:58 AM
test/tools/llvm-readobj/coff-file-headers.test
5 ↗(On Diff #210572)

Up to you what you think, but you can add some leading spaces after the '#' to keep things lining up nicely:

#      ARM:File:
# ARM-NEXT:Arch:
33 ↗(On Diff #210572)

I prefer having --strict-whitespace like you've got here. This helps us catch odd formatting issues that sometimes creep in. We don't need it on every test case, but for the basic ones, it makes sense.

57 ↗(On Diff #210572)

Nit: use double dash for check-prefix here and elsewhere, to be consistent with other options.

322 ↗(On Diff #210572)

Nit missing new line at EOF.

test/tools/llvm-readobj/elf-file-headers.test
130 ↗(On Diff #210572)

Nit: no new line at EOF.

grimar updated this revision to Diff 210827.Jul 19 2019, 6:55 AM
grimar marked 8 inline comments as done.
  • Addressed review comments.
test/tools/llvm-readobj/coff-file-headers.test
5 ↗(On Diff #210572)

Looks better!

grimar planned changes to this revision.Jul 19 2019, 6:58 AM

Plan more cosmetic changes.

grimar requested review of this revision.Jul 19 2019, 7:01 AM

Seems fine as is. (We do not want to swich -h to --h, right?)

Seems fine as is. (We do not want to swich -h to --h, right?)

No, not at all! -h is a single-letter option, so should have only one dash (see also -D for FileCheck).

test/tools/llvm-readobj/coff-file-headers.test
3 ↗(On Diff #210827)

I think here -DFILE makes more sense, since it looks like compiler defines. -D is a short option too, so should have only a single dash.

grimar updated this revision to Diff 210848.Jul 19 2019, 8:58 AM
  • --DFILE -> -DFILE
jhenderson accepted this revision.Jul 19 2019, 9:01 AM

Okay, looks good to me.

This revision is now accepted and ready to land.Jul 19 2019, 9:01 AM
MaskRay accepted this revision.Jul 19 2019, 9:13 AM

Is there a danger that by moving in this direction we loose some test coverage since we are no longer verifying the libObject can read a know set of binaries, but only that libObject can read the output yaml2obj? i.e. we could evolve the object format in llvm such that these tests would continue pass but the objects would no longer be standards compliant?

grimar added a comment.EditedJul 19 2019, 10:23 AM

Is there a danger that by moving in this direction we loose some test coverage since we are no longer verifying the libObject can read a know set of binaries, but only that libObject can read the output yaml2obj?

Having a precompiled binaries committed is very unflexible generally and we always tried to avoid that.
(ok, at least I can say starting from ~2016-2017, I wasn't on LLVM before). Reviewers always asked to use yaml2obj when possible.
Previously yaml2obj was lacked many modern features and many things were impossible to do. Nowadays situation changes every day.

Also, don't forget that we have other projects, like LLD, where libObject based tools like llvm-readobj/objdump check the outputs too.

i.e. we could evolve the object format in llvm such that these tests would continue pass but the objects would no longer be standards compliant?

At fact many of our tests that use yaml are not standart complaint because of different reasons. I do not think it is a problem, that often helps to
simplify the test case for readability. Sometimes it is impossible to write a test that is standart compliant to test something
(e.g. testing errors in tools like llvm-readelf etc require creating of broken objects).

To summarize: I do not think that often use of yaml2obj is bad. It is a good tool, we might want to improve something,
maybe fix something, but generally, I think that it produces output no worse than other producers.

I'm not arguing against using yaml2obj in general, and obviously having binaries checked in is *almost* always a bad idea.

I'm only wondering if in this specific case it might make sense to keep binaries checked in, since we almost certainly never want them to change. Perhaps the point of these files is to be fixed in time forever to ensure we can parse basic object files. Ensuring that we can parse objects that we just created is kind of cheating maybe? I don't feel strongly about this. Does anyone else share this concern?

I am going to commit this to free road for another patches. If during futher discussion(s)
will be decided to keep these or other binaries checked in for something, we can do it
in a follow-up independent commit(s) with appropriate test case(s) and comment(s) about
how exactly binaries were produced and for what we keep them.

(FTR I do not think we should).

grimar closed this revision.Jul 22 2019, 1:16 AM

r366668

Is there a danger that by moving in this direction we loose some test coverage since we are no longer verifying the libObject can read a know set of binaries, but only that libObject can read the output yaml2obj? i.e. we could evolve the object format in llvm such that these tests would continue pass but the objects would no longer be standards compliant?

I think the pre-canned binaries don't really help us any more than yaml2obj inputs - it would be just as straightforward to tune our object reading code to work with a very specific set of binaries (unintentionally or otherwise) as it would be to work with our yaml2obj output. It's worth noting that the yaml2obj writing code and llvm-readobj reading code are not the same set of code, so we are already testing two different implementations against each other. If we change one and not the other, a test should break, alerting us to a behaviour change. We would need a good reason to change both.

Basically, I don't think we need to be concerned.