This is an archive of the discontinued LLVM Phabricator instance.

[Fortran] Avoid digits in character constant
AbandonedPublic

Authored by rovka on Jun 21 2022, 5:20 AM.

Details

Summary

Test 9 from FM905.f tests the splitting of long character strings at the
80-character boundary when using list-directed output. There are
2 issues that make it difficult to keep this test in the test-suite in
its current form:

  1. One of the character constants contains only digits, which makes

fpcmp interpret it as a very large number rather than a character
constant. This is only really a problem because of issue #2:

  1. The standard is not strict about the number of spaces that can

crop up in list-directed output, which means that different compilers
will reach column 80 at different points in the character constant, and
therefore fpcmp will either see one very large number or 2 smaller but
somewhat arbitrary numbers, depending on where the newline was inserted.

This patch changes the problematic character constant to use letters
instead of digits. Since whitespaces are ignored, this fixes the issue.

Diff Detail

Event Timeline

rovka created this revision.Jun 21 2022, 5:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2022, 5:20 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
rovka requested review of this revision.Jun 21 2022, 5:20 AM

How does fpcmp fit into this issue? Is that in the checker?

rovka added a comment.Jun 22 2022, 4:28 AM

How does fpcmp fit into this issue? Is that in the checker?

Yes, fpcmp is the tool we use for comparing the results with the reference output. It treats all the numbers it encounters as floats.

Hey Diana, cheers for looking into this!

Please bear with me, as this is not my are of expertise :)

Test 9 from FM905.f tests the splitting of long character strings at the 80-character boundary when using list-directed output.

Where is this boundary imposed? Is it a limit for the input of for the output? The get a better idea, I've run this benchmark using flang-new and gfortran:

flang-new

                COMPUTED=
SHORTTHIS IS A LONGER CHARACTER STRING1234567890123456789012345678901234567890
12345678901234567890123456789012
                CORRECT=
SHORT  THIS IS A LONGER CHARACTER STRING 123456789012345678901234567890123456789
012345678901234567890123456789012

gfortran

                COMPUTED=
SHORTTHIS IS A LONGER CHARACTER STRING123456789012345678901234567890123456789012345678901234567890123456789012
                CORRECT=
SHORT  THIS IS A LONGER CHARACTER STRING 123456789012345678901234567890123456789
012345678901234567890123456789012

So the expected output was generated with gfortran in mind. But why is the output from flang-new different? You referred to white spaces in list-directed output, but the formatting seems identical with respect to white spaces.

rovka added a comment.Jun 23 2022, 6:06 AM

Hey Diana, cheers for looking into this!

Please bear with me, as this is not my are of expertise :)

Not mine either :D

Test 9 from FM905.f tests the splitting of long character strings at the 80-character boundary when using list-directed output.

Where is this boundary imposed? Is it a limit for the input of for the output? The get a better idea, I've run this benchmark using flang-new and gfortran:

flang-new

                COMPUTED=
SHORTTHIS IS A LONGER CHARACTER STRING1234567890123456789012345678901234567890
12345678901234567890123456789012
                CORRECT=
SHORT  THIS IS A LONGER CHARACTER STRING 123456789012345678901234567890123456789
012345678901234567890123456789012

gfortran

                COMPUTED=
SHORTTHIS IS A LONGER CHARACTER STRING123456789012345678901234567890123456789012345678901234567890123456789012
                CORRECT=
SHORT  THIS IS A LONGER CHARACTER STRING 123456789012345678901234567890123456789
012345678901234567890123456789012

So the expected output was generated with gfortran in mind. But why is the output from flang-new different? You referred to white spaces in list-directed output, but the formatting seems identical with respect to white spaces.

You're right, it's not about whitespace. I don't remember why I thought one of them was splitting at the 0 and one at the 9, I must've accidentally looked at the value printed after "CORRECT=". I guess the difference is only whether or not it splits at all. From what I managed to google up, gfortran never introduces a newline, but ifort does unless told otherwise on the command line. As you can see, flang also introduces a newline by default. To be honest, I don't understand the standardese around list-directed output very well, and since both behaviours exist in the wild I think we should try to accept both. WDYT? (I'll try to update the summary accordingly if you agree with the approach)

From what I managed to google up, gfortran never introduces a newline, but ifort does unless told otherwise on the command line. As you can see, flang also introduces a newline by default.

So, we know that the format of the output in this case is compiler-specific and the reference output was generated using gfortran. Generalizing it so that it works for other compilers makes sense to me.

think we should try to accept both. WDYT?

How can we achieve this? Wouldn't that require a dedicated "reference" file for every compiler that we want to support here? Also, by replacing digits with letters you are basically making fpcmp ignore this particular bit of output, right? So:

  • why not delete this particular test line if it's to be ignored anyway?
  • why is the generated output only really verified with fpcmp?

I've scanned llvm-test-suite and I don't see any obvious way to use any other DIFFPROGR. So it sounds like we can only use fpcmp for now and this tool shouldn't be comparing strings, should it? Perhaps switching from SingleSource CMake logic in the test suite to e.g. TestFile would help, but that's IMO outside the scope of this patch.

You can get list-directed output records to wrap at any length. The default is 79 but it can be overridden via FORT_FMT_RECL=n in the environment.

From what I managed to google up, gfortran never introduces a newline, but ifort does unless told otherwise on the command line. As you can see, flang also introduces a newline by default.

So, we know that the format of the output in this case is compiler-specific and the reference output was generated using gfortran. Generalizing it so that it works for other compilers makes sense to me.

think we should try to accept both. WDYT?

How can we achieve this? Wouldn't that require a dedicated "reference" file for every compiler that we want to support here? Also, by replacing digits with letters you are basically making fpcmp ignore this particular bit of output, right? So:

  • why not delete this particular test line if it's to be ignored anyway?
  • why is the generated output only really verified with fpcmp?

No, it's not ignored, fpcmp handles strings correctly (it errors out if there are any differences). In this case, it only ignores the whitespaces because I'm telling it to, but it still checks that the non-space characters are there.

I've scanned llvm-test-suite and I don't see any obvious way to use any other DIFFPROGR. So it sounds like we can only use fpcmp for now and this tool shouldn't be comparing strings, should it? Perhaps switching from SingleSource CMake logic in the test suite to e.g. TestFile would help, but that's IMO outside the scope of this patch.

Yeah, definitely outside the scope of this patch :)

You can get list-directed output records to wrap at any length. The default is 79 but it can be overridden via FORT_FMT_RECL=n in the environment.

That's cool, thanks :)
OTOH, that's specific to flang, right? So we'd have to tell people to add that to their environment when running the test-suite with flang. With other compilers, people will have to find alternative solutions on their own, if needed. With this patch, things just work, for any compiler, and we're not really changing the meaning of the test, since it says it's checking character constants (we're just switching to characters that don't confuse our existing diff tool).

I've scanned llvm-test-suite and I don't see any obvious way to use any other DIFFPROGR. So it sounds like we can only use fpcmp for now and this tool shouldn't be comparing strings, should it? Perhaps switching from SingleSource CMake logic in the test suite to e.g. TestFile would help, but that's IMO outside the scope of this patch.

SingleMultiSource is a skeleton for the simplest of test cases (program consists of single source file, stdout/err is compared to reference in file)
I've integrated other validators than fpmcp (https://github.com/llvm/llvm-test-suite/blob/8e4703af93b04d72e225215e67a89973e84e59fd/External/SPEC/SpecCPU2017.cmake#L258) using llvm_test_verify. fpcmp has its issues as well in that is ALWAYS (even with 0 tolerance) parses anything that looks like a floating-point number (as you noticed). This has been brought up before but nobody has changed it yet. I think that would be the cleaner (and overdue) solution.

@rovka
Since I changed fpcmp before (basically I rewrote the entire comparison algorithm once), I could help with that.

rovka added a comment.Jul 1 2022, 2:20 AM

I've scanned llvm-test-suite and I don't see any obvious way to use any other DIFFPROGR. So it sounds like we can only use fpcmp for now and this tool shouldn't be comparing strings, should it? Perhaps switching from SingleSource CMake logic in the test suite to e.g. TestFile would help, but that's IMO outside the scope of this patch.

SingleMultiSource is a skeleton for the simplest of test cases (program consists of single source file, stdout/err is compared to reference in file)
I've integrated other validators than fpmcp (https://github.com/llvm/llvm-test-suite/blob/8e4703af93b04d72e225215e67a89973e84e59fd/External/SPEC/SpecCPU2017.cmake#L258) using llvm_test_verify. fpcmp has its issues as well in that is ALWAYS (even with 0 tolerance) parses anything that looks like a floating-point number (as you noticed). This has been brought up before but nobody has changed it yet. I think that would be the cleaner (and overdue) solution.

@rovka
Since I changed fpcmp before (basically I rewrote the entire comparison algorithm once), I could help with that.

Ok, feel free to have a stab at it :) I'm going to be on vacation until the 20th of July, so I'll also consider the other patch on hold until you fix this one (since it might need at least a rebase afterwards). Thanks for looking into it!

I just saw that the reference output also contains floating point numbers (13.6, 12.4,15.25). Maybe we indeed new to process the file in floating-point mode?

rovka abandoned this revision.Jul 25 2022, 5:03 AM

This isn't needed anymore, thanks for having a look!