This is an archive of the discontinued LLVM Phabricator instance.

[test] Use host platform specific error message substitution in lit tests
ClosedPublic

Authored by abhina.sreeskantharajan on Feb 25 2021, 6:38 AM.

Details

Summary

This patch uses the errno python library to print out the correct error messages instead of hardcoding the error message per platform.

Diff Detail

Event Timeline

abhina.sreeskantharajan requested review of this revision.Feb 25 2021, 6:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 25 2021, 6:38 AM
jhenderson accepted this revision.Feb 25 2021, 7:17 AM

If it works, this is a much nicer solution, thanks. I didn't know about the errno library. Do you need me to verify it on Windows?

This revision is now accepted and ready to land.Feb 25 2021, 7:17 AM

If it works, this is a much nicer solution, thanks. I didn't know about the errno library. Do you need me to verify it on Windows?

Yes, that would be great! Thanks. I've already verified that it works on z/OS and Linux on Power.

ASDenysPetrov added a comment.EditedFeb 25 2021, 7:45 AM

Thanks, I'm also not sure if the second way is possible. I had another idea but I will need @ASDenysPetrov to test it out.
Does the following python program print out the correct error string for you?

import errno
import os
print ( os.strerror(errno.ENOENT))

If so, we can use this as the error message instead of hardcoding it per platform.

This is my output:

>>> import errno
>>> import os
>>> print(os.strerror(errno.ENOENT))
No such file or directory
>>>

I've tested this patch. Yes, I confirm, it works! Well done!

ASDenysPetrov accepted this revision.Feb 25 2021, 7:46 AM

I'm not sure why CI is failing, so I've tried rebasing my patch.
I have also fixed this testcase that fails on MSYS, llvm/test/tools/llvm-elfabi/fail-file-write-windows.test.

ASDenysPetrov accepted this revision.Feb 25 2021, 11:12 AM

I'm not sure why CI is failing, so I've tried rebasing my patch.
I have also fixed this testcase that fails on MSYS, llvm/test/tools/llvm-elfabi/fail-file-write-windows.test.

LGTM

jhenderson requested changes to this revision.Feb 26 2021, 1:20 AM

Bad news: this doesn't work on Windows. I've dug into why, and unfortunately it's a bizarre difference in behaviour between two different ways of getting the same error message from an error code on Windows. Given the following mini C++ program:

int main()
{
  printf("%s\n", strerror(EINVAL));
  printf("%s\n", std::make_error_code(std::errc(EINVAL)).message().c_str());
  return 0;
}

The output is:

Invalid argument
invalid argument

Note that the first uses an upper-case first letter, and the second a lower-case letter. Python, which is written in C will be using the first of these lines to get its error message. However, for the messages produced by LLVM, we are using the C++ std::error_code version, which produes the other version. This is obviously a slight problem...!

I wonder if we could use the old approach but have the substitution use a regex still (i.e. something like {{[Nn]}}? That would satisfy our use cases at least. It may not work though. I'm not sure if FileCheck variables can be regex patterns or not.

This revision now requires changes to proceed.Feb 26 2021, 1:20 AM

Thanks for testing it out. Since os.strerror does not emit the correct message on Windows we have to go back to hardcoding the error message for Windows. I had the same thought of passing a regex but FileCheck does not allow it. I currently check the host_cxx to see if MSYS is in the path because the library path is not available in the config.

I see there is a check for msys bash in the same file that looks like this:

if sys.platform in ['win32'] and not self.use_lit_shell:
    # Don't pass dosish path separator to msys bash.exe.

Is this a potential alternative to using host_cxx?

ASDenysPetrov added a comment.EditedFeb 26 2021, 7:16 AM

@abhina.sreeskantharajan
I really prefer the patch you made. Can we just ignore the case as a workaround for Windows in such case?

Ignore case on Windows

@abhina.sreeskantharajan
I've checked it. This patch works for me. I'm ready to approve.
@jhenderson could you check it again as well?

I haven't physically checked this (I expect it would work), but I don't think we want to add --ignore-case, in case there are other parts of the check that will be checked where case is important. For example, if the filename was part of the error message, we want it to match exactly the case of the file, as doing anything else can result in ambiguity and therefore risk for a bug to be hidden (most Windows filesystems are case insensitive, but it is possible for them to be case sensitive). Hiding the --ignore-case option behind a substitution would also be a potential for serious confusion, in my opinion, e.g. if a user explicitly wants to ignore the case themselves for other parts of their check.

I've subscribed @thopre and @jdenny, who both work on FileCheck. Maybe they'll have some other ideas. If not, I think we have to detect the C++ standard library being used somehow (or at least the compiler path, which is probably a good-enough proxy).

thopre added a comment.Mar 3 2021, 2:24 AM

I haven't physically checked this (I expect it would work), but I don't think we want to add --ignore-case, in case there are other parts of the check that will be checked where case is important. For example, if the filename was part of the error message, we want it to match exactly the case of the file, as doing anything else can result in ambiguity and therefore risk for a bug to be hidden (most Windows filesystems are case insensitive, but it is possible for them to be case sensitive). Hiding the --ignore-case option behind a substitution would also be a potential for serious confusion, in my opinion, e.g. if a user explicitly wants to ignore the case themselves for other parts of their check.

I've subscribed @thopre and @jdenny, who both work on FileCheck. Maybe they'll have some other ideas. If not, I think we have to detect the C++ standard library being used somehow (or at least the compiler path, which is probably a good-enough proxy).

I cannot think of a solution on FileCheck (the regex are allowed when matching/defining a variable, not when using it) but I don't think it's the right place to address this issue. That would require anyone using the substitution to be aware of the casing difference. I think the best approach is to lower the casing of the first letter of error message on Windows. The current code already checks for win32 platform so we don't lose more generality than we what we already have. We could even abstract that behind a function (e.g. llvm_strerror).

I haven't physically checked this (I expect it would work), but I don't think we want to add --ignore-case, in case there are other parts of the check that will be checked where case is important. For example, if the filename was part of the error message, we want it to match exactly the case of the file, as doing anything else can result in ambiguity and therefore risk for a bug to be hidden (most Windows filesystems are case insensitive, but it is possible for them to be case sensitive). Hiding the --ignore-case option behind a substitution would also be a potential for serious confusion, in my opinion, e.g. if a user explicitly wants to ignore the case themselves for other parts of their check.

I've subscribed @thopre and @jdenny, who both work on FileCheck. Maybe they'll have some other ideas. If not, I think we have to detect the C++ standard library being used somehow (or at least the compiler path, which is probably a good-enough proxy).

I cannot think of a solution on FileCheck (the regex are allowed when matching/defining a variable, not when using it) but I don't think it's the right place to address this issue. That would require anyone using the substitution to be aware of the casing difference. I think the best approach is to lower the casing of the first letter of error message on Windows. The current code already checks for win32 platform so we don't lose more generality than we what we already have. We could even abstract that behind a function (e.g. llvm_strerror).

I agree adding ignore-case may not be ideal especially because we can only specify it once. If that's the case, is my previous diff on this patch better? We check

if sys.platform == 'win32' and 'MSYS' not in host_cxx
thopre added a comment.Mar 3 2021, 4:23 AM

I haven't physically checked this (I expect it would work), but I don't think we want to add --ignore-case, in case there are other parts of the check that will be checked where case is important. For example, if the filename was part of the error message, we want it to match exactly the case of the file, as doing anything else can result in ambiguity and therefore risk for a bug to be hidden (most Windows filesystems are case insensitive, but it is possible for them to be case sensitive). Hiding the --ignore-case option behind a substitution would also be a potential for serious confusion, in my opinion, e.g. if a user explicitly wants to ignore the case themselves for other parts of their check.

I've subscribed @thopre and @jdenny, who both work on FileCheck. Maybe they'll have some other ideas. If not, I think we have to detect the C++ standard library being used somehow (or at least the compiler path, which is probably a good-enough proxy).

I cannot think of a solution on FileCheck (the regex are allowed when matching/defining a variable, not when using it) but I don't think it's the right place to address this issue. That would require anyone using the substitution to be aware of the casing difference. I think the best approach is to lower the casing of the first letter of error message on Windows. The current code already checks for win32 platform so we don't lose more generality than we what we already have. We could even abstract that behind a function (e.g. llvm_strerror).

I agree adding ignore-case may not be ideal especially because we can only specify it once. If that's the case, is my previous diff on this patch better? We check

if sys.platform == 'win32' and 'MSYS' not in host_cxx

If you mean version 3 (aka https://reviews.llvm.org/D97472?id=326671) of your diff then I personally prefer yes. It's not as elegant but it's less surprising.

Revert to previous commit. Set host_cxx to empty string if not set.

jdenny added a comment.Mar 3 2021, 8:43 AM

I may be lost. Why is --ignore-case still needed in llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml? I thought the special case for win32 was supposed to handle that.

ASDenysPetrov accepted this revision.Mar 3 2021, 9:10 AM

LGTM. Let it be the most balanced solution.

I may be lost. Why is --ignore-case still needed in llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml? I thought the special case for win32 was supposed to handle that.

I missed this testcase in my old patch. I noticed it because it started to fail when I tried to add --ignore-case as a solution. I can fix it in this patch as well if desired or open a separate patch for it after this one.

I may be lost. Why is --ignore-case still needed in llvm/test/tools/yaml2obj/ELF/DWARF/debug-gnu-pubnames.yaml? I thought the special case for win32 was supposed to handle that.

I missed this testcase in my old patch. I noticed it because it started to fail when I tried to add --ignore-case as a solution. I can fix it in this patch as well if desired or open a separate patch for it after this one.

Either way is fine by me. I was just concerned I had misunderstood what was happening.

Based on @jhenderson's comments, it sounds like C++ code could still call strerror(EINVAL) on windows. It doesn't look like that would be handled by this patch, but I suppose we can cross that bridge if we actually come to it.

Of the solutions I've seen, the current one seems best.

Sorry @jdenny, I misunderstood your comment. I removed the old --ignore-case workaround and fixed a similar lit test.

@jhenderson does the solution in this patch look acceptable to you?

The approach basically looks fine. I haven't had a chance to verify it today. I'll try to find time tomorrow if I can.

Based on @jhenderson's comments, it sounds like C++ code could still call strerror(EINVAL) on windows. It doesn't look like that would be handled by this patch, but I suppose we can cross that bridge if we actually come to it.

Yes, I expect that this could be a potential problem, but it's probably solved by saying "don't do that", probably in the coding standards? Perhaps worth an RFC on the mailing list to see if others have opinions. There are a limited number of existing cases that actually do call strerror in LLVM, but I'm guessing none of them are tested, at least not outside of a Linux environment.

llvm/utils/lit/lit/llvm/config.py
351

Add a comment here explaining why we have to treat windows behaviour separately.

Thanks! Adding comment

abhina.sreeskantharajan marked an inline comment as done.Mar 4 2021, 12:43 PM
jhenderson accepted this revision.Mar 5 2021, 12:32 AM

LGTM. I've tested this on a small sample of tests that use these substitutions, using my Windows machine, and the tests pass.

This revision is now accepted and ready to land.Mar 5 2021, 12:32 AM
ASDenysPetrov accepted this revision.Mar 5 2021, 2:49 AM
This revision was landed with ongoing or failed builds.Mar 5 2021, 4:22 AM
This revision was automatically updated to reflect the committed changes.