This is an archive of the discontinued LLVM Phabricator instance.

[Flang] Ported test_symbols to Python
ClosedPublic

Authored by ijan1 on Jul 29 2021, 12:41 AM.

Details

Summary

Due to unavailability of Flang testing on Windows, the shell scripts are being ported to Python. The following changes are being made in this patch: removed test_symbols.sh and common.sh, and ported them to Python. Changes to the tests themselves so that they use the python scripts instead of the shell script.

Diff Detail

Event Timeline

ijan1 created this revision.Jul 29 2021, 12:41 AM
ijan1 requested review of this revision.Jul 29 2021, 12:41 AM
Herald added a project: Restricted Project. · View Herald Transcript
ijan1 changed the visibility from "Public (No Login Required)" to "No One".
ijan1 changed the edit policy from "All Users" to "No One".
ijan1 edited the summary of this revision. (Show Details)Jul 29 2021, 12:52 AM
ijan1 changed the visibility from "No One" to "Public (No Login Required)".
ijan1 changed the edit policy from "No One" to "All Users".
ijan1 retitled this revision from Ported test_symbols to Python to [Flang] Ported test_symbols to Python.Jul 29 2021, 12:56 AM
ijan1 added a project: Restricted Project.
kiranchandramohan requested changes to this revision.Jul 29 2021, 2:24 AM

Some tests are failing, (see message below). I think we will have to retain common.sh till all scripts that use it are ported to python.

Command Output (stderr):
--
/var/lib/buildkite-agent/builds/llvm-project/flang/test/Semantics/test_errors.sh: line 7: /var/lib/buildkite-agent/builds/llvm-project/flang/test/Semantics/common.sh: No such file or directory
This revision now requires changes to proceed.Jul 29 2021, 2:24 AM
Meinersbur added inline comments.Jul 29 2021, 3:33 PM
flang/test/Semantics/OpenACC/acc-symbols01.f90
1

The python used here should be the PYTHON_EXECUTABLE discovered by cmake, not the one in the default path. Typically, the lit.cfg.py determines what the correct python executable. You could define a placeholder %test_symbols in lit.cfg.py that expands to the correct python executable and test_symbols.py program (similar to %flang_fc1 or %clang_cc1 by clang). Otherwise, I think llvm-lit even expands some standard tools itself, I know it does in LLVM for opt. Could you confirm this is done for python?

flang/test/Semantics/common.py
10–14

[not a change request] It does not seem necessary to make this a class. I would understand if test_symbols.py would derive from it to e.g. customize main, but now its a mixed of object-oriented and script style.

67–69

AFAIU, this file is never executed directly. Could you remove this?

flang/test/Semantics/test_symbols.py
22–24

This script write intermediate strings into temporary files just to read them back in again. I don't see reason to do that.

25

diffs is written to, but never used.

33

[typo] blank

53

Prefer subprocess.run or subprocess.check_output

73
ijan1 updated this revision to Diff 363126.EditedJul 30 2021, 9:34 AM
ijan1 edited the summary of this revision. (Show Details)

Incorporated Michael's feedback into a small patch.

Some suggestions for more idiomatic python

flang/test/Semantics/common.py
14
20
30
32
37–38
43–44
43–44
flang/test/Semantics/test_symbols.py
28–29
35–36
47–51

Isn't this identical to:

diff_check = unified_diff(diff1, diff3, n=999999, fromfile="1.90", tofile="3.f90")

?

ijan1 updated this revision to Diff 363416.Aug 2 2021, 2:43 AM
ijan1 marked 17 inline comments as done.

Changes for more idiomatic python.

Meinersbur accepted this revision.Aug 2 2021, 11:28 AM

LGTM. Thanks for all the work.

There would be possibilities to use re.compile and/or re.MULTILINE for better perfomance, but I don't think its a bottleneck.

flang/test/Semantics/test_symbols.py
20

diff3 is overwritten later, no need set it to an empty string in advance.

awarzynski added inline comments.Aug 3 2021, 1:10 AM
flang/test/Semantics/test_symbols.py
2

Assumes that python3 lives in /usr/bin/python3. #!/usr/bin/env python3 would be a bit safer/generic.

14

PurePath and Path not used?

A few minor comments.

flang/test/Semantics/common.py
2

Nit: Is this script directly executed? If not do we need this here?

37

Nit: you can rename f18-command as flang-command.

flang/test/Semantics/test_symbols.py
48

Would it be better to rename 1.f90 to source.f90 and 3.f90 to compiled.f90?
Otherwise, there can be the question, what happened to 2.f90?

awarzynski added inline comments.Aug 3 2021, 5:40 AM
flang/test/Semantics/test_symbols.py
48

Also, in the original test script it was $temp/1.90: https://github.com/llvm/llvm-project/blob/main/flang/test/Semantics/test_symbols.sh#L12-L14. IIUC, $temp is a directory unique to the corresponding test. Are you preserving this here? Otherwise tests will start failing in weird way when run in parallel.

Meinersbur added inline comments.Aug 3 2021, 7:22 AM
flang/test/Semantics/test_symbols.py
48

difflib is a pure-python implementation, it does not write to those files. They are only used in to indicate what to appear in the output as:

--- 1.90
+++ 3.f90

See https://docs.python.org/3/library/difflib.html#difflib.unified_diff

They also don't need to be filenames, but could also be descriptions such as "Actual output" and "Expected output".

The files themselves (1.f90, 2.f90 and 3.f90) were just used in the shell script to pass data between commands, as shell scripts work. This python version keeps the data in strings.

ijan1 updated this revision to Diff 364484.Aug 5 2021, 8:08 AM
ijan1 marked 6 inline comments as done.

Incorporating feedback from Kiran and Andrzej.

awarzynski accepted this revision.Aug 6 2021, 1:20 AM

LGTM, thank you for addressing my comments!

This revision is now accepted and ready to land.Aug 6 2021, 1:41 AM
awarzynski added inline comments.Aug 6 2021, 6:13 AM
flang/test/Semantics/common.py
3–5

[nit] This script does not take any arguments :)

flang/test/Semantics/symbol19.f90
1

! RUN: %python %S/test_symbols.py %s %flang_fc1

ijan1 updated this revision to Diff 364831.Aug 6 2021, 9:48 AM
ijan1 marked 2 inline comments as done.

Fixed a bug in a test file. And also updated the scripts to use docstrings instead of normal comments.

This revision was automatically updated to reflect the committed changes.

When I run tests after this patch, a directory (flang/test/Semantics/__pycache__) gets created in my checked-out source tree.

When I run tests after this patch, a directory (flang/test/Semantics/__pycache__) gets created in my checked-out source tree.

Thanks for drawing our attention to this. Not sure what the best approach would be here. See https://reviews.llvm.org/D108023 for a suggestion.

When I run tests after this patch, a directory (flang/test/Semantics/__pycache__) gets created in my checked-out source tree.

Thanks for drawing our attention to this. Not sure what the best approach would be here. See https://reviews.llvm.org/D108023 for a suggestion.

Nothing to do here. This is normal python behavior and improves execution speed. When running tests, a __pycache__ is also created in llvm/utils/lit/lit since this is where the python source code resides. The .pyc files it creates are already in the global .gitignore.

If you don't want these to be created, set PYTHONDONTWRITEBYTECODE=1 in your shell environment.

When I run tests after this patch, a directory (flang/test/Semantics/__pycache__) gets created in my checked-out source tree.

Thanks for drawing our attention to this. Not sure what the best approach would be here. See https://reviews.llvm.org/D108023 for a suggestion.

Nothing to do here. This is normal python behavior and improves execution speed. When running tests, a __pycache__ is also created in llvm/utils/lit/lit since this is where the python source code resides. The .pyc files it creates are already in the global .gitignore.

If you don't want these to be created, set PYTHONDONTWRITEBYTECODE=1 in your shell environment.

I don't care whether they are created or not so long as a git commit -a doesn't pick them up.

I don't care whether they are created or not so long as a git commit -a doesn't pick them up.

It shouldn't (from git commit --help):

-a, --all
    Tell the command to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected.

Are you seeing different behavior?

I don't care whether they are created or not so long as a git commit -a doesn't pick them up.

It shouldn't (from git commit --help):

-a, --all
    Tell the command to automatically stage files that have been modified and deleted, but new files you have not told Git about are not affected.

Are you seeing different behavior?

Yes, I was, while doing out-of-tree development, since I typically run "git add flang" before reformatting before committing. I'll modify my process accordingly to work around this change.

Do you have a sparse git checkout that is missing the top-level .gitignore (sibling of the llvm, flang, clang etc directories)? Normally, git outright refuses to add files that match .gitignore, unless you pass -f.

Do you have a sparse git checkout that is missing the top-level .gitignore (sibling of the llvm, flang, clang etc directories)? Normally, git outright refuses to add files that match .gitignore, unless you pass -f.

@klausler mentions above that he uses an out of tree build which does not have the top level gitignore file. I have created a patch to add pycache and pyc to gitgnore in flang directory.
https://reviews.llvm.org/D108057

@klausler mentions above that he uses an out of tree build which does not have the top level gitignore file. I have created a patch to add pycache and pyc to gitgnore in flang directory.
https://reviews.llvm.org/D108057

A "out-of-tree build" is a build directory that is not the same as the source directory (mandatory for LLVM) and is independent of how the source tree is organized.

I haven't understood @klausler's workflow, as long as -f is not used for the git add command, ignored files mentioned in the .gitignore should not be added.

A "out-of-tree build" is a build directory that is not the same as the source directory (mandatory for LLVM) and is independent of how the source tree is organized.

I haven't understood @klausler's workflow, as long as -f is not used for the git add command, ignored files mentioned in the .gitignore should not be added.

In the context of LLVM Flang, "out-of-tree" usually means that Flang itself is out-of-tree: https://github.com/llvm/llvm-project/tree/main/flang#building-flang-out-of-tree. This means building in two stages:

  • stage 1 - build LLVM, MLIR, Clang
  • stage 2 - build Flang against the binaries generated in stage 1 (i.e. as if Flang was an out-of-tree project)

In this workflow, LLVM Flang is like an independent project. It's more like "out-of-tree development" than "out-of-tree build" and .gitignore from LLVM doesn't really exist in this set-up.

In the context of LLVM Flang, "out-of-tree" usually means that Flang itself is out-of-tree: https://github.com/llvm/llvm-project/tree/main/flang#building-flang-out-of-tree. This means building in two stages:

  • stage 1 - build LLVM, MLIR, Clang
  • stage 2 - build Flang against the binaries generated in stage 1 (i.e. as if Flang was an out-of-tree project)

This would usually be referred to as a standalone build: https://github.com/llvm/llvm-project/blob/94b4598d77fe0585a8a3bd2a798fc7ce15a6aa56/flang/CMakeLists.txt#L25

In this workflow, LLVM Flang is like an independent project. It's more like "out-of-tree development" than "out-of-tree build" and .gitignore from LLVM doesn't really exist in this set-up.

Your linked readme suggests that the source is checked out in ~/flang/src. This does not correspond to structure from a git clone of the llvm-project repository. Maybe it's a symlink to the flang directory of the llvm-project repository? Why not just ask cmake to use that directory for the source directory? However, I also don't have any problem with adding additional files to flang's .gitignore (D108057) but would ask someone who thinks it is useful to approve that patch.

Your linked readme suggests that the source is checked out in ~/flang/src.

That's a typo and probably a leftover from when F18 was a separate project (i.e. before merging into llvm-project). Updated in https://reviews.llvm.org/D108198.

This does not correspond to structure from a git clone of the llvm-project repository.

When I test "standalone" builds, I clone llvm-project once, but then build twice (i.e. in two stages). I'm not sure whether that's the intended workflow - I only use it when testing changes that can potentially break it.

Why not just ask cmake to use that directory for the source directory?

But with LLVM's CMake. you can't really build just one sub-project, can you?

We don't really use this workflow, so we just focus on making sure that our changes don't break it.

You can have an LLVM build and a flang standalone build with just a single git checkout. That is:

$ git clone https://github.com/llvm/llvm-project.git
$ cmake -S llvm-project/llvm -B build-llvm -GNinja ...
$ (cd build-llvm && ninja)
$ cmake -S llvm-project/flang -B build-flang -GNinja -DLLVM_DIR=../build-llvm ...

My point is, it is irrelevant how you build LLVM & flang, you will have that same layout of the source checkout in all cases. This layout has a top-level .gitignore that already excludes *.pyc and automatically used by git while doing any operation in any subdirectory of the source checkout. git doesn't even know/care where/how the build directories are located.

I tested what happens if you have a symlink into a subdirectory of the llvm-project checkout. Turns out that git recognizes it is a symlink and still finds the .gitignore of the parent directory the symlink points to. That is, the only means I can think of where git does not recognize the .gitignore is when copying the flang subdirectory somewhere else (and copy back when upstreaming a change) or using git sparse checkouts. The former is not how git was intended to be used and the latter only has experimental support.

You can have an LLVM build and a flang standalone build with just a single git checkout.

Despite doing exactly that, I completely missed that indeed the top .gitignore would still be there and that Git would use it. My bad, sorry. Thank you for the detailed explanation!

The new script doesn't produce anything like the output of the old one, which was a lot more readable.

If I change the second occurrence of "Implicit" to "xxx" in symbol01.f90 to make it fail, the output starts like this.

Command Output (stdout):
--
---Expectedoutput
+++Actualoutput
@@-1,586+1,591@@
!DEF:/mModule
modulem
!DEF:/m/fPRIVATE,PURE,RECURSIVE(Function)SubprogramREAL(4)
private::f
contains
!DEF:/m/sBIND(C),PUBLIC,PURE(Subroutine)Subprogram
!DEF:/m/s/xINTENT(IN)(Implicit)ObjectEntityREAL(4)
!DEF:/m/s/yINTENT(INOUT)(-x-x-x+I+m+p+l+i+c+i+t)ObjectEntityREAL(4)
puresubroutines(x,y)bind(c)

With the old script it was

Command Output (stdout):
--
 !DEF: /m Module
 module m
  !DEF: /m/f PRIVATE, PURE, RECURSIVE (Function) Subprogram REAL(4)
  private :: f
 contains
  !DEF: /m/s BIND(C), PUBLIC, PURE (Subroutine) Subprogram
  !DEF: /m/s/x INTENT(IN) (Implicit) ObjectEntity REAL(4)
- !DEF: /m/s/y INTENT(INOUT) (xxx) ObjectEntity REAL(4)
+ !DEF: /m/s/y INTENT(INOUT) (Implicit) ObjectEntity REAL(4)
  pure subroutine s (x, y) bind(c)
awarzynski added a comment.EditedAug 18 2021, 10:16 AM

The new script doesn't produce anything like the output of the old one, which was a lot more readable.

If I change the second occurrence of "Implicit" to "xxx" in symbol01.f90 to make it fail, the output starts like this.

Command Output (stdout):
--
---Expectedoutput
+++Actualoutput
@@-1,586+1,591@@
!DEF:/mModule
modulem
!DEF:/m/fPRIVATE,PURE,RECURSIVE(Function)SubprogramREAL(4)
private::f
contains
!DEF:/m/sBIND(C),PUBLIC,PURE(Subroutine)Subprogram
!DEF:/m/s/xINTENT(IN)(Implicit)ObjectEntityREAL(4)
!DEF:/m/s/yINTENT(INOUT)(-x-x-x+I+m+p+l+i+c+i+t)ObjectEntityREAL(4)
puresubroutines(x,y)bind(c)

With the old script it was

Command Output (stdout):
--
 !DEF: /m Module
 module m
  !DEF: /m/f PRIVATE, PURE, RECURSIVE (Function) Subprogram REAL(4)
  private :: f
 contains
  !DEF: /m/s BIND(C), PUBLIC, PURE (Subroutine) Subprogram
  !DEF: /m/s/x INTENT(IN) (Implicit) ObjectEntity REAL(4)
- !DEF: /m/s/y INTENT(INOUT) (xxx) ObjectEntity REAL(4)
+ !DEF: /m/s/y INTENT(INOUT) (Implicit) ObjectEntity REAL(4)
  pure subroutine s (x, y) bind(c)

Should be fixed in https://reviews.llvm.org/D107954.

Thanks, that fixes the weird diff formatting. But why are all of the spaces removed?

ijan1 added a comment.Aug 19 2021, 2:11 AM

Thanks, that fixes the weird diff formatting. But why are all of the spaces removed?

A file might have a line with whitespace, such as real x(100, 100), whereas the output would have stripped some of the whitespace and have it as real x(100,100). However, Python's difflib library[1] unfortunately doesn't have support for ignoring whitespace, unlike diff which was used in the shell scripts. There are some threads[2][3] asking about charjunk mentioned in the Python docs but it's poorly documented and it doesn't actually ignore characters.

[1] https://docs.python.org/3/library/difflib.html
[2] https://stackoverflow.com/questions/63893283/difflib-ignore-whitespace-diffs-w-ndiff
[3] https://stackoverflow.com/questions/65780782/how-to-do-a-diff-in-python-of-two-text-files-and-ignore-white-space-and-blank-li

flang/test/Semantics/test_symbols.py
47–51

"unified_diff" returns a generator.
An alternative would be

diff_check = "".join(list(unified_diff(diff1, diff3, n=999999, fromfile="1.90", tofile="3.f90")))
flang/test/Semantics/test_symbols.py