This is an archive of the discontinued LLVM Phabricator instance.

workflows: Make issue-subscriber more robust for labels with special characters
ClosedPublic

Authored by tstellar on Jan 6 2022, 11:55 AM.

Details

Summary

Also, replace the existing actionscript implementation with a python
script that can be run outside of GitHub Actions. The intention is
that going forward, all github action functionality would be implemented
in this script.

Diff Detail

Event Timeline

tstellar requested review of this revision.Jan 6 2022, 11:55 AM
tstellar created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2022, 11:55 AM
kwk requested changes to this revision.Jan 6 2022, 1:45 PM

Since this looks like the beginning of something new and big I highly suggest to consider using GraphQL instead of doing the many REST calls that are needed. Especially when you drive stuff from a github action you have almost everything you need right at your finger tips. I mean all the global IDs for issues and alike. I'm working on something for another project and here's my example of how one could encapsulate the github functionality into classes with easy usage: https://gist.github.com/kwk/c89b6a3e5eb40487fed78f226e982fcc#file-graphql-py-L138 .

Anyway this is just a thought.

.github/workflows/issue-subscriber.yml
15

I suspect that checking out the repo is not an option here?

But anyways, if you absolutely need curl to download the script, then be aware of some issues with this approach. I noticed that curl sometimes queries an older version of the content than currently is on github. The solution could be this:

curl \
   --compressed \
   -s \
   -H 'Cache-Control: no-cache' \
   https://raw.githubusercontent.com/$GITHUB_REPOSITORY/$GITHUB_SHA/llvm/utils/git/github-automation.py?$(uuidgen)

You might wonder about the --compressed and caching options or even about
the UUID being attached to the URL at the very end. These are all ways to
ensure we get the freshest of all versions of the file on github. I got the inspiration for this from here:
https://stackoverflow.com/questions/31653271/how-to-call-curl-without-using-server-side-cache?noredirect=1&lq=1

llvm/utils/git/github-automation.py
19

Being picky here, although this looks straight forward, I highly suggest that we use typed python:

def __init__(self, token:str, repo:str, issue_number:int, label_name:str):

This makes functions so much more readable IMHO.

35–39

os.getenv() is overloaded and accepts a second value to be the default if the environment variable is None.

def get_default_repo():
    return os.getenv('GITHUB_REPOSITORY', 'llvm/llvm-project')
This revision now requires changes to proceed.Jan 6 2022, 1:45 PM
tstellar updated this revision to Diff 398052.Jan 6 2022, 11:03 PM

Address review comments

tstellar marked 2 inline comments as done.Jan 6 2022, 11:04 PM
tstellar added inline comments.
.github/workflows/issue-subscriber.yml
15

I don't think any older content exists, because we are pulling a specific version of the file : GITHUB_SHA.

Since this looks like the beginning of something new and big I highly suggest to consider using GraphQL instead of doing the many REST calls that are needed. Especially when you drive stuff from a github action you have almost everything you need right at your finger tips. I mean all the global IDs for issues and alike. I'm working on something for another project and here's my example of how one could encapsulate the github functionality into classes with easy usage: https://gist.github.com/kwk/c89b6a3e5eb40487fed78f226e982fcc#file-graphql-py-L138 .

Anyway this is just a thought.

It would be nice if the PyGitHub module had wrappers for this already. What are the advantages of GraphQL vs the REST API?

kwk added a comment.Jan 10 2022, 11:04 AM

Since this looks like the beginning of something new and big I highly suggest to consider using GraphQL instead of doing the many REST calls that are needed. Especially when you drive stuff from a github action you have almost everything you need right at your finger tips. I mean all the global IDs for issues and alike. I'm working on something for another project and here's my example of how one could encapsulate the github functionality into classes with easy usage: https://gist.github.com/kwk/c89b6a3e5eb40487fed78f226e982fcc#file-graphql-py-L138 .

Anyway this is just a thought.

It would be nice if the PyGitHub module had wrappers for this already. What are the advantages of GraphQL vs the REST API?

The advantages of GraphQL that I see is that you can aggregate more than one endpoint into one response. The following example is totally made up but it showcases what you can do: Suppose, you have an issue comment that links to another issue and you want to get the assignee of the other issue. You could totally get this information with one call to the GraphQL by doing nested queries. This also is the reason for why there's no wrapper for this and why you don't need one: GraphQL was designed so that you as a developer can specify what you get in the result, hence there's no way to wrap this.
The GraphQL explorer makes it extremely nice to tailor your queries/mutations to your needs by letting you define what (aggregated) fields the result should include.

Try out this example if you want:

{
  node(id: "IC_kwDOETcFfs47PM2o") {
    ... on IssueComment {
      body
      issue {
        id
      }
      author {
        login
      }
    }
  }
}

But comparing REST and GraphQL is really comparing apples and oranges. Let me forward this article and a quote from it:

Where GraphQL Shines

Unlike REST, GraphQL is designed to access multiple resources simultaneously. This means that you are not only able to be more precise in not only retrieving just the data you desire (something that is built into some of today’s modern RESTful APIs), but you are able to do so across multiple resources/ data models (with data joins automatically built in) in a single HTTP (or other applicable protocol) call.

GraphQL is also designed to be incredibly structured (so much so that the order of properties in the response is critical). This means clients will know exactly what to expect (and in what types) without having to pull in JSON or XML schemas. It also means the API is much easier to document as the possibilities are limited to its models, not its representations and dynamically managed relationships.

That being said I think programmatically the GraphQL query can easily be verified for correctness outside the program code where the the REST API let's you use auto-completion inside the editor of choice. IMHO with growing complexity of tasks or queries I'd lean towards GraphQL and it's good practice to start small, aka with less complexity.

Sorry for this long debatable answer.

.github/workflows/issue-subscriber.yml
15

That's a fair point. Sorry for not seeing that.

llvm/utils/git/github-automation.py
22

Sorry for being picky. This is not needed but a though. I'd make this a property to be immutable to change from the outside:

# In __init__:
# ...
self._team_name = 'issue-subscribers-{}'.format(self.label_name).lower()

@property
def team_name(self) -> str:
    return self._team_name
24

To help code assists, it's nice to include the return type f this function.

def run(self) -> bool:
29

Do we ignore any exceptions that can happen? This question is important for future additions.

tstellar updated this revision to Diff 398834.Jan 10 2022, 9:36 PM

Make team_name a property and add a return type to run()

tstellar marked 2 inline comments as done.Jan 10 2022, 9:38 PM
tstellar added inline comments.
llvm/utils/git/github-automation.py
29

That was my plan for this task. I'm not sure what we would do with the exceptions if we caught them.

kwk accepted this revision.Jan 13 2022, 2:13 AM

I don't see anything obvious to hold this back.

This revision is now accepted and ready to land.Jan 13 2022, 2:13 AM