This is an archive of the discontinued LLVM Phabricator instance.

Fix Windows command line bug when last token in response file is ""
ClosedPublic

Authored by neildhar on Apr 16 2020, 11:22 PM.

Details

Summary

Current state machine for parsing tokens from response files in Windows does not correctly handle the case where the last token is "". The current implementation handles the last token by only adding it if it is not empty, however this does not cover the case where the last token is meant to be the empty string. We can cover this case by checking whether the state machine was last in the UNQUOTED state, which indicates that the last character of the input was a non-whitespace character.

Diff Detail

Event Timeline

neildhar created this revision.Apr 16 2020, 11:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 16 2020, 11:22 PM
neildhar edited the summary of this revision. (Show Details)Apr 16 2020, 11:27 PM
neildhar added a reviewer: Bigcheese.

Hi @rnk and @Bigcheese, I was wondering if you would be willing to review this diff.

Thank you!

I think the fix looks good, but I'd like to see a bit more in the test.

Thanks for taking on this bug.

llvm/unittests/Support/CommandLineTest.cpp
258

It would make the test easier to read if we could use raw string literals for these kinds of inputs. Does LLVM style permit that?

261

I'm mildly concerned that in the future someone may come in and append another argument to the end in order to test some other case, and thus invalidate this test.

Can you add a comment that this test is (among other things) explicitly testing the case of an empty quoted string at the end of the command? Or maybe even make this a separate test for specifically that (e.g., TokenizeWindowsCommandLineQuotedLastArgument).

Are there other cases we should test to make sure this doesn't change things? What if the last argument is " or """ or """"? What if there's white space after the last quotation mark? A quick look at the code suggests these might all work with this change, but it looks like they'd hit some other code paths.

rnk added a comment.May 19 2020, 11:35 AM

Thanks for the fix, I agree with Adrian's feedback. @amccarth, can you take over the review? I will remove myself, thanks.

rnk removed a reviewer: rnk.May 19 2020, 11:35 AM

I'm happy to see this through.

neildhar updated this revision to Diff 265970.May 25 2020, 12:00 AM

Update tests

neildhar planned changes to this revision.May 25 2020, 10:05 AM

Thanks for taking a look @amccarth! I've updated the tests to include one for " as well, I think the other versions (""", """") will trigger the same code paths although I can add them in if that is preferred.

Unfortunately, I've used the wrong linter here and will fix that and update.

neildhar updated this revision to Diff 266050.May 25 2020, 10:44 AM

Fix linting

amccarth requested changes to this revision.May 26 2020, 11:28 AM

Thanks for extending the tests. Those will certainly make your fix more robust against regressions.

I'm concerned about the extra complexity added to the state machine and the introduction of an unused API. The new version has more complicated flow control for no obvious benefit, and it seems unrelated to the relatively simple fix this patch started out to be. If there are good reasons for these additional changes, perhaps it would be best to break them into a separate follow-on patch. I'd be happy to review that as well.

llvm/lib/Support/CommandLine.cpp
921–922

Looks like we have some scope-creep: This patch is now about more than solving the problem with arguments at the end of the line.

I'm not convinced that it's important this be inlined. That's ultimately up to the compiler.

I think the comment about the user entry points doesn't tell us anything that we cannot see in the function signature.

927–928

I don't understand what I'm supposed to take away from this comment. Either delete it or explain _why_ it's important to do as much work as possible.

I'm not even sure what you mean by "work." It seems this version of the state machine does the same amount of work as the old version. Is this about the fact that the states now have their own internal loops in addition to the outer loop?

930–940

I'm concerned that the index is advanced here and and in the loop header. I'm not saying it's wrong. It's just harder to reason about state machine behavior when it's a hybrid of a classic loop surrounding state handlers that have their own internal state machines. Now we have a bunch of checks throughout the body to see if we've hit (or exceeded?!) the end. Is there an advantage to this that I'm not seeing?

941–942

This seems like a behavior change: AddToken will now be called _after_ MarkEOL. Maybe that doesn't matter, but it seems nonsensical to report the delimiter before the token it delimits.

941–942

Consistent with [[ https://llvm.org/docs/CodingStandards.html#use-early-exits-and-continue-to-simplify-code

LLVM style ]], the old version used continue to avoid cascades of else if statements. Please follow that pattern here.
1002–1003

Nobody is calling this new function, not even in the tests. Is the intent to replace the original function with this new one to reduce copying? Do we really need the flexibility to parse it both ways, or can we keep things a bit simpler?

1583–1584

Did you intentionally change J to start at 0 rather than 1? This mostly looks like you were just trying to make the style consistent by capitalizing these variable names.

This revision now requires changes to proceed.May 26 2020, 11:28 AM

Hi @amccarth, thanks for taking another look! I'm not sure how those changes render for you, on my screen they don't appear as part of this diff. They were picked up when I rebased, the original diff for those is https://reviews.llvm.org/D79262.

neildhar requested review of this revision.May 26 2020, 8:19 PM
amccarth accepted this revision.May 27 2020, 8:52 AM
amccarth added a subscriber: rnk.

Ah, I guess the diff was based off the old head rather than the rebased one. Phabricator highlighted all that code as part of this patch.

Sorry about that. I'll have to redirect most of those comments @rnk . That'll be awkward.

Now that Phabricator is showing just seeing fix and test improvements, it looks great.

This revision is now accepted and ready to land.May 27 2020, 8:52 AM

No worries, thanks! This is my first patch to LLVM so I do not have commit access, could I trouble you to commit it?

No problem. I'll land it this afternoon (Pacific time). Thanks!

This revision was automatically updated to reflect the committed changes.