This is an archive of the discontinued LLVM Phabricator instance.

[Dexter] Add DexDeclareAddress command and address function
ClosedPublic

Authored by StephenTozer on Oct 8 2021, 10:51 AM.

Details

Summary

This patch adds a new Dexter command, DexDeclareAddress, which is used to test the relative values of pointer variables. The motivation for adding this command is to allow meaningful assertions to be made about pointers that go beyond checking variable availability and null equality.

The full explanation and syntax is given in Commands.md, but as an example the following code tests that the pointer foo is equal to bar, and baz is equal to foo + 16:

DexDeclareAddress('my_addr')
DexExpectWatchValue('foo', address('my_addr'))
DexExpectWatchValue('bar', address('my_addr'))
DexExpectWatchValue('baz', address('my_addr', 16))

Diff Detail

Event Timeline

StephenTozer requested review of this revision.Oct 8 2021, 10:51 AM
StephenTozer created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2021, 10:51 AM

In the test you've added cross-project-tests/debuginfo-tests/dexter-tests/address.cpp we have:

// DexDeclareAddress('addr')
// DexExpectWatchValue('a', 0, address('addr'), from_line=8, to_line=10)
// DexExpectWatchValue('b', address('addr'), on_line=10)

Comparing this to:

// DexExpectWatchValue('b == a', True, on_line=10)

Am I right in thinking the main benefit is that addr is captured (on first use in a DexExpectWatchValue) which means we can use the value later, even if a has been optimized out when we want to check the value of b?

I don't particularly like the way of assigning a value to the address variable: "In its first appearance it will match against any valid pointer". I'd prefer a dedicated command (or even rolling it into DexDeclareAddress) because this implicit behaviour is not very obvious when skimming a test. What do you think?

Please can you add some regression tests for the Command? You can take inspiration from the duplicate label test and referencing an undefined label test, plus it would be good to have at least one test in the command tests using it.

cross-project-tests/debuginfo-tests/dexter/dex/command/commands/DexExpectWatchBase.py
58–1

nit: .

67

nit: a address -> an address, and the comment needs a full stop.

cross-project-tests/debuginfo-tests/dexter/dex/heuristic/Heuristic.py
31

Sorry if this is a silly question, why do we need to do this here in the heuristic code?

Am I right in thinking the main benefit is that addr is captured (on first use in a DexExpectWatchValue) which means we can use the value later, even if a has been optimized out when we want to check the value of b?

Yes - this doesn't necessarily just apply to optimized out variables but to any set of variables in the program, including variables in entirely disjoint scopes, or variables in the same scope at different points in time. This could also include comparing a variable to itself, such as DexExpectValue('a', address('a'), address('a', 4), address('a', 8)) to test that an iterator increments correctly.

I don't particularly like the way of assigning a value to the address variable: "In its first appearance it will match against any valid pointer". I'd prefer a dedicated command (or even rolling it into DexDeclareAddress) because this implicit behaviour is not very obvious when skimming a test. What do you think?

I considered a "declare a specific address once" version, but wasn't sure if it would be preferable, since it means that if the "canonical" address is not available at the point you declare it at then every test using the address would fail - even if every other use of the address is consistent. This is suitable for lit tests where we always expect sucess/failure to be binary, but it seems more brittle in score-based testing, which Dexter ostensibly supports. I'm not necessarily opposed to it, since it would simplify the code and the test syntax in some cases, but I'm not sure whether it's worth making the scoring heuristic less reliable.

Am I right in thinking the main benefit is that addr is captured (on first use in a DexExpectWatchValue) which means we can use the value later, even if a has been optimized out when we want to check the value of b?

Yes - this doesn't necessarily just apply to optimized out variables but to any set of variables in the program, including variables in entirely disjoint scopes, or variables in the same scope at different points in time. This could also include comparing a variable to itself, such as DexExpectValue('a', address('a'), address('a', 4), address('a', 8)) to test that an iterator increments correctly.

Cool, that SGTM.

I don't particularly like the way of assigning a value to the address variable: "In its first appearance it will match against any valid pointer". I'd prefer a dedicated command (or even rolling it into DexDeclareAddress) because this implicit behaviour is not very obvious when skimming a test. What do you think?

I considered a "declare a specific address once" version, but wasn't sure if it would be preferable, since it means that if the "canonical" address is not available at the point you declare it at then every test using the address would fail - even if every other use of the address is consistent. This is suitable for lit tests where we always expect sucess/failure to be binary, but it seems more brittle in score-based testing, which Dexter ostensibly supports. I'm not necessarily opposed to it, since it would simplify the code and the test syntax in some cases, but I'm not sure whether it's worth making the scoring heuristic less reliable.

That is a good point and I'm almost convinced. Again, from your test cross-project-tests/debuginfo-tests/dexter-tests/address.cpp: Am I right in thinking that if a is always reported as optimized out then the DexExpectWatchValue for b will always succeed? I wonder if "address name is assigned but never checked" and "address name is never assigned" should be cases for heuristic penalties. OTOH that adds complexity to and already complex command / interaction. What do you think?

That is a good point and I'm almost convinced. Again, from your test cross-project-tests/debuginfo-tests/dexter-tests/address.cpp: Am I right in thinking that if a is always reported as optimized out then the DexExpectWatchValue for b will always succeed? I wonder if "address name is assigned but never checked" and "address name is never assigned" should be cases for heuristic penalties. OTOH that adds complexity to and already complex command / interaction. What do you think?

It seems reasonable that we could add more specific scoring for these - right now in terms of points, the ranking of the following scenarios is:
Both variables unavailable < One variable unavailable == Both variables available but different < Both variables available

If we add the suggested "address name assigned but never checked" penalty, "One variable unavailable" would give a worse score than "Both variables available but different". Alternatively if we added an additional penalty for an address that was "successful" but didn't match elsewhere, then we would have "Both variables available but different" would be worse. Which do you think would be best?

That is a good point and I'm almost convinced. Again, from your test cross-project-tests/debuginfo-tests/dexter-tests/address.cpp: Am I right in thinking that if a is always reported as optimized out then the DexExpectWatchValue for b will always succeed? I wonder if "address name is assigned but never checked" and "address name is never assigned" should be cases for heuristic penalties. OTOH that adds complexity to and already complex command / interaction. What do you think?

It seems reasonable that we could add more specific scoring for these - right now in terms of points, the ranking of the following scenarios is:
Both variables unavailable < One variable unavailable == Both variables available but different < Both variables available

If we add the suggested "address name assigned but never checked" penalty, "One variable unavailable" would give a worse score than "Both variables available but different". Alternatively if we added an additional penalty for an address that was "successful" but didn't match elsewhere, then we would have "Both variables available but different" would be worse. Which do you think would be best?

IMO "both variables available but different" should be treated as worse (higher penalty, lower desirability) than "one variable available". This fits the assumption that an incorrect value is more damaging to debugging experience than a missing one.

IMO "both variables available but different" should be treated as worse (higher penalty, lower desirability) than "one variable available". This fits the assumption that an incorrect value is more damaging to debugging experience than a missing one.

Then again, treating "one variable available" as "better" isn't good if the value that is available is incorrect, so it's a tricky situation.

Fixed up comments, and moved address_resolution map creation/assignment into parsing (from heuristic).

StephenTozer added a comment.EditedOct 13 2021, 10:58 AM

Then again, treating "one variable available" as "better" isn't good if the value that is available is incorrect, so it's a tricky situation.

But also, if we only see the value a single time, we have no way of knowing whether or not it's correct - since the value is essentially arbitrarily determined at runtime, it may be accurate to say that the relationship between the values is the only thing that matters (which may be an argument against using DexExpectWatchBase as a framework), so if we only have a single value it's hard to assign a "correctness" score to that value. A way to extend this reasoning might be to say, what if we check the same address in 3 different variables. In this case, there are 7 possibilities, using the format (num_variables_available, num_unique_address_values): (0, 0), (1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1).

Going on current default penalty values, the penalty for the (3, 3) case would be 14, as there would be 2 incorrect values worth 7 points each. The penalty for the (0, 0) case on the other hand is somewhere from 9-18, depending on whether the values are recorded as "optimized out", "missing", or some other failure. In this case, the penalty amount also depends on the order that the evaluation happens in: if we have the variables (a, b, c) and a == b, a != c, then the penalty will depend on whether we encounter c first or not - if c is the first, then we have 2 incorrect variables, otherwise we have 1. I originally wanted to prevent this by having the heuristic attempt to find the best scoring value for each address and using that, but I thought it might be too "clever" an approach for a test framework. However, it might be the case that this is necessary for getting somewhat consistent scorin, and if we can't get it to have consistent scoring, then maybe we should just bite the bullet and have a single declared value that we assume is correct - at least that way it's clear to the user.

This is basically a fairly long-winded way of saying that I'm not sure what the answer is, but the options seem to be:

  1. Leave it as is and accept that it's going to be inconsistent in its penalties sometimes
  2. Have a single point for the value declared upfront, and leave it to the user to make sure that the value isn't incorrect or unavailable
  3. Make the heuristic try to select address values to maximize the program's score
  4. Use a different scoring system entirely, most likely making this completely separate to DexExpectWatchValue

I think I'm actually leaning more towards 2 now, since it makes it easier for the user to pick a value that they're sure is correct, potentially by modifying the source to guarantee that the correct value is accessible somewhere, although doing so could also affect the validity of the test. Option 1 does simplify things for common cases, but may not be worth the potential for inconsistent scoring - WDUT?

Then again, treating "one variable available" as "better" isn't good if the value that is available is incorrect, so it's a tricky situation.

But also, if we only see the value a single time, we have no way of knowing whether or not it's correct - since the value is essentially arbitrarily determined at runtime, it may be accurate to say that the relationship between the values is the only thing that matters (which may be an argument against using DexExpectWatchBase as a framework), so if we only have a single value it's hard to assign a "correctness" score to that value. A way to extend this reasoning might be to say, what if we check the same address in 3 different variables. In this case, there are 7 possibilities, using the format (num_variables_available, num_unique_address_values): (0, 0), (1, 1), (2, 2), (2, 1), (3, 3), (3, 2), (3, 1).

Going on current default penalty values, the penalty for the (3, 3) case would be 14, as there would be 2 incorrect values worth 7 points each. The penalty for the (0, 0) case on the other hand is somewhere from 9-18, depending on whether the values are recorded as "optimized out", "missing", or some other failure. In this case, the penalty amount also depends on the order that the evaluation happens in: if we have the variables (a, b, c) and a == b, a != c, then the penalty will depend on whether we encounter c first or not - if c is the first, then we have 2 incorrect variables, otherwise we have 1. I originally wanted to prevent this by having the heuristic attempt to find the best scoring value for each address and using that, but I thought it might be too "clever" an approach for a test framework. However, it might be the case that this is necessary for getting somewhat consistent scorin, and if we can't get it to have consistent scoring, then maybe we should just bite the bullet and have a single declared value that we assume is correct - at least that way it's clear to the user.

This is basically a fairly long-winded way of saying that I'm not sure what the answer is, but the options seem to be:

  1. Leave it as is and accept that it's going to be inconsistent in its penalties sometimes
  2. Have a single point for the value declared upfront, and leave it to the user to make sure that the value isn't incorrect or unavailable
  3. Make the heuristic try to select address values to maximize the program's score
  4. Use a different scoring system entirely, most likely making this completely separate to DexExpectWatchValue

I think I'm actually leaning more towards 2 now, since it makes it easier for the user to pick a value that they're sure is correct, potentially by modifying the source to guarantee that the correct value is accessible somewhere, although doing so could also affect the validity of the test. Option 1 does simplify things for common cases, but may not be worth the potential for inconsistent scoring - WDUT?

I considered mentioned something along the lines of this (3) but thought it seemed too much - too "clever" as you said - especially given the fact that we know the heuristic score is quite wonky already. I agree that either 1 or 2 is good. I like 2 because it makes the tests less magic, easier to interpret, but I'm not sure which is "better". It might be something we just have to experiment with (I will happily LGTM either 1 or 2)?

cross-project-tests/debuginfo-tests/dexter/dex/command/commands/DexExpectWatchBase.py
67

You've gone too far the other way! 😄 nit: ans address -> an address. Plus there're a couple of comments that need full stops still.

FIrst pass implementation of the discussed change to DexDeclareAddress.

The new tactic, code, and doc changes all SGTM (one nit inline).

I'll hold off my LGTM just for now though as I think it'd be good to have regression tests for the new command. You can take inspiration from the duplicate label test and referencing an undefined label test, plus it would be good to have at least one test in the command tests directory using it. To that end, the test you have at the moment (cross-project-tests/debuginfo-tests/dexter-tests/address.cpp) makes more sense as a command regression test IMO (see inline comment).

cross-project-tests/debuginfo-tests/dexter-tests/address.cpp
3–4 ↗(On Diff #381078)

Imo this test should be a regression test in the dexter/feature_tests directory, and this RUN line should become:

// RUN: %dexter_regression_test -- %s

IIRC this substitution should allow the test to run on all systems, so you can also remove REQUIRES: system-windows.

wdyt?

cross-project-tests/debuginfo-tests/dexter/dex/command/ParseCommand.py
38

I wonder if there's a better way of sharing access to the address_resolutions dict? This doesn't feel quite right but (unhelpfully) no suggestion comes to mind.

Add a set of feature tests for DexDeclareAddress; also fix a minor error that would appear if the address was never resolved.

Before merging, I'm also going to add a bit more useful output to the test - in particular making the penalty cases more clear: if a declared address never resolves to a value this should be explicitly stated in the output (in verbose mode if not by default), and if a variable has a missing value that was an address (i.e. the variable is never seen holding the address value) then the user should see which address was missing, rather than just the resolved value of that address. This should come with a couple of new tests, and should also resolve Orlando's inline comment about the address resolutions map (as this will need to be referenced in a proper context object in order for this feature to work).

StephenTozer added inline comments.Nov 11 2021, 2:29 AM
cross-project-tests/debuginfo-tests/dexter/dex/command/commands/DexExpectWatchBase.py
36

Aforementioned minor fix is here: the addition of resolutions[self.name] is None is needed to catch an address which is validly declared but does not have a resolved value (which should be because the line it was declared for was never stepped on). This results in a 'missing value' result for the variable that references it.

Move the generation of the address resolution map to Heuristic (None initialized on command objects until the heuristic runs). Added address information to the verbose output (only prints additional info if DexDeclareAddress is actually present), and always print the name of the address in the "missing values" and "encountered expected values" output (implementation for misordered values is more complex, and has been ignored). Also, removed some unused functions in DexExpectWatchBase.

Adds a test for the penalty of a missing address, and also a test for all the aforementioned printing behaviour.

Add 1 more test for the hit_count argument.

Orlando accepted this revision.Nov 12 2021, 8:14 AM

I've looked at the two changes since my SGTM comment, which I'll now happily upgrade to a LGTM.

The regression tests look great! Most of them have REQUIRES: system-linux. I assume this is because they rely on the conditional controller, and dbgeng wrapper doesn't support it. Is that right? If so, I wonder if there's anything stopping them working on system-darwin. Perhaps these should instead be XFAIL: system-windows instead?

I enjoy the small refactor, and the improved verbose output looks good. Thanks!

cross-project-tests/debuginfo-tests/dexter/dex/command/commands/DexExpectWatchBase.py
34

nit: this comment needs updating after the latest change

36

SGTM

cross-project-tests/debuginfo-tests/dexter/feature_tests/subtools/test/address_printing.cpp
7–10

I don't think it should block the feature either way but I'm curious about why this is?

This revision is now accepted and ready to land.Nov 12 2021, 8:14 AM
StephenTozer added inline comments.Nov 12 2021, 11:01 AM
cross-project-tests/debuginfo-tests/dexter/feature_tests/subtools/test/address_printing.cpp
7–10

The short summary is that the misordered result array is constructed by a function that does not trivially convert to the address/resolution logic; at least, that was my assessment when I looked at it - I'll take another look before merging to see if it isn't actually reasonable.