This is an archive of the discontinued LLVM Phabricator instance.

increase timeouts if running on valgrind
Abandoned · Public

Authored by llunak on Apr 4 2022, 4:02 AM.

Details

Reviewers
clayborg
Summary

Valgrind makes everything run much slower, so don't time out too soon.

BTW, does it make sense to get even things like this reviewed, or is it ok if I push these directly if I'm reasonably certain I know what I'm doing? I feel like I'm spamming you by now.


Event Timeline

llunak created this revision. · Apr 4 2022, 4:02 AM
Herald added a project: Restricted Project. · Apr 4 2022, 4:02 AM
llunak requested review of this revision. · Apr 4 2022, 4:02 AM
labath added a subscriber: labath. · Apr 4 2022, 7:10 AM

BTW, does it make sense to get even things like this reviewed, or is it ok if I push these directly if I'm reasonably certain I know what I'm doing? I feel like I'm spamming you by now.

Generally, I would say yes. I'm not even sure that some of your other patches should have been submitted without a pre-commit review (*all* llvm patches are expected to be reviewed).

In principle this idea seems fine, but the implementation is somewhat odd. The caller of ScopedTimeout is passing a specific timeout value, but then the class sets the timeout to something completely different. And it means you're not increasing the timeout for regular (fast) packets, so they still get to run with the default timeout of 1 second.

I would say that the entire ScopedTimeout class should be multiplier-based. Instead of passing a fixed value, the user would say "I know this packet is slow, so I want to use X times the usual timeout value". Then, we could support Valgrind (and various *SANs) by just setting the initial timeout value to a reasonably high value, and the multipliers would take care of the rest.
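To make the multiplier idea concrete, a minimal sketch could look like the following. This is not the actual LLDB ScopedTimeout code; the Communication class, the member names, and the 1-second base value are assumptions for illustration only.

#include <chrono>

// Hypothetical communication object holding the base per-packet timeout.
class Communication {
public:
  std::chrono::seconds GetPacketTimeout() const { return m_timeout; }
  void SetPacketTimeout(std::chrono::seconds t) { m_timeout = t; }

private:
  // Base value; raising this one number covers Valgrind/*SAN runs globally.
  std::chrono::seconds m_timeout{1};
};

// RAII helper: "this packet is slow, give it N times the usual budget".
class ScopedTimeoutMultiplier {
public:
  ScopedTimeoutMultiplier(Communication &comm, unsigned multiplier)
      : m_comm(comm), m_saved(comm.GetPacketTimeout()) {
    m_comm.SetPacketTimeout(m_saved * multiplier);
  }
  ~ScopedTimeoutMultiplier() { m_comm.SetPacketTimeout(m_saved); }

private:
  Communication &m_comm;
  std::chrono::seconds m_saved;
};

With this shape, slow environments only need the base timeout raised in one place, and every caller's multiplier scales from it.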

Is this really something we want to hardcode at all? It seems like this should be a setting that is configured by the test harness.

labath added a comment. · Apr 4 2022, 9:10 AM

If it's just a matter of setting the default timeout value, then I don't think it would be that bad -- we already set a different timeout for debug and release builds, so we could conceivably put this there. But yes, my "setting the initial timeout value to a reasonably high value" comment was going in the direction that this could then be set from inside the test harness.

BTW, does it make sense to get even things like this reviewed, or is it ok if I push these directly if I'm reasonably certain I know what I'm doing? I feel like I'm spamming you by now.

Generally, I would say yes. I'm not even sure that some of your other patches should have been submitted without a pre-commit review (*all* llvm patches are expected to be reviewed).

I see. I haven't contributed anything for a while, and the git history shows a number of commits that do not refer to reviews.llvm.org, so I assumed simple patches were fine. I'll get even the simple ones reviewed if that's expected.

I would say that the entire ScopedTimeout class should be multiplier-based. Instead of passing a fixed value, the user would say "I know this packet is slow, so I want to use X times the usual timeout value". Then, we could support Valgrind (and various *SANs) by just setting the initial timeout value to a reasonably high value, and the multipliers would take care of the rest.

For the Valgrind case it doesn't really matter what the timeout is, as long as it's long enough to not time out on normal calls. Even multiplying is just taking a guess, since Valgrind doesn't slow things down linearly.

Is this really something we want to hardcode at all? It seems like this should be a setting that is configured by the test harness.

This is not about tests. I had a double-free abort, so I ran lldb normally in valgrind and it aborted attach because of the 6-second timeout in GDBRemoteCommunicationClient::QueryNoAckModeSupported(). That's too short for anything in Valgrind, and I thought this change would be useful in general, as otherwise lldb is presumably unusable for Valgrind runs. But if you think this requires something as complicated as rewriting an entire class, then I'm abandoning the patch.

BTW, does it make sense to get even things like this reviewed, or is it ok if I push these directly if I'm reasonably certain I know what I'm doing? I feel like I'm spamming you by now.

Generally, I would say yes. I'm not even sure that some of your other patches should have been submitted without a pre-commit review (*all* llvm patches are expected to be reviewed).

I see. I haven't contributed anything for a while, and the git history shows a number of commits that do not refer to reviews.llvm.org, so I assumed simple patches were fine. I'll get even the simple ones reviewed if that's expected.

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I would say that the entire ScopedTimeout class should be multiplier-based. Instead of passing a fixed value, the user would say "I know this packet is slow, so I want to use X times the usual timeout value". Then, we could support Valgrind (and various *SANs) by just setting the initial timeout value to a reasonably high value, and the multipliers would take care of the rest.

For the Valgrind case it doesn't really matter what the timeout is, as long as it's long enough to not time out on normal calls. Even multiplying is just taking a guess, since Valgrind doesn't slow things down linearly.

Is this really something we want to hardcode at all? It seems like this should be a setting that is configured by the test harness.

This is not about tests. I had a double-free abort, so I ran lldb normally in valgrind and it aborted attach because of the 6-second timeout in GDBRemoteCommunicationClient::QueryNoAckModeSupported(). That's too short for anything in Valgrind, and I thought this change would be useful in general, as otherwise lldb is presumably unusable for Valgrind runs.

My suggestion was that this timeout should be configurable through a setting. If it is, then you can increase it when running under Valgrind (or anywhere else that slows down lldb, like the sanitizers). I wouldn't mind having a bit of code that checks llvm::sys::RunningOnValgrind() and increases the default timeouts so that you don't even have to change your setting. But what I don't think is a good idea is to sprinkle checks for whether we're running under Valgrind throughout the code.
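As a rough sketch of that "raise the default once, centrally" idea: only llvm::sys::RunningOnValgrind() (declared in llvm/Support/Valgrind.h) is a real LLVM API below; the helper name, the 1-second base, and the factor of 30 are placeholders, not anything from this patch.

#include <chrono>

#include "llvm/Support/Valgrind.h"

// Compute the default packet timeout once, at the point where defaults are set.
static std::chrono::seconds ComputeDefaultPacketTimeout() {
  std::chrono::seconds timeout{1}; // the usual fast-packet default
  if (llvm::sys::RunningOnValgrind())
    timeout *= 30; // arbitrary factor; Valgrind's slowdown is not linear, so err high
  return timeout;
}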

BTW, does it make sense to get even things like this reviewed, or is it ok if I push these directly if I'm reasonably certain I know what I'm doing? I feel like I'm spamming you by now.

Generally, I would say yes. I'm not even sure that some of your other patches should have been submitted without a pre-commit review (*all* llvm patches are expected to be reviewed).

I see. I haven't contributed anything for a while, and the git history shows a number of commits that do not refer to reviews.llvm.org, so I assumed simple patches were fine. I'll get even the simple ones reviewed if that's expected.

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I would say that the entire ScopedTimeout class should be multiplier-based. Instead of passing a fixed value, the user would say "I know this packet is slow, so I want to use X times the usual timeout value". Then, we could support Valgrind (and various *SANs) by just setting the initial timeout value to a reasonably high value, and the multipliers would take care of the rest.

For the Valgrind case it doesn't really matter what the timeout is, as long as it's long enough to not time out on normal calls. Even multiplying is just taking a guess, since Valgrind doesn't slow things down linearly.

Is this really something we want to hardcode at all? It seems like this should be a setting that is configured by the test harness.

This is not about tests. I had a double-free abort, so I ran lldb normally in valgrind and it aborted attach because of the 6-second timeout in GDBRemoteCommunicationClient::QueryNoAckModeSupported(). That's too short for anything in Valgrind, and I thought this change would be useful in general, as otherwise lldb is presumably unusable for Valgrind runs.

My suggestion was that this timeout should be configurable through a setting. If it is, then you can increase it when running under Valgrind (or anywhere else that slows down lldb, like the sanitizers). I wouldn't mind having a bit of code that checks llvm::sys::RunningOnValgrind() and increases the default timeouts so that you don't even have to change your setting. But what I don't think is a good idea is to sprinkle checks for whether we're running under Valgrind throughout the code.

The GDB remote packet timeout is customizable:

(lldb) settings set plugin.process.gdb-remote.packet-timeout 10000

So maybe we don't need this patch?

llunak abandoned this revision. · Apr 7 2022, 1:09 PM

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I'm aware of it, but as far as I can judge I was following it. Even reading it now again I see nothing that I would understand as mandating review for everything.

The GDB remote packet timeout is customizable:
...
So maybe we don't need this patch?

The timeout in GDBRemoteCommunicationClient::QueryNoAckModeSupported() is hardcoded to 6 seconds, so I doubt the setting matters there. And even if it did, I'd find it somewhat silly having to look for a please-unbreak-valgrind setting, especially given that there is already LLVM support for detecting Valgrind.

But this all is too much for something so simple.

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I'm aware of it, but as far as I can judge I was following it. Even reading it now again I see nothing that I would understand as mandating review for everything.

The GDB remote packet timeout is customizable:
...
So maybe we don't need this patch?

The timeout in GDBRemoteCommunicationClient::QueryNoAckModeSupported() is hardcoded to 6 seconds, so I doubt the setting matters there.

This is incorrect. It's hardcoded to be std::max(GetPacketTimeout(), seconds(6)) where GetPacketTimeout() is the value from the setting.
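Spelled out (a small illustration of that expression; the helper function is hypothetical, only the std::max formula comes from the code being discussed):

#include <algorithm>
#include <chrono>

// The 6-second value acts as a floor, not a cap: once the packet-timeout
// setting exceeds 6 seconds, QueryNoAckModeSupported() uses the setting.
static std::chrono::seconds EffectiveNoAckTimeout(std::chrono::seconds setting) {
  return std::max(setting, std::chrono::seconds(6));
}

// EffectiveNoAckTimeout(std::chrono::seconds(1))   -> 6 seconds (floor wins)
// EffectiveNoAckTimeout(std::chrono::seconds(120)) -> 120 seconds (setting wins, e.g. under Valgrind)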

And even if it did, I'd find it somewhat silly having to look for a please-unbreak-valgrind setting, especially given that there is already LLVM support for detecting Valgrind.

The suggestion was to increase the timeout globally (through the setting) when running under Valgrind.

But this all is too much for something so simple.

So the setting should take care of everything, and we should be able to increase it when running with Valgrind, right? I would rather not have code all over LLDB checking for Valgrind and doing something in response, if possible.

So the setting should take care of everything, and we should be able to increase it when running with Valgrind, right? I would rather not have code all over LLDB checking for Valgrind and doing something in response, if possible.

Yup, that's exactly what I suggested, because I have the same concern.

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I'm aware of it, but as far as I can judge I was following it. Even reading it now again I see nothing that I would understand as mandating review for everything.

It does say "patches that meet likely-community-consensus requirements can be committed prior to an explicit review" and "where there is any uncertainty, a patch should be reviewed prior to being committed".
It can be hard to judge what is a likely-community-consensus without being an active member of the community, which is why it's safer to go down the pre-commit review path.

Also note that when I said that "all patches are expected to be reviewed", that included both pre-commit and post-commit review. I deliberately used passive voice because in the latter case, there's nothing for you (as the patch author) to do. It's generally up to the owners of individual components to ensure that all patches going in get reviewed by someone. Since there's no paper trail, this is very hard to verify, but I can tell you that people do that, and that it's not a good way to introduce yourself to someone.

FWIW the official policy is outlined here: https://llvm.org/docs/CodeReview.html

I'm aware of it, but as far as I can judge I was following it. Even reading it now again I see nothing that I would understand as mandating review for everything.

It does say "patches that meet likely-community-consensus requirements can be committed prior to an explicit review" and "where there is any uncertainty, a patch should be reviewed prior to being committed".
It can be hard to judge what is a likely-community-consensus without being an active member of the community, which is why it's safer to go down the pre-commit review path.

Also note that when I said that "all patches are expected to be reviewed", that included both pre-commit and post-commit review. I deliberately used passive voice because in the latter case, there's nothing for you (as the patch author) to do. It's generally up to the owners of individual components to ensure that all patches going in get reviewed by someone. Since there's no paper trail, this is very hard to verify, but I can tell you that people do that, and that it's not a good way to introduce yourself to someone.

Based on that 'introduce' comment I expect the part that you're not aware of is that all 4 of those simple commits I pushed directly changed code that had been written by me. So I still think I was following the guidelines, and I got an explicit review for all changes where I had any uncertainty, but as I said if it's expected that I'll get explicit review even for simple changes in code I'm familiar with, I can do that.

Based on that 'introduce' comment I expect the part that you're not aware of is that all 4 of those simple commits I pushed directly changed code that had been written by me. So I still think I was following the guidelines, and I got an explicit review for all changes where I had any uncertainty, but as I said if it's expected that I'll get explicit review even for simple changes in code I'm familiar with, I can do that.

You are correct -- I was not aware of that. It is a somewhat unusual situation, with you not being otherwise very active, but given the overall lack of activity on the gui front, I don't think anyone (who is aware of that fact) would question your ownership of those parts. So I apologise for doing so.

OTOH, if you do have a sense of ownership of (parts of) gui, then I can complain to you about the lack of testing of that feature :P, and there is another policy about patches coming with tests. (I know this looks like a bait-and-switch, but that is what has mostly upset me about those patches -- functionally I don't see anything wrong with them).