This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Add HOME to test env under lit
AbandonedPublic

Authored by mcrosier on Jan 14 2016, 9:29 AM.

Details

Reviewers
MatzeB
Summary

All,
llvm-lit blanks the environment during testing, adding back only those
variables explicitly provided during configuration (through lit.cfg or
lit.site.cfg or others).

Normally this causes no issues, but under emulation, sqlite3 will fail
verification because it attempts to find the home directory of the user
running the test. It does this by using getpwuid, which does not work
under emulation, and prints an error that breaks the diff step.

Fortunately, it uses the HOME env variable as a backup, so this patch
adds a step to lit.cfg that adds the HOME env variable to the test
environment if it exists in the host environment.

Chad

Diff Detail

Event Timeline

mcrosier updated this revision to Diff 44896.Jan 14 2016, 9:29 AM
mcrosier retitled this revision from to [test-suite] Add HOME to test env under lit.
mcrosier updated this object.
mcrosier added a reviewer: MatzeB.
mcrosier added a subscriber: llvm-commits.
MatzeB accepted this revision.Jan 14 2016, 11:21 AM
MatzeB edited edge metadata.

LGTM. Could be refactored to something like the following though:

environment_whitelist = ['HOME', 'SSH_AUTH_SOCK']
for var in environment_whitelist:
  if var in os.environ:
    config.environment[var] = os.environ[var]
This revision is now accepted and ready to land.Jan 14 2016, 11:21 AM

We have found that some benchmarks will differ their performance based on the sizes of the environment variables. KS is a good example of this. Is there a better way to do this than populating the environment?

We have found that some benchmarks will differ their performance based on the sizes of the environment variables. KS is a good example of this. Is there a better way to do this than populating the environment?

Very good point!

Maybe we should be more conservative and force people to configure the environment variables in the cmake configuration and use an empty env by default?

(Though I must admit if we start thinking about these issues: I think the directory of the testsuite matters as well because that ends up being in argv[0] and cmake/lit currently uses absolute paths wherever possible)

As for the sqlite3 issue: Maybe we should just patch it not check for the home directory and continue anyway. It's always unfortunate if we have to modify the existing benchmarks and it has the risk that the next person putting a newer version of sqlite3 in there will miss the patches. The latter point can be mitigated by documenting the changes in a README-LLVM in the benchmark directory I guess.

As for the sqlite3 issue: Maybe we should just patch it not check for the home directory and continue anyway. It's always unfortunate if we have to modify the existing benchmarks and it has the risk that the next person putting a newer version of sqlite3 in there will miss the patches. The latter point can be mitigated by documenting the changes in a README-LLVM in the benchmark directory I guess.

I'm fine with this solution as well. Just let me know which direction you'd prefer to take.

Chad

We have found that some benchmarks will differ their performance based on the sizes of the environment variables. KS is a good example of this. Is there a better way to do this than populating the environment?

Very good point!

Maybe we should be more conservative and force people to configure the environment variables in the cmake configuration and use an empty env by default?

(Though I must admit if we start thinking about these issues: I think the directory of the testsuite matters as well because that ends up being in argv[0] and cmake/lit currently uses absolute paths wherever possible)

I think that in the general case, benchmarks that are sensitive to data layout like this should be measured multiple times, with variations in the environment, to avoid measurement bias effects.
Therefore, while it may not be needed to clean the environment completely, there should be a mechanism to enable varying the size of the environment on different runs. http://www.inf.usi.ch/faculty/hauswirth/publications/CU-CS-1042-08.pdf, section 4.2.1 discusses this well, I thought.

As for the sqlite3 issue: Maybe we should just patch it not check for the home directory and continue anyway. It's always unfortunate if we have to modify the existing benchmarks and it has the risk that the next person putting a newer version of sqlite3 in there will miss the patches. The latter point can be mitigated by documenting the changes in a README-LLVM in the benchmark directory I guess.

I'm fine with this solution as well. Just let me know which direction you'd prefer to take.

Chad

I shortly looked at the sqlite3 source and if I read it correctly, then it should be enough to checkin an empty sqliterc file and change RUN_OPTIONS TO:

set(RUN_OPTIONS -init ${CMAKE_CURREN_SOURCE_DIR}/sqliterc :memory:)

and it should not try to read $HOME/.sqliterc anymore. You can do that right now without further review I think.

About the environment variables, I currently prefer the idea of the user specifying a whitelist at configuration time, but we can tackle that in a separate commit.

We have found that some benchmarks will differ their performance based on the sizes of the environment variables. KS is a good example of this. Is there a better way to do this than populating the environment?

Very good point!

Maybe we should be more conservative and force people to configure the environment variables in the cmake configuration and use an empty env by default?

(Though I must admit if we start thinking about these issues: I think the directory of the testsuite matters as well because that ends up being in argv[0] and cmake/lit currently uses absolute paths wherever possible)

I think that in the general case, benchmarks that are sensitive to data layout like this should be measured multiple times, with variations in the environment, to avoid measurement bias effects.
Therefore, while it may not be needed to clean the environment completely, there should be a mechanism to enable varying the size of the environment on different runs. http://www.inf.usi.ch/faculty/hauswirth/publications/CU-CS-1042-08.pdf, section 4.2.1 discusses this well, I thought.

Benchmarking can be hard. And I am not even sure how to identify the "benchmarks that are sensitive to data layout", I can very well imagine this also depends on the target architecture (cache strategies, ...). Also there are even better techniques around ( http://plasma.cs.umass.edu/emery/stabilizer) for minimizing benchmarking noise. But implementing these things would be a bigger project I think and should be discussed on llvm-dev I guess.

I shortly looked at the sqlite3 source and if I read it correctly, then it should be enough to checkin an empty sqliterc file and change RUN_OPTIONS TO:

set(RUN_OPTIONS -init ${CMAKE_CURREN_SOURCE_DIR}/sqliterc :memory:)

and it should not try to read $HOME/.sqliterc anymore. You can do that right now without further review I think.

Done so in r257928.

mcrosier abandoned this revision.Jan 15 2016, 12:28 PM