This is an archive of the discontinued LLVM Phabricator instance.

Create basic SBEnvironment class
ClosedPublic

Authored by wallace on Mar 12 2020, 5:21 PM.

Details

Summary

Inspired by https://reviews.llvm.org/D74636, I'm introducing a basic version of Environment in the API. More functionalities can be added as needed.

I tried it in the command line:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> env =lldb.debugger.GetSelectedPlatform().GetEnvironment()
>>> env.Size()
35
>>> env.GetEntryAtIndex(2)
'HOME=/Users/wallace'
>>> env.GetEntryAtIndex(0)
'COMMAND_MODE=unix2003'

Diff Detail

Event Timeline

wallace created this revision.Mar 12 2020, 5:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2020, 5:21 PM
wallace edited the summary of this revision. (Show Details)Mar 12 2020, 5:26 PM
jingham requested changes to this revision.Mar 12 2020, 7:29 PM
jingham added a subscriber: jingham.

Thanks for doing this, it will be useful!

I'll let Jonas comment on whether the Reproducer bindings are right, though it looks fine to me.

One of the very common uses of an SBEnvironment is the one in an SBTarget, which gets used to create new processes. With what you have you can get access to it, so that's all good. But you do it by getting the target's platform, and then the Environment from the platform. It might be more discoverable to have the SBTarget hand it out directly as well, however. And there is an lldb_private::Target::GetEnvironment, so it's pretty natural.

Also, you need some tests. You can probably do a lot of it with unit tests. Then maybe a small API test that sets environment variables and fetches them out this way and checks that they are right.

The added API's are up to you. The requested changes are for tests...

This revision now requires changes to proceed.Mar 12 2020, 7:29 PM

I'm not sure that this is the right API to represent an environment. The environment is more like a dictionary/map than an array. (The internal Environment object *is* a map, though this does not immediately mean the SB one should be too).

Even for your most immediate use case, you'll want to use the map-like properties of the environment (to "merge" the inherited environment and the user-provided values). With an api like this you'd have to implement all of that by yourself.

So, I am wondering if we shouldn't provide a more map-like interface here too. Rough proposal:

const char *GetEntry(const char *name); // nullptr if not present
void PutEntry(const char *string); // modeled on putenv(3)
void SetEntry(const char *name, const char *value, bool overwrite); //modeled on setenv(3), maybe we can skip it if it's not needed now
SBStringList GetEntries(); // if someone wants to enumerate all of them, maybe also skippable

If we don't want a map-like interface, then maybe we don't actually need a separate class, and an SBStringList would work just fine?

Maybe you could also refactor the other patch to use this new functionality (whatever it ends up being), so that we can see whether it makes sense at least for that use case..

lldb/include/lldb/API/SBEnvironment.h
26–28

s/int/size_t

lldb/source/API/SBEnvironment.cpp
1–2

fix formatting

30–31

new(Environment(rhs))

53

This will return a dangling pointer as Envp is a temporary object. It's intended to be used to pass the environment object to execve-like function.
The current "state of the art" is to ConstStringify all temporary strings that go out of the SB api. (Though I think we should start using std::string).

Also, constructing the Envp object just to fetch a single string out of it is pretty expensive. You can fetch the desired result from the Environment object directly.

Putting it all together, this kind of an API would be implemented via something like:

if (idx >= m_opaque_up->size())
  return nullptr;
return ConstString(Environment::compose(*std::next(m_opaque_up.begin(), idx)).AsCString();

... but I am not sure this is really the right api. More on that in a separate comment.

Yes, a key/value approach would be a better API. Looks like the Environment class would make that pretty easy to wrap up, as well.

Thanks a lot for the feedback! I was thinking about adding features to this class as needed, but definitely I should make this API be more like a map

clayborg added inline comments.Mar 13 2020, 4:23 PM
lldb/bindings/interface/SBPlatform.i
197–198

What does it mean to get the environment from a platform? Fetching it from the remote platform as what ever binary was used to provide the connection?

lldb/include/lldb/API/SBEnvironment.h
26

return size_t or uint32_t.

Possibly rename to GetNumEntries() to be consistent with all of the other APIs that have a size and item accessor.

28

use size_t or uint32_t (which ever one you pick from Size() above

lldb/include/lldb/API/SBPlatform.h
157

I know there isn't much header documentation here already, but it is a good time to start with any new functions. We would need to detail what this environment would be. Something like: "Return the environment variables contained in the remote platform connection process."

lldb/source/API/SBEnvironment.cpp
46
size_t SBEnvironment::GetNumEnrtries()
53

Yes, any C strings returned from the SB API should use ConstString(cstr).AsCString() as the ConstString puts the string into a string pool and there is no need to clients to destruct the string or take ownership.

No STL in the SB API please, so no std::string. std::string isn't stable on all platforms and competing C++ runtimes on different platforms can cause problems. If we need string ownership, we can introduce SBString as a class.

Please do bounds check this like Pavel requested as the index isn't guaranteed to be valid. If you bounds check first then your code above could be modified to return a ConstString().AsCString().

wallace updated this revision to Diff 250649.Mar 16 2020, 4:30 PM

address comments≈

wallace updated this revision to Diff 250651.Mar 16 2020, 4:34 PM
wallace marked 9 inline comments as done.Mar 17 2020, 4:48 PM

This looks pretty good overall. I have a bunch of comments, but nothing major.

Could you also please rebase the other patch (D74636) on top of this. I think that a very common use case for this will be taking the platform environment, tweaking it, and then using it to launch a process (i.e. what you are about to do), so it would be good to make sure that flow is sufficiently straight-forward.

lldb/include/lldb/API/SBEnvironment.h
39

This doesn't sounds right.

42–44

Maybe explicitly mention that this overwrites any previous value with that name?

47

we use the name_and_value style elsewhere

lldb/include/lldb/API/SBTarget.h
97 ↗(On Diff #250651)

s/or/of

100 ↗(On Diff #250651)

s/taget/target/

Also the concept of a "target's environment variables" is somewhat fuzzy. I take it this is the environment that would be used by default to launch a new process (aka the value of target.env-vars setting)?

And maybe also mention that this is a *copy* of that environment (so any changes made to it don't propagate back anywhere)

lldb/source/API/SBEnvironment.cpp
53

Can you check what will this return for non-existing variables and for variables which are set to an empty string FOO=. I have a feeling this will return the same thing, but I would expect to get a nullptr in the first case, and "" in the second.

58

This is using auto just to be fancy. That is not consistent with llvm style.

lldb/test/API/python_api/sbenvironment/TestSBEnvironment.py
31–32 ↗(On Diff #250651)

I don't think you actually need to build an inferior for this. target = dbg.CreateTarget("") should be sufficient, I believe.

45 ↗(On Diff #250651)

As per the other comment, please also test a case like env.PutEntry("BAZ=")

54 ↗(On Diff #250651)

self.assertIn(needle, haystack)

wallace updated this revision to Diff 251143.Mar 18 2020, 11:48 AM

address comments

Thanks for the review!

clayborg requested changes to this revision.Mar 18 2020, 12:00 PM

We need to determine if the objects we return are copies (from SBPlatform and SBTarget), or if they are objects that will modify the environment for the platform and target respectively. If we are returning copies, we need setters on both the platform and target. If they are references, then they will just update the current environment. I think I would rather have the ability to modify them using the returned object, but I am not set on this if everyone else thinks otherwise.

I added many comments in the SBEnvironment header.

lldb/include/lldb/API/SBEnvironment.h
27

would GetNumValues() be better? Also, if we have this, we need some accessors that work by index:

const char *GetNameAtIndex(size_t);
const char *GetValueAtIndex(size_t);

Again, I would rather people not have to split up the name and value themselves if we can avoid it.

33

Would get Get(...) be better? Mirror the getenv() call?

40

If we can get a list of values and edit the SBStringList, do we want a setter?:

void SetEntries(SBStringList);
47

Change to:

bool Set(const char *name, const char *value, bool overwrite);

The bool indicates if it was set. I would rather people not have to make string values that contain "name=value" just to set an env var. This mirrors the setenv() API in libc

48

We need a Unset accessor:

bool Unset(const char *name);

The returned bool can indicate if the value was removed.

lldb/include/lldb/API/SBTarget.h
97 ↗(On Diff #250651)

Maybe better as:

/// Return an object that contains the environment that will be used when 
/// launching processes.

Then we need to explain if this is a copy, or a value that can be modified. It is can be modified, we should add

/// The SBEnvironment object can be used to add or remove environment 
/// variables prior to launching a process.

If it is a copy, mention using the SetEnvironment accessor I mention below

100 ↗(On Diff #250651)

If this is a copy of the environment, then there should be a set accessor:

void SBTarget::SetEnvironment(SBEnvironment);

I would be ok with either way (return a copy and have setting, or return the actual object so it can be manipulated). What is the best approach?

lldb/source/API/SBEnvironment.cpp
53

Yes, we need a test for env vars with that are set with no value and it should return "" (non NULL).

This revision now requires changes to proceed.Mar 18 2020, 12:00 PM
wallace updated this revision to Diff 251200.Mar 18 2020, 3:44 PM

Addressed Greg's comments.

At the end I ended up not using APIs that use the format NAME=value, as they could be error prone. I think it's safer to use functions that work with explicit names and values.

Besides, I don't want to complicate this diff, so for now I'd prefer not to add setters for Platform and Target using the Environment class. We can add that later when we need it.

wallace updated this revision to Diff 251201.Mar 18 2020, 3:45 PM
This comment was removed by wallace.
clayborg requested changes to this revision.Mar 19 2020, 12:01 AM

Just some header doc cleanup before we submit. Thanks for sticking with these changes!

lldb/include/lldb/API/SBEnvironment.h
25

const means nothing in these classes since the ivar will never get modified, so you can remote the leading "const" for the return value.

28

need \param entries in header doc

30

Rephrase the return value docs a bit. Maybe:

/// The value of the environment variable or null if not present. If the environment variable has no value but is present a valid pointer to an empty string will be returned.

Is that what we are doing? Returning "" when variable is there but has no value?

33

need brief description and \param entries in header doc

37

need brief description and \param entries in header doc

38

Need to clarify what happens when the env far is there but has no value. Should match the Get(const char*) function return value with what ever we say is the solution for that.

41

Move above the *AtIndex(size_t) calls so users see this first in API.

45

need \param entries in header doc

51

need \param entries in header doc

This revision now requires changes to proceed.Mar 19 2020, 12:01 AM

The new interface is ok for me, but there are two things I want to mention:

  • the iteration via Get{Name,Value}AtIndex is going to be slow because the underlying map (like most maps) does not have random access iterators. This is the problem I was trying to solve with the GetEntries-like API, but I agree that has its issues too.
  • I agree that forcing a user to manually construct name=value strings if he has then as separate entities is not good, but I also don't think we're doing anyone a favor by not providing an api which would accept such strings. The name=value format is very universal, and I think users will often have the value in this format already (for example, D74636 does). This means that those users would have to implement =-splitting themselves before using this class. This is why the internal Environment class provides both kinds of api, and maybe also the reason why the c library provides both putenv(3) and setenv(3).
lldb/source/API/SBEnvironment.cpp
54

Please reuse the result of m_opaque_up->find(name) here to avoid double lookup.

60

index < 0 is not possible now.

63–74

I don't think these need to const-stringified now, given that they are backed by the underlying map. We already have functions which return pointers to internal objects (e.g. SBStream::GetData).

@clayborg?

81–82

how about if(overwrite) insert_or_assign(name, value) else try_emplace(name, value)? (avoiding two map lookups)

Well, I think I'll implement both kind of accessors in the API to account for all possible cases

wallace marked 16 inline comments as done.Mar 19 2020, 12:02 PM
wallace marked 7 inline comments as done.
wallace marked 4 inline comments as done.Mar 19 2020, 12:04 PM

The new interface is ok for me, but there are two things I want to mention:

  • the iteration via Get{Name,Value}AtIndex is going to be slow because the underlying map (like most maps) does not have random access iterators. This is the problem I was trying to solve with the GetEntries-like API, but I agree that has its issues too.

I am guessing people won't use these much. I didn't know he underlying class used a map. No great solution here unless we modify the underlying class to have a vector with a map of name to index?

  • I agree that forcing a user to manually construct name=value strings if he has then as separate entities is not good, but I also don't think we're doing anyone a favor by not providing an api which would accept such strings. The name=value format is very universal, and I think users will often have the value in this format already (for example, D74636 does). This means that those users would have to implement =-splitting themselves before using this class. This is why the internal Environment class provides both kinds of api, and maybe also the reason why the c library provides both putenv(3) and setenv(3).

Fine to add functions for this. The user might grab the environment dump in text and want to set all of those env vars, so having the API is fine with me.

lldb/source/API/SBEnvironment.cpp
60

yeah, size_t is unsigned on all platforms.

63–74

That's a tough one. I would like to think that any "const char *" that comes back from an API that returns strings could just be pointer compared if needed. So I like the idea that for strings that comes out of the API that they are all const-ed and could easily be compared. I am fine with SBStream::GetData() returning its own string because _it_ owns the data. So I would vote to ConstString() any returned results

82

Or do we change the underlying API in the m_opaque_up class to accept an overwrite arg?

lldb/source/API/SBPlatform.cpp
658–660

remove {} (clang code guidelines for single statement if)

lldb/source/API/SBTarget.cpp
2396–2398 ↗(On Diff #251201)

remove {} (clang code guidelines for single statement if)

wallace marked 2 inline comments as done.Mar 19 2020, 1:17 PM
wallace added inline comments.
lldb/source/API/SBEnvironment.cpp
63–74

I'm adding a Clear method. With that, the backing char* might be wiped out and the python reference to this string might be invalid unless we use ConstString

81–82

teach me more

wallace updated this revision to Diff 251470.Mar 19 2020, 1:55 PM
  • Added both kinds of APIs we were discussing. It will come handy for all different kind of usages
  • Added an SBEnvironment API for SBLaunchInfo, which will be used in https://reviews.llvm.org/D74636
  • Address all sorts of comments
wallace updated this revision to Diff 251472.Mar 19 2020, 2:06 PM

fix grammar

labath accepted this revision.Mar 20 2020, 3:05 AM

This looks good to me. I still have some doubts about the ConstStringification of the returned values (I am not a fan of constifying everything), but I don't want to block this review for that.

lldb/source/API/SBEnvironment.cpp
27

move this into the initializer list?

30

super-nit: if you make this constructor accept a Environment rhs (without the const & part) and then do the initialization as new Environment(std::move(rhs)), you will avoid copying the environment in the expressions like SBEnvironment(Environment(whatever)). Currently that will create one Environment object to initialize the rhs argument, and then a second one when copying things to m_opaque_up.

63–74

I am fine with SBStream::GetData() returning its own string because _it_ owns the data.

I am afraid I don't see the distinction here. SBStream owns a Stream (via a unique_ptr) and the stream data is inside that. An SBEnvironment has a unique_ptr<Environment>, and the strings are inside of that.

I'm adding a Clear method. With that, the backing char* might be wiped out and the python reference to this string might be invalid unless we use ConstString

Python will be fine if as it will copy the string as soon as it gets its hand on it. C++ users could be affected by that, but, this is not something that should come across as surprising to them (and indeed, the same thing will happen with SBStream::GetData).

wallace updated this revision to Diff 251740.Mar 20 2020, 1:07 PM

apply last suggestions

Thanks a lot, guys. I learned a lot from you doing this patch :)

This revision was not accepted when it landed; it landed in state Needs Review.Mar 20 2020, 2:41 PM
This revision was automatically updated to reflect the committed changes.

Since it's late on Friday afternoon I'm taking the liberty to revert the patch to get the bots running again. Feel free to re-land with a fix or ask for another round of review if you have more questions.

MaskRay added a subscriber: MaskRay.EditedMar 23 2020, 10:43 PM

commit fd868f517d2c5ca8c0f160dbec0857b14ecf74c1 should be associated with Differential Revision: https://reviews.llvm.org/D76111.

The revert of fd868f517d2c5ca8c0f160dbec0857b14ecf74c1 should explain why fd868f517d2c5ca8c0f160dbec0857b14ecf74c1 was reverted.

Please don't make a commit whose description contains just one word fix.

If you are unsure, upload a new diff to trigger Harbormaster, check its test status on 3 Linux/Windows/macOS, instead of committing and reverting like playing a game. Every commit has a permanent record which will be shared among thousands of developers and can increase the size of the repository. Please be careful.

Subscribers: and Tags: lines are generally considered useless. You can strip them with:

arcfilter () {
        git log -1 --pretty=%B | awk '/Reviewers:|Subscribers:/{p=1} /Reviewed By:|Differential Revision:/{p=0} !p && !/^Summary:$/ {sub(/^Summary: /,"");print}' | git commit --amend -F -
}

Thanks for the notice. I apologize for not doing everything correctly. I'm still trying to understand how all the different piece of the system works and I end up learning from the things that fail...