Document the stability policy for LLVM-C APIs.
Abandoned (Public)

Authored by jyknight on Sep 7 2015, 10:39 PM.

Details

Summary

("Reasonably stable, not 100% guaranteed")

This is my proposal, following from the thread "[RFC] Developer Policy
for LLVM C API".

Please comment, bikeshed, and/or flame, as appropriate. Initial
reviewers chosen from a few interested parties from the email thread,
but I dunno what the policy actually is for making policy changes.

Event Timeline

jyknight updated this revision to Diff 34189.Sep 7 2015, 10:39 PM
jyknight retitled this revision to Document the stability policy for LLVM-C APIs.
jyknight updated this object.
jyknight added a subscriber: llvm-commits.
deadalnix edited edge metadata.Sep 7 2015, 10:58 PM

Amen to that. That's perfect and if that can get things moving on the C API side of things, the only thing I have to say is thank you.

docs/DeveloperPolicy.rst
539

Typo : LLVM

551

We broke that policy in 3.7 and are about to break it again in 3.8 on the landingpad question. As the 3.8 version is compatible with the 3.6 one, 3.7 is pretty much unusable from C if one wants to play with landingpad.

This policy sounds good, so let's make sure we apply it.

Could we have comments, bikeshed, and/or flame, as appropriate ?

This is important for many LLVM users.

echristo edited edge metadata.Sep 9 2015, 2:45 PM

I had a proposal in the other thread. I don't like this one. I'll write up something soon.

Repeating Eric's proposal (at least, I think the one he means):

What I'm proposing is that we make the C API that exists in tree a bindings API that has the same stability guarantees as the C++ API. Honestly it'll probably be much more stable, but at least then we won't have to worry or revert work because the C API was "too close to the machine" or rather the C++ code. This means that someone who wants a stable C API can go off and develop one (tests and all) and we can possibly look at bringing it back into tree at some point in the future. For example, if someone comes up with a good "libjit" API then we can look at how the API design works and make sure it's general enough that it's not going to cause undue solidification of the existing APIs.

Caveat: I'm not talking about the existing libclang or liblto libraries. Those seem to work and have a small enough API surface that they seem reasonable to support and we can move to a new API if they seem to be hindering development in the future.

Does this help explain where I'm coming from here?

So, I'm not opposed to having two levels of stability for LLVM-C APIs. But, if we end up having a separate "stable" and "bindings" API, I think my proposal is basically the level of stability that's needed for a bindings API to be sanely usable. And an actually "stable" API would be expected to have more stability than this: a deprecation policy, at least.

The goal I had is to come up with something that has a minimal cost to ongoing C++ development, while continuing to be useful to users of the LLVM-C API, and further, help to avoid the issue of "we can't add that because it might not be 100% stable".

But, "Anything goes" like the C++ API has, I think is not at all a good idea, as users cannot be guaranteed to get compiler errors to tell them something's changed.

The thing is, C is the Esperanto of programming languages. Many people use the C API from languages other than C. When a signature changes, you don't get any error, just some weird result. For instance, the change in the landingpad function between 3.6 and 3.7 results in a segfault at runtime rather than any clear error. The extra guarantee that this policy provides is that a given function signature won't change. The only constraint it adds is that, if one wants to change the signature, one needs to rename the function.

Note that this is pretty much what happens in C++ already, as the mangled name of a function encodes its signature. The policy here ensures that the same guarantee exists for the C API. It needs to be in the policy because the C language does not mangle the signature into the function name as C++ does.

Thanks for putting this together, this initiative was needed to have this discussion moving forward.

FWIW I'm on the same track as Eric on this topic.
Ideally, I would love to have a way for the "bindings" to be auto-generated as much as possible from the C++ public headers, and thus they couldn't have different stability guarantee than the C++ API.

Right now I'm not sure we are approaching this from the right end. Who are the clients of a "stable" C API vs a binding API? What are the use cases to solve? A "stable" C API needs to have a smaller surface than the bindings (obviously), but to what extent?

What use-cases do you or Eric think would be well-served by an autogenerated C api which has no stability or compatibility guarantees at all? I just can't see that sort of completely-unstable api as being really usable for what people would like from a C API.

The LLVM C API use-case I'm most familiar with is other language frontends, themselves written in non-C languages. That use-case can basically tolerate the API changing over time, by using build or runtime conditionals (or just dropping support for the old version of LLVM), but it's very useful to be able to easily detect such API incompatibility. That use-case is what this policy intends to support, with the existing LLVM-C API.
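
As a rough illustration of the "build or runtime conditionals" mentioned above, a client can branch on the LLVM version. This is only a sketch under stated assumptions: `LLVM_VERSION_MAJOR` and `LLVM_VERSION_MINOR` are the version macros from LLVM's configuration header, but they are defined by hand here so the snippet compiles on its own, and the `client_*` helpers and flavor strings are hypothetical. The 3.7 cutoff only reflects the landingpad remark earlier in this thread.

```c
#include <assert.h>
#include <string.h>

/* Normally provided by LLVM's configuration header. Defined by hand here
 * (assumed values) so that the sketch is self-contained. */
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#define LLVM_VERSION_MINOR 7
#endif

/* Runtime conditional: useful when the binding discovers the library
 * version only after loading it. Hypothetical helper, only meaningful
 * for the 3.6 to 3.8 range discussed in the thread. */
static const char *client_landingpad_flavor_for(int major, int minor) {
    return (major == 3 && minor == 7) ? "3.7-style" : "3.6/3.8-style";
}

/* Build-time conditional: the choice is fixed when the client is compiled. */
static const char *client_landingpad_flavor(void) {
#if LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR == 7
    return "3.7-style";
#else
    return "3.6/3.8-style";
#endif
}

/* Small usage example: both paths must agree for the compiled-in version. */
static void client_selfcheck(void) {
    assert(strcmp(client_landingpad_flavor_for(3, 6), "3.6/3.8-style") == 0);
    assert(strcmp(client_landingpad_flavor_for(3, 8), "3.6/3.8-style") == 0);
    assert(strcmp(client_landingpad_flavor(),
                  client_landingpad_flavor_for(LLVM_VERSION_MAJOR,
                                               LLVM_VERSION_MINOR)) == 0);
}
```

Either form only works if the incompatibility is detectable in the first place, which is the property the policy is trying to preserve.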

On the other side, the "actually stable" API that's been discussed, I also don't really see a terribly large use-case for. "It would be nice" if all the APIs were stable, but I can't really see being able to say "this is stable" with confidence about enough parts of LLVM for that to be realistically useful, except in very special limited circumstances. E.g., if you can't expose the LLVM IR builder functions as "stable" (which I think you cannot, since the IR changes with some regularity), that just seems like a non-starter.

So, I'm not sure who the envisioned clients of a "stable" LLVM API are, other than perhaps ld64 (which, I think, seems like somewhat of a special case.)

Honestly, to me, the existing C API seems to be positioned in just about the right place, so I'd like to see a policy on how to maintain it sanely, and not a policy of replacing it with something less suitable in either direction (that is, neither "too unstable" nor "too restricted").

I see something similar to this, but for external projects that just don't update frequently. I.e. I've got my own project, it needs a lot of C APIs to do approximately everything with LLVM (or, hey, we could autogenerate all of the language bindings). We can't support that sort of C API use case at all, but that's the direction that a lot of the use is going. I say we can't support it because any sort of deprecation scheme, or any attempt to keep things stable, is going to lock down the C++ API in ways that are unacceptable (at least to me and anyone I've spoken to on the project).

I 100% agree with that, the existence or maintenance policy for the LLVM-C API should not lock down the ability to change the C++ API. I disagree that this policy has the effect you're concerned about, however. I do not think this policy locks down the C++ API in any way.

My problem is that the surface you're proposing to be even pseudo-stable is just too large and locks down too much of the API. I, honestly, don't even like the little bit of hack that we (Mehdi in this case) had to add in TargetMachine while working on the DataLayout requirement to support the existing C API.

But that's exactly the point! That sort of hacky workaround to attempt to keep 100% stability is exactly what I'm trying to say is not required!

According to the policy written here (at least, what I meant, if it's not clear as written) all Mehdi should've done in this instance was to delete the LLVMGetTargetMachineData function, because it no longer makes any sense. LLVM changed such that data layout is only an attribute of the module, not the target machine, so the old C API accessing it from the TargetMachine is nonsensical. Thus, remove it. End of story.

Ideally, LLVMGetTargetMachineData() could've been deprecated at the beginning of the process of starting to remove TargetMachine::getDataLayout (early March?), before it was finally removed on Jul 24. But even that is no requirement, just a "nice thing to do if you remember".
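
For what a "deprecate first, then delete" step could look like in a C header, here is a minimal sketch. Everything in it is an assumption for illustration: `DEMO_DEPRECATED` is a hypothetical macro, not something LLVM is claimed to provide, and the `demo_*` types and functions are stand-ins for the opaque handles of the real API. It relies only on the `deprecated` attribute that GCC, Clang, and MSVC offer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: marks a declaration deprecated on compilers known
 * to support it and expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DEPRECATED(msg) __attribute__((deprecated(msg)))
#elif defined(_MSC_VER)
#define DEMO_DEPRECATED(msg) __declspec(deprecated(msg))
#else
#define DEMO_DEPRECATED(msg)
#endif

/* Stand-in types for this sketch only. */
typedef struct { const char *layout; } demo_module;
typedef struct { const char *layout; } demo_target_machine;

/* New accessor: the data layout is read from the module. */
static const char *demo_module_data_layout(const demo_module *m) {
    return m->layout;
}

/* Old accessor: kept for a while with a compile-time warning for callers,
 * then deleted outright once the underlying C++ accessor goes away. */
DEMO_DEPRECATED("the data layout now lives on the module")
static const char *demo_target_machine_data(const demo_target_machine *tm) {
    return tm->layout;
}

/* Usage example that sticks to the non-deprecated accessor. */
static void demo_selfcheck(void) {
    demo_module m = { "e-m:e" };
    assert(strcmp(demo_module_data_layout(&m), "e-m:e") == 0);
}
```

Callers written in C see a warning at each use of the old accessor during the deprecation window. Bindings in other languages do not see it, which is one reason the step is described above as optional.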

Ah. I see what you mean. Basically codify "best effort" as the policy, but not any more strict than that.

On that note then, what would be the point of @deadalnix's patch to add testing to the C API? To make sure that nothing changes out from under people in behavior?

OK, I mean sure, I'd like this too, but I think enough users at least care about the "being able to read" and probably "be able to execute" IR created by an older process when a newer llvm comes onto the system. I'm thinking in the graphics driver area that Mehdi works in. I think webkit even uses the C++ API for creation and the C API for the jit. Basically the stable C API could be the same as the bitcode guarantees + deprecation if we change something drastically - we support reading older IR and autoupgrading it to something that can be used.

I really don't like libLTO, but I think creating a new one is going to have issues, we should just deprecate it as lld comes on line.

Thoughts?

@echristo the C API test is not intended to freeze the API, but to make sure that things are done intentionally. It follows the recent landingpad change fiasco.

If a change is made that breaks the C API, then the test is going to break. Either this breakage is intentional, in which case the test can be updated to reflect the change, or the breakage is not intentional and can be fixed.

The test is intended to cover read and write IR, which seems like a minimum to have for the API to be useful.

Generally, I think the best effort policy proposed here is the right balance. Freezing the C API would impair refactoring capabilities, while changing it in the same way it is done with the C++ API would cause hard to debug problems when using LLVM from other languages.

jyknight updated this revision to Diff 34845.Sep 15 2015, 3:39 PM
jyknight marked 2 inline comments as done.
jyknight edited edge metadata.

Update slightly, add some examples and rationale.

@echristo does the update address your concerns ?

docs/DeveloperPolicy.rst
552

This does not convey the idea that this is generally undesirable. It may be done, but generally should be avoided.

I'm not sure what data you'd like to see. I'm talking from experience using the C API from a foreign language, and it seems that @jyknight has the same experience.

What I had in mind when I said that I haven’t seen data was trying to answer:

  • Who are the primary users of the C API?
  • What are their use cases? Which should translate to “What part of the API is of interest”? (the linker does not care about the IRBuilder for instance).
  • What do they expect from a C API: is stability really important or would pure bindings be OK?
  • etc.

It may be that a large number of projects are using C because you can’t interface with C++ conveniently. Since we don’t promise full compatibility, these projects would have to either revlock to LLVM or write some compatibility layer anyway, so having some way to auto-generate C bindings over the C++ API can cover these use cases.
In this scenario, the non-bindings “stable” C API would have very little surface in comparison.


Mehdi

OK, I mean sure, I'd like this too, but I think enough users at least care about the "being able to read" and probably "be able to execute" IR created by an older process when a newer llvm comes onto the system. I'm thinking in the graphics driver area that Mehdi works in. I think webkit even uses the C++ API for creation and the C API for the jit. Basically the stable C API could be the same as the bitcode guarantees + deprecation if we change something drastically - we support reading older IR and autoupgrading it to something that can be used.

I'm not sure what you're saying there. Are you saying you want to provide a more stable C IR Building API? That if we remove an instruction from IR, that we should continue to support emitting the instruction from the C API the same way autoupgrade from an old IR format supports reading it?

That seems like it'd be a lot of extra work that nobody has signed up to do, to me -- if you want to emit old IR, doesn't it make more sense to just continue linking the old version of LLVM?

I'd rather ask: with a binding API for the IR Builder, would we even need to provide the same feature in a “stable API”?

So, I think that asking for more data collection is not actually useful.

From the "do people need MORE stable than this proposal provides" side: it's not that I think having an actually-fully-stable interface that WebKit and Rust and everyone else could use would not be useful to them -- it's that LLVM developers are not willing/able to put up with the restrictions that places on the codebase. So asking people "hey would you like a more stable api?" is not worthwhile. Obviously, yes, all other things equal I'd like that. But it's not going to happen -- there's a lot of pushback from LLVM developers about not inconveniencing development of the LLVM C++ code.

You might instead ask: "Hey would you like to have a completely stable LLVM C API, but that has a much much smaller set of functionality exposed?". And I already know the answer to that: "no". Most users already have to add their own custom wrappers for APIs that aren't exposed in the llvm-c api (typically, things that really ought to be right there next to one of the existing APIs, but aren't because it's very hard to get any additions approved!). Having less is just not going to be useful.

(I'll just note: the fact that people have to add their own C++ code doesn't mean the C API is useless! It's a huge head-start -- that's quite helpful even if you also need custom C++ wrappers!)

Literally the ONLY "stable API" user that I think currently exists is ld64 -- and that is only because they have designed and deployed their own private API. llvm-c/lto.h has, to the best of my knowledge, just the one user. Everyone else works fine with "pseudo-stable".

And everyone else using LLVM that I know of wants to access a wide swath of functionality, from IR Building, to custom pass setup, to jit compiling/linking. All of that cannot possibly be provided in guaranteed-stable form without inconveniencing LLVM development.

I mean, let's take examples: Rust, WebKit, Mono, or the Radeon R600 Mesa driver? There's really no way what they're doing can be provided as "stable".

As far as I can see a useful "stable API" is basically impossible, and should be off the table for that reason alone.


The next question is from the other direction: maybe we should just ditch the llvm-c API entirely, and use an autogenerated C binding generator instead?

I really think that idea is just a distraction.

Firstly, auto-generated bindings typically target the end language, not C. (e.g. with swig, boost::python, etc). So you probably wouldn't even want a C API at all, if that's the way you want to do things. You'd probably just expose the C++ api directly to Python, Perl, etc.

But more importantly, this autogenerated llvm C binding doesn't exist. If someone had created it, it could be reasonably examined and considered as a replacement for the LLVM-C API, weighing any advantages and disadvantages. (My expectation is that it would not be an improvement, and would fail if put to such a comparison.) But right now it's vaporware! As such, talking about it as if it was a real alternative is not useful.


What we HAVE now is the LLVM-C API -- and it's actually pretty damn good. What it needs is the ability to continue to evolve alongside changes in the underlying LLVM library, without being held back (and without threats of deletion) due to concerns of exposing "too much" functionality.

That's what this proposal is about.

The minimal effect of a policy here needs to be that LLVM developers are comfortable enough to allow new APIs to be added to the LLVM-C API set, without objecting each time.

So, bottom line, I think there are only three realistic alternatives:
a) Approximately what I've written in this diff.
b) "Anything goes" (make any changes you like, no stability rules whatsoever).
c) Just delete the LLVM-C API entirely.

Obviously, I think (a) is the best choice. And if forced, I'd go with (b) over (c).

A specific issue I have with this proposal for what it's worth is the
explosion of functions in the C API. Not the surface area, but the number.
I see a future (perhaps jaded) that looks like a walk through the history
of evolving C++ API as more and more things are wrapped and changed and
rewrapped and changed. That said, I might be picturing it wrong?

-eric

@echristo Yes and no. My understanding of the proposal is that methods can be removed. Let's get practical here. There is a proposal to move toward pointers being untyped, with the type carried by the instruction manipulating the pointer (GEP, load, ...).

It will require the C API to be updated. The current function to build a load is as follows:

LLVMValueRef LLVMBuildLoad(LLVMBuilderRef builder, LLVMValueRef ptr);

What the proposal forbids is changing this to:

LLVMValueRef LLVMBuildLoad(LLVMBuilderRef builder, LLVMTypeRef type, LLVMValueRef ptr);

But, it is possible to introduce another function and remove the current, existing one :

LLVMValueRef LLVMBuildTypedLoad(LLVMBuilderRef builder, LLVMTypeRef type, LLVMValueRef ptr);

The proposal does not force us to keep the old method. It simply requires introducing a method with a new name rather than modifying the signature of the existing method.
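
To see why the rename helps a non-C client, consider a binding that resolves functions by name only, the way a dynamic loader or FFI layer does. The following is a sketch under stated assumptions, not LLVM code: the `demo_*` names are hypothetical, the table stands in for a library's export list after the rename, and the string keys are just the names used in the example above, with plain `int` values standing in for the real handle types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Generic function-pointer type, used only to store entries in the table. */
typedef void (*demo_fn)(void);

typedef struct {
    const char *name; /* a C symbol name carries no type information */
    demo_fn fn;
} demo_symbol;

/* Stand-in for the renamed builder function; it returns its pointer
 * argument so the call can be checked. */
static int demo_typed_load(int builder, int type, int ptr) {
    (void)builder;
    (void)type;
    return ptr;
}

/* Export table of a hypothetical library version after the rename:
 * the old name is gone and only the new one is present. */
static const demo_symbol demo_exports[] = {
    { "LLVMBuildTypedLoad", (demo_fn)demo_typed_load },
};

/* Name-only lookup: returns NULL when the symbol is not exported. */
static demo_fn demo_lookup(const char *name) {
    size_t i;
    for (i = 0; i < sizeof demo_exports / sizeof demo_exports[0]; ++i) {
        if (strcmp(demo_exports[i].name, name) == 0)
            return demo_exports[i].fn;
    }
    return NULL;
}

/* Usage example: a stale binding fails the lookup, an updated one works. */
static void demo_selfcheck(void) {
    demo_fn fn;
    assert(demo_lookup("LLVMBuildLoad") == NULL);
    fn = demo_lookup("LLVMBuildTypedLoad");
    assert(fn != NULL);
    /* Cast back to the function's true type before calling it. */
    assert(((int (*)(int, int, int))fn)(1, 2, 3) == 3);
}
```

A binding still asking for the old name gets a clean "symbol not found" at load time. Had the old name been kept with a different parameter list, the same lookup would have succeeded and the mismatch would only have surfaced as undefined behavior at the call.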

Can we move forward with that? I'd like to work on adding support for emitting debug info to the C API, if possible soon enough to get it into 3.8. I would need it. Right now, all work on the C API is frozen.

@jyknight , @echristo , Can we resume the discussion here ?

jyknight abandoned this revision.Jan 12 2016, 1:28 PM

After much discussion, Eric has updated policy in a different commit.

Thanks so much for your work here. We wouldn't have gotten to where we did without it.