This is an archive of the discontinued LLVM Phabricator instance.

Document the stability policy for LLVM-C APIs.
Abandoned · Public

Authored by jyknight on Sep 7 2015, 10:39 PM.

Details

Reviewers

grosbach
echristo
deadalnix
ributzka
lhames

Summary

("Reasonably stable, not 100% guaranteed")

This is my proposal, following from the thread "[RFC] Developer Policy
for LLVM C API".

Please comment, bikeshed, and/or flame, as appropriate. Initial
reviewers chosen from a few interested parties from the email thread,
but I dunno what the policy actually is for making policy changes.

Diff Detail

Event Timeline

jyknight updated this revision to Diff 34189.Sep 7 2015, 10:39 PM

jyknight retitled this revision to "Document the stability policy for LLVM-C APIs".

jyknight updated this object.

jyknight added reviewers: ributzka, lhames, echristo, deadalnix.

jyknight added a subscriber: llvm-commits.

Amen to that. That's perfect and if that can get things moving on the C API side of things, the only thing I have to say is thank you.

docs/DeveloperPolicy.rst
539	Typo : LLVM
551	We broke that policy in 3.7 and are about to break it again in 3.8 on the landingpad question. As the 3.8 version is compatible with the 3.6 one, 3.7 is pretty much unusable from C if one wants to play with landingpad. This policy sounds good, so let's make sure we apply it.

Could we have comments, bikeshed, and/or flame, as appropriate ?

This is important for many LLVM users.

I had a proposal in the other thread. I don't like this one. I'll write up something soon.

Repeating Eric's proposal (at least, I think the one he means):

What I'm proposing is that we make the C API that exists in tree a bindings API that has the same stability guarantees as the C++ API. Honestly it'll probably be much more stable, but at least then we won't have to worry or revert work because the C API was "too close to the machine" or rather the C++ code. This means that someone who wants a stable C API can go off and develop one (tests and all) and we can possibly look at bringing it back into tree at some point in the future. For example, if someone comes up with a good "libjit" API then we can look at how the API design works and make sure it's general enough that it's not going to cause undue solidification of the existing APIs.

Caveat: I'm not talking about the existing libclang or liblto libraries. Those seem to work and have a small enough API surface that they seem reasonable to support and we can move to a new API if they seem to be hindering development in the future.

Does this help explain where I'm coming from here?

So, I'm not opposed to having two levels of stability for LLVM-C APIs. But, if we end up having a separate "stable" and "bindings" API, I think my proposal is basically the level of stability that's needed for a bindings API to be sanely usable. And an actually "stable" API would be expected to have more stability than this: a deprecation policy, at least.

The goal I had is to come up with something that has a minimal cost to ongoing C++ development, while continuing to be useful to users of the LLVM-C API, and further, help to avoid the issue of "we can't add that because it might not be 100% stable".

But, "Anything goes" like the C++ API has, I think is not at all a good idea, as users cannot be guaranteed to get compiler errors to tell them something's changed.

The thing is, C is the Esperanto of programming languages. Many are using the C API from languages other than C. When a signature changes, you don't get any error, just some weird result. For instance, the change in the landingpad function between 3.6 and 3.7 results in a segfault at runtime rather than any clear error. The extra guarantee that this policy provides is that a given function signature won't change. The only constraint it adds is that, if one wants to change the signature, one needs to rename the function.

Note that this is pretty much what happens in C++ already, as the mangled name of a function contains its signature. The policy here ensures that the same guarantee exists for the C API. It needs to be in the policy because the C language does not mangle the signature into the function name as C++ does.

@echristo any progress ?

Thanks for putting this together, this initiative was needed to have this discussion moving forward.

FWIW I'm on the same track as Eric on this topic.
Ideally, I would love to have a way for the "bindings" to be auto-generated as much as possible from the C++ public headers, and thus they couldn't have a different stability guarantee than the C++ API.

Right now I'm not sure we are approaching this from the right end. Who are the clients of a "stable" C API vs. a bindings API? What are the use cases to solve? A "stable" C API needs to have a smaller surface than the bindings (obviously), but to what extent?

What use-cases do you or Eric think would be well-served by an autogenerated C api which has no stability or compatibility guarantees at all? I just can't see that sort of completely-unstable api as being really usable for what people would like from a C API.

The LLVM C API use-case I'm most familiar with is other language frontends, themselves written in non-C languages. That use-case can basically tolerate the API changing over time, by using build or runtime conditionals (or just dropping support for the old version of llvm), but it's very useful to be able to easily detect such API incompatibility. That use-case is what this policy intends to support, with the existing LLVM-C API.
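As a rough illustration of the "build or runtime conditionals" mentioned above, a frontend's C shim might look something like the sketch below. `LLVM_VERSION_MAJOR` and `LLVM_VERSION_MINOR` are real macros from `llvm/Config/llvm-config.h`, but here they are given fallback values (3.7) so the sketch compiles without LLVM; the helper names and the "new-api"/"old-api" labels are hypothetical placeholders, not real LLVM-C calls.

```c
/* In a real build these macros come from llvm/Config/llvm-config.h.
 * Assumption: they are defined here (as 3.7) only so that this sketch
 * compiles without LLVM installed. */
#ifndef LLVM_VERSION_MAJOR
#define LLVM_VERSION_MAJOR 3
#define LLVM_VERSION_MINOR 7
#endif

/* Hypothetical helper for runtime checks: returns 1 if the LLVM being
 * built against is at least major.minor, otherwise 0. */
int frontend_llvm_at_least(int major, int minor) {
    if (LLVM_VERSION_MAJOR != major)
        return LLVM_VERSION_MAJOR > major;
    return LLVM_VERSION_MINOR >= minor;
}

/* Build-time conditional: select a code path per LLVM version.
 * The returned labels stand in for version-specific wrapper code. */
const char *frontend_codegen_path(void) {
#if LLVM_VERSION_MAJOR > 3 || (LLVM_VERSION_MAJOR == 3 && LLVM_VERSION_MINOR >= 8)
    return "new-api";
#else
    return "old-api";
#endif
}
```

This only works if an incompatible change is visible as a removed or renamed symbol; a silently changed signature would pass both checks and still misbehave.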

On the other side, the "actually stable" API that's been discussed, I also don't really see a terribly large use-case for. "It would be nice" if all the APIs were stable, but I can't really see being able to say "this is stable" with confidence about enough parts of LLVM for that to be realistically useful, except in very special limited circumstances. E.g., if you can't expose the LLVM IR builder functions as "stable" (which I think you cannot, since the IR changes with some regularity), that just seems like a non-starter.

So, I'm not sure who the envisioned clients of a "stable" LLVM API are, other than perhaps ld64 (which, I think, seems like somewhat of a special case.)

Honestly, to me, the existing C API seems to be positioned in just about the right place, so I'd like to see a policy on how to maintain it sanely, and not a policy of replacing it with something less suitable in either direction (that is, neither "too unstable" nor "too restricted").

The LLVM C API use-case I'm most familiar with is other language frontends, themselves written in non-C languages. That use-case can basically tolerate the API changing over time, by using build or runtime conditionals (or just dropping support for the old version of llvm), but it's very useful to be able to easily detect such API incompatibility. That use-case is what this policy intends to support, with the existing LLVM-C API.

I see something similar to this, but for external projects that just don't update frequently. I.e. I've got my own project, it needs a lot of C APIs to do approximately everything with llvm (or, hey, we could autogenerate all of the language bindings). We can't support that sort of C API use case at all, but that's the direction that a lot of the use is going. I say we can't support it because any sort of deprecation or attempt to keep stable scheme is going to lock down the C++ API in ways that are unacceptable (at least to me and anyone I've spoken to on the project).

I 100% agree with that, the existence or maintenance policy for the LLVM-C API should not lock down the ability to change the C++ API. I disagree that this policy has the effect you're concerned about, however. I do not think this policy locks down the C++ API in any way.

My problem is that the surface you're proposing to be even pseudo-stable is just too large and locks down too much of the API. I, honestly, don't even like the little bit of hack that we (Mehdi in this case) had to add in TargetMachine while working on the DataLayout requirement to support the existing C API.

But that's exactly the point! That sort of hacky workaround to attempt to keep 100% stability is exactly what I'm trying to say is not required!

According to the policy written here (at least, what I meant, if it's not clear as written) all Mehdi should've done in this instance was to delete the LLVMGetTargetMachineData function, because it no longer makes any sense. LLVM changed such that data layout is only an attribute of the module, not the target machine, so the old C API accessing it from the TargetMachine is nonsensical. Thus, remove it. End of story.

Ideally, LLVMGetTargetMachineData() could've been deprecated at the beginning of the process of starting to remove TargetMachine::getDataLayout (early March?), before it was finally removed on Jul 24. But even that is no requirement, just a "nice thing to do if you remember".

In D12685#246351, @jyknight wrote:

The LLVM C API use-case I'm most familiar with is other language frontends, themselves written in non-C languages. That use-case can basically tolerate the API changing over time, by using build or runtime conditionals (or just dropping support for the old version of llvm), but it's very useful to be able to easily detect such API incompatibility. That use-case is what this policy intends to support, with the existing LLVM-C API.

I see something similar to this, but for external projects that just don't update frequently. I.e. I've got my own project, it needs a lot of C APIs to do approximately everything with llvm (or, hey, we could autogenerate all of the language bindings). We can't support that sort of C API use case at all, but that's the direction that a lot of the use is going. I say we can't support it because any sort of deprecation or attempt to keep stable scheme is going to lock down the C++ API in ways that are unacceptable (at least to me and anyone I've spoken to on the project).

I 100% agree with that, the existence or maintenance policy for the LLVM-C API should not lock down the ability to change the C++ API. I disagree that this policy has the effect you're concerned about, however. I do not think this policy locks down the C++ API in any way.

Ah. I see what you mean. Basically codify "best effort" as the policy, but not any more strict than that.

On that note then, what would be the point of @deadalnix's patch to add testing to the C API? To make sure that nothing changes out from under people in behavior?

My problem is that the surface you're proposing to be even pseudo-stable is just too large and locks down too much of the API. I, honestly, don't even like the little bit of hack that we (Mehdi in this case) had to add in TargetMachine while working on the DataLayout requirement to support the existing C API.

But that's exactly the point! That sort of hacky workaround to attempt to keep 100% stability is exactly what I'm trying to say is not required!

According to the policy written here (at least, what I meant, if it's not clear as written) all Mehdi should've done in this instance was to delete the LLVMGetTargetMachineData function, because it no longer makes any sense. LLVM changed such that data layout is only an attribute of the module, not the target machine, so the old C API accessing it from the TargetMachine is nonsensical. Thus, remove it. End of story.

Ideally, LLVMGetTargetMachineData() could've been deprecated at the beginning of the process of starting to remove TargetMachine::getDataLayout (early March?), before it was finally removed on Jul 24. But even that is no requirement, just a "nice thing to do if you remember".

OK, I mean sure, I'd like this too, but I think enough users at least care about the "being able to read" and probably "be able to execute" IR created by an older process when a newer llvm comes onto the system. I'm thinking in the graphics driver area that Mehdi works in. I think webkit even uses the C++ API for creation and the C API for the jit. Basically the stable C API could be the same as the bitcode guarantees + deprecation if we change something drastically - we support reading older IR and autoupgrading it to something that can be used.

I really don't like libLTO, but I think creating a new one is going to have issues, we should just deprecate it as lld comes on line.

Thoughts?

@echristo the C API test is not intended to freeze the API, but to make sure that changes are made intentionally. It follows from the recent landingpad change fiasco.

If a change is made that breaks the C API, then the test is going to break. Either this breakage is intentional, in which case the test can be updated to reflect the change, or the breakage is not intentional and can be fixed.

The test is intended to cover read and write IR, which seems like a minimum to have for the API to be useful.
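One way a test can make a signature change fail loudly, sketched here with C11 `_Generic`, is to pin the exact type of each exported function at compile time. This is only an illustration of the idea and not the test from the patch mentioned above; every `Demo*` name and the `SIGNATURE_IS` macro are hypothetical stand-ins, not LLVM-C identifiers.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical stand-ins for opaque LLVM-C style handle types. */
typedef struct DemoOpaqueBuilder *DemoBuilderRef;
typedef struct DemoOpaqueValue *DemoValueRef;

/* Hypothetical exported wrapper whose signature we want to pin. */
DemoValueRef DemoBuildLoad(DemoBuilderRef builder, DemoValueRef ptr) {
    (void)builder;
    return ptr; /* placeholder body */
}

/* Evaluates to 1 only if a pointer to `fn` has exactly the type `sig`. */
#define SIGNATURE_IS(fn, sig) _Generic(&(fn), sig: 1, default: 0)

/* Editing DemoBuildLoad's parameter list makes this fail to compile,
 * so the change has to be made deliberately (or the function renamed). */
static_assert(SIGNATURE_IS(DemoBuildLoad,
                           DemoValueRef (*)(DemoBuilderRef, DemoValueRef)),
              "DemoBuildLoad changed signature; add a renamed function instead");

/* Runtime-visible form of the same check, convenient for a test driver. */
int demo_signature_ok(void) {
    return SIGNATURE_IS(DemoBuildLoad,
                        DemoValueRef (*)(DemoBuilderRef, DemoValueRef));
}
```

A check like this guards the header itself, which also protects foreign-language callers that never see a C compiler diagnostic.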

Generally, I think the best effort policy proposed here is the right balance. Freezing the C API would impair refactoring capabilities, while changing it in the same way it is done with the C++ API would cause hard to debug problems when using LLVM from other languages.

Update slightly, add some examples and rationale.

@echristo does the update address your concerns ?

docs/DeveloperPolicy.rst
552	This does not convey the idea that this is generally undesirable. It may be done, but generally should be avoided.

I'm not sure what data you'd like to see. I'm talking from experience using the C API from a foreign language, and it seems that @jyknight has the same experience.

What I had in mind, when I said that I haven't seen data, was trying to answer:

Who are the primary users of the C API?
What are their use cases? Which should translate to "What part of the API is of interest?" (the linker does not care about the IRBuilder, for instance).
What do they expect from a C API: is stability really important or would pure bindings be OK?
etc.

It may be that a large number of projects are using C because you can't interface with C++ conveniently. Since we don't promise full compatibility, these projects would have to either revlock to LLVM or write some compatibility layer anyway, so having some way to auto-generate C bindings over the C++ API can cover these use cases.
In this scenario, the non-bindings "stable" C API would have very little surface in comparison.

—
Mehdi

OK, I mean sure, I'd like this too, but I think enough users at least care about the "being able to read" and probably "be able to execute" IR created by an older process when a newer llvm comes onto the system. I'm thinking in the graphics driver area that Mehdi works in. I think webkit even uses the C++ API for creation and the C API for the jit. Basically the stable C API could be the same as the bitcode guarantees + deprecation if we change something drastically - we support reading older IR and autoupgrading it to something that can be used.

I'm not sure what you're saying there. Are you saying you want to provide a more stable C IR Building API? That if we remove an instruction from IR, that we should continue to support emitting the instruction from the C API the same way autoupgrade from an old IR format supports reading it?

That seems like it'd be a lot of extra work that nobody has signed up to do, to me -- if you want to emit old IR, doesn't it make more sense to just continue linking the old version of LLVM?

I'd rather ask: with a bindings API for the IR Builder, would we even need to provide the same feature in a "stable API"?

So, I think that asking for more data collection is not actually useful.

From the "do people need MORE stable than this proposal provides" side: it's not that I think having an actually-fully-stable interface that WebKit and Rust and everyone else could use would not be useful to them -- it's that LLVM developers are not willing/able to put up with the restrictions that places on the codebase. So asking people "hey would you like a more stable api?" is not worthwhile. Obviously, yes, all other things equal I'd like that. But it's not going to happen -- there's a lot of pushback from LLVM developers about not inconveniencing development of the LLVM C++ code.

You might instead ask: "Hey would you like to have a completely stable LLVM C API, but that has a much much smaller set of functionality exposed?". And I already know the answer to that: "no". Most users already have to add their own custom wrappers for APIs that aren't exposed in the llvm-c api (typically, things that really ought to be right there next to one of the existing APIs, but aren't because it's very hard to get any additions approved!). Having less is just not going to be useful.

(I'll just note: the fact that people have to add their own C++ code doesn't mean the C API is useless! It's a huge head-start -- that's quite helpful even if you also need custom C++ wrappers!)

Literally the ONLY "stable API" user that I think currently exists is ld64 -- and that is only because they have designed and deployed their own private API. llvm-c/lto.h has, to the best of my knowledge, just the one user. Everyone else works fine with "pseudo-stable".

And everyone else using LLVM that I know of wants to access a wide swath of functionality, from IR Building, to custom pass setup, to jit compiling/linking. All of that cannot possibly be provided in guaranteed-stable form without inconveniencing LLVM development.

I mean, let's take examples: Rust, WebKit, Mono, or the Radeon R600 Mesa driver? There's really no way what they're doing can be provided as "stable".

As far as I can see a useful "stable API" is basically impossible, and should be off the table for that reason alone.

The next question is from the other direction: maybe we should just ditch the llvm-c API entirely, and use an autogenerated C binding generator instead?

I really think that idea is just a distraction.

Firstly, auto-generated bindings typically target the end language, not C. (e.g. with swig, boost::python, etc). So you probably wouldn't even want a C API at all, if that's the way you want to do things. You'd probably just expose the C++ api directly to Python, Perl, etc.

But more importantly, this autogenerated llvm C binding doesn't exist. If someone had created it, it could be reasonably examined and considered as a replacement for the LLVM-C API, weighing any advantages and disadvantages. (My expectation is that it would not be an improvement, and would fail if put to such a comparison.) But right now it's vaporware! As such, talking about it as if it was a real alternative is not useful.

What we HAVE now is the LLVM-C API -- and it's actually pretty damn good. What it needs is the ability to continue to evolve alongside changes in the underlying LLVM library, without being held back (and without threats of deletion) due to concerns of exposing "too much" functionality.

That's what this proposal is about.

The minimal effect of a policy here needs to be that LLVM developers are comfortable enough to allow new APIs to be added to the LLVM-C API set, without objecting each time.

So, bottom line, I think there are only three realistic alternatives:
a) Approximately what I've written in this diff.
b) "Anything goes" (make any changes you like, no stability rules whatsoever).
c) Just delete the LLVM-C API entirely.

Obviously, I think (a) is the best choice. And if forced, I'd go with (b) over (c).

What we HAVE now is the LLVM-C API -- and it's actually pretty damn good.
What it needs is the ability to continue to evolve alongside changes in the
underlying LLVM library, without being held back (and without threats of
deletion) due to concerns of exposing "too much" functionality.

That's what this proposal is about.

The minimal effect of a policy here needs to be that LLVM developers are
comfortable enough to allow new APIs to be added to the LLVM-C API set,
without objecting each time.

A specific issue I have with this proposal for what it's worth is the
explosion of functions in the C API. Not the surface area, but the number.
I see a future (perhaps jaded) that looks like a walk through the history
of evolving C++ API as more and more things are wrapped and changed and
rewrapped and changed. That said, I might be picturing it wrong?

-eric

@echristo Yes and no. My understanding of the proposal is that functions can be removed. Let's get practical here. There is a proposal to move toward pointers being untyped, with the type carried by the instruction manipulating the pointer (GEP, load, ...).

It will require the C API to be updated. The current function to build a load is as follows:

LLVMValueRef LLVMBuildLoad(LLVMBuilderRef builder, LLVMValueRef ptr);

What the proposal forbids is changing this to:

LLVMValueRef LLVMBuildLoad(LLVMBuilderRef builder, LLVMTypeRef type, LLVMValueRef ptr);

But it is possible to introduce another function and remove the existing one:

LLVMValueRef LLVMBuildTypedLoad(LLVMBuilderRef builder, LLVMTypeRef type, LLVMValueRef ptr);

The proposal does not force us to keep the old function. It only requires introducing a function with a new name rather than modifying the signature of the existing one.
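The rename-instead-of-modify pattern can be sketched in isolation with mock types. Everything below is hypothetical: the `Demo*` structs and functions only mimic the shape of the load example and do not reflect how LLVM implements loads.

```c
#include <stddef.h>

/* Mock types standing in for the real handles. */
typedef struct DemoType { int bits; } DemoType;
typedef struct DemoValue { const DemoType *pointee; int payload; } DemoValue;
typedef struct DemoLoad { const DemoType *type; const DemoValue *ptr; } DemoLoad;

/* New entry point under a NEW name: the loaded type is passed explicitly. */
DemoLoad DemoBuildTypedLoad(const DemoType *type, const DemoValue *ptr) {
    DemoLoad load = { type, ptr };
    return load;
}

/* Old entry point: its signature is left untouched. While a pointer still
 * records its pointee type it can remain as a thin wrapper; once that
 * information is gone it can be removed outright, so stale callers hit a
 * missing symbol rather than a silently mismatched argument list. */
DemoLoad DemoBuildLoad(const DemoValue *ptr) {
    return DemoBuildTypedLoad(ptr->pointee, ptr);
}
```

Either outcome (wrapper kept, or old symbol deleted) is detectable by a caller, which is the property the signature rule is meant to preserve.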

Can we move forward with that? I'd like to work on adding support for emitting debug info to the C API, if possible soon enough to get it out in 3.8. I would need it. Right now, all work on the C API is frozen.

@jyknight , @echristo , Can we resume the discussion here ?

After much discussion, Eric has updated policy in a different commit.

Thanks so much for your work here. We wouldn't have gotten to where we did without it.

Revision Contents

Path: docs/DeveloperPolicy.rst
Size: 110 lines

Diff 34845

docs/DeveloperPolicy.rst

* Newer releases can ignore features from older releases, but they cannot
dropping it would be a valid way to upgrade the IR.

* Debug metadata is special in that it is currently dropped during upgrades.

* Non-debug metadata is defined to be safe to drop, so a valid way to upgrade
it is to drop it. That is not very user friendly and a bit more effort is
expected, but no promises are made.

		LLVM-C API Compatibility
		------------------------

		While most of LLVM's library API is explicitly lacking any sort of cross-version
		compatibility/API-stability guarantees, the API exposed by the headers in
		``include/llvm-c/`` is covered by a somewhat enhanced compatibility policy.

		This policy is designed to make life easier for users calling into LLVM -- both
		those using the ``.h`` file from C or C++ code, and those doing a
		foreign-function-call from some other programming language.

		However, LLVM cannot and does not promise absolute forward or backwards
		compatibility for the LLVM-C API, or even an n-release deprecation policy,
		because we do not want these compatibility rules to slow development on the
		underlying LLVM functionality.

		We also do not want to restrict the APIs covered by the LLVM-C API to only those
		which can be frozen without restricting future C++ development. That subset of
		APIs would be too limited to be useful for most use-cases.

		Therefore, this policy attempts to set some middle ground.

		TL;DR: You must not change the ABI/function signatures of existing
		functions. You may add new functions. You may remove functions without any prior
		deprecation period. You may change the semantics of existing functions.

		Here are some more detailed guidelines:

		* The ABI/signature of an existing function must not be modified. That
		includes adding arguments, removing arguments, changing the types of
		arguments, etc. If a function needs a new signature, instead create a new
		function with a new name.

		For example: The EH personality function was moved from the ``landingpad``
		instruction to the function itself. However, the existing API
		``LLVMBuildLandingPad`` must not be modified to remove the "PersFn"
		argument. The alternative chosen was to use the PersFn argument to set the
		function's personality. (The alternative of adding a new function
		``LLVMBuildLandingPad2`` would've been acceptable as well.)

		Rationale: Callers from other languages will not necessarily get a compiler
		error to notify them of such a signature change, and having a mismatched
		argument list can result in very hard to diagnose errors. The cost of this
		compatibility is expected to be low, as it's typically easy to notice this
		sort of change in the ``llvm-c/*.h`` files.

		* In contrast, the semantics of a function call may change incompatibly,
		requiring callers to be modified. (Use your best judgment about whether the
		behavior change makes sense, of course.)

		For example: Prior to ``comdat`` being added as an IR feature, calling
		``LLVMSetLinkage(x, LLVMLinkOnceODRLinkage)`` would have implicitly created a
		COMDAT section. Now, it will not: callers must explicitly create a comdat if
		they want one. No attempt has been made to preserve the old semantic.

		Rationale: It would be too difficult and burdensome to keep strict behavioral
		compatibility in the C API with each such underlying behavioral change. It
		would be both difficult to detect such changes, as well as difficult to
		preserve the old behavior.

		* New functions exposing new LLVM functionality may be added to the LLVM-C API
		"as needed". Functions for a part of the API already exposed should generally
		be accepted with simple code review, and may be done at the same time as
		adding the underlying LLVM functionality, if the developer so desires.
		(e.g. exposing a LLVMBuild* function for a new IR feature would fall under
		this category.)

		Be a little more careful when exposing an as-yet-unexposed part of LLVM. It is
		not necessarily a good idea for rapidly-changing or less-generally-useful
		parts of LLVM's API to be exposed.

		Rationale: The compatibility required makes the cost to LLVM developers of
		having a C++ API exposed via the LLVM-C API relatively low, so the barrier for
		exposing new functionality should be commensurately low. However, the
		maintenance cost is still non-zero, so it still is not appropriate to
		expose absolutely everything.

		* Functions should generally not be removed from the LLVM-C API, but may be
		removed when required. For example, if the underlying LLVM functionality is being
		removed, there is no realistic and useful way to keep an LLVM-C wrapper for
		it.

		In a similar vein, an older variant of a function should be kept when
		introducing a new variant when possible. However, if it is not reasonably
		possible to preserve the old function, it may be removed at the same time.

		Marking a function "deprecated" for some time before its final removal is
		appreciated, but is not a requirement.

		For example: ``LLVMGetTargetMachineData`` wraps the C++
		``TargetMachine::getDataLayout()``. This C++ API was removed, and DataLayout
		is now associated only with a Module. Thus, the LLVM-C API function should be
		removed as well.

		For example: ``LLVMIntTypeInContext`` effectively replaces ``LLVMIntType``,
		but ``LLVMIntType`` has not been removed, because it still has a clear
		meaning, and was easy to preserve.

		Rationale: Preserving existing functions is helpful to clients, and thus
		should be encouraged. But it is not valuable to keep a C API wrapper beyond
		the lifetime of the underlying functionality, and the lifetime of the
		underlying LLVM functionality will not be extended only for the benefit of the
		C API.

		* Similar rules follow for ``enum`` elements: they must not be renumbered, but
		may be added and, if necessary, removed. Be aware that removing an ``enum``
		element cannot prevent existing callers from continuing to pass it as input
		to functions! Also, be careful not to cause the underlying integer type of the
		``enum`` to change (e.g. by adding a value that doesn't fit in a 32-bit int).

.. _copyright-license-patents:

Copyright, License, and Patents
===============================

.. note::

This section deals with legal matters but does not provide legal advice. We
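The ``enum`` guideline in the proposed policy text (no renumbering, additions allowed, removed values may still arrive from old callers, underlying integer type must not change) could be guarded roughly as follows. This is a sketch under assumptions: the `DemoLinkage` enum and its values are hypothetical, and the size check assumes a typical ABI where such an enum has the size of `int`.

```c
#include <assert.h> /* static_assert (C11) */

/* Hypothetical enum following the guideline. */
typedef enum {
    DemoExternalLinkage = 0,
    DemoLinkOnceLinkage = 1,
    /* value 2 was removed; the gap is left so nothing is renumbered */
    DemoWeakLinkage = 3,
    DemoNewLinkage = 4 /* new values are appended, never inserted */
} DemoLinkage;

/* Pin the numeric values that foreign-language bindings have hard-coded. */
static_assert(DemoExternalLinkage == 0, "enum value renumbered");
static_assert(DemoLinkOnceLinkage == 1, "enum value renumbered");
static_assert(DemoWeakLinkage == 3, "enum value renumbered");

/* Assumption: on the target ABI this enum is int-sized. Adding a value
 * that does not fit in 32 bits would trip this check. */
static_assert(sizeof(DemoLinkage) == sizeof(int),
              "underlying integer type of the enum changed");

/* Old callers can still pass a removed value, so validate raw input. */
int demo_linkage_is_valid(int raw) {
    switch (raw) {
    case DemoExternalLinkage:
    case DemoLinkOnceLinkage:
    case DemoWeakLinkage:
    case DemoNewLinkage:
        return 1;
    default:
        return 0; /* includes the removed value 2 */
    }
}
```

Explicit initializers on every enumerator make an accidental renumbering much harder than relying on implicit ordering.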