This is an archive of the discontinued LLVM Phabricator instance.


Differential D7288

Introduce bitset metadata format and bitset lowering pass.
Closed, Public

Authored by pcc on Jan 29 2015, 6:42 PM.

Details

Reviewers

silvas

Commits

rGe6909c8e8ba0: Introduce bitset metadata format and bitset lowering pass.
rL230054: Introduce bitset metadata format and bitset lowering pass.

Summary

This patch introduces a new mechanism that allows IR modules to co-operatively
build pointer sets corresponding to addresses within a given set of
globals. One particular use case for this is to allow a C++ program to
efficiently verify (at each call site) that a vtable pointer is in the set
of valid vtable pointers for the class or its derived classes. One way of
doing this is for a toolchain component to build, for each class, a bit set
that maps to the memory region allocated for the vtables, such that each 1
bit in the bit set maps to a valid vtable for that class, and lay out the
vtables next to each other, to minimize the total size of the bit sets.
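A minimal sketch of the kind of check this paragraph describes, assuming the candidate objects sit in one contiguous region at a fixed power-of-two spacing. The names `BitSetView` and `isMember` are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one bit set over a contiguous memory region.
struct BitSetView {
  uint64_t Base;          // address of the first candidate slot
  unsigned AlignLog2;     // log2 of the spacing between candidate slots
  std::vector<bool> Bits; // Bits[i] is true if slot i is a valid target
};

// Returns true if Addr is a member of the set described by BS.
inline bool isMember(const BitSetView &BS, uint64_t Addr) {
  if (Addr < BS.Base)
    return false;
  uint64_t Delta = Addr - BS.Base;
  // Reject addresses that fall between slots.
  if (Delta & ((uint64_t(1) << BS.AlignLog2) - 1))
    return false;
  uint64_t Index = Delta >> BS.AlignLog2;
  // Range check before touching the bit vector.
  if (Index >= BS.Bits.size())
    return false;
  return BS.Bits[Index];
}
```

Laying the objects out next to each other keeps `Bits` short, which is the size argument made in the summary.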

The patch introduces a metadata format for representing pointer sets, an
'@llvm.bitset.test' intrinsic and an LTO lowering pass that lays out the globals
and builds the bitsets, and documents the new feature.

Design discussion: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-January/081285.html

Diff Detail

Event Timeline

pcc updated this revision to Diff 19019. Jan 29 2015, 6:42 PM

pcc retitled this revision to "Introduce bitset global attribute and bitset lowering pass."

pcc updated this object.

pcc edited the test plan for this revision.

pcc added a reviewer: silvas.

pcc added subscribers: kcc, Unknown Object (MLST).

pcc added a subscriber: jfb. Jan 29 2015, 8:04 PM

My overarching concern here is extending the IR and bitcode for a feature whose utility hasn't been demonstrated. There doesn't seem to be consensus on llvmdev that this feature is worth the maintenance cost.

Correctly handle multiple calls to llvm.bitset.test for a single bitset

Use metadata instead of introducing the bitset global feature
Move bitcode format updates to separate patch

Do you want to also change the patch summary?

Done.

Unfortunately I've found that this won't work if a pass (such as globaldce) calls removeDeadConstantUsers on the vtable global, as this will cause the getelementptr to be deleted and replaced with null in the metadata.

I'll try to fix this by changing the metadata format to be of the form:

{ bitset name, global, byte offset }

Switch to metadata format { bitset name, global, byte offset } to avoid removeDeadConstantUsers issue

Two comments on the doc so far.

docs/LangRef.rst
1866	two elements... and the third is ...
1889	I'd suggest having an example with a non-zero offset (otherwise it's not clear why you need offsets)

pcc added inline comments. Feb 4 2015, 11:56 AM

docs/LangRef.rst
1866	Done.
1889	Done.

Documentation fixes

Can you please break the docs for this into a separate document? This is what we have been doing lately for "large" new IR features, like stackmaps, statepoints and inalloca. I think this feature qualifies. The separate document format gives a bit more breathing room for discussing the design and rationale of the feature, along with providing more in-depth explanation and examples.

+1 for a separate doc file

Move documentation to separate page

kcc added inline comments. Feb 4 2015, 6:57 PM

lib/Transforms/IPO/LowerBitSets.cpp
45	} // namespace
55	I wonder if you could split this 300 LOC function into several smaller ones.

I haven't looked at the big function or tests in depth yet, but I agree with @kcc that breaking up the function would make it easier to review.

Could you also add stats to the pass? I'd like Chrome to be able to have a sanity check test that this pass does something.

lib/Transforms/IPO/LowerBitSets.cpp
59	Shouldn't it be an error to try to run the bit set test pass without a module containing the `llvm.bitset.test` intrinsic?
76	What other valid uses of the intrinsic are there?
90	I'm not sure I get what's going on here :-)
182	`UNINT64_MIN` or `std::numeric_limits<uint64_t>::min()` is nicer.

Address review comments

lib/Transforms/IPO/LowerBitSets.cpp
45	Done.
55	Done.
59	This pass should work with any module, including modules that happen not to contain calls to the intrinsic.
76	There shouldn't be any (or at least the verifier should have errored on them). I've changed this to a `cast`.
90	I've added a comment which should hopefully clarify things here.
182	(I think you mean `::max`) Done.

s/constants/metadata/

Add statistics

Few comments in random places, more to go

lib/Transforms/IPO/LowerBitSets.cpp
11	are we going to handle InvokeInst? Check llvm/IR/CallSite.h
56	SmallVector maybe
107	I'd prefer if we report_fatal_error w/o DL, just like in AddressSanitizer.cpp

kcc added inline comments. Feb 5 2015, 4:10 PM

lib/Transforms/IPO/LowerBitSets.cpp
116	I wonder if this thing is unit-testable separately? We are not doing these things frequently, but I think we should. E.g. see ASanStackFrameLayoutTest.cpp
122	SmallVector, here and maybe in other places.
211	Clever!
223	Do we really want a separate BB here? On valid programs, i.e. in 99.9999% cases both checks will pass, so it might be better to do this w/o branches.

This also introduces a couple of optimizations which yielded a binary size
improvement of about 0.5% in a certain medium size (~14MB) binary.

Enforce availability of data layout
Check if the pointer is statically a member of the bitset; if so, remove the check
Add fast check for case where bitset contains single element
Use SmallVector
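The single-element fast check listed above can be pictured as follows. This is only a sketch under the assumption that the set has exactly one member at a known address; `testSingleElement` is a hypothetical name, not a function from the patch.

```cpp
#include <cassert>
#include <cstdint>

// With exactly one member, membership is plain address equality:
// no range check, alignment check, or bit set load is needed.
inline bool testSingleElement(uint64_t Addr, uint64_t OnlyMember) {
  return Addr == OnlyMember;
}
```

The statically-known case goes one step further: if the tested pointer is known at compile time to be a member, the check folds to a constant and can be removed.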

pcc added inline comments. Feb 6 2015, 12:44 PM

lib/Transforms/IPO/LowerBitSets.cpp
11	It isn't normally possible to invoke an intrinsic -- I think this is only possible for two specific intrinsics. In any case, this intrinsic cannot unwind.
56	I don't think we can make any prediction one way or another how large these are going to get. For vptr CFI it depends on the number of virtual calls made using a particular class, which is essentially unbounded. In the absence of any evidence I'd prefer to leave this as a `std::vector`.
107	Done.
116	Most likely. I think this already has enough test coverage for the moment though.
122	Done. This is probably better as a SmallVector since it lives on the stack and we can probably make an educated guess on its size.
223	It seems better to have separate BBs for the llvm.bitset.test intrinsic at least. Without the separate BBs, we could segv on out-of-bounds addresses, and it's possible that the caller might always need the boolean result (say if it wanted to print a diagnostic message if the check fails). If we do introduce intrinsics that are required to abort the program if the address is out of bounds, we could probably do both checks in the same BB, as either way the program would abort (but it would be harder to distinguish CFI failures from other types of crashes).

kcc added inline comments. Feb 6 2015, 3:11 PM

lib/Transforms/IPO/LowerBitSets.cpp
169	do you need M here?
198	I'd split this function into two at this point. Then the second part of the function will become easily unit-testable.
202	This part is the key to understanding what we are doing here. It would be nice to have examples of input (array of offsets) and output (bitset). But then, a unit test would be a better example than comments. I do not agree with you here that we have enough test coverage and hence do not need more tests. Unit tests are not necessarily required only for test coverage -- they are also a good way to provide examples.
294	The code gen seems to generate the BT instruction sometimes. It may be better to just create an instruction sequence that will be lowered to BT. Example:

	void foo(unsigned char a, unsigned long b) {
	  if (a & (1 << b))
	    __builtin_trap();
	}

	0: 0f a3 f7	bt %esi,%edi
	3: 72 01	jb 6 <_Z3foohm+0x6>
	5: c3	retq
	6: 0f 0b	ud2

pcc added inline comments. Feb 6 2015, 3:26 PM

lib/Transforms/IPO/LowerBitSets.cpp
169	No, I'll remove it.
198	Makes sense, I'll do that.
202	Fair enough, I'll see if I can add a few unit tests.
294	Yes, we can do that. I did try before to emit the register variant of BT but I guess I wasn't using the correct pattern. I'll play around with the IR to see if I can get it to match. I was also thinking more of the memory variant of BT that basically does exactly what we want given a bit offset.
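To illustrate the input/output example requested for line 202 (an array of offsets in, a bitset out), here is a rough sketch. It is not the patch's `BitSetBuilder`; `BuiltBitSet` and `buildBitSet` are hypothetical names, and the stride computation is one plausible way to compact the set.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: a compact bit set starting at ByteOffset.
struct BuiltBitSet {
  uint64_t ByteOffset = 0; // smallest member offset
  unsigned AlignLog2 = 0;  // log2 of the common stride of all members
  std::vector<bool> Bits;  // one bit per stride-sized slot
};

// Builds a bit set from a non-empty list of byte offsets.
inline BuiltBitSet buildBitSet(const std::vector<uint64_t> &Offsets) {
  assert(!Offsets.empty() && "need at least one offset");
  BuiltBitSet BS;
  BS.ByteOffset = *std::min_element(Offsets.begin(), Offsets.end());
  uint64_t Max = *std::max_element(Offsets.begin(), Offsets.end());

  // OR together all distances from the minimum; the trailing zero bits of
  // the result give the largest power-of-two stride shared by every member.
  uint64_t Mask = 0;
  for (uint64_t O : Offsets)
    Mask |= O - BS.ByteOffset;
  while (Mask != 0 && (Mask & 1) == 0) {
    Mask >>= 1;
    ++BS.AlignLog2;
  }

  BS.Bits.assign(((Max - BS.ByteOffset) >> BS.AlignLog2) + 1, false);
  for (uint64_t O : Offsets)
    BS.Bits[(O - BS.ByteOffset) >> BS.AlignLog2] = true;
  return BS;
}
```

With offsets {4, 8, 16} this yields a base offset of 4, a stride of 4 bytes and the bits 1, 1, 0, 1.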

Use register variant of bt instruction on x86
Move bitset lowering pass before simplifycfg
If the bit set is sufficiently small, we can avoid a load by bit testing a constant
Update test case
Move part of implementation to testable location
Add unit tests for BitSetBuilder
Remove M parameter

Also implemented an optimization for bit sets of size <= 64 bits.
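A sketch of the idea behind the small-bitset optimization: when the whole set fits in 64 bits it can be carried as an integer constant and tested with a shift, so no byte has to be loaded from a global. `testSmallBitSet` is a hypothetical helper, and the caller is assumed to have range-checked the offset already.

```cpp
#include <cassert>
#include <cstdint>

// Bits packs up to 64 membership bits; bit i corresponds to slot i.
inline bool testSmallBitSet(uint64_t Bits, uint64_t BitOffset) {
  assert(BitOffset < 64 && "caller must range-check the offset first");
  // Shift the constant instead of loading from memory. A shift-and-mask
  // of this shape is the kind of pattern a backend can match to a
  // register bit-test instruction.
  return (Bits >> BitOffset) & 1;
}
```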

lib/Transforms/IPO/LowerBitSets.cpp
169	Done.
198	Done.
202	Done.
294	This now uses the register BT.

Update comment with benchmark results for memory bt

kcc added inline comments. Feb 18 2015, 2:49 PM

lib/Transforms/IPO/LowerBitSets.cpp
175	You can get all of these, except for IntPtrTy, from the IRBuilder. Just FYI, this way is also fine, I think.
189	I would add comments for every non-trivial function. In most cases I understand what these things are doing, but other readers (including myself in 6 months) will probably be puzzled.
292	split into a separate function?

Remove Int8Ty field.
Add function comments
Factor out some code into a function

lib/Transforms/IPO/LowerBitSets.cpp
189	Done
292	Done

jfb added inline comments. Feb 19 2015, 10:26 AM

lib/Transforms/IPO/LowerBitSets.cpp
43	1 should technically be a `uint64_t` too, though if the current code triggered UB things would be awful in other ways!
237	I'm mildly disappointed that the optimizer doesn't do this by taking into account ISA-specific sizes (and then removing the dead global because its address isn't taken).

Use a 64-bit shift, add unit tests for containsGlobalOffset

lib/Transforms/IPO/LowerBitSets.cpp
43	Thanks, fixed and added unit tests for this function.
237	This should in principle be possible, but this pass runs late so in any case it seems best to directly generate the IR we need.
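The remark on line 43 is about the width of the shifted literal. The following sketch is not the patch's code (`testBit64` is a hypothetical name); it only shows why the literal should be widened before shifting.

```cpp
#include <cassert>
#include <cstdint>

// Tests bit B of a 64-bit word. The literal is widened first so the shift
// is performed in 64 bits; a plain `1 << B` would shift an int, which is
// undefined once B reaches the width of int (commonly 32).
inline bool testBit64(uint64_t Word, unsigned B) {
  assert(B < 64);
  return (Word & (uint64_t(1) << B)) != 0;
}
```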

jfb added inline comments. Feb 19 2015, 12:51 PM

lib/Transforms/IPO/LowerBitSets.cpp
237	Does it need to run that late? 8 may not be the right number on all architectures, and you should see a good binary size reduction by GC'ing unreferenced globals (especially if CFI+devirtualization occurs).

pcc added inline comments. Feb 19 2015, 1:45 PM

lib/Transforms/IPO/LowerBitSets.cpp
237	It seemed better to run the pass later because it splits basic blocks, which could pessimize things in other passes, and bitset creation essentially locks in the set of virtual tables that appear in the binary, preventing them from being GCd. I'm not sure what the best way to deal with this might be. We might later want to split this pass into an early pass and a late pass, where the early pass uses bitset metadata to do devirtualization, a later globalopt pass GCs unused vtables and the late pass builds the actual bitsets.

jfb added inline comments. Feb 19 2015, 1:59 PM

lib/Transforms/IPO/LowerBitSets.cpp
237	Maybe I have a naive view, but shouldn't bitsets be GC'able if no test refers to them? This doesn't need to be an optimization that's specific to your code, LLVM can do this in general when a global doesn't escape and isn't address-taken (and in your case, is read-only). If this is correct, then I don't think you need to split up this pass, though I agree that you may want to do devirtualization earlier to expose more optimization opportunities. Under the current setup, do redundant tests in the same function get eliminated and control flow merged? This may be something that we can leave open for later changes: I think the current code is good in that it does what's required and is pretty efficient at it. I don't think the design will change substantially, but I do think there are further optimization opportunities here. WDYT?

pcc added inline comments. Feb 19 2015, 2:59 PM

lib/Transforms/IPO/LowerBitSets.cpp
237	> Maybe I have a naive view, but shouldn't bitsets be GC'able if no test refers to them?

	They could be, but the globals that the bitsets map onto (i.e. the vtables) cannot be GC'd because we lay them out in a specific order in this pass.

	We only build bitset constants for bitsets that are referred to by tests. The loop near the start of `LowerBitSets::buildBitSets` identifies all such bitsets by looking through the uses of the `llvm.bitset.test` intrinsic. If a particular test is dead, LLVM should equally be able to remove the dead test (as it is a readonly intrinsic) or remove a dead load from a bitset as part of DCE (in fact the former would probably be easier because of the simpler control flow).

	Another advantage of doing this late is that allowing the earlier passes to eliminate dead tests we potentially reduce the number of equivalence classes we need to create, which could result in smaller disjoint sets of classes and therefore smaller bitsets.

	> Under the current setup, do redundant tests in the same function get eliminated and control flow merged?

	Are you referring to cases where a virtual call happens twice through the same pointer?

	struct S {
	  virtual void f();
	};
	[...]
	S *p = ...;
	p->f();
	p->f();

	The problem is that it will be difficult to remove redundant tests because of the semantics of C++. In this case the function f could overwrite the memory region that p refers to with an object of a different derived class without invoking undefined behavior. We might want a flag that a user can use to promise that such things will never happen though.

	> I don't think the design will change substantially, but I do think there are further optimization opportunities here. WDYT?

	At a high level I do agree that there are optimization opportunities to pursue here. (I could elaborate, but this probably isn't the place.)
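The remark about equivalence classes and disjoint sets can be illustrated with a small union-find sketch. This is an assumption-laden model, not the pass itself: globals are plain integer ids, and `DisjointSets` and `groupGlobals` are hypothetical names (the patch itself touches LLVM's `EquivalenceClasses.h`).

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Minimal union-find over globals identified by small integer ids.
class DisjointSets {
  std::vector<int> Parent;

public:
  explicit DisjointSets(int N) : Parent(N) {
    std::iota(Parent.begin(), Parent.end(), 0);
  }
  int find(int X) {
    while (Parent[X] != X) {
      Parent[X] = Parent[Parent[X]]; // path halving
      X = Parent[X];
    }
    return X;
  }
  void unite(int A, int B) { Parent[find(A)] = find(B); }
};

// Merges every global referenced by the same live bit set into one class.
inline DisjointSets groupGlobals(int NumGlobals,
                                 const std::vector<std::vector<int>> &BitSets) {
  DisjointSets DS(NumGlobals);
  for (const auto &Members : BitSets)
    for (std::size_t I = 1; I < Members.size(); ++I)
      DS.unite(Members[0], Members[I]);
  return DS;
}
```

If an earlier pass removes the only test of a bit set that bridges two groups, that bit set no longer takes part in the merge and the groups stay separate, which is the size benefit described above.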

Comment at: lib/Transforms/IPO/LowerBitSets.cpp:236
@@ +235,3 @@
+ Value *BitOffset) {
+ if (BSI.Bits.size() <= 8) {
+ // If the bit set is sufficiently small, we can avoid a load by bit testing

>> Maybe I have a naive view, but shouldn't bitsets be GC'able if no test
>> refers to them? This doesn't need to be an optimization that's specific to
>> your code, LLVM can do this in general when a global doesn't escape and
>> isn't address-taken (and in your case, is read-only). If this is correct,
>> then I don't think you need to split up this pass, though I agree that you
>> may want to do devirtualization earlier to expose more optimization
>> opportunities.
>>
>> Under the current setup, do redundant tests in the same function get
>> eliminated and control flow merged?
>>
>> This may be something that we can leave open for later changes: I think
>> the current code is good in that it does what's required and is pretty
>> efficient at it. I don't think the design will change substantially, but I
>> do think there are further optimization opportunities here. WDYT?

>> Maybe I have a naive view, but shouldn't bitsets be GC'able if no test
>> refers to them?
>
> They could be, but the globals that the bitsets map onto (i.e. the
> vtables) cannot be GC'd because we lay them out in a specific order in this
> pass.
>
> We only build bitset constants for bitsets that are referred to by tests.
> The loop near the start of LowerBitSets::buildBitSets identifies all such
> bitsets by looking through the uses of the llvm.bitset.test intrinsic. If
> a particular test is dead, LLVM should equally be able to remove the dead
> test (as it is a readonly intrinsic) or remove a dead load from a bitset as
> part of DCE (in fact the former would probably be easier because of the
> simpler control flow).

OK, that's what I was hoping happens (I was afraid the optimization was
running too early to be able to do this).

> Another advantage of doing this late is that allowing the earlier passes to
> eliminate dead tests we potentially reduce the number of equivalence
> classes we need to create, which could result in smaller disjoint sets of
> classes and therefore smaller bitsets.

Agreed.

>> Under the current setup, do redundant tests in the same function get
>> eliminated and control flow merged?
>
> Are you referring to cases where a virtual call happens twice through the
> same pointer?

Yes, or any case where the test intrinsic is redundant (doesn't have to be
the same f).

> The problem is that it will be difficult to remove redundant tests because
> of the semantics of C++. In this case the function f could overwrite the
> memory region that p refers to with an object of a different derived class
> without invoking undefined behavior. We might want a flag that a user can
> use to promise that such things will never happen though.

Good point, I hadn't thought through that. It may be worth adding to the
design doc?

>> I don't think the design will change substantially, but I do think there
>> are further optimization opportunities here. WDYT?
>
> At a high level I do agree that there are optimization opportunities to
> pursue here. (I could elaborate, but this probably isn't the place.)

OK, overall this lgtm, I think we're on the same page w.r.t. potential
optimizations.

LGTM

I think at this point it will become easier to continue reviews/improvements if the code is committed.
Let's give one more day to other reviewers -- if you don't hear significant objections please commit tomorrow.

lib/Transforms/IPO/LowerBitSets.cpp
363	remove {}

Closed by commit rL230054: Introduce bitset metadata format and bitset lowering pass. (authored by pcc). Feb 20 2015, 12:33 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
docs/BitSets.rst	66 lines
docs/LangRef.rst	31 lines
docs/index.rst	1 line
include/llvm/ADT/EquivalenceClasses.h	2 lines
include/llvm/ADT/PointerUnion.h	6 lines
include/llvm/IR/Intrinsics.td	5 lines
include/llvm/InitializePasses.h	1 line
include/llvm/Transforms/IPO.h	4 lines
include/llvm/Transforms/IPO/LowerBitSets.h	78 lines
lib/Transforms/IPO/CMakeLists.txt	1 line
lib/Transforms/IPO/IPO.cpp	1 line
lib/Transforms/IPO/LowerBitSets.cpp	512 lines
lib/Transforms/IPO/PassManagerBuilder.cpp	3 lines
test/Transforms/LowerBitSets/constant.ll	34 lines
test/Transforms/LowerBitSets/simple.ll	127 lines
test/Transforms/LowerBitSets/single-offset.ll	40 lines
unittests/Transforms/CMakeLists.txt	1 line
unittests/Transforms/IPO/CMakeLists.txt	9 lines
unittests/Transforms/IPO/LowerBitSets.cpp	50 lines
unittests/Transforms/IPO/Makefile	14 lines
unittests/Transforms/Makefile	2 lines

Commit	Tree	Parents	Author	Summary	Date
fe9a85c2c1de	6eeb7302cf0f	f1f3c21c85ea	Peter Collingbourne	Update comment with benchmark results for memory bt	Feb 12 2015, 11:26 AM
f1f3c21c85ea	c9bce0f7f3c9	528a5d1e9d66	Peter Collingbourne	Remove M parameter	Feb 9 2015, 2:27 PM
528a5d1e9d66	e05ea1fdb000	21cff3940920	Peter Collingbourne	Add unit tests for BitSetBuilder	Feb 9 2015, 1:44 PM
21cff3940920	b1fe38646cae	72b75a32047c	Peter Collingbourne	Move part of implementation to testable location	Feb 9 2015, 1:25 PM
72b75a32047c	e9c8bed11ff7	974c2ba0d00d	Peter Collingbourne	Update test case	Feb 9 2015, 12:12 PM
974c2ba0d00d	79b9ae4b81ce	2d8e0fa9ce4d	Peter Collingbourne	If the bit set is sufficiently small, we can avoid a load by bit testing a…	Feb 6 2015, 8:08 PM
2d8e0fa9ce4d	c11d1903b5f4	5e941de6b494	Peter Collingbourne	Move bitset lowering pass before simplifycfg	Feb 6 2015, 6:27 PM
5e941de6b494	f675849b666f	52c4f8d79226	Peter Collingbourne	Use register variant of bt instruction on x86	Feb 6 2015, 6:09 PM
52c4f8d79226	eb75c18adb71	47f4f098de5f	Peter Collingbourne	Use SmallVector	Feb 6 2015, 12:36 PM
47f4f098de5f	60e9e6121541	fce3bc7f3bfd	Peter Collingbourne	Add fast check for case where bitset contains single element	Feb 6 2015, 10:47 AM
fce3bc7f3bfd	49baf043189d	29045f8d4abb	Peter Collingbourne	Check if the pointer is statically a member of the bitset; if so, remove the…	Feb 5 2015, 6:30 PM
29045f8d4abb	3d859d47af2d	dc48932689b5	Peter Collingbourne	Enforce availability of data layout	Feb 5 2015, 5:33 PM
dc48932689b5	cafe60a4a566	10dff646bc92	Peter Collingbourne	Add statistics	Feb 5 2015, 1:19 PM
10dff646bc92	a13b4d701bc4	719208df3193	Peter Collingbourne	s/constants/metadata/	Feb 5 2015, 1:10 PM
719208df3193	e5bb13c9a5d4	2470d2246024	Peter Collingbourne	Address review comments	Feb 5 2015, 11:48 AM
2470d2246024	db6147edc1e1	b3044b3ee04e	Peter Collingbourne	Move documentation to separate page	Feb 4 2015, 5:11 PM
b3044b3ee04e	638b30cb30b9	f47940e08da4	Peter Collingbourne	Documentation fixes	Feb 4 2015, 11:55 AM
f47940e08da4	cbea4706dbc6	25e0a815c498	Peter Collingbourne	Switch to metadata format { bitset name, global, byte offset } to avoid…	Feb 3 2015, 10:15 PM
25e0a815c498	0ee817b1c930	f172fdcec072	Peter Collingbourne	Use metadata instead of introducing the bitset global feature	Feb 3 2015, 3:41 PM
f172fdcec072	dfdd6d63226e	d6962cc3d06d	Peter Collingbourne	Correctly handle multiple calls to llvm.bitset.test for a single bitset	Jan 30 2015, 2:55 PM
d6962cc3d06d	444988a8b05e	e247dd283994	Peter Collingbourne	Introduce bitset global attribute and bitset lowering pass.	Jan 23 2015, 11:49 AM

Diff 19849

docs/BitSets.rst

This file was added.

				=======
				Bitsets
				=======

				This is a mechanism that allows IR modules to co-operatively build pointer
				sets corresponding to addresses within a given set of globals. One example
				of a use case for this is to allow a C++ program to efficiently verify (at
				each call site) that a vtable pointer is in the set of valid vtable pointers
				for the type of the class or its derived classes.

				To use the mechanism, a client creates a global metadata node named
				``llvm.bitsets``. Each element is a metadata node with three elements:
				the first is a metadata string containing an identifier for the bitset,
				the second is a global variable and the third is a byte offset into the
				global variable.

				This will cause a link-time optimization pass to generate bitsets from the
				memory addresses referenced from the elements of the bitset metadata. The pass
				will lay out the referenced globals consecutively, so their definitions must
				be available at LTO time. An intrinsic, :ref:`llvm.bitset.test <bitset.test>`,
				generates code to test whether a given pointer is a member of a bitset.

				:Example:

				::

				  target datalayout = "e-p:32:32"

				  @a = internal global i32 0
				  @b = internal global i32 0
				  @c = internal global i32 0
				  @d = internal global [2 x i32] [i32 0, i32 0]

				  !llvm.bitsets = !{!0, !1, !2, !3, !4}

				  !0 = !{!"bitset1", i32* @a, i32 0}
				  !1 = !{!"bitset1", i32* @b, i32 0}
				  !2 = !{!"bitset2", i32* @b, i32 0}
				  !3 = !{!"bitset2", i32* @c, i32 0}
				  !4 = !{!"bitset2", i32* @d, i32 4}

				  declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone

				  define i1 @foo(i32* %p) {
				    %pi8 = bitcast i32* %p to i8*
				    %x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset1")
				    ret i1 %x
				  }

				  define i1 @bar(i32* %p) {
				    %pi8 = bitcast i32* %p to i8*
				    %x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset2")
				    ret i1 %x
				  }

				  define void @main() {
				    %a1 = call i1 @foo(i32* @a) ; returns 1
				    %b1 = call i1 @foo(i32* @b) ; returns 1
				    %c1 = call i1 @foo(i32* @c) ; returns 0
				    %a2 = call i1 @bar(i32* @a) ; returns 0
				    %b2 = call i1 @bar(i32* @b) ; returns 1
				    %c2 = call i1 @bar(i32* @c) ; returns 1
				    %d02 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 0)) ; returns 0
				    %d12 = call i1 @bar(i32* getelementptr ([2 x i32]* @d, i32 0, i32 1)) ; returns 1
				    ret void
				  }
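As a cross-check of the expected results in `@main` (this note is not part of the added file), the example can be modelled outside LLVM as plain set membership. The sketch assumes the pass lays out `@a`, `@b`, `@c` and `@d` consecutively with no padding, starting at address 0; the addresses and the names `AddrA` to `AddrD`, `bitsets` and `bitsetTest` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Assumed layout: @a, @b, @c (4 bytes each) followed by @d (8 bytes).
constexpr uint64_t AddrA = 0;
constexpr uint64_t AddrB = 4;
constexpr uint64_t AddrC = 8;
constexpr uint64_t AddrD = 12;

// Member addresses of each bit set, taken from the !llvm.bitsets entries
// (address of the global plus the byte offset).
inline const std::map<std::string, std::set<uint64_t>> &bitsets() {
  static const std::map<std::string, std::set<uint64_t>> Sets = {
      {"bitset1", {AddrA + 0, AddrB + 0}},
      {"bitset2", {AddrB + 0, AddrC + 0, AddrD + 4}},
  };
  return Sets;
}

// Models llvm.bitset.test as plain set membership.
inline bool bitsetTest(uint64_t Ptr, const std::string &Name) {
  return bitsets().at(Name).count(Ptr) != 0;
}
```

The non-zero offset in entry `!4` is what makes the second element of `@d` a member while the first is not.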

docs/LangRef.rst

Show First 20 Lines • Show All 1,857 Lines • ▼ Show 20 Lines

The LLVM type system is one of the most important features of the		The LLVM type system is one of the most important features of the
intermediate representation. Being typed enables a number of		intermediate representation. Being typed enables a number of
optimizations to be performed on the intermediate representation		optimizations to be performed on the intermediate representation
directly, without having to do extra analyses on the side before the		directly, without having to do extra analyses on the side before the
transformation. A strong type system makes it easier to read the		transformation. A strong type system makes it easier to read the
generated code and enables novel analyses and transformations that are		generated code and enables novel analyses and transformations that are
not feasible to perform on normal three address code representations.		not feasible to perform on normal three address code representations.

		kcc: two elements... and the third is ...
		pcc: Done.
.. _t_void:		.. _t_void:

Void Type		Void Type
---------		---------

:Overview:		:Overview:


The void type does not represent any value and has no size.		The void type does not represent any value and has no size.

:Syntax:		:Syntax:


::		::

void		void


.. _t_function:		.. _t_function:

Function Type		Function Type
-------------		-------------

		kcc: I'd suggest having an example with a non-zero offset (otherwise it's not clear why you need offsets)
		pcc: Done.
:Overview:		:Overview:


The function type can be thought of as a function signature. It consists of a		The function type can be thought of as a function signature. It consists of a
return type and a list of formal parameter types. The return type of a function		return type and a list of formal parameter types. The return type of a function
type is a void type or first class type --- except for :ref:`label <t_label>`		type is a void type or first class type --- except for :ref:`label <t_label>`
and :ref:`metadata <t_metadata>` types.		and :ref:`metadata <t_metadata>` types.

▲ Show 20 Lines • Show All 1,402 Lines • ▼ Show 20 Lines	inner.for.end:
br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2		br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2

outer.for.end: ; preds = %for.body		outer.for.end: ; preds = %for.body
...		...
!0 = !{!1, !2} ; a list of loop identifiers		!0 = !{!1, !2} ; a list of loop identifiers
!1 = !{!1} ; an identifier for the inner loop		!1 = !{!1} ; an identifier for the inner loop
!2 = !{!2} ; an identifier for the outer loop		!2 = !{!2} ; an identifier for the outer loop

		'``llvm.bitsets``'
		^^^^^^^^^^^^^^^^^^

		The ``llvm.bitsets`` global metadata is used to implement
		:doc:`bitsets <BitSets>`.

Module Flags Metadata		Module Flags Metadata
=====================		=====================

Information about the module as a whole is difficult to convey to LLVM's		Information about the module as a whole is difficult to convey to LLVM's
subsystems. The LLVM IR isn't sufficient to transmit this information.		subsystems. The LLVM IR isn't sufficient to transmit this information.
The ``llvm.module.flags`` named metadata exists in order to facilitate		The ``llvm.module.flags`` named metadata exists in order to facilitate
this. These flags are in the form of key / value pairs --- much like a		this. These flags are in the form of key / value pairs --- much like a
dictionary --- making it easy for any subsystem who cares about a flag to		dictionary --- making it easy for any subsystem who cares about a flag to
▲ Show 20 Lines • Show All 6,567 Lines • ▼ Show 20 Lines
used by the ``llvm.assume`` intrinsic in order to preserve the instructions		used by the ``llvm.assume`` intrinsic in order to preserve the instructions
only used to form the intrinsic's input argument. This might prove undesirable		only used to form the intrinsic's input argument. This might prove undesirable
if the extra information provided by the ``llvm.assume`` intrinsic does not cause		if the extra information provided by the ``llvm.assume`` intrinsic does not cause
sufficient overall improvement in code quality. For this reason,		sufficient overall improvement in code quality. For this reason,
``llvm.assume`` should not be used to document basic mathematical invariants		``llvm.assume`` should not be used to document basic mathematical invariants
that the optimizer can otherwise deduce or facts that are of little use to the		that the optimizer can otherwise deduce or facts that are of little use to the
optimizer.		optimizer.

		.. _bitset.test:

		'``llvm.bitset.test``' Intrinsic
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""

		::

		declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone


		Arguments:
		""""""""""

		The first argument is a pointer to be tested. The second argument is a
		metadata string containing the name of a :doc:`bitset <BitSets>`.

		Overview:
		"""""""""

		The ``llvm.bitset.test`` intrinsic tests whether the given pointer is a
		member of the given bitset.

'``llvm.donothing``' Intrinsic		'``llvm.donothing``' Intrinsic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 26 Lines

docs/index.rst

Show First 20 Lines • Show All 238 Lines • ▼ Show 20 Lines	.. toctree::
NVPTXUsage		NVPTXUsage
R600Usage		R600Usage
StackMaps		StackMaps
InAlloca		InAlloca
BigEndianNEON		BigEndianNEON
CoverageMappingFormat		CoverageMappingFormat
Statepoints		Statepoints
MergeFunctions		MergeFunctions
		BitSets

:doc:`WritingAnLLVMPass`		:doc:`WritingAnLLVMPass`
Information on how to write LLVM transformations and analyses.		Information on how to write LLVM transformations and analyses.

:doc:`WritingAnLLVMBackend`		:doc:`WritingAnLLVMBackend`
Information on how to write LLVM backends for machine targets.		Information on how to write LLVM backends for machine targets.

:doc:`CodeGenerator`		:doc:`CodeGenerator`
▲ Show 20 Lines • Show All 214 Lines • Show Last 20 Lines

include/llvm/ADT/EquivalenceClasses.h

Show First 20 Lines • Show All 249 Lines • ▼ Show 20 Lines	public:

explicit member_iterator() {}		explicit member_iterator() {}
explicit member_iterator(const ECValue *N) : Node(N) {}		explicit member_iterator(const ECValue *N) : Node(N) {}

reference operator*() const {		reference operator*() const {
assert(Node != nullptr && "Dereferencing end()!");		assert(Node != nullptr && "Dereferencing end()!");
return Node->getData();		return Node->getData();
}		}
reference operator->() const { return operator*(); }		pointer operator->() const { return &operator*(); }

member_iterator &operator++() {		member_iterator &operator++() {
assert(Node != nullptr && "++'d off the end of the list!");		assert(Node != nullptr && "++'d off the end of the list!");
Node = Node->getNext();		Node = Node->getNext();
return *this;		return *this;
}		}

member_iterator operator++(int) { // postincrement operators.		member_iterator operator++(int) { // postincrement operators.
Show All 17 Lines

include/llvm/ADT/PointerUnion.h

Show First 20 Lines • Show All 189 Lines • ▼ Show 20 Lines	namespace llvm {
}		}

template<typename PT1, typename PT2>		template<typename PT1, typename PT2>
static bool operator!=(PointerUnion<PT1, PT2> lhs,		static bool operator!=(PointerUnion<PT1, PT2> lhs,
PointerUnion<PT1, PT2> rhs) {		PointerUnion<PT1, PT2> rhs) {
return lhs.getOpaqueValue() != rhs.getOpaqueValue();		return lhs.getOpaqueValue() != rhs.getOpaqueValue();
}		}

		template<typename PT1, typename PT2>
		static bool operator<(PointerUnion<PT1, PT2> lhs,
		PointerUnion<PT1, PT2> rhs) {
		return lhs.getOpaqueValue() < rhs.getOpaqueValue();
		}

// Teach SmallPtrSet that PointerUnion is "basically a pointer", that has		// Teach SmallPtrSet that PointerUnion is "basically a pointer", that has
// # low bits available = min(PT1bits,PT2bits)-1.		// # low bits available = min(PT1bits,PT2bits)-1.
template<typename PT1, typename PT2>		template<typename PT1, typename PT2>
class PointerLikeTypeTraits<PointerUnion<PT1, PT2> > {		class PointerLikeTypeTraits<PointerUnion<PT1, PT2> > {
public:		public:
static inline void *		static inline void *
getAsVoidPointer(const PointerUnion<PT1, PT2> &P) {		getAsVoidPointer(const PointerUnion<PT1, PT2> &P) {
return P.getOpaqueValue();		return P.getOpaqueValue();

include/llvm/IR/Intrinsics.td

	def int_masked_store : Intrinsic<[], [llvm_anyvector_ty, LLVMPointerTo<0>,
llvm_i32_ty,		llvm_i32_ty,
LLVMVectorSameWidth<0, llvm_i1_ty>],		LLVMVectorSameWidth<0, llvm_i1_ty>],
[IntrReadWriteArgMem]>;		[IntrReadWriteArgMem]>;

def int_masked_load : Intrinsic<[llvm_anyvector_ty],		def int_masked_load : Intrinsic<[llvm_anyvector_ty],
[LLVMPointerTo<0>, llvm_i32_ty,		[LLVMPointerTo<0>, llvm_i32_ty,
LLVMVectorSameWidth<0, llvm_i1_ty>, LLVMMatchType<0>],		LLVMVectorSameWidth<0, llvm_i1_ty>, LLVMMatchType<0>],
[IntrReadArgMem]>;		[IntrReadArgMem]>;

		// Intrinsics to support bit sets.
		def int_bitset_test : Intrinsic<[llvm_i1_ty], [llvm_ptr_ty, llvm_metadata_ty],
		[IntrNoMem]>;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Target-specific intrinsics		// Target-specific intrinsics
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

include "llvm/IR/IntrinsicsPowerPC.td"		include "llvm/IR/IntrinsicsPowerPC.td"
include "llvm/IR/IntrinsicsX86.td"		include "llvm/IR/IntrinsicsX86.td"
include "llvm/IR/IntrinsicsARM.td"		include "llvm/IR/IntrinsicsARM.td"
include "llvm/IR/IntrinsicsAArch64.td"		include "llvm/IR/IntrinsicsAArch64.td"
include "llvm/IR/IntrinsicsXCore.td"		include "llvm/IR/IntrinsicsXCore.td"
include "llvm/IR/IntrinsicsHexagon.td"		include "llvm/IR/IntrinsicsHexagon.td"
include "llvm/IR/IntrinsicsNVVM.td"		include "llvm/IR/IntrinsicsNVVM.td"
include "llvm/IR/IntrinsicsMips.td"		include "llvm/IR/IntrinsicsMips.td"
include "llvm/IR/IntrinsicsR600.td"		include "llvm/IR/IntrinsicsR600.td"
include "llvm/IR/IntrinsicsBPF.td"		include "llvm/IR/IntrinsicsBPF.td"

include/llvm/InitializePasses.h

	void initializeLoopSimplifyPass(PassRegistry&);			void initializeLoopSimplifyPass(PassRegistry&);
	void initializeLoopStrengthReducePass(PassRegistry&);			void initializeLoopStrengthReducePass(PassRegistry&);
	void initializeGlobalMergePass(PassRegistry&);			void initializeGlobalMergePass(PassRegistry&);
	void initializeLoopRerollPass(PassRegistry&);			void initializeLoopRerollPass(PassRegistry&);
	void initializeLoopUnrollPass(PassRegistry&);			void initializeLoopUnrollPass(PassRegistry&);
	void initializeLoopUnswitchPass(PassRegistry&);			void initializeLoopUnswitchPass(PassRegistry&);
	void initializeLoopIdiomRecognizePass(PassRegistry&);			void initializeLoopIdiomRecognizePass(PassRegistry&);
	void initializeLowerAtomicPass(PassRegistry&);			void initializeLowerAtomicPass(PassRegistry&);
				void initializeLowerBitSetsPass(PassRegistry&);
	void initializeLowerExpectIntrinsicPass(PassRegistry&);			void initializeLowerExpectIntrinsicPass(PassRegistry&);
	void initializeLowerIntrinsicsPass(PassRegistry&);			void initializeLowerIntrinsicsPass(PassRegistry&);
	void initializeLowerInvokePass(PassRegistry&);			void initializeLowerInvokePass(PassRegistry&);
	void initializeLowerSwitchPass(PassRegistry&);			void initializeLowerSwitchPass(PassRegistry&);
	void initializeMachineBlockFrequencyInfoPass(PassRegistry&);			void initializeMachineBlockFrequencyInfoPass(PassRegistry&);
	void initializeMachineBlockPlacementPass(PassRegistry&);			void initializeMachineBlockPlacementPass(PassRegistry&);
	void initializeMachineBlockPlacementStatsPass(PassRegistry&);			void initializeMachineBlockPlacementStatsPass(PassRegistry&);
	void initializeMachineBranchProbabilityInfoPass(PassRegistry&);			void initializeMachineBranchProbabilityInfoPass(PassRegistry&);

include/llvm/Transforms/IPO.h

	//			//
	ModulePass *createMetaRenamerPass();			ModulePass *createMetaRenamerPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	/// createBarrierNoopPass - This pass is purely a module pass barrier in a pass			/// createBarrierNoopPass - This pass is purely a module pass barrier in a pass
	/// manager.			/// manager.
	ModulePass *createBarrierNoopPass();			ModulePass *createBarrierNoopPass();

				/// \brief This pass lowers bitset metadata and the llvm.bitset.test intrinsic
				/// to bitsets.
				ModulePass *createLowerBitSetsPass();

	} // End llvm namespace			} // End llvm namespace

	#endif			#endif

include/llvm/Transforms/IPO/LowerBitSets.h

This file was added.

				//===- LowerBitSets.h - Bitset lowering pass --------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines parts of the bitset lowering pass implementation that may
				// be usefully unit tested.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_IPO_LOWERBITSETS_H
				#define LLVM_TRANSFORMS_IPO_LOWERBITSETS_H

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/SmallVector.h"

				#include <stdint.h>
				#include <limits>
				#include <vector>

				namespace llvm {

				class DataLayout;
				class GlobalVariable;
				class Value;

				struct BitSetInfo {
				// The actual bitset.
				std::vector<uint8_t> Bits;

				// The byte offset into the combined global represented by the bitset.
				uint64_t ByteOffset;

				// The size of the bitset in bits.
				uint64_t BitSize;

				// Log2 alignment of the bit set relative to the combined global.
				// For example, a log2 alignment of 3 means that bits in the bitset
				// represent addresses 8 bytes apart.
				unsigned AlignLog2;

				bool isSingleOffset() const {
				return Bits.size() == 1 && Bits[0] == 1;
				}

				bool containsGlobalOffset(uint64_t Offset) const;

				bool containsValue(const DataLayout *DL,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout,
				Value *V, uint64_t COffset = 0) const;

				};

				struct BitSetBuilder {
				SmallVector<uint64_t, 16> Offsets;
				uint64_t Min, Max;

				BitSetBuilder() : Min(std::numeric_limits<uint64_t>::max()), Max(0) {}

				void addOffset(uint64_t Offset) {
				if (Min > Offset)
				Min = Offset;
				if (Max < Offset)
				Max = Offset;

				Offsets.push_back(Offset);
				}

				BitSetInfo build();
				};

				} // namespace llvm

				#endif

lib/Transforms/IPO/CMakeLists.txt

	add_llvm_library(LLVMipo			add_llvm_library(LLVMipo
	ArgumentPromotion.cpp			ArgumentPromotion.cpp
	BarrierNoopPass.cpp			BarrierNoopPass.cpp
	ConstantMerge.cpp			ConstantMerge.cpp
	DeadArgumentElimination.cpp			DeadArgumentElimination.cpp
	ExtractGV.cpp			ExtractGV.cpp
	FunctionAttrs.cpp			FunctionAttrs.cpp
	GlobalDCE.cpp			GlobalDCE.cpp
	GlobalOpt.cpp			GlobalOpt.cpp
	IPConstantPropagation.cpp			IPConstantPropagation.cpp
	IPO.cpp			IPO.cpp
	InlineAlways.cpp			InlineAlways.cpp
	InlineSimple.cpp			InlineSimple.cpp
	Inliner.cpp			Inliner.cpp
	Internalize.cpp			Internalize.cpp
	LoopExtractor.cpp			LoopExtractor.cpp
				LowerBitSets.cpp
	MergeFunctions.cpp			MergeFunctions.cpp
	PartialInlining.cpp			PartialInlining.cpp
	PassManagerBuilder.cpp			PassManagerBuilder.cpp
	PruneEH.cpp			PruneEH.cpp
	StripDeadPrototypes.cpp			StripDeadPrototypes.cpp
	StripSymbols.cpp			StripSymbols.cpp
	)			)

	add_dependencies(LLVMipo intrinsics_gen)			add_dependencies(LLVMipo intrinsics_gen)

lib/Transforms/IPO/IPO.cpp

	void llvm::initializeIPO(PassRegistry &Registry) {
initializeGlobalOptPass(Registry);		initializeGlobalOptPass(Registry);
initializeIPCPPass(Registry);		initializeIPCPPass(Registry);
initializeAlwaysInlinerPass(Registry);		initializeAlwaysInlinerPass(Registry);
initializeSimpleInlinerPass(Registry);		initializeSimpleInlinerPass(Registry);
initializeInternalizePassPass(Registry);		initializeInternalizePassPass(Registry);
initializeLoopExtractorPass(Registry);		initializeLoopExtractorPass(Registry);
initializeBlockExtractorPassPass(Registry);		initializeBlockExtractorPassPass(Registry);
initializeSingleLoopExtractorPass(Registry);		initializeSingleLoopExtractorPass(Registry);
		initializeLowerBitSetsPass(Registry);
initializeMergeFunctionsPass(Registry);		initializeMergeFunctionsPass(Registry);
initializePartialInlinerPass(Registry);		initializePartialInlinerPass(Registry);
initializePruneEHPass(Registry);		initializePruneEHPass(Registry);
initializeStripDeadPrototypesPassPass(Registry);		initializeStripDeadPrototypesPassPass(Registry);
initializeStripSymbolsPass(Registry);		initializeStripSymbolsPass(Registry);
initializeStripDebugDeclarePass(Registry);		initializeStripDebugDeclarePass(Registry);
initializeStripDeadDebugInfoPass(Registry);		initializeStripDeadDebugInfoPass(Registry);
initializeStripNonDebugSymbolsPass(Registry);		initializeStripNonDebugSymbolsPass(Registry);

lib/Transforms/IPO/LowerBitSets.cpp

This file was added.

				//===-- LowerBitSets.cpp - Bitset lowering pass ---------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass lowers bitset metadata and calls to the llvm.bitset.test intrinsic.
				// See http://llvm.org/docs/LangRef.html#bitsets for more information.
				kcc: Are we going to handle InvokeInst? Check llvm/IR/CallSite.h
				pcc: It isn't normally possible to invoke an intrinsic -- I think this is only possible for two specific intrinsics. In any case, this intrinsic cannot unwind.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/IPO/LowerBitSets.h"
				#include "llvm/Transforms/IPO.h"
				#include "llvm/ADT/EquivalenceClasses.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/IR/Constant.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/GlobalVariable.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/Intrinsics.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/Operator.h"
				#include "llvm/Pass.h"
				#include "llvm/Transforms/Utils/BasicBlockUtils.h"

				using namespace llvm;

				#define DEBUG_TYPE "lowerbitsets"

				STATISTIC(NumBitSetsCreated, "Number of bitsets created");
				STATISTIC(NumBitSetCallsLowered, "Number of bitset calls lowered");
				STATISTIC(NumBitSetDisjointSets, "Number of disjoint sets of bitsets");

				bool BitSetInfo::containsGlobalOffset(uint64_t Offset) const {
				if (Offset < ByteOffset)
				return false;

				if ((Offset - ByteOffset) % (1 << AlignLog2) != 0)
				return false;
				jfb: 1 should technically be a `uint64_t` too, though if the current code triggered UB things would be awful in other ways!
				pcc: Thanks, fixed and added unit tests for this function.

				uint64_t BitOffset = (Offset - ByteOffset) >> AlignLog2;
				kcc: } // namespace
				pcc: Done.
				if (BitOffset >= BitSize)
				return false;

				return (Bits[BitOffset / 8] >> (BitOffset % 8)) & 1;
				}

				bool BitSetInfo::containsValue(
				const DataLayout *DL,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout, Value *V,
				uint64_t COffset) const {
				kcc: I wonder if you could split this 300 LOC function into several smaller ones.
				pcc: Done.
				if (auto GV = dyn_cast<GlobalVariable>(V)) {
				kcc: SmallVector maybe
				pcc: I don't think we can make any prediction one way or another how large these are going to get. For vptr CFI it depends on the number of virtual calls made using a particular class, which is essentially unbounded. In the absence of any evidence I'd prefer to leave this as a `std::vector`.
				auto I = GlobalLayout.find(GV);
				if (I == GlobalLayout.end())
				return false;
				jfb: Shouldn't it be an error to try to run the bit set test pass without a module containing the `llvm.bitset.test` intrinsic?
				pcc: This pass should work with any module, including modules that happen not to contain calls to the intrinsic.
				return containsGlobalOffset(I->second + COffset);
				}

				if (auto GEP = dyn_cast<GEPOperator>(V)) {
				APInt APOffset(DL->getPointerSizeInBits(0), 0);
				bool Result = GEP->accumulateConstantOffset(*DL, APOffset);
				if (!Result)
				return false;
				COffset += APOffset.getZExtValue();
				return containsValue(DL, GlobalLayout, GEP->getPointerOperand(),
				COffset);
				}

				if (auto Op = dyn_cast<Operator>(V)) {
				if (Op->getOpcode() == Instruction::BitCast)
				return containsValue(DL, GlobalLayout, Op->getOperand(0), COffset);

				jfb: What other valid uses of the intrinsic are there?
				pcc: There shouldn't be any (or at least the verifier should have errored on them). I've changed this to a `cast`.
				if (Op->getOpcode() == Instruction::Select)
				return containsValue(DL, GlobalLayout, Op->getOperand(1), COffset) &&
				containsValue(DL, GlobalLayout, Op->getOperand(2), COffset);
				}

				return false;
				}

				BitSetInfo BitSetBuilder::build() {
				if (Min > Max)
				Min = 0;

				// Normalize each offset against the minimum observed offset, and compute
				// the bitwise OR of each of the offsets. The number of trailing zeros
				jfb: I'm not sure I get what's going on here :-)
				pcc: I've added a comment which should hopefully clarify things here.
				// in the mask gives us the log2 of the alignment of all offsets, which
				// allows us to compress the bitset by only storing one bit per aligned
				// address.
				uint64_t Mask = 0;
				for (uint64_t &Offset : Offsets) {
				Offset -= Min;
				Mask |= Offset;
				}

				BitSetInfo BSI;
				BSI.ByteOffset = Min;

				BSI.AlignLog2 = 0;
				// FIXME: Can probably do something smarter if all offsets are 0.
				if (Mask != 0)
				BSI.AlignLog2 = countTrailingZeros(Mask, ZB_Undefined);

				kcc: I'd prefer if we report_fatal_error w/o DL, just like in AddressSanitizer.cpp
				pcc: Done.
				// Build the compressed bitset while normalizing the offsets against the
				// computed alignment.
				BSI.BitSize = ((Max - Min) >> BSI.AlignLog2) + 1;
				uint64_t ByteSize = (BSI.BitSize + 7) / 8;
				BSI.Bits.resize(ByteSize);
				for (uint64_t Offset : Offsets) {
				Offset >>= BSI.AlignLog2;
				BSI.Bits[Offset / 8] |= 1 << (Offset % 8);
				}
				kcc: I wonder if this thing is unit-testable separately? We are not doing these things frequently, but I think we should. E.g. see ASanStackFrameLayoutTest.cpp
				pcc: Most likely. I think this already has enough test coverage for the moment though.

				return BSI;
				}
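Illustrative note, not part of the patch: the offset compression performed by `BitSetBuilder::build()` above can be sketched without any LLVM dependencies roughly as follows. `SketchBitSet` and `buildSketch` are hypothetical names introduced only for this example, the hand-written loop stands in for `countTrailingZeros`, and the sketch assumes a non-empty offset list.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for BitSetInfo, without the LLVM dependencies.
struct SketchBitSet {
  std::vector<uint8_t> Bits;
  uint64_t ByteOffset = 0;
  uint64_t BitSize = 0;
  unsigned AlignLog2 = 0;
};

// Follows the same steps as the build() function in the diff, for a
// non-empty list of byte offsets.
SketchBitSet buildSketch(std::vector<uint64_t> Offsets) {
  assert(!Offsets.empty() && "sketch only handles non-empty input");
  uint64_t Min = Offsets[0], Max = Offsets[0];
  for (uint64_t O : Offsets) {
    if (O < Min) Min = O;
    if (O > Max) Max = O;
  }

  // Normalize against the minimum and OR everything together; the trailing
  // zeros shared by all offsets give the common alignment.
  uint64_t Mask = 0;
  for (uint64_t &O : Offsets) {
    O -= Min;
    Mask |= O;
  }

  SketchBitSet BSI;
  BSI.ByteOffset = Min;
  while (Mask != 0 && (Mask & 1) == 0) {
    Mask >>= 1;
    ++BSI.AlignLog2;
  }

  // One bit per aligned address between Min and Max inclusive.
  BSI.BitSize = ((Max - Min) >> BSI.AlignLog2) + 1;
  BSI.Bits.resize((BSI.BitSize + 7) / 8);
  for (uint64_t O : Offsets) {
    O >>= BSI.AlignLog2;
    BSI.Bits[O / 8] |= uint8_t(1u << (O % 8));
  }
  return BSI;
}
```

For the offsets {16, 24, 40} the normalized values are 0, 8 and 24, so the mask is 24 (binary 11000) and the alignment is 2^3. The result has ByteOffset 16, AlignLog2 3, BitSize 4 and a single byte 0x0B (bits 0, 1 and 3 set).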

				namespace {

				kcc: SmallVector, here and maybe in other places.
				pcc: Done. This is probably better as a SmallVector since it lives on the stack and we can probably make an educated guess on its size.
				struct LowerBitSets : public ModulePass {
				static char ID;
				LowerBitSets() : ModulePass(ID) {
				initializeLowerBitSetsPass(*PassRegistry::getPassRegistry());
				}

				const DataLayout *DL;
				IntegerType *Int1Ty;
				IntegerType *Int8Ty;
				IntegerType *Int32Ty;
				Type *Int32PtrTy;
				IntegerType *Int64Ty;
				Type *IntPtrTy;

				// The llvm.bitsets named metadata.
				NamedMDNode *BitSetNM;

				// Mapping from bitset mdstrings to the call sites that test them.
				DenseMap<MDString *, std::vector<CallInst *>> BitSetTestCallSites;

				BitSetInfo
				buildBitSet(MDString *BitSet,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout);
				void
				lowerBitSetCall(CallInst *CI, const BitSetInfo &BSI,
				GlobalVariable *BitSetGlobal, GlobalVariable *CombinedGlobal,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout);
				void buildBitSetsFromGlobals(Module &M,
				const std::vector<MDString *> &BitSets,
				const std::vector<GlobalVariable *> &Globals);
				bool buildBitSets(Module &M);
				bool eraseBitSetMetadata(Module &M);

				bool doInitialization(Module &M) override;
				bool runOnModule(Module &M) override;
				};

				} // namespace

				INITIALIZE_PASS_BEGIN(LowerBitSets, "lowerbitsets",
				"Lower bitset metadata", false, false)
				INITIALIZE_PASS_END(LowerBitSets, "lowerbitsets",
				"Lower bitset metadata", false, false)
				char LowerBitSets::ID = 0;

				ModulePass *llvm::createLowerBitSetsPass() { return new LowerBitSets; }

				kcc: Do you need M here?
				pcc: No, I'll remove it.
				pcc: Done.
				bool LowerBitSets::doInitialization(Module &M) {
				DL = M.getDataLayout();
				if (!DL)
				report_fatal_error("Data layout required");

				Int1Ty = Type::getInt1Ty(M.getContext());
				kcc: You can get all of these, except for IntPtrTy, from the IRBuilder. Just FYI, this way is also fine, I think.
				Int8Ty = Type::getInt8Ty(M.getContext());
				Int32Ty = Type::getInt32Ty(M.getContext());
				Int32PtrTy = PointerType::getUnqual(Int32Ty);
				Int64Ty = Type::getInt64Ty(M.getContext());
				IntPtrTy = DL->getIntPtrType(M.getContext(), 0);

				BitSetNM = M.getNamedMetadata("llvm.bitsets");
				jfb: `UNINT64_MIN` or `std::numeric_limits<uint64_t>::min()` is nicer.
				pcc: (I think you mean `::max`) Done.

				BitSetTestCallSites.clear();

				return false;
				}

				BitSetInfo LowerBitSets::buildBitSet(
				kcc: I would add comments for every non-trivial function. In most cases, I understand what these things are doing, but other readers (including myself in 6 months) will probably be puzzled.
				pcc: Done.
				MDString *BitSet,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout) {
				BitSetBuilder BSB;

				// Compute the byte offset of each element of this bitset.
				if (BitSetNM) {
				for (MDNode *Op : BitSetNM->operands()) {
				if (Op->getOperand(0) != BitSet || !Op->getOperand(1))
				continue;
				kcc: I'd split this function into two at this point. Then the second part of the function will become easily unit-testable.
				pcc: Makes sense, I'll do that.
				pcc: Done.
				auto OpGlobal = cast<GlobalVariable>(
				cast<ConstantAsMetadata>(Op->getOperand(1))->getValue());
				uint64_t Offset =
				cast<ConstantInt>(cast<ConstantAsMetadata>(Op->getOperand(2))
				kcc: This part is the key to understanding what we are doing here. It would be nice to have examples of input (array of offsets) and output (bitset). But then, a unit test would be a better example than comments. I do not agree with you here that we have enough test coverage and hence do not need more tests. Unit tests are not necessarily required only for test coverage -- they are also a good way to provide examples.
				pcc: Fair enough, I'll see if I can add a few unit tests.
				pcc: Done.
				->getValue())->getZExtValue();

				Offset += GlobalLayout.find(OpGlobal)->second;

				BSB.addOffset(Offset);
				}
				}

				return BSB.build();
				kcc: Clever!
				}

				static Value *createMaskedBitTest(IRBuilder<> &B, Value *Bits,
				Value *BitOffset) {
				auto BitsType = cast<IntegerType>(Bits->getType());
				unsigned BitWidth = BitsType->getBitWidth();

				BitOffset = B.CreateZExtOrTrunc(BitOffset, BitsType);
				Value *BitIndex =
				B.CreateAnd(BitOffset, ConstantInt::get(BitsType, BitWidth - 1));
				Value *BitMask = B.CreateShl(ConstantInt::get(BitsType, 1), BitIndex);
				Value *MaskedBits = B.CreateAnd(Bits, BitMask);
				kcc: Do we really want a separate BB here? On valid programs, i.e. in 99.9999% cases both checks will pass, so it might be better to do this w/o branches.
				pcc: It seems better to have separate BBs for the llvm.bitset.test intrinsic at least. Without the separate BBs, we could segv on out-of-bounds addresses, and it's possible that the caller might always need the boolean result (say if it wanted to print a diagnostic message if the check fails). If we do introduce intrinsics that are required to abort the program if the address is out of bounds, we could probably do both checks in the same BB, as either way the program would abort (but it would be harder to distinguish CFI failures from other types of crashes).
				return B.CreateICmpNE(MaskedBits, ConstantInt::get(BitsType, 0));
				}

				void LowerBitSets::lowerBitSetCall(
				CallInst *CI, const BitSetInfo &BSI, GlobalVariable *BitSetGlobal,
				GlobalVariable *CombinedGlobal,
				const DenseMap<GlobalVariable *, uint64_t> &GlobalLayout) {
				Value *Ptr = CI->getArgOperand(0);

				if (BSI.containsValue(DL, GlobalLayout, Ptr)) {
				CI->replaceAllUsesWith(
				ConstantInt::getTrue(BitSetGlobal->getParent()->getContext()));
				CI->eraseFromParent();
				return;
				jfb: I'm mildly disappointed that the optimizer doesn't do this by taking into account ISA-specific sizes (and then removing the dead global because its address isn't taken).
				pcc: This should in principle be possible, but this pass runs late so in any case it seems best to directly generate the IR we need.
				jfb: Does it need to run that late? 8 may not be the right number on all architectures, and you should see a good binary size reduction by GC'ing unreferenced globals (especially if CFI+devirtualization occurs).
				pcc: It seemed better to run the pass later because it splits basic blocks, which could pessimize things in other passes, and bitset creation essentially locks in the set of virtual tables that appear in the binary, preventing them from being GCd. I'm not sure what the best way to deal with this might be. We might later want to split this pass into an early pass and a late pass, where the early pass uses bitset metadata to do devirtualization, a later globalopt pass GCs unused vtables and the late pass builds the actual bitsets.
				jfb: Maybe I have a naive view, but shouldn't bitsets be GC'able if no test refers to them? This doesn't need to be an optimization that's specific to your code, LLVM can do this in general when a global doesn't escape and isn't address-taken (and in your case, is read-only).
				If this is correct, then I don't think you need to split up this pass, though I agree that you may want to do devirtualization earlier to expose more optimization opportunities.
				Under the current setup, do redundant tests in the same function get eliminated and control flow merged?
				This may be something that we can leave open for later changes: I think the current code is good in that it does what's required and is pretty efficient at it. I don't think the design will change substantially, but I do think there are further optimization opportunities here. WDYT?
				pcc:
				> Maybe I have a naive view, but shouldn't bitsets be GC'able if no test refers to them?
				They could be, but the globals that the bitsets map onto (i.e. the vtables) cannot be GC'd because we lay them out in a specific order in this pass. We only build bitset constants for bitsets that are referred to by tests. The loop near the start of `LowerBitSets::buildBitSets` identifies all such bitsets by looking through the uses of the `llvm.bitset.test` intrinsic. If a particular test is dead, LLVM should equally be able to remove the dead test (as it is a readonly intrinsic) or remove a dead load from a bitset as part of DCE (in fact the former would probably be easier because of the simpler control flow).
				Another advantage of doing this late is that, by allowing the earlier passes to eliminate dead tests, we potentially reduce the number of equivalence classes we need to create, which could result in smaller disjoint sets of classes and therefore smaller bitsets.
				> Under the current setup, do redundant tests in the same function get eliminated and control flow merged?
				Are you referring to cases where a virtual call happens twice through the same pointer?
				  struct S { virtual void f(); };
				  [...]
				  S *p = ...;
				  p->f();
				  p->f();
				The problem is that it will be difficult to remove redundant tests because of the semantics of C++. In this case the function `f` could overwrite the memory region that `p` refers to with an object of a different derived class without invoking undefined behavior. We might want a flag that a user can use to promise that such things will never happen though.
				> I don't think the design will change substantially, but I do think there are further optimization opportunities here. WDYT?
				At a high level I do agree that there are optimization opportunities to pursue here. (I could elaborate, but this probably isn't the place.)
				}

				Constant *GlobalAsInt = ConstantExpr::getPtrToInt(CombinedGlobal, IntPtrTy);
				Constant *OffsetedGlobalAsInt = ConstantExpr::getAdd(
				GlobalAsInt, ConstantInt::get(IntPtrTy, BSI.ByteOffset));

				BasicBlock *InitialBB = CI->getParent();

				IRBuilder<> B(CI);

				Value *PtrAsInt = B.CreatePtrToInt(Ptr, IntPtrTy);

				if (BSI.isSingleOffset()) {
				Value *Eq = B.CreateICmpEQ(PtrAsInt, OffsetedGlobalAsInt);
				CI->replaceAllUsesWith(Eq);
				CI->eraseFromParent();
				return;
				}

				Value *PtrOffset = B.CreateSub(PtrAsInt, OffsetedGlobalAsInt);

				Value *BitOffset;
				if (BSI.AlignLog2 == 0) {
				BitOffset = PtrOffset;
				} else {
				// We need to check that the offset both falls within our range and is
				// suitably aligned. We can check both properties at the same time by
				// performing a right rotate by log2(alignment) followed by an integer
				// comparison against the bitset size. The rotate will move the lower
				// order bits that need to be zero into the higher order bits of the
				// result, causing the comparison to fail if they are nonzero. The rotate
				// also conveniently gives us a bit offset to use during the load from
				// the bitset.
				Value *OffsetSHR =
				B.CreateLShr(PtrOffset, ConstantInt::get(IntPtrTy, BSI.AlignLog2));
				Value *OffsetSHL = B.CreateShl(
				PtrOffset, ConstantInt::get(IntPtrTy, DL->getPointerSizeInBits(0) -
				BSI.AlignLog2));
				BitOffset = B.CreateOr(OffsetSHR, OffsetSHL);
				}

				Constant *BitSizeConst = ConstantInt::get(IntPtrTy, BSI.BitSize);
				Value *OffsetInRange = B.CreateICmpULT(BitOffset, BitSizeConst);

				TerminatorInst *Term = SplitBlockAndInsertIfThen(OffsetInRange, CI, false);
				IRBuilder<> ThenB(Term);

				// Now that we know that the offset is in range and aligned, load the
				// appropriate bit from the bitset. This pattern matches to the bt instruction
				// on x86. TODO: We might want to use the memory variant of the bt instruction
				// with the previously computed bit offset at -Os. This instruction does
				// exactly what we want but has been benchmarked as being slower than open
				// coding the load+bt.
				Value *Bit;
				if (BSI.Bits.size() <= 8) {
				kcc: split into a separate function?
				pcc: Done
				// If the bit set is sufficiently small, we can avoid a load by bit testing
				// a constant.
				kcc: The code gen seems to generate the BT instruction sometimes. It may be better to just create an instruction sequence that will be lowered to BT. Example:
				void foo(unsigned char a, unsigned long b) { if (a & (1 << b)) __builtin_trap(); }
				0: 0f a3 f7    bt %esi,%edi
				3: 72 01       jb 6 <_Z3foohm+0x6>
				5: c3          retq
				6: 0f 0b       ud2
				pcc: Yes, we can do that. I did try before to emit the register variant of BT but I guess I wasn't using the correct pattern. I'll play around with the IR to see if I can get it to match. I was also thinking more of the memory variant of BT that basically does exactly what we want given a bit offset.
				pcc: This now uses the register BT.
				IntegerType *BitsTy;
				if (BSI.Bits.size() <= 4)
				BitsTy = Int32Ty;
				else
				BitsTy = Int64Ty;

				uint64_t Bits = 0;
				for (auto I = BSI.Bits.rbegin(), E = BSI.Bits.rend(); I != E; ++I) {
				Bits <<= 8;
				Bits |= *I;
				}
				Constant *BitsConst = ConstantInt::get(BitsTy, Bits);
				Bit = createMaskedBitTest(ThenB, BitsConst, BitOffset);
				} else {
				Value *BitSetGlobalOffset =
				ThenB.CreateLShr(BitOffset, ConstantInt::get(IntPtrTy, 5));
				Value *BitSetEntryAddr = ThenB.CreateGEP(
				ConstantExpr::getBitCast(BitSetGlobal, Int32PtrTy), BitSetGlobalOffset);
				Value *BitSetEntry = ThenB.CreateLoad(BitSetEntryAddr);

				Bit = createMaskedBitTest(ThenB, BitSetEntry, BitOffset);
				}

				// The value we want is 0 if we came directly from the initial block
				// (having failed the range or alignment checks), or the loaded bit if
				// we came from the block in which we loaded it.
				B.SetInsertPoint(CI);
				PHINode *P = B.CreatePHI(Int1Ty, 2);
				P->addIncoming(ConstantInt::get(Int1Ty, 0), InitialBB);
				P->addIncoming(Bit, ThenB.GetInsertBlock());

				CI->replaceAllUsesWith(P);
				CI->eraseFromParent();
				}
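
The rotate trick described in the comments of `lowerBitSetCall` can be modelled outside of LLVM. The following is only a sketch under stated assumptions, not the code the pass emits: `BitSetModel`, `rotateRight32` and `bitSetTest` are hypothetical names, pointers are assumed to be 32 bits wide (as in the tests' `e-p:32:32` data layout), and the bitset is indexed by byte rather than loaded as 32-bit words.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the BitSetInfo fields that the lowering reads.
struct BitSetModel {
  std::vector<uint8_t> Bits; // one bit per aligned address, LSB first
  uint32_t ByteOffset;       // offset of the first member in the combined global
  uint32_t BitSize;          // number of meaningful bits
  unsigned AlignLog2;        // log2 of the alignment shared by all members
};

// Rotate right on a 32-bit value; R is assumed to be in [0, 31].
inline uint32_t rotateRight32(uint32_t V, unsigned R) {
  return R == 0 ? V : (V >> R) | (V << (32 - R));
}

// Models the lowered test for a pointer given as an integer. Base is the
// (assumed) address of the combined global.
inline bool bitSetTest(const BitSetModel &BS, uint32_t Base, uint32_t Ptr) {
  uint32_t PtrOffset = Ptr - (Base + BS.ByteOffset);
  // Misaligned offsets leave nonzero low bits, which the rotate moves to the
  // top of the word, so a single unsigned compare rejects both misaligned
  // and out-of-range pointers.
  uint32_t BitOffset = rotateRight32(PtrOffset, BS.AlignLog2);
  if (BitOffset >= BS.BitSize)
    return false;
  return (BS.Bits[BitOffset / 8] >> (BitOffset % 8)) & 1;
}
```

With the `bitset1` constants expected by simple.ll (nine bytes `03 00 00 00 00 00 00 00 04`, bit size 67, alignment log2 of 2), this model accepts offsets 0, 4 and 264 from the combined global and rejects 2, 8 and 268.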

				void LowerBitSets::buildBitSetsFromGlobals(
				Module &M,
				const std::vector<MDString *> &BitSets,
				const std::vector<GlobalVariable *> &Globals) {
				// Build a new global with the combined contents of the referenced globals.
				std::vector<Constant *> GlobalInits;
				for (GlobalVariable *G : Globals)
				GlobalInits.push_back(G->getInitializer());
				Constant *NewInit = ConstantStruct::getAnon(M.getContext(), GlobalInits);
				auto CombinedGlobal =
				new GlobalVariable(M, NewInit->getType(), /*isConstant=*/true,
				GlobalValue::PrivateLinkage, NewInit);

				const StructLayout *CombinedGlobalLayout =
				DL->getStructLayout(cast<StructType>(NewInit->getType()));

				// Compute the offsets of the original globals within the new global.
				DenseMap<GlobalVariable *, uint64_t> GlobalLayout;
				for (unsigned I = 0; I != Globals.size(); ++I) {
				GlobalLayout[Globals[I]] = CombinedGlobalLayout->getElementOffset(I);
				}

				// For each bitset in this disjoint set...
				for (MDString *BS : BitSets) {
				// Build the bitset.
				BitSetInfo BSI = buildBitSet(BS, GlobalLayout);

				// Create a global in which to store it.
				++NumBitSetsCreated;
				Constant *BitsConst = ConstantDataArray::get(M.getContext(), BSI.Bits);
				auto BitSetGlobal = new GlobalVariable(
				M, BitsConst->getType(), /*isConstant=*/true,
				GlobalValue::PrivateLinkage, BitsConst, BS->getString() + ".bits");

				kcc: remove {}
				// Lower each call to llvm.bitset.test for this bitset.
				for (CallInst *CI : BitSetTestCallSites[BS]) {
				++NumBitSetCallsLowered;
				lowerBitSetCall(CI, BSI, BitSetGlobal, CombinedGlobal, GlobalLayout);
				}
				}

				// Build aliases pointing to offsets into the combined global for each
				// global from which we built the combined global, and replace references
				// to the original globals with references to the aliases.
				for (unsigned I = 0; I != Globals.size(); ++I) {
				Constant *CombinedGlobalIdxs[] = {ConstantInt::get(Int32Ty, 0),
				ConstantInt::get(Int32Ty, I)};
				Constant *CombinedGlobalElemPtr =
				ConstantExpr::getGetElementPtr(CombinedGlobal, CombinedGlobalIdxs);
				GlobalAlias *GAlias = GlobalAlias::create(
				Globals[I]->getType()->getElementType(),
				Globals[I]->getType()->getAddressSpace(), Globals[I]->getLinkage(),
				"", CombinedGlobalElemPtr, &M);
				GAlias->takeName(Globals[I]);
				Globals[I]->replaceAllUsesWith(GAlias);
				Globals[I]->eraseFromParent();
				}
				}

				bool LowerBitSets::buildBitSets(Module &M) {
				Function *BitSetTestFunc =
				M.getFunction(Intrinsic::getName(Intrinsic::bitset_test));
				if (!BitSetTestFunc)
				return false;

				// Equivalence class set containing bitsets and the globals they reference.
				// This is used to partition the set of bitsets in the module into disjoint
				// sets.
				typedef EquivalenceClasses<PointerUnion<GlobalVariable *, MDString *>>
				GlobalClassesTy;
				GlobalClassesTy GlobalClasses;

				for (const Use &U : BitSetTestFunc->uses()) {
				auto CI = cast<CallInst>(U.getUser());

				auto BitSetMDVal = dyn_cast<MetadataAsValue>(CI->getArgOperand(1));
				if (!BitSetMDVal || !isa<MDString>(BitSetMDVal->getMetadata()))
				report_fatal_error(
				"Second argument of llvm.bitset.test must be metadata string");
				auto BitSet = cast<MDString>(BitSetMDVal->getMetadata());

				// Add the call site to the list of call sites for this bit set. We also use
				// BitSetTestCallSites to keep track of whether we have seen this bit set
				// before. If we have, we don't need to re-add the referenced globals to the
				// equivalence class.
				std::pair<DenseMap<MDString *, std::vector<CallInst *>>::iterator,
				bool> Ins =
				BitSetTestCallSites.insert(
				std::make_pair(BitSet, std::vector<CallInst *>()));
				Ins.first->second.push_back(CI);
				if (!Ins.second)
				continue;

				// Add the bitset to the equivalence class.
				GlobalClassesTy::iterator GCI = GlobalClasses.insert(BitSet);
				GlobalClassesTy::member_iterator CurSet = GlobalClasses.findLeader(GCI);

				if (!BitSetNM)
				continue;

				// Verify the bitset metadata and add the referenced globals to the bitset's
				// equivalence class.
				for (MDNode *Op : BitSetNM->operands()) {
				if (Op->getNumOperands() != 3)
				report_fatal_error(
				"All operands of llvm.bitsets metadata must have 3 elements");

				if (Op->getOperand(0) != BitSet || !Op->getOperand(1))
				continue;

				auto OpConstMD = dyn_cast<ConstantAsMetadata>(Op->getOperand(1));
				if (!OpConstMD)
				report_fatal_error("Bit set element must be a constant");
				auto OpGlobal = dyn_cast<GlobalVariable>(OpConstMD->getValue());
				if (!OpGlobal)
				report_fatal_error("Bit set element must refer to global");

				auto OffsetConstMD = dyn_cast<ConstantAsMetadata>(Op->getOperand(2));
				if (!OffsetConstMD)
				report_fatal_error("Bit set element offset must be a constant");
				auto OffsetInt = dyn_cast<ConstantInt>(OffsetConstMD->getValue());
				if (!OffsetInt)
				report_fatal_error(
				"Bit set element offset must be an integer constant");

				CurSet = GlobalClasses.unionSets(
				CurSet, GlobalClasses.findLeader(GlobalClasses.insert(OpGlobal)));
				}
				}

				if (GlobalClasses.empty())
				return false;

				// For each disjoint set we found...
				for (GlobalClassesTy::iterator I = GlobalClasses.begin(),
				E = GlobalClasses.end();
				I != E; ++I) {
				if (!I->isLeader()) continue;

				++NumBitSetDisjointSets;

				// Build the list of bitsets and referenced globals in this disjoint set.
				std::vector<MDString *> BitSets;
				std::vector<GlobalVariable *> Globals;
				for (GlobalClassesTy::member_iterator MI = GlobalClasses.member_begin(I);
				MI != GlobalClasses.member_end(); ++MI) {
				if ((*MI).is<MDString *>())
				BitSets.push_back(MI->get<MDString *>());
				else
				Globals.push_back(MI->get<GlobalVariable *>());
				}

				// Order bitsets and globals by name for determinism. TODO: We may later
				// want to use a more sophisticated ordering that lays out globals so as to
				// minimize the sizes of the bitsets.
				std::sort(BitSets.begin(), BitSets.end(), [](MDString *S1, MDString *S2) {
				return S1->getString() < S2->getString();
				});
				std::sort(Globals.begin(), Globals.end(),
				[](GlobalVariable *GV1, GlobalVariable *GV2) {
				return GV1->getName() < GV2->getName();
				});

				// Build the bitsets from this disjoint set.
				buildBitSetsFromGlobals(M, BitSets, Globals);
				}

				return true;
				}

				bool LowerBitSets::eraseBitSetMetadata(Module &M) {
				if (!BitSetNM)
				return false;

				M.eraseNamedMetadata(BitSetNM);
				return true;
				}

				bool LowerBitSets::runOnModule(Module &M) {
				bool Changed = buildBitSets(M);
				Changed |= eraseBitSetMetadata(M);
				return Changed;
				}
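
`buildBitSets` partitions bitsets and the globals they reference into disjoint sets, so that each set can be laid out as one combined global. As a rough illustration of that partitioning step only, here is a sketch that uses a small name-keyed union-find in place of `llvm::EquivalenceClasses`. `NameClasses` and `partitionBitSets` are hypothetical, and the `bitset:`/`global:` prefixes are just a way to keep the two kinds of member apart in one map.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Minimal union-find keyed by name (hypothetical helper, not the LLVM ADT).
class NameClasses {
  std::map<std::string, std::string> Parent;

public:
  // Returns the leader of X's class, inserting X as a singleton if it is new.
  std::string findLeader(const std::string &X) {
    auto It = Parent.find(X);
    if (It == Parent.end()) {
      Parent[X] = X;
      return X;
    }
    if (It->second == X)
      return X;
    std::string Next = It->second;
    std::string Root = findLeader(Next);
    Parent[X] = Root; // path compression
    return Root;
  }

  void unionSets(const std::string &A, const std::string &B) {
    std::string RA = findLeader(A), RB = findLeader(B);
    if (RA != RB)
      Parent[RA] = RB;
  }

  // Groups every known name under its leader. Members come out sorted by
  // name because std::map iterates in key order.
  std::map<std::string, std::vector<std::string>> classes() {
    std::vector<std::string> Names;
    for (const auto &KV : Parent)
      Names.push_back(KV.first);
    std::map<std::string, std::vector<std::string>> Result;
    for (const std::string &N : Names)
      Result[findLeader(N)].push_back(N);
    return Result;
  }
};

// Entries are (bitset name, global name) pairs, mirroring the first two
// operands of each !llvm.bitsets element.
inline std::map<std::string, std::vector<std::string>> partitionBitSets(
    const std::vector<std::pair<std::string, std::string>> &Entries) {
  NameClasses Classes;
  for (const auto &E : Entries)
    Classes.unionSets("bitset:" + E.first, "global:" + E.second);
  return Classes.classes();
}
```

Two bitsets end up in the same disjoint set as soon as they share a global, directly or transitively, which is why eliminating dead tests before this pass can shrink the sets.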

lib/Transforms/IPO/PassManagerBuilder.cpp

 if (LoadCombine)
   PM.add(createLoadCombinePass());
 
 // Cleanup and simplify the code after the scalar optimizations.
 PM.add(createInstructionCombiningPass());
 addExtensionsToPM(EP_Peephole, PM);
 
 PM.add(createJumpThreadingPass());
 
+// Lower bitset metadata to bitsets.
+PM.add(createLowerBitSetsPass());
+
 // Delete basic blocks, which optimization passes may have killed.
 PM.add(createCFGSimplificationPass());
 
 // Now that we have optimized the program, discard unreachable functions.
 PM.add(createGlobalDCEPass());
 
 // FIXME: this is profitable (for compiler time) to do at -O0 too, but
 // currently it damages debug info.

test/Transforms/LowerBitSets/constant.ll

This file was added.

				; RUN: opt -S -lowerbitsets < %s | FileCheck %s

				target datalayout = "e-p:32:32"

				@a = constant i32 1
				@b = constant [2 x i32] [i32 2, i32 3]

				!0 = !{!"bitset1", i32* @a, i32 0}
				!1 = !{!"bitset1", [2 x i32]* @b, i32 4}

				!llvm.bitsets = !{ !0, !1 }

				declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone

				; CHECK: @foo(
				define i1 @foo() {
				; CHECK: ret i1 true
				%x = call i1 @llvm.bitset.test(i8* bitcast (i32* @a to i8*), metadata !"bitset1")
				ret i1 %x
				}

				; CHECK: @bar(
				define i1 @bar() {
				; CHECK: ret i1 true
				%x = call i1 @llvm.bitset.test(i8* bitcast (i32* getelementptr ([2 x i32]* @b, i32 0, i32 1) to i8*), metadata !"bitset1")
				ret i1 %x
				}

				; CHECK: @baz(
				define i1 @baz() {
				; CHECK-NOT: ret i1 true
				%x = call i1 @llvm.bitset.test(i8* bitcast (i32* getelementptr ([2 x i32]* @b, i32 0, i32 0) to i8*), metadata !"bitset1")
				ret i1 %x
				}

test/Transforms/LowerBitSets/simple.ll

This file was added.

				; RUN: opt -S -lowerbitsets < %s | FileCheck %s
				; RUN: opt -S -O3 < %s | FileCheck -check-prefix=CHECK-NODISCARD %s

				target datalayout = "e-p:32:32"

				; CHECK: [[G:@[^ ]*]] = private constant { i32, [63 x i32], i32, [2 x i32] } { i32 1, [63 x i32] zeroinitializer, i32 3, [2 x i32] [i32 4, i32 5] }
				@a = constant i32 1
				@b = constant [63 x i32] zeroinitializer
				@c = constant i32 3
				@d = constant [2 x i32] [i32 4, i32 5]

				; Offset 0, 4 byte alignment
				; CHECK: @bitset1.bits = private constant [9 x i8] c"\03\00\00\00\00\00\00\00\04"
				!0 = !{!"bitset1", i32* @a, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset1", i32* @a, i32 0}
				!1 = !{!"bitset1", [63 x i32]* @b, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset1", [63 x i32]* @b, i32 0}
				!2 = !{!"bitset1", [2 x i32]* @d, i32 4}
				; CHECK-NODISCARD-DAG: !{!"bitset1", [2 x i32]* @d, i32 4}

				; Offset 4, 4 byte alignment
				; CHECK: @bitset2.bits = private constant [8 x i8] c"\01\00\00\00\00\00\00\80"
				!3 = !{!"bitset2", [63 x i32]* @b, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset2", [63 x i32]* @b, i32 0}
				!4 = !{!"bitset2", i32* @c, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset2", i32* @c, i32 0}

				; Offset 0, 256 byte alignment
				; CHECK: @bitset3.bits = private constant [1 x i8] c"\03"
				!5 = !{!"bitset3", i32* @a, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset3", i32* @a, i32 0}
				!6 = !{!"bitset3", i32* @c, i32 0}
				; CHECK-NODISCARD-DAG: !{!"bitset3", i32* @c, i32 0}

				; Entries whose second operand is null (the result of a global being DCE'd)
				; should be ignored.
				!7 = !{!"bitset2", null, i32 0}

				!llvm.bitsets = !{ !0, !1, !2, !3, !4, !5, !6, !7 }

				; CHECK: @a = alias getelementptr inbounds ({ i32, [63 x i32], i32, [2 x i32] }* [[G]], i32 0, i32 0)
				; CHECK: @b = alias getelementptr inbounds ({ i32, [63 x i32], i32, [2 x i32] }* [[G]], i32 0, i32 1)
				; CHECK: @c = alias getelementptr inbounds ({ i32, [63 x i32], i32, [2 x i32] }* [[G]], i32 0, i32 2)
				; CHECK: @d = alias getelementptr inbounds ({ i32, [63 x i32], i32, [2 x i32] }* [[G]], i32 0, i32 3)

				declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone

				; CHECK: @foo(i32* [[A0:%[^ ]*]])
				define i1 @foo(i32* %p) {
				; CHECK-NOT: llvm.bitset.test

				; CHECK: [[R0:%[^ ]*]] = bitcast i32* [[A0]] to i8*
				%pi8 = bitcast i32* %p to i8*
				; CHECK: [[R1:%[^ ]*]] = ptrtoint i8* [[R0]] to i32
				; CHECK: [[R2:%[^ ]*]] = sub i32 [[R1]], ptrtoint ({ i32, [63 x i32], i32, [2 x i32] }* [[G]] to i32)
				; CHECK: [[R3:%[^ ]*]] = lshr i32 [[R2]], 2
				; CHECK: [[R4:%[^ ]*]] = shl i32 [[R2]], 30
				; CHECK: [[R5:%[^ ]*]] = or i32 [[R3]], [[R4]]
				; CHECK: [[R6:%[^ ]*]] = icmp ult i32 [[R5]], 67
				; CHECK: br i1 [[R6]]

				; CHECK: [[R8:%[^ ]*]] = lshr i32 [[R5]], 5
				; CHECK: [[R9:%[^ ]*]] = getelementptr i32* bitcast ([9 x i8]* @bitset1.bits to i32*), i32 [[R8]]
				; CHECK: [[R10:%[^ ]*]] = load i32* [[R9]]
				; CHECK: [[R11:%[^ ]*]] = and i32 [[R5]], 31
				; CHECK: [[R12:%[^ ]*]] = shl i32 1, [[R11]]
				; CHECK: [[R13:%[^ ]*]] = and i32 [[R10]], [[R12]]
				; CHECK: [[R14:%[^ ]*]] = icmp ne i32 [[R13]], 0

				; CHECK: [[R16:%[^ ]*]] = phi i1 [ false, {{%[^ ]*}} ], [ [[R14]], {{%[^ ]*}} ]
				%x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset1")

				; CHECK-NOT: llvm.bitset.test
				%y = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset1")

				; CHECK: ret i1 [[R16]]
				ret i1 %x
				}

				; CHECK: @bar(i32* [[B0:%[^ ]*]])
				define i1 @bar(i32* %p) {
				; CHECK: [[S0:%[^ ]*]] = bitcast i32* [[B0]] to i8*
				%pi8 = bitcast i32* %p to i8*
				; CHECK: [[S1:%[^ ]*]] = ptrtoint i8* [[S0]] to i32
				; CHECK: [[S2:%[^ ]*]] = sub i32 [[S1]], add (i32 ptrtoint ({ i32, [63 x i32], i32, [2 x i32] }* [[G]] to i32), i32 4)
				; CHECK: [[S3:%[^ ]*]] = lshr i32 [[S2]], 2
				; CHECK: [[S4:%[^ ]*]] = shl i32 [[S2]], 30
				; CHECK: [[S5:%[^ ]*]] = or i32 [[S3]], [[S4]]
				; CHECK: [[S6:%[^ ]*]] = icmp ult i32 [[S5]], 64
				; CHECK: br i1 [[S6]]

				; CHECK: [[S8:%[^ ]*]] = zext i32 [[S5]] to i64
				; CHECK: [[S9:%[^ ]*]] = and i64 [[S8]], 63
				; CHECK: [[S10:%[^ ]*]] = shl i64 1, [[S9]]
				; CHECK: [[S11:%[^ ]*]] = and i64 -9223372036854775807, [[S10]]
				; CHECK: [[S12:%[^ ]*]] = icmp ne i64 [[S11]], 0

				; CHECK: [[S16:%[^ ]*]] = phi i1 [ false, {{%[^ ]*}} ], [ [[S12]], {{%[^ ]*}} ]
				%x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset2")
				; CHECK: ret i1 [[S16]]
				ret i1 %x
				}

				; CHECK: @baz(i32* [[C0:%[^ ]*]])
				define i1 @baz(i32* %p) {
				; CHECK: [[T0:%[^ ]*]] = bitcast i32* [[C0]] to i8*
				%pi8 = bitcast i32* %p to i8*
				; CHECK: [[T1:%[^ ]*]] = ptrtoint i8* [[T0]] to i32
				; CHECK: [[T2:%[^ ]*]] = sub i32 [[T1]], ptrtoint ({ i32, [63 x i32], i32, [2 x i32] }* [[G]] to i32)
				; CHECK: [[T3:%[^ ]*]] = lshr i32 [[T2]], 8
				; CHECK: [[T4:%[^ ]*]] = shl i32 [[T2]], 24
				; CHECK: [[T5:%[^ ]*]] = or i32 [[T3]], [[T4]]
				; CHECK: [[T6:%[^ ]*]] = icmp ult i32 [[T5]], 2
				; CHECK: br i1 [[T6]]

				; CHECK: [[T8:%[^ ]*]] = and i32 [[T5]], 31
				; CHECK: [[T9:%[^ ]*]] = shl i32 1, [[T8]]
				; CHECK: [[T10:%[^ ]*]] = and i32 3, [[T9]]
				; CHECK: [[T11:%[^ ]*]] = icmp ne i32 [[T10]], 0

				; CHECK: [[T16:%[^ ]*]] = phi i1 [ false, {{%[^ ]*}} ], [ [[T11]], {{%[^ ]*}} ]
				%x = call i1 @llvm.bitset.test(i8* %pi8, metadata !"bitset3")
				; CHECK: ret i1 [[T16]]
				ret i1 %x
				}

				; CHECK-NOT: !llvm.bitsets
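
The `i64 -9223372036854775807` operand checked in `@bar` and the `i32 3` operand checked in `@baz` are what the small-bitset path in `lowerBitSetCall` produces when it folds the bitset bytes into one integer constant. A standalone sketch of that packing loop follows; `packBits` is a hypothetical name, and the function assumes at most eight input bytes.

```cpp
#include <cstdint>
#include <vector>

// Packs up to eight bitset bytes into one integer, with byte 0 ending up in
// the least significant position (walks the bytes from last to first).
inline uint64_t packBits(const std::vector<uint8_t> &Bytes) {
  uint64_t Bits = 0;
  for (auto I = Bytes.rbegin(), E = Bytes.rend(); I != E; ++I) {
    Bits <<= 8;
    Bits |= *I;
  }
  return Bits;
}
```

For `bitset2` the bytes `01 00 00 00 00 00 00 80` pack to `0x8000000000000001`, which prints as `-9223372036854775807` when read as a signed 64-bit value. For `bitset3` the single byte `03` packs to 3.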

test/Transforms/LowerBitSets/single-offset.ll

This file was added.

				; RUN: opt -S -lowerbitsets < %s | FileCheck %s

				target datalayout = "e-p:32:32"

				; CHECK: [[G:@[^ ]*]] = private constant { i32, i32 }
				@a = constant i32 1
				@b = constant i32 2

				!0 = !{!"bitset1", i32* @a, i32 0}
				!1 = !{!"bitset1", i32* @b, i32 0}
				!2 = !{!"bitset2", i32* @a, i32 0}
				!3 = !{!"bitset3", i32* @b, i32 0}

				!llvm.bitsets = !{ !0, !1, !2, !3 }

				declare i1 @llvm.bitset.test(i8* %ptr, metadata %bitset) nounwind readnone

				; CHECK: @foo(i8* [[A0:%[^ ]*]])
				define i1 @foo(i8* %p) {
				; CHECK: [[R0:%[^ ]*]] = ptrtoint i8* [[A0]] to i32
				; CHECK: [[R1:%[^ ]*]] = icmp eq i32 [[R0]], ptrtoint ({ i32, i32 }* [[G]] to i32)
				%x = call i1 @llvm.bitset.test(i8* %p, metadata !"bitset2")
				; CHECK: ret i1 [[R1]]
				ret i1 %x
				}

				; CHECK: @bar(i8* [[B0:%[^ ]*]])
				define i1 @bar(i8* %p) {
				; CHECK: [[S0:%[^ ]*]] = ptrtoint i8* [[B0]] to i32
				; CHECK: [[S1:%[^ ]*]] = icmp eq i32 [[S0]], add (i32 ptrtoint ({ i32, i32 }* [[G]] to i32), i32 4)
				%x = call i1 @llvm.bitset.test(i8* %p, metadata !"bitset3")
				; CHECK: ret i1 [[S1]]
				ret i1 %x
				}

				; CHECK: @x(
				define i1 @x(i8* %p) {
				%x = call i1 @llvm.bitset.test(i8* %p, metadata !"bitset1")
				ret i1 %x
				}

unittests/Transforms/CMakeLists.txt

+add_subdirectory(IPO)
 add_subdirectory(Utils)

unittests/Transforms/IPO/CMakeLists.txt

This file was added.

				set(LLVM_LINK_COMPONENTS
				Core
				Support
				IPO
				)

				add_llvm_unittest(IPOTests
				LowerBitSets.cpp
				)

unittests/Transforms/IPO/LowerBitSets.cpp

This file was added.

				//===- LowerBitSets.cpp - Unit tests for bitset lowering ------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/Transforms/IPO/LowerBitSets.h"
				#include "gtest/gtest.h"

				using namespace llvm;

				TEST(LowerBitSets, BitSetBuilder) {
				struct {
				std::vector<uint64_t> Offsets;
				std::vector<uint8_t> Bits;
				uint64_t ByteOffset;
				uint64_t BitSize;
				unsigned AlignLog2;
				bool IsSingleOffset;
				} BSBTests[] = {
				{{}, {0}, 0, 1, 0, false},
				{{0}, {1}, 0, 1, 0, true},
				{{4}, {1}, 4, 1, 0, true},
				{{37}, {1}, 37, 1, 0, true},
				{{0, 1}, {3}, 0, 2, 0, false},
				{{0, 4}, {3}, 0, 2, 2, false},
				{{3, 7}, {3}, 3, 2, 2, false},
				{{0, 1, 7}, {131}, 0, 8, 0, false},
				{{0, 2, 14}, {131}, 0, 8, 1, false},
				{{0, 1, 8}, {3, 1}, 0, 9, 0, false},
				{{0, 2, 16}, {3, 1}, 0, 9, 1, false},
				};

				for (auto &&T : BSBTests) {
				BitSetBuilder BSB;
				for (auto Offset : T.Offsets)
				BSB.addOffset(Offset);

				BitSetInfo BSI = BSB.build();

				EXPECT_EQ(T.Bits, BSI.Bits);
				EXPECT_EQ(T.ByteOffset, BSI.ByteOffset);
				EXPECT_EQ(T.BitSize, BSI.BitSize);
				EXPECT_EQ(T.AlignLog2, BSI.AlignLog2);
				EXPECT_EQ(T.IsSingleOffset, BSI.isSingleOffset());
				}
				}
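
The header that defines `BitSetBuilder` is not shown in this part of the diff, so the following is only a guess at behaviour consistent with the table above, not the patch's implementation. `SketchBitSetInfo` and `buildSketch` are hypothetical names, and the `isSingleOffset` definition (exactly one byte equal to 1) is an assumption chosen to match the expected values.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the fields the unit test inspects.
struct SketchBitSetInfo {
  std::vector<uint8_t> Bits;
  uint64_t ByteOffset;
  uint64_t BitSize;
  unsigned AlignLog2;
  // Assumed definition: the set holds exactly one offset.
  bool isSingleOffset() const { return Bits.size() == 1 && Bits[0] == 1; }
};

inline SketchBitSetInfo buildSketch(const std::vector<uint64_t> &Offsets) {
  uint64_t Min = 0, Max = 0;
  if (!Offsets.empty()) {
    Min = *std::min_element(Offsets.begin(), Offsets.end());
    Max = *std::max_element(Offsets.begin(), Offsets.end());
  }

  // OR together the offsets relative to the minimum; the number of trailing
  // zero bits is the log2 of the alignment shared by all of them.
  uint64_t Mask = 0;
  for (uint64_t O : Offsets)
    Mask |= O - Min;
  unsigned AlignLog2 = 0;
  while (Mask != 0 && (Mask & 1) == 0) {
    Mask >>= 1;
    ++AlignLog2;
  }

  SketchBitSetInfo BSI;
  BSI.ByteOffset = Min;
  BSI.AlignLog2 = AlignLog2;
  BSI.BitSize = ((Max - Min) >> AlignLog2) + 1;
  BSI.Bits.assign((BSI.BitSize + 7) / 8, 0);

  // Store one bit per aligned address, least significant bit first.
  for (uint64_t O : Offsets) {
    uint64_t Bit = (O - Min) >> AlignLog2;
    BSI.Bits[Bit / 8] |= static_cast<uint8_t>(1u << (Bit % 8));
  }
  return BSI;
}
```

Dividing out the common alignment is what lets offsets `{0, 2, 16}` and `{0, 1, 8}` produce the same two bytes `{3, 1}` in the table, differing only in `AlignLog2`.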

unittests/Transforms/IPO/Makefile

This file was copied from unittests/Transforms/Makefile.

-##===- unittests/Transforms/Makefile ------------------------ Makefile --===##
+##===- unittests/Transforms/IPO/Makefile -------------------- Makefile --===##
 #
 # The LLVM Compiler Infrastructure
 #
 # This file is distributed under the University of Illinois Open Source
 # License. See LICENSE.TXT for details.
 #
 ##===----------------------------------------------------------------------===##
 
-LEVEL = ../..
+LEVEL = ../../..
+TESTNAME = IPO
+LINK_COMPONENTS := IPO
 
-PARALLEL_DIRS = Utils
-
-include $(LEVEL)/Makefile.common
-
-clean::
-	$(Verb) $(RM) -f *Tests
+include $(LEVEL)/Makefile.config
+include $(LLVM_SRC_ROOT)/unittests/Makefile.unittest

unittests/Transforms/Makefile

This file was copied to unittests/Transforms/IPO/Makefile.

 ##===- unittests/Transforms/Makefile ------------------------ Makefile --===##
 #
 # The LLVM Compiler Infrastructure
 #
 # This file is distributed under the University of Illinois Open Source
 # License. See LICENSE.TXT for details.
 #
 ##===----------------------------------------------------------------------===##
 
 LEVEL = ../..
 
-PARALLEL_DIRS = Utils
+PARALLEL_DIRS = IPO Utils
 
 include $(LEVEL)/Makefile.common
 
 clean::
 	$(Verb) $(RM) -f *Tests
