This is an archive of the discontinued LLVM Phabricator instance.

[BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently
Needs RevisionPublic

Authored by wmi on Feb 27 2017, 10:39 AM.

Details

Summary

reduceLoadOpStoreWidth is a useful optimization that already exists in DAGCombiner. It can shrink the bitfield store in the following testcase:

class A {
public:
  unsigned long f1:16;
  unsigned long f2:3;
};
A a;

void foo() {
  // if (a.f1 == 2) return;
  a.f1 = a.f1 + 3;
}

For a.f1 = a.f1 + 3, without reduceLoadOpStoreWidth in DAGCombiner, the code will be:
movl a(%rip), %eax
leal 3(%rax), %ecx
movzwl %cx, %ecx
andl $-65536, %eax # imm = 0xFFFF0000
orl %ecx, %eax
movl %eax, a(%rip)

with reduceLoadOpStoreWidth, the code will be:
movl a(%rip), %eax
addl $3, %eax
movw %ax, a(%rip)

However, if we uncomment the early return above, the load of a.f1 and the store of a.f1 end up in two different basic blocks, and reduceLoadOpStoreWidth in DAGCombiner cannot handle that.

The patch redoes the same optimization in InstCombine, so the optimization is no longer limited by basic block boundaries.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
wmi added inline comments.Feb 27 2017, 5:14 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1331 ↗(On Diff #89902)

Ah, I had a mental block here. I was thinking MaskLeadOnes had to be an APInt if I used APInt::countLeadingOnes, and APInt doesn't support the % operator.

It is better not to use APInt for it. I will fix it.

1388 ↗(On Diff #89902)

I am not expecting the alignment to increase. I am worried that the original alignment will be overestimated if it is applied directly to the new store, causing undefined behavior.
Suppose the original i32 store to address @a has 32-bit alignment. Now we will store an i16 to a.f2, which is at address "@a + 2B". "@a + 2B" should only have 16-bit alignment.

test/Transforms/InstCombine/bitfield-store.ll
89 ↗(On Diff #89902)

Sorry I don't get the point. Are you suggesting the following?

%bf.set = or i16 %bf.clear3, %bf.value
%bf.set.trunc = trunc i16 %bf.set to i13
store i13 %bf.set.trunc, i13* bitcast (%class.A4* @a4 to i13*), align 8

llvm will still generate the same code:

andl    $8191, %edi             # imm = 0x1FFF
movw    %di, a4(%rip)
efriedma added inline comments.Feb 27 2017, 6:05 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1388 ↗(On Diff #89902)

Suppose the original i32 store to address @a has 32-bit alignment. Now we will store an i16 to a.f2, which is at address "@a + 2B". "@a + 2B" should only have 16-bit alignment.

Suppose the original i32 store to address @a has 8-bit alignment. What is the alignment of "@a + 2B"? (You need to compute the GCD of the offset and the original alignment.)
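
A minimal sketch of that computation, assuming the original alignment and the byte offset are both available in bytes (llvm::MinAlign is the existing helper; the wrapper name is made up):

#include "llvm/Support/MathExtras.h"

// Known alignment of (Ptr + ByteOffset): the largest power of two that
// divides both the original alignment and the byte offset.
static unsigned computeNarrowAlign(unsigned OrigAlign, unsigned ByteOffset) {
  // MinAlign returns the greatest power of two dividing both values.
  return llvm::MinAlign(OrigAlign, ByteOffset);
}

// e.g. computeNarrowAlign(4, 2) == 2, and computeNarrowAlign(1, 2) == 1,
// matching the 8-bit-alignment example above.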

test/Transforms/InstCombine/bitfield-store.ll
89 ↗(On Diff #89902)

Oh, sorry, this isn't a good example; I mixed up the fields. But consider:

; class ATest {
;   unsigned long f1:13;
;   unsigned long f2:3;
; } atest;
; atest.f2 = n;

You could shrink the store here (trunc to i8).

wmi added inline comments.Feb 27 2017, 10:02 PM
lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1388 ↗(On Diff #89902)

You are right.

class A {
public:
  unsigned long f1:8;
  unsigned long f2:16;
  unsigned long f3:8;
};
A a;

void foo() {
  a.f2 = 3;
}

An i16 has 16-bit natural alignment, but a.f2 only has 8-bit alignment here.

test/Transforms/InstCombine/bitfield-store.ll
89 ↗(On Diff #89902)

Ah, I see what you mean. In your case, we can shrink the store, but we cannot remove the original load and the bit operations doing the masking. I can add the shrinking, but I am not sure whether it is better than not shrinking.

efriedma added inline comments.Mar 1 2017, 12:40 PM
test/Transforms/InstCombine/bitfield-store.ll
89 ↗(On Diff #89902)

It's a substantial improvement if you're transforming from an illegal type to a legal type. (I've been dealing with trying to optimize an i24 bitfield recently; see, for example, test/CodeGen/ARM/illegal-bitfield-loadstore.ll.) In other cases, you're right, it's not obviously profitable.

wmi updated this revision to Diff 90589.Mar 4 2017, 12:46 PM
wmi added a reviewer: eli.friedman.

Update patch according to Eli's comments.

  • Add a safety check to ensure there is no memory-modifying instruction between the load and the store.
  • Extend the shrinking functionality to cover the cases Eli gave me.
  • Refactor the code so that different shrinking requirements can share as much code as possible.
wmi added a comment.Mar 4 2017, 12:49 PM

Although I didn't find regressions in internal benchmark testing, I still moved the transformation to CodeGenPrepare because we want to use TargetLowering information to decide how to shrink in some cases.

mkuper added inline comments.Mar 6 2017, 7:42 PM
lib/CodeGen/CodeGenPrepare.cpp
5719 ↗(On Diff #90589)

We don't have alias analysis in CGP at all, do we?
Maybe it would be better to pull this out somewhere else (late in the pipeline).

5739 ↗(On Diff #90589)

Oh, ok, now I see why Eli suggested MemorySSA.
There has to be a better way to do this. (Although I can't think of one at the moment.)

5823 ↗(On Diff #90589)

Are you sure this does the right thing for xor?

5895 ↗(On Diff #90589)

I don't think this is what you meant to check here. :-)

5912 ↗(On Diff #90589)

Do we want to look through bitcasts? It probably doesn't matter in practice, though.

lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
1388 ↗(On Diff #89902)

Could you add a test-case like this?

mkuper added inline comments.Mar 6 2017, 8:02 PM
lib/CodeGen/CodeGenPrepare.cpp
5823 ↗(On Diff #90589)

Err, never mind, of course it does.

wmi added a comment.Mar 6 2017, 9:03 PM

Could you add a test-case like this?

Sure. I will add such a testcase after the other major issues are solved.

lib/CodeGen/CodeGenPrepare.cpp
5739 ↗(On Diff #90589)

Yes, if the optimization happens before LoadPRE, then this simple check is enough. If the optimization happens late in the pipeline, we need MemorySSA plus alias queries to do the safety check.

5895 ↗(On Diff #90589)

Oh, thanks for catching the stupid mistake.

5912 ↗(On Diff #90589)

Yes, I do. I have seen cases that need it.

wmi updated this revision to Diff 93018.Mar 24 2017, 3:53 PM
wmi added a reviewer: chandlerc.

Revamp the patch.

  • Extend bitfield store shrinking to handle and(or(and( ... or(load, C_1), MaskedVal_1), ..., C_N), MaskedVal_N))) pattern.
  • Add bitfield load shrinking.
  • Use MemorySSA to do the safety check and maintain it on the fly.

With all these changes, LLVM can now catch most of the shrinking opportunities in the testcase at http://lists.llvm.org/pipermail/llvm-commits/attachments/20170307/23ad5702/attachment-0001.cc, while still keeping its bitfield coalescing ability, by putting the shrinking pass late in the pipeline.

I need to add testcases for bitfield load shrinking. Will send out patch update soon.

wmi retitled this revision from [InstCombine] Redo reduceLoadOpStoreWidth in instcombine for bitfield store optimization. to [BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently.Mar 24 2017, 3:57 PM
chandlerc edited edge metadata.Mar 25 2017, 7:25 AM

Thanks a lot for working on this, first round of comments!

include/llvm/CodeGen/Passes.h
67–68

Since this is a CodeGen pass, the code should live in lib/CodeGen rather than in lib/Transforms.

lib/CodeGen/TargetPassConfig.cpp
480–482

This should probably be predicated on getOptLevel() much like below?

lib/Transforms/Scalar/BitfieldShrinking.cpp
1

I understand that the motivation here are bitfield accesses, but that isn't how we should describe the pass IMO.

This is a generic pass to narrow memory accesses, and I think you should name it and document it accordingly.

Naturally, you can still mention bitfields as one of the motivations and to help explain the specific patterns that are handled here. But if we have some other memory access shrinking we want to do, I would imagine that we would want to add it to this pass.

This will probably need to be propagated through many of the comments here.

10–19

See above about re-focusing the documentation here on the generic memory access narrowing, and making the details about bitfields part of the motivation.

I would also make sure to include here a high level overview of the approach / algorithm used. Things like the fact that this uses MemorySSA and is specifically designed to handle shrinking across control flow seems important.

I'd also suggest making this a \file doxygen comment.

54

Use C++ struct naming rather than a C-style typedef of an anonymous struct.

232–235

While I generally like the use of lambdas to help factor this code, I find the parameters which are changing with each loop iteration being captured by reference and so implicitly changing to be really confusing.

I would prefer to pass parameters that are fundamentally the input to the lambda as actual parameters, and use capture more for common context that isn't really specific to a particular call to the lambda. Does that make sense?

266–269

We try to avoid doing isa<Foo>(...) and then cast<Foo>(...) in LLVM (it adds overhead to asserts builds that can really add up and is a bit redundant). Instead, use dyn_cast here?

488–489

Is this valid at this point? It seems like it shouldn't be able to happen. I'd either use llvm_unreachable to mark that or add a comment explaining what is happening here.

568

Worklist is a more common name for this kind of vector in LLVM.

887–889

The pass manager already provides for facilities for printing before and after passes -- is this needed?

wmi added a comment.Apr 3 2017, 4:07 PM

Chandler, thanks for the review, and sorry about the delay in replying. It took me a while to fix some issues in the patch that I found while adding tests for the load shrinking part and running the unit tests.

include/llvm/CodeGen/Passes.h
67–68

Fixed.

lib/CodeGen/TargetPassConfig.cpp
480–482

Fixed.

lib/Transforms/Scalar/BitfieldShrinking.cpp
1

You are right. The impact of the pass is not limited to bitfield access. I renamed the pass to MemAccessShrink and changed the comments accordingly.

10–19

Added a high-level overview of the approach as suggested and used a \file doxygen comment.

54

Fixed.

232–235

It makes sense. Fixed.

266–269

Fixed.

488–489

Fixed.

568

Fixed.

887–889

Removed the printing.

wmi updated this revision to Diff 93973.Apr 3 2017, 4:44 PM
  • Address Chandler's comments.
  • Fix unittest errors.
  • Add unit tests for the load shrinking part. Add the original motivating case as a unit test.
  • Add cost evaluation for the case where there is a multiple-use node inside the shrinking pattern.

I'll find some time to look at the core algorithm later.

include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

I'm not sure I see the point of this hook. Every in-tree target has cheap i8 load/store and aligned i16 load/store operations. And we have existing hooks to check support for misaligned operations.

If there's some case I'm not thinking of, please add an example to the comment.

lib/CodeGen/MemAccessShrinking.cpp
52 ↗(On Diff #93973)

"try", not "tries".

1014 ↗(On Diff #93973)

If an instruction has no uses and isn't trivially dead, we're never going to erase it; no point to adding it to CandidatesToErase.

1027 ↗(On Diff #93973)

Should we clear CandidatesToErase here, as opposed to modifying it inside the loop?

Digging into the code next, but wanted to send some comments just on terminology and the documentation while I'm doing that.

lib/CodeGen/MemAccessShrinking.cpp
15 ↗(On Diff #93973)

nit: s/now//

26 ↗(On Diff #93973)

This sentence doesn't really parse for me, it reads as a command rather a description and comments typically are descriptive.

46 ↗(On Diff #93973)

"no more change has happened" -> "no more changes have happened"

50 ↗(On Diff #93973)

"mergd" -> "merged"

"load/store" needs to be "loads and stores" or "load/store instructions".

52 ↗(On Diff #93973)

I would reword this sentence to be a bit easier to read:

It provides scalable and precise safety checks even when we try to insert
a smaller access into a block which is many blocks away from the original
access.
53 ↗(On Diff #93973)

A comment on the terminology throughout this patch: the adjective describing something which has been reduced in size in the past is "shrunken". That said, if this is awkward to use, I might use the adjective "smaller". But "shrinked" isn't a word in English.

wmi added inline comments.Apr 21 2017, 4:31 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

It is because of some testcases for AMDGPU, like the testcase below:

define void @s_sext_in_reg_i1_i16(i16 addrspace(1)* %out, i32 addrspace(2)* %ptr) #0 {
  %ld = load i32, i32 addrspace(2)* %ptr
  %in = trunc i32 %ld to i16
  %shl = shl i16 %in, 15
  %sext = ashr i16 %shl, 15
  store i16 %sext, i16 addrspace(1)* %out
  ret void
}

code with the patch:
s_load_dwordx2 s[4:5], s[0:1], 0x9
s_load_dwordx2 s[0:1], s[0:1], 0xb
s_mov_b32 s7, 0xf000
s_mov_b32 s6, -1
s_mov_b32 s2, s6
s_mov_b32 s3, s7
s_waitcnt lgkmcnt(0)
buffer_load_ushort v0, off, s[0:3], 0
s_waitcnt vmcnt(0)
v_bfe_i32 v0, v0, 0, 1
buffer_store_short v0, off, s[4:7], 0
s_endpgm

code without the patch:
s_load_dwordx2 s[4:5], s[0:1], 0x9
s_load_dwordx2 s[0:1], s[0:1], 0xb
s_mov_b32 s7, 0xf000
s_mov_b32 s6, -1
s_waitcnt lgkmcnt(0)
s_load_dword s0, s[0:1], 0x0
s_waitcnt lgkmcnt(0)
s_bfe_i32 s0, s0, 0x10000
v_mov_b32_e32 v0, s0
buffer_store_short v0, off, s[4:7], 0
s_endpgm

AMDGPU codegen chooses to use buffer_load_ushort instead of s_load_dword and generates a longer code sequence. I know almost nothing about AMDGPU, so I simply added the hook and focused only on the architectures I am more familiar with until the patch is in better shape and stable.

lib/CodeGen/MemAccessShrinking.cpp
1014 ↗(On Diff #93973)

An instruction which is not trivially dead for now may become dead after other instructions in CandidatesToErase are removed. That is why I want to add it to CandidatesToErase.

1027 ↗(On Diff #93973)

Ah, right. Actually, I shouldn't use range based loop since the iterator will be invalidated after insertion and deletion.

efriedma added inline comments.Apr 21 2017, 5:14 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

Huh, GPU targets are weird like that. I would still rather turn it off for amdgpu, as opposed to leaving it off by default.

lib/CodeGen/MemAccessShrinking.cpp
1014 ↗(On Diff #93973)

OpI has no uses here. The only way an instruction can have no uses and still not be trivially dead is if it has side-effects. Deleting other instructions won't change the fact that it has side-effects.

arsenm added inline comments.Apr 21 2017, 5:45 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

32-bit loads should not be reduced to a shorter width. Using a buffer_load_ushort is definitely worse than using s_load_dword. There is a target hook that is supposed to avoid reducing load widths like this

wmi added inline comments.Apr 21 2017, 6:07 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

Matt, thanks for the explanation.

I guess the hook is isNarrowingProfitable. However, the hook I need is a little different: I need to know whether narrowing is expensive enough to matter. isNarrowingProfitable on x86 says i32 --> i16 is not profitable and maybe slightly harmful, but it is not very harmful, and the benefit of narrowing may outweigh the cost.

lib/CodeGen/MemAccessShrinking.cpp
1014 ↗(On Diff #93973)

You are right. The entire logic about CandidatesToErase is problematic. I will fix it.

I'm still working on this, but since Wei mentioned he is looking at fixing the CandidatesToErase stuff, I wanted to go ahead and send these comments -- there is a significant one w.r.t. to the deletion stuff as well.

lib/CodeGen/MemAccessShrinking.cpp
84–85 ↗(On Diff #93973)

Are both of these really useful for debugging? We already have a flag that controls whether the pass is enabled or not.

88–89 ↗(On Diff #93973)

"mod" range seems like an odd name?

92 ↗(On Diff #93973)

Maybe a comment on what this is used for?

99–100 ↗(On Diff #93973)

We generally use initializer sequences:

MemAccessShrinkingPass(const TargetMachine *TM = nullptr)
    : FunctionPass(ID), TM(TM) {

Also, since this is an internal type, I'd skip the default argument for TM as it doesn't really seem to give you much.

131 ↗(On Diff #93973)

I would try to expand unusual acronyms: 'du-chain' -> 'Def-Use chain'.

983–996 ↗(On Diff #93973)

Rather than reproduce the body of RecursivelyDeleteTriviallyDeadInstructions here, how about actually refactoring that routine to have an overload that accepts a SmallVectorImpl<Instruction *> list of dead instructions? Then you can hand your list to this routine rather than writing your own.

Specifically, I think (as Eli is alluding to below) you should only put things into your CandidatesToErase vector (which I would rename DeadInsts or something) when they satisfy isInstructionTriviallyDead. Even if deleting one of the instructions is necessary to make one of the candidates dead, we'll still visit it because these routines *recursively* delete dead instructions already.

1051–1053 ↗(On Diff #93973)

This pattern seems confusing. How about using a lambda (or even an actual separate function) to model a single pass over the function, so that it can just return a single Changed variable?

1056–1058 ↗(On Diff #93973)

I would still use a for loop here, and importantly capture rend early:

for (auto InstI = BB->rbegin(), InstE = BB->rend(); InstI != InstE;)
  ...(*InstI++);
1061 ↗(On Diff #93973)

Why not just one Changed variable?

wmi added a comment.Apr 21 2017, 10:11 PM

Thanks for bearing with my poor English. I will fix the terminology and comments according to your suggestions.

lib/CodeGen/MemAccessShrinking.cpp
84–85 ↗(On Diff #93973)

They are actually there for my own convenience when debugging. I will remove them.

88–89 ↗(On Diff #93973)

Yes, I will change them.

92 ↗(On Diff #93973)

Sure.

99–100 ↗(On Diff #93973)

Ok.

131 ↗(On Diff #93973)

Will fix it.

983–996 ↗(On Diff #93973)

Ok, I will do some refactoring based on RecursivelyDeleteTriviallyDeadInstructions. Another motivation is that I need to update MemorySSA while deleting instructions; we don't have a callback to remove the MemoryAccess when we delete a memory instruction.

1051–1053 ↗(On Diff #93973)

It uses the same iterative pattern as CodeGenPrepare, but maybe the iterative pattern in InstCombine is clearer -- only one Changed variable is used there. I will wrap a single pass into a function; I will probably rename tryShrinkOnInst to tryShrinkOnFunc and use it to wrap a single pass. The existing tryShrinkOnInst is simple enough that I can inline its content into tryShrinkOnFunc.

1056–1058 ↗(On Diff #93973)

ok, will change it.

First off, really sorry to keep sending *partial* code reviews. =[ I again didn't quite have enough time to do a full review of the patch (it is a bit large) but wanted to at least send out everything I have so that you aren't blocked waiting on me to produce some comments. =] I'll try again tomorrow to make more progress here although it may start to make sense for me to wait for an iteration as one of the refactorings I'm suggesting here will I think change the structure quite a bit.

In D30416#734516, @wmi wrote:

Thanks for bearing with my poor English.

Please don't stress at all. =D I think reviewing comments, phrasing, etc., needs to happen in any code review. The whole point is to figure out how to write comments and such in a way that make sense to others, and speaking for myself at least, no level of knowledge about the English language is enough there -- it really requires someone else reading it to figure this out.

lib/CodeGen/MemAccessShrinking.cpp
374–396 ↗(On Diff #93973)
This comment has been deleted.
417–430 ↗(On Diff #93973)

Even after reading your comment I'm not sure I understand what is driving the complexity of this match.

Can you explain (maybe just here in the review) what patterns you're trying to handle that are motivating this?

I'm wondering whether there is any canonicalization that can be leveraged (or added if not already there) to reduce the complexity of the pattern here. Or if we really have to handle this complexity, what the best way to write it and/or comment it so that readers understand the result.

453–456 ↗(On Diff #93973)

What happens when both are true? It looks like we just overwrite the 'MR' code?

I feel like both of these analyze...() methods should return the ModRange struct rather than having an output parameter.

chandlerc added inline comments.Apr 24 2017, 6:41 PM
lib/CodeGen/MemAccessShrinking.cpp
374–396 ↗(On Diff #93973)

After reading more of this routine, I think you should split it into two routines, one that tries to handle the first pattern, and one that only handles the second pattern.

You can factor the rewriting code that is currently shared by both patterns into utility functions that are called for both. But the logic of this routine is harder to follow because you always have this state to hold between doing two different kinds of transforms.

407 ↗(On Diff #93973)

TBits doesn't really give me enough information as a variable name... Maybe StoreBitSize?

467–470 ↗(On Diff #93973)

Should this be testing against the DataLayout rather than hard coded 8, 16, and 32? What if 64 bits is legal and that's the width of the MR?

604–607 ↗(On Diff #93973)

This comment and function name don't really add up for me...

There is no Cst parameter here. I assume you mean AI?

Also having a flag like AInB seems to make this much more confusing to read. Why not just have two routines for each case?

My guess at what this is actually trying to do is areConstantBitsWithinModRange and areConstantBitsOutsideModRange?

619–623 ↗(On Diff #93973)

Maybe a method and use the term 'disjoint'? MR1.isDisjoint(MR2) reads a bit better to me.

626–628 ↗(On Diff #93973)

This makes this a very confusing API -- now it isn't really just a predicate, it also computes the insertion point...

Also, why do you need to compute a particular insertion point within a basic block? Can't you just always insert into the beginning of the basic block and let the scheduler do any other adjustments?

arsenm added inline comments.Apr 24 2017, 7:36 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

The hook I was thinking of was shouldReduceLoadWidth. s_load_dword uses a different cache with much faster access than the buffer instruction if it can be used

wmi added a comment.Apr 25 2017, 10:29 AM

Thanks for drafting the comments. It is clearly more descriptive and easier to follow, and I like the variable names (LargeVal and SmallVal), which are much better than what I used (OrigVal and MaskedVal). I will rewrite the comments based on your draft.

lib/CodeGen/MemAccessShrinking.cpp
374–396 ↗(On Diff #93973)

I tried that split before, but I had to move many temporaries into MemAccessShrinkingPass. However, some temporaries are only used by store shrinking and not by load shrinking, so that looked a little weird. I agree the logic will be clearer if it is split into two routines. I will try again and see if I can move the common temporaries into MemAccessShrinkingPass and leave the special ones as parameters, or create a store shrinking class to hold the temporaries.

407 ↗(On Diff #93973)

Ok, will change it.

417–430 ↗(On Diff #93973)

I will borrow the template from your comments to explain: store(or(and(LargeVal, MaskConstant), SmallVal), address)
The case is:

store(or_1(and_1(or_2(and_2(load, -65281), Val1), -256), and_3(Val2, 7)))

The two operands of "or_1" are "and_1" and "and_3", but the matcher doesn't know whether the subtree under and_1 or the one under and_3 contains the LargeVal. I want or_2 to be matched as the LargeVal. This is a common pattern after bitfield load/store coalescing.

But I realize, as I am explaining this to you, that I can split the complex pattern matching above in two, which may be simpler:

bool OrAndPattern = match(Val, m_c_Or(m_And(m_Value(LargeVal), m_ConstantInt(Cst)),
                                      m_Value(SmallVal)));
if (match(SmallVal, m_c_And(m_c_Or(m_And(m_Value(), m_Value()), m_Value()), m_Value())))
  std::swap(SmallVal, LargeVal);

453–456 ↗(On Diff #93973)

We will just overwrite "MR" but it is still not good for "OrAndPattern". I will change the second "if" to "else if".

467–470 ↗(On Diff #93973)

That is better. Will fix it.

604–607 ↗(On Diff #93973)

Sorry, Cst means AI here.
The func is doing areConstantBitsWithinModRange and areModRangeWithinConstantBits.

619–623 ↗(On Diff #93973)

Ok.

626–628 ↗(On Diff #93973)

Good point. If there is a clobber instruction but it is in the same block as the "To" instruction, I can simply insert at the beginning of the "To" instruction's block, and NewInsertPt is not needed.

But I still prefer to use an insertion point closer to the "To" instruction when there is no clobber instruction, because the IR looks better. That means at least a flag indicating whether I need to insert at the beginning of the "To" instruction's block has to be returned.

I.e., I can simplify "Instruction *&NewInsertPt" to a flag. Is that API acceptable?

wmi added inline comments.Apr 28 2017, 4:11 PM
include/llvm/Target/TargetLowering.h
1908 ↗(On Diff #93973)

shouldReduceLoadWidth is a hook on TargetLowering, but I need a hook on TargetLoweringBase, which can be used from an LLVM IR pass.

I cannot change shouldReduceLoadWidth to be a hook on TargetLoweringBase because of the way x86 uses it, so I copied the logic from AMDGPUTargetLowering::shouldReduceLoadWidth into AMDGPUTargetLowering::isNarrowingExpensive. I can make shouldReduceLoadWidth call isNarrowingExpensive as an NFC change. Is that OK?

lib/CodeGen/MemAccessShrinking.cpp
417–430 ↗(On Diff #93973)

I find that I still have to keep the original complex pattern. Now I remember where the real difficulty is:

For a case like store(or_1(and_1(or_2(and_2(load, -65281), Val1), -256), and_3(Val2, 7))), I want to match LargeVal to or_2(and_2(load, ...)).

But I cannot use match(Val, m_c_Or(m_And(m_c_Or(m_And(...))))) because I have no way to get the intermediate results of the match; for example, I cannot bind LargeVal to the second m_c_Or. So I have to split the match into multiple matches. That is where the complexity comes from.

467–470 ↗(On Diff #93973)

For x86-64, DataLayout works fine. However, for other architectures, like ARM, the datalayout is

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

DL::LegalIntWidths only contains the widths of natural integers, represented by "n32", so only 32 is a legal integer width for ARM.

For x86-64, the datalayout is:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

Because of "n8:16:32:64", 8, 16, 32, and 64 are all legal integer widths.

wmi updated this revision to Diff 97170.Apr 28 2017, 4:20 PM

Address Eli, Matt and Chandler's comments.

Some major changes:

  • A lot of comments changed.
  • Split reduceLoadOpsStoreWidth in two and factor out several other helper functions.
  • Refactor RecursivelyDeleteTriviallyDeadInstructions.
chandlerc requested changes to this revision.May 9 2017, 5:33 PM

Somewhat focused on the store side. Near the bottom is a high-level comment about the load shrinking approach.

lib/CodeGen/MemAccessShrinking.cpp
104 ↗(On Diff #97170)

Still should omit the = nullptr here since this is an internal type.

154–157 ↗(On Diff #97170)

These seem like helper functions that don't actually need to be part of the class at all. Maybe make them static free functions?

(This may be true for some of the other routines in this file.)

205–207 ↗(On Diff #97170)

It seems much more idiomatic to return the bool indicating whether a valid BitRange was computable, and if true, set up the values in an output parameter.

Or even better, you could return an Optional<BitRange>, and return None when the requirements aren't satisfied.

209 ↗(On Diff #97170)

I know other passes use the variable name Cst but I'd suggest using just C for generic constants or some more descriptive term like you use elsewhere like Mask.

212–215 ↗(On Diff #97170)

Do you really need to extend or truncate here? Surely the type system has already caused the constant to be of the size you want? If so, I'd just assert it here. Maybe could directly pass a 'const &' APInt in as the parameter, letting you call the parameter Mask above?

216–217 ↗(On Diff #97170)

I would call these MaskLeadingOnes and MaskTrailingOnes.

218–220 ↗(On Diff #97170)

I'm having trouble understanding the logic here in the case where there are leading ones. Here is my reasoning, but maybe I've gotten something wrong here:

Shifting right will remove leading ones, but you're shifting right the number of *trailing* ones... Shouldn't that be *leading ones*? And won't the result of a shift *right* be to place the middle zero sequence at the least significant bit, meaning you would want to count the *leading* zeros?

Put differently, arithmetic shift is required to not change the most significant bit, so doing an arithmetic shift right based on how many ones are trailing, seems like it will never change the count of trailing zeros.

If this is correct, then this is a bug and you should add some test cases that will hit this bug.

But regardless of whether my understanding is correct or there is a bug here, I think this can be written in a more obvious way:

unsigned MaskMidZeros = BitSize - (MaskLeadingOnes + MaskTrailingOnes);

And then directly testing whether they are all zero:

if (Mask == APInt::getBitsSet(BitSize, MaskLeadingOnes,
                              MaskLeadingOnes + MaskMidZeros)) {
223–225 ↗(On Diff #97170)

Why would we see an all ones mask? Shouldn't have that been eliminated earlier? It seems like we could just bail in this case.

246 ↗(On Diff #97170)

The idiomatic way to test this with APInt is BitMask.isSubsetOf(KnownZero).

Also, it would be good to use early-exit here. It *sounds* like you are testing whether it is valid to do anything, but that isn't clear when you have set up members of BR here before returning.

260 ↗(On Diff #97170)

When you have alternates, the pattern notation is a bit confusing. I'd just say something like Analyze <bop>(load P, \p Cst) where <bop> is either 'or', 'xor', or 'and'..

261 ↗(On Diff #97170)

This isn't really about whether the original value is loaded or not, right? It is just bounding the changed bits?

I'd explain it that way. You'll mention the load when you use it.

263–264 ↗(On Diff #97170)

Maybe a better name of this function would be: computeBopChangedBitRange?

265 ↗(On Diff #97170)

Same comment above about just asserting the correct bitsize and passing the APInt Mask in directly.

266 ↗(On Diff #97170)

Why not pass the argument as a BinaryOperator?

267–268 ↗(On Diff #97170)

Might be nice to add a comment explaining the logic here.

Something like:

Both 'or' and 'xor' operations only mutate when the operand has a one bit.
But 'and' only mutates when the operand has a zero bit, so invert the
constant when the instruction is an and so that all the (potentially)
changed bits are ones in the operand.
284 ↗(On Diff #97170)

Why the PowerOf2Ceil here? Will the actual store used have that applied? If the actual store has that applied, why don't we want to consider that as BitSize so we're free to use that larger size for the narrow new type?

286–289 ↗(On Diff #97170)

As the comment explains, this lambda is actually computing the Shift. But the name seems to indicate it is just a predicate testing whether the old range is covered by the new one.

Also, why does the old BR need to be passed in as an argument; isn't that something that can be captured? I actually like passing NewBR in here to show that it is what is *changing* between calls to this routine. But it seems awkward to set up NewBR before this lambda (which would allow it to be implicitly captured) and then call it with a parameter name that shadows that, thereby avoiding capturing it. I'd consider whether you want to sink NewBR down or otherwise more cleanly handle it in the loop.

Nit pick: we typically name lambdas like variables with FooBar rather than like functions with fooBar.

290 ↗(On Diff #97170)

What about platforms with support for unaligned loads? Probably best to just leave a FIXME rather than adding more to this patch, but it seems nice to mention that technique.

As an example, on x86, if you have a bitfield that looks like:

struct S {
  unsigned long a : 48;
  unsigned long b : 48;
  unsigned long c : 32;
};

It seems likely to be substantially better to do a single 8-byte load and mask off the high 2 bytes when accessing b than to do two nicely aligned 8-byte loads and all the bit math to recombine things.

329 ↗(On Diff #97170)

Pass BR by value? (or make it a const reference, but it seems small)

331–335 ↗(On Diff #97170)

Generally we prefer early returns to state. That would make this:

if (...)
  return ...;

return ...;
345–346 ↗(On Diff #97170)

You can replace uses of this with: V1->stripPointerCasts() == V2->stripPointerCasts(). This will be more powerful as well.

370–371 ↗(On Diff #97170)

Maybe this is just a strange API on MemorySSA, but typically I wouldn't expect a lack of dominance to indicate that no access between two points exists.

How does MemorySSA model a pattern that looks like:

From  x 
 \   /
  \ /
   A
   |
   |
   To

Where A is a defining access, is between From and To, but I wouldn't expect From to dominate A because there is another predecessor x.

393 ↗(On Diff #97170)

StOffset seems an odd name if this is used to create new pointers for loads as well as stores.

402 ↗(On Diff #97170)

No need to call it uglygep. If you want a clue as to the types, maybe rawgep or bytegep.

407–408 ↗(On Diff #97170)

This comment mentions LargeVal but that isn't an argument?

422 ↗(On Diff #97170)

This reads like a fragment, were there supposed to be more comments before this line?

557–558 ↗(On Diff #97170)

Keeping this close to extendBitRange would make it a lot easier to read. Also, why have two functions at all? It appears this is the only thing calling extendBitRange. (I'm OK if there is a reason, just curious what it is.)

562–565 ↗(On Diff #97170)

I'm surprised this doesn't just fall out from the logic in extendBitRange.

575 ↗(On Diff #97170)

Is StoreShrunkInfo buying much here? It seems to mostly be used in arguments, why not just pass the argument directly? The first bit of the code seems to just unpack everything into local variables.

587–588 ↗(On Diff #97170)

Doesn't MinAlign handle 0 correctly so that you can just do this unconditionally?

593–595 ↗(On Diff #97170)

Isn't this only called when we need to insert two stores?

596–606 ↗(On Diff #97170)

It feels like all of this could be factored into an 'insertStore' method? In particular, the clone doesn't seem to buy you much as you rewrite most parts of the store anyways.

This could handle all of the MemorySSA updating, logging, etc.

662–663 ↗(On Diff #97170)

Rather than cloning and mutating, just build a new load? the IRBuilder has a helpful API here.

679–687 ↗(On Diff #97170)

There is no need to handle constants specially. The constant folder will do all the work for you.

690–705 ↗(On Diff #97170)

I think the amount of code that is special cased here for one caller of this routine or the other is an indication that there is a better factoring of the code.

If you had load insertion and store insertion factored out, then each caller could cleanly insert the narrow load, compute the narrow store (differently), and then insert it.

Does that make sense? Maybe there is a reason why that doesn't work well?

729–730 ↗(On Diff #97170)

Might read more easily as: "Assuming that \p AI contains a single sequence of bits set to 1, check whether the range \p BR is covered by that sequence."

732–733 ↗(On Diff #97170)

It seems more obvious to me to test this the other way:

BR.Shift >= AI.countLeadingZeros() &&
BR.Shift + BR.Width < (AI.getBitWidth() - AI.countTrailingZeros())

Is this not equivalent for some reason? (Maybe my brain is off...)

The reason I find this easier to read is because it seems to more directly test: "is the start of the BitRange after the start of the 1s, and is the end of the BitRange before the end of the 1s.".

742–745 ↗(On Diff #97170)

There is no comment about the cost of this routine.

It looks *really* expensive. It appears to walk all transitive predecessors of the block containing To. So worst case, every basic block in the function. I see this called in several places from inside of for-loops. Is this really a reasonable approach?

Why aren't we just walking the def-use chain from MemorySSA to figure this kind of thing out in a much lower time complexity bound? Like, shouldn't we just be able to walk up defs until we either see a clobber or From?
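
For reference, a rough sketch of that kind of SSA-based query, assuming MemorySSA is up to date and both From and To are memory instructions touching the location Loc (the function name and shape are illustrative, not the patch's code):

#include "llvm/Analysis/MemoryLocation.h"
#include "llvm/Analysis/MemorySSA.h"

using namespace llvm;

// Ask the walker for the nearest access above To that may clobber Loc; if
// that access is From itself, nothing between From and To writes the
// location.
static bool hasClobberBetweenSketch(MemorySSA &MSSA, Instruction *From,
                                    Instruction *To,
                                    const MemoryLocation &Loc) {
  MemoryAccess *FromAcc = MSSA.getMemoryAccess(From);
  MemoryAccess *ToAcc = MSSA.getMemoryAccess(To);
  MemoryAccess *Clobber =
      MSSA.getWalker()->getClobberingMemoryAccess(ToAcc, Loc);
  return Clobber != FromAcc;
}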

762–763 ↗(On Diff #97170)

You can just insert -- that will return whether it succeeded in inserting the block.

770 ↗(On Diff #97170)

Naming convention.

780–781 ↗(On Diff #97170)

So, each of these dominates queries are *incredibly* slow. They require linearly walking every instruction in the basic block (worst case).

Why doesn't MemorySSA handle this for you? (Maybe my comment above about using MemorySSA will obviate this comment though.)

946 ↗(On Diff #97170)

It would be much more clear for this to be a parameter rather than an implicit parameter via class member. For example, multiple uses *of what*?

967–997 ↗(On Diff #97170)

Rather than re-implementing all of this logic, can you re-use the existing demanded bits facilities in LLVM?

For example, I think you can use the DemandedBits analysis, walk all loads in the function, and then narrow them based on the demanded bits it has computed. Because of how DemandedBits works, it is both efficient and very powerful. It can handle many more patterns.

Thinking about this, I suspect you'll want to do two passes essentially. First, narrow all the *stores* that you can. This will likely be iterative. Once that finishes, it seems like you'll be able to then do a single walk over the loads with a fresh DemandedBits analysis and narrow all of those left. You'll probably want to narrow the stores first because that may make bits stop being demanded. But I don't see any way for the reverse to be true, so there should be a good sequencing.

To make the analysis invalidation stuff easier, you may actually need this to actually be two passes so that the store pass can invalidate the DemandedBits analysis, and the load pass can recompute it fresh.

Does that make sense?

If so, I would suggest getting just the store shrinking in this patch, and add the load shrinking in a follow-up patch. I'm happy for them to be implemented in a single file as they are very similar and its good for people to realize they likely want *both* passes.
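
As a rough illustration of the load side of that suggestion (all names below are made up, DB is a computed DemandedBits result, and this only collects candidates rather than performing the rewrite):

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/DemandedBits.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/MathExtras.h"

using namespace llvm;

// Collect integer loads whose users demand few enough bits that a smaller
// power-of-two-sized load could cover them.
static void collectNarrowableLoads(Function &F, DemandedBits &DB,
                                   SmallVectorImpl<LoadInst *> &Candidates) {
  for (Instruction &I : instructions(F)) {
    auto *LI = dyn_cast<LoadInst>(&I);
    if (!LI || !LI->getType()->isIntegerTy() || LI->isVolatile())
      continue;
    APInt Demanded = DB.getDemandedBits(LI);
    // The highest demanded bit bounds the width the users actually need.
    unsigned NeededBits = Demanded.getActiveBits();
    if (NeededBits &&
        PowerOf2Ceil(NeededBits) < LI->getType()->getIntegerBitWidth())
      Candidates.push_back(LI);
  }
}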

998 ↗(On Diff #97170)

Inst would seem like a more common variable name here.

1144–1145 ↗(On Diff #97170)

Comments would be good explaining the very particular iteration order.

1183–1184 ↗(On Diff #97170)

Do you want to run tryShrink again just because you removed dead instructions?

If so, do you want to remove dead instructions on each iteration instead of just once tryShrink doesn't make a change?

417–430 ↗(On Diff #93973)

It would be really nice if LLVM would canonicalize in one way or the other so you didn't have to handle so many variations. Asking folks about whether we can/should do anything like that.

But I think the bigger question is, why would only two layers be enough? I feel like there is something more general here that will make explaining everything else much simpler.

Are you looking for a load specifically? Or are you just looking for one side of an or which has a "narrow" (after masking) and?

If the former, maybe just search for the load?

If the latter, maybe you should be just capturing the two sides of the or, and rather than looking *explicitly* for an 'and', instead compute whether the non-zero bits of one side or the other are "narrow"?

This revision now requires changes to proceed.May 9 2017, 5:33 PM
wmi added a comment.May 10 2017, 10:43 AM

Chandler, thanks for the comments. They are very helpful. I will address them in the next revision. I only replied to the comments where I had questions or concerns.

lib/CodeGen/MemAccessShrinking.cpp
104 ↗(On Diff #97170)

I cannot omit it because, in INITIALIZE_TM_PASS_END, callDefaultCtor<passName> requires the parameter to have a default value.

218–220 ↗(On Diff #97170)

Shifting right will remove leading ones, but you're shifting right the number of *trailing* ones... Shouldn't that be *leading ones*? And won't the result of a shift *right* be to place the middle zero sequence at the least significant bit, meaning you would want to count the *leading* zeros?

I think shifting right removes the trailing ones? And after the shift (Mask.ashr(MaskTrailOnes)), the middle zeros are at the least significant bits, so they are the trailing zeros, right?

But as you said, I should rule out the all-zeros/all-ones cases separately so the logic becomes clearer.
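
A small worked example of that shift, using an arbitrary 16-bit mask (the helper name is made up):

#include "llvm/ADT/APInt.h"

// Mask = 1111'1111'0000'0111: 8 leading ones, 5 middle zeros, 3 trailing ones.
static unsigned midZerosExample() {
  llvm::APInt Mask(16, 0xFF07);
  unsigned TrailOnes = Mask.countTrailingOnes();    // 3
  llvm::APInt Shifted = Mask.ashr(TrailOnes);       // 1111'1111'1110'0000
  // The middle zeros are now the trailing zeros of the shifted value.
  return Shifted.countTrailingZeros();              // 5
}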

370–371 ↗(On Diff #97170)

The case will not happen because we ensure From dominates To before calling the function. You are right, it is better to add an assertion at the entry of the function to prevent misuse of the API.

596–606 ↗(On Diff #97170)

I use clone here just to duplicate the subclass data like volatile and ordered.

742–745 ↗(On Diff #97170)

That is because the instruction To here may not be a memory access instruction (it is probably an And or Trunc instruction indicating that only some bits of the input are demanded), so we cannot get a MemoryAccess for it. Note that hasClobberBetween is overloaded and there are two versions. The other version, which walks the MemorySSA def-use chain, is the one used in several for-loops as you saw; this higher-cost version is not used in a loop. Besides, we only check the MemorySSA DefList in each BB, so the worst-case complexity is the number of memory access instructions in the function, which is usually much smaller than the number of instructions in the function.

946 ↗(On Diff #97170)

MultiUsesSeen is not changed for every instruction. It indicates whether a previous instruction on the chain was found to have multiple uses while we walk the chain bottom-up.

r1 = ...;
r2 = r1 + r3;
r4 = r2 + r5;

If r2 has multiple uses, neither r2 = r1 + r3 nor r1 = ... can be removed after the shrinking.

967–997 ↗(On Diff #97170)

I considered the demanded bits facilities before, but I found they would only simplify the code a little. Finding the demanded bits of the load is only a small part of the work. Most of the complexity comes from figuring out which ops in the sequence on the Def-Use chain change the demanded bits. For example, if we see shifts, we may clear some demanded bits in less significant positions to zero because we shift right and then shift left; because we change the demanded bits, we must include the shifts in the shrunken code sequence. Similarly, if we see an Or(And(Or(And(...)))) pattern, we want to know that the bits changed by the Or(And(...)) are disjoint from the demanded bits; only when that is true can we omit the Or(And(...)) pattern from the final shrunken code sequence. Another reason is that demanded bits analysis may not be very cheap. For memory shrinking, only a few patterns like and/trunc are common enough to be useful for the shrinking, so we don't really need a fully general demanded bits analysis for every instruction.
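
The disjointness test described here is small when stated in isolation; a sketch, assuming Mask is the and-mask of an or(and(X, Mask), V) step and Demanded is the demanded-bits set (both names invented):

#include "llvm/ADT/APInt.h"

// The or/and pair can only modify the bits cleared by the mask (plus bits
// set by V, which is normally already masked to that same range), so the
// whole step can be skipped when those bits don't overlap the demanded ones.
static bool orAndStepIsIrrelevant(const llvm::APInt &Mask,
                                  const llvm::APInt &Demanded) {
  return !(~Mask).intersects(Demanded);
}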

1183–1184 ↗(On Diff #97170)

If a dead instruction is removed, another iteration is taken and tryShrink runs again.

I think it makes no difference whether removeDeadInsts runs only when tryShrink makes no change or every time after tryShrink makes a change.

Trying to provide answers to the open questions here...

lib/CodeGen/MemAccessShrinking.cpp
104 ↗(On Diff #97170)

Ah, right, I forgot that about the pass initialization. Sorry for the noise!

218–220 ↗(On Diff #97170)

Ah, ok, this makes sense to me now. I had confused myself thinking about it. Anyways, the simpler formulation will avoid any future reader confusion.

370–371 ↗(On Diff #97170)

Ok, while that makes sense, it still seems counter-intuitive in terms of how to use MemorySSA based on my limited understanding.

I would have expected essentially walking up the def's from the use until either a clobber is found or the 'from' is found. One has to come first, and which ever is first dictates if there is a clobber. Essentially, I would expect to use the *SSA* properties to answer these questions rather than the *dominance* or control flow properties. But I'm happy if folks more deeply familiar with MemorySSA can explain better why this is the right way to use it, as I'm still new to this infrastructure in LLVM.

596–606 ↗(On Diff #97170)

I still think it will be cleaner to directly construct the load.

Also, I wouldn't expect this pass to be valid for either volatile or ordered loads...

742–745 ↗(On Diff #97170)

Given how different the two routines are, I would suggest giving them separate names. It seemingly wasn't obvious that they were different already.

I'm happy to look at the use cases, but this still feels much too expensive to me. In terms of big-O, the fact that it is only memory accesses doesn't really seem to help much. Quadratic in the number of memory accesses is still probably not something we can realistically do.

I want to think more about the algorithm once I see exactly where this is being called.

946 ↗(On Diff #97170)

Ok, that explanation makes sense, but you'll need to find a way to make this clear from the code itself. =] At the very least, not using a member, but probably with some more helpful variable names, function names, structure or comments.

967–997 ↗(On Diff #97170)

I don't really understand the argument here...

I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not? If not, it seems like we should extend that general facility rather than building an isolated system over here.

Regarding the cost, I'm much more worried about the algorithmic cost of this and the fact that it seems relatively sensitive to things that we don't reliably canonicalize (using trunc or and instructions to remove demanded bits).

Generally speaking, working *backwards* from the and or trunc is going to be much more expensive than working *forwards*.

But even if it the existing demanded bits is too expensive, we still shouldn't develop a new cheap one locally here. We should either add a parameter to decrease its cost or add a new general purpose "fast" demanded bits and then use that here.

1183–1184 ↗(On Diff #97170)

I guess I'm trying to ask: why will the removal of dead instructions cause shrinking to become more effective? Most of the algorithms here don't seem likely to remove entire classes of uses, and so I'm not sure why this iteration is valuable at all.

But if it is valuable, that is, if removing dead instructions exposes more shrinking opportunities, I would expect that removing dead instructions *earlier* (IE, on each iteration) to cause this to converge faster.

wmi added inline comments.May 10 2017, 10:37 PM
lib/CodeGen/MemAccessShrinking.cpp
596–606 ↗(On Diff #97170)

Ok, I will change it to use IRBuilder.

967–997 ↗(On Diff #97170)

I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not?

Yes, the demanded bits facilities can adjust which bits are demanded if there is a shift. However, which bits are demanded is not the only thing I need to know. I also need to know whether an operation on the Def-Use chain actually changes the value of the demanded bits. If the operation changes the value of the demanded bits, it must be shrunk together with the load; if it only changes the value of bits other than the demanded bits, the operation can be omitted. That is actually what most of the pattern matching in load shrinking is doing, and I don't think that part of the work can be covered by demanded bits analysis.

1183–1184 ↗(On Diff #97170)

why will the removal of dead instructions cause shrinking to become more effective?

After removing some dead instructions, some multi-use defs become single-use defs, which increases the benefit of doing more shrinking.

if removing dead instructions exposes more shrinking opportunities, I would expect that removing dead instructions *earlier* (IE, on each iteration) to cause this to converge faster.

Ok, then I will do it after tryShrink in every iteration.

wmi added inline comments.May 11 2017, 7:26 AM
lib/CodeGen/MemAccessShrinking.cpp
967–997 ↗(On Diff #97170)

Even just for demanded bits, working forwards is not that straightforward, because a wide load for a field group is shared by multiple field accesses, and demanded bits analysis will sometimes show that all the bits of the load are used. It is also possible that on the upper side of the Def-Use chain the demanded bits are wider, because a node may have multiple uses, while on the lower side of the Def-Use chain the demanded bits are narrower. Working forwards, we have to do some searching over the expression tree rooted at the load. Working backwards lets us know the demanded bits we want to use from the beginning.

wmi added a comment.May 16 2017, 5:20 PM

I discussed with Chandler offline, and we decided to split the patch and try to commit the store shrinking first.

Then I tried the idea of walking forwards for load shrinking using DemandedBits, but I ran into a problem with the motivational testcase (test/CodeGen/X86/mem-access-shrink.ll). Look at the %bf.load that we want to shrink in mem-access-shrink.ll: it has multiple uses, so we want to look at all its uses and get the demanded bits for each use. However, on the Def-Use chains from %bf.load to its narrower uses, %bf.load is not the only value with multiple uses; for example, %bf.set also has multiple uses, so we also need to look at all the uses of %bf.set. In theory, every node on the Def-Use chain can have multiple uses, so in the initial portion of the Def-Use chain starting from %bf.load we don't know from the demanded bits whether %bf.load can be shrunk; only when we walk pretty close to the end of the Def-Use chain do we know whether %bf.load can be shrunk at a specific place. In other words, when walking forwards, in order not to miss any shrinking opportunity, we have to walk across almost all the nodes of the Def-Use tree before knowing where %bf.load can be shrunk.

For walking backwards, in most cases we only have to walk from a narrower use up to "%bf.load = ...", which is the root of the Def-Use tree. It is like walking from some leaves to the root, which should be more efficient in most cases. I agree we may see special testcases where there is a long common portion shared by the paths from leaves to root (in that case walking forwards is better). If that happens, we can add a cap on the maximum walking distance to keep the compile time cost from getting too high. Chandler, do you think that is OK?

In D30416#756899, @wmi wrote:

I discussed with Chandler offline, and we decided to split the patch and try to commit the store shrinking first.

Then I tried the idea of walking forwards for load shrinking using DemandedBits, but I ran into a problem with the motivational testcase (test/CodeGen/X86/mem-access-shrink.ll). Look at the %bf.load that we want to shrink in mem-access-shrink.ll: it has multiple uses, so we want to look at all its uses and get the demanded bits for each use. However, on the Def-Use chains from %bf.load to its narrower uses, %bf.load is not the only value with multiple uses; for example, %bf.set also has multiple uses, so we also need to look at all the uses of %bf.set. In theory, every node on the Def-Use chain can have multiple uses, so in the initial portion of the Def-Use chain starting from %bf.load we don't know from the demanded bits whether %bf.load can be shrunk; only when we walk pretty close to the end of the Def-Use chain do we know whether %bf.load can be shrunk at a specific place. In other words, when walking forwards, in order not to miss any shrinking opportunity, we have to walk across almost all the nodes of the Def-Use tree before knowing where %bf.load can be shrunk.

For walking backwards, in most cases we only have to walk from a narrower use up to "%bf.load = ...", which is the root of the Def-Use tree. It is like walking from some leaves to the root, which should be more efficient in most cases. I agree we may see special testcases where there is a long common portion shared by the paths from leaves to root (in that case walking forwards is better). If that happens, we can add a cap on the maximum walking distance to keep the compile time cost from getting too high. Chandler, do you think that is OK?

I'm somewhat worried about this cap -- it has hurt us in the past. But maybe there is a way to make walking backwards have reasonable complexity. It still seems like something we can do in a separate phase rather than having it interleave with the store-based shrinking, and so I'd still split it into a separate patch.

Setting aside forwards vs. backwards-with-a-cap, I still think it is a mistake to add yet another implementation of tracking which bits are demanded. So I would look at how you might share the logic in DemandedBits (or one of the other places in LLVM where we reason about this, I think there are already some others) for reasoning about the semantics of the IR instructions. Maybe there is no way to share that, but it seems worth trying. Either way, I'd suggest a fresh thread (or IRC) to discuss issues until there is a patch so that we can move the store side of this forward independently.

That make sense?

fhahn added a subscriber: fhahn.May 19 2017, 5:30 AM

Looks like this thread has gone stale for a while.

I have not read the patch in detail, so what I say might be nonsense :) From reading the discussion, it seems that DemandedBits analysis is not ready today to handle forward walking for load shrinking, nor ideal for backward walking without capping, so what is a good path forward? Is it possible to keep the core of this patch but with more reuse of DemandedBits analysis (with some refactoring)? If not, we may want to consider moving ahead with a stop-gap solution for now and committing to longer-term unification as follow-ups.