
[BitfieldShrinking] Shrink Bitfields load/store when the bitfields are legal to access independently
Needs Revision · Public

Authored by wmi on Feb 27 2017, 10:39 AM.

Details

Summary

reduceLoadOpStoreWidth is a useful optimization already existing in DAGCombiner. It can shrink the bitfield store in the following testcase:

class A {
public:
  unsigned long f1 : 16;
  unsigned long f2 : 3;
};
A a;

void foo() {
  // if (a.f1 == 2) return;
  a.f1 = a.f1 + 3;
}

For a.f1 = a.f1 + 3, without reduceLoadOpStoreWidth in DAGCombiner, the code will be:
movl a(%rip), %eax
leal 3(%rax), %ecx
movzwl %cx, %ecx
andl $-65536, %eax # imm = 0xFFFF0000
orl %ecx, %eax
movl %eax, a(%rip)

with reduceLoadOpStoreWidth, the code will be:
movl a(%rip), %eax
addl $3, %eax
movw %ax, a(%rip)

However, if we uncomment the if statement above, the load of a.f1 and the store of a.f1 end up in two different BasicBlocks, and reduceLoadOpStoreWidth in DAGCombiner cannot handle that.

The patch redoes the same optimization in a dedicated IR pass, so the optimization is not limited by BasicBlock boundaries.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc added inline comments.Apr 24 2017, 6:41 PM
lib/CodeGen/MemAccessShrinking.cpp
375–397

After reading more of this routine, I think you should split it into two routines, one that tries to handle the first pattern, and one that only handles the second pattern.

You can factor the rewriting code that is currently shared by both patterns into utility functions that are called for both. But the logic of this routine is harder to follow because you always have this state to hold between doing two different kinds of transforms.

408

TBits doesn't really give me enough information as a variable name... Maybe StoreBitSize?

454–457

What happens when both are true? It looks like we just overwrite the 'MR' code?

I feel like both of these analyze...() methods should return the ModRange struct rather than having an output parameter.

468–471

Should this be testing against the DataLayout rather than hard coded 8, 16, and 32? What if 64 bits is legal and that's the width of the MR?

605–608

This comment and function name don't really add up for me...

There is no Cst parameter here. I assume you mean AI?

Also having a flag like AInB seems to make this much more confusing to read. Why not just have two routines for each case?

My guess at what this is actually trying to do is areConstantBitsWithinModRange and areConstantBitsOutsideModRange?

620–624

Maybe a method and use the term 'disjoint'? MR1.isDisjoint(MR2) reads a bit better to me.
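
For illustration, a minimal sketch of such a helper, assuming ModRange carries a Shift/Width pair (the member names are guesses, not necessarily the patch's actual layout):

struct ModRange {
  unsigned Shift = 0; // lowest modified bit
  unsigned Width = 0; // number of modified bits

  // Two ranges are disjoint when one ends at or before the other begins.
  bool isDisjoint(const ModRange &Other) const {
    return Shift + Width <= Other.Shift || Other.Shift + Other.Width <= Shift;
  }
};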

627–629

This makes this a very confusing API -- now it isn't really just a predicate, it also computes the insertion point...

Also, why do you need to compute a particular insertion point within a basic block? Can't you just always insert into the beginning of the basic block and let the scheduler do any other adjustments?

arsenm added inline comments.Apr 24 2017, 7:36 PM
include/llvm/Target/TargetLowering.h
1973

The hook I was thinking of was shouldReduceLoadWidth. s_load_dword uses a different cache with much faster access than the buffer instructions, so it is preferable when it can be used.

wmi added a comment.Apr 25 2017, 10:29 AM

Thanks for drafting the comments. They are clearly more descriptive, and I like the variable names (LargeVal and SmallVal), which are much better than what I used (OrigVal, MaskedVal). I will rewrite the comments based on your draft.

lib/CodeGen/MemAccessShrinking.cpp
375–397

I tried that splitting before but I had to move many temporaries to MemAccessShrinkingPass. However, some temporaries are only used by store shrinking but not load shrinking, so that looked a little weird. I agree the logic will be clearer if it is split into two routines. I will try again, and see if I can separate some common temporaries into MemAccessShrinkingPass and leave the special temporaries as parameters, or create a store shrinking class to keep the temporaries.

408

Ok, will change it.

418–431

I'll borrow the template from your comment to explain: store(or(and(LargeVal, MaskConstant), SmallVal), address)
The case is:

store(or_1(and_1(or_2(and_2(load, -65281), Val1), -256), and_3(Val2, 7)))

The two operands of or_1 are and_1 and and_3, but the matcher doesn't know which subtree, and_1 or and_3, contains the LargeVal. I want or_2 to be matched as the LargeVal. This is a common pattern after bitfield load/store coalescing.

But while explaining this to you, I realize I can split the complex pattern match above into two steps, which may be simpler:

bool OrAndPattern = match(Val, m_c_Or(m_And(m_Value(LargeVal), m_ConstantInt(Cst)),
                                      m_Value(SmallVal)));
if (match(SmallVal, m_c_And(m_c_Or(m_And(m_Value(), m_Value()), m_Value()),
                            m_Value())))
  std::swap(SmallVal, LargeVal);

454–457

We would just overwrite "MR", but that is still not right for "OrAndPattern". I will change the second "if" to "else if".

468–471

That is better. Will fix it.

605–608

Sorry, Cst means AI here.
The function is doing both areConstantBitsWithinModRange and areModRangeWithinConstantBits.

620–624

Ok.

627–629

Good point. If there is a clobber instruction but it is in the same block as the "To" instruction, I can simply insert at the beginning of that block, and NewInsertPt is not needed.

But when there is no clobber instruction, I still prefer an insertion point closer to the "To" instruction, because the resulting IR looks better. That means the function has to return at least a flag saying whether the insertion needs to happen at the beginning of the "To" instruction's block.

I.e., I can simplify "Instruction *&NewInsertPt" to a flag. Is that API acceptable?

wmi added inline comments.Apr 28 2017, 4:11 PM
include/llvm/Target/TargetLowering.h
1973

shouldReduceLoadWidth is a hook on TargetLowering, but I need a hook on TargetLoweringBase so it can be used from an LLVM IR pass.

I cannot turn shouldReduceLoadWidth into a TargetLoweringBase hook because of the way x86 uses it, so I copied the logic from AMDGPUTargetLowering::shouldReduceLoadWidth into AMDGPUTargetLowering::isNarrowingExpensive. I can make shouldReduceLoadWidth call isNarrowingExpensive as an NFC change. Is that OK?

lib/CodeGen/MemAccessShrinking.cpp
418–431

I find I still have to keep the original complex pattern. Now I remember where the real difficulty is:

For a case like store(or_1(and_1(or_2(and_2(load, -65281), Val1), -256), and_3(Val2, 7))), I want to match LargeVal to or_2(and_2(load, ...)).

But I cannot use match(Val, m_c_Or(m_And(m_c_Or(m_And(...))))) because I have no way to bind the intermediate results of the match; in particular, I cannot bind LargeVal to the second m_c_Or. So I have to split the match into multiple steps. That is where the complexity comes from.
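
For illustration, one possible way to split it. This is only a sketch using the PatternMatch helpers, not the patch's actual code, and the m_Load-based inner check is an assumption about what counts as the coalesced-bitfield shape:

#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool matchStoreShrinkCandidate(Value *Val, Value *&LargeVal,
                                      Value *&SmallVal, ConstantInt *&Mask) {
  // Step 1: outer pattern or(and(LargeVal, Mask), SmallVal).
  if (!match(Val, m_c_Or(m_And(m_Value(LargeVal), m_ConstantInt(Mask)),
                         m_Value(SmallVal))))
    return false;
  // Step 2: check that LargeVal itself has the or_2(and_2(load, C), ...)
  // shape, so the intermediate 'or' is the value bound as LargeVal.
  return match(LargeVal,
               m_c_Or(m_And(m_Load(m_Value()), m_ConstantInt()), m_Value()));
}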

468–471

For x86-64, DataLayout works fine. However, for other architectures, like ARM, the datalayout is

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

DataLayout::LegalIntWidths only contains the native integer widths, given by "n32", so 32 is the only legal integer width for ARM.

For x86-64, the datalayout is:
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

Because of "n8:16:32:64", 8, 16, 32, and 64 are all legal integer widths.
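
For illustration, a sketch of what the DataLayout-based check could look like. The helper name is made up; DL.isLegalInteger() consults the "n..." component of the datalayout string, so it would accept 8/16/32/64 on x86-64 but only 32 on the ARM datalayout above:

#include "llvm/IR/DataLayout.h"
using namespace llvm;

static bool isLegalShrinkWidth(const DataLayout &DL, unsigned BitWidth) {
  // Only byte-sized widths that the target declares as native integers.
  return BitWidth % 8 == 0 && DL.isLegalInteger(BitWidth);
}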

wmi updated this revision to Diff 97170.Apr 28 2017, 4:20 PM

Address Eli, Matt and Chandler's comments.

Some major changes:

  • A lot of comments changed.
  • Split reduceLoadOpsStoreWidth into two routines and several other helper functions.
  • Refactored RecursivelyDeleteTriviallyDeadInstructions.
chandlerc requested changes to this revision.May 9 2017, 5:33 PM

Somewhat focused on the store side. Near the bottom is a high-level comment about the load shrinking approach.

lib/CodeGen/MemAccessShrinking.cpp
104

Still should omit the = nullptr here since this is an internal type.

154–157

These seem like helper functions that don't actually need to be part of the class at all. Maybe make them static free functions?

(This may be true for some of the other routines in this file.)

205–207

It seems much more idiomatic to return the bool indicating whether a valid BitRange was computable, and if true, set up the values in an output parameter.

Or even better, you could return an Optional<BitRange>, and return None when the requirements aren't satisfied.

209

I know other passes use the variable name Cst but I'd suggest using just C for generic constants or some more descriptive term like you use elsewhere like Mask.

212–215

Do you really need to extend or truncate here? Surely the type system has already caused the constant to be of the size you want? If so, I'd just assert it here. Maybe could directly pass a 'const &' APInt in as the parameter, letting you call the parameter Mask above?

216–217

I would call these MaskLeadingOnes and MaskTrailingOnes.

218–220

I'm having trouble understanding the logic here in the case where there are leading ones. Here is my reasoning, but maybe I've gotten something wrong here:

Shifting right will remove leading ones, but you're shifting right the number of *trailing* ones... Shouldn't that be *leading ones*? And won't the result of a shift *right* be to place the middle zero sequence at the least significant bit, meaning you would want to count the *leading* zeros?

Put differently, arithmetic shift is required to not change the most significant bit, so doing an arithmetic shift right based on how many ones are trailing, seems like it will never change the count of trailing zeros.

If this is correct, then this is a bug and you should add some test cases that will hit this bug.

But regardless of whether my understanding is correct or there is a bug here, I think this can be written in a more obvious way:

unsigned MaskMidZeros = BitSize - (MaskLeadingOnes + MaskTrailingOnes);

And then directly testing whether they are all zero:

if (Mask == APInt::getBitsSet(BitSize, MaskLeadingOnes,
                              MaskLeadingOnes + MaskMidZeros)) {
223–225

Why would we see an all-ones mask? Shouldn't that have been eliminated earlier? It seems like we could just bail in this case.

246

The idiomatic way to test this with APInt is BitMask.isSubsetOf(KnownZero).

Also, it would be good to use early-exit here. It *sounds* like you are testing whether it is valid to do anything, but that isn't clear when you have set up members of BR here before returning.

260

When you have alternates, the pattern notation is a bit confusing. I'd just say something like Analyze <bop>(load P, \p Cst) where <bop> is either 'or', 'xor', or 'and'.

261

This isn't really about whether the original value is loaded or not, right? It is just bounding the changed bits?

I'd explain it that way. You'll mention the load when you use it.

263–264

Maybe a better name of this function would be: computeBopChangedBitRange?

265

Same comment above about just asserting the correct bitsize and passing the APInt Mask in directly.

266

Why not pass the argument as a BinaryOperator?

267–268

Might be nice to add a comment explaining the logic here.

Something like:

Both 'or' and 'xor' operations only mutate when the operand has a one bit.
But 'and' only mutates when the operand has a zero bit, so invert the
constant when the instruction is an and so that all the (potentially)
changed bits are ones in the operand.
284

Why the PowerOf2Ceil here? Will the actual store used have that applied? If the actual store has that applied, why don't we want to consider that as BitSize so we're free to use that larger size for the narrow new type?

286–289

As the comment explains, this lambda is actually computing the Shift. But the name seems to indicate it is just a predicate testing whether the old range is covered by the new one.

Also, why does the old BR need to be passed in as an argument, isn't that something that can be captured? I actually like passing NewBR in here to show that it is what is *changing* between calls to this routine. But it seems awkward to set up NewBR before this lambda (which would allow the lambda to implicitly capture it) and then call it with a parameter name that shadows it, thereby avoiding the capture. I'd consider whether you want to sink NewBR down or otherwise handle it more cleanly in the loop.

Nit pick: we typically name lambdas like variables with FooBar rather than like functions with fooBar.

290

What about platforms with support for unaligned loads? Probably best to just leave a FIXME rather than adding more to this patch, but it seems nice to mention that technique.

As an example, on x86, if you have a bitfield that looks like:

struct S {
  unsigned a : 48;
  unsigned b : 48;
  unsigned c : 32;
};

It seems likely to be substantially better to do a single 8-byte load and mask off the high 2 bytes when accessing b than to do two nicely aligned 8-byte loads and all the bit math to recombine things.

329

Pass BR by value? (or make it a const reference, but it seems small)

331–335

Generally we prefer early returns to state. That would make this:

if (...)
  return ...;

return ...;
345–346

You can replace uses of this with: V1->stripPointerCasts() == V2->stripPointerCasts(). This will be more powerful as well.

370–371

Maybe this is just a strange API on MemorySSA, but typically I wouldn't expect a lack of dominance to indicate that no access between two points exists.

How does MemorySSA model a pattern that looks like:

From  x 
 \   /
  \ /
   A
   |
   |
   To

Where A is a defining access, is between From and To, but I wouldn't expect From to dominate A because there is another predecessor x.

393

StOffset seems an odd name if this is used to create new pointers for loads as well as stores.

402

No need to call it uglygep. If you want a clue as to the types, maybe rawgep or bytegep.

407–408

This comment mentions LargeVal but that isn't an argument?

418–431

It would be really nice if LLVM would canonicalize in one way or the other so you didn't have to handle so many variations. Asking folks about whether we can/should do anything like that.

But I think the bigger question is, why would only two layers be enough? I feel like there is something more general here that will make explaining everything else much simpler.

Are you looking for a load specifically? Or are you just looking for one side of an or which has a "narrow" (after masking) and?

If the former, maybe just search for the load?

If the latter, maybe you should be just capturing the two sides of the or, and rather than looking *explicitly* for an 'and', instead compute whether the non-zero bits of one side or the other are "narrow"?
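
For illustration, a sketch of the "compute whether one side is narrow" idea using known bits rather than an explicit 'and' match. The function name and the width test are assumptions, not the patch's code:

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

static bool hasNarrowNonZeroBits(const Value *V, const DataLayout &DL,
                                 unsigned MaxWidth) {
  KnownBits Known = computeKnownBits(V, DL);
  // Bits not known to be zero are the bits this operand may actually set.
  APInt MaybeSet = ~Known.Zero;
  if (MaybeSet.isZero())
    return true;
  unsigned Lo = MaybeSet.countTrailingZeros();
  unsigned Hi = MaybeSet.getBitWidth() - MaybeSet.countLeadingZeros();
  return Hi - Lo <= MaxWidth;
}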

422

This reads like a fragment, were there supposed to be more comments before this line?

557–558

Keeping this close to extendBitRange would make it a lot easier to read. Also, why have two functions at all? It appears this is the only thing calling extendBitRange. (I'm OK if there is a reason, just curious what it is.)

562–565

I'm surprised this doesn't just fall out from the logic in extendBitRange.

575

Is StoreShrunkInfo buying much here? It seems to mostly be used in arguments, why not just pass the argument directly? The first bit of the code seems to just unpack everything into local variables.

587–588

Doesn't MinAlign handle 0 correctly so that you can just do this unconditionally?

593–595

Isn't this only called when we need to insert two stores?

596–606

It feels like all of this could be factored into an 'insertStore' method? In particular, the clone doesn't seem to buy you much as you rewrite most parts of the store anyways.

This could handle all of the MemorySSA updating, logging, etc.

662–663

Rather than cloning and mutating, just build a new load? The IRBuilder has a helpful API here.

679–687

There is no need to handle constants specially. The constant folder will do all the work for you.

690–705

I think the amount of code that is special cased here for one caller of this routine or the other is an indication that there is a better factoring of the code.

If you had load insertion and store insertion factored out, then each caller could cleanly insert the narrow load, compute the narrow store (differently), and then insert it.

Does that make sense? Maybe there is a reason why that doesn't work well?

729–730

Might read more easily as: "Assuming that \p AI contains a single sequence of bits set to 1, check whether the range \p BR is covered by that sequence."

732–733

It seems more obvious to me to test this the other way:

BR.Shift >= AI.countLeadingZeros() &&
BR.Shift + BR.Width < (AI.getBitWidth() - AI.countTrailingZeros())

Is this not equivalent for some reason? (Maybe my brain is off...)

The reason I find this easier to read is because it seems to more directly test: "is the start of the BitRange after the start of the 1s, and is the end of the BitRange before the end of the 1s.".

742–745

There is no comment about the cost of this routine.

It looks *really* expensive. It appears to walk all transitive predecessors of the block containing To. So worst case, every basic block in the function. I see this called in several places from inside of for-loops. Is this really a reasonable approach?

Why aren't we just walking the def-use chain from MemorySSA to figure this kind of thing out in a much lower time complexity bound? Like, shouldn't we just be able to walk up defs until we either see a clobber or From?
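
For illustration, a sketch of that def-chain walk. The names and simplifications are assumptions, MemoryPhis are handled conservatively, and this is not the patch's hasClobberBetween:

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/MemorySSA.h"
using namespace llvm;

static bool defChainHasClobber(MemorySSA &MSSA, AAResults &AA,
                               MemoryAccess *From, MemoryUseOrDef *To,
                               const MemoryLocation &Loc) {
  MemoryAccess *MA = To->getDefiningAccess();
  while (MA != From && !MSSA.isLiveOnEntryDef(MA)) {
    auto *MD = dyn_cast<MemoryDef>(MA);
    if (!MD)
      return true; // MemoryPhi: give up conservatively in this sketch.
    if (isModSet(AA.getModRefInfo(MD->getMemoryInst(), Loc)))
      return true; // Found a clobbering def before reaching From.
    MA = MD->getDefiningAccess();
  }
  return false;
}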

762–763

You can just insert -- that will return whether it succeeded in inserting the block.

770

Naming convention.

780–781

So, each of these dominates queries are *incredibly* slow. They require linearly walking every instruction in the basic block (worst case).

Why doesn't MemorySSA handle this for you? (Maybe my comment above about using MemorySSA will obviate this comment though.)

946

It would be much more clear for this to be a parameter rather than an implicit parameter via class member. For example, multiple uses *of what*?

967–997

Rather than re-implementing all of this logic, can you re-use the existing demanded bits facilities in LLVM?

For example, I think you can use the DemandedBits analysis, walk all loads in the function, and then narrow them based on the demanded bits it has computed. Because of how DemandedBits works, it is both efficient and very powerful. It can handle many more patterns.

Thinking about this, I suspect you'll want to do two passes essentially. First, narrow all the *stores* that you can. This will likely be iterative. Once that finishes, it seems like you'll be able to then do a single walk over the loads with a fresh DemandedBits analysis and narrow all of those left. You'll probably want to narrow the stores first because that may make bits stop being demanded. But I don't see any way for the reverse to be true, so there should be a good sequencing.

To make the analysis invalidation stuff easier, you may actually need this to actually be two passes so that the store pass can invalidate the DemandedBits analysis, and the load pass can recompute it fresh.

Does that make sense?

If so, I would suggest getting just the store shrinking into this patch, and adding the load shrinking in a follow-up patch. I'm happy for them to be implemented in a single file as they are very similar, and it's good for people to realize they likely want *both* passes.
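
For illustration, a sketch of how the load side of that two-phase scheme might look. The helper name is made up; DemandedBits already folds shifts, truncs, and masks on the use side into the per-instruction demanded mask:

#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/DemandedBits.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

static SmallVector<LoadInst *, 8> collectNarrowableLoads(Function &F,
                                                         DemandedBits &DB) {
  SmallVector<LoadInst *, 8> Candidates;
  for (Instruction &I : instructions(F)) {
    auto *LI = dyn_cast<LoadInst>(&I);
    if (!LI || LI->isVolatile())
      continue;
    // If not every bit of the loaded value is demanded, this load is a
    // candidate for replacement with a narrower load plus a shift.
    if (!DB.getDemandedBits(LI).isAllOnes())
      Candidates.push_back(LI);
  }
  return Candidates;
}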

998

Inst would seem like a more common variable name here.

1144–1145

Comments would be good explaining the very particular iteration order.

1183–1184

Do you want to run tryShrink again just because you removed dead instructions?

If so, do you want to remove dead instructions on each iteration instead of just once tryShrink doesn't make a change?

This revision now requires changes to proceed.May 9 2017, 5:33 PM
wmi added a comment.May 10 2017, 10:43 AM

Chandler, thanks for the comments. They are very helpful. I will address them in the next revision. I have only replied to the comments where I had questions or concerns.

lib/CodeGen/MemAccessShrinking.cpp
104

I cannot omit it, because in INITIALIZE_TM_PASS_END, callDefaultCtor<passName> requires the parameter to have a default value.

218–220

Shifting right will remove leading ones, but you're shifting right the number of *trailing* ones... Shouldn't that be *leading ones*? And won't the result of a shift *right* be to place the middle zero sequence at the least significant bit, meaning you would want to count the *leading* zeros?

I think shifting right will remove trailing ones? And after the shift (Mask.ashr(MaskTrailOnes)), the middle zeros are at the least significant bits, so they are trailing zeros, right?

But like you said, I should rule out the all-zero/all-one cases separately so the logic becomes clearer.

370–371

That case cannot happen, because we ensure From dominates To before calling the function. But you are right, it is better to add an assertion at the entry of the function to prevent misuse of the API.

596–606

I use clone here just to duplicate the subclass data like volatile and ordered.

742–745

That is because the instruction To here may not be a memory access instruction (it is probably an And or Trunc instruction, which indicates that only some bits of the input are demanded), so we cannot get a MemoryAccess for it. Note that hasClobberBetween is overloaded and there are two versions. The other version, which walks the MemorySSA def-use chain, is the one used in several for-loops as you saw; this higher-cost version is not used in a loop. Besides, we only check the MemorySSA DefList in each BB, so the worst-case complexity is the number of memory access instructions in the function, which is usually much smaller than the total number of instructions.

946

MultiUsesSeen is not changed for every instruction. It records whether a previous instruction on the chain was found to have multiple uses while we walk the chain bottom-up.

r1 = ...;
r2 = r1 + r3;
r4 = r2 + r5;

If r2 has multiple uses, then neither r2 = r1 + r3 nor r1 = ... can be removed after the shrinking.

967–997

I considered the demanded bits facilities before, but I found they would only simplify the code a little. Finding the demanded bits of the load is only a small part of the work. Most of the complexity comes from figuring out which ops in the sequence on the def-use chain change the demanded bits. For example, if we see shifts, we may clear some demanded bits in less significant positions to zero because we shift right and then shift left; because we change the demanded bits, we must include the shifts in the shrunk code sequence. Similarly, if we see an Or(And(Or(And(...)))) pattern, we want to know that the bits changed by the outer Or(And(...)) are disjoint from the demanded bits; only when that is true can we omit the Or(And(...)) from the final shrunk code sequence. Another reason is that demanded bits analysis may not be very cheap. For memory shrinking, only a few patterns like and/trunc are commonly useful, so we don't actually need a fully general demanded bits analysis for every instruction.

1183–1184

If dead instructions are removed, another iteration is taken and tryShrink runs again.

I think it makes no difference whether we run removeDeadInsts only when tryShrink makes no change or run it every time after tryShrink makes a change.

Trying to provide answers to the open questions here...

lib/CodeGen/MemAccessShrinking.cpp
104

Ah, right, I forgot that about the pass initialization. Sorry for the noise!

218–220

Ah, ok, this makes sense to me now. I had confused myself thinking about it. Anyways, the simpler formulation will avoid any future reader confusion.

370–371

Ok, while that makes sense, it still seems counter-intuitive in terms of how to use MemorySSA based on my limited understanding.

I would have expected essentially walking up the defs from the use until either a clobber is found or the 'from' is found. One has to come first, and whichever is first dictates whether there is a clobber. Essentially, I would expect to use the *SSA* properties to answer these questions rather than the *dominance* or control-flow properties. But I'm happy if folks more deeply familiar with MemorySSA can explain better why this is the right way to use it, as I'm still new to this infrastructure in LLVM.

596–606

I still think it will be cleaner to directly construct the load.

Also, I wouldn't expect this pass to be valid for either volatile or ordered loads...

742–745

Given how different the two routines are, I would suggest giving them separate names. It seemingly wasn't obvious that they were different already.

I'm happy to look at the use cases, but this still feels much too expensive to me. In terms of big-O, the fact that it is only memory accesses doesn't really seem to help much. Quadratic in the number of memory accesses is still probably not something we can realistically do.

I want to think more about the algorithm once I see exactly where this is being called.

946

Ok, that explanation makes sense, but you'll need to find a way to make this clear from the code itself. =] At the very least, not using a member, but probably with some more helpful variable names, function names, structure or comments.
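
For illustration, one way to make that explicit without a pass member. This is a sketch; the function name, the recursion shape, and the depth cap are all assumptions:

#include "llvm/IR/Instructions.h"
using namespace llvm;

// Walk the operand chain bottom-up and report whether any def on the chain
// has more than one use, i.e. whether it would stay alive after shrinking.
static bool chainHasMultiUseDef(const Instruction *I, unsigned Depth = 0) {
  constexpr unsigned MaxDepth = 6; // arbitrary cap for this sketch
  if (Depth > MaxDepth)
    return true; // be conservative once the walk gets deep
  for (const Value *Op : I->operands())
    if (auto *OpI = dyn_cast<Instruction>(Op))
      if (!OpI->hasOneUse() || chainHasMultiUseDef(OpI, Depth + 1))
        return true;
  return false;
}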

967–997

I don't really understand the argument here...

I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not? If not, it seems like we should extend that general facility rather than building an isolated system over here.

Regarding the cost, I'm much more worried about the algorithmic cost of this and the fact that it seems relatively sensitive to things that we don't reliably canonicalize (using trunc or and instructions to remove demanded bits).

Generally speaking, working *backwards* from the and or trunc is going to be much more expensive than working *forwards*.

But even if the existing demanded bits analysis is too expensive, we still shouldn't develop a new cheap one locally here. We should either add a parameter to decrease its cost or add a new general-purpose "fast" demanded bits analysis and then use that here.

1183–1184

I guess I'm trying to ask: why will the removal of dead instructions cause shrinking to become more effective? Most of the algorithms here don't seem likely to remove entire classes of uses, and so I'm not sure why this iteration is valuable at all.

But if it is valuable, that is, if removing dead instructions exposes more shrinking opportunities, I would expect that removing dead instructions *earlier* (IE, on each iteration) to cause this to converge faster.

wmi added inline comments.May 10 2017, 10:37 PM
lib/CodeGen/MemAccessShrinking.cpp
596–606

Ok, I will change it to use IRBuilder.

967–997

I would expect the demanded bits facilities to already handle things like shifts changing which bits are demanded, does it not?

Yes, the demanded bits facilities can adjust which bits are demanded when there is a shift. However, which bits are demanded is not the only thing I need to know. I also need to know whether an operation on the def-use chain effectively changes the value of the demanded bits. If the operation changes the value of the demanded bits, it should be shrunk together with the load; if it only changes bits other than the demanded bits, it can be omitted. That is what most of the pattern matching in load shrinking is doing, and that part of the work, I think, cannot be covered by demanded bits analysis.

1183–1184

why will the removal of dead instructions cause shrinking to become more effective?

After removing some dead instructions, some multi-use defs become single-use defs, which increases the benefit of doing more shrinking.

if removing dead instructions exposes more shrinking opportunities, I would expect that removing dead instructions *earlier* (IE, on each iteration) to cause this to converge faster.

Ok, then I will do it after tryShrink in every iteration.

wmi added inline comments.May 11 2017, 7:26 AM
lib/CodeGen/MemAccessShrinking.cpp
967–997

Even for just the demanded bits, working forwards is not straightforward, because a wide load for a field group is shared by multiple field accesses, and demanded bits analysis will sometimes show that all the bits of the load are used. It is also possible that on the upper side of the def-use chain the width of the demanded bits is larger because a node may have multiple uses, while on the lower side of the def-use chain the width of the demanded bits is smaller. Working forwards, we have to search the expression tree rooted at a load. Working backwards lets us know from the beginning which demanded bits we want to use.

wmi added a comment.May 16 2017, 5:20 PM

Discussed with Chandler offline, and we decided to split the patch and try to commit the store shrinking first.

Then I tried the idea of walking forward for load shrinking using DemandedBits, but I ran into a problem with the motivational testcase (test/CodeGen/X86/mem-access-shrink.ll). Look at the %bf.load we want to shrink in mem-access-shrink.ll: it has multiple uses, so we want to look at all of its uses and get the demanded bits for each use. However, on the def-use chains from %bf.load to its narrower uses, %bf.load is not the only value with multiple uses; for example, %bf.set also has multiple uses, so we also need to look at all the uses of %bf.set. In theory, every node on the def-use chain can have multiple uses, so at the initial portion of the chain starting from %bf.load we don't know from the demanded bits whether %bf.load can be shrunk; only when we walk fairly close to the end of the chain do we know whether %bf.load can be shrunk at a specific place. In other words, by walking forward, in order not to miss any shrinking opportunity, we have to walk across almost all the nodes of the def-use tree before knowing where %bf.load can be shrunk.

For walking backwards, in most cases we only have to walk from a narrower use to "%bf.load =", which is the root of the def-use tree. It is like walking from a few leaves to the root, which should be more efficient in most cases. I agree we may see special testcases where there is a long common portion shared by those leaf-to-root paths (and for those, walking forwards is better). If that happens, we can add a cap on the maximum walking distance to keep the compile-time cost from being too high. Chandler, do you think that is ok?

In D30416#756899, @wmi wrote:

If that happens, we can add a cap on the maximum walking distance to keep the compile-time cost from being too high. Chandler, do you think that is ok?

I'm somewhat worried about this cap -- it has hurt us in the past. But maybe there is a way to make walking backwards have reasonable complexity. It still seems like something we can do in a separate phase rather than having it interleave with the store-based shrinking, and so I'd still split it into a separate patch.

Setting aside forwards vs. backwards-with-a-cap, I still think it is a mistake to add yet another implementation of tracking which bits are demanded. So I would look at how you might share the logic in DemandedBits (or one of the other places in LLVM where we reason about this, I think there are already some others) for reasoning about the semantics of the IR instructions. Maybe there is no way to share that, but it seems worth trying. Either way, I'd suggest a fresh thread (or IRC) to discuss issues until there is a patch so that we can move the store side of this forward independently.

That make sense?

fhahn added a subscriber: fhahn.May 19 2017, 5:30 AM

Looks like this thread has gone stale for a while.

I have not read the patch in detail, so what I say might be nonsense :) From reading the discussion, it seems that DemandedBits analysis is not ready today to handle forward walking for load shrinking, and backward walking isn't ideal without a cap, so what is a good path forward? Is it possible to keep the core of this patch but with more reuse of the DemandedBits analysis (with refactoring)? If not, we may want to consider moving forward with a stop-gap solution for now and committing to longer-term unification as a follow-up.