This is an archive of the discontinued LLVM Phabricator instance.

Differential D45543

[globalisel] Add a combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64
ClosedPublic

Authored by dsanders on Apr 11 2018, 4:12 PM.

Details

Reviewers

ab
aditya_nandakumar
bogner
rtereshin
volkan
rovka
javed.absar
aemerson

Commits

rGc973ad1878f3: Re-commit: [globalisel] Add a combiner helpers for extending loads and use them…
rG9659bfda5a64: [globalisel] Add a combiner helpers for extending loads and use them in a pre…
rGd24dcdd1f74b: [globalisel] Add a combiner helpers for extending loads and use them in a pre…
rL343654: Re-commit: [globalisel] Add a combiner helpers for extending loads and use them…
rL343521: [globalisel] Add a combiner helpers for extending loads and use them in a pre…
rL331816: [globalisel] Add a combiner helpers for extending loads and use them in a pre…

Summary

Depends on D45541

Diff Detail

Repository

rL LLVM

Build Status

Buildable 18686
Build 18686: arc lint + arc unit

Event Timeline

dsanders created this revision. Apr 11 2018, 4:12 PM

Herald added subscribers: kristof.beyls, javed.absar, mgorny, rengolin. Apr 11 2018, 4:12 PM

Harbormaster completed remote builds in B16998: Diff 142091. Apr 11 2018, 4:13 PM

Hey Daniel - this looks mostly good to me.

Maybe we could have all the tests in one file called prelegalize-combine-extloads.mir

I'm assuming this is in Target/AArch64 because other targets haven't been updated to use the new opcodes yet? We do eventually want to use these representations for every target though right?

In D45543#1066099, @aditya_nandakumar wrote:

Maybe we could have all the tests in one file called prelegalize-combine-extloads.mir

I went with three files to match how we've been organizing the tests for the other passes but you raise a good point here. There's a good argument for the combiner being tested with one file per CombinerHelper::try*() function. I think that's probably a better organization.

In D45543#1071617, @aemerson wrote:

I'm assuming this is in Target/AArch64 because other targets haven't been updated to use the new opcodes yet? We do eventually want to use these representations for every target though right?

That's right. As you say, we'll want to support it in every target that supports the extending loads (which is most if not all of the in-tree targets). However, until the new opcodes are legal for those targets, there's not much point in combining them only to revert them back to load+extend in the legalizer.

One other thing to mention is that we don't have a target-independent combiner in GlobalISel at the moment. Each target implements its own combiner(s) and makes use of code in CombinerHelper (where appropriate) to share code. I expect these combines to be used by multiple targets so I've put the bulk of the code in CombinerHelper but each target will need to add a pass and call to it.

Herald added a reviewer: javed.absar. Apr 23 2018, 12:45 AM

Ok makes sense. We still have some work to do on the combiner design front on how to allow targets to select subsets of combines they're interested in so we can have a shared pipeline. I think this is fine for now but it raises the priority for that discussion later.

This revision is now accepted and ready to land. Apr 23 2018, 8:34 AM

rtereshin added inline comments. Apr 30 2018, 12:40 PM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
53	It looks like we have a contract-inconsistency here. `CombinerHelper::tryCombineCopy` assumes it could be called on any opcode, and if it's not a COPY, it's expected to just gracefully return and report it didn't change anything. While the newly added `CombinerHelper::tryCombineExtendingLoads` requires the opcode belonging to a specific subset. I think it makes sense to be consistent about it and probably not just within the `CombinerHelper`, but among all the derived combiners and maybe even all global isel combiners in general. + @aditya_nandakumar
64	Is it possible to have a memory operand missing here? Also, if not, does MachineVerifier enforce it?
73	`getOpcodeDef` is able to look through copies, therefore I'd expect this combiner to match the following sequence:

    %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
    %v2:_(s16) = COPY %v1(s16)
    %v3:_(s32) = G_ZEXT %v2(s16)

and produce the following output:

    %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
    %v2:_(s16) = COPY %v1(s16)
    %v3:_(s32) = G_ZEXTLOAD %ptr(p0), (load 2)

Do you think it's a good idea to add tests like this to verify that this actually happens, and happens correctly?
244–246	This is clearly supposed to try all of the combines implemented in the helper:

    /// If \p MI is extend that consumes the result of a load, try to combine it.
    /// Returns true if MI changed.
    bool tryCombineExtendingLoads(MachineInstr &MI);

    /// Try to transform \p MI by using all of the above
    /// combine functions. Returns true if changed.
    bool tryCombine(MachineInstr &MI);
    };
lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp
50	`CombinerHelper::tryCombineCopy` contains a bug (it doesn't check that the register classes and register banks of its source and destination are compatible). As soon as that's fixed* and `CombinerHelper::tryCombine` properly tries all of the combines implemented in `CombinerHelper`, as it's supposed to, this could be replaced with a single call to `CombinerHelper::tryCombine`, but not before.

*I'm planning on adding a patch with that fix soon, but it will depend on https://reviews.llvm.org/D45732, as the latter makes the former simpler and shorter.

+ @aditya_nandakumar
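The look-through-copies behaviour raised in the comment on line 73 can be illustrated with a toy model. This is only a sketch: `Instr`, `DefMap`, and `findDefThroughCopies` are hypothetical names invented for the example and are not the LLVM types or the real `getOpcodeDef` signature.

```cpp
#include <unordered_map>

// Toy stand-ins for generic MIR; these are not the LLVM types.
enum class Opcode { Load, Copy, ZExt };
using VReg = unsigned;

struct Instr {
  Opcode Op;
  VReg Def; // Register defined by this instruction.
  VReg Src; // Register read by this instruction (unused for Load).
};

// Maps each virtual register to its single defining instruction (SSA).
using DefMap = std::unordered_map<VReg, const Instr *>;

// Hypothetical helper: find the instruction defining Reg, skipping copies,
// and return it only if it has the wanted opcode.
const Instr *findDefThroughCopies(const DefMap &Defs, VReg Reg,
                                  Opcode Wanted) {
  auto It = Defs.find(Reg);
  // Follow any chain of copies back to the first non-copy definition.
  while (It != Defs.end() && It->second->Op == Opcode::Copy)
    It = Defs.find(It->second->Src);
  if (It == Defs.end() || It->second->Op != Wanted)
    return nullptr;
  return It->second;
}
```

With the three-instruction sequence from the comment, asking for the `Load` that feeds the extend's source register finds the load even though a copy sits in between, which is why a test covering that shape is worth having.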

aditya_nandakumar added inline comments. Apr 30 2018, 12:47 PM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
53	Good catch @rtereshin - I would think being consistent here is nice, i.e. return gracefully if it's not the opcode we want (unless there's a strong reason to change that).
244–246	I suspect he's put the calls to tryCombineExtendingLoads in the AArch64 pass as other targets may not handle the extending load opcodes correctly in the legalizer. If/when they're ready, then it makes sense to move them into the generic helper.
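The "return gracefully" contract discussed for line 53 might look roughly like the following. It is a toy model under stated assumptions: `Instr`, its `Combined` flag, `tryCombineExtend`, and `combineAll` are hypothetical and only stand in for `MachineInstr` and the `CombinerHelper::try*()` functions.

```cpp
#include <vector>

// Toy stand-ins for MachineInstr and its opcode; not the LLVM types.
enum class Opcode { Copy, Load, SExt, ZExt, AnyExt, Add };

struct Instr {
  Opcode Op;
  bool Combined; // Set when a combine rewrote this instruction.
};

// Hypothetical helper following the "return gracefully" contract: any
// opcode may be passed in, and a mismatch simply reports "no change".
bool tryCombineExtend(Instr &MI) {
  if (MI.Op != Opcode::SExt && MI.Op != Opcode::ZExt &&
      MI.Op != Opcode::AnyExt)
    return false; // Not an extend: leave MI untouched.
  // The real matching and rewriting would go here; this sketch only
  // records that the instruction was handled.
  MI.Combined = true;
  return true;
}

// A caller can offer every instruction to the helper without checking
// the opcode itself first.
unsigned combineAll(std::vector<Instr> &Instrs) {
  unsigned NumChanged = 0;
  for (Instr &MI : Instrs)
    if (tryCombineExtend(MI))
      ++NumChanged;
  return NumChanged;
}
```

The cost is the one dsanders mentions: when the caller already knows the opcode, the check is made on both sides of the call.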

dsanders added inline comments. May 1 2018, 9:58 AM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
53	I can switch it over to that. I just thought it was a shame to check it on both sides of the call.
64	"Is it possible to have a memory operand missing here?"

No, several parts of GlobalISel require it.

"Also, if not, does MachineVerifier enforce it?"

Yes, it reports 'Generic instruction accessing memory must have one mem operand' if it's not the case.
73	That makes sense to me. We won't want to go overboard with that kind of thing though, it's enough to check it in a couple combines
244–246	They'll combine correctly but the legalizer will immediately decompose them again on most targets at the moment, so it's just wasted effort that slows compile times.

This brings up something I've been wondering how to deal with over the last few days. If we continue down the path we're going, I currently don't see how we're going to manage large collections of combines and several targets. Suppose the Foo and Bar targets support the following combines:

    tryCombine
      tryCombineGroup1
        tryA
        tryB
        tryC
      tryCombineGroup2
        tryD
        tryE
        tryF

and each calls tryCombine. Now suppose Bar realizes that tryE is doing more harm than good.

We could make tryE check that the target isn't Bar. The consequence of that is that we have a monolithic implementation covering all targets. It means a target-specific change can introduce all-target bugs, all targets must carry everyone's complexity, every change needs testing (including performance) for every target, etc.

We could make Bar call tryCombineGroup1, tryD, and tryF instead. The consequence of that is that useful changes to tryCombine and tryCombineGroup2 won't apply to Bar. Everyone will have to review combines and choose to enable them, and as such won't benefit from improvements. They also won't suffer from losses, which is a good thing, but I would hope that those are less common.

We could split tryCombineGroup2 to get (this is just one example):

    tryCombine
      tryCombineGroup1
        tryA
        tryB
        tryC
      tryCombineGroup2
        tryD
        tryF
      tryCombineGroup3
        tryE

This of course assumes that the ordering between E and F doesn't matter. Of course, if we do that enough then we eventually reach:

    tryA
    tryB
    tryC
    tryD
    tryF
    tryE

bringing with it the same problems as the previous option.

The answer I keep returning to at the moment is that even if the majority of our implementation is C++, we need something declarative to organize and control the combines for each target. We don't necessarily need to go to my original intent of re-using the ISel matcher for combines and thereby doing the majority of the matching in tablegenerated code, but we do need a declarative way to control whether a combine is enabled/disabled (which is trivial) and the order they're applied (not so trivial).
lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp
50	"and CombinerHelper::tryCombine properly tries all of the combines implemented in CombinerHelper as it's supposed to"

At the moment, that will look like a good idea, but having a single monolithic tryCombine() isn't going to last long as more combines get implemented and more targets use combines (see above).
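One way to picture the "declarative way to control whether a combine is enabled/disabled" is a shared rule table plus per-target data. This is a minimal sketch, not a proposal from the review: `CombineRule`, `TargetCombineConfig`, `makeSharedRules`, and the toy rule bodies are all hypothetical, and the rule names simply reuse tryD/tryE/tryF from the example above.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Toy instruction: a single integer stands in for real MIR.
struct Instr {
  int Value;
};

// One combine rule: a name targets can refer to, plus the function.
struct CombineRule {
  std::string Name;
  std::function<bool(Instr &)> Apply;
};

// Per-target configuration expressed as data, so the rules themselves
// contain no per-target branching.
struct TargetCombineConfig {
  std::set<std::string> Disabled;
};

// Apply the first enabled rule that fires, in table order.
bool tryCombine(const std::vector<CombineRule> &Rules,
                const TargetCombineConfig &Config, Instr &MI) {
  for (const CombineRule &Rule : Rules) {
    if (Config.Disabled.count(Rule.Name))
      continue; // This target opted out of the rule.
    if (Rule.Apply(MI))
      return true;
  }
  return false;
}

// Hypothetical shared rule set with made-up rule bodies.
std::vector<CombineRule> makeSharedRules() {
  return {
      {"tryD", [](Instr &MI) -> bool {
         if (MI.Value >= 0) return false;
         MI.Value = -MI.Value;
         return true;
       }},
      {"tryE", [](Instr &MI) -> bool {
         if (MI.Value <= 100) return false;
         MI.Value = 100;
         return true;
       }},
      {"tryF", [](Instr &MI) -> bool {
         if (MI.Value <= 10) return false;
         MI.Value -= 10;
         return true;
       }},
  };
}
```

Here Bar would list "tryE" in `Disabled` and still pick up later changes to the shared table. The sketch covers only the trivial half of the problem (enabling and disabling); per-target ordering is not addressed.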

rtereshin added inline comments. May 1 2018, 10:55 AM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
64	Cool, thanks!
73	Agreed.
244–246	I like the idea with groups. I think it will make it easier for target-writers (and not only human target-writers ;-) ) to compose a reasonable pipeline for their needs relatively easily. Also, the grouping doesn't have to be 1-level, it could be more, if well-structured and useful.

It may be worth contemplating, though, what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode: all G_ADD combines, all arithmetic combines (2nd-level group), etc., but this kind of grouping has a chance to prove less useful than architecture / micro-architecture targeted grouping. Like "combines for super-scalar architectures with a lot of instruction-level parallelism", and "architectures with highly efficient SIMD units" etc., something driven by what actual targets use and ignore.

As for the declarative approach, honestly, putting as much as possible into Tablegen doesn't strike me as a wise approach. Tablegen'erated implementations are hard to read and search, interfaces between them and the rest of the compiler seem to be obscure and fragile, and it's very hard to make changes to any of it.

If it's possible to derive the implementation from the information already provided by target writers, extending a Tablegen backend at least goes in the "let's auto-generate the entire compiler" direction, which is valuable. For instance, if we could derive the optimal pipeline of combines from scheduling and instruction info already put together by the targets.

If it's needed, however, to come up with a new set of Tablegen classes to explicitly define that pipeline manually in a *.td file, and write an entirely new Tablegen backend to process it, that doesn't look like a valuable thing to do. We could be as declarative as we want to be in C++ and that may be much easier to work with.

Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a hypothetical (at the moment) tool that would be able to generate an optimal combine-pipeline for a target automatically, provided a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again. That sort of thing calls for a separate binary tool having wide access to LLVM infrastructure, including all the combines, not for a Tablegen backend. I gave up on making Testgen a part of the Tablegen process very quickly, for instance.
lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp
50	Something still needs to be done about `tryCombine`. If it doesn't represent any practically useful group used by any target (not necessarily in the final implementation, during experimentation too), let's remove it. If it does, let's maintain it so it does precisely what it promises to do.

aemerson added inline comments. May 8 2018, 8:33 AM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
244–246	All good points from you and Daniel. I don't know if tablegen is the best tool for the combiners, as I worry we'll paint ourselves into a corner once we go down that path. I think it would be worth exploring the design of combiner groups, and higher level schemes expressed in a declarative way as Daniel wants. The schemes could encapsulate the choices a target makes about their combiner configuration, with a default one that most targets would use at least at the beginning. Where customisability comes into it is with something like predicates/masks that enable/disable the activation of specific combines in the groups, so a scheme is a set of groups or existing scheme with overlaid modifications. For altering the orders of combines, an overlay could express the reversal of two given combines within a specific group.

dsanders added inline comments. May 8 2018, 12:05 PM

lib/CodeGen/GlobalISel/CombinerHelper.cpp
244–246	"It may worth contemplating though what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode: all G_ADD combines, all arithmetic combines (2-nd level group), etc, but this kind of grouping has a chance to prove less useful than architecture / micro-architecture targeted grouping. Like "combines for super-scalar architectures with a lot of instruction-level parallelism", and "architectures with highly efficient SIMD units" etc, something driven by what actual targets use and ignore."

That's what I was thinking and is the reason for naming this tryCombineExtendingLoads(). We might need to break it down as more targets get added though, as some only have sign-extending loads and others only have zero-extending loads.

"As for the declarative approach, honestly, putting as much as possible into Tablegen doesn't strike me like a wise approach..."

As you say, it doesn't necessarily have to be tablegen. The main thing is being able to throw the common rules and target-dependent rules together and get something that executes them in a sensible order and excludes the things the target doesn't want.

"Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a hypothetical (at the moment) tool that would be able to generate an optimal combine-pipeline for a target automatically, provided a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again."

I agree. This line of thinking is a key reason I included a code coverage mechanism in the InstructionSelector. By extending the 1-bit counters to N-bit and feeding that into the rule sorting, we would end up with a match table that prioritized common rules as far as correctness allowed.

"All good points from you and Daniel. I don't know if tablegen is the best tool for the combiners, as I worry we'll paint ourselves into a corner once we go down that path."

That's something we'll need to investigate while we look at how to maintain this long-term. Just for reference, in my original plan back when I started on the DAGISel importer, my main arguments in favour of tablegen for combiners were:

- ISel and Combiners are essentially the same thing. They match MIR and replace it with equivalent MIR.
- By using the same infrastructure for both, optimization investments we make for one can benefit the other.
- By using the same infrastructure for both, solving the ordering problem for one solves it for the other.
- The feature bits mechanism (e.g. `Requires<[HasX]>`) is ideal for compiling out Combines as you can simply tell tablegen to discard rules with particular feature bits and check the rest at run time. This can also be done for ISel, potentially reducing the match table size when you only need particular subtarget(s).

I didn't go through the implementation detail for Combiners at the time but I did leave a few doors open in case we decided to go down this route later. For example, this is the reason the implementation permits multiple match roots.

Update before commit

Harbormaster completed remote builds in B17860: Diff 145802. May 8 2018, 3:30 PM

Closed by commit rL331816: [globalisel] Add a combiner helpers for extending loads and use them in a pre… (authored by dsanders). May 8 2018, 3:30 PM

This revision was automatically updated to reflect the committed changes.

Re-opening as I'm about to update it with a revised version. The previous one broke several bots because it sank loads down to the extends.

This revision is now accepted and ready to land. May 29 2018, 10:17 AM

Rewrite the patch such that the extends hoist up to the loads. The previous
patch had a couple problems that became clear on the bots:

- The loads could be duplicated. This is a correctness problem for volatile loads but more generally is likely to harm performance.
- The loads would sink to the extends without any hazard checking.

Matching the load and folding in the uses is significantly more complex than
the previous code, but by anchoring the load where it is we avoid both
correctness and performance issues.

The majority of the complexity comes from the need to resolve multiple
(potentially conflicting) uses. The approach this patch takes is fairly simple
in principle:

1. Pick a preferred extend (see below)
2. Fold that extend into the load
3. Fix up the other uses with truncates/extends

2 and 3 are fairly straightforward but 1 relies on heuristics to make a good
choice. The current heuristics are:

- Prefer a sext/zext over an anyext. This is on the basis that anyext is essentially free (and we therefore don't save instructions by folding it), is compatible with sext/zext, and extending with defined bits opens further optimization opportunities once we have known-bits infrastructure.
- Prefer a sext over a zext. This is on the basis that a zext typically lowers to a single immediate 'and' instruction whereas a sext typically lowers to a shift-left/shift-right sequence (except in special cases). It's therefore cheaper to emit a separate zext than a separate sext.
- Prefer larger types. This is on the basis that G_TRUNC is usually free (Mips is a notable exception and will probably want to tweak this when its port gets this far). There is a catch with this though, since it can also increase the pressure on larger registers, which some targets have a smaller supply of.
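As a rough illustration of step 1, the three heuristics can be read as a lexicographic comparison over the uses of a load. This is a sketch only: `LoadUse`, `UseKind`, `rank`, and `choosePreferredUse` are hypothetical names, and treating the heuristics as strict tie-breakers in the listed order (extend kind first, then width) is an assumption made for the example; the patch itself may weigh them differently.

```cpp
#include <cstddef>
#include <vector>

// Kinds of use a loaded value can have; Other covers non-extending uses.
enum class UseKind { Other, AnyExt, ZExt, SExt };

struct LoadUse {
  UseKind Kind;
  unsigned Bits; // Width of the use's result type.
};

// Preference rank of an extend kind: sext > zext > anyext.
// Assumption for this sketch: kind always outranks width.
static int rank(UseKind K) {
  switch (K) {
  case UseKind::AnyExt: return 0;
  case UseKind::ZExt:   return 1;
  case UseKind::SExt:   return 2;
  case UseKind::Other:  return -1;
  }
  return -1;
}

// Returns the index of the preferred extend, or -1 if no use is an extend.
int choosePreferredUse(const std::vector<LoadUse> &Uses) {
  int Best = -1;
  for (std::size_t I = 0; I < Uses.size(); ++I) {
    const LoadUse &U = Uses[I];
    if (U.Kind == UseKind::Other)
      continue; // Non-extends are fixed up later and never preferred.
    if (Best < 0) {
      Best = static_cast<int>(I);
      continue;
    }
    const LoadUse &B = Uses[static_cast<std::size_t>(Best)];
    int RU = rank(U.Kind), RB = rank(B.Kind);
    // Better kind wins; among equal kinds, the larger type wins.
    if (RU > RB || (RU == RB && U.Bits > B.Bits))
      Best = static_cast<int>(I);
  }
  return Best;
}
```

The uses that are not chosen would then be rewritten in step 3 with a truncate or a further extend of the folded load's result.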

Harbormaster completed remote builds in B18684: Diff 148945. May 29 2018, 10:42 AM

The full patch. The previous update was taken from the wrong machine and hadn't
been finished. Aside from not compiling, there was also a bug where additional
combine attempts would see G_SEXTLOAD mutate into G_LOAD (the anyext form).

aemerson added inline comments. May 31 2018, 5:30 AM

test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
42	This was removed during r332449?

dsanders added inline comments. Jun 7 2018, 7:59 AM

test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll
42	This is probably a result of the rebase. I'll remove it

Hi Daniel, sorry for the delay.

I have some questions inline, but overall it seems that we don't keep track of how many of each use is a particular kind of extend. Could we have situations where overall we increase code size due to, for example, preferring a sign extend even if we have multiple zero-extending users? This might not be worth tackling in a first pass at this if it's a rare case though.

include/llvm/CodeGen/GlobalISel/CombinerHelper.h
34	There doesn't seem to be a user of this.
lib/CodeGen/GlobalISel/Combiner.cpp
29	What's the purpose of this? Currently it just wraps around the underlying worklist. Debugging messages only?
33	Should we have a specialization for `GISelWorklist<512>`, and then not repeat the 512 in multiple places?
37	Delete or DEBUG()?
lib/CodeGen/GlobalISel/CombinerHelper.cpp
55	Can we promote this lambda into a helper function? From the name and lack of header doc it's unclear what it's supposed to do with the params on first reading.
64	What else could ExtendOpcodeB be at this point? If nothing else, should be an assert?

Thanks Amara

I have some questions inline, but overall it seems that we don't keep track of how many of each use
is a particular kind of extend. Could we have situations where overall we increase code size due to,
for example, preferring a sign extend even if we have multiple zero-extending users? This might not
be worth tackling in a first pass at this if it's a rare case though.

That's a good point. Yes, that can happen at the moment.

I can see two cases (and mixtures of the two).

The first is multiple zexts to the same type. In this case, we would generate one trunc/zext pair for each use. CSE ought to fix it later (once we enable that) but it would be better to CSE it up front and avoid paying the cost of allocating+processing the redundant instructions. A simple map of opcode and LLT to vreg while emitting the trunc/zext's ought to do the trick there.

The other is the case where there are zexts to multiple types. In this case, picking the sext and trunc/zext is still a win as it eliminates 1-2 instructions, whereas whichever zext we pick can eliminate at most 1 since all but one zext still needs to emit an instruction.

There are target-specific special cases though. For example, Mips sext from 32-bit to >32-bit is free (because it was actually done by the trunc or the gpr32 instruction that def'd it) so picking that particular sext is worse than picking any zext.
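The "map of opcode and LLT to vreg" suggested for the first case could be sketched as below. It is a toy model: `FixupEmitter`, `Emitted`, and `getOrEmit` are hypothetical names, a plain bit width stands in for an LLT, and the source register is left implicit because every fix-up reads the result of the same combined load.

```cpp
#include <map>
#include <utility>
#include <vector>

// Toy stand-ins; a bit width models the LLT of the fix-up's result.
enum class Opcode { Trunc, ZExt, SExt };
using VReg = unsigned;

struct Emitted {
  Opcode Op;
  unsigned Bits;
  VReg Def;
};

// Emits fix-up instructions for the uses of one combined load, reusing
// an earlier result when the same (opcode, type) pair was already emitted.
class FixupEmitter {
  std::map<std::pair<Opcode, unsigned>, VReg> Cache;
  VReg NextVReg = 100; // Arbitrary starting number for new registers.

public:
  std::vector<Emitted> Instrs; // Instructions created so far.

  VReg getOrEmit(Opcode Op, unsigned Bits) {
    auto Key = std::make_pair(Op, Bits);
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second; // Reuse instead of emitting a duplicate.
    VReg Def = NextVReg++;
    Instrs.push_back({Op, Bits, Def});
    Cache.emplace(Key, Def);
    return Def;
  }
};
```

Two zext users of the same type would then share one fix-up instruction, so the redundant copies are never allocated rather than being left for a later CSE pass.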

include/llvm/CodeGen/GlobalISel/CombinerHelper.h
34	That's right. It's not needed for this particular combine but there should be some that need it in future. I could drop it from this patch
lib/CodeGen/GlobalISel/Combiner.cpp
29	The intent behind CombinerChangeObserver is to inform the combiner pass that certain events happened in CombinerHelper. The main ones I'm expecting to end up with are instruction creation and deletion. I can also see instruction modification events here in future.

I don't think scheduling for revisit should be in CombinerChangeObserver though. I think that should be determined by the combiner pass in response to these events. It should also be derived from the implemented combines in some way to avoid the issue in DAGCombine where combines are sometimes missed because one rule failed to schedule the right node (e.g. because the root wasn't directly connected to the modified nodes in the case of combines that cover several nodes).

The reason for having CombinerChangeObserver subclasses rather than just implementing it directly is so that CombinerHelper is usable by lots of different kinds of Combiner passes. We might have different implementations for `O0` - `O3`, or a specific strategy might work better for one particular target, or maybe the pass isn't a combiner at all and just wants to borrow a couple of combines. Each implementation would provide a CombinerChangeObserver subclass that glues CombinerHelper into its implementation.

This particular implementation ought to grow a mechanism to limit the run-time of the pass. Exactly how that should work will need some more thinking about, but as an example we could track how deeply recursed the combiner is (i.e. how many combine rules triggered to create it) and decline to schedule instructions beyond a certain point. For `O1`, combines might be limited to the first generation of new/modified instructions, for `O2` maybe the first 3 generations, and `O3` might be unlimited.
33	Yes, let's add an alias for it as Combiner::WorkListTy
37	Oops. That was some temporary debugging code. It's useful for tracing the combines though so let's make it DEBUG()/dbgs()
lib/CodeGen/GlobalISel/CombinerHelper.cpp
55	Sure.
64	Anything. There's no guarantee that the use of a load is an extend. We ignore that when selecting a preferred use and deal with the non-extends by inserting a trunc (which is free for most targets) later
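The observer and generation-limit ideas from the comment on Combiner.cpp line 29 can be sketched with a toy pass-side implementation. Everything here is hypothetical: `ChangeObserver` only mirrors the shape of the CombinerChangeObserver class in the diff, and `GenerationLimitedWorklist`, `addOriginal`, `pop`, and `pending` are invented for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <unordered_map>

struct Instr {
  int Id;
};

// Same shape as the observer interface added by the patch, on toy types.
class ChangeObserver {
public:
  virtual ~ChangeObserver() {}
  virtual void erasedInstr(Instr &MI) = 0;
  virtual void createdInstr(Instr &MI) = 0;
};

// One possible pass-side observer: it owns the worklist and decides for
// itself whether a newly created instruction is worth revisiting.
class GenerationLimitedWorklist : public ChangeObserver {
  std::deque<Instr *> Worklist;
  std::unordered_map<Instr *, unsigned> Generation;
  unsigned MaxGeneration;
  unsigned CurrentGeneration = 0;

public:
  explicit GenerationLimitedWorklist(unsigned MaxGeneration)
      : MaxGeneration(MaxGeneration) {}

  // Instructions present before combining started are generation 0.
  void addOriginal(Instr &MI) {
    Generation[&MI] = 0;
    Worklist.push_back(&MI);
  }

  // Take the next instruction; anything created while it is being
  // combined belongs to the following generation.
  Instr *pop() {
    if (Worklist.empty())
      return nullptr;
    Instr *MI = Worklist.front();
    Worklist.pop_front();
    CurrentGeneration = Generation[MI];
    return MI;
  }

  void createdInstr(Instr &MI) override {
    unsigned Gen = CurrentGeneration + 1;
    Generation[&MI] = Gen;
    if (Gen <= MaxGeneration) // Beyond the limit: do not revisit.
      Worklist.push_back(&MI);
  }

  void erasedInstr(Instr &MI) override {
    Generation.erase(&MI);
    Worklist.erase(std::remove(Worklist.begin(), Worklist.end(), &MI),
                   Worklist.end());
  }

  std::size_t pending() const { return Worklist.size(); }
};
```

Under this reading, an `O1`-style pass would construct the worklist with a limit of 1 and an `O2`-style pass with 3; the helper only reports events, and the scheduling policy stays in the pass.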

aemerson requested changes to this revision. Jun 24 2018, 7:17 PM

aemerson added inline comments.

include/llvm/CodeGen/GlobalISel/CombinerHelper.h
34	Ok, as long as we're aware let's keep it in.
lib/CodeGen/GlobalISel/Combiner.cpp
29	Ok thanks. I think it would be beneficial to have a comment at the definition (can be your reply here summarised).
lib/CodeGen/GlobalISel/CombinerHelper.cpp
64	Then perhaps change the name to something more accurate, like "UseOpcode"?

This revision now requires changes to proceed. Jun 24 2018, 7:17 PM

Rebased

Fixed the various nits

Fixed a problem where the pass manager couldn't schedule 'Function Alias
Analysis Results' which was apparently required by 'AArch64 Instruction
Selection' according to the pass manager... except it wasn't. The actual problem
was that we weren't preserving the StackProtector pass.

I've also added preservation of the CFG analyses while I was fixing the pass
manager issue above.

Harbormaster completed remote builds in B21486: Diff 160708. Aug 14 2018, 3:29 PM

dsanders marked 3 inline comments as done. Aug 14 2018, 3:31 PM

I believe the only remaining issue on this was the compile-time impact. I've run CTMark* before and after and found that no tests were significantly different. The geomean was improved by 1.24%. I'm going to be working on the combiner infrastructure for the next few months so if there are any further issues that come up we can address them post-commit.

*Except bullet, kc, and tramp3d-v4, as these don't seem to build when I target AArch64. They all fail due to missing headers for me.

aditya_nandakumar added inline comments. Oct 1 2018, 11:54 AM

include/llvm/CodeGen/GlobalISel/Combiner.h
35	This should not be necessary any more. You can attach delegate to the machine function to observe insertions and deletions. The combiner can install a delegate at the beginning of the pass. Do you see any other reason to have the Observer mechanism? Is there a need to rely on the maintainer scheduling order of visit or is it just to make sure of insertions and deletions?

This revision was not accepted when it landed; it landed in state Needs Review. Oct 1 2018, 11:58 AM

Closed by commit rL343521: [globalisel] Add a combiner helpers for extending loads and use them in a pre… (authored by dsanders).

This revision was automatically updated to reflect the committed changes.

dsanders reopened this revision. Oct 2 2018, 2:55 PM

This revision was not accepted when it landed; it landed in state Needs Review. Oct 2 2018, 7:14 PM

Closed by commit rL343654: Re-commit: [globalisel] Add a combiner helpers for extending loads and use them… (authored by dsanders).

This revision was automatically updated to reflect the committed changes.

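Revision Contents

Path                                                                             Size
include/llvm/CodeGen/GlobalISel/Combiner.h                                       10 lines
include/llvm/CodeGen/GlobalISel/CombinerHelper.h                                 11 lines
include/llvm/CodeGen/GlobalISel/CombinerInfo.h                                   5 lines
include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h                               10 lines
lib/CodeGen/GlobalISel/Combiner.cpp                                              26 lines
lib/CodeGen/GlobalISel/CombinerHelper.cpp                                        212 lines
lib/Target/AArch64/AArch64.h                                                     2 lines
lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp                               106 lines
lib/Target/AArch64/AArch64TargetMachine.cpp                                      6 lines
lib/Target/AArch64/CMakeLists.txt                                                1 line
test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll                                23 lines
test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll                      1 line
test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-extending-loads.mir         450 lines
test/CodeGen/AArch64/O0-pipeline.ll                                              1 line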

Diff 148961

include/llvm/CodeGen/GlobalISel/Combiner.h

    ... (18 lines not shown) ...
 #include "llvm/CodeGen/MachineFunctionPass.h"

 namespace llvm {
 class MachineRegisterInfo;
 class CombinerInfo;
 class TargetPassConfig;
 class MachineFunction;

+class CombinerChangeObserver {
+public:
+  virtual ~CombinerChangeObserver() {}
+
+  /// An instruction was erased.
+  virtual void erasedInstr(MachineInstr &MI) = 0;
+  /// An instruction was created and inserted into the function.
+  virtual void createdInstr(MachineInstr &MI) = 0;
+};
+
 class Combiner {
 public:
   Combiner(CombinerInfo &CombinerInfo, const TargetPassConfig *TPC);

   bool combineMachineInstrs(MachineFunction &MF);

 protected:
   CombinerInfo &CInfo;
    ... (9 lines not shown) ...

include/llvm/CodeGen/GlobalISel/CombinerHelper.h

	Show All 14 Lines
	//			//
	//===--------------------------------------------------------------------===//			//===--------------------------------------------------------------------===//

	#ifndef LLVM_CODEGEN_GLOBALISEL_COMBINER_HELPER_H			#ifndef LLVM_CODEGEN_GLOBALISEL_COMBINER_HELPER_H
	#define LLVM_CODEGEN_GLOBALISEL_COMBINER_HELPER_H			#define LLVM_CODEGEN_GLOBALISEL_COMBINER_HELPER_H

	namespace llvm {			namespace llvm {

				class CombinerChangeObserver;
	class MachineIRBuilder;			class MachineIRBuilder;
	class MachineRegisterInfo;			class MachineRegisterInfo;
	class MachineInstr;			class MachineInstr;

	class CombinerHelper {			class CombinerHelper {
	MachineIRBuilder &Builder;			MachineIRBuilder &Builder;
	MachineRegisterInfo &MRI;			MachineRegisterInfo &MRI;
				CombinerChangeObserver &Observer;

				void eraseInstr(MachineInstr &MI);
				void scheduleForVisit(MachineInstr &MI);
				aemerson [Done]: There doesn't seem to be a user of this.
				dsanders (Author) [Done]: That's right. It's not needed for this particular combine but there should be some that need it in future. I could drop it from this patch.
				aemerson [Done]: Ok, as long as we're aware let's keep it in.

	public:			public:
	CombinerHelper(MachineIRBuilder &B);			CombinerHelper(CombinerChangeObserver &Observer, MachineIRBuilder &B);

	/// If \p MI is COPY, try to combine it.			/// If \p MI is COPY, try to combine it.
	/// Returns true if MI changed.			/// Returns true if MI changed.
	bool tryCombineCopy(MachineInstr &MI);			bool tryCombineCopy(MachineInstr &MI);

				/// If \p MI is extend that consumes the result of a load, try to combine it.
				/// Returns true if MI changed.
				bool tryCombineExtendingLoads(MachineInstr &MI);

	/// Try to transform \p MI by using all of the above			/// Try to transform \p MI by using all of the above
	/// combine functions. Returns true if changed.			/// combine functions. Returns true if changed.
	bool tryCombine(MachineInstr &MI);			bool tryCombine(MachineInstr &MI);
	};			};
	} // namespace llvm			} // namespace llvm

	#endif			#endif

include/llvm/CodeGen/GlobalISel/CombinerInfo.h

	Show All 11 Lines
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CODEGEN_GLOBALISEL_COMBINER_INFO_H			#ifndef LLVM_CODEGEN_GLOBALISEL_COMBINER_INFO_H
	#define LLVM_CODEGEN_GLOBALISEL_COMBINER_INFO_H			#define LLVM_CODEGEN_GLOBALISEL_COMBINER_INFO_H

	#include <cassert>			#include <cassert>
	namespace llvm {			namespace llvm {

				class CombinerChangeObserver;
	class LegalizerInfo;			class LegalizerInfo;
	class MachineInstr;			class MachineInstr;
	class MachineIRBuilder;			class MachineIRBuilder;
	class MachineRegisterInfo;			class MachineRegisterInfo;

	// Contains information relevant to enabling/disabling various combines for a			// Contains information relevant to enabling/disabling various combines for a
	// pass.			// pass.
	class CombinerInfo {			class CombinerInfo {
	public:			public:
	CombinerInfo(bool AllowIllegalOps, bool ShouldLegalizeIllegal,			CombinerInfo(bool AllowIllegalOps, bool ShouldLegalizeIllegal,
	LegalizerInfo *LInfo)			LegalizerInfo *LInfo)
	: IllegalOpsAllowed(AllowIllegalOps),			: IllegalOpsAllowed(AllowIllegalOps),
	LegalizeIllegalOps(ShouldLegalizeIllegal), LInfo(LInfo) {			LegalizeIllegalOps(ShouldLegalizeIllegal), LInfo(LInfo) {
	assert(((AllowIllegalOps || !LegalizeIllegalOps) || LInfo) &&			assert(((AllowIllegalOps || !LegalizeIllegalOps) || LInfo) &&
	"Expecting legalizerInfo when illegalops not allowed");			"Expecting legalizerInfo when illegalops not allowed");
	}			}
	virtual ~CombinerInfo() = default;			virtual ~CombinerInfo() = default;
	/// If \p IllegalOpsAllowed is false, the CombinerHelper will make use of			/// If \p IllegalOpsAllowed is false, the CombinerHelper will make use of
	/// the legalizerInfo to check for legality before each transformation.			/// the legalizerInfo to check for legality before each transformation.
	bool IllegalOpsAllowed; // TODO: Make use of this.			bool IllegalOpsAllowed; // TODO: Make use of this.

	/// If \p LegalizeIllegalOps is true, the Combiner will also legalize the			/// If \p LegalizeIllegalOps is true, the Combiner will also legalize the
	/// illegal ops that are created.			/// illegal ops that are created.
	bool LegalizeIllegalOps; // TODO: Make use of this.			bool LegalizeIllegalOps; // TODO: Make use of this.
	const LegalizerInfo *LInfo;			const LegalizerInfo *LInfo;
	virtual bool combine(MachineInstr &MI, MachineIRBuilder &B) const = 0;			virtual bool combine(CombinerChangeObserver &Observer, MachineInstr &MI,
				MachineIRBuilder &B) const = 0;
	};			};
	} // namespace llvm			} // namespace llvm

	#endif			#endif

include/llvm/CodeGen/GlobalISel/MachineIRBuilder.h

Show First 20 Lines • Show All 54 Lines • ▼ Show 20 Lines

/// Helper class to build MachineInstr.		/// Helper class to build MachineInstr.
/// It keeps internally the insertion point and debug location for all		/// It keeps internally the insertion point and debug location for all
/// the new instructions we want to create.		/// the new instructions we want to create.
/// This information can be modified via the related setters.		/// This information can be modified via the related setters.
class MachineIRBuilderBase {		class MachineIRBuilderBase {

MachineIRBuilderState State;		MachineIRBuilderState State;
const TargetInstrInfo &getTII() {
assert(State.TII && "TargetInstrInfo is not set");
return *State.TII;
}

void validateTruncExt(unsigned Dst, unsigned Src, bool IsExtend);		void validateTruncExt(unsigned Dst, unsigned Src, bool IsExtend);

protected:		protected:
unsigned getDestFromArg(unsigned Reg) { return Reg; }		unsigned getDestFromArg(unsigned Reg) { return Reg; }
unsigned getDestFromArg(LLT Ty) {		unsigned getDestFromArg(LLT Ty) {
return getMF().getRegInfo().createGenericVirtualRegister(Ty);		return getMF().getRegInfo().createGenericVirtualRegister(Ty);
}		}
unsigned getDestFromArg(const TargetRegisterClass *RC) {		unsigned getDestFromArg(const TargetRegisterClass *RC) {
Show All 26 Lines	public:
MachineIRBuilderBase() = default;		MachineIRBuilderBase() = default;
MachineIRBuilderBase(MachineFunction &MF) { setMF(MF); }		MachineIRBuilderBase(MachineFunction &MF) { setMF(MF); }
MachineIRBuilderBase(MachineInstr &MI) : MachineIRBuilderBase(*MI.getMF()) {		MachineIRBuilderBase(MachineInstr &MI) : MachineIRBuilderBase(*MI.getMF()) {
setInstr(MI);		setInstr(MI);
}		}

MachineIRBuilderBase(const MachineIRBuilderState &BState) : State(BState) {}		MachineIRBuilderBase(const MachineIRBuilderState &BState) : State(BState) {}

		const TargetInstrInfo &getTII() {
		assert(State.TII && "TargetInstrInfo is not set");
		return *State.TII;
		}

/// Getter for the function we currently build.		/// Getter for the function we currently build.
MachineFunction &getMF() {		MachineFunction &getMF() {
assert(State.MF && "MachineFunction is not set");		assert(State.MF && "MachineFunction is not set");
return *State.MF;		return *State.MF;
}		}

/// Getter for DebugLoc		/// Getter for DebugLoc
const DebugLoc &getDL() { return State.DL; }		const DebugLoc &getDL() { return State.DL; }
▲ Show 20 Lines • Show All 758 Lines • Show Last 20 Lines

lib/CodeGen/GlobalISel/Combiner.cpp

Show All 19 Lines
#include "llvm/CodeGen/GlobalISel/Utils.h"		#include "llvm/CodeGen/GlobalISel/Utils.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"

#define DEBUG_TYPE "gi-combiner"		#define DEBUG_TYPE "gi-combiner"

using namespace llvm;		using namespace llvm;

		namespace {
		class WorkListMaintainer : public CombinerChangeObserver {
		aemerson [Done]: What's the purpose of this? Currently it just wraps around the underlying worklist. Debugging messages only?
		dsanders (Author) [Done]: The intent behind CombinerChangeObserver is to inform the combiner pass that certain events happened in CombinerHelper. The main ones I'm expecting to end up with are instruction creation and deletion. I can also see instruction modification events here in future.
		I don't think scheduling for revisit should be in CombinerChangeObserver though. I think that should be determined by the combiner pass in response to these events. It should also be derived from the implemented combines in some way to avoid the issue in DAGCombine where combines are sometimes missed because one rule failed to schedule the right node (e.g. because the root wasn't directly connected to the modified nodes in the case of combines that cover several nodes).
		The reason for having CombinerChangeObserver subclasses rather than just implementing it directly is so that CombinerHelper is usable by lots of different kinds of Combiner passes. We might have different implementations for `O0` - `O3`, or a specific strategy might work better for one particular target, or maybe the pass isn't a combiner at all and just wants to borrow a couple of combines. Each implementation would provide a CombinerChangeObserver subclass that glues CombinerHelper into its implementation.
		This particular implementation ought to grow a mechanism to limit the run-time of the pass. Exactly how that should work will need some more thinking about, but as an example we could track how deeply recursed the combiner is (i.e. how many combine rules triggered to create it) and decline to schedule instructions beyond a certain point. For `O1` combines might be limited to the first generation of new/modified instructions, for `O2` maybe the first 3 generations, and `O3` might be unlimited.
		aemerson [Done]: Ok thanks. I think it would be beneficial to have a comment at the definition (can be your reply here summarised).
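The generation-limit idea described in the thread above might look roughly like the following sketch. It is not part of the patch and does not use LLVM: `Instr`, `ChangeObserver`, and `GenerationLimitedWorkList` are hypothetical names, and the policy (new instructions belong to one generation later than the instruction currently being combined) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for MachineInstr; only object identity matters here.
struct Instr {
  int Id;
};

// Mirrors the shape of the CombinerChangeObserver interface in this patch.
class ChangeObserver {
public:
  virtual ~ChangeObserver() = default;
  virtual void erasedInstr(Instr &I) = 0;
  virtual void createdInstr(Instr &I) = 0;
};

// Hypothetical observer that refuses to schedule instructions whose
// generation exceeds MaxGeneration. Original instructions are generation 0;
// an instruction created while combining a generation-N instruction is
// generation N + 1.
class GenerationLimitedWorkList : public ChangeObserver {
  std::vector<Instr *> WorkList;
  std::unordered_map<Instr *, unsigned> Generation;
  unsigned MaxGeneration;
  unsigned CurrentGeneration = 0; // Generation of the last popped instruction.

public:
  explicit GenerationLimitedWorkList(unsigned MaxGeneration)
      : MaxGeneration(MaxGeneration) {}

  // Seed the list with an instruction that existed before the pass ran.
  void seed(Instr &I) {
    Generation[&I] = 0;
    WorkList.push_back(&I);
  }

  bool empty() const { return WorkList.empty(); }

  // Take the next instruction and remember its generation so that anything
  // created while combining it can be numbered relative to it.
  Instr *pop() {
    Instr *I = WorkList.back();
    WorkList.pop_back();
    CurrentGeneration = Generation[I];
    return I;
  }

  void erasedInstr(Instr &I) override {
    Generation.erase(&I);
    WorkList.erase(std::remove(WorkList.begin(), WorkList.end(), &I),
                   WorkList.end());
  }

  void createdInstr(Instr &I) override {
    unsigned Gen = CurrentGeneration + 1;
    Generation[&I] = Gen;
    if (Gen <= MaxGeneration) // Decline to schedule beyond the limit.
      WorkList.push_back(&I);
  }
};
```

Under this sketch, an `O1`-style pass would construct the list with a limit of 1 and an `O2`-style pass with a limit of 3; an unlimited variant would simply skip the comparison.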
		GISelWorkList<512> &WorkList;

		public:
		WorkListMaintainer(GISelWorkList<512> &WorkList) : WorkList(WorkList) {}
		aemerson [Done]: Should we have a specialization for `GISelWorklist<512>`, and then not repeat the 512 in multiple places?
		dsanders (Author) [Done]: Yes, let's add an alias for it as Combiner::WorkListTy.
		virtual ~WorkListMaintainer() {}

		void erasedInstr(MachineInstr &MI) override {
		errs() << "Erased: ";
		aemerson [Done]: Delete or DEBUG()?
		dsanders (Author) [Done]: Oops. That was some temporary debugging code. It's useful for tracing the combines though, so let's make it DEBUG()/dbgs().
		MI.print(errs());
		errs() << "\n";
		WorkList.remove(&MI);
		}
		void createdInstr(MachineInstr &MI) override {
		errs() << "Created: ";
		MI.print(errs());
		errs() << "\n";
		WorkList.insert(&MI);
		}
		};
		}

Combiner::Combiner(CombinerInfo &Info, const TargetPassConfig *TPC)		Combiner::Combiner(CombinerInfo &Info, const TargetPassConfig *TPC)
: CInfo(Info), TPC(TPC) {		: CInfo(Info), TPC(TPC) {
(void)this->TPC; // FIXME: Remove when used.		(void)this->TPC; // FIXME: Remove when used.
}		}

bool Combiner::combineMachineInstrs(MachineFunction &MF) {		bool Combiner::combineMachineInstrs(MachineFunction &MF) {
// If the ISel pipeline failed, do not bother running this pass.		// If the ISel pipeline failed, do not bother running this pass.
// FIXME: Should this be here or in individual combiner passes.		// FIXME: Should this be here or in individual combiner passes.
Show All 12 Lines	bool Combiner::combineMachineInstrs(MachineFunction &MF) {
bool Changed;		bool Changed;

do {		do {
// Collect all instructions. Do a post order traversal for basic blocks and		// Collect all instructions. Do a post order traversal for basic blocks and
// insert with list bottom up, so while we pop_back_val, we'll traverse top		// insert with list bottom up, so while we pop_back_val, we'll traverse top
// down RPOT.		// down RPOT.
Changed = false;		Changed = false;
GISelWorkList<512> WorkList;		GISelWorkList<512> WorkList;
		WorkListMaintainer Observer(WorkList);
for (MachineBasicBlock *MBB : post_order(&MF)) {		for (MachineBasicBlock *MBB : post_order(&MF)) {
if (MBB->empty())		if (MBB->empty())
continue;		continue;
for (auto MII = MBB->rbegin(), MIE = MBB->rend(); MII != MIE;) {		for (auto MII = MBB->rbegin(), MIE = MBB->rend(); MII != MIE;) {
MachineInstr *CurMI = &*MII;		MachineInstr *CurMI = &*MII;
++MII;		++MII;
// Erase dead insts before even adding to the list.		// Erase dead insts before even adding to the list.
if (isTriviallyDead(*CurMI, MRI)) {		if (isTriviallyDead(*CurMI, MRI)) {
LLVM_DEBUG(dbgs() << *CurMI << "Is dead; erasing.\n");		LLVM_DEBUG(dbgs() << *CurMI << "Is dead; erasing.\n");
CurMI->eraseFromParentAndMarkDBGValuesForRemoval();		CurMI->eraseFromParentAndMarkDBGValuesForRemoval();
continue;		continue;
}		}
WorkList.insert(CurMI);		WorkList.insert(CurMI);
}		}
}		}
// Main Loop. Process the instructions here.		// Main Loop. Process the instructions here.
while (!WorkList.empty()) {		while (!WorkList.empty()) {
MachineInstr *CurrInst = WorkList.pop_back_val();		MachineInstr *CurrInst = WorkList.pop_back_val();
LLVM_DEBUG(dbgs() << "Try combining " << *CurrInst << "\n";);		LLVM_DEBUG(dbgs() << "Try combining " << *CurrInst << "\n";);
Changed |= CInfo.combine(*CurrInst, Builder);		Changed |= CInfo.combine(Observer, *CurrInst, Builder);
}		}
MFChanged |= Changed;		MFChanged |= Changed;
} while (Changed);		} while (Changed);

return MFChanged;		return MFChanged;
}		}

lib/CodeGen/GlobalISel/CombinerHelper.cpp

	//== ---lib/CodeGen/GlobalISel/GICombinerHelper.cpp --------------------- == //			//== ---lib/CodeGen/GlobalISel/GICombinerHelper.cpp --------------------- == //
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
				#include "llvm/CodeGen/GlobalISel/Combiner.h"
	#include "llvm/CodeGen/GlobalISel/CombinerHelper.h"			#include "llvm/CodeGen/GlobalISel/CombinerHelper.h"
	#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"			#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
	#include "llvm/CodeGen/GlobalISel/Utils.h"			#include "llvm/CodeGen/GlobalISel/Utils.h"
	#include "llvm/CodeGen/MachineInstr.h"			#include "llvm/CodeGen/MachineInstr.h"
	#include "llvm/CodeGen/MachineRegisterInfo.h"			#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"

	#define DEBUG_TYPE "gi-combine"			#define DEBUG_TYPE "gi-combine"

	using namespace llvm;			using namespace llvm;

	CombinerHelper::CombinerHelper(MachineIRBuilder &B) :			CombinerHelper::CombinerHelper(CombinerChangeObserver &Observer,
	Builder(B), MRI(Builder.getMF().getRegInfo()) {}			MachineIRBuilder &B)
				: Builder(B), MRI(Builder.getMF().getRegInfo()), Observer(Observer) {}

				void CombinerHelper::eraseInstr(MachineInstr &MI) {
				Observer.erasedInstr(MI);
				}
				void CombinerHelper::scheduleForVisit(MachineInstr &MI) {
				Observer.createdInstr(MI);
				}

	bool CombinerHelper::tryCombineCopy(MachineInstr &MI) {			bool CombinerHelper::tryCombineCopy(MachineInstr &MI) {
	if (MI.getOpcode() != TargetOpcode::COPY)			if (MI.getOpcode() != TargetOpcode::COPY)
	return false;			return false;
	unsigned DstReg = MI.getOperand(0).getReg();			unsigned DstReg = MI.getOperand(0).getReg();
	unsigned SrcReg = MI.getOperand(1).getReg();			unsigned SrcReg = MI.getOperand(1).getReg();
	LLT DstTy = MRI.getType(DstReg);			LLT DstTy = MRI.getType(DstReg);
	LLT SrcTy = MRI.getType(SrcReg);			LLT SrcTy = MRI.getType(SrcReg);
	// Simple Copy Propagation.			// Simple Copy Propagation.
	// a(sx) = COPY b(sx) -> Replace all uses of a with b.			// a(sx) = COPY b(sx) -> Replace all uses of a with b.
	if (DstTy.isValid() && SrcTy.isValid() && DstTy == SrcTy) {			if (DstTy.isValid() && SrcTy.isValid() && DstTy == SrcTy) {
	MI.eraseFromParent();			MI.eraseFromParent();
	MRI.replaceRegWith(DstReg, SrcReg);			MRI.replaceRegWith(DstReg, SrcReg);
	return true;			return true;
	}			}
	return false;			return false;
	}			}

				bool CombinerHelper::tryCombineExtendingLoads(MachineInstr &MI) {
				struct PreferredTuple {
				LLT Ty; // The result type of the extend.
				unsigned ExtendOpcode; // G_ANYEXT/G_SEXT/G_ZEXT
				MachineInstr *MI;
				rtereshin [Done]: It looks like we have a contract inconsistency here. `CombinerHelper::tryCombineCopy` assumes it could be called on any opcode, and if it's not a COPY, it's expected to just gracefully return and report it didn't change anything, while the newly added `CombinerHelper::tryCombineExtendingLoads` requires the opcode to belong to a specific subset. I think it makes sense to be consistent about it, and probably not just within the `CombinerHelper` but among all the derived combiners and maybe even all GlobalISel combiners in general. + @aditya_nandakumar
				aditya_nandakumar [Done]: Good catch @rtereshin - I would think being consistent here is nice, i.e. return gracefully if it's not the opcode we want (unless there's a strong reason to change that).
				dsanders (Author) [Done]: I can switch it over to that. I just thought it was a shame to check it on both sides of the call.
				};
				const auto ChoosePreferredScalar =
				aemerson [Done]: Can we promote this lambda into a helper function? From the name and lack of header doc it's unclear what it's supposed to do with the params on first reading.
				dsanders (Author) [Done]: Sure.
				[](PreferredTuple &A, const LLT &TyB, unsigned ExtendOpcodeB,
				MachineInstr *InstrB) -> PreferredTuple {
				if (!A.Ty.isValid()) {
				if (A.ExtendOpcode == ExtendOpcodeB)
				return {TyB, ExtendOpcodeB, InstrB};
				if (A.ExtendOpcode == TargetOpcode::G_ANYEXT &&
				(ExtendOpcodeB == TargetOpcode::G_SEXT ||
				ExtendOpcodeB == TargetOpcode::G_ZEXT ||
				ExtendOpcodeB == TargetOpcode::G_ANYEXT))
				rtereshin [Done]: Is it possible to have a memory operand missing here? Also, if not, does MachineVerifier enforce it?
				dsanders (Author) [Done]:
				> Is it possible to have a memory operand missing here?
				No, several parts of GlobalISel require it.
				> Also, if not, does MachineVerifier enforce it?
				Yes, it reports 'Generic instruction accessing memory must have one mem operand' if it's not the case.
				rtereshin [Done]: Cool, thanks!
				aemerson [Done]: What else could ExtendOpcodeB be at this point? If nothing else, should be an assert?
				dsanders (Author) [Done]: Anything. There's no guarantee that the use of a load is an extend. We ignore that when selecting a preferred use and deal with the non-extends by inserting a trunc (which is free for most targets) later.
				aemerson [Done]: Then perhaps change the name to something more accurate, like "UseOpcode"?
				return {TyB, ExtendOpcodeB, InstrB};
				return A;
				}

				// We permit the extend to hoist through basic blocks but this is only
				// sensible if the target has extending loads. If you end up lowering back
				// into a load and extend during the legalizer then the end result is
				// hoisting the extend up to the load.

				rtereshin [Done]: `getOpcodeDef` is able to look through copies, therefore I'd expect this combiner to match the following sequence:
				    %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
				    %v2:_(s16) = COPY %v1(s16)
				    %v3:_(s32) = G_ZEXT %v2(s16)
				and produce the following output:
				    %v1:_(s16) = G_LOAD %ptr(p0), (load 2)
				    %v2:_(s16) = COPY %v1(s16)
				    %v3:_(s32) = G_ZEXTLOAD %ptr(p0), (load 2)
				Do you think it's a good idea to add tests like this and verify that this does, in fact, happen and happens correctly?
				dsanders (Author) [Done]: That makes sense to me. We won't want to go overboard with that kind of thing though; it's enough to check it in a couple of combines.
				rtereshin [Done]: Agreed.
				// Prefer defined extensions to undefined extensions as these are more
				// likely to reduce the number of instructions.
				if (ExtendOpcodeB == TargetOpcode::G_ANYEXT &&
				A.ExtendOpcode != TargetOpcode::G_ANYEXT)
				return A;
				else if (A.ExtendOpcode == TargetOpcode::G_ANYEXT &&
				ExtendOpcodeB != TargetOpcode::G_ANYEXT)
				return {TyB, ExtendOpcodeB, InstrB};

				// Prefer sign extensions to zero extensions as sign-extensions tend to be
				// more expensive.
				if (A.Ty == TyB) {
				if (A.ExtendOpcode == TargetOpcode::G_SEXT &&
				ExtendOpcodeB == TargetOpcode::G_ZEXT)
				return A;
				else if (A.ExtendOpcode == TargetOpcode::G_ZEXT &&
				ExtendOpcodeB == TargetOpcode::G_SEXT)
				return {TyB, ExtendOpcodeB, InstrB};
				}

				// This is potentially target specific. We've chosen the largest type
				// because G_TRUNC is usually free. One potential catch with this is that
				// some targets have a reduced number of larger registers than smaller
				// registers and this choice potentially increases the live-range for the
				// larger value.
				if (TyB.getSizeInBits() > A.Ty.getSizeInBits()) {
				return {TyB, ExtendOpcodeB, InstrB};
				}
				return A;
				};

				// We match the loads and follow the uses to the extend instead of matching
				// the extends and following the def to the load. This is because the load
				// must remain in the same position for correctness (unless we also add code
				// to find a safe place to sink it) whereas the extend is freely movable.
				// It also prevents us from duplicating the load for the volatile case or just
				// for performance.

				if (MI.getOpcode() != TargetOpcode::G_LOAD &&
				MI.getOpcode() != TargetOpcode::G_SEXTLOAD &&
				MI.getOpcode() != TargetOpcode::G_ZEXTLOAD)
				return false;

				auto &LoadValue = MI.getOperand(0);
				assert(LoadValue.isReg() && "Result wasn't a register?");

				LLT LoadValueTy = MRI.getType(LoadValue.getReg());
				if (!LoadValueTy.isScalar())
				return false;

				// Find the preferred type aside from the any-extends (unless it's the only
				// one) and non-extending ops. We'll emit an extending load to that type
				// and emit a variant of (extend (trunc X)) for the others according to the
				// relative type sizes. At the same time, pick an extend to use based on the
				// extend involved in the chosen type.
				PreferredTuple Preferred = {LLT(),
				MI.getOpcode() == TargetOpcode::G_LOAD
				? TargetOpcode::G_ANYEXT
				: MI.getOpcode() == TargetOpcode::G_SEXTLOAD
				? TargetOpcode::G_SEXT
				: TargetOpcode::G_ZEXT,
				nullptr};
				for (auto &UseMI : MRI.use_instructions(LoadValue.getReg())) {
				if (UseMI.getOpcode() == TargetOpcode::G_SEXT ||
				UseMI.getOpcode() == TargetOpcode::G_ZEXT || !Preferred.Ty.isValid())
				Preferred = ChoosePreferredScalar(
				Preferred, MRI.getType(UseMI.getOperand(0).getReg()),
				UseMI.getOpcode(), &UseMI);
				}

				// There were no extends
				if (!Preferred.MI)
				return false;
				// It should be impossible to choose an extend without selecting a different
				// type since by definition the result of an extend is larger.
				assert(Preferred.Ty != LoadValueTy && "Extending to same type?");

				// Rewrite the load and schedule the canonical use for erasure.
				const auto TruncateUse = [*this](MachineOperand & UseMO, unsigned DstReg,
				unsigned SrcReg) {
				MachineInstr &UseMI = *UseMO.getParent();
				MachineBasicBlock &UseMBB = *UseMI.getParent();

				Builder.setInsertPt(UseMBB, MachineBasicBlock::iterator(UseMI));
				Builder.buildTrunc(DstReg, SrcReg);
				};

				// Rewrite the load to the chosen extending load.
				unsigned ChosenDstReg = Preferred.MI->getOperand(0).getReg();
				MI.setDesc(
				Builder.getTII().get(Preferred.ExtendOpcode == TargetOpcode::G_SEXT
				? TargetOpcode::G_SEXTLOAD
				: Preferred.ExtendOpcode == TargetOpcode::G_ZEXT
				? TargetOpcode::G_ZEXTLOAD
				: TargetOpcode::G_LOAD));

				// Rewrite all the uses to fix up the types.
				SmallVector<MachineInstr *, 1> ScheduleForErase;
				for (auto &UseMO : MRI.use_operands(LoadValue.getReg())) {
				MachineInstr *UseMI = UseMO.getParent();

				// If the extend is compatible with the preferred extend then we should fix
				// up the type and extend so that it uses the preferred use.
				if (UseMI->getOpcode() == Preferred.ExtendOpcode ||
				UseMI->getOpcode() == TargetOpcode::G_ANYEXT) {
				unsigned UseDstReg = UseMI->getOperand(0).getReg();
				unsigned UseSrcReg = UseMI->getOperand(1).getReg();
				const LLT &UseDstTy = MRI.getType(UseDstReg);
				if (UseDstReg != ChosenDstReg) {
				if (Preferred.Ty == UseDstTy) {
				// If the use has the same type as the preferred use, then merge
				// the vregs and erase the extend. For example:
				// %1:_(s8) = G_LOAD ...
				// %2:_(s32) = G_SEXT %1(s8)
				// %3:_(s32) = G_ANYEXT %1(s8)
				// ... = ... %3(s32)
				// rewrites to:
				// %2:_(s32) = G_SEXTLOAD ...
				// ... = ... %2(s32)
				MRI.replaceRegWith(UseDstReg, ChosenDstReg);
				ScheduleForErase.push_back(UseMO.getParent());
				Observer.erasedInstr(*UseMO.getParent());
				} else if (Preferred.Ty.getSizeInBits() < UseDstTy.getSizeInBits()) {
				// If the preferred size is smaller, then keep the extend but extend
				// from the result of the extending load. For example:
				// %1:_(s8) = G_LOAD ...
				// %2:_(s32) = G_SEXT %1(s8)
				// %3:_(s64) = G_ANYEXT %1(s8)
				// ... = ... %3(s64)
				/// rewrites to:
				// %2:_(s32) = G_SEXTLOAD ...
				// %3:_(s64) = G_ANYEXT %2:_(s32)
				// ... = ... %3(s64)
				MRI.replaceRegWith(UseSrcReg, ChosenDstReg);
				} else {
				// If the preferred size is large, then insert a truncate. For
				// example:
				// %1:_(s8) = G_LOAD ...
				// %2:_(s64) = G_SEXT %1(s8)
				// %3:_(s32) = G_ZEXT %1(s8)
				// ... = ... %3(s32)
				// rewrites to:
				// %2:_(s64) = G_SEXTLOAD ...
				// %4:_(s8) = G_TRUNC %2(s64)
				// %3:_(s32) = G_ZEXT %4(s8)
				// ... = ... %3(s32)
				TruncateUse(UseMO, MI.getOperand(0).getReg(), ChosenDstReg);
				}
				continue;
				}
				// The use is (one of) the uses of the preferred use we chose earlier.
				// We're going to update the load to def this value later so just erase
				// the old extend.
				ScheduleForErase.push_back(UseMO.getParent());
				Observer.erasedInstr(*UseMO.getParent());
				continue;
				}

				// The use isn't an extend. Truncate back to the type we originally loaded.
				// This is free on many targets.
				TruncateUse(UseMO, MI.getOperand(0).getReg(), ChosenDstReg);
				}
				for (auto &EraseMI : ScheduleForErase)
				EraseMI->eraseFromParent();
				MI.getOperand(0).setReg(ChosenDstReg);

				return true;
				}
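
The preference order applied by the ChoosePreferredScalar lambda above can be restated as a small standalone function. This is a simplified sketch, not the patch's code: `ExtKind`, `Candidate`, and `choosePreferred` are hypothetical names, widths are plain bit counts instead of LLT, and the handling of the initial invalid type and of non-extend uses is deliberately omitted.

```cpp
#include <cassert>

// Hypothetical simplification of G_ANYEXT / G_SEXT / G_ZEXT.
enum class ExtKind { AnyExt, SExt, ZExt };

// Hypothetical candidate use: the width it extends to and how it extends.
struct Candidate {
  unsigned Bits;
  ExtKind Kind;
};

// Pick between the current best candidate A and a new candidate B.
inline Candidate choosePreferred(const Candidate &A, const Candidate &B) {
  // 1. Prefer defined extensions (sext/zext) over any-extends.
  if (B.Kind == ExtKind::AnyExt && A.Kind != ExtKind::AnyExt)
    return A;
  if (A.Kind == ExtKind::AnyExt && B.Kind != ExtKind::AnyExt)
    return B;

  // 2. For equal widths, prefer sign extension over zero extension.
  if (A.Bits == B.Bits) {
    if (A.Kind == ExtKind::SExt && B.Kind == ExtKind::ZExt)
      return A;
    if (A.Kind == ExtKind::ZExt && B.Kind == ExtKind::SExt)
      return B;
  }

  // 3. Otherwise prefer the wider type, since truncating back is usually cheap.
  if (B.Bits > A.Bits)
    return B;
  return A;
}
```

Note that rule 1 is checked before width, so a narrower zero or sign extend is preferred over a wider any-extend, which matches the order of the checks in the lambda.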

	bool CombinerHelper::tryCombine(MachineInstr &MI) {			bool CombinerHelper::tryCombine(MachineInstr &MI) {
	return tryCombineCopy(MI);			if (tryCombineCopy(MI))
				return true;
				return tryCombineExtendingLoads(MI);
				rtereshinUnsubmitted Not Done Reply Inline Actions This is clearly supposed to try all of the combines implemented in the helper: /// If \p MI is extend that consumes the result of a load, try to combine it. /// Returns true if MI changed. bool tryCombineExtendingLoads(MachineInstr &MI); /// Try to transform \p MI by using all of the above /// combine functions. Returns true if changed. bool tryCombine(MachineInstr &MI); }; rtereshin: This is clearly supposed to try all of the combines implemented in the helper: ``` /// If \p…
				aditya_nandakumarUnsubmitted Not Done Reply Inline Actions I suspect he's put the calls to tryCombineExtendingLoads in AArch pass as other passes may not handle the extending load opcodes correctly in the legalizer. If/when they're ready, then it makes sense to move them into the generic helper. aditya_nandakumar: I suspect he's put the calls to tryCombineExtendingLoads in AArch pass as other passes may not…
				dsandersAuthorUnsubmitted Not Done Reply Inline Actions They'll combine correctly but the legalizer will immediately decompose them again on most targets at the moment so it's just wasted effort slowing the compile-times. This brings up something I've been wondering how to deal with over the last few days. If we continue down the path we're going, I currently don't see how we're going to manage large collections of combines and several targets. Suppose the Foo and Bar targets supports the following combines: tryCombine tryCombineGroup1 tryA tryB tryC tryCombineGroup2 tryD tryE tryF and each calls tryCombine. Now suppose Bar realizes that tryE is doing more harm than good. We could make tryE check that the target isn't Bar. The consequence of that is that we have a monolithic implementation covering all targets. It means a target specific change can introduce all-target bugs, all targets must carry everyones complexity, every change needs testing (including performance) for every target, etc. We could make Bar call tryCombineGroup1, tryD, and tryF instead. The consequence of that is that useful changes to tryCombine, and tryCombineGroup2 won't apply to Bar. Everyone will have to review combines and choose to enable them and as such won't benefit from improvements. They also won't suffer from losses either which is a good thing but I would hope that those are less common. We could split tryCombineGroup2 to get (this is just one example): tryCombine tryCombineGroup1 tryA tryB tryC tryCombineGroup2 tryD tryF tryCombineGroup3 tryE this of course assumes that ordering between E and F doesn't matter. Of course, if we do that enough then we eventually reach: tryA tryB tryC tryD tryF tryE bringing with it the same problems as the previous option. The answer I keep returning to at the moment is that even if the majority of our implementation is C++, we need something declarative to organize and control the combines for each target. 
We don't necessarily need to go to my original intent of re-using the ISel matcher for combines and thereby doing the majority of the matching in tablegenerated code but we do need a declarative way to control whether a combine is enabled/disabled (which is trivial) and the order they're applied (not so trivial). dsanders: They'll combine correctly but the legalizer will immediately decompose them again on most…
rtereshin:

I like the idea with groups. I think it will make it easier for target writers (and not only human target writers ;-) ) to compose a reasonable pipeline for their needs. Also, the grouping doesn't have to be one level deep; it could be more, if well structured and useful.

It may be worth contemplating, though, what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode (all G_ADD combines, all arithmetic combines as a second-level group, etc.), but this kind of grouping may prove less useful than grouping targeted at an architecture or micro-architecture, like "combines for super-scalar architectures with a lot of instruction-level parallelism" or "architectures with highly efficient SIMD units": something driven by what actual targets use and ignore.

As for the declarative approach, honestly, putting as much as possible into Tablegen doesn't strike me as a wise approach. Tablegen'erated implementations are hard to read and search, the interfaces between them and the rest of the compiler seem obscure and fragile, and it's very hard to make changes to any of it. If it's possible to derive the implementation from information already provided by target writers, extending a Tablegen backend at least goes in the "let's auto-generate the entire compiler" direction, which is valuable; for instance, if we could derive the optimal pipeline of combines from the scheduling and instruction info already put together by the targets. If, however, we need to come up with a new set of Tablegen classes to define that pipeline manually in a .td file, and write an entirely new Tablegen backend to process it, that doesn't look like a valuable thing to do. We could be as declarative as we want to be in C++, and that may be much easier to work with.

Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a (currently hypothetical) tool that would be able to generate an optimal combine pipeline for a target automatically, given a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again. That sort of thing calls for a separate binary tool with wide access to LLVM infrastructure, including all the combines, not for a Tablegen backend. I gave up on making Testgen a part of the Tablegen process very quickly, for instance.
aemerson:

All good points from you and Daniel. I don't know if tablegen is the best tool for the combiners, as I worry we'll paint ourselves into a corner once we go down that path.

I think it would be worth exploring the design of combiner groups, and higher-level schemes expressed in a declarative way as Daniel wants. The schemes could encapsulate the choices a target makes about its combiner configuration, with a default one that most targets would use, at least at the beginning. Customisability comes in through something like predicates/masks that enable or disable specific combines in the groups, so a scheme is a set of groups, or an existing scheme, with overlaid modifications. For altering the order of combines, an overlay could express the reversal of two given combines within a specific group.
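The scheme-plus-overlay idea can be illustrated with a small standalone C++ sketch. Everything in it (`CombineDesc`, `TargetOverlay`, `buildPipeline`, the group names) is hypothetical and exists only for illustration; none of it is part of this patch or of LLVM, and a real design would attach actual combine functions rather than names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one combine: its name and the group it is in.
struct CombineDesc {
  std::string Name;
  std::string Group;
};

// The shared, target-independent list, in default application order.
inline std::vector<CombineDesc> defaultCombines() {
  return {{"tryA", "tryCombineGroup1"}, {"tryB", "tryCombineGroup1"},
          {"tryC", "tryCombineGroup1"}, {"tryD", "tryCombineGroup2"},
          {"tryE", "tryCombineGroup2"}, {"tryF", "tryCombineGroup2"}};
}

// A per-target overlay applied on top of the default list.
struct TargetOverlay {
  std::set<std::string> Disabled; // combine names or whole group names to drop
  std::vector<std::pair<std::string, std::string>> Swapped; // pairs to reorder
};

// Build the ordered list of combine names a target would run.
// This sketch does not verify that swapped combines share a group.
inline std::vector<std::string> buildPipeline(const TargetOverlay &Overlay) {
  std::vector<std::string> Order;
  for (const CombineDesc &C : defaultCombines())
    if (!Overlay.Disabled.count(C.Name) && !Overlay.Disabled.count(C.Group))
      Order.push_back(C.Name);
  for (const auto &P : Overlay.Swapped) {
    auto A = std::find(Order.begin(), Order.end(), P.first);
    auto B = std::find(Order.begin(), Order.end(), P.second);
    if (A != Order.end() && B != Order.end())
      std::iter_swap(A, B);
  }
  return Order;
}

// Example: Bar drops tryE but still inherits everything else.
inline void barExample() {
  TargetOverlay Bar;
  Bar.Disabled = {"tryE"};
  assert((buildPipeline(Bar) ==
          std::vector<std::string>{"tryA", "tryB", "tryC", "tryD", "tryF"}));
}
```

With this shape, a combine added to the default list reaches every target unless that target's overlay opts out, which is the property the "call the groups individually" option loses.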
dsanders (author):

> It may be worth contemplating though what principle should be used to group combines together. I think it's going to be tempting to group them by root-level opcode: all G_ADD combines, all arithmetic combines (2-nd level group), etc, but this kind of grouping has a chance to prove less useful than architecture / micro-architecture targeted grouping. Like "combines for super-scalar architectures with a lot of instruction-level parallelism", and "architectures with highly efficient SIMD units" etc, something driven by what actual targets use and ignore.

That's what I was thinking, and it's the reason for naming this tryCombineExtendingLoads(). We might need to break it down as more targets get added, though, as some only have sign-extending loads and others only have zero-extending loads.

> As for the declarative approach, honestly, putting as much as possible into Tablegen doesn't strike me like a wise approach...

As you say, it doesn't necessarily have to be tablegen. The main thing is being able to throw the common rules and the target-dependent rules together and get something that executes them in a sensible order and excludes the things the target doesn't want.

> Also, I think it may be valuable to make whatever design we eventually come up with easily compatible with a hypothetical (at the moment) tool that would be able to generate an optimal combine-pipeline for a target automatically, provided a performance feedback mechanism. Let it start with something reasonable, compile a corpus of programs, evaluate the code quality via that feedback mechanism, mutate the pipeline a bit, and try again.

I agree. This line of thinking is a key reason I included a code coverage mechanism in the InstructionSelector. By extending the 1-bit counters to N-bit and feeding that into the rule sorting, we would end up with a match table that prioritized common rules as far as correctness allowed.

> All good points from you and Daniel. I don't know if tablegen is the best tool for the combiners, as I worry we'll paint ourselves into a corner once we go down that path.

That's something we'll need to investigate while we look at how to maintain this long-term. Just for reference, in my original plan back when I started on the DAGISel importer, my main arguments in favour of tablegen for combiners were:

- ISel and Combiners are essentially the same thing. They match MIR and replace it with equivalent MIR.
- By using the same infrastructure for both, optimization investments we make for one can benefit the other.
- By using the same infrastructure for both, solving the ordering problem for one solves it for the other.
- The feature bits mechanism (e.g. `Requires<[HasX]>`) is ideal for compiling out Combines, as you can simply tell tablegen to discard rules with particular feature bits and check the rest at run time. This can also be done for ISel, potentially reducing the match table size when you only need particular subtarget(s).

I didn't go through the implementation detail for Combiners at the time, but I did leave a few doors open in case we decided to go down this route later. For example, this is the reason the implementation permits multiple match roots.
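The feature-bits argument can be made concrete with a small standalone C++ sketch: rules that need an excluded feature are discarded up front, and the surviving rules are checked against the subtarget at run time. The names `Rule`, `discardRules`, `isEnabled`, `HasX`, and `HasY` are assumptions made for illustration only; they do not correspond to tablegen output or to any LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits; a real target would derive these from its .td files.
constexpr uint32_t HasX = 1u << 0;
constexpr uint32_t HasY = 1u << 1;

// A combine rule tagged with the features it requires (0 means "always usable").
struct Rule {
  std::string Name;
  uint32_t Required;
};

// "Build time" step: drop every rule that needs any excluded feature, so the
// rule never makes it into the table that ships with the compiler.
inline std::vector<Rule> discardRules(const std::vector<Rule> &All,
                                      uint32_t Excluded) {
  std::vector<Rule> Kept;
  for (const Rule &R : All)
    if ((R.Required & Excluded) == 0)
      Kept.push_back(R);
  return Kept;
}

// Run-time step: a surviving rule fires only if the subtarget provides all of
// the feature bits it requires.
inline bool isEnabled(const Rule &R, uint32_t SubtargetFeatures) {
  return (R.Required & SubtargetFeatures) == R.Required;
}

// Example: excluding HasY removes only the rule that depends on it.
inline void featureExample() {
  std::vector<Rule> All = {{"plain", 0}, {"needsX", HasX}, {"needsXY", HasX | HasY}};
  std::vector<Rule> Kept = discardRules(All, HasY);
  assert(Kept.size() == 2);
  assert(!isEnabled(Kept[1], 0) && isEnabled(Kept[1], HasX));
}
```

The same two-step filter would apply unchanged to selection rules, which is the sense in which sharing infrastructure between ISel and combiners lets one investment serve both.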
	}			}

lib/Target/AArch64/AArch64.h

	[... 47 lines not shown ...]
	FunctionPass *createFalkorMarkStridedAccessesPass();			FunctionPass *createFalkorMarkStridedAccessesPass();

	FunctionPass *createAArch64CleanupLocalDynamicTLSPass();			FunctionPass *createAArch64CleanupLocalDynamicTLSPass();

	FunctionPass *createAArch64CollectLOHPass();			FunctionPass *createAArch64CollectLOHPass();
	InstructionSelector *			InstructionSelector *
	createAArch64InstructionSelector(const AArch64TargetMachine &,			createAArch64InstructionSelector(const AArch64TargetMachine &,
	AArch64Subtarget &, AArch64RegisterBankInfo &);			AArch64Subtarget &, AArch64RegisterBankInfo &);
				FunctionPass *createAArch64PreLegalizeCombiner();

	void initializeAArch64A53Fix835769Pass(PassRegistry&);			void initializeAArch64A53Fix835769Pass(PassRegistry&);
	void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);			void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);
	void initializeAArch64AdvSIMDScalarPass(PassRegistry&);			void initializeAArch64AdvSIMDScalarPass(PassRegistry&);
	void initializeAArch64CollectLOHPass(PassRegistry&);			void initializeAArch64CollectLOHPass(PassRegistry&);
	void initializeAArch64CondBrTuningPass(PassRegistry &);			void initializeAArch64CondBrTuningPass(PassRegistry &);
	void initializeAArch64ConditionalComparesPass(PassRegistry&);			void initializeAArch64ConditionalComparesPass(PassRegistry&);
	void initializeAArch64ConditionOptimizerPass(PassRegistry&);			void initializeAArch64ConditionOptimizerPass(PassRegistry&);
	void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);			void initializeAArch64DeadRegisterDefinitionsPass(PassRegistry&);
	void initializeAArch64ExpandPseudoPass(PassRegistry&);			void initializeAArch64ExpandPseudoPass(PassRegistry&);
	void initializeAArch64LoadStoreOptPass(PassRegistry&);			void initializeAArch64LoadStoreOptPass(PassRegistry&);
	void initializeAArch64SIMDInstrOptPass(PassRegistry&);			void initializeAArch64SIMDInstrOptPass(PassRegistry&);
				void initializeAArch64PreLegalizerCombinerPass(PassRegistry&);
	void initializeAArch64PromoteConstantPass(PassRegistry&);			void initializeAArch64PromoteConstantPass(PassRegistry&);
	void initializeAArch64RedundantCopyEliminationPass(PassRegistry&);			void initializeAArch64RedundantCopyEliminationPass(PassRegistry&);
	void initializeAArch64StorePairSuppressPass(PassRegistry&);			void initializeAArch64StorePairSuppressPass(PassRegistry&);
	void initializeFalkorHWPFFixPass(PassRegistry&);			void initializeFalkorHWPFFixPass(PassRegistry&);
	void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);			void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
	void initializeLDTLSCleanupPass(PassRegistry&);			void initializeLDTLSCleanupPass(PassRegistry&);
	} // end namespace llvm			} // end namespace llvm

	#endif			#endif

lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp

This file was added.

				//=== lib/Target/AArch64/AArch64PreLegalizerCombiner.cpp ------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass does combining of machine instructions at the generic MI level,
				// before the legalizer.
				//
				//===----------------------------------------------------------------------===//

				#include "AArch64TargetMachine.h"
				#include "llvm/CodeGen/GlobalISel/Combiner.h"
				#include "llvm/CodeGen/GlobalISel/CombinerHelper.h"
				#include "llvm/CodeGen/GlobalISel/CombinerInfo.h"
				#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
				#include "llvm/CodeGen/MachineFunctionPass.h"
				#include "llvm/CodeGen/TargetPassConfig.h"
				#include "llvm/Support/Debug.h"

				#define DEBUG_TYPE "aarch64-prelegalizer-combiner"

				using namespace llvm;
				using namespace MIPatternMatch;

				namespace {
				class AArch64PreLegalizerCombinerInfo : public CombinerInfo {
				public:
				AArch64PreLegalizerCombinerInfo()
				: CombinerInfo(/*AllowIllegalOps*/ true, /*ShouldLegalizeIllegal*/ false,
				/*LegalizerInfo*/ nullptr) {}
				virtual bool combine(CombinerChangeObserver &Observer, MachineInstr &MI,
				MachineIRBuilder &B) const override;
				};

				bool AArch64PreLegalizerCombinerInfo::combine(CombinerChangeObserver &Observer,
				MachineInstr &MI,
				MachineIRBuilder &B) const {
				CombinerHelper Helper(Observer, B);

				switch (MI.getOpcode()) {
				default:
				return false;
				case TargetOpcode::G_LOAD:
				case TargetOpcode::G_SEXTLOAD:
				case TargetOpcode::G_ZEXTLOAD:
				return Helper.tryCombineExtendingLoads(MI);
				rtereshin: `CombinerHelper::tryCombineCopy` contains a bug: it doesn't check that the register classes and register banks of its source and destination are compatible. As soon as that is fixed and `CombinerHelper::tryCombine` properly tries all of the combines implemented in `CombinerHelper`, as it's supposed to, this could be replaced with a single call to `CombinerHelper::tryCombine`; not before, though. I'm planning to add a patch with that fix soon, but it will depend on https://reviews.llvm.org/D45732, as the latter makes the former simpler and shorter. + @aditya_nandakumar
				dsanders (author): > and CombinerHelper::tryCombine properly tries all of the combines implemented in CombinerHelper as it's supposed to
				At the moment that will look like a good idea, but a single monolithic tryCombine() isn't going to last long as more combines get implemented and more targets use combines (see above).
				rtereshin: Something still needs to be done about `tryCombine`. If it doesn't represent any practically useful group used by any target (not necessarily in the final implementation; during experimentation too), let's remove it. If it does, let's maintain it so that it does precisely what it promises to do.
				}

				return false;
				}

				// Pass boilerplate
				// ================

				class AArch64PreLegalizerCombiner : public MachineFunctionPass {
				public:
				static char ID;

				AArch64PreLegalizerCombiner();

				StringRef getPassName() const override { return "AArch64PreLegalizerCombiner"; }

				bool runOnMachineFunction(MachineFunction &MF) override;

				void getAnalysisUsage(AnalysisUsage &AU) const override;
				};
				}

				void AArch64PreLegalizerCombiner::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.addRequired<TargetPassConfig>();
				MachineFunctionPass::getAnalysisUsage(AU);
				}

				AArch64PreLegalizerCombiner::AArch64PreLegalizerCombiner() : MachineFunctionPass(ID) {
				initializeAArch64PreLegalizerCombinerPass(*PassRegistry::getPassRegistry());
				}

				bool AArch64PreLegalizerCombiner::runOnMachineFunction(MachineFunction &MF) {
				if (MF.getProperties().hasProperty(
				MachineFunctionProperties::Property::FailedISel))
				return false;
				auto *TPC = &getAnalysis<TargetPassConfig>();
				AArch64PreLegalizerCombinerInfo PCInfo;
				Combiner C(PCInfo, TPC);
				return C.combineMachineInstrs(MF);
				}

				char AArch64PreLegalizerCombiner::ID = 0;
				INITIALIZE_PASS_BEGIN(AArch64PreLegalizerCombiner, DEBUG_TYPE,
				"Combine AArch64 machine instrs before legalization",
				false, false)
				INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
				INITIALIZE_PASS_END(AArch64PreLegalizerCombiner, DEBUG_TYPE,
				"Combine AArch64 machine instrs before legalization", false,
				false)


				namespace llvm {
				FunctionPass *createAArch64PreLegalizeCombiner() {
				return new AArch64PreLegalizerCombiner();
				}
				} // end namespace llvm

lib/Target/AArch64/AArch64TargetMachine.cpp

[... 152 lines not shown ...]	extern "C" void LLVMInitializeAArch64Target() {
initializeAArch64AdvSIMDScalarPass(*PR);		initializeAArch64AdvSIMDScalarPass(*PR);
initializeAArch64CollectLOHPass(*PR);		initializeAArch64CollectLOHPass(*PR);
initializeAArch64ConditionalComparesPass(*PR);		initializeAArch64ConditionalComparesPass(*PR);
initializeAArch64ConditionOptimizerPass(*PR);		initializeAArch64ConditionOptimizerPass(*PR);
initializeAArch64DeadRegisterDefinitionsPass(*PR);		initializeAArch64DeadRegisterDefinitionsPass(*PR);
initializeAArch64ExpandPseudoPass(*PR);		initializeAArch64ExpandPseudoPass(*PR);
initializeAArch64LoadStoreOptPass(*PR);		initializeAArch64LoadStoreOptPass(*PR);
initializeAArch64SIMDInstrOptPass(*PR);		initializeAArch64SIMDInstrOptPass(*PR);
		initializeAArch64PreLegalizerCombinerPass(*PR);
initializeAArch64PromoteConstantPass(*PR);		initializeAArch64PromoteConstantPass(*PR);
initializeAArch64RedundantCopyEliminationPass(*PR);		initializeAArch64RedundantCopyEliminationPass(*PR);
initializeAArch64StorePairSuppressPass(*PR);		initializeAArch64StorePairSuppressPass(*PR);
initializeFalkorHWPFFixPass(*PR);		initializeFalkorHWPFFixPass(*PR);
initializeFalkorMarkStridedAccessesLegacyPass(*PR);		initializeFalkorMarkStridedAccessesLegacyPass(*PR);
initializeLDTLSCleanupPass(*PR);		initializeLDTLSCleanupPass(*PR);
}		}

[... 164 lines not shown ...]	createPostMachineScheduler(MachineSchedContext *C) const override {

return nullptr;		return nullptr;
}		}

void addIRPasses() override;		void addIRPasses() override;
bool addPreISel() override;		bool addPreISel() override;
bool addInstSelector() override;		bool addInstSelector() override;
bool addIRTranslator() override;		bool addIRTranslator() override;
		void addPreLegalizeMachineIR() override;
bool addLegalizeMachineIR() override;		bool addLegalizeMachineIR() override;
bool addRegBankSelect() override;		bool addRegBankSelect() override;
void addPreGlobalInstructionSelect() override;		void addPreGlobalInstructionSelect() override;
bool addGlobalInstructionSelect() override;		bool addGlobalInstructionSelect() override;
bool addILPOpts() override;		bool addILPOpts() override;
void addPreRegAlloc() override;		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreSched2() override;		void addPreSched2() override;
[... 85 lines not shown ...]	bool AArch64PassConfig::addInstSelector() {
return false;		return false;
}		}

bool AArch64PassConfig::addIRTranslator() {		bool AArch64PassConfig::addIRTranslator() {
addPass(new IRTranslator());		addPass(new IRTranslator());
return false;		return false;
}		}

		void AArch64PassConfig::addPreLegalizeMachineIR() {
		addPass(createAArch64PreLegalizeCombiner());
		}

bool AArch64PassConfig::addLegalizeMachineIR() {		bool AArch64PassConfig::addLegalizeMachineIR() {
addPass(new Legalizer());		addPass(new Legalizer());
return false;		return false;
}		}

bool AArch64PassConfig::addRegBankSelect() {		bool AArch64PassConfig::addRegBankSelect() {
addPass(new RegBankSelect());		addPass(new RegBankSelect());
return false;		return false;
[... 78 lines not shown ...]

lib/Target/AArch64/CMakeLists.txt

[... 37 lines not shown ...]	add_llvm_target(AArch64CodeGen
AArch64ISelDAGToDAG.cpp		AArch64ISelDAGToDAG.cpp
AArch64ISelLowering.cpp		AArch64ISelLowering.cpp
AArch64InstrInfo.cpp		AArch64InstrInfo.cpp
AArch64InstructionSelector.cpp		AArch64InstructionSelector.cpp
AArch64LegalizerInfo.cpp		AArch64LegalizerInfo.cpp
AArch64LoadStoreOptimizer.cpp		AArch64LoadStoreOptimizer.cpp
AArch64MacroFusion.cpp		AArch64MacroFusion.cpp
AArch64MCInstLower.cpp		AArch64MCInstLower.cpp
		AArch64PreLegalizerCombiner.cpp
AArch64PromoteConstant.cpp		AArch64PromoteConstant.cpp
AArch64PBQPRegAlloc.cpp		AArch64PBQPRegAlloc.cpp
AArch64RegisterBankInfo.cpp		AArch64RegisterBankInfo.cpp
AArch64RegisterInfo.cpp		AArch64RegisterInfo.cpp
AArch64SelectionDAGInfo.cpp		AArch64SelectionDAGInfo.cpp
AArch64StorePairSuppress.cpp		AArch64StorePairSuppress.cpp
AArch64Subtarget.cpp		AArch64Subtarget.cpp
AArch64TargetMachine.cpp		AArch64TargetMachine.cpp
[... 14 lines not shown ...]

test/CodeGen/AArch64/GlobalISel/arm64-fallback.ll

[... 26 lines not shown ...]
; FALLBACK-WITH-REPORT-OUT: ldr q0,		; FALLBACK-WITH-REPORT-OUT: ldr q0,
; FALLBACK-WITH-REPORT-OUT-NEXT: bl __fixunstfti		; FALLBACK-WITH-REPORT-OUT-NEXT: bl __fixunstfti
define i128 @ABIi128(i128 %arg1) {		define i128 @ABIi128(i128 %arg1) {
%farg1 = bitcast i128 %arg1 to fp128		%farg1 = bitcast i128 %arg1 to fp128
%res = fptoui fp128 %farg1 to i128		%res = fptoui fp128 %farg1 to i128
ret i128 %res		ret i128 %res
}		}

		; It happens that we don't handle ConstantArray instances yet during
		; translation. Any other constant would be fine too.

		; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to translate constant: [1 x double] (in function: constant)
		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for constant
		; FALLBACK-WITH-REPORT-OUT-LABEL: constant:
		; FALLBACK-WITH-REPORT-OUT: fmov d0, #1.0
		define [1 x double] @constant() {
		aemerson: This was removed during r332449?
		dsanders (author): This is probably a result of the rebase. I'll remove it.
		ret [1 x double] [double 1.0]
		}

; The key problem here is that we may fail to create an MBB referenced by a		; The key problem here is that we may fail to create an MBB referenced by a
; PHI. If so, we cannot complete the G_PHI and mustn't try or bad things		; PHI. If so, we cannot complete the G_PHI and mustn't try or bad things
; happen.		; happen.
; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: cannot select: G_STORE %6:gpr(s32), %2:gpr(p0) :: (store seq_cst 4 into %ir.addr) (in function: pending_phis)		; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: cannot select: G_STORE %6:gpr(s32), %2:gpr(p0) :: (store seq_cst 4 into %ir.addr) (in function: pending_phis)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for pending_phis		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for pending_phis
; FALLBACK-WITH-REPORT-OUT-LABEL: pending_phis:		; FALLBACK-WITH-REPORT-OUT-LABEL: pending_phis:
define i32 @pending_phis(i1 %tst, i32 %val, i32* %addr) {		define i32 @pending_phis(i1 %tst, i32 %val, i32* %addr) {
br i1 %tst, label %true, label %false		br i1 %tst, label %true, label %false

end:		end:
%res = phi i32 [%val, %true], [42, %false]		%res = phi i32 [%val, %true], [42, %false]
ret i32 %res		ret i32 %res

true:		true:
store atomic i32 42, i32* %addr seq_cst, align 4		store atomic i32 42, i32* %addr seq_cst, align 4
br label %end		br label %end

false:		false:
br label %end		br label %end

}		}

; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: %0:_(s24) = G_LOAD %1:_(p0) :: (load 3 from `i24* undef`, align 1) (in function: odd_type_load)		; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: %2:_(s32) = G_ZEXTLOAD %1:_(p0) :: (load 3 from `i24* undef`, align 1) (in function: odd_type_load)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_type_load		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for odd_type_load
; FALLBACK-WITH-REPORT-OUT-LABEL: odd_type_load		; FALLBACK-WITH-REPORT-OUT-LABEL: odd_type_load
define i32 @odd_type_load() {		define i32 @odd_type_load() {
entry:		entry:
%ld = load i24, i24* undef, align 1		%ld = load i24, i24* undef, align 1
%cst = zext i24 %ld to i32		%cst = zext i24 %ld to i32
ret i32 %cst		ret i32 %cst
}		}
[... 103 lines not shown ...]	block:
store <2 x i16> %dummy, <2 x i16>* undef		store <2 x i16> %dummy, <2 x i16>* undef
ret void		ret void

end:		end:
%vec = load <2 x i16>, <2 x i16>* undef		%vec = load <2 x i16>, <2 x i16>* undef
br label %block		br label %block
}		}

		; FALLBACK-WITH-REPORT-ERR: remark: <unknown>:0:0: unable to legalize instruction: G_STORE %1:_(s96), %3:_(p0) :: (store 12 into `%struct96* undef`, align 4) (in function: nonpow2_insertvalue_narrowing)
		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for nonpow2_insertvalue_narrowing
		; FALLBACK-WITH-REPORT-OUT-LABEL: nonpow2_insertvalue_narrowing:
		%struct96 = type { float, float, float }
		define void @nonpow2_insertvalue_narrowing(float %a) {
		%dummy = insertvalue %struct96 undef, float %a, 0
		store %struct96 %dummy, %struct96* undef
		ret void
		}

; FALLBACK-WITH-REPORT-ERR remark: <unknown>:0:0: unable to legalize instruction: G_STORE %3, %4 :: (store 12 into `i96* undef`, align 16) (in function: nonpow2_add_narrowing)		; FALLBACK-WITH-REPORT-ERR remark: <unknown>:0:0: unable to legalize instruction: G_STORE %3, %4 :: (store 12 into `i96* undef`, align 16) (in function: nonpow2_add_narrowing)
; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for nonpow2_add_narrowing		; FALLBACK-WITH-REPORT-ERR: warning: Instruction selection used fallback path for nonpow2_add_narrowing
; FALLBACK-WITH-REPORT-OUT-LABEL: nonpow2_add_narrowing:		; FALLBACK-WITH-REPORT-OUT-LABEL: nonpow2_add_narrowing:
define void @nonpow2_add_narrowing() {		define void @nonpow2_add_narrowing() {
%a = add i128 undef, undef		%a = add i128 undef, undef
%b = trunc i128 %a to i96		%b = trunc i128 %a to i96
%dummy = add i96 %b, %b		%dummy = add i96 %b, %b
store i96 %dummy, i96* undef		store i96 %dummy, i96* undef
[... 53 lines not shown ...]

test/CodeGen/AArch64/GlobalISel/gisel-commandline-option.ll

	[... 30 lines not shown ...]

	; RUN: llc -mtriple=aarch64-- -debug-pass=Structure %s -o /dev/null 2>&1 \			; RUN: llc -mtriple=aarch64-- -debug-pass=Structure %s -o /dev/null 2>&1 \
	; RUN: | FileCheck %s --check-prefix DISABLED			; RUN: | FileCheck %s --check-prefix DISABLED

	; RUN: llc -mtriple=aarch64-- -fast-isel=0 -global-isel=false \			; RUN: llc -mtriple=aarch64-- -fast-isel=0 -global-isel=false \
	; RUN: -debug-pass=Structure %s -o /dev/null 2>&1 | FileCheck %s --check-prefix DISABLED			; RUN: -debug-pass=Structure %s -o /dev/null 2>&1 | FileCheck %s --check-prefix DISABLED

	; ENABLED: IRTranslator			; ENABLED: IRTranslator
				; ENABLED-NEXT: PreLegalizerCombiner
	; ENABLED-NEXT: Legalizer			; ENABLED-NEXT: Legalizer
	; ENABLED-NEXT: RegBankSelect			; ENABLED-NEXT: RegBankSelect
	; ENABLED-O0-NEXT: Localizer			; ENABLED-O0-NEXT: Localizer
	; ENABLED-NEXT: InstructionSelect			; ENABLED-NEXT: InstructionSelect
	; ENABLED-NEXT: ResetMachineFunction			; ENABLED-NEXT: ResetMachineFunction

	; FALLBACK: AArch64 Instruction Selection			; FALLBACK: AArch64 Instruction Selection
	; NOFALLBACK-NOT: AArch64 Instruction Selection			; NOFALLBACK-NOT: AArch64 Instruction Selection
	[... 9 lines not shown ...]

test/CodeGen/AArch64/GlobalISel/prelegalizercombiner-extending-loads.mir

This file was added.

				# RUN: llc -O0 -run-pass=aarch64-prelegalizer-combiner -global-isel %s -o - | FileCheck %s

				--- |
				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"
				target triple = "aarch64--"
				define void @test_anyext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_anyext_with_copy(i8* %addr) {
				entry:
				ret void
				}
				define void @test_signext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_zeroext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_2anyext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1anyext_1signext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1xor_1signext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1anyext_1zeroext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1signext_1zeroext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1anyext64_1signext32(i8* %addr) {
				entry:
				ret void
				}
				define void @test_1anyext32_1signext64(i8* %addr) {
				entry:
				ret void
				}
				define void @test_2anyext32_1signext64(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_anyext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_signext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_zeroext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_2anyext(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_1anyext64_1signext32(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_1anyext32_1signext64(i8* %addr) {
				entry:
				ret void
				}
				define void @test_multiblock_2anyext32_1signext64(i8* %addr) {
				entry:
				ret void
				}
				...

				---
				name: test_anyext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_anyext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_LOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				$w0 = COPY %2
				...

				---
				name: test_anyext_with_copy
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_anyext_with_copy
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_LOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s8) = COPY %1
				%3:_(s32) = G_ANYEXT %1
				$w0 = COPY %3
				...

				---
				name: test_signext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_signext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_SEXT %1
				$w0 = COPY %2
				...

				---
				name: test_zeroext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_zeroext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ZEXT %1
				$w0 = COPY %2
				...

				---
				name: test_2anyext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_2anyext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_LOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				%3:_(s32) = G_ANYEXT %1
				$w0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_1anyext_1signext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1anyext_1signext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				%3:_(s32) = G_SEXT %1
				$w0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_1xor_1signext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1xor_1signext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s8) = G_XOR [[T2]], {{%[0-9]+}}
				; CHECK: [[T4:%[0-9]+]]:_(s32) = G_ANYEXT [[T3]]
				; CHECK: $w0 = COPY [[T4]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s8) = G_CONSTANT i8 -1
				%3:_(s8) = G_XOR %1, %2
				%5:_(s32) = G_ANYEXT %3
				%6:_(s32) = G_SEXT %1
				$w0 = COPY %5
				$w1 = COPY %6
				...

				---
				name: test_1anyext_1zeroext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1anyext_1zeroext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				%3:_(s32) = G_ZEXT %1
				$w0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_1signext_1zeroext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1signext_1zeroext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s32) = G_ZEXT [[T2]]
				; CHECK: $w0 = COPY [[T3]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ZEXT %1
				%3:_(s32) = G_SEXT %1
				$w0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_1anyext64_1signext32
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1anyext64_1signext32
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s64) = G_ANYEXT [[T1]]
				; CHECK: $x0 = COPY [[T2]](s64)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s64) = G_ANYEXT %1
				%3:_(s32) = G_SEXT %1
				$x0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_1anyext32_1signext64
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_1anyext32_1signext64
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s64) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s32) = G_ANYEXT [[T2]]
				; CHECK: $w0 = COPY [[T3]](s32)
				; CHECK: $x1 = COPY [[T1]](s64)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				%3:_(s64) = G_SEXT %1
				$w0 = COPY %2
				$x1 = COPY %3
				...

				---
				name: test_2anyext32_1signext64
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_2anyext32_1signext64
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s64) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s32) = G_ANYEXT [[T2]]
				; CHECK: [[T4:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T5:%[0-9]+]]:_(s32) = G_ANYEXT [[T4]]
				; CHECK: $w0 = COPY [[T3]](s32)
				; CHECK: $x1 = COPY [[T1]](s64)
				; CHECK: $w2 = COPY [[T5]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				%3:_(s64) = G_SEXT %1
				%4:_(s32) = G_ANYEXT %1
				$w0 = COPY %2
				$x1 = COPY %3
				$w2 = COPY %4
				...

				---
				name: test_multiblock_anyext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_anyext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_LOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: G_BR %bb.1
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				G_BR %bb.1
				bb.1:
				%2:_(s32) = G_ANYEXT %1
				$w0 = COPY %2
				...

				---
				name: test_multiblock_signext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_signext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				G_BR %bb.1
				bb.1:
				%2:_(s32) = G_SEXT %1
				$w0 = COPY %2
				...

				---
				name: test_multiblock_zeroext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_zeroext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_ZEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				G_BR %bb.1
				bb.1:
				%2:_(s32) = G_ZEXT %1
				$w0 = COPY %2
				...

				---
				name: test_multiblock_2anyext
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_2anyext
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_LOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: $w0 = COPY [[T1]](s32)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%2:_(s32) = G_ANYEXT %1
				G_BR %bb.1
				bb.1:
				%3:_(s32) = G_ANYEXT %1
				$w0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_multiblock_1anyext64_1signext32
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_1anyext64_1signext32
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s32) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: G_BR %bb.1
				; CHECK: [[T2:%[0-9]+]]:_(s64) = G_ANYEXT [[T1]]
				; CHECK: $x0 = COPY [[T2]](s64)
				; CHECK: $w1 = COPY [[T1]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				G_BR %bb.1
				bb.1:
				%2:_(s64) = G_ANYEXT %1
				%3:_(s32) = G_SEXT %1
				$x0 = COPY %2
				$w1 = COPY %3
				...

				---
				name: test_multiblock_1anyext32_1signext64
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_1anyext32_1signext64
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s64) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: G_BR %bb.1
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s32) = G_ANYEXT [[T2]]
				; CHECK: $w0 = COPY [[T3]](s32)
				; CHECK: $x1 = COPY [[T1]](s64)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				G_BR %bb.1
				bb.1:
				%2:_(s32) = G_ANYEXT %1
				%3:_(s64) = G_SEXT %1
				$w0 = COPY %2
				$x1 = COPY %3
				...

				---
				name: test_multiblock_2anyext32_1signext64
				body: |
				bb.0.entry:
				liveins: $x0
				; CHECK-LABEL: name: test_multiblock_2anyext32_1signext64
				; CHECK: [[T0:%[0-9]+]]:_(p0) = COPY $x0
				; CHECK: [[T1:%[0-9]+]]:_(s64) = G_SEXTLOAD [[T0]](p0) :: (load 1 from %ir.addr)
				; CHECK: [[T2:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T3:%[0-9]+]]:_(s32) = G_ANYEXT [[T2]]
				; CHECK: G_BR %bb.1
				; CHECK: [[T4:%[0-9]+]]:_(s8) = G_TRUNC [[T1]]
				; CHECK: [[T5:%[0-9]+]]:_(s32) = G_ANYEXT [[T4]]
				; CHECK: $w0 = COPY [[T5]](s32)
				; CHECK: $x1 = COPY [[T1]](s64)
				; CHECK: $w2 = COPY [[T3]](s32)
				%0:_(p0) = COPY $x0
				%1:_(s8) = G_LOAD %0 :: (load 1 from %ir.addr)
				%4:_(s32) = G_ANYEXT %1
				G_BR %bb.1
				bb.1:
				%2:_(s32) = G_ANYEXT %1
				%3:_(s64) = G_SEXT %1
				$w0 = COPY %2
				$x1 = COPY %3
				$w2 = COPY %4
				...

test/CodeGen/AArch64/O0-pipeline.ll

	  ; CHECK-NEXT: Rewrite Symbols
	  ; CHECK-NEXT: FunctionPass Manager
	  ; CHECK-NEXT: Dominator Tree Construction
	  ; CHECK-NEXT: Exception handling preparation
	  ; CHECK-NEXT: Safe Stack instrumentation pass
	  ; CHECK-NEXT: Insert stack protectors
	  ; CHECK-NEXT: Module Verifier
	  ; CHECK-NEXT: IRTranslator
	+ ; CHECK-NEXT: AArch64PreLegalizerCombiner
	  ; CHECK-NEXT: Legalizer
	  ; CHECK-NEXT: RegBankSelect
	  ; CHECK-NEXT: Localizer
	  ; CHECK-NEXT: InstructionSelect
	  ; CHECK-NEXT: ResetMachineFunction
	  ; CHECK-NEXT: AArch64 Instruction Selection
	  ; CHECK-NEXT: Expand ISel Pseudo-instructions
	  ; CHECK-NEXT: Local Stack Slot Allocation

Revision Contents

Diff 148961