This is an archive of the discontinued LLVM Phabricator instance.

Differential D67148

[LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize
Closed, Public

Authored by wuzish on Sep 4 2019, 12:48 AM.

Details

Reviewers

hfinkel
nemanjai
nadav
congh
chandlerc

Group Reviewers

Restricted Project

Commits

rG9802268ad312: recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure…
rL374634: recommit: [LoopVectorize][PowerPC] Estimate int and float register pressure…
rG9f41deccc0e6: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in…
rL374017: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in…

Summary

In loop-vectorize, the interleave count and vectorization factor depend on the number of target registers. Currently, register pressure is not estimated separately for each register class (in particular, scalar float types should not be counted against the same registers as scalar int types), so the estimate is inaccurate. Specifically, it causes too much interleaving/unrolling, which results in too many register spills in the loop body and hurts performance.

So we need to classify register classes at the IR level. Importantly, these are abstract register classes, not the target register classes that the backend defines in its .td files. They are used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types.

For the POWER target, the register count is special when VSX is enabled. With VSX, there are 32 scalar int registers (GPR) and 64 scalar float registers (VSR), and both int and float vectors use the same 64 registers (VSR). So there should be 2 register classes when VSX is enabled and 3 register classes when VSX is not enabled.

I tested it on the POWER target: it gives a big (about +30%) performance improvement in one specific SPEC2017 benchmark and no other obvious degradations. Could anyone help to adjust the register counts and verify on other targets?

Event Timeline

wuzish created this revision. Sep 4 2019, 12:48 AM

Herald added a project: Restricted Project. Sep 4 2019, 12:49 AM

Herald added subscribers: llvm-commits, shchenz, zzheng and 11 others.

wuzish edited the summary of this revision. Sep 4 2019, 1:09 AM

wuzish edited the summary of this revision. Sep 4 2019, 1:23 AM

wuzish edited the summary of this revision. Sep 4 2019, 2:06 AM

Herald added a subscriber: dmgreen. Sep 4 2019, 2:06 AM

Gentle ping..

Thanks for looking at this (it's been a problem for a long time). Let me suggest a different interface, which I believe will improve generality and reduce code duplication in the register-pressure estimator, and let me know what you think...

// Return the number of registers in the target-provided register class.
unsigned getNumberOfRegisters(unsigned ClassID = 0) const;

// Return the target-provided register class for the provided type.
unsigned getRegisterClassForType(Type *Ty) const;

The idea, then, is that we just calculate register usage for each register class separately (i.e., keep a hash table), and then when computing the interleaving factor, etc. we just iterate over all of the register classes returned by the target, and pick the smallest interleaving factor calculated over all of the register classes. There's probably even a nice way to construct a default implementation of this in the backend (although that we'd save for follow-up work).
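
The scheme described in this comment can be sketched as follows. This is a minimal, standalone illustration and not the code in LoopVectorize.cpp: the names `RegisterBudget`, `floorPowerOfTwo` and `selectInterleaveCount`, and the rounding down to a power of two, are assumptions made for the example. Only the idea of keeping per-class usage in a table and taking the smallest factor over all classes comes from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Hypothetical per-class data: how many registers the class has and how many
// values of that class are simultaneously live in one copy of the loop body.
struct RegisterBudget {
  unsigned NumRegisters;  // what getNumberOfRegisters(ClassID) would report
  unsigned MaxLiveValues; // estimated peak usage for this class
};

// Largest power of two that is <= N (returns 1 for N == 0 or N == 1).
inline unsigned floorPowerOfTwo(unsigned N) {
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// Compute an interleave count per class and keep the smallest one.
inline unsigned
selectInterleaveCount(const std::map<unsigned, RegisterBudget> &UsageByClass,
                      unsigned MaxInterleave) {
  assert(MaxInterleave >= 1 && "need at least one copy of the loop body");
  unsigned IC = MaxInterleave;
  for (const auto &Entry : UsageByClass) {
    const RegisterBudget &B = Entry.second;
    // A class with no live values places no constraint on interleaving.
    if (B.MaxLiveValues == 0)
      continue;
    unsigned ClassIC = floorPowerOfTwo(B.NumRegisters / B.MaxLiveValues);
    IC = std::min(IC, std::max(1u, ClassIC));
  }
  return IC;
}
```

With a 32-register class holding 10 live values and a 64-register class holding 8, the first class allows 2 copies and the second 8, so the smaller value, 2, is chosen.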

In reply to D67148#1665594 by @hfinkel:

Yes. Using ClassID is more general and fine-grained. But I think there is no need to iterate over every register class: target register classes are flexible, and many of the classes a target provides differ only in width while occupying the same registers (such as gprc and g8rc). That makes for too many register classes to maintain, some of which are effectively the same thing. Also, it is not enough to take the smallest interleaving factor calculated over all of the register classes; we also need to account for the overlap between different register classes.

Overall, it would be too fine-grained and a little over-designed; many register classes behave the same once we separate scalar from vector and int from float. What's your opinion?

In reply to D67148#1665615 by @wuzish:

I think that you misunderstood my suggestion. I did not mean that the backend register classes would be directly mapped into the classes returned by the TTI interface. These would, especially for non-trivial architectures, be abstracted for the purpose of the TTI interface. For PPC, Altivec, we'd have three class IDs for now (scalar float, scalar int, vectors), all with 32 registers. For VSX, we'd have two register class IDs, one for scalar ints (with 32 registers), and one for everything else (with 64 registers). (*) These don't need to have anything to do with the register classes actually defined in the backend. Do we need to capture finer-grained details than that in the heuristic (e.g. is your overlap suggestion capturing more than this)?

(*) Actually, for both cases, we can also have a separate class ID for scalar i1 types (with 8 registers).

(*) Actually, for both cases, we can also have a separate class ID for scalar i1 types (with 8 registers).

Oops, I mean with 32 registers.
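
As a rough illustration of the abstract classes described above for PowerPC, the mapping could look like the sketch below. The enum names, the `ValueKind` struct and the free functions are hypothetical and exist only for this example; the register counts (32 per class without VSX, 32 for scalar ints and 64 for everything else with VSX) are the ones given in the comment. The separate i1 class mentioned in the footnote is left out for brevity.

```cpp
#include <cassert>

// Hypothetical abstract class IDs; they are not backend (.td) register classes.
enum PPCAbstractRC : unsigned { GPRRC = 0, FPRRC = 1, VRRC = 2, VSXRC = 3 };

// Simplified stand-in for an IR type: only what the mapping needs.
struct ValueKind {
  bool IsVector;
  bool IsFloatingPoint;
};

// Map a value kind to an abstract class ID.
inline unsigned getRegisterClassFor(ValueKind K, bool HasVSX) {
  if (HasVSX) // scalar ints use GPRs, everything else shares the VSX registers
    return (!K.IsVector && !K.IsFloatingPoint) ? GPRRC : VSXRC;
  if (K.IsVector)
    return VRRC;
  return K.IsFloatingPoint ? FPRRC : GPRRC;
}

// Number of registers available in each abstract class.
inline unsigned getNumberOfRegisters(unsigned ClassID) {
  assert(ClassID <= VSXRC && "unknown abstract register class");
  return ClassID == VSXRC ? 64 : 32;
}
```

The point of the indirection is that the vectorizer only ever sees small integer IDs and a register count per ID, so a target is free to merge or split classes without changing the heuristic.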

In reply to D67148#1665626 by @hfinkel:

Thank you, I see what you mean. The class ID values can be defined by the target itself and are only used to tell the classes apart. It's a nice way to express the overlap, since it lets us consistently use the smallest-factor algorithm instead of handling the overlapping classes separately.

Address comments. Add 3 function interfaces.

/// \return the number of registers in the target-provided register class.
unsigned getNumberOfRegisters(unsigned ClassID) const;

/// return the target-provided register class for the provided type.
unsigned getRegisterClassForType(Type *Ty, bool Vector) const;

/// return the target-provided register class name
const char* getRegisterClassName(unsigned ClassID) const;

Use a register class enum to distinguish LLVM types that reside in different registers. Every target can have its own mapping from LLVM type to register class ID. For the generic backend implementation, there are 3 default register classes: GenericIntScalarRC = 1, GenericFloatScalarRC = 2, GenericVectorRC = 3.

Thanks for exploring this direction.

llvm/include/llvm/Analysis/TargetTransformInfo.h
798	I'm not sure that these defaults make sense. Many targets won't even have these as distinct classes (e.g., PowerPC with VSX). I think that we should have the default implementation just return one register class, 0, with its current default (which I suppose is 8 registers), and the default implementation will put everything in that one class. Then, I don't think that we need this enum at all. My impression is that you decided to do it this way so that you could write in the other targets: bool Vector = (ClassID == TargetTransformInfo::GenericVectorRC); but I think it's better to just give all of the other targets which did something with Vector two register classes, and return the second one for all types which are vector types. That should match the current behavior and then the targets can customize as they see fit. But I'd leave this all within each target (there's no need to expose generic classes because there's no need for a generic meaning).
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp
1390	This can be: unsigned TTIRegNum = TTI->getNumberOfRegisters(TTI->getRegisterClassForType(F.getType(), false)) - 1;
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	I think that we can just make a separate function for this: TTI->hasVectorRegisters() (and then use that here and in the SLP vectorizer).

wuzish marked an inline comment as done. Sep 15 2019, 11:01 PM

wuzish added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	I think it could be like `TTI->getRegisterClassForType(F.getType(), true)` above

Address comments to simplify the default implementation for targets.

Use 1 and 0 to represent the default register classes for vector and scalar types, which keeps other targets' behavior as before.
Targets can reimplement this, as PowerPC does.
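
A default along those lines might look like the following sketch. `DefaultTTISketch` and its constructor parameters are invented for illustration, and the type argument of `getRegisterClassForType` is dropped to keep the example free of LLVM headers; the only part taken from the update note is that class 0 stands for scalars and class 1 for vectors.

```cpp
#include <cassert>

// Hypothetical stand-in for a target's TTI implementation.
class DefaultTTISketch {
  unsigned NumScalarRegs;
  unsigned NumVectorRegs;

public:
  DefaultTTISketch(unsigned Scalar, unsigned Vector)
      : NumScalarRegs(Scalar), NumVectorRegs(Vector) {}

  // Class 0 holds every scalar value, class 1 every vector value.
  unsigned getRegisterClassForType(bool Vector) const { return Vector ? 1 : 0; }

  // Gives the same answers the old getNumberOfRegisters(bool Vector) did.
  unsigned getNumberOfRegisters(unsigned ClassID) const {
    assert(ClassID <= 1 && "the default only knows classes 0 and 1");
    return ClassID == 1 ? NumVectorRegs : NumScalarRegs;
  }
};
```

A caller that used to write `getNumberOfRegisters(true)` would now ask for the class of a vector value first and pass the returned ID, which yields the same count for targets that keep this default.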

hfinkel added inline comments. Sep 16 2019, 1:42 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	But above, F.getType() gives back the right scalar type because F is the LSR::Formula. Here F is the function, right? I don't think it makes sense to ask for the register class of the function type.

wuzish marked an inline comment as done. Sep 16 2019, 8:16 PM

Change the prototype of getRegisterClassForType and give the second parameter a default value.

unsigned getRegisterClassForType(bool Vector, Type *Ty = nullptr)

wuzish marked an inline comment as done. Sep 16 2019, 8:23 PM

wuzish added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	Yes. And the case above would return nullptr, so we need to take care of that situation. Here we can leave the type as nullptr, the default argument value.

arsenm added inline comments. Sep 16 2019, 8:26 PM

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	I don't like spreading the concept of register classes corresponding to types. I also don't think register classes as a concept should be leaking out to the IR

wuzish marked an inline comment as done. Sep 16 2019, 10:19 PM

wuzish added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	I think this is not the backend's concept of a register class. It's an abstraction over the backend register classes, used only to classify and distinguish the kinds of data that reside in different registers, to help estimate register pressure.

hfinkel added inline comments. Sep 16 2019, 11:28 PM

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	Yeah, I think that it's pretty important that these are abstract register classes - used to establish the mapping between the types of IR values and the number of simultaneous live ranges to which we'd like to limit for some set of those types - it's probably worth stating that explicitly in the description.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	I suppose that this makes sense if we assume that `TTI->getRegisterClassForType` is called only on legal types? I'm a bit worried here because, at the IR level, all types are supported and should be legalized into some register class. We should better document the expected behavior here one way or the other.

wuzish edited the summary of this revision. Sep 17 2019, 12:13 AM

wuzish marked 2 inline comments as done. Sep 17 2019, 12:31 AM

wuzish added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	If I am not misunderstanding what you mean, then I think the type should satisfy `isSingleValueType`. We can document that in the comment on the getRegisterClassForType prototype declaration.

Add some comments at function interface prototype.

wuzish marked an inline comment as done. Sep 17 2019, 7:24 PM

Gentle ping...

@hfinkel Any more comments or advice?

Gentle ping...

I apologize for the delay; I've been contemplating what to recommend here...

(except for the comments below, I think this looks good)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	I agree, but I think that's insufficient. isSingleValueType is true for vector types, and so if we have an architecture with no vector registers, and we call TTI->getRegisterClassForType on a vector type, it would return the class associated with the scalarized type. I think that what you intend to say is something like:

getRegisterClassForType returns the register class associated with the provided type, accounting for type promotion and other type-legalization techniques that the target might apply, however, it specifically does not account for the scalarization or splitting of vector types. Should a vector type require scalarization or splitting into multiple underlying vector registers, that type should be mapped to a register class containing no registers.

In some sense, this seems reasonable, because the interface does not provide any way to figure out how many of a particular register in a register class the type might use, and so we can't return a sensible answer in cases where splitting or scalarization is required. It's a bit unfortunate, however, because the same consideration applies to scalar types too (e.g., i256 probably takes up multiple scalar registers). What we really should do, I think, is expand this interface to return, perhaps optionally, the number of registers of the provided class. In the implementation, in such a case, we could start by running:

std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty);

and then using the MVT for making the register-class decisions. However, it does not seem possible to make this change while retaining the current behavior for other targets (because we'd change what happens for illegal types), and thus, I recommend that a comment be added along the lines of my suggestion above, and then we also add this:

// FIXME: It's not currently possible to determine how many registers are used by the provided type.

and we address this aspect later in a different patch.

wuzish marked an inline comment as done. Sep 26 2019, 7:52 PM

wuzish added inline comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7768–7769	Could the register-count issue be addressed by code like this in LoopVectorize.cpp?

// A lambda that gets the register usage for the given type and VF.
auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) {
  if (Ty->isTokenTy())
    return 0U;
  unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
  return std::max<unsigned>(1, VF * TypeSize / WidestRegister);
};

`WidestRegister` comes from `TTI.getRegisterBitWidth`, which I think should be related to the Type*. Yes, having `TLI->getTypeLegalizationCost` as the only interface would be more consistent and easier to maintain.
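
For reference, the arithmetic in that lambda can be tried in isolation. The sketch below replaces the LLVM types with plain integers: `TypeSizeInBits` stands for `DL.getTypeSizeInBits(Ty->getScalarType())` and `WidestRegister` for the value obtained from `TTI.getRegisterBitWidth`. The function name `getRegUsage` is made up for the example, and the token-type check is omitted.

```cpp
#include <algorithm>
#include <cassert>

// Number of registers a value occupies once widened by the vectorization
// factor VF: total bits divided by bits per register, but never less than one.
inline unsigned getRegUsage(unsigned TypeSizeInBits, unsigned VF,
                            unsigned WidestRegister) {
  assert(WidestRegister > 0 && "register width must be non-zero");
  return std::max<unsigned>(1, VF * TypeSizeInBits / WidestRegister);
}
```

With 128-bit registers, a 32-bit element needs one register at VF = 4 and two at VF = 8, while a 64-bit scalar at VF = 1 is still counted as one register because of the lower bound.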

Update the comment of getRegisterClassForType. Land this first, and update the interface to leverage MVT in follow-up work.

We can land it first, then update the interface so that getRegisterClassForType takes an MVT (the simple legalized type) as its parameter, calling TLI->getTypeLegalizationCost beforehand.
The same can apply where GetRegUsage is used.

In reply to D67148#1685389 by @wuzish:

LGTM. Thanks!

This revision is now accepted and ready to land. Sep 27 2019, 8:18 AM

Rebase it and rerun the SPEC2017 benchmarks as a last check before landing.

nhaehnle added inline comments. Sep 30 2019, 10:54 AM

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	Can we explicitly call out the fact that these are not the CodeGen register classes? Also, how about making the interface a `getRegisterClassForValue` as opposed to `getRegisterClassForType`? This would allow a context-sensitive determination of register classes. Specifically, in the AMDGPU backend, it would potentially allow us to distinguish between uniform and divergent values (in the overall program sense).

hfinkel added inline comments. Oct 1 2019, 12:20 PM

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	> Can we explicitly call out the fact that these are not the CodeGen register classes?

I think that's a good idea. We can say that this is designed to provide a simple, high-level view of the register allocation process later performed by the backend. These register classes don't necessarily map onto the register classes used by the backend.

> Also, how about making the interface a getRegisterClassForValue as opposed to getRegisterClassForType? This would allow a context-sensitive determination of register classes. Specifically, in the AMDGPU backend, it would potentially allow us to distinguish between uniform and divergent values (in the overall program sense).

I don't object, but we should make sure to think about the overall computational complexity of the analysis the backend would need to use, and whether this per-value interface will allow that analysis to be performed efficiently.

wuzish marked an inline comment as done. Oct 1 2019, 10:48 PM

wuzish added inline comments.

llvm/include/llvm/Analysis/TargetTransformInfo.h
801	If we use `getRegisterClassForValue`, I can find a case in the PowerPC target where it would distinguish 2 different register classes: for an i1 value generated by icmp there are 8 CRRC registers, but for an i1 value generated by other operations there are 32 CRBITRC registers. Is this required? @hfinkel

I'd like to upstream it first; interface modifications can be done in follow-up patches.

Add more comments and rebase onto the latest trunk.

Closed by commit rG9f41deccc0e6: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in… (authored by wuzish). Oct 7 2019, 10:27 PM

This revision was automatically updated to reflect the committed changes.

Hi,

This is breaking our internal benchmarks, could you please take a look? Thanks!

Steven

add ; REQUIRES: asserts for test case

update test case

bjope added a subscriber: bjope. Oct 12 2019, 12:09 AM

rampitec mentioned this in D122850: [AMDGPU] Fix regression with vectorization limiting. Mar 31 2022, 2:47 PM

rampitec mentioned this in rGfced87d457d3: [AMDGPU] Fix regression with vectorization limiting. Apr 8 2022, 5:47 PM

Revision Contents

Path	Size
llvm/include/llvm/Analysis/TargetTransformInfo.h	26 lines
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h	15 lines
llvm/include/llvm/CodeGen/BasicTTIImpl.h	2 lines
llvm/lib/Analysis/TargetTransformInfo.cpp	12 lines
llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h	3 lines
llvm/lib/Target/ARM/ARMTargetTransformInfo.h	3 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h	8 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp	42 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h	2 lines
llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp	3 lines
llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h	2 lines
llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp	5 lines
llvm/lib/Target/X86/X86TargetTransformInfo.h	2 lines
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	3 lines
llvm/lib/Target/XCore/XCoreTargetTransformInfo.h	3 lines
llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp	4 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	151 lines
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp	2 lines
llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll	178 lines
llvm/test/Transforms/LoopVectorize/X86/reg-usage-debug.ll	12 lines
llvm/test/Transforms/LoopVectorize/X86/reg-usage.ll	34 lines

Diff 220278

llvm/include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 788 Lines • ▼ Show 20 Lines	enum OperandValueKind {
OK_UniformValue, // Operand is uniform (splat of a value).		OK_UniformValue, // Operand is uniform (splat of a value).
OK_UniformConstantValue, // Operand is uniform constant.		OK_UniformConstantValue, // Operand is uniform constant.
OK_NonUniformConstantValue // Operand is a non uniform constant value.		OK_NonUniformConstantValue // Operand is a non uniform constant value.
};		};

/// Additional properties of an operand's values.		/// Additional properties of an operand's values.
enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };		enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };

/// \return The number of scalar or vector registers that the target has.		/// \return the number of registers in the target-provided register class.
/// If 'Vectors' is true, it returns the number of vector registers. If it is		unsigned getNumberOfRegisters(unsigned ClassID) const;
/// set to false, it returns the number of scalar registers.
unsigned getNumberOfRegisters(bool Vector) const;		/// return the target-provided register class for the provided type.
		unsigned getRegisterClassForType(Type *Ty, bool Vector) const;

		/// return the target-provided register class name
		const char* getRegisterClassName(unsigned ClassID) const;

/// \return The width of the largest scalar or vector register type.		/// \return The width of the largest scalar or vector register type.
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;

/// \return The width of the smallest vector register type.		/// \return The width of the smallest vector register type.
unsigned getMinVectorRegisterBitWidth() const;		unsigned getMinVectorRegisterBitWidth() const;

/// \return True if the vectorization factor should be chosen to		/// \return True if the vectorization factor should be chosen to
▲ Show 20 Lines • Show All 437 Lines • ▼ Show 20 Lines	public:
virtual int getFPOpCost(Type *Ty) = 0;		virtual int getFPOpCost(Type *Ty) = 0;
virtual int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,		virtual int getIntImmCodeSizeCost(unsigned Opc, unsigned Idx, const APInt &Imm,
Type *Ty) = 0;		Type *Ty) = 0;
virtual int getIntImmCost(const APInt &Imm, Type *Ty) = 0;		virtual int getIntImmCost(const APInt &Imm, Type *Ty) = 0;
virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,		virtual int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
Type *Ty) = 0;		Type *Ty) = 0;
virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		virtual int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty) = 0;		Type *Ty) = 0;
virtual unsigned getNumberOfRegisters(bool Vector) = 0;		virtual unsigned getNumberOfRegisters(unsigned ClassID) const = 0;
		virtual unsigned getRegisterClassForType(Type *Ty, bool Vector) const = 0;
		virtual const char* getRegisterClassName(unsigned ClassID) const = 0;
virtual unsigned getRegisterBitWidth(bool Vector) const = 0;		virtual unsigned getRegisterBitWidth(bool Vector) const = 0;
virtual unsigned getMinVectorRegisterBitWidth() = 0;		virtual unsigned getMinVectorRegisterBitWidth() = 0;
virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0;		virtual bool shouldMaximizeVectorBandwidth(bool OptSize) const = 0;
virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;		virtual unsigned getMinimumVF(unsigned ElemWidth) const = 0;
virtual bool shouldConsiderAddressTypePromotion(		virtual bool shouldConsiderAddressTypePromotion(
const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;		const Instruction &I, bool &AllowPromotionWithoutCommonHeader) = 0;
virtual unsigned getCacheLineSize() = 0;		virtual unsigned getCacheLineSize() = 0;
virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;		virtual llvm::Optional<unsigned> getCacheSize(CacheLevel Level) = 0;
▲ Show 20 Lines • Show All 328 Lines • ▼ Show 20 Lines	public:
int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,		int getIntImmCost(unsigned Opc, unsigned Idx, const APInt &Imm,
Type *Ty) override {		Type *Ty) override {
return Impl.getIntImmCost(Opc, Idx, Imm, Ty);		return Impl.getIntImmCost(Opc, Idx, Imm, Ty);
}		}
int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty) override {		Type *Ty) override {
return Impl.getIntImmCost(IID, Idx, Imm, Ty);		return Impl.getIntImmCost(IID, Idx, Imm, Ty);
}		}
unsigned getNumberOfRegisters(bool Vector) override {		unsigned getNumberOfRegisters(unsigned ClassID) const override {
return Impl.getNumberOfRegisters(Vector);		return Impl.getNumberOfRegisters(ClassID);
		}
		unsigned getRegisterClassForType(Type *Ty, bool Vector) const override {
		return Impl.getRegisterClassForType(Ty, Vector);
		}
		const char* getRegisterClassName(unsigned ClassID) const override {
		return Impl.getRegisterClassName(ClassID);
}		}
unsigned getRegisterBitWidth(bool Vector) const override {		unsigned getRegisterBitWidth(bool Vector) const override {
return Impl.getRegisterBitWidth(Vector);		return Impl.getRegisterBitWidth(Vector);
}		}
unsigned getMinVectorRegisterBitWidth() override {		unsigned getMinVectorRegisterBitWidth() override {
return Impl.getMinVectorRegisterBitWidth();		return Impl.getMinVectorRegisterBitWidth();
}		}
bool shouldMaximizeVectorBandwidth(bool OptSize) const override {		bool shouldMaximizeVectorBandwidth(bool OptSize) const override {
▲ Show 20 Lines • Show All 304 Lines • Show Last 20 Lines

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 352 Lines • ▼ Show 20 Lines	unsigned getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
return TTI::TCC_Free;		return TTI::TCC_Free;
}		}

unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,		unsigned getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
Type *Ty) {		Type *Ty) {
return TTI::TCC_Free;		return TTI::TCC_Free;
}		}

unsigned getNumberOfRegisters(bool Vector) { return 8; }		unsigned getNumberOfRegisters(unsigned ClassID) const { return 8; }

		unsigned getRegisterClassForType(Type *Ty, bool Vector) const {
		return Vector ? 1 : 0;
		};

		const char* getRegisterClassName(unsigned ClassID) const {
		switch (ClassID) {
		default:
		return "Generic::Unknown Register Class";
		case 0: return "Generic::ScalarRC";
		case 1: return "Generic::VectorRC";
		}
		}

unsigned getRegisterBitWidth(bool Vector) const { return 32; }		unsigned getRegisterBitWidth(bool Vector) const { return 32; }

unsigned getMinVectorRegisterBitWidth() { return 128; }		unsigned getMinVectorRegisterBitWidth() { return 128; }

bool shouldMaximizeVectorBandwidth(bool OptSize) const { return false; }		bool shouldMaximizeVectorBandwidth(bool OptSize) const { return false; }

unsigned getMinimumVF(unsigned ElemWidth) const { return 0; }		unsigned getMinimumVF(unsigned ElemWidth) const { return 0; }
▲ Show 20 Lines • Show All 543 Lines • Show Last 20 Lines
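
The default implementation above replaces the old `bool Vector` query with a two-step lookup: map the type to a class ID, then ask how many registers that class has. A minimal sketch of that pattern follows. `GenericTTISketch` and `numRegistersFor` are hypothetical names used only for illustration, and the `Type *` argument is dropped so the example compiles without LLVM headers.

```cpp
#include <string>

// Hypothetical stand-in for the default TTI implementation in this patch.
// The real getRegisterClassForType also takes a Type *, which the default
// implementation ignores.
struct GenericTTISketch {
  // Class 0 is the generic scalar class, class 1 the generic vector class.
  unsigned getRegisterClassForType(bool Vector) const { return Vector ? 1 : 0; }

  // The default reports 8 registers for every class.
  unsigned getNumberOfRegisters(unsigned /*ClassID*/) const { return 8; }

  const char *getRegisterClassName(unsigned ClassID) const {
    switch (ClassID) {
    case 0:
      return "Generic::ScalarRC";
    case 1:
      return "Generic::VectorRC";
    default:
      return "Generic::Unknown Register Class";
    }
  }
};

// An old-style getNumberOfRegisters(Vector) call becomes a two-step query.
inline unsigned numRegistersFor(const GenericTTISketch &TTI, bool Vector) {
  return TTI.getNumberOfRegisters(TTI.getRegisterClassForType(Vector));
}
```

Targets that only distinguish scalar from vector registers can keep this numbering, which is why several of the target changes below recover the old flag with `bool Vector = (ClassID == 1);`.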

llvm/include/llvm/CodeGen/BasicTTIImpl.h

Show First 20 Lines • Show All 513 Lines • ▼ Show 20 Lines	int getInstructionLatency(const Instruction *I) {
return BaseT::getInstructionLatency(I);		return BaseT::getInstructionLatency(I);
}		}

/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector) { return Vector ? 0 : 1; }

unsigned getRegisterBitWidth(bool Vector) const { return 32; }		unsigned getRegisterBitWidth(bool Vector) const { return 32; }

/// Estimate the overhead of scalarizing an instruction. Insert and Extract		/// Estimate the overhead of scalarizing an instruction. Insert and Extract
/// are set if the result needs to be inserted and/or extracted from vectors.		/// are set if the result needs to be inserted and/or extracted from vectors.
unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {		unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) {
assert(Ty->isVectorTy() && "Can only scalarize vectors");		assert(Ty->isVectorTy() && "Can only scalarize vectors");
unsigned Cost = 0;		unsigned Cost = 0;

▲ Show 20 Lines • Show All 1,183 Lines • Show Last 20 Lines

llvm/lib/Analysis/TargetTransformInfo.cpp

	Show First 20 Lines • Show All 474 Lines • ▼ Show 20 Lines

	int TargetTransformInfo::getIntImmCost(Intrinsic::ID IID, unsigned Idx,			int TargetTransformInfo::getIntImmCost(Intrinsic::ID IID, unsigned Idx,
	const APInt &Imm, Type *Ty) const {			const APInt &Imm, Type *Ty) const {
	int Cost = TTIImpl->getIntImmCost(IID, Idx, Imm, Ty);			int Cost = TTIImpl->getIntImmCost(IID, Idx, Imm, Ty);
	assert(Cost >= 0 && "TTI should not produce negative costs!");			assert(Cost >= 0 && "TTI should not produce negative costs!");
	return Cost;			return Cost;
	}			}

	unsigned TargetTransformInfo::getNumberOfRegisters(bool Vector) const {			unsigned TargetTransformInfo::getNumberOfRegisters(unsigned ClassID) const {
	return TTIImpl->getNumberOfRegisters(Vector);			return TTIImpl->getNumberOfRegisters(ClassID);
				}

				unsigned TargetTransformInfo::getRegisterClassForType(Type *Ty, bool Vector) const {
				return TTIImpl->getRegisterClassForType(Ty, Vector);
				}

				const char* TargetTransformInfo::getRegisterClassName(unsigned ClassID) const {
				return TTIImpl->getRegisterClassName(ClassID);
	}			}

	unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {			unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
	return TTIImpl->getRegisterBitWidth(Vector);			return TTIImpl->getRegisterBitWidth(Vector);
	}			}

	unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {			unsigned TargetTransformInfo::getMinVectorRegisterBitWidth() const {
	return TTIImpl->getMinVectorRegisterBitWidth();			return TTIImpl->getMinVectorRegisterBitWidth();
	▲ Show 20 Lines • Show All 896 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64TargetTransformInfo.h

Show First 20 Lines • Show All 79 Lines • ▼ Show 20 Lines	public:

/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

bool enableInterleavedAccessVectorization() { return true; }		bool enableInterleavedAccessVectorization() { return true; }

unsigned getNumberOfRegisters(bool Vector) {		unsigned getNumberOfRegisters(unsigned ClassID) const {
		bool Vector = (ClassID == 1);
if (Vector) {		if (Vector) {
if (ST->hasNEON())		if (ST->hasNEON())
return 32;		return 32;
return 0;		return 0;
}		}
return 31;		return 31;
}		}

▲ Show 20 Lines • Show All 91 Lines • Show Last 20 Lines

llvm/lib/Target/ARM/ARMTargetTransformInfo.h

Show First 20 Lines • Show All 116 Lines • ▼ Show 20 Lines	public:

int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);		int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);

/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector) {		unsigned getNumberOfRegisters(unsigned ClassID) const {
		bool Vector = (ClassID == 1);
if (Vector) {		if (Vector) {
if (ST->hasNEON())		if (ST->hasNEON())
return 16;		return 16;
if (ST->hasMVEIntegerOps())		if (ST->hasMVEIntegerOps())
return 8;		return 8;
return 0;		return 0;
}		}

▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

Show First 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	public:

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{
bool useColdCCForColdCall(Function &F);		bool useColdCCForColdCall(Function &F);
bool enableAggressiveInterleaving(bool LoopHasReductions);		bool enableAggressiveInterleaving(bool LoopHasReductions);
TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,		TTI::MemCmpExpansionOptions enableMemCmpExpansion(bool OptSize,
bool IsZeroCmp) const;		bool IsZeroCmp) const;
bool enableInterleavedAccessVectorization();		bool enableInterleavedAccessVectorization();
unsigned getNumberOfRegisters(bool Vector);
		enum PPCRegisterClass {
		GPRRC, FPRRC, VRRC, VSXRC
		};
		unsigned getNumberOfRegisters(unsigned ClassID) const;
		unsigned getRegisterClassForType(Type *Ty, bool Vector) const;
		const char* getRegisterClassName(unsigned ClassID) const;
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getCacheLineSize();		unsigned getCacheLineSize();
unsigned getPrefetchDistance();		unsigned getPrefetchDistance();
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
int vectorCostAdjustment(int Cost, unsigned Opcode, Type *Ty1, Type *Ty2);		int vectorCostAdjustment(int Cost, unsigned Opcode, Type *Ty1, Type *Ty2);
int getArithmeticInstrCost(		int getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
Show All 26 Lines

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

Show First 20 Lines • Show All 588 Lines • ▼ Show 20 Lines	PPCTTIImpl::enableMemCmpExpansion(bool OptSize, bool IsZeroCmp) const {
Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);		Options.MaxNumLoads = TLI->getMaxExpandSizeMemcmp(OptSize);
return Options;		return Options;
}		}

bool PPCTTIImpl::enableInterleavedAccessVectorization() {		bool PPCTTIImpl::enableInterleavedAccessVectorization() {
return true;		return true;
}		}

unsigned PPCTTIImpl::getNumberOfRegisters(bool Vector) {		unsigned PPCTTIImpl::getNumberOfRegisters(unsigned ClassID) const {
if (Vector && !ST->hasAltivec() && !ST->hasQPX())		assert(ClassID == GPRRC || ClassID == FPRRC ||
return 0;		ClassID == VRRC || ClassID == VSXRC);
return ST->hasVSX() ? 64 : 32;		if (ST->hasVSX()) {
		assert(ClassID == GPRRC || ClassID == VSXRC);
		return ClassID == GPRRC ? 32 : 64;
		}
		assert(ClassID == GPRRC || ClassID == FPRRC || ClassID == VRRC);
		return 32;
		}

		unsigned PPCTTIImpl::getRegisterClassForType(Type *Ty, bool Vector) const {
		if (ST->hasVSX()) {
		if (!Vector && !Ty->getScalarType()->isFloatTy())
		return GPRRC;
		else
		return VSXRC;
		}

		if (Vector)
		return VRRC;
		else if (Ty->getScalarType()->isFloatTy())
		return FPRRC;
		else
		return GPRRC;
		}

		const char* PPCTTIImpl::getRegisterClassName(unsigned ClassID) const {

		switch (ClassID) {
		default:
		llvm_unreachable("unknown register class");
		return "PPC::unknown register class";
		case GPRRC: return "PPC::GPRRC";
		case FPRRC: return "PPC::FPRRC";
		case VRRC: return "PPC::VRRC";
		case VSXRC: return "PPC::VSXRC";
		}
}		}

unsigned PPCTTIImpl::getRegisterBitWidth(bool Vector) const {		unsigned PPCTTIImpl::getRegisterBitWidth(bool Vector) const {
if (Vector) {		if (Vector) {
if (ST->hasQPX()) return 256;		if (ST->hasQPX()) return 256;
if (ST->hasAltivec()) return 128;		if (ST->hasAltivec()) return 128;
return 0;		return 0;
}		}
▲ Show 20 Lines • Show All 291 Lines • Show Last 20 Lines
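
To make the PowerPC mapping in this revision easier to follow, here is a small self-contained sketch of the same decision table. `ValueKind`, `classForValue`, and `numRegisters` are hypothetical names. The patch itself inspects `llvm::Type` and the subtarget, and it also asserts on class IDs that are invalid for the current subtarget, which this sketch leaves out.

```cpp
enum PPCRegisterClass { GPRRC, FPRRC, VRRC, VSXRC };

// Hypothetical description of a value; the patch looks at llvm::Type instead.
struct ValueKind {
  bool IsVector; // the Vector argument of getRegisterClassForType
  bool IsFloat;  // stands for Ty->getScalarType()->isFloatTy() in the patch
};

// Mirrors getRegisterClassForType in this revision.
inline unsigned classForValue(ValueKind V, bool HasVSX) {
  if (HasVSX) // with VSX, floats and vectors share the 64 VSX registers
    return (!V.IsVector && !V.IsFloat) ? GPRRC : VSXRC;
  if (V.IsVector)
    return VRRC;
  return V.IsFloat ? FPRRC : GPRRC;
}

// Mirrors getNumberOfRegisters in this revision, without the assertions.
inline unsigned numRegisters(unsigned ClassID, bool HasVSX) {
  if (HasVSX)
    return ClassID == GPRRC ? 32 : 64;
  return 32;
}
```

One point worth checking when reading the revision: `isFloatTy()` in LLVM is true only for the 32-bit `float` type, so as written a scalar `double` is classified as GPRRC rather than FPRRC or VSXRC.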

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.h

Show First 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	public:

bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,		bool isLSRCostLess(TargetTransformInfo::LSRCost &C1,
TargetTransformInfo::LSRCost &C2);		TargetTransformInfo::LSRCost &C2);
/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(unsigned ClassID) const;
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;

unsigned getCacheLineSize() { return 256; }		unsigned getCacheLineSize() { return 256; }
unsigned getPrefetchDistance() { return 2000; }		unsigned getPrefetchDistance() { return 2000; }
unsigned getMinPrefetchStride() { return 2048; }		unsigned getMinPrefetchStride() { return 2048; }

bool hasDivRemOp(Type *DataType, bool IsSigned);		bool hasDivRemOp(Type *DataType, bool IsSigned);
bool prefersVectorizedAddressing() { return false; }		bool prefersVectorizedAddressing() { return false; }
▲ Show 20 Lines • Show All 45 Lines • Show Last 20 Lines

llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp

Show First 20 Lines • Show All 298 Lines • ▼ Show 20 Lines	bool SystemZTTIImpl::isLSRCostLess(TargetTransformInfo::LSRCost &C1,
return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost,		return std::tie(C1.Insns, C1.NumRegs, C1.AddRecCost,
C1.NumIVMuls, C1.NumBaseAdds,		C1.NumIVMuls, C1.NumBaseAdds,
C1.ScaleCost, C1.SetupCost) <		C1.ScaleCost, C1.SetupCost) <
std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost,		std::tie(C2.Insns, C2.NumRegs, C2.AddRecCost,
C2.NumIVMuls, C2.NumBaseAdds,		C2.NumIVMuls, C2.NumBaseAdds,
C2.ScaleCost, C2.SetupCost);		C2.ScaleCost, C2.SetupCost);
}		}

unsigned SystemZTTIImpl::getNumberOfRegisters(bool Vector) {		unsigned SystemZTTIImpl::getNumberOfRegisters(unsigned ClassID) const {
		bool Vector = (ClassID == 1);
if (!Vector)		if (!Vector)
// Discount the stack pointer. Also leave out %r0, since it can't		// Discount the stack pointer. Also leave out %r0, since it can't
// be used in an address.		// be used in an address.
return 14;		return 14;
if (ST->hasVector())		if (ST->hasVector())
return 32;		return 32;
return 0;		return 0;
}		}
▲ Show 20 Lines • Show All 831 Lines • Show Last 20 Lines

llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.h

Show First 20 Lines • Show All 47 Lines • ▼ Show 20 Lines	public:

TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) const;		TTI::PopcntSupportKind getPopcntSupport(unsigned TyWidth) const;

/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(unsigned ClassID) const;
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getArithmeticInstrCost(		unsigned getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
ArrayRef<const Value *> Args = ArrayRef<const Value *>());		ArrayRef<const Value *> Args = ArrayRef<const Value *>());
unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);		unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index);

/// @}		/// @}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Target/WebAssembly/WebAssemblyTargetTransformInfo.cpp

	Show All 19 Lines
	#define DEBUG_TYPE "wasmtti"			#define DEBUG_TYPE "wasmtti"

	TargetTransformInfo::PopcntSupportKind			TargetTransformInfo::PopcntSupportKind
	WebAssemblyTTIImpl::getPopcntSupport(unsigned TyWidth) const {			WebAssemblyTTIImpl::getPopcntSupport(unsigned TyWidth) const {
	assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");			assert(isPowerOf2_32(TyWidth) && "Ty width must be power of 2");
	return TargetTransformInfo::PSK_FastHardware;			return TargetTransformInfo::PSK_FastHardware;
	}			}

	unsigned WebAssemblyTTIImpl::getNumberOfRegisters(bool Vector) {			unsigned WebAssemblyTTIImpl::getNumberOfRegisters(unsigned ClassID) const {
	unsigned Result = BaseT::getNumberOfRegisters(Vector);			unsigned Result = BaseT::getNumberOfRegisters(ClassID);

	// For SIMD, use at least 16 registers, as a rough guess.			// For SIMD, use at least 16 registers, as a rough guess.
				bool Vector = (ClassID == 1);
	if (Vector)			if (Vector)
	Result = std::max(Result, 16u);			Result = std::max(Result, 16u);

	return Result;			return Result;
	}			}

	unsigned WebAssemblyTTIImpl::getRegisterBitWidth(bool Vector) const {			unsigned WebAssemblyTTIImpl::getRegisterBitWidth(bool Vector) const {
	if (Vector && getST()->hasSIMD128())			if (Vector && getST()->hasSIMD128())
	▲ Show 20 Lines • Show All 43 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines	llvm::Optional<unsigned> getCacheSize(
TargetTransformInfo::CacheLevel Level) const;		TargetTransformInfo::CacheLevel Level) const;
llvm::Optional<unsigned> getCacheAssociativity(		llvm::Optional<unsigned> getCacheAssociativity(
TargetTransformInfo::CacheLevel Level) const;		TargetTransformInfo::CacheLevel Level) const;
/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector);		unsigned getNumberOfRegisters(unsigned ClassID) const;
unsigned getRegisterBitWidth(bool Vector) const;		unsigned getRegisterBitWidth(bool Vector) const;
unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;		unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
unsigned getMaxInterleaveFactor(unsigned VF);		unsigned getMaxInterleaveFactor(unsigned VF);
int getArithmeticInstrCost(		int getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,		TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
▲ Show 20 Lines • Show All 90 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	case TargetTransformInfo::CacheLevel::L1D:
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
case TargetTransformInfo::CacheLevel::L2D:		case TargetTransformInfo::CacheLevel::L2D:
return 8;		return 8;
}		}

llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");		llvm_unreachable("Unknown TargetTransformInfo::CacheLevel");
}		}

unsigned X86TTIImpl::getNumberOfRegisters(bool Vector) {		unsigned X86TTIImpl::getNumberOfRegisters(unsigned ClassID) const {
		bool Vector = (ClassID == 1);
if (Vector && !ST->hasSSE1())		if (Vector && !ST->hasSSE1())
return 0;		return 0;

if (ST->is64Bit()) {		if (ST->is64Bit()) {
if (Vector && ST->hasAVX512())		if (Vector && ST->hasAVX512())
return 32;		return 32;
return 16;		return 16;
}		}
▲ Show 20 Lines • Show All 3,607 Lines • Show Last 20 Lines

llvm/lib/Target/XCore/XCoreTargetTransformInfo.h

Show All 34 Lines	class XCoreTTIImpl : public BasicTTIImplBase<XCoreTTIImpl> {
const XCoreSubtarget *getST() const { return ST; }		const XCoreSubtarget *getST() const { return ST; }
const XCoreTargetLowering *getTLI() const { return TLI; }		const XCoreTargetLowering *getTLI() const { return TLI; }

public:		public:
explicit XCoreTTIImpl(const XCoreTargetMachine *TM, const Function &F)		explicit XCoreTTIImpl(const XCoreTargetMachine *TM, const Function &F)
: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl()),		: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl()),
TLI(ST->getTargetLowering()) {}		TLI(ST->getTargetLowering()) {}

unsigned getNumberOfRegisters(bool Vector) {		unsigned getNumberOfRegisters(unsigned ClassID) const {
		bool Vector = (ClassID == 1);
if (Vector) {		if (Vector) {
return 0;		return 0;
}		}
return 12;		return 12;
}		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif		#endif

llvm/lib/Transforms/Scalar/LoopStrengthReduce.cpp

Show First 20 Lines • Show All 1,380 Lines • ▼ Show 20 Lines	void Cost::RateFormula(const Formula &F,
// If we don't count instruction cost exit here.		// If we don't count instruction cost exit here.
if (!InsnsCost) {		if (!InsnsCost) {
assert(isValid() && "invalid cost");		assert(isValid() && "invalid cost");
return;		return;
}		}

// Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as		// Treat every new register that exceeds TTI.getNumberOfRegisters() - 1 as
// additional instruction (at least fill).		// additional instruction (at least fill).
unsigned TTIRegNum = TTI->getNumberOfRegisters(false) - 1;		// TODO: Need distinguish register class?
		unsigned TTIRegNum = TTI->getNumberOfRegisters(
		hfinkel (inline comment, not done): This can be: `unsigned TTIRegNum = TTI->getNumberOfRegisters(TTI->getRegisterClassForType(F.getType(), false)) - 1;`
		TTI->getRegisterClassForType(F.getType(), false)) - 1;
if (C.NumRegs > TTIRegNum) {		if (C.NumRegs > TTIRegNum) {
// Cost already exceeded TTIRegNum, then only newly added register can add		// Cost already exceeded TTIRegNum, then only newly added register can add
// new instructions.		// new instructions.
if (PrevNumRegs > TTIRegNum)		if (PrevNumRegs > TTIRegNum)
C.Insns += (C.NumRegs - PrevNumRegs);		C.Insns += (C.NumRegs - PrevNumRegs);
else		else
C.Insns += (C.NumRegs - TTIRegNum);		C.Insns += (C.NumRegs - TTIRegNum);
}		}
▲ Show 20 Lines • Show All 4,374 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Show First 20 Lines • Show All 977 Lines • ▼ Show 20 Lines	public:
/// The calculated cost is saved with widening decision in order to		/// The calculated cost is saved with widening decision in order to
/// avoid redundant calculations.		/// avoid redundant calculations.
void setCostBasedWideningDecision(unsigned VF);		void setCostBasedWideningDecision(unsigned VF);

/// A struct that represents some properties of the register usage		/// A struct that represents some properties of the register usage
/// of a loop.		/// of a loop.
struct RegisterUsage {		struct RegisterUsage {
/// Holds the number of loop invariant values that are used in the loop.		/// Holds the number of loop invariant values that are used in the loop.
unsigned LoopInvariantRegs;		/// The key is ClassID of target-provided register class.
		SmallMapVector<unsigned, unsigned, 4> LoopInvariantRegs;
/// Holds the maximum number of concurrent live intervals in the loop.		/// Holds the maximum number of concurrent live intervals in the loop.
unsigned MaxLocalUsers;		/// The key is ClassID of target-provided register class.
		SmallMapVector<unsigned, unsigned, 4> MaxLocalUsers;
};		};

/// \return Returns information about the register usages of the loop for the		/// \return Returns information about the register usages of the loop for the
/// given vectorization factors.		/// given vectorization factors.
SmallVector<RegisterUsage, 8> calculateRegisterUsage(ArrayRef<unsigned> VFs);		SmallVector<RegisterUsage, 8> calculateRegisterUsage(ArrayRef<unsigned> VFs);

/// Collect values we want to ignore in the cost model.		/// Collect values we want to ignore in the cost model.
void collectValuesToIgnore();		void collectValuesToIgnore();
▲ Show 20 Lines • Show All 3,925 Lines • ▼ Show 20 Lines	if (TTI.shouldMaximizeVectorBandwidth(!isScalarEpilogueAllowed()) ||
for (unsigned VS = MaxVectorSize * 2; VS <= NewMaxVectorSize; VS *= 2)		for (unsigned VS = MaxVectorSize * 2; VS <= NewMaxVectorSize; VS *= 2)
VFs.push_back(VS);		VFs.push_back(VS);

// For each VF calculate its register usage.		// For each VF calculate its register usage.
auto RUs = calculateRegisterUsage(VFs);		auto RUs = calculateRegisterUsage(VFs);

// Select the largest VF which doesn't require more registers than existing		// Select the largest VF which doesn't require more registers than existing
// ones.		// ones.
unsigned TargetNumRegisters = TTI.getNumberOfRegisters(true);
for (int i = RUs.size() - 1; i >= 0; --i) {		for (int i = RUs.size() - 1; i >= 0; --i) {
if (RUs[i].MaxLocalUsers <= TargetNumRegisters) {		bool Selected = true;
		for (auto& pair : RUs[i].MaxLocalUsers) {
		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);
		if (pair.second > TargetNumRegisters)
		Selected = false;
		}
		if (Selected) {
MaxVF = VFs[i];		MaxVF = VFs[i];
break;		break;
}		}
}		}
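
The new selection loop accepts a vectorization factor only if its usage fits in every register class it touches. The sketch below restates that rule outside LLVM, under stated assumptions: `std::map` stands in for `SmallMapVector`, `RegsPerClass` stands in for `TTI.getNumberOfRegisters(ClassID)`, and `pickMaxVF`, `ClassUsage`, and `Fallback` are hypothetical names.

```cpp
#include <map>
#include <vector>

// ClassID -> maximum number of simultaneously live values in that class.
using ClassUsage = std::map<unsigned, unsigned>;

// Returns the largest VF whose usage fits in every register class, or
// Fallback if none fits. Usages[i] belongs to VFs[i]; VFs is ascending.
inline unsigned pickMaxVF(const std::vector<unsigned> &VFs,
                          const std::vector<ClassUsage> &Usages,
                          const std::map<unsigned, unsigned> &RegsPerClass,
                          unsigned Fallback) {
  for (int i = static_cast<int>(VFs.size()) - 1; i >= 0; --i) {
    bool Fits = true;
    for (const auto &P : Usages[i]) {
      auto It = RegsPerClass.find(P.first);
      // Assumption of this sketch: an unknown class has no registers.
      unsigned Available = It == RegsPerClass.end() ? 0 : It->second;
      if (P.second > Available) {
        Fits = false;
        break;
      }
    }
    if (Fits)
      return VFs[i];
  }
  return Fallback;
}
```

For example, with 32 registers in each of two classes, a candidate that needs 40 vector registers is rejected even though its scalar usage is small, and the next smaller candidate is tried.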
if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) {		if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) {
if (MaxVF < MinVF) {		if (MaxVF < MinVF) {
LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF		LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF
<< ") with target's minimum: " << MinVF << '\n');		<< ") with target's minimum: " << MinVF << '\n');
▲ Show 20 Lines • Show All 134 Lines • ▼ Show 20 Lines	unsigned LoopVectorizationCostModel::selectInterleaveCount(unsigned VF,
if (Legal->getMaxSafeDepDistBytes() != -1U)		if (Legal->getMaxSafeDepDistBytes() != -1U)
return 1;		return 1;

// Do not interleave loops with a relatively small trip count.		// Do not interleave loops with a relatively small trip count.
unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);		unsigned TC = PSE.getSE()->getSmallConstantTripCount(TheLoop);
if (TC > 1 && TC < TinyTripCountInterleaveThreshold)		if (TC > 1 && TC < TinyTripCountInterleaveThreshold)
return 1;		return 1;

unsigned TargetNumRegisters = TTI.getNumberOfRegisters(VF > 1);
LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters
<< " registers\n");

if (VF == 1) {
if (ForceTargetNumScalarRegs.getNumOccurrences() > 0)
TargetNumRegisters = ForceTargetNumScalarRegs;
} else {
if (ForceTargetNumVectorRegs.getNumOccurrences() > 0)
TargetNumRegisters = ForceTargetNumVectorRegs;
}

RegisterUsage R = calculateRegisterUsage({VF})[0];		RegisterUsage R = calculateRegisterUsage({VF})[0];
// We divide by these constants so assume that we have at least one		// We divide by these constants so assume that we have at least one
// instruction that uses at least one register.		// instruction that uses at least one register.
R.MaxLocalUsers = std::max(R.MaxLocalUsers, 1U);		for (auto& pair : R.MaxLocalUsers) {
		pair.second = std::max(pair.second, 1U);
		}

// We calculate the interleave count using the following formula.		// We calculate the interleave count using the following formula.
// Subtract the number of loop invariants from the number of available		// Subtract the number of loop invariants from the number of available
// registers. These registers are used by all of the interleaved instances.		// registers. These registers are used by all of the interleaved instances.
// Next, divide the remaining registers by the number of registers that is		// Next, divide the remaining registers by the number of registers that is
// required by the loop, in order to estimate how many parallel instances		// required by the loop, in order to estimate how many parallel instances
// fit without causing spills. All of this is rounded down if necessary to be		// fit without causing spills. All of this is rounded down if necessary to be
// a power of two. We want power of two interleave count to simplify any		// a power of two. We want power of two interleave count to simplify any
// addressing operations or alignment considerations.		// addressing operations or alignment considerations.
// We also want power of two interleave counts to ensure that the induction		// We also want power of two interleave counts to ensure that the induction
// variable of the vector loop wraps to zero, when tail is folded by masking;		// variable of the vector loop wraps to zero, when tail is folded by masking;
// this currently happens when OptForSize, in which case IC is set to 1 above.		// this currently happens when OptForSize, in which case IC is set to 1 above.
unsigned IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /		unsigned IC = UINT_MAX;
R.MaxLocalUsers);

		for (auto& pair : R.MaxLocalUsers) {
		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);
		LLVM_DEBUG(dbgs() << "LV: The target has " << TargetNumRegisters
		<< " registers of "
		<< TTI.getRegisterClassName(pair.first) << " register class\n");
		if (VF == 1) {
		if (ForceTargetNumScalarRegs.getNumOccurrences() > 0)
		TargetNumRegisters = ForceTargetNumScalarRegs;
		} else {
		if (ForceTargetNumVectorRegs.getNumOccurrences() > 0)
		TargetNumRegisters = ForceTargetNumVectorRegs;
		}
		unsigned MaxLocalUsers = pair.second;
		unsigned LoopInvariantRegs = 0;
		if (R.LoopInvariantRegs.find(pair.first) != R.LoopInvariantRegs.end())
		LoopInvariantRegs = R.LoopInvariantRegs[pair.first];

		unsigned TmpIC = PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs) / MaxLocalUsers);
// Don't count the induction variable as interleaved.		// Don't count the induction variable as interleaved.
if (EnableIndVarRegisterHeur)		if (EnableIndVarRegisterHeur) {
IC = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs - 1) /		TmpIC =
std::max(1U, (R.MaxLocalUsers - 1)));		PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) /
		std::max(1U, (MaxLocalUsers - 1)));
		}

		IC = std::min(IC, TmpIC);
		}
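
The per-class interleave count computed in the loop above can be summarized as: for each register class, divide the registers left after loop invariants by the peak usage, round down to a power of two, and take the minimum across classes. The following sketch illustrates that formula only. It omits the `EnableIndVarRegisterHeur` variant and the `ForceTargetNum*Regs` overrides, uses `std::map` in place of `SmallMapVector`, and all function and parameter names are hypothetical. Unlike the patch, it also guards the subtraction against unsigned underflow, which is an addition of this sketch.

```cpp
#include <algorithm>
#include <climits>
#include <map>

// Largest power of two <= N, and 0 for N == 0.
inline unsigned powerOf2Floor(unsigned N) {
  if (N == 0)
    return 0;
  unsigned P = 1;
  while (P <= N / 2)
    P *= 2;
  return P;
}

// All maps are keyed by register class ID.
inline unsigned
interleaveCount(const std::map<unsigned, unsigned> &MaxLocalUsers,
                const std::map<unsigned, unsigned> &LoopInvariantRegs,
                const std::map<unsigned, unsigned> &RegsPerClass) {
  unsigned IC = UINT_MAX;
  for (const auto &P : MaxLocalUsers) {
    // Assume at least one live value so the division is well defined.
    unsigned Users = std::max(P.second, 1U);
    auto Inv = LoopInvariantRegs.find(P.first);
    unsigned Invariants = Inv == LoopInvariantRegs.end() ? 0 : Inv->second;
    auto Regs = RegsPerClass.find(P.first);
    unsigned Available = Regs == RegsPerClass.end() ? 0 : Regs->second;
    // Sketch-only guard: avoid wrapping when invariants exceed registers.
    unsigned Free = Available > Invariants ? Available - Invariants : 0;
    IC = std::min(IC, powerOf2Floor(Free / Users));
  }
  return IC; // UINT_MAX if no class is used; the caller clamps it later.
}
```

With 32 registers per class, 4 live scalars plus 2 invariants allow 4 interleaved copies, but 10 live vector values allow only 2, so the most constrained class decides the result.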

// Clamp the interleave ranges to reasonable counts.		// Clamp the interleave ranges to reasonable counts.
unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF);		unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF);

// Check if the user has overridden the max.		// Check if the user has overridden the max.
if (VF == 1) {		if (VF == 1) {
if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;
▲ Show 20 Lines • Show All 165 Lines • ▼ Show 20 Lines	LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<unsigned> VFs) {
unsigned MaxSafeDepDist = -1U;		unsigned MaxSafeDepDist = -1U;
if (Legal->getMaxSafeDepDistBytes() != -1U)		if (Legal->getMaxSafeDepDistBytes() != -1U)
MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;		MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;
unsigned WidestRegister =		unsigned WidestRegister =
std::min(TTI.getRegisterBitWidth(true), MaxSafeDepDist);		std::min(TTI.getRegisterBitWidth(true), MaxSafeDepDist);
const DataLayout &DL = TheFunction->getParent()->getDataLayout();		const DataLayout &DL = TheFunction->getParent()->getDataLayout();

SmallVector<RegisterUsage, 8> RUs(VFs.size());		SmallVector<RegisterUsage, 8> RUs(VFs.size());
SmallVector<unsigned, 8> MaxUsages(VFs.size(), 0);		SmallVector<SmallMapVector<unsigned, unsigned, 4>, 8> MaxUsages(VFs.size());

LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");		LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");

// A lambda that gets the register usage for the given type and VF.		// A lambda that gets the register usage for the given type and VF.
auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) {		auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) {
if (Ty->isTokenTy())		if (Ty->isTokenTy())
return 0U;		return 0U;
unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());		unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
(13 lines not shown)	if (Ends.find(I) == Ends.end())
continue;		continue;

// Skip ignored values.		// Skip ignored values.
if (ValuesToIgnore.find(I) != ValuesToIgnore.end())		if (ValuesToIgnore.find(I) != ValuesToIgnore.end())
continue;		continue;

// For each VF find the maximum usage of registers.		// For each VF find the maximum usage of registers.
for (unsigned j = 0, e = VFs.size(); j < e; ++j) {		for (unsigned j = 0, e = VFs.size(); j < e; ++j) {
		// Count the number of live intervals.
		SmallMapVector<unsigned, unsigned, 4> RegUsage;

if (VFs[j] == 1) {		if (VFs[j] == 1) {
MaxUsages[j] = std::max(MaxUsages[j], OpenIntervals.size());		for (auto Inst : OpenIntervals) {
continue;		unsigned ClassID = TTI.getRegisterClassForType(Inst->getType(), false);
		if (RegUsage.find(ClassID) == RegUsage.end())
		RegUsage[ClassID] = 1;
		else
		RegUsage[ClassID] += 1;
}		}
		} else {
collectUniformsAndScalars(VFs[j]);		collectUniformsAndScalars(VFs[j]);
// Count the number of live intervals.
unsigned RegUsage = 0;
for (auto Inst : OpenIntervals) {		for (auto Inst : OpenIntervals) {
// Skip ignored values for VF > 1.		// Skip ignored values for VF > 1.
if (VecValuesToIgnore.find(Inst) != VecValuesToIgnore.end() ||		if (VecValuesToIgnore.find(Inst) != VecValuesToIgnore.end())
isScalarAfterVectorization(Inst, VFs[j]))
continue;		continue;
RegUsage += GetRegUsage(Inst->getType(), VFs[j]);		if (isScalarAfterVectorization(Inst, VFs[j])) {
		unsigned ClassID = TTI.getRegisterClassForType(Inst->getType(), false);
		if (RegUsage.find(ClassID) == RegUsage.end())
		RegUsage[ClassID] = 1;
		else
		RegUsage[ClassID] += 1;
		} else {
		unsigned ClassID = TTI.getRegisterClassForType(Inst->getType(), true);
		if (RegUsage.find(ClassID) == RegUsage.end())
		RegUsage[ClassID] = GetRegUsage(Inst->getType(), VFs[j]);
		else
		RegUsage[ClassID] += GetRegUsage(Inst->getType(), VFs[j]);
		}
		}
		}

		for (auto& pair : RegUsage) {
		if (MaxUsages[j].find(pair.first) != MaxUsages[j].end())
		MaxUsages[j][pair.first] = std::max(MaxUsages[j][pair.first], pair.second);
		else
		MaxUsages[j][pair.first] = pair.second;
}		}
MaxUsages[j] = std::max(MaxUsages[j], RegUsage);
}		}

LLVM_DEBUG(dbgs() << "LV(REG): At #" << i << " Interval # "		LLVM_DEBUG(dbgs() << "LV(REG): At #" << i << " Interval # "
<< OpenIntervals.size() << '\n');		<< OpenIntervals.size() << '\n');

// Add the current instruction to the list of open intervals.		// Add the current instruction to the list of open intervals.
OpenIntervals.insert(I);		OpenIntervals.insert(I);
}		}

for (unsigned i = 0, e = VFs.size(); i < e; ++i) {		for (unsigned i = 0, e = VFs.size(); i < e; ++i) {
unsigned Invariant = 0;		SmallMapVector<unsigned, unsigned, 4> Invariant;
if (VFs[i] == 1)
Invariant = LoopInvariants.size();		for (auto Inst : LoopInvariants) {
else {		unsigned Usage = VFs[i] == 1 ? 1 : GetRegUsage(Inst->getType(), VFs[i]);
for (auto Inst : LoopInvariants)		unsigned ClassID = TTI.getRegisterClassForType(Inst->getType(), VFs[i] > 1);
Invariant += GetRegUsage(Inst->getType(), VFs[i]);		if (Invariant.find(ClassID) == Invariant.end())
		Invariant[ClassID] = Usage;
		else
		Invariant[ClassID] += Usage;
}		}

LLVM_DEBUG(dbgs() << "LV(REG): VF = " << VFs[i] << '\n');		LLVM_DEBUG(dbgs() << "LV(REG): VF = " << VFs[i] << '\n');
LLVM_DEBUG(dbgs() << "LV(REG): Found max usage: " << MaxUsages[i] << '\n');		LLVM_DEBUG(dbgs() << "LV(REG): Found max usage: "
LLVM_DEBUG(dbgs() << "LV(REG): Found invariant usage: " << Invariant		<< MaxUsages[i].size() << " item\n");
<< '\n');		for (const auto& pair : MaxUsages[i]) {
		LLVM_DEBUG(dbgs() << "LV(REG): RegisterClass: "
		<< TTI.getRegisterClassName(pair.first)
		<< ", " << pair.second << " registers \n");
		}
		LLVM_DEBUG(dbgs() << "LV(REG): Found invariant usage: "
		<< Invariant.size() << " item\n");
		for (const auto& pair : Invariant) {
		LLVM_DEBUG(dbgs() << "LV(REG): RegisterClass: "
		<< TTI.getRegisterClassName(pair.first)
		<< ", " << pair.second << " registers \n");
		}

RU.LoopInvariantRegs = Invariant;		RU.LoopInvariantRegs = Invariant;
RU.MaxLocalUsers = MaxUsages[i];		RU.MaxLocalUsers = MaxUsages[i];
RUs[i] = RU;		RUs[i] = RU;
}		}

return RUs;		return RUs;
}		}
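
The change to calculateRegisterUsage above replaces a single running maximum with one maximum per register class. A minimal standalone sketch of that bookkeeping follows. It uses std::map in place of SmallMapVector, and the names RegClass, LiveValue, usageAtPoint, mergeMax and maxUsage are hypothetical and do not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-ins for target register-class IDs.
enum RegClass : unsigned { ScalarRC = 0, VectorRC = 1 };

// One live value: the class it occupies and how many registers it needs.
struct LiveValue {
  unsigned ClassID;
  unsigned NumRegs;
};

using UsageMap = std::map<unsigned, unsigned>;

// Sum the registers needed by all values live at one program point,
// keeping a separate total per register class.
UsageMap usageAtPoint(const std::vector<LiveValue> &Open) {
  UsageMap Usage;
  for (const LiveValue &V : Open) {
    assert(V.ClassID <= VectorRC && "this sketch knows only two classes");
    Usage[V.ClassID] += V.NumRegs; // operator[] value-initializes to 0
  }
  return Usage;
}

// Fold one point's usage into the running per-class maximum.
void mergeMax(UsageMap &Max, const UsageMap &Point) {
  for (const auto &KV : Point) {
    unsigned &Slot = Max[KV.first];
    Slot = std::max(Slot, KV.second);
  }
}

// Peak usage per class over every program point in the loop.
UsageMap maxUsage(const std::vector<std::vector<LiveValue>> &Points) {
  UsageMap Max;
  for (const auto &Open : Points)
    mergeMax(Max, usageAtPoint(Open));
  return Max;
}
```

Because std::map's operator[] value-initializes a missing entry to zero, the sketch needs no find/else branches. Whether the same simplification is wanted in the patch itself is a style choice for the reviewers.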
(2,329 lines not shown)	bool LoopVectorizePass::runImpl(

// Don't attempt if		// Don't attempt if
// 1. the target claims to have no vector registers, and		// 1. the target claims to have no vector registers, and
// 2. interleaving won't help ILP.		// 2. interleaving won't help ILP.
//		//
// The second condition is necessary because, even if the target has no		// The second condition is necessary because, even if the target has no
// vector registers, loop vectorization may still enable scalar		// vector registers, loop vectorization may still enable scalar
// interleaving.		// interleaving.
if (!TTI->getNumberOfRegisters(true) && TTI->getMaxInterleaveFactor(1) < 2)		if (!TTI->getNumberOfRegisters(TTI->getRegisterClassForType(F.getType(), true))
		&& TTI->getMaxInterleaveFactor(1) < 2)
hfinkel: I think that we can just make a separate function for this: TTI->hasVectorRegisters() (and then use that here and in the SLP vectorizer).

wuzish: I think it could be like `TTI->getRegisterClassForType(F.getType(), true)` above.

hfinkel: But above, F.getType() gives back the right scalar type because F is the LSR::Formula. Here F is the function, right? I don't think it makes sense to ask for the register class of the function type.

wuzish: Yes. And the above case would return nullptr, so we need to handle that situation. Here we can leave the type as nullptr, as a default argument value.

hfinkel: I suppose that this makes sense if we assume that `TTI->getRegisterClassForType` is called only on legal types? I'm a bit worried here because, at the IR level, all types are supported and should be legalized into some register class. We should better document the expected behavior here one way or the other.

wuzish: If I am not misunderstanding what you mean, then I think the type should be `isSingleValueType`. We can document it in the comments on the getRegisterClassForType prototype declaration.

hfinkel: I agree, but I think that's insufficient. isSingleValueType is true for vector types, and so if we have an architecture with no vector registers, and we call TTI->getRegisterClassForType on a vector type, it would return the class associated with the scalarized type. I think that what you intend to say is something like: getRegisterClassForType returns the register class associated with the provided type, accounting for type promotion and other type-legalization techniques that the target might apply, however, it specifically does not account for the scalarization or splitting of vector types. Should a vector type require scalarization or splitting into multiple underlying vector registers, that type should be mapped to a register class containing no registers.

In some sense, this seems reasonable, because the interface does not provide any way to figure out how many of a particular register in a register class the type might use, and so we can't return a sensible answer in cases where splitting or scalarization is required. It's a bit unfortunate, however, because the same consideration applies to scalar types too (e.g., i256 probably takes up multiple scalar registers).

What we really should do, I think, is expand this interface to return, perhaps optionally, the number of registers of the provided class. In the implementation, in such a case, we could start by running: std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Ty); and then using the MVT for making the register-class decisions. However, it does not seem possible to make this change while retaining the current behavior for other targets (because we'd change what happens for illegal types), and thus, I recommend that a comment be added along the lines of my suggestion above, and then we also add this: // FIXME: It's not currently possible to determine how many registers are used by the provided type. and we address this aspect later in a different patch.

wuzish: May the number issue be addressed by such code in LoopVectorize.cpp?

```
// A lambda that gets the register usage for the given type and VF.
auto GetRegUsage = [&DL, WidestRegister](Type *Ty, unsigned VF) {
  if (Ty->isTokenTy())
    return 0U;
  unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
  return std::max<unsigned>(1, VF * TypeSize / WidestRegister);
};
```

And the `WidestRegister` is from `TTI.getRegisterBitWidth`, which I think should be related to the type. Yes, a single interface, `TLI->getTypeLegalizationCost`, would be more consistent and easier to maintain.
return false;		return false;

bool Changed = false;		bool Changed = false;

// The vectorizer requires loops to be in simplified form.		// The vectorizer requires loops to be in simplified form.
// Since simplification may add new inner loops, it has to run before the		// Since simplification may add new inner loops, it has to run before the
// legality and profitability checks. This means running the loop vectorizer		// legality and profitability checks. This means running the loop vectorizer
// will simplify all loops, regardless of whether anything end up being		// will simplify all loops, regardless of whether anything end up being
(74 lines not shown)
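
The first hunk of this file computes a candidate interleave count for each register class and keeps the smallest one. The sketch below reproduces only that arithmetic in standalone form. ClassPressure, powerOf2Floor and interleaveCount are hypothetical names, powerOf2Floor is a stand-in for llvm::PowerOf2Floor, and the register counts used in the usage note are assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <map>

// Largest power of two <= X, or 0 when X == 0.
unsigned powerOf2Floor(unsigned X) {
  unsigned P = 0;
  for (unsigned Bit = 1; Bit != 0 && Bit <= X; Bit <<= 1)
    P = Bit;
  return P;
}

struct ClassPressure {
  unsigned NumRegs;       // registers the target offers in this class
  unsigned LoopInvariant; // registers held by loop-invariant values
  unsigned MaxLocalUsers; // peak number of registers live inside the loop
};

// Candidate interleave count per class, then the minimum across classes,
// so the most constrained register class decides.
unsigned interleaveCount(const std::map<unsigned, ClassPressure> &Classes) {
  unsigned IC = UINT_MAX; // an empty map leaves this untouched
  for (const auto &KV : Classes) {
    const ClassPressure &P = KV.second;
    assert(P.MaxLocalUsers >= 1 && "sketch assumes usage is clamped to >= 1");
    assert(P.NumRegs > P.LoopInvariant + 1 && "sketch assumes spare registers");
    unsigned Spare = P.NumRegs - P.LoopInvariant - 1;
    unsigned TmpIC = powerOf2Floor(Spare / std::max(1U, P.MaxLocalUsers - 1));
    IC = std::min(IC, TmpIC);
  }
  return IC;
}
```

With an assumed 32 scalar registers at a peak usage of 2 and an assumed 64 vector registers at a peak usage of 13, the scalar class allows 16 and the vector class allows 4, so the sketch returns 4. The real code then clamps the result against the target's maximum interleave factor, which this sketch leaves out.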

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

(5,197 lines not shown)	bool SLPVectorizerPass::runImpl(Function &F, ScalarEvolution *SE_,
DL = &F.getParent()->getDataLayout();		DL = &F.getParent()->getDataLayout();

Stores.clear();		Stores.clear();
GEPs.clear();		GEPs.clear();
bool Changed = false;		bool Changed = false;

// If the target claims to have no vector registers don't attempt		// If the target claims to have no vector registers don't attempt
// vectorization.		// vectorization.
if (!TTI->getNumberOfRegisters(true))		if (!TTI->getNumberOfRegisters(TTI->getRegisterClassForType(F.getType(), true)))
return false;		return false;

// Don't vectorize when the attribute NoImplicitFloat is used.		// Don't vectorize when the attribute NoImplicitFloat is used.
if (F.hasFnAttribute(Attribute::NoImplicitFloat))		if (F.hasFnAttribute(Attribute::NoImplicitFloat))
return false;		return false;

LLVM_DEBUG(dbgs() << "SLP: Analyzing blocks in " << F.getName() << ".\n");		LLVM_DEBUG(dbgs() << "SLP: Analyzing blocks in " << F.getName() << ".\n");

(1,859 lines not shown)
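
Both vectorizers now ask for the register class of F.getType(), where F is the function, which the review thread flags as questionable. A rough sketch of the hasVectorRegisters() helper proposed there is shown below. MockTTI is a hypothetical stand-in and not the real TargetTransformInfo interface, and its getRegisterClassForType deliberately takes no IR type.

```cpp
#include <cassert>

// Hypothetical, minimal stand-in for the TTI queries discussed in the review.
struct MockTTI {
  unsigned NumScalarRegs;
  unsigned NumVectorRegs;
  unsigned MaxInterleaveFactor;

  // Class IDs in this sketch: 0 = scalar, 1 = vector.
  unsigned getRegisterClassForType(bool Vector) const { return Vector ? 1 : 0; }

  unsigned getNumberOfRegisters(unsigned ClassID) const {
    assert(ClassID <= 1 && "unknown register class in this sketch");
    return ClassID == 1 ? NumVectorRegs : NumScalarRegs;
  }

  // The helper suggested in the review: callers need no IR type at all.
  bool hasVectorRegisters() const {
    return getNumberOfRegisters(getRegisterClassForType(/*Vector=*/true)) != 0;
  }
};

// Loop vectorizer bail-out: no vector registers and no benefit from
// scalar interleaving.
bool shouldSkipLoopVectorize(const MockTTI &TTI) {
  return !TTI.hasVectorRegisters() && TTI.MaxInterleaveFactor < 2;
}

// SLP vectorizer bail-out: no vector registers.
bool shouldSkipSLP(const MockTTI &TTI) { return !TTI.hasVectorRegisters(); }
```

Routing both bail-out checks through one helper keeps the function type out of the query, which is the concern raised in the thread.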

llvm/test/Transforms/LoopVectorize/PowerPC/reg-usage.ll

This file was added.

				; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=powerpc64-unknown-linux -S -mcpu=pwr8 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR8
				; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=powerpc64le-unknown-linux -S -mcpu=pwr9 2>&1 | FileCheck %s --check-prefixes=CHECK,CHECK-PWR9

				@a = global [1024 x i8] zeroinitializer, align 16
				@b = global [1024 x i8] zeroinitializer, align 16

				define i32 @foo() {
				;
				; CHECK-LABEL: foo

				; CHECK: LV(REG): VF = 8
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 7 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK: LV(REG): VF = 16
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 13 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item

				; CHECK-PWR8: LV(REG): VF = 16
				; CHECK-PWR8-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-PWR8-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-PWR8-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 13 registers
				; CHECK-PWR8-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK-PWR8: Setting best plan to VF=16, UF=4

				; CHECK-PWR9: LV(REG): VF = 8
				; CHECK-PWR9-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-PWR9-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 7 registers
				; CHECK-PWR9-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK-PWR9: Setting best plan to VF=8, UF=8


				entry:
				br label %for.body

				for.cond.cleanup:
				%add.lcssa = phi i32 [ %add, %for.body ]
				ret i32 %add.lcssa

				for.body:
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%s.015 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds [1024 x i8], [1024 x i8]* @a, i64 0, i64 %indvars.iv
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				%arrayidx2 = getelementptr inbounds [1024 x i8], [1024 x i8]* @b, i64 0, i64 %indvars.iv
				%1 = load i8, i8* %arrayidx2, align 1
				%conv3 = zext i8 %1 to i32
				%sub = sub nsw i32 %conv, %conv3
				%ispos = icmp sgt i32 %sub, -1
				%neg = sub nsw i32 0, %sub
				%2 = select i1 %ispos, i32 %sub, i32 %neg
				%add = add nsw i32 %2, %s.015
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}

				define i32 @goo() {
				; For indvars.iv used in a computating chain only feeding into getelementptr or cmp,
				; it will not have vector version and the vector register usage will not exceed the
				; available vector register number.
				; CHECK-LABEL: goo
				; CHECK: LV(REG): VF = 8
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 7 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK: LV(REG): VF = 16
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 13 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK: LV(REG): VF = 16
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 13 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item

				; CHECK: Setting best plan to VF=16, UF=4

				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				%add.lcssa = phi i32 [ %add, %for.body ]
				ret i32 %add.lcssa

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%s.015 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%tmp1 = add nsw i64 %indvars.iv, 3
				%arrayidx = getelementptr inbounds [1024 x i8], [1024 x i8]* @a, i64 0, i64 %tmp1
				%tmp = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %tmp to i32
				%tmp2 = add nsw i64 %indvars.iv, 2
				%arrayidx2 = getelementptr inbounds [1024 x i8], [1024 x i8]* @b, i64 0, i64 %tmp2
				%tmp3 = load i8, i8* %arrayidx2, align 1
				%conv3 = zext i8 %tmp3 to i32
				%sub = sub nsw i32 %conv, %conv3
				%ispos = icmp sgt i32 %sub, -1
				%neg = sub nsw i32 0, %sub
				%tmp4 = select i1 %ispos, i32 %sub, i32 %neg
				%add = add nsw i32 %tmp4, %s.015
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}

				define i64 @bar(i64* nocapture %a) {
				; CHECK-LABEL: bar
				; CHECK: LV(REG): VF = 2
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 3 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 1 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item

				; CHECK: Setting best plan to VF=2, UF=12

				entry:
				br label %for.body

				for.cond.cleanup:
				%add2.lcssa = phi i64 [ %add2, %for.body ]
				ret i64 %add2.lcssa

				for.body:
				%i.012 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
				%s.011 = phi i64 [ 0, %entry ], [ %add2, %for.body ]
				%arrayidx = getelementptr inbounds i64, i64* %a, i64 %i.012
				%0 = load i64, i64* %arrayidx, align 8
				%add = add nsw i64 %0, %i.012
				store i64 %add, i64* %arrayidx, align 8
				%add2 = add nsw i64 %add, %s.011
				%inc = add nuw nsw i64 %i.012, 1
				%exitcond = icmp eq i64 %inc, 1024
				br i1 %exitcond, label %for.cond.cleanup, label %for.body
				}

				@d = external global [0 x i64], align 8
				@e = external global [0 x i32], align 4
				@c = external global [0 x i32], align 4

				define void @hoo(i32 %n) {
				; CHECK-LABEL: hoo
				; CHECK: LV(REG): VF = 4
				; CHECK-NEXT: LV(REG): Found max usage: 2 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::VSXRC, 2 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK: LV(REG): VF = 1
				; CHECK-NEXT: LV(REG): Found max usage: 1 item
				; CHECK-NEXT: LV(REG): RegisterClass: PPC::GPRRC, 2 registers
				; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
				; CHECK: Setting best plan to VF=1, UF=12

				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [0 x i64], [0 x i64]* @d, i64 0, i64 %indvars.iv
				%tmp = load i64, i64* %arrayidx, align 8
				%arrayidx1 = getelementptr inbounds [0 x i32], [0 x i32]* @e, i64 0, i64 %tmp
				%tmp1 = load i32, i32* %arrayidx1, align 4
				%arrayidx3 = getelementptr inbounds [0 x i32], [0 x i32]* @c, i64 0, i64 %indvars.iv
				store i32 %tmp1, i32* %arrayidx3, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 10000
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

llvm/test/Transforms/LoopVectorize/X86/reg-usage-debug.ll

(16 lines not shown)
; r += a[i];		; r += a[i];
; return r;		; return r;
; }		; }

target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"		target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"		target triple = "x86_64-unknown-linux-gnu"

; CHECK: LV: Checking a loop in "test_g"		; CHECK: LV: Checking a loop in "test_g"
; CHECK: LV(REG): Found max usage: 2		; CHECK: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 2 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 2 registers

define i32 @test_g(i32* nocapture readonly %a, i32 %n) local_unnamed_addr !dbg !6 {		define i32 @test_g(i32* nocapture readonly %a, i32 %n) local_unnamed_addr !dbg !6 {
entry:		entry:
tail call void @llvm.dbg.value(metadata i32* %a, i64 0, metadata !12, metadata !16), !dbg !17		tail call void @llvm.dbg.value(metadata i32* %a, i64 0, metadata !12, metadata !16), !dbg !17
tail call void @llvm.dbg.value(metadata i32 %n, i64 0, metadata !13, metadata !16), !dbg !18		tail call void @llvm.dbg.value(metadata i32 %n, i64 0, metadata !13, metadata !16), !dbg !18
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !15, metadata !16), !dbg !19		tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !15, metadata !16), !dbg !19
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !14, metadata !16), !dbg !20		tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !14, metadata !16), !dbg !20
tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !15, metadata !16), !dbg !19		tail call void @llvm.dbg.value(metadata i32 0, i64 0, metadata !15, metadata !16), !dbg !19
(21 lines not shown)	for.end.loopexit: ; preds = %for.body
br label %for.end, !dbg !38		br label %for.end, !dbg !38

for.end: ; preds = %for.end.loopexit, %entry		for.end: ; preds = %for.end.loopexit, %entry
%r.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]		%r.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.end.loopexit ]
ret i32 %r.0.lcssa, !dbg !38		ret i32 %r.0.lcssa, !dbg !38
}		}

; CHECK: LV: Checking a loop in "test"		; CHECK: LV: Checking a loop in "test"
; CHECK: LV(REG): Found max usage: 2		; CHECK: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 2 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 1 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 2 registers

define i32 @test(i32* nocapture readonly %a, i32 %n) local_unnamed_addr {		define i32 @test(i32* nocapture readonly %a, i32 %n) local_unnamed_addr {
entry:		entry:
%cmp6 = icmp eq i32 %n, 0		%cmp6 = icmp eq i32 %n, 0
br i1 %cmp6, label %for.end, label %for.body.preheader		br i1 %cmp6, label %for.end, label %for.body.preheader

for.body.preheader: ; preds = %entry		for.body.preheader: ; preds = %entry
%wide.trip.count = zext i32 %n to i64		%wide.trip.count = zext i32 %n to i64
(63 lines not shown)
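
The updated CHECK lines above expect one debug line per register class, using names such as Generic::ScalarRC and Generic::VectorRC. The following sketch shows one way such a two-class default could map a vector/scalar flag to a class ID and a printable name. The enum, function names and ID values are assumptions for illustration, since the target-independent implementation is not shown in this part of the diff.

```cpp
#include <cassert>
#include <string>

// Hypothetical class IDs for a target that does not define its own.
enum GenericRegClass : unsigned { ScalarRC = 0, VectorRC = 1 };

// Pick a class from the vector/scalar flag alone.
unsigned registerClassFor(bool IsVector) {
  return IsVector ? VectorRC : ScalarRC;
}

// Printable name for a class ID, matching the names in the CHECK lines.
std::string registerClassName(unsigned ClassID) {
  assert(ClassID <= VectorRC && "this sketch knows only two classes");
  return ClassID == VectorRC ? "Generic::VectorRC" : "Generic::ScalarRC";
}

// Build one line in the shape of the per-class debug output.
std::string debugLine(unsigned ClassID, unsigned NumRegs) {
  return "LV(REG): RegisterClass: " + registerClassName(ClassID) + ", " +
         std::to_string(NumRegs) + " registers";
}
```

A target such as PowerPC would return its own IDs and names (the PowerPC test expects PPC::GPRRC and PPC::VSXRC), which is why the tests check the names and not numeric IDs.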

llvm/test/Transforms/LoopVectorize/X86/reg-usage.ll

; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=x86_64-unknown-linux -S 2>&1 | FileCheck %s		; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=x86_64-unknown-linux -S 2>&1 | FileCheck %s
; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=x86_64-unknown-linux -mattr=+avx512f -S 2>&1 | FileCheck %s --check-prefix=AVX512F		; RUN: opt < %s -debug-only=loop-vectorize -loop-vectorize -vectorizer-maximize-bandwidth -O2 -mtriple=x86_64-unknown-linux -mattr=+avx512f -S 2>&1 | FileCheck %s --check-prefix=AVX512F
; REQUIRES: asserts		; REQUIRES: asserts

@a = global [1024 x i8] zeroinitializer, align 16		@a = global [1024 x i8] zeroinitializer, align 16
@b = global [1024 x i8] zeroinitializer, align 16		@b = global [1024 x i8] zeroinitializer, align 16

define i32 @foo() {		define i32 @foo() {
; This function has a loop of SAD pattern. Here we check when VF = 16 the		; This function has a loop of SAD pattern. Here we check when VF = 16 the
; register usage doesn't exceed 16.		; register usage doesn't exceed 16.
;		;
; CHECK-LABEL: foo		; CHECK-LABEL: foo
; CHECK: LV(REG): VF = 8		; CHECK: LV(REG): VF = 8
; CHECK-NEXT: LV(REG): Found max usage: 7		; CHECK-NEXT: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 7 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
; CHECK: LV(REG): VF = 16		; CHECK: LV(REG): VF = 16
; CHECK-NEXT: LV(REG): Found max usage: 13		; CHECK-NEXT: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 13 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 0 item

entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup:		for.cond.cleanup:
%add.lcssa = phi i32 [ %add, %for.body ]		%add.lcssa = phi i32 [ %add, %for.body ]
ret i32 %add.lcssa		ret i32 %add.lcssa

(17 lines not shown)
}		}

define i32 @goo() {		define i32 @goo() {
; For indvars.iv used in a computating chain only feeding into getelementptr or cmp,		; For indvars.iv used in a computating chain only feeding into getelementptr or cmp,
; it will not have vector version and the vector register usage will not exceed the		; it will not have vector version and the vector register usage will not exceed the
; available vector register number.		; available vector register number.
; CHECK-LABEL: goo		; CHECK-LABEL: goo
; CHECK: LV(REG): VF = 8		; CHECK: LV(REG): VF = 8
; CHECK-NEXT: LV(REG): Found max usage: 7		; CHECK-NEXT: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 7 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
; CHECK: LV(REG): VF = 16		; CHECK: LV(REG): VF = 16
; CHECK-NEXT: LV(REG): Found max usage: 13		; CHECK-NEXT: LV(REG): Found max usage: 2 item
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 13 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 0 item
entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup: ; preds = %for.body		for.cond.cleanup: ; preds = %for.body
%add.lcssa = phi i32 [ %add, %for.body ]		%add.lcssa = phi i32 [ %add, %for.body ]
ret i32 %add.lcssa		ret i32 %add.lcssa

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
(15 lines not shown)	for.body: ; preds = %for.body, %entry
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1		%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 1024		%exitcond = icmp eq i64 %indvars.iv.next, 1024
br i1 %exitcond, label %for.cond.cleanup, label %for.body		br i1 %exitcond, label %for.cond.cleanup, label %for.body
}		}

define i64 @bar(i64* nocapture %a) {		define i64 @bar(i64* nocapture %a) {
; CHECK-LABEL: bar		; CHECK-LABEL: bar
; CHECK: LV(REG): VF = 2		; CHECK: LV(REG): VF = 2
; CHECK: LV(REG): Found max usage: 3		; CHECK-NEXT: LV(REG): Found max usage: 2 item
;		; CHECK-NEXT: LV(REG): RegisterClass: Generic::VectorRC, 3 registers
		; CHECK-NEXT: LV(REG): RegisterClass: Generic::ScalarRC, 1 registers
		; CHECK-NEXT: LV(REG): Found invariant usage: 0 item

entry:		entry:
br label %for.body		br label %for.body

for.cond.cleanup:		for.cond.cleanup:
%add2.lcssa = phi i64 [ %add2, %for.body ]		%add2.lcssa = phi i64 [ %add2, %for.body ]
ret i64 %add2.lcssa		ret i64 %add2.lcssa

for.body:		for.body:
(14 lines not shown)
@c = external global [0 x i32], align 4		@c = external global [0 x i32], align 4

define void @hoo(i32 %n) {		define void @hoo(i32 %n) {
; For c[i] = e[d[i]] in the loop, e[d[i]] is not consecutive but its index %tmp can		; For c[i] = e[d[i]] in the loop, e[d[i]] is not consecutive but its index %tmp can
; be gathered into a vector. For VF == 16, the vector version of %tmp will be <16 x i64>		; be gathered into a vector. For VF == 16, the vector version of %tmp will be <16 x i64>
; so the max usage of AVX512 vector register will be 2.		; so the max usage of AVX512 vector register will be 2.
; AVX512F-LABEL: bar		; AVX512F-LABEL: bar
; AVX512F: LV(REG): VF = 16		; AVX512F: LV(REG): VF = 16
; AVX512F: LV(REG): Found max usage: 2		; AVX512F-CHECK: LV(REG): Found max usage: 2 item
;		; AVX512F-CHECK: LV(REG): RegisterClass: Generic::ScalarRC, 2 registers
		; AVX512F-CHECK: LV(REG): RegisterClass: Generic::VectorRC, 2 registers
		; AVX512F-CHECK: LV(REG): Found invariant usage: 0 item

entry:		entry:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]		%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
%arrayidx = getelementptr inbounds [0 x i64], [0 x i64]* @d, i64 0, i64 %indvars.iv		%arrayidx = getelementptr inbounds [0 x i64], [0 x i64]* @d, i64 0, i64 %indvars.iv
%tmp = load i64, i64* %arrayidx, align 8		%tmp = load i64, i64* %arrayidx, align 8
%arrayidx1 = getelementptr inbounds [0 x i32], [0 x i32]* @e, i64 0, i64 %tmp		%arrayidx1 = getelementptr inbounds [0 x i32], [0 x i32]* @e, i64 0, i64 %tmp
(10 lines not shown)
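
The comment in hoo states that a <16 x i64> value needs two AVX512 registers. That count follows from the GetRegUsage lambda in LoopVectorize.cpp, whose arithmetic is restated below as a free function. regUsage is a hypothetical name, and the register widths are passed in directly instead of being queried from TTI.

```cpp
#include <algorithm>
#include <cassert>

// Registers needed to hold VF elements of TypeSizeBits bits each, given the
// widest register available. Never less than one register.
unsigned regUsage(unsigned TypeSizeBits, unsigned VF,
                  unsigned WidestRegisterBits) {
  assert(WidestRegisterBits > 0 && "register width must be non-zero");
  return std::max<unsigned>(1, VF * TypeSizeBits / WidestRegisterBits);
}
```

For example, regUsage(64, 16, 512) is 2, which matches the <16 x i64> case, while regUsage(8, 8, 128) is 1 because a partially filled register still counts as one. As the review thread notes, this count comes from the vectorizer and not from getRegisterClassForType, which only names the class.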


Diff 220278