Adding all x86 Processor families to initialize several uArch properties (based on the family)
this shows how gather cost can be initialized based on the proc. family
Details
Diff Detail
Event Timeline
I really don't like this approach - we've been getting closer and closer to killing off X86ProcFamilyEnum for years now and move to a mixture of feature bits and scheduler cost driven decisions that works in a more general manner.
Assuming you intend to use getGatherOverhead() inside X86TTIImpl::getGatherScatterOpCost, might not it be better to just provide a Fast/Slow Gather feature bit, or (even better) finally start trying to use the scheduler model data within X86TTIImpl ?
I agree with you. We also considered using scheduling model (or any other target micro-architecture data base) from TTI. It requires detailed design and enters to our long term plans.
We already started feeding micro-architecture details to LLVM, step-by-step, and it will take a time to complete.
The cost, that is kept in X86TargetTransformInfo.cpp should also be moved to .td file and it will require a special interface for communication between IR and MachineInstructions.
Using processorFamily as a property allows tuning compiler for a processor meanwhile, otherwise we just tie all patches like "gather-enabling" in one chain, waiting until all .td files are entered.
At the very least: This is adding a ton of unused and unneeded details. If we end up needing to dispatch on processor family, then maybe we can add (some of) these.
Otherwise, I don't think it's reasonable to add all these enum values.
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
273 | GatherOverhead is unsigned. Why set it to INT_MAX? | |
274 | This could just be encoded in a bool ShouldUseGather/ShouldAvoidGather or similar. I'm ok with getting a bool now and changing it to int/unsigned/whatever in the future, but only if we need to. | |
lib/Target/X86/X86Subtarget.h | ||
528 | What's a "suitable property", btw? | |
531 | Stray change. |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
274 | Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. I want to say that "GSOverhead = 2" we have today inside X86TTI. The problem, that TTI can't distinguish between HSW and SKL. They both have Gather, but the cost is different. I assume that TTI should provide a cost for Gather, for *all* AVX2 and AVX-512 processors, but again, it is not one value for all. Vectorizer compares this cost with scalar and interleave variants and chooses the best solution. |
If we were to go the route of adding a feature bit for every CPU. I wonder if we should just convert the CPU name into an enum in the target independent subtarget classes. Going through a feature bit to create an enum is just making the number of feature bits larger. Feature bits are stored in a std::bitset that last I checked was 160 bits and is stored in some of the tablegen generated data structures. Due to limitations in various place this max size of 160 is determined by the target with the most feature bits. I believe it got as high as it is because ARM or AArch64 has added a feature bit per CPU similar to this patch.
Another option may be to allow an enum value to be passed as a separate input to ProcModel that would get stored in a separate field in the target independent subtarget. This would also avoid the feature bit pressure.
I still don't get why this can't be achieved with a single 'FeatureFastGather' feature bit, just used by Skylake CPU target - then X86TTIImpl::getGatherScatterOpCost can just call ST->hasFastGather() to decide whether to use gathers or not.
Or if you want to keep the overhead as a cost multiplier you can do something like:
unsigned getGatherOverhead() const { return HasFastGather ? 2 : UINT_MAX; }
These would be much easier to remove when you've had time to sort out the scheduler models.
In order to set different numbers to different mach. Each processor that supports Gather should be able to provide a number, not only /NoGather/SlowGather/FastGather.
So how many numbers are we talking about? Is it just Skylake, SKX and Cannonlake? (KNL?) I guess we don't want to handle weaker AVX2 CPUs so we don't need a feature bit for those (default getGatherOverhead() to UINT_MAX)
Whatever happens, almost all those new processor enums are not required.
removing unnecessary processor
fixing uint issue
lib/Target/X86/X86Subtarget.h | ||
---|---|---|
528 | it depends in usage of the isAtom\isSLM. for example in X86TargetTransformInfo.cpp::getMaxInterleaveFactor the property will be getMaxInterleaveFactor |
Before you continue with this, is there any chance that you can create a phab for the follow on patches that are dependent on this please? To get a better idea of what you are needing this for.
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
269 | You got to all that trouble of adding the other intel cpus and then just need IntelSkylake and IntelSKX? |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
270 | Mohamed, you have to add ScatterOverhead for SKX. |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
269 | This is only one property. even HSW has Gather and later will add it's overhead (currently we dont recommend generating gathers for HSW). There is other properties that will need to return different values based on the family. so my patch just getting the infrastructure ready for this. I will upload my patch soon, it uses the "getGatherOverhead" property from the subtarget. |
adding ScatterOverhead and updating the max overhead.
max overhead can't be MAX_INT or MAX_UINT because it will overflow on the CM calculation and causing a wrong decision.
lib/Target/X86/X86Subtarget.h | ||
---|---|---|
62 | You can use Others instead of Generic. |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
267 | Why can't this just be done by comparing the CPU string? What did we get by creating an enum that is named the same as the CPU string? |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
267 | X86ProcFamily enum was already exist and we have just added more values. In general I think it's better to parse the CPU strings in one place and after that to start using the enum values instead of comparing the strings all the time. |
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
12 | Gather is available since Haswell (AVX2 set). So technically, we can generate Gathers on all AVX2 processors. GatherOverhead = 2; if (hasAVX512) // SKX and KNL fail here ScatterOverhead = 2; | |
15 | Please remove. | |
31 | Please remove the "else", the values are already initialized. |
LGTM + a minor comment fix
lib/Target/X86/X86Subtarget.cpp | ||
---|---|---|
16 | *we are already using it to calculate GS cost -> |
LGTM - I still think this patch only needs to add IntelSkylake but it's a lot better than it was.
You don't need Others here, I think.