Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 40924 Build 41066: arc lint + arc unit
Event Timeline
This patch is not functional yet, but I have added it here to show in the test what I intend to do.
I am adding a pass that adds the "vector-function-abi-variant" attribute from the mappings that are stored in the TargetLibraryInfo (TLI).
This patch is based on the work published in https://reviews.llvm.org/D69976 and https://reviews.llvm.org/D70089
There is not need to review the code for now, all I need to know is whether people agree on this approach.
@fpetrogalli This means that all passes that need to care about vector variants of functions need to add a dependency on this pass, right? This seems quite similar to how TBAA info is loaded from IR metadata using an analysis pass, after which the information can be queried using the AliasAnalysis interfaces. This is doing a similar thing, but then for querying vector-variants of functions, so +1 for the approach!
We should not forget to document the new mechanism (the VFABI, metadata attributes and the SVFS mechanism), perhaps as a separate document. The metadata format should at least be described in the LangRef.
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
20 | Don't forget about the old pass manager :) |
IIUC, this a transformation pass (it does modify the module, e.g. by appendToCompilerUsed(*M, {Global});). So you probably want to register it with one of the optimisation pipelines. I _believe_ that that's how you do it:
For legacy PM:
- Use INITIALIZE_PASS_BEGIN (https://github.com/llvm/llvm-project/blob/848007cfbc7509543c5b8604ae063bb6c8ffa0a9/llvm/include/llvm/PassSupport.h#L33) that will define initializeInjectTLIMappingsPass for you. Then use it where you want to add it - you can check initializeSLPVectorizerPass for reference (https://github.com/llvm/llvm-project/blob/848007cfbc7509543c5b8604ae063bb6c8ffa0a9/llvm/lib/Transforms/Vectorize/Vectorize.cpp#L28).
- You can also use llvm::RegisterStandardPasses (http://llvm.org/docs/WritingAnLLVMPass.html#basic-code-required) if you want to add it in one of the avaiable extension points (so that it runs automagically with e.g. -O1)
For the new PM, you probably want to add your pass to an existing FunctionPassPamanager, e.g.
Once that's done, your Pass will be run _automagically_ together with other passes in the pipeline. This is just a quick brain-dump so please ping me if it's unclear.
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
20 | Legacy :) Also, I am not a fan of appending Pass to pass classes (it's clear what this class inherits from either way). Also, if you use INITIALIZE_PASS_BEGIN, Pass is going to be prepended to the _Initialize_ method anyway (so you will have initializeInjestTLIMappingPassPass): https://github.com/llvm/llvm-project/blob/848007cfbc7509543c5b8604ae063bb6c8ffa0a9/llvm/include/llvm/PassSupport.h#L62 | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
105 | [nit] 'things' -> 'thinks' | |
128 | [nit] What follows is the definition of InjectTLIMappingsPass::run though. | |
llvm/test/Transforms/Util/add-TLI-mappings.ll | ||
2 | For this to work you need to register a command line option. Why not use print-after and print-before instead? Or maybe we do need a command line option? |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
---|---|---|
67 | When you think the patch is ready to be reviewed, can you address the comments that I added in D69976 before you removed it? | |
llvm/test/Transforms/Util/add-TLI-mappings.ll | ||
18 | I don't think you need to add a loop here to prove the IR contains the vectorized versions of the IR, a call to @sin should be sufficient. |
This is an update in which I have tried to add the legacy pass manager.
It is not working yet, but I think I am on a good path!
Thank you @andwar for all the pointers.
For reference, this is the linking error I am getting:
llvm-project/llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h:31: error: undefined reference to 'vtable for llvm::InjectTLIMappingsLegacy' /usr/bin/ld: the vtable symbol may be undefined because the class is missing its key function
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
20 | I am working on it! :) |
New pass manager command line option missing.
FWIW, you can checkout D69930 for an example of adding a pass with more hookup into the system.
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
23 | I doubt you need the constructor or the name function. | |
llvm/include/llvm/Transforms/Utils/ModuleUtils.h | ||
115 | Unrelated? | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
2 | TODO, also below | |
20 | Is there precedent for not having it in all lower-case letters? | |
75 | Wasn't that checked below but with a different call to the module? | |
81 | Where is the VarArgs restrictions checked? If it is implicit, please add an assertion that the original callee is not vararg. | |
93 | Why do you need to query Global here? Global is VectorF isn't it? | |
119 | For brevity and without the need to assign: | |
134 | Can we make 16 a return value of a (maybe static) function in TLI? Style, above: if (!...count(...)) | |
153 | I doubt you need the lllvm. Above, auto CI*please :) | |
llvm/test/Transforms/Util/add-TLI-mappings.ll | ||
2 | command line options are good for various things, please make sure they work. (new and old PM) |
Thank you all for the review. The pass is now fuctional, working for
the libraries supported by the TLI: SVML, Accelerate and MASSV.
I haven't added it to any of the optimization pipelines at the moment,
under the assumption that once this pass is listed in the required
passes of the loop vectorizer, it will be automatically loaded after
the TLI wrapper pass.
llvm/include/llvm/Transforms/Utils/ModuleUtils.h | ||
---|---|---|
115 | Yes, but still useful. I have also removed a group comment. I'd rather not create a separate patch for this? | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
20 | The pass debug option is all lower case now (and consequently, the command line too). | |
75 | This avoids running the creation of the function if the function already exists. Otherwise I expect Function::Create to have some problems. | |
93 | Ah right, good catch. Fixed. | |
153 |
You mean I don't need to wrap the pass in the llvm namespace? I have done it in the header file too. Is that wrong? |
llvm/include/llvm/Transforms/Utils/ModuleUtils.h | ||
---|---|---|
115 | create an RFC one and commit it w/o review if it is trivial. | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
75 | I mean you checked that already: Function *VariantF = M->getFunction(TLIName); if (!VariantF) addVariantDeclaration(CI, VF, TLIName); but once with getFunction and once with getNamedValue. I think you don't need two checks and you should pick consistent the one lookup call you want. | |
93 | Forgot to update or not fixed? | |
153 | you don't need it here because you opened the namespace. you should not open namespaces in headers that is why you wrap it in the namespace there. Plus, you don't need the explicit qualification here because these are not top level declarations as you have in the header. |
llvm/include/llvm/Transforms/Utils/ModuleUtils.h | ||
---|---|---|
115 | Done in https://reviews.llvm.org/D70218 |
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
23 | What about this one? You don't need the name function, the one provided by the base class is fine: https://github.com/llvm/llvm-project/blob/848007cfbc7509543c5b8604ae063bb6c8ffa0a9/llvm/include/llvm/IR/PassManager.h#L373 As for the constructor, the auto-generated one should be sufficient here. Is it not? | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
165 | [nit] `Legacy PM implementation (the pass manager is legacy, not the pass :) ). |
llvm/include/llvm/Transforms/Utils/InjectTLIMappings.h | ||
---|---|---|
23 | Contructor and name function removed. That was too much copy and paste. | |
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
75 | I was using Global mostly to be able to call appendToCompilerUsed, which works on globals. Then, I realized that Function inherits from Global, so there is no need to use Global at all. I also have replaced the check on having an empty body with an assertion, just in case someone modifies the function in the middle and populates the body of the function before invoking appendToCompilerUsed. | |
93 | Not it is fixed, no more Global. | |
153 | Facepalm myself at the using namespace llvm; on top of this cpp file. Thanks for the explanation. |
Some minor comments below, otherwise LGTM.
@sdesmalen any more comments?
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
---|---|---|
121 | The call is not free but invariant: | |
155 | Remove Changed or doe something with it. | |
171 | You can preserve more here, e.g. all CFG analysis. | |
183 | I think one of the false could be a true, unclear if that will make a difference though. |
Update according to last round of review from @jdoerfert.
Thank you.
llvm/lib/Transforms/Utils/InjectTLIMappings.cpp | ||
---|---|---|
183 | It is not clear what the cfg parameter is for. I'll leave it false for now. If we need to revise it, we will change it later. |
@fpetrogalli I have a high level question - why does this pass require that the function should be from one of the vector libraries (Mass, SMVL etc)? I'd like to piggy-back on the vector-function-abi-variant attribute to vectorize a call to a scalar function, once the front-end adds this attribute to the callsite. So, if we have the following code in IR, with a scalar call to a function trivially.vectorizable.func with the vector-function-abi-variant attribute, we will generate the vector function declarations and then LoopVectorizer/SLPVectorizer would do its thing.
Test IR:
; ModuleID = 'chk.ll' source_filename = "chk.ll" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define dso_local double @test(float* %Arr) { entry: br label %for.cond for.cond: %Sum.0 = phi double [ 0.000000e+00, %entry ], [ %add, %for.inc ] %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp slt i32 %i.0, 128 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: br label %for.end for.body: %idxprom = sext i32 %i.0 to i64 %arrayidx = getelementptr inbounds float, float* %Arr, i64 %idxprom %0 = load float, float* %arrayidx, align 4 %conv = fpext float %0 to double %1 = call fast double @trivially.vectorizable.func(double %conv) #2 <-- CALL OF INTEREST %add = fadd fast double %Sum.0, %1 br label %for.inc for.inc: %inc = add nsw i32 %i.0, 1 br label %for.cond for.end: ret double %Sum.0 } declare double @trivially.vectorizable.func(double) #1 attributes #2 = { "vector-function-abi-variant"="_ZGV_LLVM_N2v_trivially.vectorizable.func(trivially.vectorizable.func.v2)" } attributes #1 = { nounwind readnone speculatable willreturn }
The way I'm thinking - this pass will add the declaration for the vector function in IR declare <2 x double> @trivially.vectorizable.func.v2(<2 x double>).
I've confirmed that once this is done, LoopVectorizer will vectorize that call, i.e. convert the scalar call to the vector version.
Is there any reason to avoid updating this pass with such a functionality? Basically, I'm trying to use something similar to simd pragma in clang (https://clang.llvm.org/docs/AttributeReference.html#id211), where we can specify any function as vectorizable.
Do we already have such support for front-ends to specify any scalar function as 'vectorizable' in IR? This is the only existing attribute that I see which can be reused for this purpose.
Also, to clarify, this vectorized call is finally lowered before passing to the backend/linker etc. So, you can think of this as I have a handwritten nice vectorized form for "trivially.vectorizable.call", which will be inlined before passing to codegen (so these declarations needn't be kept around after the inlining). The main reason for such a usecase is if we want the "high level function call" remaining in IR form until some late pass, which is just before codegen. It is not to bypass some vectorizer code generation limitation. We have such usecases internally.
Hi @anna ,
this is exactly what the attribute is for. This pass works only with the library recognized by the TLI as a shortcut implementation to avoid doing the codegeneration of the attribute for the library functions in the frontend.
As things are, if you run opt with -O2 on your IR, the code should be vectorized with a call to the vector version, if not for the fact that the IR is missing the declaration of the actual vector function. In fact, the vector-function-abi-variant attribute requires that all the mappings listed in the attribute are resolved by some declaration/definition in the IR (notice that unused _declarations_ are kept in the IR and not deleted because such declarations are passed as parameter to the IR intrinsics`@llvm.compiler.used`). You can find more infos on this attribute here: https://llvm.org/docs/LangRef.html#call-site-attributes
Test IR:
; ModuleID = 'chk.ll' source_filename = "chk.ll" target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" define dso_local double @test(float* %Arr) { entry: br label %for.cond for.cond: %Sum.0 = phi double [ 0.000000e+00, %entry ], [ %add, %for.inc ] %i.0 = phi i32 [ 0, %entry ], [ %inc, %for.inc ] %cmp = icmp slt i32 %i.0, 128 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: br label %for.end for.body: %idxprom = sext i32 %i.0 to i64 %arrayidx = getelementptr inbounds float, float* %Arr, i64 %idxprom %0 = load float, float* %arrayidx, align 4 %conv = fpext float %0 to double %1 = call fast double @trivially.vectorizable.func(double %conv) #2 <-- CALL OF INTEREST %add = fadd fast double %Sum.0, %1 br label %for.inc for.inc: %inc = add nsw i32 %i.0, 1 br label %for.cond for.end: ret double %Sum.0 } declare double @trivially.vectorizable.func(double) #1 attributes #2 = { "vector-function-abi-variant"="_ZGV_LLVM_N2v_trivially.vectorizable.func(trivially.vectorizable.func.v2)" } attributes #1 = { nounwind readnone speculatable willreturn }The way I'm thinking - this pass will add the declaration for the vector function in IR declare <2 x double> @trivially.vectorizable.func.v2(<2 x double>).
I've confirmed that once this is done, LoopVectorizer will vectorize that call, i.e. convert the scalar call to the vector version.
Yes, this is correct. Your code will vectorize as it sees the vector declaration. Just make sure to mark it with @llvm.compiler.used so it doesn't get deleted by other optimization in the pipelines before reaching the vectorizer.
Is there any reason to avoid updating this pass with such a functionality? Basically, I'm trying to use something similar to simd pragma in clang (https://clang.llvm.org/docs/AttributeReference.html#id211), where we can specify any function as vectorizable.
This pass is not to be used as the main way to generate the IR attribute that carries the mappings. It is here only to translate the TLI mappings used by the libraries into IR information, as the loop vectorizer doesn not interface anymore with the TLI but uses a class called VFDatabase that loads the mappings from vector-function-abi-variant. Any other mapping should be added by the frontend. In fact, the vector-function-abi-variant attribute was originally designed to carry the information of the OpenMP declare variant and declare simd directives: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133484.html (Please notice that non of the frontend changes discussed in that email has been implemented yet, all we have done for now are the backend pieces)
Do we already have such support for front-ends to specify any scalar function as 'vectorizable' in IR? This is the only existing attribute that I see which can be reused for this purpose.
As I said, no. Changing the codegeneration of declare simd to use vector-function-abi-variant is on my to do list, unfortunately not as close as I would like them to be. I suspect that also you have your own pipeline of things to do, but let me know if declare simd becomes closer to you than to me, I can always help you getting things sorted! Of course, feel free to ignore my offer, which is essentially a friendly "patches are welcome!" :)
Also, to clarify, this vectorized call is finally lowered before passing to the backend/linker etc. So, you can think of this as I have a handwritten nice vectorized form for "trivially.vectorizable.call", which will be inlined before passing to codegen (so these declarations needn't be kept around after the inlining). The main reason for such a usecase is if we want the "high level function call" remaining in IR form until some late pass, which is just before codegen. It is not to bypass some vectorizer code generation limitation. We have such usecases internally.
If the function is defined in the module, even if all its uses are inlined, why would you want to remove it? If the function declaration is in the module, it is because it is present in the original source? Even if it comes from a declare simd on a scalar function, the compiler is allowed to think it is there even if it is unused.
Please let me know if you have any more question!
Kind regards,
Francesco
Thank you for the detailed and quick response @fpetrogalli.
Ah, interesting. I hadn't noticed that property of the attribute. So, basically, if we were to pass the attribute through the front-end, we're required to keep the vector declarations in the IR as well.
For my purposes, our front-end converts the code into IR and our use case works with adding the attribute at that point itself. I was hoping to actually avoid having the various vector declarations in the IR - frankly it's just for IR cleanliness and some compile time (depending on number of vector variants for each such scalar function) since the vector declarations won't typically be used until we do a pass for some form of vectorization (loop/SLP etc).
what I was thinking of adding in inject-TLI-mappings or a separate pass such as "inject-vector-declarations" is:
if (call->hasAttr("variant-abi")) { getVectorVariantFromAttr(call, VariantNames) for (VName: VariantNames) addVariantDeclaration(CI, VF, VName); }
So, to that end, I think it would be better to make the property that the "vector name mentioned in the attribute should be present in the IR as a vector function declaration" as optional? This would also go better with the work of converting "pragma simd" to vector-abi attribute, since the front-end needn't add all the various vector declarations (until actually it is required by some pass, such as loop/SLP vectorizer). Again, this is just for IR compactness, compile time benefits. We do not have any functional issues with the attribute as-is, and we do not need source language support for pragma-simd (more details below).
Yes, this is correct. Your code will vectorize as it sees the vector declaration. Just make sure to mark it with @llvm.compiler.used so it doesn't get deleted by other optimization in the pipelines before reaching the vectorizer.
Yup, the main thing was I wanted to avoid having the declarations in the IR, but I see that it is the property of the attribute itself.
Changing the codegeneration of declare simd to use vector-function-abi-variant is on my to do list, unfortunately not as close as I would like them to be. I suspect that also you have your own pipeline of things to do, but let me know if declare simd becomes closer to you than to me, I can always help you getting things sorted!
For us, we are able to pass in the "vector-function-abi-variant" since the function we're tring to vectorize is not from the source code (java), so that part is sorted. We are adding these vectorized versions of some internal functions we add through the front-end (not present in source code).
I was just curious if we had some other attribute I hadn't noticed :) It's just that adding these bunches of declaration from front-end seemed a lot of such declarations going through each pass (number of scalar functions * 5).
Also, this is slightly OT: the pass seems to rely on the fact that vectorizer will always use power-of-2 VF (which is true currently), but once we start supporting any number for VF (for example in middle-end it's VF=6 and backend decides what's the correct VF is), we will start having way too many declarations in module (and the pass will also need to be updated). I think we're functionally good in that case, because we will just not generate the vectorized call, since we don't have the corresponding VF variant declaration.
You are welcome!
Ah, interesting. I hadn't noticed that property of the attribute. So, basically, if we were to pass the attribute through the front-end, we're required to keep the vector declarations in the IR as well.
Yep! I have added as many assertions as I could to make sure the declaration was not missing when reading the attribute.
For my purposes, our front-end converts the code into IR and our use case works with adding the attribute at that point itself. I was hoping to actually avoid having the various vector declarations in the IR - frankly it's just for IR cleanliness and some compile time (depending on number of vector variants for each such scalar function) since the vector declarations won't typically be used until we do a pass for some form of vectorization (loop/SLP etc).
I understand - this was our original intent too. Add the attribute and build the vector function signature at compile time in the middle end, our of the scalar signature in the IR and the mangled string of the vector function name. We started this work last summer, until our intern @aranisumedh found out that it was not possible to retrieve the vector signatures in some cases [1]. Essentially, it turned out that only the frontend has enough information about the language types that are needed to build the correct signature in IR. hence, we decided the vector declaration to be required in the IR.
[1] See this: http://lists.llvm.org/pipermail/llvm-dev/2019-June/133225.html, and the follow up discussion. Let me know if you have any question!
what I was thinking of adding in inject-TLI-mappings or a separate pass such as "inject-vector-declarations" is:
if (call->hasAttr("variant-abi")) { getVectorVariantFromAttr(call, VariantNames) for (VName: VariantNames) addVariantDeclaration(CI, VF, VName); }So, to that end, I think it would be better to make the property that the "vector name mentioned in the attribute should be present in the IR as a vector function declaration" as optional?
Sorry, I don't think we can do this, for the aforementioned reason, and for the sake of consistency. It would be bad to end up having code to handle the custom behavior of the attribute.
This would also go better with the work of converting "pragma simd" to vector-abi attribute, since the front-end needn't add all the various vector declarations (until actually it is required by some pass, such as loop/SLP vectorizer). Again, this is just for IR compactness, compile time benefits. We do not have any functional issues with the attribute as-is, and we do not need source language support for pragma-simd (more details below).
Yes, this is correct. Your code will vectorize as it sees the vector declaration. Just make sure to mark it with @llvm.compiler.used so it doesn't get deleted by other optimization in the pipelines before reaching the vectorizer.
Yup, the main thing was I wanted to avoid having the declarations in the IR, but I see that it is the property of the attribute itself.
Changing the codegeneration of declare simd to use vector-function-abi-variant is on my to do list, unfortunately not as close as I would like them to be. I suspect that also you have your own pipeline of things to do, but let me know if declare simd becomes closer to you than to me, I can always help you getting things sorted!
For us, we are able to pass in the "vector-function-abi-variant" since the function we're tring to vectorize is not from the source code (java), so that part is sorted. We are adding these vectorized versions of some internal functions we add through the front-end (not present in source code).
I think your starting point should be to make sure that the front end generates the exact list of functions you want to provide in vector form, using the attribute and relative declarations. Once you have verified the declarations are there, check that the vectorizer vectorizes as expected. If not, improve whatever part of the middle end opt that is needed to make your input IR work.
I was just curious if we had some other attribute I hadn't noticed :) It's just that adding these bunches of declaration from front-end seemed a lot of such declarations going through each pass (number of scalar functions * 5).
I can see that this might not "look nice" from some points of view, but it is the best way to guarantee that front-end and middle-end are decoupled to be able to unit-test each components independently. In my past I have gone through testing a front-end coupled with a backend - you don't wanna do that if you want to keep sane! :)
Also, this is slightly OT: the pass seems to rely on the fact that vectorizer will always use power-of-2 VF (which is true currently),
the pass assumes power of 2 because the TLI assumes power of two. The pass doesn't know anything about the vectorizer.
but once we start supporting any number for VF (for example in middle-end it's VF=6 and backend decides what's the correct VF is),
Of course, the TLI assumes power of 2 because the vectorizer assumes power of 2. It is a chain. If you want to vectorize VF=6, I think you should start from the vectorizer.
we will start having way too many declarations in module (and the pass will also need to be updated).
I think you need to define how many are too many. Even if the IR file will seem to have many unused declarations, those will not end up in an object file, and will not be useless because they could be used by other optimization passes if needed. It seems to be the only way we can keep the scalar-to-mapping info in a useful place.
I think we're functionally good in that case, because we will just not generate the vectorized call, since we don't have the corresponding VF variant declaration.
Yep. The VFDatabase is the interface you want to use to check the availability of vector functions in the IR, for a given CallInst CI (please refer to the code in the vectorizer, and the implementation of VFDatabase and its API in llvm/Transforms/VectorUtils.h).
Sorry if some of my answer sound "on the go", these days I am working odd hours and have many things in the pipeline! :)
Please don't stop asking questions!
Thank you!
Francesco
Thanks for the clarification here.
This would also go better with the work of converting "pragma simd" to vector-abi attribute, since the front-end needn't add all the various vector declarations (until actually it is required by some pass, such as loop/SLP vectorizer). Again, this is just for IR compactness, compile time benefits. We do not have any functional issues with the attribute as-is, and we do not need source language support for pragma-simd (more details below).
Yes, this is correct. Your code will vectorize as it sees the vector declaration. Just make sure to mark it with @llvm.compiler.used so it doesn't get deleted by other optimization in the pipelines before reaching the vectorizer.
Yup, the main thing was I wanted to avoid having the declarations in the IR, but I see that it is the property of the attribute itself.
Changing the codegeneration of declare simd to use vector-function-abi-variant is on my to do list, unfortunately not as close as I would like them to be. I suspect that also you have your own pipeline of things to do, but let me know if declare simd becomes closer to you than to me, I can always help you getting things sorted!
For us, we are able to pass in the "vector-function-abi-variant" since the function we're tring to vectorize is not from the source code (java), so that part is sorted. We are adding these vectorized versions of some internal functions we add through the front-end (not present in source code).
I think your starting point should be to make sure that the front end generates the exact list of functions you want to provide in vector form, using the attribute and relative declarations. Once you have verified the declarations are there, check that the vectorizer vectorizes as expected. If not, improve whatever part of the middle end opt that is needed to make your input IR work.
I agree with all of the points. Again, to state, for a simple scalar function, we will have 5 vector forms being generated (2,4,8,16 and 32) and we'll have to start recording each of those declarations in the module. Is that right? This will functionally work for us (since we've tried a similar idea in our pipeline).
I was just curious if we had some other attribute I hadn't noticed :) It's just that adding these bunches of declaration from front-end seemed a lot of such declarations going through each pass (number of scalar functions * 5).
I can see that this might not "look nice" from some points of view, but it is the best way to guarantee that front-end and middle-end are decoupled to be able to unit-test each components independently. In my past I have gone through testing a front-end coupled with a backend - you don't wanna do that if you want to keep sane! :)
Ah, so there is some difference here. Our front-end and LLVM is completely decoupled (more details here: https://llvm.org/devmtg/2017-10/slides/Reames-FalconKeynote.pdf), but we have a mechanism to query from LLVM to our java VM for anything we want more information about (in this case, pass in the exact set of declarations). So, we can always guarantee the correct set of declarations are retrieved. Building the signature at compile time without any input from FE will be problematic (as you have pointed out above). I can see why the declarations are marked as required for the attribute.
the pass assumes power of 2 because the TLI assumes power of two. The pass doesn't know anything about the vectorizer.
but once we start supporting any number for VF (for example in middle-end it's VF=6 and backend decides what's the correct VF is),
Of course, the TLI assumes power of 2 because the vectorizer assumes power of 2. It is a chain. If you want to vectorize VF=6, I think you should start from the vectorizer.
Agreed, I was just it pointing out (and to be clear, this seems to be the assumption in various other parts of the vectorizer as well). :)
we will start having way too many declarations in module (and the pass will also need to be updated).
I think you need to define how many are too many. Even if the IR file will seem to have many unused declarations, those will not end up in an object file, and will not be useless because they could be used by other optimization passes if needed. It seems to be the only way we can keep the scalar-to-mapping info in a useful place.
So, as stated previously in numbers, we have something like 5 * number of scalar functions which have vector mappings. In our case, we will have 5 per scalar because there is nothing preventing generating a vectorized power-of-2 VF. I remember seeing some "vector length agnostic function", perhaps those can be generated on the fly, if we specify something like _vN rather than _v2 or _v4 etc?
Thank you!
[...]
I think your starting point should be to make sure that the front end generates the exact list of functions you want to provide in vector form, using the attribute and relative declarations. Once you have verified the declarations are there, check that the vectorizer vectorizes as expected. If not, improve whatever part of the middle end opt that is needed to make your input IR work.
I agree with all of the points. Again, to state, for a simple scalar function, we will have 5 vector forms being generated (2,4,8,16 and 32) and we'll have to start recording each of those declarations in the module. Is that right? This will functionally work for us (since we've tried a similar idea in our pipeline).
"is that right?" -> yes :)
I was just curious if we had some other attribute I hadn't noticed :) It's just that adding these bunches of declaration from front-end seemed a lot of such declarations going through each pass (number of scalar functions * 5).
I can see that this might not "look nice" from some points of view, but it is the best way to guarantee that front-end and middle-end are decoupled to be able to unit-test each components independently. In my past I have gone through testing a front-end coupled with a backend - you don't wanna do that if you want to keep sane! :)
Ah, so there is some difference here. Our front-end and LLVM is completely decoupled (more details here: https://llvm.org/devmtg/2017-10/slides/Reames-FalconKeynote.pdf), but we have a mechanism to query from LLVM to our java VM for anything we want more information about (in this case, pass in the exact set of declarations). So, we can always guarantee the correct set of declarations are retrieved. Building the signature at compile time without any input from FE will be problematic (as you have pointed out above). I can see why the declarations are marked as required for the attribute.
Good that we reached the same conclusion after going through the same process of discovery!
the pass assumes power of 2 because the TLI assumes power of two. The pass doesn't know anything about the vectorizer.
but once we start supporting any number for VF (for example in middle-end it's VF=6 and backend decides what's the correct VF is),
Of course, the TLI assumes power of 2 because the vectorizer assumes power of 2. It is a chain. If you want to vectorize VF=6, I think you should start from the vectorizer.
Agreed, I was just it pointing out (and to be clear, this seems to be the assumption in various other parts of the vectorizer as well). :)
Yep.
we will start having way too many declarations in module (and the pass will also need to be updated).
I think you need to define how many are too many. Even if the IR file will seem to have many unused declarations, those will not end up in an object file, and will not be useless because they could be used by other optimization passes if needed. It seems to be the only way we can keep the scalar-to-mapping info in a useful place.
So, as stated previously in numbers, we have something like 5 * number of scalar functions which have vector mappings. In our case, we will have 5 per scalar because there is nothing preventing generating a vectorized power-of-2 VF.
Well, the cost model selects the one that is more appropriate for the datatype and the content of the loop. So, for example, if your loop is processing 64-bits scalars, and your vector registers are 128-bit wide, it is very unluckily that the vectorizer will select anything other than VF = 2... So if your function processes 64-bit data, you should first emit the 2-labes version of the vector function in the module, without bothering generating the other 4/8/16/32 lanes ones if the vectorizer already picks up the 2-lane version.
I remember seeing some "vector length agnostic function", perhaps those can be generated on the fly, if we specify something like _vN rather than _v2 or _v4 etc?
Vector Lengh Agnostic (VLA) is currently used only when targeting the scalable vector extension (SVE, of AArch64), because it uses a property of the underlying hardware. You cannot used it for a fixed width vector extension.
Francesco @fpetrogalli, I was reusing this attribute for custom scalar functions and I have a fundamental question - why is it that this *only* a callsite attribute? I don't see anything preventing this from being a function attribute as well. The meaning for the function attribute being if "variant-abi" is present on the function, it implies all callsites calling this particular function has that attribute. Of course, if you have multiple units being finally linked, it is the responsibility of the front end to choose it it wants to add the function attribute or a callsite attribute. However, I strongly feel we should have support for both, because there are use cases where the vector shape etc does not change depending on the callsite. Was there a reason for not supporting it as a function attribute as well?
And I came across exactly that suggestion for making this a function attribute as well: https://lists.llvm.org/pipermail/llvm-dev/2019-May/132631.html
I don't see anything preventing this from being a function attribute as well.
HI @anna ,
apologies in advance for the late reply.
I don't think we want this. Imagine situation in which proto.h contains a scalar declaration foo marked with #pragma ompe declare simd. Suppose you include this header in 2 compilation units, one (sourceA.c) that can be compiled with -fopenmp-simd, and one (sourceB.c) that cannot be compiled with openmp - for any reason, for example because of some requirements on floating points computations. Then, you want to do some optimization that involves merging the two modules. The calls to foo in the IR generated from sourceB.c suddenly become vectorizable, which is wrong. So, overall, I think we should not attach this attribute to a function declaration.
I appreciate that what I described might be a remote possibility, but if we ever end up having to deal with it, it will be quite hard to fix.
Let me know if you have any question.
Kind regards,
Francesco
FWIW, I think for generic OpenMP it is needed to annotate the call site. That said, I don't see a fundamental reason one could not annotate the function. So we need the call site attributes and later when we allow them on functions, everything should be in-place. WDYT?
Hi! I really don't think this should be matter. The way I'm thinking of this - it is the responsibility of the front-end to add the correct attribute, either on callsite or function. We can repurpose this attribute for other front-ends and for things other than OpenMP. For your example and in OpenMP, you have this on the callsites. Always. So, that cases you describe is handled.
Don't forget about the old pass manager :)