This is an archive of the discontinued LLVM Phabricator instance.

[TLI][AArch64] Extend ReplaceWithVeclib to replace vector FREM instructions for scalable vectors
AbandonedPublic

Authored by jolanta.jensen on Jul 27 2023, 8:10 AM.

Details

Summary

This patch teaches ReplaceWithVeclib pass how to replace
vector FREM instructions for scalable vectors.

Depends on D157258

Diff Detail

Unit TestsFailed

Event Timeline

jolanta.jensen created this revision.Jul 27 2023, 8:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2023, 8:10 AM
jolanta.jensen requested review of this revision.Jul 27 2023, 8:10 AM
mgabka added inline comments.Jul 28 2023, 2:13 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
142–153

from what I can see this is the only different part of code then in "replaceWithTLIFunction" , hence most of the code below is a code duplication. I think a smart refactoring of replaceWithTLIFunction would be enough to make it work for CI and Instructions.

The other thing is that the code you wrote duplicates a lot, you can use a suitable container and just add extra element there when you are creating a type for masked function, in that way you do not need to create the type twice and add input types twice.

180

I guess that this is handled by "replaceAllUsesWith" but worth to check if with a test.

251

this is not needed, the mechanism here should work for both fixed and scalable types if mappings exist.
if we want to reject the transformation for scalable vector types I think we should reject it earlier, i.e where we detect frem.

please LLVM coding guideline on using braces with simple if statememts https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements, it applies to other places in this patch.

253

Hi Jolanta,
in my opinion it would be better to have a single main entry point here, and then branch from inside replaceWithCallToVeclib, thanks to it you can void some of the code duplication, like for example the debug messages.

254

when moving this outside this function, this check can be combined with the one above since ScalableVectorType class exists.

260–263

the function name "replaceInstructionWithCallToVeclib" does not suggest that this is specific to fmod/frem, at the moment we do not have intention to extend it beyond frem, so in my opinion it is worth to replace the function name, to avoid confusion.

264

please use early exit instead, it is much better idea to use early exits from function instead long nested if block, it improves code readability.
please apply it where possible to other checks you are doing

272–273

why looking for it in cases where you are not going to use it? it isn't efficient. please change it.

274–276

this message is printed actually after looking for the mappings, so probably should be moved up.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
414

having both arguments of frem as inputs to this function would simplify the test, same apply to other tests.

The other thing is that other tests in this file are using fast math flags, so please use it for frem as well.

llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
2 ↗(On Diff #544781)

could you explain why did you decide to modify this test?

Addressed review comments.

jolanta.jensen added inline comments.Jul 31 2023, 3:55 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
142–153

Lines 129-134 also differ. Line 77 has an assert that is not vaIid here. I created small functions for lines 54-66 and 116-127 that are shared. And I fixed the code duplication above.

180

Indeed, it is handled by "replaceAllUsesWith".

void Value::replaceAllUsesWith(Value *New) {
  doRAUW(New, ReplaceMetadataUses::Yes);
}
void Value::doRAUW(Value *New, ReplaceMetadataUses ReplaceMetaUses) {
...
  if (ReplaceMetaUses == ReplaceMetadataUses::Yes && isUsedByMetadata())
    ValueAsMetadata::handleRAUW(this, New);
...

Do we need a test to confirm or is this explanation good enough?

251

There is no mappings for frem with fixed vectors for SLEEF library so we need to check it's a scalable type.

I'll check the code for breaches of the standard and I'll correct.

253

Fixed.

254

I changed to ScalableVectorType.

260–263

Fixed.

264

Fixed.

272–273

Fixed.

274–276

Fixed.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
414

Fixed.

llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
2 ↗(On Diff #544781)

As I understand the commit message and the implementation itself, ReplaceWithVeclib pass is to be run on intrinsics operating on vectors not scalar values. This pass did make any changes to the IR and is only confusing.

mgabka added inline comments.Jul 31 2023, 5:45 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
142–153

I am not against having "replaceFremWithCallToVeclib" I think that it could stay.
However I would like you to try to not create " replaceFremWithCallToVeclib".
In my opinion the code will be more readable if we have only replaceWithTLIFunction, which takes an Instruction, and do not create functions like "replaceWithNewCallInst" or "addFunctionToCompilerUsed".
If it makes code more clean it might be worth to move some of the frem related things like creating function type to a helper function.

Addressing review comment about refactoring.

jolanta.jensen added inline comments.Aug 1 2023, 4:07 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
142–153

Fixed. All frem handling is in replaceWithTLIFunction. I think it's better to have it all gathered there than break out into separate function(s) as it's not that much code.

mgabka added inline comments.Aug 1 2023, 5:19 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
58

the NumElements and ElementType you can obtain from the Frem instruction return tyoe, so you do not need to pass it as arguments to this function and you can remove this assert as well.

64

IR builder has getAllOnesMask please use it instead

120–121

this assert can be moved up so you won't need a temporary variable

122

could you move this comment abve the if stmt and remove the braces?

128

extra new line, it is not needed

170

could you move this comment abve the if stmt and remove the braces?

171–174

IMO this would be more readable if written as if statements, moreover if the type is not double of float we should rerutn false from this function, am I right?

StringRef ScalarName;
if (ElementType->isFloatTy())

ScalarName =TLI.getName(LibFunc_fmodf);

elseif(ElementType->isDoubleTy())

ScalarName =TLI.getName(LibFunc_fmod)

else
return false;

thanks to to the statement below is not needed and code is more clear and compact.

174

please move the comment above the if stmt and remove braces

178

these 2 if stmts can be combined together, C/C++ guarantees that they are executed from left to right

178–179

could you move the comments above the if and remove the braces?

I think this is not needed entirely to be fair, can we rely on the check performed by TLI.getVectorizedFunction? and add debug message just below the final return false?

268

could you can remove the braces here?

llvm/test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
2 ↗(On Diff #544781)

yes, you are correct, am I aware that this test is quite bad from the beginning. I would like to rearrange the SLEEF tests to match what I did for armpl in https://reviews.llvm.org/home/menu/view/137/, i.e have separate tests for LV and for replace with veclib.

For now I would leave it as it is and remove the code you added, AFAIK we should not see calls to @fmod or @fmodf in the situations we want to use _ZGVsMxvv_fmod or _ZGVsMxvv_fmodf, in those cases frem instructions with vector operands should be present in IR instead.

Addressing review comments.

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
122

Fixed.

170

Fixed.

171–174

Fixed. In the unusual case the ScalarName is empty we will just do 2 unnecessary lookups.

174

Fixed.

178

Fixed.

178–179

Yes, we can. It calls into:

bool isFunctionVectorizable(StringRef F, const ElementCount &VF) const {
  return !(getVectorizedFunction(F, VF, false).empty() &&
           getVectorizedFunction(F, VF, true).empty());
}

Removed the isFunctionVectorizable() check and added a debug message if a vectorized function wasn't found.

jolanta.jensen added inline comments.Aug 2 2023, 5:17 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
58

I would need to do a cast similar way as I do in replaceFremWithCallToVeclib, i.e. something like (dyn_cast<ScalableVectorType>(I.getType()))->getElementCount() and I would need a variable for it in the whole function scope even if only frem is using it. I think it's less fuss to send it as argument. But if you think it's better to obtain it from Frem instruction anyway, I'll change.
I realized I do not use ElementType and I removed it.

64

It does similar except I'll get a Value not Type since it does more than I need and I would need to convert back.

Value *getAllOnesMask(ElementCount NumElts) {
  VectorType *VTy = VectorType::get(Type::getInt1Ty(Context), NumElts);
  return Constant::getAllOnesValue(VTy);
}

I kept line 65 as it is but I changed code on lines 98-100 to use getAllOnesMask.

120–121

Fixed.

128

Fixed.

268

Fixed. I kept the a blank line before return to make it more visible what is returned.

Addressing review comments.

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
58

Fixed.

Addressing yet another review comments.

jolanta.jensen edited the summary of this revision. (Show Details)Aug 3 2023, 10:37 AM
mgabka added inline comments.Aug 4 2023, 7:29 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
54–84

I think most of us are more used to read positive conditions this is why I would suggest to write it as:
if(CI) {
handle intrinsic
}
else {
handle frem and here the assert to check that the instruction is a frem would make more sense
}

I would suggest to do a simila change in the if/else block you added below

63

you can just use here: Tys.push_back(VectorType::get(Type::getInt1Ty(M->getContext()), NumElements));

74

IMO this assert is redundant, it is essentially checking if the Function::Create works correctly, what is not needed in my opinion.

93

I know that this is not part of your work, but I realised that this is not tested at all, could you create a an NFC patch with regenerated tests with "--check-globals" same applied to the non scalable test and rebase your patch?

103

not needed change

150

this is wrong, please read about StringRef https://discourse.llvm.org/t/std-string-vs-llvm-stringref/65873/2

you can just write it as :

StringRef TLIName = TLI.getVectorizedFunction(ScalarName, NumElements

151–155

I am not sure that this is even needed, in the TLI all mappings for SLEEF or ArmPL for scalable vectors are masked, so I would assume that we either replace with masked call or we do not replace at all, @paulwalker-arm what is your opinion?

161

I think LLVM prefers to use an in-line C-style comment in this case /*Masked*/ true

163

I think this is going to pollute dbg log too much as it will add a debug message for each unsupported intrinsic, please remove it.

258

nit: better to use consistent spelling so either please use FRem of frem everywhere

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
393

please use same naming scheme as other functions in this file so this should be : llvm_frem_vscale_f64, please apply to the other tests

paulwalker-arm added inline comments.Aug 4 2023, 7:39 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
151–155

Checking for both is fine and akin to what we do in LoopVectorize. There's no requirement for scalable vector math routines to require a mask, it's just masked versions are easier to work with when tail-folding.

jolanta.jensen added inline comments.Aug 7 2023, 3:06 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
54–84

I agree. It was ordered this way because frem handling was before intrinsics handling in replaceWithCallToVeclib. But since branching was reverted to reside in runImpl where it was in thefirst version of the patch, it does not make sense to have intrinsics handling before frem handling here. Hopefully the code will be more clear this way even if I dont like those if...else... anyway and I would prefer separate functions for intrinsics and frem.

63

Fixed. This way I could move declaration of IRBuilder more locally too.

74

This assert is part of the original implementation, removing it would be introducing unnecessary nfc (even if it has been moved to avoid creation of an extra variable -- so the move is a nfc anyway). I would like to keep it since it is a part of the original implementation.

93

Fixed.

150

Yeah, copy-pasted from the original code in replaceWithCallToVeclib. Corrected -- but only here, in my own code.

161

Fixed.

163

Why should it pollute? We won't get it more times than

LLVM_DEBUG(dbgs() << DEBUG_TYPE << ": Looking up TLI mapping for `"
                   << ScalarName << "` and vector width " << NumElements
                   << ".\n");

It matches the above print in case we find nothing. I think it's user friendly and I would like to keep it.

258

Fixed.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
393

The other test names start with @llvm since they call functions which names start with @llvm. My test names start with @frem since I call frem. In my opinion it is consistent with the naming scheme in this file.

jolanta.jensen added inline comments.Aug 7 2023, 5:32 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
52

This is a bug. It will loose its value when it comes to line 104. It needs to be instantiated at once, i.e.
ElementCount NumElements = (dyn_cast<ScalableVectorType>(I.getType()))->getElementCount();

jolanta.jensen added inline comments.Aug 7 2023, 6:00 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
52

Or actually, the issue here is we set the ElementCount from an if statement, so if the TLIFunc is present, it will not be set.

mgabka added inline comments.Aug 7 2023, 6:05 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
51

this is wrong, i.e in cases where the function is already declared in the module you are not initialising it and in the line 112 you are going to print empty string.

The code below is adding function declaration to the module, and you are setting the OldName only in such situation similar situation applied to the NumElements variable.
So this is an incorrect code as those variables should be set to correct value in both situations.

I also believe the code below could be written in a bit shorter, please check if you can do anything with it as inspiration have a look at the code I suggested to use in the other place.

103

not needed change please revert this change.

103–122

this can be written in much more compacted way for example:

SmallVector<Value *> Args(I.operand_values());
SmallVector<OperandBundleDef, 1> OpBundles;
// Preserve the operand bundles if it is an intrinsic call.
if (CI)
  CI->getOperandBundlesAsDefs(OpBundles);
// for masked calls to frem add a mask operand =
else if (Masked)
       Args.push_back(IRBuilder.getAllOnesMask(NumElements));

CallInst *Replacement = IRBuilder.CreateCall(TLIFunc, Args, OpBundles);
if (isa<FPMathOperator>(Replacement)) {
    // Preserve fast math flags for FP math.
    Replacement->copyFastMathFlags(&I);
  }
163

I was too fast, this function only works for frem i nstruction, not for all llvm intrinsics.
In my view there is no need to add messages when optimization won't happen, you added messages when it happens, so lack of it means that it is not happening.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
393

In my opinion it makes things only more difficult as it breaks the alphabetical order, and when looking for output for frem we need to too at. the end of the file instead place where all math operations started which name starts with "f" is.

mgabka added inline comments.Aug 7 2023, 6:53 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
103–122

I was wrong here, when the Instruction is a CI has the call as the first operand so it needs to be a bit different, I also think that it would be good to handle always mask (assuming it is always the last operand)

SmallVector<Value *> Args;
SmallVector<OperandBundleDef, 1> OpBundles;
// Preserve the operand bundles and copy arguments if it is an intrinsic call.
if (CI) {
  Args.assign(CI->arg_begin(), CI->arg_end());
  CI->getOperandBundlesAsDefs(OpBundles);
 }
else
  Args.assign(I.op_begin(), I.op_end());
// if mask is requested we need to add it
if (Masked)
  Args.push_back(IRBuilder.getAllOnesMask(NumElements));

CallInst *Replacement = IRBuilder.CreateCall(TLIFunc, Args, OpBundles);

Bug fixes and review comments.

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
51

Yup. this is a bug as well. Corrected.

The code below (everything that is in if (CI) block) is part of the original implementation and I would prefer not to touch it.

103–122

This is part of the original implementation and I would prefer not to touch it. And CallInst part does not handle scalable vectors so no mask. And I don't think we even can assume mask is always the last operand even if so is the case for FRem.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-sleef-scalable.ll
393

replace-intrinsics-with-veclib-sleef-scalable.ll is alphabetically ordered for intrinsics tests but replace-intrinsics-with-veclib-armpl.ll is not. It starts with cos and the next one is sin. The remainder of the test file is fairly alphabetical even if log10 comes after log2. But I moved the frem tests so they come in alphabetical order as they are listed alphabetically in VecFuncs.def. I renamed them also to llvm_frem even if they not intrinsics.

jolanta.jensen added inline comments.Aug 7 2023, 9:41 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
161–163

Forgot about this one. Will be included in next patch.

jolanta.jensen added inline comments.Aug 9 2023, 3:50 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
161–163

Removed the unnecessary debug print.

mgabka added inline comments.Aug 10 2023, 1:14 AM
llvm/include/llvm/Analysis/VectorUtils.h
184

please add documentation for the new added parameter.

in my opinion the interface will be better if we use it for both scalable and fixed vectors, otherwise it is confusing what this parameter is and when it is used.

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
55

this may fail, not all instructions used here will be returning Vector.
You need the EC only for frem case to create Type for the mask, but in this case you have guaranteed that it returns a vector type. Please change it.

57

nod needed change

74

FWIW this you already refactored the code quite a bit so removing not needed bit is not a problem as part of this work.

75

The whole idea about std::optional is that the value may not exist, https://en.cppreference.com/w/cpp/utility/optional/value
and you need to handle it otherwise you will get exception.

76

IMO this dbg message is not needed as it is too detailed. LLVM tends to have more high level dbg output otherwise the dbg log are growing too quickly.

for this pass having a dbg messages to show that the is replacement and what has been replaced with what in my opinion is enough.

103–122

The final goal is to make this pass work for scalable vectors, so introducing unnecessary if/else blocks to just remove them later in my opinion is not a good idea.
Since we now want to use getParamIndexForOptionalMask we will now the place of mask.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
195

a comment explaining why the transformation is not happening would be useful here.

jolanta.jensen added inline comments.Aug 10 2023, 1:40 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
55

Change to what? I was sending ElementCount as an argument but this was disliked and I was asked to retrieve it here. How do you want it? Could you explain a bit more?

75

In my opinion the check on line 74 is enough but I can add one more check for the position of the mask.

103–122

It's not only about the mask. CallInst needs it's OpBundles as well. But sure, if we add support for scalable vectors for CI it may be rewritten. But we are not there yet.

Addressing review comments.

llvm/include/llvm/Analysis/VectorUtils.h
184

Updated the documentation.
For fixed vectors do not need to retrieve the Vectorization Factor. The Module parameter is only used for scalable vectors to retrieve the Vectorization Factor.
But I realized the EC parameter can be misused if we use it with a CallInst and there is no Function in the Module. Any suggestions how to mitigate?

// 1. We don't accept a zero lanes vectorization factor.
// 2. We don't accept the demangling if the vector function is not
// present in the module unless we handle an Instruction.
if (VF == 0)
  return std::nullopt;
if (!EC && !M.getFunction(VectorName))
  return std::nullopt;
llvm/lib/Analysis/VFABIDemangling.cpp
455

Here, the EC parameter can be misused if it is set for a CallInst but the VectorFunction is not present in the Module. Any advice how to mitigate?

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
55

Sending NumElements as argument again, this time as optional one. Please check if this is a better solution.

57

Added a blank line again. Same with line 68.

74

Removed the assert checking for the function type. i.e. on line 62.

75

Added a check if we got a return value.

76

Removed.

llvm/test/CodeGen/AArch64/replace-intrinsics-with-veclib-armpl.ll
195

Fixed.

llvm/lib/CodeGen/ReplaceWithVeclib.cpp
44

This will malfunction if false is sent as an argument. To be corrected.

Bugfix to a bug introduced in previous patch.

jolanta.jensen added inline comments.Aug 11 2023, 2:55 AM
llvm/lib/CodeGen/ReplaceWithVeclib.cpp
44

Now fixed.

jolanta.jensen abandoned this revision.Aug 29 2023, 4:10 AM