This is an archive of the discontinued LLVM Phabricator instance.

[Clang] Extend emitUnaryBuiltin to avoid duplicate logic.
ClosedPublic

Authored by junaire on Dec 22 2021, 4:56 AM.

Details

Summary

This patch extends emitUnaryBuiltin so that we can better emitting IR when
implement builtins specified in D111529.

Also contains some NFC, applying it to existing code.

Diff Detail

Event Timeline

junaire requested review of this revision.Dec 22 2021, 4:56 AM
junaire created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2021, 4:56 AM
fhahn added a comment.Dec 22 2021, 5:10 AM

Can you also update the existing places that could use it?

junaire updated this revision to Diff 395852.Dec 22 2021, 5:32 AM

Update the existing place that can use emitUnaryBuiltin.

junaire updated this revision to Diff 395853.Dec 22 2021, 5:39 AM

Sorry, It seems that the base branch is wrong, reupdate it.

fhahn added a comment.Dec 22 2021, 5:39 AM

Update the existing place that can use emitUnaryBuiltin.

I meant just update the existing uses *without* adding floor, roundeven, trunc, so this change should be NFC (non-functional change)

Update the existing place that can use emitUnaryBuiltin.

I meant just update the existing uses *without* adding floor, roundeven, trunc, so this change should be NFC (non-functional change)

I'm sorry about it, I'm just too careless. :-(

junaire updated this revision to Diff 395863.Dec 22 2021, 6:52 AM

Update the existing place that can use emitUnaryBuiltin.

junaire updated this revision to Diff 395864.Dec 22 2021, 6:58 AM

Update the existing place that can use emitUnaryBuiltin.

junaire updated this revision to Diff 396140.Dec 24 2021, 12:12 AM

Fix wrong usage.

fhahn added inline comments.Dec 24 2021, 12:44 AM
clang/lib/CodeGen/CGBuiltin.cpp
3137–3138

Should also be used here?

3202

Should also be used here?

3221

Should also be used here?

junaire added a comment.EditedDec 24 2021, 2:02 AM

Should also be used here?

I don't know why but these will cause tests to fail.
Maybe it is caused by my local cache? I'm gonna make a clean build to test it again.
BTW, thanks for taking look at this. Merry Christmas! ;D

I confirmed that we can use emitUnaryBuiltin in the cases you pointed out. Please see the logs below:

$ path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c -emit-llvm -disable-llvm-passes -o - | /path/to/llvm-project/build/bin/FileCheck /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c
/path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c:16:17: error: CHECK-NEXT: expected string not found in input
 // CHECK-NEXT: call float @llvm.fabs.f32(float [[F1]])
                ^
<stdin>:35:43: note: scanning from here
 %0 = load float, float* %f1.addr, align 4
                                          ^
<stdin>:35:43: note: with "F1" equal to "%0"
 %0 = load float, float* %f1.addr, align 4
                                          ^
<stdin>:37:13: note: possible intended match here
 %elt.abs = call float @llvm.fabs.f32(float %1)
            ^

Input file: <stdin>
Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          30:  store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 
          31:  store <8 x i16> %vi2, <8 x i16>* %vi2.addr, align 16 
          32:  store i64 %i1, i64* %i1.addr, align 8 
          33:  store i64 %i2, i64* %i2.addr, align 8 
          34:  store i16 %si, i16* %si.addr, align 2 
          35:  %0 = load float, float* %f1.addr, align 4 
next:16'0                                               X error: no match found
next:16'1                                                 with "F1" equal to "%0"
          36:  %1 = load float, float* %f1.addr, align 4 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          37:  %elt.abs = call float @llvm.fabs.f32(float %1) 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:16'2                 ?                                    possible intended match
          38:  store float %elt.abs, float* %f2.addr, align 4 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          39:  %2 = load double, double* %d1.addr, align 8 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          40:  %3 = load double, double* %d1.addr, align 8 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          41:  %elt.abs1 = call double @llvm.fabs.f64(double %3) 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          42:  store double %elt.abs1, double* %d2.addr, align 8 
next:16'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>
$ /path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c -emit-llvm -disable-llvm-passes -o - | ~/dev/cpp-projects/llvm-project/build/bin/FileCheck ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c
/path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:12:17: error: CHECK-NEXT: expected string not found in input
 // CHECK-NEXT: call float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[VF1]])
                ^
<stdin>:23:57: note: scanning from here
 %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16
                                                        ^
<stdin>:23:57: note: with "VF1" equal to "%0"
 %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16
                                                        ^
<stdin>:25:13: note: possible intended match here
 %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1)
            ^
/path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:38:17: error: CHECK-NEXT: expected string not found in input
 // CHECK-NEXT: call float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[VF1]])
                ^
<stdin>:74:57: note: scanning from here
 %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16
                                                        ^
<stdin>:74:57: note: with "VF1" equal to "%0"
 %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16
                                                        ^
<stdin>:76:13: note: possible intended match here
 %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1)
            ^

Input file: <stdin>
Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          18:  %cvi1 = alloca <8 x i16>, align 16 
          19:  %r5 = alloca i64, align 8 
          20:  store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 
          21:  store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 
          22:  store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 
          23:  %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 
next:12'0                                                             X error: no match found
next:12'1                                                               with "VF1" equal to "%0"
          24:  %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          25:  %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1) 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:12'2                 ?                                                          possible intended match
          26:  store float %rdx.min, float* %r1, align 4 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          27:  %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          28:  %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          29:  %rdx.min1 = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %3) 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          30:  store i16 %rdx.min1, i16* %r2, align 2 
next:12'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          69:  %cvi1 = alloca <8 x i16>, align 16 
          70:  %r5 = alloca i64, align 8 
          71:  store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 
          72:  store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 
          73:  store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 
          74:  %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 
next:38'0                                                             X error: no match found
next:38'1                                                               with "VF1" equal to "%0"
          75:  %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          76:  %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1) 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
next:38'2                 ?                                                          possible intended match
          77:  store float %rdx.min, float* %r1, align 4 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          78:  %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          79:  %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          80:  %rdx.min1 = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %3) 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          81:  store i16 %rdx.min1, i16* %r2, align 2 
next:38'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>

Do you know why these will happen? @fhahn

fhahn added a comment.Dec 24 2021, 6:38 AM
35:  %0 = load float, float* %f1.addr, align 4 
36:  %1 = load float, float* %f1.addr, align 4 
37:  %elt.abs = call float @llvm.fabs.f32(float %1)

It looks like the argument expression is evaluated twice. Did you remove the Value *Op0 = EmitScalarExpr(E->getArg(0)); calls?

junaire added a comment.EditedDec 24 2021, 7:10 AM
35:  %0 = load float, float* %f1.addr, align 4 
36:  %1 = load float, float* %f1.addr, align 4 
37:  %elt.abs = call float @llvm.fabs.f32(float %1)

It looks like the argument expression is evaluated twice. Did you remove the Value *Op0 = EmitScalarExpr(E->getArg(0)); calls?

Well, for example:
For __builtin_elementwise_abs, we have code like:

case Builtin::BI__builtin_elementwise_abs: {
  Value *Op0 = EmitScalarExpr(E->getArg(0));
  Value *Result;
  if (Op0->getType()->isIntOrIntVectorTy())
    Result = Builder.CreateBinaryIntrinsic(
        llvm::Intrinsic::abs, Op0, Builder.getFalse(), nullptr, "elt.abs");
  else
    Result = Builder.CreateUnaryIntrinsic(llvm::Intrinsic::fabs, Op0, nullptr,
                                          "elt.abs");

  return RValue::get(Result);
}

If we use emitUnaryBuiltin, we are supposed to do something like:

Result = emitUnaryBuiltin(*this, E, llvm::Intrinsic::fabs, "elt.abs");

We need to pass CallExpr* E to the function to meet its interface, and it calls CGF.EmitScalarExpr(E->getArg(0)); inside. Maybe this is what you said about the argument expression being evaluated twice? I think it is unavoidable if we don't change the function's interface.

Maybe we can have something like:

static Value *emitUnaryBuiltin(CodeGenFunction &CGF, Value* Op0,
                               unsigned IntrinsicID, llvm::StringRef Name) {
  return CGF.Builder.CreateUnaryIntrinsic(IntrinsicID, Op0, nullptr, Name);
}

Then for __builtin_elementwise_abs we can have:

Result = emitUnaryBuiltin(*this, Op0, llvm::Intrinsic::fabs, "elt.abs");

and for __builtin_elementwise_ceil have:

return RValue::get(
    emitUnaryBuiltin(*this, EmitScalarExpr(E->getArg(0)), llvm::Intrinsic::ceil, "elt.ceil"));

WDYT? Well, franking speaking I think this one-line function is ugly but I can't come up with a more elegant solution, I would appreciate it if you can offer some suggestions. :)

junaire updated this revision to Diff 396380.Dec 28 2021, 12:26 AM

In order to use emitUnaryBuiltin in other cases, I changed the function interface.
This allows us to use it in all Builder.CreateUnaryIntrinsic() cases, but will make
the function body very small.

fhahn added a comment.Dec 28 2021, 8:28 AM

In order to use emitUnaryBuiltin in other cases, I changed the function interface.
This allows us to use it in all Builder.CreateUnaryIntrinsic() cases, but will make
the function body very small.

I think we should extend & use the existing emitUnaryBuiltin. I think in most cases where it cannot be used straight away you should be able to slightly rewrite the existing code to not rely on llvm::Type. Then there should be no need to call EmitScalarExpr early (an example is in the inline comments)

clang/lib/CodeGen/CGBuiltin.cpp
535

I think we should extend this emitUnaryBuiltin function, rather than having a second one.

e.g.

 static Value *emitUnaryBuiltin(CodeGenFunction &CGF,
                                const CallExpr *E,
-                               unsigned IntrinsicID) {
+                               unsigned IntrinsicID, llvm::StringRef Name = "") {
   llvm::Value *Src0 = CGF.EmitScalarExpr(E->getArg(0));

   Function *F = CGF.CGM.getIntrinsic(IntrinsicID, Src0->getType());
-  return CGF.Builder.CreateCall(F, Src0);
+  return CGF.Builder.CreateCall(F, Src0, Name);
+}
3201–3204

IICU the only reason to call EmitScalarExpr early is the use in GetIntrinsicID, but it could solely rely on QualType:

-    auto GetIntrinsicID = [](QualType QT, llvm::Type *IrTy) {
-      if (IrTy->isIntOrIntVectorTy()) {
-        if (auto *VecTy = QT->getAs<VectorType>())
-          QT = VecTy->getElementType();
-        if (QT->isSignedIntegerType())
-          return llvm::Intrinsic::vector_reduce_smax;
-        else
-          return llvm::Intrinsic::vector_reduce_umax;
-      }
+    auto GetIntrinsicID = [](QualType QT) {
+      if (auto *VecTy = QT->getAs<VectorType>())
+        QT = VecTy->getElementType();
+      if (QT->isSignedIntegerType())
+        return llvm::Intrinsic::vector_reduce_smax;
+      if (QT->isUnsignedIntegerType())
+        return llvm::Intrinsic::vector_reduce_umax;
+      assert(QT->isFloatingType() && "must have a float here");
       return llvm::Intrinsic::vector_reduce_fmax;
     };
junaire updated this revision to Diff 396490.Dec 28 2021, 11:17 PM

Extend emitUnaryBuiltin instead of adding a new overload, also apply it to __builtin_elementwise_abs.

junaire retitled this revision from [Clang] Add an overload for emitUnaryBuiltin. to [Clang] Extend emitUnaryBuiltin to avoid duplicate logic..Dec 28 2021, 11:18 PM
junaire edited the summary of this revision. (Show Details)
junaire updated this revision to Diff 396492.Dec 28 2021, 11:35 PM
junaire edited the summary of this revision. (Show Details)

Refactor code a little bit and fix wrong names.

fhahn accepted this revision.Jan 2 2022, 1:08 PM

LGTM, thanks!

This revision is now accepted and ready to land.Jan 2 2022, 1:08 PM
fhahn added a comment.Jan 4 2022, 1:17 AM

@junaire please let me know if you want me to land this on your behalf.

@junaire please let me know if you want me to land this on your behalf.

Yeah, thanks a lot!

You can use:
Jun Zhang
jun@junz.org

This revision was landed with ongoing or failed builds.Jan 4 2022, 3:48 AM
This revision was automatically updated to reflect the committed changes.
fhahn added a comment.Jan 4 2022, 3:51 AM

@junaire please let me know if you want me to land this on your behalf.

Yeah, thanks a lot!

You can use:
Jun Zhang
jun@junz.org

Committed! After D115429, I'd recommend you obtain commit access https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

@junaire please let me know if you want me to land this on your behalf.

Yeah, thanks a lot!

You can use:
Jun Zhang
jun@junz.org

Committed! After D115429, I'd recommend you obtain commit access https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access

That's so great! Thank you for your continued patient guidance and the review work, I really appreciate!