This patch extends emitUnaryBuiltin so that we can better emitting IR when
implement builtins specified in D111529.
Also contains some NFC, applying it to existing code.
Differential D116161
[Clang] Extend emitUnaryBuiltin to avoid duplicate logic. junaire on Dec 22 2021, 4:56 AM. Authored by
Details This patch extends emitUnaryBuiltin so that we can better emitting IR when Also contains some NFC, applying it to existing code.
Diff Detail
Unit Tests Event TimelineComment Actions I meant just update the existing uses *without* adding floor, roundeven, trunc, so this change should be NFC (non-functional change) Comment Actions
I don't know why but these will cause tests to fail. Comment Actions I confirmed that we can use emitUnaryBuiltin in the cases you pointed out. Please see the logs below: $ path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c -emit-llvm -disable-llvm-passes -o - | /path/to/llvm-project/build/bin/FileCheck /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c:16:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.fabs.f32(float [[F1]]) ^ <stdin>:35:43: note: scanning from here %0 = load float, float* %f1.addr, align 4 ^ <stdin>:35:43: note: with "F1" equal to "%0" %0 = load float, float* %f1.addr, align 4 ^ <stdin>:37:13: note: possible intended match here %elt.abs = call float @llvm.fabs.f32(float %1) ^ Input file: <stdin> Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c -dump-input=help explains the following input dump. Input was: <<<<<< 30: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 31: store <8 x i16> %vi2, <8 x i16>* %vi2.addr, align 16 32: store i64 %i1, i64* %i1.addr, align 8 33: store i64 %i2, i64* %i2.addr, align 8 34: store i16 %si, i16* %si.addr, align 2 35: %0 = load float, float* %f1.addr, align 4 next:16'0 X error: no match found next:16'1 with "F1" equal to "%0" 36: %1 = load float, float* %f1.addr, align 4 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 37: %elt.abs = call float @llvm.fabs.f32(float %1) next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:16'2 ? possible intended match 38: store float %elt.abs, float* %f2.addr, align 4 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 39: %2 = load double, double* %d1.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 40: %3 = load double, double* %d1.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 41: %elt.abs1 = call double @llvm.fabs.f64(double %3) next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 42: store double %elt.abs1, double* %d2.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>> $ /path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c -emit-llvm -disable-llvm-passes -o - | ~/dev/cpp-projects/llvm-project/build/bin/FileCheck ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:12:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[VF1]]) ^ <stdin>:23:57: note: scanning from here %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:23:57: note: with "VF1" equal to "%0" %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:25:13: note: possible intended match here %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1) ^ /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:38:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[VF1]]) ^ <stdin>:74:57: note: scanning from here %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:74:57: note: with "VF1" equal to "%0" %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:76:13: note: possible intended match here %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1) ^ Input file: <stdin> Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c -dump-input=help explains the following input dump. Input was: <<<<<< 18: %cvi1 = alloca <8 x i16>, align 16 19: %r5 = alloca i64, align 8 20: store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 21: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 22: store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 23: %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:12'0 X error: no match found next:12'1 with "VF1" equal to "%0" 24: %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 25: %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1) next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:12'2 ? possible intended match 26: store float %rdx.min, float* %r1, align 4 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 27: %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 28: %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 29: %rdx.min1 = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %3) next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 30: store i16 %rdx.min1, i16* %r2, align 2 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 69: %cvi1 = alloca <8 x i16>, align 16 70: %r5 = alloca i64, align 8 71: store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 72: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 73: store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 74: %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:38'0 X error: no match found next:38'1 with "VF1" equal to "%0" 75: %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 76: %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1) next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:38'2 ? possible intended match 77: store float %rdx.min, float* %r1, align 4 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78: %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 79: %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80: %rdx.min1 = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %3) next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 81: store i16 %rdx.min1, i16* %r2, align 2 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>> Do you know why these will happen? @fhahn Comment Actions It looks like the argument expression is evaluated twice. Did you remove the Value *Op0 = EmitScalarExpr(E->getArg(0)); calls? Comment Actions Well, for example: case Builtin::BI__builtin_elementwise_abs: { Value *Op0 = EmitScalarExpr(E->getArg(0)); Value *Result; if (Op0->getType()->isIntOrIntVectorTy()) Result = Builder.CreateBinaryIntrinsic( llvm::Intrinsic::abs, Op0, Builder.getFalse(), nullptr, "elt.abs"); else Result = Builder.CreateUnaryIntrinsic(llvm::Intrinsic::fabs, Op0, nullptr, "elt.abs"); return RValue::get(Result); } If we use emitUnaryBuiltin, we are supposed to do something like: Result = emitUnaryBuiltin(*this, E, llvm::Intrinsic::fabs, "elt.abs"); We need to pass CallExpr* E to the function to meet its interface, and it calls CGF.EmitScalarExpr(E->getArg(0)); inside. Maybe this is what you said about the argument expression being evaluated twice? I think it is unavoidable if we don't change the function's interface. Maybe we can have something like: static Value *emitUnaryBuiltin(CodeGenFunction &CGF, Value* Op0, unsigned IntrinsicID, llvm::StringRef Name) { return CGF.Builder.CreateUnaryIntrinsic(IntrinsicID, Op0, nullptr, Name); } Then for __builtin_elementwise_abs we can have: Result = emitUnaryBuiltin(*this, Op0, llvm::Intrinsic::fabs, "elt.abs"); and for __builtin_elementwise_ceil have: return RValue::get( emitUnaryBuiltin(*this, EmitScalarExpr(E->getArg(0)), llvm::Intrinsic::ceil, "elt.ceil")); WDYT? Well, franking speaking I think this one-line function is ugly but I can't come up with a more elegant solution, I would appreciate it if you can offer some suggestions. :) Comment Actions In order to use emitUnaryBuiltin in other cases, I changed the function interface. Comment Actions I think we should extend & use the existing emitUnaryBuiltin. I think in most cases where it cannot be used straight away you should be able to slightly rewrite the existing code to not rely on llvm::Type. Then there should be no need to call EmitScalarExpr early (an example is in the inline comments)
Comment Actions Extend emitUnaryBuiltin instead of adding a new overload, also apply it to __builtin_elementwise_abs. Comment Actions Committed! After D115429, I'd recommend you obtain commit access https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access Comment Actions That's so great! Thank you for your continued patient guidance and the review work, I really appreciate! |
I think we should extend this emitUnaryBuiltin function, rather than having a second one.
e.g.