This patch extends emitUnaryBuiltin so that we can better emitting IR when
implement builtins specified in D111529.
Also contains some NFC, applying it to existing code.
Differential D116161
[Clang] Extend emitUnaryBuiltin to avoid duplicate logic. junaire on Dec 22 2021, 4:56 AM. Authored by
Details This patch extends emitUnaryBuiltin so that we can better emitting IR when Also contains some NFC, applying it to existing code.
Diff Detail
Event TimelineComment Actions I meant just update the existing uses *without* adding floor, roundeven, trunc, so this change should be NFC (non-functional change) Comment Actions
I don't know why but these will cause tests to fail. Comment Actions I confirmed that we can use emitUnaryBuiltin in the cases you pointed out. Please see the logs below: $ path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c -emit-llvm -disable-llvm-passes -o - | /path/to/llvm-project/build/bin/FileCheck /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c:16:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.fabs.f32(float [[F1]]) ^ <stdin>:35:43: note: scanning from here %0 = load float, float* %f1.addr, align 4 ^ <stdin>:35:43: note: with "F1" equal to "%0" %0 = load float, float* %f1.addr, align 4 ^ <stdin>:37:13: note: possible intended match here %elt.abs = call float @llvm.fabs.f32(float %1) ^ Input file: <stdin> Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-elementwise-math.c -dump-input=help explains the following input dump. Input was: <<<<<< 30: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 31: store <8 x i16> %vi2, <8 x i16>* %vi2.addr, align 16 32: store i64 %i1, i64* %i1.addr, align 8 33: store i64 %i2, i64* %i2.addr, align 8 34: store i16 %si, i16* %si.addr, align 2 35: %0 = load float, float* %f1.addr, align 4 next:16'0 X error: no match found next:16'1 with "F1" equal to "%0" 36: %1 = load float, float* %f1.addr, align 4 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 37: %elt.abs = call float @llvm.fabs.f32(float %1) next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:16'2 ? possible intended match 38: store float %elt.abs, float* %f2.addr, align 4 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 39: %2 = load double, double* %d1.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 40: %3 = load double, double* %d1.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 41: %elt.abs1 = call double @llvm.fabs.f64(double %3) next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 42: store double %elt.abs1, double* %d2.addr, align 8 next:16'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>> $ /path/to/llvm-project/build/bin/clang -cc1 -internal-isystem /path/to/llvm-project/build/lib/clang/14.0.0/include -nostdsysteminc -triple x86_64-apple-darwin ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c -emit-llvm -disable-llvm-passes -o - | ~/dev/cpp-projects/llvm-project/build/bin/FileCheck ~/dev/cpp-projects/llvm-project/clang/test/CodeGen/builtins-reduction-math.c /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:12:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.vector.reduce.fmax.v4f32(<4 x float> [[VF1]]) ^ <stdin>:23:57: note: scanning from here %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:23:57: note: with "VF1" equal to "%0" %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:25:13: note: possible intended match here %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1) ^ /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c:38:17: error: CHECK-NEXT: expected string not found in input // CHECK-NEXT: call float @llvm.vector.reduce.fmin.v4f32(<4 x float> [[VF1]]) ^ <stdin>:74:57: note: scanning from here %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:74:57: note: with "VF1" equal to "%0" %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 ^ <stdin>:76:13: note: possible intended match here %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1) ^ Input file: <stdin> Check file: /path/to/llvm-project/clang/test/CodeGen/builtins-reduction-math.c -dump-input=help explains the following input dump. Input was: <<<<<< 18: %cvi1 = alloca <8 x i16>, align 16 19: %r5 = alloca i64, align 8 20: store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 21: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 22: store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 23: %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:12'0 X error: no match found next:12'1 with "VF1" equal to "%0" 24: %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 25: %rdx.min = call float @llvm.vector.reduce.fmax.v4f32(<4 x float> %1) next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:12'2 ? possible intended match 26: store float %rdx.min, float* %r1, align 4 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 27: %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 28: %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 29: %rdx.min1 = call i16 @llvm.vector.reduce.smax.v8i16(<8 x i16> %3) next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 30: store i16 %rdx.min1, i16* %r2, align 2 next:12'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 69: %cvi1 = alloca <8 x i16>, align 16 70: %r5 = alloca i64, align 8 71: store <4 x float> %vf1, <4 x float>* %vf1.addr, align 16 72: store <8 x i16> %vi1, <8 x i16>* %vi1.addr, align 16 73: store <4 x i32> %vu1, <4 x i32>* %vu1.addr, align 16 74: %0 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:38'0 X error: no match found next:38'1 with "VF1" equal to "%0" 75: %1 = load <4 x float>, <4 x float>* %vf1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 76: %rdx.min = call float @llvm.vector.reduce.fmin.v4f32(<4 x float> %1) next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ next:38'2 ? possible intended match 77: store float %rdx.min, float* %r1, align 4 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78: %2 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 79: %3 = load <8 x i16>, <8 x i16>* %vi1.addr, align 16 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80: %rdx.min1 = call i16 @llvm.vector.reduce.smin.v8i16(<8 x i16> %3) next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 81: store i16 %rdx.min1, i16* %r2, align 2 next:38'0 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >>>>>> Do you know why these will happen? @fhahn Comment Actions It looks like the argument expression is evaluated twice. Did you remove the Value *Op0 = EmitScalarExpr(E->getArg(0)); calls? Comment Actions Well, for example: case Builtin::BI__builtin_elementwise_abs: { Value *Op0 = EmitScalarExpr(E->getArg(0)); Value *Result; if (Op0->getType()->isIntOrIntVectorTy()) Result = Builder.CreateBinaryIntrinsic( llvm::Intrinsic::abs, Op0, Builder.getFalse(), nullptr, "elt.abs"); else Result = Builder.CreateUnaryIntrinsic(llvm::Intrinsic::fabs, Op0, nullptr, "elt.abs"); return RValue::get(Result); } If we use emitUnaryBuiltin, we are supposed to do something like: Result = emitUnaryBuiltin(*this, E, llvm::Intrinsic::fabs, "elt.abs"); We need to pass CallExpr* E to the function to meet its interface, and it calls CGF.EmitScalarExpr(E->getArg(0)); inside. Maybe this is what you said about the argument expression being evaluated twice? I think it is unavoidable if we don't change the function's interface. Maybe we can have something like: static Value *emitUnaryBuiltin(CodeGenFunction &CGF, Value* Op0, unsigned IntrinsicID, llvm::StringRef Name) { return CGF.Builder.CreateUnaryIntrinsic(IntrinsicID, Op0, nullptr, Name); } Then for __builtin_elementwise_abs we can have: Result = emitUnaryBuiltin(*this, Op0, llvm::Intrinsic::fabs, "elt.abs"); and for __builtin_elementwise_ceil have: return RValue::get( emitUnaryBuiltin(*this, EmitScalarExpr(E->getArg(0)), llvm::Intrinsic::ceil, "elt.ceil")); WDYT? Well, franking speaking I think this one-line function is ugly but I can't come up with a more elegant solution, I would appreciate it if you can offer some suggestions. :) Comment Actions In order to use emitUnaryBuiltin in other cases, I changed the function interface. Comment Actions I think we should extend & use the existing emitUnaryBuiltin. I think in most cases where it cannot be used straight away you should be able to slightly rewrite the existing code to not rely on llvm::Type. Then there should be no need to call EmitScalarExpr early (an example is in the inline comments)
Comment Actions Extend emitUnaryBuiltin instead of adding a new overload, also apply it to __builtin_elementwise_abs. Comment Actions Committed! After D115429, I'd recommend you obtain commit access https://llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access Comment Actions That's so great! Thank you for your continued patient guidance and the review work, I really appreciate! |
I think we should extend this emitUnaryBuiltin function, rather than having a second one.
e.g.