r337347 added support for the Signal Processing Engine (SPE) to LLVM.
This follows that up with the clang side.
Not yet implemented: vectors, SPE builtins.
Differential D49754
Add -m(no-)spe, and e500 CPU definitions and support to clang. Authored by jhibbits on Jul 24 2018, 1:27 PM.
Hello,

Thank you for working on this. I tried the change and have a couple of suggestions:

Other than that, it appears that the implementation has several issues. Since parts of this work are not merged yet, I did not file a Bugzilla report and am including the information below instead.
struct a {
  long double b() const;
};
class c {
  struct C {
    C(long d) : e(d) {}
    long e;
    double f;
  };
  unsigned g;
  C h[5];
  unsigned i;
  void k(C d) {
    long j(g);
    if (j)
      h[i] = d;
  }
  c &fn(const a &);
};
c &c::fn(const a &d) {
  double l = d.b();
  k(l);
}

$ clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp
t.cpp:23:1: warning: control reaches end of non-void function
}
^
Stack dump:
0. Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 't.cpp'.
4. Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@_ZN1c2fnERK1a'
0 clang-7 0x000000010a6e722d llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1 clang-7 0x000000010a6e760d SignalHandler(int) + 185
2 libsystem_platform.dylib 0x00007fff6a4baf5a _sigtramp + 26
3 clang-7 0x000000010ac4948a llvm::DenseMapBase<llvm::SmallDenseMap<unsigned int, unsigned int, 8u, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >, unsigned int, unsigned int, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >::find(unsigned int const&) + 18
4 clang-7 0x000000010ac3e23e llvm::DAGTypeLegalizer::ExpandIntRes_SIGN_EXTEND(llvm::SDNode*, llvm::SDValue&, llvm::SDValue&) + 514
5 clang-7 0x000000010ac3a670 llvm::DAGTypeLegalizer::ExpandIntegerResult(llvm::SDNode*, unsigned int) + 1774
6 clang-7 0x000000010ac49793 llvm::DAGTypeLegalizer::run() + 531
7 clang-7 0x000000010ac4c1cd llvm::SelectionDAG::LegalizeTypes() + 57
8 clang-7 0x000000010acde3da llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 468
9 clang-7 0x000000010acdd8a9 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5521
10 clang-7 0x000000010acdbbc9 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1433
11 clang-7 0x0000000109e18962 (anonymous namespace)::PPCDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 70
12 clang-7 0x000000010a20d353 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 113
13 clang-7 0x000000010a393af4 llvm::FPPassManager::runOnFunction(llvm::Function&) + 338
14 clang-7 0x000000010a393c89 llvm::FPPassManager::runOnModule(llvm::Module&) + 49
15 clang-7 0x000000010a393f66 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 556
16 clang-7 0x000000010a81a77b clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 13071
17 clang-7 0x000000010a97e61c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 888
18 clang-7 0x000000010b14479c clang::ParseAST(clang::Sema&, bool, bool) + 458
19 clang-7 0x000000010aaff2b9 clang::FrontendAction::Execute() + 67
20 clang-7 0x000000010aad1ac0 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 664
21 clang-7 0x000000010ab3352c clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1267
22 clang-7 0x0000000109c49f00 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1138
23 clang-7 0x0000000109c4923d main + 7612
24 libdyld.dylib 0x00007fff6a1ac015 start + 1
25 libdyld.dylib 0x000000000000000b start + 2514829303
Segmentation fault: 11
int global_var1;
void test_func2(void) { }
void test_func(int a1, long long a2, void *a3, int *a4, int *a5) {
  int *ptr = &global_var1;
  if (ptr != 0) {
    volatile int v1[6] = {};
    *a5 = 0;
    test_func2();
    volatile int v2[5] = {};
  } else {
    volatile int v3[3] = {};
    *a5 = 0;
  }
}
void main(void) {
  int ret;
  test_func(0, 0, 0, 0, &ret);
}

The generated assembly is as follows:

$ clang -S -o main.S -c -target powerpc-gnu-linux-eabi -mcpu=e500 -mspe main.c
.Lfunc_begin1:
# %bb.0:
mflr 0
stw 0, 4(1)
stwu 1, -288(1)
stw 31, 284(1)
mr 31, 1
stw 30, 280(31) # 4-byte Folded Spill
evstdd 14, 128(31) # 8-byte Folded Spill
evstdd 15, 136(31) # 8-byte Folded Spill
evstdd 16, 144(31) # 8-byte Folded Spill
evstdd 17, 152(31) # 8-byte Folded Spill
evstdd 18, 160(31) # 8-byte Folded Spill
evstdd 19, 168(31) # 8-byte Folded Spill
evstdd 20, 176(31) # 8-byte Folded Spill
evstdd 21, 184(31) # 8-byte Folded Spill
evstdd 22, 192(31) # 8-byte Folded Spill
evstdd 23, 200(31) # 8-byte Folded Spill
evstdd 24, 208(31) # 8-byte Folded Spill
evstdd 25, 216(31) # 8-byte Folded Spill
evstdd 26, 224(31) # 8-byte Folded Spill
evstdd 27, 232(31) # 8-byte Folded Spill
evstdd 28, 240(31) # 8-byte Folded Spill
evstdd 29, 248(31) # 8-byte Folded Spill
evstdd 30, 256(31) # 8-byte Folded Spill
evstdd 31, 264(31) # 8-byte Folded Spill
…

The offset wraps around, and what is actually encoded for the last two instructions is effectively:

evstdd 30, 0(31)
evstdd 31, 8(31)

Hope this helps,

Hi Vit,

Thanks for the feedback. I can add -mspe=yes/no; that shouldn't be hard. I didn't include it because GCC has already deprecated it. I can add the -mcpu=8548 option as well. I use -mcpu=8540 on FreeBSD for the powerpcspe target anyway (GCC based).

Regarding the stack overwriting, that's very peculiar, because I explicitly mark the immediate as needing to be an 8-bit multiple-of-8 value, so the codegen logic should take that into account. I'm surprised it doesn't.
I've reproduced it myself as well, so I can take a closer look.

Thanks for the fix. I made a quick check of the mentioned patch, and it looks like it does solve the issue. However, besides the previous crash, which remains unfixed as of 7.0.1rc2, there is another instruction selection failure that may be caused by this change. I have not yet had a chance to investigate it properly, but here is an example if you are interested:

http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_701/rc2/lib/builtins/divdc3.c

clang -O3 -std=c11 -target powerpc-gnu-linux-eabi -ffreestanding -nostdlib -g -mcpu=e500 -mspe -femulated-tls -c divdc3.c -o divdc3.o
fatal error: error in backend: Cannot select: 0x7fcb8184c270: i64 = build_pair 0x7fcb8184c208, 0x7fcb8184c1a0, divdc3.c:24:22
0x7fcb8184c208: i32,ch,glue = CopyFromReg 0x7fcb8184c1a0:1, Register:i32 $r4, 0x7fcb8184c1a0:2, divdc3.c:24:22
0x7fcb81846678: i32 = Register $r4
0x7fcb8184c1a0: i32,ch,glue = CopyFromReg 0x7fcb8184c138, Register:i32 $r3, 0x7fcb8184c138:1, divdc3.c:24:22
0x7fcb818465a8: i32 = Register $r3
0x7fcb8184c138: ch,glue = callseq_end 0x7fcb8184c0d0, TargetConstant:i32<8>, TargetConstant:i32<0>, 0x7fcb8184c0d0:1, divdc3.c:24:22
0x7fcb81846408: i32 = TargetConstant<8>
0x7fcb81846470: i32 = TargetConstant<0>
0x7fcb8184c0d0: ch,glue = PPCISD::CALL 0x7fcb8184c000, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, 0x7fcb8184c000:1, divdc3.c:24:22
0x7fcb8184c068: i32 = TargetExternalSymbol'fmax' [TF=1]
0x7fcb818465a8: i32 = Register $r3
0x7fcb81846678: i32 = Register $r4
0x7fcb81849f48: i32 = Register $r5
0x7fcb8184a768: i32 = Register $r6
0x7fcb818467b0: Untyped = RegisterMask
0x7fcb8184c000: ch,glue = CopyToReg 0x7fcb8184a220, Register:i32 $r6, 0x7fcb81842200, 0x7fcb8184a220:1, divdc3.c:24:22
0x7fcb8184a768: i32 = Register $r6
0x7fcb81842200: i32 = truncate 0x7fcb8184c680, divdc3.c:24:22
0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
0x7fcb81846c28: i32 = FrameIndex<7>
0x7fcb81842818: i32 = undef
0x7fcb8184a220: ch,glue = CopyToReg 0x7fcb81849c70, Register:i32 $r5, 0x7fcb8184c5b0, 0x7fcb81849c70:1, divdc3.c:24:22
0x7fcb81849f48: i32 = Register $r5
0x7fcb8184c5b0: i32 = truncate 0x7fcb8184c2d8, divdc3.c:24:22
0x7fcb8184c2d8: i64 = srl 0x7fcb8184c680, Constant:i32<32>, divdc3.c:24:22
0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
0x7fcb81846fd0: i32 = Constant<32>
0x7fcb81849c70: ch,glue = CopyToReg 0x7fcb81846540, Register:i32 $r4, 0x7fcb818469b8, 0x7fcb81846540:1, divdc3.c:24:22
0x7fcb81846678: i32 = Register $r4
0x7fcb818469b8: i32 = truncate 0x7fcb8184c750, divdc3.c:24:22
0x7fcb8184c750: i64,ch = load<(load 8 from %stack.8)> 0x7fcb818462d0, FrameIndex:i32<8>, undef:i32, divdc3.c:24:22
0x7fcb81846540: ch,glue = CopyToReg 0x7fcb818470a0, Register:i32 $r3, 0x7fcb8184c6e8, divdc3.c:24:22
0x7fcb818465a8: i32 = Register $r3
0x7fcb8184c6e8: i32 = truncate 0x7fcb818463a0, divdc3.c:24:22
0x7fcb8184c1a0: i32,ch,glue = CopyFromReg 0x7fcb8184c138, Register:i32 $r3, 0x7fcb8184c138:1, divdc3.c:24:22
0x7fcb818465a8: i32 = Register $r3
0x7fcb8184c138: ch,glue = callseq_end 0x7fcb8184c0d0, TargetConstant:i32<8>, TargetConstant:i32<0>, 0x7fcb8184c0d0:1, divdc3.c:24:22
0x7fcb81846408: i32 = TargetConstant<8>
0x7fcb81846470: i32 = TargetConstant<0>
0x7fcb8184c0d0: ch,glue = PPCISD::CALL 0x7fcb8184c000, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, 0x7fcb8184c000:1, divdc3.c:24:22
0x7fcb8184c068: i32 = TargetExternalSymbol'fmax' [TF=1]
0x7fcb818465a8: i32 = Register $r3
0x7fcb81846678: i32 = Register $r4
0x7fcb81849f48: i32 = Register $r5
0x7fcb8184a768: i32 = Register $r6
0x7fcb818467b0: Untyped = RegisterMask
0x7fcb8184c000: ch,glue = CopyToReg 0x7fcb8184a220, Register:i32 $r6, 0x7fcb81842200, 0x7fcb8184a220:1, divdc3.c:24:22
0x7fcb8184a768: i32 = Register $r6
0x7fcb81842200: i32 = truncate 0x7fcb8184c680, divdc3.c:24:22
0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
0x7fcb81846c28: i32 = FrameIndex<7>
0x7fcb81842818: i32 = undef
0x7fcb8184a220: ch,glue = CopyToReg 0x7fcb81849c70, Register:i32 $r5, 0x7fcb8184c5b0, 0x7fcb81849c70:1, divdc3.c:24:22
0x7fcb81849f48: i32 = Register $r5
0x7fcb8184c5b0: i32 = truncate 0x7fcb8184c2d8, divdc3.c:24:22
0x7fcb8184c2d8: i64 = srl 0x7fcb8184c680, Constant:i32<32>, divdc3.c:24:22
0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
0x7fcb81846c28: i32 = FrameIndex<7>
0x7fcb81842818: i32 = undef
0x7fcb81846fd0: i32 = Constant<32>
0x7fcb81849c70: ch,glue = CopyToReg 0x7fcb81846540, Register:i32 $r4, 0x7fcb818469b8, 0x7fcb81846540:1, divdc3.c:24:22
0x7fcb81846678: i32 = Register $r4
0x7fcb818469b8: i32 = truncate 0x7fcb8184c750, divdc3.c:24:22
0x7fcb8184c750: i64,ch = load<(load 8 from %stack.8)> 0x7fcb818462d0, FrameIndex:i32<8>, undef:i32, divdc3.c:24:22
0x7fcb81846268: i32 = FrameIndex<8>
0x7fcb81842818: i32 = undef
0x7fcb81846540: ch,glue = CopyToReg 0x7fcb818470a0, Register:i32 $r3, 0x7fcb8184c6e8, divdc3.c:24:22
0x7fcb818465a8: i32 = Register $r3
0x7fcb8184c6e8: i32 = truncate 0x7fcb818463a0, divdc3.c:24:22
0x7fcb818463a0: i64 = srl 0x7fcb8184c750, Constant:i32<32>, divdc3.c:24:22
In function: __divdc3
clang-7: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 7.0.1 (tags/RELEASE_701/rc2 347035)
Target: powerpc-gnu-linux-eabi
Thread model: posix
InstalledDir: /llvm/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: ********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/divdc3-042ad6.c
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/divdc3-042ad6.sh
clang-7: note: diagnostic msg: Crash backtrace is located in
clang-7: note: diagnostic msg: /Library/Logs/DiagnosticReports/clang-7_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-7: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang-7: note: diagnostic msg: ********************

Hi Vit,

I'm unable to reproduce the previous crash with clang 8, so for now I'm assuming it was fixed by other changes. If that turns out not to be the case, I can revisit it. I am able to reproduce this crash, though, so I'm taking a look. I'm not quite sure how to interpret it, since i64 is supposed to be a synthetic type on this platform, so the normal expansion should be taking care of it. I'll see what I can find. Thanks for your testing!

I have applied this patch to the llvm-toolchain-7 package in Debian and did not see any regressions on x86_64 or 32-bit PowerPC. Additionally, I have included the patches from https://reviews.llvm.org/D54409 and https://reviews.llvm.org/D54583 and saw no regressions on x86_64 and 32-bit PowerPC. All three patches will be part of the next upload of the llvm-toolchain-7 package in Debian unstable, which will be version 1:7.0.1~+rc2-9.

Does the first crash from https://reviews.llvm.org/D49754#1183753 still reproduce for you, or have you perhaps already bisected trunk to find the changeset that fixes it?
Yes, it does on Debian unstable 32-bit PowerPC with the patches against LLVM 7; I didn't test LLVM 8:

root@kapitsa:/srv/llvm# clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp
t.cpp:23:1: warning: control reaches end of non-void function
}
^
Stack dump:
0. Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 't.cpp'.
4. Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@_ZN1c2fnERK1a'
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x40)[0xcc684a4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x81886c)[0xcc6886c]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm3sys17RunSignalHandlersEv+0xd8)[0xcc65b68]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x818a8c)[0xcc68a8c]
linux-vdso32.so.1(__kernel_sigtramp32+0x0)[0x100424]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xe157dc)[0xd2657dc]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xdfdab0)[0xd24dab0]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xdf7224)[0xd247224]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xe10014)[0xd260014]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm12SelectionDAG13LegalizeTypesEv+0x244)[0xd265ba4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv+0x238)[0xd36a39c]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb+0x1fc)[0xd36a048]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE+0x21dc)[0xd368af0]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE+0x778)[0xd3659cc]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x24a5a00)[0xe8f5a00]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0xcc)[0xcf98948]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x1e8)[0xcdc0e18]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x60)[0xcdc13d4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x390)[0xcdc1960]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm6legacy11PassManager3runERNS_6ModuleE+0x14)[0xcdc220c]
clang-7(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EE+0x32b4)[0x1023f644]
clang-7[0x1074857c]
clang-7(_ZN5clang8ParseASTERNS_4SemaEbb+0x2b4)[0x10c3da9c]
clang-7(_ZN5clang17ASTFrontendAction13ExecuteActionEv+0xac)[0x106a2a84]
clang-7(_ZN5clang13CodeGenAction13ExecuteActionEv+0x390)[0x107472f4]
clang-7(_ZN5clang14FrontendAction7ExecuteEv+0x6c)[0x106a2408]
clang-7(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x638)[0x10659aa4]
clang-7(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x748)[0x10743ba0]
clang-7(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0x6bc)[0x101ee2c4]
clang-7(main+0x2920)[0x101ec550]
/lib/powerpc-linux-gnu/libc.so.6(+0x22558)[0xbef2558]
/lib/powerpc-linux-gnu/libc.so.6(__libc_start_main+0xe8)[0xbef2748]
Segmentation fault
root@kapitsa:/srv/llvm#

OK, I found the fix for the first crash that landed in 8.0 trunk. It works fine for me when backported to 7.0.1:

Hi vit,

I found what's going on with the "Cannot Select" error. divdc3.c contains code that gets lowered to a libcall.
However, this lowering does not go any further, down to the operations that are legal on this target. The debug snippet is:

Legalizing: t38: f64 = fmaxnum t36, t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Op: t36: f64 = fabs t79, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Op: t37: f64 = fabs t80, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Trying to expand node
Cannot expand node
Trying to convert node to libcall
Creating new node: t99: i64 = bitcast t36, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t100: i32 = extract_element t99, Constant:i32<1>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t101: i32 = extract_element t99, Constant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t102: i64 = bitcast t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t103: i32 = extract_element t102, Constant:i32<1>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t104: i32 = extract_element t102, Constant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t105: ch,glue = callseq_start t0, TargetConstant:i32<8>, TargetConstant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t107: ch,glue = CopyToReg t105, Register:i32 $r3, t100, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t108: ch,glue = CopyToReg t107, Register:i32 $r4, t101, t107:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t110: ch,glue = CopyToReg t108, Register:i32 $r5, t103, t108:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t112: ch,glue = CopyToReg t110, Register:i32 $r6, t104, t110:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t114: ch,glue = PPCISD::CALL t112, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, t112:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t115: ch,glue = callseq_end t114, TargetConstant:i32<8>, TargetConstant:i32<0>, t114:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t116: i32,ch,glue = CopyFromReg t115, Register:i32 $r3, t115:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t117: i32,ch,glue = CopyFromReg t116:1, Register:i32 $r4, t116:2, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t118: i64 = build_pair t117, t116, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Created libcall: t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Successfully converted node to libcall
... replacing: t38: f64 = fmaxnum t36, t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
with: t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22

The last 3 lines are the key. It's converting the node to a bitcast libcall, which ends up yielding 'f64 bitcast (i64 build_pair i32, i32)' instead of lowering further to 'f64 build_spe64 i32, i32'.

@jhibbits sorry for not answering earlier; I was occupied with the New Year holidays and then was busy travelling. Trying to sync to your latest changes, I merged the updated https://reviews.llvm.org/D54583 into my local copy with your fixes to libcall expansion. I narrowed it down to a smaller example that reproduces the issue.
Does it reproduce for you locally?

void f1(long double v) { }
void f2(void* v, __builtin_va_list arg) {
  f1(__builtin_va_arg(arg, long double));
}

$ clang -c sample.c -target powerpc-gnu-linux-eabi -ffreestanding -nostdlib -femulated-tls -mcpu=8548 -mspe
Stack dump:
0. Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name sample.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -ffreestanding -fuse-init-array -target-cpu ppc -target-feature +spe -mfloat-abi hard -dwarf-column-info -debugger-tuning=gdb -target-linker-version 409.12 -coverage-notes-file Desktop/sample/sample.gcno -resource-dir /llvm/llvm-7.0.1-Darwin-x86_64/lib/clang/7.0.1 -internal-isystem /usr/local/include -internal-isystem /llvm/llvm-7.0.1-Darwin-x86_64/lib/clang/7.0.1/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdebug-compilation-dir Desktop/sample -ferror-limit 19 -fmessage-length 265 -femulated-tls -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -o sample.o -x c Desktop/sample/sample.c -faddrsig
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'Desktop/sample/sample.c'.
4. Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@f2'
0 clang-7 0x00000001099af90a llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1 clang-7 0x00000001099afcfa SignalHandler(int) + 200
2 libsystem_platform.dylib 0x00007fff52975f5a _sigtramp + 26
3 libsystem_platform.dylib 0x00007fe219f0eec0 _sigtramp + 3344535424
4 clang-7 0x00000001091093e1 llvm::PPCTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 867
5 clang-7 0x0000000109f55b7a llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const + 3856
6 clang-7 0x0000000109f686fd llvm::SelectionDAGBuilder::lowerInvokable(llvm::TargetLowering::CallLoweringInfo&, llvm::BasicBlock const*) + 357
7 clang-7 0x0000000109f5ad64 llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::BasicBlock const*) + 1178
8 clang-7 0x0000000109f4e05a llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 1016
9 clang-7 0x0000000109f474c0 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 114
10 clang-7 0x0000000109fa76d3 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, bool&) + 177
11 clang-7 0x0000000109fa6dde llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5532
12 clang-7 0x0000000109fa50f4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1474
13 clang-7 0x00000001090e0bcc (anonymous namespace)::PPCDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 70
14 clang-7 0x00000001094d4785 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 117
15 clang-7 0x000000010965c36b llvm::FPPassManager::runOnFunction(llvm::Function&) + 335
16 clang-7 0x000000010965c505 llvm::FPPassManager::runOnModule(llvm::Module&) + 49
17 clang-7 0x000000010965c7e6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 558
18 clang-7 0x0000000109ae2839 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 12981
19 clang-7 0x0000000109c48ea2 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 864
20 clang-7 0x000000010a40bbbf clang::ParseAST(clang::Sema&, bool, bool) + 458
21 clang-7 0x0000000109dcb7d7 clang::FrontendAction::Execute() + 69
22 clang-7 0x0000000109d9d9fc clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 664
23 clang-7 0x0000000109dfea50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1265
24 clang-7 0x0000000108f1238a cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1140
25 clang-7 0x0000000108f115a4 main + 7638
26 libdyld.dylib 0x00007fff52667015 start + 1
27 libdyld.dylib 0x000000000000003c start + 2912522280
clang-7: error: unable to execute command: Segmentation fault: 11
clang-7: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 7.0.1 (tags/RELEASE_701/final 351415)
Target: powerpc-gnu-linux-eabi
Thread model: posix
InstalledDir: /llvm/toolchain/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: ********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/sample-e182be.c
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/sample-e182be.sh
clang-7: note: diagnostic msg: Crash backtrace is located in
clang-7: note: diagnostic msg: /Library/Logs/DiagnosticReports/clang-7_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-7: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang-7: note: diagnostic msg: ********************

Hi @vit9696,

This looks to be caused by using a 128-bit long double on the platform. Does Linux really use 128-bit long doubles on ppc32? FreeBSD uses a 64-bit long double, so compiling that with '-target powerpc-gnu-freebsd' works fine. I'm not sure how to handle the 128-bit values.

Actually, I am not sure about Linux, since this is bare metal, and I just used what fitted. However, it does not look like the long double width (128-bit vs. 64-bit) is related.

Hi @vit9696, it does crash with the Linux target (powerpc-gnu-linux), but is fine with powerpc-gnu-freebsd.

You are right; I had to modify it like this to get the crash with the FreeBSD triple:

void f1(long double v, void *a) { }
void f2(void* v, __builtin_va_list arg) {
  f1(__builtin_va_arg(arg, long double), 0);
}

Hi @vit9696, thanks for that; it was a straightforward fix. I'll post an update shortly for D54583, if arcanist cooperates. The short of it is that I need two indices for arguments: one for logical arguments and the other for physical register allocation. I was only using one, based on physical register allocation.

Justin, I am currently testing the latest patches, and so far it looks very good. Thanks.
} else if (Feature == "+spe") {
  HasSPE = true;
  LongDoubleWidth = LongDoubleAlign = 64;
  LongDoubleFormat = &llvm::APFloat::IEEEdouble();
}

@jhibbits it appears that va_list is not functional with SPE (va_arg returns garbage for double, and things like printf do not work). Is this expected, or am I missing a patch?

@vit9696 I have been working on that issue for 3 days and found nothing... PPCISelLowering.cpp has 2 functions: LowerVASTART() and LowerVAARG(). LowerVASTART is correctly called (it stores the GPRs to the internal va_list structure), but LowerVAARG is never called and I don't understand why. The generated code looks exactly like what the LowerVAARG source would produce, so it must be generated somewhere else.

typedef __builtin_va_list va_list;
double a;
long l = 0;
void pr(char *txt, ...) {
  va_list vp;
  __builtin_va_start(vp, txt);
  a = __builtin_va_arg(vp, double);
  l = __builtin_va_arg(vp, long);
  __builtin_va_end(vp);
}

@vit9696 if you like, you can contact me directly, so that we can coordinate our work on the SPE. thomsen@microsys.de

Right, I noticed the same thing yesterday. There is an override calling LowerVAARG for 64-bit integers, but nothing does the lowering for the other types. I believe it is derived somewhere from the calling conventions in the .td files. I will check it out later this evening and mail you if I find anything (vit9696 at avp dot su).

The function responsible for this va_arg is not in lib/Target/PowerPC/*.cpp; it is in tools/clang/lib/CodeGen/TargetInfo.cpp, which was a little unexpected to me.

Thanks for pointing it out. You could use hasFeature from there during construction:

return SetCGInfo(
    new PPC32TargetCodeGenInfo(Types, CodeGenOpts.FloatABI == "soft" ||
                                          getTarget().hasFeature("spe")));

It works for me, but a separate argument would probably be best (or at least the current one should be renamed).
With this modification for SPE in VAARG, I was able to compile all OS-9 libraries for SPE and test them with Whetstone. The Whetstone results are the same as with a real FPU, and they are printed correctly with printf.

case llvm::Triple::ppc:
  return SetCGInfo(
      new PPC32TargetCodeGenInfo(Types, (CodeGenOpts.FloatABI == "soft") ||
                                            getTarget().hasFeature("spe")));
case llvm::Triple::ppc64:

This makes va_arg() fetch the parameter correctly. We should not rename isSoftFloatABI to something else, because this is just a one-line code change compared to more than 12 lines. Also, the naming is still correct and not really confusing.

Hi Kei, that's fantastic! There's one more thing to add to this, which is to predefine __NO_FPRS__, and then it should be a good replacement for GCC in 90+% of cases. I'll add your changes and this, and resubmit this review. Thanks for all your help!

That's pretty good. Do you think it is possible to land it in the 8.0 release? @hfinkel? @jhibbits could you please include this change too: https://reviews.llvm.org/D49754#1364865? We would prefer to continue using the Linux target to be able to use LLVM sanitizers.

@vit9696 sure thing. We'll need to get all these patches in together before any of them are actually useful.

@jhibbits, @kthomsen, it appears that the current patch set mishandles && for me. I have it applied over LLVM 8.0.0 rc2, and the following code prints 0 for me at -O2 and below:

#include <stdbool.h>
#include <stdio.h>
#define FEQUAL(x,y) (((x) - (y)) < 0.000001) // could put fabs if needed
typedef struct {
  float x;
  float y;
} float2;
static bool __attribute__((noinline)) equals(float2* f40, float2* f41) {
  return FEQUAL(f40->x, f41->x) && FEQUAL(f40->y, f41->y);
}
int main() {
  float2 a = {0.721569, 0.1234};
  float2 b = {0.721569, 0.1234};
  printf("%d\n", equals(&a, &b));
}

Does it reproduce for you?
GCC output (working):

# _Bool __fastcall equals(float2 *f40, float2 *f41)
.globl equals
equals:
lwz r9, 0(r3)
lis r10, -0x7FFF
lwz r8, 0(r4)
addi r10, r10, -0x7198 # 0x80008E68
evldd r10, 0(r10)
efssub r9, r9, r8
efdcfs r9, r9
efdcmplt cr7, r9, r10
ble cr7, loc_80000048
lwz r3, 4(r3)
lwz r9, 4(r4)
efssub r3, r3, r9
efdcfs r3, r3
efdcmplt cr7, r3, r10
mfcr r3
extrwi r3, r3, 1,29
clrlwi r3, r3, 24
blr
loc_80000048:
li r3, 0
blr

LLVM output (not working):

# _Bool __fastcall equals(float2 *f40, float2 *f41)
.globl equals
equals:
lwz r6, 0(r3)
lwz r7, 0(r4)
li r5, -0x6160
lis r8, -0x7FFF
evlddx r5, r8, r5
efssub r6, r6, r7
efdcfs r6, r6
efdcmplt cr0, r6, r5
bge loc_800002B0
lwz r3, 4(r3)
lwz r4, 4(r4)
efssub r3, r3, r4
efdcfs r3, r3
efdcmplt cr0, r3, r5
b loc_800002B4
loc_800002B0:
crclr gt
loc_800002B4:
li r3, 0
li r4, 1
bgt loc_800002C4
blr
loc_800002C4:
addi r3, r4, 0
blr

There is a long series of comments in this patch, and at this point I am not clear on whether this patch breaks anything or is fine. Could you please Request Changes if this patch is broken, or approve it if it is fine?

This is a series of patches, which I believe should be merged together. Currently the following patches are relevant:
The patches are intended to add PowerPC SPE support, and they do not seem to break anything else. Initially I wanted them merged into 8.x, but extensive local testing uncovered a number of issues, so we missed that window. I believe all four patches are essentially ready for merging, aside from the following:
The decision should be made depending on the above. Meanwhile, a test should be added for __NO_FPRS__ near PPC32-SPE:#define __SPE__ 1 now that we define it.

Please, no more patches without context. This one was actually easy to review without context and the comments are minor, so I'm fine with these being addressed on commit.
No, please don't merge them together. It is much more manageable for review when they're separate patches. I realize that this makes it a bit more difficult for the author to keep the dependency ordering straight, but I think preference needs to be given to the "reviewability" of the code.

@nemanjai, sorry, by merging I meant committing to LLVM upstream.
Right, OK.

I have been testing this for quite some time now, including maths, and so far have had no issues. Can this get merged into 9.0? I do not believe there are enough obstacles to postpone it any further. Thanks!

I'll commit it tonight. I was going to last night, but ran into a clang test failure that turned out to be a long-standing failure with FreeBSD/powerpc64, not a problem with my change.

Actually, I'm not yet ready to commit this. I want to enforce the 8548 -> e500 processor model before I call this ready. How do I do that with -mcpu?

Not sure I understood you: don't you already have the common feature switch, spe, in the Features["spe"] line for that?

Made the '8548' CPU designation just a stub, to be filled out later. I added it

Not in 9.0, but I will try to push for it in 9.0.1. To others, for posterity: I pushed *only* the SPE subset, not the 8548 CPU component. I wanted that part to be more than just a dummy stub, so it will be a separate commit. The part committed was already reviewed and approved by @nemanjai.