This is an archive of the discontinued LLVM Phabricator instance.

Add -m(no-)spe, and e500 CPU definitions and support to clang
ClosedPublic

Authored by jhibbits on Jul 24 2018, 1:27 PM.

Details

Summary

r337347 added support for the Signal Processing Engine (SPE) to LLVM.
This follows that up with the clang side.

Not yet implemented: vectors, SPE builtins.

Diff Detail

Repository
rL LLVM

Event Timeline

jhibbits created this revision. Jul 24 2018, 1:27 PM
jhibbits edited the summary of this revision. Jul 24 2018, 1:30 PM
jhibbits added reviewers: nemanjai, joerg.

Hello,

Thank you for working on this. I tried the change and have a couple of suggestions:

  1. The -mspe option in GCC works like -mspe=yes or -mspe=no. While the approach you took (-mno-spe and -mspe) makes sense, it would be nice to at least have an alias for compiler compatibility.
  2. One known CPU with SPE support is the MPC8548 (https://www.nxp.com/docs/en/reference-manual/MPC8548ERM.pdf). It would be nice to have it recognised in the -mcpu argument as 8548, as in GCC.

Other than that, the implementation appears to have several issues. Since this patch is not fully merged yet, I did not use Bugzilla but am including the information below.

  1. The following code makes the compiler crash:
struct a {
  long double b() const;
};
class c {
  struct C {
    C(long d) : e(d) {}
    long e;
    double f;
  };
  unsigned g;
  C h[5];
  unsigned i;
  void k(C d) {
    long j(g);
    if (j)
      h[i] = d;
  }
  c &fn(const a &);
};
c &c::fn(const a &d) {
  double l = d.b();
  k(l);
}
$ clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2  t.cpp
t.cpp:23:1: warning: control reaches end of non-void function
}
^
Stack dump:
0.	Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp 
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 't.cpp'.
4.	Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@_ZN1c2fnERK1a'
0  clang-7                  0x000000010a6e722d llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1  clang-7                  0x000000010a6e760d SignalHandler(int) + 185
2  libsystem_platform.dylib 0x00007fff6a4baf5a _sigtramp + 26
3  clang-7                  0x000000010ac4948a llvm::DenseMapBase<llvm::SmallDenseMap<unsigned int, unsigned int, 8u, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >, unsigned int, unsigned int, llvm::DenseMapInfo<unsigned int>, llvm::detail::DenseMapPair<unsigned int, unsigned int> >::find(unsigned int const&) + 18
4  clang-7                  0x000000010ac3e23e llvm::DAGTypeLegalizer::ExpandIntRes_SIGN_EXTEND(llvm::SDNode*, llvm::SDValue&, llvm::SDValue&) + 514
5  clang-7                  0x000000010ac3a670 llvm::DAGTypeLegalizer::ExpandIntegerResult(llvm::SDNode*, unsigned int) + 1774
6  clang-7                  0x000000010ac49793 llvm::DAGTypeLegalizer::run() + 531
7  clang-7                  0x000000010ac4c1cd llvm::SelectionDAG::LegalizeTypes() + 57
8  clang-7                  0x000000010acde3da llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 468
9  clang-7                  0x000000010acdd8a9 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5521
10 clang-7                  0x000000010acdbbc9 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1433
11 clang-7                  0x0000000109e18962 (anonymous namespace)::PPCDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 70
12 clang-7                  0x000000010a20d353 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 113
13 clang-7                  0x000000010a393af4 llvm::FPPassManager::runOnFunction(llvm::Function&) + 338
14 clang-7                  0x000000010a393c89 llvm::FPPassManager::runOnModule(llvm::Module&) + 49
15 clang-7                  0x000000010a393f66 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 556
16 clang-7                  0x000000010a81a77b clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 13071
17 clang-7                  0x000000010a97e61c clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 888
18 clang-7                  0x000000010b14479c clang::ParseAST(clang::Sema&, bool, bool) + 458
19 clang-7                  0x000000010aaff2b9 clang::FrontendAction::Execute() + 67
20 clang-7                  0x000000010aad1ac0 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 664
21 clang-7                  0x000000010ab3352c clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1267
22 clang-7                  0x0000000109c49f00 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1138
23 clang-7                  0x0000000109c4923d main + 7612
24 libdyld.dylib            0x00007fff6a1ac015 start + 1
25 libdyld.dylib            0x000000000000000b start + 2514829303
Segmentation fault: 11
  2. Several instructions seem to fail at variable allocation due to an improper type. For example, the following code will overwrite variables on the stack and crash:
int global_var1;

void test_func2(void) { }

void test_func(int a1, long long a2, void *a3, int *a4, int *a5) {
    int *ptr = &global_var1;

    if (ptr != 0) {
        volatile int v1[6] = {};
        *a5 = 0;
        test_func2();
        volatile int v2[5] = {};
    } else {
        volatile int v3[3] = {};
        *a5 = 0;
    }
}

void main(void) {
    int ret;
    test_func(0, 0, 0, 0, &ret);
}

The assembly generated is as follows:

$ clang -S -o main.S -c -target powerpc-gnu-linux-eabi -mcpu=e500 -mspe main.c
.Lfunc_begin1:
# %bb.0:
	mflr 0
	stw 0, 4(1)
	stwu 1, -288(1)
	stw 31, 284(1)
	mr 31, 1
	stw 30, 280(31)                 # 4-byte Folded Spill
	evstdd 14, 128(31)              # 8-byte Folded Spill
	evstdd 15, 136(31)              # 8-byte Folded Spill
	evstdd 16, 144(31)              # 8-byte Folded Spill
	evstdd 17, 152(31)              # 8-byte Folded Spill
	evstdd 18, 160(31)              # 8-byte Folded Spill
	evstdd 19, 168(31)              # 8-byte Folded Spill
	evstdd 20, 176(31)              # 8-byte Folded Spill
	evstdd 21, 184(31)              # 8-byte Folded Spill
	evstdd 22, 192(31)              # 8-byte Folded Spill
	evstdd 23, 200(31)              # 8-byte Folded Spill
	evstdd 24, 208(31)              # 8-byte Folded Spill
	evstdd 25, 216(31)              # 8-byte Folded Spill
	evstdd 26, 224(31)              # 8-byte Folded Spill
	evstdd 27, 232(31)              # 8-byte Folded Spill
	evstdd 28, 240(31)              # 8-byte Folded Spill
	evstdd 29, 248(31)              # 8-byte Folded Spill
	evstdd 30, 256(31)              # 8-byte Folded Spill
	evstdd 31, 264(31)              # 8-byte Folded Spill

The offset wraps around and what is actually encoded for the last two instructions is effectively:

evstdd 30, 0(31)
evstdd 30, 8(31)
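
The wrap-around can be reproduced with a little arithmetic. The sketch below assumes that the evstdd displacement is a 5-bit unsigned field scaled by 8 (so only 0..248 is directly encodable); the helper names are hypothetical and are not part of LLVM.

```c
#include <assert.h>

/* Hypothetical helper: the byte offset that a 5-bit, scaled-by-8
   displacement field ends up encoding when asked for `requested` bytes.
   Assumption: the field keeps only the low 5 bits of requested / 8. */
unsigned evstdd_encoded_offset(unsigned requested)
{
    assert(requested % 8u == 0);              /* offsets must be multiples of 8 */
    unsigned field = (requested / 8u) & 0x1Fu; /* only 5 bits survive */
    return field * 8u;
}

/* Returns 1 if the offset fits the field without wrapping, 0 otherwise. */
int evstdd_offset_is_encodable(unsigned requested)
{
    return requested % 8u == 0 && requested <= 248u;
}
```

Under that assumption, offsets 256 and 264 map to 0 and 8, matching the two encodings shown above, while 248 (the spill of r29) is the last offset that survives intact.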

Hope this helps,
Vit

Hi Vit,

Thanks for the feedback. I can add the -mspe=yes/no, that shouldn't be hard. I didn't include it because it's been deprecated by GCC already as well. I can add the -mcpu=8548 option as well. I use -mcpu=8540 on FreeBSD for the powerpcspe target anyway (GCC based).

Regarding the stack overwriting, that's very peculiar, because I explicitly mark the immediate as needing to be an 8-bit multiple-of-8 value, so the codegen logic should take that into account. I'm surprised it's not. I've reproduced it as well myself, so I can take a closer look.

Hi Vit,

The register spilling bug is being addressed in D54409 now.

Thanks for the fix. I made a quick check of the mentioned patch, and it looks like it does solve the issue. However, besides the previous crash, which remains unfixed as of 7.0.1rc2, there is another instruction selection failure that may be caused by the change. I have not yet had a chance to research it properly, but here is an example if you are interested: http://llvm.org/svn/llvm-project/compiler-rt/tags/RELEASE_701/rc2/lib/builtins/divdc3.c

clang -O3 -std=c11  -target powerpc-gnu-linux-eabi -ffreestanding -nostdlib -g -mcpu=e500 -mspe -femulated-tls -c divdc3.c -o divdc3.o
fatal error: error in backend: Cannot select: 0x7fcb8184c270: i64 = build_pair 0x7fcb8184c208, 0x7fcb8184c1a0, divdc3.c:24:22
  0x7fcb8184c208: i32,ch,glue = CopyFromReg 0x7fcb8184c1a0:1, Register:i32 $r4, 0x7fcb8184c1a0:2, divdc3.c:24:22
    0x7fcb81846678: i32 = Register $r4
    0x7fcb8184c1a0: i32,ch,glue = CopyFromReg 0x7fcb8184c138, Register:i32 $r3, 0x7fcb8184c138:1, divdc3.c:24:22
      0x7fcb818465a8: i32 = Register $r3
      0x7fcb8184c138: ch,glue = callseq_end 0x7fcb8184c0d0, TargetConstant:i32<8>, TargetConstant:i32<0>, 0x7fcb8184c0d0:1, divdc3.c:24:22
        0x7fcb81846408: i32 = TargetConstant<8>
        0x7fcb81846470: i32 = TargetConstant<0>
        0x7fcb8184c0d0: ch,glue = PPCISD::CALL 0x7fcb8184c000, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, 0x7fcb8184c000:1, divdc3.c:24:22
          0x7fcb8184c068: i32 = TargetExternalSymbol'fmax' [TF=1]
          0x7fcb818465a8: i32 = Register $r3
          0x7fcb81846678: i32 = Register $r4
          0x7fcb81849f48: i32 = Register $r5
          0x7fcb8184a768: i32 = Register $r6
          0x7fcb818467b0: Untyped = RegisterMask
          0x7fcb8184c000: ch,glue = CopyToReg 0x7fcb8184a220, Register:i32 $r6, 0x7fcb81842200, 0x7fcb8184a220:1, divdc3.c:24:22
            0x7fcb8184a768: i32 = Register $r6
            0x7fcb81842200: i32 = truncate 0x7fcb8184c680, divdc3.c:24:22
              0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
                0x7fcb81846c28: i32 = FrameIndex<7>
                0x7fcb81842818: i32 = undef
            0x7fcb8184a220: ch,glue = CopyToReg 0x7fcb81849c70, Register:i32 $r5, 0x7fcb8184c5b0, 0x7fcb81849c70:1, divdc3.c:24:22
              0x7fcb81849f48: i32 = Register $r5
              0x7fcb8184c5b0: i32 = truncate 0x7fcb8184c2d8, divdc3.c:24:22
                0x7fcb8184c2d8: i64 = srl 0x7fcb8184c680, Constant:i32<32>, divdc3.c:24:22
                  0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22


                  0x7fcb81846fd0: i32 = Constant<32>
              0x7fcb81849c70: ch,glue = CopyToReg 0x7fcb81846540, Register:i32 $r4, 0x7fcb818469b8, 0x7fcb81846540:1, divdc3.c:24:22
                0x7fcb81846678: i32 = Register $r4
                0x7fcb818469b8: i32 = truncate 0x7fcb8184c750, divdc3.c:24:22
                  0x7fcb8184c750: i64,ch = load<(load 8 from %stack.8)> 0x7fcb818462d0, FrameIndex:i32<8>, undef:i32, divdc3.c:24:22


                0x7fcb81846540: ch,glue = CopyToReg 0x7fcb818470a0, Register:i32 $r3, 0x7fcb8184c6e8, divdc3.c:24:22
                  0x7fcb818465a8: i32 = Register $r3
                  0x7fcb8184c6e8: i32 = truncate 0x7fcb818463a0, divdc3.c:24:22

  0x7fcb8184c1a0: i32,ch,glue = CopyFromReg 0x7fcb8184c138, Register:i32 $r3, 0x7fcb8184c138:1, divdc3.c:24:22
    0x7fcb818465a8: i32 = Register $r3
    0x7fcb8184c138: ch,glue = callseq_end 0x7fcb8184c0d0, TargetConstant:i32<8>, TargetConstant:i32<0>, 0x7fcb8184c0d0:1, divdc3.c:24:22
      0x7fcb81846408: i32 = TargetConstant<8>
      0x7fcb81846470: i32 = TargetConstant<0>
      0x7fcb8184c0d0: ch,glue = PPCISD::CALL 0x7fcb8184c000, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, 0x7fcb8184c000:1, divdc3.c:24:22
        0x7fcb8184c068: i32 = TargetExternalSymbol'fmax' [TF=1]
        0x7fcb818465a8: i32 = Register $r3
        0x7fcb81846678: i32 = Register $r4
        0x7fcb81849f48: i32 = Register $r5
        0x7fcb8184a768: i32 = Register $r6
        0x7fcb818467b0: Untyped = RegisterMask
        0x7fcb8184c000: ch,glue = CopyToReg 0x7fcb8184a220, Register:i32 $r6, 0x7fcb81842200, 0x7fcb8184a220:1, divdc3.c:24:22
          0x7fcb8184a768: i32 = Register $r6
          0x7fcb81842200: i32 = truncate 0x7fcb8184c680, divdc3.c:24:22
            0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
              0x7fcb81846c28: i32 = FrameIndex<7>
              0x7fcb81842818: i32 = undef
          0x7fcb8184a220: ch,glue = CopyToReg 0x7fcb81849c70, Register:i32 $r5, 0x7fcb8184c5b0, 0x7fcb81849c70:1, divdc3.c:24:22
            0x7fcb81849f48: i32 = Register $r5
            0x7fcb8184c5b0: i32 = truncate 0x7fcb8184c2d8, divdc3.c:24:22
              0x7fcb8184c2d8: i64 = srl 0x7fcb8184c680, Constant:i32<32>, divdc3.c:24:22
                0x7fcb8184c680: i64,ch = load<(load 8 from %stack.7)> 0x7fcb8184c618, FrameIndex:i32<7>, undef:i32, divdc3.c:24:22
                  0x7fcb81846c28: i32 = FrameIndex<7>
                  0x7fcb81842818: i32 = undef
                0x7fcb81846fd0: i32 = Constant<32>
            0x7fcb81849c70: ch,glue = CopyToReg 0x7fcb81846540, Register:i32 $r4, 0x7fcb818469b8, 0x7fcb81846540:1, divdc3.c:24:22
              0x7fcb81846678: i32 = Register $r4
              0x7fcb818469b8: i32 = truncate 0x7fcb8184c750, divdc3.c:24:22
                0x7fcb8184c750: i64,ch = load<(load 8 from %stack.8)> 0x7fcb818462d0, FrameIndex:i32<8>, undef:i32, divdc3.c:24:22
                  0x7fcb81846268: i32 = FrameIndex<8>
                  0x7fcb81842818: i32 = undef
              0x7fcb81846540: ch,glue = CopyToReg 0x7fcb818470a0, Register:i32 $r3, 0x7fcb8184c6e8, divdc3.c:24:22
                0x7fcb818465a8: i32 = Register $r3
                0x7fcb8184c6e8: i32 = truncate 0x7fcb818463a0, divdc3.c:24:22
                  0x7fcb818463a0: i64 = srl 0x7fcb8184c750, Constant:i32<32>, divdc3.c:24:22


In function: __divdc3
clang-7: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 7.0.1 (tags/RELEASE_701/rc2 347035)
Target: powerpc-gnu-linux-eabi
Thread model: posix
InstalledDir: /llvm/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/divdc3-042ad6.c
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/divdc3-042ad6.sh
clang-7: note: diagnostic msg: Crash backtrace is located in
clang-7: note: diagnostic msg: /Library/Logs/DiagnosticReports/clang-7_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-7: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang-7: note: diagnostic msg: 

********************

Hi Vit,

I'm unable to reproduce the previous crash with clang 8, so for now I'm assuming it was fixed by other changes. If that turns out not to be the case, I can revisit it. I am able to reproduce this crash, though, so I am taking a look. I'm not quite sure how to interpret it, since i64 is supposed to be a synthetic type on this platform, so the normal expansion should be taking care of it. I'll see what I can find.

Thanks for your testing!

I have applied this patch to the llvm-toolchain-7 package in Debian and did not see any regressions on x86_64 or 32-bit PowerPC. Additionally, I have included the patches from https://reviews.llvm.org/D54409 and https://reviews.llvm.org/D54583 and saw no regressions on x86_64 and 32-bit PowerPC.

All three patches will be part of the next upload of the llvm-toolchain-7 package in Debian unstable which will be version 1:7.0.1~+rc2-9.

Does the first crash from https://reviews.llvm.org/D49754#1183753 reproduce for you, or have you perhaps already bisected trunk to find the changeset that fixes it?

Yes, it does on Debian unstable 32-bit PowerPC with the patches against LLVM 7; I didn't test LLVM 8:

root@kapitsa:/srv/llvm# clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2  t.cpp
t.cpp:23:1: warning: control reaches end of non-void function
}
^
Stack dump:
0.      Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -target-cpu ppc -target-feature +spe -O2 t.cpp 
1.      <eof> parser at end of file
2.      Code generation
3.      Running pass 'Function Pass Manager' on module 't.cpp'.
4.      Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@_ZN1c2fnERK1a'
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm3sys15PrintStackTraceERNS_11raw_ostreamE+0x40)[0xcc684a4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x81886c)[0xcc6886c]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm3sys17RunSignalHandlersEv+0xd8)[0xcc65b68]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x818a8c)[0xcc68a8c]
linux-vdso32.so.1(__kernel_sigtramp32+0x0)[0x100424]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xe157dc)[0xd2657dc]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xdfdab0)[0xd24dab0]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xdf7224)[0xd247224]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0xe10014)[0xd260014]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm12SelectionDAG13LegalizeTypesEv+0x244)[0xd265ba4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel17CodeGenAndEmitDAGEv+0x238)[0xd36a39c]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel16SelectBasicBlockENS_14ilist_iteratorINS_12ilist_detail12node_optionsINS_11InstructionELb0ELb0EvEELb0ELb1EEES6_Rb+0x1fc)[0xd36a048]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE+0x21dc)[0xd368af0]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE+0x778)[0xd3659cc]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(+0x24a5a00)[0xe8f5a00]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE+0xcc)[0xcf98948]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE+0x1e8)[0xcdc0e18]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE+0x60)[0xcdc13d4]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE+0x390)[0xcdc1960]
/usr/lib/powerpc-linux-gnu/libLLVM-7.so.1(_ZN4llvm6legacy11PassManager3runERNS_6ModuleE+0x14)[0xcdc220c]
clang-7(_ZN5clang17EmitBackendOutputERNS_17DiagnosticsEngineERKNS_19HeaderSearchOptionsERKNS_14CodeGenOptionsERKNS_13TargetOptionsERKNS_11LangOptionsERKN4llvm10DataLayoutEPNSE_6ModuleENS_13BackendActionESt10unique_ptrINSE_17raw_pwrite_streamESt14default_deleteISM_EE+0x32b4)[0x1023f644]
clang-7[0x1074857c]
clang-7(_ZN5clang8ParseASTERNS_4SemaEbb+0x2b4)[0x10c3da9c]
clang-7(_ZN5clang17ASTFrontendAction13ExecuteActionEv+0xac)[0x106a2a84]
clang-7(_ZN5clang13CodeGenAction13ExecuteActionEv+0x390)[0x107472f4]
clang-7(_ZN5clang14FrontendAction7ExecuteEv+0x6c)[0x106a2408]
clang-7(_ZN5clang16CompilerInstance13ExecuteActionERNS_14FrontendActionE+0x638)[0x10659aa4]
clang-7(_ZN5clang25ExecuteCompilerInvocationEPNS_16CompilerInstanceE+0x748)[0x10743ba0]
clang-7(_Z8cc1_mainN4llvm8ArrayRefIPKcEES2_Pv+0x6bc)[0x101ee2c4]
clang-7(main+0x2920)[0x101ec550]
/lib/powerpc-linux-gnu/libc.so.6(+0x22558)[0xbef2558]
/lib/powerpc-linux-gnu/libc.so.6(__libc_start_main+0xe8)[0xbef2748]
Segmentation fault
root@kapitsa:/srv/llvm#

Ok, I found the fix for the first crash that landed in 8.0 trunk. It works fine for me if backported to 7.0.1:
https://reviews.llvm.org/D50461

glaubitz added a comment. Edited Dec 21 2018, 8:33 AM

That's awesome, thanks so much for finding this!

jhibbits added a comment. Edited Dec 31 2018, 1:15 PM

Hi vit,

I found what's going on with the "Cannot Select" error.

divdc3.c contains code that gets lowered to a libcall. However, the lowering stops there and does not continue down to the operations that are legal on this target.

The debug snippet is:

Legalizing: t38: f64 = fmaxnum t36, t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22

    Op: t36: f64 = fabs t79, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22

    Op: t37: f64 = fabs t80, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Trying to expand node
Cannot expand node
Trying to convert node to libcall
Creating new node: t99: i64 = bitcast t36, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t100: i32 = extract_element t99, Constant:i32<1>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t101: i32 = extract_element t99, Constant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t102: i64 = bitcast t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t103: i32 = extract_element t102, Constant:i32<1>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t104: i32 = extract_element t102, Constant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t105: ch,glue = callseq_start t0, TargetConstant:i32<8>, TargetConstant:i32<0>, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t107: ch,glue = CopyToReg t105, Register:i32 $r3, t100, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t108: ch,glue = CopyToReg t107, Register:i32 $r4, t101, t107:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t110: ch,glue = CopyToReg t108, Register:i32 $r5, t103, t108:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t112: ch,glue = CopyToReg t110, Register:i32 $r6, t104, t110:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t114: ch,glue = PPCISD::CALL t112, TargetExternalSymbol:i32'fmax' [TF=1], Register:i32 $r3, Register:i32 $r4, Register:i32 $r5, Register:i32 $r6, RegisterMask:Untyped, t112:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t115: ch,glue = callseq_end t114, TargetConstant:i32<8>, TargetConstant:i32<0>, t114:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t116: i32,ch,glue = CopyFromReg t115, Register:i32 $r3, t115:1, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t117: i32,ch,glue = CopyFromReg t116:1, Register:i32 $r4, t116:2, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t118: i64 = build_pair t117, t116, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Creating new node: t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Created libcall: t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
Successfully converted node to libcall
 ... replacing: t38: f64 = fmaxnum t36, t37, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22
     with:      t119: f64 = bitcast t118, /home/chmeee/freebsd/contrib/compiler-rt/lib/builtins/divdc3.c:24:22

The last 3 lines are the key. It's converting the node to a bitcast libcall, which ends up yielding a 'f64 bitcast (i64 build_pair i32, i32)', instead of lowering further to 'f64 build_spe64 i32, i32'.
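
To make the two node shapes concrete, here is a small C model of the value that 'f64 bitcast (i64 build_pair i32, i32)' computes from the two return registers. It only models the value semantics, assuming a 64-bit IEEE double and that build_pair takes the low half first; the function names are hypothetical and this is not LLVM code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models `i64 = build_pair lo, hi`: glue two 32-bit halves into 64 bits. */
uint64_t build_pair(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Models `f64 = bitcast i64`: reinterpret the bits, no numeric conversion. */
double bitcast_i64_to_f64(uint64_t bits)
{
    double d;
    assert(sizeof d == sizeof bits); /* assumption: double is 64 bits wide */
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* In the log above, r3 carries the high word and r4 the low word
   (t118 = build_pair t117, t116, with t116 from $r3 and t117 from $r4). */
double f64_from_r3_r4(uint32_t r3, uint32_t r4)
{
    return bitcast_i64_to_f64(build_pair(r4, r3));
}
```

The intermediate uint64_t is the problem in the DAG: it corresponds to the i64 node that cannot be selected, whereas a single 'build_spe64' node would go from the two i32 values to the f64 directly.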

@jhibbits sorry for not answering earlier. I was occupied with NY holidays, and then had a lot of stuff on the road.

Trying to sync to your latest changes, I merged the updated https://reviews.llvm.org/D54583 into my local copy with your fixes to libcall expansion.
From what I understood (correct me if I am wrong), it should have fixed the __divdc3 case.
However, it still appears unresolved to me, with a segmentation fault instead of a selection error.

I narrowed it down to a potentially smaller example that reproduces the issue. Does it reproduce for you locally?

void f1(long double v)
{
}

void f2(void* v, __builtin_va_list arg)
{
  f1(__builtin_va_arg(arg, long double));
}
$ clang -c sample.c -target powerpc-gnu-linux-eabi -ffreestanding -nostdlib -femulated-tls -mcpu=8548 -mspe
Stack dump:
0.	Program arguments: clang-7 -cc1 -triple powerpc-gnu-linux-eabi -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name sample.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -ffreestanding -fuse-init-array -target-cpu ppc -target-feature +spe -mfloat-abi hard -dwarf-column-info -debugger-tuning=gdb -target-linker-version 409.12 -coverage-notes-file Desktop/sample/sample.gcno -resource-dir /llvm/llvm-7.0.1-Darwin-x86_64/lib/clang/7.0.1 -internal-isystem /usr/local/include -internal-isystem /llvm/llvm-7.0.1-Darwin-x86_64/lib/clang/7.0.1/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdebug-compilation-dir Desktop/sample -ferror-limit 19 -fmessage-length 265 -femulated-tls -fno-signed-char -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -o sample.o -x c Desktop/sample/sample.c -faddrsig 
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module 'Desktop/sample/sample.c'.
4.	Running pass 'PowerPC DAG->DAG Pattern Instruction Selection' on function '@f2'
0  clang-7                  0x00000001099af90a llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 37
1  clang-7                  0x00000001099afcfa SignalHandler(int) + 200
2  libsystem_platform.dylib 0x00007fff52975f5a _sigtramp + 26
3  libsystem_platform.dylib 0x00007fe219f0eec0 _sigtramp + 3344535424
4  clang-7                  0x00000001091093e1 llvm::PPCTargetLowering::LowerCall(llvm::TargetLowering::CallLoweringInfo&, llvm::SmallVectorImpl<llvm::SDValue>&) const + 867
5  clang-7                  0x0000000109f55b7a llvm::TargetLowering::LowerCallTo(llvm::TargetLowering::CallLoweringInfo&) const + 3856
6  clang-7                  0x0000000109f686fd llvm::SelectionDAGBuilder::lowerInvokable(llvm::TargetLowering::CallLoweringInfo&, llvm::BasicBlock const*) + 357
7  clang-7                  0x0000000109f5ad64 llvm::SelectionDAGBuilder::LowerCallTo(llvm::ImmutableCallSite, llvm::SDValue, bool, llvm::BasicBlock const*) + 1178
8  clang-7                  0x0000000109f4e05a llvm::SelectionDAGBuilder::visitCall(llvm::CallInst const&) + 1016
9  clang-7                  0x0000000109f474c0 llvm::SelectionDAGBuilder::visit(llvm::Instruction const&) + 114
10 clang-7                  0x0000000109fa76d3 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, false, false, void>, false, true>, bool&) + 177
11 clang-7                  0x0000000109fa6dde llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 5532
12 clang-7                  0x0000000109fa50f4 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1474
13 clang-7                  0x00000001090e0bcc (anonymous namespace)::PPCDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 70
14 clang-7                  0x00000001094d4785 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 117
15 clang-7                  0x000000010965c36b llvm::FPPassManager::runOnFunction(llvm::Function&) + 335
16 clang-7                  0x000000010965c505 llvm::FPPassManager::runOnModule(llvm::Module&) + 49
17 clang-7                  0x000000010965c7e6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 558
18 clang-7                  0x0000000109ae2839 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 12981
19 clang-7                  0x0000000109c48ea2 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 864
20 clang-7                  0x000000010a40bbbf clang::ParseAST(clang::Sema&, bool, bool) + 458
21 clang-7                  0x0000000109dcb7d7 clang::FrontendAction::Execute() + 69
22 clang-7                  0x0000000109d9d9fc clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 664
23 clang-7                  0x0000000109dfea50 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 1265
24 clang-7                  0x0000000108f1238a cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1140
25 clang-7                  0x0000000108f115a4 main + 7638
26 libdyld.dylib            0x00007fff52667015 start + 1
27 libdyld.dylib            0x000000000000003c start + 2912522280
clang-7: error: unable to execute command: Segmentation fault: 11
clang-7: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 7.0.1 (tags/RELEASE_701/final 351415)
Target: powerpc-gnu-linux-eabi
Thread model: posix
InstalledDir: /llvm/toolchain/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/sample-e182be.c
clang-7: note: diagnostic msg: /var/folders/9y/zl96m3v94kgcc6wxx0tvqhsc0000gn/T/sample-e182be.sh
clang-7: note: diagnostic msg: Crash backtrace is located in
clang-7: note: diagnostic msg: /Library/Logs/DiagnosticReports/clang-7_<YYYY-MM-DD-HHMMSS>_<hostname>.crash
clang-7: note: diagnostic msg: (choose the .crash file that corresponds to your crash)
clang-7: note: diagnostic msg: 

********************

Hi @vit9696 ,

This looks to be caused by using 128-bit long double on the platform. Does Linux really use 128-bit long doubles on ppc32? FreeBSD uses 64-bit long double, so compiling that with '-target powerpc-gnu-freebsd' works fine. I'm not sure how to handle the 128-bit values.

Actually, I am not sure about Linux, since this is bare metal and I just used what fit. However, it does not look like the difference between 128-bit and 64-bit long doubles is related.
I retried compiling the test case with the powerpc-gnu-freebsd target (and even added a compile-time assertion to ensure that long double is now 64-bit), yet it still crashed in a similar way.
Might it be another LLVM 7 vs LLVM 8 difference? Does it crash for you with the Linux target?
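
A compile-time check of this kind could look like the sketch below. EXPECT_64BIT_LONG_DOUBLE is a hypothetical opt-in macro invented for this illustration (pass it with -D when building for the target under test), so the file still compiles on hosts with a wider long double.

```c
#include <assert.h>
#include <limits.h>

/* Storage width of long double, in bits, for the target being compiled. */
#define LONG_DOUBLE_BITS (sizeof(long double) * CHAR_BIT)

#if defined(EXPECT_64BIT_LONG_DOUBLE) /* hypothetical opt-in macro */
/* Fails the build if the target does not use a 64-bit long double. */
static_assert(LONG_DOUBLE_BITS == 64, "long double is expected to be 64-bit");
#endif

/* Runtime view of the same value, handy for printing while debugging. */
unsigned long_double_bits(void)
{
    return (unsigned)LONG_DOUBLE_BITS;
}
```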

Hi @vit9696 it does crash with the linux target (powerpc-gnu-linux), but is fine with powerpc-gnu-freebsd.

You are right, had to modify it like this to get the crash with FreeBSD triple:

void f1(long double v, void *a)
{
}

void f2(void* v, __builtin_va_list arg)
{
  f1(__builtin_va_arg(arg, long double), 0);
}

Hi @vit9696, thanks for that, it was a straightforward fix. I'll post an update shortly for D54583, if arcanist cooperates. The short of it is I need two indices for arguments, since one is for logical arguments the other is for physical register allocation. I was only using 1, based on physical register allocation.
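
The distinction between a logical argument index and a physical register index can be sketched as follows. This is a toy model and not the code from D54583: the enum, the function name assign_gprs, and the rule that a double takes an even-aligned pair out of eight argument GPR slots (r3 to r10) are illustrative assumptions.

```c
#include <assert.h>

/* Hypothetical argument kinds, for illustration only. */
enum arg_kind { ARG_WORD, ARG_DOUBLE };

/*
 * For each logical argument, record the first physical GPR slot it gets
 * (slot 0 stands for r3), or -1 if it would go to the stack.
 * Assumption: a double needs two slots and starts on an even slot.
 */
static void assign_gprs(const enum arg_kind *args, int nargs, int *first_slot)
{
    int phys = 0; /* physical index: next free GPR slot */

    assert(nargs >= 0);
    for (int logical = 0; logical < nargs; ++logical) {
        int need = (args[logical] == ARG_DOUBLE) ? 2 : 1;

        if (need == 2)
            phys = (phys + 1) & ~1; /* skip a slot to start the pair on an even one */

        if (phys + need <= 8) {
            first_slot[logical] = phys;
            phys += need;
        } else {
            first_slot[logical] = -1; /* simplified: no more GPRs after a spill */
            phys = 8;
        }
    }
}
```

For f1(long double v, void *a) with a 64-bit long double, this model puts logical argument 0 in slots 0 and 1 and logical argument 1 in slot 2, so a single counter cannot serve both purposes.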

Justin, I am currently testing the latest patches, yet so far it looks very good. Thanks.
I rechecked GCC, and it seems that it forces 64-bit long double for SPE regardless of the target.
Could you please force that in LLVM as well? While imperfect, I currently changed the following part of code in clang/lib/Basic/Targets/PPC.cpp, and it seems to work for the needs:

} else if (Feature == "+spe") {
  HasSPE = true;
  LongDoubleWidth = LongDoubleAlign = 64;
  LongDoubleFormat = &llvm::APFloat::IEEEdouble();
}
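
A compile-time check along the lines of the assertion mentioned earlier in the thread might look like this. It is a minimal sketch that assumes the compiler predefines __SPE__ when SPE is enabled; the helper name long_double_is_double is hypothetical.

```c
#include <float.h>

/* Only enforce the 64-bit expectation when building for SPE. */
#if defined(__SPE__)
_Static_assert(sizeof(long double) == sizeof(double),
               "long double is expected to be 64-bit with SPE");
_Static_assert(LDBL_MANT_DIG == DBL_MANT_DIG,
               "long double is expected to use the IEEE double format with SPE");
#endif

/* Returns 1 when long double has the same precision as double on this target. */
static int long_double_is_double(void)
{
    return LDBL_MANT_DIG == DBL_MANT_DIG;
}
```

On targets without SPE the static assertions are skipped, so the file still compiles on hosts with a wider long double.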

@jhibbits it appears that va_list is not functional with SPE (va_arg returns garbage for double, and functions like printf do not work). Is this expected, or am I missing a patch?

@vit9696 I have been working on that issue for 3 days and have found nothing... PPCISelLowering.cpp has 2 functions: LowerVASTART() and LowerVAARG(). LowerVASTART is correctly called (it stores the GPRs to the internal va_list structure), but LowerVAARG is never called and I don't understand why. The generated code is exactly what the LowerVAARG source would produce, but it must be generated somewhere else.
The problem is the following:
The calling function correctly places the double data into a register pair (r5/r6 or r7/r8). In the called function, all GPRs are placed on the stack (by LowerVASTART), which also reserves space for saving the FPU registers (SPE doesn't have these, so this space is left empty). va_arg then fetches the double parameter from that FPU area (which is at an offset of 32 from the GPR space) instead of from the GPR space.
I am searching for where that code is generated. I'm 99% sure LowerVAARG can generate that code, but 100% sure that LowerVAARG is not called. So where is the va_arg load generated?
My test code:

typedef __builtin_va_list va_list;
double a;
long l = 0;

void pr(char *txt, ...)
{
    va_list vp;

    __builtin_va_start(vp,txt);
    a = __builtin_va_arg(vp,double);
    l = __builtin_va_arg(vp,long);
    __builtin_va_end(vp);
}

@vit9696 if you like, you can contact me directly, so that we can coordinate our work on the SPE. thomsen@microsys.de
Kei
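
The save-area layout described above can be modelled on any host. The following is a sketch under stated assumptions: the struct follows the 32-bit PowerPC SVR4 va_list record, but the two reader functions, their names, and the even-pair alignment rule are illustrative and are not the code clang emits. The overflow (stack) area is not modelled.

```c
#include <assert.h>
#include <string.h>

/* Model of the 32-bit PowerPC SVR4 va_list record. */
struct ppc32_va_list {
    unsigned char gpr;                /* next unused GPR slot (0 stands for r3) */
    unsigned char fpr;                /* next unused FPR slot (0 stands for f1) */
    unsigned short reserved;
    unsigned char *overflow_arg_area; /* stack-passed arguments (unused here) */
    unsigned char *reg_save_area;     /* 8 GPRs (32 bytes), then 8 FPRs (64 bytes) */
};

enum { GPR_SLOTS = 8, GPR_BYTES = 4, FPR_AREA_OFFSET = GPR_SLOTS * GPR_BYTES };

/* Hard-float behaviour: fetch a double from the FPR area at offset 32. */
static double va_arg_double_fpr(struct ppc32_va_list *ap)
{
    double v;

    assert(ap->fpr < 8);
    memcpy(&v, ap->reg_save_area + FPR_AREA_OFFSET + ap->fpr * sizeof(double),
           sizeof v);
    ap->fpr++;
    return v;
}

/* Soft-float style behaviour: fetch a double from an even-aligned GPR pair. */
static double va_arg_double_gpr(struct ppc32_va_list *ap)
{
    double v;
    unsigned idx = (ap->gpr + 1u) & ~1u; /* round up to the start of a pair */

    assert(idx + 2 <= GPR_SLOTS);
    memcpy(&v, ap->reg_save_area + idx * GPR_BYTES, sizeof v);
    ap->gpr = (unsigned char)(idx + 2);
    return v;
}
```

With one named argument consumed (gpr == 1) and the double stored in the r5/r6 slots, the GPR reader returns the value, while the FPR reader returns whatever happens to be in the unused FPR area.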

Right, I noticed the same thing yesterday. There is an override that calls LowerVAARG for 64-bit integers, but it does not handle the lowering for the other types. I believe that is derived somewhere from the td calling conventions. I will check it out later this evening and mail you if I find anything (vit9696 at avp dot su).

The desired function for this va_arg is not in lib/Target/PowerPC/*.cpp, it is in tools/clang/lib/CodeGen/TargetInfo.cpp , a little bit unexpected to me.
PPC32_SVR4_ABIInfo::EmitVAArg() is doing the va_arg handling. For testing, I have added a hasSPE = true and treat the parameter like SoftFloat. It looks good! Now I need to find out, where to get "hasSPE" from.

Thanks for pointing it out. You could use hasFeature from there during construction:

return SetCGInfo(
        new PPC32TargetCodeGenInfo(Types, CodeGenOpts.FloatABI == "soft" || getTarget().hasFeature("spe")));

It works for me, but it would probably be better to use a separate argument (or at least to rename the current one).

With this modification for SPE in VAARG, I was able to compile all OS-9 libraries for SPE and test them with whetstone. The whetstone results are the same as with a real FPU, and printf shows them correctly.
Also, the performance of Clang is about 30% better than with my old compiler. Therefore, the modification in tools/clang/lib/CodeGen/TargetInfo.cpp line 9322

case llvm::Triple::ppc:
  return SetCGInfo(
      new PPC32TargetCodeGenInfo(Types, (CodeGenOpts.FloatABI == "soft") ||
            getTarget().hasFeature("spe")));
case llvm::Triple::ppc64:

helps to get the va_arg() parameter correctly. We should not rename isSoftFloatABI, because as it stands this is a one-line change, compared to more than 12 lines with a rename. The name is also still correct and not really confusing.
Conclusion: SPE support is now at a level where it can be used completely.
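
A small self-checking variant of the va_arg test above could be used to confirm the fix on a target. This is a sketch only; the function name take_double_then_long is hypothetical, and it uses <stdarg.h> instead of the raw builtins.

```c
#include <stdarg.h>

/*
 * Pulls one double and then one long from the variadic list and returns
 * their sum, so a caller can compare the result against a known value.
 */
static double take_double_then_long(const char *tag, ...)
{
    va_list ap;
    double d;
    long l;

    va_start(ap, tag);
    d = va_arg(ap, double); /* the case that read the wrong save area */
    l = va_arg(ap, long);   /* checks that the GPR cursor stays in sync */
    va_end(ap);

    return d + (double)l;
}
```

A call such as take_double_then_long("t", 1.5, 2L) should yield 3.5 whenever va_arg reads the double from the location the caller wrote it to.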

Hi Kei, that's fantastic! There's one more thing to add to this, which is to predefine __NO_FPRS__, and it should be a good replacement for gcc for 90+% of cases. I'll add your changes and this, and resubmit this review.

Thanks for all your help!

That's pretty good. Do you think it is possible to land it in 8.0 release? @hfinkel?

@jhibbits could you please include this change too: https://reviews.llvm.org/D49754#1364865? We would prefer to continue using Linux target to be able to use LLVM sanitizers.

@vit9696 sure thing.

We'll need to get all these patches in together before any are actually useful.

jhibbits updated this revision to Diff 185487.Feb 5 2019, 8:36 PM

Add feedback from @vit9696, and VAARG fix from @kthomsen.

Herald added a project: Restricted Project. Feb 5 2019, 8:36 PM

@jhibbits, @kthomsen, it appears that the current patchset mishandles && for me. I have it applied over LLVM 8.0.0 rc2, and the following code prints 0 for me at -O2 and below:

#include <stdio.h>

#define FEQUAL(x,y) (((x) - (y)) < 0.000001) // could put fabs if needed

typedef struct {
  float x;
  float y;
} float2;

static bool __attribute__((noinline)) equals(float2* f40, float2* f41) {
    return FEQUAL(f40->x, f41->x) && FEQUAL(f40->y, f41->y);
}

int main() {
    float2 a = {0.721569, 0.1234};
    float2 b = {0.721569, 0.1234};

    printf("%d\n", equals(&a, &b));
}

Does it reproduce for you?

GCC output (working):

# _Bool __fastcall equals(float2 *f40, float2 *f41)
.globl equals
equals:
  lwz       r9, 0(r3)
  lis       r10, -0x7FFF
  lwz       r8, 0(r4)
  addi      r10, r10, -0x7198 # 0x80008E68
  evldd     r10, 0(r10)
  efssub    r9, r9, r8
  efdcfs    r9, r9
  efdcmplt  cr7, r9, r10
  ble       cr7, loc_80000048
  lwz       r3, 4(r3)
  lwz       r9, 4(r4)
  efssub    r3, r3, r9
  efdcfs    r3, r3
  efdcmplt  cr7, r3, r10
  mfcr      r3
  extrwi    r3, r3, 1,29
  clrlwi    r3, r3, 24
  blr
loc_80000048:
  li        r3, 0
  blr

LLVM output (not working):

# _Bool __fastcall equals(float2 *f40, float2 *f41)
.globl equals
equals:
  lwz       r6, 0(r3)
  lwz       r7, 0(r4)
  li        r5, -0x6160
  lis       r8, -0x7FFF
  evlddx    r5, r8, r5
  efssub    r6, r6, r7
  efdcfs    r6, r6
  efdcmplt  cr0, r6, r5
  bge       loc_800002B0
  lwz       r3, 4(r3)
  lwz       r4, 4(r4)
  efssub    r3, r3, r4
  efdcfs    r3, r3
  efdcmplt  cr0, r3, r5
  b         loc_800002B4
loc_800002B0:
  crclr     gt
loc_800002B4:
  li        r3, 0
  li        r4, 1
  bgt       loc_800002C4
  blr
loc_800002C4:
  addi      r3, r4, 0
  blr

There is a long series of comments in this patch and I am not clear at this point on whether this patch breaks anything or it is fine. Could you please Request Changes if this patch is broken or approve if it is fine?

This is a series of patches, which I believe should be merged all together. Currently the following patches are relevant:

The patches are intended to add PowerPC SPE support, and they do not seem to break anything outside of it. Initially I wanted them to be merged into 8.x, but extensive local testing revealed a number of issues, so that window was missed. I believe all 4 patches are pretty much ready for merging, aside from the following:

The decision should be made depending on the above. Meanwhile, a test should be added for __NO_FPRS__ near PPC32-SPE:#define __SPE__ 1 now that we define it.
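
Source code can also guard on these predefined macros. The following is a minimal sketch, assuming that an SPE build predefines both __SPE__ and __NO_FPRS__ as discussed in this review; the helper name has_no_fprs is hypothetical.

```c
/* An SPE build without __NO_FPRS__ would contradict the expectation above. */
#if defined(__SPE__) && !defined(__NO_FPRS__)
#error "SPE target is expected to predefine __NO_FPRS__"
#endif

/* Returns 1 when the compiler reports that no FPRs are used, 0 otherwise. */
static int has_no_fprs(void)
{
#if defined(__NO_FPRS__)
    return 1;
#else
    return 0;
#endif
}
```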

nemanjai accepted this revision.Feb 20 2019, 4:19 AM

Please, no more patches without context. This one was actually easy to review without context and the comments are minor, so I'm fine with these being addressed on the commit.

lib/Basic/Targets/PPC.cpp
318 ↗(On Diff #185487)

The e500v2 that you added doesn't support SPE?

test/Misc/target-invalid-cpu-note.c
82 ↗(On Diff #185487)

I think you may have missed adding 8548 to this line or above. Of course, the test should still pass, but we should add it.

test/Preprocessor/init.c
7021 ↗(On Diff #185487)

Please add a check for the other predefined macro you added.

This revision is now accepted and ready to land.Feb 20 2019, 4:19 AM

This is a series of patches, which I believe should be merged all together. Currently the following patches are relevant:

No, please don't merge them together. It is much more manageable for review when they're separate patches. I realize that this makes it a bit more difficult for the author to keep the dependency ordering straight, but I think preference needs to be given to the "reviewability" of the code.

@nemanjai, sorry, by merging I meant committing into llvm upstream.

lib/Basic/Targets/PPC.cpp
318 ↗(On Diff #185487)

I rechecked the docs, and there is no e500v2 option for -mcpu in GCC:
https://gcc.gnu.org/onlinedocs/gcc-8.1.0/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options.

GCC defines 8540 (currently missing) and 8548, which I suppose we should support with default switches for SPE. As for other options, there is also -me500, which enables SPE, and -me500mc, which does not. I believe the need for these options is debatable.

The author of the patch may have his own reasoning, but I am not sure whether e500, e500v1, or e500v2 should be used as mcpu switches at all. I am not against it if it makes things more convenient, though.

jhibbits marked an inline comment as done.Jun 27 2019, 8:25 PM
jhibbits added inline comments.
lib/Basic/Targets/PPC.cpp
318 ↗(On Diff #185487)

I decided it's better to match gcc in this regard, and support only 8548 (8540 is e500v1, which this SPE codegen does not support).

Right, ok. I have been testing this for quite some time now, including maths, and so far had no issues. Can this get merged into 9.0? I do not believe there are enough obstacles to postpone it any further. Thanks!

I'll commit it tonight. I was going to last night, but ran into a clang test failure that turned out to be a long-standing failure with FreeBSD/powerpc64, not a problem with my change.

Actually, I'm not yet ready to commit this. I want to enforce the 8548 -> e500 processor model before I call this ready. How do I do that with the mcpu?

I am not sure I understood you. Don't you already have the common feature switch for that, named spe, in the Features["spe"] line?

jhibbits updated this revision to Diff 207214.Jun 29 2019, 8:08 PM

Made the '8548' CPU designation just a stub, to be filled out later. I added it
just for parity with GCC. The 8548 CPU, for GCC, also sets the
__NO_LWSYNC__ macro, which doesn't belong with the SPE change, so it will have
to be revisited later.

emaste added a subscriber: emaste.Jul 29 2019, 12:39 PM
jhibbits requested review of this revision.Jul 29 2019, 12:52 PM

Should've marked it as needing review earlier.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 5 2019, 6:39 AM
Closed by commit rL371066: Add -m(no)-spe to clang (authored by jhibbits).
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Sep 5 2019, 6:39 AM

@jhibbits, thank you for merging. Will we have this in LLVM 9.0?

Not in 9.0, but I will try to push for it in 9.0.1.

To others, for posterity: I pushed *only* the SPE subset, not the 8548 CPU component. I wanted the latter to be more than just a dummy stub, so it will be a separate commit. The part committed was already reviewed and approved by @nemanjai.