This is an archive of the discontinued LLVM Phabricator instance.

[RuntimeDyld] Handle SPARC
Needs ReviewPublic

Authored by ro on Jan 28 2022, 4:12 AM.

Details

Summary

A couple of tests FAIL on Solaris/sparcv9 with

Unsupported CPU type!
UNREACHABLE executed at /vol/llvm/src/llvm-project/dist/llvm/lib/ExecutionEngine/RuntimeDyld/RuntimeDyldELF.cpp:1067!

Fixed by adding SPARC support to RuntimeDyld.

Tested on sparcv9-sun-solaris2.11.

Authored-By: Stefan Teleman <stefan.teleman@oracle.com>

The patch has been available from 033-solaris-LLVM-JIT.patch for quite a while. I've learned from my contacts in Oracle Solaris Engineering that it had originally been developed for LLVM 3.8.1 by Stefan Teleman while working at Oracle. They are fine with me submitting it upstream.

I had to make a few changes to that patch:

  • Formatting fixes
  • Remove unused variables
  • Undo the getFileFormatName changes in ELFObjectFile.h which broke ELFObjectFileTest.MachineTestForSPARC*

Diff Detail

Event Timeline

ro created this revision.Jan 28 2022, 4:12 AM
ro requested review of this revision.Jan 28 2022, 4:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 28 2022, 4:12 AM
ro added a comment.Feb 3 2022, 3:20 PM

Ping? It's been a week.

lhames added a comment.Feb 3 2022, 8:58 PM

HI Rainer,

Sorry, I've been busy and haven't had a chance to review this yet. I should be able to take a look this week or next.

The first thing that I notice is that this is missing a testcase. I don't think we had infrastructure to test this with in LLVM 3, but it should now be testable with an llvm-rtdyld test (see llvm/test/ExecutionEngine/RuntimeDyld/*/ for examples).

Is there a motivating use case for adding this support? Where possible, new development should focus on JITLink rather than RuntimeDyld, but if there are existing MCJIT users on SPARC who need this then I think it makes sense to upstream it.

ro added a comment.Feb 4 2022, 12:57 AM

Sorry, I've been busy and haven't had a chance to review this yet. I should be able to take a look this week or next.

No worries at all: I just pinged to make sure I won't drop the ball.

The first thing that I notice is that this is missing a testcase. I don't think we had infrastructure to test this with in LLVM 3, but it should now be testable with an llvm-rtdyld test (see llvm/test/ExecutionEngine/RuntimeDyld/*/ for examples).

I see: seems like I'll have to dive really deep into this area then.

Is there a motivating use case for adding this support? Where possible, new development should focus on JITLink rather than RuntimeDyld, but if there are existing MCJIT users on SPARC who need this then I think it makes sense to upstream it.

The original patch starts with this comment:

gnome-shell cores dump on sparc when mesa is build with llvm without this.
(to be submitted to upstream).

------------  lwp# 1 / thread# 1  ---------------
 00000000e7a01a84 fail (e7a017d8, 21, e7a017b0, 1d, f71a7cee, 0) + 108 (ssp.c:169)
 00000000e7a01ad8 __stack_chk_fail (ff000000, ff000000, ff000000, 0, f3e29970, f3e33d44) + 28
 00000000ea6c6328 llvm::RuntimeDyldELF::resolveRelocation(const llvm::SectionEntry&,unsigned long,unsigned long,unsigned,long,unsigned long,unsigned) (3cd2e10, 3cd2ee8, 20, f30a6000, 6, 0) + 1c0
[...]

I suspect that still holds.

However, my primary motivation was that this patch fixes 5 failures on Solaris/sparcv9:

-  Clang-Unit :: Interpreter/./ClangReplInterpreterTests/IncrementalProcessing.InstantiateTemplate

-  LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/OrcCAPITestBase.AddObjectBuffer
-  LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/OrcCAPITestBase.ExecutionTest
-  LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/OrcCAPITestBase.ResourceTrackerDefinitionLifetime
-  LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/OrcCAPITestBase.ResourceTrackerTransfer

I try to get Solaris testresults clean so the buildbots (currently on staging) become actually useful to pinpoint regressions and new failures. Just getting an existing patch upstream seemed like an easy way to get closer to that goal.

brad added a subscriber: brad.Feb 6 2022, 5:12 PM

I noticed this bug report for PostgreSQL on Sparc for this particular functionality.

https://github.com/llvm/llvm-project/issues/47229

ro added a comment.Feb 8 2022, 4:04 AM

I noticed this bug report for PostgreSQL on Sparc for this particular functionality.

https://github.com/llvm/llvm-project/issues/47229

... which suggests that maybe HaveJIT should be set to false in SparcTargetInfo.cpp until this patch is resolved?

brad added a comment.Feb 8 2022, 6:55 AM
In D118450#3304109, @ro wrote:

... which suggests that maybe HaveJIT should be set to false in SparcTargetInfo.cpp until this patch is resolved?

No, that something other than the tests uses the MCJIT support.

Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2022, 2:11 PM
lhames added a comment.Jul 4 2022, 2:41 PM
In D118450#3304109, @ro wrote:

... which suggests that maybe HaveJIT should be set to false in SparcTargetInfo.cpp until this patch is resolved?

No, that something other than the tests uses the MCJIT support.

Lots of things use MCJIT, but MCJIT/Sparc can't have been be used anywhere, since this is the patch that introduces support.

Setting HaveJIT to false is a good interim solution.

@ro Any interest in adapting this as a JITLink backend? Do you have access to a Sparc box to test on, or are you just trying to silence failures on Sparc bots?

ro added a comment.Jul 5 2022, 5:21 AM
In D118450#3304109, @ro wrote:

... which suggests that maybe HaveJIT should be set to false in SparcTargetInfo.cpp until this patch is resolved?

No, that something other than the tests uses the MCJIT support.

Lots of things use MCJIT, but MCJIT/Sparc can't have been be used anywhere, since this is the patch that introduces support.

True for unmodified upstream LLVM. However, this patch as referenced in the submission has lived in the Solaris userland repo on github.com for almost 6 years now. Given that not many people build LLVM from source themselves, it will well have been in use by those using clang from the Solaris repo.

Besides, the patch states:

gnome-shell cores dump on sparc when mesa is build with llvm without this.
(to be submitted to upstream).

------------  lwp# 1 / thread# 1  ---------------
 00000000e7a01a84 fail (e7a017d8, 21, e7a017b0, 1d, f71a7cee, 0) + 108 (ssp.c:16
9)
 00000000e7a01ad8 __stack_chk_fail (ff000000, ff000000, ff000000, 0, f3e29970, f
3e33d44) + 28
 00000000ea6c6328 llvm::RuntimeDyldELF::resolveRelocation(const llvm::SectionEnt
ry&,unsigned long,unsignedlong,unsigned,long,unsigned long,unsigned) (3cd2e10, 3
cd2ee8, 20, f30a6000, 6, 0) + 1c0
 00000000ea6c639c llvm::RuntimeDyldELF::resolveRelocation(const llvm::Relocation
Entry&,unsigned long) (3cd2e10, 3ee67d8, f30a6000, 0, 0, e7b04430) + 64
 00000000ea6ab3e0 llvm::RuntimeDyldImpl::resolveRelocationList(const llvm::Small
Vector<llvm::RelocationEntry, (unsigned)64>&,unsigned long) (3cd2e10, 3ee67c0, f
30a6000, e7b04430, 0, 38) + e0
 00000000ea6af60c llvm::RuntimeDyldImpl::resolveRelocations() (3cd2e10, 6, 1, e7
b04430, 3cd42d0, 3ee67b0) + 2dc
 00000000ea6af888 llvm::RuntimeDyld::resolveRelocations() (3e13038, 3d801a0, 3fa
49a0, 3fa49b0, e7b04430, e7b04430) + 30

explaining the need for it.

Setting HaveJIT to false is a good interim solution.

I could give this a try (at least for the benefit of the Solaris/sparcv9 buildbot), but I vaguely recall having tried it and still getting quite a number of MCJIT-related failures.

@ro Any interest in adapting this as a JITLink backend? Do you have access to a Sparc box to test on, or are you just trying to silence failures on Sparc bots?

Given my current backlog, it's unlikely to get to that anytime soon. I do in fact have a SPARC box, which also hosts the Solaris/sparcv9 buildbot.

ro added a comment.Jul 5 2022, 7:19 AM
In D118450#3630074, @ro wrote:

Setting HaveJIT to false is a good interim solution.

I could give this a try (at least for the benefit of the Solaris/sparcv9 buildbot), but I vaguely recall having tried it and still getting quite a number of MCJIT-related failures.

I just gave it a quick whirl: it introduced a new failure

FAIL: Clang :: Interpreter/code-undo.cpp

although that test is guarded by host-supports-jit. The remaining tests seem to persist, too, although this is hard to see since gtest sharding produced useless test names like

-FAIL: Clang-Unit :: Interpreter/./ClangReplInterpreterTests/2/7
+FAIL: Clang-Unit :: Interpreter/./ClangReplInterpreterTests/2/8

and

-FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/67/126
-FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/68/126
-FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/69/126
-FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/70/126
+FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/68/127
+FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/69/127
+FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/70/127
+FAIL: LLVM-Unit :: ExecutionEngine/Orc/./OrcJITTests/71/127

This claimed optimization seems like a nightmare to me. I'll fire off a debug build to investigate.

ro added a comment.Jul 8 2022, 12:40 AM

I've now found what's going on: D129349 disables JIT support on SPARC until this one lands, but D129350 is also required so the hasJIT setting is considered at all.

In D118450#3630074, @ro wrote:

@ro Any interest in adapting this as a JITLink backend? Do you have access to a Sparc box to test on, or are you just trying to silence failures on Sparc bots?

Given my current backlog, it's unlikely to get to that anytime soon. I do in fact have a SPARC box, which also hosts the Solaris/sparcv9 buildbot.

Ok -- we can leave this open for now, just in case anyone else stumbles on it and wants to pick up the work.

ro added a comment.Jul 15 2022, 1:17 AM

Ok -- we can leave this open for now, just in case anyone else stumbles on it and wants to pick up the work.

Good: I'll talk to the Solaris engineers about this. Maybe the gnome-shell crash that prompted the original patch was caused by clang claiming SPARC JIT support when in fact it didn't.
With D129349 and D129350 now in, the issue might be gone.