This is an archive of the discontinued LLVM Phabricator instance.

[TSAN, PowerPC] Setjmp/longjmp handling for PowerPC
ClosedPublic

Authored by wschmidt on Oct 14 2015, 9:30 AM.

Details

Summary

This patch builds on Simone Atzeni's patches (D12840, D12841) to enable TSAN for Power. Simone wasn't comfortable attempting to write the assembly code for intercepting setjmp and sigsetjmp calls, so I've done that here.

The main part of the patch is the new tsan_rtl_ppc64.S file, based on what's done in tsan_rtl_amd64.S. Our version looks much more complicated because we can't do a tail call optimization on an indirect call on Power, since we need to restore the TOC pointer (r2) following the call. Therefore we have to stack a frame, leading to problems with correct manipulation of the jmpbuf. The solution used here (idea due to Ulrich Weigand) is to replace the tail call to the libc setjmp() with a hand-inlined version of the setjmp logic. Since kernels and libc must have the same view of a jmpbuf, it is safe to assume this will not change. We can then modify the code slightly to save the original stack pointer of our caller, and its mangled version, instead of the current stack pointer.

A header file, ppc-regs.h, is included to allow us to write more legible assembly code, distinguishing more easily between register numbers and constants.

The LongJmp interceptor in tsan_interceptors.cc is modified to look for the mangled stack pointer in the architected jmpbuf location for Power.

One test case, race_on_mutex.c, is modified to be sufficiently general for the test to pass on Power. There are two issues:

(1) The atomic read as part of pthread_mutex_init and pthread_mutex_lock is of size 8 on Power, as opposed to size 1 for existing targets.  
(2) The libc implementation of pthread_mutex_init on Power includes a memset, which is on top of the stack trace for the "previous write."  So the position of pthread_mutex_init will be at #1 rather than #0 for Power.  I've changed the test to verify that pthread_mutex_init and Thread1 are on the stack, but not necessarily right at the top.

These changes fix 6 of the 7 test case failures that remained after applying Simone's patches. The remaining java-related patch needs more investigation, but we might want to XFAIL that test for Power for now and go ahead with enabling TSAN when Simone is finished with his remaining investigations.

This patch will not be applied until Simone's work is complete.

Diff Detail

Repository
rL LLVM

Event Timeline

wschmidt updated this revision to Diff 37354.Oct 14 2015, 9:30 AM
wschmidt retitled this revision from to [TSAN, PowerPC] Setjmp/longjmp handling for PowerPC.
wschmidt updated this object.
wschmidt set the repository for this revision to rL LLVM.
wschmidt added a subscriber: llvm-commits.

Looks reasonable, but I'll let Dmitry sign this off.

lib/tsan/rtl/tsan_interceptors.cc
458

remove comment

wschmidt added inline comments.Oct 27 2015, 9:29 AM
lib/tsan/rtl/tsan_rtl_ppc64.S
24

It turns out that I won't be able to use this form, as older binutils versions do not recognize .TOC. as a true symbol. There is another way to do this: see https://sourceware.org/ml/binutils/2012-11/msg00055.html. I will have to use that method instead.

dvyukov edited edge metadata.Oct 27 2015, 9:05 PM

I did not look at the assembly because I don't understand it. But I am fine with it as long as tests pass. This code does not change frequently.

lib/tsan/rtl/ppc-regs.h
1 ↗(On Diff #37354)

all tsan source file names currently start with tsan_, please keep this convention

test/tsan/race_on_mutex.c
36 ↗(On Diff #37354)

How does it happen that the access is not 1 byte?
This memory access is emulated in __tsan::MutexLock, and it explicitly coded as 1-byte.

This test change is unrelated to setjmp/longjmp and should go into a separate patch.

wschmidt added inline comments.Nov 2 2015, 12:45 PM
lib/tsan/rtl/ppc-regs.h
1 ↗(On Diff #37354)

OK, will do.

test/tsan/race_on_mutex.c
36 ↗(On Diff #37354)

Hm, I need to look into this again. We are seeing an 8-byte access, which I thought was due to not having a single-byte lock instruction implemented. But now that I look at it, the lbarx instruction is in our machine description now. So it's a good question why we're getting a different size. I'll drop this from this patch for now, and at some point we'll have to sort this out.

Bill, you did not upload a new version. Waiting.

Bill, you did not upload a new version. Waiting.

Right, I don't have one ready yet. I need to fix some of the assembly code as well, and haven't yet done so. I should have time to work on this later today.

wschmidt updated this revision to Diff 39366.Nov 5 2015, 6:48 AM
wschmidt edited edge metadata.

All comments should be addressed here. I have dropped the test case that still needs a little investigation. Again, this patch requires Simone's patches as a prerequisite. I had to do a little massaging of his compiler-rt patch to bring it up to current trunk.

All setjmp/longjmp-related TSAN tests now pass. We still have failures in race_on_mutex.c and java_race_pc.cc. I believe the former is still a test case that is too "tight," but I need to prove that at some point.

I forgot to mention that I now have separate ways of materializing the TOC pointer for BE and LE. For BE, the TOC can be materialized from an offset of 8 from the beginning of the OPD, where the OPD is equivalent to the function address. For LE, all supported binutils define the .TOC. symbol, so that method can be used instead.

Simone, can you please test this assembly code on the machine where you were having difficulty with the previous patch?

dvyukov accepted this revision.Nov 5 2015, 7:02 AM
dvyukov edited edge metadata.

I am fine with landing this as long as it works.
I did not read the assembly.

This revision is now accepted and ready to land.Nov 5 2015, 7:02 AM
simoatze edited edge metadata.Nov 6 2015, 12:54 AM
simoatze added a subscriber: simoatze.

Simone, can you please test this assembly code on the machine where you were having difficulty with the previous patch?

I am trying to test it on the big endian machine with older kernel, but since I updated to the to the current trunk
and applied the new patch with my patch, all tests are failing and just before running the tests I get this message:

lit.py: discovery.py:190: warning: test suite 'ThreadSanitizer-Unit' contained no tests

did something change on Tsan that can affect the execution with older kernel?
It’s working on little endian newer kernel though.

Any idea?

Best,
Simone

Repository:
rL LLVM

http://reviews.llvm.org/D13729

Simone, can you please test this assembly code on the machine where you were having difficulty with the previous patch?

I am trying to test it on the big endian machine with older kernel, but since I updated to the to the current trunk
and applied the new patch with my patch, all tests are failing and just before running the tests I get this message:

lit.py: discovery.py:190: warning: test suite 'ThreadSanitizer-Unit' contained no tests

did something change on Tsan that can affect the execution with older kernel?
It’s working on little endian newer kernel though.

Any idea?

Best,
Simone

Repository:
rL LLVM

http://reviews.llvm.org/D13729

Looking at the error message, it seems that the tests are just not enabled on your machine inside of cmake files. Probably cmake queries identity of your machine in some way (e.g. uname -a), and then fails to recognize it as power.

+Alexey for cmake magic, where does recognition of target machine happens inside of cmake files?

This comment was removed by wschmidt.

Hi Bill,

looks like that the older Kernel on PPC64/BE still does not like the assembly part.
If I comment out the tsan_rtl_ppc64.S from the CMakeList.txt it compiles
and when I run the check-tsan it fails all the setjump tests plus some other few tests (which I am already trying to fix).
However, if I compile also the assembly code it fails all the tests.

I’ll try to look at into it with GDB and see if I can understand what is going on.

Best,
Simone

OK, thank you. I am wondering if this is a glibc "feature" when initializing a thread for Power. For LE, I noticed that it was not setting up the TOC for a call to setjmp from init_thread, which is technically within its rights only if setjmp cannot be overridden. Hence I added code to materialize the TOC from the .TOC. symbol. For BE, the TOC must be initialized by placing it in the OPD. If the OPD TOC slot contains garbage, which would be true if init_thread didn't set it up, then we have no way to materialize the TOC on BE systems. If this turns out to be the case, we would either need to disable TSAN for BE (not ideal), or try to figure out when setjmp is being called in this heinous way.

I should emphasize that I'm speculating, so looking at things with GDB is the way to go. But this may give you something to look for.

Bill,

I guess you might be right.
The only info that I can gather from GDB are the following:


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xfffb7196430 (LWP 13646)]
0x7c0802a6f821ffd0 in ?? ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.7.ppc64
(gdb) bt
#0 0x7c0802a6f821ffd0 in ?? ()
#1 0x00000fffb7f6e590 in .start_thread () from /lib64/libpthread.so.0

#2 0x00000fffb7d448ec in .__clone () from /lib64/libc.so.6

Looks like there the problem comes from libthread or glibc. So, could be the init_thread problem you are talking about.
I think some debuginfo are missing, but I don’t have any root access to this machine so I can’t install anything.

Do you have any other idea how to investigate this problem?

Best,
Simone

Simone, thanks for confirming. I've discussed this with one of our ABI experts. For BE, we are going to have to do something pretty ugly, I'm afraid. We'll likely need a trampoline to a separate function so that the linker will create an OPD from which we can load the TOC. But we'll still need to keep the link register clean along that path, which could be a little tricky.

The question is whether we want to go to all this trouble, or whether we should just enable TSAN for little endian. LE is the path going forward. Do you know whether LLNL has a need to support TSAN on big-endian systems? If not, I might advocate to just drop big-endian support for TSAN.

Actually, I managed to dig up a BE Power8 machine and run the tests, and although I have failures due to the setjmp handling, most of the tests pass for me. What sort of BE machine are you using? Is it a Power7?

Actually, I managed to dig up a BE Power8 machine and run the tests, and although I have failures due to the setjmp handling, most of the tests pass for me. What sort of BE machine are you using? Is it a Power7?

Well, scratch that. I accidentally rebuilt on the LE machine after applying patches to the BE one.

At the moment I'm having trouble building on the BE machine due to recent bugs introduced in Clang. Will let you know when I have actual results.

uweigand edited edge metadata.Nov 11 2015, 8:39 AM

I've now had a look at the current patch, and it seems the problem is simply that the assembly does not actually provide any .opd section in the ELFv1 case.

So instead of:

        .globl _setjmp
        .type _setjmp, @function
        .align 4
_setjmp:

you need something along the lines of:

        .globl _setjmp
        .type   _setjmp, @function
       ​ .align 4
        .section        ".opd","aw"
        .align 3
_setjmp:
        .quad   .L._setjmp,.TOC.@tocbase,0
        .previous
.L._setjmp:

so that the _setjmp symbol actually points to the OPD! (Otherwise, calling the function will load the first 8 bytes of function text and treat it as code address ....)

For a "normal" function, you could support both ELFv1 and ELFv2 by a prolog like this:

        .globl _setjmp
        .type _setjmp, @function
        .align 2
#if _CALL_ELF == 2
_setjmp:
         addis 2,12,(.TOC.-_setjmp)@ha;
         addi 2,2,(.TOC.-_setjmp)@l;
         .localentry _setjmp,.-_setjmp
#else
         .section        ".opd","aw"
         .align 3
_setjmp:
         .quad   .L._setjmp,.TOC.@tocbase,0
         .previous
.L._setjmp:
#endif

But if you interpose a function from a different shared library, this will not work correctly since caller and callee may get confused as to who is setting up the TOC. To handle those cases, you'll have to explicitly set up the TOC just like your patch did:

        .globl _setjmp
        .type _setjmp, @function
        .align 2
#if _CALL_ELF == 2
_setjmp:
#else
         .section        ".opd","aw"
         .align 3
_setjmp:
         .quad   .L._setjmp,.TOC.@tocbase,0
         .previous
.L._setjmp:
#endif

[...]
        // Materialize a TOC in case we were called from libc.
        // For ELFv1, we load the TOC from the OPD.  For
        // ELFv2 we use the .TOC. symbol to find it.
        nop
        bcl     20,31,0f
0:
        mflr    r2
#if _CALL_ELF == 2
        addis   r2,r2,.TOC.-0b@ha
        addi    r2,r2,.TOC.-0b@l
#else
        addis   r2,r2,_setjmp-0b@ha 
        addi    r2,r2,_setjmp-0b@l
        ld      r2,8(r2)
#endif

(Note that the _setjmp symbol now refers to the OPD entry in the ELFv1 case, not the code address!)

Oh, of course! Thanks, Uli. I totally forgot about having to declare the OPD. And it would be better if I were using _CALL_ELF rather than LITTLE_ENDIAN for the ABI distinction. I'll update the patch and try to solve my build problems to test it.

Thanks again!

wschmidt updated this revision to Diff 40077.Nov 12 2015, 11:47 AM
wschmidt edited edge metadata.

This version adds the correct OPD implementation per Uli's suggestions. The only change beyond that is to make .L._setjmp and .L.__sigsetjmp unconditional symbols, so they can be used in the .size calculation for their respective functions.

I tested this as best I could on a BE system and verified that the setjmp/longjmp logic is now working correctly. There are other problems because I don't have Simone's BE fixes yet, but the remaining failures don't appear to have anything to do with this code.

wschmidt closed this revision.Dec 8 2015, 2:18 PM

Committed as r255059. Thanks!