This is an archive of the discontinued LLVM Phabricator instance.

[libc] Fix set_thread_ptr call in rv32 start up code
ClosedPublic

Authored by mikhail.ramalho on Aug 29 2023, 8:25 AM.

Details

Summary

This patch changes the instruction in set_thread_ptr from ld (load
doubleword) to lw (load word), as rv32 doesn't have the ld instruction.

Diff Detail

Event Timeline

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 29 2023, 8:25 AM
mikhail.ramalho requested review of this revision.Aug 29 2023, 8:25 AM
jrtc27 requested changes to this revision.Aug 29 2023, 2:27 PM

as rv32 doesn't have the ld instruction.

More importantly, even if it did, pointers are 32-bit values not 64-bit values. But regardless, the original RV64 code is wrong, it's doing a load not a move, so is setting tp to whatever the first pointer in the static TLS block happens to be, which is clearly nonsense. This should just be:

static void set_thread_ptr(uintptr_t val) {
  LIBC_INLINE_ASM("mv tp, %0\n\t" : : "r"(val));
}
This revision now requires changes to proceed.Aug 29 2023, 2:27 PM
  • Implement code as suggested by reviewer
  • Updated commit message to reflect new changes
  • Updated commit message to reflect new changes

You need to do that manually in Phabricator, arc won't sync your local changes to the commit message.

Change looks fine, but message is not, and I'd like to know how you're testing this. Presumably it's currently completely untested given it couldn't possibly have worked before.

For my knowledge, what is wrong with a tp load in the existing code?

For my knowledge, what is wrong with a tp load in the existing code?

Well it's the difference between x = p and x = *p. Only one is ever the right thing to do. Like I said, "it's doing a load not a move, so is setting tp to whatever the first pointer in the static TLS block happens to be", i.e. you're reading in the first sizeof(void *) bytes of data from .tdata (or .tbss) and using that as the thread pointer, not the region of memory that just got mapped for the thread pointer.

For my knowledge, what is wrong with a tp load in the existing code?

Well it's the difference between x = p and x = *p. Only one is ever the right thing to do. Like I said, "it's doing a load not a move, so is setting tp to whatever the first pointer in the static TLS block happens to be", i.e. you're reading in the first sizeof(void *) bytes of data from .tdata (or .tbss) and using that as the thread pointer, not the region of memory that just got mapped for the thread pointer.

Shouldn't the ld operation overwrite tp and not read from tp? The assembly looks like this: https://godbolt.org/z/PTqfTcshv

  • Updated commit message to reflect new changes

You need to do that manually in Phabricator, arc won't sync your local changes to the commit message.

Ok.

Change looks fine, but message is not, and I'd like to know how you're testing this. Presumably it's currently completely untested given it couldn't possibly have worked before.

For rv64: I test it locally on my VisionFive V2 board with ninja check-libc, which covers all the tests in libc, even more than what's being tested on the buildbots.
For rv32: I test it locally with qemu-system, with ninja libc-unit-tests, as our buildbots do for rv64. I do get some random crashes with ninja check-libc with or without this patch. I'll only investigate these crashes once I get ninja libc-unit-tests passing.

jrtc27 added a comment.EditedAug 29 2023, 4:58 PM

For my knowledge, what is wrong with a tp load in the existing code?

Well it's the difference between x = p and x = *p. Only one is ever the right thing to do. Like I said, "it's doing a load not a move, so is setting tp to whatever the first pointer in the static TLS block happens to be", i.e. you're reading in the first sizeof(void *) bytes of data from .tdata (or .tbss) and using that as the thread pointer, not the region of memory that just got mapped for the thread pointer.

Shouldn't the ld operation overwrite tp and not read from tp? The assembly looks like this: https://godbolt.org/z/PTqfTcshv

Yes, it writes to tp (which is currently junk / 0), with the result of *(void **)val, where val points into the allocated TLS region, at the first byte of the static TLS block within it. Therefore it does exactly what I have now twice said it does.

For my knowledge, what is wrong with a tp load in the existing code?

Well it's the difference between x = p and x = *p. Only one is ever the right thing to do. Like I said, "it's doing a load not a move, so is setting tp to whatever the first pointer in the static TLS block happens to be", i.e. you're reading in the first sizeof(void *) bytes of data from .tdata (or .tbss) and using that as the thread pointer, not the region of memory that just got mapped for the thread pointer.

Shouldn't the ld operation overwrite tp and not read from tp? The assembly looks like this: https://godbolt.org/z/PTqfTcshv

Yes, it writes to tp (which is currently junk / 0), with the result of *(void *)val, where val points into the allocated TLS region, at the first byte of the static TLS block within it. Therefore it does exactly what I have now twice said it does.

That is, currently the code does tp = *(void **)val;, but it should do tp = val;.

  • Updated commit message to reflect new changes

You need to do that manually in Phabricator, arc won't sync your local changes to the commit message.

Ok.

Change looks fine, but message is not, and I'd like to know how you're testing this. Presumably it's currently completely untested given it couldn't possibly have worked before.

For rv64: I test it locally on my VisionFive V2 board with ninja check-libc, which covers all the tests in libc, even more than what's being tested on the buildbots.
For rv32: I test it locally with qemu-system, with ninja libc-unit-tests, as our buildbots do for rv64. I do get some random crashes with ninja check-libc with or without this patch. I'll only investigate these crashes once I get ninja libc-unit-tests passing.

Does ninja check-libc on RV64 pass without this change? If so then there's no test coverage for this. If not then it's concerning that broken code was committed.

Yes, it writes to tp (which is currently junk / 0), with the result of *(void *)val, where val points into the allocated TLS region, at the first byte of the static TLS block within it. Therefore it does exactly what I have now twice said it does.

That is, currently the code does tp = *(void **)val;, but it should do tp = val;.

What I am not getting is why should we even consider a dereference operation in the ld case. They way I understand, this is what is happening with ld:

  1. Store val (which is in a0) on the stack.
  2. Load tp with the value stored on the stack in the above step.

So, there is no dereference happening anywhere?

Yes, it writes to tp (which is currently junk / 0), with the result of *(void *)val, where val points into the allocated TLS region, at the first byte of the static TLS block within it. Therefore it does exactly what I have now twice said it does.

That is, currently the code does tp = *(void **)val;, but it should do tp = val;.

What I am not getting is why should we even consider a dereference operation in the ld case. They way I understand, this is what is happening with ld:

  1. Store val (which is in a0) on the stack.
  2. Load tp with the value stored on the stack in the above step.

So, there is no dereference happening anywhere?

Oh sorry, right, it's an m constraint, so it's implicitly &val. Bleh, I had skimmed over that. Then yes, the existing code was correct, just added pointless indirection and made it XLEN-specific. This change is still the sensible way to do things.

sivachandra accepted this revision.Aug 29 2023, 5:28 PM

OK with commit message updated.

This revision was not accepted when it landed; it landed in state Needs Review.Aug 30 2023, 7:31 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.