This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO][Test] XFAIL profile-context-tracker-debug.ll on AIX
ClosedPublic

Authored by jsji on Apr 2 2021, 2:09 PM.

Details

Summary

The case start to fail since https://reviews.llvm.org/D99351.

Looks like to me that the node order within Context Profile Tree depends
on the implmementation of std::hash<std::string>.

Unfortunately, the current clang implementation generate different values on
AIX (or for all big-endian systems?)

On Linux:
main: 2408804140(0x8f936f2c)
external: 896680882(0x357243b2)
externalA: 620231129(0x24f7f9d9)

On AIX:
main: 994322777(0x3b442959)
external: 3548191215(0xd37d19ef)
externalA: 1390365101(0x52df49ad)

XFAIL it first while we discuss and seek for a fix.

Diff Detail

Event Timeline

jsji requested review of this revision.Apr 2 2021, 2:09 PM
jsji created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2021, 2:09 PM
jsji edited the summary of this revision. (Show Details)Apr 2 2021, 2:11 PM
hoy added a comment.Apr 2 2021, 2:38 PM

Thanks for reporting this issue. I'm wondering if you have tried D99547 which fixed a similar non-determinism issue.

jsji added a comment.Apr 2 2021, 2:53 PM
In D99815#2667017, @hoy wrote:

Thanks for reporting this issue. I'm wondering if you have tried D99547 which fixed a similar non-determinism issue.

Yes, this is still failing with ToT.

wenlei added a comment.Apr 2 2021, 3:02 PM

Hmm.. the node order is determined by std::map<uint32_t, ContextTrieNode> AllChildContext which should be consistent across platforms.

Which part of context tracker did you see the difference triggered by std::hash<std::string>?

jsji added a comment.Apr 2 2021, 3:04 PM

Hmm.. the node order is determined by std::map<uint32_t, ContextTrieNode> AllChildContext which should be consistent across platforms.

Which part of context tracker did you see the difference triggered by std::hash<std::string>?

On Linux:

Context Profile Tree:
Node:

Callsite: 0
Children:
  Node: externalA
  Node: external
  Node: main

On AIX:
Context Profile Tree:
Node:

Callsite: 0
Children:
  Node: main
  Node: externalA
  Node: external
jsji added a comment.EditedApr 2 2021, 3:05 PM

The hash values are different, so the order of these 3 are different in map.

wenlei accepted this revision.Apr 2 2021, 3:07 PM

Ah, now I see it coming from ContextTrieNode::nodeHash. Thanks for explaining. LGTM.

This revision is now accepted and ready to land.Apr 2 2021, 3:07 PM
This revision was automatically updated to reflect the committed changes.
jsji added a comment.Apr 12 2021, 8:11 AM

So this is not only failing for AIX, it is actually depends on the libc++ implementation. It may fail on Linux if using non default libc++ .

Any plan of fixing the code to not using std::hash<std::string> ? Thanks. @hoy

hoy added a comment.Apr 12 2021, 2:31 PM

So this is not only failing for AIX, it is actually depends on the libc++ implementation. It may fail on Linux if using non default libc++ .

Any plan of fixing the code to not using std::hash<std::string> ? Thanks. @hoy

Thanks for pointing it out. Yes, we think that may be related to libstdc++ function _Hash_bytes. My plan is to tweak the test with Check-DAG since the promoting order of the children should not cause codegen diffs. We could also sort the children by names instead of hash value for a more stable order across platforms, but that''ll likely incur more overhead. What do you think?

namespace std
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION

  // Hash function implementation for the nontrivial specialization.
  // All of them are based on a primitive that hashes a pointer to a
  // byte array. The actual hash algorithm is not guaranteed to stay
  // the same from release to release -- it may be updated or tuned to
  // improve hash quality or speed.
  size_t
  _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);
jsji added a comment.Apr 12 2021, 2:37 PM
In D99815#2684210, @hoy wrote:

So this is not only failing for AIX, it is actually depends on the libc++ implementation. It may fail on Linux if using non default libc++ .

Any plan of fixing the code to not using std::hash<std::string> ? Thanks. @hoy

Thanks for pointing it out. Yes, we think that may be related to libstdc++ function _Hash_bytes. My plan is to tweak the test with Check-DAG since the promoting order of the children should not cause codegen diffs. We could also sort the children by names instead of hash value for a more stable order across platforms, but that''ll likely incur more overhead. What do you think?

namespace std
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION

  // Hash function implementation for the nontrivial specialization.
  // All of them are based on a primitive that hashes a pointer to a
  // byte array. The actual hash algorithm is not guaranteed to stay
  // the same from release to release -- it may be updated or tuned to
  // improve hash quality or speed.
  size_t
  _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

Thanks, I am OK with tweaking test with check-DAG as long as you can confirm that this is safe in codegen ( I am not familiar with these code enough to judge).

hoy added a comment.Apr 12 2021, 3:05 PM
In D99815#2684210, @hoy wrote:

So this is not only failing for AIX, it is actually depends on the libc++ implementation. It may fail on Linux if using non default libc++ .

Any plan of fixing the code to not using std::hash<std::string> ? Thanks. @hoy

Thanks for pointing it out. Yes, we think that may be related to libstdc++ function _Hash_bytes. My plan is to tweak the test with Check-DAG since the promoting order of the children should not cause codegen diffs. We could also sort the children by names instead of hash value for a more stable order across platforms, but that''ll likely incur more overhead. What do you think?

namespace std
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION

  // Hash function implementation for the nontrivial specialization.
  // All of them are based on a primitive that hashes a pointer to a
  // byte array. The actual hash algorithm is not guaranteed to stay
  // the same from release to release -- it may be updated or tuned to
  // improve hash quality or speed.
  size_t
  _Hash_bytes(const void* __ptr, size_t __len, size_t __seed);

Thanks, I am OK with tweaking test with check-DAG as long as you can confirm that this is safe in codegen ( I am not familiar with these code enough to judge).

Thanks. Do you still have the output of the failing test? That'll help to tweak the test.

jsji added a comment.Apr 12 2021, 4:11 PM

Thanks. Do you still have the output of the failing test? That'll help to tweak the test.

The output is the same as without https://reviews.llvm.org/D99351