This is an archive of the discontinued LLVM Phabricator instance.

[Driver] Default CLANG_DEFAULT_PIE_ON_LINUX to ON
ClosedPublic

Authored by MaskRay on Feb 22 2022, 12:15 AM.

Diff Detail

Event Timeline

MaskRay created this revision.Feb 22 2022, 12:15 AM
MaskRay requested review of this revision.Feb 22 2022, 12:15 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 22 2022, 12:15 AM

@atanasyan A few mips test/Driver tests will fail. Wonder if you have time making them more portable... Otherwise I'll just add UNSUPPORTED: default-pie-on-linux to them.

Failed Tests (6):
  Clang :: Driver/hip-fpie-option.hip
  Clang :: Driver/mips-cs.cpp
  Clang :: Driver/mips-fsf.cpp
  Clang :: Driver/mips-img-v2.cpp
  Clang :: Driver/mips-img.cpp
  Clang :: Driver/mips-mti-linux.c
clang/docs/ReleaseNotes.rst
186

it would be nice to explain what "pie" it is doing :)

@atanasyan A few mips test/Driver tests will fail. Wonder if you have time making them more portable... Otherwise I'll just add UNSUPPORTED: default-pie-on-linux to them.

Failed Tests (6):
  Clang :: Driver/hip-fpie-option.hip
  Clang :: Driver/mips-cs.cpp
  Clang :: Driver/mips-fsf.cpp
  Clang :: Driver/mips-img-v2.cpp
  Clang :: Driver/mips-img.cpp
  Clang :: Driver/mips-mti-linux.c

I will take a look at the tests. Probably hip-fpie-option.hip is unrelated to MIPS?

@atanasyan A few mips test/Driver tests will fail. Wonder if you have time making them more portable... Otherwise I'll just add UNSUPPORTED: default-pie-on-linux to them.

Failed Tests (6):
  Clang :: Driver/hip-fpie-option.hip
  Clang :: Driver/mips-cs.cpp
  Clang :: Driver/mips-fsf.cpp
  Clang :: Driver/mips-img-v2.cpp
  Clang :: Driver/mips-img.cpp
  Clang :: Driver/mips-mti-linux.c

I will take a look at the tests. Probably hip-fpie-option.hip is unrelated to MIPS?

Fixed MIPS tests at cedc23b

MaskRay updated this revision to Diff 411016.Feb 23 2022, 11:34 PM
MaskRay marked an inline comment as done.

Explain pie

thesamesam accepted this revision.Feb 24 2022, 2:37 PM

Been running with this for some time now (before we added the option). We should address the tests but they look sorted now.

This revision is now accepted and ready to land.Feb 24 2022, 2:37 PM

I'll test both CLANG_DEFAULT_PIE_ON_LINUX=on and CLANG_DEFAULT_PIE_ON_LINUX=off, and then push.
This is long undue. I did not change the default when CLANG_DEFAULT_PIE_ON_LINUX was added to avoid changes to release/14.x.

MaskRay edited the summary of this revision. (Show Details)Feb 24 2022, 4:11 PM
This revision was landed with ongoing or failed builds.Feb 24 2022, 4:22 PM
This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Feb 25 2022, 12:36 AM

Hm, it looks like enabling PIE has a pretty big negative compile-time impact, with a 20% regression on sqlite3: http://llvm-compile-time-tracker.com/compare.php?from=611122892e6d5813444bdd0e1fbe0a96f6e09779&to=3c4ed02698afec021c6bca80740d1e58e3ee019e&stat=instructions Code-size on kimwitu++ regresses by 13%.

Is that kind of impact expected?

Hm, it looks like enabling PIE has a pretty big negative compile-time impact, with a 20% regression on sqlite3: http://llvm-compile-time-tracker.com/compare.php?from=611122892e6d5813444bdd0e1fbe0a96f6e09779&to=3c4ed02698afec021c6bca80740d1e58e3ee019e&stat=instructions Code-size on kimwitu++ regresses by 13%.

Is that kind of impact expected?

It's not expected on aarch64/x86-64. Some architectures not supporting PC-relative relocations (i386/ppc32) may suffer and that's expected.

We need to investigate the -fno-pic vs -fpie difference.
Many groups already use PIE and the investigation will be useful.

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

MaskRay added a comment.EditedFeb 25 2022, 10:19 AM

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Hm, it looks like enabling PIE has a pretty big negative compile-time impact, with a 20% regression on sqlite3: http://llvm-compile-time-tracker.com/compare.php?from=611122892e6d5813444bdd0e1fbe0a96f6e09779&to=3c4ed02698afec021c6bca80740d1e58e3ee019e&stat=instructions Code-size on kimwitu++ regresses by 13%.

Is that kind of impact expected?

almost no impact on legacy pm, interesting..

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Actually I don't know how to disable the tests: https://lab.llvm.org/buildbot/#/builders/57/builds/15497 still failed.
Hope someone from #powerpc can disable them.

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Actually I don't know how to disable the tests: https://lab.llvm.org/buildbot/#/builders/57/builds/15497 still failed.
Hope someone from #powerpc can disable them.

This does not appear to be a matter of simply marking some tests as UNSUPPORTED. Since this landed, there have been many builds with different sanitizer failures and different numbers of sanitizer failures. Please pull this patch to bring the bots back to green and we can work with you next week on fixing what needs to be fixed.

MaskRay added a comment.EditedFeb 25 2022, 4:43 PM

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Actually I don't know how to disable the tests: https://lab.llvm.org/buildbot/#/builders/57/builds/15497 still failed.
Hope someone from #powerpc can disable them.

This does not appear to be a matter of simply marking some tests as UNSUPPORTED. Since this landed, there have been many builds with different sanitizer failures and different numbers of sanitizer failures. Please pull this patch to bring the bots back to green and we can work with you next week on fixing what needs to be fixed.

I enabled -DCLANG_DEFAULT_PIE_ON_LINUX=OFF for clang-ppc64le-rhel: https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad
which should fix the issues.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Actually I don't know how to disable the tests: https://lab.llvm.org/buildbot/#/builders/57/builds/15497 still failed.
Hope someone from #powerpc can disable them.

This does not appear to be a matter of simply marking some tests as UNSUPPORTED. Since this landed, there have been many builds with different sanitizer failures and different numbers of sanitizer failures. Please pull this patch to bring the bots back to green and we can work with you next week on fixing what needs to be fixed.

I enabled -DCLANG_DEFAULT_PIE_ON_LINUX=OFF for clang-ppc64le-rhel: https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad
which should fix the issues.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Fixing the buildbot like this doesn't help regular users though. I think it's better to revert and then work on a solution as @nemanjai suggested. What are the downsides to reverting this?

@MaskRay The ppc buildbots have been red since these patches - please can you take a look? https://lab.llvm.org/buildbot/#/builders/57/builds/15454

Seems that ppc64 doesn't support test/sanitizer_common test/crt -fpie. I'll just disable them: https://github.com/llvm/llvm-project/issues/54084

Actually I don't know how to disable the tests: https://lab.llvm.org/buildbot/#/builders/57/builds/15497 still failed.
Hope someone from #powerpc can disable them.

This does not appear to be a matter of simply marking some tests as UNSUPPORTED. Since this landed, there have been many builds with different sanitizer failures and different numbers of sanitizer failures. Please pull this patch to bring the bots back to green and we can work with you next week on fixing what needs to be fixed.

I enabled -DCLANG_DEFAULT_PIE_ON_LINUX=OFF for clang-ppc64le-rhel: https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad
which should fix the issues.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Fixing the buildbot like this doesn't help regular users though. I think it's better to revert and then work on a solution as @nemanjai suggested. What are the downsides to reverting this?

Well, I can speak for regular ppc64le users now since q66 kindly gives me access to a ppc64le machine.
check-lsan check-sanitizer check-crt etc passed fairly well with default PIE Clang.

I think we previously paid too less attention on build bot stability problems.
In this case I suspected again this is such a problem (typically sanitizer not so compatible with some older glibc/binutils)

<stdin>:1:16: note: possible intended match here
error: Segmentation fault (core dumped)

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Statements like this seem to be at odds with this community's culture (or at least the way I understand it).
As long as I have been a member of this community, the guidance for patches that break bots is to fix it immediately if the fix is obvious/trivial and if it isn't, to pull the patch until a solution can be found. I am not aware of any changes to this policy. I would also like to add that this approach serves most other members of the community quite well and at least I personally don't see much opposition to this. Frankly, the only person I have ever received a response that amounts to "I would rather not" when asking them to pull a patch that breaks bots is yourself.

So I'll try to respond to the individual statements you have made here.

  1. No access to a PowerPC machine - we have given you access to one before and are happy to do so again at any time.
  2. "bluntly requesting to pull the patch" - This is perhaps the part of your statement that I find most surprising. In case you haven't come across this, I encourage you to have a read through it. If you feel this needs to change, I encourage you to bring it up for discussion with the wider community. Of particular interest is this sentence: We strongly encourage “revert to green” as opposed to “fixing forward”. We encourage reverting first, investigating offline, and then reapplying the fixed patch - possibly after another round of review if warranted.
  3. "there is a better approach" - I don't think I have to spend a lot of time explaining how subjective this statement is.
  4. "there is something that a bot maintainer can do" - I can't quite decipher whether this is a disingenuous statement pretending that this is a situation where the bot maintainer (effectively myself in this case) isn't willing to help. Or perhaps it is a statement you made in the heat of the moment when I asked you to pull the patch and you missed the part where I offered help to resolve the issues when normal business resumes. I will give you the benefit of assuming the latter as I truly don't believe you have ill intentions here. I would just like to add that, as you surely realize, bot maintainers are not sitting around waiting for someone to break their bot so they can jump in immediately and work on resolving the issue. As contributors to this project, it is our responsibility to keep the tip of trunk green and to work with bot maintainers on their own time (within reason) to resolve issues we cause on their bots.
  5. The rather surprising and discouraging statement you made under (c) - while I realize that PowerPC may not be a target on which you develop, it is really not fair to make a blanket statement that PowerPC is not stable wrt. sanitizers. If you feel there is a stability issue with a specific bot or a specific target, I encourage you to collect data about false failures and bring it up with us - we would be happy to look into it as I am sure would any other target/bot maintainer. Statements like this sow the seeds of resentment towards specific targets - you are effectively saying that PowerPC is an unstable target (at least wrt. to sanitizers) but we insist on burdening the community with our unstable target by running sanitizer tests in our bots. I am afraid that unlike number 4. above, I don't find any ambiguity in your statement here. Your statement seems to be clearly and unambiguously hostile towards PowerPC and as such, I find it at odds with the spirit of this community. Regardless of all of that, I would once again like to reiterate that I would be very happy to look into false failures you are encountering with sanitizers on PowerPC.

Fangrui, I would really like you to understand that I very much value your contribution to various LLVM projects. You are an exceptional developer and seem to be laser focused on advancing the state of LLVM and its subprojects. This is not at all lost on me. However, you need to understand that this community has all kinds of participants, many of which also care about older or somewhat esoteric targets, operating systems, toolchains, etc. As such, there will be situations such as this where it is definitely the case that most != all and there may have to be some nuance in applying changes. At the very least, such situations may require slowing down, pulling a patch or two, re-evaluating and finding the best way forward. And it is important to recognize that sometimes the best way forward may not be something that all parties prefer, but at least all affected parties need to be part of the discussion.

It is unfortunate that a discussion in a patch review devolved into a discussion on the philosophy of community participation, but I felt that you made strong statements that I needed to address. While I won't continue this discussion in this forum, I would be happy to meet with you virtually and discuss this.

MaskRay added a comment.EditedFeb 25 2022, 10:10 PM

Statements like this seem to be at odds with this community's culture (or at least the way I understand it).
As long as I have been a member of this community, the guidance for patches that break bots is to fix it immediately if the fix is obvious/trivial and if it isn't, to pull the patch until a solution can be found. I am not aware of any changes to this policy. I would also like to add that this approach serves most other members of the community quite well and at least I personally don't see much opposition to this. Frankly, the only person I have ever received a response that amounts to "I would rather not" when asking them to pull a patch that breaks bots is yourself.

Hi @nemanjai, when I mentioned "as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing", I was talking about the build bots.
However, you seem to take it very personally, conflate it with a judgement to the PowerPC target, and try to defend the PowerPC target.
I think perhaps you become too emotional from this reply. Hope we can clam down.

So I'll try to respond to the individual statements you have made here.

No access to a PowerPC machine - we have given you access to one before and are happy to do so again at any time.

Ack. Thanks for that.

"bluntly requesting to pull the patch" - This is perhaps the part of your statement that I find most surprising. In case you haven't come across this, I encourage you to have a read through it. If you feel this needs to change, I encourage you to bring it up for discussion with the wider community. Of particular interest is this sentence: We strongly encourage “revert to green” as opposed to “fixing forward”. We encourage reverting first, investigating offline, and then reapplying the fixed patch - possibly after another round of review if warranted.

Sure, I know this policy. See below.

"there is a better approach" - I don't think I have to spend a lot of time explaining how subjective this statement is.
"there is something that a bot maintainer can do" - I can't quite decipher whether this is a disingenuous statement pretending that this is a situation where the bot maintainer (effectively myself in this case) isn't willing to help. Or perhaps it is a statement you made in the heat of the moment when I asked you to pull the patch and you missed the part where I offered help to resolve the issues when normal business resumes. I will give you the benefit of assuming the latter as I truly don't believe you have ill intentions here. I would just like to add that, as you surely realize, bot maintainers are not sitting around waiting for someone to break their bot so they can jump in immediately and work on resolving the issue. As contributors to this project, it is our responsibility to keep the tip of trunk green and to work with bot maintainers on their own time (within reason) to resolve issues we cause on their bots.

The current issue is that one ppc64 bot (clang-ppc64le-rhel) was complaing about some sanitizer tests while other ppc64 bots testing sanitizers (e.g. clang-ppc64le-linux-multistage sanitizer-ppc64be-linux sanitizer-ppc64le-linux) were apparently happy.
This is an important factor making me believe this is a build bot stability issue, not the patch's fault.

From the failure log, the diagnostic looks like:

<stdin>:1:11: note: possible intended match here
error: Segmentation fault (core dumped)

This is weird. This was not the first time I saw this "Segmentation fault".
In the past, I saw GNU ld crashes or similar weird segfaults (on perhaps this bot).

-fPIE -pie is a very common configuration tested extensively by users and a bunch of ppc64 distributions.
In the downstream distributors typically patch Clang to behave the similar way as GCC, as this CMake patch tried to do to reduce downstream patch/customization burden.

Just a while ago, I analyzed a simple test test/crt/ctor_dtor.c. I am going to say the failure looked stranger to me. See my analysis in 71c4b8dfe1dd7c20242d303cd33cc022b7af9331. I used q66's machine to verify that -fpie -pie should be fine with crt1.o used by the test.

Making use of my expertise and experience, I put up this statement fairly responsibly: this really looks like the problem with this bot, or sanitizer tests got enabled while some inherent stability issues havn't been addressed.

Perhaps sanitizers are not so compatible with the bot's environment (say glibc).
If so, the issue will pop up from time to time and a bot maintainer may jump out to every patch which tickles the problem.

This is just going to waste developer time.
There is a very high probability that a patch author may shortgun and revert the wrong patch, causing unneeded churn.

I understand that a bot maintainer does not bandwidth to watch every change.
But this is the very time that we would appreciate that a bot maintainer did a little work just to make everyone happy by drop testing the flaky tests.
I realized that I would not suggest adding UNSUPPORTED to test files since that would add churn to the history.
The right call is to disable such tests for the particular bot, or let it use CLANG_DEFAULT_PIE_ON_LINUX=off for now, which I did in https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad

I can share with you for some sanitizer tests I have quite good experience with Linaro developers. They often give more tolerance and in many times they are willing to investigate the issue by themselves.

The rather surprising and discouraging statement you made under (c) - while I realize that PowerPC may not be a target on which you develop, it is really not fair to make a blanket statement that PowerPC is not stable wrt. sanitizers. If you feel there is a stability issue with a specific bot or a specific target, I encourage you to collect data about false failures and bring it up with us - we would be happy to look into it as I am sure would any other target/bot maintainer. Statements like this sow the seeds of resentment towards specific targets - you are effectively saying that PowerPC is an unstable target (at least wrt. to sanitizers) but we insist on burdening the community with our unstable target by running sanitizer tests in our bots. I am afraid that unlike number 4. above, I don't find any ambiguity in your statement here. Your statement seems to be clearly and unambiguously hostile towards PowerPC and as such, I find it at odds with the spirit of this community. Regardless of all of that, I would once again like to reiterate that I would be very happy to look into false failures you are encountering with sanitizers on PowerPC.

Everyone can prove the issue on my behalf.

Open https://lab.llvm.org/buildbot/#/builders/57 , click "Success Rate" tab, the rate is lower compared to other bots recently, even lower than some other ppc64 bots.
Perhaps other ppc64 sanitizer bots are quite good and this is just a problem at least in this bot.

Perhaps open https://lab.llvm.org/buildbot/#/builders/57?numbuilds=1000 : it failed quite frequently in the past.

Fangrui, I would really like you to understand that I very much value your contribution to various LLVM projects. You are an exceptional developer and seem to be laser focused on advancing the state of LLVM and its subprojects. This is not at all lost on me. However, you need to understand that this community has all kinds of participants, many of which also care about older or somewhat esoteric targets, operating systems, toolchains, etc. As such, there will be situations such as this where it is definitely the case that most != all and there may have to be some nuance in applying changes. At the very least, such situations may require slowing down, pulling a patch or two, re-evaluating and finding the best way forward. And it is important to recognize that sometimes the best way forward may not be something that all parties prefer, but at least all affected parties need to be part of the discussion.

It is unfortunate that a discussion in a patch review devolved into a discussion on the philosophy of community participation, but I felt that you made strong statements that I needed to address. While I won't continue this discussion in this forum, I would be happy to meet with you virtually and discuss this.

Thank you for the praise:) But please see my first paragraph in the reply. You perhaps become too emotional from this reply.
I do not see how my previous reply implied anything disrespect to the PowerPC target.
I am going to say I do not necessarily agree with your implicit claim that I do not care about PowerPC: I have made ~191 changes mentioning PowerPC or PPC.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Statements like this seem to be at odds with this community's culture (or at least the way I understand it).
As long as I have been a member of this community, the guidance for patches that break bots is to fix it immediately if the fix is obvious/trivial and if it isn't, to pull the patch until a solution can be found. I am not aware of any changes to this policy. I would also like to add that this approach serves most other members of the community quite well and at least I personally don't see much opposition to this. Frankly, the only person I have ever received a response that amounts to "I would rather not" when asking them to pull a patch that breaks bots is yourself.

@nemanjai Is correct here.

@MaskRay I feel like we are starting to repeat the same discussion we had with the start-stop-gc patches, and I would like to have a better outcome this time. Can you please just revert the patch and then we can discuss the next steps.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Statements like this seem to be at odds with this community's culture (or at least the way I understand it).
As long as I have been a member of this community, the guidance for patches that break bots is to fix it immediately if the fix is obvious/trivial and if it isn't, to pull the patch until a solution can be found. I am not aware of any changes to this policy. I would also like to add that this approach serves most other members of the community quite well and at least I personally don't see much opposition to this. Frankly, the only person I have ever received a response that amounts to "I would rather not" when asking them to pull a patch that breaks bots is yourself.

@nemanjai Is correct here.

May I beg that you read my reply first and give more evidence when standing by one party?
You as a https://foundation.llvm.org/docs/board/ member, your words weigh a lot, but with great power comes great responsibility, so every judgement needs to be made prudently.

@MaskRay I feel like we are starting to repeat the same discussion we had with the start-stop-gc patches, and I would like to have a better outcome this time. Can you please just revert the patch and then we can discuss the next steps.

Everyone wants to discuss start-stop-gc can go to https://discourse.llvm.org/t/lld-default-nostart-stop-gc-default-on-release-13-x-and-main/5369
Please don't conflate things here. I felt bad that people used an unguaranteed behavior as granted and accused/attacked me of changes.
I think time has proved my correctness: I don't think anyone sees new wacky things in this area.

While I feel sorry for leaving clang-ppc64le-rhel red for some time and am willing to fix issues if I have access to a ppc64 machine (especially compiler-rt ones that I care about),
I feel uncomfortably if a group just bluntly request "please pull this patch" when apparently (a) there is a better approach (explicitly setting CLANG_DEFAULT_PIE_ON_LINUX=OFF) and (b) there is something a bot maintainer can do
and (c) there is just some inherent stability problem (in this case, consider not enabling the testing when the target is still unstable) that is causing not only this issue, but various other reports (as I watch sanitizer failures quite closely and ppc64 often tends to be the outlier thing)

Statements like this seem to be at odds with this community's culture (or at least the way I understand it).
As long as I have been a member of this community, the guidance for patches that break bots is to fix it immediately if the fix is obvious/trivial and if it isn't, to pull the patch until a solution can be found. I am not aware of any changes to this policy. I would also like to add that this approach serves most other members of the community quite well and at least I personally don't see much opposition to this. Frankly, the only person I have ever received a response that amounts to "I would rather not" when asking them to pull a patch that breaks bots is yourself.

@nemanjai Is correct here.

May I beg that you read my reply first and give more evidence when standing by one party?
You as a https://foundation.llvm.org/docs/board/ member, your words weigh a lot, but with great power comes great responsibility, so every judgement needs to be made prudently.

@MaskRay I feel like we are starting to repeat the same discussion we had with the start-stop-gc patches, and I would like to have a better outcome this time. Can you please just revert the patch and then we can discuss the next steps.

Everyone wants to discuss start-stop-gc can go to https://discourse.llvm.org/t/lld-default-nostart-stop-gc-default-on-release-13-x-and-main/5369
Please don't conflate things here. I felt bad that people used an unguaranteed behavior as granted and accused/attacked me of changes.
I think time has proved my correctness: I don't think anyone sees new wacky things in this area.

The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

error: Segmentation fault (core dumped)

This happens quite often on these ppc bots - they are known for unstability and slow cycles.

Open https://lab.llvm.org/buildbot/#/builders/57 , click "Success Rate" tab, the rate is lower compared to other bots recently, even lower than some other ppc64 bots.

We should also open discussion if such low success rate is “acceptable” for LLVM project. Buildbot maintainers are also responsible to provide as stable and working bot as possible.

MaskRay added a comment.EditedFeb 25 2022, 11:32 PM

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

And I think I need to push back such request as otherwise the red bot is going to waste other contributors' time (see my previous links on "Success Rate").
OK, that took me our hour to reply as a non-native speaker, forgetting that I probably should try harder to fix the issue.

Anyway, I find that my previous disabling tests attempt was actually effective. 274ec425dcc3e3f637dd006c5e9ae33bd0e2e917 should have appeased https://lab.llvm.org/buildbot/#/builders/57

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

I feel like we are talking past each other at this point, but in general changing the buildbot configuration is not an acceptable solution to a broken bot. Did the bot owner approve that change?

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

I feel like we are talking past each other at this point, but in general changing the buildbot configuration is not an acceptable solution to a broken bot. Did the bot owner approve that change?

Unsure why it isn't acceptable. There is fairly strong evidence that that specific ppc64le buildbot has a stability issue and -DCLANG_DEFAULT_PIE_ON_LINUX=OFF keeps it as if before this CMake patch.
We can always discuss this with @nemanjai in another place, e.g. in IRC if we want to stop bothering others subscribing to this thread.

(The trunk is clean and churn-less (there is no revert so downstream users will not get changed behaviors back and forth)).

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

I feel like we are talking past each other at this point, but in general changing the buildbot configuration is not an acceptable solution to a broken bot. Did the bot owner approve that change?

Unsure why it isn't acceptable. There is fairly strong evidence that that specific ppc64le buildbot has a stability issue and -DCLANG_DEFAULT_PIE_ON_LINUX=OFF keeps it as if before this CMake patch.
We can always discuss this with @nemanjai in another place, e.g. in IRC if we want to stop bothering others subscribing to this thread.

It's not acceptable, because buildbots are meant to represent a specific configuration that the buildbot owner cares about. Changing the buildbot configuration makes the buildbot no longer useful to them since it is now testing something different. For example, if someone is running production builds that used the same configuration as the buildbot, those production builds are still broken even though the buildbot is green. Configuration changes like this need to have approval of the bot owner.

I still don't understand why you can't revert the patch. I've encountered this same situation numerous times while working on this project and no one has ever objected this much to doing a revert. The fact that this is the second time this has happened is concerning to me.

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

I feel like we are talking past each other at this point, but in general changing the buildbot configuration is not an acceptable solution to a broken bot. Did the bot owner approve that change?

Unsure why it isn't acceptable. There is fairly strong evidence that that specific ppc64le buildbot has a stability issue and -DCLANG_DEFAULT_PIE_ON_LINUX=OFF keeps it as if before this CMake patch.
We can always discuss this with @nemanjai in another place, e.g. in IRC if we want to stop bothering others subscribing to this thread.

It's not acceptable, because buildbots are meant to represent a specific configuration that the buildbot owner cares about. Changing the buildbot configuration makes the buildbot no longer useful to them since it is now testing something different. For example, if someone is running production builds that used the same configuration as the buildbot, those production builds are still broken even though the buildbot is green. Configuration changes like this need to have approval of the bot owner.

If clang-ppc64le-rhel works with -DCLANG_DEFAULT_PIE_ON_LINUX=off but not with -DCLANG_DEFAULT_PIE_ON_LINUX=on, adding -DCLANG_DEFAULT_PIE_ON_LINUX=off makes the intention explicit. I am not sure why this isn't acceptable.
It's a tech debt, though, as we are making configurations fragmented.

I still don't understand why you can't revert the patch. I've encountered this same situation numerous times while working on this project and no one has ever objected this much to doing a revert. The fact that this is the second time this has happened is concerning to me.

I have stated my reasoning. Configuration churn can also be a problem to users.
Consider what if the DWARF v5 patch got reverted and relanded back and forth. Downstream users would keep observing changing behaviors.

In this case, really even normal ppc64le machines (including some bots) were happy and just that one was picky.
Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

It is more concerning to me that a bot maintainer leaves an unstable bot for so long and is not even willing to spend very little time to make a llvm-zorg change live (perhaps just restart the bot software), but rather is more willing to write such a long reply taking it very personally.
(Sorry, I know when I write this, I was a bit in a mood. I felt quite frustrated at this point, so I could not keep using the tune when I made the reply https://reviews.llvm.org/D120305#3347094)

I can response to you that I have shared the thread to some folks and at least two agree with me that nemanjai's reply made the discussion less technical but more personal.
And one pointed out that your "nemanjai is correct" message was made a bit arbitrary, without evidence of considering the full context.

[...]
The issue here has nothing to do with the technical merits of the patch or what the root cause of the problem is. The policy for this project is that if you commit a patch that breaks someone's configuration (especially a buildbot), then it needs to be fixed quickly or reverted. I get that this policy can be frustrating as a committer when you feel your patch is correct, and the real problem is elsewhere, but this is still the policy and it should be followed.

7 hours ago my https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad was sufficient to fix the issue and was the suggested fix in my opinion.
Unfortunately nobody on the PowerPC side made the change effective in the build bot. Rather, I received such a heated message (https://reviews.llvm.org/D120305#3347058).
It was another way to fix the redness (revert) but IMO not justified.

I feel like we are talking past each other at this point, but in general changing the buildbot configuration is not an acceptable solution to a broken bot. Did the bot owner approve that change?

Unsure why it isn't acceptable. There is fairly strong evidence that that specific ppc64le buildbot has a stability issue and -DCLANG_DEFAULT_PIE_ON_LINUX=OFF keeps it as if before this CMake patch.
We can always discuss this with @nemanjai in another place, e.g. in IRC if we want to stop bothering others subscribing to this thread.

It's not acceptable, because buildbots are meant to represent a specific configuration that the buildbot owner cares about. Changing the buildbot configuration makes the buildbot no longer useful to them since it is now testing something different. For example, if someone is running production builds that used the same configuration as the buildbot, those production builds are still broken even though the buildbot is green. Configuration changes like this need to have approval of the bot owner.

If clang-ppc64le-rhel works with -DCLANG_DEFAULT_PIE_ON_LINUX=off but not with -DCLANG_DEFAULT_PIE_ON_LINUX=on, adding -DCLANG_DEFAULT_PIE_ON_LINUX=off makes the intention explicit. I am not sure why this isn't acceptable.
It's a tech debt, though, as we are making configurations fragmented.

I still don't understand why you can't revert the patch. I've encountered this same situation numerous times while working on this project and no one has ever objected this much to doing a revert. The fact that this is the second time this has happened is concerning to me.

I have stated my reasoning. Configuration churn can also be a problem to users.
Consider what if the DWARF v5 patch got reverted and relanded back and forth. Downstream users would keep observing changing behaviors.

In this case, really even normal ppc64le machines (including some bots) were happy and just that one was picky.
Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

It is more concerning to me that a bot maintainer leaves an unstable bot for so long and is not even willing to spend very little time to make a llvm-zorg change live (perhaps just restart the bot software), but rather is more willing to write such a long reply taking it very personally.
(Sorry, I know when I write this, I was a bit in a mood. I felt quite frustrated at this point, so I could not keep using the tune when I made the reply https://reviews.llvm.org/D120305#3347094)

I can response to you that I have shared the thread to some folks and at least two agree with me that nemanjai's reply made the discussion less technical but more personal.

Can these folks provide their feedback on the thread?

And one pointed out that your "nemanjai is correct" message was made a bit arbitrary, without evidence of considering the full context.

The Reversion Policy is documented here: https://llvm.org/docs/DeveloperPolicy.html#patch-reversion-policy

Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

What is interesting that always the patch author is the person to be blamed even when there are situations where it is so obvious that a bot has a problem. LLVM project/board should also care about buildbots and whether they work - also there should be strict policy about bot removal if maintainer does not want to spend any time to fix problems.

It is very annoying to have unstable buildbots. One one side LLVM tries to be very inclusive and open for new people, on the other hand LLVM scares them with broken bots - first experience could be very bad.

tstellar added a comment.EditedFeb 26 2022, 12:49 AM

Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

What is interesting that always the patch author is the person to be blamed even when there are situations where it is so obvious that a bot has a problem. LLVM project/board should also care about buildbots and whether they work - also there should be strict policy about bot removal if maintainer does not want to spend any time to fix problems.

It is very annoying to have unstable buildbots. One one side LLVM tries to be very inclusive and open for new people, on the other hand LLVM scares them with broken bots - first experience could be very bad.

The problem here is not an unstable bot. If you look at the build log for the bot, it is very clear that this patch introduced the failures.

Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

What is interesting that always the patch author is the person to be blamed even when there are situations where it is so obvious that a bot has a problem. LLVM project/board should also care about buildbots and whether they work - also there should be strict policy about bot removal if maintainer does not want to spend any time to fix problems.

It is very annoying to have unstable buildbots. One one side LLVM tries to be very inclusive and open for new people, on the other hand LLVM scares them with broken bots - first experience could be very bad.

The problem here is not an unstable bot. If you look at the build log for the bot, it is very clear that this patch introduced the failures.

clang-ppc64le-rhel has the lowest success rate.

nikic added a comment.Feb 26 2022, 1:01 AM

@MaskRay Please revert the change and all dependent changes you have made. A revert is not a personal affront to you. It's not a judgement that you or your change are bad. It's a simple matter of policy and standard procedure. There's a good chance that next week you'll get confirmation that it's indeed some outdated libraries on the buildbot, and the change can reland without any changes on your side. Or maybe it turns out that this default is not quite viable for powerpc targets yet. Who knows.

Don't worry about the churn. These short-term reverts happen all the time, we're used to it. Especially for tiny changes like this one, it's really no problem (reverts can be more annoying if the commit touches 1500 test files).

@MaskRay Please revert the change and all dependent changes you have made. A revert is not a personal affront to you. It's not a judgement that you or your change are bad. It's a simple matter of policy and standard procedure. There's a good chance that next week you'll get confirmation that it's indeed some outdated libraries on the buildbot, and the change can reland without any changes on your side. Or maybe it turns out that this default is not quite viable for powerpc targets yet. Who knows.

Don't worry about the churn. These short-term reverts happen all the time, we're used to it. Especially for tiny changes like this one, it's really no problem (reverts can be more annoying if the commit touches 1500 test files).

I have mentioned that https://lab.llvm.org/buildbot/#/builders/57 has been green.

[...]

It's not acceptable, because buildbots are meant to represent a specific configuration that the buildbot owner cares about. Changing the buildbot configuration makes the buildbot no longer useful to them since it is now testing something different. For example, if someone is running production builds that used the same configuration as the buildbot, those production builds are still broken even though the buildbot is green. Configuration changes like this need to have approval of the bot owner.

If clang-ppc64le-rhel works with -DCLANG_DEFAULT_PIE_ON_LINUX=off but not with -DCLANG_DEFAULT_PIE_ON_LINUX=on, adding -DCLANG_DEFAULT_PIE_ON_LINUX=off makes the intention explicit. I am not sure why this isn't acceptable.
It's a tech debt, though, as we are making configurations fragmented.

I still don't understand why you can't revert the patch. I've encountered this same situation numerous times while working on this project and no one has ever objected this much to doing a revert. The fact that this is the second time this has happened is concerning to me.

I have stated my reasoning. Configuration churn can also be a problem to users.
Consider what if the DWARF v5 patch got reverted and relanded back and forth. Downstream users would keep observing changing behaviors.

In this case, really even normal ppc64le machines (including some bots) were happy and just that one was picky.
Given that the bot does not have a high success rate (track record) for at least the past month, I am unsure I am supposed to revert my change.

It is more concerning to me that a bot maintainer leaves an unstable bot for so long and is not even willing to spend very little time to make a llvm-zorg change live (perhaps just restart the bot software), but rather is more willing to write such a long reply taking it very personally.
(Sorry, I know when I write this, I was a bit in a mood. I felt quite frustrated at this point, so I could not keep using the tune when I made the reply https://reviews.llvm.org/D120305#3347094)

I can response to you that I have shared the thread to some folks and at least two agree with me that nemanjai's reply made the discussion less technical but more personal.

Can these folks provide their feedback on the thread?

Folks willing to provide feedback usually want to be anonymous. You may understand that they don't want to be bothered by further messages and don't want to stand by one side.

I know my replies are wasting subscribers' time and I feel sorry about that.
But I can assure to you that if without https://reviews.llvm.org/D120305#3347058 and your reply to that comment, we were likely in a green state pretty early 🥳

nikic added a comment.Feb 26 2022, 1:17 AM

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

@MaskRay Please revert the change and all dependent changes you have made. A revert is not a personal affront to you. It's not a judgement that you or your change are bad. It's a simple matter of policy and standard procedure. There's a good chance that next week you'll get confirmation that it's indeed some outdated libraries on the buildbot, and the change can reland without any changes on your side. Or maybe it turns out that this default is not quite viable for powerpc targets yet. Who knows.

Don't worry about the churn. These short-term reverts happen all the time, we're used to it. Especially for tiny changes like this one, it's really no problem (reverts can be more annoying if the commit touches 1500 test files).

I have mentioned that https://lab.llvm.org/buildbot/#/builders/57 has been green.

Disabling the failing tests with an unreviewed patch is not the right way to fix this.

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

@MaskRay Please revert the change and all dependent changes you have made. A revert is not a personal affront to you. It's not a judgement that you or your change are bad. It's a simple matter of policy and standard procedure. There's a good chance that next week you'll get confirmation that it's indeed some outdated libraries on the buildbot, and the change can reland without any changes on your side. Or maybe it turns out that this default is not quite viable for powerpc targets yet. Who knows.

Don't worry about the churn. These short-term reverts happen all the time, we're used to it. Especially for tiny changes like this one, it's really no problem (reverts can be more annoying if the commit touches 1500 test files).

I have mentioned that https://lab.llvm.org/buildbot/#/builders/57 has been green.

Disabling the failing tests with an unreviewed patch is not the right way to fix this.

clang-ppc64le-rhel got -DCLANG_DEFAULT_PIE_ON_LINUX=OFF ~9 hours ago. If any of you can make the configuration live on the bot, it will work and we can re-enable the tests.

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

The build is green because of this commit: https://github.com/llvm/llvm-project/commit/274ec425dcc3e3f637dd006c5e9ae33bd0e2e917 The buildbot change you are referring to, which is https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad, has not taken effect yet, because the buildbot server has not been restarted.

nikic added a comment.Feb 26 2022, 1:32 AM

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

The build is green because of this commit: https://github.com/llvm/llvm-project/commit/274ec425dcc3e3f637dd006c5e9ae33bd0e2e917 The buildbot change you are referring to, which is https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad, has not taken effect yet, because the buildbot server has not been restarted.

Wow, that's even worse. So now it's not just a change to that one buildbot, but sanitizer tests were disabled for powerpc entirely?!

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

The build is green because of this commit: https://github.com/llvm/llvm-project/commit/274ec425dcc3e3f637dd006c5e9ae33bd0e2e917 The buildbot change you are referring to, which is https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad, has not taken effect yet, because the buildbot server has not been restarted.

Wow, that's even worse. So now it's not just a change to that one buildbot, but sanitizer tests were disabled for powerpc entirely?!

Only few sanitizer_common and lsan tests, not entirely. It should be re-enabled pretty soon once the llvm-zorg change is made live.
That was I mentioned that "we can enable the tests".

I think at this point, if you prefer, I am happy to revert this change the disabling (it needs quite a bit of tests), so that we can know whether -fno-pic and -fpie have a large difference on sqlite3 performance.

I lost my control when I saw https://reviews.llvm.org/D120305#3347058 and Tom's first reply to it.
For quite few hours yesterday, I did not know my fixed did not fix the problem or that nobody tried making llvm-zorg change work.

MaskRay added a comment.EditedFeb 26 2022, 2:13 AM

Yes, because you reverted the change for that one buildbot, of course it is green now. You could have also made the buildbot green by disabling tests on that bot. Or disabling sanitizers on it. Doesn't change the fact that the configuration it was originally testing is still broken, you just hid the failure.

The build is green because of this commit: https://github.com/llvm/llvm-project/commit/274ec425dcc3e3f637dd006c5e9ae33bd0e2e917 The buildbot change you are referring to, which is https://github.com/llvm/llvm-zorg/commit/b6ddf02ce3a54da2df29e7e599b1838167e0e3ad, has not taken effect yet, because the buildbot server has not been restarted.

Wow, that's even worse. So now it's not just a change to that one buildbot, but sanitizer tests were disabled for powerpc entirely?!

Only few sanitizer_common and lsan tests, not entirely. It should be re-enabled pretty soon once the llvm-zorg change is made live.
That was I mentioned that "we can enable the tests".

I think at this point, if you prefer, I am happy to revert this change the disabling (it needs quite a bit of tests), so that we can know whether -fno-pic and -fpie have a large difference on sqlite3 performance.

I lost my control when I saw https://reviews.llvm.org/D120305#3347058 and Tom's first reply to it.
For quite few hours yesterday, I did not know my fixed did not fix the problem or that nobody tried making llvm-zorg change work.

@nikic If you still want to revert, you can use https://reviews.llvm.org/P8281
I have done some testing that it is fine.

It is too late here so I'll not be responsive for many hours.

Just realized that I had some unneeded debate which isn't really important

MaskRay added a comment.EditedFeb 26 2022, 12:55 PM

Hm, it looks like enabling PIE has a pretty big negative compile-time impact, with a 20% regression on sqlite3: http://llvm-compile-time-tracker.com/compare.php?from=611122892e6d5813444bdd0e1fbe0a96f6e09779&to=3c4ed02698afec021c6bca80740d1e58e3ee019e&stat=instructions Code-size on kimwitu++ regresses by 13%.

Is that kind of impact expected?

Many groups build sqlite3 with -fPIC. -fPIC and -fPIE have similar compile time.
The metric on llvm-compile-time-tracker.com is related to llvm-test-suite/CTMark/sqlite3, which just focuses on (without the CMake patch) -fno-pic performance.
I do not think the CMake change for linux-gnu needs to be blamed. Most targets default to PIC/PIE nowadays.

(
I checked run time, no big difference.

% hyperfine --warmup 2 --min-runs 16 "/tmp/c/sqlite3.nopie -init sqlite3rc :memory: < commands"                         
Benchmark 1: /tmp/c/sqlite3.nopie -init sqlite3rc :memory: < commands
  Time (mean ± σ):      2.068 s ±  0.011 s    [User: 2.044 s, System: 0.024 s]
  Range (min … max):    2.044 s …  2.085 s    16 runs

% hyperfine --warmup 2 --min-runs 16 "/tmp/c/sqlite3.pie -init sqlite3rc :memory: < commands"  
Benchmark 1: /tmp/c/sqlite3.pie -init sqlite3rc :memory: < commands
  Time (mean ± σ):      2.053 s ±  0.015 s    [User: 2.034 s, System: 0.018 s]
  Range (min … max):    2.027 s …  2.080 s    16 runs

)

About compile time,

I run =time -f "CPU: %Us\tReal: %es\tRAM: %MKB" /tmp/out/custom1/bin/clang -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DSQLITE_THREADSAFE=0 -I. -o CTMark/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -c ~/Dev/llvm-test-suite/CTMark/sqlite3/sqlite3.c -ftime-report with -fno-pie/-fpie/-fpic to get some insight.

The IR has minor difference but pie's is slightly larger:

% wc -l nopie.ll pie.ll
  157975 nopie.ll
  159675 pie.ll
...

If someone wants to investigate, sqlite3Parser changes a lot from -fno-pic to -fPIE, but that looks organic changes to me.
-ftime-report comparison suggests that every pass is slightly slower with -fPIE. ModuleInlinerWrapperPass and DevirtSCCRepeatedPass contribute most of the slowness.

-fno-pic -ftime-report
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   5.1753 ( 28.1%)   0.2359 ( 27.5%)   5.4112 ( 28.0%)   5.4123 ( 28.1%)  ModuleInlinerWrapperPass
   5.1447 ( 27.9%)   0.2309 ( 26.9%)   5.3755 ( 27.8%)   5.3767 ( 27.9%)  DevirtSCCRepeatedPass
   1.8716 ( 10.1%)   0.0936 ( 10.9%)   1.9652 ( 10.2%)   1.9613 ( 10.2%)  InstCombinePass
   0.7674 (  4.2%)   0.0254 (  3.0%)   0.7928 (  4.1%)   0.7921 (  4.1%)  GVNPass
   0.4563 (  2.5%)   0.0288 (  3.4%)   0.4851 (  2.5%)   0.4844 (  2.5%)  InlinerPass
   0.4194 (  2.3%)   0.0174 (  2.0%)   0.4368 (  2.3%)   0.4359 (  2.3%)  MemorySSAAnalysis
   0.3399 (  1.8%)   0.0191 (  2.2%)   0.3589 (  1.9%)   0.3578 (  1.9%)  SimplifyCFGPass
   0.3206 (  1.7%)   0.0143 (  1.7%)   0.3349 (  1.7%)   0.3354 (  1.7%)  BlockFrequencyAnalysis
   0.2985 (  1.6%)   0.0081 (  0.9%)   0.3065 (  1.6%)   0.3061 (  1.6%)  CorrelatedValuePropagationPass
   0.2608 (  1.4%)   0.0158 (  1.8%)   0.2767 (  1.4%)   0.2763 (  1.4%)  EarlyCSEPass
-fpie -ftime-report
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   7.8109 ( 29.2%)   0.2239 ( 26.8%)   8.0347 ( 29.1%)   8.0363 ( 29.2%)  ModuleInlinerWrapperPass
   7.7752 ( 29.1%)   0.2230 ( 26.6%)   7.9981 ( 29.0%)   7.9997 ( 29.0%)  DevirtSCCRepeatedPass
   2.5352 (  9.5%)   0.0807 (  9.6%)   2.6158 (  9.5%)   2.6114 (  9.5%)  InstCombinePass
   1.2792 (  4.8%)   0.0162 (  1.9%)   1.2954 (  4.7%)   1.2948 (  4.7%)  GVNPass
   0.6529 (  2.4%)   0.0157 (  1.9%)   0.6686 (  2.4%)   0.6674 (  2.4%)  MemorySSAAnalysis
   0.6048 (  2.3%)   0.0261 (  3.1%)   0.6309 (  2.3%)   0.6306 (  2.3%)  InlinerPass
   0.4625 (  1.7%)   0.0126 (  1.5%)   0.4751 (  1.7%)   0.4743 (  1.7%)  CorrelatedValuePropagationPass
   0.4541 (  1.7%)   0.0160 (  1.9%)   0.4701 (  1.7%)   0.4690 (  1.7%)  SimplifyCFGPass
   0.3981 (  1.5%)   0.0124 (  1.5%)   0.4105 (  1.5%)   0.4101 (  1.5%)  EarlyCSEPass
  • -fno-pie: Time (mean ± σ): 11.999 s ± 0.043 s
  • -fpie: Time (mean ± σ): 14.643 s ± 0.073 s
  • -fno-pie -flegacy-pass-manager: Time (mean ± σ): 16.831 s ± 0.027 s
  • -fpie -flegacy-pass-manager: Time (mean ± σ): 16.887 s ± 0.099 s

I have reverted the buildbot configuration change as well: https://github.com/llvm/llvm-zorg/commit/c01d0787c4a11e40d0fed81b0df6841bebc16f7f

Now that everything is back to the original state, we can discuss how to proceed.

nikic added a comment.EditedMar 1 2022, 1:47 AM

I've been looking into the compile-time regressions.

The large regression on sqlite3 is due to differences in inlining behavior. The inlining cost model is slightly different for PIC builds, because GEPs that are based on a global and have a variable offset are no longer considered free. In the sqlite3 case this leads to different inlining decisions, which ultimately lead to the large compile-time regression. I don't think there is anything immediately actionable here, it's just bad luck that the different cost modeling happens to push us across an arbitrary threshold.

I've also looked at many smaller regressions (around ~5% on individual files), where the common factor is that we spend a lot more time in RAGreedy (e.g. from 2% of total compile-time up to 6%). Most of the additional cost is in the calculation of region splitting costs. Something is clearly very wrong here, because a single RAGreedy::trySplit() call shouldn't take 3M instructions to execute. (Affected files are for example kimwy.cc, z06.c, z20.c.)

Edit: Here's a single-function sample file for which llc -relocation-model=pic spends appreciably more time in RAGreedy: https://gist.github.com/nikic/e08cc1d0caeb2b721f1fc690c3b76384

This comment was removed by tstellar.
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2022, 3:26 AM

@xbolva00 I've removed your comment due to inappropriate language and tone. Please find a more constructive way to communicate your point.

@nemanjai mentioned "All the issues I was seeing seem to have been resolved. I can build with CLANG_DEFAULT_PIE_ON_LINUX=on without any failing sanitizer or crt failures."
Will recommit. And ppc64 bots have hopefully got better with the upgrade. Thanks!

nikic added a comment.Apr 7 2022, 2:20 AM

Looks like this causes a relocation failure on this buildbot: https://lab.llvm.org/buildbot/#/builders/169/builds/7311

ld.lld: error: relocation R_MIPS_64 cannot be used against local symbol; recompile with -fPIC
>>> defined in ScudoUnitTestsObjects.wrappers_cpp_test.cpp.mips64.o
>>> referenced by wrappers_cpp_test.cpp
>>>               ScudoUnitTestsObjects.wrappers_cpp_test.cpp.mips64.o:(.eh_frame+0x15A39)
MaskRay added a comment.EditedApr 7 2022, 9:56 AM

The ScudoCUnitTest-mips64el-Test failure looks like a mips64el specific issue.

FAILED: lib/scudo/standalone/tests/ScudoCUnitTest-mips64el-Test 
cd /b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests && /b/sanitizer-x86_64-linux-qemu/build/llvm_build0/bin/clang++ ScudoUnitTestsObjects.wrappers_c_test.cpp.mips64el.o ScudoUnitTestsObjects.scudo_unit_test_main.cpp.mips64el.o ScudoUnitTestsObjects.gtest-all.cc.mips64el.o /b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/libRTScudoCUnitTest.mips64el.a -o /b/sanitizer-x86_64-linux-qemu/build/llvm_build2_debug_mips64el_qemu/lib/scudo/standalone/tests/./ScudoCUnitTest-mips64el-Test -fuse-ld=lld --target=mips64el-linux-gnuabi64 -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -lstdc++ -pthread -latomic
ld.lld: error: cannot preempt symbol: DW.ref.__gxx_personality_v0
>>> defined in ScudoUnitTestsObjects.wrappers_c_test.cpp.mips64el.o
>>> referenced by wrappers_c_test.cpp
>>>               ScudoUnitTestsObjects.wrappers_c_test.cpp.mips64el.o:(.eh_frame+0x1420B)

This was a different problem: on some old systems glibc crt1.o was not compiled with -fPIC and cannot be linked as -pie. I just fixed it by adding -fno-pic -no-pie.