This is an archive of the discontinued LLVM Phabricator instance.

Fix ItaniumRecordLayoutBuilder so that is grabs the correct bases class offsets from the external source
ClosedPublic

Authored by shafik on Jul 1 2020, 4:01 PM.

Details

Summary

Currently the ItaniumRecordLayoutBuilder when laying out base classes has the virtual and non-virtual bases mixed up when pulling the base class layouts from the external source.

This came up in an LLDB bug where on arm64 because of differences in how it deals with tail padding would layout the bases differently without the correct layout from the external source (LLDB). This would result in some fields being off by 4 bytes.

Diff Detail

Event Timeline

shafik created this revision.Jul 1 2020, 4:01 PM
shafik marked an inline comment as done.Jul 1 2020, 4:02 PM
shafik added a subscriber: rsmith.
shafik added inline comments.
clang/lib/AST/RecordLayoutBuilder.cpp
1190

@rsmith you were correct, this was indeed reversed.

teemperor requested changes to this revision.EditedJul 5 2020, 4:12 AM

Thanks for tracking this down, this is a really nasty bug...

The fix itself is obviously fine, but I think I'm out of the loop regarding the testing strategy. We talked about adding a Clang test for this with the help of this layout overwrite JSON file. I assume that extending this to cover virtual bases turned out to be more complicated than expected? FWIW, I'm not necessarily the biggest fan of this Clang test option so I would be fine if we leave it as-is.

I think having an LLDB test is a good idea, but it's not clear why it's a Shell test. If I understand correctly this test requires running on arm64 (so, a remote test target), but from what I remember all the remote platform support is only in dotest.py? Also pretty much all other expression evaluation tests and the utilities for that are in the Python/API test infrastructure, so it's a bit out of place.

Also I think the test can be in general much simpler than an arm64-specific test. We get all base class offsets wrong in LLDB, so we can just write a simple test where you change the layout of some structs in a way that it doesn't fit the default layout. E.g., just putting alignas on a base class and then reading fields should be enough to trigger the same bug.

This revision now requires changes to proceed.Jul 5 2020, 4:12 AM
tschuett added a subscriber: tschuett.EditedJul 5 2020, 6:41 AM

You could try:

clangxx_host -Xclang -fdump-record-layouts %p/Inputs/layout.cpp -emit-llvm -o /dev/null

You could commit the record layouts without your fix to show the differences that you patch makes.

You could try:

clangxx_host -Xclang -fdump-record-layouts %p/Inputs/layout.cpp -emit-llvm -o /dev/null

You could commit the record layouts without your fix to show the differences that you patch makes.

Yeah that's I think the proper way to check the results, but IIRC the problem with testing this in Clang is that foverride-record-layout= (the testing approach for this code) doesn't support specifying base class offset), so we can't even *trigger* the bug itself in Clang itself. That what I was referring to with "layout overwrite JSON file" in my comment above.

shafik updated this revision to Diff 275865.Jul 6 2020, 5:04 PM

Adding a second test that is not arm64 specific.

shafik added a comment.Jul 6 2020, 5:07 PM

Thanks for tracking this down, this is a really nasty bug...

The fix itself is obviously fine, but I think I'm out of the loop regarding the testing strategy. We talked about adding a Clang test for this with the help of this layout overwrite JSON file. I assume that extending this to cover virtual bases turned out to be more complicated than expected? FWIW, I'm not necessarily the biggest fan of this Clang test option so I would be fine if we leave it as-is.

I think having an LLDB test is a good idea, but it's not clear why it's a Shell test. If I understand correctly this test requires running on arm64 (so, a remote test target), but from what I remember all the remote platform support is only in dotest.py? Also pretty much all other expression evaluation tests and the utilities for that are in the Python/API test infrastructure, so it's a bit out of place.

Also I think the test can be in general much simpler than an arm64-specific test. We get all base class offsets wrong in LLDB, so we can just write a simple test where you change the layout of some structs in a way that it doesn't fit the default layout. E.g., just putting alignas on a base class and then reading fields should be enough to trigger the same bug.

Good idea using alignas that actually did the trick, I was having trouble getting this to reproduce otherwise. I added a second test which should also reproduce on non-arm64 cases but I will keep the original test as well since they are hitting the bug from slightly different paths.

The goal of using a shell test was to avoid writing a device specific test and even though the first case should also pass on all other platforms.

I think we are talking about different things. My question was why is this a 'Shell' test (like, a test in the Shell directory that uses FileCheck) and not a test in the API directory (using Python). An 'API' test could use the proper expression testing tools. And it could actually run when doing on-device testing (which is to my knowledge not supported for Shell tests) which seems important for a test concerning a bug that only triggers when doing on-device testing (beside that one ubuntu ARM bot).

Also when looking over the ARM-specific test, I think there might be two bugs that were involved in triggering it. One is the bug fixed here which triggers that Clang will produce its own layout for those classes. Now I also wonder why the layout the expression parser Clang generates doesn't match the one from the test (which appears to be a second bug). The ARM-specific test doesn't have any information in its AST that isn't also available in the expression AST, so why would they produce different layouts? Not sure what exactly is behind the "differences in how it deals with tail padding" description but I assume this is related to tail padding reuse? If yes, then maybe the second bug is that the records in our AST are (not) PODs in the expression AST (see the inline comment for where the tail padding is enabled/disabling based on whether the RD is POD).

clang/lib/AST/RecordLayoutBuilder.cpp
2197

See here for the POD check that we might get wrong.

lldb/test/Shell/Expr/Inputs/layout.cpp
22

I think there should be some comment that explains why this test is structured like this (maybe point out where the tail padding change is happening).

40

Do we actually need these locals in addition to the globals?

shafik added a comment.Jul 7 2020, 9:53 AM

I think we are talking about different things. My question was why is this a 'Shell' test (like, a test in the Shell directory that uses FileCheck) and not a test in the API directory (using Python). An 'API' test could use the proper expression testing tools. And it could actually run when doing on-device testing (which is to my knowledge not supported for Shell tests) which seems important for a test concerning a bug that only triggers when doing on-device testing (beside that one ubuntu ARM bot).

I did not realize the shell tests were not tested on device.

Also when looking over the ARM-specific test, I think there might be two bugs that were involved in triggering it. One is the bug fixed here which triggers that Clang will produce its own layout for those classes. Now I also wonder why the layout the expression parser Clang generates doesn't match the one from the test (which appears to be a second bug). The ARM-specific test doesn't have any information in its AST that isn't also available in the expression AST, so why would they produce different layouts? Not sure what exactly is behind the "differences in how it deals with tail padding" description but I assume this is related to tail padding reuse? If yes, then maybe the second bug is that the records in our AST are (not) PODs in the expression AST (see the inline comment for where the tail padding is enabled/disabling based on whether the RD is POD).

So we are hitting this code for non-arm64 case:

case TargetCXXABI::UseTailPaddingUnlessPOD03:
  //...
  return RD->isPOD();

and we return false which is correct for the original test case since it has a base class and base therefore not an aggregate (until C++17) see it here and struct that had the difference I was looking at. I did not check for the test cases in the PR though.

In the arm64 case from the original problem we hit the following:

case TargetCXXABI::UseTailPaddingUnlessPOD11:
   // ...
   return RD->isTrivial() && RD->isCXX11StandardLayout();

which returns true for the original case.

shafik updated this revision to Diff 276229.Jul 7 2020, 3:04 PM
shafik marked 5 inline comments as done.
  • Removing spurious local variables in test
  • Simplifying the test
shafik added inline comments.Jul 8 2020, 6:17 AM
clang/lib/AST/RecordLayoutBuilder.cpp
2197

I don't believe we are getting this wrong or at least not in the original problem. Going back and checking it we seem to be getting this right.

lldb/test/Shell/Expr/Inputs/layout.cpp
40

I left them in by accident.

shafik updated this revision to Diff 276407.Jul 8 2020, 6:38 AM

Moved from shell test

teemperor accepted this revision.Jul 8 2020, 6:39 AM

LGTM, thanks for the patch!

This revision is now accepted and ready to land.Jul 8 2020, 6:39 AM
This revision was automatically updated to reflect the committed changes.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 8 2020, 10:07 AM