This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi] Reducing stack usage of test
ClosedPublic

Authored by bcraig on Dec 15 2015, 1:47 PM.

Details

Reviewers
mclow.lists
Summary

This test has a lot of classes with large amounts of manually inserted padding in them, presumably to prevent various optimizations. The test then creates lots of these objects on the stack. On embedded targets, this was usually enough to overflow the stack.

I moved the objects to global / namespace scope. Since the tests are each in their own namespace, there should be no cross-test conflicts.

Diff Detail

Event Timeline

bcraig updated this revision to Diff 42903.Dec 15 2015, 1:47 PM
bcraig retitled this revision from to [libcxxabi] Reducing stack usage of test.
bcraig updated this object.
bcraig added a reviewer: mclow.lists.
bcraig added a subscriber: cfe-commits.

Could these large padding things be related to the fact that the test is used as a performance check for the implementation? That being said, I have no idea who is paying attention to the numbers that come out of this test (if anyone even is?). Maybe @howard.hinnant knows something about it?

> Could these large padding things be related to the fact that the test is used as a performance check for the implementation? That being said, I have no idea who is paying attention to the numbers that come out of this test (if anyone even is?). Maybe @howard.hinnant knows something about it?

My expectation is that the only thing the padding numbers would affect with regards to execution time would be cache hits and misses. If there is something subtle going on though, then those sizes need some comments and rationale. "93481" looks like it was pulled out of a hat, and it is large enough on its own to overflow many embedded stacks.

bcraig added a subscriber: bcraig.Dec 16 2015, 5:50 AM

As an alternative, would it be acceptable to heap allocate these
objects, using the original padding numbers?

bcraig updated this revision to Diff 43041.Dec 16 2015, 11:15 AM

Starting primes at 13, and skipping the larger twin in twin prime pairs. Using long double instead of char for padding to help alleviate fears of multiple structs having the same actual size.

What does having them be long doubles give over just multiplying the counts by 16 (or however big it is on your platform)? Alignment?

Seems like it'd be better to start with a prime that's ~16x larger, say 211, than to have that factor of 16 floating around everywhere.

> What does having them be long doubles give over just multiplying the counts by 16 (or however big it is on your platform)? Alignment?
>
> Seems like it'd be better to start with a prime that's ~16x larger, say 211, than to have that factor of 16 floating around everywhere.

Alignment is the main answer. I'd use max_align_t directly, except that it got added in C++11, and this test should be able to run in C++98 and C++03.
Using a large type instead of a char also makes it easier to avoid making two padding values that look different, but aren't actually all that different due to internal padding. Just starting at a high prime wouldn't fix this, as then I have to ensure that consecutive values are separated by at least sizeof(void *).

>> What does having them be long doubles give over just multiplying the counts by 16 (or however big it is on your platform)? Alignment?
>>
>> Seems like it'd be better to start with a prime that's ~16x larger, say 211, than to have that factor of 16 floating around everywhere.
>
> Alignment is the main answer. I'd use max_align_t directly, except that it got added in C++11, and this test should be able to run in C++98 and C++03.
> Using a large type instead of a char also makes it easier to avoid making two padding values that look different, but aren't actually all that different due to internal padding. Just starting at a high prime wouldn't fix this, as then I have to ensure that consecutive values are separated by at least sizeof(void *).

What platform are you on where sizeof(void*) is anywhere near 13 bytes?

>>> What does having them be long doubles give over just multiplying the counts by 16 (or however big it is on your platform)? Alignment?
>>>
>>> Seems like it'd be better to start with a prime that's ~16x larger, say 211, than to have that factor of 16 floating around everywhere.
>>
>> Alignment is the main answer. I'd use max_align_t directly, except that it got added in C++11, and this test should be able to run in C++98 and C++03.
>> Using a large type instead of a char also makes it easier to avoid making two padding values that look different, but aren't actually all that different due to internal padding. Just starting at a high prime wouldn't fix this, as then I have to ensure that consecutive values are separated by at least sizeof(void *).
>
> What platform are you on where sizeof(void*) is anywhere near 13 bytes?

Ohh, nevermind... now I see what you're saying.

>>> What does having them be long doubles give over just multiplying the counts by 16 (or however big it is on your platform)? Alignment?
>>>
>>> Seems like it'd be better to start with a prime that's ~16x larger, say 211, than to have that factor of 16 floating around everywhere.
>>
>> Alignment is the main answer. I'd use max_align_t directly, except that it got added in C++11, and this test should be able to run in C++98 and C++03.
>> Using a large type instead of a char also makes it easier to avoid making two padding values that look different, but aren't actually all that different due to internal padding. Just starting at a high prime wouldn't fix this, as then I have to ensure that consecutive values are separated by at least sizeof(void *).
>
> What platform are you on where sizeof(void*) is anywhere near 13 bytes?

I'm on a platform where void* is 4 bytes (Hexagon). That's not what I was getting at, though. Let's take x86_64 as an example instead.

#include <iostream>

struct A {
  virtual ~A() {}
  char _[17];
};
struct B {
  virtual ~B() {}
  char _[19];
};
struct C {
  virtual ~C() {}
  char _[23];
};

int main() {
  std::cout << "A: " << sizeof(A) << std::endl;  // prints 32
  std::cout << "B: " << sizeof(B) << std::endl;  // prints 32
  std::cout << "C: " << sizeof(C) << std::endl;  // prints 32
  return 0;
}

I could reasonably see an optimizer collapsing some of these classes together in some way or another. If you change the chars out for long doubles, then all three classes have different sizes.

bcraig added a comment.Jan 4 2016, 6:47 AM

Ping.
If desired, I could provide an alternative implementation where all the structs are allocated at global scope with their original padding.

FOAD: Ball's in @mclow.lists's court, not mine. I don't feel comfortable signing off on this.

mclow.lists edited edge metadata.Feb 9 2016, 7:39 AM

I think you've reduced the scope of the tests significantly with this change.

If you want to make the objects smaller, that's fine - but please heed Howard's advice in the email thread:

  • Don't make them too small. (three digits is better than two)
  • Keep the object sizes odd, and their differences larger than the maximum alignment for your platform.
test/dynamic_cast14.pass.cpp, line 18

Using long double here (instead of char) forces alignment, reducing the scope of the tests (and means that the sizes are always an even number, and probably a multiple of 8).

test/dynamic_cast14.pass.cpp, line 36

@howard.hinnant wrote in the email thread:

If you need to reduce these numbers for other platforms, I think that would be fine. However I would not reduce them to double digits. I would keep them as large as you can reasonably get away with.

bcraig added a comment.Feb 9 2016, 8:36 AM

Any objections to using the original sizes, but constructing the objects at global scope? That fixes the stack usage without significantly changing the layout characteristics of the test.

bcraig updated this revision to Diff 47327.Feb 9 2016, 8:53 AM
bcraig updated this object.
bcraig edited edge metadata.

> Any objections to using the original sizes, but constructing the objects at global scope?

I think that should work.

So does this latest revision get the check-mark of approval?

mclow.lists accepted this revision.Mar 3 2016, 3:21 PM
mclow.lists edited edge metadata.

Yes, sorry.

This revision is now accepted and ready to land.Mar 3 2016, 3:21 PM
bcraig closed this revision.Mar 4 2016, 6:30 AM