This is an archive of the discontinued LLVM Phabricator instance.

[LLD][ELF][ARM][AArch64] Only round up ThunkSection Size when large OS.
ClosedPublic

Authored by peter.smith on Jan 7 2020, 9:20 AM.

Details

Summary

In D71281 a fix was put in to round up the size of a ThunkSection to the nearest 4KiB when performing errata patching. This fixed a problem with a very large instrumented program that had thunks and patches mutually trigger each other. Unfortunately it triggers an assertion failure in an AArch64 allyesconfig build of the kernel. There is a specific assertion preventing an InputSectionDescription being larger than 4KiB. This will always trigger if there is at least one Thunk needed in that InputSectionDescription, which is possible for an allyesconfig build.

Abstractly the problem case is:

.text : {
          *(.text) ;
          ...
          . = ALIGN(SZ_4K);
          __idmap_text_start = .;
          *(.idmap.text)
          __idmap_text_end = .;
          ...
        }

The assertion checks that idmap_text_end - idmap_start is < 4 KiB. Note that there is more than one InputSectionDescription in the OutputSection so we can't just restrict the fix to OutputSections smaller than 4 KiB.

The fix presented here limits the D71281 to InputSectionDescriptions that meet the following conditions:

  • The OutputSection is bigger than the thunkSectionSpacing so adding thunks will affect the addresses of following code.
  • The InputSectionDescription is smaller than 4 KiB. This will prevent any assertion failures that an InputSectionDescription is < 4 KiB in size.

We do this at ThunkSection creation time as at this point we know that the addresses are stable and up to date prior to adding the thunks as assignAddresses() will have been called immediately prior to thunk generation.

The fix reverts the two tests affected by D71281 to their original state as they no longer need the 4KiB size roundup. I've added simpler tests to check for D71281 when the OutputSection size is larger than the ThunkSection spacing.

Fixes https://github.com/ClangBuiltLinux/linux/issues/812

Diff Detail

Event Timeline

peter.smith created this revision.Jan 7 2020, 9:20 AM

On looking again at the linux kernel makefile

#define IDMAP_TEXT					\
	. = ALIGN(SZ_4K);				\
	__idmap_text_start = .;				\
	*(.idmap.text)					\
	__idmap_text_end = .;
...
.text : {			/* Real text segment		*/
		_stext = .;		/* Text and read-only data	*/
			IRQENTRY_TEXT
			SOFTIRQENTRY_TEXT
			ENTRY_TEXT
			TEXT_TEXT
			SCHED_TEXT
			CPUIDLE_TEXT
			LOCK_TEXT
			KPROBES_TEXT
			HYPERVISOR_TEXT
			IDMAP_TEXT
			HIBERNATE_TEXT
			TRAMP_TEXT
			*(.fixup)
			*(.gnu.warning)
		. = ALIGN(16);
		*(.got)			/* Global offset table		*/
	}

It looks like this fix may not be enough if the aggregate size of the .text section is > ThunkSectionSpacing. If it isn't enough then my next preference would be to only set roundUpSizeForErrata if the InputSectionDescription such as *(.idmap.text) is over a certain size. A small InputSectionDescription is not likely to generate a large enough range of patches and thunks to cause the iterate for ever corner case.

grimar added inline comments.Jan 14 2020, 1:27 AM
lld/ELF/SyntheticSections.cpp
3462

does the following look better?

roundUpSizeForErrata = (config->fixCortexA53Errata843419 || config->fixCortexA8) &&
      os->size >= target->getThunkSectionSpacing();

But, basing on the comment which says "os->size is stable" (I've not debug the code, sorry),
can't you just get rid of roundUpSizeForErrata and do the following?

size_t ThunkSection::getSize() const {
  if ((config->fixCortexA53Errata843419 || config->fixCortexA8) &&
      this->parent->size >= target->getThunkSectionSpacing())
    return alignTo(size, 4096);
  return size;
}
lld/ELF/SyntheticSections.h
1079

This needs a comment probably.

peter.smith marked 3 inline comments as done.Jan 14 2020, 4:02 AM

Thanks for the comments. I've decided to rework this In light of it not actually fixing the problem in Linux. In the allyesconfig build the overall .text OutputSection is larger than thunkSectionSpacing, but the assert was for in effect an InputSectionDescription within .text so I've had to think again. I'm nearly ready to submit again, just got to test in on the kernel itself first.

lld/ELF/SyntheticSections.cpp
3462

Unfortunately not as getSize() is called when calculating the addresses and this resets this->parent->size. I ended up with a different size reported when doing assignOffsets.

lld/ELF/SyntheticSections.h
1079

Agreed, will do so by reference.

peter.smith marked an inline comment as done.
peter.smith edited the summary of this revision. (Show Details)

Updated the fix, and description after testing on a real linux kernel allyesconfig build. The description contains the full details, which should also be present in the comment. In summary we round up the size of a thunk section to nearest 4KiB when:

  • The erratum fix is being used.
  • The thunk section may be placed before the end of Output Section so there may be code after it.
  • The InputSectionDescription is larger than 4KiB.
MaskRay added inline comments.Jan 15 2020, 4:18 PM
lld/ELF/Relocations.cpp
1772

larger? ( isdLimit - isdBase > 4096)

1775

Can we simplify isdBase + target->getThunkSectionSpacing() < os->size to off + target->getThunkSectionSpacing() < os->size

MaskRay added inline comments.Jan 15 2020, 6:10 PM
lld/ELF/Relocations.cpp
1775

We can't.

off + target->getThunkSectionSpacing() < os->size can cause https://bugs.llvm.org//show_bug.cgi?id=44071#c5 to fail with ld.lld: error: thunk creation not converged.

I suggested off because I thought off corresponds to prevIsecLimit in the code below:

if (isecLimit > thunkUpperBound) {
  addThunkSection(os, isd, prevIsecLimit);
  thunkUpperBound = prevIsecLimit + thunkSectionSpacing;
}

Still puzzled why we use isdBase + target->getThunkSectionSpacing() < os->size here.

1781

I wonder whether we should use 4 unconditionally https://reviews.llvm.org/D72819 .

Most thunks are 16 bytes. I don't know why we selected 8, instead of 4 or 16 in D29129.
(If 8 were for locality/cache line considerations, 16 would be a better choice.)

peter.smith marked 3 inline comments as done.Jan 16 2020, 3:27 AM

Thanks for the comments. I'll post an update with a simplification as I don't think I can justify the heuristic without being too hand wavy.

lld/ELF/Relocations.cpp
1772

Yes, larger. Thanks for spotting that one.

1775

Going over this again, and refreshing where the thunks are actually inserted and not using my memory and looking at the failing chromium example the isdBase, TS1 off, TS2 off, isdLimit are as follows

0TS1 0x7fcffb8TS2 0x80514880x10024008

The Thunk insertion code tries to insert the first thunk section as far away as thunkUpperBound, but stops at lastThunkLowerBound as we can comfortably reach from a branch from the end of the isd.

Both the ThunkSections + thunkSectionSpacing are > os->size, but isdBase + thunkSectionSpacing is not, so any ThunkSection created in the isd will have its size rounded up.

The isdBase + thunkSectionSpacing < os->size is essentially a heuristic. In effect saying that we are less likely to need thunks and patches when in the last thunkSectionSpacing() of the os. I can only really justify as a balance in avoiding running out of passes versus an arbitrary assertion such as in the linux kernel. If we are less concerned about the latter then I think we can simplify to:

if  (os->size > target->getThunkSectionSpacing() && (isdLimit - isdBase > 4096))

This will get both of the problems we know about, but may be too specific to the linux kernel assert.

1781

Agreeed. It was probably my fault for introducing it in the first place. I didn't have a good reason for it for Arm/AArch64 but didn't know about other targets so I used config->wordsize.

peter.smith edited the summary of this revision. (Show Details)

Updated diff (and description) to simplify part of the check to os->size > target->getThunkSectionSpacing(). This is conservative, but simple. We can in theory do better, but it is difficult to do so without using heuristics. In effect it is a judgement that ThunkSections close to, but not necessarily at the end of the OutputSection have less of a chance of generating sufficient extra thunks and patches to cause non-convergence.

I've taken out the reduction in alignment of ThunkSection for AArch64 as this change can be made outside of this patch.

Tested against Linux Kernel allyesconfig and instrumented Chromium build.

MaskRay accepted this revision.Jan 17 2020, 12:22 AM
This revision is now accepted and ready to land.Jan 17 2020, 12:22 AM
This revision was automatically updated to reflect the committed changes.