This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Don't advance position in a memory region when assigning to the Dot
Closed Public

Authored by MaskRay on May 21 2019, 12:47 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

MaskRay created this revision. May 21 2019, 12:47 AM
ruiu accepted this revision. May 21 2019, 12:53 AM

LGTM

Nice.

This revision is now accepted and ready to land. May 21 2019, 12:53 AM

gold has a different behavior from bfd here btw.


I think gold has a bug in its memory region matching. It uses boolean AND instead of boolean OR to find a matching memory region. For example, in memory5.test, it fails to associate the two sections with the memory region, and it fails to report an error when there is no matching memory region.

This revision was automatically updated to reflect the committed changes.
grimar added a comment. Edited May 21 2019, 1:42 AM

I find the behavior of the GNU linkers weird.

If we use bfd and have:

.section .foo,"ax";
nop;

.section .bar,"ax";
nop;

and script:

MEMORY {
  ram (ax) : ORIGIN = 0x42000, LENGTH = 0x100000
}
SECTIONS {
  .foo : { *(.foo*) }
  . += 0x2000;
  .bar : { *(.bar*) }
}

Then the .foo section has address 0x42000 and .bar has 0x42001.
I.e. moving the dot does not change the address of the next section.

But if I remove MEMORY:

SECTIONS {
  .foo : { *(.foo*) }
  . += 0x2000;
  .bar : { *(.bar*) }
}

Then .foo is at 0x0 and .bar is at 0x2001 (as I would expect to see).

I think it is very strange that the MEMORY command affects the Dot assignment behavior in that way.
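
To make the difference concrete, here is a minimal C++ sketch of the two layouts. It assumes each section is one byte (a single nop) and that a matching region always overrides the dot when a section is placed. Layout, Result and layoutScript are hypothetical names used only for illustration; this is not bfd, gold or lld code.

```cpp
#include <cstdint>

// Hypothetical model of the location counter plus one optional MEMORY region.
struct Layout {
  uint64_t dot = 0;
  bool hasRegion = false;    // true when the script has a matching MEMORY region
  uint64_t regionCurPos = 0; // next free address in that region

  // Place an output section of `size` bytes and return its start address.
  uint64_t place(uint64_t size) {
    // With a matching region, the section starts at the region's next free
    // address, so an earlier `. += N;` has no effect on it.
    if (hasRegion)
      dot = regionCurPos;
    uint64_t addr = dot;
    dot += size;
    if (hasRegion)
      regionCurPos = dot;
    return addr;
  }

  // Models `. += delta;` between output sections: only the dot moves,
  // the region's position is left alone.
  void advanceDot(uint64_t delta) { dot += delta; }
};

struct Result {
  uint64_t foo;
  uint64_t bar;
};

// Lays out `.foo`, `. += 0x2000;`, `.bar` with or without a MEMORY region.
Result layoutScript(bool useMemory, uint64_t origin) {
  Layout l;
  l.hasRegion = useMemory;
  l.regionCurPos = origin;
  uint64_t foo = l.place(1);
  l.advanceDot(0x2000);
  uint64_t bar = l.place(1);
  return {foo, bar};
}
```

Under these assumptions, layoutScript(true, 0x42000) yields .foo = 0x42000 and .bar = 0x42001, while layoutScript(false, 0) yields 0x0 and 0x2001.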

It seems that the original script from the PR had to use a hack to move the second output section:

aligned_dot = ALIGN(0x10 * 1024);  
    
.data aligned_dot :
{
  *(.data*)
}
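
For reference, the arithmetic behind ALIGN in that workaround can be sketched in C++ as below. The GNU ld manual describes ALIGN(align) as (. + align - 1) & ~(align - 1); the helper name alignTo and the sample dot value 0x42001 (borrowed from the example above, not from the PR's script) are assumptions for illustration.

```cpp
#include <cstdint>

// Round `value` up to the next multiple of `align`.
// `align` is assumed to be a power of two, as with linker script ALIGN.
constexpr uint64_t alignTo(uint64_t value, uint64_t align) {
  return (value + align - 1) & ~(align - 1);
}

// 0x10 * 1024 == 0x4000, so a dot of 0x42001 rounds up to 0x44000.
static_assert(alignTo(0x42001, 0x10 * 1024) == 0x44000, "rounds up");
// An already aligned value is left unchanged.
static_assert(alignTo(0x44000, 0x10 * 1024) == 0x44000, "stays put");
```

Passing the result as an explicit output section address, as the script does with aligned_dot, sidesteps the region's "next free address" rule.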

I agree... The code near gold/script-sections.cc:2476 (in Output_section_definition::set_section_addresses) is responsible for this behavior:

      vma_region = script_sections->find_memory_region(this, true, false, NULL);
      // If a matching memory region is found, dot_value is ignored.
      if (vma_region != NULL)
        address = vma_region->get_current_address()->eval(symtab, layout,
                                                          false);
      else
        address = *dot_value;

Our code responsible for this behavior is:

if (Ctx->MemRegion)
  Dot = Ctx->MemRegion->CurPos;

Thanks for the fix. The MEMORY command is indeed strange. I think that this behaviour is documented in https://sourceware.org/binutils/docs/ld/Output-Section-Address.html#Output-Section-Address

The output section address heuristic is as follows:

    If an output memory region is set for the section then it is added to this region and its address will be the next free address in that region.
    If the MEMORY command has been used to create a list of memory regions then the first region which has attributes compatible with the section is selected to contain it. The section’s output address will be the next free address in that region (see MEMORY).
    If no memory regions were specified, or none match the section then the output address will be based on the current value of the location counter.
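
The three rules quoted above can be sketched in C++ roughly as follows. This is only an illustration of the documented heuristic: the types, the flag constants and startAddress are hypothetical names, and "compatible" is simplified to "shares at least one attribute" (the '!' inversion syntax of MEMORY attributes is ignored).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical attribute bits, loosely modelled on MEMORY attributes.
constexpr uint32_t kAlloc = 1; // 'a'
constexpr uint32_t kExec = 2;  // 'x'
constexpr uint32_t kWrite = 4; // 'w'

struct MemoryRegion {
  std::string name;
  uint32_t flags;  // attributes listed for the region
  uint64_t curPos; // next free address in the region
};

struct Section {
  uint32_t flags;
  const MemoryRegion *explicitRegion; // set by `> region`, else nullptr
};

// Returns the start address of an output section that has no explicit
// address, following the three quoted rules in order.
uint64_t startAddress(const Section &sec,
                      const std::vector<MemoryRegion> &regions,
                      uint64_t dot) {
  // Rule 1: an explicitly assigned region wins.
  if (sec.explicitRegion)
    return sec.explicitRegion->curPos;
  // Rule 2: otherwise use the first region with a matching attribute.
  for (const MemoryRegion &r : regions)
    if ((sec.flags & r.flags) != 0)
      return r.curPos;
  // Rule 3: no region matches, so fall back to the location counter.
  return dot;
}
```

In this model the dot is consulted only in the last branch, which is why `. += 0x2000;` has no visible effect once a region matches.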

In the location counter page

The location counter may not be moved backwards inside an output section, and may not be moved backwards outside of an output section if so doing creates areas with overlapping LMAs.

So I think that the assignment to dot in the test case is ignored when setting the base address of .data, because MEMORY takes precedence in the above heuristic. Since this doesn't create overlapping LMAs, it is legal to move the location counter backwards.