This is an archive of the discontinued LLVM Phabricator instance.

Fix get base address bug
Needs ReviewPublic

Authored by JohnLee1243 on Jun 7 2023, 9:56 AM.

Event Timeline

JohnLee1243 created this revision. Jun 7 2023, 9:56 AM
Herald added a reviewer: Amir.
Herald added a reviewer: maksfb.
Herald added a project: Restricted Project.
Herald added a subscriber: ayermolo.
JohnLee1243 requested review of this revision. Jun 7 2023, 9:56 AM
Amir added a comment. Jun 7 2023, 11:08 AM

I assume it addresses the issue https://github.com/llvm/llvm-project/issues/61370 and https://reviews.llvm.org/D144588.
Aside from getpagesize returning the host page size, which may not match the page size of the machine on which the perf data was captured, can you please elaborate on why the change you're making is correct when the ELF spec says this:

[p_align] integral power of 2, and p_vaddr should equal p_offset, modulo p_align.

(no mention of a page size)
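For reference, the p_align rule quoted above can be written as a small check. This is only a sketch of the spec wording, not code from BOLT; the function name `isCongruentModAlign` and the sample values are made up for illustration. It also assumes the spec's remark that p_align values of 0 and 1 mean no alignment is required.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the ELF rule: p_vaddr should equal p_offset, modulo p_align.
// Hypothetical helper, not part of BOLT or LLVM.
inline bool isCongruentModAlign(uint64_t PVaddr, uint64_t POffset,
                                uint64_t PAlign) {
  // Values 0 and 1 mean no alignment is required.
  if (PAlign <= 1)
    return true;
  // Otherwise p_align must be a power of two.
  if ((PAlign & (PAlign - 1)) != 0)
    return false;
  const uint64_t Mask = PAlign - 1;
  return (PVaddr & Mask) == (POffset & Mask);
}

// Worked examples with illustrative values.
inline void checkAlignExamples() {
  assert(isCongruentModAlign(0x401000, 0x1000, 0x1000));   // 4K alignment
  assert(!isCongruentModAlign(0x401800, 0x1000, 0x1000));  // low bits differ
  assert(isCongruentModAlign(0x401800, 0x1800, 0x200000)); // 2M alignment
}
```

Note that the rule constrains the segment relative to p_align only; nothing in it depends on the page size of the machine running the tool.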

I found the definition of base address in the Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification:

An executable or shared object file's base address is calculated during execution from three values: the virtual memory load address, the maximum page size, and the lowest virtual address of a program's loadable segment. To compute the base address, one determines the memory address associated with the lowest p_vaddr value for a PT_LOAD segment. This address is truncated to the nearest multiple of the maximum page size. The corresponding p_vaddr value itself is also truncated to the nearest multiple of the maximum page size. The base address is the difference between the truncated memory address and the truncated p_vaddr value.

So this document does mention the page size.
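The definition quoted above can be sketched as follows. This is a minimal illustration of the TIS wording, not the code in this patch or in BOLT; `alignDown`, `computeBaseAddress`, and the addresses in the examples are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Truncate Value to the nearest lower multiple of Align.
// Align is assumed to be a power of two (as page sizes are).
inline uint64_t alignDown(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 &&
         "Align must be a power of 2");
  return Value & ~(Align - 1);
}

// Base address per the quoted definition:
//   truncate(load address) - truncate(lowest PT_LOAD p_vaddr).
// LoadAddr is the runtime address of the lowest PT_LOAD segment.
inline uint64_t computeBaseAddress(uint64_t LoadAddr, uint64_t PVaddr,
                                   uint64_t MaxPageSize) {
  return alignDown(LoadAddr, MaxPageSize) - alignDown(PVaddr, MaxPageSize);
}

// Illustrative values: a segment with p_vaddr 0x1000 mapped at
// 0x7f0000005000 on a system that used 4K pages.
inline void checkBaseAddressExamples() {
  // Using the page size the mapping was made with:
  assert(computeBaseAddress(0x7f0000005000, 0x1000, 0x1000) ==
         0x7f0000004000);
  // Using a 64K page size for the same mapping gives a different result:
  assert(computeBaseAddress(0x7f0000005000, 0x1000, 0x10000) ==
         0x7f0000000000);
}
```

The second example is why the choice of page size matters here: if the value passed in is not the one the target system actually used, the computed base address can be off.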

Either https://reviews.llvm.org/D144588 or my patch solves my problem on my system, which uses 4K pages. I would like to know whether my patch also solves your problem on a system with 64K pages.

hzq added a subscriber: hzq. Jul 5 2023, 11:31 PM
Amir added a comment. Jul 19 2023, 8:24 AM

Please address the comment. We'll also need a test case: use a linker script or yaml2obj to tightly control segment offsets, plus pre-aggregated perf data containing mmap information.

bolt/lib/Core/BinaryContext.cpp
1877

getpagesize() returns host page size, which could be different from the one used on the target system.

I don't know whether my understanding of the ELF format is right. Does the base address defined in the "Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification" have the same meaning as the base address in the code?
If so, the page size is needed, but I don't know how to get the target system's page size. Do you have any ideas?
Also, I am not familiar with linker scripts or yaml2obj. Could you describe the test case more specifically? @Amir

  • It seems to me that the existing BOLT code can only handle one executable LOAD segment.
  • In theory, a process can use different page sizes to map ELF objects into memory. Shouldn't we get the memory mapping information from PERF_RECORD_MMAP2 rather than relying on the system page size?
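To make the second point concrete, here is a rough sketch of deriving the load bias of a segment from the mapping itself, without any page size. It assumes the addr, len, and pgoff fields that a PERF_RECORD_MMAP2 record carries; the struct and function names (`MMapInfo`, `SegmentInfo`, `loadBiasFromMMap`) are hypothetical and this is not how BOLT currently does it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the relevant PERF_RECORD_MMAP2 fields.
struct MMapInfo {
  uint64_t Addr;  // start address of the mapping
  uint64_t Len;   // length of the mapping
  uint64_t PgOff; // file offset at which the mapping starts
};

// Hypothetical holder for the relevant PT_LOAD program header fields.
struct SegmentInfo {
  uint64_t VAddr;  // p_vaddr
  uint64_t Offset; // p_offset
};

// If the mapping covers the file offset where the segment starts, compute
// the difference between runtime and link-time addresses (the load bias).
// Returns false when the mapping does not cover the segment start.
inline bool loadBiasFromMMap(const MMapInfo &M, const SegmentInfo &S,
                             uint64_t &Bias) {
  if (S.Offset < M.PgOff || S.Offset - M.PgOff >= M.Len)
    return false;
  // Runtime address of file offset S.Offset inside this mapping.
  const uint64_t RuntimeAddr = M.Addr + (S.Offset - M.PgOff);
  Bias = RuntimeAddr - S.VAddr;
  return true;
}

// Illustrative values only.
inline void checkLoadBiasExample() {
  const MMapInfo M{0x555555755000, 0x2000, 0x1000};
  const SegmentInfo S{0x201234, 0x1234};
  uint64_t Bias = 0;
  assert(loadBiasFromMMap(M, S, Bias));
  assert(Bias == 0x555555554000);
}
```

Because the file offset of the mapping is taken from the record, the result does not depend on which page size the kernel used for that particular mapping.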