This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Avoid keeping reference to every files
Needs ReviewPublic

Authored by paulsemel on Jun 21 2019, 2:05 PM.

Details

Summary

This is definitely not the ideal solution to solve this problem.
But I think this is better than destroying my RAM.

Diff Detail

Repository
rL LLVM

Event Timeline

paulsemel created this revision.Jun 21 2019, 2:05 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 21 2019, 2:05 PM
vsk edited reviewers, added: danielcdh, xur, dnovillo, davidxl; removed: vsk.Jun 21 2019, 2:10 PM

Adding reviewers who are familiar with SamplePGO.

davidxl added a reviewer: wmi.Jun 21 2019, 2:14 PM
wmi added a comment.Jun 21 2019, 3:59 PM

We had better have a test for it. It is not a strict NFC and we best verify it works as we expect -- like if we miss any place which still uses StringRef refering to string in file data buffer, we may have dangling pointer after we free the buffer.

llvm/tools/llvm-profdata/llvm-profdata.cpp
444

Please add some comment about why we need FunctionNames. I think it is because FunctionSamples and SampleRecord all use StringRef and the original strings refered to are in the data buffer. To remove the dependence on the data buffer, we want to use a name table to save all the name strings. The usage is not very straightforward.

In D63671#1554457, @wmi wrote:

We had better have a test for it. It is not a strict NFC and we best verify it works as we expect -- like if we miss any place which still uses StringRef refering to string in file data buffer, we may have dangling pointer after we free the buffer.

I agree with you that this is not completely NFC. However, I didn't really see what could best test this functionality (anyway, in our case, accessing dangling pointer would lead to SIGSEGV as we unmap the files).

paulsemel updated this revision to Diff 206318.Jun 24 2019, 4:06 PM

Add comment to describe why we need this string table
Add one test case.

paulsemel marked an inline comment as done.Jun 24 2019, 4:06 PM
paulsemel retitled this revision from [llvm-profdata] [NFC] Avoir keeping reference to every files to [llvm-profdata] Avoid keeping reference to every files.Jun 24 2019, 4:32 PM
wmi added a comment.Jun 28 2019, 3:42 PM

Add one test case.

Thanks. If we remove the code to replace string reference from file data buffer to FunctionNames set for call targets, will the test fail?

In D63671#1563092, @wmi wrote:

Add one test case.

Thanks. If we remove the code to replace string reference from file data buffer to FunctionNames set for call targets, will the test fail?

No, because this change is kind of a NFC, which implies that I want to keep the same behaviors as previously.
To actually have a test that fails if we remove this code, we would need to have a super heavy and time consuming test, which is, imo, not what we want for a test.

Do we really need to have a tests that fails if we remove this code ?

wmi added a comment.Jul 1 2019, 11:08 AM
In D63671#1563092, @wmi wrote:

Add one test case.

Thanks. If we remove the code to replace string reference from file data buffer to FunctionNames set for call targets, will the test fail?

No, because this change is kind of a NFC, which implies that I want to keep the same behaviors as previously.
To actually have a test that fails if we remove this code, we would need to have a super heavy and time consuming test, which is, imo, not what we want for a test.

Do we really need to have a tests that fails if we remove this code ?

Is it possible to add an option to flush the file data buffer with 0 after the file is released? The option is only enabled for testing. With the option on, even with small test we can catch some difference in merge result if the patch is incorrect.

wmi added a comment.Jul 3 2019, 12:25 PM
In D63671#1565126, @wmi wrote:
In D63671#1563092, @wmi wrote:

Add one test case.

Thanks. If we remove the code to replace string reference from file data buffer to FunctionNames set for call targets, will the test fail?

No, because this change is kind of a NFC, which implies that I want to keep the same behaviors as previously.
To actually have a test that fails if we remove this code, we would need to have a super heavy and time consuming test, which is, imo, not what we want for a test.

Do we really need to have a tests that fails if we remove this code ?

Is it possible to add an option to flush the file data buffer with 0 after the file is released? The option is only enabled for testing. With the option on, even with small test we can catch some difference in merge result if the patch is incorrect.

I realize I wasn't clear. I mean flush the data buffer after reading and merging for the file is done and just before the file is released.

paulsemel added a comment.EditedJul 11 2019, 9:20 AM
In D63671#1568997, @wmi wrote:
In D63671#1565126, @wmi wrote:
In D63671#1563092, @wmi wrote:

Add one test case.

Thanks. If we remove the code to replace string reference from file data buffer to FunctionNames set for call targets, will the test fail?

No, because this change is kind of a NFC, which implies that I want to keep the same behaviors as previously.
To actually have a test that fails if we remove this code, we would need to have a super heavy and time consuming test, which is, imo, not what we want for a test.

Do we really need to have a tests that fails if we remove this code ?

Is it possible to add an option to flush the file data buffer with 0 after the file is released? The option is only enabled for testing. With the option on, even with small test we can catch some difference in merge result if the patch is incorrect.

I realize I wasn't clear. I mean flush the data buffer after reading and merging for the file is done and just before the file is released.

I am very sorry, but I don't really understand what you want... The files are mmap'd with only READ protection, so zeroing the data isn't really an option!
Plus, we munmap the file when we destroy the object at the end of the loop, so accessing again the data would result in a segfault already... we cannot do much more here I think, but maybe I am missing smth ?

Sorry, I didn't have time to work on this the past 10 days..

wmi added a comment.Jul 23 2019, 11:54 AM

From my last reply "I assumed the input files are small since we want to have small testcase. If the input files are small, the buffer won't be allocated through mmap but through new (From getOpenFileImpl in lib/Support/MemoryBuffer.cpp)".

Do you think we can construct a testcase with small inputs so the buffer could be writable?

Plus, we munmap the file when we destroy the object at the end of the loop, so accessing again the data would result in a segfault already...

If that is the case, why you didn't see SEGV when you remove the code to replace string reference from file data buffer to FunctionNames set for call targets? I think munmap area could possibly be remapped when reading in the next file?