Page MenuHomePhabricator

[libomptarget][ve] Workaround for the libomptarget memory manager cleanup
AbandonedPublic

Authored by manorom on Jan 6 2021, 5:29 AM.

Details

Summary

The target memory manager, introduced to libomptarget with rG0289696751e9a959b2413ca26624fc6c91be1eea, will try to free target memory it allocated, using __tgt_rtl_data_delete(), when the host program shuts down.
This is a problem for the VE plugin because it cleans up all its target resources when its destructor is called, which happens before the Memory Manager destructor frees is executed. This means that there is no VE process handle when the Memory Manager tries to free its allocated target memory and the call to veo_free_mem() causes a program crash.

This patch adds a workaround for this problem. It simply makes __tgt_rtl_data_delete() return OFFLOAD_SUCCESS immediately if the plugin has already cleaned up its target memory.

Diff Detail

Event Timeline

manorom requested review of this revision.Jan 6 2021, 5:29 AM
manorom created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJan 6 2021, 5:29 AM
JonChesterfield accepted this revision.Jan 6 2021, 6:41 AM
JonChesterfield added a subscriber: JonChesterfield.

We really should fix this in libomptarget, but yeah, workaround for now

This revision is now accepted and ready to land.Jan 6 2021, 6:41 AM

A colleague has now written a similar hack for amdgpu. The control flow in libomptarget is difficult to follow, but it is surely possible to move the destructor call.

Until we fix that, I think we should disable the known-broken memory allocator, at least by default.

I'm trying to fix the issue in D94256. Please take a review. :-)

The issue has been fixed D94379 by moving the memory manager to plugin. I only enabled CUDA. If VE would like this feature, you can also do it in a similar way.

manorom abandoned this revision.Jan 20 2021, 3:50 AM

Obsolete due to D94379