Page MenuHomePhabricator

carlo.bertolli (Carlo Bertolli)
User

Projects

User does not belong to any projects.

User Details

User Since
May 29 2015, 9:24 AM (408 w, 2 d)

Recent Activity

Tue, Mar 21

carlo.bertolli added a comment to D145831: [OpenMP][libomptarget] Add support for critical regions in AMD GPU device offloading.

I'm really sure that locks at thread scope do not work on amdgpu or pre-volta nvptx. One of the threads wins the cas, all the others do not, and it immediately deadlocks.

Critical sections can be done by rewriting the cfg, general purpose locks can't.

What am I missing here?

Tue, Mar 21, 9:46 AM · Restricted Project, Restricted Project

Mon, Mar 6

carlo.bertolli added inline comments to D144634: [Clang][OpenMP] Support for Code Generation of loop bind clause.
Mon, Mar 6, 10:13 AM · Restricted Project, Restricted Project
carlo.bertolli added inline comments to D144634: [Clang][OpenMP] Support for Code Generation of loop bind clause.
Mon, Mar 6, 6:50 AM · Restricted Project, Restricted Project

Wed, Mar 1

carlo.bertolli added inline comments to D144634: [Clang][OpenMP] Support for Code Generation of loop bind clause.
Wed, Mar 1, 4:02 PM · Restricted Project, Restricted Project

Jan 25 2023

carlo.bertolli added a comment to D142512: [OpenMP][libomptarget] Fix mapping/prelock.cpp test.

@kevinsala if the issue is with returning the host thread a pointer that cannot be used on the host - except in a map clause and a target region - then perhaps we should not return the locked pointer at all. The lock function could operate in a "side effect" mode, where the user asks the runtime to lock memory and the ROCr keeps track of it. Does this make sense?

Jan 25 2023, 6:34 AM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D142512: [OpenMP][libomptarget] Fix mapping/prelock.cpp test.

@carlo.bertolli Just tested the example below with the old plugins and it also segfaults:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <omp.h>

void *llvm_omp_target_lock_mem(void *ptr, size_t size, int device_num);
void llvm_omp_target_unlock_mem(void *ptr, int device_num);

int main(int argc, char **argv) {
    const size_t size = 1000*sizeof(int);

    int *ptr1 = (int *) malloc(size);
    assert(ptr1 != NULL);

    memset(ptr1, 0, size);

    int *ptr2 = (int *) llvm_omp_target_lock_mem(ptr1, size, 0);
    assert(ptr2 != NULL);

    fprintf(stdout, "ptr1: %p, ptr2: %p\n", ptr1, ptr2);
    fprintf(stdout, "ptr1[0]: %d\n", ptr1[0]);
    fprintf(stdout, "ptr2[0]: %d\n", ptr2[0]);

    return 0;
}

When running the program, the access to ptr2[0] segfaults at the host side:

ptr1: 0x55b461f00ec0, ptr2: 0x7fc5576c8ec0
ptr1[0]: 0
Segmentation fault

So I don't think ptr2 is a locked host pointer. It's the pointer returned by hsa_amd_memory_lock as the agent_ptr output parameter. And probably it's a pointer that the GPU agents should use for asynchronous memory transfers involving the buffer [ptr1, size). I assumed the host locked pointer is ptr1 directly, after hsa_amd_memory_lock is executed. However, I'm not 100% sure since the documentation of the lock operation on HSA AMD is a bit confusing.

Jan 25 2023, 6:26 AM · Restricted Project, Restricted Project

Jan 24 2023

carlo.bertolli added a comment to D142512: [OpenMP][libomptarget] Fix mapping/prelock.cpp test.

This test works correctly with the old plugin. A locked pointer is not a device pointer. It is a host pointer that has been pinned in memory and that has been made accessible by ROCr to the device agent that will be assigned the data transfers implementing the map clause. In abstract programming terms, the second feature is only an AMD one and it does not need to have a corresponding functionality on other targets.

Jan 24 2023, 8:25 PM · Restricted Project, Restricted Project

Jan 17 2023

carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

@carlo.bertolli is there documentation regarding llvm_omp_target_lock_mem llvm_omp_target_unlock_mem? While reading the comments of this patch, I see that the API ensures that locked areas feature reference counting. But I have the following doubts. Should the API support locking memory buffers that were already locked (e.g., complete or partial overlapping)? What's the behavior in such a case?

Jan 17 2023, 8:02 AM · Restricted Project, Restricted Project

Jan 13 2023

carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[AMD Official Use Only - General]

Jan 13 2023, 1:07 PM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

Pushed a fix to the non-amdgpu build:
7928d9e12d47fcc226d0c6984e11f5f463670f4a
Tested on machine without ROCm installation or AMDGPU.

Jan 13 2023, 1:05 PM · Restricted Project, Restricted Project
carlo.bertolli committed rG7928d9e12d47: [OpenMP][libomptarget][AMDGPU] Add missing declarations to fix non amdgpu builds (authored by carlo.bertolli).
[OpenMP][libomptarget][AMDGPU] Add missing declarations to fix non amdgpu builds
Jan 13 2023, 1:05 PM · Restricted Project, Restricted Project
carlo.bertolli committed rGb215932e6991: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in… (authored by carlo.bertolli).
[OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in…
Jan 13 2023, 10:19 AM · Restricted Project, Restricted Project
carlo.bertolli closed D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 13 2023, 10:19 AM · Restricted Project, Restricted Project

Jan 11 2023

carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 11 2023, 5:29 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[OpenMP][libomptarget][AMDGPU] Address comments, including fix in error handling in is_locked function.

Jan 11 2023, 5:29 PM · Restricted Project, Restricted Project

Jan 10 2023

carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 10 2023, 2:27 PM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

hsa_amd_memory_lock gives the corresponding region coarse semantics. No update visible until after a kernel exits. Is that right for this?

Jan 10 2023, 2:24 PM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D141227: [OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins.

What is locked memory, and could we define it in a comment/documentation somewhere if it isn't already?

Is it the same as mmap LOCKED? That plus extra?

Does it imply GPUs can read/write it? If so, can they read/write within a kernel execution, or can we implement locked memory by copy at the start and end of the execution?

Jan 10 2023, 2:14 PM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D141227: [OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins.

What is locked memory, and could we define it in a comment/documentation somewhere if it isn't already?

Is it the same as mmap LOCKED? That plus extra?

Does it imply GPUs can read/write it? If so, can they read/write within a kernel execution, or can we implement locked memory by copy at the start and end of the execution?

Jan 10 2023, 2:12 PM · Restricted Project, Restricted Project
carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 10 2023, 2:01 PM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

lock_memory and unlock_memory not symmetric.

I mean
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
unlock_memory(ptr) # no-op

I just feel unsafe. Feel more safe if
lock_memory(ptr) # lock happens
lock_memory(ptr) # no-op
unlock_memory(ptr) # no-op
unlock_memory(ptr) # unlock happens
but this requires doing reference counting.

Jan 10 2023, 1:56 PM · Restricted Project, Restricted Project
carlo.bertolli updated subscribers of D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

I still don't feel safe about the amd plugin implementation

  1. how is_locked handle error.
Jan 10 2023, 11:59 AM · Restricted Project, Restricted Project
carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 10 2023, 11:37 AM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[OpenMP][libomptarget][AMDGPU] Addressed comments.

Jan 10 2023, 11:37 AM · Restricted Project, Restricted Project

Jan 9 2023

carlo.bertolli added a comment to D141227: [OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins.

I changed the old plugin interface for tgt_rtl_data_lock to return an error code. It now returns the lockedptr as function argument. Let me know if this is not what was called for.
Thanks for this extension!

Jan 9 2023, 9:29 AM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

Thanks for being patience with this patch update. I hit a problem with the is_locked function, when the prelocked pointer is passed to the map clause: the address offset calculation was based on the system-allocator (e.g., malloc) pointer but it would not work if the agentBasePointer (locked) was passed in. Fixed now.

Jan 9 2023, 9:26 AM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[OpenMP][libomptarget][AMDGPU] Address comments and use prelocked pointers in mep clause implementation.

Jan 9 2023, 9:26 AM · Restricted Project, Restricted Project

Jan 4 2023

carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 4 2023, 2:55 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[OpenMP][libomptarget][AMDGPU] Add lock/unlock to prevent races on Devices vector in plugin manager.

Jan 4 2023, 2:55 PM · Restricted Project, Restricted Project

Jan 3 2023

carlo.bertolli added inline comments to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Jan 3 2023, 2:02 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

[OpenMP][libomptarget][AMDGPU] Apply requested changes and merge against trunk.

Jan 3 2023, 2:02 PM · Restricted Project, Restricted Project

Dec 15 2022

carlo.bertolli committed rGac52c8f58979: [OpenMP] Add missing test for pinned memory API (authored by carlo.bertolli).
[OpenMP] Add missing test for pinned memory API
Dec 15 2022, 7:30 PM · Restricted Project, Restricted Project
carlo.bertolli closed D140077: [OpenMP] Add missing test for pinned memory API.
Dec 15 2022, 7:30 PM · Restricted Project, Restricted Project

Dec 14 2022

carlo.bertolli added a comment to D140077: [OpenMP] Add missing test for pinned memory API.

Should this have a target? Do you want to check a and b after?

Dec 14 2022, 7:15 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D140077: [OpenMP] Add missing test for pinned memory API.

Updated test with checks of a and b pointers.

Dec 14 2022, 7:15 PM · Restricted Project, Restricted Project
carlo.bertolli requested review of D140077: [OpenMP] Add missing test for pinned memory API.
Dec 14 2022, 5:32 PM · Restricted Project, Restricted Project
carlo.bertolli committed rGd6281caa3446: [OpenMP] Add API for pinned memory (authored by carlo.bertolli).
[OpenMP] Add API for pinned memory
Dec 14 2022, 6:52 AM · Restricted Project, Restricted Project
carlo.bertolli closed D138933: [OpenMP] Add API for pinned memory.
Dec 14 2022, 6:52 AM · Restricted Project, Restricted Project

Dec 13 2022

carlo.bertolli updated the diff for D138933: [OpenMP] Add API for pinned memory.

[OpenMP] Only rerun lit test for pinned memory, no need to rebuild.

Dec 13 2022, 12:14 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D138933: [OpenMP] Add API for pinned memory.

[OpenMP] Add test for default allocator and move unlock call to appropriate site.

Dec 13 2022, 11:11 AM · Restricted Project, Restricted Project

Dec 12 2022

carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

ping

Dec 12 2022, 7:00 AM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D138933: [OpenMP] Add API for pinned memory.

ping

Dec 12 2022, 7:00 AM · Restricted Project, Restricted Project

Dec 5 2022

carlo.bertolli updated the diff for D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

Apply comments

Dec 5 2022, 7:45 AM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).

It's interesting that locking locked memory succeeds, but doesn't give you something that has to be unlocked twice. Not totally sure about the convenience vs error detection there. What does the proposed user facing interface look like?

Dec 5 2022, 7:45 AM · Restricted Project, Restricted Project

Dec 2 2022

carlo.bertolli requested review of D139208: [OpenMP][libomptarget][AMDGPU] lock/unlock (pin/unpin) mechanism in libomptarget amdgpu plugin (API and implementation).
Dec 2 2022, 9:41 AM · Restricted Project, Restricted Project

Nov 30 2022

carlo.bertolli updated the diff for D138933: [OpenMP] Add API for pinned memory.

Removed llvm_omp_target_host_mem_alloc as a case for pinning.

Nov 30 2022, 7:15 AM · Restricted Project, Restricted Project

Nov 29 2022

carlo.bertolli added inline comments to D138933: [OpenMP] Add API for pinned memory.
Nov 29 2022, 5:22 PM · Restricted Project, Restricted Project
carlo.bertolli updated the diff for D138933: [OpenMP] Add API for pinned memory.

Pinning does not apply to device memory on AMDGPU target, only to host memory, as explained here:
https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h#L1526

Nov 29 2022, 5:22 PM · Restricted Project, Restricted Project
carlo.bertolli requested review of D138933: [OpenMP] Add API for pinned memory.
Nov 29 2022, 11:08 AM · Restricted Project, Restricted Project
carlo.bertolli added a comment to D138614: [Clang][OpenMP][AMDGPU] Fix capture of variably modified type alias in teams distribute.

LGTM but @ABataev should review/approve this

Nov 29 2022, 9:38 AM · Restricted Project, Restricted Project, Restricted Project

Aug 22 2022

carlo.bertolli accepted D132074: [OpenMP] Add option to assert no nested OpenMP parallelism on the GPU.

Thanks for explaining. LGTM.

Aug 22 2022, 7:27 AM · Restricted Project, Restricted Project, Restricted Project
carlo.bertolli added a comment to D132074: [OpenMP] Add option to assert no nested OpenMP parallelism on the GPU.

This looks good, but what happens when the user accidentally adds a nested parallel when this option is turned on? Do we get serial (correct) execution?

Aug 22 2022, 7:21 AM · Restricted Project, Restricted Project, Restricted Project

Apr 12 2022

carlo.bertolli accepted D123527: [libomptarget][amdgpu] Add hidden_heap_v1 kernarg metadata.

LGTM, thanks for this.

Apr 12 2022, 11:15 AM · Restricted Project, Restricted Project

Feb 18 2022

carlo.bertolli committed rG7b731f4d0bfb: [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D… (authored by carlo.bertolli).
[OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D…
Feb 18 2022, 8:09 AM
carlo.bertolli closed D119968: [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed.
Feb 18 2022, 8:09 AM · Restricted Project

Feb 16 2022

carlo.bertolli updated the diff for D119968: [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed.

Replace list of shadow pointer to be restored with simple synchronization.

Feb 16 2022, 6:48 PM · Restricted Project
carlo.bertolli added inline comments to D119968: [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed.
Feb 16 2022, 1:01 PM · Restricted Project
carlo.bertolli requested review of D119968: [OpenMP][libomptarget] Delay restore of shadow pointers in structs to after H2D memory copies are completed.
Feb 16 2022, 12:42 PM · Restricted Project

Dec 17 2021

carlo.bertolli committed rGd3abb04e148b: [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with… (authored by carlo.bertolli).
[OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with…
Dec 17 2021, 7:59 AM
carlo.bertolli closed D115887: [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter.
Dec 17 2021, 7:59 AM · Restricted Project
carlo.bertolli added a comment to D115887: [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter.

It looks like https://reviews.llvm.org/D115823 will take a bit longer to be merged. I will merge this in to fix the API error, as it can go in first.

Dec 17 2021, 7:48 AM · Restricted Project

Dec 16 2021

carlo.bertolli updated the diff for D115887: [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter.

Add context.

Dec 16 2021, 9:47 AM · Restricted Project
carlo.bertolli requested review of D115887: [OpenMP][libomptarget] Fix __tgt_rtl_run_target_team_region_async API with missing parameter.
Dec 16 2021, 9:47 AM · Restricted Project
carlo.bertolli accepted D115823: [openmp][libomptarget] Introduce and call createAsyncInfo.

LGTM

Dec 16 2021, 8:17 AM · Restricted Project

Dec 15 2021

carlo.bertolli committed rGd83dc4c64814: [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add… (authored by carlo.bertolli).
[OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add…
Dec 15 2021, 7:35 AM
carlo.bertolli closed D115771: [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin.
Dec 15 2021, 7:35 AM · Restricted Project

Dec 14 2021

carlo.bertolli requested review of D115771: [OpenMP] Increase opportunity for parallel kernel launch in AMDGPUs: add multiple hsa queue's per device in plugin.
Dec 14 2021, 5:47 PM · Restricted Project

Dec 10 2021

carlo.bertolli committed rG28309c543669: [OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous (authored by carlo.bertolli).
[OpenMP] Part 2 of At present, amdgpu plugin merges both asynchronous
Dec 10 2021, 11:21 AM
carlo.bertolli closed D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.
Dec 10 2021, 11:21 AM · Restricted Project
carlo.bertolli updated the diff for D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

Trying rebase again.

Dec 10 2021, 8:29 AM · Restricted Project
carlo.bertolli updated the diff for D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

rebase onto main

Dec 10 2021, 8:25 AM · Restricted Project

Dec 9 2021

carlo.bertolli updated the diff for D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

[OpenMP][NFC] Apply requested change: remove "nowait" from function name.
Add context (sorry about that!)

Dec 9 2021, 10:36 AM · Restricted Project

Dec 8 2021

carlo.bertolli updated the diff for D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.

[OpenMP] Add missing hsa declarations/definitions when building runtime without rocr (or hsa library) installed on the system

Dec 8 2021, 11:33 AM · Restricted Project
carlo.bertolli added a comment to D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.

I know what happened to my machine. Some CMake change caused offload plugins are not compiled. Sign. broken upstream.
My intention is to check pinned status. Before the first lock(not pinned), after the first lock(pinned), after the offload region(pinned), after the unlock(unpinned).
Could you also verify with rocprof hsa trace that the lock and unlock are both called twice?

Dec 8 2021, 10:19 AM · Restricted Project
carlo.bertolli added a comment to D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.

I failed to verify your first lock behaves as intended.

#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <omp.h>
#include <stdio.h>

#define N 100293

int checkLocked(void *ptr) {
  hsa_amd_pointer_info_t info;
  hsa_status_t herr;

  herr = hsa_amd_pointer_info(ptr, &info, NULL, NULL, NULL);
  if (herr != HSA_STATUS_SUCCESS) {
    printf("  hsa_amd_pointer_info failed\n");
    return 1;
  }

  if (info.type != HSA_EXT_POINTER_TYPE_LOCKED) {
    printf("  pointer is noooooooooooot locked\n");
    return 1;
  } else
    printf("  pointer is locked\n");

  return 0;
}

int main() {
  int n = N;
  int *a = new int[n];
  for (int i = 0; i < n; i++)
    a[i] = 0;

  int *a_locked = nullptr;
  hsa_status_t herr =
      hsa_amd_memory_lock(a, n * sizeof(int), nullptr, 0, (void **)&a_locked);
  if (herr != HSA_STATUS_SUCCESS) {
    printf("Locking failed\n");
    return 1;
  }

  checkLocked(a);

#pragma omp target parallel for map(tofrom : a_locked[:n])
  for (int i = 0; i < n; i++)
    a_locked[i] = i;

  herr = hsa_amd_memory_unlock(a);
  if (herr != HSA_STATUS_SUCCESS) {
    printf("Unlocking failed\n");
    return 1;
  }

  int err = 0;
  for (int i = 0; i < n; i++)
    if (a[i] != i) {
      err++;
      printf("Err at %d, got %d expected %d\n", i, a[i], i);
      if (err > 10)
        break;
    }

  delete[] a;

  return err;
}

I got failure at the first check with "hsa_amd_pointer_info failed". Could you take a look?

Dec 8 2021, 9:51 AM · Restricted Project
carlo.bertolli added a comment to D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.

I have not tried with memory that has already been locked, but I will. In any case, with this patch, if locking fails, then we revert to malloc+lock+unlock+free. This is not ideal, and this case is added for other reasons, but it should be supporting the case.

It will be better skipping lock/free if the memory is known to HSA already. I think IBM XL skips its pinned memory optimization when it sees the pointer pinned already for CUDA.
I have code managing lock/unlock via HIP. Even if a lock call from the plugin succeeds, and then a plugin unlock call succeeds, the user unlock call fails.
For this reason, check memory info is required.
fallback to "malloc+lock+unlock+free" is the worst option.

Dec 8 2021, 8:40 AM · Restricted Project
carlo.bertolli added a comment to D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.

Yes, should revert and update the dlopen HSA source. Apologies for not checking this builds before committing it.

I don't know whether there's more ritual to to around whether it's already pinned - @carlo.bertolli did you look into testing if the memory is already pinned before doing so? Particularly interested if already pinned is a reason for lock to fail

Dec 8 2021, 5:07 AM · Restricted Project

Dec 7 2021

carlo.bertolli requested review of D115279: [OpenMP][AMDGPU] Switch host-device memory copy to asynchronous version.
Dec 7 2021, 1:15 PM · Restricted Project
carlo.bertolli added a comment to D115267: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

@JonChesterfield thanks for the quick review. Would you mind merging this for me?

Dec 7 2021, 12:57 PM · Restricted Project
carlo.bertolli added inline comments to D115267: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.
Dec 7 2021, 12:48 PM · Restricted Project
carlo.bertolli updated the diff for D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

format

Dec 7 2021, 12:47 PM · Restricted Project
carlo.bertolli requested review of D115273: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.
Dec 7 2021, 11:21 AM · Restricted Project
carlo.bertolli added a comment to D115267: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

@JonChesterfield this looks more readable as a first step.

Dec 7 2021, 10:41 AM · Restricted Project
carlo.bertolli abandoned D115258: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.

Abandoning this one for readability issues and moving to two separate patches.
First one:
https://reviews.llvm.org/D115267

Dec 7 2021, 10:41 AM
carlo.bertolli requested review of D115267: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.
Dec 7 2021, 10:40 AM · Restricted Project
carlo.bertolli requested review of D115258: [NFC][OpenMP] Prepare amdgpu plugin for asynchronous implementation of target region launch.
Dec 7 2021, 8:50 AM

Nov 17 2021

carlo.bertolli added a comment to D102449: [WIP][Clang][OpenMP] Add the support for compare clause in atomic directive.

This is already a lot of code with parse+sema. I wonder if we should split the patch into two, i.e. 1. parse+sema; 2. code gen? @ABataev ?
It should simplify maintenance of the patch and allow time to extend the OpenMP IR builder.

Nov 17 2021, 8:02 AM · Restricted Project, Restricted Project

Nov 3 2021

carlo.bertolli accepted D113111: [OpenMP] Build device runtimes for sm_86.

LGTM

Nov 3 2021, 8:31 AM · Restricted Project

Oct 25 2021

carlo.bertolli added a comment to D111280: [OpenMP] Avoid starting a kernel without work.

Thanks for this. Do you think this could use a test?
I assume a test would be easier to write once this information is used somewhere, but it's hard to tell without context.

How would a test look like? And what do you mean by "information is used somewhere"?
If we land this code it is "used" and will avoid execution of kernels with an associated loop trip count of 0.

Oct 25 2021, 6:39 PM

Oct 15 2021

carlo.bertolli accepted D111905: [OpenMP][deviceRTLs] Fix wrong return value of `__kmpc_is_spmd_exec_mode`.

Thanks for fixing this. It would be nice if you could add a comment where it is used at the beginning of parallel_51, stating what are the initial value of parallelLevel (kmpc_parallel_level) and those of kmpc_is_spmd_exec_mode in the two cases of generic and spmd modes (assuming generic spmd mode is handled as a spmd). Something like:

Oct 15 2021, 1:02 PM · Restricted Project

Oct 7 2021

carlo.bertolli added a comment to D111280: [OpenMP] Avoid starting a kernel without work.

Thanks for this. Do you think this could use a test?
I assume a test would be easier to write once this information is used somewhere, but it's hard to tell without context.

Oct 7 2021, 7:35 AM

Sep 30 2021

carlo.bertolli accepted D110845: [libomptarget] Apply D110029 to amdgpu.

LGTM

Sep 30 2021, 1:27 PM · Restricted Project

Sep 20 2021

carlo.bertolli updated the diff for D108569: [OpenMP] Enable map checks under unified_shared_memory mode.

Updated patch based on comments: add new map table entry field to track USM maps; use it to determine behavior in getTargetPointer and deallocTgtPtr; update var name and add explaining comments.

Sep 20 2021, 8:48 AM

Sep 17 2021

carlo.bertolli added a comment to D102449: [WIP][Clang][OpenMP] Add the support for compare clause in atomic directive.

Another thing is how we deal with a corner case. Say the OpenMP code is written in the following way:

#pragma omp atomic compare
  x = e < x ? e : x;

That's how OpenMP spec defines the atomic operation. x is always in "else statement" of a conditional statement.

Now we need to lower it to LLVM IR, which is atomicrmw operation. Based on the LLVM IR reference, it only supports the atomic operations that x is in the "then statement". For example: x = x > e ? x : e. See the x here is before :. In order to lower the OpenMP statement, we have to do a transformation. In order to swap e and x, we need to transform it to x = e >= x : x : e, a.k.a. x = x <= e : x : e. However, we don't have an atomic operation for <=. We only have <. So if x != e, the result is good.

The incorrectness happens if x == e. Recall at the OpenMP statement, when x == e, the result should be x = x. But if we look at our lowered LLVM IR, x = x < e : x : e, when x == e, it becomes x = e, which doesn't conform with OpenMP spec.

What should we do here?

Sep 17 2021, 11:14 AM · Restricted Project, Restricted Project

Aug 26 2021

carlo.bertolli added a comment to D102449: [WIP][Clang][OpenMP] Add the support for compare clause in atomic directive.

Thanks for uploading this.

Aug 26 2021, 10:33 AM · Restricted Project, Restricted Project

Aug 23 2021

carlo.bertolli updated the diff for D108569: [OpenMP] Enable map checks under unified_shared_memory mode.

Update environment variable name to reflect comments and intended name

Aug 23 2021, 11:34 AM
carlo.bertolli requested review of D108569: [OpenMP] Enable map checks under unified_shared_memory mode.
Aug 23 2021, 11:22 AM

Jul 6 2021

carlo.bertolli accepted D102307: [OpenMP] Detect SPMD compatible kernels and execute them as such.

I have very small comments. LGTM.

Jul 6 2021, 9:13 AM · Restricted Project

Jun 14 2021

carlo.bertolli added a comment to D103899: [OpenMP] Fix C-only clang assert on parsing use_allocator clause of target directive.

@ABataev can you please merge this for me? I still have to ask for commit privileges.

Jun 14 2021, 8:41 AM · Restricted Project
carlo.bertolli added a comment to D103899: [OpenMP] Fix C-only clang assert on parsing use_allocator clause of target directive.

Nice, thanks. I guess we already have an equivalent test in c++?

Jun 14 2021, 8:38 AM · Restricted Project