Page MenuHomePhabricator

[OpenMP] Added the support for hidden helper task in RTL
ClosedPublic

Authored by tianshilei1992 on Apr 6 2020, 4:54 PM.

Details

Summary

The basic design is to create an outer-most parallel team. It is not a regular team because it is only created when the first hidden helper task is encountered, and is only responsible for the execution of hidden helper tasks. We first use pthread_create to create a new thread, let's call it the initial and also the main thread of the hidden helper team. This initial thread then initializes a new root, just like what RTL does in initialization. After that, it directly calls __kmpc_fork_call. It is like the initial thread encounters a parallel region. The wrapped function for this team is, for main thread, which is the initial thread that we create via pthread_create on Linux, waits on a condition variable. The condition variable can only be signaled when RTL is being destroyed. For other work threads, they just do nothing. The reason that main thread needs to wait there is, in current implementation, once the main thread finishes the wrapped function of this team, it starts to free the team which is not what we want.

Two environment variables, LIBOMP_NUM_HIDDEN_HELPER_THREADS and LIBOMP_USE_HIDDEN_HELPER_TASK, are also set to configure the number of threads and enable/disable this feature. By default, the number of hidden helper threads is 8.

Here are some open issues to be discussed:

  1. The main thread goes to sleeping when the initialization is finished. As Andrey mentioned, we might need it to be awaken from time to time to do some stuffs. What kind of update/check should be put here?

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I left some comments. Generally, I would prefer we minimize the use of the macro to elide declarations. I'd also prefer to use the macro as part of the conditions to avoid duplication.
Instead of

#ifdef MACRO
foo(X)
#else 
foo(Y)
#endif

we do

v = MACRO ? X : Y;
foo(v);

which is really helpful if foo is complex code and just as fast.

@adurang @AndreyChurbanov have your concerns been addressed?

openmp/runtime/src/kmp.h
2337–2338

Do we really need the USE_UNSHACKLED_TASK flag? Even if we want it, we don't need it here in the struct. Let's waste one bit on windows until we catch up and remove complexity for everyone.

2392

Grammar:
"The task team of its parent task team"
and
"therefore we it when this task is created"

2402

Similarly, I don't think the byte savings here are worth it.

openmp/runtime/src/kmp_runtime.cpp
3648

Is the code in the #else case the same as in the } else { case? If so, make the conditional if (USE_UNSHACKLED_TASK && ...) and avoid the duplication of bugs.

4332

Nit: can we move the initialization out of the loop, hard to read. A comment might help as well.

Looking at the code more generally, is this the same code as below with different bounds? If so, avoid the duplication all together please, same way as suggested above.

@adurang @AndreyChurbanov have your concerns been addressed?

I didn't see the problem with release_deps being solved (maybe I missed). And I think we should really have a mechanism to set the number of threads instead of a hardcoded '8' and not have the threads created until is necessary.

Also, given efforts in OpenMP to remove master and similar terms maybe we should think about renaming "unshackled" to something else like "helper" or "auxilary"? I know is a bit of a pain to do that so I won't press for this but thought that I should mention it.

tianshilei1992 added inline comments.Oct 28 2020, 7:41 AM
openmp/runtime/src/kmp_taskdeps.h
136

@adurang The problem of release deeps was fixed here.

Enhanced one test case and fixed some comments

tianshilei1992 marked 2 inline comments as done.Oct 28 2020, 8:04 PM

I left some comments. Generally, I would prefer we minimize the use of the macro to elide declarations. I'd also prefer to use the macro as part of the conditions to avoid duplication.
Instead of

#ifdef MACRO
foo(X)
#else 
foo(Y)
#endif

we do

v = MACRO ? X : Y;
foo(v);

which is really helpful if foo is complex code and just as fast.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

Changed some code to make it more readable and less complex.

The failed case is because the gtid is not offset. What is a right way to detect whether a CMake variable or macro is defined?

jdoerfert added a comment.EditedOct 29 2020, 1:08 PM

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

Added support for setting number of unshackled threads via environment variable

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

ye-luo added a comment.EditedNov 10 2020, 10:05 AM

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

Better to have a way to elide unshackled thread team creation at runtime before putting LIBOMP_USE_UNSHACKLED_TASK by default.

Some variables are only defined when the MACRO is enabled. I have changed some code to make it more readable and less complex.

As I said before, I don't see the point in omitting declarations. It just increases our testing surface for no real benefit. If you don't use this but have two more functions and a few declarations, all of which you don't use, you really don't pay a price in the big scheme of things.

What is a right way to detect whether a CMake variable or macro is defined?

In C/C++ (#ifdef) or in CMake (idk)?

The point is, our test cases are not run by CMake, so it cannot detect whether we define any variable.

Then make USE_UNSHACKLED_TASK default and remove all the uses that elide declarations and definitions.

Better to have a way to elide unshackled thread team creation at runtime before putting LIBOMP_USE_UNSHACKLED_TASK by default.

It's already included in this patch.

Added the missing variable initialization

Removed the marcro USE_UNSHACKLED_TASK

tianshilei1992 edited the summary of this revision. (Show Details)Nov 11 2020, 8:58 AM
tianshilei1992 marked 3 inline comments as done.Nov 11 2020, 12:51 PM
jdoerfert accepted this revision.Dec 18 2020, 7:21 PM

As far as I can tell the issues have been addressed. This has been sitting here a while, let's get it in so we get more exposure. LGTM

If you go over your comments once more, add punctuation to make all of them sentences. If you want to change "unshackled" to "hidden_helper" or similar, that might be good.

This revision is now accepted and ready to land.Dec 18 2020, 7:21 PM

Updated the patch to use more inclusive words

tianshilei1992 retitled this revision from [OpenMP] Added the support for unshackled task in RTL to [OpenMP] Added the support for hidden helper task in RTL.Dec 19 2020, 6:01 PM
tianshilei1992 edited the summary of this revision. (Show Details)

Fixed one remained part

Still something left...

Fixed a bug in __kmp_release_deps

Refined test cases and rebased

This revision was landed with ongoing or failed builds.Sat, Jan 16, 11:13 AM
This revision was automatically updated to reflect the committed changes.

This broke building OpenMP for windows; all the new helper functions, like __kmp_hidden_helper_threads_initz_wait, that are added in z_Linux_util.cpp would need to be added similarly to z_Windows_NT_util.cpp. What do you propose doing - revert the patch for now until that's in place?

tianshilei1992 reopened this revision.Mon, Jan 18, 3:58 AM

reopen as the change was reverted

This revision is now accepted and ready to land.Mon, Jan 18, 3:58 AM
tianshilei1992 planned changes to this revision.Mon, Jan 18, 4:02 AM

This broke building OpenMP for windows; all the new helper functions, like __kmp_hidden_helper_threads_initz_wait, that are added in z_Linux_util.cpp would need to be added similarly to z_Windows_NT_util.cpp. What do you propose doing - revert the patch for now until that's in place?

Thanks for the report. We had a macro controlling whether the feature is enabled before. On Windows the macro is not defined so that corresponding parts in common files will not be built on Windows. Later we decided to remove the macro and turn the feature ON by default but I forgot to add the logic in Windows files, and I didn’t have Windows machines then.......

I’ve reverted the change and will fix the issue.

@ronlieb tells me an out of tree offloading test (aomp/test/smoke/devices) started crashing (hangs/segv/fp exception) with this patch applied. That doesn't make sense to me since this doesn't appear to change the target offloading logic, but it might be a smoking gun for a lifetime management error somewhere in the above. Does anyone know if the host openmp runtime is expected to be clean under things like valgrind or thread sanitizer?

Added missing functions on Windows but forced __kmp_enable_hidden_helper to
FALSE on all non-Linux platforms

This revision is now accepted and ready to land.Wed, Jan 20, 5:51 PM

@mstorsjo Would you mind giving it a shot on Windows?

tianshilei1992 requested review of this revision.Wed, Jan 20, 5:53 PM

@mstorsjo Would you mind giving it a shot on Windows?

Looks like it builds correctly now, thanks!

Haven't tested it practically (except for a very trivial smoke test) to see if it breaks anything at runtime, but it doesn't at least regress the build any longer.

JonChesterfield added a comment.EditedFri, Jan 22, 8:33 AM

The information I've got on the possible race is:
When this patch is applied (by git's automerge, I think) to the rocm stack, a test located at:
https://github.com/ROCm-Developer-Tools/aomp/blob/master/test/smoke/devices/devices.c
fails in unpredictable fashion.

I've reproduced the test here as it's fairly short, but it uses some functions on the device that the trunk implementation returns zero for. Adjusted so it builds on trunk. Run as

export LD_LIBRARY_PATH=$HOME/llvm-install/lib/ ; $HOME/llvm-install/bin/clang  -O2  -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_50   devices.c -o devices -L/usr/local/cuda/targets/x86_64-linux/lib -lcudart && valgrind --fair-sched=yes ./devices
// devices.c
#include <stdio.h>
#include <omp.h>

int main() {
  int num_devs = omp_get_num_devices();
  for (int device_num = 0; device_num < num_devs ; device_num++) {
#pragma omp target device(device_num) nowait
#pragma omp teams num_teams(2) thread_limit(4)
#pragma omp parallel num_threads(2)
    {
      // need to pass the total device number to all devices, per module load
      int num_threads = omp_get_num_threads();
      int num_teams   = omp_get_num_teams();
      int num_devices = omp_get_num_devices(); // not legal in 4.5

      // need to pass the device id to the device starting the kernel
      int thread_id   = omp_get_thread_num();
      int team_id     = omp_get_team_num();
      int device_id   = 0; // omp_get_device_num();  // no API in omp 4.5

      // assume we have homogeneous devices
      int total_threads = num_devices * num_teams * num_threads;
      int gthread_id    = (device_id * num_teams * num_threads) + (team_id * num_threads) + thread_id;

      // print out id
      printf("Hello OpenMP 5 from \n");
      printf(" Device num  %d of %d devices\n", device_id, num_devices);
      printf(" Team num    %d of %d teams  \n", team_id,   num_teams);
      printf(" Thread num  %d of %d threads\n", thread_id, num_threads);
      printf(" Global thread %d of %d total threads\n", gthread_id, total_threads);
    };
  };
#pragma omp taskwait
  printf("The host device num is %d\n", omp_get_device_num());
  printf("The initial device num is %d\n", omp_get_initial_device());
  printf("The number of devices are %d\n", num_devs);
}

Trunk before this patch makes a use of uninitialized memory but the test succeeds (prints a lot of stuff).

==27099== Conditional jump or move depends on uninitialised value(s)
==27099==    at 0x4C36DC1: __tgt_target_teams_nowait_mapper (llvm-project/openmp/libomptarget/src/interface.cpp:470)
==27099==    by 0x40148E: .omp_task_entry. (in /home/amd/aomp/aomp/test/smoke/devices/devices)
==27099==    by 0x4B5B688: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1562)
==27099==    by 0x4B5B8BB: __kmp_omp_task (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1679)
==27099==    by 0x4B5BB7E: __kmpc_omp_task (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1739)
==27099==    by 0x401309: main

With this patch applied, most of the print output is lost, and the uninitialized data error changes

The host device num is 1
The initial device num is 1
==20091== Thread 9:
==20091== Conditional jump or move depends on uninitialised value(s)
==20091==    at 0x4C3ADC1: __tgt_target_teams_nowait_mapper (llvm-project/openmp/libomptarget/src/interface.cpp:470)
==20091==    by 0x40148E: .omp_task_entry. (in /home/amd/aomp/aomp/test/smoke/devices/devices)
==20091==    by 0x4B5C399: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1633)
==20091==    by 0x4B60012: int __kmp_execute_tasks_template<kmp_flag_64<false, true> >(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:3012)
==20091==    by 0x4B6AE91: int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:3111)
==20091==    by 0x4B79901: kmp_flag_64<false, true>::execute_tasks(kmp_info*, int, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_wait_release.h:915)
==20091==    by 0x4B7497C: bool __kmp_wait_template<kmp_flag_64<false, true>, true, false, true>(kmp_info*, kmp_flag_64<false, true>*, void*) (llvm-project/openmp/runtime/src/kmp_wait_release.h:345)
==20091==    by 0x4B797D9: kmp_flag_64<false, true>::wait(kmp_info*, int, void*) (llvm-project/openmp/runtime/src/kmp_wait_release.h:922)
==20091==    by 0x4B70559: __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (llvm-project/openmp/runtime/src/kmp_barrier.cpp:672)
==20091==    by 0x4B7401D: __kmp_fork_barrier(int, int) (llvm-project/openmp/runtime/src/kmp_barrier.cpp:1982)
==20091==    by 0x4B3B701: __kmp_launch_thread (llvm-project/openmp/runtime/src/kmp_runtime.cpp:5776)
==20091==    by 0x4BB976D: __kmp_launch_worker(void*) (llvm-project/openmp/runtime/src/z_Linux_util.cpp:591)
==20091== 
The number of devices are 1
CUDA error: Error returned from cuDeviceGet

This is more obvious on the amd implementation because it segfaults on a null pointer dereference.

The information I've got on the possible race is:
When this patch is applied (by git's automerge, I think) to the rocm stack, a test located at:
https://github.com/ROCm-Developer-Tools/aomp/blob/master/test/smoke/devices/devices.c
fails in unpredictable fashion.

I've reproduced the test here as it's fairly short, but it uses some functions on the device that the trunk implementation returns zero for. Adjusted so it builds on trunk. Run as

export LD_LIBRARY_PATH=$HOME/llvm-install/lib/ ; $HOME/llvm-install/bin/clang  -O2  -target x86_64-pc-linux-gnu -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target=nvptx64-nvidia-cuda -march=sm_50   devices.c -o devices -L/usr/local/cuda/targets/x86_64-linux/lib -lcudart && valgrind --fair-sched=yes ./devices
// devices.c
#include <stdio.h>
#include <omp.h>

int main() {
  int num_devs = omp_get_num_devices();
  for (int device_num = 0; device_num < num_devs ; device_num++) {
#pragma omp target device(device_num) nowait
#pragma omp teams num_teams(2) thread_limit(4)
#pragma omp parallel num_threads(2)
    {
      // need to pass the total device number to all devices, per module load
      int num_threads = omp_get_num_threads();
      int num_teams   = omp_get_num_teams();
      int num_devices = omp_get_num_devices(); // not legal in 4.5

      // need to pass the device id to the device starting the kernel
      int thread_id   = omp_get_thread_num();
      int team_id     = omp_get_team_num();
      int device_id   = 0; // omp_get_device_num();  // no API in omp 4.5

      // assume we have homogeneous devices
      int total_threads = num_devices * num_teams * num_threads;
      int gthread_id    = (device_id * num_teams * num_threads) + (team_id * num_threads) + thread_id;

      // print out id
      printf("Hello OpenMP 5 from \n");
      printf(" Device num  %d of %d devices\n", device_id, num_devices);
      printf(" Team num    %d of %d teams  \n", team_id,   num_teams);
      printf(" Thread num  %d of %d threads\n", thread_id, num_threads);
      printf(" Global thread %d of %d total threads\n", gthread_id, total_threads);
    };
  };
#pragma omp taskwait
  printf("The host device num is %d\n", omp_get_device_num());
  printf("The initial device num is %d\n", omp_get_initial_device());
  printf("The number of devices are %d\n", num_devs);
}

Trunk before this patch makes a use of uninitialized memory but the test succeeds (prints a lot of stuff).

==27099== Conditional jump or move depends on uninitialised value(s)
==27099==    at 0x4C36DC1: __tgt_target_teams_nowait_mapper (llvm-project/openmp/libomptarget/src/interface.cpp:470)
==27099==    by 0x40148E: .omp_task_entry. (in /home/amd/aomp/aomp/test/smoke/devices/devices)
==27099==    by 0x4B5B688: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1562)
==27099==    by 0x4B5B8BB: __kmp_omp_task (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1679)
==27099==    by 0x4B5BB7E: __kmpc_omp_task (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1739)
==27099==    by 0x401309: main

With this patch applied, most of the print output is lost, and the uninitialized data error changes

The host device num is 1
The initial device num is 1
==20091== Thread 9:
==20091== Conditional jump or move depends on uninitialised value(s)
==20091==    at 0x4C3ADC1: __tgt_target_teams_nowait_mapper (llvm-project/openmp/libomptarget/src/interface.cpp:470)
==20091==    by 0x40148E: .omp_task_entry. (in /home/amd/aomp/aomp/test/smoke/devices/devices)
==20091==    by 0x4B5C399: __kmp_invoke_task(int, kmp_task*, kmp_taskdata*) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:1633)
==20091==    by 0x4B60012: int __kmp_execute_tasks_template<kmp_flag_64<false, true> >(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:3012)
==20091==    by 0x4B6AE91: int __kmp_execute_tasks_64<false, true>(kmp_info*, int, kmp_flag_64<false, true>*, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_tasking.cpp:3111)
==20091==    by 0x4B79901: kmp_flag_64<false, true>::execute_tasks(kmp_info*, int, int, int*, void*, int) (llvm-project/openmp/runtime/src/kmp_wait_release.h:915)
==20091==    by 0x4B7497C: bool __kmp_wait_template<kmp_flag_64<false, true>, true, false, true>(kmp_info*, kmp_flag_64<false, true>*, void*) (llvm-project/openmp/runtime/src/kmp_wait_release.h:345)
==20091==    by 0x4B797D9: kmp_flag_64<false, true>::wait(kmp_info*, int, void*) (llvm-project/openmp/runtime/src/kmp_wait_release.h:922)
==20091==    by 0x4B70559: __kmp_hyper_barrier_release(barrier_type, kmp_info*, int, int, int, void*) (llvm-project/openmp/runtime/src/kmp_barrier.cpp:672)
==20091==    by 0x4B7401D: __kmp_fork_barrier(int, int) (llvm-project/openmp/runtime/src/kmp_barrier.cpp:1982)
==20091==    by 0x4B3B701: __kmp_launch_thread (llvm-project/openmp/runtime/src/kmp_runtime.cpp:5776)
==20091==    by 0x4BB976D: __kmp_launch_worker(void*) (llvm-project/openmp/runtime/src/z_Linux_util.cpp:591)
==20091== 
The number of devices are 1
CUDA error: Error returned from cuDeviceGet

This is more obvious on the amd implementation because it segfaults on a null pointer dereference.

If you take a look at the code around interface.cpp:470, it is:

EXTERN int __tgt_target_teams_nowait_mapper(
    ident_t *loc, int64_t device_id, void *host_ptr, int32_t arg_num,
    void **args_base, void **args, int64_t *arg_sizes, int64_t *arg_types,
    map_var_info_t *arg_names, void **arg_mappers, int32_t team_num,
    int32_t thread_limit, int32_t depNum, void *depList, int32_t noAliasDepNum,
    void *noAliasDepList) {
  TIMESCOPE();
  if (depNum + noAliasDepNum > 0)
    __kmpc_omp_taskwait(loc, __kmpc_global_thread_num(loc));

  return __tgt_target_teams_mapper(loc, device_id, host_ptr, arg_num, args_base,
                                   args, arg_sizes, arg_types, arg_names,
                                   arg_mappers, team_num, thread_limit);
}

Line 470 is if (depNum + noAliasDepNum > 0). The reason it raises an error is, depNum and noAliasDepNum are not passed to the function call at all due to the known issue we have in clang. Actually, depNum, depList, noAliasDepNum, and noAliasDepList are all not passed on the callsite. So your issue encountered probably has nothing to do with this part.

I did try on my local systems with NVIDIA GPUs. I didn't encounter any crash/hang with 1000 runs. The only potential problem is printf in the target region doesn't work at all, which I believe has nothing to do with this patch.

Fixed some issues in miniQMC

The only potential problem is printf in the target region doesn't work at all, which I believe has nothing to do with this patch.

Do you see the print statements within the region without this patch applied? On sm_50, I see all the print output on trunk and the ones inside target missing with this patch.

I suspect there is a race condition in the library that this patch has exposed.

Did you run the test under valgrind? The fair scheduler setting does a reasonable job of perturbing thread order, though I suppose one should use an actual race detector instead.

rebased and remove unnecessary struct data member

The only potential problem is printf in the target region doesn't work at all, which I believe has nothing to do with this patch.

Do you see the print statements within the region without this patch applied? On sm_50, I see all the print output on trunk and the ones inside target missing with this patch.

I suspect there is a race condition in the library that this patch has exposed.

Did you run the test under valgrind? The fair scheduler setting does a reasonable job of perturbing thread order, though I suppose one should use an actual race detector instead.

I tested the latest version of this patch, and it can print out all information. Can you give it a shot on your side with AMD GPUs?

The test still doesn't work ideally on amdgpu, but it no longer crashes, and some of the print statements within the target region are seen.

jdoerfert accepted this revision.Mon, Jan 25, 7:12 PM

Known issues resolved, AMDGPU is not yet a supported target and hard to test right now. LG

This revision is now accepted and ready to land.Mon, Jan 25, 7:12 PM
This revision was landed with ongoing or failed builds.Mon, Jan 25, 7:16 PM
This revision was automatically updated to reflect the committed changes.