This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Support for a ternary atomic RMW instruction
ClosedPublic

Authored by aheejin on Jul 11 2018, 9:16 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

aheejin created this revision. Jul 11 2018, 9:16 AM
aheejin planned changes to this revision. Jul 11 2018, 2:28 PM

It turns out this patch currently cannot make use of the truncating/extending instructions when the 'success flag' of the LLVM IR cmpxchg instruction is used.

aheejin updated this revision to Diff 155126. Jul 12 2018, 1:02 AM

Variable name change

I think this CL can be reviewed as is. I will add an optimization for the success flag case in a separate CL, because this one would otherwise get too long. The problem I was talking about is this: let's say we have these two test cases:

define i64 @cmpxchg_i8_i64_loaded_value(i8* %p, i64 %exp, i64 %new) {
  %exp_t = trunc i64 %exp to i8
  %new_t = trunc i64 %new to i8
  %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
  %old = extractvalue { i8, i1 } %pair, 0
  %e = zext i8 %old to i64
  ret i64 %e
}

define i1 @cmpxchg_i8_i64_success(i8* %p, i64 %exp, i64 %new) {
  %exp_t = trunc i64 %exp to i8
  %new_t = trunc i64 %new to i8
  %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
  %succ = extractvalue { i8, i1 } %pair, 1
  ret i1 %succ
}

In LLVM IR (not wasm), unlike the atomicrmw instruction, the cmpxchg instruction returns a pair of { loaded value, success flag }. The additional 'success flag' indicates whether the loaded value and the expected value match. With this CL, the first function compiles to

cmpxchg_i8_i64_loaded_value:
  .param    i32, i64, i64
  .result   i64
  i64.atomic.rmw8_u.cmpxchg  $push0=, 0($0), $1, $2
  return    $pop0

But for the second function (which is a little contrived, because the success flag is usually not returned from a function but is more likely to be used in a loop condition), this fails to make use of the i64.atomic.rmw8_u.cmpxchg instruction. The result is going to be something like

cmpxchg_i8_i64_success:
  .param    i32, i64, i64
  .result   i32
  i32.wrap/i64  $push6=, $1
  tee_local  $push5=, $3=, $pop6
  i32.wrap/i64  $push0=, $2
  i32.atomic.rmw8_u.cmpxchg  $push1=, 0($0), $pop5, $pop0
  i32.const  $push2=, 255
  i32.and   $push3=, $3, $pop2
  i32.eq    $push4=, $pop1, $pop3
  return    $pop4

which is suboptimal. (This only happens when truncation-extension exists.)
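
To make the extra work in the listing above concrete, here is a rough C++ model of what that instruction sequence computes. It is only a sketch: wasm_like_cmpxchg_u8 and cmpxchg_success_u8 are hypothetical names, and std::atomic stands in for wasm linear memory. The point is that the 8-bit cmpxchg hands back only the loaded value, so the success flag has to be recomputed by masking the expected value and comparing.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model of an 8-bit wasm-style cmpxchg: it returns only the
// loaded value (zero-extended), never a success flag.
std::uint32_t wasm_like_cmpxchg_u8(std::atomic<std::uint8_t>& cell,
                                   std::uint32_t expected,
                                   std::uint32_t replacement) {
  // Only the low 8 bits of the operands take part in the operation.
  std::uint8_t observed = static_cast<std::uint8_t>(expected);
  cell.compare_exchange_strong(observed,
                               static_cast<std::uint8_t>(replacement),
                               std::memory_order_seq_cst,
                               std::memory_order_seq_cst);
  // On success `observed` still equals the old value; on failure it has
  // been overwritten with the value actually found in memory.
  return observed;
}

// Mirrors the second listing: wrap both i64 operands, run the cmpxchg,
// then rebuild the success flag with a mask and a compare.
bool cmpxchg_success_u8(std::atomic<std::uint8_t>& cell,
                        std::uint64_t expected,
                        std::uint64_t replacement) {
  std::uint32_t exp32 = static_cast<std::uint32_t>(expected);     // i32.wrap/i64
  std::uint32_t new32 = static_cast<std::uint32_t>(replacement);  // i32.wrap/i64
  std::uint32_t loaded = wasm_like_cmpxchg_u8(cell, exp32, new32);
  return loaded == (exp32 & 255u);                                // i32.and + i32.eq
}
```

The wraps, the mask, and the compare in cmpxchg_success_u8 correspond to the instructions surrounding the cmpxchg in the listing.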

I think we need another set of patterns to optimize this. And in case we want to use both the loaded value and the success flag, which I guess is the most common case, we need another set of patterns for that as well. I'll add that in another CL separately.
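
For reference, the "both results are used" case typically looks like a compare-and-swap retry loop at the source level. The following C++ sketch is illustrative only; atomic_fetch_max_u8 is a hypothetical helper, not something from this CL. The success flag drives the loop condition, and the loaded value feeds the next iteration.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically raise `cell` to at least `candidate`
// and return the value that was observed before any update.
std::uint8_t atomic_fetch_max_u8(std::atomic<std::uint8_t>& cell,
                                 std::uint8_t candidate) {
  std::uint8_t observed = cell.load();
  // compare_exchange_strong returns the success flag and, on failure,
  // writes the freshly loaded value back into `observed`, so each
  // iteration consumes both halves of the cmpxchg result.
  while (observed < candidate &&
         !cell.compare_exchange_strong(observed, candidate)) {
    // Retry with the refreshed `observed`.
  }
  return observed;
}
```

A frontend would typically lower the compare_exchange_strong call here to an LLVM IR cmpxchg whose loaded value and success flag are both extracted.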

aheejin updated this revision to Diff 155726. Jul 16 2018, 11:38 AM
  • Add a TODO
dschuff accepted this revision. Aug 1 2018, 11:19 AM
This revision is now accepted and ready to land. Aug 1 2018, 11:19 AM
This revision was automatically updated to reflect the committed changes.