This adds support for a ternary atomic RMW instruction: cmpxchg.
Details
Diff Detail
- Repository: rL LLVM
Event Timeline
It turns out this cannot currently make use of the truncating/extending instructions when the 'success flag' of the LLVM IR cmpxchg instruction is used.
I think this CL can be reviewed as is. I will add an optimization for the success flag case in another CL, because it would make this one too long. Here is the problem I was talking about. Say we have these two test cases:
  define i64 @cmpxchg_i8_i64_loaded_value(i8* %p, i64 %exp, i64 %new) {
    %exp_t = trunc i64 %exp to i8
    %new_t = trunc i64 %new to i8
    %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
    %old = extractvalue { i8, i1 } %pair, 0
    %e = zext i8 %old to i64
    ret i64 %e
  }

  define i1 @cmpxchg_i8_i64_success(i8* %p, i64 %exp, i64 %new) {
    %exp_t = trunc i64 %exp to i8
    %new_t = trunc i64 %new to i8
    %pair = cmpxchg i8* %p, i8 %exp_t, i8 %new_t seq_cst seq_cst
    %succ = extractvalue { i8, i1 } %pair, 1
    ret i1 %succ
  }
So, in LLVM IR (not wasm), unlike the atomicrmw instruction, the cmpxchg instruction returns a pair of { loaded value, success flag }. The additional 'success flag' indicates whether the loaded value matched the expected value. With this CL, the first function compiles to:
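To make the pair semantics concrete, here is a minimal sequential Python model of what the LLVM IR cmpxchg computes. Atomicity and memory ordering are not modeled, and `llvm_cmpxchg` plus the little-endian byte-array memory are illustrative assumptions, not an LLVM API:

```python
def llvm_cmpxchg(mem: bytearray, addr: int, expected: int, new: int, nbytes: int):
    """Hypothetical sequential (non-atomic) model of LLVM IR `cmpxchg` on an
    nbytes-wide little-endian integer. Returns (loaded value, success flag)."""
    loaded = int.from_bytes(mem[addr:addr + nbytes], "little")
    success = loaded == expected  # the i1 'success flag'
    if success:
        # Store only on a match; `new` is an iN operand, so it fits in nbytes.
        mem[addr:addr + nbytes] = new.to_bytes(nbytes, "little")
    return loaded, success


mem = bytearray([0x2A, 0x00, 0x00, 0x00])
print(llvm_cmpxchg(mem, 0, 0x2A, 0x07, 1))  # (42, True): match, byte becomes 0x07
print(llvm_cmpxchg(mem, 0, 0x2A, 0x99, 1))  # (7, False): no match, memory unchanged
```

The flag is just `loaded == expected`, so it is always recomputable from the loaded value; the question in this thread is only which wasm instructions are selected to do that.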
  cmpxchg_i8_i64_loaded_value:
    .param i32, i64, i64
    .result i64
    i64.atomic.rmw8_u.cmpxchg $push0=, 0($0), $1, $2
    return $pop0
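A single instruction suffices here if, as this lowering assumes, the narrow wasm cmpxchg compares only the low 8 bits of the expected operand, stores only the low 8 bits of the replacement, and zero-extends the loaded byte (see the threads proposal for the authoritative definition). Under that assumption the trunc/cmpxchg/zext sequence and `i64.atomic.rmw8_u.cmpxchg` behave identically. A small Python sketch that checks this; all function names are hypothetical:

```python
MASK8 = 0xFF


def ir_cmpxchg_i8_i64_loaded_value(mem: bytearray, p: int, exp: int, new: int) -> int:
    """Mirror of @cmpxchg_i8_i64_loaded_value: trunc, i8 cmpxchg, zext."""
    exp_t, new_t = exp & MASK8, new & MASK8  # trunc i64 -> i8
    old = mem[p]                             # the i8 load inside the cmpxchg
    if old == exp_t:
        mem[p] = new_t
    return old                               # zext i8 -> i64: a byte is already 0..255


def wasm_i64_rmw8_u_cmpxchg(mem: bytearray, p: int, exp: int, new: int) -> int:
    """Assumed semantics of i64.atomic.rmw8_u.cmpxchg (hypothetical model):
    compare the byte with the low 8 bits of `exp`, store the low 8 bits of
    `new` on a match, and return the old byte zero-extended."""
    old = mem[p]
    if old == (exp & MASK8):
        mem[p] = new & MASK8
    return old


def same_behaviour(init: int, exp: int, new: int) -> bool:
    """Run both on identical one-byte memories; compare results and memory."""
    m_ir, m_wasm = bytearray([init]), bytearray([init])
    r_ir = ir_cmpxchg_i8_i64_loaded_value(m_ir, 0, exp, new)
    r_wasm = wasm_i64_rmw8_u_cmpxchg(m_wasm, 0, exp, new)
    return r_ir == r_wasm and m_ir == m_wasm


# High bits of exp/new are ignored by both, so every case agrees.
for case in [(0x2A, 0x2A, 0x07), (0x2A, 0x12A, 0xFF07), (0x2A, 0x2B, 0x07)]:
    print(case, same_behaviour(*case))
```

The trunc of both operands is absorbed by the narrow access, and the zext is absorbed by the `_u` result, which is why no extra wrap or mask instructions are needed for this function.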
But for the second function (which is a little contrived, because the success flag is usually not returned from a function but used in a loop condition), this fails to make use of the i64.atomic.rmw8_u.cmpxchg instruction. The output is something like:
  cmpxchg_i8_i64_success:
    .param i32, i64, i64
    .result i32
    i32.wrap/i64 $push6=, $1
    tee_local $push5=, $3=, $pop6
    i32.wrap/i64 $push0=, $2
    i32.atomic.rmw8_u.cmpxchg $push1=, 0($0), $pop5, $pop0
    i32.const $push2=, 255
    i32.and $push3=, $3, $pop2
    i32.eq $push4=, $pop1, $pop3
    return $pop4
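This is suboptimal: both i64 operands are first wrapped to i32 so that i32.atomic.rmw8_u.cmpxchg can be used, and the success flag is then computed separately by masking the expected value with 255 and comparing it with the loaded value. (This only happens when a truncation/extension is involved.)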
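Whatever patterns end up being used, the success flag has to be recovered from the zero-extended loaded value by masking the expected value to 8 bits and comparing, which is what the `i32.and`/`i32.eq` pair in the output above computes. The Python sketch below checks that equivalence under the same assumed narrow-cmpxchg semantics (low 8 bits compared and stored, loaded byte zero-extended); the function names are hypothetical and it says nothing about which instruction sequence the follow-up CL will emit:

```python
MASK8 = 0xFF


def ir_cmpxchg_i8_i64_success(mem: bytearray, p: int, exp: int, new: int) -> bool:
    """Mirror of @cmpxchg_i8_i64_success: the i1 flag of an i8 cmpxchg
    on truncated operands."""
    old = mem[p]
    ok = old == (exp & MASK8)
    if ok:
        mem[p] = new & MASK8
    return ok


def wasm_rmw8_u_cmpxchg(mem: bytearray, p: int, exp: int, new: int) -> int:
    """Assumed narrow-cmpxchg semantics (hypothetical model): compares and
    stores only the low 8 bits, returns the old byte zero-extended."""
    old = mem[p]
    if old == (exp & MASK8):
        mem[p] = new & MASK8
    return old


def success_from_loaded(loaded: int, exp: int) -> bool:
    """Recover the flag from the zero-extended loaded value: mask the expected
    value to 8 bits (the `and 255` step), then compare (the `eq` step)."""
    return loaded == (exp & MASK8)


mem_ir, mem_wasm = bytearray([0x2A]), bytearray([0x2A])
flag = ir_cmpxchg_i8_i64_success(mem_ir, 0, 0x12A, 0x07)
loaded = wasm_rmw8_u_cmpxchg(mem_wasm, 0, 0x12A, 0x07)
print(flag, success_from_loaded(loaded, 0x12A))  # True True
print(loaded == 0x12A)  # False: comparing without the mask gives the wrong flag
```

The last line shows why the mask cannot simply be dropped: when the i64 expected value has bits above the low byte set, an unmasked comparison reports failure even though the exchange succeeded.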
I think we need another set of patterns to optimize this. And for the case where both the loaded value and the success flag are used, which I guess is the most common one, we need yet another set of patterns. I'll add those in a separate CL.