This patch addresses some bugs in the commutative operand sorting logic, adds some tests and performs some cleanup.
Bug 1:
The first bug occurs when comparing key sizes when at least one ancestorQueue is empty. If one ancestorQueue is not empty, its traversal isn't complete and the size comparison may be incorrect.
In the while loop of the comparison method, before comparing the entries at the current key index, if one of the ancestor queues is empty, then the sizes of the ancestor keys are compared.
Consider operandA and operandB with the following keys and ancestorQueues:
operand A key = { {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} } operand A ancestorQueue = empty operand B key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"} } operand B ancestorQueue = {block_argument, block_argument}
Operand A has a fully generated key and empty ancestor queue while Operand B is not fully generated. When they are compared, operand A should be smaller because at keyIndex 0, "arith.addi" is smaller than "arith.divsi". However, Operand A's ancestor queue is empty, so the following if statement is executed:
if (commOperandA->ancestorQueue.empty() || commOperandB->ancestorQueue.empty()) return commOperandA->key.size() < commOperandB->key.size();
The comparison incorrectly returns false since operand A's key size is 3 and operand B's key size is 1.
To fix this, the entries at the current index must be checked and compared first if possible. The ancestor key sizes must only be compared if the current index exceeds one of the sizes and both ancestor keys are fully generated.
Bug 2:
The second bug lies in the visitedAncestors set skipping previously seen Operation pointers. The visitedAncestors DenseSet that keeps track of ancestors that were already visited attempts to keep track of and prevent checks on duplicate operands. The same operation may be used as multiple different operands and should not be treated as a duplicate.
Consider the following example:
%0 = arith.addi <block argument> <block argument> %1 = arith.subi <block argument> <block argument> %2 = arith.divsi <block argument> <block argument> %3 = arith.muli %0, %0 %4 = arith.muli %1, %0 %5 = arith.muli %0, %2 operand A = arith.divsi %3, %4 operand B = arith.divsi %5, %4
The expected ancestor keys of operands A and B are:
operand A expected key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, // duplicate appearance of %0 {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, // duplicate appearance of %0 {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} } operand B expected key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, // duplicate appearance of %0 {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} }
The actual ancestor keys of operands A and B are:
operand A actual key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} } operand B actual key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.muli"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} }
When comparing the expected keys, operand A is smaller than operand B because at keyIndex 4, "arith.addi" is smaller than "arith.divsi". However, when comparing the actual keys, operand B ends up being smaller than operand A because at keyIndex 4, “arith.divsi” is smaller than “arith.subi”.
To fix this, visitedAncestors and its corresponding checks are removed completely.
Bug 3:
The third bug has to do with block arguments not being added to the ancestor queue and key.
The ancestorQueue is a queue of Operation pointers. Currently, ancestors are only added to the queue if they can be successfully cast to Operation pointers, which means Block Arguments are skipped. The key is formed by taking from the front of the ancestorQueue each time, so it also does not contain Block Arguments. This is fine for cases where Block Arguments are on the lowest level of the keys (at the end of the key). However, if Block Arguments appear at other levels, it can cause incorrect comparisons.
Consider the operandA and operandB in the following example:
%0 = arith.addi <block argument> <block argument> %1 = arith.subi <block argument> <block argument> operandA = arith.divsi <block argument>, %1 operandB = arith.divsi %0, %1 commutativeOp = commutative %operandA, %operandB
The expected ancestor keys of operandA and operandB are:
operandA expected key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} } operandB expected key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""}, {type: BLOCK_ARGUMENT, opName: ""} }
The actual ancestor keys of operandA and operandB are:
operandA actual key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"} } operandB actual key = { {type: NON_CONSTANT_OP, opName: "arith.divsi"}, {type: NON_CONSTANT_OP, opName: "arith.addi"}, {type: NON_CONSTANT_OP, opName: "arith.subi"} }
When comparing the expected keys, operandA is smaller than operandB because at keyIndex 1, the block argument is smaller than "arith.addi". However, when comparing the actual keys, operand B ends up being smaller than operand A because at keyIndex 1, “arith.addi” is smaller than “arith.subi”.
To fix this, Block Arguments are added to the ancestorQueue in the form of nullptr. The nullcheck for the Operation pointer cast is removed. The AncestorKey constructor which takes an Operation pointer already treats nullptr as Block Arguments, so nothing needs to be changed there.
Bug 4:
The fourth bug is caused by an assertion failure on the comparison function that is passed into stable_sort. stable_sort expects a strictly less than comparison function, which means it should return false for equivalent values.
Note that equivalent operands have the same key size and key values at each index. When the algorithm reaches a keyIndex that exceeds the operands' key sizes, it checks commOperandA first and separately from commOperandB. If the current keyIndex exceeds commOperandA's key size and commOperandA's ancestorQueue is empty, then the comparison function returns true, implying commOperandA is smaller regardless of commOperandB's key.
To fix this, before returning true if commOperandA's ancestorQueue is empty, commOperandB's key size and ancestorQueue is checked for equivalence.
Improvement:
The OpTraitRewritePattern which is a wrapper for the RewritePattern class can be used to perform matchAndRewrite on Ops with a specified trait.
Cleanup and Tests:
Example 2 in the comments is corrected. The key associated with operand %2 is missing a block argument entry at the end. Additionally, the final sorted operand ordering should be %4, %2, %3, %1. Both examples in the comments are added as tests.
Adding this test is unnecessary because it was working before any bug fixes.