The presence of a ReadAdvance for the input operand at index 0 is problematic (as shown by the diff in the llvm-mca test).
A broadcast cannot start executing until the base address for the load has been computed.
In the llvm-mca example, the VBROADCASTSS has to wait on the write from LEAQ. If we apply a ReadAdvance to the register read associated with the base address, then we wrongly assume that the load can start 3 cycles earlier than it really can, i.e. before the base address written by LEAQ is available.
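
To make the timing argument concrete, here is a minimal sketch of how a ReadAdvance shortens the effective producer latency seen by a consumer. It is a simplified model, not the llvm-mca implementation: the function name `earliest_issue_cycle` and the LEAQ latency of 4 cycles are assumptions chosen for illustration; only the ReadAdvance of 3 cycles comes from the description above.

```python
def earliest_issue_cycle(producer_issue, producer_latency, read_advance=0):
    """Earliest cycle at which a consumer may issue in this simplified model.

    A ReadAdvance of N cycles tells the scheduler that the operand is only
    needed N cycles after the consumer issues, so the producer's effective
    latency shrinks by N (clamped at zero).
    """
    effective_latency = max(producer_latency - read_advance, 0)
    return producer_issue + effective_latency


# Hypothetical numbers: LEAQ issues at cycle 0 with an assumed latency of 4.
LEA_ISSUE = 0
LEA_LATENCY = 4

# Base-address register read without a ReadAdvance: the load waits for LEAQ.
correct = earliest_issue_cycle(LEA_ISSUE, LEA_LATENCY)

# Same read with ReadAdvance = 3: the model lets the load start too early.
wrong = earliest_issue_cycle(LEA_ISSUE, LEA_LATENCY, read_advance=3)

print(f"without ReadAdvance: load issues at cycle {correct}")
print(f"with ReadAdvance=3:  load issues at cycle {wrong}")
```

Under these assumed numbers the load would be modeled as issuing at cycle 1 instead of cycle 4, even though the address it needs is not ready until cycle 4. A ReadAdvance only makes sense for operands that are consumed after the load completes, not for the address operands that the load itself depends on.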