The problem is the following. With fast8, we broke an important invariant when loading shadows.
A wide shadow of 64 bits used to correspond to 4 application bytes with fast16; so, generating a single load was okay since those 4 application bytes would share a single origin.
Now, using fast8, a wide shadow of 64 bits corresponds to 8 application bytes that should be backed by 2 origins (but we kept generating just one).
Let’s say our wide shadow is 64-bit and consists of the following: 0xABCDEFGH. To check if we need the second origin value, we could do the following (on the 64-bit wide shadow) case:
- bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
- push the result along with the first origin load to the shadow/origin vectors
- load the second 32-bit origin of the 64-bit wide shadow
- push the wide shadow along with the second origin to the shadow/origin vectors.
The combineOrigins would then select the second origin if the wide shadow is of the form 0xABCDE0000.
The tests illustrate how this change affects the generated bitcode.
-> loadNextOriginAndIncAddr or a similar name? So a reader can follow the main code w/o looking into the method.