- Unit-stride loads and stores can operate at the full bandwidth of the
memory pipe. The memory pipe is DLEN bits wide.
- Strided loads and stores operate at one element per cycle and should
be scheduled accordingly.
- Indexed loads and stores operate at one element per cycle, and they
stall the machine until all addresses have been generated, so they
cannot be scheduled.
- Unit-stride seg2 loads take one cycle per DLEN-bit part of the data
accessed.
- seg3 through seg8 loads take one cycle per segment, unless a segment is
larger than DLEN, in which case each segment takes multiple cycles.
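The rules above can be turned into a rough per-instruction cycle estimate. The following is a minimal sketch, not taken from any real scheduling model. The function names and parameters (`vl`, `sew`, `nf`, `dlen`) are hypothetical. It reads "number of DLEN parts" for seg2 as ceil(total bits / DLEN). It assumes a segment wider than DLEN takes ceil(segment bits / DLEN) cycles. It does not model the indexed-access stall.

```python
import math

# Hypothetical cycle estimates for the memory-pipe rules above.
# vl = number of elements, sew = element width in bits,
# nf = fields per segment, dlen = memory pipe width in bits.


def unit_stride_cycles(vl: int, sew: int, dlen: int) -> int:
    # Full memory-pipe bandwidth: DLEN bits per cycle.
    return math.ceil(vl * sew / dlen)


def strided_cycles(vl: int) -> int:
    # One element per cycle.
    return vl


def indexed_cycles(vl: int) -> int:
    # One element per cycle; the stall until all addresses are
    # generated is NOT modeled here.
    return vl


def segment_cycles(nf: int, vl: int, sew: int, dlen: int) -> int:
    if not 2 <= nf <= 8:
        raise ValueError("segment loads have 2 to 8 fields")
    if nf == 2:
        # Assumption: seg2 costs the number of DLEN-bit parts of all data accessed.
        return math.ceil(nf * vl * sew / dlen)
    # seg3-8: one segment per cycle; assumption: a segment wider than
    # DLEN takes ceil(segment_bits / DLEN) cycles.
    segment_bits = nf * sew
    return vl * math.ceil(segment_bits / dlen)
```

For example, with `dlen=128`, a unit-stride load of 16 32-bit elements takes 4 cycles and a strided one takes 16. A seg8 load of 8 segments of 64-bit fields (512-bit segments) takes 8 × 4 = 32 cycles under the stated assumption.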
I don't think we usually indent after let if the let doesn't use braces.