The M33 can still do some magic: this sequence doesn't stall the pipeline. It does show some ACCESS_CONTESTED events in SRAM0, but maybe not enough to starve the prefetch FIFO (a guess: 2 words, i.e. 8 bytes), and the loop executes in 18 cycles:
65535 bytes / 18442 CPU cycles (122947 ns) = 533.036011 MB/s
1.200651 cycles/word, 18.009766 cycles/loop
Code:
200002b0: c111  stmia r1!, {r0, r4}
200002b2: c111  stmia r1!, {r0, r4}
200002b4: c111  stmia r1!, {r0, r4}
200002b6: c111  stmia r1!, {r0, r4}
200002b8: 3a01  subs  r2, #1
200002ba: c111  stmia r1!, {r0, r4}
200002bc: c111  stmia r1!, {r0, r4}
200002be: c111  stmia r1!, {r0, r4}
200002c0: c111  stmia r1!, {r0, r4}
200002c2: d1f5  bne.n 200002b0 <main+0x150>
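For anyone who wants to check the arithmetic, here is a rough sketch in C that recomputes the headline figures from the numbers quoted above. The byte, cycle and nanosecond counts are taken from the post. The function names are hypothetical, and the iteration count of 1024 is an assumption: it is inferred from the loop storing 64 bytes per pass (8 stmia, 2 registers each) into what looks like a 64 KiB buffer, not something the post states. The clock frequency is likewise only implied by the cycle and time figures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Figures quoted in the post. */
#define BYTES_WRITTEN 65535u
#define CPU_CYCLES    18442u
#define ELAPSED_NS    122947u

/* Read off the disassembly: 8 stmia per pass, 2 registers (words) each. */
#define STMIA_PER_LOOP  8u
#define WORDS_PER_STMIA 2u
#define BYTES_PER_WORD  4u

/* ASSUMPTION (not stated in the post): 64 KiB buffer / 64 bytes per pass. */
#define ASSUMED_ITERATIONS 1024u

/* Bytes stored by one pass through the loop body. */
uint32_t bytes_per_loop(void)
{
    return STMIA_PER_LOOP * WORDS_PER_STMIA * BYTES_PER_WORD;
}

/* bytes/ns is GB/s, so scale by 1000 to get MB/s (10^6 bytes per second). */
double throughput_mb_per_s(uint32_t bytes, uint32_t ns)
{
    assert(ns > 0);
    return (double)bytes / (double)ns * 1000.0;
}

/* cycles/ns is GHz, so scale by 1000 to get MHz. */
double implied_clock_mhz(uint32_t cycles, uint32_t ns)
{
    assert(ns > 0);
    return (double)cycles / (double)ns * 1000.0;
}

/* Average cycles spent per pass through the loop. */
double cycles_per_loop(uint32_t cycles, uint32_t iterations)
{
    assert(iterations > 0);
    return (double)cycles / (double)iterations;
}

/* Print the recomputed figures next to each other. */
void print_report(void)
{
    printf("bytes/loop   : %u\n", (unsigned)bytes_per_loop());
    printf("throughput   : %f MB/s\n",
           throughput_mb_per_s(BYTES_WRITTEN, ELAPSED_NS));
    printf("implied clock: %f MHz\n",
           implied_clock_mhz(CPU_CYCLES, ELAPSED_NS));
    printf("cycles/loop  : %f\n",
           cycles_per_loop(CPU_CYCLES, ASSUMED_ITERATIONS));
}
```

Calling `print_report()` from a `main()` on any host machine is enough; nothing here touches the hardware. With the assumed 1024 iterations, the cycles/loop value comes out at the 18.009766 quoted above, which is what suggests the assumption is reasonable.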
Statistics: Posted by gmx — Wed Mar 12, 2025 3:23 am