04 December 2023, 16:36   #69
paraj
Only expected interrupts. However, I've figured out why it's slow on my machine: ATC misses! If I disable page address translation and rely on just the TTRs, I get 34.72 / 28.57 fps respectively.

I timed a small test program that reads a byte from a (pseudo-)random offset into an array and varied the size:
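
A test of this kind can be sketched roughly as follows. This is a reconstruction under assumptions, not the program that was actually timed: the function names, the xorshift generator, and the use of the standard clock() are all hypothetical choices, and clock() may be too coarse on real hardware unless the iteration count is large.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical PRNG choice: xorshift32, cheap enough not to dominate the loop. */
uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Read `iterations` bytes from pseudo-random offsets in `buf`.
 * `size` must be a power of two so the offset can be masked.
 * The sum is returned so the compiler cannot drop the loads. */
uint32_t random_reads(const volatile uint8_t *buf, size_t size,
                      uint32_t iterations)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    uint32_t state = 0x12345678u; /* arbitrary non-zero seed */
    uint32_t sum = 0;
    size_t mask = size - 1;
    for (uint32_t i = 0; i < iterations; i++)
        sum += buf[next_random(&state) & mask];
    return sum;
}

/* Average time per loop iteration in nanoseconds for an array of
 * `size` bytes, or -1.0 if the array could not be allocated. */
double ns_per_read(size_t size, uint32_t iterations)
{
    uint8_t *buf = malloc(size);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < size; i++) /* touch every page up front */
        buf[i] = (uint8_t)i;

    clock_t start = clock();
    volatile uint32_t sink = random_reads(buf, size, iterations);
    clock_t end = clock();
    (void)sink;

    free(buf);
    return (double)(end - start) * 1e9 / CLOCKS_PER_SEC / iterations;
}
```

Calling ns_per_read() with sizes doubling from a few KB up to 32MB, once with page translation enabled and once without, would give the two curves being compared.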


Up to 256KB there is no difference. Beyond that, the no-MMU case stays flat at around 375ns per loop iteration, while with the MMU enabled it grows to 867ns for a 32MB array.

EDIT: 256KB of course lines up perfectly with a 64-entry ATC and a page size of 4K (64 x 4KB = 256KB). I also forgot something actionable: being more cache friendly is likely a big rework, but I think grouping (height,color) together rather than having separate arrays would likely be an easy win. Obviously you can't just switch off the MMU w/o consequences, so don't change something like that in your own code.
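
To illustrate the grouping idea, here is a minimal sketch assuming the height and color maps are currently two separate byte arrays indexed by the same position. The Sample struct, interleave() and atc_reach() are hypothetical names, and the renderer's real data layout is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ATC_ENTRIES = 64,   /* ATC size mentioned in the post */
    PAGE_BYTES  = 4096  /* 4K pages, as in the post */
};

/* Memory covered by the ATC before lookups start to miss:
 * 64 entries * 4096 bytes = 262144 bytes = 256KB. */
size_t atc_reach(void)
{
    return (size_t)ATC_ENTRIES * PAGE_BYTES;
}

/* Hypothetical interleaved element: both values for one map position
 * sit next to each other, so a lookup touches one location instead of
 * one location in each of two far-apart arrays. */
typedef struct {
    uint8_t height;
    uint8_t color;
} Sample;

/* One-time conversion from the two separate arrays to the grouped one. */
void interleave(const uint8_t *height, const uint8_t *color,
                Sample *out, size_t count)
{
    assert(height != NULL && color != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i].height = height[i];
        out[i].color  = color[i];
    }
}
```

With separate arrays, fetching height[i] and color[i] needs two translations (and two cache lines) per sample; with the grouped array, as long as it is at least 2-byte aligned, both bytes share one page and one cache line. The grouped array is also twice as large per element, so the number of pages covered overall does not shrink; the gain is only in how many are touched per lookup.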
Attached Thumbnails
Name: atc-miss.png

Last edited by paraj; 04 December 2023 at 19:40.