20 March 2024, 04:16 | #1 |
Registered User
Join Date: Jan 2024
Location: Zagreb / Croatia
Posts: 11
|
Cycle-exact 68000 beam racing for stock A500
Here's what you probably didn't care to know, but it had to be done (TM).
Can you count the exact number of cycles available to the CPU during any stock Amiga 500 PAL lowres mode? Why let the C64 coders have all the fun? Have you ever wondered just how much CPU budget you have in a frame? Have you ever scratched your head about how such calculations are done? Is it 4am and you need something to finally make you feel at ease and put you to sleep? Can you write code that does not re-synchronize to the beam position, but still visits every line at exactly the same time, regardless of screen size and number of bitplanes?
Heavily inspired by hooverphonique's recent thread and ross's excellent minimalist example, I wrote some code and tested it on my Amiga 600. I am attaching an exe so you can test it if, for some reason, it is hard for you to assemble.
Is your kid crying and your wife threatening divorce because you are still coding instead of spending time with them, so you just need a quick cheat sheet? Check out this post.
Code:
;; Cycle-exact 68000 CPU screen routine / clock calculator by !ZAJC!/GDS
;; Credits to ross for the most excellent brute force nop post on EAB
;; SPDX-License-Identifier: 0BSD
;; Probably a good emulator test? Tested on real hardware
;; See you @ Revision 2024 :)

ROSS_SCREEN=0           ; <- ultra hyper mega overscan by ross
WANT_MOUSE=1            ; <- can click lmb to exit, minor jitter if set
MARK_YPOS=$2A           ; <- where to show the mark
NBPLS=6                 ; <- screen colordepth in bitplanes

        IF ROSS_SCREEN
; resolution: 376*286 + 24 px hscroll
DIWSTRT_Y       equ     $1a
DIWSTRT_X       equ     $5c
DIWSTOP_Y       equ     $38
DIWSTOP_X       equ     $c8
DDFSTRT         equ     $18
DDFSTOP         equ     $d8
        ELSE
; resolution: 320*256 + no extra scroll
DIWSTRT_Y       equ     $2c
DIWSTRT_X       equ     $81
DIWSTOP_Y       equ     $2c
DIWSTOP_X       equ     $c1
DDFSTRT         equ     $38
DDFSTOP         equ     $d0
        ENDC

DIWSTRT         equ     (DIWSTRT_Y<<8)!(DIWSTRT_X)
DIWSTOP         equ     (DIWSTOP_Y<<8)!(DIWSTOP_X)
Y_END           equ     ((~DIWSTOP_Y&$80)<<1)!DIWSTOP_Y

; how many lines of display?
BPL_LINES       equ     Y_END-DIWSTRT_Y
; how many words per line will be fetched for a single bitplane:
BPL_DMA_WORDS   equ     1+((DDFSTOP-DDFSTRT)>>3)
; how many color registers do we need to set for this screen?
NUM_COLOR_REGS  equ     (1<<NBPLS)-((5-NBPLS)>>3)&32
; for each 16 pixels, how many dma fetches will fall on an odd cycle?
ODD_CCKSperWORD equ     ((NBPLS&4)*(NBPLS-4))>>2
; for each line of display, how many odd cycle dma fetches do we have?
ODD_CCKSperLINE equ     ODD_CCKSperWORD*BPL_DMA_WORDS
; how many nops will we fail to do during a bitplane dma fetch per line?
ODD_NOPSperLINE equ     ODD_CCKSperLINE
; number of long frame PAL lines (standard PAL mode setup by Kickstart)
PAL_LINES_LOF   equ     313
; number of lines that are not used for bitplane display
NOT_BPL_LINES   equ     PAL_LINES_LOF-BPL_LINES
; color clocks / dma slots per PAL line
CCKS_PER_LINE   equ     227
; CPU clock ticks per PAL line
CLKS_PER_LINE   equ     CCKS_PER_LINE*2
; Number of NOP instructions fitting into a non-display line
NOPS_PER_NBLINE equ     CLKS_PER_LINE/4
; Number of NOP instructions fitting into a display line
NOPS_PER_BLINE  equ     NOPS_PER_NBLINE-ODD_NOPSperLINE
; Calculate the number of NOPs and available CPU clocks/frame
NBLINE_NOPS     equ     NOPS_PER_NBLINE*NOT_BPL_LINES
BLINE_NOPS      equ     NOPS_PER_BLINE*BPL_LINES
NOPS_PER_FRAME  equ     NBLINE_NOPS+BLINE_NOPS
CLKS_PER_FRAME  equ     NOPS_PER_FRAME*4
; count non-nop code that we have to do in each frame
SETCOLORS_CLKS  equ     2*8
JMP_IMM_CLKS    equ     12
MOUSETST_CLKS   equ     WANT_MOUSE*28
INSN_CLKS       equ     SETCOLORS_CLKS+JMP_IMM_CLKS+MOUSETST_CLKS
INSN_NOPS       equ     INSN_CLKS/4
; number of nops we will generate for our per-frame code
nops            equ     NOPS_PER_FRAME-INSN_NOPS

        SECTION main,code_c

        moveq   #0,d2
        lea     $dff000,a6
        move.w  #$4000,$9a(a6)          ; all interrupts off
        move.w  #$03ff,$96(a6)          ; all dma off
        move.w  #$0200,$100(a6)         ; no bitplanes
        move.w  d2,$102(a6)             ; BPLCON1 <- 0
        move.w  #DIWSTRT,$8e(a6)
        move.w  #DIWSTOP,$90(a6)
        move.w  #DDFSTRT,$92(a6)
        move.w  #DDFSTOP,$94(a6)
        lea     $180(a6),a0             ; all colors to black
        moveq   #NUM_COLOR_REGS-1,d0
.all_black
        move.w  d2,(a0)+
        dbf     d0,.all_black
        move.l  d2,$140(a6)             ; mouse sprite to 0,0
        move.w  #$8300,$96(a6)          ; turn on bitplane DMA
        move.w  #(NBPLS<<12)!$200,$100(a6) ; NBPLS bitplanes
        move.w  #$4e71,d0               ; d0 <- NOP instruction
        move.w  #nops-1,d1              ; fill .code with NOPs
        lea     .code(pc),a0
.cn     move.w  d0,(a0)+
        dbf     d1,.cn
        lea     6(a6),a0                ; A0 <- VHPOSR
        lea     $180(a6),a1             ; A1 <- COLOR00
        lea     5(a6),a2                ; A2 <- lobyte of VPOSR
        lea     $bfe001,a3              ; A3 <- CIAA pra for mouse
.in_ntsc_lines
        btst    #0,(a2)                 ; wait until we pass
        beq.s   .in_ntsc_lines          ; all NTSC lines
.in_pal_lines
        btst    #0,(a2)                 ; and then PAL lines, so
        bne.s   .in_pal_lines           ; that we are at TOF here
.line   cmp.b   #MARK_YPOS,(a0)         ; then wait for MARK_YPOS
        bne.b   .line
        REPT    20                      ; get closer to middle
        add.l   d0,d0
        ENDR
.setcolors
        move.w  d1,(a1)
        move.w  d2,(a1)
.code   ds.w    nops
        IF WANT_MOUSE
        btst    #6,(a3)
        beq.s   .exit
        ENDC
        jmp     .setcolors
.exit
        move.w  #$0080,$096(a6)         ; copper off
        move.l  $04,a5                  ; a5 <- execbase
        move.l  156(a5),a5              ; blittervec
        move.l  38(a5),$080(a6)         ; restore coplist
        move.w  #$83ff,$096(a6)         ; all dma on
        move.w  #$c000,$09a(a6)         ; interrupts on
        rts
|
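For readers who want to try other settings without assembling, the equates in the listing can be restated on the host side. The following is only a sketch in Python, not part of the original program: the function name `frame_budget` and its parameters are made up for illustration, and the formulas are copied from the listing as-is rather than derived independently.

```python
# Host-side restatement of the listing's equates (a sketch, not the
# author's tool). Constant names mirror the assembler symbols.
PAL_LINES_LOF = 313      # long-frame PAL lines
CCKS_PER_LINE = 227      # colour clocks / DMA slots per PAL line
NOP_CLKS = 4             # CPU clocks per NOP


def frame_budget(nbpls, want_mouse, diwstrt_y=0x2C, diwstop_y=0x2C,
                 ddfstrt=0x38, ddfstop=0xD0):
    """Return (nops, clks_per_frame) exactly as the listing computes them.

    frame_budget and its argument names are hypothetical; the defaults
    are the listing's 320*256 screen.
    """
    y_end = ((~diwstop_y & 0x80) << 1) | diwstop_y
    bpl_lines = y_end - diwstrt_y
    bpl_dma_words = 1 + ((ddfstop - ddfstrt) >> 3)
    # bitplane fetches that land on odd cycles, per 16 pixels and per line
    odd_ccks_per_word = ((nbpls & 4) * (nbpls - 4)) >> 2
    odd_nops_per_line = odd_ccks_per_word * bpl_dma_words
    nops_per_nbline = (CCKS_PER_LINE * 2) // NOP_CLKS
    nops_per_bline = nops_per_nbline - odd_nops_per_line
    nops_per_frame = (nops_per_nbline * (PAL_LINES_LOF - bpl_lines)
                      + nops_per_bline * bpl_lines)
    # non-NOP code per frame: two colour writes, the jmp, optional mouse test
    insn_clks = 2 * 8 + 12 + (28 if want_mouse else 0)
    return nops_per_frame - insn_clks // NOP_CLKS, nops_per_frame * NOP_CLKS


if __name__ == "__main__":
    for nbpls in range(7):
        nops, clks = frame_budget(nbpls, want_mouse=True)
        print(f"{nbpls} bitplanes: {nops} NOPs generated, {clks} clocks/frame")
```

Calling `frame_budget(0, want_mouse=False)` reproduces the 0-bitplane, no-mouse configuration; the ROSS_SCREEN values can be passed through the four optional arguments.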
20 March 2024, 09:14 | #2 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,488
|
|
22 March 2024, 22:12 | #3 |
German Translator
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 193
|
The test program is also amazing stuff, especially the NOP calculator.
I ran a short test in WinUAE in cycle-exact mode. Really stable output! I checked the nops value for 0 bitplanes.
>?$8a21 ; move.w #nops-1,d1 ; fill .code with NOPs
$00008A21 = %00000000`00000000`10001010`00100001 = 35361 = 35361
>?$8a21<<2
$00022884 = %00000000`00000010`00101000`10000100 = 141444 = 141444
>?!141476-!141444
$00000020 = %00000000`00000000`00000000`00100000 = 32 = 32
+28(5/2) for the "rest":
move.w d1,(a1)
move.w d2,(a1)
jmp $0003aef0
Is this the way to get the value 141,476? Or how is it calculated?
Some WinUAE debugger hints from my side: I tested without the mouse check. To get back to the shell/AsmOne I take the following steps:
Shift+F12 ; opens debugger
>d pc ; see the nop-code
>fi 4ef9 ; debugger runs to the jmp
>g <address> ; run further on the .exit move.w #$0080,$096(a6)
Edit: (I found my mistake) I have to calculate with:
>?$8a22 ; this is the nops value!
$00008A22 = %00000000`00000000`10001010`00100010 = 35362 = 35362
>?$8a22<<2
$00022888 = %00000000`00000010`00101000`10001000 = 141448 = 141448
>?!141448+28
$000228A4 = %00000000`00000010`00101000`10100100 = 141476 = 141476
>
Last edited by Rock'n Roll; 22 March 2024 at 22:18. Reason: correct nops value! |
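The debugger arithmetic in this post can be replayed on the host. This is a small Python sketch (variable names are illustrative only): the immediate seen in the debugger is `nops-1` because `dbf` runs one more iteration than its counter, and the 28 clocks are the two colour writes plus the `jmp`, as counted by SETCOLORS_CLKS and JMP_IMM_CLKS in the listing.

```python
NOP_CLKS = 4                      # CPU clocks per NOP

dbf_counter = 0x8A21              # immediate of "move.w #nops-1,d1"
nops = dbf_counter + 1            # dbf loops counter+1 times -> $8a22
rest_clks = 8 + 8 + 12            # move.w d1,(a1); move.w d2,(a1); jmp abs.l
loop_clks = nops * NOP_CLKS + rest_clks

# Cross-check against the listing's formula for 0 bitplanes:
# 313 PAL lines, (227*2)/4 whole NOPs per line, 4 clocks per NOP.
formula_clks = 313 * ((227 * 2) // 4) * NOP_CLKS

print(nops, loop_clks, loop_clks == formula_clks)
```

Both routes give the same figure, so the value follows from lines per frame times whole NOPs per line times 4, as the listing's CLKS_PER_FRAME defines it.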
24 March 2024, 21:45 | #4 |
German Translator
Join Date: Aug 2018
Location: Drübeck / Germany
Age: 49
Posts: 193
|
The cycle calculation '141,476' is now clear to me. (A Processing/Java program is attached.)
The WinUAE debugger shows the same time from frame to frame, but the time value is not user-friendly:
Average frame time: 1977919412.57 ms [frames: 2674 time: 212976899]
>g
Average frame time: 1977919412.57 ms [frames: 2675 time: 212976899]
How should the time value be interpreted?
This reminds me of my idea for a scanline/rasterline counter. I still think the suggestion is good: it would count the running time each time a breakpoint triggers.
https://eab.abime.net/showpost.php?p...44&postcount=4 |
06 April 2024, 15:13 | #5 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,652
|
Beam racing is the old way, and the Amiga showed the new way (a bit inspired by the Atari 8-bit). It was necessary on the C64 etc. in order to do splits, but splits are not a problem with the Copper.
Assessments or formulas are mostly unnecessary, since you can get a much better gut feel for the budget by changing the background color at spots in the code. (As in, maybe if you do something simple with "a buffer" a formula could save a few minutes, but as soon as you have some interrupts, blits, or sprites going, or optimize your display, or do anything dynamic or interesting really, the formula becomes too complex.)
Changing the background color also gives more info, such as whether a code line is executed, how often, and how close to priority events (DMA, IRQs, VBL, HBL etc.). It can give instant feedback on optimizations, which is very gratifying and makes it the fun part of any new project, really.
That said, there are probably some niche areas that could be explored, such as screen size choices for games running at < 50 FPS, although... then you would rather write a benchmark to be able to profile performance hogs. |