14 October 2010, 22:41 | #1 |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
Measuring speed with pixels
Hi folks,
I am trying to piece together some code that enables me to test how fast an instruction, or series of instructions, is. The problem is that with the code I have (partially) written, when I change the instruction/s under test the pixels of measurement flicker (i.e. they don't start in the same place each frame). Here's my code: Code:
;
; Commands Speed Test
;*********************
;
; By Andrew D. Burton, October 2010
;
; Based on code from the Danish Assembler Course, Init Screen (part 12,
; page 17). This demonstrates the speed (in pixels) of each command.
;
; This program sets up a lowres screen 320x256 pixels in size, with one
; bitplane.
;

COLOR00 equ     $DFF180
VPOSR   equ     $DFF004

        move.w  #$4000,$dff09a      ;INTENA, disable interrupts
        move.w  #$01a0,$dff096      ;DMACON, bitplane, copper & sprite
                                    ; DMA disabled

        lea.l   screen(pc),a1       ;screen address -> a1
        lea.l   bplcop+2(pc),a2     ;bplcop+2 address -> a2
        move.l  a1,d1
        move.w  d1,4(a2)            ;set address (BPL1PTH)
        swap    d1
        move.w  d1,(a2)             ;set address (BPL1PTL)

        lea.l   copper(pc),a1       ;copper address -> a1
        move.l  a1,$dff080          ;COP1LCH, our copper set
        move.w  #$8180,$dff096      ;DMACON,bitplane & copper DMA enabled

main:
        movem.l d0-d7/a0-a6,-(sp)   ;store registers

        lea.l   screen(pc),a0
        add     #400,a0             ;move to line 4 (80*5)
        move.l  #$FFFF,(a0)         ;mark part of the screen with color 1

waitline:                           ;wait for scanline at bottom of
                                    ;screen to get vertical sync :)
        move.l  VPOSR,d0
        asr.l   #8,d0               ;shift bits into place
        and.l   #$1FF,d0            ;mask
        cmp.w   #200,d0             ;wait for line xxx (was #200)
        blt     waitline            ;line reached?

        moveq   #0,d1
waitlineb:
        move.l  VPOSR,d0
        asr.l   #8,d0
        and.l   #$1FF,d0
        cmp.w   #100,d0             ;was #100
        bne     waitlineb           ;was blt
        dbra    d1,waitlineb

        move.w  #$FFF,COLOR00

*       move.b  #$FF,dummy          ;<<<---- command/s to test *********
        moveq   #10,d0
        mulu    #40,d0

        move.w  #$000,COLOR00

waitline2:
        move.l  VPOSR,d0
        asr.l   #8,d0
        and.l   #$1FF,d0
        cmp.w   #100,d0
        blt     waitline2

*       move.w  #$000,COLOR00

exit:
        btst    #6,$bfe001
        bne     waitline            ;LMB pressed? (was main)

        movem.l (sp)+,d0-d7/a0-a6   ;restore registers
        move.l  4.w,a6              ;ExecBase -> a6
        move.l  156(a6),a6
        move.l  38(a6),$dff080      ;COP1LCH,restore workbench copperlist
        move.w  #$8020,$dff096      ;DMACON, sprite DMA enabled
        rts                         ;exit to workbench

copper:
        dc.w    $2001,$fffe         ;wait for scanline $20
        dc.w    $0102,$0000         ;BPLCON1 cleared & setup in program
        dc.w    $0104,$0000         ;BPLCON2 cleared & setup in program
        dc.w    $0108,$0000         ;BPL1MOD cleared & setup in program
        dc.w    $010a,$0000         ;BPL2MOD cleared & setup in program
        dc.w    $008e,$2c81         ;DIWSTRT
        dc.w    $0090,$f4c1         ;DIWSTOP
        dc.w    $0090,$38c1         ;DIWSTOP (overscan)
        dc.w    $0092,$0038         ;DDFSTRT
        dc.w    $0094,$00d0         ;DDFSTOP
*       dc.w    $0180,$0000         ;COLOR00 set as $000
        dc.w    $0182,$0ff0         ;COLOR01 set as $FF0
        dc.w    $2c01,$fffe         ;wait for scanline $2C
bplcop:
        dc.w    $00e0,$0000         ;BPL1PTH cleared & setup in program
        dc.w    $00e2,$0000         ;BPL1PTL cleared & setup in program
        dc.w    $0100,$1200         ;BPLCON0

        dc.w    $6201,$FFFE         ;2 horizontal (Y) green lines
        dc.w    $0180,$00F0
        dc.w    $6301,$FFFE
        dc.w    $0180,$0000
        dc.w    $6601,$FFFE
        dc.w    $0180,$00F0
        dc.w    $6701,$FFFE
        dc.w    $0180,$0000

        dc.w    $ffdf,$fffe         ;wait for scanline $FF
        dc.w    $2c01,$fffe         ;wait for scanline $2C
        dc.w    $0100,$0200         ;BPLCON0
        dc.w    $ffff,$fffe         ;wait forever (until next VBL)

screen:
*       blk.w   5120,0
        dcb.w   5120,0              ;dcb = define constant block ($AAAA)
                                    ;5120,0 = length, pre-set value

dummy:                              ;dummy address used for testing
        dc.l    0

I did a little reading of the Hardware Reference manual and it mentions the registers VPOSW and VHPOSW. Should I be using these to get my screen to sync instead?
Regards, Lonewolf10 |
15 October 2010, 01:21 | #2 |
Ya' like it Retr0?
Join Date: Jul 2005
Location: United Kingdom
Age: 49
Posts: 9,768
|
The problem with using VPOSW and VHPOSW is that these are "waits" and thus will skew your results in terms of instruction execution speed.
The best bet would be to use a timer: start it by collecting the variable, run your test instruction iterations, then stop the clock, check the time, and divide this by the number of instructions executed by the system. Otherwise you will be at the mercy of the vertical and horizontal register conditions. |
15 October 2010, 10:48 | #3 |
gone
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
|
Morning Lonewolf10
The classic way to check code execution times is in raster lines:
set the background colour to black
wait for a scanline that's visible on screen (use VPOSR / VHPOSR)
change the background colour to red
run the routine you want to check execution speed of
change the background colour back to black
count the number of red lines == the number of raster lines the code took to execute
I won't post any code cos the above should be easy for you to do, plus it looks like that's what you've been aiming at anyway.
EDIT: the Amiga .exe in the .zip file attached is what doing what I said looks like in practice...
Last edited by pmc; 20 November 2010 at 10:41. |
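To put a number on "red lines", here is a rough Python sketch (not Amiga code) that converts a raster line count into approximate 68000 cycles and microseconds. It assumes a PAL machine with a stock 68000 at about 7.09 MHz, 227 colour clocks per line and two CPU cycles per colour clock; the constant and function names are made up for illustration, and DMA contention is ignored.

```python
# Assumed PAL figures for a stock 68000 Amiga; NTSC values differ slightly.
COLOR_CLOCKS_PER_LINE = 227        # colour clocks in one PAL raster line
CPU_CYCLES_PER_COLOR_CLOCK = 2     # the 7.09 MHz CPU runs at twice the colour clock
CPU_HZ = 7_093_790                 # PAL 68000 clock in Hz


def lines_to_cycles(lines):
    """Approximate CPU cycles spanned by a number of raster lines."""
    return lines * COLOR_CLOCKS_PER_LINE * CPU_CYCLES_PER_COLOR_CLOCK


def lines_to_microseconds(lines):
    """Approximate wall-clock time spanned by a number of raster lines."""
    return lines_to_cycles(lines) * 1_000_000 / CPU_HZ


if __name__ == "__main__":
    for red_lines in (1, 3, 10):
        print(red_lines, lines_to_cycles(red_lines),
              round(lines_to_microseconds(red_lines), 1))
```

Under these assumptions one raster line is a fairly coarse unit (several hundred CPU cycles), which is why counting lines suits whole routines better than single instructions.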
15 October 2010, 11:01 | #4 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,602
|
It is not possible to wait for an exact horizontal position, because the counter increases by one every second 68000 CPU clock, while every CPU memory access takes 4 CPU clock cycles and a comparison instruction takes at least 8 or so CPU cycles.
|
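The point above can be turned into a back-of-envelope figure. The Python sketch below estimates how far the start of a measurement can wander when the beam position is polled in a loop: the loop can notice the target position anywhere within one pass, so the uncertainty is roughly one loop period. It assumes the horizontal counter advances once per two CPU cycles (as stated above) and that one lowres pixel lasts about one CPU cycle on a 7 MHz 68000; the 40-cycle loop cost is purely a hypothetical figure, not a measurement of the code in post #1.

```python
def poll_jitter(loop_cycles):
    """Worst-case start uncertainty of a beam-polling loop.

    loop_cycles: CPU cycles for one pass of the read/compare/branch loop
                 (hypothetical input, look up the real cost in the 68000 manual).
    Returns (colour_clocks, lowres_pixels).
    """
    colour_clocks = loop_cycles / 2   # counter steps once every 2 CPU cycles
    lowres_pixels = loop_cycles       # assumed: 1 lowres pixel per CPU cycle
    return colour_clocks, lowres_pixels


if __name__ == "__main__":
    clocks, pixels = poll_jitter(40)  # 40 cycles is an illustrative guess
    print(f"up to {clocks} colour clocks, about {pixels} lowres pixels")
```

If that reasoning holds, a wandering start point of this size would be one plausible source of the flicker described in the first post.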
15 October 2010, 18:51 | #5 | |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
@ Zetr0
Thanks for the info.
@ pmc
Yeah, that's what I was aiming for, but I was hoping to get the exact timings for each instruction, since I can't seem to find any references to them online (other than moveq takes 6 cycles and clr.l takes 8). It will be a useful routine to have for measuring routines though, once I get the code right :P
Quote:
Yes, I seem to have found that out the hard way. I don't suppose you have a list of all the cycle times for each instruction? I do have various PDFs that reference MC680X0 cycle times, but they seem to be for the later CPUs (and FPUs), rather than the 68K.
Regards, Lonewolf10 |
|
15 October 2010, 19:13 | #6 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,602
|
68000 timings are fully documented in MC68000UM : M68000 8-16-32-Bit Microprocessors User's Manual (pdf) (http://cache.freescale.com/files/32b.../MC68000UM.pdf)
|
17 October 2010, 03:34 | #7 | |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
Quote:
Thanks for the link Regards, Lonewolf10 |
|
19 October 2010, 22:46 | #8 |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,753
|
I do rely on raster to time my stuff (read VHPOS before and after, 'log' the difference). It's only to see if something goes 'faster or slower' than before the code-change. For not-too-long code runs, you can use the CIA timers. For runs of minutes, the TOD clock can be read (or indeed a stopwatch).
For 68000 *optimization* (not actual time with misc DMA running) you can calculate a sum of clock cycles. But for real-world uses, your code will go faster the faster your Amiga's CPU is. Obvious and general stuff, hope some of it helps. |
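As an illustration of summing clock cycles by hand, here is a small Python sketch for the two instructions under test in post #1 (moveq #10,d0 and mulu #40,d0). The figures are my reading of the timing tables in the 68000 User's Manual (MOVEQ 4 cycles; MULU 38 + 2n cycles, where n is the number of set bits in the source word, plus 4 cycles for an immediate word operand), so check them against the manual before relying on them. DMA and bus contention are ignored, and the function name is hypothetical.

```python
MOVEQ_CYCLES = 4            # assumed from the 68000 timing tables
IMMEDIATE_WORD_EA = 4       # assumed effective-address time for #imm.w


def mulu_cycles(source_value, ea_cycles=IMMEDIATE_WORD_EA):
    """Cycles for MULU.W <ea>,Dn: 38 + 2n, n = set bits in the 16-bit source."""
    set_bits = bin(source_value & 0xFFFF).count("1")
    return 38 + 2 * set_bits + ea_cycles


# moveq #10,d0 followed by mulu #40,d0
total_cycles = MOVEQ_CYCLES + mulu_cycles(40)

if __name__ == "__main__":
    print("mulu #40,d0 :", mulu_cycles(40), "cycles")
    print("pair total  :", total_cycles, "cycles")
```

A sum like this only gives the no-DMA baseline; the raster or CIA measurements discussed in this thread show what actually happens on a running machine.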
20 October 2010, 21:50 | #9 | |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
Quote:
I think I will go with the CIA timers for now, as it's just a simple grab time at start, time at end and take one from the other. Thanks for the advice. Regards, Lonewolf10 |
|
21 October 2010, 22:23 | #10 |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
OK, so which CIA timers should I be using? The TODs (TODLO, TODMID & TODHI) in the CIA chips seem to run so fast that I get stupid results (e.g. 1325400064) when there's only 1 instruction (a move.l) between the "gettime" routines. I am thinking I should be using timers A (TALO & TAHI) and B (TBLO & TBHI) from CIAB, am I correct?
Regards, Lonewolf10 |
22 October 2010, 08:55 | #11 |
gone
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
|
You can use those CIA timers to count down from a supplied time value - I used those for accurate wait times in my trackloader - so, as you suggested previously you could just do something like:
supply a start value and set the timer running
execute your code
read the value left in the timer counter
The difference between the start value and the time left is then how long the code took to run. There'll be a maximum start value you can supply (can't remember it off the top of my head...), so I suppose that's what Photon was getting at when he said "not too long code runs".
What's the purpose of accurately timing your code like this, Lonewolf10...? Are you just experimenting...? I'm just curious. |
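The countdown arithmetic described above, as a Python sketch. It assumes a single 16-bit CIA timer (so the largest start value is $FFFF) clocked at one tenth of the PAL CPU clock, about 709379 Hz; the function and constant names are hypothetical, and the timer is assumed not to have underflowed during the run.

```python
CIA_TICK_HZ_PAL = 709_379     # assumed: PAL CPU clock / 10
TIMER_MAX = 0xFFFF            # largest start value of one 16-bit timer


def elapsed_ticks(start_value, value_left):
    """Ticks consumed by the code under test (no underflow assumed)."""
    if not 0 <= value_left <= start_value <= TIMER_MAX:
        raise ValueError("timer values out of range or timer underflowed")
    return start_value - value_left


def ticks_to_microseconds(ticks, tick_hz=CIA_TICK_HZ_PAL):
    """Convert timer ticks to microseconds."""
    return ticks * 1_000_000 / tick_hz


if __name__ == "__main__":
    ticks = elapsed_ticks(TIMER_MAX, 0xFF00)
    print(ticks, "ticks =", round(ticks_to_microseconds(ticks), 1), "us")
    print("longest single-timer run:",
          round(ticks_to_microseconds(TIMER_MAX) / 1000, 1), "ms")
```

Under these assumptions a single timer covers a little under a tenth of a second, which would fit the "not too long code runs" remark; chaining timer B onto timer A extends the range.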
22 October 2010, 09:14 | #12 |
WinUAE developer
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,602
|
TOD counters are relatively slow: they count horizontal and vertical syncs (some models count power supply ticks, not vsyncs).
Note that you need to read the TOD registers in the correct order or you will get strange results. |
22 October 2010, 18:20 | #13 | ||
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
Quote:
For several reasons, but the main one is so I can time my code accurately for optimization purposes. It's also so I can work out whether I have enough space to put in another routine, without affecting anything else. After all, a demo would be pretty boring (and unimpressive) with only one thing going on at a time.
Quote:
I have been reading Appendix F (CIA chips) of my Amiga HRM. It's only for my timing use; I would not use it in software I release to the public domain. Thanks anyway.
Regards, Lonewolf10 |
||
23 October 2010, 14:05 | #14 | |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Quote:
|
|
23 October 2010, 16:04 | #15 |
AMOS Extensions Developer
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
|
|
24 October 2010, 19:27 | #16 | |
coder
Join Date: Jul 2009
Location: a galaxy far far away
Age: 50
Posts: 84
|
Quote:
Dude, I've been meaning to ask you about this for some time, now Lonewolf10 asked for me. Great! Now that I've set aside some time out of my usual c64 coding schedule to do some Amiga coding, there are so many things I am having to learn, and so much is rushing at me at once. So much fun stuff here! |
|
14 November 2011, 10:51 | #17 |
Registered User
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
|
Hi friends, I am resurrecting this old thread.
I need a more precise measure than the number of lines, so I was thinking of computing the number of "color clocks" that occurred between two values of VPOSR+VHPOSR. Maybe that is simpler than using the CIA. Would you consider that approach a lot less accurate than the CIA? |
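For reference, the colour clock difference between two beam samples could be worked out as in the Python sketch below. It assumes each sample is the 32-bit value obtained by reading VPOSR and VHPOSR as one longword (as the move.l VPOSR,d0 in post #1 does), that the line number sits in bits 8 to 16 and the horizontal count in bits 0 to 7, that a PAL line has 227 colour clocks, and that both samples fall within the same frame. The function names are made up for illustration.

```python
PAL_COLOR_CLOCKS_PER_LINE = 227   # assumed PAL line length


def decode_beam(vposr_long):
    """Split a VPOSR/VHPOSR longword into (line, horizontal colour clock)."""
    line = (vposr_long >> 8) & 0x1FF   # same shift and mask as the asm in post #1
    hpos = vposr_long & 0xFF
    return line, hpos


def color_clocks_between(before, after,
                         clocks_per_line=PAL_COLOR_CLOCKS_PER_LINE):
    """Colour clocks elapsed between two samples taken in the same frame."""
    line0, hpos0 = decode_beam(before)
    line1, hpos1 = decode_beam(after)
    return (line1 - line0) * clocks_per_line + (hpos1 - hpos0)


if __name__ == "__main__":
    start = (100 << 8) | 0x20     # line 100, horizontal position $20
    end = (102 << 8) | 0x10       # line 102, horizontal position $10
    print(color_clocks_between(start, end), "colour clocks")
```

The resolution is one colour clock (two CPU cycles on a 7 MHz 68000), and the reads themselves cost time, so the overhead of an empty measurement should be subtracted.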
15 November 2011, 07:16 | #18 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
I think using the CIA timers would be more accurate for tiny measurements, as they have about 1.4 microseconds granularity, but I would just let my code run many times over and count scanlines, since it's simple and accurate enough.
EDIT: Here's a timing example using the CIA: Code:
ciab    = $bfd000
talo    = $400
tahi    = $500
tblo    = $600
tbhi    = $700
icr     = $d00
cra     = $e00
crb     = $f00

        lea     ciab, a5
        move.b  #%01111111, icr(a5)     ; clear CIA interrupts and set up
        st      tblo(a5)                ; timer counters
        st      tbhi(a5)
        st      talo(a5)
        st      tahi(a5)
        move.b  #%01010001, crb(a5)     ; start timer
        move.b  #%00010001, cra(a5)

        (code here)

        sf      cra(a5)                 ; stop timer

        move.b  tbhi(a5), d0
        lsl.w   #8, d0
        move.b  tblo(a5), d0
        not.w   d0

        move.b  tahi(a5), d1
        lsl.w   #8, d1
        move.b  talo(a5), d1
        not.w   d1

        mulu.w  #46193, d0              ; 46193 for PAL, 45771 for NTSC
        mulu.w  #46193, d1
        add.l   d0, d0
        lsr.l   #8, d1
        lsr.l   #7, d1
        addx.l  d1, d0                  ; return microseconds in D0
        rts

Last edited by Leffmann; 15 November 2011 at 07:30. |
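My reading of the conversion at the end of that routine, modelled in Python: the two inverted counters form a 32-bit tick count (timer B high word, timer A low word), and multiplying by 46193 then dividing by 32768 scales ticks to microseconds with rounding. The sketch assumes a PAL tick rate of 709379 Hz to compare against; the function name and the 32-bit masking are mine, not part of the original post.

```python
PAL_TICK_HZ = 709_379     # assumed CIA tick rate on PAL
MASK32 = 0xFFFFFFFF


def ticks_to_us_fixed(tb_elapsed, ta_elapsed, k=46193):
    """Mirror the mulu/add/lsr/addx sequence using 32-bit registers.

    tb_elapsed, ta_elapsed: the 16-bit counts after the not.w steps.
    """
    d0 = (tb_elapsed * k) & MASK32          # mulu.w #k, d0
    d1 = (ta_elapsed * k) & MASK32          # mulu.w #k, d1
    d0 = (d0 + d0) & MASK32                 # add.l d0, d0  (same as *65536/32768)
    x_flag = (d1 >> 14) & 1                 # last bit shifted out by the two lsr.l
    d1 >>= 15                               # lsr.l #8 then lsr.l #7
    return (d0 + d1 + x_flag) & MASK32      # addx.l d1, d0 (adds the rounding bit)


def ticks_to_us_exact(tb_elapsed, ta_elapsed, tick_hz=PAL_TICK_HZ):
    """Floating-point reference for the same conversion."""
    return (tb_elapsed * 65536 + ta_elapsed) * 1_000_000 / tick_hz


if __name__ == "__main__":
    for tb, ta in ((0, 1), (0, 709), (1, 0)):
        print(tb, ta, ticks_to_us_fixed(tb, ta),
              round(ticks_to_us_exact(tb, ta), 2))
```

Since 46193 / 32768 is about 1.4097, the constant matches the "about 1.4 microseconds granularity" mentioned in the post, and the NTSC constant 45771 works the same way for the slightly faster NTSC tick.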
17 November 2011, 22:08 | #19 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,938
|
The method I use is to simply kill the system, then set up a vertical blank interrupt handler that counts frames. The main code then executes the test code a million times (for PAL). On a 50 MHz CPU the frame count will then give you the time in cycles (don't forget the loop overhead). Works for single instructions.
For different clock speeds and refresh rates, just calculate a different number of times the test code has to be executed. Example: PAL + 25 MHz CPU = 500000 times. |
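The arithmetic behind those iteration counts, as a short Python sketch: if the test code runs (clock rate / frame rate) times, one frame corresponds to one CPU cycle per iteration, so the frame counter reads directly in cycles. It assumes a 50 Hz PAL frame rate unless told otherwise; the function names are hypothetical.

```python
def iterations_for(cpu_hz, frames_per_second=50):
    """Iterations to run so that one counted frame equals one cycle each."""
    return cpu_hz // frames_per_second


def cycles_per_iteration(frames_counted, cpu_hz, iterations,
                         frames_per_second=50):
    """General form: cycles per iteration from a frame count (loop overhead included)."""
    cycles_per_frame = cpu_hz / frames_per_second
    return frames_counted * cycles_per_frame / iterations


if __name__ == "__main__":
    print(iterations_for(50_000_000))    # PAL, 50 MHz CPU
    print(iterations_for(25_000_000))    # PAL, 25 MHz CPU
    print(cycles_per_iteration(12, 50_000_000, 1_000_000))
```

The result still includes the loop overhead, so timing an empty loop first and subtracting it gives the figure for the test code alone.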
18 November 2011, 09:31 | #20 | |
Registered User
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
|
Quote:
The routine takes much less than a frame. Executing the routine many times in a frame would alter the results: the routine is meant to start during vblank, so a second execution within the same frame would not run under the "real" DMA conditions. I want to count the number of color clocks elapsed; I'll do it first using the CIA and then by computing the difference between the VPOSR positions, so I can double-check my measurements.
|