English Amiga Board


Old 14 October 2010, 22:41   #1
Lonewolf10
AMOS Extensions Developer
 
Join Date: Jun 2007
Location: near Cambridge, UK
Age: 44
Posts: 1,924
Measuring speed with pixels

Hi folks,

I am trying to piece together some code that lets me test how fast an instruction, or a series of instructions, is. The problem is that with the code I have (partially) written, whenever I change the instruction/s under test, the pixels of measurement flicker (i.e. they don't start in the same place each frame, which causes a flickering effect).

Here's my code:
Code:
;
; Commands Speed Test
;*********************
;
; By Andrew D. Burton, October 2010
;
; Based on code from the Danish Assembler Course, Init Screen (part 12,
; page 17). This demonstrates the speed (in pixels) of each command.
;
; This program sets up a lowres screen 320x256 pixels in size, with one
; bitplane.
;

COLOR00        equ    $DFF180
VPOSR        equ    $DFF004


    move.w    #$4000,$dff09a        ;INTENA, disable interrupts
    move.w    #$01a0,$dff096        ;DMACON, bitplane, copper & sprite
                    ;     DMA disabled

    lea.l    screen(pc),a1        ;screen address -> a1
    lea.l    bplcop+2(pc),a2        ;bplcop+2 address -> a2
    move.l    a1,d1
    move.w    d1,4(a2)        ;set low word of address (BPL1PTL)
    swap    d1
    move.w    d1,(a2)            ;set high word of address (BPL1PTH)
    lea.l    copper(pc),a1        ;copper address -> a1
    move.l    a1,$dff080        ;COP1LCH, our copper set
    move.w    #$8180,$dff096        ;DMACON,bitplane & copper DMA enabled

main:
    movem.l    d0-d7/a0-a6,-(sp)    ;store registers

    lea.l    screen(pc),a0
    add    #400,a0            ;move to line 10 (40 bytes per line * 10)
    move.l    #$FFFF,(a0)        ;mark part of the screen with color 1

waitline:                ;wait for scanline at bottom of
                    ;screen to get vertical sync :)
    move.l    VPOSR,d0
    asr.l    #8,d0            ;shift bits into place
    and.l    #$1FF,d0        ;mask
    cmp.w    #200,d0            ;wait for line xxx (was #200)
    blt    waitline        ;line reached?
    moveq    #0,d1

waitlineb:
    move.l    VPOSR,d0
    asr.l    #8,d0
    and.l    #$1FF,d0
    cmp.w    #100,d0            ;was #100
    bne    waitlineb        ;was blt
    dbra    d1,waitlineb
    move.w    #$FFF,COLOR00

*    move.b    #$FF,dummy        ;<<<---- command/s to test *********
    moveq    #10,d0
    mulu    #40,d0
    move.w    #$000,COLOR00

waitline2:
    move.l    VPOSR,d0
    asr.l    #8,d0
    and.l    #$1FF,d0
    cmp.w    #100,d0
    blt    waitline2
*    move.w    #$000,COLOR00


exit:
    btst    #6,$bfe001
    bne    waitline        ;LMB pressed? (was main)

    movem.l    (sp)+,d0-d7/a0-a6    ;restore registers
    move.l    4.w,a6            ;ExecBase -> a6
    move.l    156(a6),a6
    move.l    38(a6),$dff080        ;COP1LCH,restore workbench copperlist

    move.w    #$8020,$dff096        ;DMACON, sprite DMA enabled
    move.w    #$c000,$dff09a        ;INTENA, re-enable interrupts (undo the $4000 at start)
    rts                ;exit to workbench


copper:
    dc.w    $2001,$fffe        ;wait for scanline $20
    dc.w    $0102,$0000        ;BPLCON1 cleared & setup in program
    dc.w    $0104,$0000        ;BPLCON2 cleared & setup in program
    dc.w    $0108,$0000        ;BPL1MOD cleared & setup in program
    dc.w    $010a,$0000        ;BPL2MOD cleared & setup in program
    dc.w    $008e,$2c81        ;DIWSTRT
    dc.w    $0090,$f4c1        ;DIWSTOP
    dc.w    $0090,$38c1        ;DIWSTOP (overscan)
    dc.w    $0092,$0038        ;DDFSTRT
    dc.w    $0094,$00d0        ;DDFSTOP
*    dc.w    $0180,$0000        ;COLOR00 set as $000
    dc.w    $0182,$0ff0        ;COLOR01 set as $FF0
    dc.w    $2c01,$fffe        ;wait for scanline $2C
bplcop:
    dc.w    $00e0,$0000        ;BPL1PTH cleared & setup in program
    dc.w    $00e2,$0000        ;BPL1PTL cleared & setup in program
    dc.w    $0100,$1200        ;BPLCON0

    dc.w    $6201,$FFFE        ;2 horizontal (Y) green lines
    dc.w    $0180,$00F0
    dc.w    $6301,$FFFE
    dc.w    $0180,$0000
    dc.w    $6601,$FFFE
    dc.w    $0180,$00F0
    dc.w    $6701,$FFFE
    dc.w    $0180,$0000

    dc.w    $ffdf,$fffe        ;wait for scanline $FF
    dc.w    $2c01,$fffe        ;wait for scanline $12C (we are past line $FF here)
    dc.w    $0100,$0200        ;BPLCON0
    dc.w    $ffff,$fffe        ;wait forever (until next VBL)

screen:
*    blk.w    5120,0
    dcb.w    5120,0            ;dcb = define constant block ($AAAA)
                    ;5120,0 = length, pre-set value

dummy:                    ;dummy address used for testing
    dc.l    0

I did a little reading of the Hardware Reference manual and it mentions the registers VPOSW and VHPOSW. Should I be using these to get my screen to sync instead?


Regards,
Lonewolf10
Old 15 October 2010, 01:21   #2
Zetr0
Ya' like it Retr0?
 
Join Date: Jul 2005
Location: United Kingdom
Age: 49
Posts: 9,768
The problem with using the beam position registers is that you have to poll them in a wait loop, which will skew your results in terms of instruction execution speed. (VPOSW and VHPOSW are the write-only versions, by the way; the ones you read are VPOSR and VHPOSR.)

The best bet would be to use a timer: read the timer's start value, run your test instruction iterations, stop the clock, check the time, and divide it by the number of instructions executed.

Otherwise you will be at the mercy of the vertical and horizontal register conditions.
Old 15 October 2010, 10:48   #3
pmc
gone
 
Join Date: Apr 2007
Location: completely gone
Posts: 1,596
Morning Lonewolf10

The classic way to check code execution times is in raster lines:

set the background colour to black
wait for a scanline that's visible on screen (use VPOSR / VHPOSR)
change the background colour to red
run the routine you want to check execution speed of
change the background colour back to black
count the number of red lines == the number of raster lines the code took to execute

I won't post any code cos the above should be easy for you to do plus it looks like that's what you've been aiming at anyway

EDIT: the Amiga .exe in the attached .zip file shows what the above looks like in practice...

Last edited by pmc; 20 November 2010 at 10:41.
Old 15 October 2010, 11:01   #4
Toni Wilen
WinUAE developer
 
Join Date: Aug 2001
Location: Hämeenlinna/Finland
Age: 49
Posts: 26,506
It is not possible to wait for an exact horizontal position, because the position increases by one every second 68000 CPU clock, but every CPU memory access takes 4 CPU clock cycles and comparison instructions take at least 8 or so CPU cycles.
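
To put rough numbers on that, here is the poll loop from post #1 with approximate 68000 cycle counts added. The counts are quoted from the usual 68000 timing tables and ignore any chip bus contention, so treat them as an estimate and double-check them against the manual:

Code:
VPOSR   equ     $DFF004

waitlineb:
        move.l  VPOSR,d0        ; about 20 cycles (long read, absolute long address)
        asr.l   #8,d0           ; about 24 cycles (8 + 2 per bit shifted)
        and.l   #$1FF,d0        ; about 16 cycles
        cmp.w   #100,d0         ; about  8 cycles
        bne     waitlineb       ; about 10 cycles when taken
                                ; ------------------------------
                                ; roughly 78 cycles per pass,
                                ; i.e. about 39 horizontal counts

So the loop can only notice the new line somewhere within a window of roughly 78 CPU cycles. On a 7 MHz 68000 one lowres pixel lasts one CPU cycle, so the start of the white bar could land anywhere within a span of that many pixels from frame to frame, which would explain the flicker.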
Old 15 October 2010, 18:51   #5
Lonewolf10
@ Zetr0

Thanks for the info.


@ pmc

Yeah, that's what I was aiming for, but I was hoping to get the exact timings for each instruction since I can't seem to find any references to them online (other than moveq takes 6 cycles and clr.l takes 8).
It will be a useful routine to have for measuring routines though, once I get the code right :P


Quote:
Originally Posted by Toni Wilen View Post
It is not possible to wait for an exact horizontal position, because the position increases by one every second 68000 CPU clock, but every CPU memory access takes 4 CPU clock cycles and comparison instructions take at least 8 or so CPU cycles.

Yes, I seem to have found that out the hard way
I don't suppose you have a list of all the cycle times for each instruction??

I do have various PDFs that reference MC680X0 cycle times, but they seem to be for the later CPUs (and FPUs), rather than the 68K


Regards,
Lonewolf10
Old 15 October 2010, 19:13   #6
Toni Wilen
68000 timings are fully documented in MC68000UM : M68000 8-16-32-Bit Microprocessors User's Manual (pdf) (http://cache.freescale.com/files/32b.../MC68000UM.pdf)
Old 17 October 2010, 03:34   #7
Lonewolf10
Quote:
Originally Posted by Toni Wilen View Post
68000 timings are fully documented in MC68000UM : M68000 8-16-32-Bit Microprocessors User's Manual (pdf) (http://cache.freescale.com/files/32b.../MC68000UM.pdf)

Thanks for the link


Regards,
Lonewolf10
Old 19 October 2010, 22:46   #8
Photon
Moderator
 
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
I do rely on raster to time my stuff (read VHPOS before and after, 'log' the difference). It's only to see if something goes 'faster or slower' than before the code-change. For not-too-long code runs, you can use the CIA timers. For runs of minutes, the TOD clock can be read (or indeed a stopwatch).
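
A minimal sketch of the before/after read, assuming a PAL machine (227 colour clocks per line), a 68000, and that both reads happen within the same frame. The routine name tocck is made up for this example. Note that the longword read is two word accesses, so a read that straddles the change from line $FF to line $100 can return a mismatched top bit:

Code:
VPOSR   equ     $dff004         ; VPOSR and VHPOSR read as one longword

        move.l  VPOSR,d6        ; beam position before
        ; ... code under test ...
        move.l  VPOSR,d7        ; beam position after

        move.l  d6,d0
        bsr.s   tocck
        move.l  d0,d6           ; d6 = start, in colour clocks
        move.l  d7,d0
        bsr.s   tocck
        sub.l   d6,d0           ; d0 = elapsed colour clocks
        rts

; in:  d0 = VPOSR/VHPOSR longword
; out: d0 = colour clocks since line 0 (trashes d1)
tocck:
        moveq   #0,d1
        move.b  d0,d1           ; horizontal position, in colour clocks
        lsr.l   #8,d0
        and.w   #$1ff,d0        ; vertical position (line number)
        mulu.w  #227,d0         ; 227 colour clocks per PAL line
        add.l   d1,d0
        rts

The result includes the time taken by the second read itself, so run it once with nothing between the two reads and subtract that. One colour clock is two CPU cycles on a 7 MHz 68000.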

For 68000 *optimization* (not actual time with misc DMA running) you can calculate a sum of clock cycles. But for real-world uses, your code will go faster the faster your Amiga's CPU is.

Obvious and general stuff, hope some of it helps.
Old 20 October 2010, 21:50   #9
Lonewolf10
Quote:
Originally Posted by Photon View Post
I do rely on raster to time my stuff (read VHPOS before and after, 'log' the difference). It's only to see if something goes 'faster or slower' than before the code-change. For not-too-long code runs, you can use the CIA timers. For runs of minutes, the TOD clock can be read (or indeed a stopwatch).

For 68000 *optimization* (not actual time with misc DMA running) you can calculate a sum of clock cycles. But for real-world uses, your code will go faster the faster your Amiga's CPU is.

Obvious and general stuff, hope some of it helps.

I think I will go with the CIA timers for now, as it's just a simple grab time at start, time at end and take one from the other.

Thanks for the advice.


Regards,
Lonewolf10
Old 21 October 2010, 22:23   #10
Lonewolf10
OK, so which CIA timers should I be using? The TODs (TODLO, TODMID & TODHI) in the CIA chips seem to run so fast that I get stupid results (e.g. 1325400064) when there's only 1 instruction (a move.l) between the "gettime" routines. I am thinking I should be using timers A (TALO & TAHI) and B (TBLO & TBHI) from CIAB, am I correct?


Regards,
Lonewolf10
Old 22 October 2010, 08:55   #11
pmc
You can use those CIA timers to count down from a supplied time value - I used those for accurate wait times in my trackloader - so, as you suggested previously you could just do something like:

supply a start value and set the timer running
execute your code
read the value left in the timer counter

and the difference between the start value and time left would then be how long the code took to run.

There'll be a maximum start value you can supply (can't remember it off the top of my head...) so I suppose that's what Photon was getting at when he said "not too long code runs"
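
Roughly, the three steps above could look like this. It's only a sketch, assuming the system has already been taken over (as in the code in post #1) so nothing else is using CIA-B timer A. The CIAB_ names are just labels chosen for this example:

Code:
CIAB_TALO equ   $bfd400
CIAB_TAHI equ   $bfd500
CIAB_CRA  equ   $bfde00

        move.b  #%00000000,CIAB_CRA     ; stop timer A
        move.b  #$ff,CIAB_TALO          ; start value $ffff, low byte first
        move.b  #$ff,CIAB_TAHI          ; high byte: a stopped timer loads its counter here
        move.b  #%00001001,CIAB_CRA     ; one-shot mode, start counting

        ; ... code under test ...

        move.b  #%00001000,CIAB_CRA     ; stop the timer
        moveq   #0,d0
        move.b  CIAB_TAHI,d0
        lsl.w   #8,d0
        move.b  CIAB_TALO,d0            ; d0 = ticks left
        not.w   d0                      ; d0 = $ffff - ticks left = elapsed ticks

The counter is 16 bits wide, so $ffff is the largest start value. On a PAL machine the timer ticks at about 709 kHz (roughly 1.4 microseconds per tick), which makes that about 92 ms. If the code takes longer than that, the one-shot timer reloads and stops, and the result is meaningless.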

What's the purpose of accurately timing out your code like this Lonewolf10...? Are you just experimenting...? I'm just curious.
Old 22 October 2010, 09:14   #12
Toni Wilen
TOD counters are relatively slow: they count horizontal syncs (CIA-B) and vertical syncs (CIA-A). (On some models CIA-A counts power supply ticks, not vsyncs.)


Note that you need to read the TOD registers in the correct order or you will get strange results.
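
A small sketch of the read order, here for the CIA-A TOD. Reading the high byte latches all three registers and reading the low byte releases the latch, so the order is high, mid, low. The CIAA_ names are labels chosen for this example:

Code:
CIAA_TODLO  equ $bfe801
CIAA_TODMID equ $bfe901
CIAA_TODHI  equ $bfea01

        moveq   #0,d0
        move.b  CIAA_TODHI,d0   ; read HI first: latches the whole counter
        lsl.l   #8,d0
        move.b  CIAA_TODMID,d0
        lsl.l   #8,d0
        move.b  CIAA_TODLO,d0   ; read LO last: releases the latch
                                ; d0 = 24-bit TOD value

Each register is read with a byte access. A single longword read across them does not return the counter value.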
Old 22 October 2010, 18:20   #13
Lonewolf10
Quote:
Originally Posted by pmc View Post
What's the purpose of accurately timing out your code like this Lonewolf10...? Are you just experimenting...? I'm just curious.

For several reasons, but the main one is so I can time my code accurately for optimization purposes. It's also so I can work out whether I have enough space to put in another routine, without affecting anything else. After all, a demo would be pretty boring (and unimpressive) with only one thing going on at a time.

Quote:
Originally Posted by Toni Wilen View Post
TOD counters are relatively slow: they count horizontal syncs (CIA-B) and vertical syncs (CIA-A). (On some models CIA-A counts power supply ticks, not vsyncs.)

I have been reading Appendix F (CIA chips) of my Amiga HRM. It's only for my timing use, I would not use it in software I release to the public domain.
Thanks anyway


Regards,
Lonewolf10
Old 23 October 2010, 14:05   #14
Leffmann
 
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
Quote:
Originally Posted by Lonewolf10 View Post
OK, so which CIA timers should I be using? The TODs (TODLO, TODMID & TODHI) in the CIA chips seem to run so fast that I get stupid results (e.g. 1325400064) when there's only 1 instruction (a move.l) between the "gettime" routines. I am thinking I should be using timers A (TALO & TAHI) and B (TBLO & TBHI) from CIAB, am I correct?
The CIA A is used by the AmigaOS so yeah you need to use the CIA B. The CIA registers are byte sized and are placed 256 bytes apart in the address space, that's why you get bogus numbers when you do longword reads on them.
Old 23 October 2010, 16:04   #15
Lonewolf10
Quote:
Originally Posted by Leffmann View Post
The CIA A is used by the AmigaOS so yeah you need to use the CIA B. The CIA registers are byte sized and are placed 256 bytes apart in the address space, that's why you get bogus numbers when you do longword reads on them.

Ahh, oops!


Thanks,
Lonewolf10
Old 24 October 2010, 19:27   #16
Plagueis/KRX
coder
 
Join Date: Jul 2009
Location: a galaxy far far away
Age: 49
Posts: 84
Quote:
Originally Posted by pmc View Post
Morning Lonewolf10

The classic way to check code execution times is in raster lines:

set the background colour to black
wait for a scanline that's visible on screen (use VPOSR / VHPOSR)
change the background colour to red
run the routine you want to check execution speed of
change the background colour back to black
count the number of red lines == the number of raster lines the code took to execute

I won't post any code cos the above should be easy for you to do plus it looks like that's what you've been aiming at anyway

EDIT: the Amiga .exe in the attached .zip file shows what the above looks like in practice...


Dude, I've been meaning to ask you about this for some time, now Lonewolf10 asked for me. Great! Now that I've set aside some time out of my usual c64 coding schedule to do some Amiga coding, there are so many things I am having to learn, and so much is rushing at me at once. So much fun stuff here!
Old 14 November 2011, 10:51   #17
TheDarkCoder
Registered User
 
Join Date: Dec 2007
Location: Dark Kingdom
Posts: 213
Hi friends, I am resurrecting this old thread.
I need a more precise measure than the number of lines, so I was thinking of computing the number of "color clocks" that elapse between two values of VPOSR+VHPOSR. Maybe that is simpler than using the CIA.

Would you consider that approach a lot less accurate than CIA?
Old 15 November 2011, 07:16   #18
Leffmann
 
I think using the CIA timers would be more accurate for tiny measurements, they have about 1.4 microseconds granularity, but I would just let my code run many times over and count scanlines since it's simple and accurate enough.

EDIT: Here's a timing example using the CIA:

Code:
ciab    = $bfd000
talo    = $400
tahi    = $500
tblo    = $600
tbhi    = $700
icr     = $d00
cra     = $e00
crb     = $f00

        lea      ciab, a5

        move.b   #%01111111, icr(a5)   ; clear CIA interrupts and set up
        st       tblo(a5)              ; timer counters
        st       tbhi(a5)
        st       talo(a5)
        st       tahi(a5)

        move.b   #%01010001, crb(a5)   ; start timer
        move.b   #%00010001, cra(a5)

        ; (code under test goes here)

        sf       cra(a5)               ; stop timer

        move.b   tbhi(a5), d0
        lsl.w    #8, d0
        move.b   tblo(a5), d0
        not.w    d0
        move.b   tahi(a5), d1
        lsl.w    #8, d1
        move.b   talo(a5), d1
        not.w    d1
        mulu.w   #46193, d0            ; 46193 for PAL, 45771 for NTSC
        mulu.w   #46193, d1
        add.l    d0, d0
        lsr.l    #8, d1
        lsr.l    #7, d1
        addx.l   d1, d0                ; return microseconds in D0
        rts

Last edited by Leffmann; 15 November 2011 at 07:30.
Old 17 November 2011, 22:08   #19
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
The method I use is to simply kill the system, then set up a vertical blank interrupt handler that counts frames. The main code then executes the test code a million times (for PAL). On a 50 MHz CPU one PAL frame lasts 50,000,000 / 50 = 1,000,000 cycles, so the number of frames counted is the time of one iteration in cycles (don't forget the loop overhead). Works for single instructions.

For different clock speeds and refresh rates just calculate a different number of times the test code has to be executed. Example: PAL + 25 MHz CPU = 500000 times.
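
A bare-bones sketch of the idea, assuming the system is already killed, the VBR is at 0, and nothing else is hanging off the level 3 interrupt. The labels (N, frames, vbl) are made up for this example, and the code that restores the system afterwards is left out:

Code:
INTENA  equ     $dff09a
INTREQ  equ     $dff09c
N       equ     1000000                 ; iterations for a 50 MHz CPU on PAL

start:
        move.w  #$7fff,INTENA           ; all interrupts off
        move.w  #$7fff,INTREQ           ; clear pending requests
        move.l  #vbl,$6c.w              ; level 3 autovector (assumes VBR = 0)
        clr.l   frames
        move.w  #$c020,INTENA           ; master enable + VERTB only

        move.l  #N-1,d7
.loop:
        ; ... instruction(s) under test ...
        subq.l  #1,d7
        bcc.s   .loop                   ; body runs N times

        move.l  frames,d0               ; d0 = frames elapsed
        ; ... restore the system here (omitted) ...
        rts

vbl:
        addq.l  #1,frames               ; one more frame has passed
        move.w  #$0020,INTREQ           ; acknowledge VERTB
        rte

frames: dc.l    0

Run it once with an empty loop body to get the loop overhead, then subtract that from the result for the real test. The answer is only good to within a frame or so, which is why the iteration count has to be large.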
Old 18 November 2011, 09:31   #20
TheDarkCoder
Quote:
Originally Posted by Thorham View Post
The method I use is to simply kill the system, then set up a vertical blank interrupt handler that counts frames. The main code then executes the test code a million times (for PAL). On a 50 MHz CPU one PAL frame lasts 50,000,000 / 50 = 1,000,000 cycles, so the number of frames counted is the time of one iteration in cycles (don't forget the loop overhead). Works for single instructions.

For different clock speeds and refresh rates just calculate a different number of times the test code has to be executed. Example: PAL + 25 MHz CPU = 500000 times.
I also use this technique, but it's not appropriate here. I would like to compare different instruction schedulings which interact with the DMA, so I cannot just loop the routine; I also have to properly synchronize with the vblank.
The routine takes much less than a frame. Executing the routine many times in a frame will alter the results: the routine is meant to start during vblank, so a second execution within the same frame would not run under the "real" DMA conditions.
I want to count the number of color clocks elapsed. I'll do it first using the CIA and then by computing the difference between the beam positions read from VPOSR, so I can double check my measurements.
 

