English Amiga Board - Trying to measure the CPU cycles/instr! (A500)

 23 August 2017, 18:39 #1 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
Trying to measure the CPU cycles/instr! (A500)

I am trying to measure CPU cycles per instruction using ASM-One V1.20 in the WinUAE A500 quickstart configuration. Here is the program:

    move.l  #$dff006,a0   ;VHPOSR
    move.w  (a0),d0       ;start_value
    move.w  (a0),d1       ;end_value
    move.w  d1,d2
    sub.w   d0,d2         ;diff_value
    rts

The result is d2=00000004. So, if the resolution of the VHPOSR horizontal counter is 280 ns (= 2 x 140 ns CPU cycle time), the value in d2 corresponds to 8 CPU cycles (or 2 bus cycles). That means the instruction "move.w (a0),d1" takes 8 CPU cycles (or 2 bus cycles)!

In the same way it should be possible to "measure" the cycles of any instruction placed in between, e.g.:

    move.l  #$dff006,a0   ;VHPOSR
    move.w  (a0),d0       ;start_value
    nop                   ;unknown_value
    move.w  (a0),d1       ;end_value (we now know this takes $00000004)
    move.w  d1,d2         ;end_value to d2
    sub.w   d0,d2         ;end_value - start value
    sub.w   #$4,d2        ;d2 - $00000004 = unknown_value
    rts

The first result is d2=00000006, but since we already know that the last instruction takes $00000004, the final result is d2=00000002, which equals 4 CPU cycles (or 1 bus cycle). That means the instruction "nop" takes 4 CPU cycles (or 1 bus cycle)!

Now I would like to check whether this is a correct approach or not. (tia)
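The conversion in the post above can be written down as a small host-side calculation. This is only a sketch in Python (not Amiga code): the function name and the `read_overhead_ticks` parameter are made up for illustration, and the factor of 2 CPU cycles per counter tick is taken from the post's 280 ns / 140 ns figures.

```python
# Sketch only: turn a VHPOSR difference into 68000 CPU cycles.
# Assumption from the post: one counter tick (280 ns) = 2 CPU cycles (140 ns each).
CPU_CYCLES_PER_TICK = 2


def cycles_from_delta(delta_ticks, read_overhead_ticks=0):
    """Return CPU cycles for a raw VHPOSR difference.

    read_overhead_ticks is the part of the difference caused by the
    timing read itself (4 ticks for "move.w (a0),d1" in the post).
    """
    return (delta_ticks - read_overhead_ticks) * CPU_CYCLES_PER_TICK


print(cycles_from_delta(4))     # two back-to-back reads: 8 cycles
print(cycles_from_delta(6, 4))  # nop between the reads: 4 cycles
```

The two calls reproduce the figures from the post: 8 cycles for the second read on its own, and 4 cycles for the nop once the read's 4 ticks are subtracted.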
 23 August 2017, 18:45 #2 Toni Wilen WinUAE developer   Join Date: Aug 2001 Location: Hämeenlinna/Finland Age: 46 Posts: 24,783
It is not possible to time anything cycle-accurately in software. Each CPU custom register access takes 2 color clocks (1 color clock = ~3.5MHz), even on AGA systems, which is a huge waste of clocks when the CPU is a 68020+.
 23 August 2017, 18:50 #3 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
Hmm, I hoped it could work (at least approximately). Thanks for the answer!
23 August 2017, 19:11   #4
amilo3438
Amiga 500 User

Join Date: Jun 2013
Location: EU
Posts: 1,198
Quote:
 Originally Posted by Toni Wilen It is not possible to time anything cycle-accurately in software. Each CPU custom register access takes 2 color clocks (1 color clock = ~3.5MHz)
Yes, but don't the two readings of the register, one at the beginning and one at the end, cancel out that delay? (That leaves only what is in between,
i.e. between the start_value and the end_value.)

 23 August 2017, 19:20 #5 Toni Wilen WinUAE developer   Join Date: Aug 2001 Location: Hämeenlinna/Finland Age: 46 Posts: 24,783
It isn't that simple either. Each CPU custom access also needs to wait until the CPU clock is in sync with the color clock, so there are an "unknown" 0-7 extra wasted clocks during which the CPU is doing nothing (or some internal ALU operation) -> short timing results are totally useless.
23 August 2017, 19:33   #6
amilo3438
Amiga 500 User

Join Date: Jun 2013
Location: EU
Posts: 1,198
Quote:
 Originally Posted by amilo3438 So, if the resolution of the VHPOSR horizontal counter is 280 ns (= 2 x 140 ns CPU cycle time), the value in d2 corresponds to 8 CPU cycles (or 2 bus cycles). That means the instruction "move.w (a0),d1" takes 8 CPU cycles (or 2 bus cycles)! The first result is d2=00000006, but since we already know that the last instruction takes $00000004, the final result is d2=00000002, which equals 4 CPU cycles (or 1 bus cycle). That means the instruction "nop" takes 4 CPU cycles (or 1 bus cycle)!
At least it seems the above works for the two examples mentioned, "move.w (a0),d1" and "nop":

http://oldwww.nvg.ntnu.no/amiga/MC68...s/timmove.HTML

Move Byte and Word Instruction Execution Times
(An)          Dn     8(2/0)

http://oldwww.nvg.ntnu.no/amiga/MC68...s/timmisc.HTML

instruction   size   register   memory
NOP           -      4(1/0)     -

Quote:
 Originally Posted by Toni Wilen -> short timing result are totally useless..
Anyway, I want to time only one instruction at a time, not a whole program!
And for the two examples mentioned above it works! (I also need to test some other instructions to confirm whether it is usable in general.)
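To compare more measurements against the manual's tables, the entries can be parsed mechanically. A minimal Python sketch, assuming the usual reading of the Motorola notation, clocks(read cycles/write cycles), and the 2 CPU cycles per VHPOSR tick used earlier in the thread; the helper names are invented for this example.

```python
import re


def parse_timing(entry):
    """Split a table entry such as '8(2/0)' into (clocks, reads, writes)."""
    m = re.fullmatch(r"\s*(\d+)\((\d+)/(\d+)\)\s*", entry)
    if m is None:
        raise ValueError(f"unrecognised timing entry: {entry!r}")
    return tuple(int(g) for g in m.groups())


def expected_ticks(entry, cpu_cycles_per_tick=2):
    """VHPOSR ticks the instruction should add, per the table entry."""
    clocks, _reads, _writes = parse_timing(entry)
    return clocks // cpu_cycles_per_tick


print(expected_ticks("8(2/0)"))  # move.w (An),Dn
print(expected_ticks("4(1/0)"))  # nop
```

The table values give 4 and 2 ticks, which match the d2 results of 4 and 2 reported above.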

Last edited by amilo3438; 23 August 2017 at 19:40.

 23 August 2017, 20:28 #7 Toni Wilen WinUAE developer   Join Date: Aug 2001 Location: Hämeenlinna/Finland Age: 46 Posts: 24,783 No. It does not work, it may appear to work in some cases only.
23 August 2017, 21:11   #8
amilo3438
Amiga 500 User

Join Date: Jun 2013
Location: EU
Posts: 1,198
Quote:
 Originally Posted by Toni Wilen No. It does not work, it may appear to work in some cases only.
Can you give an example instruction? (I can't find a case where it does not work.)

 23 August 2017, 21:33 #9 Toni Wilen WinUAE developer   Join Date: Aug 2001 Location: Hämeenlinna/Finland Age: 46 Posts: 24,783
For example if the code is already in cache, which is a common real-world use case. This is totally useless: those cycle usage charts for 68020+ are only min/max theoretical values, they can only be used to calculate worst/best case situations.
EDIT: for some reason I thought you meant 68020+. The 68000 has none of those and there isn't even any need to do any tests. It is simple and can be fully checked using a logic analyzer, because the previous or next instruction makes no difference to execution speed. This method is quite accurate with the 68000 because it is always in sync with the color clock and a 68000 memory cycle is exactly 2 color clocks. (But make sure to run the test code in real fast RAM!)
 23 August 2017, 21:39 #10 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
Yes, I am playing with the 68000 in the A500 config only! So far this works fine here. Here is the final version:

    move.l  #$dff006,a0   ;VHPOSR
test:
    move.w  (a0),d0
    cmp     #$00,d0       ;test if Hpos=0
    bne     test
    move.w  (a0),d0       ;start_value
    nop                   ;unknown_cycles to count
    move.w  (a0),d1       ;end_value (we know this takes $00000004)
    move.w  d1,d2         ;end_value to d2
    sub.w   d0,d2         ;end_value - start value
    sub.w   #$4,d2        ;d2 - $00000004
    add     d2,d2         ;d2 x 2 = unknown_cycles
    rts

So the unknown_cycles result for "nop" is in d2! ("nop" can be replaced with any other instruction to test.)
P.S. Maybe INTENA would need to be disabled at the start and enabled again at the end, but I am not sure how to do it!? (And the DMA channels too, everything.) I am not a programmer and don't have much experience with machine coding on the Amiga (very little, only the basics).
Last edited by amilo3438; 23 August 2017 at 21:56.
 23 August 2017, 22:25 #11 Toni Wilen WinUAE developer   Join Date: Aug 2001 Location: Hämeenlinna/Finland Age: 46 Posts: 24,783
Quick and dirty way:

    move.w  #$4000,INTENA
    move.w  #$0200,DMACON

    test code

    move.w  #$8200,DMACON
    move.w  #$c000,INTENA

But you can still hit refresh cycles when accessing custom registers (or chip RAM), which adds an extra 2-cycle delay.
 23 August 2017, 22:37 #12 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
Thanks! I wonder if the same or similar code can be used on an A1200 for a quick and dirty comparison between real hardware and emulation? (at least for unknown cases like muls, mulu, divs etc.)
EDIT: Because an A1200 runs at 14 MHz, I guess the best results would come from slowing it down to 7 MHz or even 3.5 MHz! For example, on a standard 14 MHz A1200 the test program for the "nop" instruction gives 0, at 7 MHz it gives 1 and at 3.5 MHz it gives 3 as the result. (This needs to be tested without "add.w d2,d2" and "sub.w #$4,d2", so as to measure/compare only the color clocks for the tested "nop" instruction plus the "move.w (a0),d1" instruction, since its value also depends on the current CPU frequency.)
Last edited by amilo3438; 24 August 2017 at 00:18.
24 August 2017, 09:11   #13
meynaf
son of 68k

Join Date: Nov 2007
Location: Lyon / France
Age: 48
Posts: 4,306
Quote:
 Originally Posted by amilo3438 I wonder if same/similar code can be used on an A1200 for quick and dirty comparison between a real and emulation ? (at least for unknown cases like muls, mulu, divs etc.)
If what you want is the clock cycles of a specific instruction, a better way would be to execute a large number of them in a loop.
On my 68030/50 I have a program doing 50,000,000 iterations of a loop. I count the number of seconds it takes, subtract 6 to take the dbf into account, and that gives me the clock cycles of a single instruction or group of instructions.
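The reason the seconds can be read directly as cycles is that 50,000,000 iterations at 50 MHz take exactly one second per clock cycle of loop body. A minimal Python sketch of that arithmetic; the function name and the 10-second example reading are hypothetical, and the 6-cycle dbf overhead is the figure given in the post.

```python
def cycles_per_iteration(elapsed_seconds, clock_hz, iterations,
                         loop_overhead_cycles=0):
    """Average clock cycles spent on the measured code per loop pass."""
    total_cycles = elapsed_seconds * clock_hz
    return total_cycles / iterations - loop_overhead_cycles


# Hypothetical reading: the loop took 10 seconds on a 50 MHz CPU.
print(cycles_per_iteration(10, 50_000_000, 50_000_000, loop_overhead_cycles=6))
```

With other clock speeds or iteration counts the same formula applies, the seconds just no longer map one-to-one onto cycles.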

24 August 2017, 13:39   #14
amilo3438
Amiga 500 User

Join Date: Jun 2013
Location: EU
Posts: 1,198
So the idea of using VHPOSR (i.e. color clocks) to count cycles works in practice, but it has been tested only for the 68000 on an A500!

The final program is attached as a picture below!

Also, instead of only one instruction, more instructions can be added, but if there are too many it may finish with an error (D0-D2 reset to zero), which means that the starting counter is higher than the end counter! If everything is fine, the D2 register will contain the cpu_cycles.
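One likely reason longer tests go wrong is that the horizontal count in the low byte of VHPOSR wraps at the end of each scan line, so a plain word subtraction no longer gives elapsed ticks. The sketch below shows the bookkeeping in Python under stated assumptions: a PAL line of 227 ($E3) color clocks, the register layout V7-V0 in the high byte and the horizontal count in the low byte, and both reads taken within the same frame. The function names are invented here, and this is not the logic of the attached program.

```python
TICKS_PER_LINE = 227  # assumption: PAL line length in color clocks ($E3)


def split_vhposr(word):
    """Return (line low byte, horizontal count) from a 16-bit VHPOSR value."""
    return (word >> 8) & 0xFF, word & 0xFF


def elapsed_ticks(start_word, end_word):
    """Elapsed color clocks between two reads.

    Only valid within one frame and for fewer than 256 lines, because
    VHPOSR exposes just the low 8 bits of the line counter.
    """
    v0, h0 = split_vhposr(start_word)
    v1, h1 = split_vhposr(end_word)
    lines = (v1 - v0) % 256
    return lines * TICKS_PER_LINE + (h1 - h0)


print(elapsed_ticks(0x2C10, 0x2C14))  # same line: 4 ticks
print(elapsed_ticks(0x2CE0, 0x2D05))  # crosses a line end: 8 ticks
print(0x2D05 - 0x2CE0)                # naive word subtraction: 37
```

In the second case the naive difference is 37 although only 8 color clocks have passed, which is the kind of error that appears once a test spans a line boundary.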

Cheers!

PS. I feel this concept can be improved further, so I leave that to someone experienced in Amiga machine language programming!

Quote:
 Originally Posted by meynaf If what you want is having clock cycles of a specific instruction, a better way would be executing a large number of it in a loop.
Yes, that may be one way, but I wanted to see whether it works using VHPOSR, and it does.
Attached Thumbnails

Last edited by amilo3438; 24 August 2017 at 13:46.

 24 August 2017, 13:48 #15 Kalms Registered User   Join Date: Nov 2006 Location: Stockholm, Sweden Posts: 228 You can use VHPOSR as a high-precision timer source on all Amiga systems. You need to make sure that the resolution is set to something such that you know how the hardware will count (DBLPAL will count differently from PAL for example). Also, you need to take into account that accessing VHPOSR is done over a bus where you compete with other hardware for access cycles. As people have outlined above, the simplest way to minimize the measurement errors is to 1) minimize other hardware activity and 2) measure across a large chunk of code & time. You can combine VHPOSR with the TOD counters to do measurements over >1 frame without needing any interrupt driven activity in between. If you are profiling on a non-a500 platform then you are probably targeting a range of configurations. You will need to combine the profiling with estimations (based on instruction timing from the processor manuals) to build performant code across a range of hardware.
 24 August 2017, 14:11 #16 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
My motivation was to find a way to approximately "measure/compare" an emulated CPU against a real CPU using software. OK, this "proof of concept" obviously works for the 68000 and A500, but at higher speeds the VHPOSR resolution is not enough, so one would need to test e.g. 10 identical instructions (or more) instead of just one, or reduce the CPU speed. Generally, it would be nice if this concept could be used, for example, to take some values on a real machine like an A1200 and then compare them with the same values on an emulated A1200, in order to improve the emulation accuracy even more, but I am afraid my current knowledge of machine programming is still not enough for such a task.
24 August 2017, 16:53   #17
Kalms
Registered User

Join Date: Nov 2006
Location: Stockholm, Sweden
Posts: 228
Quote:
 Originally Posted by amilo3438 My motivation was to find a way to approximately "measure/compare" an emulated CPU against a real CPU using software. OK, this "proof of concept" obviously works for the 68000 and A500, but at higher speeds the VHPOSR resolution is not enough, so one would need to test e.g. 10 identical instructions (or more) instead of just one, or reduce the CPU speed. Generally, it would be nice if this concept could be used, for example, to take some values on a real machine like an A1200 and then compare them with the same values on an emulated A1200, in order to improve the emulation accuracy even more, but I am afraid my current knowledge of machine programming is still not enough for such a task.
As others have mentioned, the measurement error will be large if you take single instructions. If you want less than 10% error margin then I think you will need to measure blocks that take 100+ cycles to execute.
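The link between block length and error margin can be illustrated with a rough Python calculation. The assumption here is a fixed worst-case uncertainty per measurement, using the 0-7 clocks Toni Wilen mentioned earlier in the thread as an example figure; the real error on a given machine may be larger or smaller, and the function name is made up for this sketch.

```python
def relative_error(uncertainty_cycles, measured_cycles):
    """Worst-case relative error for a fixed absolute uncertainty."""
    return uncertainty_cycles / measured_cycles


# Assumed uncertainty: 7 clocks (upper end of the 0-7 range quoted above).
print(relative_error(7, 4))    # a single 4-cycle nop
print(relative_error(7, 100))  # a 100-cycle block
```

Under that assumption a single nop could be off by far more than its own length, while a 100-cycle block stays under the 10% mark.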

When you begin to look at faster machines than the A500, the interactions between CPUs, buses and other hardware become more complicated and more pronounced. You can probably use the framework that you have, but you will also need to design different test cases very carefully.

Then, I wonder what the purpose is of the comparison. Is the ultimate purpose to adjust the emulator to match real hardware performance better? You will find that the emulator includes a number of approximations of how the machine is built, and you will need to understand both the real machine's workings (best done through analysis, read hw specs, do measurements) and the emulator (best done by reading the source code -- not by doing measurements) before you will be able to make useful changes to the emulator.

In other words; the timing framework will enable you to tell that "yep there seems to be a difference in <this area over here> between the real hw and the emulator" but it will probably not enable you to pinpoint exactly where and why.

 24 August 2017, 18:42 #18 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
Here is the new updated version, attached! I have checked how it works with loops and found that it counts accurately up to d2=$190 (400) cpu_cycles (which equals 100 nops). So loops of up to 400 cpu_cycles should work fine, I hope!
PS. As mentioned before, the cpu_cycles result is in d2, and on an error all of the d0-d2 registers are cleared (which means the test takes more than 400 cycles).
EDIT: Counting for the loop with d4=$1b (27) in the attached picture below: result in d2=$18c (396) cycles: 396-(4*28)-(10*27)-14=0. (Note: the loop counts from 27 down to 0 = 28 passes; nop takes (4*28) and dbeq takes (10*27)+14 cycles.)
EDIT2: Added a version that is 4 cycles faster/optimized!
Attached Thumbnails   Last edited by amilo3438; 25 August 2017 at 00:59.
 31 August 2017, 13:26 #19 amilo3438 Amiga 500 User   Join Date: Jun 2013 Location: EU Posts: 1,198
New version that can now count accurately up to 35444 mem_cycles (or nops) => 141776 cpu_cycles!
Note: It needs an A500 + fast RAM configuration to be 100% free of memory wait states! But depending on the instruction used in the test, it could also work fine with chip memory only. (I guess) (No, with chip memory it works fine only up to $e3-d0 color_clocks.)
EDIT: Counting example!

    d7=$278d loop value (max value for accurate results)
    loop: nop
          dbf d7,loop
    d3=$229c8 cpu_cycles => $229c8-(4x$278e)-(10x$278d)-14=0 => Accurate!

Attached Thumbnails     Last edited by amilo3438; 01 September 2017 at 18:38.
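Both loop checks in the two posts above follow the same formula, which is easier to verify in code than in hex. A minimal Python sketch using the figures from the posts: 4 cycles per nop, 10 cycles for each taken loop branch and 14 for the final one that falls through. The function names are invented for this example.

```python
NOP_CYCLES = 4      # per the posts
BRANCH_TAKEN = 10   # dbf/dbeq when the loop continues
BRANCH_EXIT = 14    # dbf/dbeq on the final pass


def loop_cycles(counter, body_cycles=NOP_CYCLES):
    """Expected cycles for 'loop: <body> / dbf dN,loop' with dN = counter."""
    passes = counter + 1  # the body runs for counter, counter-1, ..., 0
    return body_cycles * passes + BRANCH_TAKEN * counter + BRANCH_EXIT


def residual(measured_cycles, counter):
    """Measured minus expected; 0 means the measurement was exact."""
    return measured_cycles - loop_cycles(counter)


print(residual(0x18C, 0x1B))      # post #18: d4=$1b, d2=$18c
print(residual(0x229C8, 0x278D))  # post #19: d7=$278d, d3=$229c8
```

Both residuals come out as 0, confirming the arithmetic in the posts; $229c8 is 141768 decimal, just under the stated 141776-cycle limit.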
31 August 2017, 13:49   #20
Thorham
Computer Nerd

Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 45
Posts: 3,231
Quote:
 Originally Posted by meynaf If what you want is having clock cycles of a specific instruction, a better way would be executing a large number of it in a loop. On my 68030/50 i have a program doing 50,000,000 iterations of a loop. I count the number of seconds it takes, do -6 to take dbf into account, and it gives me the clock cycles of a single or group of instructions.
That's indeed the best way.

I do something similar. On my 50 MHz 68030, I execute the code that is to be measured one million times and simply count the number of vertical blanks in a VBL interrupt. For screen modes that have a refresh rate of 50 Hz, this count gives you the number of cycles including the loop handling (50,000,000 / 50 = 1,000,000).

Advantage compared to meynaf's method: Takes a lot less time to run for larger pieces of code while still being quite accurate.

Disadvantage compared to meynaf's method: More code. If you run something 50 million times, then you can probably just use a time stamp (from timer.device, not dos.library).
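The vertical blank method above works because, at 50 MHz and 50 Hz, one frame lasts exactly as many cycles as there are iterations. A short Python sketch of the general form; the defaults are the figures from the post, while the function name and the 12-frame example reading are hypothetical.

```python
def cycles_per_iteration_from_frames(frames, clock_hz=50_000_000,
                                     refresh_hz=50, iterations=1_000_000):
    """Average cycles per loop pass (loop handling included) from a VBL count."""
    cycles_per_frame = clock_hz / refresh_hz
    return frames * cycles_per_frame / iterations


# Hypothetical reading: 12 vertical blanks were counted during the run.
print(cycles_per_iteration_from_frames(12))
```

With the default figures the frame count equals the cycle count; for other clock or refresh rates the scaling factor is no longer 1.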
