#61 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
I did have a $ffdf,$fffe in there. I tend to trigger my lev3 irq from the copper, usually just after the last line of the display so I can start clearing - so I usually wait for line 255, then for my last display line. But in this case adding one more bob pushes things past line 255, so that when the copper hits $ffdf,$fffe it skips a frame. So I'd hit a fake limit.
I've swapped it to a more normal irq at line 0 and I got this up to 79 BOBs as well. Output looks better. Look how fast the blits occur after the display dma ends - imagine if it were that fast all the time... |
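For readers following along: the $ffdf,$fffe pair is a Copper WAIT for the end of line 255, needed because the Copper only compares the low 8 bits of the vertical position. The sketch below is illustrative Python, not Amiga code; `copper_wait` and `wait_for_line` are made-up helper names. It shows how those two words are put together, plus the "wait for blitter finished" form that Copper-driven blitting relies on.

```python
def copper_wait(vpos, hpos, wait_for_blitter=False):
    """Return the two 16-bit words of a Copper WAIT instruction.

    First word:  VP7-0 in bits 15-8, HP8-1 in bits 7-1, bit 0 set.
    Second word: bit 15 is BFD (blitter-finished disable), followed by the
                 vertical/horizontal compare masks, bit 0 clear for WAIT.
    Hypothetical helper for illustration only.
    """
    first = ((vpos & 0xFF) << 8) | (hpos & 0xFE) | 0x0001
    # BFD set ($fffe): ignore the blitter. BFD clear ($7ffe): also wait
    # until the blitter has finished.
    second = 0x7FFE if wait_for_blitter else 0xFFFE
    return first, second


def wait_for_line(line, hpos=0x00):
    """Build the WAIT(s) needed to reach a raster line, including > 255."""
    words = []
    if line > 255:
        # Wait for the end of line 255 first; the vertical compare
        # then wraps around and starts again from 0.
        words.append(copper_wait(0xFF, 0xDE))
    words.append(copper_wait(line & 0xFF, hpos))
    return words


if __name__ == "__main__":
    for first, second in wait_for_line(260):
        print(f"dc.w ${first:04x},${second:04x}")
    first, second = copper_wait(0x00, 0x00, wait_for_blitter=True)
    print(f"dc.w ${first:04x},${second:04x}  ; wait for blitter")
```

If the Copper is still busy with earlier instructions when the beam passes the end of line 255, the wait for the following line is not satisfied until the next frame, which matches the skipped frame described above.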
#62 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Hmm, still 79... That's actually quite interesting, I would've expected a bit more impact of moving to Copper blits. But if I understand you correctly, both methods (at least as you did them) take a very similar amount of total time. Fascinating stuff. |
#63 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Quote:
4 static, 7 updates in chip mem. My stupid calculations, seen against these results, seem quite sensible. Consider that this is a bad situation in which to use the copper to control the blitter: the 68k code version just does too little... More than anything else the 'gain' comes from the absence of [wasted CPU cycles for blitter waits].
Last edited by ross; 10 August 2020 at 00:05. |
|
#64 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
I think it kinda makes sense. Identical BOBs are as predictable as it gets. No long or short blits. In this code the blit takes longer than the CPU code in the draw loop, so the CPU is only ever held up by the blitwait - there's never that normal bottleneck of having the blitter sitting idle.
I think with anything more complicated in the CPU part the copper one will pull ahead. But is losing other copper features worth it... Out of interest, I enabled 5 bitplanes just to see and both versions dropped to 57 BOBs. |
#65 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
It's actually in part your calculations that made me wonder why there was no gain. Basically, I kind of disagree that we shouldn't see any gains (or perhaps I just don't understand why, which is equally possible).
This is surprising to me, as I've been told by several people (and your calculations also suggest this) that there are pretty big advantages in terms of setup cost and avoiding CPU based Blitter waiting. I'm inclined to conclude that perhaps the cost of setting up/maintaining the Copperlist(s) for Copper based blitting is not where the speed advantage lies (despite many claims to the contrary). Rather, the advantage apparently lies purely in reaching better bus utilization.
Last edited by roondar; 10 August 2020 at 11:35. |
|
#66 | |
Registered User
Join Date: Aug 2006
Location: Finland
Age: 52
Posts: 244
|
Quote:
|
|
#67 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
As a thought experiment: suppose the Copper based blitting method offered a very small 6 CPU cycle gain per blit in terms of setup/Blitter waiting over using the CPU to blit as normal. In that case we ought to see at least one full rasterline that is "free" (as in not blitting)*. That doesn't seem to be the case, so the gain appears to be even smaller than that.
Had the Copper based blitting method offered the kind of gains that ross predicted, we ought to see at least some extra bobs or a chunk of free raster time. We see neither, which to me seems to show that there is indeed very little actual gain in this case.
*) 6*79=474 cycles, or slightly more than 1 rasterline. |
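The footnote's arithmetic can be checked with a short sketch. It assumes a PAL rasterline of 227 colour clocks and 2 CPU cycles per colour clock on a 7 MHz 68000, so 454 CPU cycles per line. Those constants and all names below are assumptions for illustration, not taken from the post.

```python
# Assumed PAL timing on a stock 7 MHz 68000 (not stated in the thread).
COLOUR_CLOCKS_PER_LINE = 227
CPU_CYCLES_PER_COLOUR_CLOCK = 2
CPU_CYCLES_PER_LINE = COLOUR_CLOCKS_PER_LINE * CPU_CYCLES_PER_COLOUR_CLOCK


def cycles_to_rasterlines(cycles):
    """Convert a CPU cycle count into (fractional) PAL rasterlines."""
    return cycles / CPU_CYCLES_PER_LINE


def total_gain(cycles_saved_per_blit, blits):
    """Total CPU cycles saved per frame for a given per-blit saving."""
    return cycles_saved_per_blit * blits


if __name__ == "__main__":
    saved = total_gain(6, 79)  # the thought experiment: 6 cycles x 79 bobs
    print(saved, "cycles =", round(cycles_to_rasterlines(saved), 2), "rasterlines")
```

Under these assumptions even a 6 cycle saving per blit would add up to a little over one visible rasterline per frame.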
|
#68 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
I'm wondering if there's a gain to be made in the screen clear? As the copper list already holds the positions of the bobs, I'm wondering if it's quicker to use that to clear the bobs too. That would have an advantage over the CPU based version.
I'd need to think it through properly but I think I'm right. |
#69 | ||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 54
Posts: 4,498
|
Quote:
In my superficial calculation I adopted 2 (vs 11) chip mem updates for the gain; here there are 7 (vs 11) for a tie! So you can also roughly estimate how much CWAIT_BFD saves compared to CPU_BWAIT. For me it is a good result, from here on you can only gain.
Quote:
I have a half idea how to do it .. |
||
#70 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,438
|
Quote:
So if you could get it down to two chipmemory updates you'd gain something on the order of 12*5 CPU cycles per blit (or about 4700 cycles total in this example case). That's a nice little bump in performance. |
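A rough check of that estimate, assuming (as the post does) 12 CPU cycles per chip memory update and the 79-blit case from earlier in the thread. The figure of 454 CPU cycles per PAL rasterline is an added assumption, and the function name is hypothetical.

```python
CYCLES_PER_UPDATE = 12      # per chip memory update, as used in the post
CPU_CYCLES_PER_LINE = 454   # assumed PAL line on a 7 MHz 68000


def update_savings(updates_now, updates_target, blits):
    """Return (cycles saved per blit, cycles saved per frame)."""
    per_blit = (updates_now - updates_target) * CYCLES_PER_UPDATE
    return per_blit, per_blit * blits


if __name__ == "__main__":
    per_blit, total = update_savings(7, 2, 79)
    lines = round(total / CPU_CYCLES_PER_LINE, 1)
    print(per_blit, total, lines)  # prints: 60 4740 10.4
```

So going from 7 to 2 updates would free roughly ten rasterlines of CPU time per frame under these assumptions.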
|
#71 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
Just to finish this off for me, I did a blit interrupt version. I usually run everything in lev3 vblank/lev3 copper which was clashing with the lev3 blit interrupt. So I changed to running in a lev1 softint initiated from the copper. That was interesting.
Copper list blitting version was 79 BOBs and I got the blit interrupt version to 65 BOBs. Saving and restoring registers in the interrupt is the killer part. I could optimize this case because it is so simple, but 'movem.l d0-d7/a0-a6' vs 'movem.l a0/a6' was the difference between 55 and 65 BOBs.
Definitely not optimal, but I can think of a few pieces of code where I've not been able to split the CPU/blit parts up neatly, and maybe for a handful of blits this method is useful.
Code:
P0_BlitDoneIrq_SecondBlit:
	movem.l	a0/a6,-(sp)
	lea	_custom+bltcpth,a6
	move.w	#INTF_BLIT,intreq-bltcpth(a6)	;acknowledge blitter irq
	move.w	#INTF_BLIT,intreq-bltcpth(a6)	;written twice (commonly done for faster CPUs)

	move.l	BOB_BlitQueue_PTR(pc),a0
	cmp.l	#BOB_BlitQueue_End,a0	;queue empty?
	bge.s	.exit

	move.l	(a0)+,bltcon0-bltcpth(a6)	;bltcon0+bltcon1
	move.l	(a0)+,(a6)+	;bltcpt
	move.l	(a0)+,(a6)+	;bltbpt
	move.l	(a0)+,(a6)+	;bltapt
	move.l	(a0)+,(a6)+	;bltdpt
	move.w	(a0)+,(a6)	;bltsize (starts the blit)
	move.l	a0,BOB_BlitQueue_PTR
.exit:
	movem.l	(sp)+,a0/a6
	rte

BOB_BlitQueue_PTR:
	dc.l	0

	rsreset
;BLTQ_BLTCON0	rs.w	1
;BLTQ_BLTCON1	rs.w	1
;BLTQ_BLTCPTH	rs.w	1
;BLTQ_BLTCPTL	rs.w	1
;BLTQ_BLTBPTH	rs.w	1
;BLTQ_BLTBPTL	rs.w	1
;BLTQ_BLTAPTH	rs.w	1
;BLTQ_BLTAPTL	rs.w	1
;BLTQ_BLTDPTH	rs.w	1
;BLTQ_BLTDPTL	rs.w	1
;BLTQ_BLTSIZE	rs.w	1
;BLTQ_SIZEOF	rs.w	0
|
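To put a rough number on the register save/restore cost mentioned in this post, here is a small sketch using the standard 68000 timings for `movem.l` (8+8n cycles to push n registers with predecrement, 12+8n to pop them with postincrement). It ignores bus contention from display and blitter DMA, which on a real machine makes things slower still, and the function names are made up for illustration.

```python
def movem_save_cycles(n):
    """movem.l <n regs>,-(sp) on a 68000: 8 + 8n cycles."""
    return 8 + 8 * n


def movem_restore_cycles(n):
    """movem.l (sp)+,<n regs> on a 68000: 12 + 8n cycles."""
    return 12 + 8 * n


def irq_stack_overhead(n):
    """Cycles spent only on saving and restoring n registers per interrupt."""
    return movem_save_cycles(n) + movem_restore_cycles(n)


if __name__ == "__main__":
    full = irq_stack_overhead(15)     # d0-d7/a0-a6
    minimal = irq_stack_overhead(2)   # a0/a6
    print(full, minimal, full - minimal)  # prints: 260 52 208
```

On these figures the full register set costs about 208 more cycles per blitter interrupt than the two-register version, before any DMA slowdown. That is consistent in direction with the 55 vs 65 BOBs difference reported above, though it is not an attempt to reproduce that measurement.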
#72 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
Nice one @Antiriad_UK.
I'm on the same journey but on stock AGA. In the same 3 bitplane graphics mode using traditional blits I'm currently at 111 16x16 bobs (full clear and redraw every frame). I haven't done any sine/cosine yet but can plot at any x/y position independently. I'll move it to the Copper blits soon and post results... I may also put the mode into Dual Playfield on AGA and test that, which would be a more real-world case.
Geezer |
#73 | |
Registered User
Join Date: Nov 2017
Location: Los Angeles
Posts: 49
|
Quote:
It's not much, but you could save 4 cycles and a bus access on the restore by doing this:
Code:
	move.l	(sp)+,a0	; 12 (3/0)  a0 first: movem.l a0/a6,-(sp) leaves a0 on top of the stack
	move.l	(sp)+,a6	; 12 (3/0)
instead of:
Code:
	movem.l	(sp)+,a0/a6	; 28 (7/0) (12+8n (3+2n/0))
Last edited by FSizzle; 15 August 2020 at 19:58. |
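The trade-off between individual pops and a single `movem.l` depends on the register count. The sketch below tabulates it using the 68000 timings quoted in this post (12 cycles and 3 bus reads per `move.l (sp)+,An`, and 12+8n cycles with 3+2n reads for `movem.l (sp)+,<list>`). The function names are hypothetical.

```python
def pop_cost(n):
    """n separate move.l (sp)+,Rn instructions: (cycles, bus reads)."""
    return 12 * n, 3 * n


def movem_cost(n):
    """One movem.l (sp)+,<n regs>: (cycles, bus reads)."""
    return 12 + 8 * n, 3 + 2 * n


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "regs: pops", pop_cost(n), "movem", movem_cost(n))
```

With two registers the separate pops win by 4 cycles and one bus access, three registers is a tie on cycles and bus accesses, and from four registers upward `movem.l` is cheaper.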
|
#74 |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
So here's mine after a bit of work on AGA.
CPU blits: I can push around 136 16x16 (8 colour) bobs.
Copper blits: I can push around 132 16x16 (8 colour) bobs.
Screen mode is x4 fetch with 3 bitplanes enabled.
More importantly, I've found this to be a really great discussion point on Amiga programming as I have learned so much from it. If I had known this when doing my previous projects I could have improved their performance by a long stretch. Thanks to all for your contributions. A great gift of knowledge from everyone involved.
Geezer |
#75 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
Yes me too. I’ve shunned copper blits and blit interrupts before. I’m kinda in awe at the people who figured this out without today’s resources.
I’m also impressed with the order of the blit registers. The Amiga team thought so far ahead about the optimal order of those registers. Crazy. |
#76 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,213
|
|
#77 | |
Registered User
Join Date: Oct 2017
Location: Sunderland, England
Posts: 2,702
|
Quote:
But with that said, it's one thing gaining performance in a demo, and quite another gaining it when writing a game (only in my opinion of course).
Edit - it's amazing the old stuff I remember... just checked out the bobs on Dragons Megademo, clearly a screen buffer trick used there... it was used all the time in the ST demos.
Last edited by mcgeezer; 16 August 2020 at 00:03. |
|
#78 |
OCS forever!
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
|
Unlimited bobs don’t count. They do make me smile though. |
#79 |
Registered User
Join Date: Oct 2015
Location: Landsberg / Germany
Posts: 526
|
Very inspiring and enlightening thread going on here. Pretty enlightening for Kevin's and my current project Proxima 3 too, since it relies heavily on moving around as many objects as possible within one frame.
Thank you guys for your measurements and comparisons regarding cpu and dma overhead. This is really amazing work and very precious input.
The conclusion for p3 seems to be that it makes sense to stick with a rather traditional approach of feeding the blitter with the cpu and trying to use the copper for visual fx. The copper feeds BPLxMOD and BPLCON1 at least every two scanlines with modified modulus and scroll data to achieve a number of visual distortions, and I can't see how I could combine this with the copper feeding the blitter.
The blitter-fed-by-copper technique seems very interesting and I'd love to use it in a future project. I imagine there is a lot of room for elegant and fast optimisations here. For example by setting up a number of predefined sub-copperlists for various bob sizes and bitplane source addresses, which are then called from a main copperlist with a series of copper jumps. Modifying these jumps would be the only job the cpu would have to do each frame. |
#80 |
Inviyya Dude!
Join Date: Sep 2016
Location: Amiga Island
Posts: 2,798
|
This approach seems to be a bit of a pain in the ass for any non-general display engine, doesn't it?
Like, if you have a sprite parallax layer or a lot of copper palette changes this thing seems hard to code. |