18 October 2019, 11:00 | #41 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
Code:
void FillPolygon2D(const ScreenBuffer * screenBuffer, const ScreenBuffer * scratchScreenBufferA, const ScreenBuffer * scratchScreenBufferB, const Point2D * polygon, const UWORD n, const UWORD colour) {
    P_START // profiling

    Rectangle2D boundingRect = { -1, -1, -1, -1 };

    for (UWORD i = 0; i < n; i++) {
        if (boundingRect.top == -1 || polygon[i].value[Y] < boundingRect.top)
            boundingRect.top = polygon[i].value[Y];
        if (boundingRect.right == -1 || polygon[i].value[X] > boundingRect.right)
            boundingRect.right = polygon[i].value[X];
        if (boundingRect.bottom == -1 || polygon[i].value[Y] > boundingRect.bottom)
            boundingRect.bottom = polygon[i].value[Y];
        if (boundingRect.left == -1 || polygon[i].value[X] < boundingRect.left)
            boundingRect.left = polygon[i].value[X];
    }

    for (UWORD i = 0; i < n; i++)
        DrawOneDotLine(scratchScreenBufferA, scratchScreenBufferB, (Point2D [2]) { polygon[i], polygon[(i + 1) % n] });

    ScratchAreaFill(scratchScreenBufferB, scratchScreenBufferA, boundingRect);
    CopyScratchInColour(screenBuffer, scratchScreenBufferB, colour, boundingRect);

    for (UWORD i = 0; i < n; i++)
        DrawOneDotLine(scratchScreenBufferA, scratchScreenBufferB, (Point2D [2]) { polygon[i], polygon[(i + 1) % n] });

    P_STOP // profiling
} |
|
18 October 2019, 13:04 | #42 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
All the other stuff aside...
1. If you initialize boundingRect differently (something like: top +maxint, bottom -maxint, left +maxint, right -maxint), you can get rid of 4 comparisons with -1.
2. You are doing 2 slow divisions (modulos) per vertex. Change the for loops to for(i=0; i<n-1; i++) DrawOneDotLine(...,i,i+1), and do the final call outside as DrawOneDotLine(...,n-1,0). That way you don't have to handle the wrap per vertex. |
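The two suggestions above could be sketched roughly like this. It is only an illustration: the type definitions are stand-ins guessed from the quoted code, and BoundingRect, ForEachEdge, EdgeFn and CountEdge are hypothetical names, not functions from the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the thread's types; the real definitions are not shown there. */
typedef int16_t WORD;
typedef uint16_t UWORD;
enum { X = 0, Y = 1 };
typedef struct { WORD value[2]; } Point2D;
typedef struct { WORD top, right, bottom, left; } Rectangle2D;

/* Seed the "minimum" fields with the largest WORD and the "maximum" fields
   with the smallest, so no per-vertex test against a -1 sentinel is needed. */
Rectangle2D BoundingRect(const Point2D *polygon, UWORD n)
{
    assert(polygon != NULL);
    Rectangle2D r = { .top = INT16_MAX, .right = INT16_MIN,
                      .bottom = INT16_MIN, .left = INT16_MAX };
    for (UWORD i = 0; i < n; i++) {
        if (polygon[i].value[Y] < r.top)    r.top    = polygon[i].value[Y];
        if (polygon[i].value[X] > r.right)  r.right  = polygon[i].value[X];
        if (polygon[i].value[Y] > r.bottom) r.bottom = polygon[i].value[Y];
        if (polygon[i].value[X] < r.left)   r.left   = polygon[i].value[X];
    }
    return r;
}

/* Hypothetical edge callback: receives pointers to both end points. */
typedef void (*EdgeFn)(const Point2D *from, const Point2D *to, void *ctx);

/* Visit every edge without a modulo: n-1 edges in the loop, then the
   closing edge from the last vertex back to the first. */
void ForEachEdge(const Point2D *polygon, UWORD n, EdgeFn fn, void *ctx)
{
    assert(polygon != NULL && fn != NULL);
    if (n == 0)
        return;
    for (UWORD i = 0; i + 1 < n; i++)
        fn(&polygon[i], &polygon[i + 1], ctx);
    fn(&polygon[n - 1], &polygon[0], ctx);
}

/* Example callback that only counts the edges it is given. */
void CountEdge(const Point2D *from, const Point2D *to, void *ctx)
{
    (void)from;
    (void)to;
    ++*(int *)ctx;
}
```

In the real routine the callback would presumably wrap DrawOneDotLine; passing two pointers also avoids building a temporary two-element array per edge.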
18 October 2019, 13:55 | #43 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
I think it must be down to the overhead of the blitter interrupts. I don't think I'm spending much actual time in these routines, but the clock keeps ticking while the interrupts are being serviced. I'm adding some code to try and see what's happening. I don't think I can measure the interrupt overhead, but maybe I can count how many times I'm interrupted. |
|
18 October 2019, 13:58 | #44 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Consider blit clearing smaller areas for small polygons (might be quicker than redrawing the lines to clear the scratch)
There's probably a "size" that you could use as the cut-off point.. not sure what the value would be though, would need some experimentation. Also, it probably is quicker to render smaller polygons directly to the screen, completely with the CPU |
18 October 2019, 14:13 | #45 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
My queue is based around callbacks, so I could implement some bits as CPU-based, but right now I think the problem is the queue itself, so apart from reducing the number of interrupts generated per polygon, I don't think this would solve the problem. |
|
18 October 2019, 14:37 | #46 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Yeah, I was just checking the previous posts and noticed this (number of calls):
Blitter_AquireBlit 22556 1268
Blitter_EnqueueBlit 18440 5923
DrawOneDotLine 18006 19968
So, you are using interrupts to draw every single line? That's instantly (at least) 2 times slower than starting the blitter line draw and then calculating the next line while the blitter is working. Even if you inline aquire/enque and optimize in asm, it's still super slow.
In general... I'd use interrupts only for larger blits (and not for lines), and if I needed it done asynchronously I'd rather use copper lists, which could be very tricky if you are not doing stuff with fixed frame rates (e.g. demo effects). |
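A rough sketch of the "start the line, then calculate the next one while the blitter works" pattern is below. It runs against a simulated blitter so the control flow can be tried on any host; FakeBlitter, LineSetup and all function names are hypothetical, and PrepareLine only computes deltas rather than real blitter register values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simulated blitter: stays busy for a fixed number of status polls. */
typedef struct {
    int busyPolls;     /* polls left until the current blit finishes */
    int pollsPerBlit;  /* how long each simulated blit lasts */
    int started;       /* number of blits started so far */
    int wastedPolls;   /* polls spent busy-waiting */
} FakeBlitter;

typedef struct { int x0, y0, x1, y1; } Line;

/* CPU-side precomputation for one line (illustrative values only). */
typedef struct { int major, minor; } LineSetup;

LineSetup PrepareLine(const Line *line)
{
    int dx = abs(line->x1 - line->x0);
    int dy = abs(line->y1 - line->y0);
    LineSetup s = { dx > dy ? dx : dy, dx > dy ? dy : dx };
    return s;
}

/* Poll the simulated busy flag; each busy poll counts as wasted time. */
bool BlitterBusy(FakeBlitter *b)
{
    if (b->busyPolls == 0)
        return false;
    b->busyPolls--;
    b->wastedPolls++;
    return true;
}

void StartLine(FakeBlitter *b, const LineSetup *setup)
{
    assert(b->busyPolls == 0);  /* never start while a blit is running */
    (void)setup;                /* real code would write the registers here */
    b->busyPolls = b->pollsPerBlit;
    b->started++;
}

void DrawLinesOverlapped(FakeBlitter *b, const Line *lines, int n)
{
    for (int i = 0; i < n; i++) {
        /* This setup runs while the previous line is still being drawn. */
        LineSetup next = PrepareLine(&lines[i]);
        while (BlitterBusy(b)) { }  /* wait only once the setup is done */
        StartLine(b, &next);
    }
}
```

On real hardware the setup time is hidden behind the previous blit; the simulation cannot show that saving, only the ordering of prepare, wait and start.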
18 October 2019, 14:45 | #47 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
I would imagine he is calculating his lines / fills / copies with the CPU into one buffer, while the 2nd buffer is being rendered with the interrupt??
|
18 October 2019, 14:53 | #48 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
I thought that what I'd lose with interrupt overhead I'd gain in never having to busy-wait. I don't see a practical way to mix and match an interrupt-driven blitter queue with ad hoc blits. I suppose I could write something akin to a yield function that would consume from the queue instead of an interrupt. But that does not spark joy. |
|
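For what it's worth, a "yield"-style consumer for a callback-based queue might look something like the sketch below. This is not the queue from the project: BlitQueue, Queue_Push, Queue_Yield and CountJob are hypothetical names, and the blitter-idle state is passed in as a flag instead of being read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 16u  /* must be a power of two */

typedef void (*BlitFn)(void *arg);
typedef struct { BlitFn fn; void *arg; } BlitJob;

typedef struct {
    BlitJob jobs[QUEUE_SIZE];
    unsigned head;  /* index of the next job to run (free-running) */
    unsigned tail;  /* index of the next free slot (free-running) */
} BlitQueue;

/* Append a job; returns false when the queue is full. */
bool Queue_Push(BlitQueue *q, BlitFn fn, void *arg)
{
    assert(q != NULL && fn != NULL);
    if (q->tail - q->head == QUEUE_SIZE)
        return false;
    q->jobs[q->tail & (QUEUE_SIZE - 1u)] = (BlitJob){ fn, arg };
    q->tail++;
    return true;
}

/* Called from the main loop instead of from an interrupt: if the blitter
   is idle and a job is waiting, run exactly one job and report it. */
bool Queue_Yield(BlitQueue *q, bool blitterIdle)
{
    assert(q != NULL);
    if (!blitterIdle || q->head == q->tail)
        return false;
    BlitJob job = q->jobs[q->head & (QUEUE_SIZE - 1u)];
    q->head++;
    job.fn(job.arg);
    return true;
}

/* Example job that just counts how often it ran. */
void CountJob(void *arg)
{
    ++*(int *)arg;
}
```

If an interrupt handler also consumed from the same queue, the head and tail updates would need protecting against each other; the sketch ignores that.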
18 October 2019, 14:54 | #49 |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
|
18 October 2019, 15:10 | #50 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
Well, context (blit size in this case) matters... Interrupts work with few large blits. Many small blits, probably best to do it all with cpu (and busy waits). And with mixed sizes, copper if you can afford it. Otherwise I'd try cpu and busy waits, *and* set blitter nasty with area blits (clear/copy/fill). And if you have some really large blits, try to decouple them from the pipeline and do with interrupts.
Doing it all with a single routine, like in this case, obviously won't cut it ;\. |
18 October 2019, 15:14 | #51 |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
As it stands, to draw just the first frame takes 178 blits, so 178 BLIT interrupts will be generated. Just how much overhead are we talking about to process an interrupt? If it takes 500ms to render the screen, and most of the time is spent in interrupt overhead, then that implies around 2ms of overhead per interrupt. That sounds like a big number.
|
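The back-of-the-envelope figure can be checked with a few lines of C. The clock value below is an assumption (the PAL 68000 clock of about 7.09 MHz); the function names are made up for the example.

```c
#include <assert.h>

/* Assumed PAL A500 CPU clock in Hz; an NTSC machine runs slightly faster. */
#define PAL_68000_HZ 7093790.0

/* Average time per interrupt if the whole period were interrupt overhead. */
double MsPerInterrupt(double totalMs, unsigned interrupts)
{
    assert(interrupts > 0);
    return totalMs / interrupts;
}

/* Convert a duration in milliseconds to CPU cycles at the given clock. */
double MsToCycles(double ms, double clockHz)
{
    return ms * clockHz / 1000.0;
}
```

With 500 ms and 178 interrupts this gives an upper bound of roughly 2.8 ms each, which at the assumed clock is on the order of 20,000 cycles per interrupt.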
18 October 2019, 15:18 | #52 |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,413
|
Using interrupts to control the Blitter is a very nice and elegant way of doing things. I've experimented with it for quite a while.
However, nice as it is from a programmer's viewpoint (no busy waiting, simply queue stuff), it is usually quite slow on (low end) Amigas. As of now, I've not managed to make my interrupt based blitting run anywhere near the speed of my non-interrupt based blitting.
Although I've not found any form of blitting that was consistently faster using interrupts, I have found that blits that don't use all DMA slots (such as drawing lines and clear blits) do better than those that do use all DMA slots (such as copy and cookie cut blits). This is because these types of blits allow the CPU to do more useful work while blitting. However, even with that, interrupt based blitting was up to twice as slow for me as using a 'smart blitter wait'* (depending on size of blit - smaller was much closer to this than bigger).
Basically, interrupt overhead can cost as much or more than what you normally spend setting up blits on an A500.
*) Meaning: set up as much of the next blit as possible while the previous blit is running and only then start waiting. When you are finally waiting, set BLTPRI in DMACON to one ('Blitter Nasty mode') first. When done waiting, switch it back to zero.
Edit: on overhead numbers...
IIRC my assembly blitter routine ends up taking around 150-200 cycles per blit (in total). An interrupt takes a minimum of around 70 cycles assuming you just immediately RTE without doing anything. Add in saving/restoring registers and acknowledging the interrupt and you're looking at around 150-200 cycles in overhead for an interrupt.
Doing 178 blits per frame would then cost my code between ~27.000 and ~71.000 cycles in overhead (without vs with interrupt). For reference, the 68000 runs at ~7MHz. So it has about 141.000 cycles per frame to play with. Any cycle used by the CPU can't be used by the blitter so that suffers too.
The above is all based on A500/68000.
Last edited by roondar; 18 October 2019 at 15:32. Reason: Grammar stuff / corrected my cycle numbers |
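The cycle estimates above can be reproduced with a small calculation. The per-blit costs are the rough figures quoted in the post; the exact clock (7,093,790 Hz) and 50 frames per second are assumptions for a PAL machine, and the names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned long low, high; } CycleRange;

/* Total per-frame overhead for a number of blits, given a low and a high
   estimate of the CPU cycles spent per blit. */
CycleRange OverheadPerFrame(unsigned blits, unsigned lowPerBlit, unsigned highPerBlit)
{
    assert(lowPerBlit <= highPerBlit);
    CycleRange r = { (unsigned long)blits * lowPerBlit,
                     (unsigned long)blits * highPerBlit };
    return r;
}

/* CPU cycles available in one display frame. */
unsigned long CyclesPerFrame(unsigned long clockHz, unsigned framesPerSecond)
{
    assert(framesPerSecond > 0);
    return clockHz / framesPerSecond;
}
```

For 178 blits, 150-200 cycles of setup alone gives 26,700 to 35,600 cycles; adding another 150-200 cycles of interrupt overhead per blit gives 53,400 to 71,200 cycles, out of roughly 141,875 cycles in a PAL frame.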
18 October 2019, 15:37 | #53 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
|
|
18 October 2019, 15:41 | #54 | |
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,413
|
Quote:
A question here would be what the performance difference is between your compiled C code and assembly. I honestly don't know; it could be the compiler makes very decent code and the numbers are similar. It could also be the code is much slower. Perhaps this is worth investigating by itself? Or can we already tell the per-blit setup time from the numbers you posted?
My reason for pointing this out is that if adding a small bit of assembly to your code would make a big difference, it may be worthwhile. |
|
18 October 2019, 15:52 | #55 |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
I think the compiler is doing a good job, I can't see anything that would cause a massive amount of extra overhead (remembering I'm trying to find where half a second has gone).
Code:
00001202 <_InterruptHandler_handleLevel3Interrupt>:
static __attribute__((interrupt)) void _InterruptHandler_handleLevel3Interrupt(void) {
    1202: 48e7 e0c0       movem.l d0-d2/a0-a1,-(sp)
    for (UWORD intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT);
    1206: 3439 00df f01e  move.w dff01e <_end+0xddb07e>,d2
    120c: 3039 00df f01c  move.w dff01c <_end+0xddb07c>,d0
    1212: c440            and.w d0,d2
    1214: 0242 0070       andi.w #112,d2
    1218: 6700 0096       beq.w 12b0 <_InterruptHandler_handleLevel3Interrupt+0xae>
         intreqr_intenar;
         intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT)) {
        if (intreqr_intenar & INTF_COPER) {
    121c: 0802 0004       btst #4,d2
    1220: 6724            beq.s 1246 <_InterruptHandler_handleLevel3Interrupt+0x44>
            custom->intreq = (UWORD) INTF_COPER; custom->intreq = (UWORD) INTF_COPER;
    1222: 33fc 0010 00df  move.w #16,dff09c <_end+0xddb0fc>
    1228: f09c
    122a: 33fc 0010 00df  move.w #16,dff09c <_end+0xddb0fc>
    1230: f09c
            interruptHandler->_processInterrupt(interruptHandler, INTB_COPER);
    1232: 2079 0000 7fc0  movea.l 7fc0 <interruptHandler>,a0
    1238: 4878 0004       pea 4 <_start+0x4>
    123c: 2f08            move.l a0,-(sp)
    123e: 2068 0008       movea.l 8(a0),a0
    1242: 4e90            jsr (a0)
    1244: 508f            addq.l #8,sp
        }
        if (intreqr_intenar & INTF_VERTB) {
    1246: 0802 0005       btst #5,d2
    124a: 6724            beq.s 1270 <_InterruptHandler_handleLevel3Interrupt+0x6e>
            custom->intreq = (UWORD) INTF_VERTB; custom->intreq = (UWORD) INTF_VERTB;
    124c: 33fc 0020 00df  move.w #32,dff09c <_end+0xddb0fc>
    1252: f09c
    1254: 33fc 0020 00df  move.w #32,dff09c <_end+0xddb0fc>
    125a: f09c
            interruptHandler->_processInterrupt(interruptHandler, INTB_VERTB);
    125c: 2079 0000 7fc0  movea.l 7fc0 <interruptHandler>,a0
    1262: 4878 0005       pea 5 <_start+0x5>
    1266: 2f08            move.l a0,-(sp)
    1268: 2068 0008       movea.l 8(a0),a0
    126c: 4e90            jsr (a0)
    126e: 508f            addq.l #8,sp
        }
        if (intreqr_intenar & INTF_BLIT) {
    1270: 0802 0006       btst #6,d2
    1274: 6790            beq.s 1206 <_InterruptHandler_handleLevel3Interrupt+0x4>
            custom->intreq = (UWORD) INTF_BLIT; custom->intreq = (UWORD) INTF_BLIT;
    1276: 33fc 0040 00df  move.w #64,dff09c <_end+0xddb0fc>
    127c: f09c
    127e: 33fc 0040 00df  move.w #64,dff09c <_end+0xddb0fc>
    1284: f09c
            interruptHandler->_processInterrupt(interruptHandler, INTB_BLIT);
    1286: 2079 0000 7fc0  movea.l 7fc0 <interruptHandler>,a0
    128c: 4878 0006       pea 6 <_start+0x6>
    1290: 2f08            move.l a0,-(sp)
    1292: 2068 0008       movea.l 8(a0),a0
    1296: 4e90            jsr (a0)
         intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT))
    1298: 3039 00df f01e  move.w dff01e <_end+0xddb07e>,d0
    129e: 3439 00df f01c  move.w dff01c <_end+0xddb07c>,d2
    12a4: c440            and.w d0,d2
    12a6: 0242 0070       andi.w #112,d2
    for (UWORD intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT);
    12aa: 508f            addq.l #8,sp
    12ac: 6600 ff6e       bne.w 121c <_InterruptHandler_handleLevel3Interrupt+0x1a>
        }
    }
}
    12b0: 4cdf 0307       movem.l (sp)+,d0-d2/a0-a1
    12b4: 4e73            rte

000012b6 <_InterruptHandler_handleLevel4Interrupt>:
    12b6: 4e73            rte

000012b8 <_InterruptHandler_handleLevel5Interrupt>:
    12b8: 4e73            rte

000012ba <_InterruptHandler_handleLevel6Interrupt>:
    12ba: 4e73            rte |
18 October 2019, 16:44 | #56 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
If we look at the profiling...
FillPolygon2D: 39sec
- almost all aquire/enqueue calls are done here, so 1sec+6sec=7sec
- copyscratch 3sec
- DrawOneDotLine 20sec
That leaves us with 39-7-3-20=9sec. Since it was measured before the suggested adjustments:
- boundingbox: always 8 cmps per vertex except 4 for the first one
- 2 divs per vertex (that's 300 cycles on average, say 4 vertices per poly and 50 polys is then 60k extra cycles per drawn frame)
- DrawOneDotLine is using a newly constructed struct with two member structs as 3rd parameter instead of e.g. taking two Point2D pointers
- majority of calls are done here, everything is passed via stack with extra copy/paste (e.g. scratchScreenBufferA, scratchScreenBufferB are the same for all calls).
So while 9sec looks like a lot, it's reasonable considering the above. I don't see anything being out of line. |
18 October 2019, 18:07 | #57 | |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
Quote:
If I remove the calls to ScratchAreaFill and CopyScratchInColour from FillPolygon2D, the amount of time spent in DrawOneDotLine should not be affected (apart from the time spent catching up on the blits for those two calls). In my current test code (with your initial optimisations to FillPolygon2D implemented) the time spent in DrawOneDotLine drops from 30s to 26s.
These two function calls take almost the same parameters. I can leave the call to ScratchAreaFill in while removing the call to CopyScratchInColour. The time spent in DrawOneDotLine returns to 30s.
I'm not sure what this tells me, but it does make me think that a large proportion of what the clock says is spent in FillPolygon2D is actually spent elsewhere, in interrupts or interrupt overhead. And the amount of "lost" time seems too high to be explained by interrupt overhead or inefficiencies due to compiling from C. |
|
18 October 2019, 19:26 | #58 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,039
|
First, small correction. I missed fillscratch, so that's 5sec for fill+copy and 7sec for code within FillPolygon2D...
How about if you test these scenarios:
- completely skip blitter code in the interrupt handler (only handle intena bits),
- only handle clear screen and similar (skip poly related blits),
- instead of actual poly blits, use size=(1<<6)+1 D=0 blits (minimal size blits, so you keep triggering interrupts),
- anything similar that comes to your mind.
So you can see how much overhead you have from pure interrupts, interrupts with minimal blits, etc. and then compare. |
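For reference, the size=(1<<6)+1 value in the third scenario is the blitter size word for a blit one line high and one word wide. A small helper for building such values might look like this; BltSize is a hypothetical name, and the limits assume the original (OCS/ECS) BLTSIZE register layout, with the height in the upper 10 bits and the width in words in the lower 6 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a blit height (in lines) and width (in words) into a BLTSIZE value.
   The maximum values wrap to 0 in their fields, which the hardware treats
   as 1024 lines and 64 words respectively. */
uint16_t BltSize(unsigned heightLines, unsigned widthWords)
{
    assert(heightLines >= 1 && heightLines <= 1024);
    assert(widthWords >= 1 && widthWords <= 64);
    return (uint16_t)(((heightLines & 0x3FFu) << 6) | (widthWords & 0x3Fu));
}
```

BltSize(1, 1) yields 65, the same value as (1<<6)+1, so a test build could enqueue that in place of each real polygon blit.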
18 October 2019, 20:31 | #59 |
Lemon. / Core Design
Join Date: Mar 2016
Location: Tier 5
Posts: 1,212
|
Of course, you have to remember that you are going to be writing your blitter values to a memory buffer, and then reading them again before writing them AGAIN to the actual blitter hardware registers... this (with the overhead of the blitter interrupt request) is going to eat a LOT of extra cycles.... the only main advantage is that you are not halting the CPU with blit waits... perhaps with a few large blits, an interrupt is better, and with a lot of small blits, not so good.
Makes you wonder why nearly ALL the games used software CPU rendering back in the day.
And demos are designed around the amiga hardware (filling convex objects in one pass etc..) so they will use the blitter to the max in the most efficient way possible.
To me, the "holy grail" of Amiga 3D would be to have a system that plotted the edge points of polygons in a complex scene, allowing the whole scene to be filled in one blitter fill operation at the end... something I have been looking at on and off since around 1991.
Last edited by DanScott; 18 October 2019 at 20:39. |
18 October 2019, 21:11 | #60 |
It's coming back!
Join Date: Jul 2018
Location: comp.sys.amiga
Posts: 762
|
I've added profiling to my interrupt code. The numbers make me think there's a problem there.
Code:
_InterruptHandler_handleLevel3Interrupt 33426 40725
_GameInterruptHandler_processVERTB 7793 528
_GameInterruptHandler_processBLIT 29293 1037
Obviously you can add interrupts to the list of things I don't really understand.
Code:
static __attribute__((interrupt)) void _InterruptHandler_handleLevel3Interrupt(void) {
    F_START

    for (UWORD intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT);
         intreqr_intenar;
         intreqr_intenar = custom->intreqr & custom->intenar & (INTF_COPER | INTF_VERTB | INTF_BLIT)) {
        if (intreqr_intenar & INTF_COPER) {
            custom->intreq = (UWORD) INTF_COPER; custom->intreq = (UWORD) INTF_COPER;
            interruptHandler->_processInterrupt(interruptHandler, INTB_COPER);
        }
        if (intreqr_intenar & INTF_VERTB) {
            custom->intreq = (UWORD) INTF_VERTB; custom->intreq = (UWORD) INTF_VERTB;
            interruptHandler->_processInterrupt(interruptHandler, INTB_VERTB);
        }
        if (intreqr_intenar & INTF_BLIT) {
            custom->intreq = (UWORD) INTF_BLIT; custom->intreq = (UWORD) INTF_BLIT;
            interruptHandler->_processInterrupt(interruptHandler, INTB_BLIT);
        }
    }

    F_STOP
}
Code:
Function Name                              Number of Calls    Total Elapsed Time (ms)
_InterruptHandler_handleLevel2Interrupt                  0                          0
_InterruptHandler_handleLevel3Interrupt              33426                      40725
_GameInterruptHandler_processVERTB                    7793                        528
_GameInterruptHandler_processBLIT                    29293                       1037
PlayGame                                                 1                     156282
FillScreen                                              88                        132
Blitter_AquireBlit                                   22556                       1337
Blitter_EnqueueBlit                                  22556                      61151
Renderer_TransformModel                                 88                       3144
Renderer_RenderModel                                    88                     152210
RenderFace                                            2237                     144695
ClipPolyToNearPlane                                   2187                       2054
ClipAndFillPolygon2D                                  2187                     138433
FillPolygon2D                                         2187                     129042
DrawOneDotLine                                       18006                      74168
ScratchAreaFill                                       2187                       9254
CopyScratchInColour                                   2187                      24010
DrawingComplete                                         88                        153
Last edited by deimos; 18 October 2019 at 21:38. |