English Amiga Board


13 June 2022, 18:27   #41
paraj
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,098
Quote:
Originally Posted by VladR View Post
Given 6 bitplanes, the CPU will be at about 54% utilization after all the DMA - right ?
So, 0.54*119,333 = 64,439c available per frame, which results in 126 px (64439/510) rendered per frame.
My back-of-the-envelope calculation was that 320x256x6 would take 320*256*6/16 = 30,720 out of 313*223 = 69,799 available slots, roughly 44%, so 54% left for the CPU sounds about right.
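The slot estimate above can be checked with a few lines of Python. This is only a sketch of the same back-of-the-envelope arithmetic: the variable names are made up for illustration, and the 313 x 223 slot count is taken straight from the post rather than from a hardware reference.

```python
# Back-of-the-envelope bitplane DMA load for a 320x256, 6-bitplane screen.
# Each bitplane fetch moves one 16-bit word, i.e. 16 pixels of one plane.
WIDTH, HEIGHT, PLANES = 320, 256, 6
LINES_PER_FRAME = 313   # lines per frame (figure from the post)
SLOTS_PER_LINE = 223    # available memory slots per line (figure from the post)

bitplane_slots = WIDTH * HEIGHT * PLANES // 16
total_slots = LINES_PER_FRAME * SLOTS_PER_LINE
dma_share = bitplane_slots / total_slots

print(f"{bitplane_slots} of {total_slots} slots = {dma_share:.1%} for bitplane DMA")
print(f"{1 - dma_share:.1%} left over")
```

The result lands within a couple of percent of the 54% CPU figure quoted above, which is as close as an estimate of this kind is likely to get.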
Quote:
Originally Posted by VladR View Post
I am certainly curious how we can use the Blitter for this scenario.
Had a quick go at it. For power-of-two bitplane widths (as a/b suggested; otherwise it's not as easy as I first envisioned) it's straightforward once you have the idea. You start with a list of 16-bit x,y coordinates and a code buffer (of the same size + 2 bytes) in chipmem.

The 1st blitter pass goes in reverse (so you can shift left) over the x coordinates and outputs the wanted instruction with the correct data register. If you can ensure the y-coordinate is preshifted/multiplied by the row size (if you're doing 3D stuff anyway, maybe you could fold it into your projection routine), then just a second pass is needed to combine the two into an offset for the instruction. Otherwise you need two more passes, one for x and one for y, shifting them correctly into place.

More concretely, say you have a 256x256 screen with a line going from 0,0 to 255,255:
Code:
linelist: dc.w 0,0,1,1,....
After the first pass you'd have:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0000                or.b d1,$0000(a0)
8528 0000                or.b d2,$0000(a0)
8728 0000                or.b d3,$0000(a0)
...
Then updating the x-coordinate:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0000                or.b d1,$0000(a0)
...

8f28 0000                or.b d7,$0000(a0)
8128 0001                or.b d0,$0001(a0)
Finally the y-coordinate:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0020                or.b d1,$0020(a0)
...

8F28 00E0                or.b d7,$00e0(a0)
8128 0101                or.b d0,$0101(a0)
Put an RTS instruction at the end and you have a function that will (with d0-d7 set to $80..$01) plot pixels in a bitplane. Repeat for all the bitplanes where you need bits set. Or generate eor.b instead if you think it'll be quicker to clear the bitplanes that way rather than with normal clearing. Generate and.b (and set d0-d7 to the complement) if you need explicit clearing. If you need some combination, you'd only need to rerun step 1 to change the instruction.
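To make the three stages concrete, here is a small Python model of what the finished code buffer should contain for a 256-pixel-wide bitplane (32 bytes per row). It does on the CPU what the blitter passes are meant to produce; `plot_instruction` and `generate` are hypothetical helper names, not part of the original code.

```python
# Model of the generated code for a 256-pixel-wide bitplane (32 bytes per row).
# or.b dN,d16(a0) encodes as $8128 with the data register number in bits 9-11.
OR_B_D0_D16_A0 = 0x8128
ROW_BYTES = 256 // 8
RTS = 0x4E75

def plot_instruction(x, y):
    """Return (opcode, displacement) for plotting pixel (x, y)."""
    opcode = OR_B_D0_D16_A0 | ((x & 7) << 9)   # pass 1: pick d0..d7 from x & 7
    offset = (x >> 3) + y * ROW_BYTES          # later passes: byte column + row
    return opcode, offset & 0xFFFF

def generate(points):
    """Build the word list for all points, terminated by rts."""
    words = []
    for x, y in points:
        words.extend(plot_instruction(x, y))
    words.append(RTS)
    return words

line = [(i, i) for i in range(9)]              # (0,0) .. (8,8)
code = generate(line)
for i in range(0, len(code) - 1, 2):
    print(f"{code[i]:04x} {code[i + 1]:04x}")
```

For the diagonal line this reproduces the listings above: 8328 0020 for (1,1), 8f28 00e0 for (7,7) and 8128 0101 for (8,8). Note that the displacement is a signed 16-bit value, so this only reaches 32 KB above a0; a 256x256 bitplane (8 KB) fits comfortably.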

Last edited by paraj; 13 June 2022 at 19:17.
13 June 2022, 20:15   #42
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
You can still do it in 2 passes by shifting Y 16-* in the opposite direction (and doing a few minor adjustments).
In one of my routines I'm using fixed point 11:5 for both X and Y, and screen width 512 (64=2^6 bytes). So I'd have to shift X by 3+5=8 to the right, and shift Y by 6-5=1 to the left. And since that doesn't work, I did 16-(6-5)=15 to the right for Y. With adjusted Y bltptr and an extra row in bltsize (height+1).

EDIT: Forgot to mention that you also have to (manually) patch every 1024th pixel because you're doing multiple blits and the Y bits you shift out of the last row won't carry over to next blit's first row.
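The "shift right by 15 instead of left by 1" trick can be illustrated with a simplified Python model of the blitter's barrel shifter in ascending mode, where bits shifted out of one word are carried into the top of the next. This is a sketch under assumptions: one Y word per blitter row, a hypothetical `ROW_MASK` that keeps only the row-offset bits, and no modelling of pointers or modulos.

```python
def blit_shift_right(words, shift):
    """Simplified ascending-mode shifter: bits shifted out of one word
    are carried into the top of the next word."""
    out, prev = [], 0
    for w in words:
        out.append((((prev << 16) | w) >> shift) & 0xFFFF)
        prev = w
    return out

ROW_MASK = 0xFFC0   # assumed mask: keep only (integer Y) * 64

ys = [0x0020, 0x0145, 0x7FE0]               # 11:5 fixed-point Y values
shifted = blit_shift_right(ys + [0], 15)    # one extra row flushes the last value

results = []
for i, y in enumerate(ys):
    wanted = ((y >> 5) << 6) & 0xFFFF       # integer Y times 64 bytes per row
    got = shifted[i + 1] & ROW_MASK         # the result lands one word later
    results.append((wanted, got))
    print(f"{y:04x} -> {got:04x} (wanted {wanted:04x})")
```

Each result shows up one word later than its source, which is why the destination pointer has to be adjusted and the blit needs one extra row. It also shows why the bits shifted out of the last row are lost when the work is split across several blits.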

Last edited by a/b; 13 June 2022 at 20:56. Reason: patching
14 June 2022, 18:20   #43
paraj
Quote:
Originally Posted by a/b View Post
You can still do it in 2 passes by shifting Y 16-* in the opposite direction (and doing a few minor adjustments).
In one of my routines I'm using fixed point 11:5 for both X and Y, and screen width 512 (64=2^6 bytes). So I'd have to shift X by 3+5=8 to the right, and shift Y by 6-5=1 to the left. And since that doesn't work, I did 16-(6-5)=15 to the right for Y. With adjusted Y bltptr and an extra row in bltsize (height+1).

EDIT: Forgot to mention that you also have to (manually) patch every 1024th pixel because you're doing multiple blits and the Y bits you shift out of the last row won't carry over to next blit's first row.

Ah cool, I considered that something like that should be possible, but it seemed like something that would take quite a bit of work to get right, though maybe now I'll have to give it a shot
15 June 2022, 00:21   #44
VladR
Registered User
 
Join Date: Dec 2019
Location: North Dakota
Posts: 741
Quote:
Originally Posted by paraj View Post
My back-of-the-envelope calculation was that 320x256x6 would take 320*256*6/16 = 30,720 out of 313*223 = 69,799 available slots, roughly 44%, so 54% left for the CPU sounds about right.

Had a quick go at it. For power-of-two bitplane widths (as a/b suggested; otherwise it's not as easy as I first envisioned) it's straightforward once you have the idea. You start with a list of 16-bit x,y coordinates and a code buffer (of the same size + 2 bytes) in chipmem.

The 1st blitter pass goes in reverse (so you can shift left) over the x coordinates and outputs the wanted instruction with the correct data register. If you can ensure the y-coordinate is preshifted/multiplied by the row size (if you're doing 3D stuff anyway, maybe you could fold it into your projection routine), then just a second pass is needed to combine the two into an offset for the instruction. Otherwise you need two more passes, one for x and one for y, shifting them correctly into place.

More concretely, say you have a 256x256 screen with a line going from 0,0 to 255,255:
Code:
linelist: dc.w 0,0,1,1,....
After the first pass you'd have:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0000                or.b d1,$0000(a0)
8528 0000                or.b d2,$0000(a0)
8728 0000                or.b d3,$0000(a0)
...
Then updating the x-coordinate:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0000                or.b d1,$0000(a0)
...

8f28 0000                or.b d7,$0000(a0)
8128 0001                or.b d0,$0001(a0)
Finally the y-coordinate:
Code:
8128 0000                or.b d0,$0000(a0)
8328 0020                or.b d1,$0020(a0)
...

8F28 00E0                or.b d7,$00e0(a0)
8128 0101                or.b d0,$0101(a0)
Put an RTS instruction at the end and you have a function that will (with d0-d7 set to $80..$01) plot pixels in a bitplane. Repeat for all the bitplanes where you need bits set. Or generate eor.b instead if you think it'll be quicker to clear the bitplanes that way rather than with normal clearing. Generate and.b (and set d0-d7 to the complement) if you need explicit clearing. If you need some combination, you'd only need to rerun step 1 to change the instruction.
Do I get the basic idea right that you prepare all the instruction opcodes first and let the Blitter compute the address offsets and bit masks ?
That is, indeed, very interesting approach!
15 June 2022, 00:41   #45
VladR
Quote:
Originally Posted by paraj View Post
Also don't know why you'd forgo using LUTs?

...

Even with this version you're not going to be drawing more than a couple of hundred pixels per frame (with a 320x256x6 display active).
I have an update on that. I implemented the LUT (2*320 longs long) and the result is quite underwhelming, honestly. Only 44 cycles were gained


I still have 2 things on my ToDo list that should shave off some more...

Code:
 Version 7
[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
 68000   7.16     119,333       4          264            452.0
 68000   7.16      64,439      64          466            138.2
---------------------------------------------------------------------

EDIT:
Code:
 Version 8 - using a 4-cycle btst.b instead of move.l/andi.w
[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
 68000   7.16     119,333       4          248            481.1
 68000   7.16      64,439      64          426            151.2
---------------------------------------------------------------------
I suppose, if I could guarantee that there would be no overdraw, then I could get rid of 6x and.l d6,($X000,a1), which is 140 cycles less for EHB (raising throughput to 225.3 pixels/frame), but that currently messes up my voxel test data (lots of overdraw for proper 3D perspective).


EDIT2:
Code:
 Version 9 - LUT for YPOS
[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
 68000   7.16     119,333       4          212            562.8
 68000   7.16      64,439      64          390            165.2
---------------------------------------------------------------------
         No Overdraw version (No AND Masking)
 68000   7.16     119,333       4          128            932.2
 68000   7.16      64,439      64          230            280.1
---------------------------------------------------------------------
There is one more option on my ToDo List - using BCLR/BSET instead of AND/OR - doesn't look like it's an instant savings, though (but needs to be implemented for completeness). After that, I am out of ideas...
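The Pixels/Frame column in these tables is just the per-frame cycle budget divided by the cost of one DrawPixel call. A minimal Python sketch of that arithmetic, using the Version 9 figures from the table above (the function name and row labels are illustrative only):

```python
def pixels_per_frame(frame_cycles, draw_pixel_cycles):
    """Pixels that fit in one frame for a given per-pixel cycle cost."""
    return frame_cycles / draw_pixel_cycles

FULL_FRAME = 119_333                  # 68000 cycles per frame (from the tables)
EHB_FRAME = int(0.54 * FULL_FRAME)    # about 54% left after 6-bitplane DMA

rows = [
    ("6502, 4 colors", 24_186, 33),
    ("68000, 4 colors", FULL_FRAME, 212),
    ("68000, 64 colors", EHB_FRAME, 390),
]
for name, budget, cost in rows:
    print(f"{name:18} {cost:4d} c/pixel -> {pixels_per_frame(budget, cost):7.1f} pixels/frame")
```

The tables in the thread appear to mix rounding and truncation in the last digit, so a difference of 0.1 against this sketch is expected in places.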

Last edited by VladR; 15 June 2022 at 15:20. Reason: Performance Update
15 June 2022, 18:32   #46
paraj
Quote:
Do I get the basic idea right that you prepare all the instruction opcodes first and let the Blitter compute the address offsets and bit masks ?
That is, indeed, very interesting approach!
1st pass can both prepare instruction and "shift" (select correct dN). For example:
Code:
        move.l  #pointlist+4*numpoints-4,bltapt(a6)
        move.l  #codebuffer+4*numpoints-4,bltdpt(a6)
        move.w  #$8128,bltbdat(a6) ; or.b d0,(a0) instruction
        move.w  #7<<9,bltcdat(a6) ; mask for x
        move.w  #9<<12!SRCA!DEST!$E4,bltcon0(a6) ; $E4 D = Bc+AC, shift x & 7 into right place
        move.w  #BLITREVERSE, bltcon1(a6)
        move.w  #numpoints*64+1,bltsize(a6)
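The minterm $E4 (D = Bc + AC) used above acts as a per-bit multiplexer: where the C mask is set the output comes from A, elsewhere it comes from B. The Python sketch below evaluates a minterm bit by bit and applies it to this first pass. It is a simplified model: it ignores the bits carried in from the neighbouring word by the shifter, which fall outside the C mask here anyway.

```python
def blit_logic(minterm, a, b, c):
    """Evaluate an 8-bit blitter minterm on 16-bit words.
    Minterm bit (A*4 + B*2 + C) is the output for that input combination."""
    d = 0
    for bit in range(16):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        d |= ((minterm >> idx) & 1) << bit
    return d

def first_pass(x):
    a = (x << 9) & 0xFFFF   # A channel: x shifted left by 9 (descending mode)
    b = 0x8128              # BLTBDAT: or.b d0,d16(a0)
    c = 7 << 9              # BLTCDAT: mask for the register field
    return blit_logic(0xE4, a, b, c)

for x in range(0, 10):
    print(f"x={x}: {first_pass(x):04x}")
```

Only the low three bits of x survive the mask, so x = 8 selects d0 again, as in the listings earlier in the thread.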
Quote:
I have an update on that. I implemented the LUT (2*320 longs long) and the result is quite underwhelming, honestly. Only 44 cycles were gained
Not saying LUTs are an instant way to get massive speed, just that they should be in your toolbox, and 44 cycles is nothing to scoff at

Quote:
get rid of 6x and.l d6,($X000,a1)
You should be operating on bytes (or words), not longwords for a putpixel routine. A plain 68000 can only access one word at a time.

Quote:
There is one more option on my ToDo List - using BCLR/BSET instead of AND/OR - doesn't look like it's an instant savings, though (but needs to be implemented for completeness). After that, I am out of ideas
In my example code I have one function for each possible color (so 64 functions for 6bpl) and jump to the correct one with a jump table, and if I didn't miscount my "overdraw" version takes 210 cycles w/o any (other) nasty tricks.
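The code from post #36 is not reproduced here, but the idea of one specialised routine per color can be sketched with a small Python generator that emits the assembly text. Everything in it is an assumption for illustration: the label names, d0/d1 holding the OR and AND masks, and the plane offsets around a1 (8 KB spacing, following the ($X000,a1) style of addressing quoted above).

```python
# Hypothetical plane layout: six planes 8 KB apart, addressed relative to a1.
PLANE_OFFSETS = [-0x4000, -0x2000, 0x0000, 0x2000, 0x4000, 0x6000]

def operand(offset):
    """Format an address operand such as (-$4000,a1) or (a1)."""
    if offset == 0:
        return "(a1)"
    sign = "-" if offset < 0 else ""
    return f"({sign}${abs(offset):04x},a1)"

def color_routine(color):
    """Emit one routine that sets or clears the pixel bit in each plane."""
    lines = [f"plot_color_{color:02d}:"]
    for plane, offset in enumerate(PLANE_OFFSETS):
        if color & (1 << plane):
            lines.append(f"    or.b    d0,{operand(offset)}")   # d0 = OR mask (assumed)
        else:
            lines.append(f"    and.b   d1,{operand(offset)}")   # d1 = AND mask (assumed)
    lines.append("    rts")
    return lines

routines = [color_routine(c) for c in range(64)]
print("\n".join(routines[0b100101]))
```

Because the color is baked into each routine, there is no per-plane bit test or branch at run time; the cost moves into the jump-table dispatch. In practice an assembler macro does the same job as this generator.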
16 June 2022, 14:56   #47
VladR
Quote:
Originally Posted by paraj View Post
You should be operating on bytes (or words), not longwords for a putpixel routine. A plain 68000 can only access one word at a time.
Yeah, but then you have to split top and bottom 16 bits, which I totally didn't feel like doing just yet
I did keep regretting that decision during last 8 attempts at optimization, as the difference in cycles between 16 and 32-bit adds up pretty quickly everywhere.


Quote:
Originally Posted by paraj View Post
In my example code I have one function for each possible color (so 64 functions for 6bpl) and jump to the correct one with a jump table, and if I didn't miscount my "overdraw" version takes 210 cycles w/o any (other) nasty tricks.
This is an exercise in patience. I certainly didn't mind writing 4 versions of DrawPixel on 6502. But 64?

210 is a really good number for 6 BPL
I guess I am going to have to work for it a bit harder



EDIT: I was just about to do the last item on the ToDo list - BSET/BCLR instead of OR/AND

Except, they don't support 32-bit memory operands (on memory they are byte-only). Hence I've got to switch to 8/16-bit access. I don't think I can do the full rewrite of the LUTs (and everything else) now; that's possible only during the weekend.

Last edited by VladR; 16 June 2022 at 15:25.
17 June 2022, 17:56   #48
paraj
Quote:
Originally Posted by VladR View Post
Yeah, but then you have to split top and bottom 16 bits, which I totally didn't feel like doing just yet
I did keep regretting that decision during last 8 attempts at optimization, as the difference in cycles between 16 and 32-bit adds up pretty quickly everywhere.
Why? If you're just plotting a pixel you're only accessing a single bit in one byte for each plane?


Quote:
Originally Posted by VladR View Post
This is an exercise in patience. I certainly didn't mind writing 4 versions of DrawPixel on 6502. But 64?

210 is a really good number for 6 BPL
I guess I am going to have to work for it a bit harder
Patience is not needed, a decent assembler is (code is in post #36)



Learning to use the more advanced features really pays dividends in speed of development, maintainability and performance. In this case I had to use some slightly esoteric features for the "colfunc" macro (to get a proper label), but otherwise it was bread and butter stuff.
17 June 2022, 18:21   #49
VladR
Quote:
Originally Posted by paraj View Post
You should be operating on bytes (or words), not longwords for a putpixel routine. A plain 68000 can only access one word at a time.
Well, I did get to rewrite it using bytes this morning (before I started working), but it's slightly underwhelming, performance-wise.

Code:
 Version 10 - accessing Bytes instead of long-words

[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
---------------------------------------------------------------------
 68000   7.16     119,333       4          173            689.8
 68000   7.16      64,439      64          337            191.2
---------------------------------------------------------------------
         No Overdraw version (No AND Masking)
 68000   7.16     119,333       4          121            986.2
 68000   7.16      64,439      64          213            302.5
         ErasePixel
 68000   7.16     119,333       4          128            932.3
 68000   7.16      64,439      64          198            325.5
I just checked the cycle table and if I am reading it right, then
Code:
or.b d7,(-$2000,a1)
Takes exact same 18 cycles like
Code:
bset d7,(-$2000,a1)

Is that correct ? If so, then it makes no sense to write a new version using bclr / bset.
17 June 2022, 18:39   #50
paraj
Quote:
Originally Posted by VladR View Post
Well, I did get to rewrite it using bytes this morning (before I started working), but it's slightly underwhelming, performance-wise.
I just checked the cycle table and if I am reading it right, then
Code:
or.b d7,(-$2000,a1)
Takes exact same 18 cycles like
Code:
bset d7,(-$2000,a1)
Is that correct ? If so, then it makes no sense to write a new version using bclr / bset.

bset.b Dn,(ofs,Am) and or.b Dn,(ofs,Am) should both take 16 cycles. Like most 68000 instructions (mul/div/shift being the most common exceptions) they're limited by each memory access taking 4 cycles. In this case 4 (word-sized) memory accesses are needed: 1 for the offset, 2 to do RMW and 1 for prefetch.


The advantage in a plain putpixel routine of using bset would come from not having to calculate a bitmask like you need for or.b (either through a LUT or by shifting).


When I recommended operating on bytes (or words), it was because it seemed like you were doing long word accesses (which double the memory access time).
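The rule of thumb in this post (4 cycles per word-sized memory access) can be turned into a tiny Python estimator. It is a rough model only: it ignores chip-RAM contention and the instructions that need extra internal cycles (mul/div/shift), and the function and parameter names are invented for this sketch.

```python
CYCLES_PER_ACCESS = 4   # one 68000 bus access, ignoring chip-RAM contention

def cycles(extension_words, data_reads, data_writes, prefetches=1):
    """Estimate cycles as 4 x (number of word-sized memory accesses)."""
    accesses = extension_words + data_reads + data_writes + prefetches
    return CYCLES_PER_ACCESS * accesses

# or.b Dn,(d16,An): offset word + read + write + prefetch
or_b = cycles(extension_words=1, data_reads=1, data_writes=1)
# and.l Dn,(d16,An): a long word needs two reads and two writes
and_l = cycles(extension_words=1, data_reads=2, data_writes=2)

print(f"or.b  Dn,(d16,An): {or_b} cycles")
print(f"and.l Dn,(d16,An): {and_l} cycles")
```

With six bitplanes the 8-cycle difference per instruction adds up to 48 cycles per pixel for the masking step alone, which is the cost of the long word accesses mentioned above.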
18 June 2022, 00:07   #51
VladR
Quote:
Originally Posted by paraj View Post
When I recommended operating on bytes (or words), it was because it seemed like you were doing long word accesses (which double the memory access time).
Yes, that was a deliberate decision on my part to not complicate things too much at the beginning, even though it was instantly obvious there would be some cycles lost due to 32-bit access being much slower.
But I knew that eventually I would get to rewrite it using bytes (which I finally did, though it took some time).

Quote:
Originally Posted by paraj View Post
The advantage in a plain putpixel routine of using bset would come from not having to calculate a bitmask like you need for or.b (either through a LUT or by shifting).
So, using bytes provided an opportunity to merge both OR and AND mask together (inside LUT) with XPOS byte offset from the start of the line. All 3 things fit into 4 bytes now.
It was faster to compute the mask when it was 4 bytes in a LUT, but it's 14 cycles to read as a byte, so it makes sense now.

It's only 10c faster, but every little bit helps. And it looks like I finally matched the 6502 with pixel throughput (though, admittedly, 3 separate DrawPixel versions (for each color) would raise that number).

Code:
 Version 11 - OR mask (byte) from LUT

[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
---------------------------------------------------------------------
 68000   7.16     119,333       4          163            732.1
 68000   7.16      64,439      64          327            197.1
---------------------------------------------------------------------
         No Overdraw version (No AND Masking)
 68000   7.16     119,333       4          113          1,056.0
 68000   7.16      64,439      64          205            314.3
         ErasePixel
 68000   7.16     119,333       4          118          1,011.3
 68000   7.16      64,439      64          190            339.2
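The merged LUT described in this post (byte offset plus OR and AND masks in 4 bytes per x position) could be built along these lines. The exact layout is an assumption for this sketch: a big-endian word with the byte offset, followed by the OR mask and the AND mask, with the leftmost pixel of each byte in bit 7.

```python
import struct

def build_xpos_lut(width=320):
    """One 4-byte entry per x: word byte-offset, byte OR mask, byte AND mask."""
    lut = bytearray()
    for x in range(width):
        byte_offset = x >> 3
        or_mask = 0x80 >> (x & 7)       # leftmost pixel is bit 7
        and_mask = ~or_mask & 0xFF      # complement, used to clear the bit
        lut += struct.pack(">HBB", byte_offset, or_mask, and_mask)
    return bytes(lut)

lut = build_xpos_lut()
x = 13
offset, or_mask, and_mask = struct.unpack_from(">HBB", lut, x * 4)
print(f"x={x}: offset={offset} or=${or_mask:02x} and=${and_mask:02x}")
```

Indexing is then x*4, so a single shift by 2 gives the entry address and each field is one word or byte read away.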
18 June 2022, 00:15   #52
VladR
Quote:
Originally Posted by paraj View Post
bset.b Dn,(ofs,Am) and or.b Dn,(ofs,Am) should both take 16 cycles. Like most 68000 instructions (mul/div/shift being the most common exceptions) they're limited by each memory access taking 4 cycles. In this case 4 (word-sized) memory accesses are needed: 1 for the offset, 2 to do RMW and 1 for prefetch.
Thank you!
I do have another cycle question. I just started using a different cycle table from https://mrjester.hapisan.com/04_MC68/CycleTimes.htm

But those numbers are different from the ones I was inferring from the PDF I got.
If I am reading it right, then the following op (ColorByte is a variable, so I presume it's the (addr).l column) takes 20c (I repeat it 6x, once for each BP), which is way more than I read from the PDF.
Code:
btst.b #0,ColorByte
If that's indeed 20c, I gotta revert to previous way...
18 June 2022, 00:28   #53
a/b
Yeah, 20c (4 fetch, 4 src operand, 2x4 dst operand, 4 mem read).
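Spelled out as arithmetic, the breakdown above compares with testing a bit held in a data register as follows. A small sketch: the 10-cycle figure for btst #n,Dn is the standard 68000 timing, and the constant names are made up here.

```python
# btst #n,(addr).l: opcode fetch, bit-number word, 32-bit address (2 words), byte read
BTST_IMM_ABS_LONG = 4 + 4 + 2 * 4 + 4
# btst #n,Dn: two instruction words plus 2 internal cycles, no data access
BTST_IMM_DREG = 10

PLANES = 6
per_pixel_memory = PLANES * BTST_IMM_ABS_LONG
per_pixel_register = PLANES * BTST_IMM_DREG

print(f"memory operand:   {BTST_IMM_ABS_LONG} c each, {per_pixel_memory} c per pixel")
print(f"register operand: {BTST_IMM_DREG} c each, {per_pixel_register} c per pixel")
```

Over six bitplanes that is a 60-cycle difference per pixel, before counting the one-off cost of loading the color into a register.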
18 June 2022, 01:54   #54
VladR
Quote:
Originally Posted by a/b View Post
Yeah, 20c (4 fetch, 4 src operand, 2x4 dst operand, 4 mem read).
Thanks.

But, then the numbers in my last summary table are off and I need to recompute it all. Probably best to go over every single instruction using the new cycle table. Highly likely there are few other errors...
18 June 2022, 02:40   #55
a/b
I don't make any claims this is 100% accurate but it shouldn't be far off.
Attached Files
File Type: txt m68k.txt (33.5 KB, 112 views)
18 June 2022, 03:53   #56
VladR
Damn, this is amazing. Thank you !
I no longer have to keep switching to Adobe Acrobat; I can just keep this file open in the second window of Notepad++, seeing both the method I am timing and the table at the same time !!!
19 June 2022, 14:05   #57
VladR
Quote:
Originally Posted by a/b View Post
Yeah, 20c (4 fetch, 4 src operand, 2x4 dst operand, 4 mem read).
So, that raised the 6-BPL version from 327c to 415c, but I instantly went back to one of the previous versions (to which I had also assigned a wrong cycle value from that PDF) that used btst against a register. It never made sense to me why an op working against a register would be slower than one against RAM, but I didn't question the PDF...
Now I'm at 353c:
Code:
Version 12 - BTST #x,d2

[c] : Cycles
EHB : 0.54*119,333 = 64,439c (available cycles after DMA given ~54% utilization)
---------------------------------------------------------------------
| CPU  |  MHz | Frame [c]  | Colors | DrawPixel [c] |  Pixels/Frame |
---------------------------------------------------------------------
  6502   1.79      24,186       4           33            732.9
---------------------------------------------------------------------
 68000   7.16     119,333       4          165            723.2
 68000   7.16      64,439      64          353            182.5
---------------------------------------------------------------------
         No Overdraw version (No AND Masking)
 68000   7.16     119,333       4          121            986.2
 68000   7.16      64,439      64          237            271.9
         ErasePixel
 68000   7.16     119,333       4           96          1,243.1
 68000   7.16      64,439      64          168            383.6
At this point, for a CPU-based plotter, I should create separate versions for each color, like on Atari (though there were just 4 versions for 4 colors, not 64).
Now I can go examine the Blitter-based approaches...
19 June 2022, 17:15   #58
a/b
Quote:
Originally Posted by VladR View Post
...btst against a register...
In case you don't have to preserve the color and if I understand correctly what you are doing, did you consider lsl.b #3,dx/bcc for the top bit and then add.b dx,dx/bcc for the rest (instead of btst#y,dx six times)?
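What this shift sequence does to a 6-bit color can be emulated in Python. The sketch below tracks an 8-bit register and records the carry flag after lsl.b #3 and after each of the five add.b dx,dx steps; the function name is made up for illustration.

```python
def bits_via_carry(color):
    """Emulate lsl.b #3 followed by five add.b dx,dx on an 8-bit register,
    collecting the carry flag after each step."""
    d = color & 0xFF
    carries = []
    # lsl.b #3: carry = last bit shifted out = original bit 5
    carries.append((d >> 5) & 1)
    d = (d << 3) & 0xFF
    for _ in range(5):
        carries.append((d >> 7) & 1)    # add.b dx,dx: carry = old bit 7
        d = (d << 1) & 0xFF
    return carries

color = 0b101101
print(bits_via_carry(color))
```

The carries come out in the order bit 5, 4, 3, 2, 1, 0, so each bcc decides one bitplane. The register is destroyed in the process, which is why the suggestion only applies when the color does not have to be preserved.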
19 June 2022, 19:58   #59
VladR
Quote:
Originally Posted by a/b View Post
In case you don't have to preserve the color and if I understand correctly what you are doing, did you consider lsl.b #3,dx/bcc for the top bit and then add.b dx,dx/bcc for the rest (instead of btst#y,dx six times)?
Somewhere around version 4 I was doing bitshifting. But then I found something faster. However, I should revisit it again because of the new cycle table, as I don't recall how many cycles was that. But I don't think it was 10 as the btst...
Either way, I don't think I understand what you mean here. Could you please elaborate ? Once you shift it left, you lose those bits. And any of the 6 bits might be on (and across the whole screen they will be, as the input range of color is <0,63>)

Either way, this is my current code:
Code:
    ; d0:ypos   d1:xpos   d2:color     a2/a3: LUTs
		;  Compute Address Offset (xpos,ypos) : (yp*40) + (xp / 8)
	asl.w #2,d0
	move.l	(a3,d0),a1	; a1 = vidPtr [(yp * 40)]
	
	asl.w #2,d1		; d1 = (xpos*4) : ArrayIndex into LUT_XPOS_REL
	add.w	(a2,d1),a1	; a1 += xpos address Offset
	move.b	3(a2,d1),d3	; d3 = MaskAND = $FF - (1 << xpRelMask)
		
		; MaskAND:	Clear all bits
	and.b d3,(-$4000,a1)
	and.b d3,(-$2000,a1)
	and.b d3,(a1)
	and.b d3,($2000,a1)
	and.b d3,($4000,a1)
	and.b d3,($6000,a1)
	
		; MaskOR:	d3 = (1 << xpRelMask)
	move.b	2(a2,d1),d3

	btst #0,d2			; 10c 
	beq dp9_2
	or.b d3,(-$4000,a1)		; BP1
		dp9_2:
	btst #1,d2
	beq dp9_3
	or.b d3,(-$2000,a1)		; BP2
		dp9_3:
	btst #2,d2
	beq dp9_4
	or.b d3,(a1)			; BP3
		dp9_4:
	btst #3,d2
	beq dp9_5
	or.b d3,($2000,a1)		; BP4
		dp9_5:
	btst #4,d2
	beq dp9_6
	or.b d3,($4000,a1)		; BP5
		dp9_6:
	btst #5,d2
	beq dp9_7
	or.b d3,($6000,a1)		; BP6
		dp9_7:
19 June 2022, 21:07   #60
saimo
Registered User
Join Date: Aug 2010
Location: Italy
Posts: 787
@VladR

I never had to write a generic pixel-plotting routine for planar graphics in my life (at least, I can't remember), so this attracted my attention. I couldn't help but take your code and whip up an alternative version that minimizes the memory accesses (which is crucial, given that you're using 6 bitplanes).
Written on the fly and totally untested, so apologies if it contains bugs! - anyway, even in that case, it's still good enough to illustrate the concepts.

Code:
    asl.w   #2,d0
    movea.l (a3,d0.w),a1   ;line base address
    move.w  d1,d0
    lsr.w   #3,d1          ;X offset
    adda.w  d1,a1          ;pixel base address

    moveq.l #7,d1
    and.w   d1,d0
    sub.w   d0,d1          ;bit number
    moveq.l #0,d0
    bset.l  d1,d0          ;OR mask
    move.b  d0,d1
    not.b   d1             ;AND mask

    move.b  ($6000,a1),d3
    and.b   d1,d3
    lsl.b   #3,d2
    bcc.b   .b5
    or.b    d0,d3
.b5 move.b  d3,($6000,a1)

    move.b  ($4000,a1),d3
    and.b   d1,d3
    add.b   d2,d2
    bcc.b   .b4
    or.b    d0,d3
.b4 move.b  d3,($4000,a1)

    move.b  ($2000,a1),d3
    and.b   d1,d3
    add.b   d2,d2
    bcc.b   .b3
    or.b    d0,d3
.b3 move.b  d3,($2000,a1)

    move.b  (a1),d3
    and.b   d1,d3
    add.b   d2,d2
    bcc.b   .b2
    or.b    d0,d3
.b2 move.b  d3,(a1)

    move.b  (-$2000,a1),d3
    and.b   d1,d3
    add.b   d2,d2
    bcc.b   .b1
    or.b    d0,d3
.b1 move.b  d3,(-$2000,a1)

    move.b  (-$4000,a1),d3
    and.b   d1,d3
    add.b   d2,d2
    bcc.b   .b0
    or.b    d0,d3
.b0 move.b  d3,(-$4000,a1)
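Since the routine above is untested, a plain reference model is handy for checking any assembly version against. The Python sketch below mirrors its structure (one read-modify-write per plane, clearing the bit and then setting it if the color bit is on); the 320x256 screen size and the helper names are assumptions, and separate bytearrays stand in for the planes.

```python
ROW_BYTES = 40      # 320 pixels per row (assumed screen size)
HEIGHT = 256
PLANES = 6

def make_planes():
    return [bytearray(ROW_BYTES * HEIGHT) for _ in range(PLANES)]

def put_pixel(planes, x, y, color):
    """Write a 6-bit color at (x, y) with one read-modify-write per plane."""
    offset = y * ROW_BYTES + (x >> 3)
    or_mask = 0x80 >> (x & 7)
    and_mask = ~or_mask & 0xFF
    for plane in range(PLANES - 1, -1, -1):     # bit 5 first, like the routine
        value = planes[plane][offset] & and_mask
        if color & (1 << plane):
            value |= or_mask
        planes[plane][offset] = value

def get_pixel(planes, x, y):
    """Read the 6-bit color back from the planes."""
    offset = y * ROW_BYTES + (x >> 3)
    bit = 7 - (x & 7)
    return sum(((planes[p][offset] >> bit) & 1) << p for p in range(PLANES))

planes = make_planes()
put_pixel(planes, 13, 2, 0b101101)
put_pixel(planes, 13, 2, 0b010010)              # overdraw replaces the old color
print(get_pixel(planes, 13, 2))
```

Feeding the same coordinates and colors to an emulator run of the assembly and comparing plane contents with this model would catch offset or mask mistakes quickly.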

Last edited by saimo; 20 June 2022 at 07:42. Reason: Fixed some offsets I had forgotten after copying and pasting.