English Amiga Board


Old 20 October 2013, 18:29   #81
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
instead of
Code:
	sub.l	#640*6,a0
how about
Code:
        lea -640*6(A0),A0
Yeah, I haven't gotten around to optimizing it further because I was still busy with the contents of the loop. Now that I have plenty of registers left, I'm going to do this:
Code:
    sub.l   d5,a0
Old 20 October 2013, 18:47   #82
PeterK
Registered User
 
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,366
@Thorham
Your C array initializer would require setting up all 834 bytes. That's not what I'm looking for.

I still need a short and efficient table generator in C like this one in assembler:
Quote:
/* DCBlock.Byte count, data */

DCB.B 53, 4
DCB.B 43, 5
DCB.B 43, 6
DCB.B 43, 7
DCB.B 43, 8
DCB.B 53, 9

DCB.B 41, 0
DCB.B 37, 6
DCB.B 37, 12
DCB.B 37, 18
DCB.B 37, 24
DCB.B 37, 30
DCB.B 52, 36

DCB.B 53, 0
DCB.B 43, 42
DCB.B 43, 84
DCB.B 43, 126
DCB.B 43, 168
DCB.B 53, 210
The other code could be like this now:

Code:
case NSFB_PALETTE_CUBE_676:

	dr = ( c        & 0xFF);
	dg = ((c >>  8) & 0xFF);
	db = ((c >> 16) & 0xFF);

	/* Invert the flag on every call; push the levels up on every 2nd pixel. */
	if ((pushRGBlevel = ~pushRGBlevel) != 0) {
		dr += 0x16;
		dg += 0x16;
		db += 0x16;
	}
	if (dr > 250 && dg > 250 && db > 250)
		return 2; /* this is white */

	best_col = table_for_cube_676[dr+556]
		 + table_for_cube_676[dg+278]
		 + table_for_cube_676[db];

	break;
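A minimal sketch of the table generator asked for above, assuming the run lengths and values are exactly those in the DCB.B listing. Each pair expands to `count` copies of `value`, giving three 278-byte sections (256 levels plus the 0x16 push-up): blue steps offset by 4, green steps in multiples of 6, and red steps in multiples of 42. The names `cube_runs`, `build_cube_table` and `cube_676_pen`, and the `push` parameter that stands in for the `pushRGBlevel` toggle, are hypothetical and not taken from NetSurf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CUBE_TABLE_SIZE = 834 };          /* 3 sections of 278 bytes */

/* {count, value} pairs copied from the DCB.B listing. */
static const uint8_t cube_runs[][2] = {
    {53, 4}, {43, 5},  {43, 6},  {43, 7},   {43, 8},   {53, 9},
    {41, 0}, {37, 6},  {37, 12}, {37, 18},  {37, 24},  {37, 30}, {52, 36},
    {53, 0}, {43, 42}, {43, 84}, {43, 126}, {43, 168}, {53, 210},
};

static uint8_t table_for_cube_676[CUBE_TABLE_SIZE];

/* Expand the runs into the table; returns the number of bytes written. */
static size_t build_cube_table(void)
{
    size_t pos = 0;

    for (size_t i = 0; i < sizeof cube_runs / sizeof cube_runs[0]; i++) {
        size_t count = cube_runs[i][0];

        assert(pos + count <= CUBE_TABLE_SIZE);
        memset(table_for_cube_676 + pos, cube_runs[i][1], count);
        pos += count;
    }
    return pos;
}

/* Hypothetical wrapper around the lookup; push is 0 or 0x16. */
static unsigned cube_676_pen(uint32_t c, unsigned push)
{
    unsigned dr = ( c        & 0xFF) + push;
    unsigned dg = ((c >>  8) & 0xFF) + push;
    unsigned db = ((c >> 16) & 0xFF) + push;

    if (dr > 250 && dg > 250 && db > 250)
        return 2;                        /* white, as in the code above */

    return table_for_cube_676[dr + 556]
         + table_for_cube_676[dg + 278]
         + table_for_cube_676[db];
}
```

Calling `build_cube_table()` once at start-up should return 834. The initializer costs 38 bytes of data plus a short loop instead of an 834-byte array literal.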
Attached Files
File Type: rar TableForCube,bin.rar (136 Bytes, 72 views)

Last edited by PeterK; 20 October 2013 at 19:17.
Old 20 October 2013, 18:56   #83
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by PeterK View Post
Your C array initializer would require setting up all 834 bytes. That's not what I'm looking for. I need a short and efficient table generator.
Sorry, I thought you meant how to do that asm table in C.
Old 20 October 2013, 18:58   #84
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by Thorham View Post
Yeah, I haven't gotten around to optimizing it further because I was still busy with the contents of the loop. Now that I have plenty of registers left, I'm going to do this:
Code:
    sub.l   d5,a0
Fair enough, although it's worth remembering that sub.w sign-extends the source operand when the destination is an address register. Makes no odds in this case though, I suppose.
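To illustrate the sign-extension point, here is a small C model of the two operand sizes. It is only a sketch of the arithmetic: `suba_w` and `suba_l` are hypothetical helper names, and the address value used below is arbitrary. With a word-sized source the 16 bits are sign-extended to 32 before the subtraction, so 640*6 = 3840 behaves the same either way, while any value with bit 15 set would not.

```c
#include <assert.h>
#include <stdint.h>

/* Model of SUBA.W src,An: the word is sign-extended to 32 bits and the
   full address register is updated. */
static uint32_t suba_w(uint32_t an, uint16_t src)
{
    int32_t extended = (src & 0x8000) ? (int32_t)src - 0x10000
                                      : (int32_t)src;

    return an - (uint32_t)extended;
}

/* Model of SUBA.L src,An: plain 32-bit subtraction. */
static uint32_t suba_l(uint32_t an, uint32_t src)
{
    return an - src;
}
```

For example, `suba_w(0x00100000, 640 * 6)` and `suba_l(0x00100000, 640 * 6)` agree, but `suba_w(0x00100000, 0x8000)` adds 32768 instead of subtracting it.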
Old 20 October 2013, 18:59   #85
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
Fair enough, although it's worth remembering that sub.w sign-extends the source operand when the destination is an address register. Makes no odds in this case though, I suppose.
The operand size of register-to-register subs and adds makes no difference in timing on 68020+. Same for moves, logical operators, shifts and rotates.
Old 20 October 2013, 19:14   #86
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by Thorham View Post
To PeterK:
Code:
; dithering
    subq.l    #1,d3
    bge    .l1
    moveq    #2,d3
    add.l    d4,a2

.l1
    dbra    d6,.loopx
Code:
; dithering
    subq.l    #1,d3
    dblt    d6,.loopx
    bge.s  .l1
    moveq    #2,d3
    add.l    d4,a2

    dbra    d6,.loopx
.l1
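The trick relies on how DBcc behaves: if the condition is true it falls through without touching the counter, otherwise it decrements the low word of Dn and branches unless the counter has just reached -1. DBcc itself leaves the condition codes alone, so the `bge.s` that follows still sees the flags from `subq.l`. The C model below is a sketch that runs both loop shapes and counts iterations and counter resets; the start value of d3 (2) and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of DBcc Dn,label: returns 1 when the branch is taken. */
static int dbcc(int cc, uint16_t *dn)
{
    if (cc)
        return 0;                   /* condition true: fall through */
    *dn = (uint16_t)(*dn - 1);
    return *dn != 0xFFFF;           /* branch unless counter became -1 */
}

struct loop_stats { long pixels; long resets; };

/* First listing: subq / bge .l1 / moveq / add / .l1: dbra */
static struct loop_stats run_branch_version(uint16_t width)
{
    struct loop_stats s = {0, 0};
    uint16_t d6 = (uint16_t)(width - 1);
    int32_t d3 = 2;                 /* assumed start value */

    do {
        s.pixels++;                 /* stands in for the pixel code */
        d3 -= 1;                    /* subq.l #1,d3 */
        if (d3 < 0) {               /* bge .l1 not taken */
            d3 = 2;                 /* moveq #2,d3 */
            s.resets++;             /* add.l d4,a2 */
        }
    } while (dbcc(0, &d6));         /* dbra is DBF: cc never true */
    return s;
}

/* Second listing: subq / dblt / bge.s .l1 / moveq / add / dbra / .l1 */
static struct loop_stats run_dblt_version(uint16_t width)
{
    struct loop_stats s = {0, 0};
    uint16_t d6 = (uint16_t)(width - 1);
    int32_t d3 = 2;                 /* assumed start value */

    for (;;) {
        s.pixels++;
        d3 -= 1;                    /* subq.l #1,d3 */
        int lt = d3 < 0;            /* flags survive the DBcc */
        if (dbcc(lt, &d6))          /* dblt d6,.loopx */
            continue;
        if (!lt)                    /* bge.s .l1: counter ran out */
            break;
        d3 = 2;                     /* moveq #2,d3 */
        s.resets++;                 /* add.l d4,a2 */
        if (!dbcc(0, &d6))          /* dbra d6,.loopx */
            break;
    }
    return s;
}
```

In the common case (d3 not yet negative) the second version loops back with a single `dblt` instead of a taken `bge` followed by `dbra`, which is where the saved cycles come from.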

Last edited by Mrs Beanbag; 20 October 2013 at 19:21.
Old 20 October 2013, 20:03   #87
PeterK
Registered User
 
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,366
@arti
At the moment, I don't know how to help you any further unless you tell me what you need now.
Old 20 October 2013, 20:04   #88
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
To Mrs Beanbag:

Good one! That shaves off a good few cycles. I should read up on the dbcc instruction, because I only use it to make for loops.
Old 20 October 2013, 20:45   #89
arti
Registered User
 
Join Date: Jul 2008
Location: Poland
Posts: 662
@PeterK

Should I comment out nsfb_palette_generate_nsfb_8bpp(nsfb->palette);
and use nsfb_palette_generate_cube_676(nsfb->palette); instead?
Or use both functions?

I've implemented your code and this is the result. It doesn't work yet.
Attached Thumbnails
test1.png (9.3 KB)
Old 20 October 2013, 20:59   #90
PeterK
Registered User
 
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,366
Yeah, you could try to comment out nsfb_palette_generate_nsfb_8bpp and use nsfb_palette_generate_cube_676 instead.

I must admit that I don't understand all the dependencies in NetSurf concerning how the palettes are mapped to the screen pens and how it manages to use more than one palette at the same time. I've never done anything with NetSurf before.

If you are still using my older code then please comment the alpha channel handling out:
// if (c < 0x46000000) return 0; /* alpha < 70 gets pen 0 */

Maybe NetSurf always sets the alpha channel to zero? I don't know.

Last edited by PeterK; 20 October 2013 at 21:14.
Old 20 October 2013, 21:18   #91
arti
Registered User
 
Join Date: Jul 2008
Location: Poland
Posts: 662
Have you looked at common.c? Maybe that helps you understand.
Attached Files
File Type: c common.c (21.3 KB, 119 views)
Old 20 October 2013, 21:37   #92
PeterK
Registered User
 
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,366
Where can I download the latest source code of NetSurf, and which compiler and additional resources will I need to compile it?
Old 20 October 2013, 21:59   #93
arti
Registered User
 
Join Date: Jul 2008
Location: Poland
Posts: 662
Here: https://www.dropbox.com/sh/k49d8viddz9xo28/Z-HGQIXIRe

I use gcc 4.5.0 for Cygwin from amiga.sf with an AmiDevCpp 0.9.8 workspace.

Last edited by arti; 20 October 2013 at 22:05.
Old 20 October 2013, 22:54   #94
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
Code:
; dithering
    subq.l    #1,d3
    dblt    d6,.loopx
    bge.s  .l1
    moveq    #2,d3
    add.l    d4,a2

    dbra    d6,.loopx
.l1
LOL:
Code:
; dithering
    move.l  a2,d5
    move.l  a3,a2
    move.l  a4,a3
    move.l  d5,a4
    
    dbra    d6,.loopx
    sub.l   #640*6,a0
    dbra    d7,.loopy

Last edited by Thorham; 20 October 2013 at 23:01.
Old 20 October 2013, 23:00   #95
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Quote:
Originally Posted by Thorham View Post
LOL:
Code:
; dithering
    move.l  a2,d5
    move.l  a3,a2
    move.l  a4,a3
    move.l  d5,a4
    
    dbra    d6,.loopx
    sub.l   #640*6,a0
    dbra    d7,.loopy


I take it d4 is double d2 then

edit: d2=256, d4=512, right?

Last edited by Mrs Beanbag; 20 October 2013 at 23:06.
Old 20 October 2013, 23:09   #96
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
I take it d4 is double d2 then

edit: d2=256, d4=512, right?
Here's the whole render routine:

Code:
renderImage
    lea     image_end-640*3,a0
    lea     bmp,a1
    lea     tableR+256-16,a2

    move.l  a2,a3
    add.l   #16,a3
    move.l  a3,a4
    add.l   #16,a4

    clr.l   d0
;
; render loop
;
    move.l  #512-1,d7   ; image height
.loopy
    move.l  #640-1,d6   ; image width
.loopx
    move.b  (a0)+,d0
    move.b  (a2,d0.w,256*6.w),d1

    move.b  (a0)+,d0
    add.b   (a2,d0.w,256*3.w),d1

    move.b  (a0)+,d0
    add.b   (a2,d0.w),d1

    move.b  d1,(a1)+

; dithering
    move.l  a2,d2
    move.l  a3,a2
    move.l  a4,a3
    move.l  d2,a4

.next
    dbra    d6,.loopx
    sub.l   #640*6,a0
    dbra    d7,.loopy

Last edited by Thorham; 21 October 2013 at 05:45.
Old 20 October 2013, 23:12   #97
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Neat! You could probably re-arrange the instruction order a bit to assist pipelining/mitigate memory stalls.

Code:
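; Note: a3 is overwritten before its old value is read, so this order
; cycles a2 through the three table pointers in reverse (a2, a4, a3)
; and needs d2 preloaded with the initial a3 before the loop starts.
.loopx
    move.b  (a0)+,d0
    move.l  a4,a3
    move.b  (a2,d0.w,256*6.w),d1

    move.b  (a0)+,d0
    move.l  d2,a4
    add.b   (a2,d0.w,256*3.w),d1

    move.b  (a0)+,d0
    move.l  a2,d2
    add.b   (a2,d0.w),d1

    move.l  a3,a2
    move.b  d1,(a1)+
I'll admit I have no idea what the structure of this look-up table is.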

Last edited by Mrs Beanbag; 20 October 2013 at 23:18.
Old 20 October 2013, 23:40   #98
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
Neat! You could probably re-arrange the instruction order a bit to assist pipelining/mitigate memory stalls.
Yeah, for the '60 that's best. This code is written for the '20/'30, so you can't do much as far as I'm aware (although I have no clue about those memory stalls, or whether the '20/'30 have them).

Quote:
Originally Posted by Mrs Beanbag View Post
I'll admit I have no idea what the structure of this look-up table is.
It's a little hard to explain, but it's not too complicated. Here's the code that generates the table:

Code:
;
; generate color reduction tables
;
genTables
    movem.l d0-a6,-(sp)

    lea tableR+256,a0
    lea tableG+256,a1
    lea tableB+256,a2

    clr.l   d0 ; Red
    clr.l   d1 ; Green
    clr.l   d2 ; Blue

    move.l  #(1<<16)/(51-1),d3 ; Red 16bit.16bit fixed point number
    move.l  #(1<<16)/(42-1),d4 ; Green 16bit.16bit fixed point number

    move.l  #255,d7
.loop
    move.l  d0,d6
    swap    d6
    move.b  d6,(a0)+

    move.l  d1,d6
    swap    d6
    mulu.w  #6,d6
    move.b  d6,(a1)+

    move.l  d0,d6           ; blue reuses the red accumulator (both have 6 levels)
    swap    d6
    mulu.w  #7*6,d6
    move.b  d6,(a2)+

    add.l   d3,d0
    add.l   d4,d1

    dbra    d7,.loop

    lea tableR,a0
    lea tableG,a1
    lea tableB,a2

    move.l  508(a0),d0
    move.l  508(a1),d1
    move.l  508(a2),d2

    move.l  #255,d7
.loop2
    move.b  d0,512(a0)
    clr.b   (a0)+
    move.b  d1,512(a1)
    clr.b   (a1)+
    move.b  d2,512(a2)
    clr.b   (a2)+

    dbra    d7,.loop2

    movem.l (sp)+,d0-a6
    rts
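For readers who find the fixed-point loop easier to follow in C, here is a rough port of genTables. It is a sketch under two assumptions: the three tables are modelled as separate 768-byte arrays (the render routine appears to expect them back to back, 768 bytes apart), and the names `gen_tables`, `SPAN` and `TABLE_LEN` are made up for this example. Each table has 256 zero bytes, then the 256 quantized levels, then 256 copies of the top level, so the render loop can index with offsets of -16, 0 and +16 without leaving the table.

```c
#include <assert.h>
#include <stdint.h>

enum { SPAN = 256, TABLE_LEN = 3 * SPAN };

static uint8_t tableR[TABLE_LEN];
static uint8_t tableG[TABLE_LEN];
static uint8_t tableB[TABLE_LEN];

static void gen_tables(void)
{
    /* 16.16 fixed-point steps, same constants as the assembler source. */
    const uint32_t step6 = (1u << 16) / (51 - 1);   /* red and blue */
    const uint32_t step7 = (1u << 16) / (42 - 1);   /* green */
    uint32_t acc6 = 0;
    uint32_t acc7 = 0;

    /* Middle third: quantized level, pre-multiplied by its pen weight. */
    for (int i = 0; i < SPAN; i++) {
        tableR[SPAN + i] = (uint8_t)(acc6 >> 16);
        tableG[SPAN + i] = (uint8_t)((acc7 >> 16) * 6);
        tableB[SPAN + i] = (uint8_t)((acc6 >> 16) * 7 * 6);
        acc6 += step6;
        acc7 += step7;
    }

    /* Lower third clamps to 0, upper third clamps to the top level. */
    for (int i = 0; i < SPAN; i++) {
        tableR[i] = 0;
        tableG[i] = 0;
        tableB[i] = 0;
        tableR[2 * SPAN + i] = tableR[2 * SPAN - 1];
        tableG[2 * SPAN + i] = tableG[2 * SPAN - 1];
        tableB[2 * SPAN + i] = tableB[2 * SPAN - 1];
    }
}
```

Adding one entry from each table gives a pen number of the form r + 6*g + 42*b, with r and b in 0..5 and g in 0..6.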
And here's the palette generation code:

Code:
;
; set palette to a 6*7*6 palette
;
setPalette
    movem.l d0-a6,-(sp)
    move.l  scr,a5

    lea     sc_ViewPort(a5),a4  ; viewport of the screen, used by LoadRGB32
    move.l  a4,svport

    lea b,a0
    moveq   #255/5,d4
    moveq   #255/6,d3

    moveq   #5,d7   ; blue
.loopz
    moveq   #6,d6   ; green
.loopy
    moveq   #5,d5   ; red
.loopx
    moveq   #5,d0
    sub.l   d5,d0
    mulu.w  d4,d0
    ror.l   #8,d0
    move.l  d0,(a0)+

    moveq   #6,d0
    sub.l   d6,d0
    mulu.w  d3,d0
    ror.l   #8,d0
    move.l  d0,(a0)+

    moveq   #5,d0
    sub.l   d7,d0
    mulu.w  d4,d0
    ror.l   #8,d0
    move.l  d0,(a0)+

    dbra    d5,.loopx
    dbra    d6,.loopy
    dbra    d7,.loopz

    move.l  gfxbase,a6
    move.l  svport,a0
    lea     pal,a1
    jsr     _LVOLoadRGB32(a6)

    movem.l (sp)+,d0-a6
    rts
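The three nested loops can be written out in C as follows. This is only a sketch of the colour values: it fills an array of 32-bit R, G, B triples with the 8-bit level in the top byte (what `ror.l #8` produces), and leaves out the rest of the table that is handed to LoadRGB32. The names `gen_palette` and `palette` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    R_LEVELS = 6,
    G_LEVELS = 7,
    B_LEVELS = 6,
    PAL_SIZE = R_LEVELS * G_LEVELS * B_LEVELS   /* 252 pens */
};

/* One entry per pen: red, green, blue as left-justified 32-bit values. */
static uint32_t palette[PAL_SIZE][3];

static void gen_palette(void)
{
    const uint32_t step6 = 255 / 5;     /* 51, red and blue */
    const uint32_t step7 = 255 / 6;     /* 42, green */
    int pen = 0;

    /* Blue is the outer loop and red the inner one, as in setPalette,
       so the pen number is r + 6*g + 42*b. */
    for (uint32_t b = 0; b < B_LEVELS; b++) {
        for (uint32_t g = 0; g < G_LEVELS; g++) {
            for (uint32_t r = 0; r < R_LEVELS; r++) {
                palette[pen][0] = (r * step6) << 24;
                palette[pen][1] = (g * step7) << 24;
                palette[pen][2] = (b * step6) << 24;
                pen++;
            }
        }
    }
}
```

Because 255/6 is truncated to 42, the brightest green level comes out as 252 rather than 255.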
Old 21 October 2013, 15:51   #99
Mrs Beanbag
Glastonbridge Software
 
Join Date: Jan 2012
Location: Edinburgh/Scotland
Posts: 2,243
Another thing you could do is unroll the loop 3 times, and get rid of those four moves entirely.
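A C sketch of the idea, comparing the rotating version with one unrolled by three. The `shade` function is a hypothetical stand-in for the three table lookups of one pixel, and the offsets -16, 0 and +16 mirror the three table pointers set up in renderImage. The image is treated as one flat run of pixels here. Since 640 is not a multiple of 3, a per-row version would also have to deal with the leftover pixel and with the dither phase that carries over from row to row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three table lookups of one pixel. */
static uint8_t shade(const uint8_t *px, int offset)
{
    return (uint8_t)(px[0] + px[1] + px[2] + offset);
}

/* One pixel per iteration; the three offsets rotate after each pixel,
   like the four moves at the end of the assembler loop. */
static void render_rotating(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    int cur = -16, next = 0, last = 16;

    for (size_t i = 0; i < pixels; i++) {
        dst[i] = shade(src + 3 * i, cur);
        int tmp = cur;
        cur = next;
        next = last;
        last = tmp;
    }
}

/* Three pixels per iteration with fixed offsets; no rotation needed. */
static void render_unrolled(const uint8_t *src, uint8_t *dst, size_t pixels)
{
    size_t i = 0;

    for (; i + 3 <= pixels; i += 3) {
        dst[i]     = shade(src + 3 * i,       -16);
        dst[i + 1] = shade(src + 3 * (i + 1),   0);
        dst[i + 2] = shade(src + 3 * (i + 2),  16);
    }
    /* Up to two leftover pixels continue the same offset sequence. */
    if (i < pixels) {
        dst[i] = shade(src + 3 * i, -16);
        i++;
    }
    if (i < pixels)
        dst[i] = shade(src + 3 * i, 0);
}
```

Both functions should produce identical output for any pixel count, which is an easy property to check before porting the change back to assembler.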
Old 21 October 2013, 16:43   #100
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,762
Quote:
Originally Posted by Mrs Beanbag View Post
Another thing you could do is unroll the loop 3 times, and get rid of those four moves entirely.
Thanks, that's a good idea, especially with a loop this small. Make a macro, and it should stay pretty clean looking, too.