English Amiga Board


20 June 2017, 20:47   #1
Thorham
Computer Nerd
 
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 47
Posts: 3,751
Optimizing HAM8 renderer.

For people who like optimizing 68020/68030 code (don't sacrifice render quality):

Code:
ham8.render
    movem.l d0-a6,-(sp)

    move.l  bmpFile,a5
    add.l   bmpFileSize,a5
    sub.l   #640*3,a5
    move.l  bmp,a6

    clr.l   d0
    clr.l   d1
    clr.l   d2
    clr.l   d3
    clr.l   d4
    clr.l   d5

    move.w  #512-1,-(sp)
.loopy

    clra    a0
    clra    a1
    clra    a2

    move.w  #640-1,d7
.loopx

; read the pixel's blue, green and red components (24-bit BMP stores them in B, G, R order)

    move.b  (a5)+,d2 ; blue
    move.b  (a5)+,d1 ; green
    move.b  (a5)+,d0 ; red

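; pack the top nibbles of R, G and B into a 12-bit index (%0000rrrrggggbbbb)
; and get a pointer to the closest palette color's table entry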

    move.b  d0,d6
    lsl.w   #4,d6
    move.b  d1,d6
    lsl.w   #4,d6
    move.b  d2,d6
    lsr.w   #4,d6

    lea     (ham8.colorTable.w,pc,d6.w*8),a3

; palette difference

    move.w  (a3)+,d3 ; red
    sub.w   d0,d3
    subx.w  d6,d6
    eor.w   d6,d3

    move.w  (a3)+,d4 ; green
    sub.w   d1,d4
    subx.w  d6,d6
    eor.w   d6,d4

    move.w  (a3)+,d5 ; blue
    sub.w   d2,d5
    subx.w  d6,d6
    eor.w   d6,d5

; calculate the palette error weighted 2x red, 3x green, 1x blue (threshold, kept in a4)

    add.l   d4,d3
    add.l   d3,d3
    add.l   d4,d3
    add.l   d5,d3
    move.l  d3,a4

; ham difference

    move.l  a0,d3 ; red
    sub.w   d0,d3
    subx.w  d6,d6
    eor.w   d6,d3

    move.l  a1,d4 ; green
    sub.w   d1,d4
    subx.w  d6,d6
    eor.w   d6,d4

    move.l  a2,d5 ; blue
    sub.w   d2,d5
    subx.w  d6,d6
    eor.w   d6,d5

; 2x 3x 1x ham difference weights

    add.l   d3,d3
    move.l  d4,d6
    add.l   d4,d4
    add.l   d6,d4

; mask for ham pixels

    moveq   #-4,d6

; compare ham differences for green

    cmp.l   d4,d3
    bgt.s   .red
    cmp.l   d4,d5
    bgt.s   .blue

; check weighted threshold

    add.l   d5,d3
    cmp.l   d3,a4
    ble.s   .palette

; update ham color, set green ham code, write pixel

    and.b   d6,d1
    move.l  d1,a1
    addq.l  #3,d1
    move.b  d1,(a6)+

    dbra    d7,.loopx
    bra.s   .next

; compare ham differences for red

.red
    cmp.l   d3,d5
    bgt.s   .blue

; check weighted threshold

    add.l   d5,d4
    cmp.l   d4,a4
    ble.s   .palette

; update ham color, set red ham code, write pixel

    and.b   d6,d0
    move.l  d0,a0
    addq.l  #2,d0
    move.b  d0,(a6)+

    dbra    d7,.loopx
    bra.s   .next

; check weighted threshold

.blue
    add.l   d4,d3
    cmp.l   d3,a4
    ble.s   .palette

; update ham color, set blue ham code, write pixel

    and.b   d6,d2
    move.l  d2,a2
    addq.l  #1,d2
    move.b  d2,(a6)+

    dbra    d7,.loopx
    bra.s   .next

; write palette color and update current ham color

.palette

    subq.l  #6,a3

    move.w  (a3)+,a0
    move.w  (a3)+,a1
    move.w  (a3)+,a2
    move.b  (a3),(a6)+

    dbra    d7,.loopx

.next
    sub.l   #640*6,a5

    subq.w  #1,(sp)
    bge     .loopy

    addq.l  #2,sp

    movem.l (sp)+,d0-a6
    rts

ham8.render_end
ham8.colorTable
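
As a reading aid, here is a minimal Python model of what the listing above does for a single pixel. It is a sketch, not the original code: the function names, the tuple-based `entry` argument and the `MASK6` constant are made up for illustration, and the table entry is assumed to hold the palette color's red, green and blue values followed by the finished pixel byte, which is how the listing reads it. Note that the `sub.w` / `subx.w d6,d6` / `eor.w` sequence is modeled as a bitwise NOT on borrow, so a negative difference comes out as |x| - 1 rather than |x|.

```python
MASK6 = 0xFC  # moveq #-4,d6 / and.b d6,dN: keep the upper 6 bits


def approx_abs(x):
    """Model of sub.w / subx.w d6,d6 / eor.w d6,dN for inputs in 0..255.

    A borrow turns d6 into $FFFF, and EOR with $FFFF is a bitwise NOT,
    so a negative difference x becomes -x - 1.
    """
    return x if x >= 0 else ~x


def ham8_pixel(rgb, ham, entry):
    """Return (pixel_byte, new_ham) for one source pixel.

    rgb   -- (r, g, b) of the source pixel, each 0..255
    ham   -- (r, g, b) currently held by the display (a0, a1, a2)
    entry -- assumed table entry: (pal_r, pal_g, pal_b, pixel_byte)
    """
    r, g, b = rgb
    pr, pg, pb, pal_pixel = entry

    # Palette error, weighted 2/3/1 (kept in a4 in the listing).
    pal_err = (2 * approx_abs(pr - r) + 3 * approx_abs(pg - g)
               + approx_abs(pb - b))

    # Weighted per-channel errors of the held HAM color (d3, d4, d5).
    er = 2 * approx_abs(ham[0] - r)
    eg = 3 * approx_abs(ham[1] - g)
    eb = approx_abs(ham[2] - b)

    # Same comparison order as the listing: green first, then red, else blue.
    # "rest" is the error left in the two channels that are not modified.
    if er <= eg and eb <= eg:
        rest, code, chan, value = er + eb, 3, 1, g
    elif er > eg and eb <= er:
        rest, code, chan, value = eg + eb, 2, 0, r
    else:
        rest, code, chan, value = er + eg, 1, 2, b

    if pal_err <= rest:                 # ble.s .palette
        return pal_pixel, (pr, pg, pb)

    new_ham = list(ham)
    new_ham[chan] = value & MASK6       # held color only has 6 significant bits
    return (value & MASK6) | code, tuple(new_ham)
```

For example, `ham8_pixel((200, 100, 50), (0, 0, 0), (0, 0, 0, 0))` picks the red modify code, because the weighted red error is the largest and the palette entry is further away than the remaining green and blue error.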

22 June 2017, 04:33   #2
Thorham
I refuse to believe no one sees any optimizations at all.

22 June 2017, 08:21   #3
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,038
I'm more into 000/040...
Anyway, three minor things after taking a quick look at the code and 020 tables:
- lea (ham8.colorTable.w,pc,d6.w*8),a3 is out of 8-bit range
- and.b #$fc,dx as fast as and.b d6,dx? if so, moveq #-4,d6 not needed
- (-6,a3)/(-4,a3)/(-2,a3) faster than subq.l #6,a3 and 3x(a3)+? (postinc as fast as indirect displacement)

Branching looks OK to me (G's weight is 3 so it makes sense to assume branch not taken when comparing d4 with d3/d5).

EDIT:
So much for taking a nap, now I can't shut my brain off...
Code:
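;  move.w  #512-1,-(sp)
  move.l  #((512-1)<<16)+(640-1),d7 ; once, before .loopy: rows in the high word, columns in the low word
...
;  move.w  #640-1,d7

  dbf d7,.loopx
..
;  subq.w  #1,(sp)
  sub.l #(2<<16)-640,d7 ; row count - 1, low word $ffff -> 640-1
  bge     .loopy
;  addq.l  #2,sp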
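
The packed counter can be checked with a small simulation. This is a sketch under two assumptions: the `move.l` runs once before `.loopy`, and `dbf` only touches the low word of `d7`. The function name `count_iterations` is made up for illustration.

```python
def count_iterations(rows=512, cols=640):
    """Simulate the packed d7 counter: rows in the high word, columns in the low word."""
    d7 = ((rows - 1) << 16) + (cols - 1)        # move.l #...,d7
    pixels = 0
    rows_done = 0
    while True:
        while True:                             # body of .loopx
            pixels += 1
            low = ((d7 & 0xFFFF) - 1) & 0xFFFF  # dbf decrements the low word only
            d7 = (d7 & 0xFFFF0000) | low
            if low == 0xFFFF:                   # dbf falls through at -1
                break
        rows_done += 1
        d7 -= (2 << 16) - cols                  # sub.l #(2<<16)-640,d7
        if d7 < 0:                              # bge .loopy not taken
            break
    return rows_done, pixels
```

Subtracting `(2<<16)-640` from a low word of `$ffff` takes one row off the high word and leaves `640-1` in the low word, so a single `sub.l` both counts the row and reloads the column counter.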

Last edited by a/b; 22 June 2017 at 09:28.

22 June 2017, 13:16   #4
Thorham
Thanks, but it's not that easy.

Quote:
Originally Posted by a/b
- lea (ham8.colorTable.w,pc,d6.w*8),a3 is out of 8-bit range
That one might be a bit of a problem, because the table is 32 KB.
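
The 32 KB figure follows from the index construction in the first post: three 4-bit components give a 12-bit index, and the `*8` scale in the `lea` makes each entry 8 bytes. A small sketch of that arithmetic (the names `ENTRY_SIZE`, `ENTRIES` and `entry_offset` are made up for illustration):

```python
ENTRY_SIZE = 8        # the d6.w*8 scale factor in the lea
ENTRIES = 1 << 12     # 4 bits each of red, green and blue


def entry_offset(r, g, b):
    """Byte offset from ham8.colorTable of the entry selected for an 8-bit RGB pixel."""
    index = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)
    return index * ENTRY_SIZE


table_size = ENTRIES * ENTRY_SIZE
```

Here `table_size` is 32768 bytes, and the largest offset, `entry_offset(255, 255, 255)`, is 32760.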

Quote:
Originally Posted by a/b
- and.b #$fc,dx as fast as and.b d6,dx? if so, moveq #-4,d6 not needed
On the 68020/30, AND immediate is the same speed as AND register plus the moveq.

Quote:
Originally Posted by a/b
- (-6,a3)/(-4,a3)/(-2,a3) faster than subq.l #6,a3 and 3x(a3)+? (postinc as fast as indirect displacement)
Auto decrement is 1 cycle slower than auto increment (really). Furthermore, you have to move the write to memory to a place where nothing gets pipelined (the dbra gets partially pipelined now).

Quote:
Originally Posted by a/b
Branching looks OK to me (G's weight is 3 so it makes sense to assume branch not taken when comparing d4 with d3/d5).
G's case should happen the most often, so it's done first (if that's what you mean).

Quote:
Originally Posted by a/b
Code:
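;  move.w  #512-1,-(sp)
  move.l  #((512-1)<<16)+(640-1),d7 ; once, before .loopy: rows in the high word, columns in the low word
...
;  move.w  #640-1,d7

  dbf d7,.loopx
..
;  subq.w  #1,(sp)
  sub.l #(2<<16)-640,d7 ; row count - 1, low word $ffff -> 640-1
  bge     .loopy
;  addq.l  #2,sp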
That's very interesting, thanks.

22 June 2017, 17:02   #5
a/b
I meant:
Code:
.palette
;    subq.l  #6,a3
;    move.w  (a3)+,a0
;    move.w  (a3)+,a1
;    move.w  (a3)+,a2
    move.w (-6,a3),a0
    move.w (-4,a3),a1
    move.w (-2,a3),a2
    move.b  (a3),(a6)+
But, just as with AND, it depends on several things. In theory it could be faster. On the 020, calc-ea for (Ax)+ is 2/2/2 (best/cache/worst case) and for (d16,pc/Ax) it is 2/2/3, while calc&fetch-ea is 4/4/4 vs. 3/5/6. So in the best case it's 1 cycle less, and the subq is not needed.
For AND, calc&fetch-ea looks like 0/0/0 for a register and 0/2/3 for an immediate, so again the best case is the same speed, but the moveq is not needed. In theory, and assuming an 020 ;P.

Let me take a look at 030... Uhm, this is significantly different.
(Ax)+ calc-ea is 0+0/2/2 (head+tail/cache/nocache) and (d16,pc/Ax) is 2+0/2/2, so yeah it's very likely slower on 030.
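
Whatever the timing turns out to be, the two `.palette` variants read the same data. A minimal check in Python, assuming an 8-byte entry of three big-endian words, the pixel byte and one pad byte (the pad byte and the sample values are assumptions, as are the helper names):

```python
import struct

# Assumed entry: R, G, B as big-endian words, then the pixel byte and a pad byte.
entry = struct.pack(">HHHBx", 0x00C8, 0x0064, 0x0030, 0x14)


def read_word(mem, addr):
    """Read one big-endian word, as move.w would."""
    return struct.unpack_from(">H", mem, addr)[0]


def palette_postinc(mem, a3):
    """subq.l #6,a3, then three move.w (a3)+ and a final move.b (a3)."""
    a3 -= 6
    regs = []
    for _ in range(3):
        regs.append(read_word(mem, a3))
        a3 += 2
    return regs, mem[a3]


def palette_displacement(mem, a3):
    """move.w (-6,a3) / (-4,a3) / (-2,a3) and move.b (a3); a3 is left unchanged."""
    regs = [read_word(mem, a3 + d) for d in (-6, -4, -2)]
    return regs, mem[a3]


# a3 points 6 bytes into the entry after the three (a3)+ reads of the palette difference.
same = palette_postinc(entry, 6) == palette_displacement(entry, 6)
```

Both variants leave the same values in a0/a1/a2 and write the same pixel byte, so the choice between them is purely a matter of cycles.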

As for the color table: it's fine as is. I simply forgot that Asm-One wants the 16-bit displacement at the end; otherwise it parses the operand as the brief (old-style) format and then complains that the displacement is not within 8 bits.

22 June 2017, 18:29   #6
Thorham
Quote:
Originally Posted by a/b
Let me take a look at 030... Uhm, this is significantly different.
It is. 16-bit immediate AND is always 4 cycles, for example. The code you posted is a few cycles slower on 68020/30. Very typical how that works.

Quote:
Originally Posted by a/b
I simply forgot that asm-one wants the 16-bit displacement at the end
Yeah, I use Barfly. It's very annoying that different assemblers have a different syntax for this.

Come on guys... surely more people can see something?