30 September 2013, 17:36 | #1 |
Registered User
Join Date: Jul 2008
Location: Poland
Posts: 675
|
NetSurf AGA optimizing
To speed up scrolling a little, this code should be optimized or, even better, rewritten in assembly:
Code:
switch (palette->type) {
case NSFB_PALETTE_NSFB_8BPP:
    /* Index into colour cube part */
    dr = ((( c        & 0xFF) * 5) + 128) / 256;
    dg = ((((c >>  8) & 0xFF) * 7) + 128) / 256;
    db = ((((c >> 16) & 0xFF) * 4) + 128) / 256;
    col = 40 * dr + 5 * dg + db;

    palent = palette->data[col];
    dr = ( c        & 0xFF) - ( palent        & 0xFF);
    dg = ((c >>  8) & 0xFF) - ((palent >>  8) & 0xFF);
    db = ((c >> 16) & 0xFF) - ((palent >> 16) & 0xFF);
    cur_distance = (dr * dr) + (dg * dg) + (db * db);

    best_col = col;
    best_distance = cur_distance;
    *r_error = dr;
    *g_error = dg;
    *b_error = db;

    /* Index into grayscale part */
    col = (( c        & 0xFF) +
           ((c >>  8) & 0xFF) +
           ((c >> 16) & 0xFF) + (45 / 2)) / (15 * 3) - 1 + 240;

    palent = palette->data[col];
    dr = ( c        & 0xFF) - ( palent        & 0xFF);
    dg = ((c >>  8) & 0xFF) - ((palent >>  8) & 0xFF);
    db = ((c >> 16) & 0xFF) - ((palent >> 16) & 0xFF);
    cur_distance = (dr * dr) + (dg * dg) + (db * db);

    if (cur_distance < best_distance) {
        best_distance = cur_distance;
        best_col = col;
        *r_error = dr;
        *g_error = dg;
        *b_error = db;
    }
    break;
|
30 September 2013, 19:07 | #2 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
It would help if you could explain what this code does exactly. Also, what are the types of the variables? What does palette->data[col] contain exactly? Rewriting this in assembler isn't hard, and it's certainly possible to optimize it quite a bit, but without some extra info it's going to be a little problematic. Especially knowledge of what the code does allows better optimization.
Last edited by Thorham; 30 September 2013 at 19:20. |
01 October 2013, 16:45 | #3 |
Registered User
Join Date: Jul 2008
Location: Poland
Posts: 675
|
Here is the whole file:
http://pastebin.com/X9rg2WSA
It converts 24-bit colour to 8-bit colour. The function above is used here:
http://pastebin.com/CwyUWVsp
Last edited by arti; 01 October 2013 at 17:01. |
01 October 2013, 20:37 | #4 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
Thanks, seeing this in the right context is much better.
Anyway, I've made some simple optimizations to the original C code. Get it here: pallete.txt
One thing: make sure you generate and add a table of squares (each number multiplied by itself). This is for the deltas. The table should be 511 ints long, starting with the result for -255 * -255 and ending with the result for 255 * 255. Then just use a pointer to the middle of the table (0 * 0). The array name used in the code is square, but you'll see that.
There are numerous other optimizations possible, but I'm not very experienced with C (I do know some things about optimizing), so try the code first. In assembly language all this can be done in a better way, but we'll get to that later. |
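A minimal sketch of the table of squares described above, assuming the array is called square as in the post. The names square_table, init_square and colour_distance are made up for illustration and do not come from pallete.txt.

```c
#include <stdint.h>

/* 511 entries: squares of -255 .. 255. */
static int square_table[511];

/* Pointer to the middle of the table, so square[d] is valid for -255 <= d <= 255. */
static const int *square = &square_table[255];

/* Fill the table once at start-up. */
static void init_square(void)
{
    int i;
    for (i = -255; i <= 255; i++)
        square_table[i + 255] = i * i;
}

/* Squared RGB distance using three table reads instead of three multiplies. */
static int colour_distance(int dr, int dg, int db)
{
    return square[dr] + square[dg] + square[db];
}
```

Calling init_square() once and then replacing each (dr * dr) in the matching code with square[dr] keeps the results identical, since the deltas of two 8-bit components always fall in the range -255 to 255.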
02 October 2013, 00:52 | #5 |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,441
|
Just to help a bit
Code:
/** representation of a colour.
 *
 * The colour value comprises of four components arranged in the order ABGR:
 * bits 24-31 are the alpha value and represent the opacity. 0 is
 * transparent i.e. there would be no change in the target surface if
 * this colour were to be used and 0xFF is opaque.
 *
 * bits 16-23 are the Blue component of the colour.
 *
 * bits 8-15 are the Green component of the colour.
 *
 * bits 0-7 are the Red component of the colour.
 */
typedef uint32_t nsfb_colour_t;
|
02 October 2013, 01:08 | #6 |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,441
|
The old code was written like this. Did you update it?
Code:
static uint8_t colour_to_pixel(nsfb_t *nsfb, nsfb_colour_t c)
{
    nsfb_colour_t palent;
    int col;
    int dr, dg, db; /* delta red, green blue values */
    int cur_distance;
    int best_distance = INT_MAX;
    uint8_t best_col = 0;

    for (col = 0; col < 256; col++) {
        palent = nsfb->palette[col];
        dr = (c & 0xFF) - (palent & 0xFF);
        dg = ((c >> 8) & 0xFF) - ((palent >> 8) & 0xFF);
        db = ((c >> 16) & 0xFF) - ((palent >> 16) & 0xFF);
        cur_distance = ((dr * dr) + (dg * dg) + (db * db));
        if (cur_distance < best_distance) {
            best_distance = cur_distance;
            best_col = col;
        }
    }
    return best_col;
}

Color conversion is always going to be slow if it's done every time the code needs to find out what color a pixel is.
Maybe there is some fancy code that will allow you to store 16,777,216 indexes? (I think alpha can be ignored for AGA.)
It looks like this call is only used in two places, is that correct?
Code:
static bool fill(nsfb_t *nsfb, nsfb_bbox_t *rect, nsfb_colour_t c)
{
    int y;
    uint8_t ent;
    uint8_t *pvideo;

    if (!nsfb_plot_clip_ctx(nsfb, rect))
        return true; /* fill lies outside current clipping region */

    pvideo = get_xy_loc(nsfb, rect->x0, rect->y0);
    ent = colour_to_pixel(nsfb, c);

    for (y = rect->y0; y < rect->y1; y++) {
        memset(pvideo, ent, rect->x1 - rect->x0);
        pvideo += nsfb->linelen;
    }
    return true;
}
Code:
static bool glyph1(nsfb_t *nsfb, nsfb_bbox_t *loc, const uint8_t *pixel,
                   int pitch, nsfb_colour_t c)
{
    PLOT_TYPE *pvideo;
    PLOT_TYPE fgcol;
    int xloop, yloop;
    int xoff, yoff; /* x and y offset into image */
    int x = loc->x0;
    int y = loc->y0;
    int width = loc->x1 - loc->x0;
    int height = loc->y1 - loc->y0;
    const uint8_t *fntd;
    uint8_t row;

    if (!nsfb_plot_clip_ctx(nsfb, loc))
        return true;

    if (height > (loc->y1 - loc->y0))
        height = (loc->y1 - loc->y0);
    if (width > (loc->x1 - loc->x0))
        width = (loc->x1 - loc->x0);

    xoff = loc->x0 - x;
    yoff = loc->y0 - y;

    pvideo = get_xy_loc(nsfb, loc->x0, loc->y0);
    fgcol = colour_to_pixel(nsfb, c);

    for (yloop = yoff; yloop < height; yloop++) {
        fntd = pixel + (yloop * (pitch>>3)) + (xoff>>3);
        row = (*fntd++) << (xoff & 3);
        for (xloop = xoff; xloop < width ; xloop++) {
            if (((xloop % 8) == 0) && (xloop != 0)) {
                row = *fntd++;
            }
            if ((row & 0x80) != 0) {
                *(pvideo + xloop) = fgcol;
            }
            row = row << 1;
        }
        pvideo += PLOT_LINELEN(nsfb->linelen);
    }
    return true;
}

static bool glyph8(nsfb_t *nsfb, nsfb_bbox_t *loc, const uint8_t *pixel,
                   int pitch, nsfb_colour_t c)
{
    PLOT_TYPE *pvideo;
    nsfb_colour_t fgcol;
    nsfb_colour_t abpixel; /* alphablended pixel */
    int xloop, yloop;
    int xoff, yoff; /* x and y offset into image */
    int x = loc->x0;
    int y = loc->y0;
    int width = loc->x1 - loc->x0;
    int height = loc->y1 - loc->y0;

    if (!nsfb_plot_clip_ctx(nsfb, loc))
        return true;

    if (height > (loc->y1 - loc->y0))
        height = (loc->y1 - loc->y0);
    if (width > (loc->x1 - loc->x0))
        width = (loc->x1 - loc->x0);

    xoff = loc->x0 - x;
    yoff = loc->y0 - y;

    pvideo = get_xy_loc(nsfb, loc->x0, loc->y0);
    fgcol = c & 0xFFFFFF;

    for (yloop = 0; yloop < height; yloop++) {
        for (xloop = 0; xloop < width; xloop++) {
            abpixel = (pixel[((yoff + yloop) * pitch) + xloop + xoff] << 24) | fgcol;
            if ((abpixel & 0xFF000000) != 0) {
                /* pixel is not transparent */
                if ((abpixel & 0xFF000000) != 0xFF000000) {
                    abpixel = nsfb_plot_ablend(abpixel,
                            pixel_to_colour(nsfb, *(pvideo + xloop)));
                }
                *(pvideo + xloop) = colour_to_pixel(nsfb, abpixel);
            }
        }
        pvideo += PLOT_LINELEN(nsfb->linelen);
    }
    return true;
}
Last edited by NovaCoder; 02 October 2013 at 01:23. |
02 October 2013, 02:59 | #7 | ||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
Quote:
Quote:
Ultimately there's only so much you can do to speed this up. It would be much better to rewrite the whole loop in assembly language with all calls inlined. This goes for the dither routine as well, which can be sped up even more by using Sierra Filter Lite instead of Floyd-Steinberg (and it's just as good). |
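For reference, a small sketch of how the Sierra Lite kernel spreads a pixel's quantization error. It touches three neighbours with weights 2, 1 and 1 out of 4, where Floyd-Steinberg touches four with weights out of 16. The function name and the two per-row error buffers are assumptions for illustration, not NetSurf's actual dither code.

```c
#include <stdint.h>

/* Sierra Lite kernel (X is the current pixel, weights are in quarters):
 *
 *        X  2
 *     1  1
 *
 * err is the error for one colour component at column x. cur_row and
 * next_row hold the accumulated error for the current and the next line.
 */
static void sierra_lite_spread(int err, int x, int width,
                               int *cur_row, int *next_row)
{
    if (x + 1 < width)
        cur_row[x + 1] += (err * 2) / 4;  /* right neighbour: 2/4 */
    if (x > 0)
        next_row[x - 1] += err / 4;       /* below left: 1/4 */
    next_row[x] += err / 4;               /* directly below: 1/4 */
}
```

Because the divisor is 4, a hand-written version can get by with shifts and three additions per component, which is where the speed advantage over Floyd-Steinberg would come from.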
||
02 October 2013, 03:47 | #8 |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,441
|
Yep, that's what I'm thinking now: convert the color from 32 to 16 bits and then do a lookup.
That should result in a massive speed increase.
Last edited by NovaCoder; 02 October 2013 at 04:00. |
02 October 2013, 04:18 | #9 |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,441
|
So this is how I'd write it:
Code:
/*
 * Copyright 2009 Vincent Sanders <vince@simtec.co.uk>
 * Copyright 2010 Michael Drake <tlsa@netsurf-browser.org>
 *
 * This file is part of libnsfb, http://www.netsurf-browser.org/
 * Licenced under the MIT License,
 * http://www.opensource.org/licenses/mit-license.php
 */

#include <stdbool.h>
#include <stdint.h>
#include <endian.h>
#include <stdlib.h>
#include <string.h>

#include "libnsfb.h"
#include "libnsfb_plot.h"
#include "libnsfb_plot_util.h"

#include "nsfb.h"
#include "palette.h"
#include "plot.h"

/* One cached palette index per 16bit color; 0 means "not cached yet". */
static uint8_t palette_index_lookup[65536];

static uint8_t colour_to_pixel(nsfb_t *nsfb, nsfb_colour_t c)
{
    uint16_t colour16;
    uint8_t paletteIndex;

    if (nsfb->palette == NULL)
        return 0;

    /* First convert your 32bit ABGR color to a 16bit RGB color.
     * COLOR_CONVERSION_MACRO is a placeholder you still have to define. */
    colour16 = COLOR_CONVERSION_MACRO(c);

    /* See if the 16bit color is already stored in the lookup. */
    if (palette_index_lookup[colour16]) {
        /* Just return the cached value. */
        paletteIndex = palette_index_lookup[colour16];
    } else {
        /* Calculate */
        paletteIndex = nsfb_palette_best_match_dither(nsfb->palette, c);

        /* Store in lookup so it can be used next time. */
        palette_index_lookup[colour16] = paletteIndex;
    }

    return paletteIndex;
}

You'll need to come up with a MACRO to convert from ABGR to a 16 bit color (Google is your friend).
You'll also need to initialize the 'palette_index_lookup' and reset it every time a logical palette update occurs.
And you should handle black (16bit value of zero) properly.
Last edited by NovaCoder; 08 October 2013 at 03:30. |
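One possible shape for the missing pieces, as a sketch only. It assumes the ABGR layout quoted in post #5 and an RGB565 key. ABGR_TO_RGB565, cache_reset, cached_lookup and demo_match are hypothetical names. A separate validity bitmap is used so that a cached palette index of 0 still counts as a hit, which a plain "non-zero means cached" test cannot tell apart from an empty slot.

```c
#include <stdint.h>
#include <string.h>

/* ABGR8888 (red in bits 0-7) to RGB565: 5 bits red, 6 bits green, 5 bits blue. */
#define ABGR_TO_RGB565(c) \
    ((uint16_t)((((c) & 0xF8u) << 8) | (((c) >> 5) & 0x07E0u) | (((c) >> 19) & 0x1Fu)))

static uint8_t cache_index[65536];      /* cached palette index per 16-bit key */
static uint8_t cache_valid[65536 / 8];  /* one "entry is filled" bit per key */

/* Call this whenever the palette changes. */
static void cache_reset(void)
{
    memset(cache_valid, 0, sizeof cache_valid);
}

/* Stand-in for the real (slow) palette search. */
typedef uint8_t (*match_fn)(uint32_t c);

static uint8_t cached_lookup(uint32_t c, match_fn slow_match)
{
    uint16_t key = ABGR_TO_RGB565(c);
    uint8_t bit = (uint8_t)(1u << (key & 7));

    if (!(cache_valid[key >> 3] & bit)) {
        cache_index[key] = slow_match(c);   /* miss: compute once */
        cache_valid[key >> 3] |= bit;
    }
    return cache_index[key];
}

/* Demo matcher: always answers pen 0 and counts how often it is called. */
static int slow_calls;
static uint8_t demo_match(uint32_t c)
{
    (void)c;
    slow_calls++;
    return 0;
}
```

The bitmap costs 8 KB on top of the 64 KB index table. Note that colours sharing the same 16-bit key also share one cached pen, so any error terms a dithering matcher would return are lost on a cache hit.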
02 October 2013, 16:38 | #10 | ||
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
Quote:
Quote:
In assembler it would look something like this (68020/030; the 060 needs to be optimized differently):
Code:
    lea     c+1,a0          ; ptr to c, skip alpha part
    clr     d0              ; clear table index
    move.b  (a0)+,d0        ; get red
    lsl.l   #5,d0           ; make room for next byte (shift)
    move.b  (a0)+,d0        ; get green
    lsl.l   #5,d0           ; make room for next byte (shift)
    move.b  (a0)+,d0        ; get blue
    lsr.l   #3,d0           ; final 15 bit value
    move.l  (a1,d0.w),d0    ; get palette color number

There's always a ton of stuff you can happily do in C, but tight loops like this should be optimized in assembler... after the C code works properly, of course.
Last edited by Thorham; 02 October 2013 at 21:36. |
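The same 15-bit index can be written in C for testing before going to assembler. This is a sketch assuming the ABGR layout from post #5, where the byte after alpha in big-endian memory is blue. abgr_to_index15 is a made-up name. Which channel lands in which 5-bit field does not matter as long as the lookup table is built with the same function.

```c
#include <stdint.h>

/* Keep the top 5 bits of each component and pack them into 15 bits.
 * The first byte after alpha ends up in bits 10-14, the next in bits 5-9
 * and the last in bits 0-4, matching what the shift sequence produces. */
static inline uint16_t abgr_to_index15(uint32_t c)
{
    uint32_t r = (c >> 3)  & 0x1F;   /* red:   bits 0-7   */
    uint32_t g = (c >> 11) & 0x1F;   /* green: bits 8-15  */
    uint32_t b = (c >> 19) & 0x1F;   /* blue:  bits 16-23 */

    return (uint16_t)((b << 10) | (g << 5) | r);
}
```

A 15-bit index needs a table of 32768 entries, half the size of a 16-bit one, at the cost of one bit of green precision.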
||
03 October 2013, 02:53 | #11 | |
Registered User
Join Date: Sep 2007
Location: Melbourne/Australia
Posts: 4,441
|
Quote:
I'm not sure how often NetSurf updates the palette, but if it's coded properly it will update it only once per page. If that is the case, it would look a lot better (and still be reasonably fast) to use my palette caching code listed above. An important thing to check is that it's not updating the palette each time the user scrolls the web page (the palette should be based on the entire page, including the off-screen regions). |
|
03 October 2013, 07:37 | #12 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
|
You can find my color reduction from ARGB to the OS 3.5 palette mapped format with up to 256 colors here:
http://eab.abime.net/showthread.php?p=914435#post914435 The OS 3.5 format can be adapted to every AGA screen mode with the normal color mapping code which is also part of my library (including the optional and much faster color mapping based on a 512 byte cache). Just search for IconBeFast in my source code and ask me if I should explain how it works. |
03 October 2013, 08:52 | #13 |
Computer Nerd
Join Date: Sep 2007
Location: Rotterdam/Netherlands
Age: 48
Posts: 3,859
|
To NovaCoder:
Yep, it won't be pretty.
Anyway, if you're going to generate a palette for the whole page, you might as well render it in HAM8. It looks better, and I wouldn't be surprised if it was faster, too (especially when you start adding error diffusion into the mix). Meynaf's HAM8 rendering routine (not the one I showed you) runs at about 120 cycles per pixel on a 68030. Seems hard to beat.
Also, you'd have to render the whole page with all the images in one go, otherwise you can't calculate the palette. It's better if images are loaded dynamically so that the user can at least read and scroll through the text on the page. It seems to me that a fixed palette or HAM is best. |
03 October 2013, 15:44 | #14 | |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Quote:
EDIT: try this to start with. I've changed the calculation of col on line 92 in pallete.h as it would sometimes produce an index of 256, and I've reduced the size of the r g b_error variables to 16-bit, so you need to change this in the dithering function as well. The code also assumes entries 239 through 255 in the palette are perfect grays: Code:
- Last edited by Leffmann; 04 October 2013 at 16:58. |
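The out-of-range index mentioned above can be reproduced from the grayscale formula in post #1. The sketch below only shows the arithmetic and one possible guard. Clamping is an assumption here, since the actual change to the calculation is not shown in the thread, and both function names are made up.

```c
/* Grayscale index exactly as in the original code. */
static int gray_index_original(int r, int g, int b)
{
    return (r + g + b + (45 / 2)) / (15 * 3) - 1 + 240;
}

/* Same formula, but clamped so it never leaves the 256-entry palette. */
static int gray_index_clamped(int r, int g, int b)
{
    int col = gray_index_original(r, g, b);
    return col > 255 ? 255 : col;
}
```

For pure white the sum is 765, so (765 + 22) / 45 = 17 and the index becomes 17 - 1 + 240 = 256, one past the end of the palette. For black the result is 0 - 1 + 240 = 239, which is consistent with the gray ramp starting at entry 239.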
|
04 October 2013, 12:26 | #15 |
Registered User
Join Date: Jul 2008
Location: Poland
Posts: 675
|
So I simply replace the code from line 92 to 106 with this code, yes?
Where can I find the tables.i include? |
04 October 2013, 16:58 | #16 |
Join Date: Jul 2008
Location: Sweden
Posts: 2,269
|
Unfortunately it's not that simple, you can't just plop some assembly right into the C source code without knowing more about the compiler.
I've added all files to the attached archive. For now you can try linking in the object file, changing the rgb_error variables to 16-bit sizes, and declaring and calling it like this:
Code:
int func_NSFB_PALETTE_NSFB_8BPP(int, void*, void*, void*, void*);

switch (palette->type) {
case NSFB_PALETTE_NSFB_8BPP:
    best_col = func_NSFB_PALETTE_NSFB_8BPP(c, palette->data, &r_error, &g_error, &b_error);
    break;
...
}
|
04 October 2013, 18:37 | #17 | |
Registered User
Join Date: Jul 2008
Location: Poland
Posts: 675
|
It doesn't work.
Quote:
I'll ask him how to implement it here:
Code:
static uint8_t colour_to_pixel(nsfb_t *nsfb, nsfb_colour_t c)
{
    if (nsfb->palette == NULL)
        return 0;

    return nsfb_palette_best_match_dither(nsfb->palette, c);
}
Last edited by arti; 04 October 2013 at 18:46. |
|
05 October 2013, 00:11 | #18 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
|
Attached is the nature image (reduced to 256x160) as a direct drawing snapshot of my nature.info icon file. The size limitation exists only for the icons, of course. My color reduction routine can also handle larger sizes.
http://eab.abime.net/showpost.php?p=...&postcount=730
But I'm sorry, I have no intention of porting my assembler code to C.
Last edited by PeterK; 10 May 2018 at 21:21. |
05 October 2013, 10:11 | #19 |
Registered User
Join Date: Oct 2012
Location: Germany
Posts: 585
|
|
05 October 2013, 23:46 | #20 |
Registered User
Join Date: Apr 2005
Location: digital hell, Germany, after 1984, but worse
Posts: 3,385
|
I came to the conclusion that NetSurf needs a totally different solution than my color reduction code, since mine creates a separate palette for every image, which is not really suitable for a browser.
In short, this is what I'm planning to do: the NetSurf browser needs a fixed 256 color palette in order to decode all images as fast as possible. I want to use the 4 system colors and then add 252 fixed colors to the palette (6x7x6 possible values for the RGB components). The first 3 system colors (transparent, black and white) should be used for the color assignments too. The decoding of the color components will be done without any table or comparison by simply subtracting 42 or 36 again and again from the component values. This will only require about 10 subtractions for an average RGB value, no DIV, no MUL or anything else that consumes a lot of CPU time. And it can probably be done in C. But I have no idea yet how good the image quality could be without using any dithering (at least not in the first version based on this direct color to palette assignment method).
Update1: The first problem I've noticed is the mouse pointer colors (pens 16-18), which could destroy my concept for the fixed palette.
Update2: Instead of 252 colors I will now try to use 216 colors (RGB = 6x6x6) from pen 32 up to pen 247. This leaves some free pens for MWB and the color mapping, and it also allows correct gray shades. The color components could have the values 15, 60, 105, 150, 195, 240 for a first test.
Last edited by PeterK; 07 October 2013 at 11:07. |
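A rough C sketch of the 6x6x6 idea from Update2, using only subtraction to pick a level. The step of 45 follows from the listed component values 15, 60, ..., 240. The first threshold of 38 (halfway between 15 and 60), the pen ordering 32 + 36*r + 6*g + b and both function names are assumptions for illustration, not PeterK's actual code.

```c
#include <stdint.h>

/* Map one 0..255 component to a level index 0..5 by repeated subtraction.
 * Levels are 15, 60, 105, 150, 195, 240; thresholds sit halfway between. */
static int quantize6(int v)
{
    int idx = 0;

    v -= 38;                    /* below 38 the nearest level is 15 */
    while (v >= 0 && idx < 5) {
        idx++;
        v -= 45;                /* one level per step of 45 */
    }
    return idx;
}

/* Map an ABGR colour to a pen in the fixed cube, pens 32 .. 247. */
static uint8_t cube_pen(uint32_t c)
{
    int r = quantize6((int)(c & 0xFF));
    int g = quantize6((int)((c >> 8) & 0xFF));
    int b = quantize6((int)((c >> 16) & 0xFF));

    return (uint8_t)(32 + 36 * r + 6 * g + b);
}
```

Each component needs at most six subtractions here, so a full RGB value stays within the budget of roughly ten subtractions on average mentioned in the post, and no MUL or DIV is required if the final 36*r and 6*g are folded into per-level additions.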