English Amiga Board


Old 18 December 2020, 16:00   #1
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Arithmetic shift right with the blitter

I have a large number of 48 bit wide numbers that I need to shift right arithmetically, i.e. preserving the sign, by 14 bits.

At some point, if I can fiddle it, the blitter will become the best option to do this, right? Does anyone want to make a guess as to where this point might be?

Does anyone have any insight into good ways to achieve this?

I'm guessing that I'll need to sign extend to a 64 bit value in order to get the right ones or zeros to fill in the top bits. Tricks like right shifting by 2 will lose me 1/4 of my number space, so I think I should discount them.

Any smart ideas out there?
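
The sign extension step can be sketched in portable C. This is a host-side illustration only, not Amiga code: the helper name `sign_extend_48_to_64` and the word layout (three 16 bit words, most significant first, as the 68000 stores them) are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical helper: widen a 48 bit signed value, stored as three
 * 16 bit words (most significant first), to four words by adding a
 * new top word filled with copies of the sign bit (bit 15 of in[0]). */
static void sign_extend_48_to_64(const uint16_t in[3], uint16_t out[4])
{
    out[0] = (in[0] & 0x8000u) ? 0xFFFFu : 0x0000u; /* sign fill word */
    out[1] = in[0];
    out[2] = in[1];
    out[3] = in[2];
}
```

After a right shift by 14, the top 14 bits of the 48 bit result come from the low 14 bits of the fill word, which is what preserves the sign.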
Old 18 December 2020, 16:42   #2
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,430
Depending on how the numbers are organised in memory, this should definitely be possible using the Blitter. Compared to using the 68000 it should be significantly faster even when not doing that many numbers. Shifting 48 bit numbers on 68000 using shift instructions takes around 70 cycles per number (table based shifting may help here but that'll take a lot of memory and likely won't be more than about 1.5-2x as fast). Shifting 64 bit numbers using the Blitter should take around 16 cycles per number.

The overhead of setting up the Blitter is obviously still there, but with a simple blit like a copy & shift it shouldn't be a very large percentage.

However, this all is only true if you can keep the numbers in memory such that you can do this in a single blit. If this is not possible, the overhead could spiral out of control (worst case of one blit for one number is way slower).
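
A rough way to put a number on the break-even point, using the per-number estimates above (about 70 cycles on the CPU, about 16 with the Blitter). The setup cost is not given in this thread, so it is a parameter here, and the 200 cycles used in the usage note is a made-up figure. The helper name is hypothetical, and the model ignores bus contention between CPU and Blitter.

```c
/* Smallest count n for which
 *     setup_cycles + n * blit_cycles_per_number < n * cpu_cycles_per_number.
 * Returns 0 if the Blitter is never cheaper per number. */
static unsigned break_even_count(unsigned setup_cycles,
                                 unsigned cpu_cycles_per_number,
                                 unsigned blit_cycles_per_number)
{
    if (cpu_cycles_per_number <= blit_cycles_per_number)
        return 0;
    return setup_cycles / (cpu_cycles_per_number - blit_cycles_per_number) + 1;
}
```

With an assumed setup cost of 200 cycles, `break_even_count(200, 70, 16)` gives 4, so under these assumptions the Blitter wins from the fourth number onward.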
Old 18 December 2020, 17:05   #3
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by roondar View Post
Depending on how the numbers are organised in memory, this should definitely be possible using the Blitter. Compared to using the 68000 it should be significantly faster even when not doing that many numbers. Shifting 48 bit numbers on 68000 using shift instructions takes around 70 cycles per number (table based shifting may help here but that'll take a lot of memory and likely won't be more than about 1.5-2x as fast). Shifting 64 bit numbers using the Blitter should take around 16 cycles per number.

The overhead of setting up the Blitter is obviously still there, but with a simple blit like a copy & shift it shouldn't be a very large percentage.

However, this all is only true if you can keep the numbers in memory such that you can do this in a single blit. If this is not possible, the overhead could spiral out of control (worst case of one blit for one number is way slower).
Thanks, those are good numbers and I think they show the idea is valid, if I can implement it. Conceptually it seems simple, as most things do.
Old 19 December 2020, 09:53   #4
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Can we play "Spot the mistake?"

Code:
void BlitRight14Bits(volatile WORD * buffer, UWORD sizeInWords) {
    KPrintF("Waiting...");
    WaitForBlitter();

    KPrintF("Blitting...");
    custom->bltcon0 = 14 << ASHIFTSHIFT | SRCA | DEST | A_TO_D;
    custom->bltcon1 = BLITREVERSE;
    custom->bltapt = (WORD *) buffer + sizeInWords - 1;
    custom->bltdpt = (WORD *) buffer + sizeInWords - 1;
    custom->bltdmod = 0;

    custom->bltsize = sizeInWords;
    KPrintF("Waiting...");
    WaitForBlitter();
    KPrintF("Blitted!");
}
I get the second "Waiting..." line output, but not the final "Blitted!" line.

Anyone?

I'm testing with sizeInWords = 6.

Edit:

My BLTSIZE is wrong, but if I try my next guess of
custom->bltsize = (1 << 6) + sizeInWords;
I get the same results.
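
For reference, on OCS/ECS the BLTSIZE register packs the height in lines into bits 15-6 and the width in words into bits 5-0, and writing it starts the blit. A small sketch of that encoding follows; the helper name `bltsize_value` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: build a BLTSIZE value from a height in lines
 * (1..1024) and a width in words (1..64). The maximum sizes wrap to 0
 * in their bit fields, which the hardware reads as 1024 and 64. */
static uint16_t bltsize_value(unsigned height_lines, unsigned width_words)
{
    return (uint16_t)(((height_lines & 0x3FFu) << 6) | (width_words & 0x3Fu));
}
```

With sizeInWords = 6, one line of six words encodes as 0x0046. A bare value of 6 leaves the height field at zero, which the hardware treats as 1024 lines of six words.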

Last edited by Ernst Blofeld; 19 December 2020 at 10:01.
Old 19 December 2020, 10:11   #5
StingRay
move.l #$c0ff33,throat
 
 
Join Date: Dec 2005
Location: Berlin/Joymoney
Posts: 6,863
Where do you set the masks and modulo for channel A?
Old 19 December 2020, 10:19   #6
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by StingRay View Post
Where do you set the masks and modulo for channel A?
I forgot BLTAMOD, thanks.

I convinced myself that the first and last word masks didn't matter for my use case, but I'm going to think about that again.

And I now have it blitting, after moving the test code to a point in my program where DMA is enabled, but it doesn't look like it's doing anything.

Going to check those masks and a few other things.
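
A wait loop that polls the busy flag would not be expected to exit if a blit has been started while blitter DMA is off, which would be consistent with the hang described in post #4. The sketch below decodes the relevant DMACONR bits on the host. The macro and function names are made up for the example; the Amiga headers expose the same bit values as DMAF_BLTDONE, DMAF_MASTER and DMAF_BLITTER.

```c
#include <stdint.h>

#define DMACONR_BBUSY 0x4000u /* bit 14: blitter busy           */
#define DMACONR_DMAEN 0x0200u /* bit 9:  master DMA enable      */
#define DMACONR_BLTEN 0x0040u /* bit 6:  blitter DMA enable     */

/* True only if both the master and the blitter DMA enables are set. */
static int blitter_dma_enabled(uint16_t dmaconr)
{
    uint16_t need = DMACONR_DMAEN | DMACONR_BLTEN;
    return (dmaconr & need) == need;
}

/* True while the busy flag is set. */
static int blitter_busy(uint16_t dmaconr)
{
    return (dmaconr & DMACONR_BBUSY) != 0;
}
```

On the target, a debug check such as `blitter_dma_enabled(custom->dmaconr)` before starting the blit would catch this case early.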
Old 19 December 2020, 10:26   #7
Antiriad_UK
OCS forever!
 
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Code:
custom->bltapt = (WORD *) buffer + sizeInWords - 1;
custom->bltdpt = (WORD *) buffer + sizeInWords - 1;
Shouldn't that be -2 to select the last word in DESC mode? IIRC it rounds down on non even addresses anyway so probably doesn't do anything for you.

Edit: Never mind, I saw the WORD * cast.
Old 19 December 2020, 10:27   #8
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by Antiriad_UK View Post
Code:
custom->bltapt = (WORD *) buffer + sizeInWords - 1;
custom->bltdpt = (WORD *) buffer + sizeInWords - 1;
Shouldn't that be -2 to select the last word in DESC mode? IIRC it rounds down on non even addresses anyway so probably doesn't do anything for you.
C pointers should mean the 1 is really a 2, if I've done it right.
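
A quick host-side check of that pointer arithmetic, using `int16_t` as a stand-in for the Amiga `WORD` type (an assumption for this example; the helper name is hypothetical too):

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the last word in a buffer of size_in_words 16 bit words.
 * Pointer arithmetic on int16_t * moves in 2 byte steps, so "- 1" on the
 * pointer corresponds to "- 2" in bytes. */
static size_t last_word_byte_offset(const int16_t *buffer, size_t size_in_words)
{
    const int16_t *last = buffer + size_in_words - 1;
    return (size_t)((const char *)last - (const char *)buffer);
}
```

For a six word buffer this returns 10, the byte offset of the final word.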
Old 19 December 2020, 10:30   #9
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Guess what? The data wasn't in chip memory.
Old 19 December 2020, 10:33   #10
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Updated code, I'm now verifying the results of this, but I'm blitting so I can't be far wrong...


Code:
__attribute__((section("buffers.MEMF_CHIP"))) volatile WORD foo [] = {
	0x8000, 0x0000, 0x1234, 0x5678,
	0xffff, 0x0fff, 0xffff, 0x0fff,
	0x1010, 0x1010, 0x1010, 0x1010
};
Code:
	for (UWORD i = 0; i < 3 * 4; i++) {
		KPrintF("0x%04lx", (LONG) foo[i] & 0x0000ffff);
	}
	
	BlitRight14Bits(foo, 3);

	for (UWORD i = 0; i < 3 * 4; i++) {
		KPrintF("0x%04lx", (LONG) foo[i] & 0x0000ffff);
	}
Code:
void BlitRight14Bits(volatile WORD * buffer, UWORD sizeInLong64s) {
    KPrintF("Waiting...");
    WaitForBlitter();

    KPrintF("Blitting...");
    custom->bltcon0 = 14 << ASHIFTSHIFT | SRCA | DEST | A_TO_D;
    custom->bltcon1 = BLITREVERSE;
    custom->bltafwm = 0xffff;
    custom->bltalwm = 0xffff;
    custom->bltapt = (WORD *) buffer + 4 * sizeInLong64s - 1;
    custom->bltdpt = (WORD *) buffer + 4 * sizeInLong64s - 1;
    custom->bltamod = 0;
    custom->bltdmod = 0;

    custom->bltsize = (sizeInLong64s << 6) + 4;
    KPrintF("Waiting...");
    WaitForBlitter();
    KPrintF("Blitted!");
}
Old 19 December 2020, 10:49   #11
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Right then, it's not shifting by 14 bits. I guess that was too much to hope for.

I have to do something funky like set BLTAPT one word lower and shift by 2 bits instead?
Old 19 December 2020, 10:55   #12
Antiriad_UK
OCS forever!
 
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
DESC mode shifts left. Is that your intention as the function is called BlitRight ? What's your expected output?

If this is your input and I'm reading your question correctly
Code:
	0x8000, 0x0000, 0x1234, 0x5678,
	0xffff, 0x0fff, 0xffff, 0x0fff,
	0x1010, 0x1010, 0x1010, 0x1010
Is it that you want 0x8000000012345678 shifted right 14?

Edit:
If that's the case you'll need to treat each 64 bit (4 word) chunk as a single "line", and not use DESC mode. Your blit is then 4 words wide by xx lines high (xx is however many groups of 4 words you have). The last word mask (BLTALWM) needs to mask off the low 14 bits of the last word so they don't get shifted into the next line, so $c000.
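
The effect of the last word mask can be seen in a simplified host-side model of an ascending A to D copy with an A shift. This is a sketch for reasoning about the data only: the function name is made up, modulos are assumed to be 0, and it makes no attempt at timing accuracy.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model: each output word is the current source word shifted
 * right, with the bits shifted out of the previous source word entering
 * at the top. The carry is NOT reset between lines, which is why the
 * last word mask matters. Works in place (src == dst). shift is 0..15. */
static void model_blit_a_to_d(const uint16_t *src, uint16_t *dst,
                              size_t lines, size_t width_words,
                              unsigned shift, uint16_t fwm, uint16_t lwm)
{
    uint16_t prev = 0; /* previous (masked) source word */
    for (size_t y = 0; y < lines; y++) {
        for (size_t x = 0; x < width_words; x++) {
            size_t i = y * width_words + x;
            uint16_t cur = src[i];
            if (x == 0)
                cur &= fwm;               /* first word mask */
            if (x == width_words - 1)
                cur &= lwm;               /* last word mask  */
            uint32_t pair = ((uint32_t)prev << 16) | cur;
            dst[i] = (uint16_t)(pair >> shift);
            prev = cur;
        }
    }
}
```

Run on the first two rows of the test data with a shift of 14, a last word mask of 0xc000 gives 0x0002, 0x0000, 0x0000, 0x48d1 for the first row. With a mask of 0xffff instead, the first word of the second row picks up the low bits of 0x5678 from the row above.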

Last edited by Antiriad_UK; 19 December 2020 at 11:04.
Old 19 December 2020, 11:10   #13
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by Antiriad_UK View Post
DESC mode shifts left. Is that your intention as the function is called BlitRight ? What's your expected output?

If this is your input and I'm reading your question correctly
Code:
	0x8000, 0x0000, 0x1234, 0x5678,
	0xffff, 0x0fff, 0xffff, 0x0fff,
	0x1010, 0x1010, 0x1010, 0x1010
Edit:
Is it that you want 0x80000000 shifted right 14, and 0x12345678 shifted right 14?
If that's the case you'll need to treat each 64 bit (4 word) chunk as a single "line". Your blit is then 4 words wide by xx lines high (xx is however many groups of 4 words you have). The last word mask (BLTALWM) needs to mask off the low 14 bits of the last word so they don't get shifted into the next line, so $c000.
Each row is a 64 bit signed value. It's really a 48 bit value that will be sign extended (it's the result of multiplying a 16 bit and a 32 bit value together).

I'm using DESC as I'm writing over the top of the data, shifting it in place, so I need to fetch the data before it's overwritten. But if that's not going to work there's no reason why I have to do it in place. Would it be better to use ascending mode and a separate destination buffer?

Edit: Yes, that's exactly what you're saying. Or can I still do it in place with ascending?
Old 19 December 2020, 11:19   #14
Antiriad_UK
OCS forever!
 
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Yes you can do in place with ascending.

Rereading again, you mentioned an arithmetic shift. Set up this way it will be a logical shift only, so 0x8000000012345678 will end up with the sign bit shifting down as well. I'm not sure it's possible to handle the sign bit... hmm, I guess you could mask the sign bit and then OR the result back over the top rather than doing a normal A to D copy.

Edit: Would have to use a separate source/dest buffer if ORing.
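
The arithmetic behind the OR idea, sketched on the host with plain 64 bit integers. It only shows which bits have to be set to turn a logical shift into an arithmetic one; how to generate that fill pattern with the Blitter (for example from a second source channel) is not covered. The function name is hypothetical.

```c
#include <stdint.h>

/* Arithmetic shift right by 14, built from a logical shift plus an OR.
 * The two shifts differ only in the top 14 bits of the result, which
 * must be all ones when the original value was negative. */
static uint64_t asr14_via_or(uint64_t value)
{
    uint64_t shifted = value >> 14;           /* logical shift, zero fill */
    if (value & 0x8000000000000000ULL)        /* original sign bit set?   */
        shifted |= 0xFFFC000000000000ULL;     /* set the top 14 bits      */
    return shifted;
}
```

For 0x8000000012345678 the logical shift gives 0x00020000000048d1, and ORing in the fill gives 0xfffe0000000048d1.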
Old 19 December 2020, 11:21   #15
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by Antiriad_UK View Post
Yes you can do in place with ascending.

Rereading again, you mentioned an arithmetic shift. Set up this way it will be a logical shift only, so 0x8000000012345678 will end up with the 8 shifting down as well. I'm not sure it's possible to handle the sign bit... hmm
Yeah, I'm discarding the top word of the result, so it should be ok.
Old 19 December 2020, 11:27   #16
Antiriad_UK
OCS forever!
 
 
Join Date: Mar 2019
Location: Birmingham, UK
Posts: 418
Quote:
Originally Posted by Ernst Blofeld View Post
Yeah, I'm discarding the top word of the result, so it should be ok.
If you are going to discard the sign bit anyway then you can use
BLTAFWM = $7fff
BLTALWM = $c000

and that will strip the sign bit before shifting so you don't need to clear it up later
Old 19 December 2020, 11:31   #17
Ernst Blofeld
<optimized out>
 
 
Join Date: Sep 2020
Location: <optimized out>
Posts: 321
Quote:
Originally Posted by Antiriad_UK View Post
If you are going to discard the sign bit anyway then you can use
BLTAFWM = $7fff
BLTALWM = $c000

and that will strip the sign bit before shifting so you don't need to clear it up later
I've come back with a picture.

It shows 4 words, or 64 bits, which are the result of a calculation. The calculation is currently a 16 bit word multiplied by a 32 bit long, giving a 48 bit value, which I'm sign extending into the upper blue word to make it 64 bits. Later I will also need to handle a full 64 bit calculation without this sign extension. I want to discard the lower yellow 14 bits of this 64 bit value, doing lots of them at a time.

Edit:

This seems to work for me:

Code:
void BlitRight14Bits(volatile WORD * input, volatile WORD * output, UWORD sizeInLong64s) {
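    WaitForBlitter();             // make sure any earlier blit has finished

    // Ascending A to D copy with channel A shifted right by 14 bits.
    custom->bltcon0 = 14 << ASHIFTSHIFT | SRCA | DEST | A_TO_D;
    custom->bltcon1 = 0;          // ascending mode, so the shift goes right
    custom->bltafwm = 0xffff;     // keep every bit of the first word of each line
    custom->bltalwm = 0xc000;     // drop the low 14 bits of the last word so they
                                  // don't carry into the next 64 bit value
    custom->bltapt = (WORD *) input;
    custom->bltdpt = (WORD *) output;
    custom->bltamod = 0;          // values are packed back to back
    custom->bltdmod = 0;

    // One line per 64 bit value, 4 words wide. Writing BLTSIZE starts the blit.
    custom->bltsize = (sizeInLong64s << 6) + 4;
    WaitForBlitter();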
}
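
One way to check the output buffer is to compute the expected words on the CPU and compare. The sketch below is a host-side reference under the assumption that the blit behaves as a plain logical shift of each 64 bit value; the helper name is hypothetical, and words are listed most significant first, matching 68000 memory order.

```c
#include <stdint.h>

/* Expected result words for one 64 bit value after a logical shift
 * right by 14. The top 14 bits come out as zero rather than as sign
 * bits, which is fine when the top word is discarded afterwards. */
static void expected_words_lsr14(uint64_t value, uint16_t out[4])
{
    uint64_t shifted = value >> 14;
    out[0] = (uint16_t)(shifted >> 48);
    out[1] = (uint16_t)(shifted >> 32);
    out[2] = (uint16_t)(shifted >> 16);
    out[3] = (uint16_t)shifted;
}
```

For the first test row, 0x8000000012345678, this yields 0x0002, 0x0000, 0x0000, 0x48d1.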
Attached image: 64Bits.png

Last edited by Ernst Blofeld; 19 December 2020 at 13:28.
 

