English Amiga Board


08 December 2019, 20:38   #1
KONEY
Stretch bit in word into longword

Here I am again, stuck on a very silly problem.

I'm trying to write a piece of code to transform a word into a longword by "stretching" the bits.

For example:

%1010110011010101

should become:

%11001100111100001111001100110011

so every bit is repeated twice.

But I have no idea how to proceed. It's actually an exercise to learn how to manipulate data, and I really need some help here.

I tried copying the data, then doing an LSR by 1 bit and merging it back in with an XOR, but all I got was a big mess.
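
To pin down the target, here is a minimal C sketch of the mapping described above, handy for checking results on a PC before moving to 68k. `stretch16_ref` and `stretch16_ref_check` are hypothetical names, not from this thread or any library. Source bit i is copied to result bits 2i and 2i+1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference helper: duplicate every bit of a 16-bit word.
   Source bit i lands in result bits 2*i and 2*i+1. */
uint32_t stretch16_ref(uint16_t w)
{
    uint32_t out = 0;
    for (int i = 0; i < 16; i++) {
        if (w & (1u << i))
            out |= 3u << (2 * i);   /* set both copies of the bit */
    }
    return out;
}

/* Self-check against the example given in the post. */
void stretch16_ref_check(void)
{
    assert(stretch16_ref(0xACD5) == 0xCCF0F333u);
}
```

Here 0xACD5 is %1010110011010101 and 0xCCF0F333 is the stretched pattern shown above.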

08 December 2019, 21:09   #2
mcgeezer
There is probably an easier way to do it, but here is a five-minute shot.

Code:
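	moveq	#0,d0		; clear source register
	moveq	#0,d1		; clear result
	moveq	#0,d2		; clear bit-number scratch
	move.w	#%1010110011010101,d0
	moveq	#15,d7		; walk source bits 15..0
.loop:	btst	d7,d0		; source bit set?
	beq.s	.next		; no: leave both result bits clear
	move.w	d7,d2
	add.w	d2,d2		; d2 = 2*bit
	bset	d2,d1		; set result bit 2*bit
	addq.w	#1,d2
	bset	d2,d1		; set result bit 2*bit+1
.next:	dbf	d7,.loop
Your result will be in d1.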

mcgeezer

08 December 2019, 23:50   #3
ross
This is faster, and the number of processor cycles used is constant:
Code:
    move.w  #%1010110011010101,d0
    move.w  d0,d1
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    add.w   d0,d0
    addx.l  d2,d2
    add.w   d1,d1
    addx.l  d2,d2
    rts
Result in D2.
Yes, it is ugly.

It can probably be made much faster with a LUT.
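
For anyone who wants to follow the add/addx pairs on a PC, a minimal C model of the idea, under the assumption that X simply carries the bit shifted out of the source into the bottom of the result. The names `stretch16_carry` and `stretch16_carry_check` are made up for illustration, and the loop stands in for the 16 unrolled groups.

```c
#include <assert.h>
#include <stdint.h>

/* Models the add/addx pairs: each add.w shifts the top bit of a source
   copy out into the carry (X on the 68000), and each addx.l shifts the
   result left by one while taking that carry in at bit 0. */
uint32_t stretch16_carry(uint16_t w)
{
    uint16_t a = w, b = w;   /* two copies of the source, like d0 and d1 */
    uint32_t out = 0;        /* like d2; 32 shifts flush any old contents */
    for (int i = 0; i < 16; i++) {
        unsigned x;
        x = a >> 15; a = (uint16_t)(a << 1);   /* add.w  d0,d0 */
        out = (out << 1) | x;                  /* addx.l d2,d2 */
        x = b >> 15; b = (uint16_t)(b << 1);   /* add.w  d1,d1 */
        out = (out << 1) | x;                  /* addx.l d2,d2 */
    }
    return out;
}

void stretch16_carry_check(void)
{
    assert(stretch16_carry(0xACD5) == 0xCCF0F333u);
}
```

Because every bit of the result is shifted in, the destination never needs clearing first, which is why the routine above has no moveq for d2.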

09 December 2019, 00:14   #4
KONEY
Makes sense and actually works, thanks!

09 December 2019, 00:33   #5
KONEY
Quote:
Originally Posted by ross
This is faster and the number of processor cycles used are a constant:
Result in D2.
Yes, is ugly
Probably with a LUT can be made much faster.

LOL, yes, quite ugly, but still a trick to learn, thanks.

09 December 2019, 00:38   #6
ross
A simple trick to make it much faster on bare 68k:

Code:
    move.w  #%1010110011010101,d0
    move.w  d0,d1
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    swap    d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    add.w   d0,d0
    addx.w  d2,d2
    add.w   d1,d1
    addx.w  d2,d2
    rts
mcgeezer's routine: 678 cycles*
my previous one: 396 cycles
this one: 272 cycles**

*of course, only for this number of 1 bits
**watch out for the swap

Last edited by ross; 09 December 2019 at 00:50.

09 December 2019, 01:14   #7
ross
LUT version: 82 cycles

Code:
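    lea lut(pc),a0          ; a0 -> table to fill
    moveq   #0,d0           ; d0 = byte value 0..255
.md move.w  d0,d4
    moveq   #7,d3           ; 8 bits per byte
    move.w  d4,d1           ; second copy of the byte
.ml add.b   d4,d4           ; top bit of copy 1 -> X
    addx.w  d2,d2           ; shift X into d2
    add.b   d1,d1           ; same bit from copy 2 -> X
    addx.w  d2,d2           ; shift it in again
    dbf d3,.ml
    move.w  d2,(a0)+        ; store the stretched byte
    addq.b  #1,d0
    bne.b   .md             ; until d0.b wraps to 0
    
    move.w  #%1010110011010101,d0
    moveq   #0,d1
    move.b  d0,d1           ; d1 = low byte
    add.w   d1,d1           ; *2 = word offset
    move.w  lut(pc,d1.l),d2 ; stretched low byte
    lsr.w   #8,d0           ; d0 = high byte
    swap    d2              ; park it in the upper word for now
    add.w   d0,d0           ; *2 = word offset
    move.w  lut(pc,d0.w),d2 ; stretched high byte
    swap    d2              ; high byte result up, low byte result down
    rts
    
lut ds.w    256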
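
The same table idea as a small C sketch, assuming a 256-entry table of stretched bytes and two lookups per word. `lut8`, `lut8_init`, `stretch16_lut` and `stretch16_lut_check` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t lut8[256];   /* stretched form of every possible byte */

/* Fill the table once; each byte expands to a 16-bit word. */
void lut8_init(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint16_t s = 0;
        for (int i = 0; i < 8; i++)
            if (v & (1u << i))
                s |= (uint16_t)(3u << (2 * i));
        lut8[v] = s;
    }
}

/* Two lookups: high byte fills the upper word, low byte the lower word. */
uint32_t stretch16_lut(uint16_t w)
{
    uint32_t hi = lut8[w >> 8];
    uint32_t lo = lut8[w & 0xFF];
    return (hi << 16) | lo;
}

void stretch16_lut_check(void)
{
    lut8_init();
    assert(stretch16_lut(0xACD5) == 0xCCF0F333u);
}
```

The table costs 512 bytes, matching the `ds.w 256` reservation above.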

09 December 2019, 01:30   #8
malko
Quote:
Originally Posted by ross
LUT version: 82 cycles [...]

You're the best

09 December 2019, 03:34   #9
a/b
Slightly lower cycle count (-4):
Code:
...
	move.w	#%1010110011010101,d0

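	moveq	#0,d1
	move.b	d0,d1			; d1 = low byte
	lsr.w	#8,d0			; d0 = high byte
	add.w	d0,d0			; *2 = word offset
	move.l	lut(pc,d0.w),d2		; upper word = stretched high byte, lower word = next entry (junk)
	add.w	d1,d1
	move.w	lut(pc,d1.w),d2		; overwrite lower word with stretched low byte

	rts
    
lut ds.w    256+1	; extra word, so the long read at entry 255 stays inside the table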
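
A C sketch of why the longword read works and why the pad word is needed, assuming big-endian memory order as on the 68000 (the entry at the lower address ends up in the upper word). `lut257`, `lut257_init`, `stretch16_overlap` and `stretch16_overlap_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t lut257[257];   /* 256 stretched bytes + 1 pad word */

void lut257_init(void)
{
    for (unsigned v = 0; v < 256; v++) {
        uint16_t s = 0;
        for (int i = 0; i < 8; i++)
            if (v & (1u << i))
                s |= (uint16_t)(3u << (2 * i));
        lut257[v] = s;
    }
    lut257[256] = 0;           /* pad: only ever read, never used */
}

uint32_t stretch16_overlap(uint16_t w)
{
    unsigned hi = w >> 8, lo = w & 0xFF;
    /* move.l: a long read at entry hi also drags in entry hi+1 */
    uint32_t r = ((uint32_t)lut257[hi] << 16) | lut257[hi + 1];
    /* move.w: replace the unwanted lower word with the real one */
    r = (r & 0xFFFF0000u) | lut257[lo];
    return r;
}

void stretch16_overlap_check(void)
{
    lut257_init();
    assert(stretch16_overlap(0xACD5) == 0xCCF0F333u);
    assert(stretch16_overlap(0xFF00) == 0xFFFF0000u);  /* touches the pad */
}
```

The long read replaces one word read plus two swaps, at the price of one extra table word.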

09 December 2019, 09:48   #10
ross
Quote:
Originally Posted by a/b
Slightly lower cycle count (-4)

Another little gain (-2):
Code:
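	move.w	#%1010110011010101,d0

	moveq	#0,d1
	move.w  d0,-(sp)		; push the word...
	move.b  (sp)+,d1		; ...and pop its high byte (sp still moves by 2)
	add.w	d1,d1
	move.l	lut(pc,d1.l),d2		; upper word = stretched high byte
	moveq	#0,d1
	move.b	d0,d1			; low byte
	add.w	d1,d1
	move.w	lut(pc,d1.l),d2		; lower word = stretched low byte

	rts				; keep execution out of the table

lut ds.w    256+1	; extra word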

09 December 2019, 11:07   #11
a/b
Getting too old for this, I should've noticed earlier. This is 2 cycles faster, so the same as yours, but without the extra memory accesses.
Code:
...
	move.w	#%1010110011010101,d0

	moveq	#0,d1
	move.b	d0,d1
;	lsr.w	#8,d0
;	add.w	d0,d0
	clr.b	d0			; drop the low byte...
	lsr.w	#7,d0			; ...so this leaves high byte * 2
	move.l	lut(pc,d0.w),d2
	add.w	d1,d1
	move.w	lut(pc,d1.w),d2

	rts
    
lut ds.w    256+1	; extra word
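
The saving comes from folding the shift by 8 and the doubling into a single shift by 7 once the low byte is cleared. A quick C check of that arithmetic, with the 16-bit register modelled as a `uint16_t`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* lsr.w #8,d0 / add.w d0,d0 */
uint16_t offset_lsr8_add(uint16_t d0)
{
    d0 = (uint16_t)(d0 >> 8);
    d0 = (uint16_t)(d0 + d0);
    return d0;
}

/* clr.b d0 / lsr.w #7,d0 */
uint16_t offset_clr_lsr7(uint16_t d0)
{
    d0 = (uint16_t)(d0 & 0xFF00);
    d0 = (uint16_t)(d0 >> 7);
    return d0;
}

/* Both sequences give the same table offset for every possible word. */
void offset_check(void)
{
    for (uint32_t w = 0; w <= 0xFFFF; w++)
        assert(offset_lsr8_add((uint16_t)w) == offset_clr_lsr7((uint16_t)w));
}
```

Clearing the low byte first matters: without it, bit 7 of the low byte would be shifted into bit 0 of the offset.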

09 December 2019, 11:11   #12
ross
Quote:
Originally Posted by a/b
Getting too old for this , should've noticed earlier. 2 cycles faster, so the same as yours, but without extra mem accesses.

Damn, that's why I dreamed of an lsr #7 last night, but this morning I had forgotten it!

09 December 2019, 11:12   #13
KONEY
Anyone shorter?

09 December 2019, 11:22   #14
ross
Quote:
Originally Posted by KONEY
Anyone shorter?

Be satisfied with that: not counting the initial setup of d0 and the LUT calculation, it takes 68 cycles. I would say that's not bad at all.

09 December 2019, 12:30   #15
meynaf
Quote:
Originally Posted by KONEY
Anyone shorter?

Absolute shortest (in some way):
Code:
 move.w #%1010110011010101,d0

 move.l (lut+32768*4,pc,d0.w*4),d0

lut ds.l 65536
Obviously needs 68020+.
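
One detail worth spelling out is the `+32768*4` bias: the index register is used as a signed word, so it runs from -32768 to 32767 and the base has to point at the middle of the table. A C sketch of that layout, assuming the table is filled to match; `lut16`, `lut16_init`, `stretch16_big` and `lut16_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 65536 longwords (256 KB), laid out so that the sign-extended
   16-bit index -32768..32767 maps to slots 0..65535. */
static uint32_t lut16[65536];

void lut16_init(void)
{
    for (int32_t v = -32768; v <= 32767; v++) {
        uint16_t w = (uint16_t)v;      /* bit pattern of the signed index */
        uint32_t s = 0;
        for (int i = 0; i < 16; i++)
            if (w & (1u << i))
                s |= 3u << (2 * i);
        lut16[v + 32768] = s;          /* biased slot, like lut+32768*4 */
    }
}

uint32_t stretch16_big(uint16_t w)
{
    /* d0.w is sign-extended by the indexed addressing mode */
    int32_t idx = (int32_t)(w ^ 0x8000) - 0x8000;
    return lut16[idx + 32768];
}

void lut16_check(void)
{
    lut16_init();
    assert(stretch16_big(0xACD5) == 0xCCF0F333u);  /* negative index */
    assert(stretch16_big(0x0001) == 0x00000003u);  /* positive index */
}
```

In effect the two halves of the table are swapped compared with a plain unsigned layout: words with bit 15 set occupy the first 128 KB.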

09 December 2019, 13:04   #16
ross
Quote:
Originally Posted by meynaf
Absolute shortest (in some way ) :

And to think that a 512-byte LUT already seemed big to me...
It would be interesting to calculate how many cycles it takes to fill the LUT.

09 December 2019, 13:50   #17
meynaf
Quote:
Originally Posted by ross
It would be interesting to calculate how many cycles it takes to fill the lut

Indeed.
This is a rather typical example of code where we need to know what the program does and how badly it needs to be fast...

09 December 2019, 14:13   #18
grond
The non-LUT code can be shortened and sped up quite a bit:


Code:
    
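    move.w  #%1010110011010101,d0
    moveq   #0,d1

    add.w   d0,d0       ; next source bit -> X
    addx.w  d1,d1       ; shift it into d1
REPT 7
    add.w   d1,d1       ; leave a gap bit
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    swap    d1          ; high byte done, move it to the upper word

    add.w   d0,d0       ; same again for the low byte
    addx.w  d1,d1
REPT 7
    add.w   d1,d1
    add.w   d0,d0
    addx.w  d1,d1
ENDR

    move.l  d1,d0
    add.l   d1,d1       ; copy shifted left by one...
    or.l    d1,d0       ; ...fills the gaps; result in d0
    rts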
No idea about the actual cycle count.
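
The trick here is to shift each source bit in only once, leaving a zero gap after it, and to fill all the gaps at the end with a single shifted OR. A C sketch of that idea, done on the whole 32-bit value rather than per half with a swap; `stretch16_spread` and its check function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the 16 source bits onto the even bit positions, then OR the
   result with itself shifted left by one to fill the odd positions. */
uint32_t stretch16_spread(uint16_t w)
{
    uint32_t s = 0;
    for (int i = 15; i >= 0; i--)
        s = (s << 2) | ((w >> i) & 1u);   /* one bit in, one gap */
    return s | (s << 1);                  /* move.l / add.l / or.l */
}

void stretch16_spread_check(void)
{
    assert(stretch16_spread(0xACD5) == 0xCCF0F333u);
}
```

Compared with the earlier routine, each source bit needs one add/addx pair plus a plain shift instead of two pairs and a second copy of the source.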

09 December 2019, 14:51   #19
meynaf
Another possibility is ye olde c2p merge trick (unverified, but it gives the idea):
Code:
 move.w d0,d1
 ror.w #1,d1
 move.w d0,d2
 eor.w d1,d2
 andi.w #$5555,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #1,d1
 move.w d0,d2
 eor.w d1,d2
 andi.w #$3333,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #2,d1
 move.w d0,d2
 eor.w d1,d2
 andi.w #$0f0f,d2
 eor.w d2,d0
 eor.w d2,d1
 ror.w #4,d1
 move.b d0,d2
 move.b d1,d0
 move.b d2,d1
 rol.w #8,d1
 swap d0
 move.w d1,d0
Use of rol/ror is probably killing the timing on 68000, though...
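
Since the listing is marked as unverified, a line-by-line C transcription can be used to check it on a PC. This is only a model, assuming 16-bit registers behave like `uint16_t` and that the result is read from d0 at the end; `ror16`, `stretch16_merge` and `stretch16_merge_check` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit rotate right, n in 1..15 */
static uint16_t ror16(uint16_t v, unsigned n)
{
    return (uint16_t)((v >> n) | ((uint32_t)v << (16 - n)));
}

uint32_t stretch16_merge(uint16_t w)
{
    uint16_t d0 = w, d1, d2;

    d1 = ror16(d0, 1);
    d2 = (d0 ^ d1) & 0x5555; d0 ^= d2; d1 ^= d2;   /* swap even bits     */
    d1 = ror16(d1, 1);
    d2 = (d0 ^ d1) & 0x3333; d0 ^= d2; d1 ^= d2;   /* swap low bit pairs */
    d1 = ror16(d1, 2);
    d2 = (d0 ^ d1) & 0x0F0F; d0 ^= d2; d1 ^= d2;   /* swap low nibbles   */
    d1 = ror16(d1, 4);

    d2 = d0 & 0xFF;                                /* move.b d0,d2 */
    d0 = (uint16_t)((d0 & 0xFF00) | (d1 & 0xFF));  /* move.b d1,d0 */
    d1 = (uint16_t)((d1 & 0xFF00) | d2);           /* move.b d2,d1 */
    d1 = ror16(d1, 8);                             /* rol.w #8,d1  */

    return ((uint32_t)d0 << 16) | d1;              /* swap d0 / move.w d1,d0 */
}

void stretch16_merge_check(void)
{
    assert(stretch16_merge(0xACD5) == 0xCCF0F333u);
}
```

In this model the sequence produces the expected pattern for the thread's example value, with d0 ending up as the stretched high byte and d1 as the stretched low byte before the final merge.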