English Amiga Board


Old 03 July 2021, 12:57   #461
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
I actually just did what I was "arguing" about with alkis ;P. Installing nuke detector as I type this...
Optionally merged the prompt stuff into a single message. That makes the exe 4 bytes shorter (6 bytes of code), so it's still pain time for me to gain 2 more (without obvious corner cutting! :P) to round it down, which would also cancel out the inevitable 8 bytes of "must handle day transition" code.
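
For anyone following along, here is a rough Python model of the timing arithmetic behind the "day transition" remark. It assumes what the listing later in the thread does: take the minute and tick fields from DateStamp, fold them into ticks (50 per second), and add one day's worth of ticks if the difference comes out negative. The function names are made up for illustration; this is only a sketch of the arithmetic, not the 68k code.

```python
TICKS_PER_SECOND = 50                              # dos timer frequency
TICKS_PER_MINUTE = TICKS_PER_SECOND * 60
TICKS_PER_DAY = TICKS_PER_SECOND * 60 * 60 * 24

def stamp_to_ticks(minutes, ticks):
    """Fold DateStamp's minutes-since-midnight and ticks-in-minute into ticks."""
    return minutes * TICKS_PER_MINUTE + ticks

def elapsed_ticks(start, end):
    """Difference in ticks, corrected for a single midnight crossing."""
    diff = end - start
    if diff < 0:                                   # the timer wrapped at midnight
        diff += TICKS_PER_DAY
    return diff

def format_elapsed(ticks):
    """Convert 1/50 s ticks to 'seconds.hundredths'."""
    seconds, hundredths = divmod(ticks * 2, 100)   # 1/50 s -> 1/100 s
    return "%d.%02d" % (seconds, hundredths)

before_midnight = stamp_to_ticks(1439, 2999)       # last tick of the day
after_midnight = stamp_to_ticks(0, 1)
print(format_elapsed(elapsed_ticks(before_midnight, after_midnight)))
```

Without the wrap correction the difference in that last example would be a large negative number, which is the case the extra 8 bytes guard against.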

Also, "#" instead of "number" would've worked better in your case.
a/b is online now  
Old 03 July 2021, 14:25   #462
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Umm, why do we call Forbid (even with macro) again?
alkis is offline  
Old 03 July 2021, 14:36   #463
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
486 bytes (removing Forbid)

Edit: Yeah, and if we don't Forbid, there is no reason for Exec. 482 bytes.

Last edited by alkis; 03 July 2021 at 14:43. Reason: removed move.l $4.w,a5
alkis is offline  
Old 03 July 2021, 16:23   #464
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by alkis
Umm, why do we call Forbid (even with macro) again?
Because it's shorter than exg a5,a6 + moveq #127,d0 + jsr SetTaskPri(a6) + exg a5,a6 and doesn't make your version potentially 3x slower under the same testing conditions .
a/b is online now  
Old 03 July 2021, 19:02   #465
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by a/b
Because it's shorter than exg a5,a6 + moveq #127,d0 + jsr SetTaskPri(a6) + exg a5,a6 and doesn't make your version potentially 3x slower under the same testing conditions .
For starters, it's 2.26% slower at the maximum of 9320 digits (1496 vs 1463 secs on a500@fs-uae).
1.54% difference at 3000 digits.

Could go without DMA for more speed, or use a 1-bitplane screen/shell if the display needs to stay on.

And a question I've been meaning to ask. What's your actual executable size on disk?
'Cause if it isn't 64k I am assembling this with the wrong parameters.
alkis is offline  
Old 03 July 2021, 19:17   #466
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by alkis
..
Cause if it isn't 64k I am assembling this with wrong parameters
With vasm I just add the
-databss
option; I don't know about other assemblers.

Alternatively you can use:
bss	DX.B	65536-(*-start)		; 64kb allowed for code+data
instead of
DS.B
ross is offline  
Old 03 July 2021, 19:56   #467
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
If you are using asm-one&co like myself (I noticed you commented in the PRINTVs so I guess that's the case), they can't handle merged code+bss or data+bss; they always write the entire bss part at the end to disk as well. So it's 65536+36=65572 bytes.
I'm a pleb so I handle this manually, RB + edit hunk size + WB with new size ;P.
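
The size arithmetic in this exchange can be checked with a few lines of Python. This is only a sketch: the function names are invented, the first one mirrors the MAXD expression from the listing later in the thread, and the 296-byte code size is a made-up figure chosen because it reproduces the 9320-digit maximum mentioned earlier.

```python
ADDRESS_SPACE = 65536     # 64 KB allowed for code + data + workspace
HUNK_OVERHEAD = 36        # hunk overhead figure taken from the post

def max_digits(code_bytes):
    """Largest digit count (a multiple of 4) whose 7*D workspace still fits."""
    return ((ADDRESS_SPACE - code_bytes) // 7) & ~3

def file_size_with_bss_on_disk():
    """File size when the assembler writes the whole bss part to disk."""
    return ADDRESS_SPACE + HUNK_OVERHEAD

print(max_digits(296))                 # hypothetical code size before workspace
print(file_size_with_bss_on_disk())
```

Because of the rounding down to a multiple of 4, shaving a few bytes off the code does not always raise the digit limit.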
a/b is online now  
Old 03 July 2021, 20:04   #468
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by ross
With vasm I just add the
-databss
option, I don't know for other assembler.

Alternatively you can use:
bss	DX.B	65536-(*-start)		; 64kb allowed for code+data
instead of
DS.B
Yeap, that did it. Thanks!
alkis is offline  
Old 03 July 2021, 20:04   #469
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by a/b
If you are using asm-one&co like myself (I noticed you commented in the PRINTVs so I guess that's the case), they can't handle merged code+bss or data+bss, they always write the entire bss part at the end to disk as well. So it's 65536+36=65572 bytes.
I'm a pleb so I handle this manually, RB + edit hunk size + WB with new size ;P.
I use vasm on linux. Ross was spot on.
alkis is offline  
Old 03 July 2021, 20:27   #470
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Ah, ok. Now you made me curious... vasm can handle printt/printv as well, nice.

I guess I should clarify what I meant by 3x slower: "accidentally" (that's why the red dude emoticon) having a few juicy processes running and then proclaiming that on my machine it's 3x slower ;P.
a/b is online now  
Old 03 July 2021, 21:40   #471
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by a/b
Ah, ok. Now you made me curious... vasm can handle printt/printv as well, nice.

I guess I should clarify what I meant with 3x slower: "accidentally" (that's why the red dude emoticon) having a few juicy processes running and proclaim that on my machine it's 3x slower ;P.
changetaskpri is your friend :P
alkis is offline  
Old 03 July 2021, 22:31   #472
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
So the range is now 468 (466 code) to 508 bytes depending on user settings (prompt, day timer, mt, doslib hack). The annoying part is that most of the combinations could use another 2 byte reduction to drop the exe size by 4, and I can't find it (yet) ;\.
I'll post my source by Monday, gonna keep looking some more... And that will probably be it for me.

edit: Typical, right? There it is (in getnum):
Code:
;	move.b	#256-'0',d3
	move.b	d7,d3			; 2000&255 = $d0 = -'0'
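
To see why the register holding 2000 can stand in for the constant, here is a small Python model of the byte arithmetic. It is a sketch under the assumption that only the low 8 bits take part in the add and that the result is compared unsigned against 10, as the comments in getnum suggest; parse_digit is a made-up name.

```python
LOW_BYTE = 2000 & 0xFF                 # 0xD0, same bit pattern as -'0' in 8 bits

def parse_digit(ch):
    """Return 0-9 for an ASCII digit, or None for anything else."""
    value = (LOW_BYTE + ord(ch)) & 0xFF    # 8-bit add that wraps around
    return value if value < 10 else None   # unsigned compare against 10

print(hex(LOW_BYTE), parse_digit("7"), parse_digit("\n"))
```

Characters just outside the digit range ('/' and ':') wrap to 255 and 10, so the single unsigned compare rejects both sides.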

Last edited by a/b; 03 July 2021 at 23:08.
a/b is online now  
Old 03 July 2021, 23:17   #473
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
And here is the whole thing... Gonna spend the rest of the evening watching Gubbdata.
edit: Added cursor disabling option by alkis.
edit2: Further size reduction by 2 bytes (4th print in getnum).
edit3: -2 bytes, zero value reg reuse.

Code:
;***************************************************************

; user settings
PRINT_DIGITS		= 1
OPT_PROMPT		= 1	; use optimized prompt text?
LONG_TIMER		= 0	; check for day transition?
DISABLE_MT		= 1	; disable multitasking?
CURSOR_OFF		= 0	; disable cursor (faster printing)?
HACKS			= 0	; use undocumented OS stuff?

; exec
TDNestCnt		= 295
LibList			= 378
LN_NAME			= 10	; list node name

; dos
Input			= -54
Output			= -60
Read			= -42
Write			= -48
DateStamp		= -192
TICKS_PER_SECOND	= 50	; dos timer frequency

; N = 7*D, D = digits, e.g. N = 700 for 100 digits

;***************************************************************

start
	IFEQ	HACKS&(~DISABLE_MT)
	move.l	4.w,a5			; exec library
	ENDIF
	IFEQ	HACKS
	lea	LibList(a5),a6		; find dos in library list
.lib_loop
	move.l	(a6),a6			; next library
	move.l	LN_NAME(a6),a0
	move.l	#'.sod',d0
.lib_name
	cmp.b	(a0)+,d0
	bne.b	.lib_loop
	lsr.l	#8,d0
	bne.b	.lib_name
	ELSE
	lea	-$148(a2),a6		; dos library from bcpl vector
	ENDIF	; HACKS

	jsr	Output(a6)
	move.l	d0,a3			; a3 = stdout
	lea	workspace(pc),a4
	movem.w	(a4),d5/d6/d7/a2	; 10000, MAXD, 2000, 7*4

	bsr.w	getnum			; returns N in d6 (k = N = 7*D)

	IFNE	DISABLE_MT
	addq.b	#1,TDNestCnt(a5)	; FORBID macro, a5 is free now
	ENDIF

	bsr.b	.gettime		; reg copy: a0 = d7, d7 = d6
	move.l	d1,-(a7)		; start time

;*** TIMED PART START ******************************************

.fill	move.w	a0,(a4)+		; 2000
	subq.w	#2,d7
	bne.b	.fill

; outer+inner loop:
;	d3 upper word must initially be and remain 0
; 	d7 must initially be 0 (c = 0)
; d0=*, d1=d, d2=b, d3=tmp, d4=10, d5=10000, d6=k, d7=c
; a0=*, a1=*, a2=7*4, a4=r[] (a3=stdout, a5=--, a6=dos)

.outer_loop
	moveq	#0,d1			; d = 0
	move.w	d6,d2
	subq.w	#1,d2			; b = k-1
	bra.b	.inner_entry

.gettime	; returns ticks in d1, and copies: d7->a0, d6->d7
	movem.l	d0/d1/d2/d6/d7,-(sp)
	move.l	sp,d1
	jsr	DateStamp(a6)
	movem.l	(sp)+,d0/d1/d2/d7/a0	; d0=days, d1=minutes, d2=ticks
	mulu.w	#TICKS_PER_SECOND*60,d1	; minutes to ticks
	add.l	d2,d1
	rts

.longdiv	; d0/d2, 32/16 -> 32q/16r
	swap	d0
	move.w	d0,d3
	divu.w	d2,d3
	swap	d3
	move.w	d3,d0
	swap	d0
	divu.w	d2,d0

	move.w	d0,d3
	clr.w	d0
	swap	d0
	move.w	d0,(a4)			; r[i] = d%b
	exg	d0,d3

	subq.w	#2,d2			; b -= 2
	bcs.b	.inner_done

.inner_loop
	sub.l	d0,d1			; d = (d-d/b-d%b)/2
	sub.l	d3,d1			;  (same as d *= i)
	lsr.l	#1,d1
.inner_entry
	move.w	-(a4),d0		; r[i]
	mulu.w	d5,d0
	add.l	d0,d1			; d += r[i]*10000
	move.l	d1,d0
	divu.w	d2,d0			; d/b
	bvs.b	.longdiv

	move.w	d0,d3			; d/b
	clr.w	d0
	swap	d0			; d%b
	move.w	d0,(a4)			; r[i] = d%b

	subq.w	#2,d2			; b -= 2
	bcc.b	.inner_loop

.inner_done
	divu.w	d5,d1			; d/10000
	add.w	d7,d1			; d = c+d/10000 (to be printed out)
	move.l	d1,d7
	swap	d7			; c = d%10000
	IFNE	PRINT_DIGITS
	bsr.b	PR0000
	ENDIF

	sub.w	a2,d6			; k -= 7*4
	add.l	d6,a4			; &r[k/2]
	bne.b	.outer_loop		; k = 0?

;*** TIMED PART END ********************************************

	bsr.b	.gettime
	sub.l	(a7)+,d1		; end-start time
	IFNE	LONG_TIMER
; I'll shoot if you ask for DST adjustment or anything similar.
	bpl.b	.same_day
	add.l	#TICKS_PER_SECOND*60*60*24,d1
.same_day
	ENDIF
	add.l	d1,d1			; dos ticks (1/50) to 1/100
	divu.w	#100,d1			; 100ths upper, seconds lower

	move.l	a4,d2			; print buffer
	move.b	#' ',(a4)+

	bsr.b	SPrintTime		; must have: d5 = 10000, d6 = 0

	move.b	#'.',(a4)+

	moveq	#'0',d6			; print leading zeroes
	swap	d1
	moveq	#10,d5
	bsr.b	SPrintTime

	move.b	d4,(a4)+		; newline

	move.l	a4,d3
	sub.l	d2,d3			; string length
	bra.b	callwrite

	; END OF PROGRAM (exec will re-enable multitasking)

;***************************************************************

SPrintTime	; d1=value, a4=buffer
	move.w	d1,d0
.Next	ext.l	d0
	divu.w	d5,d0			; digit 0-9
	cmp.b	d6,d0
	beq.b	.LeadZero
	moveq	#'0',d6
	add.b	d6,d0
	move.b	d0,(a4)+
.LeadZero
	swap	d0
	divu.w	d4,d5
	bne.b	.Next
	rts

PR0000		; d1=value
	move.l	#'0000'-$01010001,d0
	move.w	-(a4),d3
.Loop	addq.b	#1,d0			; top 3 digits in a loop
	add.w	d3,d1
	bpl.b	.Loop
	sub.w	d3,d1
	rol.l	#8,d0
	move.w	-(a4),d3		; last value is string length (4)
	bmi.b	.Loop
	add.b	d1,d0			; 4th digit
 
	move.l	d0,-(a4)		; to print buffer

	moveq	#pbuffer-workspace,d2
	sub.l	d2,a4
writetext
	add.l	a4,d2			; offset to buffer address
callwrite
	move.l	a3,d1			; stdout
	jmp	Write(a6) 		; call Write(stdout,buffer,length)

;***************************************************************

; Data must be in this order all up to msg1.

	CNOP	0,4
pbuffer	DCB.B	4,0			; keep it lword aligned preferably

dec2str	DC.W	dec2str-pbuffer,-10,-100,-1000

;*** OVERWRITTEN CODE/DATA STARTS HERE *************************
workspace

MAXD	=	((65536-(workspace-start))/7)&(~3) ; multiple of 4
	DC.W	10000,MAXD,2000,7*4

msg1	DC.B	"number pi calculator v18",10	; odd length
msg1end
msg2	DC.B	"number of digits (up to "	; even length
	IFNE	OPT_PROMPT
X	SET	MAXD
	DC.B	'0'+X/1000
X	SET	X-(X/1000)*1000
	DC.B	'0'+X/100
X	SET	X-(X/100)*100
	DC.B	'0'+X/10
X	SET	X-(X/10)*10
	DC.B	'0'+X
	ENDIF
msg2end
msg3	DC.B	")? "				; odd length
	IFNE	CURSOR_OFF
	DC.B	$1b,"[0 p"			; to even length
	ENDIF
msg3end
	IFEQ	OPT_PROMPT
	EVEN
printnum
	bra.b	PR0000				; chained short branch
	ENDIF
msg4	DC.B	" digits will be printed",10	; even length
msg4end
	EVEN

;***************************************************************

getnum
	moveq	#10,d4			; global const (here for alignment)

	moveq	#msg1-workspace,d2
	moveq	#msg1end-msg1,d3
	bsr.b	writetext
.error
	moveq	#msg2-workspace,d2
	IFNE	OPT_PROMPT
	moveq	#msg3end-msg2,d3
	ELSE
	moveq	#msg2end-msg2,d3
	bsr.b	writetext
	move.w	d6,d1			; MAXD
	bsr.b	printnum
	moveq	#msg3-workspace,d2
	moveq	#msg3end-msg3,d3
	ENDIF
	bsr.b	writetext

	jsr	Input(a6)
	move.l	d0,d1			; stdin
	move.l	a4,d2			; read buffer
	moveq	#4+1,d3			; up to 4 digits + newline
	jsr	Read(a6)		; returns length in d0

	move.l	d2,a0
	moveq	#0,d1
.nextch	subq.w	#1,d0
	beq.b	.parsed
	move.b	d7,d3			; 2000&$ff = $d0 = -'0'
	add.b	(a0)+,d3
	cmp.b	d4,d3			; digit 0-9?
	bhs.b	.error
	mulu.w	d4,d1			; D = D*10+digit
	add.w	d3,d1			; d3 bits 8-15 must be clear
	bra.b	.nextch
.parsed
	cmp.w	d6,d1			; D > MAXD?
	bhi.b	.error
	move.w	d1,d3			; D = 0?
	beq.b	.error

	addq.w	#3,d1
	and.w	#~3,d1			; adjust D to a multiple of 4
	moveq	#7,d6
	mulu.w	d1,d6			; k = N = 7*D
	cmp.b	(a0),d4			; last char is newline (1-4 digits)?
	bne.b	.adjusted
	sub.w	d1,d3
	beq.b	.not_adjusted
.adjusted				; either 5 digits or adjusted D
	IFNE	OPT_PROMPT
	bsr.w	PR0000
	ELSE
	bsr.b	printnum
	ENDIF
	moveq	#msg4end-msg4,d3
.not_adjusted
	moveq	#msg4-workspace,d2
	bra.w	writetext

;***************************************************************

bss	DS.B	65536-(*-start)		; 64kb allowed for code+data

;***************************************************************

; Enable these if your assembler can handle them:
;	PRINTV	bss-start+36		; 36 = hunk overhead
;	PRINTV	MAXD			; max number of digits
;	PRINTV	(pbuffer-start)&3	; pbuffer alignment

;***************************************************************
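
A note on the .longdiv path, for readers who don't live in 68k: DIVU.W divides a 32-bit value by a 16-bit one but only has room for a 16-bit quotient, so the inner loop falls back to two chained divisions when the overflow flag is set. The Python below is a hedged model of that idea with invented function names, not a translation of the register shuffling.

```python
def divu_w(dividend, divisor):
    """Model of DIVU.W: (quotient, remainder), or None if the quotient needs more than 16 bits."""
    quotient, remainder = divmod(dividend, divisor)
    return None if quotient > 0xFFFF else (quotient, remainder)

def long_divide(d, b):
    """32-bit d divided by 16-bit b, giving (32-bit quotient, 16-bit remainder)."""
    fast = divu_w(d, b)
    if fast is not None:                # common case: one division is enough
        return fast
    high, low = d >> 16, d & 0xFFFF
    q_high, rem = divu_w(high, b)               # cannot overflow: high < 65536
    q_low, rem = divu_w((rem << 16) | low, b)   # cannot overflow: rem < b
    return (q_high << 16) | q_low, rem

print(long_divide(29629620, 25))
```

The second division is safe because the remainder of the first is always smaller than the divisor, which keeps its quotient below 65536.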

Last edited by a/b; 05 July 2021 at 03:52.
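
For reference, the register comments in the listing (d, b, c, k, r[]) line up with the well-known four-digits-at-a-time spigot for pi. The Python below is a sketch of that general algorithm, written under the assumption that the listing follows it; it counts array elements where the assembly counts bytes, and pi_spigot is a made-up name.

```python
def pi_spigot(digits):
    """Return decimal digits of pi as a string, with the count rounded up to a multiple of 4."""
    digits = (digits + 3) & ~3       # same rounding as "adjust D to a multiple of 4"
    k = digits * 7 // 2              # 14 array terms per group of 4 digits
    r = [2000] * (k + 1)             # remainder array, pre-filled with 2000
    c = 0                            # carry into the next 4-digit group
    groups = []
    while k > 0:
        d = 0
        for i in range(k, 0, -1):
            b = 2 * i - 1            # odd divisor for this term
            d += r[i] * 10000
            r[i] = d % b             # r[i] = d % b
            d //= b
            if i > 1:
                d *= i - 1
        groups.append("%04d" % (c + d // 10000))
        c = d % 10000
        k -= 14
    return "".join(groups)

print(pi_spigot(16))
```

Each pass of the outer loop consumes 14 terms and emits one group of four digits, which is where the 7 bytes (3.5 words) per digit in the workspace budget comes from.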
a/b is online now  
Old 04 July 2021, 00:00   #474
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
You can add this to user settings

Code:
CURSOROFF		= 0	; minor speed increase on printing
and this near msg3
Code:
msg3	DC.B	")? "				; odd length
	IFNE	CURSOROFF
	DC.B	$1B,"[0 p"
	ENDIF
msg3end
tiny speedup when enabled, due to switching cursor off and printing faster.
alkis is offline  
Old 04 July 2021, 01:51   #475
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 55
Posts: 1,959
Quote:
Originally Posted by a/b
So the range is now 468 (466 code) to 508 bytes depending on user settings (prompt, day timer, mt, doslib hack). The annoying part is most of the combinations could use another 2 byte reduction to drop the exe size by 4 and I can't find (yet) ;\.
I'll post my source by monday, gonna keep looking some more... And that will probably be it for me.

edit: Typical, right? There it is (in getnum):
Code:
;	move.b	#256-'0',d3
	move.b	d7,d3			; 2000&255 = $d0 = -'0'
But you can still use my idea, perhaps just by shortening "number " to "number". That gains you 2 bytes.
Don_Adan is offline  
Old 04 July 2021, 03:08   #476
Bruce Abbott
Registered User
 
 
Join Date: Mar 2018
Location: Hastings, New Zealand
Posts: 2,544
Quote:
Originally Posted by alkis
changetaskpri is your friend :P
I bumped the priority up to +127 and it reduced execution time from 9.58 seconds to 9.52 seconds when running from a shell window on Workbench. However if I booted with no startup-sequence it only took 9.44 seconds with or without changing the task priority. So that means the background tasks I normally have running are using ~1.5% of the CPU.

The 'forbid' code doesn't seem to make any difference, perhaps because it is broken by DOS calls?

I think the 'hack' option without other speedup tricks and run from the initial CLI with high priority is a fair comparison to the 386 DOS code, as it is closest to being the same environment. litwr's pi.ibmpc.com program is 623 bytes. At 480 bytes we are only 77% of that size, as well as faster on a 50MHz 030 than the fastest 386 ever made (40MHz).

Now to answer the question this thread was started for. litwr claimed that the 68020 was hard to optimize for compared to a 386, and he was right! Turns out that (in this case) using 68020 instructions provides no benefit. However, we discovered that 68k code in general can easily be optimized for speed and code density without having to limit it to a specific processor. This is good news for those of us who wish to 'write once, run anywhere' on any Amiga no matter what CPU it has.

Last edited by Bruce Abbott; 04 July 2021 at 03:16.
Bruce Abbott is offline  
Old 04 July 2021, 03:38   #477
alkis
Registered User
 
Join Date: Dec 2010
Location: Athens/Greece
Age: 53
Posts: 719
Quote:
Originally Posted by Bruce Abbott
The 'forbid' code doesn't seem to make any difference, perhaps because it is broken by DOS calls?
Yes, I think dos IO breaks the Forbid state.

Quote:
Originally Posted by Bruce Abbott
I think the 'hack' option without other speedup tricks and run from the initial CLI with high priority is a fair comparison to the 386 DOS code, as it is closest to being the same environment. litwr's pi.ibmpc.com program is 623 bytes. At 480 bytes we are only 77% of that size, as well as faster on a 50MHz 030 than the fastest 386 ever made (40MHz).
Well, PC DOS writes to a character screen and the Amiga to a bitmap. So the PC starts with an advantage and still loses. To compare under the same conditions, printing should be disabled.
alkis is offline  
Old 04 July 2021, 10:44   #478
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Yes, printing is breaking it. If you run with PRINT_DIGITS=0 it will work as expected: it will "freeze" WB and everything else.
a/b is online now  
Old 04 July 2021, 10:57   #479
ross
Defendit numerus
 
 
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
Quote:
Originally Posted by a/b
... Gonna spend the rest of the evening watching Gubbdata.
[OT]
The 1541 demo? Pure demo scene awesomeness
And Bromance is GREAT.
[/OT]
ross is offline  
Old 04 July 2021, 10:57   #480
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,039
Quote:
Originally Posted by alkis
You can add this to user settings
Done (renamed to CURSOR_OFF to use the same convention as the rest). Previous post edited.
a/b is online now  
 

