English Amiga Board


Go Back   English Amiga Board > Coders > Coders. Asm / Hardware

 
 
Thread Tools
Old 11 June 2024, 21:14   #1
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 262
Linefeed counter code

Below snip is part of something I pieced together, and it appears to work.
Can it be made faster? Target is 68000.

move.l _FileSz,d0
clr.l d1
move.l _BufferPt,a0

LineCountLoop:
cmp.b #LF,0(a0,d0)
beq NewLineFound

LineCountLoop2:
dbra d0,LineCountLoop
bra LineCountStop

NewLineFound:
addq.l #1,d1
bra LineCountLoop2

LineCountStop:
move.l d1,_LineCount
Yulquen74 is offline  
Old 11 June 2024, 21:55   #2
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,053
Something like this:
Code:
	move.l	_BufferPt,a0
	move.l	_FileSz,d0
	moveq	#LF,d2
; char after the last one (last to be checked) must not be a LF
; buffer size should be (at least) 1 char larger than filesize
	clr.b	(a0,d0.l)
	addq.l	#1,d0
	moveq	#-1,d1
found:	subq.l	#1,d0
	addq.l	#1,d1
loop:	cmp.b	(a0)+,d2
	dbeq	d0,loop
	beq.b	found
	sub.l	#1<<16,d0	; support for files >64kb
	bcc.b	loop
	move.l	d1,_LineCount

Last edited by a/b; 11 June 2024 at 23:28. Reason: large file support (hello 21st century) :P
a/b is offline  
Old 12 June 2024, 03:07   #3
jimshaw
Registered User
 
jimshaw's Avatar
 
Join Date: Apr 2021
Location: Sydney
Posts: 6
Instead of branching, you could try
seq d3
add.l d3,d1
after the cmp.

Your total will end up 255 times too big, so you would need to divide that out.
jimshaw is offline  
Old 12 June 2024, 19:49   #4
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
Nice trick, anyway divide can overflow on 68000.
Maybe something like this can works and be fast enough?

Code:
 moveq #0,d3
 ......

 seq d3
 neg.b d3
 add.l d3,d1
Don_Adan is offline  
Old 12 June 2024, 21:21   #5
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 262
Nice suggestions everyone!

I tested the seq trick with a 2MB file with 108000 lines, and at 7MHz loading and execution 18 secs, without the divide by 255 at the end.

Fast processing of the larger files is a priority, so I can perhaps unroll the loop combined with code to do proper 32/32 divide to get it even faster.
Yulquen74 is offline  
Old 12 June 2024, 21:28   #6
Thomas Richter
Registered User
 
Join Date: Jan 2019
Location: Germany
Posts: 3,302
For all practical matters, the I/O will be dominating the actual processing of the file content...
Thomas Richter is offline  
Old 12 June 2024, 22:20   #7
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,053
seq&co will be done per char, so unless you expect a very high % of NLs, you want as few instructions per char as possible. E.g. 8+10 cycles for cmp/dbeq for every non-NL char beats seq&co easily.
Using your example, 2mb chars and 108k lines (~5% of NLs):
- cmp/seq/add/dbf ~= 30*2mb ~= 63mil cycles
- cmp/dbeq = 18*(2mb-108k)+48*108k ~= 41mil cycles
50% faster and don't even have to bother unrolling it (it would still be faster).
a/b is offline  
Old 13 June 2024, 02:10   #8
a/b
Registered User
 
Join Date: Jun 2016
Location: europe
Posts: 1,053
Since I have nothing better to do while waiting on an NBA game, here is another take. A little faster but longer (unrolled). The same buffer requirement as above (1 extra byte at the end).
Code:
	move.l	_BufferPt,a0
	move.l	a0,a1
	add.l	_FileSz,a1
	moveq	#LF,d0
	move.b	d0,(a1)
	moveq	#-1,d1
loop	REPT	15
	cmp.b	(a0)+,d0
	beq.b	found
	ENDR
	cmp.b	(a0)+,d0
	bne.b	loop
found	addq.l	#1,d1
	cmp.l	a1,a0
	bls.b	loop
	move.l	d1,_LineCount
It's roughly 35-36mil cycles.
a/b is offline  
Old 13 June 2024, 08:02   #9
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 262
Great, will test it later!
Yulquen74 is offline  
Old 13 June 2024, 09:18   #10
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
Nice code.
Anyway, when I was young, sometimes I needed "in place" searching/patching without extra allocated 1 byte.
I dont want to copy source file and alloc new buffer too.
I used something like this:
Code:
	move.l	_BufferPt,a0
	move.l	a0,a1
	add.l	_FileSz,a1
	moveq	#LF,d0
   move.b (a1),d2          ; backup byte
	move.b	d0,(a1)
	moveq	#-1,d1
loop	REPT	15
	cmp.b	(a0)+,d0
	beq.b	found
	ENDR
	cmp.b	(a0)+,d0
	bne.b	loop
found	addq.l	#1,d1
	cmp.l	a1,a0
	bls.b	loop
  move.b d2,(a1)            ; restore byte
	move.l	d1,_LineCount
Don_Adan is offline  
Old 13 June 2024, 10:39   #11
roondar
Registered User
 
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,436
Quote:
Originally Posted by Yulquen74 View Post
I tested the seq trick with a 2MB file with 108000 lines, and at 7MHz loading and execution 18 secs, without the divide by 255 at the end.
Quote:
Originally Posted by a/b View Post
- cmp/seq/add/dbf ~= 30*2mb ~= 63mil cycles- cmp/dbeq = 18*(2mb-108k)+48*108k ~= 41mil cycles
Quote:
Originally Posted by a/b View Post
It's roughly 35-36mil cycles.
Just to give a different perspective than millions of cycles, I converted the above to seconds and a percentage of the total 18 seconds runtime.
  • Assuming cmp/seq/add/dbf is "the seq trick", that code will spend ~8,9 seconds counting newlines out of 18 seconds (or ~41% of the total time spent).
  • The simpler cmp/dbeq would take ~5,8 seconds counting newlines out of 15 seconds (or ~38% of the total time spent).
  • The unrolled code would take ~5,1 seconds counting newlines out of 14 seconds (or ~36% of the totel time spent)
Of course, these are theoretical numbers and they assume the system can spend all it's available cycles on counting newlines with the CPU for the period of time it's doing the counting. This is of course not entirely accurate, as the OS will both have some overhead and some form of disk caching that will likely impact the actual results.
roondar is offline  
Old 13 June 2024, 15:20   #12
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 65
For not allocating an extra byte, you could temporarily change the last byte (if it's writeable, and if no other code is reading from it).

But fastest would be doing the counting on the fly during loading, though results depend on whether it's loaded with/without OS, from which media, with/without RAM and disk cache.

And there are cases where you don't really need to know the exact number of lines, such like displaying a vertical scrollbar for a large file: simply use the byte-address as scrollbar index.

Last edited by nocash; 13 June 2024 at 15:26.
nocash is offline  
Old 13 June 2024, 15:43   #13
Don_Adan
Registered User
 
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,029
Counting on the fly during loading will be much slower.
For me not worth.
Maybe only, if not enough continues memory available for loading big file and it will be second (optional) routine.
But this is dependent for what this routine is necessary.
Don_Adan is offline  
Old 13 June 2024, 18:39   #14
Yulquen74
Registered User
 
Join Date: May 2013
Location: Kleppe / Norway
Posts: 262
So, at worst case (7MHz) a/b's new code uses about 7.3 secs to complete using the 2.1MB example file (1000 startup-sequences stitched together).
About 1 sec is spent loading the program + file to RAM (A590).

At regular speed in this A500 (61MHz/1ws in this case) the operation takes about 1.6 secs. Just under 1s is spent on HD activity.

So this is a great improvement, and I am pretty happy with the code as it is now.
Yulquen74 is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
feature request: frame counter a/b support.WinUAE 3 24 May 2023 21:15
Frame counter in Blitz LateBit Coders. Blitz Basic 3 27 December 2022 21:40
dbra to handle long counter Geijer Coders. Asm / Hardware 11 27 November 2020 10:50
Address pointers with Program Counter Lonewolf10 Coders. Asm / Hardware 8 27 October 2015 11:40
Program Counter with Offset - why? Jherek Carnelia Coders. General 26 21 March 2011 10:49

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 11:04.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.11472 seconds with 14 queries