11 June 2024, 21:14 | #1 |
Registered User
Join Date: May 2013
Location: Kleppe / Norway
Posts: 273
|
Linefeed counter code
Below snip is part of something I pieced together, and it appears to work.
Can it be made faster? Target is 68000. move.l _FileSz,d0 clr.l d1 move.l _BufferPt,a0 LineCountLoop: cmp.b #LF,0(a0,d0) beq NewLineFound LineCountLoop2: dbra d0,LineCountLoop bra LineCountStop NewLineFound: addq.l #1,d1 bra LineCountLoop2 LineCountStop: move.l d1,_LineCount |
11 June 2024, 21:55 | #2 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
Something like this:
Code:
move.l _BufferPt,a0 move.l _FileSz,d0 moveq #LF,d2 ; char after the last one (last to be checked) must not be a LF ; buffer size should be (at least) 1 char larger than filesize clr.b (a0,d0.l) addq.l #1,d0 moveq #-1,d1 found: subq.l #1,d0 addq.l #1,d1 loop: cmp.b (a0)+,d2 dbeq d0,loop beq.b found sub.l #1<<16,d0 ; support for files >64kb bcc.b loop move.l d1,_LineCount Last edited by a/b; 11 June 2024 at 23:28. Reason: large file support (hello 21st century) :P |
12 June 2024, 03:07 | #3 |
Registered User
Join Date: Apr 2021
Location: Sydney
Posts: 8
|
Instead of branching, you could try
seq d3 add.l d3,d1 after the cmp. Your total will end up 255 times too big, so you would need to divide that out. |
12 June 2024, 19:49 | #4 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,088
|
Nice trick, anyway divide can overflow on 68000.
Maybe something like this can works and be fast enough? Code:
moveq #0,d3 ...... seq d3 neg.b d3 add.l d3,d1 |
12 June 2024, 21:21 | #5 |
Registered User
Join Date: May 2013
Location: Kleppe / Norway
Posts: 273
|
Nice suggestions everyone!
I tested the seq trick with a 2MB file with 108000 lines, and at 7MHz loading and execution 18 secs, without the divide by 255 at the end. Fast processing of the larger files is a priority, so I can perhaps unroll the loop combined with code to do proper 32/32 divide to get it even faster. |
12 June 2024, 21:28 | #6 |
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 3,403
|
For all practical matters, the I/O will be dominating the actual processing of the file content...
|
12 June 2024, 22:20 | #7 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
seq&co will be done per char, so unless you expect a very high % of NLs, you want as few instructions per char as possible. E.g. 8+10 cycles for cmp/dbeq for every non-NL char beats seq&co easily.
Using your example, 2mb chars and 108k lines (~5% of NLs): - cmp/seq/add/dbf ~= 30*2mb ~= 63mil cycles - cmp/dbeq = 18*(2mb-108k)+48*108k ~= 41mil cycles 50% faster and don't even have to bother unrolling it (it would still be faster). |
13 June 2024, 02:10 | #8 |
Registered User
Join Date: Jun 2016
Location: europe
Posts: 1,087
|
Since I have nothing better to do while waiting on an NBA game, here is another take. A little faster but longer (unrolled). The same buffer requirement as above (1 extra byte at the end).
Code:
move.l _BufferPt,a0 move.l a0,a1 add.l _FileSz,a1 moveq #LF,d0 move.b d0,(a1) moveq #-1,d1 loop REPT 15 cmp.b (a0)+,d0 beq.b found ENDR cmp.b (a0)+,d0 bne.b loop found addq.l #1,d1 cmp.l a1,a0 bls.b loop move.l d1,_LineCount |
13 June 2024, 08:02 | #9 |
Registered User
Join Date: May 2013
Location: Kleppe / Norway
Posts: 273
|
Great, will test it later!
|
13 June 2024, 09:18 | #10 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,088
|
Nice code.
Anyway, when I was young, sometimes I needed "in place" searching/patching without extra allocated 1 byte. I dont want to copy source file and alloc new buffer too. I used something like this: Code:
move.l _BufferPt,a0 move.l a0,a1 add.l _FileSz,a1 moveq #LF,d0 move.b (a1),d2 ; backup byte move.b d0,(a1) moveq #-1,d1 loop REPT 15 cmp.b (a0)+,d0 beq.b found ENDR cmp.b (a0)+,d0 bne.b loop found addq.l #1,d1 cmp.l a1,a0 bls.b loop move.b d2,(a1) ; restore byte move.l d1,_LineCount |
13 June 2024, 10:39 | #11 | ||
Registered User
Join Date: Jul 2015
Location: The Netherlands
Posts: 3,456
|
Quote:
Quote:
|
||
13 June 2024, 15:20 | #12 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
For not allocating an extra byte, you could temporarily change the last byte (if it's writeable, and if no other code is reading from it).
But fastest would be doing the counting on the fly during loading, though results depend on whether it's loaded with/without OS, from which media, with/without RAM and disk cache. And there are cases where you don't really need to know the exact number of lines, such like displaying a vertical scrollbar for a large file: simply use the byte-address as scrollbar index. Last edited by nocash; 13 June 2024 at 15:26. |
13 June 2024, 15:43 | #13 |
Registered User
Join Date: Jan 2008
Location: Warsaw/Poland
Age: 56
Posts: 2,088
|
Counting on the fly during loading will be much slower.
For me not worth. Maybe only, if not enough continues memory available for loading big file and it will be second (optional) routine. But this is dependent for what this routine is necessary. |
13 June 2024, 18:39 | #14 |
Registered User
Join Date: May 2013
Location: Kleppe / Norway
Posts: 273
|
So, at worst case (7MHz) a/b's new code uses about 7.3 secs to complete using the 2.1MB example file (1000 startup-sequences stitched together).
About 1 sec is spent loading the program + file to RAM (A590). At regular speed in this A500 (61MHz/1ws in this case) the operation takes about 1.6 secs. Just under 1s is spent on HD activity. So this is a great improvement, and I am pretty happy with the code as it is now. |
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests) | |
Thread Tools | |
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
feature request: frame counter | a/b | support.WinUAE | 3 | 24 May 2023 21:15 |
Frame counter in Blitz | LateBit | Coders. Blitz Basic | 3 | 27 December 2022 21:40 |
dbra to handle long counter | Geijer | Coders. Asm / Hardware | 11 | 27 November 2020 10:50 |
Address pointers with Program Counter | Lonewolf10 | Coders. Asm / Hardware | 8 | 27 October 2015 11:40 |
Program Counter with Offset - why? | Jherek Carnelia | Coders. General | 26 | 21 March 2011 10:49 |
|
|