English Amiga Board


Old 20 July 2012, 20:23   #1
DH
Global Moderator
 
DH's Avatar
 
Join Date: Sep 2008
Location: Might as well be WORK :(
Age: 56
Posts: 4,110
HELP!!! Grapevine Diskmag Articles: Volunteers Wanted

Right, a quick explanation of what I'm doing: I am converting the Grapevine Diskmags to PDF (complete).

I can get hold of the contents of #01 - #04 from bitfellas, but #05 - #07 will have to be written out by hand unless I can persuade Mr Oakvalley to rip the text using his Screenshot OCR proggy, or someone has the time and knowledge to disassemble issues #05 - #07 and retrieve the contents of the articles. Either one would be greatly appreciated.

So don't worry too much about those issues at this time

I would like a few volunteers to help by saving out the articles as .txt files using the filename 'Gv_09-Article###.txt', and that's about it. The '###' represents the 3-digit number allocated to the actual article, and '.txt' can be omitted when saving as it's added automatically.
  • GV#08
  • * GV#09 (Being Processed)
  • GV#10
  • GV#11
  • GV#12
  • GV#13
  • GV#14
  • GV#15
  • GV#16
  • GV#17
  • GV#18
  • GV#19
  • GV#20
  • GV#21

* What's in bold green has already been ripped
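
For anyone scripting the renaming, the filename scheme can be sketched in a few lines of Python. This is only an illustration: `article_filename` is a made-up helper name, and the two-digit issue number is an assumption read off the 'Gv_09' example.

```python
def article_filename(issue, article):
    """Build a name such as 'Gv_09-Article001.txt'."""
    if not 1 <= article <= 999:
        raise ValueError("article number must fit in three digits")
    # Assumption: the issue number is zero-padded to two digits, as in 'Gv_09'.
    return f"Gv_{issue:02d}-Article{article:03d}.txt"


print(article_filename(9, 1))
```
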

Obviously, for best results you'll need to use WinUAE with the 'Most Common A500' setting, adding a non-bootable HDD to save the images into. Two tweaks speed up the time taken to load, view and save each article: set 'Hardware/CPU & FPU/CPU Emulation Speed/' to fastest possible and 'Floppy Drives/Floppy Drive Emulation Speed/' to Turbo. I cannot guarantee it will work and behave correctly 100% of the time, so if you have problems, return these settings to their defaults.

For those of you who are not familiar with the diskmag or its controls (on some issues they are just pictures), I can give you some basic tips on how to use it and how to minimise the time it takes.

If anyone takes up the request, can you please post which issue you have undertaken, so as to stop others duplicating what's already being processed, and stick to only that issue until it's complete.

I don't really know if this is a worthy project or not, I don't even know if this has ever been done/tried before, just seems a shame that this old diskmag has never been modernised for speedy viewing before now. Maybe I'm nuts, maybe the task at hand is too large, but all I can do is try.


EDIT: I can't do this on my own, it takes far too long from retrieving the articles to re-formatting and then converting them to pdf, so basically, if I don't get any help, the project is not going to go ahead, period.

Last edited by DH; 21 July 2012 at 13:32.
DH is offline  
Old 29 August 2012, 22:31   #2
StoneOakvalley
Registered User
 
Join Date: Jan 2009
Location: Norway
Age: 49
Posts: 105
My PointTracer-OCR v0.1 software is just a piece of code that isn't really much help to others, although:

Some notes that I scribbled down:
Issue 5 = font style 1
Issues 6-7 = same as above, but uppercase only
Issues 8-21 = font style 2

;Define letter area
x=44 ; use 48 for issues 5-7, use 44 for issues 8-21
y=86 ; use 87 for issues 5-7, use 84 for issues 8-13, use 86 for issues 14-21
xx=680 ; use 680 for issues 8-21
yy=415 ; use 385 for issues 5-7, use 415 for issues 8-21

These were the grabbing coordinates, in pixel positions, for wherever the text area was in all issues. x,y is the top-left corner, while xx,yy is the bottom-right.
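
The per-issue values in those notes could be wrapped up roughly like this in Python. It is a sketch, not the PointTracer-OCR code: `text_area` is a hypothetical name, and the notes give no xx value for issues 5-7, so 680 is assumed there.

```python
def text_area(issue):
    """Return (x, y, xx, yy): top-left and bottom-right of the text area."""
    if not 5 <= issue <= 21:
        raise ValueError("the notes only cover issues 5-21")
    x = 48 if issue <= 7 else 44
    if issue <= 7:
        y = 87
    elif issue <= 13:
        y = 84
    else:
        y = 86
    # ASSUMPTION: the notes only state 680 for issues 8-21; reused for 5-7.
    xx = 680
    yy = 385 if issue <= 7 else 415
    return (x, y, xx, yy)


print(text_area(10))
```
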

Talking about the 68OOO issue (uppercase O's turning up where zeros should be), a routine along the lines of "if there is a number and no space next to an uppercase O, produce a 0 (zero) instead" should fix it with a good hit percentage :-)
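
A minimal Python sketch of that rule, assuming the OCR output is plain text. The function name is made up, and the loop is there because each pass only converts the O that directly touches a digit, so "68OOO" needs three passes.

```python
import re

_AFTER_DIGIT = re.compile(r"(?<=\d)O")   # an O directly after a digit
_BEFORE_DIGIT = re.compile(r"O(?=\d)")   # an O directly before a digit


def fix_letter_o(text):
    """Turn uppercase O into 0 wherever it touches a digit with no space."""
    previous = None
    while previous != text:  # repeat until nothing changes any more
        previous = text
        text = _AFTER_DIGIT.sub("0", text)
        text = _BEFORE_DIGIT.sub("0", text)
    return text


print(fix_letter_o("Motorola 68OOO"))
```

Words such as "OK" are left alone because no digit touches the O, but a genuine O right next to a number would still be converted, so it is a good hit percentage rather than a guarantee.
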

Now, getting all the issues out can be a tiresome process. Doing what you planned will take as long as the original issues took to compile together... years :-) It has to be automated, but the problem is that they used colouring, illustrations and photos, along with paragraph breaks and stuff.

There should be a kind-of automated way of ripping every page out using some home-produced tools: automate key presses and automate WinUAE + screengrabbing. Naturally that does not exist yet, but I'm pretty sure that if I get the time it would be possible.

It should be triggered by scanning the "page x of x" text for each article and keeping counters alongside it, and voila, something automated might pop out as numbered screenshots.
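
The counter idea might look something like this in Python. Purely a sketch: the exact wording of the on-screen counter, the function names and the screenshot naming pattern are all assumptions, and the key-press and screengrab automation itself is not shown.

```python
import re

# ASSUMPTION: the OCR'd counter reads like "page 3 of 7" (any letter case).
_PAGE = re.compile(r"page\s+(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def read_page_counter(ocr_text):
    """Return (current, total) from OCR'd text, or None if no counter found."""
    match = _PAGE.search(ocr_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def screenshot_names(article, total_pages):
    """Numbered screenshot names for one article (hypothetical pattern)."""
    return [f"article{article:03d}_page{page:02d}.png"
            for page in range(1, total_pages + 1)]


print(read_page_counter("Page 2 of 5"))
```
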
StoneOakvalley is offline  
Old 29 August 2012, 23:38   #3
DH
Global Moderator
 
DH's Avatar
 
Join Date: Sep 2008
Location: Might as well be WORK :(
Age: 56
Posts: 4,110
I've nearly completed disk 1 of GV #9 and boy is it demanding. Their method of saving the articles doesn't always work correctly, sometimes there are lines missing, usually on page 1 only, and even whole pages are missing, so it's not really that reliable.

Even when it's done, the whole lot has to be re-formatted... completely, which is not too bad when you're in the swing of things.

Not really sure if this is worthwhile tbh, I suppose it'll all depend on any automation process, but we'll have to wait and see
DH is offline  
Old 30 August 2012, 04:45   #4
Codetapper
2 contact me: email only!
 
Codetapper's Avatar
 
Join Date: May 2001
Location: Auckland / New Zealand
Posts: 3,182
Would it not be better to put time into understanding the internal format so it can rip all the articles automatically? Even if you had to go back through later manually and extract the pages with pictures, surely that would be a lot faster than this technique of OCR'ing every screen?

I made an XFD decruncher to decrypt the 'Loons' docs disks as they had a very basic line scrambler built into them. Manually doing the work on each doc would have taken forever!
Codetapper is offline  
Old 30 August 2012, 13:15   #5
DH
Global Moderator
 
DH's Avatar
 
Join Date: Sep 2008
Location: Might as well be WORK :(
Age: 56
Posts: 4,110
Indeed it would be, Codetapper. The only reason I didn't ask in post #1 is that I wasn't sure anybody would be willing to spend their time working it out and creating the auto rip.

I'll complete GV#9 (both disks) the way I'm doing it for now, but after that there has to be an automation process or it just isn't going to happen.
DH is offline  
Old 09 April 2015, 15:53   #6
Pyromania
Moderator
 
Pyromania's Avatar
 
Join Date: Jan 2002
Location: Chicago, IL
Posts: 3,375
Was this ever completed?
Pyromania is offline  
Old 03 November 2017, 14:09   #7
DH
Global Moderator
 
DH's Avatar
 
Join Date: Sep 2008
Location: Might as well be WORK :(
Age: 56
Posts: 4,110
Crap, completely missed your post Pyromania

Well, here's your extremely late (by 2.5 years) answer: nope, it wasn't completed, as there simply wasn't an automated way to rip the contents.
DH is offline  
Old 05 November 2023, 18:12   #8
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
Hi there guys.

(Sorry for necro'ing an old thread).

I absolutely loved Grapevine BITD. My favourite diskmag.
This sounds like a very interesting project (and a HUGE amount of work).

@DH - Did you get any further with this ?
Or did you get to issue 09 and stop there ?

It piqued my curiosity, so I took a quick look at issue 10
using FS-UAE, and opened the articles in a couple of different
hex editors and text editors.

All with the same result: gobbledygook, with some basic text
amongst all the hex. (No surprises there at all, really.)

As others have said this would probably take -

1. Understanding the packing format for the files, so that they can be
extracted properly.

2. OCR'ing the articles individually

3. Contacting any of the LSD team to see if they would be willing to help out.
(E.G. Describing how the articles were packed/crunched - or if they still have
the old .txt files kicking around - Yes highly unlikely after such a long time)

I also tried some OCR work with OCR-Feeder - but got terrible results
(mainly due to the very poor DPI produced by the screenshot)

However, I have had a bit of success with "gscan2pdf" on Linux:
once you invert the colours to a white background with black text,
the accuracy is massively better.

Now just trying to see if I can -
Batch upload the images
Invert the colours
Scan / OCR
produce a .txt file
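
Those four steps could be batch-scripted along these lines in Python. Note this swaps in the ImageMagick `convert` and `tesseract` command-line tools rather than gscan2pdf itself, and assumes both are installed and on the PATH; the function names and the `_neg` suffix are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(image, work_dir):
    """Return the two command lines for one screenshot: invert, then OCR."""
    image = Path(image)
    work_dir = Path(work_dir)
    inverted = work_dir / f"{image.stem}_neg.png"
    out_base = work_dir / image.stem  # tesseract appends ".txt" to this name
    return [
        ["convert", str(image), "-negate", str(inverted)],
        ["tesseract", str(inverted), str(out_base)],
    ]


def ocr_folder(src_dir, work_dir):
    """Run both steps for every PNG screenshot in src_dir."""
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    for image in sorted(Path(src_dir).glob("*.png")):
        for argv in build_commands(image, work_dir):
            subprocess.run(argv, check=True)
```

Calling `ocr_folder("screenshots", "ocr_out")` would leave one .txt file per screenshot in `ocr_out`, which still needs the manual clean-up.
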

Last edited by Geordie-Jedi; 05 November 2023 at 18:35. Reason: additional information
Geordie-Jedi is offline  
Old 12 November 2023, 20:49   #9
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
After a bit more experimentation I have found -

4. Using gscan2pdf on Linux.

4.1. You can upload multiple images (screen shots).
4.2. Convert them to negative (white background and black text).
4.3. Select the area of the screen that you want to OCR.
4.4. Run the OCR scan.
4.5. Clean up the resultant text file.


5. Using Google docs and then OCR

5.1. You don't need to convert the images to negative.
5.2. Google-OCR will produce reasonable text from the screenshots.
5.3. Download the Google-OCR results.
5.4. Clean up the resultant text file.

6. Either way, you still have to manually clean up the text file produced by the OCR scan.

The 2-pane view of the pages in Grapevine seems to confuse the OCR scanners.
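
One thing that might help with the 2-pane layout is to OCR each pane separately. Here is a rough Python sketch that splits a text-area box into two column boxes; the example box uses the issue 14-21 grabbing coordinates StoneOakvalley posted earlier in the thread, and splitting exactly down the middle is an assumption that would need checking against real screenshots.

```python
def split_columns(box):
    """Split (x, y, xx, yy) into left and right column boxes."""
    x, y, xx, yy = box
    mid = (x + xx) // 2  # ASSUMPTION: the gap between panes is at the midpoint
    return (x, y, mid, yy), (mid, y, xx, yy)


left, right = split_columns((44, 86, 680, 415))
print(left, right)
```
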

As DH and others have said, it's a HUGE amount of work without loads of people chipping in,
or someone trying to "unpack / decrunch" the articles.

Last edited by Geordie-Jedi; 12 November 2023 at 20:51. Reason: Spelling
Geordie-Jedi is offline  
Old 13 November 2023, 16:45   #10
andy2004
Zone Friend
 
Join Date: May 2006
Location: Hampshire
Age: 49
Posts: 271
From what I remember about Grapevine..
the articles were compressed with PowerPacker and then joined together.. the PP20 header was replaced with something else, to hide the fact that PowerPacker was used..
There was an app that came with the xpkmaster archive (xscan or something) that I used to scan the Grapevine bin.. it could extract and unpack the txts.
andy2004 is offline  
Old 14 November 2023, 09:08   #11
h0ffman
Registered User
 
Join Date: Aug 2008
Location: Salisbury
Posts: 744
Each article appears to have a header of "TXT!" which is probably just a replacement of "PP20". Splitting them out and unpacking is likely a piece of piss.
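
A quick way to check that hunch is to list every position where "TXT!" turns up in the articles file. A small Python sketch (the function name is made up):

```python
def find_markers(blob, marker=b"TXT!"):
    """Return every offset at which marker occurs in blob."""
    offsets = []
    pos = blob.find(marker)
    while pos != -1:
        offsets.append(pos)
        pos = blob.find(marker, pos + 1)
    return offsets


print(find_markers(b"xxTXT!abcTXT!z"))
```

The same four bytes could in principle turn up inside packed data by chance, so treat the hits as candidates rather than proof.
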
h0ffman is offline  
Old 14 November 2023, 09:31   #12
h0ffman
Registered User
 
Join Date: Aug 2008
Location: Salisbury
Posts: 744
Yes, it was easy.


h0ffman is offline  
Old 14 November 2023, 09:36   #13
TCD
HOL/FTP busy bee
 
TCD's Avatar
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,518
Quote:
Originally Posted by h0ffman View Post
Yes, it was easy.
Nice one
TCD is offline  
Old 14 November 2023, 10:28   #14
h0ffman
Registered User
 
Join Date: Aug 2008
Location: Salisbury
Posts: 744
Windows command line executable now in the zone "GV.zip". It only extracts the articles and fixes the header. You'll need to unpack them from Power Packer yourself.
h0ffman is offline  
Old 14 November 2023, 20:03   #15
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
@andy2004 - Cheers Andy, that's interesting.

7. I downloaded and installed a few different packers / crunchers
onto my emulated Amiga A1200, and tried to unpack / access the
Grapevine text files.

I didn't have much joy....

I kept getting errors like -

7.1. "Buffer overflow cant crunch / de-crunch the file"
(I wasn't trying to crunch the file, only open it).

7.2. "No hunk header, not a command file".
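
The wording of that second error suggests the tool was expecting an Amiga executable, which starts with the hunk header longword $000003F3, rather than a data file. A small Python sketch that tells the cases apart by their first four bytes (the "TXT!" case relies on what h0ffman found; the function name is made up):

```python
HUNK_HEADER = b"\x00\x00\x03\xf3"  # first longword of an AmigaDOS executable


def identify(data):
    """Rough guess at what a file is from its first four bytes."""
    head = data[:4]
    if head == HUNK_HEADER:
        return "AmigaDOS executable (hunk header)"
    if head == b"PP20":
        return "PowerPacker data file"
    if head == b"TXT!":
        return "Grapevine article (PP20 replaced with TXT!)"
    return "unknown"


print(identify(b"TXT!rest-of-file"))
```
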


@H0ffman - Wow ! Outstanding work mate ! Well done.

Can you tell me more about how you managed to do this please ?


Other questions (if you don't mind ?) -

8. Did you only need/use that Windows CLI tool to extract the files
or amend the headers of the file (so that they can then be read
by PowerPacker once more) ?

9. Did you use any other tools to get this done ?
9.1. Were they Windows or Amiga apps ?

Thank you very much indeed for everyone's help on this project.
It's very much appreciated
Geordie-Jedi is offline  
Old 14 November 2023, 21:04   #16
h0ffman
Registered User
 
Join Date: Aug 2008
Location: Salisbury
Posts: 744
Quote:
Originally Posted by Geordie-Jedi View Post
@andy2004 - Cheers Andy, that's interesting.

@H0ffman - Wow ! Outstanding work mate ! Well done.

Can you tell me more about how you managed to do this please ?


Other questions (if you don't mind ?) -

8. Did you only need/use that Windows CLI tool to extract the files
or amend the headers of the file (so that they can then be read
by PowerPacker once more) ?

9. Did you use any other tools to get this done ?
9.1. Were they Windows or Amiga apps ?

Thank you very much indeed for everyone's help on this project.
It's very much appreciated
I looked at the Grapevine articles file in a hex editor; the format of it is really simple.

Code:
$2  bytes - Number of articles

[repeat for each article]
$2c bytes - Article title
$4  bytes - Article offset in file
$4  bytes - Article file size

rest..  data files packed with PowerPacker, with PP20 replaced with TXT!
Once you've found all the articles, you just pull them out of the file, replace the header and save them individually. With the format determined I chalked up a quick console app in C#, took about 5 minutes. You could write this tool in any language easily enough.
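
For anyone not on Windows, here is a rough Python sketch of the same idea. It is not the C# tool, just one reading of the layout above, with three assumptions: the numbers are big-endian (as is usual on the Amiga), offsets are counted from the start of the file, and titles are padded with NULs or spaces. The function names are made up.

```python
import struct

TITLE_LEN = 0x2C
# One directory entry: title, offset, size. Big-endian is an ASSUMPTION.
ENTRY = struct.Struct(f">{TITLE_LEN}sII")


def read_directory(blob):
    """Return a list of (title, offset, size) tuples from the file header."""
    (count,) = struct.unpack_from(">H", blob, 0)
    entries = []
    for index in range(count):
        raw_title, offset, size = ENTRY.unpack_from(blob, 2 + index * ENTRY.size)
        # ASSUMPTION: titles are padded with NUL bytes or spaces.
        title = raw_title.decode("latin-1").rstrip("\x00 ")
        entries.append((title, offset, size))
    return entries


def extract_articles(blob):
    """Return (title, data) pairs with the PP20 header put back."""
    articles = []
    for title, offset, size in read_directory(blob):
        data = blob[offset:offset + size]  # ASSUMPTION: offset from file start
        if data[:4] == b"TXT!":
            data = b"PP20" + data[4:]
        articles.append((title, data))
    return articles
```

To use it, read the articles file with `open(path, "rb").read()`, pass the bytes to `extract_articles` and write each `data` out to its own file. The results are still PowerPacker-packed and need decrunching separately.
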
h0ffman is offline  
Old 19 November 2023, 19:02   #17
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
@H0ffman

Thank you very much for the script.

I've taken a quick look at it in a text editor, and it mostly makes sense to me
(although I've not programmed properly for donkeys).

I run Linux as my OS, and I spun up a version of Wine to try and run the script.
However, I'm having a bit of difficulty getting it to run in either a 32-bit Wineprefix
or a 32-bit version of PlayOnLinux.

Would you mind If I shared your code on another forum, to see if anyone else
can come up with a Linux equivalent script or bash app, please ?

I don't want to pester the life outta you with a bunch of extra questions
seeing as you've already provided a great deal of help so far.
Geordie-Jedi is offline  
Old 20 November 2023, 09:30   #18
h0ffman
Registered User
 
Join Date: Aug 2008
Location: Salisbury
Posts: 744
See below. C# source code attached.
Attached Files
File Type: zip Program.zip (1.2 KB, 33 views)
h0ffman is offline  
Old 20 November 2023, 22:59   #19
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
Wow ! That's outstanding

Thanks again H0ffman
Geordie-Jedi is offline  
Old 22 November 2023, 01:12   #20
Geordie-Jedi
Registered User
 
Geordie-Jedi's Avatar
 
Join Date: Sep 2023
Location: UK - North East England
Posts: 45
Hi again H0ffman
If I can indulge your patience a little bit more.............

I have almost got your script/app running in Linux, by doing the following -

1.1. Downloaded and installed the whole Mono suite (including Mono mcs)

1.2. I have created a test dir that contains -

Program.zip
Program.cs
GrapevineArticles#1-

1.3. Built the Program.exe from the Program.cs using Mono mcs via the CLI

1.4. This is the Code I ran to try and run the Program.exe from the CLI
Code:
./Program.exe GrapevineArticles#1-
However when I try to then run the Program.exe, I get the following error messages -

Code:
Grapevine article ripper

Extracking GrapevineArticles#1

Unhandled Exception:
System.IO.FileNotFoundException: Could not find file "/home/[my-user-name]/Documents/programming/GrapevineArticles#1-"
File name: '/home/[my-user-name]/Documents/programming/GrapevineArticles#1-'
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x001ef] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int)
  at System.IO.File.ReadAllBytes (System.String path) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at GV.Program.Main (System.String[] args) [0x00061] in <34d8276890744542a3859d8312694d7c>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.IO.FileNotFoundException: Could not find file "/home/[my-user-name]/Documents/programming/GrapevineArticles#1-"
File name: '/home/[my-user-name]/Documents/programming/GrapevineArticles#1-'
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x001ef] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int)
  at System.IO.File.ReadAllBytes (System.String path) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at GV.Program.Main (System.String[] args) [0x00061] in <34d8276890744542a3859d8312694d7c>:0 
[my-user-name]@falcon:~/Documents/programming$ (./Program.exe GrapevineArticles#1)
Grapevine article ripper

Extracking GrapevineArticles#1


Unhandled Exception:
System.IO.FileNotFoundException: Could not find file "/home/[my-user-name]/Documents/programming/GrapevineArticles#1"
File name: '/home/[my-user-name]/Documents/programming/GrapevineArticles#1'
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x001ef] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int)
  at System.IO.File.ReadAllBytes (System.String path) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at GV.Program.Main (System.String[] args) [0x00061] in <34d8276890744542a3859d8312694d7c>:0 
[ERROR] FATAL UNHANDLED EXCEPTION: System.IO.FileNotFoundException: Could not find file "/home/[my-user-name]/Documents/programming/GrapevineArticles#1"
File name: '/home/[my-user-name]/Documents/programming/GrapevineArticles#1'
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize, System.Boolean anonymous, System.IO.FileOptions options) [0x001ef] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at System.IO.FileStream..ctor (System.String path, System.IO.FileMode mode, System.IO.FileAccess access, System.IO.FileShare share, System.Int32 bufferSize) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at (wrapper remoting-invoke-with-check) System.IO.FileStream..ctor(string,System.IO.FileMode,System.IO.FileAccess,System.IO.FileShare,int)
  at System.IO.File.ReadAllBytes (System.String path) [0x00000] in <12b418a7818c4ca0893feeaaf67f1e7f>:0 
  at GV.Program.Main (System.String[] args) [0x00061] in <34d8276890744542a3859d8312694d7c>:0

N.B. I have edited this error message just to replace my actual name with [my-user-name]
That's the only change I have made to the error message


Question:

Do you have any ideas why this might be failing to produce the extracted
files that you get when you run the program yourself ?
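
All the exception says for certain is that no file with exactly that name exists in the working directory, and Linux filenames are case-sensitive. A small Python sketch for comparing the name passed on the command line against what is actually on disk (the helper name is made up, and this is only a way of checking, not a diagnosis):

```python
import os


def similar_names(directory, wanted):
    """List entries whose names start like `wanted`, ignoring letter case."""
    prefix = wanted.rstrip("-").lower()
    return sorted(name for name in os.listdir(directory)
                  if name.lower().startswith(prefix))


print(similar_names(".", "GrapevineArticles#1-"))
```
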


Useful details:
OS: Linux (Ubuntu 22.04 LTS)
Kernel: 6.2.0-37-generic (64-bit)
DE: KDE (5.92.0)
Plasma: 5.24.7
Geordie-Jedi is offline  
 

