English Amiga Board


English Amiga Board > Coders > Coders. General
01 March 2023, 20:59 #1
nocash
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 64
DMS compression methods

I am writing a decompressor for floppy disk images in .dms format. Going by the xDMS source code, the compression methods are based on standard formats like RLE90, LZSS, LZHUF, AR002 (although all .dms methods are slightly customized variants of those formats). So far, I can decompress these methods:
Method 0: Uncompressed - some files contain banners in uncompressed form, for example: http://aminet.net/package/demo/mega/love_anarchy
Method 1: Simple (RLE90) - used in various 17bit CD-ROM files: http://cd.textfiles.com/17b5/files/
Method 3: Medium (RLE90+LZSS with Huffman) - used in http://aminet.net/package/demo/mega/Anarchy2
Method 6: Heavy2 (optional RLE90+AR002/lh5) - the most common one, found in various .dms files on Aminet

I haven't found .dms files with these methods:
Method 2: Quick (RLE90+LZSS) - not found
Method 4: Deep (RLE90+LZHUF) - not found
Method 5: Heavy1 (optional RLE90+AR002/lh4) - not found
Would be cool if somebody knows where to find images in those formats for testing (admittedly I am too dumb to use an Amiga emulator to create my own .dms files with it).

For curiosity:
Is it important to support the Append feature (for one disk split into two .dms files)?
Are there any High Density .dms disk images for Amiga?
Are there any .DMS files for DOS, Atari, Mac?
And does the Disk Masher software even exist for those computers?
And there seems to be also a File Masher .FMS variant, was that ever used anywhere?
(asking because the DMS header contains entries for that stuff)

Reportedly there are .dms files with "fake bootblocks". What is that, and where could one find such files?
I assume that they contain 80 normal tracks, plus an additional track 0 entry that does overwrite/patch the original track 0, or parts of it (?)

According to the xDMS source code, methods 2-5 share a single ring buffer for the dictionary (plus some oddities like buffer gaps and separate buffer pointers for each method). I hope that those methods are never combined inside a single .dms file (like AR002 tracks relying on previous LZHUF tracks, with parts of it being overwritten by LZSS tracks).

------------

Is DMS bugged or not?
I am confused there. xDMS doesn't mention any major problems, but Wikipedia says that Disk Masher suffers from bugs in the compression algorithm. It's hard to believe that all compression methods and all software versions were bugged... unless the bug is in the shared RLE compression layer, or in the overall match lookup function?

Apart from actual bugs, there are two generic problems: DMS doesn't store low-level MFM data. And (without the NOZERO option) it only stores blocks that are flagged as used in the allocation bitmap (a nice feature, but fatal if the disk doesn't contain a regular OFS/FFS filesystem).

Some people also mentioned DMS problems here:
Codetapper: https://eab.abime.net/showpost.php?p=19008&postcount=19
StingRay: https://eab.abime.net/showpost.php?p...0&postcount=13
Would be nice to have a freeware sample with an uncompressable disk in .ADF format, so people could try it for themselves!

Looking at the xDMS source code, there is one small issue in the init function:
The last few dozen bytes of the 16 KB dictionary are left uninitialized. Those bytes are only used by the Medium and Deep methods, and they are normally overwritten right away by the first compressed track... unless the first track is only a handful of bytes long; then the uninitialized bytes could stay in the dictionary and keep getting "rotated around" in the ring buffer.
But those bytes would usually be overwritten at some later point before reaching the end of the disk (so it's unlikely to explain problems on track 79). And I've merely spotted that issue in the (unofficial) xDMS source code; the official software might behave differently (and the compressor might use its own lookup tree without caring about the ring buffer contents at all).
Altogether, I doubt that the uninitialized bytes are causing problems (but if they do, they could cause unpredictable results, depending on whether the uninitialized random values happen to match the data on the disk or not).

Last edited by nocash; 01 March 2023 at 21:05.
02 March 2023, 12:53 #2
Bartman
Registered User
Join Date: Feb 2019
Location: Munich, Germany
Posts: 63
Quote:
Originally Posted by nocash
Reportedly there are .dms files with "fake bootblocks", what is that, where could one find that such files?
I assume that they contain 80 normal tracks, plus an additional track 0 entry that does overwrite/patch the original track 0, or parts of it (?)
That was used by BBSs to include adverts. Later DMS versions displayed the bootblock during decompression if it looked non-standard. So I think they just prepended a fake bootblock with their BBS ad which was then immediately overwritten by the real bootblock.
02 March 2023, 13:10 #3
alexh
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,430
Quote:
Originally Posted by nocash
For curiosity:
Is it important to support the Append feature (for one disk split into two .dms files)?
Isn't this necessary if compression makes the .dms file bigger than 880KB?

Quote:
Originally Posted by nocash
Are there any .DMS files for Atari?
Atari ST had its own competing equivalent called .MSA (Magic Shadow Archiver)
02 March 2023, 17:01 #4
Bartman
Registered User
Join Date: Feb 2019
Location: Munich, Germany
Posts: 63
Quote:
Originally Posted by alexh
Isn't this necessary if compression makes the .dms file bigger than 880KB?
Yeah, I think the BBSes had a rule that you had to split your DMS file if it didn't fit on a disk.
04 March 2023, 14:36 #5
temisu
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Good questions,
I researched some of these answers when I wrote my decompressor.

First, I can provide test files for all formats (I will check them into my GitHub). They were generated by me, so there are no copyright issues.

Then there is the question about platforms: the guys behind DMS advertised support for non-Amiga platforms (for a fee), but I do not know whether it ever existed or whether there are any such files. For the Amiga HD floppy format I managed to create a working DMS file, but I doubt anyone really used it.

I have not seen FMS either, but that does not mean it doesn't exist.

About bugs, there are a bunch. Some of them can be circumvented on the fly when decompressing, but others just create broken images.

See here if interested https://github.com/temisu/ancient/bl...compressor.cpp

Last edited by temisu; 05 March 2023 at 16:45.
05 March 2023, 16:32 #6
temisu
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
And now for the promised files...

Tests for all compression methods (with and without password). These files were created by me:
https://github.com/temisu/ancient/ra...st_C1_none.dms
https://github.com/temisu/ancient/ra...1_none_pwd.dms
https://github.com/temisu/ancient/ra..._C1_simple.dms
https://github.com/temisu/ancient/ra...simple_pwd.dms
https://github.com/temisu/ancient/ra...t_C1_quick.dms
https://github.com/temisu/ancient/ra..._quick_pwd.dms
https://github.com/temisu/ancient/ra..._C1_medium.dms
https://github.com/temisu/ancient/ra...medium_pwd.dms
https://github.com/temisu/ancient/ra...st_C1_deep.dms
https://github.com/temisu/ancient/ra...1_deep_pwd.dms
https://github.com/temisu/ancient/ra..._C1_heavy1.dms
https://github.com/temisu/ancient/ra...heavy1_pwd.dms
https://github.com/temisu/ancient/ra..._C1_heavy2.dms
https://github.com/temisu/ancient/ra...heavy2_pwd.dms

And one for HD-disk:
https://github.com/temisu/ancient/ra...s/test_ext.dms

And then some broken file examples from the wild. First, a broken file that can be decompressed correctly with some trickery:

https://www.amigapd.com/uploads/5/5/...10/_asi029.dms

Then a file which can't be reliably decompressed.

https://aminet.net/demo/disk/Eradication.dms

Last edited by temisu; 05 March 2023 at 16:38.
05 March 2023, 19:37 #7
nocash
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 64
Many thanks for the test images! The source code does also look interesting, so far I had only seen the xDMS source code.

Quote:
Originally Posted by alexh
Isn't this necessary if compression makes the .dms file bigger than 880KB?
Yes, but are there many such disk images in use today? Like, when writing an amiga emulator, would it be a must-have feature to support loading/merging such dms files?

Quote:
Originally Posted by alexh
Atari ST had its own competing equivalent called .MSA
If there was any competition... I haven't found any traces of DMS being released for PC/MAC/Atari (except, the internet archive has something called "Disk Masher XE" which seems to be for 8bit Atari, but I couldn't tell if it's the same software from same author).

Or maybe the PC/Mac/Atari support did just mean that the Amiga version could compress floppies from such computers?

Quote:
Originally Posted by Bartman
Later DMS versions displayed the bootblock during decompression if it looked non-standard.
Those bootblocks were displayed (as text?) during decompression? That's weird. I thought the banners were used for that purpose.
Would be nice to see a dms file with such bootblocks.

Apropos, another oddity is the "file_id.diz" feature, as in the "miamivic.dms" file here https://telparia.com/fileFormatSamples/archive/dms/ - those diz files are stored as "track 80" which seems to rule out any support for compressing disks with more than 80 tracks?
05 March 2023, 21:43 #8
temisu
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Originally Posted by nocash
Or maybe the PC/Mac/Atari support did just mean that the Amiga version could compress floppies from such computers?
From official documentation...

Code:
        For $30 you will be mailed a complete hard-copy manual and the
    latest version of DMS on disk.  You will also receive an account on the
    SDS Software bulletin board system and be able to download via modem
    the latest versions free of charge.  If you do not wish or have the
    capability to call the BBS , you can at any time send your disk with an
    SASE and we will copy the latest versions and mail it back to you. 
    It also gives you an unlimited Upload/Download ratio on the BBS for
    one year.  Along with the CLI/Shell version of DMS you will also receive
    when available:

        DMSPro - an advanced version of DMS with faster routines and
                 the capability to archive Amax, Mac, MS-DOS, and Atari-ST
                 format disks.
I think the keyword here is "when available".
Also, it probably still means an Amiga executable.

Quote:
Originally Posted by nocash
Apropos, another oddity is the "file_id.diz" feature, as in the "miamivic.dms" file here https://telparia.com/fileFormatSamples/archive/dms/ - those diz files are stored as "track 80" which seems to rule out any support for compressing disks with more than 80 tracks?
This is where reality meets theory. In theory you can pack hard drives with DMS as well. But in practice, anything outside the normal tracks 0 to 79 is used for all kinds of random stuff and is to be ignored.
06 March 2023, 18:42 #9
nocash
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 64
I've downloaded the dms files (thanks again). To get them, I had to fix two links:
https://github.com/temisu/ancient/ra...est_C1_ext.dms - renamed "test_ext" to "test_C1_ext"
http://www.amigapd.com/uploads/5/5/0...10/_asi029.dms - https didn't work for me, but http does
My decompressor is now working with all methods, including newly added quick, deep, heavy1.

Do you know more about what is bugged in the two broken files? And for the first one, which trickery did you use to fix it?

The Eradication.dms file contains four of those fake bootblocks. Okay, I see. They contain a "DOS",00h,<checksum> bootblock header in the first 8 bytes, but the remaining 7F8h bytes are just ASCII/ANSI text without any boot code. Very weird : )
And as Bartman has said, they are then followed by the real Track 0 data entry, which is overwriting the fake crap.
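For anyone who wants to check that "DOS",00h,<checksum> header: a minimal C++ sketch, assuming the standard AmigaDOS bootblock layout (1024 bytes, checksum longword at offset 4, computed as an add-with-carry sum of all 256 big-endian longwords and then complemented). The function names are made up for illustration. Note that the fake entries described above are 800h bytes long; the checksum only covers the first 400h.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Standard AmigaDOS bootblock: two 512-byte sectors.
using BootBlock = std::array<std::uint8_t, 1024>;

// Read a big-endian 32-bit longword (the 68000 is big-endian).
std::uint32_t readBE32(const BootBlock& b, std::size_t off) {
    return (std::uint32_t{b[off]} << 24) | (std::uint32_t{b[off + 1]} << 16) |
           (std::uint32_t{b[off + 2]} << 8) | std::uint32_t{b[off + 3]};
}

// Add up all 256 longwords, folding any carry back into bit 0.
// With skipChecksumField, the longword at offset 4 is treated as zero.
std::uint32_t carrySum(const BootBlock& b, bool skipChecksumField) {
    std::uint32_t sum = 0;
    for (std::size_t off = 0; off < b.size(); off += 4) {
        if (skipChecksumField && off == 4) continue;
        std::uint32_t prev = sum;
        sum += readBE32(b, off);
        if (sum < prev) sum += 1;  // end-around carry; cannot overflow again
    }
    return sum;
}

// Store the complement of the sum at offset 4, so that summing all
// longwords (checksum included) gives FFFFFFFFh.
void fixBootChecksum(BootBlock& b) {
    std::uint32_t c = ~carrySum(b, true);
    b[4] = static_cast<std::uint8_t>(c >> 24);
    b[5] = static_cast<std::uint8_t>(c >> 16);
    b[6] = static_cast<std::uint8_t>(c >> 8);
    b[7] = static_cast<std::uint8_t>(c);
}

// True for a "DOS",00h header whose checksum is valid. Says nothing about
// whether the rest is real boot code or just ANSI/ASCII text.
bool hasValidDosHeader(const BootBlock& b) {
    bool dos = b[0] == 'D' && b[1] == 'O' && b[2] == 'S' && b[3] == 0;
    return dos && carrySum(b, false) == 0xFFFFFFFFu;
}
```

So a valid header only proves that someone ran the usual checksum routine over the block; telling "real" boot code from a text advert needs a separate heuristic.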

---

Looking through temisu's source code...

If you want to make the source code more compact: the two lengthTable[256] and bitlengthTable[256] arrays aren't really needed (they are equivalent to the standard LZHUF constants: 3,4,4,4,5,5,etc). Medium decompression can be done like this:
Code:
  lzh_explode_tree(tree,lh1_dist_codesizes,40h)    ;aka createOrderlyHuffmanTable
 @@decompress_lop:
  if dst=dst_end then goto @@decompress_done
  if GetBits(1)=1 then
    [dst]=GetBits(8), dst=dst+1
  else
    len=GetHuffCode(tree)+3                       ;=max 42h
    disp=(GetHuffCode(tree)*100h+GetBits(8))+1    ;=max 4000h
    for i=1 to len, [dst]=[dst-disp], dst=dst+1, next i
  goto @@decompress_lop
 @@decompress_done:
  CastDmsPostGap(42h)
  ret
 lh1_dist_codesizes:   ;same values as for LZHUF disp_tree
  db 3,4,4,4,5,5,5,5, 5,5,5,5,6,6,6,6, 6,6,6,6,6,6,6,6, 7,7,7,7,7,7,7,7
  db 7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
And similarly in the Deep decompressor (unlike the original LZHUF, it uses GetBits(8) instead of GetBits(6)):
Code:
    disp=(GetHuffCode(tree)*100h+GetBits(8))+1    ;=max 4000h
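The lzh_explode_tree / createOrderlyHuffmanTable step from the Medium pseudocode could look like this in C++, as an 8-bit lookup table (possible because no code is longer than 8 bits). This is a sketch, assuming codes are handed out in symbol order and left-aligned, which is how the fixed LZHUF position codes are usually laid out; all names here are hypothetical. The 64 sizes satisfy 1/8 + 3/16 + 8/32 + 12/64 + 24/128 + 16/256 = 1, so the table is filled exactly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Code sizes of the 64 symbols, copied from lh1_dist_codesizes above.
constexpr std::array<std::uint8_t, 64> kCodeSizes{
    3,4,4,4,5,5,5,5, 5,5,5,5,6,6,6,6, 6,6,6,6,6,6,6,6, 7,7,7,7,7,7,7,7,
    7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8};

struct HuffEntry {
    std::uint8_t symbol;  // decoded value 0..63
    std::uint8_t length;  // bits to drop after the 8-bit peek
};

// Assign codes in symbol order (the sizes never decrease), left-aligned in
// 8 bits; every 8-bit pattern that starts with a code maps to that symbol.
// Returns false unless the sizes fill all 256 entries exactly.
bool buildLookup(const std::array<std::uint8_t, 64>& sizes,
                 std::array<HuffEntry, 256>& table) {
    unsigned code = 0;  // next free left-aligned code
    for (unsigned sym = 0; sym < sizes.size(); ++sym) {
        if (sizes[sym] < 1 || sizes[sym] > 8) return false;
        unsigned span = 1u << (8 - sizes[sym]);
        if (code + span > 256) return false;
        for (unsigned i = 0; i < span; ++i)
            table[code + i] = HuffEntry{static_cast<std::uint8_t>(sym), sizes[sym]};
        code += span;
    }
    return code == 256;
}
```

Decoding one symbol is then "peek 8 bits, look up the entry, drop entry.length bits"; following the pseudocode above, Medium would use symbol+3 as the match length and symbol*100h+GetBits(8)+1 as the displacement.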
What is the "missing last char" stuff good for? I haven't encountered such files yet.

And what is this...
Code:
heavyLastOffset=use8kDict?0U:~0U;
My brain locks up on those ? : notations. I have the vague idea that it means if/else, like:
Code:
  if use4kDict then LastOffset=-1  ;heavy1
  if use8kDict then LastOffset=0   ;heavy2
  (or is it vice versa?)
  (and after further subtraction (rawoffset-1) the actual offset will be -2 or -1, right?)
Is that needed for any files? I haven't encountered that yet. But I guess one could easily create .adf files with tracks starting with nonzero repeating 1-byte or 2-byte values (eg. "AAAAAAAA" or "AxAxAxAxAxAx") for testing.
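On the ? : question: `cond ? a : b` is an expression that yields `a` when `cond` is true and `b` otherwise. So the line means "if use8kDict then 0 else ~0U", and `~0U` is an unsigned value with all bits set (the same bit pattern as -1). Assuming Heavy1 is the 4 KB (lh4) variant and Heavy2 the 8 KB (lh5) variant, as listed in the first post, the pseudocode above has the direction right. The two function names below are made up just to compare both spellings; what the decoder does with the value afterwards is not shown.

```cpp
#include <cassert>

// The one-liner from temisu's code, wrapped in a function for testing.
unsigned lastOffsetTernary(bool use8kDict) {
    return use8kDict ? 0U : ~0U;
}

// The same thing written as if/else.
unsigned lastOffsetIfElse(bool use8kDict) {
    if (use8kDict)
        return 0U;   // Heavy2 (lh5, 8 KB dictionary)
    return ~0U;      // Heavy1 (lh4, 4 KB dictionary): all bits set, i.e. -1 as unsigned
}
```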

In xDMS, that LastOffset seems to be (mis)named "last length" or the like. The original xDMS code didn't seem to initialize it, but somebody added initialization some years ago (confusingly, the GitHub change notes claim that it's initialized between files, but the GitHub source code actually initializes it whenever the dict buffer is cleared).

---

And some thoughts on whether the ring buffer is needed at all... Basically, it might be faster to allocate space for the initial dictionary in front of the decompression buffer, so one could just copy data from [dst-disp] to [dst] without needing the ring buffer.

With the dictionary being re-used by further tracks and with weird gaps inserted, I've brewed up this chart, showing the initial 4000h-byte content, the newly decoded data, and the required relocations for creating the initial 4000h-byte content for the next track.
Code:
 ;      |----------- old 4000h ------------------->
 ;                                                :
 ;      .- - - - - - - - - - - - - - - - - -.-----.----------------.
 ;      |                :     :       old  | old | new            |
 ;      |                :     :       data | gap | data           |
 ;      '- - - - - - - - - - - - - - - - - -'-----'----------------'
 ;   __________________________:___________/     /:           needed
 ;  /     ______________________________________/ :       <-- when new<gap
 ; /     /                     :                  :
 ;.-----.- - - - - - - - - - - - - - - - - -.-----.----------------.
 ;| old |                      :       old  | old | new            |
 ;| gap |                      :       data | gap | data           |
 ;'-----'- - - - - - - - - - - - - - - - - -'-----'----------------'
 ;                        \     \_________________:____________________
 ;                         \______________________________________     \
 ;                             :                  :               \     \
 ;.-----.- - - - - - - - - - - - - - - - - -.-----.----------------.-----.
 ;| old |                      :       old  | old | new            | new |
 ;| gap |                      :       data | gap | data           | gap |
 ;'-----'- - - - - - - - - - - - - - - - - -'-----'----------------'-----'
 ;         ___________________/                   :                     /
 ;        /                    :                  : ___________________/
 ;       /                     :                   /
 ;      . - - - - - -.-----:----------------.-----.
 ;      |       old  | old | new            | new |
 ;      |       data | gap | data           | gap |
 ;      '- - - - - - '-----'----------------'-----'
 ;                             :                  :
 ;                             <----------------- new 4000h -------------|
That has actually been working okay. But it's quite insane. And it would require further adjustments for dictionaries smaller than 4000h bytes, and it would get really complicated when switching between different dict sizes within the same file.

So, I've dropped that idea, and I am now using a ring buffer with a masked index.
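A minimal C++ sketch of a ring buffer with a masked index, assuming the 16 KB (4000h) dictionary size; the struct and member names are hypothetical. The per-method starting locations from temisu's source (quickContextLocation etc.) would just be different initial values of `pos`, and the gap inserted after each track is not modeled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16 KB (4000h) dictionary; the size must be a power of two so that
// "& kMask" wraps the index.
constexpr std::size_t kDictSize = 0x4000;
constexpr std::size_t kMask = kDictSize - 1;

struct RingDict {
    std::array<std::uint8_t, kDictSize> buf{};  // zero-filled initially
    std::size_t pos = 0;                        // next write position

    void putLiteral(std::uint8_t b, std::vector<std::uint8_t>& out) {
        buf[pos] = b;
        pos = (pos + 1) & kMask;
        out.push_back(b);
    }

    // Copy len bytes starting disp bytes back (disp >= 1). Unsigned
    // wraparound plus the mask handles pos < disp. Byte-by-byte so that
    // overlapping matches (disp < len) repeat the pattern correctly.
    void copyMatch(std::size_t disp, std::size_t len, std::vector<std::uint8_t>& out) {
        for (std::size_t i = 0; i < len; ++i) {
            std::uint8_t b = buf[(pos - disp) & kMask];
            putLiteral(b, out);
        }
    }
};
```

The design trade-off versus the chart above is one AND per byte in exchange for never having to move the dictionary around between tracks.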

So far, I've never seen files with more than one compression method (in range 2..6). For now, I haven't bothered with the starting locations:
Code:
quickContextLocation=251;
mediumContextLocation=16318;
deepContextLocation=16324;
heavyContextLocation=0;
If the whole ring buffer is initially zero-filled, then it doesn't matter where to start. Unless files mixing different methods exist... But I guess I shouldn't be surprised if such files do exist and do require those exact initial locations.

Last edited by nocash; 06 March 2023 at 18:58.
06 March 2023, 19:41 #10
alexh
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,430
I can't fire up my Amiga at the moment, but I still have some DMS files downloaded from BBSes of the day (all the data stored in my DMS files will be available in other formats), and I remember them all being less than 880k. There were tens of them per release, because towards the end Amiga games were on CD or spread across multiple disks.

Am I remembering wrong, or was there a facility in DMS to add ANSI/ASCII text which was displayed while decompressing/writing back to disk? I have memories of the BBSes of the day tagging their ANSI/ASCII banners (.nfo) onto DMS files (but it could have been another format).
06 March 2023, 22:29 #11
temisu
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Originally Posted by nocash
Do you know more about what is bugged in the two broken files? And for the first one, which trickery did you use to fix it?
Quote:
Originally Posted by nocash
What is the "missing last char" stuff good for? I haven't encountered such files yet.
So this is exactly where it breaks. In my tests I found out that the encoder might leave the last byte out. If it is a repeat counter, we can guess that it runs until the end of the track, and fix it. If it is a literal value, we are kind of bummed. (It was some time ago that I wrote this, and my memory is a bit fuzzy on it. I remember doing lots of debugging.)

Quote:
Originally Posted by nocash
If you want to get the source code more compact. The two lengthTable[256], bitlengthTable[256] arrays aren't really needed (they are equivalent to the standard LZHUF constants: 3,4,4,4,5,5,etc).
You are correct. I don't know what I was thinking. However, it has been tested with hundreds of images and it works with anything I throw at it. I'm not going to change it just to make it prettier. The password guessing makes it horrible anyway.

Quote:
Originally Posted by nocash
My brain locks up on those ? : notations. I've the vague idea that it means if/else, alike
It is a very easy way to write a conditional as an expression instead of a statement. It is very useful.

Quote:
Originally Posted by nocash
And some thoughts on whether or not needing the ringbuffer... Basically, it might be faster to allocate space for initial dictionary in front of the decompression buffer, so one could just copy data from [dst-disp] to [dst], without needing the ringbuffer.

With the dictionary being re-used by further tracks and with weird gaps inserted, I've brewed up this chart, showing the initial 4000h-byte content, the newly decoded data, and the required relocations for creating the initial 4000h-byte content for the next track.
I think you are overthinking the format a bit. The best way to interpret the data is as a stream of compressed bytes which then get distributed across tracks.

Quote:
Originally Posted by nocash
So far, I've never seen files with more than one compression method (in range 2..6).
I haven't seen those either. Most likely some tools which add the infos store them uncompressed, but real mixed compression... not really.

Quote:
Originally Posted by alexh
I can't fire up my Amiga at the moment but I still have some DMS files downloaded from BBS of the day
You know, I could always use more test files.
07 March 2023, 09:59 #12
hooverphonique
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,634
Quote:
Originally Posted by alexh
Am I remembering wrong or was there a facility in the DMS to add ANSI/ASCII which was displayed while decompressing/writing back to disks? I have memories of the BBS' of the day tagging their ANSI/ASCII banners (.nfo) to the DMS files (but it could have been another format)

See posts #2 and #7.
07 March 2023, 10:44 #13
alexh
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,430
Were there two sources of banners?

I have memories of files that had passed through multiple BBS' and had more than one banner.

But no evidence.

Last edited by alexh; 07 March 2023 at 11:02.
07 March 2023, 10:56 #14
hooverphonique
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,634
Quote:
Originally Posted by alexh
Was there 2 sources of banner?

I have memories of files that had passed through multiple BBS' and had more than one banner.

But no evidence.
I'm not sure, but I definitely remember the file_id.diz business; whether it was stored as track 0/bootblock or as track 80, I can't recall.

Last edited by hooverphonique; 08 March 2023 at 11:06.
08 March 2023, 09:41 #15
Exodous
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
I wrote a series of Ami Express BBS doors to add/remove BBS adverts, one of which handled DMS files.

DMS files have a 16-bit [oops, originally I wrote 16 byte] unsigned header field for the track number, and theoretically could store data for tracks 0 through 65535. Also, they could have any number of track entries with the same track ID. All tracks would be written in the order they were read from the file, so it was important to put "advert" track 0 entries before the real track 0 entry, otherwise they would corrupt the final written disk content.

Provided the DMS track checksums matched, you could theoretically put your advert under any track number. However, only "track 0" entries would be displayed, if they were a non-standard bootblock starting with 'DOS' followed by a null byte.

It was therefore common practice to also add advert tracks with other track numbers and, whilst this could theoretically be any track number as long as it appeared before the legitimate instance of that track number, most tools added them at the end of the file and outside the track range 0 to 79, as they wouldn't be written back to the disk by DMS. My own tool added them as track 65535.
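The write rules described above (entries applied in file order, a later entry for the same track overwriting an earlier one, entries outside 0 to 79 never written) can be sketched like this in C++. Assumptions: the entries are already decompressed; an 880 KB disk is stored as 80 track entries of 2C00h bytes each (one DD cylinder, 2 x 11 x 512, since 80 x 2C00h = 901120); the treatment of track 80 as file_id.diz and FFFEh/FFFFh as banners follows the observations in this thread, not a specification. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded entry: 16-bit track number plus unpacked data.
struct TrackEntry {
    std::uint16_t number;
    std::vector<std::uint8_t> data;
};

constexpr std::size_t kTrackSize = 0x2C00;  // assumed: one DD cylinder, 2 x 11 x 512 bytes
constexpr std::uint16_t kTracks = 80;       // 80 x 2C00h = 880 KB

enum class EntryKind { DiskData, FileIdDiz, Banner, Other };

// Conventions reported in this thread, not an official specification.
EntryKind classify(std::uint16_t n) {
    if (n < kTracks) return EntryKind::DiskData;
    if (n == 80) return EntryKind::FileIdDiz;
    if (n == 0xFFFE || n == 0xFFFF) return EntryKind::Banner;  // shown by DMS 2.01
    return EntryKind::Other;
}

// Apply entries in file order: a later entry for the same track overwrites an
// earlier one, so a fake track 0 placed first is replaced by the real one.
// Entries outside 0..79 are never written to the image.
std::vector<std::uint8_t> buildImage(const std::vector<TrackEntry>& entries) {
    std::vector<std::uint8_t> image(kTracks * kTrackSize, 0);
    for (const TrackEntry& e : entries) {
        if (classify(e.number) != EntryKind::DiskData) continue;
        std::size_t n = std::min(e.data.size(), kTrackSize);
        std::size_t base = e.number * kTrackSize;
        std::copy_n(e.data.begin(), n, image.begin() + static_cast<std::ptrdiff_t>(base));
    }
    return image;
}
```

With this order-dependent rule, a short advert entry placed after the real track 0 would overwrite the start of the real bootblock, which is exactly the corruption described above.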

If you want, I'm happy to update a couple of temisu's example DMS files with adverts in the various positions if it helps?

Last edited by Exodous; 08 March 2023 at 17:59.
11 March 2023, 11:13 #16
nocash
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 64
Quote:
Originally Posted by Exodous
If you want, I'm happy to update a couple of temisu's example DMS files with adverts in the various positions if it helps?
Panic! Please don't brew up more unusual hacks with unusual data inserted at weird locations (or if you do, make sure that it is really compatible with the original decompressor).

Other than that, yes, an example with all kind of banners & stuff would be nice (as long as it does comply with what was used back then, without bringing up new obstacles that could make it more difficult to decompress the files).

What I am more interested in is what was the purpose of the adverts...
- Why are they called adverts... did they have commercial value, or is it just underground graffiti?
- The banners (Track FFFFh) are displayed during decompression, right?
- If so, why would one additionally use fake bootblocks (Track 0) to display extra stuff?
- Is there some difference... something like banners could be disabled, but fake bootblocks are always displayed?
- If stuff could be also stored on track 1,2,3,... and if it wouldn't be displayed... why would one do that? Is that just intended as hidden message that could be only viewed in hex editors?
- What is "file_id.diz" (Track 80) for... is that also displayed during decompression? Or displayed elsewhere?
- temisu mentioned something in Track FFFEh, what's that for? Is it also a hidden message without any other purpose?
- Oh, and is there a size limit on banner tracks? Like causing memory/buffer overflows?

Quote:
Originally Posted by temisu
I think you are bit overthinking the format. The best way to interpret the data is stream of compressed bytes
I was just trying to sort out how to store that byte stream in memory (and where to insert the gaps) (and if there were problems with uninitialized data, where that junk would occur in the buffer).
But the above chart was overcomplicated (I had somehow thought that there would be a corner case where one needed to store a copy of the old gap before the old data; that was nonsense). With that corrected, it now looks like this:
Code:
 ;      |----------- old 4000h ------------------->
 ;                                                :
 ;      .-.- - . - - - - - - - - - - - - - -.-----.----------------.
 ;      | |junk|         :     :       old  | old | new            |
 ;      | |    |         :     :       data | gap | data           |
 ;      '-'- - ' - - - - - - - - - - - - - -'-----'----------------'
 ;                        \     \_________________:____________________
 ;                         \______________________________________     \
 ;                             :                  :               \     \
 ;      .-.- - . - - - - - - - - - - - - - -.-----.----------------.-----.
 ;      | |junk|               :       old  | old | new            | new |
 ;      | |    |               :       data | gap | data           | gap |
 ;      '-'- - ' - - - - - - - - - - - - - -'-----'----------------'-----'
 ;         ___________________/                   :                     /
 ;        /                    :                  : ___________________/
 ;       /                     :                   /
 ;      . - - - - - -.-----:----------------.-----.
 ;      |       old  | old | new            | new |
 ;      |       data | gap | data           | gap |
 ;      '- - - - - - '-----'----------------'-----'
 ;                             :                  :
 ;                             <----------------- new 4000h -------------|
That is, one small memcopy for appending the new gap, and one large memcopy to move the dictionary back to the beginning of the buffer (to avoid the buffer growing to nearly 1 MB). So that's doable, and I am now back to doing it like that (my personal preference is to store data in the output buffer only, without additionally storing it in a context/ring buffer).
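A minimal C++ sketch of that output-buffer-only approach: the last 4000h bytes of history sit directly in front of the current track's output, a match copies from [dst-disp], and after each track one memmove brings the history back to the front. Simplifications (assumptions): the gaps from the chart are not modeled, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kHistory = 0x4000;  // dictionary size kept between tracks

// The current track is appended after kHistory bytes of history, so a match
// simply copies from [dst - disp]; no ring buffer and no index masking.
struct LinearWindow {
    std::vector<std::uint8_t> buf = std::vector<std::uint8_t>(kHistory, 0);

    void literal(std::uint8_t b) { buf.push_back(b); }

    // disp must be 1..kHistory; byte-by-byte so overlapping matches repeat.
    void match(std::size_t disp, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            std::uint8_t b = buf[buf.size() - disp];
            buf.push_back(b);
        }
    }

    // Return the finished track, then move the last kHistory bytes back to the
    // front (one memmove) so the buffer never grows past history + one track.
    std::vector<std::uint8_t> endTrack() {
        std::vector<std::uint8_t> track(buf.begin() + kHistory, buf.end());
        std::memmove(buf.data(), buf.data() + (buf.size() - kHistory), kHistory);
        buf.resize(kHistory);
        return track;
    }
};
```

The inner loop has no masking at all; the cost moves to the one memmove of 4000h bytes per track.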

Quote:
Originally Posted by temisu
the encoder might leave the last byte out. If it is repeat-counter we can guess that it is until the end of the track, and fix it. If it is literal value, we are kind of bummed.
Why bummed? If the last literal value is known to be wrong, that's the optimal situation: Just change it to match the decompressed checksum. That should be 100% safe (unless there are cases where non-last bytes are also missing or corrupted).

EDIT: Or do you mean a RLE code with missing fillvalue? The possible RLE codes are:
Code:
  90h,00h            Output 90h
  90h,FFh,xxh,Hi,Lo  Output xxh repeated Hi*100h+Lo times (Len=0..FFFFh)
  90h,Len,xxh        Output xxh repeated Len times        (Len=1..FEh)
  xxh                Output xxh
Those would all need different handling, but they should all be fixable.
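A C++ sketch that expands exactly the four RLE codes in the table above. The function name is made up, the code follows the table as written (not checked against the original DMS decoder), and throwing on a truncated stream is just one way of flagging a "missing last byte", for example a 90h,Len pair whose fill value is gone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Expands the RLE90 codes listed above. Throws on truncated input, which is
// where a missing last byte would show up.
std::vector<std::uint8_t> unrle90(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    auto next = [&]() -> std::uint8_t {
        if (i >= in.size()) throw std::runtime_error("truncated RLE90 stream");
        return in[i++];
    };
    while (i < in.size()) {
        std::uint8_t c = next();
        if (c != 0x90) { out.push_back(c); continue; }        // xxh: literal
        std::uint8_t len = next();
        if (len == 0x00) { out.push_back(0x90); continue; }   // 90h,00h: literal 90h
        std::uint8_t value = next();
        std::size_t count = len;                               // 90h,Len,xxh
        if (len == 0xFF) {                                     // 90h,FFh,xxh,Hi,Lo
            std::size_t hi = next();
            count = hi * 0x100 + next();
        }
        out.insert(out.end(), count, value);
    }
    return out;
}
```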

I've been looking into the two bugged files...

_asi029.dms
The bug occurs in Track 4Fh (aka 79, aka last track). That track has
-- Heavy size = 1011h (but there's one byte missing, it would require 1012h bytes)
-- RLE size = 2B5Fh
Did you track down where the byte got lost? Is it the RLE compressor not forwarding the last byte to Heavy? Or the Heavy compressor not storing the last bits of the bitstream?
Also, is that problem common to occur (only?) on Track 79? Asking because Codetapper also mentioned issues on Track 79.

Eradication.dms
The bug occurs in Track 3Bh (aka 59). That track has
-- Heavy size = 1FFBh
-- RLE size = 2BE3h
Those sizes are looking okay, no missing bytes, but the checksum is slightly off (+0Eh).

After thinking about that for some days... what if it's the same problem, and the last byte is missing there, too?
What I mean is: The heavy bitstream is padded to 8bit boundary. So, if the last RLE byte is missing, then those padding bits can appear to contain the missing byte.
That could be easily fixed using your "guess the last byte" trick, too. Simply add something like this: IF size=okay AND checksum=wrong THEN assume size=size-1

To confirm that theory... Unfortunately, the Eradication file doesn't have OFS sector checksums (which could offer some (imperfect) extra error check if they were present). But fortunately, the original file can be found here in ADF format:
https://www.pouet.net/prod.php?which=62355 - Insane-Eradication.adf
And... yes, that seems to confirm the missing last byte theory : ) the file is exactly the same as the dms decompression output, except that the "missing" last byte on track 59 is different (in the ADF file the track ends with DF,E2,E0,DC, and the bugged DMS output has DF,E2,E0,EA in that location).
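The "just change the last byte to match the checksum" idea can be sketched in C++, assuming the DMS track checksum is a plain 16-bit sum of the decompressed bytes (the +0Eh discrepancy above, EAh instead of DCh, is consistent with that). Function names are hypothetical, and the repair is only valid if the last byte is the only wrong one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed track checksum: 16-bit sum of all decompressed bytes.
std::uint16_t byteSum16(const std::vector<std::uint8_t>& d) {
    std::uint16_t s = 0;
    for (std::uint8_t b : d) s = static_cast<std::uint16_t>(s + b);
    return s;
}

// If only the last byte is wrong, the value that makes the sum match is
// expected - sum(others) (mod 10000h); it has to fit in one byte.
std::optional<std::uint8_t> repairLastByte(const std::vector<std::uint8_t>& track,
                                           std::uint16_t expected) {
    if (track.empty()) return std::nullopt;
    std::uint16_t others = static_cast<std::uint16_t>(byteSum16(track) - track.back());
    std::uint16_t need = static_cast<std::uint16_t>(expected - others);
    if (need > 0xFF) return std::nullopt;  // no single byte value can fix it
    return static_cast<std::uint8_t>(need);
}
```

Using the track 59 bytes quoted above: with a stored checksum matching DF,E2,E0,DC, the bugged tail DF,E2,E0,EA yields DCh as the repaired last byte.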

PS. Would be interesting if one could actually reproduce the bug when compressing the original "Insane-Eradication.adf" file. Best with different DMS versions. And with different DMS methods. And also with "heavy-without-rle" (if DMS has an option for that).

Last edited by nocash; 11 March 2023 at 11:27.
11 March 2023, 16:41 #17
Exodous
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
Quote:
Originally Posted by nocash
Panic! Please don't brew up more unusual hacks with unusual data inserted at weird locations (or if you do, make sure that it is really compatible with the original decompressor).
I was just pointing out that technically these entries can be anywhere in the file, they don't have to be at the beginning and the end, though that is where they are usually placed.

Quote:
Originally Posted by nocash
- Why are they called adverts... did they have commercial value, or is it just underground grafitti?
I guess they were called adverts because they were usually used to advertise the BBSes they had passed through: in theory, the more files that passed through a BBS, the more people would want to connect to it and the better its reputation would get.

One man's advert is another's graffiti though.

Quote:
Originally Posted by nocash View Post
- The banners (Track FFFFh) are displayed during decompression, right?
- temisu mentioned something in Track FFFEh, what's that for? Is it also a hidden message without any other purpose?
I've just tested it: tracks added with number FFFFh or FFFEh are both displayed as banners by DMS 2.01 (the version I have installed).

Quote:
Originally Posted by nocash View Post
- If so, why would one additionally use fake bootblocks (Track 0) to display extra stuff?
- Is there some difference... something like banners could be disabled, but fake bootblocks are always displayed?
It would be very easy to write a piece of code to strip all tracks outside the normal 0 to 79 range and consequently strip other banners/adverts. Stripping track 0 would require more work to ensure the real track 0 wasn't stripped. It's easy to do this now, but when resources (both RAM and CPU) were limited, it added extra overhead.

The fake bootblock was displayed when it didn't match a standard bootblock, as a way of warning the end user that it could be malicious.

However, the "NOTEXT" option in DMS could suppress both the banners and track 0 display.
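Exodous's point about stripping can be illustrated with a short filter. This is a hypothetical sketch (the `Track` alias and `strip_extra_tracks` name are made up for illustration), assuming the decoder has already produced a list of (track number, unpacked data) pairs:

```python
from typing import List, Tuple

# (track number from the DMS track header, unpacked track data)
Track = Tuple[int, bytes]


def strip_extra_tracks(tracks: List[Track], first: int = 0, last: int = 79) -> List[Track]:
    """Keep only tracks numbered within the normal disk range.

    Banners stored as track FFFFh/FFFEh and other out-of-range entries are
    dropped. An extra (fake) track 0 is NOT caught by this filter, which is
    why stripping it needs more work.
    """
    return [(n, d) for (n, d) in tracks if first <= n <= last]
```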

Quote:
Originally Posted by nocash View Post
- If stuff could be also stored on track 1,2,3,... and if it wouldn't be displayed... why would one do that? Is that just intended as hidden message that could be only viewed in hex editors?
It isn't displayed. I was using it as an example that a file can contain multiple tracks with the same number: all of them are written to the destination disk as they are encountered, but only the last one actually remains on the final disk.
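A sketch of the "last one wins" behaviour, assuming a decoder that writes each track entry in file order into an in-memory disk image (the function name and the dict representation are hypothetical):

```python
from typing import Dict, Iterable, Tuple


def assemble_disk(tracks: Iterable[Tuple[int, bytes]], num_tracks: int = 80) -> Dict[int, bytes]:
    """Write tracks in file order; a repeated track number overwrites the earlier entry."""
    disk: Dict[int, bytes] = {}
    for number, data in tracks:
        if 0 <= number < num_tracks:
            disk[number] = data  # same number again: the last one wins
    return disk
```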

Quote:
Originally Posted by nocash View Post
- What is "file_id.diz" (Track 80) for... is that also displayed during decompression? Or displayed elsewhere?
I have no idea. A track 80 banner doesn't seem to be displayed by the DMS 2.01 I have, so would it just be a hidden message?

Quote:
Originally Posted by nocash View Post
- Oh, and is there a size limit on banner tracks? Like causing memory/buffer overflows?
The packed and unpacked lengths of a track or banner are 16-bit words, which theoretically suggests a limit of 65,535 characters. However, at least with DMS 2.01, it doesn't actually work with anything more than 32,767 characters for a banner, presumably because the value is treated internally as a signed word.
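To illustrate the signed-word presumption: the same two bytes read as an unsigned and as a signed big-endian word (DMS header fields are assumed to be big-endian here, matching the Amiga's byte order). `read_length` is a hypothetical helper:

```python
import struct
from typing import Tuple


def read_length(field: bytes) -> Tuple[int, int]:
    """Decode a 2-byte big-endian length field as unsigned and as signed."""
    (unsigned,) = struct.unpack(">H", field)
    (signed,) = struct.unpack(">h", field)
    return unsigned, signed


# 32767 fits either way; 32768 turns negative when read as a signed word.
for length in (32767, 32768, 65535):
    print(length, read_length(struct.pack(">H", length)))
```

A length of 32768 (8000h) reads as -32768 when treated as signed, which would be consistent with the 32,767 limit observed with DMS 2.01, though that is only an inference from the test above.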

Attached is a selection of DMS files using temisu's example file test_C1_medium.dms as the base and then adding: a front banner using ID FFFFh, a front banner using ID FFFEh, a track 0 banner, a rear banner using ID FFFFh, and finally a 32767 byte front banner (the first part of Romeo and Juliet). There is also a log file showing the output when written by DMS 2.01.
Attached Files
File Type: zip sample-dms-with-banner.zip (366.5 KB, 38 views)
Old 11 March 2023, 21:54   #18
temisu
Registered User
 
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Originally Posted by nocash View Post
Why bummed? If the last literal value is known to be wrong, that's the optimal situation: Just change it to match the decompressed checksum. That should be 100% safe (unless there are cases where non-last bytes are also missing or corrupted).
Quote:
Originally Posted by nocash View Post
Those would all need different handling, but it should be all fix-able.
Quote:
Originally Posted by nocash View Post
To confirm that theory... Unfortunately, the Eradication file doesn't have OFS sector checksums (which could offer some (imperfect) extra error check if they were present).
Yes, you are correct that this case should be fixable. However, I specifically decided not to fix the case that Eradication has, because
  1. This case requires working backwards from the CRC to rebuild the data. If we do it this way, the error checking becomes meaningless since it is used to derive the data
  2. It is a rare case; to my knowledge this is the only case in the wild which has this specific bug. Thus it is not worth it. (The other problem is more prevalent.)

I do believe that these bugged files are rare in the wild. Most likely, when images were broken, they were re-uploaded and replaced with fixed ones. Where the broken files are still available, the problem probably appears in a place where it doesn't break anything.

My decision to fix the "easy" breakage, which is pretty safe to do, was just to give people an extra 0.1% and make my implementation the best one there is

Quote:
Originally Posted by nocash View Post
PS. Would be interesting if one could actually reproduce the bug when compressing the original "Insane-Eradication.adf" file. Best with different DMS versions. And with different DMS methods. And also with "heavy-without-rle" (if DMS has an option for that).
The compression options allow you to choose the method, but I did not see any finer-grained options.

In any case you can find some DMS versions here: http://www.amiga-stuff.com/archivers-download.html

Then take fs-uae and go wild (even the basic AROS will do the trick; you don't have to buy ROMs). You can then process as many files as you want. If you are serious about any Amiga stuff, you'll have to do this sooner or later anyway.

(I also had to go outside my comfort zone earlier when dealing with LOB compression, since the tooling was only available for Atari. So now I know Hatari.)
Old 13 March 2023, 19:22   #19
nocash
Registered User
 
Join Date: Feb 2016
Location: Homeless
Posts: 64
Quote:
Originally Posted by Exodous View Post
I've just tested and tracks added with number FFFFh or FFFEh are both displayed...
with DMS 2.01 it doesn't actually work with anything more than 32,767 characters for a banner...
Good to know!

Quote:
Originally Posted by Exodous View Post
It would be very easy to write a piece of code to strip all tracks outside the normal 0 to 79 range and consequenty strip other banners/adverts.
Ah, now I get it, thanks! Those fake bootblocks are just there to dodge tools that change/remove banners on track FFFFh. I guess track FFFEh might have been used with the same intention.

Quote:
Originally Posted by Exodous View Post
I have no idea - writing a track 80 banner doesn't seem to be displayed with DMS 2.01 I have, so it would be just a hidden message?
There is this (official?) document http://lclevy.free.fr/amiga/DMS.txt describing the File_ID.DIZ stuff. I guess it must have had some purpose; perhaps it's only supported in later DMS versions? Or it might even be an external addition (e.g. used solely by BBSes to display file descriptions)?

Quote:
Originally Posted by Exodous View Post
Attached are a selection of DMS files using temisu's example file
Okay, thanks. I've changed my code to support Track FFFEh as banner and size 3C0h as fake bootblock.
Your 32767 byte banner seems to be only 32766 bytes long. And why did you use 3C0h bytes for fake bootblocks???
Normally, bootblocks have two characteristic features: they contain a checksum over the 400h-byte block, and they are stored in two physical 200h-byte floppy sectors.
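For reference, a sketch of the bootblock checksum as described in the ADF format FAQ: a 32-bit big-endian sum over the 400h bytes with each carry added back in, skipping the stored checksum at offset 4, then inverted. Whether DMS checks this value when deciding to show its warning is not known from this thread; the code only shows the checksum itself (function names are hypothetical):

```python
import struct

BOOTBLOCK_SIZE = 0x400  # two 200h-byte sectors


def bootblock_checksum(block: bytes) -> int:
    """Amiga bootblock checksum: 32-bit sum with end-around carry, inverted.

    The stored checksum longword at offset 4 is skipped while summing.
    """
    if len(block) != BOOTBLOCK_SIZE:
        raise ValueError("bootblock must be 400h bytes")
    total = 0
    for index, (word,) in enumerate(struct.iter_unpack(">I", block)):
        if index == 1:
            continue  # offset 4 holds the checksum itself
        total += word
        if total > 0xFFFFFFFF:
            total = (total + 1) & 0xFFFFFFFF  # wrap the carry back in
    return ~total & 0xFFFFFFFF


def bootblock_valid(block: bytes) -> bool:
    (stored,) = struct.unpack_from(">I", block, 4)
    return bootblock_checksum(block) == stored
```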

I don't know how DMS is detecting uncommon bootblocks (for triggering the warnings)...
Does it do that only when the bootblock contains a (in-)correct checksum?
Or does it somehow detect "uncommon program code" (however it could determine that)?
Or does it simply check for uncommon track sizes, i.e. anything less than 2C00h bytes?
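The heuristics mentioned so far could be combined into a decoder-side classifier like the following. It is a hypothetical sketch, not a description of what DMS does; it assumes 2C00h bytes per DMS track entry (both sides of a double-density cylinder, 2 x 11 x 200h) and treats a shorter track 0 as a fake bootblock:

```python
NORMAL_TRACK_SIZE = 0x2C00  # 2 sides x 11 sectors x 200h bytes


def classify_track(number: int, unpacked_size: int) -> str:
    """Hypothetical classifier combining the heuristics from this thread."""
    if number in (0xFFFF, 0xFFFE):
        return "banner"
    if number == 80:
        return "file_id.diz"
    if number == 0 and unpacked_size < NORMAL_TRACK_SIZE:
        return "fake bootblock"  # e.g. the 3C0h-byte entries in the samples
    if 0 <= number <= 79:
        return "data"
    return "unknown"
```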

I have found some more broken dms files: http://eab.abime.net/showpost.php?p=262947&postcount=18 two of them contain traditional dms errors. The other file includes a bad CRC which is probably unrelated to dms bugs (it might have been damaged at some point after compression... when reading from a worn-out floppy, or from cross-linked FAT filesystem or whatever).

What we have now (more files would be welcome):
Code:
  Name_______________________Method_____Sys_LZ__RLE_Notes_______________
  _asi029.dms                RLE+Heavy2 OFS lit lit Missing byte on Track 4Fh (79)
  Eradication.dms            RLE+Heavy2 -   lit lit Bad checksum on Track 3Bh (59)
  Flt-cup.dms                Heavy2     -   lit -   Bad checksum on Track 18h (24)
  Grandnt2.dms               RLE+Heavy2 -   lit lit Bad checksum on Track 2Ah (42)
  TheUNT01.dms               ?          ?   ?   ?   Bad CRC (damaged AFTER compression?)
  EDIT:
  Parallax-CriticalMassA.dms ?          bad -   -   Good Checksums (but damaged BEFORE compression)
  Parallax-CriticalMassB.dms RLE+Heavy2 -   lz  rle Bad checksum on track 4Fh, unfixable
Assuming that the bugs occur in the last byte of those tracks: all of those bugged bytes are "literals" (neither RLE run-length codes nor LZ length-distance codes).
Interestingly, Flt-cup.dms isn't using RLE on the bugged track, so the problem isn't related to the RLE compression layer.

Mmmh, temisu, I think you might be misremembering which files you did what with. Looking at your source code, perhaps you meant to say something like:
Quote:
Originally Posted by temisu View Post
However I specifically decided to fix this case what the _as029 has because
  1. This case requires working backwards from checksum to rebuild the data. If we do it this way the lower 8bit of the error checking becomes meaningless since it is used to derive the data and the upper 8bit is used to maintain some error checking
  2. It is rare case, to my knowledge this _as029 is only case in the wild which has this specific bug.
But anyways, what I have discovered is that the case doesn't seem to be as rare as you think.

The other three files (Eradication, Flt-cup, Grandnt2) seem to have the same missing byte problem, but the garbage padding at the end of the bitstream is fooling your decompressor into thinking that missingNo=0, and that's wrong, it should be missingNo=1.

You can detect that situation by looking for checksum errors. Admittedly, it isn't ideal to use the checksum both to detect and to correct the errors. I am still struggling to confirm whether the error-corrected results are reliable.
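A rough estimate of how much checking survives the repair. It assumes the 16-bit additive track checksum and treats the checksum difference of a track damaged elsewhere as effectively random (an assumption, not something measured on real files). With C the stored checksum and S' the sum of all bytes except the suspect last one:

```latex
\begin{aligned}
b &= (C - S') \bmod 2^{16}, \quad \text{accepted only if } 0 \le b < 2^{8} \\
P(\text{damaged track accepted}) &\approx \frac{2^{8}}{2^{16}} = \frac{1}{256}
\end{aligned}
```

So the low 8 bits are spent on rebuilding the byte, and the high 8 bits still reject roughly 255 out of 256 unrelated errors.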

Another, more general problem is that the wrong last byte is also stored in the LZ dictionary. That raises the question of whether the dictionary content should be fixed as well.
Currently I am not fixing the dictionary content, and that seems to work okay (it doesn't generate a series of checksum errors on the following tracks).
I still need to test whether fixing the dictionary content triggers errors on the following tracks (of course, that doesn't matter if the error is on the last track).
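If the dictionary were to be patched too, the change itself is small. A hypothetical sketch, assuming the bad byte was the last literal written to a power-of-two ring buffer and that `write_pos` points at the next free slot (xDMS's per-method buffer pointers and buffer gaps are not modelled here):

```python
def patch_dictionary(ring: bytearray, write_pos: int, fixed_byte: int) -> None:
    """Hypothetical: overwrite the last byte written to an LZ ring buffer.

    write_pos is where the NEXT byte would go; the buffer size is assumed
    to be a power of two so the index wraps with a mask.
    """
    size = len(ring)
    if size == 0 or size & (size - 1):
        raise ValueError("ring buffer size must be a power of two")
    ring[(write_pos - 1) & (size - 1)] = fixed_byte
```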

Last edited by nocash; 05 April 2023 at 02:40.
Old 13 March 2023, 19:51   #20
paraj
Registered User
 
 
Join Date: Feb 2017
Location: Denmark
Posts: 1,172
This one also seems to be missing at least one byte on track $4F: https://files.scene.org/view/mirrors...ticalMassB.dms
 


All times are GMT +2.