01 March 2023, 20:59 | #1 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
DMS compression methods
I am writing a decompressor for floppy disk images in .dms format. Going by the xDMS source code, the compression methods are based on standard formats like RLE90, LZSS, LZHUF, AR002 (although all .dms methods are slightly customized variants of those formats). So far, I can decompress these methods:
Method 0: Uncompressed. Some files contain banners in uncompressed form, for example: http://aminet.net/package/demo/mega/love_anarchy
Method 1: Simple (RLE90). Used in various 17bit CD-ROM files: http://cd.textfiles.com/17b5/files/
Method 3: Medium (RLE90+LZSS with Huffman). Used in http://aminet.net/package/demo/mega/Anarchy2
Method 6: Heavy 2 (optional RLE90+AR002/lh5). The most common one, found in various .dms files on Aminet.

I haven't found .dms files with these methods:
Method 2: Quick (RLE90+LZSS)
Method 4: Deep (RLE90+LZHUF)
Method 5: Heavy 1 (optional RLE90+AR002/lh4)

It would be cool if somebody knows where to find images in those formats for testing (admittedly I am too dumb to use an Amiga emulator to create my own dms files).

Out of curiosity (asking because the DMS header contains entries for that stuff):
- Is it important to support the Append feature (one disk split into two .dms files)?
- Are there any High Density .dms disk images for Amiga?
- Are there any .DMS files for DOS, Atari, Mac? And does the Disk Masher software even exist for those computers?
- There also seems to be a File Masher (.FMS) variant; was that ever used anywhere?
- Reportedly there are .dms files with "fake bootblocks". What is that, and where could one find such files? I assume that they contain 80 normal tracks, plus an additional track 0 entry that overwrites/patches the original track 0, or parts of it (?)

According to the xDMS source code, methods 2-5 share a single ring buffer for the dictionary (plus some oddities like buffer gaps and separate buffer pointers for each method). I hope that those methods are never combined inside a single .dms file (like AR002 tracks relying on previous LZHUF tracks, with parts of that dictionary being overwritten by LZSS tracks).

------------

Is DMS bugged or not? I am confused there. xDMS doesn't mention any major problems. But Wikipedia says that Disk Masher had problems with bugs in the compression algorithm.
But it's hard to believe that all compression methods and all software versions were bugged... unless the bug is in the shared RLE compression layer, or in the overall match lookup function?

Apart from actual bugs, there are two generic problems: DMS doesn't store low-level MFM data. And (without the NOZERO option) it only stores blocks that are flagged as used in the allocation bitmap (a nice feature, but fatal if the disk doesn't contain a regular OFS/FFS filesystem).

Some people also mentioned DMS problems here:
Codetapper: https://eab.abime.net/showpost.php?p=19008&postcount=19
StingRay: https://eab.abime.net/showpost.php?p...0&postcount=13
It would be nice to have a freeware sample with an incompressible disk in .ADF format, so people could try it for themselves!

Looking at the xDMS source code, there is one small issue in the init function: the last few dozen bytes of the 16 KB dictionary are uninitialized. Those bytes are only used by the Medium and Deep methods, and they are normally overwritten immediately by the first compressed track... unless the first track is only a handful of bytes long; then the uninitialized bytes could stay in the dictionary and keep getting "rotated around" in the ring buffer. But those bytes would usually be overwritten at some later point before reaching the end of the disk (so it's unlikely to explain problems on track 79). And I've merely spotted that issue in the (unofficial) xDMS source code; the official software might behave differently (and the compressor might use its own lookup tree without caring about the ring buffer contents at all). Altogether, I doubt that the uninitialized bytes are causing problems (but if they do, they could cause unpredictable results, depending on whether the uninitialized random values happen to match the data on the disk or not).
Last edited by nocash; 01 March 2023 at 21:05.
02 March 2023, 12:53 | #2 |
Registered User
Join Date: Feb 2019
Location: Munich, Germany
Posts: 63
That was used by BBSs to include adverts. Later DMS versions displayed the bootblock during decompression if it looked non-standard. So I think they just prepended a fake bootblock with their BBS ad which was then immediately overwritten by the real bootblock.
02 March 2023, 13:10 | #3 | |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,550
Quote:
Atari ST had its own competing equivalent called .MSA (Magic Shadow Archiver).
02 March 2023, 17:01 | #4 |
Registered User
Join Date: Feb 2019
Location: Munich, Germany
Posts: 63
04 March 2023, 14:36 | #5 |
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Good questions,
I researched some of these answers when I made my decompressor.
First, I can provide test files for all formats (I will check them in to my GitHub). They were generated by me, so there are no copyright issues.
Then there is the question about platforms: the guys behind DMS advertised support for non-Amiga platforms (for a fee), but I do not know if it ever existed or if there are any such files.
For the Amiga HD floppy format I managed to create a working DMS file, but I doubt anyone really used it.
I have not seen FMS either, but that does not mean it doesn't exist.
About bugs, there are a bunch. Some of them can be circumvented on the fly when decompressing, but others just create broken images. See here if interested: https://github.com/temisu/ancient/bl...compressor.cpp
Last edited by temisu; 05 March 2023 at 16:45.
05 March 2023, 19:37 | #7 | ||
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
Many thanks for the test images! The source code does also look interesting, so far I had only seen the xDMS source code.
Quote:
If there was any competition... I haven't found any traces of DMS being released for PC/Mac/Atari (except that the Internet Archive has something called "Disk Masher XE", which seems to be for the 8-bit Atari, but I couldn't tell if it's the same software from the same author). Or maybe the PC/Mac/Atari support just meant that the Amiga version could compress floppies from such computers?
Quote:
It would be nice to see a dms file with such bootblocks. Apropos, another oddity is the "file_id.diz" feature, as in the "miamivic.dms" file here https://telparia.com/fileFormatSamples/archive/dms/ - those diz files are stored as "track 80", which seems to rule out any support for compressing disks with more than 80 tracks?
05 March 2023, 21:43 | #8 | ||
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Code:
For $30 you will be mailed a complete hard-copy manual and the latest version of DMS on disk. You will also receive an account on the SDS Software bulletin board system and be able to download via modem the latest versions free of charge. If you do not wish or have the capability to call the BBS , you can at any time send your disk with an SASE and we will copy the latest versions and mail it back to you. It also gives you an unlimited Upload/Download ratio on the BBS for one year. Along with the CLI/Shell version of DMS you will also receive when available:
DMSPro - an advanced version of DMS with faster routines and the capability to archive Amax, Mac, MS-DOS, and Atari-ST format disks.
Also, it probably still means an Amiga executable.
Quote:
06 March 2023, 18:42 | #9 |
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
I've downloaded the dms files (thanks again). To get them, I had to fix two links:
https://github.com/temisu/ancient/ra...est_C1_ext.dms - renamed "test_ext" to "test_C1_ext"
http://www.amigapd.com/uploads/5/5/0...10/_asi029.dms - https didn't work for me, but http does

My decompressor is now working with all methods, including the newly added Quick, Deep and Heavy1.

Do you know more about what is bugged in the two broken files? And for the first one, which trickery did you use to fix it?

The Eradication.dms file contains four of those fake bootblocks. Okay, I see. They contain a "DOS",00h,<checksum> bootblock header in the first 8 bytes, but the remaining 7F8h bytes are just ASCII/ANSI text without any boot code. Very weird :) And as Bartman has said, they are then followed by the real Track 0 data entry, which overwrites the fake crap.
---
Looking through temisu's source code... if you want to make the source code more compact: the two lengthTable[256], bitlengthTable[256] arrays aren't really needed (they are equivalent to the standard LZHUF constants: 3,4,4,4,5,5,etc). Medium decompression can be done like this:
Code:
lzh_explode_tree(tree,lh1_dist_codesizes,40h)  ;aka createOrderlyHuffmanTable
@@decompress_lop:
  if dst=dst_end then goto @@decompress_done
  if GetBits(1)=1 then
    [dst]=GetBits(8), dst=dst+1                 ;literal byte
  else
    len=GetHuffCode(tree)+3                     ;=max 42h
    disp=(GetHuffCode(tree)*100h+GetBits(8))+1  ;=max 4000h
    for i=1 to len, [dst]=[dst-disp], dst=dst+1, next i   ;copy match
  goto @@decompress_lop
@@decompress_done:
  CastDmsPostGap(42h)
  ret
lh1_dist_codesizes:  ;same values as for LZHUF disp_tree
  db 3,4,4,4,5,5,5,5, 5,5,5,5,6,6,6,6, 6,6,6,6,6,6,6,6, 7,7,7,7,7,7,7,7
  db 7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7, 8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8
Code:
disp=(GetHuffCode(tree)*100h+GetBits(8))+1 ;=max 4000h
And what is this...
Code:
heavyLastOffset=use8kDict?0U:~0U;
Code:
if use4kDict then LastOffset=-1  ;heavy1
if use8kDict then LastOffset=0   ;heavy2 (or is it vice versa?)
(And after the further subtraction (rawoffset-1), the actual offset will be -2 or -1, right?) In xDMS, that LastOffset seems to be (mis-)named "last length" or the like. The original xDMS code didn't seem to initialize it, but somebody added initialization some years ago (confusingly, the GitHub change notes claim that it's initialized between files, but the GitHub source code actually initializes it whenever the dict buffer is cleared).
---
And some thoughts on whether the ring buffer is needed at all... Basically, it might be faster to allocate space for the initial dictionary in front of the decompression buffer, so one could just copy data from [dst-disp] to [dst], without needing the ring buffer. With the dictionary being re-used by further tracks and with weird gaps inserted, I've brewed up this chart, showing the initial 4000h-byte content, the newly decoded data, and the relocations required to create the initial 4000h-byte content for the next track.
Code:
; |----------- old 4000h ------------------->
; :
; .- - - - - - - - - - - - - - - - - -.-----.----------------.
; | : : old | old | new |
; | : : data | gap | data |
; '- - - - - - - - - - - - - - - - - -'-----'----------------'
; __________________________:___________/ /: needed
; / ______________________________________/ : <-- when new<gap
; / / : :
;.-----.- - - - - - - - - - - - - - - - - -.-----.----------------.
;| old | : old | old | new |
;| gap | : data | gap | data |
;'-----'- - - - - - - - - - - - - - - - - -'-----'----------------'
; \ \_________________:____________________
; \______________________________________ \
; : : \ \
;.-----.- - - - - - - - - - - - - - - - - -.-----.----------------.-----.
;| old | : old | old | new | new |
;| gap | : data | gap | data | gap |
;'-----'- - - - - - - - - - - - - - - - - -'-----'----------------'-----'
; ___________________/ : /
; / : : ___________________/
; / : /
; . - - - - - -.-----:----------------.-----.
; | old | old | new | new |
; | data | gap | data | gap |
; '- - - - - - '-----'----------------'-----'
; : :
; <----------------- new 4000h -------------|
So, I've dropped that idea, and I am now using a ring buffer with a masked index. So far, I've never seen files with more than one compression method (in the range 2..6). For now, I haven't bothered with the starting locations:
Code:
quickContextLocation=251;
mediumContextLocation=16318;
deepContextLocation=16324;
heavyContextLocation=0;
Last edited by nocash; 06 March 2023 at 18:58.
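A minimal Python sketch of the Medium loop from the pseudocode above, written against a 16 KB ring buffer with a masked index (the approach nocash switched to) instead of a linear output buffer. Assumptions: the bitstream is read MSB-first, the 64 code lengths of lh1_dist_codesizes get canonical ("orderly") Huffman codes assigned shortest-first and then by symbol, and the start position 16318 is temisu's mediumContextLocation. The post-track gap (CastDmsPostGap) and the RLE layer are left out. Names like decode_medium and BitReader are hypothetical and not taken from xDMS or ancient.

```python
DICT_SIZE = 0x4000                 # 16 KB dictionary shared by methods 2-5 (per xDMS)
DICT_MASK = DICT_SIZE - 1
MEDIUM_CONTEXT_LOCATION = 16318    # temisu's mediumContextLocation (= 4000h - 42h)

# lh1_dist_codesizes from the post: 64 symbols, same lengths as the LZHUF disp tree
MEDIUM_CODE_LENGTHS = [3] + [4] * 3 + [5] * 8 + [6] * 12 + [7] * 24 + [8] * 16


def build_canonical(lengths):
    """Map (length, code) -> symbol, assigning codes shortest-first, then by symbol."""
    table, code, prev_len = {}, 0, 0
    for length, sym in sorted((n, s) for s, n in enumerate(lengths)):
        code <<= length - prev_len
        table[(length, code)] = sym
        code += 1
        prev_len = length
    return table


class BitReader:
    """MSB-first bit reader; reads past the end return 0 bits (assumed padding)."""

    def __init__(self, data):
        self.data, self.pos = data, 0

    def bits(self, n):
        value = 0
        for _ in range(n):
            idx = self.pos >> 3
            byte = self.data[idx] if idx < len(self.data) else 0
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value

    def huff(self, table):
        code = 0
        for length in range(1, 17):
            code = (code << 1) | self.bits(1)
            if (length, code) in table:
                return table[(length, code)]
        raise ValueError("invalid Huffman code")


def decode_medium(src, out_len, ring=None, pos=MEDIUM_CONTEXT_LOCATION):
    """Decode one Medium track; returns (decoded bytes, new ring position)."""
    ring = bytearray(DICT_SIZE) if ring is None else ring
    table = build_canonical(MEDIUM_CODE_LENGTHS)
    reader = BitReader(src)
    out = bytearray()

    def put(byte):
        nonlocal pos
        ring[pos] = byte
        pos = (pos + 1) & DICT_MASK       # masked index wraps around the ring
        out.append(byte)

    while len(out) < out_len:
        if reader.bits(1):                                        # 1 = literal byte
            put(reader.bits(8))
        else:                                                     # 0 = dictionary match
            length = reader.huff(table) + 3                       # max 42h
            disp = reader.huff(table) * 0x100 + reader.bits(8) + 1  # max 4000h
            for _ in range(length):
                put(ring[(pos - disp) & DICT_MASK])
    return bytes(out[:out_len]), pos
```

For example, the three bytes A0 80 00 encode a literal "A" followed by a length-3, distance-1 match, so decode_medium(bytes([0xA0, 0x80, 0x00]), 4) yields b"AAAA". Passing the same ring and returned position into the next call is what lets later tracks reference earlier ones.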
06 March 2023, 19:41 | #10 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,550
I can't fire up my Amiga at the moment, but I still have some DMS files downloaded from BBSes of the day (all the data stored in my DMS files will be available in other formats), but I remember them all being less than 880k. There were tens of them per release, because towards the end of the Amiga era games came on CD or across multiple disks.
Am I remembering wrong, or was there a facility in DMS to add ANSI/ASCII text which was displayed while decompressing/writing back to disk? I have memories of the BBSes of the day tagging their ANSI/ASCII banners (.nfo) onto the DMS files (but it could have been another format).
06 March 2023, 22:29 | #11 | ||||||
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
You know, I could always use more test files.
07 March 2023, 09:59 | #12 | |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,649
Quote:
See posts #2 and #7.
07 March 2023, 10:44 | #13 |
Thalion Webshrine
Join Date: Jan 2004
Location: Oxford
Posts: 14,550
Were there two sources of banners?
I have memories of files that had passed through multiple BBSes and had more than one banner. But no evidence. Last edited by alexh; 07 March 2023 at 11:02.
07 March 2023, 10:56 | #14 |
ex. demoscener "Bigmama"
Join Date: Jun 2012
Location: Fyn / Denmark
Posts: 1,649
I'm not sure, but I definitely remember the file_id.diz business; whether it was stored as track 0/bootblock or track 80, I can't recall.
Last edited by hooverphonique; 08 March 2023 at 11:06.
08 March 2023, 09:41 | #15 |
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
I wrote a series of Ami Express BBS doors to add/remove BBS adverts, one of which handled DMS files.
DMS files have a 16-bit [oops, originally I wrote 16 byte] unsigned header field for the track number, and could theoretically store data for tracks 0 through 65535. Also, they could have any number of track entries with the same track ID. All tracks would be written in the order they were read from the file, so it was important to put "advert" track 0 entries before the real track 0 entry, otherwise it would corrupt the final written disk content. Provided the DMS track checksums matched, you could theoretically put your advert under any track number. However, only "track 0" entries would be displayed if they were a non-standard bootblock starting with 'DOS' followed by a null byte. It was therefore common practice to also add advert tracks under other track numbers and, whilst this could theoretically be any track number as long as it appeared before the legitimate instance of that track number, most tools added them at the end of the file and outside the track range 0 to 79, as they wouldn't be written back to the disk by DMS. My own tool added them as track 65535. If you want, I'm happy to update a couple of temisu's example DMS files with adverts in the various positions, if it helps? Last edited by Exodous; 08 March 2023 at 17:59.
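The write-back order described above can be sketched as follows: entries with a track number below 80 are written in file order, so a later real track 0 overwrites an earlier fake bootblock, while everything outside 0..79 (banners at FFFFh/FFFEh, file_id.diz at 80) is collected instead of written. This is a simplified Python sketch, not DMS's own logic; it assumes already-decompressed track data and an 80-track DD disk with 2C00h bytes per track, and assemble_disk is a hypothetical name.

```python
TRACK_SIZE = 0x2C00   # 11 sectors * 512 bytes * 2 sides (Amiga DD)
NUM_TRACKS = 80


def assemble_disk(entries, num_tracks=NUM_TRACKS):
    """entries: iterable of (track_number, decompressed_bytes) in file order.

    Returns (disk_image, extras), where extras holds the entries that are never
    written back (banners, file_id.diz, out-of-range tracks).
    """
    disk = bytearray(num_tracks * TRACK_SIZE)
    extras = []
    for number, data in entries:
        if 0 <= number < num_tracks:
            chunk = data[:TRACK_SIZE]              # clip so the slice can't grow the image
            start = number * TRACK_SIZE
            disk[start:start + len(chunk)] = chunk  # later entries overwrite earlier ones
        else:
            extras.append((number, data))
    return bytes(disk), extras
```

Because only the bytes actually present in an entry are overwritten, a short fake bootblock placed after the real track 0 would leave the tail of the real track intact but replace its start, which is exactly the corruption Exodous warns about.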
11 March 2023, 11:13 | #16 | |||
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
|
Quote:
Other than that, yes, an example with all kinds of banners & stuff would be nice (as long as it complies with what was used back then, without bringing up new obstacles that could make it more difficult to decompress the files). What I am more interested in is the purpose of the adverts...
- Why are they called adverts... did they have commercial value, or is it just underground graffiti?
- The banners (Track FFFFh) are displayed during decompression, right?
- If so, why would one additionally use fake bootblocks (Track 0) to display extra stuff?
- Is there some difference... something like banners could be disabled, but fake bootblocks are always displayed?
- If stuff could also be stored on track 1,2,3,... and if it wouldn't be displayed... why would one do that? Is that just intended as a hidden message that could only be viewed in hex editors?
- What is "file_id.diz" (Track 80) for... is that also displayed during decompression? Or displayed elsewhere?
- temisu mentioned something in Track FFFEh, what's that for? Is it also a hidden message without any other purpose?
- Oh, and is there a size limit on banner tracks? Like causing memory/buffer overflows?
Quote:
But the above chart was overcomplicated (I had somehow thought that there would be a corner case where one needed to store a copy of the old gap before the old data; that was nonsense). With that corrected, it now looks like this:
Code:
; |----------- old 4000h ------------------->
; :
; .-.- - . - - - - - - - - - - - - - -.-----.----------------.
; | |junk| : : old | old | new |
; | | | : : data | gap | data |
; '-'- - ' - - - - - - - - - - - - - -'-----'----------------'
; \ \_________________:____________________
; \______________________________________ \
; : : \ \
; .-.- - . - - - - - - - - - - - - - -.-----.----------------.-----.
; | |junk| : old | old | new | new |
; | | | : data | gap | data | gap |
; '-'- - ' - - - - - - - - - - - - - -'-----'----------------'-----'
; ___________________/ : /
; / : : ___________________/
; / : /
; . - - - - - -.-----:----------------.-----.
; | old | old | new | new |
; | data | gap | data | gap |
; '- - - - - - '-----'----------------'-----'
; : :
; <----------------- new 4000h -------------|
Quote:
EDIT: Or do you mean an RLE code with a missing fill value? The possible RLE codes are:
Code:
90h,00h              Output 90h
90h,FFh,xxh,Hi,Lo    Output xxh repeated Hi*100h+Lo times (Len=0..FFFFh)
90h,Len,xxh          Output xxh repeated Len times (Len=1..FEh)
xxh                  Output xxh

I've been looking into the two bugged files...

_asi029.dms
The bug occurs in Track 4Fh (aka 79, aka the last track). That track has:
-- Heavy size = 1011h (but there's one byte missing; it would require 1012h bytes)
-- RLE size = 2B5Fh
Did you track down where the byte got lost? Is it the RLE compressor not forwarding the last byte to Heavy? Or the Heavy compressor not storing the last bits of the bitstream? Also, does that problem tend to occur (only?) on Track 79? Asking because Codetapper also mentioned issues on Track 79.

Eradication.dms
The bug occurs in Track 3Bh (aka 59). That track has:
-- Heavy size = 1FFBh
-- RLE size = 2BE3h
Those sizes look okay, with no missing bytes, but the checksum is slightly off (+0Eh). After thinking about that for some days... what if it's the same problem, and the last byte is missing there, too? What I mean is: the Heavy bitstream is padded to an 8-bit boundary. So, if the last RLE byte is missing, then those padding bits can appear to contain the missing byte. That could easily be fixed using your "guess the last byte" trick, too. Simply add something like this:
IF size=okay AND checksum=wrong THEN assume size=size-1
To confirm that theory... Unfortunately, the Eradication file doesn't have OFS sector checksums (which could offer an (imperfect) extra error check if they were present). But fortunately, the original file can be found here in ADF format: https://www.pouet.net/prod.php?which=62355 - Insane-Eradication.adf
And... yes, that seems to confirm the missing last byte theory :) The file is exactly the same as the dms decompression output, except that the "missing" last byte on track 59 is different (in the ADF file the track ends with DF,E2,E0,DC, and the bugged DMS output has DF,E2,E0,EA in that location).

PS. It would be interesting if one could actually reproduce the bug when compressing the original "Insane-Eradication.adf" file. Ideally with different DMS versions. And with different DMS methods. And also with "heavy-without-rle" (if DMS has an option for that).
Last edited by nocash; 11 March 2023 at 11:27.
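The four RLE codes listed in post #16 translate almost directly into a decoder. A minimal Python sketch, assuming the Hi/Lo count is big-endian as written in the table and that the unpacked size of each track is known from its header; decode_rle90 is a hypothetical name, not a function from xDMS or ancient.

```python
def decode_rle90(src, out_len):
    """Expand DMS-style RLE90 data until out_len bytes have been produced."""
    out = bytearray()
    i = 0
    try:
        while len(out) < out_len:
            b = src[i]
            i += 1
            if b != 0x90:                    # xxh: plain byte
                out.append(b)
                continue
            n = src[i]
            i += 1
            if n == 0x00:                    # 90h,00h: literal 90h
                out.append(0x90)
            elif n == 0xFF:                  # 90h,FFh,xxh,Hi,Lo: long run
                value, count = src[i], (src[i + 1] << 8) | src[i + 2]
                i += 3
                out += bytes([value]) * count
            else:                            # 90h,Len,xxh: short run (Len=1..FEh)
                value = src[i]
                i += 1
                out += bytes([value]) * n
    except IndexError:
        raise ValueError("RLE90 data ended before out_len bytes were produced") from None
    return bytes(out[:out_len])
```

A truncated code (for instance 90h,Len with the fill value missing, as asked about in the EDIT above) raises an error here instead of silently producing a short track.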
11 March 2023, 16:41 | #17 | |||||||
Registered User
Join Date: Sep 2019
Location: Leicester / England
Posts: 203
Quote:
Quote:
One man's advert is another's graffiti though. Quote:
Quote:
The fake bootblock was displayed where it didn't match a standard bootblock as a way of warning the end user it could be malicious. However, the "NOTEXT" option in DMS could suppress both the banners and track 0 display. Quote:
Quote:
Quote:
Attached is a selection of DMS files using temisu's example file test_C1_medium.dms as the base and then adding a front banner using ID FFFFh, a front banner using ID FFFEh, a track 0 banner, a rear banner using ID FFFFh and finally a 32767 byte front banner (the first part of Romeo and Juliet). There is also a log file showing the output when written by DMS 2.01.
11 March 2023, 21:54 | #18 | ||||
Registered User
Join Date: Mar 2017
Location: Tallinn / Estonia
Posts: 74
Quote:
Quote:
Quote:
I do believe that these bugged files are rare in the wild; most likely, when some images were broken, they were re-uploaded and replaced with fixed images. In the cases where the broken files are still available, the problem probably appears in a place where it is non-breaking. My decision to fix the "easy" breakage, which is pretty safe to do, was just to give people an extra 0.1% to make my implementation the best one there is.
Quote:
In any case you can find some DMS versions here: http://www.amiga-stuff.com/archivers-download.html Then take FS-UAE and go wild (even the basic AROS will do the trick; you don't have to buy ROMs). You can then process as many files as you want. If you are serious about any Amiga stuff, you have to do this sooner or later anyway. (I also had to go outside my comfort zone earlier when dealing with LOB compression, since tooling was only available for Atari. So now I know Hatari.)
13 March 2023, 19:22 | #19 | |||||
Registered User
Join Date: Feb 2016
Location: Homeless
Posts: 67
Quote:
Quote:
Quote:
Quote:
Your 32767 byte banner seems to be only 32766 bytes long. And why did you use 3C0h bytes for the fake bootblocks??? Normally, bootblocks have these two characteristic features: they contain a checksum over the 400h-byte block, and they are stored in two physical 200h-byte floppy sectors. I don't know how DMS detects uncommon bootblocks (to trigger the warnings)... Does it do that only when the bootblock contains a (in)correct checksum? Or does it somehow detect "uncommon program code" (however it could determine that)? Or does it simply check for uncommon track sizes, i.e. anything less than 2C00h bytes?
I have found some more broken dms files: http://eab.abime.net/showpost.php?p=262947&postcount=18 Two of them contain traditional dms errors. The other file has a bad CRC, which is probably unrelated to dms bugs (it might have been damaged at some point after compression... when reading from a worn-out floppy, or from a cross-linked FAT filesystem or whatever).
What we have now (more files would be welcome):
Code:
Name_______________________Method_____Sys_LZ__RLE_Notes_______________
_asi029.dms                RLE+Heavy2 OFS lit lit Missing byte on Track 4Fh (79)
Eradication.dms            RLE+Heavy2 -   lit lit Bad checksum on Track 3Bh (59)
Flt-cup.dms                Heavy2     -   lit -   Bad checksum on Track 18h (24)
Grandnt2.dms               RLE+Heavy2 -   lit lit Bad checksum on Track 2Ah (42)
TheUNT01.dms               ?          ?   ?   ?   Bad CRC (damaged AFTER compression?)
EDIT:
Parallax-CriticalMassA.dms ?          bad -   -   Good checksums (but damaged BEFORE compression)
Parallax-CriticalMassB.dms RLE+Heavy2 -   lz  rle Bad checksum on Track 4Fh (79), unfixable
Interestingly, Flt-cup.dms isn't using RLE on the bugged track, so the problem isn't related to the RLE compression layer. Mmmh, temisu, I think you might misremember what you were doing with which files. Looking at your source code, you may have perhaps wanted to say something like:
Quote:
The other three files (Eradication, Flt-cup, Grandnt2) seem to have the same missing byte problem, but the garbage padding at the end of the bitstream fools your decompressor into thinking that missingNo=0, and that's wrong; it should be missingNo=1. You can detect that situation by looking for checksum errors. Admittedly, it isn't optimal to use the checksum to detect and correct the errors. I am still struggling to confirm whether the error-corrected results are reliable. Another, more general problem is that the wrong last byte is also stored in the LZ dictionary. That raises the question of whether the dictionary content should be fixed as well. Currently I am not fixing the dictionary content, and that seems to work okay (without generating a series of checksum errors on the following tracks). I still need to test whether fixing the dictionary content triggers errors on the following tracks (of course, that doesn't matter for errors on the last track).
Last edited by nocash; 05 April 2023 at 02:40.
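The "IF size=okay AND checksum=wrong" idea becomes concrete if the track checksum is additive. I'm assuming here that it is the plain 16-bit sum of the unpacked track bytes, as in the xDMS sources; that assumption fits the Eradication case, where EAh - DCh = 0Eh is exactly the checksum difference reported for track 59. Under that assumption, the only last-byte value that satisfies the stored checksum can be computed directly. A hypothetical Python sketch (the function names are mine):

```python
def track_checksum(data):
    """Assumed DMS track checksum: 16-bit sum of all unpacked bytes."""
    return sum(data) & 0xFFFF


def repair_last_byte(track, stored_checksum):
    """Return track unchanged if it verifies, a copy with a recomputed last
    byte if that alone fixes the checksum, or None if it can't be fixed."""
    if track_checksum(track) == stored_checksum:
        return track
    if not track:
        return None
    needed = (stored_checksum - track_checksum(track[:-1])) & 0xFFFF
    if needed > 0xFF:            # no single byte value can make the sum match
        return None
    return track[:-1] + bytes([needed])
```

As noted above, this only ever "fixes" the final byte; a corruption elsewhere in the track could also be masked by a matching value, so the result is a plausible guess rather than proof.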
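For reference on the bootblock checksum mentioned in post #19: as far as I know, the standard Amiga bootblock checksum treats the 400h-byte block as 256 big-endian longwords, adds them with end-around carry while treating the checksum field at offset 4 as zero, and stores the complemented result. A Python sketch of that algorithm (function names are hypothetical); it only verifies the checksum and says nothing about how DMS itself decides that a bootblock is "non-standard".

```python
import struct

BOOTBLOCK_SIZE = 0x400   # two 200h-byte sectors


def bootblock_checksum(block):
    """Add-with-carry over 256 longwords (checksum field zeroed), then NOT."""
    if len(block) != BOOTBLOCK_SIZE:
        raise ValueError("bootblock must be 400h bytes")
    longs = list(struct.unpack(">256I", block))
    longs[1] = 0                                 # offset 4 holds the checksum itself
    total = 0
    for value in longs:
        total += value
        if total > 0xFFFFFFFF:                   # end-around carry
            total = (total + 1) & 0xFFFFFFFF
    return ~total & 0xFFFFFFFF


def has_valid_dos_bootblock(block):
    """True if the block is 400h bytes, starts with 'DOS' and its checksum matches."""
    if len(block) != BOOTBLOCK_SIZE or block[:3] != b"DOS":
        return False
    stored = struct.unpack_from(">I", block, 4)[0]
    return stored == bootblock_checksum(block)
```

A 3C0h-byte "bootblock" like the ones in the test files above can't pass this check at all, which is one reason a decompressor might treat such entries differently from a genuine track 0.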
13 March 2023, 19:51 | #20 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,276
This one also seems to be missing at least one byte on track $4F: https://files.scene.org/view/mirrors...ticalMassB.dms