14 October 2018, 03:58 | #1 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
MFM en/de-coding space and (dynamic?) fragmental datastream?
That was the best I could do for the topic. Read through and suggest something better if you want to improve it.
Ok, so I have started this in a backwards fashion: I wondered if the 2:1 standard encoding was the only possible way to store data on a floppy without breaking any 'rules'. The rules as I understand them are: no more than one "1" bit in sequence, and no more than three "0" bits in sequence. If this is wrong (or needs more qualifiers) then the rest below is just garbage.

I have looked at encoded MFM data, trying to find all valid combinations. This in itself is kinda meaningless unless you restrict yourself to a certain size for the data you look at. I first looked at 8-bit groupings, which gave a worst case worse than the normal encoding rules (but an average that was probably a good deal better, if you could find a way to encode and decode the stream). Next I tried 16-bit groupings, and for the worst case there are (if my manual calculations are correct) 270 valid encoding patterns. The worst case is when the previous 3-bit prefix was "000", so your 16-bit pattern must begin with a "1". I haven't checked the other cases ("001", "010", "100", "101"), but they had more patterns for 8 bits, so I'll be ever so bombastic and assume they are better.

How do you construct a datastream using base-270 numbers? I know arithmetic coding uses sub-bit math, and IIRC MIME encoding uses something that fits into less space. Can you also do it with varying number bases? Some of the prefixes might generate more than 300 valid patterns, which could increase the storage density nicely if the algorithm doesn't break down.

Now, this might not be practically possible, or just 30 years too late. You can browse the encodings I typed out. And I have this feeling that I must be missing something major here...

(Bonus Q: What is the rule for encoding the first bit in the MFM stream? Since there are no previous bits, I mean. Or does it depend on the sync mark?) |
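The pattern-counting step described above can be sketched with a small dynamic program. This is my own sketch, not the original hand count: it counts 16-bit groups that stay valid after a worst-case "000" prefix, under the stated rules (no two 1s in a row, no more than three 0s in a row). Depending on how you treat zero runs at the group boundary, the total need not land exactly on 270.

```python
# Count n-bit patterns that remain valid MFM (RLL(1,3)) when preceded by "000".
# Assumed rules: no two consecutive 1s, no more than three consecutive 0s.

def count_patterns(nbits, prefix_zeros=3):
    # DP state = number of trailing zeros (0 means the last bit was a 1).
    counts = [0, 0, 0, 0]
    counts[prefix_zeros] = 1
    for _ in range(nbits):
        new = [0, 0, 0, 0]
        new[0] = counts[1] + counts[2] + counts[3]  # emit a 1 (never after another 1)
        for r in range(3):                          # emit a 0 (zero run must stay <= 3)
            new[r + 1] += counts[r]
        counts = new
    return sum(counts)

def brute_force(nbits, prefix="000"):
    # Exhaustive cross-check of the same constraints.
    total = 0
    for v in range(1 << nbits):
        s = prefix + format(v, f"0{nbits}b")
        if "11" not in s and "0000" not in s:
            total += 1
    return total

print(count_patterns(16), brute_force(16))  # both print the same count
```

The DP and the exhaustive count agree, which at least confirms the rules are applied consistently either way.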
14 October 2018, 10:03 | #2 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Hi NorthWay, check some Psygnosis titles for a 'dense' MFM (Nitro comes to mind).
Or a different 'trackdisk.device' (I don't remember the name..) that gives you 14 sectors/track. EDIT:
When writing, the first bit in absolute terms is rewritten on the following rotation (unless short tracks are used). This is why you put a pattern '010101010..' before the sync mark and terminate the stream with a 0, so whatever the number of bits before the sync, you will have a valid MFM pattern. If you mean the first bit after the sync, then just follow the MFM rules. EDIT2: found! floppy.device, on Aminet as floppy43.lha (but it is not the only one, I remember others from the end of the 80s). I had also made my own version of a track-writer for dense MFM, but it is lost in time.. Last edited by ross; 14 October 2018 at 10:25. |
|
14 October 2018, 19:46 | #3 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Do you know if they use a regular MFM encoding or an alternative? And/or just longer tracks (Nitro written on a slower drive)?
|
14 October 2018, 23:05 | #4 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
|
14 October 2018, 23:22 | #5 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
|
14 October 2018, 23:32 | #6 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
In the past I had done tests using 4 "0"s in a row and it worked, only you had to use the right precompensation when writing, otherwise you had problems on the inner tracks. And it was surely more prone to sporadic read errors, but I do not remember much, too many years have passed.. Nitro fits on one disk because it is packed with Shrinkler; ask Galahad about the challenge |
|
15 October 2018, 06:21 | #7 |
Registered User
Join Date: Feb 2007
Location: Melbourne, Australia
Age: 41
Posts: 3,772
|
|
04 February 2023, 06:59 | #8 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
So I just re-visited this topic tonight and tried looking at it from another angle.
Unless I have messed up my logic and math, it should be possible to store on _average_ 1142K of decoded data in a 2M raw bitstream. The size will vary depending on the data to encode, so you might get better results by XORing it with random numbers?

The general idea is like this: begin with a known state, treat your data like an input stream and the encoded data like an output stream, then do a switch/case depending on the 3 previous bits in your output and expand your output as follows Code:
- [001][101]  0 -> 00,  1 -> 01
- [100]       0 -> 01,  1 -> 1

This isn't an optimal solution, as there are a number of bit pattern combinations that will never be generated and are therefore wasted. I haven't coded up a test for this, but that could be fun. As far as I can see this obeys the MFM rules, though you need to use the CPU to decode it. Any holes you can poke in this? |
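As a quick sanity check, the switch/case scheme above can be sketched as a tiny encoder plus a matching decoder. The starting context "001" is my own choice; the post only says to begin with a known state.

```python
# The next output bits depend on the last three bits already in the stream.
CODES = {
    "001": ("00", "01"),  # [001] and [101]: 0 -> 00, 1 -> 01
    "101": ("00", "01"),
    "100": ("01", "1"),   # [100]: 0 -> 01, 1 -> 1
}

def encode(bits, start="001"):
    out = start  # seed the stream so the first lookup has a 3-bit context
    for b in bits:
        out += CODES[out[-3:]][b]
    return out

def decode(stream, start="001"):
    # The two codes per context are prefix-distinguishable, so decoding
    # can peek at the stream: try the code for a 1, else it was a 0.
    bits, pos = [], len(start)
    while pos < len(stream):
        zero, one = CODES[stream[pos - 3:pos]]
        if stream[pos:pos + len(one)] == one:
            bits.append(1)
            pos += len(one)
        else:
            bits.append(0)
            pos += len(zero)
    return bits

data = [1, 0, 1, 1, 0, 0, 1, 0]
s = encode(data)
print(s, "11" not in s and "0000" not in s, decode(s) == data)
```

Every stream this produces seems to obey both rules (no "11", no "0000"), and the decoder has to track the same 3-bit window, which matches the observation that decoding needs the CPU.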
04 February 2023, 10:00 | #9 |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Hi NorthWay, the only caveat I have is that it would be extremely slow
(I think you're unlikely to go below two (three?) rotations per track, using all the CPU time on a bare 68k). I don't remember exactly the aforementioned encoding of some Psygnosis titles, but it seems very similar to me, and in fact the decoding takes place bit by bit (unfortunately you cannot use a 3-bit packet approach because the window is variable in the case [100] 1 -> 1). Another problem is the variable length of the tracks (solved by Psygnosis by inserting a tag per track and not using sectors). As regards the weight of the 0s and 1s, it is usually a self-solving problem if you use an associated data compressor anyway, which rebalances the presence of bits (especially with entropy coding). Well, have fun and report the results |
04 February 2023, 18:09 | #10 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
One way of implementing a variation of your idea is the following encoding:
Code:
State  Prior  Coding (0/1)  Next state
A      000    100 101       C/D
B      001    00 01         C/D
C      100    0 1           A/B
D      101    00 01         C/D

I haven't tried, but I imagine it would be possible to make a decoder that's not crazy slow by using some clever LUTs. To output 4 bits you need 9 input bits in the worst case (which can only happen when the last state was C). One (useful?) thing is that while 010 is a valid prior value, it can't happen with this encoding. If the scheme could be tweaked to make it happen it would be useful, since you could output a single bit in that case. |
04 February 2023, 18:30 | #11 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
For all the RLL(1,3) encodings I've tried (a long time ago..) I haven't found any LUTs that weren't gigantic. Perhaps the solution is to handle various sub-LUTs with smart (and unrolled) selector code. No problem instead for RLL(1,4) encodings, where the LUT can be single and of quite limited size (I must have the code somewhere..). But it seems that 'new' drives hardly support it. |
|
04 February 2023, 18:52 | #12 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
Even if you sync to the 'disk index' you need a 'unique' word to start DMA.* Not that it's mandatory, but let's at least try to avoid further slowdown; in theory a double buffer for the data could be used during decoding. EDIT: * thinking about it better, in fact it is not fundamental, given that in any case we are basically forced to read an entire track. It is enough that the word is preceded by 'neutral' values. However, in addition to the DSKLEN start being exactly synchronized, the write to disk must be as well. Last edited by ross; 04 February 2023 at 19:10. |
|
05 February 2023, 01:04 | #13 |
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 839
|
Just to flesh out my table in the same style with all the unused states
Code:
State  Prior  Coding (0/1)  Next State
A      001    00 01         D/B
B      101    00 01         D/B
C      010
D      100    01 1          A/A
E      000    100 101       D/B   (Not used, but how it has to be done if used.)

Another variable way to do MFM encoding could be to always assume the prior bits end in 1 and encode 3 states as 01 / 001 / 0001. That is on average the same as MFM, but much more of a hassle to handle, and has a worst case at 75(?)% density. |
05 February 2023, 01:41 | #14 | |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,743
|
Quote:
|
|
05 February 2023, 11:52 | #15 | |
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
Paula is absolutely not interested in the incoming flux, apart from a different modality in the synchronism: it only selects the data clock rate!

Verbose and boring mode on.

The limit is the magnetic surface: the 1 bits cannot be too close, because the change of polarity in the flux would move the cells and make recognition impossible. To overcome the problem you can use sufficiently spaced cells. For DD disks this distance is 4us: the 1s can be consecutive, the flux change is acceptable on the magnetic surface, no problem on read. However, there is a different problem, i.e. several consecutive 0s*: without a change of polarity the sync can be lost because of differences in motor speed (the DPLL fails to resynchronize to a flux change). In this case the best coding is certainly GCR, where 'groups' of bits even with infinitely many consecutive 1s can be used, but only few zeros.

But at a certain point someone must have thought: why don't we make a coding where it is impossible to have a flux change <4us, but which also allows underclocks, i.e. flux changes even at 6us? And from there comes MFM, where it is enough to double the clock and prevent two consecutive 1s.

So: GCR is limited to 4us cells and therefore, however much you optimize the bitgroup, this coding will never equal the 2:1 ratio of MFM. Yes, so in fact for Paula there is no GCR or MFM writing, there is only writing at 2us or 4us and the related precompensation; reading is different, because there the start of the data read, i.e. the synchronization method, is important.

*A series of 0s, specifically 3 at 2us but even a couple more, generally doesn't create problems; the DPLL manages to resynchronize. It seems that the new drives (modified HD and derivatives) are 'too smart' and tend to 'spontaneously correct' >3 zeros, making RLL(1,4) encodings problematic. The ones on my old A500 or A1200 had no big problems if the right precompensation was used when writing.
I have no idea about Goteks or similar, but I think they are not a problem, and neither is WinUAE. A last word: it is not known exactly how Paula's precompensation or DPLL works; one can only guess from indirect evidence. |
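ross's timing argument can be illustrated numerically. With 2us bitcells and the MFM rules (at least one, at most three 0s between 1s), successive flux transitions (the 1 bits) land 4, 6, or 8 us apart. A small sketch, not actual Paula behaviour:

```python
# Map a DD MFM bit pattern to the gaps (in us) between flux transitions,
# assuming 2us bitcells.
def transition_gaps_us(mfm, cell_us=2):
    ones = [i for i, b in enumerate(mfm) if b == "1"]
    return [(b - a) * cell_us for a, b in zip(ones, ones[1:])]

print(transition_gaps_us("1010010001"))  # → [4, 6, 8]
```

The "one 1, then one to three 0s" rule is exactly what keeps every gap in the 4-8us window that the drive and DPLL can handle; GCR at 4us cells instead allows back-to-back transitions but tighter limits on zeros.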
|
06 February 2023, 22:33 | #16 |
Registered User
Join Date: Jun 2010
Location: PL?
Posts: 2,743
|
I'm fully aware of this; that's why there are many RLL codes (GCR is also an RLL code) to deal with such limitations - theoretically Paula should be able to deal with them at a cost of encode/decode complexity (CPU time).
But GCR is unknown to me - no details about implementations, and the HRM information suggests that GCR is usable only in single density mode (i.e. 125/250 kbps, not 250/500 kbps). All precompensation implementations I know advance or delay the moment a bit is written to disk, depending on the predicted flux shift - usually some shifter is used to do so. There is a Paula DPLL patent that addresses some details of the inner workings, but still there is no information on how it works in GCR mode.

One way of dealing with such a problem is simply a variable amount of sectors stored per track - outer tracks may simply carry more data, inner ones less. The Amiga never used such an approach to my knowledge...

And floppy read/write "processors" are analog devices - so perhaps there is an issue that they are tuned for HD, not DD floppies - that's why they may misbehave with some odd sequences... (there is a bunch of analog filters there, some high-pass acting as a differentiator - if the time constant is incorrect then I assume it may be a source of problems). I bet that with proper selection of the filter characteristic this would not be an issue.

It seems there is some work on complex RLL encoding for floppies: https://c65gs.blogspot.com/2022/01/m...-encoding.html - curious if Paula can be flexible enough to deal with RLL(2,7) or more advanced encodings. Last edited by pandy71; 11 February 2023 at 22:04. |
12 February 2023, 13:24 | #17 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
I've made a small proof of concept. Unfortunately I no longer have a working floppy drive. It works using a Gotek on my A1200, but that doesn't really prove anything.
It would be cool if somebody with a working drive could try it out and see if it's viable in this form. In the attached archive there's a program (write_eadf.exe) to write the (extended) ADF file altmfm.adf to a disk (run write_eadf.exe 0 altmfm.adf to write it to DF0). It may not work reliably under KS1.x due to the use of TD_RAWWRITE, though. The disk should be bootable and verifies all the encoded tracks.

Each track is encoded in full using the table from my earlier post, with two twists suggested by Ross:

1. Using an RLL(1,4) syncword ($5084) to avoid using index sync. Wordsync can't really be used if the encoded data can contain the sync word (see https://eab.abime.net/showthread.php?p=1437047).

2. XORing the data with a 16-bit value before encoding to increase channel capacity usage. This acts as a simple filter to ensure a good distribution of the bits. I find the value by brute force for each track, but just trying a few (e.g. 256) random ones also gives a decent increase. Even for compressed/random data, capacity usage is >99.1% (== encoding cost: 1.828). For a full disk that's an extra ~3K.

Decoding is done 4 bits at a time and takes ~192 cycles/byte using 12K of RAM (4K could be saved at a cost of 14 cycles/byte by shifting instead of using different tables for the upper and lower nibbles). |
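The XOR-key search described in point 2 might look something like this sketch. The encoder table is the one from the earlier post; restricting the search to 256 random keys follows the post's suggestion, and the MSB-first bit order and starting state are my assumptions.

```python
# Toy version of the xor-key search: pick the 16-bit key that minimizes the
# encoded length of a track under the 4-state RLL(1,3) table.
import random

# state -> (code for 0, code for 1, next state for 0, next state for 1)
TABLE = {
    "A": ("100", "101", "C", "D"),
    "B": ("00",  "01",  "C", "D"),
    "C": ("0",   "1",   "A", "B"),
    "D": ("00",  "01",  "C", "D"),
}

def encoded_length(words, key, start="B"):
    state, total = start, 0
    for w in words:
        w ^= key
        for i in range(15, -1, -1):        # MSB-first bit order (assumption)
            b = (w >> i) & 1
            entry = TABLE[state]
            total += len(entry[b])
            state = entry[2 + b]
    return total

random.seed(2)
track = [random.getrandbits(16) for _ in range(2816)]   # ~5.5K of random data
keys = [random.getrandbits(16) for _ in range(256)]
best = min(keys, key=lambda k: encoded_length(track, k))
print(encoded_length(track, 0), encoded_length(track, best))
```

For already-random data the gain from the key is small, as the post notes; the trick earns its keep on structured data, where a good key steers the stream toward the cheap single-bit codes of state C.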
12 February 2023, 15:21 | #18 | ||||
Defendit numerus
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 53
Posts: 4,468
|
Quote:
But if it were just a magnetic surface issue, it would mean that with 2us cells, a GCR encoding of some sort [RLL(0,x)], and an HD floppy, something should work. But my feeling is that Paula's DPLL would not be able to handle the data and would sample completely wrongly, as it is probably only set up to handle flux changes every ~4us or more (otherwise it 'corrects' the clock and messes everything up).

About GCR implementations: as it is a 'software' LUT approach it is basically 'free', and the various companies implemented it at will. So it's usually just a variation of an RLL(0,2), with a variable bit-group association. And a different synchronization for the data start, which is discussed somewhere on the forum (of course I can't find it now..) and which WinUAE supports. Quote:
Quote:
Missing these two conditions, I would say that the best (and even the only possible) approach is to have a defined amount of sectors per track. Quote:
Nice |
||||
12 February 2023, 21:17 | #19 | |
Moderator
Join Date: Nov 2004
Location: Eksjö / Sweden
Posts: 5,602
|
Quote:
I experimented with this back in the day, and I think I got it down to 1.675 bits per decoded bit. The sync marks are indeed MFM-compatible. You can also experiment with longer tracks, write across the gap and rewrite the gap data, and of course have one 'sector' per track. On top of this you can adapt a compression algorithm to somehow use your built-in optimal encoding, and so waste less disk space. But the train has mostly passed for these types of experiments. Now we appreciate it if a disk is preservable in a standard format, and any compression is done on the data before encoding. But I remember having a cozy time. I think this was around the time when I learned to drink coffee. |
|
12 February 2023, 21:47 | #20 |
Registered User
Join Date: Feb 2017
Location: Denmark
Posts: 1,099
|
The limit for arbitrary data is 1/0.5515 (~1.8132). Old thread: https://eab.abime.net/showthread.php?t=101466&
Thinking about it again, I think this scheme is better: Code:
State  Prior  Coding (0/1)  Next State
A      010    0 / 10        B / A
B      100    010 / 10      A / A

The previous test case would have to work before it makes sense to optimize, so I'm still looking for volunteers. (Source is included, BTW). |
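The 2-state scheme above is small enough to check exhaustively. A quick sketch that encodes every 12-bit input and verifies the output never violates the RLL(1,3) constraints (seeding the stream with state A's prior bits is my assumption):

```python
# Exhaustive validity check of the 2-state encoder table.
from itertools import product

# state -> (code for 0, code for 1, next state for 0, next state for 1)
TABLE = {
    "A": ("0",   "10", "B", "A"),
    "B": ("010", "10", "A", "A"),
}

def encode(bits, start="A"):
    state, out = start, "010"  # seed with state A's prior bits (assumption)
    for b in bits:
        entry = TABLE[state]
        out += entry[b]
        state = entry[2 + b]
    return out

ok = all(
    "11" not in (s := encode(bits)) and "0000" not in s
    for bits in product((0, 1), repeat=12)
)
print(ok)  # → True
```

On unbiased random input this table averages the same ~1.83 channel bits per data bit as the 4-state one, but with only two states the decoder windows are smaller.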