Old 14 October 2018, 03:58   #1
NorthWay
MFM en/de-coding space and (dynamic?) fragmental datastream?

That was the best I could do for the topic. Read through and suggest something better if you want to improve it.

Ok, so I have started this in a backwards fashion: I wondered if the 2:1 standard encoding was the only possible way to store data on a floppy without breaking any 'rules'. The rules as I understand them are: no more than one "1" bit in sequence, and no more than three "0" bits in sequence. If this is wrong (or needs more qualifiers) then the rest below is just garbage.
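
(For reference, the standard encoding itself can be sketched in a few lines of C - just an illustration of the rules above, not taken from any real loader: a clock bit is inserted before each data bit, and it is 1 only when both neighbouring data bits are 0, which is exactly what produces the two rules.)
Code:
#include <stdint.h>

/* Standard MFM: one data byte becomes 16 stream bits.  The clock bit
   is 1 only if the previous and current data bits are both 0, which
   gives no "11" pairs and at most three 0s in a row. */
uint16_t mfm_encode_byte(uint8_t data, int prev_data_bit)
{
    uint16_t out = 0;
    for (int i = 7; i >= 0; i--) {
        int d = (data >> i) & 1;
        int c = (prev_data_bit == 0 && d == 0);   /* clock bit */
        out = (uint16_t)((out << 2) | (c << 1) | d);
        prev_data_bit = d;
    }
    return out;
}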

I have looked at encoded MFM data, trying to find all valid combinations. This in itself is kinda meaningless unless you restrict yourself to a certain size for the data you look at.
I first looked at 8-bit groupings, which gave a worst case worse than the normal encoding rules (but an average that was probably a good deal better, if you could find a way to encode and decode the stream).
Next I tried 16-bit groupings, and for the worst case there are (if my manual calculations are correct) 270 valid encoding patterns.
The worst case is when the previous 3-bit prefix was "000", so your 16-bit pattern must begin with a "1". I haven't checked the other cases ("001", "010", "100", "101"), but they had more patterns for 8 bits, so I'll be ever so bombastic and assume they are better.
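
(If anyone wants to re-check the hand count mechanically, a brute-force counter is trivial - a throwaway sketch; change the prefix string to test the other cases:)
Code:
#include <stdio.h>

/* Count 16-bit patterns that may legally follow a given prefix under
   the rules "no two 1s in a row, no four 0s in a row". */
int main(void)
{
    const char *prefix = "000";
    int count = 0;
    for (unsigned p = 0; p < 0x10000u; p++) {
        int ones = 0, zeros = 0, ok = 1;
        for (const char *s = prefix; *s; s++) {   /* seed run lengths */
            if (*s == '1') { ones++; zeros = 0; }
            else           { zeros++; ones = 0; }
        }
        for (int i = 15; i >= 0 && ok; i--) {
            if ((p >> i) & 1) { zeros = 0; ok = (++ones <= 1); }
            else              { ones = 0;  ok = (++zeros <= 3); }
        }
        count += ok;
    }
    printf("%d valid 16-bit patterns after \"%s\"\n", count, prefix);
    return 0;
}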

How do you construct a datastream using base-270 numbers? I know arithmetic encoding uses sub-bit math, and IIRC MIME-encoding uses something that fits into less space. Can you also do it with varying base numbers? Some of the prefixes might generate more than 300 valid patterns which could increase the storage density nicely if the algorithm doesn't break down.
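
(Exact mixed-radix packing is possible in principle if you treat the whole data block as one big integer and peel digits off by long division, using a different base for each digit. A toy sketch - hopelessly slow for real use, which is presumably why arithmetic/range coding does the same job incrementally:)
Code:
/* Interpret `num` (big-endian, `len` bytes) as one large integer and
   extract the next digit in an arbitrary base by long division.  Each
   call may use a different base (270 for one prefix, 300+ for another). */
unsigned next_digit(unsigned char *num, int len, unsigned base)
{
    unsigned rem = 0;
    for (int i = 0; i < len; i++) {        /* most-significant byte first */
        unsigned cur = rem * 256u + num[i];
        num[i] = (unsigned char)(cur / base);
        rem = cur % base;
    }
    return rem;                            /* digit in [0, base) */
}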

Now, this might not be practically possible, or just 30 years too late. You can browse the encodings I typed out.
And I have this feeling that I must be missing something major here...

(Bonus Q: What is the rule for encoding the first bit in the MFM stream? Since there are no previous bits, I mean. Or does it depend on the sync mark?)
Attached Files
File Type: txt mfm-16.txt (5.8 KB, 162 views)
Old 14 October 2018, 10:03   #2
ross
Hi NorthWay, check some Psygnosis titles for a 'dense' MFM (Nitro comes to mind).

Or a different 'trackdisk.device' (I don't remember the name...) that gives you 14 sectors/track.


EDIT:
Quote:
Originally Posted by NorthWay View Post
(Bonus Q: What is the rule for encoding the first bit in the MFM stream? Since there are no previous bits, I mean. Or does it depend on the sync mark?)
What's the 'first bit'?
When writing, the first bit in absolute terms is overwritten on the following rotation (unless short tracks are used).
This is why you put a '0101010...' pattern before the sync mark and terminate it with a 0 in the stream, so whatever the number of bits before the sync, you will have a valid MFM pattern.
If you mean the first bit after the sync, then just follow the MFM rules.


EDIT2: found!
floppy.device, in Aminet floppy43.lha (but it is not the only one; I remember others from the end of the 80s).
I had also made my own version of a track-writer for dense MFM, but it is lost in time...

Last edited by ross; 14 October 2018 at 10:25.
Old 14 October 2018, 19:46   #3
NorthWay
Quote:
Originally Posted by ross View Post
Hi NorthWay, check some Psygnosis titles for a 'dense' MFM (Nitro comes to mind).

Or a different 'trackdisk.device' (I don't remember the name...) that gives you 14 sectors/track.
Do you know if they use regular MFM encoding or an alternative? And/or just longer tracks (Nitro written on a slower drive)?
Old 14 October 2018, 23:05   #4
ross
Quote:
Originally Posted by NorthWay View Post
Do you know if they use regular MFM encoding or an alternative? And/or just longer tracks (Nitro written on a slower drive)?
Both non-regular MFM encoding and practically standard track length (Nitro==$189d DMA words, floppy.device inevitably standard).
Old 14 October 2018, 23:22   #5
NorthWay
Quote:
Originally Posted by ross View Post
Both non-regular MFM encoding and practically standard track length (Nitro==$189d DMA words, floppy.device inevitably standard).
How interesting! Is there any description of how this is achieved? What did the cracks of Nitro do to fit it on disk?
Old 14 October 2018, 23:32   #6
ross
Quote:
Originally Posted by NorthWay View Post
How interesting! Is there any description of how this is achieved? What did the cracks of Nitro do to fit it on disk?
No description, but there are some "0000" patterns.
In the past I did tests using four "0"s in a row and it worked, but you had to use the right precompensation when writing, otherwise you had problems on the inner tracks.
And it was certainly more prone to sporadic read errors, but I do not remember much; too many years have passed...

Nitro fits on one disk because it is packed with Shrinkler; ask Galahad about the challenge
Old 15 October 2018, 06:21   #7
Hewitson
Quote:
Originally Posted by NorthWay View Post
How interesting! Is there any description of how this is achieved? What did the cracks of Nitro do to fit it on disk?
All cracks (except Galahad's) either removed the intro or had it on a second disk.
Old 04 February 2023, 06:59   #8
NorthWay
So I just revisited this topic tonight and tried looking at it from another angle.
Unless I have messed up my logic and math, it should be possible to store on _average_ 1142K of decoded data in a 2M raw bitstream. The size will vary depending on the data to encode, so you might get better results by XORing it with random numbers?

The general idea is like this:
Begin with a known state. Treat your data like an input stream and the encoded data like an output stream.
Do a switch/case depending on the 3 previous bits of your output and expand the output as follows:
Code:
- [001][101] 0 -> 00, 1 -> 01
- [100] 0 -> 01, 1 -> 1
If the input is evenly distributed among the 3 possible previous-bit cases, I believe you should get the following average number of output bits per input bit: ((1/3)*2)+((1/3)*2)+((1/3)*1.5) = 1.833...
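
In C the switch/case would look something like this (just a sketch to make the scheme concrete, not a tested implementation; it assumes a starting state of 001, which keeps the machine inside the three priors listed above):
Code:
/* `prior` holds the previous 3 output bits; starting from 001 only the
   priors 001, 101 and 100 ever occur. */
void encode(const unsigned char *in, long nbits, void (*emit)(int bit))
{
    unsigned prior = 1;                      /* assumed start state 001 */
    for (long i = 0; i < nbits; i++) {
        int b = (in[i >> 3] >> (7 - (i & 7))) & 1;
        const char *out;
        if (prior == 4)                      /* [100]: 0 -> 01, 1 -> 1  */
            out = b ? "1" : "01";
        else                                 /* [001]/[101]: 0->00, 1->01 */
            out = b ? "01" : "00";
        for (; *out; out++) {
            int o = *out - '0';
            emit(o);
            prior = ((prior << 1) | o) & 7;  /* keep last 3 output bits */
        }
    }
}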

This isn't an optimal solution as there are a number of bit pattern combinations that will never be generated and are therefore wasted.
I haven't coded up a test for this, but that could be fun.

As far as I can see this obeys the MFM rules, though you need to use the CPU to decode it. Any holes you can poke in this?
Old 04 February 2023, 10:00   #9
ross
Hi NorthWay, the only caveat I have is that it would be extremely slow
(I think you're unlikely to go below two (three?) rotations per track, using all the CPU time on a bare 68k).

I don't remember exactly the aforementioned encoding in some Psygnosis titles, but it seems very similar to me, and in fact the decoding takes place one bit at a time (unfortunately you cannot use a 3-bit packet approach because the window is variable in the case [100] 1 -> 1).

Another problem is the variable length of the tracks (solved by Psygnosis by inserting a tag per track and not using sectors).

As regards the balance of 0s and 1s, it is usually a self-solving problem if you use an associated data compressor in any case, which rebalances the distribution of bits (especially with entropy coding).

Well, have fun and report the results
Old 04 February 2023, 18:09   #10
paraj
One way of implementing a variation of your idea is the following encoding:

Code:
State Prior   Coding (0/1)  Next state
A     000     100 101       C/D
B     001      00  01       C/D
C     100       0   1       A/B
D     101      00  01       C/D
Assuming random input you should end up outputting on average (1/6)*3 + (1/6)*2 + (2/6)*1 + (2/6)*2 = 1.833 bits/input bit.
Haven't tried, but I imagine it would be possible to make a decoder that's not crazy slow by using some clever LUTs. To output 4 bits you need 9 input bits in the worst case (which can only happen when the last state was C).

One (useful?) thing is that while 010 is a valid prior value it can't happen with this encoding. If the scheme could be tweaked to make it happen it would be useful since you could output a single bit in that case.
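
A quick empirical sanity check of the 1.833 figure could look like this (untested throwaway sketch, the table above typed in directly):
Code:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* per state A,B,C,D: output length and next state for input 0/1 */
    static const int len[4][2] = {{3,3},{2,2},{1,1},{2,2}};
    static const int nxt[4][2] = {{2,3},{2,3},{0,1},{2,3}};
    long in_bits = 1000000, out_bits = 0;
    int state = 2;                           /* start in C (prior 100) */
    srand(1);
    for (long i = 0; i < in_bits; i++) {
        int b = rand() & 1;
        out_bits += len[state][b];
        state = nxt[state][b];
    }
    printf("%.4f output bits per input bit\n", (double)out_bits / in_bits);
    return 0;
}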
Old 04 February 2023, 18:30   #11
ross
Quote:
Originally Posted by paraj View Post
Haven't tried, but I imagine it would be possible to make a decoder that's not crazy slow by using some clever LUTs.
And here is the problem.
For all the RLL(1,3) encodings I've tried (a long time ago...) I haven't found any LUTs that weren't gigantic.
Perhaps the solution is to handle various sub-LUTs with smart (and unrolled) selector code.

No problem, on the other hand, for RLL(1,4) encodings, where a single LUT of quite limited size is enough (I must have the code somewhere...).
But it seems that 'new' drives hardly support it.
Old 04 February 2023, 18:52   #12
ross
Quote:
One (useful?) thing is that while 010 is a valid prior value it can't happen with this encoding
Ah! And remember that at least one combination should be left free for WORDSYNC.

Even if you sync to the 'disk index' you need a 'unique' word to start DMA.*
Not that it's mandatory, but let's at least try to avoid further slowdown; in theory a double buffer for the data could be used during decoding.


EDIT: * Thinking about it more, it is not actually fundamental, given that in any case we are basically forced to read an entire track.
It is enough that the word is preceded by 'neutral' values.
However, in addition to the DSKLEN start being exactly synchronized, the writing to disk must be as well.

Last edited by ross; 04 February 2023 at 19:10.
Old 05 February 2023, 01:04   #13
NorthWay
Just to flesh out my table in the same style with all the unused states:
Code:
State Prior Coding (0/1) Next State
A      001    00   01          D/B
B      101    00   01          D/B
C      010
D      100    01    1          A/A
E      000   100  101          D/B (Not used, but how it has to be done if used.)
The 4-state suggestion by paraj should be a little bit better though.

Another variable-length way to do MFM encoding could be to always assume the prior bits end in 1 and encode 3 symbols as 01 / 001 / 0001. That is on average the same as MFM, but much more of a hassle to handle, and has a worst case of 75(?)% density.
Old 05 February 2023, 01:41   #14
pandy71
Quote:
Originally Posted by ross View Post
Even if you sync to the 'disk index' you need a 'unique' word to start DMA.*
Not that it's mandatory, but let's at least try to avoid further slowdown; in theory a double buffer for the data could be used during decoding.


EDIT: * Thinking about it more, it is not actually fundamental, given that in any case we are basically forced to read an entire track.
It is enough that the word is preceded by 'neutral' values.
However, in addition to the DSKLEN start being exactly synchronized, the writing to disk must be as well.
What about GCR mode? I have tried multiple times to find some information about GCR mode, but no luck... How does Paula work in GCR mode?
Old 05 February 2023, 11:52   #15
ross
Quote:
Originally Posted by pandy71 View Post
What about GCR mode? I have tried multiple times to find some information about GCR mode, but no luck... How does Paula work in GCR mode?
Unfortunately it is not a viable way, due to the 'relative inefficiency' of the encoding.
Paula is absolutely not interested in the incoming flux, apart from a different synchronization modality: it only selects the data clock rate!


Verbose and boring mode on.

The limit is the magnetic surface: the 1 bits cannot be too close, because the changes of polarity in the flux would shift the cells and make recognition impossible.
To overcome the problem you can use sufficiently spaced cells.
For DD disks this distance is 4us: the 1s can be consecutive, the flux change is acceptable on the magnetic surface, and there is no problem on read.
However, there is a different problem, i.e. several consecutive 0s*: without a change of polarity, sync can be lost because of differences in motor speed (the DPLL fails to resynchronize to flux changes).
In this case the best coding is certainly GCR, where 'groups' of bits with even unlimited consecutive 1s can be used, but only a few zeros.

But at a certain point someone must have thought: why don't we make a coding where it is impossible to have a flux change <4us, but which also allows underclocks, i.e. flux changes even at 6us?
And from there comes MFM, where it is enough to double the clock and prevent two consecutive 1s.

So: GCR is limited to 4us cells, and therefore, no matter how you optimize the bit groups, this coding will never equal the 2:1 ratio of MFM.
Yes, so in fact for Paula there is no GCR or MFM writing; there is only writing at 2us or 4us with the related precompensation. Reading is different: there the start of data reading, i.e. the synchronization method, is important.

*A series of 0s, specifically 3 at 2us but even a couple more, generally doesn't create problems; the DPLL manages to resynchronize.
It seems that the new drives (modified HD and derivatives) are 'too smart' and tend to 'spontaneously correct' runs of more than 3 zeros, making RLL(1,4) encodings problematic.
The ones on my old A500 or A1200 had no big problems if the right write precompensation was used. I have no idea about Goteks or similar, but I think they are not a problem, and neither is WinUAE.

One last word: it is not known exactly how Paula's precompensation or DPLL works; one can only guess from indirect evidence.
Old 06 February 2023, 22:33   #16
pandy71
I'm fully aware of this; that's why there are many RLL codes (GCR is also an RLL code) to deal with such limitations - theoretically Paula should be able to deal with them at the cost of encode/decode complexity (CPU time).
But GCR is unknown to me - no details about implementations, and the HRM information suggests that GCR is usable only in single-density mode (i.e. 125/250 kbps, not 250/500 kbps).
All precompensation implementations I know advance or delay the time when a bit is written to disk, depending on the predicted flux shift - usually some shifter is used to do so.
There is some Paula DPLL patent that addresses some details of the inner workings, but there is still no information on how it works in GCR mode?
One way of dealing with such a problem is simply a variable number of sectors stored per track - outer tracks may simply carry more data and inner ones less. The Amiga never used such an approach, to my knowledge...

And floppy read/write "processors" are analog devices - so perhaps the issue is that they are tuned for HD, not DD, floppies - which is why they may misbehave with some odd sequences... (there is a bunch of analog filters in there, some high-pass to act as a differentiator - if the time constant is incorrect, then I assume it may be a source of problems). I bet that with proper selection of the filter characteristics this would not be an issue.

It seems there is some work on complex RLL encoding for floppies: https://c65gs.blogspot.com/2022/01/m...-encoding.html - curious whether Paula can be flexible enough to deal with RLL(2,7) or more advanced encodings.

Last edited by pandy71; 11 February 2023 at 22:04.
Old 12 February 2023, 13:24   #17
paraj
I've made a small proof of concept. Unfortunately I no longer have a working floppy drive. It works using a Gotek on my A1200, but that doesn't really prove anything.

It would be cool if somebody with a working drive could try it out and see if it's viable in this form. In the attached archive there's a program (write_eadf.exe) to write the (extended) ADF file altmfm.adf to a disk (run
write_eadf.exe 0 altmfm.adf
to write it to DF0). It may not work reliably under KS1.x due to its use of TD_RAWWRITE, though. The disk should be bootable and verifies all the encoded tracks.

Each track is encoded in full using the table from my earlier post, with two twists suggested by Ross:
1. Using an RLL(1,4) syncword ($5084) to avoid using index sync. Wordsync can't really be used if the encoded data can contain the sync word (see https://eab.abime.net/showthread.php?p=1437047).
2. XORing the data with a 16-bit value before encoding to increase channel capacity usage (see the sketch below). This acts as a simple filter to ensure a good distribution of the bits. I find the value by brute force for each track, but just trying a few (e.g. 256) random ones also gives a decent increase. Even for compressed/random data, capacity usage is >99.1% (== encoding cost: 1.828). For a full disk that's an extra ~3K.
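
The search itself might look roughly like this (hypothetical sketch, counting channel bits with the 4-state table from the earlier post; exhaustive here, but sampling a few hundred random words already gets most of the gain):
Code:
#include <stdint.h>
#include <stddef.h>

static const int enc_len[4][2] = {{3,3},{2,2},{1,1},{2,2}};  /* A B C D */
static const int enc_nxt[4][2] = {{2,3},{2,3},{0,1},{2,3}};

/* Return the XOR word that minimizes the encoded length of the track. */
uint16_t best_xor(const uint16_t *track, size_t nwords)
{
    long best = -1;
    uint16_t bestw = 0;
    for (uint32_t w = 0; w < 0x10000u; w++) {
        long bits = 0;
        int state = 2;                           /* start in C */
        for (size_t i = 0; i < nwords; i++) {
            uint16_t v = track[i] ^ (uint16_t)w;
            for (int k = 15; k >= 0; k--) {
                int b = (v >> k) & 1;
                bits += enc_len[state][b];
                state = enc_nxt[state][b];
            }
        }
        if (best < 0 || bits < best) { best = bits; bestw = (uint16_t)w; }
    }
    return bestw;
}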

Decoding is done 4 bits at a time and takes ~192 cycles/byte using 12K RAM (4K could be saved at a cost of 14 cycles/byte by shifting instead of using different tables for the upper and lower nibble).
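
For the curious, building such a decode LUT could look roughly like the sketch below (a simplified variant, not the exact 12K layout described above: one table of 4 states x 9-bit windows, with a made-up entry format):
Code:
#include <stdint.h>

/* Entry format (invented for this sketch): bits 0-3 = data nibble,
   bits 4-7 = channel bits consumed, bits 8-9 = next state;
   0xFFFF marks a window that cannot occur in a valid stream. */
enum { A, B, C, D };                 /* priors 000, 001, 100, 101 */
static uint16_t lut[4][512];

void build_lut(void)
{
    for (int s0 = 0; s0 < 4; s0++)
    for (int w = 0; w < 512; w++) {
        int s = s0, pos = 8, nib = 0, ok = 1, b = 0;
        for (int n = 0; n < 4 && ok; n++) {
            switch (s) {
            case A:                  /* encoder emitted 1 0 x */
                ok = pos >= 2 && ((w >> pos--) & 1) == 1
                              && ((w >> pos--) & 1) == 0;
                if (!ok) break;
                b = (w >> pos--) & 1; s = b ? D : C; break;
            case C:                  /* encoder emitted x */
                b = (w >> pos--) & 1; s = b ? B : A; break;
            default:                 /* B or D: encoder emitted 0 x */
                ok = pos >= 1 && ((w >> pos--) & 1) == 0;
                if (!ok) break;
                b = (w >> pos--) & 1; s = b ? D : C; break;
            }
            if (ok) nib = (nib << 1) | b;
        }
        lut[s0][w] = ok ? (uint16_t)(nib | ((8 - pos) << 4) | (s << 8))
                        : 0xFFFF;
    }
}
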
Attached Files
File Type: 7z altmfm.7z (35.9 KB, 38 views)
Old 12 February 2023, 15:21   #18
ross
Quote:
Originally Posted by pandy71 View Post
But GCR is unknown to me - no details about implementations, and the HRM information suggests that GCR is usable only in single-density mode (i.e. 125/250 kbps, not 250/500 kbps).
Yes, only in SD mode; I've explained it somewhat in the previous message.
But if it were just a magnetic surface issue, it would mean that with a 2us setting, a GCR encoding of some sort [RLL(0,x)], and an HD floppy, something should work.
But my feeling is that Paula's DPLL would not be able to handle the data and would sample completely wrong, as it is probably only set to handle flux changes every ~4us or more (otherwise it 'corrects' the clock and messes everything up).

About GCR implementations: as it is a 'software' LUT approach it is basically 'free', and the various companies implemented it at will.
So it's usually just a variation of an RLL(0,2), with a variable bit-group association.
And a different synchronization for the data start, which is discussed somewhere on the forum (of course I can't find it now...) and which WinUAE supports.
Quote:
All precompensation implementations I know advance or delay the time when a bit is written to disk, depending on the predicted flux shift - usually some shifter is used to do so.
There is some Paula DPLL patent that addresses some details of the inner workings, but there is still no information on how it works in GCR mode?
Yes, all this is substantially unknown, and in fact we should approach it scientifically and do some tests (of course someone may have already done it, but I haven't found any evidence on the net).
Quote:
One way of dealing with such a problem is simply a variable number of sectors stored per track - outer tracks may simply carry more data and inner ones less. The Amiga never used such an approach, to my knowledge...
In fact, it would be a possible approach, but only under one of the following conditions: being able to set the duration of the cells with greater precision, or having a variable-speed drive available.
Lacking these two conditions, I would say that the best (and indeed the only possible) approach is a fixed number of sectors per track.

Quote:
It seems there is some work on complex RLL encoding for floppies: https://c65gs.blogspot.com/2022/01/m...-encoding.html - curious whether Paula can be flexible enough to deal with RLL(2,7) or more advanced encodings.
No, impossible for Paula; this requires too many underclocks, which are not supported by the existing implementation (but it would already be technically feasible!).


Quote:
Originally Posted by paraj View Post
I've made a small proof of concept
Nice
Old 12 February 2023, 21:17   #19
Photon
Quote:
Originally Posted by NorthWay View Post
[...]
The Amiga supports MFM and GCR, but if you take the information in the HRM and extrapolate, you can experiment. GCR is a generic term, and you can invent your own group-coded recording.

I experimented with this back in the day, and I think I got it down to 1.675 bits per decoded bit.

The sync marks are indeed MFM-compatible.

You can also experiment with longer tracks, write across the gap and rewrite the gap data, and of course have one 'sector' per track.

On top of this you can adapt a compression algorithm to somehow use your built-in optimal encoding, and so waste less disk space.

But the train has mostly passed for these types of experiments. Now we appreciate it if a disk is preservable in a standard format, and any compression is done on the data before encoding.

But I remember having a cozy time. I think this was around the time when I learned to drink coffee.
Old 12 February 2023, 21:47   #20
paraj
The limit for arbitrary data is 1/0.5515 (~1.8132). Old thread: https://eab.abime.net/showthread.php?t=101466&

Thinking about it again, I think this scheme is better:
Code:
State Prior Coding (0/1) Next State
A     010    0   / 10    B / A
B     100    010 / 10    A / A
Just as efficient theoretically and RAM usage would be halved. Should be easy enough to adapt the existing code.
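
A quick property test for such a table (throwaway sketch): feed it random bits, assert the RLL(1,3) constraint on every output bit, and measure the ratio:
Code:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* state A = prior 010, B = prior 100, as in the table above */
    static const char *out[2][2] = {{"0", "10"}, {"010", "10"}};
    static const int   nxt[2][2] = {{1, 0}, {0, 0}};
    long in_bits = 1000000, out_bits = 0;
    int state = 0, ones = 0, zeros = 1;      /* prior 010 ends in one 0 */
    srand(1);
    for (long i = 0; i < in_bits; i++) {
        int b = rand() & 1;
        for (const char *p = out[state][b]; *p; p++) {
            if (*p == '1') { zeros = 0; ones++; }
            else           { ones = 0;  zeros++; }
            assert(ones <= 1 && zeros <= 3); /* RLL(1,3) must hold */
            out_bits++;
        }
        state = nxt[state][b];
    }
    printf("%.4f output bits per input bit\n", (double)out_bits / in_bits);
    return 0;
}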

The previous test case would have to work before it makes sense to optimize, so I'm still looking for volunteers. (Source is included, BTW.)
 

