01 April 2020, 02:01  #21  
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 661

Quote:
You build tables for "000","001","010","100","101" as the possible prequels (I only did it for "000" as that showed to be pessimal when I tried with 8 bits). And so the "001" table will start something like: 001 0001000100010001, 001 0001000100010010, 001 0001000100010100, 001 0001000100010101, ... And yes, the encoding becomes complex as it keeps switching between 5 different enumerations as the input changes. You could possibly simplify by using only as many variations as the most pessimal prequel gives you, but that is giving away a lot of potential space saving. I feel that this is very hard to express in writing without two-way human interaction... 

01 April 2020, 08:57  #22  
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Quote:
I do not think that the 'prequel' approach is good in this situation, because you have to deal with an undefined stream, and in a very short time your complexity will explode and you will surely be forced to: drop some bit 'grouping', use bigger and multiple LUTs, or resort to convoluted decoding. But in any case I can be wrong, and your ideas (or part of your ideas) can contribute, added to mine or someone else's, to something usable and valid. Thirty years ago the easiest and most sensible way I found to go beyond the limits was to change the coding and use an RLL(1,4) code. Perhaps we can now find a valid and profitable RLL(1,3) approach. 

01 April 2020, 09:35  #23  
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 804

Quote:
Yes, that's similar to what I had in mind. Is there anything to be gained from using the lower three bits as a selector? 000, 001, 010 and 100 all have different sets of valid codes that can follow them. Also, if four zeroes is technically invalid but usually works, perhaps exploit that for the sync word? (Perhaps I'm wrong, but my gut says one isolated run of four zeroes is less likely to cause problems than a stream in which such runs are commonplace.) 

01 April 2020, 09:48  #24  
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 804

Quote:
Am I missing something, or are "001" and "101" not equivalent as a prequel? The four cases that matter, unless I'm misunderstanding, are "...1", "..10", ".100" and "1000", yes? So "001" and "101" are both examples of the first case? 

01 April 2020, 11:23  #25  
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Quote:
Yes, it can be explored, but I'm worried about big and multiple LUTs.. With a single byte as a 'group' I think it is also counterproductive for the final stream bit count (or not significant). 

01 April 2020, 12:12  #26 
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 661


01 April 2020, 16:32  #27 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Third attempt, a sum of all the ideas.
Four LUTs for the possible last 3 LSBs, 13-bit "grouping", to keep in mind the 68k and low memory usage. Results: %000>$58, %001>$81, %010>$BD, %100>$94. So another failure, worse than the previous 'single bit chain' and 14 bits (%000 would need to be >$60..). The code is the usual, with small changes to accommodate the LUT size saving and LSB group management: Code:
	lea	rll(pc),a0
	lea	lut(pc),a1
	moveq	#%000,d5
	bsr.b	tab
	move.w	d0,(a1)+
	moveq	#%001,d5
	bsr.b	tab
	move.w	d0,(a1)+
	moveq	#%010,d5
	bsr.b	tab
	move.w	d0,(a1)+
	moveq	#%100,d5
	bsr.b	tab
	move.w	d0,(a1)+
	rts

; count (and store) the valid 16-bit words for the 3-bit prefix in d5
tab	ror.w	#3,d5		; prefix to the 3 MSBs
	move.w	#(1<<13)-1,d6	; 2^13 candidate groupings
	moveq	#0,d0		; valid-code counter
.st	moveq	#-2,d2		; 'two 1s in a row' counter
	moveq	#-4,d3		; 'four 0s in a row' counter
	move.l	d5,d1
	moveq	#16-1,d4	; scan all 16 bits
.ll	add.w	d1,d1		; next bit to carry
	bcc.b	.cc
	moveq	#-4,d3		; bit=1: reset the zero-run counter
	addq.w	#1,d2
	beq.b	.ex		; two adjacent 1s -> invalid
	bra.b	.pp
.cc	moveq	#-2,d2		; bit=0: reset the one-run counter
	addq.w	#1,d3
	beq.b	.ex		; four adjacent 0s -> invalid
.pp	dbf	d4,.ll
	move.w	d5,(a0)+	; store valid code
	addq.w	#1,d0
.ex	addq.w	#1,d5
	dbf	d6,.st
	rts

lut	ds.w	4
rll	ds.w	$22A 
01 April 2020, 17:37  #28 
Registered User
Join Date: Mar 2012
Location: Norfolk, UK
Posts: 804


02 April 2020, 06:36  #29 
Registered User
Join Date: Jan 2012
Location: USA
Posts: 320

From "An Introduction to Coding for Constrained Systems",
page 74, table 3.1, the channel capacity for certain RLL codes:

(0,1) is 0.6942
(1,2) is 0.4057
(1,3) is 0.5515
(1,4) is 0.6174
(1,5) is 0.6509
(1,6) is 0.6690
(1,inf) is 0.6942
(2,inf) is 0.5515

RLL(0,1) looks nice, but the bit rate would be half. Normalized capacity would then be 0.3471. The disk spins at 300 RPM and 2 microseconds per bit = 100,000 bits/track (more or less). The absolute best RLL(1,3) code would give 55,150 data bits/track = 6,893 bytes. These are limits. They don't tell us how to actually construct the most efficient codes. Impressive that a simple code like MFM is almost within 10% of the limit. Might be easier to build an RLL(1,4) code for bigger gain if the Amiga can handle it. 
02 April 2020, 11:13  #30 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Thanks mc6809e, this explains many things.
As can in fact be seen from my attempts, for RLL(1,3) it is necessary to arrive at groupings >14 bits to have some advantage over strict MFM, and in any case there would be a limited gain, and 68k difficulties. Different for RLL(1,4): not only is the gain bigger, but there are some subtle and fundamental differences. If you can reach a gain of 25% compared to MFM, you no longer incur the penalty of managing 'odd' numbers of bits, and you can build tables for multiples of powers of 2, or extract submultiples that align in a simpler way. Or you can use other techniques for inserting clock bits similar to the MFM ones, maybe every 12 bits or so.

Is this gain possible with RLL(1,4)? Yes. A few calculations. Channel capacity for MFM: 0.5; for RLL(1,4): 0.6174, so a difference of 23.48%. BUT usually you do not use a full track for recording. On the Amiga, where gaps can be very small (or null), you can accommodate 12 sectors with ease. So, even taking the minimum case, a good 9% more space is usable (which also contains data, sync, header and gap). So for RLL(1,4) you have, for one track, a grand total of 0.5*1.2348*1.09 = ~0.673, which is 34% more than a 'default Amiga' MFM track. Of course the RLL(1,4) channel capacity of 0.6174 is a theoretical maximum, but even if only 0.58 was reached after the encoding, it would be enough to achieve the goal: 0.5*(0.58/0.5)*1.09 > 0.5*1.25. So there is enough space to manage tracks with 14 sectors, with reliable SYNC, headers and sufficient gaps, and an encoding that lets you manage groupings of bits, or other simpler techniques for inserting the clock bits, without incurring large management penalties. There remains the problem of the reliability of RLL(1,4) on real systems that have been designed for physical timings more related to MFM..

EDIT: I forgot to comment on this: "Impressive that a simple code like MFM is almost within 10% of the limit." 
Yes, absolutely impressive, and it explains the difficulty of beating it in a simple and substantial way. Last edited by ross; 02 April 2020 at 11:38. 
02 April 2020, 17:31  #31 
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 510

Maybe not, because there is an unspoken assumption: channel capacity means that this is a constant-to-variable bit-rate coding, not a block coding as you may want to use. This means that the size of the tracks will vary, depending on the data you want to store. In the worst case, you have a 2-times expansion, namely if your input consists of 1's only, and there is nothing you can do to prevent that. It is a possible input.
For a random coin flip, I compute a 1.625 expansion of the input (i.e. the expectation value of the number of bits generated, given an iid input source, i.e. a fair coin). That is not quite as bad as MFM. For higher-capacity block codes, you'll necessarily have to switch to less redundant codes, but then you violate the protocol the floppy is based on, with all sorts of reliability issues. I believe this strict "a 1 must be followed by a 0, and no more than 3 0's in a row" is probably better looked at as a "frequency limited channel" with an upper frequency limit of 1/2 f, f = the clock frequency, and a lower frequency of 1/4 f (keeping in mind that a 1 is written as a magnetization change). That could possibly give rise to better modulation techniques to fit the channel constraints, but I'm afraid it would not work without additional hardware for decoding them. RLL(1,4) gives rise to lower frequencies (1/5 f), and it is likely that there is a bandpass filter in the (analog) part of the system that may already eliminate them to a degree that disallows reliable decoding, but who knows... 
02 April 2020, 18:26  #32 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Yes, it is unspecified what this distribution is.
But as a statistical approach I suppose this data is for a non-skewed source, one with a synthetically generated, equal 50% source bit distribution, otherwise it would make no sense. And when you make a block (de)coder you need to be consistent within the block itself, so when you have some data expansion, this expansion is respected in all cases. For MFM the 2:1 ratio is respected down to a 'grouping' of two bits. I tried to find a grouping that would allow me to improve the ratio while keeping the gain per block constant (my first try was bad for this reason: no way to separate blocks, even if the generated stream was a valid RLL(1,3) one). An RLL(1,4) code could maybe guarantee a ratio like 2:1.25, and it MUST be consistent for the 'grouping' you decide to use. 
02 April 2020, 23:47  #33 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

Ah! @Thomas, only now have I maybe fully understood what was wrong for you with the "channel capacity" values (and why you consider a variable code possible, with a 1.625 'fair' expansion), but correct me anyway if I'm wrong.
All these codes, even if atomized to a 'grouping', are streamed! So the expansion can be lower than the maximum of the worst input code (for MFM this is an equality). A different 'grouping' can express the same value(s) if it is a 'sequel' of different 'grouping's. So you can define the same 'token' with different bit codes! In my mind this has some (reversed) analogy to tabled Huffman codes or tabled ANS, where you read a fixed number of bits from a stream and use these for a LUT, which also indicates how many bits you can discard from the stream, which can usually be much lower than the maximum expansion. You can clearly notice it in my 14-bit codes (where I gain a little over MFM) or in NorthWay's text (where there is more gain for a 16-bit 'grouping'). So I imagine the "channel capacity" as a limit which the "grouping" tends to as a peak value, where decoding is guaranteed for a generic stream (like Shannon's theorem defines a punctual value for arithmetic compression, a value that represents asymptotic perfection). Surely there is a theorem that defines this "channel capacity". 
03 April 2020, 02:34  #34  
Registered User
Join Date: Jan 2012
Location: USA
Posts: 320

Quote:
The capacity is the number of bits encoded divided by the total length of the run. The table provides the upper bound on the capacity when the total length of the sequence of 0s and 1s allowed by the RLL code is unbounded. If you want to think in terms of a block code, you can imagine taking all possible sequences of 0s and 1s of a fixed length of, say, 1024, constrained by RLL(1,3) for instance, and matching each possible sequence to another sequence of output bits of length log2(number of legal sequences of 1024 bits). For MFM, decoding 1024 input bits always yields 512 output bits. The capacity is 0.5. The capacity numbers in the table suggest we can do better. Indeed, the fact that there are legal RLL sequences unmatched to output bit sequences (SYNC values, for example) tells us we can encode more bits using these extra sequences. How that might be done is another question. 

03 April 2020, 09:24  #35 
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 510

I understand that, but this is still a constant-to-variable bit rate code, due to the constraint to write a 0 after a 1 and a 1 after three 0's. Thus, if you constrain yourself to block codes, the capacity will be lower than what was computed there, and it will be lower for a particular source. For a source of all 1's, the output size will be twice the input size, no matter what; the expected size for an iid input source is the one I computed above.

03 April 2020, 14:01  #36  
Registered User
Join Date: Jan 2012
Location: USA
Posts: 320

Quote:
Create a list of every valid sequence of 0s and 1s of length k constrained by the RLL code. Take the first n codes in the list, where n is a power of 2, such that every binary number from 0 to n-1 is associated with a code. The problem of course is finding a compact way to represent the mapping. One would like to replace the table with some small amount of code that performs the same mapping. 

03 April 2020, 16:37  #37 
Registered User
Join Date: May 2013
Location: Grimstad / Norway
Posts: 661

I found an example of what I had in mind, though it is simplified compared to arithmetic compressors AFAIK - see https://en.wikipedia.org/wiki/Ascii85
85 (or whatever you end up with) would be your table length. With this method you're limited by the shortest table (the "000" prefix). 
03 April 2020, 16:54  #38 
Per aspera ad astra
Join Date: Mar 2017
Location: Crossing the Rubicon
Age: 50
Posts: 2,716

A new attempt to explain to Thomas; in the second part, some considerations and tests about RLL(1,4).
EDIT: removed the second part, some wrong assumptions.

@Thomas: No, it's not like you think, and my code is there to prove it. I tried in my posts above to explain it to you, but I'm absolutely not good at doing it in English (unfortunately), so I'll retry. I have no idea if you are familiar with entropy coders, ANS particularly, but I've mentioned analogies with compression because then the concept would seem clearer to you. The key lies in residues and states (like a finite-state machine). I take as an example my posts with the previous code, even if they do not lead to any result (in the end you will understand why they are unbalanced and unusable); they only serve to illustrate the concept. In my code I created two 2^14-byte LUTs, one reachable in $81 cells and the other in $BD cells, where I have a 7-bit content that is the real value of my original bits. The example with 13 bits can also be fine; you can notice good values for some combinations of the LSB index, although unfortunately there is one that does not cover the minimum bits, so the code is unusable. I switch from one table to another depending on the value of the LSB(s), and in any case I grab a valid 7-bit value; my MSB(s) chain bit guarantees a valid stream. I can build the stream without difficulty, but I have a fundamental usable capability: I can not only insert a valid 'grouping', but at the same time insert one that makes me 'transit' into the states I want, piling up extra bits for the original stream. So not only the actual value in the LUT, but also the position where I am (the state) gives me usable information (well, in this case I can also simply use the extra bit in the cell..). Unfortunately the bits are not enough to cover a transit to the extra states to accumulate extra bits, but if the 'grouping' had been wider I would have succeeded. This is why RLL(1,4) works much better: I have a much greater number of possible states to accumulate extra bits, with groupings of manageable size. 
So probably I do not even need big tables, because I can spread the clock/chain bit(s) directly in the stream. Last edited by ross; 03 April 2020 at 17:24. 
03 April 2020, 16:55  #39 
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 510

Yes, of course, but this limits the size of an output sequence to the size of an element in the LUT, and thus compromises the capacity of the channel. In other words, the capacity of the RLL(1,3) channel computed above is too high for the method you propose. It is an upper bound for arbitrarily long LUT elements.
This is "only" an implementation problem. 
03 April 2020, 21:13  #40 
Registered User
Join Date: Jan 2019
Location: Germany
Posts: 510

For the sake of entertainment, here is a small program that finds all valid codes in codewords of 1 to 28 bits in size, and prints the ratio of valid codes within the codespace:
Code:
#include <stdio.h>
#include <math.h>

int main(int argc,char **argv)
{
  int i,max;
  int word;
  int count;
  int bits;
  double quot;

  for(bits=1;bits<=28;bits++) {
    count=0;
    max=(1<<bits)-1;
    for(i=0;i<=max;i++) {
      if ((i & (i<<1))==0) {                /* no two adjacent 1s */
        word=(~i)&max;
        if ((word & (word<<1) & (word<<2) & (word<<3))==0) {
          count++;                          /* and no four adjacent 0s */
        }
      }
    }
    quot=log((double)(count))/log((double)(1<<bits));
    printf("%d bits, %d valid codes, ratio %f\n",bits,count,quot);
  }
  return 0;
}
Code:
1 bits, 2 valid codes, ratio 1.000000
2 bits, 3 valid codes, ratio 0.792481
3 bits, 5 valid codes, ratio 0.773976
4 bits, 7 valid codes, ratio 0.701839
5 bits, 10 valid codes, ratio 0.664386
6 bits, 15 valid codes, ratio 0.651148
7 bits, 22 valid codes, ratio 0.637062
8 bits, 32 valid codes, ratio 0.625000
9 bits, 47 valid codes, ratio 0.617177
10 bits, 69 valid codes, ratio 0.610852
11 bits, 101 valid codes, ratio 0.605292
12 bits, 148 valid codes, ratio 0.600788
13 bits, 217 valid codes, ratio 0.597042
14 bits, 318 valid codes, ratio 0.593777
15 bits, 466 valid codes, ratio 0.590946
16 bits, 683 valid codes, ratio 0.588484
17 bits, 1001 valid codes, ratio 0.586307
18 bits, 1467 valid codes, ratio 0.584370
19 bits, 2150 valid codes, ratio 0.582638
20 bits, 3151 valid codes, ratio 0.581080
21 bits, 4618 valid codes, ratio 0.579669
22 bits, 6768 valid codes, ratio 0.578387
23 bits, 9919 valid codes, ratio 0.577216
24 bits, 14537 valid codes, ratio 0.576143
25 bits, 21305 valid codes, ratio 0.575156
26 bits, 31224 valid codes, ratio 0.574245
27 bits, 45761 valid codes, ratio 0.573401
28 bits, 67066 valid codes, ratio 0.572618 