English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   project.WHDLoad (https://eab.abime.net/forumdisplay.php?f=63)
-   -   Loading WHDLoad games by decompressing on the fly (https://eab.abime.net/showthread.php?t=103144)

kipper2k 17 July 2020 16:28

Loading WHDLoad games by decompressing on the fly
 
Hi All, not sure if this is doable. I am thinking about some of the games in the WHDLoad games library that can have hundreds of small files. The obvious annoyance with this is the time it takes to copy these files onto a CF/HDD etc., even in WinUAE.

Question is, could WHDLoad have a user option to run a compressed game file, decompressing on the fly? I understand that this would probably need an expanded/upgraded Amiga, but part of the benefit would be reduced HDD space and file copying time.

ross 17 July 2020 16:50

Quote:

Originally Posted by kipper2k (Post 1414419)
Hi All, not sure if this is doable. I am thinking about some of the games in the WHDLoad games library that can have hundreds of small files. The obvious annoyance with this is the time it takes to copy these files onto a CF/HDD etc., even in WinUAE.

Question is, could WHDLoad have a user option to run a compressed game file, decompressing on the fly? I understand that this would probably need an expanded/upgraded Amiga, but part of the benefit would be reduced HDD space and file copying time.

Well, it's not the same thing, but you can unpack the archive in RAM: and launch the game from there ;)

Wepl 17 July 2020 17:18

Supporting some kind of archive directly is on my ToDo list.
But it probably won't happen anytime soon.

coldacid 18 July 2020 00:16

Maybe in version 19?

Wepl 20 July 2020 13:27

Maybe someone wants to contribute?
I need an archive format which:
- preserves all Amiga filesystem metadata (protection bits, date, file comment)
- stores directories and normal files
- allows files to be stored either uncompressed or compressed
- allows random access to the stored files, so chunks should be used if compressed (like in XPK)
- I think a stream format like lha is best suited (a sketch of a possible entry layout follows below)
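
For illustration, a minimal sketch in C of what one entry in such a format might look like, assuming fixed-size compression chunks addressed through a per-chunk offset table; every name and field choice here is hypothetical, not an existing WHDLoad or XPK structure.

Code:

/* Hypothetical on-disk layout for one archive entry meeting the list
 * above: full Amiga metadata, optional compression, and random access
 * via fixed-size chunks addressed through an offset table. */
#include <stdint.h>

#define ENTRY_DIR        0x01   /* entry is a directory                  */
#define ENTRY_COMPRESSED 0x02   /* chunk data is compressed              */

struct ArchiveEntry {
    uint32_t protection;        /* AmigaDOS protection bits (hsparwed)   */
    uint32_t days, mins, ticks; /* AmigaDOS DateStamp                    */
    uint32_t flags;             /* ENTRY_* flags                         */
    uint32_t uncompressedSize;
    uint32_t chunkSize;         /* e.g. 16 KiB; ignored if uncompressed  */
    uint32_t chunkCount;
    uint8_t  nameLength;        /* name and comment follow as counted    */
    uint8_t  commentLength;     /* strings, then the chunk offset table  */
    /* char     name[nameLength];                                        */
    /* char     comment[commentLength];                                  */
    /* uint32_t chunkOffset[chunkCount];  offsets into the data area     */
};

Seeking to an arbitrary byte then means picking chunkOffset[offset / chunkSize] and decompressing only that chunk.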

daxb 20 July 2020 14:27

Doesn't LHA or LZX work? I ask because emulators have LHA support, but I don't know how they handle that. I guess the problem is write access for changing files (highscores, savestates, icons, ...)? If the user wants to change tooltypes, how should that work?

Toni Wilen 20 July 2020 14:58

lha is probably the best option. It supports Amiga flags and comments, and is open source (the original Unix lha).

lzx also works, but there is only reverse-engineered source available and lzx (if I remember correctly) can't be seeked freely, at least in some compression modes.

Wepl 20 July 2020 15:08

Existing archivers like lha/lzx don't allow random access and are also too slow.
The archives will be read-only for WHDLoad. They will act like an additional read-only data directory. Writing will only occur to normal files if there is also a real data directory (e.g. SavePath).

Toni Wilen 20 July 2020 15:26

Store without compression or use a simpler compression method? If the main point is to reduce the number of tiny files.

I thought you meant seeking to any file, which is possible with lha (but not necessarily with lzx or any other "solid" method without decompressing all previous files in the archive). AFAIK no normal archiver supports random access seeking to any file position without (at least partially) decompressing the file first.

In my opinion some kind of caching (decompress the file when needed on the fly and then keep the decompressed data in memory) would fix the slowdown problem. Files are generally tiny (unless the slave uses a disk image), so decompression is always fast :)

Wepl 20 July 2020 16:37

Starting without compression is probably the best. But without compression it will be a step back for people who currently use XPK.

Seeking within files is probably also not something an archiver is designed for. But it should also work without preloaded files in low-memory configs. If the files are compressed in not-too-large chunks it should be possible. A caching buffer of one chunk should be sufficient to avoid a slowdown when many small IOs are performed.
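
A minimal sketch of that one-chunk cache, assuming chunked compression as above: many small reads inside the same chunk then cost a single decompression and only one chunk-sized buffer. The CompressedFile type, decompress_chunk() and file_size() are hypothetical placeholders, as is the chunk size.

Code:

#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define CHUNK_SIZE 16384u                     /* assumed chunk size */

typedef struct CompressedFile CompressedFile; /* opaque: archive entry, chunk table, ... */

/* assumed helpers: inflate chunk 'index' of 'cf' into 'dst' and return bytes produced */
extern uint32_t decompress_chunk(CompressedFile *cf, uint32_t index, uint8_t *dst);
extern uint32_t file_size(CompressedFile *cf);

static uint8_t  cacheData[CHUNK_SIZE];
static uint32_t cacheChunk = 0xffffffffu;     /* index of cached chunk, none yet */
static uint32_t cacheValid = 0;               /* bytes valid in cacheData        */
static CompressedFile *cacheOwner = NULL;

/* Read 'len' bytes starting at 'offset'; returns the number of bytes copied. */
uint32_t cached_read(CompressedFile *cf, uint32_t offset, uint8_t *buf, uint32_t len)
{
    uint32_t copied = 0;
    while (copied < len && offset < file_size(cf)) {
        uint32_t chunk = offset / CHUNK_SIZE;
        uint32_t inOff = offset % CHUNK_SIZE;
        if (cf != cacheOwner || chunk != cacheChunk) {   /* cache miss: one decompression */
            cacheValid = decompress_chunk(cf, chunk, cacheData);
            cacheChunk = chunk;
            cacheOwner = cf;
        }
        if (inOff >= cacheValid) break;                  /* past the end of the data */
        uint32_t n = cacheValid - inOff;
        if (n > len - copied) n = len - copied;
        memcpy(buf + copied, cacheData + inOff, n);
        copied += n;
        offset += n;
    }
    return copied;
}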

temisu 06 September 2020 20:54

I know nobody asked my opinion (and this is an old thread) but I've been experimenting with the idea of having a filesystem-like view for archives (lha, lzx, zip) and I've learned a thing or two in the process. I have not yet shared any code for that since it is maybe not in the best of shape, although I have open decompressor algorithms made for lzx, most of lha and some of zip. (This is not in the Amiga context itself, but the same learnings apply.)

My view on this topic would be simply "please no, no, no, not another container format". There are enough already. Features like this always live or die by the ease of the tooling around them.
Let's take a look at the options:

* tar.(gz,bz2,xz...) - compressed together with the container. Completely unsuitable.
* lzx - can compress multiple files into a stream, no random access per file.
* lha - linear structure, not ideal for random access.
* zip - has a central directory; files are randomly accessible.
* rar - proprietary; free solutions can only unpack (legally, that is).
* 7z - open format, but the current tooling does not look classic-friendly.
* adf + xpk files - well, it is an option. Not necessarily a good one though...

So frankly, we are left with zip. Fortunately zip has support for storing Amiga protection bits (actually in 2 different ways) and file comments, and supports files being stored either compressed or uncompressed, although again tool support varies for this feature.

The best part of zip (assuming we are not interested in fancier features like zip64, multi-file archives, encryption etc.) is that it is a rather low-overhead format. I can see that the central directory probably needs to be read into memory, but the rest of the file is accessed on demand only. Also, if we stay with deflate compression only, there are hundreds of implementations - I'm pretty sure there is a speedy m68k variant as well...
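
As a sketch of that split, reading the central directory only requires locating the End Of Central Directory record at the tail of the archive; the signature and field offsets below follow the standard zip layout, while the helper names are placeholders and zip64 is ignored.

Code:

/* Locate the zip End Of Central Directory record (signature PK\5\6,
 * 22 bytes plus an optional archive comment) so the central directory
 * can be loaded into RAM while file data stays on disk. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return p[0] | p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24; }

int find_central_directory(FILE *f, uint32_t *cdOffset, uint32_t *cdSize, uint16_t *cdCount)
{
    enum { EOCD_LEN = 22, MAX_COMMENT = 65535 };
    fseek(f, 0, SEEK_END);
    long fileLen = ftell(f);
    long want = fileLen < EOCD_LEN + MAX_COMMENT ? fileLen : EOCD_LEN + MAX_COMMENT;
    uint8_t *tail = malloc((size_t)want);
    if (!tail) return -1;
    fseek(f, fileLen - want, SEEK_SET);
    if (fread(tail, 1, (size_t)want, f) != (size_t)want) { free(tail); return -1; }

    int result = -1;
    for (long i = want - EOCD_LEN; i >= 0; i--) {         /* scan backwards for PK\5\6 */
        if (rd32(tail + i) == 0x06054b50) {
            *cdCount  = rd16(tail + i + 10);              /* total number of entries     */
            *cdSize   = rd32(tail + i + 12);              /* size of central directory   */
            *cdOffset = rd32(tail + i + 16);              /* offset of central directory */
            result = 0;
            break;
        }
    }
    free(tail);
    return result;
}

With cdOffset and cdSize known, a single read delivers all names, sizes, attributes and comments without touching any compressed data.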

Then comes the tricky part of random access inside files. Even many xpk compressors do not implement this properly. Also, nothing nerfs compression performance like splitting the compression blocks into smaller pieces. However, not many people know that deflate can also split the bitstream into smaller chunks (some rare compressors use this feature, but not many). So it is easy to emulate a block-size structure for deflate. The only problem is where to store this information. Here the zlib and zip container formats come in handy. We can append the offsets in some sane fashion to the end of the bitstream and still be compliant: other zip-file extractors do not see anything special, but an implementation that knows about the method can speed up seek() quite considerably. The best part is that this does not have to be static - some files could for example be made seekable by track whereas others could be by sector. It only depends on how the tooling is made...
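
A sketch of that split-deflate idea using zlib: raw deflate (windowBits -15, as stored inside zip) with Z_FULL_FLUSH at every chunk boundary resets the dictionary at a byte-aligned point, and recording total_out at each boundary yields the offset table that would be appended to the stream. The chunk size, the helper name and the assumption that the output buffer is large enough are simplifications.

Code:

/* Produce a deflate stream that is seekable per chunk: Z_FULL_FLUSH at
 * each boundary lets inflate restart at any recorded offset.  Assumes
 * outCap is large enough for the whole compressed result (sketch). */
#include <zlib.h>
#include <string.h>
#include <stdint.h>

#define CHUNK 5632  /* e.g. one floppy track, as in the comparison later in the thread */

/* Compress in[0..inLen) into out; offsets[] receives one entry per chunk.
 * Returns the compressed size, or -1 on error. */
long seekable_deflate(const uint8_t *in, uint32_t inLen,
                      uint8_t *out, uint32_t outCap, uint32_t *offsets)
{
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    if (deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED,
                     -15 /* raw deflate, as used in zip */, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;

    zs.next_out = out;
    zs.avail_out = outCap;
    uint32_t pos = 0, chunkIndex = 0;

    while (pos < inLen) {
        uint32_t n = (inLen - pos < CHUNK) ? inLen - pos : CHUNK;
        offsets[chunkIndex++] = zs.total_out;   /* this chunk's byte offset in the stream */
        zs.next_in = (Bytef *)(in + pos);
        zs.avail_in = n;
        pos += n;
        /* full flush at the boundary so decompression can restart here;
           finish the stream after the last chunk */
        int flush = (pos == inLen) ? Z_FINISH : Z_FULL_FLUSH;
        if (deflate(&zs, flush) == Z_STREAM_ERROR) { deflateEnd(&zs); return -1; }
    }
    long total = (long)zs.total_out;
    deflateEnd(&zs);
    return total;
}

Decompression can then restart with inflateInit2(..., -15) and feed the compressed data beginning at offsets[wantedOffset / CHUNK].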

This brings me to the last point. This would be a really fun project for me and I could make a reference implementation (C) for using the zip archives, plus tooling (C++) for creating "enhanced" zip files, under an open license (like BSD), assuming this is something that would be picked up by WHDLoad :) I don't even know what the technical requirements for the code are. I'm too old for assembly, even though I have written "enough" of it earlier...

jotd 06 September 2020 21:22

It would make sense for big games with a lot of files. Currently, starting WHDLoad on such games (even without PRELOAD) takes forever. I don't really know why WHDLoad scans the contents of the "data" directory, since the files aren't loaded.

rare_j 07 September 2020 19:29

Apparently, zip is missing support for a particular Amiga file attribute, which breaks some WHDLoad game files that have been compressed and then decompressed with it.

This is why the preferred compressor for WHDLoad game files is lha.

Sorry that I cannot remember the details of the problem with zip. It was identified in a thread on eab recently which I can't find. But this is relevant because you'd need a customised version of zip to fix this for WHDLoad compression, and then you'd effectively have another container format.

Radertified 07 September 2020 19:48

Quote:

Originally Posted by rare_j (Post 1426329)
It was identified in a thread on eab recently

You're probably referring to this one where StingRay schooled me: http://eab.abime.net/showthread.php?t=31450

Zip cannot store Amiga filesystem specific features, such as comments, so it's out of the question.

rare_j 07 September 2020 22:14

Yes it was that thread, but now that I look again I don't think StingRay was talking about zip.
He was talking about extracting archives onto a non-AmigaDOS filesystem.
Additionally, temisu says above that zip on the Amiga does support comments.

However, there is a problem with Info-ZIP on the Amiga. I have reproduced the issue myself. There is a game where, if you zip up all its contents on the Amiga and then unzip them again, the game doesn't work. I apologise that I don't remember the game, and (as far as I know) it was never identified what the issue is.
Perhaps Retroplay remembers which game it is.

temisu 07 September 2020 22:19

Quote:

Originally Posted by Radertified (Post 1426331)
Zip cannot store Amiga filesystem specific features, such as comments, so it's out of the question.

Hi, I did not come here to argue (or to start a flame war) about archivers. I think we all know the state of affairs with lha / zip on the Amiga, the problems they have, and why people prefer lha...

The following we know for certain about zip:
  • Amiga attributes are easily broken if the file is ever decompressed on any other operating system
  • To include file comments in zip files you need to add a special flag both when compressing and decompressing
  • There can be character set conversions that break the filenames

However, if we have to choose between a new container format and fixing bugs in an existing one (and extending it in a backwards-compatible fashion), fixing the existing one would in my mind be the better option. There is nothing wrong with the zip file format itself; it is well documented and has a working extension system. It is always about the implementation...

I just wanted to point out that the zip file format is better suited for random access to files. Obviously there is a risk that if zip is used as a container, there will be people abusing it. But it only takes a simple check of the creator operating system to make sure we have a proper archive. So there is always a trade-off.
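
That check is indeed simple: in every central directory file header the high byte of the "version made by" field names the host system, and the zip specification assigns 1 to Amiga. A small sketch, with the helper name being a placeholder:

Code:

/* Check whether a central directory file header was written on an Amiga:
 * 'cd' points at the header (signature PK\1\2) in an in-memory copy of
 * the central directory. */
#include <stdint.h>

#define ZIP_HOST_AMIGA 1   /* host system id in the zip specification */

static int entry_is_amiga(const uint8_t *cd)
{
    if (cd[0] != 'P' || cd[1] != 'K' || cd[2] != 1 || cd[3] != 2)
        return 0;                       /* not a central file header */
    /* "version made by": low byte = spec version, high byte = host OS */
    return cd[5] == ZIP_HOST_AMIGA;
}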

Now, in order to steer the discussion to the more technical side, I made some comparisons. I tested with a known adf file (3DDemo1). Let's compare the effects of making the file seekable with track accuracy (5632 bytes):
  • ADF - 901120 bytes
  • ADZ - 597108 bytes
  • DMS - 617348 bytes
  • Split deflate - 628245 bytes

So, obviously there is a price to pay if you make the compressed file seekable. But to me it looks like there would be a benefit, as long as it is something that is dynamically tunable per file.
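
(For reference, with the numbers above the track-seekable split deflate stream is about 31 KB, or roughly 5%, larger than the plain ADZ, while still about 30% smaller than the raw ADF.)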

Radertified 07 September 2020 22:39

Quote:

Originally Posted by temisu (Post 1426359)
Hi, I did not come here to argue (or to start flame war) about archivers.

I'm definitely not arguing about it. I'm sorry if my reply came off as hostile.

If zip works, fantastic. Let's go with that :)

rare_j 07 September 2020 22:41

Quote:

Originally Posted by temisu (Post 1426359)
To include file comments in zip files you need to add a special flag both when compressing and decompressing

It's possible that this feature has been causing trouble in the past.
Is there a way to tell if an archive has been generated preserving file comments?

temisu 07 September 2020 22:57

Quote:

Originally Posted by Radertified (Post 1426362)
I'm definitely not arguing about it. I'm sorry if my reply came off as hostile.

If zip works, fantastic. Let's go with that :)

No worries. Sometimes, in order to make progress, we need to be able to talk about crazy ideas openly.

I do not know whether using zip is a good idea. But I'm sure we can think and talk about it :D

Quote:

Originally Posted by rare_j (Post 1426364)
It's possible that this feature has been causing trouble in the past.
Is there a way to tell if an archive has been generated preserving file comments?

Well, if there is content in the comment attribute you can be certain that the archive was created properly. If not, you have no safe way of knowing whether there actually were no comments or whether they were left out by accident.

On the decompression side, they will simply get dropped if the special flag is not passed to unzip.

Looking at an implementation I wrote earlier, I have the following (horrific) logic for reading zip files (isAmiga comes from the OS-creator field of the zip file):

Code:

std::string fileNote;
if (isAmiga && commentLength>1)
{
        // it is not a comment, it is a filenote
        size_t commentOffset=dirEntOffset+centralFileHeader.size()+nameLength+extraLength;
        // the terminating null is included in the file, but we do not have to store it...
        const char *comment=reinterpret_cast<const char*>(centralDirectory->data())+commentOffset;
        size_t noteLength=std::min(size_t(commentLength),size_t(79U))-1;
        fileNote=UTF8::convertFromISO88591(std::string(comment,noteLength),true);
}


Wepl 08 September 2020 00:55

Quote:

Originally Posted by temisu (Post 1426160)
My view on this topic would be simply "please no, no, no, not another container format". There are enough already. Features like this always live or die by the ease of the tooling around them.

I fully agree with this. If there are formats which match the requirements it will be better to use existing ones.
Quote:

Originally Posted by temisu (Post 1426160)
Let's take a look at the options:
* lha - linear structure, not ideal for random access.

Yes, if there are many files and the archive is not preloaded, accessing a file may require many seeks and reads.
An advantage would be that file comments etc. are widely used with that format. The archive format itself, with its different header and compression formats, is not very nice I think.
Quote:

Originally Posted by temisu (Post 1426160)
* zip - has a central directory; files are randomly accessible.

So frankly, we are left with zip. Fortunately zip has support for storing Amiga protection bits (actually in 2 different ways) and file comments, and supports files being stored either compressed or uncompressed, although again tool support varies for this feature.

Which zip implementation supports protection bits and file comments?
I did not know about that.
Quote:

Originally Posted by temisu (Post 1426160)
The best part of zip (assuming we are not interested in fancier features like zip64, multi-file archives, encryption etc.) is that it is a rather low-overhead format. I can see that the central directory probably needs to be read into memory, but the rest of the file is accessed on demand only.

If the directory is held in memory then this could also be done for lha. Currently this is not done by WHDLoad (except for PreLoad and Examine).
Quote:

Originally Posted by temisu (Post 1426160)
Also, if we stay with deflate compression only, there are hundreds of implementations - I'm pretty sure there is a speedy m68k variant as well...

Speed is important, but as long as uncompressed storage is supported the user has the choice. Archivers are normally optimized to save space, not for fast decompression. So this is one reason why I'm unsure whether reusing existing archivers is a good idea.
Quote:

Originally Posted by temisu (Post 1426160)
Then comes the tricky part of random access inside files. Even many xpk compressors do not implement this properly. Also, nothing nerfs compression performance like splitting the compression blocks into smaller pieces. However, not many people know that deflate can also split the bitstream into smaller chunks (some rare compressors use this feature, but not many). So it is easy to emulate a block-size structure for deflate. The only problem is where to store this information. Here the zlib and zip container formats come in handy. We can append the offsets in some sane fashion to the end of the bitstream and still be compliant: other zip-file extractors do not see anything special, but an implementation that knows about the method can speed up seek() quite considerably. The best part is that this does not have to be static - some files could for example be made seekable by track whereas others could be by sector. It only depends on how the tooling is made...

Great if this is possible ;)
Quote:

Originally Posted by temisu (Post 1426160)
This brings me to the last point. This would be a really fun project for me and I could make a reference implementation (C) for using the zip archives, plus tooling (C++) for creating "enhanced" zip files, under an open license (like BSD), assuming this is something that would be picked up by WHDLoad :) I don't even know what the technical requirements for the code are. I'm too old for assembly, even though I have written "enough" of it earlier...

I really would like to add this. C would be fine; there is already a small part compiled using vbcc. I could probably also define a small ABI which would allow having this as a separate BLOB.
Basically needed are (a header sketch follows the list):
- a function which iterates over all file names (PreLoad)
- a function which returns the size of a file (GetFileSize)
- a function which reads part of a file (or the complete file)
- a function which iterates over all file names in a sub directory (ListFiles)
- a function which iterates over all files/dirs and delivers all filesystem metadata (Examine)
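
Purely as an illustration of that list, a hypothetical C header for such a BLOB; none of these type or function names are an existing WHDLoad interface, they only mirror the five operations above.

Code:

/* Hypothetical interface for the archive-access BLOB described above.
 * Nothing here is an existing WHDLoad API; the names only mirror the
 * five operations listed. */
#include <stdint.h>

typedef struct ArchiveHandle ArchiveHandle;   /* opaque, owned by the BLOB */

/* Per-entry metadata, roughly what Examine/ExNext need. */
struct ArchiveEntryInfo {
    const char *name;
    const char *comment;
    uint32_t    protection;         /* AmigaDOS protection bits */
    uint32_t    days, mins, ticks;  /* AmigaDOS DateStamp       */
    uint32_t    size;
    int         isDirectory;
};

/* Callback-style iteration keeps memory use low on small configs. */
typedef int (*EntryCallback)(const struct ArchiveEntryInfo *info, void *userData);

/* PreLoad: iterate over all file names in the archive. */
int arc_ForEachFile(ArchiveHandle *arc, EntryCallback cb, void *userData);

/* GetFileSize: return the uncompressed size of a file, or -1 if not found. */
int32_t arc_GetFileSize(ArchiveHandle *arc, const char *path);

/* Read part of a file (or the complete file) into buf; returns bytes read. */
int32_t arc_ReadFile(ArchiveHandle *arc, const char *path,
                     uint32_t offset, void *buf, uint32_t length);

/* ListFiles: iterate over all file names in one sub directory. */
int arc_ForEachFileInDir(ArchiveHandle *arc, const char *dirPath,
                         EntryCallback cb, void *userData);

/* Examine: iterate over all files/dirs delivering full filesystem metadata. */
int arc_Examine(ArchiveHandle *arc, EntryCallback cb, void *userData);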

Quote:

Originally Posted by jotd (Post 1426171)
It would make sense for big games with a lot of files. Currently, starting WHDLoad on such games (even without PRELOAD) takes forever. I don't really know why WHDLoad scans the contents of the "data" directory, since the files aren't loaded.

Only if the Slave has the Examine flag set does WHDLoad collect all filesystem metadata at startup. This data is needed for the Examine/ExNext calls and is collected up front to avoid many OS switches. So this happens for most kickemu games.

