English Amiga Board

English Amiga Board (https://eab.abime.net/index.php)
-   project.WHDLoad (https://eab.abime.net/forumdisplay.php?f=63)
-   -   Loading WHDLoad games by decompressing on the fly (https://eab.abime.net/showthread.php?t=103144)

kipper2k 17 July 2020 16:28

Loading WHDLoad games by decompressing on the fly
 
Hi All, not sure if this is doable. I am thinking about some of the games in the WHDLoad games library that can have hundreds of small files. The obvious annoyance with this is the time it takes to copy these files onto a CF/HDD etc., even in WinUAE.

Question is, could WHDLoad have a user option to run a compressed game file, decompressing on the fly? I understand that this would probably need an expanded/upgraded Amiga, but part of the benefit would be reduced HDD space and file copying time.

ross 17 July 2020 16:50

Quote:

Originally Posted by kipper2k (Post 1414419)
Hi All, not sure if this is doable. I am thinking about some of the games in the WHDLoad games library that can have hundreds of small files. The obvious annoyance with this is the time it takes to copy these files onto a CF/HDD etc., even in WinUAE.

Question is, could WHDLoad have a user option to run a compressed game file, decompressing on the fly? I understand that this would probably need an expanded/upgraded Amiga, but part of the benefit would be reduced HDD space and file copying time.

Well, it's not the same thing, but you can unpack the archive in RAM: and launch the game from there ;)

Wepl 17 July 2020 17:18

Supporting some kind of archive directly is on my ToDo list.
But it probably won't happen anytime soon.

coldacid 18 July 2020 00:16

Maybe in version 19?

Wepl 20 July 2020 13:27

Maybe someone wants to contribute?
I need an archive format which:
- preserves all Amiga filesystem metadata (protection bits, date, file comment)
- stores directories and normal files
- allows files to be stored either uncompressed or compressed
- allows random access to the stored files, so chunks should be used if compressed (like in XPK)
- I think a stream format like lha is best suited (a sketch of a possible entry layout follows below)
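
For illustration, a minimal sketch in C of what one entry in such a format might look like, assuming fixed-size compression chunks addressed through a per-chunk offset table; every name and field choice here is hypothetical, not an existing WHDLoad or XPK structure.

Code:

/* Hypothetical on-disk layout for one archive entry meeting the list
 * above: full Amiga metadata, optional compression, and random access
 * via fixed-size chunks addressed through an offset table. */
#include <stdint.h>

#define ENTRY_DIR        0x01   /* entry is a directory                  */
#define ENTRY_COMPRESSED 0x02   /* chunk data is compressed              */

struct ArchiveEntry {
    uint32_t protection;        /* AmigaDOS protection bits (hsparwed)   */
    uint32_t days, mins, ticks; /* AmigaDOS DateStamp                    */
    uint32_t flags;             /* ENTRY_* flags                         */
    uint32_t uncompressedSize;
    uint32_t chunkSize;         /* e.g. 16 KiB; ignored if uncompressed  */
    uint32_t chunkCount;
    uint8_t  nameLength;        /* name and comment follow as counted    */
    uint8_t  commentLength;     /* strings, then the chunk offset table  */
    /* char     name[nameLength];                                        */
    /* char     comment[commentLength];                                  */
    /* uint32_t chunkOffset[chunkCount];  offsets into the data area     */
};

Seeking to an arbitrary byte then means picking chunkOffset[offset / chunkSize] and decompressing only that chunk.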

daxb 20 July 2020 14:27

Doesn't LHA or LZX work? I ask because emulators have LHA support, but I don't know how they handle that. I guess the problem is write access for changing files (highscores, savestates, icons, ...)? If the user wants to change tooltypes, how should that work?

Toni Wilen 20 July 2020 14:58

lha is probably the best option. It supports Amiga flags and comments, and is open source (the original Unix lha).

lzx also works, but there is only reverse-engineered source available and lzx (if I remember correctly) can't be seeked freely, at least in some compression modes.

Wepl 20 July 2020 15:08

Existing archivers like lha/lzx don't allow random access and are also too slow.
The archives will be read-only for WHDLoad. They will act like an additional read-only data directory. Writing will only occur to normal files if there is also a real data directory (e.g. SavePath).

Toni Wilen 20 July 2020 15:26

Store without compression or use a simpler compression method? If the main point is to reduce the number of tiny files.

I thought you meant seeking to any file, which is possible with lha (but not necessarily with lzx or any other "solid" method without decompressing all previous files in the archive). AFAIK no normal archiver supports random access seeking to any file position without (at least partially) decompressing the file first.

In my opinion some kind of caching (decompress the file when needed on the fly and then keep the decompressed data in memory) would fix the slowdown problem. Files are generally tiny (unless the slave uses a disk image), so decompression is always fast :)

Wepl 20 July 2020 16:37

Starting without compression is probably the best. But without compression it will be a step back for people who currently use XPK.

Seeking within files is probably also not something an archiver is designed for. But it should also work without preloaded files in low-memory configs. If the files are compressed in not-too-large chunks it should be possible. A caching buffer of one chunk should be sufficient to avoid a slowdown when many small IOs are performed.
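
A minimal sketch of that one-chunk cache, assuming chunked compression as above: many small reads inside the same chunk then cost a single decompression and only one chunk-sized buffer. The CompressedFile type, decompress_chunk() and file_size() are hypothetical placeholders, as is the chunk size.

Code:

#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define CHUNK_SIZE 16384u                     /* assumed chunk size */

typedef struct CompressedFile CompressedFile; /* opaque: archive entry, chunk table, ... */

/* assumed helpers: inflate chunk 'index' of 'cf' into 'dst' and return bytes produced */
extern uint32_t decompress_chunk(CompressedFile *cf, uint32_t index, uint8_t *dst);
extern uint32_t file_size(CompressedFile *cf);

static uint8_t  cacheData[CHUNK_SIZE];
static uint32_t cacheChunk = 0xffffffffu;     /* index of cached chunk, none yet */
static uint32_t cacheValid = 0;               /* bytes valid in cacheData        */
static CompressedFile *cacheOwner = NULL;

/* Read 'len' bytes starting at 'offset'; returns the number of bytes copied. */
uint32_t cached_read(CompressedFile *cf, uint32_t offset, uint8_t *buf, uint32_t len)
{
    uint32_t copied = 0;
    while (copied < len && offset < file_size(cf)) {
        uint32_t chunk = offset / CHUNK_SIZE;
        uint32_t inOff = offset % CHUNK_SIZE;
        if (cf != cacheOwner || chunk != cacheChunk) {   /* cache miss: one decompression */
            cacheValid = decompress_chunk(cf, chunk, cacheData);
            cacheChunk = chunk;
            cacheOwner = cf;
        }
        if (inOff >= cacheValid) break;                  /* past the end of the data */
        uint32_t n = cacheValid - inOff;
        if (n > len - copied) n = len - copied;
        memcpy(buf + copied, cacheData + inOff, n);
        copied += n;
        offset += n;
    }
    return copied;
}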

temisu 06 September 2020 20:54

I know nobody asked my opinion (and this is an old thread) but I've been experimenting with the idea of having a filesystem-like view for archives (lha, lzx, zip) and I've learned a thing or two in the process. I have not yet shared any code for that since it is maybe not in the best of shape, although I have open decompressor algorithms made for lzx, most of lha and some of zip. (This is not in the Amiga context itself, but the same learnings apply.)

My view on this topic would be simply "please no, no, no, not another container format". There are enough already. Features like this always live or die by the ease of the tooling around them.
Let's take a look at the options:

* tar.(gz,bz2,xz...) - compressed together with the container. Completely unsuitable.
* lzx - can compress multiple files into a stream, no random access per file.
* lha - linear structure, not ideal for random access.
* zip - has a central directory; files are randomly accessible.
* rar - proprietary; free solutions can only unpack (legally, that is).
* 7z - open format, but the current tooling does not look classic-friendly.
* adf + xpk files - well, it is an option. Not necessarily a good one though...

So frankly, we are left with zip. Fortunately zip has support for storing Amiga protection bits (actually in 2 different ways) and file comments, and supports files being stored either compressed or uncompressed, although again tool support varies for this feature.

The best part of zip (assuming we are not interested in fancier features like zip64, multi-file archives, encryption etc.) is that it is a rather low-overhead format. I can see that the central directory probably needs to be read into memory, but the rest of the file is accessed on demand only. Also, if we stay with deflate compression only, there are hundreds of implementations - I'm pretty sure there is a speedy m68k variant as well...
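
As a sketch of that split, reading the central directory only requires locating the End Of Central Directory record at the tail of the archive; the signature and field offsets below follow the standard zip layout, while the helper names are placeholders and zip64 is ignored.

Code:

/* Locate the zip End Of Central Directory record (signature PK\5\6,
 * 22 bytes plus an optional archive comment) so the central directory
 * can be loaded into RAM while file data stays on disk. */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p) { return p[0] | p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24; }

int find_central_directory(FILE *f, uint32_t *cdOffset, uint32_t *cdSize, uint16_t *cdCount)
{
    enum { EOCD_LEN = 22, MAX_COMMENT = 65535 };
    fseek(f, 0, SEEK_END);
    long fileLen = ftell(f);
    long want = fileLen < EOCD_LEN + MAX_COMMENT ? fileLen : EOCD_LEN + MAX_COMMENT;
    uint8_t *tail = malloc((size_t)want);
    if (!tail) return -1;
    fseek(f, fileLen - want, SEEK_SET);
    if (fread(tail, 1, (size_t)want, f) != (size_t)want) { free(tail); return -1; }

    int result = -1;
    for (long i = want - EOCD_LEN; i >= 0; i--) {         /* scan backwards for PK\5\6 */
        if (rd32(tail + i) == 0x06054b50) {
            *cdCount  = rd16(tail + i + 10);              /* total number of entries     */
            *cdSize   = rd32(tail + i + 12);              /* size of central directory   */
            *cdOffset = rd32(tail + i + 16);              /* offset of central directory */
            result = 0;
            break;
        }
    }
    free(tail);
    return result;
}

With cdOffset and cdSize known, a single read delivers all names, sizes, attributes and comments without touching any compressed data.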

Then comes the tricky part of random access inside files. Even many xpk compressors do not implement this properly. Also, nothing nerfs compression performance like splitting the compression blocks into smaller pieces. However, not many people know that deflate can also split the bitstream into smaller chunks (some rare compressors use this feature, but not many). So it is easy to emulate a block-size structure for deflate. The only problem is where to store this information. Here the zlib and zip container formats come in handy. We can append the offsets in some sane fashion to the end of the bitstream and still be compliant: other zip-file extractors do not see anything special, but an implementation that knows about the method can speed up seek() quite considerably. The best part is that this does not have to be static - some files could for example be made seekable by track whereas others could be by sector. It only depends on how the tooling is made...
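
A sketch of that split-deflate idea using zlib: raw deflate (windowBits -15, as stored inside zip) with Z_FULL_FLUSH at every chunk boundary resets the dictionary at a byte-aligned point, and recording total_out at each boundary yields the offset table that would be appended to the stream. The chunk size, the helper name and the assumption that the output buffer is large enough are simplifications.

Code:

/* Produce a deflate stream that is seekable per chunk: Z_FULL_FLUSH at
 * each boundary lets inflate restart at any recorded offset.  Assumes
 * outCap is large enough for the whole compressed result (sketch). */
#include <zlib.h>
#include <string.h>
#include <stdint.h>

#define CHUNK 5632  /* e.g. one floppy track, as in the comparison later in the thread */

/* Compress in[0..inLen) into out; offsets[] receives one entry per chunk.
 * Returns the compressed size, or -1 on error. */
long seekable_deflate(const uint8_t *in, uint32_t inLen,
                      uint8_t *out, uint32_t outCap, uint32_t *offsets)
{
    z_stream zs;
    memset(&zs, 0, sizeof(zs));
    if (deflateInit2(&zs, Z_BEST_COMPRESSION, Z_DEFLATED,
                     -15 /* raw deflate, as used in zip */, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return -1;

    zs.next_out = out;
    zs.avail_out = outCap;
    uint32_t pos = 0, chunkIndex = 0;

    while (pos < inLen) {
        uint32_t n = (inLen - pos < CHUNK) ? inLen - pos : CHUNK;
        offsets[chunkIndex++] = zs.total_out;   /* this chunk's byte offset in the stream */
        zs.next_in = (Bytef *)(in + pos);
        zs.avail_in = n;
        pos += n;
        /* full flush at the boundary so decompression can restart here;
           finish the stream after the last chunk */
        int flush = (pos == inLen) ? Z_FINISH : Z_FULL_FLUSH;
        if (deflate(&zs, flush) == Z_STREAM_ERROR) { deflateEnd(&zs); return -1; }
    }
    long total = (long)zs.total_out;
    deflateEnd(&zs);
    return total;
}

Decompression can then restart with inflateInit2(..., -15) and feed the compressed data beginning at offsets[wantedOffset / CHUNK].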

This brings me to the last point. This would be a really fun project for me and I could make a reference implementation (C) for using the zip archives, plus tooling (C++) for creating "enhanced" zip files, under an open license (like BSD), assuming this is something that would be picked up by WHDLoad :) I don't even know what the technical requirements for the code are. I'm too old for assembly, even though I have written "enough" of it earlier...

jotd 06 September 2020 21:22

It would make sense for big games with a lot of files. Currently, starting WHDLoad on such games (even without PRELOAD) takes forever. I don't really know why WHDLoad scans the contents of the "data" directory, since the files aren't loaded.

rare_j 07 September 2020 19:29

Apparently, zip is missing support for a particular Amiga file attribute, which breaks some WHDLoad game files that have been compressed and then decompressed with it.

This is why the preferred compressor for WHDLoad game files is lha.

Sorry that I cannot remember the details of the problem with zip. It was identified in a thread on eab recently which I can't find. But this is relevant because you'd need a customised version of zip to fix this for WHDLoad compression, and then you'd effectively have another container format.

Radertified 07 September 2020 19:48

Quote:

Originally Posted by rare_j (Post 1426329)
It was identified in a thread on eab recently

You're probably referring to this one where StingRay schooled me: http://eab.abime.net/showthread.php?t=31450

Zip cannot store Amiga filesystem specific features, such as comments, so it's out of the question.

rare_j 07 September 2020 22:14

Yes it was that thread, but now that I look again I don't think StingRay was talking about zip.
He was talking about extracting archives onto a non-AmigaDOS filesystem.
Additionally, temisu says above that zip on the Amiga does support comments.

However, there is a problem with Info-ZIP on the Amiga. I have reproduced the issue myself. There is a game where, if you zip up all its contents on the Amiga and then unzip them again, the game doesn't work. I apologise that I don't remember the game, and (as far as I know) it was never identified what the issue is.
Perhaps Retroplay remembers which game it is.

temisu 07 September 2020 22:19

Quote:

Originally Posted by Radertified (Post 1426331)
Zip cannot store Amiga filesystem specific features, such as comments, so it's out of the question.

Hi, I did not come here to argue (or to start a flame war) about archivers. I think we all know the state of affairs with lha / zip on the Amiga, the problems they have, and why people prefer lha...

The following we know for certain about zip:
  • Amiga attributes are easily broken if the file is ever decompressed on any other operating system
  • To include file comments in zip files you need to add a special flag both when compressing and decompressing
  • There can be character set conversions that break the filenames

However, if we have to choose between a new container format and fixing bugs in an existing one (and extending it in a backwards-compatible fashion), fixing the existing one would in my mind be the better option. There is nothing wrong with the zip file format itself; it is well documented and has a working extension system. It is always about the implementation...

I just wanted to point out that the zip file format is better suited for random access to files. Obviously there is a risk that if zip is used as a container, there will be people abusing it. But it only takes a simple check of the creator operating system to make sure we have a proper archive. So there is always a trade-off.
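
That check is indeed simple: in every central directory file header the high byte of the "version made by" field names the host system, and the zip specification assigns 1 to Amiga. A small sketch, with the helper name being a placeholder:

Code:

/* Check whether a central directory file header was written on an Amiga:
 * 'cd' points at the header (signature PK\1\2) in an in-memory copy of
 * the central directory. */
#include <stdint.h>

#define ZIP_HOST_AMIGA 1   /* host system id in the zip specification */

static int entry_is_amiga(const uint8_t *cd)
{
    if (cd[0] != 'P' || cd[1] != 'K' || cd[2] != 1 || cd[3] != 2)
        return 0;                       /* not a central file header */
    /* "version made by": low byte = spec version, high byte = host OS */
    return cd[5] == ZIP_HOST_AMIGA;
}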

Now, in order to steer the discussion to the more technical side, I made some comparisons. I tested with a known adf file (3DDemo1). Let's compare the effects of making the file seekable with track accuracy (5632 bytes):
  • ADF - 901120 bytes
  • ADZ - 597108 bytes
  • DMS - 617348 bytes
  • Split deflate - 628245 bytes

So, obviously there is a price to pay if you make the compressed file seekable. But to me it looks like there would be a benefit, as long as it is something that is dynamically tunable per file.
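
(For reference, with the numbers above the track-seekable split deflate stream is about 31 KB, or roughly 5%, larger than the plain ADZ, while still about 30% smaller than the raw ADF.)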

Radertified 07 September 2020 22:39

Quote:

Originally Posted by temisu (Post 1426359)
Hi, I did not come here to argue (or to start flame war) about archivers.

I'm definitely not arguing about it. I'm sorry if my reply came off as hostile.

If zip works, fantastic. Let's go with that :)

rare_j 07 September 2020 22:41

Quote:

Originally Posted by temisu (Post 1426359)
To include file comments in zip files you need to add a special flag both when compressing and decompressing

It's possible that this feature has been causing trouble in the past.
Is there a way to tell if an archive has been generated preserving file comments?

temisu 07 September 2020 22:57

Quote:

Originally Posted by Radertified (Post 1426362)
I'm definitely not arguing about it. I'm sorry if my reply came off as hostile.

If zip works, fantastic. Let's go with that :)

No worries. Sometimes, in order to make progress, we need to be able to talk about crazy ideas openly.

I do not know whether using zip is a good idea. But I'm sure we can think and talk about it :D

Quote:

Originally Posted by rare_j (Post 1426364)
It's possible that this feature has been causing trouble in the past.
Is there a way to tell if an archive has been generated preserving file comments?

Well, if there is content in the comment attribute you can be certain that the archive was created properly. If not, you have no safe way of knowing whether there actually were no comments or whether they were left out by accident.

On the decompression side, they will simply get dropped if the special flag is not passed to unzip.

Looking at an implementation I wrote earlier, I have the following (horrific) logic for reading zip files (isAmiga comes from the OS-creator field of the zip file):

Code:

std::string fileNote;
if (isAmiga && commentLength>1)
{
        // it is not a comment, it is a filenote
        size_t commentOffset=dirEntOffset+centralFileHeader.size()+nameLength+extraLength;
        // the terminating null is included in the file, but we do not have to store it...
        const char *comment=reinterpret_cast<const char*>(centralDirectory->data())+commentOffset;
        size_t noteLength=std::min(size_t(commentLength),size_t(79U))-1;
        fileNote=UTF8::convertFromISO88591(std::string(comment,noteLength),true);
}


Wepl 08 September 2020 00:55

Quote:

Originally Posted by temisu (Post 1426160)
My view on this topic would be simply "please no, no, no, not another container format". There are enough already. Features like this always live or die by the ease of the tooling around them.

I fully agree with this. If there are formats which match the requirements it will be better to use existing ones.
Quote:

Originally Posted by temisu (Post 1426160)
Let's take a look at the options:
* lha - linear structure, not ideal for random access.

Yes, if there are many files and the archive is not preloaded, accessing a file may require many seeks and reads.
An advantage would be that file comments etc. are widely used with that format. The archive format itself, with its different header and compression formats, is not very nice I think.
Quote:

Originally Posted by temisu (Post 1426160)
* zip - has a central directory; files are randomly accessible.

So frankly, we are left with zip. Fortunately zip has support for storing Amiga protection bits (actually in 2 different ways) and file comments, and supports files being stored either compressed or uncompressed, although again tool support varies for this feature.

Which zip implementation supports protection bits and file comments?
I did not know about that.
Quote:

Originally Posted by temisu (Post 1426160)
The best part of zip (assuming we are not interested in fancier features like zip64, multi-file archives, encryption etc.) is that it is a rather low-overhead format. I can see that the central directory probably needs to be read into memory, but the rest of the file is accessed on demand only.

If the directory is held in memory then this could also be done for lha. Currently this is not done by WHDLoad (except for PreLoad and Examine).
Quote:

Originally Posted by temisu (Post 1426160)
Also, if we stay with deflate compression only, there are hundreds of implementations - I'm pretty sure there is a speedy m68k variant as well...

Speed is important, but as long as uncompressed storage is supported the user has the choice. Archivers are normally optimized to save space, not for fast decompression. So this is one reason why I'm unsure whether reusing existing archivers is a good idea.
Quote:

Originally Posted by temisu (Post 1426160)
Then comes the tricky part of random access inside files. Even many xpk compressors do not implement this properly. Also, nothing nerfs compression performance like splitting the compression blocks into smaller pieces. However, not many people know that deflate can also split the bitstream into smaller chunks (some rare compressors use this feature, but not many). So it is easy to emulate a block-size structure for deflate. The only problem is where to store this information. Here the zlib and zip container formats come in handy. We can append the offsets in some sane fashion to the end of the bitstream and still be compliant: other zip-file extractors do not see anything special, but an implementation that knows about the method can speed up seek() quite considerably. The best part is that this does not have to be static - some files could for example be made seekable by track whereas others could be by sector. It only depends on how the tooling is made...

Great if this is possible ;)
Quote:

Originally Posted by temisu (Post 1426160)
This brings me to the last point. This would be a really fun project for me and I could make a reference implementation (C) for using the zip archives, plus tooling (C++) for creating "enhanced" zip files, under an open license (like BSD), assuming this is something that would be picked up by WHDLoad :) I don't even know what the technical requirements for the code are. I'm too old for assembly, even though I have written "enough" of it earlier...

I really would like to add this. C would be fine; there is already a small part compiled using vbcc. I could probably also define a small ABI which would allow having this as a separate BLOB.
Basically needed are (a header sketch follows the list):
- a function which iterates over all file names (PreLoad)
- a function which returns the size of a file (GetFileSize)
- a function which reads part of a file (or the complete file)
- a function which iterates over all file names in a sub directory (ListFiles)
- a function which iterates over all files/dirs and delivers all filesystem metadata (Examine)
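
Purely as an illustration of that list, a hypothetical C header for such a BLOB; none of these type or function names are an existing WHDLoad interface, they only mirror the five operations above.

Code:

/* Hypothetical interface for the archive-access BLOB described above.
 * Nothing here is an existing WHDLoad API; the names only mirror the
 * five operations listed. */
#include <stdint.h>

typedef struct ArchiveHandle ArchiveHandle;   /* opaque, owned by the BLOB */

/* Per-entry metadata, roughly what Examine/ExNext need. */
struct ArchiveEntryInfo {
    const char *name;
    const char *comment;
    uint32_t    protection;         /* AmigaDOS protection bits */
    uint32_t    days, mins, ticks;  /* AmigaDOS DateStamp       */
    uint32_t    size;
    int         isDirectory;
};

/* Callback-style iteration keeps memory use low on small configs. */
typedef int (*EntryCallback)(const struct ArchiveEntryInfo *info, void *userData);

/* PreLoad: iterate over all file names in the archive. */
int arc_ForEachFile(ArchiveHandle *arc, EntryCallback cb, void *userData);

/* GetFileSize: return the uncompressed size of a file, or -1 if not found. */
int32_t arc_GetFileSize(ArchiveHandle *arc, const char *path);

/* Read part of a file (or the complete file) into buf; returns bytes read. */
int32_t arc_ReadFile(ArchiveHandle *arc, const char *path,
                     uint32_t offset, void *buf, uint32_t length);

/* ListFiles: iterate over all file names in one sub directory. */
int arc_ForEachFileInDir(ArchiveHandle *arc, const char *dirPath,
                         EntryCallback cb, void *userData);

/* Examine: iterate over all files/dirs delivering full filesystem metadata. */
int arc_Examine(ArchiveHandle *arc, EntryCallback cb, void *userData);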

Quote:

Originally Posted by jotd (Post 1426171)
It would make sense for big games with a lot of files. Currently, starting WHDLoad on such games (even without PRELOAD) takes forever. I don't really know why WHDLoad scans the contents of the "data" directory, since the files aren't loaded.

Only if the Slave has the Examine flag set does WHDLoad collect all filesystem metadata at startup. This data is needed for the Examine/ExNext calls and is collected up front to avoid many OS switches. So this happens for most kickemu games.

