English Amiga Board


13 January 2023, 22:28   #3181
Turran
Moderator
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,571
Quote:
Originally Posted by jbl007
Is there a reason why you want to stick with ASCII in the db? Using UTF-8 for everything would be much, much easier.
Yes, I know some file names are ISO-8859 (probably ancient files uploaded from Windows 98 or even Amiga) and can't be converted to UTF-8 directly. But maybe this could be fixed...
It does not really matter unless we plan to rename files based on the file I uploaded, something I've now decided against. The files are named that way for a reason. It doesn't matter because all I need is a unique key that I can look up quickly and attach an md5sum to in the database. The ASCII version of the name does just that. The stripped names do not appear in the final filelist_md5.txt, which shows the unedited names, so this does not affect the end user.
I tried every collation and region setting to get all the files into the database with their names as they are, but failed. Some names contain a LOT of strange characters, and since this script was really only a proof of concept, I didn't want to take the easy route and just start renaming. ASCII is the safe bet, and it works fine for this purpose.
If you want examples, just look in Non Amiga/DOS/Total DOS Collection v18/Games/Files/1989
Code:
?riture Automatique Anglais (Fr) (1989)(J?ko) [Educational].zip
Where in Time is Carmen Sandiego_ v1.01 [a1] (1989)(Br??und Software, Inc.) [Adventure, Educational].zip
Trivia (??????) (He) (1989)(Aryeh Segal) [Educational, Trivia].zip
Quete de l'Oiseau du Temps, La (It) (1989)(Infogrames) [Adventure].zip
Horoscope (????????) (Ru) (1989)(Xyton) [Simulation, Educational].zip
(even the forum can't show it properly, heh).

Quote:
Originally Posted by jbl007
...which leads to the next questions: Is it even worth the trouble? Do we need a checksum for every file? Is there a real use case for this?
Short answer: NO.

Long answer:

peo asked about an md5 list here:
http://eab.abime.net/showpost.php?p=...postcount=3159

My reply was pretty much the same: what's the point?
http://eab.abime.net/showpost.php?p=...postcount=3160

The response to that triggered me a bit:
http://eab.abime.net/showpost.php?p=...postcount=3161
"It would be easy.."

That set me off. There were two possible outcomes: either it was as easy as he said and we'd have a nice new list, OR I would be right and we would reach the conclusion in your post. Is it worth the trouble? No.

TCD has done a tremendous job of cleaning up and deserves a lot of praise for it. Much of it he did by himself, but I also ran a script to find dupes on the FTP, and we (well, TCD mostly) have used its results to remove less obvious dupes. That work is still ongoing and not really related to this.

If nothing else, this project could make our lives easier, both now for the cleanup and in the future. I envisioned an easy "dupe finder" list or query I could build for the website, and a check after each upload: "Is this a dupe? If so, remove it and send the uploader a message saying what it duplicates," etc.
If it was successful, it could lead to a lot of other things.
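The upload check described above could look roughly like this. It is only an illustrative sketch, not the server's actual script: it assumes a hypothetical SQLite table `files(path, checksum)` standing in for the real MariaDB database, and the helper names `md5_of_file` and `check_upload` are made up for this example.

```python
import hashlib
import sqlite3


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_upload(db: sqlite3.Connection, upload_path):
    """Return the path of an existing file with the same MD5, or None.

    Assumes a hypothetical table files(path TEXT, checksum TEXT).
    """
    digest = md5_of_file(upload_path)
    row = db.execute(
        "SELECT path FROM files WHERE checksum = ? LIMIT 1", (digest,)
    ).fetchone()
    return row[0] if row else None
```

If `check_upload` returns a path, the upload is a dupe of that file, and that path is what the uploader would be told about.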
Plus, it's fun! I love scripting things, particularly with bash/MariaDB/HTML/PHP etc. This project fit the bill just fine, and I do get a bit manic about things like this; that's why I let it go this far.
Just now I added another check to the script. After every 10-500 files it goes through, it picks 10 random files and verifies the md5sum stored in the database against the file's actual md5sum. If it finds a mismatch, it rechecks that entire directory. That should keep the database reasonably fresh. I just ran it again, and it took 200 minutes.
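The spot check just described can be sketched like this. It is one possible reading of the description, not Turran's script: the stored MD5s are assumed to be a plain dict of path to MD5, every stored path is assumed to still exist, and `spot_check` is a hypothetical name.

```python
import hashlib
import os
import random


def md5_of_file(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def spot_check(stored, rng=None):
    """Spot-check stored MD5s; re-hash a whole directory when one is wrong.

    stored: dict of file path -> MD5 currently in the database.
    Returns a dict of path -> fresh MD5 for every file in a directory
    where a mismatch was found (empty if all samples matched).
    """
    rng = rng or random.Random()
    paths = sorted(stored)
    stale_dirs = set()
    i = 0
    while i < len(paths):
        # Walk the list in batches of 10-500 files...
        batch = paths[i:i + rng.randint(10, 500)]
        i += len(batch)
        # ...and verify 10 randomly chosen files from each batch.
        for path in rng.sample(batch, min(10, len(batch))):
            if md5_of_file(path) != stored[path]:
                stale_dirs.add(os.path.dirname(path))
    refreshed = {}
    for directory in stale_dirs:
        # Recheck everything currently in the affected directory.
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file():
                    refreshed[entry.path] = md5_of_file(entry.path)
    return refreshed
```

Sampling keeps the cost of a run far below re-hashing everything, while a single mismatch still triggers a full refresh of the directory it came from.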

But to be honest, your post brought me back down to earth. I'll use what we have today to help with the cleanup and save it for the future, but filesize_md5.zip on the FTP will not be updated on a regular basis, because resource-wise, it's not worth it.

And I'm blabbering again =)

Last edited by Turran; 13 January 2023 at 22:48.

13 January 2023, 23:32   #3182
peo
Registered User
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by Turran
2: Get the full path from above, replacing any non-ASCII chars with '?'.
Find the "new" filename in the database. If it's there, use the md5sum for it. If not, run md5sum on the file (file uploaded or moved since the last run).
Echo the entire original line, with the md5sum we got appended to the end, to a text file.
2022-12-20T14:08:18+0100|1394|Flux_Images/DoDumpAmiga.7z|81bea2ce62e342f2ac02c5c29344acc9
...
3: Read the filename and md5sum from filelist_md5.txt, replacing any non-ASCII chars with a '?', into a new file, filelist_md5_import.txt.
...
4: Import the filelist_md5_import.txt file into the database using LOAD DATA LOCAL INFILE.
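Steps 2 and 3 quoted above can be sketched as follows. This is an illustration, not Turran's (bash/MariaDB) script: the 'timestamp|size|path' input format is inferred from the example line, the '|'-separated import row is a guess, `known` is a hypothetical dict loaded from the database once (so there is no query per file), and step 4 (LOAD DATA LOCAL INFILE) is left to MariaDB.

```python
def ascii_key(path: str) -> str:
    """Steps 2/3: replace every non-ASCII character with '?'."""
    return "".join(c if ord(c) < 128 else "?" for c in path)


def annotate_line(line, known, hash_file):
    """Step 2 for one filelist line: append the file's md5sum.

    line:      'timestamp|size|path' (format assumed from the example line)
    known:     dict of ascii_key(path) -> md5sum, loaded from the database
    hash_file: callable used only for files missing from `known`
               (uploaded or moved since the last run)
    """
    line = line.rstrip("\n")
    path = line.split("|", 2)[2]
    md5 = known.get(ascii_key(path))
    if md5 is None:
        md5 = hash_file(path)
    return f"{line}|{md5}"


def import_row(annotated: str) -> str:
    """Step 3: the 'ascii_path|md5' row written to filelist_md5_import.txt."""
    rest, md5 = annotated.rsplit("|", 1)
    path = rest.split("|", 2)[2]
    return f"{ascii_key(path)}|{md5}"
```

Splitting on the first two '|' and the last one keeps the path intact even if it happens to contain a '|' itself.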

Maybe it would be more effective overall to compare (i.e. find the file path/location) using base64-encoded filenames? These could be converted back during the import into MySQL using from_base64() if you want them readable in the db for any reason.
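A sketch of the base64 idea, assuming the file list is built by walking the tree with raw byte paths; the helper names are hypothetical. Unlike replacing non-ASCII characters with '?', base64 is lossless, so names that differ only in their non-ASCII characters stay distinct, and FROM_BASE64() in MySQL/MariaDB recovers the original bytes.

```python
import base64
import os


def b64_key(raw_path: bytes) -> str:
    """Lossless, pure-ASCII key for a filename, whatever its original encoding."""
    return base64.b64encode(raw_path).decode("ascii")


def from_b64_key(key: str) -> bytes:
    """Inverse of b64_key (what FROM_BASE64() does on the SQL side)."""
    return base64.b64decode(key)


def list_keys(top):
    """Walk `top` with bytes paths so undecodable names survive unchanged."""
    for dirpath, _dirs, files in os.walk(os.fsencode(top)):
        for name in files:
            yield b64_key(os.path.join(dirpath, name))
```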



Quote:
Like I said, it still takes about 181 minutes to run, so I'll probably just update the file once per week instead of every day to make it easier on the poor database. There is only so much processing power and fast hard drive you get with a ~$86 per month machine from Hetzner.
Since you already build a nightly 'newfiles.txt' file, you could add the md5sums of those new files to the db (and to 'filelist_md5'). This would also catch the problem you mentioned, which occurs when a file is replaced under its previous name.



The weekly full run would then keep 'filelist_md5' free of files that no longer exist.
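A sketch of that nightly/weekly split, with assumptions: the format of newfiles.txt isn't given in the thread, so the new files are taken as a plain list of paths, and the database is a hypothetical SQLite table with `path` as its primary key. In MariaDB the upsert would be written as INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3


def nightly_update(db: sqlite3.Connection, new_paths, hash_file):
    """Hash only tonight's new files and upsert them by path.

    Upserting by path also replaces a stale MD5 when a file is
    re-uploaded under its previous name.
    """
    db.executemany(
        "INSERT INTO files(path, checksum) VALUES (?, ?) "
        "ON CONFLICT(path) DO UPDATE SET checksum = excluded.checksum",
        ((p, hash_file(p)) for p in new_paths),
    )
    db.commit()


def weekly_prune(db: sqlite3.Connection, all_current_paths):
    """Weekly run: drop rows for files that no longer exist on the server."""
    db.execute("CREATE TEMP TABLE IF NOT EXISTS live(path TEXT PRIMARY KEY)")
    db.execute("DELETE FROM live")
    db.executemany(
        "INSERT OR IGNORE INTO live(path) VALUES (?)",
        ((p,) for p in all_current_paths),
    )
    db.execute("DELETE FROM files WHERE path NOT IN (SELECT path FROM live)")
    db.commit()
```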

13 January 2023, 23:44   #3183
Turran
Moderator
Quote:
Originally Posted by peo
Maybe it would be more effective overall to compare (i.e. find the file path/location) using base64-encoded filenames?
The whole problem is querying a database 1.5 million times with the hardware at my disposal. Plus, I just accidentally deleted the script with no backup, so I'm a bit bummed and am going to bed! I'm trying to recover it, but running strings on a 24TB disk to find a piece of text takes a while. It can run overnight.

Thought it was an old version... Don't script tired!
Project is on hold at the moment. No new zip coming.

13 January 2023, 23:51   #3184
jbl007
Registered User
Join Date: Mar 2013
Location: Leipzig/Germany
Posts: 466
Quote:
Originally Posted by Turran
It does not really matter unless we plan to rename files based on the file I uploaded, something I've now decided against.
Maybe you should rethink that decision. Sorry for being a bit harsh, but right now the server is a mix of many different encodings, displaying random garbage to the end user for everything that is not in English. It's impossible to find anything that is not in English.


Quote:
Code:
Horoscope (????????) (Ru) (1989)(Xyton) [Simulation, Educational].zip
(even the forum can't show it properly, heh).
It's not displayed correctly because the encoding is KOI-8 and the forum tries to display it as UTF-8. One more very good reason for a proper fix (renaming files). The correct name is
Horoscope (????????) (Ru) (1989)(Xyton) [Simulation, Educational].zip
where ???????? is "Choroskop" in Cyrillic. Yes, I can read that; I was forced to learn Russian in school for 5 years. No fun, but still helpful today.

Edit: Hm... The page header says content="text/html; charset=UTF-8", yet UTF-8 isn't supported. Interesting forum software.

Quote:
And I'm blabbering again =)
No, I find it interesting to read about the inner workings of the server, so please keep me/us updated.

Last edited by jbl007; 13 January 2023 at 23:57.

14 January 2023, 00:03   #3185
Turran
Moderator
Fixing filenames is not really connected to the whole md5 thing though. It has nothing to do with why the md5 list is worth it or not =) That is something we can do separately later on if we want to.

14 January 2023, 00:31   #3186
jbl007
Registered User
Quote:
Originally Posted by Turran
Fixing filenames is not really connected to the whole md5 thing though. It has nothing to do with why the md5 list is worth it or not =) That is something we can do separately later on if we want to.
Yes, you're right, there are enough problems already.
And I was wrong about the given example: this (and the names of the entire DOS collection) is already valid UTF-8. lol
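Whether a name is valid UTF-8 can be checked mechanically rather than by eye. A sketch, assuming a POSIX filesystem where names are raw bytes; the helper names are hypothetical. A name that fails to decode is most likely in a legacy 8-bit encoding such as ISO-8859-1 or KOI-8.

```python
import os


def is_utf8(raw_name: bytes) -> bool:
    """True if a raw filename decodes as valid UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_paths(top):
    """Yield raw (bytes) paths under `top` whose final name is not valid UTF-8."""
    for dirpath, dirnames, filenames in os.walk(os.fsencode(top)):
        for name in dirnames + filenames:
            if not is_utf8(name):
                yield os.path.join(dirpath, name)
```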

14 January 2023, 13:29   #3187
peo
Registered User
Quote:
Originally Posted by TCD
If you spot a 'duplicate' file (either MD5/CRC32 or content) please let us know. We're sorting through our own list right now, but any input is welcome.

A few of the duplicates (9 or more copies each) in the one and only filelist with md5s:

Code:
| 3e138ebada8ba93126dd302446b7141e |     9 |
| 2fbe009fdd60fc06de8f89fd233be057 |    10 |
| abdc58466e244dfb311f283bf963af79 |    10 |
| ba2cca460771a476c47030ab7e090eb2 |    10 |
| bbdab90ca1a8cdfb93c9f37bc158d1b3 |    10 |
| be86f74ba9c53e75e9b803fb7d7a36af |    10 |
| 70bf34060d50e597056688fa99bf2ec0 |    10 |
| 84da30b63a1330eed9e7b2a9b41f599d |    10 |
| 84e949105a9487ee1995b0d7d84e2863 |    10 |
| e598cbc8041a61fda570c5ce0f244cfc |    10 |
| a27492eaff39203a2cf0aeac85dda5dd |    11 |
| abbb148c548196075af1e4cce52967aa |    11 |
| dbf944dad1da68bc27df75ff3f3d0c48 |    11 |
| 05e151512960287bdc677a3cc692a79d |    11 |
| 1defdcb40befe33beea19633b36d3e38 |    11 |
| 31e665deab37fe896026ba74d753b21d |    11 |
| 327be6b5215af0a0c0e2cfd8868601ee |    11 |
| 16ca97e42a483ba4558072f148b58e92 |    12 |
| 19325d83936951cc800b495ebccfbb90 |    12 |
| 42001a90b2eef277f85de6f3faf8eff6 |    12 |
| 4adcd25ac65aac19d63727ab82282564 |    12 |
| b769460a6d03c2e31f924dd333fb27e4 |    12 |
| efdf70ec9cc14fffa2f7741c3b21329a |    12 |
| fca6f842650db23aa07594b2cf785e76 |    12 |
| 5c9e49e76868389745b4ed291557deff |    12 |
| 79e98dead188e5cae2883b3f3c610c17 |    12 |
| 81279d38beca10736b747ccb07eb7784 |    12 |
| 8c3a761755490209a636516574194a94 |    12 |
| 546fc569f44bc0be0703759fdc169c81 |    13 |
| 5bbe009c8533f6cc0c89f82b4e890a06 |    13 |
| 7a9431d8190e01df1c0d5a93015d2df0 |    13 |
| 8744eddbca0ed973d89c46d5e2c25e56 |    13 |
| 898239c9587fb575a12275855d8dee5d |    13 |
| ab2ce89d56352f83b718de7285f71402 |    13 |
| ccb7f7b8902fa003fb6816c95842fe1a |    13 |
| d2ddd41cb45b5bc78a35533c7f0d53b5 |    13 |
| dd004267691d166d35218aa908d35c30 |    13 |
| 15f3ca37b2eb7dd40e2898464c1df0e2 |    13 |
| 18e2ae9d8759a8f301a4f077bba415c8 |    13 |
| 20f9bb526f14e267de88311683b2b90f |    13 |
| 2acea5c09be34380312a3b8d2031f8eb |    13 |
| 30f958cd6078c3f39abb863e6eb30a63 |    13 |
| e40a747b067ebad7a90098fb63f7cc49 |    13 |
| e9b106c35a0258471b324fb1521c421c |    13 |
| e102dcd9e7e53a85640f411001de8a23 |    14 |
| ef9fbd7f1557f425929c2571be109c8b |    14 |
| NA                               |    14 |
| c598b0679a0bea27501e94e2b5c26e19 |    14 |
| c5f159e827c5173608aca75b27daf98d |    14 |
| c7b6b31986e79eff1dd3910203b03dcb |    14 |
| dc8a08d7e1fe12650211a0606129ecc3 |    14 |
| 53a0f14273e7a711031704c4adeaac31 |    14 |
| 5d156e706c4b0b05fc75f6214468a428 |    14 |
| 7710b52d8d76a239a2330d3867afe582 |    14 |
| 9237f6371bdfd0d2403e981f6b99ccd6 |    14 |
| 0ebf55553610a366194e9acd996b19e8 |    14 |
| 477af24ba73420a7527277dce268521a |    14 |
| 2de2a7f120bfabec2267eecba7a19bbf |    15 |
| 36a12e8b199bf6f80d67d91d2ac9107f |    15 |
| 9f5030b7d77614bf2d58df73e33abc4c |    15 |
| c46184ba56b9a930b7a660f10edead16 |    15 |
| f4552101b6a9d784708dbb279c23aee9 |    15 |
| 68b5528081c6af598dce5dbf46394228 |    15 |
| 9216fc725ae432b895d44370397955e7 |    15 |
| 5693c3cf9cf3bb4549e00d0687125c08 |    16 |
| 86e4062d68c47b0272b99e6accecdb3f |    16 |
| 13704d049d35c19f914b0de7c4981fd0 |    16 |
| 3ec67fa56390c3b67869bd26e8ab7945 |    16 |
| 982bf12f065365a3322548b678f8e732 |    17 |
| c69a06fd775704d53c86202ff5eb421d |    17 |
| ec1f7d92ba1eeb94a2e9da979abcc1ab |    17 |
| fbd6480bcc6e75085d4851d85666aafb |    18 |
| 52df7414b07d5eae03617032bb797869 |    18 |
| 1f96cd4ee519fcc89418507ee8ec6fbc |    18 |
| 30192725bfe997ce8103be40e697db62 |    18 |
| c57a16cb9a616c8878f65fb5a90e8cdd |    18 |
| f039e3d94d80bb3815f99669ca144c2c |    19 |
| eb708c0394753b8a9a74bee8b10e790b |    20 |
| adad36abed68604dfbaf7c690a205cd4 |    20 |
| 3396839ee915f2ecb8ffebe1edc1a826 |    20 |
| 4193bbccbed77c64a87856ba0fa092e1 |    20 |
| e30ada3ed35f15f8fc281de888066b38 |    21 |
| 96c61981db9d358f7864a044e9ec794e |    23 |
| 348a9791dc41b89796ec3808b5b5262f |    28 |
| ef833a28239ed8130a696318eba3f99a |    29 |
| 96eb917322d5c0420eea3b9b0f37e78c |    33 |
| 7215ee9c7d9dc229d2921a40e899ec5f |    35 |
| d41d8cd98f00b204e9800998ecf8427e |    71 |
| b9b045dc873f0fb6d7430a761ef6b42b |    79 |
| 5384ffc9add566c7551a3db28b4dee18 |    97 |
| 12952de1925843902708a6c485909d8d |   161 |
| fa0c1b2cf14b9e4ae088ab389c2f0736 |   240 |
+----------------------------------+-------+
13695 rows in set (13.22 sec)
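A table like this comes from a grouped count over the checksum column. A minimal sketch of that kind of query with Python's sqlite3 module, assuming a hypothetical `files(path, checksum)` table (peo's actual schema isn't shown). It also confirms that d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes, so its 71 hits are empty files rather than real duplicates.

```python
import hashlib
import sqlite3

# MD5 of empty input: every zero-byte file hashes to this value.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def duplicate_counts(db: sqlite3.Connection, min_count=2):
    """Return (checksum, count) pairs seen at least `min_count` times, ordered by count."""
    return db.execute(
        "SELECT checksum, COUNT(*) AS n FROM files "
        "GROUP BY checksum HAVING n >= ? ORDER BY n",
        (min_count,),
    ).fetchall()
```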

Last edited by peo; 14 January 2023 at 13:38.

14 January 2023, 13:56   #3188
TCD
HOL/FTP busy bee
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,518
Hmm, without file names and sizes that list is not really useful I'm afraid.

A quick check for the 240x MD5 shows it's a 163 byte PNG:

Not really worth hunting those down I'm afraid.

14 January 2023, 14:26   #3189
peo
Registered User
Quote:
Originally Posted by TCD
Hmm, without file names and sizes that list is not really useful I'm afraid.

A quick check for the 240x MD5 shows it's a 163 byte PNG:

Not really worth hunting those down I'm afraid.

A script and grep on the filelist with md5s. The other high count was a set of broken DMS and text files from some place.


I'm currently modifying my Jotta/Elgigantencloud dupfinder to use the EAB md5 list instead. It will first get the md5s of the duplicates (optionally with a minimum file size), then list the files for each of those results.
Probably 10 minutes away or something

14 January 2023, 15:05   #3190
peo
Registered User
OK, 20 minutes later, because of a slow database search (no key on md5, which I will add) and because MySQL didn't allow combining normal columns with the aggregate for counting duplicates.
This is allowed in SQLite: "select checksum,size,path,count(checksum) as count from files where 1=1 {$sql_minsize} group by checksum having count > 1 order by count"

It's an easy way to get the size and one sample file name for the first, "quick" output (which really is quick on my JottaDB with 7+ million entries).
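MySQL rejects that query when ONLY_FULL_GROUP_BY is enabled (the default in MySQL 5.7 and later), because `size` and `path` are neither grouped nor aggregated. A sketch of a portable variant, run here through Python's sqlite3 on a hypothetical `files(path, size, checksum)` table: every non-grouped column is wrapped in an aggregate, and an index on the checksum column addresses the "no key on md5" slowness.

```python
import sqlite3

CREATE_INDEX = "CREATE INDEX IF NOT EXISTS idx_files_checksum ON files(checksum)"

# MAX()/MIN() make size and a sample path legal under ONLY_FULL_GROUP_BY too.
DUPES_SQL = """
SELECT checksum, MAX(size) AS max_size, MIN(path) AS sample_path, COUNT(*) AS n
FROM files
WHERE size >= ?
GROUP BY checksum
HAVING COUNT(*) > 1
ORDER BY n DESC, max_size DESC
"""


def first_pass(db: sqlite3.Connection, min_size=0):
    """Quick first output: one row per duplicated checksum."""
    db.execute(CREATE_INDEX)
    return db.execute(DUPES_SQL, (min_size,)).fetchall()
```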



== File with md5 8fabfb30b05e1218035dea4416b28ca3, 7469947 bytes, found in 4 locations ==
One file named as: ErbenDerErde-DieGrosseSuche_2788.zip

== File with md5 9c1cec9d30f2e70d22154f5abf3a79a3, 6454354 bytes, found in 3 locations ==
One file named as: emanual [Excelsior! BBS Manual].pdf

== File with md5 308b6fe45029894a101bf1dba5bc40b5, 6014654 bytes, found in 3 locations ==
One file named as: aiab-r10.6-20071213.zip

== File with md5 4cdf60fca74085576102d50a068d7103, 6174800 bytes, found in 3 locations ==
One file named as: ScummVM_V1.5.0.004_AGA_060.lha

== File with md5 f651032e4593ca636363f64059bda621, 6996176 bytes, found in 3 locations ==
One file named as: RiseOfTheDragon_2938.zip

== File with md5 92675876c18b266ae9e35e09eb2a1a46, 6267627 bytes, found in 3 locations ==
One file named as: FS-UAE_241_osx5ppc [FS-UAE 2.4.1 for PPC MacOS X 10.5 (CD audio tracks and drive clicks OK)].zip

== File with md5 cd44b2909814f55974b5ebe8fdbc47ce, 9340632 bytes, found in 3 locations ==
One file named as: EvilsDoom_v1.9.lha

== File with md5 df0e666c0b3b422154a6d3b7225e0954, 8425838 bytes, found in 3 locations ==
One file named as: FS-UAE_305_Leo_ppc_Generic.zip

== File with md5 acf95adcf4ee5355806571513136ec42, 7563554 bytes, found in 3 locations ==
One file named as: TFX (AGA) [Bootable HDF.].zip

== File with md5 73bfa557d410b29f0e50e39d984b7405, 6631119 bytes, found in 2 locations ==
One file named as: A3DSRC1.LHA

== File with md5 45dfab4b3101df7280b1bf0f440cf04c, 5311093 bytes, found in 2 locations ==
One file named as: A3DSRC2.LHA

== File with md5 790404ef9b37227882957f68eca2c3ec, 4911233 bytes, found in 2 locations ==
One file named as: A3DSRC3.LHA

== File with md5 392e9c9a573b3c06b498a9352c1c2dbd, 6338418 bytes, found in 2 locations ==
One file named as: DW-ADMIR.LHA

== File with md5 c2be9ba3fca3f88ea7b8a43d996e2395, 5511666 bytes, found in 2 locations ==
One file named as: DW-FREEB.LHA

== File with md5 c1697fa01dfbde19482b9126b74bd6eb, 6431290 bytes, found in 2 locations ==
One file named as: DW-GREET.LHA

== File with md5 80c0d7b617584dfb3ca2ef0f791d2f7a, 4808652 bytes, found in 2 locations ==
One file named as: DW-OUTLA.LHA

== File with md5 e84b2f8de56ac248fe760b5228083df0, 4264302 bytes, found in 2 locations ==
One file named as: DW-PHOEN.LHA

== File with md5 07f49c9b11795abb4581ce65dce3165f, 7164239 bytes, found in 2 locations ==
One file named as: DW-WASTE.LHA

== File with md5 c620dc3fb59649270a4cc4a08aae1031, 4769614 bytes, found in 2 locations ==
One file named as: tbl-tint_.lha

== File with md5 479ad2931f275a6631014689d9a90980, 4559454 bytes, found in 2 locations ==
One file named as: rno-mekk.lha

== File with md5 a1f4aa952d3bc9f2c38b2be752ac541c, 7609636 bytes, found in 2 locations ==
One file named as: ALVEDON.XM

== File with md5 88b507fdebb11505123a68e50779c4c7, 125440245 bytes, found in 2 locations ==
One file named as: SCALA_Multimedia_MM200_English_Manual.pdf

== File with md5 00eff639f0347ddf6fc859f4b1de7824, 15643167 bytes, found in 2 locations ==
One file named as: ALG_Boxart & Config & NVR & ROMs & SShot & Title_WinUAE v4.9.1.rar

== File with md5 a667b55715f8ff8ba65418f9a9eeb042, 2574092061 bytes, found in 2 locations ==
One file named as: Crime Patrol 2 - Drug Wars [American Laser Games].rar

== File with md5 9bbc1ecb9ac000d5a551cff1fceab9db, 2527000332 bytes, found in 2 locations ==
One file named as: Crime Patrol [American Laser Games].rar

== File with md5 2bb5f11bb32bc1b904dfcd331d9076e0, 1716116108 bytes, found in 2 locations ==
One file named as: Fast Draw Showdown [American Laser Games]_original.rar

== File with md5 4436105204dfdc03ecb06b79f754b3a4, 2849424945 bytes, found in 2 locations ==
One file named as: Fast Draw Showdown [American Laser Games]_rotated.rar

== File with md5 c2320fdfb4dc8a69b31a623cef4d5c9b, 3540736450 bytes, found in 2 locations ==
One file named as: Mad Dog II - The Lost Gold [American Laser Games].rar

== File with md5 66a06c20c823d7bbdd7b4fb3141be31b, 815858485 bytes, found in 2 locations ==
One file named as: Platoon [Nova].rar

== File with md5 e3556c351152cdb227d972f4008843d2, 1918651408 bytes, found in 2 locations ==
One file named as: Space Pirates [American Laser Games].rar

== File with md5 42422bfe0529668bf241b260362c5842, 2476221997 bytes, found in 2 locations ==
One file named as: The Last Bounty Hunter [American Laser Games].rar

== File with md5 38501d8df7b8c7a0d2a1fc1501ca51d2, 1051845943 bytes, found in 2 locations ==
One file named as: Who Shot Johnny Rock [American Laser Games].rar

== File with md5 40dc4d55acf9df1c8ea4219926e332ff, 1425335446 bytes, found in 2 locations ==
One file named as: Gallagher's Gallery [American Laser Games]_alt.rar

== File with md5 73721e34e558e3f7fd0e09b18d74eea8, 1710449693 bytes, found in 2 locations ==
One file named as: Mad Dog McCree [American Laser Games]_alt.rar

== File with md5 e48fdeb87df64184c4f3116fb206dc78, 10212251 bytes, found in 2 locations ==
One file named as: East vs. West - Berlin 1948 - Soundtrack - 02. Cutscene-ENG.mp3

== File with md5 cb10f16f0753021d9730413e8b0c0149, 12473573 bytes, found in 2 locations ==
One file named as: East vs. West - Berlin 1948 - Soundtrack - 03. Cinema Scene-ENG.mp3

== File with md5 27f24f023a81ac981f24edde268238ed, 8047762 bytes, found in 2 locations ==
One file named as: Juli Sane - Adventures of Quik and Silva [Quik Silva Level 1 - Juli Sane RMX].mp3

== File with md5 be144dc7e5765e874fd55bb887da8f90, 4852472 bytes, found in 2 locations ==
One file named as: Shock - [Hostages - Feel the Heat Remix].mp3

== File with md5 27279adc2606516bc9bbdff3863fcb3f, 5839052 bytes, found in 2 locations ==
One file named as: Shock - [Hostages - Ghost Story Orchestral Remix].mp3

== File with md5 a708b50b89e534e972a9278d7953f759, 6498560 bytes, found in 2 locations ==
One file named as: WikoX - [Eye Of The Beholder - Title Remix].mp3

== File with md5 1532cdd9036e7da478ee26173bceea3a, 4549900 bytes, found in 2 locations ==
One file named as: commander flying kick - Sensible Soccer [Sensible Soccer - Menu Title - Sensible Roccer].mp3

== File with md5 ade16c651e8b884227e6ebe28b69906d, 4667264 bytes, found in 2 locations ==
One file named as: nightwolf - [lff].mp3


(a lot more following, but the forum limits a post to 20000 bytes)

14 January 2023, 15:17   #3191
TCD
HOL/FTP busy bee
Okay, that is more useful. If you could create a text file of the result, that would be appreciated.

14 January 2023, 15:27   #3192
peo
Registered User
Found a bug in Turran's deleted md5-script..
Code:
select size,filetime,convert(from_base64(path) using utf8) as path from files where 1=1 and checksum='8fabfb30b05e1218035dea4416b28ca3' order by path;

| 7469947 | 2018-09-27T21:20:45+0200 | Non TOSEC IPFs - Official/ErbenDerErde-DieGrosseSuche_2788.zip
| 7453735 | 2020-06-02T00:56:18+0200 | Non TOSEC IPFs - Official/ErbenderErde-DieGrosseSuche_2788.zip
| 7469947 | 2020-12-27T15:52:08+0100 | Non TOSEC IPFs - Unofficial/SPS 0001-3730/[2788]ErbenDerErde-DieGrosseSuche[2788].zip
| 7469947 | 2018-10-04T00:02:17+0200 | TheZone/files/_2018/ErbenDerErde-DieGrosseSuche_2788 [Erben der Erde - official IPF 2788].zip

14 January 2023, 16:32   #3193
peo
Registered User
Quote:
Originally Posted by TCD
Okay, that is more useful. If you could create a text file of the result, that would be appreciated.

I'm ready to do a first-and-final (until there is a new md5-filelist available) run of the script to generate a text file for you.


What filtering and sorting do you want?
Minimum file size worth checking?
First-level sort order: the number of duplicates, or the size of an individual file in the set?
(The second-level sort order will be the full path to the file.)

14 January 2023, 16:37   #3194
TCD
HOL/FTP busy bee
Sorting: File size (doesn't matter if ascending or descending)
Filter: Everything except .info files
Minimum file size: 1000 bytes

Ideally each occurrence of the file (not just the count of duplicates) should be listed to avoid having to cross-reference.
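That spec maps onto a small report generator. A sketch, assuming rows of (path, size, md5) parsed from the md5 filelist; the function name and input shape are hypothetical, and the header line only mimics the format peo posted earlier in the thread.

```python
from collections import defaultdict


def dupe_report(rows, min_size=1000, skip_suffix=".info"):
    """Text report of duplicates: largest files first, every location listed."""
    groups = defaultdict(list)
    for path, size, md5 in rows:
        # TCD's filters: skip small files and .info icons.
        if size < min_size or path.lower().endswith(skip_suffix):
            continue
        groups[md5].append((size, path))
    dupes = [(md5, items) for md5, items in groups.items() if len(items) > 1]
    # First-level sort: file size, largest first.
    dupes.sort(key=lambda group: max(s for s, _ in group[1]), reverse=True)
    lines = []
    for md5, items in dupes:
        size = max(s for s, _ in items)
        lines.append(f"== File with md5 {md5}, {size} bytes, found in {len(items)} locations ==")
        # Second-level sort: full path; each occurrence gets its own line.
        lines.extend(path for _, path in sorted(items, key=lambda item: item[1]))
        lines.append("")
    return "\n".join(lines)
```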

14 January 2023, 17:36   #3195
peo
Registered User
Quote:
Originally Posted by TCD
Sorting: File size (doesn't matter if ascending or descending)
Filter: Everything except .info files
Minimum file size: 1000 bytes

Ideally each occurrence of the file (not just the count of duplicates) should be listed to avoid having to cross-reference.

Done. I didn't have the filtering on file names in place yet, so I had to fix that first (in an ugly way).


https://peo.yliniemi.se/eabdupes.txt.gz

14 January 2023, 17:54   #3196
TCD
HOL/FTP busy bee
Thank you

Edit: Just went through the first few entries and the format is very easy to read

Last edited by TCD; 14 January 2023 at 19:22.

14 January 2023, 20:07   #3197
Turran
Moderator
Managed to salvage 70% of the script. I'll look into it again, but more slowly this time =)

Still not sure of the validity of this, but we'll see.

14 January 2023, 20:27   #3198
Turran
Moderator
I can't find ErbenDerErde-DieGrosseSuche_2788.zip anywhere anymore. I guess TCD deleted them, so I'm not sure what the bug was.



Quote:
Originally Posted by peo
Found a bug in Turran's deleted md5-script..
Code:
select size,filetime,convert(from_base64(path) using utf8) as path from files where 1=1 and checksum='8fabfb30b05e1218035dea4416b28ca3' order by path;

| 7469947 | 2018-09-27T21:20:45+0200 | Non TOSEC IPFs - Official/ErbenDerErde-DieGrosseSuche_2788.zip
| 7453735 | 2020-06-02T00:56:18+0200 | Non TOSEC IPFs - Official/ErbenderErde-DieGrosseSuche_2788.zip
| 7469947 | 2020-12-27T15:52:08+0100 | Non TOSEC IPFs - Unofficial/SPS 0001-3730/[2788]ErbenDerErde-DieGrosseSuche[2788].zip
| 7469947 | 2018-10-04T00:02:17+0200 | TheZone/files/_2018/ErbenDerErde-DieGrosseSuche_2788 [Erben der Erde - official IPF 2788].zip

15 January 2023, 07:44   #3199
TCD
HOL/FTP busy bee
The IPFs inside those zips were identical (same MD5). There are three folders with identical IPFs, but slightly different zip file sizes.
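Zips like these slip past a plain MD5 comparison because the archive bytes differ even when the files inside don't. A sketch that compares the members instead, using Python's zipfile module: each zip's central directory already stores the CRC32 and uncompressed size of every member, so nothing has to be extracted. Member names are deliberately ignored, the helper names are hypothetical, and since CRC32 can collide, matches are candidates to confirm by comparing the extracted files' MD5s, as TCD did.

```python
import zipfile
from collections import defaultdict


def zip_fingerprint(path):
    """Sorted (CRC32, uncompressed size) of every file member in a zip."""
    with zipfile.ZipFile(path) as zf:
        return tuple(sorted(
            (info.CRC, info.file_size) for info in zf.infolist() if not info.is_dir()
        ))


def same_content_groups(paths):
    """Groups of zips whose members match, even if the archives differ."""
    groups = defaultdict(list)
    for p in paths:
        groups[zip_fingerprint(p)].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```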

Edit: Just ran across another example:

Contents of the zip files:


File sizes of the zip files:

Last edited by TCD; 15 January 2023 at 08:45.

16 January 2023, 07:34   #3200
TCD
HOL/FTP busy bee
I've moved the technical discussion to a new thread: http://eab.abime.net/showthread.php?t=113252

That way you can still exchange ideas, and this thread stays more readable for us normals.

All times are GMT +2.