English Amiga Board


Go Back   English Amiga Board > Other Projects > project.Amiga File Server

 
 
Thread Tools
Old 14 January 2023, 22:05   #1
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Floppy disk File name technical discussion

Quote:
Originally Posted by Turran View Post
Can't find ErbenDerErde-DieGrosseSuche_2788.zip anywhere anymore. Guess TCD deleted them so not sure what the bug was.

The problem was caused by the filenames:
Non TOSEC IPFs - Official/ErbenDerErde-DieGrosseSuche_2788.zip
Non TOSEC IPFs - Official/ErbenderErde-DieGrosseSuche_2788.zip


Different files, different checksums (but saved in the list as having the same checksum). As you see in the database/md5-file contents, they are in the same directory, have the "same" name (different case "ErbenDer" vs "Erbender"), so they are treated as different files by the file system.
peo is offline  
Old 15 January 2023, 10:22   #2
Turran
Moderator
 
Turran's Avatar
 
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,572
Aha! Totally missed that. Must be a case insensitive sql query that caused that. Perhaps its better to move to base64 like you said. I'll give it a go soon.



Quote:
Originally Posted by peo View Post
The problem was caused by the filenames:
Non TOSEC IPFs - Official/ErbenDerErde-DieGrosseSuche_2788.zip
Non TOSEC IPFs - Official/ErbenderErde-DieGrosseSuche_2788.zip


Different files, different checksums (but saved in the list as having the same checksum). As you see in the database/md5-file contents, they are in the same directory, have the "same" name (different case "ErbenDer" vs "Erbender"), so they are treated as different files by the file system.
Turran is offline  
Old 15 January 2023, 11:12   #3
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by Turran View Post
Aha! Totally missed that. Must be a case insensitive sql query that caused that. Perhaps its better to move to base64 like you said. I'll give it a go soon.

When I changed to have base64-encoded filenames in the database, I stumbled across something unexpected.. I'm used to see that base64-encoded strings always ends with "==", but many of the encoded names ended with "SETVMK".. couldn't find anything useful when searching for that end-sequence..


From my short sample of the filelist (200 entries, while working on it)

Code:
mysql> select path from eabfiles;
+------------------------------------------------------------------------------------------------------+
| path                                                                                                 |
+------------------------------------------------------------------------------------------------------+
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DTE9VRF9LSU5HRE9NUy5ETVMK             |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC81VEhfR0VBUi5ETVMK                     |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo=             |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC82ODhhdHRhY2tzdWIuRE1TCg==             |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DTE9XTk9NQU5JQS5ETVMK                 |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC85X0xJVkVTLkRNUwo=                     |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CRUFTVC5ETVMK                         |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DUllTVEFMUy5ETVMK                     |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9BMTBfVEFOS19LSUxMRVJfMS5ETVMK         |
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CQVRUTEVfU1FVQURST04uRE1TCg==         |
And, another annoying thing.. when converting back to text, it looked like I got some extra content (a linebreak and some whitespace) after each filename (but PHP or some other thing during handling) seems to filter that out:


Code:
mysql> select convert(from_base64(path) using utf8) from eabfiles;
+---------------------------------------------------------------------------+
| convert(from_base64(path) using utf8)                                     |
+---------------------------------------------------------------------------+
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/CLOUD_KINGDOMS.DMS
        |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/5TH_GEAR.DMS
              |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS
         |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/688attacksub.DMS
          |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/CLOWNOMANIA.DMS
           |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/9_LIVES.DMS
               |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BEAST.DMS
                 |
| Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/CRYSTALS.DMS
              |
Just to let you know
It works fine using queries (or maybe I had to add '%' at the end when searching for the exact filename using 'like')
peo is offline  
Old 15 January 2023, 15:37   #4
Turran
Moderator
 
Turran's Avatar
 
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,572
Quote:
Originally Posted by peo View Post
And, another annoying thing.. when converting back to text, it looked like I got some extra content (a linebreak and some whitespace) after each filename

Same here. What I found:
This is actually documented behaviour for to_base64 http://dev.mysql.com/doc/refman/5.6/...tion_to-base64 "A newline is added after each 76 characters of encoded output to divide long output into multiple lines. "

from https://bugs.mysql.com/bug.php?id=78486

Not sure if its related since its from_base64 in our case.. At least I import the base64 value directly and is not using to_base64.

Edit: this works
Code:
replace(FROM_BASE64(filepath), '\n', '')
No need for % in a like (I hate using like...)
Code:
select filepath, replace(FROM_BASE64(filepath), '\n', ''), md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1
or
select filepath, trim(TRAILING '\n' from FROM_BASE64(filepath)), md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1
I put a fresh mysqldump in /~Uploads/turranius if you want it.
database gllog
table md5
Feel free to modify before importing it. =)

Edit2: One problem with using base64 instead of my previous ASCII list is that indexes are never used this way.
Code:
MariaDB [gllog]> explain select filepath, replace(FROM_BASE64(filepath), '\n', '') AS realpath, md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1;
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | md5   | ALL  | NULL          | NULL | NULL    | NULL | 1500731 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
That searches through the entire database each time (1500731 rows).
My previous version with non ASCII gave me 1 possible hit with a good enough index. Luckily, we do not search for the filename that often and I only search for the md5sum when creating the list, so no biggie in this case, but other queries on the name directly will be significantly slower.
If I had not had the "limit 1" there, a full search takes 1.331 seconds which is.. to slow.
Can be solved with storing the name as well in the database of course, but that is what we are trying to prevent having to do..

Last edited by Turran; 15 January 2023 at 16:26.
Turran is offline  
Old 15 January 2023, 17:07   #5
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by Turran View Post
Same here. What I found:
This is actually documented behaviour for to_base64 http://dev.mysql.com/doc/refman/5.6/...tion_to-base64 "A newline is added after each 76 characters of encoded output to divide long output into multiple lines. "

from https://bugs.mysql.com/bug.php?id=78486

Not sure if its related since its from_base64 in our case.. At least I import the base64 value directly and is not using to_base64.

That's actually how it's supposed to work, but when I imported stuff into the database, I used the terribly slow method of converting the paths in the list using 'base64 --wrap=0':
Code:
...tml/eabstuff# base64 <<< 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/CLOUD_KINGDOMS.DMS'
Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DTE9VRF9LSU5H
RE9NUy5ETVMK
...tml/eabstuff# base64 --wrap=0 <<< 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/CLOUD_KINGDOMS.DMS'
Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DTE9VRF9LSU5HRE9NUy5ETVMK
Just to investigate the annoyance a bit, I wrote this test in PHP with base64_encode/decode:
Code:
<?php
  $b64 = "Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9DTE9VRF9LSU5HRE9NUy5ETVMK";
  print "'" . $b64 . "'\n";
  $text = base64_decode($b64);
  print "'" . $text . "'\n";
  print "'" . trim($text) . "'\n";
  $b64 = base64_encode($text);
  print "'" . $b64 . "'\n";
  print "@END"
?>
If you run that, you will see that the base64 encoded string will become the same as it was.. First print of the decoded will put out a newline at the end of the string.. Removed using trim(), and re-encoded, that newline will be put back.. But maybe that is how it's supposed to work (every base64 encoded string includes the newline at the end and adds it if it's not already there)


Quote:

Edit: this works
Code:
replace(FROM_BASE64(filepath), '\n', '')
Seems to be the only way to get rid of the newline..

Quote:
No need for % in a like (I hate using like...)
Code:
 
select filepath, replace(FROM_BASE64(filepath), '\n', ''), md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1
 or
select filepath, trim(TRAILING '\n' from FROM_BASE64(filepath)), md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1
True, when searching for the complete path, but when searching for the same filename (omitting the path to it) % has to be used...
Code:
...  where replace(FROM_BASE64(filepath), '\n', '') like '%/BIO_CHALLENGE.DMS'
Quote:

I put a fresh mysqldump in /~Uploads/turranius if you want it.
database gllog
table md5
Feel free to modify before importing it. =)

Edit2: One problem with using base64 instead of my previous ASCII list is that indexes are never used this way.
Code:
MariaDB [gllog]> explain select filepath, replace(FROM_BASE64(filepath), '\n', '') AS realpath, md5sum FROM md5 where replace(FROM_BASE64(filepath), '\n', '') = 'Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS' LIMIT 1;
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | md5   | ALL  | NULL          | NULL | NULL    | NULL | 1500731 | Using where |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------+
That searches through the entire database each time (1500731 rows).
My previous version with non ASCII gave me 1 possible hit with a good enough index. Luckily, we do not search for the filename that often and I only search for the md5sum when creating the list, so no biggie in this case, but other queries on the name directly will be significantly slower.
If I had not had the "limit 1" there, a full search takes 1.331 seconds which is.. to slow.
Can be solved with storing the name as well in the database of course, but that is what we are trying to prevent having to do..
Not a problem, even if you want to search for the filename (full path).. Just search for the base64 enoded string (and add an index to the field)
peo is offline  
Old 15 January 2023, 17:16   #6
Turran
Moderator
 
Turran's Avatar
 
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,572
Quote:
Originally Posted by peo View Post
That's actually how it's supposed to work, but when I imported stuff into the database, I used the terribly slow method of converting the paths in the list using 'base64 --wrap=0':
Same here. As you say, seems it adds a newline anyway. Annoying.

Quote:
Originally Posted by peo View Post
True, when searching for the complete path, but when searching for the same filename (omitting the path to it) % has to be used...
That is true.. I was thinking of adding the filename to the database directly. I could do that easily before converting it to base64 or course, or of I stored the full path, I could, in my LOAD DATA INFILE, just use
filename = SUBSTRING_INDEX(@path, '/', -1)

Problem is that every little thing I do in the "for each file" loop adds several minutes. So trying to keep it simple.

Quote:
Originally Posted by peo View Post
Not a problem, even if you want to search for the filename (full path).. Just search for the base64 enoded string (and add an index to the field)
Ah, true. Clever.
Turran is offline  
Old 15 January 2023, 17:47   #7
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by Turran View Post
That is true.. I was thinking of adding the filename to the database directly. I could do that easily before converting it to base64 or course, or of I stored the full path, I could, in my LOAD DATA INFILE, just use filename = SUBSTRING_INDEX(@path, '/', -1)
That was what we were trying to avoid by using base64
There will be problems with those unusable characters in filenames if you try to add them to the database..


Index added to the base64 field, then the same lookup as you did:
Code:
mysql> select filepath, convert(replace(FROM_BASE64(filepath), '\n', '') using utf8) AS realpath, md5sum FROM
md5 where filepath='Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo='\G
*************************** 1. row ***************************
filepath: Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo=
realpath: Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS
  md5sum: 97de190f4bd8ccf0f72b62e8df7efe1e
1 row in set, 1 warning (0.00 sec)
mysql>
peo is offline  
Old 15 January 2023, 18:08   #8
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Truly annoying:
https://www.google.com/search?q=newi...+base64+encode


As one would do:

Code:
mysql> select replace(to_base64('Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS'),'\n',''); +------------------------------------------------------------------------------------------------+
| replace(to_base64('Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS'),'\n','') |
+------------------------------------------------------------------------------------------------+
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUw==       |
+------------------------------------------------------------------------------------------------+

But need to do:
Code:
mysql> select replace(to_base64('Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS\n'),'\n','');
+--------------------------------------------------------------------------------------------------+
| replace(to_base64('Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS\n'),'\n','') |
+--------------------------------------------------------------------------------------------------+
| Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo=         |
+--------------------------------------------------------------------------------------------------+

Then the previous search query without giving the base64-encoded path directly:
Code:
mysql> select filepath, convert(replace(FROM_BASE64(filepath), '\n', '') using utf8) AS realpath, md5sum FROM
md5 where filepath=replace(to_base64('Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS\n'),'\n','')\G
*************************** 1. row ***************************
filepath: Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo=
realpath: Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/BIO_CHALLENGE.DMS
  md5sum: 97de190f4bd8ccf0f72b62e8df7efe1e
1 row in set, 1 warning (0.00 sec)
peo is offline  
Old 15 January 2023, 22:20   #9
Turran
Moderator
 
Turran's Avatar
 
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,572
If I want the filename as base64, I can add it in the LOAD DATA INPATH from the base64 of the path =)

Code:
MariaDB [(none)]> select TO_BASE64(SUBSTRING_INDEX(replace(FROM_BASE64('Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9CSU9fQ0hBTExFTkdFLkRNUwo='), '\n', ''), '/', -1)) as "base64_filename from base64";
+-----------------------------+
| base64_filename from base64 |
+-----------------------------+
| QklPX0NIQUxMRU5HRS5ETVM=    |
+-----------------------------+
1 row in set (0.000 sec)


MariaDB [(none)]> SELECT REPLACE(FROM_BASE64('QklPX0NIQUxMRU5HRS5ETVM='), '\n', '') as 'filename from base64';
+----------------------+
| filename from base64 |
+----------------------+
| BIO_CHALLENGE.DMS    |
+----------------------+
1 row in set (0.000 sec)
So..
Code:
$SQL "load data LOCAL infile \"/tmp/filelist_md5_import.txt\" INTO TABLE md5 fields terminated BY '|' (@base64, @md5sum) set base64 = @base64, md5sum = @md5sum, base64filename = TO_BASE64(SUBSTRING_INDEX(replace(FROM_BASE64(@base64), '\n', ''), '/', -1))"
It beats trying to do the conversion to filename as base64 in the "for each" loop for each file...

Database gets pretty unreadable directly though .. =)

Code:
MariaDB [gllog]> select * from md5 where ID = '1507308';
+---------+----------------------------------------------------------------------------------+------------------+----------------------------------+
| ID      | base64                                                                           | base64filename   | md5sum                           |
+---------+----------------------------------------------------------------------------------+------------------+----------------------------------+
| 1507308 | Q29sbGVjdGlvbi9CQlMvQWxmcmVkX0FtaWdhX0NvbnNvbGVfQkJTL0FtaWdhMC9FTFZJUkFfNS5ETVMK | RUxWSVJBXzUuRE1T | 2c7f544921ded01430b340ba023a0f4a |
+---------+----------------------------------------------------------------------------------+------------------+----------------------------------+
1 row in set (0.001 sec)

MariaDB [gllog]> select ID, replace(FROM_BASE64(base64), '\n', '') as fullpath, replace(FROM_BASE64(base64filename), '\n', '') as filename, md5sum from md5 where ID = '1507308';
+---------+-------------------------------------------------------------+--------------+----------------------------------+
| ID      | fullpath                                                    | filename     | md5sum                           |
+---------+-------------------------------------------------------------+--------------+----------------------------------+
| 1507308 | Collection/BBS/Alfred_Amiga_Console_BBS/Amiga0/ELVIRA_5.DMS | ELVIRA_5.DMS | 2c7f544921ded01430b340ba023a0f4a |
+---------+-------------------------------------------------------------+--------------+----------------------------------+
1 row in set (0.000 sec)
We might want to stop posting queries in here.. getting a bit spammy and uninteresting for the rest I guess.. Sorry!

Last edited by Turran; 15 January 2023 at 22:51.
Turran is offline  
Old 16 January 2023, 15:03   #10
TCD
HOL/FTP busy bee
 
TCD's Avatar
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,522
Quote:
Originally Posted by peo View Post
Done.. didn't have name output filtering in order, so I had to fix that one before (and in an ugly way)..


https://peo.yliniemi.se/eabdupes.txt.gz
This is both for you and Turran: Just found an interesting case (I think). There's a file called 'Grafik auf dem Amiga (1987)(Markt&Technik)(DE).zip' in 'TOSEC-PIX/Commodore Amiga - Books'. There is a file with the identical size but one difference in the name in 'Collection/Scans/Books': Grafik auf dem Amiga (1987)(Markt&Technik)(De).zip

So the MD5 is different if one letter of the file name is a different case? (DE vs De)

Both zips have the same file in them.

Edit: Found another one 'Amiga Programmers Handbook Vol I 2nd Edition (1987)(Sybex).zip' in the same two folders.

Last edited by TCD; 16 January 2023 at 15:37.
TCD is online now  
Old 17 January 2023, 10:05   #11
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by TCD View Post
So the MD5 is different if one letter of the file name is a different case? (DE vs De)

Both zips have the same file in them.

Edit: Found another one 'Amiga Programmers Handbook Vol I 2nd Edition (1987)(Sybex).zip' in the same two folders.

No, the md5 for zip files will be different even if created from the same file. Probably because zip stores irrelevant stuff as zip creation datetime. Identical files, no matter how they are named have identical md5 checksums.

This is a reason for either leave already compressed content (pdf, videos, images) uncompressed if possible (when an acrhive contains a single file)

Different content might have the same checksum, but that has so far (as I know) not yet happened for me. There is only 16^32 possible combinations for md5sums..

Last edited by peo; 17 January 2023 at 10:46.
peo is offline  
Old 17 January 2023, 10:12   #12
TCD
HOL/FTP busy bee
 
TCD's Avatar
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,522
Quote:
Originally Posted by peo View Post
Probably because zip stores irrelevant stuff as zip creation datetime.
Okay, that would explain it.
TCD is online now  
Old 20 January 2023, 16:07   #13
Turran
Moderator
 
Turran's Avatar
 
Join Date: May 2012
Location: Stockholm / Sweden
Age: 49
Posts: 1,572
New filelist_md5.zip in the root if anyone wants it. Script is still not scheduled to run automatically so I run it during calm moments =)

I included the filelist_md5_import.txt file in it which is what I use to import to a database using
Code:
"load data LOCAL infile \"/tmp/filelist_md5_import.txt\" INTO TABLE md5 fields terminated BY '|' (@base64, @md5sum) set base64 = @base64, md5sum = @md5sum, base64filename = TO_BASE64(SUBSTRING_INDEX(replace(FROM_BASE64(@base64), '\n', ''), '/', -1))"
Table created as such
Code:
-- Dumping structure for table gllog.md5
CREATE TABLE IF NOT EXISTS `md5` (
  `ID` bigint(20) NOT NULL AUTO_INCREMENT,
  `base64` varchar(400) NOT NULL,
  `base64filename` varchar(255) DEFAULT NULL,
  `md5sum` tinytext NOT NULL,
  KEY `ID` (`ID`),
  KEY `md5sum_index` (`md5sum`(5)),
  KEY `base64_index` (`base64`(100)),
  KEY `base64filename_index` (`base64filename`(10))
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci COMMENT='Contains the md5sum of all files';
Turran is offline  
Old 20 January 2023, 18:16   #14
TCD
HOL/FTP busy bee
 
TCD's Avatar
 
Join Date: Sep 2006
Location: Germany
Age: 46
Posts: 31,522
Found another interesting case (at least for me ):
Code:
== File with md5 NA, 901120 bytes, found in 14 locations ==
One of those files appears as 'Dynamite Dick (C) C.S.adf' in FileZilla, but I guess there's some file name problem that prevents the MD5 calculation.
TCD is online now  
Old 21 January 2023, 15:49   #15
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by TCD View Post
Found another interesting case (at least for me ):
Code:
== File with md5 NA, 901120 bytes, found in 14 locations ==
One of those files appears as 'Dynamite Dick (C) C.S.adf' in FileZilla, but I guess there's some file name problem that prevents the MD5 calculation.

That's because the MD5 field was "NA" in the filelist with the md5sums (at the time I imported it). As you mentioned, probably a file name problem (and indicating 14 copies means nothing in this case, as the files involved could be just anything - all different or all the same)
Code:
2023-01-06T12:02:49+0100|901120|Collection/BBS/BS1 RoniWorlds PART1/010 AMIGA POKER  3.0/Disk1.adf|NA
2022-12-31T16:50:35+0100|901120|Collection/Compilation/Amiga_Up_Yours/Image12/Dynamite Dick      (C) C.S.adf|NA
2022-12-31T16:58:11+0100|901120|Collection/Compilation/Amiga_Up_Yours/image3/X-OUT  DISK ONE   PARANOIM.adf|NA
2022-12-31T16:58:29+0100|901120|Collection/Compilation/Amiga_Up_Yours/image30/DEREK'S RSI DISK  .adf|NA
2022-12-31T16:58:57+0100|901120|Collection/Compilation/Amiga_Up_Yours/image32/VICTORY  - VF -.adf|NA
2022-12-28T13:10:03+0100|901120|Collection/Various/Lynk's Stuff/John/Wings of Fury (1990)(Broderbund) [t  8+][h ACU].adf|NA
2022-12-28T16:26:03+0100|901120|Collection/Various/Lynk's Stuff/NEW/John/Disk 452 Mr. Wobbly Leg  V1.0 (M.C.Diskett).adf|NA
2022-12-28T16:32:04+0100|901120|Collection/Various/Lynk's Stuff/NEW/Marc & Dad/Amiganuts Disk AUE  210.adf|NA
2023-01-05T16:26:42+0100|901120|Non TOSEC Applications/Disk/ADF/Imagine V4.0  rev1.0 d2.adf|NA
2023-01-05T16:26:42+0100|901120|Non TOSEC Applications/Disk/ADF/Imagine V4.0  rev1.0 d3.adf|NA
2023-01-05T16:26:41+0100|901120|Non TOSEC Applications/Disk/ADF/Imagine V4.0  rev1.0 d1.adf|NA
2023-01-05T16:27:27+0100|901120|Non TOSEC Applications/Disk/ADF/STEINBERG  PRO-24 A.adf|NA
2023-01-05T17:27:38+0100|901120|Non TOSEC Games/Disk/Collections/JD_Disks_2020-04/ST-00    .adf|NA
2023-01-05T18:14:12+0100|901120|Non TOSEC Games/Disk/dylan dog 12 -  il lungo addio.adf|NA

Last edited by peo; 21 January 2023 at 16:05. Reason: added listing
peo is offline  
Old 21 January 2023, 16:41   #16
peo
Registered User
 
Join Date: Dec 2008
Location: Ursviken
Posts: 137
Quote:
Originally Posted by Turran View Post
I included the filelist_md5_import.txt file in it which is what I use to import to a database using
Code:
"load data LOCAL infile \"/tmp/filelist_md5_import.txt\" INTO TABLE md5 fields terminated BY '|' (@base64, @md5sum) set base64 = @base64, md5sum = @md5sum, base64filename = TO_BASE64(SUBSTRING_INDEX(replace(FROM_BASE64(@base64), '\n', ''), '/', -1))"

You need a larger field for base64filename:
Code:
Query OK, 1461527 rows affected, 20 warnings (1 min 41.25 sec)
Records: 1461527  Deleted: 0  Skipped: 0  Warnings: 20

mysql> show warnings;
+---------+------+-----------------------------------------------------------+
| Level   | Code | Message                                                   |
+---------+------+-----------------------------------------------------------+
| Warning | 1265 | Data truncated for column 'base64filename' at row 615961  |
| Warning | 1265 | Data truncated for column 'base64filename' at row 621119  |
| Warning | 1265 | Data truncated for column 'base64filename' at row 626187  |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1285275 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1304865 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1306140 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436379 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436435 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436528 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436636 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436853 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436871 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1436941 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1437071 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1437133 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1437153 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1440163 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1440259 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1441249 |
| Warning | 1265 | Data truncated for column 'base64filename' at row 1441251 |
+---------+------+-----------------------------------------------------------+
20 rows in set (0.00 sec)
peo is offline  
Old 29 January 2023, 12:22   #17
rygar
Registered User
 
Join Date: Nov 2007
Location: Poland
Posts: 1,297
There's also problem with
Non TOSEC IPFs - Official:
SaintandGreavsie_2973.zip
SaintAndGreavsie_2973.zip
Both files have different file size but seems to be the same.
rygar is offline  
 


Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools

Similar Threads
Thread Thread Starter Forum Replies Last Post
WinUAE technical questions. kamelito support.WinUAE 6 16 July 2022 20:31
academic technical discussion around the flaws Gorf Coders. System 17 20 February 2021 12:13
Any software to see technical OS details? necronom support.Other 3 02 April 2016 12:05
Looking for technical manuals Leandro Jardim Retrogaming General Discussion 5 30 December 2012 15:36
WB technical question... nikvest support.Apps 6 16 April 2008 09:13

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +2. The time now is 05:25.

Top

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.
Page generated in 0.09312 seconds with 13 queries