peo 27 July 2019 15:17

Timestamps for files on FTP / Gdrive
Is it possible to add a timestamp for each of the files listed in filelist.txt on the FTP server, which gets mirrored over to the Gdrive?

Alternatively, to avoid disturbing those who parse filelist.txt (and are not prepared for even a simple format change), add a second filelist with the timestamps in the second column (before the size).

Running the stat command on the remote files is painfully slow.

To be clearer: I would like the output of stat -c "%Y" to be added (%Y = time of last data modification, in seconds since the Epoch).
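For reference, the same value can be read locally without shelling out to stat. This is a minimal Python sketch; the function name mtime_epoch is hypothetical. It uses st_mtime_ns and floor division so it returns whole seconds like stat's %Y.

```python
import os


def mtime_epoch(path: str) -> int:
    """Last data-modification time of `path` in whole seconds since the Epoch,
    the same value GNU `stat -c "%Y" path` prints."""
    # st_mtime_ns is an exact integer; floor-dividing by 1e9 drops the
    # sub-second part, matching the seconds field that stat reports.
    return os.stat(path).st_mtime_ns // 1_000_000_000
```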

[Or even MD5 checksums for the files. Those are painfully slow to compute for large files, even on a local drive, but that could be worked around by storing each file's MD5 in a database and reusing the stored value whenever the timestamp has not changed.]
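The caching idea could look roughly like this Python sketch. It assumes the "database" is just a JSON file mapping each path to its last seen mtime and MD5; all names here (cached_md5, load_cache, save_cache) are hypothetical. One caveat with second-resolution timestamps: a file rewritten twice within the same second would keep its old checksum.

```python
import hashlib
import json
import os


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in 1 MiB chunks so large files need not fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def cached_md5(path: str, cache: dict) -> str:
    """Return the MD5 of `path`, reusing the cached value when the file's
    modification time (whole seconds since the Epoch) has not changed."""
    mtime = os.stat(path).st_mtime_ns // 1_000_000_000
    entry = cache.get(path)
    if entry is not None and entry["mtime"] == mtime:
        return entry["md5"]            # timestamp unchanged: skip hashing
    digest = md5_of_file(path)         # new or modified file: hash it
    cache[path] = {"mtime": mtime, "md5": digest}
    return digest


def load_cache(db_path: str) -> dict:
    """Load the path -> {mtime, md5} mapping, or start empty on the first run."""
    try:
        with open(db_path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


def save_cache(db_path: str, cache: dict) -> None:
    with open(db_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
```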

peo 28 July 2019 21:12

1 Attachment(s)
This is more of a proof of concept. It might be faster if coded directly as a shell script (I partially did that first, and that code is included too).

Written in PHP, it creates a filelist of the current directory (recursively) in the same format as the EAB FTP filelist.txt, but with added fields for the time (seconds since the Epoch) and the MD5 checksum (between the date and the size).

For benchmarking, I ran it on a smaller set of 179 files with a total size of 235 MB, on a DS1517+ with ordinary 3.5" spinning disks.
The initial run (which checksums all the files) took 2.92 seconds. Subsequent runs took 0.12 seconds (with the output redirected to a file).
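For anyone who wants to try the approach without PHP, here is a rough Python sketch of the same idea, not a port of the attached script. Assumptions: each file's MD5 is kept as a small sidecar file md5path/<relative path>.md5 holding "mtime md5", and since the exact filelist.txt layout isn't shown in this thread, the column order used here (date, time, epoch seconds, md5, size, path) is hypothetical. It also skips md5path if it happens to lie inside the tree, as an extra guard against the doubling problem.

```python
import hashlib
import os
import time


def md5_of_file(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sidecar_md5(root, rel, md5path, mtime):
    """Reuse md5path/<rel>.md5 if its stored mtime matches; otherwise hash
    root/<rel> and (re)write the sidecar."""
    sidecar = os.path.join(md5path, rel + ".md5")
    try:
        with open(sidecar, encoding="ascii") as f:
            stored_mtime, stored_md5 = f.read().split()
        if int(stored_mtime) == mtime:
            return stored_md5
    except (FileNotFoundError, ValueError):
        pass  # no sidecar yet, or malformed: recompute below
    digest = md5_of_file(os.path.join(root, rel))
    os.makedirs(os.path.dirname(sidecar), exist_ok=True)
    with open(sidecar, "w", encoding="ascii") as f:
        f.write(f"{mtime} {digest}\n")
    return digest


def filelist(root, md5path):
    """Yield one line per regular file under root, recursively."""
    root = os.path.abspath(root)
    md5path = os.path.abspath(md5path)
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into md5path, even if it was placed inside root.
        dirnames[:] = sorted(
            d for d in dirnames if os.path.join(dirpath, d) != md5path)
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):  # skips broken symlinks etc.
                continue
            rel = os.path.relpath(full, root)
            st = os.stat(full)
            mtime = st.st_mtime_ns // 1_000_000_000
            md5 = sidecar_md5(root, rel, md5path, mtime)
            date = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
            yield f"{date} {mtime} {md5} {st.st_size} {rel}"
```

Usage would be something like `for line in filelist(".", "/volume1/md5cache"): print(line)`, where the md5cache path is only an example of an absolute location outside the tree.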

1. It relies on some Unix commands such as "md5sum" (this could be implemented in PHP using md5(), but would be slower).
2. Set "md5path" to a path outside the files being checksummed. Otherwise the stored MD5 files would themselves be included in the filelist and checksummed on the next run, doubling the number of files in the list (and doubling again on every run). Preferably, md5path should be an absolute path.
3. Change the current directory to the one you want to create a filelist from. For the first few runs, use a small set of files to check that it does what's expected (it can be run on any directory).
4. There's no cleanup routine in the code. One could be implemented by moving the md5path folder somewhere else and then, for each file in the current filelist.txt, moving its MD5 file back into place. Or simply re-checksum all 2.5 TB and nearly a million files every year or so.
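The cleanup described in point 4 could be sketched in Python as follows. Assumptions: each file's MD5 is stored as md5path/<relative path>.md5 (a hypothetical layout), the caller supplies the relative paths of the files that currently exist (how to extract them from filelist.txt depends on its exact format, which isn't shown here), and the ".old" staging folder name is made up for the example.

```python
import os
import shutil


def prune_md5_cache(md5path, current_files):
    """Move md5path aside, move back only the MD5 files whose source file is
    still in the current list, then delete what is left over (orphans)."""
    old = md5path.rstrip(os.sep) + ".old"  # hypothetical staging folder
    os.rename(md5path, old)
    os.makedirs(md5path)
    kept = 0
    for rel in current_files:
        src = os.path.join(old, rel + ".md5")
        if os.path.isfile(src):
            dst = os.path.join(md5path, rel + ".md5")
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            os.replace(src, dst)
            kept += 1
    shutil.rmtree(old)  # anything remaining belongs to deleted files
    return kept
```

Files in the list that have no stored MD5 yet are simply skipped; they get checksummed on the next filelist run.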

Now all that's left is for someone to implement it for the filelist at EAB...

