I have a large XML file (and an Excel spreadsheet) with all language variations (non-English), dupes, dupes with different names, and dupes with different versions (fast RAM, 2-disk, etc.) marked on it, with a mind to selecting a "best set".
It's not 100% complete but it's a great start.
I'm busy tonight, but I will try to tidy it up and zone it.