06 November 2020, 17:30 | #1 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
SMBFS and UTF-8 Characters
Hi all!
On my Amiga, I use SMBFS to mount volumes that point to shared drives on my RPI, where I keep all my MP3s and modules. It works well and, using the TRANSLATE option, I can read/write files back and forth, even with "typical" accented characters like é or Ô, with:

SMBFS USER=<username> PASSWORD=<password> DOMAIN=GIB SERVICE=//CHAMSAE/Music TRANSLATE=L:FileSystem_Trans/INTL.crossdos

Recently, I added MP3 files with names in Cyrillic or Hangul characters. These files are no problem for either the RPI itself or my Windows laptop because they both handle Unicode file names. But these files do not appear through SMBFS. I was expecting mangled file names, maybe like ????? ????.mp3, so I could have played them anyway, but they do not appear at all.

Is there a way to "see" files with UTF-8 names using SMBFS? Cheers! |
06 November 2020, 19:12 | #2 |
Amigan
Join Date: Feb 2012
Location: London
Posts: 1,309
|
Confirmed! I'm running a self-built SMBFS from Olaf's github dated 20th August.
Montréal works OK |
06 November 2020, 20:18 | #3 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi Nogginthenog!
Yes, Montréal also works for me... But try ????? ? - ???????.mp3 (Cyrillic): it doesn't show at all for me.

And it just so happens that EAB also doesn't support UTF-8! What appears above as ????? ? - ???????.mp3 should be the title of a YouTube video! Cheers! |
08 November 2020, 11:59 | #4 | |
Registered User
Join Date: Aug 2010
Location: Germany
Posts: 532
|
Quote:
This is an option which smbfs (version 1.176 and beyond) can make good use of for the small portion of Unicode which maps exactly to the Amiga default character set (this being ISO 8859-1). If this is all you need, then you don't even have to resort to the file name translation tables which use the original MS-DOS codepage-based scheme.

So, what about the remaining (roughly) 65,280 Unicode characters? Because the Amiga cannot display these characters, smbfs will not attempt to return them during directory scanning and will not allow you to access them. The problem here is that these characters have no sound representation in the Amiga domain. Mapping them to UTF-8 sequences is tricky (what if you want to rename a file or directory?). Also, if you switch to UTF-8 then you would have to encode all characters except for those present in the US-ASCII 7 bit character set.

I have been pondering how to work around that problem since January this year and didn't make much progress beyond that, I'm afraid. The problem is solvable to some degree, but you'd still see the characters which don't fit the Amiga domain in an encoded form. This will have its own drawbacks, since you'd easily have to double or triple the length of the respective file or directory names. The upper limit for these names is 107 characters, and smbfs won't let you use names longer than that (they will not show up in directory lists and remain inaccessible). |
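To make the rule described above concrete, here is a minimal Python sketch (smbfs itself is written in C, and this is not its code). It only restates the two conditions from the post: every character must exist in ISO 8859-1, and the name must not exceed 107 characters. The names `is_visible_on_amiga`, `MAX_NAME_LEN` and `UNREPRESENTABLE` are made up for illustration.

```python
MAX_NAME_LEN = 107  # name length limit quoted in the post


def is_visible_on_amiga(name: str) -> bool:
    """Hypothetical check: True if the name could be shown as-is."""
    # ISO 8859-1 covers exactly the code points 0x00..0xFF.
    fits_latin1 = all(ord(ch) <= 0xFF for ch in name)
    return fits_latin1 and len(name) <= MAX_NAME_LEN


# Number of 16-bit code values without an ISO 8859-1 equivalent:
# 65,536 - 256 = 65,280, the "roughly 65,280" mentioned in the post.
UNREPRESENTABLE = 0x10000 - 0x100
```

Under these assumptions, a name such as Montréal.mp3 passes the check, while any name containing a Cyrillic or Hangul character is filtered out entirely, which matches the behaviour reported at the start of the thread.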
|
08 November 2020, 16:54 | #5 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi Olaf and thank you very much for your answer!
I see that that's a tricky problem... What about something like the MS-DOS/Windows "scheme": converting long file names into 8+3 chars? Let me explain.

If I understand correctly, SMBFS knows that a file name contains chars beyond ISO-8859-1 and, wisely, decides to hide it. Instead, could it convert the file name into a unique encoding of its own? For our beloved Amigas, SMBFS could keep any ISO-8859-1 chars, up to 100 chars, and then append a unique number. The unique number would distinguish two files that could have the same ISO-8859-1 chars by chance. (It could also be used internally by SMBFS to keep a correspondence between files "shown" on the Amiga side and the original files on the RPI, although something more robust could be necessary.)

As with MS-DOS/Windows, this "scheme" would prevent renaming from the Amiga side but would allow accessing (yeah!) and even moving. Would that be possible? Could that create other problems? Cheers!

Last edited by tygre; 08 November 2020 at 16:56. Reason: Fixed proposed naming scheme |
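The proposed scheme can be sketched in a few lines of Python. This is only an illustration of the idea in the post, not anything smbfs does: the function name `make_alias`, the `~` separator and the way the unique number is supplied are all assumptions.

```python
def make_alias(name: str, unique_id: int, max_kept: int = 100) -> str:
    """Hypothetical alias: Latin-1 characters of `name` plus a unique number."""
    # Keep only characters that exist in ISO 8859-1 (code points 0x00..0xFF).
    kept = "".join(ch for ch in name if ord(ch) <= 0xFF)
    # Truncate to the 100 characters suggested in the post.
    kept = kept[:max_kept]
    # Append the number that tells apart names with the same kept characters.
    return f"{kept}~{unique_id}"
```

Appending the number at the very end follows the post literally; a refinement would be to place it before the extension so that programs relying on `.mp3` still recognise the file.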
09 November 2020, 14:00 | #6 | |||
Registered User
Join Date: Aug 2010
Location: Germany
Posts: 532
|
Quote:
Quote:
The alternative is to rescan every directory that may contain encoded file or folder names upon access: the Samba server does that in order to allow for the 8.3 encoding to work, but a client such as smbfs does not have this luxury. smbfs might have to ask the server over and over again for every directory that is part of a path.
Quote:
So this would have to be a 1:1 mapping, I'm afraid. This could work, but it would have to use an "escape character" (or a sequence of characters) which indicates that what follows it is Unicode data. I think this could work if the encoded Unicode data could be stored in a compact form. UTF-8 showed how this could be done.

In order to keep the ISO 8859-1 Amiga file/drawer names I could use an "escape sequence" of two characters, for example. The drawback still is that the file/drawer name length is limited to 107 characters, and with each Unicode character becoming "escape sequence"+2 or 3 encoded characters this may quickly exhaust the available space. And that's not even considering how full path names will work out. Many applications don't allow path names longer than 100-300 characters (and there are, of course, those which don't even check if the full path name fits into the buffer).

This remains a thorny problem. Question is which trade-offs are acceptable. For example, how much memory may smbfs commit to lookup tables, or how often it may rescan directories. |
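A rough Python sketch of such a 1:1 escape encoding shows how quickly the 107-character budget is used up. Everything specific here is an assumption made for readability: the escape sequence `%u`, the four hexadecimal digits (less compact than the 2 or 3 characters envisioned in the post) and the function names.

```python
import re

ESCAPE = "%u"          # hypothetical two-character escape sequence
MAX_NAME_LEN = 107     # name length limit quoted in the thread


def encode_name(name: str) -> str:
    """Escape every character that is not plain ISO 8859-1."""
    out = []
    for ch in name:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("sketch only handles 16-bit code points")
        if code > 0xFF or ch == "%":
            # "%" itself is escaped so that decoding stays unambiguous.
            out.append(f"{ESCAPE}{code:04X}")
        else:
            out.append(ch)
    return "".join(out)


def decode_name(encoded: str) -> str:
    """Reverse encode_name, restoring the original characters."""
    return re.sub(r"%u([0-9A-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), encoded)


def fits(name: str) -> bool:
    """True if the encoded form still fits the name length limit."""
    return len(encode_name(name)) <= MAX_NAME_LEN
```

With this particular encoding each escaped character costs six characters, so a name made only of Hangul syllables stops fitting after 17 of them. A denser encoding would raise that number but not remove the limit.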
|||
09 November 2020, 23:43 | #7 | |||
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi Olaf!
Quote:
Quote:
Quote:
TextFile.Mine.txt <-> TEXTFI~1.TXT

Then, couldn't it also match:

VilleDeMontréal.txt <-> VilleDeMontréal.txt

Where X is some Unicode code point and ? is just some ISO-8859-1 character chosen to replace any Unicode "code point" outside of the ISO-8859-1 chars. The ~ could also be different, maybe using \ to show that these files are for Amiga "consumption" only...

Ironically, this is similar to what EAB does: when I copied/pasted file names with Cyrillic and Hangul characters and saved my post, EAB replaced every "code point" (Cyrillic chars, Hangul syllables) with "?".

Wouldn't that work? Cheers!

Last edited by tygre; 09 November 2020 at 23:53. Reason: Layout, typos, some more details |
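The EAB-style replacement is easy to reproduce in Python, and doing so shows why a distinguishing suffix (the ~1 part of the 8.3 scheme) would still be needed. The helper name `replace_non_latin1` is made up for this sketch.

```python
def replace_non_latin1(name: str) -> str:
    """Replace every character outside ISO 8859-1 with '?'."""
    # errors="replace" substitutes b"?" for each unencodable character.
    return name.encode("latin-1", errors="replace").decode("latin-1")


first = replace_non_latin1("Привет.mp3")
second = replace_non_latin1("Москва.mp3")
unchanged = replace_non_latin1("VilleDeMontréal.txt")
```

Here `first` and `second` both come out as `??????.mp3` although the original names differ, while `unchanged` keeps its accented character. Replacement alone is therefore not a 1:1 mapping.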
|||
10 November 2020, 04:52 | #8 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
PS. I saw in proc.c the function
static int copy_utf16le_to_latin1(byte * to,int to_size,const byte * from,int len)

which is used in a few places, inside ifs like:

if(server->unicode_enabled)

Could this function help? I had tried setting dos charset = UTF-8, but maybe I made a mistake and should try a different combination of other parameters? |
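The body of `copy_utf16le_to_latin1` is not shown in the thread. As a guess at what a function with that name and those parameters plausibly does, here is a Python sketch that converts UTF-16LE bytes to ISO 8859-1 and gives up when a character does not fit. The failure convention (returning `None`) and the handling of `to_size` are assumptions, not the behaviour of the real function in proc.c.

```python
from typing import Optional


def utf16le_to_latin1(data: bytes, to_size: int) -> Optional[bytes]:
    """Convert UTF-16LE bytes to ISO 8859-1, or None if that is impossible."""
    if len(data) % 2 != 0:
        return None  # truncated 16-bit code unit
    out = bytearray()
    for i in range(0, len(data), 2):
        unit = data[i] | (data[i + 1] << 8)  # little-endian 16-bit value
        if unit > 0xFF:
            return None  # no ISO 8859-1 equivalent
        out.append(unit)
    if len(out) > to_size:
        return None  # would not fit the destination buffer
    return bytes(out)
```

If the real function behaves along these lines, it would explain why it cannot help by itself: it can only succeed for names that already fit ISO 8859-1.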
30 May 2021, 04:31 | #9 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi all!
I'm happy to write that I have a proof-of-concept version of SMBFS that can handle files with non-Latin1 characters in their names (like the MP3 of a YouTube video). It's rather "simplistic" right now: it replaces non-Latin1 names with a (unique) numerical name, but it could become smarter... Maybe using Jens' codesets.library?

I wonder if there is any interest (besides mine!) in improving this PoC? Olaf, could I share my PoC with you, maybe via a pull request? Cheers! |
08 June 2021, 19:38 | #10 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
PS. Just sent a pull request to Olaf
|
13 June 2021, 10:29 | #11 | |
Registered User
Join Date: Aug 2010
Location: Germany
Posts: 532
|
Quote:
Hang on... And the changes are committed. Version 2.23 is now tagged and ready for tinkering.
Last edited by Olaf Barthel; 13 June 2021 at 11:37. |
|
13 June 2021, 11:43 | #12 | |
Registered User
Join Date: Aug 2010
Location: Germany
Posts: 532
|
Quote:
The best idea I had on how to achieve something similar would have involved encoding the Unicode characters in the drawer/file name. This would have required only small changes to smbfs, but it would have bumped against the name length limitations. With only 107 characters to work with and any encoding scheme taking up more than two characters to represent a 16 bit value, some names would never have fit. How do you show that directory entries have been omitted because of that? No idea. The same problem already exists for file/drawer names longer than 107 characters.

Your solution does not have this problem because it keeps the original name and its "alias" in memory. The extra memory spent will remain spent until smbfs shuts down, though. There's some room for improvement here, I'd say.

Also, your solution could be extended to file/drawer names longer than 107 characters, which smbfs cannot currently represent. This looks like the way forward to me.

Last edited by Olaf Barthel; 13 June 2021 at 11:58. |
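The alias approach described above can be pictured as a two-way table held in memory. The following Python sketch is not the code from the pull request; the class `AliasTable`, its method names and the plain counter used for the numerical names are assumptions made to illustrate the idea.

```python
from typing import Dict


class AliasTable:
    """Hypothetical in-memory map between server names and Amiga aliases."""

    def __init__(self) -> None:
        self._alias_by_original: Dict[str, str] = {}
        self._original_by_alias: Dict[str, str] = {}
        self._next_id = 1

    def alias_for(self, original: str) -> str:
        """Return the name to show on the Amiga side for a server name."""
        if all(ord(ch) <= 0xFF for ch in original):
            return original  # representable in ISO 8859-1: no alias needed
        alias = self._alias_by_original.get(original)
        if alias is None:
            alias = str(self._next_id)  # unique numerical name
            self._next_id += 1
            self._alias_by_original[original] = alias
            self._original_by_alias[alias] = original
        return alias

    def original_for(self, shown: str) -> str:
        """Map a name used on the Amiga side back to the server name."""
        return self._original_by_alias.get(shown, shown)
```

Entries are never removed in this sketch, which mirrors the remark that the memory stays spent until smbfs shuts down. It also ignores the case where a real file on the server is already called `1`, something a fuller implementation would have to handle.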
|
16 June 2021, 19:07 | #13 | |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Quote:
Cheers! |
|
16 June 2021, 19:19 | #14 | |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi Olaf!
Quote:
Indeed, the max. length of file names (and directory names) is really a hard constraint (in every sense of the term). I actually limit the names to 31 characters because I ran into problems with the Ram Disk and some other programs... But maybe these problems came from my install?

Agreed on extending this solution for directory names! Another thing I'd like to add is a real "transliteration" from Unicode to Latin1, but this seems complicated! Cheers! |
|
17 July 2021, 05:28 | #15 |
Registered User
Join Date: Nov 2017
Location: Rockford IL / USA
Posts: 35
|
I haven't seen any binaries for releases of smbfs for quite a while, and I don't have a build environment set up. Is there somewhere to download them from?
|
18 July 2021, 03:53 | #16 |
Returning fan!
Join Date: Jan 2011
Location: Montréal, QC, Canada
Posts: 1,434
|
Hi n9yty!
No problem, I can share it with you... Where would be more convenient? The Zone maybe? Let me know! |