Feature request: encoding detection

ltn
Keka & Me
Posts: 3
Joined: Fri May 24, 2013 3:27 pm

Feature request: encoding detection

Post by ltn » Fri May 24, 2013 3:42 pm

Hi, Keka developer(s):

I'm a Chinese user and have just started using Keka to unarchive zip files. As far as I know, a zip file usually carries no information about the encoding of its filenames, so if I unzip a zip file whose filenames are encoded in GBK, the extracted filenames come out as unrecognizable characters.

Suppose I have a file 简体&英文.srt whose filename is in GBK (which is quite typical on a Chinese Windows computer), and I compress it into a zip file on Windows. When I then unzip it on a Mac using Keka, the filename becomes ¼òÌå&Ó¢ÎÄ.srt. This is because the default encoding on the Mac is UTF-8, so both the system unzipper and Keka treat the name as UTF-8, but it is actually in GBK, which is the default filename encoding on Chinese Windows.
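
To make the effect concrete, here is a small Python sketch. It reproduces the garbled name quoted above, ¼òÌå&Ó¢ÎÄ.srt, from the GBK bytes of 简体&英文.srt. One assumption on my part: the garbling shown is what you get when the bytes are read with a single-byte Western encoding (Latin-1 here), which is presumably what the extractor falls back to when the bytes are not valid UTF-8.

```python
# GBK bytes of the original name, as a Chinese Windows system would store them.
original = "简体&英文.srt"
raw = original.encode("gbk")

# Assumption: the extractor ends up reading the bytes as Latin-1.
garbled = raw.decode("latin-1")
print(garbled)

# Decoding the very same bytes as GBK recovers the intended name.
recovered = raw.decode("gbk")
print(recovered)
```

Nothing is lost in the archive itself; the bytes are intact and only the interpretation is wrong, which is why choosing the right encoding at extraction time is enough.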

So, can you add encoding auto-detection in Keka for non-english speaking users?

Thank you very much.
Lotusbrod
Translator
Posts: 12
Joined: Fri Feb 22, 2013 3:37 am

Re: Feature request: encoding detection

Post by Lotusbrod » Sat May 25, 2013 10:05 am

Hello ltn

Obviously it is aone who can say whether the functionality you are looking for can be provided, but as I have some experience of automatic encoding detection I thought I would comment. Unfortunately, discovering the encoding of a particular filename is not as straightforward as you might think, especially as the name does not need to make sense in the language you are using. However, there is a way to solve your problem: providing manual encoding selection. I am not saying this is a trivial task, but it is certainly not as difficult as an automated system and would be a lot more reliable.

I would think of it working something like this:
1. add an option to the extraction settings along the lines of "Allow filename encoding to be set"
2. when this flag is set and you drag and drop an archive into the extraction area it opens a dialog to set the encoding. Think of what happens when you open a CSV file in OpenOffice. You get a dialog box which shows encoding, language and lets you preview the contents of the file so you can make sure the encoding you choose gives readable results. The language is picked up from the system environment and the encoding would be UTF-8 by default as that is the system default but you could change both the language and the encoding using dropdown lists. The preview section could then display the structure of the archive so you can check if the encoding you have chosen results in readable filenames after extraction.
3. after finding the right encoding you could then confirm to extract with the chosen encoding and the filenames would be converted to utf-8

Of course, code for converting the filenames would still need to be written (as far as I am aware, Apple does not provide off-the-shelf conversion in Objective-C), and I don't know how big a job that might be.
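
The preview step in the list above could be sketched roughly as follows. This is Python rather than anything Keka actually uses, and the helper names `raw_name` and `preview_names` are made up for illustration. It relies on two things I am fairly sure of: the zip format marks UTF-8 names with general purpose bit 11, and Python's `zipfile` decodes unmarked names as cp437, so re-encoding with cp437 gives back the stored bytes.

```python
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: the name is stored as UTF-8


def raw_name(info: zipfile.ZipInfo) -> bytes:
    """Return the filename bytes as stored in the archive (hypothetical helper)."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename.encode("utf-8")
    # zipfile decoded the unflagged name as cp437, so this round-trips.
    return info.filename.encode("cp437")


def preview_names(archive, encodings):
    """Map each candidate encoding to the decoded names, or None if it fails."""
    with zipfile.ZipFile(archive) as zf:
        raws = [raw_name(info) for info in zf.infolist()]
    preview = {}
    for enc in encodings:
        try:
            preview[enc] = [r.decode(enc) for r in raws]
        except UnicodeDecodeError:
            preview[enc] = None  # this encoding cannot represent the names
    return preview
```

A dialog would show one column of names per candidate encoding and let the user pick the readable one; encodings that map to None could simply be greyed out.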
ltn
Keka & Me
Posts: 3
Joined: Fri May 24, 2013 3:27 pm

Re: Feature request: encoding detection

Post by ltn » Sat May 25, 2013 5:36 pm

Hi Lotusbrod,

Thanks for replying.

As far as I know, it is impossible to detect the encoding of a string with 100% accuracy. But there are plenty of apps, such as TextMate, that can "guess" the encoding of text with rather high accuracy, and if the guess fails, they ask the user to choose the encoding, with a preview window showing the result of decoding. I don't really know how they do it, and I don't think it's a trivial task. But perhaps there's a library that does this?
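
Just to illustrate the simplest kind of "guessing" (not how TextMate does it, which I don't know): try a list of candidate encodings in order and take the first one that decodes without error. The function name and the candidate list are my own choices for this sketch.

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "gbk", "big5", "shift_jis")):
    """Return the first candidate that decodes raw without error, else None."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return enc
    return None
```

The order matters: UTF-8 goes first because it is strict and rarely accepts text from other encodings by accident, whereas many byte strings are valid in several legacy encodings at once. That ambiguity is why real detectors (chardet for Python is one such library) add statistics on top, and why a manual fallback with a preview is still needed.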

Your solution is a very good idea if encoding "guessing" is too hard to implement. Thank you very much.
aone
Mr. Keka
Posts: 271
Joined: Sun Feb 26, 2012 8:42 pm

Re: Feature request: encoding detection

Post by aone » Mon May 27, 2013 9:35 am

I'll have to see if p7zip supports any encoding flag first. If not, this will have to wait for a binary-free Keka build.
aone ~
ltn
Keka & Me
Posts: 3
Joined: Fri May 24, 2013 3:27 pm

Re: Feature request: encoding detection

Post by ltn » Tue Jun 04, 2013 2:57 pm

Hi, if p7zip doesn't support encoding, you can simply use a shell script to convert the file names after unzipping.
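
For example, something along these lines (Python instead of shell, only as a sketch). It assumes the garbled names are GBK bytes that were read as Latin-1; both encodings are parameters, and `fix_name` and `fix_tree` are names I made up.

```python
import os
import unicodedata


def fix_name(name: str, wrong: str = "latin-1", right: str = "gbk") -> str:
    """Undo a mis-decoding: recover the raw bytes, then decode them properly."""
    # macOS may return decomposed (NFD) names; recompose before re-encoding.
    composed = unicodedata.normalize("NFC", name)
    try:
        return composed.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # leave names that do not fit the assumed encodings


def fix_tree(root: str, wrong: str = "latin-1", right: str = "gbk") -> None:
    """Rename every file and directory under root, deepest entries first."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            fixed = fix_name(name, wrong, right)
            if fixed != name:
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, fixed))
```

Plain ASCII names pass through unchanged, and names that are already correct Chinese fail the Latin-1 step and are left alone, so running it twice should do no harm.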

Thanks aone, great work!
SergeyGomanyuk
I know Keka
Posts: 1
Joined: Mon Feb 24, 2014 7:42 am

Re: Feature request: encoding detection

Post by SergeyGomanyuk » Tue Feb 25, 2014 11:09 am

Keka is a very good archiver, but it could be the best one if filename encodings were supported for zip archives. I'd like to bump this topic and describe my situation:
I often deal with zip archives created on Windows, and I create zip archives for people who use Windows. Everything works fine as long as only Latin letters are used in filenames. Once national letters, e.g. Cyrillic, appear in an archive, the trouble starts: the filenames of the extracted files are unreadable. So currently I use "The Unarchiver" for extracting zip archives, because it lets me choose the filename encoding in its zip settings. For creating zip archives I use CleanArchiver, which likewise lets me set the filename encoding of the zip archive. If such settings were available in Keka, it would be a solid and, to my mind, the best archiver for the Mac, one I would not even mind paying for.