Issues with compressing/decompressing in Japanese?

Posted: Mon Apr 16, 2012 10:25 am
by toddspitz
I've loved Keka for all of my files in English, but it's having trouble compressing & decompressing my files with Japanese names.

The file contents are unchanged, but the title becomes unreadable.
I believe the file isn't changed when opened on a Mac, but there is definitely an error when opened on a Windows machine.

I need to be able to open and save files in Japanese for work, so switching to English for the title names unfortunately isn't an option. Any thoughts?

Posted: Tue Apr 17, 2012 7:09 pm
by aone
Let me know if this old version of Keka handles those Japanese files correctly:

This one uses a p7zip binary modified by trip2me. Later versions use updated versions of p7zip without the patch.

Also, could you attach a compressed file (with some empty folders or txt files) that has Japanese characters in the names, so I can test? :mrgreen:

Posted: Fri Feb 22, 2013 6:08 am
by Lotusbrod
Just out of curiosity, was there any follow-up on this? I don't see anything in the tracker, nor any way of sending private messages, which makes me think not.

Maybe toddspitz's problem has gone away. I have no problems with Japanese filenames but have only worked with files on the Mac. Filenames being unreadable on Windows when they are created on a Mac doesn't surprise me but I wouldn't expect an error opening the file just because of the encoding of the filename.

(EDIT: I must be blind. I now see the pm button :oops: . I'm still curious as to whether it was fixed or an isolated/unreplicated problem.)

Posted: Fri Feb 22, 2013 3:24 pm
by aone
This is unresolved. I did not receive any more feedback from toddspitz. I think the problem comes when extracting some files compressed with another app/OS.

Posted: Tue Apr 16, 2013 5:46 am
by Lotusbrod
I finally got around to testing this and, in case someone else is searching the forum for information on it, here is what I found.

toddspitz isn't specific about what "definitely an error when opened on a windows machine" means, but I have tried with both Windows 7 and Windows XP and cannot replicate anything which prevents files from being extracted and then opened.

What is clear is that the filenames extracted from the archive are unreadable, which is not entirely unexpected.

Windows and Mac communicate with each other using SMB. So when the file is copied, the protocol understands that the character encoding is not the same in both environments and helps by specifically changing the encoding during the copy, so the UTF-8 filename of a file created on a Mac is converted to a Shift-JIS filename when copied to Windows, and vice versa. Of course that is just the filename, not the contents, and in the case of an archive that means the filenames of the compressed files (which are contents of the archive) are not touched. So when you extract the files, they have filenames in an encoding which is not appropriate for the OS, and the names therefore break (for people in Japan familiar with the problem, this is "mojibake").
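To make the mojibake effect concrete, here is a minimal Python sketch. It only illustrates what happens when the bytes of a filename are read with the wrong encoding; it is not what Keka or any particular archiver does internally. The filename is made up for the example, and `cp932` is assumed as the Windows flavour of Shift-JIS.

```python
# Hypothetical filename, used only for illustration.
name = "日本語.txt"

# Bytes as a UTF-8 system would store the name inside the archive.
utf8_bytes = name.encode("utf-8")

# A system that assumes Shift-JIS (cp932) reads those same bytes wrongly.
# errors="replace" stands in for byte sequences that are invalid in cp932.
garbled_on_windows = utf8_bytes.decode("cp932", errors="replace")

# The reverse direction: a Shift-JIS name read as if it were UTF-8.
sjis_bytes = name.encode("cp932")
garbled_on_mac = sjis_bytes.decode("utf-8", errors="replace")

print(garbled_on_windows)
print(garbled_on_mac)

# Decoding with the encoding that was actually used restores the name.
print(utf8_bytes.decode("utf-8"))
print(sjis_bytes.decode("cp932"))
```

The bytes never change in either case; only the interpretation does, which is why the file contents stay intact while the names become unreadable.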

I would expect the majority of archiving programs to give the same results.

Depending on the type of file, the application used and the way it is saved, there is also potential for breaking the contents of the file if you try to edit it on a Mac when it was created under Windows. Of course that has nothing to do with Keka (or any archiver); it's just the difference between Mac and Windows.

I've attached a zip file created using Windows XP's built-in archiver, just for interest/understanding the problem.
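For anyone who wants to inspect a zip like this, the following is a rough Python sketch, assuming the archive is a ZIP and that names without the ZIP UTF-8 flag (general purpose bit 11, 0x800) were written in Shift-JIS (cp932). Python's `zipfile` decodes unflagged names as cp437, so the sketch reverses that step. The filename and the helper `recover_legacy_name` are hypothetical, and the archive is built in memory rather than read from the attachment.

```python
import io
import zipfile

# Hypothetical filename, used only for illustration.
name = "日本語.txt"

# Build a small zip in memory; Python's zipfile stores non-ASCII names
# as UTF-8 and sets the UTF-8 flag (bit 11) on the entry.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(name, "contents")

with zipfile.ZipFile(buf) as zf:
    info = zf.infolist()[0]
    has_utf8_flag = bool(info.flag_bits & 0x800)

print(info.filename, has_utf8_flag)


def recover_legacy_name(as_read: str, legacy_encoding: str = "cp932") -> str:
    """Undo zipfile's cp437 fallback and decode with the assumed encoding."""
    return as_read.encode("cp437").decode(legacy_encoding)


# Simulate what zipfile would report for an unflagged Shift-JIS name.
legacy_as_read = name.encode("cp932").decode("cp437")
recovered = recover_legacy_name(legacy_as_read)
print(recovered)
```

If an entry has the flag set, its name can be trusted as UTF-8; if not, the encoding has to be guessed, and cp932 is only a reasonable guess for archives made on Japanese Windows.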