Can you explain the problem in more detail, with a step-by-step list of how to reproduce it? I can't reproduce it using UE v16.

For testing I took one of my ANSI HTML files (6468 bytes), encoded and decoded it - no problem. Next I converted the file to Unicode (UTF-16 LE) and saved it. I again encoded and decoded it - no problem. I added some characters which really must be encoded in Unicode with 2 bytes (German umlauts) and saved the Unicode file. Then I encoded it once again, copied the encoded string to a new ANSI file and decoded it - also a correct result.

Of course, encoding a Unicode text with characters not available in the active code page and decoding this text in an ANSI file results in wrong characters. But that is not a problem of the Base64 encoding/decoding routines. That is the general problem of Unicode to ANSI conversion with characters not available in the ANSI code page.

The encoded data stream does not contain any information about the type of the input data stream. Email programs solve that problem by adding additional information about the original data (= file information like name of file, content-type, etc.) as plain text above the encoded data stream, so that the encoded data can be decoded correctly.

Mofi wrote: Can you explain the problem in more detail, with a step-by-step list of how to reproduce it? I can't reproduce it using UE v16.

1) Create a new file in UltraEdit. Then go to the File Menu -> Conversions -> ASCII to Unicode
3) Highlight the word with your keyboard or mouse

The decoded text will be exactly the same as what you started with. So as you can see, normal ASCII characters are encoded correctly.

Now let's try encoding something with non-ASCII characters.

1) Erase all the text in your editor window
2) Copy and paste the word "привет" into your editor window

As you can see, the decoded text is not what you started with, and therefore we can see that the problem is with the handling of Base64 encoding of non-ASCII text.

I looked at this issue with UE v17.00, and the word привет still does not come back after encoding and decoding it with Base64.

Using Encode Base64 on the UTF-16 string привет results in P0A4MjVC. Omitting the high bytes with value 04, the remaining bytes can be read as an ASCII string. Using Encode Base64 on that ASCII string also results in P0A4MjVC.

So I wanted to know more about Base64 encoding and read (not entirely) the Wikipedia article about Base64. According to this article, Base64 encoding is only for ASCII strings (single-byte strings). For encoding UTF-16 characters, the UTF-7 encoding is needed, which is also called modified Base64.

Now the question is which Base64 encoding is implemented in UltraEdit at all, the standard Base64 or the modified Base64? It looks like standard Base64.

Of course, standard Base64 encoding can be used for binary files. Therefore the UTF-16 characters could be read as a binary array, as in hex edit mode, and so encoding UTF-16 characters with standard Base64 encoding is also possible. But why was UTF-7 encoding introduced if standard Base64 can also be used by reading the UTF-16 strings as a binary data stream?

I don't want to read all the RFCs to get an answer to that question, because for my work with UltraEdit that is totally unimportant. But perhaps my post is an explanation of why Base64 encoding is not working for UTF-16 characters in UltraEdit.

Mofi wrote: I don't want to read all the RFCs to get an answer to that question, because for my work with UltraEdit that is totally unimportant.

It has nothing to do with "standard" or "modified" Base64 encoding, and reading every RFC in the world will not help. The bug in UltraEdit is the result of an improper pointer operation in the string handling routine. The bug is present not only for UTF-16 strings, but also when you specifically tell UltraEdit to produce UTF-8 (on the File menu).

I use Base64 encoding in some of my applications which process string values, and the original unmodified Base64 routines from 20 years ago work perfectly on both UTF-8 and UTF-16 strings. The key to making it work properly is to account for the correct character size/length (and therefore the correct pointer operation) when referencing the characters.
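The two observations in the thread (standard Base64 works on UTF-16 or UTF-8 text as long as every byte is passed to the encoder, and the reported output P0A4MjVC matches the low bytes of the UTF-16 string) can be checked with a short Python sketch. The function names `roundtrip` and `encode_low_bytes_only` are hypothetical, and the "keep one byte per character" step is only an assumption that happens to reproduce the reported result. It is not taken from UltraEdit's actual code.

```python
import base64


def roundtrip(text: str, codec: str) -> str:
    """Encode text to bytes, Base64-encode all of them, then reverse both steps."""
    raw = text.encode(codec)              # full byte sequence of the string
    encoded = base64.b64encode(raw)       # standard Base64 over every byte
    return base64.b64decode(encoded).decode(codec)


def encode_low_bytes_only(text: str) -> tuple[bytes, str]:
    """Assumed failure mode: step through UTF-16 LE data one byte per character.

    Only the low byte of each 2-byte code unit reaches the Base64 encoder,
    so the high bytes (0x04 for these Cyrillic letters) are lost.
    """
    utf16 = text.encode("utf-16-le")
    low_bytes = utf16[0::2]               # every second byte, starting at the low byte
    return low_bytes, base64.b64encode(low_bytes).decode("ascii")


word = "привет"

# Correct handling: the word survives with either encoding.
print(roundtrip(word, "utf-16-le") == word)   # True
print(roundtrip(word, "utf-8") == word)       # True

# Truncated handling: reproduces the string reported in the thread.
low_bytes, truncated = encode_low_bytes_only(word)
print(low_bytes)    # b'?@825B'
print(truncated)    # P0A4MjVC
```

Because the truncated result decodes back to a 6-byte ASCII string, the original Cyrillic text cannot be recovered from it, which is consistent with the decoded text not matching the input in the reproduction steps.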