How to write a Hebrew string to a log4j file. Right now I see ?????? in the file.
I have searched everywhere online to convert Unicode to string:
String abc = myStr.replaceAll("\u200F", "");
abc = abc.replaceAll("\u200E", "");
byte[] utf8Bytes = abc.getBytes(Charset.forName("UTF-8"));
String value = new String(utf8Bytes);
log.debug("value : "+ value );
I just need to write out a Hebrew string to a Log4j file in a readable format. Here is my configuration:
log4j.rootLogger=debug, stdout, R
log4j.logger.testlogging=DEBUG
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd} %5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=C:\\dri\\ums.log
log4j.appender.R.MaxBackupIndex=5
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern= %d{dd MMM yyyy HH:mm:ss,SSS} %5p [%t] (%F:%L) - %m%n
log4j.appender.FILE.encoding=UTF-8
Based on what I've gathered from the comments and my own experience, this is most probably not an issue with Log4j itself. I posted a comment saying just that:
What exactly do you mean by "Log4j file"? Is it a regular text log file that the FileAppender points to? I have just tried printing Hebrew text and it all works fine. I believe this is not a Log4j issue and might be related to your text reader.
Other comments have confirmed their suspicion that it is your text reader that might be causing this issue. I was able to reproduce your issue by doing the following in Notepad++:
Open a new tab in notepad++.
Copy and paste sample text containing Hebrew letters.
Language -> Convert to ANSI
Text before the conversion:
See also: אלף־בית and אַלף־בית
Text after conversion:
See also: ???????? and ?????????
Based on the code you provided (assuming there are no shenanigans behind the scenes that we don't know about), there are two likely explanations: either you are writing to a file whose encoding is set to ANSI, so every special character that cannot be encoded is converted to a question mark, or your characters are written as UTF-8 but merely displayed as ANSI.
ANSI and UTF-8 are both character encodings. ANSI is the common single-byte format used to encode the Latin alphabet, whereas UTF-8 is a variable-length Unicode format (1 to 4 bytes per character) that can encode all possible characters.
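The difference between the two encodings can be reproduced in a few lines of Java. This is only a sketch: the class name EncodingLossDemo is made up for illustration, and ISO-8859-1 is used as a stand-in for a single-byte "ANSI" code page because every JVM is guaranteed to support it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLossDemo {

    // Encode the text with the given charset, then decode the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        // getBytes replaces every character the charset cannot represent
        // with the charset's replacement byte ('?' for ISO-8859-1).
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String hebrew = "\u05E9\u05DC\u05D5\u05DD"; // the Hebrew word "shalom"

        // A single-byte Latin charset has no Hebrew letters, so the text is lost.
        System.out.println(roundTrip(hebrew, StandardCharsets.ISO_8859_1)); // prints ????

        // UTF-8 can encode every Unicode character, so the text survives.
        System.out.println(roundTrip(hebrew, StandardCharsets.UTF_8).equals(hebrew)); // prints true
    }
}
```

Once the question marks are in the bytes, no viewer setting can bring the original letters back, which is why it matters where the conversion happens.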
I would recommend following these steps:
Navigate to Settings -> Preferences -> New Document -> Encoding and make sure that the UTF-8 (Apply to opened ANSI files) option is selected.
Close all your files currently opened in Notepad++ and delete the log file. Make sure you are actually closing the files instead of just closing Notepad++. This should clear the file entries from cache and allow you to open them again with a different encoding.
Run your Java application and let Log4j print to the file.
Open the file with Notepad++ and check that it is being read as UTF-8 by opening the Encoding menu. If the option is not set to UTF-8, change it.
If none of the above worked, please post further information in the comments.
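To take the text editor out of the picture, the bytes of the log file can also be inspected directly. The sketch below is an assumption-laden helper, not part of Log4j: the class name LogFileCheck and its methods are hypothetical, and the path to the log file is expected as the first command-line argument.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LogFileCheck {

    // Strictly decode the bytes as UTF-8; returns null if they are not valid UTF-8.
    static String decodeUtf8OrNull(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // True if the text contains at least one character from the Hebrew block (U+0590..U+05FF).
    static boolean containsHebrew(String text) {
        return text.chars().anyMatch(c -> c >= 0x0590 && c <= 0x05FF);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0])); // path to the log file
        String text = decodeUtf8OrNull(bytes);
        if (text == null) {
            System.out.println("Not valid UTF-8: the file was probably written in another encoding.");
        } else if (containsHebrew(text)) {
            System.out.println("Valid UTF-8 with Hebrew letters: the viewer is the likely culprit.");
        } else {
            System.out.println("Valid UTF-8 but no Hebrew letters: the text was likely lost when writing.");
        }
    }
}
```

If the last message appears, the question marks are real '?' bytes in the file and the fix has to happen on the writing side rather than in Notepad++.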
Unfortunately I am not that well versed in encoding matters and had to look some stuff up in the process of writing this, so I can't help you as much as I would like to. However in addition to providing the steps above I can direct you to the following links which should provide you with further knowledge and (consequently) more insight into your problem:
Encodings And Character Display - Notepad++ Wiki
ANSI to UTF-8 in Notepad++ - SuperUser
I want to download a text file by clicking a button, and everything is working as expected. The problem is that the data I write to the text file ends up on just one line.
String fileContent = "Simple Solution \nDownload Example 1";
Here, \n is not working. The resulting output is:
Simple Solution Download Example 1
Code snippets:
interface:
interface implementation in my service class:
controller:
Don't hardcode \n or \r\n: line separators are platform-specific (Windows differs from the other operating systems).
What you can do is:
Use System.lineSeparator()
Build content with String.format() and replace \n with %n
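Both options can be sketched as follows. The class and method names are made up for illustration; the string is the one from the question.

```java
public class LineBreaks {

    // Option 1: concatenate the platform line separator explicitly.
    static String withLineSeparator() {
        return "Simple Solution " + System.lineSeparator() + "Download Example 1";
    }

    // Option 2: let String.format expand %n to the platform line separator.
    static String withFormat() {
        return String.format("Simple Solution %nDownload Example 1");
    }

    public static void main(String[] args) {
        System.out.println(withLineSeparator());
        System.out.println(withFormat());
        // Both variants produce exactly the same string.
        System.out.println(withLineSeparator().equals(withFormat())); // prints true
    }
}
```

Note that the separator is the one of the machine running the code, which for a download is the server and not the client.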
The main problem is that the server computer and client computer are basically independent with respect to character set encoding and line separators.
Defaults will not do.
As we are living in a Windows-centric world (I am a Linux user), use "\r\n".
A Java string can mix any Unicode scripts, but a file carries no information about its encoding.
If the file originates on another computer or platform, that causes problems.
String fileContent = "Simple Solution façade, mañana, €\r\n"
+ "Download Обичам ĉĝĥĵŝŭ Example 1";
So the originating computer must explicitly define the encoding. It should not do:
fileContent.getBytes(); // Default platform encoding Charset.defaultCharset().
So the originating computer can do:
fileContent.getBytes(StandardCharsets.UTF_8); // UTF-8, full Unicode.
fileContent.getBytes("Windows-1252"); // MS Windows Latin 1, some ? failures.
The contentType can be set appropriately with "text/plain;charset=UTF-8" or for Windows-1252 "text/plain;charset=ISO-8859-1".
And from that byte[] you should take the .length for the contentLength.
Writing to the file can use Files.writeString
In that case use Files.size(exportedPath) for the content length.
Files.newInputStream(exportedPath) is the third goodie from Files.
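Put together, the three Files calls might be used like this. It is a sketch only: the class ExportFile, its methods, and the temporary file are assumptions for illustration, and a real controller would pass the content type, length, and stream to its own response object. Files.writeString needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExportFile {

    // The charset named here must match the one used when writing the file.
    static final String CONTENT_TYPE = "text/plain;charset=UTF-8";

    // Write the content to a temporary file as UTF-8 and return its path.
    static Path export(String content) {
        try {
            Path exportedPath = Files.createTempFile("export", ".txt");
            Files.writeString(exportedPath, content, StandardCharsets.UTF_8);
            return exportedPath;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The content length is the size of the file in bytes, not the number of characters.
    static long contentLength(Path exportedPath) {
        try {
            return Files.size(exportedPath);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        String fileContent = "Simple Solution fa\u00E7ade, ma\u00F1ana, \u20AC\r\n"
                + "Download Example 1";
        Path exportedPath = export(fileContent);
        System.out.println(CONTENT_TYPE);
        System.out.println(contentLength(exportedPath));
        // Stream the file to the client; System.out stands in for the response stream.
        try (InputStream in = Files.newInputStream(exportedPath)) {
            in.transferTo(System.out);
        }
    }
}
```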
The goal is to read from the database and write the records into a file.
When I run the code in IntelliJ IDEA, it writes Unicode characters exactly as they appear in the database.
But when I build the artifact (JAR file) and run it on Windows, the output file shows question marks ('?') instead of the database content.
In other words, although English characters and numbers are shown correctly, the problem occurs with non-English characters (e.g. Persian or Arabic).
Related parts of the Java code:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.txt", true), "cp1256"));
while (resultSet.next()) {
try {
singleRow = resultSet.getString("CODE") + "|"
+ resultSet.getString("ACTIVITY") + "|"
+ resultSet.getString("TEL") + "|"
+ resultSet.getString("ZIPCD") + "|"
+ resultSet.getString("ADDR");
} catch (Exception e) {
LogUtil.writeLog(Constants.LOG_ERROR, e.getMessage());
}
out.write(singleRow + System.getProperty("line.separator"));
}
Output file content by running IntelliJ IDEA DEBUG mode:
130143|Active|ابتداي بلوار ميرداماد،کوچه سوم پلاک پنج|524|35254410
190730|Active|خیابان زیتون، بین انوشه و زیبا پلاک یک|771|92542001
Output file content by running corresponding JAR File:
130143|Active|35254410|524|??? ? ??? ??????? ????? ????
190730|Active|92542001|771|????? ??? ??????? ????? ??? ??
Could you please tell me what is wrong with the program?
You must change your code as follows:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.txt", true), StandardCharsets.UTF_8));
while (resultSet.next()) {
try {
singleRow = resultSet.getString("CODE") + "|"
+ resultSet.getString("ACTIVITY") + "|"
+ resultSet.getString("TEL") + "|"
+ resultSet.getString("ZIPCD") + "|"
+ resultSet.getString("ADDR") ;
} catch (Exception e) {
LogUtil.writeLog(Constants.LOG_ERROR, e.getMessage());
}
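// Note: encoding to UTF-8 bytes and decoding them again yields the same string;
// the charset passed to OutputStreamWriter above decides which bytes reach the file.
byte[] bytes = singleRow.getBytes(StandardCharsets.UTF_8);
String utf8EncodedString = new String(bytes, StandardCharsets.UTF_8);
out.write(utf8EncodedString + System.getProperty("line.separator"));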
}
String.getBytes() uses the system default character set. You can see your environment's charset via:
System.out.println("Charset.defaultCharset="+ Charset.defaultCharset());
When running from IntelliJ, the system default character set is taken from the IntelliJ environment.
When running from a JAR file, the system default character set is taken from the operating system (explained at the end).
Because your Windows and IntelliJ environments use different charsets, you get different output.
It is highly recommended to explicitly specify "ISO-8859-1", "US-ASCII", "UTF-8", or whatever character set you want when converting bytes into Strings or vice versa:
singleRow.getBytes(StandardCharsets.UTF_8)
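The same rule applies to writers and readers. As a sketch of the idea, the hypothetical helper below (the class name, method names, and temporary file are all made up for illustration) names the charset on both the writing and the reading side, so the result no longer depends on Charset.defaultCharset():

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class ExplicitCharsetWriter {

    // Append one line to the file, always encoding it as UTF-8.
    static void appendLine(Path file, String line) {
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(line);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a line to a temporary file and read the file back, both as UTF-8.
    static List<String> writeAndReadBack(String line) {
        try {
            Path file = Files.createTempFile("rows", ".txt");
            appendLine(file, line);
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "\u062E\u06CC\u0627\u0628\u0627\u0646" is the Persian word for "street".
        String row = "190730|Active|\u062E\u06CC\u0627\u0628\u0627\u0646";
        System.out.println(writeAndReadBack(row).get(0).equals(row)); // prints true
    }
}
```

The try-with-resources block also closes the writer, which flushes the buffered output to disk.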
See this link for more information.
What are Windows-1252 and Windows-1256?
Windows-1252
Windows-1252 or CP-1252 (code page 1252) is a single-byte (0-255) character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
The first 128 codes (0-127) are the same as the standard ASCII codes. The other codes (128-255) hold additional characters, such as the accented letters needed for languages like Spanish, French, and German.
Windows-1256
Windows-1256 is a code page used to write Arabic (and possibly some other languages that use Arabic script, like Persian and Urdu) under Microsoft Windows.
It also includes some Windows-1252 Latin characters used for French, since this European language has some historic relevance in former French colonies in North Africa. This allows French and Arabic text to be intermixed when using Windows-1256 without any need for code-page switching (however, upper-case letters with diacritics are not included).
What should I do when using Unicode (Persian) characters?
Persian has characters that look alike but have different code points, such as “ی” (U+06CC) and “ي” (U+064A). Because Windows-1256 has no U+06CC character, this encoding will replace “ی” (U+06CC) with “ي” (U+064A).
For Persian, use UTF-8 instead of Windows-1256 to avoid encoding problems.
Keep in mind that Windows-1256 uses only 1 byte per character, while UTF-8 takes more (1 to 4 bytes).
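The size difference and the missing character can both be checked from Java. This is a rough sketch with a made-up class name; the windows-1256 part is guarded because that charset is not one of the charsets every JVM must provide, and no output is claimed for it here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));      // prints 1 (ASCII letter)
        System.out.println(utf8Length("\u06CC")); // prints 2 (Persian yeh, U+06CC)
        System.out.println(utf8Length("\u20AC")); // prints 3 (euro sign, U+20AC)

        // Ask the JVM whether windows-1256 can represent the two look-alike letters.
        if (Charset.isSupported("windows-1256")) {
            Charset cp1256 = Charset.forName("windows-1256");
            System.out.println("U+06CC encodable: " + cp1256.newEncoder().canEncode('\u06CC'));
            System.out.println("U+064A encodable: " + cp1256.newEncoder().canEncode('\u064A'));
        }
    }
}
```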
A comparison of these encodings is here.
How to change the Windows default character set?
On Microsoft Windows, Windows-1252 is currently the default encoding in most Western countries.
To change your Microsoft Windows default character set to a suitable one, follow this.
If you change it to Persian as follows, your default charset will become Windows-1256.
How to change the character set of specific software (some of it used for programming)?
Change the encoding in each program according to its own instructions.
1- For Notepad++
2- In an XML file or field
3- For IntelliJ files
Open the desired file for editing.
From the main menu, select File | File encoding or click the file encoding on the status bar.
Select the desired encoding from the popup.
If an icon is displayed next to the selected encoding, it means that this encoding might change the file contents. In this case, IntelliJ IDEA opens a dialog where you can decide what you want to do with the file: choose Reload to load the file in the editor from disk and apply encoding changes to the editor only, or choose Convert to overwrite the file with the encoding of your choice.
4- IntelliJ console output encoding
IntelliJ IDEA creates files using the IDE encoding defined in the File Encodings page of the Settings / Preferences dialog Ctrl+Alt+S. You can use either the system default or select from the list of available encodings. By default, this encoding affects console output. If you want the encoding for console output to be different from the global IDE settings, configure the corresponding JVM option:
On the Help menu, click Edit Custom VM Options.
Add the -Dconsole.encoding option and set the value to the necessary encoding. For example: -Dconsole.encoding=UTF-8
Restart IntelliJ IDEA.
I am having problems writing the following string to a file correctly, especially the character "œ". The problem appears on my local machine (Windows 7) and on the server (Linux).
String: "Cœurs d’artichauts grillées"
Works (œ gets displayed correctly, while the apostrophe gets translated into a question mark):
Files.write(path, content.getBytes(StandardCharsets.ISO_8859_1));
Does not work (result in file):
Files.write(path, content.getBytes(StandardCharsets.UTF_8));
According to the first answer to this question, UTF-8 should be able to encode the œ correctly as well. Does anyone have an idea what I am doing wrong?
Your second approach works
String content = "Cœurs d’artichauts grillées";
Path path = Paths.get("out.txt");
Files.write(path, content.getBytes(Charset.forName("UTF-8")));
This produces an out.txt file containing:
Cœurs d’artichauts grillées
Most likely the editor you are using is not displaying the content correctly. You might have to force your editor to use the UTF-8 encoding and a font that displays œ and other non-ASCII characters. Notepad++ and IntelliJ IDEA work out of the box.
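One way to check this without trusting any editor is to look at the bytes themselves. The following sketch (the HexDump class is hypothetical) prints the encoded bytes of the two problematic characters in hexadecimal:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {

    // Encode the text with the given charset and render the bytes as hex, e.g. "C5 93".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask to treat the byte as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u0153", StandardCharsets.UTF_8)); // prints C5 93 (the letter oe)
        System.out.println(hex("\u2019", StandardCharsets.UTF_8)); // prints E2 80 99 (the apostrophe)
        // Neither character exists in ISO-8859-1, so both become '?' (0x3F).
        System.out.println(hex("\u0153\u2019", StandardCharsets.ISO_8859_1)); // prints 3F 3F
    }
}
```

If the file contains C5 93 where the œ should be, the UTF-8 write succeeded and only the display is wrong.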
I'm creating a log file system. The log file will be in JSON format so a server can read it afterwards, but I don't think that's too important. What I need to know is whether log4j can be configured to write to a file without any tags such as info, debug, or timestamp in the file. I have looked here
but this pollutes the file with other things. I want ONLY the data I write to show up in the file. I'd also like to set up some kind of rotation for the file once it reaches a maximum size.
This is relatively easy, using a log4j.properties configuration file (place it at the top of your classpath, and Log4j will 'just find it'):
# This is the default logger, simply logs to console
log4j.logger.com.foo.bar=DEBUG,A1
log4j.appender.A1=org.apache.log4j.ConsoleAppender
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
# Note the Pattern here, emits a lot of stuff - btw, don't use this in production
# %C is expensive - see the Javadoc for ConversionPattern for the meaning of all
# the % modifiers:
log4j.appender.A1.layout.ConversionPattern=%d{MMM dd, HH:mm:ss} [%C{2}] %-5p - %m%n
# Logging to file can be enabled by using this one
log4j.logger.com.example=DEBUG, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=/var/log/generic.log
log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1
# This is the most minimalist layout you can have: just the 'm'essage is emitted
# (and a \n newline):
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%m%n
All the classes in the com.foo.bar package (and subpackages) will log to console, those in com.example (and below) will log to /var/log/generic.log.
To emit JSON, just use Jackson (com.fasterxml) to convert your data to a JSON object and write it out as a string.
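With the %m%n layout each log call becomes exactly one line in the file, so the JSON string must not contain raw line breaks. The sketch below illustrates that point without any dependencies: the hand-written JsonLine class is only a stand-in for what Jackson would do, it handles flat string maps only, and the logger call in the comment is an assumption about how the result would be used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonLine {

    // Escape a value so it is a valid JSON string body and stays on one line.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Render a flat map of strings as a single-line JSON object.
    static String toJsonLine(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> event = new LinkedHashMap<>();
        event.put("user", "alice");
        event.put("msg", "line one\nline two");
        // In the application this string would be handed to the logger, e.g. logger.info(line).
        System.out.println(toJsonLine(event));
    }
}
```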
What you want is the PatternLayout with %m%n only, combined with the answer to a previously asked question here.
You should be able to write a custom log appender. I have a vague recollection of there being an example that does this with MongoDB.
MongoDB stores its data as JSON-like documents, so that appender would have converted the log entry to JSON format before inserting it.
I would start by searching for a MongoDB appender and looking at the file appender that is shipped with Log4j. That should give you a starting point for your own appender if one doesn't already exist.
http://log4mongo.org/display/PUB/Home
When I process a properties file with the Spanish characters ó and é, characters are displayed as ?. I tried different ways to fix this, but still fail:
I tried to use \uxxxx
I tried to use InputStreamReader with encoding UTF-8
I tried to convert string to bytes and then create a new String from those bytes:
new String( val.getBytes("UTF-8"), "UTF-8")
Nothing worked. What should I do next to fix this issue? Japanese and Russian are still OK.
The properties file needs to be in the proper encoding. By default, some IDEs such as Eclipse save the content using CP1252, but you are reading the file as UTF-8. The same applies to your Java source code.
If you use \uxxxx escapes but your application works with CP1252 by default, converting the escape code can result in a bad character.
If you use an InputStreamReader to force reading as UTF-8 but your code and/or your file are not actually UTF-8, the result is a bad character.
If you convert a string through UTF-8 but your source code is saved as CP1252, you will have the same problem.
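The effect of the reader's charset can be shown with a small, self-contained sketch. It assumes the file really is saved as UTF-8 (simulated here with a byte array); the class and method names are made up. Properties.load(InputStream) always decodes as ISO-8859-1, while Properties.load(Reader) uses whatever charset the reader was created with.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesUtf8 {

    // Load properties through a Reader that decodes the bytes as UTF-8.
    static Properties loadUtf8(byte[] fileBytes) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Load properties from the raw stream, which is always decoded as ISO-8859-1.
    static Properties loadFromStream(byte[] fileBytes) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(fileBytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Simulates a properties file saved as UTF-8 that contains the word "cancion" with an accented o.
        byte[] fileBytes = "greeting=canci\u00F3n\n".getBytes(StandardCharsets.UTF_8);

        System.out.println(loadUtf8(fileBytes).getProperty("greeting"));       // the accented o survives
        System.out.println(loadFromStream(fileBytes).getProperty("greeting")); // the accented o turns into two wrong characters
    }
}
```

Whether the result then displays correctly still depends on the encoding of the console or file it is printed to.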
Related previous answer about source code : Should source code be saved in UTF-8 format
Notepad++ has a menu to view and change the encoding of a file. In the "Format" menu you can either view the file as if it were in another encoding, or convert the file to another encoding such as UTF-8.