I want to download a text file by clicking a button, and everything works as expected. The problem is that the data I insert into the text file ends up on just one line.
String fileContent = "Simple Solution \nDownload Example 1";
Here, \n is not working; it results in this output:
Simple Solution Download Example 1
Code snippets:
interface:
interface implementation in my service class:
controller:
Don't use a hardcoded \n or \r\n - line separators are platform-specific (Windows differs from other operating systems).
What you can do is:
Use System.lineSeparator()
Build content with String.format() and replace \n with %n
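A minimal sketch of both options, reusing the content string from the question:

```java
public class LineSeparatorDemo {
    public static void main(String[] args) {
        // Option 1: concatenate with the platform line separator
        String fileContent = "Simple Solution" + System.lineSeparator()
                + "Download Example 1";

        // Option 2: %n in String.format() expands to the same platform separator
        String formatted = String.format("Simple Solution%nDownload Example 1");

        System.out.println(fileContent.equals(formatted)); // true
    }
}
```

Both produce the separator of whatever platform the code runs on, so neither hardcodes \n or \r\n.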
The main problem is that the server computer and the client computer are basically independent with respect to character-set encoding and line separators.
Defaults will not do.
As we are living in a Windows-centric world (I am a linuxer), use "\r\n".
Java strings can mix any Unicode script, but a file carries no information about its own encoding.
If it originates on another computer/platform, that raises problems.
String fileContent = "Simple Solution façade, mañana, €\r\n"
+ "Download Обичам ĉĝĥĵŝŭ Example 1";
So the originating computer should explicitly define the encoding. It should not do:
fileContent.getBytes(); // Default platform encoding Charset.defaultCharset().
So the originating computer can do:
fileContent.getBytes(StandardCharsets.UTF_8); // UTF-8, full Unicode.
fileContent.getBytes("Windows-1252"); // MS Windows Latin 1, some ? failures.
The contentType can be set appropriately with "text/plain;charset=UTF-8" or for Windows-1252 "text/plain;charset=ISO-8859-1".
And from that byte[] you should take the .length for the contentLength.
Writing to the file can use Files.writeString
In that case use Files.size(exportedPath) for the content length.
Files.newInputStream(exportedPath) is the third goodie from Files.
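A sketch of that Files-based flow, assuming Java 11+ for Files.writeString (a temp file stands in for the real export path; wiring into the HTTP response is left as a comment):

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExportSketch {
    public static void main(String[] args) throws Exception {
        String fileContent = "Simple Solution façade, mañana, €\r\n"
                + "Download Обичам ĉĝĥĵŝŭ Example 1";

        Path exportedPath = Files.createTempFile("export", ".txt");
        // Explicit charset - no reliance on the platform default
        Files.writeString(exportedPath, fileContent, StandardCharsets.UTF_8);

        // Content length must be the byte count, not the char count
        long contentLength = Files.size(exportedPath);
        System.out.println(contentLength
                == fileContent.getBytes(StandardCharsets.UTF_8).length); // true

        try (InputStream in = Files.newInputStream(exportedPath)) {
            // stream `in` to the response, with contentLength and
            // contentType "text/plain;charset=UTF-8"
        }
        Files.delete(exportedPath);
    }
}
```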
Related
The goal is to read records from the database and write them to a file.
When I run the code in IntelliJ IDEA, it writes the Unicode characters exactly as they appear in the database.
But when I build the artifact (jar file) and run it on Windows, the output file shows question-mark characters '?' instead of the database content.
In other words, although English characters and numbers display correctly, the problem occurs with non-English characters (e.g. Persian, Arabic, ...).
related parts of java code:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.txt", true), "cp1256"));
while (resultSet.next()) {
try {
singleRow = resultSet.getString("CODE") + "|"
+ resultSet.getString("ACTIVITY") + "|"
+ resultSet.getString("TEL") + "|"
+ resultSet.getString("ZIPCD") + "|"
+ resultSet.getString("ADDR");
} catch (Exception e) {
LogUtil.writeLog(Constants.LOG_ERROR, e.getMessage());
}
out.write(singleRow + System.getProperty("line.separator"));
}
Output file content by running IntelliJ IDEA DEBUG mode:
130143|Active|ابتداي بلوار ميرداماد،کوچه سوم پلاک پنج|524|35254410
190730|Active|خیابان زیتون، بین انوشه و زیبا پلاک یک|771|92542001
Output file content by running corresponding JAR File:
130143|Active|35254410|524|??? ? ??? ??????? ????? ????
190730|Active|92542001|771|????? ??? ??????? ????? ??? ??
Could you please tell me what is wrong with the program?
You must change your code as follows:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("outputFile.txt", true), StandardCharsets.UTF_8));
while (resultSet.next()) {
try {
singleRow = resultSet.getString("CODE") + "|"
+ resultSet.getString("ACTIVITY") + "|"
+ resultSet.getString("TEL") + "|"
+ resultSet.getString("ZIPCD") + "|"
+ resultSet.getString("ADDR") ;
} catch (Exception e) {
LogUtil.writeLog(Constants.LOG_ERROR, e.getMessage());
}
out.write(singleRow + System.lineSeparator());
}
String.getBytes() uses the system default character set. You can see your environment's charset via:
System.out.println("Charset.defaultCharset=" + Charset.defaultCharset());
When running from IntelliJ, the system default character set is taken from the IntelliJ environment.
When running from the jar file, the system default character set is taken from the operating system (explained at the end).
Because of the different charsets of your Windows and IntelliJ environments, you get different output.
It is highly recommended to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes to Strings or vice versa:
singleRow.getBytes(StandardCharsets.UTF_8)
See this link for more information.
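A small demo of what goes wrong: encoding text with a charset that cannot represent it substitutes '?' for each unmappable character, while UTF-8 round-trips any Unicode text (US-ASCII here just stands in for whatever limited default charset the OS supplies):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {
    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset=" + Charset.defaultCharset());

        String persian = "سلام";
        // Unmappable characters become '?' (the charset's replacement byte)
        byte[] ascii = persian.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ????

        // UTF-8 can represent all of Unicode, so the text survives the round trip
        byte[] utf8 = persian.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(persian)); // true
    }
}
```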
What are Windows-1252 and Windows-1256?
Windows-1252
Windows-1252 or CP-1252 (code page 1252) is a single-byte (0-255) character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows for English and many European languages including Spanish, French, and German.
The first 128 codes (0-127) are the same as standard ASCII; the remaining codes (128-255) depend on the system language (Spanish, French, German, ...).
Windows-1256
Windows-1256 is a code page used to write Arabic (and possibly some other languages that use Arabic script, like Persian and Urdu) under Microsoft Windows.
Windows-1256 also includes some Windows-1252 Latin characters used for French, since that language has historic relevance in former French colonies in North Africa. This allowed French and Arabic text to be intermixed when using Windows-1256 without any need for code-page switching (however, uppercase letters with diacritics were not included).
What should I do when using Unicode (Persian) characters?
Persian has distinct characters with similar shapes, such as "ی" (U+06CC) and "ي" (U+064A); encoding with Windows-1256 will replace "ی" with "ي" because Windows-1256 has no U+06CC character.
For Persian, instead of Windows-1256, use UTF-8 encoding to avoid these problems.
Consider that Windows-1256 uses only 1 byte per character, while UTF-8 takes 1 to 4 bytes.
A comparison of these encodings is here.
How to change the Windows default character set?
Currently, Windows-1252 is the default encoding used by Windows systems in most Western countries.
To change your Windows default character set to a suitable encoding, follow this.
If you change it to Persian as follows, your default charset will be changed to Windows-1256.
How to change a specific application's character set (including programming tools)?
You must change each application's encoding according to its own instructions:
1- For Notepad++
2- In an XML file or field
3- For IntelliJ files
Open the desired file for editing.
From the main menu, select File | File encoding or click the file encoding on the status bar.
Select the desired encoding from the popup.
If a warning or error icon is displayed next to the selected encoding, it means that this encoding might change the file contents. In this case, IntelliJ IDEA opens a dialog where you can decide what you want to do with the file: choose Reload to load the file in the editor from disk and apply the encoding change to the editor only, or choose Convert to overwrite the file with the encoding of your choice.
4-IntelliJ Console output encoding
IntelliJ IDEA creates files using the IDE encoding defined in the File Encodings page of the Settings / Preferences dialog Ctrl+Alt+S. You can use either the system default or select from the list of available encodings. By default, this encoding affects console output. If you want the encoding for console output to be different from the global IDE settings, configure the corresponding JVM option:
On the Help menu, click Edit Custom VM Options.
Add the -Dconsole.encoding option and set the value to the necessary encoding. For example: -Dconsole.encoding=UTF-8
Restart IntelliJ IDEA.
I am exporting a .txt file to an sFTP server. When I download the file from the sFTP server, all the text is printed on a single line - the line breaks are not working. When I exported the file to a local folder the line breaks worked perfectly, but from sFTP they do not.
I used System.lineSeparator(), \r\n, \r and several other approaches, but the file still comes out wrong.
I want file should be like below:
test|test|test|test
test|test|test|test
test|test|test|test
But it looks as below after download:
test|test|test|test test|test|test|test test|test|test|test test|test|test|test test|test|test|test
I am using Tomcat server and Java 8 in Linux environment.
You should try:
public static String newline = System.getProperty("line.separator");
If it doesn't work, the "\" might be the problem; you could try doubling it, the first backslash indicating that the second is not an escape character.
There are line-breaks, however different operating systems recognise different sequences for line-breaks.
Notepad only recognises CR, LF (0x0d, 0x0a), whereas other sources might use CR only, or LF only.
You can't make Notepad behave differently, so your only option is to make sure the content has the right sequence for Notepad. Note that Notepad is the only common editor with this restriction, so if your content works in Notepad, it will work everywhere else.
One simple way to fix the line-feeds is to copy and paste the text into Word, then back again into notepad, and the line-feeds will get "corrected" to the CR,LF sequence.
Also you can use other text editors like notepad++, sublime, etc. For more information visit here
FTP is notorious in that a Windows line ending \r\n can be converted to a Unix line ending \n when the file is not transferred as binary data (as opposed to text).
On Windows, a text file containing only \n will not be shown with line breaks in simple text editors like Notepad.
Use another editor like Notepad++ or JEdit.
So
On FTP use binary transfer
Use a programmer's editor
There is also a simple bug where lines are read and the text is composed from those lines, forgetting the dropped newlines:
StringBuilder fileContent = new StringBuilder();
BufferedReader in = new BufferedReader(...);
for (;;) {
    String line = in.readLine(); // No line ending!
    if (line == null) {
        break;
    }
    fileContent.append(line); // Forgotten: `.append("\r\n")`
}
return fileContent.toString();
So
Check the reading code
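A corrected version of that read loop, re-appending the separator that readLine() drops (a StringReader stands in for whatever Reader the real code wraps):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadWithSeparators {
    static String readAll(BufferedReader in) throws IOException {
        StringBuilder fileContent = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            fileContent.append(line).append("\r\n"); // put back what readLine() dropped
        }
        return fileContent.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new StringReader("test|test|test|test\ntest|test|test|test\n"));
        System.out.print(readAll(in)); // two lines, each now ending in \r\n
    }
}
```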
I download a file from a website using a Java program and the header looks like below
Content-Disposition: attachment;filename="Textkürzung.asc";
There is no encoding specified
What I do is after downloading I pass the name of the file to another application for further processing. I use
System.out.println(filename);
In the standard out the string is printed as Textk³rzung.asc
How can I change the Standard Out to "UTF-8" in Java?
I tried to encode to "UTF-8" and the content is still the same
Update:
I was able to fix this without any code change. In the place where I call my jar file from the other application, I did the following:
java -Dfile.encoding=UTF-8 -jar ....
This seems to have fixed the issue.
thank you all for your support
The default encoding of System.out is the operating system default. On international versions of Windows this is usually the windows-1252 codepage. If you're running your code on the command line, that is also the encoding the terminal expects, so special characters are displayed correctly. But if you are running the code some other way, or sending the output to a file or another program, it might be expecting a different encoding. In your case, apparently, UTF-8.
You can actually change the encoding of System.out by replacing it:
try {
System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
} catch (UnsupportedEncodingException e) {
throw new InternalError("VM does not support mandatory encoding UTF-8");
}
This works for cases where using a new PrintStream is not an option, for instance because the output is coming from library code which you cannot change, and where you have no control over system properties, or where changing the default encoding of all files is not appropriate.
The result you're seeing suggests your console expects text to be in Windows "code page 850" encoding - the character ü has Unicode code point U+00FC. The byte value 0xFC renders in Windows code page 850 as ³. So if you want the name to appear correctly on the console then you need to print it using the encoding "Cp850":
PrintWriter consoleOut = new PrintWriter(new OutputStreamWriter(System.out, "Cp850"));
consoleOut.println(filename);
Whether this is what your "other application" expects is a different question - the other app will only see the correct name if it is reading its standard input as Cp850 too.
Try to use:
PrintStream out = new PrintStream(System.out, true, "UTF-8");
out.println(filename);
Background:
I have 2 machines: one is running German windows 7 and my PC running English(with Hebrew locale) windows 7.
In my Perl code I'm trying to check if the file that I got from the German machine exists on my machine.
The file name is ßßßzllpoöäüljiznppü.txt
Why does it fail when I run the following code:
use Encode;
use Encode::locale;
sub UTF8ToLocale
{
my $str = decode("utf8",$_[0]);
return encode(locale, $str);
}
if(!-e UTF8ToLocale($read_file))
{
print "failed to open the file";
}
else
{
print $read_file;
}
The same thing happens when I try to open the file:
open (wtFile, ">", UTF8ToLocale($read_file));
binmode wtFile;
shift @_;
print wtFile @_;
close wtFile;
The German file name is converted to UTF-8 in my Java application, and this is passed to the Perl script.
The Perl script takes this file name and converts it from UTF-8 to the system locale (see the UTF8ToLocale($read_file) call), and I believe that is the problem.
Questions:
Can you please tell me what the OS file system charset encoding is?
When I create a German file name on an OS whose locale is Hebrew, in which charset is it saved?
How do I solve this problem?
Update:
Here is other code that I ran with a hard-coded file name on my PC; the script file is UTF-8 encoded:
use Encode;
use Encode::locale;
my $string = encode("utf-16",decode("utf8","C:\\TestPerl\\ßßßzllpoöäüljiznppü.txt"));
if (-e $string)
{
print "exists\r\n";
}
else
{
print "not exists\r\n"
}
The output is "not exists".
I also tried different charsets: cp1252, cp850, utf-16le, nothing works.
If I change the file name to English or Hebrew (my default locale), it works.
Any ideas?
Windows 7 uses UTF-16 internally [citation needed] (I don't remember the byte order). You don't need to convert file names because of that. However, if you transport the file via a FAT file system (e.g. an old USB stick) or other non-Unicode-aware file systems, these benefits are lost.
The locale setting you are talking about only affects the language of the user interface and the apparent folder names (Programme (x86) vs. Program Files (x86), with the latter being the real name in the file system).
The larger problem I can see is the internal encoding of the file contents that you want to transfer as some applications may default to different encodings depending on the locale. There is no solution to that except being explicit when the file is created. Sticking to UTF-8 is generally a good idea.
And why do you convert the file names with another tool? Any Unicode encoding should be sufficient for transfer.
Your script does not work because you reference an undefined global variable called $read_file. Assuming your second code block is not enclosed in any scope, especially not in a sub, the @_ variable is not available. To get command-line arguments you should consider using the @ARGV array. The logic of your script isn't clear anyway: you print error messages to STDOUT, not STDERR; you "decode" the file name and then print the non-decoded string in your else branch; you are paranoid about encodings (which is generally good) but you don't specify an encoding for your output stream; etc.
I have a Java application that receives data over a socket using an InputStreamReader. It reports "Cp1252" from its getEncoding method:
/* java.net. */ Socket Sock = ...;
InputStreamReader is = new InputStreamReader(Sock.getInputStream());
System.out.println("Character encoding = " + is.getEncoding());
// Prints "Character encoding = Cp1252"
That doesn't necessarily match what the system reports as its code page. For example:
C:\>chcp
Active code page: 850
The application may receive byte 0x81, which in code page 850 represents the character ü. The program interprets that byte with code page 1252, which doesn't define any character at that value, so I get a question mark instead.
I was able to work around this problem for one customer who used code page 850 by adding another command-line option in the batch file that launches the application:
java.exe -Dfile.encoding=Cp850 ...
But not all my customers use code page 850, of course. How can I get Java to use a code page that's compatible with the underlying Windows system? My preference would be something I could just put in the batch file, leaving the Java code untouched:
ENC=...
java.exe -Dfile.encoding=%ENC% ...
The default encoding used by cmd.exe is Cp850 (or whatever "OEM" CP is native to the OS); the system encoding is Cp1252 (or whatever "ANSI" CP is native to the OS). Gory details here. One way to discover the console encoding would be to do it via native code (see GetConsoleOutputCP for current console encoding; see GetACP for default "ANSI" encoding; etc.).
Altering the encoding via the -D switch is going to affect all your default encoding mechanisms, including redirected stdout/stdin/stderr. It is not an ideal solution.
I came up with this WSH script that can set the console to the system ANSI codepage, but haven't figured out how to programmatically switch to a TrueType font.
'file: setacp.vbs
'usage: cscript /Nologo setacp.vbs
Set objShell = CreateObject("WScript.Shell")
'replace ACP (ANSI) with OEMCP for default console CP
cp = objShell.RegRead("HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001" &_
"\Control\Nls\CodePage\ACP")
WScript.Echo "Switching console code page to " & cp
objShell.Exec "chcp.com " & cp
(This is my first WSH script, so it may be flawed - I'm not familiar with registry read permissions.)
Using a TrueType font is another requirement for using ANSI/Unicode with cmd.exe. I'm going to look at a programmatic switch to a better font when time permits.
In regards to the code snippet, the right answer is to use the appropriate constructor for InputStreamReader that does the correct code conversion. That way it won't matter what the system's default encoding is; you know you are getting a correct decoding that corresponds to what you are receiving on the socket.
Then you can specify the encoding when you write out files if you need to, rather than relying on the system encoding. Of course, when files are opened on that system they may still have issues, but modern Windows systems support UTF-8, so you can write the file in UTF-8 if you need to (internally, Java represents all Strings as 16-bit Unicode).
I would think this is the "right" solution in general, the one most compatible with the largest range of underlying systems.
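A sketch of that fix; a ByteArrayInputStream stands in for the socket stream so the decoding is visible in isolation:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExplicitCharsetRead {
    public static void main(String[] args) throws IOException {
        // Byte 0x81 is 'ü' in code page 850; pretend it arrived on the socket.
        byte[] received = { (byte) 0x81 };
        // Name the charset the peer actually used instead of the platform default:
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(received), "Cp850");
        System.out.println((char) reader.read()); // ü
    }
}
```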
If the code page value returned by the chcp command is the value you need, you can use the following command to capture it:
C:\>for /F "Tokens=4" %I in ('chcp') Do Set CodePage=%I
This sets the variable CodePage to the code page value returned from chcp
C:\>echo %CodePage%
437
You could use this value in your bat file by prefixing it with Cp
C:\>echo Cp%CodePage%
Cp437
When you put this into a bat file, the %I values in the first command will need to be replaced with %%I.
Windows has the added complication of having two active codepages. In your example both 1252 and 850 are correct, but they depend on the way the program is being run. For GUI applications, Windows will use the ANSI code page, which for Western European languages will typically be 1252. However, the command line will report the OEM codepage which is 850 for the same locales.