File charset

I have an application which processes some text and then saves it to a file.
When I run it from the NetBeans IDE, both System.out and PrintWriter work correctly and non-ASCII characters are displayed/saved correctly. But if I run the JAR from the Windows 7 command line (which uses the cp1250 (Central European) encoding in this case), the screen output and the saved file are broken.
I tried passing UTF-8 to PrintWriter's constructor, but it didn't help… and it can't affect System.out anyway, which is corrupted either way.
Why is it working in the IDE and not in cmd.exe?
I would understand that System.out has some problems, but why is also output file affected?
How can I fix this issue?

I just had the same problem.
The actual reason is that when your code runs inside the NetBeans environment, NetBeans automatically sets system properties for you.
You can see this yourself: when you run your code from NetBeans, the line below probably prints "UTF-8", but when you run it from cmd you will most likely see something like "cp1256".
System.out.println(System.getProperty("file.encoding"));
Note that while calling setProperty changes what getProperty returns, it has no effect on input/output, because the standard streams are all set up before the main function is called.
With this background in mind, when you want to read from files and write to them, it's better to use the code below:
File f = new File(sourcePath);
For reading:
InputStreamReader isr = new InputStreamReader(
new FileInputStream(f), Charset.forName("UTF-8"));
and for writing (I have not tested this):
OutputStreamWriter osw = new OutputStreamWriter(
new FileOutputStream(f), Charset.forName("UTF-8"));
The main difference is that these classes take the required Charset in their constructors, whereas classes like FileWriter and PrintWriter don't.
I hope that works for you.
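For the asker's two specific symptoms, here is a minimal sketch (my own, not part of the answer above; the file name and sample string are made up) of a PrintWriter built over an OutputStreamWriter with an explicit charset for the file, plus a replacement System.out for the console:
import java.io.File;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8OutputDemo {
    public static void main(String[] args) throws IOException {
        // File output: the explicit charset means the platform default (e.g. cp1250) is never consulted.
        try (PrintWriter pw = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(new File("out.txt")), StandardCharsets.UTF_8))) {
            pw.println("Příliš žluťoučký kůň");
        }
        // Console output: replace System.out with a UTF-8 PrintStream.
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
        System.out.println("Příliš žluťoučký kůň");
    }
}
Note that whether the console then renders the characters correctly still depends on its own code page (e.g. chcp 65001 in cmd.exe) and font.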

Related

Why can't I get a file from resources?
URL resource = getClass().getClassLoader().getResource("input data/logic test.csv");
System.out.println("Found "+resource);
CSVParser parser = new CSVParserBuilder().withSeparator(';').build();
CSVReader reader = new CSVReaderBuilder(new FileReader(resource.getFile())).withSkipLines(1).withCSVParser(parser).build();
Console output:
Found file:/home/alexandr/Repos/OTUS/first_home_work/target/classes/input%20data/logic%20test.csv
Exception in thread "main" java.io.FileNotFoundException: /home/alexandr/Repos/OTUS/first_home_work/target/classes/input%20data/logic%20test.csv (No such file or directory)
There is an inherent logic problem with this line:
CSVReader reader = new CSVReaderBuilder(
new FileReader(resource.getFile()))..
Once the CSV is part of a Jar, it will no longer be accessible as a File object. But something like this should work directly for the URL.
CSVReader reader = new CSVReaderBuilder(
new InputStreamReader(resource.openStream()))..
I changed the spaces to _ in the directory name and file name, and it works now.
This will only work while the resource is not in a Jar file.
Do it like this:
try (InputStream raw = ClassThisIn.class.getResourceAsStream("input data/logic test.csv")) {
    InputStreamReader isr = new InputStreamReader(raw, StandardCharsets.UTF_8);
    BufferedReader br = new BufferedReader(isr);
    // now use br as if it was your filereader.
}
This addresses many issues:
It still works regardless of how you're running it: your snippet only works when running from plain class files (vs., say, from a jar), and doesn't work if spaces are involved.
It still works even if your class is subclassed (getClass().getClassLoader().getResource won't, which is why you should not do that).
It still works even if the platform's default charset is something unusual (the snippet in this answer is explicit about the encoding; that's always a good idea).
It doesn't leak resources: your code never safely closes the reader you open. If you open resources, either do so in a try-with-resources construct, or store the resource in a field and implement AutoCloseable (a sketch of that pattern follows below).
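As a rough sketch of the second option from the last point (storing the resource in a field and implementing AutoCloseable), something like the class below could work; the class name LogicTestData is made up for illustration:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class LogicTestData implements AutoCloseable {
    private final BufferedReader reader;

    public LogicTestData() throws IOException {
        InputStream raw = LogicTestData.class.getResourceAsStream("input data/logic test.csv");
        if (raw == null) {
            throw new IOException("resource not found");
        }
        this.reader = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
    }

    public BufferedReader reader() {
        return reader;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }
}
Callers can then use it as try (LogicTestData data = new LogicTestData()) { ... } and the underlying stream is closed for them.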
I changed the spaces to _ in the directory name and file name, and it works.... omg.
The answer is in your console output - the file was simply not found.
I would try the same code you have written, but with a file whose path has no spaces in it, and see if the file is still not found.
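For reference, the %20 in the printed URL is the give-away: getFile() returns the URL-encoded path, and the file system does not understand the percent-encoding. While the resource is still a plain file on disk (i.e. not yet packed into a jar), one sketch of a way to keep the spaces is to go through the URI, which decodes them:
import java.io.File;
import java.io.FileReader;
import java.net.URL;

public class ResourcePathDemo {
    public static void main(String[] args) throws Exception {
        URL resource = ResourcePathDemo.class.getClassLoader()
                .getResource("input data/logic test.csv");
        // toURI() plus the File(URI) constructor decode the %20 back into a real space.
        File csv = new File(resource.toURI());
        try (FileReader fr = new FileReader(csv)) {
            System.out.println("Opened: " + csv.getAbsolutePath());
        }
    }
}
As the earlier answer points out, this still fails once the CSV lives inside a jar; the getResourceAsStream approach is the more robust one.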

Program detects string when running in eclipse, but not when running from jar (possible encoding issue)

I have a program that grabs some strings from a location and puts them in a list. I also have an "exclusions list" that loads from a file. If the current string is in the exclusions list, it gets ignored.
In my exclusions list file, I have this string:
Something ›
Note, that is not a typical angle bracket. It's a special character (dec value 8250)
When I run this in Eclipse, everything works perfectly. My program sees that the Something › is in the exclusions list and ignores it. However, when I build and run my program as a jar, the Something › does not get ignored. Everything else works fine, it's just that one string.
I'm assuming it's because of the ›, which means it must be encoding related. However, I have the text file saved as UTF-8 (without BOM), and my eclipse is configured as UTF-8, too. Any ideas?
This seems to have fixed it. I changed the way it loaded the text file from:
Scanner fileIn = new Scanner(new File(filePath));
to
Scanner fileIn = new Scanner(new FileInputStream(filePath), "UTF-8");
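An equivalent sketch (my addition, not from the original post) using java.nio, which also makes the charset explicit; the class and method names are made up:
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ExclusionsLoader {
    // Reads the exclusions file with an explicit charset instead of the platform default.
    static List<String> loadExclusions(String filePath) throws IOException {
        return Files.readAllLines(Paths.get(filePath), StandardCharsets.UTF_8);
    }
}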

Different Result in Java Netbeans Program

I am working on a small program to find text in a text file, but I am getting a different result depending on how I run my program.
When running my program from NetBeans I get 866 matches.
When running my program by double-clicking the .jar file in the dist folder, I get 1209 matches (the correct number).
It seems that when I'm running the program from NetBeans, it doesn't get to the end of the text file. Is that to be expected?
Text File in question
Here is my code for reading the file:
@FXML
public void loadFile(){
    //Loading file
    try{
        linelist.clear();
        aclist.clear();
        reader = new Scanner(new File(filepathinput));
        while(reader.hasNext()){
            linelist.add(reader.nextLine());
        }
        for(int i = 0; i < linelist.size()-1; i++){
            if(linelist.get(i).startsWith("AC#")){
                aclist.add(linelist.get(i));
            }
        }
    }
    catch(java.io.FileNotFoundException e){
        System.out.println(e);
    }
    finally{
        String accountString = String.valueOf(aclist.size());
        account.setText(accountString);
        reader.close();
    }
}
The problem is an incompatibility between the Java app's (i.e. the JVM's) default file encoding and the input file's encoding.
The file's encoding is "ANSI", which commonly maps to the Windows-1252 encoding (or one of its variants) on Windows machines.
When running the app from the command prompt, the JVM (and therefore the Scanner, implicitly) uses the system default file encoding, which is Windows-1252. Reading the file in that same encoding causes no problem with this setup.
However, NetBeans by default sets the project encoding to UTF-8, so when the app is run from NetBeans its file encoding is UTF-8. Reading the file with this encoding confuses the scanner. The character "ï" (0xEF) in the text "Caraïbes" is the cause of the problem: since it is one of the bytes of the BOM sequence (0xEF 0xBB 0xBF), it somehow messes up the scanner.
As a solution,
either specify the encoding of the scanner explicitly:
reader = new Scanner(file, "windows-1252");
or convert the input file's encoding to UTF-8 (using Notepad, or better Notepad++) and set the scanner's encoding to UTF-8 rather than relying on the system default:
reader = new Scanner(file, "utf-8");
However, when different OSes are considered, working with UTF-8 everywhere is the preferred way of dealing with multi-platform environments, so the second way is the one to go with.
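One extra wrinkle worth noting (my addition, not part of the answer above): Java does not strip a UTF-8 BOM automatically, so if the converted file starts with one, it shows up as the character '\uFEFF' glued to the first line. A small sketch of skipping it while reading (the file name is made up):
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class BomAwareRead {
    public static void main(String[] args) throws FileNotFoundException {
        Scanner reader = new Scanner(new File("accounts.txt"), "utf-8");
        while (reader.hasNextLine()) {
            String line = reader.nextLine();
            // A UTF-8 BOM decodes to U+FEFF; drop it if it is stuck to the first line.
            if (!line.isEmpty() && line.charAt(0) == '\uFEFF') {
                line = line.substring(1);
            }
            System.out.println(line);
        }
        reader.close();
    }
}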
It can also depend on the filepathinput value: the jar and NetBeans runs might be referring to two different files, possibly with the same name in different locations. Can you give more information on the filepathinput variable's value?

Java text encoding

I read lines from a .txt file into a String list. I show the text in a JTextPane. The encoding is fine when running from Eclipse or NetBeans, however if I create a jar, the encoding is not correct. The encoding of the file is UTF-8. Is there a way to solve this problem?
Your problem is probably that you're opening a reader using the platform encoding.
You should manually specify the encoding whenever you convert between bytes and characters. If you know that the appropriate encoding is UTF-8 you can open a file thus:
FileInputStream inputFile = new FileInputStream(myFile);
try {
    Reader reader = new InputStreamReader(inputFile, "UTF-8");
    // Maybe buffer the reader and do something with it.
} finally {
    inputFile.close();
}
Libraries like Guava can make this whole process easier.
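For example, a minimal sketch with Guava (assuming it is on the classpath; myFile is the same File as above):
import com.google.common.io.Files;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class GuavaReadDemo {
    static List<String> readLines(File myFile) throws IOException {
        // Guava opens, decodes and closes the file in one call.
        return Files.asCharSource(myFile, StandardCharsets.UTF_8).readLines();
    }
}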
Have you tried running your jar as
java -Dfile.encoding=utf-8 -jar xxx.jar

Writing to a file in unix not working; in windows it works

Given:
try{
    FileWriter fw = new FileWriter(someFileName);
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write("Hello Java");
}catch...
}finally{
    bw.close();
}
It works perfectly on Windows, but not on Unix.
Remark: the created file on Unix has full 777 permissions!
What should I do to get it working on Unix?
Thanks,
Roxana
Try doing a
bw.flush();
before closing the file (in the try block).
Maybe the information is still in the buffer, so it doesn't get reflected in the file contents.
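A sketch of the same write done with try-with-resources (my suggestion, not part of the answer above), which flushes and closes the writer automatically even when an exception is thrown:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class WriteDemo {
    static void writeGreeting(String someFileName) throws IOException {
        // close() is called automatically when the block exits, and close() flushes the buffer.
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(someFileName))) {
            bw.write("Hello Java");
        }
    }
}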
You should give us some more code, especially the section where someFileName is built. Since there is some difference in how Java treats the 'file separator', your problem could be that you're creating/opening the file on Windows but not on Unix... and your 'catch' is handling it, but you didn't provide its contents.
Take a look here
"file.separator" --> Character that separates components of a file path. This is "/" on UNIX and "\" on Windows.
