Vscode doesn't recognize Umlaute (äöü) when reading and writing files with Java

I have a Java project that reads a .txt file, counts the frequency of every word, and saves each word along with its frequency in a .stat file. I read the file with a BufferedReader, use replaceAll to replace all special characters with spaces, iterate through the words, and finally write the result to the .stat file with a PrintWriter.
This program works fine if I run it in Eclipse.
However, if I run it in VSCode, the Umlaute (äöü) are treated as special characters and removed from the words.
If I don't use replaceAll and leave all the special characters in the text, they are recognized and displayed normally in the .stat file.
If I use replaceAll("[^\\p{IsAlphabetic}+]", " "), the Umlaute are replaced by all kinds of weird Unicode characters (for example Ăbermut instead of Übermut).
If I use replaceAll("[^a-zA-ZäöüÄÖÜß]", " "), the Umlaute are just replaced by spaces. The same happens if I specify the Umlaute via their Unicode escapes.
This has to be a problem with the encoding in VSCode or perhaps PowerShell, as it works fine in other IDEs.
I already checked that Eclipse and VSCode use the same JDK version. It's 17.0.5, the only one installed on my machine.
I also tried all the different encoding settings in VSCode and recreated the project from scratch after changing the settings, to no avail.
Here's the code of the minimal reproducible example:
import java.io.*;
public class App {
    static String s;
    public static void main(String[] args) {
        Reader reader = new Reader();
        reader.readFile();
    }
}
public class Reader {
    public void readFile() {
        String s = null;
        File file = new File("./src/textfile.txt");
        try (FileReader fileReader = new FileReader(file);
                BufferedReader bufferedReader = new BufferedReader(fileReader);) {
            s = bufferedReader.readLine();
        } catch (FileNotFoundException ex) {
            // TODO: handle exception
        } catch (IOException ex) {
            System.out.println("IOException");
        }
        System.out.println(s);
        System.out.println(s.replaceAll("[a-zA-ZäöüÄÖÜß]", " "));
    }
}
My textfile.txt contains the line "abcABCäöüÄÖÜß".
The above program outputs
ï»¿abcABCÃ¤Ã¶Ã¼Ã?Ã?Ã?Ã?
ï»¿ Ã¤Ã¶Ã¼Ã?Ã?Ã?Ã?
This shows that the problem is presumably in the Reader, as the gibberish characters are not matched by the replaceAll.

I solved it by explicitly converting all .java files and all .txt files to UTF-8 (via the encoding indicator in the bottom bar of VSCode), setting UTF-8 as the default encoding in the VSCode settings, and passing the UTF-8 charset to both the FileReader and the FileWriter like this:
FileReader fileReader = new FileReader(file, Charset.forName("UTF-8"));
FileWriter fileWriter = new FileWriter(file, Charset.forName("UTF-8"));
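
The "ï»¿" at the start of the output above is what a UTF-8 byte order mark looks like when it is decoded with a single-byte charset, so even with the right charset the first line may still begin with an invisible U+FEFF character. The following is a minimal sketch of the same fix that also drops that character. It assumes the file is UTF-8 encoded; the class and method names are hypothetical, and a temporary file stands in for textfile.txt so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative only.
final class Utf8FirstLine {

    // Removes a leading byte order mark character if the line starts with one.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == '\uFEFF') {
            return line.substring(1);
        }
        return line;
    }

    // Replaces every character that is not a Unicode letter with a space.
    static String lettersOnly(String line) {
        return line.replaceAll("[^\\p{IsAlphabetic}]", " ");
    }

    // Reads the first line as UTF-8, independent of the platform default charset.
    static String readFirstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return stripBom(reader.readLine());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the question's textfile.txt: UTF-8 bytes with a leading BOM.
        Path tmp = Files.createTempFile("textfile", ".txt");
        try {
            String content = "\uFEFFabcABC\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC\u00DF";
            Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
            String line = readFirstLine(tmp);
            if (line != null) {
                System.out.println(line);
                System.out.println(lettersOnly(line));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

On JDK 17 a FileReader created without a charset uses the platform default, which on Windows is usually not UTF-8. From JDK 18 on the default charset is UTF-8, so the explicit argument mainly matters on older runtimes, but passing it keeps the behaviour the same everywhere.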

Related

Writing foreign characters to .json file, eclipse vs jar

I am using the Eclipse IDE to test writing data with a few foreign (Chinese, Hindi) characters to .json and .txt files using Java. Writing to the .txt file works, whereas the .json file shows garbled characters.
Code snippet:
try(BufferedWriter br = new BufferedWriter(new FileWriter(new File("test.json")))) {
    JSONObject obj = new JSONObject();
    obj.put("key", "Hello, ओ ो ु ऋ 样品");
    String str = obj.toJSONString();
    System.out.println(str);
    br.write(str);
    br.close();
} catch (Exception e) {
    e.printStackTrace();
}
Output of .txt: {"key":"Hello, ओ ो ु ऋ 样品"}
Output of .json: {"key":"Hello, à¤“ à¥‹ à¥� à¤‹ æ ·å“�"}
I have tried using DataOutputStream to write the data, but the result is the same.
When I decode the file, I get the original foreign characters back and they look good.
When I build a jar and run the same code from the .jar file, the results are different: both the written and the read text show up garbled. I understand that in Eclipse the file is saved as UTF-8, which helped it compile. By the way, I'm using Maven to build the jar.
Please help me find a solution. Thanks.
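
A likely explanation is that new FileWriter(File) encodes with the platform default charset, which can differ between a launch from the IDE and a plain java -jar run. The sketch below is one way to take the default out of the picture by naming UTF-8 for both writing and reading. It is an assumption-laden example, not the asker's code: the class and method names are hypothetical, the JSON text is built by hand instead of with JSONObject, and a temporary file replaces test.json.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative only.
final class Utf8JsonWrite {

    // Writes the text as UTF-8, whatever the platform default charset is.
    static void writeUtf8(Path target, String text) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    // Writes the text to a temporary file and returns what is read back from it.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("test", ".json");
            try {
                writeUtf8(tmp, text);
                return readUtf8(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Escapes keep this source file ASCII-only; they stand for Hindi and Chinese characters.
        String json = "{\"key\":\"Hello, \u0913 \u6837\u54C1\"}";
        System.out.println("round trip intact: " + json.equals(roundTrip(json)));
    }
}
```

The file on disk can be correct and still look garbled if the program used to view it guesses a different encoding, so it is worth checking the viewer as well.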

Selenium Chrome Driver In Tomcat is not working why?

I was trying to run my Selenium automation code, written in Java, on a Tomcat server. It works fine when I compile and run it directly with javac, but when it runs on Tomcat as a jar, the log shows "com.google.common.base.Preconditions.checkState(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;Ljava/lang/Object;)V|". My Selenium Chrome driver is placed on the desktop of my local machine and its path is defined (Tomcat is also a local server).

I would go with a buffered file reader like this:
public static void main(String[] args) {
    File f = new File("data.txt");
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader b = new BufferedReader(new FileReader(f))) {
        String readLine;
        while ((readLine = b.readLine()) != null) {
            if (readLine.contains("WORD"))
                System.out.println("Found WORD in: " + readLine);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
where "WORD" is the word you are searching for.
The advantage of a BufferedReader is that it reads ahead to reduce the number of I/O roundtrips - or as they put it in the JavaDoc: "Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."
FileChannel is a slightly newer invention, arriving in the NIO with Java 1.4. It might perform better than the BufferedReader - but I also find it a lot more low-level in its API, so unless you have very special performance requirements, I would leave the readahead/buffering to BufferedReader and FileReader.
You can also say that BufferedReader is "line oriented" whereas FileChannel is "byte oriented".
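
To make the "line oriented" versus "byte oriented" distinction concrete, here is a small sketch that reads the same file both ways. The class name, method names, and sample text are hypothetical, and UTF-8 is assumed as the file encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical comparison class; the names are illustrative only.
final class LineVsByteRead {

    // Line oriented: the reader decodes bytes to characters and hands back whole lines.
    static int countLinesContaining(Path path, String word) throws IOException {
        int hits = 0;
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(word)) {
                    hits++;
                }
            }
        }
        return hits;
    }

    // Byte oriented: the channel only fills a buffer with raw bytes; decoding is up to the caller.
    static String readAllWithChannel(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or the file ends
            }
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    // Writes sample text to a temporary file and returns "<matching lines>:<characters read>".
    static String demo() {
        try {
            Path tmp = Files.createTempFile("data", ".txt");
            try {
                Files.write(tmp, "first WORD\nsecond\nthird WORD\n".getBytes(StandardCharsets.UTF_8));
                return countLinesContaining(tmp, "WORD") + ":" + readAllWithChannel(tmp).length();
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the channel, splitting into lines and choosing a charset are extra steps the caller has to write, which is the low-level feel mentioned above.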

I like the BufferedReader from java.io combined with a FileReader most:
https://docs.oracle.com/javase/7/docs/api/java/io/FileReader.html
https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html
https://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
It is easy to use and covers most needs. But your file must be character-based (like a text file) to use it.

Creating file with non english characters in the file name

How do I create a file with Chinese characters in the file name in Java? I have read many Stack Overflow answers but couldn't find a proper solution. The code I have written so far is as follows:
private void writeFile(String fileContent) {
    try {
        File file = new File("C:/temp/你好.docx");
        if (file.exists()) {
            file.delete();
        }
        file.createNewFile();
        FileOutputStream fos = new FileOutputStream(file);
        // write(byte[]) expects bytes, so the string is encoded explicitly
        fos.write(fileContent.getBytes("UTF-8"));
        fos.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
The file is written to the output directory but the name of the file contains some garbage value. Thanks in advance.

I believe the key is what I mentioned in the comment.
You can have Chinese characters (or other non-ASCII characters) as literals in source code. However, you should make sure of two things:
The encoding you use to save the source file can represent those characters (my suggestion is to just stick with UTF-8).
The -encoding parameter you pass to javac matches the encoding of the file.
For example, if you use UTF-8, your javac command should look like
javac -encoding utf-8 Foo.java
I have just tried a piece of code similar to yours, and it works well on my machine.
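
If the build's -encoding setting cannot be controlled, another option is to write the file name with Unicode escapes, which keeps the source file pure ASCII so the compiler's charset no longer matters for that literal. This is a sketch under that assumption, not the answerer's code: the class and method names are hypothetical, and the file is created in the temporary directory instead of C:/temp.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical example class; the names are illustrative only.
final class EscapedFileName {

    // The two escapes are the code points U+4F60 and U+597D from the question's file name.
    static String fileName() {
        return "\u4F60\u597D.docx";
    }

    public static void main(String[] args) {
        File file = new File(System.getProperty("java.io.tmpdir"), fileName());
        try {
            boolean created = file.createNewFile();
            System.out.println((created ? "created " : "already present: ") + file.getAbsolutePath());
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            file.delete();
        }
    }
}
```

How the name ends up on disk still depends on the operating system and file system, so a garbled name with correct escapes points at the platform side and not at the source encoding.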

Java: reading text from a file results with strange formatting

Usually, when I read text files, I do it like this:
File file = new File("some_text_file.txt");
Scanner scanner = new Scanner(new FileInputStream(file));
StringBuilder builder = new StringBuilder();
while(scanner.hasNextLine()) {
    builder.append(scanner.nextLine());
    builder.append('\n');
}
scanner.close();
String text = builder.toString();
There may be better ways, but this method has always worked for me perfectly.
For what I am working on right now, I need to read a large text file (over 700 kilobytes in size). Here is a sample of the text when opened in Notepad (the one that comes standard with any Windows operating system):
"lang"
{
"Language" "English"
"Tokens"
{
"DOTA_WearableType_Daggers" "Daggers"
"DOTA_WearableType_Glaive" "Glaive"
"DOTA_WearableType_Weapon" "Weapon"
"DOTA_WearableType_Armor" "Armor"
However, when I read the text from the file using the method that I provided above, the output is:
I could not paste the output for some reason. I have also tried to read the file like so:
File file = new File("some_text_file.txt");
Path path = file.toPath();
String text = new String(Files.readAllBytes(path));
... with no change in result.
How come the output is not as expected? I also tried reading a text file that I wrote and it worked perfectly fine.

It looks like an encoding problem. Open the file with a tool that can detect the encoding (like Notepad++) and find out how it is encoded. Then use the other constructor for Scanner:
Scanner scanner = new Scanner(new FileInputStream(file), encoding);
Or you can simply experiment, trying different encodings. It looks like UTF-16 to me.

final Scanner scanner = new Scanner(new FileInputStream(file), "UTF-16");
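
Instead of trying encodings by hand, the first bytes of the file can be checked for a byte order mark, which files saved as "Unicode" by Windows tools commonly carry. The following is a rough sketch of that idea. The class and method names are hypothetical, it only recognises the UTF-8 and UTF-16 marks, and it falls back to a caller-supplied charset when no mark is present.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class BomSniffer {

    // Picks a charset from a leading byte order mark, or returns the fallback if there is none.
    static Charset detect(byte[] head, Charset fallback) {
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return fallback;
    }

    // Decodes the bytes with the detected charset and drops the mark character itself.
    static String decode(byte[] data, Charset fallback) {
        String text = new String(data, detect(data, fallback));
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a file: a little-endian UTF-16 mark followed by "hi".
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        System.out.println(detect(sample, StandardCharsets.UTF_8) + " -> " + decode(sample, StandardCharsets.UTF_8));
    }
}
```

For a real file, the bytes from Files.readAllBytes(path) can be passed to decode. A file without a mark cannot be identified this way, and a UTF-32 little-endian mark would be mistaken for UTF-16, so this is a heuristic only.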

java output html code to file

I have a chunk of HTML code that should be output as a .html file from Java. The pre-written HTML code is the header and table for a page, and I need to generate some data and output it to the same .html file. Is there an easier way to print the HTML code than calling println() line by line? Thanks

You can look at some Java libraries for parsing HTML. A quick Google search turns up a few. Read in the HTML, use their queries to manipulate the DOM as needed, and then write it back out.
e.g. http://jsoup.org/

Try using a templating engine, MVEL2 or FreeMarker, for example. Both can be used by standalone applications outside of a web framework. You lose time upfront but it will save you time in the long run.

JSP (Java Server Pages) allows you to write HTML files which have some Java code easily embedded within them. For example
<html><head><title>Hi!</title></head><body>
<% some java code here that outputs some stuff %>
</body></html>
That requires a Java web server with JSP support (a servlet container) to be installed, though. But if this is on a web server, that might not be unreasonable to have.
If you want to do it in plain Java, that depends. I don't fully understand which part you meant you will be outputting line by line. Did you mean you are going to do something like
System.out.println("<html>");
System.out.println("<head><title>Hi!</title></head>");
System.out.println("<body>");
// etc
Like that? If that's what you meant, then don't do that. You can just read the data from the template file and output it all at once. You could read it into a multiline string, or you could read the data from the template and write it directly to the new file. Something like
while( (strInput = templateFileReader.readLine()) != null)
    newFileOutput.println(strInput);
Again, I'm not sure exactly what you mean by that part.

HTML is simply a way of marking up text, so to write a HTML file, you are simply writing the HTML as text to a file with the .html extension.
There are plenty of tutorials out there for reading and writing files, as well as for getting a list of files from a directory. (Google 'java read file', 'java write file', 'java list directory' - that is basically everything you need.) The important things are to use BufferedReader/BufferedWriter for pulling text out of and pushing it into the files, and to realise that there is no particular science involved in writing HTML to a file.
I'll reiterate; HTML is nothing more than <b>text with tags</b>.
Here's a really crude example that will output two files to a single file, wrapping them in an <html></html> tag.
static BufferedReader getReaderForFile(String filename) throws IOException {
    FileInputStream in = new FileInputStream(filename);
    return new BufferedReader(new InputStreamReader(in));
}
public static void main(String[] args) throws IOException {
    // Open the files; try-with-resources closes them and flushes the writer
    try (BufferedReader myheader = getReaderForFile("myheader.txt");
            BufferedReader contents = getReaderForFile("contentfile.txt");
            BufferedWriter out = new BufferedWriter(new FileWriter("mypage.html"))) {
        out.write("<html>");
        out.newLine();
        for (String line = myheader.readLine(); line != null; line = myheader.readLine()) {
            out.write(line);
            out.newLine(); // readLine() strips the line terminator
        }
        for (String line = contents.readLine(); line != null; line = contents.readLine()) {
            out.write(line);
            out.newLine(); // readLine() strips the line terminator
        }
        out.write("</html>");
    }
}

To build a simple HTML text file, you don't have to read your input file line by line.
File theFile = new File("file.html");
byte[] content = new byte[(int) theFile.length()];
You can use "RandomAccessFile.readFully" to read files entirely as a byte array:
// Read file function:
RandomAccessFile file = null;
try {
    file = new RandomAccessFile(theFile, "r");
    file.readFully(content);
} finally {
    if(file != null) {
        file.close();
    }
}
Make your modifications on the text content:
String text = new String(content);
text = text.replace("<!-- placeholder -->", "generated data");
content = text.getBytes();
Writing is also easy:
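// Write file content:
RandomAccessFile file = null;
try {
    file = new RandomAccessFile(theFile, "rw");
    // "rw" does not truncate, so cut off leftover bytes if the new content is shorter
    file.setLength(content.length);
    file.write(content);
} finally {
    if(file != null) {
        file.close();
    }
}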
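
On Java 11 or newer, the same read, replace, and write steps can be sketched more compactly with java.nio.file.Files, which also lets the charset be stated explicitly. This is an alternative under stated assumptions, not the code above: the class and method names are hypothetical, UTF-8 is assumed for the template, and temporary files stand in for the real template and output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class; the names are illustrative only.
final class HtmlTemplateFill {

    static final String PLACEHOLDER = "<!-- placeholder -->";

    // Pure string step: swaps the placeholder comment for the generated markup.
    static String fill(String template, String generated) {
        return template.replace(PLACEHOLDER, generated);
    }

    // Reads the template as UTF-8, fills it, and writes the result as UTF-8.
    static void fillFile(Path template, Path output, String generated) throws IOException {
        String html = Files.readString(template, StandardCharsets.UTF_8);
        Files.writeString(output, fill(html, generated), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path template = Files.createTempFile("template", ".html");
        Path output = Files.createTempFile("mypage", ".html");
        try {
            Files.writeString(template,
                    "<html><body><table>" + PLACEHOLDER + "</table></body></html>",
                    StandardCharsets.UTF_8);
            fillFile(template, output, "<tr><td>generated data</td></tr>");
            System.out.println(Files.readString(output, StandardCharsets.UTF_8));
        } finally {
            Files.deleteIfExists(template);
            Files.deleteIfExists(output);
        }
    }
}
```

Files.writeString replaces any existing content of the output file by default, so no manual truncation is needed.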
