Java read file with strings from different languages - java

I made a program that reads several text files and combines them into a .csv file. It's a .csv file with translations into English, Dutch, French, Italian, Portuguese and Spanish.
Now here is my problem:
In the end I get a nicely filled .csv file with all the translations together. I read the files as UTF-8 and all the languages are shown correctly except for French. Some characters are shown as question marks, like this: "Mis ? jour", when it should be "Mis à jour".
Here is the method that reads the files for the different languages and makes objects from them, so I can sort them and put them in the right spot in the .csv file.
The files are filled like this:
To Airport;A l’aéroport
Today;Aujourd’hui
public static Language getTranslations(String inputFileName) {
Language language = new Language();
// Decode the file as UTF-8; try-with-resources closes the reader when done
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(inputFileName), "UTF-8"))) {
String strLine;
// Read the file line by line
while ((strLine = br.readLine()) != null) {
// Each line has the form key;translation
String[] values = strLine.split(";");
if (values.length == 2) {
language.putTranslationItem(values[0], values[1]);
}
}
} catch (IOException e) {
// Report the failure instead of silently returning an empty Language
e.printStackTrace();
}
return language;
}
I hope anybody can help out!
Thanks

I am not completely sure about this, but you can try converting the values[0] and values[1] strings into byte arrays:
byte[] value_0_utfString = values[0].getBytes("UTF-8");
byte[] value_1_utfString = values[1].getBytes("UTF-8");
and then converting them back into strings:
String str_0 = new String(value_0_utfString, "UTF-8");
String str_1 = new String(value_1_utfString, "UTF-8");
I'm not sure this is the right or most efficient way, but since a single line contains both English and French, I thought splitting and re-encoding might help. I haven't tried this myself.

Resave the text file by choosing "Save As" in a text editor (e.g. Notepad) and change the encoding to ANSI instead of UTF-8.
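Whichever encoding the input files turn out to use, it helps to pass it explicitly at the single place where bytes become characters. The following is a minimal sketch, not the asker's code: TranslationReader is a hypothetical name, and a Map stands in for the Language class, which the question does not show. Files.newBufferedReader is used because it throws an exception on malformed input instead of silently substituting a character, which makes a wrong charset guess visible early.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; a Map replaces the Language class from the question.
final class TranslationReader {

    // Reads "key;value" lines, decoding the file's bytes with the given charset.
    static Map<String, String> read(Path file, Charset charset) throws IOException {
        Map<String, String> translations = new LinkedHashMap<>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] values = line.split(";");
                if (values.length == 2) {
                    translations.put(values[0], values[1]);
                }
            }
        }
        return translations;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("french", ".txt");
        // Simulate a file that was saved as ISO-8859-1 rather than UTF-8.
        Files.write(file, "Updated;Mis \u00E0 jour\n".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(read(file, StandardCharsets.ISO_8859_1).get("Updated"));
        Files.delete(file);
    }
}
```

If only the French file fails when read as UTF-8, that suggests it was saved in a different encoding from the others; reading that one file with its actual charset, or resaving it as UTF-8, are the two ways out.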

Related

characters not appearing when I print when I import a file?

I'm importing a file into my code and trying to print it. the file contains
i don't like cake.
pizza is good.
i don’t like "cookies" to.
17.
29.
The second "don’t" contains a right single quotation mark, and when I print it the output is
don�t
The question mark is printed as a blank square. Is there a way to convert it to a regular apostrophe?
EDIT:
public class Somethingsomething {
public static void main(String[] args) throws FileNotFoundException,
IOException {
ArrayList<String> list = new ArrayList<String>();
File file = new File("D:\\project1Test.txt");//D:\\project1Test.txt
if(file.exists()){//checks if file exist
FileInputStream fileStream = new FileInputStream(file);
InputStreamReader input = new InputStreamReader(fileStream);
BufferedReader reader = new BufferedReader(input);
String line;
while( (line = reader.readLine()) != null) {
list.add(line);
}
for(int i = 0; i < list.size(); i ++){
System.out.println(list.get(i));
}
}
}}
It should print normally, but the second "don’t" has a white block in place of the apostrophe.
This is the file I'm using: https://www.mediafire.com/file/8rk7nwilpj7rn7s/project1Test.txt
Edit: if it helps, the full document where the character is found is here:
https://www.nytimes.com/2018/03/25/business/economy/labor-professionals.html
It’s all about character encoding. The way characters are represented isn't always the same and they tend to get misinterpreted.
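Characters are usually stored as numbers that depend on the encoding standard (and there are many of them). Characters in the ASCII range are the same in most encodings: "a" is 97 (0x61 in hexadecimal) in both ASCII and UTF-8. Characters outside ASCII differ: the right single quotation mark ’ is the single byte 0x92 in windows-1252 but the three bytes 0xE2 0x80 0x99 in UTF-8.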
Now when you see funny characters such as the question mark (called replacement character) in this case, it's usually that an encoding standard is being misinterpreted as another standard, and the replacement character is used to replace the unknown or misinterpreted character.
To fix your problem you need to tell your reader to read your file using a specific character encoding, say SOME-CHARSET.
Replace this:
InputStreamReader input = new InputStreamReader(fileStream);
with this:
InputStreamReader input = new InputStreamReader(fileStream, "SOME-CHARSET");
A list of charsets is available here. Unfortunately, you might want to go through them one by one. A short list of most common ones could be found here.
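To make the mechanism concrete, the following minimal sketch decodes the same five bytes twice. It assumes, purely for illustration, that the file was saved as windows-1252 (the real encoding of project1Test.txt is not established here); DecodeDemo and decode are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {

    // Decodes the given bytes with a caller-chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "don’t" as a windows-1252 editor would store it:
        // the curly apostrophe is the single byte 0x92.
        byte[] bytes = {'d', 'o', 'n', (byte) 0x92, 't'};

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asCp1252 = decode(bytes, Charset.forName("windows-1252"));

        // A lone 0x92 is not valid UTF-8, so the decoder substitutes U+FFFD.
        System.out.println(asUtf8.charAt(3) == '\uFFFD');
        // With the matching charset the byte maps to U+2019, the curly apostrophe.
        System.out.println(asCp1252.charAt(3) == '\u2019');
    }
}
```

Once the text has been decoded correctly, turning the curly apostrophe into a plain one is an ordinary string operation such as line.replace('\u2019', '\'').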
Your problem is almost certainly the encoding scheme you are using. You can read a file in almost any encoding scheme you want. Just tell Java how your input was encoded. UTF-8 is common on Linux. On Windows the native ("ANSI") code page depends on the locale; for Western European systems it is windows-1252 (CP-1252).
This is the sort of problem you have all the time if you are processing files created on a different OS.
See here and Here
I'll give you a different approach...
Use the appropriate means for reading plain text files. Try this:
public static String getTxtContent(String path)
{
try(BufferedReader br = new BufferedReader(new FileReader(path)))
{
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
}catch(IOException fex){ return null; }
}

Trouble writing 's into a .txt file, using FileOutputStream

The problem is that when I read a string and then try to write each character on a separate line of a .txt file, System.out.println shows the correct characters, but the .txt file contains some weird characters in place of the ’s. To illustrate, suppose we have the line "Second subject’s layout of same 100 pages." and we want to write it into a .txt file using the following code:
public static void write(String Swrite) throws IOException {
if(!file.exists()){
file.createNewFile();
}
FileOutputStream fop=new FileOutputStream(file,true);
if(Swrite!=null)
for(final String s : Swrite.split(" ")){
fop.write(s.toLowerCase().getBytes());
fop.write(System.getProperty("line.separator").getBytes());
}
fop.flush();
fop.close();
}
the written file would look like this for the word, subject's: subject’s. I have no idea why this happens.
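Try something like the following. PrintWriter does the character-to-byte conversion for you; passing the charset name explicitly keeps the output from depending on the platform default encoding.
PrintWriter pw = null;
try {
// Encode the output as UTF-8 regardless of the platform default
pw = new PrintWriter(file, "UTF-8");
if (Swrite != null) {
for (String s : Swrite.split(" ")) {
pw.println(s);
}
}
}
finally {
if (pw != null) {
pw.close();
}
}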
How about something like this:
// The file to read the input from and write the output to.
// Original content: Second subject’s layout of same 100 pages.
File file = new File("C:\\temp\\file.txt");
// The charset of the file, in our case UTF-8.
Charset utf8Charset = Charset.forName("UTF-8");
// Read all bytes from the file and create a string out of it (with the correct charset).
String inputString = new String(Files.readAllBytes(file.toPath()), utf8Charset);
// Create a list of all output lines
List<String> lines = new ArrayList<>();
// Add the original line and then an empty line for clarity's sake.
lines.add(inputString);
lines.add("");
// Convert the input string to lowercase and iterate over its char array.
// Then, for each char, create a string that becomes a new line.
for(char c : inputString.toLowerCase().toCharArray()){
lines.add(new String(new char[]{c}));
}
// Write all lines in the correct char encoding to the file
Files.write(file.toPath(), lines, utf8Charset);
It all comes down to the charsets used, as noted in the comments above.
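For comparison, here is a minimal sketch that stays closer to the code in the question: it appends each space-separated word, lowercased, on its own line, but names the charset instead of relying on the no-argument getBytes(). WordWriter and writeWords are hypothetical names, and UTF-8 is an assumed choice of output encoding.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class WordWriter {

    // Appends each space-separated word, lowercased, on its own line, encoded as UTF-8.
    static void writeWords(Path file, String text) throws IOException {
        if (text == null) {
            return;
        }
        // CREATE makes the file if it is missing; APPEND keeps existing content.
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (String word : text.split(" ")) {
                out.write(word.toLowerCase());
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("words", ".txt");
        writeWords(file, "Second subject\u2019s layout");
        // Read back with the same charset that was used for writing.
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

The file also has to be opened as UTF-8 in whatever editor is used to inspect it; a viewer that assumes a different encoding will still show odd characters even though the bytes on disk are correct.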

StringEscapeUtils.unescapeHtml doesn't work on strings read from files

I'm trying to read in a file that contains unicode characters, convert those characters to their corresponding symbols and then print the resulting text to a new file. I'm trying to use StringEscapeUtils.unescapeHtml to do this but the lines are just being printed as is, with the unicode points still intact. I did a practice run by copying a single line from the file, making a string from that and then calling StringEscapeUtils.unescapeHtml on that, which works perfectly. My code is below:
class FileWrite
{
public static void main(String args[])
{
try{
String testString = " \"text\":\"Dude With Knit Hat At Party Calls Beer \u2018Libations\u2019 http://t.co/rop8NSnRFu\" ";
FileReader instream = new FileReader("Home Timeline.txt");
BufferedReader b = new BufferedReader(instream);
FileWriter fstream = new FileWriter("out.txt");
BufferedWriter out = new BufferedWriter(fstream);
out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");//This gives the desired output,
//with unicode points converted
String line = b.readLine().toString();
while(line != null){
out.write(StringEscapeUtils.unescapeHtml3(line) + "\n");
line = b.readLine();
}
//Close the output streams
b.close();
out.close();
}
catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
//This gives the desired output,
//with unicode points converted
out.write(StringEscapeUtils.unescapeHtml3(testString) + "\n");
You are mistaken. Java unescapes String literals of this form at compile time when it builds them into the class file:
"\u2018Libations\u2019"
There are no HTML 3 escapes in this code. The method you have chosen is designed to unescape HTML entity references of the form &lsquo;.
You probably want the unescapeJava method.
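To illustrate what unescaping Java-style sequences involves, here is a minimal hand-rolled sketch. It is not the Commons Lang implementation: UnicodeUnescaper is a hypothetical class, and it handles only a backslash, a lowercase u, and four hex digits, which is an assumption about what the input file contains.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every matched escape sequence with the character it names.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' being treated specially
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as six literal characters,
        // which is how it would arrive when read from a file.
        String fromFile = "Calls Beer \\u2018Libations\\u2019";
        System.out.println(unescape(fromFile));
    }
}
```

This also shows why the test string in the question appeared to work: the compiler had already turned the escapes in the literal into real characters before any unescape method ran, whereas text read from a file still contains the backslashes.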
Your strings are being both read and written using your platform's default encoding. You want to explicitly specify the character set to use as 'UTF-8':
Input stream:
BufferedReader b = new BufferedReader(new InputStreamReader(
new FileInputStream("Home Timeline.txt"),
Charset.forName("UTF-8")));
Output stream:
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("out.txt"),
Charset.forName("UTF-8")));

Java save multiline string to text file

I am new to Java and trying to save a multi line string to a text file.
Right now, it does work within my application. Like, if I save the file from my application and then open it from my application, it does put a space between lines. However, if I save the file from my app and then open it in Notepad, it is all on one line.
Is there a way to make it show multi line on all programs? Here's my current code:
public static void saveFile(String contents) {
// Get where the person wants to save the file
JFileChooser fc = new JFileChooser();
int rval = fc.showSaveDialog(fc);
if(rval == JFileChooser.APPROVE_OPTION) {
File file = fc.getSelectedFile();
try {
//File out_file = new File(file);
BufferedWriter out = new BufferedWriter(new FileWriter(file));
out.write(contents);
out.flush();
out.close();
} catch(IOException e) {
messageUtilities.errorMessage("There was an error saving your file. IOException was thrown.", "File Error");
}
}
else {
// Do nothing
System.out.println("The user chose not to save anything");
}
}
Depending on how you are constructing your string, you may just be running into a line-ending problem. Notepad (at least in older versions of Windows) does not support Unix line endings (\n only); it only supports Windows line endings (\r\n). Try opening your saved file in a more robust editor, and/or make sure you are using the proper line endings for your platform. Java's system property System.getProperty("line.separator") gives you the proper line ending for the platform the code is running on.
While you're building the string to be saved to the file, rather than explicitly specifying "\n" or "\r\n" (or, on the classic Mac OS, "\r") for your line endings, append the value of that system property instead.
like so:
String eol = System.getProperty("line.separator");
... somewhere else in your code ...
String texttosave = "Here is a line of text." + eol;
... more code.. optionally adding lines of text .....
// call your save file method
saveFile(texttosave);
Yes, as the previous answer mentions, use System.getProperty("line.separator").
Your code doesn't show how you created the String contents, but since you said you are new to Java I thought I'd mention that concatenating Strings repeatedly is not ideal, since each concatenation creates a new String object. If you are building the String like this:
String contents = "";
contents = contents + "sometext" + "some more text\n";
then consider using java.lang.StringBuilder instead:
StringBuilder strBuilder = new StringBuilder();
strBuilder.append("sometext").append("some more text\n");
...
String contents = strBuilder.toString();
Another alternative is to stream whatever you're planning to write to the file, rather than building a large string and then outputting that.
You could add something like:
contents = contents.replace("\n", "\r\n");
if Notepad does not display the file correctly. However, you might run into a different problem: at each save/load cycle you will accumulate additional \r characters. To avoid that, you would have to call the same code at load time with the parameters reversed. This is really an ugly solution just to get the text to display properly in Notepad.
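One way around the accumulating \r characters is to normalize every kind of line ending in a single pass, so that running the conversion again changes nothing. A minimal sketch, with LineEndings and toWindows as hypothetical names:

```java
final class LineEndings {

    // Converts every CRLF, lone CR, or lone LF to CRLF.
    // CRLF is listed first in the alternation so an existing pair is
    // matched as one unit and never gains an extra CR.
    static String toWindows(String text) {
        return text.replaceAll("\r\n|\r|\n", "\r\n");
    }

    public static void main(String[] args) {
        String mixed = "one\ntwo\r\nthree\rfour";
        String once = toWindows(mixed);
        String twice = toWindows(once);
        // Applying the conversion a second time leaves the text unchanged.
        System.out.println(once.equals(twice));
    }
}
```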
I had this same problem, and after much thought and research I found a solution.
You can put the contents of the TextArea, for example, into an ArrayList and pass it as a parameter to the save method. Since the writer just writes string lines, we loop over the ArrayList and write it line by line; in the end the TextArea content is in the .txt file.
If something does not make sense, I apologize: I do not speak English and am using Google Translate.
Note that Windows Notepad does not always break the lines and may show everything on one line; use WordPad instead.
private void SaveActionPerformed(java.awt.event.ActionEvent evt) {
String NameFile = Name.getText();
ArrayList< String > Text = new ArrayList< String >();
Text.add(TextArea.getText());
SaveFile(NameFile, Text);
}
public void SaveFile(String name, ArrayList< String> message) {
path = "C:\\Users\\Paulo Brito\\Desktop\\" + name + ".txt";
File file1 = new File(path);
try {
if (!file1.exists()) {
file1.createNewFile();
}
// Append each entry, followed by the platform line separator
FileWriter fw = new FileWriter(file1, true);
BufferedWriter bw = new BufferedWriter(fw);
for (int i = 0; i < message.size(); i++) {
bw.write(message.get(i));
bw.newLine();
}
bw.close();
// Read the file back and print it to the console as a check
FileReader fr = new FileReader(file1);
BufferedReader br = new BufferedReader(fr);
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
br.close();
} catch (IOException ex) {
ex.printStackTrace();
JOptionPane.showMessageDialog(null, "Error in" + ex);
}
}

PHP and a file written by Java FileOutputStream

I have a text file that is written by Java FileOutputStream.
When I read that file using file_get_contents, everything is on the same line and there are no separators between the different strings.
I need to know how to read/parse that file so that I have some kind of separator between the strings.
I'm using something like this to save the file:
Stream stream = new Stream(30000, 30000);
stream.outOffset = 0;
stream.writeString("first string");
stream.writeString("second string");
FileOutputStream out = new FileOutputStream("file.txt");
out.write(stream.outBuffer, 0, stream.outOffset);
out.flush();
out.close();
out = null;
I have no idea what that Stream thing in your code represents, but the usual approach to write String lines to a file is using a PrintWriter.
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream("/file.txt"), "UTF-8"));
writer.println("first line");
writer.println("second line");
writer.close();
This way each line is separated by the platform default newline, which is the same as you obtain by System.getProperty("line.separator"). On Windows machines this is usually \r\n. In the PHP side, you can then just explode() on that.
file_get_contents returns the content of the file as a string. There are no lines in a string.
Are you familiar with newlines?
See wikipedia
So, what you are probably looking for is either reading your file line by line in PHP,
or reading it with file_get_contents like you did and then explode-ing it into lines (use "\n" as separator).
There is no indication in your code that you are writing a line separator to the output stream. You need to do something like this:
String nl = System.getProperty("line.separator");
Stream stream = new Stream(30000, 30000);
stream.outOffset = 0;
stream.writeString("first string");
stream.writeString(nl);
stream.writeString("second string");
stream.writeString(nl);
FileOutputStream out = null;
try
{
out = new FileOutputStream("file.txt");
out.write(stream.outBuffer, 0, stream.outOffset);
out.flush();
}
finally
{
try
{
if (out != null)
out.close();
}
catch (IOException ioex) { ; }
}
Using PHP, you can use the explode function to fill an array full of strings from the file you are reading in:
<?php
$data = file_get_contents('file.txt');
$lines = explode("\n", $data); // double quotes, so that \n is a real newline
foreach ($lines as $line)
{
echo $line;
}
?>
Note that depending on your platform, you may need to use "\r\n" (in double quotes) for the first explode parameter; otherwise some of your lines may have carriage returns on the end of them.
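If the Java side is under your control, another option is to write a fixed separator rather than the platform default, so the PHP side can always split on "\n" wherever the Java code ran. This is a minimal sketch using standard java.nio calls in place of the unexplained Stream class from the question; FixedSeparatorWriter is a hypothetical name and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class FixedSeparatorWriter {

    // Writes each string followed by a single LF, independent of the platform.
    static void write(Path file, List<String> lines) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("strings", ".txt");
        write(file, Arrays.asList("first string", "second string"));
        // Print the size so the separator bytes can be checked.
        System.out.println(Files.size(file));
        Files.delete(file);
    }
}
```

Because every string ends with the separator, exploding on "\n" in PHP yields one trailing empty element, which the reader may want to drop.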
