I want to read an Arabic text file encoded in windows-1256 using Java (on the Windows platform).
Any suggestions?
If your JVM supports that encoding, then yes, you can easily do that:
Reader r = new InputStreamReader(new FileInputStream(theFile), "Windows-1256");
BufferedReader buffered = new BufferedReader(r);
try {
    String line;
    while ((line = buffered.readLine()) != null) {
        // handle each line
    }
} finally {
    buffered.close();
}
Something like:
BufferedReader in = new BufferedReader(new InputStreamReader(
new FileInputStream("myfile.txt"), "windows-1256"));
Should work.
To read from a FileInputStream with a character set other than the platform default, use an InputStreamReader:
http://download.oracle.com/javase/6/docs/api/java/io/InputStreamReader.html#InputStreamReader(java.io.InputStream,%20java.lang.String)
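For example, a minimal sketch along those lines (the file name arabic.txt is just a placeholder):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Windows1256Read {
    public static void main(String[] args) throws IOException {
        // Wrap the byte stream in a reader that decodes Windows-1256 explicitly
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream("arabic.txt"), "Windows-1256"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}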
I'm having some problems with something simple and it's doing my head in. I have some entries in an Excel spreadsheet that contain various Asian characters etc. When exported as "Unicode text (*.txt)" with UTF-8 selected as the encoding, I can view the file correctly in Notepad, but when I try to print it in the Eclipse console I get gibberish. I've tried several variations on reading it as UTF-8, and I know the console can display it:
try {
    //BufferedReader in = new BufferedReader(new FileReader("testtest.txt"));
    File fileDir = new File("testestet.txt");
    //PrintStream out = new PrintStream(System.out, true, "UTF-8"); // tried this just in case
    System.out.println("사과"); // this prints just fine
    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileDir), StandardCharsets.UTF_8));
    String line;
    while ((line = in.readLine()) != null) {
        System.out.println(line);
    }
    in.close();
} catch (Exception e) {
    e.printStackTrace();
}
Any ideas? Whatever solution I've found here has not worked. I'm wondering if Excel is just borked...
Have you tried this instead?
BufferedReader in = new BufferedReader(
new InputStreamReader(new FileInputStream(fileDir), StandardCharsets.UTF_8));
Heh, I found the problem while testing all the StandardCharsets: UTF-16 works! It turns out Excel's "Unicode text (*.txt)" export actually writes UTF-16 rather than UTF-8, so Excel switched encodings without telling me...
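For reference, a short sketch of the read that ended up working, decoding the export as UTF-16 (the file name is assumed):
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ReadExcelExport {
    public static void main(String[] args) throws IOException {
        // StandardCharsets.UTF_16 honours the byte-order mark Excel writes
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream("testtest.txt"), StandardCharsets.UTF_16))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}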
I am using the Marathi WordNet. This wordnet contains text documents with Marathi words.
I want to read these Marathi documents in my Java code. I have tried using BufferedReader and FileReader, but I failed.
This is the code I have tried:
FileReader fr = new FileReader("onto_txt");
BufferedReader br = new BufferedReader(fr);
String line = br.readLine();
while (line != null) {
    System.out.println(line);
    line = br.readLine();
}
fr.close();
br.close();
FileReader is an old utility class that uses the platform's default encoding.
Assuming the file is in UTF-8, it is better to specify the encoding explicitly.
try (BufferedReader br = new BufferedReader(new InputStreamReader(
new FileInputStream("C:/xyz/onto_txt"), StandardCharsets.UTF_8))) {
String line = br.readLine();
while (line != null) {
System.out.println(line);
System.out.println(Arrays.toString(line.getBytes(StandardCharsets.UTF_8)));
line = br.readLine();
}
} // Closes br
Using System.out again converts the line to the platform's encoding, which might not be able to display the String line; hence the dump of every single byte. Not very informative, but it might clarify that where ? is displayed in the previous line, there really are Unicode characters.
Internally, a Java String holds Unicode and can contain any text, so you can process line however you like inside the while loop.
I am reading a CSV file in Java, adding a new column with new information, and exporting it back to a CSV file. I have a problem reading the CSV file in UTF-8 format. I read it line by line and store it in a StringBuilder, but when I print a line I can see that the information I'm reading is not in UTF-8 but in ANSI. I used both System.out.print and a PrintStream in UTF-8, and the information still appears in ANSI. This is my code:
BufferedReader br;
try {
br = new BufferedReader(new InputStreamReader(new FileInputStream(
"./users.csv"), "UTF8"));
String line;
while ((line = br.readLine()) != null) {
if (line.contains("none#none.com")) {
continue;
}
if (!line.contains("#") && !line.contains("FirstName")) {
continue;
}
PrintStream ps = new PrintStream(System.out, true, "UTF-8");
ps.print(line + "\n");
sbusers.append(line);
sbusers.append("\n");
sbusers2.append(line);
sbusers2.append(",");
}
br.close();
} catch (IOException e) {
System.out.println("Failed to read users file.");
} finally {
}
It prints out information like "Professor -P�s". Since the reading isn't being done correctly the output to the new file is also being exported in ANSI.
Are you sure your CSV is UTF-8 encoded? My guess is that it's not. Try using ISO-8859-1 for reading the file, but keep the output as UTF-8. ("UTF8" and "UTF-8" both tend to work, but you should use "UTF-8", as @Marcelo suggested.)
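For example, a rough sketch of that approach (the output file name users-utf8.csv is made up here), decoding the CSV as ISO-8859-1 and re-encoding the result as UTF-8 on the way out:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class RecodeCsv {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                     new FileInputStream("./users.csv"), StandardCharsets.ISO_8859_1));
             BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                     new FileOutputStream("./users-utf8.csv"), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Each line is decoded as ISO-8859-1 and written back out as UTF-8
                out.write(line);
                out.newLine();
            }
        }
    }
}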
In the line:
br = new BufferedReader(new InputStreamReader(new FileInputStream("./users.csv"),"UTF8"));
Your charset name should be "UTF-8", not "UTF8".
Printing to System.out using a UTF encoding? Why would you do that? System.out and the encoding it uses are determined at the OS level (it becomes the default charset in the JVM), and that is the only encoding you want to use on System.out.
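If in doubt, you can check which default the JVM actually picked up, for example:
import java.nio.charset.Charset;

public class ShowDefaultCharset {
    public static void main(String[] args) {
        // The platform default charset the JVM settled on at startup
        System.out.println(Charset.defaultCharset());
        System.out.println(System.getProperty("file.encoding"));
    }
}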
First, as suggested by @Marcelo, use "UTF-8" instead of "UTF8":
BufferedReader in = new BufferedReader(
        new InputStreamReader(
                new FileInputStream("./users.csv"), "UTF-8"));
Second, forget about the PrintStream: just use System.out, or better yet, a logging API. You don't need to worry about how Java will output your string to the console. (Rule number one of character encodings: once you have read things in successfully, let Java handle the encoding and only worry about it again when you write to an external file, database, etc.)
Third, and most important, check that your file is really encoded in UTF-8; this is the cause of 99% of encoding problems.
Make sure you test with a real UTF-8 file (use a tool like iconv to convert to UTF-8 and be sure about it).
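One way to verify that from Java itself is a small sketch (the file name is an assumption) using a CharsetDecoder configured to report malformed input instead of silently replacing it:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CheckUtf8 {
    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get("./users.csv"));
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            // decode() throws if the bytes are not a valid UTF-8 sequence
            decoder.decode(ByteBuffer.wrap(bytes));
            System.out.println("Looks like valid UTF-8");
        } catch (CharacterCodingException e) {
            System.out.println("Not valid UTF-8: " + e);
        }
    }
}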
I found a potential solution (I had the same problem). Depending on the actual encoding of the file, you may need to specify it differently...
Replace:
br = new BufferedReader(new InputStreamReader(new FileInputStream(
"./users.csv"), "UTF8"));
With:
br = new BufferedReader(new InputStreamReader(new FileInputStream(
        "./users.csv"), "ISO-8859-1"));
For further understanding: https://mincong.io/2019/04/07/understanding-iso-8859-1-and-utf-8/
I am using the following code to write JSON, which I get from my HTML page, to a local path. Then I have to construct an HTML page by reading the content back from the saved local JSON file, which is plain text, and pass it as input to my Java code. I am confused about whether to use a BufferedReader or a BufferedInputStream to read that file from the local path. Please help me.
java.io.BufferedWriter jsonOut = new java.io.BufferedWriter(
new java.io.OutputStreamWriter(
new java.io.FileOutputStream(uploadDir +
_req.getParameter("filename")), "ISO-8859-1"));
BufferedReader for text.
Reason: http://tutorials.jenkov.com/java-io/bufferedreader.html
You can use BufferedReader for text, but make sure to use the proper charset in your case (otherwise it defaults to the platform charset):
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(myFile),"ISO-8859-1"));
To read a file you can use the following code
File f = new File("your json file");
BufferedReader buf = new BufferedReader(new FileReader(f));
String line = null;
while ((line = buf.readLine()) != null) {
System.out.println("json file line " + line);
// do your changes
}
How would I read a .txt file in Java and put every line in an array, when every line contains integers, strings, and doubles, and every line has a different number of words/numbers?
I'm a complete noob in Java so sorry if this question is a bit stupid.
Thanks
Try the Scanner class, which no one seems to know about but which can do almost anything with text.
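For example, a rough sketch using Scanner to walk each line and pull out whatever tokens it contains (the file name and charset here are assumptions):
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

public class ScanMixedLines {
    public static void main(String[] args) throws IOException {
        try (Scanner fileScanner = new Scanner(new File("data.txt"), "UTF-8")) {
            while (fileScanner.hasNextLine()) {
                // Scan each line separately, so lines may have any number of tokens
                try (Scanner lineScanner = new Scanner(fileScanner.nextLine())) {
                    while (lineScanner.hasNext()) {
                        if (lineScanner.hasNextInt()) {
                            System.out.println("int: " + lineScanner.nextInt());
                        } else if (lineScanner.hasNextDouble()) {
                            System.out.println("double: " + lineScanner.nextDouble());
                        } else {
                            System.out.println("word: " + lineScanner.next());
                        }
                    }
                }
            }
        }
    }
}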
To get a reader for a file, use
File file = new File ("...path...");
String encoding = "...."; // Encoding of your file
Reader reader = new BufferedReader (new InputStreamReader (
new FileInputStream (file), encoding));
... use reader ...
reader.close ();
You should really specify the encoding or else you will get strange results when you encounter umlauts, Unicode and the like.
The easiest option is to simply use the Apache Commons IO JAR and import the org.apache.commons.io.FileUtils class. There are many possibilities when using this class, but the most obvious is the following:
List<String> lines = FileUtils.readLines(new File("untitled.txt"));
It's that easy.
"Don't reinvent the wheel."
The best approach to reading a file in Java is to open it, read it line by line, process each line, and close the stream:
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console - do what you want to do
System.out.println (strLine);
}
//Close the reader (this also closes the underlying stream)
br.close();
Your question is not very clear, so I'll only answer the "read" part:
List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(new FileReader("fileName"));
String line = br.readLine();
while (line != null)
{
lines.add(line);
line = br.readLine();
}
Commonly used:
String line = null;
File file = new File( "readme.txt" );
FileReader fr = null;
try
{
    fr = new FileReader( file );
}
catch (FileNotFoundException e)
{
    System.out.println( "File doesn't exist" );
    e.printStackTrace();
}
BufferedReader br = new BufferedReader( fr );
try
{
    while( (line = br.readLine()) != null )
    {
        System.out.println( line );
    }
    br.close();
}
catch (IOException e)
{
    e.printStackTrace();
}
@user248921, first of all, you can store anything in a String array, so you can make a String array, store each line in it, and use the values in your code whenever you want. You can use the code below to store heterogeneous lines (containing strings, ints, booleans, etc.) in an array.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class user {
    public static void main(String x[]) throws IOException {
        BufferedReader b = new BufferedReader(new FileReader("<path to file>"));
        String[] user = new String[500];
        String line = "";
        int i = 0;
        while ((line = b.readLine()) != null) {
            user[i] = line;
            System.out.println(user[i]);
            i++;
        }
        b.close();
    }
}
This is a nice way to work with Streams and Collectors.
List<String> myList;
try(BufferedReader reader = new BufferedReader(new FileReader("yourpath"))){
myList = reader.lines() // This will return a Stream<String>
.collect(Collectors.toList());
}catch(Exception e){
e.printStackTrace();
}
When working with Streams you have also multiple methods to filter, manipulate or reduce your input.
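For instance, a small sketch (the file name and the filter condition are just examples) that drops blank lines and collects the rest:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

public class FilterLines {
    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader("yourpath"))) {
            List<String> nonEmpty = reader.lines()
                    .filter(line -> !line.trim().isEmpty()) // drop blank lines
                    .map(String::trim)                      // trim the remaining lines
                    .collect(Collectors.toList());
            System.out.println("Non-empty lines: " + nonEmpty.size());
        }
    }
}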
For Java 11 you could use one of the following short approaches:
Path path = Path.of("file.txt");
try (var reader = Files.newBufferedReader(path)) {
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
Or:
var path = Path.of("file.txt");
List<String> lines = Files.readAllLines(path);
lines.forEach(System.out::println);
Or:
Files.lines(Path.of("file.txt")).forEach(System.out::println);