How can I properly read an Arabic dataset in Java? - java

Scenario: I want to read an Arabic dataset with utf-8 encoding. Each word in each line is separated by a space.
Problem: When I read each line, the output is:
??????? ?? ???? ?? ???
Question: How can I read the file and print each line?
For more information, here is my Arabic dataset, and the part of my source code that reads the data looks like this:
private ContextCountsImpl extractContextCounts(Map<Integer, String> phraseMap) throws IOException {
Reader reader;
reader = new InputStreamReader(new FileInputStream(inputFile), "utf-8");
BufferedReader rdr = new BufferedReader(reader);
while (rdr.ready()) {
String line = rdr.readLine();
System.out.println(line);
List<String> phrases = splitLineInPhrases(line);
//any process on this file
}
}

I can read the file using UTF-8. Can you try it like this?
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadArabic {
    public static void main(String[] args) {
        // try-with-resources closes the reader and the underlying stream automatically
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(new FileInputStream("arabic.txt"), "UTF-8"))) { // leave charset out for default
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.err.println(e.getMessage()); // report read and encoding errors
        }
    }
}
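If the file is decoded correctly but the console still prints question marks, the output side may be the culprit: System.out encodes text with the platform default charset, which may not be able to represent Arabic letters. The following is a minimal sketch, assuming the terminal itself can display UTF-8. The class name ReadArabicUtf8, the helper readLines, and the sample text are illustrative and not taken from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReadArabicUtf8 {

    // Reads every line of the file, decoding the bytes as UTF-8.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data (Arabic for "hello world"), written as
        // Unicode escapes so the sketch does not depend on the source encoding.
        Path file = Files.createTempFile("arabic", ".txt");
        Files.write(file,
                Arrays.asList("\u0645\u0631\u062D\u0628\u0627 \u0628\u0627\u0644\u0639\u0627\u0644\u0645"),
                StandardCharsets.UTF_8);

        // Assumption: the terminal renders UTF-8. Wrapping System.out makes the
        // bytes sent to the console UTF-8 instead of the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String line : readLines(file)) {
            out.println(line);
        }
        Files.delete(file);
    }
}
```

If the output still shows "?", the terminal or IDE console encoding is the next thing to check, since no Java-side setting can make a non-UTF-8 console display these characters.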

Related

Is there a way to encode data from Windows-1253 encoding to ISO-8859-1 in java?

By mistake I encoded hex data with Windows-1253 (the Eclipse option for Java), but the data should have been encoded with ISO-8859-1. Is there a way to re-encode the data to get the right conversion?
If your version of Java supports Windows-1253 (mine does), then yes. You can check with Charset.isSupported.
Re-encoding:
void encode( File src, File tgt ) throws IOException {
if (Charset.isSupported("Windows-1253")) {
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(src), "Windows-1253"))) {
try (BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(tgt), "ISO-8859-1"))) {
String del = "";
for (String line = br.readLine(); line != null; line = br.readLine()) {
bw.write(del);
bw.write(line);
del = "\r\n";
}
bw.flush();
}
}
} else {
throw new IOException("Unsupported character encoding: Windows-1253");
}
}
(Not tested.)
The way I read the data from Java is this:
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
}
}

Reading bytes from a file

I am reading from a ".264" file using code below.
public static void main (String[] args) throws IOException
{
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new InputStreamReader(new FileInputStream("test.264"),"ISO-8859-1"));
StringBuffer stringBuffer = new StringBuffer();
while ((sCurrentLine = br.readLine()) != null) {
stringBuffer.append(sCurrentLine);
}
String tempdec = new String(asciiToHex(stringBuffer.toString()));
System.out.println(tempdec);
String asciiEquivalent = hexToASCII(tempdec);
BufferedWriter xx = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("C:/Users/Administrator/Desktop/yuvplayer-2.3/video dinalized/testret.264"),"ISO-8859-1"));
xx.write(asciiEquivalent);
xx.close();
}catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
Opening the input and output files in a hex editor shows some missing values, e.g. 0d (see pictures attached).
Any solution to fix this?
Lose the InputStreamReader and BufferedReader and just use FileInputStream on its own. readLine() strips line terminators, which is why bytes such as 0d go missing.
No character encoding, no line endings, just bytes.
Its Javadoc page is here, that should be all you need.
Tip if you want to read the entire file at once: File.length(), as in
File file = new File("test.264");
byte[] buf = new byte[(int)file.length()];
// Use buf in InputStream.read()
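As an alternative to sizing the buffer by hand, java.nio.file.Files can read and write the whole file as bytes. This is a minimal sketch of a byte-for-byte copy, assuming the file fits in memory. The class name CopyBytes and the sample bytes are illustrative; the hex conversion helpers from the question are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class CopyBytes {

    // Copies a file byte for byte; no charset or line-ending handling is involved.
    static void copy(Path source, Path target) throws IOException {
        byte[] data = Files.readAllBytes(source); // the whole file is held in memory
        Files.write(target, data);                // creates or truncates the target
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample bytes, including 0x0d and 0x0a, which a
        // line-based reader would have dropped.
        Path source = Files.createTempFile("in", ".264");
        Path target = Files.createTempFile("out", ".264");
        Files.write(source, new byte[] {0x00, 0x0d, 0x0a, (byte) 0xff, 0x0d});

        copy(source, target);
        System.out.println(Arrays.equals(Files.readAllBytes(source), Files.readAllBytes(target)));

        Files.delete(source);
        Files.delete(target);
    }
}
```

For very large video files, streaming through a fixed-size buffer with InputStream.read(byte[]) avoids loading everything at once.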

Writing multiple queries from a test file

public static void main(String[] args) {
ArrayList<String> studentTokens = new ArrayList<String>();
ArrayList<String> studentIds = new ArrayList<String>();
try {
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream(new File("file1.txt"));
BufferedReader br = new BufferedReader(new InputStreamReader(fstream, "UTF8"));
String strLine;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
strLine = strLine.trim();
if ((strLine.length()!=0) && (!strLine.contains("#"))) {
String[] students = strLine.split("\\s+");
studentTokens.add(students[TOKEN_COLUMN]);
studentIds.add(students[STUDENT_ID_COLUMN]);
}
}
for (int i=0; i<studentIds.size();i++) {
File file = new File("query.txt"); // The template file that contains the query
BufferedReader reader = new BufferedReader(new FileReader(file));
String line = "", oldtext = "";
while ((line = reader.readLine()) != null) {
oldtext += line + "\r\n";
}
reader.close();
String newtext = oldtext.replace("sanid", studentIds.get(i)).replace("salabel",studentTokens.get(i)); // Replace both placeholders with the current student's ID and token
FileWriter writer = new FileWriter("final.txt",true);
writer.write(newtext);
writer.close();
}
fstream.close();
br.close();
System.out.println("Done!!");
} catch (Exception e) {
e.printStackTrace();
System.err.println("Error: " + e.getMessage());
}
}
The above code reads data from a text file. query.txt contains a query in which two placeholders, "sanid" and "salabel", are replaced by the contents of the string lists, and the result is written to another file, final.txt. But when I run the code, final.txt does not contain the queries, although while debugging it shows that all the values are replaced properly.
"but while debugging it shows that all the values are replaced properly"
If the values are replaced correctly when you debug the code but are missing from the file, I would suggest flushing the output stream explicitly before closing it. Strictly speaking, Writer.close() is specified to flush the stream first, so the explicit flush() should be redundant, but it is harmless and rules out buffering as the cause. For reference, FileWriter inherits close() from OutputStreamWriter, which delegates to the underlying StreamEncoder:
public void close() throws IOException {
se.close();
}
Try this
writer.flush();
writer.close();
That should do it.
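A way to sidestep the flush question altogether is try-with-resources, which closes (and therefore flushes) the writer even when an exception is thrown. The sketch below assumes UTF-8 files and reads the template once instead of once per student. The class QueryWriter, its method names, and the sample query are hypothetical; only the placeholders "sanid" and "salabel" come from the question.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

class QueryWriter {

    // Replaces the two placeholders from the question in one template string.
    static String fill(String template, String studentId, String token) {
        return template.replace("sanid", studentId).replace("salabel", token);
    }

    // Appends one filled-in query per student to the output file.
    static void writeQueries(Path templateFile, Path output,
                             List<String> ids, List<String> tokens) throws IOException {
        String template = new String(Files.readAllBytes(templateFile), StandardCharsets.UTF_8);
        // The writer is closed, and its buffer flushed, when this block exits.
        try (BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (int i = 0; i < ids.size(); i++) {
                writer.write(fill(template, ids.get(i), tokens.get(i)));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical template and values standing in for query.txt and file1.txt.
        Path template = Files.createTempFile("query", ".txt");
        Path output = Files.createTempFile("final", ".txt");
        Files.write(template,
                "SELECT * FROM t WHERE id = sanid AND label = 'salabel';".getBytes(StandardCharsets.UTF_8));

        writeQueries(template, output, Arrays.asList("1001", "1002"), Arrays.asList("tokA", "tokB"));
        Files.readAllLines(output, StandardCharsets.UTF_8).forEach(System.out::println);

        Files.delete(template);
        Files.delete(output);
    }
}
```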

Modify the content of a file using Java

I want to delete some content from a file using the Java program below. Is this the right way to replace content in the same file, or should it be copied to another file?
As it stands, it deletes all of the file's content.
class FileReplace
{
ArrayList<String> lines = new ArrayList<String>();
String line = null;
public void doIt()
{
try
{
File f1 = new File("d:/new folder/t1.htm");
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null)
{
if (line.contains("java"))
line = line.replace("java", " ");
lines.add(line);
}
FileWriter fw = new FileWriter(f1);
BufferedWriter out = new BufferedWriter(fw);
out.write(lines.toString());
}
catch (Exception ex)
{
ex.printStackTrace();
}
}
public static void main(String args[])
{
FileReplace fr = new FileReplace();
fr.doIt();
}
}
I would start by closing the reader and flushing the writer:
public class FileReplace {
List<String> lines = new ArrayList<String>();
String line = null;
public void doIt() {
try {
File f1 = new File("d:/new folder/t1.htm");
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) {
if (line.contains("java"))
line = line.replace("java", " ");
lines.add(line);
}
fr.close();
br.close();
FileWriter fw = new FileWriter(f1);
BufferedWriter out = new BufferedWriter(fw);
for (String s : lines) {
out.write(s);
out.newLine(); // readLine() strips the line terminator, so add it back
}
out.flush();
out.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
public static void main(String args[]) {
FileReplace fr = new FileReplace();
fr.doIt();
}
}
The accepted answer is great. However, there is an easier way to replace content in a file using Apache's commons-io library (commons-io-2.4.jar; any recent version works):
private void update() throws IOException{
File file = new File("myPath/myFile.txt");
String fileContext = FileUtils.readFileToString(file);
fileContext = fileContext.replaceAll("_PLACEHOLDER_", "VALUE-TO-BE-REPLACED");
FileUtils.write(file, fileContext);
}
Note: Thrown IOException needs to be caught and handled by the application accordingly.
Reading and writing to the same file simultaneously is not OK.
EDIT: to rephrase and be more correct and specific: reading and writing to the same file, in the same thread, without properly closing the reader (and flushing the writer) is not OK.
Make sure to:
- close every stream when you no longer need it, in particular before reopening the file for writing;
- truncate the file, so that it shrinks if you write less than it previously contained;
- then write the output;
- write individual lines, and don't rely on toString;
- flush and close when you are finished writing!
If you use buffered IO, you always have to ensure that the buffer is flushed at the end, or you might lose data!
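The checklist above can be sketched with java.nio.file.Files, which closes the streams itself and truncates an existing file by default when writing. This is a minimal sketch, assuming the file is UTF-8 text and small enough to hold in memory. The class name ReplaceInFile and the sample content are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReplaceInFile {

    // Replaces every occurrence of target in the file and writes the result back.
    static void replaceInFile(Path file, String target, String replacement) throws IOException {
        // readAllLines opens, reads and closes the file before returning.
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<String> updated = new ArrayList<>();
        for (String line : lines) {
            updated.add(line.replace(target, replacement));
        }
        // With no options given, Files.write truncates the existing file and
        // writes each element followed by the platform line separator.
        Files.write(file, updated, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file standing in for t1.htm.
        Path file = Files.createTempFile("t1", ".htm");
        Files.write(file, Arrays.asList("<p>I like java</p>", "<p>no match here</p>"),
                StandardCharsets.UTF_8);

        replaceInFile(file, "java", " ");
        Files.readAllLines(file, StandardCharsets.UTF_8).forEach(System.out::println);

        Files.delete(file);
    }
}
```

If a crash in the middle of the write must not leave a half-written file, writing to a temporary file and moving it over the original is a common safer variant.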
I can see three problems.
First you are writing to out which I assume is System.out, not an output stream to the file.
Second, if you do write to an output stream to the file, you need to close it.
Third, the toString() method on an ArrayList isn't going to write the file like you are expecting. Loop over the list and write each String one at a time. Ask yourself whether you need to write newline characters as well.
The accepted answer is slightly wrong. Here's the correct code.
public class FileReplace {
List<String> lines = new ArrayList<String>();
String line = null;
public void doIt() {
try {
File f1 = new File("d:/new folder/t1.htm");
FileReader fr = new FileReader(f1);
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) {
if (line.contains("java"))
line = line.replace("java", " ");
lines.add(line);
}
fr.close();
br.close();
FileWriter fw = new FileWriter(f1);
BufferedWriter out = new BufferedWriter(fw);
for (String s : lines) {
out.write(s);
out.newLine();
}
out.flush();
out.close();
} catch (Exception ex) {
ex.printStackTrace();
}
}
}

Reading a text file in java

How would I read a .txt file in Java and put every line in an array when every line contains integers, strings, and doubles, and every line has a different number of words/numbers?
I'm a complete noob in Java so sorry if this question is a bit stupid.
Thanks
Try the Scanner class which no one knows about but can do almost anything with text.
To get a reader for a file, use
File file = new File ("...path...");
String encoding = "...."; // Encoding of your file
Reader reader = new BufferedReader (new InputStreamReader (
new FileInputStream (file), encoding));
... use reader ...
reader.close ();
You should really specify the encoding or else you will get strange results when you encounter umlauts, Unicode and the like.
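To illustrate the Scanner suggestion for lines that mix integers, doubles, and strings, here is a minimal sketch. It assumes whitespace-separated tokens and numbers written with a dot as the decimal separator. The class MixedLineParser and the method parseLine are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class MixedLineParser {

    // Splits one line into typed values: an Integer, Double or String per token.
    static List<Object> parseLine(String line) {
        List<Object> values = new ArrayList<>();
        try (Scanner tokens = new Scanner(line)) {
            tokens.useLocale(Locale.US); // so "3.5" parses whatever the default locale is
            while (tokens.hasNext()) {
                if (tokens.hasNextInt()) {
                    values.add(tokens.nextInt());
                } else if (tokens.hasNextDouble()) {
                    values.add(tokens.nextDouble());
                } else {
                    values.add(tokens.next());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical sample line; prints [42, abc, 3.5]
        System.out.println(parseLine("42 abc 3.5"));
    }
}
```

To process a whole file, each line obtained from a reader can be passed to parseLine and the resulting lists collected in an outer list.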
The easiest option is to simply use the Apache Commons IO JAR and import the org.apache.commons.io.FileUtils class. There are many possibilities when using this class, but the most obvious would be as follows:
List<String> lines = FileUtils.readLines(new File("untitled.txt"));
It's that easy.
"Don't reinvent the wheel."
A straightforward approach to reading a file in Java is to open it, read it line by line, process each line, and close the stream:
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console - do what you want to do
System.out.println (strLine);
}
//Close the reader, which also closes the underlying input stream
br.close();
To learn more about how to read a file in Java, check out the article.
Your question is not very clear, so I'll only answer the "read" part:
List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(new FileReader("fileName"));
String line = br.readLine();
while (line != null)
{
lines.add(line);
line = br.readLine();
}
Commonly used:
File file = new File( "readme.txt" );
// try-with-resources closes the reader even if reading fails
try ( BufferedReader br = new BufferedReader( new FileReader( file ) ) )
{
    String line;
    while( (line = br.readLine()) != null )
    {
        System.out.println( line );
    }
}
catch (FileNotFoundException e)
{
    System.out.println( "File doesn't exist" );
    e.printStackTrace();
}
catch (IOException e)
{
    e.printStackTrace();
}
@user248921 First of all, you can store anything in a String array, so you can create a String array, store each line in it, and use the values in your code whenever you want. You can use the code below to store heterogeneous lines (containing strings, ints, booleans, etc.) in an array.
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class user {
    public static void main(String x[]) throws IOException {
        String[] user = new String[500]; // holds at most 500 lines
        int i = 0;
        try (BufferedReader b = new BufferedReader(new FileReader("<path to file>"))) {
            String line;
            // Stop at end of file or when the array is full
            while ((line = b.readLine()) != null && i < user.length) {
                user[i] = line;
                System.out.println(user[i]);
                i++;
            }
        }
    }
}
This is a nice way to work with Streams and Collectors.
List<String> myList;
try(BufferedReader reader = new BufferedReader(new FileReader("yourpath"))){
myList = reader.lines() // This will return a Stream<String>
.collect(Collectors.toList());
}catch(Exception e){
e.printStackTrace();
}
When working with Streams you also have multiple methods to filter, manipulate, or reduce your input.
With Java 11 you can use the following short approach:
Path path = Path.of("file.txt");
try (var reader = Files.newBufferedReader(path)) {
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
Or:
var path = Path.of("file.txt");
List<String> lines = Files.readAllLines(path);
lines.forEach(System.out::println);
Or:
try (var lines = Files.lines(Path.of("file.txt"))) { // the stream holds the file open until it is closed
lines.forEach(System.out::println);
}
