Program not reading text file properly outside of netbeans - java

I've run into a weird problem with NetBeans: my program needs to read a CSV file and get the first two columns from it. I am using opencsv to do the parsing. After I built my program and tried to run the jar outside of NetBeans, the program didn't behave as it should. After much debugging, I've managed to narrow down the problem a little.
My program is supposed to read kanji from a text file. And it does so very well while inside netbeans. But if I try to run it outside of netbeans, two things happen.
1) It doesn't read in the right characters. If I output everything I read in into a new csv file, then instead of getting kanji, I get characters like: 会 and 髪. The second column of my csv file is in English and that gets read in and written out properly.
2) It doesn't read all the lines; when I counted how many lines were being read, inside the IDE that number was correct. But outside of it, I am missing about 100 or so lines.
Can anyone help me figure out why this might be happening? I've never worked with anything but English, so character encoding is a bit foreign to me. But I did check in NetBeans, and the encoding is set to UTF-8.
Edit: code as requested in comment. Though I don't know if this will be that helpful. The variable map is a hashmap.
private void loadNames() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
reader = new CSVReader(new FileReader(file));
String[] line;
while ((line = reader.readNext()) != null) {
map.put(line[0], line[1]);
}
//debug code
int counter = 0;
CSVWriter writer = new CSVWriter(new FileWriter("test.txt"), ',',
CSVWriter.NO_QUOTE_CHARACTER,
CSVWriter.NO_ESCAPE_CHARACTER,
System.getProperty("line.separator"));
for(Map.Entry<String, String> entry : map.entrySet()){
String[] string = {entry.getKey(), entry.getValue()};
writer.writeNext(string);
counter++;
}
JOptionPane.showMessageDialog(null, counter);
} catch (FileNotFoundException ex) {
ErrorHandler.displayError("ka_data.csv file not found in folder Data");
} catch (IOException ex) {
ErrorHandler.displayError("error at readNext");
}
}
});
thread.start();
}

Have you tried manually setting the charset when reading the file in? I suspect you have a default charset issue. Try:
new CSVReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
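To illustrate the suggestion above, here is a minimal sketch that decodes the file as UTF-8 explicitly instead of relying on the platform default charset, which is what FileReader uses on older Java versions. To stay self-contained it does not use opencsv and splits on commas with String.split, so the class name Utf8ColumnReader and the method readFirstTwoColumns are made up for illustration; with opencsv, the same InputStreamReader would be passed to CSVReader as in the line above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of opencsv or of the original program.
final class Utf8ColumnReader {

    // Reads the first two comma-separated columns of every line, decoding the bytes as UTF-8.
    static Map<String, String> readFirstTwoColumns(Path file) throws IOException {
        Map<String, String> map = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",", -1); // naive split: no quoted fields
                if (cols.length >= 2) {
                    map.put(cols[0], cols[1]);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("kanji", ".csv");
        // Unicode escapes keep this source file independent of the compiler's encoding.
        Files.write(tmp, "\u4f1a,meeting\n\u9aea,hair\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFirstTwoColumns(tmp).size() + " entries read");
        Files.delete(tmp);
    }
}
```

The garbled output in the question is consistent with UTF-8 bytes being decoded with a Western single-byte charset, which would explain why the program behaves differently when it is started outside the IDE.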

CsvMalformedLineException: Unterminated quoted field at end of CSV line

I am writing code to process a list of tar.gz files, inside which there are multiple, csv files. I have encountered the error below
com.opencsv.exceptions.CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: [,,,,,,
]
at com.opencsv.CSVReader.primeNextRecord(CSVReader.java:245)
at com.opencsv.CSVReader.flexibleRead(CSVReader.java:598)
at com.opencsv.CSVReader.readNext(CSVReader.java:204)
at uk.ac.shef.inf.analysis.Test.readAllLines(Test.java:64)
at uk.ac.shef.inf.analysis.Test.main(Test.java:42)
And the code causing this problem is below, on line B.
public class Test {
public static void main(String[] args) {
try {
Path source = Paths.get("/home/xxxx/Work/data/amazon/labelled/small/Books_5.json.1.tar.gz");
InputStream fi = Files.newInputStream(source);
BufferedInputStream bi = new BufferedInputStream(fi);
GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
TarArchiveInputStream ti = new TarArchiveInputStream(gzi);
CSVParser parser = new CSVParserBuilder().withStrictQuotes(true)
.withQuoteChar('"').withSeparator(',')
.withEscapeChar('|') // Line A
.build();
BufferedReader br = null;
ArchiveEntry entry;
entry = ti.getNextEntry();
while (entry != null) {
br = new BufferedReader(new InputStreamReader(ti)); // Read directly from tarInput
System.out.format("\n%s\t\t > %s", new Date(), entry.getName());
try{
CSVReader reader = new CSVReaderBuilder(br).withCSVParser(parser)
.build();
List<String[]> r = readAllLines(reader);
} catch (Exception ioe){
ioe.printStackTrace();
}
System.out.println(entry.getName());
entry=ti.getNextEntry(); // Line B
}
}catch (Exception e){
e.printStackTrace();
}
}
private static List<String[]> readAllLines(CSVReader reader) {
List<String[]> out = new ArrayList<>();
int line=0;
try{
String[] lineInArray = reader.readNext();
while(lineInArray!=null) {
//System.out.println(Arrays.asList(lineInArray));
out.add(lineInArray);
line++;
lineInArray=reader.readNext();
}
}catch (Exception e){
System.out.println(line);
e.printStackTrace();
}
System.out.println(out.size());
return out;
}
}
I also attach a screenshot of the actual line within the csv file that caused this problem here, look at line 5213. I also include a test tar.gz file here: https://drive.google.com/file/d/1qHfWiJItnE19-BFdbQ3s3Gek__VkoUqk/view?usp=sharing
While debugging, I have some questions.
I think the issue is the \ character in the data file (line 5213 above), which is the escape character in Java. I verified this idea by adding line A to my code above, and it works. However, I obviously don't want to hardcode this, as there can be other characters in the data causing the same issue. So my question 1 is: is there any way to tell Java to ignore escape characters? Something like the opposite of withEscapeChar('|')? UPDATE: the answer is to use '\0', thanks to the first comment below.
When debugging, I notice that my program stops working on the next .csv file within the tar.gz file as soon as it hit the above exception. To explain what I mean, inside the tar.gz file included in the above link, there are two csvs: _10.csv and _110.csv. The problematic line is in _10.csv. When my program hit that line, an exception is thrown and the program moves on to the next file _110.csv (entry=ti.getNextEntry();). This file is actually fine, but the method readAllLines that is supposed to read this next csv file will throw the same exception immediately on the first line. I don't think my code is correct, especially the while loop: I suspect the input stream was still stuck at the previous position that caused the exception. But I don't know how to fix this. Help please?
Using RFC4180Parser worked for me.

Creating a File using PrintWriter in Java, and Writing to that File

I'm trying to write a program that reads a file (which is a Java source file), makes an ArrayList of certain specified values from that file, and outputs that ArrayList into another resulting file.
I'm using PrintWriter to make the new resulting file. This is a summarised version of my program:
ArrayList<String> exampleArrayList = new ArrayList<String>();
File actualInputFile = new File("C:/Desktop/example.java");
PrintWriter resultingSpreadsheet= new PrintWriter("C:/Desktop/SpreadsheetValues.txt", "UTF-8");
FileReader fr = new FileReader(actualInputFile);
BufferedReader br = new BufferedReader(fr);
String line=null;
while ((line = br.readLine()) != null) {
// code that makes ArrayList
}
for (int i = 0; i < exampleArrayList.size(); i++) {
resultingSpreadsheet.println(exampleArrayList.get(i));
}
resultingSpreadsheet.close();
The problem is that when I run this, nothing gets printed to the resultingSpreadsheet. It's completely empty.
BUT, this program works perfectly (meaning that it prints out everything correctly to the resultingSpreadsheet file) when I replace:
File actualInputFile = new File("C:/Desktop/example.java");
which is the file that I want as my input file, and which has a size of 481 KB,
with:
File smallerInputFile = new File("C:/Desktop/smallerExample.txt");
which is really just a smaller .txt example version of the .java source file, and it has a size of 1.08 KB.
I've tried a few things including flushing the PrintWriter, wrapping it around FileWriter, copy-pasting all the code from the .java file into a text file in case it was an extension problem, but these don't seem to work.
I'm starting to think it must be because of the size of the file that the PrintWriter makes, but it's very possible that that's not the problem. Perhaps I need to put everything in a stream (like it says here: http://docs.oracle.com/javase/6/docs/api/java/io/PrintWriter.html)? If so, how would I do that?
Why is reading the bigger actualInputFile and outputting its data correctly such a problem, when everything works fine for the smallerInputFile?
Can anyone help with this?
Check for exceptions while writing to the Excel sheet, because I really don't think it's a problem of size. Below is sample code that executes successfully, and the file size was approximately 1 MB.
public class Test {
/**
* @param args
*/
public static void main(String[] args) {
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new FileReader("D:\\AdminController.java"));
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
This should go as a comment, but I do not have the rep. In the documentation it has both write methods and print methods. Have you tried using write() instead?
I doubt it's the size of the file. It may be that, of the two files you are testing, one is .txt and the other is .java.
EDIT: Probably second suggestion of the two. First is just something I noticed with the docs.
The methods of PrintWriter do not throw I/O exceptions. Call the checkError() method, which flushes the stream and returns true if an error has occurred. It is quite possible that an error occurred while processing the larger file, an encoding error for instance.
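A minimal sketch of that check, assuming a deliberately failing stream so the effect is visible: FailingStream and writeAndCheck are made-up names for illustration, and the simulated failure stands in for whatever real I/O problem might occur.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

final class CheckErrorDemo {

    // Hypothetical stream that fails on every write, standing in for a real I/O problem.
    static final class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    // Prints one line and reports whether PrintWriter recorded an error.
    // The target stream is left open; closing it is the caller's job.
    static boolean writeAndCheck(OutputStream target, String text) {
        PrintWriter out = new PrintWriter(target);
        out.println(text);        // does not throw, even if the underlying write fails
        return out.checkError();  // flushes, then returns true if any write failed
    }
}
```

Calling writeAndCheck(new CheckErrorDemo.FailingStream(), "x") returns true, while the same call with a java.io.ByteArrayOutputStream returns false.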
Check your program. When the file is empty, it usually means that your program doesn't close the PrintWriter before it finishes.
For example, you may have a return somewhere in your program that prevents resultingSpreadsheet.close(); from being run.
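One way to rule this out is try-with-resources, which closes (and therefore flushes) the writer on every exit path, including an early return. The sketch below is only an illustration under assumptions: AutoCloseWriter and writeLines are made-up names, and the null check merely simulates an early return.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class AutoCloseWriter {

    // Writes lines until a null entry is met; the writer is closed on every exit path.
    static void writeLines(Path target, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (String line : lines) {
                if (line == null) {
                    return; // early exit: try-with-resources still flushes and closes
                }
                out.println(line);
            }
        }
    }
}
```

With the list "a", "b", null, "c", the file ends up containing the two lines a and b, even though the method returned before reaching the end of the loop.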

TextArea - Any way to get all text?

so I'm designing a text editor. For the Open/Save methods, I'm trying to use a TextArea (it doesn't have to be one, it's just my current method). Now, I have two problems right now:
1) When I load a file, it currently doesn't remove the contents currently in the text editor. For example, if I typed in "Owl", then loaded a file that contained "Rat", it would end up as "OwlRat". To solve this, I plan to use the replaceRange method (again however, it isn't absolute, any suggestions would be great!). However, I must replace all the contents of the text editor, not just selected text, and I can't figure out how to do that. Any tips?
2) Currently, when I load a file, nothing happens unless I saved that file during the same run of the application. So, for example, running the program, saving a file, closing the program, running the program again, and then loading the file gives nothing. I know this is because the String x doesn't carry over, but I can't think of any way to fix it. Somebody suggested Vectors, but I don't see how they would help...
Here is the code for the Open/Save methods:
Open:
public void Open(String name){
File textFile = new File(name + ".txt.");
BufferedReader reader = null;
try
{
textArea.append(x);
reader = new BufferedReader( new FileReader( textFile));
reader.read();
}
catch ( IOException e)
{
}
finally
{
try
{
if (reader != null)
reader.close();
}
catch (IOException e)
{
}
}
}
Save:
public void Save(String name){
File textFile = new File(name + ".txt");
BufferedWriter writer = null;
try
{
writer = new BufferedWriter( new FileWriter(textFile));
writer.write(name);
x = textArea.getText();
}
catch ( IOException e)
{
}
finally
{
try
{
if ( writer != null)
writer.close( );
}
catch ( IOException e)
{
}
}
}
I had the same problem, my friend, and after much thought and research I found a solution.
You can put the contents of the TextArea into an ArrayList and pass it as a parameter to the save method. Since the writer writes strings line by line, a for loop over the ArrayList writes each entry, and at the end the TextArea contents are in the .txt file.
If something does not make sense, I apologize: I do not speak English and used Google Translate.
Note that Windows Notepad does not always show the line breaks and may display everything on one line; use WordPad instead.
private void SaveActionPerformed(java.awt.event.ActionEvent evt) {
String NameFile = Name.getText();
ArrayList< String > Text = new ArrayList< String >();
Text.add(TextArea.getText());
SaveFile(NameFile, Text);
}
public void SaveFile(String name, ArrayList< String> message) {
path = "C:\\Users\\Paulo Brito\\Desktop\\" + name + ".txt";
File file1 = new File(path);
try {
if (!file1.exists()) {
file1.createNewFile();
}
File[] files = file1.listFiles();
FileWriter fw = new FileWriter(file1, true);
BufferedWriter bw = new BufferedWriter(fw);
for (int i = 0; i < message.size(); i++) {
bw.write(message.get(i));
bw.newLine();
}
bw.close();
fw.close();
FileReader fr = new FileReader(file1);
BufferedReader br = new BufferedReader(fr);
fw = new FileWriter(file1, true);
bw = new BufferedWriter(fw);
while (br.ready()) {
String line = br.readLine();
System.out.println(line);
bw.write(line);
bw.newLine();
}
br.close();
fr.close();
} catch (IOException ex) {
ex.printStackTrace();
JOptionPane.showMessageDialog(null, "Error in" + ex);
}
}
There's a lot going on here...
What is 'x' (hint: it's not anything from the file!), and why are you appending it to the text area?
BufferedReader.read() returns one character, which is probably not what you're expecting. Try looping over readLine().
Follow Dave Newton's advice to handle your exceptions and provide better names for your variables.
The text file will persist across multiple invocations of your program, so the lack of data has nothing to do with that.
Good luck.
Use textArea.setText(TEXT); rather than append; append means to add on to, so when you append text to a TextArea, you add that text to it. setText on the other hand will set the text, replacing the old text with the new one (which is what you want).
As far as why it's failing to read, you are not reading correctly. First of all, .read() just reads a single character (not what you want). Second, you don't appear to do anything with the returned results. Go somewhere (like here) to find out how to read the file properly, then take the returned string and do textArea.setText(readString);.
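A minimal sketch of the reading part, assuming the file is UTF-8 text: FileToText and readAll are made-up names for illustration, and the text area itself is left out so the snippet runs without a GUI.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileToText {

    // Reads the whole file line by line and returns it as one string.
    // Every line, including the last, is terminated with '\n'.
    static String readAll(Path file) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

In the Open method, the result would then replace the editor contents with something like textArea.setText(FileToText.readAll(textFile.toPath())); instead of textArea.append(x);.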
And like the others said, use e.printStackTrace(); in all of your catch blocks to make the error actually show up in your console.

BufferedReader not reading file (Android)

I am having a problem reading files with BufferedReader. I am trying to read in a dictionary file where every word is on its own line. It works for one file I have, but when I tried adding a larger wordlist file (the enable wordlist), the first read, 'while ((currentLine=br.readLine()) != null) ', causes an exception with no description. Please help!
try
{
InputStream is = this.getResources().openRawResource(R.raw.enable1);
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String currentLine=null;
while ((currentLine=br.readLine()) != null)
{
dictionaryList.add(currentLine);
}
br.close();
}
catch (Exception e)
{
//error here
}
*Looks like there is a file size limit of 1048576 bytes... otherwise it crashes.
So, like I said in the edit, the new wordlist was over 1048576 bytes and was causing an IO exception without any error message (I had a string set to e.getMessage() in the catch, but the message was null).
What I did was divide the wordlist into separate files based on word size (btw there are 26 different files! message me if you want them)
then depending on the size of the word I have I load the specific wordlist where all of the files are in the format enable# (# is the word size). If anyone wants to know I am doing that like this:
int wordListID=0;
String wordList="enable"+goodText.length();
try {
Class res = R.raw.class;
Field field = res.getField(wordList);
wordListID= field.getInt(null);
}
catch (Exception e) {
//something
}
I then send that specific wordListID to:
InputStream is = this.getResources().openRawResource(wordListID);
and now I have a small enough file, which actually helps my performance too!
*This is my first application so I may not be doing things the correct way... just trying to get the hang of things

Reading a line from a text file using FileReader, using System.out.println seems print in unicode?

I'm still teaching myself Java, so I wanted to try to read a text file and, as step 1, output it to the console and, as step 2, write the contents to a new txt file.
Here is some code I googled to start with. It is reading the file, but when I output the line contents to the console I get the following (it looks like it's outputting in Unicode or something, as if every character has an extra byte associated with it):
ÿþFF□u□l□l□ □T□i□l□t□ □P□o□k□e□r□ <SNIP>
Here is what the first line of the file looks like when I open in via notepad:
Full Tilt Poker Game #xxxxxxxxxx: $1 + $0.20 Sit & Go (xxxxxxxx), Table 1 - 15/30 - No Limit Hold'em - 22:09:45 ET - 2009/12/26
Here is my code. Do I need to specify the encoding to display txt file contents on the console? I assumed that simple text would be straightforward for Java, but I'm new and don't understand much about how finicky Java is yet.
EDIT: I don't know if it matters, but I'm currently using Eclipse as my IDE.
package readWrite;
import java.io.*;
public class Read {
public static void main(String args[])
{
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader("C:\\Users\\brian\\workspace\\downloads\\poker_text.txt"));
String line = reader.readLine();
while (line!=null) {
// Print read line
System.out.println(line);
// Read next line for while condition
line = reader.readLine();
}
} catch (IOException ioe) {
System.out.println(ioe.getMessage());
} finally {
try { if (reader!=null) reader.close(); } catch (Exception e) {}
}
}
}
The ÿþ at the beginning appears to be a Byte Order Mark for a UTF-16 encoded file.
http://en.wikipedia.org/wiki/Byte_order_mark#UTF-16
You might need to read the file in a different manner so Java can convert those UTF-16 characters to something your System.out can display.
Try something like this
FileInputStream fis = new FileInputStream("filename");
BufferedReader reader = new BufferedReader(new InputStreamReader(fis, "UTF-16"));
OR
Open up your text file in Notepad again and choose File/Save As. On the save screen (at least in Windows 7) there is a pulldown with the encoding setting. Choose ANSI or UTF-8.
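As a fuller sketch of the first option: Java's "UTF-16" charset reads the byte order mark at the start of the stream to pick the byte order and does not pass it on as text. Utf16Reader and readLines are made-up names for illustration, and the demo file is built in main with the same FF FE mark that shows up as ÿþ in the question.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf16Reader {

    // Reads all lines of a UTF-16 file; the decoder uses the BOM to choose the byte order.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_16))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a little-endian UTF-16 file with a BOM, similar to one saved as "Unicode" in Notepad.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        bytes.write(0xFF);
        bytes.write(0xFE);
        bytes.write("Full Tilt Poker\r\n".getBytes(StandardCharsets.UTF_16LE));
        Path tmp = Files.createTempFile("poker", ".txt");
        Files.write(tmp, bytes.toByteArray());
        System.out.println(readLines(tmp));
        Files.delete(tmp);
    }
}
```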
