I am writing code to process a list of tar.gz files, each of which contains multiple CSV files. I have encountered the error below:
com.opencsv.exceptions.CsvMalformedLineException: Unterminated quoted field at end of CSV line. Beginning of lost text: [,,,,,,
]
at com.opencsv.CSVReader.primeNextRecord(CSVReader.java:245)
at com.opencsv.CSVReader.flexibleRead(CSVReader.java:598)
at com.opencsv.CSVReader.readNext(CSVReader.java:204)
at uk.ac.shef.inf.analysis.Test.readAllLines(Test.java:64)
at uk.ac.shef.inf.analysis.Test.main(Test.java:42)
And the code causing this problem is below, on line B.
public class Test {
public static void main(String[] args) {
try {
Path source = Paths.get("/home/xxxx/Work/data/amazon/labelled/small/Books_5.json.1.tar.gz");
InputStream fi = Files.newInputStream(source);
BufferedInputStream bi = new BufferedInputStream(fi);
GzipCompressorInputStream gzi = new GzipCompressorInputStream(bi);
TarArchiveInputStream ti = new TarArchiveInputStream(gzi);
CSVParser parser = new CSVParserBuilder().withStrictQuotes(true)
.withQuoteChar('"').withSeparator(',')
.withEscapeChar('|') // Line A
.build();
BufferedReader br = null;
ArchiveEntry entry;
entry = ti.getNextEntry();
while (entry != null) {
br = new BufferedReader(new InputStreamReader(ti)); // Read directly from tarInput
System.out.format("\n%s\t\t > %s", new Date(), entry.getName());
try{
CSVReader reader = new CSVReaderBuilder(br).withCSVParser(parser)
.build();
List<String[]> r = readAllLines(reader);
} catch (Exception ioe){
ioe.printStackTrace();
}
System.out.println(entry.getName());
entry=ti.getNextEntry(); // Line B
}
}catch (Exception e){
e.printStackTrace();
}
}
private static List<String[]> readAllLines(CSVReader reader) {
List<String[]> out = new ArrayList<>();
int line=0;
try{
String[] lineInArray = reader.readNext();
while(lineInArray!=null) {
//System.out.println(Arrays.asList(lineInArray));
out.add(lineInArray);
line++;
lineInArray=reader.readNext();
}
}catch (Exception e){
System.out.println(line);
e.printStackTrace();
}
System.out.println(out.size());
return out;
}
}
I also attach a screenshot of the actual line within the csv file that caused this problem here, look at line 5213. I also include a test tar.gz file here: https://drive.google.com/file/d/1qHfWiJItnE19-BFdbQ3s3Gek__VkoUqk/view?usp=sharing
While debugging, I have some questions.
I think the issue is the \ character in the data file (line 5213 above), which is the escape character in Java. I verified this by adding Line A to my code above, and it works. However, I obviously don't want to hardcode this, as there may be other characters in the data that cause the same issue. So question 1 is: is there any way to tell Java to ignore escape characters? Something like the opposite of withEscapeChar('|')? UPDATE: the answer is to use '\0', thanks to the first comment below.
When debugging, I noticed that my program fails on the next .csv file within the tar.gz file as soon as it hits the above exception. To explain what I mean: inside the tar.gz file included in the above link, there are two CSVs, _10.csv and _110.csv. The problematic line is in _10.csv. When my program hits that line, an exception is thrown and the program moves on to the next file, _110.csv (entry=ti.getNextEntry();). This file is actually fine, but the method readAllLines that is supposed to read it throws the same exception immediately on the first line. I don't think my code is correct, especially the while loop: I suspect the input stream is still stuck at the previous position that caused the exception. But I don't know how to fix this. Help please?
Using RFC4180Parser worked for me.
I've run into a weird problem with NetBeans. My program needs to read a CSV file and get the first two columns from it. I am using opencsv to do the parsing. After I built my program and tried to run the jar outside of NetBeans, the program didn't behave as it should. After much, much debugging trying to figure out what is going wrong, I've managed to narrow down the problem just a little bit.
My program is supposed to read kanji from a text file, and it does so very well inside NetBeans. But if I try to run it outside of NetBeans, two things happen.
1) It doesn't read in the right characters. If I output everything I read in into a new csv file, then instead of getting kanji, I get characters like: 会 and 髪. The second column of my csv file is in English and that gets read in and written out properly.
2) It doesn't read all the lines; when I counted how many lines were being read, inside the IDE that number was correct. But outside of it, I am missing about 100 or so lines.
Can anyone help me figure out why this might be happening? I've never worked with anything but English, so character encoding is a bit foreign to me. But I did check in NetBeans, and the encoding is set to UTF-8.
Edit: code as requested in comment. Though I don't know if this will be that helpful. The variable map is a hashmap.
private void loadNames() {
Thread thread = new Thread(new Runnable() {
@Override
public void run() {
try {
reader = new CSVReader(new FileReader(file));
String[] line;
while ((line = reader.readNext()) != null) {
map.put(line[0], line[1]);
}
//debug code
int counter = 0;
CSVWriter writer = new CSVWriter(new FileWriter("test.txt"), ',',
CSVWriter.NO_QUOTE_CHARACTER,
CSVWriter.NO_ESCAPE_CHARACTER,
System.getProperty("line.separator"));
for(Map.Entry<String, String> entry : map.entrySet()){
String[] string = {entry.getKey(), entry.getValue()};
writer.writeNext(string);
counter++;
}
JOptionPane.showMessageDialog(null, counter);
} catch (FileNotFoundException ex) {
ErrorHandler.displayError("ka_data.csv file not found in folder Data");
} catch (IOException ex) {
ErrorHandler.displayError("error at readNext");
}
}
});
thread.start();
}
Have you tried manually setting the charset when reading the file in? I suspect you have a default charset issue. Try:
new CSVReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));
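To make the charset point concrete, here is a minimal, JDK-only sketch. It writes two kanji rows as UTF-8 bytes and reads them back with the charset named explicitly, so the result should not depend on the platform default. The class name `Utf8ReadDemo`, the temp file, and the sample rows are made up for illustration; the same `Reader` could then be handed to opencsv's `CSVReader` instead of being read line by line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class Utf8ReadDemo {

    // Reads every line of the file, decoding the bytes as UTF-8 regardless
    // of what the platform default charset happens to be.
    static List<String> readUtf8Lines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Hypothetical self-check: writes a two-row sample containing kanji
    // (as Unicode escapes) in UTF-8 and reads it back.
    static List<String> roundTrip() {
        try {
            Path tmp = Files.createTempFile("kanji", ".csv");
            Files.write(tmp, "\u4f1a,meeting\n\u9aea,hair\n".getBytes(StandardCharsets.UTF_8));
            List<String> lines = readUtf8Lines(tmp);
            Files.delete(tmp);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip().size() + " lines read");
    }
}
```

If the jar is launched on a machine whose default charset is not UTF-8, a plain FileReader would decode the same bytes differently, which would be consistent with the garbled characters described in the question.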
I'm trying to write a program that reads a file (which is a Java source file), builds an ArrayList of certain specified values from that file, and outputs that ArrayList into another resulting file.
I'm using PrintWriter to make the new resulting file. This is a summarised version of my program:
ArrayList<String> exampleArrayList = new ArrayList<String>();
File actualInputFile = new File("C:/Desktop/example.java");
PrintWriter resultingSpreadsheet= new PrintWriter("C:/Desktop/SpreadsheetValues.txt", "UTF-8");
FileReader fr = new FileReader(actualInputFile);
BufferedReader br = new BufferedReader(fr);
String line=null;
while ((line = br.readLine()) != null) {
// code that makes ArrayList
}
for (int i = 0; i < exampleArrayList.size(); i++) {
resultingSpreadsheet.println(exampleArrayList.get(i));
}
resultingSpreadsheet.close();
The problem is that when I run this, nothing gets printed to the resultingSpreadsheet file. It's completely empty.
BUT, this program works perfectly (meaning that it prints out everything correctly to the resultingSpreadsheet file) when I replace:
File actualInputFile = new File("C:/Desktop/example.java");
which is the file that I want as my input file, and which has a size of 481 KB,
with:
File smallerInputFile = new File("C:/Desktop/smallerExample.txt");
which is really just a smaller .txt example version of the .java source file, and it has a size of 1.08 KB.
I've tried a few things including flushing the PrintWriter, wrapping it around FileWriter, copy-pasting all the code from the .java file into a text file in case it was an extension problem, but these don't seem to work.
I'm starting to think it must be because of the size of the file that the PrintWriter makes, but it's very possible that that's not the problem. Perhaps I need to put everything in a stream (like it says here: http://docs.oracle.com/javase/6/docs/api/java/io/PrintWriter.html)? If so, how would I do that?
Why is reading the bigger actualInputFile and outputting its data correctly such a problem, when everything works fine for the smallerInputFile?
Can anyone help with this?
Check for exceptions while writing to the spreadsheet file, because I really don't think it's a problem of size. Below is sample code that runs successfully on a file of approximately 1 MB.
public class Test {
/**
* @param args
*/
public static void main(String[] args) {
BufferedReader br = null;
try {
String sCurrentLine;
br = new BufferedReader(new FileReader("D:\\AdminController.java"));
while ((sCurrentLine = br.readLine()) != null) {
System.out.println(sCurrentLine);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null)br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
}
}
This should go as a comment, but I do not have the rep. In the documentation it has both write methods and print methods. Have you tried using write() instead?
I doubt it's the size of the file; it may be that, of the two files you are testing, one is .txt and the other is .java.
EDIT: Probably second suggestion of the two. First is just something I noticed with the docs.
The methods of PrintWriter do not throw I/O exceptions. Call the checkError() method, which flushes the stream and returns true if an error has occurred. It is quite possible that an error occurred while processing the larger file, an encoding error for instance.
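A small sketch of that behaviour, using a deliberately broken destination. `CheckErrorDemo` and `FailingStream` are hypothetical names for this illustration only: `println()` returns normally even though the underlying write fails, and the failure only becomes visible through `checkError()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

class CheckErrorDemo {

    // A stand-in for a broken destination: every write fails.
    static class FailingStream extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new IOException("simulated write failure");
        }
    }

    // Prints one line to the given stream and reports whether PrintWriter
    // recorded an error. println() itself never throws IOException.
    static boolean printAndCheck(OutputStream out) {
        PrintWriter writer = new PrintWriter(out);
        writer.println("some value");
        // checkError() flushes first, then returns the internal error flag.
        return writer.checkError();
    }

    public static void main(String[] args) {
        System.out.println("failing stream: " + printAndCheck(new FailingStream()));
        System.out.println("working stream: " + printAndCheck(new ByteArrayOutputStream()));
    }
}
```

Calling `resultingSpreadsheet.checkError()` after the print loop in the question would show in the same way whether something went wrong silently.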
Check your program. An empty file usually means that your program does not close the PrintWriter before it finishes.
For example, you may have a return somewhere in your program that prevents resultingSpreadsheet.close(); from being run.
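One way to rule this out is try-with-resources, which closes (and therefore flushes) the writer on every exit path, including an early return. The sketch below is only an illustration under assumed names (`CloseOnEveryPathDemo`, `writeUntilBlank`, a temp file); it is not the asker's program.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class CloseOnEveryPathDemo {

    // Writes values until an empty one is seen, then returns early.
    // try-with-resources still closes the writer, so buffered output
    // reaches the file.
    static void writeUntilBlank(Path target, List<String> values) {
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (String value : values) {
                if (value.isEmpty()) {
                    return; // early exit; close() runs anyway
                }
                writer.println(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical self-check: writes to a temp file and reads it back.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("values", ".txt");
            writeUntilBlank(tmp, Arrays.asList("a", "b", "", "c"));
            List<String> written = Files.readAllLines(tmp, StandardCharsets.UTF_8);
            Files.delete(tmp);
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```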
I want to generate multiple files by parsing an input file in Java.
The input file contains thousands of protein sequences in FASTA format, and I want to generate the raw format (i.e., without any commas, semicolons, or extra symbols such as ">", "[", "]", etc.) of each protein sequence.
A FASTA record starts with the ">" symbol, followed by a description of the protein and then the protein sequence.
For example:
>lcl|NC_000001.10_cdsid_XP_003403591.1 [gene=LOC100652771]
[protein=hypothetical protein LOC100652771] [protein_id=XP_003403591.1] [location=join(12190..12227,12595..12721,13403..13639)]
MSESINFSHNLGQLLSPPRCVVMPGMPFPSIRSPELQKTTADLDHTLVSVPSVAESLHHPEITFLTAFCL
PSFTRSRPLPDRQLHHCLALCPSFALPAGDGVCHGPGLQGSCYKGETQESVESRVLPGPRHRH
As in the format above, the input file contains thousands of protein sequences. I have to generate thousands of raw files, each containing only an individual protein sequence without any special symbols or gaps.
I have written the code for it in Java, but the output is "cannot open a file" followed by "cannot find file".
Please help me solve my problem.
Regards
Vijay Kumar Garg
Varanasi
Bharat (India)
The code is
/*Java code to convert FASTA format to a raw format*/
import java.io.*;
import java.util.*;
import java.util.regex.*;
import java.io.FileInputStream;
// java package for using regular expression
public class Arrayren
{
public static void main(String args[]) throws IOException
{
String a[]=new String[1000];
String b[][] =new String[1000][1000];
/*open the id file*/
try
{
File f = new File ("input.txt");
//opening the text document containing genbank ids
FileInputStream fis = new FileInputStream("input.txt");
//Reading the file contents through inputstream
BufferedInputStream bis = new BufferedInputStream(fis);
// Writing the contents to a buffered stream
DataInputStream dis = new DataInputStream(bis);
//Method for reading Java Standard data types
String inputline;
String line;
String separator = System.getProperty("line.separator");
// reads a line till next line operator is found
int i=0;
while ((inputline=dis.readLine()) != null)
{
i++;
a[i]=inputline;
a[i]=a[i].replaceAll(separator,"");
//replaces unwanted patterns like /n with space
a[i]=a[i].trim();
// trims out if any space is available
a[i]=a[i]+".txt";
//takes the file name into an array
try
// to handle run time error
/*take the sequence in to an array*/
{
BufferedReader in = new BufferedReader (new FileReader(a[i]));
String inline = null;
int j=0;
while((inline=in.readLine()) != null)
{
j++;
b[i][j]=inline;
Pattern q=Pattern.compile(">");
//Compiling the regular expression
Matcher n=q.matcher(inline);
//creates the matcher for the above pattern
if(n.find())
{
/*appending the comment line*/
b[i][j]=b[i][j].replaceAll(">gi","");
//identify the pattern and replace it with a space
b[i][j]=b[i][j].replaceAll("[a-zA-Z]","");
b[i][j]=b[i][j].replaceAll("|","");
b[i][j]=b[i][j].replaceAll("\\d{1,15}","");
b[i][j]=b[i][j].replaceAll(".","");
b[i][j]=b[i][j].replaceAll("_","");
b[i][j]=b[i][j].replaceAll("\\(","");
b[i][j]=b[i][j].replaceAll("\\)","");
}
/*printing the sequence in to a text file*/
b[i][j]=b[i][j].replaceAll(separator,"");
b[i][j]=b[i][j].trim();
// trims out if any space is available
File create = new File(inputline+"R.txt");
try
{
if(!create.exists())
{
create.createNewFile();
// creates a new file
}
else
{
System.out.println("file already exists");
}
}
catch(IOException e)
// to catch the exception and print the error if cannot open a file
{
System.err.println("cannot create a file");
}
BufferedWriter outt = new BufferedWriter(new FileWriter(inputline+"R.txt", true));
outt.write(b[i][j]);
// printing the contents to a text file
outt.close();
// closing the text file
System.out.println(b[i][j]);
}
}
catch(Exception e)
{
System.out.println("cannot open a file");
}
}
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
}
}
If you could provide corrected code, it would be much easier for me to understand.
This code will not win prizes, owing to a lack of Java experience. For instance, I would expect an OutOfMemoryError even if it were correct.
A rewrite would be best. Nevertheless, we all started small.
Give the full path to the file. The directory is probably also missing from the output file name.
Better to use BufferedReader etc. instead of DataInputStream.
Initialize i with -1. Better still, use for (int i = 0; i < a.length; ++i).
Ideally, compile the Pattern outside the loop. But you can drop the Matcher entirely and use if (s.contains(">")) instead.
You do not need to create the new file first; FileWriter creates it.
Code:
final String encoding = "Windows-1252"; // Or "UTF-8", or omit it.
File f = new File("C:/input.txt");
BufferedReader dis = new BufferedReader(new InputStreamReader(
new FileInputStream(f), encoding));
...
int i= -1; // So i++ starts with 0.
while ((inputline=dis.readLine()) != null)
{
i++;
a[i]=inputline.trim();
//replaces unwanted patterns like /n with space
// Not needed a[i]=a[i].replaceAll(separator,"");
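Since a rewrite is suggested, here is one possible shape for the parsing step, as a rough sketch only. It assumes each record's header sits on a single line starting with ">" and that every following line up to the next ">" is sequence data; using the first whitespace-delimited token of the header as the record id is also an assumption, as are the names `FastaSketch` and `parse`. Writing each sequence to its own file is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

class FastaSketch {

    // Maps each record's id (first token after '>') to its sequence, with
    // line breaks and surrounding whitespace removed.
    static Map<String, String> parse(BufferedReader reader) {
        Map<String, String> records = new LinkedHashMap<>();
        String currentId = null;
        StringBuilder sequence = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines between records
                }
                if (line.startsWith(">")) {
                    // A new header: store the record collected so far.
                    if (currentId != null) {
                        records.put(currentId, sequence.toString());
                    }
                    currentId = line.substring(1).trim().split("\\s+")[0];
                    sequence.setLength(0);
                } else if (currentId != null) {
                    sequence.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (currentId != null) {
            records.put(currentId, sequence.toString()); // last record
        }
        return records;
    }

    public static void main(String[] args) {
        String sample = ">id1 [gene=X]\nMSES\nINFS\n>id2 desc\nPSFT\n";
        System.out.println(parse(new BufferedReader(new StringReader(sample))));
    }
}
```

Because only one record is held in the builder at a time, this avoids the fixed 1000 x 1000 array from the question.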
Your code contains the following two catch blocks:
catch(Exception e)
{
System.out.println("cannot open a file");
}
catch(Exception ex)
// catch the exception and prints the error if cannot find file
{
System.out.println("cannot find file ");
}
Both of these swallow the exception and print a generic "it didn't work" message, which tells you that the catch block was entered, but nothing more than that.
Exceptions often contain useful information that would help you track down where the real problem is. By ignoring them, you're making it much harder to diagnose your problem. Worse still, you're catching Exception, which is the superclass of a lot of exceptions, so these catch blocks are catching lots of different types of exceptions and ignoring them all.
The simplest way to get information out of an exception is to call its printStackTrace() method, which prints the exception type, exception message and stack trace. Add a call to this within both of these catch blocks, and that will help you see more clearly what exception is being thrown and from where.
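As a rough illustration of the difference, the sketch below catches the specific exception types and prints the stack trace instead of a fixed message. `DiagnoseDemo` and `firstLineOrReason` are hypothetical names, and the file name passed in `main` is just an example of a path that is assumed not to exist.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

class DiagnoseDemo {

    // Returns the first line of the file, or a short description of what
    // went wrong. The full details go to stderr via printStackTrace().
    static String firstLineOrReason(String path) {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            return in.readLine();
        } catch (FileNotFoundException e) {
            e.printStackTrace(); // exception type, message, and where it was thrown
            return "not found: " + e.getMessage();
        } catch (IOException e) {
            e.printStackTrace();
            return "read failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstLineOrReason("no-such-input-file.txt"));
    }
}
```

The message of a FileNotFoundException normally includes the path that was tried, which is often enough to spot a wrong file name or working directory.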
I wrote some code to read in a text file and return an array with each line stored in an element. I can't for the life of me work out why this isn't working. Can anyone have a quick look? The output from System.out.println(line); is null, so I'm guessing there's a problem reading the line in, but I can't see why. By the way, the file I'm passing to it definitely has something in it!
public InOutSys(String filename) {
try {
file = new File(filename);
br = new BufferedReader(new FileReader(file));
bw = new BufferedWriter(new FileWriter(file));
} catch (Exception e) {
e.printStackTrace();
}
}
public String[] readFile() {
ArrayList<String> dataList = new ArrayList<String>(); // use ArrayList because it can expand automatically
try {
String line;
// Read in lines of the document until you read a null line
do {
line = br.readLine();
System.out.println(line);
dataList.add(line);
} while (line != null && !line.isEmpty());
br.close();
} catch (Exception e) {
e.printStackTrace();
}
// Convert the ArrayList into an Array
String[] dataArr = new String[dataList.size()];
dataArr = dataList.toArray(dataArr);
// Test
for (String s : dataArr)
System.out.println(s);
return dataArr; // Returns an array containing the separate lines of the
// file
}
First, you open a FileWriter on the same file right after opening the FileReader. new FileWriter(file) opens the file in create mode, which truncates it, so the file will be empty after you run your program.
Second, is there an empty line in your file? If so, !line.isEmpty() will terminate your do-while loop.
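The truncation is easy to reproduce in isolation. In this sketch (the class name `TruncationDemo` and the temp file are made up), nothing is ever written through the FileWriter, yet the file is empty afterwards because opening it without append mode discards the existing content.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TruncationDemo {

    // Returns the file size before and after a FileWriter is opened on the
    // file (without append mode) and closed again with nothing written.
    static long[] sizesAroundFileWriter() {
        try {
            Path tmp = Files.createTempFile("data", ".txt");
            Files.write(tmp, "line one\nline two\n".getBytes(StandardCharsets.UTF_8));
            long before = Files.size(tmp);
            try (FileWriter writer = new FileWriter(tmp.toFile())) {
                // Intentionally empty: opening the writer is what truncates.
            }
            long after = Files.size(tmp);
            Files.delete(tmp);
            return new long[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] sizes = sizesAroundFileWriter();
        System.out.println("before=" + sizes[0] + " bytes, after=" + sizes[1] + " bytes");
    }
}
```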
You're using a FileWriter to the file you're reading, so the FileWriter clears the content of the file. Don't read and write to the same file concurrently.
Also:
don't assume a file contains a line. You shouldn't use a do/while loop, but rather a while loop;
always close streams, readers and writers in a finally block;
catch(Exception) is a bad practice. Only catch the exceptions you want, and can handle. Else, let them go up the stack.
I'm not sure if you're looking for a way to improve your provided code or just for a solution for "Reading in text file in Java" as the title said, but if you're looking for a solution, I'd recommend using Apache Commons IO to do it for you. The readLines method from FileUtils will do exactly what you want.
If you're looking to learn from a good example, FileUtils is open source, so you can take a look at how they chose to implement it by looking at the source.
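If adding a dependency is not an option, the JDK's own `Files.readAllLines` does a similar job. To be clear, the sketch below uses that JDK method rather than `FileUtils.readLines`, and the names `ReadAllLinesDemo`, `readFile`, and `demo` are made up for illustration. Note that blank lines come back as empty strings instead of ending the read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ReadAllLinesDemo {

    // Reads the whole file into an array, one element per line.
    // Blank lines are kept as empty strings rather than stopping the read.
    static String[] readFile(Path file) {
        try {
            List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
            return lines.toArray(new String[0]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical self-check: writes three lines (the middle one blank)
    // to a temp file and reads them back.
    static String[] demo() {
        try {
            Path tmp = Files.createTempFile("lines", ".txt");
            Files.write(tmp, "first\n\nthird\n".getBytes(StandardCharsets.UTF_8));
            String[] result = readFile(tmp);
            Files.delete(tmp);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", demo()));
    }
}
```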
There are several possible causes for your problem:
The file path is incorrect
You shouldn't try to read/write the same file at the same time
It's not such a good idea to initialize the buffers in the constructor, think of it - some method might close the buffer making it invalid for subsequent calls of that or other methods
The loop condition is incorrect
Better try this approach for reading:
// Declare the reader outside the try block so that finally can see it
BufferedReader br = null;
try {
br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
System.out.println(line);
dataList.add(line);
}
} finally {
if (br != null)
br.close();
}
This is code that I found on the internet for reading the lines of a file. I use Eclipse, and I passed the file name SanShin.txt in its arguments field, but it prints:
Error: textfile.txt (The system cannot find the file specified)
Code:
public class Zip {
public static void main(String[] args){
try{
// Open the file that is the first
// command line parameter
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console
System.out.println (strLine);
}
//Close the input stream
br.close();
}catch (Exception e){//Catch exception if any
System.err.println("Error: " + e.getMessage());
}
}
}
Please help me understand why it prints this error.
thanks
...
// command line parameter
if(args.length != 1) {
System.err.println("Invalid command line, exactly one argument required");
System.exit(1);
}
try {
FileInputStream fstream = new FileInputStream(args[0]);
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// Get the object of DataInputStream
...
> java -cp ... Zip \path\to\test.file
When you just specify "textfile.txt" the operating system will look in the program's working directory for that file.
You can specify the absolute path to the file with something like new FileInputStream("C:\\full\\path\\to\\file.txt")
Also if you want to know the directory your program is running in, try this:
System.out.println(new File(".").getAbsolutePath());
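A slightly fuller sketch of the same check, using java.nio: it shows the absolute location a relative name resolves to and whether anything exists there, before you try to open it. `WhereAmI` and `resolveAgainstWorkingDir` are hypothetical names. When launching from an IDE, the working directory is typically a setting in the run configuration, so it may not be the folder that holds the source file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class WhereAmI {

    // Turns a relative name into the absolute path the JVM would resolve it
    // to, which in practice is based on the working directory the program
    // was started from.
    static Path resolveAgainstWorkingDir(String name) {
        return Paths.get(name).toAbsolutePath();
    }

    public static void main(String[] args) {
        Path candidate = resolveAgainstWorkingDir("textfile.txt");
        System.out.println("Working directory: " + System.getProperty("user.dir"));
        System.out.println("Looking for:       " + candidate);
        System.out.println("Exists:            " + Files.exists(candidate));
    }
}
```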
Your new FileInputStream("textfile.txt") is correct. If it's throwing that exception, there is no textfile.txt in the current directory when you run the program. Are you sure the file's name isn't actually testfile.txt (note the s, not x, in the third position)?
Off-topic: But your earlier deleted question asked how to read a file line by line (I didn't think you needed to delete it, FWIW). On the assumption you're still a beginner and getting the hang of things, a pointer: You probably don't want to be using FileInputStream, which is for binary files, but instead use the Reader set of interfaces/classes in java.io (including FileReader). Also, whenever possible, declare your variables using the interface, even when initializing them to a specific class, so for instance, Reader r = new FileReader("textfile.txt") (rather than FileReader r = ...).