Java CSV Reader - Replace Quotes

I am using CsvReader to read a CSV file in Java. In my case, the CSV file contains double quotes (") and single quotes ('), something like this:
SL 12" WIR TREE ASST CD
This is the code I am using to read the file:
CsvReader reader = null;
reader = readFile(fileName, delimiter, encoding);
while (reader.readRecord()) {
// Code Part
}
As soon as execution reaches reader.readRecord(), it throws the exception 'Maximum column length of 100,000 exceeded in column 0 in record 0. Set the SafetySwitch property to false if you're expecting column lengths greater than 100,000 characters to avoid this error.'
What I am trying to do is this:
Since I cannot make any changes to the file, I am trying to replace the double quotes and single quotes with an empty string in Java. But it throws the exception mentioned above.

I don't know what CsvReader is (it is not part of the standard JDK), but the problem seems to occur in readRecord(), well before you have the chance to replace any character. So CsvReader is not usable here, and you should use a less specialised reader such as java.io.BufferedReader.
Provided the delimiter is not itself a single or double quote (for obvious reasons), this code snippet works:
File file = new File(fileName);
// try-with-resources closes the reader (and the underlying stream) automatically
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), encoding))) {
    String line = reader.readLine();
    while (line != null) {
        // remove double and single quotes
        line = line.replace("\"", "");
        line = line.replace("'", "");
        // split the line on the delimiter; split() takes a regex, so quote it (java.util.regex.Pattern)
        String[] items = line.split(Pattern.quote(delimiter));
        // handle items...
        line = reader.readLine();
    }
}
catch (IOException e) {
    // handle exception...
}
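As a self-contained sketch of the same idea, the following strips both kinds of quotes from each line and then splits it on a delimiter that is taken literally. The class name QuoteStripper, the method splitWithoutQuotes and the sample data are made up for illustration, and it reads from a StringReader so that it runs without a file. Note that String.split() interprets its argument as a regular expression, which is why the delimiter goes through Pattern.quote().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class QuoteStripper {

    // Removes " and ' from a line, then splits it on the delimiter taken literally.
    static String[] splitWithoutQuotes(String line, String delimiter) {
        String cleaned = line.replace("\"", "").replace("'", "");
        // A limit of -1 keeps trailing empty columns instead of dropping them.
        return cleaned.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample data; '|' is a regex metacharacter, hence Pattern.quote above.
        String data = "1|SL 12\" WIR TREE ASST CD|'x'\n2|plain|\n";
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(splitWithoutQuotes(line, "|"));
            }
        }
        for (String[] row : rows) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

This treats every quote as noise, which matches the question, but it would be wrong for a file where quotes are used to protect delimiters inside a field.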

Related

characters not appearing when I print when I import a file?

I'm importing a file into my code and trying to print it. The file contains:
i don't like cake.
pizza is good.
i don’t like "cookies" to.
17.
29.
The second "don’t" contains a right single quotation mark, and when I print it the output is
don�t
The question mark is printed as a blank square. Is there a way to convert it to a regular apostrophe?
EDIT:
public class Somethingsomething {
public static void main(String[] args) throws FileNotFoundException,
IOException {
ArrayList<String> list = new ArrayList<String>();
File file = new File("D:\\project1Test.txt");//D:\\project1Test.txt
if(file.exists()){//checks if file exist
FileInputStream fileStream = new FileInputStream(file);
InputStreamReader input = new InputStreamReader(fileStream);
BufferedReader reader = new BufferedReader(input);
String line;
while( (line = reader.readLine()) != null) {
list.add(line);
}
for(int i = 0; i < list.size(); i ++){
System.out.println(list.get(i));
}
}
}}
It should print as normal, but the second "don't" has a white block in place of the apostrophe.
This is the file I'm using: https://www.mediafire.com/file/8rk7nwilpj7rn7s/project1Test.txt
Edit: in case it helps, the full document where the character is found is here:
https://www.nytimes.com/2018/03/25/business/economy/labor-professionals.html

It’s all about character encoding. The same character is not always represented by the same bytes, so text tends to get misinterpreted.
Characters are usually stored as numbers that depend on the encoding standard (and there are many of them). For example, "a" is 97 (hexadecimal 61) in both ASCII and UTF-8, but the right single quotation mark (’) is the single byte 0x92 in windows-1252 and the three bytes 0xE2 0x80 0x99 in UTF-8.
When you see funny characters such as the question mark in this case (called the replacement character), it usually means that text in one encoding is being decoded as another, and the replacement character stands in for the bytes that could not be decoded.
To fix your problem you need to tell your reader to read your file using a specific character encoding, say SOME-CHARSET.
Replace this:
InputStreamReader input = new InputStreamReader(fileStream);
with this:
InputStreamReader input = new InputStreamReader(fileStream, "SOME-CHARSET");
Unfortunately, if you don't know how the file was encoded you may have to try charsets one by one. UTF-8 and windows-1252 are common ones to start with, and Charset.availableCharsets() lists everything your JVM supports.
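To see the effect in isolation, here is a small sketch that decodes the same bytes with two different charsets. It assumes, purely for illustration, that the file was saved as windows-1252. The actual encoding of the file in the question is not known, and the class name CharsetDemo is made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {

    // Decodes raw bytes with the given charset; bytes that are invalid in that
    // charset are replaced with U+FFFD, the replacement character.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Assumption: the file was written as windows-1252.
        Charset cp1252 = Charset.forName("windows-1252");
        // The word with a right single quotation mark (U+2019), as such a file would store it.
        byte[] fileBytes = "don\u2019t".getBytes(cp1252);

        String wrong = decode(fileBytes, StandardCharsets.UTF_8);
        String right = decode(fileBytes, cp1252);

        System.out.println("decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("decoded as windows-1252 is intact: " + right.equals("don\u2019t"));
        // Optional: turn the typographic apostrophe into a plain ASCII one.
        System.out.println(right.replace('\u2019', '\''));
    }
}
```

Once the text has been decoded with the right charset, replacing U+2019 with a plain apostrophe is an ordinary String.replace() call, as the last line shows.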

Your problem is almost certainly the encoding scheme you are using. You can read a file in almost any encoding scheme you want; just tell Java how your input was encoded. UTF-8 is common on Linux. The native Windows encoding depends on the locale, for example windows-1252 for Western European languages or windows-1250 for Central European ones.
This is the sort of problem you run into all the time if you are processing files created on a different OS.

I'll give you a different approach...
Use the appropriate means for reading plain text files. Try this:
public static String getTxtContent(String path)
{
try(BufferedReader br = new BufferedReader(new FileReader(path)))
{
StringBuilder sb = new StringBuilder();
String line = br.readLine();
while (line != null) {
sb.append(line);
sb.append(System.lineSeparator());
line = br.readLine();
}
return sb.toString();
}catch(IOException fex){ return null; }
}

Remove illegal xml characters from UTF-16LE encoded file

I have a Java application that parses an XML file encoded in UTF-16LE. Parsing has been failing because of illegal XML characters. My solution is to read the file into a Java string and remove the illegal characters so that it can be parsed successfully. It works 99% of the time, but there are some slight differences between the input and the output of this process. They are not caused by the removal of the illegal characters but, I think, by going from the UTF-16LE encoding to Java's UTF-16 strings.
BufferedReader reader = null;
String fileText = ""; //stored as UTF-16
try {
reader = new BufferedReader(new InputStreamReader(in, "UTF-16LE"));
for (String line; (line = reader.readLine()) != null; ) {
fileText += line;
}
} catch (Exception ex) {
logger.log(Level.WARNING, "Error removing illegal xml characters", ex);
} finally {
if (reader != null) {
reader.close();
}
}
//code to remove illegal chars from string here, irrelevant to problem
ByteArrayInputStream inStream = new ByteArrayInputStream(fileText.getBytes("UTF-16LE"));
Document doc = XmlUtil.openDocument(inStream, XML_ROOT_NODE_ELEM);
Do characters get changed/lost when going from UTF-16LE to UTF-16? Is there a way to do this in java and assuring the input is exactly the same as the output?

Certainly one problem is that readLine() throws away the line ending.
You would need to do something like:
fileText += line + "\r\n";
Otherwise XML attributes, DTD entities, or something else could get glued together where at least a space was required. You also do not want text content to be altered when it contains a line break.
Performance (speed and memory) can be improved by using a StringBuilder:
StringBuilder fileText = new StringBuilder();
... fileText.append(line).append("\n");
... fileText.toString();
A further problem can be the first character of the file, which is sometimes a redundant BOM (byte order mark) character:
line = line.replace("\uFEFF", "");
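One way to avoid touching the line endings at all is to skip readLine() and read the stream in chunks. The sketch below does that and checks that encoding the resulting string back to UTF-16LE gives the original bytes. The class and method names are made up, the XML snippet is a stand-in, and the round trip is only exact as long as the input is valid UTF-16.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf16RoundTrip {

    // Reads the whole stream as UTF-16LE in chunks. Because readLine() is not
    // involved, "\r\n" and "\n" are kept exactly as they appear in the input.
    static String readAll(InputStream in) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16LE)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    // Drops a single leading byte order mark (U+FEFF) if there is one.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Stand-in document with a BOM and mixed line endings.
        byte[] original = "\uFEFF<root>\r\n  <a>1</a>\n</root>".getBytes(StandardCharsets.UTF_16LE);
        String text = readAll(new ByteArrayInputStream(original));
        byte[] roundTripped = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println("bytes identical: " + Arrays.equals(original, roundTripped));
        System.out.println("starts with <root> after BOM removal: " + stripBom(text).startsWith("<root>"));
    }
}
```

Java's UTF-16LE decoder passes a leading BOM through as the character U+FEFF rather than consuming it, so stripping it is a separate, explicit step here.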

Keep new lines when reading in a file

I'm trying to read in a file and modify the text, but I need to keep new lines when doing so. For example, if I were to read in a file that contained:
This is some text.
This is some more text.
It would just read in as
This is some text.This is some more text.
How do I keep the line break? I think it has something to do with the /n escape character. I've seen BufferedReader and FileReader used, but we haven't learned those in my class yet, so is there another way? What I've tried is something like this:
if (ch == 10)
{
ch = '\n';
fileOut.print(ch);
}
10 is the ASCII table code for a new line, so I thought Java could recognize it as that, but it doesn't.

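In Java 8 you can read the lines using:
List<String> yourFileLines = Files.readAllLines(Paths.get("your_file"));
Then join the strings again. Joining with a line separator rather than a space keeps the line breaks. The filter (StringUtils is presumably the Apache Commons Lang class) drops blank lines, so leave it out if you want to keep them:
String collect = yourFileLines.stream().filter(StringUtils::isNotBlank).collect(Collectors.joining(System.lineSeparator()));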

The problem is that you (possibly) want to read your file one line at a time and then write it back one line at a time, keeping empty lines.
The following code does that.
The only catch is that it may change the line terminator: depending on the system you are running on and where the file came from, you might read a Unix file and write a DOS file, or vice versa.
Keeping the original line terminators can introduce a lot of complexity; read the BufferedReader and PrintWriter API docs for more information.
public void process(File input, File output) {
    // Declaring the reader and writer here closes both, which also flushes the writer
    try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(input), "utf-8"));
         PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(new FileOutputStream(output), "utf-8"))) {
        String line;
        while ((line = reader.readLine()) != null) {
            // readLine() strips the line terminator; println() adds the platform's own
            writer.println(processLine(line));
        }
    } catch (IOException e) {
        // Some exception management
    }
}
// Placeholder for the actual modification; returns the line unchanged
public String processLine(String line) {
    return line;
}
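If the original line terminators must survive unchanged, one option is to skip readLine() and copy character by character, which is close to what the question attempted (the value 10 and the character '\n' are the same thing in Java). This is only a sketch: the class name is made up, upper-casing stands in for whatever modification is needed, and a real file would normally be wrapped in a BufferedReader and BufferedWriter for speed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class NewlinePreservingCopy {

    // Copies character by character. '\r' and '\n' pass through untouched, so the
    // original line terminators survive; every other character is upper-cased as a
    // stand-in for the real modification.
    static void copy(Reader in, Writer out) throws IOException {
        int ch;
        while ((ch = in.read()) != -1) {
            if (ch == '\n' || ch == '\r') {
                out.write(ch);
            } else {
                out.write(Character.toUpperCase(ch));
            }
        }
    }

    // Convenience wrapper for in-memory text (hypothetical helper for the demo).
    static String transform(String text) {
        StringWriter out = new StringWriter();
        try (Reader in = new StringReader(text)) {
            copy(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed Windows and Unix line endings plus an empty line.
        String result = transform("This is some text.\r\n\r\nThis is some more text.\n");
        System.out.print(result);
    }
}
```

The trade-off is that the code no longer sees whole lines, so any edit that needs the full line has to buffer it first.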

/n should be \n
if (ch == 10)
{
ch = '\n';
fileOut.print(ch);
}

Is that a typo?
ch = '/n';
otherwise use
ch = '\n';

Java replace line in a text file

I found this code from another question
private void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(data));
String line;
String input = "";
while ((line = file.readLine()) != null)
input += line + "\n";
input = input.replace(toUpdate, updated);
FileOutputStream os = new FileOutputStream(data);
os.write(input.getBytes());
file.close();
os.close();
}
This is my file before I replace some lines
example1
example2
example3
But when I replace a line, the file now looks like this
example1example2example3
This makes the file impossible to read when there are a lot of lines in it.
How would I go about editing the code above so that the file looks the way it did at the start?

Use System.lineSeparator() instead of \n.
while ((line = file.readLine()) != null)
input += line + System.lineSeparator();
The issue is that on Unix systems the line separator is \n, while on Windows systems it's \r\n.
In Java versions older than Java 7, you would have to use System.getProperty("line.separator") instead.
As pointed out in the comments, if you are concerned about memory usage, it would be wise not to store the entire output in a variable but to write it out line by line in the loop that processes the input.

Reading and modifying the file line by line has the advantage that you don't need to fit the whole file in memory. I'm not sure whether this is possible in your case, but streaming is generally a good thing to aim for. Here it also removes the need to concatenate strings, and you don't have to pick a line terminator, because you can write each transformed line with println(). It does require writing to a different file, which is generally a good thing anyway because it is crash safe: if you rewrite a file in place and the program is aborted, you lose data.
private void updateLine(String toUpdate, String updated) throws IOException {
BufferedReader file = new BufferedReader(new FileReader(data));
PrintWriter writer = new PrintWriter(new File(data+".out"), "UTF-8");
String line;
while ((line = file.readLine()) != null)
{
line = line.replace(toUpdate, updated);
writer.println(line);
}
file.close();
if (writer.checkError())
throw new IOException("cannot write");
writer.close();
}
This assumes that each replacement falls within a single line and never spans several lines. I also added an explicit encoding and used a writer, since what you are outputting is a string.
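The code above leaves the result in a second file. A possible follow-up step, sketched here with java.nio.file, is to move that file over the original once writing has succeeded. The class name, the Path-based signature and the temp-file demo are assumptions for illustration, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

class LineReplacer {

    // Rewrites 'file' line by line into a sibling ".out" file, then moves that
    // file over the original only after all lines were written successfully.
    static void replaceInFile(Path file, String toUpdate, String updated) {
        Path temp = file.resolveSibling(file.getFileName() + ".out");
        try {
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
                 PrintWriter writer = new PrintWriter(Files.newBufferedWriter(temp, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    writer.println(line.replace(toUpdate, updated));
                }
                // PrintWriter swallows IOExceptions, so ask it explicitly.
                if (writer.checkError()) {
                    throw new IOException("cannot write " + temp);
                }
            }
            Files.move(temp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the lines to a temp file, runs the replacement, reads them back.
    static List<String> runOnTempFile(List<String> lines, String toUpdate, String updated) {
        try {
            Path file = Files.createTempFile("example", ".txt");
            try {
                Files.write(file, lines, StandardCharsets.UTF_8);
                replaceInFile(file, toUpdate, updated);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnTempFile(Arrays.asList("example1", "example2", "example3"), "example2", "changed"));
    }
}
```

If the program is interrupted before the move, the original file is still intact and only the ".out" file is incomplete.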

This is because you use an OutputStream, which is better suited to binary data. Try using a PrintWriter and write each line with println() instead of adding a line terminator yourself.

Counting Words and Newlines In A File Using Java?

I am writing a small Java app that scans a text file for instances of a particular word. It needs to be able to report, for example, that an instance of the word was found as the 14th word in the file, on the third line.
For this I tried the following code, which I thought would check whether the input contained a newline (\n) character and then increment a line variable that I created:
FileInputStream fileStream = new FileInputStream("src/file.txt");
DataInputStream dataStream = new DataInputStream(fileStream);
BufferedReader buffRead = new BufferedReader(new InputStreamReader(dataStream));
String strLine;
String Sysnewline = System.getProperty("line.separator");
CharSequence newLines = Sysnewline;
int lines = 1;
while ((strLine = buffRead.readLine()) != null)
{
if(strLine.contains(newLines))
{
System.out.println("Line Found");
lines++;
}
}
System.out.println("Total Number Of Lines In File: " + lines);
This does not work for me; it simply displays 0 at the end of the file. I know the data is being placed into strLine during the while loop, because if I change the code slightly to output the line, it successfully prints each line from the file.
Would anyone happen to know why the above code does not work?

Read the javadocs for readLine.
Returns:
A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

readLine() strips newlines. Just increment every iteration of the loop. Also, you're overcomplicating your file reading code. Just do new BufferedReader(new FileReader("src/file.txt"))
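Putting that together, a minimal sketch of the feature described in the question could look like this. It counts a line on every loop iteration and a word for every whitespace-separated token. The class name WordLocator and the sample text are made up, and punctuation is not stripped, so "word." and "word" count as different words.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WordLocator {

    // Returns {lineNumber, wordNumberInFile} for the first occurrence of target,
    // both 1-based, or null if the word does not occur.
    static int[] locate(String text, String target) {
        int lineNumber = 0;
        int wordNumber = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++; // one iteration per line, empty lines included
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // no words on this line
                }
                for (String word : trimmed.split("\\s+")) {
                    wordNumber++;
                    if (word.equals(target)) {
                        return new int[] {lineNumber, wordNumber};
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }

    public static void main(String[] args) {
        int[] hit = locate("one two\n\nthree four five\n", "four");
        if (hit != null) {
            System.out.println("line " + hit[0] + ", word " + hit[1]);
        }
    }
}
```

For a real file, the StringReader would be replaced by a FileReader or by Files.newBufferedReader with an explicit charset.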
