I have managed to split a CSV file based on the commas. I did this by placing a dummy String wherever there was a ',' and then splitting based on the dummy String.
However, the CSV file contains things such as:
something, something, something
something, something, something
Therefore, where there is a new line, the last value of one line and the first value of the next get merged into a single string. How can I solve this? I've tried placing my dummy string where \n is found so I could split on that as well, but without success.
Help?!
I would strongly recommend not reinventing the wheel :). Go with one of the already available libraries for handling CSV files, e.g. OpenCSV.
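A minimal sketch of what that looks like with OpenCSV's CSVReader (the file name is a placeholder, and the package is com.opencsv in recent versions, au.com.bytecode.opencsv in older ones):

CSVReader reader = new CSVReader(new FileReader("file.csv"));
String[] fields;
while ((fields = reader.readNext()) != null) {
    // each array is one record, with quoted commas and embedded
    // line breaks already handled by the parser
    System.out.println(Arrays.toString(fields));
}
reader.close();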
I don't see why you need a dummy string. Why not split on comma?
BufferedReader in = new BufferedReader(new FileReader("file.csv"));
String line;
while ((line = in.readLine()) != null) {
    String[] fields = line.split(",");
}
As for the dummy strings you mentioned, this can be handled easily with an existing library. I would recommend the open source library uniVocity-parsers, which provides a simplified API, significant performance, and flexibility.
A few lines of code are enough to read the CSV data into memory as arrays:
private static void parseCSV() throws FileNotFoundException {
    CsvParser parser = new CsvParser(new CsvParserSettings());
    List<String[]> parsedData = parser.parseAll(new FileReader("/examples/example.csv"));
    for (String[] row : parsedData) {
        StringBuilder strBuilder = new StringBuilder();
        for (String col : row) {
            strBuilder.append(col).append("\t");
        }
        System.out.println(strBuilder);
    }
}
Use the following; it will split the lines:
String[] a = scanner.next().split(" ");
I have a file that contains a header with comments (e.g. [Comment] This is a comment) and a subsequent data section. The data starts at "Mk1=".
The program I am working on should:
Copy the header contents
Search and replace only in the data section of the file
Write header and data to a new file
I am currently using:
StringBuffer
Scanner
java.util.regex.Pattern
In my code so far (reduced to its essentials):
public static void main(String[] args) {
    File file = readFile("file.ext");
    Scanner inputScanner = null;
    try {
        inputScanner = new Scanner(file);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
    String currentLine = "";
    while (inputScanner.hasNext()) {
        currentLine = inputScanner.findInLine(regexpPattern);
        if (currentLine != null) {
            fileOutput.append(currentLine + "\n");
        }
    }
}
Because the Scanner works like a queue, I am having trouble figuring out what strategy to use. I have found examples that use a Matcher instead of a Scanner. To my understanding I would also have to work with boolean flags because of the queue-like structure of the Scanner. The findWithinHorizon() method does not seem helpful, as I want the regex to apply only beyond the horizon. Is there perhaps a "hack" for the Scanner's delimiter, assuming I know the sequence of characters that starts and ends the header?
File Example
[Comment]
Text goes here.
[Another Comment]
;Instructions: Below you will find Mk1= where the data can be assigned.
;More text.
Mk1=data
Mk2=data
Mk3=data
What strategy should I use?
Assuming you can use java.nio.file.Files (since Java 1.7) and your text file isn't too big, I'd read all lines at once and go for the Matcher:
Charset charset = Charset.forName("UTF-8");
List<String> lines = Files.readAllLines(file.toPath(), charset);
for (String line : lines) {
    Matcher matcher = regexpPattern.matcher(line);
    if (matcher.matches()) {
        // do something
    }
}
Using regex groups will prove useful for retrieving parameter-value pairs:
Pattern dataPattern = Pattern.compile("^Mk(\\d+)=(.*)$");
Matcher dataMatcher = dataPattern.matcher(line);
if (dataMatcher.matches()) {
    int mk = Integer.parseInt(dataMatcher.group(1));
    String data = dataMatcher.group(2);
}
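Putting the two together, a rough sketch of the whole pass could look like this (the output file name and the replacement applied to the data values are placeholders, and exception handling is omitted):

// Copy header/comment lines unchanged, rewrite only the Mk<n>= data lines.
List<String> output = new ArrayList<String>();
for (String line : lines) {
    Matcher m = dataPattern.matcher(line);
    if (m.matches()) {
        // data line: change only the value part after '='
        output.add("Mk" + m.group(1) + "=" + m.group(2).replace("data", "newData"));
    } else {
        // header/comment line: copy unchanged
        output.add(line);
    }
}
Files.write(new File("out.ext").toPath(), output, charset);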
Parsing is a two-step process: you have a tokenizer, which recognizes patterns in the input, and a parser, which reads tokens but also keeps state so it knows where it is.
You can use regexes for the tokenizing part of the problem, but you also need a parser that remembers "I have seen [Comment]" so it knows what could/should come next.
Related:
https://class.coursera.org/compilers/lecture
I have a question about something I've done in the past, but never really thought if it was the most efficient method to use.
Let's say I have a text file where each line contains something important, and that I have multiple sets of these lines, each corresponding to a unique environment. So, for example:
1
String that I need to parse for specific tokens..
2
String that I need to parse for specific tokens..
String that I need to parse for specific tokens..
3
String that I need to parse for specific tokens..
String that I need to parse for specific tokens..
String that I need to parse for specific tokens..
So given the above input file, my past way of solving this would be something similar to the following (semi-pseudocode!):
BufferedReader inputFile = new BufferedReader(new FileReader("file.txt"));
String currentLine;
while ((currentLine = inputFile.readLine()) != null)
{
    Scanner line = new Scanner(currentLine);
    // parse the line looking for tokens
}
inputFile.close();
My issue with this is it seems incredibly inefficient to create a new Scanner object for every line I have in my BufferedReader.
Is there a better way to achieve this functionality?
One suggestion may be to scan the whole document by tokens, but my issue with that is I won't be able to keep track of how many strings are part of each subset (indicated by the integer); or at least I can't think of another solution other than to decrement a counter every time I look at a new line.
Thanks in advance!
Check this out:
public static void main(String[] args) throws IOException {
    BufferedReader bf = new BufferedReader(new FileReader(new File("d:/sample.txt")));
    LineNumberReader lr = new LineNumberReader(bf);
    String line = "";
    while ((line = lr.readLine()) != null) {
        System.out.println("Line Number " + lr.getLineNumber() + ": " + line);
    }
}
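If you also need to keep track of how many lines belong to each numbered set (as asked above), a single Scanner can do it without creating one per line. A rough sketch, assuming each group is preceded by a line holding its line count:

Scanner in = new Scanner(new File("file.txt"));
while (in.hasNextInt()) {
    int count = in.nextInt();
    in.nextLine();                          // consume the rest of the count line
    for (int i = 0; i < count && in.hasNextLine(); i++) {
        String content = in.nextLine();
        // parse 'content' for the tokens you need, e.g. content.split("\\s+")
    }
}
in.close();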
I am trying to write a Java program that simulates a record store shopping cart. The first step is to open up the inventory.txt file and read the contents which is basically what the "store has to offer". Then I need to read every line individually and process the id record and price.
The current method outputs a result that is very close to what I need, however, it picks up on the item id of the next line, as you can see below.
I was wondering if someone can assist me in figuring out how to process every line in the text document individually and store every piece of data in its own variable without picking up the id of the next item?
public void openFile() {
    try {
        x = new Scanner(new File("inventory.txt"));
        x.useDelimiter(",");
    } catch (Exception e) {
        System.out.println("Could not find file");
    }
}

public void readFile() {
    while (x.hasNext()) {
        String id = x.next();
        String record = x.next();
        String price = x.next();
        System.out.println(id + " " + record + " " + price);
        break;
    }
}
.txt document:
11111, "Hush Hush... - Pussycat Dolls", 12.95
22222, "Animal - Ke$ha", 9.95
33333, "Hanging By A Moment - Lifehouse - Single, 4.95
44444, "Have A Nice Day - Bon Jovi", 9.99
55555, "Day & Age - Killers", 10.99
66666, "She Wolf - Shakira", 15.99
77777, "Dark Horse - Nickelback", 12.99
88888, "The E.N.D. - Black Eyed Peas", 10.95
actual output
11111 "Hush Hush... - Pussycat Dolls" 12.95
22222
expected result
11111 "Hush Hush... - Pussycat Dolls" 12.95
So the problem here, specifically, is that you are breaking on commas when you should be breaking on commas and newlines. But there are tons of other corner cases (for example, if a quoted value contains commas, such as "abc,,,abc", you shouldn't break on those commas). Apache Commons comes with a CSVParser that handles all of these corner cases; you should use it:
http://commons.apache.org/csv/apidocs/org/apache/commons/csv/CSVParser.html
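For reference, a minimal sketch using the newer org.apache.commons.csv API (the file name is a placeholder):

Reader in = new FileReader("inventory.txt");
Iterable<CSVRecord> records = CSVFormat.DEFAULT.parse(in);
for (CSVRecord record : records) {
    String id = record.get(0);
    String title = record.get(1);
    String price = record.get(2);
    System.out.println(id + " " + title + " " + price);
}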
You can use a Pattern as the argument to Scanner.useDelimiter. Use this to provide alternatives for the delimiter: either a comma or the line separator.
x.useDelimiter(",|" + System.getProperty("line.separator"));
Depending on what your input file uses as the line separator, you may need to change the second option.
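If the file may come from a different platform, a pattern that accepts both Windows and Unix line endings avoids relying on the line.separator property; this is just a variant of the same idea:

x.useDelimiter(",|\\r?\\n");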
The advice in other answers to use an existing CSV library is good: parsing CSV isn't as simple as breaking up the input around commas.
There are multiple ways to achieve this, but staying with your approach, you could use a Scanner to read lines first (using Java's "line.separator" as the delimiter) and then use the Scanner class again with a comma as the delimiter on each line.
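A rough sketch of that nested-Scanner approach (the delimiter pattern also swallows the spaces around the commas; the file name is a placeholder):

Scanner fileScanner = new Scanner(new File("inventory.txt"));
while (fileScanner.hasNextLine()) {
    Scanner lineScanner = new Scanner(fileScanner.nextLine());
    lineScanner.useDelimiter("\\s*,\\s*");   // comma, ignoring surrounding spaces
    String id = lineScanner.next();
    String record = lineScanner.next();
    String price = lineScanner.next();
    System.out.println(id + " " + record + " " + price);
    lineScanner.close();
}
fileScanner.close();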
The problem you're going to be facing is that CSV is more than just splitting a String on a comma. There are considerations to take into account, such as "escaped" commas (commas you don't want to delimit on).
I suggest you save yourself a lot of time and headaches and use an existing API.
The Apache Commons parser has already been mentioned. I recently used OpenCSV and found it extremely simple to use and powerful.
IMHO
An easy way to read in the entire file into a list of Strings (lines)...
public class Scanner {
    public static List<String> readLines(String filename) throws IOException {
        FileReader fileReader = new FileReader(filename);
        BufferedReader bufferedReader = new BufferedReader(fileReader);
        List<String> lines = new ArrayList<String>();
        String line = null;
        while ((line = bufferedReader.readLine()) != null) {
            lines.add(line);
        }
        bufferedReader.close();
        return lines;
    }
}
Then you can process the individual lines as before, as each line is its own String object. That is, if you don't use a CSVParser.
I have thousands of sentences in a txt file, and my first Android application should take one from there and put it in a TextView.
I could include the txt file as a resource, or try to get all the sentences and convert them to an array. I don't want to put the txt file into the application, but rather the array with the sentences directly. How could I automatically "translate" thousands of sentences into an array-like list?
Don't reinvent the wheel... this is a one-liner:
import org.apache.commons.io.IOUtils;
List<String> sentences = (List<String>)IOUtils.readLines(new FileInputStream("filename.txt"));
I guess this is what you don't want to do...
//InputStream is = getResources().openRawResource(R.raw.list);
So get an InputStream object and use the following code:
List<String> content = new ArrayList<String>();
InputStreamReader isr = new InputStreamReader(is);
LineNumberReader linereader = new LineNumberReader(isr);
for (int i = 0; i < num; i++) { // num is the total number of lines in the file
    try {
        String line = linereader.readLine();
        content.add(line);
    } catch (IOException e) {
        e.printStackTrace();
    }
} // for ends
If you know that each item contains only one period, you can split on that. If you can't rely on that but each item is on its own line, split on newlines instead.
final String blob = "Quote 1. Quote2. Quote '3'.";
String[] quotes = blob.split('\\.');
> ["Quote 1", "Quote2", "Quote '3'"];
OR
final String blob = "Quote 1.\nQuote 2 longer.";
String[] quotes = blob.split("\n");
> ["Quote 1.", "Quote 2 longer."]
It sounds like you want to put the text file in your resources folder and then read it using a BufferedReader.
Depending on the contents of your text file, you could read the text line by line and add it to an array as you go, or you could just read the entire file as a string and use .split(), which would return an array of strings for you to use.
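A minimal sketch of that approach, assuming the file is saved as res/raw/sentences.txt (the resource name, the surrounding Activity, and the TextView name are placeholders; exception handling is omitted):

InputStream is = getResources().openRawResource(R.raw.sentences);
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
List<String> sentences = new ArrayList<String>();
String line;
while ((line = reader.readLine()) != null) {
    sentences.add(line);
}
reader.close();
// e.g. textView.setText(sentences.get(new java.util.Random().nextInt(sentences.size())));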
public static void main(String args[])
{
    try
    {
        File file = new File("input.txt");
        BufferedReader reader = new BufferedReader(new FileReader(file));
        String line = "000000", oldtext = "414141";
        while ((line = reader.readLine()) != null)
        {
            oldtext += line + "\r\n";
        }
        reader.close();

        // replace a word in a file
        //String newtext = oldtext.replaceAll("drink", "Love");

        // to replace a line in a file
        String newtext = oldtext.replaceAll("This is test string 20000", "blah blah blah");

        FileWriter writer = new FileWriter("input.txt");
        writer.write(newtext);
        writer.close();
    }
    catch (IOException ioe)
    {
        ioe.printStackTrace();
    }
}
}
A couple suggestions on your sample code:
Have the user pass in old and new on the command line (i.e., args[0] and args[1]).
If it's sufficient to do this a line at a time, it's going to be much more efficient to read a line, replace old -> new, then stream it out.
Also check out StringUtils and IOUtils, which may make your life easier in this case.
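For example, a minimal sketch of the streaming variant, assuming it runs inside main so args[0] (old) and args[1] (new) are available; the file names are placeholders:

BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
PrintWriter writer = new PrintWriter(new FileWriter("output.txt"));
String line;
while ((line = reader.readLine()) != null) {
    writer.println(line.replace(args[0], args[1]));   // one line at a time, no big buffer
}
reader.close();
writer.close();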
Easiest is the String.replace(oldString, newString) or String.replaceAll(regex, newString) function: you can just read the one file and write the replacement into a new file (or do it line by line if you're concerned about file size).
After reading your last comment - that's a totally different story... The preferred solution would be to parse the CSS file into an object model (like a DOM), apply the changes there, and serialize the model back to CSS afterwards. It's much easier to find all color attributes in a DOM and change them than to do the same with search and replace.
I've found some CSS parsers in the wild wild web, but none of them looked capable of writing CSS files.
If you wanted to replace the color names with search and replace, you'd search for 'color:<colorname>' and replace it with 'color:<yourHexColorValue>'. You may have to do the same for 'color:"<colorname>"', because the color name can be set in double quotes (another argument for using a CSS parser...).
String.replaceAll() is the easiest way to do it. Just read the complete CSS file into one String, replace all as suggested above and write the new String to the same (or a temporary) file (first).
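A minimal sketch of that approach, assuming Java 7's java.nio.file.Files is available (the color name, hex value, and file name are placeholders; a real pass would need one such replacement per named color):

String css = new String(Files.readAllBytes(new File("style.css").toPath()), StandardCharsets.UTF_8);
// matches color:red as well as color:"red"
String replaced = css.replaceAll("color:\\s*\"?red\"?", "color:#ff0000");
Files.write(new File("style.css").toPath(), replaced.getBytes(StandardCharsets.UTF_8));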