I've noticed that Java's String reuses its internal char array to avoid allocating a new one for a new String instance in methods such as substring(). String has several non-public constructors for this purpose, which accept a char array and two ints as a range to construct a String instance.
But today I found that split() also reuses the char array of the original String instance. I read a very long line from a file, split it on "," and keep only one small column for actual use. Because every part of the line secretly holds a reference to the long char array, I get an OutOfMemoryError very soon.
Here is the example code:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
"G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
String name = line.split(',')[0];
test.add(name);
i++;
if (i % 100000 == 0) {
System.out.println(name);
}
}
System.out.println(test.size());
Is there any standard method in the JDK to make sure that every String instance produced by split() is a "real deep copy" rather than a "shallow copy"?
For now I am using a very ugly workaround to force the creation of a new String instance:
ArrayList<String> test = new ArrayList<String>(3000000);
BufferedReader origReader = new BufferedReader(new FileReader(new File(
"G:\\filewithlongline.txt")));
String line = origReader.readLine();
int i = 0;
while ((line = origReader.readLine()) != null) {
String name = line.split(',')[0]+" ".trim(); // force creating a String instance
test.add(name);
i++;
if (i % 100000 == 0) {
System.out.println(name);
}
}
System.out.println(test.size());
The simplest approach is to create a new String directly. This is one of the rare cases where it's a good idea.
String name = new String(line.split(",")[0]); // note the use of ","
An alternative is to parse the file yourself.
do {
StringBuilder name = new StringBuilder();
int ch;
while((ch = origReader.read()) >= 0 && ch != ',' && ch >= ' ') {
name.append((char) ch);
}
test.add(name.toString());
} while(origReader.readLine() != null);
String has a copy constructor you can use for this purpose.
final String name = new String(line.substring(0, line.indexOf(',')));
... or, as Peter suggested, read only up to the first comma.
final StringBuilder buf = new StringBuilder();
do {
int ch;
while ((ch = origReader.read()) >= 0 && ch != ',') {
buf.append((char) ch);
}
test.add(buf.toString());
buf.setLength(0);
} while (origReader.readLine() != null);
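As a rough sketch of the copy approach, assuming only the first column needs to be kept: the helper firstField below is a hypothetical name (it is not a JDK method). It cuts the text before the first comma and passes it through new String(...). On JDKs where substring() shares the parent's char array, the copy constructor is what detaches the result from the long line; on JDKs where substring() already copies, the extra copy is redundant but harmless.

```java
import java.util.ArrayList;
import java.util.List;

class FirstField {
    // Hypothetical helper: returns an independent copy of the text before the first comma.
    static String firstField(String line) {
        int comma = line.indexOf(',');
        // No comma at all: treat the whole line as the first field.
        String field = (comma < 0) ? line : line.substring(0, comma);
        // Explicit copy so the result does not depend on the original line's storage.
        return new String(field);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (String line : new String[] {"alice,1,2,3", "bob,4,5,6", "nocomma"}) {
            names.add(firstField(line));
        }
        System.out.println(names);
    }
}
```

Using indexOf instead of split also avoids building an array of all the columns that are thrown away immediately.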
I'm trying to parse a folder of CSV files (balance sheets), and everything went smoothly until I tried to separate the row names from the values.
It looks like the last cell on the previous row is combining with the first cell (the row name in column A) in the next row.
File path = new File("/Users/Zack/Desktop/JavaDB/BALANCESHEETS");
for(File file: path.listFiles()) {
if (file.isFile()) {
String fileName = file.getName();
String ticker = fileName.split("\\_")[0];
if (ticker.equals("ASB") || ticker.equals("FRC")) {
if (ticker.equals("ASB")) {
ticker = ticker + "PRD";
}
if (ticker.equals("FRC")) {
ticker = ticker + "PRD";
}
}
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
builder.append((char) c);
}
String string = builder.toString();
ArrayList<String> stringResult = new ArrayList<String>();
if (string != null) {
String[] splitData = string.split("\\s*,\\s*");
for (int i = 0; i <splitData.length; i++) {
if (!(splitData[i] == null) || !(splitData[i].length() ==0)) {
stringResult.add(splitData[i].trim());
}
}
}
for (int i = 0; i < stringResult.size(); i++) {
int cL = stringResult.get(i).length();
for (int x = 0; x < cL; x++) {
if (Character.isLetter(stringResult.get(i).charAt(x))) {
System.out.println("index: " + i);
System.out.println(stringResult.get(i));
break;
}
}
}
Here are some photos of what's happening
https://postimg.org/image/a9qc1qggz/
https://postimg.org/image/mvna7p7s3/
Any idea on how to fix this?
I also noticed there is a space in front of the row names in the spreadsheets, which I suspect may be part of the problem.
The problem is coming from where you are reading in the file, here:
Reader reader = new BufferedReader(new FileReader(file));
StringBuilder builder = new StringBuilder();
int c;
while ((c = reader.read()) != -1) {
builder.append((char) c);
}
String string = builder.toString();
This reads all the characters into a single string, including the newline character(s). When you then split the string, you are not splitting on the newline character(s), so the last cell of one row stays attached to the first cell of the next row, which is what you are seeing.
As mentioned by others, I strongly urge you to use one of the many CSV parsers that already exist.
The simple (but ugly) fix would be to also split on newlines. A better fix would be to use the readLine() method of BufferedReader.
Also, != is your friend: !(splitData[i] == null) is easier to read as splitData[i] != null. Note too that the two checks are joined with ||, so an empty string still passes; you probably want &&.
As Erwin stated in the comments, the pattern you are splitting on only looks for commas with optional whitespace around them. It looks like you know what format your data will be in, since you know that the data will be separated by either whitespace-comma-whitespace or a newline. It seems to me you just need to change the pattern to one that covers both, for example "\\s*(,|\\r?\\n)\\s*". As has been mentioned, you need to know beforehand that none of the fields contain a comma themselves, or this breaks.
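Here is a minimal sketch of the readLine() approach, under the assumption that no cell contains a quoted comma (a real CSV parser handles that case). The class and method names (CsvRows, readRows, parseText) are made up for illustration. Each line is split on its own, so a cell from one row can never merge with a cell from the next.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvRows {
    // Reads one row per line and splits each line on commas, trimming every cell.
    static List<List<String>> readRows(BufferedReader reader) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            List<String> cells = new ArrayList<>();
            for (String cell : line.split(",")) {
                String trimmed = cell.trim();
                if (!trimmed.isEmpty()) {
                    cells.add(trimmed);
                }
            }
            rows.add(cells);
        }
        return rows;
    }

    // Convenience wrapper for in-memory text, used here for the demonstration.
    static List<List<String>> parseText(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return readRows(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The leading space before the row names mimics the spreadsheet export.
        System.out.println(parseText(" Cash, 100, 200\n Debt, 300, 400\n"));
    }
}
```

For a file on disk, the same readRows method can be given new BufferedReader(new FileReader(file)).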
Here's the code:
FileReader fr = new FileReader("datos_clientes.txt");
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) {
String nameMark = "#n";
String addressMark = "#d";
int nameStart = line.indexOf(nameMark) + nameMark.length();
int addressStart = line.indexOf(addressMark) + addressMark.length();
String name = line.substring(nameStart, addressStart - addressMark.length());
String address = line.substring(addressStart, line.length());
if (line.startsWith("tipo1.")) {
FileWriter fw = new FileWriter(name +".txt");
char[] vector = name.toCharArray();
char[] vector2 = address.toCharArray();
int index = 0;
while (index < vector.length) {
fw.write(vector[index]+vector2[index]);
index++;
}
fw.close();
} else if (line.startsWith("tipo2.")) {
FileWriter fw = new FileWriter(name +".txt");
char[] vector = name.toCharArray();
char[] vector2 = address.toCharArray();
int index = 0;
while (index < vector.length) {
fw.write(vector[index]+vector2[index]);
index++;
}
fw.close();
}
else if (line.startsWith("tipo3.")) {
FileWriter fw = new FileWriter(name +".txt");
char[] vector = name.toCharArray();
char[] vector2 = address.toCharArray();
int index = 0;
while (index < vector.length) {
fw.write(vector[index]+vector2[index]);
index++;
}
fw.close();
}
}
What I want from this code is to create each new file with the name of the recipient and their address.
Instead, the new files just show a combination of random alphabetical characters.
I also have three pre-made files that I have to include in each new file. For example, if one of the new files is "Maria Roberts.txt" and this person will receive a "type 1" letter, I want the file (Maria Roberts.txt) to include her name, her address and the contents of "type1.txt".
I don't know how to do that.
I know I add things in every new question... sorry, I think it will be easier for me to understand this way.
Thanks again!
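You're adding one character from the name array to one character from the address array, then writing the result. In Java, char + char is integer addition, so write() receives the sum of the two character codes and outputs whatever character has that code, which is why the files look random. The loop also indexes vector2 with positions taken from vector, so it throws an ArrayIndexOutOfBoundsException whenever the address is shorter than the name.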
fw.write(vector[index]+vector2[index]);
Instead, you want to write the entire name array, then (in a different loop) write the entire address array.
int index = 0;
while (index < vector.length) {
fw.write(vector[index]);
index++;
}
index = 0;
while (index < vector2.length) {
fw.write(vector2[index]);
index++;
}
That will just stick them together, but you can use your imagination and figure out how to separate them the way you want.
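One possible way to separate them, sketched under the assumption that the name and the address should each go on their own line: write the two strings directly instead of looping over char arrays. LetterHeader and writeHeader are hypothetical names, and PrintWriter is used only because println adds the line break.

```java
import java.io.PrintWriter;

class LetterHeader {
    // Writes the name and the address on separate lines.
    static void writeHeader(PrintWriter out, String name, String address) {
        out.println(name.trim());
        out.println(address.trim());
    }

    public static void main(String[] args) {
        // char + char is int arithmetic, so this prints a number, not two letters.
        System.out.println('J' + 'W');

        PrintWriter out = new PrintWriter(System.out, true);
        writeHeader(out, " John Harrison ", " Whatever Street, 490 - Liverpool");
        out.flush();
    }
}
```

To write to a file instead of the console, the same method can be called with new PrintWriter(new FileWriter(name + ".txt")), closing the writer afterwards.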
I'm trying to code a program that given a file with the names and addresses of five or more people, creates one different file (letter) for each of them (the new files will be named as the person who will receive it).
The structure of the main file is something like this:
type1.0001 #n John Harrison #a Whatever Street, 490 - Liverpool
.... and so on
So "type1" is the type of letter this person has to be sent, the words after "#n" are the name, and the words after "#a" the address.
What I've been trying is this:
String datos = "main_file.txt";
String tipo1 = "type1.txt";
String tipo2 = "type2.txt";
String tipo3 = "type3.txt";
char[] type1 = {'t', 'i', 'p', 'o', '1'};
//all other types should be here
String line;
FileReader fr = new FileReader("mainfile.txt");
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) {
while ((line = br.readLine()) != ".") {
char[] lineArray = line.toCharArray();
if (lineArray == type1) {
//code that creates file type1
}
}
}
fr.close();
So far this would just be the code that decides which letter to send, but it doesn't work.
I think it's something related to the "while" loop.
Please, I started Java 1 month ago, so if anyone could help me I'd be so grateful!
Thanks
Right now, I've got this:
FileReader fr = new FileReader("main_file.txt");
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) {
String nameMark = "#n";
String addressMark = "#a";
int nameStart = line.indexOf(nameMark) + nameMark.length();
int addressStart = line.indexOf(addressMark) + addressMark.length();
String name = line.substring(nameStart, addressStart - addressMark.length());
String address = line.substring(addressStart, line.length());
if (line.startsWith("tipo1.")) {
FileWriter fw = new FileWriter("file1.txt");
char[] vector = name.toCharArray();
int index = 0;
while (index < vector.length) {
fw.write(vector[index]);
index++;
}
fw.close();
} else if (line.startsWith("tipo2.")) {
FileWriter fw = new FileWriter("file1.txt");
char[] vector = name.toCharArray();
int index = 0;
while (index < vector.length) {
fw.write(vector[index]);
index++;
}
fw.close();
}
}
fr.close();
But it doesn't work.
Can someone help me out?
As Marc B told you, don't read the lines twice.
Also, just comparing the start of the line is much less overkill than your char array approach.
To retrieve the name and address, you can use the indexOf and substring methods of String.
Here is a complete example:
while ((line = br.readLine()) != null) {
// get the name and the address of this line
String nameMark = "#n";
String addressMark = "#a";
int nameStart = line.indexOf(nameMark) + nameMark.length();
int addressStart = line.indexOf(addressMark) + addressMark.length();
String name = line.substring(nameStart, addressStart - addressMark.length());
String address = line.substring(addressStart, line.length());
// get the line type
if (line.startsWith("tipo1")) {
//code that creates file type1 with name and address
}
else if(line.startsWith("tipo2")) {
//code that creates file type2 with name and address
}
...
...
}
You can't use == with arrays. (Well, you can, but it doesn't do what you expect.) That is, this line is wrong:
if (lineArray == type1)
Try Arrays.equals instead:
if (Arrays.equals(lineArray, type1))
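A small sketch of the difference, using the type1 array from the question (the class name ArrayCompare and the helper sameChars are made up for illustration). The == operator compares array references, while Arrays.equals compares the contents element by element.

```java
import java.util.Arrays;

class ArrayCompare {
    // True when both arrays hold the same characters in the same order.
    static boolean sameChars(char[] a, char[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        char[] type1 = {'t', 'i', 'p', 'o', '1'};
        char[] fromLine = "tipo1".toCharArray();

        System.out.println(fromLine == type1);          // false: two different array objects
        System.out.println(sameChars(fromLine, type1)); // true: same contents
        // Often simpler: skip the arrays and test the String directly.
        System.out.println("tipo1.0001".startsWith("tipo1")); // true
    }
}
```

The same reasoning applies to Strings: compare them with equals() rather than == or !=.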
Try this for starters.
String line;
FileReader fr = new FileReader("mainfile.txt");
BufferedReader br = new BufferedReader(fr);
while ((line = br.readLine()) != null) { // finish when line is null not "."
String [] parts = line.split("#");
String name, address;
if (parts.length > 2) {
name = parts[1].substring(2);
address = parts[2].substring(2);
}
if (line.startsWith("tipo1")) {
// save to tipo1 file
} else if (line.startsWith("tipo2")) {
// save to tipo2 file
} else if (line.startsWith("tipo3")) {
// save to tipo3 file
} else if (line.startsWith("tipo4")) {
// save to tipo4 file
}
}
fr.close();
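Putting the pieces together, here is a sketch of a parser for one line, assuming every line looks like the sample in the question: a type, a dot, a number, then "#n" before the name and "#a" before the address. LetterLine and its fields are hypothetical names, and the real file may use the "tipo1" prefix instead of "type1"; the parser does not care which.

```java
class LetterLine {
    final String type;
    final String name;
    final String address;

    private LetterLine(String type, String name, String address) {
        this.type = type;
        this.name = name;
        this.address = address;
    }

    // Hypothetical parser for lines shaped like "type1.0001 #n <name> #a <address>".
    static LetterLine parse(String line) {
        int dot = line.indexOf('.');
        int nameMark = line.indexOf("#n");
        int addressMark = line.indexOf("#a");
        // Reject lines where a marker is missing or the markers are out of order.
        if (dot < 0 || nameMark < dot || addressMark < nameMark) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        String type = line.substring(0, dot);
        String name = line.substring(nameMark + 2, addressMark).trim();
        String address = line.substring(addressMark + 2).trim();
        return new LetterLine(type, name, address);
    }

    public static void main(String[] args) {
        LetterLine letter = parse("type1.0001 #n John Harrison #a Whatever Street, 490 - Liverpool");
        System.out.println(letter.type + " | " + letter.name + " | " + letter.address);
    }
}
```

With the type available as a plain String, choosing the letter template becomes a simple equals() or switch on letter.type.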
I need some code that will allow me to read one page at a time from a UTF-8 file.
I've used this code:
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
while ((str = in.readLine()) != null) {
System.out.println(str);
}
in.close();
}
After surrounding it with a try/catch block it runs, but it outputs the entire file!
Is there a way to amend this code to display just ONE PAGE of text at a time?
The file is in UTF-8 format and, after viewing it in Notepad++, I can see that the file contains FF characters to denote the next page.
You will need to look for the form feed character by comparing to 0x0C.
For example:
int c = in.read(); // read() returns an int so that -1 can signal end of stream
while ( c != -1 ) {
if ( c == 0x0C ) {
// form feed
} else {
// handle displayable character
}
c = in.read();
}
EDIT added an example of using a Scanner, as suggested by Boris
Scanner s = new Scanner(new File("a.txt")).useDelimiter("\u000C");
while ( s.hasNext() ) {
String str = s.next();
System.out.println( str );
}
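As a self-contained sketch of the Scanner idea, the hypothetical helper splitPages below collects every page into a list, so a single page can be picked by index. It assumes the pages are separated by form feed characters and that the whole text fits in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class Pages {
    // Splits the source into pages at each form feed character.
    static List<String> splitPages(Readable source) {
        List<String> pages = new ArrayList<>();
        try (Scanner scanner = new Scanner(source).useDelimiter("\f")) {
            while (scanner.hasNext()) {
                pages.add(scanner.next());
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        List<String> pages = splitPages(new StringReader("page one\fpage two\fpage three"));
        // Show only the second page (index 1).
        System.out.println(pages.get(1));
    }
}
```

For a real file, a Scanner created with new Scanner(new File("a.txt"), "UTF-8") can be used the same way, which also makes the encoding explicit.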
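If the file is valid UTF-8 and the pages are split by U+00FF, aka (char) 0xFF, aka "\u00FF", 'ÿ', then a buffered reader will do. If it is a raw byte 0xFF, there is a problem: 0xFF never occurs in valid UTF-8, so the decoder will not hand it to you as '\u00FF'. (If "FF" is only the label Notepad++ shows for a control character, it most likely stands for form feed, U+000C; in that case search for '\f' instead of '\u00FF'.)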
int soughtPageno = ...; // Counted from 0
int currentPageno = 0;
try (BufferedReader in = new BufferedReader(new InputStreamReader(
new FileInputStream(fileDir), StandardCharsets.UTF_8))) {
String str;
while ((str = in.readLine()) != null && currentPageno <= soughtPageno) {
int pos;
// A line may contain several page separators, so handle them one at a time.
while ((pos = str.indexOf('\u00FF')) >= 0 && currentPageno <= soughtPageno) {
if (currentPageno == soughtPageno) {
System.out.println(str.substring(0, pos)); // last piece of the sought page
}
++currentPageno;
str = str.substring(pos + 1);
}
if (currentPageno == soughtPageno) {
System.out.println(str);
}
}
}
}
For a byte 0xFF (wrong, hacked UTF-8) use a wrapping InputStream between FileInputStream and the reader:
class PageInputStream extends InputStream { // InputStream is an abstract class, not an interface
InputStream in;
int pageno = 0;
boolean eof = false;
PageInputStream(InputStream in, int pageno) {
this.in = in;
this.pageno = pageno;
}
@Override
public int read() throws IOException {
if (eof) {
return -1;
}
while (pageno > 0) {
int c = in.read();
if (c == 0xFF) {
--pageno;
} else if (c == -1) {
eof = true;
in.close();
return -1;
}
}
int c = in.read();
if (c == 0xFF) {
c = -1;
eof = true;
in.close();
}
return c;
}
} // end of PageInputStream
Take this as an example, a bit more work is to be done.
You can use a regex to detect form-feed (page break) characters. Try something like this:
File fileDir = new File("DIRECTORY OF FILE");
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
String str;
Pattern pageBreak = Pattern.compile("(^.*)(\f)(.*$)");
while ((str = in.readLine()) != null) {
Matcher match = pageBreak.matcher(str);
boolean pageBreakFound = match.matches();
if (pageBreakFound) {
String textBeforePageBreak = match.group(1);
// group(2) will contain the form feed character
// group(3) will contain the text after the form feed character
// Do whatever logic you want now that you know you hit a page boundary
}
System.out.println(str);
}
in.close();
The parentheses around portions of the regex denote capture groups, which are recorded in the Matcher object. The \f matches the form feed character.
Edited: apologies, for some reason I first read C# instead of Java and posted the snippet with C#'s Regex API. It now uses java.util.regex.Pattern and Matcher; the core concept is the same. Here's the regex documentation for Java: http://docs.oracle.com/javase/tutorial/essential/regex/
I am new to Java. I have a text file with the content below.
`trace` -
structure(
list(
"a" = structure(c(0.748701,0.243802,0.227221,0.752231,0.261118,0.263976,1.19737,0.22047,0.222584,0.835411)),
"b" = structure(c(1.4019,0.486955,-0.127144,0.642778,0.379787,-0.105249,1.0063,0.613083,-0.165703,0.695775))
)
)
What I want is to get "a" and "b" as two separate array lists.
You need to read the file line by line. It is done with a BufferedReader like this :
try {
FileInputStream fstream = new FileInputStream("input.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
int lineNumber = 0;
double [] a = null;
double [] b = null;
// Read File Line By Line
while ((strLine = br.readLine()) != null) {
lineNumber++;
if( lineNumber == 4 ){
a = getDoubleArray(strLine);
}else if( lineNumber == 5 ){
b = getDoubleArray(strLine);
}
}
// Close the input stream
br.close();
//print the contents of a
for(int i = 0; i < a.length; i++){
System.out.println("a["+i+"] = "+a[i]);
}
} catch (Exception e) {// Catch exception if any
System.err.println("Error: " + e.getMessage());
}
Assuming your "a" and "b" are on the fourth and fifth lines of the file, call a method on those lines that returns an array of doubles:
private static double[] getDoubleArray(String strLine) {
double[] a;
// Split at '(', ',' and ')'. The first two pieces are the text before "c(" and the "c" itself.
String[] split = strLine.split("[(,)]");
a = new double[split.length - 2];
for (int i = 0; i < a.length; i++) {
a[i] = Double.parseDouble(split[i + 2]); // skip the two leading non-numeric pieces
}
return a;
}
Hope this helps. I would still highly recommend reading the Java I/O and String tutorials.
You can play with split. First find the line in the text that contains "a" (or "b"). Then do something like this:
String[] first = line.split("\\("); // split takes a regex, so '(' must be escaped; first[2] will contain the values
Then:
String[] values = first[2].split(",");
You will have the numbers in values[]. Be careful with the closing brackets )) at the end of the last element: strip them before parsing. The "a" line also has a "," right after the brackets. Handling that is debugging work that I leave to you; I have given you the idea.
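An alternative sketch that sidesteps the bracket cleanup: use a regex to capture only the text between "c(" and the next ")", then split that on commas. TraceValues and parseVector are made-up names, and the pattern assumes each line holds at most one c(...) vector with plain decimal numbers, as in the sample file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraceValues {
    // Captures the comma-separated list inside c( ... ).
    private static final Pattern VECTOR = Pattern.compile("c\\(([^)]*)\\)");

    // Returns the numbers of the first c(...) vector on the line, or an empty list if there is none.
    static List<Double> parseVector(String line) {
        List<Double> values = new ArrayList<>();
        Matcher matcher = VECTOR.matcher(line);
        if (matcher.find()) {
            for (String token : matcher.group(1).split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    values.add(Double.parseDouble(trimmed));
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "\"b\" = structure(c(1.4019,0.486955,-0.127144))";
        System.out.println(parseVector(line));
    }
}
```

Calling parseVector on the "a" line and on the "b" line gives the two separate lists the question asks for.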