split(//s+) doesn't remove whitespace - Java

I need to read a lot of files and insert the data into MS SQL.
I have a file in which the fields appear to be separated by tabs (\t).
split() does not do the job; I have even tried "//s+", as you can see in the code below:
public void InsetIntoCustomers(final File _file, final Connection _conn)
{
conn = _conn;
try
{
FileInputStream fs = new FileInputStream(_file);
DataInputStream in = new DataInputStream(fs);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
//String strline contains readline() from BufferedReader
String strline;
while((strline = br.readLine()) != null)
{
if(!strline.contains("#"))
{
String[] test = strline.split("//s+");
if((tempid = sNet.chkSharednet(_conn, test[0] )) != 0)
{
// do something
}
}
}
// close BufferedReader
br.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
I need to know where in my String[] each piece of data ends up, for a file with 500k lines. But my test[] array has length 1, and all the data from readLine() is at index 0.
Am I using split wrong?
Or are there other places I need to look?
// Mir
Haha, thank you so much. Why the hell didn't I see that myself?
Yeah, of course. I am using \s+ in all my other files.
But thanks for pointing it out.

The correct regex is \\s+, with backslashes instead of forward slashes.
You could also have tried \\t.
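To see the difference, here is a minimal sketch that splits the same tab-separated line with both patterns. The sample line and the class and method names are made up for illustration. As a regex, "//s+" means two literal slashes followed by one or more 's' characters, so it never matches a tab.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits a line on runs of whitespace (spaces, tabs, ...).
    static String[] splitOnWhitespace(String line) {
        // Regex \s+ ; the backslash is doubled inside a Java string literal.
        return line.split("\\s+");
    }

    public static void main(String[] args) {
        String line = "1001\tAcme\tOslo"; // hypothetical sample row

        // Forward slashes: the pattern is not found, so the whole line stays in element 0.
        System.out.println(line.split("//s+").length); // 1

        // Backslashes: the line is split at each tab.
        String[] fields = splitOnWhitespace(line);
        System.out.println(fields.length);            // 3
        System.out.println(Arrays.toString(fields));  // [1001, Acme, Oslo]
    }
}
```

If the fields themselves can contain spaces, splitting on "\t" alone is probably the safer choice.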

Related

How to split a file into several tokens

I am trying to tokenize an input file, splitting its sentences into tokens (words).
For example,
"This is a test file." should become five words, "this" "is" "a" "test" "file", omitting punctuation and whitespace, and the words should be stored in an ArrayList.
I tried to write some code like this:
public static ArrayList<String> tokenizeFile(File in) throws IOException {
String strLine;
String[] tokens;
//create a new ArrayList to store tokens
ArrayList<String> tokenList = new ArrayList<String>();
if (null == in) {
return tokenList;
} else {
FileInputStream fStream = new FileInputStream(in);
DataInputStream dataIn = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(dataIn));
while (null != (strLine = br.readLine())) {
if (strLine.trim().length() != 0) {
//make sure strings are independent of capitalization and then tokenize them
strLine = strLine.toLowerCase();
//create regular expression pattern to split
//first letter to be alphabetic and the remaining characters to be alphanumeric or '
String pattern = "^[A-Za-z][A-Za-z0-9'-]*$";
tokens = strLine.split(pattern);
int tokenLen = tokens.length;
for (int i = 1; i <= tokenLen; i++) {
tokenList.add(tokens[i - 1]);
}
}
}
br.close();
dataIn.close();
}
return tokenList;
}
This code works fine, except that instead of splitting the whole file into words (tokens), it turns each whole line into a single token. "area area" becomes one token instead of "area" appearing twice. I don't see the error in my code. Maybe something is wrong with my trim().
Any advice is appreciated. Thank you so much.
Maybe I should use Scanner instead? I'm confused.
I think Scanner is more appropriate for this task. As for this code, you should fix the regex: the pattern passed to split() describes the delimiter, not the token, so try "\\s+".
Try the pattern String pattern = "[^\\w]"; in the same code.
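A minimal sketch of the delimiter-based approach suggested above, assuming a token is any run of letters, digits, or apostrophes. This is a simplification of the pattern in the question: it does not require a leading letter and treats '-' as a separator. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Tokenizer {
    // Lower-cases the line and splits it on runs of characters that are
    // not letters, digits, or apostrophes.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.toLowerCase().split("[^a-z0-9']+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a test file.")); // [this, is, a, test, file]
        System.out.println(tokenize("area area"));            // [area, area]
    }
}
```

Calling tokenize() on each line returned by readLine() and adding the result to tokenList with addAll() would fit into the loop from the question.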

How do I use BufferedReader to read lines from a txt file into an array

I know how to read in lines with Scanner, but how do I use a BufferedReader? I want to be able to read lines into an array. I am able to use the hasNext() function with a Scanner but not a BufferedReader, that is the only thing I don't know how to do. How do I check when the end of the file text has been reached?
BufferedReader reader = new BufferedReader(new FileReader("weblog.txt"));
String[] fileRead = new String[2990];
int count = 0;
while (fileRead[count] != null) {
fileRead[count] = reader.readLine();
count++;
}
readLine() returns null after reaching EOF.
Just
do {
fileRead[count] = reader.readLine();
count++;
} while (fileRead[count-1] != null);
Of course, this piece of code is not the recommended way of reading the file, but it shows how it might be done if you want to do it exactly the way you attempted (a predefined-size array, a counter, etc.).
The documentation states that readLine() returns null if the end of the stream is reached.
The usual idiom is to update the variable that holds the current line in the while condition and check if it's not null:
String currentLine;
while((currentLine = reader.readLine()) != null) {
//do something with line
}
As an aside, you might not know in advance the number of lines you will read, so I suggest you use a list instead of an array.
If you plan to read all the file's content, you can use Files.readAllLines instead:
//or whatever the file is encoded with
List<String> list = Files.readAllLines(Paths.get("weblog.txt"), StandardCharsets.UTF_8);
Using readLine(), try-with-resources and Vector:
try (BufferedReader bufferedReader = new BufferedReader(new FileReader("C:\\weblog.txt")))
{
String line;
Vector<String> fileRead = new Vector<String>();
while ((line = bufferedReader.readLine()) != null) {
fileRead.add(line);
}
} catch (IOException exception) {
exception.printStackTrace();
}

Error in regards to 'ArrayIndexOutOfBoundsException' and more?

I can't seem to figure out what is causing the following error:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at Bank.main(Bank.java:42) <--- this line refers to the code line that starts with "banklist.add(new Bank(values[0]...."
public static void main (String[] args) throws FileNotFoundException
{
FileReader fr = new FileReader("Bank Data.txt");
BufferedReader reader = new BufferedReader(fr);
List<Bank> banklist = new ArrayList<Bank>();
try {
String line;
while ((line = reader.readLine()) != null)
{
String[] values = line.split("/t"); // Split on "tab"
banklist.add(new Bank(values[0], Integer.parseInt(values[1]),Integer.parseInt(values[2]),Integer.parseInt(values[3]),Integer.parseInt(values[4]), values[5])); // Create a new Player object with the values extract and add it to the list
The most likely explanation is that some lines in your file do not contain a tab. Maybe the last line is empty.
A possible solution is to program defensively and check the length of the array before indexing it directly, as in array[1].
You have split on /t rather than \t. Note backslash rather than forward slash.
Since your lines probably don't contain any /t sequences, you don't get all the words your code expects.
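The two points above (split on a real tab, and check the array length before indexing) could be combined roughly as follows. This is only a sketch: the class, the helper method, the sample data, and the assumption of exactly six columns are illustrative and not taken from the original Bank class.

```java
import java.util.ArrayList;
import java.util.List;

class BankLineParser {
    static final int EXPECTED_FIELDS = 6; // assumption: six tab-separated columns, as in the question

    // Returns the fields of a line, or null if the line is malformed.
    static String[] parseFields(String line) {
        String[] values = line.split("\t"); // "\t" is a tab; "/t" is a slash followed by 't'
        return values.length == EXPECTED_FIELDS ? values : null;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        // Hypothetical data; the last line is empty, as suspected in the answer above.
        String[] lines = { "First Bank\t1\t2\t3\t4\tOslo", "" };
        for (String line : lines) {
            String[] values = parseFields(line);
            if (values == null) {
                System.err.println("Skipping malformed line: \"" + line + "\"");
                continue;
            }
            rows.add(values);
        }
        System.out.println(rows.size()); // 1
    }
}
```

Skipping bad lines is one choice; failing fast with a message that includes the line number may be preferable when every row is expected to be valid.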
You have used the wrong expression.
See the solution below:
while ((line = reader.readLine()) != null) {
String regexp = "[\\s,;\\t]+";
String[] values = line.split(regexp);
banklist.add(new Bank(values[0],
Integer.parseInt(values[1]),
values[2],
values[3],
Integer.parseInt(values[4]),
values[5])
);
}

Reading group of lines in HUGE files

I have no idea how to do the following: I want to process a really huge text file (almost 5 gigabytes). Since I cannot load the whole file into memory, I thought of reading the first 500 lines (or as many as fit into memory; I am not sure about that yet), doing something with them, and then going on to the next 500 until I am done with the whole file.
Could you post an example of the loop or command that I need for that? All the ways I tried resulted in starting from the beginning again, but I want to continue after finishing the previous 500 lines.
Help appreciated.
BufferedReader br = new BufferedReader(new FileReader(file));
String line = null;
ArrayList<String> allLines = new ArrayList<String>();
while((line = br.readLine()) != null) {
allLines.add(line);
if (allLines.size() >= 500) { // a full batch of 500 lines has been read
processLines(allLines);
allLines.clear();
}
}
processLines(allLines);
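The same batching idea as a self-contained sketch. The class name, the method name, and the Consumer callback are assumptions made for illustration, and a StringReader stands in for the real file so that the example runs as is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchReader {
    // Reads lines from source and hands them to processor in batches of at most batchSize.
    static void readInBatches(Reader source, int batchSize, Consumer<List<String>> processor) {
        try (BufferedReader br = new BufferedReader(source)) {
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = br.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    processor.accept(new ArrayList<>(batch)); // pass a copy so the batch list can be reused
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                processor.accept(batch); // the remaining lines at the end of the input
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In real use, pass new FileReader(file) instead of the StringReader.
        readInBatches(new StringReader("a\nb\nc\nd\ne"), 2,
                batch -> System.out.println(batch));
        // prints [a, b] then [c, d] then [e]
    }
}
```

Because the reader keeps its position between calls to readLine(), each batch continues where the previous one stopped; only one batch is held in memory at a time.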
Ok so you indicated in a comment above that you only want to keep certain lines, writing them to a new file based on certain logic. You can read in one line at a time, decide whether to keep it and if so write it to the new file. This approach will use very little memory since you are only holding one line at a time in memory. Here is one way to do that:
BufferedReader br = new BufferedReader(new FileReader(file));
String lineRead = null;
FileWriter fw = new FileWriter(new File("newfile.txt"), false);
while((lineRead = br.readLine()) != null)
{
if (true) // put your test conditions here
{
fw.write(lineRead);
fw.write(System.lineSeparator()); // readLine() strips the line terminator
fw.flush();
}
}
fw.close();
br.close();

Reading a text file in java

How would I read a .txt file in Java and put every line in an array when every line contains integers, strings, and doubles, and every line has a different number of words/numbers?
I'm a complete noob in Java, so sorry if this question is a bit stupid.
Thanks
Try the Scanner class which no one knows about but can do almost anything with text.
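As a rough sketch of what that could look like for one line with mixed content: the class and method names are hypothetical, and Locale.US is an assumption so that "2.5" is read as a double regardless of the system locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class MixedLineParser {
    // Splits one line into whitespace-separated tokens and converts each one
    // to an Integer, a Double, or leaves it as a String.
    static List<Object> parseLine(String line) {
        List<Object> values = new ArrayList<>();
        try (Scanner sc = new Scanner(line)) {
            sc.useLocale(Locale.US); // assumption: '.' is the decimal separator
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    values.add(sc.nextInt());
                } else if (sc.hasNextDouble()) {
                    values.add(sc.nextDouble());
                } else {
                    values.add(sc.next());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("apple 3 2.5")); // [apple, 3, 2.5]
    }
}
```

To read a whole file this way, a second Scanner created with new Scanner(new File(...)) can supply the lines through hasNextLine() and nextLine(), with parseLine() called on each one.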
To get a reader for a file, use
File file = new File ("...path...");
String encoding = "...."; // Encoding of your file
Reader reader = new BufferedReader (new InputStreamReader (
new FileInputStream (file), encoding));
... use reader ...
reader.close ();
You should really specify the encoding or else you will get strange results when you encounter umlauts, Unicode and the like.
Easiest option is to simply use the Apache Commons IO JAR and import the org.apache.commons.io.FileUtils class. There are many possibilities when using this class, but the most obvious would be as follows;
List<String> lines = FileUtils.readLines(new File("untitled.txt"));
It's that easy.
"Don't reinvent the wheel."
The best approach to read a file in Java is to open it, read it line by line, process it, and close the stream.
// Open the file
FileInputStream fstream = new FileInputStream("textfile.txt");
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
// Print the content on the console - do what you want to do
System.out.println (strLine);
}
//Close the input stream
fstream.close();
To learn more about how to read a file in Java, check out the article.
Your question is not very clear, so I'll only answer the "read" part:
List<String> lines = new ArrayList<String>();
BufferedReader br = new BufferedReader(new FileReader("fileName"));
String line = br.readLine();
while (line != null)
{
lines.add(line);
line = br.readLine();
}
Commonly used:
String line = null;
File file = new File( "readme.txt" );
FileReader fr = null;
try
{
fr = new FileReader( file );
}
catch (FileNotFoundException e)
{
System.out.println( "File doesn't exist" );
e.printStackTrace();
}
BufferedReader br = new BufferedReader( fr );
try
{
while( (line = br.readLine()) != null )
{
System.out.println( line );
}
br.close();
}
catch (IOException e)
{
e.printStackTrace();
}
@user248921 First of all, you can store anything in a String array, so you can create a String array, store each line in it, and use the values in your code whenever you want. You can use the code below to store heterogeneous lines (containing strings, ints, booleans, etc.) in an array.
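import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class user {
public static void main(String x[]) throws IOException{
BufferedReader b=new BufferedReader(new FileReader("<path to file>"));
String[] user=new String[500];
String line="";
int i=0; // index of the next free slot in the array
while (i < user.length && (line = b.readLine()) != null) {
user[i]=line;
System.out.println(user[i]);
i++;
}
b.close();
}
}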
This is a nice way to work with Streams and Collectors.
List<String> myList;
try(BufferedReader reader = new BufferedReader(new FileReader("yourpath"))){
myList = reader.lines() // This will return a Stream<String>
.collect(Collectors.toList());
}catch(Exception e){
e.printStackTrace();
}
When working with Streams you have also multiple methods to filter, manipulate or reduce your input.
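For instance, a small sketch of filtering and transforming lines before collecting them. The class name, the method name, and the rule of dropping blank lines and lines starting with '#' are made up for illustration; a StringReader replaces the file so that the example runs as is.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

class StreamFilterDemo {
    // Returns the trimmed, non-blank lines that do not start with '#'.
    static List<String> dataLines(BufferedReader reader) {
        return reader.lines()
                .map(String::trim)                      // manipulate: strip surrounding whitespace
                .filter(line -> !line.isEmpty())        // filter: drop blank lines
                .filter(line -> !line.startsWith("#"))  // filter: drop comment lines
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        BufferedReader reader =
                new BufferedReader(new StringReader("# header\nalpha\n\n  beta  \n"));
        System.out.println(dataLines(reader)); // [alpha, beta]
    }
}
```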
With Java 11 you can use the following short approach:
Path path = Path.of("file.txt");
try (var reader = Files.newBufferedReader(path)) {
String line;
while ((line = reader.readLine()) != null) {
System.out.println(line);
}
}
Or:
var path = Path.of("file.txt");
List<String> lines = Files.readAllLines(path);
lines.forEach(System.out::println);
Or:
Files.lines(Path.of("file.txt")).forEach(System.out::println);
