Java - Reading a file and storing it in arrays

I want to read the contents of a text file, split on a delimiter and then store each part in a separate array.
For example, the-file-name.txt contains several strings, each on its own line:
football/ronaldo
f1/lewis
wwe/cena
So I want to read the contents of the text file, split on the delimiter "/" and store the first part of the string before the delimiter in one array, and the second half after the delimiter in another array. This is what I have tried to do so far:
try {
    File f = new File("the-file-name.txt");
    BufferedReader b = new BufferedReader(new FileReader(f));
    String readLine = "";
    System.out.println("Reading file using Buffered Reader");
    while ((readLine = b.readLine()) != null) {
        String[] parts = readLine.split("/");
    }
} catch (IOException e) {
    e.printStackTrace();
}
This is what I have achieved so far, but I am not sure how to go on from here. Any help in completing the program will be appreciated.

You can create two lists, one for the first part and one for the second:
List<String> part1 = new ArrayList<>(); // list for the first part
List<String> part2 = new ArrayList<>(); // list for the second part
while ((readLine = b.readLine()) != null) {
    String[] parts = readLine.split("/"); // split on '/', not on '-'
    part1.add(parts[0]); // put the first part in the list part1
    part2.add(parts[1]); // put the second part in the list part2
}
Output:
[football, f1, wwe]
[ronaldo, lewis, cena]
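Since the question asks for arrays, the lists can be converted once the whole file has been read. The following is only a minimal sketch, not the answerer's code: the class name `SplitParts` and the helper `splitColumn` are hypothetical, and it assumes that lines without a "/" should simply be skipped.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SplitParts {
    // Returns one "column" of the file as an array:
    // index 0 is the text before the first "/", index 1 is the text after it.
    static String[] splitColumn(List<String> lines, int index) {
        List<String> column = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("/", 2); // split at the first "/" only
            if (parts.length == 2) {             // assumption: skip lines without a delimiter
                column.add(parts[index].trim());
            }
        }
        return column.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("the-file-name.txt"));
        String[] first = splitColumn(lines, 0);
        String[] second = splitColumn(lines, 1);
        System.out.println(String.join(", ", first));
        System.out.println(String.join(", ", second));
    }
}
```

Checking `parts.length` before indexing avoids an ArrayIndexOutOfBoundsException on blank or malformed lines.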

Related

Remove stop words from file - going over it multiple times causes content duplication and does not remove the words

I am trying to go over a bunch of files, read each of them, and remove every word that appears in a given stop-word list. The result is a disaster: the content of the whole file is copied over and over again.
What I tried:
- Saving the file as String and trying to look with regex
- Saving the file as String and going over line by line and comparing tokens to the stopwords that are stored in a LinkedHashSet, I can also store them in a file
- tried to twist the logic below in multiple ways, getting more and more ridiculous output.
- tried looking into text / line with the .contains() method, but no luck
My general logic is as follows:
for every word in the stopwords set:
    while(file has more lines):
        save current line into String
        while (current line has more tokens):
            assign current token into String
            compare token with current stopword:
                if(token equals stopword):
                    write in the output file "" + " "
                else: write in the output file the token as is
Tried what's in this question and many other SO questions, but just can't achieve what I need.
Real code below:
private static void removeStopWords(File fileIn) throws IOException {
    File stopWordsTXT = new File("stopwords.txt");
    System.out.println("[Removing StopWords...] FILE: " + fileIn.getName() + "\n");
    // create file reader and go over it to save the stopwords into the Set data structure
    BufferedReader readerSW = new BufferedReader(new FileReader(stopWordsTXT));
    Set<String> stopWords = new LinkedHashSet<String>();
    for (String line; (line = readerSW.readLine()) != null; readerSW.readLine()) {
        // trim() eliminates leading and trailing spaces
        stopWords.add(line.trim());
    }
    File outp = new File(fileIn.getPath().substring(0, fileIn.getPath().lastIndexOf('.')) + "_NoStopWords.txt");
    FileWriter fOut = new FileWriter(outp);
    Scanner readerTxt = new Scanner(new FileInputStream(fileIn), "UTF-8");
    while(readerTxt.hasNextLine()) {
        String line = readerTxt.nextLine();
        System.out.println(line);
        Scanner lineReader = new Scanner(line);
        for (String curSW : stopWords) {
            while(lineReader.hasNext()) {
                String token = lineReader.next();
                if(token.equals(curSW)) {
                    System.out.println("---> Removing SW: " + curSW);
                    fOut.write("" + " ");
                } else {
                    fOut.write(token + " ");
                }
            }
        }
        fOut.write("\n");
    }
    fOut.close();
}
What happens most often is that it looks for the first word from the stopWords set and that's it. The output contains all the other words even if I manage to remove the first one. And the first will be there in the next appended output in the end.
Part of my stopword list
about
above
after
again
against
all
am
and
any
are
as
at
With tokens I mean words, i.e. getting every word from the line and comparing it to the current stopword
After a while of debugging, I believe I have found the solution. This problem is tricky because you have to use several different scanners and file readers. Here is what I did:
I changed how you added to your stopWords set, as it wasn't adding the words correctly. I used a BufferedReader to read each line, then a Scanner to read each word, then added the word to the set.
Then, where you compared them, I got rid of one of your loops, as you can simply use the .contains() method to check whether the word is a stop word.
I left the part of writing to the file without the stop words for you to do, as I'm sure you can figure that out now that everything else is working.
-My sample stop words txt file:
Stop words
Words
-My sample input file was exactly the same, so it should catch all three words.
The code:
// create a file reader and save the stop words into the Set data structure
BufferedReader readerSW = new BufferedReader(new FileReader("stopWords.txt"));
Set<String> stopWords = new LinkedHashSet<String>();
String stopWordsLine = readerSW.readLine();
while (stopWordsLine != null) {
    // a Scanner splits the line into whitespace-separated words
    Scanner words = new Scanner(stopWordsLine);
    while (words.hasNext()) { // hasNext() is false for blank lines, so they are skipped
        stopWords.add(words.next()); // add the stop word to the set
    }
    words.close();
    stopWordsLine = readerSW.readLine();
}
readerSW.close();
BufferedReader outp = new BufferedReader(new FileReader("Words.txt"));
String line = outp.readLine();
while (line != null) {
    Scanner lineReader = new Scanner(line);
    while (lineReader.hasNext()) { // loop over every word in the line
        String word = lineReader.next();
        if (stopWords.contains(word)) {
            System.out.println("removing " + word);
        }
    }
    lineReader.close();
    line = outp.readLine();
}
outp.close();
Output:
removing Stop
removing words
removing Words
Let me know if I can elaborate any more on my code or why I did something!
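For the part left open above (producing the text without the stop words), one possible approach is to rebuild each line from the tokens that are not in the set. This is a hedged sketch rather than the answerer's code: `StopWordFilter` and `removeStopWords` are hypothetical names, tokens are assumed to be separated by whitespace, and matching is assumed to be case-sensitive.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;

class StopWordFilter {
    // Returns the line with every token that appears in stopWords removed.
    // Each token is checked against the whole set once, so no outer loop
    // over the stop words is needed.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringJoiner kept = new StringJoiner(" ");
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("about", "all", "and", "are"));
        // prints "cats dogs naps"
        System.out.println(removeStopWords("cats and dogs are all about naps", stopWords));
    }
}
```

Calling this once per input line and writing the result followed by a newline yields an output file that contains each line exactly once.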

Java compare strings from two places and exclude any matches

I'm trying to end up with a results.txt minus any matching items, having successfully compared some string inputs against another .txt file. Been staring at this code for way too long and I can't figure out why it isn't working. New to coding so would appreciate it if I could be steered in the right direction! Maybe I need a different approach? Apologies in advance for any loud tutting noises you may make. Using Java8.
//Sending a String[] into 'searchFile', contains around 8 small strings.
//Example of input: String[]{"name1","name2","name 3", "name 4.zip"}
^ This is my exclusions list.
public static void searchFile(String[] arr, String separator)
{
    StringBuilder b = new StringBuilder();
    for(int i = 0; i < arr.length; i++)
    {
        if(i != 0) b.append(separator);
        b.append(arr[i]);
        String findME = arr[i];
        searchInfo(MyApp.getOptionsDir()+File.separator+"file-to-search.txt",findME);
    }
}
^This works fine. I'm then sending the results to 'searchInfo' and trying to match and remove any duplicate (complete, not part) strings. This is where I am currently failing. Code runs but doesn't produce my desired output. It often finds part strings rather than complete ones. I think the 'results.txt' file is being overwritten each time...but I'm not sure tbh!
file-to-search.txt contains: "name2","name.zip","name 3.zip","name 4.zip" (text file is just a single line)
public static String searchInfo(String fileName, String findME)
{
    StringBuffer sb = new StringBuffer();
    try {
        BufferedReader br = new BufferedReader(new FileReader(fileName));
        String line = null;
        while((line = br.readLine()) != null)
        {
            if(line.startsWith("\""+findME+"\""))
            {
                sb.append(line);
                //tried various replace options with no joy
                line = line.replaceFirst(findME+"?,", "");
                //then goes off with results to create a txt file
                FileHandling.createFile("results.txt",line);
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return sb.toString();
}
What I'm trying to end up with is a results file minus any complete matching strings (not partial matches):
e.g. results.txt to end up with: "name.zip","name 3.zip"
OK, with the information I have, you can do something like this:
List<String> result = new ArrayList<>();
String content = FileUtils.readFileToString(file, "UTF-8");
for (String s : content.trim().split("\\s*,\\s*")) { // tolerates spaces around the commas
    if (!s.equals(findME)) { // assuming both already include the surrounding quotes
        result.add(s);
    }
}
FileUtils.write(newFile, String.join(",", result), "UTF-8");
This uses Apache Commons IO FileUtils for convenience. Change the separator passed to String.join if you want spaces after the commas.
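The same idea can be sketched with the JDK alone, filtering against the whole exclusion list in one pass so that results.txt is written only once. The names `ExclusionFilter` and `filterEntries` are hypothetical, and the sketch assumes the entries are separated by commas and never contain a comma themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExclusionFilter {
    // Keeps only the entries whose text (without quote characters) is not
    // in the exclusion set. Whole entries are compared, so "name 3" does
    // not remove "name 3.zip".
    static String filterEntries(String content, Set<String> exclusions) {
        List<String> kept = new ArrayList<>();
        for (String entry : content.split(",")) {
            String trimmed = entry.trim();
            String name = trimmed.replace("\"", ""); // drop the quote characters
            if (!name.isEmpty() && !exclusions.contains(name)) {
                kept.add(trimmed);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String content = "\"name2\",\"name.zip\",\"name 3.zip\",\"name 4.zip\"";
        Set<String> exclusions =
                new HashSet<>(Arrays.asList("name1", "name2", "name 3", "name 4.zip"));
        // prints "name.zip","name 3.zip"
        System.out.println(filterEntries(content, exclusions));
    }
}
```

Using equality on whole entries instead of startsWith or a regex is what prevents the partial matches described in the question.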

BufferedReader to read lines, then assign the new formed line's tokens to variables

I have a text file that I need to modify before parsing it. 1) I need to combine lines if the leading line ends with "\" and delete whitespace-only lines. This has been done using this code:
public List<String> OpenFile() throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(path))) {
        String line;
        StringBuilder concatenatedLine = new StringBuilder();
        List<String> formattedStrings = new ArrayList<>();
        while ((line = br.readLine()) != null) {
            if (line.isEmpty()) {
                line = line.trim();
            } else if (line.charAt(line.length() - 1) == '\\') {
                line = line.substring(0, line.length() - 1);
                concatenatedLine.append(line);
            } else {
                concatenatedLine.append(line);
                formattedStrings.add(concatenatedLine.toString());
                concatenatedLine.setLength(0);
            }
        }
        return formattedStrings;
    }
}
}//The formattedStrings arrayList contains all of the strings formatted for use.
Now my question: how can I search those lines for a pattern and assign their token[i] values to variables that I can call or use later?
The new combined text will look like this:
Field-1 Field-2 Field-3 Field-4 Field-5 Field-6 Field-7
Now, if the line contains "Field-6" and "Field-2", then set the following:
String S =token[1] token[3];
String Y =token[5-7];
A question you might have for me: how am I deciding which tokens to save to a string? I will manually search for the pattern in the text file, and if the line contains Field-6 and Field-2 (or any other required pattern), I will manually count which tokens I need to assign to the string. However, it would be nice if there were another way to approach this, for example assigning what's between token[4] and token[7] to a string if the line has token[2] and token[6], or another way that provides more granular control over what to store as a string and what to ignore.
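One way to approach this is to split each combined line on whitespace, test whether the required values occur as whole tokens, and then join the wanted index range. This is only a sketch under stated assumptions: `TokenPicker`, `hasAll`, and `joinRange` are hypothetical names, indices are taken to be zero-based (the question does not say which convention it uses), and tokens are assumed to be whitespace-separated.

```java
import java.util.Arrays;

class TokenPicker {
    // Joins tokens[from..to] (both inclusive, zero-based) with single spaces.
    // The caller must make sure that 'to' is a valid index.
    static String joinRange(String[] tokens, int from, int to) {
        return String.join(" ", Arrays.copyOfRange(tokens, from, to + 1));
    }

    // True when every required value appears as a whole token.
    static boolean hasAll(String[] tokens, String... required) {
        return Arrays.asList(tokens).containsAll(Arrays.asList(required));
    }

    public static void main(String[] args) {
        String line = "Field-1 Field-2 Field-3 Field-4 Field-5 Field-6 Field-7";
        String[] token = line.trim().split("\\s+");
        // The length check keeps the index accesses below in bounds.
        if (token.length >= 7 && hasAll(token, "Field-6", "Field-2")) {
            String s = token[1] + " " + token[3]; // two individual tokens
            String y = joinRange(token, 4, 6);    // a contiguous run of tokens
            System.out.println(s);
            System.out.println(y);
        }
    }
}
```

Keeping the match test and the index ranges as parameters makes it possible to store each pattern and its ranges in a table instead of hard-coding them per line type.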

Read text containing multiple line using bufferedreader

I would like to know how to read a text file containing multiple lines in Java using BufferedReader.
Every line has two words separated by a semicolon (;), and I want to use the String split() method to separate the two words. I also need to compare each word to a word in a master ArrayList.
I'm having trouble continuing.
Here's my code:
{
    FileInputStream f = new FileInputStream("C://Desktop/test.txt");
    InputStreamReader reader = new InputStreamReader(f);
    BufferedReader Buff = new BufferedReader(reader);
    String Line = Buff.readLine();
    String t[] = Line.split(";");
}
Replace
String Line = Buff.readLine();
with
// buffer for storing the file contents in memory
StringBuffer stringBuffer = new StringBuffer("");
// for reading one line
String line = null;
// keep reading until readLine returns null
while ((line = Buff.readLine()) != null) {
    // append the line just read to the buffer
    stringBuffer.append(line);
}
Now that the complete file has been read into the StringBuffer, you can do whatever you want with it. Note that readLine() strips the line terminators, so append a separator after each line if you need to tell the lines apart later.
Hope this helps.
Try
String line;
while ((line = Buff.readLine()) != null) {
    System.out.println(line);
}
You need a while loop to read all the lines.
Here is an example http://www.mkyong.com/java/how-to-read-file-from-java-bufferedreader-example/
You can use a BufferedReader to loop through each line of the specified file. To split the words on ";", call split() and store the resulting array in a list.
Finally, add each of those lists to a single list, which will in turn hold all the words present in your file.
List<String> words = Arrays.asList(line.split(";"));
list.addAll(words);
Now compare the retrieved list against a master list containing all your records.
// Compare the 2 lists, assuming your file list has less number of
// records
masterList.removeAll(list);
The above statement can be used in reverse too, in case the file holds the master list of words. Alternatively, you can copy the two lists into temporary lists and compare them in whatever way you require.
Here is the complete code:
public static void main(String[] args) {
    String line;
    // List of all the words read from the file
    List<String> list = new ArrayList<String>();
    // Your original master list of words against which you want to compare
    List<String> masterList = new ArrayList<String>(Arrays.asList("cleaner",
            "code", "java", "read", "write", "market", "python", "snake",
            "stack", "overflow"));
    BufferedReader reader;
    try {
        reader = new BufferedReader(new FileReader("testing.txt"));
        while ((line = reader.readLine()) != null) {
            // Add all the words split by a ; to the list
            List<String> words = Arrays.asList(line.split(";"));
            list.addAll(words);
        }
        // Compare the 2 lists, assuming your file list has fewer
        // records
        masterList.removeAll(list);
        System.out.println(masterList);
        reader.close();
    } catch (FileNotFoundException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
File which I have created looks like:
cleaner;code
java;read
write;market
python;snake
The output of the above code:
[stack, overflow]

Read text file and split each newline into a string array

So basically I'm reading a text file that has a bunch of lines. I need to extract certain lines from the text file and add those specific lines into string array. I've been trying to split each newLine with: "\n" , "\r". This did not work. I keep getting this error as well:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at A19010.main(A19010.java:47)
Here is the code:
Path objPath = Paths.get("dirsize.txt");
if (Files.exists(objPath)){
    File objFile = objPath.toFile();
    try(BufferedReader in = new BufferedReader(
            new FileReader(objFile))){
        String line = in.readLine();
        while(line != null){
            String[] linesFile = line.split("\n");
            String line0 = linesFile[0];
            String line1 = linesFile[1];
            String line2 = linesFile[2];
            System.out.println(line0 + "" + line1);
            line = in.readLine();
        }
    }
    catch(IOException e){
        System.out.println(e);
    }
}
else
{
    System.out.println(
        objPath.toAbsolutePath() + " doesn't exist");
}
String[] linesFile = new String[] {line}; // this array is initialized with a single element
String line0 = linesFile[0]; // fine
String line1 = linesFile[1]; // not fine, the array has size 1, so no element at second index
String line2 = linesFile[2];
You're creating a String[] linesFile with one element, line, but then trying to access the elements at index 1 and 2. This throws an ArrayIndexOutOfBoundsException.
You're not actually splitting anything here: in.readLine(), as the method name says, reads one full line from the file.
Edit: Since you don't know the number of lines in advance, you can add the lines (Strings) dynamically to a list instead of an array.
List<String> lines = new LinkedList<String>(); // create a new list
String line = in.readLine(); // read a line at a time
while(line != null){ // loop till you have no more lines
    lines.add(line); // add the line to your list
    line = in.readLine(); // try to read another line
}
The readLine() method reads an entire line from the input but strips the line-terminator characters from it. When you split the line on the \n character, there is none in the String, so split() returns a single-element array and accessing index 1 throws the exception.
Please refer to the answer in this link for more clarity.
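The behaviour described above can be checked with a short, self-contained demo. The class name `ReadLineDemo` and the helper `pieces` are hypothetical, and a StringReader stands in for the file so that no input file is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class ReadLineDemo {
    // Number of pieces produced by splitting one already-read line on "\n".
    static int pieces(String line) {
        return line.split("\n").length;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("first line\nsecond line\n"));
        String line = in.readLine();             // "first line", terminator removed
        System.out.println(line.contains("\n")); // false
        System.out.println(pieces(line));        // 1, so index 1 is out of bounds
    }
}
```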
You are initializing your String array with one element, namely line. linesFile[0] is therefore line, and every other index is out of bounds.
Try this:
String[] linesFile = line.split("SPLIT-CHAR-HERE");
if (linesFile.length >= 3)
{
    String line0 = linesFile[0];
    String line1 = linesFile[1];
    String line2 = linesFile[2];
    // further logic here
} else
{
    // handle invalid lines here
}
You are using an array to store the strings. Use an ArrayList instead, since it grows dynamically, and convert it to an array once reading is complete.
ArrayList<String> str_list = new ArrayList<String>();
String line = in.readLine();
while (line != null) {
    str_list.add(line);
    line = in.readLine();
}
// at the end of the operation, convert the ArrayList to an array
return str_list.toArray(new String[0]);
The issue here is that you are creating a new String array every time your parser reads in a new line. You then populate only the very first element in that String array with the line that is being read in with:
String[] linesFile = new String[] {line};
Since you create a new String[] with one element every single time your while loop runs from the top, you lose the values it stored from the previous iteration.
The solution is to create the String[] once, before you enter the while loop. If you don't know how to use ArrayList, then I suggest first counting the lines with a while loop like this:
int numberOfLine = 0;
while (in.readLine() != null)
{
    numberOfLine++;
}
String[] linesFile = new String[numberOfLine];
This lets you avoid a dynamically resized ArrayList, because the loop above tells you how many lines your file contains. Then keep an additional counter (or reuse numberOfLine, since we have no other use for it anymore) so that you can populate this array:
numberOfLine = 0;
in = new BufferedReader(new FileReader(objFile)); // reopen the file to start again from the first line
String line;
while ((line = in.readLine()) != null)
{
    linesFile[numberOfLine] = line;
    numberOfLine++;
}
At this point linesFile should be correctly populated with the lines of your file, so that linesFile[i] gives you the i-th line (counting from zero).
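If reading the file twice is undesirable, the array can also be built in a single pass. This is a sketch of an alternative, not part of the answer above: it assumes Java 8 or later, and the names `LinesToArray` and `readLines` are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class LinesToArray {
    // Collects every remaining line of the reader into an array in one pass.
    // BufferedReader.lines() wraps any read failure in an UncheckedIOException.
    static String[] readLines(BufferedReader in) {
        return in.lines().toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(Paths.get("dirsize.txt"))) {
            String[] linesFile = readLines(in);
            System.out.println(linesFile.length + " lines read");
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, which the two-pass version has to handle manually for both readers.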
