How do I split multiple lines with multiple delimiters? - java

I have a .txt file with multiple lines, each containing many emails such as Mike#sport.member.com, Laura#music.member.com, Quinn#music.member.com. How do I split them so I can add them to separate ArrayLists such as music or sport?
Mike: sport
Laura: music
Quinn: music
Thanks so much.

You can use a regex:
Pattern pattern = Pattern.compile("(\\w+)#(\\w+)\\.(\\w+)\\.(\\w+)"); // dots are escaped so they match a literal '.'
Matcher matcher = pattern.matcher("Laura#music.member.com");
if (matcher.find()) {
    matcher.group(1); // the name (Laura)
    matcher.group(2); // the category (music or sport)
    matcher.group(3); // the company name (member)
    matcher.group(4); // the top-level domain (com in this case)
}
Use these groups to fill the ArrayLists as you like,
and have a nice day :)
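Building on the regex idea, here is a minimal sketch of filling one list per category. The class name EmailGrouper and the method groupByCategory are made up for illustration, and the sketch assumes every address has the form name#category.rest. A map from category to list avoids declaring a separate variable for each category up front.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailGrouper {
    // name, '#', category, then the literal dot that starts the rest of the address
    private static final Pattern ADDRESS = Pattern.compile("(\\w+)#(\\w+)\\.");

    /** Collects the name of every address under its category, in first-seen order. */
    static Map<String, List<String>> groupByCategory(List<String> addresses) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String address : addresses) {
            Matcher matcher = ADDRESS.matcher(address);
            if (matcher.find()) {
                // create the list for this category on first use, then append the name
                groups.computeIfAbsent(matcher.group(2), key -> new ArrayList<>())
                      .add(matcher.group(1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> addresses = Arrays.asList(
                "Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com");
        System.out.println(groupByCategory(addresses)); // {sport=[Mike], music=[Laura, Quinn]}
    }
}
```

Calling groupByCategory(...).get("music") then gives the list of music members.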

If you don't mind using ArrayUtils (from Apache Commons Lang), you can use this code:
String[] emails = new String[] {"Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com"};
int startLength;
String[][] split = new String[emails.length][];
for(int i = 0; i < emails.length; i++) {
split[i] = emails[i].split("#"); //split by #
startLength = split[i].length;
for(int j = 0; j < startLength; j++) {
split[i] = ArrayUtils.addAll(split[i], split[i][0].split("\\.")); //Split by . and add items to the end of the array
split[i] = ArrayUtils.remove(split[i], 0); //remove the first item of the array
}
}
System.out.println(Arrays.deepToString(split));
Output:
[[Mike, sport, member, com], [Laura, music, member, com], [Quinn, music, member, com]]
This uses .split("#") to split each email into an array at the "#". It then goes through that array and splits each item with .split("\\.").
As it splits the elements the second time, it adds the pieces to the end of the split array and removes the first item (which is the item it just split up).
Note that I had to introduce the startLength variable to keep track of the original size of the split array, because in the second loop its length is constantly changing.
This gives you a 2D array with all the items, but I won't blame you for not using this because it's a bit of a mess.
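If the goal is only to split on both delimiters, a single call to String.split with a character class should be enough and needs no extra library. The sketch below is an alternative to the loop above, not the same technique. The names MultiDelimiterSplit and splitAddress are hypothetical.

```java
import java.util.Arrays;

class MultiDelimiterSplit {
    /** Splits one address on either '#' or '.'. */
    static String[] splitAddress(String address) {
        // Inside a character class the dot is a literal, so it needs no escaping.
        return address.split("[#.]");
    }

    public static void main(String[] args) {
        String[] emails = {"Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com"};
        String[][] split = new String[emails.length][];
        for (int i = 0; i < emails.length; i++) {
            split[i] = splitAddress(emails[i]);
        }
        // [[Mike, sport, member, com], [Laura, music, member, com], [Quinn, music, member, com]]
        System.out.println(Arrays.deepToString(split));
    }
}
```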

There are two steps to solve this problem.
First, read the lines from the file (a.txt):
public List<String> readFile() throws IOException {
    String fileName = "/Users/folder/a.txt";
    List<String> list = new ArrayList<>();
    // try-with-resources closes the reader (and the underlying stream) automatically
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName)))) {
        String str;
        while ((str = bufferedReader.readLine()) != null) {
            list.add(str);
        }
    }
    return list;
}
Then split each line and extract the data:
List<String> lines = readFile();
Pattern pattern = Pattern.compile("([a-zA-Z]+)#([a-zA-Z]+)");
for (String line : lines) {
Matcher matcher = pattern.matcher(line);
if (matcher.find()){
System.out.println(matcher.group(1) + ":" + matcher.group(2));
}
}
The result is
Mike:sport
Laura:music
Quinn:music
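The question mentions that a line can hold many emails. With if (matcher.find()) only the first match on each line is reported. A small sketch of collecting every match on a line, assuming the same name#category shape (the names AllMatchesPerLine and extractPairs are invented for this example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatchesPerLine {
    private static final Pattern ENTRY = Pattern.compile("([a-zA-Z]+)#([a-zA-Z]+)");

    /** Returns "name:category" for every address found on the line. */
    static List<String> extractPairs(String line) {
        List<String> pairs = new ArrayList<>();
        Matcher matcher = ENTRY.matcher(line);
        // find() resumes after the previous match, so the loop visits each address once
        while (matcher.find()) {
            pairs.add(matcher.group(1) + ":" + matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String line = "Mike#sport.member.com, Laura#music.member.com, Quinn#music.member.com";
        System.out.println(extractPairs(line)); // [Mike:sport, Laura:music, Quinn:music]
    }
}
```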

Related

How do I count the occurrence of a word in an array in Java?

I am working on a project where I have to read data from a file into my code. The txt file contains columns of data, and I have managed to separate the first column into a list with this code.
public static void main(String[] args) {
    String line = "";
    ArrayList<String> date = new ArrayList<String>();
    try {
        FileReader fr = new FileReader("list.txt");
        BufferedReader br = new BufferedReader(fr);
        while ((line = br.readLine()) != null) {
            String firstColumn = line.split("\\s+")[0]; // first whitespace-separated column
            date.add(firstColumn);
            System.out.println(firstColumn);
        }
    } catch (IOException e) {
        System.out.println("File not found!");
    }
}
This will output the first column of data from the "list.txt" file which is...
30-Nov-2016
06-Oct-2016
05-Feb-2016
04-Sep-2016
18-Apr-2016
09-Feb-2016
22-Oct-2016
20-Aug-2016
17-Dec-2016
25-Dec-2016
However, I want to count the occurrences of the word "Feb", so that, for example, it prints:
"The month February occurs: 2 times"
But I'm struggling to find the right code. Could somebody please help me with this? I've been trying for over 24 hours and can't find any other questions that help. Any help will be greatly appreciated.
Another solution could be using split
String month = "Feb";
int count = 0;
while ((line = br.readLine()) != null)
{
String strDate = line.split("\\s+")[0]; // get first column, which has date
String temp = strDate.split("\\-")[1]; // get Month from extracted date.
if (month.equalsIgnoreCase(temp))
{
count++;
// or store strDate into List for further process.
}
}
System.out.println (count);// should print total occurrence of date with Feb month
Edit: because you extract the date from each line with line.split("\\s+")[0], the extracted string contains only the date, so that is the string to check for the month.
For simplicity, you could simply use a regular expression, something like...
Pattern p = Pattern.compile("Feb", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher("30-Nov-2016, 06-Oct-2016, 05-Feb-2016, 04-Sep-2016, 18-Apr-2016, 09-Feb-2016, 22-Oct-2016, 20-Aug-2016, 17-Dec-2016, 25-Dec-2016");
int count = 0;
while (m.find()) {
count++;
}
System.out.println("Count = " + count);
Which, based on the input, would be 2.
Now, obviously, if you're reading each value from a file one line at a time, this is not that efficient, and simply using something like...
if (line.toLowerCase().contains("feb")) {
    count++;
}
would be simpler and quicker.
Updated...
So, based on the provided input data and the following code...
Pattern p = Pattern.compile("Feb", Pattern.CASE_INSENSITIVE);
int count = 0;
try (BufferedReader br = new BufferedReader(new InputStreamReader(Test.class.getResourceAsStream("Data.txt")))) {
String text = null;
while ((text = br.readLine()) != null) {
Matcher m = p.matcher(text);
if (m.find()) {
count++;
}
}
System.out.println(count);
} catch (IOException ex) {
Logger.getLogger(Test.class.getName()).log(Level.SEVERE, null, ex);
}
It prints 67.
Now, this is a brute-force method, because I'm checking the whole line. To avoid false matches elsewhere in the text, you should split the line on the common delimiter (i.e. the tab character) and check only the first element, for example...
String[] parts = text.split("\t");
Matcher m = p.matcher(parts[0]);
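Putting the pieces together, here is a sketch that checks only the month field of the first column. It assumes the dates look like dd-MMM-yyyy and that columns are separated by whitespace. MonthCounter and countMonth are illustrative names, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

class MonthCounter {
    /** Counts the lines whose first column is a date with the given month abbreviation. */
    static int countMonth(List<String> lines, String month) {
        int count = 0;
        for (String line : lines) {
            String firstColumn = line.trim().split("\\s+")[0];
            String[] dateParts = firstColumn.split("-"); // day, month, year
            if (dateParts.length == 3 && dateParts[1].equalsIgnoreCase(month)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "30-Nov-2016", "06-Oct-2016", "05-Feb-2016", "04-Sep-2016", "18-Apr-2016",
                "09-Feb-2016", "22-Oct-2016", "20-Aug-2016", "17-Dec-2016", "25-Dec-2016");
        System.out.println("The month Feb occurs: " + countMonth(lines, "Feb") + " times");
    }
}
```

Comparing the month field exactly means that "Feb" appearing in some other column does not inflate the count.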

Splitting data into Arrays

I am trying to read data from a text file using a Buffered Reader. I'm trying to split the data into two Arrays, one of them is a double and the other one is a string. Below is the text file content:
55.6
Scholtz
85.6
Brown
74.9
Alawi
45.2
Weis
68.0
Baird
55
Baynard
68.5
Mills
65.1
Gibb
80.7
Grovner
87.6
Weaver
74.8
Kennedy
83.5
Landry.
Basically I'm trying to take all the numbers and put them into the double array, and take all the names and put them into the String array. Any ideas?
You could read the entire contents from the BufferedReader into one string and then use regexes to pull out the numbers and the names. A regex like \d+(\.\d+)? should match the numbers, and a regex like [A-Za-z]+ should match the names. Then collect the matches of each regex into their respective arrays.
Try this:
String file = "path to file";
double dArr[] = new double[100];
String sArr[] = new String[100];
int i = 0, j = 0;
// digits with an optional fractional part, so both "55.6" and "55" count as numbers
Pattern p = Pattern.compile("[0-9]+(\\.[0-9]+)?");
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        line = line.trim();
        if (p.matcher(line).matches()) {
            dArr[i] = Double.parseDouble(line);
            i++;
        } else {
            sArr[j] = line;
            j++;
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
Suggestion: use a List instead of an array if you are unsure how many elements there are.
Note that the fractional part is optional in the pattern. If the decimal point were required, 55 would be stored as a String because it looks like an int.
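Following the List suggestion, here is a sketch that lets Double.parseDouble decide what counts as a number instead of a hand-written pattern. The names ScoresAndNames and separate are hypothetical. One caveat: parseDouble also accepts forms such as "NaN" or "1e3", so this assumes no name in the file looks like one of those.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScoresAndNames {
    /** Appends each non-blank line to numbers if it parses as a double, otherwise to names. */
    static void separate(List<String> lines, List<Double> numbers, List<String> names) {
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) {
                continue; // ignore blank lines
            }
            try {
                numbers.add(Double.parseDouble(text));
            } catch (NumberFormatException e) {
                names.add(text); // not a number, so treat it as a name
            }
        }
    }

    public static void main(String[] args) {
        List<Double> numbers = new ArrayList<>();
        List<String> names = new ArrayList<>();
        separate(Arrays.asList("55.6", "Scholtz", "55", "Baynard"), numbers, names);
        System.out.println(numbers); // [55.6, 55.0]
        System.out.println(names);   // [Scholtz, Baynard]
    }
}
```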

how to find i of a token in an array[i]

I've found a word in a document and print the line in which the word appears, like this.
Say the example file contains: "The quick brown fox jumps over the lazy dog.Jackdaws love my big sphinx of quartz."
FileInputStream fstream = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while ((strLine = br.readLine()) != null) {
    //check to see whether testWord occurs at least once in the line of text
    check = strLine.toLowerCase().contains(testWord.toLowerCase());
    if (check) {
        //get the line, and parse its words into a String array
        String[] lineWords = strLine.split("\\s+");
        for (int i = 0; i < lineWords.length; i++) {
            System.out.print(lineWords[i] + ' ');
        }
    }
}
If I search for 'fox', then lineWords[] will contain the tokens from the first sentence, and lineWords[3] = fox. To print the color of the fox, I need lineWords[2].
I was wondering how we can get the 'i' of a token in lineWords[i], because I want the output to be lineWords[i-1].
You could use a HashMap that stores each word together with a list of its indices.
HashMap<String, List<Integer>> indices = new HashMap<>();
So in the for loop you fill the HashMap:
for (int i = 0; i < lineWords.length; i++) {
    String word = lineWords[i];
    if (!indices.containsKey(word)) {
        indices.put(word, new ArrayList<>());
    }
    indices.get(word).add(i);
}
To get all the indices of a specific word, call:
List<Integer> indicesForWord = indices.get("fox");
And to get the word at i - 1, call:
for (int i = 0; i < indicesForWord.size(); i++) {
    int index = indicesForWord.get(i) - 1;
    if (index >= 0) { // a match at position 0 has no preceding word
        System.out.println(lineWords[index]);
    }
}
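The index-map idea can be condensed with Map.computeIfAbsent. The sketch below is one way to write it as a self-contained class. WordIndex, indexWords and wordsBefore are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WordIndex {
    /** Maps each word to the list of positions at which it occurs. */
    static Map<String, List<Integer>> indexWords(String[] words) {
        Map<String, List<Integer>> indices = new HashMap<>();
        for (int i = 0; i < words.length; i++) {
            indices.computeIfAbsent(words[i], key -> new ArrayList<>()).add(i);
        }
        return indices;
    }

    /** Returns the word preceding each occurrence of target; an occurrence at position 0 is skipped. */
    static List<String> wordsBefore(String[] words, String target) {
        List<String> result = new ArrayList<>();
        List<Integer> positions =
                indexWords(words).getOrDefault(target, Collections.<Integer>emptyList());
        for (int index : positions) {
            if (index > 0) {
                result.add(words[index - 1]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lineWords = "The quick brown fox jumps over the lazy dog.".split("\\s+");
        System.out.println(wordsBefore(lineWords, "fox")); // [brown]
    }
}
```

Using getOrDefault avoids a NullPointerException when the word does not occur at all.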
If you are using Java 8, it is straightforward:
List<String> words = Files.lines(Paths.get("files/input.txt"))
.flatMap(line -> Arrays.stream(line.split("\\s+")))
.collect(Collectors.toList());
int index = words.indexOf("fox");
System.out.println(index);
if(index>0)
System.out.println(words.get(index-1));
This solution also works when the word you are searching for is the first word in a line, because the words of all lines are collected into a single list. I hope it helps!
If you need to find all occurrences, you can use the indexOfAll method from this post.
That can be done by traversing the array and, when you find your word, printing the one before it. Here's how:
if (lineWords[0].equals(testWord))
    return; // no preceding word
for (int i = 1; i < lineWords.length; i++) {
    if (lineWords[i].equals(testWord)) {
        System.out.println(lineWords[i - 1]);
        break;
    }
}
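One thing to watch for: equals is case-sensitive and treats "fox," or "Fox" as different from "fox", whereas the question's check uses toLowerCase().contains(...). A hedged sketch of a more forgiving comparison follows. The class and method names are made up, and stripping surrounding punctuation is an assumption about what counts as the same word.

```java
class PrecedingWord {
    /** Returns the token before the first occurrence of testWord, or null if there is none. */
    static String wordBefore(String line, String testWord) {
        String[] lineWords = line.trim().split("\\s+");
        // start at 1: a match at position 0 has no preceding word
        for (int i = 1; i < lineWords.length; i++) {
            // drop punctuation at the start and end of the token before comparing
            String token = lineWords[i].replaceAll("^\\W+|\\W+$", "");
            if (token.equalsIgnoreCase(testWord)) {
                return lineWords[i - 1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(wordBefore("The quick brown Fox, jumps over the lazy dog.", "fox")); // brown
    }
}
```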

How to split a file into several tokens

I am trying to tokenize an input file, splitting its sentences into tokens (words).
For example,
"This is a test file." into five words "this" "is" "a" "test" "file", omitting the punctuation and the white space, and storing them in an ArrayList.
I tried to write some code like this:
public static ArrayList<String> tokenizeFile(File in) throws IOException {
String strLine;
String[] tokens;
//create a new ArrayList to store tokens
ArrayList<String> tokenList = new ArrayList<String>();
if (null == in) {
return tokenList;
} else {
FileInputStream fStream = new FileInputStream(in);
DataInputStream dataIn = new DataInputStream(fStream);
BufferedReader br = new BufferedReader(new InputStreamReader(dataIn));
while (null != (strLine = br.readLine())) {
if (strLine.trim().length() != 0) {
//make sure strings are independent of capitalization and then tokenize them
strLine = strLine.toLowerCase();
//create regular expression pattern to split
//first letter to be alphabetic and the remaining characters to be alphanumeric or '
String pattern = "^[A-Za-z][A-Za-z0-9'-]*$";
tokens = strLine.split(pattern);
int tokenLen = tokens.length;
for (int i = 1; i <= tokenLen; i++) {
tokenList.add(tokens[i - 1]);
}
}
}
br.close();
dataIn.close();
}
return tokenList;
}
This code works fine, except that instead of splitting the whole file into words (tokens), it turns each whole line into one token. "area area" becomes a single token, instead of "area" appearing twice. I don't see the error in my code. Maybe something is wrong with my trim().
Any advice is appreciated. Thank you so much.
Maybe I should use Scanner instead? I'm confused.
I think Scanner is more appropriate for this task. As for this code, you should fix the regex: split takes the delimiter pattern, so try "\\s+".
Try String pattern = "[^\\w]"; as the pattern in the same code.
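The underlying issue is that String.split treats its argument as the delimiter, while the anchored pattern in the question describes a whole line rather than a separator. A minimal sketch of a tokenizer that splits on runs of characters that cannot be part of a word follows. Tokenizer and tokenize are illustrative names, and keeping the apostrophe inside tokens is an assumption taken from the comment in the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class Tokenizer {
    /** Lower-cases the line and splits it on runs of characters other than letters, digits and '. */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty()) { // a leading delimiter produces one empty token
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("This is a test file.")); // [this, is, a, test, file]
        System.out.println(tokenize("area area"));            // [area, area]
    }
}
```

The + after the character class matters: without it, two adjacent delimiters such as ". " would produce an empty token between them.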

sending all read lines to string array

I have a concept but I'm not sure how to go about it. I would like to parse a website and use a regex to find certain parts, then store these parts in strings. Afterwards I would like to do the same again and find the differences between before and after.
The plan:
1. Parse the website and add the lines the regex finds to the array before.
2. Refresh the website, parse it again, and add the lines the regex finds to the array after.
3. Compare all strings in before with all strings in after, and println any new ones.
4. Copy all after strings to before.
Then repeat from 2. forever.
Basically it's just checking a website for updated code and telling me what's updated.
Firstly, is this doable?
Here's my code for part 1.
String before[] = {};
int i = 0;
while ((line = br.readLine()) != null) {
Matcher m = p.matcher(line);
if (m.find()) {
before[i]=line;
System.out.println(before[i]);
i++;
}
}
It doesn't work and I am not sure why.
You could do something like this, assuming you're reading from a file:
List<String> oldLines = new ArrayList<String>();
List<String> newLines = new ArrayList<String>();
Scanner s = new Scanner(new File("oldLinesFilePath"));
while (s.hasNextLine()) {
    oldLines.add(s.nextLine());
}
s.close(); // close the first scanner before reusing the variable
s = new Scanner(new File("newLinesFilePath"));
while (s.hasNextLine()) {
    newLines.add(s.nextLine());
}
s.close();
for (int i = 0; i < newLines.size(); i++) {
    if (!oldLines.contains(newLines.get(i))) {
        System.out.println(newLines.get(i));
    }
}
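As for why the code in the question fails: String before[] = {} creates an array of length zero, so the first before[i] = line throws an ArrayIndexOutOfBoundsException. A List grows as needed. The comparison step can be sketched as below, with a HashSet so that each lookup does not scan the whole old list. LineDiff and newLines are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LineDiff {
    /** Returns the lines of after that do not occur in before, in their original order. */
    static List<String> newLines(List<String> before, List<String> after) {
        Set<String> seen = new HashSet<>(before);
        List<String> added = new ArrayList<>();
        for (String line : after) {
            if (!seen.contains(line)) {
                added.add(line);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList("a", "b");
        List<String> after = Arrays.asList("b", "c", "d");
        System.out.println(newLines(before, after)); // [c, d]
    }
}
```

After printing the result, assigning before = after prepares the next round of the loop described in the plan.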
