I have a concept but I'm not sure how to go about it. I would like to parse a website and use a regex to find certain parts, then store these parts in a string. Afterwards I would like to do the same again and find the differences between before and after.
The plan:
1. Parse/regex; add the lines found to the array before.
2. Refresh the website/parse/regex; add the lines found to the array after.
3. Compare all strings in before with all strings in after; println any new ones.
4. Send all after strings to before strings.
5. Then repeat from 2. forever.
Basically it's just checking a website for updated code and telling me what's updated.
Firstly, is this doable?
Here's my code for part 1.
String before[] = {};
int i = 0;
while ((line = br.readLine()) != null) {
Matcher m = p.matcher(line);
if (m.find()) {
before[i]=line;
System.out.println(before[i]);
i++;
}
}
It doesn't work and I am not sure why.
You could do something like this, assuming you're reading from a file:
List<String> oldLines = new ArrayList<String>();
List<String> newLines = new ArrayList<String>();
Scanner s = new Scanner(new File("oldLinesFilePath"));
while (s.hasNextLine()) {
    oldLines.add(s.nextLine());
}
s.close(); // close the first scanner before reusing the variable
s = new Scanner(new File("newLinesFilePath"));
while (s.hasNextLine()) {
    newLines.add(s.nextLine());
}
s.close();
// print every line that appears only in the new file
for (int i = 0; i < newLines.size(); i++) {
    if (!oldLines.contains(newLines.get(i))) {
        System.out.println(newLines.get(i));
    }
}
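The before/after plan from the question can be sketched with the same list-based idea. One likely reason the question's snippet fails is that `String before[] = {}` creates an array of length zero, so `before[i] = line` throws an ArrayIndexOutOfBoundsException on the first match; a List grows as needed. The class and method names below (`LineDiff`, `matchingLines`, `newLines`) and the sample pattern are assumptions for illustration, and fetching the page is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class LineDiff {
    // Keep only the lines that contain a match for the pattern.
    static List<String> matchingLines(List<String> lines, Pattern p) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            if (p.matcher(line).find()) {
                found.add(line);
            }
        }
        return found;
    }

    // Lines that occur in 'after' but not in 'before', in their original order.
    static List<String> newLines(List<String> before, List<String> after) {
        Set<String> seen = new HashSet<>(before);
        List<String> added = new ArrayList<>();
        for (String line : after) {
            if (!seen.contains(line)) {
                added.add(line);
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("href"); // hypothetical pattern
        List<String> before = matchingLines(
                Arrays.asList("<a href=\"/a\">", "<p>text</p>"), p);
        List<String> after = matchingLines(
                Arrays.asList("<a href=\"/a\">", "<a href=\"/b\">"), p);
        newLines(before, after).forEach(System.out::println); // prints <a href="/b">
        before = after; // the "after" lines become the next round's "before"
    }
}
```

Using a HashSet for the lookup avoids rescanning the whole old list for every new line, which matters if the loop runs forever on a large page.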
Related
I have a .txt file with multiple lines containing many emails such as Mike#sport.member.com, Laura#music.member.com, Quinn#music.member.com. How do I split them so I can add them to separate ArrayLists like music or sport?
Mike: sport
Laura: music
Quinn: music
Thanks so much.
You can use a regex:
Pattern pattern = Pattern.compile("(\\w+)#(\\w+)\\.(\\w+)\\.(\\w+)");
Matcher matcher = pattern.matcher("Laura#music.member.com");
if (matcher.find()) {
    matcher.group(1); // the name
    matcher.group(2); // the category (music or sport)
    matcher.group(3); // the company name
    matcher.group(4); // the domain, "com" in this case
}
Use these groups to fill the ArrayLists as you like.
And have a nice day :)
If you don't mind using ArrayUtils (from Apache Commons Lang) you can use this code:
String[] emails = new String[] {"Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com"};
int startLength;
String[][] split = new String[emails.length][];
for(int i = 0; i < emails.length; i++) {
split[i] = emails[i].split("#"); //split by #
startLength = split[i].length;
for(int j = 0; j < startLength; j++) {
split[i] = ArrayUtils.addAll(split[i], split[i][0].split("\\.")); //Split by . and add items to the end of the array
split[i] = ArrayUtils.remove(split[i], 0); //remove the first item of the array
}
}
System.out.println(Arrays.deepToString(split));
Output:
[[Mike, sport, member, com], [Laura, music, member, com], [Quinn, music, member, com]]
What this does is use .split("#") to split each email into an array around "#". It then goes through that array and splits each item with .split("\\.").
As it splits the elements the second time, it adds them to the end of the split array and removes the first item (which was the item it just split up).
Note that I had to introduce the startLength variable to keep track of the original size of the split array, because in the second loop its length is constantly changing.
This gives you a 2D array with all the items, but I won't blame you for not using this because it's a bit of a mess.
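If the goal is only to sort names into per-category lists, a plain `String.split` with a character class may be enough. This is a minimal sketch, assuming every entry has the form name#category.rest; `EmailGroups` and `groupByCategory` are hypothetical names, and a Map of lists stands in for the separate ArrayLists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EmailGroups {
    // Splits each entry on '#' and '.', then files the name (first token)
    // under the category (second token).
    static Map<String, List<String>> groupByCategory(List<String> emails) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String email : emails) {
            String[] parts = email.trim().split("[#.]");
            if (parts.length < 2) {
                continue; // skip entries without a category
            }
            groups.computeIfAbsent(parts[1], k -> new ArrayList<>()).add(parts[0]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> emails = Arrays.asList(
                "Mike#sport.member.com", "Laura#music.member.com", "Quinn#music.member.com");
        System.out.println(groupByCategory(emails)); // {sport=[Mike], music=[Laura, Quinn]}
    }
}
```

`groups.get("music")` then plays the role of the music list, and new categories need no extra code.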
There are two steps to solve this problem.
1. Read the lines from the file (a.txt).
public List<String> readFile() throws IOException {
    String fileName = "/Users/folder/a.txt";
    List<String> list = new ArrayList<>();
    // try-with-resources closes the reader and the underlying stream, even on error
    try (BufferedReader bufferedReader = new BufferedReader(
            new InputStreamReader(new FileInputStream(fileName)))) {
        String str;
        while ((str = bufferedReader.readLine()) != null) {
            list.add(str);
        }
    }
    return list;
}
2. Split each line and take out the data.
List<String> lines = readFile();
Pattern pattern = Pattern.compile("([a-zA-Z]+)#([a-zA-Z]+)");
for (String line : lines) {
Matcher matcher = pattern.matcher(line);
if (matcher.find()){
System.out.println(matcher.group(1) + ":" + matcher.group(2));
}
}
The result is
Mike:sport
Laura:music
Quinn:music
I'm trying to end up with a results.txt minus any matching items, having successfully compared some string inputs against another .txt file. Been staring at this code for way too long and I can't figure out why it isn't working. New to coding so would appreciate it if I could be steered in the right direction! Maybe I need a different approach? Apologies in advance for any loud tutting noises you may make. Using Java8.
//Sending a String[] into 'searchFile', contains around 8 small strings.
//Example of input: String[]{"name1","name2","name 3", "name 4.zip"}
^ This is my exclusions list.
public static void searchFile(String[] arr, String separator)
{
StringBuilder b = new StringBuilder();
for(int i = 0; i < arr.length; i++)
{
if(i != 0) b.append(separator);
b.append(arr[i]);
String findME = arr[i];
searchInfo(MyApp.getOptionsDir()+File.separator+"file-to-search.txt",findME);
}
}
^This works fine. I'm then sending the results to 'searchInfo' and trying to match and remove any duplicate (complete, not part) strings. This is where I am currently failing. Code runs but doesn't produce my desired output. It often finds part strings rather than complete ones. I think the 'results.txt' file is being overwritten each time...but I'm not sure tbh!
file-to-search.txt contains: "name2","name.zip","name 3.zip","name 4.zip" (text file is just a single line)
public static String searchInfo(String fileName, String findME)
{
StringBuffer sb = new StringBuffer();
try {
BufferedReader br = new BufferedReader(new FileReader(fileName));
String line = null;
while((line = br.readLine()) != null)
{
if(line.startsWith("\""+findME+"\""))
{
sb.append(line);
//tried various replace options with no joy
line = line.replaceFirst(findME+"?,", "");
//then goes off with results to create a txt file
FileHandling.createFile("results.txt",line);
}
}
} catch (Exception e) {
e.printStackTrace();
}
return sb.toString();
}
What I'm trying to end up with is a result file MINUS any matching complete strings (not part strings):
e.g. results.txt to end up with: "name.zip","name 3.zip"
OK, with the information I have, what you can do is this:
List<String> result = new ArrayList<>();
String content = FileUtils.readFileToString(file, "UTF-8");
for (String s : content.split(", ")) {
if (!s.equals(findME)) { // assuming both have string quotes added already
result.add(s);
}
}
FileUtils.write(newFile, String.join(", ", result), "UTF-8");
This uses FileUtils from Apache Commons IO for ease. You may add or remove the space after the comma as needed.
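The same filtering can be sketched with the standard library only, handling all exclusions in one pass so the result needs to be written just once. This assumes the file is a single line of comma-separated quoted names and that no name contains a comma; `ExclusionFilter` and `removeExcluded` are hypothetical names, and reading and writing the files is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ExclusionFilter {
    // Drops every quoted entry whose unquoted text equals an exclusion exactly.
    static String removeExcluded(String line, String[] exclusions) {
        Set<String> excluded = new HashSet<>(Arrays.asList(exclusions));
        List<String> kept = new ArrayList<>();
        for (String token : line.split(",")) {
            String entry = token.trim();
            String name = entry;
            // strip the surrounding quotes before comparing
            if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                name = name.substring(1, name.length() - 1);
            }
            if (!excluded.contains(name)) {
                kept.add(entry);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        String[] exclusions = {"name1", "name2", "name 3", "name 4.zip"};
        String line = "\"name2\",\"name.zip\",\"name 3.zip\",\"name 4.zip\"";
        System.out.println(removeExcluded(line, exclusions)); // "name.zip","name 3.zip"
    }
}
```

Comparing whole entries with `equals` semantics (via the set lookup) is what prevents "name 3" from matching "name 3.zip", which `startsWith` or a regex replace would do.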
So, I've found a word in a document and print the line in which the word is present like this:
Say the example file contains: "The quick brown fox jumps over the lazy dog.Jackdaws love my big sphinx of quartz."
FileInputStream fstream = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fstream));
String strLine;
//Read File Line By Line
while((strLine = br.readLine()) != null){
//check to see whether testWord occurs at least once in the line of text
check = strLine.toLowerCase().contains(testWord.toLowerCase());
if(check){
//get the line, and parse its words into a String array
String[] lineWords = strLine.split("\\s+");
for(int i=0;i<lineWords.length;i++){
System.out.print(lineWords[i]+ ' ');
}
If I search for 'fox', then lineWords[] will contain the tokens from the first sentence, and lineWords[3] = fox. To print the color of the fox, I need lineWords[2].
I was wondering how we can get the index 'i' of a token in lineWords[i], because I want the output to be lineWords[i-1].
You could use a HashMap which stores each word together with a list of its indices.
HashMap<String, List<Integer>> indices = new HashMap<>();
So in the for loop you fill the HashMap:
for (int i = 0; i < lineWords.length; i++) {
    String word = lineWords[i];
    if (!indices.containsKey(word)) {
        indices.put(word, new ArrayList<>());
    }
    indices.get(word).add(i);
}
To get all the indices of a specific word call:
List<Integer> indicesForWord = indices.get("fox"); // null if the word never occurs
And to get the i - 1 word call:
for (int i = 0; i < indicesForWord.size(); i++) {
    int index = indicesForWord.get(i) - 1;
    // index is -1 when the word is the first one in the line
    if (index >= 0) {
        System.out.println(lineWords[index]);
    }
}
If you are using Java 8, it is straightforward:
List<String> words = Files.lines(Paths.get("files/input.txt"))
.flatMap(line -> Arrays.stream(line.split("\\s+")))
.collect(Collectors.toList());
int index = words.indexOf("fox");
System.out.println(index);
if(index>0)
System.out.println(words.get(index-1));
This solution also works when the word you are searching for is the first word in a line. I hope it helps!
If you need to find all occurrences, you can use the indexOfAll method from this post.
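For all occurrences, a single pass over the word list is enough. The sketch below is not the indexOfAll method from the linked post; `PrecedingWords` and `wordsBefore` are hypothetical names, and the comparison ignores case because the question lower-cases both sides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PrecedingWords {
    // Returns the word in front of every occurrence of 'target'.
    // Starting at index 1 skips a match that has no preceding word.
    static List<String> wordsBefore(List<String> words, String target) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < words.size(); i++) {
            if (words.get(i).equalsIgnoreCase(target)) {
                result.add(words.get(i - 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "The quick brown fox jumps over the lazy dog".split("\\s+"));
        System.out.println(wordsBefore(words, "fox")); // [brown]
    }
}
```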
That can be done by traversing the array and, when you find your word, printing the one before it. Here's how:
if (lineWords[0].equals(testWord))
    return; // no preceding word
for (int i = 1; i < lineWords.length; i++) {
    if (lineWords[i].equals(testWord)) {
        System.out.println(lineWords[i - 1]);
        break;
    }
}
I got this code here:
try{
FileReader file = new FileReader("/Users/Tda/desktop/ReadFiles/tentares.txt");
BufferedReader br = new BufferedReader(file);
String line = null;
while((line = br.readLine()) != null){
String[] values = line.split(",");
grp1 = new int[values.length];
for(int i=0; i<grp1.length; i++){
try {
grp1[i]= Integer.parseInt(values[i]);
}catch (NumberFormatException e) {
continue;
}
}
System.out.println(Arrays.toString(grp1));
}
System.out.println("");
br.close();
}catch(IOException e){
System.out.println(e);
}
This is what the file I'm reading contains:
grp1:80,82,91,100,76,65,85,88,97,55,69,88,75,97,81
grp2:72,89,86,85,99,47,79,88,100,76,83,94,84,82,93
Right now I'm storing the values in one int array.
But what if I wanted to store each line of values in its own array, giving two arrays?
I thought about using Arrays.copyOfRange somehow to copy the values from the int array into two new arrays.
This answer doesn't address your question directly, but it expands on my comment under your question.
Try this at the beginning of your while loop:
Use String.indexOf() to find the first occurrence of the character ':' in each line. The position right after it is the beginning index of the second part.
Call String.substring() from that beginning index to get the rest of the line. This gives you the line without the leading label, so your first number isn't lost.
Before the while
List<int[]> groups = new ArrayList<>();
Before the end of the loop:
groups.add(grp1);
Afterwards:
for (int[] grp : groups) {
...
}
A List is useful for a growing "array". The List calls correspond to the array operations as follows:
groups.size()       grp1.length
groups.get(3)       grp1[3]
groups.set(3, x)    grp1[3] = x
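Putting the two suggestions together, each line can be cut at the colon and stored as its own int[] in a List. This is a minimal sketch, assuming every non-empty line looks like label:n,n,n; `GroupParser`, `parseLine` and `parseAll` are hypothetical names, and the sample lines are shortened versions of the ones in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupParser {
    // Parses one line such as "grp1:80,82,91" into its numbers.
    static int[] parseLine(String line) {
        // indexOf returns -1 when there is no ':', so substring(0) keeps the whole line
        String numbers = line.substring(line.indexOf(':') + 1);
        String[] values = numbers.split(",");
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            // throws NumberFormatException if a value is not a number
            result[i] = Integer.parseInt(values[i].trim());
        }
        return result;
    }

    // One int[] per non-empty line.
    static List<int[]> parseAll(List<String> lines) {
        List<int[]> groups = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                groups.add(parseLine(line));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<int[]> groups = parseAll(Arrays.asList("grp1:80,82,91", "grp2:72,89,86"));
        for (int[] grp : groups) {
            System.out.println(Arrays.toString(grp));
        }
        // prints [80, 82, 91] then [72, 89, 86]
    }
}
```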
I want to remove stop words in Java.
So I read the stop words from a text file and store them in a Set:
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader br = new BufferedReader(new FileReader("stopwords.txt"));
String words = null;
while( (words = br.readLine()) != null) {
stopWords.add(words.trim());
}
br.close();
Then I read another text file.
So I want to remove the duplicate strings from that text file.
How can I do this?
Use a Set for the stop words:
Set<String> stopWords = new LinkedHashSet<String>();
BufferedReader SW = new BufferedReader(new FileReader("StopWord.txt"));
for (String line; (line = SW.readLine()) != null;)
    stopWords.add(line.trim());
SW.close();
and an ArrayList for the input text file:
BufferedReader br = new BufferedReader(new FileReader("txt_file.txt"));
// build your ArrayList of words here
// deletStopWord() removes every word that is listed in "StopWord.txt"
public ArrayList<String> deletStopWord(Set<String> stopWords, ArrayList<String> arraylist) {
    ArrayList<String> newList = new ArrayList<String>();
    for (String word : arraylist) {
        // keep only the words that are not stop words
        if (!stopWords.contains(word)) {
            newList.add(word);
        }
    }
    System.out.println(newList);
    return newList;
}
arraylist = deletStopWord(stopWords, arraylist);
If you want to remove duplicate words from a file, below is the high-level logic:
1. Read the file.
2. Loop through the file content (i.e. one line at a time).
3. Use a string tokenizer to split that line on spaces.
4. Add each token to your set. This makes sure that you have only one entry per word.
5. Close the file.
Now you have a set that contains all the unique words of the file.
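The steps above can be sketched as follows, assuming the lines have already been read from the file. `UniqueWords` and `uniqueWords` are hypothetical names; a LinkedHashSet is used so the words keep their first-seen order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class UniqueWords {
    // Collects every distinct word, in order of first appearance.
    static Set<String> uniqueWords(List<String> lines) {
        Set<String> unique = new LinkedHashSet<>();
        for (String line : lines) {
            // with no delimiter argument, StringTokenizer splits on whitespace
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                unique.add(tokens.nextToken()); // a Set ignores repeated words
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        System.out.println(uniqueWords(Arrays.asList("to be or", "not to be")));
        // prints [to, be, or, not]
    }
}
```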
Using an ArrayList may be easier.
public ArrayList<String> removeDuplicates(ArrayList<String> source) {
    ArrayList<String> newList = new ArrayList<String>();
    for (int i = 0; i < source.size(); i++) {
        String s = source.get(i);
        // add each word only the first time it is seen
        if (!newList.contains(s)) {
            newList.add(s);
        }
    }
    return newList;
}
Hope this helps.
If you simply want to remove a certain set of words from the words in a file, you can do it however you want. But if you are dealing with a problem involving natural language processing, you should use a library.
For example, using Lucene for tokenizing will seem more complicated at first, but it will deal with myriad complications that you will overlook, and allow for great flexibility should you change your mind on the specific stopwords, on how you are tokenizing, whether you care about case, etc.
You should try using StringTokenizer.
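A minimal sketch of that suggestion, assuming the stop words are stored in lower case and that words are separated by whitespace. `StopWordFilter` and `removeStopWords` are hypothetical names; punctuation attached to a word is not handled here, which is one of the complications a library such as Lucene would take care of.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringJoiner;
import java.util.StringTokenizer;

class StopWordFilter {
    // Returns the line with every stop word removed; the match ignores case
    // because each token is lower-cased before the lookup.
    static String removeStopWords(String line, Set<String> stopWords) {
        StringTokenizer tokens = new StringTokenizer(line);
        StringJoiner kept = new StringJoiner(" ");
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            if (!stopWords.contains(word.toLowerCase())) {
                kept.add(word);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "over"));
        System.out.println(removeStopWords(
                "The quick brown fox jumps over the lazy dog", stopWords));
        // prints quick brown fox jumps lazy dog
    }
}
```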