This is kind of a followup to my other question simple Java Regex read between two
Now my code looks like this. I am reading the contents of a file, scanning for whatever between src and -t1. Running this code will return 1 correct link but the source file contains 10 and I can't figure out the loop. I thought another way might be to write to a second file on disk and remove the first link from the original source but I can't code that either:
File workfile = new File("page.txt");
BufferedReader br = new BufferedReader(new FileReader(workfile));
String line;
while ((line = br.readLine()) != null) {
//System.out.println(line);
String url = line.split("<img src=")[1].split("-t1")[0];
System.out.println(url);
}
br.close();
I think you want something like
import java.util.regex.*;
Pattern urlPattern = Pattern.compile("<img src=(.*?)-t1");
while ((line = br.readLine()) != null) {
Matcher m = urlPattern.matcher (line);
while (m.find()) {
System.out.println(m.group(1));
}
}
The regular expression looks for strings beginning with <img src= and ending with -t1 (and looks for the shortest substrings possible, so that more than one can be found in the line). The part in parentheses is a "capture group" to capture the text that gets matched; this is called group 1. Then, for each line, we loop on find() to find all occurrences in each line. Each time we find one, we print what's in group 1.
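As a self-contained illustration of the find() loop described above, here is a minimal sketch. The class name ImgLinkExtractor, the helper extractAll, and the sample line are made up for the example; only the pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgLinkExtractor {
    // Compiled once; the reluctant (.*?) stops at the first "-t1" after each "<img src=".
    private static final Pattern URL_PATTERN = Pattern.compile("<img src=(.*?)-t1");

    /** Returns every group-1 capture found in the given line, in order of appearance. */
    static List<String> extractAll(String line) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(line);
        while (m.find()) {          // each call resumes after the previous match
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample line containing two image tags.
        String line = "<img src=http://example.com/a-t1.jpg> <img src=http://example.com/b-t1.jpg>";
        for (String url : extractAll(line)) {
            System.out.println(url);
        }
    }
}
```

The same loop works unchanged inside the readLine() loop from the question, since a fresh Matcher is created for every line.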
I am trying to find strings from one text file that are present in another. I have 2 text files, file1.txt and file2.txt the contents of which are as below :
file1.txt
Hello
Second Line
Text line
Final Line
file2.txt
Final Linee
Text llline
line 3 of file2
Helloo
The code I have is as below :
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {
public static void main (String[] args) throws IOException{
BufferedReader inputFile= new BufferedReader(new FileReader("file1.txt"));
String line;
String pattern;
while((line = inputFile.readLine()) != null){
System.out.println(line);
// re-open file2.txt for every line of file1.txt
BufferedReader patternsFile = new BufferedReader(new FileReader("file2.txt"));
while ((pattern = patternsFile.readLine()) != null){
Pattern r = Pattern.compile(pattern);
System.out.println(r);
Matcher m = r.matcher(line);
if (m.find()){
System.out.println("Line corresponding to pattern in file1.txt : " + line);
}
}
patternsFile.close();
}
inputFile.close();
}
}
However, the above code returns only the lines from file1.txt that match some pattern from file2.txt exactly. What I want instead is to find the closest strings within an edit distance of n letters. So for example, if n=1, then the output should be:
Hello
Final Line
and if n=2 then it should output
Hello
Final Line
Text line
I am starting out with Java, and have absolutely no experience with it. Therefore any and all help would be appreciated.
Thank you
Okay, I can give two tips.
First of all, you may want to look at Apache Lucene if you are writing a text analyser or something similar, or if you need strong matching features.
Secondly, if you are looking for something more "minimal", you can implement a Cosine Similarity algorithm, which is really interesting and worth a look.
Then you can re-implement it and adapt it for your code.
You can find an implementation in Apache Commons Text.
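Since the question asks specifically for an edit distance of n letters, a Levenshtein distance may be a closer fit than cosine similarity. The following is only a sketch of that alternative technique, not something the answer above proposes; the class name EditDistanceMatch and the helper closeMatches are hypothetical, and the sample data is copied from the question. Apache Commons Text also provides a LevenshteinDistance class if you would rather not write your own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EditDistanceMatch {
    /** Dynamic-programming Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the lines that are within maxDistance edits of at least one candidate. */
    static List<String> closeMatches(List<String> lines, List<String> candidates, int maxDistance) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            for (String candidate : candidates) {
                if (levenshtein(line, candidate) <= maxDistance) {
                    result.add(line);
                    break; // one close candidate is enough
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> file1 = Arrays.asList("Hello", "Second Line", "Text line", "Final Line");
        List<String> file2 = Arrays.asList("Final Linee", "Text llline", "line 3 of file2", "Helloo");
        System.out.println(closeMatches(file1, file2, 1));
        System.out.println(closeMatches(file1, file2, 2));
    }
}
```

The results come back in file1.txt order. To work on the real files, the two lists could be filled with the readLine() loops from the question instead of the hard-coded values.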
I am parsing a csv file and the file will be somewhat like this
07-Jan-2016
It is better to lead from behind and to put others in front, especially when you celebrate victory when nice things occur. You take the front line when there is danger. Then people will appreciate your leadership.
The main thing that you have to remember on this journey is, just be nice to everyone and always smile.
Some Other third Quote
and some all content here goes on
---------
----------
-----------
My question is: how can I stop parsing the file once I reach the particular line "Some Other third Quote"?
I am reading the csv file as shown below
String csvFile = "ip/ASRER070116.csv";
String line;
// try-with-resources closes the reader automatically
try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
while ((line = br.readLine()) != null) {
System.out.println(line);
}
} catch (IOException e) {
e.printStackTrace();
}
Could you please tell me how to resolve this?
You can check every line for a substring and break out of the loop when a particular condition is met:
//inside the loop
if (line.contains("some_string"))
break;
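Put together, the check-and-break idea might look like the sketch below. The class name ReadUntilMarker and the helper readUntil are invented for the example, and it reads from a StringReader so that it runs without the CSV file; a FileReader can be passed in the same way. It assumes the marker line itself should not be processed either.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadUntilMarker {
    /** Collects lines until one contains the marker; the marker line and everything after it are skipped. */
    static List<String> readUntil(Reader source, String marker) {
        List<String> kept = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.contains(marker)) {
                    break; // stop reading as soon as the marker shows up
                }
                kept.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the file contents shown in the question.
        String text = "07-Jan-2016\nFirst quote\nSecond quote\nSome Other third Quote\nand some all content here goes on";
        for (String line : readUntil(new StringReader(text), "Some Other third Quote")) {
            System.out.println(line);
        }
    }
}
```

If the marker line should still be printed, move the kept.add(line) call above the contains() check.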
I am using Selenium to test web-pages and want to make a simpler way to update the test-cases (not important for the problem).
I loop through lines now with this:
driver.get("http://vg.no"); //open the web page
try {
BufferedReader reader = new BufferedReader(new FileReader("//Users//file.txt"));
try {
String line = null;
while ((line = reader.readLine()) != null) {
driver.findElement(By.cssSelector(line)).click(); // find and click the element specified on each line of the document
}
} finally {
reader.close();
}
} catch (IOException ioe) {
System.err.println("oops " + ioe.getMessage());
}
Textfile content example now:
a[href*='//nyheter//meninger//']
img[class*='logo-red']
img[class*='article-image']
I want to rebuild it into a solution that starts different commands based on regular expressions.
I am trying to get it to work this way:
vg.no //this will start driver.get("vg.no")
img[class*='logo-red'] //this will start driver.findElement(By.cssSelector("img[class*='logo-red']")).click()
img[class*='article-image']
ItAvisen.no
img[class*='article-image']
img[class*='article-image']
Is there a way I can use regex to start different parts of the code based on the content of the text file, and use part of that content as variables?
It works this way after feedback from cvester:
Finding matches for img[class*='logo-red']
String regexp = "img\\[class\\*=\\'*\\'(.*)\\]";
boolean match = line.matches(regexp);
Will it still be line based?
In that case you can just read line by line and use the String.matches(String regex) for each case you identify.
If you can provide more specific information, I might be able to give you a better solution.
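A line-by-line dispatch along those lines could be sketched as follows. This is only an illustration: the class name LineDispatcher, the method describe, and both patterns are assumptions about what the text file contains, and the method returns a description of the command instead of calling Selenium so that it runs on its own.

```java
import java.util.regex.Pattern;

public class LineDispatcher {
    // Assumption: a bare host name such as "vg.no" means "open this site".
    private static final Pattern SITE =
            Pattern.compile("[A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)+");
    // Assumption: a line shaped like tag[attr*='value'] is a CSS selector to click.
    private static final Pattern SELECTOR =
            Pattern.compile("\\w+\\[[\\w-]+\\*='[^']*'\\]");

    /** Decides which command a line stands for; matches() requires the whole line to fit the pattern. */
    static String describe(String line) {
        String trimmed = line.trim();
        if (SITE.matcher(trimmed).matches()) {
            return "GET http://" + trimmed;   // here you would call driver.get(...)
        }
        if (SELECTOR.matcher(trimmed).matches()) {
            return "CLICK " + trimmed;        // here you would call driver.findElement(By.cssSelector(...)).click()
        }
        return "SKIP " + trimmed;             // unrecognised line
    }

    public static void main(String[] args) {
        String[] lines = {"vg.no", "img[class*='logo-red']", "ItAvisen.no", "img[class*='article-image']"};
        for (String line : lines) {
            System.out.println(describe(line));
        }
    }
}
```

Keeping one compiled Pattern per command type makes it easy to add further cases later, for example a pattern for lines that should type text into a field.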
How can I read lines from a file using Java without losing the tabs and spaces at the beginning (the indentation)? I need this to read source code and then print it out.
I am doing it like this:
br = new BufferedReader(new FileReader(filePath));
String line = null;
while ((line = br.readLine()) != null) {
aList.add(line);
}
(of course with try catch blocks)
Thank you!
It looks like your aList, presumably a JList, is dropping the tab character during rendering.
One solution is to replace your tabs with spaces:
aList.add(line.replaceAll("\t", " "));
Yet another solution is to write your own ListCellRenderer using a JTextPane, although this is not without its pitfalls.
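To confirm that the reading code is not what loses the indentation, a small check like the one below can help. It is a sketch with made-up names (IndentCheck, readLines) and reads from a StringReader instead of a file; BufferedReader.readLine() only strips the line terminator, so leading tabs and spaces should come through untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class IndentCheck {
    /** Reads all lines exactly as readLine() returns them. */
    static List<String> readLines(Reader source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical source snippet: one tab-indented line, one space-indented line.
        String code = "\tint x = 1;\n    return x;";
        for (String line : readLines(new StringReader(code))) {
            // Make tabs visible so the indentation can be inspected in the console.
            System.out.println("[" + line.replace("\t", "\\t") + "]");
        }
    }
}
```

If the brackets show the indentation intact, the whitespace is being dropped later, when the lines are displayed.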
I have a bunch of lines in a text file and I want to match ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried this expression ${(.+)} but it didn't work and I also don't know how to do the replace regex in java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the code above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} or ${VARIABLE_NAME2} with ${VARIABLE_NAME_SOMETHING} or ${VARIABLE_NAME2_SOMETHING}, but leave the preceding and following text on the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this and then replace it, but the regex does not work with contains(); it does work with the Pattern/Matcher combination, though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I meant by $1 is: take the string you just matched and replace it with itself + _SUFFIX.
I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$([a-zA-Z0-9]+)", "($1) CONSTANT");
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+?)\\}"); // compile once, outside the loop; the group captures the variable name so that $1 is defined
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
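For reference, a runnable version of the Pattern and Matcher approach might look like this. It is a sketch, not the only way to do it: the class name VariableSuffixer and the method addSuffix are hypothetical, and the character class [^}]+ is an assumption that variable names never contain a closing brace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariableSuffixer {
    // Compiled once; group 1 captures everything between "${" and the next "}".
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Appends the suffix inside every ${...} occurrence and leaves the rest of the line alone. */
    static String addSuffix(String line, String suffix) {
        // In the replacement, "\\$" and "\\{" are literal characters and $1 is the captured name.
        String replacement = "\\$\\{$1" + Matcher.quoteReplacement(suffix) + "\\}";
        return VAR.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String[] lines = {
            "some text before ${VARIABLE_NAME} some text after",
            "some text before ${VARIABLE_NAME2} some text after",
            "some text before some text without variable some text after"
        };
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(addSuffix(line, "_SOMETHING")).append("\n");
        }
        System.out.print(sb);
    }
}
```

Lines without a variable pass through unchanged, so no separate check is needed before calling addSuffix.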
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
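Both points can be checked with a short sketch. The class name WholeMatchSuffix and the method addSuffix are made up for the example; the replaceAll call is the one shown above, where $0 stands for the whole match.

```java
public class WholeMatchSuffix {
    /** Inserts _SOMETHING right after the variable name; the closing brace is not part of the match. */
    static String addSuffix(String line) {
        return line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
    }

    public static void main(String[] args) {
        String withVar = "some text before ${VARIABLE_NAME} some text after";
        String withoutVar = "some text before some text without variable some text after";

        System.out.println(addSuffix(withVar));
        System.out.println(addSuffix(withoutVar)); // no match, so the line is returned as it was

        // contains() looks for the literal characters of its argument, not for a regex match.
        System.out.println(withVar.contains("\\$\\{.+\\}"));
        System.out.println(withVar.contains("${"));
    }
}
```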