Is it possible to match the contents of an array against the contents of a file without using regex?
Please reply.
Suppose I have a text file containing these sentences:
the sql is the best book for jon.
book sql is the best title for jon.
the html for author asr.
book java for famous writer amr.
and I have stored these strings in an array:
sql html java
jon asr amr
I want to search the file for the contents of the array. For example, if "sql" and "jon" occur in the same sentence of the text file, then print the sentence, and
print all words before "sql" as the prefix, all words between "sql" and "jon" as the middle, and all words after "jon" as the suffix.
I tried to write this code:
String book[][] = {{"sql","html","java"},{"jon","asr","amr"}};
String input;
try (BufferedReader br = new BufferedReader(new FileReader(new File("sample.txt")))) {
    // Read the next line in the loop condition; reading once before the loop never terminates
    while ((input = br.readLine()) != null)
    {
        boolean matched = false;
        for (int i = 0; i < book[0].length; i++) {
            // The line must contain the keyword, not the keyword the line
            if (input.contains(book[0][i]) && input.contains(book[1][i])) {
                System.out.println("the sentence is : " + input);
                matched = true;
            }
        }
        if (!matched)
            System.out.println("not match");
    }
} catch (Exception e) {
    e.printStackTrace();
}
I don't know how to write the code to extract the prefix, middle and suffix.
The expected output is:
the sentence is : the sql is the best book for jon.
prefix is :the
middle is:is the best book for
suffix is: null
and so on...
You should use Pattern class for that. http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Tutorial http://docs.oracle.com/javase/tutorial/essential/regex/
Sorry, I'm not going to write the exact code.
The pattern will look like
"(.*)(?:sql|html|java)(.*)(?:jon|asr|amr)(.*)"
Then, in Matcher you will find your prefix, middle and suffix as matcher.group(1), matcher.group(2) and matcher.group(3).
Here is the code you need:
String line = "the sql is the best book for jon.";
String regex = "(.*)(sql|html|java)(.*)(jon|asr|amr)(.*)";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(line);
if (matcher.find()) { // group() throws IllegalStateException when nothing matched
    String prefix = matcher.group(1);
    String firstMatch = matcher.group(2);
    String middle = matcher.group(3);
    String secondMatch = matcher.group(4);
    String suffix = matcher.group(5);
}
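Since the question asks for a solution without regex, here is a minimal sketch based on String.indexOf and substring. The class name SentenceSplitter and the method split are made up for illustration. It assumes the first word must appear before the second one, and it matches plain substrings, so "sql" would also be found inside a longer word such as "mysql".

```java
class SentenceSplitter {
    // Returns {prefix, middle, suffix}, or null when the two words
    // do not both occur in the sentence in this order.
    static String[] split(String sentence, String first, String second) {
        int firstStart = sentence.indexOf(first);
        if (firstStart < 0) {
            return null;
        }
        int firstEnd = firstStart + first.length();
        // Search for the second word only after the end of the first one
        int secondStart = sentence.indexOf(second, firstEnd);
        if (secondStart < 0) {
            return null;
        }
        int secondEnd = secondStart + second.length();
        return new String[] {
            sentence.substring(0, firstStart).trim(),
            sentence.substring(firstEnd, secondStart).trim(),
            sentence.substring(secondEnd).trim()
        };
    }

    public static void main(String[] args) {
        String[][] book = {{"sql", "html", "java"}, {"jon", "asr", "amr"}};
        String[] lines = {
            "the sql is the best book for jon.",
            "the html for author asr."
        };
        for (String line : lines) {
            for (int i = 0; i < book[0].length; i++) {
                String[] parts = split(line, book[0][i], book[1][i]);
                if (parts != null) {
                    System.out.println("the sentence is : " + line);
                    System.out.println("prefix is : " + parts[0]);
                    System.out.println("middle is : " + parts[1]);
                    System.out.println("suffix is : " + parts[2]);
                }
            }
        }
    }
}
```

Note that with the sample sentences the suffix is the trailing "." rather than null, because the period follows the name.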
I'm working on a program that will find every occurrence of a texture name in a file and then store its name in a List. So let's say that I have a text file with a string that looks like this:
{"sampletext":"urlDistortion":"textures/Noises/FX_GaussianNoise_10x.png","sample text sample text "url":"textures/shader/shader_test/FX_Noise_Wispy_Dense.png","urlGradient":"textures/gradients/sparks.png","blendMode": "sample text","urlMask":"textures/shader_test/FX_Radial_Grad.png"}
Every texture name starts with "textures, followed by its location, and ends with a quotation mark, for example "textures/gradients/sparks.png".
Now I want to extract the file name and store it in a list, so from the first occurrence, which is "textures/Noises/FX_GaussianNoise_10x.png", I'll get just the "FX_GaussianNoise_10x.png" part. I came up with the idea of creating a pattern that finds "textures", skips the location and somehow copies the remaining file name part.
try {
File file;
Pattern p = Pattern.compile("textures/");
List <String> textureNames = new ArrayList<>();
for (File f : list) {
file= f.getAbsoluteFile();
Scanner scanner = new Scanner(new FileReader(file));
while (scanner.hasNextLine()) {
String line = scanner.nextLine();
Matcher matcher = p.matcher(line);
while (matcher.find()){
// here I would like to add this texture name to my list
// and continue searching for the next occurrence
}
System.out.println("found " +count);
}
}
}
I know that there is a start() method in a Matcher class, which returns the start index of the previous match so i could do something like this inside the while loop
String s = line.substring(matcher.start(),)
and then just add this to the list, but I don't know how I could specify the endIndex.
If anyone knows how I can do it, or if there is a better way to achieve this, I'll be grateful for help.
Here is a simple test case that shows an idea for your inner loop:
@Test
public void testParse() {
String line = "{\"sampletext\":\"urlDistortion\":\"textures/Noises/FX_GaussianNoise_10x.png\",\"sample text sample text \"url\":\"textures/shader/shader_test/FX_Noise_Wispy_Dense.png\",\"urlGradient\":\"textures/gradients/sparks.png\",\"blendMode\": \"sample text\",\"urlMask\":\"textures/shader_test/FX_Radial_Grad.png\"}";
Pattern p = Pattern.compile("\"(textures/[^\\\"]*)\"");
Matcher m = p.matcher(line);
while (m.find()) {
System.out.println("found " + m.group(1));
}
}
It produces this output:
found textures/Noises/FX_GaussianNoise_10x.png
found textures/shader/shader_test/FX_Noise_Wispy_Dense.png
found textures/gradients/sparks.png
found textures/shader_test/FX_Radial_Grad.png
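The output above still contains the directory part, while the question asks for the bare file name. One possible way to get it, sketched here with a hypothetical helper class TextureNames, is to capture the path after "textures/" and keep only what follows the last slash:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TextureNames {
    // Captures everything between "textures/ and the closing quotation mark
    private static final Pattern TEXTURE = Pattern.compile("\"textures/([^\"]*)\"");

    static List<String> fileNames(String line) {
        List<String> names = new ArrayList<>();
        Matcher m = TEXTURE.matcher(line);
        while (m.find()) {
            String path = m.group(1);
            // lastIndexOf returns -1 when there is no slash, so the whole path is kept
            names.add(path.substring(path.lastIndexOf('/') + 1));
        }
        return names;
    }

    public static void main(String[] args) {
        String line = "\"urlGradient\":\"textures/gradients/sparks.png\","
                + "\"urlMask\":\"textures/shader_test/FX_Radial_Grad.png\"";
        System.out.println(fileNames(line));
    }
}
```

The returned list can be appended to textureNames from the question with addAll.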
I would love to scrape the titles of the top 250 movies (https://www.imdb.com/chart/top/) for educational purposes.
I have tried a lot of things but I messed up at the end every time. Could you please help me scrape the titles with Java and regex?
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class scraping {
public static void main (String args[]) {
try {
URL URL1=new URL("https://www.imdb.com/chart/top/");
URLConnection URL1c=URL1.openConnection();
BufferedReader br=new BufferedReader(new
InputStreamReader(URL1c.getInputStream(),"ISO8859_7"));
String line;int lineCount=0;
Pattern pattern = Pattern.compile("<td\\s+class=\"titleColumn\"[^>]*>"+ ".*?</a>");
Matcher matcher = pattern.matcher(br.readLine());
while(matcher.find()){
System.out.println(matcher.group());
}
} catch (Exception e) {
System.out.println("Exception: " + e.getClass() + ", Details: " + e.getMessage());
}
}
}
Thank you for your time.
Parsing Mode
To parse XML or HTML content, a dedicated parser will always be easier than a regex. For HTML in Java there is Jsoup, which gets you the films very easily:
Document doc = Jsoup.connect("https://www.imdb.com/chart/top/").get();
Elements films = doc.select("td.titleColumn");
for (Element film : films) {
System.out.println(film);
}
<td class="titleColumn"> 1. Les évadés <span class="secondaryInfo">(1994)</span> </td>
<td class="titleColumn"> 2. Le parrain <span class="secondaryInfo">(1972)</span> </td>
To get the content only :
for (Element film : films) {
System.out.println(film.getElementsByTag("a").text());
}
Les évadés
Le parrain
Le parrain, 2ème partie
Regex Mode
You were not reading the whole content of the website. Also, it is XML-like markup, so not everything is on the same line and you can't find the beginning and the end of a tag on the same line. You can read everything first and then apply the regex, which gives something like this:
URL url = new URL("https://www.imdb.com/chart/top/");
InputStream is = url.openStream();
StringBuilder sb = new StringBuilder();
try (BufferedReader br = new BufferedReader(new InputStreamReader(is))) {
String line;
while ((line = br.readLine()) != null) {
sb.append(line);
}
} catch (MalformedURLException e) {
e.printStackTrace();
throw new MalformedURLException("URL is malformed!!");
} catch (IOException e) {
e.printStackTrace();
throw new IOException();
}
// Full line
Pattern pattern = Pattern.compile("<td class=\"titleColumn\">.*?</td>");
String content = sb.toString();
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
System.out.println(matcher.group());
}
// Title only
Pattern pattern = Pattern.compile("<td class=\"titleColumn\">.+?<a href=.+?>(.+?)</a>.+?</td>");
String content = sb.toString();
Matcher matcher = pattern.matcher(content);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
As the existing answer says, Jsoup or another HTML parser should be used for the sake of correctness.
I will only complete your current solution, in case you want to use a similar approach for a more reasonable use case. It cannot work, because you read only the first line from the buffer:
Matcher matcher = pattern.matcher(br.readLine());
The regex pattern is also wrong for this approach: your solution is built to read line by line and test each single line against the regex, but the source of the website shows that the content of a table row is spread across multiple lines.
A solution based on reading one line at a time should use a much simpler regex (I am sorry, the example contains movie names in my native language):
\" ?>([^<]+)<\/a>
An example of working code is:
try {
URL URL1=new URL("https://www.imdb.com/chart/top/");
URLConnection URL1c=URL1.openConnection();
BufferedReader br=new BufferedReader(new
InputStreamReader(URL1c.getInputStream(),"ISO8859_7"));
String line;int lineCount=0;
Pattern pattern = Pattern.compile("\" ?>([^<]+)<\\/a>"); // Compiled once
br.lines() // Stream<String>
.map(pattern::matcher) // Stream<Matcher>
.filter(Matcher::find) // Stream<Matcher> .. if Regex matches
.limit(250) // Stream<Matcher> .. to avoid possible mess below
.map(m -> m.group(1)) // Stream<String> .. captured movie name
.forEach(System.out::println); // Printed out
} catch (Exception e) {
System.out.println("Exception: " + e.getClass() + ", Details: " + e.getMessage());
}
Note the following:
Regex is not suitable for this. Use a library built for this use-case.
My solution is a working example, but the performance is poor (Stream API, regex pattern matching of each line)...
A solution like this is not guaranteed to avoid a mess: the regex can capture more than intended.
The website content, CSS class names etc. might change in the future.
I'm trying to get the result of a match with two lines and more, this is my text in a file (for JOURNAL ENTRIES for Wincor ATM):
DEMANDE SOLDE
N° CARTE : 1500000001180006
OPERATION NO. : 585068
========================================
RETRAIT
N° CARTE 1600001002200006
OPERATION NO. : 585302
MONTANT : MAD 200.00
========================================
... etc.
There are more lines, repeated for each operation: retrait (ATM withdrawal), demande de solde (balance inquiry). I want to get a result like: RETRAIT\nN° CARTE 1600001002200006
My java code:
String filename="20140604.jrn";
File file=new File(filename);
String regexe = ".*RETRAIT^\r\n.*CARTE.*\\d{16}"; // Work with .*CARTE.*\\d{16}: result: N° CARTE : 1500000001180006 N° CARTE 1600001002200006
Pattern pattern = Pattern.compile(regexe,Pattern.MULTILINE);
try {
BufferedReader in = new BufferedReader(new FileReader(file));
while (in.ready()) {
String s = in.readLine();
Matcher matcher = pattern.matcher(s);
while (matcher.find()) { // find the next match
System.out.println("found the pattern \"" + matcher.group());
}
}
in.close();
}
catch(IOException e) {
System.out.println("File 20140604.jrn not found");
}
Any Solution Please ?
I am unable to test this right now, but it looks like you have the boundary special character '^' in the wrong spot. It is trying to match RETRAIT followed by the beginning of a line followed by newline characters, when the beginning of the line won't start until after the newline characters.
UPDATE:
With an online java regex tool, I've been able to test this:
^RETRAIT\s*\w+.*CARTE\s+\d{16}
which matches what you want in multiline mode. The \s special character consumes whitespace (including carriage return and new line), which is more resilient than checking explicitly for \n or \r.
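One more thing to watch: the code in the question feeds the matcher one line at a time, so a pattern that spans two lines can never match there. A minimal sketch of the alternative, assuming the whole journal has first been read into a single String (for example with Files.readAllBytes), could look like this. The class name JournalMatcher and the sample text are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JournalMatcher {
    // MULTILINE makes ^ match at the start of every line, not only at the start of the input
    private static final Pattern RETRAIT =
            Pattern.compile("^RETRAIT\\s*\\w+.*CARTE\\s+\\d{16}", Pattern.MULTILINE);

    static List<String> findWithdrawals(String content) {
        List<String> found = new ArrayList<>();
        Matcher matcher = RETRAIT.matcher(content);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample content; a real program would load the .jrn file into this String
        String content = "DEMANDE SOLDE\n"
                + "N\u00B0 CARTE : 1500000001180006\n"
                + "OPERATION NO. : 585068\n"
                + "========================================\n"
                + "RETRAIT\n"
                + "N\u00B0 CARTE 1600001002200006\n"
                + "OPERATION NO. : 585302\n"
                + "MONTANT : MAD 200.00\n";
        for (String match : findWithdrawals(content)) {
            System.out.println("found the pattern \"" + match + "\"");
        }
    }
}
```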
I have a text file that contains meta-urls in the following form:
http://www.xyz.com/.*services/
http://www.xyz.com/.*/wireless
I want to compare all the patterns from that file with my URL, and execute an action if I find a match. This matching process is hard to understand for me.
Assuming splitarray[0] contains the first line of text file:
String url = page.getWebURL().getURL();
URL url1 = new URL(url);
how can we compare url1 with splitarray[0]?
UPDATED
BufferedReader readbuffer = null;
try {
readbuffer = new BufferedReader(new FileReader("filters.txt"));
} catch (FileNotFoundException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String strRead;
try {
while ((strRead=readbuffer.readLine())!=null){
String splitarray[] = strRead.split(",");
String firstentry = splitarray[0];
String secondentry = splitarray[1];
String thirdentry = splitarray[2];
//String fourthentry = splitarray[3];
//String fifthentry = splitarray[4];
System.out.println(firstentry + " " + secondentry+ " " +thirdentry);
URL url1 = new URL("http://www.xyz.com/ship/reach/news-and");
Pattern p = Pattern.compile("http://www.xyz.com/.*/reach");
Matcher m = p.matcher(url1.toString());
if (m.matches()) {
//Do whatever
System.out.println("Yes Done");
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Matching works fine. But what I want is this: if a URL starts with the pattern given in splitarray[0], then run the action. How can we implement this? In the case above it does not match, yet the URL http://www.xyz.com/ship/w belongs to the pattern http://www.xyz.com/.*/reach. So for any URL that starts with this pattern, the code in the if block should run. Any suggestions will be appreciated!
You are missing a step here. You first need to translate your URLs to a regular expression, or design a method to use those URLs, then only can you compare your URL url1 to those patterns.
Based on the patterns you have shown, I assume you are designing software for a xyz solution, like their routers. Therefore, your URLs probably fall in a simple pattern style, like
http://www.xyz.com/regular-expression-here
I'm confused as to where the regexes are coming from. The text file? In any case, you'll have a hard time comparing url1 to any regexes because it's a URL object, and regex compares strings. So you'll want to stick with your String url instead.
Try this:
Pattern p = Pattern.compile(splitarray[0]);
Matcher m = p.matcher(url);
if (m.matches()) {
//Do whatever
}
The m.matches() method checks whether the entire String you provide matches the pattern, which is probably what you want here. If you need to check whether part of your String matches, use m.find() instead.
Update
Since you're only looking to match the pattern at the beginning of the String, you'll want to use m.find() instead. The special character ^ only matches at the beginning of a String, so add that to the front of your regex, e.g.:
Pattern p = Pattern.compile("^" + splitarray[0]);
etc.
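As an alternative to prepending ^ and calling find(), Matcher also has lookingAt(), which anchors the match at the start of the input but does not require the whole input to match. A small sketch, with a made-up class name UrlPrefixMatch and the URLs from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlPrefixMatch {
    // True when the beginning of url matches regex; the rest of url may be anything
    static boolean startsWithPattern(String regex, String url) {
        Matcher m = Pattern.compile(regex).matcher(url);
        return m.lookingAt();
    }

    public static void main(String[] args) {
        String regex = "http://www.xyz.com/.*/reach";
        String url = "http://www.xyz.com/ship/reach/news-and";
        // matches() needs the entire URL to match, lookingAt() only its beginning
        System.out.println("matches:   " + Pattern.compile(regex).matcher(url).matches());
        System.out.println("lookingAt: " + startsWithPattern(regex, url));
    }
}
```

If many URLs are checked against the same line of the filter file, compiling the Pattern once and reusing it avoids repeated compilation.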
I have a bunch of lines in a text file and I want to match ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried this expression ${(.+)} but it didn't work and I also don't know how to do the replace regex in java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the code above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} or ${VARIABLE_NAME2} with ${VARIABLE_NAME_SOMETHING} or ${VARIABLE_NAME2_SOMETHING}, but leave the preceding and following text of the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this, then replace , but the regex is not ok, it works with pattern/matcher combination though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I meant by $1 is the string you just matched replace it with itself + _SUFFIX
I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$([a-zA-Z0-9]+)", "($1) CONSTANT");
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+?)\\}"); // capture the name so that $1 is defined
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
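To make the point concrete, here is a small self-contained sketch that applies one replaceAll call to every line, with or without a variable. The class name VariableSuffix is made up, and \w+ assumes variable names consist of letters, digits and underscores only.

```java
class VariableSuffix {
    // Appends _SOMETHING inside every ${NAME}; lines without a variable are returned unchanged
    static String addSuffix(String line) {
        // In the replacement, \\$ is a literal dollar sign and $1 is the captured name
        return line.replaceAll("\\$\\{(\\w+)\\}", "\\${$1_SOMETHING}");
    }

    public static void main(String[] args) {
        String[] lines = {
            "some text before ${VARIABLE_NAME} some text after",
            "some text before some text without variable some text after"
        };
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(addSuffix(line)).append("\n");
        }
        System.out.print(sb);
    }
}
```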