I can't get the url with Pattern.compile - java

What I really want is to return the URLs contained in the txt variable. The URLs vary from one response to the next, so I don't know which regular expression to use, or whether part of my code is simply badly written. (I'm using Google Translate, sorry, I speak Spanish.)
//I can't get the url with Pattern.compile
//My code example::::: in the works :(
String txt="sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}],sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}]";
ArrayList<String> getfi = new ArrayList<String>();
Matcher matcher = Pattern.compile("sources: [{file:\"(.*)\"").matcher(txt);
if (matcher.find()) {
while(matcher.find()) {
getfi.add(matcher.group(1));
}
System.out.println(getfi);
} else {
System.exit(1);
}

Pattern.compile("sources: [{file:\"(.*)\"")
Your regex is wrong: both [ and { are special characters, so they must be escaped. That is why you get PatternSyntaxException: Unclosed character class near index 21, which you didn't mention in your question.
Also, once the escaping is fixed, the greedy .* makes the pattern match the entire string except for the last two characters.
if (matcher.find()) {
while(matcher.find()) {
The find() call in the if statement consumes the first match. Since the first match is the entire text except the last two characters, there is no second match for the find() call in the while loop, so the loop is never entered.
To make it work, escape the special characters, change .* to not be greedy, and fix the loop:
String txt="sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}],sources: [{file:\"http://pla.cdn19.fx.rrrrrr.com/luq5t4nidtixexzw6wblbiexs7hg2hdu4coqdlltx6t3hu3knqhbfoxp7jna/normal.mp4\",label:\"360p\"}]";
Matcher matcher = Pattern.compile("sources: \\[\\{file:\"(.*?)\"").matcher(txt);
ArrayList<String> getfi = new ArrayList<String>();
while (matcher.find()) {
getfi.add(matcher.group(1));
}
if (getfi.isEmpty()) {
System.exit(1);
}
System.out.println(getfi);
WARNING:
Notice that sometimes there is a space after :, and sometimes not. That is perfectly valid for JSON. JSON text may contain whitespace, including newlines, so using a simple regex is not a good idea.
Use a JSON parser instead.
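If adding a JSON parser is not an option, the pattern can at least be made tolerant of optional whitespace. The following is only a stopgap sketch, not a substitute for a real parser. The class name UrlExtractor, the method extractUrls and the example.com URLs are made up for illustration, and the pattern assumes that a URL never contains a double quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlExtractor {
    // \s* allows optional whitespace (including newlines) around the colon.
    // [^"]* stops at the closing quote, so no lazy quantifier is needed.
    private static final Pattern FILE_URL =
            Pattern.compile("file\\s*:\\s*\"([^\"]*)\"");

    public static List<String> extractUrls(String txt) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = FILE_URL.matcher(txt);
        // Call find() only in the loop condition so that no match is skipped.
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with and without spaces around ':'.
        String txt = "sources: [{file:\"http://example.com/a.mp4\",label:\"360p\"}],"
                + "sources:[{ file : \"http://example.com/b.mp4\" }]";
        System.out.println(extractUrls(txt));
    }
}
```

Note that the pattern would also match a key that merely ends in "file", which is one more reason to prefer a parser when the input format is not under your control.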

Related

String equal/contain none of them gets what I want

I have a string that can look somewhat like:
NCC_johjon (\users\johanjo\tomcattest\oysters\NCC_johjon, port 16001), utv_johjon (\users\johanjo\tomcattest\oysters\utv_johjon, port 16000)
There could be many more entries such as NCC_etskys, NCC_homyis and so on. I want to check whether the string already contains an entry named exactly "NCC_joh". I tried:
if(oysters.contains("NCC_joh")){
System.out.println("HEJ HEJ HEJ HALLÅ HALLÅ HALLÅ");
}
But if there is an NCC_johjon in there, the if branch is entered as well. I only want to enter it if exactly that name exists, no longer and no shorter. With .equals the whole string has to be identical, which is not what I want either. Does anyone have an idea? It would be easier if I had a list of strings to work with, but I don't.
oysterPaths is a Collection at first:
Collection<TomcatResource> oysterPaths = TomcatResource.listCats(Paths.get(tomcatsPath));
Use regular expressions.
if (oysters.matches("(?s).*\\bNCC_joh\\b.*")) {
where
(?s) = single line mode, DOT-ALL, so . will match a newline too.
. = any char
.* = zero or more occurrences of . (any char)
\b = word boundary
String.matches does a match of the pattern over the entire string, hence the need for .* at begin and end.
(The word boundaries mean that NCC_joh must appear as a whole word: it may not be directly preceded or followed by a letter, digit or underscore.)
This is similar to https://stackoverflow.com/a/49879388/2735286, but I would suggest to use the find method using this regular expression:
\bNCC_joh\b
Using the find method will simplify the regular expression and you will exclusively search for what is relevant.
Here is the corresponding method you can use:
public static boolean superExactMatch(String expression) {
Pattern p = Pattern.compile("\\bNCC_joh\\b", Pattern.MULTILINE);
final Matcher matcher = p.matcher(expression);
final boolean found = matcher.find();
if(found) {
// For debugging purposes to see where the match happened in the expression
System.out.println(matcher.start() + " " + matcher.end());
}
return found;
}
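As a hedged variation on the method above, the searched name can be passed in as a parameter instead of being hard-coded. The class name WholeWordFinder and the method containsWholeWord are hypothetical; the sketch assumes the name starts and ends with a word character (letter, digit or underscore), because \b only behaves as a whole-word delimiter in that case.

```java
import java.util.regex.Pattern;

public class WholeWordFinder {
    public static boolean containsWholeWord(String text, String word) {
        // Pattern.quote makes the word literal; \b demands a word boundary on each side.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String oysters = "NCC_johjon (\\users\\johanjo\\tomcattest\\oysters\\NCC_johjon, port 16001), "
                + "utv_johjon (\\users\\johanjo\\tomcattest\\oysters\\utv_johjon, port 16000)";
        System.out.println(containsWholeWord(oysters, "NCC_joh"));    // false: only NCC_johjon is present
        System.out.println(containsWholeWord(oysters, "NCC_johjon")); // true
    }
}
```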

Pattern and Matcher is not working in java

Basically I have a simple String Where I need to explicitly restrict characters other than a-zA-Z0-9. Before I mention what is wrong here is how I am doing it.
Pattern p = Pattern.compile("[&=]");
Matcher m = p.matcher("Nothing is wrong");
if (m.find()){
out.print("You are not allowed to have &=.");
return;
}
Pattern p1 = Pattern.compile("[a-zA-Z0-9]");
Matcher m1 = p1.matcher("Itissupposetobeworking");
if (m1.find()){
out.print("There is something wrong.");
return;
}
The first one works fine, but the second check, if (m1.find()), always succeeds, even though the input contains no characters other than those specified in the pattern.
I also tried Pattern p1 = Pattern.compile("[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]") but still have the same trouble.
Also, could you tell me which is better: String.matches("[a-zA-Z0-9]") or the way I am doing it above?
Thanks in advance.
[a-zA-Z0-9] tries to match alphanumeric characters.
So "There is something wrong." will be printed if the input character sequence passed to matcher() contains an alphanumeric character.
Change it to [^a-zA-Z0-9] and try.
This tries to match non-alphanumeric characters. So, you will get expected result.
You seem to want to find a partial match in a string that contains a character other than an alphanumeric character:
Pattern p1 = Pattern.compile("[^a-zA-Z0-9]");
or
Pattern p1 = Pattern.compile("\\P{Alnum}");
The [^a-zA-Z0-9] pattern is a negated character class that matches any char other than the ones defined in the class. So, if a string contains any chars other than ASCII letters or digits, your if (m1.find()) will get triggered and the message will appear.
Note that the whole negated character class can be replaced with a predefined character class \P{Alnum} that matches any char other than alphanumeric. \p{Alnum} matches any alphanumeric character and \P{Alnum} is the reverse class.
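A minimal sketch of the check described above, comparing the two equivalent patterns. The class name AlnumCheck and the method names are made up for illustration; without the UNICODE_CHARACTER_CLASS flag, \P{Alnum} is ASCII-only in Java, so both patterns treat accented letters as non-alphanumeric.

```java
import java.util.regex.Pattern;

public class AlnumCheck {
    private static final Pattern NEGATED_CLASS = Pattern.compile("[^a-zA-Z0-9]");
    private static final Pattern NOT_ALNUM = Pattern.compile("\\P{Alnum}");

    // True if the input contains at least one char outside a-z, A-Z, 0-9.
    public static boolean hasForbiddenChar(String input) {
        return NEGATED_CLASS.matcher(input).find();
    }

    // Same check, using the predefined class instead of the explicit ranges.
    public static boolean hasForbiddenCharPosix(String input) {
        return NOT_ALNUM.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasForbiddenChar("Itissupposetobeworking"));  // false
        System.out.println(hasForbiddenChar("It$issupposetobeworking")); // true
        System.out.println(hasForbiddenCharPosix("Nothing is wrong"));   // true, because of the spaces
    }
}
```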
If you use the isAlphanumeric method of org.apache.commons.lang.StringUtils, your code becomes much more readable. You can write
if (!StringUtils.isAlphanumeric("Itissupposetobeworking"))
instead of
Pattern p1 = Pattern.compile("[^a-zA-Z0-9]");
Matcher m1 = p1.matcher("Itissupposetobeworking");
if (m1.find()){
The expression in the question prints "There is something wrong." whenever it finds an alphanumeric character. If you want to reject every other character instead, negate the character class:
Pattern p1 = Pattern.compile("[^a-zA-Z0-9]");
String a = "It$issupposetobeworking";
Matcher m1 = p1.matcher(a);
if (m1.find()){
System.out.print("There is something wrong.");
}
else
{
System.out.println("Everything is fine");
}
If you want to keep the same character class, add a quantifier and use matches(), which must match the entire input:
Pattern p1 = Pattern.compile("[a-zA-Z0-9]*");
Matcher m1 = p1.matcher("Itissupposetobeworking");
if (!m1.matches()){
out.print("There is something wrong.");
return;
}

Unable to Match Using Regex in Java

I asked this question a while ago, but did not get a proper answer, so giving it another shot.
class Test {
public static void main (String[] args) throws java.lang.Exception
{
String file_name = "C:\\Temp\\Test.txt";
String string = FileUtils.readFileToString(new File(file_name), "UTF-8");
String regex = "^(ipv6 pim(?: vrf .*?)? rp-address .*)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
} else {
System.out.println("No Matches");
}
}
}
The file contains a lot of lines, more than 750 I guess, and I want to extract all the lines that match the regex. The problem is that the code as written does not return any matches. It only does so if the first line of the file matches the regex; if the matching line is somewhere in the middle, no luck. I thought the line breaks were causing the problem, but even after converting the string into a single line, nothing is returned unless the pattern matches at the very beginning.
A sample matching string: ipv6 pim rp-address 20:20:20::F
Try giving the MULTILINE modifier :
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
Instead of using an if condition, switch it to a while loop.
while (matcher.find()) {
System.out.println("Matcher: " + matcher.group(1));
}
find() searches for one matching value. To get the next one, you must invoke find() again, hence the loop.
Additionally, without the MULTILINE flag the ^ only matches at the very start of the input, so later lines can never match. You could drop the ^, although the pattern could then also match in the middle of a line.
Alternatively, as Rambler suggested, use the Pattern.MULTILINE flag. This makes ^ match at the beginning of every line instead of only once at the beginning of the whole string.
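Putting both suggestions together, a sketch might look like this. The class name RpAddressLines, the method matchingLines and all sample lines other than the one quoted in the question are assumptions made for illustration; the file reading is left out so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RpAddressLines {
    // MULTILINE lets ^ match at the start of every line, not only at the start of the input.
    private static final Pattern RP_ADDRESS =
            Pattern.compile("^(ipv6 pim(?: vrf .*?)? rp-address .*)", Pattern.MULTILINE);

    public static List<String> matchingLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = RP_ADDRESS.matcher(text);
        // Each find() call moves on to the next match, hence the loop.
        while (matcher.find()) {
            lines.add(matcher.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical file content; only the second line comes from the question.
        String text = "hostname router1\n"
                + "ipv6 pim rp-address 20:20:20::F\n"
                + "interface Ethernet0\n"
                + "ipv6 pim vrf blue rp-address 20:20:20::A\n";
        System.out.println(matchingLines(text));
    }
}
```

Since . does not match a line terminator by default, each match stops at the end of its own line.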

Regex matching up to a character if it occurs

I need to match string as below:
match everything up to ;
If - occurs, match only up to -, excluding the -
For e.g. :
abc; should return abc
abc-xyz; should return abc
Pattern.compile("^(?<string>.*?);$");
Using the above I can achieve half of it, but I don't know how to change this pattern to meet the second requirement. How do I change .*? so that it stops at the first occurrence of -?
I am not good with regex. Any help would be great.
EDIT
I need to capture it as a group. I can't change that, since there are many other patterns to match and capture. I have posted only part of it.
Code looks something like below.
public static final Pattern findString = Pattern.compile("^(?<string>.*?);$");
if(findString.find())
{
return findString.group("string"); //cant change anything here.
}
Just use a negated char class.
^[^-;]*
ie.
Pattern p = Pattern.compile("^[^-;]*");
Matcher m = p.matcher(str);
while(m.find()) {
System.out.println(m.group());
}
This matches zero or more characters at the start of the string that are neither - nor ;.
This should do what you are looking for:
[^-;]*
It matches characters that are not - or ;.
Tip: If you are not confident with regular expressions, there are great online tools for testing them against your input, e.g. https://regex101.com/
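Since the question requires the named group string, the negated character class can be placed inside that group. This is a sketch under assumptions: the class name PrefixGroup and the method extract are hypothetical, and the trailing ;$ is kept from the original pattern so that inputs without a final ; are rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixGroup {
    // The group captures everything before the first '-' or ';'.
    // (?:-[^;]*)? then skips an optional "-..." part before the final ';'.
    private static final Pattern FIND_STRING =
            Pattern.compile("^(?<string>[^-;]*)(?:-[^;]*)?;$");

    public static String extract(String input) {
        Matcher matcher = FIND_STRING.matcher(input);
        // find() and group() are Matcher methods, not Pattern methods.
        return matcher.find() ? matcher.group("string") : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("abc;"));     // abc
        System.out.println(extract("abc-xyz;")); // abc
    }
}
```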
UPDATE
I see you have an issue in the code since you try to access .group in the Pattern object, while you need to use the .group method of the Matcher object:
public static String GetTheGroup(String str) {
Pattern findString = Pattern.compile("(?s)^(?<string>.*?)[;-]");
Matcher matcher = findString.matcher(str);
if (matcher.find())
{
return matcher.group("string"); //you have to change something here.
}
else
return "";
}
And call it as
System.out.println(GetTheGroup("abc-xyz;"));
See IDEONE demo
OLD ANSWER
Your ^(?<string>.*?);$ regex only matches 0 or more characters other than a newline from the beginning up to the first ; that is the last character in the string. I guess it is not what you expect.
You should learn more about using character classes in regex, as you can match 1 symbol from a specified character set that is defined with [...].
You can achieve this with a String.split taking the first element only and a [;-] regex that matches a ; or - literally:
String res = "abc-xyz;".split("[;-]")[0];
System.out.println(res);
Or with replaceAll and the (?s)[;-].*$ regex (which matches the first ; or - and then anything up to the end of the string):
res = "abc-xyz;".replaceAll("(?s)[;-].*$", "");
System.out.println(res);
See IDEONE demo
I have found the solution without removing groupings.
(?<string>.*?) matches everything up to the next part of the pattern
(?:-.*?)? is a non-capturing group that starts with - and occurs zero or one time.
; end character.
So putting all together:
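public static final Pattern findString = Pattern.compile("^(?<string>.*?)(?:-.*?)?;$");
// find() and group() belong to Matcher, so create one first; str stands for the input string.
Matcher matcher = findString.matcher(str);
if(matcher.find())
{
return matcher.group("string");
}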

Java String matches and replaceAll differ in matching parentheses

I have strings with parentheses and also escaped characters. I need to match against these characters and also delete them. In the following code, I use matches() and replaceAll() with the same regex, but the matches() returns false, while the replaceAll() seems to match just fine, because the replaceAll() executes and removes the characters. Can someone explain?
String input = "(aaaa)\\b";
boolean matchResult = input.matches("\\(|\\)|\\\\[a-z]+");
System.out.printf("matchResult=%s\n", matchResult);
String output = input.replaceAll("\\(|\\)|\\\\[a-z]+", "");
System.out.printf("INPUT: %s --> OUTPUT: %s\n", input, output);
Prints out:
matchResult=false
INPUT: (aaaa)\b --> OUTPUT: aaaa
matches matches the whole input, not part of it.
The regular expression \(|\)|\\[a-z]+ doesn't describe the whole word, but only parts of it, so in your case it fails.
What matches is doing has already been explained by Binyamin Sharet. I want to extend this a bit.
Java does not have a "findall" method or a "g" modifier, as some other languages do, to get all matches at once.
The Java Matcher class has two main methods for applying a pattern to a string (without replacing anything):
matches(): matches the whole string against the pattern
find(): returns the next match
If you want to get everything that fits your pattern, you need to call find() in a loop, something like this:
Pattern p = Pattern
.compile("\\(|\\)|\\\\[a-z]+");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(0));
}
or if you are only interested if your pattern exists in the string
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("not found");
}
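To see the three behaviours side by side, here is a small self-contained sketch using the input from the question. The class name MatchesVsFind and the helper findAll are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {
    private static final Pattern P = Pattern.compile("\\(|\\)|\\\\[a-z]+");

    // Collects every partial match by calling find() repeatedly.
    public static List<String> findAll(String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = P.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "(aaaa)\\b"; // the characters (aaaa)\b
        // matches() must consume the whole input, which the pattern cannot do.
        System.out.println(P.matcher(input).matches());      // false
        // find() locates each partial match in turn.
        System.out.println(findAll(input));                  // [(, ), \b]
        // replaceAll() works like find(): it replaces every partial match.
        System.out.println(P.matcher(input).replaceAll("")); // aaaa
    }
}
```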
