I want to check whether a text contains more than one link.
I started with the following code:
private static void twoOrMorelinks(String commentstr){
String urlPattern = "^.*((?:http|https):\\/\\/\\S+){1,}.*((?:http|https):\\/\\/\\S+){1,}.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
The code above is not very elegant, though, and I am looking for something like the following:
private static void twoOrMorelinks(String commentstr){
String urlPattern = "^.*((?:http|https):\\/\\/\\S+){2,}.*$";
Pattern p = Pattern.compile(urlPattern,Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(commentstr);
if (m.find()) {
System.out.println("yes");
}
}
But this code does not work. For instance, I expect a match for the following text, but there is none:
They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com
Any ideas?
Just use this to count how many links you have:
private static int countLinks(String str) {
int total = 0;
Pattern p = Pattern.compile("(?:http|https):\\/\\/");
Matcher m = p.matcher(str);
while (m.find()) {
total++;
}
return total;
}
Then
boolean hasTwoOrMore = countLinks("They say 2's company watch live on...? http://www.le testin this code http://www.lexilogos.com") >= 2;
If you only need to know whether there are two or more, stop counting as soon as you have found two.
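The early exit could look something like the sketch below. The class name LinkCheck and the method hasAtLeast are made up for illustration, and a "link" is assumed to be anything that starts with http:// or https://, as in the counting code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkCheck {
    // Compiled once; matches the scheme prefix of an http or https link.
    private static final Pattern SCHEME =
            Pattern.compile("https?://", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns true as soon as `minimum` prefixes are seen.
    static boolean hasAtLeast(String text, int minimum) {
        if (minimum <= 0) {
            return true;
        }
        Matcher m = SCHEME.matcher(text);
        int found = 0;
        while (m.find()) {
            if (++found >= minimum) {
                return true; // stop scanning, the rest of the text is irrelevant
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "They say 2's company watch live on...? "
                + "http://www.le testin this code http://www.lexilogos.com";
        System.out.println(hasAtLeast(text, 2)); // true
        System.out.println(hasAtLeast("only http://www.lexilogos.com here", 2)); // false
    }
}
```

Compiling the pattern once in a static field also avoids recompiling it on every call.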
I suggest using the find method rather than matches, which must check the whole string. I rewrote your pattern to limit the amount of backtracking:
String urlPattern = "\\bhttps?://[^h]*+(?:(?:\\Bh|h(?!ttps?://))[^h]*)*+https?://";
Pattern p = Pattern.compile(urlPattern, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(str);
if (m.find()) {
// true
} else {
// false
}
pattern details:
\\b # word boundary
https?:// # scheme for http or https
[^h]*+ # all that is not an "h"
(?:
(?:
\\Bh # an "h" not preceded by a word boundary
| # OR
h(?!ttps?://) # an "h" not followed by "ttp://" or "ttps://"
)
[^h]*
)*+
https?:// # another scheme
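To see the pattern in action, here is a small test harness. The class name TwoSchemes and the method hasTwoLinks are only illustrative names, and the second sample string is invented to show the single-link case.

```java
import java.util.regex.Pattern;

class TwoSchemes {
    // The pattern from the answer above, compiled once.
    private static final Pattern TWO_LINKS = Pattern.compile(
            "\\bhttps?://[^h]*+(?:(?:\\Bh|h(?!ttps?://))[^h]*)*+https?://",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical wrapper: true when a second scheme follows the first one.
    static boolean hasTwoLinks(String s) {
        return TWO_LINKS.matcher(s).find();
    }

    public static void main(String[] args) {
        String two = "They say 2's company watch live on...? "
                + "http://www.le testin this code http://www.lexilogos.com";
        String one = "see http://a.com hello there";
        System.out.println(hasTwoLinks(two)); // true
        System.out.println(hasTwoLinks(one)); // false: every later "h" is consumed by the loop
    }
}
```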
Related
I'm facing a stupid problem... I know how to use Pattern and Matcher objects to capture a group in Java.
However, I cannot find a way to use them with an if statement where each choice depends on a match (simple example to illustrate the question, in reality, it's more complicated) :
String input="A=B";
String output="";
if (input.matches("#.*")) {
output="comment";
} else if (input.matches("A=(\\w+)")) {
output="value of key A is ..."; //how to get the content of capturing group?
} else {
output="unknown";
}
Should I create a Matcher for each possible test?!
Yes, you should.
Here is the example.
Pattern p = Pattern.compile("Phone: (\\d{9})");
String str = "Phone: 123456789";
Matcher m = p.matcher(str);
if (m.find()) {
String g = m.group(1); // g should hold 123456789
}
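Applied to the if/else chain from the question, one Matcher per test might look like the sketch below. The class name MatchChain and the method classify are hypothetical; the two patterns are the ones from the question. Assigning the Matcher inside the condition keeps it available for group(1) in the branch body.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchChain {
    private static final Pattern COMMENT = Pattern.compile("#.*");
    private static final Pattern KEY_A = Pattern.compile("A=(\\w+)");

    // Hypothetical helper mirroring the if/else chain from the question.
    static String classify(String input) {
        Matcher m;
        if (COMMENT.matcher(input).matches()) {
            return "comment";
        } else if ((m = KEY_A.matcher(input)).matches()) {
            // matches() succeeded, so the capturing group can be read
            return "value of key A is " + m.group(1);
        } else {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("A=B"));    // value of key A is B
        System.out.println(classify("# note")); // comment
        System.out.println(classify("C=D"));    // unknown
    }
}
```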
Take URL http://www.abc.com/alpha/beta/33445566778899/gamma/delta
I need to return the number 33445566778899 (without the surrounding forward slashes; the number is of variable length, between 10 and 20 digits).
This seemed simple enough, but nothing I have tried works. Why?
Pattern pattern = Pattern.compile("\\/([0-9])\\d{10,20}\\/");
Matcher matcher = pattern.matcher(fullUrl);
if (matcher.find()) {
return matcher.group(1);
}
Try this one-liner:
String number = url.replaceAll(".*/(\\d{10,20})/.*", "$1");
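One thing to keep in mind with the one-liner: replaceAll returns the input unchanged when the regex does not match, so a URL without such a number comes back whole. A guarded variant could look like this sketch (the class name ExtractNumber, the method extract, and the choice of returning null are my own assumptions):

```java
class ExtractNumber {
    private static final String REGEX = ".*/(\\d{10,20})/.*";

    // Hypothetical helper: returns the digits, or null when the URL has none.
    static String extract(String url) {
        // Check first, because replaceAll hands back the original string
        // when nothing matches.
        return url.matches(REGEX) ? url.replaceAll(REGEX, "$1") : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("http://www.abc.com/alpha/beta/33445566778899/gamma/delta"));
        // prints 33445566778899
        System.out.println(extract("http://www.abc.com/alpha/123/beta"));
        // prints null
    }
}
```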
This regex works -
"\\/(\\d{10,20})\\/"
Testing it-
String fullUrl = "http://www.abc.com/alpha/beta/33445566778899/gamma/delta";
Pattern pattern = Pattern.compile("\\/(\\d{10,20})\\/");
Matcher matcher = pattern.matcher(fullUrl);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
OUTPUT - 33445566778899
Try,
String url = "http://www.abc.com/alpha/beta/33445566778899/gamma/delta";
String digitStr = null;
for(String str : url.split("/")){
System.out.println(str);
if(str.matches("[0-9]{10,20}")){
digitStr = str;
break;
}
}
System.out.println(digitStr);
Output:
33445566778899
Instead of saying it "doesn't seem to work", you should have told us what it was returning. Testing it confirmed what I thought: your code returns 3 for this input.
This is simply because your regexp, as written, captures only a single digit that follows a / and is itself followed by 10 to 20 more digits and then a /.
The regex you want is "/(\\d{10,20})/" (you don't need to escape the /). Below you'll find the code I tested this with.
public static void main(String[] args) {
String src = "http://www.abc.com/alpha/beta/33445566778899/gamma/delta";
Pattern pattern = Pattern.compile("/(\\d{10,20})/");
Matcher matcher = pattern.matcher(src);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
}
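Running both patterns side by side makes the effect of the group placement visible. This is only a sketch; the class name GroupPlacement and the helper firstGroup are invented for the comparison.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPlacement {
    // Hypothetical helper: group(1) of the first match, or null if none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String src = "http://www.abc.com/alpha/beta/33445566778899/gamma/delta";
        // Group wraps a single digit; the quantifier sits outside it.
        System.out.println(firstGroup("\\/([0-9])\\d{10,20}\\/", src)); // 3
        // Group wraps the whole quantified run of digits.
        System.out.println(firstGroup("/(\\d{10,20})/", src)); // 33445566778899
    }
}
```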
This is the string that I have:
KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007
This is a weather report. I need to extract the following numbers from the report: 10/M13. It is temperature and dewpoint, where M means minus. So, the place in the String may differ and the temperature may be presented as M10/M13 or 10/13 or M10/13.
I have written the following code:
public String getTemperature (String metarIn){
Pattern regex = Pattern.compile(".*(\\d+)\\D+(\\d+)");
Matcher matcher = regex.matcher(metarIn);
if (matcher.matches() && matcher.groupCount() == 1) {
temperature = matcher.group(1);
System.out.println(temperature);
}
return temperature;
}
Obviously, the regex is wrong, since the method always returns null. I have tried tens of variations but to no avail. Thanks a lot if someone can help!
This will extract the String you seek, and it's only one line of code:
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
Here's some test code:
public static void main(String[] args) throws Exception {
String input = "KLAS 282356Z 32010KT 10SM FEW090 M01/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String tempAndDP = input.replaceAll(".*(?<![M\\d])(M?\\d+/M?\\d+).*", "$1");
System.out.println(tempAndDP);
}
Output:
M01/M13
The regex should look like:
M?\d+/M?\d+
For Java this will look like:
"M?\\d+/M?\\d+"
You might want to add a check for white space on the front and end:
"\\sM?\\d+/M?\\d+\\s"
However, this pattern will not match when the value sits at the very start or end of the string, because there is no whitespace to match there, so instead we should use:
"(^|\\s)M?\\d+/M?\\d+($|\\s)"
This says that where there is no whitespace before or after the value, the start or the end of the string must be there instead.
Example code used to test:
Pattern p = Pattern.compile("(^|\\s)M?\\d+/M?\\d+($|\\s)");
String test = "gibberish M130/13 here";
Matcher m = p.matcher(test);
if (m.find())
System.out.println(m.group().trim());
This returns: M130/13
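A possible variation, not part of the answer above, is to express the same boundary check with lookarounds so that the match never includes the whitespace and no trim() is needed. The class name MetarTemp and the method find are assumptions for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetarTemp {
    // (?<!\S) means "not preceded by a non-space character", which is true at
    // the start of the string as well; (?!\S) is the mirror image for the end.
    private static final Pattern TEMP =
            Pattern.compile("(?<!\\S)M?\\d+/M?\\d+(?!\\S)");

    // Hypothetical helper: the temperature/dewpoint token, or null.
    static String find(String report) {
        Matcher m = TEMP.matcher(report);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(find("KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2"));
        // prints 10/M13
        System.out.println(find("M10/13")); // prints M10/13
    }
}
```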
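Try:
Pattern regex = Pattern.compile(".*\\s(M?\\d+)/(M?\\d+)\\s.*");
Matcher matcher = regex.matcher(metarIn);
// groupCount() is fixed by the pattern (always 2 here), so matches() is the only check needed
if (matcher.matches()) {
temperature = matcher.group(1); // M? is inside the group so the minus marker is kept; group(2) is the dewpoint
System.out.println(temperature);
}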
An alternative to regex.
Sometimes a regex is not the only solution. It seems that in your case, you need the 6th block of text. The blocks are separated by a space character, so all you need to do is count the blocks.
Considering that each block of text does NOT HAVE fixed length
Example:
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
int spaces = 5;
int begin = 0;
while(spaces-- > 0){
begin = s.indexOf(' ', begin)+1;
}
int end = s.indexOf(' ', begin+1);
String result = s.substring(begin, end);
System.out.println(result);
Considering that each block of text does HAVE fixed length
String s = "KLAS 282356Z 32010KT 10SM FEW090 10/M13 A2997 RMK AO2 SLP145 T01001128 10100 20072 51007";
String result = s.substring(33, s.indexOf(' ', 33));
System.out.println(result);
A prettier alternative, as pointed out by Adrian:
String result = rawString.split(" ")[5];
Note that split actually takes a regex pattern as its parameter.
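Because the argument is a regex, you can split on runs of whitespace instead of a single space, which matters if a report ever contains a double space. A small sketch (the class name SixthBlock, the method block, and the double-spaced sample are invented for illustration):

```java
class SixthBlock {
    // Hypothetical helper: the block at `index`, splitting on any whitespace run.
    static String block(String report, int index) {
        return report.trim().split("\\s+")[index];
    }

    public static void main(String[] args) {
        String doubled = "KLAS  282356Z 32010KT 10SM FEW090 10/M13 A2997";
        // A single-space split yields an empty token for the doubled space,
        // which shifts every later index by one.
        System.out.println(doubled.split(" ")[5]); // FEW090
        System.out.println(block(doubled, 5));     // 10/M13
    }
}
```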
I have a pattern:
host=([a-z0-9./:]*)
It finds the host address for me. And I have this content:
host=http//:sdf3452.domain.com/
And my code is:
Matcher m;
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
m=hostP.matcher(content);//string 1
String match = m.group();//string 2
Log.i("host", ""+hostP.matcher(content).find());
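If I delete the lines marked string 1 and string 2, I see true in logcat. If I leave the code as it is, I get an exception saying that no match was found.
I have tried all kinds of patterns. Inspecting the m variable in the debugger shows that it finds no match. Please teach me how to use regular expressions!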
Before you group() a match, you need to invoke find().
Try it like this:
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = hostP.matcher(content);
if(m.find()) {
String match = m.group();
// ...
}
EDIT
and a little demo that shows what each match-group contains:
Pattern p = Pattern.compile("host=([a-z0-9./:]*)");
Matcher m = p.matcher("host=http://sdf3452.domain.com/");
if (m.find()) {
for(int i = 0; i <= m.groupCount(); i++) {
System.out.printf("m.group(%d) = '%s'\n", i, m.group(i));
}
}
which will print:
m.group(0) = 'host=http://sdf3452.domain.com/'
m.group(1) = 'http://sdf3452.domain.com/'
As you can see, group(0), which is the same as group(), contains what the entire pattern matches.
But realize that a URL can contain much more than what you defined in [a-z0-9./:]*!
String content = "host=http://sdf3452.domain.com/";
Matcher mm;
Pattern hostP = Pattern.compile("host=([a-z0-9./:]*)");
mm=hostP.matcher(content);
String match = "";
if (mm.find()){//call mm.find() first
match = mm.group(1);//1 is the number of the capturing group, counted by opening bracket
}
}
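The exception from the question can be reproduced in isolation: a Matcher throws IllegalStateException from group() until find(), matches(), or lookingAt() has succeeded. The sketch below shows both the failure and the fix; the class name GroupBeforeFind and its two methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeFind {
    private static final Pattern HOST = Pattern.compile("host=([a-z0-9./:]*)");

    // Hypothetical helper: true when group() fails on a fresh Matcher.
    static boolean groupThrowsBeforeFind(String content) {
        Matcher m = HOST.matcher(content);
        try {
            m.group(); // no match attempt has been made yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Hypothetical helper: the captured host, or null when nothing matches.
    static String host(String content) {
        Matcher m = HOST.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String content = "host=http://sdf3452.domain.com/";
        System.out.println(groupThrowsBeforeFind(content)); // true
        System.out.println(host(content)); // http://sdf3452.domain.com/
    }
}
```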
Example:
((UINT32)((384UL*1024UL) - 1UL)) should return "UINT32"
(char)abc should return "char".
((int)xyz) should return "int".
Pattern p = Pattern.compile("\\(([^()]*)\\)");
String[] tests = {
"((UINT32)((384UL*1024UL) - 1UL))",
"(char)abc",
"((int)xyz)"
};
for (String s : tests) {
Matcher m = p.matcher(s);
if (m.find())
System.out.println(m.group(1));
}
Prints
UINT32
char
int
Explanation of the regular expression:
\\( Start with a (
( start capturing group
[^()]* anything but ( and ) 0 or more times
) end capturing group
\\) end with a ).
Using regular expressions is a bit of overkill here, though. You could also do:
int close = s.indexOf(')');
int open = s.lastIndexOf('(', close);
result = s.substring(open+1, close);
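Wrapped into a method with bounds checks, the indexOf approach could look like the following sketch. The class name FirstCast and the method firstParenthesized are made up, and returning null for input without a complete pair of parentheses is my own choice.

```java
class FirstCast {
    // Hypothetical helper: text between the first ')' and the '(' nearest before it.
    static String firstParenthesized(String s) {
        int close = s.indexOf(')');
        if (close < 0) {
            return null; // no closing parenthesis at all
        }
        int open = s.lastIndexOf('(', close);
        if (open < 0) {
            return null; // nothing opens before the first ')'
        }
        return s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(firstParenthesized("((UINT32)((384UL*1024UL) - 1UL))")); // UINT32
        System.out.println(firstParenthesized("(char)abc"));  // char
        System.out.println(firstParenthesized("((int)xyz)")); // int
    }
}
```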
Pattern p = Pattern.compile("\\(([^\\(\\)]+?)\\)");
Matcher m = p.matcher(input);
if (m.find())
result = m.group(1);