Java Pattern match

I have a long template from which I need to extract certain strings based on certain patterns. From the examples I went through, quantifiers seem to be useful in such situations. For example, the following is my template, from which I need to extract while and doWhile:
This is a sample document.
$while($variable)This text can be repeated many times until do while is called.$endWhile.
Some sample text follows this.
$while($variable2)This text can be repeated many times until do while is called.$endWhile.
Some sample text.
I need to extract the whole text from $while($variable) to $endWhile, and then process the value of $variable. After that, I need to insert the text between $while and $endWhile back into the original text.
I have the logic for extracting the variable, but I'm not sure how to use quantifiers or pattern matching here.
Can someone please provide sample code for this? Any help will be greatly appreciated.

You can use a rather simple regex-based solution here with a Matcher:
Pattern pattern = Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);
Matcher matcher = pattern.matcher(yourString);
while(matcher.find()){
String variable = matcher.group(1); // this will include the $
String value = matcher.group(2);
// now do something with variable and value
}
If you want to replace the variables in the original text, you should use the Matcher.appendReplacement() / Matcher.appendTail() solution:
Pattern pattern = Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);
Matcher matcher = pattern.matcher(yourString);
StringBuffer sb = new StringBuffer();
while(matcher.find()){
String variable = matcher.group(1); // this will include the $
String value = matcher.group(2);
// now do something with variable and value
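// quoteReplacement keeps any '$' or '\' in value from being read as a group reference
matcher.appendReplacement(sb, Matcher.quoteReplacement(value));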
}
matcher.appendTail(sb);
Reference:
Methods of the Pattern Class
(Sun Java Tutorial)
Methods of the Matcher Class
(Sun Java Tutorial)
Pattern JavaDoc
Matcher JavaDoc
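As a rough, self-contained sketch of the appendReplacement() / appendTail() approach: the class name WhileBlockExpander and the repeatCounts map are hypothetical, and the idea that a variable resolves to a repetition count is only an assumption made for illustration. The regex is the one from the answer above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhileBlockExpander {
    // Same pattern as in the answer: group 1 is the variable, group 2 is the body.
    private static final Pattern WHILE_BLOCK =
            Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);

    // repeatCounts is a hypothetical lookup: variable name -> how often to repeat the body.
    static String expand(String template, Map<String, Integer> repeatCounts) {
        Matcher matcher = WHILE_BLOCK.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String variable = matcher.group(1); // includes the leading '$'
            String body = matcher.group(2);
            int count = repeatCounts.getOrDefault(variable, 0);
            // Quote the replacement so a '$' inside the body is inserted literally.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(body.repeat(count)));
        }
        matcher.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String template = "A $while($n)x$endWhile B";
        System.out.println(expand(template, Map.of("$n", 2))); // A xx B
    }
}
```

Unknown variables fall back to a count of 0 here, which simply removes the block; a real template engine would probably want to report that instead.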

public class PatternInString {
static String testcase1 = "what i meant here";
static String testcase2 = "here";
public static void main(String args[])throws StringIndexOutOfBoundsException{
PatternInString testInstance= new PatternInString();
boolean result = testInstance.occurs(testcase1,testcase2);
System.out.println(result);
}
//write your code here
public boolean occurs(String str1, String str2)throws StringIndexOutOfBoundsException
{ int i;
boolean result=false;
int num7=str1.indexOf(" ");
int num8=str1.lastIndexOf(" ");
String str6=str1.substring(num8+1);
String str5=str1.substring(0,num7);
if(str5.equals(str2))
{
result=true;
}
else if(str6.equals(str2))
{
result=true;
}
int num=-1;
try
{
for(i=0;i<str1.length()-1;i++)
{ num=num+1;
num=str1.indexOf(" ",num);
int num1=str1.indexOf(" ",num+1);
String str=str1.substring(num+1,num1);
if(str.equals(str2))
{
result=true;
break;
}
}
}
catch(Exception e)
{
}
return result;
}
}

Related

Java Browser, Dynamic string Matcher Pattern

I have the following code, which uses a fixed string with Pattern and Matcher to pull out a link. I also have a method that returns the HTML code as a string. My problem is that I don't know how to call it so that the code below uses that dynamic string instead of a static one. I tried using the dynamic string's name inside the search, but it failed to compile with an error saying that I'm trying to use a dynamic string instead of a static one. Any hints or help would be appreciated; if you need any of my other classes or methods, feel free to ask.
String stringToSearch = "<a>www.google.com</a> ";
Pattern p = Pattern.compile("<a>(\\S+)</a>");
Matcher m = p.matcher(stringToSearch);
if (m.find())
{
String codeGroup = m.group(1);
System.out.format("'%s'\n", codeGroup);
}
}
}

This isn't really a 'design-patterns' question; it has more to do with knowing how to pass arguments properly into methods.
The Pattern.compile(String) method takes a string as input. That string doesn't have to be a constant. You can pass it in as a parameter; I've even put it into a 'helper' method to demonstrate that.
public void someMethod() {
    String stringToSearch = "<a>www.google.com</a> ";
    String matchPattern = "<a>(\\S+)</a>";
    String codeGroup = firstGroup(matchPattern, stringToSearch);
    if (codeGroup != null) {
        System.out.format("'%s'\n", codeGroup);
    }
}
// Returns capture group 1 of the first match, or null if nothing matches.
public static String firstGroup(String pattern, String stringToSearch) {
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(stringToSearch);
    return m.find() ? m.group(1) : null;
}
To show you what I think you mean:
{
    // code...
    String stringToSearch = getContent(); // might have parameters here or not
    String matchPattern = "<a>(\\S+)</a>";
    String codeGroup = firstGroup(matchPattern, stringToSearch);
    if (codeGroup != null) {
        System.out.format("'%s'\n", codeGroup);
    }
}
// Returns capture group 1 of the first match, or null if nothing matches.
public static String firstGroup(String pattern, String stringToSearch) {
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(stringToSearch);
    return m.find() ? m.group(1) : null;
}
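If the dynamic HTML string contains more than one link, the same idea extends to collecting every match. The following is only a sketch: LinkExtractor and extractAll are made-up names, and the sample HTML stands in for whatever your own content method returns. Both the regex and the text to search are ordinary runtime String values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LinkExtractor {
    // Both arguments are plain Strings, so either can be computed at runtime.
    static List<String> extractAll(String regex, String content) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(content);
        while (m.find()) {
            found.add(m.group(1)); // text captured by the first group
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for a string produced elsewhere, e.g. by a page download.
        String html = "<a>www.google.com</a> and <a>www.example.org</a>";
        System.out.println(extractAll("<a>(\\S+)</a>", html));
    }
}
```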

java tokenizer for strings

I have a text file and want to tokenize its lines -- but only the sentences with the # character.
For example, given...
Buah... Molt bon concert!! #Postconcert #gintonic
...I want to print only #Postconcert #gintonic.
I have already tried this code with some changes...
public class MyTokenizer {
/**
* @param args
*/
public static void main(String[] args) {
tokenize("Europe3.txt","allo.txt");
}
public static void tokenize(String sFile,String sFileOut) {
String sLine="", sToken="";
MyBufferedReaderWriter f = new MyBufferedReaderWriter();
f.openRFile(sFile);
MyBufferedReaderWriter fOut = new MyBufferedReaderWriter();
fOut.openWFile(sFileOut);
while ((sLine=f.readLine()) != null) {
//StringTokenizer st = new StringTokenizer(sLine, "#");
String[] tokens = sLine.split("\\#");
for (String token : tokens)
{
fOut.writeLine(token);
//System.out.println(token);
}
/*while (st.hasMoreTokens()) {
sToken = st.nextToken();
System.out.println(sToken);
}*/
}
f.closeRFile();
}
}
Can anyone help?

You can try something like this with a regex:
package com.stackoverflow.answers;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HashExtractor {
public static void main(String[] args) {
String strInput = "Buah... Molt bon concert!! #Postconcert #gintonic";
String strPattern = "(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)";
Pattern pattern = Pattern.compile(strPattern);
Matcher matcher = pattern.matcher(strInput);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
}

As per the given example, when using the split() function the values would be stored something like this:
tokens[0]=Buah... Molt bon concert!!
tokens[1]=Postconcert
tokens[2]=gintonic
So you just need to skip the first value and prepend '#' (if you need it in your output) to the other string values.
Hope this helps.

You have not specifically asked for this, but I assume you are trying to extract all the #hashtags from your text file.
To do this, Regex is your friend:
String text = "Buah... Molt bon concert!! #Postconcert #gintonic";
System.out.println(getHashTags(text));
public Collection<String> getHashTags(String text) {
Pattern pattern = Pattern.compile("(#\\w+)");
Matcher matcher = pattern.matcher(text);
Set<String> htags = new HashSet<>();
while (matcher.find()) {
htags.add(matcher.group(1));
}
return htags;
}
Compile a pattern like #\w+: everything that starts with a # followed by one or more (+) word characters (\w).
For Java, the \ has to be escaped as \\.
Finally, put the expression in a group by surrounding it with parentheses, (#\w+), to get access to the matched text.
For every match, add the first matched group to the set htags; in the end we get a set with all the hashtags in it:
[#gintonic, #Postconcert]
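Since a HashSet does not keep the tags in their original order, here is a small variation that returns them as they appear in the line, which is closer to the output the question asks for. It is a sketch only; HashtagFilter and hashtagsOnly are hypothetical names, and the sample lines are placeholders for lines read from the file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagFilter {
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Returns the hashtags of one line, space-separated, in order of appearance.
    static String hashtagsOnly(String line) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(line);
        while (m.find()) {
            tags.add(m.group());
        }
        return String.join(" ", tags);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "Buah... Molt bon concert!! #Postconcert #gintonic",
                "A line without any tag");
        for (String line : lines) {
            String tags = hashtagsOnly(line);
            if (!tags.isEmpty()) {          // skip lines that contain no '#'
                System.out.println(tags);
            }
        }
    }
}
```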

How to get full sentence using regex in java

For now I'm parsing PDFs using PDFBox; later I will be parsing other documents (.docx/.doc). Using PDFBox, I get the whole file content as one string. Now I want to get the complete sentence wherever a user-defined word matches.
For example:
... some text here..
Raman took more than 12 year to complete his schooling and now he
is pursuing higher study.
Relational Database.
... some text here ..
If the user gives the input year, then it should return the whole sentence.
Expected Output:
Raman took more than 12 year to complete his schooling and now he
is pursuing higher study.
I'm trying the code below, but it shows nothing. Can anyone correct this?
Pattern pattern = Pattern.compile("[\\w|\\W]*+[YEAR]+[\\w]*+.");
Also, if I have to match multiple words as an OR condition, what should I change in my regex?
Please note all words are in uppercase.

Do not try to put everything into a single regexp. There's a standard Java class, java.text.BreakIterator, which can be used to find the sentence boundaries.
public static String getSentence(String input, String word) {
Matcher matcher = Pattern.compile(word, Pattern.LITERAL | Pattern.CASE_INSENSITIVE)
.matcher(input);
if(matcher.find()) {
BreakIterator br = BreakIterator.getSentenceInstance(Locale.ENGLISH);
br.setText(input);
int start = br.preceding(matcher.start());
int end = br.following(matcher.end());
return input.substring(start, end);
}
return null;
}
Usage:
public static void main(String[] args) {
String input = "... some text...\n Raman took more than 12 year to complete his schooling and now he\nis pursuing higher study. Relational Database. \n... some text...";
System.out.println(getSentence(input, "YEAR"));
}

Pattern re = Pattern.compile("[^.!?\\s][^.!?]*(?:[.!?](?!['\"]?\\s|$) [^.!?]*)*[.!?]?['\"]?(?=\\s|$)", Pattern.MULTILINE | Pattern.COMMENTS);
Matcher reMatcher = re.matcher(result);
while (reMatcher.find()) {
System.out.println(reMatcher.group());
}

A small fix to @Tagir Valeev's answer to prevent index-out-of-bounds exceptions.
private String getSentence(String input, String word) {
Matcher matcher = Pattern.compile(word , Pattern.LITERAL | Pattern.CASE_INSENSITIVE)
.matcher(input);
if(matcher.find()) {
BreakIterator br = BreakIterator.getSentenceInstance(Locale.ENGLISH);
br.setText(input);
int start = br.preceding(matcher.start());
int end = br.following(matcher.end());
if(start == BreakIterator.DONE) {
start = 0;
}
if(end == BreakIterator.DONE) {
end = input.length();
}
return input.substring(start, end);
}
return null;
}
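The question also asks about matching several words as an OR condition. One possible sketch, under the assumption that every sentence containing any of the words should be returned: SentenceFinder and sentencesContaining are hypothetical names. It walks the sentence boundaries with BreakIterator's usual first()/next() loop and joins the words into a quoted alternation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SentenceFinder {
    // Returns every sentence that contains at least one of the given words.
    static List<String> sentencesContaining(String input, List<String> words) {
        List<String> result = new ArrayList<>();
        if (words.isEmpty()) {
            return result; // an empty alternation would match everything
        }
        // Pattern.quote makes each word literal; "|" expresses the OR condition.
        String alternation = words.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);

        BreakIterator br = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        br.setText(input);
        int start = br.first();
        for (int end = br.next(); end != BreakIterator.DONE; start = end, end = br.next()) {
            String sentence = input.substring(start, end);
            if (pattern.matcher(sentence).find()) {
                result.add(sentence.trim()); // drop the trailing whitespace of the segment
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "Intro text. Raman took more than 12 year to complete his schooling. Relational Database.";
        System.out.println(sentencesContaining(input, List.of("YEAR", "DATABASE")));
    }
}
```

Like the answers above, this matches substrings, so "year" also matches inside "years"; wrapping the alternation in \b...\b would restrict it to whole words.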

phone Number validation in java

I want to validate a phone number in the following way:
The field should allow the user to enter any characters and should auto-correct the input. So an entry of "+1-908-528-5656" would not produce an error for the user; it would just be changed to "19085285656".
I also want the number of digits to be between 9 and 11.
I tried the code below, but have not arrived at a final solution:
final String PHONE_REGEX = "^\\+([0-9\\-]?){9,11}[0-9]$";
final Pattern pattern = Pattern.compile(PHONE_REGEX);
String phone = "+1-908-528-5656";
phone=phone.replaceAll("[\\-\\+]", "");
System.out.println(phone);
final Matcher matcher = pattern.matcher(phone);
System.out.println(matcher.matches());

You can use simple String.matches(regex) to test any string against a regex pattern instead of using Pattern and Matcher classes.
Sample:
boolean isValid = phoneString.matches(regexPattern);
Find more examples
Here is the regex pattern as per your input string:
\+\d(-\d{3}){2}-\d{4}
Online demo
It may be better to use a Spring validation annotation for the validation.
Example

// A regex alone does not validate a mobile number that is in international format.
// The following code works for me.
// I have used the libphonenumber library from the links below to validate the number.
// http://repo1.maven.org/maven2/com/googlecode/libphonenumber/libphonenumber/8.0.1/
// https://github.com/googlei18n/libphonenumber
// Here is my source code.
public boolean isMobileNumberValid(String phoneNumber)
{
boolean isValid = false;
// Use the libphonenumber library to validate Number
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Phonenumber.PhoneNumber swissNumberProto =null ;
try {
swissNumberProto = phoneUtil.parse(phoneNumber, "CH");
} catch (NumberParseException e) {
System.err.println("NumberParseException was thrown: " + e.toString());
}
// swissNumberProto stays null if parsing failed, so guard against it
if(swissNumberProto != null && phoneUtil.isValidNumber(swissNumberProto))
{
isValid = true;
}
// The library failed to validate a number if it contains a '-' sign,
// so a regex is used as well to validate the mobile number.
String regex = "[0-9*#+() -]*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(phoneNumber);
if (matcher.matches()) {
isValid = true;
}
return isValid;
}

Assuming your input field takes any kind of character and you just want the digits:
String phone = "+1-908-528-5656";
phone=phone.replaceAll("[\\D]","");
// both bounds must hold, so this needs && rather than ||
if(phone.length()>=9 && phone.length()<=11)
System.out.println(phone);
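Wrapped into a reusable method, the strip-then-check idea might look like the sketch below. PhoneNormalizer and normalize are hypothetical names, and returning null for a rejected input is just one possible convention.

```java
class PhoneNormalizer {
    // Strips everything except digits; returns null unless 9 to 11 digits remain.
    static String normalize(String raw) {
        String digits = raw.replaceAll("\\D", "");
        boolean lengthOk = digits.length() >= 9 && digits.length() <= 11;
        return lengthOk ? digits : null;
    }

    public static void main(String[] args) {
        System.out.println(normalize("+1-908-528-5656")); // 19085285656
        System.out.println(normalize("12-34"));           // null
    }
}
```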

We can use String.matches(String regex) to validate phone numbers in Java.
Sample code snippet:
package regex;
public class Phone {
private static boolean isValid(String s) {
String regex = "\\d{3}-\\d{3}-\\d{4}"; // XXX-XXX-XXXX
return s.matches(regex);
}
public static void main(String[] args) {
System.out.println(isValid("123-456-7890"));
}
}
P.S. In a Java string literal, the regex pattern needs an extra '\' for escaping. (If you try to use "\d{3}-\d{3}-\d{4}" in a Java program, you will get a compile error.)

Assuming you want an optimization (which is what your comment suggests):
How about this? (The "0" is there to handle complete garbage without even a single digit.)
long parse(String phone){
// long rather than int: an 11-digit number does not fit into an int
long num = Long.parseLong("0"+phone.replaceAll("[^0-9]",""));
return 100000000L<=num&&num<100000000000L?num:-1;
}

I am not sure, but after removing the garbage characters (parentheses, spaces and hyphens), matching against ^((\+[1-9]?[0-9])|0)?[7-9][0-9]{9}$ may validate a mobile number:
private static final String PHONE_NUMBER_GARBAGE_REGEX = "[()\\s-]+";
private static final String PHONE_NUMBER_REGEX = "^((\\+[1-9]?[0-9])|0)?[7-9][0-9]{9}$";
private static final Pattern PHONE_NUMBER_PATTERN = Pattern.compile(PHONE_NUMBER_REGEX);
public static boolean validatePhoneNumber(String phoneNumber) {
return phoneNumber != null && PHONE_NUMBER_PATTERN.matcher(phoneNumber.replaceAll(PHONE_NUMBER_GARBAGE_REGEX, "")).matches();
}

One easy and simple to use java phone validation regex:
public static final String PHONE_VERIFICATION = "^[+0-9-\\(\\)\\s]*{6,14}$";
private static Pattern p;
private static Matcher m;
public static void main(String[] args)
{
//Phone validation
p = Pattern.compile(PHONE_VERIFICATION);
m = p.matcher("+1 212-788-8609");
boolean isPhoneValid = m.matches();
if(!isPhoneValid)
{
System.out.println("The Phone number is NOT valid!");
return;
}
System.out.println("The Phone number is valid!");
}

I have tested one regex against this combination of phone numbers:
(294) 784-4554
(247) 784 4554
(124)-784 4783
(124)-784-4783
(124) 784-4783
+1(202)555-0138
This regex is intended to cover these US number formats:
\d{10}|(?:\d{3}-){2}\d{4}|\(\d{3}\)\d{3}-?\d{4}|\(\d{3}\)-\d{3}-?\d{4}|\(\d{3}\) \d{3} ?\d{4}|\(\d{3}\)-\d{3} ?\d{4}|\(\d{3}\) \d{3}-?\d{4}
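To check which of the listed inputs this regex accepts, it can be run through both Matcher.matches() (the whole string must match) and Matcher.find() (a match anywhere is enough). The sketch below is only a test harness; the class and method names are made up. One thing it shows is that no alternative allows a "+1" prefix, so "+1(202)555-0138" is found only as a partial match.

```java
import java.util.List;
import java.util.regex.Pattern;

class UsPhoneFormats {
    // The regex from the answer, with backslashes doubled for a Java string literal.
    static final Pattern US_PHONE = Pattern.compile(
            "\\d{10}|(?:\\d{3}-){2}\\d{4}|\\(\\d{3}\\)\\d{3}-?\\d{4}|\\(\\d{3}\\)-\\d{3}-?\\d{4}"
            + "|\\(\\d{3}\\) \\d{3} ?\\d{4}|\\(\\d{3}\\)-\\d{3} ?\\d{4}|\\(\\d{3}\\) \\d{3}-?\\d{4}");

    // True only if the entire input is one of the formats.
    static boolean matchesWhole(String s) {
        return US_PHONE.matcher(s).matches();
    }

    // True if any part of the input is one of the formats.
    static boolean containsMatch(String s) {
        return US_PHONE.matcher(s).find();
    }

    public static void main(String[] args) {
        List<String> samples = List.of("(294) 784-4554", "(247) 784 4554", "(124)-784 4783",
                "(124)-784-4783", "(124) 784-4783", "+1(202)555-0138");
        for (String s : samples) {
            System.out.println(s + " matches=" + matchesWhole(s) + " find=" + containsMatch(s));
        }
    }
}
```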

Building on @k_g's answer, but for US numbers.
static boolean isValidTelephoneNumber(String number) {
long num = Long.parseLong("0" + number.replaceAll("[^0-9]", ""));
return 2000000000L <= num && num < 19999999999L;
}
public static void main(String[] args) {
var numbers = List.of("+1 212-788-8609", "212-788-8609", "1 212-788-8609", "12127888609", "2127888609",
"7143788", "736103355");
numbers.forEach(number -> {
boolean isPhoneValid = isValidTelephoneNumber(number);
log.debug(number + " matches = " + isPhoneValid);
});
}

Remove html from only a part of string

I have the following code, which should remove all HTML from the parts of a string that are enclosed in dollar signs (there could be several such parts). This works fine, but I also need to preserve those dollar signs. Any suggestions? Thanks.
private static String removeMarkupBetweenDollars(String input){
if ((input.length()-input.replaceAll("\\$","").length())%2!=0)
{
throw new RuntimeException("Missing or extra: dollar");
}
Pattern pattern = Pattern.compile("\\$(.*?)\\$",Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
StringBuffer sb =new StringBuffer();
while(matcher.find())
{ //prepending does NOT work, if sth. is in front of first dollar
matcher.appendReplacement(sb,matcher.group(1).replaceAll("\\<.*?\\>", ""));
sb.append("$"); //note this manual appending
}
matcher.appendTail(sb);
System.out.println(sb.toString());
return sb.toString();
}
Thanks for help!
String input="<p>$<em>something</em>$</p> <p>anything else</p>";
String output="<p>$something$</p> <p>anything else</p>";
More complicated input and output:
String input="<p>$ bar <b>foo</b>  bar <span style=\"text-decoration: underline;\">foo</span>  $</p><p>another foos</p> $ foo bar <em>bar</em>$";
String output="<p>$ bar foo  bar foo  $</p><p>another foos</p> $ foo bar bar$";

Just some minor tweaks to your code:
private static String removeMarkupBetweenDollars(String input) {
if ((input.length() - input.replaceAll("\\$", "").length()) % 2 != 0) {
throw new RuntimeException("Missing or extra: dollar");
}
Pattern pattern = Pattern.compile("\\$(.*?)\\$", Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String s = matcher.group().replaceAll("<[^>]+>", "");
matcher.appendReplacement(sb, Matcher.quoteReplacement(s));
}
matcher.appendTail(sb);
return sb.toString();
}
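On Java 9 or later, the same approach can be written more compactly with Matcher.replaceAll(Function), which removes the explicit StringBuffer loop. This is a sketch rather than a drop-in replacement: DollarMarkupStripper and strip are hypothetical names, and the balanced-dollar check from the code above is left out for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarMarkupStripper {
    // One $...$ span, dollars included; DOTALL lets a span run across line breaks.
    private static final Pattern DOLLAR_SPAN = Pattern.compile("\\$.*?\\$", Pattern.DOTALL);

    static String strip(String input) {
        // For each span, drop the tags and quote the result so '$' stays literal.
        return DOLLAR_SPAN.matcher(input).replaceAll(
                match -> Matcher.quoteReplacement(match.group().replaceAll("<[^>]+>", "")));
    }

    public static void main(String[] args) {
        String input = "<p>$<em>something</em>$</p> <p>anything else</p>";
        System.out.println(strip(input)); // <p>$something$</p> <p>anything else</p>
    }
}
```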

String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
One key point in the regex is the ? in .*? - it means a "non greedy" match, which in turn means "consume the least possible input you can". Without this, the regex would try to consume as much as possible - up to the end of a subsequent occurrence of $<html>foo</html>$ in the input if one existed.
Here's a test:
public static void main(String[] args) throws Exception {
String input = "<p>$<em>something</em>$</p> <p>and $<em>anything</em>$ else</p>";
String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
System.out.println(output);
}
Output:
<p>$something$</p> <p>and $anything$ else</p>
