Extract a specific word from a text in Java


I want to extract a particular word from a text using Java. Is that possible? For example:
String str = "this is 009876 birthday of mine";
I want to get '009876' from the text above.

You can do it easily with a regex. The example below finds every occurrence of the word, prints each match, and then prints the total count:
import java.util.regex.*;
class Test {
    public static void main(String[] args) {
        String hello = "this is 009876 birthday of mine";
        Pattern pattern = Pattern.compile("009876");
        Matcher matcher = pattern.matcher(hello);
        int count = 0;
        while (matcher.find()) {
            System.out.println(matcher.group()); // prints 009876
            count++;
        }
        System.out.println(count); // prints 1
    }
}
If you only want to check whether the text contains the source string (e.g. "009876"), you can simply use the contains method of String, as in the example below:
public static String search() {
    String text = "this is 009876 birthday of mine";
    String source = "009876";
    // Return the word when it is present in the text, otherwise null.
    if (text.contains(source))
        return source;
    else
        return null;
}
Let me know if you run into any issues.

You can do it like this:
import java.util.regex.Pattern;
class ExtractDesiredString {
    static String extractedString;
    public static void main(String[] args) {
        String hello = "this is 009876 birthday of mine";
        Pattern pattern = Pattern.compile("009876");
        // pattern.toString() returns the regex source, here the literal "009876"
        if (hello.contains(pattern.toString())) {
            extractedString = pattern.toString();
        } else {
            throw new IllegalStateException("Given string doesn't contain desired text");
        }
    }
}
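The answers above search for a word that is already known. If the goal is to pull out the number without knowing it in advance, one possible sketch is to match the first run of digits. The class name DigitWordExtractor and the method firstDigitWord are made up for this illustration, and the assumption that the wanted word consists only of digits comes from the example string in the question.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any answer above.
class DigitWordExtractor {
    // One or more digits that form a whole word (bounded by \b on both sides).
    private static final Pattern DIGIT_WORD = Pattern.compile("\\b\\d+\\b");

    // Returns the first all-digit word in the text, or an empty Optional if there is none.
    static Optional<String> firstDigitWord(String text) {
        Matcher m = DIGIT_WORD.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String str = "this is 009876 birthday of mine";
        System.out.println(firstDigitWord(str).orElse("no match")); // prints 009876
    }
}
```

Because the match is returned as a String, leading zeros such as those in 009876 are preserved.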

Related

Substring based on special characters

I have to fetch the table name and column names from a SQL statement. To do this, I split the FROM clause data on spaces and stored all the elements in a list. But now some of the columns are wrapped in function calls or other expressions.
For example, some of the columns and the result I need for each:
max(TableName1.ColumnName1) --> TableName1.ColumnName1
concat('Q',TableName2.ColumnName2)} --> TableName2.ColumnName2
left(convert(varchar(90),TableName3.ColumnName3),1)}) --> TableName3.ColumnName3
So now I only look at strings that contain a dot (.).
The dot is my only hint: based on it, I have to take the text to its left and right, up to the nearest special character.
The special characters I might encounter are , ( and ).

import java.util.regex.*;
public class Parser {
private static Pattern p = Pattern.compile("(?![\\(\\,])([^\\(\\)\\,]*\\.[^\\(\\)\\,]+)(?=[\\)\\,])");
private static String getColumnName(String s) {
Matcher m = p.matcher(s);
if (m.find()) {
return m.group(1);
}
return "";
}
public static void main(String []args) {
String s1= "max(TableName1.ColumnName1)";
System.out.println(getColumnName(s1));
String s2= "concat('Q',TableName2.ColumnName2)}";
System.out.println(getColumnName(s2));
String s3= "left(convert(varchar(90),TableName3.ColumnName3),1)})";
System.out.println(getColumnName(s3));
}
}
Output:
TableName1.ColumnName1
TableName2.ColumnName2
TableName3.ColumnName3

You can split the input on a regular expression like [(),{}] to break it into tokens, and then select the token that contains the "." character. For example:
public static String getColumnName(String input) {
    if (input == null || input.isEmpty()) return input;
    String[] tokens = input.split("[(),{}]");
    for (String token : tokens) {
        if (token.contains(".")) return token;
    }
    return input;
}
public static void main(String[] args) throws Exception {
    // The two tokens will be "max" and "TableName1.ColumnName1".
    String test1 = "max(TableName1.ColumnName1)";
    // The three tokens will be "concat", "'Q'" and "TableName2.ColumnName2".
    String test2 = "concat('Q',TableName2.ColumnName2)}";
    // The tokens will be "left", "convert", "varchar", "90", "",
    // "TableName3.ColumnName3", "" and "1" (trailing empty tokens are dropped).
    String test3 = "left(convert(varchar(90),TableName3.ColumnName3),1)})";
    System.out.println(getColumnName(test1));
    System.out.println(getColumnName(test2));
    System.out.println(getColumnName(test3));
}
The output will be:
TableName1.ColumnName1
TableName2.ColumnName2
TableName3.ColumnName3
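Both answers return only the first dotted name they find. If an expression can reference more than one column, a possible variation is to collect every identifier.identifier pair. This is only a sketch: the class name ColumnRefExtractor and the method columnRefs are invented here, and it assumes table and column names are plain identifiers (letters, digits, underscore) with no quoting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes unquoted identifiers of the form Table.Column.
class ColumnRefExtractor {
    // identifier '.' identifier, e.g. TableName1.ColumnName1
    private static final Pattern QUALIFIED_NAME =
            Pattern.compile("[A-Za-z_]\\w*\\.[A-Za-z_]\\w*");

    // Returns every Table.Column reference in the expression, in order of appearance.
    static List<String> columnRefs(String expression) {
        List<String> refs = new ArrayList<>();
        Matcher m = QUALIFIED_NAME.matcher(expression);
        while (m.find()) {
            refs.add(m.group());
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(columnRefs("left(convert(varchar(90),TableName3.ColumnName3),1)})"));
        System.out.println(columnRefs("concat(TableName1.ColumnName1,TableName2.ColumnName2)"));
    }
}
```

Note that a dotted name inside a string literal such as 'a.b' would also be picked up, so real SQL would need a proper parser.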

Java tokenizer for strings

I have a text file and want to tokenize its lines -- but only the sentences with the # character.
For example, given...
Buah... Molt bon concert!! #Postconcert #gintonic
...I want to print only #Postconcert #gintonic.
I have already tried this code with some changes...
public class MyTokenizer {
/**
* @param args
*/
public static void main(String[] args) {
tokenize("Europe3.txt","allo.txt");
}
public static void tokenize(String sFile,String sFileOut) {
String sLine="", sToken="";
MyBufferedReaderWriter f = new MyBufferedReaderWriter();
f.openRFile(sFile);
MyBufferedReaderWriter fOut = new MyBufferedReaderWriter();
fOut.openWFile(sFileOut);
while ((sLine=f.readLine()) != null) {
//StringTokenizer st = new StringTokenizer(sLine, "#");
String[] tokens = sLine.split("\\#");
for (String token : tokens)
{
fOut.writeLine(token);
//System.out.println(token);
}
/*while (st.hasMoreTokens()) {
sToken = st.nextToken();
System.out.println(sToken);
}*/
}
f.closeRFile();
}
}
Can anyone help?

You can try something like this with a regex:
package com.stackoverflow.answers;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class HashExtractor {
public static void main(String[] args) {
String strInput = "Buah... Molt bon concert!! #Postconcert #gintonic";
String strPattern = "(?:\\s|\\A)[##]+([A-Za-z0-9-_]+)";
Pattern pattern = Pattern.compile(strPattern);
Matcher matcher = pattern.matcher(strInput);
while (matcher.find()) {
// group() includes the whitespace matched in front of the tag, so trim it
System.out.println(matcher.group().trim());
}
}
}

With the given example, the split() call stores the values like this:
tokens[0]=Buah... Molt bon concert!!
tokens[1]=Postconcert
tokens[2]=gintonic
So you just need to skip the first value and prepend '#' (if you need it in your output) to the remaining values.
Hope this helps.
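The skip-the-first-token idea can be sketched as follows. The class name SplitHashTags and the method hashTags are made up for this illustration, and it assumes, as in the sample line, that each tag is followed only by whitespace or the next tag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the split-based approach.
class SplitHashTags {
    // Splits the line on '#', drops the text before the first '#',
    // and puts the '#' back in front of each remaining token.
    static List<String> hashTags(String line) {
        List<String> tags = new ArrayList<>();
        String[] tokens = line.split("#");
        // tokens[0] is the text before the first '#', so start at index 1
        for (int i = 1; i < tokens.length; i++) {
            String tag = tokens[i].trim();
            if (!tag.isEmpty()) {
                tags.add("#" + tag);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "Buah... Molt bon concert!! #Postconcert #gintonic";
        System.out.println(String.join(" ", hashTags(line))); // prints #Postconcert #gintonic
    }
}
```

If ordinary words can follow a tag on the same line, they would end up inside the token, so a regex-based approach is the safer choice in that case.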

You have not specifically asked for this, but I assume you are trying to extract all the #hashtags from your text file.
To do this, regex is your friend:
String text = "Buah... Molt bon concert!! #Postconcert #gintonic";
System.out.println(getHashTags(text));
public Collection<String> getHashTags(String text) {
    Pattern pattern = Pattern.compile("(#\\w+)");
    Matcher matcher = pattern.matcher(text);
    Set<String> htags = new HashSet<>();
    while (matcher.find()) {
        htags.add(matcher.group(1));
    }
    return htags;
}
The pattern #\w+ matches everything that starts with a # followed by one or more (+) word characters (\w).
In a Java string literal, the \ has to be escaped as \\.
Finally, the expression is wrapped in parentheses, (#\w+), to form a group that gives access to the matched text.
For every match, the first group is added to the set htags, so in the end we get a set with all the hashtags in it (a HashSet does not guarantee any particular order):
[#gintonic, #Postconcert]

Phone number validation in Java

I want to validate a phone number in the following way:
The field should allow the user to enter any characters and should auto-correct the entry. So an entry of "+1-908-528-5656" would not produce an error for the user; it would just be changed to "19085285656".
I also want the number to be between 9 and 11 digits long.
I tried the code below, but could not get to a final solution:
final String PHONE_REGEX = "^\\+([0-9\\-]?){9,11}[0-9]$";
final Pattern pattern = Pattern.compile(PHONE_REGEX);
String phone = "+1-908-528-5656";
phone=phone.replaceAll("[\\-\\+]", "");
System.out.println(phone);
final Matcher matcher = pattern.matcher(phone);
System.out.println(matcher.matches());

You can use String.matches(regex) to test any string against a regex pattern instead of using the Pattern and Matcher classes.
Sample:
boolean isValid = phoneString.matches(regexPattern);
Here is a regex pattern for your input string:
\+\d(-\d{3}){2}-\d{4}
Alternatively, consider using a Spring validation annotation for the validation.
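To tie String.matches back to the two requirements in the question (auto-correct the entry, then accept 9 to 11 digits), here is a minimal sketch. The class name PhoneNormalizer and its methods are invented for this example, and the 9 to 11 digit range is taken from the question rather than from any telephone standard.

```java
// Hypothetical helper; the 9 to 11 digit rule is the asker's requirement.
class PhoneNormalizer {
    // Strips every non-digit character, e.g. "+1-908-528-5656" becomes "19085285656".
    static String normalize(String input) {
        return input.replaceAll("\\D", "");
    }

    // A number is accepted when 9 to 11 digits remain after normalization.
    static boolean isValid(String input) {
        return normalize(input).matches("\\d{9,11}");
    }

    public static void main(String[] args) {
        String phone = "+1-908-528-5656";
        System.out.println(normalize(phone)); // prints 19085285656
        System.out.println(isValid(phone));   // prints true
        System.out.println(isValid("12345")); // prints false
    }
}
```

Normalizing first keeps the validation pattern trivial, because every formatting variant collapses to the same digit string before it is checked.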

// A regex alone does not validate mobile numbers in international format.
// The following code works for me.
// I used the libphonenumber library to validate the number:
// http://repo1.maven.org/maven2/com/googlecode/libphonenumber/libphonenumber/8.0.1/
// https://github.com/googlei18n/libphonenumber
// Here is my source code.
public boolean isMobileNumberValid(String phoneNumber)
{
    boolean isValid = false;
    // Use the libphonenumber library to validate the number
    PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
    try {
        Phonenumber.PhoneNumber swissNumberProto = phoneUtil.parse(phoneNumber, "CH");
        // Only validate when parsing succeeded; a null proto would cause a NullPointerException
        isValid = phoneUtil.isValidNumber(swissNumberProto);
    } catch (NumberParseException e) {
        System.err.println("NumberParseException was thrown: " + e.toString());
    }
    // The library failed to validate the number if it contains a - sign,
    // so fall back to a regex check.
    String regex = "[0-9*#+() -]*";
    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(phoneNumber);
    if (matcher.matches()) {
        isValid = true;
    }
    return isValid;
}

Assuming your input field accepts any kind of character and you just want the digits:
String phone = "+1-908-528-5656";
phone = phone.replaceAll("\\D", "");
// Accept only when the digit count is between 9 and 11
if (phone.length() >= 9 && phone.length() <= 11)
System.out.println(phone);

We can use String.matches(String regex) to validate phone numbers in Java.
Sample code snippet:
package regex;
public class Phone {
private static boolean isValid(String s) {
String regex = "\\d{3}-\\d{3}-\\d{4}"; // XXX-XXX-XXXX
return s.matches(regex);
}
public static void main(String[] args) {
System.out.println(isValid("123-456-7890"));
}
}
P.S. In a Java string literal, the regex pattern needs an extra '\' for escaping. (If you try to use "\d{3}-\d{3}-\d{4}" in a Java program, you will get a compile error.)

Assuming you want an optimization (which is what your comment suggests), how about this? (The "0" prefix handles complete garbage without even a single digit.)
long parse(String phone) {
    // long is needed because an 11-digit number does not fit in an int
    long num = Long.parseLong("0" + phone.replaceAll("[^0-9]", ""));
    return 100000000L <= num && num < 100000000000L ? num : -1;
}

I am not sure, but after removing the garbage characters (parentheses, spaces and hyphens), you may be able to validate a mobile number by matching it against ^((\+[1-9]?[0-9])|0)?[7-9][0-9]{9}$:
private static final String PHONE_NUMBER_GARBAGE_REGEX = "[()\\s-]+";
private static final String PHONE_NUMBER_REGEX = "^((\\+[1-9]?[0-9])|0)?[7-9][0-9]{9}$";
private static final Pattern PHONE_NUMBER_PATTERN = Pattern.compile(PHONE_NUMBER_REGEX);
public static boolean validatePhoneNumber(String phoneNumber) {
return phoneNumber != null && PHONE_NUMBER_PATTERN.matcher(phoneNumber.replaceAll(PHONE_NUMBER_GARBAGE_REGEX, "")).matches();
}

A simple, easy-to-use Java phone validation regex:
public static final String PHONE_VERIFICATION = "^[+0-9-\\(\\)\\s]*{6,14}$";
private static Pattern p;
private static Matcher m;
public static void main(String[] args)
{
//Phone validation
p = Pattern.compile(PHONE_VERIFICATION);
m = p.matcher("+1 212-788-8609");
boolean isPhoneValid = m.matches();
if(!isPhoneValid)
{
System.out.println("The Phone number is NOT valid!");
return;
}
System.out.println("The Phone number is valid!");
}

I have tested one regex against this combination of phone numbers:
(294) 784-4554
(247) 784 4554
(124)-784 4783
(124)-784-4783
(124) 784-4783
+1(202)555-0138
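This regex matches all of the US number formats listed above; the last one, with the +1 prefix, is matched only from the opening parenthesis onward, because the pattern has no alternative for the country code: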
\d{10}|(?:\d{3}-){2}\d{4}|\(\d{3}\)\d{3}-?\d{4}|\(\d{3}\)-\d{3}-?\d{4}|\(\d{3}\) \d{3} ?\d{4}|\(\d{3}\)-\d{3} ?\d{4}|\(\d{3}\) \d{3}-?\d{4}
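A quick way to check which of the listed formats the pattern accepts is to run it against each sample, once with matches() (the whole string must match) and once with find() (any substring may match). The class name UsPhoneRegexCheck and its two helper methods are invented for this sketch; the pattern itself is copied from the answer, with backslashes doubled for the Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical test harness for the regex quoted above.
class UsPhoneRegexCheck {
    static final Pattern US_PHONE = Pattern.compile(
            "\\d{10}|(?:\\d{3}-){2}\\d{4}|\\(\\d{3}\\)\\d{3}-?\\d{4}|\\(\\d{3}\\)-\\d{3}-?\\d{4}"
            + "|\\(\\d{3}\\) \\d{3} ?\\d{4}|\\(\\d{3}\\)-\\d{3} ?\\d{4}|\\(\\d{3}\\) \\d{3}-?\\d{4}");

    // True only if the entire string is a phone number in one of the formats.
    static boolean matchesWhole(String s) {
        return US_PHONE.matcher(s).matches();
    }

    // True if some part of the string is a phone number in one of the formats.
    static boolean matchesPart(String s) {
        return US_PHONE.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "(294) 784-4554", "(247) 784 4554", "(124)-784 4783",
            "(124)-784-4783", "(124) 784-4783", "+1(202)555-0138"
        };
        for (String s : samples) {
            System.out.println(s + " whole=" + matchesWhole(s) + " part=" + matchesPart(s));
        }
    }
}
```

For form validation matches() is usually the one to use, since find() would also accept a valid number embedded in arbitrary surrounding text.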

Building on @k_g's answer, but for US numbers:
static boolean isValidTelephoneNumber(String number) {
long num = Long.parseLong("0" + number.replaceAll("[^0-9]", ""));
return 2000000000L <= num && num < 19999999999L;
}
public static void main(String[] args) {
var numbers = List.of("+1 212-788-8609", "212-788-8609", "1 212-788-8609", "12127888609", "2127888609",
"7143788", "736103355");
numbers.forEach(number -> {
boolean isPhoneValid = isValidTelephoneNumber(number);
System.out.println(number + " matches = " + isPhoneValid);
});
}

How do you replace groups in a regular expression?

How, exactly, do you replace groups while appending them to a string buffer?
For Example:
(a)(b)(c)
How can you replace group 1 with d, group 2 with e and so on?
I'm working with the Java regex engine.
Thanks in advance.

You could use Matcher's appendReplacement.
Here is an example:
input: "hello bob How is your cat?"
regular expression: "(bob|cat)"
output: "hello alice How is your dog?"
public static void main(String[] args) {
Pattern p = Pattern.compile("(bob|cat)");
Matcher m = p.matcher("hello bob How is your cat?");
StringBuffer s = new StringBuffer();
while (m.find()) {
m.appendReplacement(s, doReplace(m.group(1)));
}
m.appendTail(s);
System.out.println(s.toString());
}
public static String doReplace(String s) {
if(s.equals("bob")) {
return "alice";
}
if(s.equals("cat")) {
return "dog";
}
return "";
}

You could use Matcher#start(group) and Matcher#end(group) to build a generic replacement method:
public static String replaceGroup(String regex, String source, int groupToReplace, String replacement) {
return replaceGroup(regex, source, groupToReplace, 1, replacement);
}
public static String replaceGroup(String regex, String source, int groupToReplace, int groupOccurrence, String replacement) {
Matcher m = Pattern.compile(regex).matcher(source);
for (int i = 0; i < groupOccurrence; i++)
if (!m.find()) return source; // pattern not met, may also throw an exception here
return new StringBuilder(source).replace(m.start(groupToReplace), m.end(groupToReplace), replacement).toString();
}
public static void main(String[] args) {
// replace with "%" what was matched by group 1
// input: aaa123ccc
// output: %123ccc
System.out.println(replaceGroup("([a-z]+)([0-9]+)([a-z]+)", "aaa123ccc", 1, "%"));
// replace with "!!!" what was matched the 4th time by the group 2
// input: a1b2c3d4e5
// output: a1b2c3d!!!e5
System.out.println(replaceGroup("([a-z])(\\d)", "a1b2c3d4e5", 2, 4, "!!!"));
}

Are you looking for something like this?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Program1 {
public static void main(String[] args) {
Pattern p = Pattern.compile("(a)(b)(c)");
String str = "111abc222abc333";
String out = null;
Matcher m = p.matcher(str);
out = m.replaceAll("z$3y$2x$1");
System.out.println(out);
}
}
This gives 111zcybxa222zcybxa333 as output.
I think you can see what this example does.
That said, as far as I know there is no ready-made built-in method that lets you say, for example:
- replace group 3 with zzz
- replace group 2 with yyy
- replace group 1 with xxx
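A per-group replacement of that kind can be written by hand with Matcher#start(group) and Matcher#end(group). The following is only a sketch: the class name GroupReplacer and the method replaceGroups are made up, and it assumes the capturing groups are not nested and appear left to right in the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes non-nested groups in left-to-right order.
class GroupReplacer {
    // Replaces group i of every match with replacements[i - 1].
    // Text outside the groups, inside or between matches, is copied unchanged.
    static String replaceGroups(String regex, String source, String... replacements) {
        Matcher m = Pattern.compile(regex).matcher(source);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the source text handled so far
        while (m.find()) {
            for (int g = 1; g <= m.groupCount() && g <= replacements.length; g++) {
                if (m.start(g) < 0) {
                    continue; // this group did not take part in the match
                }
                out.append(source, last, m.start(g)).append(replacements[g - 1]);
                last = m.end(g);
            }
            out.append(source, last, m.end());
            last = m.end();
        }
        return out.append(source, last, source.length()).toString();
    }

    public static void main(String[] args) {
        // group 1 -> d, group 2 -> e, group 3 -> f
        System.out.println(replaceGroups("(a)(b)(c)", "111abc222abc333", "d", "e", "f"));
        // prints 111def222def333
    }
}
```

When every group has a fixed replacement and the groups cover the whole match, a plain replaceAll with the concatenated replacements gives the same result; the helper is only useful when some of the matched text lies outside the groups.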

How to remove everything from HTML except special tag in java?

I want to parse an HTML string by extracting only <form> ... </form>. Everything else is not needed and can be removed.
Currently I have some helpers that remove the content of a specific tag through replaceAll, like:
/** remove form */
String newString = string.replaceAll("(?s)<form.*?</form>", "");
The pattern (?s)<form.*?</form> removes form tags. But I need the opposite: remove everything except the form.
How can I do that?

Try the code below.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Client {
private static final String PATTERN = "<form>(.+?)</form>";
private static final Pattern REGEX = Pattern.compile(PATTERN);
private static final boolean ONLY_TAG = true;
public static void main(String[] args) {
String text = "Hello <form><span><table>Hello Rais</table></span></form> end";
System.out.println(getValues(text, ONLY_TAG));
System.out.println(getValues(text, !ONLY_TAG));
}
private static String getValues(final String text, boolean flag) {
final Matcher matcher = REGEX.matcher(text);
String tagValues = null;
if (flag) {
if (matcher.find()) {
tagValues = "<form>" + matcher.group(1) + "</form>";
}
} else {
tagValues = text.replaceAll(PATTERN, "");
}
return tagValues;
}
}
You will get the following output:
<form><span><table>Hello Rais</table></span></form>
Hello  end

The code below should point you in the right direction. It keeps every <form> block (captured in group 1) and drops all other text:
String str = "<html><form>test form</form></html>";
String newString = str.replaceAll("(?s)(?:.*?(<form.*?</form>)|.+)", "$1");
System.out.println(newString); // prints <form>test form</form>
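Another way to keep only the forms is to collect the matches instead of deleting everything around them. The sketch below assumes forms are not nested and do not appear inside comments or scripts; the class name FormExtractor and the method forms are invented for this illustration. Regex-based HTML handling is fragile in general, so a real HTML parser is the more robust choice for arbitrary pages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; assumes non-nested <form> elements.
class FormExtractor {
    // (?i) ignores case, (?s) lets '.' match line breaks,
    // \b stops the match from starting at longer tag names such as <formula>.
    private static final Pattern FORM = Pattern.compile("(?is)<form\\b.*?</form>");

    // Returns every <form>...</form> block, including attributes and content.
    static List<String> forms(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = FORM.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>intro</p>"
                + "<form action=\"/a\">\n<input name=\"q\">\n</form>"
                + "<div>x</div></body></html>";
        // Joining the matches gives the document with everything except the forms removed.
        System.out.println(String.join("", forms(html)));
    }
}
```

Unlike the <form>(.+?)</form> pattern used earlier, this one also accepts attributes on the opening tag and content that spans several lines.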
