Using Multiple Java regular expressions - java

I am trying to find an email address and replace it with a space using a pattern (EMAIL_PATTERN). When I run the following with a full document passed in, no output is produced. The pattern only matches when it covers the entire input, so if I pass in only the email address, it is matched and replaced with a space. But the purpose of the method is to find the email address itself, without extracting it manually beforehand. After the email in tempString has been replaced, I want to use the result as input for the next pattern. Should I combine the patterns I want to use in one method, or should they be placed in separate methods? Below is the code I have so far. I also have other patterns, but since my method is not working correctly I have not posted them yet.
private static final String EMAIL_PATTERN = "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public static void main (String[] args) {
//Document takes in a ID, student information(which includes email, address, phone, name), school, and text
Document r = new Document("", "FirstName LastName, Address, example#email.com, phoneNumber", "School", "experience", "text");
personalEmailZone(r);
}
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
}
return doc;
}

You have several problems:
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
The above statement attempts to match the entire document against the email address pattern. This will not match unless doc.tempString() contains ONLY a single email address and nothing else.
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
Not sure what the above is for. If your code ever reached this point, it would always print an empty line.
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
Since there's no loop, you will have replaced only the first occurrence of an email address. If you're expecting to replace ALL occurrences, you need to loop over the input.
}
return doc;
At this point you haven't actually modified doc, so you're returning the document in its original form, with email addresses included.
}
Look at the Javadoc for String#replaceAll(String regex, String replacement)
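As a rough sketch of the points above: compile the pattern once, let the Matcher search inside the text instead of calling String.matches, and replace every occurrence in one call. The class and method names are made up for illustration, and the pattern keeps the # separator exactly as posted in the question. Storing the result back into the Document is left out, because the question does not show that class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailScrubber {
    // Same pattern as in the question, compiled once and reused.
    private static final Pattern EMAIL = Pattern.compile(
        "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Returns the text with every email address replaced by a single space.
    static String removeEmails(String text) {
        Matcher matcher = EMAIL.matcher(text);
        // replaceAll scans the whole input for matches, unlike String.matches,
        // which only succeeds when the entire string is an email address.
        return matcher.replaceAll(" ");
    }

    public static void main(String[] args) {
        String info = "FirstName LastName, Address, example#email.com, phoneNumber";
        System.out.println(removeEmails(info));
    }
}
```

The caller still has to keep the returned string; the original text is not modified.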

You can place your patterns in different methods, each of which returns the modified string for the next pattern to use. For example:
String tempPI = doc.tempString();
tempPI = applyPattern1(tempPI);
tempPI = applyPattern2(tempPI);
tempPI = applyPattern3(tempPI);
Your code doesn't show any output because of doc.tempString().matches(EMAIL_PATTERN) == true. That check is probably not needed, since it requires the entire string to be an email address.
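Here is a minimal sketch of that chaining, assuming one small method per pattern. The email pattern is the one from the question; the phone pattern, the class name and the method names are hypothetical and only there to show how the output of one step feeds the next.

```java
import java.util.regex.Pattern;

class PatternChain {
    // Email pattern taken from the question ('#' kept as posted).
    private static final Pattern EMAIL = Pattern.compile(
        "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");

    // Hypothetical second pattern: phone numbers written as 555-123-4567.
    private static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    static String removeEmails(String text) {
        return EMAIL.matcher(text).replaceAll(" ");
    }

    static String removePhones(String text) {
        return PHONE.matcher(text).replaceAll(" ");
    }

    // Each step receives the string produced by the previous step.
    static String clean(String text) {
        String result = removeEmails(text);
        result = removePhones(result);
        return result;
    }
}
```

Keeping one method per pattern makes each step easy to test on its own, and the order of the calls stays visible in one place.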

Related

Regex Redirect URL excludes token

I'm trying to create a redirect URL for my client. We have a service where you specify "fromUrl" -> "toUrl", and it uses a Java regex Matcher. But I can't get it to include the token when it converts the URL. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns like: " ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*) etc" but none of them seems to work.
I'm not that familiar with regex. How can I solve this?
This can be easily done using simple string replace but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf ";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);
If I understand you correctly you need something like this?
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This will replace everything between the first and the last forward slash.
Can you detail more exactly what the original expression is like? This is necessary because the regular expression is based on it.
Assuming that only the first occurrence of fromurl needs to be replaced, you can use the following code:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpresion = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpresion);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
to = from.replaceAll(regularExpresion, "$1tourl$3");
}
NOTE: the pre and pos groups are placeholders, because I don't know the real expression for the URL.
NOTE 2: $1 and $3 refer to the first and the third group.
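To make the group references concrete, here is a small sketch with the placeholders swapped for an assumed layout: a leading slash, the segment to replace, then the rest of the path including the query string. That layout, the class name and the method name are assumptions for illustration, not the real service configuration.

```java
class RedirectRewriter {
    static String rewrite(String from) {
        // Group 1: leading slash, group 2: the segment to replace,
        // group 3: everything after it, including the ?token=... part.
        String regex = "^(/)(fromurl)(/.*)$";
        // $1 and $3 copy groups 1 and 3 into the result; group 2 is dropped
        // in favour of the literal "tourl". Input that does not match the
        // pattern is returned unchanged by replaceAll.
        return from.replaceAll(regex, "$1tourl$3");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/fromurl/login?token=abc%2Fdef"));
    }
}
```

Because replaceAll leaves non-matching input untouched, a separate matches() check before the replacement is not strictly needed.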
The existing answers should solve the issue, and some are similar, but the solution below may also help. It uses a fairly simple regex (assuming your input always has the same format as your example):
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
You can write a test to check whether it works for other expected inputs and outputs, in case you want to change the format and adjust the regex:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}

parsing string to get content

I have the following html string:
<h3>I only want this content</h3> I don't want this content <b>random content</b>
And I would like to only get the content from the h3 tags and remove the other content. I have the following:
String getArticleBody = listArt.getChildText("body");
StringBuilder mainArticle = new StringBuilder();
String getSubHeadlineFromArticle;
if(getArticleBody.startsWith("<h3>") && getArticleBody.endsWith("</h3>")){
mainArticle.append(getSubHeadlineFromArticle);
}
But this returns the whole content, which is not what I am after. If someone could help me that would be great thanks.
Thanks, guys. All your answers worked, but I ended up using Jsoup.
String getArticleBody = listArt.getChildText("body");
org.jsoup.nodes.Document docc = Jsoup.parse(getArticleBody);
org.jsoup.nodes.Element h3Tag = docc.getElementsByTag("h3").first();
String getSubHeadlineFromArticle = h3Tag.text();
You can use substring method like this -
String a="<h3>I only want this content</h3> I don't want this content <b>random content</b>";
System.out.println(a.substring(a.indexOf("<h3>")+4,a.indexOf("</h3>")));
Output -
I only want this content
Try with this
String result = getArticleBody.substring(getArticleBody.indexOf("<h3>"), getArticleBody.indexOf("</h3>"))
.replaceFirst("<h3>", "");
System.out.println(result);
Using a regular expression may help you:
String str = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
final Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
final Matcher matcher = pattern.matcher(str);
// group(1) is only available after a successful find()
if (matcher.find()) {
System.out.println(matcher.group(1)); // Prints the text between the h3 tags
}
Output :
I only want this content
You need to use regex like this:
public static void main(String[] args) {
String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
// your pattern goes here
// ? is important since you need to catch the nearest closing tag
Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) System.out.println(matcher.group(1));
}
matcher.group(1) returns exactly text between h3 tags.
The other answers already cover how to get the result you want. I'm gonna comment your code to explain why it isn't doing that already. (Note that I modified your variable names because strings don't get anything; they are a thing.)
// declare a bunch of variables
String articleBody = listArt.getChildText("body");
StringBuilder mainArticle = new StringBuilder();
String subHeadlineFromArticle;
// check to see if the article body consists entirely of a subheadline
if(articleBody.startsWith("<h3>") && articleBody.endsWith("</h3>")){
// if it does, append subHeadlineFromArticle, which has never been assigned a value
// (as a local variable this does not compile; as a field it would be null)
mainArticle.append(subHeadlineFromArticle);
}
// if it doesn't, don't do anything
// final result:
// articleBody = the entire article body
// mainArticle = nothing extracted from the article (regardless of whether you appended anything)
// subHeadlineFromArticle = never assigned

If a string contains a letter, return the entire String

Weird one but:
Let's say you've a huge html page and if the page contains an email address (looking for an # sign) you want to return that email.
So far I know I need something like this:
String email;
if (myString.contains("#")) {
email = myString.substring("#")
}
I know how to get to the # but how do I go back in the string to find what's before it etc?
If myString is already just the email string you received from the HTML page, you can return that same string when it contains #, something like below:
String email;
if (myString.contains("#")) {
email = myString;
}
What's the challenge here? Can you explain what makes this difficult?
This method will give you a list of all the email addresses contained in a string.
static ArrayList<String> getEmailAdresses(String str) {
ArrayList<String> result = new ArrayList<>();
Matcher m = Pattern.compile("\\S+?#[^. ]+(\\.[^. ]+)*").matcher(str.replaceAll("\\s", " "));
while(m.find()) {
result.add(m.group());
}
return result;
}
String email = null;
if (myString.contains("#")) {
// Locate the #
int atLocation = myString.indexOf("#");
// Get the string before the #
String start = myString.substring(0, atLocation);
// Keep only what follows the last space before the # (lastIndexOf returns -1 if there is none, so + 1 gives 0)
start = start.substring(start.lastIndexOf(" ") + 1);
// Get the string after the #
String end = myString.substring(atLocation + 1);
// Cut at the first space after the #, if there is one
int spaceLocation = end.indexOf(" ");
if (spaceLocation != -1) {
end = end.substring(0, spaceLocation);
}
// Stick it all together
email = start + "#" + end;
}
This may be a little off as I've been writing javascript all day. :)
Rather than exact code, I would like to give you an approach.
Checking just for the # symbol might not be appropriate, as it can also appear in places other than email addresses.
Search the internet for a regex pattern that matches an email address, or create your own.
(If you want, you can add a check for email providers as well.) Here is a link: http://www.mkyong.com/regular-expressions/how-to-validate-email-address-with-regular-expression/
Use the regex to get the index of the match in the string, and take the substring at that position (the email in your case).
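The approach could be sketched roughly like this. The pattern is a deliberately simple one assumed for illustration (it is not a full email validator) and it keeps the # separator used in the question; the class and method names are made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // Simplified pattern: word characters, dots and hyphens around the '#',
    // followed by at least one dot-separated part.
    private static final Pattern EMAIL = Pattern.compile("[\\w.-]+#[\\w-]+(\\.[\\w-]+)+");

    // Returns the first email-like substring in the page, or null if none is found.
    static String firstEmail(String page) {
        Matcher m = EMAIL.matcher(page);
        if (!m.find()) {
            return null;
        }
        // start() and end() are the indexes of the match inside the page.
        return page.substring(m.start(), m.end());
    }

    public static void main(String[] args) {
        System.out.println(firstEmail("<p>Contact: jane.doe#example.com today</p>"));
    }
}
```

Calling m.group() would return the same substring; start() and end() are used here only because the approach talks about the index of the match.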

android - search and replace in string java android

This is a part of a string
test="some text" test2="othertext"
It contains a lot more similar text with the same formatting. Each "statement" is separated by a space.
How do I search by name (test, test2) and replace its value (the stuff between the "")?
In Java.
I don't know if it's clear enough, but I don't know how else to explain it.
I want to search for "test" and replace its content with something else.
replace
test="some text" test2="othertext"
with something else
Edit:
This is a content of a file
test="some text" test2="othertext"
I read the content of that file into a string.
Now I want to replace some text with something else.
some text is not static; it can be anything.
You can use the replace() method of String, which comes in 3 types and 4 variants:
revStr.replace(oldChar, newChar)
revStr.replace(target, replacement)
revStr.replaceAll(regex, replacement)
revStr.replaceFirst(regex, replacement)
Eg:
String myString = "Here is the home of the home of the Stars";
myString = myString.replace("home","heaven");
///////////////////// Edited Part //////////////////////////////////////
String s = "The quick brown fox test =\"jumped over\" the \"lazy\" dog";
// [^\"]* stops at the closing quote of the value; a greedy .* would run on to the last quote in the string
// \\b keeps the pattern from matching names that merely end in test
Pattern pat = Pattern.compile("\\btest\\s*=\\s*\"[^\"]*\"");
Matcher mat = pat.matcher(s);
// Replace every test="..." assignment with the new value
String t = mat.replaceAll("test=\"Hello\"");
System.out.println(t);
So you want to replace every instance of "test" with something else?
Let's say the string name is myString:
myString = myString.replace("test","something else");
Is this what you are looking to do?
I think you are asking how to modify data that you read from a file as a string.
Let's suppose your string is:
String s = "My name=\"sahil\" and my company=\"microsoft\", also i live in country=\"india\"";
Now you want to replace "sahil" with "mahajan" and "microsoft" with "google".
I tried experimenting with the String methods to implement this, but didn't find a relevant result. I can suggest some methods, though: you could use regionMatches or indexOf("name=\""). These functions will help you find where sahil (for example) is located, but the replace function is difficult to use here, because it replaces a character sequence, and for that you need to know the exact character sequence.
You might try experimenting with the String methods yourself. It could help.
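One way around having to know the exact old value is a regular expression that matches whatever sits between the quotes after a given name. The sketch below assumes the name="value" format from the question with no escaped quotes inside values; the class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttributeReplacer {
    // Replaces the quoted value that follows name= with newValue.
    static String replaceValue(String text, String name, String newValue) {
        // \b stops "test" from matching inside a longer name such as "mytest";
        // Pattern.quote makes sure the name is treated as literal text.
        // Group 1 is name=" and group 2 is the closing quote.
        Pattern p = Pattern.compile("\\b(" + Pattern.quote(name) + "=\")[^\"]*(\")");
        // quoteReplacement keeps $ and \ in newValue from being read as group references.
        return p.matcher(text).replaceAll("$1" + Matcher.quoteReplacement(newValue) + "$2");
    }

    public static void main(String[] args) {
        String s = "test=\"some text\" test2=\"othertext\"";
        System.out.println(replaceValue(s, "test", "something else"));
    }
}
```

Searching for test does not touch test2, because the pattern requires the = sign directly after the name.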
I haven't tested this, but it should work:
String mFileContents;
private void replaceValue(String name, String newValue) {
// Search for name=" so that looking for test does not stop at test2
String key = name + "=\"";
int nameIndex = mFileContents.indexOf(key);
if (nameIndex == -1) {
return; // name not found, nothing to replace
}
int oldValueStart = nameIndex + key.length();
int oldValueEnd = mFileContents.indexOf("\"", oldValueStart);
if (oldValueEnd == -1) {
return; // no closing quote, leave the contents untouched
}
// Keep everything up to the opening quote, insert the new value, keep the rest
mFileContents = mFileContents.substring(0, oldValueStart) + newValue + mFileContents.substring(oldValueEnd);
}
String a = "some text";
a = a.replace("text", "inserted value");
System.out.print(a);
Try this

Getting paramValue for paramName in specifed querystring using regex

I would like to write a Java utility method that returns the paramValue for a paramName in a specified query string.
Pattern p = Pattern.compile("\\&?(\\w+)\\= (I don't know what to put here) ");
public String getParamValue(String entireQueryString, String paramName)
{
Matcher m = p.matcher(entireQueryString);
while(m.find()) {
if(m.group(1).equals(paramName)) {
return m.group(2);
}
}
return null;
}
I will be invoking this method from my servlet,
String qs = request.getQueryString(); //action=initASDF&requestId=9078-32&redirect=http://www.mydomain.com?actionId=4343
System.out.println(getParamValue(qs, "requestId"));
The output should be, 9078-32
You can use a negated character class in the regex. See this other SO question: Regular Expressions and negating a whole character group
You'll need to match everything except a &.
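A minimal sketch of that idea, filling the gap in the question's pattern with [^&]* (everything up to the next & or the end of the string). The class name is made up, and the values are returned as they appear in the query string, without URL decoding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryParams {
    // A parameter starts at the beginning of the string or right after an &.
    // Group 1 is the name, group 2 is the value up to the next & (or the end).
    private static final Pattern PARAM = Pattern.compile("(?:^|&)(\\w+)=([^&]*)");

    // Returns the raw value for paramName, or null if it is not present.
    static String getParamValue(String entireQueryString, String paramName) {
        Matcher m = PARAM.matcher(entireQueryString);
        while (m.find()) {
            if (m.group(1).equals(paramName)) {
                return m.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String qs = "action=initASDF&requestId=9078-32&redirect=http://www.mydomain.com?actionId=4343";
        System.out.println(getParamValue(qs, "requestId"));
    }
}
```

Because a parameter name is only recognised at the start or after an &, the actionId inside the redirect value is treated as part of that value, not as a separate parameter.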
Use the proper API to do it: request.getParameter("requestId")
Could you split the string on ampersands (&) and then search the resulting array for the key (look up to the equals sign)?
Here's a link to String.split(): http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
Here's the type of thing I'm talking about:
private static final String KEY_VALUE_SEPARATOR = "=";
private static final String QUERY_STRING_SEPARATOR = "&";
public String getParamValue(String entireQueryString, String paramName) {
String[] fragments = entireQueryString.split(QUERY_STRING_SEPARATOR);
for (String fragment : fragments){
if (fragment.substring(0, fragment.indexOf(KEY_VALUE_SEPARATOR)).equalsIgnoreCase(paramName)){
return fragment.substring(fragment.indexOf(KEY_VALUE_SEPARATOR)+1);
}
}
throw new RuntimeException("can't find value");
}
The Exception at the end is a pretty rubbish idea but that's not really the important part of this.
