parsing string to get content - java

I have the following html string:
<h3>I only want this content</h3> I don't want this content <b>random content</b>
And I would like to only get the content from the h3 tags and remove the other content. I have the following:
String getArticleBody = listArt.getChildText("body");
StringBuilder mainArticle = new StringBuilder();
String getSubHeadlineFromArticle;
if(getArticleBody.startsWith("<h3>") && getArticleBody.endsWith("</h3>")){
mainArticle.append(getSubHeadlineFromArticle);
}
But this returns the whole content, which is not what I am after. Any help would be appreciated.

Thanks, guys. All your answers worked, but I ended up using Jsoup.
String getArticleBody = listArt.getChildText("body");
org.jsoup.nodes.Document docc = Jsoup.parse(getArticleBody);
org.jsoup.nodes.Element h3Tag = docc.getElementsByTag("h3").first();
String getSubHeadlineFromArticle = h3Tag.text();

You can use substring method like this -
String a="<h3>I only want this content</h3> I don't want this content <b>random content</b>";
System.out.println(a.substring(a.indexOf("<h3>")+4,a.indexOf("</h3>")));
Output -
I only want this content
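The one-liner above throws a StringIndexOutOfBoundsException when either tag is missing. A minimal guarded sketch of the same substring idea follows; the class name H3Substring and the method firstH3 are made up for illustration and are not part of the answer above.

```java
import java.util.Optional;

final class H3Substring {
    // Returns the text between the first <h3> and the next </h3>,
    // or an empty Optional when either tag is missing.
    static Optional<String> firstH3(String html) {
        final String open = "<h3>";
        final String close = "</h3>";
        int start = html.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        int contentStart = start + open.length();
        // Search for the closing tag only after the opening tag.
        int end = html.indexOf(close, contentStart);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(html.substring(contentStart, end));
    }

    public static void main(String[] args) {
        String a = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
        System.out.println(firstH3(a).orElse("(no h3 found)"));
    }
}
```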

Try with this
String result = getArticleBody.substring(getArticleBody.indexOf("<h3>"), getArticleBody.indexOf("</h3>"))
.replaceFirst("<h3>", "");
System.out.println(result);

Using regular expression
This may help you:
String str = "<h3>I only want this content</h3> I don't want this content <b>random content</b>";
final Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
final Matcher matcher = pattern.matcher(str);
// group(1) is only valid after a successful find()
if (matcher.find()) {
    System.out.println(matcher.group(1)); // Prints String I want to extract
}
Output :
I only want this content

You need to use regex like this:
public static void main(String[] args) {
String str = "<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>";
// your pattern goes here
// ? is important since you need to catch the nearest closing tag
Pattern pattern = Pattern.compile("<h3>(.+?)</h3>");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) System.out.println(matcher.group(1));
}
matcher.group(1) returns exactly text between h3 tags.
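If the matches are needed as values rather than printed, the same loop can collect them into a list. This is only a sketch: the class name H3Regex, the method allH3, and the DOTALL and CASE_INSENSITIVE flags are additions for illustration. A regex like this is fine for simple, well-formed fragments, but an HTML parser is the safer choice for arbitrary markup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class H3Regex {
    // DOTALL lets "." span line breaks; CASE_INSENSITIVE also accepts <H3>.
    // Both flags are assumptions added here, not part of the answers above.
    private static final Pattern H3 =
            Pattern.compile("<h3>(.+?)</h3>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);

    // Collects the text of every <h3>...</h3> pair, in document order.
    static List<String> allH3(String html) {
        List<String> result = new ArrayList<>();
        Matcher matcher = H3.matcher(html);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allH3("<h3>asdfsdafsdaf</h3>dsdafsdfsafsadfa<h3>second</h3>"));
    }
}
```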

The other answers already cover how to get the result you want. I'm gonna comment your code to explain why it isn't doing that already. (Note that I modified your variable names because strings don't get anything; they are a thing.)
// declare a bunch of variables
String articleBody = listArt.getChildText("body");
StringBuilder mainArticle = new StringBuilder();
String subHeadlineFromArticle;
// check to see if the article body consists entirely of a subheadline
if(articleBody.startsWith("<h3>") && articleBody.endsWith("</h3>")){
// if it does, append subHeadlineFromArticle, which was never assigned a value
mainArticle.append(subHeadlineFromArticle);
}
// if it doesn't, don't do anything
// final result:
// articleBody = the entire article body
// mainArticle = nothing taken from articleBody (regardless of whether you appended anything)
// subHeadlineFromArticle = never assigned

Related

Regex Redirect URL excludes token

I'm trying to create a redirect URL for my client. We have a service that you specify "fromUrl" -> "toUrl" that is using a java regex Matcher. But I can't get it work to include the token in when it converts it. For example:
/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
Should be:
/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
but it excludes the token so the result I get is:
/fromurl/login/
/tourl/login/
I tried various regex patterns like: " ?.* and [%5E//?]+)/([^/?]+)/(?.*)?$ and (/*) etc" but none of them seem to work.
I'm not that familiar with regex. How can I solve this?
This can be easily done using simple string replace but if you insist on using regular expressions:
Pattern p = Pattern.compile("fromurl");
String originalUrlAsString = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf ";
String newRedirectedUrlAsString = p.matcher(originalUrlAsString).replaceAll("tourl");
System.out.println(newRedirectedUrlAsString);
If I understand you correctly you need something like this?
String from = "/my/old/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceAll("\\/(.*)\\/", "/my/new/url/");
System.out.println(to); // /my/new/url/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf
This will replace everything between the first and the last forward slash.
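Because (.*) is greedy, the match runs to the last forward slash in the whole string, so a token containing an unencoded "/" would be partly swallowed. A sketch that avoids this by rewriting only the part before the "?" is shown here; the class name PathRewrite and the method rewrite are hypothetical.

```java
final class PathRewrite {
    // Replaces everything up to and including the last "/" of the path with
    // newPrefix and leaves the query string (from "?" onward) untouched.
    static String rewrite(String url, String newPrefix) {
        int queryStart = url.indexOf('?');
        String path = queryStart < 0 ? url : url.substring(0, queryStart);
        String query = queryStart < 0 ? "" : url.substring(queryStart);
        int lastSlash = path.lastIndexOf('/');
        if (lastSlash < 0) {
            return url; // no slash in the path, nothing to rewrite
        }
        return newPrefix + path.substring(lastSlash + 1) + query;
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/my/old/url/login?token=a/b%2Fc", "/my/new/url/"));
    }
}
```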
Can you describe the format of the original URL more precisely? The regular expression depends on it.
Assuming that only the first occurrence of fromurl needs to be replaced, you can use the following code:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = from.replaceFirst("fromurl", "tourl");
But if it is necessary to use more complex rules to determine the substring to replace, you can use:
String from = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String to = "";
String regularExpresion = "(<<pre>>)(fromurl)(<<pos>>)";
Pattern pattern = Pattern.compile(regularExpresion);
Matcher matcher = pattern.matcher(from);
if (matcher.matches()) {
to = from.replaceAll(regularExpresion, "$1tourl$3");
}
NOTE: the pre and pos placeholders are only illustrative, because I don't know the real format of the URL
NOTE 2: $1 and $3 refer to the first and the third groups
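To make the group-based replacement concrete, here is a sketch with the placeholders filled in under one assumption: the URL starts with "/fromurl/" and everything after that segment must be kept. The class name GroupedReplace, the method redirect, and the exact pattern are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupedReplace {
    // Group 1: the leading "/", group 2: the segment to swap,
    // group 3: the rest of the URL (remaining path and query string).
    // The pattern is an assumption based on the example URL in the question.
    private static final Pattern REDIRECT = Pattern.compile("^(/)(fromurl)(/.*)$");

    static String redirect(String from) {
        Matcher matcher = REDIRECT.matcher(from);
        if (!matcher.matches()) {
            return from; // leave URLs that do not match unchanged
        }
        // $1 and $3 re-insert the first and third groups around the new segment.
        return matcher.replaceFirst("$1tourl$3");
    }

    public static void main(String[] args) {
        System.out.println(redirect("/fromurl/login?token=7c8Q%2FWs"));
    }
}
```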
Although the existing answers should solve the issue, and some are similar, the solution below may also help. It uses a fairly simple regex (assuming your input has the same format as your example):
private static String replaceUrl(String inputUrl){
String regex = "/.*(/login\\?token=.*)";
String toUrl = "/tourl";
Pattern p = Pattern.compile(regex);
Matcher matcher = p.matcher(inputUrl);
if (matcher.find()) {
return toUrl + matcher.group(1);
} else
return null;
}
If you want to change the format and adjust the regex, you can write a test to check that it still works for other expected inputs and outputs:
String inputUrl = "/fromurl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
String expectedUrl = "/tourl/login?token=7c8Q8grW5f2Kz7RP1%2FWsqpVB%2FEluVOGfXQdW4I0v82siR2Ism1D8VCvEmKJr%2BKhHhicwPey0uIiTxN049Be8TNsypf";
if (expectedUrl.equals(replaceUrl(inputUrl))){
System.out.println("Success");
}

Parse string value from URL

I have a string (which is an URL) in this pattern https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg
now I want to clip it to this
media/93939393hhs8.jpeg
I want to remove all the characters before the second-to-last slash /.
I'm a newbie in Java, but in Swift (iOS) this is how we do this:
if let url = NSURL(string:"https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"), pathComponents = url.pathComponents {
let trimmedString = pathComponents.suffix(2).joinWithSeparator("/")
print(trimmedString) // "output = media/93939393hhs8.jpeg"
}
Basically, I'm removing everything from this URL except the last 2 items, and then joining those 2 items using /.
// assumes the URL always contains "media"; keeps everything from there to the end
String ret = url.substring(url.indexOf("media"));
Are you familiar with regex? Try this regex, which captures the last 2 items separated by /:
.*?\/([^\/]+?\/[^\/]+?$)
Here is the example in Java (don't forget to escape the backslashes with \\):
Pattern p = Pattern.compile("^.*?\\/([^\\/]+?\\/[^\\/]+?$)");
Matcher m = p.matcher(string);
if (m.find()) {
System.out.println(m.group(1));
}
Alternatively there is the split(..) function, although I recommend the approach above. (Finally, concatenate the separated strings with StringBuilder.)
String part[] = string.split("/");
int l = part.length;
StringBuilder sb = new StringBuilder();
String result = sb.append(part[l-2]).append("/").append(part[l-1]).toString();
Both give the same result: media/93939393hhs8.jpeg
// + 1 skips the slash itself, so the result does not start with "/"
String result = url.substring(url.substring(0, url.lastIndexOf('/')).lastIndexOf('/') + 1);
or
Use split and join the last 2 items
String[] arr = url.split("/");
String result = arr[arr.length - 2] + "/" + arr[arr.length - 1];
public static String parseUrl(String str) {
return (str.lastIndexOf("/") > 0) ? str.substring(1+(str.substring(0,str.lastIndexOf("/")).lastIndexOf("/"))) : str;
}
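A closer Java counterpart to the Swift pathComponents approach from the question is to let java.net.URI isolate the path first, so a query string or fragment cannot interfere. This is a sketch under the assumption that the input is a syntactically valid URI; the class name LastTwoSegments and the method lastTwo are made up for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

final class LastTwoSegments {
    // Returns the last two non-empty path segments joined by "/".
    // If the path has fewer than two segments, whatever exists is returned.
    static String lastTwo(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return ""; // opaque URIs such as mailto: have no path
        }
        List<String> segments = new ArrayList<>();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        int from = Math.max(0, segments.size() - 2);
        return String.join("/", segments.subList(from, segments.size()));
    }

    public static void main(String[] args) {
        System.out.println(lastTwo("https://xxx.kflslfsk.com/kjjfkskfjksf/v1/files/media/93939393hhs8.jpeg"));
    }
}
```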

regex or string parsing

I am trying to parse a string which has a specific pattern. An example valid string is as follows:
<STX><DATA><ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX></DATA><ETX>
<STX>A?234<ETX>
<STX><DATA><ETX>
<STX>name!abc<ETX>
<STX>age!24y<ETX>
<STX></DATA><ETX>
<STX>A?345<ETX>
<STX><DATA><ETX>
<STX>name!bac<ETX>
<STX>age!22y<ETX>
<STX></DATA><ETX>
<STX>OK<ETX>
<STX></DATA><ETX>
This data is sent by a device. All I need is to parse this string into records such as id:123, name:xyz, age:27y.
I am trying to use this regex:
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);
This outputs the required data:
<ETX>
<STX>A?123<ETX>
<STX><DATA><ETX>
<STX>name!xyz<ETX>
<STX>age!27y<ETX>
<STX>
How can I loop over the string to copy all matches into a list of strings?
I am trying to loop over the string and delete each extracted match, but it doesn't get deleted.
final Pattern regex = Pattern.compile("<DATA>(.*?)</DATA>", Pattern.DOTALL);// Q?(.*?)
final StringBuffer buff = new StringBuffer(frame);
final Matcher matcher = regex.matcher(buff);
while (matcher.find())
{
final String dataElements = matcher.group();
System.out.println("Data:" + dataElements);
}
}
Are there any better ways to do this?
This is the output I am currently getting:
Data:<DATA><ETX><STX>A?123<ETX><STX><DATA><ETX><STX>name!xyz<ETX><STX>age!27y<ETX><STX> </DATA>
Data:<DATA><ETX><STX>name!abc<ETX><STX>age!24y<ETX><STX></DATA>
Data:<DATA><ETX><STX>name!bac<ETX><STX>age!22y<ETX><STX></DATA>
I am missing the A?234 and A?345 in the next two matches.
I don't really know what you want to achieve with this, but if you want to remove the occurrences of that pattern, this line:
buff.toString().replace(dataElements, "")
doesn't look right. You are only editing a String copy of buff, so buff itself is unchanged. You have to write the edited string back into buff.
Using this regex solves my issue:
<STX>(A*)(.*?)<DATA>(.*?)</DATA>
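To go from the raw frame to the id/name/age values the question asks for, one option is a single pattern with three capture groups, collecting one entry per record. This is a sketch only: it assumes the STX and ETX markers appear as literal text exactly as in the question, and that every record contains an A?id line followed by name! and age! lines. The class name FrameParser and the method parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FrameParser {
    // Group 1: digits after "A?", group 2: text after "name!", group 3: text after "age!".
    // DOTALL lets ".*?" skip over line breaks between the lines of one record.
    private static final Pattern RECORD = Pattern.compile(
            "A\\?(\\d+).*?name!(.*?)<ETX>.*?age!(.*?)<ETX>", Pattern.DOTALL);

    // Returns one formatted string per record found in the frame.
    static List<String> parse(String frame) {
        List<String> records = new ArrayList<>();
        Matcher m = RECORD.matcher(frame);
        while (m.find()) {
            records.add("id:" + m.group(1) + " name:" + m.group(2) + " age:" + m.group(3));
        }
        return records;
    }

    public static void main(String[] args) {
        String frame = "<STX><DATA><ETX><STX>A?123<ETX><STX><DATA><ETX>"
                + "<STX>name!xyz<ETX><STX>age!27y<ETX><STX></DATA><ETX>"
                + "<STX>OK<ETX><STX></DATA><ETX>";
        System.out.println(parse(frame));
    }
}
```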

android - search and replace in string java android

This is a part of a string
test="some text" test2="othertext"
It contains a lot more similar text with the same formatting. Each "statement" is separated by a space.
How do I search by name (test, test2) and replace its value (the text between the quotes) in Java?
I don't know if this is clear enough, but I don't know how else to explain it.
I want to search for "test" and replace its content with something else
replace
test="some text" test2="othertext"
with something else
Edit:
This is a content of a file
test="some text" test2="othertext"
I read the content of that file into a string
Now I want to replace some text with something else
some text is not static; it can be anything
You can use the replace() method of String, which comes in 3 types and 4 variants:
revStr.replace(oldChar, newChar)
revStr.replace(target, replacement)
revStr.replaceAll(regex, replacement)
revStr.replaceFirst(regex, replacement)
Eg:
String myString = "Here is the home of the home of the Stars";
myString = myString.replace("home","heaven");
///////////////////// Edited Part //////////////////////////////////////
String s = "The quick brown fox test =\"jumped over\" the \"lazy\" dog";
// Match the name, the equals sign, and only the quoted value that follows it.
// [^\"]* stops at the closing quote; .* would run on to the last quote in the string.
Pattern pat = Pattern.compile("test\\s*=\\s*\"[^\"]*\"");
Matcher mat = pat.matcher(s);
String t = mat.replaceAll("test=\"Hello\"");
System.out.println(t); // The quick brown fox test="Hello" the "lazy" dog
So you want to replace every instance of "test" with something else?
Let's say the string name is myString:
myString = myString.replace("test","something else");
Is this what you are looking to do?
I think you are asking how to change values in a string that you read from a file.
Let's suppose your string is:
String s = "My name=\"sahil\" and my company=\"microsoft\", also i live in country=\"india\"";
Now you want to replace "sahil" with "mahajan" and "microsoft" with "google".
I experimented with the String methods to implement this, but didn't find a relevant result. I can suggest some methods, though: regionMatches and indexOf("name=\""). These only help you find where sahil (for example) is located. The replace function is difficult to use here, because it replaces a character sequence, and for that you need to know the exact character sequence.
You might try experimenting with the String methods yourself. It could help.
I haven't tested this, but it should work:
String mFileContents;
private void replaceValue(String name, String newValue) {
    // Search for name=" so that "test" does not also match "test2"
    String key = name + "=\"";
    int keyIndex = mFileContents.indexOf(key);
    if (keyIndex < 0) {
        return; // name not found
    }
    int valueStart = keyIndex + key.length();
    int valueEnd = mFileContents.indexOf("\"", valueStart);
    if (valueEnd < 0) {
        return; // no closing quote
    }
    // Keep everything up to the opening quote and from the closing quote onward
    mFileContents = mFileContents.substring(0, valueStart) + newValue + mFileContents.substring(valueEnd);
}
String a = "some text";
a = a.replace("text", "inserted value");
System.out.print(a);
Try this
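Since the old value is not known in advance, a regex keyed on the name can do the replacement in one pass. The following is a sketch, assuming values never contain a double quote; the class name AttributeReplace and the method replaceValue are hypothetical. Pattern.quote and Matcher.quoteReplacement keep special characters in the name or the new value from being interpreted as regex syntax.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeReplace {
    // Replaces the quoted value of every name="..." pair in content with newValue.
    static String replaceValue(String content, String name, String newValue) {
        // (?<!\w) keeps "test" from matching inside "mytest";
        // requiring =" right after the name keeps it from matching "test2".
        Pattern p = Pattern.compile("(?<!\\w)(" + Pattern.quote(name) + "=\")[^\"]*\"");
        Matcher m = p.matcher(content);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is name=" ; append the new value and the closing quote.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + newValue + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String content = "test=\"some text\" test2=\"othertext\"";
        System.out.println(replaceValue(content, "test", "new value"));
    }
}
```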

Using Multiple Java regular expressions

I am trying to extract an email address and replace it with a space using a pattern (EMAIL_PATTERN). When I run the following, no output is produced when a full document is passed in. The pattern only matches the entire region, so if we pass in only the email, it is matched and replaced with a space. But the purpose of the following method is to find the email within the text, without having to extract it manually first. After the email in the tempString has been replaced, I want to use the result for the next pattern. Should I combine the patterns I want to use in one method, or should they be placed in separate methods? Below is the code I have so far. I also have other patterns, but since my method is not working correctly I have not posted them yet.
private static final String EMAIL_PATTERN = "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public static void main (String[] args) {
//Document takes in a ID, student information(which includes email, address, phone, name), school, and text
Document r = new Document("", "FirstName LastName, Address, example#email.com, phoneNumber", "School", "experience", "text");
personalEmailZone(r);
}
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
}
return doc;
}
You have several problems:
public static Document personalEmailZone(Document doc){
//tempString is the personal information section of a resume
String tempPI = doc.tempString();
if(doc.tempString().matches(EMAIL_PATTERN) == true){
The above statement attempts to match the entire document against the email address pattern. This will not match unless doc.tempString() contains ONLY a single email address and nothing else.
//Pattern pattern = Pattern.compile("");
Pattern pattern = Pattern.compile(EMAIL_PATTERN);
Matcher matcher = pattern.matcher(tempPI);
String emailTemp = "";
if(matcher.find()){
emailTemp = matcher.group();
System.out.println(emailTemp);
//PI.replace(emailTemp, "");
System.out.println(emailTemp.replace(emailTemp, ""));
Not sure what the above is for. If your code ever reached this point, it would always print an empty line.
tempPI = tempPI.replace(emailTemp, "");
System.out.println(tempPI);
}
Since there's no loop, you will have replaced only the first occurrence of an email address. If you're expecting to replace ALL occurrences, you need to loop over the input.
}
return doc;
At this point you haven't actually modified doc, so you're returning the document in its original form, with email addresses included.
}
Look at the Javadoc for String#replaceAll(String regex, String replacement)
You can place your patterns in different methods, each of which returns the modified string for the next pattern to use. For example
String tempPI = doc.tempString();
tempPI = applyPattern1(tempPI);
tempPI = applyPattern2(tempPI);
tempPI = applyPattern3(tempPI);
Your code doesn't show any output because of doc.tempString().matches(EMAIL_PATTERN) == true. That check is probably not needed there, since it expects the entire string to be an email.
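A sketch of that chaining idea, with each pattern in its own method: the email pattern is copied from the question (including its "#" separator), while the PHONE pattern, the class name PatternChain, and the method names are hypothetical and only there to show a second step. Matcher.replaceAll replaces every occurrence, so no explicit loop and no prior matches() check are needed.

```java
import java.util.regex.Pattern;

final class PatternChain {
    // Email pattern as posted in the question ("#" stands where "@" would normally be).
    private static final Pattern EMAIL = Pattern.compile(
            "[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*#[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})");
    // Hypothetical second pattern: three digits, a dash, four digits.
    private static final Pattern PHONE = Pattern.compile("\\b\\d{3}-\\d{4}\\b");

    // Each step returns the modified text so the next step can work on it.
    static String removeEmails(String text) {
        return EMAIL.matcher(text).replaceAll(" ");
    }

    static String removePhones(String text) {
        return PHONE.matcher(text).replaceAll(" ");
    }

    static String scrub(String text) {
        return removePhones(removeEmails(text));
    }

    public static void main(String[] args) {
        System.out.println(scrub("FirstName LastName, Address, example#email.com, 555-1234"));
    }
}
```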
