Get image link and text from string


I have this string:
<div><img width="100px" src="http://www.mysite.com/Content/dataImages/news/small/some-pic.png" /><br />This is some text that I need to get.</div>
I need to get the image link and the text "This is some text that I need to get." from the string above in Java. Can anybody tell me how to do this?

Use regex to get what you want.

If this is all you have to do, there's no point in bringing in extra packages; just use regex.
The pattern "(?<=src=\")(.*?)(?=\")" can be used to get the link, and you can modify it to get the text.
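A minimal sketch of that suggestion, assuming the input always has the shape shown in the question (one img tag followed by <br /> and the text). The class name ImgExtract and the second pattern for the text are illustrative assumptions, not something the answer above spells out:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgExtract {
    // Lookbehind and lookahead keep the src=" prefix and the closing quote out of the match.
    private static final Pattern LINK = Pattern.compile("(?<=src=\")(.*?)(?=\")");
    // Assumed variant for the text: everything between "<br />" and "</div>".
    private static final Pattern TEXT = Pattern.compile("(?<=<br />)(.*?)(?=</div>)");

    // Returns the first match of the pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    static String link(String html) {
        return firstMatch(LINK, html);
    }

    static String text(String html) {
        return firstMatch(TEXT, html);
    }

    public static void main(String[] args) {
        String str = "<div><img width=\"100px\" src=\"http://www.mysite.com/Content/dataImages/news/small/some-pic.png\" /><br />This is some text that I need to get.</div>";
        System.out.println(link(str));
        System.out.println(text(str));
    }
}
```

Both patterns break as soon as the markup changes (for example a different attribute order or a line break inside the text), which is the usual trade-off of regex against an HTML parser.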

Try this; just change the pattern if you must.
String str = "<div><img width=\"100px\" src=\"http://www.mysite.com/Content/dataImages/news/small/some-pic.png\" /><br />This is some text that I need to get.</div>";
Pattern p = Pattern.compile("src=\"(.*?)\" /><br />(.*?)</div>");
Matcher m = p.matcher(str);
if (m.find()) {
    String link = m.group(1); // the src attribute value
    String text = m.group(2); // the text between <br /> and </div>
}

My solution was:
String tmp = xpp.nextText();
desc = android.text.Html.fromHtml(tmp).toString();
img = FindUrls.extractUrls(tmp);
To extract the text from the string I used android.text.Html.fromHtml, and for the link inside the string I used this function:
// Returns the first URL found in the input, or null if there is none.
public static String extractUrls(String input) {
    String result = null;
    Pattern pattern = Pattern.compile(
        "\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www\\.)" +
        "(\\w+:\\w+#)?(([-\\w]+\\.)+(com|org|net|gov" +
        "|mil|biz|info|mobi|name|aero|jobs|museum" +
        "|travel|[a-z]{2}))(:[\\d]{1,5})?" +
        "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" +
        "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" +
        "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" +
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" +
        "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b");
    Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        result = matcher.group();
    }
    return result;
}
I hope it helps someone with a similar problem.

Related

Retrieving concrete data from String

I'm trying to retrieve the data-product-id from a String that looks like this:
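<a href="/w-pustyni-i-w-puszczy-sienkiewicz-henryk,prod14290034,ksiazka-p" class="img seoImage" title="W pustyni i w puszczy - Sienkiewicz Henryk" rel="nofollow" data-product-id="prod14290034"> <img class="lazy" src="/b/mp/img/svg/no_picture.svg" lazy-img="https://ecsmedia.pl/c/w-pustyni-i-w-puszczy-p-iext43240721.jpg" alt=""> </a>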
The output should be
prod14290034
I tried to achieve this with a regular expression, but I'm a beginner at it.
Is a regular expression a good fit for this? If so, how do I do it?
EDIT:
Following Emma's comment, I've made something like this:
String z = element.toString();
Pattern pattern = Pattern.compile("data-product-id=\"\\s*([^\\s\"]*?)\\s*\"");
Matcher matcher = pattern.matcher(z);
System.out.println(matcher.find());
if (matcher.find()) {
System.out.println(matcher.group());
}
It returns true, but doesn't print any value. Why?

You could use an HTML/XHTML/XML library to turn your string into a document, or at least an Element, and then easily read the attribute value from it. But if you want to use regex, you can try this snippet:
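@Test
public void productId() {
    // Sample input: the anchor that carries the data-product-id attribute.
    String src =
        "<a href=\"/w-pustyni-i-w-puszczy-sienkiewicz-henryk,prod14290034,ksiazka-p\" " +
        "class=\"img seoImage\" " +
        "title=\"W pustyni i w puszczy - Sienkiewicz Henryk\" " +
        "rel=\"nofollow\" " +
        "data-product-id=\"prod14290034\"> " +
        "<img class=\"lazy\" src=\"/b/mp/img/svg/no_picture.svg\" lazy-img=\"https://ecsmedia.pl/c/w-pustyni-i-w-puszczy-p-iext43240721.jpg\" alt=\"\"> </a>";
    final Pattern pattern = Pattern.compile("(data-product-id=)\"(p[a-zA-Z]+[0-9]+)\"");
    final Matcher matcher = pattern.matcher(src);
    String prodId = null;
    if (matcher.find()) {
        System.out.println(matcher.groupCount());
        prodId = matcher.group(2); // group 2 is the attribute value
    }
    System.out.println(prodId);
    Assert.assertNotNull(prodId);
    Assert.assertEquals("prod14290034", prodId); // expected value first, actual second
}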
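As for why the code in the question prints true and then nothing: each call to find() advances the matcher to the next match, so the call inside println consumes the only match and the second call in the if returns false. A minimal sketch of calling find() once and reusing the result, assuming a shortened input string and a hypothetical class name FindOnce:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindOnce {
    // Returns the data-product-id value, or null if the attribute is absent.
    static String productId(String html) {
        Pattern pattern = Pattern.compile("data-product-id=\"\\s*([^\\s\"]*?)\\s*\"");
        Matcher matcher = pattern.matcher(html);
        // Call find() exactly once per match and keep its result.
        boolean found = matcher.find();
        // group(1) is the captured id; group() would be the whole attribute.
        return found ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened sample input (assumption): only the attributes needed here.
        String z = "<a rel=\"nofollow\" data-product-id=\"prod14290034\"> </a>";
        System.out.println(productId(z));
    }
}
```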

You can use jsoup for Java; it is a library for parsing HTML pages. Similar libraries exist for other languages, such as Beautiful Soup for Python.
EDIT: Here is a snippet for jsoup. You can select any element by tag and then read the attribute you need with the attr method.
Document doc = Jsoup.parse(
"<a href=\"/w-pustyni-i-w-puszczy-sienkiewicz-henryk,prod14290034,ksiazka-p\" " +
"class=\"img seoImage\" " +
"title=\"W pustyni i w puszczy - Sienkiewicz Henryk\" " +
"rel=\"nofollow\" " +
"data-product-id=\"prod14290034\"> " +
"<img class=\"lazy\" src=\"/b/mp/img/svg/no_picture.svg\" lazy-img=\"https://ecsmedia.pl/c/w-pustyni-i-w-puszczy-p-iext43240721.jpg\" alt=\"\"> </a>\n"
);
String dataProductId = doc.select("a").first().attr("data-product-id");

RegEx to extract text between tags in Java

I need to extract the values after :70: in the following text file using RegEx. A value may contain line breaks as well.
My current solution is to extract the string between :70: and :, but this always returns only one match: the whole text between the first :70: and the last :.
:32B:xxx,
:59:yyy
something
:70:ACK1
ACK2
:21:something
:71A:something
:23E:something
value
:70:ACK2
ACK3
:71A:something
How can I achieve this using Java? Ideally I want to iterate through all values, i.e.
ACK1\nACK2,
ACK2\nACK3
Thanks :)
Edit: What I'm doing right now:
Pattern pattern = Pattern.compile("(?<=:70:)(.*)(?=\n)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group());
}

Try this.
String data = ""
+ ":32B:xxx,\n"
+ ":59:yyy\n"
+ "something\n"
+ ":70:ACK1\n"
+ "ACK2\n"
+ ":21:something\n"
+ ":71A:something\n"
+ ":23E:something\n"
+ "value\n"
+ ":70:ACK2\n"
+ "ACK3\n"
+ ":71A:something\n";
Pattern pattern = Pattern.compile(":70:(.*?)\\s*:", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find())
    System.out.println("found="+ matcher.group(1));
Result:
found=ACK1
ACK2
found=ACK2
ACK3

You need a loop to do this.
Pattern p = Pattern.compile(regexPattern);
List<String> list = new ArrayList<String>();
Matcher m = p.matcher(input); // Pattern.matcher creates the Matcher
while (m.find()) {
    list.add(m.group());
}
As seen in "Create array of regex matches".
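Put together with the :70: data from the question, the loop might look like the sketch below. The class name Field70 and the pattern are assumptions: the pattern uses a lookahead for the next line that starts with a colon (or the end of the input), so the tag that follows is not consumed by the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Field70 {
    // DOTALL lets '.' cross line breaks; the lazy group stops before "\n:" or trailing whitespace at the end.
    private static final Pattern FIELD =
            Pattern.compile(":70:(.*?)(?=\\n:|\\s*\\z)", Pattern.DOTALL);

    // Collects every value that follows a :70: tag.
    static List<String> values(String data) {
        List<String> list = new ArrayList<>();
        Matcher m = FIELD.matcher(data);
        while (m.find()) {
            list.add(m.group(1));
        }
        return list;
    }

    public static void main(String[] args) {
        String data = ":59:yyy\nsomething\n:70:ACK1\nACK2\n:21:something\n"
                + ":23E:something\nvalue\n:70:ACK2\nACK3\n:71A:something\n";
        for (String value : values(data)) {
            System.out.println("found=" + value);
        }
    }
}
```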

How to split a long string in Java?

How to edit this string and split it into two?
String asd = {RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef};
I want to make two strings:
String reponame;
String repoID;
reponame should be CodeCommitTest
repoID should be 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef
Can someone help me get them? Thanks

Here is Java code using a regular expression in case you can't use a JSON parsing library (which is what you probably should be using):
String pattern = "^\\{RepositoryName:\\s(.*?),RepositoryId:\\s(.*?)\\}$";
String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
String reponame = "";
String repoID = "";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(asd);
if (m.find()) {
    reponame = m.group(1);
    repoID = m.group(2);
    System.out.println("Found reponame: " + reponame + " with repoID: " + repoID);
} else {
    System.out.println("NO MATCH");
}
This code has been tested in IntelliJ and runs without error.
Output:
Found reponame: CodeCommitTest with repoID: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef

Assuming there aren't quote marks in the input, and that the repository name and ID consist of letters, numbers, and dashes, then this should work to get the repository name:
Pattern repoNamePattern = Pattern.compile("RepositoryName: *([A-Za-z0-9\\-]+)");
Matcher matcher = repoNamePattern.matcher(asd);
if (matcher.find()) {
    reponame = matcher.group(1);
}
and you can do something similar to get the ID. The above code just looks for RepositoryName:, possibly followed by spaces, followed by one or more letters, digits, or hyphen characters; then the group(1) method extracts the name, since it's the first (and only) group enclosed in () in the pattern.
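For the ID, the same idea might look like the sketch below. It keeps the same assumptions (no quote marks, values made of letters, digits, and dashes); the class name RepoFields and the helper field are hypothetical names chosen for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepoFields {
    // Looks up "<key>: value", where the value consists of letters, digits, or dashes.
    static String field(String input, String key) {
        // Pattern.quote keeps any regex metacharacters in the key literal.
        Pattern p = Pattern.compile(Pattern.quote(key) + ": *([A-Za-z0-9\\-]+)");
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String asd = "{RepositoryName: CodeCommitTest,RepositoryId: 425f5fc5-18d8-4ae5-b1a8-55eb9cf72bef}";
        String reponame = field(asd, "RepositoryName");
        String repoID = field(asd, "RepositoryId");
        System.out.println(reponame);
        System.out.println(repoID);
    }
}
```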

RegEx Extract the second URL from String

I'm trying to extract the second URL from Strings like these
submitted by thecrappycoder <br /> [link] [3 comments]
submitted by durdn <br /> [link] [1 comment]
by using regex. I tried this:
String regex = "\\(?\\b(http://|www[.])[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
while (m.find()) {
    String urlStr = m.group();
    urlStr = urlStr.substring(1, 3);
    links.add(urlStr);
}
I also tried it this way:
System.out.println(("http://"+text.split("http://")[1]).split("")[0]);
Unfortunately, I couldn't get it to work. Any help is appreciated, thank you.

You can take the same approach with a simplified regex pattern:
String text = "submitted by thecrappycoder <br />" +
" [link] " +
"[3 comments]\n" +
" ";
String regex = "href=.(http.*?)\"";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(text);
m.find(); // ignore the 1st match
m.find(); // find the 2nd match
String urlStr = m.group(1); // read the URL captured by the 2nd match (group 1 excludes href=")
System.out.println("urlStr = " + urlStr); // prints: urlStr = http://blogs.msdn.com/b/bethmassi/archive/2015/02/25/understanding-net-2015.aspx

Find E-Mail Addresses in a text

Can somebody tell me how to find e-mail addresses in a text?
Example text:
"Hey,
I just blahblah
E-Mail: lolcat@catinator.com
Another would be lolcat2@catinator.com"
So the output is:
lolcat@catinator.com
lolcat2@catinator.com
I tried Regex, but I have no idea how to do this over an entire text...
Pattern pattern = Pattern.compile("[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}");
Matcher matcher = pattern.matcher("asd@asdasd.de".toUpperCase());
if (matcher.matches()) {
    System.out.println("Mail found!");
} else {
    System.out.println("No Mail...");
}
Can somebody help me? :(
Greetings!

There are so many different email address formats that it is hard to match all of them. A simple (for your structured data) but not so effective approach would be the following:
String s = "Hey,\n" +
    "I just blahblah\n" +
    "E-Mail: lolcat@catinator.com\n" +
    "Another would be lolcat2@catinator.com";
Pattern p = Pattern.compile("\\S+@\\S+");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
lolcat@catinator.com
lolcat2@catinator.com

I am not sure about the regular expression you have provided. But if it is good and serves your purpose, you can use the following to extract the string:
Matcher matcher = pattern.matcher("asd@asdasd.de".toUpperCase());
String result;
while (matcher.find()) {
    // result now contains the matched email address
    result = matcher.group();
    System.out.println(result);
}
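Note that calling toUpperCase() on the input also upper-cases the addresses that get printed. One way around that is the CASE_INSENSITIVE flag, sketched below with the pattern from the question. The class name EmailFinder is an assumption, and the {2,4} limit still rejects longer top-level domains:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // Same pattern as in the question, matched case-insensitively instead of upper-casing the text.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,4}", Pattern.CASE_INSENSITIVE);

    // Returns every address found in the text, in order of appearance.
    static List<String> find(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hey,\n"
                + "I just blahblah\n"
                + "E-Mail: lolcat@catinator.com\n"
                + "Another would be lolcat2@catinator.com";
        for (String mail : find(text)) {
            System.out.println(mail);
        }
    }
}
```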
