Java: Replace only exactly matching URL

I want to replace only the exactly matching link in a given String.
My code is as follows:
String originalString = "<a target=\"_blank\" href=\"http://example.com/\"><span style=\"font-size: 12px;\">ABC</span></a>"
+ "<a target=\"_blank\" href=\"http://example.com/contact/\"><span style=\"font-size: 12px;\">Contact</span></a>";
String replacedString = originalString.replace("http://example.com/", "link1");
System.out.println("Replaced String:" + replacedString);
replacedString = "<a target="_blank" href="link1"><span style="font-size: 12px;">ABC</span></a><a target="_blank" href="link1contact/"><span style="font-size: 12px;">Contact</span></a>"
requiredString = "<a target="_blank" href="link1"><span style="font-size: 12px;">ABC</span></a><a target="_blank" href="link2"><span style="font-size: 12px;">Contact</span></a>"
I get replacedString as output, but the required output is requiredString.
Thanks in advance.

Include the surrounding quotes in the search string, so that only the complete attribute value matches:
String replacedString = originalString.replace("\"http://example.com/\"", "\"link1\"");
replacedString = replacedString.replace("\"http://example.com/contact/\"", "\"link2\"");

The problem is that http://example.com/contact/ contains http://example.com/.
Replace the longer URL first:
String replacedString = originalString.replace("http://example.com/contact/", "link2");
String replacedString2 = replacedString.replace("http://example.com/", "link1");
replacedString2 is then the required output.
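Both answers above depend on listing the URLs by hand in the right order. As a rough sketch of a more general approach (not taken from the answers), one could capture each href value and replace it only when the whole value equals a known URL. The class name ExactHrefReplace, the method replaceExact and the lookup map are made up for illustration, and Map.of assumes Java 9 or later.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExactHrefReplace {
    // Matches href="..." and captures the attribute value in group 1.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Replaces an href value only when the whole value equals a key of the map;
    // values that merely start with a known URL are left untouched.
    static String replaceExact(String html, Map<String, String> replacements) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(1);
            String target = replacements.getOrDefault(url, url);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement("href=\"" + target + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/\">ABC</a>"
                + "<a href=\"http://example.com/contact/\">Contact</a>";
        Map<String, String> replacements = Map.of(
                "http://example.com/", "link1",
                "http://example.com/contact/", "link2");
        System.out.println(replaceExact(html, replacements));
    }
}
```

Because the comparison is done on the full attribute value, the order of the entries in the map does not matter.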

A working regex in Java is http:\\/\\/example.com.*?(?=\\\\). It matches every occurrence of http://example.com up to the next backslash.

Related

How to extract substring(html) and another substring (which will be used for regex) and place it all in proper format?

I have a giant string that contains the markup below. I need to process it so that any HTML is kept as it is, and any substring matching the pattern below is turned into links in the proper format and put back in the same place.
Example:
<div id="contentPermission">
[[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]
</div>
<div> </div>
<p> </p>
<div> </div>
<p> </p>
<p>[[LP1137]]</p>
Pattern: starts with "[[" and ends with "]]"
From the above code:
[[anything between these brackets]]
So the output should look like this:
<div id="contentPermission">
<a href="index?page=content&id=MI44"></a>
<a href="index?page=content&id=MI304"></a>
<a href="index?page=content&id=MI409"></a>
......
......
</div>
<div> </div>
<p> </p>
<div> </div>
<p> </p>
<p><a href="index?page=content&id=LP1137"></a></p>
Solution
public static void main(String[] args) {
    StringBuilder str = new StringBuilder("<div id=\"contentPermission\">"
            + " [[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]"
            + "</div><div> </div><p> </p><div> </div><p> </p><p>[[LP1137]]</p>");
    System.out.println("Before " + str.toString() + "\n\n\n");
    // Matches "[[", then any characters other than "]", then "]]".
    Pattern pattern = Pattern.compile("\\[{2}.[^\\]]*\\]{2}");
    Matcher matcher = pattern.matcher(str);
    while (matcher.find()) {
        String codes = matcher.group(0);
        codes = codes.substring(2, codes.length() - 2); // strip the surrounding brackets
        StringBuilder urls = new StringBuilder();
        for (String code : codes.split(",")) {
            // One link per code, in the format the question asks for.
            urls.append("<a href=\"index?page=content&id=").append(code.trim()).append("\"></a>\n");
        }
        // Replace the first remaining [[...]] block, then rescan the updated text.
        str = new StringBuilder(matcher.replaceFirst(urls.toString()));
        matcher = pattern.matcher(str);
    }
    System.out.println("Replaced " + str.toString());
}
Another solution with regex only (no split, loop over the codes, or substring):
String content = "<div id=\"contentPermission\">[[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]</div><div> </div><p> </p><div> </div><p> </p><p>[[LP1137]]</p>";
Pattern p = Pattern.compile("(?<=\\[\\[).*?(?=\\]\\])");
Matcher m = p.matcher(content);
while (m.find()) {
    // Turn each code of the matched list into a link, then swap the result in for the first remaining [[...]] block.
    content = content.replaceFirst("(\\[\\[).*?(\\]\\])",
            m.group().replaceAll("(\\w+)(,\\s*\\d*)*", "<a href=\"index?page=content&id=$1\"></a>"));
}
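Both solutions above rescan the text after every replacement. A single-pass variant is possible with Matcher.appendReplacement; the following is only a sketch under the assumption that the link format is exactly the one shown in the question. The class name BracketLinks and the method expand are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketLinks {
    // Captures everything between "[[" and "]]" in group 1.
    private static final Pattern CODES = Pattern.compile("\\[\\[([^\\]]*)\\]\\]");

    // Replaces every [[a,b,c]] block with one link per comma-separated code.
    static String expand(String html) {
        Matcher m = CODES.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder links = new StringBuilder();
            for (String code : m.group(1).split(",")) {
                links.append("<a href=\"index?page=content&id=")
                     .append(code.trim())
                     .append("\"></a>\n");
            }
            // quoteReplacement guards against '$' or '\' appearing in a code.
            m.appendReplacement(out, Matcher.quoteReplacement(links.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("<div>[[MI44, MI304]]</div><p>[[LP1137]]</p>"));
    }
}
```

The surrounding HTML is copied through unchanged by appendReplacement and appendTail, so nothing outside the brackets is touched.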

java regular expressions regex

I have a problem extracting data from a website.
I'm trying to get the company name and the price; in this case they are SYGNITY and 8,40:
<a class="link" href="http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html">SYGNITY</a>
</td>
<td class="ac"><img width="12" height="11" src="http://static1.money.pl/i/gielda/chart.gif" title="Pokaż wykres" alt="Pokaż wykres" /></td>
<td class="al">SGN</td>
<td class="ar">8,40</td>
I tried this pattern, but it doesn't work:
String expr = "<a class=\"link\" href=\"(.+?)\">(.+?)</a>(.+?)<td class=\"ar\">(.+?)</td> ";
Any advice?
Using JSoup parser
You should use an HTML parser like JSoup, since regex is not a good tool for parsing HTML.
You can do something like this:
String htmlString = "YOUR HTML HERE";
Document document=Jsoup.parse(htmlString);
Element element=document.select("a[href=http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html]").first();
System.out.println(element.text()); //SYGNITY
element=document.select("td[class=ar]").first();
System.out.println(element.text()); //8,40
Using regex
If you still want to use a regex, then you could use a regex like below and grab the content from capturing groups:
PLCMPLD00016.html">(.*?)<\/a>|"ar">(.*?)<\/td>
Working demo
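String htmlString = "YOUR HTML HERE";
// Quotes inside a Java string literal must be escaped; "/" needs no escaping in a Java regex.
Pattern pattern = Pattern.compile("PLCMPLD00016\\.html\">(.*?)</a>|\"ar\">(.*?)</td>");
Matcher matcher = pattern.matcher(htmlString);
while (matcher.find()) {
    // Only one alternative matches at a time, so the other group is null.
    if (matcher.group(1) != null) System.out.println(matcher.group(1)); // SYGNITY
    if (matcher.group(2) != null) System.out.println(matcher.group(2)); // 8,40
}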
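As for why the pattern in the question finds nothing: by default "." does not match line breaks, and the pattern also ends with a stray space. A sketch of a single-pattern version follows, assuming the markup looks exactly like the snippet in the question. The class name QuoteRow and the method nameAndPrice are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteRow {
    // DOTALL lets '.' cross the line breaks between the link and the price cell.
    private static final Pattern ROW = Pattern.compile(
            "<a class=\"link\" href=\"[^\"]*\">([^<]+)</a>.*?<td class=\"ar\">([^<]+)</td>",
            Pattern.DOTALL);

    // Returns {name, price} for the first row found, or null if nothing matches.
    static String[] nameAndPrice(String html) {
        Matcher m = ROW.matcher(html);
        return m.find() ? new String[] { m.group(1).trim(), m.group(2).trim() } : null;
    }

    public static void main(String[] args) {
        String html = "<a class=\"link\" href=\"http://www.money.pl/gielda/spolki-gpw/PLCMPLD00016.html\">SYGNITY</a>\n"
                + "</td>\n<td class=\"al\">SGN</td>\n<td class=\"ar\">8,40</td>";
        String[] row = nameAndPrice(html);
        System.out.println(row[0] + " " + row[1]);
    }
}
```

This is still brittle against any change in the markup, which is why the parser-based answer is usually the safer choice.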

WebDriver getText() Method with replace using Java

Html Code
<span data-bind="html: TotalCharges">
<span class="CurrencySymbol">USD </span>
7400.00
<br>
(0.00+0.00)
</span>
I use WebDriver's getText() method to get the TotalCharges value.
Code:
driver.findElement(By.xpath("//span[@data-bind='html: TotalCharges']")).getText().substring(4);
The code above gives the following output:
"7400.00
(0.00+0.00)"
My expected output is "7400.00".
How can I drop the text that follows the <br> tag, i.e. "(0.00+0.00)"?
I'm using Java.
Use the following XPath to get 7400.00:
driver.findElement(By.xpath("//span[@class='CurrencySymbol']/following-sibling::text()[1]")).getText();
Oh, my mistake (findElement can only return elements, and text() selects a text node). Thanks for correcting me, @alecxe.
You can get it with:
driver.findElement(By.xpath("//span[@class='CurrencySymbol']/.."))
.getText().split("\n")[0].split(" ")[1];
Splitting at \n splits the text at the <br> tag.
Try the following solution. It gives 7400.00 as output:
String temp = driver.findElement(By.cssSelector("html>body>span")).getText();
String s1=temp.replace("USD", "").replace("\n", "").replace("\r", "");
String finalStr = s1.substring(0,s1.indexOf("(")).trim();
System.out.println(finalStr);
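The string handling can also be tried out without a browser. The sketch below assumes getText() returns "USD 7400.00\n(0.00+0.00)", as the question implies, and simply takes the first number in that text. The class name TotalCharge and the method firstAmount are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TotalCharge {
    // A run of digits with an optional decimal part, e.g. 7400.00
    private static final Pattern AMOUNT = Pattern.compile("\\d+(?:\\.\\d+)?");

    // Returns the first number found in the text, or an empty string if there is none.
    static String firstAmount(String text) {
        Matcher m = AMOUNT.matcher(text);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Assumed shape of the WebDriver getText() result.
        String text = "USD 7400.00\n(0.00+0.00)";
        System.out.println(firstAmount(text));
    }
}
```

Unlike substring(4), this does not depend on the length of the currency symbol.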

java: regular expression

I have an HTML string that includes lots of image tags. I need to get each tag and change it. For example:
String imageRegex = "(<img.+(src=\".+\").+/>){1}";
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
Matcher matcher = Pattern.compile(imageRegex, Pattern.CASE_INSENSITIVE).matcher(str);
int i = 0;
while (matcher.find()) {
i++;
Log.i("TAG", matcher.group());
}
The result is:
<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />hello world<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />
But that's not what I want. I want the result to be:
<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />
What's wrong with my regular expression?
Try (<img)(.*?)(/>), this should do the trick, although yes, you shouldn't use Regex for parsing HTML, as people will tell you over and over.
I don't have eclipse installed, but I have VS2010, and this works for me.
String imageRegex = "(<img)(.*?)(/>)";
String str = "<img src=\"static/image/smiley/comcom/9.gif\" smilieid=\"296\" border=\"0\" alt=\"\" />hello world<img src=\"static/image/smiley/comcom/7.gif\" smilieid=\"294\" border=\"0\" alt=\"\" />";
System.Text.RegularExpressions.MatchCollection match = System.Text.RegularExpressions.Regex.Matches(str, imageRegex, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
StringBuilder sb = new StringBuilder();
foreach (System.Text.RegularExpressions.Match m in match)
{
sb.AppendLine(m.Value);
}
System.Windows.MessageBox.Show(sb.ToString());
Result:
<img src="static/image/smiley/comcom/9.gif" smilieid="296" border="0" alt="" />
<img src="static/image/smiley/comcom/7.gif" smilieid="294" border="0" alt="" />
David M is correct: you really shouldn't try to do this. Your specific problem, though, is that the + quantifier in your regex is greedy, so it matches the longest possible substring.
See the regex tutorial for more details on quantifiers.
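To see the difference between the greedy and the lazy quantifier in Java, here is a small sketch using shortened file names instead of the paths from the question. The class name GreedyVsLazy and the helper findAll are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String html = "<img src=\"9.gif\" />hello world<img src=\"7.gif\" />";
        // Greedy: ".+" runs to the last "/>" in the input, so there is one long match.
        System.out.println(findAll("<img.+/>", html).size());  // 1
        // Lazy: ".+?" stops at the first "/>" it can, so each tag is matched separately.
        System.out.println(findAll("<img.+?/>", html).size()); // 2
    }
}
```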
I would NOT recommend using regex for parsing HTML. Please consider JSoup or a similar solution:
Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements images = doc.select("img");
Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp.

regex how do i remove the last "<br>" in a string

How can I remove the last <br> from a string with replace() or replaceAll()?
The <br> comes after either another <br> or a word.
My idea is to append a marker string to the end of the string, so that it ends with <br> + my marker.
How can I replace it then?
Regexp is probably not the best tool for this kind of task. Also check the answers to this similar question. When looking for <br>, you may also run into <BR> or <br />.
String str = "ab <br> cd <br> 12";
String res = str.replaceAll( "^(.*)<br>(.*)$", "$1$2" );
// res = "ab <br> cd  12" (the first group is greedy, so only the last <br> is removed; both surrounding spaces remain)
If you are trying to replace the last <br /> which might not be the last thing in the string, you could use something like this.
String replaceString = "<br />";
String str = "fdasfjlkds <br /> fdasfds <br /> dfasfads";
int ind = str.lastIndexOf(replaceString);
String newString = str.substring(0, ind - 1)
+ str.substring(ind + replaceString.length());
System.out.println(newString);
Output
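fdasfjlkds <br /> fdasfds dfasfads
Of course, you'll have to add some checks to avoid exceptions, for example when str is null or when the tag is absent (lastIndexOf then returns -1 and substring throws).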
Not using replace, but does what you want without a regex:
String s = "blah blah <br>";
if (s.endsWith("<br>")) {
s = s.substring(0, s.length() - 4);
}
Using regex, it would be:
theString.replaceAll("<br>$", "");
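If the last tag is not at the very end of the string, or may be written as <BR> or <br />, a negative lookahead can pick out the final occurrence. This is only a sketch under the assumption that those three spellings are the ones to handle. The class name LastBr and the method removeLast are hypothetical.

```java
import java.util.regex.Pattern;

class LastBr {
    // (?i) ignores case, (?s) lets '.' match line breaks.
    // The negative lookahead rejects any <br> that is followed by another <br>.
    private static final Pattern LAST_BR =
            Pattern.compile("(?is)<br\\s*/?>(?!.*<br\\s*/?>)");

    // Removes the last <br>, <BR> or <br /> in the string, wherever it occurs.
    static String removeLast(String s) {
        return LAST_BR.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(removeLast("blah blah <br>"));
        System.out.println(removeLast("ab <br> cd <BR /> 12"));
    }
}
```

Strings without any tag are returned unchanged, so no extra check is needed before calling it.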
