Remove html from only a part of string - java

I have the following code, which should remove all HTML from the parts of a string that are delimited by dollar signs (there may be several such parts). It works, but I also need to preserve the dollar signs themselves. Any suggestions?
private static String removeMarkupBetweenDollars(String input){
if ((input.length()-input.replaceAll("\\$","").length())%2!=0)
{
throw new RuntimeException("Missing or extra: dollar");
}
Pattern pattern = Pattern.compile("\\$(.*?)\\$",Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
StringBuffer sb =new StringBuffer();
while(matcher.find())
{ //prepending does NOT work, if sth. is in front of first dollar
matcher.appendReplacement(sb,matcher.group(1).replaceAll("\\<.*?\\>", ""));
sb.append("$"); //note this manual appending
}
matcher.appendTail(sb);
System.out.println(sb.toString());
return sb.toString();
}
Thanks for the help! Example input and expected output:
String input="<p>$<em>something</em>$</p> <p>anything else</p>";
String output="<p>$something$</p> <p>anything else</p>";
More complicated input and output:
String input="<p>$ bar <b>foo</b>  bar <span style=\"text-decoration: underline;\">foo</span>  $</p><p>another foos</p> $ foo bar <em>bar</em>$";
String output="<p>$ bar foo  bar foo  $</p><p>another foos</p> $ foo bar bar$";

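Just some minor tweaks to your code: use matcher.group() (the whole match, dollar signs included) instead of matcher.group(1), and pass the result through Matcher.quoteReplacement so the dollar signs are inserted literally rather than read as group references. That also makes the manual sb.append("$") unnecessary: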
private static String removeMarkupBetweenDollars(String input) {
if ((input.length() - input.replaceAll("\\$", "").length()) % 2 != 0) {
throw new RuntimeException("Missing or extra: dollar");
}
Pattern pattern = Pattern.compile("\\$(.*?)\\$", Pattern.DOTALL);
Matcher matcher = pattern.matcher(input);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String s = matcher.group().replaceAll("<[^>]+>", "");
matcher.appendReplacement(sb, Matcher.quoteReplacement(s));
}
matcher.appendTail(sb);
return sb.toString();
}

If every dollar-delimited part has a tag directly after the opening $ and a tag directly before the closing $, a single replaceAll is enough:
String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
The key point in the regex is the ? in .*?: it makes the match non-greedy, which means "consume as little input as possible". Without it, the regex would consume as much as possible, running up to the end of a later occurrence of $<html>foo</html>$ in the input if one existed.
Here's a test:
public static void main(String[] args) throws Exception {
String input = "<p>$<em>something</em>$</p> <p>and $<em>anything</em>$ else</p>";
String output = input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
System.out.println(output);
}
Output:
<p>$something$</p> <p>and $anything$ else</p>
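As a rough check of how the one-liner behaves on the question's more complicated input, the sketch below runs that input through both the one-liner and a loop that strips every tag inside each $...$ part. The class and method names are made up for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DollarMarkupCheck {

    // The one-liner from this answer, wrapped in a method for comparison.
    static String oneLiner(String input) {
        return input.replaceAll("\\$<.*?>(.*?)<.*?>\\$", "\\$$1\\$");
    }

    // Loop-based variant: strips every tag inside each $...$ part.
    static String stripBetweenDollars(String input) {
        Pattern pattern = Pattern.compile("\\$.*?\\$", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String cleaned = matcher.group().replaceAll("<[^>]+>", "");
            // quoteReplacement keeps the literal '$' from being read as a group reference
            matcher.appendReplacement(sb, Matcher.quoteReplacement(cleaned));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "<p>$ bar <b>foo</b>  bar "
                + "<span style=\"text-decoration: underline;\">foo</span>  $</p>"
                + "<p>another foos</p> $ foo bar <em>bar</em>$";
        System.out.println(oneLiner(input));
        System.out.println(stripBetweenDollars(input));
    }
}
```

The first $ in that input is followed by a space rather than a tag, so the one-liner cannot strip the <b> element, while the loop variant produces the output the question asks for.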

Related

Java Regex compress String

I have a random String, for example "aaaaaaBccccCCCCd", and I need a regex that finds the groups of repeated characters so that I get "a6B1c4C4d1". My regex is "(\\D+)\\D*\\1", but it loses single letters, in this sample B and d.
Does anyone have an idea?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Compress {
public static void main(String[] args) {
String text = "aaaaaaBccccCCCCd";
String regex = "(\\D+)\\D*\\1"; // or (.+).*\\1
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(text);
String result = new String();
while (matcher.find()) {
String letter = matcher.group().substring(0, 1);
String numberOfLetter = String.valueOf(matcher.group().length());
result = result + letter + numberOfLetter;
}
System.out.println(result);
}
}
Thank you.
Use the following approach based on Matcher#appendReplacement:
String text = "aaaaaaBccccCCCCd"; // expected: a6B1c4C4d1
String regex = "(.)(\\1*)";
Pattern r = Pattern.compile(regex);
Matcher m = r.matcher(text);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // quoteReplacement keeps a matched '$' or '\' from being treated as special
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)) + (m.group(2).length() + 1));
}
m.appendTail(sb);
System.out.println(sb);
The pattern (.)(\1*) captures any character into Group 1 and then captures zero or more repetitions of that same character into Group 2. In the loop body, Group 1 is concatenated with the length of Group 2 plus one, which accounts for the character held in Group 1.
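For comparison, the same run-length idea can be sketched without a regex by scanning for the end of each run. This is only an illustration with hypothetical class and method names. Note one difference: the regex version leaves line terminators untouched because . does not match them by default, whereas this loop counts them like any other character.

```java
public class RunLength {

    // Emits each character followed by the length of its run.
    static String compress(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int j = i;
            // advance j to the first position that no longer holds c
            while (j < text.length() && text.charAt(j) == c) {
                j++;
            }
            sb.append(c).append(j - i);
            i = j;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("aaaaaaBccccCCCCd")); // a6B1c4C4d1
    }
}
```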

Increment digit inside string

Hi, I want to increment the integer values embedded in a string.
For example, the initial string is m1p1b1.
The code below works correctly for that input, but it has one problem.
When the string is m10p10b10, it gives the result m21p21b21 instead of m11p11b11.
Also, the number of digits in each integer varies, so I can't hard-code anything.
Pattern digitPattern = Pattern.compile("(\\d)");
Matcher matcher = digitPattern.matcher("m1p1b1");
StringBuffer result = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(result, String.valueOf(Integer.parseInt(matcher.group(1)) + 1));
}
matcher.appendTail(result);
System.out.println(result.toString());
Change \\d to \\d+ to match one or more digits:
Pattern digitPattern = Pattern.compile("\\d+");
Matcher matcher = digitPattern.matcher("m10p10b10");
StringBuffer result = new StringBuffer();
while (matcher.find()) {
matcher.appendReplacement(result, String.valueOf(Integer.parseInt(matcher.group(0)) + 1));
}
matcher.appendTail(result);
System.out.println(result.toString()); // => m11p11b11
Note that you do not have to capture the whole pattern with (...); you can access the full match using matcher.group(0).
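One caveat: Integer.parseInt throws a NumberFormatException once a digit run exceeds the int range. Since the digit runs here can be of any length, a hedged sketch using BigInteger instead could look like this. The class and method names are hypothetical, and leading zeros are not preserved (007 becomes 8):

```java
import java.math.BigInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IncrementNumbers {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Adds one to every run of digits, whatever its length.
    static String incrementAll(String input) {
        Matcher matcher = DIGITS.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            BigInteger next = new BigInteger(matcher.group()).add(BigInteger.ONE);
            // a decimal number contains no '$' or '\', so no quoting is needed here
            matcher.appendReplacement(result, next.toString());
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(incrementAll("m10p10b10")); // m11p11b11
    }
}
```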

Uppercase all characters but not those in quoted strings

I have a String and I would like to uppercase everything that is not quoted.
Example:
My name is 'Angela'
Result:
MY NAME IS 'Angela'
Currently, I am matching every quoted string then looping and concatenating to get the result.
Is it possible to achieve this in one regex expression maybe using replace?
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
String format = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
System.out.println(input);
System.out.println("----------------------");
System.out.println(format);
Input: 's'Hello This is 'Java' Not '.NET'
Output: 's'HELLO THIS IS 'Java' NOT '.NET'
You could use a regular expression like this:
([^'"]+)(['"]+[^'"]+['"]+)(.*)
# match and capture everything up to a single or double quote (but not including)
# match and capture a quoted string
# match and capture any rest which might or might not be there.
This will only work with one quoted string, obviously.
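A single pass can cover any number of quoted strings by matching either a quoted string or a run of unquoted text, and uppercasing only the latter. Here is a minimal sketch, assuming only single quotes delimit strings and that the quotes are balanced; the class and method names are made up for illustration:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UpperOutsideQuotes {

    // Either a complete '...' string, or a run of characters containing no quote.
    private static final Pattern TOKEN = Pattern.compile("'[^']*'|[^']+");

    static String upperOutsideQuotes(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String token = m.group();
            // Only the quoted alternative can start with a quote character.
            String out = token.startsWith("'") ? token : token.toUpperCase(Locale.ROOT);
            m.appendReplacement(sb, Matcher.quoteReplacement(out));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(upperOutsideQuotes("'s'Hello This is 'Java' Not '.NET'"));
    }
}
```

Because appendTail is called after the loop, trailing unquoted text is kept.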
OK, this will do it for you. It is not efficient, but it will work for all cases. I don't actually recommend this solution, as it will be too slow.
public static void main(String[] args) {
String s = "'Peter' said, My name is 'Angela' and I will not change my name to 'Pamela'.";
Pattern p = Pattern.compile("('\\w+')");
Matcher m = p.matcher(s);
List<String> quotedStrings = new ArrayList<>();
while(m.find()) {
quotedStrings.add(m.group(1));
}
s=s.toUpperCase();
// System.out.println(s);
for (String str : quotedStrings)
s= s.replaceAll("(?i)"+str, str);
System.out.println(s);
}
Output:
'Peter' SAID, MY NAME IS 'Angela' AND I WILL NOT CHANGE MY NAME TO 'Pamela'.
Adding to the answer by #jan_kiran: we need to call the appendTail() method. The updated code is:
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\'(.*?)\\'");
String input = "'s'Hello This is 'Java' Not '.NET'";
Matcher regexMatcher = regex.matcher(input);
StringBuffer sb = new StringBuffer();
int counter = 0;
while (regexMatcher.find())
{// Finds Matching Pattern in String
regexMatcher.appendReplacement(sb, "{"+counter+"}");
matchList.add(regexMatcher.group());// Fetching Group from String
counter++;
}
regexMatcher.appendTail(sb);
String formatted_string = MessageFormat.format(sb.toString().toUpperCase(), matchList.toArray());
I had no luck with these solutions, as they seemed to drop trailing non-quoted text.
This code works for me and handles both ' and " by remembering the type of the last opening quotation mark. It lowercases rather than uppercases, so replace toLowerCase with toUpperCase as appropriate.
It may be slow; I have not measured it:
private static String toLowercaseExceptInQuotes(String line) {
StringBuffer sb = new StringBuffer(line);
boolean nowInQuotes = false;
char lastQuoteType = 0;
for (int i = 0; i < sb.length(); ++i) {
char cchar = sb.charAt(i);
if (cchar == '"' || cchar == '\''){
if (!nowInQuotes) {
nowInQuotes = true;
lastQuoteType = cchar;
}
else {
if (lastQuoteType == cchar) {
nowInQuotes = false;
}
}
}
else if (!nowInQuotes) {
sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
}
}
return sb.toString();
}

Is there a way to use replaceAll on string but call method for replacing the text on each occurrence of a match

I want to replace all occurrences of a particular string with different UUIDs. For example,
content = content.replaceAll("xyz", "xyz" + generateUUID());
The problem is that every "xyz" gets replaced with the same UUID, because generateUUID() is evaluated only once, before replaceAll runs. I want each "xyz" to get its own unique ID. How can this be done?
You can do this using Matcher.appendReplacement. This will give you the replaceAll functionality of a complete regex (not just a static String). Here, I use uidCounter as a very simple generateUUID; you should be able to adapt this to your own generateUUID function.
public class AppendReplacementExample {
public static void main(String[] args) throws Exception {
int uidCounter = 1000;
Pattern p = Pattern.compile("xyz");
String test = "abc xyz def xyz ghi xyz";
Matcher m = p.matcher(test);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, m.group() + uidCounter);
uidCounter++;
}
m.appendTail(sb);
System.out.println(sb.toString());
}
}
Output:
abc xyz1000 def xyz1001 ghi xyz1002
You could use a StringBuilder (for efficiency, since String is immutable), a while loop and something like
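// content = content.replaceAll("xyz", "xyz" + generateUUID());
StringBuilder sb = new StringBuilder(content);
String toReplace = "xyz";
int pos = 0;
while ((pos = sb.indexOf(toReplace, pos)) > -1) {
    // keep the "xyz" prefix, as in the replaceAll call above
    String replacement = toReplace + generateUUID();
    sb.replace(pos, pos + toReplace.length(), replacement);
    // resume searching after the inserted text so the same "xyz" is not matched again
    pos += replacement.length();
}
// content = sb.toString(); // <-- if you want to use content.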
It looks like you'd like a way to say something like this:
content = content.replaceAll("xyz", x -> x + generateUUID());
Here's an adaptation of durron597's answer that lets you do almost that:
content = replaceAll(content, "xyz", x -> x + generateUUID());
public static String replaceAll(String source, String regex,
Function<String, String> replacement) {
StringBuffer sb = new StringBuffer();
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(source);
while (matcher.find()) {
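// quoteReplacement: treat the function's result as literal text, even if it contains $ or \
matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement.apply(matcher.group(0))));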
}
matcher.appendTail(sb);
return sb.toString();
}
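On Java 9 and later, Matcher.replaceAll also accepts a Function<MatchResult, String>, which gets close to the lambda form wished for above without a helper method. A minimal sketch, assuming UUID.randomUUID() stands in for generateUUID(), with hypothetical class and method names:

```java
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueReplace {

    // Appends a fresh UUID to every occurrence of the given literal text.
    static String tagEach(String content, String literal) {
        return Pattern.compile(Pattern.quote(literal))
                .matcher(content)
                // The lambda runs once per match, so each occurrence gets its own UUID.
                // Its result is still parsed as a replacement string, hence quoteReplacement.
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group() + UUID.randomUUID()));
    }

    public static void main(String[] args) {
        System.out.println(tagEach("abc xyz def xyz ghi xyz", "xyz"));
    }
}
```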

Java Pattern match

I have a long template from which I need to extract certain strings based on certain patterns. While going through some examples, I found that quantifiers are useful in such situations. For example, the following is my template, from which I need to extract while and doWhile.
This is a sample document.
$while($variable)This text can be repeated many times until do while is called.$endWhile.
Some sample text follows this.
$while($variable2)This text can be repeated many times until do while is called.$endWhile.
Some sample text.
I need to extract the whole text, starting from $while($variable) till $endWhile. I then need to process the value of $variable. After that I need to insert the text between $while and $endWhile to the original text.
I've the logic of extracting the variable. But I'm not sure how to use quantifiers or pattern match here.
Can someone please provide me a sample code for this? Any help will be greatly appreciated
You can use a rather simple regex-based solution here with a Matcher:
Pattern pattern = Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);
Matcher matcher = pattern.matcher(yourString);
while(matcher.find()){
String variable = matcher.group(1); // this will include the $
String value = matcher.group(2);
// now do something with variable and value
}
If you want to replace the variables in the original text, you should use the Matcher.appendReplacement() / Matcher.appendTail() solution:
Pattern pattern = Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);
Matcher matcher = pattern.matcher(yourString);
StringBuffer sb = new StringBuffer();
while(matcher.find()){
String variable = matcher.group(1); // this will include the $
String value = matcher.group(2);
// now do something with variable and value
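// quoteReplacement: the template text may itself contain $ or \, which appendReplacement would otherwise interpret
matcher.appendReplacement(sb, Matcher.quoteReplacement(value));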
}
matcher.appendTail(sb);
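To make the "do something with variable and value" step concrete, here is a hedged sketch that repeats each block body a given number of times. The repeatCounts map is purely a hypothetical stand-in for however the value of $variable is actually processed, and the class and method names are made up as well:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WhileTemplate {

    private static final Pattern WHILE_BLOCK =
            Pattern.compile("\\$while\\((.*?)\\)(.*?)\\$endWhile", Pattern.DOTALL);

    // repeatCounts (hypothetical) maps a variable such as "$variable" to a repeat count.
    static String expand(String template, Map<String, Integer> repeatCounts) {
        Matcher matcher = WHILE_BLOCK.matcher(template);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String variable = matcher.group(1); // includes the leading $
            String body = matcher.group(2);
            int times = repeatCounts.getOrDefault(variable, 0);
            StringBuilder expanded = new StringBuilder();
            for (int i = 0; i < times; i++) {
                expanded.append(body);
            }
            // The body may contain $ characters, so it has to be quoted.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(expanded.toString()));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("$variable", 2);
        System.out.println(expand("Intro. $while($variable)Repeated text. $endWhile. Outro.", counts));
    }
}
```

Blocks whose variable is missing from the map are replaced with nothing in this sketch; a real template engine would probably want to report that instead.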
Reference:
Methods of the Pattern Class (Sun Java Tutorial)
Methods of the Matcher Class (Sun Java Tutorial)
Pattern JavaDoc
Matcher JavaDoc
public class PatternInString {
static String testcase1 = "what i meant here";
static String testcase2 = "here";
public static void main(String args[])throws StringIndexOutOfBoundsException{
PatternInString testInstance= new PatternInString();
boolean result = testInstance.occurs(testcase1,testcase2);
System.out.println(result);
}
// Returns true if str2 occurs in str1 as a whole space-separated word.
public boolean occurs(String str1, String str2) {
    // The limit -1 keeps trailing empty strings, so a trailing space is handled like a leading one.
    for (String word : str1.split(" ", -1)) {
        if (word.equals(str2)) {
            return true;
        }
    }
    return false;
}
}
