url encode matched groups - java

I've got a regex that matches a given pattern (obviously, that's what regexes do) and replaces it with an anchor tag that includes a captured group. That part is working lovely.
String substituted = content.asString().replaceAll("\\[{2}((?:.)*?)\\]{2}",
"$1");
What I can't figure out is how to URL-encode the captured group before using it in the href attribute.
Example inputs
[[a]]
[[a b]]
[[a&b]]
desired outputs
a
a b
a&b
Is there any way to do this? I haven't found anything that looks useful yet, though once I ask I usually find an answer.

Either replace all the special characters with what you want first, then match the text inside the double [ and put it in the <a href=..> tag,
or extract the URL part inside the [[ ]] and pass it through a URL encoder before placing it in the <a href=..> tag.
Java offers java.net.URLEncoder out of the box, so I think extracting the URL from the pattern, passing it through the encoder, and then placing it in the <a href=..> tag is your best choice.
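To make the encoder step concrete, here is a minimal sketch of what java.net.URLEncoder produces for the three sample inputs from the question. The class and method names (EncodeGroupDemo, encode) are made up for illustration. Note that URLEncoder implements form encoding, so a space becomes + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeGroupDemo {
    // Encodes one captured group for use inside a URL.
    // URLEncoder produces application/x-www-form-urlencoded output.
    static String encode(String group) {
        try {
            return URLEncoder.encode(group, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every JVM is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "a b", "a&b"}) {
            System.out.println(s + " -> " + encode(s));
        }
    }
}
```

If you need %20 instead of + in a path segment, you would have to post-process the result or use a different encoder; that choice depends on where in the URL the value ends up.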

Sure 'nough, found my answer.
Started with the code from Matcher.appendReplacement
Pure java:
Pattern p = Pattern.compile("\\[{2}((?:.)*?)\\]{2}" );
Matcher m = p.matcher(content.asString());
StringBuffer sb = new StringBuffer();
while (m.find()) {
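String one = m.group(1);
try {
// URL-encode the captured group for the href. The anchor markup is an assumed
// reconstruction; quoteReplacement keeps any '$' or '\' in the text literal.
String link = "<a href=\"" + URLEncoder.encode(one, "UTF-8") + "\">" + one + "</a>";
m.appendReplacement(sb, Matcher.quoteReplacement(link));
} catch (UnsupportedEncodingException e) {
// Cannot happen for UTF-8, which every JVM must support.
e.printStackTrace();
}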
}
m.appendTail(sb);
GWT:
RegExp p = RegExp.compile("\\[{2}((?:.)*?)\\]{2}", "g");
MatchResult m;
StringBuffer sb = new StringBuffer();
int beginIndex = 0;
while ((m = p.exec(content.asString())) != null) {
String one = m.getGroup(1);
int endIndex = m.getIndex();
sb.append(content.asString().substring(beginIndex, endIndex));
sb.append("" + one + "");
beginIndex = p.getLastIndex();
}
sb.append(content.asString().substring(beginIndex));
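
On Java 9 or later, the same find-and-append loop can probably be written more compactly with Matcher.replaceAll(Function). The sketch below is not the code from the answer above: the class name WikiLinkDemo and the exact anchor markup are assumptions for illustration, and the link text is inserted without HTML escaping.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class WikiLinkDemo {
    // Same idea as the pattern in the question: [[ ... ]] with a lazy capture.
    private static final Pattern LINK = Pattern.compile("\\[{2}(.*?)\\]{2}");

    static String linkify(String content) {
        // replaceAll(Function) calls the lambda once per match (Java 9+).
        return LINK.matcher(content).replaceAll(match -> {
            String text = match.group(1);
            String anchor = "<a href=\"" + encode(text) + "\">" + text + "</a>";
            // The returned string is still parsed for $ and \, so quote it.
            return Matcher.quoteReplacement(anchor);
        });
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(linkify("see [[a b]] and [[a&b]]"));
    }
}
```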

Related

Regex for replacing Exact String match [duplicate]

My input:
1. end
2. end of the day or end of the week
3. endline
4. something
5. "something" end
Based on the above discussions, if I try to delete a single word using this snippet, it successfully removes the matching words from each line:
public class DeleteTest {
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
String delete="end";
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+delete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
}
My output if I use the above snippet (also my expected output):
1.
2. of the day or of the week
3. endline
4. something
5. "something"
But when I want to delete more words and use a Set for that purpose, as in the snippet below:
public static void main(String[] args) {
// TODO Auto-generated method stub
try {
File file = new File("C:/Java samples/myfile.txt");
File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));
Set<String> toDelete = new HashSet<>();
toDelete.add("end");
toDelete.add("something");
for (String line; (line = reader.readLine()) != null;) {
line = line.replaceAll("\\b"+toDelete+"\\b", "");
writer.println(line);
}
reader.close();
writer.close();
}
catch (Exception e) {
System.out.println("Something went Wrong");
}
}
I get this output (it just removes the spaces between words):
1. end
2. endofthedayorendoftheweek
3. endline
4. something
5. "something" end
Can you help me with this?
You need to create an alternation group out of the set with
String.join("|", toDelete)
and use it as
line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");
The pattern will look like
\b(?:end|something)\b
See the regex demo. Here, (?:...) is a non-capturing group that is used to group several alternatives without creating a memory buffer for the capture (you do not need it since you remove the matches).
Or, better, compile the regex before entering the loop:
Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
line = pat.matcher(line).replaceAll("");
UPDATE:
To match whole "words" that may contain special characters, Pattern.quote each word to escape those characters, and use unambiguous word boundaries: the negative lookbehind (?<!\w) instead of the initial \b, to make sure there is no word character before the match, and the negative lookahead (?!\w) instead of the final \b, to make sure there is no word character after it.
In Java 8, you may use this code:
Set<String> nToDel = toDelete.stream()
.map(Pattern::quote)
.collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";
The regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
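Putting the quoting and the lookarounds together, a self-contained sketch might look like the following. The class and method names (WordRemover, compile, remove) are invented here; the pattern is compiled once and reused for every line, as suggested above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class WordRemover {
    // Builds (?<!\w)(?:\Qw1\E|\Qw2\E|...)(?!\w) from the given words.
    static Pattern compile(Set<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // escape regex metacharacters
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    // Deletes every whole-word occurrence; surrounding spaces are left as they are.
    static String remove(Pattern pattern, String line) {
        return pattern.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern p = compile(new LinkedHashSet<>(Arrays.asList("end", "something")));
        System.out.println(remove(p, "2. end of the day or end of the week"));
        System.out.println(remove(p, "3. endline"));
    }
}
```

Because only the words themselves are removed, the spaces on both sides stay behind, so a removed word in the middle of a line leaves two consecutive spaces.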
The problem is that you're not creating the correct regex for replacing the words in the set.
"\\b"+toDelete+"\\b" will produce this String \b[end, something]\b which is not what you need.
To fix that you can do something like this:
for(String del : toDelete){
line = line.replaceAll("\\b"+del+"\\b", "");
}
What this does is to go through the set, produce a regex from each word and remove that word from the line String.
Another approach will be to produce a single regex from all the words in the set.
Eg:
String regex = "";
for(String word : toDelete){
regex+=(regex.isEmpty() ? "" : "|") + "(\\b"+word+"\\b)";
}
....
line = line.replaceAll(regex, ""); // replaceAll, because replace() would treat the regex as literal text
This should produce a regex that looks something like this: (\bend\b)|(\bsomething\b)

java Convert Hex NCRs texts to unicode characters

I'm making a feed reader app for local languages. A news site provides an RSS feed with these characters
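&#x0D39;&#x0D32;&#x0D4B; &#x0D38;&#x0D4D;&#x0D31;&#x0D4D;&#x0D31;&#x0D3E;&#x0D15;&#x0D4D;&#x0D15;&#x0D4D;&#x0D13;&#x0D35;&#x0D7C; &#x0D2B;&#x0D4D;&#x0D32;&#x0D4B;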
Which actually means
ഹലോ സ്റ്റാക്ക്ഓവർ ഫ്ലോ
This is also what I want to display in my app.
How can I convert this input to the required form?
Try this.
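String input = "&#x0D39;&#x0D32;&#x0D4B; &#x0D38;&#x0D4D;&#x0D31;"
+ "&#x0D4D;&#x0D31;&#x0D3E;&#x0D15;&#x0D4D;&#x0D15;&#x0D4D;&#x0D13;"
+ "&#x0D35;&#x0D7C; &#x0D2B;&#x0D4D;&#x0D32;&#x0D4B;";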
Pattern HEX = Pattern.compile("(?i)&#x([0-9a-f]+);|&#(\\d+);");
Matcher m = HEX.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find())
m.appendReplacement(sb,
String.valueOf((char) (m.group(1) != null ?
Integer.parseInt(m.group(1), 16) :
Integer.parseInt(m.group(2)))));
m.appendTail(sb);
String output = sb.toString();
System.out.println(output);
// -> ഹലോ സ്റ്റാക്ക്ഓവർ ഫ്ലോ
This code also handles decimal NCRs.
But it cannot handle code points from U+10000 to U+10FFFF, because the (char) cast keeps only 16 bits.
Or you can use Jsoup like this.
Document doc = Jsoup.parse(input);
String output = doc.text();
System.out.println(output);
// -> ഹലോ സ്റ്റാക്ക്ഓവർ ഫ്ലോ
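
If you want to stay with plain regex and still cover code points above U+FFFF, one possible variation is to build the replacement with Character.toChars instead of a (char) cast. This is a sketch, not the code from the answer above: the class name NcrDecoder is made up, the digit counts are capped so Integer.parseInt cannot overflow, and references that are not valid code points are left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NcrDecoder {
    // Hex (&#x...;) or decimal (&#...;) references; lengths capped to fit an int.
    private static final Pattern NCR =
            Pattern.compile("(?i)&#x([0-9a-f]{1,6});|&#(\\d{1,7});");

    static String decode(String input) {
        Matcher m = NCR.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            // toChars yields a surrogate pair for supplementary code points.
            String decoded = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // keep invalid references as they are
            // Quote so that a decoded '$' or '\' is inserted literally.
            m.appendReplacement(sb, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#x0D39;&#x0D32;&#x0D4B; &#65; &#x1F600;"));
    }
}
```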

Java Pattern/ Matcher

This is a sample text: \1f\1e\1d\020028. I cannot modify the input text; I am reading long strings of text from a file.
I want to extract the following: \1f, \1e, \1d, \02
For this, I have written the following regular expression pattern: "\\[a-fA-F0-9]"
I am using the Pattern and Matcher classes, but my matcher is not able to find the pattern using this regular expression. I have tested this regex against the text on some online regex websites and, surprisingly, it works there.
Where am I going wrong?
Original code:
public static void main(String[] args) {
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
inputText = inputText.replace("\\", "\\\\");
String regex = "\\\\[a-fA-F0-9]{2}";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(inputText);
while (m.find()) {
System.out.println(m.group());
}
}
Output: Nothing is printed
(answer changed after OP added more details)
Your string
String inputText = "\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d";
doesn't actually contain any \ literals because, according to the Java Language Specification (section 3.10.6, Escape Sequences for Character and String Literals), \xxx is interpreted as the character whose Unicode code point is the octal (base 8) value xxx.
Example: \123 = 1*8^2 + 2*8^1 + 3*8^0 = 1*64 + 2*8 + 3*1 = 64+16+3 = 83, which represents the character S.
If the string you presented in your question is written exactly like that in your text file, then in Java source you should write it as
String inputText = "\\1f\\1e\\1d\\02002868BF03030000000000000000S023\\1f\\1e\\1d\\03\\0d";
(with an escaped \, which now represents a literal backslash).
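To see the difference for yourself, a small sketch like this one prints the numeric value of each char in a string literal. The class and method names (OctalEscapeDemo, codeUnits) are made up for illustration.

```java
final class OctalEscapeDemo {
    // Returns the UTF-16 code units of s as space-separated decimal numbers.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append((int) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\1f" is the octal escape \1 (U+0001) followed by the letter f.
        System.out.println(codeUnits("\1f"));
        // "\\1f" is a literal backslash followed by '1' and 'f'.
        System.out.println(codeUnits("\\1f"));
        // "\020028" starts with the three-digit octal escape \020 (decimal 16).
        System.out.println(codeUnits("\020028"));
    }
}
```

The first literal has no backslash in it at all (92 is the code for \), which is why a regex looking for a backslash finds nothing in the hard-coded test string.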
(older version of my answer)
It is hard to tell what exactly you did wrong without seeing your code. You should be able to find at least \1, \1, \1, \0 since your regex can match one \ and one hexadecimal character placed after it.
Anyway this is how you can find results you mentioned in question:
String text = "\\1f\\1e\\1d\\020028";
Pattern p = Pattern.compile("\\\\[a-fA-F0-9]{2}");
// ^^^--we want to find two hexadecimal
// characters after \
Matcher m = p.matcher(text);
while (m.find())
System.out.println(m.group());
Output:
\1f
\1e
\1d
\02
You need to read the file properly and replace '\' characters with '\\'. Assume that there is a file called test_file.txt in your project's resources with this content:
\1f\1e\1d\02002868BF03030000000000000000S023\1f\1e\1d\03\0d
Here is the code to read the file and extract values:
public static void main(String[] args) throws IOException, URISyntaxException {
Test t = new Test();
t.test();
}
public void test() throws IOException {
BufferedReader br =
new BufferedReader(
new InputStreamReader(
getClass().getResourceAsStream("/test_file.txt"), "UTF-8"));
String inputText;
while ((inputText = br.readLine()) != null) {
inputText = inputText.replace("\\", "\\\\");
Pattern pattern = Pattern.compile("\\\\[a-fA-F0-9]{2}");
Matcher match = pattern.matcher(inputText);
while (match.find()) {
System.out.println(match.group());
}
}
}
Try adding a . at the end, like:
\\[a-fA-F0-9].
If you don't want to modify the input string, you could try something like:
static public void main(String[] argv) {
String s = "\1f\1e\1d\020028";
Pattern regex = Pattern.compile("[\\x00-\\x1f][0-9A-Fa-f]");
Matcher match = regex.matcher(s);
while (match.find()) {
char[] c = match.group().toCharArray();
System.out.println(String.format("\\%d%s",c[0]+0, c[1])) ;
}
}
Yes, it's not perfect, but you get the idea.

How to match a url from a list of patterns in a textfile?

I have a text file that contains meta-urls in the following form:
http://www.xyz.com/.*services/
http://www.xyz.com/.*/wireless
I want to compare all the patterns from that file with my URL, and execute an action if I find a match. This matching process is hard to understand for me.
Assuming splitarray[0] contains the first line of text file:
String url = page.getWebURL().getURL();
URL url1 = new URL(url);
how can we compare url1 with splitarray[0]?
UPDATED
BufferedReader readbuffer = null;
try {
readbuffer = new BufferedReader(new FileReader("filters.txt"));
} catch (FileNotFoundException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
String strRead;
try {
while ((strRead=readbuffer.readLine())!=null){
String splitarray[] = strRead.split(",");
String firstentry = splitarray[0];
String secondentry = splitarray[1];
String thirdentry = splitarray[2];
//String fourthentry = splitarray[3];
//String fifthentry = splitarray[4];
System.out.println(firstentry + " " + secondentry+ " " +thirdentry);
URL url1 = new URL("http://www.xyz.com/ship/reach/news-and");
Pattern p = Pattern.compile("http://www.xyz.com/.*/reach");
Matcher m = p.matcher(url1.toString());
if (m.matches()) {
//Do whatever
System.out.println("Yes Done");
}
}
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
Matching is working fine. But I want the action to run for any URL that starts with the pattern given in splitarray[0]. How can we implement this? In the case above the URL does not match, but this URL http://www.xyz.com/ship/w comes from the pattern http://www.xyz.com/.*/reach, so for any URL that starts with this pattern the code in the if block should run. Any suggestions will be appreciated!
You are missing a step here. You first need to translate your URLs to a regular expression, or design a method to use those URLs, then only can you compare your URL url1 to those patterns.
Based on the patterns you have shown, I assume you are designing software for a xyz solution, like their routers. Therefore, your URLs probably fall in a simple pattern style, like
http://www.xyz.com/regular-expression-here
I'm confused as to where the regexes are coming from. The text file? In any case, you'll have a hard time comparing url1 to any regexes because it's a URL object, and regex compares strings. So you'll want to stick with your String url instead.
Try this:
Pattern p = Pattern.compile(splitarray[0]);
Matcher m = p.matcher(url);
if (m.matches()) {
//Do whatever
}
The m.matches() method checks whether the entire String you provide matches the pattern, which is probably what you want here. If you need to check whether part of your String matches, use m.find() instead.
Update
Since you're only looking to match the pattern at the beginning of the String, you'll want to use m.find() instead. The special character ^ only matches at the beginning of a String, so add that to the front of your regex, e.g.:
Pattern p = Pattern.compile("^" + splitarray[0]);
etc.
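
As an alternative to prefixing ^ and calling find(), Matcher.lookingAt() anchors the match at the start of the input without requiring it to reach the end. A minimal sketch, assuming the patterns have already been read from the file into a list (the class and method names UrlPrefixFilter and startsWithAny are made up here):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class UrlPrefixFilter {
    // True if the beginning of url matches at least one of the regexes.
    static boolean startsWithAny(List<String> regexes, String url) {
        for (String regex : regexes) {
            // lookingAt() must match from index 0 but may stop before the end.
            if (Pattern.compile(regex).matcher(url).lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> filters = Arrays.asList(
                "http://www.xyz.com/.*services/",
                "http://www.xyz.com/.*/reach");
        System.out.println(startsWithAny(filters, "http://www.xyz.com/ship/reach/news-and"));
        System.out.println(startsWithAny(filters, "http://www.xyz.com/ship/w"));
    }
}
```

Keep in mind that the dots in www.xyz.com are regex metacharacters here and match any character. If the filter list is long or checked often, compiling each Pattern once up front would be the better design.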

Java regex matching

I have a bunch of lines in a text file and I want to match ${ALPHANUMERIC characters} and replace it with ${SAME ALPHANUMERIC characters plus _SOMETEXT(CONSTANT)}.
I've tried this expression ${(.+)} but it didn't work and I also don't know how to do the replace regex in java.
thank you for your feedback
Here is some of my code :
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
Pattern p = Pattern.compile("\\$\\{.+\\}");
Matcher m = p.matcher(line); // get a matcher object
if(m.find()) {
System.out.println("MATCH: "+m.group());
//TODO
//REPLACE STRING
//THEN APPEND String Builder
}
}
OK, the above works, but it only finds my variable and not the whole line. For example, here is my input:
some text before ${VARIABLE_NAME} some text after
some text before ${VARIABLE_NAME2} some text after
some text before some text without variable some text after
... etc
So I just want to replace ${VARIABLE_NAME} or ${VARIABLE_NAME2} with ${VARIABLE_NAME_SOMETHING} or ${VARIABLE_NAME2_SOMETHING}, but leave the preceding and following text on the line as it is.
EDIT:
I thought of a way like this:
if(line.contains("\\${([a-zA-Z0-9 ]+)}")){
System.out.println(line);
}
if(line.contains("\\$\\{.+\\}")){
System.out.println(line);
}
My idea was to capture the line containing this, then replace , but the regex is not ok, it works with pattern/matcher combination though.
EDIT II
I feel like I'm getting closer to the solution here, here is what I've come up with so far :
if(line.contains("$")){
System.out.println(line.replaceAll("\\$\\{.+\\}", "$1" +"_SUFFIX"));
}
What I meant by $1 is the string you just matched replace it with itself + _SUFFIX
I would use the String.replaceAll() method like so:
String old = "some string data";
String updated = old.replaceAll("\\$([a-zA-Z0-9]+)", "($1) CONSTANT");
The $ is a special regular expression character that represents the end of a line. You'll need to escape it in order to match it. You'll also need to escape the backslash that you use for escaping the dollar sign because of the way Java handles strings.
Once you have your text in a string, you should be able to do the following:
str.replaceAll("\\$\\{([a-zA-Z0-9 ]+)\\}", "\\${$1 _SOMETEXT(CONSTANT)}")
If you have other characters in your variable names (i.e. underscores, symbols, etc...) then just add them to the character class that you are matching for.
Edit: If you want to use a Pattern and Matcher then there are still a few changes. First, you probably want to compile your Pattern outside of the loop. Second, you can use this, although it is more verbose.
Pattern p = Pattern.compile("\\$\\{(.+)\\}"); // capture the name so that $1 in the replacement refers to a group
Matcher m = p.matcher(line);
sb.append(m.replaceAll("\\${$1 _SOMETEXT(CONSTANT)}"));
THE SOLUTION :
while ((line = br.readLine()) != null) {
if(line.contains("$")){
sb.append(line.replaceAll("\\$\\{(.+)\\}", "\\${$1" +"_SUFFIX}") + "\n");
}else{
sb.append(line + "\n");
}
}
line = line.replaceAll("\\$\\{\\w+", "$0_SOMETHING");
There's no need to check for the presence of $ or whatever; that's part of what replaceAll() does. Anyway, contains() is not regex-powered like find(); it just does a plain literal text search.
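
One thing to watch with the greedy .+ versions above: if a line contains two variables, .+ runs from the first ${ to the last }, so only the last variable gets the suffix. A sketch that captures just the name with \w+ avoids this. The class and method names (VariableSuffixer, addSuffix) are made up, and it assumes variable names consist only of letters, digits, and underscores.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableSuffixer {
    // ${NAME} where NAME is one or more word characters, captured as group 1.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    static String addSuffix(String line, String suffix) {
        // In the replacement, \$ is a literal dollar sign and $1 is the captured name.
        String replacement = "\\${$1" + Matcher.quoteReplacement(suffix) + "}";
        return VARIABLE.matcher(line).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(addSuffix("some text before ${VARIABLE_NAME} some text after", "_SUFFIX"));
        System.out.println(addSuffix("${A} and ${B}", "_X"));
        System.out.println(addSuffix("some text without variable", "_SUFFIX"));
    }
}
```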
