Replace string by excluding some strings in Java - java

How can I replace the following string in Java:
Sports videos (From 2002 To 2003) here.
with
Sports videos 2002 2003 here.
I used the code below, but it removes the whole parenthesized part, i.e.
I get this output: Sports videos here.
String pattern= "\\((From)(?:\\s*\\d*\\s*)(To)(?:\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "");
What is missing here?
Thanks
A DIFFERENT STRING WITH A DATE FORMATTER
If the string above contains a date formatter such as (\\), or any other characters or words besides digits, the original answer's pattern will not work.
I replaced the pattern from the original answer with this one, and it works:
String pattern= "\\((From)(.*)(To)(.*)\\)";

Change it to:
String pattern= "\\((From)(\\s*\\d*\\s*)(To)(\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "$2 $4");
There are two problems:
First
You put (?:) around the year groups. That makes them non-capturing, so their contents are not remembered.
Second
You don't use group references such as $1 and $2 in the replacement.
I fixed it by using $2 and $4 to refer to the 2nd and 4th groups.
EDIT
Cleaner solution:
String pattern= "\\(From(\\s*\\d*\\s*)To(\\s*\\d*\\s*)\\)";
String testStr = "Sports videos (From 2002 To 2003) here.";
String testStrAfterRegex = testStr.replaceFirst(pattern, "$1$2");
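The groups above also capture the whitespace around each year, so the result can contain doubled spaces. As a minimal sketch (the class and method names are made up for illustration), the same idea can capture only the digits and let the replacement string control the spacing:

```java
public class YearRangeSketch {
    // Hypothetical helper: turns "(From 2002 To 2003)" into "2002 2003".
    static String stripFromTo(String text) {
        // Only the digits are captured; whitespace stays outside the groups.
        String pattern = "\\(From\\s*(\\d+)\\s*To\\s*(\\d+)\\s*\\)";
        return text.replaceFirst(pattern, "$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(stripFromTo("Sports videos (From 2002 To 2003) here."));
        // prints: Sports videos 2002 2003 here.
    }
}
```

This variant requires at least one digit per year (\\d+ instead of \\d*), which is an assumption about the input rather than something the question states.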

Related

Why regex group doesn't work

I've tried the following regex (in Java string format):
^(.*(iOS\\s+[\\d\\.]+|Android\\s+[\\d\\.]+)?.*)$
The string to match is:
Some Money 2.6.2; iOS 5.1.1
It is supposed to return three groups:
group[0] :Some Money 2.6.2; iOS 5.1.1
group[1] :Some Money 2.6.2; iOS 5.1.1
group[2] :iOS 5.1.1
but it actually returns these:
group[0] :Some Money 2.6.2; iOS 5.1.1
group[1] :Some Money 2.6.2; iOS 5.1.1
group[2] :null
When I change the regex as below, making the inner group required:
^(.*(iOS\\s+[\\d\\.]+|Android\\s+[\\d\\.]+).*)$
it can't match a string like
whatever iS 5.1.1 whatever
What I want to achieve is a regex that returns three groups no matter what the string looks like. The first and second groups should always be the entire string. The third group should be the substring that matches '(iOS|Android) [\d.]*' if the string contains that part, and null or empty if it doesn't.
Maybe you can use the ; delimiter as an indication of where your iOS 5.1.1 part starts?
Then a pattern may look like .+;\\s+(.+).
.+; consumes everything up to the semi-colon
\\s+ consumes the spaces between semi-colon and the start of the version string
(.+) consumes everything up to the end
If you really only want to match iOS or Android, then you might want to add a non-capturing group within the (.+) part.
A regexp would then look like this: ".+;\\s+((?:iOS|Android).+)".
Here is an executable example of what a solution may look like. It shows the behaviour of both pattern variants explained above.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input1 = "Some Money 2.6.2; iS 5.1.1 ";
        String input2 = "Some Money 2.6.2; iOS 5.1.1 ";
        String input3 = "Some Money 2.6.2; Android 5.1.1 ";
        String pattern1 = ".+;\\s+(.+)";
        String pattern2 = ".+;\\s+((?:iOS|Android).+)";
        System.out.println(pattern1);
        matchPattern(input1, pattern1);
        matchPattern(input2, pattern1);
        matchPattern(input3, pattern1);
        System.out.println();
        System.out.println(pattern2);
        matchPattern(input1, pattern2);
        matchPattern(input2, pattern2);
        matchPattern(input3, pattern2);
    }

    // Prints the whole match and group 1 when the pattern matches the entire input.
    private static void matchPattern(String input, String pattern) {
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.matches()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            if (m.groupCount() > 1) {
                System.out.println(m.group(2));
            }
        }
    }
}
Update: Since the target of the question got clearer due to some edits by the author, I feel the need to update my answer. If it is about always getting three groups, the following might be better than working out all possible notation variants:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String input1 = "Some Money 2.6.2; iS 5.1.1";
        String input2 = "Some Money 2.6.2; iOS 5.1.1";
        String input3 = "Some Money 2.6.2; Android 5.1.1";
        String input4 = "Some Money 2.6.2 iOS 5.1.1";
        String input5 = "Some Money 2.6.2 iOS";
        String input6 = "Some Money 2.6.2";
        // group(2) holds only the OS name plus optional version, or null if absent
        String pattern1 = "(.*?(?:((?:iOS|Android)(?:\\s+[0-9\\.]+)?).*)?)";
        System.out.println(pattern1);
        matchPattern(input1, pattern1);
        matchPattern(input2, pattern1);
        matchPattern(input3, pattern1);
        matchPattern(input4, pattern1);
        matchPattern(input5, pattern1);
        matchPattern(input6, pattern1);
    }

    // Prints group 0, group 1 and group 2 when the pattern matches the entire input.
    private static void matchPattern(String input, String pattern) {
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(input);
        if (m.matches()) {
            System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println();
        }
    }
}
Here the pattern is (.*?(?:((?:iOS|Android)(?:\\s+[0-9\\.]+)?).*)?).
.*? consumes everything before the version string. If no version string is available at all, it matches the whole input. The reluctant quantifier is needed here: it takes the shortest match that still lets the overall pattern match, so the whole input is not consumed before the version string can be tried.
(?:((?:iOS|Android)(?:\\s+[0-9\\.]+)?).*)? consumes the whole version string and everything that follows.
((?:iOS|Android)(?:\\s+[0-9\\.]+)?) is the group(2) output. It matches just the OS string, iOS or Android, with an optional version suffix consisting of digits and dots.
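The pattern described above can be wrapped in a small helper, sketched here under the assumption that only group(2) is of interest. The class name OsGroupSketch and the method osPart are hypothetical, not part of the original answer:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OsGroupSketch {
    private static final Pattern P = Pattern.compile(
            "(.*?(?:((?:iOS|Android)(?:\\s+[0-9\\.]+)?).*)?)");

    // Hypothetical helper: returns the OS part (group 2), or null if it is absent.
    static String osPart(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(osPart("Some Money 2.6.2; iOS 5.1.1")); // iOS 5.1.1
        System.out.println(osPart("Some Money 2.6.2 iOS"));        // iOS
        System.out.println(osPart("Some Money 2.6.2; iS 5.1.1"));  // null
    }
}
```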
Please refer to this topic about "How a RegEx engine works":
Those based on back-tracking. These often compile the pattern into byte-code, resembling machine instructions. The engine then executes the code, jumping from instruction to instruction. When an instruction fails, it then back-tracks to find another way to match the input.
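Your regular expression has many ways to match the input, and unfortunately the engine returns a different one from the match you expected: the leading greedy .* consumes the whole input, and the optional second group then matches nothing.
By removing the "?" quantifier from the 2nd group, it becomes required.
The returned match must then include every required group.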
I finally solved the problem with the regex below.
(.*((?:iOS|Android)\\s+[0-9\\.]+).*|.*)
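To compare the two behaviours side by side, here is a small sketch that runs the original optional-group regex and the alternation regex against the same inputs. The helper name group2 is an assumption made for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupSketch {
    // Returns group 2 of a full match, or null if the group did not participate.
    static String group2(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String optional = "^(.*(iOS\\s+[\\d\\.]+|Android\\s+[\\d\\.]+)?.*)$";
        String alternation = "(.*((?:iOS|Android)\\s+[0-9\\.]+).*|.*)";
        String input = "Some Money 2.6.2; iOS 5.1.1";

        // The greedy .* eats everything, so the optional group stays empty.
        System.out.println(group2(optional, input));                           // null
        // The first alternative forces the OS part to be matched if it exists.
        System.out.println(group2(alternation, input));                        // iOS 5.1.1
        // The second alternative (.*) still accepts input without an OS part.
        System.out.println(group2(alternation, "whatever iS 5.1.1 whatever")); // null
    }
}
```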

Java regexp to remove Freemarker interpolation tags

I'm trying to create a regexp to remove Freemarker interpolation tags from a String. I have a template with text and interpolations such as "Hi customer, we remember your appointment ${date?string["dd"]}"
I want to remove/replace this interpolation tag, which is a bit unusual because it contains a question mark.
I tried to create the regexp this way:
String myString = "Hi customer, we remember your appointment ${date?string["dd"]}"
myString = myString.replaceAll(Pattern.quote("${date?string[\"dd\"]}"), "xx");
but it doesn't work. Where am I making the mistake?
Don't forget to assign the return value of the replaceAll method back to the original variable, as replaceAll (like any other String method) doesn't change the underlying immutable String object:
String myString = "Hi customer, we remember your appointment ${date?string[\"dd\"]}";
myString = myString.replaceAll(Pattern.quote("${date?string[\"dd\"]}"), "xx");
//=> Hi customer, we remember your appointment xx
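Since the text to replace is a fixed literal here, String.replace may be a simpler fit: it replaces every occurrence and treats both arguments literally, so Pattern.quote is not needed. A minimal sketch, with a made-up class and method name:

```java
public class LiteralReplaceSketch {
    // Hypothetical helper: swaps the fixed interpolation tag for "xx".
    static String replaceTag(String text) {
        // String.replace does no regex processing, so $, {, ? and [ are safe.
        return text.replace("${date?string[\"dd\"]}", "xx");
    }

    public static void main(String[] args) {
        String myString = "Hi customer, we remember your appointment ${date?string[\"dd\"]}";
        System.out.println(replaceTag(myString));
        // prints: Hi customer, we remember your appointment xx
    }
}
```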
Using regex, you could do:
String myString = "Hi customer, we remember your appointment ${date?string[\"dd\"]}";
myString = myString.replaceAll("\\$\\{date\\?string\\[\"dd\"\\]\\}", "xx");
The result string is:
Hi customer, we remember your appointment xx
I'm dividing the string into 4 logical groups and then assembling the required content.
Try the following regex:
(.*?)(?:\$\{.*?\"([^\"]+).{3})(.*)
Example:
String text = "Hi customer, we remember your appointment ${date?string[\"dd\"]}";
String replacement_text = "xxx";
String rx = "(.*?)(?:\\$\\{.*?\"([^\"]+).{3})(.*)";
Pattern regex = Pattern.compile(rx, Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
Matcher regexMatcher = regex.matcher(text);
String result = regexMatcher.replaceAll("$1" + replacement_text + "$3");
System.out.println(result);
The code will print:
Hi customer, we remember your appointment xxx
However, if you want to extract the content of the marker, i.e. dd, simply replace the value of replacement_text with $2 and you'll get
Hi customer, we remember your appointment dd
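If the goal is to replace every interpolation rather than this specific one, a more general pattern might be enough. The sketch below assumes that an interpolation never contains a } before its closing brace; the class and method names are invented for the example:

```java
import java.util.regex.Matcher;

public class InterpolationStripSketch {
    // Replaces every ${...} with the given text.
    // Assumption: no '}' appears inside an interpolation.
    static String replaceInterpolations(String text, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement literal.
        return text.replaceAll("\\$\\{[^}]*\\}", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String template = "Hi ${name}, we remember your appointment ${date?string[\"dd\"]}";
        System.out.println(replaceInterpolations(template, "xx"));
        // prints: Hi xx, we remember your appointment xx
    }
}
```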

parsing internal links from text in xml file

I need to get the internal links present in the text field of a Wikinews XML file.
In my case they come in two formats:
[[w:President of the People's Republic of China|President]]
[[People's Republic of China]]
I applied these regex patterns:
internalLinks = Pattern.compile("\\[\\[w:([^|:]+)\\|.*\\]\\]").matcher(internalLinks).replaceAll("##en.wikipedia.org/wiki/$1##");
internalLinks = Pattern.compile("\\[\\[([^:|]+)\\]\\]").matcher(internalLinks).replaceAll("[[[en.wikinews.org/wiki/$1]]]");
Pattern pattern = Pattern.compile("\\[\\[\\[(.*?)\\]\\]\\]");
Matcher matcher = pattern.matcher(internalLinks);
while (matcher.find())
{
    interLinks += matcher.group(1)+",";
}
Pattern pattern1 = Pattern.compile("##(.*?)##");
Matcher matcher1 = pattern1.matcher(internalLinks);
while (matcher1.find())
{
    interLinks += matcher1.group(1)+",";
}
if (interLinks.length() > 0) {
    interLinks = interLinks.substring(0, interLinks.length()-1);
    return interLinks;
} else return "";
The problem is that it only gives me links matching the first pattern, and even then only a few of them (just 3-4), not all.
Here I have provided an excerpt of the text field of a document.
{{date|November 13, 2004}}
{{Brazil}}[[w:Hu Jintao|Hu Jintao]], the [[w:President of the People's Republic of China|President]] of the [[People's Republic of China]] had lunch today with the [[w:President of Brazil|President]] of [[Brazil]], [[w:Luiz Inácio Lula da Silva|Luiz Inácio Lula da Silva]], at the ''Granja do Torto'', the President's country residence in the [[w:Brazilian Federal District|Brazilian Federal District]]. Lunch was a traditional Brazilian [[w:barbecue|barbecue]] with different kinds of meat.
Some Brazilian ministers were present at the event: [[w:Antonio Palocci|Antonio Palocci]] (Economy), [[w:pt:Eduardo Campos|Eduardo Campos]] ([[w:Ministry of Science and Technology (Brazil)|Science and Technology]]), [[w:João Roberto Rodrigues|Roberto Rodrigues]] (Agriculture), [[w:pt:Luiz Fernando Furlan|Luiz Fernando Furlan]] (Development), [[w:Celso Amorim|Celso Amorim]] ([[w:Ministry of
External Relations (Brazil)|Exterior Relations]]), [[w:Dilma Rousseff|Dilma Rousseff]] (Mines and Energy). Also present were [[w:pt:Roger Agnelli|Roger Agnelli]] ([[w:Vale (mining company)|Vale do Rio Doce]] company president) and Eduardo Dutra ([[w:Petrobras|Petrobras]], government oil company, president).
This meeting is part of a new [[w:political economy|political economy]] agreement between Brazil and China where Brazil has recognized mainland China's [[w:socialist market economy|market economy]] status, and China has promised to buy more [[w:economy of Brazil|Brazilian products]].
Solution
\[\[(?:w:)?.*?\]\]
Discussion
This regex assumes that the character sequence ]] will not appear between [[ and ]].
So far I haven't been able to find the escape sequence for ]].
Demo
http://regexr.com?37e51
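One likely reason the question's first pattern finds so few links is that its .* is greedy and runs to the last ]] on the line. As a rough sketch in Java, a single pattern can collect the target of both link formats in one pass. It assumes targets contain neither ] nor |, and the names WikiLinkSketch and linkTargets are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkSketch {
    // [[target]] or [[target|label]], with an optional leading "w:" prefix.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?:w:)?([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    // Collects the target of every internal link, in order of appearance.
    static List<String> linkTargets(String text) {
        List<String> targets = new ArrayList<>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {
            targets.add(m.group(1));
        }
        return targets;
    }

    public static void main(String[] args) {
        String text = "[[w:Hu Jintao|Hu Jintao]], the "
                + "[[w:President of the People's Republic of China|President]] of the "
                + "[[People's Republic of China]]";
        System.out.println(linkTargets(text));
    }
}
```

A target such as w:pt:Eduardo Campos keeps its pt: prefix in this sketch; mapping targets to en.wikipedia.org or en.wikinews.org URLs is left out.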
I've visited the download page; at the top it says:
See Meta:Data dumps for documentation on the provided data formats.
I guess they offer better parsing approaches than plain regex, so check it out...

How to pull numbers from a string/file name in Java?

Hopefully somebody can help me with this.. or at least point me in the right direction.
First off, I have a bunch of files with names such as:
vendor.2012-07-25
vendor.2012-07-25 2
ven_dor.2012-05-18
ven_dor.2012-05-18 2
Basically a vendor name (Sometimes one word, sometimes two with an underscore) + (period ".") + (year) + (month) + (day). Year, month, day are separated by (-). Possibly multiple files with the same name, denoted by a 2/3/4 etc after the date.
I obtain these as strings by doing file.getName(); where 'file' is the selected file from a JFileChooser
Then I need to chart some of the data based on date. Should I try to split the initial file name string by a "." first, so that the vendor and date are separated, and then split/divide up the remaining part by "-" to have the individual values for year/month/day?
I was thinking this could be a regex thing, but I'm pretty weak in that area.. so the double splitting is what I came up with. Anybody have input or suggestions? Thanks!
Indeed, you can use a regular expression:
String s = "vendor.2012-07-25 2";
Pattern p = Pattern.compile("([^.]+)\\.(\\d{4})-(\\d{2})-(\\d{2}) ?(\\d?)");
Matcher m = p.matcher(s);
if (m.find()) {
    String vendorName = m.group(1);
    String year = m.group(2);
    String month = m.group(3);
    String day = m.group(4);
    // group(5) is the empty string when there is no trailing copy number
    String multipleFiles = m.groupCount() > 4 ? m.group(5) : "";
    System.out.printf("%s %s %s %s %s", vendorName, year, month, day, multipleFiles);
}
Each expression wrapped with parentheses () is called a capturing group, and it basically tells the regex engine to save its content, so that it can be retrieved later on.
In sum, here's what each capturing group does:
([^.]+) - Everything but a dot (.), so we are basically capturing the vendor name part;
(\\d{4}) - \d matches a digit. \d{4} matches 4 digits (year);
(\\d{2}) - Month;
(\\d{2}) - Day;
(\\d?) - Matches an optional (?) last digit.
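The same groups can also be given names (supported since Java 7), which avoids counting parentheses. This is only a sketch: the group names and the parts helper are chosen for illustration and are not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    private static final Pattern FILE_NAME = Pattern.compile(
            "(?<vendor>[^.]+)\\.(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2}) ?(?<copy>\\d?)");

    // Returns {vendor, year, month, day, copy}, or null if the name does not match.
    static String[] parts(String fileName) {
        Matcher m = FILE_NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group("vendor"), m.group("year"), m.group("month"),
            m.group("day"), m.group("copy")   // copy is "" when absent
        };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", parts("ven_dor.2012-05-18 2")));
        // prints: ven_dor 2012 05 18 2
    }
}
```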
If you want to parse the date part as a java.util.Date instance, you can use a single capturing group for it, and then use SimpleDateFormat:
Pattern p = Pattern.compile("([^.]+)\\.(\\d{4}-\\d{2}-\\d{2}) ?(\\d?)");
Matcher m = p.matcher(s);
if (m.find()) {
    String vendorName = m.group(1);
    String dateString = m.group(2);
    SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    Date date = df.parse(dateString); // parse throws the checked ParseException
    String multipleFiles = m.groupCount() > 2 ? m.group(3) : "";
}
Use String.split on the . (it needs escaping as "\\." because split takes a regex). Take dotSplitString[1] as the part after vendor. or ven_dor.
Split that part on space (spaceSplitString).
Parse the first part using DateFormat.parse(String) to get a Date
If the 2nd part (of the spaceSplitString) is present, use Integer.parseInt(spaceSplitString[1])
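The steps above can be sketched as follows. Note that this sketch swaps in java.time.LocalDate.parse for DateFormat.parse, since the date part is already in ISO format. The helper names and the default copy number of 1 are assumptions made for the example.

```java
import java.time.LocalDate;

public class SplitSketch {
    // Everything before the first dot is the vendor name.
    static String vendor(String fileName) {
        return fileName.split("\\.", 2)[0];
    }

    // The date is the first space-separated token after the dot.
    static LocalDate date(String fileName) {
        String afterDot = fileName.split("\\.", 2)[1];
        return LocalDate.parse(afterDot.split(" ")[0]);
    }

    // The optional second token is the copy number; 1 is assumed when absent.
    static int copy(String fileName) {
        String[] spaceSplit = fileName.split("\\.", 2)[1].split(" ");
        return spaceSplit.length > 1 ? Integer.parseInt(spaceSplit[1]) : 1;
    }

    public static void main(String[] args) {
        String name = "ven_dor.2012-05-18 2";
        System.out.println(vendor(name) + " " + date(name) + " " + copy(name));
        // prints: ven_dor 2012-05-18 2
    }
}
```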
Use the Java API's StringTokenizer class.
What you can do is:
StringTokenizer tokenizer = new StringTokenizer(file.getName(), ".");
tokenizer.nextElement();
You get the picture. Alternatively, you can use Scanner to parse it as well.
I tend to make use of StringTokenizers in my code a lot. To tokenize the above example you could use something akin to the following:
StringTokenizer tok = new StringTokenizer(filename, ".- "); //tokenizes on '.', '-' and ' '
String name = tok.nextToken();
int year = Integer.parseInt(tok.nextToken());
int month = Integer.parseInt(tok.nextToken());
int day = Integer.parseInt(tok.nextToken());
int cnt = 1; //default one copy of the file
if (tok.hasMoreTokens()) {
    // the space delimiter keeps "25 2" from reaching parseInt as one token
    cnt = Integer.parseInt(tok.nextToken());
}
...and so on.
However, I endorse the use of the regex solution above, if only because it looks less comprehensible to a layman. I'm just including this here for completeness.

search and replace using regular expressions in java

I need to find and replace all dates inside a document (basically bring them to the present date). The problem with using a regex is that if the date is in this format
CreationDatetime="2012/07/24 10:00:19 649 GMT"
the regex will not find this entry, as the date is attached to another string. Is there any other way to find dates in all formats (yyyymmdd, yyyy/mm/dd, etc.) and bring them to the current date?
Here is working code that searches for one format (yyyymmdd), but the replace doesn't work yet.
String re1=".*?"; // Non-greedy match on filler
String re2="((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])"; // YYYYMMDD 1
Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
for(Object s : x){
String temp = s.toString();
Matcher m = p.matcher(s.toString());
if (m.find())
{
temp.replaceAll(re1+re2, "test");
System.out.println(temp.toString());
}
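One thing that stands out in the snippet above is that the result of temp.replaceAll(...) is discarded; Strings are immutable, so replaceAll returns a new String and leaves temp unchanged. The leading .*? filler would also be replaced along with the date. A minimal sketch of a replacement that keeps the result, using a deliberately simplified date pattern (an assumption, not the pattern from the question) and a hypothetical helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateReplaceSketch {
    // Simplified date pattern (an assumption): yyyy, separator, mm, separator, dd.
    private static final Pattern DATE =
            Pattern.compile("\\d{4}[-/.]\\d{2}[-/.]\\d{2}(?!\\d)");

    // Returns a copy of the line with every date replaced by newDate.
    static String replaceDates(String line, String newDate) {
        // replaceAll returns a new String; the input line is left unchanged.
        return DATE.matcher(line).replaceAll(Matcher.quoteReplacement(newDate));
    }

    public static void main(String[] args) {
        String line = "CreationDatetime=\"2012/07/24 10:00:19 649 GMT\"";
        System.out.println(replaceDates(line, "2024/01/15"));
        // prints: CreationDatetime="2024/01/15 10:00:19 649 GMT"
    }
}
```

Because the matcher searches within the line, a date embedded in an attribute value such as CreationDatetime="..." is still found.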
