Search and replace using regular expressions in Java

I need to find and replace all dates in a document (basically, bring them to the present date). The problem with using a regex is that if the date is in this format
CreationDatetime="2012/07/24 10:00:19 649 GMT"
the regex will not find this entry, because the date is attached to another string. Is there any other way to find dates in all formats (yyyymmdd, yyyy/mm/dd, etc.) and bring them to the current date?
Here is working code that searches for one format (yyyymmdd), but the replace doesn't work.
String re1=".*?"; // Non-greedy match on filler
String re2="((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3}))[-:\\/.](?:[0]?[1-9]|[1][012])[-:\\/.](?:(?:[0-2]?\\d{1})|(?:[3][01]{1})))(?![\\d])"; // YYYYMMDD 1
Pattern p = Pattern.compile(re1+re2,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
for (Object s : x) {
    String temp = s.toString();
    Matcher m = p.matcher(s.toString());
    if (m.find())
    {
        temp.replaceAll(re1+re2, "test");
        System.out.println(temp.toString());
    }
}
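One likely cause, with a possible fix sketched under some assumptions: a Java String is immutable, so temp.replaceAll(...) returns a new string that the loop above discards. The leading ".*?" filler is also unnecessary, because find() already locates a date anywhere in the line, including inside CreationDatetime="...". The class name DateRefresher, the method refresh, and the simplified pattern (separated dates such as yyyy/mm/dd only, not bare yyyymmdd) are illustrative choices, not part of the original code.

```java
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateRefresher {
    // yyyy<sep>MM<sep>dd with the same separator twice; the lookarounds keep
    // the match from starting or ending in the middle of a longer digit run.
    private static final Pattern DATE = Pattern.compile(
            "(?<!\\d)\\d{4}([-/.])(?:0[1-9]|1[0-2])\\1(?:0[1-9]|[12]\\d|3[01])(?!\\d)");

    /** Returns the line with every matched date replaced by today, keeping the separator. */
    static String refresh(String line, LocalDate today) {
        Matcher m = DATE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String sep = m.group(1);
            String replacement = String.format("%04d%s%02d%s%02d",
                    today.getYear(), sep, today.getMonthValue(), sep, today.getDayOfMonth());
            // quoteReplacement guards against '$' or '\' being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "CreationDatetime=\"2012/07/24 10:00:19 649 GMT\"";
        System.out.println(refresh(line, LocalDate.now()));
    }
}
```

The time portion and the rest of the line are left untouched; only the date itself is rewritten. Further formats would need their own patterns.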

Related

Remove elements from Date Format String using a Regular Expression

I want to remove elements from a supplied Date Format String - for example convert the format "dd/MM/yyyy" to "MM/yyyy" by removing any non-M/y element.
What I'm trying to do is create a localised month/year format based on the existing day/month/year format provided for the Locale.
I've done this using regular expressions, but the solution seems longer than I'd expect.
An example is below:
public static void main(final String[] args) {
System.out.println(filterDateFormat("dd/MM/yyyy HH:mm:ss", 'M', 'y'));
System.out.println(filterDateFormat("MM/yyyy/dd", 'M', 'y'));
System.out.println(filterDateFormat("yyyy-MMM-dd", 'M', 'y'));
}
/**
* Keeps only the {@code charsToRetain} elements of {@code format}, removing
* everything else, including any redundant separators.
*/
private static String filterDateFormat(final String format, final char...charsToRetain) {
// Match e.g. "ddd-"
final Pattern pattern = Pattern.compile("[" + new String(charsToRetain) + "]+\\p{Punct}?");
final Matcher matcher = pattern.matcher(format);
final StringBuilder builder = new StringBuilder();
while (matcher.find()) {
// Append each match
builder.append(matcher.group());
}
// If the last match is e.g. "MMM-", remove the trailing punctuation symbol
return builder.toString().replaceFirst("\\p{Punct}$", "");
}
Let's try a solution for the following date format strings:
String[] formatStrings = { "dd/MM/yyyy HH:mm:ss",
"MM/yyyy/dd",
"yyyy-MMM-dd",
"MM/yy - yy/dd",
"yyabbadabbadooMM" };
The following will analyze strings for a match, then print the first group of the match.
Pattern p = Pattern.compile(REGEX);
for(String formatStr : formatStrings) {
Matcher m = p.matcher(formatStr);
if(m.matches()) {
System.out.println(m.group(1));
}
else {
System.out.println("Didn't match!");
}
}
Now, there are two separate regular expressions I've tried. First:
final String REGEX = "(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
Didn't match!
Didn't match!
Second:
final String REGEX = "(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
MM/yy - yy
Didn't match!
Now, let's see what the first regex actually matches to:
First regex = (?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)
(?:[^My]*)    any number of non-Ms and non-ys (non-capturing)
([My]+        followed by one or more Ms and ys
[^\\w]*       optionally separated by non-word characters (implying they are also not Ms or ys)
[My]+)        followed by one or more Ms and ys
(?:[^My]*)    finished by any number of non-Ms and non-ys (non-capturing)
What this means is that at least 2 M/ys are required to match the regex, although you should be careful that something like MM-dd or yy-DD will match as well, because they have two M-or-y regions 1 character long. You can avoid getting into trouble here by just keeping a sanity check on your date format string, such as:
if (formatStr.contains("y") && formatStr.contains("M") && m.matches())
{
    String yMString = m.group(1);
    ... // other logic
}
As for the second regex, here's what it means:
Second regex = (?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)
(?:[^My]*)                any number of non-Ms and non-ys (non-capturing)
(                    )    followed by
(?:[My]+       )+[My]+    at least two text segments consisting of one or more Ms or ys, where each segment is
        [^\\w]*           optionally separated by non-word characters
(?:[^My]*)                finished by any number of non-Ms and non-ys (non-capturing)
This regex will match a slightly broader series of strings, but it still requires that any separators between Ms and ys be non-word characters ([^a-zA-Z_0-9]). Additionally, keep in mind that this regex will still match "yy", "MM", or similar strings like "yyy", "yyyy"..., so it would be useful to have a sanity check as described for the previous regular expression.
Additionally, here's a quick example of how one might use the above to manipulate a single date format string:
LocalDateTime date = LocalDateTime.now();
String dateFormatString = "dd/MM/yyyy H:m:s";
System.out.println("Old Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
Pattern p = Pattern.compile("(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)");
Matcher m = p.matcher(dateFormatString);
if(dateFormatString.contains("y") && dateFormatString.contains("M") && m.matches())
{
dateFormatString = m.group(1);
System.out.println("New Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
}
else
{
throw new IllegalArgumentException("Couldn't shorten date format string!");
}
Output:
Old Format: "dd/MM/yyyy H:m:s" = 14/08/2019 16:55:45
New Format: "MM/yyyy" = 08/2019
I'll answer based on my understanding of the question: how do I remove, from a list/table/array of Strings, the elements that do not exactly follow the wanted month/year pattern?
So I'm looking for a function that looks like
public List<String> removeUnWantedDateFormat(List<String> input)
As far as I know, there are only 4 formats that you would want, hoping I don't miss any: "MM/yyyy", "MMM/yyyy", "MM/yy" and "MMM/yy". Since we know what we are looking for, a simple function will do.
public List<String> removeUnWantedDateFormat(List<String> input) {
    String s1 = "MM/yyyy";
    String s2 = "MMM/yyyy";
    String s3 = "MM/yy";
    String s4 = "MMM/yy";
    // removeIf avoids the ConcurrentModificationException that calling
    // input.remove() inside a for-each loop over the same list would throw
    input.removeIf(format -> !s1.equals(format) && !s2.equals(format)
            && !s3.equals(format) && !s4.equals(format));
    return input;
}
It is better to avoid regex when a plain comparison will do, since regex matching can be comparatively expensive. A further improvement would be to use an enum of the date formats you accept: that gives you better control over them, and even lets you replace them.
Hope this will help, cheers
Edit: after seeing the comment, I think it would be better to use contains instead of equals, and, instead of removing the entry, to replace it with the expected format string.
So it would look more like:
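public List<String> removeUnWantedDateFormat(List<String> input) {
    // Longer patterns come first so that "MMM/yyyy" is not mistaken for "MM/yy"
    List<String> comparaisons = new ArrayList<>();
    comparaisons.add("MMM/yyyy");
    comparaisons.add("MMM/yy");
    comparaisons.add("MM/yyyy");
    comparaisons.add("MM/yy");
    List<String> result = new ArrayList<>();
    for (String format : input) {
        for (String comparaison : comparaisons) {
            if (format.contains(comparaison)) {
                // Reassigning the loop variable would not change the list,
                // so the matching format is collected in a new list instead
                result.add(comparaison);
                break;
            }
        }
    }
    return result;
}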
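The enum idea mentioned in this answer could be sketched roughly as follows. The names MonthYearFormat, pattern and findIn are made up for illustration, and the four accepted patterns are simply the ones listed in the answer.

```java
import java.util.Optional;

enum MonthYearFormat {
    // Longer patterns are listed first so that "MMM/yyyy" is not reported as "MM/yy".
    ABBREVIATED_MONTH_FULL_YEAR("MMM/yyyy"),
    ABBREVIATED_MONTH_SHORT_YEAR("MMM/yy"),
    NUMERIC_MONTH_FULL_YEAR("MM/yyyy"),
    NUMERIC_MONTH_SHORT_YEAR("MM/yy");

    private final String pattern;

    MonthYearFormat(String pattern) {
        this.pattern = pattern;
    }

    String pattern() {
        return pattern;
    }

    /** Returns the first accepted pattern contained in the given format string, if any. */
    static Optional<MonthYearFormat> findIn(String format) {
        for (MonthYearFormat candidate : values()) {
            if (format.contains(candidate.pattern)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(findIn("dd/MM/yyyy HH:mm:ss").map(MonthYearFormat::pattern).orElse("no match"));
        System.out.println(findIn("yyyy-MMM-dd").map(MonthYearFormat::pattern).orElse("no match"));
    }
}
```

Like the contains-based version above, this only recognises formats that use "/" between month and year, so "yyyy-MMM-dd" yields no match.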

how to extract date from the given filename in java

I have my file names as below
C:\Users\name\Documents\repository\zzz\xxx_yyy\new\aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv
C:\Users\name\Documents\repository\zzz\xxx_yyy\new\aaa_bbb_ppp_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv
I have to write a single java script that handles both file formats and extracts both dates from each filename.
Can you please help.
You should use Regular expressions to extract dates from filenames like these.
private static Date[] extractDatesFromFileName(File file) throws ParseException {
Date[] dates = new Date[2];
SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyyMMdd");
String regex = ".*(\\d{8})_(\\d{8}).*";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(file.getName());
if (m.find()) {
dates[0] = dateFormatter.parse(m.group(1));
dates[1] = dateFormatter.parse(m.group(2));
}
System.out.println(dates[0]);
System.out.println(dates[1]);
return dates;
}
Little explanation:
In regex .*(\\d{8})_(\\d{8}).*:
.* stands for any character repeated from zero to unlimited times
(\\d{8}) stands for exactly eight digits (parts in parentheses are capturing groups; we have 2 capturing groups in this regex, one for each date)
_ stands for _ sign :)
If filename matches provided pattern both dates are extracted, parsed and returned as array. You should add some error handling etc.
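As one way to add the error handling mentioned above, here is a minimal sketch that swaps in java.time.LocalDate for Date and returns an empty Optional when the name has no date pair or the digits do not form a real date. The class name FileNameDates and the method extract are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameDates {
    // Two eight-digit runs joined by an underscore, not embedded in longer digit runs.
    private static final Pattern RANGE =
            Pattern.compile("(?<!\\d)(\\d{8})_(\\d{8})(?!\\d)");

    /** Returns {start, end} if the name holds two valid yyyyMMdd dates, otherwise empty. */
    static Optional<LocalDate[]> extract(String fileName) {
        Matcher m = RANGE.matcher(fileName);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // BASIC_ISO_DATE is the built-in formatter for yyyyMMdd
            LocalDate start = LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
            LocalDate end = LocalDate.parse(m.group(2), DateTimeFormatter.BASIC_ISO_DATE);
            return Optional.of(new LocalDate[] {start, end});
        } catch (DateTimeParseException e) {
            // Eight digits that do not form a real date, e.g. month 13
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        extract("aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv")
                .ifPresent(d -> System.out.println(d[0] + " to " + d[1]));
    }
}
```

Because the pattern does not depend on what precedes the dates, the same call handles both file-name variants from the question.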
If you mean a Java script (not Javascript) you can use regexp, something like the following:
String in = "C:\\Users\\name\\Documents\\repository\\zzz\\xxx_yyy\\new\\aaa_bbb_ppp_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv";
Pattern p = Pattern.compile("_(\\d{8})_v1-0");
Matcher m = p.matcher(in);
if (m.find()){
System.out.println(m.group(1));
}
I think you want to extract two dates which are present in each file path.
This could be done as follows:
String filename1 = "C:\\Users\\name\\Documents\\repository\\zzz\\xxx_yyy\\new\\aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv";
Pattern p = Pattern.compile("[0-9]{8}+_[0-9]{8}+");
Matcher m = p.matcher(filename1);
String[] dateStrArr = m.find()?m.group(0).split("_"): null;
First date will be in 0 index and second date will be in 1 index position.
Same goes for second file name.
Hope this helps.
Also once extracted you can convert them to date object using SimpleDateFormat.

Extracting all DATES from a .txt file

hopefully this is short and to the question..
In the below program I have successfully extracted ALL data from a notepad doc named "pad.txt", which consists of 3 sets vertically aligned with an 'ID' followed by 'Name' followed by 'Date Joined', that pattern is consistent.
The notepad doc consists solely of this:
ID: 1
Name: Bob
Date Joined: 01/12/2014
ID: 2
Name: Jim
Date Joined: 8/21/1993
ID: 3
Name: Steve
Date Joined: 6/07/2016
I have also defined a regex that accepts an acceptable date format: 1-2 digits, a slash, 1-2 digits again, a slash, then 2 to 4 digits for the year. At the beginning of it I specified the wildcard character "." (the dot) with the greedy quantifier "*" (the star), to say that ANY number of ANY character is accepted before the date, and after the date I have also specified ".*".
My main goal with this code is to EXTRACT ONLY all of the DATES within the pad.txt file, and store them in a String or something..
public class Main {
public static void main(String args[]) throws Exception{
StringBuilder builder = new StringBuilder();
FileReader reader = new FileReader(new File("pad.txt"));
// Define valid date format via regex
String dateRegex = ".* (\\d{1,2})/(\\d{1,2})/(\\d{2,4}) .* ";
int fileContent = 0;
// iterate through entire notepad doc, until read() returns -1 AKA (finished searching doc)
while((fileContent = reader.read()) !=-1){
builder.append((char)fileContent);
}//encapsulating loop
reader.close();
String extracted = builder.toString();
System.out.println("Extracted: " + extracted);
System.out.println();
Matcher m = null;
// Validate that file contents conform with 'dateRegex'
m = Pattern.compile(dateRegex).matcher(extracted);
if(m.find()){
System.out.println("Entire group : " + m.group());
}
}
}
Unfortunately, the m.group(); outprint only returns:
"Entire group : 6/07/2016"
As stated, my goal is to extract ALL of the dates, but I can't fiddle with all of the dates if the .matcher call ONLY catches the "Entire group : 6/07/2016"
In my mind, I say ANY character of ANY amount is allowed before and AFTER the date, so it scrolls to the very bottom and finds ONLY the LAST date, how do I defined the regex so that it pulls out ALL of the dates, not just the very LAST one, and why is it only pulling the last one?
I've tried relentlessly with this and cannot figure out how..
Thanks in advance
Well, that's relatively easy. You can't write a regex that matches all dates at once, but you can use matcher as it was intended to be used, i.e. find() returns true as often as another match can be found.
So you have to modify your regex and remove the .* on both ends. Then you can simply do this:
StringBuilder dateListBuilder = new StringBuilder();
while (m.find()) {
    // Each call to find() advances to the next date; the newline keeps the dates apart
    dateListBuilder.append(m.group()).append('\n');
}
System.out.println(dateListBuilder.toString());
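Put together, the suggestion above might look like the following self-contained sketch. The class name DateExtractor and the method findDates are illustrative, and the word boundaries (\b) are an addition of this sketch, not part of the original regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateExtractor {
    // Same date shape as in the question, without the surrounding ".*"
    private static final Pattern DATE =
            Pattern.compile("\\b\\d{1,2}/\\d{1,2}/\\d{2,4}\\b");

    /** Collects every date-shaped substring of the text, in order of appearance. */
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) { // one iteration per date found
            dates.add(m.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        String text = "ID: 1\nName: Bob\nDate Joined: 01/12/2014\n"
                + "ID: 2\nName: Jim\nDate Joined: 8/21/1993\n"
                + "ID: 3\nName: Steve\nDate Joined: 6/07/2016\n";
        System.out.println(findDates(text)); // prints [01/12/2014, 8/21/1993, 6/07/2016]
    }
}
```

Returning a List rather than one long String makes it easier to work with the individual dates afterwards.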

Matching dates with Regex inside of a random string

I am trying to do this in Java:
I receive this kind of string
"12/07/2004dddsss12/10/2010ñrrñrñr10/01/2000ksdifjsdifffffdd04/04/1998"
Then I have to find one or more dates inside that string, date format: dd/mm/yyyy
Finally I have to copy to another string dates matched: "12/07/2004 12/10/2010 10/01/2000 04/04/1998"
P.S.: I'm using this website http://regexpal.com/ to test whether it works. I tried some regexes from websites and none of them worked for me.
You can separate checking the validity of a date from extracting it.
To extract the dates:
String regex = "\\d{2}/\\d{2}/\\d{4}";
Check here at fiddle: http://fiddle.re/fa0bf
Code:
String input = "12/07/2004dddsss12/10/2010ñrrñrñr10/01/2000ksdifjsdifffffdd04/04/1998";
String regex = "\\d{2}/\\d{2}/\\d{4}";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
Gives,
12/07/2004
12/10/2010
10/01/2000
04/04/1998
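For the validity half, one possible approach (an assumption of this sketch, not something the answer prescribes) is to pass each extracted candidate through a strict java.time formatter and keep only those that parse. The names ValidDateExtractor and extractValidDates are hypothetical; the result is joined with spaces, as the question asked.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValidDateExtractor {
    private static final Pattern CANDIDATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");
    // "uuuu" rather than "yyyy": with STRICT resolution, year-of-era would also need an era.
    private static final DateTimeFormatter STRICT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    /** Returns all dd/MM/yyyy substrings that are real calendar dates, space-separated. */
    static String extractValidDates(String input) {
        StringJoiner joined = new StringJoiner(" ");
        Matcher m = CANDIDATE.matcher(input);
        while (m.find()) {
            try {
                LocalDate.parse(m.group(), STRICT); // throws for e.g. 31/02/2001
                joined.add(m.group());
            } catch (DateTimeParseException e) {
                // Date-shaped but not a real date: skip it
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String input = "12/07/2004dddsss12/10/2010\u00f1rr\u00f1r\u00f1r10/01/2000ksdifjsdifffffdd04/04/1998";
        System.out.println(extractValidDates(input));
    }
}
```

For the input from the question, all four candidates are valid, so the result is "12/07/2004 12/10/2010 10/01/2000 04/04/1998".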

How to pull numbers from a string/file name in Java?

Hopefully somebody can help me with this.. or at least point me in the right direction.
First off, I have a bunch of files with names such as:
vendor.2012-07-25
vendor.2012-07-25 2
ven_dor.2012-05-18
ven_dor.2012-05-18 2
Basically a vendor name (Sometimes one word, sometimes two with an underscore) + (period ".") + (year) + (month) + (day). Year, month, day are separated by (-). Possibly multiple files with the same name, denoted by a 2/3/4 etc after the date.
I obtain these as strings by doing file.getName(); where 'file' is the selected file from a JFileChooser
Then I need to chart some of the data based on date. Should I try to split the initial file name string by a "." first, so that the vendor and date are separated, and then split/divide up the remaining part by "-" to have the individual values for year/month/day?
I was thinking this could be a regex thing, but I'm pretty weak in that area.. so the double splitting is what I came up with. Anybody have input or suggestions? Thanks!
Indeed, you can use a regular expression:
String s = "vendor.2012-07-25 2";
Pattern p = Pattern.compile("([^.]+)\\.(\\d{4})-(\\d{2})-(\\d{2}) ?(\\d?)");
Matcher m = p.matcher(s);
if (m.find()) {
String vendorName = m.group(1);
String year = m.group(2);
String month = m.group(3);
String day = m.group(4);
String multipleFiles = m.groupCount() > 4 ? m.group(5) : "";
System.out.printf("%s %s %s %s %s", vendorName, year, month, day, multipleFiles);
}
Each expression wrapped with parentheses () is called a capturing group, and it basically tells the regex engine to save its content, so that it can be retrieved later on.
In sum, here's what each capturing group does:
([^.]+) - Everything but a dot (.), so we are basically capturing the vendor name part;
(\\d{4}) - \d matches a digit. \d{4} matches 4 digits (year);
(\\d{2}) - Month;
(\\d{2}) - Day;
(\\d?) - Matches an optional (?) last digit.
If you want to parse the date part as a java.util.Date instance, you can use a single capturing group for it, and then use SimpleDateFormat:
Pattern p = Pattern.compile("([^.]+)\\.(\\d{4}-\\d{2}-\\d{2}) ?(\\d?)");
Matcher m = p.matcher(s);
if (m.find()) {
    String vendorName = m.group(1);
    String dateString = m.group(2);
    SimpleDateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    Date date = df.parse(dateString); // parse(...) declares the checked ParseException
    String multipleFiles = m.groupCount() > 2 ? m.group(3) : "";
}
String.split on the . (it will probably require escaping). Take the dotSplitString[1] as being the part after vendor. or ven_dor.
Split that part on space (spaceSplitString).
Parse the first part using DateFormat.parse(String) to get a Date
If the 2nd part (of the spaceSplitString) is present, use Integer.parseInt(spaceSplitString[1])
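The split-based steps above could be sketched as follows. This version swaps in java.time.LocalDate.parse for DateFormat.parse (the date part is already in ISO yyyy-MM-dd form), and the class name VendorFileName, its fields, and the default copy number of 1 are assumptions made for illustration.

```java
import java.time.LocalDate;

class VendorFileName {
    final String vendor;
    final LocalDate date;
    final int copy;

    private VendorFileName(String vendor, LocalDate date, int copy) {
        this.vendor = vendor;
        this.date = date;
        this.copy = copy;
    }

    static VendorFileName parse(String fileName) {
        // "." is a regex metacharacter, so it is escaped; limit 2 splits only at the first dot
        String[] dotSplit = fileName.split("\\.", 2);
        if (dotSplit.length < 2) {
            throw new IllegalArgumentException("No '.' in file name: " + fileName);
        }
        String[] spaceSplit = dotSplit[1].split(" ");
        LocalDate date = LocalDate.parse(spaceSplit[0]); // expects ISO yyyy-MM-dd
        // Assumption: a name without a trailing number is copy 1
        int copy = spaceSplit.length > 1 ? Integer.parseInt(spaceSplit[1]) : 1;
        return new VendorFileName(dotSplit[0], date, copy);
    }

    public static void main(String[] args) {
        VendorFileName f = parse("ven_dor.2012-05-18 2");
        System.out.println(f.vendor + " " + f.date + " " + f.copy);
    }
}
```

Having the date as a LocalDate gives direct access to getYear(), getMonthValue() and getDayOfMonth() for charting.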
Use the Java API's StringTokenizer class.
What you can do is:
StringTokenizer tokenizer = new StringTokenizer(file.getName(), ".");
tokenizer.nextElement();
You get the picture. Alternatively, you can use Scanner to parse it as well.
I tend to make use of StringTokenizers in my code a lot. To tokenize the above example you could use something akin to the following:
StringTokenizer tok = new StringTokenizer(filename, ".- "); // tokenizes on '.', '-' and ' ' (the space separates the copy number from the day)
String name = tok.nextToken();
int year = Integer.parseInt(tok.nextToken());
int month = Integer.parseInt(tok.nextToken());
int day = Integer.parseInt(tok.nextToken());
int cnt = 1; //default one copy of the file
if(tok.hasMoreTokens()){
cnt = Integer.parseInt(tok.nextToken());
}
...and so on.
However, I endorse the use of the regex solution above, if only because it looks less comprehensible to a layman. I'm just including this here for completeness.
