I have a requirement where, depending on the filename, I need to call different methods.
Example filenames are below:
Abc_def_20180719_ghi.txt
Pqr_xy_gh_20180730.txt
Here I want to remove everything from the date pattern onwards,
so the output should be:
"Abc_def"
"Pqr_xy_gh"
Please suggest suitable string operations with regex.
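To strip all digits you can use: yourText.replaceAll("[0-9]", ""). Note that this removes only the digits themselves, so any text after the date (such as _ghi) stays in the result.
If you also want to drop the .txt extension, use: yourTextAfterReplacingAll.split("\\.")
The text you want is then in yourTextAfterSplit[0].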
You can use the following regex to detect the required portion of the file name:
/.+(?=_\d{8})/
It matches any characters except line breaks that come before an underscore followed by 8 consecutive digits, which is the date pattern.
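A minimal Java sketch of how this lookahead could be applied. The class and method names (LookaheadPrefix, prefixBeforeDate) are made up for illustration, and returning null for a name without a date is an assumption, not part of the answer above:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadPrefix {
    // Same regex as above, with the backslash doubled for a Java string literal
    private static final Pattern PREFIX = Pattern.compile(".+(?=_\\d{8})");

    // Returns the text before "_" + 8 digits, or null if no such date is found
    // (the null return is an assumption made for this sketch)
    static String prefixBeforeDate(String filename) {
        Matcher matcher = PREFIX.matcher(filename);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixBeforeDate("Abc_def_20180719_ghi.txt")); // Abc_def
        System.out.println(prefixBeforeDate("Pqr_xy_gh_20180730.txt"));   // Pqr_xy_gh
    }
}
```

Because the lookahead does not consume the date, matcher.group() already is the prefix and no capturing group is needed.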
It may be overthinking it a little to validate that the date at least superficially looks like a good date. This regex could be simplified if you don't care about invalid dates like 10664964.
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class DatePrefix {
// no digits before date; year must be between 2000 and 9999
// month from 01 to 12, day from 01 to 31
private static Pattern beforeDate = Pattern.compile(
"([^0-9]+)_[2-9]\\d{3}(?:0[1-9]|1[0-2])(?:0[1-9]|[1-2]\\d|3[01])");
public static void main(String[] args) {
for (String filename : args) {
getPrefixBeforeDate(filename)
.ifPresentOrElse(
prefix -> System.out.format("Found %s%n", prefix),
() -> System.out.format("Bad date: %s%n", filename));
}
}
public static Optional<String> getPrefixBeforeDate(String filename) {
Matcher matcher = beforeDate.matcher(filename);
if (matcher.find()) {
return Optional.of(matcher.group(1));
}
return Optional.empty();
}
}
When called with:
java DatePrefix Pq_xy_20180229.txt Abc_def_ghi_20380323_foo_1200.xml \
Hey_its_20182395.gif Foo_bar.txt
It prints:
Found Pq_xy
Found Abc_def_ghi
Bad date: Hey_its_20182395.gif
Bad date: Foo_bar.txt
The pattern could simply be the following if you don't care whether the date looks at all valid:
private static Pattern beforeDate = Pattern.compile("([^0-9]+)_\\d{8}");
Try this pattern:
[\w\d]+[A-Z-a-z][_]
You can test it online.
Consider a string in the following format:
[ABCD:defg] [MSG:information] [MSG2:hello]
How do I write a regex to check whether the line has '[MSG:' followed by some message and ']', and extract the text 'information' from the string above?
You can use the regex, \[MSG:(.*?)\] and extract the value of group(1).
Demo:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Main {
public static void main(String args[]) {
String str = "[ABCD:defg] [MSG:information] [MSG2:hello]";
Matcher matcher = Pattern.compile("\\[MSG:(.*?)\\]").matcher(str);
if (matcher.find())
System.out.println(matcher.group(1));
}
}
Output:
information
Your requirement would be something like
/\[MSG:.+?\]/ in standard regex notation. But I would suggest that you could use String.indexOf to extract your information:
String str = ...
int idx = str.indexOf("MSG:");
int idx2 = str.indexOf("]", idx);
String val = str.substring(idx + "MSG:".length(), idx2);
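The indexOf approach above assumes the marker is always present. A slightly more defensive sketch might look like the following; the names MsgExtractor and extractMsg are hypothetical, and returning null when the marker or the closing bracket is missing is an assumption:

```java
class MsgExtractor {
    // Returns the text between "[MSG:" and the next "]", or null if either is missing
    static String extractMsg(String line) {
        String marker = "[MSG:";
        int start = line.indexOf(marker);
        if (start < 0) {
            return null; // no "[MSG:" in this line
        }
        int contentStart = start + marker.length();
        int end = line.indexOf(']', contentStart);
        if (end < 0) {
            return null; // opening marker without a closing bracket
        }
        return line.substring(contentStart, end);
    }

    public static void main(String[] args) {
        System.out.println(extractMsg("[ABCD:defg] [MSG:information] [MSG2:hello]")); // information
        System.out.println(extractMsg("[MSG2:hello]"));                               // null
    }
}
```

Searching for "[MSG:" including the bracket keeps a tag such as "[XMSG:" from being mistaken for the marker.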
I want to remove elements from a supplied date format string, for example to convert the format "dd/MM/yyyy" to "MM/yyyy" by removing any non-M/y element.
What I'm trying to do is create a localised month/year format based on the existing day/month/year format provided for the Locale.
I've done this using regular expressions, but the solution seems longer than I'd expect.
An example is below:
public static void main(final String[] args) {
System.out.println(filterDateFormat("dd/MM/yyyy HH:mm:ss", 'M', 'y'));
System.out.println(filterDateFormat("MM/yyyy/dd", 'M', 'y'));
System.out.println(filterDateFormat("yyyy-MMM-dd", 'M', 'y'));
}
/**
 * Retains only {@code charsToRetain} from {@code format}, removing every other
 * element including any redundant separators.
 */
private static String filterDateFormat(final String format, final char...charsToRetain) {
// Match e.g. "ddd-"
final Pattern pattern = Pattern.compile("[" + new String(charsToRetain) + "]+\\p{Punct}?");
final Matcher matcher = pattern.matcher(format);
final StringBuilder builder = new StringBuilder();
while (matcher.find()) {
// Append each match
builder.append(matcher.group());
}
// If the last match is "mmm-", remove the trailing punctuation symbol
return builder.toString().replaceFirst("\\p{Punct}$", "");
}
Let's try a solution for the following date format strings:
String[] formatStrings = { "dd/MM/yyyy HH:mm:ss",
"MM/yyyy/dd",
"yyyy-MMM-dd",
"MM/yy - yy/dd",
"yyabbadabbadooMM" };
The following will analyze strings for a match, then print the first group of the match.
Pattern p = Pattern.compile(REGEX);
for(String formatStr : formatStrings) {
Matcher m = p.matcher(formatStr);
if(m.matches()) {
System.out.println(m.group(1));
}
else {
System.out.println("Didn't match!");
}
}
Now, there are two separate regular expressions I've tried. First:
final String REGEX = "(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
Didn't match!
Didn't match!
Second:
final String REGEX = "(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*)";
With program output:
MM/yyyy
MM/yyyy
yyyy-MMM
MM/yy - yy
Didn't match!
Now, let's see what the first regex actually matches to:
(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*) First regex =
(?:[^My]*)                              Any amount of non-Ms and non-ys (non-capturing)
          ([My]+                        followed by one or more Ms and ys
                [^\\w]*                 optionally separated by non-word characters
                                        (implying they are also not Ms or ys)
                       [My]+)           followed by one or more Ms and ys
                             (?:[^My]*) finished by any number of non-Ms and non-ys
                                        (non-capturing)
What this means is that at least 2 M/ys are required to match the regex, although you should be careful that something like MM-dd or yy-DD will match as well, because they have two M-or-y regions 1 character long. You can avoid getting into trouble here by just keeping a sanity check on your date format string, such as:
if(formatStr.contains("y") && formatStr.contains("M") && m.matches())
{
String yMString = m.group(1);
... // other logic
}
As for the second regex, here's what it means:
(?:[^My]*)((?:[My]+[^\\w]*)+[My]+)(?:[^My]*) Second regex =
(?:[^My]*)                                   Any amount of non-Ms and non-ys
                                             (non-capturing)
          (                      )           followed by
           (?:[My]+       )+[My]+            at least two text segments consisting of
                                             one or more Ms or ys, where each segment is
                   [^\\w]*                   optionally separated by non-word characters
                                  (?:[^My]*) finished by any number of non-Ms and non-ys
                                             (non-capturing)
This regex will match a slightly broader series of strings, but it still requires that any separators between Ms and ys be non-word characters ([^a-zA-Z_0-9]). Additionally, keep in mind that this regex will still match "yy", "MM", or similar strings like "yyy", "yyyy"..., so it would be useful to have a sanity check as described for the previous regular expression.
Additionally, here's a quick example of how one might use the above to manipulate a single date format string:
LocalDateTime date = LocalDateTime.now();
String dateFormatString = "dd/MM/yyyy H:m:s";
System.out.println("Old Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
Pattern p = Pattern.compile("(?:[^My]*)([My]+[^\\w]*[My]+)(?:[^My]*)");
Matcher m = p.matcher(dateFormatString);
if(dateFormatString.contains("y") && dateFormatString.contains("M") && m.matches())
{
dateFormatString = m.group(1);
System.out.println("New Format: \"" + dateFormatString + "\" = " +
date.format(DateTimeFormatter.ofPattern(dateFormatString)));
}
else
{
throw new IllegalArgumentException("Couldn't shorten date format string!");
}
Output:
Old Format: "dd/MM/yyyy H:m:s" = 14/08/2019 16:55:45
New Format: "MM/yyyy" = 08/2019
I'll try to answer based on my understanding of the question: how do I remove, from a list/table/array of Strings, the elements that do not exactly follow the pattern 'dd/MM'?
So I'm looking for a function that looks like
public List<String> removeUnWantedDateFormat(List<String> input)
As far as I know there are only 4 formats you would want (I hope I'm not missing any): "MM/yyyy", "MMM/yyyy", "MM/yy" and "MMM/yy". Since we know what we are looking for, the function is easy to write.
public List<String> removeUnWantedDateFormat(List<String> input) {
    String s1 = "MM/yyyy";
    String s2 = "MMM/yyyy";
    String s3 = "MM/yy";
    String s4 = "MMM/yy";
    // removeIf is used because calling input.remove() inside a for-each loop
    // over the same list throws ConcurrentModificationException
    input.removeIf(format -> !s1.equals(format) && !s2.equals(format)
            && !s3.equals(format) && !s4.equals(format));
    return input;
}
It is better to avoid regex when a plain string comparison is enough, since compiling and matching patterns costs more. A further improvement would be to use an enum of the date formats you accept; that gives you better control over them and even lets you replace them.
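A minimal sketch of the enum idea, assuming the four formats listed above are the only accepted ones. The enum name, constant names and the fromPattern helper are hypothetical and only illustrate the suggestion:

```java
import java.util.Optional;

enum MonthYearFormat {
    MMM_YYYY("MMM/yyyy"),
    MMM_YY("MMM/yy"),
    MM_YYYY("MM/yyyy"),
    MM_YY("MM/yy");

    private final String pattern;

    MonthYearFormat(String pattern) {
        this.pattern = pattern;
    }

    String pattern() {
        return pattern;
    }

    // Looks up the accepted format that equals the candidate exactly
    static Optional<MonthYearFormat> fromPattern(String candidate) {
        for (MonthYearFormat format : values()) {
            if (format.pattern.equals(candidate)) {
                return Optional.of(format);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(fromPattern("MM/yyyy"));    // Optional[MM_YYYY]
        System.out.println(fromPattern("dd/MM/yyyy")); // Optional.empty
    }
}
```

Keeping the accepted patterns in one enum means a new format is added in a single place, and callers get a typed value instead of a raw string.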
Hope this will help, cheers
Edit: after seeing the comment, I think it would be better to use contains instead of equals, and, instead of removing the element, to replace the input with the expected string.
So it would look more like:
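public List<String> removeUnWantedDateFormat(List<String> input) {
    // Longer candidates come first, because "MMM/yyyy" also contains "MMM/yy"
    List<String> comparaisons = new ArrayList<>();
    comparaisons.add("MMM/yyyy");
    comparaisons.add("MMM/yy");
    comparaisons.add("MM/yyyy");
    comparaisons.add("MM/yy");
    // Reassigning the loop variable would not change the list, so collect the matches
    List<String> result = new ArrayList<>();
    for (String format : input) {
        for (String comparaison : comparaisons) {
            if (format.contains(comparaison)) {
                result.add(comparaison);
                break;
            }
        }
    }
    return result;
}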
I have my file names as below
C:\Users\name\Documents\repository\zzz\xxx_yyy\new\aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv
C:\Users\name\Documents\repository\zzz\xxx_yyy\new\aaa_bbb_ppp_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv
I have to write a single java script that handles both file formats and extracts both dates from each filename.
Can you please help.
You should use Regular expressions to extract dates from filenames like these.
private static Date[] extractDatesFromFileName(File file) throws ParseException {
Date[] dates = new Date[2];
SimpleDateFormat dateFormatter = new SimpleDateFormat("yyyyMMdd");
String regex = ".*(\\d{8})_(\\d{8}).*";
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(file.getName());
if (m.find()) {
dates[0] = dateFormatter.parse(m.group(1));
dates[1] = dateFormatter.parse(m.group(2));
}
System.out.println(dates[0]);
System.out.println(dates[1]);
return dates;
}
A little explanation:
In the regex .*(\\d{8})_(\\d{8}).*:
.* stands for any character repeated zero or more times
(\\d{8}) stands for exactly eight digits (the parentheses make it a capturing group; this regex has 2 capturing groups, one for each date)
_ stands for the _ sign :)
If the filename matches the provided pattern, both dates are extracted, parsed and returned as an array. You should add some error handling etc.
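One possible shape for that error handling, sketched with java.time instead of SimpleDateFormat (a swapped-in API, not what the answer above uses). The class and method names are hypothetical, and returning an empty Optional for a missing or invalid date is an assumption:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNameDates {
    // Two 8-digit groups separated by an underscore, as in ..._20160801_20160831_...
    private static final Pattern TWO_DATES = Pattern.compile("(\\d{8})_(\\d{8})");

    // Returns {start, end} or an empty Optional if the name has no valid date pair
    static Optional<LocalDate[]> extractDates(String fileName) {
        Matcher m = TWO_DATES.matcher(fileName);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            // BASIC_ISO_DATE parses the yyyyMMdd form and rejects values such as month 13
            LocalDate start = LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
            LocalDate end = LocalDate.parse(m.group(2), DateTimeFormatter.BASIC_ISO_DATE);
            return Optional.of(new LocalDate[] {start, end});
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        extractDates("aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv")
                .ifPresent(d -> System.out.println(d[0] + " to " + d[1])); // 2016-08-01 to 2016-08-31
    }
}
```

The caller can then tell "no dates in this name" apart from a successful parse without checking array elements for null.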
If you mean a Java script (not Javascript) you can use regexp, something like the following:
String in = "C:\\Users\\name\\Documents\\repository\\zzz\\xxx_yyy\\new\\aaa_bbb_ppp_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv";
Pattern p = Pattern.compile("_(\\d{8})_v1-0");
Matcher m = p.matcher(in);
if (m.find()){
System.out.println(m.group(1));
}
I think you want to extract two dates which are present in each file path.
This could be done as follows:
String filename1 = "C:\\Users\\name\\Documents\\repository\\zzz\\xxx_yyy\\new\\aaa_bbb_ccc_ddd_eee_ZZ_E_20160801_20160831_v1-0.csv";
Pattern p = Pattern.compile("[0-9]{8}+_[0-9]{8}+");
Matcher m = p.matcher(filename1);
String[] dateStrArr = m.find()?m.group(0).split("_"): null;
The first date will be at index 0 and the second date at index 1.
The same goes for the second file name.
Hope this helps.
Once extracted, you can also convert them to Date objects using SimpleDateFormat.
I am using regex in java to get a specific output from a list of rooms at my University.
An excerpt from the list looks like this:
(A55:G260) Laboratorium 260
(A55:G292) Grupperom 292
(A55:G316) Grupperom 316
(A55:G366) Grupperom 366
(HDS:FLØYEN) Fløyen (appendix)
(ODO:PC-STUE) Pulpakammeret (PC-stue)
(SALEM:KONF) Konferanserom
I want to get the value that comes between the colon and the parenthesis.
The regex I am using at the moment is:
pattern = Pattern.compile("[:]([A-Za-z0-9ÆØÅæøå-]+)");
matcher = pattern.matcher(room.text());
I've included ÆØÅ, because some of the rooms have Norwegian letters in them.
Unfortunately the regex includes the building code also (e.g. "A55") in the output... Comes out like this:
A55
A55
A55
:G260
:G292
:G316
Any ideas on how to solve this?
The problem is not your regular expression. You need to reference group(1) for the match result.
while (matcher.find()) {
System.out.println(matcher.group(1));
}
However, you may consider using a negated character class instead.
pattern = Pattern.compile(":([^)]+)");
You can try a regex like this :
public static void main(String[] args) {
String s = "(HDS:FLØYEN) Fløyen (appendix)";
// select everything after ":" up to the first ")" and replace the whole input with the selected data
System.out.println(s.replaceAll(".*?:(.*?)\\).*", "$1"));
String s1 = "(ODO:PC-STUE) Pulpakammeret (PC-stue)";
System.out.println(s1.replaceAll(".*?:(.*?)\\).*", "$1"));
}
Output:
FLØYEN
PC-STUE
You can try string operations as follows:
String val = "(HDS:FLØYEN) Fløyen (appendix)";
if(val.contains(":")){
String valSub = val.split("\\s")[0];
System.out.println(valSub);
valSub = valSub.substring(1, valSub.length()-1);
String valA = valSub.split(":")[0];
String valB = valSub.split(":")[1];
System.out.println(valA);
System.out.println(valB);
}
Output :
(HDS:FLØYEN)
HDS
FLØYEN
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class test
{
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "(HDS:FLØYEN) Fløyen (appendix)";
String pattern = ":([^)]+)";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
while (m.find()) {
System.out.println(m.group(1));
}
}
}
I am trying to split a date with milliseconds and print it in my own format, but I get an index-out-of-bounds exception. It works with split("/") but not with split(".").
I don't know why this is happening.
Code is:
public class c {
public static void main(String[] arg)
{
Date date=new Date();
DateFormat formatter = new SimpleDateFormat("dd/MM/yyyy HH:mm:ss.SSS"); // SSS = milliseconds (F means day of week in month)
System.out.println(formatter.format(date));
String a=formatter.format(date);
String b[]=a.split(" ")[0].split("/");
String x1=(Integer.parseInt(b[2])-2000)+b[1]+b[0];
System.out.println("date part is : "+x1);
String c[]=a.split(" ")[1].split(":");
System.out.println(c[0]);
System.out.println(c[1]);
System.out.println(c[2]);
System.out.println(c[2].trim().split(".")[0]);// exception at this line
System.out.println(c[2].trim().split(".")[1]);
String x2=c[0]+c[1]+c[2].split(".")[0]+c[2].split(".")[1]+"";
System.out.println("time part is : "+x2);
}
}
Log is:
08/10/2013 12:02:18.002
date part is : 131008
12
02
18.002
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
    at c.main(c.java:22)
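java.lang.String.split(String regex) takes a regular expression as its argument.
A single dot . is the regular expression for 'any character', so every character of your input is treated as a delimiter. Only empty strings are left between them, split discards trailing empty strings, and the result is an empty array; that is why index 0 is out of bounds.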
Escape the dot:
split("\\.");
You can use java.util.regex.Pattern.quote(".") to split the string by "."
str.split(java.util.regex.Pattern.quote("."));
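The difference between the three calls can be seen in a small sketch; the class and helper names are hypothetical, and the input is the seconds field from the log above:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitOnDot {
    // "." is a regex that matches any character, so everything is a delimiter
    static String[] splitUnescaped(String s) {
        return s.split(".");
    }

    // "\\." is a regex that matches a literal dot
    static String[] splitEscaped(String s) {
        return s.split("\\.");
    }

    // Pattern.quote builds a regex that matches its argument literally
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        System.out.println(splitUnescaped("18.002").length);          // 0
        System.out.println(Arrays.toString(splitEscaped("18.002")));  // [18, 002]
        System.out.println(Arrays.toString(splitQuoted("18.002")));   // [18, 002]
    }
}
```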
Try not to split at all: you can always format the day, month, hour and so on directly from the date instead of parsing the formatted string back apart.
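A sketch of that idea using java.time, which is an assumption here since the question uses java.util.Date and SimpleDateFormat. The class and method names are hypothetical; the two patterns reproduce the "date part" and "time part" strings the question builds by hand:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class CompactTimestamp {
    // Two-digit year, month, day: the same layout as the question's x1
    private static final DateTimeFormatter DATE_PART = DateTimeFormatter.ofPattern("yyMMdd");
    // Hour, minute, second, milliseconds: the same layout as the question's x2
    private static final DateTimeFormatter TIME_PART = DateTimeFormatter.ofPattern("HHmmssSSS");

    static String datePart(LocalDateTime time) {
        return time.format(DATE_PART);
    }

    static String timePart(LocalDateTime time) {
        return time.format(TIME_PART);
    }

    public static void main(String[] args) {
        // 08/10/2013 12:02:18.002, the timestamp from the log (2 ms = 2,000,000 ns)
        LocalDateTime sample = LocalDateTime.of(2013, 10, 8, 12, 2, 18, 2_000_000);
        System.out.println("date part is : " + datePart(sample)); // 131008
        System.out.println("time part is : " + timePart(sample)); // 120218002
    }
}
```

Formatting each part directly avoids both the split calls and the manual year arithmetic.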