How to extract data from string value using regex? - java

Hello, I have the following string:
Country number Time Status USA B30111 11:15 ARRIVED PARIS NC0120 14:40 ON TIME DUBAI RA007 14:45 ON TIME
I need to extract the following info:
country = USA
number = B30111
time = 11:15
status = ARRIVED
country = PARIS
number = NC0120
time = 14:40
status = ON TIME
How can I use regex to extract the above data from it?

You can try this:
(?: (\w+) ([\w\d]+) (\d+\:\d+) (ARRIVED|ON TIME))
Explanation
Because a status can contain more than one word, it cannot be distinguished from the country that follows it. You therefore have to list every possible status as an alternative, separated by |, in the regex.
Java Source:
final String regex = "(?: (\\w+) ([\\w\\d]+) (\\d+\\:\\d+) (ARRIVED|ON TIME))";
final String string = "Country number Time Status USA B30111 11:15 ARRIVED PARIS NC0120 14:40 ON TIME DUBAI RA007 14:45 ON TIME\n\n\n";
final Pattern pattern = Pattern.compile(regex);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("country =" + matcher.group(1));
System.out.println("number =" + matcher.group(2));
System.out.println("time =" + matcher.group(3));
System.out.println("status =" + matcher.group(4));
System.out.println("");
}
output
country =USA
number =B30111
time =11:15
status =ARRIVED
country =PARIS
number =NC0120
time =14:40
status =ON TIME
country =DUBAI
number =RA007
time =14:45
status =ON TIME
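Building on the point that every possible status has to be listed in the regex, here is a minimal sketch that assembles the alternation from a list of known statuses and reads the fields through named groups instead of group numbers. The class name FlightBoardParser, the method names and the status list are assumptions made for illustration; they are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class FlightBoardParser {

    // Builds the record pattern. The status group becomes an alternation of
    // the known statuses; Pattern.quote keeps each status a literal.
    static Pattern buildPattern(List<String> knownStatuses) {
        StringBuilder alternation = new StringBuilder();
        for (String status : knownStatuses) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(status));
        }
        return Pattern.compile("\\b(?<country>\\w+) (?<number>\\w+) "
                + "(?<time>\\d{1,2}:\\d{2}) (?<status>" + alternation + ")");
    }

    // Returns one {country, number, time, status} array per record found.
    static List<String[]> parse(String text, List<String> knownStatuses) {
        List<String[]> records = new ArrayList<>();
        Matcher matcher = buildPattern(knownStatuses).matcher(text);
        while (matcher.find()) {
            records.add(new String[] {
                matcher.group("country"), matcher.group("number"),
                matcher.group("time"), matcher.group("status")
            });
        }
        return records;
    }

    public static void main(String[] args) {
        String text = "Country number Time Status USA B30111 11:15 ARRIVED "
                + "PARIS NC0120 14:40 ON TIME DUBAI RA007 14:45 ON TIME";
        for (String[] r : parse(text, Arrays.asList("ARRIVED", "ON TIME"))) {
            System.out.println("country = " + r[0] + ", number = " + r[1]
                    + ", time = " + r[2] + ", status = " + r[3]);
        }
    }
}
```

The header words are skipped automatically because no time follows them. A status that is missing from the list simply produces no match for that record, so the list has to be kept complete.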

If you split the string with the split function, you get an array that contains each word.
String[] splitted = str.split(" ");
To check the result, print each element:
for(String test:splitted){
System.out.println(test);
}
This looks more like a CSV file.

Related

Convert date number to words using java

I want to convert a date to words. For example: 12/12/2012 --> twelve twelve two thousand twelve. I have already written a number-to-word converter, but now I have a problem printing the result.
Here is my code:
String patternString = "\\d{2}/\\d{2}/\\d{4}"; // date regex
Pattern pattern = Pattern.compile(patternString); // pattern compiling
Matcher matcher = pattern.matcher(nom); // matching with pattern with input text from user
if (matcher.find()) {
String get_data = matcher.group();
if(get_data.contains("/")){ // check either has "/" slash or not
String parts[] = get_data.split("[/]"); // split process
String get_day = parts[0]; // day will store in first array
String get_month = parts[1]; // month will store in second array
String get_year = parts[2]; // year will store in third array
String s = NumberConvert.convert(Integer.parseInt(get_day))
+ NumberConvert.convert(Integer.parseInt(get_month))
+ NumberConvert.convert(Integer.parseInt(get_year));
String replace = matcher.replaceAll(s); // replace number to words
System.out.println(replace);
}
} else {...}
Input text from user:
12/12/2012 +++ 23/11/2010
But the result is wrong: the second date is also replaced with the words for the first date.
twelve twelve two thousand twelve +++ twelve twelve two thousand twelve
Please suggest a solution.
An immediate solution to your problem would be to use Matcher.replaceFirst(), instead of Matcher.replaceAll(), since you only want the first date pattern to be replaced with your written version of the date.
String replace = matcher.replaceFirst(s);
If you would like to be able to process each numeric date one at a time, you can do so in a left-to-right fashion using this code:
String patternString = "\\d{2}/\\d{2}/\\d{4}";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(nom);
String output = "";
while (matcher.find()) {
String get_data = matcher.group();
String parts[] = get_data.split("/");
String get_day = parts[0];
String get_month = parts[1];
String get_year = parts[2];
String s = NumberConvert.convert(Integer.parseInt(get_day)) +
NumberConvert.convert(Integer.parseInt(get_month)) +
NumberConvert.convert(Integer.parseInt(get_year));
if (output.equals("")) {
output = s;
}
else {
output += " +++ " + s;
}
String replace = matcher.replaceFirst("");
matcher = pattern.matcher(replace);
}
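After each iteration, the above code resets the Matcher using a string from which the previously matched date has been removed. This lets you "eat" one date at a time, from left to right, building the human-readable output as you go. Note that output contains only the converted dates, joined by the hard-coded " +++ " separator; any other text in the input is not copied over.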
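Another way to replace each date with its own words while keeping the surrounding text is the Matcher.appendReplacement / Matcher.appendTail pair. The following is only a sketch: the class name DateWordReplacer is made up, and the toWords parameter is a stand-in for the asker's NumberConvert.convert(int), which is not shown in the question.

```java
import java.util.function.IntFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class DateWordReplacer {

    // Day, month and year are captured so no extra split("/") is needed.
    private static final Pattern DATE =
            Pattern.compile("(\\d{2})/(\\d{2})/(\\d{4})");

    // toWords stands in for NumberConvert.convert(int) from the question.
    static String replaceDates(String input, IntFunction<String> toWords) {
        Matcher matcher = DATE.matcher(input);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String words = toWords.apply(Integer.parseInt(matcher.group(1))) + " "
                    + toWords.apply(Integer.parseInt(matcher.group(2))) + " "
                    + toWords.apply(Integer.parseInt(matcher.group(3)));
            // Copies the text before this match, then this match's replacement.
            // quoteReplacement guards against '$' or '\' in the converted text.
            matcher.appendReplacement(result, Matcher.quoteReplacement(words));
        }
        // Copies whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Placeholder converter, only here to make the sketch runnable.
        IntFunction<String> placeholder = n -> "<" + n + ">";
        System.out.println(replaceDates("12/12/2012 +++ 23/11/2010", placeholder));
    }
}
```

With a real converter you would call it as replaceDates(nom, NumberConvert::convert), assuming convert takes an int and returns a String as in the question.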

How do I parse a string to get specific information using Java?

Here are some lines from a file and I'm not sure how to parse it to extract 4 pieces of information.
11::American President, The (1995)::Comedy|Drama|Romance
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
14::Nixon (1995)::Drama
I would like to get the number, title, release date and genre.
A line can have multiple genres, so I would like to save each one in a variable as well.
I'm using the .split("::|\\|"); method to parse it, but I'm not able to parse out the release date.
Can anyone help me?
The easiest way would be to match with a regex, something like this:
String x = "11::Title (2016)::Category";
Pattern p = Pattern.compile("^([0-9]+)::([a-zA-Z ]+)\\(([0-9]{4})\\)::([a-zA-Z]+)$");
Matcher m = p.matcher(x);
if (m.find()) {
System.out.println("Number: " + m.group(1) + " Title: " + m.group(2) + " Year: " + m.group(3) + " Categories: " + m.group(4));
}
(Please don't hold me to the exact syntax; this is off the top of my head.)
Then first capture will be the number, the second will be the name, the third is the year and the fourth is the set of categories, which you may then split by '|'.
You may need to adjust the valid characters for title and categories, but you should get the idea.
If you have multiple lines, split them into an ArrayList first and treat each one separately in a loop.
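As noted, the character classes need adjusting for the real data: titles such as "American President, The" or "Dracula: Dead and Loving It" contain commas and colons, and the genres contain | and an apostrophe. Below is a sketch of one possible adjustment that matches the title lazily instead of restricting its characters. The class MovieLineParser and its Movie holder are hypothetical names, and the sketch assumes every line ends with "(yyyy)::genres" as in the sample.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class MovieLineParser {

    // Simple value holder for one parsed line.
    static final class Movie {
        final int number;
        final String title;
        final int year;
        final List<String> genres;

        Movie(int number, String title, int year, List<String> genres) {
            this.number = number;
            this.title = title;
            this.year = year;
            this.genres = genres;
        }
    }

    // number :: title (year) :: genres. The title is matched lazily, so it
    // stops right before the "(yyyy)" that precedes the second "::".
    private static final Pattern LINE =
            Pattern.compile("(\\d+)::(.+?)\\s*\\((\\d{4})\\)::(.*)");

    static Movie parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized line: " + line);
        }
        return new Movie(
                Integer.parseInt(m.group(1)),
                m.group(2),
                Integer.parseInt(m.group(3)),
                Arrays.asList(m.group(4).split("\\|")));
    }

    public static void main(String[] args) {
        Movie movie = parse("12::Dracula: Dead and Loving It (1995)::Comedy|Horror");
        System.out.println(movie.number + " / " + movie.title + " / "
                + movie.year + " / " + movie.genres);
    }
}
```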
Try this:
String[] s = {
"11::American President, The (1995)::Comedy|Drama|Romance",
"12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
"13::Balto (1995)::Animation|Children's",
"14::Nixon (1995)::Drama",
};
for (String e : s) {
String[] infos = e.split("::|\\s*\\(|\\)::");
String number = infos[0];
String title = infos[1];
String releaseDate = infos[2];
String[] genres = infos[3].split("\\|");
System.out.printf("number=%s title=%s releaseDate=%s genres=%s%n",
number, title, releaseDate, Arrays.toString(genres));
}
output
number=11 title=American President, The releaseDate=1995 genres=[Comedy, Drama, Romance]
number=12 title=Dracula: Dead and Loving It releaseDate=1995 genres=[Comedy, Horror]
number=13 title=Balto releaseDate=1995 genres=[Animation, Children's]
number=14 title=Nixon releaseDate=1995 genres=[Drama]

Java regular expression to parse between dates?

I am struggling to come up with a regular expression to parse some logs that are very unstructured but always have a date at the beginning of each line that needs to be parsed.
An example is 2015-9-20 05:20:22 lots of log data and then the next date for the next line. So I would basically need to parse everything from the starting date until the next date.
2015-9-20 05:20:22 lots of log data
2015-9-20 05:21:22 lots of new log data
Is it possible to parse this using regular expression?
So I would basically need to parse everything from the starting date until the next date.
If you want to match lines beginning with one date, or beginning with the following day (startDate + 1 day), you can use the dates in your pattern as literal text.
Using the dates in your example:
^(?:2015-9-20|2015-9-21) .*
Code:
// Instantiate a Date object
Date startDate = new GregorianCalendar(2015, 8, 20).getTime();
// Calculate end date (+1 day)
Calendar endDate = Calendar.getInstance();
endDate.setTime(startDate);
endDate.add(Calendar.DATE, 1); // Add 1 day
// format dates the same way logs use
SimpleDateFormat ft =
new SimpleDateFormat ("y-M-d");
// Create regex
String datesRegex = "^(?:" + ft.format(startDate) + "|" + ft.format(endDate.getTime()) + ") .*";
If you want to get all lines from one date to another, and not only those starting with a given date, you should match with the Pattern.DOTALL flag, combined with Pattern.MULTILINE so that ^ also matches at the start of each line:
^2015-9-20 .*?(?=^2015-9-21 |\z)
Code:
// Create regex
String datesRegex = "^" + ft.format(startDate) + " .*?(?=^" + ft.format(endDate.getTime()) + " |\\z)";
// Compile
Pattern.compile(datesRegex, Pattern.MULTILINE | Pattern.DOTALL);
Assuming you're reading the file line-by-line, this should work for you:
^\d{4}-\d{1,2}-\d{2} \d{2}:\d{2}:\d{2} (.*)$
Code example:
String line = "2015-9-20 05:20:22 log data" + System.lineSeparator();
String pattern = "^\\d{4}-\\d{1,2}-\\d{2} \\d{2}:\\d{2}:\\d{2} (.*)$";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Value after timestamp is: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
public static void main( String args[] ){
String s1 = "2015-9-20 05:20:22 lots of log data";
String s2 = "2015-9-20 05:21:22 lots of new log data";
String pattern = "(\\d{4})-(0?\\d|1[0-2])-([012]\\d|3[01]) ([01]?\\d|2[0-4]):([0-5]?\\d):([0-5]?\\d)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(s1); //same for s2
if(m.find())
System.out.println("True");
else
System.out.println("False");
}
}
Output: True
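For the original goal of capturing everything from one timestamp up to the next, including continuation lines, the lookahead idea from the answers above can be generalised to any timestamp. A minimal sketch, assuming the whole log is available as one string and that every entry starts with a yyyy-M-d HH:mm:ss timestamp at the beginning of a line; the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class LogEntrySplitter {

    private static final String TIMESTAMP =
            "\\d{4}-\\d{1,2}-\\d{1,2} \\d{2}:\\d{2}:\\d{2}";

    // An entry starts at a line-leading timestamp and runs lazily up to the
    // next line-leading timestamp, or to the end of the input.
    // MULTILINE lets ^ match at each line start; DOTALL lets . cross newlines.
    private static final Pattern ENTRY = Pattern.compile(
            "^(" + TIMESTAMP + ") (.*?)(?=^" + TIMESTAMP + " |\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    // Returns one {timestamp, message} pair per entry, message trimmed.
    static List<String[]> split(String log) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            entries.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return entries;
    }

    public static void main(String[] args) {
        String log = "2015-9-20 05:20:22 lots of log data\n"
                + "  continued detail\n"
                + "2015-9-20 05:21:22 lots of new log data\n";
        for (String[] entry : split(log)) {
            System.out.println("[" + entry[0] + "] " + entry[1]);
        }
    }
}
```

For very large files, reading line by line and starting a new entry whenever a line begins with a timestamp avoids holding the whole log in memory.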

Regex: extracting into different strings

I have this string:
Date Description Amount Price Charge Shares Owned
04/30/13 INCOME REINVEST 0.0245 $24.66 $12.34 1.998 1,008.369
05/31/13 INCOME REINVEST 0.0228 $22.99 $12.22 1.881 1,010.250
06/28/13 INCOME REINVEST 0.0224 $22.63 $11.97 1.891 1,012.141
I want to extract the dates into an array, say "matchedDate", and similarly the descriptions, which in this case are "INCOME REINVEST", "INCOME REINVEST", "INCOME REINVEST".
Amount in an array, which would be: "0.0245", "0.0228", "0.0224"
Price in an array: "24.66", "22.99", "22.63"
Charge in an array: "12.34", "12.22", "11.97"
Shares in an array: "1.998", "1.881", "1.891"
I don't need the last part, "Owned", which corresponds to 1,008.369, 1,010.250 and 1,012.141.
So far I am able to successfully extract dates by this:
String regex="[0-9]{2}/[0-9]{2}/[0-9]{2}";
Pattern dateMatch = Pattern.compile(regex);
Matcher m = dateMatch.matcher(regString);
while (m.find()) {
String[] matchedDate=new String[] {m.group()};
for(int count=0;count<matchedDate.length;count++){
System.out.println(matchedDate[count]);
}
}
regString is the string I am trying to match against, i.e. the table shown in the first block.
I don't need the $ signs, so that the numbers can be stored in integer arrays. I think we have to identify some kind of pattern of spaces and dollar signs to do this.
Any help would be appreciated.
This should match the parts you need:
(\d{1,2}/\d{1,2}/\d{1,2}).+?([\d.]+)\s\$(\S+)\s\$(\S+)\s(\S+)
Explained:
(\d{1,2}/\d{1,2}/\d{1,2}) - capture date
.+? - match anything up to next number
([\d.]+)\s - capture Amount but match space following it
\$(\S+)\s - capture Price but match space following it
\$(\S+)\s - capture Charge but match space following it
(\S+) - capture Shares
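The pattern above can be applied from Java roughly as follows. This is a sketch under a few assumptions: the description is captured as well (with an extra lazy group that the pattern above does not have), the numbers are stored as BigDecimal because the sample values are decimals rather than integers, and the names StatementRows and Row are invented for the example.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class StatementRows {

    // Simple value holder for one table row (the "Owned" column is ignored).
    static final class Row {
        final String date;
        final String description;
        final BigDecimal amount;
        final BigDecimal price;
        final BigDecimal charge;
        final BigDecimal shares;

        Row(String date, String description, BigDecimal amount,
            BigDecimal price, BigDecimal charge, BigDecimal shares) {
            this.date = date;
            this.description = description;
            this.amount = amount;
            this.price = price;
            this.charge = charge;
            this.shares = shares;
        }
    }

    // Same structure as the regex above, with the description captured
    // lazily between the date and the amount.
    private static final Pattern ROW = Pattern.compile(
            "(\\d{1,2}/\\d{1,2}/\\d{1,2})\\s+(.+?)\\s+([\\d.]+)"
            + "\\s\\$(\\S+)\\s\\$(\\S+)\\s(\\S+)");

    // Removes thousands separators before converting, e.g. "1,008.369".
    private static BigDecimal toDecimal(String text) {
        return new BigDecimal(text.replace(",", ""));
    }

    static List<Row> parse(String text) {
        List<Row> rows = new ArrayList<>();
        Matcher m = ROW.matcher(text);
        while (m.find()) {
            rows.add(new Row(m.group(1), m.group(2), toDecimal(m.group(3)),
                    toDecimal(m.group(4)), toDecimal(m.group(5)), toDecimal(m.group(6))));
        }
        return rows;
    }

    public static void main(String[] args) {
        String table = "Date Description Amount Price Charge Shares Owned\n"
                + "04/30/13 INCOME REINVEST 0.0245 $24.66 $12.34 1.998 1,008.369\n"
                + "05/31/13 INCOME REINVEST 0.0228 $22.99 $12.22 1.881 1,010.250\n";
        for (Row row : parse(table)) {
            System.out.println(row.date + " | " + row.description + " | " + row.amount
                    + " | " + row.price + " | " + row.charge + " | " + row.shares);
        }
    }
}
```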
String regString = "04/30/13 INCOME REINVEST 0.0245 $24.66 $12.34 1.998 1,008.36";
String regex="([0-9]{2}/[0-9]{2}/[0-9]{2})\\s*([\\w ]+)\\s*(\\d+(\\.\\d+)?)\\s*\\$(\\d+(\\.\\d+)?)\\s*\\$(\\d+(\\.\\d+)?)\\s*(\\d+(\\.\\d+)?)\\s*(\\d+(,\\d{3})*(\\.\\d+)?)";
Pattern match = Pattern.compile(regex);
Matcher m = match.matcher(regString);
while (m.find()) {
System.out.println(m.group(1)); //04/30/13
System.out.println(m.group(2)); //INCOME REINVEST (with a trailing space)
System.out.println(m.group(3)); //0.0245
System.out.println(m.group(5)); //24.66
System.out.println(m.group(7)); //12.34
System.out.println(m.group(9)); //1.998
System.out.println(m.group(11)); //1,008.36
}
Regex Breakdown:
([0-9]{2}/[0-9]{2}/[0-9]{2}) - Your date regex.
([\\w ]+) - Description - 1+ Word characters and spaces.
(\\d+(\\.\\d+)?) (used 4 times) - Amount, Price, Charge, Shares - 1+ number potentially followed by a . and at least 1 more number.
(\\d+(,\\d{3})*(\\.\\d+)?) - 1+ number, followed potentially by sequences of a , and 3 numbers, followed potentially by a . and at least 1 more number.
String r = "([0-9]{2}/[0-9]{2}/[0-9]{2}).+?\\$((?:(?:\\d+|\\d+,\\d+)\\.\\d+\\s\\$?){3})";
String list = "04/30/13 INCOME REINVEST 0.0245 $24.66 $12.34 1.998 1,008.369";
Matcher m = Pattern.compile(r).matcher(list);
while (m.find())
{
String myData = m.group(1) + " " + m.group(2).replace("$", "");
String[] data = myData.split(" ");
for(String s : data)
System.out.println(s);
}
Outputs:
04/30/13
24.66
12.34
1.998
.+?\\$: non-greedy to ensure that we don't take a '$'--basically skips everything until '$'
((?:(?:\\d+|\\d+,\\d+)\\.\\d+\\s\\$?){3}) uses a capturing group to get the three numbers of interest, but it also picks up one of the '$' signs, which is removed via .replace(). You could do this without .replace(), but the expression would be fairly long.
(?:\\d+|\\d+,\\d+) says "group, but do not capture" a number or #,#
\\.\\d+\\s\\$? says a '.' followed by a #, followed by whitespace and an optional '$'
Here's a general tutorial on Regular Expressions. Here's the section on capturing groups. Good luck!
This should give you what you need and it will also run for any number of similar records on your input string ...
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
private static Pattern PATTERN = Pattern.compile("([0-9]{2}/[0-9]{2}/[0-9]{2})\\s+([a-zA-Z]+\\s[a-zA-Z]+)\\s+(\\d{1}\\.\\d{0,4})\\s+\\$(\\d{1,2}\\.\\d{0,2})\\s+\\$(\\d{1,2}\\.\\d{0,2})\\s+(\\d{1,2}\\.\\d{0,3})\\s+");
public static void main(String a[] ) {
String regString = "04/30/13 INCOME REINVEST 0.0245 $24.66 $12.34 1.998 1,008.369 " +
"05/31/13 INCOME REINVEST 0.0228 $22.99 $12.22 1.881 1,010.250 " +
"06/28/13 INCOME REINVEST 0.0224 $22.63 $11.97 1.891 1,012.141 ";
ArrayList<String> date = new ArrayList<String>();
ArrayList<String> desc = new ArrayList<String>();
ArrayList<String> amt = new ArrayList<String>();
ArrayList<String> price = new ArrayList<String>();
ArrayList<String> charge = new ArrayList<String>();
ArrayList<String> share = new ArrayList<String>();
Matcher m = PATTERN.matcher(regString);
while(m.find()) {
date.add(m.group(1));
desc.add(m.group(2));
amt.add(m.group(3));
price.add(m.group(4));
charge.add(m.group(5));
share.add(m.group(6));
}
System.out.println("DATE : " + date.toString());
System.out.println("DESC : " + desc.toString());
System.out.println("AMOUNT : " + amt.toString());
System.out.println("PRICE : " + price.toString());
System.out.println("CHARGE : " + charge.toString());
System.out.println("SHARES : " + share.toString());
}
}
The output of the above program is as below,
DATE : [04/30/13, 05/31/13, 06/28/13]
DESC : [INCOME REINVEST, INCOME REINVEST, INCOME REINVEST]
AMOUNT : [0.0245, 0.0228, 0.0224]
PRICE : [24.66, 22.99, 22.63]
CHARGE : [12.34, 12.22, 11.97]
SHARES : [1.998, 1.881, 1.891]

How to get regex matched group values

I have the following lines of code:
String time = "14:35:59.99";
String timeRegex = "(([01][0-9])|(2[0-3])):([0-5][0-9]):([0-5][0-9])(.([0-9]{1,3}))?";
String hours, minutes, seconds, milliSeconds;
Pattern pattern = Pattern.compile(timeRegex);
Matcher matcher = pattern.matcher(time);
if (matcher.matches()) {
hours = matcher.replaceAll("$1");
minutes = matcher.replaceAll("$4");
seconds = matcher.replaceAll("$5");
milliSeconds = matcher.replaceAll("$7");
}
I am getting hours, minutes, seconds, and milliSeconds using the matcher.replaceAll method and back references to the regex groups. Is there a better way to get the values of the regex groups? I tried
hours = matcher.group(1);
but it throws the following exception:
java.lang.IllegalStateException: No match found
at java.util.regex.Matcher.group(Matcher.java:477)
at com.abnamro.cil.test.TimeRegex.main(TimeRegex.java:70)
Am I missing something here?
It works fine if you avoid calling matcher.replaceAll. Calling replaceAll resets the matcher, so it forgets any previous match.
String time = "14:35:59.99";
String timeRegex = "([01][0-9]|2[0-3]):([0-5][0-9]):([0-5][0-9])(?:\\.([0-9]{1,3}))?";
Pattern pattern = Pattern.compile(timeRegex);
Matcher matcher = pattern.matcher(time);
if (matcher.matches()) {
String hours = matcher.group(1);
String minutes = matcher.group(2);
String seconds = matcher.group(3);
String milliSeconds = matcher.group(4);
System.out.println(hours + ", " + minutes + ", " + seconds + ", " + milliSeconds);
}
Notice that I've also made a couple of improvements to your regular expression:
I've used non-capturing groups (?: ... ) for the groups that you aren't interested in capturing.
I've changed . which matches any character to \\. which matches only a dot.
See it working online: ideone
It works if you use matcher.find() before calling the group function.
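To summarise the answers: group() is only valid right after a successful matches(), find() or lookingAt() on the same Matcher, and only until something resets it. A small sketch using named groups, so the group numbers no longer matter. The class name TimeParts and the group names are assumptions for this example. Note that an optional group that did not take part in the match returns null.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named for this example only.
final class TimeParts {

    private static final Pattern TIME = Pattern.compile(
            "(?<hours>[01][0-9]|2[0-3]):(?<minutes>[0-5][0-9]):(?<seconds>[0-5][0-9])"
            + "(?:\\.(?<fraction>[0-9]{1,3}))?");

    // Returns {hours, minutes, seconds, fraction}; fraction is null when the
    // input has no ".nnn" part.
    static String[] parse(String time) {
        Matcher m = TIME.matcher(time);
        // matches() must succeed before any group(...) call.
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid time: " + time);
        }
        return new String[] {
            m.group("hours"), m.group("minutes"),
            m.group("seconds"), m.group("fraction")
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("14:35:59.99")));
        System.out.println(Arrays.toString(parse("14:35:59")));
    }
}
```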
