Java regex failing

I have a string in the following format:
;1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;
I have written the following code to extract 2011-10-23T16:16:53+0530 from the ;1=2011-10-23T16:16:53+0530; entry:
Pattern pattern = Pattern.compile("(;1+)=(\\w+);");
String strFound= "";
Matcher matcher = pattern.matcher(strindData);
while (matcher.find()) {
strFound= matcher.group(2);
}
But it is not working as expected. Can you please give me any hint?

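Can you please give me any hint?
Yes. \w matches only word characters (letters, digits and underscore), so none of -, : or + is part of \w. (\\w+) therefore stops at the first - in the date, and the ; that the pattern requires next is never found.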
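As a sketch of one possible fix, the value can be matched with the negated character class [^;]+, which accepts every character except the separator, including -, : and +. The class name FirstValue, the helper valueFor and the key-capturing group (\\d+) are illustrative assumptions, not part of the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstValue {
    // ";<digits>=<everything up to the next ;>". [^;]+ accepts '-', ':' and '+',
    // which \w does not. The trailing ';' is not consumed, so the next
    // find() can start on it.
    private static final Pattern ENTRY = Pattern.compile(";(\\d+)=([^;]+)");

    // Returns the value stored under the given key, or null if the key is absent.
    static String valueFor(String data, String key) {
        Matcher matcher = ENTRY.matcher(data);
        while (matcher.find()) {
            if (matcher.group(1).equals(key)) {
                return matcher.group(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String data = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;";
        System.out.println(valueFor(data, "1")); // 2011-10-23T16:16:53+0530
    }
}
```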

Do you have to use a regex? Why not call String.split() to break up the string on semicolon boundaries, then call it again to split each chunk on the equals sign? At that point you'll have an integer and the date in string form, and from there you can parse the date string.
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
public final class DateScan {
private static final String INPUT = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;3=2011-10-23T16:16:53+0530;4=2011-10-23T16:16:53+0530;";
public static void main(final String... args) {
final SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZ");
final String[] pairs = INPUT.split(";");
for (final String pair : pairs) {
if ("".equals(pair)) {
continue;
}
final String[] integerAndDate = pair.split("=");
final Integer integer = Integer.parseInt(integerAndDate[0]);
final String dateString = integerAndDate[1];
try {
final Date date = parser.parse(dateString);
System.out.println(integer + " -> " + date);
} catch (final ParseException pe) {
System.err.println("bad date: " + dateString + ": " + pe);
}
}
}
}
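On Java 8 and later, the same split-based approach can use java.time instead of SimpleDateFormat (which is not thread-safe). This is a sketch assuming the same +HHMM offset format; the class name DateScanJavaTime and its parse helper are hypothetical. The pattern letter Z parses offsets such as +0530, and split("=", 2) keeps any further = signs inside the value.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public final class DateScanJavaTime {
    // 'Z' parses offsets written as +HHMM, e.g. +0530.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ssZ");

    static OffsetDateTime parse(String value) {
        return OffsetDateTime.parse(value, FORMAT);
    }

    public static void main(String[] args) {
        String input = ";1=2011-10-23T16:16:53+0530;2=2011-10-23T16:16:53+0530;";
        for (String pair : input.split(";")) {
            if (pair.isEmpty()) {
                continue; // the leading ';' produces an empty first element
            }
            String[] keyAndDate = pair.split("=", 2);
            int key = Integer.parseInt(keyAndDate[0]);
            System.out.println(key + " -> " + parse(keyAndDate[1]));
        }
    }
}
```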

I've changed the input a bit, but only for presentation reasons.
You can try this:
String input = " ;1=2011-10-23T16:16:53+0530; 2=2011-10-23T16:17:53+0530;3=2011-10-23T16:18:53+0530;4=2011-10-23T16:19:53+0530;";
Pattern p = Pattern.compile("(;\\d+?)?=(.+?);");
Matcher m = p.matcher(input);
while(m.find()){
System.out.println(m.group(2));
}

Related

How to split the string into symbol name and date in this case

I have a string in this format:
FUTSTKACC28-APR-2016
ACC is a symbol and 28-APR-2016 is an expiry date.
FUTSTK is a predefined word.
How can I retrieve the symbol and the date in this case?
For example, how do I get
ACC
and
28-APR-2016
Some sample data:
FUTSTKACC26-MAY-2016
FUTSTKACC28-APR-2016
FUTSTKACC30-JUN-2016
FUTSTKADANIENT26-MAY-2016
FUTSTKADANIENT28-APR-2016
FUTSTKADANIENT30-JUN-2016
You have a fixed-length prefix word and a fixed-length date. You can remove the prefix, then take the last 11 characters as the date (dd-MMM-yyyy is 11 characters long); whatever lies between them is the symbol. Something like:
String[] sample = { "FUTSTKACC26-MAY-2016", "FUTSTKACC28-APR-2016",
"FUTSTKACC30-JUN-2016", "FUTSTKADANIENT26-MAY-2016",
"FUTSTKADANIENT28-APR-2016", "FUTSTKADANIENT30-JUN-2016" };
String predefWord = "FUTSTK";
for (String input : sample) {
if (input.startsWith(predefWord)) {
input = input.substring(predefWord.length());
// There are 11 characters in the date format
String symbol = input.substring(0, input.length() - 11);
String dateStr = input.substring(input.length() - 11);
System.out.printf("symbol=%s, date=%s%n", symbol, dateStr);
}
}
Output:
symbol=ACC, date=26-MAY-2016
symbol=ACC, date=28-APR-2016
symbol=ACC, date=30-JUN-2016
symbol=ADANIENT, date=26-MAY-2016
symbol=ADANIENT, date=28-APR-2016
symbol=ADANIENT, date=30-JUN-2016
Something like this should work:
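final String PATTERN = "(FUTSTK)(.+)(\\d\\d-\\w\\w\\w-\\d\\d\\d\\d)";
Pattern p = Pattern.compile(PATTERN);
Matcher m = p.matcher("FUTSTKACC28-APR-2016");
if (m.matches()) { // group() may only be called after a successful match
String symbol = m.group(2); // "ACC"; group(1) is the FUTSTK prefix
DateFormat format = new SimpleDateFormat("dd-MMM-yyyy", Locale.ENGLISH);
Date date = format.parse(m.group(3)); // throws ParseException on bad input
}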
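final String str = "FUTSTKACCCCCCC28-APR-2016";
final String[] strArr = str.split("-"); // ["FUTSTKACCCCCCC28", "APR", "2016"]
// The last two characters before the first '-' are the day of month
final String day = strArr[0].substring(strArr[0].length() - 2);
// Everything before the day; note this still includes the FUTSTK prefix
final String word = strArr[0].substring(0, strArr[0].length() - 2);
System.out.println("word: " + word);
System.out.println("date: " + day + "-" + strArr[1] + "-" + strArr[2]);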
A regex approach (bits stolen from @ElliottFrisch), assuming you know the predefined word:
String[] sample = { "FUTSTKACC26-MAY-2016", "FUTSTKACC28-APR-2016",
"FUTSTKACC30-JUN-2016", "FUTSTKADANIENT26-MAY-2016",
"FUTSTKADANIENT28-APR-2016", "FUTSTKADANIENT30-JUN-2016" };
String predefined = "FUTSTK";
Pattern p = Pattern.compile(Pattern.quote(predefined) + "(\\w+)(\\d\\d-\\w\\w\\w-\\d\\d\\d\\d)");
for (String s: sample) {
Matcher m = p.matcher(s);
if (m.matches()) {
System.out.println(m.group(1) + " " + m.group(2));
}
}
output:
ACC 26-MAY-2016
ACC 28-APR-2016
ACC 30-JUN-2016
ADANIENT 26-MAY-2016
ADANIENT 28-APR-2016
ADANIENT 30-JUN-2016
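Once the date substring has been extracted by any of the approaches above, it can be turned into a date value. A sketch using java.time (Java 8+), assuming English month abbreviations; the class name ExpiryDate and the helper parseExpiry are hypothetical. Because the data uses upper-case months such as APR, the formatter is built with parseCaseInsensitive().

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.util.Locale;

public class ExpiryDate {
    // Upper-case month abbreviations such as "APR" need case-insensitive
    // parsing; the English locale makes "MMM" mean Jan, Feb, ...
    private static final DateTimeFormatter EXPIRY = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dd-MMM-yyyy")
            .toFormatter(Locale.ENGLISH);

    static LocalDate parseExpiry(String text) {
        return LocalDate.parse(text, EXPIRY);
    }

    public static void main(String[] args) {
        System.out.println(parseExpiry("28-APR-2016")); // 2016-04-28
    }
}
```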

How to grab text from a messy string in Java?

I am reading a text file which contains movie titles, year, language etc.
I am trying to grab those attributes.
Suppose some of the strings look like this:
String s = "A Fatal Inversion (1992)";
String d = "(aka \"Verhängnisvolles Erbe\" (1992)) (Germany)";
String f = "\"#Yaprava\" (2013) ";
String g = "(aka \"Love Heritage\" (2002)) (International: English title)";
How can I extract the title, the year, the country (if specified), and the type of title (if specified) from these?
I am not very good with regular expressions and patterns, and I don't know how to tell which attribute a value belongs to when it isn't labeled. I'm doing this because I'm trying to generate XML from a text file. I have the DTD for it, but I'm not sure I need it in this case.
Edit: Here is what I have tried.
String pattern;
Pattern p = Pattern.compile("\"([^\"]*)\"");
Matcher m;
Pattern number = Pattern.compile("\\d+");
Matcher num;
m = p.matcher(s);
num = number.matcher(s);
if(m.find()){
System.out.println(m.group(1));
}
if(num.find()){
System.out.println(num.group(0));
}
I suggest you extract the year first, as it seems fairly consistent. Then extract the country (if present), and assume that whatever remains is the title.
For extracting the countries, I'd recommend you hardcode a regex pattern with the names of known countries. It might take some iterating to determine what these are, as they seem to be pretty inconsistent.
This code is a bit ugly (but then so is the data!):
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Extraction {
public final String original;
public String year = "";
public String title = "";
public String country = "";
private String remaining;
public Extraction(String s) {
this.original = s;
this.remaining = s;
extractBracketedYear();
extractBracketedCountry();
this.title = remaining;
}
private void extractBracketedYear() {
Matcher matcher = Pattern.compile(" ?\\(([0-9]+)\\) ?").matcher(remaining);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
this.year = matcher.group(1);
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
remaining = sb.toString();
}
private void extractBracketedCountry() {
Matcher matcher = Pattern.compile("\\((Germany|International: English.*?)\\)").matcher(remaining);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
this.country = matcher.group(1);
matcher.appendReplacement(sb, "");
}
matcher.appendTail(sb);
remaining = sb.toString();
}
public static void main(String... args) {
for (String s : new String[] {
"A Fatal Inversion (1992)",
"(aka \"Verhängnisvolles Erbe\" (1992)) (Germany)",
"\"#Yaprava\" (2013) ",
"(aka \"Love Heritage\" (2002)) (International: English title)"}) {
Extraction extraction = new Extraction(s);
System.out.println("title = " + extraction.title);
System.out.println("country = " + extraction.country);
System.out.println("year = " + extraction.year);
System.out.println();
}
}
}
Produces:
title = A Fatal Inversion
country =
year = 1992
title = (aka "Verhängnisvolles Erbe")
country = Germany
year = 1992
title = "#Yaprava"
country =
year = 2013
title = (aka "Love Heritage")
country = International: English title
year = 2002
Once you've got this data, you can manipulate it further (e.g. "International: English title" -> "England").
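The country alternation in extractBracketedCountry is hardcoded. As the list of known values grows, one option is to build the alternation from a list and quote each entry so characters such as : are taken literally. This is a sketch only: the class name CountryPattern, the list KNOWN_COUNTRIES and the helper findCountry are hypothetical, and unlike the original International: English.*? this version matches each listed entry exactly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class CountryPattern {
    // Hypothetical list; extend it as new values show up in the data.
    static final List<String> KNOWN_COUNTRIES =
            Arrays.asList("Germany", "International: English title");

    // Builds \((Germany|International: English title)\) with each entry
    // wrapped by Pattern.quote, which adds no capturing groups.
    static Pattern countryPattern(List<String> countries) {
        String alternation = countries.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\((" + alternation + ")\\)");
    }

    // Returns the bracketed country, or "" when none of the known ones occurs.
    static String findCountry(String line) {
        Matcher m = countryPattern(KNOWN_COUNTRIES).matcher(line);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        // International: English title
        System.out.println(findCountry("(aka \"Love Heritage\" (2002)) (International: English title)"));
    }
}
```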

Extracting a value from a file name based on a regex in Java

Suppose my file name pattern is something like %#_Report_%$_for_%&.xls, where %# and %$ can contain any characters but %& is a date.
How can I get the actual values of those placeholders from a file name in Java?
For example, if the actual file name is Genr_Report_123_for_20151105.xls, how do I get:
%# value is Genr
%$ value is 123
%& value is 20151105
You can do it like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Rgx {
private String str1 = "", str2 = "", date = "";
public static void main(String[] args) {
String fileName = "Genr_Report_123_for_20151105.xls";
Rgx rgx = new Rgx();
rgx.extractValues(fileName);
System.out.println(rgx.str1 + " " + rgx.str2 + " " + rgx.date);
}
private void extractValues(String fileName) {
Pattern pat = Pattern.compile("([^_]+)_Report_([^_]+)_for_([\\d]+)\\.xls");
Matcher m = pat.matcher(fileName);
if (m.find()) {
str1 = m.group(1);
str2 = m.group(2);
date = m.group(3);
}
}
}
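If the template itself can change, the regex can also be generated from it rather than written by hand. A minimal sketch, assuming %# and %$ mean "any characters" and %& is an 8-digit yyyyMMdd date; the class TemplateMatcher and the method compileTemplate are hypothetical. Every literal character is passed through Pattern.quote, so the . in .xls is not treated as a wildcard.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateMatcher {
    // Turns a template such as "%#_Report_%$_for_%&.xls" into a regex:
    // %# and %$ become lazy "any characters" groups, %& an 8-digit date group.
    static Pattern compileTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < template.length()) {
            if (template.startsWith("%#", i) || template.startsWith("%$", i)) {
                regex.append("(.+?)");
                i += 2;
            } else if (template.startsWith("%&", i)) {
                regex.append("(\\d{8})");
                i += 2;
            } else {
                // Quote each literal character so regex metacharacters lose their meaning
                regex.append(Pattern.quote(String.valueOf(template.charAt(i))));
                i++;
            }
        }
        return Pattern.compile(regex.toString());
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("%#_Report_%$_for_%&.xls");
        Matcher m = p.matcher("Genr_Report_123_for_20151105.xls");
        if (m.matches()) {
            System.out.println(m.group(1)); // Genr
            System.out.println(m.group(2)); // 123
            System.out.println(m.group(3)); // 20151105
        }
    }
}
```

Groups are numbered in the order the placeholders appear in the template, so the mapping back to %#, %$ and %& follows from their positions.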

Getting some data from HTML using regex

I was trying to get some data from HTML. This is my code:
public static void main(String[] args) {
final String str = "<div class=\"b-vacancy-list-salary\">\n" +
" from 50 000\n" +
" to 70 000\n" +
" USD.\n" +
" </div>";
System.out.println(Arrays.toString(getTagValues(str).toArray()));
}
static final String tag = "<div class=\"b-vacancy-list-salary\">\n";
private static final Pattern TAG_REGEX = Pattern.compile(tag+"(.+?)</div>");
private static List<String> getTagValues(final String str) {
System.out.println(tag);
final List<String> tagValues = new ArrayList<String>();
final Matcher matcher = TAG_REGEX.matcher(str);
while (matcher.find()) {
tagValues.add(matcher.group(1));
}
return tagValues;
}
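It returns [] instead of the value. What's wrong?
By default, . does not match line terminators, so (.+?) cannot cross the newlines inside the div and the pattern never matches. You can remove the line feeds before matching.
A better way to parse HTML is to use a DOM parser or XPath.
For example: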
public static void main(String[] args) {
final String str = "<div class=\"b-vacancy-list-salary\">\n"
+ " from 50 000\n"
+ " to 70 000\n"
+ " USD.\n"
+ " </div>";
System.out.println(Arrays.toString(getTagValues(str).toArray()));
}
static final String tag = "<div class=\"b-vacancy-list-salary\">";
private static final Pattern TAG_REGEX = Pattern.compile(tag + "(.+?)</div>");
private static List<String> getTagValues(final String str) {
System.out.println(tag);
final List<String> tagValues = new ArrayList<String>();
final Matcher matcher = TAG_REGEX.matcher(str.replace("\n", ""));
while (matcher.find()) {
tagValues.add(matcher.group(1).trim());
}
return tagValues;
}
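To illustrate the XPath suggestion, here is a sketch using the JDK's built-in javax.xml.xpath API. It assumes the markup is well-formed XML, which holds for this snippet but usually not for real-world HTML; that generally needs a dedicated HTML parser (for example the third-party jsoup library, not used here). The class name SalaryXPath and the helper salary are hypothetical. normalize-space() trims the text and collapses the embedded newlines to single spaces.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class SalaryXPath {
    // Parses the snippet as XML and returns the div's text with whitespace normalized.
    static String salary(String xhtml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
                "normalize-space(//div[@class='b-vacancy-list-salary'])",
                new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String str = "<div class=\"b-vacancy-list-salary\">\n"
                + " from 50 000\n"
                + " to 70 000\n"
                + " USD.\n"
                + " </div>";
        System.out.println(salary(str)); // from 50 000 to 70 000 USD.
    }
}
```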
Instead of
private static final Pattern TAG_REGEX = Pattern.compile(tag+"(.+?)</div>");
use
private static final Pattern TAG_REGEX = Pattern.compile(tag+"([\\s\\S]+?)</div>");
Try adding Pattern.DOTALL as the second parameter of Pattern.compile. This enables the dot in the pattern to match newlines. Not sure if this quite gives you what you want, but it may help you get started.
private static final Pattern TAG_REGEX = Pattern.compile(tag + "(.+?)</div>",
Pattern.DOTALL);
See the Javadoc for Pattern.DOTALL.
By default, . does not match newline characters. Try this:
Pattern.compile(tag + "((.|\n)*)</div>");
You need to make "." match newline characters. You can do this by putting "(?s)" at the front of your regular expression; in your case: Pattern.compile("(?s)" + tag + "(.+?)</div>");
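The behaviour described in these answers can be checked side by side: the plain pattern fails on the multi-line div, while both Pattern.DOTALL and the inline (?s) flag succeed. A small sketch; the class name DotAllDemo and the helper extract are hypothetical, and Pattern.quote is a defensive addition so the tag is always matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {
    static final String TAG = "<div class=\"b-vacancy-list-salary\">";

    // The same pattern three ways: default, DOTALL flag, inline (?s) flag.
    static final Pattern PLAIN = Pattern.compile(Pattern.quote(TAG) + "(.+?)</div>");
    static final Pattern DOT_ALL = Pattern.compile(Pattern.quote(TAG) + "(.+?)</div>", Pattern.DOTALL);
    static final Pattern INLINE = Pattern.compile("(?s)" + Pattern.quote(TAG) + "(.+?)</div>");

    // Returns the trimmed text between the tags, or null when there is no match.
    static String extract(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String html = TAG + "\n from 50 000\n to 70 000\n USD.\n </div>";
        System.out.println(extract(PLAIN, html));           // null: '.' stops at '\n'
        System.out.println(extract(DOT_ALL, html) != null); // true
        System.out.println(extract(INLINE, html) != null);  // true
    }
}
```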

Scanner - parsing code values using delimiter regex

I'm trying to use a Scanner to read in lines of code from a string of the form "p.addPoint(x,y);"
The regex format I'm after is:
*anything*.addPoint(*spaces or nothing* OR ,*spaces or nothing*
What I've tried so far isn't working: [[.]+\\.addPoint(&&[\\s]*[,[\\s]*]]
Any ideas what I'm doing wrong?
I tested this in Python, but the regexp should be transferable to Java:
>>> regex = r'(\w+\.addPoint\(\s*|\s*,\s*|\s*\)\s*)'
>>> re.split(regex, 'poly.addPoint(3, 7)')
['', 'poly.addPoint(', '3', ', ', '7', ')', '']
Your regexp seems seriously malformed. Even if it weren't, matching arbitrarily many repetitions of the . wildcard at the beginning of the string would probably make huge swaths of irrelevant text match.
Edit: I misunderstood the original spec; the current regexp should be correct.
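Transferred to Java, the same idea can serve directly as a Scanner delimiter, so the scanner returns only the numeric arguments. A sketch assuming integer arguments and a trailing ; after each call; the class name AddPointScanner and the helper parsePoints are hypothetical. Compared with the Python version, the delimiter here also swallows the ; and any newline after the closing parenthesis, so several calls in a row are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class AddPointScanner {
    // Delimiters: "<name>.addPoint(", a comma, or ")" with an optional ";".
    // Each alternative also swallows surrounding whitespace and newlines.
    private static final String DELIMITER =
            "\\s*\\w+\\.addPoint\\(\\s*|\\s*,\\s*|\\s*\\)\\s*;?\\s*";

    // Returns the (x, y) pairs found in the given source text.
    static List<int[]> parsePoints(String code) {
        List<int[]> points = new ArrayList<>();
        try (Scanner sc = new Scanner(code).useDelimiter(DELIMITER)) {
            while (sc.hasNextInt()) {
                int x = sc.nextInt();
                int y = sc.nextInt();
                points.add(new int[] {x, y});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        for (int[] p : parsePoints("poly.addPoint(3, 7);\np.addPoint( 10 ,20 );")) {
            System.out.println("x=" + p[0] + ", y=" + p[1]);
        }
    }
}
```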
Another way:
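import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
public class MyPattern {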
private static final Pattern ADD_POINT;
static {
String varName = "[\\p{Alnum}_]++";
String argVal = "([\\p{Alnum}_\\p{Space}]++)";
String regex = "(" + varName + ")\\.addPoint\\(" +
argVal + "," +
argVal + "\\);";
ADD_POINT = Pattern.compile(regex);
System.out.println("The Pattern is: " + ADD_POINT.pattern());
}
public void findIt(String filename) throws FileNotFoundException {
Scanner s = new Scanner(new FileReader(filename));
while (s.findWithinHorizon(ADD_POINT, 0) != null) {
final MatchResult m = s.match();
System.out.println(m.group(0));
System.out.println(" arg1=" + m.group(2).trim());
System.out.println(" arg2=" + m.group(3).trim());
}
}
public static void main(String[] args) throws FileNotFoundException {
MyPattern p = new MyPattern();
final String fname = "addPoint.txt";
p.findIt(fname);
}
}
