Next digit after a word using index position in java - java

I am trying to solve this question:
Get document on some condition in elastic search java API
My approach is to first find the positions of all the month names in the string, then extract the word that follows each one, which should be a 4-digit or 2-digit year, and finally calculate the difference between the years.
To get the month positions I am using this piece of code:
String[] threeMonthArray=new String[]{" Jan "," Feb "," Mar "," Apr "," May "," June "," July "," Aug "," Sep "," Oct "," Nov "," Dec "};
String[] completeMonthArray=new String[]{"January","February","March","April","May","June","July","August","September","October","November","December"};
List<Integer> indexArray=new ArrayList<>();
for(int i=0;i<threeMonthArray.length;i++){
int index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase());
while (index >= 0) {
System.out.println(threeMonthArray[i]+" : "+index+"------");
indexArray.add(index);
index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase(), index + 1);
}
// System.out.println(threeMonthArray[i]+" : "+parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase())+"------");
}
Collections.sort(indexArray);
System.out.println( indexArray);
It prints this output:
[2873, 2884, 3086, 3098, 4303, 4315, 6251, 6262, 8130, 8142, 15700, 15711]
The positions are correct. My problem is how to get the next word after each position, which must be a number.
Jun 2010 to Sep 2011 First Document
Jun 2009 to Aug 2011 Second Document
Nov 2011 – Sep 2012 Third Document
Nov 2012- Sep 2013 Fourth Document

You can use a regular expression to find the next number starting at the position of your last found month:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(parsedContent);
if (m.find(index)) {
String year = m.group();
}
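A minimal sketch of how the month lookup and the digit regex could be combined, assuming the year is the first standalone 2-digit or 4-digit number after each month name. The class name YearAfterMonth, the method findYears, and both patterns are illustrative and not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearAfterMonth {
    // Assumption: months appear as whole words, abbreviated or spelled out.
    private static final Pattern MONTH = Pattern.compile(
        "\\b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?"
        + "|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\\b",
        Pattern.CASE_INSENSITIVE);

    // Assumption: a year is a standalone number of exactly 4 or 2 digits.
    private static final Pattern YEAR = Pattern.compile("\\b(\\d{4}|\\d{2})\\b");

    /** Returns, for each month found, the first year-like number that follows it. */
    static List<String> findYears(String text) {
        List<String> years = new ArrayList<>();
        Matcher month = MONTH.matcher(text);
        Matcher year = YEAR.matcher(text);
        while (month.find()) {
            // find(int) resets the matcher and searches from the given index,
            // here the position right after the month name.
            if (year.find(month.end())) {
                years.add(year.group());
            }
        }
        return years;
    }

    public static void main(String[] args) {
        System.out.println(findYears("Jun 2010 to Sep 2011")); // [2010, 2011]
    }
}
```

Note that this takes the next number wherever it occurs, so a month that is not directly followed by a year (for example "Jun 15, 2010") would need a stricter pattern.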

Related

Parse and print only the stack trace when Regex pattern matches

I want to print only the stack trace, starting from "at cab.ces.upgrade.controller.UpgradeRecommendationController.getPackageOffers(UpgradeRecommendationController.java:1234)". This is the output I currently get:
**** Error Wed Jul 05 23:23:49 GMT 2017 1499297029577 /lll/dynamo/servlet/pipeline/RequestScopeManager/RequestScope-23596/cab/ces/upgrade/manager/business/UpgradeRecommendationsBusinessManager at cab.ces.upgrade.controller.UpgradeRecommendationController.getPackageOffers(UpgradeRecommendationController.java:1234)
So I need to remove **** Error Wed Jul 05 23:23:49 GMT 2017 1499297029577 /lll/dynamo/servlet/pipeline/RequestScopeManager/RequestScope-23596/cab/ces/upgrade/manager/business/UpgradeRecommendationsBusinessManager
My code:
if((strLine.contains("**** Error") )){
//|| (strLine.contains("(CreditCheckManager")
//String sPattern = "\\tat (.*)\\.(.*)\\(([^:]*):?([\\d]*)\\)";
Matcher m = p.matcher(strLine);
if(m.find()){
totCount++;
iCount++;
if((iCount==1) ){
System.out.println("Class name:- " + m.group(3));
System.out.println("Line Number:- " + m.group(4));
System.out.println();
System.out.println("VodafoneSystemException occurence count: " + exCount);
System.out.println();
System.out.println("VodafoneSystemException stack trace count: " + totCount);
}
System.out.println(strLine);
}
}
Use the regex at (.)+ to match everything that follows "at " on the line.
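As a rough sketch of that suggestion, the snippet below keeps only the part of a log line that starts at "at ". The class name StackFrameExtractor, the method extractFrame, and the exact pattern are assumptions for illustration; the pattern additionally requires whitespace (or the line start) before "at" so that words ending in "at" do not trigger a match.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackFrameExtractor {
    // "at " preceded by line start or whitespace, then the rest of the line.
    private static final Pattern FRAME = Pattern.compile("(?:^|\\s)(at .+)");

    /** Returns the "at ..." part of the line, or empty if the line has none. */
    static Optional<String> extractFrame(String line) {
        Matcher m = FRAME.matcher(line);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String line = "**** Error Wed Jul 05 23:23:49 GMT 2017 1499297029577 "
            + "/lll/dynamo/servlet/pipeline/RequestScopeManager/RequestScope-23596"
            + "/cab/ces/upgrade/manager/business/UpgradeRecommendationsBusinessManager "
            + "at cab.ces.upgrade.controller.UpgradeRecommendationController"
            + ".getPackageOffers(UpgradeRecommendationController.java:1234)";
        // Prints only the stack frame, without the "**** Error ..." prefix.
        extractFrame(line).ifPresent(System.out::println);
    }
}
```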

Parsing flat file with repeating section using regex

I have a flat file with data in the following format:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Where each section starts with a time, followed by headers and then values. I am trying to capture each of these sections through regex, so I can get:
section 1:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
section 2:
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Later I want to parse each of these sections into Java objects of the classes given below:
public class Section {
String timestamp;
List<Row> rows;
}
public class Row {
String name;
String uniqueId;
}
However, I am not able to extract the text between two positive regex matches. Below is the regular expression I tried:
((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)
But it returns only the time values:
10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM
I even tried adding Pattern.MULTILINE to Pattern but it didn't work either.
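Assuming the structure you showed us repeats throughout the file, there are four types of lines in sequence: timestamp, header, data, and empty line.
You can therefore read the file line by line and handle each line according to its type, rather than matching a whole section with one regex. For example, to separate the unique ID from the name in a data line, you could try: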
String third = "ABX 298819 12 519440AD3";
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
String name = third.replaceAll("(.*)\\s+\\w+", "$1");
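The line-by-line idea could be sketched roughly as follows. It assumes the header line is literally "Name UniqueID", that a line consisting only of a time starts a new section, and that the unique ID is the last whitespace-separated token of a data line. SectionParser and parse are hypothetical names; Section and Row follow the classes from the question, with constructors added for convenience.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SectionParser {
    static class Row {
        final String name;
        final String uniqueId;
        Row(String name, String uniqueId) {
            this.name = name;
            this.uniqueId = uniqueId;
        }
    }

    static class Section {
        final String timestamp;
        final List<Row> rows = new ArrayList<>();
        Section(String timestamp) {
            this.timestamp = timestamp;
        }
    }

    // A whole line such as "1:00 PM" or "12:00 AM".
    private static final Pattern TIME = Pattern.compile(
        "(1[0-2]|[1-9]):[0-5][0-9]\\s?(am|pm)", Pattern.CASE_INSENSITIVE);

    static List<Section> parse(String content) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        for (String raw : content.split("\\R")) {
            String line = raw.trim();
            // Skip empty lines and the (assumed) fixed header line.
            if (line.isEmpty() || line.equals("Name UniqueID")) {
                continue;
            }
            if (TIME.matcher(line).matches()) {
                // A timestamp line opens a new section.
                current = new Section(line);
                sections.add(current);
            } else if (current != null) {
                // Data line: everything before the last space is the name.
                int cut = line.lastIndexOf(' ');
                if (cut > 0) {
                    current.rows.add(new Row(line.substring(0, cut).trim(),
                                             line.substring(cut + 1)));
                }
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String content = "1:00 PM\nName UniqueID\nABX 298819 12 519440AD3\n\n"
                       + "12:00 AM\nName UniqueID\nAX1 239949 01 119440AD3\n";
        for (Section s : parse(content)) {
            System.out.println(s.timestamp + " -> " + s.rows.size() + " row(s)");
        }
    }
}
```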

splitting or tokenize comma with space and comma[JAVA]

January 22, 2014
I want to split the string into three parts, but the second part is followed by a comma and a space.
If I understand you correctly, you could use a single regular expression with an optional comma (the ? means zero or one match), something like:
String in = "January 22, 2014";
String[] arr = in.split(",?\\ ");
for (String str : arr) {
System.out.println(str);
}
Output is
January
22
2014

Reformat the String after removing a word from it

I have a String: "AASS MON 01 2013 365.00 HJJ Call"
I need to remove the word HJJ from it so that the output is
AASS MON 01 2013 365.00 Call
I tried the following:
if(symbol.contains("HJJ"))
{
symbol = symbol.replace("HJJ","");
}
But with this I get the output
AASS MON 01 2013 365.00  Call (one extra space before Call)
whereas I want it to be
AASS MON 01 2013 365.00 Call
Here is what I usually use:
public static String removeExtraSpaces(String input) {
return input.trim().replaceAll(" +", " ");
}
trim removes leading and trailing whitespace, while replaceAll replaces any run of consecutive spaces with a single space.
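Applied to the string from the question, that helper could be used as in the sketch below. The wrapper class SpaceCleanup and the method removeWord are illustrative names; note that String.replace removes every occurrence of the text, even inside longer words.

```java
class SpaceCleanup {
    static String removeExtraSpaces(String input) {
        return input.trim().replaceAll(" +", " ");
    }

    /** Removes the given text, then collapses the spaces left behind. */
    static String removeWord(String input, String word) {
        return removeExtraSpaces(input.replace(word, ""));
    }

    public static void main(String[] args) {
        String symbol = "AASS MON 01 2013 365.00 HJJ Call";
        System.out.println(removeWord(symbol, "HJJ")); // AASS MON 01 2013 365.00 Call
    }
}
```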
public class Trimimg
{
public static void main(String[]args)
{
String str = "AASS MON 01 2013 365.00 HJJ Call";
String newStr = str.replace(" HJJ", "");
System.out.println(newStr);
}
}

Extract a sequence from a string in java using Regex

I have a log whose lines follow a pattern, except that the last field differs a little from the regular ones.
a> nc,71802265,0,"Tuesday, June 26, 2012 09:06:49 UTC",38.8335,-122.8072,1.6,0.00,21,"Northern California"
b> ci,11127314,0,"Tuesday, June 26, 2012 08:37:52 UTC",34.2870,-118.3360,2.2,10.20,100,"Greater Los Angeles area, California"
c> us,b000aqpn,6,"Tuesday, June 26, 2012 08:29:55 UTC",53.4819,-165.2794,4.4,25.60,96,"Fox Islands, Aleutian Islands, Alaska"
String regex = "^\\"[a-z,A-Z]\\s*\\(,)*[a-z,A-Z]\\"";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
From line a I need "Northern California",
from line b I need "Greater Los Angeles area, California", and so on.
Thanks
You could use String#lastIndexOf, searching backwards from the penultimate character, to find the opening " of the last field:
String s = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"";
int start = s.lastIndexOf("\"", s.length() - 2) + 1;
String location = s.substring(start, s.length() - 1);
Why not use String.split(regex, limit) and specify the number of commas you need to split on?
That way you can get the last field intact with commas, and then simply strip the double-quotes.
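A small sketch of the split-with-limit idea, assuming every line has exactly 11 commas before the last field (the quoted date contributes two of them), so a limit of 12 leaves the last field in one piece. LastFieldSplit and lastField are hypothetical names.

```java
class LastFieldSplit {
    // Assumption: 11 commas precede the last field on every line.
    private static final int FIELD_LIMIT = 12;

    static String lastField(String line) {
        // With a limit, the final array element keeps any remaining commas.
        String[] parts = line.split(",", FIELD_LIMIT);
        String last = parts[parts.length - 1];
        // Strip the surrounding double quotes.
        return last.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String b = "b> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\","
                 + "34.2870,-118.3360,2.2,10.20,100,"
                 + "\"Greater Los Angeles area, California\"";
        System.out.println(lastField(b)); // Greater Los Angeles area, California
    }
}
```

If the number of commas in the earlier fields can vary, the lastIndexOf or $-anchored regex approaches are more robust.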
Use the $ anchor to tell that your match should be at the end of the line:
String lines = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"\nb> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\",34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"\nc> us,b000aqpn,6,\"Tuesday, June 26, 2012 08:29:55 UTC\",53.4819,-165.2794,4.4,25.60,96,\"Fox Islands, Aleutian Islands, Alaska\"";
String regex = "\"[^\"]*\"$";
Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(lines);
while (m.find()) {
System.out.println(m.group());
}
outputs:
"Northern California"
"Greater Los Angeles area, California"
"Fox Islands, Aleutian Islands, Alaska"
for(String s: log.split("\n")){
System.out.println(s.replaceAll(".+(\".+\")$","$1"));
}
