I am trying to solve this question:
Get document on some condition in elastic search java API
My approach is to first find the positions of all month names in the string, then extract the next word, which should be a 4-digit or 2-digit year, and finally calculate the difference from those values.
To get the month positions I am using this piece of code:
String[] threeMonthArray=new String[]{" Jan "," Feb "," Mar "," Apr "," May "," June "," July "," Aug "," Sep "," Oct "," Nov "," Dec "};
String[] completeMonthArray=new String[]{"January","February","March","April","May","June","July","August","September","October","November","December"};
List indexArray=new ArrayList();
for(int i=0;i<threeMonthArray.length;i++){
int index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase());
while (index >= 0) {
System.out.println(threeMonthArray[i]+" : "+index+"------");
indexArray.add(index);
index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase(), index + 1);
}
// System.out.println(threeMonthArray[i]+" : "+parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase())+"------");
}
Collections.sort(indexArray);
System.out.println( indexArray);
It prints this output:
[2873, 2884, 3086, 3098, 4303, 4315, 6251, 6262, 8130, 8142, 15700, 15711]
I am getting the correct positions. My problem is how to get the next word, which must be a number.
Jun 2010 to Sep 2011 First Document
Jun 2009 to Aug 2011 Second Document
Nov 2011 – Sep 2012 Third Document
Nov 2012- Sep 2013 Fourth Document
You can use a regular expression to find the next number starting at the position of your last found month:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(parsedContent);
if (m.find(index)) {
String year = m.group();
}
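As an alternative, the month and the year can be matched in a single pass. This is only a minimal sketch, assuming the year always follows the month name after whitespace; the class `MonthYearFinder` and the method `findMonthYears` are hypothetical names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthYearFinder {
    // Month abbreviation, optional rest of the name, whitespace,
    // then a 4-digit or 2-digit year (assumption: no other layout occurs).
    private static final Pattern MONTH_YEAR = Pattern.compile(
            "\\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\\s+(\\d{4}|\\d{2})\\b",
            Pattern.CASE_INSENSITIVE);

    // Returns entries such as "Jun 2010" in the order they appear in the text.
    static List<String> findMonthYears(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = MONTH_YEAR.matcher(text);
        while (m.find()) {
            result.add(m.group(1) + " " + m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findMonthYears("Jun 2010 to Sep 2011"));
        System.out.println(findMonthYears("Nov 2012- Sep 2013"));
    }
}
```

Note that a word which merely starts with a month abbreviation and is followed by a number would also match, so the pattern may need tightening for real documents.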
I have a string :
String str = "sces123 4096 May 27 16:22 sces123 abc";
I want to get sces123 abc from the string. My code is :
String[] line = str.split("\\s+");
String name = str.substring(str.indexOf(line[5]));
It returns the whole string.
I don't know how to fix this.
Any help is appreciated!
Your code should be
String[] line = str.split("\\s+");
String name = str.substring(str.lastIndexOf(line[5]));
because str.indexOf(line[5]) returns 0, so substring returns the whole string.
In your case you just need to change str.indexOf to str.lastIndexOf.
This is one easy solution:
String str = "sces123 4096 May 27 16:22 sces123 abc";
//split spaces
String[] line = str.split(" ");
//get 2 last columns
String name = (line[5] + " " + line[6]);
System.out.println(name);
As Glorfindel said in the comments, sces123, which is the content of line[5], is also the first substring of the main string str. That is why you are getting the full string.
What is really happening here is:
indexOf( line[ 5 ]) --> returns 0
str.substring(0) --> returns the substring from index 0 to the end, which is the whole string
If you are only hard-coding the indices, I don't see the point of this approach.
But you can get what you want this way (if it serves your purpose):
String name = str.substring( str.indexOf( line[ 5 ]+" "+line[6] ) );
Try this:
String str = "sces123 4096 May 27 16:22 sces123 abc";
String[] line = str.split("\\s+");
System.out.println(str.substring(str.lastIndexOf(line[5])));
You could use a Matcher to find the end of the 5th match:
String str = "sces123 4096 May 27 16:22 sces123 abc";
Pattern p = Pattern.compile("\\s+");
Matcher m = p.matcher(str);
for (int i = 0; i < 5; i++) {
m.find();
}
String name = str.substring(m.end());
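The same loop can be wrapped in a small helper and tried on a few variations of the input. This is a sketch only: the class `TailAfterFifthField` and the method `tailAfter` are hypothetical names, and the double space in the second input is an assumed example of a longer separator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TailAfterFifthField {
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns everything after the first `skip` whitespace separators.
    static String tailAfter(String str, int skip) {
        Matcher m = WHITESPACE.matcher(str);
        for (int i = 0; i < skip; i++) {
            if (!m.find()) {
                return ""; // fewer separators than requested
            }
        }
        return str.substring(m.end());
    }

    public static void main(String[] args) {
        // Repeated token at the end
        System.out.println(tailAfter("sces123 4096 May 27 16:22 sces123 sces123", 5));
        // Separator longer than one character (assumed input)
        System.out.println(tailAfter("sces123 4096 May 27 16:22 sces123  abc", 5));
        // More than seven fields
        System.out.println(tailAfter("sces123 4096 May 27 16:22 sces123 abc def", 5));
    }
}
```

One caveat of this sketch: leading whitespace in the input counts as a separator, so such input would need a trim() first.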
In my opinion this is better than using lastIndexOf or concatenating the elements at indices 5 and 6, for the following reasons:
It does not require line[5] to be the last occurrence of that string.
Using lastIndexOf doesn't work for input
"sces123 4096 May 27 16:22 sces123 sces123"
It also works for separator strings of arbitrary length.
Using line[ 5 ]+" "+line[6] doesn't work for input
"sces123 4096 May 27 16:22 sces123  abc"
It does not require the number of elements after the split to be 7.
Using line[ 5 ]+" "+line[6] doesn't work for input
"sces123 4096 May 27 16:22 sces123 abc def"
I am trying to extract movie names from a list that looks like this:
The Maze Runner 2014 DVDRip XviD MP3-RARBG
Fury 2014 DVDSCR x264 AC3-Blackjesus
Dracula's Untold Story (WebRip / 2014)
I need to extract the words up to the year or a special character like ( or [ but not '
The Maze Runner 2014 DVDRip XviD MP3-RARBG ==> The Maze Runner
Fury 2014 DVDSCR x264 AC3-Blackjesus ==> Fury
Dracula's Untold Story (WebRip / 2014) ==> Dracula's Untold Story
Dracula's Untold Story [WebRip / 2014] ==> Dracula's Untold Story
I have no idea how to go about writing a complex regex like this. Any ideas?
The code snippet below may help meet your requirements:
public static String extractMovieName(String movieNameString){
Pattern pattern = Pattern.compile("([\\w' ]+)([\\[]|[\\(]|[\\d]{4})");
Matcher matcher = pattern.matcher(movieNameString);
String extractedName = "";
if(matcher.find()){
// group(1) still ends with the space that precedes the year or bracket, so trim it
extractedName = matcher.group(1).trim();
}
return extractedName;
}
^[a-zA-Z0-9\ '-]+(?=\b\d{4}\b|\()
Try this. See the demo:
http://regex101.com/r/yR3mM3/4
Try the code below.
Example:
System.out.println("Fury 2014 DVDSCR x264 AC3-Blackjesus".replaceAll("\\s(\\d|\\(|\\[).*", ""));
I have a String as "AASS MON 01 2013 365.00 HJJ Call"
I need to remove the string HJJ from the above string and get this output:
AASS MON 01 2013 365.00 Call
I tried the following:
if(symbol.contains("HJJ"))
{
symbol = symbol.replace("HJJ","");
}
But with this I am getting the output
AASS MON 01 2013 365.00  Call (one extra space before Call)
whereas I want it to be
AASS MON 01 2013 365.00 Call
Here is what I usually use:
public static String removeExtraSpaces(String input) {
return input.trim().replaceAll(" +", " ");
}
trim removes leading and trailing spaces, while replaceAll replaces any run of spaces with a single space.
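Applied to the string from the question, this could look roughly like the sketch below. The class name `RemoveTokenDemo` is hypothetical, and the helper is repeated only so that the example is self-contained.

```java
class RemoveTokenDemo {
    // Same helper as above: trim the ends, then collapse runs of spaces.
    static String removeExtraSpaces(String input) {
        return input.trim().replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        String symbol = "AASS MON 01 2013 365.00 HJJ Call";
        // Removing "HJJ" leaves two adjacent spaces, which are then collapsed into one.
        String cleaned = removeExtraSpaces(symbol.replace("HJJ", ""));
        System.out.println(cleaned); // AASS MON 01 2013 365.00 Call
    }
}
```

This keeps working when the removed token sits at the start or end of the string, because trim handles the leftover space there.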
public class Trimimg
{
public static void main(String[]args)
{
String str = "AASS MON 01 2013 365.00 HJJ Call";
String newStr = str.replace(" HJJ", "");
System.out.println(newStr);
}
}
I have a log with a pattern. The problem is that the last field is a little bit different from the regular ones.
a> nc,71802265,0,"Tuesday, June 26, 2012 09:06:49 UTC",38.8335,-122.8072,1.6,0.00,21,"Northern California"
b> ci,11127314,0,"Tuesday, June 26, 2012 08:37:52 UTC",34.2870,-118.3360,2.2,10.20,100,"Greater Los Angeles area, California"
c> us,b000aqpn,6,"Tuesday, June 26, 2012 08:29:55 UTC",53.4819,-165.2794,4.4,25.60,96,"Fox Islands, Aleutian Islands, Alaska"
String regex = "^\\"[a-z,A-Z]\\s*\\(,)*[a-z,A-Z]\\"";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
From a I need --- "Northern California"
From b I need --- "Greater Los Angeles area, California", and so on.
Thanks
You could use String#lastIndexOf, searching backwards from the penultimate character, to find the opening quote of the last field:
String s = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"";
int start = s.lastIndexOf("\"", s.length() - 2) + 1;
String location = s.substring(start, s.length() - 1);
Why not use String.split(regex, limit) and specify the number of commas you need to split on?
That way the last field stays intact, commas included, and you can then simply strip the double quotes.
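A minimal sketch of that idea, assuming every line has exactly 11 commas before the last field (the quoted date contributes two of them), so a limit of 12 is used. The class `LastFieldExtractor` and the method `lastField` are hypothetical names.

```java
class LastFieldExtractor {
    // Assumption: 11 commas precede the last field on every line,
    // so limit = 12 leaves the final field unsplit even if it contains commas.
    static String lastField(String line) {
        String[] parts = line.split(",", 12);
        String last = parts[parts.length - 1];
        // Strip the surrounding double quotes.
        return last.replace("\"", "");
    }

    public static void main(String[] args) {
        String b = "ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\",34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"";
        System.out.println(lastField(b)); // Greater Los Angeles area, California
    }
}
```

If the number of commas before the last field can vary, this approach breaks, and anchoring on the end of the line is the safer choice.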
Use the $ anchor to tell that your match should be at the end of the line:
String lines = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"\nb> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\",34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"\nc> us,b000aqpn,6,\"Tuesday, June 26, 2012 08:29:55 UTC\",53.4819,-165.2794,4.4,25.60,96,\"Fox Islands, Aleutian Islands, Alaska\"";
String regex = "\"[^\"]*\"$";
Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(lines);
while (m.find()) {
System.out.println(m.group());
}
outputs:
"Northern California"
"Greater Los Angeles area, California"
"Fox Islands, Aleutian Islands, Alaska"
for(String s: log.split("\n")){
System.out.println(s.replaceAll(".+(\".+\")$","$1"));
}