I have a log whose lines follow a pattern. The catch is that the last field is a little different from the regular ones.
a> nc,71802265,0,"Tuesday, June 26, 2012 09:06:49 UTC",38.8335,-122.8072,1.6,0.00,21,"Northern California"
b> ci,11127314,0,"Tuesday, June 26, 2012 08:37:52 UTC",34.2870,-118.3360,2.2,10.20,100,"Greater Los Angeles area, California"
c> us,b000aqpn,6,"Tuesday, June 26, 2012 08:29:55 UTC",53.4819,-165.2794,4.4,25.60,96,"Fox Islands, Aleutian Islands, Alaska"
String regex = "^\\"[a-z,A-Z]\\s*\\(,)*[a-z,A-Z]\\"";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
From line a I need "Northern California".
From line b I need "Greater Los Angeles area, California", and so on.
Thanks
You could use String#lastIndexOf, searching backwards from the penultimate character to find the opening " of the last field:
String s = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"";
int start = s.lastIndexOf("\"", s.length() - 2) + 1;
String location = s.substring(start, s.length() - 1);
Why not use String.split(regex, limit) and set the limit according to the number of commas you need to split on?
That way the last field stays intact, commas included, and you only need to strip the double quotes.
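A minimal sketch of the split-with-limit idea. It assumes every line has exactly 11 commas before the last field (the quoted date contributes two of them), as in the sample lines; the class name LastFieldSplit and the constant FIELD_LIMIT are made up for illustration.

```java
class LastFieldSplit {
    // Assumption: 11 commas precede the final quoted field in every line,
    // so a limit of 12 leaves that field (and its commas) in one piece.
    static final int FIELD_LIMIT = 12;

    static String lastField(String line) {
        String[] parts = line.split(",", FIELD_LIMIT);
        String last = parts[parts.length - 1];
        // Strip the leading and trailing double quote.
        return last.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String b = "b> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\","
                 + "34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"";
        System.out.println(lastField(b)); // prints: Greater Los Angeles area, California
    }
}
```

If the number of commas in the earlier fields can vary, the fixed limit no longer holds and one of the quote-based approaches is safer.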
Use the $ anchor to tell that your match should be at the end of the line:
String lines = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"\nb> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\",34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"\nc> us,b000aqpn,6,\"Tuesday, June 26, 2012 08:29:55 UTC\",53.4819,-165.2794,4.4,25.60,96,\"Fox Islands, Aleutian Islands, Alaska\"";
String regex = "\"[^\"]*\"$";
Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(lines);
while (m.find()) {
System.out.println(m.group());
}
outputs:
"Northern California"
"Greater Los Angeles area, California"
"Fox Islands, Aleutian Islands, Alaska"
for(String s: log.split("\n")){
System.out.println(s.replaceAll(".+(\".+\")$","$1"));
}
Related
I have a flat file with data in following format:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Where each section starts with a time, followed by headers and then values. I am trying to capture each of these sections through regex, so I can get:
section 1:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
section 2:
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
And later parse each of these sections in to java class object, which is given below:
public class Section {
String timestamp;
List<Row> rows;
}
public class Row {
String name;
String uniqueId;
}
But I am not able to extract the text between two regex matches. Below is the regular expression I tried:
((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)
But it returns only the time values:
10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM
I even tried adding Pattern.MULTILINE to Pattern but it didn't work either.
Assuming the structure you showed us repeats throughout the file, then there are four types of lines in sequence: timestamp, header, data, empty line.
For example, if you want to separate the unique ID from the name, you could try:
String third = "ABX 298819 12 519440AD3";
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
String name = third.replaceAll("(.*)\\s+\\w+", "$1");
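The line-by-line idea can be sketched as follows. This is only an illustration under some assumptions: the header line is literally "Name UniqueID", the unique ID is the last whitespace-separated token of a data line, and the wrapper class SectionParser is a made-up name. Section and Row mirror the classes from the question.

```java
import java.util.ArrayList;
import java.util.List;

class SectionParser {
    static class Row {
        String name;
        String uniqueId;
    }

    static class Section {
        String timestamp;
        List<Row> rows = new ArrayList<>();
    }

    // Timestamp line such as "1:00 PM" or "12:00 AM".
    private static final String TIME = "(?i)(1[0-2]|[1-9]):[0-5][0-9]\\s?(am|pm)";

    static List<Section> parse(String text) {
        List<Section> sections = new ArrayList<>();
        Section current = null;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // empty separator line
            }
            if (line.matches(TIME)) {
                current = new Section(); // a timestamp opens a new section
                current.timestamp = line;
                sections.add(current);
            } else if (current == null || line.matches("Name\\s+UniqueID")) {
                continue; // header line, or stray data before the first timestamp
            } else {
                Row row = new Row();
                row.uniqueId = line.replaceAll(".*\\s+(\\w+)", "$1");
                row.name = line.replaceAll("(.*)\\s+\\w+", "$1");
                current.rows.add(row);
            }
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "1:00 PM\nName UniqueID\nABX 298819 12 519440AD3\n\n"
                    + "12:00 AM\nName UniqueID\nAX1 239949 01 119440AD3";
        for (Section s : parse(text)) {
            System.out.println(s.timestamp + " -> " + s.rows.get(0).uniqueId);
        }
    }
}
```

Reading the file one line at a time avoids having to capture "everything between two timestamps" with a single regex.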
I am trying to solve this question:
Get document on some condition in elastic search java API
My approach is to first find the positions of all month names in the string, then extract the next word, which should be a four-digit or two-digit year, and finally calculate the difference from those values.
To get the month positions I am using this piece of code:
String[] threeMonthArray=new String[]{" Jan "," Feb "," Mar "," Apr "," May "," June "," July "," Aug "," Sep "," Oct "," Nov "," Dec "};
String[] completeMonthArray=new String[]{"January","Feburary","March","April","May","June","July","Augest","September","October","November","December"};
List indexArray=new ArrayList();
for(int i=0;i<threeMonthArray.length;i++){
int index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase());
while (index >= 0) {
System.out.println(threeMonthArray[i]+" : "+index+"------");
indexArray.add(index);
index = parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase(), index + 1);
}
// System.out.println(threeMonthArray[i]+" : "+parsedContent.toLowerCase().indexOf(threeMonthArray[i].toLowerCase())+"------");
}
Collections.sort(indexArray);
System.out.println( indexArray);
And it shows this output:
[2873, 2884, 3086, 3098, 4303, 4315, 6251, 6262, 8130, 8142, 15700, 15711]
The positions are correct. My problem is how to get the next word, which must be a number.
Jun 2010 to Sep 2011 First Document
Jun 2009 to Aug 2011 Second Document
Nov 2011 – Sep 2012 Third Document
Nov 2012- Sep 2013 Forth Document
You can use a regular expression to find the next number starting at the position of your last found month:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(parsedContent);
if (m.find(index)) {
String year = m.group();
}
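A self-contained sketch of the same idea, using one line of the sample data. The class and method names (YearAfterMonth, yearAfter) are made up for illustration. Note that Matcher.find(int) resets the matcher and starts searching at the given index.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearAfterMonth {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits at or after monthIndex, or null if there is none.
    static String yearAfter(String text, int monthIndex) {
        Matcher m = DIGITS.matcher(text);
        return m.find(monthIndex) ? m.group() : null;
    }

    public static void main(String[] args) {
        String content = "Jun 2010 to Sep 2011 First Document";
        int sep = content.toLowerCase().indexOf("sep");
        System.out.println(yearAfter(content, sep)); // prints: 2011
    }
}
```

This returns the next number anywhere after the month, so if a month name is not directly followed by a year it will pick up a later number instead. A stricter pattern such as \d{4}|\d{2} plus a distance check may be needed for real documents.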
I am trying to extract movie names from a list that looks like this:
The Maze Runner 2014 DVDRip XviD MP3-RARBG
Fury 2014 DVDSCR x264 AC3-Blackjesus
Dracula's Untold Story (WebRip / 2014)
I need to extract the words up to the year or a special character like ( or [ but not '
The Maze Runner 2014 DVDRip XviD MP3-RARBG ==> The Maze Runner
Fury 2014 DVDSCR x264 AC3-Blackjesus ==> Fury
Dracula's Untold Story (WebRip / 2014) ==> Dracula's Untold Story
Dracula's Untold Story [WebRip / 2014] ==> Dracula's Untold Story
I have no idea how to go about writing a complex regex like this. Any ideas?
The code snippet below may help meet your requirements:
public static String extractMovieName(String movieNameString){
Pattern pattern = Pattern.compile("([\\w' ]+)([\\[]|[\\(]|[\\d]{4})");
Matcher matcher = pattern.matcher(movieNameString);
String extractedName = "";
if(matcher.find()){
extractedName = matcher.group(1).trim(); // group 1 ends with the space before the year or bracket
}
return extractedName;
}
^[a-zA-Z0-9\ '-]+(?=\b\d{4}\b|\()
Try this. See demo.
http://regex101.com/r/yR3mM3/4
Try the code below. Example:
System.out.println("Fury 2014 DVDSCR x264 AC3-Blackjesus".replaceAll("\\s(\\d|\\(|\\[).*", ""));
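For comparison, here is a small sketch that runs one pattern over all four sample titles. It assumes the title ends at the first standalone four-digit number, "(" or "[", so it would cut a title that itself contains a four-digit number. The class name MovieTitle is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MovieTitle {
    // Lazily capture everything up to a standalone 4-digit number, "(" or "[".
    private static final Pattern TITLE =
            Pattern.compile("^(.+?)\\s*(?:\\b\\d{4}\\b|[(\\[])");

    static String extract(String raw) {
        Matcher m = TITLE.matcher(raw);
        // Fall back to the trimmed input when no year or bracket is found.
        return m.find() ? m.group(1) : raw.trim();
    }

    public static void main(String[] args) {
        String[] samples = {
            "The Maze Runner 2014 DVDRip XviD MP3-RARBG",
            "Fury 2014 DVDSCR x264 AC3-Blackjesus",
            "Dracula's Untold Story (WebRip / 2014)",
            "Dracula's Untold Story [WebRip / 2014]"
        };
        for (String s : samples) {
            System.out.println(extract(s));
        }
    }
}
```

Because the capture is lazy and the optional whitespace is matched outside the group, the extracted name has no trailing space.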
January 22, 2014
I want to split the string into three parts, but the second part is followed by a comma and a space.
If I understand you correctly, you could split on a single regular expression that matches an optional comma (zero or one occurrence) followed by a space, something like:
String in = "January 22, 2014";
String[] arr = in.split(",?\\ ");
for (String str : arr) {
System.out.println(str);
}
Output is
January
22
2014
I need to replace every occurrence of Sony Ericsson with the same words joined by a tilde. This is what I have tried:
String outText="";
String inText="Sony Ericsson is a leading company in mobile. The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll(word, word.replaceAll(" ", "~"));
System.out.println(outText);
The output of this is
Sony Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
But what I want is
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
It should ignore case and give the desired output.
Change it to
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
to make the search / replace case insensitive.
String outText="";
String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
System.out.println(outText);
Output:
sony~ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
Avoid ruining the original capitalization:
In the above approach however, you're ruining the capitalization of the replaced word. Here is a better suggestion:
String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
Pattern p = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(inText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String replacement = m.group().replace(' ', '~');
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
String outText = sb.toString();
System.out.println(outText);
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
str.replaceAll(regex, repl) is equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl). Thus, you can make your matcher case-insensitive with a flag:
Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(str).replaceAll(repl)
Using backreferences to preserve case:
Pattern.compile("(sony) (ericsson)", Pattern.CASE_INSENSITIVE)
.matcher(str)
.replaceAll("$1~$2")
Gives:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
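A runnable sketch of the equivalence described above: the inline (?i) flag and Pattern.CASE_INSENSITIVE should produce the same result, and the backreferences keep whatever capitalization was matched. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class CaseInsensitiveReplace {
    // Case-insensitivity via the inline (?i) flag in the regex itself.
    static String viaInlineFlag(String text) {
        return text.replaceAll("(?i)(sony) (ericsson)", "$1~$2");
    }

    // The same replacement with the flag passed to Pattern.compile.
    static String viaPatternFlag(String text) {
        return Pattern.compile("(sony) (ericsson)", Pattern.CASE_INSENSITIVE)
                      .matcher(text)
                      .replaceAll("$1~$2");
    }

    public static void main(String[] args) {
        String inText = "Sony Ericsson is a leading company in mobile. "
                      + "The company sony ericsson was found in oct 2001";
        System.out.println(viaInlineFlag(inText));
        System.out.println(viaPatternFlag(inText));
    }
}
```

If the phrase comes from user input rather than a literal, wrapping each word in Pattern.quote guards against regex metacharacters.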
String outText = inText.replaceAll("(?i)(Sony) (Ericsson)", "$1~$2");
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001