Extract a sequence from a string in java using Regex

I have a log that follows a fixed pattern, except that the last field is a little different from the others: it is quoted and may itself contain commas.
a> nc,71802265,0,"Tuesday, June 26, 2012 09:06:49 UTC",38.8335,-122.8072,1.6,0.00,21,"Northern California"
b> ci,11127314,0,"Tuesday, June 26, 2012 08:37:52 UTC",34.2870,-118.3360,2.2,10.20,100,"Greater Los Angeles area, California"
c> us,b000aqpn,6,"Tuesday, June 26, 2012 08:29:55 UTC",53.4819,-165.2794,4.4,25.60,96,"Fox Islands, Aleutian Islands, Alaska"
This is what I tried:
String regex = "^\\"[a-z,A-Z]\\s*\\(,)*[a-z,A-Z]\\"";
Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
From line a I need "Northern California", from line b "Greater Los Angeles area, California", and so on.
Thanks

You could use String#lastIndexOf, searching backward from the penultimate character, to find the opening quote of the last field:
String s = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"";
int start = s.lastIndexOf("\"", s.length() - 2) + 1;
String location = s.substring(start, s.length() - 1);

Why not use String.split(regex, limit) with a limit equal to the number of comma-separated pieces you need? That way the last field stays intact, commas and all, and you only need to strip the double quotes.
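A minimal sketch of the split-with-limit idea, assuming every line has exactly the layout shown in the question: nine comma-separated fields before the quoted location, where the quoted date contributes two extra commas of its own. Under that assumption there are exactly 11 commas before the last field, so split(",", 12) leaves it whole. LastField and lastField are illustrative names, not from the original answer.

```java
public class LastField {

    // Assumes the layout from the question: 11 commas precede the quoted location
    // (9 field separators plus the 2 commas inside the quoted date), so a limit of 12
    // makes the final array element the complete last field, inner commas included.
    static String lastField(String line) {
        String[] parts = line.split(",", 12);
        String last = parts[parts.length - 1];
        // Strip the surrounding double quotes
        return last.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String line = "b> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\","
                + "34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"";
        System.out.println(lastField(line)); // Greater Los Angeles area, California
    }
}
```

If the date format can change its number of commas, the fixed limit breaks; the regex answers below do not depend on that count.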

Use the $ anchor to require that the match sit at the end of the line (with Pattern.MULTILINE, $ matches at the end of every line, not only at the end of the input):
String lines = "a> nc,71802265,0,\"Tuesday, June 26, 2012 09:06:49 UTC\",38.8335,-122.8072,1.6,0.00,21,\"Northern California\"\nb> ci,11127314,0,\"Tuesday, June 26, 2012 08:37:52 UTC\",34.2870,-118.3360,2.2,10.20,100,\"Greater Los Angeles area, California\"\nc> us,b000aqpn,6,\"Tuesday, June 26, 2012 08:29:55 UTC\",53.4819,-165.2794,4.4,25.60,96,\"Fox Islands, Aleutian Islands, Alaska\"";
String regex = "\"[^\"]*\"$";
Matcher m = Pattern.compile(regex, Pattern.MULTILINE).matcher(lines);
while (m.find()) {
System.out.println(m.group());
}
outputs:
"Northern California"
"Greater Los Angeles area, California"
"Fox Islands, Aleutian Islands, Alaska"

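Alternatively, assuming log holds the whole log as a single string, split it into lines and keep only the trailing quoted field:
for (String s : log.split("\n")) {
    // Greedy .+ pushes the capture as far right as possible, leaving just the last quoted field
    System.out.println(s.replaceAll(".+(\".+\")$", "$1"));
}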

Related

Parsing flat file with repeating section using regex

I have a flat file with data in the following format:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
Each section starts with a time, followed by a header line and then values. I am trying to capture each of these sections with a regex, so that I get:
section 1:
1:00 PM
Name UniqueID
ABX 298819 12 519440AD3
section 2:
12:00 AM
Name UniqueID
AX1 239949 01 119440AD3
I then want to parse each section into Java objects of the following classes:
public class Section {
String timestamp;
List<Row> rows;
}
public class Row {
String name;
String uniqueId;
}
However, I am not able to extract the text between two consecutive regex matches. This is the regular expression I tried:
((1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm))(?=.*)
But it returns only the time values:
10:30 AM
1:00 PM
1:30 PM
10:30 AM
1:00 PM
1:30 PM
I also tried passing Pattern.MULTILINE to Pattern.compile, but that didn't work either.

Assuming the structure you showed us repeats throughout the file, there are four types of lines, in this order: timestamp, header, data, empty line.
For example, if you want to separate the unique ID from the name, you could try:
String third = "ABX 298819 12 519440AD3";
String uniqueId = third.replaceAll(".*\\s+(\\w+)", "$1");
String name = third.replaceAll("(.*)\\s+\\w+", "$1");
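To get the text between two timestamp matches, which is what the question asks for, one option is to split on a zero-width lookahead so that each piece keeps its own timestamp line. This is a sketch under stated assumptions, not a drop-in solution: it assumes each section is a timestamp line, one header line and then data rows, and it treats the last token of a data row as the unique ID, as the replaceAll lines above do. SectionParser, parse and the simplified Section/Row classes are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionParser {

    // Simplified copies of the question's classes
    static class Row {
        String name;
        String uniqueId;
    }

    static class Section {
        String timestamp;
        List<Row> rows = new ArrayList<>();
    }

    // A timestamp line such as "1:00 PM" or "12:00 AM" (the question's regex, tidied up)
    static final String TIME = "(?:1[012]|[1-9]):[0-5][0-9]\\s?(?i:am|pm)";

    // Zero-width lookahead: split *before* each timestamp line, so the text between
    // two timestamps stays together and the timestamp itself is not consumed.
    static final String SECTION_START = "(?m)(?=^" + TIME + "\\s*$)";

    static List<Section> parse(String text) {
        List<Section> sections = new ArrayList<>();
        for (String block : text.split(SECTION_START)) {
            String[] lines = block.trim().split("\\R");
            // Skip anything that does not start with a timestamp (e.g. text before the first section)
            if (!lines[0].trim().matches(TIME)) {
                continue;
            }
            Section section = new Section();
            section.timestamp = lines[0].trim();
            // lines[1] is the "Name UniqueID" header; every non-empty line after it is a data row
            for (int i = 2; i < lines.length; i++) {
                String line = lines[i].trim();
                if (line.isEmpty()) {
                    continue;
                }
                Row row = new Row();
                // As in the answer: the last token is the unique ID, everything before it is the name
                row.uniqueId = line.replaceAll(".*\\s+(\\w+)", "$1");
                row.name = line.replaceAll("(.*)\\s+\\w+", "$1");
                section.rows.add(row);
            }
            sections.add(section);
        }
        return sections;
    }

    public static void main(String[] args) {
        String text = "1:00 PM\nName UniqueID\nABX 298819 12 519440AD3\n\n"
                + "12:00 AM\nName UniqueID\nAX1 239949 01 119440AD3\n";
        for (Section s : parse(text)) {
            for (Row r : s.rows) {
                System.out.println(s.timestamp + " | " + r.name + " | " + r.uniqueId);
            }
        }
    }
}
```

Since Java 8, a zero-width match at the very start of the input does not produce an empty leading element in String.split, so the first section is not lost.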

Next digit after a word using index position in java

I am trying to solve this question:
Get document on some condition in elastic search java API
My approach: first find the positions of all month names in the string, then extract the next word, which is a 4-digit or 2-digit year, and finally calculate the difference between the years.
To get the month positions I am using this code:
String[] threeMonthArray = {" Jan ", " Feb ", " Mar ", " Apr ", " May ", " June ", " July ", " Aug ", " Sep ", " Oct ", " Nov ", " Dec "};
String[] completeMonthArray = {"January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"};
// Lower-case the content once instead of on every indexOf call
String lowerContent = parsedContent.toLowerCase();
List<Integer> indexArray = new ArrayList<>();
for (String month : threeMonthArray) {
    String needle = month.toLowerCase();
    int index = lowerContent.indexOf(needle);
    // Collect every occurrence of this month
    while (index >= 0) {
        System.out.println(month + " : " + index + "------");
        indexArray.add(index);
        index = lowerContent.indexOf(needle, index + 1);
    }
}
Collections.sort(indexArray);
System.out.println(indexArray);
It prints this output:
[2873, 2884, 3086, 3098, 4303, 4315, 6251, 6262, 8130, 8142, 15700, 15711]
The positions are correct. My problem is how to get the next word, which must be a number. The text contains entries like these:
Jun 2010 to Sep 2011 First Document
Jun 2009 to Aug 2011 Second Document
Nov 2011 – Sep 2012 Third Document
Nov 2012- Sep 2013 Forth Document

You can use a regular expression to find the next number starting at the position of your last found month:
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher(parsedContent);
if (m.find(index)) {
String year = m.group();
}
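Combined with the month positions from the question, the find(int) approach might look like this sketch. YearAfterMonth and yearsAfter are hypothetical names, and the positions here come from plain indexOf calls rather than the question's full loop. Note that \d+ returns whatever number comes first after the position, so in text such as "June 26, 2012" the day number 26 would be picked up before the year.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearAfterMonth {

    // For each month position, return the first run of digits found at or after it.
    static List<String> yearsAfter(String text, List<Integer> monthPositions) {
        Matcher m = Pattern.compile("\\d+").matcher(text);
        List<String> years = new ArrayList<>();
        for (int pos : monthPositions) {
            // find(int) resets the matcher, then searches starting at index pos
            if (m.find(pos)) {
                years.add(m.group());
            }
        }
        return years;
    }

    public static void main(String[] args) {
        String text = "Jun 2010 to Sep 2011 First Document";
        // Stand-ins for the positions collected by the indexOf loop in the question
        List<Integer> positions = List.of(text.indexOf("Jun"), text.indexOf("Sep"));
        System.out.println(yearsAfter(text, positions)); // prints [2010, 2011]
    }
}
```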

Regex: Get all words until a number or a special character is found

I am trying to extract movie names from a list that looks like this:
The Maze Runner 2014 DVDRip XviD MP3-RARBG
Fury 2014 DVDSCR x264 AC3-Blackjesus
Dracula's Untold Story (WebRip / 2014)
I need to extract the words up to the year or a special character such as ( or [, but not an apostrophe ('):
The Maze Runner 2014 DVDRip XviD MP3-RARBG ==> The Maze Runner
Fury 2014 DVDSCR x264 AC3-Blackjesus ==> Fury
Dracula's Untold Story (WebRip / 2014) ==> Dracula's Untold Story
Dracula's Untold Story [WebRip / 2014] ==> Dracula's Untold Story
I have no idea how to go about writing a complex regex like this. Any ideas?

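The code snippet below should meet your requirements:
// Requires java.util.regex.Matcher and java.util.regex.Pattern
public static String extractMovieName(String movieNameString) {
    // Group 1: words (letters, digits, underscores, apostrophes, spaces)
    // Group 2: the first "[", "(" or 4-digit run that follows them
    Pattern pattern = Pattern.compile("([\\w' ]+)([\\[(]|\\d{4})");
    Matcher matcher = pattern.matcher(movieNameString);
    String extractedName = "";
    if (matcher.find()) {
        // trim() removes the space left before the year or bracket
        extractedName = matcher.group(1).trim();
    }
    return extractedName;
}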

^[a-zA-Z0-9\ '-]+(?=\b\d{4}\b|\()
Try this regex (see the demo):
http://regex101.com/r/yR3mM3/4
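A Java version of the regex above might look like the following sketch. Two changes are assumptions not in the original answer: "[" is added to the lookahead because the question also lists a "[WebRip / 2014]" variant, and the result is trimmed because the match includes the space before the year or bracket. MovieTitle and extract are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieTitle {

    // ^[a-zA-Z0-9 '-]+    : title characters from the start of the line (greedy)
    // (?=\b\d{4}\b|[(\[]) : must be followed by a standalone 4-digit year, "(" or "["
    private static final Pattern TITLE =
            Pattern.compile("^[a-zA-Z0-9 '-]+(?=\\b\\d{4}\\b|[(\\[])");

    // Returns the title, or an empty string if the line does not match
    static String extract(String line) {
        Matcher m = TITLE.matcher(line);
        return m.find() ? m.group().trim() : "";
    }

    public static void main(String[] args) {
        String[] lines = {
            "The Maze Runner 2014 DVDRip XviD MP3-RARBG",
            "Fury 2014 DVDSCR x264 AC3-Blackjesus",
            "Dracula's Untold Story (WebRip / 2014)",
            "Dracula's Untold Story [WebRip / 2014]"
        };
        for (String line : lines) {
            System.out.println(line + " ==> " + extract(line));
        }
    }
}
```

Because the character class is greedy and includes digits, a title that itself contains a standalone 4-digit number would keep everything up to the last such number; that trade-off comes from the original pattern.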

Try the code below, which deletes everything from the first whitespace that is followed by a digit, ( or [:
System.out.println("Fury 2014 DVDSCR x264 AC3-Blackjesus".replaceAll("\\s(\\d|\\(|\\[).*", ""));

Splitting or tokenizing on a space and on a comma followed by a space [JAVA]

January 22, 2014
I want to split this string into three parts, but the second separator is a comma followed by a space rather than a single space.

If I understand you correctly, you could use a single regular expression with an optional comma (,? matches zero or one comma) followed by a space, something like:
String in = "January 22, 2014";
String[] arr = in.split(",?\\ ");
for (String str : arr) {
System.out.println(str);
}
Output:
January
22
2014

replace words using regex [duplicate]

Possible Duplicate:
regex replace all ignore case
I need to replace every occurrence of "Sony Ericsson" so that there is a tilde (~) between the two words. This is what I have tried:
String outText="";
String inText="Sony Ericsson is a leading company in mobile. The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll(word, word.replaceAll(" ", "~"));
System.out.println(outText);
The output of this is
Sony Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
But what I want is
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
The match should ignore case and produce the desired output.

Change it to
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
to make the search and replace case-insensitive. Full example:
String outText="";
String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
outText = inText.replaceAll("(?i)" + word, word.replaceAll(" ", "~"));
System.out.println(outText);
Output:
sony~ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
Avoid ruining the original capitalization:
The approach above, however, loses the original capitalization of the matched words. Here is a better suggestion:
String inText="Sony Ericsson is a leading company in mobile. " +
"The company sony ericsson was found in oct 2001";
String word = "sony ericsson";
Pattern p = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(inText);
StringBuffer sb = new StringBuffer();
while (m.find()) {
String replacement = m.group().replace(' ', '~');
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
String outText = sb.toString();
System.out.println(outText);
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
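On Java 9 or later, the appendReplacement loop above can be shortened with Matcher.replaceAll(Function<MatchResult, String>), which passes each match to a lambda. This sketch also wraps the search word in Pattern.quote, an addition not in the original answer, so that any regex metacharacters in it are matched literally. TildeReplace and joinWithTilde are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TildeReplace {

    // Java 9+: Matcher.replaceAll(Function<MatchResult, String>) calls the lambda for
    // each match, so the matched text keeps its original capitalization.
    static String joinWithTilde(String text, String word) {
        // Pattern.quote treats "word" literally; CASE_INSENSITIVE still applies to it
        Pattern p = Pattern.compile(Pattern.quote(word), Pattern.CASE_INSENSITIVE);
        return p.matcher(text)
                // quoteReplacement stops '$' or '\' in the matched text from being interpreted
                .replaceAll(mr -> Matcher.quoteReplacement(mr.group().replace(' ', '~')));
    }

    public static void main(String[] args) {
        String inText = "Sony Ericsson is a leading company in mobile. "
                + "The company sony ericsson was found in oct 2001";
        System.out.println(joinWithTilde(inText, "sony ericsson"));
    }
}
```

It prints the same line as the loop version: Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001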

str.replaceAll(regex, repl) is equivalent to Pattern.compile(regex).matcher(str).replaceAll(repl), so you can make the matcher case-insensitive with a flag:
Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(str).replaceAll(repl)
Using backreferences to preserve case:
Pattern.compile("(sony) (ericsson)", Pattern.CASE_INSENSITIVE)
.matcher(str)
.replaceAll("$1~$2")
Gives:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001

String outText = inText.replaceAll("(?i)(Sony) (Ericsson)", "$1~$2");
Output:
Sony~Ericsson is a leading company in mobile. The company sony~ericsson was found in oct 2001
