extracting a substring to use in simple date format - java

I am reading log files and need to extract the date and time from certain lines so that I can parse them with SimpleDateFormat and work out the average time between two actions. A line that I would need the date from looks like this:
(INFO ) [07 Feb 2013 08:04:39,161] -- ua, navigation, fault
I can't figure out whether I should split the line twice or use the substring function. Also, I don't think I need to include the last number (the 161) when parsing with SimpleDateFormat.

I would suggest using a regex to extract the required data from the log files.

Consider using regex capture groups to extract the String you want. You can use
Pattern p = Pattern.compile("your regex pattern with () around the bit you want to extract");
Matcher m = p.matcher(theLine);
if (m.find()) {
    String date = m.group(1);
}
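To make the placeholder pattern above concrete, here is a minimal sketch for the log line in the question. The class name, method name, and the exact pattern are my own assumptions based on that single sample line, so adjust them if your log format differs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogTimestampRegex {
    // Group 1 captures "dd MMM yyyy HH:mm:ss"; the ",161" millisecond part is matched but left out.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\[(\\d{2} \\w{3} \\d{4} \\d{2}:\\d{2}:\\d{2}),\\d{3}\\]");

    // Returns the timestamp text, or null when the line has no bracketed timestamp.
    static String extractTimestamp(String line) {
        Matcher m = TIMESTAMP.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "(INFO ) [07 Feb 2013 08:04:39,161] -- ua, navigation, fault";
        System.out.println(extractTimestamp(line)); // prints 07 Feb 2013 08:04:39
    }
}
```

Returning null for non-matching lines lets the caller skip lines that carry no timestamp.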

I'd go for using substring:
final int from = line.indexOf('[') + 1;
final int to = line.indexOf(']', from); // or ',' if you do not want the last number included
final String timestampAsString = line.substring(from, to);
BTW: Add checks that valid from/to indices were actually found, for example to detect errors early if the log format changes.
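Putting the substring approach together with SimpleDateFormat could look roughly like the sketch below. The class and method names are hypothetical, the second log line is made up for illustration, and the format string is an assumption derived from the one sample line in the question.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

final class LogTimestampDiff {
    // Mirrors "07 Feb 2013 08:04:39,161"; Locale.ENGLISH so "Feb" parses regardless of the machine's locale.
    private static final String FORMAT = "dd MMM yyyy HH:mm:ss,SSS";

    // Extracts the text between '[' and ']' and converts it to epoch milliseconds.
    static long toMillis(String line) {
        int from = line.indexOf('[') + 1;
        int to = line.indexOf(']', from);
        if (from == 0 || to < 0) {
            throw new IllegalArgumentException("no [timestamp] in line: " + line);
        }
        try {
            // A new SimpleDateFormat per call, because the class is not thread-safe.
            return new SimpleDateFormat(FORMAT, Locale.ENGLISH)
                    .parse(line.substring(from, to)).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("unexpected timestamp format: " + line, e);
        }
    }

    // Elapsed milliseconds between two log lines.
    static long millisBetween(String earlier, String later) {
        return toMillis(later) - toMillis(earlier);
    }

    public static void main(String[] args) {
        String first = "(INFO ) [07 Feb 2013 08:04:39,161] -- ua, navigation, fault";
        String second = "(INFO ) [07 Feb 2013 08:04:41,661] -- ua, navigation, ok"; // hypothetical line
        System.out.println(millisBetween(first, second)); // prints 2500
    }
}
```

For the average, sum the differences of all action pairs and divide by the number of pairs. Keeping the milliseconds costs nothing here, but you can stop at the ',' instead of the ']' and drop ",SSS" from the format if you do not want them.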

Related

Regex end with same pattern select

I have two strings with patterns like below:
1. com.sumeet.iot.op.v1.motor.2f628568b15f11eb85290242ac130003
2. com.sumeet.iot.op.v1.2f628568b15f11eb85290242ac130003
The generic formats are com.sumeet.iot.op.v1.< STRING >.< UUID > and com.sumeet.iot.op.v1.< UUID >
I am using regex = ^com.sumeet.iot.op.v1.*
It selects both, but I want to select only the second string, com.sumeet.iot.op.v1.< UUID >
What regex would select only the second string?

You may use:
^com\.sumeet\.iot\.op\.v1\.[a-f0-9]{32}$
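The final [a-f0-9]{32}$ portion of the regex requires that a 32-character hex UUID directly follows v1. and ends the string, so only the second variant matches. The dots are escaped so that they match literal dots rather than any character.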
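A quick way to check the pattern against both sample strings is sketched below; the class and method names are just illustrative.

```java
import java.util.regex.Pattern;

final class UuidSuffixCheck {
    // Same regex as above, with backslashes doubled for a Java string literal.
    private static final Pattern DIRECT_UUID =
            Pattern.compile("^com\\.sumeet\\.iot\\.op\\.v1\\.[a-f0-9]{32}$");

    // True only when the UUID comes immediately after "v1.".
    static boolean isDirectUuid(String s) {
        return DIRECT_UUID.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isDirectUuid("com.sumeet.iot.op.v1.motor.2f628568b15f11eb85290242ac130003")); // prints false
        System.out.println(isDirectUuid("com.sumeet.iot.op.v1.2f628568b15f11eb85290242ac130003"));       // prints true
    }
}
```

Note that the character class only accepts lowercase hex digits, as in the samples; add A-F if uppercase UUIDs can occur.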

Java string indexing make me confused

I need to gather data from my DB: the holiday dates in my country. The data comes like this
Example 1 : THU 21 May Ascension Day of Jesus Christ *ICDX GOLD open for
Example 2 : MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival
I need to get the day, the date, and the holiday name. To get the data from example 1, I'm using code like this
public static void main(String[] args) {
    String ex1 = "THU     21    May     Ascension Day of Jesus Christ   *ICDX GOLD open for";
    String ex2 = "MON-THU 28-31 Dec     Substitute for Commemoration of Idul Fitri Festival ";
    String[] trim1 = ex1.trim().split("\\s+"); // split on any run of whitespace
    String[] trim2 = ex1.trim().split("   "); // split on 3 spaces, so runs of spaces act as the delimiter
    System.out.println("DAY " + trim1[0]); // display day
    System.out.println("DATE " + trim1[1] + trim1[2] + "2020"); // display date
    System.out.println("HOLIDAY NAME " + trim2[3]); // display holiday name
}
The output comes out like this
DAY THU
DATE 21May2020
HOLIDAY NAME Ascension Day of Jesus Christ
which is just what I need, but when it comes to example 2 I can't use the same code because the spacing is different. How can I get the data I need from both example 1 and example 2 with the same code?
I am new to Java, so I'm sorry if my question looks dumb. I hope you can help me. Thanks.

.split("\\s+") splits on any run of whitespace, i.e. on one or more whitespace characters.
This means you can split on any number of spaces (which is what you want). However, it will also split your text comments. You can limit the length of the resulting array using .split(regex, n), which applies the pattern at most n-1 times and so produces an array of at most n elements. See the String.split(String, int) documentation for more details.
As for splitting apart your two text comments, I cannot see a way to do this.
"Substitute for Commemoration of Idul Fitri Festival " contains nothing that tells where the first text comment ends and the second begins.
It seems quite strange to me that you receive information from your database like this, and I would recommend checking whether there are other options. There is almost certainly a way to get separate fields.
If you have the ability to change the information in the database, you could add single quotes (') or some other separator, which would then let you split out the two pieces of text.
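As a rough illustration of the limit argument, the sketch below splits off the first three fields and leaves the rest of the line untouched as the fourth element. The class and method names are hypothetical, and it assumes the day, date, and month never contain whitespace themselves.

```java
final class SplitWithLimit {
    // Limit 4: the pattern is applied at most 3 times, so everything after the
    // month stays together in the last element.
    static String[] splitFields(String line) {
        return line.trim().split("\\s+", 4);
    }

    public static void main(String[] args) {
        String ex2 = "MON-THU 28-31 Dec Substitute for Commemoration of Idul Fitri Festival ";
        String[] f = splitFields(ex2);
        System.out.println("DAY " + f[0]);                 // prints DAY MON-THU
        System.out.println("DATE " + f[1] + f[2] + "2020"); // prints DATE 28-31Dec2020
        System.out.println("HOLIDAY NAME " + f[3]);
        // prints HOLIDAY NAME Substitute for Commemoration of Idul Fitri Festival
    }
}
```

The same call works for example 1, although the fourth element then still contains both the holiday name and the trailing "*ICDX GOLD open for" comment.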

This is basically what @DanielBarbarian suggested: since each field always seems to start at the same index, you can just use those indexes to get what you need.
String ex1 = "THU     21    May     Ascension Day of Jesus Christ   *ICDX GOLD open for";
String ex2 = "MON-THU 28-31 Dec     Substitute for Commemoration of Idul Fitri Festival ";
String day = ex2.substring(0, 8).trim();
String date = ex2.substring(8, 14).trim() + ex2.substring(14, 22).trim() + "2020";
String name = ex2.substring(22);
System.out.println("DAY " + day); // display day
System.out.println("DATE " + date); // display date
System.out.println("HOLIDAY NAME " + name); // display holiday name

How to trim/cut string in java by symbol?

I'm working on a project where my API returns a URL with an ID at the end, and I want to extract that ID for use in another function. Here's an example URL:
String advertiserUrl = http://../../.../uuid/advertisers/4 <<< this is the ID i want to extract.
At the moment I'm using Java's String method substring(), but this is not the best approach, as IDs could become 3-digit numbers and I would only get part of them. Here's my current approach:
String id = advertiserUrl.substring(advertiserUrl.length()-1,advertiserUrl.length());
System.out.println(id); // 4
It works in this case, but if the ID were e.g. "123" I would only get "3" after using substring. So my question is: is there a way to cut/trim a string on slashes ("/")? Say there are 5 slashes in my current URL; could the string be cut off after the fifth slash is detected? Any other sensible approach would be helpful too. Thanks.
P.S. The uuid in the URL may vary in length too.

You don't need to use regular expressions for this.
Use String#lastIndexOf along with substring instead:
String advertiserUrl = "http://../../.../uuid/advertisers/4";// <<< this is the ID i want to extract.
// this implies your URLs always end with "/[some value of undefined length]".
// Other formats might throw exception or yield unexpected results
System.out.println(advertiserUrl.substring(advertiserUrl.lastIndexOf("/") + 1));
Output
4
Update
To find the uuid value, you can use regular expressions:
String advertiserUrl = "http://111.111.11.111:1111/api/ppppp/2f5d1a31-878a-438b-a03b-e9f51076074a/advertisers/9";
//                          | preceded by "/"
//                          |     | any non-"/" character, reluctantly quantified
//                          |     |     | followed by "/advertisers"
Pattern p = Pattern.compile("(?<=/)[^/]+?(?=/advertisers)");
Matcher m = p.matcher(advertiserUrl);
if (m.find()) {
    System.out.println(m.group());
}
Output
2f5d1a31-878a-438b-a03b-e9f51076074a

You can either split the string on slashes and take the last element of the returned array, or use lastIndexOf("/") to get the index of the last slash and then take the substring after it.
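The split variant could be sketched like this (the class and method names are made up for illustration). One thing to be aware of: String.split drops trailing empty strings, so a URL that ends in "/" yields the segment before the final slash rather than an empty ID.

```java
final class LastPathSegment {
    // Splits on "/" and returns the last element, or "" if there is none.
    static String lastSegment(String url) {
        String[] parts = url.split("/");
        return parts.length == 0 ? "" : parts[parts.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("http://../../.../uuid/advertisers/4"));   // prints 4
        System.out.println(lastSegment("http://../../.../uuid/advertisers/123")); // prints 123
    }
}
```

For a single ID at the end of the URL, the lastIndexOf version avoids allocating the intermediate array, so splitting is mainly worthwhile when you need several segments.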

Use the lastIndexOf() method, which returns the index of the last occurrence of the specified character.
String id = advertiserUrl.substring(advertiserUrl.lastIndexOf('/') + 1, advertiserUrl.length());

Making a Regex More Dynamic

I posted this question a couple weeks ago pertaining to extracting a capture group using regex in Java, Extracting Capture Group Using Regex, and I received a working answer. I also posted this question a couple weeks ago pertaining to character replacement in Java using regex, Replace Character in Matching Regex, and received an even better answer that was more dynamic than the one I got from my first post. I'll quickly illustrate by example. I have a string like this that I want to extract the "ID" from:
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
And in this case I want the output to just be:
a0 12 b5
But it turns out the ID could be any number of octets (just has to be 1 or more octets), and I want my regex to be able to basically account for an ID of 1 octet then any number of subsequent octets (from 0 to however many). The person I received an answer from in my Replace Character in Matching Regex post did this for a similar but different use case of mine, but I'm having trouble porting this "more dynamic" regex over to the first use case.
Currently, I have ...
Pattern p = Pattern.compile("(?s)?:Here is the id\n\n\\?([a-z0-9]{2})|(?<!^)\\G:?([a-z0-9]{2})|.*?(?=Here is the id\n\n\\?)|.+");
Matcher m = p.matcher(certSerialNum);
String idNum = m.group(1);
System.out.println(idNum);
But it's throwing an exception. In addition, I would actually like it to use all known adjacent text in the pattern including "Here is the id\n\n\?" and "\n&Edit Properties...". What corrections do I need to get this working?

It seems you want something like this:
String idInfo = "Any text up here\n" +
"Here is the id\n" +
"\n" +
"?a0 12 b5\n" +
"&Edit Properties...\n" +
"And any text down here";
Pattern regex = Pattern.compile("Here is the id\\n+\\?([a-z0-9]{2}(?:\\s[a-z0-9]{2})*)(?=\\n&Edit Properties)");
Matcher matcher = regex.matcher(idInfo);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
a0 12 b5
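To check that the same pattern copes with a varying number of octets, here is a small sketch that wraps it in a helper (the class and method names are hypothetical) and tries IDs of one and five octets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OctetIdExtractor {
    // One octet, then zero or more whitespace-separated octets, anchored between
    // the known text before ("Here is the id", blank line, "?") and after ("&Edit Properties").
    private static final Pattern ID = Pattern.compile(
            "Here is the id\\n+\\?([a-z0-9]{2}(?:\\s[a-z0-9]{2})*)(?=\\n&Edit Properties)");

    // Returns the ID, or null if the surrounding text is not found.
    static String extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("Here is the id\n\n?a0\n&Edit Properties..."));
        // prints a0
        System.out.println(extractId("top\nHere is the id\n\n?a0 12 b5 ff 0c\n&Edit Properties...\nbottom"));
        // prints a0 12 b5 ff 0c
    }
}
```

Calling find() before group(1) matters: calling group() on a Matcher that has not matched yet throws an IllegalStateException.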

Regex for parsing string with same type of expression

I am new to regex parsing in Java. I want to parse a string that contains records, but I want to select only a specific part of each record.
\"6\":\"Services Ops\",\"practice_name\":\"Services Ops\",\"7\":\"Management\",
For this, I have written the regex
(^\\\"6\\\":\\\"[A-Za-z \s]*)
and the above expression gives me the result: \"6\":\"Services Ops\
I want only Services Ops.
There are also multiple records like \"5\":\"xxx\" and so on, so if I write the expression for only Services Ops, entries from other fields are also included in the result.
Is there a way to select a string that starts with some pattern while excluding that pattern from the result?
As in the example above: the string starts with \"6\":\" but I want to exclude this part and get only Services Ops as the result.
Thank you.

You can use lookarounds which perform only a check but don't match:
lookahead (?=...)
lookbehind (?<=...)
example:
(?<=\\\"6\\\":\\\")[^\"]++(?=\")
Another way is to use a capturing group (...):
\\\"6\\\":\\\"([^\"]++)\"
Then you can extract only the content of the group. Example:
Pattern p = Pattern.compile("\\\"6\\\":\\\"([^\"]++)\"");
Matcher m = p.matcher(yourString);
if (m.find()) { // find() searches anywhere in the string; matches() would require the whole string to match
    System.out.println(m.group(1));
}
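The capturing-group approach can be generalized to look up any key, as in the sketch below. The class and method names are hypothetical, and it assumes the record contains ordinary double quotes, with the backslashes in the question being display escaping only; if the data really contains backslashes, the pattern needs to account for them.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordFieldExtractor {
    // Returns the value for "key":"value" in the record, or null if the key is absent.
    static String fieldValue(String record, String key) {
        // Pattern.quote guards against regex metacharacters in the key.
        Pattern p = Pattern.compile("\"" + Pattern.quote(key) + "\":\"([^\"]*)\"");
        Matcher m = p.matcher(record);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String record = "\"6\":\"Services Ops\",\"practice_name\":\"Services Ops\",\"7\":\"Management\",";
        System.out.println(fieldValue(record, "6")); // prints Services Ops
        System.out.println(fieldValue(record, "7")); // prints Management
    }
}
```

Because the key is wrapped in quotes inside the pattern, looking up "6" does not accidentally match a key such as "16". If the records are actually JSON, a JSON parser is usually a more robust choice than a regex.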
