I'm using regular expressions to parse logs. Previously I read the file into a list of strings and iterated over it: if a line does not start with a timestamp, I append it to the current entry; otherwise I close out the current entry and start a new one with that line. Once I have a complete log entry, I use another regular expression to parse it.
Scanning file
try {
List<String> lines = Files.readAllLines(filepath);
Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
Matcher matcher;
String currentEntry = "";
for(String line : lines) {
matcher = pattern.matcher(line);
// If this is a new entry, then wrap up the previous one and start again
if ( matcher.lookingAt() ) {
// If the previous entry was not empty
if(!StringUtils.trimWhitespace(currentEntry).isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
// Clear the current entry
currentEntry = "";
}
if (!currentEntry.trim().isEmpty())
currentEntry += "\n";
currentEntry += line;
}
// At the end, if we have one leftover entry, add it
if (!currentEntry.isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
}catch (Exception ex){
return null;
}
Parsing entry
final private static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final private static String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final private static String classRgx = "\\[(?<class>[^]]+)\\]";
final private static String threadRgx = "\\[(?<thread>[^]]+)\\]";
final private static String textRgx = "(?<text>.*)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
public LogEntry(String logText) {
try {
Matcher matcher = PatternFullLog.matcher(logText);
matcher.find();
String dateStr = matcher.group("timestamp");
timestamp = new DateLogLevel();
timestamp.parseLogDate(dateStr);
String levelStr = matcher.group("level");
loglevel = LOG_LEVEL.valueOf(levelStr);
String fullClassStr = matcher.group("class");
String[] classNameArray = fullClassStr.split("\\.");
framework = classNameArray[2];
classname = classNameArray[classNameArray.length - 1];
threadname = matcher.group("thread");
logtext = matcher.group("text");
notes = "";
} catch (Exception ex) {
throw ex;
}
}
What I want to figure out
What I really want to do is read the whole file as a single string and then parse it entry by entry with a single regular expression. My plan was to use the same expression I use in the constructor, but make the log text group end at either EOF or the start of the next log entry, as follows:
final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final String classRgx = "\\[(?<class>[^]]+)\\]";
final String threadRgx = "\\[(?<thread>[^]]+)\\]";
final String textRgx = "(?<text>.*[^(\Z|\\d{4}\-\\d{2}\-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"; // change to handle multiple lines
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
try {
    // Read file into string
    String lines = readFile(filepath);
    // Match the full entry pattern against the whole file content
    Matcher matcher = PatternFullLog.matcher(lines);
    while (matcher.find()) {
        String dateStr = matcher.group("timestamp");
        timestamp = new DateLogLevel();
        timestamp.parseLogDate(dateStr);
        String levelStr = matcher.group("level");
        loglevel = LOG_LEVEL.valueOf(levelStr);
        String fullClassStr = matcher.group("class");
        String[] classNameArray = fullClassStr.split("\\.");
        framework = classNameArray[2];
        classname = classNameArray[classNameArray.length - 1];
        threadname = matcher.group("thread");
        logtext = matcher.group("text");
        entries.add(
            new LogEntry(
                timestamp,
                loglevel,
                framework,
                threadname,
                logtext,
                ""/* Notes are empty when importing new file */));
    }
} catch (Exception ex) {
    return null;
}
The problem is that I can't seem to get the last group (textRgx) to multiline match until either a timestamp or end of file. Does anyone have any thoughts?
Sample Log Entries
2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext: startup date [Tue Mar 14 22:43:14 UTC 2017]; root of context hierarchy
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Loading XML bean definitions from Serv
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with another entry after
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with no entries after
You need to define the patterns like
final static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final static String levelRgx = "(?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)";
final static String classRgx = "\\[(?<class>[^\\]]+)]";
final static String threadRgx = "\\[(?<thread>[^\\]]+)]";
final static String textRgx = "(?<text>.*?)(?=\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\Z)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx, Pattern.DOTALL);
Then, you may use that like
Matcher matcher = PatternFullLog.matcher(line);
Here is what the pattern looks like:
(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\s+\[(?<class>[^\]]+)]-\[(?<thread>[^\]]+)]\s+(?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z)
Some notes:
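You had several issues with escaping symbols: ] inside a character class should be escaped, and \- should have been written as a plain -. Also note that [^...] is a negated character class that matches a single character, so it cannot express "up to the next timestamp"; a lookahead is needed for that.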
The pattern to match text up to the datetime or end of string is (?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z) where .*? matches any char, 0+ occurrences, reluctantly, up to the first occurrence of the timestamp pattern (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) or end of string (\Z).
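To see the lookahead at work on a whole log held in one string, here is a minimal, self-contained sketch. The class name LogSplitDemo, the parse helper, and the shortened sample entries are illustrative assumptions, not part of the original code. trim() is applied because the text group keeps the line break that precedes the next timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogSplitDemo {
    // Timestamp sub-pattern, reused for the named group and the lookahead.
    private static final String TS = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}";

    private static final Pattern ENTRY = Pattern.compile(
            "(?<timestamp>" + TS + ") (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\\s+"
            + "\\[(?<class>[^\\]]+)]-\\[(?<thread>[^\\]]+)]\\s+"
            + "(?<text>.*?)(?=" + TS + "|\\Z)",
            Pattern.DOTALL);

    // Hypothetical helper: returns {timestamp, level, class, thread, text} per entry.
    static List<String[]> parse(String log) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            entries.add(new String[] {
                m.group("timestamp"), m.group("level"), m.group("class"),
                m.group("thread"), m.group("text").trim()
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        // Shortened, made-up entries in the same layout as the sample log.
        String log =
              "2017-03-14 22:43:14,405 FATAL [org.example.A]-[main] first\n"
            + "2017-03-14 22:43:14,476 INFO [org.example.B]-[main] Here is a multiline\n"
            + "log entry with no entries after\n";
        for (String[] e : parse(log)) {
            System.out.println(String.join(" | ", e));
        }
    }
}
```

One caveat: the lookahead stops at any timestamp-shaped text, even in the middle of a message. Anchoring it to the start of a line (for example with Pattern.MULTILINE and ^ in front of the timestamp) is one way to tighten that.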
Related
I have a .txt file that I read through a BufferedReader, and I need to extract the last part of this string. The line is below:
<path
action="m"
text-mod="true"
mods="true"
kind="file">branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>
I have the following code, which takes the entire line and adds it to a list, but I only need CsoorspeWsVo:
while ((line = bufferdReader.readLine()) != null) {
Excel4 excel4 = new Excel4();
if (line.contains("</path>")) {
int index1 = line.indexOf(">");
int index2 = line.lastIndexOf("<");
line = line.substring(index1, index2);
excel4.setName(line);
listExcel4.add(excel4);
}
}
I only want to extract CsoorspeWsVo from here.
Can anyone help me? Thanks.
You can use Regex groups to get it for example
public static void main(String []args){
String input = "<path\n" +
" action=\"m\"\n" +
" text-mod=\"true\"\n" +
" mods=\"true\"\n" +
" kind=\"file\">branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>\n";
Pattern pattern = Pattern.compile("kind=\"file\">.+/(.+\\..+)</path>");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String fileName = matcher.group(1);
System.out.println(fileName);
}
}
The output will be: CsoorspeWsVo.java
If you want the full path, change the regex to
Pattern pattern = Pattern.compile("kind=\"file\">(.+)</path>");
The output will be:
branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java
And you can get name and extension in two groups for example
Pattern pattern = Pattern.compile("kind=\"file\">.+/(.+)\\.(.+)</path>");
And inside the if
String fileName = matcher.group(1);
String fileExtension = matcher.group(2);
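Putting the two-group variant together, a minimal sketch might look like the following. The class name FileNameDemo and the helper nameAndExtension are illustrative assumptions; the pattern is the one from the answer above, applied to the single line that contains the closing path tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FileNameDemo {
    // Group 1 is the file name, group 2 the extension.
    private static final Pattern PATH_TAG =
            Pattern.compile("kind=\"file\">.+/(.+)\\.(.+)</path>");

    // Hypothetical helper: returns {name, extension}, or null if the line does not match.
    static String[] nameAndExtension(String line) {
        Matcher matcher = PATH_TAG.matcher(line);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        String line = "kind=\"file\">branches/RO/2021Align01/CO/DGSIG-DAO/"
                + "src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>";
        String[] parts = nameAndExtension(line);
        System.out.println(parts[0]); // CsoorspeWsVo
        System.out.println(parts[1]); // java
    }
}
```

In the reading loop from the question, the same helper could be called on each line that contains "</path>", passing parts[0] to setName.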
I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
Maybe your pattern should be like this ^FPS\\(FramesPerSecond\\)\\[ValMin: . I've tried it and it works for me.
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
    System.out.println(line.substring(matcher.end(), line.length()-1));
}
This way, matcher.end() gives you the offset just past the matched prefix, and substring returns every character from that offset up to line.length()-1, which leaves out the closing ] character.
The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
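If the goal is to keep the values rather than print them, one option is to collect them into a map keyed by the metric name. The sketch below reuses the regular expression from this answer; the class name MinMaxDemo, the parse helper, and the choice of a LinkedHashMap holding {min, max} arrays are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinMaxDemo {
    private static final Pattern LINE = Pattern.compile(
            "(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");

    // Hypothetical helper: maps each metric name to {min, max} as strings.
    static Map<String, String[]> parse(String text) {
        Map<String, String[]> values = new LinkedHashMap<>();
        // \R splits on any line break, so Windows line endings do not break matches().
        for (String line : text.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                values.put(m.group(1), new String[] { m.group(2), m.group(3) });
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String input = ".........\n"
                + "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n"
                + ".........\n"
                + "TotalFrames[ValMin: 100000, ValMax:200000]\n";
        Map<String, String[]> values = parse(input);
        String fpsMin = values.get("FPS(FramesPerSecond)")[0];
        String fpsMax = values.get("FPS(FramesPerSecond)")[1];
        System.out.println("FPSMin = " + fpsMin);
        System.out.println("FPSMax = " + fpsMax);
    }
}
```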
I am trying to find and replace text in a file using Java, but I have not been able to find a solution.
File contents are
"ProductCode" = "8:{3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5}"
"ProductCode" = "8:.NETFramework,Version=v4.5"
I have to update the GUID of the first one, which is 3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5
String line = "\"ProductCode\" = \"8:{3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5}\"";
String pattern = "[\"]([P][r][o][d][u][c][t][C][o][d][e]).+([\"])(\\s)[\"][8][:][{]";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
System.out.println(m.matches());
I am getting false.
please provide the solution if possible.
Thanks in advance.
"ProductCode" = "8:{3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5}" This is of the form:
quote + ProductCode + quote + whitespace + equals + whitespace +
quote + number + colon + any + quote
A simple Regex for this is \"ProductCode\"\s*=\s*\"\d:(.+)\"
When we escape this to a Java string we get \\\"ProductCode\\\"\\s*=\\s*\\\"\\d:(.+)\\\"
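Since the question asks to update the GUID, not just find it, here is a minimal sketch of the replacement step built on the same line shape. It works on the file content as a string; reading and writing the file is left out. The class name GuidReplaceDemo, the replaceGuid helper, the named groups pre and post, and the all-zero GUID are assumptions made for illustration. The braces are part of the match, so the .NETFramework line is left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuidReplaceDemo {
    // pre  = everything up to and including the opening brace
    // post = the closing brace and quote
    private static final Pattern PRODUCT_CODE = Pattern.compile(
            "(?<pre>\"ProductCode\"\\s*=\\s*\"\\d:\\{)[^}]+(?<post>\\}\")");

    // Hypothetical helper: replaces the GUID of the first braced ProductCode entry.
    static String replaceGuid(String content, String newGuid) {
        Matcher m = PRODUCT_CODE.matcher(content);
        // quoteReplacement guards against '$' or '\' in the new value.
        return m.replaceFirst("${pre}" + Matcher.quoteReplacement(newGuid) + "${post}");
    }

    public static void main(String[] args) {
        String content =
              "\"ProductCode\" = \"8:{3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5}\"\n"
            + "\"ProductCode\" = \"8:.NETFramework,Version=v4.5\"";
        System.out.println(replaceGuid(content, "00000000-0000-0000-0000-000000000000"));
    }
}
```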
Try this pattern:
String pattern = "^\\\"(ProductCode)\\\"\\s\\=\\s\\\"\\w\\:\\{(\\w+\\-*\\w+\\-\\w+\\-\\w+\\-\\w+)\\}\\\"$";
Using regex for this problem is like taking a sledgehammer to crack a nut. A simpler approach with plain string operations:
final String line = "\"ProductCode\" = \"8:{3E3CDCB6-286C-4B7F-BCA6-D347A4AE37F5}\"";
final String prefix = "\"ProductCode\" = \"8:{";
final int prefixIndex = line.indexOf(prefix);
final String suffix = "}\"";
final int suffixIndex = line.indexOf(suffix);
final String guid = line.substring(prefixIndex + prefix.length(), suffixIndex);
In my program I need to loop through a variety of dates. I am writing this program in java, and have a bit of experience with readers, but I do not know which reader would complete this task the best, or if another class would work better.
The dates would be input into a text file in the format as follows:
1/1/2013 to 1/7/2013
1/8/2013 to 1/15/2013
Or something of this manner. I would need to break each range of dates into 6 local variables for the loop, then change them for the next loop. The variables would be coded for example:
private static String startingMonth = "1";
private static String startingDay = "1";
private static String startingYear = "2013";
private static String endingMonth = "1";
private static String endingDay = "7";
private static String endingYear = "2013";
I imagine this could be done by creating several delimiters to look for, but I do not know whether that would be the easiest way. I have been looking at this post for help, but can't seem to find a relevant answer. What would be the best way to go about this?
There are several options.
You could use a Scanner and set the delimiter to match whitespace or a slash. If you want the values as ints rather than strings, use sc.nextInt() instead of sc.next().
// "\\s+" (one or more whitespace characters), not "\\s*": a delimiter that can
// match the empty string would split the input between every character.
Scanner sc = new Scanner(input).useDelimiter("\\s+|/");
// You can skip the loop to just read a single line.
while (sc.hasNext()) {
    startingMonth = sc.next();
    startingDay = sc.next();
    startingYear = sc.next();
    // skip "to"
    sc.next();
    endingMonth = sc.next();
    endingDay = sc.next();
    endingYear = sc.next();
}
You can use regex, as alfasin suggests, but this case is simple enough that you can just split on the first and last space.
String str = "1/1/2013 to 1/7/2013";
String startDate = str.substring(0,str.indexOf(" "));
String endDate = str.substring(str.lastIndexOf(" ")+1);
// The rest is the same:
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);
String str = "1/1/2013 to 1/7/2013";
Pattern pattern = Pattern.compile("(\\d+/\\d+/\\d+)");
Matcher matcher = pattern.matcher(str);
matcher.find();
String startDate = matcher.group();
matcher.find();
String endDate = matcher.group();
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);
...
OUTPUT
1-1-2013
1-7-2013
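As a different technique from the answers above, the two dates on each line can also be parsed into java.time.LocalDate values, which gives month, day, and year as ints without splitting on slashes by hand. This is only a sketch under the assumption that every line has the form "M/d/yyyy to M/d/yyyy"; the class name DateRangeDemo and the parseRange helper are made up for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateRangeDemo {
    // Single-letter M and d accept both one- and two-digit values.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("M/d/uuuu");

    // Hypothetical helper: returns {start, end} for a line like "1/1/2013 to 1/7/2013".
    static LocalDate[] parseRange(String line) {
        String[] parts = line.trim().split("\\s+to\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<start> to <end>': " + line);
        }
        return new LocalDate[] {
            LocalDate.parse(parts[0], FORMAT),
            LocalDate.parse(parts[1], FORMAT)
        };
    }

    public static void main(String[] args) {
        String[] lines = { "1/1/2013 to 1/7/2013", "1/8/2013 to 1/15/2013" };
        for (String line : lines) {
            LocalDate[] range = parseRange(line);
            int startingMonth = range[0].getMonthValue();
            int startingDay = range[0].getDayOfMonth();
            int startingYear = range[0].getYear();
            System.out.println(startingMonth + "-" + startingDay + "-" + startingYear
                    + " .. " + range[1]);
        }
    }
}
```

The lines array stands in for whatever reader supplies the text file; a BufferedReader or Files.readAllLines would work equally well for feeding parseRange.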
I am trying to find a regular expression that would match the following format:
path/*.file_extension
For example:
temp/*.jpg
usr/*.pdf
var/lib/myLib.so
tmp/
Using the regex, I want to store the matching parts into a String array, such as:
String[] tokens;
// regex magic here
String path = tokens[0];
String filename = tokens[1];
String extension = tokens[2];
In case of the last case tmp/, that contains no filename and extension, then token[1] and token[2] would be null.
In case of the:
usr/*.pdf
then the token[1] would contain only the string "*".
Thank you very much for your help.
If you can use Java7 then you can use named groups like this
String data = "temp/*.jpg, usr/*.pdf, var/lib/*.so, tmp/*, usr/*, usr/*.*";
Pattern p = Pattern
.compile("(?<path>(\\w+/)+)((?<name>\\w+|[*]))?([.](?<extension>\\w+|[*]))?");
Matcher m = p.matcher(data);
while (m.find()) {
System.out.println("data=" + m.group());
System.out.println("path=" + m.group("path"));
System.out.println("name=" + m.group("name"));
System.out.println("extension=" + m.group("extension"));
System.out.println("------------");
}
This code should work:
String line = "var/lib/myLib.so";
Pattern p = Pattern.compile("(.+?(?=/[^/]*$))/([^.]+)\\.(.+)$");
Matcher m = p.matcher(line);
List<String> tokens = new ArrayList<String>();
if (m.find()) {
for (int i=1; i <= m.groupCount(); i++) {
tokens.add(m.group(i));
}
}
System.out.println("Tokens => " + tokens);
OUTPUT:
Tokens => [var/lib, myLib, so]
I'm assuming you're using Java. This should work:
Pattern.compile("path/(.*?)(?:\\.(file_extension))?");
Why use a regular expression?
I personally find lastIndexOf more readable.
String path;
String filename;
@Nullable String extension;
// Look for the last slash
int lastSlash = fullPath.lastIndexOf('/');
path = fullPath.substring(0, lastSlash + 1);
// Look for the last dot; it only counts if it comes after the last slash.
// (lastIndexOf(ch, fromIndex) searches backward from fromIndex, so it cannot be used here.)
int lastDot = fullPath.lastIndexOf('.');
if (lastDot <= lastSlash) {
    filename = fullPath.substring(lastSlash + 1);
    // If there is no dot, then there is no extension which
    // is distinct from the empty extension in "foo/bar."
    extension = null;
} else {
    filename = fullPath.substring(lastSlash + 1, lastDot);
    extension = fullPath.substring(lastDot + 1);
}
On a different approach, a simple usage of 'substring()/lastIndexOf()' methods should serve the purpose:
String filePath = "var/lib/myLib.so";
// Name with extension, e.g. "myLib.so"
String fullName = filePath.substring(filePath.lastIndexOf('/')+1);
String path = filePath.substring(0, filePath.lastIndexOf('/'));
String fileName = fullName.substring(0, fullName.lastIndexOf('.'));
String extension = fullName.substring(fullName.lastIndexOf('.')+1);
Please Note: You need to handle the alternate scenarios e.g. file path without extension.
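A sketch of how those alternate scenarios could be handled with the same substring()/lastIndexOf() calls is shown below. It returns the three tokens the question asks for, with null for a missing file name or extension. The class name PathTokens, the tokenize helper, and the choice to return the directory without its trailing slash are assumptions, not something the question specifies.

```java
public class PathTokens {
    // Hypothetical helper: returns {path, filename, extension};
    // filename and extension are null when absent.
    static String[] tokenize(String fullPath) {
        int lastSlash = fullPath.lastIndexOf('/');
        String path = lastSlash < 0 ? "" : fullPath.substring(0, lastSlash);
        String rest = fullPath.substring(lastSlash + 1);
        if (rest.isEmpty()) {
            // e.g. "tmp/": a directory only
            return new String[] { path, null, null };
        }
        int lastDot = rest.lastIndexOf('.');
        if (lastDot < 0) {
            // A file name without an extension
            return new String[] { path, rest, null };
        }
        return new String[] { path, rest.substring(0, lastDot), rest.substring(lastDot + 1) };
    }

    public static void main(String[] args) {
        for (String p : new String[] { "temp/*.jpg", "usr/*.pdf", "var/lib/myLib.so", "tmp/" }) {
            String[] t = tokenize(p);
            System.out.println(t[0] + ", " + t[1] + ", " + t[2]);
        }
    }
}
```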