I need to parse an Apache log file with this regex, but it is not working: matches() returns false.
private String accessLogRegex()
{
String regex1 = "^([\\d.]+)"; // Client IP
String regex2 = " (\\S+)"; // -
String regex3 = " (\\S+)"; // -
String regex4 = " \\[([\\w:/]+\\s[+\\-]\\d{4})\\]"; // Date
String regex5 = " \"(.+?)\""; // request method and url
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+|(.+?))"; // Number of bytes
String regex8 = " \"([^\"]+|(.+?))\""; // Referer
String regex9 = " \"([^\"]+|(.+?))\""; // Agent
return regex1+regex2+regex3+regex4+regex5+regex6+regex7+regex8+regex9;
}
Pattern accessLogPattern = Pattern.compile(accessLogRegex());
Matcher entryMatcher;
String log = "64.242.88.10 | 2004-07-25.16:36:22 | \"GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1\" 401 1284 | Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)";
entryMatcher = accessLogPattern.matcher(log);
if(!entryMatcher.matches()){
System.out.println("" + index +" : couldn't be parsed");
}
I've included a sample of the Apache log; it's pipe ("|") separated.
Is there a reason you want to use regexes? These are quite error-prone, easy to get wrong, and can be a maintenance nightmare...
An alternative might be to use a library for this, for example this one
That said, if you do want to use a regex, yours contains a number of errors:
String regex1 = "^([\\d.]+)"; // while quite liberal, this should work
String regex2 = " (\\S+)"; // matches the first pipe
String regex3 = " (\\S+)"; // this will match the date field
String regex4 = " \\[([\\w:/]+\\s[+\\-]\\d{4})\\]"; // date has already been matched so this won't work, also this is all wrong
String regex5 = " \"(.+?)\""; // you're not matching the pipe character before the URL; also, why the ?
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+|(.+?))"; // Why are you also matching any other characters than just digits?
String regex8 = " \"([^\"]+|(.+?))\""; // Your sample log line doesn't contain a referer
String regex9 = " \"([^\"]+|(.+?))\""; // Agent is not enclosed in quotes
One possible regex solution for the example log line you have given is this:
String regex1 = "^([\\d.]+)"; // digits and dots: the IP
String regex2 = " \\|"; // why match any character if you *know* there is a pipe?
String regex3 = " ((?:\\d+[-:.])+\\d+)"; // match the date; don't capture the inner group as we are only interested in the full date
String regex4 = " \\|"; // pipe
String regex5 = " \"(.+)\""; // request method and url
String regex6 = " (\\d{3})"; // HTTP code
String regex7 = " (\\d+)"; // Number of bytes
String regex8 = " \\|"; // pipe again
String regex9 = " (.+)"; // The rest of the line is the user agent
Of course this may need some further tweaking if other log lines don't follow the exact same format.
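As a rough sketch of how the fragments above fit together, the following concatenates them into one pattern and runs it against the sample line from the question. The class name PipeLogDemo and the parse helper are made up for illustration, and the pattern has only been checked against that single sample line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PipeLogDemo {
    // The same fragments as in the answer, concatenated in order.
    private static final Pattern LINE = Pattern.compile(
            "^([\\d.]+)"                 // group 1: client IP
            + " \\|"                     // pipe
            + " ((?:\\d+[-:.])+\\d+)"    // group 2: date
            + " \\|"                     // pipe
            + " \"(.+)\""                // group 3: request method and URL
            + " (\\d{3})"                // group 4: HTTP code
            + " (\\d+)"                  // group 5: number of bytes
            + " \\|"                     // pipe
            + " (.+)");                  // group 6: user agent (rest of line)

    // Hypothetical helper: returns the six captured fields, or null if the line does not match.
    public static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String log = "64.242.88.10 | 2004-07-25.16:36:22 | "
                + "\"GET /twiki/bin/rdiff/Main/ConfigurationVariables HTTP/1.1\" 401 1284 | "
                + "Mozilla/4.6 [en] (X11; U; OpenBSD 2.8 i386; Nav)";
        String[] fields = parse(log);
        System.out.println(fields == null ? "couldn't be parsed" : String.join(" / ", fields));
    }
}
```

Returning null for a non-matching line keeps the caller in charge of reporting which line number failed, as the original code does.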
I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
Maybe your pattern should be like this ^FPS\\(FramesPerSecond\\)\\[ValMin: . I've tried it and it works for me.
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
That way, matcher.end() gives you the offset just past the matched prefix, and substring returns everything from that offset up to the line length minus 1 (because you don't want the closing ] character).
The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
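Since the question asks for the values to be stored rather than printed, here is a minimal sketch that collects them in a map keyed by the captured name. The class name MinMaxExtractor and the extract method are hypothetical; the regular expression is the one from this answer, and the values are kept as strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinMaxExtractor {
    // Same expression as above: group 1 = name, group 2 = min, group 3 = max.
    private static final Pattern LINE = Pattern.compile(
            "(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");

    // Returns name -> {min, max}, in the order the lines appear in the text.
    public static Map<String, String[]> extract(String text) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), new String[] { m.group(2), m.group(3) });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = ".........\n"
                + "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n"
                + ".........\n"
                + "TotalFrames[ValMin: 100000, ValMax:200000]\n";
        Map<String, String[]> values = extract(input);
        String fpsMin = values.get("FPS(FramesPerSecond)")[0];
        String fpsMax = values.get("FPS(FramesPerSecond)")[1];
        System.out.println("FPSMin = " + fpsMin + ", FPSMax = " + fpsMax);
    }
}
```

If numeric values are needed, the strings can be passed to Double.parseDouble or Long.parseLong after extraction.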
I am trying to capture the leading tabs or spaces in a group so that I can reuse them in replaceAll.
The indentation can be spaces, tabs, or both, and I would like to capture it as a group so I can put it in front of the inserted line (Test_me).
This example doesn't work
String content = "\t\t<image.name>ABCD:44</docker.image.name>\n";
String content2 = " <image.name>ABCD:44</docker.image.name>\n";
String source = "ABCD:44";
String destination = "${XXXX}";
content = content.replaceAll("(^[ \\s\\t]*)(<image.name>.*" + source + ")(.*?>)",
"$1Test_me\n" +
"$1" + Matcher.quoteReplacement("<image.name>" + destination) + "$3");
If you want to replace whitespace (space, tab etc.) captured by the first capturing group, (\\s*), you need to omit them (i.e. $1) in the second argument.
Demo:
import java.util.regex.Matcher;
public class Main {
public static void main(String[] args) {
String content = "\t\t<image.name>ABCD:44</docker.image.name>\n";
String content2 = " <image.name>ABCD:44</docker.image.name>\n";
String source = "ABCD:44";
String destination = "${XXXX}";
content = content.replaceAll("(\\s*)(<image.name>.*" + source + ")(.*?>)",
"Test_me\n" + Matcher.quoteReplacement("<image.name>" + destination) + "$3");
System.out.println(content);
}
}
Output:
Test_me
<image.name>${XXXX}</docker.image.name>
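If the goal is instead to keep the indentation and repeat it in front of the inserted line, one possible variant is sketched below. It assumes the indentation is only spaces and tabs at the start of a line, uses the (?m) flag so that ^ matches at every line start, and quotes the source text with Pattern.quote. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IndentPreservingReplace {
    // Inserts a "Test_me" line above the matched tag, reusing the captured indentation ($1) twice.
    public static String insertLineAbove(String content, String source, String destination) {
        String regex = "(?m)^([ \\t]*)(<image\\.name>.*" + Pattern.quote(source) + ")(.*?>)";
        String replacement = "$1Test_me\n"
                + "$1" + Matcher.quoteReplacement("<image.name>" + destination) + "$3";
        return content.replaceAll(regex, replacement);
    }

    public static void main(String[] args) {
        String content = "\t\t<image.name>ABCD:44</docker.image.name>\n";
        System.out.print(insertLineAbove(content, "ABCD:44", "${XXXX}"));
    }
}
```

Only the literal part of the replacement goes through Matcher.quoteReplacement, so that $1 and $3 are still treated as group references.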
I am reading data from a continuous buffer of strings, where each message runs from "8=XXX" to "10=XXX". Suppose the first buffer scan returns the following; this is the entire string I got in one scan:
8=FIX.4.2|9=00815|35=W|49=TT_PRICE|56=SAP0094X|10=134|
8=FIX.4.2|9=00816|35=W49=TT_PRICE ----------------here I didn't get the full string
Now I want each string starting with "8=xxx" and ending with "10=xxx|". I have written a program for that and it works. The problem is that when I match against the string above, I only get the part that runs exactly from "8=xxx" to "10=xxx|"; the part that does not match is simply dropped. I also want that remaining part.
|56=SAP0094X|10=134|-------This is the remaining part of the incomplete string above
8=FIX.4.2|9=00815|35=W|49=TT_PRICE|56=SAP0094X|10=134|
The next buffer scan delivers the rest of the string that was left unmatched. So the unmatched string from the first scan is
8=FIX.4.2|9=00816|35=W49=TT_PRICE
and the unmatched string from the next scan is
|56=SAP0094X|10=134|
These two strings need to be joined like this:
8=FIX.4.2|9=00816|35=W49=TT_PRICE|56=SAP0094X|10=134|
which is the full string.
Below is my code:
String text = in.toString(CharsetUtil.UTF_8); //in is a reference to ByteBuf
Pattern r = Pattern.compile("(8=\\w\\w\\w)[\\s\\S]*?(10=\\w\\w\\w)");
Matcher m = r.matcher(text);
while (m.find()) {
String message = m.group();
// I need to get the remaining not matched string and has to be appended to the not matched string in the next search so that I will be getting the whole string starting from "8=xxx to 10=xxx|"
System.out.println("Incoming From Exchange >> "+message);
}
You can use groups for that:
public static void main(String[] args) {
String someInput = "XXX-payload-YYY-some-tail";
Pattern r = Pattern.compile("(XXX)(.*)(YYY)(.*)");
Matcher m = r.matcher(someInput);
if (m.matches()) {
System.out.println("initial token: " + m.group(1));
System.out.println("payload: " + m.group(2));
System.out.println("end token: " + m.group(3));
System.out.println("tail: " + m.group(4));
}
}
output:
initial token: XXX
payload: -payload-
end token: YYY
tail: -some-tail
Then you can concatenate the "tail" with the result of the second scan and parse it again.
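The idea of carrying the tail over to the next scan can be sketched as a small buffer class. This is only an illustration under a few assumptions: fields are separated by "|" as in the sample from the question, a message ends at the first "|10=ddd|" after "8=", and FixFramer and feed are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixFramer {
    // Assumed framing: from "8=" up to and including the first "|10=ddd|".
    private static final Pattern MESSAGE =
            Pattern.compile("8=.*?\\|10=\\d{3}\\|", Pattern.DOTALL);

    // Holds whatever has not yet formed a complete message.
    private final StringBuilder pending = new StringBuilder();

    // Appends a new chunk, returns every complete message, and keeps the unmatched tail.
    public List<String> feed(String chunk) {
        pending.append(chunk);
        List<String> complete = new ArrayList<>();
        Matcher m = MESSAGE.matcher(pending);
        int consumed = 0;
        while (m.find()) {
            complete.add(m.group());
            consumed = m.end();
        }
        // Drop everything up to the end of the last complete message; the rest waits for the next chunk.
        pending.delete(0, consumed);
        return complete;
    }

    public static void main(String[] args) {
        FixFramer framer = new FixFramer();
        System.out.println(framer.feed(
                "8=FIX.4.2|9=00815|35=W|49=TT_PRICE|56=SAP0094X|10=134|8=FIX.4.2|9=00816|35=W|49=TT_PRICE"));
        System.out.println(framer.feed("|56=SAP0094X|10=134|"));
    }
}
```

In the question's code, feed would be called once per buffer scan with the decoded text, on a single FixFramer instance kept between scans.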
I'm using regular expressions to parse logs. I was previously reading the File into a string array, and then iterating through the string array appending if I don't match the timestamp, otherwise I add the line I'm iterating on to a variable and continue the search. Once I get a complete log entry, I use another regular expression to parse it.
Scanning file
try {
List<String> lines = Files.readAllLines(filepath);
Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
Matcher matcher;
String currentEntry = "";
for(String line : lines) {
matcher = pattern.matcher(line);
// If this is a new entry, then wrap up the previous one and start again
if ( matcher.lookingAt() ) {
// If the previous entry was not empty
if(!StringUtils.trimWhitespace(currentEntry).isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
// Clear the current entry
currentEntry = "";
}
if (!currentEntry.trim().isEmpty())
currentEntry += "\n";
currentEntry += line;
}
// At the end, if we have one leftover entry, add it
if (!currentEntry.isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
}catch (Exception ex){
return null;
}
Parsing entry
final private static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final private static String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final private static String classRgx = "\\[(?<class>[^]]+)\\]";
final private static String threadRgx = "\\[(?<thread>[^]]+)\\]";
final private static String textRgx = "(?<text>.*)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
public LogEntry(String logText) {
try {
Matcher matcher = PatternFullLog.matcher(logText);
matcher.find();
String dateStr = matcher.group("timestamp");
timestamp = new DateLogLevel();
timestamp.parseLogDate(dateStr);
String levelStr = matcher.group("level");
loglevel = LOG_LEVEL.valueOf(levelStr);
String fullClassStr = matcher.group("class");
String[] classNameArray = fullClassStr.split("\\.");
framework = classNameArray[2];
classname = classNameArray[classNameArray.length - 1];
threadname = matcher.group("thread");
logtext = matcher.group("text");
notes = "";
} catch (Exception ex) {
throw ex;
}
}
What I want to figure out
What I really want to do is read the whole file as a single string, then use a single regex to parse this out line by line, using a single regular expression once. My plan was to use the same expression I use in the constructor, but when looking for the log text make it end at either EOF or the next log line, as such
final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final String classRgx = "\\[(?<class>[^]]+)\\]";
final String threadRgx = "\\[(?<thread>[^]]+)\\]";
final String textRgx = "(?<text>.*[^(\Z|\\d{4}\-\\d{2}\-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"; // change to handle multiple lines
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
try {
// Read file into string
String lines = readFile(filepath);
Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
Matcher matcher;
matcher = pattern.matcher(line);
while(matcher.find())
String dateStr = matcher.group("timestamp");
timestamp = new DateLogLevel();
timestamp.parseLogDate(dateStr);
String levelStr = matcher.group("level");
loglevel = LOG_LEVEL.valueOf(levelStr);
String fullClassStr = matcher.group("class");
String[] classNameArray = fullClassStr.split("\\.");
framework = classNameArray[2];
classname = classNameArray[classNameArray.length - 1];
threadname = matcher.group("thread");
logtext = matcher.group("text");
entries.add(
new LogEntry(
timestamp,
loglevel,
framework,
threadname,
logtext,
""/* Notes are empty when importing new file */));
}
}
}catch (Exception ex){
return null;
}
The problem is that I can't seem to get the last group (textRgx) to multiline match until either a timestamp or end of file. Does anyone have any thoughts?
Sample Log Entries
2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext: startup date [Tue Mar 14 22:43:14 UTC 2017]; root of context hierarchy
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Loading XML bean definitions from Serv
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with another entry after
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with no entries after
You need to define the patterns like
final static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final static String levelRgx = "(?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)";
final static String classRgx = "\\[(?<class>[^\\]]+)]";
final static String threadRgx = "\\[(?<thread>[^\\]]+)]";
final static String textRgx = "(?<text>.*?)(?=\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\Z)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx, Pattern.DOTALL);
Then, you may use that like
Matcher matcher = PatternFullLog.matcher(line);
See the Java demo
Here is what the pattern looks like:
(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\s+\[(?<class>[^\]]+)]-\[(?<thread>[^\]]+)]\s+(?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z)
See the regex demo.
Some notes:
You had several issues with escaping symbols: ] inside a character class must be escaped, and \- should have been replaced with -.
The pattern to match text up to the datetime or end of string is (?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z) where .*? matches any char, 0+ occurrences, reluctantly, up to the first occurrence of the timestamp pattern (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) or end of string (\Z).
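A minimal end-to-end sketch of this pattern applied to a whole file read as one string is shown below. The class name MultiLineLogDemo, the parse method, and the shortened sample entries are assumptions for illustration; the captured text is trimmed because it can include the line break before the next timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiLineLogDemo {
    private static final String TS = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}";

    // Same structure as the pattern above; DOTALL lets .*? run across line breaks.
    private static final Pattern ENTRY = Pattern.compile(
            "(?<timestamp>" + TS + ") (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\\s+"
            + "\\[(?<class>[^\\]]+)]-\\[(?<thread>[^\\]]+)]\\s+"
            + "(?<text>.*?)(?=" + TS + "|\\Z)",
            Pattern.DOTALL);

    // Returns one {timestamp, level, class, thread, text} array per log entry.
    public static List<String[]> parse(String wholeFile) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(wholeFile);
        while (m.find()) {
            entries.add(new String[] {
                m.group("timestamp"), m.group("level"), m.group("class"),
                m.group("thread"), m.group("text").trim()
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        String log = "2017-03-14 22:43:14,405 FATAL [org.example.A]-[main] Refreshing context\n"
                + "2017-03-14 22:43:14,476 INFO [org.example.B]-[main] Here is a multiline\n"
                + "log entry with no entries after\n";
        for (String[] e : parse(log)) {
            System.out.println(String.join(" | ", e));
        }
    }
}
```

One caveat with this approach: a message body that itself contains text in the timestamp format would be cut at that point, since the lookahead stops at the first such occurrence.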
In my program I need to loop through a variety of dates. I am writing this program in java, and have a bit of experience with readers, but I do not know which reader would complete this task the best, or if another class would work better.
The dates would be input into a text file in the format as follows:
1/1/2013 to 1/7/2013
1/8/2013 to 1/15/2013
Or something of this manner. I would need to break each range of dates into 6 local variables for the loop, then change them for the next loop. The variables would be coded for example:
private static String startingMonth = "1";
private static String startingDay = "1";
private static String startingYear = "2013";
private static String endingMonth = "1";
private static String endingDay = "7";
private static String endingYear = "2013";
I imagine this could be done by creating several delimiters to look for, but I do not know whether that would be the easiest way. I have been looking at this post for help, but can't seem to find a relevant answer. What would be the best way to go about this?
There are several options.
You could use a Scanner and set the delimiter to include the slash. If you want the values as ints rather than strings, use sc.nextInt() instead of sc.next().
// "\\s+" rather than "\\s*": a delimiter that can match the empty string would split every character.
Scanner sc = new Scanner(input).useDelimiter("\\s+|/");
// You can skip the loop to just read a single line.
while(sc.hasNext()) {
startingMonth = sc.next();
startingDay = sc.next();
startingYear = sc.next();
// skip "to"
sc.next();
endingMonth = sc.next();
endingDay = sc.next();
endingYear = sc.next();
}
You can use a regex, as alfasin suggests, but this case is simple enough that you can just split on the first and last space.
String str = "1/1/2013 to 1/7/2013";
String startDate = str.substring(0,str.indexOf(" "));
String endDate = str.substring(str.lastIndexOf(" ")+1);
// The rest is the same:
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);
String str = "1/1/2013 to 1/7/2013";
Pattern pattern = Pattern.compile("(\\d+/\\d+/\\d+)");
Matcher matcher = pattern.matcher(str);
matcher.find();
String startDate = matcher.group();
matcher.find();
String endDate = matcher.group();
String[] start = startDate.split("/");
System.out.println(start[0] + "-" + start[1] + "-" + start[2]);
String[] end = endDate.split("/");
System.out.println(end[0] + "-" + end[1] + "-" + end[2]);
...
OUTPUT
1-1-2013
1-7-2013
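To loop over several ranges from a file, the regex approach can be extended to capture all six values per line. The sketch below is one possible way to do it, assuming every line has the form "M/D/YYYY to M/D/YYYY"; DateRangeReader and readRanges are hypothetical names, and lines that do not match are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRangeReader {
    // Groups 1-3: starting month/day/year; groups 4-6: ending month/day/year.
    private static final Pattern RANGE = Pattern.compile(
            "(\\d{1,2})/(\\d{1,2})/(\\d{4})\\s+to\\s+(\\d{1,2})/(\\d{1,2})/(\\d{4})");

    // Returns one six-element array per valid line of the input text.
    public static List<String[]> readRanges(String text) {
        List<String[]> ranges = new ArrayList<>();
        for (String line : text.split("\\R")) {
            Matcher m = RANGE.matcher(line.trim());
            if (m.matches()) {
                String[] parts = new String[6];
                for (int i = 0; i < parts.length; i++) {
                    parts[i] = m.group(i + 1);
                }
                ranges.add(parts);
            }
        }
        return ranges;
    }

    public static void main(String[] args) {
        String input = "1/1/2013 to 1/7/2013\n1/8/2013 to 1/15/2013";
        for (String[] r : readRanges(input)) {
            System.out.println(r[0] + "-" + r[1] + "-" + r[2] + " to " + r[3] + "-" + r[4] + "-" + r[5]);
        }
    }
}
```

Each array can then be assigned to startingMonth, startingDay, startingYear, endingMonth, endingDay and endingYear inside the loop.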