I have a .txt file that I read through a BufferedReader, and I need to extract the last part of this string (the file name). The line is below:
<path
action="m"
text-mod="true"
mods="true"
kind="file">branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>
I have the following code, which takes the entire line and adds it to a list, but I only need CsoorspeWsVo:
while ((line = bufferdReader.readLine()) != null) {
Excel4 excel4 = new Excel4();
if (line.contains("</path>")) {
int index1 = line.indexOf(">");
int index2 = line.lastIndexOf("<");
line = line.substring(index1, index2);
excel4.setName(line);
listExcel4.add(excel4);
}
}
I only want to extract CsoorspeWsVo from here.
Can anyone help me? Thanks.
You can use regex capturing groups to get it, for example:
public static void main(String []args){
String input = "<path\n" +
" action=\"m\"\n" +
" text-mod=\"true\"\n" +
" mods=\"true\"\n" +
" kind=\"file\">branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>\n";
Pattern pattern = Pattern.compile("kind=\"file\">.+/(.+\\..+)</path>");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
String fileName = matcher.group(1);
System.out.println(fileName);
}
}
The output will be: CsoorspeWsVo.java
If you want the full path, change the regex to
Pattern pattern = Pattern.compile("kind=\"file\">(.+)</path>");
The output will be:
branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java
You can also get the name and the extension in two separate groups, for example:
Pattern pattern = Pattern.compile("kind=\"file\">.+/(.+)\\.(.+)</path>");
And inside the if:
String fileName = matcher.group(1);
String fileExtension = matcher.group(2);
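Combined with the reading loop from the question, this could be sketched as follows. The class name `PathNameExtractor` and the helpers `extractClassName` and `extractAll` are made up for illustration, and a `StringReader` stands in for the real file. The pattern only assumes that the path and the closing `</path>` sit on the same line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PathNameExtractor {
    // Last path segment before "</path>": name in group 1, extension in group 2.
    private static final Pattern FILE_NAME =
            Pattern.compile("([^/>]+)\\.([^./<]+)</path>");

    // Returns the file name without its extension, or null if the line has no match.
    static String extractClassName(String line) {
        Matcher m = FILE_NAME.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // Reads line by line, as in the question, and collects every extracted name.
    static List<String> extractAll(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String name = extractClassName(line);
            if (name != null) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String text = "<path\n"
                + "   action=\"m\"\n"
                + "   kind=\"file\">branches/RO/2021Align01/CO/DGSIG-DAO/src/main/java/eu/ca/co/vo/CsoorspeWsVo.java</path>\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            System.out.println(extractAll(reader)); // prints [CsoorspeWsVo]
        }
    }
}
```

In the question's loop, the returned name would be passed to `excel4.setName(...)` instead of the whole line.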
I have a very long text and I'm extracting some specific values that are followed by some particular words. Here's an example of my long text:
.........
FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]
.........
TotalFrames[ValMin: 100000, ValMax:200000]
.........
MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]
.........
here's my code:
File file = filePath.toFile();
JSONObject jsonObject = new JSONObject();
String FPSMin="";
String FPSMax="";
String TotalFramesMin="";
String TotalFramesMax="";
String MemUsageMin="";
String MemUsageMax="";
String log = "my//log//file";
final Matcher matcher = Pattern.compile("FPS/\(FramesPerSecond/\)/\[ValMin:");
if(matcher.find()){
FPSMin= matcher.end().trim();
}
But I can't make it work. Where am I wrong? Basically I need to select, for each String, the corresponding values (max and min) coming from that long text and store them into the variables. Like
FPSMin = 29.0000
FPSMax = 35.0000
FramesMin = 100000
Etc
Thank you
EDIT:
I tried the following code (in a test case) to see if the solution could work, but I'm experiencing issues because I can't print anything except an object. Here's the code:
@Test
public void whenReadLargeFileJava7_thenCorrect()
throws IOException, URISyntaxException {
Scanner txtScan = new Scanner("path//to//file//test.txt");
String[] FPSMin= new String[0];
String FPSMax= "";
//Read File Line By Line
while (txtScan.hasNextLine()) {
// Print the content on the console
String str = txtScan.nextLine();
Pattern FPSMin= Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = FPSMin.matcher(str);
if(matcher.find()){
String MinMaxFPS= str.substring(matcher.end(), str.length()-1);
String[] splitted = MinMaxFPS.split(",");
FPSMin= splitted[0].split(": ");
FPSMax = splitted[1];
}
System.out.println(FPSMin);
System.out.println(FPSMax);
}
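Maybe your pattern should be like this: ^FPS\\(FramesPerSecond\\)\\[ValMin: (inside a Java string literal, special characters are escaped with a double backslash, not with /\). I've tried it and it works for me.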
String line = "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]";
Pattern pattern = Pattern.compile("^FPS\\(FramesPerSecond\\)\\[ValMin:");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println(line.substring(matcher.end(), line.length()-1));
}
This way, matcher.end() gives you the offset just past the matched prefix, and substring() returns every character from that offset up to line.length()-1 (because you don't want the closing ] character).
The following regular expression will match and capture the name, min and max:
Pattern pattern = Pattern.compile("(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");
Usage (extracting the captured groups):
String input = (".........\n" +
"FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n" +
".........\n" +
"TotalFrames[ValMin: 100000, ValMax:200000]\n" +
".........\n" +
"MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n" +
".........");
for (String s : input.split("\n")) {
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1) + ", " + matcher.group(2) + ", " + matcher.group(3));
}
}
Output:
FPS(FramesPerSecond), 29.0000, 35.000
TotalFrames, 100000, 200000
MemoryUsage(In MB), 190000, 360000
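Since the question asks for the values to be stored rather than printed, the same expression can feed a map keyed by the metric name. This is only a sketch: the class name `MinMaxExtractor`, the method `parse`, and the choice of a `LinkedHashMap` holding `{min, max}` string pairs are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MinMaxExtractor {
    // Group 1: metric name, group 2: minimum, group 3: maximum (unit suffix dropped).
    private static final Pattern LINE = Pattern.compile(
            "(.*)\\[.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*,.+:\\s*(\\d+(?:\\.\\d+)?)[A-Z]*\\]");

    // Maps each metric name to a two-element array {min, max}; other lines are skipped.
    static Map<String, String[]> parse(String text) {
        Map<String, String[]> result = new LinkedHashMap<>();
        for (String s : text.split("\n")) {
            Matcher m = LINE.matcher(s);
            if (m.matches()) {
                result.put(m.group(1), new String[] { m.group(2), m.group(3) });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = ".........\n"
                + "FPS(FramesPerSecond)[ValMin: 29.0000, ValMax: 35.000]\n"
                + ".........\n"
                + "TotalFrames[ValMin: 100000, ValMax:200000]\n"
                + ".........\n"
                + "MemoryUsage(In MB)[ValMin:190000MB, ValMax:360000MB]\n";
        Map<String, String[]> values = parse(input);
        String fpsMin = values.get("FPS(FramesPerSecond)")[0];
        String fpsMax = values.get("FPS(FramesPerSecond)")[1];
        System.out.println(fpsMin + " / " + fpsMax);
    }
}
```

The values stay as strings here; converting them with `Double.parseDouble` or `Long.parseLong` is a separate step once you know which metrics are fractional.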
I need to extract the path segment that follows the date in a URL, in Java.
Input 1:
/test1/raw/2019-06-11/testcustomer/usr/pqr/DATA/mn/export/
Output: testcustomer
Only /raw/ stays constant; the date and the customer name (testcustomer) will change.
Input 2:
/test3/raw/2018-09-01/newcustomer/usr/pqr/DATA/mn/export/
Output: newcustomer
String url = "/test3/raw/2018-09-01/newcustomer/usr/pqr/DATA/mn/export/";
String customer = getCustomer(url);
public String getCustomer (String _url){
String source = "default";
String regex = basePath + "/raw/\\d{4}-\\d{2}-\\d{2}/usr*";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(_url);
if (m.find()) {
source = m.group(1);
} else {
logger.error("Cant get customer with regex " + regex);
}
return source;
}
It's returning 'default' :(
Your regex /raw/\\d{4}-\\d{2}-\\d{2}/usr* has no capturing group for the value you want, and it expects /usr directly after the date, so it never matches. You need a regex that finds the date and captures the segment that follows:
/\w*/raw/[0-9-]+/(\w+)/.* or (?<=raw\/\d{4}-\d{2}-\d{2}\/)(\w+) will work.
Pattern p = Pattern.compile("/\\w*/raw/[0-9-]+/(\\w+)/.*");
Matcher m = p.matcher(str);
if (m.find()) {
String value = m.group(1);
System.out.println(value);
}
Or, if it's always the 4th path segment, use split() (index 4, because the leading / produces an empty first element):
String value = str.split("/")[4];
System.out.println(value);
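Applied to the `getCustomer` method from the question, the fix could be sketched like this. The class name `CustomerExtractor` is made up for illustration, the `basePath` prefix and the logger call are left out, and the pattern assumes the customer is the single path segment directly after the date.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomerExtractor {
    // "/raw/", a yyyy-MM-dd date, then the customer segment captured in group 1.
    private static final Pattern CUSTOMER =
            Pattern.compile("/raw/\\d{4}-\\d{2}-\\d{2}/([^/]+)/");

    // Returns the segment after the date, or "default" when the URL does not match.
    static String getCustomer(String url) {
        Matcher m = CUSTOMER.matcher(url);
        return m.find() ? m.group(1) : "default";
    }

    public static void main(String[] args) {
        System.out.println(getCustomer("/test1/raw/2019-06-11/testcustomer/usr/pqr/DATA/mn/export/"));
        System.out.println(getCustomer("/test3/raw/2018-09-01/newcustomer/usr/pqr/DATA/mn/export/"));
    }
}
```

Unlike the `split("/")[4]` variant, this does not depend on how many segments precede `/raw/`.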
Here, we can use raw followed by the date as a left boundary, collect the desired output in a capturing group, then add a slash and consume the rest of the string, with an expression similar to:
.+raw\/[0-9]{4}-[0-9]{2}-[0-9]{2}\/(.+?)\/.+
Test
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = ".+raw\\/[0-9]{4}-[0-9]{2}-[0-9]{2}\\/(.+?)\\/.+";
final String string = "/test1/raw/2019-06-11/testcustomer/usr/pqr/DATA/mn/export/\n"
+ "/test3/raw/2018-09-01/newcustomer/usr/pqr/DATA/mn/export/";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
RegEx
If this expression wasn't desired or you wish to modify it, please visit regex101.com.
I'm using regular expressions to parse logs. I was previously reading the File into a string array, and then iterating through the string array appending if I don't match the timestamp, otherwise I add the line I'm iterating on to a variable and continue the search. Once I get a complete log entry, I use another regular expression to parse it.
Scanning file
try {
List<String> lines = Files.readAllLines(filepath);
Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
Matcher matcher;
String currentEntry = "";
for(String line : lines) {
matcher = pattern.matcher(line);
// If this is a new entry, then wrap up the previous one and start again
if ( matcher.lookingAt() ) {
// If the previous entry was not empty
if(!StringUtils.trimWhitespace(currentEntry).isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
// Clear the current entry
currentEntry = "";
}
if (!currentEntry.trim().isEmpty())
currentEntry += "\n";
currentEntry += line;
}
// At the end, if we have one leftover entry, add it
if (!currentEntry.isEmpty()) {
entries.add(new LogEntry(currentEntry));
}
}catch (Exception ex){
return null;
}
Parsing entry
final private static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final private static String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final private static String classRgx = "\\[(?<class>[^]]+)\\]";
final private static String threadRgx = "\\[(?<thread>[^]]+)\\]";
final private static String textRgx = "(?<text>.*)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
public LogEntry(String logText) {
try {
Matcher matcher = PatternFullLog.matcher(logText);
matcher.find();
String dateStr = matcher.group("timestamp");
timestamp = new DateLogLevel();
timestamp.parseLogDate(dateStr);
String levelStr = matcher.group("level");
loglevel = LOG_LEVEL.valueOf(levelStr);
String fullClassStr = matcher.group("class");
String[] classNameArray = fullClassStr.split("\\.");
framework = classNameArray[2];
classname = classNameArray[classNameArray.length - 1];
threadname = matcher.group("thread");
logtext = matcher.group("text");
notes = "";
} catch (Exception ex) {
throw ex;
}
}
What I want to figure out
What I really want to do is read the whole file as a single string, then use a single regex to parse this out line by line, using a single regular expression once. My plan was to use the same expression I use in the constructor, but when looking for the log text make it end at either EOF or the next log line, as such
final String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final String levelRgx = "(?<level>(?>INFO|ERROR|WARN|TRACE|DEBUG|FATAL))";
final String classRgx = "\\[(?<class>[^]]+)\\]";
final String threadRgx = "\\[(?<thread>[^]]+)\\]";
final String textRgx = "(?<text>.*[^(\Z|\\d{4}\-\\d{2}\-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})"; // change to handle multiple lines
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx + "$", Pattern.DOTALL);
try {
// Read file into string
String lines = readFile(filepath);
Pattern pattern = Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}");
Matcher matcher;
matcher = pattern.matcher(line);
while(matcher.find())
String dateStr = matcher.group("timestamp");
timestamp = new DateLogLevel();
timestamp.parseLogDate(dateStr);
String levelStr = matcher.group("level");
loglevel = LOG_LEVEL.valueOf(levelStr);
String fullClassStr = matcher.group("class");
String[] classNameArray = fullClassStr.split("\\.");
framework = classNameArray[2];
classname = classNameArray[classNameArray.length - 1];
threadname = matcher.group("thread");
logtext = matcher.group("text");
entries.add(
new LogEntry(
timestamp,
loglevel,
framework,
threadname,
logtext,
""/* Notes are empty when importing new file */));
}
}
}catch (Exception ex){
return null;
}
The problem is that I can't seem to get the last group (textRgx) to multiline match until either a timestamp or end of file. Does anyone have any thoughts?
Sample Log Entries
2017-03-14 22:43:14,405 FATAL [org.springframework.web.context.support.XmlWebApplicationContext]-[localhost-startStop-1] Refreshing Root WebApplicationContext: startup date [Tue Mar 14 22:43:14 UTC 2017]; root of context hierarchy
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Loading XML bean definitions from Serv
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with another entry after
2017-03-14 22:43:14,476 INFO [org.springframework.beans.factory.xml.XmlBeanDefinitionReader]-[localhost-startStop-1] Here is a multiline
log entry with no entries after
You need to define the patterns like
final static String timestampRgx = "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})";
final static String levelRgx = "(?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)";
final static String classRgx = "\\[(?<class>[^\\]]+)]";
final static String threadRgx = "\\[(?<thread>[^\\]]+)]";
final static String textRgx = "(?<text>.*?)(?=\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}|\\Z)";
private static Pattern PatternFullLog = Pattern.compile(timestampRgx + " " + levelRgx + "\\s+" + classRgx + "-" + threadRgx + "\\s+" + textRgx, Pattern.DOTALL);
Then, you may use that like
Matcher matcher = PatternFullLog.matcher(line);
Here is what the pattern looks like:
(?<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\s+\[(?<class>[^\]]+)]-\[(?<thread>[^\]]+)]\s+(?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z)
Some notes:
You had several issues with escaping symbols (] inside a character class must be escaped, and \- should have been replaced with -).
The pattern to match text up to the datetime or end of string is (?<text>.*?)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}|\Z) where .*? matches any char, 0+ occurrences, reluctantly, up to the first occurrence of the timestamp pattern (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) or end of string (\Z).
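To make the behaviour concrete, here is a small self-contained sketch that runs the pattern over a whole log held in one string. The class name `MultiLineLogDemo` and the `String[]` return type are placeholders for illustration; in the question's code each match would be turned into a `LogEntry` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiLineLogDemo {
    private static final String TS = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}";

    // DOTALL lets ".*?" cross line breaks; the lookahead stops it at the next timestamp or at the end.
    private static final Pattern ENTRY = Pattern.compile(
            "(?<timestamp>" + TS + ") (?<level>INFO|ERROR|WARN|TRACE|DEBUG|FATAL)\\s+"
            + "\\[(?<class>[^\\]]+)]-\\[(?<thread>[^\\]]+)]\\s+"
            + "(?<text>.*?)(?=" + TS + "|\\Z)", Pattern.DOTALL);

    // Returns one {timestamp, level, thread, text} array per log entry.
    static List<String[]> parse(String log) {
        List<String[]> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(log);
        while (m.find()) {
            entries.add(new String[] {
                m.group("timestamp"), m.group("level"), m.group("thread"),
                m.group("text").trim() // drop the line break before the next entry
            });
        }
        return entries;
    }

    public static void main(String[] args) {
        String log = "2017-03-14 22:43:14,405 FATAL [org.example.A]-[localhost-startStop-1] Refreshing\n"
                + "2017-03-14 22:43:14,476 INFO [org.example.B]-[localhost-startStop-1] Here is a multiline\n"
                + "log entry with no entries after\n";
        for (String[] e : parse(log)) {
            System.out.println(e[1] + " | " + e[3].replace("\n", "\\n"));
        }
    }
}
```

One caveat worth keeping in mind: a message that itself contains text shaped like the timestamp would be cut at that point, because the lookahead cannot tell it apart from the start of a new entry.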
I have this string:
String line = "GET /MyFile.extension HTTP/1.1\n\n";
I want to get only the file name, MyFile.extension. I tried this, but the problem is that the HTTP version could change.
String fileName = line.replace("GET /", "");
fileName = fileName.replace(" HTTP/1.1", "");
This doesn't work either:
string fileName = line.indexOf("MyFile.extension");
I don't know the file name in advance either; it could be any file. Is there a way to get the text between the strings "GET /" and " HTTP/"?
You can simply do this: line.split(" ")[1].substring(1)
Here is the code snippet:
public static void main (String[] args)
{
String line = "GET /MyFile.extension HTTP/1.1\n\n";
System.out.println(line.split(" ")[1].substring(1));
}
Output:
MyFile.extension
public static void main(String []args)
{
String line = "GET /MyFile.extension HTTP/1.1\n\n";
// To find the index of "/"
int start = line.indexOf("/");
// To find the index of space from int start which I got from the line above
int end = line.indexOf(" ", start);
// To extract the given string from the start+1 index to the end index
String s = line.substring(start+1, end);
System.out.println(s);
}
Output :
MyFile.extension
You could use a regular expression to get the inner value. Leaving the version number out of the pattern keeps it working when the HTTP version changes:
Pattern p = Pattern.compile("GET /(.*?) HTTP/");
Matcher m = p.matcher(line);
if (m.find()) {
System.out.println(m.group(1)); // MyFile.extension
}
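As a rough sketch of the regular-expression approach wrapped in a reusable method: the class name `RequestLineDemo` and the method `fileName` are made up for illustration, and the pattern assumes a `GET` request whose target contains no spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RequestLineDemo {
    // Captures everything between "GET /" and " HTTP/", whatever the version is.
    private static final Pattern REQUEST = Pattern.compile("^GET /(\\S*) HTTP/");

    // Returns the requested file name, or null if the line is not a GET request line.
    static String fileName(String line) {
        Matcher m = REQUEST.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fileName("GET /MyFile.extension HTTP/1.1\n\n")); // prints MyFile.extension
        System.out.println(fileName("GET /other.txt HTTP/2"));              // prints other.txt
    }
}
```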
I am trying to find a regular expression that would match the following format:
path/*.file_extension
For example:
temp/*.jpg
usr/*.pdf
var/lib/myLib.so
tmp/
Using the regex, I want to store the matching parts into a String array, such as:
String[] tokens;
// regex magic here
String path = tokens[0];
String filename = tokens[1];
String extension = tokens[2];
In the last case, tmp/, which contains no filename and no extension, tokens[1] and tokens[2] would be null.
In the case of:
usr/*.pdf
tokens[1] would contain only the string "*".
Thank you very much for your help.
If you can use Java 7, then you can use named groups like this:
String data = "temp/*.jpg, usr/*.pdf, var/lib/*.so, tmp/*, usr/*, usr/*.*";
Pattern p = Pattern
.compile("(?<path>(\\w+/)+)((?<name>\\w+|[*]))?([.](?<extension>\\w+|[*]))?");
Matcher m = p.matcher(data);
while (m.find()) {
System.out.println("data=" + m.group());
System.out.println("path=" + m.group("path"));
System.out.println("name=" + m.group("name"));
System.out.println("extension=" + m.group("extension"));
System.out.println("------------");
}
This code should work:
String line = "var/lib/myLib.so";
Pattern p = Pattern.compile("(.+?(?=/[^/]*$))/([^.]+)\\.(.+)$");
Matcher m = p.matcher(line);
List<String> tokens = new ArrayList<String>();
if (m.find()) {
for (int i=1; i <= m.groupCount(); i++) {
tokens.add(m.group(i));
}
}
System.out.println("Tokens => " + tokens);
OUTPUT:
Tokens => [var/lib, myLib, so]
I'm assuming you're using Java. This should work:
Pattern.compile("path/(.*?)(?:\\.(file_extension))?");
Why use a regular expression?
I personally find lastIndexOf more readable.
String path;
String filename;
@Nullable String extension;
// Look for the last slash; everything up to and including it is the path
int lastSlash = fullPath.lastIndexOf('/');
path = fullPath.substring(0, lastSlash + 1);
// Look for the last dot; it only counts if it comes after the last slash
int lastDot = fullPath.lastIndexOf('.');
if (lastDot <= lastSlash) {
filename = fullPath.substring(lastSlash + 1);
// If there is no dot, then there is no extension which
// is distinct from the empty extension in "foo/bar."
extension = null;
} else {
filename = fullPath.substring(lastSlash + 1, lastDot);
extension = fullPath.substring(lastDot + 1);
}
As a different approach, simple use of the substring() and lastIndexOf() methods should serve the purpose:
String filePath = "var/lib/myLib.so";
// File name including the extension, e.g. "myLib.so"
String fullName = filePath.substring(filePath.lastIndexOf('/')+1);
String path = filePath.substring(0, filePath.lastIndexOf('/'));
String fileName = fullName.substring(0, fullName.lastIndexOf('.'));
String extension = fullName.substring(fullName.lastIndexOf('.')+1);
Please note: you need to handle the alternate scenarios, e.g. a file path without an extension.
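One way to cover those alternate scenarios and return the three-element array the question asks for is sketched below. The class name `PathTokens` and the method `split` are hypothetical, and keeping the trailing slash in the path token is a choice made here, not something the question specifies.

```java
import java.util.Arrays;

public class PathTokens {
    // Returns {path, filename, extension}; filename and extension are null when absent.
    static String[] split(String fullPath) {
        int lastSlash = fullPath.lastIndexOf('/');
        String path = fullPath.substring(0, lastSlash + 1); // keeps the trailing slash
        String rest = fullPath.substring(lastSlash + 1);
        if (rest.isEmpty()) {
            return new String[] { path, null, null };       // e.g. "tmp/"
        }
        int dot = rest.lastIndexOf('.');
        if (dot < 0) {
            return new String[] { path, rest, null };       // file name without extension
        }
        return new String[] { path, rest.substring(0, dot), rest.substring(dot + 1) };
    }

    public static void main(String[] args) {
        for (String p : new String[] { "temp/*.jpg", "usr/*.pdf", "var/lib/myLib.so", "tmp/" }) {
            System.out.println(p + " => " + Arrays.toString(split(p)));
        }
    }
}
```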