Java Regex: Extracting a Version Number - java

I have a program that stores its version number in a text file on the file system. I read the file in Java and want to extract the version number. I'm not very good with regex, so I am hoping someone can help.
The text file looks like this:
0=2.2.5 BUILD (tons of other junk here)
I want to extract 2.2.5 and nothing else. Can someone help me with the regex for this?

If you know the structure, you don't need a regex:
String line = "0=2.2.5 BUILD (tons of other junk here)";
String versionNumber = line.split(" ", 2)[0].substring(2);
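The substring(2) call assumes the key before "=" is exactly one character ("0="). A small variation that also tolerates longer keys such as "12=" is sketched below; the class and method names are only illustrative, and it still assumes the line contains an "=" followed by the version and a space.

```java
public class SplitVersion {
    /** Returns the text between the first '=' and the next space (or the end of the line). */
    static String versionOf(String line) {
        // Assumes the line contains '='; everything after it is "2.2.5 BUILD ...".
        String afterKey = line.substring(line.indexOf('=') + 1);
        // Limit 2: split only at the first space, the version is the first piece.
        return afterKey.split(" ", 2)[0];
    }

    public static void main(String[] args) {
        System.out.println(versionOf("0=2.2.5 BUILD (tons of other junk here)")); // 2.2.5
        System.out.println(versionOf("12=3.0.1 BUILD"));                          // 3.0.1
    }
}
```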

This regular expression should do the trick:
(?<==)\d+\.\d+\.\d+(?=\s*BUILD)
Trying it out:
String s = "0=2.2.5 BUILD (tons of other junk here)";
Pattern p = Pattern.compile("(?<==)\\d+\\.\\d+\\.\\d+(?=\\s*BUILD)");
Matcher m = p.matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
2.2.5
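Since the question starts from a file on disk, here is a hedged end-to-end sketch that reads the file line by line and applies the same lookbehind/lookahead idea: (?<==) requires an "=" before the match and (?=\s*BUILD) requires "BUILD" after it, without either becoming part of the result. The file name version.txt is a placeholder, and the pattern is slightly generalized to \d+(?:\.\d+)+ so versions with two or four components also match; that change is an assumption, not part of the answer above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VersionReader {
    // Digits separated by dots, preceded by "=" and followed by optional whitespace and "BUILD".
    private static final Pattern VERSION =
            Pattern.compile("(?<==)\\d+(?:\\.\\d+)+(?=\\s*BUILD)");

    /** Returns the first version number found in the line, if any. */
    static Optional<String> extractVersion(String line) {
        Matcher m = VERSION.matcher(line);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default file name; pass the real path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "version.txt");
        for (String line : Files.readAllLines(path)) {
            extractVersion(line).ifPresent(System.out::println);
        }
    }
}
```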

If you really are looking for a regex, here is one option, though there are definitely many ways to do this:
String line = "0=2.2.5 BUILD (tons of other junk here)";
Matcher matcher = Pattern.compile("^\\d+=((\\d|\\.)+)").matcher(line);
if (matcher.find())
System.out.println(matcher.group(1));
Output:
2.2.5

There are many ways to do this. Here is one of them:
String data = "0=2.2.5 BUILD (tons of other junk here)";
Matcher m = Pattern.compile("\\d+=(\\d+([.]\\d+)+) BUILD").matcher(data);
if (m.find())
System.out.println(m.group(1));
If you are sure that the data contains a version number, you can also use plain string methods:
System.out.println(data.substring(data.indexOf('=')+1,data.indexOf(' ')));

Related

Java replace all regex matches

I have a field htmlBody that contains the HTML of a web page. I want to find all relative links ending in .html and remove their extension. I do not want htmlBody.replaceAll(".html", "") because it would affect every link and break some external ones. So my approach is to find every occurrence that matches a regex, remove the extension from each one with replaceAll(), and append the result to sb. I tried to follow the example from the official documentation, but it does not change any link. What could be the problem?
StringBuilder sb = new StringBuilder();
Pattern p = Pattern.compile("^\\/(.+\\\\)*(.+).(html)$");
Matcher m = p.matcher(htmlBody);
while (m.find()) {
String updatedLink = m.group().replaceAll(".html", "");
m.appendReplacement(sb, updatedLink);
}
m.appendTail(sb);
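Your regex is wrong: without the MULTILINE flag, ^ matches only the start of the whole input and $ only its end.
So the matcher in your code will never find a link embedded in a larger HTML document.
A corrected regex would be something like: Pattern p = Pattern.compile("['\"]\\/(.+\\\\)*(.+).(html)");
However, it can't match unquoted links such as <a href=/a.html>.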
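A hedged sketch of the complete appendReplacement loop, assuming a "relative link" means a quoted href value that starts with "/". The pattern and class name are illustrative rather than taken from the answer. Matcher.quoteReplacement keeps any "$" or "\" in a link from being read as a group reference, and the StringBuilder overloads of appendReplacement/appendTail require Java 9 or later (use StringBuffer on Java 8).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelativeLinkCleaner {
    // Group 1: href=" or href='   Group 2: path without ".html"   Group 3: closing quote.
    private static final Pattern RELATIVE_HTML_LINK =
            Pattern.compile("(href=[\"'])(/[^\"'\\s]*)\\.html([\"'])");

    static String stripHtmlExtensions(String htmlBody) {
        Matcher m = RELATIVE_HTML_LINK.matcher(htmlBody);
        StringBuilder sb = new StringBuilder(); // StringBuilder overloads need Java 9+
        while (m.find()) {
            String updatedLink = m.group(1) + m.group(2) + m.group(3);
            // Treat the link as literal text, not as a replacement template.
            m.appendReplacement(sb, Matcher.quoteReplacement(updatedLink));
        }
        m.appendTail(sb); // copy everything after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs/intro.html\">Intro</a> "
                + "<a href=\"https://example.com/page.html\">External</a>";
        // The relative link loses ".html"; the absolute one is left untouched.
        System.out.println(stripHtmlExtensions(html));
    }
}
```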

How to replace a given substring with "" from a given string?

I went through a couple of examples of replacing a given substring in a string with "", but could not get it to work. The string is too long to post, but it contains the following substring:
/image/journal/article?img_id=24810&t=1475128689597
I want to replace this substring with "". The values of img_id and t can vary, so I have to use a regular expression. I tried the following code:
String regex="^/image/journal/article?img_id=([0-9])*&t=([0-9])*$";
content=content.replace(regex,"");
Here content is the original string. But this code does not actually replace anything in content. Any help would be appreciated; thanks in advance.
Use replaceAll, which interprets its first argument as a regular expression (replace treats it as literal text):
content=content.replaceAll("[0-9]*","");
Code
String content="/image/journal/article?img_id=24810&t=1475128689597";
content=content.replaceAll("[0-9]*","");
System.out.println(content);
Output:
/image/journal/article?img_id=&t=
Update: a simpler, if less precise, option is to remove everything from /image onward:
String content="sas/image/journal/article?img_id=24810&t=1475128689597";
content=content.replaceAll("\\/image.*","");
System.out.println(content);
Output:
sas
If there is more text after the timestamp, e.g. t=1475128689597/?tag=343sdds, and you want to keep ?tag=343sdds, use:
String content="sas/image/journal/article?img_id=24810&t=1475128689597/?tag=343sdds";
content=content.replaceAll("(\\/image.*[0-9]+[\\/])","");
System.out.println(content);
Output:
sas?tag=343sdds
If you're trying to replace the numbers in the URL with two quotation marks, like so:
/image/journal/article?img_id=""&t=""
Then you need to put the escaped quotes \"\" in the replacement string, change your regex to look only for the numbers, and use replaceAll:
content=content.replaceAll(regex,"\"\"");
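The answer does not show the edited regex, so the one below, (img_id=|&t=)\d+, is an assumption. It captures the parameter name in group 1, re-inserts it with $1, and replaces only the digits with two literal double quotes ("" has no special meaning in a replacement string).

```java
public class BlankOutIds {
    static String blankNumbers(String content) {
        // Group 1 keeps "img_id=" or "&t="; the digits after it become "".
        return content.replaceAll("(img_id=|&t=)\\d+", "$1\"\"");
    }

    public static void main(String[] args) {
        String content = "/image/journal/article?img_id=24810&t=1475128689597";
        System.out.println(blankNumbers(content));
        // prints /image/journal/article?img_id=""&t=""
    }
}
```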
You can use the java.util.regex classes to replace the matched text with "" (or any other string) based on a pattern, as follows:
String content = "ALPHA_/image/journal/article?img_id=24810&t=1475128689597_BRAVO";
String regex = "\\/image\\/journal\\/article\\?img_id=\\d+&t=\\d+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
String replacement = matcher.replaceAll("PK");
System.out.println(replacement); // Will print ALPHA_PK_BRAVO
}
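For comparison, the original attempt from the question can also be fixed in a single line. A hedged sketch, assuming the URL appears somewhere inside a longer content string: String.replace treats its argument literally, so use replaceAll; escape the "?" (a quantifier in regex); use \d+ for the multi-digit values; and drop ^ and $, which would anchor the match to the whole string.

```java
public class RemoveImageUrl {
    // "?" escaped, \d+ for each varying number, no ^/$ anchors.
    static final String IMAGE_URL = "/image/journal/article\\?img_id=\\d+&t=\\d+";

    static String removeImageUrls(String content) {
        return content.replaceAll(IMAGE_URL, "");
    }

    public static void main(String[] args) {
        String content = "ALPHA_/image/journal/article?img_id=24810&t=1475128689597_BRAVO";
        System.out.println(removeImageUrls(content)); // prints ALPHA__BRAVO
    }
}
```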

Error while using FileUtils

I am using FileUtils from Apache Commons IO to read a file, and then searching for the text between two strings with the following code:
Pattern p = Pattern.compile(Pattern.quote(fromDate) + "(.*?)" + Pattern.quote(toDate));
try {
Matcher m = p.matcher(fileContent);
while (m.find()) {
System.out.println(m.group(1));
The problem is that it only produces output when both strings are on the same line; there is no output when they are on different lines. I read the content of the whole file into a String variable, fileContent.
By default, the dot does not match line terminators, so .*? cannot span multiple lines. Pass Pattern.DOTALL as the second argument, like so:
Pattern p = Pattern.compile(Pattern.quote(fromDate) + "(.*?)" + Pattern.quote(toDate), Pattern.DOTALL);
This question also has a good explanation of how it works: Match multiline text using regular expression
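A hedged, self-contained sketch of the difference DOTALL makes. The markers and content are made-up stand-ins for fromDate, toDate and the file contents; the flags parameter is only there so both modes can be compared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenMarkers {
    /** Returns every piece of text found between fromMarker and the next toMarker. */
    static List<String> between(String fileContent, String fromMarker, String toMarker, int flags) {
        // Pattern.quote makes the markers literal, as in the question.
        Pattern p = Pattern.compile(
                Pattern.quote(fromMarker) + "(.*?)" + Pattern.quote(toMarker), flags);
        Matcher m = p.matcher(fileContent);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        // Hypothetical content: the two markers sit on different lines.
        String content = "2012-01-01 first line\nsecond line 2012-01-31";
        System.out.println(between(content, "2012-01-01", "2012-01-31", 0).size()); // 0
        System.out.println(between(content, "2012-01-01", "2012-01-31", Pattern.DOTALL));
    }
}
```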
Alternatively, put the embedded flag (?s) at the start of the regex, so your new regex would be:
"(?s)" + Pattern.quote(fromDate) + "(.*?)" + Pattern.quote(toDate)
By default, . does not match a line terminator such as \n.
(?s) enables DOTALL mode, so . also matches \n and the match can span several lines.
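A small sketch comparing the embedded flag with the plain pattern. The flag must be written as (?s); a pattern such as "(.*?s)" is not a flag at all but lazily matches up to the first letter s. The helper name firstGroup is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineDotallFlag {
    /** Returns group 1 of the first match of regex in text, or null if nothing matches. */
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "FROM some text\nspanning lines TO";
        // (?s) at the start enables DOTALL, same as passing Pattern.DOTALL.
        System.out.println(firstGroup("(?s)FROM(.*?)TO", text) != null); // true
        // Without the flag, .*? cannot cross the \n, so nothing matches.
        System.out.println(firstGroup("FROM(.*?)TO", text) != null);     // false
        // "(.*?s)" matches lazily up to the first letter "s".
        System.out.println(firstGroup("(.*?s)", text));                  // FROM s
    }
}
```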

How to get (split) filenames from string in java?

I have a string that contains file names like:
"file1.txt file2.jpg tricky file name.txt other tricky filenames containing áéíőéáóó.gif"
How can I get the file names, one by one?
I am looking for the safest, most thorough method, preferably something in standard Java. There must already be a regular expression for this; I am counting on your experience.
Edit: expected results:
"file1.txt", "file2.jpg", "tricky file name.txt", "other tricky filenames containing áéíőéáóó.gif"
Thanks for the help,
Sziro
The regular expression that enrico.bacis suggested, (\S.*?\.\S+), does not work if there are file names with no characters before the ".", such as .project.
Correct pattern would be:
(([^ .]+ +)*\S*\.\S+)
Java program that could extract filenames will look like:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String patternStr = "([^ .]+ +)*\\S*\\.\\S+";
String input = "file1.txt .project file2.jpg tricky file name.txt other tricky filenames containing áéíőéáóó.gif";
Pattern pattern = Pattern.compile(patternStr, Pattern.MULTILINE);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
System.out.println(matcher.group());
}
If you want to use regular expressions you can find all the occurrences of:
(\S.*?\.\S+)
If there are spaces in the file names, it gets trickier.
If you can assume there are no dots (.) inside the file names, you can use the dot to find each individual name, as has been suggested.
If you can't assume that, e.g. my file.new something.txt,
I'd suggest creating a list of acceptable extensions, e.g. .doc, .jpg, .pdf, etc.
I know the list may be long, so it's not ideal. Once you have it, you can look for these extensions and assume that whatever comes before each one is a valid file name.
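A hedged sketch of the extension-whitelist idea: each match lazily takes text up to the first ".<known extension>" that is followed by whitespace or the end of the input, so dots before unknown extensions (like ".new") stay inside the name. The extension list is a hypothetical placeholder, and matching is case-insensitive by choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class FileNameSplitter {
    // Hypothetical whitelist; extend it with the extensions your data actually uses.
    static final List<String> EXTENSIONS = Arrays.asList("txt", "jpg", "gif", "pdf", "doc");

    static List<String> splitFileNames(String input) {
        String alternatives = EXTENSIONS.stream()
                .map(Pattern::quote)               // treat each extension literally
                .collect(Collectors.joining("|"));
        // Start at a non-space character, stop at the first known extension
        // that is followed by whitespace or the end of the input.
        Pattern p = Pattern.compile("\\S.*?\\.(?:" + alternatives + ")(?=\\s|$)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<String> names = new ArrayList<>();
        while (m.find()) {
            names.add(m.group());
        }
        return names;
    }

    public static void main(String[] args) {
        String input = "file1.txt my file.new something.txt photo.JPG";
        splitFileNames(input).forEach(System.out::println);
    }
}
```

The trade-off is the one the answer mentions: any file whose extension is missing from the list is merged with the name that follows it.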
String txt = "file1.txt file2.jpg tricky file name.txt other tricky filenames containing áéíőéáóó.gif";
Pattern pattern = Pattern.compile("\\S.*?\\.\\S+"); // Get regex from enrico.bacis
Matcher matcher = pattern.matcher(txt);
while (matcher.find()) {
System.out.println(matcher.group().trim());
}

How to choose text from a file

I have a text file with lines like:
"GET /opacial/index.php?op=results&catalog=1&view=1&language=el&numhits=10&query=\xce\x95\xce\xbb\xce\xbb\xce\xac\xce\xb4\xce\xb1%20--%20\xce\x95\xce\xb8\xce\xbd\xce\xb9\xce\xba\xce\xad\xcf\x82%20\xcf\x83\xcf\x87\xce\xad\xcf\x83\xce\xb5\xce\xb9\xcf\x82%20--%20\xce\x99\xcf\x83\xcf\x84\xce\xbf\xcf\x81\xce\xaf\xce\xb1&search_field=11&page=1
I want to extract all the characters after "query=" and before "&search".
I am trying to extract this data using patterns, but something is wrong. Can you give me an example for the line above?
EDIT:
Another problem, besides the one above, is that a Matcher only works on a CharSequence, and I have a file, which cannot be cast to a CharSequence.
Something like this?
String yourNewText = yourOldText.split("query=")[1].split("&search")[0];
There are several ways to read a file into a String (which is a CharSequence), for example java.nio.file.Files.readAllBytes or Files.readAllLines.
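A hedged sketch that covers both parts of the question: it reads the file into a String (a String is a CharSequence, so a Matcher accepts it) and extracts the text between "query=" and "&search" on each line. The default file name access.log is a placeholder, and decoding as UTF-8 is an assumption about the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryExtractor {
    // Lazily captures everything after "query=" up to (not including) "&search".
    private static final Pattern QUERY = Pattern.compile("query=(.*?)&search");

    static String extractQuery(CharSequence line) {
        Matcher m = QUERY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the real file as the first argument.
        String content = new String(
                Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "access.log")),
                StandardCharsets.UTF_8);
        for (String line : content.split("\\R")) { // \R matches any line break
            String query = extractQuery(line);
            if (query != null) {
                System.out.println(query);
            }
        }
    }
}
```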
".*query\\=(.*)\\&search_field.*"
This regex should work to give you a capture of what you want to remove. Then String.replace should do the trick.
Edit, in response to a comment: the following code
String s = "GET /opacial/index.php?op=results&catalog=1&view=1&language=el&numhits=10&query=\\xce\\x95\\xce\\xbb\\xce\\xbb\\xce\\xac\\xce\\xb4\\xce\\xb1%20--%20\\xce\\x95\\xce\\xb8\\xce\\xbd\\xce\\xb9\\xce\\xba\\xce\\xad\\xcf\\x82%20\\xcf\\x83\\xcf\\x87\\xce\\xad\\xcf\\x83\\xce\\xb5\\xce\\xb9\\xcf\\x82%20 --%20\\xce\\x99\\xcf\\x83\\xcf\\x84\\xce\\xbf\\xcf\\x81\\xce\\xaf\\xce\\xb1&search_field=11&page=1";
Pattern p = Pattern.compile(".*query\\=(.*)\\&search_field.*");
Matcher m = p.matcher(s);
if (m.matches()){
String betweenQueryAndSearch = m.group(1);
System.out.println(betweenQueryAndSearch);
}
produced the following output:
\xce\x95\xce\xbb\xce\xbb\xce\xac\xce\xb4\xce\xb1%20--%20\xce\x95\xce\xb8\xce\xbd\xce\xb9\xce\xba\xce\xad\xcf\x82%20\xcf\x83\xcf\x87\xce\xad\xcf\x83\xce\xb5\xce\xb9\xcf\x82%20 --%20\xce\x99\xcf\x83\xcf\x84\xce\xbf\xcf\x81\xce\xaf\xce\xb1
