I have a string that contains file names like:
"file1.txt file2.jpg tricky file name.txt other tricky filenames containing áéíőéáóó.gif"
How can I get the file names, one by one?
I am looking for the safest, most thorough method, preferably something from the Java standard library. There must be a regular expression for this out there already; I am counting on your experience.
Edit: expected results:
"file1.txt", "file2.jpg", "tricky file name.txt", "other tricky filenames containing áéíőéáóó.gif"
Thanks for the help,
Sziro
The regular expression that enrico.bacis suggested, (\S.*?\.\S+), will not work if there are filenames with no characters before the ".", such as .project.
The correct pattern would be:
(([^ .]+ +)*\S*\.\S+)
A Java program that extracts the filenames could look like this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String patternStr = "([^ .]+ +)*\\S*\\.\\S+";
String input = "file1.txt .project file2.jpg tricky file name.txt other tricky filenames containing áéíoéáóó.gif";
// No flags needed: the pattern has no ^ or $ anchors, so MULTILINE would have no effect.
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println(matcher.group()); // one filename per line
}
If you want to use regular expressions, you can find all occurrences of:
(\S.*?\.\S+)
If there are spaces in the file names, it gets trickier.
If you can assume there are no dots (.) in the file names, you can use the dot to find each individual record, as has been suggested.
If you can't assume there are no dots in the file names (e.g. my file.new something.txt), I'd suggest you create a list of acceptable extensions, e.g. .doc, .jpg, .pdf, etc.
I know the list may be long, so it's not ideal. Once you have it, you can look for these extensions and assume that whatever comes before each one is a valid filename.
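The extension-list idea can be sketched as follows. The whitelist, the class name ExtensionSplitter and the assumption that every name ends in a listed extension followed by whitespace or the end of the string are all illustrative choices, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class ExtensionSplitter {
    // Hypothetical whitelist: replace it with the extensions you actually expect.
    static final List<String> EXTENSIONS = List.of("txt", "jpg", "gif", "doc", "pdf");

    static List<String> split(String input) {
        // Alternation of the quoted extensions, e.g. \Qtxt\E|\Qjpg\E|...
        String alternation = EXTENSIONS.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // A name starts at a non-space character and runs lazily up to the first
        // ".<known extension>" that is followed by whitespace or the end of the input.
        Pattern pattern = Pattern.compile(
                "\\S.*?\\.(?:" + alternation + ")(?=\\s|$)", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(input);
        List<String> names = new ArrayList<>();
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints [my file.new something.txt, file2.jpg]
        System.out.println(split("my file.new something.txt file2.jpg"));
    }
}
```

Because ".new" is not in the list, the lazy match keeps going until ".txt", so a dot inside a name no longer splits it. A name whose real extension is missing from the list will be merged with the following one.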
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String txt = "file1.txt file2.jpg tricky file name.txt other tricky filenames containing áéíőéáóó.gif";
Pattern pattern = Pattern.compile("\\S.*?\\.\\S+"); // regex suggested by enrico.bacis
Matcher matcher = pattern.matcher(txt);
while (matcher.find()) {
    System.out.println(matcher.group().trim());
}
I went through a couple of examples of replacing a given substring in a string with "", but could not get the result I wanted. The string is too long to post, but it contains the following substring:
/image/journal/article?img_id=24810&t=1475128689597
I want to replace this substring with "". The values of img_id and t can vary, so I have to use a regular expression. I tried the following code:
String regex="^/image/journal/article?img_id=([0-9])*&t=([0-9])*$";
content=content.replace(regex,"");
Here content is the original string. But this code does not replace anything in content. Any help would be appreciated. Thanks in advance.
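For reference, the attempt in the question fails for three reasons: String.replace treats its first argument as a literal string, not a regex; the unescaped ? is a quantifier (making the preceding e optional) rather than a literal question mark; and ^ and $ anchor the pattern to the start and end of the whole content, so it could only match if the URL were the entire string. A minimal corrected sketch, assuming the URL always has this shape (the class name and sample content are made up for illustration):

```java
public class RemoveImageUrl {
    // Literal '?' escaped, no ^/$ anchors, and one or more digits per parameter value.
    static final String REGEX = "/image/journal/article\\?img_id=\\d+&t=\\d+";

    static String strip(String content) {
        // replaceAll (unlike replace) interprets its first argument as a regular expression.
        return content.replaceAll(REGEX, "");
    }

    public static void main(String[] args) {
        String content = "<img src=\"/image/journal/article?img_id=24810&t=1475128689597\">";
        System.out.println(strip(content)); // <img src="">
    }
}
```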
Use replaceAll, which works with a regex:
content=content.replaceAll("[0-9]*","");
Code
String content="/image/journal/article?img_id=24810&t=1475128689597";
content=content.replaceAll("[0-9]*","");
System.out.println(content);
Output:
/image/journal/article?img_id=&t=
Update: a simpler one, perhaps less elegant but easy:
String content="sas/image/journal/article?img_id=24810&t=1475128689597";
content=content.replaceAll("\\/image.*","");
System.out.println(content);
Output:
sas
If something more follows the URL, e.g. t=1475128689597/?tag=343sdds, and you want to keep ?tag=343sdds, use the following:
String content="sas/image/journal/article?img_id=24810&t=1475128689597/?tag=343sdds";
content=content.replaceAll("(\\/image.*[0-9]+[\\/])","");
System.out.println(content);
Output:
sas?tag=343sdds
If you're trying to replace the numbers in the URL with two quotation marks, like so:
/image/journal/article?img_id=""&t=""
Then you need to put escaped quotes \"\" in your replacement string, change your regex to look only for the numbers, and switch to replaceAll:
content=content.replaceAll(regex,"\"\"");
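A sketch of that suggestion, assuming you want to keep the parameter names and only swap each number for two quote characters. The capturing-group regex and the class name are illustrative, not taken from the answer:

```java
public class QuoteParams {
    static String quoteParams(String content) {
        // Group 1 keeps "img_id=" or "&t="; the digits after it are replaced by "".
        return content.replaceAll("(img_id=|&t=)\\d+", "$1\"\"");
    }

    public static void main(String[] args) {
        String content = "/image/journal/article?img_id=24810&t=1475128689597";
        System.out.println(quoteParams(content)); // /image/journal/article?img_id=""&t=""
    }
}
```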
You can use the java.util.regex classes to replace the matched substring with "" (or any other string), based on a given pattern, as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String content = "ALPHA_/image/journal/article?img_id=24810&t=1475128689597_BRAVO";
String regex = "\\/image\\/journal\\/article\\?img_id=\\d+&t=\\d+";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(content);
if (matcher.find()) {
    String replacement = matcher.replaceAll("PK");
    System.out.println(replacement); // Will print ALPHA_PK_BRAVO
}
I'm trying to build a Java regex to search a .txt file for a Windows-formatted file path. However, because the file path contains literal backslashes, my regex is failing.
The .txt file contains the line:
C\Windows\SysWOW64\ntdll.dll
However, some of the filenames in the text file are formatted like this:
C\Windows\SysWOW64\ntdll.dll (some developer stuff here...)
So I'm unable to use String.equals.
To match this line, I'm using the following code:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
String read = reader.readLine(); // reader: a BufferedReader opened on the .txt file
if (Pattern.compile(Pattern.quote(filename), Pattern.CASE_INSENSITIVE).matcher(read).find()) {
    // line found
}
I've tried escaping the literal backslashes using the replace method, e.g.:
filename.replace("\\", "\\\\");
However, it still fails to find the line. I'm guessing this is because I need to escape the backslashes again after the Pattern has been built; I think I might need up to four additional backslashes, e.g.:
Pattern.replaceAll("\\\\", "\\\\\\\\");
However, each time I try, the pattern doesn't get matched. I'm certain it's a problem with the backslashes, but I'm not sure where to do the replacement, or if there's a better way of building the pattern.
I think the problem is compounded by the fact that the replaceAll method also uses a regex, which means the pattern will have its own backslashes in it to deal with the case insensitivity.
Any input or advice would be appreciated.
Thanks
It seems you're attempting a direct comparison of one String against another. For exact matches, you could do:
if (read.equalsIgnoreCase(filename)) {
or simply
if (read.startsWith(filename)) {
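Note that startsWith is case-sensitive. If the prefix comparison should also ignore case, String.regionMatches with ignoreCase set to true compares just that region. A minimal sketch; the helper name startsWithIgnoreCase is made up:

```java
public class PrefixMatch {
    // True if line begins with prefix, ignoring case. No regex is involved,
    // so backslashes in the path need no extra escaping.
    static boolean startsWithIgnoreCase(String line, String prefix) {
        return line.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
        String read = "c\\windows\\syswow64\\NTDLL.dll (some developer stuff here...)";
        System.out.println(startsWithIgnoreCase(read, filename)); // true
    }
}
```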
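Try this: escape each literal backslash in the regex, which means writing \\\\ in the Java string literal for every \ in the path. The lines read from the file can be matched as they are (in the source below, "\\" in lLine stands for a single backslash):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String lLine = "C\\Windows\\SysWOW64\\ntdll.dll";
Pattern lPattern = Pattern.compile("C\\\\Windows\\\\SysWOW64\\\\ntdll\\.dll");
Matcher lMatcher = lPattern.matcher(lLine);
if (lMatcher.find()) {
    System.out.println(lMatcher.group());
}
lLine = "C\\Windows\\SysWOW64\\ntdll.dll (some developer stuff here...)";
lMatcher = lPattern.matcher(lLine);
if (lMatcher.find()) {
    System.out.println(lMatcher.group());
}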
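For comparison, the approach from the question can work without any manual escaping: Pattern.quote treats the whole string as a literal, so every backslash is matched as-is, and CASE_INSENSITIVE still applies to the quoted text. This sketch assumes the lines read from the file are passed to the matcher unchanged; the class and method names are illustrative:

```java
import java.util.regex.Pattern;

public class QuotedPathSearch {
    static boolean containsPath(String line, String path) {
        // Pattern.quote makes the whole path a literal; no backslash doubling is needed.
        Pattern p = Pattern.compile(Pattern.quote(path), Pattern.CASE_INSENSITIVE);
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String filename = "C\\Windows\\SysWOW64\\ntdll.dll"; // single backslashes at runtime
        System.out.println(containsPath(
                "c\\windows\\SysWOW64\\NTDLL.dll (some developer stuff here...)", filename)); // true
        // Doubling the backslashes in the line itself breaks the match.
        System.out.println(containsPath("C\\\\Windows\\\\SysWOW64\\\\ntdll.dll", filename)); // false
    }
}
```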
The correct usage will be:
String filename = "C\\Windows\\SysWOW64\\ntdll.dll";
String file = filename.replace('\\', ' ');
I have this code, which accepts only images in .bmp format:
String format="";
if(pic_sel_file.toLowerCase().endsWith(".bmp"))
format="BMP";
How can I change this code so that it accepts images in any file format (.jpg, .tif, etc.)?
To avoid having multiple if-then-else statements, you could use a regular expression to match against a list of file extensions. endsWith does not take a regex, so you could do:
Matcher m = Pattern.compile("(?i).*\\.(bmp|gif|jpg|tif)$").matcher(pic_sel_file);
if (m.matches()) {
format = m.group(1).toUpperCase();
}
(?i) Ignore case
\\. Escaped dot character
(bmp|... Image types
$ End of line
If you want to avoid regexp, you can use a collection and loop over it:
List<String> allowedFileEndings = Arrays.asList("bmp", "png", "gif"); //or array
for (String allowedFileEndingsItem : allowedFileEndings) {
if(pic_sel_file.toLowerCase().endsWith(allowedFileEndingsItem)) // or "." + allowedFileEndingsItem
{
format = allowedFileEndingsItem; // uppercase if you want
break;
}
}
I would even consider using an enum for the file formats; this could make the format handling easier.
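A minimal sketch of the enum idea. The constant names and extension lists are hypothetical choices, not something the answer specifies:

```java
import java.util.Locale;
import java.util.Optional;

public class ImageFormats {
    // Hypothetical set of supported formats; each constant lists the extensions it accepts.
    enum ImageFormat {
        BMP("bmp"), GIF("gif"), JPEG("jpg", "jpeg"), PNG("png"), TIFF("tif", "tiff");

        private final String[] extensions;

        ImageFormat(String... extensions) {
            this.extensions = extensions;
        }

        // Returns the format whose extension the file name ends with, ignoring case.
        static Optional<ImageFormat> fromFileName(String fileName) {
            String lower = fileName.toLowerCase(Locale.ROOT);
            for (ImageFormat format : values()) {
                for (String ext : format.extensions) {
                    if (lower.endsWith("." + ext)) {
                        return Optional.of(format);
                    }
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(ImageFormat.fromFileName("holiday.JPEG")); // Optional[JPEG]
        System.out.println(ImageFormat.fromFileName("notes.txt"));    // Optional.empty
    }
}
```

Returning an Optional keeps "unsupported format" explicit instead of falling back to an empty format string.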
I suspect this entire task would be better suited to a JFileChooser with a filter for supported image types.
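A sketch of the filter part, assuming the supported types should be whatever the installed ImageIO readers report (on a standard JDK that includes bmp, gif, jpg and png). The class name and the "Images" description are illustrative:

```java
import java.io.File;
import javax.imageio.ImageIO;
import javax.swing.filechooser.FileNameExtensionFilter;

public class ImageFilterDemo {
    // Builds a filter from the suffixes reported by the registered ImageIO readers.
    static FileNameExtensionFilter imageFilter() {
        return new FileNameExtensionFilter("Images", ImageIO.getReaderFileSuffixes());
    }

    public static void main(String[] args) {
        FileNameExtensionFilter filter = imageFilter();
        System.out.println(filter.accept(new File("photo.PNG"))); // true (comparison ignores case)
        System.out.println(filter.accept(new File("notes.txt"))); // false
        // In a GUI: JFileChooser chooser = new JFileChooser(); chooser.setFileFilter(filter);
    }
}
```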
Related: ListImageReaders, TIFF support
OK, so this is what I have:
"C:\this\file\is\rev12\oh\A_12345\doll\classes"
I want to extract from this string the 12345 only.
How can it be done using Java Pattern.compile?
You should define more generally how this number appears in the string.
If it appears somewhere in the string with a leading underscore _ and a trailing backslash \, the pattern will be _(\d+)\\.
The number can then be extracted from the matched group.
Try it.
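A minimal sketch of that pattern in Java; the class and method names are made up. In a Java string literal the regex _(\d+)\\ becomes "_(\\d+)\\\\":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnderscoreNumber {
    // Digits preceded by '_' and followed by a backslash; group 1 holds just the digits.
    private static final Pattern NUMBER = Pattern.compile("_(\\d+)\\\\");

    static String extract(String path) {
        Matcher m = NUMBER.matcher(path);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "C:\\this\\file\\is\\rev12\\oh\\A_12345\\doll\\classes";
        System.out.println(extract(path)); // 12345
    }
}
```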
Below is code you could use. I had to change the backslashes in the path to forward slashes and use an absolute path. To use the Windows form of the path, "C:\\this\\file\\is\\rev12\\oh\\A_12345\\doll\\classes", write each '\' as '\\' in the string literal; both path strings work with the code below. Note that -?\d+ matches every number in the absolute path, so 12 (from rev12) is printed as well as 12345.
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

File file = new File("C:/this/file/is/rev12/oh/A_12345/doll/classes").getAbsoluteFile();
System.out.println(file.getAbsolutePath());
Pattern pat = Pattern.compile("-?\\d+");
Matcher mat = pat.matcher(file.getAbsolutePath());
while (mat.find()) {
    System.out.println(mat.group());
}
I have a text file like this:
"GET /opacial/index.php?op=results&catalog=1&view=1&language=el&numhits=10&query=\xce\x95\xce\xbb\xce\xbb\xce\xac\xce\xb4\xce\xb1%20--%20\xce\x95\xce\xb8\xce\xbd\xce\xb9\xce\xba\xce\xad\xcf\x82%20\xcf\x83\xcf\x87\xce\xad\xcf\x83\xce\xb5\xce\xb9\xcf\x82%20--%20\xce\x99\xcf\x83\xcf\x84\xce\xbf\xcf\x81\xce\xaf\xce\xb1&search_field=11&page=1
I want to cut out all the characters after the word "query" and before "&search".
I am trying to cut the data using patterns, but something is wrong. Can you give me an example for the text above?
EDIT:
Another problem, besides the one above, is that the matcher only works on CharSequences, and I have a file, which cannot be cast to CharSequence.
Something like this?
String yourNewText = yourOldText.split("query")[1].split("&search")[0];
There are several ways to read a file into a String, for example Files.readAllLines, or a BufferedReader reading it line by line.
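Regarding the edit: a Matcher needs a CharSequence, and each line read from the file is a String, which is one. A minimal sketch, assuming the log is UTF-8 text with one request per line; it uses a lazy capturing regex instead of split, and the class, method and sample line are made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryExtractor {
    // Lazily captures everything between "query=" and the next "&search".
    private static final Pattern QUERY = Pattern.compile("query=(.*?)&search");

    static List<String> extractQueries(Path logFile) throws IOException {
        List<String> queries = new ArrayList<>();
        // Each line comes back as a String, which implements CharSequence,
        // so it can be passed straight to Pattern.matcher().
        for (String line : Files.readAllLines(logFile, StandardCharsets.UTF_8)) {
            Matcher m = QUERY.matcher(line);
            if (m.find()) {
                queries.add(m.group(1));
            }
        }
        return queries;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample log written to a temporary file for the demo.
        Path tmp = Files.createTempFile("access", ".log");
        Files.write(tmp, List.of(
                "GET /opacial/index.php?op=results&query=abc%20def&search_field=11&page=1"),
                StandardCharsets.UTF_8);
        System.out.println(extractQueries(tmp)); // [abc%20def]
        Files.delete(tmp);
    }
}
```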
".*query\\=(.*)\\&search_field.*"
This regex should work to give you a capture of what you want to remove. Then String.replace should do the trick.
Edit, in response to a comment: the following code
String s = "GET /opacial/index.php?op=results&catalog=1&view=1&language=el&numhits=10&query=\\xce\\x95\\xce\\xbb\\xce\\xbb\\xce\\xac\\xce\\xb4\\xce\\xb1%20--%20\\xce\\x95\\xce\\xb8\\xce\\xbd\\xce\\xb9\\xce\\xba\\xce\\xad\\xcf\\x82%20\\xcf\\x83\\xcf\\x87\\xce\\xad\\xcf\\x83\\xce\\xb5\\xce\\xb9\\xcf\\x82%20 --%20\\xce\\x99\\xcf\\x83\\xcf\\x84\\xce\\xbf\\xcf\\x81\\xce\\xaf\\xce\\xb1&search_field=11&page=1";
Pattern p = Pattern.compile(".*query\\=(.*)\\&search_field.*");
Matcher m = p.matcher(s);
if (m.matches()){
String betweenQueryAndSearch = m.group(1);
System.out.println(betweenQueryAndSearch);
}
produced the following output:
\xce\x95\xce\xbb\xce\xbb\xce\xac\xce\xb4\xce\xb1%20--%20\xce\x95\xce\xb8\xce\xbd\xce\xb9\xce\xba\xce\xad\xcf\x82%20\xcf\x83\xcf\x87\xce\xad\xcf\x83\xce\xb5\xce\xb9\xcf\x82%20 --%20\xce\x99\xcf\x83\xcf\x84\xce\xbf\xcf\x81\xce\xaf\xce\xb1