Java: splitting the filename into a base and extension

Is there a better way to get a file's base name and extension than something like this?
File f = ...
String name = f.getName();
int dot = name.lastIndexOf('.');
String base = (dot == -1) ? name : name.substring(0, dot);
String extension = (dot == -1) ? "" : name.substring(dot+1);

I know others have mentioned String.split, but here is a variant that only yields two tokens (the base and the extension):
String[] tokens = fileName.split("\\.(?=[^\\.]+$)");
For example:
"test.cool.awesome.txt".split("\\.(?=[^\\.]+$)");
Yields:
["test.cool.awesome", "txt"]
The regular expression tells Java to split on any period that is followed by any number of non-periods, followed by the end of input. There is only one period that matches this definition (namely, the last period).
Regexically speaking, this technique is called zero-width positive lookahead.
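To see how the lookahead split behaves on edge cases, here is a minimal sketch. The class and method names are made up for illustration; only the regular expression comes from the answer above.

```java
import java.util.Arrays;

final class LookaheadSplitDemo {
    // Splits on a period only when nothing but non-period characters follow it,
    // so at most the last period in the name is used as the separator.
    static String[] splitOnLastDot(String fileName) {
        return fileName.split("\\.(?=[^\\.]+$)");
    }

    public static void main(String[] args) {
        String[] samples = {"test.cool.awesome.txt", "README", ".bashrc", "trailing."};
        for (String name : samples) {
            System.out.println(name + " -> " + Arrays.toString(splitOnLastDot(name)));
        }
    }
}
```

Note that a name without any period comes back as a single token, a name ending in a period is not split at all (the lookahead needs at least one character after the period), and a dotfile such as ".bashrc" yields an empty base, so callers should check the array length before indexing.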
BTW, if you want to split a path with forward slashes into its directory part and the full filename (including, but not limited to, the dot extension):
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
For example:
String dir = "/foo/bar/bam/boozled";
// split at the empty position just after the last slash
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
// [ "/foo/bar/bam/", "boozled" ]

Old question but I usually use this solution:
import org.apache.commons.io.FilenameUtils;
String fileName = "/abc/defg/file.txt";
String basename = FilenameUtils.getBaseName(fileName);
String extension = FilenameUtils.getExtension(fileName);
System.out.println(basename); // file
System.out.println(extension); // txt (NOT ".txt" !)

Source: http://www.java2s.com/Code/Java/File-Input-Output/Getextensionpathandfilename.htm
A utility class such as this:
class Filename {
private String fullPath;
private char pathSeparator, extensionSeparator;
public Filename(String str, char sep, char ext) {
fullPath = str;
pathSeparator = sep;
extensionSeparator = ext;
}
public String extension() {
int dot = fullPath.lastIndexOf(extensionSeparator);
return fullPath.substring(dot + 1);
}
public String filename() { // gets filename without extension
int dot = fullPath.lastIndexOf(extensionSeparator);
int sep = fullPath.lastIndexOf(pathSeparator);
return fullPath.substring(sep + 1, dot);
}
public String path() {
int sep = fullPath.lastIndexOf(pathSeparator);
return fullPath.substring(0, sep);
}
}
usage:
public class FilenameDemo {
public static void main(String[] args) {
final String FPATH = "/home/mem/index.html";
Filename myHomePage = new Filename(FPATH, '/', '.');
System.out.println("Extension = " + myHomePage.extension());
System.out.println("Filename = " + myHomePage.filename());
System.out.println("Path = " + myHomePage.path());
}
}

http://docs.oracle.com/javase/6/docs/api/java/io/File.html#getName()
From http://www.xinotes.org/notes/note/774/ :
Java has built-in functions to get the basename and dirname for a given file path, but the function names are not so self-apparent.
import java.io.File;
public class JavaFileDirNameBaseName {
public static void main(String[] args) {
File theFile = new File("../foo/bar/baz.txt");
System.out.println("Dirname: " + theFile.getParent());
System.out.println("Basename: " + theFile.getName());
}
}

What's wrong with your code? Wrapped in a neat utility method it's fine.
What's more important is what to use as separator — the first or last dot. The first is bad for file names like "setup-2.5.1.exe", the last is bad for file names with multiple extensions like "mybundle.tar.gz".
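As a rough sketch of such a utility method, the following wraps the lastIndexOf approach from the question. The class name FileNameParts is hypothetical, and treating a leading dot (as in ".bashrc") as part of the base name is an assumption, not something the question specifies.

```java
final class FileNameParts {
    final String base;
    final String extension;

    private FileNameParts(String base, String extension) {
        this.base = base;
        this.extension = extension;
    }

    // Splits at the LAST dot. A dot at index 0 is not treated as an
    // extension separator (assumption: dotfiles have no extension).
    static FileNameParts of(String name) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) {
            return new FileNameParts(name, "");
        }
        return new FileNameParts(name.substring(0, dot), name.substring(dot + 1));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"setup-2.5.1.exe", "mybundle.tar.gz", ".bashrc"}) {
            FileNameParts parts = FileNameParts.of(name);
            System.out.println(name + " -> base=" + parts.base + ", extension=" + parts.extension);
        }
    }
}
```

With the last dot as separator, "setup-2.5.1.exe" splits as expected, while "mybundle.tar.gz" reports only "gz" as its extension, which is exactly the trade-off described above.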

File extensions are a broken concept
And there exists no reliable function for it. Consider for example this filename:
archive.tar.gz
What is the extension? DOS users would have preferred the name archive.tgz. Sometimes you see stupid Windows applications that first decompress the file (yielding a .tar file), then you have to open it again to see the archive contents.
In this case, a more reasonable notion of file extension would have been .tar.gz. There are also .tar.bz2, .tar.xz, .tar.lz and .tar.lzma file "extensions" in use. But how would you decide, whether to split at the last dot, or the second-to-last dot?
Use mime-types instead.
The Java 7 function Files.probeContentType will likely detect file types much more reliably than trusting the file extension. Pretty much all of the Unix/Linux world, as well as your web browser and smartphone, already do it this way.
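A minimal sketch of calling Files.probeContentType follows. The wrapper class, the "unknown" fallback, and the sample file name are assumptions made for illustration. The result is platform-dependent: the method may return null when no type can be determined, and its documentation allows an implementation to examine the file name as well as attributes or content, so it is worth checking what your platform actually returns.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ContentTypeProbe {
    // Returns the probed MIME type, or "unknown" (a made-up fallback)
    // when the type cannot be determined or probing fails.
    static String describe(Path file) {
        try {
            String type = Files.probeContentType(file);
            return type != null ? type : "unknown";
        } catch (IOException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        // Hypothetical file name; the printed value differs between platforms.
        System.out.println(describe(Paths.get("archive.tar.gz")));
    }
}
```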

You can also use Java regular expressions. String.split() also uses them internally. Refer to http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html

Maybe you could use String#split
To answer your comment:
I'm not sure if there can be more than one . in a filename, but whatever, even if there are more dots you can use the split. Consider e.g. that:
String input = "boo.and.foo";
String[] result = input.split("\\."); // split() takes a regex, so the dot must be escaped
This will return an array containing:
{ "boo", "and", "foo" }
So you will know that the last index in the array is the extension and all others are the base.
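Here is a small sketch of that idea, assuming the dot is escaped as "\\." because split() interprets its argument as a regular expression. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class SplitTokens {
    // The last token is the extension; a name without a dot has none.
    static String extensionOf(String name) {
        String[] tokens = name.split("\\.");
        return tokens.length > 1 ? tokens[tokens.length - 1] : "";
    }

    // All tokens except the last are re-joined with dots to form the base.
    static String baseOf(String name) {
        String[] tokens = name.split("\\.");
        if (tokens.length <= 1) {
            return name;
        }
        return String.join(".", Arrays.copyOf(tokens, tokens.length - 1));
    }

    public static void main(String[] args) {
        System.out.println(baseOf("boo.and.foo") + " | " + extensionOf("boo.and.foo"));
    }
}
```

Compared with lastIndexOf, this allocates an array and re-joins the pieces, so it is mainly useful when the individual tokens are needed anyway.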

Related

StringBuilder split not working on some files

I'm trying to get the parent directory of each file and put it into a ListView in JavaFX.
It does work, but not for some file names, and I cannot understand why.
Iterator<String> listIterator = loadedFiles.iterator();
StringBuilder listItem = null;
while (listIterator.hasNext()) {
File listFile = new File(listIterator.next());
listItem = new StringBuilder(Arrays.toString(listFile.getAbsolutePath().split(listFile.getName())));
toDir(listItem);
ctrl.fileList.getItems().add(listItem.toString());
}
gets the File Path, and cuts off the filename.
toDir:
private void toDir(StringBuilder builder) {
builder.deleteCharAt(builder.length() - 1);
builder.deleteCharAt(0);
if (builder.charAt(builder.length() - 1) == '\\') {
builder.deleteCharAt(builder.length() - 1);
}
}
Removes the Array brackets and the last '\'
The given file paths are:
C:\Users\Test\Downloads\048815 - Kopie (2).jpg
C:\Users\Test\Downloads\048815 - Kopie (3).jpg
C:\Users\Test\Downloads\048815 - Kopie (4).jpg
C:\Users\Test\Downloads\048815 - Kopie.jpg
C:\Users\Test\Downloads\048815.jpg
The first three files, the ones with () in their names, do not work: the file name is still in the String that is added to the list. Only the last two get the file name removed.
Your immediate problem is that you are misusing String#split(). The argument to split() is a regular expression, in which parentheses are grouping meta-characters. You do not want to use split() at all here. In fact, you are much better off using the path manipulation methods in java.nio.file.Path.
for (String fileName : loadedFiles)
{
Path filePath = Path.of(fileName);
Path directory = filePath.getParent();
ctrl.fileList.getItems().add(directory.toString());
}
Once you're comfortable with the API, you can reduce this to
for (String fileName : loadedFiles)
{
ctrl.fileList.getItems().add(Path.of(fileName).getParent().toString());
}
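Two details are worth a sketch here. First, getParent() returns null for a path with no parent, so calling toString() on the result directly can throw. Second, the original split() would have worked had the name been quoted with Pattern.quote, which shows that regex metacharacters were the culprit. The class and method names below are hypothetical, and Paths.get is used instead of Path.of only so the sketch also compiles on Java versions before 11.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

final class ParentDirDemo {
    // Returns the parent directory as text, or "" when the path has no parent
    // (for example a bare file name), instead of failing on a null parent.
    static String parentOf(String fileName) {
        Path parent = Paths.get(fileName).getParent();
        return parent == null ? "" : parent.toString();
    }

    // Pattern.quote makes split() treat the name literally, so characters
    // such as ( and ) lose their regex meaning.
    static String[] splitLiterally(String path, String name) {
        return path.split(Pattern.quote(name));
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\Test\\Downloads\\048815 - Kopie (2).jpg";
        String name = "048815 - Kopie (2).jpg";
        System.out.println(path.split(name)[0]);           // unquoted: the name is not removed
        System.out.println(splitLiterally(path, name)[0]); // quoted: only the directory part remains
        System.out.println("[" + parentOf("photo.jpg") + "]");
    }
}
```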

Decode alfresco file name or replace unicode[_x0020_] characters in String/fileName

I am using the Alfresco upload and download services from Java.
When I upload a file to the Alfresco server, it gives me the following path:
/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
When I download using the same file path through the Alfresco services, I take the file name at the end of the path,
i.e. ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
How can I remove or decode the [Unicode] characters in fileName?
String decoded = URLDecoder.decode(queryString, "UTF-8");
The above does not work.
These are some of the Unicode characters that appear in my file name:
https://en.wikipedia.org/wiki/List_of_Unicode_characters
Please do not mark the question as a duplicate; I have gone through the links below, but none of them gave a solution.
These are the questions I found about replacing Unicode characters in a String with Java:
Java removing unicode characters
Remove non-ASCII characters from String in Java
How can I replace a unicode character in java string
Java Replace Unicode Characters in a String
The solution given by Jeff Potts is perfect.
But in my situation I was using the file name in a different project that does not include the org.alfresco JARs,
and I would have had to pull in all those dependencies just for a simple file name decoding.
So I used plain Java with a regex to parse the file name and decode it, which gave me the same result as
ISO9075.decode(test);
This is the code that can be used:
public String decode_FileName(String fileName) {
System.out.println("fileName : " + fileName);
String decodedfileName = fileName;
String temp = "";
Matcher m = Pattern.compile("\\_x(.*?)\\_").matcher(decodedfileName); // regex that matches escapes such as _x0020_
List<String> unicodeChars = new ArrayList<String>();
while (m.find()) {
unicodeChars.add(m.group(1));
}
for (int i = 0; i < unicodeChars.size(); i++) {
temp = unicodeChars.get(i);
if (isInteger(temp)) {
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf(temp), 16)));//converting
decodedfileName = decodedfileName.replace("_x" + temp + "_", replace_char);
}
}
System.out.println("Decoded FileName :" + decodedfileName);
return decodedfileName;
}
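And use this small utility to check whether the captured text is a valid number. The escapes are hexadecimal, so it must parse with radix 16; otherwise codes such as 002F would be skipped:
public static boolean isInteger(String s) {
try {
Integer.parseInt(s, 16);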
} catch (NumberFormatException e) {
return false;
} catch (NullPointerException e) {
return false;
}
return true;
}
The conversion itself is simple.
Example:
0028 is the code of the left parenthesis (U+0028), as you can see at this link:
https://en.wikipedia.org/wiki/List_of_Unicode_characters
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf("0028"), 16)));
System.out.println(replace_char);
This code prints ( which is the left parenthesis.
This is the logic I used in my Java program.
The program above gives the same result as ISO9075.decode(test).
Output :
fileName : ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
Decoded FileName :ABC1X 0400 0109-(1-2)_v2.pdf
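The same idea can be condensed into a single pass with Matcher.appendReplacement. This is only a sketch under the assumption that every escape has the form _xHHHH_ with exactly four hexadecimal digits; the class name is made up, and it does not claim to cover everything ISO9075.decode handles.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Iso9075Sketch {
    // Assumed escape format: _x followed by exactly four hex digits and _.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String decode(String encoded) {
        Matcher m = ESCAPE.matcher(encoded);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the captured digits as hexadecimal and convert to a char.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or '\' in the decoded text.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf"));
    }
}
```

Because each match is replaced where it is found, there is no need to collect the codes first and then call String.replace for each one.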
In the org.alfresco.util package you will find a class called ISO9075. You can use it to encode and decode strings according to that spec. For example:
String test = "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
String out = ISO9075.decode(test);
System.out.println(out);
Returns:
ABC1X 0400 0109-(1-2)_v2.pdf
If you want to see what it does behind the scenes, look at the source.

How to match file name containing spaces

We need to validate file types, so only certain file types should be accepted.
The current code works fine as long as there are no spaces in the file name. But when there is one, we end up on the "error ...!!!" path and the program says the file is invalid.
Example: Home home.png is marked as invalid but HomeHome.png is marked as valid.
How can we match file names even if they contain white spaces?
Below is my code:
public boolean fileUpload(UploadedFile uploadFile) {
boolean validFile = true;
System.out.println("file date...!!!" + uploadFile.getFileName());
System.out.println("file date...!!!" + uploadFile.getSize());
String expression = "([^\\s]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
if ((uploadFile.getSize() != 0)) {
System.out.println(" file name" + uploadFile.getFileName());
if ((!uploadFile.getFileName().matches(expression))) {
System.out.println("error ...!!!" + uploadFile.getFileName());
validFile = false;
}
}
return validFile;
}
Because your expression excludes spaces: [^\\s]+ matches one or more characters that are not whitespace, so a name containing a space can never match.
String expression = "([^\\s]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
instead use:
String expression = "([^\\.]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
If you are okay with using regular expressions, try this:
String fileName = uploadFile.getFileName().replaceAll("\\s+", "");
(([a-zA-Z0-9\s_\\.\-:])+.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN)$)
This regex can deal with any space issues.
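For comparison, here is a compact sketch of the "[^.]+" variant using Pattern.CASE_INSENSITIVE instead of the inline (?i) flag. The class name and the shortened extension list are assumptions for illustration; substitute the full list from the question as needed.

```java
import java.util.regex.Pattern;

final class UploadNameCheck {
    // Base name: one or more characters that are not a dot (spaces allowed),
    // followed by a dot and one of the accepted extensions.
    // The extension list is deliberately shortened for this sketch.
    private static final Pattern ALLOWED =
            Pattern.compile("[^.]+\\.(jpg|jpeg|png|pdf|txt)", Pattern.CASE_INSENSITIVE);

    static boolean isAllowed(String fileName) {
        return fileName != null && ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Home home.png")); // true
        System.out.println(isAllowed("HomeHome.PNG"));  // true
        System.out.println(isAllowed("notes.exe"));     // false
    }
}
```

One consequence of "[^.]+" is that names with more than one dot, such as "my.report.pdf", are rejected; whether that is desirable depends on the upload rules.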

Regex to extract the filename and drop the file timestamp from complete path in Java

I have a complete file path and I just need to extract the file name and extension, so my output would be fileName.csv.
For example, the complete path is:
/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv
The output of the regex should be fileName.csv.
The extension and the depth of directories are not fixed.
As part of my requirement I need a single regex that extracts fileName.csv, not fileName_20150108_002_20150109013841.csv. How can I do it in a single regular expression?
Without using a regex, this can be solved as:
public static String getFileName(String args){
args = args.substring(args.lastIndexOf('/')+1);
return args.substring(0,args.indexOf('_')) + args.substring(args.indexOf('.'));
}
The regex below might work for you:
[^\\/:*?"<>|\r\n]+$
This regex has been tested on these two examples:
\var\www\www.example.com\index.jsp
\index.jsp
Or, better, use File.getName():
String filename = new File("Payload/brownie.app/Info.plist").getName();
System.out.println(filename);
another way is
int index = path.lastIndexOf(File.separatorChar);
String filename = path.substring(index+1);
finally after getting the full filename use below code snippet
String str = filename;// in your case filename will be fileName_20150108_002_20150109013841.csv
str = str.substring(0,str.indexOf('_'))+str.substring(str.lastIndexOf('.'));
System.out.println("filename is ::"+str); // output will be fileName.csv
In the code below, group one will be /fileName_timestamp.extension. I then replace slashes, digits, and underscores with the empty string. This may look ugly, but it will still serve your purpose. If the file name itself contains digits, a different approach is needed.
public static void main(String[] args) {
String toBeSplitted = "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv";
Pattern r = Pattern.compile("(/[a-zA-Z0-9_.-]+)+/?");
Matcher m = r.matcher(toBeSplitted);
if(m.matches()){
String s = m.group(1).replaceAll("(/|[0-9]|_)", "");
System.out.println(s);
}
}
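Since the question asks for a single regular expression, one possible sketch uses replaceAll with two capture groups. It assumes the wanted name is everything before the first underscore in the last path segment and the extension is everything from the last dot onward; the class and method names are hypothetical.

```java
final class TimestampStripper {
    // (?:.*/)?    optional directories, up to the last slash
    // ([^/_]+)    group 1: the name before the first underscore
    // _[^/]*      the underscore and the timestamp part
    // (\.[^./]+)  group 2: the final dot and the extension
    static String strip(String path) {
        return path.replaceAll("^(?:.*/)?([^/_]+)_[^/]*(\\.[^./]+)$", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(strip("/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv"));
    }
}
```

When the last segment has no underscore or no extension, the pattern does not match and the input is returned unchanged, so callers may want to handle that case separately.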

Java Replacing Help Needed

Hey guys, I am trying to strip the file name from a path so that I get only /hello/what/ without the REMOVEThis4.PNG. I don't want to use string.replace("REMOVEThis4.PNG", ""); because I want to use it on other strings, not only that one.
Any help is great. My code:
String sFile = "/hello/what/REMOVEThis4.PNG";
if (sFile.contains("/")){
String Replaced = sFile.replaceAll("(?s)", "");
System.out.println(Replaced);
}
I want the output to be only
/hello/what/
Thanks a lot!
If you are trying to parse a path, I recommend finding the last index of / and taking the substring up to that index plus one:
string = string.substring(0, string.lastIndexOf("/") + 1);
No need to use regular expressions in your case:
String sFile = "/hello/what/REMOVEThis4.PNG";
// TODO check actual last index of "/" against -1
System.out.println(sFile.substring(0, sFile.lastIndexOf("/") + 1));
Output
/hello/what/
Note
In case you are dealing with actual files, you can probably spare yourself the String manipulation and use File.getParent() instead:
File file = new File("/hello/what/REMOVEThis4.PNG");
System.out.println(file.getParent());
Output (may change depending on your system)
\hello\what
Use Java's File API:
String example = "/hello/what/REMOVEThis4.PNG";
File file = new File(example);
System.out.println(example);
String absolutePath = file.getAbsolutePath();
String filePath = absolutePath.substring(0, absolutePath.lastIndexOf(File.separator));
System.out.println(filePath);
