I have a program that requires a path to a file. Currently, it's configured like this:
public class Main {
private static final String ACOUSTIC_MODEL_PATH =
"resource:/edu/cmu/sphinx/models/en-us/en-us";
private static final String DICTIONARY_PATH =
"resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict";
I have two questions:
What is this resource:/path/goes/here format, specifically the word resource?
How do I specify a path to a file on disk?
I tried these options, but the file is not found:
private static final String DICTIONARY_PATH = "/d/cmudict-en-us.dict";
private static final String DICTIONARY_PATH = "file:/d/cmudict-en-us.dict";
private static final String DICTIONARY_PATH = "file://d/cmudict-en-us.dict";
Here is how the path is supposedly used (location is the path here):
public static URL resourceToURL(String location) throws MalformedURLException {
Matcher jarMatcher = jarPattern.matcher(location);
if (jarMatcher.matches()) {
String resourceName = jarMatcher.group(1);
return ConfigurationManagerUtils.class.getResource(resourceName);
} else {
if (location.indexOf(':') == -1) {
location = "file:" + location;
}
return new URL(location);
}
}
It seems that resource:/ is just an internal pattern used by your program (see the jarPattern regular expression, which should be declared somewhere in it). It is not a standard URL syntax. If the pattern does not match, the fallback is to treat the string as a URL. Local file URLs conventionally start with file:/// followed by the normal path, with backslashes replaced by forward slashes.
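To see how the attempted strings from the question are interpreted, it can help to print the parts that java.net.URI extracts from each one. This is only a sketch for inspection; the class and method names (FileUrlParts, describe) are made up for illustration and are not part of the program in the question.

```java
import java.net.URI;

final class FileUrlParts {
    /** Describes how java.net.URI splits a location into host and path. */
    static String describe(String location) {
        URI uri = URI.create(location);
        return "host=" + uri.getHost() + ", path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Two slashes after "file:" start an authority, so "d" is read as a host name.
        System.out.println(describe("file://d/cmudict-en-us.dict"));
        // Three slashes mean an empty authority followed by an absolute path.
        System.out.println(describe("file:///d:/cmudict-en-us.dict"));
        // A single slash is an absolute path with no authority at all.
        System.out.println(describe("file:/d/cmudict-en-us.dict"));
    }
}
```

The first call shows why file://d/... cannot point at a folder named d: the d is parsed as a host, and only /cmudict-en-us.dict remains as the path.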
Try the following string, which is a valid URL:
"file:///d:/cmudict-en-us.dict"
If you want to specify a path on a disk, you can use its absolute path like so:
String path = "C:\Folder\AnotherFolder\file.txt";
The problem with this, however, is that Java treats '\' in a string literal as the start of an escape sequence such as \n or \r, so the line above does not compile as written. To prevent this error, replace each '\' with '\\'; the compiler then produces a string that contains single backslashes. So instead of the code above, you can try this:
String path = "C:\\Folder\\AnotherFolder\\file.txt";
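A small sketch of what the escaped literal contains at run time. The class name BackslashDemo and the helper countBackslashes are invented for this illustration. The forward-slash variant is included because Java's file APIs on Windows generally accept forward slashes too, which avoids escaping altogether.

```java
final class BackslashDemo {
    /** Counts the backslash characters actually present in the string. */
    static long countBackslashes(String s) {
        return s.chars().filter(c -> c == '\\').count();
    }

    public static void main(String[] args) {
        // Each "\\" in the source is a single backslash in the resulting string.
        String escaped = "C:\\Folder\\AnotherFolder\\file.txt";
        System.out.println(escaped);
        System.out.println(countBackslashes(escaped));

        // Alternative without any escaping.
        String forward = "C:/Folder/AnotherFolder/file.txt";
        System.out.println(forward);
    }
}
```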
I have various URLs like this:
String a = "file:./bla/file.txt"; // Valid, see RFC 3986, path-rootless definition
String b = "file:.file.txt"; // Valid, see RFC 3986, path-rootless definition
String c = "file:./file.txt"; // Valid, see RFC 3986, path-rootless definition
String d = "file:///file.txt";
String e = "file:///folder/file.txt";
String f = "http://example.com/file.txt";
String g = "https://example.com/file.txt";
These are all valid URLs, and I can convert them to a URL in Java without errors:
URL url = new URL(...);
I want to extract the filename from each of the examples above, so I'm left with just:
file.txt
I have tried the following, but this doesn't work for example b above (which is a valid URL):
b.substring(b.lastIndexOf('/') + 1); // Returns file:.file.txt
I can probably write some custom code to check for slashes, but I'm wondering whether there is a better, more robust way to do it.
The URI class properly parses the parts of a URI. For most URLs, you want the path of the URI. In the case of a URI with no slashes, there won’t be any parsing of the parts, so you’ll have to rely on the entire scheme-specific part:
URI uri = new URI(b);
String path = uri.getPath();
if (path == null) {
path = uri.getSchemeSpecificPart();
}
String filename = path.substring(path.lastIndexOf('/') + 1);
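The above should work for all of your URLs. Note that for example b the result is .file.txt, because the leading dot is part of the scheme-specific part.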
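The same logic wrapped in a method and run against the URLs from the question, as a minimal sketch. The names UriFileNames and fileName are made up here, and URI.create is used instead of new URI(...) only to avoid the checked exception.

```java
import java.net.URI;

final class UriFileNames {
    /** Returns the text after the last '/' of the URI's path or scheme-specific part. */
    static String fileName(String location) {
        URI uri = URI.create(location);
        // getPath() is null for opaque URIs such as "file:./bla/file.txt".
        String path = uri.getPath();
        if (path == null) {
            path = uri.getSchemeSpecificPart();
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:./bla/file.txt", "file:.file.txt", "file:./file.txt",
            "file:///file.txt", "file:///folder/file.txt",
            "http://example.com/file.txt", "https://example.com/file.txt"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + fileName(s));
        }
    }
}
```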
Problem: sometimes we get links or phrases with an encoding that is invalid for us.
Examples and my first solution are below.
Description:
I have to fix incorrectly encoded strings in one part of the application. Sometimes it is a word or phrase, but sometimes it is also a URL. When it's a URL, I would like to change only the wrongly encoded characters. If I decode with ISO-8859-1 and encode to UTF-8, the special URL characters (/ : ? = &) are also encoded. I coded a solution that works fine for my cases, but the hashes you will see below smell bad to me.
Have you had a similar problem, or do you know a library that can decode a phrase except for some characters? Something like this:
decode(String value, char[] ignored)
I also thought about breaking the URL into pieces and fixing only the path and query, but parsing them would be even messier.
TL;DR: Decode an ISO-8859-1 encoded URL and encode it to UTF-8. Don't touch URL-specific characters (/ ? = : &).
Input/Output examples:
// wrong input
"http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4"
"t%E9l%E9phone"
// good output
"http://some.url/xxx/a/%C3%A4t%C3%BCr%C3%A4/b/%C3%A4t%C3%BCr%C3%A4"
"t%C3%A9l%C3%A9phone"
// very wrong output
"http%3A%2F%2Fsome.url%2Fxxx%2Fa%2F%C3%A4t%C3%BCr%C3%A4%2Fb%2F%C3%A4t%C3%BCr%C3%A4"
My first solution:
class EncodingFixer {
private static final String SLASH_HASH = UUID.randomUUID().toString();
private static final String QUESTION_HASH = UUID.randomUUID().toString();
private static final String EQUALS_HASH = UUID.randomUUID().toString();
private static final String AND_HASH = UUID.randomUUID().toString();
private static final String COLON_HASH = UUID.randomUUID().toString();
EncodingFixer() {
}
String fix(String value) {
if (isBlank(value)) {
return value;
}
return tryFix(value);
}
private String tryFix(String str) {
try {
String replaced = replaceWithHashes(str);
String fixed = java.net.URLEncoder.encode(java.net.URLDecoder.decode(replaced, ISO_8859_1), UTF_8);
return replaceBack(fixed);
} catch (Exception e) {
return str;
}
}
private String replaceWithHashes(String str) {
return str
.replaceAll("/", SLASH_HASH)
.replaceAll("\\?", QUESTION_HASH)
.replaceAll("=", EQUALS_HASH)
.replaceAll("&", AND_HASH)
.replaceAll(":", COLON_HASH);
}
private String replaceBack(String fixed) {
return fixed
.replaceAll(SLASH_HASH, "/")
.replaceAll(QUESTION_HASH, "?")
.replaceAll(EQUALS_HASH, "=")
.replaceAll(AND_HASH, "&")
.replaceAll(COLON_HASH, ":");
}
}
Or should it be more like this?
Check if the input is a URL
Create a URL
Get the path
Split by /
Fix every part
Put it back together
Same for the query, but a little more complicated
I also thought about this, but it seems even messier than the replaceAll calls above :/
If you can clearly recognize that a string is a URL, then, following user @jschnasse's answer to a similar question on SO, this might be the solution you need:
URL url= new URL("http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString();
System.out.println(correctEncodedURL);
outputs:
http://some.url/xxx/a/%25e4t%25fcr%25E4/b/%25e4t%25fcr%25E4
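Note that the output above escapes the existing percent signs as %25 instead of converting the bytes behind them. A different approach is to leave everything except the %XX escapes untouched and rewrite only those. The sketch below assumes that every escape in the input denotes an ISO-8859-1 byte; input that is already UTF-8 encoded would be converted a second time. The names PercentReencoder and latin1EscapesToUtf8 are invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PercentReencoder {
    private static final Pattern ESCAPE = Pattern.compile("%([0-9a-fA-F]{2})");

    /** Rewrites ISO-8859-1 percent escapes as UTF-8 percent escapes; other text is copied as-is. */
    static String latin1EscapesToUtf8(String value) {
        Matcher m = ESCAPE.matcher(value);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(value, last, m.start());
            int b = Integer.parseInt(m.group(1), 16);
            if (b < 0x80) {
                // ASCII escapes are the same in both encodings, so keep them unchanged.
                out.append(m.group());
            } else {
                // An ISO-8859-1 byte value equals the Unicode code point of the character.
                byte[] utf8 = String.valueOf((char) b).getBytes(StandardCharsets.UTF_8);
                for (byte x : utf8) {
                    out.append(String.format("%%%02X", x & 0xFF));
                }
            }
            last = m.end();
        }
        out.append(value, last, value.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(latin1EscapesToUtf8("http://some.url/xxx/a/%e4t%fcr%E4/b/%e4t%fcr%E4"));
        System.out.println(latin1EscapesToUtf8("t%E9l%E9phone"));
    }
}
```

Because characters such as / ? = : & are never decoded in the first place, no placeholder replacement is needed.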
I have a complete file path and I need to extract just the file name and extension, so my output would be fileName.csv.
For example, the complete path is:
/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv
The output of my regex should be fileName.csv.
The extension and the number of directory levels are not fixed.
As part of my requirement, I need a single regex that extracts fileName.csv, not fileName_20150108_002_20150109013841.csv. How can I do it in a single regular expression?
Without using regex, this can be solved as follows:
public static String getFileName(String args) {
args = args.substring(args.lastIndexOf('/')+1);
return args.substring(0,args.indexOf('_')) + args.substring(args.indexOf('.'));
}
The regex below might work for you:
[^\\/:*?"<>|\r\n]+$
This regex has been tested on these two examples:
\var\www\www.example.com\index.jsp
\index.jsp
Or rather, you should use File.getName(), which is a better approach.
String filename = new File("Payload/brownie.app/Info.plist").getName();
System.out.println(filename);
Another way is:
int index = path.lastIndexOf(File.separatorChar);
String filename = path.substring(index+1);
Finally, after getting the full filename, use the code snippet below:
String str = filename;// in your case filename will be fileName_20150108_002_20150109013841.csv
str = str.substring(0,str.indexOf('_'))+str.substring(str.lastIndexOf('.'));
System.out.println("filename is ::"+str); // output will be fileName.csv
In the code below, group one ends up holding the last path segment, /fileName_timestamp.extension. I then replace slashes, digits, and underscores with the empty string. This may look ugly, but it will still serve your purpose. If the file name itself contains digits, a different approach is needed.
public static void main(String[] args) {
String toBeSplitted = "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv";
Pattern r = Pattern.compile("(/[a-zA-Z0-9_.-]+)+/?");
Matcher m = r.matcher(toBeSplitted);
if(m.matches()){
String s = m.group(1).replaceAll("(/|[0-9]|_)", "");
System.out.println(s);
}
}
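Since the question asks for a single regular expression, here is one possible sketch that does the whole job in one replaceAll call. It assumes, like the other answers, that the wanted name ends at the first underscore of the last path segment; the class name ShortFileName and the method shorten are made up for illustration.

```java
final class ShortFileName {
    // Optional directories, then the name up to the first '_', then anything, then the extension.
    private static final String PATTERN = "^(?:.*/)?([^/_]+)_[^/]*(\\.[^/.]+)$";

    /** Returns name plus extension, or the input unchanged if it does not match. */
    static String shorten(String path) {
        return path.replaceAll(PATTERN, "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(shorten(
            "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv"));
    }
}
```

Group 1 captures the part before the first underscore and group 2 captures the final dot and extension, so the replacement keeps only those two pieces.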
Whenever I try to convert a File to a JavaFX Media, it tries to make the path relative, which I do not want. I'm using a Mac.
This is my code:
static String AUDIO_URL_TO_TEST = "file://Users/Mike/Desktop/calb.mp3";
basicTime.getAudioOutput().setSource(new File(AUDIO_URL_TO_TEST));
I've tried almost everything for AUDIO_URL_TO_TEST, such as:
static String AUDIO_URL_TO_TEST = "file:///Users/Mike/Desktop/calb.mp3";
static String AUDIO_URL_TO_TEST = "file:/c:/Users/Mike/Desktop/calb.mp3";
static String AUDIO_URL_TO_TEST = "/Users/Mike/Desktop/calb.mp3";
static String AUDIO_URL_TO_TEST = "~/Users/Mike/Desktop/calb.mp3";
This is the code that setSource() calls:
Media m = new Media(source.getAbsoluteFile().toURI().toURL().toString());
player = new MediaPlayer(m);
Media ends up as something like this: /path/to/eclipse/directory/file://Users/Mike/Desktop/Calb.mp3, trying to make it relative.
I've tried things other than source.getAbsoluteFile().toURI().toURL().toString(), with just as little luck.
A side question: Why does the Media class only accept strings? That seems like a horrible design. Strings were meant to contain text, not reference files.
The API doc of Media says:
The Media class represents a media resource. It is instantiated from
the string form of a source URI. ...
So its constructor converts the string to a URI. But since the example strings in your question are passed through File first, the ones starting with file: are not seen as URIs and are treated as relative paths. For more information, please refer to the File, URI, and file protocol documentation. A valid URI can be built like this:
File f = new File("C:/Users/Mike/Desktop/Calb.mp3");
Media m = new Media(f.toURI().toString());
Alternatively,
URI uri = new URI("file:///C:/Users/Mike/Desktop/Calb.mp3");
// or, in short: new URI("file:/C:/Users/Mike/Desktop/Calb.mp3")
Media m = new Media(uri.toString());
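For the Mac path in the question, a minimal sketch of producing the source string from a plain absolute path (without a file: prefix) could look like this. It uses only java.io.File, so it runs without JavaFX; the resulting string is what would be handed to new Media(...). The names MediaSourceUri and fromPlainPath are invented for this example.

```java
import java.io.File;

final class MediaSourceUri {
    /** Converts a plain file system path into the string form of a file: URI. */
    static String fromPlainPath(String plainPath) {
        return new File(plainPath).getAbsoluteFile().toURI().toString();
    }

    public static void main(String[] args) {
        // A plain absolute path becomes a proper file: URI.
        System.out.println(fromPlainPath("/Users/Mike/Desktop/calb.mp3"));
        // A string that already starts with "file:" is treated by File as a relative
        // file name, so the working directory ends up in front of it.
        System.out.println(fromPlainPath("file://Users/Mike/Desktop/calb.mp3"));
    }
}
```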
Is there a better way to get a file's basename and extension than something like this?
File f = ...
String name = f.getName();
int dot = name.lastIndexOf('.');
String base = (dot == -1) ? name : name.substring(0, dot);
String extension = (dot == -1) ? "" : name.substring(dot+1);
I know others have mentioned String.split, but here is a variant that only yields two tokens (the base and the extension):
String[] tokens = fileName.split("\\.(?=[^\\.]+$)");
For example:
"test.cool.awesome.txt".split("\\.(?=[^\\.]+$)");
Yields:
["test.cool.awesome", "txt"]
The regular expression tells Java to split on any period that is followed by any number of non-periods, followed by the end of input. There is only one period that matches this definition (namely, the last period).
Technically Regexically speaking, this technique is called zero-width positive lookahead.
BTW, if you want to split a path into its directory and the full filename (including, but not limited to, the dot extension), using a path with forward slashes:
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
For example:
String dir = "/foo/bar/bam/boozled";
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
// [ "/foo/bar/bam/", "boozled" ]
Old question but I usually use this solution:
import org.apache.commons.io.FilenameUtils;
String fileName = "/abc/defg/file.txt";
String basename = FilenameUtils.getBaseName(fileName);
String extension = FilenameUtils.getExtension(fileName);
System.out.println(basename); // file
System.out.println(extension); // txt (NOT ".txt" !)
Source: http://www.java2s.com/Code/Java/File-Input-Output/Getextensionpathandfilename.htm
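A utility class such as this one:
class Filename {
private final String fullPath;
private final char pathSeparator, extensionSeparator;
public Filename(String str, char sep, char ext) {
fullPath = str;
pathSeparator = sep;
extensionSeparator = ext;
}
// Index of the extension separator in the last path segment, or -1 if there is none
private int extensionIndex() {
int dot = fullPath.lastIndexOf(extensionSeparator);
return (dot > fullPath.lastIndexOf(pathSeparator)) ? dot : -1;
}
public String extension() { // empty string if the file has no extension
int dot = extensionIndex();
return (dot == -1) ? "" : fullPath.substring(dot + 1);
}
public String filename() { // gets filename without extension
int dot = extensionIndex();
int sep = fullPath.lastIndexOf(pathSeparator);
return fullPath.substring(sep + 1, (dot == -1) ? fullPath.length() : dot);
}
public String path() { // empty string if there is no directory part
int sep = fullPath.lastIndexOf(pathSeparator);
return (sep == -1) ? "" : fullPath.substring(0, sep);
}
}
Usage: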
public class FilenameDemo {
public static void main(String[] args) {
final String FPATH = "/home/mem/index.html";
Filename myHomePage = new Filename(FPATH, '/', '.');
System.out.println("Extension = " + myHomePage.extension());
System.out.println("Filename = " + myHomePage.filename());
System.out.println("Path = " + myHomePage.path());
}
}
http://docs.oracle.com/javase/6/docs/api/java/io/File.html#getName()
From http://www.xinotes.org/notes/note/774/ :
Java has built-in functions to get the basename and dirname for a given file path, but the function names are not so self-apparent.
import java.io.File;
public class JavaFileDirNameBaseName {
public static void main(String[] args) {
File theFile = new File("../foo/bar/baz.txt");
System.out.println("Dirname: " + theFile.getParent());
System.out.println("Basename: " + theFile.getName());
}
}
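The same two operations are also available through java.nio.file.Path. A small sketch, with the class and method names (PathDirNameBaseName, baseName, dirName) chosen only for this example; the null checks cover paths that have no file name or no parent.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PathDirNameBaseName {
    /** Last element of the path, or an empty string if there is none. */
    static String baseName(String path) {
        Path name = Paths.get(path).getFileName();
        return (name == null) ? "" : name.toString();
    }

    /** Everything before the last element, or an empty string if there is no parent. */
    static String dirName(String path) {
        Path parent = Paths.get(path).getParent();
        return (parent == null) ? "" : parent.toString();
    }

    public static void main(String[] args) {
        System.out.println("Dirname: " + dirName("../foo/bar/baz.txt"));
        System.out.println("Basename: " + baseName("../foo/bar/baz.txt"));
    }
}
```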
What's wrong with your code? Wrapped in a neat utility method it's fine.
What's more important is what to use as separator — the first or last dot. The first is bad for file names like "setup-2.5.1.exe", the last is bad for file names with multiple extensions like "mybundle.tar.gz".
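As a sketch of such a utility method, here is the code from the question wrapped up and splitting at the last dot. The class name FileNameParts is made up, and treating a leading dot (as in .bashrc) as part of the base name rather than as an extension separator is an assumption of this example, not something the question specifies.

```java
final class FileNameParts {
    final String base;
    final String extension;

    private FileNameParts(String base, String extension) {
        this.base = base;
        this.extension = extension;
    }

    /** Splits a file name at its last dot; the extension is empty if there is none. */
    static FileNameParts splitAtLastDot(String name) {
        int dot = name.lastIndexOf('.');
        if (dot <= 0) {
            // No dot at all, or only a leading dot as in ".bashrc".
            return new FileNameParts(name, "");
        }
        return new FileNameParts(name.substring(0, dot), name.substring(dot + 1));
    }

    public static void main(String[] args) {
        for (String n : new String[] { "setup-2.5.1.exe", "mybundle.tar.gz", "README" }) {
            FileNameParts p = splitAtLastDot(n);
            System.out.println(n + " -> base=" + p.base + ", extension=" + p.extension);
        }
    }
}
```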
File extensions are a broken concept
And there exists no reliable function for it. Consider for example this filename:
archive.tar.gz
What is the extension? DOS users would have preferred the name archive.tgz. Sometimes you see stupid Windows applications that first decompress the file (yielding a .tar file), then you have to open it again to see the archive contents.
In this case, a more reasonable notion of file extension would have been .tar.gz. There are also .tar.bz2, .tar.xz, .tar.lz and .tar.lzma file "extensions" in use. But how would you decide whether to split at the last dot or the second-to-last dot?
Use MIME types instead.
The Java 7 function Files.probeContentType can be more reliable for detecting file types than trusting the file extension yourself, although its result depends on the installed file type detectors, and some default implementations consult the extension as well. Pretty much all of the Unix/Linux world, as well as your web browser and smartphone, already works this way.
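A minimal sketch of calling Files.probeContentType. The method returns null when no type can be determined, so the example maps that case to the string "unknown"; the class name ContentTypeProbe and both helper methods are invented for this illustration, and the actual result depends on the platform.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContentTypeProbe {
    /** Returns the probed MIME type, or "unknown" if none could be determined. */
    static String contentTypeOrUnknown(Path file) {
        try {
            String type = Files.probeContentType(file);
            return (type == null) ? "unknown" : type;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary file with the given suffix, probes it, and deletes it again. */
    static String probeTempFile(String suffix) {
        try {
            Path tmp = Files.createTempFile("probe-demo", suffix);
            try {
                return contentTypeOrUnknown(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(probeTempFile(".txt"));
    }
}
```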
You can also use Java regular expressions. String.split() uses regular expressions internally as well. Refer to http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
Maybe you could use String#split
To answer your comment:
I'm not sure whether there can be more than one . in a filename, but even if there are more dots you can still use split. Note that split takes a regular expression, so the dot must be escaped. Consider, for example:
String input = "boo.and.foo";
String[] result = input.split("\\.");
This will return an array containing:
{ "boo", "and", "foo" }
So you will know that the last element of the array is the extension and all the others make up the base.
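A sketch of putting the pieces back together after splitting. Pattern.quote is used so that the dot is matched literally rather than as the regex wildcard; the names SplitOnDots and baseAndExtension are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitOnDots {
    /** Returns { base, extension }; the extension is empty if the name has no dot. */
    static String[] baseAndExtension(String name) {
        String[] parts = name.split(Pattern.quote("."));
        if (parts.length < 2) {
            return new String[] { name, "" };
        }
        // Everything except the last token belongs to the base name.
        String base = String.join(".", Arrays.copyOf(parts, parts.length - 1));
        return new String[] { base, parts[parts.length - 1] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(baseAndExtension("boo.and.foo")));
    }
}
```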