How to match file name containing spaces - java

We need to validate file types, so only certain file types should be accepted.
The current code works fine as long as there are no spaces in the file name. But when there is one, we hit the "error ...!!!" path and the program says the file is invalid.
Example: Home home.png is marked as invalid but HomeHome.png is marked as valid.
How can we match file names even if they contain white spaces?
Below is my code:
public boolean fileUpload(UploadedFile uploadFile) {
boolean validFile = true;
System.out.println("file date...!!!" + uploadFile.getFileName());
System.out.println("file date...!!!" + uploadFile.getSize());
String expression = "([^\\s]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
if ((uploadFile.getSize() != 0)) {
System.out.println(" file name" + uploadFile.getFileName());
if ((!uploadFile.getFileName().matches(expression))) {
System.out.println("error ...!!!" + uploadFile.getFileName());
validFile = false;
}
}
return validFile;
}

Because your expression rejects whitespace: [^\\s]+ matches only non-whitespace characters (the caret ^ inside the brackets negates the class).
String expression = "([^\\s]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
Instead, exclude only the dot:
String expression = "([^\\.]+(\\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|PNG|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN))$)";
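A minimal sketch of how the corrected pattern behaves, assuming a shortened, hypothetical extension list (the class and method names are illustrative, not from the question). It also shows one side effect of [^.]+: names with more than one dot are rejected.

```java
import java.util.regex.Pattern;

final class FileNameCheck {
    // Hypothetical, shortened extension list for illustration only.
    private static final Pattern ALLOWED = Pattern.compile(
            "[^.]+\\.(jpg|jpeg|png|pdf|docx|txt)", Pattern.CASE_INSENSITIVE);

    static boolean isAllowed(String fileName) {
        // matches() requires the whole name to match, so no ^ or $ is needed.
        return ALLOWED.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("Home home.png"));   // true
        System.out.println(isAllowed("HomeHome.PNG"));    // true
        System.out.println(isAllowed("notes.exe"));       // false
        System.out.println(isAllowed("archive.tar.png")); // false: [^.]+ forbids a second dot
    }
}
```

If names with several dots must be accepted, a pattern such as ".+\\.(...)" would be one alternative to consider.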

If you are okay with using regular expressions, try this:
String fileName = uploadFile.getFileName().replaceAll("\\s+","");

(([a-zA-Z0-9\s_\\.\-:])+\.(?i)(jpg|xls|xlsx|doc|jpeg|png|pdf|ppt|pptx|docx|txt|xml|101701|AQR|ERR|CFR|RPA|AES|CAO|CIC|CAI|IMC|IMA|ITC|CDN|CIR|IMR|IME|ITR|CDR|CUN|CIN|CCN|CGI|IDL|IGR|IIL|IQL|NNL|SSR|STN)$)
This regex accepts spaces in the file name.

Related

Decode alfresco file name or replace unicode[_x0020_] characters in String/fileName

I am using the Alfresco upload and download services from Java.
When I upload a file to the Alfresco server, it gives me the following path:
/app:Home/cm:Company_x0020_Home/cm:Abc/cm:TestFile/cm:V4/cm:BC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
When I use the same file path to download through the Alfresco services, I take the file name from the end of the path,
i.e. ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
How can I remove or decode the [Unicode] escape sequences in the file name?
String decoded = URLDecoder.decode(queryString, "UTF-8");
The above does not work.
These are some of the Unicode characters that appear in my file name:
https://en.wikipedia.org/wiki/List_of_Unicode_characters
Please do not mark the question as a duplicate; I have gone through the links below, but none of them gave a solution.
These are the links I searched for replacing Unicode characters in a String with Java:
Java removing unicode characters
Remove non-ASCII characters from String in Java
How can I replace a unicode character in java string
Java Replace Unicode Characters in a String
The solution given by Jeff Potts is perfect.
But I needed the file name in a different project that does not use the org.alfresco jars,
and I would have had to pull in all those dependencies just to decode a file name.
So I used plain Java with a regex to parse the file name and decode it, which gave me the same result as
ISO9075.decode(test);
This is the code:
public String decode_FileName(String fileName) {
System.out.println("fileName : " + fileName);
String decodedfileName = fileName;
String temp = "";
Matcher m = Pattern.compile("_x([0-9A-Fa-f]{4})_").matcher(decodedfileName); // regex that matches escapes such as _x0020_ (four hex digits)
List<String> unicodeChars = new ArrayList<String>();
while (m.find()) {
unicodeChars.add(m.group(1));
}
for (int i = 0; i < unicodeChars.size(); i++) {
temp = unicodeChars.get(i);
if (isInteger(temp)) {
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf(temp), 16)));//converting
decodedfileName = decodedfileName.replace("_x" + temp + "_", replace_char);
}
}
System.out.println("Decoded FileName :" + decodedfileName);
return decodedfileName;
}
And use this small utility to check whether the string is a hexadecimal integer:
public static boolean isInteger(String s) {
try {
Integer.parseInt(s, 16); // radix 16, so codes such as 002F are accepted too
} catch (NumberFormatException e) {
return false;
} catch (NullPointerException e) {
return false;
}
return true;
}
The above code works like this.
Example:
0028 is the left parenthesis, U+0028, as you can see at
https://en.wikipedia.org/wiki/List_of_Unicode_characters
String replace_char = String.valueOf(((char) Integer.parseInt(String.valueOf("0028"), 16)));
System.out.println(replace_char);
This code prints ( which is a left parenthesis.
This is the logic I used in my Java program.
The above program gives the same results as ISO9075.decode(test).
Output:
fileName : ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf
Decoded FileName :ABC1X 0400 0109-(1-2)_v2.pdf
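For comparison, here is a more compact sketch of the same idea using Matcher.appendReplacement. It is not the Alfresco implementation: it assumes every escape is exactly _xHHHH_ with four hex digits, and the class name Iso9075Sketch is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Iso9075Sketch {
    // Matches _xHHHH_ where HHHH is exactly four hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-Fa-f]{4})_");

    static String decode(String name) {
        Matcher m = ESCAPE.matcher(name);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the hex digits and turn the code into a single char.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: ABC1X 0400 0109-(1-2)_v2.pdf
        System.out.println(decode("ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf"));
    }
}
```

Doing the replacement in one pass avoids building the intermediate list and calling replace once per escape.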
In the org.alfresco.util package you will find a class called ISO9075. You can use it to encode and decode strings according to that spec. For example:
String test = "ABC1X_x0020_0400_x0020_0109-_x0028_1-2_x0029__v2.pdf";
String out = ISO9075.decode(test);
System.out.println(out);
Returns:
ABC1X 0400 0109-(1-2)_v2.pdf
If you want to see what it does behind the scenes, look at the source.

Gujarati text in Java String

I have a Gujarati Bible and am trying to insert each verse into a MySQL database using a parser written in Java. When I assign Gujarati text to a Java String variable, it shows junk characters in the debugger.
E.g. this is my Gujarati text:
હે યહોવા તું મારો દેવ છે;
I assign it to Java String variable as shown below
verse._verseText = "હે યહોવા તું મારો દેવ છે;";
What I see in the debug window is all junk characters. Any help is appreciated. If you need more information, let me know and I will provide it.
UPDATE
Pasting my parser code here
private Boolean Insert(String _text)
{
BibleVerse verse = new BibleVerse();
String[] data = _text.split("\\|");
try
{
if (data[0].equals(bookName) || bookName.equals("All"))
{
verse._Version = "Gujarati";
verse._book = data[0];
verse._chapter = Integer.parseInt(data[1]);
verse._verse = Integer.parseInt(data[2]);
verse._verseText = new String(data[3].getBytes(), "UTF-8");
_bibleDatabase.Insert(verse);
pcs.firePropertyChange("logupdate", null, data[0] + " " + data[1] + "," + data[2] + " - INSERTED.");
}
else
{
pcs.firePropertyChange("logupdate", null, data[0] + " " + data[1] + "," + data[2] + " - SKIPPED.");
}
return true;
}
catch(Exception e)
{
pcs.firePropertyChange("logupdate", null, "ERROR : " + e.getMessage());
return false;
}
}
Here is the sample line from the text file
Isaiah|25|1|હે યહોવા તું મારો દેવ છે; હું તને મોટો માનીશ, હું તારા નામની સ્તુતિ કરીશ; કેમકે તેં અદભુત કાર્યો કર્યાં છે, તેં વિશ્વાસુપણે તથા સત્યતાથી પુરાતન સંકલ્પો પાર પાડ્યા છે.
UPDATE
Here is the code where I open & read file.
try
{
FileReader _file = new FileReader(this._filename);
_bufferedReader = new BufferedReader(_file);
SwingWorker parseWorker = new SwingWorker()
{
@Override
protected Object doInBackground() throws Exception
{
String line;
String[] data;
int lineno=0;
BibleVerse verse = new BibleVerse();
while ((line = _bufferedReader.readLine()) != null)
{
++lineno;
pcs.firePropertyChange("pgbupdate", null, lineno);
Insert(line);
}
_bufferedReader.close();
return null;
}
@Override
protected void done()
{
pcs.firePropertyChange("logupdate", null, "Parsing complete.");
}
};
parseWorker.execute();
}
catch (Exception e)
{
pcs.firePropertyChange("logupdate", null, "ERROR : " + e.getMessage());
}
The problem is this:
FileReader _file = new FileReader(this._filename);
This reads the file using the platform's default charset. If your data file is not encoded in that charset, you will get incorrect characters.
On Windows, the default charset is usually a legacy code page such as windows-1252, not UTF-8. On most other systems, it's UTF-8. (From Java 18 on, the default is UTF-8 everywhere.)
The easiest solution is to find out the actual encoding of your data file, so you can specify it explicitly in the code. The encoding of a file can be determined with the file command on Unix and Linux systems. In Windows, you may need to examine it with a binary editor, or install something like Cygwin, which has a file command of its own.
Once you know what it is, you should pass it explicitly to the construction of your Reader:
// Replace "UTF-8" with the actual encoding of your data file (if it's not UTF-8).
Reader _file = new InputStreamReader(new FileInputStream(this._filename), "UTF-8");
Once you've done that, there is no reason for any other part of your code to concern itself with bytes. You should replace this:
verse._verseText = new String(data[3].getBytes(), "UTF-8");
with this:
verse._verseText = data[3];
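As a rough sketch of the same fix with the java.nio.file API (Java 7+), assuming the data file really is UTF-8: Files.newBufferedReader takes the charset explicitly, so the platform default never comes into play. The class name and the temporary file are only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class Utf8LineReader {
    // Reads all lines of a file, decoding the bytes as UTF-8 regardless of the platform default.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("verses", ".txt");
        // First syllable of the sample verse, written as escapes so the
        // encoding of this source file does not matter.
        String sample = "Isaiah|25|1|\u0AB9\u0AC7";
        Files.write(tmp, sample.getBytes(StandardCharsets.UTF_8));
        for (String line : readLines(tmp)) {
            String[] data = line.split("\\|");
            System.out.println(data[3]); // the verse text, no re-decoding needed
        }
        Files.delete(tmp);
    }
}
```

Whether the text then displays correctly also depends on the console or debugger font and encoding, which this sketch does not address.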
See "How to inject Chinese characters using JavaScript?". It is not quite the same problem, but I think the same solution may work in this case:
If the script is inline (in the HTML file), then it uses the encoding of the HTML file and you won't have an issue.
If the script is loaded from another file:
Your text editor must save the file in an appropriate encoding such as UTF-8 (it's probably doing this already if you're able to save it, close it, and reopen it with the characters still displaying correctly).
Your web server must serve the file with the right HTTP header specifying that it's UTF-8 (or whatever the encoding happens to be, as determined by your text editor settings). Here's an example of how to do this with PHP: Set http header to utf-8 php
If you can't have your web server do this, try setting the charset attribute on your script tag. I tried to see what the spec said should happen in the case of mismatching charsets defined by the tag and the HTTP headers, but couldn't find anything concrete, so just test and see if it helps.
If that doesn't work, place your script inline.
Java strings are Unicode, so you can store Gujarati text in a Java String; if the source file encoding is the issue, you can write the characters as Unicode escapes. See this: http://jrgraphix.net/r/Unicode/0A80-0AFF
So for example the first code point of the Gujarati block:
char example = '\u0A80';
String result = Character.toString(example);

Regex to extract the filename and drop the file timestamp from complete path in Java

I have a complete file path and need to extract just the file name and its extension, so my output would be fileName.csv.
For example, the complete path is:
/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv
The output of the regex should be fileName.csv.
The extension and the depth of directories are not fixed.
As part of my requirement I need a single regex that extracts fileName.csv, not fileName_20150108_002_20150109013841.csv. How can I do it in a single regular expression?
Without using regex this can be solved as:
public static String getFileName(String args){
args = args.substring(args.lastIndexOf('/')+1);
return args.substring(0,args.indexOf('_')) + args.substring(args.indexOf('.'));
}
The regex below might work for you:
[^\\/:*?"<>|\r\n]+$
This regex has been tested on these two examples:
\var\www\www.example.com\index.jsp
\index.jsp
Or rather, you should use File.getName(), which is the better approach:
String filename = new File("Payload/brownie.app/Info.plist").getName();
System.out.println(filename);
Another way is:
int index = path.lastIndexOf(File.separatorChar);
String filename = path.substring(index+1);
Finally, after getting the full file name, use the code snippet below:
String str = filename;// in your case filename will be fileName_20150108_002_20150109013841.csv
str = str.substring(0,str.indexOf('_'))+str.substring(str.lastIndexOf('.'));
System.out.println("filename is ::"+str); // output will be fileName.csv
In the code below, group one will be /fileName_timestamp.extension. I then replace slashes, digits and underscores with the empty string. This may look ugly, but it will still serve your purpose. If the file name itself contains digits, we need to go for a different approach.
public static void main(String[] args) {
String toBeSplitted = "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv";
Pattern r = Pattern.compile("(/[a-zA-Z0-9_.-]+)+/?");
Matcher m = r.matcher(toBeSplitted);
if(m.matches()){
String s = m.group(1).replaceAll("(/|[0-9]|_)", "");
System.out.println(s);
}
}
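Since the question asks for a single regular expression, here is one possible sketch. It assumes the base name contains no underscore and that everything from the first underscore up to the last dot is the timestamp; the class name TimestampStripper is hypothetical.

```java
final class TimestampStripper {
    // Group 1: base name up to the first underscore.
    // Group 2: the last extension, including its dot.
    // The optional prefix (?:.*/)? swallows any number of directories.
    private static final String REGEX = "^(?:.*/)?([^/_]+)_[^/]*(\\.[^./]+)$";

    static String strip(String path) {
        // replaceFirst returns the input unchanged when the pattern does not match.
        return path.replaceFirst(REGEX, "$1$2");
    }

    public static void main(String[] args) {
        String path = "/Dir1/Dir2/Dir3/Dir4/Dir5/Dir6/fileName_20150108_002_20150109013841.csv";
        System.out.println(strip(path)); // fileName.csv
    }
}
```

Unlike the digit-stripping approach, this keeps digits that belong to the extension or base name, as long as the base name has no underscore.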

Java Replacing Help Needed

Hey guys, I am trying to strip the file name so that I get only /hello/what/ without REMOVEThis4.PNG. I don't want to use string.replace("REMOVEThis4.PNG", ""); because I want to use it on other strings too, not only that one.
Any help is appreciated. My code:
String sFile = "/hello/what/REMOVEThis4.PNG";
if (sFile.contains("/")){
String Replaced = sFile.replaceAll("(?s)", "");
System.out.println(Replaced);
}
I want the output to be
/hello/what/
only. Thanks a lot!
If you are trying to parse a path, I recommend finding the last index of / and taking the substring up to that index plus one. So:
string = string.substring(0, string.lastIndexOf("/") + 1);
No need to use regular expressions in your case:
String sFile = "/hello/what/REMOVEThis4.PNG";
// TODO check actual last index of "/" against -1
System.out.println(sFile.substring(0, sFile.lastIndexOf("/") + 1));
Output
/hello/what/
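A small sketch of the check mentioned in the TODO, with a hypothetical helper name. Note that without the check nothing throws: lastIndexOf returns -1, so the substring is simply empty. Making the case explicit just documents the intent.

```java
final class ParentPath {
    // Returns everything up to and including the last '/', or "" when there is none.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash + 1);
    }

    public static void main(String[] args) {
        System.out.println(parentOf("/hello/what/REMOVEThis4.PNG")); // /hello/what/
        System.out.println(parentOf("REMOVEThis4.PNG").isEmpty());   // true
    }
}
```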
Note
In case you are dealing with actual files, you can probably spare yourself the String manipulation and use File.getParent() instead:
File file = new File("/hello/what/REMOVEThis4.PNG");
System.out.println(file.getParent());
Output (may change depending on your system)
\hello\what
Use Java's File API:
String example = "/hello/what/REMOVEThis4.PNG";
File file = new File(example);
System.out.println(example);
String absolutePath = file.getAbsolutePath();
String filePath = absolutePath.substring(0, absolutePath.lastIndexOf(File.separator));
System.out.println(filePath);

Java: splitting the filename into a base and extension

Is there a better way to get file basename and extension than something like
File f = ...
String name = f.getName();
int dot = name.lastIndexOf('.');
String base = (dot == -1) ? name : name.substring(0, dot);
String extension = (dot == -1) ? "" : name.substring(dot+1);
I know others have mentioned String.split, but here is a variant that only yields two tokens (the base and the extension):
String[] tokens = fileName.split("\\.(?=[^\\.]+$)");
For example:
"test.cool.awesome.txt".split("\\.(?=[^\\.]+$)");
Yields:
["test.cool.awesome", "txt"]
The regular expression tells Java to split on any period that is followed by any number of non-periods, followed by the end of input. There is only one period that matches this definition (namely, the last period).
Technically Regexically speaking, this technique is called zero-width positive lookahead.
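A runnable sketch of the split above (the class and method names are made up for illustration). It also shows what happens when the name has no dot at all: the array has a single element, so check the length before reading the extension.

```java
import java.util.Arrays;

final class SplitExtension {
    // Splits on the last dot only, using a zero-width positive lookahead.
    static String[] baseAndExtension(String fileName) {
        return fileName.split("\\.(?=[^.]+$)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(baseAndExtension("test.cool.awesome.txt"))); // [test.cool.awesome, txt]
        System.out.println(Arrays.toString(baseAndExtension("README")));                // [README]
    }
}
```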
BTW, if you want to split a path into the directory and the full file name (including but not limited to the dot extension), using a path with forward slashes, note that split discards whatever the pattern consumes, so the pattern has to be zero-width:
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
For example:
String dir = "/foo/bar/bam/boozled";
String[] tokens = dir.split("(?<=/)(?=[^/]+$)");
// [ "/foo/bar/bam/", "boozled" ]
Old question but I usually use this solution:
import org.apache.commons.io.FilenameUtils;
String fileName = "/abc/defg/file.txt";
String basename = FilenameUtils.getBaseName(fileName);
String extension = FilenameUtils.getExtension(fileName);
System.out.println(basename); // file
System.out.println(extension); // txt (NOT ".txt" !)
Source: http://www.java2s.com/Code/Java/File-Input-Output/Getextensionpathandfilename.htm
A utility class such as this:
class Filename {
private String fullPath;
private char pathSeparator, extensionSeparator;
public Filename(String str, char sep, char ext) {
fullPath = str;
pathSeparator = sep;
extensionSeparator = ext;
}
public String extension() {
int dot = fullPath.lastIndexOf(extensionSeparator);
return fullPath.substring(dot + 1);
}
public String filename() { // gets filename without extension
int dot = fullPath.lastIndexOf(extensionSeparator);
int sep = fullPath.lastIndexOf(pathSeparator);
return fullPath.substring(sep + 1, dot);
}
public String path() {
int sep = fullPath.lastIndexOf(pathSeparator);
return fullPath.substring(0, sep);
}
}
usage:
public class FilenameDemo {
public static void main(String[] args) {
final String FPATH = "/home/mem/index.html";
Filename myHomePage = new Filename(FPATH, '/', '.');
System.out.println("Extension = " + myHomePage.extension());
System.out.println("Filename = " + myHomePage.filename());
System.out.println("Path = " + myHomePage.path());
}
}
http://docs.oracle.com/javase/6/docs/api/java/io/File.html#getName()
From http://www.xinotes.org/notes/note/774/ :
Java has built-in functions to get the basename and dirname for a given file path, but the function names are not so self-apparent.
import java.io.File;
public class JavaFileDirNameBaseName {
public static void main(String[] args) {
File theFile = new File("../foo/bar/baz.txt");
System.out.println("Dirname: " + theFile.getParent());
System.out.println("Basename: " + theFile.getName());
}
}
What's wrong with your code? Wrapped in a neat utility method it's fine.
What's more important is what to use as separator — the first or last dot. The first is bad for file names like "setup-2.5.1.exe", the last is bad for file names with multiple extensions like "mybundle.tar.gz".
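To make the trade-off concrete, here is a short sketch (helper names are hypothetical) that computes the extension both ways for the two file names mentioned above.

```java
final class DotChoice {
    // Everything after the first dot, or "" if there is no dot.
    static String afterFirstDot(String name) {
        int i = name.indexOf('.');
        return i < 0 ? "" : name.substring(i + 1);
    }

    // Everything after the last dot, or "" if there is no dot.
    static String afterLastDot(String name) {
        int i = name.lastIndexOf('.');
        return i < 0 ? "" : name.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(afterFirstDot("setup-2.5.1.exe")); // 5.1.exe (wrong)
        System.out.println(afterLastDot("setup-2.5.1.exe"));  // exe
        System.out.println(afterFirstDot("mybundle.tar.gz")); // tar.gz
        System.out.println(afterLastDot("mybundle.tar.gz"));  // gz (loses the tar part)
    }
}
```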
File extensions are a broken concept
And there exists no reliable function for it. Consider for example this filename:
archive.tar.gz
What is the extension? DOS users would have preferred the name archive.tgz. Sometimes you see stupid Windows applications that first decompress the file (yielding a .tar file), then you have to open it again to see the archive contents.
In this case, a more reasonable notion of file extension would have been .tar.gz. There are also .tar.bz2, .tar.xz, .tar.lz and .tar.lzma file "extensions" in use. But how would you decide, whether to split at the last dot, or the second-to-last dot?
Use mime-types instead.
The Java 7 function Files.probeContentType will likely be much more reliable for detecting file types than trusting the file extension. Pretty much all of the Unix/Linux world, as well as your web browser and smartphone, already does it this way.
You can also use Java regular expressions. String.split() also uses a regular expression internally. Refer to http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html
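A minimal sketch of calling Files.probeContentType. The result depends on the file type detectors installed on the platform, and the method may return null when nothing recognizes the file, so the output is deliberately not shown; the class name and temporary file are just for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContentTypeProbe {
    // Returns the MIME type reported by the platform, or null if it cannot be determined.
    static String probe(Path file) throws IOException {
        return Files.probeContentType(file);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("probe", ".txt");
        Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
        String type = probe(tmp);
        // Always handle the null case before using the result.
        System.out.println(type == null ? "unknown" : type);
        Files.delete(tmp);
    }
}
```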
Maybe you could use String#split
To answer your comment:
I'm not sure if there can be more than one . in a filename, but whatever, even if there are more dots you can use split. Note that split takes a regular expression, so the dot must be escaped. Consider e.g.:
String input = "boo.and.foo";
String[] result = input.split("\\.");
This will return an array containing:
{ "boo", "and", "foo" }
So you will know that the last index in the array is the extension and all others are the base.
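One pitfall worth a quick sketch: String.split takes a regular expression, and an unescaped dot matches every character, so the result is an empty array. The helper name lastPart is made up for this example.

```java
import java.util.Arrays;

final class SplitOnDot {
    // Returns the text after the last dot; for a name without a dot this is the whole name.
    static String lastPart(String name) {
        String[] parts = name.split("\\.");
        return parts[parts.length - 1];
    }

    public static void main(String[] args) {
        String input = "boo.and.foo";
        // Unescaped "." is a regex wildcard: every character is a delimiter,
        // and the resulting empty strings are all discarded.
        System.out.println(input.split(".").length);              // 0
        System.out.println(Arrays.toString(input.split("\\."))); // [boo, and, foo]
        System.out.println(lastPart(input));                      // foo
    }
}
```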
