Java/ImageIO Validate format before reading the entire file?

I'm developing a Web application that will let users upload images.
My concern is file size, especially when the upload turns out to be in an invalid format.
I'm wondering if there's a way in Java (or a third-party library) to check for the allowed file formats (jpg, gif and png) before reading the entire file.

If you wish to support only a few types of images, you can start (up)loading the image and at some point use the first few bytes to check whether you wish to continue the upload.
Quite a lot of image formats can be recognized by their first few bytes, the magic number. A matching magic number doesn't prove the file is valid, of course, but you can compare it against the file extension and reject uploads where the two do not correspond at all.
Have a look at this page to check out some Java code that checks MIME types. Do read the docs or source to check whether any given method requires the entire file, or can operate on the first few bytes. I've not used those libraries :)
Also check out this page, which lists some Java libraries and some papers on which detection is based.
Don't forget to put in some feedback if you managed to find something you like!
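
The magic-number check described above can be sketched in plain Java. This is a minimal illustration, not a complete validator: the class and method names (ImageSniffer, detect, sniff) are made up for this example, only the three formats from the question are covered, and readNBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ImageSniffer {
    // Well-known leading bytes ("magic numbers") of the three allowed formats.
    private static final int[] PNG = {0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
    private static final int[] JPEG = {0xFF, 0xD8, 0xFF};
    private static final int[] GIF87 = {'G', 'I', 'F', '8', '7', 'a'};
    private static final int[] GIF89 = {'G', 'I', 'F', '8', '9', 'a'};

    private static boolean startsWith(byte[] head, int len, int[] magic) {
        if (len < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    /** Returns "png", "jpg" or "gif" for a recognized header, otherwise null. */
    static String detect(byte[] head, int len) {
        if (startsWith(head, len, PNG)) return "png";
        if (startsWith(head, len, JPEG)) return "jpg";
        if (startsWith(head, len, GIF87) || startsWith(head, len, GIF89)) return "gif";
        return null;
    }

    /** Reads at most 8 bytes from the upload stream and classifies them. */
    static String sniff(InputStream in) throws IOException {
        byte[] head = new byte[8];
        int len = in.readNBytes(head, 0, head.length); // Java 9+
        return detect(head, len);
    }

    public static void main(String[] args) throws IOException {
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};
        System.out.println(sniff(new ByteArrayInputStream(png))); // png
    }
}
```

A null result is a reason to abort the upload early. A non-null result only says the header looks right, so the file still has to be decoded before it can be trusted.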

You don't need 3rd party libraries. The code you have to write is simple.
At the point you are handling your uploads, filter the files by their extension. This isn't perfect, but will account for most of the cases.
However, this would mean the files have already been uploaded to the server. You can use a bit of JavaScript on the client side to perform the same operation: check whether the value of the file-upload component ends in an allowed file type (.jpg, .png, etc.).

function extensionsOkay(fval) {
    var extension = new Array();
    extension[0] = ".png";
    extension[1] = ".gif";
    extension[2] = ".jpg";
    extension[3] = ".jpeg";
    extension[4] = ".bmp";
    // No other customization needed.
    var thisext = fval.substr(fval.lastIndexOf('.')).toLowerCase();
    for (var i = 0; i < extension.length; i++) {
        if (thisext == extension[i]) {
            $('#support-documents').hide();
            return true;
        }
    }
    // show client side error message
    $('#span.failed').show();
    return false;
}

Related

jQuery validation in JSP

For my Java servlet program, I use JSP with jQuery client-side validation that only allows .xls files to be uploaded. It runs successfully for all users but one, who has problems uploading a .xls file: the program shows an alert message and does not allow the upload. We have also checked browser compatibility.
Questions:
What would be the reason?
Is it adding extra char with extension .xls? If yes, then what would be solution?
Is it a network issue? If yes, then what would be solution?
This is the code:
function dosearch(){
    var reportFile = document.form1.reportFile.value;
    if(reportFile==null || reportFile==''){
        $("#span_filename").show();
        if(!window.console)
        {
            window.console = {log: function(){}};
        }else{
            console.log("file name is not valid"+reportFile);
        }
    }else if(reportFile.substring(reportFile.indexOf(".")+1)!="xls"){
        alert("Please make sure the uploaded file is an excel file");
        if(!window.console)
        {
            window.console = {log: function(){}};
        }else{
            console.log("invalid file format or might be its adding extra name or extension name with file , "+reportFile);
        }
    }else if(confirm('Do you want to upload now?')){
        form1.bt1.disabled=true;
        $("#pro").show();
        document.form1.action='catalog?module=PayrollResultUpload&cmd=doUpload';
        document.form1.submit();
    }
}

Well, it seems that you are only validating the xls extension. To improve this, for example to also accept the xlsx extension, I suggest a regex pattern check.
Change the line:
}else if(reportFile.substring(reportFile.indexOf(".")+1)!="xls"){
to this version:
}else if(!reportFile.substring(reportFile.indexOf(".")+1).match(/^xls.?$/i)){
The pattern accepts xls optionally followed by one more character (so xlsx passes), ignoring case. Note the leading !: the alert branch must run when the extension does not match.
Regarding network issues... You could check the connection with an AJAX worker that polls regularly and warns the user when there is no connection, but I think it's too complicated for this case. Better to tell the user to try uploading later ;)

I ran some tests in the browser console and found the exact problem: the code was picking up an extra dot (.) before the file name. That dot was not in the file name itself but in the client system's folder structure, i.e. C:\ads.test\file\FILE_NAME_HERE.xls. So I used the lastIndexOf() method instead of indexOf(), and now it runs fine without any problem.
@Thiamath, thank you for the valuable information and for taking part in this question.
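
Since client-side checks can be bypassed, the same lastIndexOf() idea is worth repeating in the servlet. The following is only a sketch: UploadExtension, extensionOf and isExcel are hypothetical names, and the submitted file name is assumed to arrive as a plain string that may still carry a client path.

```java
import java.util.Locale;

public class UploadExtension {
    /** Returns the lower-cased extension without the dot, or "" if there is none. */
    static String extensionOf(String path) {
        // Strip any directory part first; some browsers report a full client path.
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        String name = path.substring(slash + 1);
        // lastIndexOf, not indexOf: only the final dot separates the extension.
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) return "";
        return name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    static boolean isExcel(String path) {
        String ext = extensionOf(path);
        return ext.equals("xls") || ext.equals("xlsx");
    }

    public static void main(String[] args) {
        System.out.println(extensionOf("C:\\ads.test\\file\\FILE_NAME_HERE.xls")); // xls
    }
}
```

Removing the directory part before looking for the dot means a dotted folder name such as ads.test can never be mistaken for the extension.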

Trailing null (\x00) characters when writing text to Accumulo

I am trying to write the name of a file into Accumulo. I am using accumulo-core-1.43.
For some reason, certain files seem to be written into Accumulo with trailing \x00 characters at the end of the name. The upload is coming through a Java servlet (using the jquery file upload plugin). In the servlet, I check the name of the file with a System.out.println and it looks normal, and I even tried unescaping the string with
org.apache.commons.lang.StringEscapeUtils.unescapeJava(...);
The actual writing to accumulo looks like this:
Mutation mut = new Mutation(new Text(checkSum));
Value val = new Value(new Text(filename).getBytes());
long timestamp = System.currentTimeMillis();
mut.put(new Text(colFam), new Text(EMPTY_BYTES), timestamp, val);
but nothing unusual showed up there (perhaps \x00 isn't escaped)? But then if I do a scan on my table in accumulo, there will be one or more \x00 in the file name.
The problem this seems to cause is that I return that string within XML when I retrieve a list of files (where it shows up) and pass that back to the browser; the XSL that is supposed to render the information in the XML no longer works when there are these extra characters (I'm not sure why that is the case either).
In Chrome, in the response to these calls, I see three red dots after the file name, and when I hover over them, \u0 pops up (which I think is a different representation of 0/null?).
Anyway, I'm just trying to figure out why this happens, or at the very least, how I can filter out \x00 characters before returning the file in Java. Any ideas?

You are likely incorrectly using the Hadoop Text class -- this is not an error with Accumulo. Specifically, you make the mistake in your above example:
Value val = new Value(new Text(filename).getBytes());
You must adhere to the length provided by the Text class: getBytes() returns the backing array, and only the first getLength() bytes of it are valid. See the Text javadoc for more information. If you're using Hadoop-2.2.0, you can use the provided copyBytes method on Text. If you're on an older version of Hadoop where this method doesn't yet exist, you can use something like the ByteBuffer class or the System.arraycopy method to get a copy of the byte[] with the proper limits enforced.
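
To show the trimming step without needing Hadoop on the classpath, here is a sketch that uses a padded byte array as a stand-in for what Text.getBytes() can return. TextBytesTrim and validBytes are hypothetical names; with a real Text object the call would look like validBytes(text.getBytes(), text.getLength()).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextBytesTrim {
    /** Copies only the valid prefix of a backing array that may be padded. */
    static byte[] validBytes(byte[] backing, int validLength) {
        return Arrays.copyOf(backing, validLength);
    }

    public static void main(String[] args) {
        byte[] name = "report.csv".getBytes(StandardCharsets.UTF_8);
        // Stand-in for a backing array with spare capacity: three trailing 0x00 bytes.
        byte[] backing = Arrays.copyOf(name, name.length + 3);
        byte[] trimmed = validBytes(backing, name.length);
        System.out.println(backing.length + " -> " + trimmed.length); // 13 -> 10
        System.out.println(new String(trimmed, StandardCharsets.UTF_8)); // report.csv
    }
}
```

If the value starts out as a Java String anyway, filename.getBytes(StandardCharsets.UTF_8) yields an exactly sized array and avoids the Text round trip altogether.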

Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException

I have a Google App Engine app that converts XML to CSV files. It works fine for small XML inputs, but refuses to finalize the files for larger XML input. The XML is read from, and the resulting CSV files are written to, many times before finalization, over a long-running (multi-day) task. My problem is different from this one: FileServiceFactory getBlobKey throws IllegalArgumentException, since my code works fine both in production and development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. The failure only appears when I read from a larger XML file: the input XML file is ~150MB, and each of the 5 resulting CSV files is much smaller (perhaps 10MB each). I persisted the file URLs for the new CSV files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using similar length file service urls for the previous successful attempts with smaller files. It also wasn't an issue for the other stackoverflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
    //locked set to true
    FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
    fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files for which I have the URLs, but cannot. How can I get around this issue, and what may be the cause? Again, my code works for small files (~60 kB), but fails for the ~150MB input file. Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around before being deleted?

This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo

Get real file extension -Java code

I would like to determine the real file extension for security reasons.
How can I do that?

Supposing you really mean the true content type of a file (i.e. its MIME type), you should refer to this excellent answer.
You can get the true content type of a file in Java using the following code:
File file = new File("filename.asgdsag");
InputStream is = new BufferedInputStream(new FileInputStream(file));
String mimeType = URLConnection.guessContentTypeFromStream(is);

There are a number of ways that you can do this, some more complicated (and more reliable) than others. The page I linked to discusses quite a few of these approaches.
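
As a self-contained variant of the snippet above, the sketch below feeds in-memory bytes to URLConnection.guessContentTypeFromStream instead of a file. ContentTypeGuess and guess are hypothetical names. The JDK method needs a stream that supports mark/reset, which is why the input is wrapped in a BufferedInputStream, and it only knows a small built-in set of signatures.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class ContentTypeGuess {
    /** Returns the guessed MIME type, or null when the JDK does not recognize the header. */
    static String guess(byte[] data) {
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            // Inspects only the leading bytes, then resets the stream.
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] gif = "GIF89a0000000000".getBytes(StandardCharsets.US_ASCII);
        byte[] png = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
                      0, 0, 0, 0, 0, 0, 0, 0};
        System.out.println(guess(gif)); // image/gif
        System.out.println(guess(png)); // image/png
    }
}
```

A null result means "unknown", not "safe", so an upload filter should treat it as a rejection.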

I'm not sure exactly what you mean, but however you do this, it is only going to work for the specific set of file formats that are known to you.
You could exclude executables (are you talking about Windows here?). There's some file header information here: http://support.microsoft.com/kb/65122. You could scan and block files which look like they have an EXE header. Is this getting close to what you mean when you say 'real file extension'?

How do I get the length of an MP3 from its URL?

If I know the URL of an MP3 file, what is the easiest/fastest way to get its length, bitrate, size, etc?
How can I download just the ID3 tag part of the MP3 to get these details?

You will need to look at the ID3 tags in the mp3 file.
Unless you keep track of the metadata you want somewhere else.
To get the track length specifically, you will need to look in the ID3 metadata tag, at the 'TLEN' frame, which holds the length in milliseconds if the tagger wrote it (the 'TRCK' frame is the track number, not the duration).
To download only the ID3 tag, you must first download the ID3 header at the start of the file.
This website contains very specific information about the ID3 Tag format. You will need to look at the version number of the ID3 Tag and then, based on that, you will need to find the information regarding how long the ID3 Tag is. Then, you must download the WHOLE tag because the frames are not in any specific order.
Then you should be able to use a third-party library to find the TLEN frame and its data.
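
The "find out how long the tag is" step can be sketched from the ID3v2 header layout: 10 bytes consisting of "ID3", two version bytes, one flags byte, and a 4-byte syncsafe size in which only the low 7 bits of each byte are used. Id3v2Header and tagLength are hypothetical names, and the sketch does not parse any frames.

```java
public class Id3v2Header {
    /**
     * Returns the total tag length in bytes (header included),
     * or -1 if the 10 bytes are not a valid ID3v2 header.
     */
    static int tagLength(byte[] h) {
        if (h.length < 10 || h[0] != 'I' || h[1] != 'D' || h[2] != '3') return -1;
        // Syncsafe integer: the high bit of every size byte must be zero.
        for (int i = 6; i < 10; i++) {
            if ((h[i] & 0x80) != 0) return -1;
        }
        int body = (h[6] << 21) | (h[7] << 14) | (h[8] << 7) | h[9];
        // ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
        boolean footer = h[3] == 4 && (h[5] & 0x10) != 0;
        return 10 + body + (footer ? 10 : 0);
    }

    public static void main(String[] args) {
        byte[] header = {'I', 'D', '3', 3, 0, 0, 0, 0, 2, 1};
        System.out.println(tagLength(header)); // 267
    }
}
```

With that number in hand, a client knows how many bytes to request from the start of the file before handing them to an ID3 library.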

Here is how you get file size, but how do you get bitrate and track length?
import java.io.*;
import java.net.*;

public class FileSizeFromURL {
    public static final void main(String[] args) {
        URL url;
        URLConnection conn;
        int size;
        if (args.length != 1) {
            System.out.println("Usage: FileSizeFromURL <url>");
            return;
        }
        try {
            url = new URL(args[0]);
            conn = url.openConnection();
            // Content-Length as reported by the server, or -1 if unknown
            size = conn.getContentLength();
            if (size < 0)
                System.out.println("Could not determine file size.");
            else
                System.out.println(args[0] + "\nSize: " + size);
            conn.getInputStream().close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

what is the easiest/fastest way to get its length, bitrate, size, etc?
File size you can get with an HTTP HEAD request. If by ‘length’ you mean playing time in seconds, you cannot get this without fetching the entire file. You can guess, by fetching the first few MP3 frames, looking at their bitrate, and assuming that the rest of the file has the same bitrate, but given the popularity of Variable Bit-Rate encoding the likelihood this will be close to accurate is quite low.
ID3 tags can in theory contain information that might allow you to guess the length better, in the ASPI and ETCO tags. But in practice these are very rarely present.
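
The bitrate-based guess mentioned above is simple arithmetic: seconds = bytes * 8 / bits per second. A tiny sketch, with hypothetical names and the loud assumption of a constant bitrate, which is exactly what VBR files violate:

```java
public class Mp3DurationEstimate {
    /** Rough playing time in seconds, assuming a constant bitrate given in kbit/s. */
    static double estimateSeconds(long audioBytes, int bitrateKbps) {
        return audioBytes * 8.0 / (bitrateKbps * 1000.0);
    }

    public static void main(String[] args) {
        // 4,800,000 bytes at 128 kbit/s
        System.out.println(estimateSeconds(4_800_000L, 128)); // 300.0
    }
}
```

For a better estimate, the size of any ID3 tag should be subtracted from the file size before dividing.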
How can I download just the ID3 tag part of the MP3 to get these details?
For ID3v2 tags, grab the start of the file. (It's possible for ID3v2 frames to be elsewhere, but in practice they're always there.) You can't tell how long the tag is going to be in advance. For text-only tags you're likely to find the information you want in the first 512-1024 bytes. Unfortunately more and more MP3s have embedded ‘album art’ pictures, which can be much longer; try to pick an ID3 library that will gracefully ignore truncated ID3 information.
ID3v1 tags are located at the end of the file, in a fixed 128-byte block that starts with "TAG". Of course you don't know in advance whether the file has ID3v1 tags, ID3v2 tags, both or neither. Generally these days ID3v2 is a better bet though.
To read part of a file through HTTP you need the Range header. This too is not supported everywhere.
In summary, there are enough problems with this that the best option may well be giving up and just fetching the whole file.
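
For readers who still want to try, here is a sketch of the two HTTP techniques from this answer using java.net.HttpURLConnection: a HEAD request for the size and a Range request for the first bytes. RemoteFileProbe and its methods are hypothetical names, the URL is whatever the caller supplies, and readNBytes(int) assumes Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class RemoteFileProbe {
    /** Builds a Range header value for an inclusive byte span. */
    static String rangeHeader(long first, long last) {
        return "bytes=" + first + "-" + last;
    }

    /** Content-Length from a HEAD request, or -1 if the server does not report it. */
    static long contentLength(String url) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("HEAD");
        try {
            return c.getContentLengthLong();
        } finally {
            c.disconnect();
        }
    }

    /** Fetches at most n leading bytes; stops reading even if the server ignores Range. */
    static byte[] fetchPrefix(String url, int n) throws IOException {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestProperty("Range", rangeHeader(0, n - 1));
        try (InputStream in = c.getInputStream()) {
            return in.readNBytes(n); // Java 11+
        } finally {
            c.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("Range header for the first KiB: " + rangeHeader(0, 1023));
            return;
        }
        System.out.println("Size: " + contentLength(args[0]));
        System.out.println("Fetched: " + fetchPrefix(args[0], 1024).length + " bytes");
    }
}
```

A server that honours the Range header answers with status 206 (HttpURLConnection.HTTP_PARTIAL). One that ignores it sends the whole body with status 200, in which case the sketch simply stops after n bytes and disconnects.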

I am answering this question a year+ after it was asked but I do have some info to add for those happening by at a later time. I answered a similar question about getting the image dimensions from an image on a remote server without downloading the whole image. See that discussion at the following URL:
Get image dimensions with Curl
You could definitely use the same technique to pull info out of the ID3 tags without downloading the whole MP3 from a remote server.
I hope that helps for future passers-by.

You cannot get this information from the URL alone.
You'll have to load the first few K of content and use an mp3 library to decode the header and get the values.

That data is encoded in the ID3 tag, so you need to download at least that much of the file; the tag is at the beginning, so you're in luck.
Alternatively, you can look at the Content-Length header in the HTTP response to learn the file size, if the server sends it, which it may not.
