Getting MIME type of a File - java

I want to get the MIME type of a file. Can anyone please help me?
I want the MIME type like this:
File file = new File("example.jpeg");
String mimeTypeOfFile = /* file's MIME type */;
Thank you in advance.

You can use the Apache Tika library. It detects and extracts metadata and text from over a thousand different file types:
http://tika.apache.org/0.7/detection.html
It offers several detection methods, such as checking the file extension or reading the file data. Using it is easier and more efficient than writing the detection yourself.
Example:
System.out.println(new Tika().detect(new File(PATH_TO_FILE)));

Related

Identify File Type in Java

I want to check that the user uploads only a particular file format (say, text files only).
I've written a verification mechanism that checks the extension after the file name, like this:
filename.txt
But this causes a problem: it also accepts other files (such as Excel files) that are saved with a .txt extension. For example,
myexcelfile.txt is assumed to be a text file even though it is an Excel file.
So what would be the unique parameter to check to make sure that the uploaded file is of the required type?
I'm using the Apache Commons uploader and servlets.
======================EDIT=====================
Based on answers below, I've tried
FileInputStream my = new FileInputStream(uploadedFile2);
InputStream inputStream = new BufferedInputStream(my);
String mimeType = URLConnection.guessContentTypeFromStream(inputStream);
But it always returns null.
probeContentType is based on the file name extension, and there is also a bug with that approach; I checked that too.
I'd prefer not to use third-party file verifiers; I believe this problem has a logical solution.
Apache Tika has content detection capabilities for a wide range of file formats. From the documentation, one of the simplest ways to detect the content type is the following code:
// default Tika configuration can detect a lot of different file types
TikaConfig tika = new TikaConfig();
// metadata collected about the source file
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, uploadedFile2.toString());
// determine the MIME type from the file contents (detect() returns a MediaType)
String mimetype = tika.getDetector().detect(
        TikaInputStream.get(uploadedFile2), metadata).toString();
System.out.println("File " + uploadedFile2 + " is " + mimetype);
If mimetype is text/plain, then the file or stream contains plain text content.
You could open the file and read the first few bytes into a byte[] and check the values to see if it matches the known magic numbers for a particular file format. I tried finding out what that would be for an Excel file (pre-XML; the xlsx file format would identify as a zip file), but I haven't really found much data about that. The closest thing I've found so far was looking at the code for a Java Excel file parser library.
The old Excel data format used what's called BIFF. Check out the Apache POI library for parsers and such for those types of files. From the looks of it, the magic numbers for an Excel file would probably be 00 06 10 00 (for BIFF8 worksheet), or 00 05 10 00 (BIFF7 worksheet, sounds rather old).
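A minimal sketch of the magic-number check described above. It assumes two signatures that are not from the answer itself: legacy Office files such as .xls are normally stored inside an OLE2 compound-file container, which starts with D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives, which start with 50 4B 03 04. The class name and the "ole2"/"zip"/"unknown" labels are hypothetical, and readNBytes(int) needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class MagicNumberSniffer {
    // OLE2 compound-file signature (container used by legacy .xls/.doc/.ppt).
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4" (container used by .xlsx/.docx).
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    /** Returns "ole2", "zip" or "unknown" for the given leading bytes. */
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2)) return "ole2";
        if (startsWith(head, ZIP)) return "zip";
        return "unknown";
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A match only identifies the container: "ole2" could equally be an old Word document, and "zip" could be any archive, so a stricter check would still have to look inside the container. A plain text file has no signature at all and comes back as "unknown".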
Try
Files.probeContentType(Paths.get("~/a.xls"))
Note that the output depends on the system's content type detector, so it may differ between machines.
For me, this code returns
application/vnd.ms-excel
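Because the result depends on the installed detectors, Files.probeContentType can also return null when nothing recognizes the file. A small sketch of guarding against that follows; the class name and the choice of application/octet-stream as the fallback are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ContentTypeProbe {
    // Generic "unknown binary" type used when no detector gives an answer.
    static final String FALLBACK = "application/octet-stream";

    /** Never returns null: falls back to a generic binary type. */
    static String probeOrDefault(Path file) {
        try {
            String type = Files.probeContentType(file);
            return type != null ? type : FALLBACK;
        } catch (IOException e) {
            // Treat an unreadable file the same as an unrecognized one.
            return FALLBACK;
        }
    }
}
```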
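// Android only: MimeTypeMap is android.webkit.MimeTypeMap; the lookup is purely extension-based
private static String getMimeType(String fileUrl) {
    String extension = MimeTypeMap.getFileExtensionFromUrl(fileUrl);
    // returns null when the extension is unknown
    return MimeTypeMap.getSingleton().getMimeTypeFromExtension(extension);
}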

How to Open .rd file extension in Java

Can anyone help me figure out how to retrieve data from files with the .rd extension in Java? I want to copy the data from a .rd file into a Word document. I am new to Java and have been unable to work this out.
Thanks in advance,
Sai
See these documents :
Guidelines for Rd files : http://developer.r-project.org/Rds.html
Parsing Rd File : http://developer.r-project.org/parseRd.pdf
Once you are able to read the file using the methods above, you can convert the data into a Word document using Apache POI.
Hope This Helps!

JSP: Get MIME Type on File Upload

I'm doing a file upload, and I want to get the Mime type from the uploaded file.
I was trying to use the request.getContentType(), but when I call:
String contentType = req.getContentType();
It will return:
multipart/form-data; boundary=---------------------------310662768914663
How can I get the correct value?
Thanks in advance
It sounds as if you're homegrowing a multipart/form-data parser. I wouldn't recommend doing that. Rather, use a decent one like Apache Commons FileUpload. For uploaded files, it offers FileItem#getContentType() to extract the client-specified content type, if any.
String contentType = item.getContentType();
If it returns null (because the client didn't specify it), then you can use ServletContext#getMimeType(), which is based on the file name.
String filename = FilenameUtils.getName(item.getName());
String contentType = getServletContext().getMimeType(filename);
This will be resolved based on <mime-mapping> entries in servletcontainer's default web.xml (in case of for example Tomcat, it's present in /conf/web.xml) and also on the web.xml of your webapp, if any, which can expand/override the servletcontainer's default mappings.
You however need to keep in mind that the value of the multipart content type is fully controlled by the client and also that the client-provided file extension does not necessarily need to represent the actual file content. For instance, the client could just edit the file extension. Be careful when using this information in business logic.
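The fallback order described above can be sketched without a servlet container. This is only an illustration: URLConnection.guessContentTypeFromName is swapped in for ServletContext#getMimeType() so that the example is self-contained, and the class name, method name and the application/octet-stream default are hypothetical.

```java
import java.net.URLConnection;

final class UploadContentType {
    /**
     * Prefers the client-declared type; otherwise guesses from the file name.
     * Both inputs come from the client, so treat the result as a hint only.
     */
    static String resolve(String declaredType, String fileName) {
        if (declaredType != null && !declaredType.trim().isEmpty()) {
            return declaredType;
        }
        // Extension-based lookup in the JDK's built-in table; null if unknown.
        String guessed = (fileName == null)
            ? null
            : URLConnection.guessContentTypeFromName(fileName);
        return guessed != null ? guessed : "application/octet-stream";
    }
}
```

Neither branch inspects the uploaded bytes, so the caveat about client-controlled values applies to the result in full.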
Related:
How to upload files in JSP/Servlet?
How to check whether an uploaded file is an image?
just use:
public String ServletContext.getMimeType(String file)
You could use MimetypesFileTypeMap:
String contentType = new MimetypesFileTypeMap().getContentType(fileName); // gets the MIME type
However, you would have the overhead of editing the mime.types file if the file type is not already listed. (Sorry, I take that back: you can add entries to the map programmatically, and those are checked first.)

Java library or text file that maps mime types to nice human friendly file types

GOAL
My goal is to find a text file or library that, given a MIME type as input, returns a nice human-friendly name for it.
For example, given the MIME type for Word (shown below), I would like a result along the lines of "Microsoft Office Word Document".
application/vnd.openxmlformats-officedocument.wordprocessingml.document
I realise I could compile my own list and use something like a Map (Java) but then it would not be comprehensive etc.
SIMPLISTIC OPTION
I know I can take the MIME subtype and keep only its last component, but that is not very sophisticated: for the Word MIME type above, the result would be a very generic "document". I could keep more components, but the result is still quite ugly.
KEY/VALUE FILE
Another option I have tried to find is a text file with key/value pairs where the key is the mime type in full and the value being the nice human friendly text.
text/plain=Plain Text File
application/octet-stream=Unknown binary file
This seems like a nice option, but I have not been able to find a definitive text file with lots of entries. It would also be nice if there were an entry for just the media type (I prefer to call it the primary MIME type), i.e. the "text" in "text/plain", so that an unknown text MIME type such as "text/unknown a.b.c" would return "Unknown text file/format".
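A minimal sketch of the key/value idea with a fallback on the primary type. The "text/*" key convention, the class name and the final "Unknown file format" default are assumptions made for illustration; the entries themselves would still have to come from a list someone maintains.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class MimeDescriptions {
    private final Properties names;

    private MimeDescriptions(Properties names) {
        this.names = names;
    }

    /** Parses "mime/type=Friendly name" lines in java.util.Properties syntax. */
    static MimeDescriptions parse(String keyValueText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(keyValueText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new MimeDescriptions(p);
    }

    /** Exact match first, then the hypothetical "primary/*" entry, then a default. */
    String describe(String mimeType) {
        String exact = names.getProperty(mimeType);
        if (exact != null) {
            return exact;
        }
        int slash = mimeType.indexOf('/');
        String primary = slash < 0 ? mimeType : mimeType.substring(0, slash);
        return names.getProperty(primary + "/*", "Unknown file format");
    }
}
```

With the entries text/plain=Plain Text File and text/*=Unknown text file/format loaded, describe("text/plain") gives the exact name, while an unlisted type such as text/unknown falls through to the primary-type entry.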
Apache Tika supports MimeTypes. It also supports content detection, by the way, if you don't know the MIME type. Anyway, it looks like you need to do:
String t = "text/plain";
// forName() throws MimeTypeException if the name is malformed
String description = org.apache.tika.mime.MimeTypes.getDefaultMimeTypes().forName(t).getDescription();
Disclaimer: I didn't actually try it. Also, I don't know if it supports all the MIME types you need.
The following links might save you some time:
http://help.dottoro.com/lapuadlp.php
http://www.yolinux.com/TUTORIALS/LinuxTutorialMimeTypesAndApplications.html
http://www.hansenb.pdx.edu/DMKB/dict/tutorials/mime_typ.php
And here are a couple of links that map MIME types to file extensions:
http://www.webmaster-toolkit.com/mime-types.shtml
http://www.w3schools.com/media/media_mimeref.asp
Use the MimeUtil library, which works with files, byte arrays and more:
https://github.com/saces/MimeUtil
Usage:
MagicMimeMimeDetector g = new MagicMimeMimeDetector();
Collection<MimeType> list = g.getMimeTypes(file);
if (list.size() > 0)
{
    MimeType mime = list.iterator().next();
    return mime.toString();
}

Get real file extension -Java code

I would like to determine the real file extension for security reasons.
How can I do that?
Supposing you really mean to get the true content type of a file (i.e. its MIME type), you should refer to this excellent answer.
You can get the true content type of a file in Java using the following code:
File file = new File("filename.asgdsag");
InputStream is = new BufferedInputStream(new FileInputStream(file));
String mimeType = URLConnection.guessContentTypeFromStream(is);
There are a number of ways that you can do this, some more complicated (and more reliable) than others. The page I linked to discusses quite a few of these approaches.
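One limitation worth seeing in practice: URLConnection.guessContentTypeFromStream only recognizes a small built-in set of signatures, needs a stream that supports mark/reset, and returns null for anything else, including plain text. A small sketch using in-memory bytes instead of a real file; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class StreamGuessDemo {
    /** Guesses a type from the leading bytes; null means "not recognized". */
    static String guess(byte[] content) {
        // ByteArrayInputStream supports mark/reset, which the guesser requires.
        try (InputStream in = new ByteArrayInputStream(content)) {
            return URLConnection.guessContentTypeFromStream(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] gif = "GIF89a".getBytes(StandardCharsets.US_ASCII);
        byte[] text = "just some plain text".getBytes(StandardCharsets.US_ASCII);
        System.out.println(guess(gif));   // image/gif
        System.out.println(guess(text));  // null
    }
}
```

A null result therefore means "unknown" rather than "safe", so a security check should not treat it as proof of any particular type.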
I'm not sure exactly what you mean, but however you do this, it will only work for the specific set of file formats known to you.
You could exclude executables (are you talking about Windows here?). There's some file header information at http://support.microsoft.com/kb/65122, so you could scan for and block files that look like they have an EXE header. Is this close to what you mean by "real file extension"?
