How to get the file extension from its download link in Java?

I want to get the extensions of a few files from their download links.
The download links do not contain the file extensions. For example, a link looks like this:
http://yourshot.nationalgeographic.com/u/fQYSUbVfts-T7odkrFJckdiFeHvab0GWOfzhj7tYdC0uglagsDNfNYI4FFesWV5zeSPtcfpyHzKZI7dHjkluwtIYNkXOGmjh43Ktdn0VeBWhQ-9l2kheOPt5N2TM3yPEW4tTrtFFqniatwxxhbqsc78IU2pBaqWwyEVLeQx64zSda2CNGmUpSxyte_tamVoIk3y4zXisQ-vjmMp6n1BAB3nbUVlwWg/
I tried to get the file extension using myHttpUrlConnection.getContentType(), but the result was not what I want.
Some download links return a content type such as "text/plain", "application/octet-stream" or "multipart/form-data". But I just want a correct and clear type, like rar, mp4, txt, jpeg, mkv, zip, png, apk, mp3, and so on.

You cannot do that. The getContentType() method simply:
Returns the value of the content-type header field.
which in most cases is related to the file extension/file type (though there is no guarantee); for example, application/pdf would mean there is a PDF file under that URL.
Each of the file types you have listed (rar, mp4, txt, jpeg, mkv, zip, png, apk, mp3) has a different internal structure. To do what you want reliably, you would have to download the file and check its type based on its contents; for many formats the first few bytes (the "magic number") are enough.
A good example of a library you could use is Apache Tika.
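To illustrate what "checking the contents" means, here is a minimal sketch using only the JDK (Java 11+). It assumes you can read the first bytes of the download, for example from the connection's input stream. The class name MagicSniffer and its signature table are hypothetical and cover only a few of the formats from the question; Tika recognizes far more. Note that txt has no signature at all, and apk files are ZIP archives, so a header check alone reports them as zip.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: guesses an extension from a file's leading "magic number" bytes.
// Only a handful of formats are covered; Apache Tika knows many more.
final class MagicSniffer {
    private static final List<Map.Entry<byte[], String>> SIGNATURES = List.of(
            Map.entry(new byte[] {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A}, "png"),
            Map.entry(new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}, "jpg"),
            Map.entry(new byte[] {'G', 'I', 'F', '8'}, "gif"),
            Map.entry(new byte[] {'%', 'P', 'D', 'F', '-'}, "pdf"),
            // RAR 4.x and 5.x archives both begin with "Rar!" 0x1A 0x07.
            Map.entry(new byte[] {'R', 'a', 'r', '!', 0x1A, 0x07}, "rar"),
            // ZIP local file header; APK, JAR, DOCX and XLSX files start the same way.
            Map.entry(new byte[] {'P', 'K', 0x03, 0x04}, "zip"),
            // MP3 with an ID3v2 tag; untagged MP3s start with a frame sync instead.
            Map.entry(new byte[] {'I', 'D', '3'}, "mp3"),
            // EBML header shared by Matroska (mkv) and WebM.
            Map.entry(new byte[] {0x1A, 0x45, (byte) 0xDF, (byte) 0xA3}, "mkv"));

    /** Returns a guessed extension for the given leading bytes, or null if unknown. */
    static String fromHeader(byte[] head) {
        for (Map.Entry<byte[], String> e : SIGNATURES) {
            byte[] sig = e.getKey();
            if (head.length >= sig.length
                    && Arrays.equals(head, 0, sig.length, sig, 0, sig.length)) {
                return e.getValue();
            }
        }
        // ISO base media files (MP4, M4A, 3GP, ...) carry "ftyp" at offset 4.
        if (head.length >= 8 && head[4] == 'f' && head[5] == 't'
                && head[6] == 'y' && head[7] == 'p') {
            return "mp4";
        }
        return null;
    }

    /** Reads at most 16 bytes from the stream and sniffs them. */
    static String fromStream(InputStream in) throws IOException {
        return fromHeader(in.readNBytes(16));
    }
}
```

With an HttpURLConnection you would pass connection.getInputStream() to fromStream, so only the first bytes need to be read rather than the whole file.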

Related

How to detect an HTML file that had extension changed to Excel .xls

I have a Java app that processes Excel files (.xls, .xlsx, etc.) from emails in an automated fashion. I've noticed that some files are not native Excel files: opening them in Excel gives a warning that the file is corrupt or badly formatted, and opening them in Notepad++ clearly shows HTML.
Unfortunately I can't just handle these files manually, so I need a way to spot them automatically.
I noticed that when I use a java.io.File object with org.apache.tika.Tika, I can detect the type. So with the File object I can find out the extension, and with tika.detect() I can find that the format is "text/html". (I'm not sure this is the best way, but it seems to work with my single example.)
So I can then find these kinds of files using:
File file = getTheFileObject();
if ( tika.detect(file).equalsIgnoreCase("text/html") && file.getName().contains(".xls") ) { ... do what I want with the corrupt file... }
My problem comes when doing something similar with email attachments. To get the file from emails I'm using the com.microsoft.ews-java-api 2.0 and from this I can get a FileAttachment object which represents the file.
But when I call tika.detect() on the attachment (the same corrupt file), I get a different result, "application/octet-stream" instead of "text/html", and the FileAttachment's own methods report "application/vnd.ms-excel".
How can I spot these corrupt files if I can't detect the HTML-formatted .xls files?
FileAttachment attachment = getFileAttachment();
attachment.getContentType(); // application/vnd.ms-excel
tika.detect(attachment.getContentStream()); // application/octet-stream
How would I spot an html file that has .xls file extension from the emails ews FileAttachment object? Will tika still help?
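One way to approach this without relying on any declared content type is to look at the attachment's bytes directly: a genuine .xls file is an OLE2 compound document starting with D0 CF 11 E0 A1 B1 1A E1, and a genuine .xlsx file is a ZIP archive starting with PK 03 04, so an HTML page saved as .xls matches neither. The sketch below assumes you already have the attachment's content as a byte[] (how you obtain it from the EWS FileAttachment is not shown); the class ExcelSanity and its HTML heuristic are hypothetical. Since application/octet-stream is Tika's generic fallback for content it cannot identify, it may also be worth checking that the stream you pass to tika.detect() actually contains the attachment's bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical check for "Excel" attachments that are really HTML pages.
final class ExcelSanity {
    // OLE2 compound document header used by legacy .xls files.
    private static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
            (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    // ZIP local file header used by .xlsx (Office Open XML) files.
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }

    /** Heuristic: after an optional BOM and whitespace, the content opens like an HTML page. */
    static boolean looksLikeHtml(byte[] data) {
        int start = startsWith(data, UTF8_BOM) ? UTF8_BOM.length : 0;
        int len = Math.min(data.length - start, 1024);
        // ISO-8859-1 maps every byte to a char, so decoding never fails on binary input.
        String head = new String(data, start, len, StandardCharsets.ISO_8859_1)
                .stripLeading()
                .toLowerCase(Locale.ROOT);
        return head.startsWith("<!doctype html") || head.startsWith("<html")
                || (head.startsWith("<") && head.contains("<table"));
    }

    /** True when the name claims Excel but the bytes are neither OLE2 nor ZIP and look like HTML. */
    static boolean isHtmlPosingAsExcel(String fileName, byte[] content) {
        String name = fileName.toLowerCase(Locale.ROOT);
        boolean excelName = name.endsWith(".xls") || name.endsWith(".xlsx");
        boolean realExcel = startsWith(content, OLE2) || startsWith(content, ZIP);
        return excelName && !realExcel && looksLikeHtml(content);
    }
}
```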

convert .spd file to pdf file

I have an .spd file and I want to convert it to a PDF file. I looked for libraries on the web but couldn't find any.
Actually, my Android app gives me an .spd file and a .jpg file. I am able to convert the JPG to PDF, but it takes a lot of time; I tried different libraries and they all took about the same time. So I switched to converting the SPD file to PDF instead, but I cannot find any Java-based library for that.
It would be great if anyone could suggest a library or another approach.
Thanks
The extension of SPen's files is indeed .spd. I'm afraid that there is currently no official MIME type associated with .spd files.
You can find official MIME types in the IANA's MIME Media Types register: http://www.iana.org/assignments/media-types
This might help you:
Go to http://developer.samsung.com/samsung-mobile-sdk/sdk and download the SDK.
Inside the binaries there are some programming guides, as mentioned under 1.3 at http://developer.samsung.com/samsung-mobile-sdk.
Have a look at ProgrammingGuide_Pen.pdf.
It contains passages like this one: "The sample application saves the data created with the Pen package in a file. The application supports the SPD format for Pen data files and the +SPD data format (image file with added SPD data) for general image files."

Two applications need to export and import a single file which needs to include data and images, best file type?

I'm making two Java applications, one to collect data and another to use it. The one collecting will import a file from the other, which will include data and images and will be decrypted.
I'm unsure what file type to use. So far all of the data is in XML, which works great, but I need the images too and was hoping not to have to ship all the images in a folder with a path reference.
Ideas?
Well, I think the best way is to create your own format (.myformat or .data). This file will in fact be a ZIP file that contains your XML file and images.
There is no perfect example written in Java as far as I know. However, here are some examples:
Not in Java
The best example is, as @Bolo said, the .odt format: OpenOffice writes the document content as XML and stores the images alongside it, and all of that is wrapped in a single .odt file (a ZIP archive).
An .exe file is another example: the compiled code and the resources are packed into a single file. Try opening one with 7-Zip and you'll see.
Skyrim plugins are .esp files that contain the DDS textures, the scripts, the NIF meshes...
In Java
Minecraft texture packs are ZIP files that contain a .mcmeta file (the pack information) and the textures (.png).
JAR files work like .exe files in this sense: a JAR is a ZIP archive that bundles compiled classes and resources.
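A minimal sketch of the "own format that is really a ZIP" idea, using only java.util.zip. The entry names (data.xml, images/...) and the class name Bundle are hypothetical; in the real applications you would write to a FileOutputStream with your chosen extension instead of an in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical container format: data.xml plus an images/ folder, stored as one ZIP archive.
final class Bundle {
    /** Packs the XML document and the images (file name -> bytes) into ZIP bytes. */
    static byte[] pack(String xml, Map<String, byte[]> images) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("data.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            for (Map.Entry<String, byte[]> image : images.entrySet()) {
                zip.putNextEntry(new ZipEntry("images/" + image.getKey()));
                zip.write(image.getValue());
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray(); // the ZIP stream is closed (and finished) at this point
    }

    /** Unpacks every entry into memory, keyed by its path inside the archive. */
    static Map<String, byte[]> unpack(byte[] archive) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                entries.put(entry.getName(), zip.readAllBytes()); // reads to the end of this entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

A side benefit of ZIP is that the XML part gets compressed; already-compressed images such as PNG or JPEG will not shrink much.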
If both programs are in Java, you could also go with serialization, which is basically saving an object to a file (the suffix is conventionally .ser, I think) and then being able to read it back. It's worth looking up; even if it won't help right now, it is good to know about.
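A minimal sketch of that approach, assuming both applications share the same class (same package, name and serialVersionUID). Payload, its fields and PayloadIO are hypothetical. Only deserialize files you trust, because deserializing untrusted data with Java serialization is a well-known security risk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical payload: some data plus raw image bytes, shared by both applications.
final class Payload implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final byte[] imageBytes;

    Payload(String name, byte[] imageBytes) {
        this.name = name;
        this.imageBytes = imageBytes;
    }
}

final class PayloadIO {
    /** Serializes the object graph to bytes (write these to a .ser file in practice). */
    static byte[] serialize(Payload payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the object; requires the same Payload class on the reader's classpath. */
    static Payload deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Payload) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Payload class missing on this side", e);
        }
    }
}
```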
I'd suggest using JSON; Gson is a decent library.
You can embed images as byte arrays.
Save the serialized string to a file with whatever extension you prefer, read it from the second application, deserialize it, and reconstruct the images.
You can convert binary image data to text with Base64 encoding (http://en.wikipedia.org/wiki/Base64), and this way you can embed your images in XML.
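A minimal sketch of embedding an image in XML as Base64, using only the JDK's DOM API and java.util.Base64. The element names record and image and the class XmlImages are hypothetical. Base64 makes the data roughly a third larger, which is the price of keeping everything in one text file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

// Hypothetical layout: <record><image name="...">BASE64 DATA</image></record>
final class XmlImages {
    static String toXml(String imageName, byte[] imageBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("record");
            doc.appendChild(root);
            Element image = doc.createElement("image");
            image.setAttribute("name", imageName);
            // Binary bytes become plain ASCII text that is safe inside an XML element.
            image.setTextContent(Base64.getEncoder().encodeToString(imageBytes));
            root.appendChild(image);
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decodes the first <image> element back into the original bytes. */
    static byte[] firstImage(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element image = (Element) doc.getElementsByTagName("image").item(0);
            return Base64.getDecoder().decode(image.getTextContent().trim());
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```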

finding file extension by java code

Suppose users can upload files, and I want to find the extension of each uploaded file.
Even if the user has renamed the file, I want to find its real type from its header or byte content, etc.
Please help me with a solution.
Note: I don't mean just the extension via substring() or getContentType(), but the real file extension.
For example (on Windows), say it's a .doc file and the user renames it to .jpg and uploads it.
It's possible in PHP, but I don't know how to do that in Java. I'm sure it can be done.
Thank you.
Apache Commons FileUpload is a good place to start. Further, you could look at the question "JSP: Get MIME Type on File Upload" for hints on how to do this.
As the asker states, the concern is that renaming a .png to .jpg will fool getContentType() into thinking it is now a .jpg file. A quick Google search turned up the answer "Get the mime type from a file", which lists three very good options:
Apache Tika
The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries
JMimeMagic
jMimeMagic is a Java library for determining the MIME type of files or streams
A Java library that claims to help with this is mime-util:
Enable Java programs to detect MIME types based on file extensions, magic data and content sniffing. Supports detection from java.io.File, java.io.InputStream, java.net.URL and byte arrays.
A quick search brought this post up: http://fredeaker.blogspot.com/2006/12/file-type-mime-detection.html
The post lists several "magic" libraries that can detect the file type based on its contents.
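Before pulling in one of those libraries, note that the JDK has a very limited content sniffer of its own, URLConnection.guessContentTypeFromStream, which inspects the first bytes of a stream that supports mark/reset. The sketch below (the class UploadCheck and its allow-list are hypothetical) uses it to check whether an upload's claimed extension matches its content. It only recognizes a small, fixed set of formats (the exact list depends on the JDK version), so for .doc and most other types you would still need Tika, jMimeMagic or mime-util.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

// Hypothetical upload check: does the claimed extension match what the bytes look like?
final class UploadCheck {
    // Allow-list: extension -> MIME type the JDK sniffer is expected to report.
    private static final Map<String, String> EXPECTED = Map.of(
            "png", "image/png",
            "gif", "image/gif",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg");

    static boolean extensionMatchesContent(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        String expected = EXPECTED.get(ext);
        if (expected == null) {
            return false; // extension not on the allow-list
        }
        try {
            // ByteArrayInputStream supports mark/reset, which the sniffer requires.
            String sniffed = URLConnection.guessContentTypeFromStream(new ByteArrayInputStream(content));
            return expected.equals(sniffed); // sniffed is null when the JDK cannot tell
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
    }
}
```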

Validation to check whether its a ".txt" file

I have this piece of code for restricting users to uploading image files only:
if (!fileName.getContentType().startsWith("image/"))
errors.add("", new ActionError("errors.imageFile.contentType"));
Similarly, in another scenario I want users to upload only files with the extension ".txt". What MIME type should I use? Or please show me code that would help achieve this.
Typically the mime type for text files is text/plain
Text files have the following MIME type:
text/plain
However, according to this site, it is not the only one. You can use the getExtension method of Apache Commons IO's FilenameUtils to get the extension of the file.
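If you would rather not add Commons IO just for this, a rough equivalent of getExtension can be written with plain String methods. The class Extensions and its methods are hypothetical: the helper takes the text after the last dot of the final path segment (handling both / and \ separators). A name check alone does not prove the content is text.

```java
import java.util.Locale;

// Hypothetical stand-in for a getExtension-style helper, using only String methods.
final class Extensions {
    /** Text after the last '.' of the final path segment, or "" if there is none. */
    static String getExtension(String fileName) {
        int slash = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        return dot > slash ? fileName.substring(dot + 1) : "";
    }

    /** Case-insensitive check for a .txt name. */
    static boolean hasTxtExtension(String fileName) {
        return getExtension(fileName).toLowerCase(Locale.ROOT).equals("txt");
    }
}
```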
What MIME type should I use ..?
Content-Type: text/plain
I want the users to upload only files with extension ".txt" in another scenario.
The mime type for plain text files is "text/plain". Or you can check the name of the uploaded file.
However, these won't prevent users uploading non-text files. All they need to do (on Windows) is to rename a non-text file to have the ".txt" extension ... and then upload it.
If you really want to make sure that users only upload text, you need to test the files after they have been uploaded.
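One way to test the uploaded content is a simple heuristic: accept the file only if its bytes decode as UTF-8 and contain no control characters other than tab, carriage return and line feed. This sketch assumes uploads are UTF-8 (or plain ASCII); text saved in another encoding, such as Windows-1252 with accented characters, would be rejected, so adjust the charset to your situation. The class name TextCheck is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic for "is this upload really plain text?"
final class TextCheck {
    static boolean looksLikePlainText(byte[] content) {
        String text;
        try {
            // REPORT makes invalid byte sequences throw instead of becoming U+FFFD.
            text = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content))
                    .toString();
        } catch (CharacterCodingException e) {
            return false; // not valid UTF-8, e.g. a renamed binary file
        }
        // Binary formats almost always contain NULs or other control bytes.
        return text.chars().noneMatch(c ->
                Character.isISOControl(c) && c != '\t' && c != '\n' && c != '\r');
    }
}
```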
