Identifying file type in Java

Identifying file type in Java - java

Please help me to find out the type of the file which is being uploaded.
I wanted to distinguish between excel type and csv.
MIMEType returns same for both of these file. Please help.

I use Apache Tika which identifies the filetype using magic byte patterns and globbing hints (the file extension) to detect the MIME type. It also supports additional parsing of file contents (which I don't really use).
Here is a quick and dirty example on how Tika can be used to detect the file type without performing any additional parsing on the file:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashMap;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.metadata.TikaMetadataKeys;
import org.apache.tika.mime.MediaType;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.helpers.DefaultHandler;
public class Detector {
public static void main(String[] args) throws Exception {
File file = new File("/pats/to/file.xls");
AutoDetectParser parser = new AutoDetectParser();
parser.setParsers(new HashMap<MediaType, Parser>());
Metadata metadata = new Metadata();
metadata.add(TikaMetadataKeys.RESOURCE_NAME_KEY, file.getName());
InputStream stream = new FileInputStream(file);
parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
stream.close();
String mimeType = metadata.get(HttpHeaders.CONTENT_TYPE);
System.out.println(mimeType);
}
}

I hope this will help. Taken from an example not from mine:
import javax.activation.MimetypesFileTypeMap;
import java.io.File;
class GetMimeType {
public static void main(String args[]) {
File f = new File("test.gif");
System.out.println("Mime Type of " + f.getName() + " is " +
new MimetypesFileTypeMap().getContentType(f));
// expected output :
// "Mime Type of test.gif is image/gif"
}
}
Same may be true for excel and csv types. Not tested.

I figured out a cheaper way of doing this with java.nio.file.Files
public String getContentType(File file) throws IOException {
return Files.probeContentType(file.toPath());
}
- or -
public String getContentType(Path filePath) throws IOException {
return Files.probeContentType(filePath);
}
Hope that helps.
Cheers.

A better way without using javax.activation.*:
URLConnection.guessContentTypeFromName(f.getAbsolutePath()));

If you are already using Spring this works for csv and excel:
import org.springframework.mail.javamail.ConfigurableMimeFileTypeMap;
import javax.activation.FileTypeMap;
import java.io.IOException;
public class ContentTypeResolver {
private FileTypeMap fileTypeMap;
public ContentTypeResolver() {
fileTypeMap = new ConfigurableMimeFileTypeMap();
}
public String getContentType(String fileName) throws IOException {
if (fileName == null) {
return null;
}
return fileTypeMap.getContentType(fileName.toLowerCase());
}
}
or with javax.activation you can update the mime.types file.

The CSV will start with text and the excel type is most likely binary.
However the simplest approach is to try to load the excel document using POI. If this fails try to load the file as a CSV, if that fails its possibly neither type.

Related

i get an error when i run this code

package demo;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.openxml4j.opc.*;
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class DocxToPdf {
public static void main(String[] args){
try
{
String inputFile = "F:\\MY WORK\\CollectionPractice\\WebContent\\APCR1.docx";
String outputFile = "F:\\MY WORK\\CollectionPractice\\WebContent\\APCR1.pdf";
System.out.println("inputFile:" + inputFile + ",outputFile:" + outputFile);
FileInputStream in = new FileInputStream(inputFile);
XWPFDocument document = new XWPFDocument(in);
File outFile = new File(outputFile);
OutputStream out = new FileOutputStream(outFile);
PdfOptions options = null;
PdfConverter.getInstance().convert(document, out, options);
} catch (Exception e) {
e.printStackTrace();
}
}
}
when i run this code an error occur like these and i have used following jar files also.
error:
java.lang.NoSuchMethodError: org.apache.poi.POIXMLDocumentPart.getPackageRelationship()Lorg/apache/poi/openxml4j/opc/PackageRelationship;
jars:
List of jar files

You likely have jar-versions of POI mixed up. The error indicates that the class that was loaded did not have a method that the calling class saw during compilation, so you have a different version of POI in your classpath.
See "Component Map" at https://poi.apache.org/overview.html for the different components that are included and which jars they end up, make sure you only have one of these jars in your classpath, not multiple different versions.

create multiple files do loop with java and store in a drive , how to?

my first java program ..
so I'm trying to create a file and store in my pc using java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
public class createfile {
public static void main(String[] args) throws IOException {
int[] numbers = {1,2,3};
for (int item : numbers) {
String key = "file" + item;
File file = File.createTempFile("c:\\",key,".txt");
Writer writer = new OutputStreamWriter(new FileOutputStream(file));
writer.write("abcdefghijklmnopqrstuvwxyz\n");
writer.write("01234567890112345678901234\n");
writer.write("!##$%^&*()-=[]{};':',.<>/?\n");
writer.write("01234567890112345678901234\n");
writer.write("abcdefghijklmnopqrstuvwxyz\n");
writer.close();
}
return file;
}
}
what am I missing here .. I coudln't figured it out. everything seem to follow along the book.
Thanks
===========update ===========
after I took of
- return file ;
- throws IOException ;
- and change to File file = File.createTempFile(key,".txt",new File("c:\\"));
I still get this error
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
Unhandled exception type IOException
Unhandled exception type FileNotFoundException
Unhandled exception type IOException
Unhandled exception type IOException
Unhandled exception type IOException
Unhandled exception type IOException
Unhandled exception type IOException
Unhandled exception type IOException

you have some mistakes in java syntax:
When you declare method as void (here public static void main(....)) it means that method has no return value - so line "return file;" not needed here.
Use use wrong signature (wrong parameters types in File.createTempFile function.
Possible usages are:
createTempFile(String prefix, String suffix)
createTempFile(String prefix, String suffix, File directory)
For additional information about File class use this link: http://docs.oracle.com/javase/6/docs/api/java/io/File.html
Following possible version of working code:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
public class createfile
{
public static void main(String[] args) throws IOException
{
int[] numbers = {1,2,3};
for (int item : numbers)
{
String key = "file" + item;
File file = File.createTempFile(key,".txt",new File("c:\\"));
Writer writer = new OutputStreamWriter(new FileOutputStream(file));
writer.write("abcdefghijklmnopqrstuvwxyz\n");
writer.write("01234567890112345678901234\n");
writer.write("!##$%^&*()-=[]{};':',.<>/?\n");
writer.write("01234567890112345678901234\n");
writer.write("abcdefghijklmnopqrstuvwxyz\n");
writer.close();
}
}
}
You can also see another sample how to write text to file: http://www.homeandlearn.co.uk/java/write_to_textfile.html. This link use NetBeans as Java Tool for writing code. I strongly suggest to use some IDE (Eclipse,NetBeans) to write code in java.It will mark your compile mistakes and will suggest corrections.
NetBeans site:https://netbeans.org/
Welcome to Java world

public static void main(String[] args) throws IOException { doesn't return anything, so the return file statement is not required
File.createTempFile either takes String, String, File or String, String so File file = File.createTempFile("c:\\", key, ".txt"); won't compile.
Something like, File file = File.createTempFile(key, ".txt", new File("c:\\")); might be a better idea, but is depended on what you want to achieve.
The JavaDocs state that the prefix must be at least three characters long, so you'll need to pad the key value to meet these requirements.
You MAY find using something like...
File file = new File("C:\\" + key + ".txt");
more managable...

Save content of HTML in local storage

i want to fetch xml file from the links like
http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012
it returns a xml file, i don't know how to save this file in my folder named "temp" using java or javascripts, actually i don't want to display this result of that link to the user, I'm generating such links dynamically.
please help!!!

I recommend you to use an HTML parser library like jsoup in this situation. Please have a look at the below steps for better under standing:
1. Download jsoup core library (jsoup-1.6.1.jar) from http://jsoup.org/download
2. Add the jsoup-1.6.1.jar file to your classpath.
3. Try the below code to save the xml file from the URL.
package com.overflow.stack;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
/**
*
* #author sarath_sivan
*/
public class XmlExtractor {
public static StringBuilder fetchXmlContent(String url) throws IOException {
StringBuilder xmlContent = new StringBuilder();
Document document = Jsoup.connect(url).get();
xmlContent.append(document.body().html());
return xmlContent;
}
public static void saveXmlFile(StringBuilder xmlContent, String saveLocation) throws IOException {
FileWriter fileWriter = new FileWriter(saveLocation);
BufferedWriter bufferedWriter = new BufferedWriter(fileWriter);
bufferedWriter.write(xmlContent.toString());
bufferedWriter.close();
System.out.println("Downloading completed successfully..!");
}
public static void downloadXml() throws IOException {
String url = "http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012";
String saveLocation = System.getProperty("java.io.tmpdir")+"sarath.xml";
XmlExtractor.saveXmlFile(XmlExtractor.fetchXmlContent(url), saveLocation);
}
public static void main(String[] args) throws IOException {
XmlExtractor.downloadXml();
}
}
4. Once the above code is executed successfully, a file named "sarath.xml" should be there in your temp folder.
Thank you!

Well your body is XML not HTML, just retrieve it using Apache HttpClient, and pump the read InputStream to a FileOutputStream. What was the problem? Do you want to save parsed content in a formatted form?

public String execute() {
try {
String url = "http://api.worldbank.org/countries/GBR/indicators/NY.GDP.MKTP.KD.ZG?date=2004:2012";
String saveLocation = System.getProperty("java.io.tmpdir")+"sarath.xml";
XmlExtractor.saveXmlFile(XmlExtractor.fetchXmlContent(url), saveLocation);
} catch (Exception e) {
e.printStackTrace();
addActionError(e.getMessage());
}
return SUCCESS;
}

Reading MS Word 2007 using Java

I am trying to read a Microsoft word file through Java. I have included all the .jar files from Apache poi-3.8-beta1 to my classpath. However, when I try running this, I get the following exception:
org.apache.poi.poifs.filesystem.OfficeXmlFileException: The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:131)
at org.apache.poi.poifs.storage.HeaderBlock.<init>(HeaderBlock.java:104)
at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:138)
at readingmsword07.Main.main(Main.java:27)
Following is my code:
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.*;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class Main {
public static void main(String[] args) {
try {
FileInputStream fis = new FileInputStream("C:\\TrialDoc.docx");
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
org.apache.poi.xwpf.extractor.XWPFWordExtractor oleTextExtractor =
new XWPFWordExtractor(new XWPFDocument(fis));
System.out.print(oleTextExtractor.getText());
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am using the XWPFWordExtractor since I am trying to read a 2007 word document but for some reason I am unable to figure out the right POI that deals with this.
Any help is much appreciated. Thanks in advance!
~ Woods

remove the line,
POIFSFileSystem fileSystem = new POIFSFileSystem(fis);

Why am I getting a NullPointerException when trying to read a file?

I use this test to convert txt to pdf :
package convert.pdf;
//getResourceAsStream(String name) : Returns an input stream for reading the specified resource.
//toByteArray : Get the contents of an InputStream as a byte[].
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.commons.io.IOUtils;
import convert.pdf.txt.TextConversion;
public class TestConversion {
private static byte[] readFilesInBytes(String file) throws IOException {
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
private static void writeFilesInBytes(byte[] file, String name) throws IOException {
IOUtils.write(file, new FileOutputStream(name));
}
//just change the extensions and test conversions
public static void main(String args[]) throws IOException {
ConversionToPDF algorithm = new TextConversion();
byte[] file = readFilesInBytes("/convert/pdf/text.txt");
byte[] pdf = algorithm.convertDocument(file);
writeFilesInBytes(pdf, "text.pdf");
}
}
Problem:
Exception in thread "main" java.lang.NullPointerException
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1025)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:999)
at org.apache.commons.io.IOUtils.toByteArray(IOUtils.java:218)
at convert.pdf.TestConversion.readFilesInBytes(TestConversion.java:17)
at convert.pdf.TestConversion.main(TestConversion.java:28)
I use the debugger, and the problem seems to be located here :
private static byte[] readFilesInBytes(String file) throws IOException {
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
What is my problem?

Sounds like the resource probably doesn't exist with that name.
Are you aware that Class.getResourceAsStream() finds a resource relative to that class's package, whereas ClassLoader.getResourceAsStream() doesn't? You can use a leading forward slash in Class.getResourceAsStream() to mimic this, so
Foo.class.getResourceAsStream("/bar.png")
is roughly equivalent to
Foo.class.getClassLoader().getResourceAsStream("bar.png")
Is this actually a file (i.e. a specific file on the normal file system) that you're trying to load? If so, using FileInputStream would be a better bet. Use Class.getResourceAsStream() if it's a resource bundled in a jar file or in the classpath in some other way; use FileInputStream if it's an arbitrary file which could be anywhere in the file system.
EDIT: Another thing to be careful of, which has caused me problems before now - if this has worked on your dev box which happens to be Windows, and is now failing on a production server which happens to be Unix, check the case of the filename. The fact that different file systems handle case-sensitivity differently can be a pain...

Are you checking to see if the file exists before you pass it to readFilesInBytes()? Note that Class.getResourceAsStream() returns null if the file cannot be found. You probably want to do:
private static byte[] readFilesInBytes(String file) throws IOException {
File testFile = new File(file);
if (!testFile.exists()) {
throw new FileNotFoundException("File " + file + " does not exist");
}
return IOUtils.toByteArray(TestConversion.class.getResourceAsStream(file));
}
or better yet:
private static byte[] readFilesInBytes(String file) throws IOException {
InputStream stream = TestConversion.class.getResourceAsStream(file);
if (stream == null) {
throw new FileNotFoundException("readFilesInBytes: File " + file
+ " does not exist");
}
return IOUtils.toByteArray(stream);
}

This class reads a TXT file in the classpath and uses TextConversion to convert to PDF, then save the pdf in the file system.
Here TextConversion code :
package convert.pdf.txt;
//Conversion to PDF from text using iText.
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import convert.pdf.ConversionToPDF;
import convert.pdf.ConvertDocumentException;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Font;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
public class TextConversion implements ConversionToPDF {
public byte[] convertDocument(byte[] documents) throws ConvertDocumentException {
try {
return this.convertInternal(documents);
} catch (DocumentException e) {
throw new ConvertDocumentException(e);
} catch (IOException e) {
throw new ConvertDocumentException(e);
}
}
private byte[] convertInternal(byte[] documents) throws DocumentException, IOException {
Document document = new Document();
ByteArrayOutputStream pdfResultBytes = new ByteArrayOutputStream();
PdfWriter.getInstance(document, pdfResultBytes);
document.open();
BufferedReader reader = new BufferedReader( new InputStreamReader( new ByteArrayInputStream(documents) ) );
String line = "";
while ((line = reader.readLine()) != null) {
if ("".equals(line.trim())) {
line = "\n"; //white line
}
Font fonteDefault = new Font(Font.COURIER, 10);
Paragraph paragraph = new Paragraph(line, fonteDefault);
document.add(paragraph);
}
reader.close();
document.close();
return pdfResultBytes.toByteArray();
}
}
And here the code to ConversionToPDF :
package convert.pdf;
// Interface implemented by the conversion algorithms.
public interface ConversionToPDF {
public byte[] convertDocument(byte[] documentToConvert) throws ConvertDocumentException;
}
I think the problem come from my file system (devbox on windows and server is Unix).
I will try to modify my classpath.

This problem may be caused by calling methods on test.txt, which can be a folder shortcut. In other words, you're calling a method on a file that doesn't exist, resulting in a NullPointerException.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Identifying file type in Java - java

Please help me to find out the type of the file which is being uploaded. I wanted to distinguish between excel type and csv. MIMEType returns same for both of these file. Please help.

A better way without using javax.activation.*: URLConnection.guessContentTypeFromName(f.getAbsolutePath()));

The CSV will start with text and the excel type is most likely binary. However the simplest approach is to try to load the excel document using POI. If this fails try to load the file as a CSV, if that fails its possibly neither type.

Related

i get an error when i run this code

create multiple files do loop with java and store in a drive , how to?

Save content of HTML in local storage

Reading MS Word 2007 using Java

Why am I getting a NullPointerException when trying to read a file?

Categories

Resources