I am trying to create a simple Java program that reads and extracts the contents of the files inside a zip file. The zip file contains 3 files (txt, pdf, docx). I need to read the contents of all of these files, and I am using Apache Tika for this purpose.
Can somebody help me achieve this? Here is what I have tried so far, without success:
Code Snippet
public class SampleZipExtract {
public static void main(String[] args) {
List<String> tempString = new ArrayList<String>();
StringBuffer sbf = new StringBuffer();
File file = new File("C:\\Users\\xxx\\Desktop\\abc.zip");
InputStream input;
try {
input = new FileInputStream(file);
ZipInputStream zip = new ZipInputStream(input);
ZipEntry entry = zip.getNextEntry();
BodyContentHandler textHandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
while (entry!= null){
if(entry.getName().endsWith(".txt") ||
entry.getName().endsWith(".pdf")||
entry.getName().endsWith(".docx")){
System.out.println("entry=" + entry.getName() + " " + entry.getSize());
parser.parse(input, textHandler, metadata, new ParseContext());
tempString.add(textHandler.toString());
}
}
zip.close();
input.close();
for (String text : tempString) {
System.out.println("Apache Tika - Converted input string : " + text);
sbf.append(text);
System.out.println("Final text from all the three files " + sbf.toString());
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TikaException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
If you're wondering how to get the file content from each ZipEntry it's actually quite simple. Here's a sample code:
public static void main(String[] args) throws IOException {
ZipFile zipFile = new ZipFile("C:/test.zip");
Enumeration<? extends ZipEntry> entries = zipFile.entries();
while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
InputStream stream = zipFile.getInputStream(entry);
}
}
Once you have the InputStream you can read it however you want.
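To make "read it however you want" concrete, here is a minimal sketch that loads every entry into memory, keyed by entry name. The class name ZipEntryReader and the method readAll are made up for illustration, and InputStream.readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, not part of the original answer.
class ZipEntryReader {

    /** Reads every non-directory entry of the zip into memory, keyed by entry name. */
    static Map<String, byte[]> readAll(String zipPath) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        // try-with-resources closes the ZipFile and each entry stream
        try (ZipFile zipFile = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue; // directory entries carry no data
                }
                try (InputStream stream = zipFile.getInputStream(entry)) {
                    contents.put(entry.getName(), stream.readAllBytes()); // Java 9+
                }
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        for (Map.Entry<String, byte[]> e : readAll(args[0]).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().length + " bytes");
        }
    }
}
```

For large archives you would probably hand each stream to the consumer (for example a parser) instead of buffering everything.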
As of Java 7, the NIO API provides a better and more generic way of accessing the contents of ZIP or JAR files. Actually, it is now a unified API which allows you to treat ZIP files exactly like normal files.
In order to extract all of the files contained inside of a ZIP file in this API, you'd do as shown below.
In Java 8
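private void extractAll(URI fromZip, Path toDirectory) throws IOException {
    try (FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap())) {
        for (Path root : zipFs.getRootDirectories()) {
            try (Stream<Path> paths = Files.walk(root)) {
                paths.filter(Files::isRegularFile).forEach(path -> {
                    try {
                        // in a full implementation, you'd have to handle directories;
                        // this flattens every file into toDirectory
                        Files.copy(path, toDirectory.resolve(path.getFileName().toString()));
                    } catch (IOException e) {
                        throw new UncheckedIOException(e); // lambdas cannot throw checked exceptions
                    }
                });
            }
        }
    }
}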
In Java 7
private void extractAll(URI fromZip, Path toDirectory) throws IOException {
FileSystem zipFs = FileSystems.newFileSystem(fromZip, Collections.emptyMap());
for (Path root : zipFs.getRootDirectories()) {
Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
@Override
public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
throws IOException {
// You can do anything you want with the path here
Files.copy(file, toDirectory.resolve(file.getFileName().toString()));
return FileVisitResult.CONTINUE;
}
@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
throws IOException {
// In a full implementation, you'd need to create each
// sub-directory of the destination directory before
// copying files into it
return super.preVisitDirectory(dir, attrs);
}
});
}
}
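The "full implementation" mentioned in the comments above could look roughly like the sketch below. It takes the zip as a Path instead of a URI, recreates sub-directories, and rejects entries that would land outside the target directory. The class name ZipExtractor is hypothetical, and the (ClassLoader) cast is only there to select the Path-based newFileSystem overload that has existed since Java 7.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper class, written as a sketch of the approach above.
class ZipExtractor {

    static void extractAll(Path zipFile, Path toDirectory) throws IOException {
        final Path target = toDirectory.toAbsolutePath().normalize();
        try (FileSystem zipFs = FileSystems.newFileSystem(zipFile, (ClassLoader) null)) {
            for (final Path root : zipFs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
                            throws IOException {
                        // Create the matching sub-directory before its files are copied
                        Files.createDirectories(resolve(dir));
                        return FileVisitResult.CONTINUE;
                    }

                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                            throws IOException {
                        Files.copy(file, resolve(file), StandardCopyOption.REPLACE_EXISTING);
                        return FileVisitResult.CONTINUE;
                    }

                    // Maps a path inside the zip to a path under the target directory.
                    // toString() is needed because the two paths belong to different providers.
                    private Path resolve(Path inZip) throws IOException {
                        Path out = target.resolve(root.relativize(inZip).toString()).normalize();
                        if (!out.startsWith(target)) {
                            throw new IOException("Entry escapes target directory: " + inZip);
                        }
                        return out;
                    }
                });
            }
        }
    }
}
```

Usage would be something like ZipExtractor.extractAll(Paths.get("C:/test.zip"), Paths.get("C:/out")).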
Because of the condition in while, the loop might never break:
while (entry != null) {
// If entry never becomes null here, loop will never break.
}
Instead of the null check there, you can try this:
ZipEntry entry = null;
while ((entry = zip.getNextEntry()) != null) {
// Rest of your code
}
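Put together, the corrected loop might look like the sketch below. It uses only the JDK (no Tika), and ZipStreamReader and readEntries are made-up names. Note that the data is read from the ZipInputStream itself, not from the underlying file stream; in the question's code that would presumably mean passing zip rather than input to the parser.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class illustrating the fixed loop.
class ZipStreamReader {

    static Map<String, byte[]> readEntries(InputStream rawZip) throws IOException {
        Map<String, byte[]> contents = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(rawZip)) {
            ZipEntry entry;
            // getNextEntry() is called on every iteration, so the loop ends
            // when the archive has no more entries
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                // read() on the ZipInputStream returns -1 at the end of the current entry
                while ((n = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                contents.put(entry.getName(), buffer.toByteArray());
            }
        }
        return contents;
    }
}
```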
Sample code you can use to let Tika take care of container files for you.
http://wiki.apache.org/tika/RecursiveMetadata
From what I can tell, the accepted solution will not work for cases where there are nested zip files. Tika, however, will take care of such situations as well.
My way of achieving this is to create a wrapper class around ZipInputStream that exposes only the stream of the current entry:
The wrapper class:
public class ZippedFileInputStream extends InputStream {
private ZipInputStream is;
public ZippedFileInputStream(ZipInputStream is){
this.is = is;
}
@Override
public int read() throws IOException {
return is.read();
}
@Override
public void close() throws IOException {
is.closeEntry();
}
}
The use of it:
ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream("SomeFile.zip"));
ZipEntry entry;
while ((entry = zipInputStream.getNextEntry()) != null) {
ZippedFileInputStream archivedFileInputStream = new ZippedFileInputStream(zipInputStream);
//... perform whatever logic you want here with ZippedFileInputStream
// note that this will only close the current entry stream and not the ZipInputStream
archivedFileInputStream.close();
}
zipInputStream.close();
One advantage of this approach: InputStreams are passed as arguments to methods that process them, and those methods tend to close the input stream as soon as they are done with it. With the wrapper, such a close() only closes the current entry, not the whole ZipInputStream.
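As a variation on the wrapper above, here is a sketch that extends FilterInputStream, so bulk reads are delegated to the ZipInputStream instead of going through the single-byte read(). EntryStream and EntryStreamDemo are hypothetical names. The demo hands each entry to a BufferedReader that closes its stream, and the loop still advances to the next entry.

```java
import java.io.BufferedReader;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipInputStream;

// Hypothetical wrapper: close() ends the current entry but keeps the archive open.
class EntryStream extends FilterInputStream {
    private final ZipInputStream zip;

    EntryStream(ZipInputStream zip) {
        super(zip);
        this.zip = zip;
    }

    @Override
    public void close() throws IOException {
        zip.closeEntry();
    }
}

class EntryStreamDemo {
    /** Returns the first line of every entry (null for empty entries). */
    static List<String> firstLines(InputStream rawZip) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(rawZip)) {
            while (zip.getNextEntry() != null) {
                // Closing the reader closes EntryStream, which only closes the entry
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(new EntryStream(zip), StandardCharsets.UTF_8))) {
                    lines.add(reader.readLine());
                }
            }
        }
        return lines;
    }
}
```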
I did mine like this; remember to change the URL and the zip file names.
JDK 15
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Scanner;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.io.*;
import java.util.*;
import java.nio.file.Paths;
class Main {
public static void main(String[] args) throws MalformedURLException,FileNotFoundException,IOException{
String url,kfile;
Scanner getkw = new Scanner(System.in);
System.out.println(" Please Paste Url ::");
url = getkw.nextLine();
System.out.println("Please enter name of file you want to save as :: ");
kfile = getkw.nextLine();
getkw.close();
Main Dinit = new Main();
System.out.println(Dinit.dloader(url, kfile));
ZipFile Vanilla = new ZipFile(new File("Vanilla.zip"));
Enumeration<? extends ZipEntry> entries = Vanilla.entries();
while(entries.hasMoreElements()){
ZipEntry entry = entries.nextElement();
if (entry.isDirectory()) {
    continue; // nothing to write for directory entries
}
// Copy this entry's bytes, not the bytes of the whole archive
InputStream stream = Vanilla.getInputStream(entry);
FileOutputStream outter = new FileOutputStream(new File(entry.getName()));
outter.write(stream.readAllBytes());
outter.close();
stream.close();
}
}
private String dloader(String kurl, String fname)throws IOException{
String status ="";
try {
URL url = new URL("URL here");
FileOutputStream out = new FileOutputStream(new File("Vanilla.zip")); // Output File
out.write(url.openStream().readAllBytes());
out.close();
// Only report success when no exception was thrown
status = "Status: Good";
} catch (MalformedURLException e) {
status = "Status: MalformedURLException Occurred";
} catch (IOException e) {
status = "Status: IOException Occurred";
}
String path="\\tkwgter5834\\";
extractor(fname,"tkwgter5834",path);
return status;
}
private String extractor(String fname,String dir,String path){
File folder = new File(dir);
if(!folder.exists()){
folder.mkdir();
}
return "";
}
}
package codes;
import java.io.File;
import java.io.FileOutputStream;
import java.io.FilenameFilter;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
public class Rough {
public static void main(String[] args) throws IOException {
private static final String FOLDER_PATH = "C:\\Users\\s13w63\\Desktop\\Zip";
File dir = new File(FOLDER_PATH);
File[] files = dir.listFiles(new FilenameFilter() {
@Override
public boolean accept(File directory, String fileName) {
if (fileName.endsWith(".txt")) {
return true;
}
return false;
}
});
for (File f : files)
{
FileOutputStream fos=new FileOutputStream("C:\\Users\\s13w63\\Desktop\\Source.zip");
ZipOutputStream zos=new ZipOutputStream(fos);
ZipEntry ze=new ZipEntry(f.getCanonicalPath());
zos.putNextEntry(ze);
zos.close();
System.out.println(f.getCanonicalPath());
}
}
}
I tried this code to zip the files. It prints the file names but does not zip them. Do I have to add anything? It also reports an error in the code when I try to compile.
Please help me solve this issue.
Use java.nio.file; it has a very nice solution to your problem.
Illustration:
final Path zipPath = Paths.get("C:\\Users\\s13w63\\Desktop\\Source.zip");
final Path dir = Paths.get("C:\\Users\\s13w63\\Desktop\\Zip");
final DirectoryStream<Path> dirstream
= Files.newDirectoryStream(dir, "*.txt");
final URI uri = URI.create("jar:" + zipPath.toUri());
final Map<String, ?> env = Collections.emptyMap();
String filename;
try (
final FileSystem zipfs = FileSystems.newFileSystem(uri, env);
) {
for (final Path entry: dirstream) {
filename = dir.relativize(entry).toString();
Files.copy(entry, zipfs.getPath("/" + filename));
}
}
Yes, that's right, you can open a zip file as a FileSystem; as such, every operation in Files can be used "on a zip"!
This is JSR 203 for you; you even have FileSystem implementations in memory, over FTP, Dropbox, and others.
Note about the necessity to have the file name as a String: it is because you cannot .resolve() a Path against another if the other Path is from a different provider; I have published a package which solves this particular problem (among others) and has a MorePaths.resolve() method for such cases.
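One detail worth spelling out: as far as I know, the zip file system provider only creates a missing archive when the env map contains "create" set to "true", so an empty map works only if Source.zip already exists. Here is a self-contained sketch of the same idea with that property set; NioZipper and zipTextFiles are names invented for this example.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class based on the illustration above.
class NioZipper {

    /** Copies every *.txt file directly under dir into the root of the zip. */
    static void zipTextFiles(Path dir, Path zipPath) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "true"); // create the archive if it does not exist yet
        URI uri = URI.create("jar:" + zipPath.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env);
             DirectoryStream<Path> txtFiles = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path entry : txtFiles) {
                // The name is passed as a String because the paths belong to different providers
                String name = entry.getFileName().toString();
                Files.copy(entry, zipfs.getPath("/" + name), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}
```

The archive is written when the FileSystem is closed, which try-with-resources takes care of here.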
Bharat,
Please paste the error message that you see.
Also, I see a few issues with your approach.
You might need to do the following:
Move the declarations of fos and zos out of the for loop and place them above it; as written, the zip file is recreated on every iteration.
Add the code that writes each file's contents into the zip; it is currently missing.
Move zos.close() to after the for loop.
You could use the code below to add the contents of a text file to the ZipOutputStream:
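// Assumed value; the original snippet did not define BUFFER_SIZE
private static final int BUFFER_SIZE = 4096;

/**
 * Adds a file to the current zip output stream
 *
 * @param file
 *            the file to be added
 * @param zos
 *            the current zip output stream
 */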
private static void addFileToZip(File file, ZipOutputStream zos) {
try (BufferedInputStream bis = new BufferedInputStream(new FileInputStream(file))) {
zos.putNextEntry(new ZipEntry(file.getName()));
byte[] bytesIn = new byte[BUFFER_SIZE];
int read = 0;
while ((read = bis.read(bytesIn)) != -1) {
zos.write(bytesIn, 0, read);
}
zos.closeEntry();
} catch (IOException e) {
//Take appropriate action
}
}
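Applying the three changes listed above, the whole program could be reduced to something like this sketch. TxtZipper and zipTextFiles are hypothetical names, and Files.copy(Path, OutputStream) stands in for the manual buffer loop.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class combining the suggestions above.
class TxtZipper {

    /** Zips every .txt file directly under dir and returns how many files were added. */
    static int zipTextFiles(File dir, File zipFile) throws IOException {
        File[] files = dir.listFiles((d, name) -> name.endsWith(".txt"));
        if (files == null) {
            throw new IOException("Not a readable directory: " + dir);
        }
        // The streams are opened once, outside the loop, and closed after it
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (File f : files) {
                // Use the plain file name so the entry is not stored under an absolute path
                zos.putNextEntry(new ZipEntry(f.getName()));
                Files.copy(f.toPath(), zos); // writes the file's bytes into the entry
                zos.closeEntry();
            }
        }
        return files.length;
    }
}
```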
.NET guy working on a Java app.
I am uploading using the following example as my starting point (I have this working). It uses FileContent, which needs a java.io.File; that object does not contain the actual file contents, only a pointer to the file on disk.
We are uploading from a web site and attempting to insert into Drive. I would prefer to do this using an in-memory stream, like the .NET example, but I cannot see how to do that with the FileContent class. So my question is: is there a way to insert a file into Google Drive that is in memory and not first on the hard drive?
private static File insertFile(Drive service, String title, String description,
String parentId, String mimeType, String filename) {
// File's metadata.
File body = new File();
body.setTitle(title);
body.setDescription(description);
body.setMimeType(mimeType);
// Set the parent folder.
if (parentId != null && parentId.length() > 0) {
body.setParents(
Arrays.asList(new ParentReference().setId(parentId)));
}
// File's content.
java.io.File fileContent = new java.io.File(filename);
FileContent mediaContent = new FileContent(mimeType, fileContent);
try {
File file = service.files().insert(body, mediaContent).execute();
// Uncomment the following line to print the File ID.
// System.out.println("File ID: " + file.getId());
return file;
} catch (IOException e) {
System.out.println("An error occured: " + e);
return null;
}
}
Extend AbstractInputStreamContent and build your own FileContent:
package com;
import java.io.IOException;
import java.io.InputStream;
import com.google.api.client.http.AbstractInputStreamContent;
import com.google.api.client.util.Preconditions;
public class FileContent extends AbstractInputStreamContent {
private InputStream inputStream = null;
private long inputLength = 0;
public FileContent(String type, InputStream pInputStream) throws IOException {
super(type);
this.inputStream = Preconditions.checkNotNull(pInputStream);
this.inputLength = this.inputStream.available();
}
public long getLength() throws IOException {
return this.inputLength;
}
public boolean retrySupported() {
return false;
}
@Override
public InputStream getInputStream() throws IOException {
return this.inputStream;
}
}
I am trying to store an array of strings in a text file and read it back, but I can't get it working. I am getting a NullPointerException.
Exception in thread "main" java.lang.NullPointerException
at java.io.FileInputStream.<init>(Unknown Source)
at java.io.FileReader.<init>(Unknown Source)
at in.isuru.justconverter.FileDbTool.readFile(FileDbTool.java:41)
at in.isuru.justconverter.Test.main(Test.java:10)
Here are the two classes.
package in.isuru.justconverter;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;
import javax.swing.JOptionPane;
public class FileDbTool {
File dataFile;
ArrayList<String> filePath;
public void checkFile(){
dataFile = new File("db.txt");
if(dataFile.exists()){
readFile();
}else{
try {
dataFile.createNewFile();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
JOptionPane.showMessageDialog(null, "Coudn't Create New File!");
System.exit(1);
}
}
}
public void readFile(){
int len;
try{
char[] chr = new char[4096];
StringBuffer buffer = new StringBuffer();
FileReader reader = new FileReader(dataFile);
try {
while ((len = reader.read(chr)) > 0) {
buffer.append(chr, 0, len);
}
}finally {
reader.close();
}
System.out.println(buffer.toString());
StringTokenizer st = new StringTokenizer(buffer.toString(), ",");
while (st.hasMoreTokens()) {
String value = st.nextToken();
filePath = null;
filePath = new ArrayList<String>();
filePath.add(value);
}
}catch(IOException e){
JOptionPane.showMessageDialog(null, "Read Error");
}
}
public String[] getFilePathArray(){
readFile();
return filePath.toArray(new String[filePath.size()]);
}
public File[] getFiles(){
String[] paths = getFilePathArray();
ArrayList<File> files = new ArrayList<File>();
for(int i = 0; i < paths.length; i++){
File file = new File(paths[i]);
files.add(file);
}
return files.toArray(new File[files.size()]);
}
public void eraseFile(){
dataFile.delete();
}
public void writeFile(String[] stuff){
try{
BufferedWriter out = new BufferedWriter(new FileWriter(dataFile, true));
out.append(stuff + ",");
}catch(IOException e){
}
}
public void writeToDb(String[] array){
writeFile(array);
}
}
And main class
package in.isuru.justconverter;
public class Test {
/**
* @param args
*/
public static void main(String[] args) {
FileDbTool app = new FileDbTool();
app.checkFile();
}
}
This is a portion of a Swing program. I am trying to use a text file as a small database.
Line 41 is this:
FileReader reader = new FileReader(dataFile);
so I'd wager that dataFile is null here.
However, you do seem to initialize it before calling this method, otherwise the exception would be thrown inside checkFile.
Are you sure you are not calling readFile directly somewhere without calling checkFile first? In any case, this pattern is not a recommended approach, because you are requiring the users of your class to call methods in a specific order.
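One way to remove the ordering requirement is to initialize the file in the constructor, so no method can ever see a null dataFile. A minimal sketch, with FileDb and readPaths as made-up names and the file assumed to be UTF-8:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical rework of FileDbTool: the file is set once, in the constructor.
class FileDb {
    private final File dataFile;

    FileDb(File dataFile) throws IOException {
        if (dataFile == null) {
            throw new IllegalArgumentException("dataFile must not be null");
        }
        this.dataFile = dataFile;
        dataFile.createNewFile(); // does nothing if the file already exists
    }

    /** Reads the comma-separated paths stored in the file. */
    List<String> readPaths() throws IOException {
        String text = new String(Files.readAllBytes(dataFile.toPath()), StandardCharsets.UTF_8);
        List<String> paths = new ArrayList<>();
        for (String token : text.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }
}
```

With this shape, new FileDb(new File("db.txt")).readPaths() is safe no matter which method the caller uses first.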
From the stack trace, it seems that you called readFile() directly from main rather than through checkFile(), so dataFile is null because checkFile() never initialized it. Also, the stack trace and the given code don't match. When the FileReader constructor is called with a null argument, the NullPointerException is thrown once it reaches the FileInputStream constructor.
Here is the code from the JDK source:
public FileInputStream(File file) throws FileNotFoundException {
String name = (file != null ? file.getPath() : null);
SecurityManager security = System.getSecurityManager();
if (security != null) {
security.checkRead(name);
}
if (name == null) {
throw new NullPointerException();
}
fd = new FileDescriptor();
fd.incrementAndGetUseCount();
open(name);
}
So I was wondering if it was possible to write all the console output to a separate file outside of Java? I know about the PrintWriter and FileWriter approach. However, in my experience those work only if I use them all within one method, and I don't think I can do that with the code I have right now. Below is what I have...
Java Code
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class XMLTagParser extends DefaultHandler
{
private int i;
public XMLTagParser()
{
traverse(new File("C:/Documents and Settings/user/workspace/Intern Project/Proposals/Converted Proposals/Extracted Items"));
}
private static final class SaxHandler extends DefaultHandler
{
private StringBuffer buffer;
private String heading;
private boolean inHeading;
public void startElement(String uri, String localName, String qName, Attributes attrs)
{
if ("w:pStyle".equals(qName))
{
String val = attrs.getValue("w:val");
if (val.contains("Heading"))
{
if (isHeading(val))
{
System.out.println(val);
inHeading = true;
}
}
}
if("w:t".equals(qName))
{
if (inHeading == true)
{
buffer = new StringBuffer();
}
}
}
public void characters(char buff[], int offset, int length) throws SAXException
{
String s = new String(buff, offset, length);
if(buffer != null)
{
buffer.append(s);
heading = heading += s;
}
}
public void endElement(String uri, String localName, String qName)
{
buffer = null;
//if the qName is "w:p" and it is in the heading, print out the heading and then reset
if ("w:p".equals(qName) && inHeading == true)
{
System.out.println(heading);
heading = "";
inHeading = false;
}
}
// method to verify whether element is an actual heading
private static boolean isHeading(String heading)
{
String headingNumber = heading.substring(7,8);
String headingName = heading.substring(0,7);
if (headingName.equals("Heading"))
{
if (headingNumber.equals("1")
|| headingNumber.equals("2")
|| headingNumber.equals("3")
|| headingNumber.equals("4")
|| headingNumber.equals("5")
|| headingNumber.equals("6"))
{
return true;
}
}
return false;
}
}
/*private void writeFile(File file)
{
try
{
PrintWriter out = new PrintWriter(new FileWriter(file + "/" + i++));
out.close();
}
catch (IOException e)
{
e.printStackTrace(System.out);
}
}*/
private void traverse(File directory)
{
//Get all files in directory
File[] files = directory.listFiles();
for (File file : files)
{
if (file.getName().equals("document.xml"))
{
try
{
// creates and returns new instance of SAX-implementation:
SAXParserFactory factory = SAXParserFactory.newInstance();
// create SAX-parser...
SAXParser parser = factory.newSAXParser();
// prints out the current working proposal, traversing up the directory structure
System.out.println(file.getParentFile().getParentFile().getName());
// .. define our handler:
SaxHandler handler = new SaxHandler();
// and parse:
parser.parse(file.getAbsolutePath(), handler);
try
{
// instantiates new printwriter which writes out to a file
PrintWriter out = new PrintWriter(new FileWriter(file.getParentFile().getParentFile() + "/" + i++ + ".txt"));
out.close();
}
catch (IOException e)
{
e.printStackTrace(System.out);
}
}
catch (Exception ex)
{
ex.printStackTrace(System.out);
}
}
else if (file.isDirectory())
{
//It's a directory so (recursively) traverse it
traverse(file);
}
}
}
}
So I've instantiated the PrintWriter in there, but obviously it's no good if I have nothing to write to it. I'm not really sure how to get what's being printed to the console written to that file. Any ideas? Thanks in advance.
If you really want to, you can redirect System.out to any PrintStream like this:
PrintStream stream = new PrintStream("filename.txt");
System.setOut(stream);
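If the redirect should only last for part of the program, a sketch like the following restores the console afterwards. RedirectDemo and runWithOutputTo are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.io.PrintStream;

// Hypothetical helper: runs a task with System.out pointed at a file.
class RedirectDemo {

    static void runWithOutputTo(String fileName, Runnable task) throws FileNotFoundException {
        PrintStream original = System.out;
        // try-with-resources flushes and closes the file stream when the task is done
        try (PrintStream fileOut = new PrintStream(fileName)) {
            System.setOut(fileOut);
            task.run(); // every System.out.println inside goes to the file
        } finally {
            System.setOut(original); // put the console back
        }
    }
}
```

For example, RedirectDemo.runWithOutputTo("filename.txt", () -> new XMLTagParser()) would capture everything the parser prints.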
If you get into rolling your own file logger, you'll spend more time dealing with I/O issues, rolling files, file sizes, etc. You should use log4j instead! It will handle things like this and make your logging more flexible. It's pretty much the standard for Java logging.
System.out is a PrintStream that by default points to the console. You could instead create a FileOutputStream pointing to the file of your choice, wrap it in a PrintStream, and install it with System.setOut. That takes effect for the rest of the life cycle of the program/application. Check this link for complete code.
Instead of using System.out, you could use a FileWriter, write to it and flush it. It is unclear why you increment i in your code; I guess you want to write everything to just one file.
Also, it looks like you never write to the Writer that you initialize.
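Since the printing happens inside the SAX handler, one option is to give the handler the writer through its constructor. The sketch below is a simplified stand-in for the question's SaxHandler: it just writes every element name, and WritingHandler, WritingHandlerDemo and writeElementNames are invented names.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.PrintWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that writes to a PrintWriter instead of System.out.
class WritingHandler extends DefaultHandler {
    private final PrintWriter out;

    WritingHandler(PrintWriter out) {
        this.out = out;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        out.println(qName); // replaces System.out.println(...)
    }
}

class WritingHandlerDemo {
    static void writeElementNames(File xml, File report) throws Exception {
        // The writer is opened before parsing and closed (and flushed) afterwards
        try (PrintWriter out = new PrintWriter(new FileWriter(report))) {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, new WritingHandler(out));
        }
    }
}
```

In the original code, the same PrintWriter could be created once in traverse and shared by every handler so that everything ends up in one file.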
You could keep using System.out, and redirect it using ">" to the file when you invoke the application. You can still retain stderr for direct console output.
Or do you mean something else when you write "outside java"?
Simple: how do I read the contents of a directory in Java and save that data in an array or variable of some sort? Secondly, how do I open an external file in Java?
You can use the Java I/O API, specifically java.io.File, java.io.BufferedReader, java.io.BufferedWriter, etc.
I am assuming that by opening you mean opening a file for reading. For a good understanding of Java I/O functionality, check out this link: http://download.oracle.com/javase/tutorial/essential/io/
Check the code below.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class FileIO
{
public static void main(String[] args)
{
File file = new File("c:/temp/");
// Reading directory contents
File[] files = file.listFiles();
for (int i = 0; i < files.length; i++) {
System.out.println(files[i]);
}
// Reading content
BufferedReader reader = null;
try {
reader = new BufferedReader(new FileReader("c:/temp/test.txt"));
String line = null;
while(true)
{
line = reader.readLine();
if(line == null)
break;
System.out.println(line);
}
}catch(Exception e) {
e.printStackTrace();
}finally {
if(reader != null)
{
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
You can use the java.io.File class to do that. A File is an abstract representation of file and directory pathnames. You can retrieve the list of files/directories within a directory using the File.list() method.
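A small sketch of File.list() in use; DirLister and listNames are made-up names. Keep in mind that list() returns null when the path is not a directory or cannot be read, so the result should be checked.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper around File.list().
class DirLister {

    /** Returns the sorted names of the entries in dir, or an empty array if it cannot be listed. */
    static String[] listNames(File dir) {
        String[] names = dir.list();
        if (names == null) {
            return new String[0]; // not a directory, or an I/O error occurred
        }
        Arrays.sort(names); // list() makes no guarantee about ordering
        return names;
    }

    public static void main(String[] args) {
        for (String name : listNames(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```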
There's also the Commons IO package which has a variety of methods for manipulating files and directories.
import java.io.File;
import java.io.IOException;
import java.util.Collection;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.filefilter.FileFilterUtils;
public class CommonsIO
{
public static void main( String[] args )
{
// Read the contents of a file into a String
try {
String contents = FileUtils.readFileToString( new File( "/etc/mtab" ) );
} catch (IOException e) {
e.printStackTrace();
}
// Get a Collection of files in a directory without looking in subdirectories
Collection<File> files = FileUtils.listFiles( new File( "/home/ross/tmp" ), FileFilterUtils.trueFileFilter(), null );
for ( File f : files ) {
System.out.println( f.getName() );
}
}
}
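import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class StackOverflow {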
public static void main(String[] sr) throws IOException{
//Read a folder and files in it
File f = new File("D:/workspace");
if(!f.exists())
System.out.println("No File/Dir");
if(f.isDirectory()){// a directory!
for(File file :f.listFiles()){
System.out.println(file.getName());
}
}
//Read a file and save its content to a StringBuilder
File f1 = new File("D:/workspace/so.txt");
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(f1))) {
    String line;
    while ((line = br.readLine()) != null)
        sb.append(line).append("\n");
}
System.out.println(sb);
}
}