Get InputStream from Apache's POI Workbook - java

Is there a way to obtain an InputStream from Apache POI's Workbook?
I need it for piping to another OutputStream, but I'm unable to find such a method (if it exists).
If it doesn't, any tips on alternative ways to obtain it?

Here's how to implement #2 of Alexander Tokarev's answer (i.e., get an InputStream from a workbook):
try {
    // Serialize the workbook into an in-memory buffer
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    workbook.write(bos);
    byte[] barray = bos.toByteArray();
    // Expose the buffered bytes as an InputStream
    InputStream is = new ByteArrayInputStream(barray);
} catch (IOException e) {
    e.printStackTrace();
}

There are several ways to solve this:
1. You can use the standard Java PipedInputStream and PipedOutputStream. But you have to use a different thread for reading from the PipedInputStream (as described in http://docs.oracle.com/javase/7/docs/api/java/io/PipedInputStream.html)
2. You can write the content to a ByteArrayOutputStream, and then read the resulting byte array via a ByteArrayInputStream. This can be done sequentially in one thread.
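The piped option could be sketched roughly as follows. This is only an illustration: PipedSource and StreamWriter are hypothetical names, not part of POI or the JDK. For a POI workbook, a method reference such as workbook::write should fit the callback shape, since it takes an OutputStream and may throw IOException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

final class PipedSource {

    /** Hypothetical callback for whatever produces the bytes. */
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Returns an InputStream fed by a background thread that runs the writer. */
    static InputStream asInputStream(StreamWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        Thread producer = new Thread(() -> {
            // Closing the output side signals end-of-stream to the reader.
            try (OutputStream o = out) {
                writer.writeTo(o);
            } catch (IOException e) {
                // A real implementation should hand this error to the reader.
                e.printStackTrace();
            }
        }, "piped-writer");
        producer.setDaemon(true);
        producer.start();
        return in;
    }
}
```

The writer has to run on its own thread because the pipe has a small fixed buffer: a writer and a reader on the same thread would block each other once the buffer fills up.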

You can check https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/xssf/eventusermodel/XLSX2CSV.java
Here they convert the whole sheet to a CSV file, using the InputStream they got from the XLSX file.
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
int index = 0;
while (iter.hasNext()) {
    InputStream stream = iter.next();
    String sheetName = iter.getSheetName();
    this.output.println();
    this.output.println(sheetName + " [index=" + index + "]:");
    processSheet(styles, strings, new SheetToCSV(), stream);
    stream.close();
    ++index;
}

You can extend XSSFWorkbook, save the File or InputStream object passed to the constructor, and then get it back with a method like getInputStream().

Related

Not able to read ZipInputStream returned by ZipFile.getInputStream(ZipEntry) method

I am trying to extract a given file from a zip file. The zip file contains directories and sub-directories as well. I tried the Java 7 NIO file APIs, but since my zip has subdirectories, I would need to provide the complete path to extract the file, which is not suitable in my scenario because the name of the file to extract comes from the user. I have been trying the code below, but somehow the read method of ZipInputStream is not reading any content into the buffer. On debugging I found that the ZipEntry object inside the ZipInputStream is null, which is why its read method simply returns -1. But now I am stuck, as I cannot figure out how that value gets set.
try(OutputStream out=new FileOutputStream("filetoExtract");) {
    zipFile = new ZipFile("zipFile");
    Enumeration<? extends ZipEntry> e = zipFile.entries();
    while (e.hasMoreElements()) {
        ZipEntry entry = e.nextElement();
        if (!entry.isDirectory()) {
            String entryName = entry.getName();
            String fileName = entryName.substring(entryName.lastIndexOf("/") + 1);
            System.out.println(i++ + "." + entryName);
            if (searchFile.equalsIgnoreCase(fileName)) {
                System.out.println("File Found");
                BufferedInputStream bufferedInputStream = new BufferedInputStream(zipFile.getInputStream(entry));
                ZipInputStream zin = new ZipInputStream(bufferedInputStream);
                byte[] buffer = new byte[9000];
                int len;
                while ((len = zin.read(buffer)) != -1) {
                    out.write(buffer, 0, len);
                }
                out.close();
                break;
            }
        }
    }
} catch (IOException ioe) {
    System.out.println("Error opening zip file" + ioe);
}
Please advise what I am doing wrong here. Thanks
EDIT:
After debugging a little more I found out that the ZipFile class has an inner class with a similar name (ZipFileInputStream), so getInputStream was returning an object of that inner class rather than a top-level ZipInputStream. So I tried the code below and it worked well. But I don't quite understand what has happened here. If someone could explain the logic behind the scenes, that would be really great.
// BufferedInputStream bufferedInputStream = new
//BufferedInputStream(zipFile.getInputStream(entry));
//ZipInputStream zin = new ZipInputStream(bufferedInputStream);
InputStream zin= zipFile.getInputStream(entry);
The second line is unnecessary, as zipFile.getInputStream(entry) already returns an InputStream that represents the decompressed data. Therefore there's no need (or in fact it's wrong) to wrap that InputStream in yet another ZipInputStream:
BufferedInputStream bufferedInputStream = new BufferedInputStream(zipFile.getInputStream(entry));
ZipInputStream zin = new ZipInputStream(bufferedInputStream);
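For completeness, here is a minimal sketch of the lookup from the question with the extra wrapper removed. The class and method names (ZipEntryExtractor, extract) are made up for illustration; only java.util.zip classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class ZipEntryExtractor {

    /**
     * Copies the first entry whose simple file name matches (ignoring case)
     * to the target stream. Returns true if a matching entry was found.
     */
    static boolean extract(String zipPath, String simpleName, OutputStream target) throws IOException {
        try (ZipFile zipFile = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                String fileName = name.substring(name.lastIndexOf('/') + 1);
                if (!simpleName.equalsIgnoreCase(fileName)) {
                    continue;
                }
                // getInputStream already yields decompressed bytes; no ZipInputStream needed.
                try (InputStream in = zipFile.getInputStream(entry)) {
                    byte[] buffer = new byte[8192];
                    int len;
                    while ((len = in.read(buffer)) != -1) {
                        target.write(buffer, 0, len);
                    }
                }
                return true;
            }
        }
        return false;
    }
}
```

The caller keeps ownership of the target stream, so it can pass a FileOutputStream or a ByteArrayOutputStream and close it when done.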

Extract tar.gz file in memory in Java

I'm using the Apache Compress library to read a .tar.gz file, something like this:
final TarArchiveInputStream tarIn = initializeTarArchiveStream(this.archiveFile);
try {
    TarArchiveEntry tarEntry = tarIn.getNextTarEntry();
    while (tarEntry != null) {
        byte[] btoRead = new byte[1024];
        BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
        int len = 0;
        while ((len = tarIn.read(btoRead)) != -1) {
            bout.write(btoRead, 0, len);
        }
        bout.close();
        tarEntry = tarIn.getNextTarEntry();
    }
    tarIn.close();
}
catch (IOException e) {
    e.printStackTrace();
}
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
You could replace the file stream with a ByteArrayOutputStream.
i.e. replace this:
BufferedOutputStream bout = new BufferedOutputStream(new FileOutputStream(destPath)); //<- I don't want this!
with this:
ByteArrayOutputStream bout = new ByteArrayOutputStream();
and then after closing bout, use bout.toByteArray() to get the bytes.
Is it possible not to extract this into a separate file, and read it in memory somehow? Maybe into a giant String or something?
Yes, sure.
Just replace the code in the inner loop that is opening files and writing to them with code that writes to a ByteArrayOutputStream ... or a series of such streams.
The natural representation of the data that you read from the TAR (like that) will be bytes / byte arrays. If the bytes are properly encoded characters, and you know the correct encoding, then you can convert them to strings. Otherwise, it is better to leave the data as bytes. (If you attempt to convert non-text data to strings, or if you convert using the wrong charset/encoding you are liable to mangle it ... irreversibly.)
Obviously, you are going to need to think through some of these issues yourself, but the basic idea should work ... provided you have enough heap space.
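A minimal sketch of that idea, using only JDK classes. It assumes the stream you pass in reports -1 at the end of the data you want (the question's loop already relies on tarIn.read doing that at the end of each entry). InMemoryReader is a made-up helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

final class InMemoryReader {

    /** Drains the stream into a byte array instead of a file. */
    static byte[] toBytes(InputStream in) throws IOException {
        ByteArrayOutputStream bout = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int len;
        while ((len = in.read(buffer)) != -1) {
            bout.write(buffer, 0, len);
        }
        return bout.toByteArray();
    }

    /** Decodes only after all bytes are collected, with an explicit charset. */
    static String toText(InputStream in, Charset charset) throws IOException {
        return new String(toBytes(in), charset);
    }
}
```

Decoding once at the end avoids splitting a multi-byte character across two buffer reads, which is what can go wrong when each chunk is converted to a String separately.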
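Copy the bytes read into btoRead into a String, like
String s = new String(btoRead, 0, len, StandardCharsets.UTF_8);
and go on appending each chunk to a StringBuilder until the end of the entry is reached. This assumes the data is UTF-8 text; a multi-byte character split across two reads would be mangled, so collecting the bytes first and decoding once is safer.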

File(s) and InputStream/OutputStream

Hello Stack Overflow community,
I am doing multistep processing on some data I am receiving with a java Servlet. The current process I have is that I input the files to a server using Apache File Upload and convert them to a File. Then once input1 is populated with data, I run through a flow similar to this (where the process functions are xsl transforms):
File input1 = new File(FILE_NAME); // <---this is populated with data
File output1 = new File(TEMP_FILE); // <---this is the temporary file
InputStream read = new FileInputStream(input1);
OutputStream out = new FileOutputStream(output1);
process1ThatReadsProcessesOutputs( read, out);
out.close();
read.close();
//this is basically a repeat of the above process!
File output2 = new File(RESULT_FILE); // <--- This is the result file
InputStream read1 = new FileInputStream(output1);
OutputStream out1 = new FileOutputStream(output2);
Process2ThatReadsProcessesOutputs( read1, out1);
read1.close();
out1.close();
…
So my question is whether there is a better way to do this, so that I do not have to create those temporary Files and recreate streams to those Files. (I am assuming I am incurring a decent performance penalty.)
I saw this: Most Efficient Way to create InputStream from OutputStream, but I am not sure if this is the best route to go...
Just replace FileOutputStream with ByteArrayOutputStream, and FileInputStream with ByteArrayInputStream.
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new ByteArrayInputStream(out.toByteArray());
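Applied to the multi-step flow in the question, that swap might look like the sketch below. InMemoryPipeline and Step are hypothetical names standing in for the question's process functions, and the approach assumes each intermediate result fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class InMemoryPipeline {

    /** Hypothetical stand-in for one processing step, e.g. an XSL transform. */
    interface Step {
        void process(InputStream in, OutputStream out) throws IOException;
    }

    /** Runs the steps in order, handing each step's output to the next as a byte array. */
    static byte[] run(byte[] input, Step... steps) throws IOException {
        byte[] current = input;
        for (Step step : steps) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            step.process(new ByteArrayInputStream(current), out);
            current = out.toByteArray();
        }
        return current;
    }
}
```

Only the first input and the final result then need to touch the disk, if at all.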
I don't know why you are converting the FileItem retrieved with Apache Commons to a File if you don't really need to. You can use the InputStream that each FileItem provides to read the content of the uploaded file:
// create/retrieve a new file upload handler
ServletFileUpload upload = ...;
// parse the request
List<FileItem> items = (List<FileItem>) upload.parseRequest(request);
/* get the FileItem from the List. Yes, it's not a best practice because you must verify
how many you receive, and check everything is ok, etc.
Let's suppose you've done it */
//...
FileItem item = items.get(0);
// get the InputStream to read the contents of the file
InputStream is = item.getInputStream();
So finally, you can use the InputStream object to read the uploaded stream sent by the client, avoiding unnecessary instantiations.
And yes, it's really recommended to use buffered classes like BufferedInputStream and BufferedOutputStream.
The other idea could be to avoid the FileOutputStream (the middle one) and replace it with a ByteArrayOutputStream if the intermediate result doesn't need to be written to disk (which is always slower than working in memory).
Java 9 brings a new answer to the question:
// All bytes from an InputStream at once
byte[] result = new ByteArrayInputStream(buf)
.readAllBytes();
// Directly redirect an InputStream to an OutputStream
new ByteArrayInputStream(buf)
.transferTo(System.out);

How to create ByteArrayInputStream from a file in Java?

I have a file that can be anything, like ZIP, RAR, txt, CSV, doc etc. I would like to create a ByteArrayInputStream from it.
I'm using it to upload a file to FTP through FTPClient from Apache Commons Net.
Does anybody know how to do it?
For example:
String data = "hdfhdfhdfhd";
ByteArrayInputStream in = new ByteArrayInputStream(data.getBytes());
My code:
public static ByteArrayInputStream retrieveByteArrayInputStream(File file) {
ByteArrayInputStream in;
return in;
}
Use the FileUtils#readFileToByteArray(File) from Apache Commons IO, and then create the ByteArrayInputStream using the ByteArrayInputStream(byte[]) constructor.
// readFileToByteArray throws IOException, so the method has to declare it
public static ByteArrayInputStream retrieveByteArrayInputStream(File file) throws IOException {
    return new ByteArrayInputStream(FileUtils.readFileToByteArray(file));
}
The general idea is that a File would yield a FileInputStream and a byte[] a ByteArrayInputStream. Both implement InputStream so they should be compatible with any method that uses InputStream as a parameter.
Putting all of the file contents in a ByteArrayInputStream can be done of course:
read the full file into a byte[]; Java 7 and later provide a convenience method, java.nio.file.Files.readAllBytes, to read all data from a file;
create a ByteArrayInputStream around the file content, which is now in memory.
Note that this may not be the optimal solution for very large files: the whole file will be stored in memory at the same time. Using the right stream for the job is important.
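The two steps above can be sketched with the JDK alone. The class name is made up; the method name is taken from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class FileToByteArrayStream {

    /** Loads the whole file into memory and wraps it; suitable for small files only. */
    static ByteArrayInputStream retrieveByteArrayInputStream(File file) throws IOException {
        byte[] content = Files.readAllBytes(file.toPath());
        return new ByteArrayInputStream(content);
    }
}
```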
A ByteArrayInputStream is an InputStream wrapper around a byte array. This means you'll have to fully read the file into a byte[], and then use one of the ByteArrayInputStream constructors.
Can you give any more details of what you are doing with the ByteArrayInputStream? It's likely there are better ways around what you are trying to achieve.
Edit:
If you are using Apache FTPClient to upload, you just need an InputStream. You can do this:
String remote = "whatever";
InputStream is = new FileInputStream(new File("your file"));
ftpClient.storeFile(remote, is);
You should of course remember to close the input stream once you have finished with it.
This isn't exactly what you are asking, but is a fast way of reading files in bytes.
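File file = new File(yourFileName);
byte[] b = new byte[(int) file.length()];
// Open read-only; try-with-resources closes the file
try (RandomAccessFile ra = new RandomAccessFile(file, "r")) {
    // readFully keeps reading until the array is full; a single read() may stop early
    ra.readFully(b);
} catch (IOException e) {
    e.printStackTrace();
}
//Then iterate through b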
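This piece of code comes in handy:
private static byte[] readContentIntoByteArray(File file)
{
    // Assumes the file fits in memory and is smaller than 2 GB
    byte[] bFile = new byte[(int) file.length()];
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fileInputStream = new FileInputStream(file))
    {
        // A single read() may return fewer bytes than requested, so loop until the array is full
        int offset = 0;
        int count;
        while (offset < bFile.length
                && (count = fileInputStream.read(bFile, offset, bFile.length - offset)) != -1)
        {
            offset += count;
        }
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return bFile;
}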
Reference: http://howtodoinjava.com/2014/11/04/how-to-read-file-content-into-byte-array-in-java/

Reading from a ZipInputStream into a ByteArrayOutputStream

I am trying to read a single file from a java.util.zip.ZipInputStream, and copy it into a java.io.ByteArrayOutputStream (so that I can then create a java.io.ByteArrayInputStream and hand that to a 3rd party library that will end up closing the stream, and I don't want my ZipInputStream getting closed).
I'm probably missing something basic here, but I never enter the while loop here:
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
try {
    while ((bytesRead = zipStream.read(tempBuffer)) != -1) {
        streamBuilder.write(tempBuffer, 0, bytesRead);
    }
} catch (IOException e) {
    // ...
}
What am I missing that will allow me to copy the stream?
Edit:
I should have mentioned earlier that this ZipInputStream is not coming from a file, so I don't think I can use a ZipFile. It is coming from a file uploaded through a servlet.
Also, I have already called getNextEntry() on the ZipInputStream before getting to this snippet of code. If I don't try copying the file into another InputStream (via the OutputStream mentioned above), and just pass the ZipInputStream to my 3rd party library, the library closes the stream, and I can't do anything more, like dealing with the remaining files in the stream.
Your loop looks valid - what does the following code (just on its own) return?
zipStream.read(tempBuffer)
If it's returning -1, then the zipStream is closed before you get it, and all bets are off. It's time to use your debugger and make sure what's being passed to you is actually valid.
When you call getNextEntry(), does it return a value, and is the data in the entry meaningful (i.e. does getCompressedSize() return a valid value)? If you are just reading a Zip file that doesn't have read-ahead zip entries embedded, then ZipInputStream isn't going to work for you.
Some useful tidbits about the Zip format:
Each file embedded in a zip file has a header. This header can contain useful information (such as the compressed length of the stream, its offset in the file, CRC) - or it can contain some magic values that basically say 'The information isn't in the stream header, you have to check the Zip post-amble'.
Each zip file then has a table that is attached to the end of the file that contains all of the zip entries, along with the real data. The table at the end is mandatory, and the values in it must be correct. In contrast, the values embedded in the stream do not have to be provided.
If you use ZipFile, it reads the table at the end of the zip. If you use ZipInputStream, I suspect that getNextEntry() attempts to use the entries embedded in the stream. If those values aren't specified, then ZipInputStream has no idea how long the stream might be. The inflate algorithm is self terminating (you actually don't need to know the uncompressed length of the output stream in order to fully recover the output), but it's possible that the Java version of this reader doesn't handle this situation very well.
I will say that it's fairly unusual to have a servlet returning a ZipInputStream (it's much more common to receive an InflaterInputStream if you are going to be receiving compressed content).
You probably tried reading from a FileInputStream like this:
ZipInputStream in = new ZipInputStream(new FileInputStream(...));
This won’t work since a zip archive can contain multiple files and you need to specify which file to read.
You could use java.util.zip.ZipFile and a library such as IOUtils from Apache Commons IO or ByteStreams from Guava that assist you in copying the stream.
Example:
ByteArrayOutputStream out = new ByteArrayOutputStream();
try (ZipFile zipFile = new ZipFile("foo.zip")) {
    ZipEntry zipEntry = zipFile.getEntry("fileInTheZip.txt");
    try (InputStream in = zipFile.getInputStream(zipEntry)) {
        IOUtils.copy(in, out);
    }
}
I'd use IOUtils from the commons io project.
IOUtils.copy(zipStream, byteArrayOutputStream);
You're missing the call
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
which positions the stream at the first decompressed byte of the first entry.
ByteArrayOutputStream streamBuilder = new ByteArrayOutputStream();
int bytesRead;
byte[] tempBuffer = new byte[8192*2];
ZipEntry entry = (ZipEntry) zipStream.getNextEntry();
try {
    while ( (bytesRead = zipStream.read(tempBuffer)) != -1 ){
        streamBuilder.write(tempBuffer, 0, bytesRead);
    }
} catch (IOException e) {
    ...
}
You could implement your own wrapper around the ZipInputStream that ignores close() and hand that off to the third-party library.
thirdPartyLib.handleZipData(new CloseIgnoringInputStream(zipStream));
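class CloseIgnoringInputStream extends InputStream
{
    private final ZipInputStream stream;
    public CloseIgnoringInputStream(ZipInputStream inStream)
    {
        stream = inStream;
    }
    @Override
    public int read() throws IOException {
        return stream.read();
    }
    // Delegate bulk reads too; the inherited version would call read() once per byte
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return stream.read(b, off, len);
    }
    @Override
    public void close()
    {
        // ignore, so the third-party library cannot close the underlying stream
    }
    public void reallyClose() throws IOException
    {
        stream.close();
    }
}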
I would call getNextEntry() on the ZipInputStream until it is at the entry you want (use ZipEntry.getName() etc.). Calling getNextEntry() will advance the "cursor" to the beginning of the entry that it returns. Then, use ZipEntry.getSize() to determine how many bytes you should read using zipInputStream.read().
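A rough sketch of that approach, with made-up names (ZipStreamEntryReader, readEntry). One deviation from the description above: instead of trusting ZipEntry.getSize(), which can be -1 when the sizes are stored after the entry data, it reads until read() returns -1, which ZipInputStream does at the end of the current entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipStreamEntryReader {

    /**
     * Advances the stream to the named entry and returns its decompressed bytes,
     * or null if no such entry exists. The stream is left open for the caller.
     */
    static byte[] readEntry(ZipInputStream zipStream, String entryName) throws IOException {
        ZipEntry entry;
        while ((entry = zipStream.getNextEntry()) != null) {
            if (!entry.getName().equals(entryName)) {
                continue;
            }
            // Read until -1, which marks the end of the current entry.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = zipStream.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
        return null;
    }
}
```

The returned byte array can be wrapped in a ByteArrayInputStream and handed to the third-party library, which may then close it without affecting the ZipInputStream.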
It is unclear how you got the zipStream. It should work when you get it like this:
zipStream = zipFile.getInputStream(zipEntry)
If you are obtaining the ZipInputStream from a ZipFile, you can get one stream for the third-party library, let it use it, and obtain another input stream using the code above.
Remember, an InputStream is a cursor. If you have the entire data (like a ZipFile), you can ask for N cursors over it.
A different case is if you only have a "GZip" InputStream, i.e. only a zipped byte stream. In that case your ByteArrayOutputStream buffer makes perfect sense.
Please try the code below:
private static byte[] getZipArchiveContent(File zipName) throws WorkflowServiceBusinessException {
    BufferedInputStream buffer = null;
    FileInputStream fileStream = null;
    ByteArrayOutputStream byteOut = null;
    byte data[] = new byte[BUFFER];
    try {
        try {
            fileStream = new FileInputStream(zipName);
            buffer = new BufferedInputStream(fileStream);
            byteOut = new ByteArrayOutputStream();
            int count;
            while((count = buffer.read(data, 0, BUFFER)) != -1) {
                byteOut.write(data, 0, count);
            }
        } catch(Exception e) {
            throw new WorkflowServiceBusinessException(e.getMessage(), e);
        } finally {
            if(null != fileStream) {
                fileStream.close();
            }
            if(null != buffer) {
                buffer.close();
            }
            if(null != byteOut) {
                byteOut.close();
            }
        }
    } catch(Exception e) {
        throw new WorkflowServiceBusinessException(e.getMessage(), e);
    }
    return byteOut.toByteArray();
}
Check if the input stream is positioned at the beginning.
Otherwise, regarding the implementation: I do not think that you need to write to the result stream while you are reading, unless you process this exact stream in another thread.
Just create a byte array, read the input stream, then create the output stream.
