Memory-efficient Java library to read Excel files? [closed]


Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 15 days ago.
Is there a memory-efficient Java library to read large Microsoft Excel files (both .xls and .xlsx)? I have very limited experience with Apache POI, and it seemed to be a huge memory hog from what I recall (though perhaps this was just for writing and not for reading). Is there something better? Or am I misremembering and/or misusing POI?
It would be important for it to have a "friendly" open-source license as well.

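Apache's POI library has an event-based API that has a smaller memory footprint. Unfortunately, it only works with HSSF (Horrible Spreadsheet Format, for .xls files) and not XSSF (XML Spreadsheet Format, for OOXML .xlsx files). (Later POI versions do offer a SAX-based way to read .xlsx through XSSFReader, and SXSSF for streaming writes.)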

The Excel file formats are (both) huge and extremely complicated, and anything that reads all of their possible contents is going to be equally huge and complicated. Remember they can contain ranges, macros, links, embedded stuff etc.
However, if you are reading something simple like a grid of numbers, I recommend first converting the spreadsheet to something simpler, such as CSV, and then reading that format.
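If the data really is just a grid of numbers, reading the CSV export line by line keeps memory proportional to a single row rather than the whole sheet. A minimal JDK-only sketch; it assumes plain comma-separated numeric cells with no quoting or embedded commas (CSV exported from Excel can contain both, and may use ';' as the separator in some locales), and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class CsvGridSum {
    // Streams a CSV grid of numbers one line at a time and returns the total.
    // Assumption: plain comma-separated numeric cells (no quotes, no embedded commas).
    static double sumGrid(Reader source) throws IOException {
        double total = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue; // skip empty lines
                for (String cell : line.split(",")) {
                    String trimmed = cell.trim();
                    if (!trimmed.isEmpty()) {
                        total += Double.parseDouble(trimmed);
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String csv = "1,2,3\n4.5,5,6\n";
        System.out.println(sumGrid(new StringReader(csv))); // prints 21.5
    }
}
```

For a real file, pass a FileReader (or Files.newBufferedReader) instead of the StringReader; only one line is held in memory at a time.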

Take a look at JExcel:
http://jexcelapi.sourceforge.net/
I can't speak to the memory footprint, but obviously with large spreadsheets you're going to consume lots of memory for processing.
Note that JExcelApi itself only reads the older .xls format; for .xlsx, see:
Read XLSX file in Java

I cannot answer your question directly, as I'm not using Java; however I can share a similar experience in Perl that may be partially relevant.
The OOXML format is indeed very large and complex, so any software that aims to cover the full specification is likely to be quite costly in terms of resources. In Perl, the best-known module for reading .xlsx files is https://metacpan.org/pod/Spreadsheet::ParseXLSX, which does the job well for small and medium files; however, it is far too slow on large amounts of data. So I ended up writing another module, https://metacpan.org/pod/Excel::ValueReader::XLSX, with far fewer features but optimized for fast parsing of large files.
The moral is: there is no one-size-fits-all solution. If you are willing to sacrifice some features for better speed or lower memory consumption, you might find other libraries. In Java, https://github.com/dhatim/fastexcel could perhaps be a good candidate (judging only from its documentation).
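The same trade-off can be sketched in Java with nothing but the JDK: an .xlsx file is a ZIP archive of XML parts, so a reader that only wants cell values can stream them with StAX instead of building a full workbook model. This is a minimal sketch under stated assumptions, not fastexcel's or POI's API: it reads one worksheet by its part name (xl/worksheets/sheet1.xml is common but not guaranteed; a complete reader would resolve sheet names through xl/workbook.xml and its relationships), resolves shared strings (t="s"), and ignores inline strings, styles and date formatting. The shared-string table is still loaded into memory, which is usually far smaller than a full object model. The class and record names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class XlsxValueReader {
    record Cell(String ref, String value) {}

    private static final XMLInputFactory FACTORY = XMLInputFactory.newFactory();
    static {
        FACTORY.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTDs or external entities
    }

    // Loads the shared-string table that cells with t="s" refer to by index.
    // Simplification: phonetic runs (<rPh>) would be concatenated into the text too.
    static List<String> sharedStrings(ZipFile zip) throws IOException, XMLStreamException {
        List<String> strings = new ArrayList<>();
        ZipEntry entry = zip.getEntry("xl/sharedStrings.xml");
        if (entry == null) return strings; // workbook without shared strings
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader xml = FACTORY.createXMLStreamReader(in);
            StringBuilder current = null;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (xml.getLocalName().equals("si")) {
                        current = new StringBuilder();
                    } else if (xml.getLocalName().equals("t") && current != null) {
                        current.append(xml.getElementText()); // rich text has several <t> per <si>
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && xml.getLocalName().equals("si")) {
                    strings.add(current.toString());
                    current = null;
                }
            }
            xml.close();
        }
        return strings;
    }

    // Streams each cell value of one worksheet to onCell; apart from the shared
    // strings, memory use does not grow with the size of the sheet.
    static void readSheet(ZipFile zip, String sheetPart, Consumer<Cell> onCell)
            throws IOException, XMLStreamException {
        List<String> shared = sharedStrings(zip);
        ZipEntry entry = zip.getEntry(sheetPart);
        if (entry == null) throw new IOException("no such part: " + sheetPart);
        try (InputStream in = zip.getInputStream(entry)) {
            XMLStreamReader xml = FACTORY.createXMLStreamReader(in);
            String ref = null;
            String type = null;
            while (xml.hasNext()) {
                if (xml.next() != XMLStreamConstants.START_ELEMENT) continue;
                String name = xml.getLocalName();
                if (name.equals("c")) {          // cell: remember its address and type
                    ref = xml.getAttributeValue(null, "r");
                    type = xml.getAttributeValue(null, "t");
                } else if (name.equals("v")) {   // cell value (or cached formula result)
                    String raw = xml.getElementText();
                    String value = "s".equals(type) ? shared.get(Integer.parseInt(raw.trim())) : raw;
                    onCell.accept(new Cell(ref, value));
                }
            }
            xml.close();
        }
    }

    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile(args[0])) {
            readSheet(zip, "xl/worksheets/sheet1.xml", c -> System.out.println(c.ref() + " = " + c.value()));
        }
    }
}
```

The callback design means the caller decides what to keep; summing a column or writing rows to a database never needs the whole sheet in memory.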

Related

Any open-source API to convert files to PDF in Java? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I need to convert the following file formats to PDF:
TIF,TIFF,TXT,JPG,JPEG,BMP,DOC,DOCX,XLS,XLSX,PPT,PPTX,GIF,PDF
Is there any open-source API that can convert these to PDF? I tried Apache POI, but it does not look sufficient. Please let me know if any open-source API is available.

Creating a PDF that contains nothing but an image is quite easy using the iText library; its web site has an example that shows how to do that.
Converting Excel files is not hard; the Apache POI library can be used for reading the Excel file, and then again the iText library can be used for creating PDFs that contain tables.
Word can be dealt with in a similar manner (POI also supports it), but it'll be quite a bit trickier, especially if the file contains tables and images, since the POI API for handling DOC/DOCX isn't as advanced as the one handling XLS/XLSX, and of course Word files have a less regular structure than Excel files.
JAI won't be of any help with this.
There are commercial packages available that can be used from Java applications; you may want to investigate those before embarking on writing your own, especially if you need to deal with complex documents - writing your own converter that handles those and generates good quality output could easily take a couple of weeks (or a month) of your time.

Library for analyzing xbrl files in java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed yesterday.
I'm trying to figure out how to read XBRL files, analyze them, and make use of the data (e.g. for calculating key figures) in Java.
I know how to read XBRL files as XML and structure them as JSON nodes, but I have concluded that it's much more complicated to actually analyze them and use the data. I figured out that tags and attributes such as "context id", "period" and "dimension" determine how the data is wired together.
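To illustrate that wiring: in an XBRL instance, each fact element carries a contextRef attribute pointing to an xbrli:context, whose xbrli:period holds either an instant or a startDate/endDate pair. A minimal JDK-only DOM sketch that resolves facts to their periods; it deliberately ignores dimensions (segment/scenario), units, decimals, tuples and the taxonomy's linkbases, which is where the real complexity lies and why an XBRL processor is worth using. The class and record names are hypothetical, and the sample is a trimmed instance without a schemaRef or unit definitions, so it is not a valid filing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XbrlFacts {
    static final String XBRLI = "http://www.xbrl.org/2003/instance";

    record Fact(String concept, String period, String value) {}

    static List<Fact> read(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required to match xbrli:* elements by namespace
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true); // no DTDs/XXE
        Document doc = dbf.newDocumentBuilder().parse(in);

        // Step 1: map every context id to a readable period.
        Map<String, String> periods = new HashMap<>();
        NodeList contexts = doc.getElementsByTagNameNS(XBRLI, "context");
        for (int i = 0; i < contexts.getLength(); i++) {
            Element ctx = (Element) contexts.item(i);
            periods.put(ctx.getAttribute("id"), describePeriod(ctx));
        }

        // Step 2: top-level elements carrying contextRef are facts; attach their period.
        List<Fact> facts = new ArrayList<>();
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element el && el.hasAttribute("contextRef")) {
                String period = periods.get(el.getAttribute("contextRef"));
                facts.add(new Fact(el.getLocalName(), period, el.getTextContent().trim()));
            }
        }
        return facts;
    }

    // A context's period is either an <instant> or a <startDate>/<endDate> pair.
    static String describePeriod(Element ctx) {
        String instant = firstText(ctx, "instant");
        return instant != null ? instant : firstText(ctx, "startDate") + ".." + firstText(ctx, "endDate");
    }

    static String firstText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS(XBRLI, localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String sample = """
            <xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:ex="urn:example">
              <xbrli:context id="FY2023"><xbrli:period>
                <xbrli:startDate>2023-01-01</xbrli:startDate><xbrli:endDate>2023-12-31</xbrli:endDate>
              </xbrli:period></xbrli:context>
              <xbrli:context id="END2023"><xbrli:period>
                <xbrli:instant>2023-12-31</xbrli:instant>
              </xbrli:period></xbrli:context>
              <ex:Revenue contextRef="FY2023">1000</ex:Revenue>
              <ex:Assets contextRef="END2023">5000</ex:Assets>
            </xbrli:xbrl>
            """;
        for (Fact f : read(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)))) {
            System.out.println(f);
        }
    }
}
```

Even this toy shows why computing key figures is hard: two facts with the same concept can differ only by context (period or dimensions), so every calculation has to pick the right context first.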
Now, I'm not going to implement my own xbrl processor from scratch, because I simply don't have the time and knowledge to do that.
I'm looking for a Java library, including documentation and/or guides on how to use it, that processes xbrl files and that can be used to analyze and extract data.
I searched the web and read a few articles about how to get started, but I didn't quite find something that seemed very useful.
Any suggestions? I would really appreciate it if someone could point me in the right direction.

Using an existing XBRL processor is a good idea as it saves you the (considerable) efforts of interpreting the XBRL semantics at a raw syntactic level.
Off the top of my head, I know of at least the following products that offer a Java API, in random order. I have no affiliation with either and will refrain from commenting further so as not to start a taste/preference discussion.
Reporting Standard: http://www.reportingstandard.com/index.php/en/
CoreFiling: https://www.corefiling.com/
There are probably many more, possibly also open source. XBRL.org has a much more comprehensive list of vendors here as well as a getting started guide for developers.

I was able to parse XBRL files with the XbrlParser project here.
Credits: https://github.com/marcioalexandre

A Java Excel API for addressing my requirements? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
Problem description: I want to load image pixel data into an Excel sheet.
What I have tried: using Apache POI to write the data to Excel, but I found some limitations in Apache POI (as elaborated below).
I have come to know of some workarounds, but they are tedious for the programmer, and I am not really willing to go through that for such a trivial-looking task.
Details:
I have been using Apache POI for quite some time, and I have come across a few limitations:
the whole file is held in memory at once, so it can't be used directly for bigger files.
(specific to HSSF):
no more than 256 columns
no more than 4000 cell styles
can't use custom colors directly.
My requirement is to read an image (say, 1024x764) pixel by pixel and write each pixel value into the rows and columns of the Excel sheet, with every distinct pixel value styled differently.
The problems I have faced are:
an OutOfMemoryError while writing to the Excel sheet, because of the many rows/columns and styles
writing logic for reusing styles would slow down the whole program
even if I reuse styles, what do I do about the huge number of rows/columns?
I have come to know that there are workarounds for these problems:
reusing styles
writing logic for efficient memory usage
But I do not intend to take much pain for a job as simple as this, and since these are not directly limitations of Excel (at least not of .xlsx), I am looking for a library that can do it for me.
Can someone please suggest another library that can do this, or suggest some easier workarounds for these problems?
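On the concern that reusing styles would slow the program down: caching one style per color in a HashMap makes each lookup constant-time on average, and the style factory runs only once per distinct color. A JDK-only sketch; the IntFunction passed in is a hypothetical stand-in for whatever creates a cell style in your Excel library, and quantizing to 3 bits per channel (at most 512 styles, under the roughly 4,000-style .xls limit mentioned above) is an arbitrary choice.

```java
import java.awt.image.BufferedImage;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Caches one style object per (quantized) color so styles are shared, not created per cell.
final class StyleCache<S> {
    private final Map<Integer, S> styles = new HashMap<>();
    private final IntFunction<S> createStyle; // hypothetical factory for your library's style type

    StyleCache(IntFunction<S> createStyle) {
        this.createStyle = createStyle;
    }

    // Keeps the top 3 bits of each RGB channel and drops alpha: at most 8*8*8 = 512 colors.
    static int quantize(int argb) {
        return argb & 0xE0E0E0;
    }

    // Average O(1) lookup; the factory runs only the first time a color is seen.
    S styleFor(int argb) {
        return styles.computeIfAbsent(quantize(argb), createStyle::apply);
    }

    int size() {
        return styles.size();
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(1024, 768, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                img.setRGB(x, y, (x * 255 / 1023) << 16 | (y * 255 / 767) << 8); // synthetic gradient
            }
        }
        StyleCache<String> cache = new StyleCache<>(rgb -> "style-" + Integer.toHexString(rgb));
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                cache.styleFor(img.getRGB(x, y)); // real code would apply this style to cell (y, x)
            }
        }
        System.out.println(cache.size() + " distinct styles for " + img.getWidth() * img.getHeight() + " pixels");
    }
}
```

The memory problem for the rows themselves is separate: the cache bounds the number of style objects, not the number of cells held in memory.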

Can someone please suggest a good library to do this? Otherwise I will switch from Java to C#.
In short, nope - the POI libraries are, in my experience, the best ones available for the job. They're not perfect, but I don't know of an alternative that's better. You may want to try checking trunk out and seeing if any of your issues have been resolved there - entirely possible, it's a relatively active project.
The only other thing I'd suggest looking at is the OpenOffice API, but note that requires OO to be installed (or distributed with your app.)
In all honesty, though, POI's strength is its cross-platform nature: it's a pure Java implementation with no native components. If you don't care about this and could therefore go with C# and use the native Office APIs, surely that would be the logical approach? It seems odd to me that you're not doing this already.

JExcelApi
http://jexcelapi.sourceforge.net/
It works in declarative mode, like Adobe LiveCycle and JReport: you create an .xls template file, and in every cell you put a reference to your beans.
When you invoke the engine, you end up with an XLS file.
Sorry for the extreme brevity; I worked with it many years ago and don't remember the details, but the documentation is on the website.
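The declarative approach described here comes down to resolving an expression written in a template cell against a Java bean. A JDK-only sketch of just that step; the ${bean.property} syntax, the resolve method and the Employee bean are hypothetical illustrations, not the actual template language of any library mentioned here. Reading and writing the .xls cells themselves is left to whichever Excel library you use.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CellTemplate {
    // Hypothetical placeholder syntax: ${beanName.propertyName}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\.(\\w+)\\}");

    // Replaces each placeholder in a template cell's text with the value returned by
    // the bean's JavaBeans-style getter (property "name" -> getName()).
    static String resolve(String cellText, Map<String, ?> beans) throws ReflectiveOperationException {
        Matcher m = PLACEHOLDER.matcher(cellText);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object bean = beans.get(m.group(1));
            if (bean == null) throw new IllegalArgumentException("no bean named " + m.group(1));
            String prop = m.group(2);
            String getterName = "get" + Character.toUpperCase(prop.charAt(0)) + prop.substring(1);
            Method getter = bean.getClass().getMethod(getterName);
            Object value = getter.invoke(bean);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Example bean with a public getter.
    public static final class Employee {
        private final String name;
        public Employee(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // In a template engine, this text would come from a cell of the .xls template.
        System.out.println(resolve("Name: ${emp.name}", Map.of("emp", new Employee("Ada"))));
    }
}
```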

Convert PDF to Word in Java [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Is it possible to convert PDF to Word in Java? I'm not talking about parsing a PDF document and then custom render it again to Word. I want a Java library that can directly convert it.

Reading PDF documents is a very involved process and there are no good free libraries for extracting non-text information from PDF documents in Java. Worse yet, PDF documents have a lot of layout information that is hard to reconstruct, for example a table in a Word document becomes some lines and a bunch of pieces of text in PDF.

It is almost impossible to recreate semantic information from an arbitrary PDF. If you have the same tool that wrote it you have somewhat more chance but even so there is much uncertainty. The only thing you can be sure of in a (text) PDF is the position of each character on the page. (Note that some PDFs include bitmaps in which textual information occurs and that has to rely on OCR).
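To see why even plain text is guesswork: if all a PDF reliably gives you is each character's position, lines and word breaks have to be inferred. A JDK-only sketch, assuming the glyphs and their coordinates were already extracted by some PDF library (the Glyph record is hypothetical) and that y grows down the page (PDF user space grows upward, so flip it first). The line-tolerance and space-gap thresholds are arbitrary guesses, which is exactly the problem: multi-column layouts, tables and superscripts defeat them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LineRebuilder {
    // Hypothetical extracted glyph: one character and its origin on the page.
    record Glyph(char ch, double x, double y) {}

    // Glyphs whose y is within lineTolerance of the first glyph of the current line form
    // one line; a horizontal jump larger than spaceGap between origins becomes a space.
    static List<String> rebuild(List<Glyph> glyphs, double lineTolerance, double spaceGap) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble(Glyph::y).thenComparingDouble(Glyph::x));

        List<List<Glyph>> lines = new ArrayList<>();
        for (Glyph g : sorted) {
            List<Glyph> current = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (current == null || Math.abs(g.y() - current.get(0).y()) > lineTolerance) {
                current = new ArrayList<>();
                lines.add(current);
            }
            current.add(g);
        }

        List<String> text = new ArrayList<>();
        for (List<Glyph> line : lines) {
            line.sort(Comparator.comparingDouble(Glyph::x)); // left to right
            StringBuilder sb = new StringBuilder();
            Glyph prev = null;
            for (Glyph g : line) {
                if (prev != null && g.x() - prev.x() > spaceGap) sb.append(' ');
                sb.append(g.ch());
                prev = g;
            }
            text.add(sb.toString());
        }
        return text;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = List.of(
                new Glyph('O', 10, 120), new Glyph('k', 16, 120),
                new Glyph('y', 30, 100.5), new Glyph('o', 36, 100.5), new Glyph('u', 42, 100.5),
                new Glyph('H', 10, 100), new Glyph('i', 16, 100));
        System.out.println(rebuild(glyphs, 2.0, 10.0)); // prints [Hi you, Ok]
    }
}
```

Nothing here knows that two lines form a paragraph or that aligned fragments form a table; that semantic layer is what the research groups mentioned here are trying to recover.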
There are several groups in computer science departments and elsewhere who are spending very significant effort trying to extract semantic information. We collaborate with Penn State, one of the leaders, and they are working on extracting tables. In good cases they get 90%, in bad ones 50%.
So the answer is formally that you cannot, but you may occasionally be fortunate. (We do a lot of this for chemistry and count ourselves lucky if we get 50% on a regular basis).

You can try to do it with the iText library: read the PDF and then write it as RTF.
This is not that simple, though, as you have to preserve the different styles that the PDF has.
You can use some external tools:
Install a free program like "Free PDF to Doc" and execute it from your Java program.
This works fine in most cases.
Use the Acrobat Pro SDK from your Java code.
Best of luck
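If you go the external-tool route, the Java side is just launching a process and waiting for it. A JDK-only sketch using ProcessBuilder; the pdf2doc command in main is a hypothetical placeholder, since the actual executable name and arguments depend entirely on the tool you install. Output is inherited so the tool's messages appear on the console, and a timeout stops a hung conversion.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalConverter {
    // Runs a command-line converter in workDir and returns its exit code.
    // The process is killed if it has not finished within timeoutSeconds.
    static int run(List<String> command, File workDir, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .directory(workDir)
                .inheritIO() // tool output goes to this JVM's console, so no pipe can fill up
                .start();
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            p.destroyForcibly();
            throw new IOException("converter timed out: " + command);
        }
        return p.exitValue();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical invocation: "pdf2doc" and its arguments are placeholders, not a real tool's CLI.
        int exit = run(List.of("pdf2doc", "input.pdf", "output.doc"), new File("."), 120);
        System.out.println("converter exited with " + exit);
    }
}
```

Passing the command as a list (rather than one string) avoids shell quoting problems with file names that contain spaces.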

Java (ME) library for fixed-length record files [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
I am looking for a library that can run on Java ME (Foundation Profile 1.1, CDC) and allows me to basically do something along the lines of
FILE OF type;
in Pascal.
Background: I need a largish (approx. 100 MB) set of around 500,000 records for quick lookups by a known index value. Do I really have to write this myself? Databases like Derby are way too big and bring lots of features (stored procedures, anyone?) that I do not need.
Ideally I would just like to define a class with a few fields based on primitive types and Strings as a value-holder object, and persist these in a file that I could, should the need arise, recover manually. That's why I am not too keen on serialization: in the past I have dealt with several corrupted binary data files that could not be recovered at all.
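A Pascal-style FILE OF record boils down to fixed-size records and a seek to index * recordSize. A sketch using java.io.RandomAccessFile, which as far as I know is part of CDC; each record is a fixed-width ASCII text line, so the file stays readable and repairable in a plain text editor, which addresses the concern about unrecoverable binary files. The 10-character id / 20-character name layout and the class name are hypothetical, and the code sticks to Java 1.4-era syntax (no generics, no try-with-resources) on the assumption that the target VM is at that level.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class FixedRecordFile {
    static final int ID_WIDTH = 10;   // hypothetical layout: 10-character id
    static final int NAME_WIDTH = 20; // followed by a 20-character name
    static final int RECORD_SIZE = ID_WIDTH + NAME_WIDTH + 1; // +1 for the '\n' terminator

    private final RandomAccessFile file;

    FixedRecordFile(File path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    // Pads with spaces or truncates so every field has exactly `width` characters.
    static String fit(String s, int width) {
        StringBuffer sb = new StringBuffer(s.length() > width ? s.substring(0, width) : s);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // Overwrites record `index`, or appends when index == count(); writing further out
    // leaves a gap of unspecified bytes. US-ASCII keeps one byte per character
    // (non-ASCII characters become '?'), so every record has the same length.
    void write(int index, long id, String name) throws IOException {
        String line = fit(String.valueOf(id), ID_WIDTH) + fit(name, NAME_WIDTH) + "\n";
        file.seek((long) index * RECORD_SIZE);
        file.write(line.getBytes("US-ASCII"));
    }

    // Direct lookup: one seek and one read, independent of how many records exist.
    String[] read(int index) throws IOException {
        byte[] buf = new byte[RECORD_SIZE];
        file.seek((long) index * RECORD_SIZE);
        file.readFully(buf);
        String line = new String(buf, "US-ASCII");
        return new String[] {
            line.substring(0, ID_WIDTH).trim(),
            line.substring(ID_WIDTH, ID_WIDTH + NAME_WIDTH).trim()
        };
    }

    int count() throws IOException {
        return (int) (file.length() / RECORD_SIZE);
    }

    void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        FixedRecordFile records = new FixedRecordFile(new File("records.txt"));
        try {
            records.write(0, 1001, "first");
            records.write(1, 1002, "second");
            String[] rec = records.read(1);
            System.out.println(rec[0] + " " + rec[1]);
        } finally {
            records.close();
        }
    }
}
```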

Your biggest problem here is establishing a correspondence between field names and columns in the file, as you really shouldn't assume that the class layout matches the field ordering in the source file.
If the file were to contain a header row then it's a simple matter of using reflection/introspection and shouldn't take more than a day to implement yourself.
Alternatively, you'll have to use an annotation of some sort to specify, for each field, where it appears in the file.
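A sketch of the annotation approach: each field declares its position in a fixed-width line, so the class's field order no longer has to match the file. The @Column annotation, the Station class and the set of supported field types are hypothetical choices for illustration; note that annotations and this kind of reflection-driven mapping need Java 5 or later, so check what your ME VM supports.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column {
    int start();  // 0-based offset of the field in the line
    int length(); // width in characters
}

final class FixedWidthMapper {
    // Creates an instance of `type` and fills every @Column field from its slice of `line`.
    static <T> T parse(String line, Class<T> type) throws ReflectiveOperationException {
        T obj = type.getDeclaredConstructor().newInstance();
        for (Field f : type.getDeclaredFields()) {
            Column col = f.getAnnotation(Column.class);
            if (col == null) continue;
            int end = Math.min(line.length(), col.start() + col.length());
            String raw = line.substring(col.start(), end).trim();
            f.setAccessible(true);
            if (f.getType() == int.class) f.setInt(obj, Integer.parseInt(raw));
            else if (f.getType() == double.class) f.setDouble(obj, Double.parseDouble(raw));
            else if (f.getType() == String.class) f.set(obj, raw);
            else throw new IllegalArgumentException("unsupported field type: " + f);
        }
        return obj;
    }

    // Hypothetical record type: declaration order differs from the column order on purpose.
    static final class Station {
        @Column(start = 10, length = 20) String name;
        @Column(start = 0, length = 10) int id;
        @Column(start = 30, length = 8) double elevation;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        String line = String.format("%10d%-20s%-8s", 4711, "Alpha", "123.5");
        Station s = parse(line, Station.class);
        System.out.println(s.id + " " + s.name + " " + s.elevation); // prints 4711 Alpha 123.5
    }
}
```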
Have you instead considered alternative text serialization methods, such as CSV, JSON or XML using XStream? These avoid the risks of binary corruption and would get you up and running faster, but might also impose a higher memory footprint which could be an issue as you're targeting a mobile device.

After looking around for quite some time, I finally came across xBaseJ on SourceForge. It relies on java.nio, which is normally not included in the Java ME CDC profile, but we had a contractor port the relevant parts to the mobile J9 VM. Armed with this, we are now building our application on top of dBase III-compatible files. Apart from being reasonably fast, even on the mobile platform, this gives us access to a plethora of tools that can handle this format, without having to teach non-technical folks to use a JDBC-based DB admin tool they do not feel comfortable with.
There has just been a recent release of a whole eBook, called "The Minimum You Need To Know About xBaseJ", which is available for free from the project's website, too.
