Best way for CSV reading in Apache Utils - java

What is the best option for reading CSV files? I know there are APIs such as Super CSV and Java CSV, but my boss asked me to use only an API provided by the Apache organization. Can anyone please help me?

You said in your original question:
only provided by the Apache organization
The Apache Commons CSV Project is currently in the Commons Sandbox.
From the Commons CSV Page:
There are currently no official downloads, and will not be until CSV moves out of the Sandbox, but a nightly build is available from http://people.apache.org/builds/commons/nightly/commons-csv/
One caution I would mention is that the latest nightly build I could find was dated 30-Jul-2007.
99% of the time I would advocate not re-inventing the wheel. For example, using Apache Commons Lang StringUtils, instead of rolling your own String Utility classes to check for blank or empty Strings.
However, due to the fact that:
Apache Commons CSV is in the Sandbox
Could not find any nightly builds more recent than July 2007
Relative ease of writing your own CSV parser
This is a scenario where I would recommend writing your own.
If you just need to read and parse a Comma-separated Values file, you should be able to accomplish this without too much code or difficulty using core Java IO and Util classes.
You should just be able to wrap a java.io.FileReader in a java.io.BufferedReader, read line by line.
For each line, use a java.util.StringTokenizer to split on the commas.
This is the logic you would need; obviously you would also need to take care of closing the readers, exception handling, etc.
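A minimal sketch of the approach described above (BufferedReader line by line, StringTokenizer splitting on commas). The class name and sample data are made up for illustration; for a real file you would pass a java.io.FileReader instead of the StringReader used here. Note the caveat in the comment: StringTokenizer silently skips empty fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class CsvSketch {
    // Reads CSV rows from any Reader (wrap a java.io.FileReader for real files).
    static List<List<String>> read(Reader source) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                List<String> fields = new ArrayList<>();
                // Caveat: StringTokenizer skips empty fields, so "a,,b"
                // yields only two tokens; use String.split(",", -1) if
                // empty fields must be preserved.
                StringTokenizer tokenizer = new StringTokenizer(line, ",");
                while (tokenizer.hasMoreTokens()) {
                    fields.add(tokenizer.nextToken().trim());
                }
                rows.add(fields);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = read(new StringReader("id,name\n1,Alice\n2,Bob"));
        System.out.println(rows);  // → [[id, name], [1, Alice], [2, Bob]]
    }
}
```

This handles plain values only; quoted fields containing commas would need a real parser.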


Efficient LZ4 multiple file compression using java

I took Adrien Grand's Java repository providing JNI bindings to the original LZ4 native code.
I want to compress multiple files under a given input directory, but LZ4 doesn't support multi-file compression the way the Java zip package does, so I tried another approach: tar all my input files and pipe the result as input to the LZ4 compressor. I used the Jtar Java package for taring all my input files. Is there any better way than this?
I came across many sample codes to compress some strings and how to correctly implement the LZ4 compressor and decompressor. Now I wanted to know how to actually implement it for multiple files? I also wanted to clarify whether I'm going in the correct direction.
After taring all the files, according to the sample code's usage explanation I have to convert my tared file to a byte array to provide it to the compressor module. I used the Apache Commons IOUtils package for this purpose. But considering that I have many input files, which results in a tar of huge size, always converting it to a byte array seems inefficient to me. I first wanted to know whether this is efficient or not, or whether there is any better way of using the LZ4 package.
Another problem I came across was the end result. After compressing the tared files I get an end result like a MyResult.lz4 file as output, but I was not able to decompress it using the archive manager (I'm using Ubuntu) as it doesn't support this format. I'm also not clear about which archive and compression formats I should use here, and what format the end result should be in.
Speaking from a user's point of view: consider a case where I'm generating a backup for the user. If I provide him/her with a traditional .zip, .gz or other well-known format, the user would be able to decompress it himself. Just because I know LZ4 doesn't mean I should expect the user to know the format too, right? He may even be baffled on seeing such a format. And a conversion from .lz4 to .zip format also seems meaningless.
I already see the taring of all my input files as a time-consuming step, so I wanted to know how much it affects performance. With the Java zip package, compressing multiple input files didn't seem to be a problem at all. Besides LZ4 I also came across Apache Commons Compress and TrueZIP, and several Stack Overflow links about them which helped me learn a lot. As of now I really want to use LZ4 for compression, especially for its performance, but I ran into these hurdles. Can anyone with good knowledge of the LZ4 package provide solutions to my queries and problems, along with a simple implementation? Thanks.
Time I calculated for an input consisting of many files,
Time taken for taring : 4704 ms
Time taken for converting file to byte array : 7 ms
Time Taken for compression : 33 ms
Some facts:
LZ4 is no different here than GZIP: it is a single-concern project, dealing with compression. It does not deal with archive structure. This is intentional.
Adrien Grand's LZ4 lib produces output incompatible with the command-line LZ4 utility. This is also intentional.
Your approach with tar seems OK, because that's how it's done with GZIP.
Ideally you should make the tar code produce a stream which is immediately compressed instead of first being entirely stored in RAM. This is what is achieved at the command line using Unix pipes.
I had the same problem. The current release of LZ4 for Java is incompatible with the later-developed LZ4 standard for handling streams; however, there is a patch in the project's repo that supports the standard for compressing/decompressing streams, and I can confirm it is compatible with the command-line tool. You can find it here: https://github.com/jpountz/lz4-java/pull/61
In Java you can use that together with TarArchiveInputStream from the Apache Commons compress.
If you want an example, the code I use is in the Maven artifact io.github.htools 0.27-SNAPSHOT (or on GitHub); the classes io.github.htools.io.compressed.TarLz4FileWriter and (the now obsolete) io.github.htools.io.compressed.TarLz4File show how it works. In HTools, tar and lz4 are used automatically through ArchiveFile.getReader(String filename) and ArchiveFileWriter(String filename, int compressionlevel), provided your filename ends with .tar.lz4
You can chain I/O streams together, so using something like TarArchiveOutputStream from Apache Commons Compress and LZ4FrameOutputStream from lz4-java:
try (LZ4FrameOutputStream outputStream =
         new LZ4FrameOutputStream(new FileOutputStream("path/to/myfile.tar.lz4"));
     TarArchiveOutputStream taos = new TarArchiveOutputStream(outputStream)) {
    ... // write TarArchiveEntry objects into taos here
}
Consolidating the bytes into a byte array will cause a bottleneck because you are holding the entire stream in memory, which can easily run into OutOfMemory problems with large streams. Instead, you'll want to pipeline the bytes through all the I/O streams like above.
I created a Java library that does this for you https://github.com/spoorn/tar-lz4-java.
If you want to implement it yourself, here's a technical doc that includes details on how to LZ4 compress a directory using TarArchive from Apache Commons and lz4-java: https://github.com/spoorn/tar-lz4-java/blob/main/SUMMARY.md#lz4
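Putting the pieces from the answers above together, here is a hedged sketch of the tar-then-LZ4 pipeline. It assumes commons-compress and lz4-java (1.4+, which provides LZ4FrameOutputStream) are on the classpath; the class name, directory walk, and entry-naming choices are my own illustration, not the libraries' prescribed usage.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import net.jpountz.lz4.LZ4FrameOutputStream;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;

public class TarLz4Sketch {
    // Streams every regular file under inputDir into a .tar.lz4 archive
    // without ever buffering the whole tar in memory: each file's bytes
    // flow straight through the tar stream into the LZ4 frame compressor.
    public static void compressDirectory(Path inputDir, Path target) throws IOException {
        try (TarArchiveOutputStream taos = new TarArchiveOutputStream(
                new LZ4FrameOutputStream(new FileOutputStream(target.toFile())))) {
            taos.setLongFileMode(TarArchiveOutputStream.LONGFILE_POSIX);
            List<Path> regularFiles;
            try (Stream<Path> walk = Files.walk(inputDir)) {
                regularFiles = walk.filter(Files::isRegularFile)
                                   .collect(Collectors.toList());
            }
            for (Path file : regularFiles) {
                // Store paths relative to the input directory inside the tar.
                TarArchiveEntry entry = new TarArchiveEntry(
                        file.toFile(), inputDir.relativize(file).toString());
                taos.putArchiveEntry(entry);
                Files.copy(file, taos);   // bytes go straight into the compressor
                taos.closeArchiveEntry();
            }
        }
    }
}
```

Because lz4-java's frame stream follows the LZ4 frame standard, the resulting file should be readable by the command-line lz4 tool (lz4 -d then untar), addressing the "user can't open it" concern in the question.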

Is it more efficient to read from an Excel file or a CSV file?

I need to write a quick program (using Java, since it's the only language I am really comfortable with) that takes an Excel file (or CSV) and parses through the data, adding information that might be missing.
The problem I'm having is that I can't decide how to start; it feels like manipulating an Excel file would be easier, but reading through a CSV file would be really simple.
Any insight on problems that might come up, or maybe a third solution that I'm ignoring?
The excel document is basically just a mini audited database of printer IPs, names, manufacturers, and locations.
Edit: The general consensus seems to be that CSV is a lot easier to manipulate, and since I want to write a quick script that can just be run, I think downloading an extra library for Excel manipulation would be a hassle.
Going to start writing the code today or Monday; I will probably have more questions later in the week. Thank you everyone for your help! Venturing into new territory with my first job.
If reading a CSV is an option in your situation, I would definitely go for it, because you can do it in a way that is both system-independent and portable without using external libraries.
As far as the efficiency goes, the timing is very likely going to be I/O dominated, so the smaller the file - the faster you are going to read it in.
Adding the missing information and writing the file back may be a bit tricky because of the need to properly handle quotes, but it is still a lot simpler than accessing an Excel file through a special-purpose library.
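To make the quoting issue above concrete, here is a small sketch of writing CSV fields safely. The quoting rules follow the common RFC 4180 conventions (wrap a field in double quotes when it contains a comma, quote, or newline, and double any embedded quotes); the class and sample printer data are made up for illustration.

```java
public class CsvQuote {
    // Quotes a single field per RFC 4180 conventions.
    static String quote(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Joins fields into one CSV line, quoting only where needed.
    static String toLine(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // → HP LaserJet,"Room 4, Floor 2",10.0.0.5
        System.out.println(toLine("HP LaserJet", "Room 4, Floor 2", "10.0.0.5"));
    }
}
```

Reading such files back correctly requires the mirror logic (un-doubling quotes, not splitting on commas inside quotes), which is where a library starts to pay off.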
CSV will be easier since you do not need any additional libraries like jxl. Refer to this read-and-write CSV tutorial.
500x10 is really quite small, so it is difficult to imagine a lot of code would be required. If you are sticking with Excel, I would expect its built-in features (Find/Replace, Sort, Filter, PivotTable, Copy Down, etc.) to be sufficient.

How to generate a report to Excel in Java

How can I generate a report to Excel in Java? Is there any link describing this topic? I am using Spring 3. Please suggest some examples.
You will likely need to use some 3rd party libraries. One such option is Java Excel API library as illustrated in this post by Lars Vogel.
You can check out the sample here
Disclaimer: I haven't used it before, but the article seems pretty descriptive. Hope it helps.
I've used Apache POI. It seems to be good enough for Excel file generation (though its Word document generator is not mature enough, by the way). I'm not sure it's very easy but it's quite flexible.
There are many libraries for generating reports. I have worked with JasperReports and Apache POI.
I think POI is a good choice for you. It's very easy.
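To illustrate the POI suggestion above, here is a minimal sketch of writing a two-row .xlsx report. It assumes poi-ooxml is on the classpath; the sheet name, columns, and output filename are illustrative, not anything prescribed by POI.

```java
import java.io.FileOutputStream;
import java.io.IOException;

import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class ExcelReportSketch {
    public static void main(String[] args) throws IOException {
        try (Workbook workbook = new XSSFWorkbook()) {
            Sheet sheet = workbook.createSheet("Report");

            // Header row.
            Row header = sheet.createRow(0);
            header.createCell(0).setCellValue("Printer");
            header.createCell(1).setCellValue("IP");

            // One data row; in a real report you would loop over your data.
            Row row = sheet.createRow(1);
            row.createCell(0).setCellValue("HP LaserJet");
            row.createCell(1).setCellValue("10.0.0.5");

            try (FileOutputStream out = new FileOutputStream("report.xlsx")) {
                workbook.write(out);
            }
        }
    }
}
```

In a Spring 3 controller you would typically write the workbook to the HttpServletResponse output stream instead of a file, with a Content-Type of application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.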
Jxls is a useful option. It integrates with Apache POI to allow you to have report templates that your Java code fills in with data.
See http://jxls.sourceforge.net/
I used previous versions, but it looks like it has come quite a ways since then.
I use the simreport library. It is simple enough for making Excel reports in Java: it builds the output based on a template of the report you want, so it is very easy to understand, edit, and customize. It took me only 15 minutes to make my first one. Try it; it's at http://www.jsimreport.com

Reliable Excel API for handling complex Excel report (parsing & writing ~10,000 lines) that works with Java

I know some similar questions already exist, but I haven't found any satisfying answer yet.
I found several libraries such as Apache POI and JExcelAPI; however, as I don't have any previous experience with any Java Excel API, perhaps some of you who have used them before can enlighten me regarding the advantages and disadvantages of each API. My requirements are reliability and ease of use, because I have to parse and write numerous Excel reports with ~10,000 lines in each file.
I'm also considering JXLS, which can parse and write documents using a template to minimize coding effort, but based on my tests we have to hard-code the startRow and endRow when parsing (and the startRow and endRow of my documents differ from file to file).
Actually, even the old versions of POI will support 10,000 rows - the limitation was either ~32000 or ~64000 rows.
But the latest POI supports the XML file formats for 2007, and therefore I'm sure memory will be your only limitation.
I use POI in a corporate application, and I've never had a problem with it.
Aspose.Cells for Java allows you to create or parse large Excel files in Java applications. The API is simple along with complete documentation and support. A large number of users have already incorporated it in their applications. It is easy to learn and any questions can be answered quickly through our support forum. You may try this and see if it helps in your scenario.
Disclosure: I work as developer evangelist at Aspose.
I've used JExcel with great success, although I can't say that any of those files were on the order of 10,000 rows per file.
I'd wonder if you'd be better off with a relational database with this volume of data. Excel might have been a fine way to start, but maybe it's time to ask yourself if you've outgrown spreadsheets.

Word document creation API in Java

I would like to create a word document using a template, replace some variables (fields) and save it as a new word document.
I was thinking of using Apache POI (http://poi.apache.org/). Is it the best for this purpose? Can you share your impressions of it?
I've worked with POI before and it's certainly able to generate Word documents. But the devil is in the details.
Word has thousands of features: You can put numbered lists starting at #13 with negative indents into two joined cells of a table included in another table that is itself part of a bullet list... you get the idea. When the POI documentation says they are a work in progress, that reflects what will probably be an eternal state of trying to catch up to the (to us, undocumented) specification of Word.
Documents with a reasonably "normal" set of used features are well supported by POI, whose interfaces and methods are reasonable and consistent but sometimes require a bit of work. But as Pascal says, documents with a not too exorbitant set of features are also supported by RTF.
I have almost no experience "doing" RTF but it's probably a bit simpler than working with POI.
If you're working in an environment or for a customer who insists that your produced documents be .DOC rather than .RTF, then POI is pretty much your only choice, unless you can introduce a step where you use a bit of Office automation to convert RTF into DOC.
Update: I've had a couple more ideas in the meantime.
Using POI or creating RTF documents is something that you could do on practically any platform. At my job, all servers doing processing like this happen to be running Linux, for example.
However, in the likely case that your programs will run under Windows, there is another alternative: Jacob http://www.land-of-kain.de/docs/jacob/
Jacob is a COM interface for Java; it essentially allows you to "remote control" Windows programs such as Word and Excel. The document I linked to above is not Jacob's own site but a single page with "cookie cutter" recipes for using Jacob. The project itself is on SourceForge: http://sourceforge.net/projects/jacob-project/ But people claim, and rightly so, that the documentation is a bit lacking.
Jacob has the advantage over all other solutions that you're dealing with the "real" Word and therefore all capabilities of Word are available to you. This would be an alternative if there are detail aspects of your document that just can't be handled with POI or via the RTF format.
This is obviously way too late, but since 2013 there has been a much better, more flexible solution for Word document creation.
http://www.docx4java.org/trac/docx4j
I have had much more luck with docx4j than I ever did with POI.
I'm not sure of the exact status of Word document support in POI, but according to the POI website work is still in progress (I can't say what that means exactly). So, at this time, I would not use POI but would rather try to generate an RTF document. For this, you could:
Use RTFTemplate which is a RTF to RTF Engine that can generate RTF document as the result of the merge of a RTF model and data.
Use iText, which is primarily a PDF generator but can also generate RTF.
Build your own custom solution (but I wouldn't do that).
I'd go for iText.
If you use a template and do not want to create the Word document from scratch, then as far as I know POI is a pretty good solution. You open the template and select the zones you want to replace.
They say POI is still in development, but I've been using it in a production environment and it works pretty well at the moment.
I know this question is a bit old, but I think many people still find this with search engines, so I post another possibility to do what you want right here:
If the one and only goal is to have a Word template and replace some values in it, you might consider saving the Word template as a single XML file (not docx) and then processing it with plain Java, without any framework. If you want to do more (e.g. create lists or tables), you might also consider understanding the XML format and writing your own helpers before loading a framework like POI.
Here is an example on how to do that:
http://dev-notes.com/code.php?q=10
This is the fast version, if you want a nice version, you could try using an XML processor.
PS: Users might notice that the file extension is not .doc but .xml, and they may blame you for that, but that's OK... just rename it to .doc; Word will recognize the format and everyone is happy again ;)
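A minimal sketch of the placeholder-replacement idea above, in plain Java with no framework. The ${...} placeholder convention and the template snippet are made up for illustration; the only essential points are the string replacement and minimally escaping values so the XML stays valid.

```java
import java.util.Map;

public class WordXmlTemplate {
    // Replaces ${name}-style placeholders in a Word XML template string.
    static String fill(String templateXml, Map<String, String> values) {
        String result = templateXml;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // Escape the value minimally so it remains valid XML text.
            String escaped = entry.getValue()
                    .replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;");
            result = result.replace("${" + entry.getKey() + "}", escaped);
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "<w:t>Dear ${name}, your order ${order} shipped.</w:t>";
        // → <w:t>Dear Alice, your order A&amp;B-42 shipped.</w:t>
        System.out.println(fill(template, Map.of("name", "Alice", "order", "A&B-42")));
    }
}
```

In practice you would read the saved template file, run it through fill, and write the result out with a .doc extension, as the answer suggests.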
You should look into the Aspose.Words components. They have recently begun providing a Java version of the component.
See the following link: Aspose.Word for Java
This supports Word automation, creation, and advanced features such as mail merge without needing an instance of Microsoft Word on the machine. The real benefit is that you are able to work within the context of an actual Word document rather than having to compromise by creating RTFs, etc.
The Java version is not currently as fully featured as the .NET version, but the main core functionality is there and they are pushing very hard to reach feature parity soon.
Also, if you purchase the Java version you get a year of free upgrades/support as new releases are created.
If you are working with docx documents, docx4j is an option. Like POI, its open source.
I created and use this: http://code.google.com/p/java2word
