Processing password protected zip files using Mapreduce [duplicate] - java

This question already has answers here:
Recommendations on a free library to be used for zipping files [closed]
(9 answers)
Closed 8 years ago.
I want to process password-protected zip files using Hadoop MapReduce. I was able to process unprotected zip files using ZipFileInputFormat, but it doesn't support password-protected zips.
Is there a Java library that provides stream access to password-protected zip files, or that can extract a zip file if I make its byte content available? Thanks in advance.

Assuming you can find a Java library that can read password-protected zip files (see this blog article for an example), you should be able to modify the current ZipFileInputFormat to use that library. You then just need to supply the password for each zip file via a configuration option (hopefully you don't have too many files, or all the files are protected with the same password).
It should be easy enough. Give it a try, and if you run into problems, post another question or ask the author of the input format whether he can roll the update for you (https://github.com/cotdp/com-cotdp-hadoop is one possible implementation of ZipFileInputFormat I found via Google).

Related

How do you read a .pdf file the same way you read a .txt file using Scanner? This is in Java for Android

I am trying to read a .pdf file the same way you read a .txt file. I need to parse the PDF file to obtain some information.
Referring to this Stack Exchange article
(your question is probably a duplicate):
How to read PDF files using Java?
Use a library like Apache's PDFBox to do it.
Please do some basic research before you ask a question next time; I found tons of answers in about three minutes.
Cheers!

Create/update JIRA issues from Excel file [duplicate]

This question already has answers here:
Add Attachment to Jira via REST API
(2 answers)
Closed 7 years ago.
Is it possible to import an Excel/CSV file (which contains details of JIRA issues such as Title, Description, Type, Priority) into JIRA using any open source tool or API in Java?
It should log hours and update comments as well.
EDIT: I don't want to attach the Excel file to an issue; rather, I want to create/update issues in JIRA from the contents of an uploaded Excel/CSV file.
I have not seen an open source tool that does exactly that, because JIRA has built-in import functionality for CSV/Excel.
The one open source tool worth mentioning is the REST Java Client for JIRA, which communicates with your specific JIRA instance and could be used in conjunction with Apache POI to do what you require.
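As a rough illustration of that idea, here is a minimal sketch that uses only the JDK instead of the REST Java Client and Apache POI. It turns one CSV row into the JSON body for JIRA's "create issue" REST endpoint (POST /rest/api/2/issue) and prepares the HTTP request without sending it. The class name, the column order (Title, Description, Type, Priority), the naive comma split, the basic-auth scheme, and the example URL are all assumptions made for illustration; a real import would need a proper CSV/Excel parser and the authentication your instance requires.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of any JIRA library.
final class JiraCsvSketch {

    // Minimal JSON string escaping: quotes, backslashes and common whitespace escapes only.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds the "create issue" body from one row: Title,Description,Type,Priority.
    // Naive split: assumes no field contains a comma.
    static String toIssueJson(String projectKey, String csvRow) {
        String[] f = csvRow.split(",", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 columns, got " + f.length);
        }
        return "{\"fields\":{"
                + "\"project\":{\"key\":\"" + jsonEscape(projectKey) + "\"},"
                + "\"summary\":\"" + jsonEscape(f[0]) + "\","
                + "\"description\":\"" + jsonEscape(f[1]) + "\","
                + "\"issuetype\":{\"name\":\"" + jsonEscape(f[2]) + "\"},"
                + "\"priority\":{\"name\":\"" + jsonEscape(f[3]) + "\"}"
                + "}}";
    }

    // Prepares (but does not send) the POST request; basic auth is an assumption.
    static HttpRequest createIssueRequest(String baseUrl, String user, String secret, String json) {
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + secret).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/api/2/issue"))
                .header("Authorization", "Basic " + auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request can then be sent with java.net.http.HttpClient. Logging hours and adding comments would be further calls against the created issue, which this sketch does not cover.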

Validate file contents based on extension [duplicate]

This question already has answers here:
Validation of files based on their file extensions
(2 answers)
Closed 9 years ago.
I want to validate file contents based on their extension. For example, a user can save a document file (.doc/.docx) with an Excel extension (.xls/.xlsx). Before I read the file contents, I need to validate in Java that the content type matches the extension.
If anyone has an idea about this, please share it.
Projects that detect a file's type already exist. The file command on Linux does this, for example.
A Java project called Tika might be useful to you. It can detect a file's media type from its content, and its command-line application parses the file(s) specified on the command line and writes the extracted text content or metadata to standard output.
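To show what content-based detection means at its simplest, here is a minimal sketch that does not use Tika at all; it only compares a file's leading "magic" bytes with the container format implied by its extension. The class and method names are made up for this example. Note the limitation: .doc and .xls are both OLE2 containers, and .docx and .xlsx are both ZIP containers, so this check cannot tell a Word file from an Excel file of the same generation. A library such as Tika looks inside the container to do that.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical helper: container-level check only, not a full type detector.
final class ExtensionCheckSketch {

    enum Container { ZIP, OLE2, UNKNOWN }

    // True if data begins with the given byte values.
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }

    // Classifies the first bytes of a file by well-known signatures.
    static Container sniff(byte[] head) {
        if (startsWith(head, 'P', 'K', 0x03, 0x04)) return Container.ZIP;
        if (startsWith(head, 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)) return Container.OLE2;
        return Container.UNKNOWN;
    }

    // Container format that the extension implies.
    static Container expectedFor(String fileName) {
        String n = fileName.toLowerCase(Locale.ROOT);
        if (n.endsWith(".docx") || n.endsWith(".xlsx")) return Container.ZIP;
        if (n.endsWith(".doc") || n.endsWith(".xls")) return Container.OLE2;
        return Container.UNKNOWN;
    }

    // Reads the first 8 bytes and compares the sniffed container with the expected one.
    static boolean containerMatchesExtension(Path file) throws IOException {
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(8);
        }
        Container expected = expectedFor(file.getFileName().toString());
        return expected != Container.UNKNOWN && expected == sniff(head);
    }
}
```

This catches a legacy .xls file renamed to .xlsx (or the reverse), but not a .docx renamed to .xlsx.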

Java string data compression [duplicate]

This question already has answers here:
What's a good compression library for Java?
(3 answers)
Closed 9 years ago.
I have a problem. I'm currently developing a video game in Java. At the moment the data files take about 100 MB of space even though the game world is not big. I want to zip those text files and protect them with a password or some kind of encryption, but I can't find any good, free library for that.
Or maybe it's possible to pack the data into some kind of archive without external libraries?
Update
I tried to download Zip4j, but my IDE reports that I need source attachments, and I can't find them on the library's site.
You can compress your game data with the facilities provided by the JDK:
java.util.zip.ZipFile and related classes to handle normal zip files
java.util.zip.GZIPInputStream/GZIPOutputStream to compress data directly (no "files catalog" concept, just a blob).
java.util.zip.DeflaterInputStream/DeflaterOutputStream (like GZIP)
There is also a multitude of third-party compression libraries. I personally like XZ for Java, because it provides excellent compression ratios and is easy to use through the standard stream interface (http://tukaani.org/xz/java.html).
For encryption, there are encrypting streams that are used just like the compression streams. Beware that anybody will be able to extract the "key" from your game files with a little knowledge and patience if they really want to.
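A minimal sketch of the JDK-only route, assuming the game data fits in memory as a byte array (the class name and sample data are made up for illustration). It round-trips a blob through GZIPOutputStream and GZIPInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper showing the stream-wrapping pattern.
final class GzipSketch {

    // Compresses a byte array into a gzip blob.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return buf.toByteArray();
    }

    // Restores the original bytes from a gzip blob.
    static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Repetitive text, as game data files often are, compresses well.
        byte[] raw = "tile=grass;".repeat(10_000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(raw);
        System.out.println(raw.length + " -> " + packed.length + " bytes");
    }
}
```

For files on disk, the same wrapping works around the streams returned by Files.newOutputStream and Files.newInputStream.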
You can use the Apache Commons Compress library to compress files.
http://commons.apache.org/proper/commons-compress/
You can also compress using the zip classes in java.util.zip.
Java provides several classes for compressing data, such as:
GZIPInputStream and GZIPOutputStream
You can also use zip with ZipFile (java.util.zip).
For encryption, you could write your own FilterOutputStream and FilterInputStream that use a Cipher to encrypt and decrypt the data.
All of these features work without external libraries.
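The JDK already ships filter streams of this kind in javax.crypto (CipherOutputStream and CipherInputStream), so here is a sketch that uses those rather than hand-written ones. It compresses first and then encrypts, with an AES-GCM key derived from a password via PBKDF2. The class name, the blob layout (salt, then IV, then ciphertext), and the iteration count are assumptions chosen for illustration, not a vetted format. As noted above, a password shipped inside the game can still be extracted by a determined user.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: gzip, then AES-GCM, all with JDK classes.
final class EncryptedGzipSketch {
    private static final int SALT_LEN = 16;
    private static final int IV_LEN = 12;
    private static final int ITERATIONS = 100_000; // illustrative value

    // Derives a 128-bit AES key from the password and salt.
    private static SecretKey deriveKey(char[] password, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 128);
        byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec).getEncoded();
        return new SecretKeySpec(key, "AES");
    }

    // Output layout (an assumption of this sketch): salt | iv | AES-GCM(gzip(raw)).
    static byte[] pack(byte[] raw, char[] password) {
        try {
            SecureRandom random = new SecureRandom();
            byte[] salt = new byte[SALT_LEN];
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(salt);
            random.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, deriveKey(password, salt), new GCMParameterSpec(128, iv));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(salt);
            out.write(iv);
            // Compress before encrypting: ciphertext does not compress.
            try (OutputStream gz = new GZIPOutputStream(new CipherOutputStream(out, cipher))) {
                gz.write(raw);
            }
            return out.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses pack(); fails with IllegalStateException on a wrong password or tampered data.
    static byte[] unpack(byte[] packed, char[] password) {
        try {
            ByteArrayInputStream in = new ByteArrayInputStream(packed);
            byte[] salt = in.readNBytes(SALT_LEN);
            byte[] iv = in.readNBytes(IV_LEN);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, deriveKey(password, salt), new GCMParameterSpec(128, iv));
            try (InputStream gz = new GZIPInputStream(new CipherInputStream(in, cipher))) {
                return gz.readAllBytes();
            }
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling unpack(pack(data, password), password) should return the original bytes.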

What is a .lck file and why can't I read it with a buffered reader?

I'm trying to use crawler4j to crawl websites. I was able to follow the instructions on the crawler4j website. When the crawl is done, it creates a folder with two different .lck files, one .jdb file, and one .info.0 file.
I tried to read the files using the code that I provided in this answer, but it keeps failing. I've used the same function to read text files before, so I know the code works.
I also found someone else who asked the same question a few months ago. They never got an answer.
Why can't I use my code to open these .lck files and read them into memory?
Crawler4j uses BerkeleyDB to store crawl information. See here in the source.
From the command line you can use the DB utils to access the data. This is already covered on SO here.
If you want to access the data in your Java code, simply import the BerkeleyDB library (Maven instructions there) and follow the tutorial on how to open the DB.
