Searching Docx files in Java

I am writing an application that searches the content of documents.
I have already written the code for searching documents that can be edited in Notepad (plain-text files).
I would also like to do the same for docx files. After some research I have come up with these two options:
http://www.infoq.com/articles/cracking-office-2007-with-java
This method requires me to extract the docx file and then search the XML files inside it. However, that adds the overhead of the extraction step, and frankly I don't know how to process an XML file (discarding attribute content, etc.).
http://www.javadocx.com/download
This method lets me import a JAR library into my project, and supposedly I can create docx files with it. What I don't understand is how to open existing docx files with it.
Can anyone recommend an alternative method, or help with either of the two methods above?

Try Apache Tika (http://tika.apache.org/), docx4j, or Apache POI.
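If adding a library is not an option, the first approach from the question (treating the .docx as a zip archive and reading its XML) can be sketched with JDK classes only. This is a minimal sketch, not a full parser: the class and method names are made up for illustration, it reads only word/document.xml (no headers, footers, or footnotes), and text inside text boxes may appear twice. Attributes are ignored automatically because only the text of w:t elements is read.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: pulls plain text out of a .docx using only the JDK. */
final class DocxTextSearch {

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns the text of word/document.xml, one line per paragraph. */
    static String extractText(InputStream docx) throws IOException {
        // Stream through the archive; nothing is extracted to disk.
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    return textOf(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    private static String textOf(InputStream xml) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations, since the file may come from an untrusted source.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(xml);

            StringBuilder out = new StringBuilder();
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                // Word often splits one sentence over several runs, so the
                // w:t pieces of a paragraph are joined before searching.
                NodeList texts = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
                for (int j = 0; j < texts.getLength(); j++) {
                    out.append(texts.item(j).getTextContent());
                }
                out.append('\n');
            }
            return out.toString();
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not parse document XML", e);
        }
    }

    /** Case-insensitive substring match on already extracted text. */
    static boolean contains(String text, String query) {
        return text.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    /** Convenience wrapper: does the .docx at this path contain the query? */
    static boolean fileContains(Path docx, String query) throws IOException {
        try (InputStream in = Files.newInputStream(docx)) {
            return contains(extractText(in), query);
        }
    }
}
```

A call such as DocxTextSearch.fileContains(Paths.get("report.docx"), "invoice") would then fit into the existing plain-text search code (the file name is only an example). For anything beyond simple body text, the libraries suggested above are likely the more robust choice.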

Related

Java - How to merge multiple documents (With Multiple file formats with file convert) to a single PDF?

I currently have a requirement to download multiple files (PDF, XLSX, PPT, JPEG, PNG) from an SFTP server, merge them into one PDF file, and provide it to the client so they can print it. I thought of using the iText library to convert all the files to PDF and then merge the PDFs, but I don't know whether that is possible, so I am asking for a better approach to this task. I have already implemented the download from SFTP to the server using JSch.

You can merge multiple PDF documents into a single PDF document using the PDFMergerUtility class from Apache PDFBox, which provides methods to merge two or more PDF documents into one.

Answering my own question in case it helps someone else.
To convert the Office files (docx, xlsx, pptx) to PDF I used
Spire.Office for Java (a free evaluation version is available).
I also tried the Aspose.Cells library (free evaluation available) to convert xlsx to PDF. Both libraries worked fine and without hassle, but neither is free.
I then merged all the PDF files using the iText library.
If someone has a better alternative, kindly share it.
For merging multiple files, you can refer to this example.
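The convert-then-merge workflow described in this answer starts with deciding, for each downloaded file, which step it needs before the final merge. The sketch below shows only that routing step with JDK classes; the class, enum, and method names are hypothetical, and the actual conversion and merge calls are left to whichever library is chosen.

```java
import java.util.Locale;

/** Hypothetical planner: picks the step a file needs before the final PDF merge. */
final class MergePlanner {

    /** Illustrative categories, based on the file types named in the question and answer. */
    enum Route { ALREADY_PDF, OFFICE_TO_PDF, IMAGE_TO_PDF, UNSUPPORTED }

    static Route routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // Compare extensions case-insensitively; a name without a dot has none.
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf":
                return Route.ALREADY_PDF;      // goes straight to the merge step
            case "docx":
            case "xlsx":
            case "pptx":
            case "ppt":
                return Route.OFFICE_TO_PDF;    // needs an Office-to-PDF converter first
            case "jpg":
            case "jpeg":
            case "png":
                return Route.IMAGE_TO_PDF;     // needs to be placed on a PDF page first
            default:
                return Route.UNSUPPORTED;      // report instead of failing the whole batch
        }
    }
}
```

Keeping the routing separate from the conversion makes it easier to swap the conversion library later without touching the download and merge code.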

Generate PDF or Similar to it

I'm creating a Java application,
and I want to create, display, and print a PDF file,
like in this example:
http://img11.hostingpics.net/pics/331702Sanstitre.jpg
Can you tell me the right way to do it?
I mean, is this a PDF file displayed in a JPanel, or something else?
Thanks a lot.

For working with PDF files I would recommend a library such as Apache PDFBox, which can write, read, and print PDF files (printing via org.apache.pdfbox.PrintPDF).
The API documentation can be found here.
As for displaying it in a JFrame, you can simply extract the text and show it in a Swing text area.

To generate PDF files you can use the JasperReports library. It is a popular API for creating PDF files from a template into which specific data is inserted. Template files have the ".jrxml" extension and can be created and edited with Jaspersoft Studio. They look like forms with variable fields, which is very useful for generating different kinds of reports.
The API documentation for the JasperReports library can be found here.

extracting text AND Images from PDF file

I have been banging my head against the wall with this one; I have researched and tried pretty much every library suggested to me. I am trying to write a program in Java that extracts text AND images from a PDF file and lets me write the extracted content to a Word file. I have managed to extract the content using the ICEpdf library, but the problem is that I need to write the content in exactly the same order as it was read. To clarify, I need a library that helps me keep track of where exactly on the page the text and images are located, so that I can put them in the same place in my Word file.

A PDF to Word converter is a horribly complex proposition.
Your best bet will probably be to use OpenOffice to do it for you and not even try to handle the intermediate steps.
http://www.openoffice.org/api/

Look at this: Advanced PDF parser for Java
Off-topic: to my knowledge there is also a Python parser that more or less converts the PDF to HTML, which lets you keep track of the ordering of the objects within the PDF. I know it's not Java, but you might be able to use the output.
http://www.unixuser.org/~euske/python/pdfminer/index.html

How to extract data from a pdf file using JPedal?

I am trying to extract data from a PDF file, but I couldn't find any example on the internet. Is it possible to use the JPedal library to open a PDF file and read the data from it?

You can use PDFBox from Apache.

I am not familiar with JPedal, but I write lots of code that generates and processes PDF files. I use iText and highly recommend it. If you have a specific question on how to process a PDF file, let me know.

Java: Placing a Header in MS Word Document

We are converting a C++ project to Java in which we generate reports with a ".doc" extension. The problem is that we don't use any third-party library to generate a real MS Word document; we just write a file with a .doc extension. Everything works fine except that we can't seem to find a way to add a header at the top of every page. Using line numbers is not an option. Is there any other way it can be done?
Thank you.

The Apache POI library might be of some help.
It has facilities to read and modify Microsoft's proprietary file formats, such as MS Word .doc and MS Excel .xls.
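The question does not say what is actually written into the .doc file. If it happens to be RTF text saved under a .doc name (an assumption, not something the question confirms), a page header can be added without any library by emitting an RTF \header group, which applies to every page. The sketch below is illustrative only: the class and method names are made up, and it handles plain ASCII text (non-ASCII characters would need RTF \u escapes).

```java
/** Hypothetical builder for a minimal RTF report with a header on every page. */
final class RtfReport {

    /** Escapes the three characters RTF treats as control syntax. */
    static String escape(String text) {
        // Backslashes first, so the backslashes added for braces are not doubled.
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}");
    }

    /** Builds the document; each '\n' in the body becomes a paragraph break. */
    static String withHeader(String headerText, String bodyText) {
        return "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}"
                // The \header group is repeated by the word processor on each page.
                + "{\\header\\pard\\qc " + escape(headerText) + "\\par}"
                + "\\pard " + escape(bodyText).replace("\n", "\\par\n") + "\\par}";
    }
}
```

The resulting string can be written to the report file as ASCII bytes, for example with java.nio.file.Files.write. If the generated file is in some other format, this does not apply and Apache POI remains the suggested route.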
