How to print ODF files using the Java Print API

How do I print an ODF file using the Java Print API? I can't find any information on printing ODF files with it.
Java accepts PostScript and wraps it in a Doc for printing:
FileInputStream fis = new FileInputStream("example.ps");
Doc doc = new SimpleDoc(fis, psFlavor, null);
pj.print(doc, aset);
How do I do the same for an ODF file?
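One way to approach this is to first ask the Java Print Service API whether any installed print service accepts ODF data at all. The sketch below builds a custom DocFlavor for the ODF text MIME type and looks up matching services. The class name OdfPrintCheck and its helper methods are hypothetical, and the API has no predefined ODF flavor, so on most systems the lookup will probably return nothing. In that case the file would need to be converted to a supported format such as PostScript or PDF before printing.

```java
import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

// Hypothetical helper class for illustration; not part of the Java Print API.
class OdfPrintCheck {

    // Registered MIME type for ODF text documents (.odt).
    static final String ODT_MIME = "application/vnd.oasis.opendocument.text";

    // Builds a stream-based flavor for ODF text, since DocFlavor has no predefined one.
    static DocFlavor odtFlavor() {
        return new DocFlavor(ODT_MIME, "java.io.InputStream");
    }

    // Returns the print services that claim to accept the given flavor (possibly none).
    static PrintService[] findServices(DocFlavor flavor) {
        return PrintServiceLookup.lookupPrintServices(flavor, null);
    }

    public static void main(String[] args) {
        PrintService[] services = findServices(odtFlavor());
        if (services.length == 0) {
            System.out.println("No print service accepts " + ODT_MIME
                    + "; convert the file to a supported format first.");
        }
        for (PrintService service : services) {
            System.out.println("Accepts ODF: " + service.getName());
        }
    }
}
```

If a service is found, the flavor returned by odtFlavor() could be passed to SimpleDoc in place of psFlavor in the snippet above.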

Related

Read Text from RTF file

I tried to read an RTF file using Apache POI, but it throws an Invalid Header exception. It seems that POI doesn't support RTF files. Is there any way to read .rtf files using an open-source Java API? (I have heard of the Aspose API, but it's not free.)
Any solutions?
You can try the RTFEditorKit. It supports images and text as well.
Or look at this answer: Java API to convert RTF file to Word document (97-2003 format)
There is no free library that supports this. But it may not be that hard to create a basic compare function yourself. You can read in an rtf file and then extract the text like this:
// read rtf from file
JEditorPane p = new JEditorPane();
p.setContentType("text/rtf");
EditorKit rtfKit = p.getEditorKitForContentType("text/rtf");
rtfKit.read(new FileReader(fileName), p.getDocument(), 0);
rtfKit = null;
// convert to text
EditorKit txtKit = p.getEditorKitForContentType("text/plain");
Writer writer = new StringWriter();
txtKit.write(writer, p.getDocument(), 0, p.getDocument().getLength());
String documentText = writer.toString();
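The same extraction can also be sketched without a JEditorPane, by using the RTFEditorKit mentioned above directly. This is a minimal example under the assumption that only the plain text is needed; the class name RtfToText and the method extractText are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper class; the name is not from any library.
class RtfToText {

    // Parses RTF from the reader and returns the text content of the resulting document.
    static String extractText(Reader rtf) throws IOException, BadLocationException {
        RTFEditorKit kit = new RTFEditorKit();
        Document doc = kit.createDefaultDocument(); // a styled document, as the kit expects
        kit.read(rtf, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // A tiny inline RTF document; a FileReader could be used for a real file.
        try (Reader in = new StringReader("{\\rtf1\\ansi Hello RTF}")) {
            System.out.println(extractText(in).trim());
        }
    }
}
```

The extracted text may carry extra trailing whitespace, which is why the usage example trims it.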

how to modify metadata of a doc document

I'm looking to modify certain tags (like comments, keywords, etc) of a .DOC file. I've been able to do this for DOCX using docx4j but I haven't been able to find anything that lets me change the tags for a .DOC format.
Is there a way to programmatically change the content of certain tags in a .DOC file?
Apache POI will quite happily let you read and edit the metadata of supported documents. For the older OLE2 formats (.doc, .xls etc.), you'll want to use HPSF, likely via POIDocument. For the OOXML formats (.docx, .xlsx etc.), use POIXMLDocument and POIXMLProperties.
To modify the OLE2 properties, you can either follow the detailed instructions and code in the HPSF documentation, or in newer versions of POI you can shortcut quite a bit of that with HPSFPropertiesOnlyDocument, e.g.:
NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("test.doc"));
HPSFPropertiesOnlyDocument doc = new HPSFPropertiesOnlyDocument(fs);
SummaryInformation si = doc.getSummaryInformation();
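if (si == null) {
    // Create the property sets, then fetch the new SummaryInformation so si is not null below
    doc.createInformationProperties();
    si = doc.getSummaryInformation();
}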
si.setAuthor("StackOverflow");
si.setTitle("Properties Demo!");
FileOutputStream out = new FileOutputStream("changed.doc");
doc.write(out);
out.close();

convert a xls file as pdf without poi or jxl

To summarize my problem: I would like to convert an XLS file to PDF using Java.
I found two examples.
The first uses OpenOffice:
import officetools.OfficeFile; // from officetools.jar
FileInputStream fis = new FileInputStream(new File("test.doc"));
FileOutputStream fos = new FileOutputStream(new File("test.pdf"));
OfficeFile f = new OfficeFile(fis,"localhost","8100", false);
f.convert(fos,"pdf");
Unfortunately, that requires installing OpenOffice.
I also found this example, two VB commands that call PDFCreator:
DoCmd.OpenReport "repClient", acViewPreview, "NumClient = 2"
DoCmd.OutputTo acOutputReport, "PDF", "d: \ test.pdf"
Is there something like that in Java?
(Note: I tried jxl and Apache POI for my first solution, but the formatting of the generated PDF does not match what I get from Save as PDF in Microsoft Excel.)
Thank you in advance.
I think you can read the data from the Excel document using the Apache POI library and pass it to the iText library API.
iText can write that data to a PDF file. It is widely used in organizations for PDF generation, and many reporting tools also use it to generate PDF reports.

Adding pdf document to Solr in Java

I can add a field like this in Java, but I want to add a PDF document to Solr with SolrJ. How can I add a PDF file?
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrInputDocument doc = new SolrInputDocument();
doc.addField("cat", "lalal");
doc.addField("id", "1");
server.add(doc);
server.commit();
Solr uses Apache Tika to process binary files.
See http://wiki.apache.org/solr/ExtractingRequestHandler and http://wiki.apache.org/solr/ContentStreamUpdateRequestExample for a SolrJ example.

How to convert .doc or .docx files to .txt

I'm wondering how you can convert Word .doc/.docx files to text files through Java. I understand that there's an option where I can do this through Word itself but I would like to be able to do something like this:
java DocConvert somedocfile.doc converted.txt
Thanks.
If you're interested in a Java library that deals with Word document files, you might want to look at e.g. Apache POI. A quote from the website:
Why should I use Apache POI?
A major use of the Apache POI api is for Text Extraction applications such as web spiders, index builders, and content management systems.
P.S.: If, on the other hand, you're simply looking for a conversion utility, Stack Overflow may not be the most appropriate place to ask for this.
Edit: If you don't want to use an existing library but do all the hard work yourself, you'll be glad to hear that Microsoft has published the required file format specifications. (The Microsoft Open Specification Promise lists the available specifications. Just google for any of them that you're interested in. In your case, you'd need e.g. the OLE2 Compound File Format, the Word 97 binary file format, and the Open XML formats.)
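To give an idea of what the do-it-yourself route involves for the Open XML case, here is a minimal sketch using only the JDK. It assumes a .docx file (a ZIP archive whose main body is stored in word/document.xml) and collects the text of the w:t elements, adding a line break after each paragraph. The class name DocxPlainText is hypothetical, the binary .doc format is not handled, and headers, footnotes, tables layout and similar details are ignored.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class for illustration; handles .docx only, not binary .doc.
class DocxPlainText {

    // Namespace of the WordprocessingML elements (w:p, w:t, ...).
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml inside the ZIP container and returns its text.
    static String extractText(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    return parseBody(zip);
                }
            }
        }
        throw new IOException("word/document.xml not found; is this a .docx file?");
    }

    // Collects the character data of every w:t element, one line per w:p paragraph.
    private static String parseBody(InputStream xml) throws Exception {
        StringBuilder out = new StringBuilder();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.newSAXParser().parse(xml, new DefaultHandler() {
            private boolean inText;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (W_NS.equals(uri) && "t".equals(local)) inText = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inText) out.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (!W_NS.equals(uri)) return;
                if ("t".equals(local)) inText = false;
                if ("p".equals(local)) out.append('\n');
            }
        });
        return out.toString();
    }

    // Usage: java DocxPlainText somedocfile.docx > converted.txt
    public static void main(String[] args) throws Exception {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extractText(in));
        }
    }
}
```

A library such as Apache POI covers far more of the format, so this is only meant to show the container and element structure involved.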
Use the Apache Tika command-line utility. Tika supports a wide range of formats (e.g. doc, docx, pdf, html, rtf ...):
java -jar tika-app-1.3.jar -t somedocfile.doc > converted.txt
Programmatically:
File inputFile = ...;
Tika tika = new Tika();
String extractedText = tika.parseToString(inputFile);
You can use Apache POI too. It provides text extraction for doc/docx files (see its Text Extraction documentation). If you want to extract only the text, you can use the code below. If you want to extract Rich Text (such as formatting and styling), you can use Apache Tika.
Extract doc:
InputStream fis = new FileInputStream(...);
POITextExtractor extractor;
// if docx
if (fileName.toLowerCase().endsWith(".docx")) {
    XWPFDocument doc = new XWPFDocument(fis);
    extractor = new XWPFWordExtractor(doc);
} else {
    // if doc
    POIFSFileSystem fileSystem = new POIFSFileSystem(fis);
    extractor = ExtractorFactory.createExtractor(fileSystem);
}
String extractedText = extractor.getText();
You should consider using this library: Apache POI.
An excerpt from the website:
In short, you can read and write MS Excel files using Java. In addition, you can read and write MS Word and MS PowerPoint files using Java. Apache POI is your Java Excel solution (for Excel 97-2008). We have a complete API for porting other OOXML and OLE2 formats and welcome others to participate.
Docmosis can read a doc and spit out the text in it. Requires some infrastructure to be installed (such as OpenOffice).
You can also use JODConverter.
