Convert RTF string to HTML String using Java

I tried using the simple EditorKit option, but that doesn't seem to support all the RTF formats.
So I turned to Tika, JODConverter, or POI.
So far I have managed to make it work with JODConverter and OpenOffice using this:
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setPortNumbers(8100, 8101).buildOfficeManager();
officeManager.start();
OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
try {
    // Write the RTF string to a temporary file for the converter to read
    File tempFile = File.createTempFile("tempRtf", ".rtf");
    BufferedWriter bw = new BufferedWriter(new FileWriter(tempFile));
    bw.write(rtfString);
    bw.close();
    File outputTempFile = File.createTempFile("outputTempFile", ".html");
    converter.convert(tempFile, outputTempFile);
    return FileUtils.readFileToString(outputTempFile);
} finally {
    // Stopping the manager shuts down the OpenOffice process again
    officeManager.stop();
}
This works.
My problem is that I start a server and shut it down again on every conversion, which takes a lot of time.
I tried to see whether I can bring up the process on the first run/report (I use it as a handler in a BIRT report) and afterwards just check whether the process is running and, if so, use it to convert. That would save much of the time currently wasted on starting and stopping the process (I don't mind if it stays up).
My problem is that the classes noted here do not seem to be present in my version of JODConverter.
After further investigation, I found out that they are in the JODConverter 2.2 API, and I use 3.0 core-beta-4.
JODConverter seems rather complex for my simple need.
So if anyone knows how to start the office manager once and then just check whether it is up, I'd love a code sample. And of course, if anyone has a better solution than JODConverter for my need, I'll be glad to hear it.
EDIT: I need my handler to do two things: 1. check whether an instance of the office manager is up and connect to it (skipping officeManager.start()),
and 2. if the instance isn't up, basically do what the code sample above does.
This code is written in a BIRT handler, so I can't create the officeManager globally and just share it, because the handler class runs every time I call the BIRT engine.
Maybe I can set up the officeManager in BIRT itself, so that the handler has access to the instance?
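One possible way to start the office manager only once is to keep it in a static holder that starts it lazily on first use. This is only a sketch under assumptions: it assumes the BIRT engine keeps running in the same JVM and does not reload the handler class between reports, in which case a static field survives even though a new handler instance is created for every run. If BIRT loads the handler through a fresh classloader each time, this will not help. SharedResource is a hypothetical helper written for this sketch, not part of JODConverter or BIRT.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper: holds one lazily started instance of an expensive
 * resource (for example an office manager) and hands out the same instance
 * on every later call.
 */
final class SharedResource<T> {
    private final Supplier<T> starter;
    private volatile T instance;

    SharedResource(Supplier<T> starter) {
        this.starter = starter;
    }

    /** Returns the shared instance, starting it on the first call only. */
    T get() {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    // First caller pays the startup cost; later callers reuse it
                    local = starter.get();
                    instance = local;
                }
            }
        }
        return local;
    }

    /** True once the resource has been started by a previous get() call. */
    boolean isStarted() {
        return instance != null;
    }
}
```

In the handler this might be used as a static field whose supplier builds the manager with DefaultOfficeManagerConfiguration, calls officeManager.start(), and returns it; calling get() on that field would then replace the build-and-start lines of the sample above. The manager is never stopped explicitly in this variant, which fits the requirement that the process may stay up.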

Related

Save pdf report on database using BIRT

So, I'm trying to save the PDF report in a database using a service method. I saw that there's a way to specify the output of the generated report by calling pdfOptions.setOutputStream(output). But how can I call my save method this way?
I saw this post, but I'm stuck at the persist point.
I appreciate any advice.
PDFRenderOption pdfOptions = new PDFRenderOption(options);
pdfOptions.setOutputFormat(FORMAT_PDF);
pdfOptions.setOption(IPDFRenderOption.PAGE_OVERFLOW, IPDFRenderOption.OUTPUT_TO_MULTIPLE_PAGES);
pdfOptions.setOutputStream(response.getOutputStream());//opens report on browser
runAndRenderTask.setRenderOption(pdfOptions);

You are streaming the output directly to the client with
pdfOptions.setOutputStream(response.getOutputStream());//opens report on browser
If you do this, your output gets consumed and you'll not be able to save it to the database.
I would use a "tee" like approach, you know, with one input stream and two output streams.
You could write that yourself, or you could just use something like the Apache Commons IO TeeOutputStream.
This could look like this:
OutputStream blobOutputStream = ...; // for writing to the DB as BLOB.
OutputStream teeStream = new TeeOutputStream(response.getOutputStream(), blobOutputStream);
pdfOptions.setOutputStream(teeStream);
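If pulling in Commons IO is not an option, the "write that yourself" variant might look roughly like the following. It is a minimal sketch using only the JDK; SimpleTeeOutputStream is a hypothetical name chosen for this example.

```java
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical minimal tee: every byte written goes to both target streams. */
final class SimpleTeeOutputStream extends OutputStream {
    private final OutputStream first;
    private final OutputStream second;

    SimpleTeeOutputStream(OutputStream first, OutputStream second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(int b) throws IOException {
        first.write(b);
        second.write(b);
    }

    @Override
    public void write(byte[] buffer, int offset, int length) throws IOException {
        // Forward whole chunks so the targets are not written byte by byte
        first.write(buffer, offset, length);
        second.write(buffer, offset, length);
    }

    @Override
    public void flush() throws IOException {
        first.flush();
        second.flush();
    }

    @Override
    public void close() throws IOException {
        // Close the second stream even if closing the first one fails
        try {
            first.close();
        } finally {
            second.close();
        }
    }
}
```

A simpler alternative, if the reports are small enough to hold in memory, would be to render into a ByteArrayOutputStream first and then write the resulting bytes to the response and to the database separately.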

Merging PDFs with Sejda fails with stream output

Using Sejda 1.0.0.RELEASE, I basically followed the tutorial for splitting a PDF but tried merging instead (org.sejda.impl.itext5.MergeTask, MergeParameters, ...). All works great with the FileTaskOutput:
parameters.setOutput(new FileTaskOutput(new File("/some/path/merged.pdf")));
However I am unable to change this to StreamTaskOutput correctly:
OutputStream os = new FileOutputStream("/some/path/merged.pdf");
parameters.setOutput(new StreamTaskOutput(os));
parameters.setOutputName("merged.pdf");
No error is reported, but the resulting file cannot be read by Preview.app and is approximately 31 kB smaller (out of the ~1.2 MB total result) than the file saved above.
My first idea was: stream is not being closed properly! So I added os.close(); to the end of CompletionListener, still the same problem.
Remarks:
The reason I need to use StreamTaskOutput is that this merge logic will live in a web app, and the merged PDF will be sent directly over HTTP. I could store the temporary file and serve that one, but that is a hack.
Due to licensing issues, I cannot use the iText 5 version of the task.
Edit
Turns out, the reason is that StreamTaskOutput zips the result into a ZIP file! OutputWriterHelper.copyToStream() is the culprit. If I rename merged.pdf to merged.zip, it's a valid ZIP file containing a perfectly valid merged.pdf file!
Could anyone (dear authors of the library) comment on why this is happening?

The idea is that when a task consumes a MultipleOutputTaskParameters and produces multiple output documents, the StreamTaskOutput has to group them to be able to write all of them to a single stream. Unfortunately, Sejda currently applies the same logic to SingleOutputTaskParameters, hence your issue. We can fix this in Sejda 2.0, because it makes more sense to stream the output document directly in the case of SingleOutputTaskParameters. For Sejda 1.x I'm not sure how to address this while remaining compatible with the existing behaviour.
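Until that changes, one conceivable workaround is to let the task write into an in-memory buffer and then unwrap the single ZIP entry before sending it on. The sketch below uses only java.util.zip and assumes, as described in the question's edit, that the stream contains a valid ZIP with the merged PDF as its first entry. SingleEntryUnzipper is a hypothetical helper, not part of Sejda.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper that extracts the first entry of a zipped task output. */
final class SingleEntryUnzipper {
    private SingleEntryUnzipper() {
    }

    /**
     * Copies the content of the first ZIP entry to the target stream
     * and returns the entry's name (for example "merged.pdf").
     */
    static String unwrapFirstEntry(byte[] zippedBytes, OutputStream target) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zippedBytes))) {
            ZipEntry entry = zip.getNextEntry();
            if (entry == null) {
                throw new IOException("No ZIP entry found in task output");
            }
            byte[] buffer = new byte[8192];
            int read;
            // read() returns -1 at the end of the current entry
            while ((read = zip.read(buffer)) != -1) {
                target.write(buffer, 0, read);
            }
            return entry.getName();
        }
    }
}
```

The idea would be to pass a ByteArrayOutputStream to StreamTaskOutput, and once the task has completed, call unwrapFirstEntry with the buffer's bytes and the HTTP response stream. This holds the whole result in memory, so it is only reasonable for moderately sized PDFs.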

Failing for Larger Input Files Only: FileServiceFactory getBlobKey throws IllegalArgumentException

I have a Google App Engine app that converts XML files to CSV. It works fine for small XML inputs, but refuses to finalize the files for larger XML inputs. The XML is read from, and the resulting CSV files are written to, many times before finalization, over a long-running (multi-day) task. My problem is different from the one in "FileServiceFactory getBlobKey throws IllegalArgumentException", since my code works fine both in production and in development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. However, it fails when I attempt to read from a larger XML file. The input XML file is ~150MB, and each of the resulting 5 CSV files is much smaller (perhaps 10MB each). I persisted the file URLs for the new CSV files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using file service URLs of similar length for the previous successful attempts with smaller files. It also wasn't an issue in the other Stack Overflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did in the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize the new blob files for which I have the URLs, but cannot. How can I get around this issue, and what may be the cause? Again, my code works for small files (~60 kB), but fails for the ~150MB input file. Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around before being deleted?

This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read the announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo

Unable to print PNG files using Java Print Services (Everything else works fine)

I am using the Java print services to print a PNG file, however it is sending erroneous output to the printer. What actually gets printed (when I use a PNG) is some text saying:
ERROR: /syntaxerror in --%ztokenexec_continue--
Operand stack:
--nostringval-
There seems to be some more text, but that is lost outside the page margins. I am setting the DocFlavor to DocFlavor.INPUT_STREAM.PNG and the specified document is actually an InputStream (just changing the DocFlavor to DocFlavor.INPUT_STREAM.PDF and using a PDF file works).
I have also tried it with different PNG files, but the problem persists. For what it's worth, even PostScript seems to be working.
The errors that are being printed look quite similar to the gd (or ImageMagick?) errors. So, my best assumption right now is that the conversion from PNG -> PS is failing.
The code is as follows:
PrintService printService = this.getPrintService("My printer name");
final Doc doc = new SimpleDoc(document, DocFlavor.INPUT_STREAM.PNG, null);
final DocPrintJob printJob = printService.createPrintJob();
Here, getPrintService fetches a print service and is fetching a valid one. As for the document, here is how I get it:
File pngFile = new File("/home/rprabhu/temp/myprintfile.png");
FileInputStream document = new FileInputStream(pngFile);
I have no clue why it is going wrong, and I don't see any errors being output to the console either.
Any help is greatly appreciated. Thanks.

Printing is always a messy business – inevitably so, because you have to worry about tedious details such as the size of a page, the margin sizes, and how many pages you're going to need for your output. As you might expect, the process for printing an image is different from printing text and you may also have the added complication of several printers with different capabilities being available, so with certain types of documents you need to select an appropriate printer.
Please see the links below:
http://vineetreynolds.wordpress.com/2005/12/12/silent-print-a-pdf-print-pdf-programmatically/
http://hillert.blogspot.com/2011/12/java-print-service-frustrations.html
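One approach that is sometimes used when a printer's filter chain chokes on raw PNG data is to decode the image in Java and hand the print service a Printable instead of the byte stream. Whether this helps here is an assumption, since the actual cause of the error is not established in this thread. The sketch below uses only the JDK; ImagePrintable is a hypothetical class name.

```java
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.awt.print.PageFormat;
import java.awt.print.Printable;

/** Hypothetical Printable that draws one image on a single page. */
final class ImagePrintable implements Printable {
    private final BufferedImage image;

    ImagePrintable(BufferedImage image) {
        this.image = image;
    }

    @Override
    public int print(Graphics graphics, PageFormat pageFormat, int pageIndex) {
        if (pageIndex > 0) {
            return NO_SUCH_PAGE; // the image occupies exactly one page
        }
        Graphics2D g2 = (Graphics2D) graphics;
        // Shrink the image if it is larger than the printable area; never enlarge it
        double scale = Math.min(1.0, Math.min(
                pageFormat.getImageableWidth() / image.getWidth(),
                pageFormat.getImageableHeight() / image.getHeight()));
        g2.translate(pageFormat.getImageableX(), pageFormat.getImageableY());
        g2.scale(scale, scale);
        g2.drawImage(image, 0, 0, null);
        return PAGE_EXISTS;
    }
}
```

With this in place, the document from the question could be built as new SimpleDoc(new ImagePrintable(ImageIO.read(pngFile)), DocFlavor.SERVICE_FORMATTED.PRINTABLE, null) and passed to the same DocPrintJob. It may also be worth calling printService.isDocFlavorSupported(DocFlavor.INPUT_STREAM.PNG) first to see whether the service claims PNG support at all.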

GData Workspace Document

I have an application that currently uses Apache Abdera to parse Atom Pub documents (Workspace, Collection, Feed, Entry), and I want to switch to the GData libraries, mainly to get rid of a lot of dependencies and because I have found the GData calls to be consistently faster. However, I cannot figure out how to generate some of these document types through GData.
Example:
Workspace w = new Workspace(new PlainTextConstruct("My Workspace"));
System.out.println(w); // prints a memory location
System.out.println(w.getXmlBlob()); // prints memory location or null
In Abdera this would have worked. I am guessing I am missing the use of some parsing class, but the documentation is not very forthcoming on this topic.
I am expecting a document like this (not exactly):
<workspace><atom:title>My Workspace</atom:title></workspace>

Well, I managed to find the answer myself, though I am still trying to figure out how to assign a default namespace so it doesn't prefix every XML tag with "atom".
Workspace workspace = new Workspace(new PlainTextConstruct("My Workspace"));
CharArrayWriter charWr = new CharArrayWriter();
workspace.generate(new XmlWriter(charWr), new ExtensionProfile());
System.out.println(charWr.toString());
