I am having a problem importing a document (PDF) into the Alfresco repository inside a Java-backed webscript. I am using the writer from ContentService.
If I use
ContentWriter writer = contentService.getWriter(nodeRef, ContentModel.PROP_CONTENT, true);
writer.setEncoding("UTF-8");
writer.setMimetype("application/pdf");
writer.putContent(new String(bytes));
or
writer.putContent(new String(bytes, "UTF-8"));
my document is not previewable (I see a blank PDF file; I tried a few small PDF files and don't know what would happen with other or larger files).
But if I use the putContent overload that takes a File as its argument, the document imports successfully.
writer.setEncoding("UTF-8");
writer.setMimetype("application/pdf");
writer.putContent(file);
I don't want to import the file from disk, since I receive it as a Base64-encoded String, but I don't know what I am missing.
You can pass an InputStream to ContentWriter::putContent. That way you avoid the String-to-byte-array (and vice versa) conversions, which lead to exactly these encoding problems.
writer.putContent(new ByteArrayInputStream(Base64.decodeBase64("yourBase64EncodedString")));
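For context, here is a minimal sketch of the full write path; it assumes the Base64 helper from Apache Commons Codec and a contentService reference already available in your Java-backed webscript:

import java.io.ByteArrayInputStream;
import org.alfresco.model.ContentModel;
import org.alfresco.service.cmr.repository.ContentWriter;
import org.apache.commons.codec.binary.Base64;

// Decode the Base64 payload once, then stream the raw bytes into the repository.
byte[] pdfBytes = Base64.decodeBase64(base64String);
ContentWriter writer = contentService.getWriter(nodeRef, ContentModel.PROP_CONTENT, true);
writer.setMimetype("application/pdf"); // no setEncoding: a PDF is binary, not text
writer.putContent(new ByteArrayInputStream(pdfBytes));

Since the bytes are never run through a String, no character-set conversion can corrupt the binary PDF data.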
I wanted to make a simple program to get the text content from a PDF file in Java. Here is the code:
PDFTextStripper ts = new PDFTextStripper();
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = PDDocument.load(file);
String allText = ts.getText(doc1);
String gradeText = allText.substring(allText.indexOf("GRADE 10B"), allText.indexOf("GRADE 10C"));
System.out.println("Meeting ID for English: "
+ gradeText.substring(gradeText.indexOf("English") + 7, gradeText.indexOf("English") + 20));
This is just part of the code, but this is the part with the problem.
The error is: The method load(File) is undefined for the type PDDocument
I have learnt to use PDFBox from JavaTPoint. I have followed the correct instructions for installing the PDFBox libraries and adding them to the build path.
My PDFBox version is 3.0.0
I have also searched the source files and their methods, and I am unable to find the load method there.
Thank you in advance.
As per the 3.0 migration guide, the PDDocument.load method has been replaced by the Loader methods:
For loading a PDF PDDocument.load has been replaced with the Loader
methods. The same is true for loading a FDF document.
When saving a PDF this will now be done in compressed mode per
default. To override that use PDDocument.save with
CompressParameters.NO_COMPRESSION.
PDFBox now loads a PDF Document incrementally reducing the initial
memory footprint. This will also reduce the memory needed to consume a
PDF if only certain parts of the PDF are accessed. Note that, due to
the nature of PDF, uses such as iterating over all pages, accessing
annotations, signing a PDF etc. might still load all parts of the PDF
over time, leading to a similar memory consumption as with PDFBox 2.0.
The input file must not be used as output for saving operations. It
will corrupt the file and throw an exception as parts of the file are
read the first time when saving it.
So you can either swap to an earlier 2.x version of PDFBox, or use the new Loader method. I believe this should work:
File file = new File("C:\\Meeting IDs.pdf");
PDDocument doc1 = Loader.loadPDF(file);
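For reference, here is a minimal, self-contained sketch of the original extraction under PDFBox 3.x (the class name is arbitrary; Loader lives in org.apache.pdfbox.Loader):

import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTextDump {
    public static void main(String[] args) throws IOException {
        File file = new File("C:\\Meeting IDs.pdf");
        // try-with-resources closes the PDDocument when done
        try (PDDocument doc = Loader.loadPDF(file)) {
            PDFTextStripper ts = new PDFTextStripper();
            System.out.println(ts.getText(doc));
        }
    }
}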
I'm trying to delete the first 3 lines of a TIFF file generated by a scanner, because I can't open it correctly.
Example of the lines to delete:
------=_Part_23XX49_-1XXXX3073.1XXXXX20715
ID: documento<br>
MimeType: image/tiff
I have no problem changing the content, but when I save the new file, I can't open it correctly again.
System.out.println(new InputStreamReader(in).getEncoding());
This method tells me that the encoding of the source file is "Cp1252", so I passed an argument to the JVM (-Dfile.encoding=Cp1252), but nothing appears to change.
This is what I do:
StringBuilder fileContent = new StringBuilder();
// ... work with the content and store the result in the fileContent variable ...

// save the file again
FileWriter fstreamWrite = new FileWriter(f.getAbsolutePath());
BufferedWriter out = new BufferedWriter(fstreamWrite);
out.write(fileContent.toString());
out.close();
Is it possible that something is going wrong with the encoding?
If I do the operation with Notepad++, I obtain a correct TIFF that I can open without problems.
I found the TIFF Java library, which might be useful for your requirements.
Please take a look at the readme for how to read and how to write a TIFF file.
Hope this helps.
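Independently of any library, the root cause is likely that a TIFF is binary data: reading and writing it through Reader/Writer classes applies a character encoding, and bytes with no round-trip mapping in Cp1252 get silently changed. If the scanner merely prepends three text lines to an otherwise valid TIFF, a byte-level approach avoids the problem entirely. A minimal sketch, assuming the three header lines end with '\n' (file names are hypothetical):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Read the raw bytes, skip past the first three '\n'-terminated lines,
// and write the remainder back completely untouched.
byte[] all = Files.readAllBytes(Paths.get("scan.tif"));
int offset = 0;
for (int newlines = 0; offset < all.length && newlines < 3; offset++) {
    if (all[offset] == '\n') {
        newlines++;
    }
}
Files.write(Paths.get("scan-fixed.tif"), Arrays.copyOfRange(all, offset, all.length));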
I am facing a very strange issue. I am trying to send a PDF file as an attachment from my Struts application using the code below:
JasperReport jrReport = (JasperReport) JRLoader.loadObject(jasperReport);
JasperPrint jasperPrint = JasperFillManager.fillReport(jrReport, parameters, dataSource);
jasperPrint.setName(fileNameTobeGivenToExportedReport);
response.reset();
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=\"" + fileNameTobeGivenToExportedReport + ".pdf" + "\"");
response.setHeader("Cache-Control", "private");
JasperExportManager.exportReportToPdfStream(jasperPrint, response.getOutputStream());
but the PDF that gets downloaded contains no data: it shows a blank page.
When I added the lines below to the above code, to save the PDF file to my D: drive,
File pdf = new File("D:\\sample22.pdf");
JasperExportManager.exportReportToPdfStream(jasperPrint, new FileOutputStream(pdf));
the file that gets generated is proper, with all the data. One thing I noticed is that the file downloaded from the browser and "sample22.pdf" have the same size.
I read an article that says it might be an issue with the server configuration, as our server might be corrupting the output stream. This is the article I read: Creating PDF from Servlet.
This article says
This can happen when your server flattens all bytes with a value higher than 127. Consult your web (or application) server manual to find out how to make sure binary data is sent correctly to the browser.
I am using Struts 1.x, JBoss 6, and iReport 1.2.
Suppose that you have a simple "Hello World" PDF document:
When you open this document, you see that the file structure uses ASCII characters, but that the actual content of the page is compressed to a binary stream:
You don't see the words "Hello World" anywhere; they are compressed, along with the PDF syntax that describes how to draw them on the page, into this stream:
xœ+är
á26S°00SIá2PÐ5´ 1ôÝBÒ¸4<RsròÂó‹rR5C²€j#*\C¸¹ Çq°
Now suppose that a process shaves all the non-ASCII characters down to ASCII. I've done this manually, as you can see in the next screenshot:
I can still open the document, because I didn't change anything in the file structure: there is still a /Pages tree with a single /Page dictionary. From a syntactical point of view, the file looks OK, so I can open it in Adobe Reader:
As you can see, the words "Hello World" are gone. The stream containing the syntax to render these words was corrupted (in my case manually; in your case by the server, or by Struts, or by whatever process thinks you are creating plain text instead of a binary file).
What you need to do is find the place where this happens. Maybe Struts is the culprit. Maybe you are (unintentionally) using Struts as if you were creating a plain text file. It is hard to tell remotely. This is a typical problem caused by a configuration issue; only somebody with access to your configuration can solve it.
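One way to narrow this down is to take the container's buffering and any filters out of the equation: export the report to a byte array first, set an explicit Content-Length, and write the bytes in a single binary pass. A sketch under those assumptions, not a confirmed fix:

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import net.sf.jasperreports.engine.JasperExportManager;

// Render the report into memory so the exact byte count is known.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
JasperExportManager.exportReportToPdfStream(jasperPrint, baos);

response.reset();
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=\"report.pdf\"");
response.setContentLength(baos.size());

// Write the PDF bytes directly to the binary servlet output stream.
OutputStream out = response.getOutputStream();
baos.writeTo(out);
out.flush();

If the download still differs from the known-good sample22.pdf, comparing the two files byte by byte should reveal exactly which byte values are being mangled, which points to the component doing the flattening.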
My question: how can I create a new linked document and insert (or connect) it into an element (in my case a Note element of an activity diagram)?
The Element class supports these three methods:
GetLinkedDocument ()
LoadLinkedDocument (string Filename)
SaveLinkedDocument (string Filename)
I am missing a function like
CreateLinkedDocument (string Filename)
My goal: I create an activity diagram programmatically, and some notes are too big to display nicely in the activity diagram. So my goal is to put this text into a linked document instead of directly into the activity diagram.
Regards
EDIT
Thank you very much to Uffe for the solution to my problem. Here is my solution code:
public void addLinkedDocumentToElement(Element element, String noteText) throws IOException {
    String filePath = "C:\\rtfNote.rtf";
    // convert the string to EA's RTF format
    String rtfText = repository.GetFormatFromField("RTF", noteText);
    // create a new file on disk and write the RTF content to it
    PrintWriter writer = new PrintWriter(filePath, "UTF-8");
    writer.write(rtfText);
    writer.close();
    // create the linked document by loading the RTF file created above
    element.LoadLinkedDocument(filePath);
    element.Update();
}
EDIT EDIT
It is also possible to work with a temporary file:
File f = File.createTempFile("rtfdoc", ".rtf");
FileOutputStream fos = new FileOutputStream(f);
String rtfText = repository.GetFormatFromField("RTF", noteText);
fos.write(rtfText.getBytes());
fos.flush();
fos.close();
element.LoadLinkedDocument(f.getAbsolutePath());
element.Update();
First up, let's separate the linked document, which is stored in the EA project and displayed in EA's built-in RTF viewer, from an RTF file, which is stored on disk.
Element.LoadLinkedDocument() is the only way to create a linked document. It reads an RTF file and stores its contents as the element's linked document. An element can only have one linked document, and I think it is overwritten if the method is called again but I'm not absolutely sure (you could get an error instead, but the EA API tends not to work that way).
In order to specify the contents of your linked document, you must create the file and then load it. The only other way would be to go hacking around in EA's internal and undocumented database, which people sometimes do but which I strongly advise against.
In .NET you can create RTF documents using Microsoft's Word API, but to my knowledge there is no corresponding API for Java. A quick search turns up jRTF, an open-source RTF library for Java. I haven't tested it but it looks as if it'll do the trick.
You can also use EA's API to create RTF data. You would then create your intended content in EA's internal display format and use Repository.GetFormatFromField() to convert it to RTF, which you would then save in the file.
If you need to, you can use Repository.GetFieldFromFormat() to convert plain-text or HTML-formatted text to EA's internal format.
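For example, a short sketch of that round trip; the "TXT" source-format specifier is my assumption based on the description above, so check the EA documentation for the exact supported values:

// Plain text -> EA internal format -> RTF ready to be written to the file.
// "TXT" is an assumed format specifier; "RTF" matches the code above.
String internal = repository.GetFieldFromFormat("TXT", longNoteText);
String rtfText = repository.GetFormatFromField("RTF", internal);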
I'm trying to index Wikipedia dumps. My SAX parser makes Article objects for the XML, with only the fields I care about, then sends them to my ArticleSink, which produces Lucene Documents.
I want to filter special/meta pages like those prefixed with Category: or Wikipedia:, so I made an array of those prefixes and test the title of each page against this array in my ArticleSink, using article.getTitle().startsWith(prefix). In English, everything works fine: I get a Lucene index with all the pages except those with matching prefixes.
In French, the prefixes with no accents also work (i.e. they filter the corresponding pages), but some of the accented prefixes don't work at all (like Catégorie:), and some work most of the time but fail on some pages (like Wikipédia:), yet I cannot see any difference between the corresponding lines (in less).
I can't really inspect all the differences in the file because of its size (5 GB), but it looks like correct UTF-8 XML. If I take a portion of the file using grep or head, the accents are correct (even on the incriminated pages: the <title>Catégorie:something</title> is correctly displayed by grep). On the other hand, when I recreate a wiki XML by tail/head-cutting the original file, the same page (here Catégorie:Rock par ville) gets filtered in the small file, but not in the original…
Any idea?
Alternatives I tried:
Getting the file (commented lines were tried without success*):
FileInputStream fis = new FileInputStream(new File(xmlFileName));
//ReaderInputStream ris = ReaderInputStream.forceEncodingInputStream(fis, "UTF-8" );
//(custom function opening the stream,
//reading it as UTF-8 into a Reader and returning another byte stream)
//InputSource is = new InputSource( fis ); is.setEncoding("UTF-8");
parser.parse(fis, handler);
Filtered prefixes:
ignoredPrefix = new String[] {"Catégorie:", "Modèle:", "Wikipédia:",
"Cat\uFFFDgorie:", "Mod\uFFFDle:", "Wikip\uFFFDdia:", //invalid char
"Catégorie:", "Modèle:", "Wikipédia:", // UTF-8 as ISO-8859-1
"Image:", "Portail:", "Fichier:", "Aide:", "Projet:"}; // those last always work
* ERRATUM
Actually, my bad: that one I tried does work; I had tested the wrong index:
InputSource is = new InputSource( fis );
is.setEncoding("UTF-8"); // force UTF-8 interpretation
parser.parse(is, handler);
Since you write the prefixes as plain strings in your source file, you want to make sure that you save that .java file as UTF-8 too (or any other encoding that supports the special characters you're using). Then, however, you have to tell the compiler which encoding the file is in, using the -encoding flag:
javac -encoding utf-8 *.java
For the XML source, you could try
Reader r = new InputStreamReader(new FileInputStream(xmlFileName), "UTF-8");
InputStreams do not deal with encodings, since they are byte-based, not character-based. So here we create a Reader from a FileInputStream: the latter (the stream) doesn't know about encodings, but the former (the reader) does, because we give the encoding in the constructor.
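To feed that Reader to the SAX parser, wrap it in an InputSource (a sketch reusing the parser and handler variables from the question):

import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import org.xml.sax.InputSource;

// The Reader decodes the bytes as UTF-8; the InputSource hands already-decoded
// characters to the SAX parser, so no encoding guesswork takes place.
Reader r = new InputStreamReader(new FileInputStream(xmlFileName), "UTF-8");
parser.parse(new InputSource(r), handler);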