Running a JavaScript command from MATLAB to fetch a PDF file - java

I'm currently writing some MATLAB code to interact with my company's internal reports database. So far I can access the HTML abstract page using code which looks like this:
import com.mathworks.mde.desk.*;
wb=com.mathworks.mde.webbrowser.WebBrowser.createBrowser;
wb.setCurrentLocation(ReportURL(8:end));
pause(1);
s={};
while isempty(s)
s=char(wb.getHtmlText);
pause(.1);
end
desk=MLDesktop.getInstance;
desk.removeClient(wb);
I can extract various pieces of information from the HTML text, which ends up in the variable s. However, the PDF of the report is accessed via what I believe is a JavaScript command (onClick="gotoFulltext('','[Report Number]')").
Any ideas as to how I execute this JavaScript command and get the contents of the PDF file into a MATLAB variable?
(MATLAB sits on top of Java, so I believe a Java solution would work...)

I think you should take a look at the JavaScript that is being called and see what the final request to the webserver looks like.
You can do this quite easily in Firefox using the Firebug plugin.
https://addons.mozilla.org/en-US/firefox/addon/1843
Once you have found the real server request then you can just request this URL or post to this URL instead of trying to run the JavaScript.
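Since MATLAB can call Java classes, requesting the real URL directly could look roughly like the sketch below. It is only a sketch: PdfFetcher and its method names are made up for illustration, and it assumes the report URL can be fetched without a login session or cookies from the browser. The point is that the response is copied byte for byte, with no text decoding, and then checked for the "%PDF-" signature that PDF files start with.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class PdfFetcher {
    // Copies every byte from the stream into memory without any text decoding.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: downloads whatever the given URL returns.
    // Assumes no authentication is needed for the request.
    static byte[] fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in);
        }
    }

    // A PDF file starts with the ASCII bytes "%PDF-".
    static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if (data[i] != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If looksLikePdf returns false for the downloaded bytes, the server probably answered with an HTML page (for example a login form) rather than the report itself.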

Once you have gotten the correct URL (a la the answer from pjp), your next problem is to "get the contents of the PDF file into a MATLAB variable". Whether or not this is possible may depend on what you mean by "contents"...
If you want to get the raw data in the PDF file, I don't think there is a way currently to do this in MATLAB. The URLREAD function was the first thing I thought of to read content from a URL into a string, but it has this note in the documentation:
"s = urlread('url') reads the content at a URL into the string s. If the server returns binary data, s will be unreadable."
Indeed, if you try to read a PDF as in the following example, s contains some text intermingled with mostly garbage:
s = urlread('http://samplepdf.com/sample.pdf');
If you want to get the text from the PDF file, you have some options. First, you can use URLWRITE to save the contents of the URL to a file:
urlwrite('http://samplepdf.com/sample.pdf','temp.pdf');
Then you should be able to use one of two submissions on The MathWorks File Exchange to extract the text from the PDF:
Extract text from a PDF document by Dimitri Shvorob
PDF Reader by Tom Gaudette
If you simply want to view the PDF, you can just open it in Adobe Acrobat with the OPEN function:
open('temp.pdf');

wb=com.mathworks.mde.webbrowser.WebBrowser.createBrowser;
wb.executeScript('javascript:alert(''Some code from a link'')');
desk=com.mathworks.mde.desk.MLDesktop.getInstance;
desk.removeClient(wb);

Related

How to put a HTML link in URLEncoder message

I am trying to create a String in java with a link in it.
The message reads like
String message ="Something happened please go back to Home and start again";
This message is ultimately encoded using
String msg = URLEncoder.encode(message,"UTF-8");
and displayed on a JSP page, but this message when rendered on JSP page looks like this.
Something happened please go back to Home and start again
That is, a plain string without an actual link in it.
I am not sure how to embed a link in a String message in Java.
This seems a lot like the issue discussed on this link:
https://www.talisman.org/~erlkonig/misc/lunatech%5Ewhat-every-webdev-must-know-about-url-encoding/#Donotuse%7B%7Bjava.net.URLEncoder%7D%7Dor%7B%7Bjava.net.URLDecoder%7D%7DforwholeURLs
The article says:
Do not use java.net.URLEncoder or java.net.URLDecoder for whole URLs
We are not kidding. These classes are not made to encode or decode
URLs, as their API documentation clearly says:
Utility class for HTML form encoding. This class contains static
methods for converting a String to the
application/x-www-form-urlencoded MIME format. For more information
about HTML form encoding, consult the HTML specification.
This is not about URLs. At best it resembles the query part encoding.
It is wrong to use it to encode or decode entire URLs. You would think
the standard JDK had a standard class to deal with URL encoding
properly (part by part, that is) but either it is not there, or we
have not found it, which lures a lot of people into using URLEncoder
for the wrong purpose.
I have formatted the relevant code from the page above and adjusted it with regards to your code:
String pathSegment = "link.com";
String message ="Something happened please go back to Home and start again";
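Separately from the article's code, here is a small sketch of what URLEncoder does to markup. The anchor tag and the /home path are made up for illustration, as are the class and method names. Form encoding percent-escapes the angle brackets, quotes and slashes, so a message that has been passed through it is no longer HTML that a browser could render as a link.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class EncodeDemo {
    // Applies application/x-www-form-urlencoded encoding, as URLEncoder does.
    static String formEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical message containing a link to a made-up /home path.
        System.out.println(formEncode("<a href=\"/home\">Home</a>"));
        // prints %3Ca+href%3D%22%2Fhome%22%3EHome%3C%2Fa%3E
    }
}
```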

Using the created document through FPDF with PHP/JAVA

I created a PDF document with PHP using FPDF. The next thing I want to do is silently print the document without downloading the PDF file to the computer.
I've made the following code:
$pdfprintable = $pdf->Output(''.'.pdf','S');
$printcmd = "java -classpath jPDFPrint.jar;pdfprintcli.jar cli.PDFPrintCLI $pdfprintable";
exec($printcmd);
And it returns the following error message:
Warning: exec(): NULL byte detected. Possible attack in C:\Users\Jordy\Desktop\XAMPP\htdocs\php\stickers\pdf.php on line 392
If I echo the $pdfprintable in PHP it shows a lot of weird characters.
Are you sure the Java command is supposed to be given a hexadecimal string representation of the PDF?
Use the F option to write the PDF to a file instead:
$pdfprintable = $pdf->Output('USEAFULLPATHTOFILE.pdf','F');
With the above, the PDF is written to disk, and you can then try to print that file with the Java application, if that application works.
Also, if you are loading the PDF correctly in FPDF, you should be able to use the D option in ->Output:
$pdfprintable = $pdf->Output('USEAFULLPATHTOFILE.pdf','D');
Use this to verify that the PDF is loaded and handled correctly by FPDF.
Also note that your example code is very limited.
If you need more troubleshooting, please show the Java source and the full PHP source relevant to the printing operation and to the loading or creation of the PDF in FPDF.

Blank PDF while downloading

I am facing a very strange issue. I am trying to send a PDF file as an attachment from my Struts application using the code below:
JasperReport jrReport = (JasperReport) JRLoader.loadObject(jasperReport);
JasperPrint jasperPrint = JasperFillManager.fillReport(jrReport, parameters, dataSource);
jasperPrint.setName(fileNameTobeGivenToExportedReport);
response.reset();
response.setContentType("application/pdf");
response.setHeader("Content-Disposition", "attachment; filename=\"" + fileNameTobeGivenToExportedReport + ".pdf" + "\"");
response.setHeader("Cache-Control", "private");
JasperExportManager.exportReportToPdfStream(jasperPrint, response.getOutputStream());
However, the downloaded PDF contains no data: it shows only a blank page.
When I added the lines below to the code above to save the PDF file to my D: drive:
File pdf = new File("D:\\sample22.pdf");
JasperExportManager.exportReportToPdfStream(jasperPrint, new FileOutputStream(pdf));
the generated file is correct, with all the data. One thing I noticed is that the file downloaded from the browser and "sample22.pdf" have the same size.
I read an article suggesting that this might be a server configuration issue, as our server might be corrupting the output stream. The article is "Creating PDF from Servlet".
The article says:
This can happen when your server flattens all bytes with a value higher than 127. Consult your web (or application) server manual to find out how to make sure binary data is sent correctly to the browser.
I am using Struts 1.x, JBoss 6 and iReport 1.2.
Suppose that you have a simple "Hello World" PDF document:
When you open this document, you see that the file structure uses ASCII characters, but that the actual content of the page is compressed to a binary stream:
You don't see the words "Hello World" anywhere, they are compressed along with the PDF syntax that contains info needed to draw these words on the page into this stream:
xœ+är
á26S°00SIá2PÐ5´ 1ôÝBÒ¸4<RsròÂó‹rR5C²€j#*\C¸¹ Çq°
Now suppose that a process changes all the non-ASCII characters into ASCII. I've done this manually, as you can see in the next screenshot:
I can still open the document, because I didn't change anything in the file structure: there is still a /Pages tree with a single /Page dictionary. From a syntactical point of view, the file looks OK, so I can open it in Adobe Reader:
As you can see, the words "Hello World" are gone. The stream containing the syntax to render these words was corrupted (in my case manually, in your case by the server, or by Struts, or by whatever process you are using that thinks you are creating plain text instead of a binary file).
What you need to do, is to find the place where this happens. Maybe Struts is the culprit. Maybe you are (unintentionally) using Struts as if you were creating a plain text file. It is hard to tell remotely. This is a typical problem caused by a configuration issue. Only somebody with access to your configuration can solve this.
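The effect can be reproduced in a few lines of Java. This is only a simulation under an assumption: it stands in for whatever layer is treating the response as text by round-tripping the bytes through the US-ASCII charset. The class and method names are made up. Every byte above 127 comes back as '?' (0x3F), while the total length stays the same, which would be consistent with a downloaded file that has the same size as the good one but renders blank.

```java
import java.nio.charset.StandardCharsets;

final class AsciiFlattenDemo {
    // Simulates a layer that handles binary output as ASCII text:
    // bytes above 127 cannot be decoded and come back as '?' (0x3F).
    static byte[] flatten(byte[] binary) {
        String asText = new String(binary, StandardCharsets.US_ASCII);
        return asText.getBytes(StandardCharsets.US_ASCII);
    }

    // Counts bytes whose unsigned value is above 127.
    static int countHighBytes(byte[] data) {
        int count = 0;
        for (byte b : data) {
            if ((b & 0xFF) > 127) {
                count++;
            }
        }
        return count;
    }
}
```

For example, flattening the bytes 78 9C 2B E4 72 (hex) gives 78 3F 2B 3F 72: the same length, but different content. Comparing countHighBytes for the file saved on disk and for the downloaded file is one way to check whether this kind of flattening is happening.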

Suppress Print Dialog when printing to Microsoft Document Image Writer from Oracle BPM 10g

We have an Oracle BPM 10g activity that:
Reads a form-fill protected Word document template.
Merges data into the fields.
Saves the merged/filled copy to the filesystem.
Prints the document to a selected, pre-defined printer, OR to the default printer.
All of this works fine when printing to a "real" printer. However, there is now a need to output the Word document to TIFF. Attempting to use "Microsoft Document Image Writer" as one of the printer selections does not work as expected. Normally, when printing to the Microsoft Document Image Writer from Word (or any other application) directly, you're prompted for a location to save the resultant file. This prompting does not occur when attempting to print from this particular activity in BPM 10g.
Ideally, we actually would like to bypass the dialog and output the TIFF directly to the filesystem. However, I have not found a way to control this programmatically. That is, being able to specify the destination filename in code. Right now, I'm just trying to get output to the Microsoft Document Image Writer at all, to make sure it works.
So, the bottom line question(s) is/are:
Can this be done? I.e., printing to Microsoft Document Image Writer
If yes, can the file location dialog be suppressed?
How?
You said nothing about the way you're automating Word.
In Word VBA, you may use this sample to print out the active document immediately without showing the print dialog:
Public Sub PrintToXPS()
'Presume that Microsoft XPS Document Writer was already
'set up as ActivePrinter
Dim strFilePath As String
strFilePath = "C:\temp\helloworld.xps"
ActiveDocument.PrintOut Background:=False, outputfilename:=strFilePath
End Sub
There's no need to use the print dialog for this. However, if you want to operate through the dialog object, that can be done in Word using a variable of type Word.Dialog and providing the necessary parameters, e.g.
Dim dlgFilePrint As Word.Dialog
Set dlgFilePrint = Application.Dialogs(wdDialogFilePrint)
dlgFilePrint.Update
dlgFilePrint.PrToFileName = strFilePath
dlgFilePrint.printtofile = True
'add other parameters as needed ...
'look up parameter names in Word VBA Online Help using "WdWordDialog-Enumeration"
'as key word
dlgFilePrint.Execute
What I did here with the XPS printer, you may of course do also with any other printer.
Thank you, domke consulting.
After more searching, I found this forum post on MSDN.
Adding these registry entries to suppress the dialog box and suppress post-generation output seemed to do the trick:
In HKEY_CURRENT_USER\Software\Microsoft\Office\12.0\MODI\MDI Writer
PrivateFlags = 17 (Decimal)
OpenInMODI = 0 (Decimal)
For our purposes, this seems to work fine if we call the printOut() method with the following relevant arguments (other arguments omitted here for brevity):
document.printOut(outputFileName : "C:\\temp\\fileName.tif", printToFile : true);

Selenium 2: Detect content type of link destinations

I am using the Selenium 2 Java API to interact with web pages. My question is: how can I detect the content type of link destinations?
This is the background: before clicking a link, I want to be sure that the response is an HTML file. If not, I need to handle it in another way. So, let's say there is a download link for a PDF file. The application should read the contents of that URL directly instead of opening it in the browser.
The goal is to have an application that automatically knows whether the current location is HTML, PDF, XML or something else, so that it can use the appropriate parser to extract useful information from the document.
Update
Added bounty: Will reward it to the best solution which allows me to get the content type of a given URL.
As Jochen suggests, the way to get the Content-Type without also downloading the content is HTTP HEAD, and the Selenium WebDrivers do not seem to offer that functionality. You'll have to find another library to fetch the content type of a URL.
A Java library that can do this is Apache HttpComponents, especially HttpClient.
(The following code is untested)
// Requires Apache HttpClient 4.x (classes from org.apache.http.*)
HttpClient httpclient = new DefaultHttpClient();
HttpHead httphead = new HttpHead("http://foo/bar");
HttpResponse response = httpclient.execute(httphead);
// getFirstHeader returns a Header, or null if the header is absent
Header contenttypeheader = response.getFirstHeader("Content-Type");
System.out.println(contenttypeheader);
The project publishes JavaDoc for HttpClient; the documentation for the HttpClient interface contains a nice example.
You can figure out the content type while processing the data coming in.
I'm not sure why you need to figure this out first.
If you do, use the HEAD method and look at the Content-Type header.
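A HEAD request can also be sent with nothing but the JDK. The sketch below uses java.net.HttpURLConnection; the class and method names are made up, and it assumes the link can be requested without the cookies or session that the Selenium browser holds. The mediaType helper strips parameters such as "; charset=UTF-8" so the result is easy to compare.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;

final class ContentTypeProbe {
    // Sends a HEAD request and returns the raw Content-Type header,
    // or null if the server did not send one.
    static String headContentType(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getContentType();
        } finally {
            conn.disconnect();
        }
    }

    // Strips parameters and lower-cases the media type,
    // e.g. "text/html; charset=UTF-8" becomes "text/html".
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0
                ? contentTypeHeader.substring(0, semicolon)
                : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }
}
```

The caller could then branch on mediaType(headContentType(href)), for instance treating "application/pdf" differently from "text/html". Some servers answer HEAD differently from GET, so the result is best treated as a hint.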
You can retrieve all the URLs from the DOM, and then parse the last few characters of each URL (using a java regex) to determine the link type.
You can parse the characters following the last dot. For example, in the URL http://yoursite.com/whatever/test.pdf, extract the pdf, and apply your test logic accordingly.
Am I oversimplifying your problem?
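As a sketch of the extension approach, the helper below takes the text after the last dot of the URL path, ignoring any query string or fragment. The class and method names are made up. Bear in mind that the extension is only a heuristic: a link such as /download?id=3 has no extension at all, and a server is free to serve any content type under any name.

```java
import java.net.URI;
import java.util.Locale;

final class LinkExtension {
    // Returns the lower-cased text after the last dot of the URL path,
    // or "" if the last path segment has no extension.
    // URI.create throws IllegalArgumentException for malformed URLs.
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        if (dot <= slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

For http://yoursite.com/whatever/test.pdf this returns "pdf".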
