PDF reading using PDFBox - clarification with page count - Java

I am reading a PDF file from a URL using PDFBox. The Java code below works: it reads one page of the PDF and stores it in the project location.
int pdfPageCount = 17;
String pdfUrl = "https://abc.org/invoicepdf.pdf?Range=1";
URL pdfDownload = new URL(pdfUrl);
HttpsURLConnection connectionGet = (HttpsURLConnection) pdfDownload.openConnection();
String authorizationHeader1 = "Bearer " + getToken; // getToken obtained elsewhere
connectionGet.setRequestProperty("Authorization", authorizationHeader1);
connectionGet.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
connectionGet.setRequestMethod("GET");
int responseCode = connectionGet.getResponseCode();
if (responseCode != 404) {
    InputStream inputstreamFinal1 = connectionGet.getInputStream();
    // Load the single downloaded page and save it locally
    PDDocument load = PDDocument.load(inputstreamFinal1);
    load.save("CopyOfInvoice1.pdf");
    load.close();
}
My next step
I want to loop this process based on the pdfPageCount value; currently the page number is hard-coded to 1 in the pdfUrl (/invoicepdf.pdf?Range=1).
Expected:
Read all 17 pages and save them into a single PDF file.

Here's some code, based on the PDFMergerExample that is mentioned in the comments. Note that I haven't checked if your URL retrieval code is correct.
List<InputStream> sources = new ArrayList<InputStream>();
int pdfPageCount = 17;
try
{
    // Download each page as a separate PDF and collect the input streams
    for (int p = 1; p <= pdfPageCount; ++p)
    {
        String pdfUrl = "https://abc.org/invoicepdf.pdf?Range=" + p;
        URL pdfDownload = new URL(pdfUrl);
        HttpsURLConnection connectionGet = (HttpsURLConnection) pdfDownload.openConnection();
        String authorizationHeader1 = "Bearer " + getToken;
        connectionGet.setRequestProperty("Authorization", authorizationHeader1);
        connectionGet.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        connectionGet.setRequestMethod("GET");
        int responseCode = connectionGet.getResponseCode();
        if (responseCode != 404)
        {
            sources.add(connectionGet.getInputStream());
        }
        else
        {
            //TODO error handling
            return;
        }
    }
    // Merge all downloaded pages into one output file
    PDFMergerUtility pdfMerger = new PDFMergerUtility();
    pdfMerger.addSources(sources);
    pdfMerger.setDestinationFileName("CopyOfInvoice1.pdf");
    pdfMerger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
}
catch (IOException e)
{
    //TODO error handling
    return;
}
finally
{
    // cleanup: close all page streams
    for (InputStream source : sources)
    {
        IOUtils.closeQuietly(source);
    }
}

Related

Downloading an image in Java

I have to download an image from the NASA website. The problem is that my code sometimes works, successfully downloading an image, while other times it saves only 186 B (I don't know why exactly 186).
The problem is surely connected with the way NASA shares those photos. For instance, an image from this link https://mars.jpl.nasa.gov/msl-raw-images/msss/00001/mcam/0001ML0000001000I1_DXXX.jpg is saved successfully, while this link https://mars.nasa.gov/mer/gallery/all/2/f/001/2F126468064EDN0000P1001L0M1-BR.JPG fails.
Here is my code:
public static void saveImage(String imageUrl, String destinationFile) {
    URL url;
    try {
        url = new URL(imageUrl);
        System.out.println(url);
        InputStream is = url.openStream();
        OutputStream os = new FileOutputStream(destinationFile);
        byte[] b = new byte[2048];
        int length;
        while ((length = is.read(b)) != -1) {
            os.write(b, 0, length);
        }
        is.close();
        os.close();
    } catch (MalformedURLException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
Does someone have an idea why it doesn't work? Here is the method that calls it:
public boolean downloadPhotosSol(int i) throws JSONException, IOException {
    String url0 = "https://api.nasa.gov/mars-photos/api/v1/rovers/spirit/photos?sol=" + this.chosenMarsDate + "&camera=" + this.chosenCamera + "&page=" + i + "&api_key=###";
    JSONObject json = JsonReader.readJsonFromUrl(url0);
    if (json.getJSONArray("photos").length() == 0) return true;
    String workspace = new File(".").getCanonicalPath();
    String pathToFolder = workspace + File.separator + this.getManifest().getName() + this.chosenMarsDate + this.chosenCamera + "Strona" + i;
    new File(pathToFolder).mkdirs();
    for (int j = 0; j < json.getJSONArray("photos").length(); j++) {
        String url = ((JSONObject) json.getJSONArray("photos").get(j)).getString("img_src");
        SaveImage.saveImage(url, pathToFolder + File.separator + "img" + j + ".jpg");
    }
    return false;
}
When you get a 186 byte file, open it with a text editor and see what is inside. It could contain an HTTP error message in HTML format. If instead you see the first 186 bytes of your image file, then something is not working right with your program.
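If you want to check this in code rather than in a text editor, a minimal sketch (my own illustration, not part of your program; it assumes plain HttpURLConnection and uses your imageUrl parameter) would inspect the status code and content type before writing anything to disk:
HttpURLConnection conn = (HttpURLConnection) new URL(imageUrl).openConnection();
int status = conn.getResponseCode();
String type = conn.getContentType();
// An error page or redirect notice usually comes back as text/html, not as image data
if (status != HttpURLConnection.HTTP_OK || (type != null && type.startsWith("text/html"))) {
    System.err.println("Not an image: HTTP " + status + ", Content-Type: " + type);
    return;
}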
EDIT: From your comments it looks like you are getting an HTTP 301 response, which is a redirect to another location. A web browser handles this automatically without you noticing. However, your Java program is not following the redirect to the new location. You need to use an HTTP Java library that handles redirects.
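As one possible way to do that, here is a hedged sketch using java.net.http.HttpClient (Java 11+); the method name saveImageFollowingRedirects is just an illustrative replacement for your saveImage:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Paths;

public static void saveImageFollowingRedirects(String imageUrl, String destinationFile) throws Exception {
    // HttpClient can follow the 301 automatically, unlike plain URL.openStream()
    HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
    HttpRequest request = HttpRequest.newBuilder(URI.create(imageUrl)).build();
    // Stream the response body straight into the destination file
    client.send(request, HttpResponse.BodyHandlers.ofFile(Paths.get(destinationFile)));
}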
The best and shortest way of doing it:
try (InputStream in = new URL("http://example.com/image.jpg").openStream()) {
    Files.copy(in, Paths.get("C:/File/To/Save/To/image.jpg"));
}

How to return JSON response from a URL returning HTML

First, some background:
I'm trying to solve a question asked by an interviewer recently. I had to write code that uses the URL below and returns a JSON response -
https://losangeles.craigslist.org/
This is what I did:
1) I created a web client and made an HTTP URL request to fetch an HTTP response.
public static JSONArray getSearchResults(String arg) {
    JSONArray jsonArray = null;
    try {
        QueryString qs = new QueryString("query", arg);
        URL url = new URL("https://toronto.craigslist.ca/search?" + qs);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/text");
        if (conn.getResponseCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + conn.getResponseCode());
        }
        BufferedReader br = new BufferedReader(new InputStreamReader(
                (conn.getInputStream())));
        String readAPIResponse = " ";
        StringBuilder output = new StringBuilder();
        while ((readAPIResponse = br.readLine()) != null) {
            output.append(readAPIResponse);
        }
        jsonArray = convertToJson(output);
        System.out.println(" JSON response : " + jsonArray.toString(2));
        conn.disconnect();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return jsonArray;
}
2) Below is my function to convert the response into JSON:
public static JSONArray convertToJson(StringBuilder response) {
    JSONArray jsonArr = new JSONArray();
    if (response != null) {
        try {
            Document document = Jsoup.parse(response.toString());
            Elements resultRows = document.getElementsByClass("result-row");
            JSONObject jsonObj;
            for (int i = 0; i < resultRows.size(); i++) {
                jsonObj = new JSONObject();
                Element e = resultRows.get(i);
                Elements resultsDate = e.getElementsByClass("result-date");
                Elements resultsTitle = e.getElementsByClass("result-title hdrlnk");
                String key1 = "date";
                String value1 = resultsDate.get(0).text();
                jsonObj.put(key1, value1);
                String key2 = "title";
                String value2 = resultsTitle.get(0).text();
                jsonObj.put(key2, value2);
                jsonArr.put(i, jsonObj);
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
    return jsonArr;
}
The response I received was the whole HTML page (I used Postman to make the requests). Since I only had a few hours to solve this question and was not sure how to parse an entire HTML page, I ended up using a third-party library called Jsoup. I was not 100% happy about it, but I had no other option.
I have not heard back from them, and I am curious whether this was the worst approach and, if so, what the better options are. They did not mention anything about what technology I could use, but since the skill set I was interviewing for involved Java/J2EE, I was thinking of implementing this in Java (not using Node.js, though).
Thanks!
If you only need an XML parser - and well-formed HTML can be parsed as XML - one is built into the JRE core API.
Even in the SE version, the packages needed for parsing exist:
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
Take a look at these classes; they are the most important for parsing or creating an XML/HTML file:
DocumentBuilderFactory
DocumentBuilder
Document
and here is a simple example for HTML:
String text = "<html><head>HEAD</head><body>BODY</body></html>";
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
ByteArrayInputStream input = new ByteArrayInputStream(text.getBytes("UTF-8"));
Document doc = builder.parse(input);
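As a small follow-up (purely illustrative, continuing the snippet above), the parsed Document can then be queried through the DOM API:
// Read the text content of the <body> element from the parsed document
Element body = (Element) doc.getElementsByTagName("body").item(0);
System.out.println(body.getTextContent()); // prints "BODY"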

Download all PDF files on a website

I'm trying to download all the PDF files on a website, and I have some bad code. I guess there is something better out there. Anyway, here it is:
try {
    System.out.println("Download started");
    URL getURL = new URL("http://cs.lth.se/eda095/foerelaesningar/?no_cache=1");
    URL pdf;
    URLConnection urlC = getURL.openConnection();
    InputStream is = urlC.getInputStream();
    BufferedReader buffRead = new BufferedReader(new InputStreamReader(is));
    FileOutputStream fos = null;
    byte[] b = new byte[1024];
    String line;
    double i = 1;
    int t = 1;
    int length;
    while ((line = buffRead.readLine()) != null) {
        while ((length = is.read(b)) > -1) {
            if (line.contains(".pdf")) {
                pdf = new URL("http://fileadmin.cs.lth.se/cs/Education/EDA095/2015/lectures/"
                        + "f" + i + "-" + t + "x" + t);
                fos = new FileOutputStream(new File("fil" + i + "-" + t + "x" + t + ".pdf"));
                fos.write(b, 0, line.length());
                i += 0.5;
                t += 1;
                if (t > 2) {
                    t = 1;
                }
            }
        }
    }
    is.close();
    System.out.println("Download finished");
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
The files I get are damaged. Is there a better way to download the PDF files? On this site the files happen to be named f1-1x1, f1-2x2, f2-1x1, and so on, but what if they were named donalds.pdf, stack.pdf, etc.?
So the question is: how do I make my code better at downloading all the PDF files?
Basically you are asking: "How can I parse HTML reliably, to identify all download links that point to PDF files?"
Anything else (like what you have right now, trying to anticipate how links would/could/should look) will be a constant source of grief, because any update to the web site, or running your code against a different web site, is very likely to break it. HTML is complex and comes in so many flavors that you should simply forget about "easy" solutions for analysing HTML content.
In that sense: learn how to use an HTML parser; a first starting point could be Which HTML Parser is the best? A sketch of that approach follows below.
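To make that concrete, here is a minimal sketch using Jsoup (my choice of parser; any HTML parser would do, and the selector and file handling are only illustrative, using the page URL from your code):
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class PdfLinkDownloader {
    public static void main(String[] args) throws Exception {
        // Parse the page and select every anchor whose href ends in ".pdf"
        Document page = Jsoup.connect("http://cs.lth.se/eda095/foerelaesningar/?no_cache=1").get();
        for (Element link : page.select("a[href$=.pdf]")) {
            String pdfUrl = link.absUrl("href"); // resolves relative links against the page URL
            String fileName = pdfUrl.substring(pdfUrl.lastIndexOf('/') + 1);
            try (InputStream in = new URL(pdfUrl).openStream()) {
                // Save each PDF under its own name, whatever that name happens to be
                Files.copy(in, Paths.get(fileName), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}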

Restrict the AutoCompleteTextView predictions to only restaurants

I've got an Android app that allows the user to perform a details search based on a restaurant's name. However, depending on the user's input, the predictions can contain places, countries, etc. Where can I add a restriction so that only restaurant names are checked?
My current code:
public static ArrayList<String> autocomplete(String input) {
    ArrayList<String> resultList = null;
    HttpURLConnection conn = null;
    StringBuilder jsonResults = new StringBuilder();
    try {
        StringBuilder sb = new StringBuilder(PLACES_API_BASE
                + TYPE_AUTOCOMPLETE + OUT_JSON);
        sb.append("?key=" + API_KEY);
        sb.append("&input=" + URLEncoder.encode(input, "utf8"));
        URL url = new URL(sb.toString());
        System.out.println("URL: " + url);
        conn = (HttpURLConnection) url.openConnection();
        InputStreamReader in = new InputStreamReader(conn.getInputStream());
        // Load the results into a StringBuilder
        int read;
        char[] buff = new char[1024];
        while ((read = in.read(buff)) != -1) {
            jsonResults.append(buff, 0, read);
        }
    } catch (MalformedURLException e) {
        Log.e(LOG_TAG, "Error processing Places API URL", e);
        return resultList;
    } catch (IOException e) {
        Log.e(LOG_TAG, "Error connecting to Places API", e);
        return resultList;
    } finally {
        if (conn != null) {
            conn.disconnect();
        }
    }
    try {
        // Create a JSON object hierarchy from the results
        JSONObject jsonObj = new JSONObject(jsonResults.toString());
        JSONArray predsJsonArray = jsonObj.getJSONArray("predictions");
        resultList = new ArrayList<String>(predsJsonArray.length());
        place = new HashMap<String, String>();
        for (int i = 0; i < predsJsonArray.length(); i++) {
            // System.out.println(predsJsonArray.getJSONObject(i).getString("description"));
            // System.out.println("============================================================");
            resultList.add(predsJsonArray.getJSONObject(i).getString(
                    "description"));
            String description = predsJsonArray.getJSONObject(i).getString("description");
            String placeId = predsJsonArray.getJSONObject(i).getString("place_id");
            place.put(description, placeId);
        }
    } catch (JSONException e) {
        Log.e(LOG_TAG, "Cannot process JSON results", e);
    }
    return resultList;
}
According to place types, the closest you can get is the establishment filter. You would have to perform a place search rather than an autocomplete in order to use the restaurant filter.
Using the Places API for Android PlaceComplete sample, I tried passing Place.TYPE_ESTABLISHMENT and Place.TYPE_RESTAURANT to AutocompleteFilter.create to verify this.
Try adding a place type to your search; use this link to find the supported types. A sketch of the change is shown below.
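For the web-service request in your autocomplete() method, that would mean appending a types parameter to the URL; a rough sketch, reusing your own constants (to my knowledge establishment is the narrowest autocomplete filter, as there is no restaurant type for autocomplete):
StringBuilder sb = new StringBuilder(PLACES_API_BASE + TYPE_AUTOCOMPLETE + OUT_JSON);
sb.append("?key=" + API_KEY);
sb.append("&types=establishment"); // closest supported autocomplete type filter
sb.append("&input=" + URLEncoder.encode(input, "utf8"));
URL url = new URL(sb.toString());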

StreamResource, ByteArray problems

I can't understand why my code does not run every time.
I am opening a Jasper report, but for the first 4 openings the report is cached or the code is not executed (the code in the new StreamResource is not executed the first 4 times). new StreamResource.StreamSource() runs only on the 5th attempt. Why? For the first 4 times I get the old (cached, temporary, I don't even know what) PDF file with the old parameters.
Maybe someone knows the issue?
public static void open(final String fileName, final HashMap<String, Object> data) {
    mylog.pl("### Param's print # open Report: Filename:" + fileName);
    try {
        Iterator<?> i = data.keySet().iterator();
        while (i.hasNext()) {
            String id = i.next().toString();
            String value = (data.get(id) != null) ? data.get(id).toString() : "null";
            mylog.pl(" id: " + id + " value: " + value);
        }
    } catch (Exception e) {
        e.printStackTrace();
        mylog.pl(e.getMessage());
    }
    StreamResource.StreamSource source = null;
    source = new StreamResource.StreamSource() {
        public InputStream getStream() {
            byte[] b = null;
            InputStream reportStream = null;
            try {
                reportStream = new BufferedInputStream(new FileInputStream(PATH + fileName + JASPER));
                b = JasperRunManager.runReportToPdf(reportStream, data, new JREmptyDataSource());
            } catch (JRException ex) {
                ex.printStackTrace();
                mylog.pl("Err # JR" + ex.getMessage());
            } catch (FileNotFoundException e) {
                e.printStackTrace();
                Utils.showMessage(SU.NOTFOUND);
                return null;
            }
            return new ByteArrayInputStream(b);
        }
    };
    StreamResource resource = null;
    resource = new StreamResource(source, fileName + PDF);
    resource.setMIMEType("application/pdf");
    Page p = Page.getCurrent();
    p.open(resource, "Report", false);
}
Here is the answer.
I had been using resource.setCacheTime(0); but I actually needed resource.setCacheTime(1000);, because:
In theory <= 0 disables caching. In practice Chrome, Safari (and, apparently, IE) all ignore <= 0.
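In the open() method above, as I understand the fix, that comes down to setting a short positive cache time on the resource; a minimal sketch of the changed lines:
StreamResource resource = new StreamResource(source, fileName + PDF);
resource.setMIMEType("application/pdf");
// A short positive cache time instead of 0, so browsers that ignore <= 0 still re-request the PDF
resource.setCacheTime(1000);
Page.getCurrent().open(resource, "Report", false);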
