jsoup unexpectedly fetching shopwiki image

I am using the following code to fetch images from the web:
import java.io.FileOutputStream;
import java.io.IOException;
import org.jsoup.Jsoup;
public class fetchImageTest {
public static void main(String[] args) throws Exception {
saveImage(args[0], args[1]);
}
private static boolean saveImage(String string, String destination) throws IOException {
string = string.replaceAll(" ", "%20");
try {
byte[] image = Jsoup.connect(string).ignoreContentType(true).timeout(10000).execute().bodyAsBytes();
FileOutputStream os = new FileOutputStream(destination);
os.write(image);
os.close();
return true;
}
catch (IOException e) {
System.out.println("couldn't open " + string);
return false;
}
catch (Exception e) {
System.out.println("couldn't open - general exception" + string);
return false;
}
}
}
Due to a bug in some of my other code, I tried to fetch an image from a broken URL, of the form:
http://shop.foo.comhttp://shop.foo.com/1.jpg
My code ended up fetching a shopwiki image instead.
I am using jsoup-1.7.1.jar. Is there a virus on my server? Is there a virus with my jsoup jar file?
I really have no idea ...

Several sites use a system to protect their images from being retrieved directly.
I guess you are trying to retrieve images from shopwiki.com.
I looked at the URL they use to serve a picture, and it shows that this kind of protection is in place:
http://si4.shopwiki.com/i/data/120x120/18/4/2/aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS81MVMwWTBuZHBjTC5qcGc=.jpg
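The last path segment of that URL appears to be Base64. Decoding it with the JDK's java.util.Base64 shows the address of the image that shopwiki is serving a copy of. The sketch below is only a way to inspect such a URL; the class and method names are made up for illustration, and it assumes the segment is plain Base64 followed by a file extension.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class ShopwikiUrlDecoder {

    // Hypothetical helper: decodes the last path segment of the URL,
    // assuming it is Base64 followed by an extension such as ".jpg".
    static String decodeLastSegment(String url) {
        String segment = url.substring(url.lastIndexOf('/') + 1);
        int dot = segment.lastIndexOf('.');
        if (dot >= 0) {
            segment = segment.substring(0, dot); // drop the trailing extension
        }
        byte[] decoded = Base64.getDecoder().decode(segment);
        return new String(decoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "http://si4.shopwiki.com/i/data/120x120/18/4/2/"
                + "aHR0cDovL2VjeC5pbWFnZXMtYW1hem9uLmNvbS9pbWFnZXMvSS81MVMwWTBuZHBjTC5qcGc=.jpg";
        // prints http://ecx.images-amazon.com/images/I/51S0Y0ndpcL.jpg
        System.out.println(decodeLastSegment(url));
    }
}
```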

Related

Cyrillic text coming from Document Properties is corrupt in PDF file in docx4j

I am trying to convert a docx to PDF using docx4j 3.7.7. The PDF is generated properly, but a docProperty containing Cyrillic text is not rendered: it comes out as #####. A normal paragraph with Cyrillic text is rendered properly. The issue is reproducible only on Linux; on Windows, the docProperty is converted properly.
The file for testing can be found here
file
Below is the code :
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.docx4j.Docx4J;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class TestRussian {
public static void main(String[] args) {
new TestRussian().convertWordToPdf();
}
public void convertWordToPdf() {
FileOutputStream fileOutputStream =null;
try {
File file = new File("Test1.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(file);
boolean checkViaFo = Docx4J.pdfViaFO();
FOSettings foSettings = Docx4J.createFOSettings();
fileOutputStream= new FileOutputStream("PDFRussian1.pdf");
foSettings.setWmlPackage(wordMLPackage);
//Getting error in update() during complex field update
//FieldUpdater updater = new FieldUpdater(wordMLPackage);
//updater.update(true);
Docx4J.toPDF(wordMLPackage,fileOutputStream);
System.out.println("Done");
} catch (Exception ex) {
} finally {
try {
if (fileOutputStream != null) {
fileOutputStream.close();
}
} catch (IOException e) {
}
}
}
}
I have read something about MERGEFORMAT and CHARFORMAT, but I don't really understand them.

How to Get Total Page Count From Tiff

I have started to create a new method in our project to return total pages. We are using TIFFTweaker which can be referenced from the following URL - https://github.com/dragon66/icafe/blob/master/src/com/icafe4j/image/tiff/TIFFTweaker.java
In this class I found a method TIFFTweaker.getPageCount() which looks like it wants a RandomAccessInputStream object for their getPageCount().
I've been playing around with trying to get from my file object over to what they're looking for.
What would be the best way to approach this and return the total pages from the tiff?
I have looked over some java docs, stackOverflow and some random blogs but can't seem to figure out how to get from a file object to a randomaccessinputstream.
@Override
public Integer totalPages(File file) {
Integer numberOfPages = 0;
try{
//TIFFTweaker.getPageCount(); - How to pass the file and get the count? Problem is type is a random access input stream and I have a file type
FileInputStream fileInputStream = new FileInputStream(file);
String absolutePath = file.getAbsolutePath();
// return TIFFTweaker.getPageCount();
}catch(IOException e){
log.error("Error with Tiff File" + e);
}
return null;
}
I am expecting a numeric value returned which represents the total number of pages in the TIFF file I'm passing.

Here is what I got to work. @roeygol, thanks for your answer. I had tried to import the dependency with Maven, but something was broken in that version. Here is what I came up with.
@Override
public Integer totalPages(File file) {
try(
InputStream fis = new FileInputStream(file);
RandomAccessInputStream randomAccessInputStream = new FileCacheRandomAccessInputStream(fis)
){
return TIFFTweaker.getPageCount(randomAccessInputStream);
}catch(IOException e){
log.error("Error with Tiff File" + e);
}
return null;
}

Try to use this code:
import java.io.File;
import java.io.IOException;
import java.awt.Frame;
import java.awt.image.RenderedImage;
import javax.media.jai.widget.ScrollingImagePanel;
import javax.media.jai.NullOpImage;
import javax.media.jai.OpImage;
import com.sun.media.jai.codec.SeekableStream;
import com.sun.media.jai.codec.FileSeekableStream;
import com.sun.media.jai.codec.TIFFDecodeParam;
import com.sun.media.jai.codec.ImageDecoder;
import com.sun.media.jai.codec.ImageCodec;
public class MultiPageRead extends Frame {
ScrollingImagePanel panel;
public MultiPageRead(String filename) throws IOException {
setTitle("Multi page TIFF Reader");
File file = new File(filename);
SeekableStream s = new FileSeekableStream(file);
TIFFDecodeParam param = null;
ImageDecoder dec = ImageCodec.createImageDecoder("tiff", s, param);
System.out.println("Number of images in this TIFF: " +
dec.getNumPages()); //<< use this function to get the number of pages of your TIFF
// Which of the multiple images in the TIFF file do we want to load
// 0 refers to the first, 1 to the second and so on.
int imageToLoad = 0;
RenderedImage op =
new NullOpImage(dec.decodeAsRenderedImage(imageToLoad),
null,
OpImage.OP_IO_BOUND,
null);
// Display the original in a 800x800 scrolling window
panel = new ScrollingImagePanel(op, 800, 800);
add(panel);
}
public static void main(String [] args) {
String filename = args[0];
try {
MultiPageRead window = new MultiPageRead(filename);
window.pack();
window.setVisible(true); // show() is deprecated
} catch (java.io.IOException ioe) {
System.out.println(ioe);
}
}
}
This code requires jai-codec:
https://mvnrepository.com/artifact/com.sun.media/jai-codec/1.1.3
The key method is getNumPages().
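If neither library can be added, the page count can also be approximated with the JDK alone. In a TIFF file each page is described by an image file directory (IFD), and the IFDs form a linked list that starts at the offset stored in the header. The sketch below is not how TIFFTweaker or JAI do it; it is a hand-written walk over that list, the class name is hypothetical, and it assumes a classic TIFF (not BigTIFF) that fits in memory. Sub-IFDs are not counted.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;

final class TiffPageCounter {

    // Counts the top-level IFDs (one per page) of a classic TIFF held in memory.
    static int countPages(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short to be a TIFF file");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte-order mark");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF (magic number is not 42)");
        }
        int pages = 0;
        long offset = buf.getInt(4) & 0xFFFFFFFFL; // offset of the first IFD
        while (offset != 0) {
            if (offset + 2 > tiff.length) {
                throw new IllegalArgumentException("IFD offset out of range");
            }
            int entries = buf.getShort((int) offset) & 0xFFFF;
            long nextPointer = offset + 2 + 12L * entries; // each entry is 12 bytes
            if (nextPointer + 4 > tiff.length) {
                throw new IllegalArgumentException("Truncated IFD");
            }
            pages++;
            if (pages > tiff.length / 6) { // an IFD needs at least 6 bytes
                throw new IllegalArgumentException("IFD chain loops back on itself");
            }
            offset = buf.getInt((int) nextPointer) & 0xFFFFFFFFL; // 0 ends the chain
        }
        return pages;
    }

    // Convenience overload; reads the whole file into memory first.
    static int countPages(Path file) throws IOException {
        return countPages(Files.readAllBytes(file));
    }
}
```

With a java.io.File in hand, the call would be TiffPageCounter.countPages(file.toPath()).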

Java - Logging into a website in order to retrieve data

What I am attempting to do is:
Login to a website in order to retrieve data that can only be accessed while logged on.
The website I need to login to is https://www.indemed.com.
I think that this is a two part program, part 1 being logging in, while part 2 is getting the information. When I run the login part of my program and then attempt to manually log in it says my account is in use, which I take to mean it is correctly logging in.
However when I try to get the price it is not there (if not logged in prices will not show up, but everything else will be there).
My questions are: Is there a problem with how I am combining my logging method and my retrieving method? Is the problem just with my logging method? Is the problem with just my retrieving method? Why doesn't this work? Most importantly, how can I fix this?
Here is what I have attempted so far:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class IndependenceMedical {
public IndependenceMedical(){
login();
}
private void login() {
URL URLObj;
URLConnection connect;
try {
// Establish a URL and open a connection to it. Set it to output mode.
URLObj = new URL("https://www.indemed.com/Action/Login/LoginAction.cfm?Refer=/index.cfm");
connect = URLObj.openConnection();
System.out.println(connect.toString());
connect.setDoOutput(true);
// Create a buffered writer to the URLConnection's output stream and write our forms parameters.
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(connect.getOutputStream()));
writer.write("AccountNumber=12345&UserName=myUserName&Password=myPassword&Login=Login");
writer.close();
// Now establish a buffered reader to read the URLConnection's input stream.
BufferedReader reader = new BufferedReader(new InputStreamReader(connect.getInputStream()));
String lineRead = "";
// Read all available lines of data from the URL and print them to screen.
while ((lineRead = reader.readLine()) != null) {
System.out.println(lineRead);
}
reader.close();
}
catch (MalformedURLException ex) {
System.out.println("The URL specified was unable to be parsed or uses an invalid protocol. Please try again.");
System.exit(1);
}
catch (Exception ex) {
System.out.println(ex.getMessage() + "\nAn exception occurred.");
System.exit(1);
}
}
public Document getDoc(String itemNumber){
try {
return Jsoup.connect("https://www.indemed.com/Catalog/SearchResults.cfm?source=advancedSearch&psku=" + itemNumber + "&keyword=&PHCPCS=&PClassID=&ManufacturerID=&Search.x=41&Search.y=9").get();
}
catch (IOException e) {}
return null;
}
public String getPrice(Document doc){
try{
Elements stuff = doc.select("#tr_51187955");
stuff = stuff.select("div.product-price");
String newStuff = stuff.toString();
newStuff = newStuff.substring(newStuff.indexOf("$")); // throws exception because "$" is not in the String.
newStuff = newStuff.substring(0, newStuff.indexOf(" "));
return newStuff;
}
catch (Exception arg0){
return "";
}
}
public static void main(String[] args){
IndependenceMedical test = new IndependenceMedical();
Document doc = test.getDoc("187955");
System.out.println("\n\n\n\n\n\n\n\n\n\n"); //to separate the return lines
System.out.println(test.getPrice(doc));
}
}
Due to character restrictions and the fact that I don't know which parts are important, I can't show the output. However if requested I will try to provide all the requested output.
Sorry for being so wordy I'm just trying to make sure the question is clear.
Lastly, I have thoroughly looked through other login questions, and although there are examples of how to log in, I can't seem to find how to do anything after logging in (I'm sure someone has talked about it, but I haven't been able to find it).
Thanks in advance to anyone that can help me with this.
EDIT:
Although this question is similar to Parse HTML source after login with Java
I'm not parsing the redirected page, I need access to all pages this grants access to.

Jsoup can handle the login itself: POST the form fields, keep the cookies from the response, and send those cookies with every later request.
Try the code below after filling in your username, password and account number.
import java.io.IOException;
import java.net.MalformedURLException;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Connection.Method;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
public class IndependenceMedical {
private Map<String, String> loginCookies;
public IndependenceMedical() {
login();
}
private void login() {
try {
Connection.Response res = Jsoup.connect("https://www.indemed.com/Action/Login/LoginAction.cfm?refer=MyAccount&qs=")
.data("UserName", "myUserName")
.data("Password", "myPassword")
.data("AccountNumber", "myAccountNumber")
.method(Method.POST)
.execute();
loginCookies = res.cookies();
} catch (MalformedURLException ex) {
System.out.println("The URL specified was unable to be parsed or uses an invalid protocol. Please try again.");
System.exit(1);
} catch (Exception ex) {
System.out.println(ex.getMessage() + "\nAn exception occurred.");
System.exit(1);
}
}
public Document getDoc(String itemNumber){
try {
return Jsoup.connect("https://www.indemed.com/Catalog/SearchResults.cfm?source=advancedSearch&psku=" + itemNumber + "&keyword=&PHCPCS=&PClassID=&ManufacturerID=&Search.x=41&Search.y=9")
.cookies(loginCookies)
.get();
} catch (IOException e) {}
return null;
}
public String getPrice(Document doc){
try {
Elements stuff = doc.select("#tr_51187955");
stuff = stuff.select("div.product-price");
String newStuff = stuff.toString();
newStuff = newStuff.substring(newStuff.indexOf("$")); // throws exception because "$" is not in the String.
newStuff = newStuff.substring(0, newStuff.indexOf(" "));
return newStuff;
} catch (Exception arg0) {
return "";
}
}
public static void main(String[] args){
IndependenceMedical test = new IndependenceMedical();
Document doc = test.getDoc("187955");
System.out.println("\n\n\n\n\n\n\n\n\n\n"); //to separate the return lines
System.out.println(test.getPrice(doc));
}
}
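
The getPrice method above cuts the price out with indexOf and substring, which throws when no "$" is present (for example when the login did not work and the page shows no price). A regular expression is one way to make that step more forgiving. This is only a sketch: the class name is made up, and it assumes prices look like $12.34 or $1,234.50. With jsoup, the input could be the result of Elements.text() rather than toString().

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PriceExtractor {

    // A dollar sign, digits with optional thousands separators, optional cents.
    private static final Pattern PRICE =
            Pattern.compile("\\$\\d+(?:,\\d{3})*(?:\\.\\d{1,2})?");

    // Returns the first price found in the text, or empty if there is none.
    static Optional<String> firstPrice(String text) {
        Matcher m = PRICE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

An empty result is then a cheap signal that the page was fetched without a valid session.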

Saving internet icon then re-opening causes EOF with image4j

I'm currently working on a project where I'm attempting to download a .ico file, but for some strange reason, I can't seem to open it programmatically once downloaded. I can however, open the image saved using any image editor or viewer. My code:
public static BufferedImage parseImageLocal(String url) throws IOException {
if (url.endsWith(".ico")) {
return ICODecoder.read(new File(url)).get(0);
} else if (url.endsWith(".bmp")) {
return BMPDecoder.read(new File(url));
} else {
return ImageIO.read(new File(url));
}
}
public static void saveImage(BufferedImage img, String path)
throws IOException {
File outputfile = new File(path.replace("http://", ""));
File parent = outputfile.getParentFile();
parent.mkdir();
if (!outputfile.exists()) {
outputfile.createNewFile();
}
if (path.endsWith(".ico")) {
ICOEncoder.write(img, outputfile);
} else if (path.endsWith(".bmp")) {
BMPEncoder.write(img, outputfile);
} else {
ImageIO.write(img, "png", outputfile);
}
}
And this is how I download images from the internet:
public static BufferedImage parseImage(String url) throws IOException {
URL dest = new URL(url);
if (url.endsWith(".ico")) {
return ICODecoder.read(dest.openStream()).get(0);
} else if (url.endsWith(".bmp")) {
return BMPDecoder.read(dest.openStream());
} else {
return ImageIO.read(dest);
}
}
The error is on this line:
return ICODecoder.read(new File(url)).get(0);

It "seems" that you are trying to download the icon from the internet, but you are trying to treat the URL as a File.
Basically, this isn't going to work: File won't be able to resolve a URL to an actual physical file.
Instead, you should be using ICODecoder#read(InputStream) and URL#openStream.
Something more like...
BufferedImage img = null;
// url being an instance of java.net.URL
try (InputStream is = url.openStream()) {
    // ICODecoder.read returns every image stored in the icon file
    List<BufferedImage> imgs = ICODecoder.read(is);
    img = imgs.isEmpty() ? null : imgs.get(0);
}
return img;
Updated with example
A web resource is not a File; you cannot access it as if it were one. Instead, you need to use the classes designed for interacting with the network.
For example...
import java.awt.EventQueue;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.List;
import javax.swing.ImageIcon;
import javax.swing.JOptionPane;
import javax.swing.UIManager;
import javax.swing.UnsupportedLookAndFeelException;
import net.sf.image4j.codec.ico.ICODecoder;
public class ReadFavicon {
public static void main(String[] args) {
new ReadFavicon();
}
public ReadFavicon() {
EventQueue.invokeLater(new Runnable() {
@Override
public void run() {
try {
UIManager.setLookAndFeel(UIManager.getSystemLookAndFeelClassName());
} catch (ClassNotFoundException | InstantiationException | IllegalAccessException | UnsupportedLookAndFeelException ex) {
}
try {
BufferedImage img = readIcon(new URL("https://secure.gravatar.com/favicon.ico"));
JOptionPane.showMessageDialog(null, "My FAVICON", "Icon", JOptionPane.PLAIN_MESSAGE, new ImageIcon(img));
} catch (IOException ex) {
ex.printStackTrace();
}
}
});
}
public BufferedImage readIcon(URL url) throws IOException {
BufferedImage img = null;
InputStream is = null;
try {
// url being an instance of java.net.URL
is = url.openStream();
List<BufferedImage> imgs = ICODecoder.read(is);
img = (imgs != null && !imgs.isEmpty()) ? imgs.get(0) : null;
} finally {
try {
is.close();
} catch (Exception exp) {
}
}
return img;
}
}
Update with some more ideas
Now, I could be wrong, but when I ran your code, I ran into a series of problems with the paths...
Let's assume the original url/path is https://secure.gravatar.com/favicon.ico, when you save the image, you do something like...
File outputfile = new File(path.replace("http://", ""));
File parent = outputfile.getParentFile();
parent.mkdir();
With our original path, this would result in an outputfile of https://secure.gravatar.com/favicon.ico, which is obviously wrong...
We can correct for this by using path.replace("https://", "") as well...
path = path.replace("http://", "");
path = path.replace("https://", "");
File outputfile = new File(path);
File parent = outputfile.getParentFile();
parent.mkdir();
Now, this results in an outputfile of secure.gravatar.com/favicon.ico. I become a little unstuck here, as I'm not sure if this is what you want... but it does work for me...
Now, when you read the file, you do something like this...
public static BufferedImage parseImage(String url) throws IOException {
URL dest = new URL(url);
if (url.endsWith(".ico")) {
return ICODecoder.read(dest.openStream()).get(0);
} else if (url.endsWith(".bmp")) {
return BMPDecoder.read(dest.openStream());
} else {
return ImageIO.read(dest);
}
}
Now, with no evidence to the contrary, I have to assume the url has not changed and is still https://secure.gravatar.com/favicon.ico... This means that new File("https://secure.gravatar.com/favicon.ico") will produce an invalid file reference.
So, again, I parsed the input...
url = url.replace("https://", "");
url = url.replace("http://", "");
File outputfile = new File(url);
String parentPath = outputfile.getParent();
String name = outputfile.getName();
url = parentPath + File.separator + name;
Which produces secure.gravatar.com\favicon.ico
This all downloaded, wrote and read without error.
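The scheme stripping above can also be done by parsing the address with java.net.URI and building the local path from its host and path parts, so the same helper serves both saving and re-reading. This is a sketch with a hypothetical class name. It assumes an absolute http or https URL and does not guard against ".." segments in the path.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

final class UrlToLocalPath {

    // Maps e.g. https://host/dir/file.ico to the relative path host/dir/file.ico.
    static Path toLocalPath(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        String path = uri.getPath();
        if (host == null || path == null || path.isEmpty()) {
            throw new IllegalArgumentException("Expected an absolute URL with a path: " + url);
        }
        // Drop the leading slash so the URL path is resolved below the host directory.
        String relative = path.startsWith("/") ? path.substring(1) : path;
        // Splitting on "/" lets Paths.get apply the platform's own separator.
        return Paths.get(host, relative.split("/"));
    }
}
```

Before writing, Files.createDirectories(target.getParent()) creates every missing directory level, whereas File#mkdir only creates the last one.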

Convert pdf to byte[] and vice versa with pdfbox

I've read the documentation and the examples, but I'm having a hard time putting it all together. I'm just trying to take a test PDF file, convert it to a byte array, convert the byte array back into a PDF, and write that PDF to disk.
It probably doesn't help much, but this is what I've got so far:
package javaapplication1;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSStream;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
public class JavaApplication1 {
private COSStream stream;
public static void main(String[] args) {
try {
PDDocument in = PDDocument.load("C:\\Users\\Me\\Desktop\\JavaApplication1\\in\\Test.pdf");
byte[] pdfbytes = toByteArray(in);
PDDocument out;
} catch (Exception e) {
System.out.println(e);
}
}
private static byte[] toByteArray(PDDocument pdDoc) throws IOException, COSVisitorException {
ByteArrayOutputStream out = new ByteArrayOutputStream();
try {
pdDoc.save(out);
pdDoc.close();
} catch (Exception ex) {
System.out.println(ex);
}
return out.toByteArray();
}
public void PDStream(PDDocument document) {
stream = new COSStream(document.getDocument().getScratchFile());
}
}

You can use Apache Commons, which is essential in any Java project IMO.
Then you can use FileUtils's readFileToByteArray(File file) and writeByteArrayToFile(File file, byte[] data).
(here is commons-io, which is where FileUtils is: http://commons.apache.org/proper/commons-io/download_io.cgi )
For example, I just tried this here and it worked beautifully.
try {
File file = new File("/example/path/contract.pdf");
byte[] array = FileUtils.readFileToByteArray(file);
FileUtils.writeByteArrayToFile(new File("/example/path/contract2.pdf"), array);
} catch (IOException e) {
e.printStackTrace();
}
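
If adding commons-io is not an option, the JDK's java.nio.file.Files class covers the same round trip with readAllBytes and write. The sketch below uses a made-up class name and, like the FileUtils version, treats the PDF as opaque bytes without parsing it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PdfByteRoundTrip {

    // Reads the whole file into memory; fine for test files, costly for huge ones.
    static byte[] readPdf(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // Creates the target file, or overwrites it if it already exists.
    static void writePdf(Path target, byte[] data) throws IOException {
        Files.write(target, data);
    }
}
```

A call such as writePdf(Paths.get("contract2.pdf"), readPdf(Paths.get("contract.pdf"))) then copies the document through a byte array.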
