How to check if a link is a download link using Java?

I'm using Selenium to get a link from an <a> element, and I want to check whether it is a download link.
For that I used this code, which I wrote with URL and URLConnection:
final WebElement element = driver.findElement(By.xpath(pathToFile));
URL url = null;
final String urlFileToDownload = element.getAttribute("href");
URLConnection myCon = null;
String contentDisposition = "";
try {
    url = new URL(urlFileToDownload);
    myCon = url.openConnection();
    contentDisposition = myCon.getHeaderField("Content-Disposition");
    if (!contentDisposition.contains("attachment;filename=")) {
        assertTrue(false, "The link isn't a download link.");
    }
} catch (final MalformedURLException e) {
    throw new TestIntegrationException("Error while creating URL: " + e.getMessage());
} catch (final IOException e) {
    throw new TestIntegrationException("Error while connecting to the URL: " + e.getMessage());
}
assertTrue(true, "Link is a download link.");
The problem is that my link is a download link, as you can see in this screenshot of the browser console: Image-link-download.
And when I open the connection with url.openConnection(), myCon.getHeaderField("Content-Disposition") is null.
I've searched for a way to do this, but every time my header field is empty, and I can't find the problem, because when I check in the console the header field isn't empty...
EDIT: I'm launching my Selenium tests on a Docker server; I think that's an important point to know.

Try this:
driver.get("https://i.stack.imgur.com/64qFG.png");
WebElement img = wait5s.until(ExpectedConditions.elementToBeClickable(By.xpath("/html/body/img")));
Dimension h = img.getSize();
Assert.assertNotEquals(0, h.getHeight()); // compare a dimension value, not the Dimension object (which never equals 0)

Instead of looking for attachments, why don't you look at the MIME type?
String contentType = myCon.getContentType();
if (contentType.startsWith("text/")) {
    assertTrue("The link isn't a download link.", false);
}

My problem was caused by my session, which was different from the one used by url.openConnection().
To correct the problem I collected the JSESSIONID cookie using Selenium, like this:
String cookieTarget = null;
for (final Cookie cookie : this.kSupTestCase.getDriver().manage().getCookies()) {
    if (StringUtils.equalsIgnoreCase(cookie.getName(), "JSESSIONID")) {
        cookieTarget = cookie.getName() + "=" + cookie.getValue();
        break;
    }
}
Then I added the cookie to the opened connection:
try {
    url = new URL(urlFileToDownload);
    myCon = url.openConnection();
    myCon.setRequestProperty("Cookie", cookieTarget);
    contentDisposition = myCon.getHeaderField("Content-Disposition");
    if (!contentDisposition.contains("attachment;filename=")) {
        assertTrue(false, "The link isn't a download link.");
    }
} catch [...]
With that I had the right session, and my URL was recognized as a download link.
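A condensed sketch of the whole approach, with all names (driver, href) assumed rather than taken from the original post:

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;

// Hypothetical helper: re-use the browser's cookies so the header check
// runs in the same session as the Selenium-driven page.
static boolean isDownloadLink(WebDriver driver, String href) throws IOException {
    StringBuilder cookieHeader = new StringBuilder();
    for (Cookie cookie : driver.manage().getCookies()) {
        if (cookieHeader.length() > 0) {
            cookieHeader.append("; ");
        }
        cookieHeader.append(cookie.getName()).append('=').append(cookie.getValue());
    }
    URLConnection con = new URL(href).openConnection();
    con.setRequestProperty("Cookie", cookieHeader.toString());
    String disposition = con.getHeaderField("Content-Disposition");
    return disposition != null && disposition.contains("attachment");
}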

Related

OAuth with CData JDBC Driver for XML - Files on Google Drive - CallbackURL not Used

I am using the CData JDBC Driver for XML to read XML files into my Java application; some of those files are on Google Drive, so OAuth is needed.
I am following the "Authenticate to XML from a Web Application" flow specified on the CData website.
The first step is to get the OAuth authorization URL using the GetOAuthAuthorizationURL stored procedure.
Here is my code:
try {
    Class.forName("cdata.jdbc.xml.XMLDriver");
} catch (ClassNotFoundException e1) {
}
String url = "";
Properties prop = new Properties();
prop.setProperty("InitiateOAuth", "OFF");
prop.setProperty("OAuthClientId", "my-client-id");
prop.setProperty("OAuthClientSecret", "my-client-secret");
prop.setProperty("CallbackURL", redirectUri);
prop.setProperty("OAuthAuthorizationUrl", "https://accounts.google.com/o/oauth2/auth?scope=https://www.googleapis.com/auth/drive.readonly");
try (Connection connection = DriverManager.getConnection("jdbc:xml:", prop)) {
    CallableStatement cstmt = connection.prepareCall("GetOAuthAuthorizationURL");
    boolean ret = cstmt.execute();
    if (ret) {
        ResultSet rs = cstmt.getResultSet();
        while (rs.next()) {
            for (int i = 1; i <= rs.getMetaData().getColumnCount(); i++) {
                System.out.println(rs.getMetaData().getColumnName(i) + "=" + rs.getString(i));
                if (StringUtils.equals(rs.getMetaData().getColumnName(i), "URL"))
                    url = rs.getString(i);
            }
        }
    }
} catch (SQLException e) {
    e.printStackTrace();
}
The redirect_uri parameter in the returned URL is always set to the default value [127.0.0.1] instead of the CallbackURL I pass as a property to the JDBC driver:
prop.setProperty("CallbackURL", redirectUri);
How can this be fixed?
This was the way to do it:
cstmt.setString("CallbackURL", redirectUri);
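In context, that call presumably replaces the property and goes before execute(); a sketch based on the code above:

CallableStatement cstmt = connection.prepareCall("GetOAuthAuthorizationURL");
cstmt.setString("CallbackURL", redirectUri); // pass the redirect URI as a named procedure parameter
boolean ret = cstmt.execute();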

Save file from a website with java

I'm trying to build a jsoup-based Java app to automatically download English subtitles for films (I'm lazy, I know; it was inspired by a similar Python-based app). It's supposed to ask you the name of the film and then download an English subtitle for it from Subscene.
I can make it reach the download link, but I get an "Unhandled content type" error when I try to 'go' to that link. Here's my code:
public static void main(String[] args) {
    try {
        String videoName = JOptionPane.showInputDialog("Title: ");
        subscene(videoName);
    } catch (Exception e) {
        System.out.println(e.getMessage());
    }
}

public static void subscene(String videoName) {
    try {
        String siteName = "http://www.subscene.com";
        String[] splits = videoName.split("\\s+");
        String codeName = "";
        String text = "";
        if (splits.length > 1) {
            for (int i = 0; i < splits.length; i++) {
                codeName = codeName + splits[i] + "-";
            }
            videoName = codeName.substring(0, videoName.length());
        }
        System.out.println("videoName is " + videoName);
        // String url = "http://www.subscene.com/subtitles/" + videoName + "/english";
        String url = "http://www.subscene.com/subtitles/title?q=" + videoName + "&l=";
        System.out.println("url is " + url);
        Document doc = Jsoup.connect(url).get();
        Element exact = doc.select("h2.exact").first();
        Element yuel = exact.nextElementSibling();
        Elements lis = yuel.children();
        System.out.println(lis.first().children().text());
        String hRef = lis.select("div.title > a").attr("href");
        hRef = siteName + hRef + "/english";
        System.out.println("hRef is " + hRef);
        doc = Jsoup.connect(hRef).get();
        Element nonHI = doc.select("td.a40").first();
        Element papa = nonHI.parent();
        Element link = papa.select("a").first();
        text = link.text();
        System.out.println("Subtitle is " + text);
        hRef = link.attr("href");
        hRef = siteName + hRef;
        Document subDownloadPage = Jsoup.connect(hRef).get();
        hRef = siteName + subDownloadPage.select("a#downloadButton").attr("href");
        Jsoup.connect(hRef).get(); // <-- Here's where the problem lies
    } catch (java.io.IOException e) {
        System.out.println(e.getMessage());
    }
}
Can someone please help me, so I don't have to download subs manually?
I just found out that using
java.awt.Desktop.getDesktop().browse(java.net.URI.create(hRef));
instead of
Jsoup.connect(hRef).get();
downloads the file after prompting me to save it. But I don't want to be prompted, because then I won't be able to read the name of the downloaded zip file (I want to unzip it after saving, using Java).
Assuming that your files are small, you can do it like this. Note that you can tell Jsoup to ignore the content type.
// get the file content
Connection connection = Jsoup.connect(path);
connection.timeout(5000);
Connection.Response resultImageResponse = connection.ignoreContentType(true).execute();
// save to file
FileOutputStream out = new FileOutputStream(localFile);
out.write(resultImageResponse.bodyAsBytes());
out.close();
I would recommend verifying the content before saving, because some servers will just return an HTML page when the file cannot be found, i.e. on a broken hyperlink.
...
String body = resultImageResponse.body();
if (body == null || body.toLowerCase().contains("<body>")) {
    throw new IllegalStateException("invalid file content");
}
...
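Another option (a sketch; contentType() is jsoup's accessor for the response's Content-Type header) is to reject HTML before writing the bytes:

String contentType = resultImageResponse.contentType(); // e.g. "application/zip"
if (contentType != null && contentType.startsWith("text/html")) {
    throw new IllegalStateException("got an HTML page instead of a file");
}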
Here:
Document subDownloadPage = Jsoup.connect(hRef).get();
hRef = siteName+subDownloadPage.select("a#downloadButton").attr("href");
//specifically here
Jsoup.connect(hRef).get();
It looks like jsoup expects the result of Jsoup.connect(hRef) to be HTML or some text that it's able to parse; that's why the message states:
Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml
I followed the execution of your code manually, and the last URL you're trying to access returns a content type of application/x-zip-compressed, hence the exception.
In order to download this file, you should use a different approach. You could use the old but still useful URLConnection/URL, or a third-party library like Apache HttpComponents, to fire a GET request, retrieve the result as an InputStream, and copy it to a file on disk.
Here's an example of doing this using URL:
URL url = new URL(hRef);
InputStream in = url.openStream();
OutputStream out = new BufferedOutputStream(new FileOutputStream("D:\\foo.zip"));
final int BUFFER_SIZE = 1024 * 4;
byte[] buffer = new byte[BUFFER_SIZE];
BufferedInputStream bis = new BufferedInputStream(in);
int length;
// read() returns -1 at end of stream, so test against -1 rather than > 0
while ((length = bis.read(buffer)) != -1) {
    out.write(buffer, 0, length);
}
out.close();
in.close();
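The same copy can be sketched with try-with-resources (Java 7+) so both streams are closed even if the copy fails midway:

try (InputStream in = new BufferedInputStream(new URL(hRef).openStream());
     OutputStream out = new BufferedOutputStream(new FileOutputStream("D:\\foo.zip"))) {
    byte[] buffer = new byte[1024 * 4];
    int length;
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
}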

detecting missing jpegs over the internet

The problem: my application reads in JPEGs and displays them on a JLabel (these are pictures of books).
Everything works fine with the local version, e.g. reading from the C drive, but once I try to do this over the internet, problems occur that I have tried without success to correct.
Scenario
Should the JPEG not be present at the end of the URL, I get the following error:
javax.imageio.IIOException: Can't get input stream from URL!
In the version that reads from the local drive I detect whether the file exists and overcome this problem; however, I have tried lots of the posted ideas and I simply can't find out how to detect that the JPEG is absent!
Please can someone help.
Here are the two versions of the code.
Read from local drive C
private void showcover() {
    String stockPic;
    String partofISBN;
    String completeurl;
    jButton9.setVisible(true);
    stockPic = jTextField1.getText(); // get the current isbn
    partofISBN = stockPic.substring(0, 7); // get first 7 numbers
    String picUrl;
    stockPic = stockPic + localNumber + ".jpg";
    picUrl = partofISBN + "\\" + stockPic;
    completeurl = "C:\\Apicture\\" + picUrl;
    File pf = new File(completeurl);
    if (!pf.exists()) {
        jLabel9.setIcon(new ImageIcon("C:\\Apicture\\" + picUrl));
        jLabel9.setIcon(new ImageIcon("C:\\Apicture\\nojpegs.jpg"));
        jLabel9.setText("NO Jpeg");
    }
    jLabel9.setIcon(new ImageIcon(completeurl));
}
Adaptation to read from URL
URL url;
url = new URL("http://ebid.s3.amazonaws.com/upload_big/9/1/1/1401018425-17770-385.jpg");
Image image = null;
try {
    image = ImageIO.read(url);
} catch (IOException ex) {
    Logger.getLogger(baseframe.class.getName()).log(Level.SEVERE, null, ex);
}
} catch (MalformedURLException ex) {
    Logger.getLogger(baseframe.class.getName()).log(Level.SEVERE, null, ex);
}
javax.imageio.IIOException means you are not getting the image.
So add more code to fail over to an alternate URL or local file in your catch (IOException) block.
Send an HTTP HEAD request and check for response code 404 (Not Found); HTTP is your friend.
Find out whether the resource exists before you GET it.
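A minimal sketch of that existence check, assuming plain HTTP and no authentication:

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// HEAD asks only for the status line and headers, not the image bytes.
static boolean imageExists(String imageUrl) {
    try {
        HttpURLConnection con = (HttpURLConnection) new URL(imageUrl).openConnection();
        con.setRequestMethod("HEAD");
        return con.getResponseCode() == HttpURLConnection.HTTP_OK; // 404 -> missing
    } catch (IOException e) {
        return false; // treat connection failures as "not available"
    }
}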
You may want to check this answer I just provided: I can't download a specific image using java code
It seems like Amazon is also checking your user agent, so you may have to put something like this at the beginning of your code:
System.setProperty("http.agent", "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
edit: "at the beginning of your code" really means something like "before creating any URL-related object". I mean, I wasn't referring only to the code you posted, but to your whole application.
Use this (note that java.net.URL has no setRequestProperty method; the header must be set on the URLConnection instead):
Image image = null;
try {
    URL url = new URL("http://ebid.s3.amazonaws.com/upload_big/9/1/1/1401018425-17770-385.jpg");
    URLConnection conn = url.openConnection();
    // the User-Agent header is set on the connection, not on the URL
    conn.setRequestProperty("User-Agent", "Mozilla/5.0 (Windows NT 5.1; rv:19.0) Gecko/20100101 Firefox/19.0");
    image = ImageIO.read(conn.getInputStream());
} catch (MalformedURLException ex) {
    Logger.getLogger(baseframe.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
    Logger.getLogger(baseframe.class.getName()).log(Level.SEVERE, null, ex);
}

Saving the first Image from URL

Here's my problem: I have a txt file called "sites.txt". In it I type random internet sites. My goal is to save the first image of each site. I tried to filter the server response by the img tag, and it actually works for some sites, but not for others.
On the sites where it works, the img src starts with http://; on the sites where it doesn't, it starts with anything else.
I also tried to add the http:// to the img src values which didn't have it, but I still get the same error:
Exception in thread "main" java.net.MalformedURLException: no protocol:
at java.net.URL.<init>(Unknown Source)
My current code is:
public static void main(String[] args) throws IOException {
    try {
        File file = new File("sites.txt");
        Scanner scanner = new Scanner(file);
        String url;
        int counter = 0;
        while (scanner.hasNext()) {
            url = scanner.nextLine();
            URL page = new URL(url);
            URLConnection yc = page.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
            String inputLine = in.readLine();
            while (!inputLine.toLowerCase().contains("img")) inputLine = in.readLine();
            in.close();
            String[] parts = inputLine.split(" ");
            int i = 0;
            while (!parts[i].contains("src")) i++;
            String destinationFile = "image" + (counter++) + ".jpg";
            saveImage(parts[i].substring(5, parts[i].length() - 1), destinationFile);
            String tmp = scanner.nextLine();
            System.out.println(url);
        }
        scanner.close();
    } catch (FileNotFoundException e) {
        System.out.println("File not found!");
        System.exit(0);
    }
}

public static void saveImage(String imageUrl, String destinationFile) throws IOException {
    URL url = new URL(imageUrl);
    String fileName = url.getFile();
    String destName = fileName.substring(fileName.lastIndexOf("/"));
    System.out.println(destName);
    InputStream is = url.openStream();
    OutputStream os = new FileOutputStream(destinationFile);
    byte[] b = new byte[2048];
    int length;
    while ((length = is.read(b)) != -1) {
        os.write(b, 0, length);
    }
    is.close();
    os.close();
}
I also got a tip to use the Apache Jakarta HTTP client libraries, but I have absolutely no idea how I could use them. I would appreciate any help.
A URL (a type of URI) requires a scheme in order to be valid; in this case, http.
When you type www.google.com into your browser, the browser infers that you mean http:// and automatically prepends it for you. Java doesn't do this, hence your exception.
Make sure you always have http://. You can easily fix this using a regex:
String fixedUrl = stringUrl.replaceAll("^((?!http://).{7})", "http://$1");
or
if (!stringUrl.startsWith("http://"))
    stringUrl = "http://" + stringUrl;
An alternative solution
Simply try ImageIO, which contains static convenience methods for locating ImageReaders and ImageWriters, and for performing simple encoding and decoding.
Sample code:
// read a image from the URL
// I used the URL that is your profile pic on StackOverflow
BufferedImage image = ImageIO.read(new URL(
        "https://www.gravatar.com/avatar/3935223a285ab35a1b21f31248f1e721?s=32&d=identicon&r=PG&f=1"));
// save the image
ImageIO.write(image, "jpg", new File("resources/avatar.jpg"));
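One caveat worth adding: ImageIO.read returns null (rather than throwing) when no registered reader can decode the stream, so a null check avoids a NullPointerException later:

if (image == null) { // no registered ImageReader could decode the content
    throw new IOException("URL did not yield a decodable image");
}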
When you're scraping the site's HTML for image elements and their src attributes, you'll run into several different representations of URLs.
Some examples are:
resource = https://google.com/images/srpr/logo9w.png
resource = google.com/images/srpr/logo9w.png
resource = //google.com/images/srpr/logo9w.png
resource = /images/srpr/logo9w.png
resource = images/srpr/logo9w.png
For the second through fifth ones, you'll need to build the rest of the URL.
The second one may be more difficult to differentiate from the fourth and fifth ones, but I'm sure there are workarounds. The URL Standard leads me to believe you won't see it as often, because I don't think it's technically valid.
The third case is pretty simple. If the resource variable starts with //, then you just need to prepend the protocol/scheme to it. You can do this with the site object you have:
url = site.getProtocol() + ":" + resource
For the fourth and fifth cases, you'll need to prepend the resource with the entire site's URL.
Here's a sample application that uses jsoup to parse the HTML, and a simple utility method to build the resource URL. You're interested in the buildResourceUrl method. Also, it doesn't handle the second case; I'll leave that to you.
import java.io.*;
import java.net.*;
import org.jsoup.*;
import org.jsoup.nodes.*;
import org.jsoup.select.*;

public class SiteScraper {

    public static void main(String[] args) throws IOException {
        URL site = new URL("https://google.com/");
        Document doc = Jsoup.connect(site.toString()).get();
        Elements images = doc.select("img");
        for (Element image : images) {
            String src = image.attr("src");
            System.out.println(buildResourceUrl(site, src));
        }
    }

    static URL buildResourceUrl(URL site, String resource)
            throws MalformedURLException {
        if (!resource.matches("^(http|https|ftp)://.*$")) {
            if (resource.startsWith("//")) {
                return new URL(site.getProtocol() + ":" + resource);
            } else {
                return new URL(site.getProtocol() + "://" + site.getHost() + "/"
                        + resource.replaceAll("^/", ""));
            }
        }
        return new URL(resource);
    }
}
This obviously won't cover everything, but it's a start. You may run into problems when the URL you're trying to access is in a subdirectory of the root of the site (i.e., http://some.place/under/the/rainbow.html). You may even encounter Base64-encoded data URIs in the src attribute... It really depends on the individual case and how far you're willing to go.

WRONG_DOCUMENT_ERR Error after login to sugarCRM from java Axis 1.4

I want to import data from a Java web application into SugarCRM. I created the client stub using Axis, and I am now trying to connect. It seems to connect, since I can get server information, but after login it gives me an error while getting the session ID.
The error is: "faultString: org.w3c.dom.DOMException: WRONG_DOCUMENT_ERR: A node is used in a different document than the one that created it."
Here is my code:
private static final String ENDPOINT_URL = "http://localhost/sugarcrm/service/v3/soap.php";
java.net.URL url = null;
try {
    url = new URL(ENDPOINT_URL);
} catch (MalformedURLException e1) {
    System.out.println("URL endpoint creation failed. Message: " + e1.getMessage());
    e1.printStackTrace();
}
System.out.println("URL endpoint created successfully!");

Sugarsoap service = new SugarsoapLocator();
SugarsoapPortType port = service.getsugarsoapPort(url);
Get_server_info_result result = port.get_server_info();
System.out.println(result.getGmt_time());
System.out.println(result.getVersion());
// I am getting the right answers up to this point

User_auth userAuth = new User_auth();
userAuth.setUser_name(USER_NAME);
MessageDigest md = MessageDigest.getInstance("MD5");
String password = convertToHex(md.digest(USER_PASSWORD.getBytes()));
userAuth.setPassword(password);

Name_value nameValueListLogin[] = null;
Entry_value loginResponse = null;
loginResponse = port.login(userAuth, "sugarcrm", nameValueListLogin);
String sessionID = loginResponse.getId(); // <--- Get error on this one
The nameValueListLogin could be from a different document context (coming from a different source). See if this link helps.
You may need to gather more debugging/logging information so we can see what nameValueListLogin consists of and where it comes from.
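If the node really does come from another DOM document, the usual DOM-level remedy (a generic sketch, not Axis-specific; targetDocument and foreignNode are hypothetical names) is to import it into the target document before inserting it:

import org.w3c.dom.Document;
import org.w3c.dom.Node;

// WRONG_DOCUMENT_ERR is thrown when a node owned by one Document is
// inserted into another; importNode creates a copy owned by the target.
Node imported = targetDocument.importNode(foreignNode, true); // true = deep copy
targetDocument.getDocumentElement().appendChild(imported);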
