java.net.URL class throwing MalformedURLException because of unknown protocol: blob - java

I'm automating a test scenario that validates a PDF document. The document opens in a new browser tab when its link (an anchor tag) is clicked. I want to validate a few important pieces of content in the document, for which I'm using Apache PDFBox. However, the document URL has the prefix 'blob', so the java.net.URL class throws MalformedURLException for unknown protocol: blob. How should I define or add that protocol in Java?
Please let me know how to get rid of this error so that I can use PDFBox to parse my PDF file.
Java version - 1.8
This is the screenshot of pdf document after it opens in a browser.
This is HTML source of document. But, as it's a pdf view, cannot perform any operations such as fetching text/windowTitle etc.
The following is a sample code snippet:
public void readPdfContents() throws IOException {
String url = "blob:https://cpswebqa.testcbidata.com/f9ad63bc-700e-4f49-a4fb-807ad1a44b01";
URL pdfUrl = new URL(url);
InputStream ips = pdfUrl.openStream();
BufferedInputStream bis = new BufferedInputStream(ips);
PDFParser pdfParser = new PDFParser(bis);
pdfParser.parse();
String pdfData = new PDFTextStripper().getText(pdfParser.getPDDocument());
System.out.println("PDF Data is - " + pdfData);
}
Error stack trace -
Exception in thread "main" java.net.MalformedURLException: unknown protocol: blob
at java.net.URL.<init>(URL.java:600)
at java.net.URL.<init>(URL.java:490)
at java.net.URL.<init>(URL.java:439)
at com.cbsh.automation.file.testrunner.WEB.Sample.main(Sample.java:11)

I had the same problem and found a solution that injects JavaScript into the page, as described here:
How to download an image with Python 3/Selenium if the URL begins with “blob:”?
I wrote it in Java and it worked very well. Here is the code:
private String getBytesBase64FromBlobURI(ChromeDriver driver, String uri) {
String script = " "
+ "var uri = arguments[0];"
+ "var callback = arguments[1];"
+ "var toBase64 = function(buffer){for(var r,n=new Uint8Array(buffer),t=n.length,a=new Uint8Array(4*Math.ceil(t/3)),i=new Uint8Array(64),o=0,c=0;64>c;++c)i[c]='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'.charCodeAt(c);for(c=0;t-t%3>c;c+=3,o+=4)r=n[c]<<16|n[c+1]<<8|n[c+2],a[o]=i[r>>18],a[o+1]=i[r>>12&63],a[o+2]=i[r>>6&63],a[o+3]=i[63&r];return t%3===1?(r=n[t-1],a[o]=i[r>>2],a[o+1]=i[r<<4&63],a[o+2]=61,a[o+3]=61):t%3===2&&(r=(n[t-2]<<8)+n[t-1],a[o]=i[r>>10],a[o+1]=i[r>>4&63],a[o+2]=i[r<<2&63],a[o+3]=61),new TextDecoder('ascii').decode(a)};"
+ "var xhr = new XMLHttpRequest();"
+ "xhr.responseType = 'arraybuffer';"
+ "xhr.onload = function(){ callback(toBase64(xhr.response)) };"
+ "xhr.onerror = function(){ callback(xhr.status) };"
+ "xhr.open('GET', uri);"
+ "xhr.send();";
String result = (String) driver.executeAsyncScript(script, uri);
return result;
}
I hope it helps someone.
Cheers!
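Once the script returns, the Base64 string still has to be turned back into bytes before PDFBox can read it. Here is a minimal sketch of that step using only the JDK. The class name BlobPdfBytes and the helper looksLikePdf are made up for illustration, and the sample input stands in for a real script result.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BlobPdfBytes {

    /** Decodes the Base64 text returned by the injected script into raw bytes. */
    public static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    /** Returns true if the bytes start with the "%PDF-" marker that opens a PDF file. */
    public static boolean looksLikePdf(byte[] data) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        return data.length >= magic.length
                && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by getBytesBase64FromBlobURI(driver, uri).
        String base64 = Base64.getEncoder()
                .encodeToString("%PDF-1.4 sample".getBytes(StandardCharsets.US_ASCII));
        byte[] pdfBytes = decode(base64);
        System.out.println("Looks like a PDF: " + looksLikePdf(pdfBytes));
        // The bytes could then be wrapped in a ByteArrayInputStream and handed to PDFBox.
    }
}
```

Checking the header first is optional, but it catches the case where the script called back with an error status instead of file content.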

Related

how to show download progress on file download?

I have an API to download a file. The download works, but the file only shows up after the download completes. There is no download progress.
When the user hits that URL, I want Chrome to show the download progress. Currently it only shows the file after completion.
I am using spring boot.
public ResponseEntity<Resource> getFile(String fileName) {
byte[] data=null;
File file=new File(fileName);
InputStream inputStream=new FileInputStream(file);
data=IOUtils.toByteArray(inputStream);
ByteArrayResource fileToDownload = new ByteArrayResource(data);
return ResponseEntity.ok()
.contentType(MediaType.parseMediaType("application/octet-stream"))
.header("Content-Disposition", "filename=" + fileName)
.body(fileToDownload);
}
Use JavaScript in your webpage. This has nothing to do with how the server sends the file, and must be displayed client-side -- well, the server could output somewhere how far it is along sending the file, but that is not what you want to show - you are interested in showing how much you have received, and showing it in the client; so any answer will have to rely on JS+html to an extent. Why not solve it entirely in the client side?
In this answer they use the following code:
function saveOrOpenBlob(url, blobName) {
var blob;
var xmlHTTP = new XMLHttpRequest();
xmlHTTP.open('GET', url, true);
xmlHTTP.responseType = 'arraybuffer';
xmlHTTP.onload = function(e) {
blob = new Blob([this.response]);
};
xmlHTTP.onprogress = function(pr) {
//pr.loaded - current state
//pr.total - max
};
xmlHTTP.onloadend = function(e){
var fileName = blobName;
var tempEl = document.createElement("a");
document.body.appendChild(tempEl);
tempEl.style = "display: none";
url = window.URL.createObjectURL(blob);
tempEl.href = url;
tempEl.download = fileName;
tempEl.click();
window.URL.revokeObjectURL(url);
}
xmlHTTP.send();
}
Note that this is missing the part where you display the progress somewhere. For example, you could implement it as follows:
xmlHTTP.onprogress = function(pr) {
//pr.loaded - current state
//pr.total - max
let percentage = Math.round((pr.loaded / pr.total) * 100);
document.getElementById("progress").textContent = "" + percentage + "% complete";
};
This assumes that there is something like
<span id="progress">downloading...</span>
in your html

response.asXml() always returns encoding error on Play Framework

I'm building a Play Framework application.
I tried to get XML content from a web service.
http://example.com/api returns xml, but its encoding is EUC-JP. (charset=euc-jp)
I wrote the following code.
WSRequest request = ws.url("http://example.com/api");
WSRequest complexRequest = request.setHeader("Accept", "application/xml")
.setContentType("application/x-www-form-urlencoded");
Promise<Document> documentPromise = complexRequest.post("key1=value1").map(response -> {
String name = XPath.selectText("//name", response.asXml());
System.out.println("name :" + name);
return response.asXml();
});
However, response.asXml() always returns error :
[Fatal Error] :xx:xx: Invalid byte 1 of 1-byte UTF-8 sequence.
How can I get data by using response.asXml without any error?
In the end, I used DocumentBuilder instead of asXml, as described in
"How to fix Invalid byte 1 of 1-byte UTF-8 sequence":
Promise<Result> resultPromise = request.post("key=" + value).map(response -> {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
ByteArrayInputStream stream = new ByteArrayInputStream(response.getBody().getBytes("euc-jp"));
String name = XPath.selectText("//name", builder.parse(stream));
System.out.println("name :" + name);
return ok(main.render());
});
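Another way to sidestep the parser's UTF-8 assumption is to decode the bytes yourself and hand the parser a Reader. This is only a sketch using JDK classes. EucJpXml, parse and selectText are hypothetical names, the sample XML is invented, and it assumes you can get the response body as raw bytes, which depends on your Play version.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EucJpXml {

    /** Parses XML bytes by decoding them with the given charset before the parser sees them. */
    public static Document parse(byte[] rawBody, String charsetName) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(rawBody), Charset.forName(charsetName))) {
            // Given a character stream, the parser does not need to guess the byte encoding.
            return builder.parse(new InputSource(reader));
        }
    }

    /** Evaluates an XPath expression against the document and returns its string value. */
    public static String selectText(String expression, Document document) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expression, document);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical response body, encoded as EUC-JP the way the service would send it.
        String xml = "<result><name>\u6771\u4eac</name></result>";
        byte[] rawBody = xml.getBytes(Charset.forName("EUC-JP"));
        System.out.println("name: " + selectText("//name", parse(rawBody, "EUC-JP")));
    }
}
```

Working from the raw bytes matters here: if the body has already been decoded to a String with the wrong charset, re-encoding it may not bring the original characters back.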

How do I make a URL for Jsoup to parse?

So I'm making a program that extracts the lyrics from a user given song off of AZ Lyrics.
The problem I'm having is that after converting the string to a URL, it says Jsoup is not able to parse it because it doesn't accept strings despite the variable being a URL that we are passing in.
String strURL = "http://www.azlyrics.com/lyrics/" + artist + "/" + song + ".html";
URL url = new URL(strURL);
Document doc = Jsoup.parse(url);
What should I do?
I don't know which version of jsoup you are using, but as of the latest version there is no parse method that takes a URL alone. You need to pass a timeout as well. So try
Document doc = Jsoup.parse(url, 30000);
There is also a connect method, which would be the best option (IMO). You can pass the strURL variable to it directly. Try
Document doc = Jsoup.connect(strURL).get();
If these didn't help check the value of artist and song variables.
You don't need to convert the string strURL to URL, this should work:
Document doc = Jsoup.connect("http://www.azlyrics.com/lyrics/" + artist + "/" + song + ".html").timeout(10000).get();
String html = doc.text();
I've set a timeout of 10 seconds, adjust to fit your needs.
You can take a look at the available methods here

Save file from a website with java

I'm trying to build a jsoup based java app to automatically download English subtitles for films (I'm lazy, I know. It was inspired from a similar python based app). It's supposed to ask you the name of the film and then download an English subtitle for it from subscene.
I can make it reach the download link but I get an Unhandled content type error when I try to 'go' to that link. Here's my code
public static void main(String[] args) {
try {
String videoName = JOptionPane.showInputDialog("Title: ");
subscene(videoName);
}
catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static void subscene(String videoName){
try {
String siteName = "http://www.subscene.com";
String[] splits = videoName.split("\\s+");
String codeName = "";
String text = "";
if(splits.length>1){
for(int i=0;i<splits.length;i++){
codeName = codeName+splits[i]+"-";
}
videoName = codeName.substring(0, videoName.length());
}
System.out.println("videoName is "+videoName);
// String url = "http://www.subscene.com/subtitles/"+videoName+"/english";
String url = "http://www.subscene.com/subtitles/title?q="+videoName+"&l=";
System.out.println("url is "+url);
Document doc = Jsoup.connect(url).get();
Element exact = doc.select("h2.exact").first();
Element yuel = exact.nextElementSibling();
Elements lis = yuel.children();
System.out.println(lis.first().children().text());
String hRef = lis.select("div.title > a").attr("href");
hRef = siteName+hRef+"/english";
System.out.println("hRef is "+hRef);
doc = Jsoup.connect(hRef).get();
Element nonHI = doc.select("td.a40").first();
Element papa = nonHI.parent();
Element link = papa.select("a").first();
text = link.text();
System.out.println("Subtitle is "+text);
hRef = link.attr("href");
hRef = siteName+hRef;
Document subDownloadPage = Jsoup.connect(hRef).get();
hRef = siteName+subDownloadPage.select("a#downloadButton").attr("href");
Jsoup.connect(hRef).get(); //<-- Here's where the problem lies
}
catch (java.io.IOException e) {
System.out.println(e.getMessage());
}
}
Can someone please help me so I don't have to manually download subs?
I just found out that using
java.awt.Desktop.getDesktop().browse(java.net.URI.create(hRef));
instead of
Jsoup.connect(hRef).get();
downloads the file after prompting me to save it. But I don't want to be prompted because this way I won't be able to read the name of the downloaded zip file (I want to unzip it after saving using java).
Assuming that your files are small, you can do it like this. Note that you can tell Jsoup to ignore the content type.
// get the file content
Connection connection = Jsoup.connect(path);
connection.timeout(5000);
Connection.Response resultImageResponse = connection.ignoreContentType(true).execute();
// save to file
FileOutputStream out = new FileOutputStream(localFile);
out.write(resultImageResponse.bodyAsBytes());
out.close();
I would recommend verifying the content before saving, because some servers just return an HTML page when the file cannot be found, e.g. for a broken hyperlink.
...
String body = resultImageResponse.body();
if (body == null || body.toLowerCase().contains("<body>"))
{
throw new IllegalStateException("invalid file content");
}
...
Here:
Document subDownloadPage = Jsoup.connect(hRef).get();
hRef = siteName+subDownloadPage.select("a#downloadButton").attr("href");
//specifically here
Jsoup.connect(hRef).get();
Looks like jsoup expects that the result of Jsoup.connect(hRef) should be an HTML or some text that it's able to parse, that's why the message states:
Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml
I followed the execution of your code manually and the last URL you're trying to access returns a content type of application/x-zip-compressed, thus the cause of the exception.
In order to download this file, you should use a different approach. You could use the old but still useful URLConnection, URL or use a third party library like Apache HttpComponents to fire a GET request and retrieve the result as an InputStream, wrap it into a proper writer and write your file into your disk.
Here's an example about doing this using URL:
URL url = new URL(hRef);
InputStream in = url.openStream();
OutputStream out = new BufferedOutputStream(new FileOutputStream("D:\\foo.zip"));
final int BUFFER_SIZE = 1024 * 4;
byte[] buffer = new byte[BUFFER_SIZE];
BufferedInputStream bis = new BufferedInputStream(in);
int length;
while ( (length = bis.read(buffer)) != -1 ) {
out.write(buffer, 0, length);
}
out.close();
in.close();
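On Java 7 and later the same copy can be written more compactly with try-with-resources and Files.copy. A rough sketch, not tied to subscene: UrlDownload and download are made-up names, and a local file: URL stands in for the zip link so the example runs without a network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class UrlDownload {

    /** Streams whatever the URL returns into the target file and reports the bytes written. */
    public static long download(URL source, Path target) {
        // try-with-resources closes the stream even if the copy fails part-way.
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder source file; in the real program this would be the URL built from hRef.
        Path source = Files.createTempFile("subtitle", ".zip");
        Files.write(source, new byte[] {'P', 'K', 3, 4});
        Path target = Files.createTempFile("downloaded", ".zip");
        long written = download(source.toUri().toURL(), target);
        System.out.println("Wrote " + written + " bytes to " + target);
    }
}
```

Because you choose the target path yourself, you also know the name of the zip file when it is time to unzip it.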

java.net.MalformedURLException: no protocol on URL based on a string modified with URLEncoder

So I was attempting to use this String in a URL :-
http://site-test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
In this code: -
String fileToDownloadLocation = //The above string
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
But at this point I get the error: -
java.net.URISyntaxException: Illegal character in query at index 169:Blahblahblah
I realised with a bit of googling this was due to the characters in the URL (guessing the &), so I then added in some code so it now looks like so: -
String fileToDownloadLocation = //The above string
fileToDownloadLocation = URLEncoder.encode(fileToDownloadLocation, "UTF-8");
URL fileToDownload = new URL(fileToDownloadLocation);
HttpGet httpget = new HttpGet(fileToDownload.toURI());
However, when I try and run this I get an error when I try and create the URL, the error then reads: -
java.net.MalformedURLException: no protocol: http%3A%2F%2Fsite-test.testsite.com%2FMeetings%2FIC%2FDownloadDocument%3FmeetingId%3Dc21c905c-8359-4bd6-b864-844709e05754%26itemId%3Da4b724d1-282e-4b36-9d16-d619a807ba67%26file%3D%5C%5Cs604132shvw140%5CTest-Documents%5Cc21c905c-8359-4bd6-b864-844709e05754_attachments%5C7e89c3cb-ce53-4a04-a9ee-1a584e157987%myDoc.pdf
It looks like I can't do the encoding until after I've created the URL else it replaces slashes and things which it shouldn't, but I can't see how I can create the URL with the string and then format it so its suitable for use. I'm not particularly familiar with all this and was hoping someone might be able to point out to me what I'm missing to get string A into a suitably formatted URL to then use with the correct characters replaced?
Any suggestions greatly appreciated!
You need to encode your parameter's values before concatenating them to URL.
The backslash \ is a special character that has to be escaped as %5C.
Escaping example:
String paramValue = "param\\with\\backslash";
String yourURLStr = "http://host.com?param=" + java.net.URLEncoder.encode(paramValue, "UTF-8");
java.net.URL url = new java.net.URL(yourURLStr);
The result is http://host.com?param=param%5Cwith%5Cbackslash which is properly formatted url string.
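With several parameters, the same idea can be wrapped in a small helper that encodes each name and value on its own and only then joins them. This is a sketch under assumptions: QueryUrlBuilder and build are made-up names, and the file path in main is shortened from the one in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class QueryUrlBuilder {

    /** Appends the parameters to the base URL, percent-encoding each name and value separately. */
    public static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(encode(entry.getKey()) + "=" + encode(entry.getValue()));
        }
        return baseUrl + "?" + query;
    }

    private static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("meetingId", "c21c905c-8359-4bd6-b864-844709e05754");
        params.put("file", "\\\\s604132shvw140\\Test-Documents\\myDoc.pdf");
        System.out.println(build("http://site-test.com/Meetings/IC/DownloadDocument", params));
    }
}
```

Note that URLEncoder produces form encoding, so a space becomes + rather than %20. That is normally fine inside a query string, but it is not suitable for the path part of a URL.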
I had the same problem. I read the URL from a properties file:
String configFile = System.getenv("system.Environment");
if (configFile == null || "".equalsIgnoreCase(configFile.trim())) {
configFile = "dev.properties";
}
// Load properties
Properties properties = new Properties();
properties.load(getClass().getResourceAsStream("/" + configFile));
//read url from file
apiUrl = properties.getProperty("url").trim();
URL url = new URL(apiUrl);
//throw exception here
URLConnection conn = url.openConnection();
dev.properties
url = "https://myDevServer.com/dev/api/gate"
it should be
dev.properties
url = https://myDevServer.com/dev/api/gate
without "" and my problem is solved.
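To see why the quotes matter, here is a small sketch using only the JDK. java.util.Properties keeps the quotation marks as part of the value, so java.net.URL finds no protocol in front of the colon. The class and method names (QuotedPropertyUrl, readUrlProperty, isParsable) are made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Properties;

class QuotedPropertyUrl {

    /** Loads properties from the given text and returns the trimmed value stored under "url". */
    public static String readUrlProperty(String propertiesText) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return properties.getProperty("url").trim();
    }

    /** Returns true if java.net.URL accepts the text. */
    public static boolean isParsable(String text) {
        try {
            // Same constructor as in the question; newer JDKs mark it as deprecated.
            new URL(text);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String quoted = readUrlProperty("url = \"https://myDevServer.com/dev/api/gate\"");
        String plain = readUrlProperty("url = https://myDevServer.com/dev/api/gate");
        // Properties does not strip quotes, so the first value still contains them.
        System.out.println(quoted + " parsable: " + isParsable(quoted));
        System.out.println(plain + " parsable: " + isParsable(plain));
    }
}
```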
According to oracle documentation
Thrown to indicate that a malformed URL has occurred. Either no legal protocol could be found in a specification string or the string
could not be parsed.
So it means the string you passed could not be parsed as a URL.
You want to use URI templates. Look carefully at the README of this project: URLEncoder.encode() does NOT work for URIs.
Let us take your original URL:
http://site-test.test.com/Meetings/IC/DownloadDocument?meetingId=c21c905c-8359-4bd6-b864-844709e05754&itemId=a4b724d1-282e-4b36-9d16-d619a807ba67&file=\s604132shvw140\Test-Documents\c21c905c-8359-4bd6-b864-844709e05754_attachments\7e89c3cb-ce53-4a04-a9ee-1a584e157987\myDoc.pdf
and convert it to a URI template with three variables (on multiple lines for clarity):
http://site-test.test.com/Meetings/IC/DownloadDocument
?meetingId={meetingID}&itemId={itemID}&file={file}
Now let us build a variable map with these three variables using the library mentioned in the link:
final VariableMap vars = VariableMap.newBuilder()
.addScalarValue("meetingID", "c21c905c-8359-4bd6-b864-844709e05754")
.addScalarValue("itemID", "a4b724d1-282e-4b36-9d16-d619a807ba67")
.addScalarValue("file", "\\\\s604132shvw140\\Test-Documents"
+ "\\c21c905c-8359-4bd6-b864-844709e05754_attachments"
+ "\\7e89c3cb-ce53-4a04-a9ee-1a584e157987\\myDoc.pdf")
.build();
final URITemplate template
= new URITemplate("http://site-test.test.com/Meetings/IC/DownloadDocument"
+ "?meetingId={meetingID}&itemId={itemID}&file={file}");
// Generate URL as a String
final String theURL = template.expand(vars);
This is GUARANTEED to return a fully functional URL!
Thanks to Erhun's answer I finally realised that my JSON mapper was returning the quotation marks around my data too! I needed to use "asText()" instead of "toString()"
It's not an uncommon issue - one's brain doesn't see anything wrong with the correct data, surrounded by quotes!
discoveryJson.path("some_endpoint").toString();
"https://what.the.com/heck"
discoveryJson.path("some_endpoint").asText();
https://what.the.com/heck
This code worked for me
public static void main(String[] args) {
try {
java.net.URL url = new java.net.URL("http://path");
System.out.println("Instantiated new URL: " + url);
}
catch (MalformedURLException e) {
e.printStackTrace();
}
}
Instantiated new URL: http://path
Very simple fix
String encodedURL = UriUtils.encodePath(request.getUrl(), "UTF-8");
It works; no extra functionality is needed.
