JSoup getting content type then data - java

So currently I'm retrieving the data from a URL using the following code:
Document doc = Jsoup.connect(url).get();
Before I fetch the data, I've decided I want to get the content type, so I do that using the following:
Connection.Response res = Jsoup.connect(url).timeout(10*1000).execute();
String contentType = res.contentType();
Now I'm wondering: is this making two separate connections? Is this inefficient? Is there a way for me to get the content type and the document data with a single connection?
Thanks

Yes, Jsoup.connect(url).get() and Jsoup.connect(url).timeout(10*1000).execute() are two separate requests. Maybe you are looking for something like
Connection.Response resp = Jsoup.connect(url).timeout(10*1000).execute();
String contentType = resp.contentType();
and later parse the body of the response as a Document:
Document doc = resp.parse();
Anyway, by default Jsoup only parses text/*, application/xml and application/xhtml+xml. For any other content type, such as application/pdf, it throws an UnsupportedMimeTypeException, so you don't have to guard against that case yourself.
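If you take the single execute() route and would rather decide for yourself than rely on the exception (Jsoup's ignoreContentType(true) switches that check off), a helper like the one below mirrors the rule stated above. It's a sketch: the class and method names are made up, and Jsoup's own internal check may differ in details.

```java
import java.util.Locale;

public class ContentTypeCheck {
    // Hypothetical helper: true for text/*, application/xml and
    // application/xhtml+xml, the types listed above as parseable by default.
    // A missing header (null) is treated as not parseable.
    public static boolean isParseable(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mime.startsWith("text/")
                || mime.equals("application/xml")
                || mime.equals("application/xhtml+xml");
    }
}
```

You would call ContentTypeCheck.isParseable(resp.contentType()) and only then resp.parse(), so the page is still fetched with one request.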

Without looking at the Jsoup internals we can't know for sure. Typically, when you want to obtain just the headers of a file (the content type in your case) without downloading the actual file content, you use the HTTP HEAD method instead of the GET method on the same URL. The Jsoup API does let you set the method, via method(Connection.Method.HEAD), but the code above doesn't do that, so I'd wager it's actually getting the entire file.
The HTTP spec allows clients to reuse a connection for later requests; these are called HTTP persistent connections, and they avoid having to open a new connection for each call to the same server. However, since you aren't handling the connections in your own code, it's up to the client (Jsoup in this case) to make sure it doesn't close the connection after each request.
I believe the overhead of creating two connections is offset by not downloading the entire file, if your code decides not to download files that aren't of the content type you want.
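To make the HEAD-first approach concrete, here is a sketch that swaps in the JDK's own java.net.http.HttpClient (Java 11+) instead of Jsoup. One client instance is reused for both requests, which lets it keep the connection to the server open between the HEAD and the GET where the server allows keep-alive. The class name, method name and the Predicate-based filter are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.function.Predicate;

public class HeadThenGet {
    // Sends a HEAD request first; only if the Content-Type passes the filter
    // is a GET issued to download the body. Reusing one HttpClient lets it
    // pool the underlying connection between the two requests.
    public static Optional<String> fetchIfWanted(HttpClient client, URI uri,
                                                 Predicate<String> wanted)
            throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> headResp =
                client.send(head, HttpResponse.BodyHandlers.discarding());
        // No header (or a server that rejects HEAD) yields "" here.
        String type = headResp.headers().firstValue("Content-Type").orElse("");
        if (!wanted.test(type)) {
            return Optional.empty(); // the body is never downloaded
        }
        HttpRequest get = HttpRequest.newBuilder(uri).GET().build();
        return Optional.of(client.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```

A call would look like fetchIfWanted(HttpClient.newHttpClient(), URI.create(url), t -> t.startsWith("text/html")). Some servers answer HEAD differently from GET, so treat the HEAD result as a hint rather than a guarantee.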

Related

REST call in Java

I have a few questions about a specific REST call I'm making in Java. I'm quite the novice, so I've cobbled this together from several sources. The call itself looks like this:
String src = AaRestCall.subTrackingNum(trackingNum);
The Rest call class looks like this:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.URL;
import java.net.URLConnection;

public class AaRestCall {
    public static String subTrackingNum(String trackingNum) throws IOException {
        URL url = new URL("https://.../rest/" + trackingNum);
        String query = "{'TRACKINGNUM': trackingNum}";
        //make connection
        URLConnection urlc = url.openConnection();
        //use post mode
        urlc.setDoOutput(true);
        urlc.setAllowUserInteraction(false);
        //send query
        PrintStream ps = new PrintStream(urlc.getOutputStream());
        ps.print(query);
        ps.close();
        //get result
        BufferedReader br = new BufferedReader(new InputStreamReader(urlc.getInputStream()));
        StringBuilder sb = new StringBuilder();
        String line = null;
        while ((line = br.readLine()) != null) {
            sb.append(line);
        }
        br.close();
        return sb.toString();
    }
}
Now, I have a few questions on top of what is wrong with this in general.
1) If this rest call is returning a JSON object, is that going to get screwed up by going to a String?
2) What's the best way to parse out the JSON that is returning?
3) I'm not really certain how to format the query field. I assume that's supposed to be documented in the REST API?
Thanks in advance.
REST is a pattern applied on top of HTTP. From your questions, it seems to me that you first need to understand how HTTP (and chatty socket protocols in general) works and what the Java API offers for dealing with it.
You can use any JSON library out there to parse the HTTP response body (provided it's a 200 OK, which you need to check for, and also watch out for HTTP redirects!), but that's not how things are usually built.
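For reference, a tightened version of the question's URLConnection code might look like the sketch below: it casts to HttpURLConnection so it can check the status code, sends a JSON body with an explicit Content-Type, and reads the reply as UTF-8. It assumes the service takes a JSON body over POST and answers in UTF-8, which only its documentation can confirm; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class JsonPost {
    // POSTs a JSON string and returns the response body as a String.
    // Any non-2xx status (including a redirect that isn't followed) throws.
    public static String postJson(URI uri, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
            conn.setRequestProperty("Accept", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes(StandardCharsets.UTF_8));
            }
            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("HTTP " + status + " from " + uri);
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

For the question's case the body would be something like "{\"TRACKINGNUM\":\"" + trackingNum + "\"}": JSON needs double quotes, and the original query string contains the literal text trackingNum rather than its value. Concatenation is only safe for values without quotes or backslashes; a JSON library handles escaping properly.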
If the service exposes a real RESTful interface (as opposed to a simpler HTTP+JSON one), you'll need to use four HTTP verbs, and the plain URLConnection in your code only does GET and POST (for PUT or DELETE you have to cast to HttpURLConnection and call setRequestMethod()). Plus, you'll likely want to add headers for authentication, or maybe cookies (which are in fact just HTTP headers, but are still worth considering separately). So my suggestion is to build the client side of the service with Apache HttpClient (originally part of Apache Commons, now Apache HttpComponents), or maybe a JAX-RS library with client support (for example Apache CXF). That way you'll have full control over the communication while also getting nicer abstractions to work with, instead of consuming the InputStream provided by your URLConnection and manually serializing/deserializing parameters and responses.
Regarding how to format the query field: again, you first need to grasp the basics of HTTP. Anyway, the definitive answer depends on the remote service implementation, but you'll face four options:
The query string in the service URL
A form-encoded body of your HTTP request
A multipart body of your HTTP request (similar to the previous option, but the different MIME type is enough to give you some headaches); this is often used in HTTP+JSON services that also have a website, where the same URL can be used for uploading a form that contains a file input
A service-defined (for example application/json, or application/xml) encoding for your HTTP body (again, it's really the same as the previous two points, but the different MIME encoding means that you'll have to use a different API)
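The first two options share the same encoding and differ only in where the encoded pairs end up. A minimal sketch with the JDK's URLEncoder (Java 10+ for the Charset overload); the parameter names and values are made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormEncoding {
    // Encodes key/value pairs as application/x-www-form-urlencoded.
    // The same string can go after "?" in the URL (option 1) or be sent
    // as the request body with that Content-Type (option 2).
    public static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("trackingNum", "AB 12/34");
        params.put("format", "json");
        System.out.println(encode(params)); // trackingNum=AB+12%2F34&format=json
    }
}
```

Note that URLEncoder implements form encoding (space becomes "+"), which suits query strings and form bodies but not URL path segments.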
Oh my. There are a couple of areas where you can improve on this code. I'm not even going to point out the errors, since I'd like you to replace the HTTP calls with an HTTP client library. I'm also unaware of the spec required by your API, so getting you to use the POST or GET methods properly at this level of abstraction will take more work.
1) If this rest call is returning a JSON object, is that going to get screwed up by going to a String?
No, but marshalling that JSON into an object is your job. A library like Google Gson can help.
2) What's the best way to parse out the JSON that is returning?
I like to use Gson, as mentioned above, but you can use another marshalling/unmarshalling library.
3) I'm not really certain how to format the query field. I assume that's supposed to be documented in the REST API?
Yes. Take a look at the documentation and come up with java objects that mirror the json structure. You can then parse them with the following code.
MyStructure result = new Gson().fromJson(json, MyStructure.class);
HTTP client
Please take a look at writing your HTTP client with a library like Apache HttpClient, which will make your job much easier.
Testing
Since you seem to be new to this, I'd also suggest you take a look at a tool like Postman which can help you test your API calls if you suspect that the code you've written is faulty.
I think that you should use a REST client library instead of writing your own, unless it is for educational purposes - then by all means go nuts!
The REST service will respond to your call with an HTTP response; the payload may or may not be formatted as a JSON string. If it is, I suggest that you use a JSON parsing library to convert that String into a Java representation.
And yes, you will have to refer to the particular REST API's documentation for details.
P.S. The Java URL class is broken (for example, URL.equals() and hashCode() may perform DNS lookups); use URI instead.
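A small sketch of working with URI instead: resolve() builds request addresses without any network access, and URI equality is purely syntactic. The host and the helper name are placeholders, and the sketch assumes the id contains only URI-safe characters (otherwise resolve() throws IllegalArgumentException). Convert to a URL only where the connection is opened, e.g. itemUri(base, id).toURL().openConnection().

```java
import java.net.URI;

public class UriExample {
    // Hypothetical helper: address of one item under a base URI.
    // The base should end in "/": without it, resolve() replaces the
    // last path segment instead of appending to it.
    public static URI itemUri(URI base, String id) {
        return base.resolve(id);
    }
}
```

For example, itemUri(URI.create("https://api.example.com/rest/"), "12345") gives https://api.example.com/rest/12345, while the same call with a base of https://api.example.com/rest gives https://api.example.com/12345.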

HttpURLConnection "enctype" POST [duplicate]

How can I set content type of HTTP Put as xxxx+xml?
I was referring to the solution in this link: Android, sending XML via HTTP POST (SOAP). It's fine when we set the content type like this, i.e. the XML comes along with the request:
httppost.setHeader("Content-Type","application/soap+xml;charset=UTF-8");
but when I change the type from soap to something custom, the XML disappears from the request (I saw this in Wireshark), like this:
httppost.setHeader("Content-Type","application/vnd.oma-pcc+xml;charset=UTF-8");
Then I tried plain xml only, and the request is OK again:
httppost.setHeader("Content-Type","application/xml;charset=UTF-8");
I want to know exactly what the rules are for a content type combined with the xml type, so that the XML is still sent.
Thanks.
Assuming you're using HttpClient 4.1.3 or greater:
When constructing your entity, you have the option to specify the content type being used for the POST or PUT operation for certain entities.
There is a ContentType object which should be used to specify this.
Using the factory method .create() you can specify the MIME type with a charset; the framework will use the ContentType to properly emit the header in question.
Example API call:
ContentType.create("application/vnd.oma-pcc+xml", Charset.forName("UTF-8"));
Note: edit for HttpClient 4.1.2
In the case of 4.1.2, when you create your entity for the POST or PUT operation, set the content type on the entity, not on the request (HttpPost or HttpPut), using setContentType(String). This is deprecated in 4.1.3 and beyond.
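For comparison, here is the same vendor type with the JDK's java.net.http client (Java 11+), which is a different API from the HttpClient 4.x classes described above. It illustrates that the Content-Type header is only a label and the body is attached separately, so with this client the choice of a custom +xml type does not decide whether the XML is sent. The endpoint and the class name are placeholders.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class VendorXmlPut {
    // Builds (but does not send) a PUT request with a vendor-specific XML type.
    public static HttpRequest build(URI endpoint, String xml) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/vnd.oma-pcc+xml; charset=UTF-8")
                // Encode the body as UTF-8 to match the charset parameter above.
                .PUT(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }
}
```

Sending it is then HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).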

How can I pass JSON as well as File to REST API in JAVA?

My main question is: how can I pass JSON as well as a file in a POST request to a REST API? What is needed in the Spring framework to act as a client and wait for the response when posting JSON and a file?
Options:
Do I need to use FileRepresentation with ClientResource? But how can I pass a file as well as JSON?
By using RestTemplate to pass both JSON and a file? How can it be used to post JSON as well as a file?
Is any other option available?
Sounds like an awful resource you're trying to expose. My suggestion is to separate them into 2 different requests. Maybe the JSON has the URI for the file to then be requested…
From a REST(ish) perspective, it sounds like the resource you are passing is a multipart/mixed content-type. One subtype will be application/json, and one will be whatever type the file is. Either or both could be base64 encoded.
You may need to write specific providers to serialize/deserialize this data. Depending on the particular REST framework, this article may help.
An alternative is to create a single class that encapsulates both the json and the file data. Then, write a provider specific to that class. You could optionally create a new content-type for it, such as "application/x-combo-file-json".
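A rough sketch of the multipart/mixed body described above, assembled by hand with only the JDK (Java 11+ for writeBytes). The boundary, the class name and the minimal part headers are illustrative; a real framework on the receiving side may expect extra headers such as Content-Disposition. The request itself would carry Content-Type: multipart/mixed; boundary=<the boundary>.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class MultipartMixed {
    // Builds a multipart/mixed body: one application/json part followed by
    // one binary part of the given type. The boundary must not occur in either part.
    public static byte[] build(String boundary, String json, byte[] file, String fileType) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Type: application/json; charset=UTF-8\r\n\r\n");
        write(out, json + "\r\n");
        write(out, "--" + boundary + "\r\n");
        write(out, "Content-Type: " + fileType + "\r\n\r\n");
        out.writeBytes(file); // raw file bytes, no extra encoding
        write(out, "\r\n--" + boundary + "--\r\n"); // closing delimiter
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        out.writeBytes(s.getBytes(StandardCharsets.UTF_8));
    }
}
```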
You basically have three choices:
Base64 encode the file, at the expense of increasing the data size by around 33%.
Send the file first in a multipart/form-data POST, and return an ID to the client. The client then sends the metadata with the ID, and the server re-associates the file and the metadata.
Send the metadata first, and return an ID to the client. The client then sends the file with the ID, and the server re-associates the file and the metadata.
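The first choice, in sketch form: the file bytes go into a JSON string field as Base64. The field names and class name are made up, and the string concatenation assumes the file name needs no JSON escaping (a JSON library would handle that properly). Base64 emits 4 characters for every 3 input bytes, which is where the roughly 33% growth comes from.

```java
import java.util.Base64;

public class Base64Payload {
    // Hypothetical payload shape: {"fileName":"...","data":"<base64>"}.
    // Assumes fileName contains no quotes, backslashes or control characters.
    public static String toJson(String fileName, byte[] fileBytes) {
        String encoded = Base64.getEncoder().encodeToString(fileBytes);
        return "{\"fileName\":\"" + fileName + "\",\"data\":\"" + encoded + "\"}";
    }
}
```

For example, toJson("a.txt", "hello".getBytes(StandardCharsets.UTF_8)) produces {"fileName":"a.txt","data":"aGVsbG8="}.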

Check if a URL's mimetype is not a web page

I want to check if a URL's mimetype is not a webpage. Can I do this in Java? I want to check if the file is a rar or mp3 or mp4 or mpeg or whatever, just not a webpage.
You can issue an HTTP HEAD request and check the Content-Type response header. Call HttpURLConnection.setRequestMethod("HEAD") before you issue the request, then issue the request with URLConnection.connect(), and then use URLConnection.getContentType(), which reads the HTTP headers.
The bonus of using a HEAD request is that the actual resource is never transmitted/generated. You can also use a GET request and inspect the resulting stream using URLConnection.guessContentTypeFromStream() which will inspect the actual bytes and try to guess what the stream represents. I think that it looks for magic numbers or other patterns in the stream.
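Both approaches from this answer, sketched with the JDK classes it names. The class and method names are hypothetical; getContentType() returns null when the server sends no Content-Type, and guessContentTypeFromStream() likewise returns null when it doesn't recognise the bytes.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

public class MimeCheck {
    // Option 1: ask the server with HEAD; no body is transferred.
    public static String headContentType(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD");
            conn.connect();
            return conn.getContentType(); // may be null
        } finally {
            conn.disconnect();
        }
    }

    // Option 2: inspect the first bytes of the content (magic numbers).
    // guessContentTypeFromStream needs mark/reset support and resets the
    // stream afterwards, so the caller can keep reading from the start.
    public static String sniff(BufferedInputStream in) throws IOException {
        return URLConnection.guessContentTypeFromStream(in);
    }
}
```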
There's nothing inherent in a URL which will tell you what you will receive when you request it. You have to actually request the resource, and then inspect the content-type header. At that point, it's still not clear what you should do - some content types will (almost) always be handled by the browser, e.g. text/html. Some types should be handled by a browser, e.g. application/xhtml+xml. Some types may be handled by the browser, e.g. application/pdf.
Which, if any, of these you consider to be "webpage" is still not clear - you'll need to decide for yourself.
You can inspect the Content-Type header once you've requested the resource, using, for example, the HttpURLConnection class.
A Content-Type of text/html represents a web page.

What is the significance of the "missing content stream" exception in SolrCore?

I have seen some other questions on this topic. However, they do not address the general nature of what exactly a "content stream" is.
What is a "content stream" in solr, and what is the significance of this message? For example, often when we get a CONNECTION FAILURE in an SQL database, it means simply that there is no route to the database server. In this same idiom: what is the meaning of a missing "content stream" when we are writing to Solr?
What do we need to verify in our java apps before writing to a Solr Core?
You are using the Solr update request handler, which expects an HTTP POST with a stream of XML documents. So if you call the URL without passing any documents to be posted, you will get the "missing content stream" message.
In your Java apps you need to verify that you are passing an appropriate content stream and parameters to Solr. You can get more details on the UpdateXMLMessages wiki page.
Also, you might want to consider using the SolrJ Java client to write and query the Solr index.
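To make "content stream" concrete: it is the body of that POST. The sketch below builds a minimal XML update message for one document, which you would POST to the core's /update handler with Content-Type: text/xml; posting the same URL with an empty body is what triggers the error above. The class name is hypothetical, and the field names you pass must exist in your schema.

```java
import java.util.Map;

public class SolrAddXml {
    // Escapes characters that are special in XML text and attribute values.
    // "&" must be replaced first so the other entities aren't double-escaped.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Builds an <add><doc>...</doc></add> message for a single document.
    public static String addDoc(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> f : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(f.getKey())).append("\">")
              .append(escape(f.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }
}
```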
Content stream = the data you submit to Solr.
You need to verify that your content stream conforms to the Solr schema. Example: if you submit a value "name" of type String from your Java client, the Solr schema must have a field "name" of type String. You also need to verify that the username and password are correct, if Solr requires authentication.
