Concise example of file upload via Java lib Apache Commons - java

I've removed my convoluted and badly malformed question so that it doesn't detract from the very neat and correct answer beneath. Given the (surprising) difficulty of finding an on-line example for doing this incredibly common task, I hope Yoni gets a few more up-ticks for his response.
So... the question in a nutshell...
How do I use Apache Commons HttpClient to upload a file to some destination? I'm using it in Android and uploading to a PHP script, but it should work from any Java program and to any HTTP-based listener.

From the api of MultipartRequestEntity:
File f = new File("/path/fileToUpload.txt");
PostMethod filePost = new PostMethod("http://host/some_path");
Part[] parts = {
new StringPart("param_name", "value"),
new FilePart(f.getName(), f)
};
filePost.setRequestEntity(
new MultipartRequestEntity(parts, filePost.getParams())
);
HttpClient client = new HttpClient();
int status = client.executeMethod(filePost);
I don't think you need to set the Content-Disposition header yourself; as a response header it is used in the other direction (when the browser downloads a file and needs to know what to do with it).
Calling getParams().setParameter(...) is optional. You can also set the parameter directly on the HttpClient instance.
AFAIK, the order in which you set request headers is irrelevant, as long as they are all set before you set the request body.

Related

Is it necessarily so that you can POST a byte stream to any API that will accept a file, or does it depend on the API?

I realize that not knowing this indicates a gap in my understanding of how REST-like APIs work, and if someone can point me to a reference where I can learn the background behind this question, I would appreciate it. In the meantime, though, I would also appreciate help answering this question!
I have a Java application that posts files from the local filesystem to an API. Instead of having millions of files sitting on the volume with all of their file handles, I want to leave the files in a .tar.gz archive, pull them out of the archive in memory, and POST them without writing them to disk. I know that I can write them to disk, POST them, and then delete them, but I view that option as a last resort.
So here's code that works to POST a file that exists in the file system, not in an archive
public CloseableHttpResponse submit (File file) throws IOException {
CloseableHttpClient client = HttpClients.createDefault();
HttpPost post = new HttpPost(API_LOCATION + API_BASE);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addBinaryBody("files", file, ContentType.APPLICATION_OCTET_STREAM, null);
HttpEntity multipartEntity = builder.build();
post.setEntity(multipartEntity);
CloseableHttpResponse response = client.execute(post);
System.out.println("response: " + IOUtils.toString(response.getEntity().getContent(),"UTF-8"));
client.close();
return response;
}
I get back a JSON response from my particular API that looks like this
response: {"data":[<bunch of json>]}
I've put the same file into a .tar.gz archive and have used apache commons compress to unzip the file and pull out each file as a TarArchiveEntry, and I've tested that it works properly by writing the text file to disk and opening it manually outside of java - I am definitely getting the entry into memory correctly. I tried changing the entity attached to the POST to a ByteArrayEntity and converting the archive entry to a byte stream, but the API insists it will only accept a multipart entity. So looking at the API for MultipartEntityBuilder.addBinaryBody it appears I'm left with two options: I can either post a byte array or an InputStream. I've tried both and I can't get either to work - I'll post my example code for the byte array approach, but I can't figure out how to convert the tar archive to an InputStream - at least not without converting it to a byte array first, which seems sorta silly at that point.
public CloseableHttpResponse submit (byte[] xmlBytes) throws IOException {
CloseableHttpClient client = HttpClients.createDefault();
HttpPost post = new HttpPost(API_LOCATION + API_BASE);
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addBinaryBody("files", xmlBytes, ContentType.APPLICATION_OCTET_STREAM, null);
HttpEntity multipartEntity = builder.build();
post.setEntity(multipartEntity);
CloseableHttpResponse response = client.execute(post);
System.out.println("response: " + IOUtils.toString(response.getEntity().getContent(),"UTF-8"));
System.out.println(response.getStatusLine().getStatusCode());
client.close();
return response;
}
I believe the code is identical with the exception of the data type of the input parameter. Here is my empty response, which comes with a status code 207:
response: {"data":[]}
So here is my real question: Can any API that accept files also accept a file in the form of a byte stream or byte array? Can the API tell the difference, and what is really happening when I POST a file? Does the API have to be specifically configured to accept this file in the form of a byte stream or a byte array? A link to a reference along with a short explanation would be highly appreciated - I really need to learn this stuff and understand it well.
Is there some easy-to-correct mistake that I'm making? Am I using the wrong Content-Type or something? I'm not even sure what the last argument to MultipartEntityBuilder.addBinaryBody means (the one I've left null).
Any help is appreciated, thank you very much!
It appears that an API that accepts a file doesn't care if it comes from a file object or a byte array. Per JB Nizet:
You're passing null as the file name. When passing a File as argument, the actual name of the File is used if you passed null as file name. That obviously doesn't happen if you pass a byte array. So specify a non-null file name as last argument. That can only be found out by reading the javadoc and the source code of MultipartEntityBuilder. It's open source: use that as an advantage.
In this specific case, passing an arbitrary non-null string as the last argument of the addBinaryBody method fixes the problem, and the API accepts the byte array as a file.
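To see why the file name makes a difference, it can help to look at roughly what one part of a multipart/form-data body looks like on the wire. The sketch below is not how MultipartEntityBuilder is implemented; it is a hand-rolled approximation using only the JDK, and the class name, method names, and boundary value are made up for illustration. Many server-side frameworks appear to treat a part as an uploaded file only when its Content-Disposition header carries a filename parameter, which would explain the empty result when the name was null.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MultipartPartSketch {

    /** Builds the header lines of one multipart/form-data part (hypothetical helper). */
    static String partHeaders(String boundary, String fieldName, String fileName, String contentType) {
        StringBuilder sb = new StringBuilder();
        sb.append("--").append(boundary).append("\r\n");
        sb.append("Content-Disposition: form-data; name=\"").append(fieldName).append('"');
        if (fileName != null) {
            // Without this parameter the part looks like an ordinary form field.
            sb.append("; filename=\"").append(fileName).append('"');
        }
        sb.append("\r\n");
        sb.append("Content-Type: ").append(contentType).append("\r\n\r\n");
        return sb.toString();
    }

    /** Assembles a complete body containing a single binary part. */
    static byte[] singlePartBody(String boundary, String fieldName, String fileName, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] head = partHeaders(boundary, fieldName, fileName, "application/octet-stream")
                .getBytes(StandardCharsets.UTF_8);
        // The closing delimiter is the boundary followed by two extra dashes.
        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8);
        out.write(head, 0, head.length);
        out.write(content, 0, content.length);
        out.write(tail, 0, tail.length);
        return out.toByteArray();
    }
}
```

The bytes between the headers and the closing delimiter are the same whether they came from a File, a byte array, or an InputStream, so the receiving API has no way to tell those sources apart; only the headers differ.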

Handling file downloads via REST API

I want to set up the REST API to support file downloads via Java (The java part is not needed at the moment -- I am saying it in here so you can make your answer more specific for my problem).
How would I do that?
For example, I have this file in a folder (./java.jar), how can I stream it in such a way for it to be downloadable by a Java client?
I forgot to say that this is for paid content.
My app should be able to do this
Client: Post to server with username,pass.
Rest: Respond accordingly to what user has bought (so if it has bought that file, download it)
Client: Download file and put it in x folder.
I thought of encoding a file in base64 and then posting the encoded result into the usual .json (maybe with a nice name -- useful for the java application, and with the code inside -- though I would not know how I should rebuild the file at this point). <- Is this plausible? Or is there an easier way?
Also, please do not downvote if unnecessary, although there is no code in the question, that doesn't mean I haven't researched it, it just means that I found nothing suitable for my situation.
Thanks.
What you need is regular file streaming from a valid URL.
The code below is an excerpt from here
import java.net.*;
import java.io.*;
public class URLReader {
public static void main(String[] args) throws Exception {
URL oracle = new URL("http://www.oracle.com/");
BufferedReader in = new BufferedReader(
new InputStreamReader(oracle.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
}
}
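Note that the excerpt above reads the response as lines of text, which suits an HTML page but would likely corrupt a binary file such as a jar. A minimal sketch for saving the stream byte for byte, using only the JDK, might look like the following; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class BinaryDownloadSketch {

    /**
     * Copies every byte from the stream into the target file, replacing any
     * existing file, and returns the number of bytes written.
     */
    static long saveTo(InputStream in, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails.
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With the URL class from the excerpt, the stream argument would be something like url.openStream(), and the target would be a path inside the folder the client wants the file in.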
For your needs, based on your updated comments on the above answer, you could call your REST endpoint after the user logs in (with the auth and any other headers or body you wish to receive) and then proceed to the download.
Convert your jar/downloadable content to bytes. More on this:
Java Convert File to Byte Array and vice versa
Alternatively, if you don't want regular streaming as described in the previous answers, you can put the byte content in the body as a Base64 string. You can encode your byte array to Base64 using something like the call below.
Base64.encodeToString(byte[], Base64.NO_WRAP + Base64.URL_SAFE);
Reference from here: How to send byte[] and strings to a restful webservice and retrieve this information in the web method implementation
Again, there are many ways to do this; this is just one way to do it over REST.
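The Base64 call quoted above looks like the Android android.util.Base64 API. On a plain JVM (Java 8 or later), java.util.Base64 offers something similar. The sketch below shows the round trip, including the decode step the question asked about for rebuilding the file; the class and method names are hypothetical, and the URL-safe encoder is assumed to be the closest match to the NO_WRAP and URL_SAFE flags.

```java
import java.util.Base64;

final class Base64RoundTripSketch {

    /** Server side: turns the file's bytes into a URL-safe Base64 string with no line breaks. */
    static String encode(byte[] fileBytes) {
        return Base64.getUrlEncoder().encodeToString(fileBytes);
    }

    /** Client side: recovers the original bytes, ready to be written to disk. */
    static byte[] decode(String encoded) {
        return Base64.getUrlDecoder().decode(encoded);
    }
}
```

On the client, the decoded array can be written out with Files.write(path, bytes). Keep in mind that Base64 inflates the payload by about a third, so for large files plain streaming is usually the lighter option.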

How to access and parse XML file using Java and JavaScript

New to the development scene, please ignore my ignorance if I happen to not make any sense......
I'm trying to access an XML file located in my EJB directory, which has to stay there. I need to parse it into a JavaScript-accessible object, preferably JSON, so I can manipulate it dynamically using JavaScript / Angular.
using JBOSS, and the file's location is something like
/FOO-ejb/src/main/resources/Config.xml, obviously not accessible through the web since it does not reside under a webserver root directory,
Java is the back-end and I can't seem to find any other ways to access this file to serve it to the front-end,
I'm heading towards the direction of using a service within the EJB to access the file, parse it, then use a REST service to serve the object to the front-end....or write a JSP to read in the file, parse it etc....
are there any other better solutions for this?
Thank you everyone for your time!
I don't think what you want to do is achievable directly, since it would mean using JavaScript to access the file system, which is not possible. HTML5 does offer a File API, but it cannot be used to access arbitrary files in the file system.
So I'd say the direction you're heading in is the most appropriate and probably the easiest, because even if you found a way to do it in JavaScript, it would be browser-dependent or a fragile workaround that could break in a future browser version.
I used Apache Abdera in a servlet in the past to parse an XML RSS feed and convert it to JSON. Abdera is good at that and worked perfectly for me. After getting the JSON object I just had to write it to the response, and on the client side I used an AJAX call to the servlet to get the JSON object.
The code was something like this:
try {
PrintWriter result = response.getWriter();
// Creates Abdera object and client to process the request.
Abdera abderaObj = new Abdera();
AbderaClient client = new AbderaClient(abderaObj);
AbderaClient.registerTrustManager(); // For SSL connections.
// Sent the HTTP request of the ATOM Feed through AbderaClient.
ClientResponse resp = client.get( "http://url/to/your/feed" );
// if the response was OK...
if (resp.getType() == ResponseType.SUCCESS) {
// We get the document as a Feed
Document<Feed> doc = resp.getDocument();
// Creates a JSON writer to convert the ATOM Feed
Writer json = abderaObj.getWriterFactory().getWriter("json");
// Converts the (XML) ATOM Feed into JSON object
doc.writeTo(json, result);
}
} catch (Exception ex) {
ex.printStackTrace(System.out);
}
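Abdera is aimed at Atom feeds, so for a plain configuration file the JDK's own DOM parser may be enough. The sketch below assumes, purely for illustration, that Config.xml is a flat document whose root has simple child elements containing text; the class name, method names, and that layout are all assumptions, and the JSON writer is deliberately minimal.

```java
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class ConfigXmlToJsonSketch {

    /** Reads the root's direct child elements into an ordered name-to-text map. */
    static Map<String, String> readFlatConfig(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        Map<String, String> values = new LinkedHashMap<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes and comments between elements.
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                values.put(child.getNodeName(), child.getTextContent().trim());
            }
        }
        return values;
    }

    /** Writes the map as a JSON object of strings. */
    static String toJson(Map<String, String> values) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(entry.getKey())).append("\":\"")
              .append(escape(entry.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes only backslashes and quotes; a real service should use a JSON library.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

Files under src/main/resources normally end up on the classpath, so inside the EJB the stream would typically come from getClass().getResourceAsStream("/Config.xml"), and the REST service would return the resulting string with an application/json content type.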

Office Web Apps Word Editing

The idea is to build a proprietary Java back end document system using Office Web Apps.
We have created the WOPI client which allows us to view/edit PowerPoint and Excel web app documents but we can only view Word Documents.
In order to edit Word Web App documents you need to implement MS-FSSHTTP.
It appears there is no information about how to actually do this in code. Has anyone performed this or would know how?
Recently my team and I implemented a WOPI host that supports viewing and editing of Word, PowerPoint and Excel documents. You can take a look at https://github.com/marx-yu/WopiHost, which is a command prompt project that listens on port 8080 and enables editing and viewing of Word documents through Microsoft Office Web Apps.
We have implemented this solution in a webApi and it works great. Hope this sample project will help you out.
As requested, I will try to add code samples to clarify how to implement it based on my webApi implementation, but there is a lot of code to write to actually make it work properly.
First things first: to enable editing you will need to capture HTTP POSTs in a FilesController. Each POST that concerns the actual editing will have the header X-WOPI-Override equal to COBALT. In these POSTs you will find that the InputStream is an Atom type. Based on the MS-WOPI documentation, your response will need to include the headers X-WOPI-CorrelationID and request-id.
Here is the code of my webApi post method (it is not complete since I'm still implementing that WOPI protocol).
string wopiOverride = Request.Headers.GetValues("X-WOPI-Override").First();
if (wopiOverride.Equals("COBALT"))
{
string filename = name;
EditSession editSession = CobaltSessionManager.Instance.GetSession(filename);
var filePath = HostingEnvironment.MapPath("~/App_Data/");
if (editSession == null){
var fileExt = filename.Substring(filename.LastIndexOf('.') + 1);
if (fileExt.ToLower().Equals(@"xlsx"))
editSession = new FileSession(filename, filePath + "/" + filename, @"yonggui.yu", @"yuyg", @"yonggui.yu@emacle.com", false);
else
editSession = new CobaltSession(filename, filePath + "/" + filename, @"patrick.racicot", @"Patrick Racicot", @"patrick.racicot@hospitalis.com", false);
CobaltSessionManager.Instance.AddSession(editSession);
}
//cobalt, for docx and pptx
var ms = new MemoryStream();
HttpContext.Current.Request.InputStream.CopyTo(ms);
AtomFromByteArray atomRequest = new AtomFromByteArray(ms.ToArray());
RequestBatch requestBatch = new RequestBatch();
Object ctx;
ProtocolVersion protocolVersion;
requestBatch.DeserializeInputFromProtocol(atomRequest, out ctx, out protocolVersion);
editSession.ExecuteRequestBatch(requestBatch);
foreach (Request request in requestBatch.Requests)
{
if (request.GetType() == typeof(PutChangesRequest) && request.PartitionId == FilePartitionId.Content)
{
//upload file to hdfs
editSession.Save();
}
}
var responseContent = requestBatch.SerializeOutputToProtocol(protocolVersion);
var host = Request.Headers.GetValues("Host");
var correlationID = Request.Headers.GetValues("X-WOPI-CorrelationID").First();
response.Headers.Add("X-WOPI-CorrelationID", correlationID);
response.Headers.Add("request-id", correlationID);
MemoryStream memoryStream = new MemoryStream();
var streamContent = new PushStreamContent((outputStream, httpContext, transportContent) =>
{
responseContent.CopyTo(outputStream);
outputStream.Close();
});
response.Content = streamContent;
response.Content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
response.Content.Headers.ContentLength = responseContent.Length;
}
As you can see, in this method I make use of CobaltSessionManager and CobaltSession, which are used to create and manage editing sessions on the Cobalt protocol. You will also need what I call a CobaltHostLockingStore, which is used to handle the different requests when communicating with the Office Web Apps server during edit initialization.
I won't be posting the code for these 3 classes since they are already coded in the sample github project I posted and that they are fairly simple to understand even though they are big.
If you have more questions or if it's not clear enough don't hesitate to comment and I will update my post accordingly.
Patrick Racicot provided a great answer. But I had a problem saving docx files (an exception in CobaltCore.dll), and I even started using the dotPeek decompiler to try to figure it out.
But after I locked the editSession variable in my WebApi method, everything started working like magic. It seems that OWA sends requests that should be handled as a chain, not in parallel as controller methods usually handle them.
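Since the original question is about a Java back end, here is a rough Java analogue of that locking fix. It is only a sketch of the idea (handle requests for the same document one at a time), not a port of the C# code; the class name, method name, and the choice of the file name as the lock key are assumptions.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class PerSessionLockSketch {

    // One lock object per document; entries are never removed in this sketch.
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    /**
     * Runs the given work while holding the lock for that file, so two requests
     * for the same document never execute at the same time.
     */
    <T> T runExclusively(String fileName, Supplier<T> work) {
        Object lock = locks.computeIfAbsent(fileName, key -> new Object());
        synchronized (lock) {
            return work.get();
        }
    }
}
```

A request handler would wrap its whole Cobalt batch (deserialize, execute, save, serialize) in a single runExclusively call, while requests for different documents could still run in parallel.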

Selenium 2: Detect content type of link destinations

I am using the Selenium 2 Java API to interact with web pages. My question is: How can I detect the content type of link destinations?
Basically, this is the background: Before clicking a link, I want to be sure that the response is an HTML file. If not, I need to handle it in another way. So, let's say there is a download link for a PDF file. The application should directly read the contents of that URL instead of opening it in the browser.
The goal is to have an application which automatically knows whether the current location is HTML, PDF, XML or something else, so that it can use the appropriate parser to extract useful information from the document.
Update
Added bounty: Will reward it to the best solution which allows me to get the content type of a given URL.
As Jochen suggests, the way to get the Content-Type without also downloading the content is an HTTP HEAD request, and the Selenium WebDriver API does not seem to offer functionality like that. You'll have to find another library to help you fetch the content type of a URL.
A Java library that can do this is Apache HttpComponents, especially HttpClient.
(The following code is untested)
HttpClient httpclient = new DefaultHttpClient();
HttpHead httphead = new HttpHead("http://foo/bar");
HttpResponse response = httpclient.execute(httphead);
// getFirstHeader returns the Header interface (null if the header is absent).
Header contenttypeheader = response.getFirstHeader("Content-Type");
System.out.println(contenttypeheader);
The project publishes JavaDoc for HttpClient, the documentation for the HttpClient interface contains a nice example.
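If pulling in HttpClient is not an option, the JDK's HttpURLConnection can also send a HEAD request. The sketch below is an alternative under that assumption, not the approach from the answer; the class and method names are hypothetical. It also shows one way to decide whether a header value means HTML, since the header often carries parameters such as a charset.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Locale;

final class ContentTypeSketch {

    /** Sends a HEAD request and returns the raw Content-Type header, or null if absent. */
    static String fetchContentType(URL url) throws IOException {
        // The cast assumes an http or https URL; other schemes would fail here.
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("HEAD");
        try {
            return connection.getContentType();
        } finally {
            connection.disconnect();
        }
    }

    /** Strips parameters such as "; charset=UTF-8" and normalizes case. */
    static String mediaType(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return "";
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0 ? contentTypeHeader.substring(0, semicolon) : contentTypeHeader;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    /** True for the media types a browser would normally render as a page. */
    static boolean isHtml(String contentTypeHeader) {
        String type = mediaType(contentTypeHeader);
        return type.equals("text/html") || type.equals("application/xhtml+xml");
    }
}
```

The Selenium side would collect the href of each link, call fetchContentType on it, and only click when isHtml returns true. Some servers answer HEAD differently from GET or do not support it at all, so a fallback may be needed.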
You can figure out the content type while processing the data coming in.
Not sure why you need to figure this out first.
If so, use the HEAD method and look at the Content-Type header.
You can retrieve all the URLs from the DOM, and then parse the last few characters of each URL (using a Java regex) to determine the link type.
You can parse the characters following the last dot. For example, in the URL http://yoursite.com/whatever/test.pdf, extract the pdf, and apply your test logic accordingly.
Am I oversimplifying your problem?
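For what it's worth, the extension idea can be sketched without a regex by letting java.net.URI split off the path first, which keeps query strings and fragments out of the way. The class and method names here are hypothetical, and the extension is only a guess at the type; the server's Content-Type header remains the authoritative answer.

```java
import java.net.URI;
import java.util.Locale;

final class LinkExtensionSketch {

    /**
     * Returns the lower-cased extension of the last path segment, or an empty
     * string when there is none. Throws IllegalArgumentException for strings
     * that are not valid URIs.
     */
    static String extensionOf(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must sit inside the last segment and not be its final character.
        if (dot <= slash || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }
}
```

Links that end in a slash or carry no extension come back as an empty string, so those would still need the HEAD request to be classified.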
