How to distinguish pdf and non pdf files?

I used the following snippet to download PDF files (I took it from here, credits to Josh M):
public final class FileDownloader {
private FileDownloader(){}
public static void main(String args[]) throws IOException{
download("http://pdfobject.com/pdf/sample.pdf", new File("sample.pdf"));
}
public static void download(final String url, final File destination) throws IOException {
final URLConnection connection = new URL(url).openConnection();
connection.setConnectTimeout(60000);
connection.setReadTimeout(60000);
connection.addRequestProperty("User-Agent", "Mozilla/5.0");
final FileOutputStream output = new FileOutputStream(destination, false);
final byte[] buffer = new byte[2048];
int read;
final InputStream input = connection.getInputStream();
while((read = input.read(buffer)) > -1)
output.write(buffer, 0, read);
output.flush();
output.close();
input.close();
}
}
It works perfectly with PDF files. However, I encountered a "bad file" (I do not know what its extension is), and the code appears to get stuck in an infinite loop at while((read = input.read(buffer)) > -1). How can I improve this snippet to discard any kind of inappropriate file (non-PDFs)?

There is a question about a similar issue: Infinite Loop in Input Stream.
Check out a possible solution: Abort loop after fixed time.
You could also try setting a timeout for the connection: Java URLConnection Timeout.
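One way to discard non-PDF responses before writing anything to disk is to look at the first bytes of the stream, since a PDF file begins with the ASCII signature %PDF-. The following is only a minimal sketch under that assumption. The class name PdfSniffer and its methods are hypothetical and are not part of the original snippet, and the check is deliberately strict (it requires the signature at offset 0).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of the original FileDownloader.
final class PdfSniffer {
    // A PDF file starts with the ASCII bytes "%PDF-".
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    /** Returns true if the first 'length' bytes of 'head' start with the PDF signature. */
    static boolean looksLikePdf(byte[] head, int length) {
        if (length < MAGIC.length || head.length < MAGIC.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    /**
     * Peeks at the start of the stream and pushes the bytes back, so the caller
     * can still copy the complete content afterwards. The PushbackInputStream
     * must have been created with a pushback buffer of at least 5 bytes.
     */
    static boolean looksLikePdf(PushbackInputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int total = 0;
        // read() may return fewer bytes than requested, so loop until full or EOF.
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n == -1) {
                break;
            }
            total += n;
        }
        if (total > 0) {
            in.unread(head, 0, total);
        }
        return looksLikePdf(head, total);
    }
}
```

In download(), the connection stream could be wrapped as new PushbackInputStream(connection.getInputStream(), 5) and the method could return early (without creating the output file) when looksLikePdf reports false. Checking connection.getContentType() is another option, but servers do not always label content correctly.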

Related

OOM while uploading large file

I need to upload a very large file (a few GB) from my machine to a server.
I tried the approach below, but I keep getting:
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
I could increase the heap size, but I do not want to, because I am not sure where my code will run. I want to read a few KB or MB, send them to the server, release the memory, and repeat. I tried other approaches such as the Files utilities or IOUtils.copyLarge, but I get the same problem.
URL serverUrl =
new URL(url);
HttpURLConnection urlConnection = (HttpURLConnection) serverUrl.openConnection();
urlConnection.setConnectTimeout(Configs.TIMEOUT);
urlConnection.setReadTimeout(Configs.TIMEOUT);
File fileToUpload = new File(file);
urlConnection.setDoOutput(true);
urlConnection.setRequestMethod("POST");
urlConnection.addRequestProperty("Content-Type", "application/octet-stream");
urlConnection.connect();
OutputStream output = urlConnection.getOutputStream();
FileInputStream input = new FileInputStream(fileToUpload);
upload(input, output);
//..close streams
private static long upload(InputStream input, OutputStream output) throws IOException {
try (
ReadableByteChannel inputChannel = Channels.newChannel(input);
WritableByteChannel outputChannel = Channels.newChannel(output)
) {
ByteBuffer buffer = ByteBuffer.allocateDirect(10240);
long size = 0;
while (inputChannel.read(buffer) != -1) {
buffer.flip();
size += outputChannel.write(buffer);
buffer.clear();
}
return size;
}
}
I think it has something to do with this, but I can't figure out what I am doing wrong.
Another approach I tried is below, but I get the same issue:
private static long copy(InputStream source, OutputStream sink)
throws IOException {
long nread = 0L;
byte[] buf = new byte[10240];
int n;
int i = 0;
while ((n = source.read(buf)) > 0) {
sink.write(buf, 0, n);
nread += n;
i++;
if (i % 10 == 0) {
log.info("flush");
sink.flush();
}
}
return nread;
}

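Use setFixedLengthStreamingMode, as per this answer on the duplicate question Denis Tulskiy linked to. Call it before connecting, and pass the length as a long (available since Java 7) so that a file of a few GB does not overflow an int:
urlConnection.setFixedLengthStreamingMode(fileToUpload.length());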
From the docs:
This method is used to enable streaming of a HTTP request body without internal buffering, when the content length is known in advance.
At the moment, your code is attempting to buffer the file into Java's heap memory in order to compute the Content-Length header on the HTTP request.
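Putting this together, an upload that never holds more than one small buffer in memory might look like the sketch below. It is an illustration, not the asker's actual code: the class name StreamingUpload and the upload and copy helpers are hypothetical, and the target URL and file are supplied by the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating fixed-length streaming.
final class StreamingUpload {
    private StreamingUpload() {}

    /** Copies the stream in 8 KB chunks and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    /** POSTs the file without buffering the body on the heap; returns the HTTP status. */
    static int upload(URL target, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        // Must be set before the connection is opened. The long overload
        // avoids int overflow for files larger than 2 GB.
        conn.setFixedLengthStreamingMode(Files.size(file));
        try (InputStream in = Files.newInputStream(file);
             OutputStream out = conn.getOutputStream()) {
            copy(in, out);
        }
        return conn.getResponseCode();
    }
}
```

If the length is not known in advance, HttpURLConnection.setChunkedStreamingMode is the corresponding option, provided the server accepts chunked request bodies.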

Java socket, get image file but it doesn't open

This is my first question, so I hope I have written it correctly.
I am trying to send a byte[] array through a Java socket; the array contains an image.
Here is the code to send the file:
public void WriteBytes(FileInputStream dis) throws IOException{
//bufferEscritura.writeInt(dis.available()); --- readInt() doesnt work correctly
Write(String.valueOf((int)dis.available()) + "\r\n");
byte[] buffer = new byte[1024];
int bytes = 0;
while((bytes = dis.read(buffer)) != -1){
Write(buffer, bytes);
}
System.out.println("Photo send!");
}
public void Write(byte[] buffer, int bytes) throws IOException {
bufferEscritura.write(buffer, 0, bytes);
}
public void Write(String contenido) throws IOException {
bufferEscritura.writeBytes(contenido);
}
My image:
URL url = this.getClass().getResource("fuegos_artificiales.png");
FileInputStream dis = new FileInputStream(url.getPath());
sockManager.WriteBytes(dis);
My code to get the image file:
public byte[] ReadBytes() throws IOException{
DataInputStream dis = new DataInputStream(mySocket.getInputStream());
int size = Integer.parseInt(Leer());
System.out.println("Recived size: "+ size);
byte[] buffer = new byte[size];
System.out.println("We are going to read!");
dis.readFully(buffer);
System.out.println("Photo received!");
return buffer;
}
public String Leer() throws IOException {
return (bufferLectura.readLine());
}
And to create an image file:
byte[] array = tcpCliente.getSocket().ReadBytes();
FileOutputStream fos = new FileOutputStream("porfavor.png");
try {
fos.write(array);
}
finally {
fos.close();
}
The image file is created, but when I try to open it (for example, with Paint) it says that it can't open the file because it is damaged.
I also opened both images (the original and the new one) in Notepad, and they appear to contain the same data.
I don't know what is happening.
I hope you help me.
Thanks!

Don't use available() as a measure of file length. It isn't. There is a specific warning in the Javadoc about that.
Use DataOutputStream.writeInt() to write the length, and DataInputStream.readInt() to read it, and use the same streams to read the image data. Don't use multiple streams on the same socket.
Also in this:
URL url = this.getClass().getResource("fuegos_artificiales.png");
FileInputStream dis = new FileInputStream(url.getPath());
the second line should be:
InputStream in = url.openConnection().getInputStream();
A class resource is not a file.
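The length-prefix approach described above can be sketched as follows. This is an illustration under stated assumptions: the class FramedBytes and its send and receive methods are hypothetical names, and both ends are assumed to use a single DataOutputStream or DataInputStream per socket for everything they exchange.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper: a 4-byte length prefix followed by the raw bytes.
final class FramedBytes {
    private FramedBytes() {}

    /** Writes the payload length as a big-endian int, then the payload itself. */
    static void send(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    /** Reads the length prefix, then exactly that many bytes. */
    static byte[] receive(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative length prefix: " + length);
        }
        byte[] payload = new byte[length];
        // readFully blocks until the whole payload has arrived or throws EOFException.
        in.readFully(payload);
        return payload;
    }
}
```

On the sending side the image bytes could be loaded completely first (for example by reading the stream returned by getResourceAsStream into a byte array), so that the real length is known instead of relying on available().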

Read Image from Socket [duplicate]

This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
Read Image File Through Java Socket
void readImage() throws IOException
{
socket = new Socket("upload.wikimedia.org", 80);
DataOutputStream bw = new DataOutputStream(new DataOutputStream(socket.getOutputStream()));
bw.writeBytes("GET /wikipedia/commons/8/80/Knut_IMG_8095.jpg HTTP/1.1\n");
bw.writeBytes("Host: wlab.cs.bilkent.edu.tr:80\n\n");
DataInputStream in = new DataInputStream(socket.getInputStream());
File file = new File("imgg.jpg");
file.createNewFile();
DataOutputStream dos = new DataOutputStream(new FileOutputStream(file));
int count;
byte[] buffer = new byte[8192];
while ((count = in.read(buffer)) > 0)
{
dos.write(buffer, 0, count);
dos.flush();
}
dos.close();
System.out.println("image transfer done");
socket.close();
}
-Create a socket
-Create output stream
-Request the page that includes image
-Read socket to an input stream
-Write to file
I am trying to read an image from a socket, but it is not working.
The data seems to be read and the file can be opened, but the image cannot be displayed.
Where is the problem?

You need to skip the HTTP headers to get the correct image data.
I already answered this question today; see: Read Image File Through Java Socket
The second problem is that you are trying to fetch an image from Wikipedia without a Referer header, and Wikipedia restricts that (you receive an access-denied response every time). Try another image URL (a Google image, for example).
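Skipping the headers means consuming bytes until the empty line that ends them, that is, the byte sequence CR LF CR LF. A minimal sketch follows; the class HttpHeaderSkipper is a hypothetical name, and the sketch assumes the response body is not chunk-encoded and that the server closes the connection after the body.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that positions a stream at the start of the HTTP body.
final class HttpHeaderSkipper {
    private HttpHeaderSkipper() {}

    /**
     * Consumes bytes up to and including the first "\r\n\r\n".
     * Returns the number of bytes skipped.
     */
    static int skipHeaders(InputStream in) throws IOException {
        int matched = 0; // how many bytes of "\r\n\r\n" have matched so far
        int skipped = 0;
        int b;
        while ((b = in.read()) != -1) {
            skipped++;
            char expected = (matched % 2 == 0) ? '\r' : '\n';
            if (b == expected) {
                matched++;
                if (matched == 4) {
                    return skipped;
                }
            } else {
                // A stray '\r' may itself be the start of the terminator.
                matched = (b == '\r') ? 1 : 0;
            }
        }
        throw new IOException("Stream ended before the end of the HTTP headers");
    }
}
```

The same stream must then be used to read the body. Wrapping the socket stream in a BufferedInputStream before calling skipHeaders keeps the byte-by-byte reads cheap.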

You can use URL objects directly to fetch HTTP content. The input stream returned by the URL object will only contain content at the URL. The example method below takes a URL, fetches its content and writes the content to a given file.
public static void createImageFile(URL url, File file) throws IOException{
FileOutputStream fos = null;
InputStream is = null;
byte[] b = new byte[1024]; // 1 kB read blocks.
URLConnection conn;
try{
conn = url.openConnection();
/* Set some connection options here
before opening the stream
(i.e. connect and read timeouts) */
is = conn.getInputStream();
fos = new FileOutputStream(file);
int i = 0;
do{
i = is.read(b);
if(i != -1)
fos.write(b, 0, i);
}while(i != -1);
}finally{
/* Don't forget to clean up. */
if(is != null){
try{
is.close();
}catch(Exception e){
/* Don't care */
}
}
if(fos != null){
try{
fos.close();
}catch(Exception e){
/* Don't care */
}
}
}
}

How to download a ZIP file from a URL and store it as a ZIP file only

I have a URL like the one below:
http://blah.com/download.zip
I want Java code to download this ZIP file from the URL and save it in my server directory as a ZIP file only. I would also like to know the most efficient way to do this.

First, your URL is not http:\\blah.com\download.zip. It is http://blah.com/download.zip.
Second, it is simple. You have to perform an HTTP GET request, take the response stream, and copy it to a FileOutputStream. Here is a code sample.
URL url = new URL("http://blah.com/download.zip");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
InputStream in = connection.getInputStream();
FileOutputStream out = new FileOutputStream("download.zip");
copy(in, out, 1024);
out.close();
public static void copy(InputStream input, OutputStream output, int bufferSize) throws IOException {
byte[] buf = new byte[bufferSize];
int n = input.read(buf);
while (n >= 0) {
output.write(buf, 0, n);
n = input.read(buf);
}
output.flush();
}
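On Java 7 or later, the manual copy loop can be replaced by java.nio.file.Files.copy, which streams the content to disk without loading it all into memory. The sketch below is one possible variant, not the answerer's code; the class name ZipDownloader and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper built on Files.copy.
final class ZipDownloader {
    private ZipDownloader() {}

    /** Writes the stream to the target path, replacing any existing file; returns bytes written. */
    static long save(InputStream in, Path target) throws IOException {
        return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Opens the URL with a GET request and stores the response body at the target path. */
    static long download(URL url, Path target) throws IOException {
        // try-with-resources closes the network stream even if the copy fails.
        try (InputStream in = url.openStream()) {
            return save(in, target);
        }
    }
}
```

A call such as ZipDownloader.download(new URL("http://blah.com/download.zip"), Paths.get("download.zip")) would store the bytes exactly as served, so the result stays a ZIP file as long as the server actually returns one.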

How can I send a generic file to a jersey service and receive it correctly?

I'm developing a Jersey service that uses Dropbox's API.
I need to post a generic file to my service (the service should be able to handle every kind of file, just as the Dropbox API can).
Client Side
So, I've implemented a simple client that:
opens the file,
creates a connection to the URL,
sets the correct HTTP method,
creates a FileInputStream and writes the file on the connection's outputstream using byte buffer.
This is the client test code.
public class Client {
public static void main(String args[]) throws IOException, InterruptedException {
String target = "http://localhost:8080/DCService/REST/professor/upload";
URL putUrl = new URL(target);
HttpURLConnection connection = (HttpURLConnection) putUrl.openConnection();
connection.setDoOutput(true);
connection.setInstanceFollowRedirects(false);
connection.setRequestMethod("POST");
connection.setRequestProperty("content-Type", "application/pdf");
OutputStream os = connection.getOutputStream();
InputStream is = new FileInputStream("welcome.pdf");
byte buf[] = new byte[1024];
int len;
int lung = 0;
while ((len = is.read(buf)) > 0) {
System.out.print(len);
lung += len;
is.read(buf);
os.write(buf, 0, len);
}
}
}
Server Side
I've a method that:
gets an InputStream as an argument,
creates a file with the same name and type of the original file.
The following code implements a test method to receive a specific PDF file.
@PUT
@Path("/upload")
@Consumes("application/pdf")
public Response uploadMaterial(InputStream is) throws IOException {
String name = "info";
String type = "exerc";
String description = "not defined";
Integer c = 10;
Integer p = 131;
File f = null;
try {
f = new File("welcome.pdf");
OutputStream out = new FileOutputStream(f);
byte buf[] = new byte[1024];
int len;
while ((len = is.read(buf)) > 0)
out.write(buf, 0, len);
out.close();
is.close();
System.out.println("\nFile is created........");
} catch (IOException e) {
throw new WebApplicationException(Response.Status.BAD_REQUEST);
}
//I need to pass a java.io.file object to this method
professorManager.uploadMaterial(name, type, description, c,p, f);
return Response.ok("<result>File " + name + " was uploaded</result>").build();
}
This implementation works only with text files. If I try to send a simple PDF, the received file is not readable (after I have saved it to disk).
How can I satisfy my requirements? Could anyone suggest a solution?

Your client code is faulty.
while ((len = is.read(buf)) > 0) {
...
is.read(buf);
...
}
You're reading from the InputStream twice in every iteration. Remove the read statement from the loop's body and you'll be fine.
You've also said that the code presented in your question works with text files. I don't think it does. Reading twice from the file you're trying to upload means you're uploading only half of its contents. Half a text file is still a text file, but half a PDF is only rubbish, so you can't open the latter. You should have double-checked whether the contents of your uploaded and saved text file are the same as the original.
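The effect of the extra read can be reproduced in isolation. The sketch below is an illustration only: the class name ReadOnceDemo and both method names are hypothetical. It places the faulty loop next to the corrected one so the difference in output size can be checked with in-memory streams.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical demo class contrasting the faulty and the corrected loop.
final class ReadOnceDemo {
    private ReadOnceDemo() {}

    /** The faulty pattern from the question: two reads per iteration. */
    static void copyWithExtraRead(InputStream is, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        while ((len = is.read(buf)) > 0) {
            is.read(buf);          // overwrites buf, so the chunk read above is lost
            os.write(buf, 0, len);
        }
    }

    /** The corrected loop: exactly one read per iteration. */
    static void copy(InputStream is, OutputStream os) throws IOException {
        byte[] buf = new byte[1024];
        int len;
        while ((len = is.read(buf)) != -1) {
            os.write(buf, 0, len);
        }
        os.flush();
    }
}
```

Fed a 4096-byte ByteArrayInputStream, copyWithExtraRead writes 2048 bytes while copy writes all 4096, which matches the "half the contents" explanation above.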
