I'm kind of new to Hadoop HDFS and quite rusty with Java and I need some help. I'm trying to read a file from HDFS and calculate the MD5 hash of this file. The general Hadoop configuration is as below.
private FSDataInputStream hdfsDIS;
private FileInputStream FinputStream;
private FileSystem hdfs;
private Configuration myConfig;
hdfs = FileSystem.get(new URI("hdfs://NodeName:54310"), myConfig);
hdfsDIS = hdfs.open(hdfsFilePath);
The function hdfs.open(hdfsFilePath) returns an FSDataInputStream.
The problem is that I can only get an FSDataInputStream out of HDFS, but I'd like to get a FileInputStream out of it.
The code below performs the hashing part and is adapted from something I found on Stack Overflow (I can't find the link to it now).
FileInputStream FinputStream = hdfsDIS; // <---This is where the problem is
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");
    FileChannel channel = FinputStream.getChannel();
    ByteBuffer buff = ByteBuffer.allocate(2048);
    while(channel.read(buff) != -1){
        buff.flip();      // switch the buffer from filling to draining
        md.update(buff);
        buff.clear();     // make the buffer ready for the next read
    }
    byte[] hashValue = md.digest();
    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e){
    return null;
}
catch (IOException e){
    return null;
}
The reason I need a FileInputStream is that the code that does the hashing uses a FileChannel, which supposedly increases the efficiency of reading the data from the file.
Could someone show me how I could convert the FSDataInputStream into a FileInputStream?
Use it as an InputStream:
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");
    byte[] buff = new byte[2048];
    int count;
    while((count = hdfsDIS.read(buff)) != -1){
        md.update(buff, 0, count);
    }
    byte[] hashValue = md.digest();
    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e){
    return null;
}
catch (IOException e){
    return null;
}
"the code that does the hashing uses a FileChannel which supposedly increases the efficiency of reading the data from the file"
Not in this case. It only improves efficiency if you're just copying the data to another channel, if you use a DirectByteBuffer. If you're processing the data, as here, it doesn't make any difference. A read is still a read.
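To make the point concrete, here is a minimal self-contained sketch of hashing an arbitrary InputStream. The class name StreamMd5 and the toHex implementation are my own assumptions (the question never shows its toHex helper); an FSDataInputStream could be passed in the same way because it is an InputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamMd5 {
    // Hypothetical stand-in for the toHex helper referenced in the question.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Reads the stream to the end and returns the MD5 digest as lowercase hex.
    static String md5Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        int count;
        while ((count = in.read(buf)) != -1) {
            md.update(buf, 0, count); // only the bytes actually read
        }
        return toHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("abc".getBytes(StandardCharsets.UTF_8));
        System.out.println(md5Hex(in)); // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```

The buffer size of 8192 is arbitrary; the digest does not depend on it.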
You can use the FSDataInputStream as just a regular InputStream, and pass that to Channels.newChannel to get back a ReadableByteChannel instead of a FileChannel. Here's an updated version:
InputStream inputStream = hdfsDIS;
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");
    ReadableByteChannel channel = Channels.newChannel(inputStream);
    ByteBuffer buff = ByteBuffer.allocate(2048);
    while(channel.read(buff) != -1){
        buff.flip();      // switch the buffer from filling to draining
        md.update(buff);
        buff.clear();     // make the buffer ready for the next read
    }
    byte[] hashValue = md.digest();
    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e){
    return null;
}
catch (IOException e){
    return null;
}
You can't do that assignment because of the class hierarchy:
java.lang.Object
  extended by java.io.InputStream
    extended by java.io.FilterInputStream
      extended by java.io.DataInputStream
        extended by org.apache.hadoop.fs.FSDataInputStream
FSDataInputStream is not a FileInputStream.
That said, to convert from FSDataInputStream to FileInputStream,
you could use the FSDataInputStream's FileDescriptor to create a FileInputStream, according to the API:
new FileInputStream(hdfsDIS.getFileDescriptor());
I'm not sure it will work.
I've found many ways of converting a file to a byte array and writing byte array to a file on storage.
What I want is to convert java.io.File to a byte array and then convert a byte array back to a java.io.File.
I don't want to write it out to storage like the following:
//convert array of bytes into file
FileOutputStream fileOuputStream = new FileOutputStream("C:\\testing2.txt");
I want to somehow do the following:
File myFile = ConvertfromByteArray(bytes);
Otherwise, try this:
Converting a file to bytes
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
public class Temp {
    public static void main(String[] args) {
        File file = new File("c:/EventItemBroker.java");
        byte[] b = new byte[(int) file.length()];
        try (FileInputStream fileInputStream = new FileInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until the array is full
            int off = 0;
            int n;
            while (off < b.length && (n = fileInputStream.read(b, off, b.length - off)) != -1) {
                off += n;
            }
            for (int i = 0; i < b.length; i++) {
                System.out.print((char) b[i]);
            }
        } catch (FileNotFoundException e) {
            System.out.println("File Not Found.");
        } catch (IOException e1) {
            System.out.println("Error Reading The File.");
        }
    }
}
Converting bytes to a file
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
public class WriteByteArrayToFile {
    public static void main(String[] args) {
        String strFilePath = "Your path";
        try (FileOutputStream fos = new FileOutputStream(strFilePath)) {
            String strContent = "Write File using Java ";
            fos.write(strContent.getBytes());
        } catch (FileNotFoundException ex) {
            System.out.println("FileNotFoundException : " + ex);
        } catch (IOException ioe) {
            System.out.println("IOException : " + ioe);
        }
    }
}
I think you misunderstood what the java.io.File class really represents. It is just a representation of the file on your system, i.e. its name, its path etc.
Have a look at the Javadoc for the java.io.File class.
If you check its fields, methods, or constructor arguments, you will see immediately that all it holds is a representation of the path or URI.
Oracle provides quite an extensive Java File I/O tutorial, which also covers the newer NIO.2 functionality.
With NIO.2 you can read it in one line using java.nio.file.Files.readAllBytes().
Similarly you can use java.nio.file.Files.write() to write all bytes in your byte array.
Since the question is tagged Android, the more conventional way is to wrap the FileInputStream in a BufferedInputStream and copy what you read into a ByteArrayOutputStream.
Its toByteArray() method then gives you the contents as a byte[]. For the reverse direction, ByteArrayInputStream lets you read from a byte[] as a stream.
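As a rough illustration of the two NIO.2 calls mentioned above, the sketch below writes a byte array to a temporary file and reads it back. The class and method names (NioRoundTrip, readAll, writeAll) are made up for the example, and it assumes Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class NioRoundTrip {
    // File -> byte[]: the whole file is loaded into memory at once.
    static byte[] readAll(Path source) throws IOException {
        return Files.readAllBytes(source);
    }

    // byte[] -> file: creates the file or truncates an existing one.
    static void writeAll(Path target, byte[] data) throws IOException {
        Files.write(target, data);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("roundtrip", ".bin");
        try {
            writeAll(tmp, new byte[] {1, 2, 3});
            System.out.println(readAll(tmp).length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If you already hold a java.io.File, file.toPath() gives you the Path these methods expect.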
You can't do this. A File is just an abstract way to refer to a file in the file system. It doesn't contain any of the file contents itself.
If you're trying to create an in-memory file that can be referred to using a File object, you aren't going to be able to do that either, as explained in other threads and many other places.
Apache Commons IO's FileUtils provides very handy methods for the conversion:
try {
    File file = new File(imagefilePath);
    byte[] byteArray = FileUtils.readFileToByteArray(file);
} catch (Exception e) {
    e.printStackTrace();
}
There is no such functionality, but you can use a temporary file created with File.createTempFile().
File temp = File.createTempFile(prefix, suffix);
// tell system to delete it when vm terminates.
temp.deleteOnExit();
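Putting the temporary-file idea together, something like the following sketch is about as close as you can get to the ConvertfromByteArray(bytes) call from the question. The helper name fromByteArray and the class TempFiles are hypothetical, and the bytes do end up on disk, just in the system's temp directory.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

final class TempFiles {
    // Hypothetical helper: writes the bytes to a temp file and returns a File pointing at it.
    // Note that File.createTempFile requires a prefix of at least three characters.
    static File fromByteArray(byte[] bytes, String prefix, String suffix) throws IOException {
        File temp = File.createTempFile(prefix, suffix);
        temp.deleteOnExit(); // best effort: removed when the VM exits normally
        Files.write(temp.toPath(), bytes);
        return temp;
    }
}
```

A call such as TempFiles.fromByteArray(bytes, "upload", ".tmp") then gives you a File you can hand to APIs that insist on one.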
You cannot do it for File, which is primarily an intelligent file path. Can you refactor your code so that it declares the variables, and passes around arguments, with type OutputStream instead of FileOutputStream? If so, see the classes java.io.ByteArrayOutputStream and java.io.ByteArrayInputStream:
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
// ... pass outStream to the code that expects an OutputStream ...
byte[] data = outStream.toByteArray();
InputStream inStream = new ByteArrayInputStream(data);
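A small sketch of that refactoring, with made-up method names (produce, consume, roundTrip): the producing code only sees an OutputStream and the consuming code only sees an InputStream, so neither cares that the data never touches the file system.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class InMemoryStreams {
    // Written against OutputStream, so it works for files and for memory alike.
    static void produce(OutputStream out) throws IOException {
        out.write("hello".getBytes(StandardCharsets.UTF_8));
    }

    // Written against InputStream; drains the stream and decodes it as UTF-8.
    static String consume(InputStream in) throws IOException {
        ByteArrayOutputStream copy = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n;
        while ((n = in.read(buf)) != -1) {
            copy.write(buf, 0, n);
        }
        return new String(copy.toByteArray(), StandardCharsets.UTF_8);
    }

    // Connects the two through a byte[] instead of a file on storage.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        produce(sink);
        byte[] data = sink.toByteArray();
        return consume(new ByteArrayInputStream(data));
    }
}
```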
1- Traditional way
The traditional way is to use the read() method of InputStream, as follows:
public static byte[] convertUsingTraditionalWay(File file) {
    byte[] fileBytes = new byte[(int) file.length()];
    try (FileInputStream inputStream = new FileInputStream(file)) {
        // note: a single read() is not guaranteed to fill the array for large files
        inputStream.read(fileBytes);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return fileBytes;
}
2- Java NIO
With Java 7, you can do the conversion using the Files utility class of the java.nio.file package:
public static byte[] convertUsingJavaNIO(File file) {
    byte[] fileBytes = null;
    try {
        fileBytes = Files.readAllBytes(file.toPath());
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return fileBytes;
}
3- Apache Commons IO
Besides the JDK, you can do the conversion using the Apache Commons IO library in two ways:
3.1. IOUtils.toByteArray()
public static byte[] convertUsingIOUtils(File file) {
    byte[] fileBytes = null;
    try (FileInputStream inputStream = new FileInputStream(file)) {
        fileBytes = IOUtils.toByteArray(inputStream);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return fileBytes;
}
3.2. FileUtils.readFileToByteArray()
public static byte[] convertUsingFileUtils(File file) {
    byte[] fileBytes = null;
    try {
        fileBytes = FileUtils.readFileToByteArray(file);
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return fileBytes;
}
Server side
public byte[] download() throws Exception {
    File f = new File("C:\\WorkSpace\\Text\\myDoc.txt");
    byte[] byteArray = FileUtils.readFileToByteArray(f);
    return byteArray;
}
Client side
private ResponseEntity<byte[]> getDownload(){
URI end = URI.create(your url which server has exposed i.e. bla
return rest.getForEntity(end,byte[].class);
public static void main(String[] args) throws Exception {
byte[] byteArray = new TestClient().getDownload().getBody();
FileOutputStream fos = new
System.out.println("file written successfully..");
// The file that you want to convert into a byte[]
File file = new File("/storage/0CE2-EA3D/DCIM/Camera/VID_20190822_205931.mp4");
FileInputStream fileInputStream = new FileInputStream(file);
byte[] data = new byte[(int) file.length()];
BufferedInputStream bufferedInputStream = new BufferedInputStream(fileInputStream);
bufferedInputStream.read(data, 0, data.length);
bufferedInputStream.close();
// Now the bytes of the file are contained in "byte[] data"
/* If you want to convert these bytes into a file, you have to write them to a
certain location; a new file is created there if no file with the same name
exists at that location */
FileOutputStream fileOutputStream = new FileOutputStream(Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_DOWNLOADS).toString() + "/Video.mp4");
fileOutputStream.write(data);
fileOutputStream.close();
/* This writes a new file named Video.mp4 in the "Download" directory of
external storage */
I would like to hash (MD5) all the files of a given directory, which holds 1000 2MB photos.
I tried just running a for loop and hashing a file at a time, but that caused memory issues.
I need a method to hash each file in an efficient manner (memory wise).
I have posted 3 questions with my problem, but now instead of fixing my code, I want to see what would be the best general approach to my requirement.
Thank you very much for the help.
public class MD5 {
    public static void main(String[] args) throws IOException {
        File file = new File("/Users/itaihay/Desktop/test");
        for (File f : file.listFiles()) {
            try {
                hash(f);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
    private static MessageDigest md;
    private static BufferedInputStream fis;
    private static byte[] dataBytes;
    private static byte[] mdbytes;
    private static void clean() throws NoSuchAlgorithmException {
        md = MessageDigest.getInstance("MD5");
        dataBytes = new byte[8192];
    }
    public static void hash(File file) {
        try {
            clean();
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
        try {
            fis = new BufferedInputStream(new FileInputStream(file));
            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
                nread = 0;
            }
            mdbytes = md.digest();
            System.out.println(javax.xml.bind.DatatypeConverter.printHexBinary(mdbytes).toLowerCase());
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                fis.close();
                dataBytes = null;
                md = null;
                mdbytes = null;
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
As others have said, using built-in Java MD5 code, you should be able to keep your memory footprint very small. I do something similar when hashing a large number of Jar files (up to a few MB apiece, usually 500MB-worth at a time) and get decent performance. You'll definitely want to play around with different buffer sizes until you find the optimal size for your system configuration. The following code-snippet uses no more than bufSize+128 bytes at a time, plus a negligible amount of overhead for the File, MessageDigest, and InputStream objects used to compute the md5 hash:
InputStream is = null;
File f = ...
int bufSize = ...
byte[] md5sum = null;
try {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    is = new FileInputStream(f);
    byte[] buffer = new byte[bufSize];
    int read = 0;
    while((read = is.read(buffer)) > 0) digest.update(buffer,0,read);
    md5sum = digest.digest();
} catch (Exception e){
} finally {
    try {
        if(is != null) is.close();
    } catch (IOException e){}
}
Increasing your Java heap space could solve it short term.
Long term, you want to look into reading images into a fixed-size queue that can fit in the memory. Don't read them all in at once. Enqueue the most recent image and dequeue the earliest image.
MD5 updates its state in 64-byte chunks, so you only need 64 bytes of a file in memory at a time. The MD5 state itself is 128 bits (16 bytes), as is the output size.
The most memory-conservative approach would be to read 64 bytes at a time from each file, file by file, and use it to update that file's MD5 state. You would need at most 999 * 16 + 64 = 16048 bytes, roughly 16 KB, of memory.
But such small reads would be very inefficient, so from there you can increase the read size from a file to fit within your memory constraints.
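For what it's worth, here is one way the file-by-file approach could be sketched, with a single buffer reused for every file so memory use stays near bufSize regardless of how many photos there are. The names DirectoryMd5 and hashAll are invented for the example, and it assumes Java 7+ for java.nio.file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class DirectoryMd5 {
    // Returns file name -> lowercase hex MD5 for every regular file directly inside dir.
    static Map<String, String> hashAll(Path dir, int bufSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[bufSize]; // one buffer, reused for every file
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (!Files.isRegularFile(p)) {
                    continue; // skip subdirectories and special files
                }
                try (InputStream in = Files.newInputStream(p)) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        md.update(buffer, 0, n);
                    }
                } // the stream is closed here, before the next file is opened
                // digest() returns the hash and resets md for the next file
                result.put(p.getFileName().toString(), toHex(md.digest()));
            }
        }
        return result;
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Closing each stream before opening the next one matters as much as the buffer size: leaked streams are a common cause of trouble when looping over many files.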
I need a very simple function that allows me to read the first 1k bytes of a file through FTP. I want to use it in MATLAB to read the first lines and, according to some parameters, to download only files I really need eventually. I found some examples online that unfortunately do not work. Here I'm proposing the sample code where I'm trying to download one single file (I'm using the Apache libraries).
FTPClient client = new FTPClient();
FileOutputStream fos = null;
try {
    client.connect("data.site.org");
    // filename to be downloaded.
    String filename = "filename.Z";
    fos = new FileOutputStream(filename);
    // Download file from FTP server
    InputStream stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
    byte[] b = new byte[1024];
    stream.read(b);
    fos.write(b);
} catch (IOException e) {
    e.printStackTrace();
} finally {
    try {
        if (fos != null) {
            fos.close();
        }
        client.disconnect();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
The error is in stream, which is returned empty. I know I'm passing the folder name the wrong way, but I cannot understand how I should do it. I've tried many ways.
I've also tried with Java's URL classes:
URL url;
url = new URL("ftp://data.site.org/pub/obs/2008/021/ab120210.08d.Z");
URLConnection con = url.openConnection();
BufferedInputStream in =
        new BufferedInputStream(con.getInputStream());
FileOutputStream out =
        new FileOutputStream("C:\\filename.Z");
int i;
byte[] bytesIn = new byte[1024];
if ((i = in.read(bytesIn)) >= 0) {
    out.write(bytesIn);
}
out.close();
in.close();
but it gives an error when I close the InputStream in!
I'm definitely stuck. Any comments would be very useful!
Try this test:
InputStream is = new URL("ftp://test:test@ftp.secureftp-test.com/bookstore.xml").openStream();
byte[] a = new byte[1000];
int n = is.read(a);
System.out.println(new String(a, 0, n));
It definitely works.
In my experience, when you read bytes from a stream acquired from ftpClient.retrieveFileStream, the first read is not guaranteed to fill your byte buffer. Either check the return value of stream.read(b) and loop on it, or use a library to fill the 1024-byte buffer:
InputStream stream = null;
try {
    // Download file from FTP server
    stream = client.retrieveFileStream("/pub/obs/2008/021/ab120210.08d.Z");
    byte[] b = new byte[1024];
    IOUtils.read(stream, b); // calls stream.read() repeatedly until the buffer is full or end-of-file is reached
} catch (IOException e) {
    e.printStackTrace();
} finally {
    IOUtils.closeQuietly(stream);
}
I cannot understand why it doesn't work. I found an example that used the Apache library to read 4096 bytes at a time. Reading the first 1024 bytes works in the end; the only problem is that if completePendingCommand() is used, the program hangs forever. So I removed it and everything works fine.
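If you would rather not pull in a library for this, the loop mentioned in the answer above can be written by hand. This is only a sketch with invented names (HeadReader, readHead); it works on any InputStream, so the stream returned by the FTP client could be passed in directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

final class HeadReader {
    // Reads up to max bytes from the start of the stream.
    // Loops because a single read() may return fewer bytes than requested.
    static byte[] readHead(InputStream in, int max) throws IOException {
        byte[] buf = new byte[max];
        int total = 0;
        while (total < max) {
            int n = in.read(buf, total, max - total);
            if (n == -1) {
                break; // end of stream before max bytes were available
            }
            total += n;
        }
        return Arrays.copyOf(buf, total); // trimmed to what was actually read
    }
}
```

On Java 9 and later, InputStream.readNBytes(byte[], int, int) does the same job.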
I'm trying to read an image from a URL (with the Java class java.net.URL) into a byte[]. "Everything" works fine, except that the content isn't being entirely read from the stream (the image is corrupt; it doesn't contain all the image data). The byte array is being persisted in a database (BLOB). I really don't know what the correct approach is; maybe you can give me a tip. :)
This is my first approach (code formatted, unnecessary information removed):
URL u = new URL("http://localhost:8080/images/anImage.jpg");
int contentLength = u.openConnection().getContentLength();
InputStream openStream = u.openStream();
byte[] binaryData = new byte[contentLength];
openStream.read(binaryData);
openStream.close();
My second approach was this one (as you'll see, the content length is fetched another way):
URL u = new URL(content);
openStream = u.openStream();
int contentLength = openStream.available();
byte[] binaryData = new byte[contentLength];
openStream.read(binaryData);
openStream.close();
Both snippets result in a corrupted image...
I already read this post from Stack Overflow.
There's no guarantee that the content length you're provided is actually correct. Try something akin to the following:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
InputStream is = null;
try {
    is = url.openStream ();
    byte[] byteChunk = new byte[4096]; // Or whatever size you want to read in at a time.
    int n;
    while ( (n = is.read(byteChunk)) > 0 ) {
        baos.write(byteChunk, 0, n);
    }
}
catch (IOException e) {
    System.err.printf ("Failed while reading bytes from %s: %s", url.toExternalForm(), e.getMessage());
    e.printStackTrace ();
    // Perform any other exception handling that's appropriate.
}
finally {
    if (is != null) { is.close(); }
}
You'll then have the image data in baos, from which you can get a byte array by calling baos.toByteArray().
This code is untested (I just wrote it in the answer box), but it's a reasonably close approximation to what I think you're after.
Just extending Barnards's answer with commons-io. This is a separate answer because I cannot format code in comments.
InputStream is = null;
try {
    is = url.openStream ();
    byte[] imageBytes = IOUtils.toByteArray(is);
}
catch (IOException e) {
    System.err.printf ("Failed while reading bytes from %s: %s", url.toExternalForm(), e.getMessage());
    e.printStackTrace ();
    // Perform any other exception handling that's appropriate.
}
finally {
    if (is != null) { is.close(); }
}
Here's a clean solution:
private byte[] downloadUrl(URL toDownload) {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    try {
        byte[] chunk = new byte[4096];
        int bytesRead;
        InputStream stream = toDownload.openStream();
        while ((bytesRead = stream.read(chunk)) > 0) {
            outputStream.write(chunk, 0, bytesRead);
        }
    } catch (IOException e) {
        return null;
    }
    return outputStream.toByteArray();
}
I am very surprised that nobody here has mentioned the problem of connection and read timeout. It could happen (especially on Android and/or with some crappy network connectivity) that the request will hang and wait forever.
The following code (which also uses Apache IO Commons) takes this into account, and waits max. 5 seconds until it fails:
public static byte[] downloadFile(URL url) {
    try {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        conn.connect();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        IOUtils.copy(conn.getInputStream(), baos);
        return baos.toByteArray();
    } catch (IOException e) {
        // Log error and return null, some default or throw a runtime exception
        return null;
    }
}
byte[] b = IOUtils.toByteArray((new URL( )).openStream()); //idiom
Note however, that stream is not closed in the above example.
if you want a (76-character) chunk (using commons codec)...
byte[] b = Base64.encodeBase64(IOUtils.toByteArray((new URL( )).openStream()), true);
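As an alternative to commons-codec, Java 8 and later ship java.util.Base64, whose MIME encoder also breaks the output into 76-character lines. The sketch below uses invented names (ChunkedBase64, encodeChunked, decodeChunked). One difference to be aware of: the JDK encoder puts "\r\n" between lines but does not append one after the last line.

```java
import java.util.Base64;

final class ChunkedBase64 {
    // Encodes with lines of at most 76 characters, separated by "\r\n".
    static String encodeChunked(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    // The MIME decoder ignores the line separators when decoding.
    static byte[] decodeChunked(String text) {
        return Base64.getMimeDecoder().decode(text);
    }
}
```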
Use commons-io IOUtils.toByteArray(URL):
String url = "http://localhost:8080/images/anImage.jpg";
byte[] fileContent = IOUtils.toByteArray(new URL(url));
Maven dependency: groupId commons-io, artifactId commons-io (use a current release for the version).
The content length is just a HTTP header. You cannot trust it. Just read everything you can from the stream.
available() is definitely wrong: it only reports the number of bytes that can be read without blocking, not the total length of the content.
Another issue is your resource handling. Closing the stream has to happen in any case. try/catch/finally will do that.
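To tie those three points together, here is a minimal sketch that ignores both the content length and available(), reads until the stream ends, and always closes the stream. It assumes Java 9 or later for InputStream.readAllBytes(); the names UrlBytes and fetch are made up.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

final class UrlBytes {
    // Reads everything the URL delivers, however many read() calls that takes.
    static byte[] fetch(URL url) throws IOException {
        try (InputStream in = url.openStream()) { // closed even if reading fails
            return in.readAllBytes();             // Java 9+: reads until end of stream
        }
    }
}
```

Note that this sketch sets no timeouts, so it can still block on a slow server.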
It's important to specify timeouts, especially when the server takes a long time to respond. With pure Java, without using any dependency:
public static byte[] copyURLToByteArray(final String urlStr,
        final int connectionTimeout, final int readTimeout)
        throws IOException {
    final URL url = new URL(urlStr);
    final URLConnection connection = url.openConnection();
    connection.setConnectTimeout(connectionTimeout);
    connection.setReadTimeout(readTimeout);
    try (InputStream input = connection.getInputStream();
            ByteArrayOutputStream output = new ByteArrayOutputStream()) {
        final byte[] buffer = new byte[8192];
        for (int count; (count = input.read(buffer)) > 0;) {
            output.write(buffer, 0, count);
        }
        return output.toByteArray();
    }
}
Using dependencies, e.g., HC Fluent:
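public byte[] copyURLToByteArray(final String urlStr,
        final int connectionTimeout, final int readTimeout)
        throws IOException {
    return Request.Get(urlStr).connectTimeout(connectionTimeout).socketTimeout(readTimeout)
            .execute().returnContent().asBytes();
}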
How to write a byte array to a file in Java?
As Sebastian Redl points out, the most straightforward way is now java.nio.file.Files.write. Details can be found in the Reading, Writing, and Creating Files tutorial.
Old answer:
FileOutputStream.write(byte[]) would be the most straight forward. What is the data you want to write?
The tutorials for Java IO system may be of some use to you.
You can use IOUtils.write(byte[] data, OutputStream output) from Apache Commons IO.
KeyGenerator kgen = KeyGenerator.getInstance("AES");
SecretKey key = kgen.generateKey();
byte[] encoded = key.getEncoded();
try (FileOutputStream output = new FileOutputStream(new File("target-file"))) {
    IOUtils.write(encoded, output); // IOUtils.write does not close the stream itself
}
As of Java 1.7, there's a new way: java.nio.file.Files.write
import java.nio.file.Files;
import java.nio.file.Paths;
KeyGenerator kgen = KeyGenerator.getInstance("AES");
SecretKey key = kgen.generateKey();
byte[] encoded = key.getEncoded();
Files.write(Paths.get("target-file"), encoded);
Java 1.7 also resolves the embarrassment that Kevin describes: reading a file is now:
byte[] data = Files.readAllBytes(Paths.get("source-file"));
A commenter asked "why use a third-party library for this?" The answer is that it's way too much of a pain to do it yourself. Here's an example of how to properly do the inverse operation of reading a byte array from a file (sorry, this is just the code I had readily available, and it's not like I want the asker to actually paste and use this code anyway):
public static byte[] toByteArray(File file) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    boolean threw = true;
    InputStream in = new FileInputStream(file);
    try {
        byte[] buf = new byte[BUF_SIZE];
        long total = 0;
        while (true) {
            int r = in.read(buf);
            if (r == -1) {
                break;
            }
            out.write(buf, 0, r);
        }
        threw = false;
    } finally {
        try {
            in.close();
        } catch (IOException e) {
            if (threw) {
                log.warn("IOException thrown while closing", e);
            } else {
                throw e;
            }
        }
    }
    return out.toByteArray();
}
Everyone ought to be thoroughly appalled by what a pain that is.
Use Good Libraries. I, unsurprisingly, recommend Guava's Files.write(byte[], File).
To write a byte array to a file use the method
public void write(byte[] b) throws IOException
from BufferedOutputStream class.
java.io.BufferedOutputStream implements a buffered output stream. By setting up such an output stream, an application can write bytes to the underlying output stream without necessarily causing a call to the underlying system for each byte written.
For your example you need something like:
String filename = "C:/SO/SOBufferedOutputStreamAnswer";
BufferedOutputStream bos = null;
try {
    //create an object of FileOutputStream
    FileOutputStream fos = new FileOutputStream(new File(filename));
    //create an object of BufferedOutputStream
    bos = new BufferedOutputStream(fos);
    KeyGenerator kgen = KeyGenerator.getInstance("AES");
    SecretKey key = kgen.generateKey();
    byte[] encoded = key.getEncoded();
    bos.write(encoded);
}
// catch and handle exceptions, and close bos in a finally block...
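On Java 7 and later, the same BufferedOutputStream approach can be written more compactly with try-with-resources, which takes care of closing (and therefore flushing) the stream. This is a sketch with an invented class name, BufferedWrite, not part of the original answer.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class BufferedWrite {
    // Writes the whole array to target, replacing any existing content.
    static void write(File target, byte[] data) throws IOException {
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            out.write(data);
        } // close() flushes the buffer and releases the file handle
    }
}
```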
Apache Commons IO Utils has a FileUtils.writeByteArrayToFile() method. Note that if you're doing any file/IO work then the Apache Commons IO library will do a lot of work for you.
No need for external libs to bloat things, especially when working with Android. Here is a native solution that does the trick. This is a piece of code from an app that stores a byte array as an image file.
// Byte array with image data.
final byte[] imageData = params[0];
// Write bytes to tmp file.
final File tmpImageFile = new File(ApplicationContext.getInstance().getCacheDir(), "scan.jpg");
FileOutputStream tmpOutputStream = null;
try {
    tmpOutputStream = new FileOutputStream(tmpImageFile);
    tmpOutputStream.write(imageData);
    Log.d(TAG, "File successfully written to tmp file");
}
catch (FileNotFoundException e) {
    Log.e(TAG, "FileNotFoundException: " + e);
    return null;
}
catch (IOException e) {
    Log.e(TAG, "IOException: " + e);
    return null;
}
finally {
    if(tmpOutputStream != null)
        try {
            tmpOutputStream.close();
        } catch (IOException e) {
            Log.e(TAG, "IOException: " + e);
        }
}
File file = ...
byte[] data = ...
try{
    FileOutputStream fos = new FileOutputStream(file);
    fos.write(data);
    fos.flush();
    fos.close();
}catch(Exception e){
}
For very large arrays you may prefer to write the data in chunks in a loop, although write(byte[]) itself writes the whole array.
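If you do want to write in a loop, a chunked write could look like the sketch below. The names ChunkedWrite and chunkSize are my own; the 1024 figure from the answer would simply be passed as chunkSize.

```java
import java.io.IOException;
import java.io.OutputStream;

final class ChunkedWrite {
    // Writes data to out in pieces of at most chunkSize bytes.
    static void write(OutputStream out, byte[] data, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off); // last chunk may be shorter
            out.write(data, off, len);
        }
    }
}
```

The result is byte-for-byte the same as a single write(data) call; chunking mainly helps if you want to report progress or interleave other work.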