How to get the length of a stream writer in Java

In Java, is there any functionality equivalent to the following C# code for getting the length of a stream?
StreamWriter.BaseStream.Length
I have searched the internet and also checked the properties of BufferedWriter, OutputStreamWriter and FileOutputStream, but I did not find anything. Any information is appreciated.
Thank you so much.

An OutputStream only ever contains what you yourself write into it, so you can keep track of the length on your side as you write.
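If you only need the number of bytes written so far, one way to act on that is to wrap the FileOutputStream in a small byte-counting FilterOutputStream before handing it to the OutputStreamWriter. This is just a sketch of that idea (the class below is not from the code in the question; Apache Commons IO ships a ready-made CountingOutputStream that does the same):
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Counts the bytes that pass through to the underlying stream.
class CountingOutputStream extends FilterOutputStream {
    private long count = 0;

    CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        count += len;
    }

    public long getCount() {
        return count;
    }
}
Wrap it like new BufferedWriter(new OutputStreamWriter(new CountingOutputStream(new FileOutputStream(path, true)), "UTF-8")) and call getCount() after a flush(); bytes still sitting in the BufferedWriter's buffer are not counted until they are flushed.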

In the end I had to use File.length(), as I found no way to get the length from the stream as in C#.
Here is how it was done:
Keep track (using a flag, etc.) of which file the stream is associated with.
When you need the length of the stream, just call File.length() on the file associated with the stream, as below.
The reason I needed to check the length at all was to prevent writing more than a defined maximum size to the file.
String sFilePath = this.m_sLogFolderPath + File.separator;
if (this.m_File2Active)
{
    sFilePath += Def.DEF_FILE2;
}
else
{
    sFilePath += Def.DEF_FILE1;
}
File file = new File(sFilePath);
if (file.length() > this.m_lMaxSize)
{
    // The active log file has reached the maximum size: close it and switch to the other file.
    this.m_bwWriter.flush();
    this.m_bwWriter.close();
    this.m_bwWriter = null;
    sFilePath = this.m_sLogFolderPath + File.separator;
    if (this.m_File2Active)
    {
        sFilePath += Def.DEF_FILE1;
        this.m_File2Active = false;
    }
    else
    {
        sFilePath += Def.DEF_FILE2;
        this.m_File2Active = true;
    }
    this.m_bwWriter = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(sFilePath, true), Def.DEF_ENCODING_UTF8));
}

Related

Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection WARNING

I am trying to read a very big file (almost 1 GB) from S3. However, getObject fails to read the file with the error:
WARNING: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
I checked a few posts and found that s3ObjectInputStream.abort() might solve the problem, but that still does not help.
S3Object sourceS3Object = null;
S3ObjectInputStream s3ObjectInputStream = null;
InputStream reader = null;
ObjectMetadata metadata = null;
int retries = 10;
while (retries > 0) {
    try {
        sourceS3Object = sourceS3Client.getObject(sourceS3Uri.getBucket(), sourceS3Uri.getKey());
        s3ObjectInputStream = sourceS3Object.getObjectContent();
        reader = new BufferedInputStream(s3ObjectInputStream);
    } catch (Exception readingException) {
        retries--;
        System.out.println(readingException);
        System.out.println(readingException.getStackTrace().toString());
        readingException.printStackTrace();
        if (s3ObjectInputStream != null) {
            s3ObjectInputStream.abort();
        }
        sourceS3Object = sourceS3Client.getObject(sourceS3Uri.getBucket(), sourceS3Uri.getKey());
        s3ObjectInputStream = sourceS3Object.getObjectContent();
    } finally {
        if (retries == 0) {
            System.out.println("Out of retries");
        }
    }
}
The copy succeeds for smaller files, but for larger files it gives me this warning.
A bit belated, but there is an obvious problem to fix.
You need to remove these two lines from your catch (Exception readingException) block:
sourceS3Object = sourceS3Client.getObject(sourceS3Uri.getBucket(), sourceS3Uri.getKey());
s3ObjectInputStream = sourceS3Object.getObjectContent();
This is because when an exception occurs, you do this:
...
s3ObjectInputStream = sourceS3Object.getObjectContent();
and then loop again. The first thing your loop does is the same thing:
...
s3ObjectInputStream = sourceS3Object.getObjectContent();
so your program gets the content stream twice but only reads the content once, which is exactly what the warning is about.
Your loop also runs forever, which I expect means you just stripped out some code for the example.
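For reference, a sketch of the loop with those two lines removed (the break on success and the placeholder read are assumptions, since the original presumably does more work inside the try):
while (retries > 0) {
    try {
        sourceS3Object = sourceS3Client.getObject(sourceS3Uri.getBucket(), sourceS3Uri.getKey());
        s3ObjectInputStream = sourceS3Object.getObjectContent();
        reader = new BufferedInputStream(s3ObjectInputStream);
        // ... read from 'reader' to the end here ...
        break; // success, leave the retry loop
    } catch (Exception readingException) {
        retries--;
        readingException.printStackTrace();
        if (s3ObjectInputStream != null) {
            s3ObjectInputStream.abort(); // discard the partially transferred content
        }
        if (retries == 0) {
            System.out.println("Out of retries");
        }
    }
}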
The warning means that the stream is being closed before the whole file has been read.
In your code you have:
if (s3ObjectInputStream != null) {
    s3ObjectInputStream.abort();
}
s3ObjectInputStream holds the object content, and aborting it throws away bytes that were never read, which is exactly what the warning complains about. Either read (or drain) the stream to the end and close it normally, or only call abort() when you deliberately do not want the remaining bytes.

Streaming a file from HDFS vs copying it to local disk

In my Java application I am using a text file (size ~300 MB) which is kept in HDFS. Each line of the file contains a string and an integer ID separated by a comma. I am reading the file line by line and creating HashMaps (String, ID) from it.
The file looks like this:
String1,Integer1
String2,Integer2
...
Now, I am currently reading the file directly from HDFS using the Apache Hadoop Configuration and FileSystem objects.
Configuration conf = new Configuration();
conf.addResource("core-site.xml");
conf.addResource("hdfs-site.xml");
conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
conf.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
String path = "<some location in HDFS>";
FileSystem fs = FileSystem.get(URI.create(path), conf);
in = fs.open(new Path(path));
The input stream in is passed to another function for reading the file, shown below as init(InputStream is).
public void init(InputStream is) throws Exception {
    ConcurrentMap<String, String> pageToId = new ConcurrentHashMap<>();
    ConcurrentMap<String, String> idToPage = new ConcurrentHashMap<>();
    logger.info("Free memory: " + Runtime.getRuntime().freeMemory());
    InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);
    BufferedReader reader = new BufferedReader(stream);
    List<String> pageIdMappingColumns = ServerProperties.getInstance().getIdMappingColumns();
    String line;
    int line_no = 0;
    while (true) {
        try {
            line = reader.readLine();
            if (line == null) {
                break;
            }
            line_no++;
            //System.out.println("Free memory: " + Runtime.getRuntime().freeMemory());
            String[] values = line.split(COMMA);
            //System.out.println("Free memory: " + Runtime.getRuntime().freeMemory());
            if (values.length < pageIdMappingColumns.size()) {
                throw new RuntimeException(PAGEMAPPER_INVALID_MAPPING_FILE_FORMAT);
            }
            String id = EMPTY_STR;
            String page = EMPTY_STR;
            for (int i = 0; i < values.length; i++) {
                String s = values[i].trim();
                if (PAGEID.equals(pageIdMappingColumns.get(i))) {
                    id = s;
                    continue;
                }
                if (PAGENAME.equals(pageIdMappingColumns.get(i))) {
                    page = s;
                }
            }
            pageToId.put(page, id);
            idToPage.put(id, page);
        } catch (Exception e) {
            logger.error(PAGEMAPPER_INIT + e.toString() + " on line " + line_no);
        }
    }
    logger.info("Free memory: " + Runtime.getRuntime().freeMemory());
    logger.info("Total number of lines: " + line_no);
    reader.close();
    ConcurrentMap<String, String> oldPageToId = pageToIdRef.get();
    ConcurrentMap<String, String> oldIdToPage = idToPageRef.get();
    idToPage.put(MINUS_1, START);
    idToPage.put(MINUS_2, EXIT);
    pageToId.put(START, MINUS_1);
    pageToId.put(EXIT, MINUS_2);
    /* Update the atomic reference hashmaps in memory in two conditions:
       1. If there was no map in memory (first iteration)
       2. If the number of page-name and page-id pairs in the mappings.txt file is greater than in the previous iteration
    */
    if (oldPageToId == null || oldIdToPage != null && oldIdToPage.size() <= idToPage.size() && oldPageToId.size() <= pageToId.size()) {
        idToPageRef.set(idToPage);
        pageToIdRef.set(pageToId);
        logger.info(PAGEMAPPER_INIT + " " + PAGEMAPPER_UPDATE_MAPPING);
    } else {
        logger.info(PAGEMAPPER_INIT + " " + PAGEMAPPER_LOG_MSZ);
    }
}
I am closing the stream when the work is done like this:
IOUtils.closeQuietly(is);
I am executing the above code every hour, since the file is changed in HDFS during that period. Now I am getting java.lang.OutOfMemoryError: Java heap space.
My question is: is it better to copy the file to local disk and then use it, rather than accessing it directly from HDFS, as far as memory requirements are concerned?
Note: the file has more than 3,200,000 lines.
Streaming is always the way to go.
You're getting the OutOfMemoryError because you never close your stream, hence a memory leak.
Either close your stream manually or use try-with-resources.
Edit
pageToId.put(page, id);
idToPage.put(id, page);
You're storing at least 2x your file size in memory, which is roughly 600 MB.
After that, you assign those maps to some ref variables:
idToPageRef.set(idToPage);
pageToIdRef.set(pageToId);
I guess that you're still holding a reference to the old ref data somewhere, hence the internal map data is not released.
You also have a resource leak at:
throw new RuntimeException(PAGEMAPPER_INVALID_MAPPING_FILE_FORMAT);
You should use try-with-resources or manually close your stream in a finally block.
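A minimal sketch of the read loop with try-with-resources, using the same names as the question, so the reader is closed even when the format check throws:
try (InputStreamReader stream = new InputStreamReader(is, StandardCharsets.UTF_8);
     BufferedReader reader = new BufferedReader(stream)) {
    String line;
    while ((line = reader.readLine()) != null) {
        // parse the line and fill pageToId / idToPage as before
    }
} // stream and reader are closed here, even if an exception escapes the loop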

Faster way of copying data in Java?

I have been given the task of copying data from a server. I am using BufferedInputStream and an output stream to copy the data, and I am doing it byte by byte. Even though it runs, it is taking ages to copy the data, as some of the files are in the hundreds of MBs, so it is definitely not going to work. Can anyone suggest an alternative to byte-by-byte copying so that my code can cope with files of a few hundred MBs?
The buffer size is 2048.
Here is what my code looks like:
static void copyFiles(SmbFile[] files, String parent) throws IOException {
    SmbFileInputStream input = null;
    FileOutputStream output = null;
    BufferedInputStream buf_input = null;
    try {
        for (SmbFile f : files) {
            System.out.println("Working on files :" + f.getName());
            if (f.isDirectory()) {
                File folderToBeCreated = new File(parent + f.getName());
                if (!folderToBeCreated.exists()) {
                    folderToBeCreated.mkdir();
                    System.out.println("Folder name " + parent
                            + f.getName() + "has been created");
                } else {
                    System.out.println("exists");
                }
                copyFiles(f.listFiles(), parent + f.getName());
            } else {
                input = (SmbFileInputStream) f.getInputStream();
                buf_input = new BufferedInputStream(input, BUFFER);
                File t = new File(parent + f.getName());
                if (!t.exists()) {
                    t.createNewFile();
                }
                output = new FileOutputStream(t);
                int c;
                int count;
                byte data[] = new byte[BUFFER];
                while ((count = buf_input.read(data, 0, BUFFER)) != -1) {
                    output.write(data, 0, count);
                }
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (input != null) {
            input.close();
        }
        if (output != null) {
            output.close();
        }
    }
}
Here is a link to an excellent post explaining how to use nio channels to make copies of streams. It introduces a helper method ChannelTools.fastChannelCopy that lets you copy streams like this:
final InputStream input = new FileInputStream(inputFile);
final OutputStream output = new FileOutputStream(outputFile);
final ReadableByteChannel inputChannel = Channels.newChannel(input);
final WritableByteChannel outputChannel = Channels.newChannel(output);
ChannelTools.fastChannelCopy(inputChannel, outputChannel);
inputChannel.close();
outputChannel.close();
Well, since you're using a BufferedInputStream, you aren't reading byte by byte, but rather in chunks of the buffer size. You could just try increasing the buffer size.
Reading/writing byte by byte is definitely going to be slow, even though the actual reading/writing is done in chunks of the buffer size. One way to speed it up is to read/write in blocks. Have a look at the read(byte[] b, int off, int len) method of BufferedInputStream. However, it probably won't give you enough of an improvement.
What would be much better is to use nio package (new IO) to copy data using nio channels. Have a look at nio documentation for more info.
I would suggest using FileUtils from org.apache.commons.io. It has enough utility methods to perform file operations.
See the org.apache.commons.io.FileUtils API.
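For example, copying one remote file becomes a single call (a sketch; the stream and target are taken from the question's loop, and the helper name is made up):
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import org.apache.commons.io.FileUtils;

// Sketch: write the whole input stream to the target file and close the stream.
static void copyOne(InputStream in, File target) throws IOException {
    FileUtils.copyInputStreamToFile(in, target);
}
Inside the loop it would be called as copyOne(f.getInputStream(), new File(parent + f.getName())).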

Java multiple connection downloading

I wanted to get some advice: I have started a new project to create a Java download accelerator that will use multiple connections. I wanted to know how best to go about this.
So far I have figured out that I can use HttpUrlConnection with the Range request property, but I wanted to know an efficient way of doing this. Once I have downloaded the parts over the multiple connections, I will then have to join them so that we end up with a fully downloaded file.
Thanks in advance :)
1) Get the content length of the file to download.
2) Divide it according to a criterion (size, speed, …).
3) Run multiple threads that download the file starting at different positions, and save them to different files: myfile.part1, myfile.part2, …
4) Once downloaded, join the parts into one single file.
I tried the following code to get the content length:
public Downloader(String path) throws IOException {
    int len = 0;
    URL url = new URL(path);
    URLConnection connectUrl = url.openConnection();
    System.out.println(len = connectUrl.getContentLength());
    System.out.println(connectUrl.getContentType());
    InputStream input = connectUrl.getInputStream();
    int i = len;
    int c = 0;
    System.out.println("=== Content ===");
    while (((c = input.read()) != -1) && (--i > 0)) {
        System.out.print((char) c);
    }
    input.close();
}
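For the segment downloads themselves, a ranged request with HttpURLConnection could look roughly like this (a sketch; the method and file names are made up, and the part-file naming is meant to match the .sp convention in the join code below):
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: download the byte range [start, end] of the file into its own part file.
static void downloadPart(String fileUrl, long start, long end, String partPath) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(fileUrl).openConnection();
    // Ask the server for only this slice; a server that honours it replies with 206 Partial Content.
    conn.setRequestProperty("Range", "bytes=" + start + "-" + end);
    try (InputStream in = conn.getInputStream();
         FileOutputStream out = new FileOutputStream(partPath)) {
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    } finally {
        conn.disconnect();
    }
}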
Here's a sample code to join the files:
public void join(String FilePath) {
    long leninfile = 0, leng = 0;
    int count = 1, data = 0;
    try {
        File filename = new File(FilePath);
        RandomAccessFile outfile = new RandomAccessFile(filename, "rw");
        while (true) {
            filename = new File(FilePath + count + ".sp");
            if (filename.exists()) {
                RandomAccessFile infile = new RandomAccessFile(filename, "r");
                data = infile.read();
                while (data != -1) {
                    outfile.write(data);
                    data = infile.read();
                }
                leng++;
                infile.close();
                count++;
            } else break;
        }
        outfile.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
If you want to avoid joining segments after downloading, you could use a FileChannel.
With a FileChannel, you can write to any position of a file (even from multiple threads).
So you could first allocate the whole file, and then
write the segments where they belong as they come in.
See the Javadocs page for more info.
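A sketch of what that could look like (the segment offset and bytes are assumed to come from the downloading threads):
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: each downloader thread writes its segment at its own offset in the target file.
static void writeSegment(String targetPath, long offset, byte[] segment) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(targetPath, "rw");
         FileChannel channel = raf.getChannel()) {
        // write(ByteBuffer, position) does not move the channel's own position,
        // so writers at different offsets do not interfere with each other.
        channel.write(ByteBuffer.wrap(segment), offset);
    }
}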
JDownloader is the best downloader I've seen. If you are interested, it's open source and surely you can learn a lot from their code.

Problem with FTPClient class in java

I'm using org.apache.commons.net.ftp.FTPClient and seeing behavior that is, well... perplexing.
The method beneath intends to go through an FTPFile list, read them in and then do something with the contents. That's all working. What is not (really) working is that the FTPClient object does the following...
1) Properly retrieves and stores the FIRST file in the list
2) List item evaluates to NULL for x number of successive iterations of the loop (x varies on successive attempts)
3) manages to retrieve exactly 1 more file in the list
4) reports that it is null for exactly 1 more file in the list
5) hangs indefinitely, reporting no further activity.
public static String mergeXMLFiles(List<FTPFile> files, String rootElementNodeName, FTPClient ftp) {
    String ret = null;
    String fileAsString = null;
    //InputStream inStream;
    int c;
    if (files == null || rootElementNodeName == null)
        return null;
    try {
        System.out.println("GETTING " + files.size() + " files");
        for (FTPFile file : files) {
            fileAsString = "";
            InputStream inStream = ftp.retrieveFileStream(file.getName());
            if (inStream == null) {
                System.out.println("FtpUtil.mergeXMLFiles() couldn't initialize inStream for file:" + file.getName());
                continue; // THIS IS THE PART THAT I SEE for files [1 - arbitrary number (usually around 20)] and then 1 more time for [x + 2] after [x + 1] passes successfully.
            }
            while ((c = inStream.read()) != -1) {
                fileAsString += Character.valueOf((char) c);
            }
            inStream.close();
            System.out.println("FILE:" + file.getName() + "\n" + fileAsString);
        }
    } catch (Exception e) {
        System.out.println("FtpUtil.mergeXMLFiles() failed:" + e);
    }
    return ret;
}
Has anyone seen anything like this? I'm new to FTPClient; am I doing something wrong with it?
According to the API documentation for FTPClient.retrieveFileStream(), the method returns null when it cannot open the data connection, in which case you should check the reply code (e.g. getReplyCode(), getReplyString(), getReplyStrings()) to see why it failed. Also, you are supposed to finalize file transfers by calling completePendingCommand() and verifying that the transfer was indeed successful.
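Applied to the loop in the question, each file would be handled roughly like this (a sketch along the lines of that advice):
InputStream inStream = ftp.retrieveFileStream(file.getName());
if (inStream == null) {
    // the data connection could not be opened; the reply says why
    System.out.println("retrieveFileStream failed: " + ftp.getReplyString());
    continue;
}
StringBuilder sb = new StringBuilder();
int ch;
while ((ch = inStream.read()) != -1) {
    sb.append((char) ch);
}
inStream.close();
// finalize this transfer before asking for the next file
if (!ftp.completePendingCommand()) {
    System.out.println("Transfer failed: " + ftp.getReplyString());
}
fileAsString = sb.toString();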
It works OK when I add the following after the retrieve command:
int response = client.getReply();
if (response != FTPReply.CLOSING_DATA_CONNECTION) {
    // TODO
}
