Java: read a URLConnection response with many lines efficiently

I have to read a URLConnection response containing 2 MB of pretty-printed JSON in Java.
2 MB is not small, but by no means large; however, it is pretty-printed JSON with around 60k lines. A loop like
while ((line = bufferedReader.readLine()) != null) {
    lineAllOfIt += line;
}
takes around 10 minutes to read this response. There must be something wrong with my approach, but I cannot picture a better one.
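Worth noting: the += concatenation copies the accumulated string on every iteration, which is quadratic in the response size; on roughly 60k lines that alone can account for minutes. A minimal sketch of the same loop using a StringBuilder, keeping the variable names above:
StringBuilder sb = new StringBuilder();
String line;
while ((line = bufferedReader.readLine()) != null) {
    sb.append(line);
}
String lineAllOfIt = sb.toString();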

For this particular case, I would cache the file locally. Using Java NIO you can do a low-memory transfer of the file to disk, and then you can go through it line by line, pulling out just the data you need, without loading it all into memory at once.
EDIT: Made changes to the variable names; I pulled this from my code and forgot to neutralize them. Also, FileChannel transferTo/transferFrom can be much more efficient, as there are potentially fewer copies and, depending on the operation, the data can go straight from the socket buffer to disk. See the FileChannel API.
import java.io.FileOutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

String fileUrlString = "http://update.domain.com/file.json"; // file URL path
Path diskSaveLocation = Paths.get("file.json"); // places the file in your working directory

final URL url = new URL(fileUrlString);
final URLConnection conn = url.openConnection();
final long fileLength = conn.getContentLengthLong();
System.out.println(String.format("Downloading file... %s, Size: %d bytes.", fileUrlString, fileLength));
try (
    FileOutputStream stream = new FileOutputStream(diskSaveLocation.toFile(), false);
    FileChannel fileChannel = stream.getChannel();
    ReadableByteChannel inChannel = Channels.newChannel(conn.getInputStream())
) {
    long read;
    long readerPosition = 0;
    // transfer straight from the socket channel to disk, without a pass through the heap
    while (readerPosition < fileLength
            && (read = fileChannel.transferFrom(inChannel, readerPosition, fileLength - readerPosition)) > 0) {
        readerPosition += read;
    }
    if (fileLength != Files.size(diskSaveLocation)) {
        Files.delete(diskSaveLocation);
        System.out.println(String.format("File... %s did not download correctly, deleting file artifact!", fileUrlString));
    }
}
System.out.println(String.format("File Download... %s completed!", fileUrlString));
((HttpURLConnection) conn).disconnect();
You can now read this same file using NIO2 methods that let you process it line by line without loading it into memory; using Scanner or RandomAccessFile methods you can avoid pulling whole lines onto the heap. If you do want to read the whole file in, you can also do so locally from the cached copy using the utility methods on Java's Files class.
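As a minimal sketch of that line-by-line read (assuming the diskSaveLocation path from the download code above and UTF-8 content):
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.stream.Stream;

try (Stream<String> lines = Files.lines(diskSaveLocation, StandardCharsets.UTF_8)) {
    // the file is streamed lazily; no line is retained after processing
    lines.forEach(line -> {
        // pull out the data you need from each line here
    });
}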
Java Read Large Text File With 70million line of text

Related

FileOutputStream, directory

This is my directory structure.
Inside the server I have the following code for saving a file that gets sent from the client:
fileName = reader.readLine();
DataInputStream dis = null;
try {
    dis = new DataInputStream(csocket.getInputStream());
    FileOutputStream fos = new FileOutputStream(fileName);
    buffer = new byte[4096];
    int fileSize = 15123;
    int read = 0;
    int totalRead = 0;
    int remaining = fileSize;
    while ((read = dis.read(buffer, 0, Math.min(buffer.length, remaining))) > 0) {
        totalRead += read;
        remaining -= read;
        fos.write(buffer, 0, read);
    }
    fos.close();
    dis.close();
} catch (IOException e) {
}
break;
I'm wondering how I would go about saving the file within the xml folder? I've tried using getClass().getResource and such but nothing seems to work.
fileName is just a simple string containing the name of the file, not a path or anything.
I get the correct path using this code:
File targetDir = new File(getClass().getResource("xml").getPath());
File targetFile = new File(targetDir, fileName);
targetFile.createNewFile();
System.out.println(targetFile.getAbsolutePath());
dis = new DataInputStream(csocket.getInputStream());
FileOutputStream fos = new FileOutputStream(targetFile.getAbsolutePath(), false);
But it still won't save it there...
The best way is to receive the target path for storing files explicitly, either through a .properties file or a command-line argument. This way, you make your program flexible enough to be installed and adapted in different environments.
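For the .properties route, a minimal sketch; the file name server.properties and the key target.dir are illustrative, not from the original code:
import java.io.File;
import java.io.FileInputStream;
import java.util.Properties;

Properties props = new Properties();
try (FileInputStream in = new FileInputStream("server.properties")) {
    props.load(in); // e.g. a line such as: target.dir=/var/data/xml
}
// fall back to the working directory if the key is absent
File targetDir = new File(props.getProperty("target.dir", "."));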
But if you wish your program to assume the target directory automatically, the best option is to set a relative path before creating the FileOutputStream, as long as you always start your program from the same path:
File targetDir = new File("xml");
File targetFile = new File(targetDir, fileName);
FileOutputStream fos = new FileOutputStream(targetFile);
This will work assuming the program is started with server as the current directory.
Update
Other minor suggestions about your program:
Never base the loop's exit condition on a hard-coded file size, because it is not possible to know it a priori. Instead, check explicitly whether the value returned by read is less than 0, which means end of file reached.
Consequently, do not bother computing the exact amount of data to request in each call to read. Just pass the buffer size; it only sets a maximum.
Never leave exceptions caught without proper treatment: if you know how to make your program recover, put the appropriate code in the catch block. Otherwise, you'd better not catch them at all: declare them in the throws clause and let them propagate to the caller.
Always create stream resources with the try-with-resources statement, to ensure they get closed at the end:
try (FileOutputStream fos = new FileOutputStream(...))
{
// ... use fos...
}
Save yourself unnecessary instructions: if you don't care whether the file already exists on the filesystem, don't call createNewFile. But if you do care, check the returned value and branch accordingly. Putting these suggestions together gives something like the sketch below.
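A minimal sketch of the receive loop with the above points applied (csocket, fileName, and the xml directory as in the question):
try (DataInputStream dis = new DataInputStream(csocket.getInputStream());
     FileOutputStream fos = new FileOutputStream(new File(new File("xml"), fileName))) {
    byte[] buffer = new byte[4096];
    int read;
    // no hard-coded size: read until the returned value is less than 0 (end of stream)
    while ((read = dis.read(buffer)) >= 0) {
        fos.write(buffer, 0, read);
    }
}
Note that on a socket, end of stream only arrives once the client shuts down its side of the connection; if the connection must stay open for further commands, the client has to send the actual file length ahead of the bytes instead of relying on a hard-coded one.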
I tried to create the file and it does get created, but not at ProjectName\src\com\company\xml; instead it ends up in ProjectName\out\production\ProjectName\com\company\xml.
My code:
File targetDir = new File(this.getClass().getResource("xml").getPath());
// get the parent of the directory
String parentPath = targetDir.getParent();
String fileName = "xml/name.txt";
// do something
File targetFile = new File(parentPath, fileName);
targetFile.createNewFile();
Just pay attention: getClass().getResource resolves against the compiled classpath (hence your out\production\... directory), not the src tree, and after packaging, writing there would mean saving into the jar file itself, which is a complicated thing to do.
Usually you need to save the file somewhere outside your jar, for example next to it in the root directory.
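If you want to compute that location at runtime, one common approach is to resolve the code-source location of one of your classes; a minimal sketch (MyServer is a placeholder for one of your own classes, assumed to be loaded from a regular jar or classes directory on disk):
// find the jar (or classes directory) this class was loaded from; toURI() throws URISyntaxException
File codeLocation = new File(MyServer.class.getProtectionDomain()
        .getCodeSource().getLocation().toURI());
// if it is a jar file, write next to it; otherwise use the directory itself
File rootDir = codeLocation.isFile() ? codeLocation.getParentFile() : codeLocation;
File targetFile = new File(rootDir, fileName);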

Combining compressed Gzipped Text Files using Java

My question might not be entirely related to Java, but I'm currently seeking a method to combine several compressed (gzipped) text files without having to recompress them manually. Let's say I have 4 files, all text compressed using gzip, and I want to compress these into one single *.gz file without de- and recompressing them. My current method is to open an InputStream and parse the file line-wise, storing it in a GZIPOutputStream, which works but isn't very fast... I could of course also call
zcat file1 file2 file3 | gzip -c > output_all_four.gz
This would work too, but isn't really fast either, since zcat and gzip still decompress and recompress the data.
My idea would be to copy the InputStream and write it directly to the OutputStream without "parsing" the stream, as I don't actually need to manipulate anything. Is something like this possible?
Find below a simple solution in Java (it produces the same result as the shell example above). It works because the gzip format allows multiple compressed members to be simply concatenated back to back. Any kind of buffering of the input/output has been omitted to keep the code slim.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.util.zip.GZIPInputStream;

public class ConcatFiles {
    public static void main(String[] args) throws IOException {
        // concatenate the single gzip files to one gzip file
        try (InputStream isOne = new FileInputStream("file1.gz");
             InputStream isTwo = new FileInputStream("file2.gz");
             InputStream isThree = new FileInputStream("file3.gz");
             SequenceInputStream sis = new SequenceInputStream(new SequenceInputStream(isOne, isTwo), isThree);
             OutputStream bos = new FileOutputStream("output_all_three.gz")) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = sis.read(buffer)) != -1) {
                bos.write(buffer, 0, bytesRead);
            }
            bos.flush();
        }
        // un-gzip the single gzip file; the output contains the
        // concatenated content of the single uncompressed files
        try (GZIPInputStream gzipis = new GZIPInputStream(new FileInputStream("output_all_three.gz"));
             OutputStream bos = new FileOutputStream("output_all_three")) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            while ((bytesRead = gzipis.read(buffer)) != -1) {
                bos.write(buffer, 0, bytesRead);
            }
            bos.flush();
        }
    }
}
The above method works if you just need to concatenate many gzipped files. In my case I had written a web servlet and my response was 20-30 KB, so I was sending the response gzipped.
I tried to gzip all my individual JS files once at server start and then append dynamically generated code at runtime using the above method. I could print the entire response in my log file, but Chrome was able to unzip the first member only; the rest of the output came through as raw bytes.
After some research I found out that this is not possible with Chrome, and they have closed the bug without solving it:
https://bugs.chromium.org/p/chromium/issues/detail?id=20884

Broken text: reading a large text file in Android

I have a question about broken text when an Android app reads a large text file.
I am trying to build an app that reads a large text file (about 10 MB).
When I read the file and use System.out.println to check its contents, the print statement displays broken text such as:
��T��h��e�� ��P��r��o��j��e��c��t�� ��G��u
When I was reading a small RTF file it was fine, but with this text file the problem appeared. I used code like:
String UTF8 = "utf8";
int BUFFER_SIZE = 8192;

File gone = new File(path);
FileInputStream inputStream = new FileInputStream(gone);
// FileInputStream inputStream = openFileInput(gone);
if (inputStream != null) {
    InputStreamReader inputStreamReader = new InputStreamReader(inputStream, UTF8);
    BufferedReader bufferedReader = new BufferedReader(inputStreamReader, BUFFER_SIZE);
    String receiveString;
    StringBuilder stringBuilder = new StringBuilder();
    while ((receiveString = bufferedReader.readLine()) != null) {
        stringBuilder.append(receiveString);
    }
    inputStream.close();
    String ret = stringBuilder.toString();
    System.out.println(ret);
}
I was thinking it could be an encoding problem, therefore I added the utf8 option. However, it still doesn't work...
Does anyone know a solution for the broken text?
UPDATE:
I think I solved the problem. I created a new text file in the Windows text editor, copied and pasted the content, and now the file is read correctly.
It may be the wrong encoding for the given file, the file may not contain text, or the console may not support the characters.
Besides, the code is too long; here's a one-line solution:
String s = new String(Files.readAllBytes(Paths.get(file)), "UTF-8");
The file may contain images or an unsupported format; in that case it will display like that.
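For what it's worth, the pattern in the sample output, one junk byte before every readable character, is typical of a UTF-16 file being decoded as UTF-8, which would also explain why re-saving the file from the Windows editor fixed it. A sketch, assuming the file really is UTF-16 (path and BUFFER_SIZE as in the question); the "UTF-16" decoder honors the byte-order mark if one is present:
BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(path), "UTF-16"), BUFFER_SIZE);
String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
reader.close();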

What's up with AssetFileDescriptor.getFileDescriptor()?

I have an uncompressed binary file in res/raw that I was reading this way:
public byte[] file2Bytes(int rid) {
    byte[] buffer = null;
    try {
        AssetFileDescriptor afd = res.openRawResourceFd(rid);
        FileInputStream in = new FileInputStream(afd.getFileDescriptor());
        int len = (int) afd.getLength();
        buffer = new byte[len];
        in.read(buffer, 0, len);
        in.close();
    } catch (Exception ex) {
        Log.w(ACTNAME, "file2Bytes() fail\n" + ex.toString());
        return null;
    }
    return buffer;
}
However, buffer did not contain what it was supposed to. The source file is 1024 essentially random bytes (a binary key). But buffer, when written out and examined, was not the same. Amongst unprintable bytes at the beginning appeared "res/layout/main.xml" (the literal path), and further down, part of the text content of another file from res/raw. O_O?
Exasperated after a while, I tried:
AssetFileDescriptor afd = res.openRawResourceFd(rid);
//FileInputStream in = new FileInputStream(afd.getFileDescriptor());
FileInputStream in = afd.createInputStream();
Presto, I got the content correctly -- this is easily reproducible.
So the relevant API docs read:
public FileDescriptor getFileDescriptor ()
Returns the FileDescriptor that can be used to read the data in the
file.
public FileInputStream createInputStream ()
Create and return a new auto-close input stream for this asset. This
will either return a full asset
AssetFileDescriptor.AutoCloseInputStream, or an underlying
ParcelFileDescriptor.AutoCloseInputStream depending on whether the
object represents a complete file or sub-section of a file. You should
only call this once for a particular asset.
Why would a FileInputStream() constructed from getFileDescriptor() end up with garbage whereas createInputStream() gives proper access?
As per pskink's comment, the FileDescriptor returned by getFileDescriptor() is apparently not an fd that refers just to the file; it refers to whatever bundle/parcel/conglomeration aapt has made of the resources.
AssetFileDescriptor afd = res.openRawResourceFd(rid);
FileInputStream in = new FileInputStream(afd.getFileDescriptor());
in.skip(afd.getStartOffset());
Turns out to be the equivalent of the FileInputStream in = afd.createInputStream() version.
I suppose there is a hint in the difference between "create" (something new) and "get" (something existing). :/
AssetFileDescriptor can be thought of as the entry point to the entire package's assets data.
I have run into the same issue and solved it finally.
If you want to manually create a stream from an AssetFileDescriptor, you have to skip n bytes to the requested resource. It is like you are paging through all the available files in one big file.
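One caveat: skipping to the start offset is only half the story, because the descriptor spans the whole package; a read should also stop after getLength() bytes, or it will run into the next packed entry. A minimal sketch combining both (res and rid as in the question):
AssetFileDescriptor afd = res.openRawResourceFd(rid);
FileInputStream in = new FileInputStream(afd.getFileDescriptor());
in.skip(afd.getStartOffset());
byte[] buffer = new byte[(int) afd.getLength()];
int off = 0;
// loop: a single read() may return fewer bytes than requested
while (off < buffer.length) {
    int n = in.read(buffer, off, buffer.length - off);
    if (n < 0) break;
    off += n;
}
in.close();
afd.close();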
Thanks to pskink! I had a look at the hex content of the jpg image I wanted to acquire; it starts with -1. The thing is, there are two jpg images. I did not know that, so I arbitrarily skipped 76L bytes. Got the first image!

Size of a file obtained through the AssetManager function getAssets in Android

AssetManager mngr = getAssets();
test_file = mngr.open("sample.txt");
The test_file variable above is of type InputStream. Is there any way to calculate the file size of sample.txt from it?
I have an alternative way to get the size of a file in assets, using an AssetFileDescriptor:
AssetFileDescriptor fd = getAssets().openFd("test.png");
long size = fd.getLength();
Hope it helps.
test_file.available();
is not a very reliable way to get the file length, as stated in the docs.
size = fd.getLength();
using the AssetFileDescriptor, as shown by Ayublin, is!
His answer should be promoted to the correct answer.
inputStream.available() might match the file size if the file is very small, but for larger files it isn't expected to match.
For a compressed asset, the only way to get the size reliably is to copy it to the filesystem, e.g. to context.getCacheDir(), and then read the length of the file from there. Here's some sample code that does this. Afterwards it probably also makes sense to use the file from the cache dir rather than the asset.
String filename = "sample.txt";
File outFile = new File(context.getCacheDir(), filename);
try (InputStream in = context.getAssets().open(filename);
     OutputStream out = new FileOutputStream(outFile)) {
    byte[] buff = new byte[1024];
    int len;
    while ((len = in.read(buff)) > 0) {
        out.write(buff, 0, len);
    }
}
long theRealFileSizeInBytes = outFile.length();
You should also delete the file from the cache dir when you are done with it (and the entire cache dir will also be deleted automatically when uninstalling the app).
