Copying file data over network using Java very slow

My code has to read a portion of data from each file it loads, typically about 100,000 bytes. This works fine for local files but can be really slow over my Wi-Fi network, even though the network seems adequate (if not blistering fast).
So I created this simple test:
public void testDataCopySpeed() throws Exception {
    File file = new File("Z:\\TestFile.mp3");
    System.out.println("start:" + new Date());
    FileChannel fc = new FileInputStream(file).getChannel();
    ByteBuffer bb = ByteBuffer.allocate(500000); // 1/2 MB
    fc.read(bb);
    fc.close();
    System.out.println("end:" + new Date());
}
This takes less than a second on a local file, but over a minute on a networked file.
So I then tried to test my network speed. I cannot see how to test just the Wi-Fi, so I tested the internet upload/download speed using http://myspeedtestonline.com/, assuming this would be slower than my actual Wi-Fi network. It gave me:
Download Speed: 512 KB/second
Upload Speed: 40 KB/second
and I ran the same test on another computer and it gave a similar speed.
So how is it that I can download 1/2 MB of data in one second, but it can take a minute to copy 1/2 MB of data from a single file in Java? The file is hosted on a NAS.
EDIT: So I have a couple of good answers below. What I really want to know is: what is the best way to get read-only access to the first 100,000 bytes of a set of files, in a way that works for both local and networked files? Or should I have different code depending on whether the file is networked? Fixing the network is not really the solution; I may be able to fix my network, but this software has to work on any computer, and many of my customers may not have optimal networks and would not have the skill to fix their network issues.

Can you try a memory mapped file?
File file = new File("Z:/TestFile.mp3");
System.out.println("start:"+new Date());
FileChannel fc = new FileInputStream(file).getChannel();
MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
System.out.println("end:"+new Date());
This might only appear faster, or it may help hide how long it takes.

You are comparing apples with oranges here. When you access http://myspeedtestonline.com/, the flash plugin is probably using the HTTP protocol, but certainly not CIFS.
When you address a file on a NAS, it's most probably using the CIFS protocol. This protocol is known to have performance problems, especially when implemented on consumer appliances (Buffalo drives, etc.).
Sometimes the MTU size is too big, causing the packets to be fragmented and resent.
So my guess is that Java is not the right party to blame in this case. In any case, however, you cannot analyze the problem with a simple Java program; you should use a network sniffer for this.

Related

Is it advisable to store large strings in memory, or repeatedly read a file?

Let's say I have various text/json/xml/whatever files (stored locally, in the assets directory), ranging in size from 20 - 500 KB. Assuming these files are going to be referenced frequently, throughout the application, is it better to:
A) Read the file once, the first time it's requested, and store the data in a variable
or
B) Read the file each time it's requested, grab the requested bit of information, and allow GC to clean up afterward?
Coming from web dev, I generally use option (A), but I wonder if the storage limitations of mobile devices make (B) preferable in this context (Android app development).
TYIA.
You can store your data in memory by compressing it. That will reduce your memory footprint at any point in time, so this technique is applicable to both PCs and mobile phones. Later on, when you need the data, read and decompress it. So: read the file once, then compress it and store it in memory. The following example uses GZIPOutputStream to compress a string.
public static String compress(String str) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(out);
    gzip.write(str.getBytes());
    gzip.close();
    return out.toString("ISO-8859-1");
}
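For the "read and decompress" step mentioned above, a matching decompress method is needed as well. Here is a sketch; the class name `Gzip` is mine, and I use ISO-8859-1 consistently on both sides (it maps every byte 0-255 to a char, which keeps the byte-to-string round trip lossless). The compress method is repeated so the pair is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class Gzip {
    public static String compress(String str) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes("ISO-8859-1"));
        gzip.close(); // finish the gzip stream so 'out' holds complete data
        // ISO-8859-1 maps every byte to one char, so nothing is lost here
        return out.toString("ISO-8859-1");
    }

    public static String decompress(String compressed) throws IOException {
        // Reverse the byte-to-char mapping, then inflate the gzip data
        ByteArrayInputStream in = new ByteArrayInputStream(compressed.getBytes("ISO-8859-1"));
        GZIPInputStream gzip = new GZIPInputStream(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = gzip.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        gzip.close();
        return out.toString("ISO-8859-1");
    }
}
```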
If the file is requested frequently, it's definitely better to read it once and store it in a cache.
You can also read the article titled How Google Taught Me to Cache and Cash-In on the HighScalability website.
That depends on the total size of the files, the access frequency, and your target customers. Although high-end phones have a lot of memory, there are many low-end devices with much less.
It might be worth using an LRU cache to strike a balance.
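Such an LRU cache can be sketched on top of LinkedHashMap's access-order mode; the capacity below is an arbitrary choice, and you would key it by whatever identifies your files (e.g. the asset path).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true moves an entry to the end on every get()
        super(capacity, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity
        return size() > capacity;
    }
}
```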

Storing state in Java

Broad discussion question.
Are there any libraries already which allow me to store the state of execution of my application in Java?
E.g. I have an application which processes files; the application may be forced to shut down suddenly at some point. I want to store information about which files have been processed, which have not, and what stage the processing was at for the ongoing ones.
Are there already any libraries which abstract this functionality, or would I have to implement it from scratch?
It seems like what you are looking for is serialization, which can be performed with the Java Serialization API.
You can write even less code if you decide to use known libraries such as Apache Commons Lang, and its SerializationUtils class, which itself is built on top of the Java Serialization API.
Using the latter, serializing/deserializing your application state to a file is done in a few lines.
The only thing you have to do is create a class holding your application state, let's call it... ApplicationState :-) It can look like that:
class ApplicationState implements Serializable {

    enum ProcessState {
        READ_DONE,
        PROCESSING_STARTED,
        PROCESSING_ENDED,
        ANOTHER_STATE
    }

    private List<String> filesDone, filesToDo;
    private String currentlyProcessingFile;
    private ProcessState currentProcessState;
}
With such a structure, and using SerializationUtils, serializing is done the following way:
try {
    ApplicationState state = new ApplicationState();
    ...
    // File to serialize the object to
    String fileName = "applicationState.ser";
    // New file output stream for the file
    FileOutputStream fos = new FileOutputStream(fileName);
    // Serialize the application state
    SerializationUtils.serialize(state, fos);
    fos.close();
    // Open a FileInputStream to the file
    FileInputStream fis = new FileInputStream(fileName);
    // Deserialize and cast back into ApplicationState
    ApplicationState restored = (ApplicationState) SerializationUtils.deserialize(fis);
    System.out.println(restored);
    fis.close();
} catch (Exception e) {
    e.printStackTrace();
}
It sounds like the Java Preferences API might be a good option for you. This can store user/system settings with minimal effort on your part and you can update/retrieve at any time.
https://docs.oracle.com/javase/8/docs/technotes/guides/preferences/index.html
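A minimal sketch of storing and reading back a value with the Preferences API; the node name "myapp-demo" and the key are made up for illustration.

```java
import java.util.prefs.Preferences;

public class PrefsDemo {
    // Store a value in a user-scoped node and read it back.
    public static String roundTrip(String key, String value) throws Exception {
        // The node is created on first access if it doesn't exist
        Preferences prefs = Preferences.userRoot().node("myapp-demo");
        prefs.put(key, value);
        prefs.flush(); // push to the backing store now rather than lazily
        // The second argument is the default returned if the key is absent
        return prefs.get(key, "none");
    }
}
```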
It's pretty simple to make from scratch. You could follow this:
Have a DB (or just a file) that stores the information of processing progress. Something like:
Id|fileName|status|metadata
As soon as you start processing a file, make an entry in this table and mark its status as PROCESSING. Then you can store intermediate states, and finally, when you're done, set the status to DONE. This way, on restart, you would know which files have been processed, which files were in-flight when the process shut down or crashed, and (obviously) where to start.
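A sketch of the same idea using a plain java.util.Properties file instead of a DB table; the file name, status strings, and class name are arbitrary choices of mine.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Properties;

public class ProgressTracker {
    private final File file;
    private final Properties status = new Properties();

    public ProgressTracker(File file) throws IOException {
        this.file = file;
        if (file.exists()) {
            FileInputStream in = new FileInputStream(file);
            try {
                status.load(in); // recover progress from a previous run
            } finally {
                in.close();
            }
        }
    }

    // Mark a file as PROCESSING / DONE and persist immediately,
    // so a crash can lose at most the current entry.
    public void mark(String fileName, String state) throws IOException {
        status.setProperty(fileName, state);
        FileOutputStream out = new FileOutputStream(file);
        try {
            status.store(out, "processing progress");
        } finally {
            out.close();
        }
    }

    public boolean isDone(String fileName) {
        return "DONE".equals(status.getProperty(fileName));
    }
}
```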
In large enterprise environments where applications are loosely coupled (and there is no guarantee that an application will be available, or it might crash), we use a Message Queue to do much the same thing and ensure a reliable architecture.
There are almost too many ways to mention. I would choose the option you believe is simplest.
You can use:
a file to record what is done (and what is to be done)
a persistent queue on JMS (which supports multiple processes, even on different machines)
an embedded or remote database.
An approach I rave about is using memory mapped files. A nice feature is that information is not lost if the application dies or is killed (provided the OS doesn't crash) which means you don't have to flush it, nor worry about losing data if you don't.
This works because the data is partly managed by the OS which means it uses little heap (even for TB of data) and the OS deals with loading and flushing to disk making it much faster (and making sizes much larger than your main memory practical).
BTW: This approach works even with a kill -9, as the OS flushes the data to disk. To test this I use Unsafe.getByte(0), which crashes the application with a SEG fault immediately after making a change (as in, on the very next machine-code instruction), and the change is still written to disk.
This won't work if you pull the power, though you would have to be really quick. You can use force() on a memory-mapped file to push the data to disk before continuing, but I don't know how you can test that this really works. ;)
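A bare-bones sketch of persisting a value through a memory-mapped file; the helper names are mine, and force() explicitly requests a flush, though as noted the OS writes dirty pages out even if the JVM dies.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedState {
    // Write one long at offset 0 through a memory-mapped region.
    public static void writeLong(File file, long value) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            FileChannel ch = raf.getChannel();
            // Mapping READ_WRITE extends the file to 8 bytes if needed
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8);
            buf.putLong(0, value); // lands in the page cache immediately
            buf.force();           // explicitly request a flush to disk as well
        } finally {
            raf.close();
        }
    }

    public static long readLong(File file) throws Exception {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel ch = raf.getChannel();
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, 8).getLong(0);
        } finally {
            raf.close();
        }
    }
}
```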
I have a library which makes memory-mapped files easier to use:
https://github.com/peter-lawrey/Java-Chronicle
It's not a long read, and you can use it as an example.
Apache Commons Configuration API: http://commons.apache.org/proper/commons-configuration/userguide/howto_filebased.html#File-based_Configurations

download with java code is really slow

I wrote a bit of code that reads download links from a text file and downloads the videos using the copyURLToFile method from Apache's commons-io library, and the download is really slow when I'm on my WLAN.
When I plug in an internet stick it is about 6 times faster, although the stick has 4 Mbit and my WLAN has 8 Mbit.
I also tried to do it without the commons-io library, but the problem is the same.
Normally I download at 600-700 KB/s on my WLAN, but with Java it only downloads at about 50 KB/s. With the internet stick it's about 300 KB/s.
Do you know what the problem could be?
Thanks in advance.
// Edit: Here is the code, but I don't think it has anything to do with this. And what do you mean by network IT policies?
BufferedReader br = new BufferedReader(new FileReader(linksFile));
String link;
String name;
while ((link = br.readLine()) != null) {
    name = br.readLine();
    FileUtils.copyURLToFile(new URL(link), new File("videos/" + name + ".flv"));
    System.out.println(link);
}
br.close();
This isn't likely to be a Java problem.
The code you've posted actually doesn't do any IO over the network - it just determines a URL and passes it to (presumably Apache Commons') FileUtils.copyURLToFile. As usual with popular third-party libraries, if this method had a bug in it that caused slow throughput in all but the most unusual situations, it would already have been identified (and hopefully fixed).
Thus the issue is going to lie elsewhere. Do you get the expected speeds when accessing resources through normal HTTP clients (e.g. a browser)? If not, then there's a universal problem at the OS level. Otherwise, I'd have a look at the policies on your network.
Two possible causes spring to mind:
The obvious one is some sort of traffic shaping - your network deprioritises the packets that come from your Java app (for a potentially arbitrary reason). You'd need to see how this is configured and look at its logs to see if this is the case.
The problem resides with DNS. If Java's using a primary server that's either blocked or incredibly slow, then it could take up to a few seconds to convert that URL to an IP address and begin the actual transfer. I had a similar problem once when a firewall was silently dropping packets to one server and it took three seconds (per lookup!) for the Java process to switch to the secondary server.
In any case, it's almost certainly not the Java code that's at fault.
FileUtils.copyURLToFile internally uses a buffer for reading.
Increasing the buffer size could speed up the download, but the API does not seem to allow that.
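If you want control over the buffer size, you can skip copyURLToFile and copy the stream yourself; a sketch with an arbitrarily chosen 64 KB buffer (the class and method names are mine):

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class Downloader {
    public static void download(URL url, File dest) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // pick whatever size you want to test
        InputStream in = new BufferedInputStream(url.openStream());
        FileOutputStream out = new FileOutputStream(dest);
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } finally {
            in.close();
            out.close();
        }
    }
}
```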

URL.openStream() is very slow when ran on school's unix server

I am using URL.openStream() to download many HTML pages for a crawler that I am writing. The method runs great locally on my Mac; however, on my school's unix server it is extremely slow, but only when downloading the first page.
Here is the method that downloads the page:
public static String download(URL url) throws IOException {
    long start = System.currentTimeMillis();
    InputStream is = url.openStream();
    System.out.println("\t\tCreated 'is' in " + ((System.currentTimeMillis() - start) / (1000.0 * 60)) + "minutes");
    ...
}
And the main method that invokes it:
LinkedList<URL> ll = new LinkedList<URL>();
ll.add(new URL("http://sheldonbrown.org/bicycle.html"));
ll.add(new URL("http://www.trentobike.org/nongeo/index.html"));
ll.add(new URL("http://www.trentobike.org/byauthor/index.html"));
ll.add(new URL("http://www.myra-simon.com/bike/travel/index.html"));
for (URL tmp : ll) {
    System.out.println();
    System.out.println(tmp);
    CrawlerTools.download(tmp);
}
Output locally (Note: all are fast):
http://sheldonbrown.org/bicycle.html
Created 'is' in 0.00475minutes
http://www.trentobike.org/nongeo/index.html
Created 'is' in 0.005083333333333333minutes
http://www.trentobike.org/byauthor/index.html
Created 'is' in 0.0023833333333333332minutes
http://www.myra-simon.com/bike/travel/index.html
Created 'is' in 0.00405minutes
Output on School Machine Server (Note: All are fast except the first one. The first one is slow regardless of what the first site is):
http://sheldonbrown.org/bicycle.html
Created 'is' in 3.2330666666666668minutes
http://www.trentobike.org/nongeo/index.html
Created 'is' in 0.016416666666666666minutes
http://www.trentobike.org/byauthor/index.html
Created 'is' in 0.0022166666666666667minutes
http://www.myra-simon.com/bike/travel/index.html
Created 'is' in 0.009533333333333333minutes
I am not sure if this is a Java issue (a problem in my Java code) or a server issue. What are my options?
When run on the server this is the output of the time command:
real 3m11.385s
user 0m0.277s
sys 0m0.113s
I am not sure if this is relevant... What should I do to try to isolate the problem?
You've answered your own question. It's not a Java issue, it has to do with your school's network or server.
I'd recommend that you report your timings in milliseconds and see if they're repeatable. Run that test in a loop - 1,000 or 10,000 times - and keep track of all the values you get. Import them into a spreadsheet and calculate some statistics. Look at the distribution of values. You don't know if the one data point that you have is an outlier or the mean value. I'd recommend that you do this for both networks in exactly the same way.
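A sketch of that kind of measurement loop: run a task many times, collect each elapsed time in milliseconds, and summarize. The helper names are mine, and only the median is computed here; you would export the full list for a spreadsheet.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Timing {
    // Run the task repeatedly and return the elapsed time of each run in ms.
    public static List<Long> sample(Runnable task, int runs) {
        List<Long> times = new ArrayList<Long>();
        for (int i = 0; i < runs; i++) {
            long start = System.currentTimeMillis();
            task.run();
            times.add(System.currentTimeMillis() - start);
        }
        return times;
    }

    // Median is more robust to a single outlier than the mean.
    public static long median(List<Long> times) {
        List<Long> sorted = new ArrayList<Long>(times);
        Collections.sort(sorted);
        return sorted.get(sorted.size() / 2);
    }
}
```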
I'd also recommend using Fiddler or some other tool to watch network traffic as you download. You can get better insight into what's going on and perhaps ferret out the root cause.
But it's not Java. It's your code, your network. If this was a bug in the JDK it would have been fixed a long time ago. Suspect yourself first, last, and always.
UPDATE:
My network admin assured me that this
was a bad java implementation Not a
network problem. What do you think?
"Assured" you? What evidence did s/he produce to support this conclusion? What data? What measurements were taken? Sounds like laziness and ignorance to me.
It certainly doesn't explain why all the other requests behave just fine. What changed in Java between the first and subsequent calls? Did the JVM suddenly rewrite itself?
You can accept it if you want, but I'd say shame on your network admin for not being more curious. It would have been more honorable to be honest and say they didn't know, didn't have time, and weren't interested.
By default, Java prefers to use IPv6. My school's firewall
drops all IPv6 traffic (with no warning). After 3 minutes 15 seconds, Java falls back to IPv4. It seems strange to me that it takes so long to fall back to IPv4.
duffymo's answer, essentially "go talk to your network admin", helped me solve the problem; however, I think this was caused by a strange Java default combined with a strange network configuration.
My network admin assured me that this was a bad Java implementation, not a network problem. What do you think?
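If you run into the same symptom, the usual workaround is to tell the JVM to prefer the IPv4 stack. Normally you pass -Djava.net.preferIPv4Stack=true on the java command line, since the property must be set before any networking classes initialize; a sketch of doing the same at the very start of main:

```java
public class PreferIPv4 {
    public static void main(String[] args) {
        // Must run before the first networking call; passing
        // -Djava.net.preferIPv4Stack=true on the command line is safer.
        System.setProperty("java.net.preferIPv4Stack", "true");
        System.out.println(System.getProperty("java.net.preferIPv4Stack"));
    }
}
```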

How to transfer a file in a client/server Java application

I am currently working on a homework assignment and I am thoroughly stuck. I am on the last question and I just cannot figure out how to accomplish the last task. Below are the tasks I had to complete:
The client should save the file in the "client" subdirectory of the home directory.
Test your program. Be sure it works with binary files, not just text files. Be sure it works when both programs are on the same machine as well as when they are separated over the network.
Thus far, when I start the server it asks what port I want to use; then I start the client and it asks what IP and port to use. The server immediately sends a list of the files in the home directory's "server" folder, and I respond with the client by sending the number of the file I wish to download. This is where I get stuck: I can't seem to find any information about how to do this. So, as you can see in my code posted below, I am trying to use a FileInputStream to read the file into an array of bytes, then send that to the client, and then write the received array of bytes to a file with a FileOutputStream. But I can't seem to find the correct methods to do that, or even tell whether I am doing it correctly.
CLIENT
int i = 0;
while (i < 1000) {
    String modifiedSentence = inFromServer.readLine();
    System.out.println("From Server: " + modifiedSentence);
    i++;
}
while (j < 1000) {
    int byteString = inFromServer.read();
    ArrayList<byte[]> bytes = new ArrayList<byte[]>();
    bytes.add(byteString);
}
Integer byteInt = new Integer(byteString);
FileOutputStream fo = new FileOutputStream(System.getProperty("user.home") + "/client/text.txt");
fo.write(byteInt.byteValue());
}
}
}
SERVER
byte[] bytes = new byte[1024];
FileInputStream fi = new FileInputStream(file.toString() + fileArray[userChoiceInt]);
fi.read(bytes, 0, 1024);
outToClient.write(bytes, 0, 1024);
}
}
}
If anyone could offer any advice or the correct classes or methods to use I would appreciate it.
Thank you in advance.
Without spoiling the whole thing, here are some hints.
This can easily be accomplished using sockets (server & client). Using byte[] for transferring the file(s) will ensure that your program works with both ASCII and binary files.
Another approach would be to use the built-in Remote Method Invocation (RMI). I haven't transferred files using this approach, but I'm sure it's feasible.
And in case you didn't know, getting the user home directory is accomplished with the following call: System.getProperty( "user.home" );
You can take a look at this tutorial by Sun (Oracle). That should give you a basic understanding of sockets.
What I do notice, however, is that on the client side you iterate a fixed number of times (1000), which is not a good idea since, in general, you do not know the size of the file being sent; the tutorial shows this and how to make the appropriate changes. Also, you keep creating the data structures within the loop, so you lose any information you have already received, besides allocating new data structures on every iteration, which is inefficient.
What you have to do is move the initialization of the structures outside the loop. Also, for the modifiedSentence variable, you might want to change it from a String to a StringBuilder.
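To give an idea without spoiling the assignment: a common pattern is to send the file length first and then exactly that many bytes, so the receiver knows when to stop. A sketch with made-up helper names, written against plain streams so the same code works over a socket's streams:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileTransfer {
    // Sender: write the length first, then the raw bytes.
    public static void send(File file, OutputStream rawOut) throws IOException {
        DataOutputStream out = new DataOutputStream(rawOut);
        out.writeLong(file.length()); // receiver now knows how much to expect
        FileInputStream in = new FileInputStream(file);
        try {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // works for binary and text alike
            }
        } finally {
            in.close();
        }
        out.flush();
    }

    // Receiver: read the length, then exactly that many bytes into the file.
    public static void receive(InputStream rawIn, File dest) throws IOException {
        DataInputStream in = new DataInputStream(rawIn);
        long remaining = in.readLong();
        FileOutputStream out = new FileOutputStream(dest);
        try {
            byte[] buffer = new byte[1024];
            while (remaining > 0) {
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n == -1) throw new EOFException("stream ended early");
                out.write(buffer, 0, n);
                remaining -= n;
            }
        } finally {
            out.close();
        }
    }
}
```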
If you can use an IO library for this, I would suggest Netty or Mina. There are some netty examples here: http://jboss.org/netty/documentation.html
Even if you cannot use a library, these may be helpful for learning how things are done.
You probably should not ask how to do homework on sites like this; it is not appropriate. Your server does look mostly good, but your program will only read files up to 1024 bytes. You should look at java.io.File in more detail; there is a length() method you can use to find the size of the file, so you know how much to send.
