URL.openStream() is very slow when run on school's Unix server - java

I am using URL.openStream() to download many HTML pages for a crawler that I am writing. The method runs great locally on my Mac; however, on my school's Unix server it is extremely slow, but only when downloading the first page.
Here is the method that downloads the page:
public static String download(URL url) throws IOException {
    // Primitive long avoids needless boxing of the timestamp.
    long start = System.currentTimeMillis();
    InputStream is = url.openStream();
    System.out.println("\t\tCreated 'is' in " + ((System.currentTimeMillis() - start) / (1000.0 * 60)) + "minutes");
    ...
}
And the main method that invokes it:
LinkedList<URL> ll = new LinkedList<URL>();
ll.add(new URL("http://sheldonbrown.org/bicycle.html"));
ll.add(new URL("http://www.trentobike.org/nongeo/index.html"));
ll.add(new URL("http://www.trentobike.org/byauthor/index.html"));
ll.add(new URL("http://www.myra-simon.com/bike/travel/index.html"));
for (URL tmp : ll) {
    System.out.println();
    System.out.println(tmp);
    CrawlerTools.download(tmp);
}
Output locally (Note: all are fast):
http://sheldonbrown.org/bicycle.html
Created 'is' in 0.00475minutes
http://www.trentobike.org/nongeo/index.html
Created 'is' in 0.005083333333333333minutes
http://www.trentobike.org/byauthor/index.html
Created 'is' in 0.0023833333333333332minutes
http://www.myra-simon.com/bike/travel/index.html
Created 'is' in 0.00405minutes
Output on School Machine Server (Note: All are fast except the first one. The first one is slow regardless of what the first site is):
http://sheldonbrown.org/bicycle.html
Created 'is' in 3.2330666666666668minutes
http://www.trentobike.org/nongeo/index.html
Created 'is' in 0.016416666666666666minutes
http://www.trentobike.org/byauthor/index.html
Created 'is' in 0.0022166666666666667minutes
http://www.myra-simon.com/bike/travel/index.html
Created 'is' in 0.009533333333333333minutes
I am not sure if this is a Java issue (i.e. a problem in my Java code) or a server issue. What are my options?
When run on the server this is the output of the time command:
real 3m11.385s
user 0m0.277s
sys 0m0.113s
I am not sure if this is relevant... What should I do to try and isolate my problem..?

You've answered your own question. It's not a Java issue, it has to do with your school's network or server.
I'd recommend that you report your timings in milliseconds and see if they're repeatable. Run that test in a loop - 1,000 or 10,000 times - and keep track of all the values you get. Import them into a spreadsheet and calculate some statistics. Look at the distribution of values. You don't know if the one data point that you have is an outlier or the mean value. I'd recommend that you do this for both networks in exactly the same way.
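The repeated-measurement idea above could be sketched roughly as follows. This is only an illustration: the class name TimingStats and its helper methods are made up for this example, and the workload in main is a placeholder that you would swap for the real download call.

```java
import java.util.Arrays;

public class TimingStats {
    /** Runs the task the given number of times and returns each duration in milliseconds. */
    public static double[] measure(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    /** Arithmetic mean, or NaN for an empty sample. */
    public static double mean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Median of the sample, or NaN for an empty sample. */
    public static double median(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption); replace it with the download under test.
        double[] samples = measure(() -> Math.sqrt(12345.0), 1000);
        double max = Arrays.stream(samples).max().orElse(Double.NaN);
        System.out.printf("mean=%.4f ms, median=%.4f ms, max=%.4f ms%n",
                mean(samples), median(samples), max);
    }
}
```

Comparing the median against the maximum is a quick way to see whether one slow request is an outlier or the norm.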
I'd also recommend using Fiddler or some other tool to watch network traffic as you download. You can get better insight into what's going on and perhaps ferret out the root cause.
But it's not Java. It's your code, your network. If this was a bug in the JDK it would have been fixed a long time ago. Suspect yourself first, last, and always.
UPDATE:
My network admin assured me that this was a bad Java implementation, not a network problem. What do you think?
"Assured" you? What evidence did s/he produce to support this conclusion? What data? What measurements were taken? Sounds like laziness and ignorance to me.
It certainly doesn't explain why all the other requests behave just fine. What changed in Java between the first and subsequent calls? Did the JVM suddenly rewrite itself?
You can accept it if you want, but I'd say shame on your network admin for not being more curious. It would have been more honorable to be honest and say they didn't know, didn't have time, and weren't interested.

By default, Java prefers to use IPv6. My school's firewall drops all IPv6 traffic (with no warning). After 3 minutes, 15 seconds, Java falls back to IPv4. It seems strange to me that it takes so long to fall back to IPv4.
duffymo's answer, essentially "Go talk to your network admin", helped me to solve the problem; however, I think that this is a problem caused by a strange Java implementation and a strange network configuration.
My network admin assured me that this was a bad Java implementation, not a network problem. What do you think?
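If the IPv6 fallback described above is the cause, one common workaround is the standard java.net.preferIPv4Stack system property. The thread does not say which fix was actually applied, so treat this as a sketch under that assumption; the class name Ipv4Preference is invented for the example.

```java
import java.io.InputStream;
import java.net.URL;

public class Ipv4Preference {
    /**
     * Asks the JVM to use IPv4 sockets only. It should run before the first
     * networking call, because the property is normally read only once.
     */
    public static void preferIpv4() {
        System.setProperty("java.net.preferIPv4Stack", "true");
    }

    public static void main(String[] args) throws Exception {
        preferIpv4();
        if (args.length == 0) {
            System.out.println("usage: java Ipv4Preference <url>");
            return;
        }
        // Time how long it takes to open the stream, as in the question.
        long start = System.currentTimeMillis();
        try (InputStream in = new URL(args[0]).openStream()) {
            System.out.println("Opened stream in " + (System.currentTimeMillis() - start) + " ms");
        }
    }
}
```

The same property can be passed on the command line as -Djava.net.preferIPv4Stack=true, which avoids touching the code at all.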

Related

Why is one user getting NoSuchMethodError where tens of thousands don't?

How can one of our many users get
java.lang.NoSuchMethodError:
at com.mycelium.wallet.activity.settings.SettingsPreference.getLanguage (SettingsPreference.kt:73)
at com.mycelium.wallet.WalletApplication.onCreate (WalletApplication.java:109)
at android.app.Instrumentation.callApplicationOnCreate (Instrumentation.java:1127)
on this line of Kotlin code:
@JvmStatic
fun getLanguage(): String? = sharedPreferences.getString(Constants.LANGUAGE_SETTING, Locale.getDefault().language)
There are three function calls on this line: android.content.SharedPreferences::getString(String,String), java.util.Locale::getDefault() and java.util.Locale::getLanguage() all of which are available since API 1.
The only affected user (Samsung Galaxy A5(2017) (a5y17lte), 2816MB RAM, Android 8.0) tried to start the app 180 times with this insta-crash.
The conversion to Kotlin might have issues still?
try { ... } catch (NoSuchMethodError e) { ... } might be a suitable workaround. But they might already have given up; if you don't have an email address or the like, you won't be able to notify them. You could return a static string in case of a NoSuchMethodError. Besides, if one has written down the seed phrase, the wallet is on the blockchain; the device only has the keys. I'd file that as an individual destiny, and that device could probably still be rooted to have the keys extracted. It's difficult to help them without a support request through which one could notify them of a new version that does not rely on whatever the unknown method is. Maybe they still use it and would receive an auto-update and try again, but only maybe, and there's no guarantee that this is the only unknown method on this device.
It's definitely not a Kotlin issue, but rather a storage defect; google "eMMC corruption".
And if the user has not written down their seed phrase, it's their very own fault.
This all is an assumption, but the probability isn't that low.
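The try/catch workaround suggested in this answer might look something like the sketch below. It is written in Java rather than Kotlin, and the class LanguageFallback, the Supplier-based lookup, and the "en" fallback are all assumptions made for illustration, not code from the app.

```java
import java.util.Locale;
import java.util.function.Supplier;

public class LanguageFallback {
    /**
     * Runs the lookup and returns its result; falls back to a static value
     * if the lookup returns null or the runtime reports a missing method.
     */
    public static String languageOrDefault(Supplier<String> lookup, String fallback) {
        try {
            String value = lookup.get();
            return value != null ? value : fallback;
        } catch (NoSuchMethodError e) {
            // Linkage problem on this device: use the static fallback instead of crashing.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the SharedPreferences lookup in the question.
        String lang = languageOrDefault(() -> Locale.getDefault().getLanguage(), "en");
        System.out.println(lang);
    }
}
```

This only hides the symptom on that one call; if the device's libraries are damaged, other calls may still fail.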
As you know (I am sure) a NoSuchMethodError is caused by a mismatch between the versions of classes at compile time versus at runtime.
And, I agree with you that the three methods called by that line of code should be present at runtime.
I was suspicious that there isn't a message string for the NoSuchMethodError to say which method was missing, but there are other examples for the Android platform where the message is missing. So I am (tentatively) calling this not significant.
So we have to look for other explanations. Here are some:
The line number in the stacktrace could be inaccurate. People sometimes report this kind of thing; e.g. Crashlytics is reporting wrong line numbers
This particular user could be running a different (older?) version of your app where the code at that line is different from the code you are looking at.
The user has "rooted" his device and messed around with its standard libraries. Alternatively, the user's device has been hacked and the hacker has interfered with the standard libraries (rather crudely in this case).
The user has been messing with the bytecodes for your app and has accidentally got it trying to call a non-existent method. Alternatively, the user is running a (crudely) trojaned version of your app where the bad guy has done the same thing.
The fact that your app involves Bitcoins means that there could be strong incentive for someone to be doing nefarious things ... so the last two alternatives should not be discounted.
The conversion to Kotlin might have issues still?
I don't see why that would affect only one user.
The fact the user tried 180 times is why I care. This is a Bitcoin wallet, so ... they might have money in that wallet and I hate to not fix issues if I can.
(Or conversely, this might be a bad guy trying to get bitcoins out of a wallet via a stolen device. The fact that the user is being so persistent ... and hasn't contacted you for help ... is suspicious in itself.)
But the point is that if you don't have any way to contact this user, fixing the issue in general is unlikely to help them directly. And right now you don't have enough information to know what the problem really is.

Access .net DLL from Java

I am new to Java and DLLs, so go easy on me.
I need to access a DLL's methods from Java.
I have tried using JNA to access the DLL; here is what I have done.
import com.sun.jna.Library;
import com.sun.jna.Native;

public class mapper {
    public interface mtApi extends Library {
        public boolean IsStopped();
    }

    public static void main(String[] args) {
        // Load MtApi.dll and map it onto the mtApi interface.
        mtApi lib = (mtApi) Native.loadLibrary("MtApi", mtApi.class);
        boolean test = lib.IsStopped();
        System.out.println(test);
    }
}
When I run the code, I am getting the following error:
Exception in thread "main" java.lang.UnsatisfiedLinkError:Error looking up function 'IsStopped':The specified procedure could not be found.
I understand that this error is saying it cannot find the function, but I have no idea how to fix it.
I am trying to use this API mt4api
and here is the method, I am attempting to access MQL4
Can anyone tell me what I am doing wrong?
I have looked at other alternatives, like jni4net, but I cannot get this working either.
If anyone can link me to a tutorial that shows me how to set this up, or knows how to, I would be grateful.
Trading? Hunting for milliseconds to shave off? Go rather into distributed processing... Definitely safer than relying on an API!
While your OP was directed toward how to bend Java to call .NET DLL functions,
let me sketch a much more future-safe solution.
Using AI/ML-regression based predictors for FOREX trading, I was hunting in the same forest. The best solution found over about the last 12 years, drawing on a few hundred man*years of experience, was set up in the following manner:
Host A executes trades: operates MetaTrader Terminal 4, with both Script and EA --- the distributed-processing system communicates using the ZeroMQ low-latency messaging/signalling framework ( a few tens of microseconds needed )
Host B executes AI/ML processing of predictions for a traded instrument ( about a few hundreds of microseconds apply )
Cluster C executes continuous AI/ML predictor re-trainings and HyperParameterSPACE model selections ( many CPU-hours indeed needed, continuous model self-adapting process running 24/7 )
Signalling / Messaging layer with ZeroMQ has ports and/or bindings available and ready for most of the mainstream and many of niche programming languages, including java.
Hidden dangers of going just against a published API:
While the efforts for system integration and testing are immense, the API specifications are always dangerous for specification creeping.
This said, add countless man*months consumed on debugging after a silent change in MT4 language specifications that derails your previous tools + libraries. Why? Just imagine. Some time ago, MQL4 stopped being MQL4 and was silently shifted towards MQL5, under the name New-MQL4. Among other changes in compilation, there were many small and big nails in the coffin -- string surprisingly ceased to be a string and was hidden as an internal struct -- and one can guess what that does to all DLL calls.
So, beware of API creepings.
Does it hurt a distributed processing solution?
No.
With a wise message-layout design, there are no adverse effects of MetaTrader Terminal 4 behaviour and all the logic ( incl. the strategy decision ) is put outside this creeping platform.
Doable. Fast and smart. Also could use remote-GPU-cluster processing, if your budget allows.
Does it work even in Strategy Tester?
Yes, it does.
If anyone has the guts to rely on the built-in Strategy Tester, the distributed-processing model still works there. Performance depends on the preferred style of modelling; a full one-year, tick-by-tick simulation with quite complex AI/ML components took a few days on common COTS desktop PC systems ( after years of Quant R&D, we do not use Strategy Tester internally at all, but the request was to batch-test the y/y tick-data, so it can be commented on here ).

Java Security Risks for using URL class to get html source

Hello I have a question about the java.net.URL class.
If I just type a random url, and get the html in string form, do I put my computer at risk?
Is it possible that a certain website exploits the URL class and takes over my computer through my application, or at least infects it with some kind of malware?
I hope that my question is clear, if not please ask me to clear it up, English is not my native language.
Thanks for all your help.
If I just type a random url, and get the html in string form, do I put my computer at risk?
You do not specify what you would consider to be a risk. But fetching a string consumes network bandwidth, CPU time and (in most cases) storage space for the downloaded text. A malicious HTTP server that served an infinite random string would use some of your network bandwidth and CPU forever and, if your program stored the string, would eventually cause your program to fail with an OutOfMemoryError. If your program was configured to use a large fraction of the RAM of the computer, the URL would reduce the performance of all the other programs on your computer until your program exited.
Something similar, a tarpit, has been done to slow down malicious programs, such as computer worms.
No. There is no way by which any string can harm your machine. There is no harm until and unless you try to execute some extract from the string.
Getting it in string form in itself won't do any harm, although if you set no timeout and don't do the fetching in a background thread, it is possible for the remote party to "freeze" your application by intentionally sending data incredibly slowly.
All this can be avoided by using Executors to fetch the html and setting a sensible limit to how long you're willing to wait for the content.
There is also the buffer size problem that Raedwald mentions, again this can be avoided by limiting the amount of data you accept.
Remember though that while getting the content into a string is safe (with the caveats mentioned already), what you then do with that string is what can make things dangerous.
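The two mitigations mentioned above (a limit on waiting time and a limit on accepted data) could be sketched as below. The class name BoundedFetch, the 5-second timeouts, and the 1 MiB cap are arbitrary choices for illustration. Note that setReadTimeout bounds each individual read, not the whole transfer, so a server that trickles bytes can still take a long time overall; a hard overall deadline would need something like the Executor approach from the answer, which is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class BoundedFetch {
    /** Reads the stream fully, but fails once more than maxBytes have arrived. */
    public static String readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("Response exceeds " + maxBytes + " bytes");
            }
            out.write(buf, 0, n);
        }
        // Assumes UTF-8 content; a real crawler would honor the declared charset.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Fetches a URL with connect and per-read timeouts and a size cap. */
    public static String fetch(URL url, int timeoutMillis, int maxBytes) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            return readLimited(in, maxBytes);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BoundedFetch <url>");
            return;
        }
        String html = fetch(new URL(args[0]), 5000, 1 << 20);
        System.out.println("Fetched " + html.length() + " characters");
    }
}
```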

Linux, Java and USB

Kind of a followup to this question, I have been able to:
Find the device with which I am working, disconnect it from the kernel, and claim the (single) interface... and that's about as far as I can get.
When I try to write to the device (which is a custom wireless transceiver, not my own design), I get (when using LibUsb.bulkTransfer with the endpoint 0x00):
LibUsb.bulkTransfer(handle, (byte)0x00, bb, transfered, 5000);
an Input/Output error, and (when using LibUsb.bulkTransfer with the endpoint 0x81):
LibUsb.bulkTransfer(handle, (byte)0x81, bb, transfered, 5000);
a TimeOut error.
I'm pretty sure I have absolutely NO idea what I'm doing here (which doesn't help my position), and this is way deeper than I am accustomed to going as far as communicating with devices on a lower level (the most I've done is interop with .Net).
I've seen the lsusb command and executed it and gotten... a lot of stuff, and I can recognize some of it, but most of it I'm kind of lost with and was hoping for someone to hold my hand, or point me to a sort of... USB for Dummies guide that might help me figure out what I need to do.
The end result (ideally) will be a Java package that allows cross-platform communication with the device without any sort of tinkering on behalf of the end-user (and by cross-platform I mean, windows, linux and mac, which is why I'm using the java4usb java library).
Where I am right now is, using the output from the lsusb command, I'm hoping to be able to send commands from the transceiver to the external device with which it communicates. (basically it sends commands to a device connected to an LED that can turn the LED on and off, and make it flash, and it also can receive commands from that device and respond accordingly to them, but baby steps).
The lsusb output you can find here (it's quite verbose, and I didn't want to flood the question with more than necessary). Any help or direction would be tremendously appreciated.
EDIT: A bit more research reveals (from the lsusb output) that the 0x81 endpoint is an interrupt type. Putting 2 and 2 together led me to the conclusion that I wanted neither a bulk transfer nor a control transfer but an interrupt transfer:
Endpoint Descriptor:
  bLength                 7
  bDescriptorType         5
  bEndpointAddress     0x81  EP 1 IN
  bmAttributes            3
    Transfer Type            Interrupt
    Synch Type               None
    Usage Type               Data
  wMaxPacketSize     0x0002  1x 2 bytes
  bInterval
LibUsb.interruptTransfer(handle, (byte)0x81, bb, transfered, 1000);
Unfortunately I'm still getting a Timeout error.
EDIT: Some more information is needed:
It has been suggested that for synchronous control (which is fine for half of what I need to do) I should use the usb4java LibUsb.controlTransfer method. That is fine, but there are several parameters that need to be filled in, and I do not know what they should be filled with:
public static int controlTransfer(DeviceHandle handle, //I know this.
                                  byte bmRequestType,  //<--- What goes here?
                                  byte bRequest,       //<--- What goes here?
                                  short wValue,        //<--- What goes here?
                                  short wIndex,        //<--- What goes here?
                                  ByteBuffer data,     //<--- What goes here?
                                  long timeout)        //<--- What goes here?
Any and all direction to the answer to what I need to populate these fields with would be a great help and greatly appreciated.
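As a rough orientation for the parameters asked about above: in the USB setup packet, bmRequestType is a bit field (bit 7 is the direction, bits 6..5 the request type, bits 4..0 the recipient), bRequest is the request code, and wValue and wIndex are request-specific. The sketch below builds bmRequestType by hand and fills in the standard GET_DESCRIPTOR request for the device descriptor as an example. The class UsbSetupFields and its constants are written for this illustration and are not part of usb4java; for a custom transceiver the vendor-specific values would have to come from the device's documentation.

```java
public class UsbSetupFields {
    // Bit 7: transfer direction of the data stage.
    public static final int DIR_OUT = 0x00; // host to device
    public static final int DIR_IN = 0x80;  // device to host
    // Bits 6..5: request type.
    public static final int TYPE_STANDARD = 0x00;
    public static final int TYPE_CLASS = 0x20;
    public static final int TYPE_VENDOR = 0x40;
    // Bits 4..0: recipient.
    public static final int RECIPIENT_DEVICE = 0x00;
    public static final int RECIPIENT_INTERFACE = 0x01;
    public static final int RECIPIENT_ENDPOINT = 0x02;

    /** Combines direction, type and recipient into the bmRequestType byte. */
    public static byte requestType(int direction, int type, int recipient) {
        return (byte) (direction | type | recipient);
    }

    public static void main(String[] args) {
        // Standard GET_DESCRIPTOR request for the device descriptor.
        byte bmRequestType = requestType(DIR_IN, TYPE_STANDARD, RECIPIENT_DEVICE);
        byte bRequest = 0x06;   // GET_DESCRIPTOR
        short wValue = 0x0100;  // descriptor type 1 (device) in the high byte, index 0
        short wIndex = 0;       // zero for the device descriptor
        System.out.printf("bmRequestType=0x%02X bRequest=0x%02X wValue=0x%04X wIndex=%d%n",
                bmRequestType & 0xFF, bRequest & 0xFF, wValue & 0xFFFF, wIndex);
    }
}
```

The data buffer holds the bytes to send (OUT) or receives the reply (IN), and the timeout is in milliseconds, as in the transfer calls already shown in the question.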

download with java code is really slow

I wrote a bit of code that reads download links from a text file and downloads the videos using the copyURLToFile method from Apache's commons-io library, and the download is really slow when I'm on my WLAN.
When I plug in an internet stick it is about 6 times faster, although the stick has 4 Mbit and my WLAN has 8 Mbit.
I also tried to do it without the commons-io library, but the problem is the same.
Normally I'm downloading at 600-700 KB/s on my WLAN, but with Java it only downloads at about 50 KB/s. With the internet stick it's about 300 KB/s.
Do you know what the problem could be?
Thanks in advance.
//Edit: Here is the code, but I don't think it has anything to do with this. And what do you mean by network IT policies?
FileInputStream fstream = new FileInputStream(linksFile);
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
String link;
String name;
// The file alternates lines: a download link, then the name to save it under.
while ((link = br.readLine()) != null) {
    name = br.readLine();
    FileUtils.copyURLToFile(new URL(link), new File("videos/" + name + ".flv"));
    System.out.println(link);
}
This isn't likely to be a Java problem.
The code you've posted actually doesn't do any IO over the network - it just determines a URL and passes it to (presumably Apache Commons') FileUtils.copyURLToFile. As usual with popular third-party libraries, if this method had a bug in it that caused slow throughput in all but the most unusual situations, it would already have been identified (and hopefully fixed).
Thus the issue is going to lie elsewhere. Do you get the expected speeds when accessing the resource through normal HTTP methods (e.g. in a browser)? If not, then there's a universal problem at the OS level. Otherwise, I'd have a look at the policies on your network.
Two possible causes spring to mind:
The obvious one is some sort of traffic shaping: your network deprioritises the packets that come from your Java app (for a potentially arbitrary reason). You'd need to see how this is configured and look at its logs to see if this is the case.
The problem resides with DNS. If Java's using a primary server that's either blocked or incredibly slow, then it could take up to a few seconds to convert that URL to an IP address and begin the actual transfer. I had a similar problem once when a firewall was silently dropping packets to one server and it took three seconds (per lookup!) for the Java process to switch to the secondary server.
In any case, it's almost certainly not the Java code that's at fault.
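To check the DNS theory above, one could time name resolution separately from the transfer. The sketch below uses the standard InetAddress.getAllByName call; the class name DnsTiming and the default host are assumptions for this example. Keep in mind that the JVM caches successful lookups, so only the first lookup of a given name is likely to show a delay.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class DnsTiming {
    /** Resolves the host and returns how long the lookup took, in milliseconds. */
    public static long timeLookupMillis(String host) throws UnknownHostException {
        long start = System.nanoTime();
        InetAddress.getAllByName(host);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        try {
            // The first lookup hits the resolver; later ones are usually served from the JVM cache.
            for (int i = 1; i <= 3; i++) {
                System.out.println("Lookup " + i + ": " + timeLookupMillis(host) + " ms");
            }
            // Listing the addresses also shows whether IPv6 entries are returned.
            for (InetAddress address : InetAddress.getAllByName(host)) {
                System.out.println(address);
            }
        } catch (UnknownHostException e) {
            System.out.println("Could not resolve " + host + ": " + e.getMessage());
        }
    }
}
```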
FileUtils.copyURLToFile uses a buffer internally to read.
Increasing the size of that buffer could speed up the download, but the method does not seem to offer a way to do so.
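If you want to experiment with the buffer size yourself, a hand-written copy loop lets you choose it. This is only a sketch: the class BufferedCopy and the 64 KiB size are picked for illustration, and on a slow network link the buffer size is unlikely to be the bottleneck.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;

public class BufferedCopy {
    /** Copies everything from in to out using a buffer of the given size; returns bytes copied. */
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java BufferedCopy <url> <targetFile>");
            return;
        }
        try (InputStream in = new URL(args[0]).openStream();
             OutputStream out = new FileOutputStream(args[1])) {
            System.out.println(copy(in, out, 64 * 1024) + " bytes copied");
        }
    }
}
```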
