Java reading HTTP response (using StringBuilder) much slower than in Python

I'm calling a web service that returns a large response, about 59 megabytes of data.
This is how I read it from Java:
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(),"UTF-8"));
result = result.concat(this.getResponseText(in));
private String getResponseText(BufferedReader in) throws IOException {
    StringBuilder response = new StringBuilder(Integer.MAX_VALUE/2);
    System.out.println("Started reading");
    String line = "";
    while((line = in.readLine()) != null) {
        response.append(line);
        response.append("\n");
    }
    in.close();
    System.out.println("Done");
    String r = response.toString();
    System.out.println("Built r");
    return r;
}
In Windows Resource Monitor I can see a throughput of about 100,000 bytes per second during the read.
However, when I read exactly the same data from the same web service in Python, i.e.:
response = requests.request("POST", url, headers=headers, verify=False, json=json)
I see throughput of up to 700,000 bytes per second (about 7 times faster), and the code also finishes about 7 times faster.
The question is: am I missing something that could make the reads in Java faster? Is this really the fastest way to read an HTTP response in Java?
Update: even when I'm not storing the lines, just iterating through the response, I'm still at no more than 100,000 bytes per second, so I believe the bottleneck is somewhere in how Java reads the stream:
private List<String> getResponseTextAsList(BufferedReader in) throws IOException {
    System.out.println("Started reading");
    List<String> l = new ArrayList<String>();
    int i = 0;
    long q = 0;
    String line = "";
    while((line = in.readLine()) != null) {
        //l.add(line);
        i++;
        q = q + line.length();
    }
    in.close();
    System.out.println("Done " + i + " " + q);
    return l;
}
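One way to narrow down where the time goes (a diagnostic sketch, not a fix; it assumes conn is the same HttpURLConnection as above): drain the raw InputStream into a byte array with no character decoding, and time that alone.

// Diagnostic sketch: measure raw byte throughput with no decoding or line splitting.
private long drainRaw(InputStream is) throws IOException {
    byte[] buffer = new byte[64 * 1024]; // much larger than BufferedReader's 8 KB default
    long total = 0;
    long start = System.nanoTime();
    int n;
    while ((n = is.read(buffer)) != -1) {
        total += n;
    }
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    System.out.println(total + " bytes in " + elapsedMillis + " ms");
    return total;
}

If drainRaw(conn.getInputStream()) is as slow as the readLine() loop, the bottleneck is the connection itself rather than the Java reading code; if it is much faster, the cost is in the UTF-8 decoding and line splitting.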

Related

How to know the ping in ms

I know how to check whether a certain site/IP address is reachable or not. But I wonder if it's possible to get the response time, or ping, in milliseconds (ms) for a specific site or IP address?
Thanks!
It is very much possible :)
Just make sure you have the right permission (put this in your AndroidManifest.xml):
<uses-permission android:name="android.permission.INTERNET" />
Here is my code, which works well:
public String ping(String url) {
    String str = "";
    try {
        // run the system ping command once against the given host
        java.lang.Process process = Runtime.getRuntime().exec(
                "ping -c 1 " + url);
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                process.getInputStream()));
        int i;
        char[] buffer = new char[4096];
        StringBuffer output = new StringBuffer();
        while ((i = reader.read(buffer)) > 0)
            output.append(buffer, 0, i);
        reader.close();
        // the second line of ping's output contains "time=<ms>"
        String[] op = output.toString().split("\n");
        String[] delay = op[1].split("time=");
        str = delay[1];
        Log.i("Pinger", "Ping: " + delay[1]);
    } catch (IOException e) {
        e.printStackTrace();
    }
    return str;
}
The code above is about as simple as it gets.
To call it:
String str = ping("www.google.com");
// Unnecessary, but it makes the result easy to see
android.widget.Toast.makeText(this, str, android.widget.Toast.LENGTH_LONG).show();
Java does not have ICMP out of the box. There are three things you can do:
Rely on the CLI and parse the response from the ping command, as above (see here).
Use sockets to measure the latency between sending and receiving data (see here); a sketch of this approach follows.
Use an ICMP library.
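A minimal sketch of the socket-based option: time a TCP connect to a well-known port. This measures TCP handshake latency rather than a true ICMP ping, and it assumes the host accepts connections on the chosen port (80 here):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public static long tcpLatencyMillis(String host, int port, int timeoutMillis) throws IOException {
    long start = System.nanoTime();
    try (Socket socket = new Socket()) {
        // only the connect is timed; the handshake round trip approximates a ping
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
    }
    return (System.nanoTime() - start) / 1_000_000;
}

long ms = tcpLatencyMillis("www.google.com", 80, 3000);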

Reading specific lines from a file is extremely slow

I have created a method that reads specific lines from a file based on their line number. It works fine for most files, but when I try to read a file that contains a large number of really long lines it takes ages, particularly as it gets further down the file. I've also done some debugging and it appears to use a lot of memory as well, but I'm not sure if that can be improved. I know there are other questions about how to read certain lines from a file, but this one is focused on the performance aspect.
public static final synchronized List<String> readLines(final File file, final Integer start, final Integer end) throws IOException {
    BufferedReader bufferedReader = new BufferedReader(new FileReader(file));
    List<String> lines = new ArrayList<>();
    try {
        String line = bufferedReader.readLine();
        Integer currentLine = 1;
        while (line != null) {
            if ((currentLine >= start) && (currentLine <= end)) {
                lines.add(line + "\n");
            }
            currentLine++;
            if (currentLine > end) {
                return lines;
            }
            line = bufferedReader.readLine();
        }
    } finally {
        bufferedReader.close();
    }
    return lines;
}
How can I optimize this method to be faster than light?
I realised that what I was doing before was inherently slow and used too much memory.
By adding all lines to memory and then processing them in a List, it was not only taking twice as long but was also creating String variables for no reason.
I am now using a Java 8 Stream and processing at the point of reading, which is the fastest method I've used so far.
Path path = Paths.get(file.getAbsolutePath());
try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
    for (String line : (Iterable<String>) stream::iterator) {
        // do stuff
    }
}
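If only a specific line range is needed, as in the original readLines method, the stream can also skip straight to it. A minimal sketch, assuming the same 1-based inclusive start/end convention as the question:

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public static List<String> readLines(File file, int start, int end) throws IOException {
    try (Stream<String> lines = Files.lines(file.toPath(), StandardCharsets.UTF_8)) {
        return lines.skip(start - 1)        // skip the lines before the range
                    .limit(end - start + 1) // stop reading once the range is done
                    .collect(Collectors.toList());
    }
}

Note that skip() still has to read (though not keep) every preceding line, so this remains linear in start, but it avoids the boxed Integer comparisons and the per-line "\n" concatenation of the original.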

Java heap profiling seems to freeze during large file ingestion

I am trying to profile a Java 7 application running on a Red Hat machine. When I run it as follows:
java -agentlib:hprof=cpu=samples,depth=10,monitor=y,thread=y ...
a particular block of code that creates a table-type object by reading a very large gzipped text file line by line completes in about 6 minutes. When I run it like this:
java -agentlib:hprof=heap=sites,depth=10,monitor=y,thread=y ...
the same block takes several orders of magnitude longer to complete (I am estimating something like 24 hours).
Here's the method (part of a class) that reads in the file:
private static void ingestValues()
{
    int mSize = 30000;
    pairScoresTable = new float[mSize][];
    for (int i = 0; i < pairScoresTable.length; i++) {
        pairScoresTable[i] = new float[mSize];
        Arrays.fill(pairScoresTable[i], fillVal);
    }
    try
    {
        BufferedReader bufferedReader =
            new BufferedReader(
                new InputStreamReader(
                    new GZIPInputStream(
                        new FileInputStream(rawPath)), "US-ASCII"));
        String line = null;
        while((line = bufferedReader.readLine()) != null) { // file has 388661141 lines
            Float value = 0.0F;
            Integer i = 0;
            Integer j = 0;
            // extract value, i and j by parsing line...
            pairScoresTable[i][j] = value;
            pairScoresTable[j][i] = value;
        }
        bufferedReader.close();
    }
    catch(Exception e)
    {
        return;
    }
    return;
}
So it basically reads a file where each line describes the position and value of one entry in the 2D matrix pairScoresTable.
Why is there such a large difference in execution time? Is there a way to do heap profiling of this code faster, without having to refactor it?
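One observation, not a confirmed diagnosis: with heap=sites, hprof records every object allocation, and the loop above boxes one Float and two Integers on each of its roughly 388 million iterations, giving the profiler on the order of a billion short-lived allocations to track. A sketch of an allocation-free loop body using primitives, with the parsing elided as in the original:

String line = null;
while ((line = bufferedReader.readLine()) != null) {
    // primitives instead of boxed Float/Integer: no per-iteration
    // allocations for the heap profiler to record
    float value = 0.0f;
    int i = 0;
    int j = 0;
    // extract value, i and j by parsing line...
    pairScoresTable[i][j] = value;
    pairScoresTable[j][i] = value;
}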

How to run tshark in Java to get packets in real-time?

I have a problem running tshark from Java: packets seem to arrive in bulk rather than truly in real time (as they do when tshark is run from a terminal).
I tried a few different approaches:
ArrayList<String> command = new ArrayList<String>();
command.add("C:\\Program Files\\Wireshark\\tshark.exe");
ProcessBuilder pb = new ProcessBuilder(command);
Process process = pb.start();
BufferedReader br = null;
try {
    // tried different numbers for BufferedReader's last parameter
    br = new BufferedReader(new InputStreamReader(process.getInputStream()), 1);
    String line = null;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
} catch...
I also tried using InputStream's available() method, as seen in "What does InputStream.available() do in Java?".
I also tried the NuProcess library, with the following code:
NuProcessBuilder pb = new NuProcessBuilder(command);
ProcessHandler processHandler = new ProcessHandler();
pb.setProcessListener(processHandler);
NuProcess process = pb.start();
try {
    process.waitFor(0, TimeUnit.SECONDS);
} catch (InterruptedException e) {
    e.printStackTrace();
}

private class ProcessHandler extends NuAbstractProcessHandler {
    private NuProcess nuProcess;

    @Override
    public void onStart(NuProcess nuProcess) {
        this.nuProcess = nuProcess;
    }

    @Override
    public void onStdout(ByteBuffer buffer) {
        if (buffer == null)
            return;
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        System.out.println(new String(bytes));
    }
}
None of these approaches works. Packets always arrive, as if buffered, only after about 50 have been sniffed.
Do you have any idea why this is happening and how to solve it? It's pretty frustrating; I've spent a lot of time looking at similar questions on SO, but none of them helped.
Do you see any errors in my code? Does it work in your case?
As the tshark man page says:
−l Flush the standard output after the information for each packet is
printed. (This is not, strictly speaking, line‐buffered if −V was
specified; however, it is the same as line‐buffered if −V wasn’t
specified, as only one line is printed for each packet, and, as −l
is normally used when piping a live capture to a program or script,
so that output for a packet shows up as soon as the packet is seen
and dissected, it should work just as well as true line‐buffering.
We do this as a workaround for a deficiency in the Microsoft Visual
C++ C library.)
This may be useful when piping the output of TShark to another
program, as it means that the program to which the output is piped
will see the dissected data for a packet as soon as TShark sees the
packet and generates that output, rather than seeing it only when
the standard output buffer containing that data fills up.
Try running tshark with the -l command-line argument.
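For instance, based on the questioner's own ProcessBuilder setup (only the added flag is new):

ArrayList<String> command = new ArrayList<String>();
command.add("C:\\Program Files\\Wireshark\\tshark.exe");
command.add("-l"); // flush stdout after each packet's output is printed
ProcessBuilder pb = new ProcessBuilder(command);
Process process = pb.start();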
I ran some tests to see how much buffering would be done by BufferedReader versus just using the input stream.
ProcessBuilder pb = new ProcessBuilder("ls", "-lR", "/");
System.out.println("pb.command() = " + pb.command());
Process p = pb.start();
byte ba[] = new byte[100];
InputStream is = p.getInputStream();
int bytecountsraw[] = new int[10000];
long timesraw[] = new long[10000];
long last_time = System.nanoTime();
for (int i = 0; i < timesraw.length; i++) {
int bytecount = is.read(ba);
long time = System.nanoTime();
timesraw[i] = time - last_time;
last_time = time;
bytecountsraw[i] = bytecount;
}
try (PrintWriter pw = new PrintWriter(new FileWriter("dataraw.csv"))) {
pw.println("bytecount,time");
for (int i = 0; i < timesraw.length; i++) {
pw.println(bytecountsraw[i] + "," + timesraw[i] * 1.0E-9);
}
} catch (Exception e) {
e.printStackTrace();
}
BufferedReader br = new BufferedReader(new InputStreamReader(is));
int bytecountsbuffered[] = new int[10000];
long timesbuffered[] = new long[10000];
last_time = System.nanoTime();
for (int i = 0; i < timesbuffered.length; i++) {
String str = br.readLine();
int bytecount = str.length();
long time = System.nanoTime();
timesbuffered[i] = time - last_time;
last_time = time;
bytecountsbuffered[i] = bytecount;
}
try (PrintWriter pw = new PrintWriter(new FileWriter("databuffered.csv"))) {
pw.println("bytecount,time");
for (int i = 0; i < timesbuffered.length; i++) {
pw.println(bytecountsbuffered[i] + "," + timesbuffered[i] * 1.0E-9);
}
} catch (Exception e) {
e.printStackTrace();
}
I tried to find a command that would just keep printing as fast as it could, so that any delays would be due to the buffering and/or ProcessBuilder rather than the command itself. Here is a plot of the result (you can plot the CSV files with Excel, although I used a NetBeans plugin called DebugPlot).
There wasn't a great deal of difference between the raw and the buffered reads. Both were bursty, with the majority of reads taking less than a microsecond, separated by peaks of 10 to 50 milliseconds. The scale of the plot is in nanoseconds, so the top value of 5E7 is 50 milliseconds, or 0.05 seconds. If you test and get similar results, perhaps that is the best ProcessBuilder can do. If you get dramatically worse results with tshark than with other commands, perhaps there is an option to tshark, or the packets themselves are coming in bursts.

Reading a large text file faster

I'm trying to read a large text file as fast as possible.
Lines not beginning with '!' are passed over.
Lines with 8 CSV values have their last value removed.
There will never be a ',' inside a value (so I didn't need to use opencsv).
Everything is added to a long string that is decoded later.
So this is my code:
BufferedReader br = new BufferedReader(new FileReader("C:\\Users\\Documents\\ais_messages1.3.txt"));
String line, aisLines="", cvsSplitBy = ",";
try {
while ((line = br.readLine()) != null) {
if(line.charAt(0) == '!') {
String[] cols = line.split(cvsSplitBy);
if(cols.length>=8) {
line = "";
for(int i=0; i<cols.length-1; i++) {
if(i == cols.length-2) {
line = line + cols[i];
} else {
line = line + cols[i] + ",";
}
}
aisLines += line + "\n";
} else {
aisLines += line + "\n";
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
So right now it reads 36,890 rows in 14 seconds. I also tried an InputStreamReader:
InputStreamReader isr = new InputStreamReader(new FileInputStream("C:\\Users\\Documents\\ais_messages1.3.txt"));
BufferedReader br = new BufferedReader(isr);
and it took the same amount of time. Is there a faster way to read a large text file (100,000 or 1,000,000 rows)?
Stop trying to build up aisLines as a big String. Use an ArrayList<String> and append the lines to it. On my machine that takes 0.6% of the time of your method (it processes 1,000,000 simple lines in 0.75 seconds), and it will reduce the effort needed to process the data later, since it will already be split up into lines.
BufferedReader br = new BufferedReader(new FileReader("data.txt"));
List<String> aisLines = new ArrayList<String>();
String line, cvsSplitBy = ",";
try {
while ((line = br.readLine()) != null) {
if(line.charAt(0) == '!') {
String[] cols = line.split(cvsSplitBy);
if(cols.length>=8) {
line = "";
for(int i=0; i<cols.length-1; i++) {
if(i == cols.length-2) {
line = line + cols[i];
} else {
line = line + cols[i] + ",";
}
}
aisLines.add(line);
} else {
aisLines.add(line);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
If you really want a big String at the end (because you're interfacing with someone else's code, or whatever), it will still be faster to convert the ArrayList back into a single String than to do what you were doing.
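For example, a one-line sketch of that conversion, assuming Java 8's String.join is available:

String all = String.join("\n", aisLines);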
As the most time-consuming operation is I/O, the most efficient way is to split reading and parsing across threads:
private static void readFast(String filePath) throws IOException, InterruptedException {
    ExecutorService executor = Executors.newWorkStealingPool();
    BufferedReader br = new BufferedReader(new FileReader(filePath));
    List<String> parsed = Collections.synchronizedList(new ArrayList<>());
    try {
        String line;
        while ((line = br.readLine()) != null) {
            final String l = line;
            executor.submit(() -> {
                if (l.charAt(0) == '!') {
                    parsed.add(parse(l));
                }
            });
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    executor.shutdown();
    executor.awaitTermination(1000, TimeUnit.MINUTES);
    String result = parsed.stream().collect(Collectors.joining("\n"));
}
On my PC this took 386 ms, versus 10,787 ms for the slow version.
You can use a single thread to read your large CSV file and multiple threads to parse the lines. The way I do this is with the Producer-Consumer pattern and a BlockingQueue.
Producer
Make one producer thread which is only responsible for reading the lines of your CSV file and storing them in the BlockingQueue. The producer side does not do anything else.
Consumers
Make multiple consumer threads, and pass the same BlockingQueue object into your consumers. Do the time-consuming work in your consumer thread class.
The following code gives you an idea of how to solve the problem; it is not the solution itself.
I implemented this in Python and it works much faster than having a single thread do everything. The language is not Java, but the theory behind it is the same (a Java sketch of the same pattern follows the Python code below).
import csv
import gzip
import multiprocessing
import sys
import Queue

QUEUE_SIZE = 2000

def produce(file_queue, row_queue):
    while not file_queue.empty():
        src_file = file_queue.get()
        zip_reader = gzip.open(src_file, 'rb')
        try:
            csv_reader = csv.reader(zip_reader, delimiter=SDP_DELIMITER)
            for row in csv_reader:
                new_row = process_sdp_row(row)
                if new_row:
                    row_queue.put(new_row)
        finally:
            zip_reader.close()

def consume(row_queue):
    '''processes all rows; once the queue is empty, break the infinite loop'''
    while True:
        try:
            # takes a row from the queue and processes it
            pass
        except multiprocessing.TimeoutError as toe:
            print "timeout, all rows have been processed, quit."
            break
        except Queue.Empty:
            print "all rows have been processed, quit."
            break
        except Exception as e:
            print "critical error"
            print e
            break

def main(args):
    file_queue = multiprocessing.Queue()
    row_queue = multiprocessing.Queue(QUEUE_SIZE)
    file_queue.put(file1)
    file_queue.put(file2)
    file_queue.put(file3)

    # starts 4 producers
    for i in xrange(4):
        producer = multiprocessing.Process(target=produce, args=(file_queue, row_queue))
        producer.start()

    # starts 1 consumer
    consumer = multiprocessing.Process(target=consume, args=(row_queue,))
    consumer.start()

    # blocks main thread until the consumer process finishes
    consumer.join()

    # prints statistics results after the consumer is done
    sys.exit(0)

if __name__ == "__main__":
    main(sys.argv[1:])
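Since the theory carries over to Java, here is a minimal Java sketch of the same single-producer / multiple-consumer idea using a BlockingQueue. The file name argument, the queue capacity, the consumer count, and the commented-out parse() call are all placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class CsvPipeline {
    // sentinel marking end of input; compared by reference below
    private static final String POISON_PILL = new String("EOF");

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(2000);
        int consumers = 4;

        // consumers: take lines off the queue and do the expensive parsing
        Thread[] workers = new Thread[consumers];
        for (int c = 0; c < consumers; c++) {
            workers[c] = new Thread(() -> {
                try {
                    String line;
                    while ((line = queue.take()) != POISON_PILL) {
                        // parse(line); // time-consuming work goes here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[c].start();
        }

        // single producer: only reads lines and puts them on the queue;
        // put() blocks when the queue is full, which gives natural backpressure
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            String line;
            while ((line = br.readLine()) != null) {
                queue.put(line);
            }
        }
        for (int c = 0; c < consumers; c++) {
            queue.put(POISON_PILL); // one pill per consumer so each one exits
        }
        for (Thread w : workers) {
            w.join();
        }
    }
}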
