Which would have the greatest processor cost? - java

I have a class, as such:
private class verifierListener implements SerialPortEventListener {
    String outBuffer;
    char charBuffer;

    public void serialEvent(SerialPortEvent event) {
        if (event.isRXCHAR()) { // if data is available
            timeOut = 1000;
            lastReadTimer = System.currentTimeMillis();
            if (event.getEventValue() > 0) { // check the byte count in the input buffer
                try {
                    byte[] buffer = verifierPort.readBytes(event.getEventValue());
                    //INSERT CODE HERE
                } catch (Exception ex) {
                    ex.printStackTrace();
                }
            }
        }
    }
}
There are two possible implementations for the INSERT CODE HERE area.
Case A:
outBuffer = new String(buffer);
bfrFile.print(outBuffer);
sysOutWriter.append(outBuffer);
Case B:
for (byte bt : buffer) {
    charBuffer = (char) bt;
    bfrFile.print(charBuffer);
    sysOutWriter.append(charBuffer);
}
Both compile, run, and do what they are supposed to do. But I'm trying to make this code perform as smoothly as possible, so I don't risk losing transmitted data on a lower-end PC.
I'm assuming that case A will have more overhead due to the String initialization, but I wanted to make certain before I remove it.
Can y'all tell which one would be cleaner, and/or how to determine the processing cost for each?

You shouldn't be losing any data, even on a lower-end PC. That's because there are (several) buffers between your code and the actual data coming over the serial port (operating system buffers, Java buffers, etc.). Speed-wise, unless you're running this code a lot (as in, several thousand times per second) you shouldn't notice a difference.
Assuming this is a standard serial connection running at 115,200 bits per second, you're getting at most 14,400 characters per second. Even if you read those one character at a time, you shouldn't see a big speed hit.

If you're on Windows, you can press Ctrl-Alt-Delete and watch the process while it's running to see how much memory it uses. As far as your code goes, may I suggest using a StringBuilder instead of String? I'm not sure there really is a difference in overhead, but from a good code / programming perspective it should be A.
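If you do keep Case A, one small tweak worth considering is a sketch like the following, which decodes with an explicit charset (assuming the device sends single-byte text) so the result doesn't depend on the platform default; bfrFile and sysOutWriter are the writers from the question's code:
outBuffer = new String(buffer, java.nio.charset.StandardCharsets.ISO_8859_1); // explicit charset, assumed encoding
bfrFile.print(outBuffer);
sysOutWriter.append(outBuffer);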

Related

How do I prevent Out of Memory errors with large Stream data?

I am dealing with input streams of unknown size that I need to serialize to a byte[] for fail-safe behavior.
I have this code right now based on IOUtils, but with 5-50 different threads possibly running it, I don't know how reliable it is.
try (final ByteArrayOutputStream output = new ByteArrayOutputStream()) {
    long free_memory = Runtime.getRuntime().freeMemory() / 5;
    final byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
    long count = 0;
    int n = 0;
    while (-1 != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
        free_memory -= n;
        if (free_memory < DEFAULT_BUFFER_SIZE) {
            free_memory = Runtime.getRuntime().freeMemory();
            if (free_memory < (DEFAULT_BUFFER_SIZE * 10)) {
                throw new IOException("JVM is low on Memory.");
            }
            free_memory = free_memory / 5;
        }
    }
    output.flush();
    return output.toByteArray();
}
I want to catch an OOM error before it is a problem and kills the thread, and I don't want to save the stream as a file. Is there a better way of making sure you don't use too much memory?
(I'm using Java 8)
To answer your question: given that multiple threads run the same code, this is a very unreliable approach.
The code asks the system how much memory is available with Runtime.getRuntime().freeMemory(), a value that is obsolete the instant it returns, because other threads will have consumed more memory in the meantime. The IOException that should be thrown when remaining memory drops below the somewhat arbitrary threshold may or may not actually be triggered, and whether it is doesn't really matter.
The data is captured in a ByteArrayOutputStream, which grows (and copies) its internal buffer each time it fills up. That growth is not controlled by the "how much memory is there" check, so again, multiple threads can be resizing their buffers at the same time, and any of them can fail.
The most fail-safe approach is to store the data on disk, i.e. make a copy. If the data comes from an external streaming source you can use Files.copy(). If what you get is a file, you can use the other variant of copy, which I think delegates to the OS.
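For illustration, a minimal sketch of the Files.copy() suggestion, spooling the stream to a temporary file instead of holding it in memory; the method name spoolToDisk and the temp-file prefix are made up for this example:
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

static Path spoolToDisk(InputStream input) throws IOException {
    // Copy the whole stream to a temp file; memory use stays constant
    // regardless of how large the stream is or how many threads do this.
    Path tmp = Files.createTempFile("stream-", ".bin");
    Files.copy(input, tmp, StandardCopyOption.REPLACE_EXISTING);
    return tmp;
}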

How many filereaders can concurrently read from the same file?

I have a massive 25GB CSV file. I know that there are ~500 Million records in the file.
I want to do some basic analysis with the data. Nothing too fancy.
I don't want to use Hadoop/Pig, not yet at least.
I have written a java program to do my analysis concurrently. Here is what I am doing.
class MainClass {
    public static void main(String[] args) throws Exception {
        long start = 1;
        long increment = 10000000;
        OpenFileAndDoStuff[] a = new OpenFileAndDoStuff[50];
        for (int i = 0; i < 50; i++) {
            a[i] = new OpenFileAndDoStuff("path/to/50GB/file.csv", start, start + increment - 1);
            a[i].start();
            start += increment;
        }
        for (OpenFileAndDoStuff obj : a) {
            obj.join();
        }
        //do aggregation
    }
}

class OpenFileAndDoStuff extends Thread {
    volatile HashMap<Integer, Integer> stuff = new HashMap<>();
    BufferedReader _br;
    long _end;

    OpenFileAndDoStuff(String filename, long startline, long endline) throws IOException, FileNotFoundException {
        _br = new BufferedReader(new FileReader(filename));
        long counter = 0;
        //move the BufferedReader pointer to the startline specified
        while (counter++ < startline)
            _br.readLine();
        this._end = endline;
    }

    void doStuff() {
        //read from the buffered reader until end of file or until the specified endline is reached, and do stuff
    }

    public void run() {
        doStuff();
    }

    public HashMap<Integer, Integer> getStuff() {
        return stuff;
    }
}
I thought that by doing this I could open 50 BufferedReaders, each reading a 10-million-line chunk in parallel, and once all of them were done doing their stuff I'd aggregate the results.
But the problem I face is that even though I ask for 50 threads to start, only two actually read from the file at a time.
Is there a way I can make all 50 of them open the file and read from it at the same time? Why am I limited to only two readers at a time?
The file is on a Windows 8 machine, and Java runs on the same machine.
Any ideas?
Here is a similar post: Concurrent reading of a File (java preferred)
The most important question here is what is the bottleneck in your case?
If the bottleneck is your disk IO, then there isn't much you can do at the software part. Parallelizing the computation will only make things worse, because reading the file from different parts simultaneously will degrade disk performance.
If the bottleneck is processing power, and you have multiple CPU cores, then you can take advantage of starting multiple threads to work on different parts of the file. You can safely create several InputStreams or Readers to read different parts of the file in parallel (as long as you don't go over your operating system's limit on the number of open files). You could separate the work into tasks and run them in parallel.
See the referenced post for an example that reads a single file in parallel with FileInputStream, which should be significantly faster than using BufferedReader according to these benchmarks: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly#FileReaderandBufferedReader
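A rough sketch of that task-splitting idea, assuming the byte offsets have already been aligned to line boundaries (which real code would still need to handle); readInParallel and the chunking scheme are made up for this example:
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

static List<Future<Long>> readInParallel(String path, long fileSize, int tasks) {
    ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    long chunk = fileSize / tasks;
    List<Future<Long>> results = new ArrayList<>();
    for (int i = 0; i < tasks; i++) {
        final long start = i * chunk;
        final long end = (i == tasks - 1) ? fileSize : start + chunk;
        results.add(pool.submit((Callable<Long>) () -> {
            // Each task opens its own stream and jumps to its own byte range.
            try (FileInputStream in = new FileInputStream(path)) {
                in.getChannel().position(start);
                long processed = 0;
                byte[] buf = new byte[64 * 1024];
                while (processed < end - start) {
                    int n = in.read(buf, 0, (int) Math.min(buf.length, end - start - processed));
                    if (n == -1) break;
                    // ... parse the bytes in buf[0..n) here ...
                    processed += n;
                }
                return processed;
            }
        }));
    }
    pool.shutdown();
    return results;
}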
One issue I see is that when a thread is asked to read, for example, lines 80,000,000 through 90,000,000, you still read in (and ignore) the first 80,000,000 lines.
Maybe try java.io.RandomAccessFile.
In order to do this, all of the lines need to be the same number of bytes. If you cannot adjust the structure of your file, then this isn't an option. But if you can, it should allow for greater concurrency.
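A minimal sketch of the RandomAccessFile idea, assuming fixed-length records; RECORD_LENGTH and readRange are illustrative names, not something from the question:
import java.io.IOException;
import java.io.RandomAccessFile;

static final int RECORD_LENGTH = 64; // assumed fixed size of each line, including the line terminator

static void readRange(String path, long firstLine, long lineCount) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
        // Seek straight to the first line of this thread's range;
        // no need to read and discard the lines before it.
        raf.seek(firstLine * RECORD_LENGTH);
        byte[] record = new byte[RECORD_LENGTH];
        for (long i = 0; i < lineCount; i++) {
            raf.readFully(record);
            // ... parse the record and do stuff ...
        }
    }
}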

Java NIO MappedByteBuffer OutOfMemoryException

I am really in trouble: I want to read HUGE files over several GB using FileChannels and MappedByteBuffers - all the documentation I found implies it's rather simple to map a file using the FileChannel.map() method.
Of course there is a limit at 2GB as all the Buffer methods use int for position, limit and capacity - but what about the system implied limits below that?
In reality, I get lots of problems regarding OutOfMemoryExceptions! And no documentation at all that really defines the limits!
So - how can I map a file that fits into the int-limit safely into one or several MappedByteBuffers without just getting exceptions?
Can I ask the system which portion of a file I can safely map before I try FileChannel.map()? How?
Why is there so little documentation about this feature??
I can offer some working code. Whether this solves your problem or not is difficult to say. This hunts through a file for a pattern recognised by the Hunter.
See the excellent article Java tip: How to read files quickly for the original research (not mine).
// 4k buffer size.
static final int SIZE = 4 * 1024;
static byte[] buffer = new byte[SIZE];

// Fastest because a FileInputStream has an associated channel.
private static void ScanDataFile(Hunter p, FileInputStream f) throws FileNotFoundException, IOException {
    // Use a mapped and buffered stream for best speed.
    // See: http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly
    FileChannel ch = f.getChannel();
    long red = 0L;
    do {
        long read = Math.min(Integer.MAX_VALUE, ch.size() - red);
        MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, red, read);
        int nGet;
        while (mb.hasRemaining() && p.ok()) {
            nGet = Math.min(mb.remaining(), SIZE);
            mb.get(buffer, 0, nGet);
            for (int i = 0; i < nGet && p.ok(); i++) {
                p.check(buffer[i]);
            }
        }
        red += read;
    } while (red < ch.size() && p.ok());
    // Finish off.
    p.close();
    ch.close();
    f.close();
}
What I use is a List<ByteBuffer>, where each ByteBuffer maps the file in blocks of 16 MB to 1 GB. I use powers of 2 to simplify the logic. I have used this to map files up to 8 TB.
A key limitation of memory mapped files is that you are limited by your virtual memory. If you have a 32-bit JVM you won't be able to map in very much.
I wouldn't keep creating new memory mappings for a file, because these are never cleaned up. You can create lots of them, but there appears to be a limit of about 32K of them on some systems (no matter how small they are).
The main reason I find memory-mapped files useful is that they don't need to be flushed (if you can assume the OS won't die). This lets you write data in a low-latency way, without worrying about losing too much data if the application dies, or losing too much performance to write() or flush() calls.
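A minimal sketch of that list-of-mappings approach, assuming read-only access and a power-of-two chunk size (256 MB here, purely for illustration):
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

static final long CHUNK_SIZE = 1L << 28; // 256 MB, a power of two

static List<MappedByteBuffer> mapFile(Path path) throws IOException {
    List<MappedByteBuffer> chunks = new ArrayList<>();
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
        long size = ch.size();
        for (long pos = 0; pos < size; pos += CHUNK_SIZE) {
            long len = Math.min(CHUNK_SIZE, size - pos);
            // Each mapping stays below the int limit; the mappings remain
            // valid after the channel is closed.
            chunks.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
        }
    }
    return chunks;
}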
You don't use the FileChannel API to write the entire file at once. Instead, you send the file in parts. See example code in Martin Thompson's post comparing performance of Java IO techniques: Java Sequential IO Performance
In addition, there is not much documentation because you are making a platform-dependent call. From the map() JavaDoc:
Many of the details of memory-mapped files are inherently dependent
upon the underlying operating system and are therefore unspecified.
The bigger the file, the less you want it all in memory at once. Devise a way to process the file a buffer at a time, a line at a time, etc.
MappedByteBuffers are especially problematic, as there is no defined release of the mapped memory, so using more than one at a time is essentially bound to fail.
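As a sketch of that buffer-at-a-time approach, assuming a simple forward pass over the bytes is all you need; processByBuffer is an illustrative name:
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

static void processByBuffer(Path path) throws IOException {
    // One reused buffer, so memory use stays constant no matter how big the file is.
    ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024);
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
        while (ch.read(buf) != -1) {
            buf.flip();
            while (buf.hasRemaining()) {
                byte b = buf.get();
                // ... process b ...
            }
            buf.clear();
        }
    }
}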

Need a better method of reading and storing values from text file

Goal: get values from a text file and store them so I can load them into my SQLite database.
Problem: my method is not efficient, and I need help coming up with an easier way.
Right now I am parsing a text file that looks like this.
agency_id,agency_name,agency_url,agency_timezone,agency_lang,agency_phone
1,"NJ TRANSIT BUS","http://www.njtransit.com/",America/New_York,en,""
2,"NJ TRANSIT RAIL","http://www.njtransit.com/",America/New_York,en,""
I parse every time I read a comma, store that value in a variable, and then use that variable as my database value.
This method works but is time consuming. The next text file I have to read has over 200 lines, and I need to find an easier way.
AgencyString = readText();
tv = (TextView) findViewById(R.id.letter);
tv.setText(readText());
StringTokenizer st = new StringTokenizer(AgencyString, ",");
for (int i = 0; i < AgencyArray.length; i++) {
    size = i; // which value I am targeting in the text file
              // e.g. 1 would be agency_id, 2 would be agency_name
    AgencyArray[i] = st.nextToken();
}
tv.setText(AgencyArray[size]); // the value I'm going to store as the database value
}
private String readText() {
    InputStream inputStream = getResources().openRawResource(R.raw.agency);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    int i;
    try {
        i = inputStream.read();
        while (i != -1) {
            byteArrayOutputStream.write(i);
            i = inputStream.read();
        }
        inputStream.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return byteArrayOutputStream.toString();
}
First, why is this a problem? I don't mean to answer your question with a question, but more context is needed to understand in what way you need to improve the efficiency of what you're doing. Is there a perceived delay in the application due to the parsing of the file, or do you have a more serious ANR problem because you're running on the UI thread?
Unless there is some bottleneck in other code not shown, I honestly doubt you'd read and tokenise it much faster than you're presently doing. It's more a case of designing your application so that the delays involved in fetching and parsing large data aren't perceived by, or don't irritate, the user. My own application parses massive files like this; it does take a fraction of a second, but it doesn't present a problem because of the design of the overall application and UI.
Also, have you used the profiler to see what's taking the time? And have you run this on a real device, without the debugger attached? Having the debugger attached to a real device, or using the emulator, increases execution time by orders of magnitude.
I am making the assumption that you need to parse this file type after receiving it over a network, as opposed to being something that is bundled with the application and only needs parsing once.
You could just bundle the SQLite database with your application instead of representing it in a text file. Look at the answer to this question
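If the file does stay bundled as a raw resource, here is a minimal sketch of line-by-line parsing with BufferedReader and String.split, reusing R.raw.agency from the question; it assumes no quoted field contains a comma, which holds for the sample data shown but not for CSV in general:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

private void loadAgencies() throws IOException {
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(getResources().openRawResource(R.raw.agency)))) {
        br.readLine(); // skip the header row
        String line;
        while ((line = br.readLine()) != null) {
            String[] fields = line.split(",");
            // fields[0] = agency_id, fields[1] = agency_name, ...
            // insert fields into the SQLite database here
        }
    }
}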

An infinite loop somewhere in my code

I have this Java game server that handles up to 3,000 TCP connections; each player, i.e. each TCP connection, has its own thread, and each thread goes something like this:
public void run()
{
    try
    {
        String packet = "";
        char charCur[] = new char[1];
        while (_in.read(charCur, 0, 1) != -1 && MainServer.isRunning)
        {
            if (charCur[0] != '\u0000' && charCur[0] != '\n' && charCur[0] != '\r')
            {
                packet += charCur[0];
            } else if (!packet.isEmpty())
            {
                parsePlayerPacket(packet);
                packet = "";
            }
        }
        kickPlayer();
    } catch (IOException e)
    {
        kickPlayer();
    } catch (Exception e)
    {
        kickPlayer();
    }
    finally
    {
        try {
            kickPlayer();
        } catch (Exception e) {}
        MainServer.removeIP(ip);
    }
}
The code runs fine, and I know that one thread per player is a bad idea, but I'll have to keep it this way for now. The server runs fine on a fast machine (2 x 6-core, 64-bit, 24 GB RAM, Windows Server 2003).
But at some point, after about 12 hours of uptime, the server starts to loop somewhere... I know that because the Java process takes 99% of the CPU indefinitely until the next reboot.
And I'm having a hard time profiling the application because I don't want to disturb the players. The profiler I use (VisualVM) always ends up crashing the server without telling me where the problem is.
Anyways, in that piece of code above I think maybe the problem comes from this:
while(_in.read(charCur, 0, 1)!=-1)
(the _in is a BufferedReader of the client's socket).
Is it possible that _in.read() can infinitely return something else that keeps my code running and taking 99% of resources? Is there something wrong with my code? I don't understand everything; I only wrote half of it.
Reading one char at a time is almost as slow as building a String with +=. I wouldn't be able to tell you which is worse. It wouldn't surprise me if a single connection tied up an entire core with this approach.
The simplest "fix" to do would be to use a BufferedReader and a StringBuilder.
However, the most efficient way to read the data is to read bytes into a ByteBuffer and parse the "lines" yourself. I assume you are receiving ASCII text. You could write the parser to process the content and detect the end of line in one stage (i.e. with one pass over the data).
Using the last approach, here is an example (including code) where I parse an XML message from a socket and reply in XML. The typical latency was 16 microseconds and the throughput was 264K per second.
http://vanillajava.blogspot.com/2011/07/send-xml-over-socket-fast.html
You can do something like the following, which is likely to be fast enough:
BufferedReader br = new BufferedReader(_in);
for (String line; (line = br.readLine()) != null; ) {
    if (line.indexOf('\0') >= 0) {
        for (String part : line.split("\0"))
            parsePlayerPacket(part);
    } else {
        parsePlayerPacket(line);
    }
}
If you find this solution dead simple and you are familiar with ByteBuffer you might consider using those.
I had a similar problem in one of my applications; it took 50% CPU (on a dual core).
What I did to resolve the problem was to let the thread sleep for one tick:
Thread.sleep(1);
I hope this is helpful for you.
Edit:
Oh, and what is this for?
} catch (IOException e)
{
    kickPlayer();
} catch (Exception e)
{
    kickPlayer();
}
I don't think you need the IOException catch (the Exception catch already catches every kind of exception).
That exception handling just hurt my eyes. There's no point in calling kickPlayer() inside the catch blocks, since you call it again in finally, and finally executes (almost) always.
Now, about your problem: forget my previous answer, I was a bit asleep. I don't see anything in the posted while loop that's prone to looping forever. InputStream.read() either returns -1 when there's no more data or throws an exception. The problem must be in other code, or maybe it's a threading problem.
As others have told you, try to use buffered streams, reading a block of characters instead of only one at a time, and replace the string concatenation with StringBuilder's append method. This should improve performance, but I'm not sure it will solve the problem (maybe it will show up after 24 hours instead of 12).
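A minimal sketch of that last suggestion, reusing _in, MainServer.isRunning and parsePlayerPacket from the question: read a block of characters at a time and accumulate the current packet in a StringBuilder instead of String +=:
// Read in blocks and build each packet with a StringBuilder.
char[] buf = new char[4096];
StringBuilder packet = new StringBuilder();
int n;
while ((n = _in.read(buf, 0, buf.length)) != -1 && MainServer.isRunning) {
    for (int i = 0; i < n; i++) {
        char c = buf[i];
        if (c != '\u0000' && c != '\n' && c != '\r') {
            packet.append(c);
        } else if (packet.length() > 0) {
            parsePlayerPacket(packet.toString());
            packet.setLength(0); // reuse the builder for the next packet
        }
    }
}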

Categories

Resources