We are working in a Tomcat/J2EE application.
In this application we store a lot of data in the session, and I'm wondering how much data we can store without problems.
What is the limiting factor? Tomcat's memory? The JVM's?
How can I calculate whether I can store 200k strings?
For #1 - You can store as much data as the heap size allocated to the JVM allows. Of course, Tomcat runs inside the JVM, so it will also use some part of the allocated memory.
For #2 - It really depends on the size of the strings - 2 bytes are required per UTF-16 character. Take the average length of your strings, multiply it by 200k, and then make sure you have enough memory allocated.
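As a rough sketch of that calculation (the ~40-byte per-String overhead and the 50-character average length are assumptions; plug in your own numbers):

long count = 200_000L;
long avgChars = 50;                      // assumed average string length
long perString = 40 + 2 * avgChars;      // assumed per-String overhead + 2 bytes per UTF-16 char
long totalBytes = count * perString;
System.out.println("Approx. " + (totalBytes / (1024 * 1024)) + " MB for the strings alone");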
Tomcat runs in a JVM. So if you have a 32-bit JRE, you can have a maximum heap size of about 1.7 GB. If you want more, you should switch to a 64-bit JRE.
About string allocation: the internal Java character encoding is UTF-16, so each character takes two bytes. In order to save space, you may compress those strings with a zip/deflate stream and store them as if they were files.
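If you go the compression route, a minimal sketch using java.util.zip (the compress helper name is just illustrative):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Compress a String to a byte[] before putting it into the session
static byte[] compress(String s) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
        gzip.write(s.getBytes(StandardCharsets.UTF_8));
    }
    return bos.toByteArray();
}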
Sorry if this has been asked before (though I can't really find a solution).
I'm not really too good at programming, but anyway, I am crawling a bunch of websites and storing information about them on a server. I need a Java program to process vector coordinates associated with each of the documents (about a billion or so documents, with a grand total of roughly 500,000 numbers associated with each document). I need to calculate the singular value decomposition of that whole matrix.
Now Java, to my knowledge, can't handle a matrix that big. If I try creating even a relatively small array (about 44 million entries), I get a heap error. I use Eclipse, and so I tried changing the -Xmx value to 1024m (it won't go any higher for some reason, even though I have a computer with 8 GB of RAM).
What solution is there to this? Another way of retrieving the data I need? Calculating the SVD in a different way? Using a different programming language to do this?
EDIT: Just for right now, pretend there are a billion entries with 3 words associated with each. I am setting the Xmx and Xms correctly (from run configurations in Eclipse -> this is equivalent to running java -XmsXXXX -XmxXXXX ...... at the command prompt).
The Java heap space can be set with the -Xmx (note the initial capital X) option, and it can certainly reach far more than 1 GB, provided you are using a 64-bit JVM and the corresponding physical memory is available. You should try something along the lines of:
java -Xmx6144m ...
That said, you need to reconsider your design. There is a significant space cost associated with each object, with a typical minimum somewhere around 12 to 16 bytes per object, depending on your JVM. For example, a String has an overhead of about 36-40 bytes...
Even with a single object per document with no book-keeping overhead (impossible!), you just do not have the memory for 1 billion (1,000,000,000) documents. Even for a single int per document you need about 4 GB.
You should redesign your application to make use of any sparseness in the matrix, and to fall back on disk-based storage where possible. Having everything in memory is nice, but not always possible...
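For example, a sparse layout only stores the non-zero entries; a minimal sketch (the class and field names are made up for illustration):

import java.util.HashMap;
import java.util.Map;

// One map per row, keyed by column index; zero entries take no space at all
class SparseRow {
    private final Map<Integer, Double> values = new HashMap<>();

    void set(int column, double value) {
        if (value != 0.0) values.put(column, value);
        else values.remove(column);
    }

    double get(int column) {
        return values.getOrDefault(column, 0.0);
    }
}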
Are you using a 32-bit JVM? These cannot have more than about 2 GB of heap; I never managed to allocate more than 1.5 GB. Instead, use a 64-bit JVM, as these can allocate much more heap.
Or you could apply some math to it and use a divide-and-conquer strategy, that is, split the problem into smaller problems that combine to the same result.
I don't know much about SVD, but maybe this page can be helpful:
http://www.netlib.org/lapack/lug/node32.html
-Xms and -Xmx are different. The one containing s is the starting heap size and the one with x is the maximum heap size.
so
java -Xms512m -Xmx1024m
would give you 512 MB to start with and a 1024 MB maximum.
As other people have said, though, you may need to break your problem down to get this to work. Are you using 32-bit or 64-bit Java?
For data of that size, you should not plan to store it all in memory. The most common scheme to externalize this kind of data is to store it all in a database and structure your program around database queries.
Just for right now, pretend there are a billion entries with 3 words associated with each.
If you have one billion entries, you need one billion times the size of each entry. If you mean 3 ints per entry, that's 12 GB at least just for the data. If you meant the words as Strings, you could enumerate the words, as there are only about 100K distinct words in English, and it would take about the same amount of space.
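Enumerating the words could look something like this (a sketch; the class name is arbitrary):

import java.util.HashMap;
import java.util.Map;

// Assign each distinct word a small int id so documents can store ints instead of Strings
class WordDictionary {
    private final Map<String, Integer> ids = new HashMap<>();

    int idFor(String word) {
        return ids.computeIfAbsent(word, w -> ids.size());
    }
}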
Given that 16 GB costs a few hundred dollars, I would suggest buying more memory.
I'm writing a Web Application with a Java back-end running on a Tomcat server and a JavaScript client.
In the backend I have to process a large int[][][] array, which holds the information of a CT-Scan. Size is approx. 1024x1024x200.
I want to load this array into memory only when it's needed to process new data like image slices, and store it in some kind of database for the remaining time.
Things I tried so far:
Using JDBM3 to store a HashMap<String, int[][][]>; this runs into an out-of-memory error.
Serializing the object and saving it into a PostgreSQL DB using the bytea[] data type; it stores correctly, but I get a memory error while loading it again.
So my first question is: how can I save such a big array (which DB, which method)? It should load fast, and there should be some kind of multi-user access safety, because multiple users will be able to use the front-end and therefore load the int[][][] into the back-end. The database should have a non-commercial license, e.g. GPL, MIT, Apache...
Second question: I know I could save the array serialized in the file system and keep the link in the DB, but is that access safe for multiple users?
If you have enough RAM on the client machines, you could start by simply increasing the size of the JVM heap. This way, you should be able to create larger arrays without running into 'out of memory' errors.
You'll need at least approximately 800 MB to play with (1024 x 1024 x 200 x 32 bits) for the array alone.
I think a memory-mapped file was born to handle exactly this kind of thing. It offers you an array-like view of a file on disk, with random access for reads and writes. All you have to do is develop a scheme that lays out an int[][][] over a byte[], and that shouldn't be a problem. If you do it this way, you never have to hold the whole array in memory; you create only the slices that you are actually using. Even if you need to iterate over all the slices, you can instantiate only a single slice at a time.
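A minimal sketch of that idea, assuming a file of raw ints laid out slice by slice (the file name and the dimensions are placeholders):

import java.io.RandomAccessFile;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;

// Map the whole 1024 x 1024 x 200 volume and address it as [z][y][x] without loading it into the heap
public class MappedVolume {
    public static void main(String[] args) throws Exception {
        int xDim = 1024, yDim = 1024, zDim = 200;
        long bytes = (long) xDim * yDim * zDim * Integer.BYTES;   // ~800 MB, still below the 2 GB mapping limit
        try (RandomAccessFile raf = new RandomAccessFile("scan.dat", "rw");
             FileChannel channel = raf.getChannel()) {
            IntBuffer volume = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes).asIntBuffer();
            int z = 100, y = 512, x = 512;
            volume.put((z * yDim + y) * xDim + x, 42);            // random-access write of one voxel
            int value = volume.get((z * yDim + y) * xDim + x);    // and read it back
            System.out.println(value);
        }
    }
}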
If it's a CT scan, are the pixels 256-level grayscale? If so, you can save a significant amount of memory by storing the data as a byte array rather than an int array. If it is 64K-level grayscale, use a short rather than an int.
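For example, a sketch of packing one slice (assuming the values really fit into 0-255; the helper name is illustrative):

// Pack a slice of 0-255 grayscale values into a byte[], one quarter of the int[] footprint
static byte[] packSlice(int[][] slice) {
    int height = slice.length, width = slice[0].length;
    byte[] packed = new byte[width * height];
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            packed[y * width + x] = (byte) (slice[y][x] & 0xFF);
    return packed;
}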
On this Oracle page, Java HotSpot VM Options, -XX:+UseCompressedStrings is listed as available and on by default. However, in Java 6 update 29 it is off by default, and in Java 7 update 2 it reports a warning:
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseCompressedStrings; support was removed in 7.0
Does anyone know the thinking behind removing this option?
For the example in sorting lines of an enormous file.txt in java, with -mx2g this test took 4.541 seconds with the option on and 5.206 seconds with it off in Java 6 update 29, so it is hard to see that it impacts performance.
Note: Java 7 update 2 requires 2.0 GB, whereas Java 6 update 29 requires 1.8 GB without compressed strings and only 1.0 GB with compressed strings.
Originally, this option was added to improve SPECjBB performance. The gains are due to reduced memory bandwidth requirements between the processor and DRAM. Loading and storing bytes in the byte[] consumes 1/2 the bandwidth versus chars in the char[].
However, this comes at a price. The code has to determine if the internal array is a byte[] or char[]. This takes CPU time and if the workload is not memory bandwidth constrained, it can cause a performance regression. There is also a code maintenance price due to the added complexity.
Because there weren't enough production-like workloads that showed significant gains (except perhaps SPECjBB), the option was removed.
There is another angle to this. The option reduces heap usage. For applicable Strings, it reduces the memory usage of those Strings by 1/2. This angle wasn't considered at the time of option removal. For workloads that are memory capacity constrained (i.e. have to run with limited heap space and GC takes a lot of time), this option can prove useful.
If enough memory capacity constrained production-like workloads can be found to justify the option's inclusion, then maybe the option will be brought back.
Edit 3/20/2013: An average server heap dump uses 25% of the space on Strings. Most Strings are compressible. If the option is reintroduced, it could save half of this space (e.g. ~12%)!
Edit 3/10/2016: A feature similar to compressed strings is coming back in JDK 9 JEP 254.
Just to add, for those interested...
The java.lang.CharSequence interface (which java.lang.String implements) allows more compact representations of text than UTF-16.
Apps that manipulate a lot of strings should probably be written to accept CharSequence, so that they work with java.lang.String or with more compact representations.
8-bit (UTF-8), or even 5-, 6-, or 7-bit encoded, or even compressed strings can be represented as a CharSequence.
CharSequences can also be a lot more efficient to manipulate; subsequences can be defined as views (pointers) onto the original content, for example, instead of copies.
For example, in concurrent-trees, a suffix tree of ten of Shakespeare's plays requires 2 GB of RAM using CharSequence-based nodes, but would require 249 GB of RAM using char[]- or String-based nodes.
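As a hedged illustration of what such a compact representation can look like (not taken from any particular library), here is a Latin-1 backed CharSequence that stores one byte per character and creates subsequences as views:

// One byte per character instead of String's two; subSequence is a view, not a copy
final class Latin1Sequence implements CharSequence {
    private final byte[] bytes;
    private final int offset, length;

    Latin1Sequence(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    public int length() { return length; }

    public char charAt(int index) { return (char) (bytes[offset + index] & 0xFF); }

    public CharSequence subSequence(int start, int end) {
        return new Latin1Sequence(bytes, offset + start, end - start);
    }

    @Override public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) sb.append(charAt(i));
        return sb.toString();
    }
}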
Since there were upvotes, I figure I wasn't missing something obvious, so I have logged it as a bug (at the very least an omission in the documentation):
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7129417
(Should be visible in a couple of days)
Java 9 executes the sorting lines of an enormous file.txt in java example twice as fast on my machine as Java 6, and it only needs 1 GB of memory, as it has -XX:+CompactStrings enabled by default. Also, in Java 6 the compressed strings only worked for 7-bit ASCII characters, whereas in Java 9 compact strings support Latin-1 (ISO-8859-1). Some operations like charAt(idx) might be slightly slower, though. With the new design, other encodings could also be supported in the future.
I wrote a newsletter about this on The Java Specialists' Newsletter.
In OpenJDK 7 (1.7.0_147-icedtea, Ubuntu 11.10), the JVM simply fails with an
Unrecognized VM option 'UseCompressedStrings'
when JAVA_OPTS (or command line) contains -XX:+UseCompressedStrings.
It seems Oracle really removed the option.
I want to know the maximum file size that can be read by Java code.
I want to handle files of around 100 MB. Is this possible?
If so, what initial JVM settings do I need?
Please also recommend some best practices for handling files, e.g. using ObjectInputStream, FilterInputStream, etc., or using a byte array to store the file contents.
What's the biggest number you can write? That's the maximum size.
The total size of the file is irrelevant if you read it in chunks; there's no rule in the world that says you have to read your 100-megabyte file in one go - you can read it in, say, 10-megabyte blocks instead. What really matters is how you use the incoming data and whether you need to store the product of the raw data entirely (for example, if the data is a 3D model of a building, how do you need to represent it internally?) or only the relevant parts (such as finding the first ten matches for some clause in a huge text file).
Since there are a lot of possible ways to handle the data, there's no blanket answer to your question.
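A minimal sketch of the chunked approach (the 10 MB block size and the processing step are placeholders):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Read a file of any size in fixed-size blocks; only one block is ever held in memory
static void processInChunks(String path) throws IOException {
    byte[] block = new byte[10 * 1024 * 1024];   // 10 MB at a time
    try (InputStream in = new FileInputStream(path)) {
        int read;
        while ((read = in.read(block)) != -1) {
            // process block[0 .. read) here
        }
    }
}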
The only maximum I know of is the maximum reported by File.length(), which returns a long. That maximum is 2^63 - 1, or very, very large.
Java will not hold the entire file in memory at one time. If you want to hold part of the file in memory, you should use one of the "Buffered" classes (the name of the class starts with Buffered). These classes buffer part of the file for you, based on the buffer size you set.
The exact classes you should use depend on the data in the file. If you are more specific, we might be able to help you figure out which classes to use.
(One humble note: seriously, 100 MB? That's pretty small.)
Theoretically there is no maximum file size that can be read, but a single in-memory buffer is limited, because you can't create a char[] bigger than Integer.MAX_VALUE elements:
// new char[Integer.MAX_VALUE] is the theoretical maximum char buffer,
// but allocating it will usually throw OutOfMemoryError, so use a realistic size:
char[] buffer = new char[8192];
BufferedReader b = new BufferedReader(new FileReader(new File("filename")));
int charsRead = b.read(buffer);
There is no specific maximum file size supported by Java; it all depends on the OS you're running on. 100 megabytes wouldn't be too much of a problem, even on a 32-bit OS.
You didn't say whether you wanted to read the entire file into memory at once. You may find that you only need to process the file a part at a time. For example, a text file might be processed a line at a time, so there would be no need to load the whole file. Just read a line at a time and process each one.
If you want to read the whole file into one block of memory, then you may need to change the default heap size allocated for your JVM. Many JVMs have a default of 128 MB, which probably isn't enough to load your entire file and still have enough room to do other useful things. Check the documentation for your JVM to find out how to increase the heap size allocation.
As long as you have more than 100 MB free you should be able to load the entire file into memory at once, though you probably won't need to.
BTW, in terms of what the letters mean:
M = mega, i.e. 1 million for disk or 1024^2 for memory.
B = bytes (8 bits).
b = bit, e.g. 100 Mb/s.
m = milli, e.g. ms = milliseconds.
100 milli-bits only makes sense for compressed data, but I assume that is not what you are talking about.
Right now I need to load huge amounts of data from a database into a Vector, but when I have loaded 38,000 rows of data, the program throws an OutOfMemoryError exception.
What can I do to handle this?
I think there may be a memory leak in my program; what are good methods to detect it? Thanks.
Provide more memory to your JVM (usually using -Xmx/-Xms) or don't load all the data into memory.
For many operations on huge amounts of data there are algorithms which don't need access to all of it at once. One class of such algorithms is divide-and-conquer algorithms.
If you must have all the data in memory, try caching commonly appearing objects. For example, if you are looking at employee records and they all have a job title, use a HashMap when loading the data and reuse the job titles already found. This can dramatically lower the amount of memory you're using.
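A sketch of that caching idea (the names are illustrative):

import java.util.HashMap;
import java.util.Map;

// Keep one shared String per distinct job title instead of one copy per employee record
class TitleCache {
    private final Map<String, String> cache = new HashMap<>();

    String canonical(String title) {
        return cache.computeIfAbsent(title, t -> t);
    }
}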
Also, before you do anything, use a profiler to see where memory is being wasted, and to check that things which could be garbage collected have no references floating around. Again, String is a common example: if you're using only the first 10 chars of a 2000-char string and you used substring instead of allocating a new String, what you actually have is a reference to the full char[2000] array, with two indices pointing at 0 and 10. Again, a huge memory waster.
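On JVMs where substring shares the parent's char[] (this changed in Java 7 update 6), the usual workaround is a defensive copy; a tiny sketch (buildHugeString is hypothetical):

String huge = buildHugeString();                           // hypothetical 2000-char string
String leakyPrefix = huge.substring(0, 10);                // on old JVMs this still pins the full char[2000]
String compactPrefix = new String(huge.substring(0, 10));  // forces a copy so the big array can be collected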
You can try increasing the heap size:
java -Xms<initial heap size> -Xmx<maximum heap size>
Default is
java -Xms32m -Xmx128m
Do you really need to have such a large object stored in memory?
Depending on what you have to do with that data, you might want to split it into smaller chunks.
Load the data section by section. This will not let you work on all data at the same time, but you won't have to change the memory provided to the JVM.
You could run your code using a profiler to understand how and why the memory is being eaten up. Debug your way through the loop and watch what is being instantiated. There are any number of profilers: JProfiler, Java Memory Profiler, the ones in the list of profilers here, and so forth.
Maybe optimize your data classes? I've seen a case where someone used Strings in place of primitive data types such as int or double for every class member, and it gave an OutOfMemoryError when storing a relatively small number of data objects in memory. Check that you aren't duplicating your objects. And, of course, increase the heap size:
java -Xmx512M (or whatever you deem necessary)
Let your program use more memory or, much better, rethink the strategy. Do you really need so much data in memory?
I know you are trying to read the data into a vector; otherwise, if you were trying to display it, I would have suggested NatTable. It is designed for reading huge amounts of data into a table.
I believe it might come in handy for another reader here.
Use a memory-mapped file. Memory-mapped files can basically grow as big as you want without hitting the heap. They do require that you encode your data in a decoding-friendly way. (For example, it would make sense to reserve a fixed size for every row in your data, in order to quickly skip a number of rows.)
Preon allows you to deal with that easily. It's a framework that aims to do for binary-encoded data what Hibernate has done for relational databases and what JAXB/XStream/XmlBeans have done for XML.