I wrote this code:
public static void main(String[] args) {
    // TODO Auto-generated method stub
    HashMap<Long, Long> mappp = new HashMap<Long, Long>();
    Long a = (long) 55;
    Long c = (long) 12;
    for (int u = 1; u <= 1303564 / 2 + 1303564 / 3; u++) {
        mappp.put(a, c);
        a = a + 1;
        c = c + 1;
    }
    System.out.println(" " + mappp.size());
}
It does not finish, because the program stops with this message in the console:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I calculated how much memory I need to hold such a HashMap, and in my opinion my computer's memory is enough. I have 1024 MB of RAM on my computer.
I use Eclipse and have also set these parameters:
I start Eclipse from the command line with 'eclipse -vmargs -Xms512m -Xmx730m'.
Second, in Run Configurations I have set the Arguments tab to '-Xmx730m'.
And this still gives java.lang.OutOfMemoryError.
What is the reason for this?
P.S. One strange fact: the bottom right corner of Eclipse shows the heap memory usage, and it says 130M of 495M.
When the HashMap mappp grows, shouldn't this '130M of 495M' change, for example to '357M of 495M', a second later to '412M of 495M', and so on until it reaches 495M? In my case the 130M stays almost the same, only changing a little, from 130M to 131M or 132M.
Strange.
Java does not allow maps of primitive types, so if you use a HashMap you have to pay for boxing/unboxing and the overhead of object references.
To avoid that overhead you can write your own primitive hash map or use an existing implementation from one of the primitive-collections libraries.
boxing and unboxing in java
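As a rough illustration of the same loop without boxing — assuming a primitive-collections library such as fastutil is on the classpath (Long2LongOpenHashMap is fastutil's class; substitute the equivalent type from whichever library you pick):

import it.unimi.dsi.fastutil.longs.Long2LongOpenHashMap;

public class PrimitiveMapDemo {
    public static void main(String[] args) {
        // Keys and values are stored as primitive longs, so no Long wrapper
        // object is created per entry.
        Long2LongOpenHashMap mappp = new Long2LongOpenHashMap();
        long a = 55, c = 12;
        for (int u = 1; u <= 1303564 / 2 + 1303564 / 3; u++) {
            mappp.put(a, c);   // put(long, long), no autoboxing
            a++;
            c++;
        }
        System.out.println(" " + mappp.size());
    }
}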
You should not put millions of items in a map. A Long is an object containing an 8-byte long field, plus some object overhead, and you use two Long instances per map entry.
Since the key is numeric, you could (if the maximum key value is low enough) use an array as the 'map'.
long[] mappp = new long[4000000]; // takes 4M * 8 = 32M memory
If you need to know whether a value is 'not in the map', use 0 for that value. If 0 needs to be in your map, you can do some tricks like increasing all values by 1 (if the values are always positive).
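A minimal sketch of that array-as-map idea, with the +1 shift so that 0 can mean 'absent' (the 4,000,000 bound is just the illustrative figure from above):

public class LongArrayMap {
    private final long[] slots;               // index = key, slot = value + 1

    public LongArrayMap(int maxKey) {
        slots = new long[maxKey + 1];         // e.g. new LongArrayMap(4_000_000)
    }

    public void put(int key, long value) {
        slots[key] = value + 1;               // shift by 1 so 0 can mean "absent"
    }

    public boolean containsKey(int key) {
        return slots[key] != 0;
    }

    public long get(int key) {
        return slots[key] - 1;                // undo the shift; check containsKey first
    }
}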
Related
I get a heap space error on the line below on another machine, but the program runs fine on my machine.
I can't change the settings of the other machine.
How can I solve this problem without using Scanner.java?
Is " " the correct argument to String.split for splitting the String into pieces on spaces?
[File:]
U 1 234.003 30 40 50 true
T 2 234.003 10 60 40 false
Z 3 17234.003 30 40 50 true
M 4 0.500 30 40 50 true
/* 1000000+ lines */
java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOfRange(Arrays.java:3821)
at java.base/java.lang.StringLatin1.newString(StringLatin1.java:764)
at java.base/java.lang.String.substring(String.java:1908)
at java.base/java.lang.String.split(String.java:2326)
at java.base/java.lang.String.split(String.java:2401)
at project.FileR(Fimporter.java:99)
public static DataBase File(String filename) throws IOException {
    BufferedReader fs = new BufferedReader(new FileReader(filename), 64 * 1024);
    String line;
    String[] wrds;
    String A; int hash; double B; int C; int D; boolean E;
    DataBase DB = new DataBase();
    while (true) {
        line = fs.readLine();
        if (line == null) { break; }
        wrds = line.split(" "); /* this is line 99 in the error message */
        hash = Integer.parseInt(wrds[1]);
        B = Double.parseDouble(wrds[2]);
        C = Integer.parseInt(wrds[3]);
        D = Integer.parseInt(wrds[4]);
        E = Boolean.parseBoolean(wrds[5]);
        // hash is the hash code for the values B, C, D, E in DataBase DB
        DB.listB.put(hash, B);
        DB.listC.put(hash, C);
        DB.listD.put(hash, D);
        DB.listE.put(hash, E);
    }
    fs.close();
    return DB;
}
How can I solve this problem without using Scanner.java?
Scanner is not the issue.
If you are getting OOME's with this code, the most likely root cause is the following:
DB.listB.put(hash,B);
DB.listC.put(hash,C);
DB.listD.put(hash,D);
DB.listE.put(hash,E);
You appear to be loading all of your data into 4 maps. (You haven't shown us the relevant code ... but I am making an educated guess here.)
My second guess is that your input files are very large, and the amount of memory needed to hold them in the above data structures is simply too large for the "other" machine's heap.
The fact that the OOME's are occurring in a String.split call is not indicative of a problem in split per se. This is just the proverbial "straw that broke the camel's back". The root cause of the problem is in what you are doing with the data after splitting it.
Possible solutions / workarounds:
Increase the heap size on the "other" machine. If you haven't set the -Xmx or -Xms options, the JVM will use the default max heap size ... which is typically 1/4 of the physical memory.
Read the command documentation for the java command to understand what -Xmx and -Xms do and how to set them.
Use more memory efficient data structures:
Create a class to represent a tuple consisting of the B, C, D, E values. Then replace the 4 maps with a single map of these tuples (see the sketch after this list).
Use a more memory efficient Map type.
Consider using a sorted array of tuples (including the hash) and using binary search to look them up.
Redesign your algorithms so that they don't need all of the data in memory at the same time; e.g. split the input into smaller files and process them separately. (This may not be possible ....)
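A minimal sketch of the tuple idea (the class and field names here are made up for illustration; the real DataBase would replace its four maps with one map of these records):

import java.util.HashMap;
import java.util.Map;

// One small object per input line instead of four boxed entries across four maps.
class Record {
    final double b;
    final int c;
    final int d;
    final boolean e;

    Record(double b, int c, int d, boolean e) {
        this.b = b; this.c = c; this.d = d; this.e = e;
    }
}

class DataBase {
    final Map<Integer, Record> records = new HashMap<>();
}

The reading loop would then do DB.records.put(hash, new Record(B, C, D, E)); once per line instead of four separate puts.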
If I'm not mistaken, you can allocate a larger heap when launching your jar file, e.g.:
java -Xmx256M -jar MyApp.jar
which means you can change those settings.
But then again, just increasing the heap size will not get rid of the problem; if the files get bigger, the chances of an OOM increase again.
You could think of splitting large files before processing: only process the first X lines, drop the references (null them out) so the GC can reclaim the memory, and then process the next lines, as sketched below.
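A rough sketch of that chunked approach (the chunk size of 100,000 lines and the process/flushChunk helpers are arbitrary placeholders):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ChunkedImport {
    private static final int CHUNK_LINES = 100_000;   // arbitrary; tune to your heap

    public static void main(String[] args) throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(args[0]))) {
            String line;
            int inChunk = 0;
            while ((line = reader.readLine()) != null) {
                process(line);                         // parse and accumulate one line
                if (++inChunk == CHUNK_LINES) {
                    flushChunk();                      // write partial results out, drop references
                    inChunk = 0;
                }
            }
            flushChunk();                              // flush whatever is left
        }
    }

    private static void process(String line) { /* parse the line */ }

    private static void flushChunk() { /* persist partial results, clear buffers */ }
}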
I have a query with a result set of half a million records. For each record I create an object and try to add it to an ArrayList.
How can I optimize this operation to avoid memory issues? I am getting an out-of-heap-space error.
This is a fragment of the code:
while (rs.next()) {
lista.add(sd.loadSabanaDatos_ResumenLlamadaIntervalo(rs));
}
public SabanaDatos loadSabanaDatos_ResumenLlamadaIntervalo(ResultSet rs)
{
SabanaDatos sabanaDatos = new SabanaDatos();
try {
sabanaDatos.setId(rs.getInt("id"));
sabanaDatos.setHora(rs.getString("hora"));
sabanaDatos.setDuracion(rs.getInt("duracion"));
sabanaDatos.setNavegautenticado(rs.getInt("navegautenticado"));
sabanaDatos.setIndicadorasesor(rs.getInt("indicadorasesor"));
sabanaDatos.setLlamadaexitosa(rs.getInt("llamadaexitosa"));
sabanaDatos.setLlamadanoexitosa(rs.getInt("llamadanoexitosa"));
sabanaDatos.setTipocliente(rs.getString("tipocliente"));
} catch (SQLException e) {
logger.info("dip.sabana.SabanaDatos SQLException : "+ e);
e.printStackTrace();
}
return sabanaDatos;
}
NOTE: The reason for using a list is that this is a critical system, and I can only make a call to the DB every 2 hours. I don't have permission to make more frequent calls to the DB, but I need to show data every 10 minutes. Example: the first query returns 10 rows, and I show 1 row each minute after the SQL query.
I don't have permission to create a local database, write files, or anything else ... just access to memory.
First of all - it is not good practice to read half a million objects into memory at once.
You can think about breaking the records to be read into small chunks.
As a solution, you can consider the following options:
1 - Use CachedRowSetImpl. It holds the same data as a ResultSet, but it is bad practice to keep a plain ResultSet open (since it ties up a database connection). If you use an ArrayList you are again copying the data and using additional memory.
For more info on cachedRowSet you can go to
https://docs.oracle.com/javase/tutorial/jdbc/basics/cachedrowset.html
2 - You can think about using an in-memory database, such as HSQLDB or H2. They are very lightweight and fast, provide a JDBC interface, and you can run SQL queries against them as well.
For HSQLDB implementation you can check
https://www.tutorialspoint.com/hsqldb/
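For example, with the H2 driver on the classpath, an in-memory table can be created and queried over plain JDBC (the URL, table, and column names here are only illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class InMemoryDbDemo {
    public static void main(String[] args) throws Exception {
        // "mem:" keeps the database entirely in memory for the life of the connection.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:llamadas");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE resumen (id INT PRIMARY KEY, hora VARCHAR(8), duracion INT)");
            st.execute("INSERT INTO resumen VALUES (1, '10:00', 42)");
            try (ResultSet rs = st.executeQuery("SELECT id, hora FROM resumen WHERE duracion > 10")) {
                while (rs.next()) {
                    System.out.println(rs.getInt("id") + " " + rs.getString("hora"));
                }
            }
        }
    }
}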
It might help to have the Strings interned, so that two occurrences of the same string share a single object.
public class StringCache {
    private Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}
StringCache horaMap = new StringCache();
StringCache tipoclienteMap = new StringCache();
sabanaDatos.setHora(horaMap.cached(rs.getString("hora")));
sabanaDatos.setTipocliente(tipoclienteMap.cached(rs.getString("tipocliente")));
Increasing memory has already been mentioned.
A speed-up is possible by using column numbers in the ResultSet getters; if needed, they can be looked up from the column names once before the loop (rs.getMetaData()).
Option 1:
If you need all the items in the list at the same time, you need to increase the heap space of the JVM by adding the argument -Xmx2G, for example, when you launch the app (java -Xmx2G -jar yourApp.jar).
Option 2:
Divide the SQL into more than one call.
Some of your options:
Use a local database, such as SQLite. That's a very lightweight database management system which is easy to install – you don't need any special privileges to do so – its data is held in a single file in a directory of your choice (such as the directory that holds your Java application) and can be used as an alternative to a large Java data structure such as a List.
If you really must use an ArrayList, make sure you take up as little space as possible. Try the following:
a. If you know the approximate number of rows, then construct your ArrayList with an appropriate initialCapacity to avoid reallocations. Estimate the maximum number of rows your database will grow to, and add another few hundred to your initialCapacity just in case.
b. Make sure your SabanaDatos objects are as small as they can be. For example, make sure the id field is an int and not an Integer. If the hora field is just a time of day, it can be held more efficiently in a short than in a String. Similarly for other fields, e.g. duracion - perhaps it can even fit into a byte, if its range allows it to? If you have several flag/Boolean fields, they can be packed into a single byte or short as bits. If you have String fields with a lot of repetitions, you can intern them as per Joop's suggestion (see the sketch after this list).
c. If you still get out-of-memory errors, increase your heap space using the JVM flags -Xms and -Xmx.
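A sketch of a more compact SabanaDatos along the lines of point b (every field choice here is an assumption about the data's actual ranges — adjust to what the columns really hold):

// Assumes: id fits in an int, hora is "HH:MM" (so it can be stored as minutes
// since midnight), duracion fits in a short, and the four indicator columns
// are 0/1 flags that can share one byte.
public class SabanaDatosCompact {
    private int id;
    private short horaMinutes;    // e.g. "14:30" -> 14 * 60 + 30
    private short duracion;
    private byte flags;           // bit 0: navegautenticado, bit 1: indicadorasesor,
                                  // bit 2: llamadaexitosa,   bit 3: llamadanoexitosa
    private String tipocliente;   // intern/cache as per Joop's StringCache above

    public void setHora(String hora) {
        String[] parts = hora.split(":");
        horaMinutes = (short) (Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]));
    }

    public void setFlag(int bit, boolean value) {
        if (value) {
            flags |= (1 << bit);
        } else {
            flags &= ~(1 << bit);
        }
    }
}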
I am trying to create 2D array in Java as follows:
int[][] adjecancy = new int[96295][96295];
but it is failing with the following error:
JVMDUMP039I Processing dump event "systhrow", detail "java/lang/OutOfMemoryError" at 2017/04/07 11:58:55 - please wait.
JVMDUMP032I JVM requested System dump using 'C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp' in response to an event
JVMDUMP010I System dump written to C:\eclipse\workspaces\TryJavaProj\core.20170407.115855.7840.0001.dmp
JVMDUMP032I JVM requested Heap dump using 'C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd' in response to an event
JVMDUMP010I Heap dump written to C:\eclipse\workspaces\TryJavaProj\heapdump.20170407.115855.7840.0002.phd
A way to solve this is by increasing the JVM memory but I am trying to submit the code for an online coding challenge. There it is also failing and I will not be able to change the settings there.
Is there any standard limit or guidance for creating large arrays which one should not exceed?
int[][] adjecancy = new int[96295][96295];
When you do that you are trying to allocate 96295 * 96295 * 4 bytes, which is nearly 37,091 MB, or roughly 37 GB. It is practically impossible to get that much memory from a PC for Java alone.
I don't think you need that much data in hand when your program initializes. You should probably look at ArrayList, which gives you dynamic sizing; allocating only what you need and freeing it up at runtime is the key thing to consider.
There is no special limit or restriction on creating an array, other than each dimension's length having to fit in an int. As long as you have memory, you can use it. But keep in mind that you should not hold a block of memory so large that it makes the JVM's life hectic.
The array must obviously fit into memory. If it does not, the typical solutions are:
Do you really need int (max value 2,147,483,647)? Maybe byte (max value 127) or short is good enough? A byte is 4 times smaller than an int.
Do you have many identical values in the array (like zeros)? Try to use sparse arrays.
for instance:
Map<Integer, Map<Integer, Integer>> map = new HashMap<>();
map.put(27, new HashMap<Integer, Integer>()); // row 27 exists
map.get(27).put(54, 1); // row 27, column 54 has value 1.
They need more memory per value stored, but have basically no limits on the array space (you can use Long rather than Integer as index to make them really huge).
Maybe you just do not know how long the array should be? Try ArrayList, it self-resizes. Use ArrayList of ArrayLists for 2D array.
If nothing else is helpful, use RandomAccessFile to store your overgrown data in the filesystem. 100 GB or so is not a problem these days on a good workstation; you just need to compute the required offset in the file. The filesystem is obviously much slower than RAM, but with a good SSD drive it may be bearable.
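A minimal sketch of that file-backed idea, computing row-major offsets into a RandomAccessFile (the class is a made-up example, not a library type):

import java.io.IOException;
import java.io.RandomAccessFile;

public class FileBackedIntMatrix implements AutoCloseable {
    private static final int BYTES_PER_INT = 4;
    private final RandomAccessFile file;
    private final int cols;

    public FileBackedIntMatrix(String path, int rows, int cols) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.cols = cols;
        file.setLength((long) rows * cols * BYTES_PER_INT);   // pre-size the backing file
    }

    public void set(int row, int col, int value) throws IOException {
        file.seek(((long) row * cols + col) * BYTES_PER_INT); // row-major offset
        file.writeInt(value);
    }

    public int get(int row, int col) throws IOException {
        file.seek(((long) row * cols + col) * BYTES_PER_INT);
        return file.readInt();
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}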
The recommended maximum heap size is 1/4 of the machine's RAM.
One int in Java takes 4 bytes, and your array allocation needs approximately 37.09 GB of memory.
In that case, even if I assume you are allocating the full heap to just this array, your machine would need around 148 GB of RAM. That is huge.
Have a look at the reference below.
Ref: http://docs.oracle.com/javase/8/docs/technotes/guides/vm/gc-ergonomics.html
Hope this helps.
It depends on the maximum memory available to your JVM and the element type of the array. An int takes 4 bytes of memory. If 1 MB of memory is available on your machine, it can hold a maximum of 1024 * 256 integers (1 MB = 1024 * 1024 bytes). Keeping that in mind, you can size your 2D array accordingly.
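For instance, a quick back-of-the-envelope check of how large a square int[][] fits in a given budget (the 1 MB figure is just the example from the paragraph above; Integer.BYTES needs Java 8+):

public class ArraySizeEstimate {
    public static void main(String[] args) {
        long budgetBytes = 1L * 1024 * 1024;            // 1 MB = 1024 * 1024 bytes
        long maxInts = budgetBytes / Integer.BYTES;     // 262,144 ints = 1024 * 256
        int side = (int) Math.sqrt(maxInts);            // largest square 2-D int array: ~512 x 512
        System.out.println(maxInts + " ints -> square side of about " + side);
    }
}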
The size of array you can create depends on the JVM heap size.
96295 * 96295 * 4 (bytes per int) = 37,090,908,100 bytes ≈ 34.54 GB. Most JVMs in competitive code judges don't have that much memory, hence the error.
To get a good idea of what array size you can use for given heap size -
Run this code snippet with different -Xmx settings:
Scanner scanner = new Scanner(System.in);
while (true) {
    System.out.println("Enter 2-D array of size: ");
    int size = scanner.nextInt();
    int[][] numbers = new int[size][size];
    numbers = null;   // drop the reference so the array can be collected
}
e.g. with -Xmx512M -> a 2-D int array with a side of roughly 10k elements.
Generally, most online judges have a heap of about 1.5-2 GB when evaluating submissions.
I am writing code that does some calculations on array values and stores the results back into the array. Demo code is as follows:
public class Test {
private int[] x = new int[100000000];
/**
 * @param args
 * @throws Exception
 */
public static void main(String[] args) throws Exception {
Test t = new Test();
long start = System.nanoTime();
for(int i=0;i<100000000;i++) {
t.testing(i);
}
System.out.println("time = " + (System.nanoTime() - start)/1000);
}
public void testing(int a) throws Exception {
int b=1,c=0;
if(b<c || b < 1) {
throw new Exception("Invalid inputs");
}
int d= a>>b;
int e = a & 0x0f;
int f = x[d];
int g = x[e];
x[d] = f | g;
}
}
The main logic of the program lies in these lines:
int d= a>>b;
int e = a & 0x0f;
x[d] = f | g;
When I test this code, it takes 110 ms.
But if, instead of assigning the result back to x[d], I assign it to a variable:
int h = f | g;
it takes only 3 ms.
I want to assign the result back to the array, but that is hurting performance by a big margin.
This is a time-critical program.
So I want to know if there is any alternative to arrays in Java, or any other way I can avoid this slowdown?
I tested this code under the default Sun JVM configuration.
P.S. I tried the Unsafe API, but it isn't helping.
What you want to beware of is the JVM optimising the code to nothing because it isn't doing anything useful.
In your case you are performing 100 million calls in 110 ms, or about 1.1 nanoseconds per call. Given that a single memory-to-L1-cache access takes 4 clock cycles, this is pretty fast. In your test where you got 100 million calls in 3 ms, that suggests it is taking 0.03 nanoseconds per call, or about 1/10th of a clock cycle. To me this doesn't sound likely, and I would expect that if you doubled the length of the loop it would still take 3 ms, i.e. you are timing how long it takes the JVM to detect and eliminate the code.
A basic problem you have is that you have an array which is 400 MB in size. This will not fit in L1, L2 or L3 cache. Instead it could be going to main memory and this typically takes 200 clock cycles. The best option is to reduce the size of your array so it at least fits in your L3 cache. How big is your L3 cache? If it is say 24 MB, try reducing the array to just 16 MB and you should see a performance improvement.
There are a number of things that could be happening. First of all, try running each version of your program multiple times consecutively and averaging those. Secondly, assigning to an array element in Java involves a bounds check (throwing ArrayIndexOutOfBoundsException when necessary), which is naturally going to be a bit slower than a plain variable assignment. If you have a really time-sensitive piece of code, consider using JNI for the numerical operations: http://docs.oracle.com/javase/6/docs/technotes/guides/jni/. This will often make your array logic faster.
That's because h is a local variable and is allocated on the stack, whereas the array is stored in main memory, which is much slower to write to.
Also note that, if this is really a high-performance application, you should put your main logic inside the for loop and avoid the overhead of calling a method. The instructions could be inlined for you, but you should not rely on it.
I am trying to build a map from the contents of a file, and my code is as follows:
System.out.println("begin to build the sns map....");
String basePath = PropertyReader.getProp("oldbasepath");
String pathname = basePath + "\\user_sns.txt";
FileReader fr;
Map<Integer, List<Integer>> snsMap =
new HashMap<Integer, List<Integer>>(2000000);
try {
fr = new FileReader(pathname);
BufferedReader br = new BufferedReader(fr);
String line;
int i = 1;
while ((line = br.readLine()) != null) {
System.out.println("line number: " + i);
i++;
String[] strs = line.split("\t");
int key = Integer.parseInt(strs[0]);
int value = Integer.parseInt(strs[1]);
List<Integer> list = snsMap.get(key);
//if the follower is not in the map
if(snsMap.get(key) == null)
list = new LinkedList<Integer>();
list.add(value);
snsMap.put(key, list);
System.out.println("map size: " + snsMap.size());
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("finish building the sns map....");
return snsMap;
The program is very fast at first but gets much slower when the information printed is:
map size: 1138338
line number: 30923602
map size: 1138338
line number: 30923603
....
I am trying to find the reason with the two System.out.println() calls, judging the performance of the BufferedReader and the HashMap instead of using a Java profiler.
Sometimes it takes a while to get the map size information after getting the line number information, and sometimes it takes a while to get the line number information after getting the map size. My question is: what makes my program slow, the BufferedReader reading a big file or the HashMap holding a big map?
If you are testing this from inside Eclipse, you should be aware of the huge performance penalty of writing to stdout/stderr, due to Eclipse capturing that output in the Console view. Printing inside a tight loop is always a performance issue, even outside of Eclipse.
But, if what you are complaining about is the slowdown experienced after processing 30 million lines, then I bet it's a memory issue. First it slows down due to intense GC'ing and then it breaks with OutOfMemoryError.
You will have to check your program with some profiling tools to understand why it is slow.
In general, file access is much slower than in-memory operations (unless you are constrained on memory and doing excessive GC), so the guess would be that reading the file is the slower part here.
Until you have profiled, you will not know what is slow and what isn't.
Most likely, the System.out will show up as being the bottleneck, and you'll then have to profile without them again. System.out is the worst thing you can do for finding performance bottlenecks, because in doing so you usually add an even worse bottleneck.
An obvious optimization for your code is to move the line
snsMap.put(key, list);
into the if statement. You only need to put it when you have just created a new list; otherwise, the put just replaces the current value with itself. See the sketch below.
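In other words, the loop body becomes something like this (same types as in the question):

List<Integer> list = snsMap.get(key);
if (list == null) {
    list = new LinkedList<Integer>();
    snsMap.put(key, list);   // only needed when the list was just created
}
list.add(value);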
The cost associated with Integer objects in Java (and in particular the use of Integer in the Java Collections API) is largely a memory (and thus garbage collection!) issue. You can sometimes get significant gains by using primitive collections such as GNU Trove, depending on how well you can adapt your code to use them efficiently. Most of the gains of Trove are in memory usage. Definitely try rewriting your code to use TIntArrayList and TIntObjectMap from GNU Trove. I'd avoid linked lists, too, in particular for primitive types.
Roughly estimated, a HashMap<Integer, List<Integer>> needs at least 3*16 bytes per entry. The doubly linked list again needs at least 2*16 bytes per entry stored. 1M keys + 30M values ~ 1 GB, with no overhead included yet. With GNU Trove's TIntObjectHashMap<TIntArrayList> that should be 4+4+16 bytes per key and 4 bytes per value, so about 144 MB. The overhead is probably similar for both.
The reason that Trove uses less memory is because the types are specialized for primitive values such as int. They will store the int values directly, thus using 4 bytes to store each.
A Java collections HashMap consists of many objects. It roughly looks like this: there are Entry objects that point to a key and a value object each. These must be objects, because of the way generics are handled in Java. In your case, the key will be an Integer object, which uses 16 bytes (4 bytes mark, 4 bytes type, 4 bytes actual int value, 4 bytes padding) AFAIK. These are all 32 bit system estimates. So a single entry in the HashMap will probably need some 16 (entry) + 16 (Integer key) + 32 (yet empty LinkedList) bytes of memory that all need to be considered for garbage collection.
If you have lots of Integer objects, it just will take 4 times as much memory as if you could store everything using int primitives. This is the cost you pay for the clean OOP principles realized in Java.
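A sketch of the Trove variant, assuming GNU Trove 3.x is on the classpath (TIntObjectHashMap and TIntArrayList are Trove's classes; the wrapper class here is just for illustration):

import gnu.trove.list.array.TIntArrayList;
import gnu.trove.map.hash.TIntObjectHashMap;

public class TroveSnsMap {
    // Same shape as HashMap<Integer, List<Integer>>, but keys and values are
    // stored as primitive ints, so there is no Integer boxing per entry.
    private final TIntObjectHashMap<TIntArrayList> snsMap =
            new TIntObjectHashMap<>(2000000);

    public void add(int key, int value) {
        TIntArrayList list = snsMap.get(key);
        if (list == null) {
            list = new TIntArrayList();
            snsMap.put(key, list);    // only on the first occurrence of the key
        }
        list.add(value);
    }
}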
The best way is to run your program with a profiler (for example, JProfiler) and see which parts are slow. Debug output, for example, can also slow your program down.
HashMap is not slow; in reality it's the fastest of the maps. Hashtable is the synchronized (thread-safe) one, and it can be slow sometimes because of that.
Important note: close the BufferedReader and the file after you read the data... this might help.
e.g.: br.close()
file.close()
Please check your system processes in Task Manager; there may be too many processes running in the background.
Sometimes Eclipse is really resource-heavy, so try running your program from the console to check.