Store large DefaultFeatureCollection - java

I have to update an org.geotools.feature.DefaultFeatureCollection every second for as long as the app is running (more than an hour).
I have created DefaultFeatureCollection lineCollection = new DefaultFeatureCollection(); as a class member and add a feature to it every second with lineCollection.add(feature);
public void addLines(Coordinate[] coords) {
    try {
        line = geometryFactory.createLineString(coords);
        featureBuilder.add(line);
        feature = featureBuilder.buildFeature(null);
        lineCollection.add(feature);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
However, the collection gets huge and heap usage increases gradually, resulting in high CPU usage and a lagging app.
Is there a way to free the memory once a line has been displayed on the map?

You've tagged your question [memory-leaks] - is there any evidence that you are leaking memory? You can use jmap to check. If so, the developers would love to hear about it, with the evidence.
It seems more likely that you are just drawing 3600 lines (60*60) after an hour without having any indexing. If your dataset were constant I'd recommend a SpatialIndexFeatureCollection, but as yours changes I would suggest you use a GeoPackage or other database-backed store (if you already have one) that will manage a spatial index for your lines.
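If a GeoPackage is an option, a rough sketch of the idea is below: keep collecting into the existing DefaultFeatureCollection, but periodically flush the batch into the file-backed store and clear the in-memory collection so the heap stays bounded. The class name, file path and flush policy are assumptions, and the imports and parameter keys vary between GeoTools versions, so treat this as a sketch rather than a drop-in implementation.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.geotools.data.DataStore;
import org.geotools.data.DataStoreFinder;
import org.geotools.data.simple.SimpleFeatureStore;
import org.geotools.feature.DefaultFeatureCollection;
import org.opengis.feature.simple.SimpleFeatureType;

/** Persists batches of line features to a GeoPackage so the in-memory collection stays small. */
public class LineFlusher {

    private final DataStore store;
    private final String typeName;

    public LineFlusher(SimpleFeatureType lineType, String gpkgPath) throws IOException {
        Map<String, Object> params = new HashMap<>();
        params.put("dbtype", "geopkg");    // handled by the gt-geopkg module
        params.put("database", gpkgPath);  // e.g. "lines.gpkg"
        this.store = DataStoreFinder.getDataStore(params);
        this.typeName = lineType.getTypeName();
        store.createSchema(lineType);      // create the table once, up front
    }

    /** Writes the collected lines to disk and releases the heap they were holding. */
    public void flush(DefaultFeatureCollection lineCollection) throws IOException {
        SimpleFeatureStore featureStore = (SimpleFeatureStore) store.getFeatureSource(typeName);
        featureStore.addFeatures(lineCollection);
        lineCollection.clear();
    }
}
Calling flush(lineCollection) from the same timer that calls addLines, say once a minute, keeps at most a minute's worth of features on the heap, while the GeoPackage maintains the spatial index used for rendering.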

Related

How to detect memory-pressure in a java program?

I have a batch process, written in java, that analyzes extremely long sequences of tokens (maybe billions or even trillions of them!) and observes bi-gram patterns (aka, word-pairs).
In this code, bi-grams are represented as Pairs of Strings, using the ImmutablePair class from Apache commons. I won't know in advance the cardinality of the tokens. They might be very repetitive, or each token might be totally unique.
The more data I can fit into memory, the better the analysis will be!
But I definitely can't process the whole job at once. So I need to load as much data as possible into a buffer, perform a partial analysis, flush my partial results to a file (or to an API, or whatever), then clear my caches and start over.
One way I'm optimizing memory usage is by using Guava interners to de-duplicate my String instances.
Right now, my code looks essentially like this:
int BUFFER_SIZE = 100_000_000;
Map<Pair<String, String>, LongAdder> bigramCounts = new HashMap<>(BUFFER_SIZE);
Interner<String> interner = Interners.newStrongInterner();

String prevToken = null;
Iterator<String> tokens = getTokensFromSomewhere();
while (tokens.hasNext()) {
    String token = interner.intern(tokens.next());
    if (prevToken != null) {
        Pair<String, String> bigram = new ImmutablePair<>(prevToken, token);
        LongAdder bigramCount = bigramCounts.computeIfAbsent(
                bigram,
                (c) -> new LongAdder()
        );
        bigramCount.increment();

        // If our buffer is full, we need to flush!
        boolean tooMuchMemoryPressure = bigramCounts.size() > BUFFER_SIZE;
        if (tooMuchMemoryPressure) {
            // Analyze the data, and write the partial results somewhere
            doSomeFancyAnalysis(bigramCounts);
            // Clear the buffer and start over
            bigramCounts.clear();
        }
    }
    prevToken = token;
}
The trouble with this code is that it uses a very crude way of determining whether there is tooMuchMemoryPressure.
I want to run this job on many different kinds of hardware, with varying amounts of memory. No matter the instance, I want this code to automatically adjust to maximize the memory consumption.
Rather than using some hard-coded constant like BUFFER_SIZE (derived through experimentation, heuristics, guesswork), I actually just want to ask the JVM whether the memory is almost full. But that's a very complicated question, considering the complexities of mark/sweep algorithms and all the different generational collectors.
What would be a good general-purpose approach for accomplishing something like this, assuming this batch-job might run on a variety of different machines, with different amounts of available memory? I don't need this to be extremely precise... I'm just looking for a rough signal to know that I need to flush the buffer soon, based on the state of the actual heap.
The simplest way to get a first glimpse of what is going on with the process's heap space is Runtime.freeMemory() together with .maxMemory() and .totalMemory(). Yet the first does not factor in garbage, so it is an under-estimate at best and may be completely misleading just before the GC kicks in.
Assuming that for your application "memory pressure" basically means "(soon) not enough", the interesting value is free memory right after a GC.
This is available by using GarbageCollectorMXBean
which provides GcInfo with memory usage after the GC.
The bean can be watched right after each GC since it is a NotificationEmitter, although this is not advertised in the Javadoc. Some minimal code, patterned after a longer example, is:
void registerCallback() {
    List<GarbageCollectorMXBean> gcbeans =
            java.lang.management.ManagementFactory.getGarbageCollectorMXBeans();
    for (GarbageCollectorMXBean gcbean : gcbeans) {
        System.out.println(gcbean.getName());
        NotificationEmitter emitter = (NotificationEmitter) gcbean;
        emitter.addNotificationListener(this::handle, null, null);
    }
}

private void handle(Notification notification, Object handback) {
    if (!notification.getType()
            .equals(GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION)) {
        return;
    }
    GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
            .from((CompositeData) notification.getUserData());
    GcInfo gcInfo = info.getGcInfo();
    gcInfo.getMemoryUsageAfterGc().forEach((name, memUsage) -> {
        System.err.println(name + " -> " + memUsage);
    });
}
There will be several memUsage entries, and these will also differ depending on the GC. But from the values provided (used, committed and max) we can derive upper limits on free memory, which again should give the "rough signal" the OP is asking for.
doSomeFancyAnalysis will certainly also need its share of fresh memory, so with a very rough estimate of how much that will be per bigram to analyze, this could be the limit to watch for.
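Tying this back to the loop in the question, a minimal sketch of how the notification could be turned into that rough signal is below: it sets a flag after any GC that leaves a long-lived pool mostly full, and the batch loop would test monitor.checkAndReset() instead of the size-based tooMuchMemoryPressure check. The class name, the threshold value and the pool-name heuristic are assumptions, not something from the answer above.
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;

import javax.management.NotificationEmitter;
import javax.management.openmbean.CompositeData;

import com.sun.management.GarbageCollectionNotificationInfo;

/** Raises a flag when, after a collection, a long-lived pool is still mostly full. */
final class MemoryPressureMonitor {

    private final AtomicBoolean pressure = new AtomicBoolean(false);
    private final double threshold;

    MemoryPressureMonitor(double threshold) {
        this.threshold = threshold; // e.g. 0.8 = signal when 80% of the pool is still used after GC
        for (GarbageCollectorMXBean gcBean : ManagementFactory.getGarbageCollectorMXBeans()) {
            ((NotificationEmitter) gcBean).addNotificationListener((notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                info.getGcInfo().getMemoryUsageAfterGc().forEach((pool, usage) -> {
                    // Heuristic: only look at pools that hold long-lived data and report a max.
                    boolean longLived = pool.contains("Old") || pool.contains("Tenured");
                    if (longLived && usage.getMax() > 0
                            && (double) usage.getUsed() / usage.getMax() > threshold) {
                        pressure.set(true);
                    }
                });
            }, null, null);
        }
    }

    /** True once pressure was observed; resets the flag so it can fire again after a flush. */
    boolean checkAndReset() {
        return pressure.getAndSet(false);
    }
}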

Surviving generations keep increasing while running Solr query

I am testing a query with jSolr (7.4) because I believe it is causing a memory leak in my program. But I am not sure it is indeed a memory leak, so I'm asking for advice!
This method is called several times during the running time of my indexing program (which should be able to run for weeks or months without any problems). That's why I am testing it in a loop that I profile with the NetBeans Profiler.
If I simply retrieve the id from all documents (there are 33k) in a given index :
public class MyIndex {

    // This is used as a cache variable to avoid querying the index every time the list of documents is needed
    private List<MyDocument> listOfMyDocumentsAlreadyIndexed = null;

    public final List<MyDocument> getListOfMyDocumentsAlreadyIndexed()
            throws SolrServerException, HttpSolrClient.RemoteSolrException, IOException {
        SolrQuery query = new SolrQuery("*:*");
        query.addField("id");
        query.setRows(Integer.MAX_VALUE); // we want ALL documents in the index, not only the first ones
        SolrDocumentList results = this.getSolrClient().query(query).getResults();
        /**
         * The following was commented out for the test,
         * so that it can be told where the leak comes from.
         */
        // listOfMyDocumentsAlreadyIndexed = results.parallelStream()
        //         .map((doc) -> { // different stuff ...
        //             return myDocument;
        //         })
        //         .collect(Collectors.toList());
        return listOfMyDocumentsAlreadyIndexed;
        /** The number of surviving generations
         * keeps increasing, whereas if null is
         * returned then the number of surviving
         * generations is not increasing anymore.
         */
    }
}
I get this from the profiler (after nearly 200 runs, which could simulate a year of runtime for my program):
The objects that survive the most are Strings:
Is the growing number of surviving generations the expected behaviour when querying for all documents in the index?
If so, is it the root cause of the "OOM Java heap space" error that I get after some time on the production server, as the stack trace seems to suggest:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at org.noggit.CharArr.resize(CharArr.java:110)
at org.noggit.CharArr.reserve(CharArr.java:116)
at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:68)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:868)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:541)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:305)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:555)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:307)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:200)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:274)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:614)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
Would increasing the heap space ("-Xmx") from 8 GB to anything greater solve the problem for good, or would it just postpone it? What can be done to work around this?
Edit some hours later
If null is returned from the method under test (getListOfMyDocumentsAlreadyIndexed), then the number of surviving generations remains stable throughout the test:
So even though I was NOT using the result of the query for this test (because I wanted to focus only on where the leak happened), it looks like returning an instance variable (even if it was null) is not a good idea. I will try to remove it.
Edit even later
I noticed that the surviving generations were still increasing in the telemetry tab when I was profiling "defined classes" ("focused (instrumented)"), whereas it was stable when profiling "All classes" ("General (sampled)"). So I am not sure that solved the problem:
Any hint greatly appreciated :-)
The problem stems from the following line :
query.setRows(Integer.MAX_VALUE);
This should not be done according to this article :
The rows parameter for Solr can be used to return more than the default of 10 rows. I have seen users successfully set the rows parameter to 100-200 and not see any issues. However, setting the rows parameter higher has a big memory consequence and should be avoided at all costs.
So the problem has been solved by retrieving the documents in chunks of 200 docs, following this Solr article on pagination:
SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    doCustomProcessingOfResults(rsp);
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
Please note: you should not exceed 200 documents in setRows, otherwise the memory leak still happens (e.g. for 500 it does).
Now the profiler gives much better results regarding surviving generations, as they do not increase over time anymore.
However, the method is much slower.
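For reference, a rough sketch of how the method from the question could be rewritten around the cursor, keeping the 200-row pages recommended above, might look like this; the MyDocument(String id) constructor is hypothetical and stands in for whatever mapping the commented-out stream performed.
public final List<MyDocument> getListOfMyDocumentsAlreadyIndexed()
        throws SolrServerException, IOException {
    SolrQuery query = new SolrQuery("*:*");
    query.addField("id");
    query.setRows(200);                                // small pages instead of Integer.MAX_VALUE
    query.setSort(SolrQuery.SortClause.asc("id"));     // cursors require a stable, unique sort
    List<MyDocument> docs = new ArrayList<>();
    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = this.getSolrClient().query(query);
        for (SolrDocument doc : rsp.getResults()) {
            docs.add(new MyDocument((String) doc.getFieldValue("id"))); // hypothetical constructor
        }
        String nextCursorMark = rsp.getNextCursorMark();
        done = cursorMark.equals(nextCursorMark);      // no new cursor means everything has been seen
        cursorMark = nextCursorMark;
    }
    listOfMyDocumentsAlreadyIndexed = docs;
    return listOfMyDocumentsAlreadyIndexed;
}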

Java, add half a million objects to an ArrayList from an SQL query

I have a query with a result set of half a million records; for each record I'm creating an object and adding it to an ArrayList.
How can I optimize this operation to avoid memory issues? I'm getting an out-of-heap-space error.
This is a fragment of the code:
while (rs.next()) {
    lista.add(sd.loadSabanaDatos_ResumenLlamadaIntervalo(rs));
}

public SabanaDatos loadSabanaDatos_ResumenLlamadaIntervalo(ResultSet rs) {
    SabanaDatos sabanaDatos = new SabanaDatos();
    try {
        sabanaDatos.setId(rs.getInt("id"));
        sabanaDatos.setHora(rs.getString("hora"));
        sabanaDatos.setDuracion(rs.getInt("duracion"));
        sabanaDatos.setNavegautenticado(rs.getInt("navegautenticado"));
        sabanaDatos.setIndicadorasesor(rs.getInt("indicadorasesor"));
        sabanaDatos.setLlamadaexitosa(rs.getInt("llamadaexitosa"));
        sabanaDatos.setLlamadanoexitosa(rs.getInt("llamadanoexitosa"));
        sabanaDatos.setTipocliente(rs.getString("tipocliente"));
    } catch (SQLException e) {
        logger.info("dip.sabana.SabanaDatos SQLException : " + e);
        e.printStackTrace();
    }
    return sabanaDatos;
}
NOTE: The reason for using a list is that this is a critical system, and I can only make one call to the DB every 2 hours. I don't have permission to make more frequent calls to the DB, but I need to show data every 10 minutes. Example: the first query returns 10 rows, and I show one row per minute after the SQL query.
I don't have permission to create a local database, write files or anything else... just access to memory.
First of all, it is not good practice to read half a million objects at once.
You can think of breaking the number of records to be read down into small chunks.
As a solution, you can consider the following options:
1 - Use CachedRowSetImpl - it works much like a ResultSet, but it is a bad practice to keep a ResultSet open (since it is tied to a database connection). If you use an ArrayList, then you are again performing operations and using up memory.
For more info on cachedRowSet you can go to
https://docs.oracle.com/javase/tutorial/jdbc/basics/cachedrowset.html
2 - You can think of using an in-memory database, such as HSQLDB or H2. They are very lightweight and fast, provide a JDBC interface, and you can run SQL queries against them as well.
For HSQLDB implementation you can check
https://www.tutorialspoint.com/hsqldb/
It might help to have Strings interned, so that two occurrences of the same string share a single object.
public class StringCache {
    private Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}
StringCache horaMap = new StringCache();
StringCache tipoclienteMap = new StringCache();

sabanaDatos.setHora(horaMap.cached(rs.getString("hora")));
sabanaDatos.setTipocliente(tipoclienteMap.cached(rs.getString("tipocliente")));
Increasing memory has already been mentioned.
A speed-up is possible by using column numbers; if needed, they can be obtained from the column names once before the loop (via rs.getMetaData()).
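For illustration, a minimal sketch of that column-number idea against the code from the question (only three of the columns are shown; ResultSet.findColumn does the name-to-index lookup once, before the loop):
int idCol = rs.findColumn("id");
int horaCol = rs.findColumn("hora");
int duracionCol = rs.findColumn("duracion");
while (rs.next()) {
    SabanaDatos sabanaDatos = new SabanaDatos();
    sabanaDatos.setId(rs.getInt(idCol));          // read by index, no name lookup per row
    sabanaDatos.setHora(rs.getString(horaCol));
    sabanaDatos.setDuracion(rs.getInt(duracionCol));
    // ... remaining columns read the same way ...
    lista.add(sabanaDatos);
}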
Option 1:
If you need all the items in the list at the same time, you need to increase the heap space of the JVM, adding the argument -Xmx2G for example when you launch the app (java -Xmx2G -jar yourApp.jar).
Option 2:
Divide the SQL into more than one call.
Some of your options:
Use a local database, such as SQLite. That's a very lightweight database management system which is easy to install – you don't need any special privileges to do so – its data is held in a single file in a directory of your choice (such as the directory that holds your Java application) and can be used as an alternative to a large Java data structure such as a List.
If you really must use an ArrayList, make sure you take up as little space as possible. Try the following:
a. If you know the approximate number of rows, then construct your ArrayList with an appropriate initialCapacity to avoid reallocations. Estimate the maximum number of rows your database will grow to, and add another few hundred to your initialCapacity just in case.
b. Make sure your SabanaDatos objects are as small as they can be. For example, make sure the id field is an int and not an Integer. If the hora field is just a time of day, it can be more efficiently held in a short than a String. Similarly for other fields, e.g. duracion - perhaps it can even fit into a byte, if its range allows it to? If you have several flag/Boolean fields, they can be packed into a single byte or short as bits. If you have String fields that have a lot of repetitions, you can intern them as per Joop's suggestion. (A rough sketch of such a slimmed-down class follows this list.)
c. If you still get out-of-memory errors, increase your heap space using the JVM flags -Xms and -Xmx.
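For point (b), a rough sketch of what a more compact SabanaDatos could look like is below. The assumption that hora is a time of day and that the four indicator columns are 0/1 flags comes only from their names, not from the question, so adjust the types to the real value ranges.
/** A deliberately compact variant of SabanaDatos (field ranges are assumptions). */
public class SabanaDatosCompact {

    private int id;
    private short horaMinutes;   // minutes since midnight instead of a "HH:mm" String
    private int duracion;
    private byte flags;          // bit 0: navegautenticado, 1: indicadorasesor,
                                 // 2: llamadaexitosa, 3: llamadanoexitosa
    private String tipocliente;  // few distinct values expected, so intern/cache it

    public void setFlag(int bit, boolean value) {
        if (value) {
            flags |= (1 << bit);
        } else {
            flags &= ~(1 << bit);
        }
    }

    public boolean getFlag(int bit) {
        return (flags & (1 << bit)) != 0;
    }

    // getters and setters for the remaining fields elided
}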

OutOfMemoryErrors even after using WeakReferences for keys and values

Below is a small test I've coded to educate myself on the references API. I thought this would never throw an OOME, but it does, and I am unable to figure out why. I'd appreciate any help on this.
public class Referencestest {

    public static void main(String[] args) {
        Map<WeakReference<Long>, WeakReference<Double>> weak =
                new HashMap<WeakReference<Long>, WeakReference<Double>>(500000, 1.0f);
        ReferenceQueue<Long> keyRefQ = new ReferenceQueue<Long>();
        ReferenceQueue<Double> valueRefQ = new ReferenceQueue<Double>();
        int totalClearedKeys = 0;
        int totalClearedValues = 0;
        for (long putCount = 0; putCount <= Long.MAX_VALUE; putCount += 100000) {
            weak(weak, keyRefQ, valueRefQ, 100000);
            totalClearedKeys += poll(keyRefQ);
            totalClearedValues += poll(valueRefQ);
            System.out.println("Total PUTs so far = " + putCount);
            System.out.println("Total KEYs CLEARED so far = " + totalClearedKeys);
            System.out.println("Total VALUESs CLEARED so far = " + totalClearedValues);
        }
    }

    public static void weak(Map<WeakReference<Long>, WeakReference<Double>> m, ReferenceQueue<Long> keyRefQ,
            ReferenceQueue<Double> valueRefQ, long limit) {
        for (long i = 1; i <= limit; i++) {
            m.put(new WeakReference<Long>(new Long(i), keyRefQ), new WeakReference<Double>(new Double(i), valueRefQ));
            long heapFreeSize = Runtime.getRuntime().freeMemory();
            if (i % 100000 == 0) {
                System.out.println(i);
                System.out.println(heapFreeSize / 131072 + "MB"); // note: divides by 128 KiB, so the "MB" figures in the log are inflated
                System.out.println();
            }
        }
    }

    private static int poll(ReferenceQueue<?> keyRefQ) {
        Reference<?> poll = keyRefQ.poll();
        int i = 0;
        while (poll != null) {
            poll.clear();
            poll = keyRefQ.poll();
            i++;
        }
        return i;
    }
}
And below is the log when run with 64 MB of heap:
Total PUTs so far = 0
Total KEYs CLEARED so far = 77982
Total VALUESs CLEARED so far = 77980
100000
24MB
Total PUTs so far = 100000
Total KEYs CLEARED so far = 134616
Total VALUESs CLEARED so far = 134614
100000
53MB
Total PUTs so far = 200000
Total KEYs CLEARED so far = 221489
Total VALUESs CLEARED so far = 221488
100000
157MB
Total PUTs so far = 300000
Total KEYs CLEARED so far = 366966
Total VALUESs CLEARED so far = 366966
100000
77MB
Total PUTs so far = 400000
Total KEYs CLEARED so far = 366968
Total VALUESs CLEARED so far = 366967
100000
129MB
Total PUTs so far = 500000
Total KEYs CLEARED so far = 533883
Total VALUESs CLEARED so far = 533881
100000
50MB
Total PUTs so far = 600000
Total KEYs CLEARED so far = 533886
Total VALUESs CLEARED so far = 533883
100000
6MB
Total PUTs so far = 700000
Total KEYs CLEARED so far = 775763
Total VALUESs CLEARED so far = 775762
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at Referencestest.weak(Referencestest.java:38)
at Referencestest.main(Referencestest.java:21)
from http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
I think your use of HashMap is likely to be the issue. You might want to use WeakHashMap.
To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.
The heart of the problem is probably that you're filling your heap with WeakReference objects; the weak references are cleared when you're getting low on memory, but the reference objects themselves are not, so your hashmap is filling up with a boatload of WeakReference objects (not to mention the object array the hashmap uses, which will grow indefinitely), all pointing to null.
The solution, as already pointed out, is a WeakHashMap, which will clear out those objects when they're no longer in use (this is done during put).
EDIT:
As Kevin pointed out, you already have your reference-queue logic worked out (I didn't pay close enough attention); a solution using your code is to simply remove the entry from the map at the point where the key has been collected. This is essentially how WeakHashMap works (where the poll is simply triggered on insert).
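For illustration, a rough sketch of that idea applied to the code in the question: the Reference object polled from the key queue is the very object that was used as the map key, so it can be removed from the HashMap directly. This variant of poll() is an assumption about how the helper could be adapted for the key queue, not code from the answer.
private static int poll(Map<WeakReference<Long>, WeakReference<Double>> map,
                        ReferenceQueue<Long> keyRefQ) {
    int cleared = 0;
    Reference<? extends Long> ref;
    while ((ref = keyRefQ.poll()) != null) {
        map.remove(ref); // drops the stale entry, so the HashMap no longer grows without bound
        cleared++;
    }
    return cleared;
}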
Even when your weak references let go of the things they are referencing, they still do not get recycled themselves.
So eventually your hash will fill up with references to nothing and crash.
What you would need (if you wanted to do it this way) would be an event triggered by object deletion that went in and removed the reference from the hash (which would cause threading issues you need to be aware of as well).
I'm not a java expert at all, but I know in .NET when doing a lot of large object memory allocation you can get heap fragmentation to the point where only small pieces of contiguous memory are available for allocation even though much more memory appears as "free".
A quick Google search on "java heap fragmentation" brings up some seemingly relevant results, although I haven't taken a good look at them.
Others have correctly pointed out what the problem is; e.g. #roe, #Bill K.
But another way to solve this kind of problem (apart from scratching your head, asking on SO, etc.) is to look and see how the Sun-recommended approach works. In this case, you can find it in the source code for the WeakHashMap class.
There are a few ways to find Java source code:
If you have a decent Java IDE up and running, it should be able to show you the source code of any class in the class library.
Most J2SE JDK downloads include source JAR files for (at least) the public API classes.
You can specifically download full source distributions for the OpenJDK-based releases of Java.
But the ZERO EFFORT approach is to do a Google search, using the fully qualified name of the class with ".java.html" tacked on the end. For example, searching for "java.util.WeakHashMap.java.html" gives this link in the first page of search results.
And the source will tell you that the standard WeakHashMap implementation explicitly polls its reference queue to expunge stale (i.e. broken) weak references from the map's key set. In fact, it does this every time you access or update the map, or even just ask for its size.
Another problem might be that Java for some reason doesn't always activate its garbage collector when running out of memory, so you might need to insert explicit calls to activate the collector. Try something like
if ((putCount % 1000) == 0)
    Runtime.getRuntime().gc();
in your loop.
Edit: It seems that the new Java implementations from Sun now do call the garbage collector before throwing an OutOfMemoryError, but I am pretty sure that the following program would throw an OutOfMemoryError with JRE 1.3 or 1.4:
public class Test {
    public static void main(String args[]) {
        while (true) {
            byte[] data = new byte[1000000];
        }
    }
}

Any code tips for speeding up random reads from a Java FileChannel?

I have a large (3Gb) binary file of doubles which I access (more or less) randomly during an iterative algorithm I have written for clustering data. Each iteration does about half a million reads from the file and about 100k writes of new values.
I create the FileChannel like this...
f = new File(_filename);
_ioFile = new RandomAccessFile(f, "rw");
_ioFile.setLength(_extent * BLOCK_SIZE);
_ioChannel = _ioFile.getChannel();
I then use a private ByteBuffer the size of a double to read from it
private ByteBuffer _double_bb = ByteBuffer.allocate(8);
and my reading code looks like this
public double GetValue(long lRow, long lCol)
{
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long position = idx * BLOCK_SIZE;
    double d = 0;
    try
    {
        _double_bb.position(0);
        _ioChannel.read(_double_bb, position);
        d = _double_bb.getDouble(0);
    }
    ...snip...
    return d;
}
and I write to it like this...
public void SetValue(long lRow, long lCol, double d)
{
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long offset = idx * BLOCK_SIZE;
    try
    {
        _double_bb.putDouble(0, d);
        _double_bb.position(0);
        _ioChannel.write(_double_bb, offset);
    }
    ...snip...
}
The time taken for an iteration of my code increases roughly linearly with the number of reads. I have added a number of optimisations to the surrounding code to minimise the number of reads, but I am down to the core set that I feel is necessary without fundamentally altering how the algorithm works, which I want to avoid at the moment.
So my question is whether there is anything in the read/write code or JVM configuration I can do to speed up the reads? I realise I can change hardware, but before I do that I want to make sure that I have squeezed every last drop of software juice out of the problem.
Thanks in advance
As long as your file is stored on a regular hard disk, you will get the biggest possible speedup by organizing your data in a way that gives your accesses locality, i.e. causes as many get/set calls in a row as possible to access the same small area of the file.
This is more important than anything else you can do because accessing random spots on a HD is by far the slowest thing a modern PC does - it takes about 10,000 times longer than anything else.
So if it's possible to work on only a part of the dataset (small enough to fit comfortably into the in-memory HD cache) at a time and then combine the results, do that.
Alternatively, avoid the issue by storing your file on an SSD or (better) in RAM. Even storing it on a simple thumb drive could be a big improvement.
Instead of reading into a ByteBuffer, I would use file mapping, see: FileChannel.map().
Also, you don't really explain how your GetValue(row, col) and SetValue(row, col) access the storage. Are row and col more or less random? The idea I have in mind is the following: sometimes, for image processing, when you have to access pixels like row + 1, row - 1, col - 1, col + 1 to average values, one trick is to organize the data in 8 x 8 or 16 x 16 blocks. Doing so helps keep the different pixels of interest in a contiguous memory area (and hopefully in the cache).
You might transpose this idea to your algorithm (if it applies): you map a portion of your file once, so that the different calls to GetValue(row, col) and SetValue(row, col) work on this portion that's just been mapped.
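A minimal sketch of what that mapping could look like for this file is below. The class and field names are invented; it keeps the 8-byte blocks from the question and splits the file across several mappings because a single MappedByteBuffer cannot cover more than about 2 GB.
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedMatrixStorage {

    private static final long CHUNK_BYTES = 1L << 30; // 1 GiB per mapping, a multiple of 8
    private static final int BLOCK_SIZE = 8;          // one double per cell, as in the question
    private final MappedByteBuffer[] chunks;

    public MappedMatrixStorage(String filename, long extent) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(filename, "rw");
        long length = extent * BLOCK_SIZE;
        raf.setLength(length);
        FileChannel channel = raf.getChannel();
        int n = (int) ((length + CHUNK_BYTES - 1) / CHUNK_BYTES);
        chunks = new MappedByteBuffer[n];
        for (int i = 0; i < n; i++) {
            long pos = i * CHUNK_BYTES;
            long size = Math.min(CHUNK_BYTES, length - pos);
            chunks[i] = channel.map(FileChannel.MapMode.READ_WRITE, pos, size);
        }
    }

    public double getValue(long idx) { // idx = TriangularMatrix.CalcIndex(lRow, lCol)
        long position = idx * BLOCK_SIZE;
        return chunks[(int) (position / CHUNK_BYTES)].getDouble((int) (position % CHUNK_BYTES));
    }

    public void setValue(long idx, double d) {
        long position = idx * BLOCK_SIZE;
        chunks[(int) (position / CHUNK_BYTES)].putDouble((int) (position % CHUNK_BYTES), d);
    }
}
Reads and writes then go through the OS page cache, so repeated access to nearby values avoids a system call per double.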
Presumably if we can reduce the number of reads then things will go more quickly.
3Gb isn't huge for a 64 bit JVM, hence quite a lot of the file would fit in memory.
Suppose that you treat the file as "pages" which you cache. When you read a value, read the page around it and keep it in memory. Then when you do more reads check the cache first.
Or, if you have the capacity, read the whole thing into memory at the start of processing.
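A rough sketch of that page idea, using an LRU cache of 1024-double pages, could look like the following; the page size, capacity and the PageLoader callback are all assumptions, and writes would additionally need dirty-page tracking before eviction.
import java.util.LinkedHashMap;
import java.util.Map;

public class PageCache {

    private static final int DOUBLES_PER_PAGE = 1024; // 8 KiB per page
    private final int capacity;                       // number of pages kept in memory
    private final Map<Long, double[]> pages;

    public PageCache(int capacity) {
        this.capacity = capacity;
        // access-order LinkedHashMap that evicts the least recently used page
        this.pages = new LinkedHashMap<Long, double[]>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, double[]> eldest) {
                return size() > PageCache.this.capacity;
            }
        };
    }

    public double get(long idx, PageLoader loader) {
        long pageNo = idx / DOUBLES_PER_PAGE;
        double[] page = pages.computeIfAbsent(pageNo, loader::load); // one file read per page miss
        return page[(int) (idx % DOUBLES_PER_PAGE)];
    }

    /** Hypothetical callback that reads one whole page (1024 doubles) from the file. */
    public interface PageLoader {
        double[] load(long pageNo);
    }
}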
Byte-by-byte access always produces poor performance (not only in Java). Try to read/write bigger blocks (e.g. rows or columns).
How about switching to a database engine for handling such amounts of data? It would handle all the optimizations for you.
Maybe this article helps you ...
You might want to consider using a library which is designed for managing large amounts of data and random reads rather than using raw file access routines.
The HDF file format may be a good fit. It has a Java API but is not pure Java. It's licensed under an Apache-style license.
