i have been struggling to understand the following question.
Is zero-matrix necessarily memory efficient? does zero-matrix cost less memory (or does it not cost any memory)?
I tried to verify it in java but it turns out the memory has been allocated for the specified size.
I am not sure about C/C++ or other language like matlab and octave and how they manage the matrix and vector memory;
The reason why i am asking this is want to build a sparse matrix with huge size, but most of entries are zeros, it turns out that java is not a good choice, because zero-matrix in java still cost much memory. Does any one have any experience with this problem? not sure how you deal with it, your help will be appreciated.
Thanks
Straightforward zero-filled matrix will cost you in any language: the amount allocated does not depend on what numbers you fill it with.
Take a look at e.g. UJMP that provides sparse matrices support, and many algorithms. Probably other implementations exist.
In general, if you find something to be difficult to implement but likely useful, google for open-source libraries. Chances are many wheels have been invented already.
Because you have to allocate space for your matrix, it will take space independently of which numbers it will hold. (Even if null).
However I can imagine that someone somewhere has designed a data structure to handle this.
First thing that pops up in my mind is that you could create a data structure which holds positions with its according value. If you were to ask for the value at a given position that doesn't exist, you could return 0. Of course this would be inefficient for small matrices or matrices with only a few zeroes. Just an idea.
Related
I'm programming something in Java, for context see this question: Markov Model descision process in Java
I have two options:
byte[MAX][4] mypatterns;
or
ArrayList mypatterns
I can use a Java ArrayList and append a new arrays whenever I create them, or use a static array by calculating all possible data combinations, then looping through to see which indexes are 'on or off'.
Essentially, I'm wondering if I should allocate a large block that may contain uninitialized values, or use the dynamic array.
I'm running in fps, so looping through 200 elements every frame could be very slow, especially because I will have multiple instances of this loop.
Based on theory and what I have heard, dynamic arrays are very inefficient
My question is: Would looping through an array of say, 200 elements be faster than appending an object to a dynamic array?
Edit>>>
More information:
I will know the maxlength of the array, if it is static.
The items in the array will frequently change, but their sizes are constant, therefore I can easily change them.
Allocating it statically will be the likeness of a memory pool
Other instances may have more or less of the data initialized than others
You right really, I should use a profiler first, but I'm also just curious about the question 'in theory'.
The "theory" is too complicated. There are too many alternatives (different ways to implement this) to analyse. On top of that, the actual performance for each alternative will depend on the the hardware, JIT compiler, the dimensions of the data structure, and the access and update patterns in your (real) application on (real) inputs.
And the chances are that it really doesn't matter.
In short, nobody can give you an answer that is well founded in theory. The best we can give is recommendations that are based on intuition about performance, and / or based on software engineering common sense:
simpler code is easier to write and to maintain,
a compiler is a more consistent1 optimizer than a human being,
time spent on optimizing code that doesn't need to be optimized is wasted time.
1 - Certainly over a large code-base. Given enough time and patience, human can do a better job for some problems, but that is not sustainable over a large code-base and it doesn't take account of the facts that 1) compilers are always being improved, 2) optimal code can depend on things that a human cannot take into account, and 3) a compiler doesn't get tired and make mistakes.
The fastest way to iterate over bytes is as a single arrays. A faster way to process these are as int or long types as process 4-8 bytes at a time is faster than process one byte at a time, however it rather depends on what you are doing. Note: a byte[4] is actually 24 bytes on a 64-bit JVM which means you are not making efficient use of your CPU cache. If you don't know the exact size you need you might be better off creating a buffer larger than you need even if you are not using all the buffer. i.e. in the case of the byte[][] you are using 6x time the memory you really need already.
Any performance difference will not be visible, when you set initialCapacity on ArrayList. You say that your collection's size can never change, but what if this logic changes?
Using ArrayList you get access to a lot of methods such as contains.
As other people have said already, use ArrayList unless performance benchmarks say it is a bottle neck.
If I have to store 3 integer values and would like to just retrieve the same , no calculation is required.Which one of the following would be a better option?
int i,j,k;
or
int [] arr = new int[3];
Array would be allocating 3 continuous blocks of memory (after allocation of space by JVM) or randomly assigning variables to some memory location (which I guess would consume lesser time for JVM as compared to array).
Apologies if the question is too trivial.
The answer is: It depends.
You shouldn't think too much about the performance implications for this case. the performance difference between the two is not big enough to notice.
What you really need to be on the look out for is readability and maintainability.
if i, j, and k, all essentially mean the same thing, and you're going to be using them the same way, and you feel like you might want to iterate over them, then it might make sense to use an array, so that you can iterate over them more easily.
if they're different values, with different meanings, and you're going to be using them differently, than it does not makes sense to include them in an array. They should each have their own identity, and their own descriptive variable name.
Choose whichever makes most sense semantically:
If these variables are three for a fundamental reason (maybe they are coordinates in the 3D space of a 3D game engine), then use three separate variables (because making, say, a 4D game engine is not a trivial change).
If these variables are three now but they could be trivially changed to be four tomorrow, it's reasonable to consider an array (or, better yet, a new type that contains them).
In terms of performance, traditionally local variables are faster than arrays. Under specific circumstances, the array may be allocated on the stack. Under specific circumstances, bound checks can be removed.
But don't make decisions based on performance, unless you have first done everything else correctly first and you have thorough tests and this particular piece of code is a performance-critical hot-spot and you're sure that it is the bottleneck of your application at the moment.
It depends on how would you access them. Array is of course an overhead, because you will first calculate a reference to a value and then get it. So if these values are totally unrelated, array is bad, and it may even count as code obfuscation. But naming variables like i, j, k is sort of obfuscation, too. Obfuscation is better to do automatically at build stage, there are tools like Proguard™ which can do it.
The two are not the same at all and are for different purpose.
in the first example you gave int i,j,k; you are pushing the values on to the stack,
The stack is for short term use and small data sizes i.e. function call arguments and iterator states.
The second example you gave int [] arr = new int[3]; the new keyword is allocating actual memory for the heap hat was giving to the process by the operating system.
The stack is optimized for short term use and all (most) all CPUs have a registers that are dedicated to point at the stack location and base making the stack a grate place for small dirty variables. The stack is also limited in size (by theory), its only a few KB in size (average case).
The heap on he other hand is proper memory allocation for large data types and proper memory management.
So, the two may be used for the same thing but it dose not mean it's right.
Arrays/Objects/Dicts go in allocated memory from he heap, function arguments (and iterator indexes usually) go on the stack.
It depends, but most probably, using distinct variables is the way to go.
In general, don't do micro-optimizations. Nobody will ever notice any difference in performance. Readable and maintainable code is what really matters in high-level languages.
See this article on micro-optimizations.
I'm after an efficient 2D mapping algorithm, and I've tried a number of implementations, but they all seem lacking. I'm hoping the stackoverflow world can help out with some pointers to existing, tried-n-tested algorithms I could learn from.
My goal is to display articles based on the genre of writing; for the prototype, I am using Philosophy, Programming, Politics and Poetry, since those are the only four styles of writing I have.
Each article is weighted based on each category, and the home view will have each category as a header in each corner. The articles are then laid out in word-cloud-like format, with "artificial gravity" placing each item as-near-as-possible to its main category (or between its main categories), without overlapping.
Currently, I am using an inefficient algorithm which stores arrays of rectangles to perform hit-test-and-search every time an article is added to the view, (with A* search patterns to find empty space to fill). By approximating a single destination for all articles of the same weight, and by using a round-robin queue to pick off articles from each pool, I can achieve fresh results (arrays are sorted by weight, then timestamp), with positioning-by-relevance ("artificial gravity").
However, using A* to blindly search seems really wasteful, even with heuristics to make each article check closest to it's target marks first. I need a more efficient way to iterate over a 2D space.
I'm wondering if a Linked-List approach might work better; rather than go searching blindly in all directions for empty space, I can just iterate through connected nodes to ask each one if it has either a) nearby free space, or b) other connected nodes to ask (and always ask the closest node first).
If there are any better algorithms available, or critiques on my methods, any and all help would surely be appreciated.
I am using gwt elemental + java in this gui, but any 2D mapping algorithm in any language will surely help.
[EDIT (request for more details)] : The main problem here is the amount of work each new addition performs; it produces noticable glitches in the ui thread, especially when there is almost no space left, as I am searching many points in a given radius for enough free space to fit the article.
If I cut the algorithm off too soon, I get blank spots that could have been filled. If I let it run too long, the ui glitches pretty bad, and I'm sure users will hate it.
What is the fastest / most efficient way to store and modify collections of 2D space?
You haven't provided enough information to say what would make an algorithm "better." Faster? Produces layouts that are "nicer" by some metric for quality? Able to handler bigger data sources?
There is certainly nothing wrong with arrays, nor with A*. If they are giving acceptable results with the size of problem you are trying to solve, how can they be "wasteful?" Linked data structures are worthwhile only if they reduce cost of frequently needed operations.
If you sharpen the problem, you're more likely to get a useful answer.
At any rate, there is an enormous literature on "graph layout" and "graph drawing." Try searching on these terms. If you can represent your desired layout as a collection of nodes and edges, these might apply. Many are based on simulated spring systems, which seems akin to what you are doing.
For ultra-fast code it essential that we keep locality of reference- keep as much of the data which is closely used together, in CPU cache:
http://en.wikipedia.org/wiki/Locality_of_reference
What techniques are to achieve this? Could people give examples?
I interested in Java and C/C++ examples. Interesting to know of ways people use to stop lots of cache swapping.
Greetings
This is probably too generic to have clear answer. The approaches in C or C++ compared to Java will differ quite a bit (the way the language lays out objects differ).
The basic would be, keep data that will be access in close loops together. If your loop operates on type T, and it has members m1...mN, but only m1...m4 are used in the critical path, consider breaking T into T1 that contains m1...m4 and T2 that contains m4...mN. You might want to add to T1 a pointer that refers to T2. Try to avoid objects that are unaligned with respect to cache boundaries (very platform dependent).
Use contiguous containers (plain old array in C, vector in C++) and try to manage the iterations to go up or down, but not randomly jumping all over the container. Linked Lists are killers for locality, two consecutive nodes in a list might be at completely different random locations.
Object containers (and generics) in Java are also a killer, while in a Vector the references are contiguous, the actual objects are not (there is an extra level of indirection). In Java there are a lot of extra variables (if you new two objects one right after the other, the objects will probably end up being in almost contiguous memory locations, even though there will be some extra information (usually two or three pointers) of Object management data in between. GC will move objects around, but hopefully won't make things much worse than it was before it run.
If you are focusing in Java, create compact data structures, if you have an object that has a position, and that is to be accessed in a tight loop, consider holding an x and y primitive types inside your object rather than creating a Point and holding a reference to it. Reference types need to be newed, and that means a different allocation, an extra indirection and less locality.
Two common techniques include:
Minimalism (of data size and/or code size/paths)
Use cache oblivious techniques
Example for minimalism: In ray tracing (a 3d graphics rendering paradigm), it is a common approach to use 8 byte Kd-trees to store static scene data. The traversal algorithm fits in just a few lines of code. Then, the Kd-tree is often compiled in a manner that minimalizes the number of traversal steps by having large, empty nodes at the top of tree ("Surface Area Heuristics" by Havran).
Mispredictions typically have a probability of 50%, but are of minor costs, because really many nodes fit in a cache-line (consider that you get 128 nodes per KiB!), and one of the two child nodes is always a direct neighbour in memory.
Example for cache oblivious techniques: Morton array indexing, also known as Z-order-curve-indexing. This kind of indexing might be preferred if you usually access nearby array elements in unpredictable direction. This might be valuable for large image or voxel data where you might have 32 or even 64 bytes big pixels, and then millions of them (typical compact camera measure is Megapixels, right?) or even thousands of billions for scientific simulations.
However, both techniques have one thing in common: Keep most frequently accessed stuff nearby, the less frequently things can be further away, spanning the whole range of L1 cache over main memory to harddisk, then other computers in the same room, next room, same country, worldwide, other planets.
Some random tricks that come to my mind, and which some of them I used recently:
Rethink your algorithm. For example, you have an image with a shape and the processing algorithm that looks for corners of the shape. Instead of operating on the image data directly, you can preprocess it, save all the shape's pixel coordinates in a list and then operate on the list. You avoid random the jumping around the image
Shrink data types. Regular int will take 4 bytes, and if you manage to use e.g. uint16_t you will cache 2x more stuff
Sometimes you can use bitmaps, I used it for processing a binary image. I stored pixel per bit, so I could fit 8*32 pixels in a single cache line. It really boosted the performance
Form Java, you can use JNI (it's not difficult) and implement your critical code in C to control the memory
In the Java world the JIT is going to be working hard to achieve this, and trying to second guess this is likely to be counterproductive. This SO question addresses Java-specific issues more fully.
Dear StackOverflowers,
I am in the process of writing an application that sorts a huge amount of integers from a binary file. I need to do it as quickly as possible and the main performance issue is the disk access time, since I make a multitude of reads it slows down the algorithm quite significantly.
The standard way of doing this would be to fill ~50% of the available memory with a buffered object of some sort (BufferedInputStream etc) then transfer the integers from the buffered object into an array of integers (which takes up the rest of free space) and sort the integers in the array. Save the sorted block back to disk, repeat the procedure until the whole file is split into sorted blocks and then merge the blocks together.
The strategy for sorting the blocks utilises only 50% of the memory available since the data is essentially duplicated (50% for the cache and 50% for the array while they store the same data).
I am hoping that I can optimise this phase of the algorithm (sorting the blocks) by writing my own buffered class that allows caching data straight into an int array, so that the array could take up all of the free space not just 50% of it, this would reduce the number of disk accesses in this phase by a factor of 2. The thing is I am not sure where to start.
EDIT:
Essentially I would like to find a way to fill up an array of integers by executing only one read on the file. Another constraint is the array has to use most of the free memory.
If any of the statements I made are wrong or at least seem to be please correct me,
any help appreciated,
Regards
when you say limited, how limited... <1mb <10mb <64mb?
It makes a difference since you won't actually get much benefit if any from having large BufferedInputStreams in most cases the default value of 8192 (JDK 1.6) is enough and increasing doesn't ussually make that much difference.
Using a smaller BufferedInputStream should leave you with nearly all of the heap to create and sort each chunk before writing them to disk.
You might want to look into the Java NIO libraries, specifically File Channels and Int Buffers.
You dont give many hints. But two things come to my mind. First, if you have many integers, but not that much distinctive values, bucket sort could be the solution.
Secondly, one word (ok term), screams in my head when I hear that: external tape sorting. In early computer days (i.e. stone age) data relied on tapes, and it was very hard to sort data spread over multiple tapes. It is very similar to your situation. And indeed merge sort was the most often used sorting that days, and as far as I remember, Knuths TAOCP had a nice chapter about it. There might be some good hints about the size of caches, buffers and similar.