I have to declare in Java: protected final static int[] SIEVE = new int[1 << 32];
But I can't force Java to do that.
The biggest sieve I can get is 2^26, and I need 2^32 to finish my homework. I tried with a mask, but I need SIEVE[n] = k where k = min{k : k | n and k > 1}.
EDIT
I need to factorize numbers from 2 to 2^63 - 1 using a sieve, and the sieve must hold the information P[n] = the smallest prime that divides n. I know that with a sieve I can factorize numbers up to 2^52. But how do I do this exercise while keeping all that content in memory?
EDIT x2: problem solved
You can't. A Java array can have at most 2^31 - 1 elements because the size of an array has to fit in a signed 32-bit integer.
This applies whether you run on a 32-bit or 64-bit JVM.
I suspect that you are missing something in your homework. Is the requirement to be able to find all primes less than 2^32 or something? If that is the case, they expect you to treat each int of the int[] as an array of 32 bits. And you need an array of only 2^27 ints to do that ... if my arithmetic is right.
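The bit-packing idea can be sketched like this (a minimal, hypothetical helper class, not part of any standard API): each int holds 32 flags, so flag i lives in word i / 32 at bit position i % 32.

```java
// Sketch: an int[] treated as an array of bits, so 2^32 flags fit in
// 2^27 ints (512 MB). The demo below uses a much smaller size.
public class BitArray {
    private final int[] words;

    public BitArray(long nBits) {
        words = new int[(int) ((nBits + 31) >>> 5)]; // 32 bits per int
    }

    public void set(long i) {
        words[(int) (i >>> 5)] |= 1 << (i & 31);     // word i/32, bit i%32
    }

    public boolean get(long i) {
        return (words[(int) (i >>> 5)] & (1 << (i & 31))) != 0;
    }
}
```

Allocating the full 2^27-int backing array still needs a heap of at least 512 MB (e.g. -Xmx1g), but the indexing itself handles any long index below 2^32.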
A BitSet is another good alternative.
A LinkedList<Integer> is a poor alternative. It uses roughly 8 times the memory that an array of the same size would, and the performance of get(int) is going to be horribly slow for a long list ... assuming that you use it in the obvious fashion.
If you want something that can efficiently use as much memory as you can configure your JVM to use, then you should use an int[][] i.e. an array of arrays of integers, with the int[] instances being as large as you can make them.
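A sketch of such a chunked structure (class and constant names are my own; a real sieve would pick the largest chunk size the heap allows, while the demo constant here is kept small):

```java
// Sketch of a "big array" backed by int[][] chunks, so the total length
// can exceed Integer.MAX_VALUE. CHUNK_BITS is deliberately small for the
// demo; in practice you would use something like 2^27 ints per chunk.
public class BigIntArray {
    private static final int CHUNK_BITS = 16;            // 2^16 ints per chunk (demo size)
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;
    private final int[][] chunks;

    public BigIntArray(long length) {
        int nChunks = (int) ((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new int[nChunks][];
        long remaining = length;
        for (int i = 0; i < nChunks; i++) {
            chunks[i] = new int[(int) Math.min(remaining, CHUNK_SIZE)];
            remaining -= CHUNK_SIZE;
        }
    }

    public int get(long i) {
        return chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)];
    }

    public void set(long i, int v) {
        chunks[(int) (i >>> CHUNK_BITS)][(int) (i & CHUNK_MASK)] = v;
    }
}
```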
I need to factorize numbers from 2 to 2^63 - 1 using a sieve, and the sieve must hold the information P[n] = the smallest prime that divides n. I know that with a sieve I can factorize numbers up to 2^52. But how do I do this exercise while keeping all that content in memory?
I'm not really sure I understand you. To factorize a number in the region of 2^64, you only need prime numbers up to 2^32 ... not 2^52. (The square root of 2^64 is 2^32 and a non-prime number must have a prime factor that is less than or equal to its square root.)
It sounds like you are trying to sieve more numbers than you need to.
If you really need to store that much data in memory, try using java.util.LinkedList collection instead.
However, there's a fundamental flaw in your algorithm if you need to store 16GB of data in memory.
If you're talking about Sieve of Eratosthenes and you need to store all primes < 2^32 in an array, you still wouldn't need an array of size 2^32. I'd suggest you use java.util.BitSet to find the primes and either iterate and print or store them in a LinkedList as required.
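For illustration, a minimal Sieve of Eratosthenes over a java.util.BitSet might look like this (a sketch with a small limit, not tuned for anything near 2^31):

```java
import java.util.BitSet;

// Sieve of Eratosthenes using a BitSet: bit i ends up set iff i is prime.
public class BitSetSieve {
    static BitSet primes(int limit) {
        BitSet isPrime = new BitSet(limit);
        isPrime.set(2, limit);                      // start by assuming all >= 2 are prime
        for (int p = 2; (long) p * p < limit; p++) {
            if (isPrime.get(p)) {
                for (long m = (long) p * p; m < limit; m += p) {
                    isPrime.clear((int) m);         // cross out multiples of p
                }
            }
        }
        return isPrime;
    }
}
```

The primes can then be iterated with nextSetBit() and printed or copied into whatever collection the exercise requires.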
The descriptions of bitCount() and bitLength() are rather cryptic:
public int bitCount()
Returns the number of bits in the two's complement representation of this BigInteger that differ from its sign bit. This method is useful when implementing bit-vector style sets atop BigIntegers.
Returns:
number of bits in the two's complement representation of this BigInteger that differ from its sign bit.
public int bitLength()
Returns the number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit. For positive BigIntegers, this is equivalent to the number of bits in the ordinary binary representation. (Computes (ceil(log2(this < 0 ? -this : this+1))).)
Returns:
number of bits in the minimal two's-complement representation of this BigInteger, excluding a sign bit.
What is the real difference between these two methods and when should I use which?
I have used bitCount() occasionally to count the number of set bits in a positive integer, but I've only rarely used bitLength(), and usually when I meant bitCount(), because the differences between the descriptions are too subtle for me to instantly grok.
Google Attractor: Java BigInteger bitCount vs bitLength
A quick demonstration:
public void test() {
    BigInteger b = BigInteger.valueOf(0x12345L);
    System.out.println("b = " + b.toString(2));
    System.out.println("bitCount(b) = " + b.bitCount());
    System.out.println("bitLength(b) = " + b.bitLength());
}
prints
b = 10010001101000101
bitCount(b) = 7
bitLength(b) = 17
So, for positive integers:
bitCount() returns the number of set bits in the number.
bitLength() returns one more than the index of the highest set bit, i.e. the length of the binary representation of the number (i.e. floor(log2(n)) + 1 for positive n).
Another basic function is missing:
bitCount() is useful to find the cardinality of a set of integers;
bitLength() is useful to find the largest integer that is a member of the set;
getLowestSetBit() is still needed to find the smallest integer that is a member of the set (this is also needed to implement fast iterators over bitsets).
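As a sketch of that iteration pattern with BigInteger (the method name is mine): repeatedly take getLowestSetBit() to get the smallest member, then clear it and continue.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Iterating the members of a BigInteger used as a bit-vector set.
public class BitIteration {
    static List<Integer> members(BigInteger set) {
        List<Integer> out = new ArrayList<>();
        while (set.signum() != 0) {
            int member = set.getLowestSetBit(); // smallest element still in the set
            out.add(member);
            set = set.clearBit(member);         // remove it and continue
        }
        return out;
    }
}
```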
There are efficient ways to:
reduce a very large bitset to a bitCount() without having to shift each word stored in the bitset (e.g. 64-bit words) using a slow loop over each of the 64 bits. This requires no loop and can be computed with a small, bounded number of arithmetic operations on 64-bit words (with the additional benefits that no loop conditions need testing, parallelism is possible, and fewer than 64 operations are needed per 64-bit word), so the cost is O(1) time.
compute the bitLength(): you just need the number of words used to store the bitset, or its highest used index in an array of words, plus a few arithmetic operations on the single word stored at that index: on a 64-bit word, at most 8 arithmetic operations are sufficient, so the cost is O(1) time.
but for getLowestSetBit(): you still need to scan the low-order words for as long as they are all zero, so parallelization is difficult and the cost is O(N) time, where N is the bitLength() of the bitset; then a binary search locates the lowest set bit within the first non-zero word. I wonder whether the costly tests-and-branches on that word can be avoided using only arithmetic, so that full parallelism gives an answer in O(1) time for that last word.
In my opinion the 3rd problem requires a more efficient storage for bitsets than a flat array of words: we need a representation using a binary tree instead:
Suppose you want to store 64 bits in a bitset
this set is equivalent to storing 2 subsets A and B, of 32 bits each
but instead of naively storing {A, B} you can store {A or B, (A or B) xor A, (A or B) xor B}, where "or" and "xor" are bit-for-bit operations (this adds roughly 50% more data, by storing not just the two separate elements but their "sum" and their respective differences from that sum).
You can apply this recursively for 128 bits, 256 bits, and so on; in fact you could avoid the 50% cost at each step by summing more elements. Using the "xor" differences instead of the elements themselves can accelerate some operations (not shown here), like other compression schemes that are efficient on sparse sets.
This allows faster scanning for zeroes, because you can skip the null bits very fast, in O(log2(N)) time, and locate the words that have some non-zero bits: the all-zero pairs have (A or B) == 0.
Another common usage of bitsets is to let them represent their complement, but this is not easy when the number of integers that the set could have as members is very large (e.g. to represent a set of 64-bit integers): the bitset should then reserve at least one bit to indicate that it does NOT directly store the integers that are members of the set, but instead stores only the integers that are NOT members of the set.
And an efficient representation of the bitset using a tree-like structure should allow each node in the binary tree to choose whether it stores the members or the non-members, depending on the cardinality of members in each subrange (each subrange represents a subset of all integers between k and k + 2^n - 1, where k is the node number in the binary tree, each node storing a single word of n bits, one of these bits recording whether the word contains members or non-members).
There's an efficient way to store binary trees in a flat indexed array, if the tree is dense enough, i.e. has few words whose bits are all 0 or all 1. If this is not the case (for very "sparse" sets), you need something else using pointers, like a B-tree, where each page of the B-tree can be either a flat "dense" range or an ordered index of subtrees: you store the flat dense ranges in leaf nodes, which can be allocated in a flat array, and you store the other nodes separately in another store that can also be an array. Instead of a pointer from one node to another for a sub-branch of the B-tree, you use an index into that array; the index itself can have one bit indicating whether it points to another page of branches or to a leaf node.
But the current default implementation of bitsets in the Java collections does not use these techniques, so BitSets are still not efficient enough to store very sparse sets of large integers. You need your own library to reduce the storage requirement and still allow fast lookup in the bitset, in O(log2(N)) time, to determine whether an integer is a member of the set represented by this optimized bitset.
But anyway the default Java implementation is sufficient if you just need bitCount() and bitLength() and your bitsets are used for dense sets of small integers (for a set of 16-bit integers, a naive approach storing 64K bits, i.e. at most 8 KB of memory, is generally enough).
For very sparse sets of large integers (e.g. no more than one set bit in every 128 bits), it will always be more efficient to store a sorted array of integer values, or a hash table if the bitset would set no more than 1 bit in every range of 32 bits: you can still add an extra bit to these structures to store the "complement" bit.
But I've not found that getLowestSetBit() is efficient enough: the BigInteger package still cannot support very sparse bitsets without huge memory costs, even though BigInteger can easily represent the "complement" bit as a "sign bit" with its signum() and subtract() methods, which are efficient.
Very large and very sparse bitsets are needed, for example, for some well-known operations, like searches in very large databases of RDF tuples in a knowledge database, where each tuple is indexed by a very large GUID (represented by 128-bit integers): you need to be able to perform binary operations like unions, differences, and complements.
I am reading the implementation details of Java 8 HashMap, can anyone let me know why Java HashMap initial array size is 16 specifically? What is so special about 16? And why is it the power of two always? Thanks
The reason why powers of 2 appear everywhere is that when numbers are expressed in binary (as they are in circuits), certain math operations on powers of 2 are simpler and faster to perform (just think about how easy math with powers of 10 is in the decimal system we use). For example, multiplication is not a very efficient process in computers: circuits use a method similar to the one you use when multiplying two multi-digit numbers. Multiplying or dividing by a power of 2 only requires the computer to shift bits to the left (for multiplying) or to the right (for dividing).
And as for why 16 for HashMap? 10 is a commonly used default for dynamically growing structures (arbitrarily chosen), and 16 is not far off - but is a power of 2.
You can do modulus very efficiently for a power of 2. n % d = n & (d-1) when d is a power of 2, and modulus is used to determine which index an item maps to in the internal array - which means it occurs very often in a Java HashMap. Modulus requires division, which is also much less efficient than using the bitwise and operator. You can convince yourself of this by reading a book on Digital Logic.
The reason why bitwise and works this way for powers of two is because every power of 2 is expressed as a single bit set to 1. Let's say that bit is t. When you subtract 1 from a power of 2, you set every bit below t to 1, and every bit above t (as well as t) to 0. Bitwise and therefore saves the values of all bits below position t from the number n (as expressed above), and sets the rest to 0.
But how does that help us? Remember that when dividing by a power of 10, you can count the number of zeroes following the 1, and take that number of digits starting from the least significant of the dividend in order to find the remainder. Example: 637989 % 1000 = 989. A similar property applies to binary numbers with only one bit set to 1, and the rest set to 0. Example: 100101 % 001000 = 000101
There's one more thing about choosing hash & (n - 1) versus modulo, and that is negative hashes. hashCode() is of type int, which of course can be negative. In Java, the remainder of a negative number is also negative, while & never is.
Another reason is that you want all of the slots in the array to be equally likely to be used. Since hash() is evenly distributed over 32 bits, if the array size didn't divide into the hash space, then there would be a remainder causing lower indexes to have a slightly higher chance of being used. Ideally, not just the hash, but (hash() % array_size) is random and evenly distributed.
But this only really matters for data with a small hash range (like a byte or character).
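A tiny demo of both points, the bitwise trick and the negative-remainder pitfall:

```java
// Demonstrates that n & (d - 1) equals n % d for a power-of-two d,
// and that % on a negative dividend gives a negative (unusable) index.
public class ModDemo {
    public static void main(String[] args) {
        int n = 37, d = 8;                // d is a power of 2
        System.out.println(n % d);        // 5
        System.out.println(n & (d - 1));  // 5, same answer, no division

        int h = -37;                      // a negative hash code
        System.out.println(h % d);        // -5  (sign follows the dividend)
        System.out.println(h & (d - 1));  // 3   (always a valid array index)
    }
}
```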
I think the answer to this question should be 16GB, the following is how I calculate:
One integer is 32 bits
In java, the range of integers is from -2^31 to 2^31-1, so the total number of integers is 2^32
We need to have a int array with the size of 2^32 to do bucket sorting
So I got the result of 4 bytes * 2^32 = 16 GB
Can anyone tell me if this is correct? Because I found people saying this should be 4GB. I don't know how 4GB is calculated.
One example can be found:
https://www.quora.com/How-would-you-sort-a-100-TB-file-with-only-4-GB
The example question linked to is not asking the same thing. A bucket sort on 1TB of integers would need 4GB of buckets, where each bucket would be 1TB * sizeof(integer) in size. This would not be a reasonable approach.
As for the linked question, to sort a large file, some variation of an external bottom up k-way merge sort would be used, probably with k == 16. The trade off on k is the number of passes it will take to do the sort (fewer passes if k is larger), versus the compare time to find the smallest element of k elements for each element merged (longer compare time if k is larger).
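The merge step can be sketched in Java like this, using in-memory lists to stand in for the sorted subfiles (a simplified model of the external k-way merge, not production code):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// K-way merge of k sorted runs: a min-heap holds the current head of
// each run, and we repeatedly emit the smallest head.
public class KWayMerge {
    static List<Integer> merge(List<List<Integer>> runs) {
        // heap entries: {value, runIndex, positionInRun}
        PriorityQueue<int[]> heap =
                new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);                      // emit smallest head
            int r = top[1], next = top[2] + 1;
            if (next < runs.get(r).size()) {      // refill from the same run
                heap.add(new int[]{runs.get(r).get(next), r, next});
            }
        }
        return out;
    }
}
```

In the external version, each "run" would be a buffered reader over a sorted temporary file, and the output would stream to disk.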
I'm probably missing something, and I'm kind of rusty at this sort of thing, but I was wondering: what is the best way to compute 2 raised to a BigInteger power and store it in a variable? I'm sure it's a simple question. This is to aid in figuring out a 2-pseudoprime (basically, to see whether 2^(N-1) ≡ 1 (mod N) or not).
So if I understand you, you want to do something like this:
BigInteger bigExp = ... some really large value
BigInteger pow = BigInteger.valueOf(2).pow(bigExp);
Unfortunately, that won't work.
As you noted, there is no pow overload that takes a BigInteger argument. And when you think about it, such a method would be problematic.
The value of 2^N needs N+1 bits when represented in binary. If N is too large to fit into an int, that means N is 2^31 or more. Converting to bytes, that is 2^28 bytes, or 0.25 gigabytes. For a single number.
That isn't impossibly large. It should be possible to represent numbers that big, even in a 32-bit JVM. The problem is that any BigInteger arithmetic operation is liable to generate another one of these monsters. Just creating a number of this size means copying 0.25 gigabytes, and operations like multiplication and division, which are O(N^2) for N-bit numbers, are going to take "forever".
Having said that, there would be ways to generate numbers that size. For example, you could allocate a huge byte array (which default initializes to zero), set the appropriate byte to contain a 1 bit, and then use BigInteger(byte[]) to construct the monster number.
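A sketch of that byte-array construction (for illustration; in practice BigInteger.ONE.shiftLeft(n) builds the same value directly):

```java
import java.math.BigInteger;

// Builds 2^n by placing a single 1 bit in a zero-filled, big-endian
// two's-complement byte array. The extra leading zero byte keeps the
// sign positive even when the bit lands in the top position of a byte.
public class PowerOfTwo {
    static BigInteger twoToThe(int n) {
        byte[] bytes = new byte[n / 8 + 2];
        bytes[bytes.length - 1 - n / 8] = (byte) (1 << (n % 8));
        return new BigInteger(bytes);
    }
}
```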
As @StephenC points out, these numbers will be too unwieldy.
But
to see whether 2^(N-1) ≡ 1 (mod N) or not
You don't need to calculate 2^X. You can calculate 2^X mod N, which is a much smaller number.
BigInteger.valueOf(2).modPow(n.subtract(ONE), n);
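Putting it together, a minimal base-2 Fermat probable-prime check might look like this (the method name is mine):

```java
import java.math.BigInteger;

// Base-2 Fermat test: n is a 2-probable-prime if 2^(n-1) ≡ 1 (mod n).
// Composites that pass (like 341 = 11 * 31) are the 2-pseudoprimes.
public class Fermat {
    static boolean isTwoPseudoprime(BigInteger n) {
        BigInteger two = BigInteger.valueOf(2);
        return two.modPow(n.subtract(BigInteger.ONE), n).equals(BigInteger.ONE);
    }
}
```

modPow keeps every intermediate result below n, so the computation stays fast even for huge exponents.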
I want to sort 1 billion integers, and my system has just 1 GB of RAM. What would be the fastest and most efficient way to sort them?
Say we have the input in a text file, one integer per line.
We are using a Java program to sort.
I have specified the RAM because we cannot hold all the input integers in RAM at once.
Update: the integers are 7-digit numbers.
Integers are 7 digit numbers.
So there are only 10 million possible values.
You have 1GB of RAM. Make an array of counters, one for each possible value.
Read through the file once, count up the counters.
When done, output the numbers according to the final counter values.
Every number can occur at most 1 billion times, so a 32-bit counter would be enough. That means a 10M x 4 bytes = 40 MB array.
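The counter approach is just a counting sort. A small in-memory sketch (the real task would stream values from the file instead of taking an array):

```java
// Counting sort over a bounded value range, as the answer describes:
// one pass to count each value, one pass to emit values in order.
public class CountingSort {
    static int[] sort(int[] values, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int v : values) {
            counts[v]++;                       // first pass: tally
        }
        int[] out = new int[values.length];
        int i = 0;
        for (int v = 0; v <= maxValue; v++) {  // second pass: emit
            for (int c = 0; c < counts[v]; c++) {
                out[i++] = v;
            }
        }
        return out;
    }
}
```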
The simplest thing to do is break the input into smaller files that can fit in memory and sort each, and then merge the results.
Guido van Rossum has a good description of doing this in Python; while it's obviously not the same language, the principle is the same.
You specified that you are sorting a billion 7-(decimal)-digit numbers.
If there were no duplicates, you could sort in memory with 10^7 bits using radix sort. Since you must have duplicates (10^7 is less than 10^9), you could implement radix sort using (say) an array of 10^7 8-bit counters, with a HashMap<Integer, Integer> to deal with the relatively few cases where a counter overflows. Or just an array of 10^7 32-bit counters.
Another more general approach (that works for any kind of value) is to split the file into N smaller subfiles, sort each subfile in memory, and then perform an N-way merge of the sorted subfiles.
Using a BitSet, the 4 billion possible values occupy 512 MB. Just set a bit for each int value you see and write the values out in order (they are naturally sorted).
This only works if you don't care about duplicates.
If counting duplicates matters, I would still consider either a memory-mapped file for counting, or a merge sort of sorted subsections of the data. (I believe the latter is the expected answer.)
I recently bought a 24 GB PC for under £1K, so a few GB isn't that much unless you're limited by a hosted solution (or using a mobile device).
Assuming every integer occurs exactly once, you can read the file and set a bit for every number you find. The bit array has to hold 10,000,000 bits, which uses only 1.25 MB of RAM, which should be available... after you have read all the integers, you just go through the array and output the numbers whose bit is set...
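A small sketch of that bit-array approach using java.util.BitSet (only valid when the values are distinct, as this answer assumes):

```java
import java.util.BitSet;

// Sorting distinct bounded integers by marking each one in a BitSet and
// reading the set bits back in ascending order.
public class BitSort {
    static int[] sortDistinct(int[] values, int maxValue) {
        BitSet seen = new BitSet(maxValue + 1);
        for (int v : values) {
            seen.set(v);                          // mark each value
        }
        int[] out = new int[seen.cardinality()];
        int i = 0;
        // nextSetBit walks the set bits from low to high, i.e. sorted order
        for (int v = seen.nextSetBit(0); v >= 0; v = seen.nextSetBit(v + 1)) {
            out[i++] = v;
        }
        return out;
    }
}
```

Note that duplicates collapse into a single bit, which is exactly why this trick needs the "occurs exactly once" assumption.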