After Googling for a while for a way to find all subsets of a String, I read the Wikipedia article on power sets, which mentions:
.....For the whole power set of S we get:
{ } = 000 (Binary) = 0 (Decimal)
{x} = 100 = 4
{y} = 010 = 2
{z} = 001 = 1
{x, y} = 110 = 6
{x, z} = 101 = 5
{y, z} = 011 = 3
{x, y, z} = 111 = 7
Is there a way to implement this in a program while avoiding a recursive algorithm that depends on the string length?
What I have understood so far is that, for a String of length n, we can run from 0 to 2^n - 1 and print the characters for the on bits.
What I couldn't work out is how to map those on bits to the corresponding characters in the most optimized manner.
PS: I checked the thread "Power set generated by bits" but couldn't understand it, and it is in C++.
The idea is that a power set of a set of size n has exactly 2^n elements, exactly the same number as there are different binary numbers of length at most n.
Now all you have to do is create a mapping between the two, and you don't need a recursive algorithm. Fortunately, binary numbers give you an intuitive and natural mapping: add the character at position j in the string to a subset if your loop variable has bit j set. You can check that easily with the getBit() method in the code below (you could inline it, but I made it a separate function for readability).
P.S. As requested, more detailed explanation on the mapping:
If you have a recursive algorithm, your flow is given by how you traverse your data structure in the recursive calls. It is as such a very intuitive and natural way of solving many problems.
If you want to solve such a problem without recursion for whatever reason, for instance to use less time and memory, you have the difficult task of making this traversal explicit.
As we use a loop with a loop variable which assumes a certain set of values, we need to make sure to map each value of the loop variable, e.g. 42, to one element, in our case a subset of s, in a way that we have a bijective mapping, that is, we map to each subset exactly once. Because we have a set the order does not matter, so we just need whatever mapping that satisfies these requirements.
Now we look at a binary number, e.g. 42 = 32+8+2 and as such in binary with the position above:
543210
101010
We can thus map 42 to a subset as follows using the positions:
order the elements of the set s in any way you like but consistently (always the same in one program execution), we can in our case use the order in the string
add an element e_j if and only if the bit at position j is set (equal to 1).
As each number has at least one digit different from any other, we always get different subsets, and thus our mapping is injective (different input -> different output).
Our mapping is also valid, as the binary numbers we chose have at most the length equal to the size of our set so the bit positions can always be assigned to an element in the set. Combined with the fact that our set of inputs is chosen to have the same size (2^n) as the size of a power set, we can follow that it is in fact bijective.
import java.util.HashSet;
import java.util.Set;

public class PowerSet
{
    // True if the bit at position pos (0 = lowest) of i is set.
    static boolean getBit(int i, int pos) { return (i & (1 << pos)) != 0; }

    // Assumes s has fewer than 31 characters, so that 1 << s.length() fits in an int.
    static Set<Set<Character>> powerSet(String s)
    {
        Set<Set<Character>> pow = new HashSet<>();
        // i runs over all 2^n bit patterns, one per subset
        for (int i = 0; i < (1 << s.length()); i++)
        {
            Set<Character> subSet = new HashSet<>();
            for (int j = 0; j < s.length(); j++)
            {
                if (getBit(i, j)) { subSet.add(s.charAt(j)); }
            }
            pow.add(subSet);
        }
        return pow;
    }

    public static void main(String[] args)
    { System.out.println(powerSet("xyz")); }
}
Here is an easy way to do it (pseudocode):
for(int i=0;i<2^n;i++) {
char subset[];
int k = i;
int c = 0;
while(k>0) {
if(k%2==1) {
subset.add(string[c]);
}
k = k/2;
c++;
}
print subset;
}
Explanation: The code repeatedly divides the number by 2 and uses the remainder to read off its binary digits, lowest bit first. It then selects exactly those indices of the string whose bit is 1.
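The pseudocode above can be sketched in Java roughly as follows. This is only a minimal sketch: the class and method names are made up for illustration, 2^n is written as 1 << n (in Java, ^ is XOR, not exponentiation), and it assumes the string has fewer than 31 characters.

```java
import java.util.ArrayList;
import java.util.List;

class SubsetsByDivision {
    // Returns every subset of the characters of s, one string per subset.
    // Assumes s.length() < 31 so that 1 << n fits in an int.
    static List<String> subsets(String s) {
        int n = s.length();
        List<String> result = new ArrayList<>();
        for (int i = 0; i < (1 << n); i++) {
            StringBuilder subset = new StringBuilder();
            int k = i;   // remaining bits of i
            int c = 0;   // index of the character that the lowest bit of k refers to
            while (k > 0) {
                if (k % 2 == 1) {
                    subset.append(s.charAt(c));
                }
                k /= 2;
                c++;
            }
            result.add(subset.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(subsets("xyz"));
    }
}
```

Here bit 0 selects the first character of the string, so the subsets come out in counting order of i rather than in the order of the Wikipedia table.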
I am trying to find a way to calculate and print the Ascii distance between a string from user input
Scanner scan = new Scanner(System.in);
System.out.print("Please enter a string of 5 uppercase characters:");
String userString = scan.nextLine();
and a randomly generated string
int leftLimit = 65; // Upper-case 'A'
int rightLimit = 90; // Upper-case 'Z'
int stringLength = 5;
Random random = new Random();
String randString = random.ints(leftLimit, rightLimit + 1)
.filter(i -> (i <= 57 || i >= 65) && (i <= 90 || i >= 97))
.limit(stringLength)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
Is there a way to calculate the distance without having to separate each individual character from the two strings, comparing them and adding them back together?
Use Edit distance (Levenshtein distance)
You can
Implement your own edit distance based on the algorithm on wikipedia,
you can use an existing source code, for that look at rosetta code.
use an existing library like apache LevenshteinDistance
you can also check
Levenshtein Distance on stackoverflow
Streams are, well, as the name says, streams. They don't work very well unless you can define an operation strictly on the basis of one input: One element from a stream, without knowing its index or referring to the entire collection.
Here, that is a problem; after all, to operate on, say, the 'H' in your input, you need the matching character from your random code.
I'm not sure why you find 'separate each individual character, compare them, and add them back together' so distasteful. Isn't that a pretty clean mapping from the problem description to instructions for your computer to run?
The alternative is more convoluted: You could attempt to create a mixed object that contains both the letter and its index, stream over this, and use the index to look up the character in the second string. Alternatively, you could attempt to create a mixed object containing both characters (so, for inputs ABCDE and HELLO, an object containing both A and H), but you'd be writing far more code to get that set up than the simple, no-streams way.
So, let's start with the simple way:
int difference = 0;
for (int i = 0; i < stringLength; i++) {
char a = inString.charAt(i);
char b = randomString.charAt(i);
difference += difference(a, b);
}
You'd have to write the difference method yourself - but it'd be a very very simple one-liner.
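One possible reading of "difference" is the absolute difference of the two character codes. Under that assumption (the question does not define the distance precisely), a minimal sketch of the helper and the loop could look like this; the class name is hypothetical and both strings are assumed to have the same length.

```java
class AsciiDistance {
    // Assumed definition: absolute difference between the two character codes.
    static int difference(char a, char b) {
        return Math.abs(a - b);
    }

    // Sums the per-character differences; assumes equal-length strings.
    static int distance(String inString, String randomString) {
        int total = 0;
        for (int i = 0; i < inString.length(); i++) {
            total += difference(inString.charAt(i), randomString.charAt(i));
        }
        return total;
    }

    public static void main(String[] args) {
        // 7 + 3 + 9 + 8 + 10
        System.out.println(distance("HELLO", "ABCDE")); // prints 37
    }
}
```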
Trying to take two collections of some sort, and from them create a single stream where each element in the stream is matching elements from each collection (so, a stream of ["HA", "EB", "LC", "LD", "OE"]) is generally called 'zipping' (no relation to the popular file compression algorithm and product), and java doesn't really support it (yet?). There are some third party libraries that can do it, but given that the above is so simple I don't think zipping is what you're looking for here.
If you absolutely must, I guess it'd look something like:
// a stream of 0,1,2,3,4
IntStream.range(0, stringLength)
// map 0 to "HA", 1 to "EB", etcetera
.mapToObj(idx -> "" + inString.charAt(idx) + randomString.charAt(idx))
// map "HA" to the difference score
.mapToInt(x -> difference(x))
// and sum it.
.sum();
public int difference(String a) {
// exercise for the reader
}
Create a 2D array and fill it with distances, so that you can index directly into it to pull out the distance between any two characters.
The total is then a single expression that sums up a set of array accesses.
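As a rough sketch of the lookup-table idea, assuming 7-bit ASCII input, equal-length strings, and absolute difference as the distance (none of which the answer states explicitly), with hypothetical names:

```java
class DistanceTable {
    // TABLE[a][b] holds |a - b| for every pair of 7-bit ASCII codes.
    static final int[][] TABLE = new int[128][128];
    static {
        for (int a = 0; a < 128; a++) {
            for (int b = 0; b < 128; b++) {
                TABLE[a][b] = Math.abs(a - b);
            }
        }
    }

    // Sums table lookups; both strings must be ASCII and of equal length.
    static int distance(String s, String t) {
        int total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += TABLE[s.charAt(i)][t.charAt(i)];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(distance("HELLO", "ABCDE"));
    }
}
```

For a distance as cheap as a subtraction the table is unlikely to pay off; it becomes more interesting when the per-pair distance is expensive to compute.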
Here is my code for this (ASCII distance) in MATLAB
function z = asciidistance(input0)
if nargin ~= 1
error('please enter a string');
end
size0 = size(input0);
if size0(1) ~= 1
error ('please enter a string');
end
length0 = size0(2);
rng('shuffle');
a = 32;
b = 127;
string0 = (b-a).*rand(1,length0) + a; % row vector, so it lines up element-wise with input0
x = char(floor(string0));
z = (input0 - x);
ascii0 = sum(abs(z),'all');
ascii1 = abs(sum(z,'all'));
disp(ascii0);
disp(ascii1);
disp(ascii0/ascii1/length0);
end
This script also differentiates between the absolute ASCII distance on a per-character basis vs that on a per-string basis, thus resulting in two integers returned for the ASCII distance.
I have also included the limit of these two values, the value of which approaches the inverse of the length of strings being compared. This actually approximates the entropy, E, of every random string generation event when run.
After standard error checking, the script first finds the length of the input string. The rng function seeds the random number generator. The a and b variables define the ASCII table minus non-printable characters, which ends at 126, inclusively. 127 is actually used as an upper bound so that the next line of code can generate a random vector of character codes with the same length as the input. The following line of code turns those codes into the characters provided by the ASCII table. The following line of code subtracts the two strings element-wise and stores the result. The next two lines of code sum up the ASCII distances in the two ways mentioned in the first paragraph. Finally, the values are printed out, along with the entropy, E, of the random string generation event.
I want to generate an endless series of quasi random numbers to the following specification:-
Source of numbers is uniformly distributed and random, ranging 0 through 255 inclusive. It's an existing hardware device.
Required output range is 1 through 8 inclusive.
Two consecutive output numbers are never the same. For example 5 will never follow 5, but you can have 5,2,5.
Exactly one output number is required for every single source number. Rejection sampling therefore cannot be used. And while() loops, shuffles etc. can't be used.
It's this last stipulation that's vexing me. The source generator can only supply random bytes at a constant 1 /s and I want output at a constant 1 /s. Typically you'd simply reject a generated number if it was equal to the previous one, and generate another. In my case you only get one shot at each output. I think that it's some sort of random selection process, but this requirement has me going around in circles as I'm a bad programmer. An algorithm, flowchart or picture will do, but I'll be implementing in Java.
Apologies for the semi generic title, but I couldn't really think of anything more accurate yet concise.
If I understand the problem correctly, the first random number will be chosen randomly from among 8 different numbers (1 to 8), while every successive random number will be chosen from 7 different possibilities (1 to 8 excluding the previous one). Thus, your range of 256 values will need to be divided into 7 possibilities. It won't come out even, but that's the best you can do. So you need something like
public class RandomClass {
    private final HardwareSource source;
    private boolean first;
    private int prev;

    public RandomClass(HardwareSource source) {
        this.source = source;
        first = true;
    }

    public int nextRandom() {
        int sourceValue = source.read();   // assumed to return 0..255
        int value;
        if (first) {
            value = sourceValue % 8 + 1;   // 1..8
            first = false;
        } else {
            value = sourceValue % 7 + 1;   // 1..7
            if (value >= prev) {
                value++;                   // skip over the previous output
            }
        }
        prev = value;
        return value;
    }
}
Suppose the first call generates 5. The second time you call it, value is first computed to be a number from 1 to 7; by incrementing it if the value is >= 5, the range of possible outputs becomes 1, 2, 3, 4, 6, 7, 8. The output will be almost evenly distributed among those seven values. Since 256 is not divisible by 7, the distribution isn't quite even, and there will be a slight bias toward the lower numbers. You could fix it so that the bias will shift on each call and even out over the entire sequence; I believe one way is
value = (sourceValue + countGenerated) % 7 + 1;
where you keep track of how many numbers you've generated.
I think this is better than solutions that take the input modulo 8 and add 1 if the number equals the previous one. Those solutions will generate prev + 1 with twice the probability of generating other numbers, so it's more skewed than necessary.
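Putting the skip-over-previous idea together with the rotating offset, a self-contained sketch might look like the following. The class name and the convention that the caller passes in the source byte (rather than reading a hardware device) are assumptions made for illustration; the counter is kept modulo 7 so it cannot overflow.

```java
class NoRepeat {
    private int prev = 0;    // 0 means "no output generated yet"
    private int offset = 0;  // number of outputs so far, kept modulo 7

    // sourceByte is expected to be in 0..255; returns a value in 1..8
    // that never equals the previously returned value.
    int next(int sourceByte) {
        int value;
        if (prev == 0) {
            value = sourceByte % 8 + 1;              // first output: 1..8
        } else {
            value = (sourceByte + offset) % 7 + 1;   // 1..7, bias rotates with offset
            if (value >= prev) {
                value++;                             // skip over prev
            }
        }
        prev = value;
        offset = (offset + 1) % 7;
        return value;
    }
}
```

Each call consumes exactly one source byte and produces exactly one output, so no rejection loop is needed.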
int sum=0;
int prev=-1;
int next(int input){
sum=(sum+input)%8;
if(sum==prev)sum=(sum+1)%8;
prev=sum;
return sum+1;
}
(As I interpret even with the new bold emphasis, it is not required to always generate the same output value for the same input value - that would make the task impossible to solve)
Ok, I have a project that requires me to have a dynamic hash table that counts the frequency of words in a file. I must use java, however, we are not allowed to use any built in data types or built in classes at all except standard arrays. Also, I am not allowed to use any hash functions off the internet that are known to be fast. I have to make my own hash functions. Lastly, my instructor also wants my table to start as size "1" and double in size every time a new key is added.
My first idea was to sum the ASCII values of the letters composing a word and use that to make a hash function, but different words with the same letters will equal the same value.
How can I get started? Is the ASCII idea on the right track?
A hash table isn't expected to have in general a one-to-one mapping between a value and a hash. A hash table is expected to have collisions. That is, the domain of the hash-function is expected to be larger than the range (i.e., the hash value). However, the general idea is that you come up with a hash function where the probability of collision is drastically small. If your hash-function is uniform, i.e., if you have it designed such that each possible hash-value has the same probability of being generated, then you can minimize collisions this way.
Getting a collision isn't the end of the world. That just means that you have to search the list of values for that hash. If your hashing function is good, overall your performance for lookup should still be O(1).
Generating hashing functions is a subject of its own, and there is no one answer. But a good place for you to start could be to work with the bitwise representations of the characters in the string, and perform some sort of convolution operations on them (rotate, shift, XOR) in series. You could perform these in some way based on some initial seed-value, and then use the output of the first step of hashing as a seed for the next step. This way you can end up magnifying the effects of your convolution.
For example, let's say you get the character A, which is 41 in hex, or 0100 0001 in binary. You could designate each bit to mean some operation (maybe bit 0 is a ROR when it is 0, and a ROL when it is 1; bit 1 is an OR when it is 0, and a XOR when it is 1, etc.). You could even decide how much convolution you want to do based on the value itself. For example, you could say that the lower nibble specifies how much right-rotation you will do, and the upper nibble specifies how much left rotation you will do. Then once you have the final value, you will use that as the seed for the next character. These are just some ideas. Use your imagination and see what you get!
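As one very simple instance of the rotate-and-XOR idea (not the specific scheme described above, and not a claim about hash quality), the running hash can be rotated left by a fixed amount before each character is XORed in. The class name, the rotation amount of 5, and the seed of 0 are arbitrary choices for illustration.

```java
class RotateXorHash {
    // Rotates the running hash left by 5 bits, then XORs in the next character.
    // The result of each step acts as the seed for the next one.
    static int hash(String word, int tableSize) {
        int h = 0;                          // arbitrary seed
        for (int i = 0; i < word.length(); i++) {
            h = (h << 5) | (h >>> 27);      // 32-bit rotate left by 5
            h ^= word.charAt(i);
        }
        return (h & 0x7fffffff) % tableSize; // clear the sign bit before reducing
    }

    public static void main(String[] args) {
        System.out.println(hash("ab", 1024));
        System.out.println(hash("ba", 1024));
    }
}
```

Because the rotation makes each character's contribution depend on its position, words with the same letters in a different order, such as "ab" and "ba", no longer collide automatically the way they do with a plain sum.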
It does not matter how good your hash function is, you will always have collisions you need to resolve.
If you want to keep your approach of using the ASCII values of the letters, you shouldn't just add the values; this would lead to a lot of collisions. You should weight each value by a power that depends on its position. For example, for the word "Help" you would compute 'H' * 256^0 + 'e' * 256^1 + 'l' * 256^2 + 'p' * 256^3. Or in pseudocode:
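int hash(String word, int hashSize)
    int res = 0
    int count = 0
    for char c in word
        res += c * 256^count
        count++
        count = count mod 4    // 256^4 no longer fits in a 32-bit int, so wrap the exponent
    return res mod hashSize    // make sure res is non-negative before taking the modulus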
Now you just have to write your own Hashtable:
class WordCounterMap
    Entry[] entrys = new Entry[1]

    void add(String s)
        int hash = hash(s, entrys.length)
        // probe for an existing entry for s
        while(entrys[hash] != null)
            if(entrys[hash].word.equals(s))
                entrys[hash].count++
                return
            hash++
            hash = hash mod entrys.length
        // s is a new key: double the table, rehash the old entries, then insert s
        Entry[] temp = new Entry[entrys.length * 2]
        for(Entry e : entrys)
            if(e != null)
                insert(temp, e)
        insert(temp, new Entry(s))
        entrys = temp

    void insert(Entry[] table, Entry e)
        int hash = hash(e.word, table.length)
        while(table[hash] != null)    // linear probing on collision
            hash++
            hash = hash mod table.length
        table[hash] = e

    int getCount(String s)
        int hash = hash(s, entrys.length)
        while(entrys[hash] != null)
            if(entrys[hash].word.equals(s))
                return entrys[hash].count
            hash++
            hash = hash mod entrys.length
        return 0

class Entry
    int count
    String word
    Entry(String s)
        this.word = s
        count = 1
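A compact Java sketch of such a table, using only arrays and String, might look like this. It takes the "double on every new key" requirement literally, resolves collisions by linear probing, and uses a position-weighted hash; all names are hypothetical, and whether String.equals and charAt count as allowed is an assumption about the assignment.

```java
class WordCounter {
    private String[] words = new String[1];  // table starts at size 1
    private int[] counts = new int[1];

    // Position-weighted sum: character i is shifted by 8 * (i mod 4) bits,
    // which is the same as multiplying by 256^(i mod 4).
    private static int hash(String word, int size) {
        int res = 0;
        for (int i = 0; i < word.length(); i++) {
            res += word.charAt(i) << (8 * (i % 4));
        }
        return (res & 0x7fffffff) % size;    // clear the sign bit before reducing
    }

    // Returns the slot holding word, or the first empty slot on its probe path.
    private static int find(String[] table, String word) {
        int i = hash(word, table.length);
        while (table[i] != null && !table[i].equals(word)) {
            i = (i + 1) % table.length;      // linear probing
        }
        return i;
    }

    void add(String word) {
        int i = find(words, word);
        if (words[i] != null) {              // existing key: just count it
            counts[i]++;
            return;
        }
        // New key: double the table, rehash the old entries, then insert.
        String[] newWords = new String[words.length * 2];
        int[] newCounts = new int[newWords.length];
        for (int j = 0; j < words.length; j++) {
            if (words[j] != null) {
                int k = find(newWords, words[j]);
                newWords[k] = words[j];
                newCounts[k] = counts[j];
            }
        }
        int k = find(newWords, word);
        newWords[k] = word;
        newCounts[k] = 1;
        words = newWords;
        counts = newCounts;
    }

    int getCount(String word) {
        int i = find(words, word);
        return words[i] == null ? 0 : counts[i];
    }
}
```

Doubling on every new key means the table has 2^k slots after k distinct words, so this literal reading only works for a couple of dozen distinct words before memory runs out; doubling only when the table passes a load threshold is the usual alternative.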
I have a decimal number which I need to convert to binary and then find the position of one's in that binary representation.
Input is 5 whose binary is 101 and Output should be
1
3
Below is my code which only provides output as 2 instead I want to provide the position of one's in binary representation. How can I also get position of set bits starting from 1?
public static void main(String args[]) throws Exception {
System.out.println(countBits(5));
}
private static int countBits(int number) {
boolean flag = false;
if (number < 0) {
flag = true;
number = ~number;
}
int result = 0;
while (number != 0) {
result += number & 1;
number = number >> 1;
}
return flag ? (32 - result) : result;
}
Your idea of having countBits return the result, instead of putting a System.out.println inside the method, is generally the best approach. If you want it to return a list of bit positions, the analogue would be to have your method return an array or some kind of List, like:
private static List<Integer> bitPositions(int number) {
As I mentioned in my comments, you will make life a lot easier for yourself if you use >>> and get rid of the special code to check for negatives. Doing this, and adapting the code you already have, gives you something like
// requires: import java.util.ArrayList; import java.util.List;
private static List<Integer> bitPositions(int number) {
    List<Integer> positions = new ArrayList<>();
    int position = 1;
    while (number != 0) {
        // parentheses are needed: != binds more tightly than &
        if ((number & 1) != 0) {
            positions.add(position);
        }
        position++;
        number = number >>> 1;
    }
    return positions;
}
Now the caller can do what it wants to print the positions out. If you use System.out.println on it, the output will be [1, 3]. If you want each output on a separate line:
for (Integer position : bitPositions(5)) {
System.out.println(position);
}
In any case, the decision about how to print the positions (or whatever else you want to do with them) is kept separate from the logic that computes the positions, because the method returns the whole list and doesn't have its own println.
(By the way, as Alex said, it's most common to think of the lower-order bit as "bit 0" instead of "bit 1", although I've seen hardware manuals that call the low-order bit "bit 31" and the high-order bit "bit 0". The advantage of calling it "bit 0" is that a 1 bit in position N represents the value 2^N, making things simple. My code example calls it "bit 1" as you requested in your question; but if you want to change it to 0, just change the initial value of position.)
Binary representation: Your number, like anything on a modern day (non-quantum) computer, is already a binary representation in memory, as a sequence of bits of a given size.
Bit operations
You can use bit shifting, bit masking, 'AND', 'OR', 'NOT' and 'XOR' bitwise operations to manipulate them and get information about them on the level of individual bits.
Your example
For your example number of 5 (101) you mentioned that your expected output would be 1, 3. This is a bit odd, because generally speaking one would start counting at 0, e.g. for 5 as a byte (8 bit number):
76543210 <-- bit index
5 00000101
So I would expect the output to be 0 and 2 because the bits at those bit indexes are set (1).
Your sample implementation shows the code for the function
private static int countBits(int number)
Its name and signature imply the following behavior for any implementation:
It takes an integer value number and returns a single output value.
It is intended to count how many bits are set in the input number.
I.e. it does not match at all with what you described as your intended functionality.
A solution
You can solve your problem using a combination of a 'bit shift' (>>>) and an AND (&) operation. The unsigned shift >>> is used rather than >> so that the loop also terminates for negative input.
int index = 0; // start at bit index 0
while (inputNumber != 0) { // If the number is 0, no bits are set
    // check if the lowest bit (the one at the current index) is set
    if ((inputNumber & 1) == 1)
        System.out.println(index); // it is, print its bit index.
    // advance to the next bit position to check
    inputNumber = inputNumber >>> 1; // shift all bits one position to the right
    index = index + 1; // so we are now looking at the next index.
}
If we were to run this for your example input number '5', we would see the following:
iteration  input  76543210  index  result
1          5      00000101  0      1 => bit set.
2          2      00000010  1      0 => bit not set.
3          1      00000001  2      1 => bit set.
4          0      00000000  3      Stop, because inputNumber is 0
You'll need to keep track of what position you're on, and when number & 1 results in 1, print out that position. It would look something like:
...
int position = 1;
while (number != 0) {
if((number & 1)==1)
System.out.println(position);
result += number & 1;
position += 1;
number = number >> 1;
}
...
There is a way around working with bit-wise operations to solve your problem.
Integer.toBinaryString(int number) converts an integer to a String composed of zeros and ones. This is handy in your case because you could instead have:
public static void main(String args[]) throws Exception {
countBits(5);
}
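public static void countBits(int x) {
    String binaryStr = Integer.toBinaryString(x);
    int length = binaryStr.length();
    // the last character is the lowest bit, so walk the string from the end
    for (int i = length - 1; i >= 0; i--) {
        if (binaryStr.charAt(i) == '1')
            System.out.println(length - i); // 1-based position, counted from the right
    }
}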
It bypasses what you might be trying to do (learn bitwise operations in Java), but makes the code look cleaner in my opinion.
The combination of Integer.lowestOneBit and Integer.numberOfTrailingZeros instantly gives the position of the lowest 1-Bit, and returns 32 iff the number is 0.
Therefore, the following code returns the positions of 1-Bits of the number number in ascending order:
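// requires: import java.util.LinkedList; import java.util.List;
public static List<Integer> BitOccurencesAscending(int number)
{
    LinkedList<Integer> out = new LinkedList<>();
    int x = number;
    while (number != 0) // != rather than > so that negative numbers are handled too
    {
        x = Integer.lowestOneBit(number);     // isolate the lowest set bit
        number -= x;                          // clear it
        x = Integer.numberOfTrailingZeros(x); // its 0-based position
        out.add(x);
    }
    return out;
}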
I have the following piece of code:
public class Main {
private static final Random rnd = new Random();
private static int getRand(int n) {
return (Math.abs(rnd.nextInt())%n);
}
public static void main(String[] args) {
int count=0, n = 2 * (Integer.MAX_VALUE/3);
for(int i=0; i<1000000; i++) {
if(getRand(n) < n/2) {
count++;
}
}
System.out.print(count);
}
}
This always gives me a number close to 666,666, meaning two-thirds of the numbers generated fall in the lower half of the range [0, n). Note that this is obtained when n = 2/3 * Integer.MAX_VALUE. 4/7 is another fraction that gives me a similar spread (~571,428). However, I get an even spread if n = Integer.MAX_VALUE or if n = Integer.MAX_VALUE/2. Why does this behavior differ with the fraction used? Can somebody shed some light on it?
PS: I got this problem from the book Effective Java by Joshua Bloch.
The problem is in the modulo (%) operator which results in an uneven distribution of numbers.
For example, imagine MAX_INT is 10 and n = 7. The mod operator will map the values 7, 8, 9 and 10 to 0, 1, 2 and 3, respectively. As a result, the numbers 0, 1, 2 and 3 will have double the probability of all other numbers.
One way to solve this is to check the output of rnd.nextInt() and try again whenever it falls beyond the largest multiple of n that fits in the range.
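The retry idea can be sketched as follows. The class and method names are hypothetical, and the sketch draws a 31-bit non-negative value with rnd.nextInt() >>> 1 instead of Math.abs, which is an assumption made here to get an exactly uniform source on [0, 2^31).

```java
import java.util.Random;

class UnbiasedRand {
    // Rejection sampling: retry while the draw lands in the incomplete
    // last block, so every residue 0..n-1 is equally likely. Assumes n > 0.
    static int getRand(Random rnd, int n) {
        long range = 1L << 31;             // number of non-negative int values
        long limit = range - range % n;    // largest multiple of n that is <= 2^31
        int v;
        do {
            v = rnd.nextInt() >>> 1;       // uniform on [0, 2^31)
        } while (v >= limit);
        return v % n;
    }

    // Counts how many of the draws fall below n / 2.
    static int countLowerHalf(Random rnd, int n, int trials) {
        int count = 0;
        for (int i = 0; i < trials; i++) {
            if (getRand(rnd, n) < n / 2) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int n = 2 * (Integer.MAX_VALUE / 3);
        // should now come out near 500,000 rather than 666,666
        System.out.println(countLowerHalf(new Random(), n, 1_000_000));
    }
}
```

In practice the standard library's rnd.nextInt(n) already returns a uniformly distributed value in [0, n), so calling it directly is the simpler fix.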
You would get 50-50 if you kept only values of Math.abs(rnd.nextInt()) in the range of [0..2/3(Integer.MAX_VALUE)]. For the rest 1/3*Integer.MAX_VALUE numbers, due to modulo you will get a smaller number in the range of [0..1/3 Integer.MAX_VALUE].
All in all, numbers in the range of [0..1/3 Integer.MAX_VALUE] have double the chance to appear.
The Random class is designed to generate pseudo-random numbers. That means they are elements of a defined sequence that have a uniform distribution. If you don't know the sequence, they seem to be random.
Having said that, the problem is that you mess up the uniform distribution you get by using the modulus operator. On coding horror, there is a very nice article that explains this issue, although for a slightly different problem. Now, you can find a solution to your problem along with a proof here.
As observed above, getRand does not generate uniformly distributed random numbers over the range [0, n).
In general, suppose that n = a * Integer.MAX_VALUE / b, where a/b > 0.5
For ease of writing, let M = Integer.MAX_VALUE
The Probability Density Function (PDF) of getRand(n) is given by:
PDF(x) = 2/M for 0 < x < (b-a)M/b
= 1/M for (b-a)M/b < x < aM/b
n/2 corresponds to the mid-point of the range [0, aM/b] = aM/2b
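Integrating the PDF over the 'first-half' range [0, n/2], we find that the probability (P) that getRand(n) is less than n/2 is given by:
P = a/b for 1/2 < a/b <= 2/3 (n/2 lies inside the doubled region)
= 1 - a/(2b) for 2/3 < a/b <= 1 (n/2 lies beyond it)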
Examples:
a=2, b=3. P = 2/3 = 0.66666... as computed by the questioner.
a=4, b=7. P = 4/7 = 0.5714... close to the questioner's computational result.
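These figures can be cross-checked by counting exactly instead of sampling. The sketch below assumes the source value is uniform on [0, 2^31), which only approximates Math.abs(rnd.nextInt()) (0 and Integer.MIN_VALUE behave slightly differently); the class and method names are made up for illustration.

```java
class LowerHalfProbability {
    // Exact probability that (v % n) < n / 2 when v is uniform on [0, 2^31).
    static double lowerHalf(int n) {
        long range = 1L << 31;        // number of possible source values
        long half = n / 2;            // residues 0 .. half-1 count as "lower half"
        long fullCycles = range / n;  // complete runs through residues 0 .. n-1
        long leftover = range % n;    // residues 0 .. leftover-1 occur one extra time
        long hits = fullCycles * half + Math.min(leftover, half);
        return (double) hits / range;
    }

    public static void main(String[] args) {
        System.out.println(lowerHalf(2 * (Integer.MAX_VALUE / 3))); // close to 2/3
        System.out.println(lowerHalf(4 * (Integer.MAX_VALUE / 7))); // close to 4/7
        System.out.println(lowerHalf(Integer.MAX_VALUE));           // close to 1/2
    }
}
```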