I am currently working with Java and MySQL, and I found an issue I don't know how to solve.
I have a class that stores a String of 365 positions that represents a Binary String "010111010010100...", and I would like to be able to store and read that field from the database.
Once it is read, I will perform an AND Logic operation with another bitarray.
I read about the BitSet class, which supports logical operations (AND, OR, XOR, ...) between instances. I tried it, but I didn't like the solutions I got. I could also convert the String to a byte array and store and read that from the database before performing the logical AND, but I'm not sure whether I would always need to create a BitSet, or how well that would perform.
I don't know which is the most performant way to do what I want:
Convert the Binary String in another element.
Store that element in the database (in the case of BitSet I tried to define the Database field as BLOB, but I had a lot of issues transforming the BitSet to BLOB and reading the BLOB to a BitSet).
Read the element from the database (at this point would be great to directly work with the element without making any cast or transformation).
Perform a logic AND with another bitarray and get the result.
I have tried a lot of options, but they didn't work.
Could someone help me with this problem and how to better approach it from the performance point of view?
Thanks!
Storing bits in a string is a bit weird. I use a long to store a number and perform bitwise operations on that, but that won't work for you, since you need many more bits than a long holds. If it has to remain a string, you can write a short function that applies the AND operator to each character of the string, something like this:
StringBuilder data = new StringBuilder();
for (int i = 0; i < stringname.length(); i++) {
    data.append(stringname.charAt(i) == '1' && binarystring.charAt(i) == '1' ? '1' : '0');
}
Go through your string while comparing it with the binary string (the one you want to AND it with): if both characters are '1', append 1; otherwise, append 0.
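As an alternative to the character loop, here is a minimal sketch of the BitSet route the question mentions. It assumes the bits are persisted as a byte[] in a binary column (for example VARBINARY(46), since 365 bits fit in 46 bytes), written with PreparedStatement.setBytes and read back with ResultSet.getBytes. The class and method names (YearBits, fromBinaryString, toBinaryString) are made up for illustration, and the database calls are left out so the example stays self-contained.

```java
import java.util.BitSet;

class YearBits {
    // Parse a string of '0'/'1' characters; character i becomes bit i.
    static BitSet fromBinaryString(String s) {
        BitSet bits = new BitSet(s.length());
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    // Render the first `length` bits back as a '0'/'1' string.
    static String toBinaryString(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet a = fromBinaryString("0101110100");
        BitSet b = fromBinaryString("0110010110");

        // The byte[] is what would be stored in, and loaded from, the column.
        byte[] stored = a.toByteArray();
        BitSet loaded = BitSet.valueOf(stored);

        loaded.and(b); // in-place AND; `loaded` now holds the result
        System.out.println(toBinaryString(loaded, 10)); // prints 0100010100
    }
}
```

Note that toByteArray() omits trailing zero bytes, so the stored array can be shorter than 46 bytes. BitSet.valueOf accepts that, but the original length (365) has to be known separately when converting back to a string.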
I'm pretty much a complete newbie to java and programming in general. I was wondering if anyone could help me out.
So I have a .csv that I am reading and storing data from (I think in the form of an array?) using the following. This works fine and grabs all the data from the csv.
fo=new File()
fo.open(filename)
contents = fo.read()
fo.close
The data that I am grabbing from the .csv is in the form of well positions on a plate e.g. A1, B1, C1, D1 etc. Now is there a way that I can make each of the letters worth a particular value?
For example, A=1, B=2, C=3 etc., and then multiply this new value by the second number, e.g. A1 would become 1*1 = 1, A2 would become 1*2 = 2, and B2 would become 2*2 = 4.
Any help would be greatly appreciated.
Usually the way that files are read follow this structure:
Open File using Reader. (using Scanner, FileReader, etc)
Read data. (Scanner.nextLine() for example)
Close File Reader. (close)
In your case, one possible approach is to read the data and then, if you want an array of values, simply use the String.split() method, passing "," as the delimiter because it is a comma-separated file. Once you have an array, you can make whatever changes you want: iterate over the array and perform the transformations.
However, I would also like to clarify something, because it is kind of implied by your question, that updating the array after you read will not update the file. Just wanted to make that clear to avoid confusion.
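A minimal sketch of the split-and-transform step, assuming each entry is a single letter followed by a number (A1, B2, C12). The file reading itself is omitted and a hard-coded sample line stands in for one line of the .csv. The names WellValues, wellValue and valuesFromLine are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

class WellValues {
    // "A1" -> 1 * 1, "B2" -> 2 * 2: letter position times the trailing number.
    static int wellValue(String well) {
        String w = well.trim().toUpperCase();
        int row = w.charAt(0) - 'A' + 1;               // A=1, B=2, C=3, ...
        int column = Integer.parseInt(w.substring(1)); // digits after the letter
        return row * column;
    }

    // Split one comma-separated line and convert every entry.
    static List<Integer> valuesFromLine(String line) {
        List<Integer> values = new ArrayList<>();
        for (String well : line.split(",")) {
            values.add(wellValue(well));
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(valuesFromLine("A1,A2,B2,C12")); // prints [1, 2, 4, 36]
    }
}
```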
We need our protobuf messages to contain as little data as possible. What are the best practices we can follow to get the most out of it? For example, should a byte[] be written as a String or a ByteString, and what is the difference? And should a list of integers be added as a repeated field or as something else?
As an example writing byte[] as a String or ByteString ?
If you want to write binary data, use a bytes field (so ByteString). A string field is UTF-8-encoded text, so can't be used for all possible byte sequences.
And adding a list of integers as a repeated list or something else ?
Yes, use a repeated list - but with the [packed=true] option.
Basically, look over the whole encoding documentation and work out what's most appropriate for you. In particular, choose carefully between the various numeric representations, based on what your actual data will be. (If you're writing 32-bit values which are typically very large, consider using the fixed32 format instead of just int32 for example.)
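To make the fixed32 remark concrete, here is a small sketch that computes how many bytes a varint takes for a value treated as an unsigned 32-bit number (each varint byte carries 7 payload bits). It is plain Java and does not use the protobuf library; VarintSize and varintSize are names invented for this illustration.

```java
class VarintSize {
    // Bytes needed to varint-encode `value`, interpreted as unsigned 32-bit.
    static int varintSize(int value) {
        int size = 1;
        while ((value >>> 7) != 0) { // more than 7 bits left: one more byte
            value >>>= 7;
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(varintSize(127));     // 1
        System.out.println(varintSize(128));     // 2
        System.out.println(varintSize(1 << 28)); // 5, versus 4 bytes for fixed32
    }
}
```

A fixed32 field always takes 4 bytes of payload, so it only wins when values are typically at or above 2^28.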
Hi,
this question is pretty similar to SingleColumnValueFilter not returning proper number of rows.
I use four SingleColumnValueFilters with the operator EQUAL and add them to a FilterList with the operator MUST_PASS_ONE. The number of results is the same as without setting the FilterList. The value to compare is a byte[] that should be correct, as I just store the values from previous results. (It is an IP address that I convert to an InetAddress, new InetAddress(value as byte[]), when retrieving the data, and for the query described I just call InetAddress.getAddress, which returns a byte[].)
Do you have any ideas what might be the problem? Am I using the Filter wrong?
EDIT:
I also used the original values retrieved by the query as value for SingleColumnValueFilter, and there was no difference in the results, thus the byte[] contents can't be the problem.
I think I can give the answer myself, sorry for not debugging and checking all the hbase code before.
I just checked the implementation of the compare algorithm (which is lexicographic), and realized that the length is not taken into account; I thought the value would be padded with zeros, but unfortunately it is not.
The only reasonable option would be to create a custom comparator (e.g. see How do you use a custom comparator with SingleColumnValueFilter on HBase?)
It's probably a stupid question but here's the thing. I was reading this question:
Storing 1 million phone numbers
and the accepted answer was what I was thinking: using a trie. In the comments Matt Ball suggested:
I think storing the phone numbers as ASCII text and compressing is a very reasonable suggestion
Problem: how do I do that in Java? And does ASCII text mean String?
For in-memory storage as indicated in the question:
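import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.util.zip.GZIPOutputStream;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStreamWriter out = new OutputStreamWriter(
        new GZIPOutputStream(baos), "US-ASCII");
for (String number : numbers) {
    out.write(number);
    out.write('\n');
}
out.close(); // flushes the writer and finishes the GZIP stream
byte[] data = baos.toByteArray();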
But as Pete remarked: this may be good for memory efficiency, but you can't really do anything with the data afterwards, so it's not really very useful.
Yes, ASCII means Strings in this case. You can store compressed data in Java using the java.util.zip.GZIPOutputStream.
In answer to an implied, but different question:
Q: You have 1 billion phone numbers and you need to send these over a low-bandwidth connection. You only need to send whether each phone number is in the collection or not. (No other information required.)
A: This is the general approach
First sort the list if it's not sorted already.
Starting from the lowest number, find regions of contiguous numbers. Send the start of the region and which numbers in it are taken; this can be stored as a BitSet (1 bit per possible number). Send a new start number and BitSet whenever the gap is more than some threshold.
Write the stream to a compressed data set.
Test this to compare with a simple sending of all numbers.
You can use Strings in a sorted TreeMap. One million numbers is not very much and will use about 64 MB. I don't see the need for a more complex solution.
The latest version of Java can store ASCII text efficiently by using a byte[] instead of a char[]; however, the overhead of your data structure is likely to be larger.
If you need to store phone numbers as keys, you could store them with the assumption that large ranges will be contiguous. As such you could store them like
NavigableMap<String, PhoneDetails[]>
In this structure, the key would define the start of the range and you could have a PhoneDetails for each number in it. This need not be much bigger than the references to the PhoneDetails (which is the minimum).
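A rough sketch of how a lookup in such a range map could work, assuming all numbers are digit strings of the same length (so that string order matches numeric order). A plain String stands in for PhoneDetails, and the class and method names (PhoneRanges, addRange, lookup) are invented for this example.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class PhoneRanges {
    // Key: first number of a contiguous range; value: one entry per number in it.
    private final NavigableMap<String, String[]> ranges = new TreeMap<>();

    void addRange(String firstNumber, String[] details) {
        ranges.put(firstNumber, details);
    }

    // Returns the details for `number`, or null if no range covers it.
    String lookup(String number) {
        // Greatest range start that is <= number, if any.
        Map.Entry<String, String[]> entry = ranges.floorEntry(number);
        if (entry == null) {
            return null;
        }
        long offset = Long.parseLong(number) - Long.parseLong(entry.getKey());
        String[] details = entry.getValue();
        return offset < details.length ? details[(int) offset] : null;
    }

    public static void main(String[] args) {
        PhoneRanges r = new PhoneRanges();
        r.addRange("5550100", new String[] {"alice", "bob", "carol"});
        System.out.println(r.lookup("5550101")); // bob
        System.out.println(r.lookup("5550103")); // null, past the end of the range
    }
}
```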
BTW: You can invent very efficient structures if you don't need access to the data. If you never access the data, don't keep it in memory, in fact you can just discard it as it won't ever be needed.
A lot depends on what you want to do with the data and why you have it in memory at all.
You can use a DeflaterOutputStream wrapped around a ByteArrayOutputStream, which will be very small, but not very useful.
I suggest using DeflaterOutputStream as it's more lightweight/faster/smaller than GZIPOutputStream.
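A minimal sketch of that suggestion, using java.util.zip.DeflaterOutputStream to compress newline-separated numbers into a byte[] and InflaterInputStream to read them back. The class and method names (DeflateNumbers, compress, decompress) are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class DeflateNumbers {
    // Compress the numbers, one per line, as ASCII bytes.
    static byte[] compress(Iterable<String> numbers) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(baos)) {
            for (String number : numbers) {
                out.write((number + "\n").getBytes(StandardCharsets.US_ASCII));
            }
        } catch (IOException e) { // closing above finishes the deflate stream
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    // Inflate the bytes back into the newline-separated text.
    static String decompress(byte[] data) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(result.toByteArray(), StandardCharsets.US_ASCII);
    }
}
```

The stream has to be closed (or finish() called) before taking the bytes from the ByteArrayOutputStream, otherwise the compressed data is incomplete.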
Java Strings are UTF-16 encoded internally; if you want to work with ASCII text as bytes, you have to specify the encoding when converting, e.g. getBytes(StandardCharsets.US_ASCII).
Can anyone tell me the best way to decode binary data with variable length bit strings in java?
For example:
The binary data is 10101000 11100010 01100001 01010111 01110001 01010110
I might need to find the first match of any of the following 01, 100, 110, 1110, 1010...
In this case the match would be 1010. I then need to do the same for the remainder of the binary data. The bit strings can be up to 16 bits long and cross the byte boundaries.
Basically, I'm trying to Huffman-decode JPEGs using the bit strings I created from the Huffman tables in the headers. I can do it, but it's very messy: I'm turning everything, binary data included, into StringBuffers first, and I know that isn't the right way.
Before I loaded everything in string buffers I tried using just numbers in binary but of course I can't ignore the leading 0s in a code like 00011. I'm sure there must be some clever way using bit wise operators and the like to do this, but I've been staring at pages explaining bit masks and leftwise shifts etc and I still don't have a clue!
Thanks a lot for any help!
EDIT:
Thanks for all the suggestions. I've gone with the binary tree approach as it seems to be the standard way with Huffman stuff. Makes sense really, as Huffman codes are created using trees. I'll also look into storing the binary data I need to search in a BigInteger. I don't know how to mark multiple answers as correct, but thanks all the same.
You might use a state machine consuming zeros and ones. The state machine would have final states for all the patterns that you want to detect. Whenever it enters one of the final states, it sends a message to you with the matched pattern and goes back to the initial state.
Finally you would have only one state machine in form of a DAG which contains all your patterns.
To implement it use the state pattern (http://en.wikipedia.org/wiki/State_pattern) or any other implementation of a state machine.
Since you are decoding Huffman encoded-data, you should create a binary tree, where leaves hold the decoded bit string as data, and the bits of each Huffman code are the path to the corresponding data. The bits of the Huffman code are accessed with bit-shift and bit-mask operations. When you get to a leaf, you output the data at that leaf and go back to the root of the tree. It's very fast and efficient.
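A sketch of that tree walk, under a few assumptions: codes are registered as '0'/'1' strings (which keeps leading zeros), each leaf stores an integer symbol, and bits are read most significant bit first within each byte. The class HuffmanTree and the symbol numbers are made up for this example; the codes and data bytes are the ones from the question.

```java
import java.util.ArrayList;
import java.util.List;

class HuffmanTree {
    private static final class Node {
        Node zero, one;  // children for bit 0 and bit 1
        int symbol = -1; // >= 0 only at leaves
    }

    private final Node root = new Node();

    // Register a code; each character of `code` is one step down the tree.
    void addCode(String code, int symbol) {
        Node node = root;
        for (char c : code.toCharArray()) {
            if (c == '1') {
                if (node.one == null) node.one = new Node();
                node = node.one;
            } else {
                if (node.zero == null) node.zero = new Node();
                node = node.zero;
            }
        }
        node.symbol = symbol;
    }

    // Decode the first `bitCount` bits of `data`, crossing byte boundaries freely.
    List<Integer> decode(byte[] data, int bitCount) {
        List<Integer> out = new ArrayList<>();
        Node node = root;
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1; // shift and mask one bit
            node = (bit == 1) ? node.one : node.zero;
            if (node == null) {
                throw new IllegalArgumentException("no code matches at bit " + i);
            }
            if (node.symbol >= 0) { // reached a leaf: emit and restart at the root
                out.add(node.symbol);
                node = root;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        HuffmanTree tree = new HuffmanTree();
        String[] codes = {"01", "100", "110", "1110", "1010"};
        for (int s = 0; s < codes.length; s++) {
            tree.addCode(codes[s], s);
        }
        byte[] data = {(byte) 0xA8, (byte) 0xE2}; // 10101000 11100010
        // The first 12 bits split as 1010 | 100 | 01 | 110.
        System.out.println(tree.decode(data, 12)); // prints [4, 1, 0, 2]
    }
}
```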
You could try stuffing it into a BigInteger and then using the shift and test methods. Then use a loop to walk the bits and accept each sub-pattern.
If the Huffman codes are in a tree, 1 == right node, 0 == left node.
// Walk from the most significant bit down to bit 0.
for (int i = numbitsTotal - 1; i >= 0; --i)
{
    boolean bit = bigInt.testBit(i); // testBit returns a boolean
    if (bit)
    {
        // take right node -- if null accept code, apply from top
    }
    else
    {
        // take left node -- if null accept code, apply from top
    }
}
I would suggest a trie. It is explicitly designed for prefix searching. In your case, it would be a binary trie.
You could use a java.util.BitSet to store your binary data and then you can implement some search functions to find the position of a smaller BitSet inside the big one...