I am trying to convert Strings to their integer equivalents for faster comparison using ByteBuffer (java.nio.ByteBuffer).
I got a very peculiar exception using ByteBuffer.
import java.nio.ByteBuffer;
public class LargeCompare {
public static void main(String args[]){
byte[]b ="zzz".getBytes();
ByteBuffer bb = ByteBuffer.wrap(b);
bb.getInt();
}
}
The code above does not throw an exception for strings of length 4, but it throws one for strings of length 3 or less.
Can anyone help me fix this?
An int is 32 bits, or 4 bytes, wide. You are trying to read an int from a buffer that's shorter than this. This is why you're getting the exception.
I don't really follow where you're going with this, so will refrain from making suggestions.
Uhm, from the documentation:
Throws:
BufferUnderflowException - If there are fewer than four bytes remaining in this buffer
You only have 3 bytes.
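One way to avoid the underflow, sketched here under the assumption that the strings are ASCII and at most four characters long, is to copy the bytes into a zero-filled four-byte buffer before reading the int. The class name PaddedInt and the helper toInt are hypothetical, not part of the original code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PaddedInt {
    // Hypothetical helper: left-pads the string's bytes with zeros to 4 bytes,
    // so the buffer always holds enough data for one int.
    static int toInt(String s) {
        byte[] raw = s.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > Integer.BYTES) {
            throw new IllegalArgumentException("more than 4 bytes: " + s);
        }
        ByteBuffer bb = ByteBuffer.allocate(Integer.BYTES); // zero-filled
        bb.position(Integer.BYTES - raw.length);            // leave leading zeros
        bb.put(raw);
        return bb.getInt(0); // absolute read, independent of the position
    }

    public static void main(String[] args) {
        System.out.println(toInt("zzz"));
    }
}
```

Strings longer than four bytes would need a long, or a different scheme altogether.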
Here is the solution...
public class LargeCompare {
public static void main(String args[]){
String str = "A";
System.out.println(bytesToInt(str.getBytes()));
}
// Packs up to 4 bytes into an int, least significant byte first.
public static int bytesToInt(byte[] byteArray){
int value= 0;
for(int i=0;i<byteArray.length;i++){
// & 0xFF treats the byte as unsigned before shifting it into place
value+=(byteArray[i]&0xFF)<<(8*i);
}
return value;
}
}
I have tested this code and it works without any issues.
I am trying to store a set of numbers that range from 0 to ~60 billion, where the set starts out empty and gradually becomes denser until it contains every number in the range. The set does not have to be capable of removing numbers. Currently my approach is to represent the set as a very long boolean array and store that array in a text file. I have made a class for this, and have tested both RandomAccessFile and FileChannel with the range of the numbers restricted from 0 to 2 billion, but in both cases the class is much slower at adding and querying numbers than using a regular boolean array.
Here is the current state of my class:
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.util.*;
public class FileSet {
private static final int BLOCK=10_000_000;
private final long U;
private final String fname;
private final FileChannel file;
public FileSet(long u, String fn) throws IOException {
U=u;
fname=fn;
BufferedOutputStream out=new BufferedOutputStream(new FileOutputStream(fname));
long n=u/8+1;
for (long rep=0; rep<n/BLOCK; rep++) out.write(new byte[BLOCK]);
out.write(new byte[(int)(n%BLOCK)]);
out.close();
file=new RandomAccessFile(fn,"rw").getChannel();
}
public void add(long v) throws IOException {
if (v<0||v>=U) throw new RuntimeException(v+" out of range [0,"+U+")");
file.position(v/8);
ByteBuffer b=ByteBuffer.allocate(1); file.read(b);
file.position(v/8);
file.write(ByteBuffer.wrap(new byte[] {(byte)(b.get(0)|(1<<(v%8)))}));
}
public boolean has(long v) throws IOException {
if (v<0||v>=U) return false;
file.position(v/8);
ByteBuffer b=ByteBuffer.allocate(1); file.read(b);
return ((b.get(0)>>(v%8))&1)!=0;
}
public static void main(String[] args) throws IOException {
long U=2000_000_000;
SplittableRandom rnd=new SplittableRandom(1);
List<long[]> actions=new ArrayList<>();
for (int i=0; i<1000000; i++) actions.add(new long[] {rnd.nextInt(2),rnd.nextLong(U)});
StringBuilder ret=new StringBuilder(); {
System.out.println("boolean[]:");
long st=System.currentTimeMillis();
boolean[] b=new boolean[(int)U];
System.out.println("init time="+(System.currentTimeMillis()-st));
st=System.currentTimeMillis();
for (long[] act:actions)
if (act[0]==0) b[(int)act[1]]=true;
else ret.append(b[(int)act[1]]?"1":"0");
System.out.println("query time="+(System.currentTimeMillis()-st));
}
StringBuilder ret2=new StringBuilder(); {
System.out.println("FileSet:");
long st=System.currentTimeMillis();
FileSet fs=new FileSet(U,"FileSet/"+U+"div8.txt");
System.out.println("init time="+(System.currentTimeMillis()-st));
st=System.currentTimeMillis();
for (long[] act:actions) {
if (act[0]==0) fs.add(act[1]);
else ret2.append(fs.has(act[1])?"1":"0");
}
System.out.println("query time="+(System.currentTimeMillis()-st));
fs.file.close();
}
if (!ret.toString().equals(ret2.toString())) System.out.println("MISMATCH");
}
}
and the output:
boolean[]:
init time=1248
query time=148
FileSet:
init time=269
query time=3014
Additionally, when increasing the range from 2 billion to 10 billion, there is a large jump in total running time for the queries, even though in theory the total running time should stay roughly constant. When I use the class by itself (since a boolean array no longer works for this big of a range), the query time goes from ~3 seconds to ~50 seconds. When I increase the range to 60 billion, the time increases to ~240 seconds.
My questions are: is there a faster way of accessing and modifying very large files at arbitrary indices? and is there an entirely different approach to storing large integer sets that is faster than my current approach?
Boolean arrays are a very inefficient way to store information, as each boolean takes up 8 bits. You should use a BitSet instead. But BitSet also has the 2 billion limit, as it uses primitive int values as parameters (and Integer.MAX_VALUE limits the size of the internal long array).
A space-efficient in-memory alternative that spans beyond 2 billion entries would be to create your own BitSet wrapper that splits the data into subsets and does the indexing for you:
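import java.util.BitSet;
public class LongBitSet {
// TODO: Add error checking.
private final BitSet[] bitSets = new BitSet[64];
public void set(long index) {
int chunk = (int) (index / Integer.MAX_VALUE);
// Allocate each chunk lazily on first use
if (bitSets[chunk] == null) bitSets[chunk] = new BitSet();
bitSets[chunk].set((int) (index % Integer.MAX_VALUE));
}
}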
But there are other alternatives too. If you have very dense data, run-length encoding would be a cheap way to increase memory capacity, but it would likely need a B-tree structure to make access efficient. These are just pointers; the right answer depends largely on how you actually use the data structure.
Turns out the simplest solution is to use a 64-bit JVM and increase Java heap space by running my Java program in the terminal with a flag like -Xmx10g. Then I can simply use an array of longs to implicitly store the entire set.
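A minimal sketch of what such an array-of-longs set could look like follows. The class name LongArraySet and its methods are hypothetical, and for the full 60 billion range it assumes the heap has been enlarged with a suitable -Xmx setting, since the array alone needs about 7.5 GB.

```java
public class LongArraySet {
    private final long universe; // valid values are 0 <= v < universe
    private final long[] words;  // bit v lives in words[v / 64], at bit v % 64

    public LongArraySet(long universe) {
        this.universe = universe;
        // One long holds 64 members; round up so the last partial word exists.
        this.words = new long[(int) ((universe + 63) >>> 6)];
    }

    public void add(long v) {
        if (v < 0 || v >= universe) {
            throw new IllegalArgumentException(v + " out of range [0," + universe + ")");
        }
        words[(int) (v >>> 6)] |= 1L << (v & 63);
    }

    public boolean has(long v) {
        if (v < 0 || v >= universe) return false;
        return (words[(int) (v >>> 6)] & (1L << (v & 63))) != 0;
    }

    public static void main(String[] args) {
        LongArraySet set = new LongArraySet(1_000);
        set.add(64);
        System.out.println(set.has(64) + " " + set.has(65));
    }
}
```

The word index stays within int range as long as the universe is below roughly 137 billion (64 times Integer.MAX_VALUE).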
I'm working on an implementation where I need to convert a long hash String to a BigInteger and back (the function should be reversible), but I can't figure out how to make it work in Java with those classes.
My first idea was to do something as follows:
given s:String
for every character in the input string:
convert char to decimal ASCII representation (i.e. 'a' -> '97')
append result to s
build a BigInteger with the resulting s
but the problem (as pointed out by many users) is the length of each converted value: character codes range from 0 to 255, so the number of digits per character varies. The mapping could be changed from 'a' -> '97' to 'a' -> '097', but then decoding has to strip the leading zeros from every character (which also makes the algorithm less efficient).
So, in conclusion, the algorithm proposed here is not the best idea, and I'm open to other solutions. If there is a library or a built-in method in String and/or BigInteger, that would be helpful too. The signature is
public class EncodeUtil {
public BigInteger encode(String s) {...}
public String decode(BigInteger bi) {...}
}
and the condition is that decode(encode("som3_We1rd/5+ring")) outputs "som3_We1rd/5+ring"
I think it's worth mentioning that the strings received for decoding are hashes like lQ5jkXWRkrbPlPlsRDUPcY6bwOD8Sm/tvJAVhYlLS3WwE5rGXv/rFRzyhn4XpUovwkLj2C3zS1JPTQ1FLPtxNXc2QLxfRcH1ZRi0RKJu1lK8TUCb6wm3cDw3VRXd21WRsnYKg6q9ytR+iFQykz6MWVs5UGM5NPsCw5KUBq/g3Bg=
Any idea or suggestion is welcome. Thanks in advance for your time.
This does approximately what you want - but what you have asked, specifically, will not work when the number of digits per "decimal ASCII representation" is variable. Also, what you want is not a hash function:
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
public class Driver {
public static void main(String[] args) {
String s = "Reversible Hash a String to BigInteger in Java";
System.out.println(HashUtil.notReallyHash(s));
System.out.println(HashUtil.notReallyUnhash(HashUtil.notReallyHash(s)));
}
}
class HashUtil {
private static final byte SENTINEL = (byte) 1;
public static BigInteger notReallyHash(String s) {
CharBuffer charBuf = CharBuffer.wrap(s.toCharArray());
ByteBuffer byteBuf = ByteBuffer.allocate(charBuf.length() * Character.BYTES + 1);
byteBuf.put(SENTINEL); // need this in case first byte is 0 - biginteger will drop it
byteBuf.asCharBuffer()
.append(charBuf);
return new BigInteger(1, byteBuf.array());
}
public static String notReallyUnhash(BigInteger bi) {
ByteBuffer byteBuf = ByteBuffer.wrap(bi.toByteArray());
byteBuf.get(); // SENTINEL
CharBuffer charBuf = byteBuf.asCharBuffer();
StringBuilder sb = new StringBuilder();
int count = charBuf.length();
for (int i = 0; i < count; i++) {
sb.append(charBuf.get());
}
return sb.toString();
}
}
Yields:
361926078700757358567593716803587125664654843989863967556908753816306719264539871333731967310574715835858778584708939316915516582061621172700488541380894773554695375367299711405739159440282736685351257712598020862887985249
Reversible Hash a String to BigInteger in Java
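The same sentinel idea can also be sketched with the string's UTF-8 bytes, which avoids the buffer views. This is only an illustration: it borrows the EncodeUtil name from the question but uses static methods, and it assumes the input is well-formed Unicode (unpaired surrogates would not survive the UTF-8 round trip).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class EncodeUtil {
    // Prefixes the UTF-8 bytes with a 0x01 sentinel so that leading zero bytes
    // (and the empty string) are not lost when BigInteger normalizes the value.
    public static BigInteger encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] prefixed = new byte[utf8.length + 1];
        prefixed[0] = 1; // sentinel
        System.arraycopy(utf8, 0, prefixed, 1, utf8.length);
        return new BigInteger(1, prefixed);
    }

    // Assumes the argument was produced by encode, so byte 0 is the sentinel.
    public static String decode(BigInteger bi) {
        byte[] bytes = bi.toByteArray();
        return new String(bytes, 1, bytes.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "som3_We1rd/5+ring";
        System.out.println(decode(encode(s)).equals(s));
    }
}
```

For the ASCII hashes shown in the question, this yields a number roughly half as long as the two-bytes-per-char version.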
As the title says, I was trying to convert binary to decimal using a recursive technique in Java, but I cannot get the desired output.
Here's what I did:
public class deci {
public static void main(String args[]){
deci s1=new deci();
s1.spawn(11000);
}
void spawn(int a){
int p=0;int x=0;int k=0;
if(a>0){
p=a%10;
x=x+p*(int)Math.pow(2,k);
k++;
spawn(a/10);
} else {
System.out.print(x);
}
}
}
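The problem is that you are not using the result of spawn, either by returning it or by printing it.
If you want to return it, each call needs to combine its digit with the result of the recursive call, using shifting or a power of two. If you only want to print it, the running total still has to be carried from call to call.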
I suggest you step through the code in your debugger so you can see what it is doing.
Here is a working program.
Parameters are (binary code, size of code-1) e.g. for (111, 2) this will return 7
int binaryToDecimal(int binary,int size){
if(size<0) return 0;
// decimal place value of the leading digit, e.g. 100 for size 2
int place=(int)Math.pow(10,size);
return binary/place*(int)Math.pow(2,size)+binaryToDecimal(binary%place,size-1);
}
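A variant that does not need the size parameter can be sketched as follows: the last decimal digit is the least significant bit, and everything to its left is worth twice as much. The class name BinaryToDecimal is hypothetical, and the input is assumed to be a non-negative int whose decimal digits are all 0 or 1.

```java
public class BinaryToDecimal {
    // Interprets the decimal digits of 'binary' as bits, e.g. 110 -> 6.
    static int binaryToDecimal(int binary) {
        if (binary == 0) return 0;
        // last digit is bit 0; the remaining digits are shifted left by one
        return binary % 10 + 2 * binaryToDecimal(binary / 10);
    }

    public static void main(String[] args) {
        System.out.println(binaryToDecimal(11000)); // prints 24
    }
}
```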
I have a wrapper around the ByteBuffer class (because in my code it is the underlying structure for an entity). I want a ByteBuffer to store fixed-size entries and to return null or throw an exception if we try to read at an offset where nothing was written. I've written the following code:
private static final int SIZE = 16; //Bytes
private static final int BBSIZE = 48 * SIZE;
ByteBuffer blockMap = ByteBuffer.allocateDirect(BBSIZE);
byte[] readAtOffset(final int offset) throws BufferUnderflowException,
IndexOutOfBoundsException {
byte[] dataRead = new byte[SIZE];
blockMap.position(offset);
blockMap.get(dataRead);
return dataRead;
}
void writeAtOffset(final int offset, final byte[] data)
throws BufferOverflowException, IndexOutOfBoundsException, ReadOnlyBufferException
{
if (data.length != SIZE) {
throw new IllegalArgumentException("Invalid data received");
}
blockMap.position(offset);
blockMap.put(data);
}
public static void main(String[] args) {
ByteBufferTests tests = new ByteBufferTests();
System.out.println("At 0: " + tests.readAtOffset(0));
}
Shouldn't this throw an exception as I haven't written anything to the buffer yet? What am I doing wrong?
When you create a ByteBuffer, it is filled with zeros up to the capacity you asked for, so every offset within that capacity is readable. If you want to track which portions you have written to, you have to do that separately.
I suggest using an index number instead of a raw offset, and you can use a BitSet to see which portions were written to. An alternative is to make an assumption that a message won't start with nul bytes and if it does, it is corrupt/not present.
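The index-plus-BitSet suggestion might look roughly like this. It is a sketch only: the class name SlotBuffer and its method names are hypothetical, and the entry size and count are taken from the question's constants.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

public class SlotBuffer {
    private static final int SIZE = 16;  // bytes per entry
    private static final int SLOTS = 48; // number of entries
    private final ByteBuffer blockMap = ByteBuffer.allocateDirect(SLOTS * SIZE);
    private final BitSet written = new BitSet(SLOTS); // bit i set = slot i holds data

    void write(int index, byte[] data) {
        if (data.length != SIZE) throw new IllegalArgumentException("Invalid data received");
        checkIndex(index);
        ByteBuffer view = blockMap.duplicate(); // own position, shared content
        view.position(index * SIZE);
        view.put(data);
        written.set(index);
    }

    // Returns null when nothing has been written to the slot yet.
    byte[] read(int index) {
        checkIndex(index);
        if (!written.get(index)) return null;
        byte[] out = new byte[SIZE];
        ByteBuffer view = blockMap.duplicate();
        view.position(index * SIZE);
        view.get(out);
        return out;
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= SLOTS) throw new IndexOutOfBoundsException("slot " + index);
    }

    public static void main(String[] args) {
        SlotBuffer buf = new SlotBuffer();
        System.out.println("At 0: " + buf.read(0)); // nothing written yet
    }
}
```

Throwing an exception instead of returning null for an unwritten slot would be a one-line change in read.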
To quote from the ByteBuffer JavaDoc:
The new buffer's position will be zero, its limit will be its capacity, its mark will be undefined, and each of its elements will be initialized to zero.
So, even though you haven't written to the buffer, the allocateDirect(...) method has (in a sense).
Cheers,
How to solve this?
File f=new File("d:/tester.txt");
long size=f.length(); // returns the size in bytes
char buff[]=new char[size]; // line of ERROR
// will not accept long in its argument
// I can't do the casting (loss of data)
Is it possible to use size as the length of buff without loss of data?
If so, how can I use it?
My second question is:
Why am I not getting the actual number of bytes?
This is the program :
import java.io.*;
class tester {
public static void main(String args[]) {
File f=new File("c:/windows/system32/drivers/etc/hosts.File");
long x=f.length(); // returns the length of the file in bytes
System.out.println("long-> " + x );
}
}
The output is long-> 0, but obviously that is not the real size. Why do I get this result?
You need to cast the long to an int:
char buff[]=new char[(int) size];
This will only work for files less than 2 GB in size.
However, if you intend to use this to read the file perhaps you meant
byte[] buff=new byte[(int) size];
I would look at FileUtils and IOUtils from Apache Commons IO, which have lots of helper methods.
I doubt you have a file with that name; File.length() returns 0 when the file does not exist. Perhaps you need to drop the .File at the end, which sounds like an odd extension.
I would check f.exists() first.
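Putting both points together, a small sketch using only the standard library could check for existence first and then read the whole file into a byte array. The class name ReadWholeFile and the helper readIfExists are hypothetical, and the path in main is the one from the question with the .File suffix dropped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadWholeFile {
    // Returns the file's bytes, or null when the file does not exist.
    // Files.readAllBytes only handles files that fit in a single array (under 2 GB).
    static byte[] readIfExists(Path path) {
        if (!Files.exists(path)) return null;
        try {
            return Files.readAllBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = readIfExists(Paths.get("c:/windows/system32/drivers/etc/hosts"));
        System.out.println(data == null ? "no such file" : "bytes read: " + data.length);
    }
}
```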