I'm trying to store a string into integers as follows:
I read the characters of the string, and for every 4 characters I do this:
val = (int) ch1 << 24 | (int) ch2 << 16 | (int) ch3 << 8 | (int) ch4;
Then I put the integer value in an array of integers called memory (=> int memory[16]).
I would like to do this automatically for a string of any length, and I also have difficulty reversing the procedure for an arbitrary-size string. Any help?
EDIT:
Basically, I'm doing an exercise in Java. It's a MIPS simulator system. I have Register, Datum, Instruction, Label, Control, APSImulator classes and others. When I load the program from an array into the simulator's memory, I read each element of the array, which is called 'program', and put it in memory. Memory is 2048 words long and 32 bits wide. Registers are also declared as 32-bit integers. So when the array contains something like Datum.datum( "string" ) - the Datum class has IntDatum and StringDatum subclasses - I somehow have to store the "string" in the simulator's data segment of memory. Memory addresses 0-1023 are the text region and 1024-2047 the data region. I also have to delimit the string with a null char - plus do any checks for full memory, etc. I figured out that one way to store a String into MemContents ( a reference type - an empty interface - implemented by the class the memory field belongs to ) is to store the string, 2 or maybe 4 characters at a time, in a register, and then take the contents of the register and store it in memory. I found that very difficult to implement, and the reverse procedure as well.
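For illustration, here is a minimal Java sketch of the packing and its reverse, assuming big-endian order (first character in the high byte), ASCII characters, and a trailing NUL ('\0') delimiter; the names packString and unpackString are mine, not from the exercise:

public class StringPacking {
    // Pack a string into ints, 4 characters per int, big-endian, NUL-delimited.
    static int[] packString(String s) {
        byte[] bytes = new byte[((s.length() + 4) / 4) * 4]; // rounded up, with room for at least one NUL
        for (int i = 0; i < s.length(); i++) {
            bytes[i] = (byte) s.charAt(i); // assumes ASCII characters
        }
        int[] words = new int[bytes.length / 4];
        for (int i = 0; i < words.length; i++) {
            words[i] = (bytes[4 * i] & 0xFF) << 24
                     | (bytes[4 * i + 1] & 0xFF) << 16
                     | (bytes[4 * i + 2] & 0xFF) << 8
                     | (bytes[4 * i + 3] & 0xFF);
        }
        return words;
    }

    // The reverse: read bytes out of the ints until the NUL delimiter.
    static String unpackString(int[] words) {
        StringBuilder sb = new StringBuilder();
        for (int word : words) {
            for (int shift = 24; shift >= 0; shift -= 8) {
                char c = (char) ((word >> shift) & 0xFF);
                if (c == '\0') return sb.toString();
                sb.append(c);
            }
        }
        return sb.toString();
    }
}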
If you are working in C and you have your string in a char array whose size is a multiple of sizeof(int), you can just take the pointer to the char array, cast it to a pointer to an int array, and do whatever you want with your int array. If you don't have this last guarantee, you may simply write a function that creates your int array on the fly:
size_t IntArrayFromString(const char *Source, int **Dest)
{
    size_t stringLength = strlen(Source);
    /* Round up, leaving room for at least one terminating NUL byte. */
    size_t intArrElements = stringLength / sizeof(int) + 1;
    *Dest = (int *)malloc(intArrElements * sizeof(int));
    (*Dest)[intArrElements - 1] = 0;     /* zero the last element so the tail is NUL-padded */
    memcpy(*Dest, Source, stringLength); /* copy into the new buffer (*Dest, not Dest) */
    return intArrElements;
}
The caller is responsible for freeing the Dest buffer.
(I'm not sure if it really works, I didn't test it)
Have you considered simply using String.getBytes()? You can then use the byte array to create the ints (for example, using the BigInteger(byte[]) constructor).
This may not be the most efficient solution, but is probably less prone to errors and more readable.
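For example, a small sketch of that route (assuming ASCII text with no leading zero bytes, which a BigInteger round-trip would drop):

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class BigIntegerPack {
    public static void main(String[] args) {
        byte[] bytes = "test".getBytes(StandardCharsets.US_ASCII);
        BigInteger packed = new BigInteger(1, bytes); // signum 1: treat bytes as an unsigned big-endian magnitude
        String back = new String(packed.toByteArray(), StandardCharsets.US_ASCII);
        System.out.println(packed.toString(16) + " -> " + back); // prints 74657374 -> test
    }
}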
Assuming Java: you could look at the ByteBuffer class and its getInt method. It has a byte-order setting which you need to configure first.
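A small sketch of that approach, assuming big-endian order and an input whose length is a multiple of 4:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public class ByteBufferPack {
    public static void main(String[] args) {
        byte[] bytes = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN); // big-endian is also the default
        while (buf.remaining() >= 4) {
            System.out.printf("%08x%n", buf.getInt()); // getInt() consumes 4 bytes per call
        }
    }
}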
One common way to do this in C is to use a union. It could look like
union u_intstr {
char fourChars[4];
int singleInt;
};
Set the chars into the union as
union u_intstr myIntStr;
myIntStr.fourChars[0] = ch1;
myIntStr.fourChars[1] = ch2;
myIntStr.fourChars[2] = ch3;
myIntStr.fourChars[3] = ch4;
and then access the int as
printf("%d\n", myIntStr.singleInt);
Edit
In your case, for 16 ints, the union could be extended to look like
union u_my16ints {
char str[16*sizeof(int)];
int ints[16];
};
This is what I came up with:
size_t len = strlen(str);
size_t count = (len + sizeof(int)) / sizeof(int); /* rounds up, leaving room for a NUL */
int *ptr = (int *)calloc(count, sizeof(int));
memcpy(ptr, str, len); /* copy only the string; calloc already zeroed the tail */
Due to the use of calloc(), the resulting buffer has at least one NUL byte, and possibly more padding out the last integer. This is not portable, because the integers are in native byte order.
Today I have been experimenting with memory in Java. Specifically, I was serializing objects into binary data and deserializing them. Something caught my eye: an array of bytes of size 1 takes up less binary data than a single byte does. Here's what I mean:
I defined a single byte in Java and printed out its binary data:
byte size: 75 bytes
101011001110110100000000000001010111001101110010000000000000111001101010011000010111011001100001001011100110110001100001011011100110011100101110010000100111100101110100011001011001110001001110011000001000010011101110010100001111010100011100000000100000000000000001010000100000000000000101011101100110000101101100011101010110010101111000011100100000000000010000011010100110000101110110011000010010111001101100011000010110111001100111001011100100111001110101011011010110001001100101011100101000011010101100100101010001110100001011100101001110000010001011000000100000000000000000011110000111000000000001
and here's a byte[1] array
byte[] size: 28 bytes
10101100111011010000000000000101011101010111001000000000000000100101101101000010101011001111001100010111111110000000011000001000010101001110000000000010000000000000000001111000011100000000000000000000000000000000000100000001
But if I print out the size of byte[0] (the byte at location 0 in the array), it suddenly grows back to 75 bytes:
size of byte[0] in byte array:
101011001110110100000000000001010111001101110010000000000000111001101010011000010111011001100001001011100110110001100001011011100110011100101110010000100111100101110100011001011001110001001110011000001000010011101110010100001111010100011100000000100000000000000001010000100000000000000101011101100110000101101100011101010110010101111000011100100000000000010000011010100110000101110110011000010010111001101100011000010110111001100111001011100100111001110101011011010110001001100101011100101000011010101100100101010001110100001011100101001110000010001011000000100000000000000000011110000111000000000001
And yes, it's the full object and not metadata or something, because using this binary data I can reconstruct the object to its original state, so the values are stored inside the binary data. Here's the code I used to find out the size of the data:
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;

public class MemoryFunctions {
    static int sizeOf(Object input) {
        int size = 0;
        ByteArrayOutputStream checker = new ByteArrayOutputStream();
        try {
            ObjectOutputStream byteArray = new ObjectOutputStream(checker);
            byteArray.writeObject(input);
            byteArray.flush();
            byte[] sizeDetector = checker.toByteArray();
            size = sizeDetector.length;
            int amountOfBytes = 0;
            for (byte b : sizeDetector) {
                System.out.print(String.format("%8s", Integer.toBinaryString(b & 0xFF)).replace(' ', '0'));
                amountOfBytes += 1;
            }
            System.out.println("real size in bytes: " + amountOfBytes);
            System.out.println();
        } catch (Exception e) {
            System.err.println(e);
        }
        return size;
    }
}
Is there any reason that a byte array takes up less space than the byte itself? I need to heavily optimize a program. Using this information, would it be a better idea to keep the values of a class that I want to serialize in array form, or are there benefits to using "the full value"? Also, I am kind of confused, because as far as I know byte and byte[] are primitive datatypes, so they aren't passed by reference but are stored in memory as binary "as is". I also found that getting the size of the value at index 0 of the smaller array suddenly yields 75 bytes again. Does this mean that values are generated another time when you access an index of an array?
It'd be nice if any of you had more information about this topic and could answer my questions.
In Java byte is a primitive type passed by value, but all arrays are objects and passed by reference, including byte[].
If you pass a byte[1] to your sizeOf, it passes the array, because all object types are subclasses of and compatible with the parameter type Object, so sizeOf serializes the array.
If you try to pass a primitive byte, that doesn't work directly, because no primitive type is a reference type, so it cannot be compatible with java.lang.Object or any other object type. Instead the byte is 'boxed' into an object of the language-defined class java.lang.Byte (note the different spelling), and sizeOf serializes that object. This is often called auto-boxing (and auto-unboxing for the reverse) because the compiler does these conversions without you writing them in the source code. The boxed object is actually about the same size in memory as the array (and both are significantly larger than the primitive), but as commented by https://stackoverflow.com/users/869736/louis-wasserman, the serialization of the java.lang.Byte object is more complicated and longer than the serialization of the byte[1] array.
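A small demonstration of that difference, assuming the sizeOf method from the question is accessible:

byte b = 42;
byte[] arr = { 42 };
// The primitive is auto-boxed to a java.lang.Byte before the call, because
// sizeOf(Object) cannot receive a bare primitive; the array is already an
// object and is passed as-is.
System.out.println(MemoryFunctions.sizeOf(b));   // serializes a java.lang.Byte (the larger output)
System.out.println(MemoryFunctions.sizeOf(arr)); // serializes a byte[1] (the smaller output)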
I'm looking for a solution in pseudocode, Java, or JS for the following problem:
We need to implement an efficient bit structure to hold data for N bits (you can think of the bits as booleans as well, on/off).
We need to support the following methods:
init(n)
get(index)
set(index, True/False)
setAll(True/False)
Now I have a solution that is O(1) for everything except init, which is O(n). The idea was to create an array where each index saves the value of one bit. In order to support setAll, I would also save a timestamp with the bit value, to know whether to take the value from the array or from the last setAll value. The O(n) in init is because we need to go through the array to nullify it; otherwise it will contain garbage, which can be ANYTHING. Now I was asked to find a solution where init is also O(1) (we can create an array, but we can't clear the garbage; the garbage might even look like valid data, which is wrong and makes the solution bad; we need a solution that works 100%).
Update:
This is an algorithmic question, not a language-specific one. I encountered it in an interview. Also, using an integer to represent the bit array is not good enough because of memory limits. I was tipped that it has something to do with some kind of smart handling of the garbage data in the array without cleaning it in init, using some kind of mechanism that doesn't fail because of the garbage data in the array (but I'm not sure how).
Make a lazy data structure based on a hashmap (though a hashmap's access time can sometimes be worse than O(1)) with 32-bit values for storage (8-, 16-, or 64-bit ints are suitable too) and an auxiliary field InitFlag.
To clear all, make an empty map with InitFlag = 0 (deleting the old map is the GC's work in Java, isn't it?).
To set all, make an empty map with InitFlag = 1.
When changing some bit, check whether the corresponding key bitnum/32 exists. If yes, just change bit bitnum%32; if not and the bit value differs from InitFlag, create the key with a value based on InitFlag (all zeros or all ones) and change the needed bit.
When retrieving some bit, check whether the corresponding key exists. If yes, extract the bit; if not, return the InitFlag value.
SetAll(0): ifl = 0, map = {}
SetBit(35): ifl = 0, map = {1 : 0x08} (35 = 32 + 3, so bit 3 of word 1)
SetBit(32): ifl = 0, map = {1 : 0x09}
ClearBit(32): ifl = 0, map = {1 : 0x08}
ClearBit(1): do nothing (key 0 is absent and the bit already matches InitFlag), ifl = 0, map = {1 : 0x08}
GetBit(1): key 0 doesn't exist, return ifl = 0
GetBit(35): key 1 exists, return (map[1] >> 3) & 1 = 1
SetAll(1): ifl = 1, map = {}
SetBit(35): do nothing
ClearBit(35): ifl = 1, map = {1 : 0xFFFFFFF7 = 0b...11110111}
and so on
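A minimal Java sketch of this idea, assuming average O(1) map operations; the class and member names are mine:

import java.util.HashMap;
import java.util.Map;

class LazyBitSet {
    private Map<Integer, Integer> words = new HashMap<>();
    private int initFlag = 0; // the value of every bit not present in the map

    void setAll(boolean value) {
        words = new HashMap<>(); // O(1); the old map is left to the GC
        initFlag = value ? 1 : 0;
    }

    void set(int bit, boolean value) {
        int key = bit / 32, mask = 1 << (bit % 32);
        int fill = (initFlag == 1) ? 0xFFFFFFFF : 0; // what an absent word means
        int word = words.getOrDefault(key, fill);
        word = value ? (word | mask) : (word & ~mask);
        if (word == fill) words.remove(key); // keep the map sparse
        else words.put(key, word);
    }

    boolean get(int bit) {
        Integer word = words.get(bit / 32);
        if (word == null) return initFlag == 1; // absent word: fall back to InitFlag
        return (word & (1 << (bit % 32))) != 0;
    }
}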
If this is a college/high-school computer science test or homework assignment question - I suspect they are trying to get you to use BOOLEAN BIT-WISE LOGIC - specifically, saving the bit inside of an int or a long. I suspect (but I'm not a mind-reader - and I could be wrong!) that using "Arrays" is exactly what your teacher would want you to avoid.
For instance - this quote is copied from Google's search results:
long: The long data type is a 64-bit two's complement integer. The
signed long has a minimum value of -2^63 and a maximum value of 2^63 - 1.
In Java SE 8 and later, you can use the long data type to represent an
unsigned 64-bit long, which has a minimum value of 0 and a maximum
value of 2^64 - 1.
What that means is that a single long variable in Java could store 64 of your bit-wise values:
long storage = 0; // 64 bit-flags live in this one variable
// To test a bit-value, use bitwise AND ('&') and check for a non-zero result.
boolean result1 = (storage & 0b00000001) != 0; // Gets the first bit in 'storage'
boolean result2 = (storage & 0b00000010) != 0; // Gets the second
boolean result3 = (storage & 0b00000100) != 0; // Gets the third
...
boolean result8 = (storage & 0b10000000) != 0; // Gets the eighth result.
I could write the entire thing for you, but I'm not 100% sure of your actual specifications. If you use a long, you can only store 64 separate binary values. If you want an arbitrary number of values, you would have to use as many 'long' variables as you need.
Here is an SO post about binary / boolean values:
Binary representation in Java
Here is a SO post about bit-shifting:
Java - Circular shift using bitwise operations
Again, it would be a fair amount of work, and I'm not going to write the entire project. However, the get(int index) and set(int index, boolean val) methods would involve bit-wise shifting of the number 1.
long pos = 1;
pos = pos << 5; // This mask now 'points' at bit 5 of the binary number list.
boolean value = (storage & pos) != 0; // This retrieves the value stored at position 5.
I need to rewrite some code from C++ to Java, and I've run into trouble with this C++ code:
using u256 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<256, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
using h256 = FixedHash<32>;
using bytes = std::vector<byte>;
uint32_t offset = ...;
bytes m_data = ...;
u256 result;
result = (u256)*(h256 const*)(m_data.data() + (size_t)offset);
I have no idea what's going on or how to rewrite it in Java.
I've understood that first we apply an offset, so we're now pointing at some element of the m_data array, and then cast it to a pointer to h256 (I watched a debug session, and this cast did the following: we get the data from 0 to offset from m_data and then cast it to a 32-size array with leading zeros).
And then we get the first value (I'm not sure about this) of this array and cast it to u256? But the first value after the (h256 const*) cast is zero, yet the resulting value is not zero.
Do you have any ideas?
I don't know what a u256 is, and the question misses the typedef, but this is the typical way in C to get a scalar type (int16_t, int32_t, int64_t, double, ...) from a buffer in memory.
Essentially the use of the syntax:
type t = (type)*(const type *)(buffer + offset)
... lets you obtain an object of a specific type from a byte array, starting from a particular index.
It's not very safe, but it's blazing fast when converted to assembly!
NOTE: the pointer math depends on the declaration of "buffer": if it's int8_t *, for instance, the object will be read starting at the "offset"-th byte; if it's int32_t *, it will be read from the "offset * 4"-th byte.
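Not knowing more of the code base, a hedged Java sketch of an equivalent read would use BigInteger, assuming m_data is a byte[] and the 32 bytes form an unsigned big-endian value (which is what the h256-to-u256 conversion appears to do here):

import java.math.BigInteger;
import java.util.Arrays;

static BigInteger readU256(byte[] mData, int offset) {
    byte[] window = Arrays.copyOfRange(mData, offset, offset + 32); // the h256 "view" of the buffer
    return new BigInteger(1, window); // signum 1: unsigned big-endian magnitude, like u256
}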
I have read some answers to this question (Why I can't create an array with large size? and https://bugs.openjdk.java.net/browse/JDK-8029587) and I don't understand the following.
"In the GC code we pass around the size of objects in words as an int." As far as I know, the size of a word in the JVM is 4 bytes. According to this, if we pass around the size in words, as an int, of a long array of large size (for example, MAX_INT - 5 elements), we should get an OutOfMemoryError with "Requested array size exceeds VM limit", because the size is too large for an int even without the header. So why do arrays of different types have the same limit on the maximum number of elements?
Only addressing the "why arrays of different types have the same limit on max count of elements?" part:
Because it doesn't matter too much in practical reality, and it allows the code implementing the JVM to be simpler.
When there is only one limit, the same for all kinds of arrays, you can handle all arrays with that one piece of code, instead of having a lot of type-specific code.
And given that people who need "large" arrays can still create them, and only those who need really, really large arrays are affected, why spend the effort?
The answer is in the JDK sources as far as I can tell (I'm looking at jdk-9); after writing this I'm not sure whether it should be a comment instead (and whether it answers your question), but it's too long for a comment...
First the error is thrown from hotspot/src/share/vm/oops/arrayKlass.cpp here:
if (length > arrayOopDesc::max_array_length(T_ARRAY)) {
report_java_out_of_memory("Requested array size exceeds VM limit");
....
}
Now, T_ARRAY is actually an enum value of type BasicType that looks like this:
public static final BasicType T_ARRAY = new BasicType(tArray);
// tArray is an int with value = 13
That is the first indication that, when computing the maximum size, the JDK does not care what the array will hold (T_ARRAY does not specify what type of elements the array holds).
Now the method that actually validates the maximum array size looks like this:
static int32_t max_array_length(BasicType type) {
assert(type >= 0 && type < T_CONFLICT, "wrong type");
assert(type2aelembytes(type) != 0, "wrong type");
const size_t max_element_words_per_size_t =
align_size_down((SIZE_MAX/HeapWordSize - header_size(type)), MinObjAlignment);
const size_t max_elements_per_size_t =
HeapWordSize * max_element_words_per_size_t / type2aelembytes(type);
if ((size_t)max_jint < max_elements_per_size_t) {
// It should be ok to return max_jint here, but parts of the code
// (CollectedHeap, Klass::oop_oop_iterate(), and more) uses an int for
// passing around the size (in words) of an object. So, we need to avoid
// overflowing an int when we add the header. See CRs 4718400 and 7110613.
return align_size_down(max_jint - header_size(type), MinObjAlignment);
}
return (int32_t)max_elements_per_size_t;
}
I did not dive too deeply into the code, but it is based on HeapWordSize, which is at least 8 bytes. Here is a good reference (I tried to look it up in the code itself, but there are too many references to it).
I'm working on an application that's supposed to read and process flat files. These files don't always use a consistent encoding for every field in a record, so it was decided that we should read/write bytes and avoid the decoding/encoding involved in turning them into Strings.
However, a lot of these fields are simple integers, and I need to validate them (test that they really are integers and within a certain range). I need a function that receives a byte[] and turns it into an int. I'm assuming all the digits are plain ASCII.
I know I could do this by first turning the byte[] into a CharBuffer, decoding as ISO-8859-1 or UTF-8, and then calling Integer.parseInt(), but that seems like a lot of overhead, and performance is important.
So, basically, what I need is a Java equivalent of atoi(). I would prefer an API function (including 3rd-party APIs). Also, the function should report errors in some way.
As a side note, I'm having the same issue with fields representing dates/times (these are rarer, though). It would be great if someone could mention a fast C-like library for Java.
While I can't give you a ready Java solution, I want to point you to some interesting (C) code to read: the author of qmail has a small function, scan_ulong, to quickly parse unsigned longs from a byte array. You can find lots of incarnations of that function all over the web:
unsigned int scan_ulong(register const char *s,register unsigned long *u)
{
register unsigned int pos = 0;
register unsigned long result = 0;
register unsigned long c;
while ((c = (unsigned long) (unsigned char) (s[pos] - '0')) < 10) {
result = result * 10 + c;
++pos;
}
*u = result;
return pos;
}
(taken from here: https://github.com/jordansissel/djbdnsplus/blob/master/scan_ulong.c )
That code should translate pretty smoothly to Java.
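For example, a rough, untested Java sketch of that translation (Java's long is signed, so overflow behaves differently than C's unsigned long):

// Parses ASCII digits from the start of s; stores the result in out[0]
// and returns the number of digits consumed, like scan_ulong.
static int scanUlong(byte[] s, long[] out) {
    int pos = 0;
    long result = 0;
    while (pos < s.length) {
        int c = (s[pos] & 0xFF) - '0';
        if (c < 0 || c > 9) break; // stop at the first non-digit
        result = result * 10 + c;
        pos++;
    }
    out[0] = result;
    return pos;
}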
The atoi function from the C library is an incredibly dull piece of code: you can translate it to Java in five minutes or less. If you must avoid writing your own, you could use the String(byte[] buf, int offset, int length) constructor to make a Java String, bypassing CharBuffer, and then parse it to complete the conversion.
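For illustration, a minimal sketch of that route, assuming ASCII digits; the NumberFormatException doubles as the error report:

import java.nio.charset.StandardCharsets;

static int parseAsciiInt(byte[] buf, int offset, int length) {
    // ISO-8859-1 maps each byte straight to a char, so no decoding surprises
    return Integer.parseInt(new String(buf, offset, length, StandardCharsets.ISO_8859_1));
}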