Merge three ByteArrays together and then split resulting ByteArrays

I have a byte array, avroBinaryValue; a schema name, schemaName, as a String; and a last-modified date, lastModifiedDate, as a long.
byte[] avroBinaryValue = os.toByteArray();
String schemaName = "DEMOGRAPHIC";
long lastModifiedDate = 1379811105109L;
I am planning to convert schemaName into a byte array as well; let's call it byteSchemaName.
After that, I will convert lastModifiedDate to a byte array too; let's call it byteLMD.
What is the best way to concatenate these three byte arrays?
avroBinaryValue + byteSchemaName + byteLMD
Secondly, after concatenating them, I want to be able to split the resulting byte array so that I get the three original byte arrays back.
Is it possible to do that? Any help will be appreciated.
NOTE:
All three byte arrays will have different values in different scenarios. I am looking for the most efficient way to store the resulting byte array, so that it takes as little space on disk as possible. I don't want to serialize it again, since avroBinaryValue already comes from Avro data serialization. That is why I want to convert the other two values to byte arrays as well, so that I can merge all three into a single byte array.

You need to define a format. You have the following
byte[] avroBinaryValue = os.toByteArray();
String schemaName = "DEMOGRAPHIC";
long lastModifiedDate = 1379811105109L;
I guess avroBinaryValue can be variable length and so can schemaName. For all intents and purposes, lastModifiedDate fits in a long, i.e. 8 bytes.
If you want to serialize this (other than using Serializable), you'll have to use a specific format that tells you what you are reading and when to stop reading it. For example:
Offset      Length (bytes)   Purpose
0           4                length of avroBinaryValue array
4           X                avroBinaryValue array
4+X         4                length of schemaName byte array
4+X+4       Y                schemaName byte array
4+X+4+Y     8                value of lastModifiedDate
Also decide whether you want big-endian or little-endian byte order (DataOutputStream and DataInputStream always use big-endian).
So you write your three fields as described in the format and you read them back the same way.
Here's an example done in memory where os is a String (for simplicity)
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.nio.charset.StandardCharsets;

public class MergeAndSplit { // class name is arbitrary
    public static void main(String[] args) throws Exception {
        String os = "whatever os is";
        byte[] avroBinaryValue = os.getBytes(StandardCharsets.UTF_8);
        String schemaName = "DEMOGRAPHIC";
        long lastModifiedDate = 1379811105109L;
        byte[] schemaNameBytes = schemaName.getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream byteOs = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(byteOs);
        out.writeInt(avroBinaryValue.length);
        out.write(avroBinaryValue);
        out.writeInt(schemaNameBytes.length);
        out.write(schemaNameBytes);
        out.writeLong(lastModifiedDate);
        // write done

        byte[] allWrittenBytes = byteOs.toByteArray();
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(allWrittenBytes));
        int sizeAvro = in.readInt();
        avroBinaryValue = new byte[sizeAvro];
        in.readFully(avroBinaryValue); // read() may return fewer bytes than requested
        int sizeSchema = in.readInt();
        schemaNameBytes = new byte[sizeSchema];
        in.readFully(schemaNameBytes);
        lastModifiedDate = in.readLong();
        // read done

        System.out.println(new String(avroBinaryValue, StandardCharsets.UTF_8));
        System.out.println(new String(schemaNameBytes, StandardCharsets.UTF_8));
        System.out.println(lastModifiedDate);
    }
}
It prints
whatever os is
DEMOGRAPHIC
1379811105109
I understand you are trying to save space, but it might just be better to write each field to its own column or use a standard format like XML or JSON to serialize your fields.
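The length-prefixed layout described in this answer can also be written with java.nio.ByteBuffer instead of the stream classes. The following is only a sketch under stated assumptions: the class name RecordCodec, the nested Record holder and the choice of UTF-8 for the schema name are illustrative and not part of the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; class and method names are illustrative only.
class RecordCodec {

    // Holds the three decoded fields.
    static final class Record {
        final byte[] avro;
        final String schemaName;
        final long lastModified;

        Record(byte[] avro, String schemaName, long lastModified) {
            this.avro = avro;
            this.schemaName = schemaName;
            this.lastModified = lastModified;
        }
    }

    // Layout: int length + avro bytes, int length + schema name bytes, long timestamp.
    static byte[] pack(byte[] avro, String schemaName, long lastModified) {
        byte[] name = schemaName.getBytes(StandardCharsets.UTF_8); // UTF-8 is an assumption
        ByteBuffer buf = ByteBuffer.allocate(4 + avro.length + 4 + name.length + 8);
        buf.putInt(avro.length).put(avro);
        buf.putInt(name.length).put(name);
        buf.putLong(lastModified);
        return buf.array(); // safe here: the buffer was allocated at the exact size
    }

    // Reads the fields back in the same order they were written.
    static Record unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed); // big-endian by default
        byte[] avro = new byte[buf.getInt()];
        buf.get(avro);
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        long lastModified = buf.getLong();
        return new Record(avro, new String(name, StandardCharsets.UTF_8), lastModified);
    }

    public static void main(String[] args) {
        byte[] packed = pack(new byte[] {1, 2, 3}, "DEMOGRAPHIC", 1379811105109L);
        Record r = unpack(packed);
        System.out.println(packed.length + " bytes, schema=" + r.schemaName
                + ", lastModified=" + r.lastModified);
    }
}
```

The overhead of this layout is the two 4-byte length prefixes; everything else is the payload itself.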

Related

Fastest way to convert sequence of bytes to string [duplicate]

I have a byte array of 151 bytes, which is typically a record. The record needs to be inserted into an Oracle database. Within the 151 bytes, bytes 0 to 1 are a record id, bytes 2 to 3 are a reference id, and bytes 4 to 9 are a date value. The following byte array holds a date value that I want to convert to a string.
byte[] b = {48, 48, 49, 48, 48, 52}; // when converted to a string it becomes "001004"
new String(b); // current approach
Is there an efficient way to convert a range of the byte array (for example Arrays.copyOfRange(b, 0, 5)) to a string?

new String(b, 0, 5);
See the API doc for more information.

Use the String(byte[] bytes, int offset, int length) constructor: http://download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html#String(byte[], int, int)
new String(b, 0, 5);

If you need to create a string for each region in the record, I would suggest a substring approach:
byte[] wholeRecord = ...; // the whole 151-byte record goes here
String wholeString = new String(wholeRecord);
String id = wholeString.substring(0, 2); // bytes 0 to 1; the end index is exclusive
String refId = wholeString.substring(2, 4); // bytes 2 to 3
...
The actual offsets may be different depending on the string encoding, because a character is not always a single byte.
The advantage of this approach is that the byte array is decoded only once. Whether substring() then copies depends on the JDK: before Java 7 update 6 it shared the original character array with different offsets, whereas later versions copy the selected characters into a new String. The saving in memory and array copying time therefore applies only to older JDKs.

None of the answers here consider that you might not be using ASCII. When converting bytes to a string, you should always consider the charset.
new String(bytes, offset, length, charset);
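As a small illustration of the constructor with an explicit charset, the sketch below decodes ranges of the byte array from the question directly, without copying the range first. The class name FieldDecoder and the choice of US-ASCII are assumptions made for the example; use whichever charset the record was actually written in.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name is illustrative only.
class FieldDecoder {
    // Decodes `length` bytes starting at `offset`, without an intermediate array copy.
    static String field(byte[] record, int offset, int length) {
        return new String(record, offset, length, StandardCharsets.US_ASCII); // charset is an assumption
    }

    public static void main(String[] args) {
        byte[] b = {48, 48, 49, 48, 48, 52};
        System.out.println(field(b, 0, 6)); // all six bytes: 001004
        System.out.println(field(b, 2, 4)); // bytes 2 to 5 only: 1004
    }
}
```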

And here is a fantastic (but not efficient) way :)
byte[] b = { 48, 48, 49, 48, 48, 52 };
ByteArrayInputStream bais = new ByteArrayInputStream(b);
BufferedReader buf = new BufferedReader(new InputStreamReader(bais));
String s = buf.readLine();
System.out.println(s);

byte array length varies before and after transformation

I need to send and receive large byte arrays over the internet (HTTP RESTful service).
The simplest way I can think of is to convert the byte array into a string.
I searched around and found this post: Java Byte Array to String to Byte Array
I used the following code to verify the accuracy of the transformation.
String message = "Die Strahlengriffelgewächse stammen...";
System.out.println("message");
System.out.println (message);
byte[] pack = Fbs.packExce(message);
System.out.println ("pack");
System.out.println (pack);
System.out.println ("packlenght:" + pack.length);
String toString = new String(pack);
System.out.println ("toString");
System.out.println (toString);
byte[] toBytes = toString.getBytes();
System.out.println ("toBytes");
System.out.println (toBytes);
System.out.println ("toByteslength:" +toBytes.length);
"Fbs.packExce()" is a method that takes a large string and produces a large byte array.
I changed the length of the message and printed the length of the byte arrays before converting to a string and after converting back.
I got the following results:
...
pack
[B@5680a178
packlenght:748
...
toBytes
[B@5fdef03a
toByteslength:750
----------------------
...
pack
[B@5680a178
packlenght:1016
...
toBytes
[B@5fdef03a
toByteslength:1018
I have omitted the "message" since it is too long.
8 times out of 10, the derived byte array (the new one, "toBytes") is 2 bytes longer than the original byte array ("pack").
I say 8 out of 10 because there were also cases where the derived and the original arrays had the same length, see below:
...
pack
[B@5680a178
packlenght:824
toString
...
toBytes
[B@5fdef03a
toByteslength:824
...
I cannot figure out the exact rules.
Does anyone have any idea?
Or are there better ways of converting a byte array to and from a string?
Cheers

the simplest way I can think of is to convert the byte array into string.
The simplest way is the wrong way. For most character encodings, converting an arbitrary byte sequence to text is likely to be lossy.
A better (i.e. more robust) way is to use Base64 encoding. Read the javadoc for the Base64 class and its nested Encoder and Decoder classes.
If you do persist in trying to convert arbitrary bytes to characters and back using new String(byte[]) and the like:
Be sure that you choose a character encoding where a Bytes -> Characters -> Bytes conversion sequence is not lossy. (ISO-8859-1, also known as Latin-1, will work.)
Don't rely on the current execution platform's default character encoding for the encoding / decoding charset.
In a client / server system, the client and server have to use the same encoding.
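The points above can be checked with a short experiment. The sketch below pushes the same three bytes through a UTF-8 string, an ISO-8859-1 string and Base64; the class name RoundTripDemo and the sample bytes are made up for illustration. In the UTF-8 case the invalid byte 0xFF is decoded to the replacement character U+FFFD, which encodes back as three bytes, so the array grows by two. That is one plausible explanation for the 2-byte growth in the question, assuming the platform default charset there was UTF-8, which the question does not state.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; names and sample data are illustrative only.
class RoundTripDemo {
    // Lossy for byte sequences that are not valid UTF-8.
    static byte[] viaUtf8String(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: every byte value maps to exactly one character and back.
    static byte[] viaLatin1String(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Lossless: Base64 is designed to carry arbitrary bytes as text.
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xFF, 0x42}; // 0xFF never occurs in valid UTF-8
        System.out.println("original length: " + data.length);                // 3
        System.out.println("via UTF-8 length: " + viaUtf8String(data).length); // 5
        System.out.println("Latin-1 intact: " + Arrays.equals(data, viaLatin1String(data)));
        System.out.println("Base64 intact: " + Arrays.equals(data, viaBase64(data)));
    }
}
```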

I have a need to send and receive large byte array over internet (http restful service). the simplest way I can think of is to convert the byte array into string.
If this is only about sending and receiving a byte array with JAX-RS, every JAX-RS implementation is perfectly capable of transmitting byte[]. See the specification, section 4.2.4.

As suggested by Stephen C, I turned to Base64 in basic mode.
The following is my current complete verification code:
String message = "Die Strahlengriffelgewächse stammen ... ...
System.out.println("message");
System.out.println (message);
byte[] pack = Fbs.packExce(message);
System.out.println ("pack");
System.out.println (pack);
System.out.println ("packlenght:" + pack.length);
String toString = Base64.getEncoder().encodeToString(pack);
System.out.println ("toString");
System.out.println (toString);
byte[] toBytes = Base64.getDecoder().decode(toString);
System.out.println ("toBytes");
System.out.println (toBytes);
System.out.println ("toByteslength:" +toBytes.length);
String toBytesExtraction = extractExce(toBytes);
System.out.println ("toBytesExtraction");
System.out.println (toBytesExtraction);
String extraction = extractExce(pack);
System.out.println ("extraction");
System.out.println (extraction);
public static byte[] packExce(String text){
    FlatBufferBuilder builder = new FlatBufferBuilder(0);
    int textOffset = builder.createString(text);
    Exce.startExce(builder);
    Exce.addText(builder, textOffset);
    int exce = Exce.endExce(builder);
    Bucket.startBucket(builder);
    Bucket.addContentType(builder, Post.Exce);
    Bucket.addContent(builder, exce);
    int buck = Bucket.endBucket(builder);
    builder.finish(buck);
    return builder.sizedByteArray();
    //ByteBuffer buf = builder.dataBuffer();
    //return buf;
    //return Base64.getMimeEncoder().encodeToString(buf.array());
}
private String extractExce(byte[] bucket ){
    String message = null;
    ByteBuffer buf = ByteBuffer.wrap(bucket);
    Bucket cont = Bucket.getRootAsBucket(buf);
    System.out.println (cont.contentType());
    if (cont.contentType() == Post.Exce){
        message = ((Exce)cont.content(new Exce())).text();
    }
    return message;
}
and it seems to work for my purpose:
...
pack
[B@5680a178
packlenght:2020
...
toBytes
[B@5fdef03a
toByteslength:2020
----------------------
...
pack
[B@5680a178
packlenght:1872
...
toBytes
[B@5fdef03a
toByteslength:1872
...
Both extractions, from "toBytes" and from "pack", faithfully restored the original "message":
String toBytesExtraction = extractExce(toBytes);
String extraction = extractExce(pack);
As a matter of fact, what I did not mention is that my original implementation used Base64 MIME encoding. My starting point back then was a ByteBuffer (currently it is a byte[]).
Here are my code snippets, in case you are interested.
Encoder:
...
ByteBuffer buf = builder.dataBuffer();
return Base64.getMimeEncoder().encodeToString(buf.array());
Decoder:
ByteBuffer buf = ByteBuffer.wrap(Base64.getMimeDecoder().decode(bucket));
My guess is that the problem came from the Base64 MIME encoding, because my first troubleshooting step was to remove it and use the ByteBuffer directly, and that worked.
Well, I am wandering off a bit.
Back to the topic: I still have no idea about the 2-byte difference between the byte arrays before and after converting with new String(byte[]) and String.getBytes().
Cheers

How to use ByteBuffer properly?

I am trying to use ByteBuffer properly with big-endian byte order.
I have a couple of fields which I am trying to put together into a single ByteBuffer before storing it in a Cassandra database.
The byte array which I will be writing into Cassandra is made up of three fields, as described below:
short employeeId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] attributeValue = os.toByteArray();
I will write employeeId, lastModifiedDate and attributeValue together into a single byte array and store that in Cassandra. A C++ program will then retrieve the byte array from Cassandra and deserialize it to extract employeeId, lastModifiedDate and attributeValue.
To do this, I am using ByteBuffer with big-endian byte order.
I have put this code together:
public static void main(String[] args) throws Exception {
    String text = "Byte Buffer Test";
    byte[] attributeValue = text.getBytes();
    long lastModifiedDate = 1289811105109L;
    short employeeId = 32767;
    int size = 2 + 8 + 4 + attributeValue.length; // short is 2 bytes, long 8 and int 4
    ByteBuffer bbuf = ByteBuffer.allocate(size);
    bbuf.order(ByteOrder.BIG_ENDIAN);
    bbuf.putShort(employeeId);
    bbuf.putLong(lastModifiedDate);
    bbuf.putInt(attributeValue.length);
    bbuf.put(attributeValue);
    bbuf.rewind();
    // best approach is copy the internal buffer
    byte[] bytesToStore = new byte[size];
    bbuf.get(bytesToStore);
    // write bytesToStore in Cassandra...
    // Now retrieve the Byte Array data from Cassandra and deserialize it...
    byte[] allWrittenBytesTest = bytesToStore; // magicFunctionToRetrieveDataFromCassandra();
    ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest);
    bb.order(ByteOrder.BIG_ENDIAN);
    bb.rewind();
    short extractEmployeeId = bb.getShort();
    long extractLastModifiedDate = bb.getLong();
    int extractAttributeValueLength = bb.getInt();
    byte[] extractAttributeValue = new byte[extractAttributeValueLength];
    bb.get(extractAttributeValue); // read attributeValue from the remaining buffer
    System.out.println(extractEmployeeId);
    System.out.println(extractLastModifiedDate);
    System.out.println(new String(extractAttributeValue));
}
Is there a better way of doing this than what I am doing currently? Or are there some minor improvements that could be made here?
This is the first time I am using ByteBuffer, so I am having a bit of trouble.
Can anyone take a look and let me know whether this is the right way to use ByteBuffer?

The default order is always BIG_ENDIAN, so you don't need to set it. Also, when you wrap() an array, the buffer is already rewind()ed.
Instead of copying the contents into a new array, I would use the underlying array directly.
Replace
bbuf.rewind();
// best approach is copy the internal buffer
byte[] bytesToStore = new byte[size];
bbuf.get(bytesToStore);
with
byte[] bytesToStore = bbuf.array();
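One caveat worth keeping in mind: array() returns the whole backing array, so it matches the written data only when the buffer was allocated at exactly the right size, as it is in the question. The sketch below shows the difference with a deliberately oversized buffer; the class name BufferBytes and the helper writtenBytes are hypothetical, introduced only for this illustration.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper; names are illustrative only.
class BufferBytes {
    // Copies only the bytes written so far, regardless of the buffer's capacity.
    static byte[] writtenBytes(ByteBuffer buf) {
        ByteBuffer view = buf.duplicate(); // independent position and limit, shared content
        view.flip();                       // limit = bytes written, position = 0
        byte[] out = new byte[view.remaining()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16); // deliberately larger than needed
        buf.putShort((short) 32767).putInt(7);
        System.out.println(buf.array().length);                 // 16: whole backing array
        System.out.println(writtenBytes(buf).length);           // 6: only what was written
        System.out.println(Arrays.toString(writtenBytes(buf))); // [127, -1, 0, 0, 0, 7]
    }
}
```

Working on a duplicate() leaves the original buffer's position untouched, so more fields can still be appended afterwards.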

Array of bytes: how to know exactly numbers of byte received

I'm using an array of bytes to store a data packet received from another computer.
receivedData = new byte[1024];
receivedPacket = new DatagramPacket(receivedData, receivedData.length);
socket.receive(receivedPacket);
receivedData = receivedPacket.getData();
String res = new String(receivedData); // PROBLEM HERE
The problem is in the last line: because I declare receivedData as a byte array of length 1024, the last line always creates a new string from the whole array, regardless of how many bytes were actually received. So I get a frustrating error: res is not what I expect (because the bytes actually received do not fill the whole array).
So, my question is: how can I fix this? How can I know how many bytes I really received, so that I convert only those to a string?

Try using DatagramPacket.getLength().
receivedData = new byte[1024];
receivedPacket = new DatagramPacket(receivedData, receivedData.length);
socket.receive(receivedPacket);
receivedData = receivedPacket.getData();
String charsetName = "US-ASCII"; // set to desired charset
String res = new String(receivedData, 0, receivedPacket.getLength(), charsetName);
Edited to add charset. Thanks, parsifal.

From the javadoc for DatagramSocket.receive():
The length field of the datagram packet object contains the length of the received message
You can then construct your String using the constructor that takes a byte array, an offset and a length.

Call DatagramPacket.getLength() to find out how many bytes were actually received.
And when you construct the String from those bytes, be sure to specify the encoding (as it is, you're using the JDK default encoding, which may differ from the server's encoding).
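Putting both points together, a decoding helper might look like the sketch below. To keep it runnable without a network, main fills the buffer by hand and calls setLength, standing in for what socket.receive(packet) does with a real datagram. The class name PacketText and the choice of UTF-8 are assumptions; use the charset the sender actually encoded with.

```java
import java.net.DatagramPacket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the name is illustrative only.
class PacketText {
    // Decodes only the bytes that belong to the received datagram.
    static String text(DatagramPacket packet) {
        return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                StandardCharsets.UTF_8); // charset is an assumption
    }

    public static void main(String[] args) {
        byte[] buffer = new byte[1024];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);

        // Stand-in for socket.receive(packet): copy a payload in and set the length.
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        packet.setLength(payload.length);

        System.out.println(text(packet).length());                               // 5
        System.out.println(new String(buffer, StandardCharsets.UTF_8).length()); // 1024
    }
}
```

When the same DatagramPacket is reused for several receives, its length should be reset to the buffer size before each receive call, because receive shrinks it to the size of the last message.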

Efficient way to split a byte array to map the fields in a java class object

Is there a more efficient way to break up the data in a byte array in Java?
I have written the following function to read a binary file with fixed-length data fields, but the performance is really slow. I need to read a binary file with 30,000 records, each 300 bytes long and containing 240 fields.
Any advice?
public void breakField(byte[] input) {
    ByteArrayInputStream bais = new ByteArrayInputStream(input);
    byte[] tmp = new byte[2];
    bais.read(tmp);
    this.id = new String(tmp);
    tmp = new byte[4];
    bais.read(tmp);
    this.name = new String(tmp);
    tmp = new byte[8];
    bais.read(tmp);
    this.phone = new String(tmp);
    tmp = new byte[15];
    bais.read(tmp);
    this.otherInfo = new String(tmp);
    // ... more fields ...
}

If you read the entire file into a byte[] array and the file is larger than the memory available to the JVM, you will get an OutOfMemoryError.
Instead, you can read the file piece by piece through a buffered stream, without loading the entire thing into memory. Because the file is binary, use a BufferedInputStream in conjunction with a FileInputStream; a BufferedReader with a FileReader decodes the bytes into characters and is only appropriate for text.
BufferedInputStream in = new BufferedInputStream(new FileInputStream("/path/to/my/file"));
To load the data record by record, read 300 bytes at a time into a byte[] with in.read(byte[], int, int), checking the return value because a single call may deliver fewer bytes than requested. If you prefer to deal with individual fields, read the number of bytes that corresponds to the length of the next field instead.
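For a binary file with fixed-length records, one way to sketch the record-by-record idea is to read each record into a single reusable buffer with DataInputStream.readFully and then decode each field straight from its offset, which avoids a temporary array per field. The class name FixedRecordReader, the 29-byte record length (only the four fields shown in the question) and the US-ASCII charset are assumptions for illustration; the real records are 300 bytes with 240 fields.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader; names, record length and charset are assumptions.
class FixedRecordReader {
    static final int RECORD_LENGTH = 29; // 2 + 4 + 8 + 15 for the four fields shown

    // Decodes one field directly from the record buffer.
    static String field(byte[] record, int offset, int length) {
        return new String(record, offset, length, StandardCharsets.US_ASCII);
    }

    // Reads records one at a time; each entry is {id, name, phone, otherInfo}.
    static List<String[]> readAll(InputStream source) throws IOException {
        List<String[]> records = new ArrayList<>();
        DataInputStream in = new DataInputStream(source);
        byte[] record = new byte[RECORD_LENGTH]; // reused for every record
        while (true) {
            try {
                in.readFully(record); // blocks until a whole record has been read
            } catch (EOFException end) {
                return records; // no more complete records
            }
            records.add(new String[] {
                field(record, 0, 2), field(record, 2, 4),
                field(record, 6, 8), field(record, 14, 15)
            });
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "01JOHN55512345ABCDEFGHIJKLMNO".getBytes(StandardCharsets.US_ASCII);
        for (String[] r : readAll(new ByteArrayInputStream(data))) {
            System.out.println(String.join("|", r));
        }
    }
}
```

For a real file, the source would typically be a BufferedInputStream wrapped around a FileInputStream, so that the small reads are served from an in-memory buffer.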
