Manipulate big XML file in Java Spring - java

I have a Java program (a war) that runs out of memory when manipulating a big XML file.
The program is a REST API that returns the manipulated XML via a REST Controller.
First, the program gets an XML file from a remote URL.
Then it replaces the values of id attributes.
Finally, it returns the new XML to the caller via the API controller.
What I get from the remote URL is a byte[] body with XML data.
Then, I convert it to a String.
Next, I do a regexp search-replace on the whole string.
Then I convert it back to a byte[].
I'm guessing that the XML is now in memory three times (the incoming bytes, the String, and the outgoing bytes).
I'm looking for ways to improve this.
I have no local copies on the filesystem btw.

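You can make the incoming bytes eligible for garbage collection once they have been converted to a String, as long as nothing else (such as bytesFromURL) still references the array:
import java.nio.charset.StandardCharsets;
byte[] bytes = bytesFromURL; // bytesFromURL must not stay referenced elsewhere, or the array remains reachable
String xml = new String(bytes, StandardCharsets.UTF_8); // decode explicitly; assumes the XML is UTF-8
bytes = null; // drop the reference before manipulating, so the array can be reclaimed during the replace
// ...manipulate xml here (e.g. the regex search-replace)
System.gc(); // only a hint to the JVM; explicit calls are rarely needed
bytes = xml.getBytes(StandardCharsets.UTF_8);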
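Dropping references lowers the peak, but the whole document still exists as a String and a byte[] at the same time. A different technique, not the one used in the question or this answer, is to stream the XML with StAX (javax.xml.stream, included in the JDK) and rewrite the id attributes event by event, so only the current element is held in memory. This is a minimal sketch: it assumes the remote body can be obtained as an InputStream rather than a byte[], and newId is a hypothetical function standing in for whatever the regex replacement computed.

```java
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

public class StreamingIdRewriter {

    // Copies XML from in to out, passing the value of every attribute whose
    // local name is "id" through newId. Only one event is in memory at a time.
    public static void rewriteIds(InputStream in, OutputStream out, UnaryOperator<String> newId)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    StartElement start = event.asStartElement();
                    List<Attribute> attributes = new ArrayList<>();
                    Iterator<?> it = start.getAttributes();
                    while (it.hasNext()) {
                        Attribute attr = (Attribute) it.next();
                        if ("id".equals(attr.getName().getLocalPart())) {
                            // Rebuild the attribute with the replacement value
                            attributes.add(events.createAttribute(attr.getName(), newId.apply(attr.getValue())));
                        } else {
                            attributes.add(attr);
                        }
                    }
                    // Same element name and namespaces, rewritten attributes
                    event = events.createStartElement(start.getName(), attributes.iterator(), start.getNamespaces());
                }
                writer.add(event);
            }
            writer.flush();
        } finally {
            writer.close();
            reader.close();
        }
    }
}
```

On the REST side, Spring MVC's StreamingResponseBody return type lets a controller write directly to the response OutputStream, which would pair with this approach; whether it fits depends on how the existing controller is set up.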

Related

Send binary file from Java Server to C# Unity3d Client with Protocol Buffer

I asked this question https://stackoverflow.com/questions/32735189/sending-files-from-java-server-to-unity3d-c-sharp-client but I realized that sending files between Java and C# via built-in operations isn't an optimal solution, because I also need to send other messages, not only the file content.
Therefore, I tried using Protobuf, because it is fast and can serialize/deserialize objects in a platform-independent way. My .proto file is the following:
message File{
optional int32 fileSize = 1;
optional string fileName = 2;
optional bytes fileContent = 3;
}
So, I set the values of each field using the generated Java class:
// file is the message builder (e.g. from File.newBuilder()); setters exist only on builders
file.setFileSize(fileSize);
file.setFileName(fileName);
file.setFileContent(ByteString.copyFrom(fileContent, 0, fileContent.length));
I saw many tutorials about how to write the objects to a file and read from it. However, I can't find any example about how to send a file from server socket to client socket.
My intention is to serialize the object (file size, file name and file content) on the Java server and send this information to the C# client, so that the file can be deserialized and stored on the client side.
In my earlier example code, the server reads the bytes of the file (an image file) and writes them to the output stream, so that the client can read them from the input stream and write them to disk. I want to achieve the same thing by serializing a message generated from my .proto file.
Can anyone provide me an example or give me a hint how to do that?
As described in the documentation, protobuf does not keep track of where a message starts and stops, so when using a stream socket like TCP you have to do that yourself.
From the doc:
[...] If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. [...]
Length-prefixing is a good candidate. Depending on the language you're writing in, there are libraries that do length-prefixing for e.g. TCP that you can use, or you can define it yourself.
An example representation of the buffer on the wire might be (beginning of buffer to the left):
[buf_length|serialized_buffer]
So your code to pack the buffer before sending might look something like this (in JavaScript with Node.js):
function pack(message) {
  // 2-byte big-endian length prefix, so a message can be at most 65535 bytes
  var packet = Buffer.alloc(message.length + 2);
  packet.writeUInt16BE(message.length, 0);
  message.copy(packet, 2);
  return packet;
}
To read you would have to do the opposite:
var dataBuffer = Buffer.alloc(0);
client.on('data', function (data) {
  dataBuffer = Buffer.concat([dataBuffer, data]);
  // Process every complete message currently in the buffer
  while (dataBuffer.length >= 2) {
    // Message length excluding the 2-byte length prefix
    var msgLen = dataBuffer.readUInt16BE(0);
    if (dataBuffer.length < msgLen + 2) break; // wait for the rest of the message
    var thisMsg = dataBuffer.slice(2, msgLen + 2);
    // do something with the msg here
    // Remove processed message from buffer
    dataBuffer = dataBuffer.slice(msgLen + 2);
  }
});
You should also be aware that when sending multiple protobufs on a TCP socket, they are likely to be buffered for network optimization (concatenated) and sent together, and a single message may also arrive split across several reads. Some sort of framing is therefore needed anyway.
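The same length-prefixing on the Java server side could look like the following minimal sketch, built on the JDK's DataOutputStream and DataInputStream. The 4-byte big-endian prefix is an assumption chosen for this example, not something protobuf prescribes, and the class name is hypothetical; the payload would be the serialized message bytes (toByteArray() on the built message).

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class LengthPrefixFraming {

    // Writes one frame: a 4-byte big-endian length, then the payload bytes
    public static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads one frame; readFully blocks until the whole payload has arrived,
    // even if TCP delivers it in several pieces
    public static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Negative frame length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return payload;
    }
}
```

Note that DataOutputStream writes the int big-endian, while C#'s BinaryReader.ReadInt32 reads little-endian, so the C# side would need to reverse the byte order (or both sides agree on one order). Alternatively, protobuf-java's writeDelimitedTo and parseDelimitedFrom already do this framing with a varint length prefix, and the C# Google.Protobuf runtime has WriteDelimitedTo and ParseDelimitedFrom counterparts.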

Reading a binary file from the file system as a BLOB to use in rhino with javascript

I'm planning to use SheetJS with Rhino. SheetJS takes a binary object (a BLOB, if I'm correct) as its input, so I need to read a file from the system using standard Java I/O methods and store it in a BLOB before passing it to SheetJS, e.g.:
var XLDataWorkBook = XLSX.read(blobInput, {type : "binary"});
So how can I create a BLOB (or an appropriate type) from a binary file in Java in order to pass it in?
I guess I can't pass streams, because XLSX needs a completely created object to process.
I found the answer to this by myself. I was able to get it done this way.
Read the file with an InputStream and then write it to a ByteArrayOutputStream, like below:
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
...
buffer.write(bytes, 0, len);
Then create a byte array from it.
byte[] byteArray = buffer.toByteArray();
Finally, I converted it to a Base64 String (which also works in my case) using the "Base64.encodeBase64String()" method from the org.apache.commons.codec.binary package, so I can pass the Base64 String as a method parameter.
If you need it, there are also plenty of libraries (third-party and built-in) for converting Base64 to a Blob.
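Put together, those steps might look like the following sketch. It fills in the elided read loop with a standard buffer-copy loop and uses the JDK's java.util.Base64 (Java 8+) in place of Apache Commons Codec; the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;

public class WorkbookLoader {

    // Reads the whole file into memory; SheetJS needs the complete workbook anyway
    public static byte[] readFileBytes(String path) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new FileInputStream(path)) {
            byte[] chunk = new byte[8192];
            int len;
            while ((len = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, len);
            }
        }
        return buffer.toByteArray();
    }

    // Base64 text can be handed to the JavaScript side as an ordinary string
    public static String readFileAsBase64(String path) throws IOException {
        return Base64.getEncoder().encodeToString(readFileBytes(path));
    }
}
```

If the Base64 string is what reaches SheetJS, its read options include a "base64" type, so the call would presumably use {type: "base64"} rather than {type: "binary"}.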

Saving a file returned by web-service in java

I have to make a call to a web service whose response is as follows:
<ns2:wsresponse>
<ns2:length>10582</ns2:length>
<ns2:filecontent>H4sIAAAAAAAAALVZa3OjSLL9fB...
(Snip)
</ns2:filecontent>
<ns2:contentType>application/gzip</ns2:contentType>
</ns2:wsresponse>
The web service is actually returning a file encoded with the MIME type application/gzip (as in ns2:contentType). I am not sure how to save the file to disk on the client side in Java.
The <ns2:filecontent> tag appears to hold a BASE64 encoded string which probably is the content of the file.
Basically, take that Base64-encoded string and decode it; the resulting byte[] can then be written to disk.
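A minimal sketch of that, assuming the text of ns2:filecontent has already been extracted from the response and passed in as a String. The "H4sI" prefix is how the gzip magic bytes (0x1F 0x8B 0x08) look in Base64, which matches the application/gzip content type, so the decoded bytes can be saved as a .gz file as-is or decompressed first. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

public class FileContentSaver {

    // Decodes the Base64 text and writes the raw (still gzipped) bytes
    public static void saveRaw(String base64Content, Path target) throws IOException {
        // The MIME decoder ignores line breaks and other non-Base64 characters,
        // which XML text content often contains
        byte[] gzipped = Base64.getMimeDecoder().decode(base64Content);
        Files.write(target, gzipped);
    }

    // Decodes and gunzips, so the file on disk holds the uncompressed content
    public static void saveDecompressed(String base64Content, Path target) throws IOException {
        byte[] gzipped = Base64.getMimeDecoder().decode(base64Content);
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```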

Encryption/Decryption, getting IllegalBlockSizeException

I'm working with Java's encryption library and getting an IllegalBlockSizeException.
I am currently trying to extract database contents in XML file format. During the data dump, I am creating a manifest file with a string that gets encrypted using a key defined in the database.
Later, when the contents of the XML's files are loaded into another database, it gets the key from that database and uses it to decrypt the manifest. If the decrypted manifest does not match the original contents, that means the encryption keys in the source and destination databases do not match and the user is notified of this.
The following is the code. The EncryptionEngine object is a singleton that uses the Java encryption library to abstract away a lot of the details of encryption. Assume that it works correctly, as it's fairly old and mature code.
This is all in a class I've made. First, we have these data members:
private final String encryptedManifestContents;
private final static String DECRYPTED_MANIFEST_CONTENTS = "This file contains the encrypted string for validating data in the dump and load process";
final static String ENCRYPTED_MANIFEST_FILENAME = "manifest.bin";
First the encryption process. The string is encrypted like the following:
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS); // The contents get converted to bytes via getBytes("UTF-8")
Then written to the manifest file (destination is just a variable holding the file path as a string):
EncryptedManifestUtil encryptedManifestUtil = new EncryptedManifestUtil(); // The class I've created. The constructor is the code above, which just initialized the EncryptionEngine and encrypted the manifest string.
manifestOut = new FileOutputStream(destination + "/" + ENCRYPTED_MANIFEST_FILENAME);
manifestOut.write(encryptedManifestUtil.encryptedManifestContents.getBytes("UTF-8"));
At this point, the encryption process is done. We've taken a String, encrypted it, and written the contents to a file, in that order. Now when someone loads the data, the decryption process starts:
BufferedReader fileReader = new BufferedReader(new FileReader(filename)); // Filename is the manifest's file name and location
final EncryptionEngine encryptionEngine = EncryptionEngine.getInstance();
String decryptedManifest = encryptionEngine.decryptString(fileReader.readLine().getBytes("UTF-8")); // This is a symmetric decrypt
When the decryption happens, it throws this exception:
Caused by: javax.crypto.IllegalBlockSizeException: last block incomplete in decryption
at org.bouncycastle.jcajce.provider.symmetric.util.BaseBlockCipher.engineDoFinal(Unknown Source)
at javax.crypto.Cipher.doFinal(DashoA13*..)
It appears to read and write correctly to the file, but the contents are gibberish to me. The result from the fileReader.readLine() is:
9�Y�������䖷�߾��=Ă��� s7Cx�t�b��_-(�b��LFA���}�6�f����Ps�n�����ʢ�#�� �%��%�5P�p
Thanks for the help.
EDIT: So I changed the way I write to a file.
Recall this line:
encryptedManifestContents = encryptionEngine.symmetricEncrypt(DECRYPTED_MANIFEST_CONTENTS);
The encrypt method first gets the bytes from the input string, then encrypts them, then turns the encrypted bytes back into a string by first encoding them as Base64 bytes and then converting that Base64 byte array to a string.
With this in mind, I changed the file writer to a PrintWriter instead of a FileOutputStream and now write the string directly to the file instead of the bytes. Unfortunately, I'm still getting the error. However, there seem to be fewer � characters in the string returned by readLine().
It looks like the problem is with your fileReader.readLine() - you're writing a byte stream to a file, and then reading it back in as a string. Instead, you should either read in a byte stream, e.g. refer to this question, or else use Base64 Encoding to convert your byte array to a string, write it to a file, read it from a file, and convert it back to a byte array.
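The Base64 route can be sketched with the plain JCE API. EncryptionEngine's cipher, mode and key handling are not shown in the question, so this sketch assumes AES/CBC/PKCS5Padding with a random IV stored next to the ciphertext; the class and method names are hypothetical. The essential step is decoding the Base64 text back to raw bytes before decrypting: handing the cipher the bytes of the Base64 characters themselves gives it input whose length need not be a multiple of the block size, which matches the "last block incomplete" message.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;

public class ManifestCodec {
    // Assumed transformation; EncryptionEngine's real cipher is unknown
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    // Encrypts the text and stores "base64(iv):base64(ciphertext)" as one ASCII line
    public static void writeManifest(Path file, String plainText, SecretKey key)
            throws GeneralSecurityException, IOException {
        byte[] iv = new byte[16]; // AES block size
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        byte[] cipherText = cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8));
        Base64.Encoder b64 = Base64.getEncoder();
        String line = b64.encodeToString(iv) + ":" + b64.encodeToString(cipherText);
        Files.write(file, line.getBytes(StandardCharsets.US_ASCII));
    }

    // Reads the line back, Base64-decodes it to raw bytes, and only then decrypts
    public static String readManifest(Path file, SecretKey key)
            throws GeneralSecurityException, IOException {
        String line = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
        String[] parts = line.split(":"); // ':' never occurs in standard Base64
        if (parts.length != 2) {
            throw new IOException("Malformed manifest line");
        }
        Base64.Decoder b64 = Base64.getDecoder();
        byte[] iv = b64.decode(parts[0]);
        byte[] cipherText = b64.decode(parts[1]);
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        return new String(cipher.doFinal(cipherText), StandardCharsets.UTF_8);
    }
}
```

Because the file then holds a single line of ASCII text, reading it with BufferedReader.readLine() as in the question would also work, provided the result is Base64-decoded before decrypting.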
I believe you are incorrectly using a Reader, which is designed to read characters, when you actually want to deal strictly in bytes. This is most likely not the whole of your problem, but if you are writing bytes you should read bytes, not characters.

transfer jpeg from c++ to android (java)

I am struggling with transferring a simple JPEG file, embedded in an ID3v2 tag, from C++ over a TCP socket to Java (Android). The library "taglib" lets me extract this file, and I am able to save the JPEG as a new file.
The send function looks like this
char *parameter_full = new char[f3->picture().size()+2];
sprintf(parameter_full,"%s\n\0",f3->picture().data());
// send
result = send(c,parameter_full,strlen(parameter_full),0);
delete[] parameter_full;
where
f3->picture().data() returns a pointer to the internal data structure (it returns char*) and
f3->picture().size() returns the size of the array.
Then Android receives it with
String imageString = inFromServer.readLine();
byte[] imageBytes = imageString.getBytes();
Bitmap cover = BitmapFactory.decodeByteArray(imageBytes,0,imageBytes.length);
But somehow decodeByteArray always returns null. My guess is that Java doesn't receive the image correctly, because imageString consists of only 4 characters, while the extracted JPEG file has a size of 12.7 KB.
What has gone wrong?
Martin
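You shouldn't use string functions on byte data, because 0 bytes are treated as string terminators: sprintf and strlen stop at the first 0 in the JPEG data. Likewise, readLine() on the Java side stops at the first line-break byte and decodes the data as characters, which corrupts binary data. Look into memcpy on the C++ side if you need to copy the char*, and into the byte[] read methods of InputStream on the Java side.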
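Reading the picture as bytes on the Java side might look like this minimal sketch. It assumes a framing the question does not have yet: the C++ sender first writes the picture size as a 4-byte big-endian integer (e.g. converted with htonl) and then the raw f3->picture().data() bytes without sprintf/strlen, looping on send() until every byte has gone out. The class name and the size limit are hypothetical; on Android the returned array would go to BitmapFactory.decodeByteArray as in the question.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PictureReceiver {
    // Assumed upper bound, so a corrupt prefix cannot trigger a huge allocation
    private static final int MAX_PICTURE_BYTES = 10 * 1024 * 1024;

    // Reads a 4-byte big-endian length followed by exactly that many raw bytes.
    // Zero bytes and newline bytes in the JPEG pass through untouched.
    public static byte[] readPicture(InputStream socketIn) throws IOException {
        DataInputStream in = new DataInputStream(socketIn);
        int size = in.readInt();
        if (size < 0 || size > MAX_PICTURE_BYTES) {
            throw new IOException("Unexpected picture size: " + size);
        }
        byte[] picture = new byte[size];
        in.readFully(picture); // blocks until all bytes arrive, unlike a single read()
        return picture;
    }

    // JPEG data starts with the SOI marker 0xFF 0xD8; a cheap check before decoding
    public static boolean looksLikeJpeg(byte[] data) {
        return data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xD8;
    }
}
```

The socket's InputStream should not also be wrapped in a BufferedReader, since the reader may already have buffered some of the picture bytes.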
