Java heap size error in Mirth - java

I am using Mirth Connect 3.0.3, and I have an .xml file that is almost 85 MB in size and contains some device information. I need to read this .xml file and insert the data into the database (SQL Server).
The problem I am facing is that when I try to read the data, it shows a Java heap size error.
I increased the server memory to 1024 MB and the client memory to 1024 MB, but it still shows the same error. If I increase the memory any further, I am not able to start Mirth Connect.
Any suggestion is appreciated.
Thanks.

Does the XML file consist of multiple separate sections/pieces of data that would make sense to split up into multiple channel messages? If so, consider using a Batch Adapter. The XML data type has options to split based on element/tag name, node depth/level, or an XPath query. All of those options currently still require the message to be read into memory in its entirety, but this is still more memory-efficient than processing the entire XML document as a single message.
You can also use a JavaScript batch script, in which case you're given a Java BufferedReader and can use the script to read through the file and return one message at a time. In that case you do not have to read the entire file into memory.
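The batch script itself is written in JavaScript inside Mirth Connect, but the logic it implements is roughly the following Java sketch: read from the supplied BufferedReader until one complete record has been collected, then return it as a single message. The repeating <device> element name here is a hypothetical placeholder for whatever tag actually delimits records in your file.

    import java.io.BufferedReader;
    import java.io.IOException;

    public class DeviceMessageSplitter {
        // Returns the next <device>...</device> block as one message, or null at end of file.
        // The element name "device" is an assumption for illustration only.
        public static String nextMessage(BufferedReader reader) throws IOException {
            StringBuilder message = new StringBuilder();
            String line;
            boolean inRecord = false;
            while ((line = reader.readLine()) != null) {
                if (line.contains("<device>")) {
                    inRecord = true;
                }
                if (inRecord) {
                    message.append(line).append('\n');
                }
                if (line.contains("</device>")) {
                    return message.toString();
                }
            }
            return null; // no more records in the file
        }
    }

Because only one record is buffered at a time, the 85 MB file never has to fit in the heap at once.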
Are there large blobs of data in the message that don't need to be manipulated in a transformer, such as embedded images? If so, consider using an Attachment Handler. That way you can extract that data and store it once, rather than having it copied and stored multiple times throughout the message lifecycle (Raw / Transformed / Encoded / etc.).

Related

Reading file on remote server via Java

I am developing a Java application through which I need to read the log files present on a server and perform operations depending on the content of the logs.
Files range from 3 GB up to 9 GB.
Here on Stack Overflow I have already read the discussion about reading large files with Java; I am attaching the link:
Java reading large file discussion
In the discussion, the files are read locally; in my case I have to retrieve and read the file on the server. Is there an efficient way to achieve this?
I would like to avoid having to download the files, given their size.
I had thought about using a URL reader to retrieve the files, but I have doubts about the execution speed.
The files I need to retrieve are under the path C:\production\LOG\file.log
Do you have any suggestions or advice?
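For reference, a minimal sketch of the URL-based streaming idea mentioned above. The URL is a placeholder and assumes the log directory is exposed over HTTP; the point is that the file is streamed line by line rather than downloaded in full.

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class RemoteLogReader {
        public static void main(String[] args) throws IOException {
            // Placeholder URL: assumes the server exposes C:\production\LOG over HTTP.
            URL logUrl = new URL("http://example-server/LOG/file.log");
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(logUrl.openStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Process each line as it arrives; the whole file is never held in memory.
                    if (line.contains("ERROR")) {
                        System.out.println(line);
                    }
                }
            }
        }
    }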

How to determine if a file is complete on Azure File Storage in Java?

For our project we are using Azure File Storage, to which large files (at most 500 MB) are uploaded. They must be processed by Java microservices (based on Spring Boot) that use the Azure SDK for Java and periodically poll the directory to see if new files have been uploaded.
Is it possible, in some way, to determine when an uploaded file has been completely uploaded, without the obvious solutions like monitoring the size?
Unfortunately it is not directly possible to monitor when a file upload has been completed (including by monitoring the size). This is because the file upload happens in two stages:
First, an empty file of a certain size is created. This maps to the Create File REST API operation.
Next, content is written to that file. This maps to the Put Range REST API operation; this is where the actual data is written to the file.
Assuming data is written to the file in sequential order (i.e. from byte 0 to the end of the file), one possibility would be to keep checking the last "n" bytes of the file and see whether all of them are non-zero. That would indicate that some data has been written at the end of the file. Again, this is not a fool-proof solution, as there may be cases where the last "n" bytes are genuinely zero.
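A minimal sketch of that heuristic is below. Obtaining the tail of the file (e.g. via a ranged download with the Azure SDK) is not shown; the method only expresses the non-zero check, and how many bytes you inspect is up to you.

    public class UploadHeuristic {
        // Returns true if every byte in the tail is non-zero, which *suggests*
        // (but does not guarantee) that data has been written at the end of the file.
        // The tail would be obtained separately, e.g. by downloading the last n bytes
        // of the Azure file with a ranged read.
        public static boolean tailLooksWritten(byte[] tail) {
            for (byte b : tail) {
                if (b == 0) {
                    return false;
                }
            }
            return tail.length > 0;
        }
    }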

Writing CSV files causing JVM crash

In the existing project a user can download a report that has 10 million records. The process gets data from the database and writes it to CSV using the Super CSV Java API, then emails the file to the user as an attachment. It takes a huge amount of heap space to hold 10 million Java objects and write these records to CSV files; because of this the server crashes and goes down, as the application has many reports like this. Is there a better way to handle this? I read the SXSSFWorkbook documentation, which says a specified number of records can be kept in memory while the remaining records are pushed to disk, but that is used to create Excel files. Is there a similar API to create CSV files, or can SXSSFWorkbook be used to create CSV files?
There are a few Java libraries for reading and writing CSV files. They typically support "streaming", so they do not have the problem of needing to hold the source data or the generated CSV in memory.
The Apache Commons CSV library would be a good place to start. Here is the User Guide. It supports various flavors of CSV file, including the CSV formats generated by Microsoft Excel.
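For example, a minimal sketch with Commons CSV might stream rows from a JDBC ResultSet straight to a file, so only one row is held in memory at a time. The JDBC URL, query, column names, and file name below are placeholders.

    import java.io.FileWriter;
    import java.io.Writer;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVPrinter;

    public class ReportExporter {
        public static void export() throws Exception {
            try (Connection conn = DriverManager.getConnection("jdbc:...");   // placeholder JDBC URL
                 Statement stmt = conn.createStatement();
                 Writer out = new FileWriter("report.csv");
                 CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT)) {

                stmt.setFetchSize(1000); // hint to the driver to stream rows instead of buffering them all
                try (ResultSet rs = stmt.executeQuery("SELECT id, name, amount FROM report_rows")) {
                    printer.printRecord("id", "name", "amount"); // header row
                    while (rs.next()) {
                        // Each row is written and then eligible for GC; the full result set
                        // is never materialized as 10 million Java objects.
                        printer.printRecord(rs.getLong("id"), rs.getString("name"), rs.getBigDecimal("amount"));
                    }
                }
            }
        }
    }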
However, I would suggest that sending a CSV file containing 10 million records (say 1 GB of uncompressed data) is not going to make you popular with the people who run your users' email servers! Files that size should be made available via a web or file transfer service.

Need to upload and parse 15MB files, open files twice?

I have a file which I need to upload to a service, and parse into relevant data. The parser and the uploader both require an InputStream. Ought I to open the file twice? I could save the file to a String, but having many of these files in memory is concerning.
EDIT: Thought I should make it clear that the parsing and uploading are entirely separate processes.
Since you are parsing it already, it would be most efficient to load the file into a string. Parse it into indexes into that string; you will save memory and can upload the string whenever you want. This is the most effective approach in terms of memory, though maybe not in processing time.
A reply to one of the comments above:
"Separate processes" does not mean different threads or OS processes, just that they do not need each other to operate.
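Another option along the same lines (a sketch, not necessarily what the answer above had in mind): read the 15 MB file into a byte array once, then give each consumer its own InputStream over that array, so the file is opened and read only once. The parse/upload methods are hypothetical stand-ins for the real parser and uploader.

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ParseAndUpload {
        public static void process(Path file) throws IOException {
            byte[] contents = Files.readAllBytes(file); // ~15 MB held in memory once

            try (InputStream forParser = new ByteArrayInputStream(contents);
                 InputStream forUploader = new ByteArrayInputStream(contents)) {
                parse(forParser);    // hypothetical parser entry point
                upload(forUploader); // hypothetical uploader entry point
            }
        }

        private static void parse(InputStream in) { /* parsing logic goes here */ }
        private static void upload(InputStream in) { /* upload logic goes here */ }
    }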

How to read Big files (~300 MB) from Google Cloud Storage?

I can easily upload/write or read the contents of small files (~80 KB) from Google Cloud Storage.
Now I have to perform a bigger task while serving big files (~200 MB-300 MB):
1) Read the contents of the uploaded file in chunks (~10 KB).
<--Want to modify chunked data programmatically-->
2) Repeat step 1 until the stream has read the whole content of the file (from start to end, sequentially).
I tried this program, but in the response I only get some of the data. How can I perform the task described above?
You should not use the Files API (which is deprecated; see the comment at the top of the page you mentioned). Instead, use the GCS client (mentioned in the deprecation notice). The GCS client allows you to read continuously, and you can serialize the state of the GcsInputChannel between requests until the read is complete (if the read takes longer than the request timeout). You should also consider using the MapReduce library with GoogleCloudStorageLineInput for reading the file and writing the modified one in your mapper (probably map-only in your case).
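As a rough sketch (assuming the App Engine GCS client library; the bucket and object names are placeholders and error handling is omitted), chunked reading looks something like this:

    import java.nio.ByteBuffer;

    import com.google.appengine.tools.cloudstorage.GcsFilename;
    import com.google.appengine.tools.cloudstorage.GcsInputChannel;
    import com.google.appengine.tools.cloudstorage.GcsService;
    import com.google.appengine.tools.cloudstorage.GcsServiceFactory;

    public class ChunkedGcsReader {
        public static void readInChunks() throws Exception {
            GcsService gcsService = GcsServiceFactory.createGcsService();
            GcsFilename file = new GcsFilename("my-bucket", "big-file.dat"); // placeholders

            // Prefetching read channel: buffers ahead so sequential reads stay efficient.
            try (GcsInputChannel channel =
                     gcsService.openPrefetchingReadChannel(file, 0, 1024 * 1024)) {
                ByteBuffer chunk = ByteBuffer.allocate(10 * 1024); // ~10 KB per chunk
                while (channel.read(chunk) != -1) {
                    chunk.flip();
                    // Modify/process the chunk here before writing it elsewhere.
                    chunk.clear();
                }
            }
        }
    }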
