Huge JSON parser - Java

I have a custom parser, written in Java, that I want to use to export a 3.6 GB JSON file into an Oracle SQL database. The import works fine with an 8 MB sample JSON, but when I try parsing the whole 3.6 GB JSON I get memory problems, namely java.lang.OutOfMemoryError.
I have used -Xmx5000m to allocate 5 GB of memory for this. My laptop has plenty of RAM.
As you can see, I have memory left. Does this error happen because of the CPU?
UPDATE:
The Json represents the data from Free Code Camp: https://medium.freecodecamp.com/free-code-camp-christmas-special-giving-the-gift-of-data-6ecbf0313d62#.7mjj6abbg
The Data looks like this:
[
{
"name": "Waypoint: Say Hello to HTML Elements",
"completedDate": 1445854025698,
"solution": "Hello World\n"
}
]
As I've said, I have tried this parsing with an 8 MB sample JSON with the same data and it worked. So is the code really the problem here?
Here is some code:
public class MainParser {

    public static void main(String[] args) {
        //Date time;
        try {
            BufferedReader br = new BufferedReader(
                    new FileReader("output.json")); //destination to json here
            Gson gson = new Gson();
            Type collectionType = new TypeToken<List<List<Tasks>>>() {
            }.getType();
            List<List<Tasks>> details = gson.fromJson(br, collectionType);
            DBConnect connection = new DBConnect("STUDENT", "student");
            connection.connect();
            for (int person = 0; person < details.size(); person++) {
                for (int task = 0; task < details.get(person).size(); task++) {
                    connection.insert_query(person + 1,
                            task + 1,
                            details.get(person).get(task).getName(),
                            (details.get(person).get(task).getCompletedDate() / 1000),
                            details.get(person).get(task).getSolution());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
}
Here is the insert_query method:
public void insert_query(int person_id, int task_id, String taskName, double date, String solution) throws SQLException {
    Statement stmt = conn.createStatement();
    try {
        String query = "INSERT INTO FreeCodeCamp VALUES(?,?,?,?,?)";
        PreparedStatement ps = conn.prepareStatement(query);
        ps.setInt(1, person_id);
        ps.setInt(2, task_id);
        ps.setString(3, taskName);
        ps.setDate(4, null);
        ps.setString(5, solution);
        /*stmt.executeUpdate("INSERT INTO FreeCodeCamp VALUES("
                + person_id + ","
                + task_id + ","
                + "'" + taskName + "',"
                + "TO_TIMESTAMP(unix_ts_to_date(" + date + "),'YYYY-MM-DD HH24:MI:SS'),"
                + "'" + solution + "')");
        stmt.close();*/
        ps.execute();
        ps.close();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

Parsing JSON (or anything, for that matter) will not take the same amount of memory as the original file size.
Each block of JSON text that represents an object becomes a Java object, ADDING memory on top of the already loaded JSON. If you parse it using some kind of stream, you will still allocate memory, but much less of it (you won't hold the entire 3.6 GB file in memory).
Still, an object takes more memory to represent than the string it came from. If you have an array, which might be parsed into a list, then there is overhead for that list as well. Multiply that overhead by the number of instances in the JSON (quite a lot, in a 3.6 GB file) and you end up with something taking much more than just 3.6 GB of memory.
But if you want to parse it as a stream, handle each record as it arrives and then discard it, you can do that. In both cases, to use a stream you'll need a component that parses the JSON and lets you handle each parsed object. If you know the structure, it might even be easier to write one yourself.
Hope it helps.

You need to use an event-based / streaming JSON parser. The idea is that instead of parsing the entire JSON file in one go and holding it in memory, the parser emits "events" at the start and end of each significant syntactic unit. You then write your code to handle these events, extract and assemble the information, and (in your case) insert the corresponding records into your database.
Here are some places to start reading about Oracle's streaming JSON APIs:
http://docs.oracle.com/javaee/7/api/javax/json/stream/JsonParser.html
http://www.oracle.com/technetwork/articles/java/json-1973242.html
and here is a link to the documentation for the GSON equivalent:
https://sites.google.com/site/gson/streaming
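To make the event-based idea concrete, here is a minimal sketch of the event loop using the javax.json streaming API linked above, written against the sample data shown in the question; the handleRecord method is a hypothetical placeholder for your database insert, not part of any of the linked APIs:

import javax.json.Json;
import javax.json.stream.JsonParser;
import java.io.FileReader;
import java.io.Reader;

public class JsonEventLoop {
    public static void main(String[] args) throws Exception {
        try (Reader reader = new FileReader("output.json");
             JsonParser parser = Json.createParser(reader)) {
            String currentKey = null;
            String name = null, solution = null;
            long completedDate = 0;
            while (parser.hasNext()) {
                switch (parser.next()) {
                    case KEY_NAME:
                        currentKey = parser.getString();
                        break;
                    case VALUE_STRING:
                        if ("name".equals(currentKey)) name = parser.getString();
                        else if ("solution".equals(currentKey)) solution = parser.getString();
                        break;
                    case VALUE_NUMBER:
                        if ("completedDate".equals(currentKey)) completedDate = parser.getLong();
                        break;
                    case END_OBJECT:
                        handleRecord(name, completedDate, solution); // one task is complete here
                        name = null; solution = null; completedDate = 0;
                        break;
                    default:
                        break; // START_ARRAY, START_OBJECT, END_ARRAY need no action here
                }
            }
        }
    }

    private static void handleRecord(String name, long completedDate, String solution) {
        // hypothetical placeholder: do the JDBC insert for one record here
    }
}

Only one record's worth of fields is held in memory at any time, which is the whole point of the streaming approach.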

See Gson's Streaming doc
This is used when the whole model cannot be loaded into memory
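As a rough sketch of what that could look like for the code in the question (assuming the Tasks and DBConnect classes shown above, and that the outer JSON array contains one inner array of tasks per person):

import com.google.gson.Gson;
import com.google.gson.stream.JsonReader;
import java.io.BufferedReader;
import java.io.FileReader;

public class StreamingMainParser {
    public static void main(String[] args) throws Exception {
        Gson gson = new Gson();
        DBConnect connection = new DBConnect("STUDENT", "student"); // classes from the question
        connection.connect();
        try (JsonReader reader = new JsonReader(new BufferedReader(new FileReader("output.json")))) {
            reader.beginArray();                      // outer array: one entry per person
            int person = 0;
            while (reader.hasNext()) {
                person++;
                reader.beginArray();                  // inner array: one entry per task
                int task = 0;
                while (reader.hasNext()) {
                    task++;
                    // deserialize one small object at a time instead of the whole file
                    Tasks t = gson.fromJson(reader, Tasks.class);
                    connection.insert_query(person, task, t.getName(),
                            t.getCompletedDate() / 1000, t.getSolution());
                }
                reader.endArray();
            }
            reader.endArray();
        }
    }
}

This keeps only one Tasks object in memory at a time, so heap usage no longer grows with the size of the file. Batching the JDBC inserts (addBatch/executeBatch) would reduce round trips further, but that is a separate change.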

Related

Java Stream a Large SQL Query into API CSV File

I am writing a service that obtains data from a large SQL query in the database (over 100,000 records) and streams it into an API CSV file. Is there any Java library function that does this, or any way to make the code below more efficient? I am currently using Java 8 in a Spring Boot environment.
The code is below, with the SQL repository method and the service for CSV. Preferably I want to write to the CSV file while the data is being fetched from SQL concurrently, as the query may take 2-3 minutes for the user.
We are using Snowflake DB.
public class ProductService {
private final ProductRepository productRepository;
private final ExecutorService executorService;
public ProductService(ProductRepository productRepository) {
this.productRepository = productRepository;
this.executorService = Executors.newFixedThreadPool(20);
}
public InputStream getproductExportFile(productExportFilters filters) throws IOException {
PipedInputStream is = new PipedInputStream();
PipedOutputStream os = new PipedOutputStream(is);
executorService.execute(() -> {
try {
Stream<productExport> productStream = productRepository.getproductExportStream(filters);
Field[] fields = Stream.of(productExport.class.getDeclaredFields())
.peek(f -> f.setAccessible(true))
.toArray(Field[]::new);
String[] headers = Stream.of(fields)
.map(Field::getName).toArray(String[]::new);
CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
.setHeader(headers)
.build();
OutputStreamWriter outputStreamWriter = new OutputStreamWriter(os);
CSVPrinter csvPrinter = new CSVPrinter(outputStreamWriter, csvFormat);
productStream.forEach(productExport -> writeProductExportToCsv(productExport, csvPrinter, fields));
outputStreamWriter.close();
csvPrinter.close();
} catch (Exception e) {
logger.warn("Unable to complete writing to csv stream.", e);
} finally {
try {
os.close();
} catch (IOException ignored) { }
}
});
return is;
}
private void writeProductExportToCsv(productExport productExport, CSVPrinter csvPrinter, Field[] fields) {
Object[] values = Stream.of(fields).
map(f -> {
try {
return f.get(productExport);
} catch (IllegalAccessException e) {
return null;
}
})
.toArray();
try {
csvPrinter.printRecord(values);
csvPrinter.flush();
} catch (IOException e) {
logger.warn("Unable to write record to file.", e);
}
}
public Stream<productExport> getproductExportStream(productExportFilters filters) {
MapSqlParameterSource parameterSource = new MapSqlParameterSource();
parameterSource.addValue("customerId", filters.getCustomerId().toString());
parameterSource.addValue("practiceId", filters.getPracticeId().toString());
StringBuilder sqlQuery = new StringBuilder("SELECT * FROM dbo.Product ");
sqlQuery.append("\nWHERE CUSTOMERID = :customerId\n" +
"AND PRACTICEID = :practiceId\n"
);
Streaming allows you to transfer the data little by little, without having to load it all into the server's memory. You can do your processing in the extractData() method of a ResultSetExtractor. You can find the javadoc for ResultSetExtractor here.
You can view an example using ResultSetExtractor here.
You can also easily run your JPA/JDBC queries against the ResultSet using JdbcTemplate with a ResultSetExtractor; you can take a look at an example here.
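As a rough sketch of that idea (not the linked example), assuming Spring's NamedParameterJdbcTemplate and Commons CSV as in the question; the fetch size of 5000 and the class and method names here are illustrative assumptions:

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

import javax.sql.DataSource;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.sql.ResultSet;

public class ProductCsvExporter {

    private final NamedParameterJdbcTemplate jdbc;

    public ProductCsvExporter(DataSource dataSource) {
        JdbcTemplate template = new JdbcTemplate(dataSource);
        template.setFetchSize(5000);          // assumption: tune this for your driver
        this.jdbc = new NamedParameterJdbcTemplate(template);
    }

    /** Streams the query result straight into the CSV writer, one row at a time. */
    public void exportToCsv(String customerId, String practiceId, Writer out) {
        MapSqlParameterSource params = new MapSqlParameterSource()
                .addValue("customerId", customerId)
                .addValue("practiceId", practiceId);

        jdbc.query(
                "SELECT * FROM dbo.Product WHERE CUSTOMERID = :customerId AND PRACTICEID = :practiceId",
                params,
                (ResultSet rs) -> {                   // ResultSetExtractor: rows are handled as they arrive
                    try {
                        CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT);
                        int columns = rs.getMetaData().getColumnCount();
                        while (rs.next()) {           // the driver fetches rows in fetch-size batches
                            for (int i = 1; i <= columns; i++) {
                                printer.print(rs.getObject(i));
                            }
                            printer.println();
                        }
                        printer.flush();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    return null;
                });
    }
}

Nothing bigger than one row plus the driver's fetch buffer is ever held in memory on the application side.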
There is a product which we bought some time ago for our company; we even got the source code back then: https://northconcepts.com/ We also evaluated Apache Camel, which had similar support, but it didn't suit our goal. If you really need speed you should go to the lowest level possible: plain JDBC and as simple a CSV writer as possible.
The NorthConcepts library itself provides the capability to read from JDBC and write to CSV at a lower level. We found a few tweaks which sped up the transmission and processing. With a single thread we are able to stream 100,000 records (with 400 columns) within 1-2 minutes.
Given that you didn't specify which database you use, I can give you only generic answers.
In general, code like this is network limited, as a JDBC result set is usually transferred in "only n rows" packets, and only when you exhaust one does the database trigger fetching of the next packet. This property is often called fetch size, and you should greatly increase it. With default settings, most databases transfer 10-100 rows per fetch. In Spring you can use the setFetchSize property. Some benchmarks here.
There is other similar low-level stuff you could do. For example, the Oracle JDBC driver has "InsensitiveResultSetBufferSize", which controls how big, in bytes, the buffer holding the result set is. But those things tend to be database specific.
That being said, the best way to really increase the speed of your transfer is to launch multiple queries. Divide your data on some column value, and then launch multiple parallel queries. Essentially, if you can design the data so that parallel queries work on easily distinguished subsets, the bottleneck can be pushed to network or CPU throughput.
For example, one of your columns might be 'timestamp'. Instead of having one query fetch all the rows, fetch multiple subsets of rows with a query like this:
SELECT * FROM dbo.Product
WHERE CUSTOMERID = :customerId
AND PRACTICEID = :practiceId
AND :lowerLimit <= timestamp AND timestamp < :upperLimit
Launch this query in parallel with different timestamp ranges. Aggregate the results of those subqueries in a shared ConcurrentLinkedQueue and build the CSV there.
With a similar approach I regularly read 100,000 rows/sec on an 80-column table from an Oracle DB. That is a 40-60 MB/sec sustained transfer rate from a table which is not even locked.
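A rough sketch of the parallel-range idea described above; the slice boundaries, thread-pool size, and the use of queryForList are illustrative assumptions rather than the answerer's actual code:

import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelRangeFetcher {

    private final NamedParameterJdbcTemplate jdbc;
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // assumption: 4 slices at a time

    public ParallelRangeFetcher(NamedParameterJdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    /** Fetches the table in parallel timestamp slices and collects all rows in a shared queue. */
    public ConcurrentLinkedQueue<Map<String, Object>> fetchAll(String customerId, String practiceId,
                                                               List<LocalDateTime> boundaries)
            throws InterruptedException {
        ConcurrentLinkedQueue<Map<String, Object>> rows = new ConcurrentLinkedQueue<>();
        CountDownLatch done = new CountDownLatch(boundaries.size() - 1);

        for (int i = 0; i < boundaries.size() - 1; i++) {
            Timestamp lower = Timestamp.valueOf(boundaries.get(i));
            Timestamp upper = Timestamp.valueOf(boundaries.get(i + 1));
            pool.execute(() -> {
                try {
                    MapSqlParameterSource params = new MapSqlParameterSource()
                            .addValue("customerId", customerId)
                            .addValue("practiceId", practiceId)
                            .addValue("lowerLimit", lower)
                            .addValue("upperLimit", upper);
                    // each slice is an independent query, so the slices are transferred concurrently
                    rows.addAll(jdbc.queryForList(
                            "SELECT * FROM dbo.Product "
                                    + "WHERE CUSTOMERID = :customerId AND PRACTICEID = :practiceId "
                                    + "AND :lowerLimit <= timestamp AND timestamp < :upperLimit",
                            params));
                } finally {
                    done.countDown();
                }
            });
        }
        done.await(); // every slice fetched; build the CSV from 'rows' afterwards
        return rows;
    }
}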

Parse a String java

I have a StringBuilder that contains the same result as this link:
https://hadoop.apache.org/docs/current/hadoop-project-dist/
I'm looking to extract the values of `pathSuffix` and return a list of Strings that contains all the file names.
My code is:
try {
    HttpURLConnection conHttp = (HttpURLConnection) url.openConnection();
    conHttp.setRequestMethod("GET");
    conHttp.setDoInput(true);
    InputStream in = conHttp.getInputStream();
    int ch;
    StringBuilder sb = new StringBuilder();
    // read the whole response into the StringBuilder
    while ((ch = in.read()) != -1) {
        sb.append((char) ch);
    }
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
How can I parse JSON to take all the values of pathSuffix and return a list of string that contains the file names ?
Could you please give me a suggestion ? Thanks
That is JSON formatted data; JSON is not a regular language, therefore trying to parse it with a regular expression is impossible, and trying to parse it out with substring and friends will take you a week and will be very error prone.
Read up on what JSON is (no worries; it's very simple to understand!), then get a good JSON library (the standard json.org library absolutely sucks, don't get that one), such as Jackson or GSON, and the code to extract what you need will be robust and easy to write and test.
The good option
Do the following steps:
Convert to JSON
Get the array using, for example, Gson: jsonObject.getAsJsonObject("FileStatuses").getAsJsonArray("FileStatus")
Iterate over all objects in the array to get the value you want
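Putting those steps together, here is a minimal Gson sketch; it assumes the response has the WebHDFS LISTSTATUS shape ({"FileStatuses":{"FileStatus":[{"pathSuffix":"...", ...}]}}) and that json is the string built from the connection above:

import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.util.ArrayList;
import java.util.List;

public class PathSuffixExtractor {

    /** Returns all pathSuffix values (the file names) from a LISTSTATUS-style response. */
    public static List<String> extractFileNames(String json) {
        List<String> names = new ArrayList<>();
        JsonObject root = JsonParser.parseString(json).getAsJsonObject(); // Gson 2.8.6+
        JsonArray statuses = root.getAsJsonObject("FileStatuses")
                                 .getAsJsonArray("FileStatus");
        for (JsonElement element : statuses) {
            names.add(element.getAsJsonObject().get("pathSuffix").getAsString());
        }
        return names;
    }
}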
The bad option
Although, as mentioned, it is not recommended - if you want to stay with Strings you can use:
String str_to_find = "\"pathSuffix\" : \"";
while (str.indexOf(str_to_find) != -1) {
    str = str.substring(str.indexOf(str_to_find) + str_to_find.length());
    String value = str.substring(0, str.indexOf("\""));
    System.out.println("Value is " + value);
}
I would not recommend building an API binding for Hadoop from scratch.
Such a binding already exists for the Java language:
https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#listLocatedStatus-org.apache.hadoop.fs.Path-org.apache.hadoop.fs.PathFilter-

SLOW SPEED in using SAX Parser to parse XML data and save it to mysql localhost (JAVA)

I am programming in Java, and my current program has the following problem.
I have to parse a big .rdf file (XML format) which is 1.60 GB in size, and then insert the parsed data into a MySQL localhost server.
After googling, I decided to use a SAX parser in my code. Many sites encouraged using a SAX parser over a DOM parser, saying that the SAX parser is much faster.
However, when I executed my code which uses the SAX parser, I found that my program runs very slowly.
One senior in my lab told me that the slow speed issue might come from the file I/O process: in javax.xml.parsers.SAXParser, an InputStream is used for file input, which could make the job slow compared to using the Scanner class or the BufferedReader class.
My questions are:
1. Are SAX parsers good for parsing large-scale XML documents? My program took 10 minutes to parse a 14 MB sample file and insert the data into the MySQL localhost. Actually, another senior in my lab made a similar program to mine but using a DOM parser, and it parses the 1.60 GB XML file and saves the data in an hour.
2. How can I use BufferedReader instead of InputStream while using the SAX parser library?
This is my first question on Stack Overflow, so any kind of advice would be helpful. Thank you for reading.
Added after receiving initial feedback
I should have uploaded my code to clarify my problem; I apologize for that.
package xml_parse;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
public class Readxml extends DefaultHandler {
Connection con = null;
String[] chunk; // to check /A/, /B/, /C/ kind of stuff.
public Readxml() throws SQLException {
// connect to local mysql database
con = DriverManager.getConnection("jdbc:mysql://localhost/lab_first",
"root", "2030kimm!");
}
public void getXml() {
try {
// obtain and configure a SAX based parser
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
// obtain object for SAX parser
SAXParser saxParser = saxParserFactory.newSAXParser();
// default handler for SAX handler class
// all three methods are written in handler's body
DefaultHandler default_handler = new DefaultHandler() {
String topic_gate = "close", category_id_gate = "close",
new_topic_id, new_catid, link_url;
java.sql.Statement st = con.createStatement();
public void startElement(String uri, String localName,
String qName, Attributes attributes)
throws SAXException {
if (qName.equals("Topic")) {
topic_gate = "open";
new_topic_id = attributes.getValue(0);
// apostrophe escape in SQL query
new_topic_id = new_topic_id.replace("'", "''");
if (new_topic_id.contains("International"))
topic_gate = "close";
if (new_topic_id.equals("") == false) {
chunk = new_topic_id.split("/");
for (int i = 0; i < chunk.length - 1; i++)
if (chunk[i].length() == 1) {
topic_gate = "close";
break;
}
}
if (new_topic_id.startsWith("Top/"))
new_topic_id.replace("Top/", "");
}
if (topic_gate.equals("open") && qName.equals("catid"))
category_id_gate = "open";
// add each new link to table "links" (MySQL)
if (topic_gate.equals("open") && qName.contains("link")) {
link_url = attributes.getValue(0);
link_url = link_url.replace("'", "''"); // take care of
// apostrophe
// escape
String insert_links_command = "insert into links(link_url, catid) values('"
+ link_url + "', " + new_catid + ");";
try {
st.executeUpdate(insert_links_command);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (category_id_gate.equals("open")) {
new_catid = new String(ch, start, length);
// add new row to table "Topics" (MySQL)
String insert_topics_command = "insert into topics(topic_id, catid) values('"
+ new_topic_id + "', " + new_catid + ");";
try {
st.executeUpdate(insert_topics_command);
} catch (SQLException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (qName.equals("Topic"))
topic_gate = "close";
if (qName.equals("catid"))
category_id_gate = "close";
}
};
// BufferedInputStream!!
String filepath = null;
BufferedInputStream buffered_input = null;
/*
* // Content filepath =
* "C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8";
* buffered_input = new BufferedInputStream(new FileInputStream(
* filepath)); saxParser.parse(buffered_input, default_handler);
*
* // Adult filepath =
* "C:/Users/Kim/Desktop/2016여름/ad-content.rdf.u8"; buffered_input =
* new BufferedInputStream(new FileInputStream( filepath));
* saxParser.parse(buffered_input, default_handler);
*/
// Kids-and-Teens
filepath = "C:/Users/Kim/Desktop/2016여름/kt-content.rdf.u8";
buffered_input = new BufferedInputStream(new FileInputStream(
filepath));
saxParser.parse(buffered_input, default_handler);
System.out.println("Finished.");
} catch (SQLException sqex) {
System.out.println("SQLException: " + sqex.getMessage());
System.out.println("SQLState: " + sqex.getSQLState());
} catch (Exception e) {
e.printStackTrace();
}
}
}
This is the whole code of my program.
My original code from yesterday did the file I/O the following way (instead of using BufferedInputStream):
saxParser.parse("file:///C:/Users/Kim/Desktop/2016여름/content.rdf.u8/content.rdf.u8",
default_handler);
I expected some speed improvement after switching to BufferedInputStream, but the speed didn't improve at all.
I am having trouble figuring out the bottleneck causing the speed issue.
Thank you very much.
the rdf file being read in the code is about 14 MB in size, and it takes about
11 minutes for my computer to execute this code.
Are SAX parsers good for parsing large-scale XML documents?
Yes, SAX and StAX parsers are clearly the best choices for parsing big XML documents, as they consume little memory and CPU. DOM parsers, which load everything into memory, are clearly not the right choice in this case.
Response Update:
Regarding your code, to me your slowness issue is more related to how you store your data in the database. Your current code executes the queries in auto-commit mode, while you should use transactional mode for better performance, since you have a lot of data to insert; read this for a better understanding. To reduce the round trips between the database and your application, you should also consider using batch updates, as in this good example.
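For illustration, here is a minimal sketch of the batching and manual-commit idea applied to the topics insert, using a PreparedStatement instead of string concatenation (which also removes the need for the hand-written apostrophe escaping in the question); the batch size of 1000 and the class name are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchTopicWriter {

    private static final int BATCH_SIZE = 1000; // assumption: tune as needed

    private final Connection con;
    private final PreparedStatement insertTopic;
    private int pending = 0;

    public BatchTopicWriter(String url, String user, String password) throws SQLException {
        con = DriverManager.getConnection(url, user, password);
        con.setAutoCommit(false); // commit explicitly instead of once per statement
        insertTopic = con.prepareStatement("insert into topics(topic_id, catid) values(?, ?)");
    }

    /** Called from the SAX handler for every parsed topic. */
    public void addTopic(String topicId, String catId) throws SQLException {
        insertTopic.setString(1, topicId); // no manual apostrophe escaping needed
        insertTopic.setString(2, catId);
        insertTopic.addBatch();
        if (++pending >= BATCH_SIZE) {
            flush();
        }
    }

    /** Sends the accumulated batch and commits it in one round trip. */
    public void flush() throws SQLException {
        insertTopic.executeBatch();
        con.commit();
        pending = 0;
    }
}

The links insert in the handler could be batched the same way.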
With a SAX parser you should be able to achieve a parsing speed of 1 GB/minute without too much difficulty. If it's taking 10 minutes to parse 14 MB, then either you are doing something wrong, or the time is being spent doing something other than SAX parsing (e.g. database updates).
You can keep the SAX parser and use a BufferedInputStream rather than a BufferedReader (you then need not guess the charset encoding of the XML).
It could be that, for XML in general, extra files are read: DTDs and such. For instance, there is a huge number of named entities for (X)HTML. Using an XML catalog to keep those remote files locally helps enormously.
Maybe you can switch off validation.
Also, you might weigh network traffic against computation power by using gzip compression. By setting and inspecting headers, a case-by-case GZIPInputStream might be more efficient (or not).

Faster alternative to JsonPath in Java

JsonPath seems to be pretty slow for large JSON files.
In my project, I'd like a user to be able to pass an entire query as a string. I used JsonPath because it lets you do an entire query like $.store.book[3].price all at once by doing JsonPath.read(fileOrString, "$.store.book[3].price", new Filter[0]). Is there a faster method to interact with JSON files in Javascript? It would be ideal to be able to pass the entire query as a string, but I'll write a parser if I have to. Any ideas?
Even small optimizations would be helpful. For instance, I'm currently reading from the JSON file every time I query. Would it be better just to copy the entire file into a string at the beginning and query the string instead?
EDIT: To those of you saying "this is Javascript, not Java", well, it actually is Java. JsonPath is a Javascript-like query language, but the file I am writing is most assuredly Java. Only the query is written in Javascript. Here's some info about JsonPath, and a snippet of code: https://code.google.com/p/json-path/
List toRet;
String query = "$.store.book[3].price";
try {
    // if output is a list, good
    toRet = (List) JsonPath.read(filestring_, query, new Filter[0]);
} catch (ClassCastException cce) {
    // if output isn't a list, put it in a list
    Object outObj = null;
    try {
        outObj = JsonPath.read(filestring_, query, new Filter[0]);
    } catch (Exception e) {
        throw new DataSourceException("Invalid file!\n", e, DataSourceException.UNKNOWN);
    }

Java Heap Space Error, OutofMemory Exception while writing large data to excel sheet

I am getting a Java heap space error while writing large data from the database to an Excel sheet.
I don't want to use the JVM -Xmx option to increase memory.
Following are the details:
1) I am using the org.apache.poi.hssf API for Excel sheet writing.
2) JDK version 1.5
3) Tomcat 6.0
The code I have written works well for around 23 thousand records, but it fails for more than 23K records.
Following is the code:
ArrayList l_objAllTBMList= new ArrayList();
l_objAllTBMList = (ArrayList) m_objFreqCvrgDAO.fetchAllTBMUsers(p_strUserTerritoryId);
ArrayList l_objDocList = new ArrayList();
m_objTotalDocDtlsInDVL= new HashMap();
Object l_objTBMRecord[] = null;
Object l_objVstdDocRecord[] = null;
int l_intDocLstSize=0;
VisitedDoctorsVO l_objVisitedDoctorsVO=null;
int l_tbmListSize=l_objAllTBMList.size();
System.out.println(" getMissedDocDtlsList_NSM ");
for(int i=0; i<l_tbmListSize;i++)
{
l_objTBMRecord = (Object[]) l_objAllTBMList.get(i);
l_objDocList = (ArrayList) m_objGenerateVisitdDocsReportDAO.fetchAllDocDtlsInDVL_NSM((String) l_objTBMRecord[1], p_divCode, (String) l_objTBMRecord[2], p_startDt, p_endDt, p_planType, p_LMSValue, p_CycleId, p_finYrId);
l_intDocLstSize=l_objDocList.size();
try {
l_objVOFactoryForDoctors = new VOFactory(l_intDocLstSize, VisitedDoctorsVO.class);
/* Factory class written to create and maintain limited no of Value Objects (VOs)*/
} catch (ClassNotFoundException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (InstantiationException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
} catch (IllegalAccessException ex) {
m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:"+ex);
}
for(int j=0; j<l_intDocLstSize;j++)
{
l_objVstdDocRecord = (Object[]) l_objDocList.get(j);
l_objVisitedDoctorsVO = (VisitedDoctorsVO) l_objVOFactoryForDoctors.getVo();
if (((String) l_objVstdDocRecord[6]).equalsIgnoreCase("-"))
{
if (String.valueOf(l_objVstdDocRecord[2]) != "null")
{
l_objVisitedDoctorsVO.setPotential_score(String.valueOf(l_objVstdDocRecord[2]));
l_objVisitedDoctorsVO.setEmpcode((String) l_objTBMRecord[1]);
l_objVisitedDoctorsVO.setEmpname((String) l_objTBMRecord[0]);
l_objVisitedDoctorsVO.setDoctorid((String) l_objVstdDocRecord[1]);
l_objVisitedDoctorsVO.setDr_name((String) l_objVstdDocRecord[4] + " " + (String) l_objVstdDocRecord[5]);
l_objVisitedDoctorsVO.setDoctor_potential((String) l_objVstdDocRecord[3]);
l_objVisitedDoctorsVO.setSpeciality((String) l_objVstdDocRecord[7]);
l_objVisitedDoctorsVO.setActualpractice((String) l_objVstdDocRecord[8]);
l_objVisitedDoctorsVO.setLastmet("-");
l_objVisitedDoctorsVO.setPreviousmet("-");
m_objTotalDocDtlsInDVL.put((String) l_objVstdDocRecord[1], l_objVisitedDoctorsVO);
}
}
}// End of While
writeExcelSheet(); // Pasting this method at the end
// Clean up code
l_objVOFactoryForDoctors.resetFactory();
m_objTotalDocDtlsInDVL.clear();// Clear the used map
l_objDocList=null;
l_objTBMRecord=null;
l_objVstdDocRecord=null;
}// End of While
l_objAllTBMList=null;
m_objTotalDocDtlsInDVL=null;
-------------------------------------------------------------------
private void writeExcelSheet() throws IOException
{
HSSFRow l_objRow = null;
HSSFCell l_objCell = null;
VisitedDoctorsVO l_objVisitedDoctorsVO = null;
Iterator l_itrDocMap = m_objTotalDocDtlsInDVL.keySet().iterator();
while (l_itrDocMap.hasNext())
{
Object key = l_itrDocMap.next();
l_objVisitedDoctorsVO = (VisitedDoctorsVO) m_objTotalDocDtlsInDVL.get(key);
l_objRow = m_objSheet.createRow(m_iRowCount++);
l_objCell = l_objRow.createCell(0);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(String.valueOf(l_intSrNo++));
l_objCell = l_objRow.createCell(1);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getEmpname() + " (" + l_objVisitedDoctorsVO.getEmpcode() + ")"); // TBM Name
l_objCell = l_objRow.createCell(2);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDr_name());// Doc Name
l_objCell = l_objRow.createCell(3);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPotential_score());// Freq potential score
l_objCell = l_objRow.createCell(4);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getDoctor_potential());// Freq potential score
l_objCell = l_objRow.createCell(5);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getSpeciality());//CP_GP_SPL
l_objCell = l_objRow.createCell(6);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getActualpractice());// Actual practise
l_objCell = l_objRow.createCell(7);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getPreviousmet());// Lastmet
l_objCell = l_objRow.createCell(8);
l_objCell.setCellStyle(m_objCellStyle4);
l_objCell.setCellValue(l_objVisitedDoctorsVO.getLastmet());// Previousmet
}
// Write OutPut Stream
try {
out = new FileOutputStream(m_objFile);
outBf = new BufferedOutputStream(out);
m_objWorkBook.write(outBf);
} catch (Exception ioe) {
ioe.printStackTrace();
System.out.println(" Exception in chunk write");
} finally {
if (outBf != null) {
outBf.flush();
outBf.close();
out.close();
l_objRow=null;
l_objCell=null;
}
}
}
Instead of populating the complete list in memory before starting to write to Excel, you need to modify the code so that each object is written to the file as it is read from the database. Take a look at this question to get some idea of the other approach.
Well, I'm not sure whether POI can handle incremental updates, but if so you might want to write chunks of, say, 10,000 rows to the file. If not, you might have to use CSV instead (so no formatting) or increase memory.
The problem is that you need to make the objects written to the file eligible for garbage collection (no references from a live thread anymore) before writing the file is finished (before all rows have been generated and written to the file).
Edit:
If you can write smaller chunks of data to the file, you'd also have to load only the necessary chunks from the db. It doesn't make sense to load 50,000 records at once and then try to write 5 chunks of 10,000, since those 50,000 records are likely to consume a lot of memory already.
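For reference, later Apache POI versions (3.8 and up) ship a streaming workbook, SXSSFWorkbook, which keeps only a sliding window of rows in memory; note that it writes .xlsx rather than the .xls produced by HSSF, so whether it fits the JDK 1.5 setup described here is an open assumption. A minimal sketch:

import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;

import java.io.FileOutputStream;

public class StreamingExcelSketch {
    public static void main(String[] args) throws Exception {
        // keep at most 100 rows in memory; older rows are flushed to a temporary file
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        Sheet sheet = workbook.createSheet("report");

        for (int rowNum = 0; rowNum < 100000; rowNum++) {
            Row row = sheet.createRow(rowNum);
            Cell cell = row.createCell(0);
            cell.setCellValue("row " + rowNum); // write each record as it is produced
        }

        FileOutputStream out = new FileOutputStream("report.xlsx");
        workbook.write(out);
        out.close();
        workbook.dispose(); // delete the temporary files backing the flushed rows
    }
}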
As Thomas points out, you have too many objects taking up too much space, and you need a way to reduce that. There are a couple of strategies for this I can think of:
Do you need to create a new factory each time in the loop, or can you reuse it?
Can you start with a loop getting the information you need into a new structure, and then discard the old one?
Can you split the processing into a thread chain, sending information forward to the next step, avoiding building a large memory-consuming structure at all? (A sketch of this follows below.)
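A rough sketch of that last, thread-chain idea, using a bounded queue so that only a small window of records is alive at any time; the Record class and writeRow method are hypothetical stand-ins for the VOs and POI code above, and the lambdas would need anonymous classes on the JDK 1.5 mentioned in the question:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {

    // hypothetical stand-in for VisitedDoctorsVO
    static class Record {
        final String doctorId;
        Record(String doctorId) { this.doctorId = doctorId; }
    }

    private static final Record POISON = new Record("<end>");

    public static void main(String[] args) throws InterruptedException {
        // bounded queue: the reader blocks once the writer falls 100 records behind
        BlockingQueue<Record> queue = new ArrayBlockingQueue<>(100);

        Thread writer = new Thread(() -> {
            try {
                Record record;
                while ((record = queue.take()) != POISON) {
                    writeRow(record); // append one row to the sheet/file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // producer: read from the database and hand records over one by one
        for (int i = 0; i < 50000; i++) {
            queue.put(new Record("doc-" + i)); // stand-in for one DB row
        }
        queue.put(POISON); // signal end of data
        writer.join();
    }

    private static void writeRow(Record record) {
        // hypothetical: create the row/cells for this record here
    }
}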
