Writing large amounts of data to Excel: GC overhead limit exceeded - java

I have a list of strings read from MongoDB (~200k lines), and I want to write it to an Excel file with this Java code:
public class OutputToExcelUtils {

    private static XSSFWorkbook workbook;
    private static final String DATA_SEPARATOR = "!";

    public static void clusterOutToExcel(List<String> data, String outputPath) {
        workbook = new XSSFWorkbook();
        FileOutputStream outputStream = null;
        writeData(data, "Data");
        try {
            outputStream = new FileOutputStream(outputPath);
            workbook.write(outputStream);
            workbook.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void writeData(List<String> data, String sheetName) {
        int rowNum = 0;
        XSSFSheet sheet = workbook.createSheet(sheetName);
        for (int i = 0; i < data.size(); i++) {
            System.out.println(sheetName + " Processing line: " + i);
            int colNum = 0;
            // Split the line into cell values
            String[] valuesOfLine = data.get(i).split(DATA_SEPARATOR);
            Row row = sheet.createRow(rowNum++);
            for (String valueOfCell : valuesOfLine) {
                Cell cell = row.createCell(colNum++);
                cell.setCellValue(valueOfCell);
            }
        }
    }
}
Then I get an error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.xmlbeans.impl.store.Cur$Locations.<init>(Cur.java:497)
    at org.apache.xmlbeans.impl.store.Locale.<init>(Locale.java:168)
    at org.apache.xmlbeans.impl.store.Locale.getLocale(Locale.java:242)
    at org.apache.xmlbeans.impl.store.Locale.newInstance(Locale.java:593)
    at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.newInstance(SchemaTypeLoaderBase.java:198)
    at org.apache.poi.POIXMLTypeLoader.newInstance(POIXMLTypeLoader.java:132)
    at org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst$Factory.newInstance(Unknown Source)
    at org.apache.poi.xssf.usermodel.XSSFRichTextString.<init>(XSSFRichTextString.java:87)
    at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:417)
    at ups.mongo.excelutil.OutputToExcelUtils.writeData(OutputToExcelUtils.java:80)
    at ups.mongo.excelutil.OutputToExcelUtils.clusterOutToExcel(OutputToExcelUtils.java:30)
    at ups.mongodb.App.main(App.java:74)
Can you give me some advice on this? Thank you.
Update / solution: use SXSSFWorkbook instead of XSSFWorkbook.
public class OutputToExcelUtils {

    private static SXSSFWorkbook workbook;
    private static final String DATA_SEPARATOR = "!";

    public static void clusterOutToExcel(ClusterOutput clusterObject, ClusterOutputTrade clusterOutputTrade,
            ClusterOutputDistance clusterOutputDistance, String outputPath) {
        workbook = new SXSSFWorkbook();
        workbook.setCompressTempFiles(true);
        FileOutputStream outputStream = null;
        writeData(clusterOutputTrade.getTrades(), "Data");
        try {
            outputStream = new FileOutputStream(outputPath);
            workbook.write(outputStream);
            workbook.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void writeData(List<String> data, String sheetName) {
        int rowNum = 0;
        SXSSFSheet sheet = workbook.createSheet(sheetName);
        sheet.setRandomAccessWindowSize(100); // keep 100 rows in memory; older rows are flushed to disk
        for (int i = 0; i < data.size(); i++) {
            System.out.println(sheetName + " Processing line: " + i);
            int colNum = 0;
            // Split the line into cell values
            String[] valuesOfLine = data.get(i).split(DATA_SEPARATOR);
            Row row = sheet.createRow(rowNum++);
            for (String valueOfCell : valuesOfLine) {
                Cell cell = row.createCell(colNum++);
                cell.setCellValue(valueOfCell);
            }
        }
    }
}
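A follow-up note based on the POI API docs (not part of the original post): SXSSF backs its flushed rows with temporary files, and those are only deleted if you dispose of the workbook, for example:

    // after workbook.write(outputStream) has finished:
    workbook.dispose(); // deletes the SXSSF temporary backing files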

Your application is spending too much time doing garbage collection. This doesn't necessarily mean it is running out of heap space; rather, it is spending too much time in GC relative to doing actual work, so the Java runtime shuts it down.
Try to enable throughput collection with the following JVM option:
-XX:+UseParallelGC
While you're at it, give your application as much heap space as possible:
-Xmx????m
(where ???? stands for the amount of heap space in MB, e.g. -Xmx8192m)
If this doesn't help, try to set a more lenient throughput goal with this option:
-XX:GCTimeRatio=19
This specifies that your application should do 19 times more useful work than GC-related work, i.e. it allows the GC to consume up to 5% of the processor time (I believe the stricter 1% default goal may be causing the above runtime error).
No guarantee that this will work. Can you check and post back so others who experience similar problems may benefit?
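Putting those options together, the launch command might look something like this (the heap size and jar name are placeholders to adapt to your setup):

    java -XX:+UseParallelGC -XX:GCTimeRatio=19 -Xmx8192m -jar yourapp.jar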
EDIT
Your root problem remains the fact that you need to hold the entire spreadsheet and all its related objects in memory while you are building it. Another solution would be to serialize the data, i.e. write the actual spreadsheet file instead of constructing it in memory and saving it at the end. However, this requires reading up on the XLSX format and creating a custom solution.
Another option would be looking for a less memory-intensive library (if one exists). Possible alternatives to POI are JExcelAPI (open source) and Aspose.Cells (commercial).
I've used JExcelAPI years ago and had a positive experience (however, it appears to be much less actively maintained than POI, so it may no longer be the best choice).
EDIT 2
Looks like POI offers a streaming model (https://poi.apache.org/spreadsheet/how-to.html#sxssf), so this may be the best overall approach.

Well, try not to load all the data into memory. Even if the binary representation of 200k lines is not that big, the hydrated objects in memory may be too big. As a hint: if your data is a POJO, each attribute of the POJO is a reference, and each reference takes 4 or 8 bytes depending on whether compressed oops are in use. This means that if your data is a POJO with 4 attributes, the references alone will cost 200,000 * 4 * 4 bytes (or * 8 bytes).
Theoretically you can increase the amount of memory given to the JVM, but this is not a good solution; or, more precisely, it is not a good solution for a live system. For a non-interactive system it might be fine.
Hint: use the -Xmx and -Xms JVM arguments to control the heap size.

Instead of getting the entire list from the data, iterate line-wise.
If that is too cumbersome, write the list to a file and re-read it line-wise, for instance as a Stream<String>:

    Path path = Files.createTempFile(...);
    Files.write(path, list, StandardCharsets.UTF_8);
    Files.lines(path, StandardCharsets.UTF_8)
         .forEach(line -> { ... });
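To make that concrete, here is a minimal sketch combining the two ideas: it streams lines from the temp file and writes them through POI's streaming SXSSFWorkbook (mentioned elsewhere on this page), so neither the full list nor the full sheet sits in memory. The "!" separator and the sheet name are carried over from the question; the rest is assumed scaffolding:

    import java.io.FileOutputStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.stream.Stream;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public class StreamingExcelWriter {

        public static void write(Path input, String outputPath) throws Exception {
            // keep only a 100-row window in memory; older rows are flushed to a temp file
            SXSSFWorkbook workbook = new SXSSFWorkbook(100);
            Sheet sheet = workbook.createSheet("Data");
            int[] rowNum = {0}; // mutable counter usable inside the lambda
            try (Stream<String> lines = Files.lines(input, StandardCharsets.UTF_8);
                 FileOutputStream out = new FileOutputStream(outputPath)) {
                lines.forEach(line -> {
                    Row row = sheet.createRow(rowNum[0]++);
                    int col = 0;
                    for (String value : line.split("!")) { // "!" is the question's separator
                        row.createCell(col++).setCellValue(value);
                    }
                });
                workbook.write(out);
            } finally {
                workbook.dispose(); // delete the SXSSF temp files
            }
        }
    }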
On the Excel side: xlsx uses shared strings, but in case XSSF handles this carelessly, the following cache would ensure that a single String instance is used for repeated string values.
public class StringCache {

    private static final int MAX_LENGTH = 40;
    private final Map<String, String> identityMap = new HashMap<>();

    public String cached(String s) {
        if (s == null) {
            return null;
        }
        if (s.length() > MAX_LENGTH) {
            return s;
        }
        String t = identityMap.get(s);
        if (t == null) {
            t = s;
            identityMap.put(t, t);
        }
        return t;
    }
}
StringCache strings = new StringCache();
for (String valueOfCell : valuesOfLine) {
    Cell cell = row.createCell(colNum++);
    cell.setCellValue(strings.cached(valueOfCell));
}

Related

Apache POI XSSFWorkbook memory leak

So I'm making a large-scale prime number generator in Java (with the help of JavaFX).
It uses the Apache POI library (I believe I'm using v3.17) to output the results to Excel spreadsheets.
The static methods for this exporting logic are held in a class called ExcelWriter. Basically, it iterates through an ArrayList argument and populates an XSSFWorkbook with its contents. Afterwards, a FileOutputStream is used to actually produce the Excel file. Here are the relevant parts of it:
public class ExcelWriter {

    //Configured JFileChooser to make alert before overwriting old files
    private static JFileChooser fileManager = new JFileChooser() {
        @Override
        public void approveSelection() {
            ...
        }
    };

    private static FileFilter filter = new FileNameExtensionFilter("Excel files", "xlsx");
    private static boolean hasBeenInitialized = false;

    //Only method that can be called externally to access this class's functionality
    public static <T extends Object> void makeSpreadsheet
            (ArrayList<T> list, spreadsheetTypes type, int max, String title, JFXProgressBar progressBar)
            throws IOException, InterruptedException {
        progressBar.progressProperty().setValue(0);
        switch (type) {
            case rightToLeftColumnLimit:
                makeSpreadsheetRightToLeft(list, false, max, title, progressBar);
                break;
            ...
        }
    }

    static private <T extends Object> void makeSpreadsheetRightToLeft
            (ArrayList<T> list, boolean maxRows, int max, String title, JFXProgressBar progressBar)
            throws IOException, InterruptedException {
        initializeChooser();
        XSSFWorkbook workbook = new XSSFWorkbook();
        XSSFSheet sheet = workbook.createSheet("Primus output");
        int rowPointer = 0;
        int columnPointer = 0;
        double progressIncrementValue = 1 / (double) list.size();

        //Giving the spreadsheet an internal title also
        Row row = sheet.createRow(0);
        row.createCell(0).setCellValue(title);
        row = sheet.createRow(++rowPointer);

        //Making the sheet with a max column limit
        if (!maxRows) {
            for (T number : list) {
                if (columnPointer == max) {
                    columnPointer = 0;
                    row = sheet.createRow(++rowPointer);
                }
                Cell cell = row.createCell(columnPointer++);
                progressBar.setProgress(progressBar.getProgress() + progressIncrementValue);
                cell.setCellValue(number.toString());
            }
        } else {
            //Making the sheet with a max row limit
            int columnWrapIndex = (int) Math.ceil(list.size() / (float) max);
            for (T number : list) {
                if (columnPointer == columnWrapIndex) {
                    columnPointer = 0;
                    row = sheet.createRow(++rowPointer);
                }
                Cell cell = row.createCell(columnPointer++);
                progressBar.setProgress(progressBar.getProgress() + progressIncrementValue);
                cell.setCellValue(number.toString());
            }
        }
        writeToExcel(workbook, progressBar);
    }

    static private void writeToExcel(XSSFWorkbook book, JFXProgressBar progressBar) throws IOException, InterruptedException {
        //Exporting to Excel
        int returnValue = fileManager.showSaveDialog(null);
        if (returnValue == JFileChooser.APPROVE_OPTION) {
            File file = fileManager.getSelectedFile();
            //Validation logic here
            try {
                FileOutputStream out = new FileOutputStream(file);
                book.write(out);
                out.close();
                book.close();
            } catch (FileNotFoundException ex) {
            }
        }
    }
}
Afterwards, my FXML document controller has a button listener which calls:
longCalculationThread thread = new longCalculationThread(threadBundle);
thread.start();
The longCalculationThread creates a list of about a million prime numbers and exports them via the ExcelWriter using this code:
private void publishResults() throws IOException, InterruptedException {
    if (!longResults.isEmpty()) {
        if (shouldExport) {
            progressText.setText("Exporting to Excel...");
            ExcelWriter.makeSpreadsheet(longResults, exportType, excelExportLimit, getTitle(), progressBar);
        }
    }
}
The problem is, even though the variable holding the XSSFWorkbook is local to the methods it is used in, it doesn't get garbage collected afterwards.
It takes up around 1.5 GB of RAM (I don't know why), and that memory is only reclaimed when another huge export is started (not for small exports).
My problem isn't really that the thing takes a lot of RAM; it's that even when the methods have completed, the memory isn't GCed.
Here are some pictures of my NetBeans profiles (screenshots omitted):
- Normal memory usage when making an array of 1,000,000 primes
- Huge heap usage when making the workbook
- Memory isn't reclaimed when the workbook isn't accessible anymore
- Fluctuation seen when making a new workbook using the same static methods
I found the answer! I had to prompt the GC with System.gc(). I remember trying this out earlier; however, I must have put it in a place where the workbook was still accessible and hence couldn't be GCed.
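For reference, the pattern was roughly the following (a reconstruction, not the exact code; note that System.gc() is only a request the JVM may ignore):

    writeToExcel(workbook, progressBar); // workbook is a local variable
    workbook = null;                     // drop the last live reference
    System.gc();                         // ask (not force) the JVM to collect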

Java memory issue using Apache POI to read/write Excel

I am trying to read an Excel file, make some changes, and save to a new file.
I have created a small form with a button. On pressing the button:
1) It loads the Excel file and reads all the data into an ArrayList of a class I have created.
2) It loops through the ArrayList and changes a few properties of the objects.
3) It saves the data to a new Excel file.
4) Finally, it clears the ArrayList and shows a message box of completion.
Now the problem is a memory issue.
When the form is loaded, I can see in Windows Task Manager that javaw is using around 23 MB.
During the Excel read and write, memory shoots up to 170 MB.
After the ArrayList is cleared, the memory does not clear up and stays around 150 MB.
The following code is attached to the button's click event:
MouseListener mouseListener = new MouseAdapter() {
    public void mouseReleased(MouseEvent mouseEvent) {
        if (SwingUtilities.isLeftMouseButton(mouseEvent)) {
            ArrayList<Address> addresses = ExcelFunctions.getExcelData(fn);
            for (Address address : addresses) {
                address.setZestimate(Integer.toString(rnd.nextInt(45000)));
                address.setRedfinestimate(Integer.toString(rnd.nextInt(45000)));
            }
            ExcelFunctions.saveToExcel(ofn, addresses);
            addresses.clear();
            JOptionPane.showMessageDialog(null, "Done");
        }
    }
};
The code for reading and writing the Excel file is in this class:
public class ExcelFunctions {

    public static ArrayList<Address> getExcelData(String fn) {
        ArrayList<Address> output = new ArrayList<Address>();
        try {
            FileInputStream file = new FileInputStream(new File(fn));
            //Create Workbook instance holding reference to .xlsx file
            XSSFWorkbook workbook = new XSSFWorkbook(file);
            //Get first/desired sheet from the workbook
            XSSFSheet sheet = workbook.getSheetAt(0);
            System.out.println(sheet.getSheetName());
            //Iterate through the rows one by one
            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                Row row = rowIterator.next();
                int r = row.getRowNum();
                int fc = row.getFirstCellNum();
                int lc = row.getLastCellNum();
                String msg = "Row:" + r + " FColumn:" + fc + " LColumn:" + lc;
                System.out.println(msg);
                if (row.getRowNum() > 0) {
                    Address add = new Address();
                    Cell c0 = row.getCell(0);
                    Cell c1 = row.getCell(1);
                    Cell c2 = row.getCell(2);
                    Cell c3 = row.getCell(3);
                    Cell c4 = row.getCell(4);
                    Cell c5 = row.getCell(5);
                    if (c0 != null) { c0.setCellType(Cell.CELL_TYPE_STRING); add.setState(c0.toString()); }
                    if (c1 != null) { c1.setCellType(Cell.CELL_TYPE_STRING); add.setCity(c1.toString()); }
                    if (c2 != null) { c2.setCellType(Cell.CELL_TYPE_STRING); add.setZipcode(c2.toString()); }
                    if (c3 != null) { c3.setCellType(Cell.CELL_TYPE_STRING); add.setAddress(c3.getStringCellValue()); }
                    if (c4 != null) { c4.setCellType(Cell.CELL_TYPE_STRING); add.setZestimate(c4.getStringCellValue()); }
                    if (c5 != null) { c5.setCellType(Cell.CELL_TYPE_STRING); add.setRedfinestimate(c5.getStringCellValue()); }
                    output.add(add);
                    c0 = null; c1 = null; c2 = null; c3 = null; c4 = null; c5 = null;
                }
            }
            workbook.close();
            file.close();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        return output;
    }

    public static void saveToExcel(String ofn, ArrayList<Address> addresses) {
        XSSFWorkbook workbook = new XSSFWorkbook();
        XSSFSheet sheet = workbook.createSheet("Addresses");
        Row header = sheet.createRow(0);
        header.createCell(0).setCellValue("State");
        header.createCell(1).setCellValue("City");
        header.createCell(2).setCellValue("Zip");
        header.createCell(3).setCellValue("Address");
        header.createCell(4).setCellValue("Zestimates");
        header.createCell(5).setCellValue("Redfin Estimate");
        int row = 1;
        for (Address address : addresses) {
            Row dataRow = sheet.createRow(row);
            dataRow.createCell(0).setCellValue(address.getState());
            dataRow.createCell(1).setCellValue(address.getCity());
            dataRow.createCell(2).setCellValue(address.getZipcode());
            dataRow.createCell(3).setCellValue(address.getAddress());
            dataRow.createCell(4).setCellValue(address.getZestimate());
            dataRow.createCell(5).setCellValue(address.getRedfinestimate());
            row++;
        }
        try {
            FileOutputStream out = new FileOutputStream(new File(ofn));
            workbook.write(out);
            out.close();
            workbook.close();
            System.out.println("Excel with formula cells written successfully");
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I am unable to figure out where the issue is. I'm closing the workbook, input stream, and output stream, and clearing the ArrayList too.
You probably don't have a memory leak...
"When form is loaded, I can see in windows task manager...javaw is using around 23MB. During read and write excel...memory shoots upto 170MB. After array list is cleared....Memory is not clearing up and stays around 150MB."
This doesn't describe a memory leak - Task Manager is showing you the memory reserved by the process - not the application heap space.
Your JVM will allocate heap up to its configured maximum, say 200 MiB. Generally, once this memory has been allocated from the OS, the JVM doesn't give it back (very often). However, if you look at your heap usage (with a tool like JConsole or Java VisualVM) you'll see that the heap is reclaimed after a GC.
How Java consumes memory
As a very basic example, consider a VisualVM heap graph (image omitted; source: https://stopcoding.files.wordpress.com/2010/04/visualvm_hfcd4.png):
In this example, the JVM has a 1 GiB max heap, and as the application needed more memory, 400 MiB was reserved from the OS (the orange area).
The blue area is the actual heap memory used by the application. The saw-tooth effect is the result of the garbage collection process reclaiming unused memory. Note that the orange area remains fairly static - it generally won't resize with each GC event...
"within few seconds...it shoot upto 800MB and stays there till end....I have not got any memory error"
If you had a memory leak, you'd eventually get an OutOfMemoryError. A "leak" (in Java, at least) is when the application ties up heap memory but doesn't release it for reuse by the application. If your observed memory shoots up that quickly but the application doesn't fall over, you'll probably find that, internally, the JVM is actually releasing and reusing memory.
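If you want to watch this from inside the application instead of Task Manager, a crude sketch using the standard Runtime API shows the difference between heap reserved from the OS and heap actually in use:

    public class HeapProbe {
        public static void main(String[] args) throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long total = rt.totalMemory();       // heap currently reserved from the OS
                long used = total - rt.freeMemory(); // heap actually in use by the application
                System.out.printf("used=%d MiB, reserved=%d MiB, max=%d MiB%n",
                        used >> 20, total >> 20, rt.maxMemory() >> 20);
                Thread.sleep(1000);
            }
        }
    }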
Limiting how much (OS) memory Java can use
If you want to limit how much memory your application can reserve from the OS, you need to configure your maximum heap size (via the -Xmx option) as well as your permanent generation size (if you're still on Java 7 or earlier). Note that the JVM uses some memory itself, so the value shown at OS level (in tools like Task Manager) can be higher than the total application memory you have specified.
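For example, a Java 7-era launch limiting both areas might look like this (sizes are placeholders; -XX:MaxPermSize no longer exists on Java 8+):

    java -Xmx256m -XX:MaxPermSize=128m -jar yourapp.jar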

Optimizing indexing in Lucene 5.2.1

I have developed my own indexer in Lucene 5.2.1. I am trying to index a 1.5 GB file, and I need to do some non-trivial computation at indexing time on every single document of the collection.
The problem is that it takes almost 20 minutes to do all the indexing! I have followed this very helpful wiki, but it is still way too slow. I have tried increasing the Eclipse heap space and the Java VM memory, but it seems to be more a matter of hard disk than virtual memory (I am using a laptop with 6 GB of RAM and a regular hard disk).
I have read this discussion that suggests using a RAMDirectory or mounting a RAM disk. The problem with a RAM disk would be persisting the index in my filesystem (I don't want to lose indexes after a reboot). The problem with RAMDirectory instead is that, according to the APIs, I should not use it because my index is more than "several hundred megabytes"...
Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments.
Here you can find my code:
public class ReviewIndexer {

    private JSONParser parser;
    private PerFieldAnalyzerWrapper reviewAnalyzer;
    private IndexWriterConfig iwConfig;
    private IndexWriter indexWriter;

    public ReviewIndexer() throws IOException {
        parser = new JSONParser();
        reviewAnalyzer = new ReviewWrapper().getPFAWrapper();
        iwConfig = new IndexWriterConfig(reviewAnalyzer);
        //change ram buffer size to speed things up
        //@url https://wiki.apache.org/lucene-java/ImproveIndexingSpeed
        iwConfig.setRAMBufferSizeMB(2048);
        //little speed increase
        iwConfig.setUseCompoundFile(false);
        //iwConfig.setMaxThreadStates(24);
        // Set to overwrite the existing index
        indexWriter = new IndexWriter(FileUtils.openDirectory("review_index"), iwConfig);
    }

    /**
     * Indexes every review.
     * @param file_path : the path of the yelp_academic_dataset_review.json file
     * @throws IOException
     * @return Returns true if everything goes fine.
     */
    public boolean indexReviews(String file_path) throws IOException {
        BufferedReader br;
        try {
            //open the file
            br = new BufferedReader(new FileReader(file_path));
            String line;
            //define fields
            StringField type = new StringField("type", "", Store.YES);
            String reviewtext = "";
            TextField text = new TextField("text", "", Store.YES);
            StringField business_id = new StringField("business_id", "", Store.YES);
            StringField user_id = new StringField("user_id", "", Store.YES);
            LongField stars = new LongField("stars", 0, LanguageUtils.LONG_FIELD_TYPE_STORED_SORTED);
            LongField date = new LongField("date", 0, LanguageUtils.LONG_FIELD_TYPE_STORED_SORTED);
            StringField votes = new StringField("votes", "", Store.YES);
            Date reviewDate;
            JSONObject jsonVotes;
            try {
                indexWriter.deleteAll();
                //scan the file line by line
                //TO-DO: split in chunks and use parallel computation
                while ((line = br.readLine()) != null) {
                    try {
                        JSONObject jsonline = (JSONObject) parser.parse(line);
                        Document review = new Document();
                        //add values to fields
                        type.setStringValue((String) jsonline.get("type"));
                        business_id.setStringValue((String) jsonline.get("business_id"));
                        user_id.setStringValue((String) jsonline.get("user_id"));
                        stars.setLongValue((long) jsonline.get("stars"));
                        reviewtext = (String) jsonline.get("text");
                        //non-trivial function being calculated here
                        text.setStringValue(reviewtext);
                        reviewDate = DateTools.stringToDate((String) jsonline.get("date"));
                        date.setLongValue(reviewDate.getTime());
                        jsonVotes = (JSONObject) jsonline.get("votes");
                        votes.setStringValue(jsonVotes.toJSONString());
                        //add fields to document
                        review.add(type);
                        review.add(business_id);
                        review.add(user_id);
                        review.add(stars);
                        review.add(text);
                        review.add(date);
                        review.add(votes);
                        //write the document to the index
                        indexWriter.addDocument(review);
                    } catch (ParseException | java.text.ParseException e) {
                        e.printStackTrace();
                        br.close();
                        return false;
                    }
                } //end of while
            } catch (IOException e) {
                e.printStackTrace();
                br.close();
                return false;
            }
            //close buffered reader and commit changes
            br.close();
            indexWriter.commit();
        } catch (FileNotFoundException e1) {
            e1.printStackTrace();
            return false;
        }
        System.out.println("Done.");
        return true;
    }

    public void close() throws IOException {
        indexWriter.close();
    }
}
What is the best thing to do, then? Should I build a RAM disk and then copy the indexes to the filesystem once they are done, should I use RAMDirectory anyway, or maybe something else? Many thanks.
Lucene claims 150 GB/hour on modern hardware - that is with 20 indexing threads on a 24-core machine.
You have 1 thread, so expect about 150/20 = 7.5 GB/hour. You will probably see that one core is working at 100% and the rest only work when merging segments.
You should use multiple indexing threads to speed things up. See for example the luceneutil Indexer.java for inspiration.
As you have a laptop, I suspect you have either 4 or 8 cores, so multi-threading should give your indexing a nice boost.
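As a rough illustration (not the luceneutil code), the single-threaded read loop could hand lines to a pool of workers. IndexWriter itself is safe for concurrent use, but Document and Field instances are not, so each task builds its own; buildReviewDocument is a hypothetical stand-in for the question's JSON parsing plus the non-trivial per-document computation (imports from java.util.concurrent assumed):

    ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    BufferedReader br = new BufferedReader(new FileReader(file_path));
    String line;
    while ((line = br.readLine()) != null) {
        final String json = line;
        pool.submit(() -> {
            try {
                // per-task Document/Field objects; nothing shared between threads
                Document review = buildReviewDocument(json); // hypothetical helper
                indexWriter.addDocument(review);             // IndexWriter is thread-safe
            } catch (Exception e) {
                e.printStackTrace();
            }
        });
    }
    br.close();
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    indexWriter.commit();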
You can also try setMaxThreadStates in IndexWriterConfig:

    iwConfig.setMaxThreadStates(50);

Java Heap Space Error, OutOfMemory Exception while writing large data to an Excel sheet

I am getting a Java heap space error while writing large data from the database to an Excel sheet.
I don't want to use the JVM -Xmx option to increase memory.
Following are the details:
1) I am using the org.apache.poi.hssf API for Excel sheet writing.
2) JDK version 1.5
3) Tomcat 6.0
The code I have written works well for around 23 thousand records, but it fails for more than 23K records.
Following is the code:
ArrayList l_objAllTBMList = new ArrayList();
l_objAllTBMList = (ArrayList) m_objFreqCvrgDAO.fetchAllTBMUsers(p_strUserTerritoryId);
ArrayList l_objDocList = new ArrayList();
m_objTotalDocDtlsInDVL = new HashMap();
Object l_objTBMRecord[] = null;
Object l_objVstdDocRecord[] = null;
int l_intDocLstSize = 0;
VisitedDoctorsVO l_objVisitedDoctorsVO = null;
int l_tbmListSize = l_objAllTBMList.size();
System.out.println(" getMissedDocDtlsList_NSM ");
for (int i = 0; i < l_tbmListSize; i++) {
    l_objTBMRecord = (Object[]) l_objAllTBMList.get(i);
    l_objDocList = (ArrayList) m_objGenerateVisitdDocsReportDAO.fetchAllDocDtlsInDVL_NSM(
            (String) l_objTBMRecord[1], p_divCode, (String) l_objTBMRecord[2],
            p_startDt, p_endDt, p_planType, p_LMSValue, p_CycleId, p_finYrId);
    l_intDocLstSize = l_objDocList.size();
    try {
        l_objVOFactoryForDoctors = new VOFactory(l_intDocLstSize, VisitedDoctorsVO.class);
        /* Factory class written to create and maintain a limited number of Value Objects (VOs) */
    } catch (ClassNotFoundException ex) {
        m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:" + ex);
    } catch (InstantiationException ex) {
        m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:" + ex);
    } catch (IllegalAccessException ex) {
        m_objLogger.debug("DEBUG:getMissedDocDtlsList_NSM :Exception:" + ex);
    }
    for (int j = 0; j < l_intDocLstSize; j++) {
        l_objVstdDocRecord = (Object[]) l_objDocList.get(j);
        l_objVisitedDoctorsVO = (VisitedDoctorsVO) l_objVOFactoryForDoctors.getVo();
        if (((String) l_objVstdDocRecord[6]).equalsIgnoreCase("-")) {
            if (String.valueOf(l_objVstdDocRecord[2]) != "null") {
                l_objVisitedDoctorsVO.setPotential_score(String.valueOf(l_objVstdDocRecord[2]));
                l_objVisitedDoctorsVO.setEmpcode((String) l_objTBMRecord[1]);
                l_objVisitedDoctorsVO.setEmpname((String) l_objTBMRecord[0]);
                l_objVisitedDoctorsVO.setDoctorid((String) l_objVstdDocRecord[1]);
                l_objVisitedDoctorsVO.setDr_name((String) l_objVstdDocRecord[4] + " " + (String) l_objVstdDocRecord[5]);
                l_objVisitedDoctorsVO.setDoctor_potential((String) l_objVstdDocRecord[3]);
                l_objVisitedDoctorsVO.setSpeciality((String) l_objVstdDocRecord[7]);
                l_objVisitedDoctorsVO.setActualpractice((String) l_objVstdDocRecord[8]);
                l_objVisitedDoctorsVO.setLastmet("-");
                l_objVisitedDoctorsVO.setPreviousmet("-");
                m_objTotalDocDtlsInDVL.put((String) l_objVstdDocRecord[1], l_objVisitedDoctorsVO);
            }
        }
    } // End of inner for loop
    writeExcelSheet(); // Pasting this method at the end
    // Clean-up code
    l_objVOFactoryForDoctors.resetFactory();
    m_objTotalDocDtlsInDVL.clear(); // Clear the used map
    l_objDocList = null;
    l_objTBMRecord = null;
    l_objVstdDocRecord = null;
} // End of outer for loop
l_objAllTBMList = null;
m_objTotalDocDtlsInDVL = null;
-------------------------------------------------------------------
private void writeExcelSheet() throws IOException {
    HSSFRow l_objRow = null;
    HSSFCell l_objCell = null;
    VisitedDoctorsVO l_objVisitedDoctorsVO = null;
    Iterator l_itrDocMap = m_objTotalDocDtlsInDVL.keySet().iterator();
    while (l_itrDocMap.hasNext()) {
        Object key = l_itrDocMap.next();
        l_objVisitedDoctorsVO = (VisitedDoctorsVO) m_objTotalDocDtlsInDVL.get(key);
        l_objRow = m_objSheet.createRow(m_iRowCount++);
        l_objCell = l_objRow.createCell(0);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(String.valueOf(l_intSrNo++)); // Serial number
        l_objCell = l_objRow.createCell(1);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getEmpname() + " (" + l_objVisitedDoctorsVO.getEmpcode() + ")"); // TBM name
        l_objCell = l_objRow.createCell(2);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getDr_name()); // Doctor name
        l_objCell = l_objRow.createCell(3);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getPotential_score()); // Frequency potential score
        l_objCell = l_objRow.createCell(4);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getDoctor_potential()); // Doctor potential
        l_objCell = l_objRow.createCell(5);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getSpeciality()); // CP_GP_SPL
        l_objCell = l_objRow.createCell(6);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getActualpractice()); // Actual practice
        l_objCell = l_objRow.createCell(7);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getPreviousmet()); // Previously met
        l_objCell = l_objRow.createCell(8);
        l_objCell.setCellStyle(m_objCellStyle4);
        l_objCell.setCellValue(l_objVisitedDoctorsVO.getLastmet()); // Last met
    }
    // Write output stream
    try {
        out = new FileOutputStream(m_objFile);
        outBf = new BufferedOutputStream(out);
        m_objWorkBook.write(outBf);
    } catch (Exception ioe) {
        ioe.printStackTrace();
        System.out.println(" Exception in chunk write");
    } finally {
        if (outBf != null) {
            outBf.flush();
            outBf.close();
            out.close();
            l_objRow = null;
            l_objCell = null;
        }
    }
}
Instead of populating the complete list in memory before starting to write to Excel, you need to modify the code so that each object is written to the file as it is read from the database. Take a look at this question to get some ideas about the other approach.
Well, I'm not sure whether POI can handle incremental updates, but if it can you might want to write chunks of, say, 10,000 rows to the file. If not, you might have to use CSV instead (so no formatting) or increase memory.
The problem is that you need to make objects written to the file eligible for garbage collection (no references from a live thread anymore) before writing of the file is finished (before all rows have been generated and written to the file).
Edit:
If you can write smaller chunks of data to the file, you'd also have to load only the necessary chunks from the db. It doesn't make sense to load 50,000 records at once and then try to write 5 chunks of 10,000, since those 50,000 records are likely to consume a lot of memory already.
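A sketch of that combined approach, assuming a hypothetical paged DAO method fetchChunk(offset, limit) and swapping HSSF for the streaming SXSSF model discussed earlier on this page (which requires a newer POI than the question's setup):

    import java.io.FileOutputStream;
    import java.util.List;
    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    public class ChunkedExporter {

        private static final int CHUNK_SIZE = 10000;

        // fetchChunk(offset, limit) is a hypothetical DAO method that pages
        // through the result set instead of loading all records at once
        interface Dao { List<String[]> fetchChunk(int offset, int limit); }

        public void export(Dao dao, String fileName) throws Exception {
            SXSSFWorkbook workbook = new SXSSFWorkbook(100); // keep 100 rows in memory
            Sheet sheet = workbook.createSheet("Report");
            int rowNum = 0;
            int offset = 0;
            List<String[]> chunk;
            while (!(chunk = dao.fetchChunk(offset, CHUNK_SIZE)).isEmpty()) {
                for (String[] record : chunk) {
                    Row row = sheet.createRow(rowNum++);
                    for (int c = 0; c < record.length; c++) {
                        row.createCell(c).setCellValue(record[c]);
                    }
                }
                offset += chunk.size(); // the written chunk can now be GCed
            }
            try (FileOutputStream out = new FileOutputStream(fileName)) {
                workbook.write(out);
            } finally {
                workbook.dispose(); // delete the SXSSF temp files
            }
        }
    }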
As Thomas points out, you have too many objects taking up too much space, and you need a way to reduce that. There are a couple of strategies for this I can think of:
Do you need to create a new factory each time through the loop, or can you reuse it?
Can you start with a loop getting the information you need into a new structure, and then discard the old one?
Can you split the processing into a thread chain, sending information forward to the next step, avoiding building a large memory-consuming structure at all? (See the sketch below.)
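For the third strategy, a minimal producer/consumer sketch with a bounded BlockingQueue; the bound is what caps memory, since the producer blocks whenever the writer falls behind:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class PipelineSketch {

        private static final String[] POISON = new String[0]; // end-of-stream marker

        public static void main(String[] args) throws Exception {
            BlockingQueue<String[]> queue = new ArrayBlockingQueue<>(1000); // bounds memory

            Thread writer = new Thread(() -> {
                try {
                    String[] record;
                    while ((record = queue.take()) != POISON) {
                        // write one row to the sheet here; the record becomes
                        // garbage as soon as this iteration finishes
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();

            // producer: fetch from the database and hand records forward, e.g.
            // for (each record from the DB) queue.put(recordFields);
            queue.put(POISON); // signal end of input
            writer.join();
        }
    }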

How do I write a Java text file viewer for big log files

I am working on a software product with an integrated log file viewer. The problem is, it's slow and unstable for really large files because it reads the whole file into memory when you view a log file. I want to write a new log file viewer that addresses this problem.
What are the best practices for writing viewers for large text files? How do editors like Notepad++ and Vim accomplish this? I was thinking of using a buffered bi-directional text stream reader together with Java's TableModel. Am I thinking along the right lines, and are such stream implementations available for Java?
Edit: Would it be worthwhile to run through the file once to index the positions of the start of each line of text, so that one knows where to seek to? I will probably need the number of lines, so I will probably have to scan through the file at least once.
Edit 2: I've added my implementation to an answer below. Please comment on it or edit it to help me/us arrive at a more best-practice implementation, or otherwise provide your own.
I'm not sure that Notepad++ actually implements random access, but I think that's the way to go, especially for a log file viewer, which implies it will be read-only.
Since your log viewer will be read-only, you can use a read-only random-access memory-mapped file "stream". In Java, this is the FileChannel.
Then just jump around in the file as needed and render to the screen just a scrolling window of the data.
One of the advantages of the FileChannel is that concurrent threads can have the file open, and reading doesn't affect the current file pointer. So, if you're appending to the log file in another thread, it won't be affected.
Another advantage is that you can call the FileChannel's size method to get the file size at any moment.
The problem with mapping memory directly to a random access file, which some text editors allow (such as HxD and UltraEdit), is that any changes directly affect the file. Therefore, changes are immediate (except for write caching), which is something users typically don't want. Instead, users typically don't want their changes made until they click Save. However, since this is just a viewer, you don't have the same concerns.
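A minimal sketch of that idea: FileChannel has a positioned read that takes an explicit offset and leaves the channel's own position untouched, so a render thread can pull any window of the file on demand:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class WindowReader {

        public static String readAt(String file, long offset, int len) throws Exception {
            try (FileChannel ch = FileChannel.open(Paths.get(file), StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(len);
                ch.read(buf, offset); // positioned read: channel position unchanged
                return new String(buf.array(), 0, buf.position());
            }
        }
    }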
A typical approach is to use a seekable file reader, make one pass through the log recording an index of line offsets, and then present only a window onto a portion of the file as requested.
This reduces the data you need in quick recall and avoids loading up a widget where 99% of its contents aren't currently visible.
I post my test implementation (after following the advice of Marcus Adams and msw) here for your convenience and also for further comments and criticism. It's quite fast.
I've not bothered with Unicode encoding safety. I guess this will be my next question. Any hints on that are very welcome.
class LogFileTableModel implements TableModel {

    private final File f;
    private final int lineCount;
    private final String errMsg;
    private final Long[] index;
    private final ByteBuffer linebuf = ByteBuffer.allocate(1024);
    private FileChannel chan;

    public LogFileTableModel(String filename) {
        f = new File(filename);
        String m;
        int l = 1;
        Long[] idx = new Long[] {};
        try {
            FileInputStream in = new FileInputStream(f);
            chan = in.getChannel();
            m = null;
            idx = buildLineIndex();
            l = idx.length;
        } catch (IOException e) {
            m = e.getMessage();
        }
        errMsg = m;
        lineCount = l;
        index = idx;
    }

    private Long[] buildLineIndex() throws IOException {
        List<Long> idx = new LinkedList<Long>();
        idx.add(0L);
        ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
        long offset = 0;
        while (chan.read(buf) != -1) {
            int len = buf.position();
            buf.rewind();
            int pos = 0;
            byte[] bufA = buf.array();
            while (pos < len) {
                byte c = bufA[pos++];
                if (c == '\n')
                    idx.add(offset + pos);
            }
            offset = chan.position();
        }
        System.out.println("Done Building index");
        return idx.toArray(new Long[] {});
    }

    @Override
    public int getColumnCount() {
        return 2;
    }

    @Override
    public int getRowCount() {
        return lineCount;
    }

    @Override
    public String getColumnName(int columnIndex) {
        switch (columnIndex) {
            case 0:
                return "#";
            case 1:
                return "Name";
        }
        return "";
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        switch (columnIndex) {
            case 0:
                return String.format("%3d", rowIndex);
            case 1:
                if (errMsg != null)
                    return errMsg;
                try {
                    Long pos = index[rowIndex];
                    chan.position(pos);
                    chan.read(linebuf);
                    linebuf.rewind();
                    if (rowIndex == lineCount - 1)
                        return new String(linebuf.array());
                    else
                        return new String(linebuf.array(), 0, (int) (long) (index[rowIndex + 1] - pos));
                } catch (Exception e) {
                    return "Error: " + e.getMessage();
                }
        }
        return "a";
    }

    @Override
    public Class<?> getColumnClass(int columnIndex) {
        return String.class;
    }

    // ... other methods to make the interface complete
}
