Apache POI XSSFWorkbook memory leak - java

So I'm making a large-scale prime number generator in Java (with the help of JavaFX).
It uses the Apache POI library (I believe I'm using v3.17) to output the results to Excel spreadsheets.
The static methods for this exporting logic are held in a class called ExcelWriter. Basically, it iterates through an ArrayList argument and populates an XSSFWorkbook with its contents. Afterwards, a FileOutputStream is used to actually write the Excel file. Here are the relevant parts of it:
public class ExcelWriter {
//Configured JFileChooser to make alert before overwriting old files
private static JFileChooser fileManager = new JFileChooser(){
@Override
public void approveSelection(){
...
}
};
private static FileFilter filter = new FileNameExtensionFilter("Excel files","xlsx");
private static boolean hasBeenInitialized = false;
//Only method that can be called externally to access this class's functionality
public static <T extends Object> void makeSpreadsheet
(ArrayList<T> list, spreadsheetTypes type, int max, String title, JFXProgressBar progressBar)
throws IOException, InterruptedException{
progressBar.progressProperty().setValue(0);
switch (type){
case rightToLeftColumnLimit:
makeSpreadsheetRightToLeft(list, false, max, title, progressBar);
break;
...
}
}
static private <T extends Object> void makeSpreadsheetRightToLeft
(ArrayList<T> list, boolean maxRows, int max, String title, JFXProgressBar progressBar)
throws IOException, InterruptedException{
initializeChooser();
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.createSheet("Primus output");
int rowPointer = 0;
int columnPointer = 0;
double progressIncrementValue = 1/(double)list.size();
//Giving the spreadsheet an internal title also
Row row = sheet.createRow(0);
row.createCell(0).setCellValue(title);
row = sheet.createRow(++rowPointer);
//Making the sheet with a max column limit
if (!maxRows){
for (T number: list){
if (columnPointer == max){
columnPointer = 0;
row = sheet.createRow(++rowPointer);
}
Cell cell = row.createCell(columnPointer++);
progressBar.setProgress(progressBar.getProgress() + progressIncrementValue);
cell.setCellValue(number.toString());
}
}else {
//Making the sheet with a max row limit
int columnWrapIndex = (int)Math.ceil(list.size()/(float)max);
for (T number: list){
if (columnPointer == columnWrapIndex){
columnPointer = 0;
row = sheet.createRow(++rowPointer);
}
Cell cell = row.createCell(columnPointer++);
progressBar.setProgress(progressBar.getProgress() + progressIncrementValue);
cell.setCellValue(number.toString());
}
}
writeToExcel(workbook, progressBar);
}
static private void writeToExcel(XSSFWorkbook book, JFXProgressBar progressBar) throws IOException, InterruptedException{
//Exporting to Excel
int returnValue = fileManager.showSaveDialog(null);
if (returnValue == JFileChooser.APPROVE_OPTION){
File file = fileManager.getSelectedFile();
//Validation logic here
try{
FileOutputStream out = new FileOutputStream(file);
book.write(out);
out.close();
book.close();
}catch (FileNotFoundException ex){
}
}
}
}
Afterwards, my FXML document controller has a button listener which calls:
longCalculationThread thread = new longCalculationThread(threadBundle);
thread.start();
The longCalculationThread creates a list of about a million prime numbers and exports them via the ExcelWriter using this code:
private void publishResults() throws IOException, InterruptedException{
if (!longResults.isEmpty()){
if (shouldExport) {
progressText.setText("Exporting to Excel...");
ExcelWriter.makeSpreadsheet(longResults, exportType, excelExportLimit, getTitle(), progressBar);
}
}
The problem is, even though the variable holding the XSSFWorkbook is local to the methods it is used in, it doesn't get garbage collected afterwards.
It takes up about 1.5 GB of RAM (I don't know why), and that memory is only reclaimed when another huge export is started (not for small exports).
My problem isn't really that the thing takes a lot of RAM; it's that even when the methods are completed, the memory isn't GCed.
Here are some pictures of my NetBeans profiles:
Normal memory usage when making an array of 1,000,000 primes:
Huge heap usage when making the workbook
Memory isn't reclaimed when the workbook isn't accessible anymore
Fluctuation seen when making a new workbook using the same static methods

I found the answer! I had to prompt the GC with System.gc(). I remember trying this out earlier; however, I must have put it in a place where the workbook was still accessible and hence couldn't be GCed.
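For what it's worth, System.gc() is only a hint, and it can only take effect once no live reference to the workbook remains. A minimal, POI-free sketch of that behaviour using a WeakReference (class and method names hypothetical):

```java
import java.lang.ref.WeakReference;

public class GcDemo {
    // Returns true once the object becomes unreachable and the GC reclaims it.
    static boolean collectAfterRelease() throws InterruptedException {
        byte[] big = new byte[10_000_000];            // stand-in for the large workbook
        WeakReference<byte[]> ref = new WeakReference<>(big);
        big = null;                                   // drop the last strong reference
        for (int i = 0; i < 50 && ref.get() != null; i++) {
            System.gc();                              // only now can the hint take effect
            Thread.sleep(10);
        }
        return ref.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(collectAfterRelease() ? "collected" : "not collected");
    }
}
```

While the strong reference is still in scope, no amount of System.gc() calls will free the object; that matches the symptom described above.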

Related

How to make workbook(outputstream) write all rows to an xlsx file instead of only one row?

I am still working on my file reading program but I have some problems with my code. A class 'file rating' will read all files in a directory and give them a rating. All the values are given to my 'SheetWriter' class, which you can see down here. The class gets the correct values if I print the 'obj' (objects), but writing them to an Excel file is not working properly -> it will only write two rows: row 1 (the "Reachable for user", "Rating", "File path", etc.) and row 2 (reachable, 45, C://blabla, etc...). So it basically writes only one file to the xlsx. How can I make it work so it writes all files to the xlsx?
Thanks! (I am a Java rookie)
public class SheetWriter {
private XSSFWorkbook workbook;
private XSSFSheet sheet;
//!! Maybe important to know: the values come from a for-loop
//from another class 'file rating': For (x : sourcefiles){ points=5 setPoints(points) }
public void SheetWriter(String file,String reachable, int points,String filePath,String fileName,String keywordMatch,String grootte,
String resolutie, String crea_date,String crea_mod,String last_acc, String authorString,String datetakenString,
String manufactString,String modelString,String gps ) {
workbook = new XSSFWorkbook();
sheet = workbook.createSheet("Rating Files");
Map<String, Object[]> data = new TreeMap<String, Object[]>();
data.put("1", new Object[]{"Reachable for user", "Rating", "File path","File name","Keyword","Size","Dimensions","Date_crea","Date_mod","Date_last_access",
"Author","Date taken","Camera maker","Camera model","GPS-data"});
data.put(file, new Object[]{reachable, points,filePath,fileName,keywordMatch,grootte, resolutie, crea_date,crea_mod,last_acc,
authorString,datetakenString,manufactString,modelString,gps});
Set<String> keyset = data.keySet();
int rownum = 0;
for (String key : keyset) {
// this creates a new row in the sheet
Row row = sheet.createRow(rownum++);
Object[] objArr = data.get(key);
int cellnum = 0;
for (Object obj : objArr) {
// this line creates a cell in the next column of that row
Cell cell = row.createCell(cellnum++);
//System.out.println(obj);
if (obj instanceof String)
cell.setCellValue((String)obj);
else if (obj instanceof Integer)
cell.setCellValue((Integer)obj);
}
} //!! so here it's still OK. I can print all the obj (objects)
try {
sheet.autoSizeColumn(0);sheet.autoSizeColumn(1);sheet.autoSizeColumn(2);sheet.autoSizeColumn(3);
sheet.autoSizeColumn(4);sheet.autoSizeColumn(5);sheet.autoSizeColumn(6);sheet.autoSizeColumn(7);
sheet.autoSizeColumn(8);sheet.autoSizeColumn(9);sheet.autoSizeColumn(10);sheet.autoSizeColumn(11);
sheet.autoSizeColumn(12);sheet.autoSizeColumn(13);sheet.autoSizeColumn(14);
FileOutputStream out = new FileOutputStream(new File("C:/Users/user/Pictures/test.xlsx"));
workbook.write(out); //!! Only writes one file to xlsx
out.close();
workbook.close();
System.out.println("test.xlsx is finished.");
}
catch (Exception e) {
e.printStackTrace();
}
If you want to write several rows into a sheet and the result is only one row, then you likely have a problem with a counter variable not getting properly incremented, or with every iteration doing exactly the same thing.
This time, it's slightly different because you are writing (creating and setting a value) rows in an enhanced for loop relying on the keySet of a Map with two entries only.
That means you always write those two entries only.
The problem that not all the files are written may be caused by an issue in an outer loop. I suggest passing an argument List<String> fileNames instead of String fileName and doing the writing for all the files in this method. Otherwise, check the outer loop.
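To make the suggestion concrete: collect the row data for all files first, then create the workbook and write once at the end. The sketch below replaces the POI calls with a plain List<Object[]> so the row-accumulation logic stands on its own; FileInfo is a hypothetical holder for the values your rating class produces (trimmed to three columns):

```java
import java.util.ArrayList;
import java.util.List;

public class SheetRows {
    // Hypothetical holder for the per-file values the rating class produces.
    static class FileInfo {
        final String reachable; final int points; final String filePath;
        FileInfo(String reachable, int points, String filePath) {
            this.reachable = reachable; this.points = points; this.filePath = filePath;
        }
    }

    // One header row, then one data row per file; the caller would create the
    // workbook once and turn each Object[] into a sheet row before writing.
    static List<Object[]> buildRows(List<FileInfo> files) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[]{"Reachable for user", "Rating", "File path"});
        for (FileInfo f : files) {
            rows.add(new Object[]{f.reachable, f.points, f.filePath});
        }
        return rows;
    }
}
```

Building the workbook inside a per-file method, as the posted code does, recreates and overwrites test.xlsx on every call, which is why only the last file survives.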

java.lang.OutOfMemoryError: Java heap space - Can not fix it

I have this code, and this error keeps appearing. I have only one Excel file, but nothing seems to work. I already tried a lot of options that I found searching the internet, but none of them do what I want.
I use different cases to simplify my business logic and I am not going to change that, so I am not sure how to solve this issue.
private static final String nombreArchivo = "casoPrueba.xlsx";
private static final String rutaArchivo = "src\\test\\resources\\data\\" + nombreArchivo;
public static XSSFSheet SacaHojaSegunTipo(String tipo) throws IOException {
if (workbook == null) {
try (FileInputStream fis = new FileInputStream(new File(rutaArchivo))) {
workbook = new XSSFWorkbook(fis);
}
}
XSSFSheet spreadsheet = null;
switch (tipo) {
case "Candidatos Minorista":
spreadsheet = workbook.getSheetAt(1);
break;
case "Conversion Candidatos":
spreadsheet = workbook.getSheetAt(2);
break;
case "Cuentas":
spreadsheet = workbook.getSheetAt(3);
break;
case "Detalle Cuenta":
spreadsheet = workbook.getSheetAt(4);
break;
case "Historial de Cuentas":
spreadsheet = workbook.getSheetAt(5);
break;
case "Cuentas Financieras":
spreadsheet = workbook.getSheetAt(6);
break;
case "AR Estado Automático":
spreadsheet = workbook.getSheetAt(7);
break;
case "Oportunidades":
spreadsheet = workbook.getSheetAt(8);
break;
default:
spreadsheet = workbook.getSheetAt(0);
break;
}
return spreadsheet;
}
I know this is not an efficient method. I hope someone can help me with this.
Something like this (I tried to change your code as little as possible, so it's not perfect)
private static final String nombreArchivo = "casoPrueba.xlsx";
private static final String rutaArchivo = "src\\test\\resources\\data\\" + nombreArchivo;
private static XSSFWorkbook workbook = null;
public static XSSFSheet SacaHojaSegunTipo(String tipo) throws IOException {
if (workbook == null) {
try (FileInputStream fis = new FileInputStream(new File(rutaArchivo))) {
workbook = new XSSFWorkbook(fis);
}
}
XSSFSheet spreadsheet = null;
switch (tipo) {
case "Candidatos Minorista":
spreadsheet = workbook.getSheetAt(1);
break;
case "Conversion Candidatos":
spreadsheet = workbook.getSheetAt(2);
break;
case "Cuentas":
spreadsheet = workbook.getSheetAt(3);
break;
case "Detalle Cuenta":
spreadsheet = workbook.getSheetAt(4);
break;
case "Historial de Cuentas":
spreadsheet = workbook.getSheetAt(5);
break;
case "Navegar Cuentas":
spreadsheet = workbook.getSheetAt(6);
break;
case "Validar Número Operación":
spreadsheet = workbook.getSheetAt(7);
break;
case "Validar Tipos de Productos":
spreadsheet = workbook.getSheetAt(8);
break;
case "Validar Referencia y Cód. Auto.":
spreadsheet = workbook.getSheetAt(9);
break;
default:
spreadsheet = workbook.getSheetAt(0);
}
return spreadsheet;
}
First, a quick aside : it's worth noting the following from
https://poi.apache.org/apidocs/dev/org/apache/poi/xssf/usermodel/XSSFWorkbook.html#XSSFWorkbook-java.io.InputStream-
Using an InputStream requires more memory than using a File, so if a
File is available then you should instead do something like
OPCPackage pkg = OPCPackage.open(path);
XSSFWorkbook wb = new XSSFWorkbook(pkg);
// work with the wb object
......
pkg.close(); // gracefully closes the underlying zip file
(although doing wb.close() also closes the files and streams).
Now, your core issue is that you need to release resources after the sheet or workbook are no longer required, but at present you cannot do so, since these are hidden as locals inside the method.
So you need to give your caller access to close them when it's done. It's a matter of preference, but personally I would prefer encapsulating the spreadsheet into its own class - a spreadsheet IS a clearly defined object in its own right, after all! As such, this would necessitate a change from static, so something like :
public class RutaArchivo implements AutoCloseable {
private static final String nombreArchivo = "casoPrueba.xlsx";
private static final String rutaArchivo = "src\\test\\resources\\data\\" + nombreArchivo;
public static final String CANDIDATOS_MINORISTA = "Candidatos Minorista";
public static final String CONVERSION_CANDIDATOS = "Conversion Candidatos";
public static final String CUENTAS = "Cuentas";
private XSSFWorkbook workbook;
public RutaArchivo() throws InvalidFormatException, IOException {
workbook = new XSSFWorkbook(new File(rutaArchivo));
}
@Override
public void close() throws Exception {
if (workbook != null) {
workbook.close();
workbook = null;
}
}
public XSSFSheet sacaHojaSegunTipo(String tipo) {
if (workbook == null) {
throw new IllegalStateException("It's closed");
}
XSSFSheet spreadsheet = workbook.getSheetAt(0);
if (tipo.equals(CANDIDATOS_MINORISTA)) {
spreadsheet = workbook.getSheetAt(1);
}else if(tipo.equals(CONVERSION_CANDIDATOS)){
spreadsheet = workbook.getSheetAt(2);
}else if(tipo.equals(CUENTAS)){
spreadsheet = workbook.getSheetAt(3);
// etc, etc
}
return spreadsheet;
}
}
A couple of things to note :
If we want the caller to close the file, then we should explicitly make them take some action to open it as well; otherwise it's too easy for it to be left hanging. In the example above, this is implicit in creating the object - just like the standard Java types like FileInputStream, etc.
Making RutaArchivo AutoCloseable means that it can be used in try-with-resources, so it is closed automatically - eg :
try (RutaArchivo rutaArchivo = new RutaArchivo()) {
XSSFSheet cuentas = rutaArchivo.sacaHojaSegunTipo(RutaArchivo.CUENTAS);
}
Using constants for the names of the sheets reduces bugs (eg, no typos when the method is called)
As this is its own class rather than static methods, it's easier to substitute or mock when writing unit tests.
Anyhow, a few thoughts - hope they help.

Writing large of data to excel: GC overhead limit exceeded

I have a list of strings read from MongoDB (~200k lines).
Then I want to write it to an Excel file with Java code:
public class OutputToExcelUtils {
private static XSSFWorkbook workbook;
private static final String DATA_SEPERATOR = "!";
public static void clusterOutToExcel(List<String> data, String outputPath) {
workbook = new XSSFWorkbook();
FileOutputStream outputStream = null;
writeData(data, "Data");
try {
outputStream = new FileOutputStream(outputPath);
workbook.write(outputStream);
workbook.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void writeData(List<String> data, String sheetName) {
int rowNum = 0;
XSSFSheet sheet = workbook.getSheet(sheetName);
sheet = workbook.createSheet(sheetName);
for (int i = 0; i < data.size(); i++) {
System.out.println(sheetName + " Processing line: " + i);
int colNum = 0;
// Split into value of cell
String[] valuesOfLine = data.get(i).split(DATA_SEPERATOR);
Row row = sheet.createRow(rowNum++);
for (String valueOfCell : valuesOfLine) {
Cell cell = row.createCell(colNum++);
cell.setCellValue(valueOfCell);
}
}
}
}
Then I get an error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.xmlbeans.impl.store.Cur$Locations.<init>(Cur.java:497)
at org.apache.xmlbeans.impl.store.Locale.<init>(Locale.java:168)
at org.apache.xmlbeans.impl.store.Locale.getLocale(Locale.java:242)
at org.apache.xmlbeans.impl.store.Locale.newInstance(Locale.java:593)
at org.apache.xmlbeans.impl.schema.SchemaTypeLoaderBase.newInstance(SchemaTypeLoaderBase.java:198)
at org.apache.poi.POIXMLTypeLoader.newInstance(POIXMLTypeLoader.java:132)
at org.openxmlformats.schemas.spreadsheetml.x2006.main.CTRst$Factory.newInstance(Unknown Source)
at org.apache.poi.xssf.usermodel.XSSFRichTextString.<init>(XSSFRichTextString.java:87)
at org.apache.poi.xssf.usermodel.XSSFCell.setCellValue(XSSFCell.java:417)
at ups.mongo.excelutil.OutputToExcelUtils.writeData(OutputToExcelUtils.java:80)
at ups.mongo.excelutil.OutputToExcelUtils.clusterOutToExcel(OutputToExcelUtils.java:30)
at ups.mongodb.App.main(App.java:74)
Can you please give me some advice for that?
Thank you, with my respect.
Update solution: using SXSSFWorkbook instead of XSSFWorkbook
public class OutputToExcelUtils {
private static SXSSFWorkbook workbook;
private static final String DATA_SEPERATOR = "!";
public static void clusterOutToExcel(ClusterOutput clusterObject, ClusterOutputTrade clusterOutputTrade,
ClusterOutputDistance ClusterOutputDistance, String outputPath) {
workbook = new SXSSFWorkbook();
workbook.setCompressTempFiles(true);
FileOutputStream outputStream = null;
writeData(clusterOutputTrade.getTrades(), "Data");
try {
outputStream = new FileOutputStream(outputPath);
workbook.write(outputStream);
workbook.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public static void writeData(List<String> data, String sheetName) {
int rowNum = 0;
SXSSFSheet sheet = workbook.createSheet(sheetName);
sheet.setRandomAccessWindowSize(100); // Keep 100 rows in memory; earlier rows are flushed to disk once written
for (int i = 0; i < data.size(); i++) {
System.out.println(sheetName + " Processing line: " + i);
int colNum = 0;
// Split into value of cell
String[] valuesOfLine = data.get(i).split(DATA_SEPERATOR);
Row row = sheet.createRow(rowNum++);
for (String valueOfCell : valuesOfLine) {
Cell cell = row.createCell(colNum++);
cell.setCellValue(valueOfCell);
}
}
}
}
Your application is spending too much time doing garbage collection. This doesn't necessarily mean that it is running out of heap space; however, it spends too much time in GC relative to performing actual work, so the Java runtime shuts it down.
Try to enable throughput collection with the following JVM option:
-XX:+UseParallelGC
While you're at it, give your application as much heap space as possible:
-Xmx????m
(where ???? stands for the maximum heap size in MB, e.g. -Xmx8192m)
If this doesn't help, try to set a more lenient throughput goal with this option:
-XX:GCTimeRatio=19
This specifies that your application should do 19 times more useful work than GC-related work, i.e. it allows the GC to consume up to 5% of the processor time (I believe the stricter 1% default goal may be causing the above runtime error)
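Put together, a launch command using these flags might look like this (the jar name is hypothetical and the heap size is just an example):

```shell
java -XX:+UseParallelGC -Xmx8192m -XX:GCTimeRatio=19 -jar exporter.jar
```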
No guarantee that this will work. Can you check and post back so others who experience similar problems may benefit?
EDIT
Your root problem remains the fact that you need to hold the entire spreadsheet and all its related objects in memory while you are building it. Another solution would be to serialize the data, i.e. writing the actual spreadsheet file instead of constructing it in memory and saving it at the end. However, this requires reading up on the XLSX format and creating a custom solution.
Another option would be looking for a less memory-intensive library (if one exists). Possible alternatives to POI are JExcelAPI (open source) and Aspose.Cells (commercial).
I've used JExcelAPI years ago and had a positive experience (however, it appears that it is much less actively maintained than POI, so may no longer be the best choice).
EDIT 2
Looks like POI offers a streaming model (https://poi.apache.org/spreadsheet/how-to.html#sxssf), so this may be the best overall approach.
Well, try not to load all the data into memory. Even if the binary representation of 200k lines is not that big, the hydrated objects in memory may be too big. Just as a hint: if you have a POJO, each attribute in that POJO is a pointer, and each pointer, depending on whether compressed oops are in use, will take 4 or 8 bytes. This means that if your data is a POJO with 4 attributes, the pointers alone will cost 200,000 * 4 * 4 bytes (or 8 bytes each).
Theoretically you can increase the amount of memory for the JVM, but this is not a good solution, or more precisely it is not a good solution for a live system. For a non-interactive system it might be fine.
Hint: use the -Xmx and -Xms JVM arguments to control the heap size.
Instead of getting the entire list from the data, iterate line-wise.
If that is too cumbersome, write the list to a file and re-read it line-wise, for instance as a Stream<String>:
Path path = Files.createTempFile(...);
Files.write(path, list, StandardCharsets.UTF_8);
Files.lines(path, StandardCharsets.UTF_8)
.forEach(line -> { ... });
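As a self-contained sketch of that round trip (write the list to a temp file once, then stream it back so only one line is resident at a time; names are hypothetical):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

public class LineStream {
    static long writeAndCount(List<String> lines) throws IOException {
        Path path = Files.createTempFile("rows", ".txt");
        Files.write(path, lines, StandardCharsets.UTF_8);
        try (Stream<String> stream = Files.lines(path, StandardCharsets.UTF_8)) {
            // in the real exporter, each line would become one spreadsheet row here
            return stream.count();
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Note that Files.lines must be closed (hence the try-with-resources); it holds an open file handle until the stream is closed.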
On the Excel side: though xlsx uses shared strings, in case XSSF handles this carelessly,
the following cache would ensure a single String instance is used for repeated string values.
public class StringCache {
private static final int MAX_LENGTH = 40;
private Map<String, String> identityMap = new HashMap<>();
public String cached(String s) {
if (s == null) {
return null;
}
if (s.length() > MAX_LENGTH) {
return s;
}
String t = identityMap.get(s);
if (t == null) {
t = s;
identityMap.put(t, t);
}
return t;
}
}
StringCache strings = new StringCache();
for (String valueOfCell : valuesOfLine) {
Cell cell = row.createCell(colNum++);
cell.setCellValue(strings.cached(valueOfCell));
}

Creating large .xls with SXSSF

I've got a problem with exporting a large .xls with SXSSF; by large I mean 27 cols x 100,000 rows. The Excel file is returned on an endpoint request. I've limited the amount of rows; it could be 3x larger.
I'm using template engine for inserting data.
Original code
public StreamingOutput createStreamedExcelReport(Map<String, Object> params, String templateName, String[] columnsToHide) throws Exception {
try(InputStream is = ReportGenerator.class.getResourceAsStream(templateName)) {
assert is != null;
final Transformer transformer = PoiTransformer.createTransformer(is);
AreaBuilder areaBuilder = new XlsCommentAreaBuilder(transformer);
List<Area> xlsAreaList = areaBuilder.build();
Area xlsArea = xlsAreaList.get(0);
Context context = new PoiContext();
for(Map.Entry<String, Object> entry : params.entrySet()) {
context.putVar(entry.getKey(), entry.getValue());
}
xlsArea.applyAt(new CellRef("Sheet1!A1"), context);
xlsArea.processFormulas();
return new StreamingOutput() {
@Override
public void write(OutputStream out) throws IOException {
((PoiTransformer) transformer).getWorkbook().write(out);
}
};
}
}
SXSSF
public StreamingOutput createStreamedExcelReport(Map<String, Object> params, String templateName, String[] columnsToHide) throws Exception {
try(InputStream is = ReportGenerator.class.getResourceAsStream(templateName)) {
assert is != null;
Workbook workbook = WorkbookFactory.create(is);
final PoiTransformer transformer = PoiTransformer.createSxssfTransformer(workbook);
AreaBuilder areaBuilder = new XlsCommentAreaBuilder(transformer);
List<Area> xlsAreaList = areaBuilder.build();
Area xlsArea = xlsAreaList.get(0);
Context context = new PoiContext();
for(Map.Entry<String, Object> entry : params.entrySet()) {
context.putVar(entry.getKey(), entry.getValue());
}
xlsArea.applyAt(new CellRef("Sheet1!A1"), context);
xlsArea.processFormulas();
return new StreamingOutput() {
@Override
public void write(OutputStream out) throws IOException {
transformer.getWorkbook().write(out);
}
};
}
}
The export was running for 7 mins and I stopped the server; it was too long. An acceptable time would be something like 1 min (2 min max). Most of that time CPU usage was about 60-80% and memory usage was constant. I also tried exporting 40 rows; it took something like 10 sec.
Maybe my function needs to be optimized.
An additional problem is that I'm inserting formulas. In the original code the formulas are replaced with values; in the SXSSF version they are not.
I recommend disabling the formula processing at this point, because the formula support for the SXSSF version is limited and the memory consumption can be too high. The formula support may be improved in future JXLS releases.
So just remove the xlsArea.processFormulas() call and add
context.getConfig().setIsFormulaProcessingRequired(false);
to disable the tracking of cell references (as shown in the Jxls doc) and see if it works.
Also please note that the template and the final report are expected to be in .xlsx format if you use SXSSF.

How do I write a Java text file viewer for big log files

I am working on a software product with an integrated log file viewer. Problem is, it's slow and unstable for really large files because it reads the whole file into memory when you view a log file. I want to write a new log file viewer that addresses this problem.
What are the best practices for writing viewers for large text files? How do editors like Notepad++ and Vim accomplish this? I was thinking of using a buffered bi-directional text stream reader together with Java's TableModel. Am I thinking along the right lines, and are such stream implementations available for Java?
Edit: Will it be worthwhile to run through the file once to index the positions of the start of each line of text, so that one knows where to seek to? I will probably need the number of lines, so I will probably have to scan through the file at least once anyway?
Edit2: I've added my implementation to an answer below. Please comment on it or edit it to help me/us arrive at a more best-practice implementation or otherwise provide your own.
I'm not sure that NotePad++ actually implements random access, but I think that's the way to go, especially with a log file viewer, which implies that it will be read only.
Since your log viewer will be read only, you can use a read only random access memory mapped file "stream". In Java, this is the FileChannel.
Then just jump around in the file as needed and render to the screen just a scrolling window of the data.
One of the advantages of the FileChannel is that concurrent threads can have the file open, and reading doesn't affect the current file pointer. So, if you're appending to the log file in another thread, it won't be affected.
Another advantage is that you can call the FileChannel's size method to get the file size at any moment.
The problem with mapping memory directly to a random access file, which some text editors allow (such as HxD and UltraEdit), is that any changes directly affect the file. Therefore, changes are immediate (except for write caching), which is something users typically don't want. Instead, users typically don't want their changes made until they click Save. However, since this is just a viewer, you don't have the same concerns.
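A minimal sketch of that positioned-read idea, stripped of the viewer parts (names hypothetical): open the channel read-only, jump to an offset, and decode just the window you want to render.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WindowRead {
    static String readAt(Path file, long offset, int length) throws IOException {
        try (FileChannel chan = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(length);
            chan.read(buf, offset);   // positioned read; does not move the channel pointer
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }
}
```

The positioned read(ByteBuffer, long) overload is what makes concurrent readers safe: each call carries its own offset instead of sharing the channel's file pointer.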
A typical approach is to use a seekable file reader, make one pass through the log recording an index of line offsets and then present only a window onto a portion of the file as requested.
This reduces both the data you need in quick recall and doesn't load up a widget where 99% of its contents aren't currently visible.
I post my test implementation (after following the advice of Marcus Adams and msw) here for your convenience and also for further comments and criticism. It's quite fast.
I've not bothered with Unicode encoding safety. I guess this will be my next question. Any hints on that are very welcome.
class LogFileTableModel implements TableModel {
private final File f;
private final int lineCount;
private final String errMsg;
private final Long[] index;
private final ByteBuffer linebuf = ByteBuffer.allocate(1024);
private FileChannel chan;
public LogFileTableModel(String filename) {
f = new File(filename);
String m;
int l = 1;
Long[] idx = new Long[] {};
try {
FileInputStream in = new FileInputStream(f);
chan = in.getChannel();
m = null;
idx = buildLineIndex();
l = idx.length;
} catch (IOException e) {
m = e.getMessage();
}
errMsg = m;
lineCount = l;
index = idx;
}
private Long[] buildLineIndex() throws IOException {
List<Long> idx = new LinkedList<Long>();
idx.add(0L);
ByteBuffer buf = ByteBuffer.allocate(8 * 1024);
long offset = 0;
while (chan.read(buf) != -1) {
int len = buf.position();
buf.rewind();
int pos = 0;
byte[] bufA = buf.array();
while (pos < len) {
byte c = bufA[pos++];
if (c == '\n')
idx.add(offset + pos);
}
offset = chan.position();
}
System.out.println("Done Building index");
return idx.toArray(new Long[] {});
}
@Override
public int getColumnCount() {
return 2;
}
@Override
public int getRowCount() {
return lineCount;
}
@Override
public String getColumnName(int columnIndex) {
switch (columnIndex) {
case 0:
return "#";
case 1:
return "Name";
}
return "";
}
@Override
public Object getValueAt(int rowIndex, int columnIndex) {
switch (columnIndex) {
case 0:
return String.format("%3d", rowIndex);
case 1:
if (errMsg != null)
return errMsg;
try {
Long pos = index[rowIndex];
chan.position(pos);
chan.read(linebuf);
linebuf.rewind();
if (rowIndex == lineCount - 1)
return new String(linebuf.array());
else
return new String(linebuf.array(), 0, (int)(long)(index[rowIndex+1]-pos));
} catch (Exception e) {
return "Error: "+ e.getMessage();
}
}
return "a";
}
@Override
public Class<?> getColumnClass(int columnIndex) {
return String.class;
}
// ... other methods to make interface complete
}
