I have a class KeywordCount which tokenizes a given sentence and tags it using the Apache OpenNLP maxent POS tagger. I first tokenize the output and then feed it to the tagger. I have a problem with RAM usage of up to 165 MB after the jar has completed its tasks. The rest of the program just makes a DB call and checks for new tasks. I have isolated the leak to this class. You can safely ignore the Apache POI Excel code. I need to know if any of you can find the leak in the code.
public class KeywordCount {
Task task;
String taskFolder = "";
List<String> listOfWords;
public KeywordCount(String taskFolder) {
this.taskFolder = taskFolder;
listOfWords = new ArrayList<String>();
}
public void tagText() throws Exception {
String xlsxOutput = taskFolder + File.separator + "results_pe.xlsx";
FileInputStream fis = new FileInputStream(new File(xlsxOutput));
XSSFWorkbook wb = new XSSFWorkbook(fis);
XSSFSheet sheet = wb.createSheet("Keyword Count");
XSSFRow row = sheet.createRow(0);
Cell cell = row.createCell(0);
XSSFCellStyle csf = (XSSFCellStyle)wb.createCellStyle();
csf.setVerticalAlignment(CellStyle.VERTICAL_TOP);
csf.setBorderBottom(CellStyle.BORDER_THICK);
csf.setBorderRight(CellStyle.BORDER_THICK);
csf.setBorderTop(CellStyle.BORDER_THICK);
csf.setBorderLeft(CellStyle.BORDER_THICK);
Font fontf = wb.createFont();
fontf.setColor(IndexedColors.GREEN.getIndex());
fontf.setBoldweight(Font.BOLDWEIGHT_BOLD);
csf.setFont(fontf);
int rowNum = 0;
BufferedReader br = null;
InputStream modelIn = null;
POSModel model = null;
try {
modelIn = new FileInputStream("taggers" + File.separator + "en-pos-maxent.bin");
model = new POSModel(modelIn);
}
catch (IOException e) {
// Model loading failed, handle the error
e.printStackTrace();
}
finally {
if (modelIn != null) {
try {
modelIn.close();
}
catch (IOException e) {
}
}
}
File ftmp = new File(taskFolder + File.separator + "phrase_tmp.txt");
if(ftmp.exists()) {
br = new BufferedReader(new FileReader(ftmp));
POSTaggerME tagger = new POSTaggerME(model);
String line = "";
while((line = br.readLine()) != null) {
if (line.equals("")) {
break;
}
row = sheet.createRow(rowNum++);
if(line.startsWith("Match")) {
int index = line.indexOf(":");
line = line.substring(index + 1);
String[] sent = getTokens(line);
String[] tags = tagger.tag(sent);
for(int i = 0; i < tags.length; i++) {
if (tags[i].equals("NN") || tags[i].equals("NNP") || tags[i].equals("NNS") || tags[i].equals("NNPS")) {
listOfWords.add(sent[i].toLowerCase());
} else if (tags[i].equals("JJ") || tags[i].equals("JJR") || tags[i].equals("JJS")) {
listOfWords.add(sent[i].toLowerCase());
}
}
Map<String, Integer> treeMap = new TreeMap<String, Integer>();
for(String temp : listOfWords) {
Integer counter = treeMap.get(temp);
treeMap.put(temp, (counter == null) ? 1 : counter + 1);
}
listOfWords.clear();
sent = null;
tags = null;
if (treeMap != null || !treeMap.isEmpty()) {
for(Map.Entry<String, Integer> entry : treeMap.entrySet()) {
row = sheet.createRow(rowNum++);
cell = row.createCell(0);
cell.setCellValue(entry.getKey().substring(0, 1).toUpperCase() + entry.getKey().substring(1));
XSSFCell cell1 = row.createCell(1);
cell1.setCellValue(entry.getValue());
}
treeMap.clear();
}
treeMap = null;
}
rowNum++;
}
br.close();
tagger = null;
model = null;
}
sheet.autoSizeColumn(0);
fis.close();
FileOutputStream fos = new FileOutputStream(new File(xlsxOutput));
wb.write(fos);
fos.close();
System.out.println("Finished writing XLSX file for Keyword Count!!");
}
public String[] getTokens(String match) throws Exception {
InputStream modelIn = new FileInputStream("taggers" + File.separator + "en-token.bin");
TokenizerModel model = null;
try {
model = new TokenizerModel(modelIn);
}
catch (IOException e) {
e.printStackTrace();
}
finally {
if (modelIn != null) {
try {
modelIn.close();
}
catch (IOException e) {
}
}
}
Tokenizer tokenizer = new TokenizerME(model);
String tokens[] = tokenizer.tokenize(match);
model = null;
return tokens;
}
}
My system GCed the RAM after 165 MB... but when I upload to the server the GC is not performed and usage rises up to 480 MB (49% of RAM usage).
First of all, increased heap usage is not evidence of a memory leak. It may simply be that the GC has not run yet.
Having said that, it is doubtful that anyone can spot a memory leak just by "eyeballing" your code. The correct way to solve this is for >>you<< to read up on the techniques for finding Java memory leaks, and then for >>you<< to use the relevant tools (e.g. VisualVM, jhat, etc.) to search for the problem yourself.
Here are some references on finding storage leaks:
Troubleshooting Guide for Java SE 6 with HotSpot VM : Troubleshooting Memory Leaks. http://www.oracle.com/technetwork/java/javase/memleaks-137499.html - Note 1.
How to find a Java Memory Leak
Note 1: This link is liable to break. If it does, use Google to find the article.
I have isolated the leak to this class. You can safely ignore the Apache POI Excel code.
If we ignore the Apache POI code, the only source of a potential memory "leakage" is that the word list ( listOfWords ) is retained. (Calling clear() will null out its contents, but the backing array is retained, and that array's size is determined by the maximum list size. From a memory footprint perspective, it would be better to replace the list with a new empty list.)
However, that is only a "leak" if you keep a reference to the KeywordCount instance. And if you are doing that because you are using the instance, I wouldn't call that a leak at all.
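For what it's worth, here is a minimal sketch of that suggestion (the helper method and class are hypothetical, not part of the original class): replacing the list lets the old, possibly large backing array become garbage, whereas clear() only nulls the elements and keeps the array at its maximum capacity.

import java.util.ArrayList;
import java.util.List;

class WordListHolder {
    private List<String> listOfWords = new ArrayList<String>();

    // Hypothetical helper: call this instead of listOfWords.clear() between batches.
    // The old backing array becomes unreachable and can be collected.
    private void resetWordList() {
        listOfWords = new ArrayList<String>();
    }
}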
Related
I'm trying to write this xlsx file in the Download directory of a Samsung Galaxy Tab A 2019 (Android 9.0). If I try to do this on my emulator (Google Pixel C with Android 9.0) it works without any problems and I can see the file. If I give the app to my client it gives an error, caught by this function:
try {
importIntoExcel();
DynamicToast.makeSuccess(UserList.this, "Saved!", 2000).show();
b1.setEnabled(true);
} catch (IOException e) {
DynamicToast.makeError(UserList.this, "Error!", 2000).show();
e.printStackTrace();
}
Unfortunately I cannot see the stack trace since I cannot connect the client's tablet to my PC. This is the method that doesn't work:
private void importIntoExcel() throws IOException {
String[] columns = {"Numero Test", "Codice ID", "Genere", "Data di nascita", "Protocollo", "Data del test", " ", "Cornice", "Nome cornice", "Fluidità", "Flessibilità",
"Originalita'", "Elaborazione'", "Titolo", "Tempo Reazione", "Tempo Completamento", "Numero cancellature", "Numero Undo"};
Workbook workbook = new XSSFWorkbook();
Sheet sheet = workbook.createSheet("RiepilogoTest");
Font headerFont = workbook.createFont();
headerFont.setBold(true);
headerFont.setFontHeightInPoints((short) 14);
headerFont.setColor(IndexedColors.RED.getIndex());
CellStyle headerCellStyle = workbook.createCellStyle();
headerCellStyle.setFont(headerFont);
headerCellStyle.setAlignment(HorizontalAlignment.CENTER_SELECTION);
// Create a Row
Row headerRow = sheet.createRow(0);
for (int i = 0; i < columns.length; i++) {
Cell cell = headerRow.createCell(i);
cell.setCellValue(columns[i]);
cell.setCellStyle(headerCellStyle);
}
// Create Other rows and cells with contacts data
int rowNum = 1;
//Inserting the data
File dir = new File("/data/user/0/com.example.williamstest/");
for (File file : dir.listFiles()) {
if (file.getName().startsWith("app_draw")) {
String typeTest = file.getName().replaceAll("[^\\d.]", "");
if (new File(file.getAbsolutePath() + "/infotest.txt").exists()) {
FileReader f = new FileReader(file.getAbsolutePath() + "/infotest.txt");
LineNumberReader reader = new LineNumberReader(f);
String line;
String protocollo = "";
line = reader.readLine();
Row row = null;
if (line.equals(userLogged)) {
row = sheet.createRow(rowNum++);
row.createCell(0).setCellValue("Test: " + typeTest);
line = reader.readLine();
row.createCell(2).setCellValue(line);
line = reader.readLine();
if (line.equals("0")) row.createCell(2).setCellValue("/");
row.createCell(3).setCellValue(line);
line = reader.readLine();
protocollo = line;
row.createCell(4).setCellValue(line);
line = reader.readLine();
row.createCell(5).setCellValue(line);
line = reader.readLine();
row.createCell(1).setCellValue(line);
}
for (int i=0; i<12; i++) {
String content = "";
reader = new LineNumberReader(new FileReader(file.getAbsolutePath() + "/" + protocollo + (i + 1) + "_score.txt"));
while ((line = reader.readLine()) != null) {
content+=line+"\n";
}
String[] values = content.split("\n");
row.createCell(6).setCellValue(" "); //Vuota
row.createCell(7).setCellValue(i+1); //Cornice
row.createCell(8).setCellValue(values[4]); //Nome cornice
row.createCell(9).setCellValue(values[0]); //Fluidita
row.createCell(10).setCellValue(values[1]); //Flessibilita
row.createCell(11).setCellValue(values[2]); //Originalita'
row.createCell(12).setCellValue(values[3]); //Elaborazione
row.createCell(13).setCellValue(values[9]); //Titolo
row.createCell(14).setCellValue(values[5]); //Tempo reazione
row.createCell(15).setCellValue(values[6]); //Tempo Completamento
row.createCell(16).setCellValue(values[7]); //Numero cancellature
row.createCell(17).setCellValue(values[8]); //Numero undo
row = sheet.createRow(rowNum++);
row.createCell(0).setCellValue(" ");
row.createCell(1).setCellValue(" ");
row.createCell(2).setCellValue(" ");
row.createCell(3).setCellValue(" ");
row.createCell(4).setCellValue(" ");
row.createCell(5).setCellValue(" ");
}
f.close();
}
}
}
sheet.setDefaultColumnWidth(23);
// Write the output to a file
if (new File(Environment.getExternalStorageDirectory(), "Download/risultatiTest.xlsx").exists())
new File(Environment.getExternalStorageDirectory(), "Download/risultatiTest.xlsx").delete();
FileOutputStream fileOut = new FileOutputStream(new File(Environment.getExternalStorageDirectory(), "Download/risultatiTest.xlsx"));
workbook.write(fileOut);
fileOut.close();
}
I also wrote this method which saves in the same directory and it works, so I don't think it's a permission problem:
private void generateImages() throws IOException {
File dir = new File("/data/user/0/com.example.williamstest/");
File mediaStorageDir = new File(Environment.getExternalStorageDirectory(), "/Download/ImmaginiTest");
if (!mediaStorageDir.exists()) {
if (!mediaStorageDir.mkdirs())
Log.d("App", "failed to create directory");
} else {
if (mediaStorageDir.isDirectory()) {
for (File child : mediaStorageDir.listFiles())
deleteRecursive(child);
}
mediaStorageDir.delete();
mediaStorageDir.mkdirs();
}
for (File file : dir.listFiles()) {
if (file.getName().startsWith("app_draw") && Character.isDigit(file.getName().charAt(file.getName().length() - 1))) {
File makingDir = new File(Environment.getExternalStorageDirectory(), "/Download/ImmaginiTest/Test"+file.getName().substring(file.getName().length() - 1));
makingDir.mkdirs();
for (File fileS : file.listFiles()) {
if (fileS.getName().endsWith(".png")) {
Bitmap b = BitmapFactory.decodeStream(new FileInputStream(fileS));
File mypath=new File(makingDir, fileS.getName());
FileOutputStream fos = null;
try {
fos = new FileOutputStream(mypath);
b.compress(Bitmap.CompressFormat.PNG, 100, fos);
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
fos.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
}
}
}
Do you have any logcat output that could be used to narrow down where the error comes from?
Also, you could begin by avoiding magic strings like these:
File dir = new File("/data/user/0/com.example.williamstest/");
File makingDir = new File(Environment.getExternalStorageDirectory(), "/Download/ImmaginiTest/Test"+file.getName().substring(file.getName().length() - 1));
As of API 29, Environment.getExternalStoragePublicDirectory() is deprecated. Look at this instead: AndroidStudio getExternalStoragePublicDirectory in API 29.
Traditionally, external storage is typically an SD card, but it may also be implemented as built-in storage.
Thus it is necessary to verify that you have one and that it is mounted before accessing a file under
Environment.getExternalStorageDirectory(); otherwise, you need an internal directory as a fallback. Check out the doc here to learn how.
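As a rough sketch of that check (the helper class and method names are illustrative, not from the question), you could pick the output directory like this, falling back to internal app storage when nothing external is mounted:

import android.content.Context;
import android.os.Environment;
import java.io.File;

final class StorageHelper {
    // Sketch: return a writable directory, preferring external storage when it is
    // present and mounted, otherwise falling back to the app's internal files dir.
    static File pickOutputDir(Context context) {
        String state = Environment.getExternalStorageState();
        if (Environment.MEDIA_MOUNTED.equals(state)) {
            return Environment.getExternalStorageDirectory();
        }
        return context.getFilesDir();
    }
}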
Also, if you are targeting API level 29, make sure you use android:requestLegacyExternalStorage="true" on your manifest's application tag too. Check it out here.
On some devices and in newer Android versions, Environment.getExternalStorageDirectory() no longer returns a valid path. Try using Context.getExternalFilesDir(null) instead;
it should return this path: /storage/emulated/0/Android/data/your.package.name/. Try it out to see if that's the issue. Here's the doc.
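If that is the issue, a hedged sketch of writing the workbook there instead (the file name comes from the question; the helper class is illustrative) could look like this:

import android.content.Context;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.ss.usermodel.Workbook;

final class WorkbookSaver {
    // Sketch: the app-specific external directory needs no storage permission
    // and resolves to /storage/emulated/0/Android/data/<package>/files.
    static void save(Context context, Workbook workbook) throws IOException {
        File outFile = new File(context.getExternalFilesDir(null), "risultatiTest.xlsx");
        try (FileOutputStream out = new FileOutputStream(outFile)) {
            workbook.write(out);
        }
    }
}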
I recommend emulating some similar Samsung devices to see if you can replicate the error and have a look at the logcat output.
Writing files probably works differently on computers and Android devices. Newer Android versions block applications' access to some folders, so try writing to a different folder.
Maybe don't create a new folder; write to an existing one instead.
Also, as others say, Environment.getExternalStorageDirectory()
is deprecated and you should not use it on newer Android versions, though on older ones you still can.
Also, you can't trust emulators 100%, because they are not 100% the same as real devices.
I have 39 CSV files which take up a lot of memory. I want to load these files with Java and store them in one variable. The code below works for small files, but doesn't work for large ones. File sizes are usually around 100 MB to 800 MB. I want to load the 39 files in the directory and put them into one 2D array.
public static String readCSV(File csvFile) {
BufferedReader bufferedReader = null;
StringBuffer stringBuffer = new StringBuffer();
try {
bufferedReader = new BufferedReader(new FileReader(csvFile));
} catch (FileNotFoundException e) {
e.printStackTrace();
}
try {
String temp = null;
while((temp = bufferedReader.readLine()) != null) {
stringBuffer.append(temp+","); // append the line stored in temp
}
System.out.println(stringBuffer);
} catch (IOException e) {
e.printStackTrace();
}
// returns -10,-9,-8,-7,-6,-5,-4,-3,-2,-1,0,,,,,,,,,,1,2,3,4,5,6,7,8,9,10,
return stringBuffer.toString();
}
public static String[] parse(String str) {
String[] strArr = str.split(","); // split on commas and store in an array
return strArr;
}
public static void main(String[] args) throws IOException {
//mergeCsvFiles("sample", 4, "D:\\sample_folder\\" + "merge_file" + ".csv");
String str = readCSV(new File("D:/sample_folder/sample1.csv"));
String[] strArr = parse(str); // the values come back packed into a String array
int varNumber = 45;
int rowNumber = strArr.length/varNumber;
String[][] Array2D = new String[varNumber][rowNumber];
for(int j=0;j<varNumber;j++)
{
for(int i=0; i<rowNumber;i++)
{
String k = strArr[i*varNumber+j];
Array2D[j][i]= k;
}
} // build the 2D array
//String[][] naArray2D=removeNA(Array2D,rowNumber,varNumber); // remove rows containing NA
// /* code to check whether the removal worked correctly
for(int i=0;i<varNumber;i++){
for(int j=0;j<16;j++){
System.out.println(Array2D[i][j]);
}
System.out.println("**********************NA제거&2차원 배열**********************");
}
// */
}
}
With the file sizes you are mentioning, you are likely going to run out of memory in the JVM.
This is probably why your largest file of some 800 MB isn't loading into memory. Not only are you loading that 800 MB into memory, but you are also adding the overhead of the arrays that you are using. In other words, you're using 1600 MB plus all of the extra overhead of each array, which becomes sizeable.
My bet is that you are exceeding memory limits, assuming the file format is correct in both cases. I cannot confirm this, since I do not know your JVM settings or your memory consumption and don't have what's needed to figure any of this out, so it is up to you to decide whether that is the case.
Also, maybe I'm not reading your code right, but it doesn't seem like it's going to do what I think you want it to do. Maybe I'm wrong; I don't know exactly what you're trying to do.
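If memory is indeed the limit, one way to shrink the footprint (not part of the answer above, and assuming each CSV line is one row of comma-separated values) is to split each line as it is read, instead of first building one huge String and then splitting that:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class CsvLoader {
    // Sketch: parse row by row so the whole file is never held as a single
    // String in addition to the parsed values.
    public static List<String[]> readRows(String path) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(","));
            }
        }
        return rows;
    }
}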
I'm working on this "program" that reads data from 2 large CSV files (line by line), compares an array element from the files and, when a match is found, writes the data I need into a 3rd file. The only problem I have is that it is very slow. It reads 1-2 lines per second, which is extremely slow, considering I have millions of records. Any ideas on how I could make it faster? Here's my code:
public class ReadWriteCsv {
public static void main(String[] args) throws IOException {
FileInputStream inputStream = null;
FileInputStream inputStream2 = null;
Scanner sc = null;
Scanner sc2 = null;
String csvSeparator = ",";
String line;
String line2;
String path = "D:/test1.csv";
String path2 = "D:/test2.csv";
String path3 = "D:/newResults.csv";
String[] columns;
String[] columns2;
Boolean matchFound = false;
int count = 0;
StringBuilder builder = new StringBuilder();
FileWriter writer = new FileWriter(path3);
try {
// specifies where to take the files from
inputStream = new FileInputStream(path);
inputStream2 = new FileInputStream(path2);
// creating scanners for files
sc = new Scanner(inputStream, "UTF-8");
// while there is another line available do:
while (sc.hasNextLine()) {
count++;
// storing the current line in the temporary variable "line"
line = sc.nextLine();
System.out.println("Number of lines read so far: " + count);
// defines the columns[] as the line being split by ","
columns = line.split(",");
inputStream2 = new FileInputStream(path2);
sc2 = new Scanner(inputStream2, "UTF-8");
// checks if there is a line available in File2 and goes in the
// while loop, reading file2
while (!matchFound && sc2.hasNextLine()) {
line2 = sc2.nextLine();
columns2 = line2.split(",");
if (columns[3].equals(columns2[1])) {
matchFound = true;
builder.append(columns[3]).append(csvSeparator);
builder.append(columns[1]).append(csvSeparator);
builder.append(columns2[2]).append(csvSeparator);
builder.append(columns2[3]).append("\n");
String result = builder.toString();
writer.write(result);
}
}
builder.setLength(0);
sc2.close();
matchFound = false;
}
if (sc.ioException() != null) {
throw sc.ioException();
}
} finally {
//then I close my inputStreams, scanners and writer
Use an existing CSV library rather than rolling your own. It will be far more robust than what you have now.
However, your problem is not CSV parsing speed; it's that your algorithm is O(n^2): for each line in the first file, you scan the entire second file. This kind of algorithm explodes very quickly with the size of the data; when you have millions of rows, you'll run into problems. You need a better algorithm.
The other problem is that you are re-parsing the second file for every scan. You should at least read it into memory as an ArrayList or something at the start of the program, so you only need to load and parse it once.
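As a minimal sketch of that idea (paths and column indexes are taken from the question; everything else is illustrative, not the asker's code), the second file can be loaded once into a HashMap keyed by the join column, turning the inner scan into a constant-time lookup:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class CsvJoin {
    public static void main(String[] args) throws IOException {
        // Load test2.csv once, keyed by the column that is compared in the question.
        Map<String, String[]> rightRows = new HashMap<>();
        try (BufferedReader right = new BufferedReader(new FileReader("D:/test2.csv"))) {
            String line;
            while ((line = right.readLine()) != null) {
                String[] cols = line.split(",");
                rightRows.put(cols[1], cols);
            }
        }
        // Single pass over test1.csv: each lookup is O(1) instead of a full re-scan of file 2.
        try (BufferedReader left = new BufferedReader(new FileReader("D:/test1.csv"));
             FileWriter writer = new FileWriter("D:/newResults.csv")) {
            String line;
            while ((line = left.readLine()) != null) {
                String[] cols = line.split(",");
                String[] match = rightRows.get(cols[3]);
                if (match != null) {
                    writer.write(cols[3] + "," + cols[1] + "," + match[2] + "," + match[3] + "\n");
                }
            }
        }
    }
}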
Use univocity-parsers' CSV parser as it won't take much longer than a couple of seconds to process two files with 1 million rows each:
public void diff(File leftInput, File rightInput) {
CsvParserSettings settings = new CsvParserSettings(); //many config options here, check the tutorial
CsvParser leftParser = new CsvParser(settings);
CsvParser rightParser = new CsvParser(settings);
leftParser.beginParsing(leftInput);
rightParser.beginParsing(rightInput);
String[] left;
String[] right;
int row = 0;
while ((left = leftParser.parseNext()) != null && (right = rightParser.parseNext()) != null) {
row++;
if (!Arrays.equals(left, right)) {
System.out.println(row + ":\t" + Arrays.toString(left) + " != " + Arrays.toString(right));
}
}
leftParser.stopParsing();
rightParser.stopParsing();
}
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I'm trying to convert an old Applet to a GWT application, but I encountered a problem with the following function:
private String[] readBrandList() {
try {
File file = new File("Brands.csv");
String ToAdd = "Default";
BufferedReader read = new BufferedReader(new FileReader(file));
ArrayList<String> BrandName = new ArrayList<String>();
while (ToAdd != null) {
ToAdd = (read.readLine());
BrandName.add(ToAdd);
}
read.close();
String[] BrandList = new String[BrandName.size()];
for (int Counter = 0; Counter < BrandName.size(); Counter++) {
BrandList[Counter] = BrandName.get(Counter);
}
return BrandList;
} catch (Exception e) {
}
return null;
}
Now apparently the BufferedReader isn't supported by GWT, and I find no way to replace it other than writing all entries into the code, which would result in a maintenance nightmare.
Is there any function I'm not aware of, or is it just impossible?
You need to read this file on the server side of your app, and then pass the results to the client using your preferred server-client communication method. You can read and pass the entire file, if it's small, or read/transfer in chunks if the file is big.
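A rough sketch of the server-side half (the servlet name, URL mapping, and file location are assumptions, not from the question) could look like the following; the client can then fetch the text with com.google.gwt.http.client.RequestBuilder and split the response on newlines:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class BrandListServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain;charset=UTF-8");
        PrintWriter out = resp.getWriter();
        // Assumes Brands.csv is deployed under WEB-INF; adjust the path to your layout.
        try (BufferedReader reader = new BufferedReader(
                new FileReader(getServletContext().getRealPath("/WEB-INF/Brands.csv")))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println(line);
            }
        }
    }
}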
I have the following Lucene code for indexing. When I run this code with 1 million records, it runs fast (indexing in 15 seconds, both locally and on a server with a high configuration).
When I try to index 20 million records, it's taking about 10 minutes to complete the indexing.
I am running these 20 million records on a Linux server with more than 100 GB RAM. Will setting a larger RAM buffer size help in this case? If yes, how much should I set in my case (where I have more than 100 GB RAM)?
I tried the same 20 million records on my local machine (8 GB RAM) and it took the same ten minutes. I tried setting a 1 GB RAM buffer size: the same 10 minutes locally. Without setting any RAM buffer, it was also the same 10 minutes for 20 million records on my local machine.
I tried without setting the RAM buffer size in Linux; it took about 8 minutes for 20 million records.
final File docDir = new File(docsPath.getFile().getAbsolutePath());
LOG.info("Indexing to directory '" + indexPath + "'...");
Directory dir = FSDirectory.open(new File(indexPath.getFile().getAbsolutePath()));
Analyzer analyzer = null;
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47, analyzer);
iwc.setOpenMode(OpenMode.CREATE_OR_APPEND);
iwc.setRAMBufferSizeMB(512.0);
IndexWriter indexWriter = new IndexWriter(dir, iwc);
if (docDir.canRead()) {
if (docDir.isDirectory()) {
String[] files = docDir.list();
if (files != null) {
for (int i = 0; i < files.length; i++) {
File file = new File(docDir, files[i]);
String filePath = file.getPath();
String delimiter = BatchUtil.getProperty("file.delimiter");
if (filePath.indexOf("ecid") != -1) {
indexEcidFile(indexWriter, file, delimiter);
} else if (filePath.indexOf("entity") != -1) {
indexEntityFile(indexWriter, file, delimiter);
}
}
}
}
}
indexWriter.forceMerge(2);
indexWriter.close();
And one of the methods used for indexing:
private void indexEntityFile(IndexWriter writer, File file, String delimiter) {
FileInputStream fis = null;
try {
fis = new FileInputStream(file);
BufferedReader br = new BufferedReader(new InputStreamReader(fis, Charset.forName("UTF-8")));
Document doc = new Document();
Field four_pk_Field = new StringField("four_pk", "", Field.Store.NO);
doc.add(four_pk_Field);
Field cust_grp_cd_Field = new StoredField("cust_grp_cd", "");
Field cust_grp_mbrp_id_Field = new StoredField("cust_grp_mbrp_id", "");
doc.add(cust_grp_cd_Field);
doc.add(cust_grp_mbrp_id_Field);
String line = null;
while ((line = br.readLine()) != null) {
String[] lineTokens = line.split("\\" + delimiter);
four_pk_Field.setStringValue(four_pk);
String cust_grp_cd = lineTokens[4];
cust_grp_cd_Field.setStringValue(cust_grp_cd);
String cust_grp_mbrp_id = lineTokens[5];
cust_grp_mbrp_id_Field.setStringValue(cust_grp_mbrp_id);
writer.addDocument(doc);
}
br.close();
} catch (FileNotFoundException fnfe) {
LOG.error("", fnfe);
} catch (IOException ioe) {
LOG.error("", ioe);
} finally {
try {
fis.close();
} catch (IOException e) {
LOG.error("", e);
}
}
}
Any ideas?
This happens because you try to index all 20 million documents in one commit (and Lucene needs to hold all 20 million docs in memory). What should be done to fix it is to add
writer.commit()
in the indexEntityFile method, every X added documents. X could be 1 million or something like that.
The code could look like this (it just shows the approach; you need to modify it for your needs):
int numberOfDocsInBatch = 0;
...
writer.addDocument(doc);
numberOfDocsInBatch ++;
if (numberOfDocsInBatch == 1_000_000) {
writer.commit();
numberOfDocsInBatch = 0;
}