Suppose I have a list of Book where the number of books can be quite large. I'm logging the ISBNs of those books. I've come up with two approaches; would there be any performance difference, and which approach is considered better?
My concern with 2) is whether the String could become long enough to be a problem. Referring to "How many characters can a Java String have?", it's unlikely to hit the maximum number of characters, but I'm not sure about the point on "Half your maximum heap size", or whether it's actually good practice to construct such a long String.
1) Convert to a list of String
List<Book> books = new ArrayList<>();
books.add(new Book().name("book1").isbn("001"));
books.add(new Book().name("book2").isbn("002"));
if (books != null && books.size() > 0) {
List<String> isbns = books.stream()
.map(Book::getIsbn)
.collect(Collectors.toList());
logger.info("List of isbn = {}", isbns);
} else {
logger.info("Empty list of isbn");
}
2) Use StringBuilder and concatenate into one long String
List<Book> books = new ArrayList<>();
books.add(new Book().name("book1").isbn("001"));
books.add(new Book().name("book2").isbn("002"));
if (books != null && books.size() > 0) {
StringBuilder strB = new StringBuilder();
strB.append("List of isbn: ");
books.stream()
.forEach(book -> {
strB.append(book.getIsbn());
strB.append("; ");
});
logger.info("List of isbn = {}", strB.toString());
} else {
logger.info("Empty list of isbn");
}
... but I'm not sure on the point about "Half your maximum heap size"
The JVM heap has a maximum size set by command line options, or by defaults.
If you fill the heap, your JVM will throw an OutOfMemoryError (OOME), and that will typically cause your application to terminate (or worse!).
When you construct a string using a StringBuilder, the builder uses a roughly exponential resizing strategy: when you fill the builder's buffer, it allocates a new one with double the size. But the old and new buffers need to exist at the same time. So once the buffer has grown beyond roughly a third of the heap, the next time it fills up the StringBuilder will attempt to allocate a new buffer that is larger than the remaining available space, and an OOME will ensue.
Having said that, assembling a single string containing a huge amount of data is going to be bad for performance even if you don't trigger an OOME. A better idea is to write the data via a buffered OutputStream or Writer.
That is awkward if you are outputting the data via a Logger, but I wouldn't try to use a Logger for output of that size, and I certainly wouldn't try to do it with a single Logger.info(...) call.
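If the ISBN dump is genuinely large, a minimal sketch of that buffered-writer idea might look like this. The target file name is an assumption; books, getIsbn() and logger come from the question's own code.

import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Stream each ISBN to a buffered Writer instead of assembling one huge String.
try (BufferedWriter out = Files.newBufferedWriter(Paths.get("isbn-dump.txt"), StandardCharsets.UTF_8)) {
    for (Book book : books) {
        out.write(book.getIsbn());
        out.newLine();
    }
} catch (IOException e) {
    logger.error("Failed to write ISBN list", e);
}

This way only one ISBN at a time lives on the heap outside the writer's fixed-size buffer.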
Related
My application stores a large number (about 700,000) of strings in an ArrayList. The strings are loaded from a text file like this:
List<String> stringList = new ArrayList<String>(750_000);
//there's a try catch here but I omitted it for this example
Scanner fileIn = new Scanner(new FileInputStream(listPath), "UTF-8");
while (fileIn.hasNext()) {
String s = fileIn.nextLine().trim();
if (s.isEmpty()) continue;
if (s.startsWith("#")) continue; //ignore comments
stringList.add(s);
}
fileIn.close();
Later on, other strings are compared against this list, using this code:
String example = "Something";
if (stringList.contains(example))
doSomething();
This comparison will happen many hundreds (thousands?) of times.
This all works, but I want to know if there's anything I can do to make it better. I notice that the JVM increases in size from about 100MB to 600MB when it loads the 700K Strings. The strings are mainly about this size:
Blackened Recordings
Divergent Series: Insurgent
Google
Pixels Movie Money
X Ambassadors
Power Path Pro Advanced
CYRFZQ
Is there anything I can do to reduce the memory, or is that to be expected? Any suggestions in general?
ArrayList is memory-efficient. Your issue is probably caused by java.util.Scanner: Scanner creates a lot of temporary objects during parsing (Patterns, Matchers, etc.) and is not well suited to big files.
Try replacing it with java.io.BufferedReader:
List<String> stringList = new ArrayList<String>();
BufferedReader fileIn = new BufferedReader(new InputStreamReader(new FileInputStream(listPath), "UTF-8"));
String line = null;
while ((line = fileIn.readLine()) != null) {
line = line.trim();
if (line.isEmpty()) continue;
if (line.startsWith("#")) continue; //ignore comments
stringList.add(line);
}
fileIn.close();
See the java.util.Scanner source code.
To pinpoint the memory issue, attach a memory profiler to your JVM, for example VisualVM from the JDK tools.
Added:
Let's make a few assumptions:
you have 700,000 strings with 20 characters each;
an object reference is 32 bits, an object header 24, an array header 16, a char 16, an int 32.
Then every string will consume 24 + 32*2 + 32 + (16 + 20*16) = 456 bits.
The whole ArrayList with its String objects will consume about 700,000 * (32*2 + 456) = 364,000,000 bits ≈ 43.4 MB (very roughly).
Not quite an answer, but:
Your scenario uses around 70 MB on my machine:
long usedMemory = -(Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory());
{//
String[] strings = new String[700_000];
for (int i = 0; i < strings.length; i++) {
strings[i] = new String(new char[20]);
}
}//
usedMemory += Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
System.out.println(usedMemory / 1_000_000d + " mb");
How did you reach 500 MB there? As far as I know, a String internally holds a char[], and each char takes 16 bits. Even taking the Object and String overhead into account, 500 MB is still quite a lot for the strings alone. You may want to run some benchmarks on your machine.
As others have already mentioned, you should change the data structure used for element look-ups/comparisons.
You're likely going to be better off using a HashSet instead of an ArrayList as both add and contains are constant time operations in a HashSet.
However, it does assume that your object's hashCode implementation (which is part of Object, but can be overridden) is evenly distributed.
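For illustration, a sketch of the HashSet-based lookup; the variable names mirror the question, and the initial capacity is just a guess to cut down on rehashing.

import java.util.HashSet;
import java.util.Set;

Set<String> stringSet = new HashSet<>(1_000_000);
// ... fill stringSet exactly the way stringList was filled ...

String example = "Something";
if (stringSet.contains(example)) {  // average O(1), versus O(n) for ArrayList.contains
    doSomething();
}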
There is also the Trie data structure, which can be used as a dictionary; with so many strings, common prefixes occur many times, and a trie stores each prefix only once. https://en.wikipedia.org/wiki/Trie. It seems to fit your case.
UPDATE:
An alternative is a HashSet, or a HashMap from string to something if, for example, you want to count occurrences of strings. A hashed collection will certainly be faster than a list.
I would start with HashSet.
Using an ArrayList is a very bad idea for your use case, because it is not sorted, and hence you cannot efficiently search for an entry.
The best built-in type for your case is a TreeSet<String>. It guarantees O(log n) performance for add() and contains().
Be aware that TreeSet is not thread-safe in its basic implementation. Use a thread-safe wrapper (see the JavaDocs of TreeSet for this).
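For example, a thread-safe wrapper could be obtained like this (a sketch; Collections.synchronizedSortedSet is the wrapper the JavaDocs point to):

import java.util.Collections;
import java.util.SortedSet;
import java.util.TreeSet;

SortedSet<String> strings = Collections.synchronizedSortedSet(new TreeSet<String>());
strings.add("Google");
boolean found = strings.contains("Google");  // O(log n), safe to call from multiple threads
// Note: iterating over the wrapper still requires manual synchronization on it, per the JavaDocs.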
Here is a Java 8 approach. It uses the Files.lines() method, which takes advantage of the Stream API and reads the lines of a file lazily as a Stream.
As a consequence, the full set of String objects is never held in memory at once; each line flows straight through to the terminal operation, the static method MyExecutor.doSomething(String).
/**
* Process lines from a file.
* Uses the Files.lines() method, which takes advantage of the Stream API introduced in Java 8.
*/
private static void processStringsFromFile(final Path file) {
try (Stream<String> lines = Files.lines(file)) {
lines.map(s -> s.trim())
.filter(s -> !s.isEmpty())
.filter(s -> !s.startsWith("#"))
.filter(s -> s.contains("Something"))
.forEach(MyExecutor::doSomething);
} catch (IOException ex) {
logProcessStringsFailed(ex);
}
}
I ran an analysis of memory usage in NetBeans; here are the memory results for an empty implementation of doSomething():
public static void doSomething(final String s) {
}
Live bytes = 6,702,720 ≈ 6.4 MB.
I have solved a simple problem on CodeEval in several ways; the specification can be found here (it is only a few lines long).
I have made 3 working versions (one of them in Scala), and I don't understand the performance difference for my last Java version, which I expected to be the best both time- and memory-wise.
I also compared them to code found on GitHub. Here are the performance stats returned by CodeEval:
Version 1 is the version found on GitHub.
Version 2 is my Scala solution:
object Main extends App {
val p = Pattern.compile("\\d+")
scala.io.Source.fromFile(args(0)).getLines
.filter(!_.isEmpty)
.map(line => {
val dists = new TreeSet[Int]
val m = p.matcher(line)
while (m.find) dists += m.group.toInt
val list = dists.toList
list.zip(0 +: list).map { case (x,y) => x - y }.mkString(",")
})
.foreach(println)
}
Version 3 is my Java solution, which I expected to be the best:
public class Main {
public static void main(String[] args) throws IOException {
Pattern p = Pattern.compile("\\d+");
File file = new File(args[0]);
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
Set<Integer> dists = new TreeSet<Integer>();
Matcher m = p.matcher(line);
while (m.find()) dists.add(Integer.parseInt(m.group()));
Iterator<Integer> it = dists.iterator();
int prev = 0;
StringBuilder sb = new StringBuilder();
while (it.hasNext()) {
int curr = it.next();
sb.append(curr - prev);
sb.append(it.hasNext() ? "," : "");
prev = curr;
}
System.out.println(sb);
}
br.close();
}
}
Version 4 is the same as version 3, except that I don't use a StringBuilder and print the output directly, as in version 1.
Here is how I interpreted those results:
version 1 is too slow because of the high number of System.out.print calls. Moreover, using split on very large lines (which is the case in the tests performed) uses a lot of memory.
version 2 also seems slow, but that is mainly because of an "overhead" of running Scala code on CodeEval; even very efficient code runs slowly on it
version 2 uses unnecessary memory to build a list from the set, which also takes some time but should not be too significant. Writing more efficient Scala would probably end up looking like the Java version, so I preferred elegance over performance
version 3 should not use that much memory in my opinion. The use of a StringBuilder has the same impact on memory as calling mkString in version 2
version 4 shows that the calls to System.out.println are slowing down the program
Does anyone see an explanation for those results?
I conducted some tests.
There is a baseline for every language. I code in Java and JavaScript; for JavaScript, here are my test results:
Rev 1: Default empty boilerplate for JS with a message to standard output
Rev 2: Same without file reading
Rev 3: Just a message to the standard output
You can see that no matter what, there will be at least 200 ms of runtime and about 5 MB of memory usage. This baseline depends on the load of the servers as well! There was a time when CodeEval was heavily overloaded, making it impossible to run anything within the maximum time (10 s).
Check this out, a totally different challenge from the previous one:
Rev 4: My solution
Rev 5: The same code submitted again now. Scored 8000 more ranking points. :D
Conclusion: I would not worry too much about CPU and memory usage and rank. They are clearly not reliable.
Your Scala solution is slow not because of "overhead on CodeEval", but because you are building an immutable TreeSet, adding elements to it one by one. Replacing it with something like
val regex = """\d+""".r // in the beginning, instead of your Pattern.compile
...
.map { line =>
val dists = regex.findAllIn(line).map(_.toInt).toIndexedSeq.sorted
...
Should shave about 30-40% off your execution time.
The same approach (build a list, then sort) will probably also help your memory utilization in "version 3" (Java sets are real memory hogs). It is also a good idea to give your list an initial size while you are at it (otherwise it grows by 50% every time it runs out of capacity, which is wasteful in both memory and performance). 600 sounds like a good number, since that's the upper bound for the number of cities from the problem description.
Now, since we know the upper bound, an even faster and slimmer approach is to do away with lists and boxed Integers, and just use int[] dists = new int[600];.
If you wanted to get really fancy, you'd also make use of the "route length" range that's mentioned in the description. For example, instead of throwing ints into an array and sorting (or keeping a TreeSet), make an array of 20,000 bits (or even 20K bytes for speed) and set the ones you see in the input as you read it. That would be both faster and more memory efficient than any of your solutions.
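A sketch of that last idea using java.util.BitSet. It assumes, like the regex-based versions in the question, that the digit runs on a line are exactly the distances, that 20,000 is a safe upper bound, and that line is the current line from the question's read loop.

import java.util.BitSet;

// Per input line: mark every distance that occurs, then walk the set bits in ascending order.
BitSet seen = new BitSet(20_000);
for (String token : line.split("[^0-9]+")) {
    if (!token.isEmpty()) {
        seen.set(Integer.parseInt(token));
    }
}
StringBuilder sb = new StringBuilder();
int prev = 0;
for (int d = seen.nextSetBit(0); d >= 0; d = seen.nextSetBit(d + 1)) {
    if (sb.length() > 0) sb.append(',');
    sb.append(d - prev);
    prev = d;
}
System.out.println(sb);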
I tried solving this question and figured that you don't need the names of the cities, just the distances in a sorted array.
This gives a much better runtime of 738 ms and a memory score of 4513792.
Although this may not directly improve your code, it seems like a better way to approach the question. Any suggestions for improving the code further are welcome.
import java.io.*;
import java.util.*;
public class Main {
public static void main (String[] args) throws IOException {
File file = new File(args[0]);
BufferedReader buffer = new BufferedReader(new FileReader(file));
String line;
while ((line = buffer.readLine()) != null) {
line = line.trim();
String out = new Main().getDistances(line);
System.out.println(out);
}
}
public String getDistances(String s){
//split the string
String[] arr = s.split(";");
//create an array to hold the distances as integers
int[] distances = new int[arr.length];
for(int i=0; i<arr.length; i++){
//find the index of , - get the characters after that - convert to integer - add to distances array
distances[i] = Integer.parseInt(arr[i].substring(arr[i].lastIndexOf(",")+1));
}
//sort the array
Arrays.sort(distances);
String output = "";
output += distances[0]; //append the distance to the closest city to the string
for(int i=0; i<arr.length-1; i++){
//get distance between current element(city) and next
int distance_between = distances[i+1] - distances[i];
//append the distance to the string
output += "," + distance_between;
}
return output;
}
}
I need to find out when I'm getting really close to an OutOfMemoryError so I can flush results to a file and call runtime.gc(). My code is something like this:
Runtime runtime = Runtime.getRuntime();
...
if ((1.0 * runtime.totalMemory() / runtime.maxMemory()) > 0.9) {
... flush results to file ...
runtime.gc();
}
Is there a better way to do this? Can someone give me a hand please?
EDIT
I understood that I am playing with fire this way, so I settled on a simpler and more robust way of determining when I've had enough. I am currently working with a Jena model, so I do a simple check: if the model has more than 550k statements, I flush, so I don't run any risks.
First: if you want to determine whether you're close to an OutOfMemoryError, all you have to do is compare the current memory with the maximum memory available to the JVM, which is what you already did.
Second: you want to flush results to a file. I am wondering why you only want to do that when you are close to an OutOfMemoryError; you can simply use something like a buffered FileWriter, which flushes its buffer to disk automatically whenever it fills up.
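A sketch of that second point; the file name and the produceResults() helper are placeholders for your own output logic. The writer's internal buffer is written out as it fills, so results never accumulate on the heap.

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

try (BufferedWriter out = new BufferedWriter(new FileWriter("results.txt"), 64 * 1024)) {
    for (String result : produceResults()) {   // produceResults() is a placeholder for your own logic
        out.write(result);
        out.newLine();                         // data reaches the OS whenever the 64 KB buffer fills
    }
} catch (IOException e) {
    e.printStackTrace();                       // handle as appropriate
}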
Third: don't ever call the GC explicitly; it's bad practice. Optimize your JVM memory arguments instead:
-Xmx -> sets the maximum memory that the JVM can allocate
-Xms -> the initial memory that the JVM allocates on startup
-XX:MaxPermSize= -> the maximum Permanent Generation memory
Also
-XX:MaxNewSize= -> should be about 40% of your -Xmx value
-XX:NewSize= -> should be about 40% of your -Xmx value
These will speed up the GC. Also add -XX:+UseConcMarkSweepGC to enable CMS for the old generation.
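Putting it together, a launch command might look like the following; the sizes are purely illustrative, not a recommendation for your workload.

java -Xms512m -Xmx2048m -XX:NewSize=819m -XX:MaxNewSize=819m -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -jar yourapp.jar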
This seems to work:
public class LowMemoryDetector {
// Use a soft reference to some memory - it will be held onto until the GC is nearly out of memory.
private final SoftReference<byte[]> buffer;
// The queue that watches for the buffer to be discarded.
private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
// Have we seen the low condition?
private boolean seenLow = false;
public LowMemoryDetector(int bufferSize) {
// Create the buffer and register the queue that is notified when it is discarded.
buffer = new SoftReference<>(new byte[bufferSize], queue);
}
/**
* Please be sure to create a new LMD after it returns true.
*
* @return true if a memory low condition has been detected.
*/
public boolean low () {
// Preserve the fact that we've seen a low condition.
seenLow |= queue.poll() != null;
return seenLow;
}
}
private static final int OneMeg = 0x100000;
public void test() {
LowMemoryDetector lmd = new LowMemoryDetector(2*OneMeg);
ArrayList<char[]> eatMemory = new ArrayList<>();
int ate = 0;
while ( !lmd.low() ) {
eatMemory.add(new char[OneMeg]);
ate += 1;
}
// Let it go.
eatMemory = null;
System.out.println("Ate "+ate);
}
it prints
Ate 1070
for me.
Use a buffer size larger than the largest allocation unit you are using. It needs to be big enough that any allocation request could be satisfied if the buffer were freed.
Please remember that on a 64-bit JVM you could potentially be running with many TB of memory. This approach would almost certainly run into difficulties in that case.
I have an application which accesses about 2 million tweets from a MySQL database. Specifically, one of the fields holds the tweet text (with a maximum length of 140 characters). I am splitting every tweet into word n-grams, where 1 <= n <= 3. For example, consider the sentence:
I am a boring sentence.
The corresponding nGrams are:
I
I am
I am a
am
am a
am a boring
a
a boring
a boring sentence
boring
boring sentence
sentence
With about 2 million tweets, I am generating a lot of data. Regardless, I am surprised to get a heap error from Java:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2145)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1922)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3118)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2709)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at twittertest.NGramFrequencyCounter.FreqCount(NGramFrequencyCounter.java:49)
at twittertest.Global.main(Global.java:40)
Here is the problem code statement (line 49) as given by the above output from NetBeans:
results = stmt.executeQuery("select * from tweets");
So, if I am running out of memory it must be that it is trying to return all the results at once and then storing them in memory. What is the best way to solve this problem? Specifically I have the following questions:
How can I process pieces of results rather than the whole set?
How would I increase the heap size? (If this is possible)
Feel free to include any suggestions, and let me know if you need more information.
EDIT
Instead of select * from tweets, I partitioned the table into equally sized subsets of about 10% of the total size. Then I tried running the program. It looked like it was working fine, but it eventually gave me the same heap error. This is strange to me because I have run the same program successfully in the past with 610,000 tweets. Now I have about 2,000,000 tweets, roughly 3 times as much data. So if I split the data into thirds it should work, but I went further and split it into subsets of 10%.
Is some memory not being freed? Here is the rest of the code:
results = stmt.executeQuery("select COUNT(*) from tweets");
int num_tweets = 0;
if(results.next())
{
num_tweets = results.getInt(1);
}
int num_intervals = 10; //split into equally sized subsets
int interval_size = num_tweets/num_intervals;
for(int i = 0; i < num_intervals-1; i++) //process 10% of the data at a time
{
results = stmt.executeQuery( String.format("select * from tweets limit %s, %s", i*interval_size, (i+1)*interval_size));
while(results.next()) //for each row in the tweets database
{
tweetID = results.getLong("tweet_id");
curTweet = results.getString("tweet");
int colPos = curTweet.indexOf(":");
curTweet = curTweet.substring(colPos + 1); //trim off the RT and retweeted
if(curTweet != null)
{
curTweet = removeStopWords(curTweet);
}
if(curTweet == null)
{
continue;
}
reader = new StringReader(curTweet);
tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
//tokenizer = new StandardFilter(Version.LUCENE_36, tokenizer);
//Set stopSet = StopFilter.makeStopSet(Version.LUCENE_36, stopWords, true);
//tokenizer = new StopFilter(Version.LUCENE_36, tokenizer, stopSet);
tokenizer = new ShingleFilter(tokenizer, 2, 3);
charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
while(tokenizer.incrementToken()) //insert each nGram from each tweet into the DB
{
insertNGram.setInt(1, nGramID++);
insertNGram.setString(2, charTermAttribute.toString().toString());
insertNGram.setLong(3, tweetID);
insertNGram.executeUpdate();
}
}
}
Don't fetch all rows from the table. Try to select partial data based on your requirements by setting limits on the query. Since you are using a MySQL database, your query would be select * from tweets limit 0,10. Here 0 is the starting row offset and 10 means 10 rows from that offset.
You can always increase the heap size available to your JVM using the -Xmx argument. You should read up on all the knobs available to you (e.g. perm gen size). Google for other options or read this SO answer.
You probably can't do this kind of problem with a 32-bit machine. You'll want 64 bits and lots of RAM.
Another option would be to treat it as a map-reduce problem. Solve it on a cluster using Hadoop and Mahout.
Have you considered streaming the result set? Halfway down the linked page is a section on ResultSet, and it addresses your problem (I think?). Write the n-grams to a file, then process the next row? Or am I misunderstanding your problem?
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
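With MySQL Connector/J, row-by-row streaming is enabled by creating a forward-only, read-only Statement and setting the fetch size to Integer.MIN_VALUE, as described in the implementation notes linked above. A sketch, where conn is your existing Connection and the column names come from the question's code:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);            // Connector/J then streams rows one at a time
ResultSet results = stmt.executeQuery("select tweet_id, tweet from tweets");
while (results.next()) {
    long tweetID = results.getLong("tweet_id");
    String curTweet = results.getString("tweet");
    // build and insert the n-grams for this tweet, then let the row go;
    // only the current row is held in memory
}
results.close();
stmt.close();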
In some part of my application, I am parsing a 17MB log file into a list structure - one LogEntry per line. There are approximately 100K lines/log entries, meaning approx. 170 bytes per line. What surprised me is that I run out of heap space, even when I specify 128MB (256MB seems sufficient). How can 10MB of text turned into a list of objects cause a tenfold increase in space?
I understand that String objects use at least twice the amount of space compared to ANSI text (Unicode, one char=2 bytes), but this consumes at least four times that.
What I am looking for is an approximation for how much an ArrayList of n LogEntries will consume, or how my method might create extraneous objects that aggravate the situation (see comment below on String.trim())
This is the data part of my LogEntry class
public class LogEntry {
private Long id;
private String system, version, environment, hostName, userId, clientIP, wsdlName, methodName;
private Date timestamp;
private Long milliSeconds;
private Map<String, String> otherProperties;
This is the part doing the reading
public List<LogEntry> readLogEntriesFromFile(File f) throws LogImporterException {
CSVReader reader;
final String ISO_8601_DATE_PATTERN = "yyyy-MM-dd HH:mm:ss,SSS";
List<LogEntry> logEntries = new ArrayList<LogEntry>();
String[] tmp;
try {
int lineNumber = 0;
final char DELIM = ';';
reader = new CSVReader(new InputStreamReader(new FileInputStream(f)), DELIM);
while ((tmp = reader.readNext()) != null) {
lineNumber++;
if (tmp.length < LogEntry.getRequiredNumberOfAttributes()) {
String tmpString = concat(tmp);
if (tmpString.trim().isEmpty()) {
logger.debug("Empty string");
} else {
logger.error(String.format(
"Invalid log format in %s:L%s. Not enough attributes (%d/%d). Was %s . Continuing ...",
f.getAbsolutePath(), lineNumber, tmp.length, LogEntry.getRequiredNumberOfAttributes(), tmpString)
);
}
continue;
}
List<String> values = new ArrayList<String>(Arrays.asList(tmp));
String system, version, environment, hostName, userId, wsdlName, methodName;
Date timestamp;
Long milliSeconds;
Map<String, String> otherProperties;
system = values.remove(0);
version = values.remove(0);
environment = values.remove(0);
hostName = values.remove(0);
userId = values.remove(0);
String clientIP = values.remove(0);
wsdlName = cleanLogString(values.remove(0));
methodName = cleanLogString(stripNormalPrefixes(values.remove(0)));
timestamp = new SimpleDateFormat(ISO_8601_DATE_PATTERN).parse(values.remove(0));
milliSeconds = Long.parseLong(values.remove(0));
/* remaining properties are the key-value pairs */
otherProperties = parseOtherProperties(values);
logEntries.add(new LogEntry(system, version, environment, hostName, userId, clientIP,
wsdlName, methodName, timestamp, milliSeconds, otherProperties));
}
reader.close();
} catch (IOException e) {
throw new LogImporterException("Error reading log file: " + e.getMessage());
} catch (ParseException e) {
throw new LogImporterException("Error parsing logfile: " + e.getMessage(), e);
}
return logEntries;
}
Utility function used for populating the map
private Map<String, String> parseOtherProperties(List<String> values) throws ParseException {
HashMap<String, String> map = new HashMap<String, String>();
String[] tmp;
for (String s : values) {
if (s.trim().isEmpty()) {
continue;
}
tmp = s.split(":");
if (tmp.length != 2) {
throw new ParseException("Could not split string into key:value :\"" + s + "\"", s.length());
}
map.put(tmp[0], tmp[1]);
}
return map;
}
You also have a Map there, where you store other properties. Your code doesn't show how this Map is populated, but keep in mind that Maps may have a hefty memory overhead compared to the memory needed for the entries themselves.
The size of the array that backs the Map (at least 16 entries * 4 bytes) + one key/value pair per entry + the size of the data themselves. Two map entries, each using 10 chars for the key and 10 chars for the value, would consume 16*4 + 2*2*4 + 2*10*2 + 2*10*2 + 2*2*8 = 64 + 16 + 40 + 40 + 32 = 192 bytes (1 char = 2 bytes, a String object consumes at least 8 bytes). That alone would almost double the space requirements for the entire log string.
Add to this that the LogEntry contains 12 Objects, i.e. at least 96 bytes. Hence the log objects alone would need around 100 bytes, give or take some, without the Map and without actual string data. Plus all the pointers for the references (4B each). I count at least 18 with the Map, meaning 72 bytes.
Adding the data (excluding the object references and object "headers" mentioned in the last paragraph):
2 longs = 16 B, 1 date stored as a long = 8 B, the map = 192 B. In addition comes the string content, say 90 chars = 180 bytes. Perhaps a byte or two at each end of the list item when put in the list, so in total somewhere around 100 + 72 + 16 + 8 + 192 + 180 = 568 ≈ 600 bytes per log line.
So around 600 bytes per log line means 100K lines would consume around 60 MB minimum. That would place it at least in the same order of magnitude as the heap size that was set aside. In addition, tmpString.trim() in a loop might be creating copies of strings, and String.format() may also be creating copies. The rest of the application must also fit within this heap space, which might explain where the rest of the memory is going.
Don't forget that each String object consumes space (24 bytes?) for the actual Object definition, plus the reference to the char array, the offset (for substring() usage), etc. So representing a line as 'n' strings adds that additional storage requirement. Can you lazily evaluate these within your LogEntry class instead?
(Re. the String offset usage: prior to Java 7u6, String.substring() acts as a window onto an existing char array, and consequently you need an offset. This has recently changed, and it may be worth determining whether a later JDK build is more memory efficient.)
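A hedged sketch of what lazy evaluation could look like: keep only the raw line per entry and split it the first time a field is asked for. The class name, field positions, and the ';' delimiter are assumptions taken from the question's parsing code, and a plain split(";") is a simplification of the CSVReader parsing.

public class LazyLogEntry {
    private final String rawLine;   // one String per entry instead of ~12 objects
    private String[] fields;        // parsed only on first access

    public LazyLogEntry(String rawLine) {
        this.rawLine = rawLine;
    }

    private String field(int index) {
        if (fields == null) {
            fields = rawLine.split(";");   // same DELIM as in readLogEntriesFromFile()
        }
        return fields[index];
    }

    public String getSystem()   { return field(0); }
    public String getHostName() { return field(3); }
    // ... the remaining accessors follow the same pattern
}

Whether this pays off depends on how many entries are actually inspected after loading; if most of them are, the savings largely disappear.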