why "while(Scanner.hasNext())" causes OutOfMemoryError in java?


import java.util.*;
import java.io.*;
public class Main {
public static void problem1 () {
Scanner scanner = new Scanner(System.in);
while (scanner.hasNext()) {
int n = scanner.nextInt();
int[][] nums = new int[n][2];
for (int i = 0; i < n; i++) {
nums[i][0] = scanner.nextInt();
nums[i][1] = scanner.nextInt();
}
Arrays.sort(nums, (a, b) -> {
return a[0] - b[0];
});
int[] dp = new int[n];
Arrays.fill(dp, 1);
int res = 1;
for (int i = 1; i < n; i++) {
for (int j = 0; j < i; j++) {
if (nums[i][1] >= nums[j][1]) {
dp[i] = Math.max(dp[i], dp[j] + 1);
}
}
if (dp[i] > res)
res = dp[i];
}
System.out.println(res);
}
}
public static void main(String[] args) throws IOException {
problem1();
}
}
While writing the code above, I found that while (scanner.hasNext()) causes "OutOfMemoryError: Java heap space" when the input data is larger than 1000000. The bug can be solved by removing the while loop, but with my limited experience of the JVM I don't know why. Any ideas?

We really need to see the input to be sure why this is failing. But if you are getting an OOME in hasNext(), what is happening is that your application's input includes a token that is monstrously long.
The hasNext() call is going to read ahead on the input stream from the current position until it encounters a character (or EOF) that indicates the end of the next token. The characters that are read ahead need to be buffered in memory. The OOME means that you have managed to fill up the heap while buffering characters.
A possible "quick and dirty" workaround would be to make the heap size big enough to buffer the entire input. But I don't think that will work.
Why?
Suppose you manage to buffer a monstrously big token so that hasNext() can return true. The next thing that you do is to call nextInt() to read a value for n. But that is most likely going to fail, because either the token (found by hasNext()) is not a number, or is too large a number to be returned as an int. So nextInt() will throw an exception.
The real way to solve this is to figure out what the monstrous token actually is. That entails looking at the input that your application is reading.
"And the bug can be solved by removing while loop"
Hmmm.
I read that as meaning that the OOME goes away if you remove the outer loop. You haven't actually solved the problem. Now your code will only process a single dataset.

You should do scanner.hasNextInt() before calling scanner.nextInt(). Not sure why you get OOM but this may help.
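As a minimal sketch of that suggestion (the class name DatasetReader and the method countDatasets are hypothetical, and the input layout "n a1 b1 ... an bn" is assumed from the question's code), every nextInt() call can be guarded with hasNextInt(). This protects against malformed or truncated input; it does not by itself limit how much memory the Scanner uses to buffer a single very long token.

```java
import java.util.Scanner;

public class DatasetReader {
    /**
     * Reads datasets of the assumed form "n a1 b1 ... an bn" and returns
     * how many complete datasets were consumed.
     */
    static int countDatasets(Scanner scanner) {
        int datasets = 0;
        // hasNextInt() is false at end of input and when the next token is not an int
        while (scanner.hasNextInt()) {
            int n = scanner.nextInt();
            boolean complete = true;
            for (int i = 0; i < 2 * n; i++) {
                if (!scanner.hasNextInt()) {
                    complete = false; // truncated or malformed dataset
                    break;
                }
                scanner.nextInt();
            }
            if (!complete) {
                break;
            }
            datasets++;
        }
        return datasets;
    }

    public static void main(String[] args) {
        System.out.println(countDatasets(new Scanner("2 1 2 3 4 1 5 6"))); // prints 2
        System.out.println(countDatasets(new Scanner("1 7 8 oops")));      // prints 1
    }
}
```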

Related

Java Initialization Error

double pullPrice(String input){
if(input.length() < 3){
System.out.println("Error: 02; invalid item input, valid example: enter code here'milk 8.50'");
System.exit(0);
}
char[] inputArray = input.toCharArray();
char[] itemPriceArray;
double price;
boolean numVal = false;
int numCount = 0;
for(int i = 0; i <= inputArray.length-1; i ++){
//checking if i need to add char to char array of price
if(numVal == true){
//adding number to price array
itemPriceArray[numCount] = inputArray[i];
numCount++;
}
else{
if(inputArray[i] == ' '){
numVal = true;
//initializing price array
itemPriceArray = new char[inputArray.length - i];
}
else{
}
}
}
price = Double.parseDouble(String.valueOf(itemPriceArray));
return price;
}
Problem: attempting to pull the sequence of chars after white space between 'milk 8.50' as input. Initialization error occurs because I am initializing char array inside an if else statement that will initialize the array if it finds whitespace.
Question: since I don't know my char count number until I find a whitespace is there another way I can initialize? Does the compiler not trust me that I will initialize before calling array.
Also, if I am missing something or there are better ways to code any of this please let me know. I am in a java data structures class and learning fundamental data structures but would also like to focus on efficiency and modularity at the same time. I also have a pullPrice function that does the same thing but pulls the item name. I would like to combine these so i don't have to reuse the same code for both but can only return items with same datatype unless I create a class. Unfortunately this exercise is to use two arrays since we are practicing how to use ADT bags.
Any help is greatly appreciated?

Try something like this:
double pullPrice(String input)
{
    // try-with-resources closes the scanner even if parsing fails
    try (Scanner scanner = new Scanner(input))
    {
        // We skip the product (e.g. "milk")
        scanner.next();
        // and read the price (e.g. 8.5)
        return scanner.nextDouble();
    }
    catch (NoSuchElementException ex)
    {
        System.out.println("Error: 02; invalid item input, valid example: 'milk 8.50'");
        System.exit(0);
        // Never reached, but the compiler requires a return on this path
        return Double.NaN;
    }
}

If you are sure that your program will only get proper input data, then just initialize your array with null:
char[] itemPriceArray = null;
The main reason the compiler complains: what would happen if your program accessed an uninitialized variable (for instance, with wrong input data)? The Java compiler rules out this kind of situation completely.
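Another way to satisfy the compiler, sketched here under the assumption that the input always looks like 'milk 8.50', is to find the space first and reject bad input before the array is declared. The array is then assigned on every path that reaches its use. The class name PriceParser is hypothetical.

```java
public class PriceParser {
    static double pullPrice(String input) {
        int space = input.indexOf(' ');
        if (space < 0) {
            // No whitespace: fail before the array would ever be needed
            throw new IllegalArgumentException("expected input like 'milk 8.50'");
        }
        // Assigned unconditionally here, so the compiler can prove it is initialized
        char[] itemPriceArray = input.substring(space + 1).toCharArray();
        return Double.parseDouble(String.valueOf(itemPriceArray).trim());
    }

    public static void main(String[] args) {
        System.out.println(pullPrice("milk 8.50")); // prints 8.5
    }
}
```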

I will add to the other answers:
You can't change the size of an array once it is created. You either have to allocate it bigger than you think you'll need or accept the overhead of reallocating it when it needs to grow. When it does, you'll have to allocate a new one and copy the data from the old to the new:
int oldItems[] = new int[10];
for (int i=0; i<10; i++) {
oldItems[i] = i+10;
}
int newItems[] = new int[20];
System.arraycopy(oldItems, 0, newItems, 0, 10);
oldItems = newItems;
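The same allocate-and-copy step can be written with java.util.Arrays.copyOf, which creates the larger array and copies the old contents in one call. The sketch below doubles the capacity whenever the array is full; the class name GrowArray and the method append are hypothetical.

```java
import java.util.Arrays;

public class GrowArray {
    /**
     * Stores value at index size, growing the array first if it is full.
     * Returns the (possibly new) array; the caller tracks size separately.
     */
    static int[] append(int[] items, int size, int value) {
        if (size == items.length) {
            // copyOf allocates the bigger array and copies the existing elements
            items = Arrays.copyOf(items, Math.max(1, items.length * 2));
        }
        items[size] = value;
        return items;
    }

    public static void main(String[] args) {
        int[] items = new int[2];
        for (int i = 0; i < 3; i++) {
            items = append(items, i, i + 10);
        }
        System.out.println(Arrays.toString(items)); // prints [10, 11, 12, 0]
    }
}
```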

char[] itemPriceArray = new char[inputArray.length];

For Loop is performing slow

Please have a look at the following code
//Devide the has into set of 3 pieces
private void devideHash(String str)
{
int lastIndex = 0;
for(int i=0;i<=str.length();i=i+3)
{
lastIndex = i;
try
{
String stringPiece = str.substring(i, i+3);
// pw.println(stringPiece);
hashSet.add(stringPiece);
}
catch(Exception arr)
{
String stringPiece = str.substring(lastIndex, str.length());
// pw.println(stringPiece);
hashSet.add(stringPiece);
}
}
}
The above method receives a String like abcdefghijklmnop as the parameter. Its job is to divide this into sets of 3 letters, so when the operation is completed, the hashset will have pieces like abc def ghi jkl mno p
But the problem is that if the input String is big, this loop takes a noticeable amount of time to complete. Is there any way I can speed this process up?

As an option, you could replace all your code with this line:
private void divideHash(String str) {
hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
}
Which will perform well.
Here's some test code:
String str = "abcdefghijklmnop";
hashSet.addAll(Arrays.asList(str.split("(?<=\\G...)")));
System.out.println(hashSet);
Output:
[jkl, abc, ghi, def, mno, p]

There is nothing we can really tell unless you tell us how long the "noticeable amount of time" is and what time you expect. I recommend running a profiler to find which logic takes the most time.
Some recommendations from briefly reading your code:
If the resulting Set is going to be huge, the HashSet will resize and rehash many times as it grows. Allocate the required capacity up front, e.g.
HashSet<String> hashSet = new HashSet<String>(input.length() / 3 + 1, 1.0f);
This saves a lot of time otherwise spent on unnecessary rehashing.
Never use exceptions to control your program flow.
Why not simply do:
for (int i = 0; i < input.length(); i += 3) {
    if (i + 3 > input.length()) {
        // substring from i to end
    } else {
        // substring from i to i+3
    }
}
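Putting both recommendations together, a minimal runnable sketch could look like the following. The class name HashChunker is hypothetical, and the method returns the set instead of filling the hashSet field from the question so that it is self-contained.

```java
import java.util.HashSet;
import java.util.Set;

public class HashChunker {
    /** Splits str into pieces of up to 3 characters without relying on exceptions. */
    static Set<String> divideHash(String str) {
        // Pre-size the set so it does not need to rehash while it fills up
        Set<String> pieces = new HashSet<>(str.length() / 3 + 2, 1.0f);
        for (int i = 0; i < str.length(); i += 3) {
            // Clamp the end index so the last, shorter piece is handled in the same loop
            int end = Math.min(i + 3, str.length());
            pieces.add(str.substring(i, end));
        }
        return pieces;
    }

    public static void main(String[] args) {
        // HashSet iteration order is unspecified, so the pieces may print in any order
        System.out.println(divideHash("abcdefghijklmnop"));
    }
}
```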

For Loop Not Terminating

I'm trying to get back into Java - it's been about 5 years since I studied the basics and I've been lost in the .Net world since.
I'm trying to create a student class below, however the for loop for reading in the integers into the array gets stuck when the program runs.
From my previous knowledge, and from research, the loop seems to be constructed properly and I can't seem to figure out where it's going wrong.
I'm sure it's something silly - as always but I was wondering if someone could point me in the right direction? :)
import java.util.*;
import acm.io.*;
public class Student {
// instance variables
private int studNumber; //Must be between (and including) 0 and 99999999. If input value invalid default to 0.
private String studName;
private int marks[];
/*
* Constructor Student Class
*/
public Student(int studNumber, String StudName, int marks[]) {
// initialise instance variables
if (studNumber >=0 && studNumber<= 99999999) {
this.studNumber= studNumber;
} else {
this.studNumber = 0; //default value
}
this.studName= StudName; // no validation
this.marks = marks;
IOConsole console = new IOConsole();
for (int i = 0; i <= 6; i++) {
marks[i] = console.readInt();
}
}
}

I think that the problem lies here:
for (int i = 0; i <= 6; i++)
{
marks[i] = console.readInt();
}
The only instance where I found a reference to IOConsole was here and it does not seem to be something which is part of the standard Java framework.
If you just need to read numbers from the console, you can use the Scanner class and its nextInt() method, like below:
Scanner input = new Scanner(System.in);
for (int i = 0; i <= 6; i++)
{
marks[i] = input.nextInt();
}

The loop seems correct. Is it possible that the console.readInt() call is blocking, which keeps you stuck in the loop? (The IOConsole class is not part of the standard JDK, and I am not familiar with it.)

readInt() is waiting for user input
from http://jtf.acm.org/javadoc/student/acm/io/IOConsole.html#readInt%28%29:
Reads and returns an integer value from the user

The problem is with console.readInt(): either another endless loop is running inside it, or there is some other problem with that method.

I believe the problem lies in the readInt() part. It's unusual to read input from the console in a constructor to initialize the attributes; delegate that task to another part of your code and move it outside the constructor.
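One way to separate the two concerns, shown here as a sketch rather than the asker's intended design, is to read the marks in a helper and pass the finished array to the constructor. The helper readMarks and the getters are hypothetical names, and java.util.Scanner stands in for the ACM IOConsole class.

```java
import java.util.Scanner;

public class Student {
    private final int studNumber;
    private final String studName;
    private final int[] marks;

    /** The constructor only validates and stores values; it performs no I/O. */
    public Student(int studNumber, String studName, int[] marks) {
        // Must be between 0 and 99999999 inclusive; otherwise default to 0
        this.studNumber = (studNumber >= 0 && studNumber <= 99999999) ? studNumber : 0;
        this.studName = studName;
        this.marks = marks.clone();
    }

    /** Reads count integers; this blocks until the user has typed all of them. */
    static int[] readMarks(Scanner in, int count) {
        int[] marks = new int[count];
        for (int i = 0; i < count; i++) {
            marks[i] = in.nextInt();
        }
        return marks;
    }

    public int getStudNumber() { return studNumber; }
    public int[] getMarks() { return marks.clone(); }

    public static void main(String[] args) {
        // A fixed string replaces System.in so the example runs without user input
        int[] marks = readMarks(new Scanner("50 60 70 80 90 55 65"), 7);
        Student s = new Student(123456789, "Ann", marks);
        System.out.println(s.getStudNumber());    // prints 0 (number out of range)
        System.out.println(s.getMarks().length);  // prints 7
    }
}
```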

Very simple code for number search gives me infinite loop

I am a newbie Computer Science high school student and I have trouble with a small snippet of code. Basically, my code should perform a basic CLI search in an array of integers. However, what happens is I get what appears to be an infinite loop (BlueJ, the compiler I'm using, gets stuck and I have to reset the machine). I have set break points but I still don't quite get the problem...(I don't even understand most of the things that it tells me)
Here's the offending code (assume that "ArrayUtil" works, because it does):
import java.util.Scanner;
public class intSearch
{
public static void main(String[] args)
{
search();
}
public static void search()
{
int[] randomArray = ArrayUtil.randomIntArray(20, 100);
Scanner searchInput = new Scanner(System.in);
int searchInt = searchInput.nextInt();
if (findNumber(randomArray, searchInt) == -1)
{
System.out.println("Error");
}else System.out.println("Searched Number: " + findNumber(randomArray, searchInt));
}
private static int findNumber(int[] searchedArray, int searchTerm)
{
for (int i = 0; searchedArray[i] == searchTerm && i < searchedArray.length; i++)
{
return i;
}
return -1;
}
}
This has been bugging me for some time now...please help me identify the problem!

I don't know about the infinite loop but the following code is not going to work as you intended. The i++ can never be reached so i will always have the value 0.
for (int i = 0; searchedArray[i] == searchTerm && i < searchedArray.length; i++)
{
return i;
}
return -1;
You probably mean this:
for (int i = 0; i < searchedArray.length; i++)
{
if (searchedArray[i] == searchTerm)
{
return i;
}
}
return -1;

I don't know what the ArrayUtil class is (I cannot import it in NetBeans). When I replace that line with int[] randomArray = {1, 2, 3, 5, 7, 10, 1, 5}; it works perfectly.
You should also change the loop condition. I won't tell you why, but try it with my array and you will soon see the bug. Once you see it, you can fix it :)

There are 4 basic issues here.
1. Putting searchedArray[i] == searchTerm before i < searchedArray.length can result in an out-of-bounds exception. You must always prevent that kind of code.
2. Your intention seems to be the opposite of your code. Your method name implies finding a search term. But, your code implies that you want to continue your loop scan until the search term is not found, although your loop won't do that either. Think of "for (; this ;) { that } " as "while this do that".
3. Place a break point at the beginning of "search". Then, with a small array, step through the code line by line with the debugger and watch the variables. They don't lie. They will tell you exactly what's happening.
4. Please use a standard IDE and compiler, such as Eclipse and Sun's JDK 6 or 7. Eclipse with JDK 7 is a serious combination that doesn't exhibit a strange "infinite loop" as you describe above.

Efficient way to search a stream for a string

Let's suppose that I have a stream of text (or a Reader in Java) that I'd like to check for a particular string. The stream of text might be very large, so as soon as the search string is found I'd like to return true, and I'd also like to avoid storing the entire input in memory.
Naively, I might try to do something like this (in Java):
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
while((numCharsRead = reader.read(buffer)) > 0) {
if ((new String(buffer, 0, numCharsRead)).indexOf(searchString) >= 0)
return true;
}
return false;
}
Of course this fails to detect the given search string if it occurs on the boundary of the 1k buffer:
Search text: "stackoverflow"
Stream buffer 1: "abc.........stack"
Stream buffer 2: "overflow.......xyz"
How can I modify this code so that it correctly finds the given search string across the boundary of the buffer but without loading the entire stream into memory?
Edit: Note when searching a stream for a string, we're trying to minimise the number of reads from the stream (to avoid latency in a network/disk) and to keep memory usage constant regardless of the amount of data in the stream. Actual efficiency of the string matching algorithm is secondary but obviously, it would be nice to find a solution that used one of the more efficient of those algorithms.

There are three good solutions here:
If you want something that is easy and reasonably fast, go with no buffer, and instead implement a simple nondeterministic finite-state machine. Your state will be a list of indices into the string you are searching for, and your logic looks something like this (pseudocode):
String needle;
n = needle.length();
for every input character c do
add index 0 to the list
for every index i in the list do
if c == needle[i] then
if i + 1 == n then
return true
else
replace i in the list with i + 1
end
else
remove i from the list
end
end
end
This will find the string if it exists and you will never need a buffer.
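The pseudocode above can be sketched in Java as follows. This is one possible rendering, not the answerer's own code: the class name NfaStreamSearch is hypothetical, and a fresh list is built for each character instead of editing the list in place. Wrapping the Reader in a BufferedReader would reduce the cost of the single-character read() calls.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class NfaStreamSearch {
    /** Returns true as soon as needle has been seen in the stream. */
    static boolean contains(Reader reader, String needle) throws IOException {
        if (needle.isEmpty()) {
            return true;
        }
        // Each entry is the index in needle that a partial match expects next
        List<Integer> active = new ArrayList<>();
        int c;
        while ((c = reader.read()) != -1) {
            active.add(0); // a new match may start at every character
            List<Integer> next = new ArrayList<>();
            for (int i : active) {
                if (needle.charAt(i) == (char) c) {
                    if (i + 1 == needle.length()) {
                        return true;
                    }
                    next.add(i + 1); // this partial match advances
                }
                // on a mismatch the index is simply dropped
            }
            active = next;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(contains(new StringReader("aaab"), "aab"));    // prints true
        System.out.println(contains(new StringReader("abcabd"), "abce")); // prints false
    }
}
```

The list never holds more than needle.length() entries, so memory use depends only on the search string, not on the stream.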
Slightly more work but also faster: do an NFA-to-DFA conversion that figures out in advance what lists of indices are possible, and assign each one to a small integer. (If you read about string search on Wikipedia, this is called the powerset construction.) Then you have a single state and you make a state-to-state transition on each incoming character. The NFA you want is just the DFA for the string preceded with a state that nondeterministically either drops a character or tries to consume the current character. You'll want an explicit error state as well.
If you want something faster, create a buffer whose size is at least twice n, and use Boyer-Moore to compile a state machine from needle. You'll have a lot of extra hassle because Boyer-Moore is not trivial to implement (although you'll find code online) and because you'll have to arrange to slide the string through the buffer. You'll have to build or find a circular buffer that can 'slide' without copying; otherwise you're likely to give back any performance gains you might get from Boyer-Moore.

I made a few changes to the Knuth-Morris-Pratt algorithm to support partial searches. Since the actual comparison position is always less than or equal to the next one, there is no need for extra memory. The code with a Makefile is also available on GitHub, and it is written in Haxe to target multiple programming languages at once, including Java.
I also wrote a related article: searching for substrings in streams: a slight modification of the Knuth-Morris-Pratt algorithm in Haxe. The article mentions the Jakarta RegExp, now retired and resting in the Apache Attic. The Jakarta Regexp library “match” method in the RE class uses a CharacterIterator as a parameter.
class StreamOrientedKnuthMorrisPratt {
var m: Int;
var i: Int;
var ss: String;
var table: Array<Int>;
public function new(ss: String) {
this.ss = ss;
this.buildTable(this.ss);
}
public function begin() : Void {
this.m = 0;
this.i = 0;
}
public function partialSearch(s: String) : Int {
var offset = this.m + this.i;
while(this.m + this.i - offset < s.length) {
if(this.ss.substr(this.i, 1) == s.substr(this.m + this.i - offset,1)) {
if(this.i == this.ss.length - 1) {
return this.m;
}
this.i += 1;
} else {
this.m += this.i - this.table[this.i];
if(this.table[this.i] > -1)
this.i = this.table[this.i];
else
this.i = 0;
}
}
return -1;
}
private function buildTable(ss: String) : Void {
var pos = 2;
var cnd = 0;
this.table = new Array<Int>();
if(ss.length > 2)
this.table.insert(ss.length, 0);
else
this.table.insert(2, 0);
this.table[0] = -1;
this.table[1] = 0;
while(pos < ss.length) {
if(ss.substr(pos-1,1) == ss.substr(cnd, 1))
{
cnd += 1;
this.table[pos] = cnd;
pos += 1;
} else if(cnd > 0) {
cnd = this.table[cnd];
} else {
this.table[pos] = 0;
pos += 1;
}
}
}
public static function main() {
var KMP = new StreamOrientedKnuthMorrisPratt("aa");
KMP.begin();
trace(KMP.partialSearch("ccaabb"));
KMP.begin();
trace(KMP.partialSearch("ccarbb"));
trace(KMP.partialSearch("fgaabb"));
}
}

The Knuth-Morris-Pratt search algorithm never backs up; this is just the property you want for your stream search. I've used it before for this problem, though there may be easier ways using available Java libraries. (When this came up for me I was working in C in the 90s.)
KMP in essence is a fast way to build a string-matching DFA, like Norman Ramsey's suggestion #2.
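For readers who want this in Java, here is a minimal sketch of a streaming Knuth-Morris-Pratt search. It is not taken from any of the answers: the class name KmpStreamSearch, the helper buildFailureTable, and the bufferSize parameter are assumptions made for illustration. The only state carried between reads is the number of needle characters matched so far, so a match that straddles two buffers is found without re-reading anything.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class KmpStreamSearch {
    /** failure[i] = length of the longest proper prefix of needle[0..i] that is also its suffix. */
    static int[] buildFailureTable(String needle) {
        int[] failure = new int[needle.length()];
        int k = 0;
        for (int i = 1; i < needle.length(); i++) {
            while (k > 0 && needle.charAt(i) != needle.charAt(k)) {
                k = failure[k - 1];
            }
            if (needle.charAt(i) == needle.charAt(k)) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    static boolean contains(Reader reader, String needle, int bufferSize) throws IOException {
        if (needle.isEmpty()) {
            return true;
        }
        int[] failure = buildFailureTable(needle);
        char[] buffer = new char[bufferSize];
        int matched = 0; // survives across reads, so buffer boundaries do not matter
        int n;
        while ((n = reader.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buffer[i];
                // On a mismatch, fall back to the longest prefix that can still match
                while (matched > 0 && c != needle.charAt(matched)) {
                    matched = failure[matched - 1];
                }
                if (c == needle.charAt(matched)) {
                    matched++;
                }
                if (matched == needle.length()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // A tiny buffer forces the match to span several reads
        System.out.println(contains(new StringReader("xxstackoverflowxx"), "stackoverflow", 4)); // prints true
        System.out.println(contains(new StringReader("aaab"), "aab", 1024));                     // prints true
        System.out.println(contains(new StringReader("stackoverflo"), "stackoverflow", 4));      // prints false
    }
}
```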

This answer applied to the initial version of the question where the key was to read the stream only as far as necessary to match on a String, if that String was present. This solution would not meet the requirement to guarantee fixed memory utilisation, but may be worth considering if you have found this question and are not bound by that constraint.
If you are bound by the constant memory usage constraint, Java stores arrays of any type on the heap, and as such nulling the reference does not deallocate memory in any way; I think any solution involving arrays in a loop will consume memory on the heap and require GC.
For a simple implementation, Java 5's Scanner, which can wrap a Reader or an InputStream and search the input with a java.util.regex.Pattern, might save you from worrying about the implementation details.
Here's an example of a potential implementation:
public boolean streamContainsString(Reader reader, String searchString)
        throws IOException {
    Scanner streamScanner = new Scanner(reader);
    // Pattern.quote makes the search string match literally rather than as a regex;
    // a horizon of 0 means the search is not bounded
    return streamScanner.findWithinHorizon(Pattern.quote(searchString), 0) != null;
}
I'm thinking regex because it sounds like a job for a Finite State Automaton, something that starts in an initial state, changing state character by character until it either rejects the string (no match) or gets to an accept state.
I think this is probably the most efficient matching logic you could use, and how you organize the reading of the information can be divorced from the matching logic for performance tuning.
It's also how regexes work.

Instead of having your buffer be an array, use an abstraction that implements a circular buffer. Your index calculation will be buf[(next+i) % buf.length], and you'll have to be careful to fill the buffer one half at a time. But as long as the search string fits in half the buffer, you'll find it.

I believe the best solution to this problem is to keep it simple. Remember, because I'm reading from a stream, I want to keep the number of reads to a minimum (as network or disk latency may be an issue) while keeping the amount of memory used constant (as the stream may be very large). Actual efficiency of the string matching is not the number one goal (as that has been studied to death already).
Based on AlbertoPL's suggestion, here's a simple solution that compares the buffer against the search string character by character. Because the search is done one character at a time, no circular buffer or buffer of a particular size is needed. One caveat: resetting the counter to zero on a mismatch can miss a match when part of the search string repeats at its start (for example, searching for "aab" in "aaab").
Now, if someone can come up with a similar implementation based on the Knuth-Morris-Pratt search algorithm, then we'd have a nice efficient solution that also handles that case ;)
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
char[] buffer = new char[1024];
int numCharsRead;
int count = 0;
while((numCharsRead = reader.read(buffer)) > 0) {
for (int c = 0; c < numCharsRead; c++) {
if (buffer[c] == searchString.charAt(count))
count++;
else
count = 0;
if (count == searchString.length()) return true;
}
}
return false;
}

If you're not tied to using a Reader, then you can use Java's NIO API to efficiently load the file. For example (untested, but should be close to working):
public boolean streamContainsString(File input, String searchString) throws IOException {
    Pattern pattern = Pattern.compile(Pattern.quote(searchString));
    // try-with-resources closes the stream and its channel
    try (FileInputStream fis = new FileInputStream(input);
         FileChannel fc = fis.getChannel()) {
        // a single mapping is limited to Integer.MAX_VALUE bytes
        MappedByteBuffer bb = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        CharBuffer cb = decoder.decode(bb);
        // find() looks for the pattern anywhere; matches() would require the whole file to match
        return pattern.matcher(cb).find();
    }
}
This basically mmap()'s the file to search and relies on the operating system to do the right thing regarding cache and memory usage. Note, however, that map() is more expensive than just reading the file into a large buffer for files smaller than around 10 KiB.

A very fast searching of a stream is implemented in the RingBuffer class from the Ujorm framework. See the sample:
Reader reader = RingBuffer.createReader("xxx ${abc} ${def} zzz");
String word1 = RingBuffer.findWord(reader, "${", "}");
assertEquals("abc", word1);
String word2 = RingBuffer.findWord(reader, "${", "}");
assertEquals("def", word2);
String word3 = RingBuffer.findWord(reader, "${", "}");
assertEquals("", word3);
The single class implementation is available on the SourceForge:
For more information see the link.

Implement a sliding window. Have your buffer around, move all elements in the buffer one forward and enter a single new character in the buffer at the end. If the buffer is equal to your searched word, it is contained.
Of course, if you want to make this more efficient, you can look at a way to prevent moving all elements in the buffer around, for example by having a cyclic buffer and a representation of the strings which 'cycles' the same way the buffer does, so you only need to check for content-equality. This saves moving all elements in the buffer.

I think you need to buffer a small amount at the boundary between buffers.
For example if your buffer size is 1024 and the length of the SearchString is 10, then as well as searching each 1024-byte buffer you also need to search each 18-byte transition between two buffers (9 bytes from the end of the previous buffer concatenated with 9 bytes from the start of the next buffer).
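A sketch of this idea, assuming a Reader as in the question: keep the last searchString.length() - 1 characters of each read in front of the next read, so any match that crosses a boundary appears whole in the combined window. The class name OverlapStreamSearch and the bufferSize parameter are hypothetical. Memory stays fixed at bufferSize plus the overlap.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class OverlapStreamSearch {
    static boolean contains(Reader reader, String needle, int bufferSize) throws IOException {
        if (needle.isEmpty()) {
            return true;
        }
        int overlap = needle.length() - 1;
        // Room for the carried-over tail plus one full read
        char[] window = new char[overlap + bufferSize];
        int carried = 0; // characters kept from the end of the previous window
        int n;
        while ((n = reader.read(window, carried, bufferSize)) != -1) {
            int len = carried + n;
            if (new String(window, 0, len).contains(needle)) {
                return true;
            }
            // Keep only the tail that could still be the start of a match
            carried = Math.min(overlap, len);
            System.arraycopy(window, len - carried, window, 0, carried);
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // A tiny buffer forces the match to span several reads
        System.out.println(contains(new StringReader("xxstackoverflowxx"), "stackoverflow", 4)); // prints true
        System.out.println(contains(new StringReader("stackoverflo"), "stackoverflow", 4));      // prints false
    }
}
```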

I'd say switch to a character by character solution, in which case you'd scan for the first character in your target text, then when you find that character increment a counter and look for the next character. Every time you don't find the next consecutive character restart the counter. It would work like this:
public boolean streamContainsString(Reader reader, String searchString) throws IOException {
    char[] buffer = new char[1024];
    int numCharsRead;
    int count = 0;
    while ((numCharsRead = reader.read(buffer)) > 0) {
        // check every character that was read, not just the last one in the buffer
        for (int c = 0; c < numCharsRead; c++) {
            if (buffer[c] == searchString.charAt(count))
                count++;
            else
                count = 0;
            if (count == searchString.length())
                return true;
        }
    }
    return false;
}
The only problem is when you're in the middle of looking through characters... in which case there needs to be a way of remembering your count variable. I don't see an easy way of doing so except as a private variable for the whole class. In which case you would not instantiate count inside this method.

You might be able to implement a very fast solution using Fast Fourier Transforms, which, if implemented properly, allow you to do string matching in O(n log m) time, where n is the length of the longer string to be matched and m is the length of the shorter string. You could, for example, perform an FFT as soon as you receive a stream input of length m; if it matches, you can return, and if it doesn't match, you can throw away the first character of the stream input, wait for a new character to arrive from the stream, and then perform the FFT again.

You can increase the speed of search for very large strings by using some string search algorithm

If you're looking for a constant substring rather than a regex, I'd recommend Boyer-Moore. There's plenty of source code on the internet.
Also, use a circular buffer to avoid thinking too hard about buffer boundaries.
Mike.

I also had a similar problem: skip bytes from the InputStream until specified string (or byte array). This is the simple code based on circular buffer. It is not very efficient but works for my needs:
private static boolean matches(int[] buffer, int offset, byte[] search) {
    final int len = buffer.length;
    for (int i = 0; i < len; ++i) {
        // read() yields 0..255, so compare against the unsigned value of the byte
        if ((search[i] & 0xFF) != buffer[(offset + i) % len]) {
            return false;
        }
    }
    return true;
}
public static void skipBytes(InputStream stream, byte[] search) throws IOException {
    final int[] buffer = new int[search.length];
    for (int i = 0; i < search.length; ++i) {
        buffer[i] = stream.read();
    }
    int offset = 0;
    while (!matches(buffer, offset, search)) {
        int next = stream.read();
        if (next == -1) {
            // end of stream without a match; stop instead of looping forever
            throw new EOFException("search bytes not found");
        }
        buffer[offset] = next;
        offset = (offset + 1) % buffer.length;
    }
}

Here is my implementation:
static boolean containsKeywordInStream( Reader ir, String keyword, int bufferSize ) throws IOException{
SlidingContainsBuffer sb = new SlidingContainsBuffer( keyword );
char[] buffer = new char[ bufferSize ];
int read;
while( ( read = ir.read( buffer ) ) != -1 ){
if( sb.checkIfContains( buffer, read ) ){
return true;
}
}
return false;
}
SlidingContainsBuffer class:
class SlidingContainsBuffer{
private final char[] keyword;
private int keywordIndexToCheck = 0;
private boolean keywordFound = false;
SlidingContainsBuffer( String keyword ){
this.keyword = keyword.toCharArray();
}
boolean checkIfContains( char[] buffer, int read ){
for( int i = 0; i < read; i++ ){
if( keywordFound == false ){
if( keyword[ keywordIndexToCheck ] == buffer[ i ] ){
keywordIndexToCheck++;
if( keywordIndexToCheck == keyword.length ){
keywordFound = true;
}
} else {
keywordIndexToCheck = 0;
}
} else {
break;
}
}
return keywordFound;
}
}
This answer fully satisfies the requirements:
The implementation is able to find the searched keyword even if it is split between buffers.
Memory usage is bounded by the buffer size.
The number of reads can be minimized by using a bigger buffer.
