StackOverflowError in a terminating loop? - java

To give some context: I recently started playing Dungeons and Dragons with a group of friends. I decided I wanted to try to make a program that allowed me to search for spells by level, school of magic, etc. To do this, I took a text file with every spell and its information listed alphabetically by spell name, and created a few regular expressions to sort through it all. I finally got it to give me the correct results for every attribute. But once I put it in a loop to get everything at once, I get a long list of errors, beginning with StackOverflowError. As far as I'm aware, this is supposed to happen when you get infinite loops, but mine definitely terminates. Moreover, I can get further by looping manually (with a loop that checks a boolean that I set with the keyboard at the end of each iteration) than I can with a simple for or while loop.
The code I'm using is below. I didn't include the Spell class because it's just standard getters/setters and variable declarations. The School type I have is just an enum with the eight schools.
Map<String, Spell> allSpells = new HashMap<String, Spell>();
ArrayList<Spell> spellArray = new ArrayList<Spell>();
int finalLevel;
int lastMatch = 0;
int startIndex = 0;
Matcher match;
String finalTitle;
Spell.School finalSchool;
String finalDescription;
String fullList;
String titleString = ".+:\\n"; //Finds the titles of spells
Pattern titlePattern = Pattern.compile(titleString);
String levelString = "\\d\\w+-level"; //Finds the level of spells
Pattern levelPattern = Pattern.compile(levelString);
String schoolString = "(C|c)onjuration|(A|a)bjuration|(E|e)nchantment|(N|n)ecromancy|(E|e)vocation|(D|d)ivination|(I|i)llusion|(T|t)ransmutation"; //Finds the school of spells
Pattern schoolPattern = Pattern.compile(schoolString);
String ritualString = "\\(ritual\\)"; //Finds if a spell is a ritual
Pattern ritualPattern = Pattern.compile(ritualString);
String descriptionString = "\nCasting Time: (.|\\n)+?(\\n\\n)"; //Finds the description of spells
Pattern descriptionPattern = Pattern.compile(descriptionString);
try
{
BufferedReader in = new BufferedReader(new FileReader("Spell List.txt"));
// buffer for storing file contents in memory
StringBuffer stringBuffer = new StringBuffer("");
// for reading one line
String line = null;
// keep reading till readLine returns null
while ((line = in.readLine()) != null)
{
// keep appending last line read to buffer
stringBuffer.append(line + "\n");
}
fullList = stringBuffer.toString(); //Convert stringBuffer to a normal String. Used for setting fullList = a substring
boolean cont = true;
for(int i = 0; i < 100; i++) //This does not need to be set to 100. This is just a temporary number. Anything over 4 gives me this error, but under 4 I am fine.
{
//Spell Title
match = titlePattern.matcher(fullList);
match.find(); //Makes match point to the first title found
finalTitle = match.group().substring(0, match.group().length()-1); //finalTitle is set to found group, without the newline at the end
allSpells.put(finalTitle, new Spell()); //Creates unnamed Spell object tied to the matched title in the allSpells map
spellArray.add(allSpells.get(finalTitle)); //Adds the unnamed Spell object to a list.
//To be used for iterating through all Spells to find properties matching criteria
//Spell Level
match = levelPattern.matcher(fullList.substring(match.end(), match.end()+50)); //Gives an approximate region in which this could appear
if(match.find()) //Accounts for cantrips. If no match for a level is found, it is set to 0
{
finalLevel = Integer.valueOf(match.group().substring(0, 1));
}
else
{
finalLevel = 0;
}
allSpells.get(finalTitle).setSpellLevel(finalLevel);
//Spell School
match = schoolPattern.matcher(fullList);
match.find();
finalSchool = Spell.School.valueOf(match.group().substring(0, 1).toUpperCase() + match.group().substring(1, match.group().length())); //Capitalizes matched school
allSpells.get(finalTitle).setSpellSchool(finalSchool);
//Ritual?
match = ritualPattern.matcher(fullList.substring(0, 75));
if(match.find())
{
allSpells.get(finalTitle).setRitual(true);
}
else
allSpells.get(finalTitle).setRitual(false);
//Spell Description
match = descriptionPattern.matcher(fullList);
match.find();
finalDescription = match.group().substring(1); //Gets rid of the \n at the beginning of the description
allSpells.get(finalTitle).setDescription(finalDescription);
lastMatch = match.end();
System.out.println(finalTitle);
fullList = fullList.substring(lastMatch);
}
}
catch (Exception e)
{
e.printStackTrace();
}
If it helps, I have the list I'm using here.
As I mentioned in the comments of the code, going through the loop more than 4 times gives me this error, but under 4 does not. I have tried doing this as a while loop as well, and I get the same error.
I have tried searching for a solution online, but everything I see about this error just talks about recursive calls. If anyone has a solution for this I would greatly appreciate it. Thanks.
EDIT: The error list I'm getting is huge, so I put it in a text file here. I know people are asking for stack traces, and I hope that this is what they mean. I'm still relatively new to Java and have never had to work with stack traces before.
EDIT 2: I have found that if I simply replace the description regex with "\nCasting Time:" that it runs through the whole thing without errors. The only problem, of course, is that it doesn't collect all the information I want it to. Hopefully this information will help determine the problem though.
FINAL EDIT: I did a bit more searching once I found the specific line causing the problem, and found that increasing the stack size fixed the problem.

By increasing the stack size, you're treating a symptom and leaving the problem unsolved. In this case, the problem is an inefficient regex.
First, if you want to match anything including newlines, you should always use the DOTALL option. An alternation like .|\n is much less efficient. (It's also incorrect. The dot matches anything that's not a line terminator, which can be much more than just \n.)
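Second, that alternation is inside a capturing group, with the quantifier outside the group: (.|\n)+?. That means you're capturing one character at a time, only to overwrite the captured character with the next character, and so on. You're making the regex engine do a lot of unnecessary work. In Java's implementation, repeating a group like this is also handled recursively, so every character consumed deepens the call stack, which is why a long description ends in a StackOverflowError.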
Here's the regex I would use:
"(?ms)^Casting Time: (.+?)\n\n"
The DOTALL option can be activated with the inline modifier, (?s). I also used the MULTILINE option, which lets me anchor the match to the beginning of the line with ^. That way, there's no need to consume the leading \n, only to chop it off later. In fact, if you use group(1) instead of group(), the trailing \n\n will be excluded as well.
As for RegExr, it uses a different regex flavor than Java's, one with far fewer features. Most Java regexes will work on the excellent Regex101 site with the pcre (php) option selected. For absolute compatibility, there's RegexPlanet's Java page, or a code testing site like Ideone.
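To make the suggestion concrete, here is a minimal sketch of the recommended pattern used with find() in a loop and group(1). The class name, method name, and the two sample spell entries are made up for illustration; the real file's layout may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CastingTimeDemo {
    // (?s) = DOTALL so the dot also matches line terminators,
    // (?m) = MULTILINE so ^ matches at the start of every line.
    private static final Pattern DESCRIPTION =
            Pattern.compile("(?ms)^Casting Time: (.+?)\n\n");

    // Collects group(1) of every match: the text between "Casting Time: "
    // and the blank line that ends the entry.
    static List<String> descriptions(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DESCRIPTION.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample data in roughly the shape the question describes.
        String sample =
                "Acid Splash:\nConjuration cantrip\nCasting Time: 1 action\nRange: 60 feet\n\n"
              + "Alarm:\n1st-level abjuration (ritual)\nCasting Time: 1 minute\nRange: 30 feet\n\n";
        for (String description : descriptions(sample)) {
            System.out.println(description);
            System.out.println("--");
        }
    }
}
```

Calling find() repeatedly on one Matcher also avoids cutting a new substring off fullList on every iteration.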

Related

Java change full name to initial. last name

I have a database of player names that I need converted so I can work with them further (for example, I need Antonio Brown converted to A. Brown). My problem is that some names consist of only a first name (for example, Antonio), so I get an ArrayIndexOutOfBoundsException: 1. Is there another way to get what I want, and why does it still split even with the if condition?
if(spalte[1].contains(" ")){
String[] me = spalte[0].split(" ", 2);
String na = me[0].substring(0);
name = na + ". " + me[1];
} else {
name = spalte[1];
}
Firstly, I highly recommend you to keep your code formatted and variables named properly. It helps not only others to understand a snippet better but also makes debugging a bit easier.
When working with arrays and String::split, be careful with indices, because it is easy to go out of bounds.
Do you need to make the code handle multiple spaces: Antonio Light Brown -> A. L. Brown? The steps are simple and practically the same for any number of names:
Split by a space delimiter
Shorten the n-1 first partitions
Concatenate the String back
Here is the code:
String[] split = name.trim().split("\\s+"); // Trim the ends and split on runs of whitespace to avoid empty parts
StringBuilder sb = new StringBuilder(); // StringBuilder builds the String
for (int i=0; i<split.length; i++) { // Iterate the parts
if (i<split.length -1) { // If not the last part
sb.append(split[i].charAt(0)).append(". "); // Append the first letter and a dot
} else sb.append(split[i]); // Or else keep the entire word
}
System.out.println(sb.toString()); // StringBuilder::toString returns a composed String
Hypothetically: How would you handle names such as O'Neil or de Anthony? You can include the conditional concatenation in the for-loop.
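One possible way to fold such a condition into the loop is sketched below. The rule that a part starting with a lower-case letter (such as "de") is kept whole is only an assumed policy for illustration, and NameShortener and shorten are hypothetical names; names like O'Neil are left untouched when they are the last part.

```java
final class NameShortener {
    // Shortens every part except the last to an initial.
    // Assumed policy: lower-case particles such as "de" are kept as they are.
    static String shorten(String fullName) {
        String[] parts = fullName.trim().split("\\s+");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            String part = parts[i];
            if (i == parts.length - 1) {
                sb.append(part);                         // last part stays complete
            } else if (Character.isLowerCase(part.charAt(0))) {
                sb.append(part).append(' ');             // particle, e.g. "de"
            } else {
                sb.append(part.charAt(0)).append(". ");  // initial plus dot
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(shorten("Antonio Brown"));
        System.out.println(shorten("Antonio"));
        System.out.println(shorten("Anthony de Silva"));
    }
}
```

A single-word name never reaches charAt or a second array index, so the ArrayIndexOutOfBoundsException from the question cannot occur here.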

Comparing parts of Arrays against each other?

I'm really really really not sure what is the best way to approach this. I've gotten as far as I can, but I basically want to scan a user response with an array of words and search for matches so that my AI can tell what mood someone is in based off the words they used. However, I've yet to find a clear or helpful answer. My code is pretty cluttered too because of how many different methods I've tried to use. I either need a way to compare sections of arrays to each other or portions of strings. I've found things for finding a part of an array. Like finding eggs in green eggs and ham, but I've found nothing that finds a section of an array in a section of another array.
public class MoodCompare extends Mood1 {
public static void MoodCompare(String inputMood){
int inputMoodLength = inputMood.length();
int HappyLength = Arrays.toString(Happy).length();
boolean itWorks = false;
String[] inputMoodArray = inputMood.split(" ");
if(Arrays.toString(Happy).contains(Arrays.toString(inputMoodArray)) == true)
System.out.println("Success!");
InputMood is the data the user has input that should have keywords lurking in them to their mood. Happy is an array of the class Mood1 that is being extended. This is only a small piece of the class, much less the program, but it should be all I need to make a valid comparison to complete the class.
If anyone can help me with this, you will save me hours of work. So THANK YOU!!!
Manipulating strings is nicer when you do not use the relatively primitive arrays, where you have to walk through the elements yourself, and so on. A Dutch proverb says: not seeing the wood for the trees.
In this case it seems you check words of the input against a set of words for some mood.
Let's use Java collections:
Turning an input string into a list of words:
String input = "...";
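List<String> sentence = new ArrayList<>(Arrays.asList(input.split("\\W+"))); // Copy into an ArrayList: Arrays.asList alone is fixed-size and cannot remove
sentence.remove("");
\\W+ is a sequence of one or more non-word characters. Mind that "word" characters here means A-Za-z0-9_.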
Now a mood would be a set of unique words:
Set<String> moodWords = new HashSet<>();
Collections.addAll(moodWords, "happy", "wow", "hurray", "great");
Evaluation could be:
int matches = 0;
for (String word : sentence) {
if (moodWords.contains(word)) {
++matches;
}
}
int percent = sentence.isEmpty() ? 0 : matches * 100 / sentence.size();
System.out.printf("Happiness: %d %%%n", percent);
In Java 8 it is even more compact (count() returns a long, hence the cast):
int matches = (int) sentence.stream().filter(moodWords::contains).count();
Explanation:
The foreach-word-in-sentence takes every word. For every word it checks whether it is contained in moodWords, the set of all mood words.
The percentage is taken over the number of words in the sentence being moody. The boundary condition of an empty sentence is handled by the if-then-else expression ... ? ... : ... - an empty sentence given the arbitrary percentage 0%.
The printf format used %d for the integer, %% for the percent sign % (self-escaped) and %n for the line break character(s).
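Put together, the pieces above could look like the following sketch. MoodScore and percent are hypothetical names, and lower-casing the input is an extra assumption so that "Wow" matches "wow".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MoodScore {
    // Returns the percentage of words in the input that are mood words.
    static int percent(String input, Set<String> moodWords) {
        // Assumption: matching should ignore case, so the input is lower-cased.
        List<String> sentence =
                new ArrayList<>(Arrays.asList(input.toLowerCase().split("\\W+")));
        sentence.removeIf(String::isEmpty); // drop the empty leading token, if any
        long matches = sentence.stream().filter(moodWords::contains).count();
        return sentence.isEmpty() ? 0 : (int) (matches * 100 / sentence.size());
    }

    public static void main(String[] args) {
        Set<String> happy = new HashSet<>(Arrays.asList("happy", "wow", "hurray", "great"));
        System.out.printf("Happiness: %d %%%n", percent("Wow, what a great day!", happy));
    }
}
```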
If I'm understanding your question correctly, you mean something like this?
String words[] = {"green", "eggs", "and", "ham"};
String response = "eggs or ham";
Mood mood = new Mood();
for(String foo : words)
{
if(response.contains(foo))
{
//Check if happy etc...
if(foo.equals("green"))
mood.sad++;
...
}
}
System.out.println("Success");
...
//CheckMood() etc... other methods.
Try to use tokens.
Every time that the program needs to compare the contents of a row from one array to the other array, just tokenize the contents in parallel and compare them.
Visit the following Javadoc page for further reference: http://docs.oracle.com/javase/7/docs/api/java/util/StringTokenizer.html
or even view the following web pages:
http://introcs.cs.princeton.edu/java/72regular/Tokenizer.java.html
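As a rough illustration of the tokenizing idea, the sketch below walks through the tokens of a response and counts how many appear in a keyword array. TokenCompare, countMatches, the delimiter set, and the lower-casing are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

final class TokenCompare {
    // Counts the tokens of the response that occur in the keyword array.
    static int countMatches(String response, String[] keywords) {
        Set<String> wanted = new HashSet<>(Arrays.asList(keywords));
        // Assumed delimiters: whitespace plus a few common punctuation marks.
        StringTokenizer tokens = new StringTokenizer(response, " \t\n\r\f.,!?");
        int matches = 0;
        while (tokens.hasMoreTokens()) {
            if (wanted.contains(tokens.nextToken().toLowerCase())) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] happyWords = {"happy", "hurray"};
        System.out.println(countMatches("I am so happy, hurray!", happyWords));
    }
}
```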

How to best strip out certain strings in a file?

If I have a file with the following content:
11:17 GET this is my content #2013
11:18 GET this is my content #2014
11:19 GET this is my content #2015
How can I use a Scanner and ignore certain parts of a String line = scanner.nextLine();?
The result that I like to have would be:
this is my content
this is my content
this is my content
So I'd like to strip everything from the start until GET, and then take everything until the # char.
How could this easily be done?
You can use the String.indexOf(String str) and String.indexOf(char ch) methods. For example:
String line = scanner.nextLine();
int start = line.indexOf("GET");
int end = line.indexOf('#');
String result = line.substring(start + 4, end).trim(); // trim() drops the space left in front of '#'
One way might be
String strippedStart = scanner.nextLine().split(" ", 3)[2];
String result = strippedStart.substring(0, strippedStart.lastIndexOf("#")).trim();
This assumes there are always two space-separated tokens at the beginning (11:22 GET or 11:33 POST, idk).
You could do something like this:-
String line ="11:17 GET this is my content #2013";
int startIndex = line.indexOf("GET ");
int endIndex = line.indexOf("#");
line = line.substring(startIndex+4, endIndex-1);
System.out.println(line);
In my opinion the best solution for your problem would be using Java regex. Using regex you can define which group or groups of text you want to retrieve and what kind of text comes where. I haven't been working with Java in a long time, so I'll try to help you out from the top of my head. I'll try to give you a point in the right direction.
First off, compile a pattern:
Pattern pattern = Pattern.compile("^\\d{1,2}:\\d{1,2} GET (.*?) #\\d+$", Pattern.MULTILINE); // backslashes are doubled inside a Java string literal
First part of the regex says that you expect one or two digits followed by a colon followed by one or two digits again. After that comes the GET (you can use GET|POST if you expect those words or \w+? if you expect any word). Then you define the group you want with the parentheses. Lastly, you put the hash and any number of digits with at least one digit. You might consider putting flags DOTALL and CASE_INSENSITIVE, although I don't think you'll be needing them.
Then you continue with the matcher:
Matcher matcher = pattern.matcher(textToParse);
while (matcher.find())
{
//extract groups here
String group = matcher.group(1);
}
In the while loop you can use matcher.group(1) to find the text in the group you selected with the parentheses (the text you'd like extracted). matcher.group(0) gives the entire find, which is not what you're currently looking for (I guess).
Sorry for any errors in the code, it has not been tested. Hope this puts you on the right track.
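For reference, here is a small self-contained sketch of the same idea applied to the three sample lines from the question. GetLineExtractor and contents are hypothetical names; the pattern is the one described above with the backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GetLineExtractor {
    // MULTILINE lets ^ and $ match at the start and end of every line.
    private static final Pattern LINE =
            Pattern.compile("^\\d{1,2}:\\d{1,2} GET (.*?) #\\d+$", Pattern.MULTILINE);

    // Returns the text between "GET " and " #<digits>" for every matching line.
    static List<String> contents(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = LINE.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "11:17 GET this is my content #2013\n"
                    + "11:18 GET this is my content #2014\n"
                    + "11:19 GET this is my content #2015\n";
        for (String content : contents(text)) {
            System.out.println(content);
        }
    }
}
```

Lines that do not fit the pattern are simply skipped, which may or may not be what you want for malformed input.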
You can try this rather flexible solution:
Scanner s = new Scanner(new File("data"));
Pattern p = Pattern.compile("^(.+?)\\s+(.+?)\\s+(.*)\\s+(.+?)$");
Matcher m;
while (s.hasNextLine()) {
m = p.matcher(s.nextLine());
if (m.find()) {
System.out.println(m.group(3));
}
}
This piece of code drops the first, second and last words of every line and prints what remains.
Advantage is that it relies on whitespaces rather than specific string literals to perform the stripping.

Word boundary detection from text

I am having a problem with word boundary identification. I removed all the markup from a Wikipedia document, and now I want to get a list of entities (meaningful terms). I am planning to take bi-grams and tri-grams of the document and check whether they exist in a dictionary (WordNet). Is there a better way to achieve this?
Below is the sample text. I want to identify entities(shown as surrounded by double quotes)
Vulcans are a humanoid species in the fictional "Star Trek" universe who evolved on the planet Vulcan and are noted for their attempt to live by reason and logic with no interference from
emotion They were the first extraterrestrial species officially to make first contact with Humans and later became one of the founding members of the "United Federation of Planets"
I think what you're talking about is really still a subject of burgeoning research rather than a simple matter of applying well-established algorithms.
I can't give you a simple "do this" answer, but here are some pointers off the top of my head:
I think using WordNet could work (not sure where bigrams/trigrams come into it though), but you should view WordNet lookups as part of a hybrid system, not the be-all and end-all to spotting named entities
then, start by applying some simple, common-sense criteria (sequences of capitalised words; try and accommodate frequent lower-case function words like 'of' into these; sequences consisting of "known title" plus capitalised word(s));
look for sequences of words that statistically you wouldn't expect to appear next to one another by chance as candidates for entities;
can you build in dynamic web lookup? (your system spots the capitalised sequence "IBM" and sees if it finds e.g. a wikipedia entry with the text pattern "IBM is ... [organisation|company|...]");
see if anything here and in the "information extraction" literature in general gives you some ideas: http://www-nlpir.nist.gov/related_projects/muc/proceedings/muc_7_toc.html
The truth is that when you look at what literature there is out there, it doesn't seem like people are using terribly sophisticated, well-established algorithms. So I think there's a lot of room for looking at your data, exploration and seeing what you can come up with... Good luck!
If I understand correctly, you're looking to extract substrings delimited by double-quotation marks ("). You could use capture-groups in regular expressions:
String text = "Vulcans are a humanoid species in the fictional \"Star Trek\"" +
" universe who evolved on the planet Vulcan and are noted for their " +
"attempt to live by reason and logic with no interference from emotion" +
" They were the first extraterrestrial species officially to make first" +
" contact with Humans and later became one of the founding members of the" +
" \"United Federation of Planets\"";
String[] entities = new String[10]; // An array to hold matched substrings
Pattern pattern = Pattern.compile("[\"](.*?)[\"]"); // The regex pattern to use
Matcher matcher = pattern.matcher(text); // The matcher - our text - to run the regex on
int startFrom = text.indexOf('"'); // The index position of the first " character
int endAt = text.lastIndexOf('"'); // The index position of the last " character
int count = 0; // An index for the array of matches
while (startFrom <= endAt) { // startFrom will be changed to the index position of the end of the last match
matcher.find(startFrom); // Run the regex find() method, starting at the first " character
entities[count++] = matcher.group(1); // Add the match to the array, without its " marks
startFrom = matcher.end(); // Update the startFrom index position to the end of the matched region
}
OR write a "parser" with String functions:
int startFrom = text.indexOf('"'); // The index-position of the first " character
int nextQuote = text.indexOf('"', startFrom+1); // The index-position of the next " character
int count = 0; // An index for the array of matches
while (startFrom > -1) { // Keep looping as long as there is another " character (indexOf returns -1 when there is none)
entities[count++] = text.substring(startFrom+1, nextQuote); // Retrieve the substring and add it to the array
startFrom = text.indexOf('"', nextQuote+1); // Find the next " character after nextQuote
nextQuote = text.indexOf('"', startFrom+1); // Find the next " character after that
}
In both, the sample-text is hard-coded for the sake of the example and the same variable is presumed to be present (the String variable named text).
If you want to test the contents of the entities array:
int i = 0;
while (i < count) {
System.out.println(entities[i]);
i++;
}
I have to warn you, there may be issues with border/boundary cases (i.e. when a " character is at the beginning or end of a string). These examples will not work as expected if the parity of " characters is uneven (i.e. if there is an odd number of " characters in the text). You could use a simple parity-check before-hand:
static int countQuoteChars(String text) {
int nextQuote = text.indexOf('"'); // Find the first " character
int count = 0; // A counter for " characters found
while (nextQuote != -1) { // While there is another " character ahead
count++; // Increase the count by 1
nextQuote = text.indexOf('"', nextQuote+1); // Find the next " character
}
return count; // Return the result
}
static boolean quoteCharacterParity(int numQuotes) {
if (numQuotes % 2 == 0) { // If the number of " characters modulo 2 is 0
return true; // Return true for even
}
return false; // Otherwise return false
}
Note that if numQuotes happens to be 0 this method still returns true (because 0 modulo any non-zero number is 0, so (numQuotes % 2 == 0) will be true), though you wouldn't want to go ahead with the parsing if there are no " characters, so you'd want to check for this condition somewhere.
Hope this helps!
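If a fixed-size array and manual index bookkeeping feel fragile, a variant that collects matches into a List is sketched below. QuotedTerms and extract are hypothetical names. With an odd number of " characters, the unmatched trailing quote is simply ignored instead of causing an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedTerms {
    // Lazily matches the shortest run of characters between two " marks.
    private static final Pattern QUOTED = Pattern.compile("\"(.*?)\"");

    // Returns every substring enclosed in a pair of double quotes, in order.
    static List<String> extract(String text) {
        List<String> entities = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(text);
        while (matcher.find()) {       // find() resumes after the previous match
            entities.add(matcher.group(1));
        }
        return entities;
    }

    public static void main(String[] args) {
        String text = "the fictional \"Star Trek\" universe and the "
                    + "\"United Federation of Planets\"";
        for (String entity : extract(text)) {
            System.out.println(entity);
        }
    }
}
```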
Someone else asked a similar question about how to find "interesting" words in a corpus of text. You should read the answers. In particular, Bolo's answer points to an interesting article which uses the density of appearance of a word to decide how important it is---using the observation that when a text talks about something, it usually refers to that something fairly often. This article is interesting because the technique does not require prior knowledge on the text that is being processed (for instance, you don't need a dictionary targeted at the specific lexicon).
The article suggests two algorithms.
The first algorithm rates single words (such as "Federation", or "Trek", etc.) according to their measured importance. It is straightforward to implement, and I could even provide a (not very elegant) implementation in Python.
The second algorithm is more interesting as it extracts noun phrases (such as "Star Trek", etc.) by completely ignoring whitespace and using a tree-structure to decide how to split noun phrases. The results given by this algorithm when applied to Darwin's seminal text on evolution are very impressive. However, I admit implementing this algorithm would take a bit more thought as the description given by the article is fairly elusive, and what's more, the authors seem a bit difficult to track down. That said, I did not spend much time, so you may have better luck.

How to create article spinner regex in Java?

Say for example I want to take this phrase:
{{Hello|What's Up|Howdy} {world|planet} |
{Goodbye|Later}
{people|citizens|inhabitants}}
and randomly make it into one of the following:
Hello world
Goodbye people
What's Up world
What's Up planet
Later citizens
etc.
The basic idea is that enclosed within every pair of braces will be an unlimited number of choices separated by "|". The program needs to go through and randomly choose one choice for each set of braces. Keep in mind that braces can be nested endlessly within each other. I found a thread about this and tried to convert it to Java, but it did not work. Here is the python code that supposedly worked:
import re
from random import randint
def select(m):
choices = m.group(1).split('|')
return choices[randint(0, len(choices)-1)]
def spinner(s):
r = re.compile('{([^{}]*)}')
while True:
s, n = r.subn(select, s)
if n == 0: break
return s.strip()
Here is my attempt to convert that Python code to Java.
public String generateSpun(String text){
String spun = new String(text);
Pattern reg = Pattern.compile("{([^{}]*)}");
Matcher matcher = reg.matcher(spun);
while (matcher.find()){
spun = matcher.replaceFirst(select(matcher.group()));
}
return spun;
}
private String select(String m){
String[] choices = m.split("|");
Random random = new Random();
int index = random.nextInt(choices.length - 1);
return choices[index];
}
Unfortunately, when I try to test this by calling
generateAd("{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}");
In the main of my program, it gives me an error in the line in generateSpun where Pattern reg is declared, giving me a PatternSyntaxException.
java.util.regex.PatternSyntaxException: Illegal repetition
{([^{}]*)}
Can someone try to create a Java method that will do what I am trying to do?
Here are some of the problems with your current code:
You should reuse your compiled Pattern, instead of Pattern.compile every time
You should reuse your Random, instead of new Random every time
Be aware that String.split is regex-based, so you must split("\\|")
Be aware that curly braces in Java regex must be escaped to match literally, so Pattern.compile("\\{([^{}]*)\\}");
You should query group(1), not group() which defaults to group 0
You're using replaceFirst wrong, look up Matcher.appendReplacement/Tail instead
Random.nextInt(int n) has exclusive upper bound (like many such methods in Java)
The algorithm itself actually does not handle arbitrarily nested braces properly
Note that escaping is done by preceding with \, and as a Java string literal it needs to be doubled (i.e. "\\" contains a single character, the backslash).
Attachment
Source code and output with above fix but no major change to algorithm
To fix the regex, add backslashes before the outer { and }. These are meta-characters in Java regexes. However, I don't think that will result in a working program. You are modifying the variable spun after it has been bound to the regex, and I do not think the returned Matcher will reflect the updated value.
I also don't think the python code will work for nested choices. Have you actually tried the python code? You say it "supposedly works", but it would be wise to verify that before you spend a lot of time porting it to Java.
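For what it's worth, here is a sketch of a direct port of the Python loop with the fixes listed above applied: escaped braces, split("\\|"), group(1), a reused Pattern and Random, and appendReplacement/appendTail. Spinner and spin are hypothetical names. It replaces innermost brace groups pass by pass until none remain, exactly like the Python version, so whatever limits that algorithm has with nesting carry over unchanged.

```java
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Spinner {
    // Matches a brace pair containing no other braces, i.e. an innermost group.
    private static final Pattern GROUP = Pattern.compile("\\{([^{}]*)\\}");
    private final Random random;

    Spinner(Random random) {
        this.random = random;
    }

    String spin(String text) {
        String current = text;
        while (true) {
            Matcher m = GROUP.matcher(current);
            StringBuffer sb = new StringBuffer();
            boolean replaced = false;
            while (m.find()) {
                // -1 keeps trailing empty choices, as Python's split does
                String[] choices = m.group(1).split("\\|", -1);
                // nextInt(n) is exclusive of n, so no "- 1" is needed
                String pick = choices[random.nextInt(choices.length)];
                // quoteReplacement protects '$' and '\' inside the chosen text
                m.appendReplacement(sb, Matcher.quoteReplacement(pick));
                replaced = true;
            }
            if (!replaced) {
                return current.trim(); // no brace groups left
            }
            m.appendTail(sb);
            current = sb.toString();
        }
    }

    public static void main(String[] args) {
        Spinner spinner = new Spinner(new Random());
        System.out.println(spinner.spin(
                "{{Hello|What's Up|Howdy} {world|planet} | {Goodbye|Later} {people|citizens|inhabitants}}"));
    }
}
```

Each pass builds a new string and creates a fresh Matcher for it, which sidesteps the problem of modifying spun while an old Matcher is still bound to the previous value.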
Well, I just created one in PHP and Python; there is a demo at http://spin.developerscrib.com. It is at a very early stage, so it might not work as expected. The source code is on GitHub: https://github.com/razzbee/razzy-spinner
Use this; it worked well for me:
Pattern p = Pattern.compile("cat");
Matcher m = p.matcher("one cat two cats in the yard");
StringBuffer sb = new StringBuffer();
while (m.find()) {
m.appendReplacement(sb, "dog");
}
m.appendTail(sb);
System.out.println(sb.toString());
and here
private String select(String m){
String[] choices = m.split("|");
Random random = new Random();
int index = random.nextInt(choices.length - 1);
return choices[index];
}
Instead of m.split("|"), use m.split("\\|").
Otherwise it splits between each and every character.
Also use Pattern.compile("\\{([^{}]*)\\}");
