I have an assignment that requires me to take a large dataset, store it in an array, and then create methods that interpret the data in various ways. The file data I am given is in the form like so:
0 138
0 139
0 140
0 141
0 142
0 799
4 1
4 10
4 12
4 18
etc... (it is very large)
This data is supposed to represent a social network of people, with the numbers representing individuals. Each line contains a person on the left who has 'trusted' the person on the right. I am supposed to interpret this data so that I can find all the persons a particular person trusts, how many people trust a particular person, and how to find the most trusted person. However, I am at a complete loss as to how to write these methods, and so I was wondering if you guys could help me out. Here's the code I have so far:
public class SocialNetwork {
static Scanner scanner = new Scanner(System.in);
static void findTrusted()
{
System.out.println("Please input person number you would like to find Trustees for");
trustee = (scanner.next());
}
public static void main(String[] args){
File inData = new File("dataset.txt");
ArrayList<Integer> links = new ArrayList<Integer>();
try
{
Scanner in = new Scanner(inData);
in.nextLine();
in.nextLine();
in.nextLine();
in.nextLine();
while (in.hasNext())
{
int trustee = in.nextInt();
int trusted = in.nextInt();
links.add(trustee);
links.add(trusted);
}
in.close();
}
catch (FileNotFoundException e){
e.printStackTrace();
}
}
}
As you can see, my findTrusted method has very little in it. I just don't know where to even start. I have come up with a little pseudocode to try and dissect what needs to be done:
prompt user for input on which person(integer) to find his/her trustees
search arraylist links for person(integer) inputted
print all persons(integers) on the right side of the lines that begin with person requested
However, I just don't quite know how to do this.
The structure links doesn't really help you. It has no idea of "from" and "to". You are storing Persons as numbers, but not storing any relationships between two people. You're really working in graph theory, and when you can you should look at reference works and Java libraries for graph theory.
So, what is a trust link? It is an object that has two people, the trustee and trusted people. Create a class for this:
public class Trust {
private final int trustee;
private final int trusted;
public Trust(final int trustee, final int trusted) {
this.trustee = trustee;
this.trusted = trusted;
}
// Getters, equals, hashCode, toString, formatted output for humans.
}
Have your class SocialNetwork be able to create these. By the way, create a SocialNetwork instance in your main method, and stop using static for everything else.
public Trust createTrust(Scanner scanner) {
int trustee = scanner.nextInt();
int trusted = scanner.nextInt();
return new Trust(trustee, trusted);
}
You might need to add exception handling and end of file handling.
Make links a list of Trust objects, and then write methods that scan that list as needed.
/**
Return a list of all the people whom trustee trusts.
@param trustee A person in the system.
@return a list of the people trustee trusts.
*/
public List<Integer> trusting(int trustee) {
final List<Integer> trusted = new ArrayList<>();
for (Trust link: links) {
// Add something from link to trusted if it should.
// This looks like homework; I'm not doing everything for you.
}
return trusted;
}
Write other methods as you need them. Then, think about whether these data structures are efficient for this problem. Could Maps be better? MultiMaps from other libraries? An open source graph theory library of some sort? Perhaps you should use a database instead. Perhaps you should have a Person class instead of using just integers; that way you can label people with their names.
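As one possible answer to the "Could Maps be better?" question, here is a minimal sketch of a Map-based layout. The class name TrustGraph and the method names addTrust and trustCount are made up for illustration; they are not part of the assignment. It only covers lookups, so finding the most trusted person is still left to you.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class: stores each trust link in two maps.
class TrustGraph {
    // person -> everyone that person trusts (right-hand column of the file)
    private final Map<Integer, List<Integer>> trusts = new HashMap<>();
    // person -> how many people trust that person
    private final Map<Integer, Integer> trustedByCount = new HashMap<>();

    // Record one line of the file: "trustee trusted".
    void addTrust(int trustee, int trusted) {
        trusts.computeIfAbsent(trustee, k -> new ArrayList<>()).add(trusted);
        trustedByCount.merge(trusted, 1, Integer::sum);
    }

    // Everyone the given person trusts; empty if the person has no links.
    List<Integer> trusting(int trustee) {
        return trusts.getOrDefault(trustee, Collections.emptyList());
    }

    // Number of people who trust the given person.
    int trustCount(int person) {
        return trustedByCount.getOrDefault(person, 0);
    }

    public static void main(String[] args) {
        TrustGraph g = new TrustGraph();
        g.addTrust(0, 138);
        g.addTrust(0, 139);
        g.addTrust(4, 138);
        System.out.println(g.trusting(0));
        System.out.println(g.trustCount(138));
    }
}
```

Each lookup is then a single map access instead of a scan over every link, at the cost of keeping two maps in sync.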
I think there are quite a few ways to implement this (performance aside). For example, you can use a HashMap, an array of arrays, or a list of lists.
I will give an example using lists, since that is what you seem to be using (although I think it is a bit odd here).
Say you have a list holding the people on the left:
ArrayList<ArrayList<Integer>> leftList = new ArrayList<ArrayList<Integer>>();
First fill leftList with one empty inner list for every index from 0 up to the largest number in the left column (now you may see why an array or a HashMap is better) by doing something like:
leftList.add(new ArrayList<Integer>());
in each iteration of the loop.
Then all you have to do is read the file and add each trusted person to the inner list that belongs to the truster. E.g. given 1 3, 1 4 and 2 3, your implementation will end up doing something like:
leftList.get(1).add(3) / leftList.get(1).add(4) / leftList.get(2).add(3)
depending on which line you are reading.
With this setup, I guess you can solve those three questions quite easily. Otherwise, just ask for more advice here. But make sure you think it through first!
Hope my answer gives you some ideas.
I don't know how well I'll be able to ask this question, but given a text file, I need to parse it and extract the productID data and store it in a HashSet, the userID data and store it in a HashSet, and the review/score and store it in an ArrayList. They also need to be used to create a graph, where each productID is connected by an edge to the corresponding userID.
The data is found here http://snap.stanford.edu/data/web-FineFoods.html
You can ignore the review/time, review/helpfulness, review/summary, and review/text information; they don't need to be stored in memory.
My current code looks like this:
import java.io.*;
import java.util.*;
import java.nio.charset.*;
public class Reviews
{
String fileName = "newfinefoods.txt";
GraphType<String> foodReview;
HashSet<String> productID;
HashSet<String> userID;
ArrayList<String> review;
int counter; //was using this to make sure I'm counting all the lines which I think I am
public Reviews(){
foodReview = new GraphType<>();
productID = new HashSet<>();
userID = new HashSet<>();
review = new ArrayList<>();
counter = 0;
}
public int numReviews(){
return review.size();
}
public int numProducts(){
return productID.size();
}
public int numUsers(){
return userID.size();
}
public void setupGraph(){
Scanner fileScanner;
String line = "";
try{
fileScanner = new Scanner (new File (fileName), "UTF-8");
String pr = "";
while(fileScanner.hasNextLine()){
line = fileScanner.nextLine();
String[] reviewInfo = line.split(": ");
String productInfo = reviewInfo[1];
System.out.println(productInfo);
}
}
catch (IOException e){
System.out.println(e);
}
}
public static void main(String[] args){
Reviews review = new Reviews();
review.setupGraph();
System.out.println("Number of Reviews:" + review.numReviews());
System.out.println("Number of Products:" + review.numProducts());
System.out.println("Number of Users:" + review.numUsers());
}
}
Whenever I run the code, looking in the array reviewInfo at 1, it only prints one set of data, but if I change it to 0 it seems to print all the information (just not the info that I need). I need to create this graph and get the info from the data but I am really just super stuck, and any tips or help would be very appreciated!
Here is a sample of the data:
product/productId: B001E4KFG0
review/userId: A3SGXH7AUHU8GW
review/profileName: delmartian
review/helpfulness: 1/1
review/score: 5.0
review/time: 1303862400
review/summary: Good Quality Dog Food
review/text: I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most.
product/productId: B00813GRG4
review/userId: A1D87F6ZCVE5NK
review/profileName: dll pa
review/helpfulness: 0/0
review/score: 1.0
review/time: 1346976000
review/summary: Not as Advertised
review/text: Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo".
product/productId: B000LQOCH0
review/userId: ABXLMWJIXXAIN
review/profileName: Natalia Corres "Natalia Corres"
review/helpfulness: 1/1
review/score: 4.0
review/time: 1219017600
review/summary: "Delight" says it all
review/text: This is a confection that has been around a few centuries. It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar. And it is a tiny mouthful of heaven. Not too chewy, and very flavorful. I highly recommend this yummy treat. If you are familiar with the story of C.S. Lewis' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch.
product/productId: B000UA0QIQ
Your initial design approach is right, but you should structure it a little more:
The setupGraph method should be split into small, specific, parameterized methods:
Since the users and products are part of the class's state, I think it is better for the class's constructor to receive the scanner as an input parameter. Then, after initializing the state variables, it should call setupGraph (which should be private), passing the input scanner.
setupGraph should receive an input scanner and take responsibility for reading lines from it and for properly handling any IOExceptions that might arise. For each line, it should merely call another private method that processes the line. If you want to count all the lines read, this is where you should place the increment.
The line-processing method should receive an input string and take responsibility for deciding whether it contains product data, user data, score data, or none of these. This must be done by properly parsing its contents.
Here is where you can use String.split() to get the name and value of each line, and then evaluate the name to decide where to store the value. And if you want to count all the processed lines, this is where you should place the increment.
Last, the main method should take responsibility for instantiating the scanner and passing it in when constructing the Reviews object. This way, you could receive the file name as a command-line argument, which would make your program more flexible.
Note that the only public methods of your class should be the constructor and the getters, and state variables should be private.
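The structure described above could be sketched roughly as follows. This is only an illustration under some assumptions: the class name ReviewParser and the method name processLine are made up, the field names follow the sample data shown in the question, and the graph part is left out because GraphType is your own class and I don't know its methods.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

// Hypothetical sketch: the constructor reads everything from the scanner,
// and a private method decides what to do with each line.
class ReviewParser {
    private final Set<String> productIds = new HashSet<>();
    private final Set<String> userIds = new HashSet<>();
    private final List<String> scores = new ArrayList<>();

    ReviewParser(Scanner input) {
        while (input.hasNextLine()) {
            processLine(input.nextLine());
        }
    }

    // Splits "name: value" and stores the value according to the name.
    private void processLine(String line) {
        // Limit 2 keeps any later ": " inside the value intact.
        String[] parts = line.split(": ", 2);
        if (parts.length < 2) {
            return; // blank line or a line without "name: value"
        }
        switch (parts[0]) {
            case "product/productId": productIds.add(parts[1]); break;
            case "review/userId":     userIds.add(parts[1]);    break;
            case "review/score":      scores.add(parts[1]);     break;
            default: break; // time, helpfulness, summary, text are ignored
        }
    }

    int numProducts() { return productIds.size(); }
    int numUsers()    { return userIds.size(); }
    int numReviews()  { return scores.size(); }

    public static void main(String[] args) {
        String sample = "product/productId: B001E4KFG0\n"
                      + "review/userId: A3SGXH7AUHU8GW\n"
                      + "review/score: 5.0\n";
        ReviewParser parser = new ReviewParser(new Scanner(sample));
        System.out.println(parser.numProducts() + " product(s), "
                + parser.numUsers() + " user(s), "
                + parser.numReviews() + " review(s)");
    }
}
```

Checking parts.length before touching parts[1] means a line with no "name: value" pair is skipped instead of throwing an ArrayIndexOutOfBoundsException.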
I have the following objects:
enum Slot
{
HANDS, LEGS, CHEST, HEAD, FEET;
}
class Clothing
{
// The slot this piece of clothing is worn on.
Slot s;
// The color of the clothing, used for `gradeOutfit`
Color c;
}
class Person
{
Map<Slot, Clothing> body;
// look through his outfit and give a score
// for how well he looks
int gradeOutfit()
{
return ...
}
}
I have one Person object and a collection of Clothing. This collection has many Clothing objects of each Slot. For example, it might look like this:
MyCloset = { GREEN_HAT, RED_VEST, BLACK_VEST,
BLUE_JEANS, BROWN_PANTS, RED_SHOES, BLACK_HAT, BLUE_GLOVES, PURPLE_VEST }
In the reality of my program, there are a lot more items than just these but this is just a simplified example.
Problem:
I need to find a combination of these clothes that lead to the highest gradeOutfit score. That means my Person will have to make sure he tries on every Clothing item with every other Clothing item (within limits, ex. it's impossible for two hats to be worn because both are for HEAD Slot). A Person cannot have their gradeOutfit called until they are wearing a Clothing item for every Slot.
I was thinking recursion is the best way to do this but then I think I'd get a stack overflow very fast if I had a decent amount of items. I tried doing it iteratively but I cannot seem to find a good easy way to loop through everything. My program basically looks like
Person p = new Person();
for (Clothing i : MyCloset)
{
for (Clothing h : MyCloset)
{
if (i == h) continue;
if (!p.isWearing(h.slot()))
{
p.wear(h);
}
}
int score = p.gradeOutfit();
}
But I know this is just a terrible approach. In order to ensure that every clothing item has been paired up with every other Clothing item, I would need so much more looping logic than just this. No matter what I try, it turns into spaghetti code. I also need to avoid looping over the same outfit twice and make sure that no outfit combination is forgotten about.
What is the best way to approach something like this?
This is an example of a mathematical optimization problem. You seem to already have the objective function (the function that calculates the gradeOutfit score, taking as input five pieces of clothing, one per slot) and you need some constraints (e.g. each piece of clothing in a combination of 5 belongs to a different slot). You need a Java solver library to do this. If your objective function is linear, a linear solver will do. As I only have experience with commercial solvers, I cannot recommend an open-source one; a list of options can be found here.
A simpler (but not extremely elegant) way, without a solver:
Create 5 sets of Clothing objects, one per slot (you can use a Java HashSet for this).
Iterate over all combinations, each time taking one item from each of the 5 sets. You need n1 x n2 x n3 x n4 x n5 combinations, where ni is the number of clothing instances for slot i.
It also seems to me that the gradeOutfit function should not be part of the Person class - as it is actually the property of an outfit, not a person (i.e. two persons with the same outfit have exactly the same scores). I 'd prefer to have an Outfit class and put it there.
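The iteration over all combinations can be done without recursion by treating the per-slot choices like the digits of an odometer. Below is a minimal sketch of that idea; the class name BestCombination and the method best are made up for illustration, and the scoring function is passed in, so your own gradeOutfit logic would go there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch: tries one item from each group in every possible
// combination and keeps the highest-scoring one.
class BestCombination {
    static <T> List<T> best(List<List<T>> groups, ToIntFunction<List<T>> score) {
        for (List<T> group : groups) {
            if (group.isEmpty()) {
                return null; // a slot with no items can never be filled
            }
        }
        int[] index = new int[groups.size()]; // current choice in each group
        List<T> best = null;
        int bestScore = Integer.MIN_VALUE;
        while (true) {
            // Build the combination selected by the current indices.
            List<T> current = new ArrayList<>();
            for (int g = 0; g < groups.size(); g++) {
                current.add(groups.get(g).get(index[g]));
            }
            int s = score.applyAsInt(current);
            if (best == null || s > bestScore) {
                best = current;
                bestScore = s;
            }
            // Advance like an odometer: bump the last index, carry leftwards.
            int g = groups.size() - 1;
            while (g >= 0 && ++index[g] == groups.get(g).size()) {
                index[g] = 0;
                g--;
            }
            if (g < 0) {
                return best; // every index wrapped around: all combinations seen
            }
        }
    }

    public static void main(String[] args) {
        List<List<String>> closet = Arrays.asList(
                Arrays.asList("GREEN_HAT", "BLACK_HAT"),
                Arrays.asList("RED_VEST", "BLACK_VEST", "PURPLE_VEST"));
        // Toy score: prefer outfits with more BLACK items.
        List<String> outfit = best(closet, items -> {
            int black = 0;
            for (String item : items) {
                if (item.startsWith("BLACK")) black++;
            }
            return black;
        });
        System.out.println(outfit);
    }
}
```

Every combination is visited exactly once and nothing is kept in memory besides the current and best outfit, so the stack depth is not a concern; the running time is still the product of the group sizes.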
You have very poorly created the data structure.
enum Slot
{
HANDS, LEGS, CHEST, HEAD, FEET;
static final int[] numbers = new int[values().length];
}
enum Color
{
RED, BLUE, ...;
}
enum Clothing {
GREEN_HAT(Slot.HEAD, Color.GREEN), ...;
final Slot slot;
final Color color;
Clothing(Slot slot, Color color) {...}
}
class Outfit extends EnumMap<Slot, Clothing> {
int countScore() {...}
Outfit() {
super(Slot.class);
// foreach slot: this.put(slot, Clothing.values()[0]);
}
}
...
int n = Slot.values().length - 1;
Outfit currentOutfit = new Outfit();
Outfit bestOutfit = new Outfit();
int currentActiveSlot = 0;
// make a cycle for combination of all Outfits
For an enum, you have to use the values() method to loop over it:
for (Slot s : Slot.values())
I'd like some help with a Java assignment, if it's no problem. We've just been getting started, but my teacher wants us to do a bunch of research on our own and I can't figure out how to do the homework.
We have an assignment where he's given us the lines to 10 different speeches, and we have to use objective oriented coding to display the entire thing. I figured out so far how to set up variables to link to the first file and have things displayed on the screen, but he wants us to limit how many characters are on each line so he doesn't have to scroll sideways forever to read a speech on a single line. This leaves me in a position where I'd be making new variables for every sentence of every speech for the next few hours, and I figure there has to be a more efficient way. So, I asked my friend (who took the class last year) for advice, and he recommended using a for loop to scan for spaces after a certain amount of characters and jump to the next line to continue, but I have no idea how to do any of this. All I have so far is the base file that our teacher told us to use, and the beginning of the first of 10 speeches.
/**
 * TextWriter is a program that uses objective coding to display 10 political speeches
 * @author ()
 * @version (10/12/16)
 */
public class TextWriter {
private String textToDisplay;//text to be displayed
public TextWriter() {
textToDisplay = "";
}
public TextWriter(String inputText) {
textToDisplay = inputText;
}
public void clearTextToDisplay() {
textToDisplay = "";
}
public void setTextToDisplay(String inputText) {
textToDisplay = inputText;
}
public String getTextToDisplay() {
return textToDisplay;
}
public void display() {
System.out.println(textToDisplay);
}
}
and the second one,
/**
 * Displays Washington's Farewell speech using objective oriented coding.
 * @author ()
 * @version (10/12/16)
 */
public class WashingtonFarewellDriver {
public static void main(String[] args) {
TextWriter wf1;
wf1 = new TextWriter();
wf1.setTextToDisplay("Friends and Citizens: The period for a new election of a citizen to administer the executive government of the United States being not far distant, and the time actually arrived when your thoughts must be employed in designating the person who is to be clothed with that important trust, it appears to me proper, especially as it may conduce to a more distinct expression of the public voice, that I should now apprise you of the resolution I have formed, to decline being considered among the number of those out of whom a choice is to be made.");
wf1.display();
TextWriter wf2;
wf2 = new TextWriter("I beg you, at the same time, to do me the justice to be assured that this resolution has not been taken without a strict regard to all the considerations appertaining to the relation which binds a dutiful citizen to his country; and that in withdrawing the tender of service, which silence in my situation might imply, I am influenced by no diminution of zeal for your future interest, no deficiency of grateful respect for your past kindness, but am supported by a full conviction that the step is compatible with both.");
wf2.display();
TextWriter wf3;
wf3 = new TextWriter("The acceptance of, and continuance hitherto in, the office to which your suffrages have twice called me have been a uniform sacrifice of inclination to the opinion of duty and to a deference for what appeared to be your desire. I constantly hoped that it would have been much earlier in my power, consistently with motives which I was not at liberty to disregard, to return to that retirement from which I had been reluctantly drawn.");
wf3.display();
}
}
(hopefully that's formatted right)
I hope that it's ok that I'm asking for homework help, because it does seem to be kind of looked down upon, but I'm pretty confused and hopefully someone can explain what's going on a little more than my teacher.
Thank you! If there's any questions, I might be able to answer them too.
Loop through the string character by character using String.charAt(). Keep track of how many characters you've printed on the current line. Once you are past, say, 25 characters, the next time you see a space, print a newline, reset your counter to 0, and carry on printing.
String in = "This is a run on sentence that is too long for a single line and should be broken up into multiple lines because I said so. This is a run on sentence that is too long for a single line and should be broken up into multiple lines because I said so.";
int counter=0;
for(int i=0;i<in.length();i++){
char c=in.charAt(i);
counter++;
System.out.print(c);
if((counter>25)&&(c==' ')){
System.out.println();
counter=0;
}
}
There are many ways to approach this.
You can add a method like this to your TextWriter class to insert the line breaks:
public void addLines(int maxChars){
// nothing to do if the text already fits on one line
if(maxChars <= 0 || textToDisplay.length() <= maxChars) return;
StringBuilder wrapped = new StringBuilder();
for(int start = 0; start < textToDisplay.length(); start += maxChars){
// the last chunk may be shorter than maxChars
int end = Math.min(start + maxChars, textToDisplay.length());
wrapped.append(textToDisplay, start, end).append("\r\n");
}
textToDisplay = wrapped.toString();
}
and in your Main function, maybe:
public class WashingtonFarewellDriver {
public static void main(String[] args) {
TextWriter wf1;
wf1 = new TextWriter();
wf1.setTextToDisplay("Friends and Citizens: The period for a new election of a citizen to administer the executive government of the United States being not far distant, and the time actually arrived when your thoughts must be employed in designating the person who is to be clothed with that important trust, it appears to me proper, especially as it may conduce to a more distinct expression of the public voice, that I should now apprise you of the resolution I have formed, to decline being considered among the number of those out of whom a choice is to be made.");
wf1.addLines(50);
wf1.display();
TextWriter wf2;
wf2 = new TextWriter("I beg you, at the same time, to do me the justice to be assured that this resolution has not been taken without a strict regard to all the considerations appertaining to the relation which binds a dutiful citizen to his country; and that in withdrawing the tender of service, which silence in my situation might imply, I am influenced by no diminution of zeal for your future interest, no deficiency of grateful respect for your past kindness, but am supported by a full conviction that the step is compatible with both.");
wf2.addLines(50);
wf2.display();
TextWriter wf3;
wf3 = new TextWriter("The acceptance of, and continuance hitherto in, the office to which your suffrages have twice called me have been a uniform sacrifice of inclination to the opinion of duty and to a deference for what appeared to be your desire. I constantly hoped that it would have been much earlier in my power, consistently with motives which I was not at liberty to disregard, to return to that retirement from which I had been reluctantly drawn.");
wf3.addLines(50);
wf3.display();
}
}
This should work, but some words will be cut in half, because it simply breaks the text every maxChars characters regardless of word boundaries.
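If cut-off words are a problem, one alternative is to build each line word by word. The following is a minimal sketch using only the standard library; the class name WordWrap and the method wrap are made up for illustration, and a single word longer than maxChars is simply left on a line of its own.

```java
// Hypothetical sketch: wraps text at word boundaries.
class WordWrap {
    static String wrap(String text, int maxChars) {
        StringBuilder out = new StringBuilder();
        int lineLength = 0; // characters already on the current line
        for (String word : text.trim().split("\\s+")) {
            // Start a new line if this word (plus a space) would not fit.
            if (lineLength > 0 && lineLength + 1 + word.length() > maxChars) {
                out.append('\n');
                lineLength = 0;
            }
            if (lineLength > 0) {
                out.append(' ');
                lineLength++;
            }
            out.append(word);
            lineLength += word.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String speech = "Friends and Citizens: The period for a new election "
                + "of a citizen to administer the executive government of the "
                + "United States being not far distant";
        System.out.println(wrap(speech, 50));
    }
}
```

In the TextWriter class, the same loop could replace the body of addLines, with textToDisplay as the input and the result assigned back to it.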
Thanks for all the feedback, it helped me, but ultimately there was another pretty easy way that my friend walked me through, using the org.apache.commons.lang3.text.WordUtils class from the Apache Commons Lang library that he downloaded!
import java.io.IOException;
import org.apache.commons.lang3.text.WordUtils;
public class WashingtonFarewellDriver {
public static void main(String[] args) throws IOException {
int wwl = 110;
TextWriter wf1;
wf1 = new TextWriter(WordUtils.wrap("long sentences",wwl));
wf1.display();
}
}
The following requisites are those for the program I'm currently having an issue with:
The program must be able to open any text file specified by the user, and analyze the frequency of verbal ticks in the text. Since there are many different kinds of verbal ticks (such as "like", "uh", "um", "you know", etc) the program must ask the user what ticks to look for. A user can enter multiple ticks, separated by commas.
The program should output:
the total number of tics found in the text
the density of tics (proportion of all words in the text that are tics)
the frequency of each of the verbal tics
the percentage that each tic represents out of all the total number of tics
Here is my program:
public class TextfileHW2 {
// initiate(
public static int[] initiate(int[] values){
for (int z=0; z<keys.length; z++){
values[z] = 0;
}
return values;
processing(values);
}
// processing(values)
public static int[] processing(int[] valuez){
while (input.hasNext()){
String next = input.next();
totalwords++;
for (int x = 0; x<keys.length; x++){
if (next.toLowerCase().equals(keys[x])){
valuez[x]+=1;
}
}
return valuez;
output();
}
for (Integer u : valuez){
totalticks += u;
}
}
public static void output(){
System.out.println("Total number of tics :"+totalticks);
System.out.printf("Density of tics (in percent): %.2f \n", ((totalticks/totalwords)*100));
System.out.println(".........Tick Breakdown.......");
for (int z = 0; z<keys.length; z++){
System.out.println(keys[z] + " / "+ values[z]+" occurences /" + (values[z]*100/totalticks) + "% of all tics");
}
sc.close();
input.close();
}
public static void main(String[] args) throws FileNotFoundException {
static double totalwords = 0; // double so density (totalwords/totalticks) returned can be double
static int totalticks = 0;
System.out.println("What file would you like to open?");
static Scanner sc = new Scanner(System.in);
static String files = sc.nextLine();
static Scanner input = new Scanner(new File(files));
System.out.println("What words would you like to search for? (please separate with a comma)");
static String ticks = sc.nextLine(), tics = ticks.toLowerCase();
static String[] keys = tics.split(",");
static int[] values = new int[keys.length];
initiate(values);
}
My program should be logically right, as I wrote it and successfully ran it for a while last week. The difference with this version (which doesn't work) is that I must use separate methods for each component of the analysis, which shouldn't be too difficult a task considering the program was working before. So I naturally tried to split up my program so that I can call my first method (which I called initiate), then my 2nd and 3rd methods, called processing and output.
First of all, what does static really mean? I remember my teacher saying that it represents a global variable which I can use anywhere in the program. As you can see I changed every variable to static to perhaps make my task easier.
Also, do I strictly need to use public static + type returned if I'm going to change something?
Let's say I want to change the values of an array (like I do in my program and use public static void) do I need to return something to actually change the values of the array or is it ok to use public static void?
If anyone also has any general pointers for what concerns my methods I would really appreciate it.
Your problem is in your initiate method:
return values;
processing(values);
Once you call return, your method stops. If you are using Eclipse (which I highly recommend), you should have gotten an error saying "Unreachable code," because there is simply no way for the program to execute your processing method.
I also saw this flaw in your processing method, where the call to output() comes right after return valuez.
First of all, what does static really mean? I remember my teacher saying that it represents a global variable which I can use anywhere in the program. As you can see I changed every variable to static to perhaps make my task easier.
It depends on the context. There is a good overall description here. The meaning is different when applied to methods, fields, and classes. To say it makes variables "global" is a bit simplified.
Also, do I strictly need to use public static + type returned if I'm going to change something?
I'm a little confused about what you mean. A method declared as public static *return_type* has three separate, independent qualities:
public: It is accessible by any other class.
static: It does not require an instance of the class to function (see above link).
*return_type*: This is, of course, the return type.
These properties aren't really related to "changing something". Unless I misunderstood your question, the answer is: No, the method specifiers and return type have no impact on its ability to change something with the exception that static methods cannot modify non-static fields or call non-static methods of this (there is no this in static methods).
Let's say I want to change the values of an array (like I do in my program and use public static void) do I need to return something to actually change the values of the array or is it ok to use public static void?
What you do in the function is entirely independent of the access specifier and static-ness of it (with the above-mentioned exception that this does not exist in static methods). If your function has any side-effects like changing the values in an array (or any values for that matter), then it does it regardless of public, or static, or its return type.
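To make that concrete, here is a small sketch (the class and method names are made up for illustration). A void method that writes into the array it receives changes the caller's array, because the parameter refers to the same array object; reassigning the parameter itself, on the other hand, has no effect on the caller.

```java
// Hypothetical demo of array parameters and side effects.
class ArraySideEffect {
    // Modifies the caller's array: both names refer to the same array object.
    static void increment(int[] counts, int index) {
        counts[index]++;
    }

    // Does NOT affect the caller: only the local parameter is reassigned.
    static void replace(int[] counts) {
        counts = new int[counts.length];
    }

    public static void main(String[] args) {
        int[] values = new int[2]; // int arrays start out filled with zeros
        increment(values, 1);
        replace(values);
        System.out.println(values[0] + " " + values[1]);
    }
}
```

This also means a loop that sets every element of a freshly created int array to 0, as in initiate, is not strictly needed, since new int[n] is already zero-filled.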
Check out the More on Classes section of the official language tutorial. It is concise and well-written and should help complete your understanding of the general concepts you're asking about. Check out some of the other tutorials there as well if you'd like.
So, I've written a spellchecker in Java and things work as they should. The only problem is that if I use a word where the max allowed distance of edits is too large (like say, 9) then my code runs out of memory. I've profiled my code and dumped the heap into a file, but I don't know how to use it to optimize my code.
Can anyone offer any help? I'm more than willing to put up the file/use any other approach that people might have.
-Edit-
Many people asked for more details in the comments. I figured that other people would find them useful, and they might get buried in the comments. Here they are:
I'm using a Trie to store the words themselves.
In order to improve time efficiency, I don't compute the Levenshtein Distance upfront, but I calculate it as I go. What I mean by this is that I keep only two rows of the LD table in memory. Since a Trie is a prefix tree, it means that every time I recurse down a node, the previous letters of the word (and therefore the distance for those words) remains the same. Therefore, I only calculate the distance with that new letter included, with the previous row remaining unchanged.
The suggestions that I generate are stored in a HashMap. The rows of the LD table are stored in ArrayLists.
Here's the code of the function in the Trie that leads to the problem. Building the Trie is pretty straight forward, and I haven't included the code for the same here.
/**
* @param letter the letter that is currently being looked at in the trie
* @param word the word that we are trying to find matches for
* @param previousRow the previous row of the Levenshtein Distance table
* @param suggestions all the suggestions for the given word
* @param maxd max distance a word can be from the query and still be returned as suggestion
* @param suggestion the current suggestion being constructed
*/
public void get(char letter, ArrayList<Character> word, ArrayList<Integer> previousRow, HashSet<String> suggestions, int maxd, String suggestion){
// the new row of the trie that is to be computed.
ArrayList<Integer> currentRow = new ArrayList<Integer>(word.size()+1);
currentRow.add(previousRow.get(0)+1);
int insert = 0;
int delete = 0;
int swap = 0;
int d = 0;
for(int i=1;i<word.size()+1;i++){
delete = currentRow.get(i-1)+1;
insert = previousRow.get(i)+1;
if(word.get(i-1)==letter)
swap = previousRow.get(i-1);
else
swap = previousRow.get(i-1)+1;
d = Math.min(delete, Math.min(insert, swap));
currentRow.add(d);
}
// if this node represents a word and the distance so far is <= maxd, then add this word as a suggestion
if(isWord==true && d<=maxd){
suggestions.add(suggestion);
}
// if any of the entries in the current row are <=maxd, it means we can still find possible solutions.
// recursively search all the branches of the trie
for(int i=0;i<currentRow.size();i++){
if(currentRow.get(i)<=maxd){
for(int j=0;j<26;j++){
if(children[j]!=null){
children[j].get((char)(j+97), word, currentRow, suggestions, maxd, suggestion+String.valueOf((char)(j+97)));
}
}
break;
}
}
}
Here's some code I quickly crafted showing one way to generate the candidates and to then "rank" them.
The trick is: you never "test" a non-valid candidate.
To me your: "I run out of memory when I've got an edit distance of 9" screams "combinatorial explosion".
Of course, to dodge a combinatorial explosion you don't do things like trying to generate all the words that are at a distance of 9 from your misspelled word yourself. You start from the misspelled word and generate (quite a lot of) possible candidates, but you refrain from creating too many candidates, for then you'd run into trouble.
(Also note that it doesn't make much sense to compute up to a Levenshtein Edit Distance of 9, because technically any word of fewer than 10 letters can be transformed into any other word of fewer than 10 letters in at most 9 transformations.)
Here's why you simply cannot test all words up to a distance of 9 without either having an OutOfMemory error or simply a program never terminating:
generating all the variations up to a LED of 1 for the word "ptmizing", by only adding one letter (from a to z), already generates 9*26 variations (i.e. 234 variations) [there are 9 positions where you can insert one out of 26 letters]
generating all the variations up to a LED of 2, by only adding one letter to what we now have, already generates 10*26*234 variations (60 840)
generating all the variations up to a LED of 3 gives 11*26*60 840, i.e. 17 400 240 variations
And that is only by considering the case where we add one, add two or add three letters (we're not counting deletions, swaps, etc.). And that is on a misspelled word that is only eight characters long. On "real" words, it explodes even faster.
Sure, you could get "smart" and generate this in a way that avoids too many dupes etc., but the point stays: it's a combinatorial explosion, and it explodes fast.
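If you want to check that arithmetic yourself, here is a minimal sketch that counts insertion-only variations (duplicates included, exactly as in the back-of-the-envelope numbers above). The class and method names are made up for this illustration; it reproduces the 60 840 and 17 400 240 figures.

```java
// Hypothetical helper, not part of the spell-checker example:
// counts how many strings you get by repeatedly inserting one letter (a-z)
// at every possible position, without removing duplicates.
class InsertionExplosion {

    static long countInsertionVariants(int wordLength, int insertions) {
        long count = 1;
        for (int k = 0; k < insertions; k++) {
            // a word of length n has n + 1 insertion positions, 26 letters each
            count *= (long) (wordLength + k + 1) * 26;
        }
        return count;
    }

    public static void main(String[] args) {
        int length = "ptmizing".length(); // 8 letters, so 9 insertion positions
        for (int d = 1; d <= 3; d++) {
            System.out.println("insertions=" + d + " -> " + countInsertionVariants(length, d));
        }
    }
}
```

Bump the loop limit to 9 and you'll see why generating everything up front is hopeless.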
Anyway... Here's an example. I'm simply passing the dictionary of valid words (containing only four words in this case) to the corresponding method to keep this short.
You'll obviously want to replace the call to the LED with your own LED implementation.
The double-metaphone is just an example: in a real spellchecker, words that "sound alike" despite a larger LED should be considered "more correct" and hence are often suggested first. For example, "optimizing" and "aupteemising" are quite far apart from a LED point of view, but using the double-metaphone you should get "optimizing" as one of the first suggestions.
(Disclaimer: the following was cranked out in a few minutes; it doesn't take into account uppercase, non-English words, etc. It's not a real spell-checker, just an example.)
@Test
public void spellCheck() {
final String src = "misspeled";
final Set<String> validWords = new HashSet<String>();
validWords.add("boing");
validWords.add("Yahoo!");
validWords.add("misspelled");
validWords.add("stackoverflow");
final List<String> candidates = findNonSortedCandidates( src, validWords );
final SortedMap<Integer,String> res = computeLevenhsteinEditDistanceForEveryCandidate(candidates, src);
for ( final Map.Entry<Integer,String> entry : res.entrySet() ) {
System.out.println( entry.getValue() + " # LED: " + entry.getKey() );
}
}
private SortedMap<Integer, String> computeLevenhsteinEditDistanceForEveryCandidate(
final List<String> candidates,
final String mispelledWord
) {
final SortedMap<Integer, String> res = new TreeMap<Integer, String>();
for ( final String candidate : candidates ) {
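// caveat: the map is keyed by distance, so candidates sharing the same distance overwrite each other
res.put( dynamicProgrammingLED(candidate, mispelledWord), candidate );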
}
return res;
}
private int dynamicProgrammingLED( final String candidate, final String misspelledWord ) {
return Levenhstein.getLevenshteinDistance(candidate,misspelledWord);
}
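In case you don't have a LED implementation at hand, here is a minimal sketch of the classic two-row dynamic programming version that could stand in for the call above. The class name EditDistance and the method name levenshtein are my own placeholders, not an existing library API.

```java
// Hypothetical stand-in for the Levenhstein helper class used above.
class EditDistance {

    // Classic dynamic programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] previous = new int[b.length() + 1];
        int[] current = new int[b.length() + 1];
        // distance from the empty prefix of a to each prefix of b
        for (int j = 0; j <= b.length(); j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            current[0] = i; // deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insert = current[j - 1] + 1;
                int delete = previous[j] + 1;
                int substitute = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // the row just computed becomes the previous row
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("misspelled", "misspeled"));
    }
}
```

Memory use is proportional to the length of the second word, so the LED computation itself is never what blows the heap; the candidate generation is.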
Here you generate all possible candidates using several methods. I've only implemented one such method (and quickly, so it may be bogus, but that's not the point ;)
private List<String> findNonSortedCandidates( final String src, final Set<String> validWords ) {
final List<String> res = new ArrayList<String>();
res.addAll( allCombinationAddingOneLetter(src, validWords) );
// res.addAll( allCombinationRemovingOneLetter(src) );
// res.addAll( allCombinationInvertingLetters(src) );
return res;
}
private List<String> allCombinationAddingOneLetter( final String src, final Set<String> validWords ) {
final List<String> res = new ArrayList<String>();
for (char c = 'a'; c <= 'z'; c++) { // <= so that 'z' is tried too
for (int i = 0; i < src.length(); i++) {
final String candidate = src.substring(0, i) + c + src.substring(i, src.length());
if ( validWords.contains(candidate) ) {
res.add(candidate); // only adding candidates we know are valid words
}
}
if ( validWords.contains(src+c) ) {
res.add( src + c );
}
}
return res;
}
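The commented-out allCombinationRemovingOneLetter could follow the same pattern. Here is a rough sketch; note that, unlike the commented-out call, I'm assuming it also receives the validWords set so that it only keeps known words, and the wrapper class DeletionCandidates is just there to make the snippet compile on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper class so the sketch is self-contained.
class DeletionCandidates {

    // Tries removing each single character and keeps only the results
    // that are valid dictionary words (assumed signature, see note above).
    static List<String> allCombinationRemovingOneLetter(String src, Set<String> validWords) {
        // LinkedHashSet: deleting either letter of a doubled letter gives the same word
        Set<String> res = new LinkedHashSet<String>();
        for (int i = 0; i < src.length(); i++) {
            String candidate = src.substring(0, i) + src.substring(i + 1);
            if (validWords.contains(candidate)) {
                res.add(candidate); // only candidates we know are valid words
            }
        }
        return new ArrayList<String>(res);
    }

    public static void main(String[] args) {
        Set<String> validWords = new HashSet<String>(Arrays.asList("misspelled", "boing"));
        System.out.println(allCombinationRemovingOneLetter("misspellled", validWords));
    }
}
```

As with the insertion case, at most src.length() strings are ever built, and none of them is kept unless it is in the dictionary.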
One thing you could try is to increase Java's heap size in order to get past the OutOfMemoryError.
The following article explains how to increase the heap size in Java:
http://viralpatel.net/blogs/2009/01/jvm-java-increase-heap-size-setting-heap-size-jvm-heap.html
But I think the better way to address your problem is to find a better algorithm than the current one.
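If you do try the heap route first, the maximum heap is set with the standard -Xmx JVM option, for example java -Xmx1g SpellChecker (SpellChecker being a placeholder for your own main class). A small sketch to check which limit the JVM actually picked up; the class name HeapInfo is made up:

```java
// Hypothetical helper: reports the maximum heap the running JVM will try to use.
class HeapInfo {

    static long maxHeapMegabytes() {
        // Runtime.maxMemory() returns the limit in bytes
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Run it once with and once without -Xmx to confirm the setting is being applied.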
Well, without more information on the topic there is not much the community can do for you... You can start with the following:
Look at what your profiler says (after it has run for a little while): does anything pile up? Are there a lot of objects of one kind? This should normally give you a hint about what is wrong with your code.
Publish your saved dump somewhere and link it in your question, so someone else can take a look at it.
Tell us which profiler you are using, then somebody can give you hints on where to look for valuable information.
After you have narrowed down your problem to a specific part of your code, and you cannot figure out why there are so many objects of $FOO in memory, post a snippet of the relevant part.