I want to code the following in Java.
I have the following customer file.
Name acct spending
BigGuy a1 30
BigGuy a2 40
smallGuy a1 300
smallGuy a2 400
smallGuy a3 400
AMR a1 300
AMR a2 400
I need to read the above file and calculate the total for each customer to produce the following.
BigGuy 70
SmallGuy 1100
AMR 700
F = open('spendFile.txt', 'r')
arr = []
for c in F.read().split():  # turns the whole file into a list of each word
    arr.append(c)

people = []
amount = []
for i in range(0, len(arr) // 3):  # each record is three words: name, acct, spending
    name = arr[i * 3]
    a = arr[i * 3 + 2]
    if name not in people:
        people.append(name)
        amount.append(int(a))
    else:
        index = people.index(name)
        old = amount[index]
        new = old + int(a)
        amount[index] = new

for i in range(0, len(people)):
    print(people[i] + " " + str(amount[i]))
I put the data you mentioned earlier in your post into a text file called 'spendFile.txt'. The program reads that file and places each word into a list. It then iterates over the list, grabbing each name with its corresponding amount, checks those values against the current lists of people and amounts, and finally prints the total amount corresponding to each name. Let me know if you have any questions.
EDIT:
I did not notice that your original file had a first line of 'Name acct spending', and my program does not account for that. So you'll have to get rid of that line when testing, and then programmatically throw it out when actually implementing this.
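Since the question asks for Java, here is a minimal sketch of the same approach in Java, using a LinkedHashMap to keep a running total per customer; it assumes the file is called spendFile.txt and skips the header line programmatically:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SpendTotals {
    public static void main(String[] args) throws IOException {
        // LinkedHashMap keeps customers in the order they first appear
        Map<String, Integer> totals = new LinkedHashMap<>();
        try (BufferedReader br = new BufferedReader(new FileReader("spendFile.txt"))) {
            br.readLine(); // throw away the header line "Name acct spending"
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.trim().split("\\s+"); // name, acct, spending
                if (parts.length < 3) continue; // skip blank or malformed lines
                totals.merge(parts[0], Integer.parseInt(parts[2]), Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}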
So, this is my question that I have to answer:
In the Payroll class, add a method with the header private int
computePay(Programmer p). The method should return the programmer’s
grade multiplied by the number of hours they worked. For example, if
the programmer’s grade is 2, and the total hours they worked is 6, the
method should return 12.
But my question for this forum is: how do I get a .txt file which contains Firstname Secondname,paygrade(out of 3),hours,hours,hours,hours,hours,hours
(.txt example:
Sean Dyke,3,34,54,67,78,34,12
Fred Flintsone,1,65,78,89,89,34,23
Scooby Doo,2,54,56,67,87,89,65
)
To make the grade part separate from the hours they work, so I can then use
private int computePay(Programmer p){
    return p.grade * p.hours;
}
I may have confused myself in this one, or be thinking backwards, but any sort of guidelines would help.
The grade comes right after the name, so you can split the line and take the values:
String[] temp = array1.split(",");
// temp[0] gives the name
// temp[1] gives the grade
// temp[2...] gives the hours worked
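A sketch of how that split could feed into computePay, assuming a simple Programmer class whose field names (grade, hours) are hypothetical and should be adapted to your assignment:

// Hypothetical Programmer class; the field names are assumptions
class Programmer {
    String name;
    int grade;  // pay grade, out of 3
    int hours;  // total hours worked

    Programmer(String line) {
        String[] temp = line.split(",");
        name = temp[0];                          // e.g. "Sean Dyke"
        grade = Integer.parseInt(temp[1]);       // the grade comes right after the name
        for (int i = 2; i < temp.length; i++) {  // sum the remaining hour columns
            hours += Integer.parseInt(temp[i]);
        }
    }
}

class Payroll {
    private int computePay(Programmer p) {
        return p.grade * p.hours;
    }
}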
Well there are 2 ways of storing data in a .txt file:
You can put your object into an object stream and simply write that into a text file:
FileOutputStream out = new FileOutputStream("test.txt");
ObjectOutputStream oout = new ObjectOutputStream(out);
// write something into the file (the object s must implement Serializable)
oout.writeObject(s);
// close the stream
oout.close();
Sadly the formatting would be really bad and you could not read the data on your own. That method is also really brittle: your class would need to implement Serializable, and furthermore any change to your Java code would make it impossible to read the file back.
So I would go for an approach where you just store the name, followed by the grade and those hours, and then read it back line by line:
String s = "Sean Dyke,3,34,54,67,78,34,12"
String[] entries = s.split(",");
String name = entries[0];
Integer[] hours = new Integer[entries.length -1];
for(int i = 0; i < hours; i++){
hours[i] = Integer.parseInt(entries[i + 1]);
}
This code parses one line. Do it for every line in your .txt file (a sketch follows) and you should be fine.
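A minimal sketch of that per-line loop, assuming the input file is called payroll.txt (the file name is an assumption):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadPayroll {
    public static void main(String[] args) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader("payroll.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] entries = line.split(",");
                String name = entries[0];
                int grade = Integer.parseInt(entries[1]);
                int totalHours = 0;
                for (int i = 2; i < entries.length; i++) {
                    totalHours += Integer.parseInt(entries[i]); // sum all hour columns
                }
                System.out.println(name + ": pay = " + grade * totalHours);
            }
        }
    }
}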
I'm trying to solve a calculation problem in Java.
Suppose my data looks as follows:
466,2.0762
468,2.0799
470,2.083
472,2.0863
474,2.09
476,2.0939
478,2.098
It's a list of ordered pairs, in the form of [int],[double]. Each line in my file contains one pair. The file can contain seven to seven thousand of those lines, all of them formatted as plain text.
From each [int], the [int] one line above it must be subtracted, and the result written to another file. The same calculation must be done for every [double]. For example, with the data reported above, the calculation should be:
478-476 -> result to file
476-474 -> result to file
(...)
2.098-2.0939 -> result to file
2.0939-2.09 -> result to file
and so on.
I beg your pardon if this question looks trivial to the vast majority of you, but after weeks of trying to solve it, I have gotten nowhere. I also had trouble finding anything even remotely similar on this board!
Any help will be appreciated.
Thanks!
1. Read the file
2. Build the result
3. Write to a file
For task 1 there are already several good answers here; for example, try this one: Reading a plain text file in Java.
As you can see there, we can read a file line by line and build a List<String> containing the lines of your file.
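For instance, a minimal sketch using java.nio (the file name input.txt is an assumption; this throws IOException, which you would handle or declare):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// read every line of the file into a list of strings
List<String> inputLines = Files.readAllLines(Paths.get("input.txt"));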
For task 2, let's iterate through all the lines and build the result, again as a List<String>.
List<String> inputLines = ...
List<String> outputLines = new LinkedList<String>();
int lastInt = 0;
int lastDouble = 0;
boolean firstValue = true;
for (String line : inputLines) {
// Split by ",", then values[0] is the integer and values[1] the double
String[] values = line.split(",");
int currentInt = Integer.parseInt(values[0]);
double currentDouble = Double.parseDouble(values[1]);
if (firstValue) {
// Nothing to compare to on the first run
firstValue = false;
} else {
// Compare to last values and build the result
int diffInt = lastInt - currentInt;
double diffDouble = lastDouble - currentDouble;
String outputLine = diffInt + "," + diffDouble;
outputLines.add(outputLine);
}
// Current values become last values
lastInt = currentInt;
lastDouble = currentDouble;
}
For task 3 there are again good solutions on SO. You need to iterate through outputLines and save each line to a file: How to create a file and write to a file in Java?
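A minimal sketch of that last step, again using java.nio (the file name output.txt is an assumption):

import java.nio.file.Files;
import java.nio.file.Paths;

// write every computed line to the output file
Files.write(Paths.get("output.txt"), outputLines);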
This is a university assignment (a sample academic report). I thought I was done and about to submit, but when I started testing... I keep receiving an ArrayIndexOutOfBoundsException on line 60 in main, and I cannot see why. I am new to Java but have really put a lot of hours into this program. Any help/advice is much appreciated.
line 60 = "int credits = Integer.parseInt(input[1]);" // I'm thinking the error is something to do with data types? I'm lost.
The Course / Grade / Report classes pass data to the main class, java2pgm1.
When you call split, it returns an array. Here you are splitting on ":".
You need to check the length of input before accessing its elements:
String[] input = course.split(":");
int credits = Integer.parseInt(input[1]);
The input array may contain fewer than two values, so accessing input[1] fails.
The exception will occur when the input from a user does not correspond to the expected format course_number:number_of_credits:grade_received:term_taken. In your case, for what input value are you running into this exception? Does it contain a ':'?
I suggest that you test the length of the input array before referencing input[n]:
String[] input = course.split(":");
int credits = Integer.parseInt(input[1]);
Integer term = Integer.parseInt(input[3]);
Course cObject = new Course(input[0],credits,input[2],input[3]);
The above snippet in your main always assumes that the course String looks like abc:def:ghi:jkl, i.e. has at least three ':' characters in it. It is always good practice to handle the error case where the string doesn't have three ':'. Modify your code to something like below:
String[] input = course.split(":");
if (input.length == 4)
{
    int credits = Integer.parseInt(input[1]);
    Integer term = Integer.parseInt(input[3]);
    Course cObject = new Course(input[0], credits, input[2], input[3]);
}
else
{
    // show some error message to the user
}
Here the size of the input array may be 0 or 1; you can check it via input.length. If the size of the array is less than or equal to the index you want to access, the runtime exception ArrayIndexOutOfBoundsException is thrown.
I have an application which accesses about 2 million tweets from a MySQL database. Specifically, one of the fields holds the tweet text (with a maximum length of 140 characters). I am splitting every tweet into word n-grams, where 1 <= n <= 3. For example, consider the sentence:
I am a boring sentence.
The corresponding n-grams are:
I
I am
I am a
am
am a
am a boring
a
a boring
a boring sentence
boring
boring sentence
sentence
With about 2 million tweets, I am generating a lot of data. Regardless, I am surprised to get a heap error from Java:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2145)
at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1922)
at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3423)
at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:483)
at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3118)
at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2288)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2709)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2728)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2678)
at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1612)
at twittertest.NGramFrequencyCounter.FreqCount(NGramFrequencyCounter.java:49)
at twittertest.Global.main(Global.java:40)
Here is the problem code statement (line 49), as given by the above output from NetBeans:
results = stmt.executeQuery("select * from tweets");
So if I am running out of memory, it must be that the driver is trying to return all the results at once and store them in memory. What is the best way to solve this problem? Specifically, I have the following questions:
How can I process pieces of results rather than the whole set?
How would I increase the heap size? (If this is possible)
Feel free to include any suggestions, and let me know if you need more information.
EDIT
Instead of select * from tweets, I partitioned the table into equally sized subsets of about 10% of the total size and tried running the program again. It looked like it was working fine, but it eventually gave me the same heap error. This is strange to me because I have run the same program successfully in the past with 610,000 tweets. Now I have about 2,000,000 tweets, roughly three times as much data, so even if splitting the data into thirds should have been enough, I went further and split the subsets down to 10% each.
Is some memory not being freed? Here is the rest of the code:
results = stmt.executeQuery("select COUNT(*) from tweets");
int num_tweets = 0;
if(results.next())
{
num_tweets = results.getInt(1);
}
int num_intervals = 10; //split into equally sized subets
int interval_size = num_tweets/num_intervals;
for(int i = 0; i < num_intervals-1; i++) //process 10% of the data at a time
{
results = stmt.executeQuery( String.format("select * from tweets limit %s, %s", i*interval_size, (i+1)*interval_size));
while(results.next()) //for each row in the tweets database
{
tweetID = results.getLong("tweet_id");
curTweet = results.getString("tweet");
int colPos = curTweet.indexOf(":");
curTweet = curTweet.substring(colPos + 1); //trim off the RT and retweeted
if(curTweet != null)
{
curTweet = removeStopWords(curTweet);
}
if(curTweet == null)
{
continue;
}
reader = new StringReader(curTweet);
tokenizer = new StandardTokenizer(Version.LUCENE_36, reader);
//tokenizer = new StandardFilter(Version.LUCENE_36, tokenizer);
//Set stopSet = StopFilter.makeStopSet(Version.LUCENE_36, stopWords, true);
//tokenizer = new StopFilter(Version.LUCENE_36, tokenizer, stopSet);
tokenizer = new ShingleFilter(tokenizer, 2, 3);
charTermAttribute = tokenizer.addAttribute(CharTermAttribute.class);
while(tokenizer.incrementToken()) //insert each nGram from each tweet into the DB
{
insertNGram.setInt(1, nGramID++);
insertNGram.setString(2, charTermAttribute.toString().toString());
insertNGram.setLong(3, tweetID);
insertNGram.executeUpdate();
}
}
}
Don't get all the rows from the table at once. Select partial data based on your requirements by setting limits on the query. Since you are using a MySQL database, your query would be select * from tweets limit 0,10. Here 0 is the starting row and 10 requests 10 rows from that start (a sketch of such a paging loop follows).
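A minimal sketch of such a paging loop with a PreparedStatement (the table name is taken from the question; note the row count stays fixed while only the offset advances):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// fetch the table in fixed-size batches instead of all at once
static void processInBatches(Connection conn, int batchSize) throws SQLException {
    try (PreparedStatement ps = conn.prepareStatement("select * from tweets limit ?, ?")) {
        int offset = 0;
        int rows;
        do {
            ps.setInt(1, offset);    // starting row
            ps.setInt(2, batchSize); // number of rows to fetch
            rows = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows++;
                    // process one tweet here
                }
            }
            offset += batchSize;
        } while (rows == batchSize); // stop when a batch comes back short
    }
}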
You can always increase the heap size available to your JVM using the -Xmx argument (for example, -Xmx2g). You should read up on all the knobs available to you (e.g. the perm gen size). Google for the other options or read this SO answer.
You probably can't do this kind of problem with a 32-bit machine. You'll want 64 bits and lots of RAM.
Another option would be to treat it as a map-reduce problem. Solve it on a cluster using Hadoop and Mahout.
Have you considered streaming the result set? Halfway down the page linked below is a section on ResultSet, and it addresses your problem (I think?). Write the n-grams to a file, then process the next row. Or am I misunderstanding your problem?
http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
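For reference, here is a minimal sketch of how streaming is enabled according to those Connector/J implementation notes: create a forward-only, read-only Statement and set the fetch size to Integer.MIN_VALUE, and the driver will stream rows one at a time instead of buffering the whole result set in memory:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

// conn is an open Connection to the MySQL database
Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                      ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE); // signals Connector/J to stream the results
ResultSet rs = stmt.executeQuery("select * from tweets");
while (rs.next()) {
    // process one row, write its n-grams out, then move on to the next
}
rs.close();
stmt.close();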
I'm trying to create an "automated training" setup using Weka's Java API, but I guess I'm doing something wrong. Whenever I test my ARFF file via Weka's interface, using MultiLayerPerceptron with 10-fold cross-validation or a 66% percentage split, I get satisfactory results (around 90%), but when I try to test the same file via Weka's API, every test returns basically a 0% match (every row returns false).
Here's the output from Weka's GUI:
=== Evaluation on test split ===
=== Summary ===
Correctly Classified Instances 78 91.7647 %
Incorrectly Classified Instances 7 8.2353 %
Kappa statistic 0.8081
Mean absolute error 0.0817
Root mean squared error 0.24
Relative absolute error 17.742 %
Root relative squared error 51.0603 %
Total Number of Instances 85
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
0.885 0.068 0.852 0.885 0.868 0.958 1
0.932 0.115 0.948 0.932 0.94 0.958 0
Weighted Avg. 0.918 0.101 0.919 0.918 0.918 0.958
=== Confusion Matrix ===
a b <-- classified as
23 3 | a = 1
4 55 | b = 0
and here's the code I've been using in Java (actually it's .NET using IKVM):
var classifier = new weka.classifiers.functions.MultilayerPerceptron();
// these are the same (default) options used when the test is run under the weka gui
classifier.setOptions(weka.core.Utils.splitOptions("-L 0.7 -M 0.3 -N 75 -V 0 -S 0 -E 20 -H a"));

string trainingFile = Properties.Settings.Default.WekaTrainingFile; // the path to the same file I use to test in the weka explorer
weka.core.Instances data = null;
data = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(trainingFile))); // loads the file
data.setClassIndex(data.numAttributes() - 1); // set the last column as the class attribute
classifier.buildClassifier(data);

// create a temp arff file with a single row holding the instance I want to test,
// taken from the arff file loaded previously
var tmp = System.IO.Path.GetTempFileName();
using (var f = System.IO.File.CreateText(tmp))
{
    // long code to read data from db and regenerate the line,
    // simulating data coming from the source I really want to test
}

var dataToTest = new weka.core.Instances(new java.io.BufferedReader(new java.io.FileReader(tmp)));
dataToTest.setClassIndex(dataToTest.numAttributes() - 1);

double prediction = 0;
for (int i = 0; i < dataToTest.numInstances(); i++)
{
    weka.core.Instance curr = dataToTest.instance(i);
    weka.core.Instance inst = new weka.core.Instance(data.numAttributes());
    inst.setDataset(data);
    for (int n = 0; n < data.numAttributes(); n++)
    {
        weka.core.Attribute att = dataToTest.attribute(data.attribute(n).name());
        if (att != null)
        {
            if (att.isNominal())
            {
                if ((data.attribute(n).numValues() > 0) && (att.numValues() > 0))
                {
                    String label = curr.stringValue(att);
                    int index = data.attribute(n).indexOfValue(label);
                    if (index != -1)
                        inst.setValue(n, index);
                }
            }
            else if (att.isNumeric())
            {
                inst.setValue(n, curr.value(att));
            }
            else
            {
                throw new InvalidOperationException("Unhandled attribute type!");
            }
        }
    }
    prediction += classifier.classifyInstance(inst);
}
// prediction is always 0 here; my ARFF file has two classes, 0 and 1: 92 zeroes and 159 ones
It's funny, because if I change the classifier to, say, NaiveBayes, the results match the test made via Weka's GUI.
You are using a deprecated way of reading in ARFF files. See this documentation. Try this instead:
import weka.core.converters.ConverterUtils.DataSource;
...
DataSource source = new DataSource("/some/where/data.arff");
Instances data = source.getDataSet();
Note that the documentation also shows how to connect to a database directly and bypass the creation of temporary ARFF files. You could, additionally, read from the database and manually create instances to populate the Instances object.
Finally, if simply changing the classifier type at the top of the code to NaiveBayes solves the problem, then check the options in your Weka GUI for MultilayerPerceptron to see whether they differ from the defaults (different settings can cause the same classifier type to produce different results).
Update: it looks like you're using different test data in your code than in your Weka GUI (data from a database vs. a fold of the original training file); it might also be the case that the particular data in your database actually does look like class 0 to the MLP classifier. To verify whether this is the case, you can use the Weka interface to split your training ARFF into train/test sets, and then repeat the original experiment in your code. If the results are the same as in the GUI, there's a problem with your data; if the results are different, then we need to look more closely at the code. The function you would call is this (from the docs):
public Instances trainCV(int numFolds, int numFold)
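A short usage sketch, assuming data is the Instances object loaded from your training ARFF and classifier is your MultilayerPerceptron (randomize first so the folds are shuffled):

import java.util.Random;
import weka.core.Instances;

// split the dataset into one train/test fold pair (here fold 0 of 10)
int numFolds = 10;
data.randomize(new Random(42));             // shuffle with a fixed seed
Instances train = data.trainCV(numFolds, 0);
Instances test = data.testCV(numFolds, 0);

classifier.buildClassifier(train);          // train on the other 9 folds
for (int i = 0; i < test.numInstances(); i++) {
    double pred = classifier.classifyInstance(test.instance(i));
    // compare pred against test.instance(i).classValue()
}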
I had the same problem: Weka gave me different results in the Explorer compared to a cross-validation in Java. Something that helped:
Instances dataSet = ...;
// call stratify before splitting the dataset into train and test sets!
dataSet.stratify(numOfFolds);