Java sorting with collections plus manual sorting

I am writing a console application that calculates various prices using hashtables. It writes the prices out with a class called Priceprint. I use hashtables for the rest of the program because order is not particularly important, but I order the keys before creating the output list. The code puts them in order by copying the keys into a vector, sorting the vector with Collections.sort(), and then manually swapping the first and second elements with the entries whose keys are exchange and special. It then uses an Enumeration to walk the vector and calls another function to write each entry to the screen.
public void out(Hashtable<String, Double> b, Hashtable<String, Double> d) {
    Vector<String> v;
    Enumeration<String> k;
    String te1, te2, e;
    int ex, sp;

    v = new Vector<String>(d.keySet());
    Collections.sort(v);

    te1 = new String(v.get(0));
    ex = v.indexOf("exchange");
    v.set(ex, te1); v.set(0, "exchange");

    te2 = new String(v.get(1));
    ex = v.indexOf("special");
    v.set(ex, te2); v.set(1, "special");

    if (msgflag == true)
        System.out.println("Listing Bitcoin and dollar prices.");
    else {
        System.out.println("Listing Bitcoin and dollar prices, "
                + message + ".");
        msgflag = true;
    }

    k = v.elements();
    while (k.hasMoreElements()) {
        e = new String(k.nextElement());
        out(e, d.get(e), b.get(e));
    }
}
Now, the problem I've run into, through lack of thought alone, is that I both swap the entries and sort the list in alphabetical order of its keys. So when I run the program, exchange and special are at the top, but the rest of the list is no longer in order. I might have to scrap the essential design, where lists are output through the code for single entries, with the keys exchange and special coming to the top but every other part of the list staying in order. It's a shame, because pretty much all of it might need to go, and I really liked the design.
Here is the full code, ignore the fact I'm using constructors on a class that evidently should be using static methods but overlooked that: http://pastebin.com/CdwhcV2L
Here is the code using Printprice to create a list of prices to test another part of the program but also Printprice lists: http://pastebin.com/E2Fq13zF
Output:
john@fekete:~/devel/java/pricecalc$ java backend.test.Test
I test CalcPrice, but I also test Printprice(Hashtable, Hashtable, String).
Listing Bitcoin and dollar prices, for unit test, check with calculator.
Exchange rate is $127.23 (USDBTC).
Special is 20.0%.
privacy: $2.0 0.0126BTC, for unit test, check with calculator.
quotaband: $1.5 0.0094BTC, for unit test, check with calculator.
quotahdd: $5.0 0.0314BTC, for unit test, check with calculator.
shells: $5.0 0.0314BTC, for unit test, check with calculator.
hosting: $10.0 0.0629BTC, for unit test, check with calculator.

The problem appears to be that you are putting the first and second elements back into the vector at the locations that "exchange" and "special" came from, instead of removing "exchange" and "special" from the vector and inserting them at the top of the vector.
Doing this correctly would be more efficient with a LinkedList instead of a Vector. To carry out the required operations, assuming v is a List:
v.add(0, v.remove(v.indexOf("special")));
v.add(0, v.remove(v.indexOf("exchange")));
This should put "exchange" first, "special" second and the rest of the list will remain in sorted order afterwards.
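For completeness, here is a minimal self-contained sketch of that fix (the class name and the sample keys are made up for illustration; only the two add/remove lines come from the answer above):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReorderDemo {
    public static void main(String[] args) {
        List<String> v = new ArrayList<>(List.of(
                "privacy", "exchange", "shells", "special", "hosting"));
        Collections.sort(v);
        // Pull each pinned key out of its sorted position and
        // reinsert it at the front: "special" first, then "exchange".
        v.add(0, v.remove(v.indexOf("special")));
        v.add(0, v.remove(v.indexOf("exchange")));
        // Prints [exchange, special, hosting, privacy, shells]:
        // the pinned keys lead, the rest stays alphabetically sorted.
        System.out.println(v);
    }
}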

Related

Understanding JavaPairRDD.reduceByKey function

I came across the following code snippet from Apache Spark:
JavaRDD<String> lines = new JavaSparkContext(sparkSession.sparkContext()).textFile("src\\main\\resources\\data.txt");
JavaPairRDD<String, Integer> pairs = lines.mapToPair(s -> new Tuple2<>(s, 1));
System.out.println(pairs.collect());
JavaPairRDD<String, Integer> counts = pairs.reduceByKey((a, b) -> a + b);
System.out.println("Reduced data: " + counts.collect());
My data.txt is as follows:
Mahesh
Mahesh
Ganesh
Ashok
Abnave
Ganesh
Mahesh
The output is:
[(Mahesh,1), (Mahesh,1), (Ganesh,1), (Ashok,1), (Abnave,1), (Ganesh,1), (Mahesh,1)]
Reduced data: [(Ganesh,2), (Abnave,1), (Mahesh,3), (Ashok,1)]
While I understand how the first line of output is obtained, I don't understand how the second line is obtained, that is, how the JavaPairRDD<String, Integer> counts is formed by reduceByKey.
I found that the signature of reduceByKey() is as follows:
public JavaPairRDD<K,V> reduceByKey(Function2<V,V,V> func)
The [signature](http://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/api/java/function/Function2.html#call(T1, T2)) of Function2.call() is as follows:
R call(T1 v1, T2 v2) throws Exception
The explanation of reduceByKey() reads as follows:
Merge the values for each key using an associative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/ parallelism level.
Now this explanation sounds somewhat confusing to me. Maybe there is something more to the functionality of reduceByKey(). By looking at the input and output of reduceByKey() and Function2.call(), I feel that reduceByKey() somehow sends values of the same key to call() in pairs. But that simply isn't clear. Can anyone explain precisely how reduceByKey() and Function2.call() work together?
As its name implies, reduceByKey() reduces data based on the lambda function you pass to it.
In your example, this function is a simple adder: for a and b, return a + b.
The best way to understand how the result is formed is to imagine what happens internally. The ByKey() part groups your records based on their key values. In your example, you'll have 4 different sets of pairs:
Set 1: ((Mahesh, 1), (Mahesh, 1), (Mahesh, 1))
Set 2: ((Ganesh, 1), (Ganesh, 1))
Set 3: ((Ashok, 1))
Set 4: ((Abnave, 1))
Now, the reduce part will try to reduce the previous 4 sets using the lambda function (the adder):
For Set 1: (Mahesh, 1 + 1 + 1) -> (Mahesh, 3)
For Set 2: (Ganesh, 1 + 1) -> (Ganesh, 2)
For Set 3: (Ashok , 1) -> (Ashok, 1) (nothing to add)
For Set 4: (Abnave, 1) -> (Abnave, 1) (nothing to add)
Function signatures can sometimes be confusing, as they tend to be more generic.
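To make the pairwise application concrete, here is a small plain-Java simulation (this is not Spark code; the class and method names are made up for illustration) that groups pairs by key and folds each key's values two at a time, which is exactly the two-values-in, one-value-out contract of Function2.call(v1, v2):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReduceByKeyDemo {
    // Applies the reducer pairwise over each key's values,
    // the way reduceByKey applies Function2.call(v1, v2).
    static <K, V> Map<K, V> reduceByKey(List<Map.Entry<K, V>> pairs,
                                        BinaryOperator<V> reducer) {
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            // merge() calls reducer(oldValue, newValue) when the key
            // is already present: two values in, one value out.
            result.merge(pair.getKey(), pair.getValue(), reducer);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = List.of(
                Map.entry("Mahesh", 1), Map.entry("Mahesh", 1),
                Map.entry("Ganesh", 1), Map.entry("Ashok", 1),
                Map.entry("Abnave", 1), Map.entry("Ganesh", 1),
                Map.entry("Mahesh", 1));
        // Prints the same counts as the question, e.g.
        // {Ganesh=2, Abnave=1, Mahesh=3, Ashok=1} (map order may vary).
        System.out.println(reduceByKey(pairs, (a, b) -> a + b));
    }
}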
I'm thinking that you probably understand groupByKey? groupByKey groups all values for a certain key into a list (or iterable) so that you can do something with them - like, say, sum (or count) the values. Basically, what sum does is reduce a list of many values into a single value. It does so by iteratively adding two values to yield one value, and that is what Function2 needs to do when you write your own: it takes in two values and returns one value.
reduceByKey does the same as groupByKey, BUT it performs what is called a "map-side reduce" before shuffling data around. Because Spark distributes data across many different machines to allow for parallel processing, there is no guarantee that data with the same key is placed on the same machine. Spark thus has to shuffle data around, and the more data that needs to be shuffled, the longer our computations will take, so it's a good idea to shuffle as little data as possible.
In a map-side reduce, Spark will first sum all the values for a given key locally on the executors before it sends (shuffles) the result around for the final sum to be computed. This means that much less data - a single value instead of a list of values - needs to be sent between the different machines in the cluster, and for this reason reduceByKey is most often preferable to groupByKey.
For a more detailed description, I can recommend this article :)

A pair of strings as a KEY in reduce function - HADOOP

Hello, I am implementing a Facebook-like program in Java using the Hadoop framework (I am new to this). The main idea is that I have an input .txt file like this:
Christina Bill,James,Nick,Jessica
James Christina,Mary,Toby,Nick
...
The first name is the user, and the comma-separated names are his friends.
In the map function I scan each line of the file and emit the user with each one of his friends, like
Christina Bill
Christina James
which will be converted to (Christina,[Bill,James,..])...
BUT the description of my assignment specifies that the Reduce function will receive as its key the tuple of two users, followed by both of their friend lists; you count the common friends, and if that number is equal to or greater than some set number, like 5, you can safely assume that their uncommon friends can be suggested. So how exactly do I pass a pair of users to the reduce function? I thought the input of the reduce function has to be the same as the output of the map function. I started coding this, but I don't think this is the right approach. Any ideas?
public class ReduceFunction<KEY> extends Reducer<KEY, Text, KEY, Text> {
    private Text suggestedFriend = new Text();

    public void reduce(KEY key1, KEY key2, Iterable<Text> value1,
                       Iterable<Text> value2, Context context) {
    }
}
The output of the map phase should, indeed, be of the same type as the input of the reduce phase. This means that, if there is a requirement for the input of the reduce phase, you have to change your mapper.
The idea is simple:
map(user u, friends F):
    for each f in F do
        emit (u-f, F\f)

reduce(userPair u1-u2, friends F1, F2):
    #commonFriends = |F1 intersection F2|
To implement this logic, you can just use a Text key, in which you concatenate the names of the users, using, e.g., the '-' character between them.
Note that in each reduce method, you will only receive two lists of friends, assuming that each user appears once in your input data. Then, you only have to compare the two lists for common names of friends.
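A minimal sketch of the mapper side of this idea, assuming the input format from the question; the class name FriendPairMapper and the choice to order the two names alphabetically are illustrative, not part of the assignment:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FriendPairMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input line: "Christina Bill,James,Nick,Jessica"
        String[] parts = value.toString().split("\\s+", 2);
        if (parts.length < 2) return;
        String user = parts[0];
        String[] friends = parts[1].split(",");
        for (String friend : friends) {
            // Order the two names so (A,B) and (B,A) produce the same key.
            String pairKey = user.compareTo(friend) < 0
                    ? user + "-" + friend
                    : friend + "-" + user;
            // Emit F\f: the user's friend list without the current friend.
            StringBuilder rest = new StringBuilder();
            for (String other : friends) {
                if (other.equals(friend)) continue;
                if (rest.length() > 0) rest.append(',');
                rest.append(other);
            }
            context.write(new Text(pairKey), new Text(rest.toString()));
        }
    }
}

Each reducer then receives, for a pair key like Christina-James, the two friend lists and only has to intersect them.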
Alternatively, check whether you can implement a custom record reader and read two records at once from the input file in the mapper class. Then emit context.write(outkey, NullWritable.get()); from the mapper. In the reducer you will need to handle the two records that arrive as a key (outkey) from the mapper. Good luck!

How to prevent value from changing automatically in java

I'm very new to Java and Stack Overflow, so I'm sorry if I seem ignorant, but I wrote this program to multiply two numbers together using the Russian peasant multiplication algorithm. The complete program includes far more operations and is hundreds of lines of code, but I only included what I thought was necessary for this particular method. I have a test harness, so I know that all the submethods work correctly. The problem I'm struggling with is the 3rd line, where I'm adding factor1 to the product. The values add correctly, but when factor1 is multiplied by 2 in the 5th line, the value that was added to product in the 3rd line also gets doubled, resulting in an incorrect product value. How can I make sure that when factor1 is doubled it doesn't carry backwards to the product term?
while (Long.parseLong(factor2.toString()) >= 1) {
    if (factor2.bigIntArray[0] % 2 == 1) {
        product = product.add(factor1);
    }
    factor1 = factor1.multiplyByTwo();
    factor2 = factor2.divideByTwo();
}
I think that in your multiplyByTwo method you use code like
    datamember = datamember * 2;
Rather than that, try doing this:
    return new FactorClass(datamember * 2);
so it doesn't change the added value.
It would be better if you could show the multiplyByTwo method code, since that is where the value is actually being changed.
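To illustrate the aliasing problem the answer describes, here is a runnable sketch; the class Factor and its long field are hypothetical stand-ins for the asker's big-number type:

public class Factor {
    long value; // hypothetical field; the real class stores an int array

    Factor(long value) { this.value = value; }

    // Buggy pattern: mutates in place, so anything still holding a
    // reference to this object sees the doubled value too.
    Factor multiplyByTwoInPlace() {
        value = value * 2;
        return this;
    }

    // Safe pattern: return a new object; the original never changes.
    Factor multiplyByTwo() {
        return new Factor(value * 2);
    }

    public static void main(String[] args) {
        Factor f = new Factor(3);
        Factor addedToProduct = f;                // like product keeping a reference
        f.multiplyByTwoInPlace();
        System.out.println(addedToProduct.value); // 6: the stored value changed too

        Factor g = new Factor(3);
        Factor kept = g;
        g = g.multiplyByTwo();
        System.out.println(kept.value);           // 3: the original is untouched
    }
}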

In JBehave, how do I pass an array as a parameter from a story file to a step file?

I've been reading the JBehave docs and I'm not finding anything that speaks to this specific use case. The closest I found was this on parameterised scenarios, but it's not quite what I'm looking for. I don't need to run the same logic many times with different parameters, I need to run the step logic once with a set of parameters. Specifically, I need to pass combinations of the numbers 1-4. Is there a way to do this?
Do you mean something like Tabular Parameters?
You could use it like this:
Given the numbers:
|combinations|
|1234|
|4321|
|1324|
When ...
and then:
#Given("the numbers: $numbersTable")
public void theNumbers(ExamplesTable numbersTable) {
List numbers = new ArrayList();
for (Map<String,String> row : numbersTable.getRows()) {
String combination = row.get("combinations");
numbers.add(combination);
}
}
I just rewrote the JBehave example so it fits your needs. You can pass any number of combinations into the tables inside the Given/When/Then steps and transform them into an array or, as in my example, into a list.

Xtend "Movies example" best answer

I followed the Xtend tutorial and the Movies example. At the end of this tutorial, you can find this question:
@Test def void sumOfVotesOfTop2() {
    val long sum = movies.sortBy[ -rating ].take(2).map[ numberOfVotes ].reduce[ a, b | a + b ]
    assertEquals(47_229L, sum)
}
First the movies are sorted by rating, then we take the best two. Next the list of movies is turned into a list of their numberOfVotes using the map function. Now we have a List which can be reduced to a single Long by adding the values.
You could also use reduce instead of map and reduce. Do you know how?
My question is: what is the best answer to that last question?
I found a way to compute the same "sum" value without using the map() extension method, but it seems awful to me. Here is my solution:
assertEquals(47229, this.movies.sortBy[ -rating ].take(2).reduce[m1, m2 | new Movie('', 0, 0.0, m1.numberOfVotes + m2.numberOfVotes,null)].numberOfVotes)
Is there a better (and cleaner) way to do that?
You could use fold(R seed, (R,T)=>R function) instead of reduce((T,T)=>T):
assertEquals(47229,
    movies
        .sortBy[rating]
        .reverseView
        .take(2)
        .fold(0L) [ result, movie | result + movie.numberOfVotes ])
Please note that map((T)=>R) does not perform any eager computation but is evaluated on demand, so performance should not matter for a solution that uses the map function. Nevertheless, fold is quite handy if you need to accumulate a result over a set of values where the result type differs from the element type.
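As a side note, the same fold-versus-reduce distinction exists in Java streams, shown here purely as an analogue (the Movie record and sample data are made up to mirror the tutorial): the seeded three-argument reduce plays the role of fold, letting the result type (long) differ from the element type (Movie).

import java.util.Comparator;
import java.util.List;

public class FoldVsReduce {
    // Hypothetical stand-in for the tutorial's Movie class.
    record Movie(String title, long numberOfVotes, double rating) {}

    public static void main(String[] args) {
        List<Movie> movies = List.of(
                new Movie("A", 40_000L, 9.1),
                new Movie("B", 7_229L, 8.9),
                new Movie("C", 100L, 5.0));
        // Seeded reduce is Java's fold: accumulate a long over Movie elements.
        long sum = movies.stream()
                .sorted(Comparator.comparingDouble(Movie::rating).reversed())
                .limit(2)
                .reduce(0L, (acc, m) -> acc + m.numberOfVotes(), Long::sum);
        System.out.println(sum); // 47229
    }
}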
