Get the highest key from a JavaPairRDD - Java

I have a JavaPairRDD called "rdd", with tuples defined as:
<Integer, String[]>
I want to extract the highest key using the max() function, but it requires a Comparator as an argument. Could you give me an example of how to do it, please?
example:
rdd = {(22,[ff,dd]), (8,[hh,jj]), (6,[rr,tt]), ...}
After applying rdd.max(...), it should give me:
int max_key = 22;
In Java, please.

Your approach isn't working because tuples don't have an inherent ordering.
What you're trying to do is get the maximum of the keys. The easiest way to do this would be to extract the keys and then take the max, like so:
keyRdd = rdd.keys()
max_key = keyRdd.max()
Note: I'm not a Java Spark user, so the syntax may be a bit off.
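The not-serializable exception mentioned below typically happens because Spark ships the comparator to worker nodes, so it must implement Serializable; an anonymous Comparator that captures a non-serializable enclosing object fails that requirement. Here's a plain-Java sketch of a comparator that implements both interfaces (the class name and the use of Map.Entry in place of Spark's Tuple2 are my own illustration, so it runs without a cluster):

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;

public class MaxKeyExample {
    // Spark ships comparators to worker nodes, so they must be Serializable;
    // a named class implementing both interfaces satisfies that.
    static class KeyComparator implements Comparator<Map.Entry<Integer, String[]>>, Serializable {
        @Override
        public int compare(Map.Entry<Integer, String[]> a, Map.Entry<Integer, String[]> b) {
            return Integer.compare(a.getKey(), b.getKey());
        }
    }

    // Find the entry with the highest key using the serializable comparator.
    static int maxKey(Map<Integer, String[]> rdd) {
        return Collections.max(rdd.entrySet(), new KeyComparator()).getKey();
    }

    public static void main(String[] args) {
        Map<Integer, String[]> rdd = Map.of(
                22, new String[]{"ff", "dd"},
                8,  new String[]{"hh", "jj"},
                6,  new String[]{"rr", "tt"});
        System.out.println(maxKey(rdd)); // prints 22
    }
}
```

With Spark, the same comparator class could be passed to rdd.max(new KeyComparator()) on a Tuple2-typed comparator instead.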

Even though David's answer is logical, it didn't work for me: max() always requires a Comparator, and when I passed one in I got a not-serializable exception. I then tried Ordering, but this time the max key came out as 1 (the minimum, in fact). So finally I used the easiest way of all: I sorted my pair RDD in descending order and then extracted the first() tuple.
int max_key = rdd.first()._1;

Related

Iterating through LinkedHashMap Values Isn't Ordered?

For a problem I'm designing in Java, I'm given a list of dates and winning lottery numbers. I'm supposed to do things with them and spit them back out in order. I decided to use a LinkedHashMap, with a Date key holding the date and an int[] value holding the array of winning numbers.
Thing is, when I run the values() method, I notice the numbers are no longer ordered (by insertion). The code I'm running is:
for (int i = 0; i < 30; i++) { // testing first 30 to see if ordered
    System.out.println(Arrays.toString((int[]) winningNumbers.values().toArray()[i]));
}
Can anyone see what exactly I'm doing wrong? I'm almost tempted to just use get() and iterate through the dates, since the dates do go in some order, but that might make using a LinkedHashMap moot; I might as well be using a 2-D ArrayList in that case. Thanks in advance!
EDIT: Adding code for further examination!
Lottery Class: http://pastebin.com/9ezF5U7e
Text file: http://pastebin.com/iD8jm7f8
To test, call checkOldLTNums(). Just pass it any int[] array; it doesn't use it, at least not for this problem. The output differs from the first lines in the .txt file, which is organized. Thanks!
EDIT2:
I think I figured out why it fails. I used a smaller .txt file, and the code worked perfectly. It might be that it isn't wise to load 1900 entries into memory at once; I suppose it's better to load individual lines and compare them instead of grabbing everything at once. Is this logic sound? Any advice going from here would be useful.
The values() method will return the Map's values in the same order they were inserted; that's what ordered means. Don't confuse it with sorted, which means something different: that the items follow the ordering defined by the values' Comparator, or their natural ordering if no comparator was provided.
Perhaps you'd be better off using a SortedMap, which keeps the items sorted by key. To summarise: LinkedHashMap is ordered, and TreeMap is sorted.
Try using a TreeMap instead. You should get the values in ascending key order that way.
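To make the ordered-vs-sorted distinction concrete, here is a minimal sketch (the class name and sample keys are made up for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class OrderedVsSorted {
    // Put the same entries into any Map implementation in the same insertion order.
    static <M extends Map<String, Integer>> M fill(M map) {
        for (String k : new String[]{"banana", "apple", "cherry"}) {
            map.put(k, k.length());
        }
        return map;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order; TreeMap sorts by key.
        System.out.println(fill(new LinkedHashMap<>()).keySet()); // [banana, apple, cherry]
        System.out.println(fill(new TreeMap<>()).keySet());       // [apple, banana, cherry]
    }
}
```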
I can't believe I missed this. I did some more troubleshooting and realized some of the Date keys were replacing others. After some testing, I finally realized that the format I was using was off: "dd/MM/yyyy" instead of "MM/dd/yyyy". That's what I get for copy-pasting, haha. Thanks to all who helped!

How to get common values from two different maps in the fastest possible way?

I have two HashMaps. I want to find the values that are common to both maps. One way is to iterate through the first map, get each value, and match it against the values of the other map through another iteration. But this takes a lot of time. Is there any other way to find the common values as fast as possible?
firstMap.keySet().retainAll(secondMap.keySet()) does what you want.
I am still not sure that this is really the fastest way. If you can control the population of these two maps, you could build a third map that accumulates the shared keys as the data is populated.
I haven't tried this, and I'm not sure whether it will be faster or not, but you could consider converting the HashMaps to HashSets and then calling set1.retainAll(set2).
Obviously a very old post, but if anyone else is looking for the answer for when the values specifically have to match up, you can do something like this:
map1.entrySet().retainAll(map2.entrySet());
If you use keySet() you'll retain common keys, which has its own use, but if you want to match keys AND values, use entrySet().
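A short sketch of the entrySet() approach (the class and method names are my own; the copy avoids mutating either input, since retainAll modifies the set it is called on):

```java
import java.util.HashMap;
import java.util.Map;

public class CommonEntries {
    // Returns the entries (key AND value equal) present in both maps,
    // copying the first map so neither input is mutated.
    static <K, V> Map<K, V> commonEntries(Map<K, V> a, Map<K, V> b) {
        Map<K, V> result = new HashMap<>(a);
        result.entrySet().retainAll(b.entrySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> map1 = Map.of("a", 1, "b", 2, "c", 3);
        Map<String, Integer> map2 = Map.of("a", 1, "b", 99, "d", 4);
        // "b" is dropped because its values differ; "a" survives.
        System.out.println(commonEntries(map1, map2)); // prints {a=1}
    }
}
```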

What's the best way to remove duplicate URI params from a string?

I have a string, root?param1=...&param2=...&param3=..., and I want to write a Java method that will remove any duplicate params. The values will always be the same, but sometimes the params are duplicated as part of the application's function (don't ask). Therefore,
HTTP://root?param1=value&param2=value2&param2=value2&param3=value3&param3=value3&param1=value&param1=value
becomes
HTTP://root?param1=value&param2=value2&param3=value3
I've been out of programming too long to remember the best way to do this, but my original train of thought went something like this:
Grab each param and stick it into a temp array, then run through the temp array and check whether array[i] equals any other param name. If so, delete it; if not, add it back to a return string. At the end of the loop, print the return string.
But that would require O(n) for the length of the URI plus roughly O(m²) for the number of params m. I think that would be pretty bad considering I'll be running this method around 5,000 times per minute across all incoming URIs. Is there a better way to go about this, or an out-of-the-box Java method that handles some of the overhead?
You could stick the key/value pairs into a Map<String,String>. That'll automatically take care of duplicated keys, and will be very easy to code up.
To verify that parameters with identical keys have identical values, you could check the return value of put(): it should be either null, or equal to the value you've just inserted.
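A sketch of that Map approach, including the put() return-value check (the class and method names here are hypothetical, and it handles only the query string, not the full URI):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupParams {
    // Hypothetical helper: collapses duplicate params in a query string,
    // using put()'s return value to flag conflicting duplicate values.
    static String dedup(String query) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps first-seen order
        for (String pair : query.split("&")) {
            String[] kv = pair.split("=", 2);
            String value = kv.length > 1 ? kv[1] : "";
            String previous = params.put(kv[0], value);
            if (previous != null && !previous.equals(value)) {
                throw new IllegalArgumentException("conflicting values for " + kv[0]);
            }
        }
        // Rebuild the query string from the deduplicated entries.
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dedup("param1=value&param2=value2&param2=value2&param1=value"));
        // prints param1=value&param2=value2
    }
}
```

This is a single O(n) pass over the params, versus the O(m²) pairwise comparison sketched in the question.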
If you insist on doing this (don't?), then you could use a Map.
For each of your parameter - value pairs, insert them into the map.
You'll be left with only unique parameters, which you can then use to rebuild your URI.
You'd be iterating your parameter - value pairs once, then iterating your Map once to rebuild the URI.
Or, like Thilo said, you could not do this and let the receiver deal with the duplicates.
For the love of God, don't write your own crufty parser for this.
Find an HTTP library which handles parameter parsing and use that.
This might do the trick: http://hc.apache.org/httpcomponents-client-ga/httpclient/apidocs/org/apache/http/client/utils/URLEncodedUtils.html
Here's my solution, using lambda expressions in Java (and Guava's Splitter):
// input: query string
// output: parameter map (Map<String, String>) in least-recently-updated order
List<String> pairs = Splitter.on("&").splitToList(queryStr);
return pairs.stream()
        .map(s -> s.split("=", 2))
        .collect(Collectors.toMap(
                pair -> pair[0],
                pair -> pair.length > 1 ? pair[1] : "",
                (formerValue, latterValue) -> latterValue,
                LinkedHashMap::new));

nth item of hashmap

HashMap<Integer, Float> selections = new HashMap<Integer, Float>();
How can I get the Integer key of the 3rd smallest Float value in the whole HashMap?
Edit
I'm using the HashMap like this:
for (InflatedRunner runner : prices.getRunners()) {
    for (InflatedMarketPrices.InflatedPrice price : runner.getLayPrices()) {
        if (price.getDepth() == 1) {
            selections.put(new Integer(runner.getSelectionId()), new Float(price.getPrice()));
        }
    }
}
I need the runner with the 3rd smallest price at depth 1. Maybe I should implement this in another way?
Michael Mrozek nails it with his question about whether you're using HashMap right: this is a highly atypical scenario for a HashMap. That said, you can do something like this:
get the Set<Map.Entry<K,V>> from HashMap<K,V>.entrySet()
addAll to a List<Map.Entry<K,V>>
Collections.sort the list with a custom Comparator<Map.Entry<K,V>> that sorts based on V
If you just need the 3rd Map.Entry<K,V> only, then an O(N) selection algorithm may suffice.
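The sort-by-value steps above can be sketched like this (the class name, method name, and sample keys/prices are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class ThirdSmallestValue {
    // Key whose value is the nth smallest (1-based) in the map.
    static int keyOfNthSmallest(Map<Integer, Float> map, int n) {
        // Copy the entries into a list and sort it by value.
        List<Map.Entry<Integer, Float>> entries = new ArrayList<>(map.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        return entries.get(n - 1).getKey();
    }

    public static void main(String[] args) {
        Map<Integer, Float> selections = Map.of(
                101, 4.5f, 102, 1.2f, 103, 3.3f, 104, 2.8f);
        // Values sorted: 1.2 (102), 2.8 (104), 3.3 (103), 4.5 (101).
        System.out.println(keyOfNthSmallest(selections, 3)); // prints 103
    }
}
```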
//after edit
It looks like selection should really be a SortedMap<Float, InflatedRunner>. You should look at java.util.TreeMap.
Here's an example of how TreeMap can be used to get the 3rd lowest key:
TreeMap<Integer,String> map = new TreeMap<Integer,String>();
map.put(33, "Three");
map.put(44, "Four");
map.put(11, "One");
map.put(22, "Two");
int thirdKey = map.higherKey(map.higherKey(map.firstKey()));
System.out.println(thirdKey); // prints "33"
Also note how I take advantage of Java's auto-boxing/unboxing feature between int and Integer. I noticed that you used new Integer and new Float in your original code; this is unnecessary.
//another edit
It should be noted that if you have multiple InflatedRunner with the same price, only one will be kept. If this is a problem, and you want to keep all runners, then you can do one of a few things:
If you really need a multi-map (one key can map to multiple values), then you can:
have TreeMap<Float,Set<InflatedRunner>>
Use MultiMap from Google Collections
If you don't need the map functionality, then just have a List<RunnerPricePair> (sorry, I'm not familiar with the domain to name it appropriately), where RunnerPricePair implements Comparable<RunnerPricePair> that compares on prices. You can just add all the pairs to the list, then either:
Collections.sort the list and get the 3rd pair
Use O(N) selection algorithm
Are you sure you're using HashMap right? It's used to quickly look up a value given a key; it's highly unusual to sort the values and then try to find a corresponding key. If anything, you should be mapping the float to the int, so you could at least sort the float keys and get the integer value of the third smallest that way.
You have to do it in steps:
Get the Collection<V> of values from the Map
Sort the values
Choose the index of the nth smallest
Think about how you want to handle ties.
You could do it with the google collections BiMap, assuming that the Floats are unique.
If you regularly need to get the key of the nth item, consider:
using a TreeMap, which efficiently keeps keys in sorted order
then using a double map (i.e. one TreeMap mapping Integer to Float, the other mapping Float to Integer)
You have to weigh up the inelegance and potential risk of bugs from needing to maintain two maps with the scalability benefit of having a structure that efficiently keeps the keys in order.
You may need to think about two keys mapping to the same float...
P.S. Forgot to mention: if this is an occasional function, and you just need to find the nth largest item of a large number of items, you could consider implementing a selection algorithm (effectively, you do a sort, but don't actually bother sorting subparts of the list that you realise you don't need to sort because their order makes no difference to the position of the item you're looking for).

Detect changes in random ordered input (hash function?)

I'm reading lines of text that can come in any order. The problem is that the output can actually be identical to the previous output. How can I detect this without sorting the output first?
Is there some kind of hash function that can take identical input, but in any order, and still produce the same result?
The easiest way would seem to be to hash each line on the way in, storing the hash and the original data, and then compare each new hash with your collection of existing hashes. If you get a positive, you can compare the actual data to make sure it's not a false positive. Since collisions are extremely rare, you could go with a quicker hash algorithm, like MD5 or CRC (instead of something like SHA, which is slower but less likely to collide), and only compare the actual data when you get a hit.
So you have input like
A B C D
D E F G
C B A D
and you need to detect that the first and third lines are identical?
If you want to find out if two files contain the same set of lines, but in a different order, you can use a regular hash function on each line individually, then combine them with a function where ordering doesn't matter, like addition.
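A minimal sketch of combining per-line hashes with addition (the class name and sample lines are illustrative; note that summing plain hashCode values weakens collision resistance, as a later answer points out):

```java
import java.util.List;

public class OrderIndependentHash {
    // Sum of per-line hashes: addition is commutative, so line order
    // doesn't affect the result.
    static long orderFreeHash(List<String> lines) {
        long sum = 0;
        for (String line : lines) {
            sum += line.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        long a = orderFreeHash(List.of("A B C D", "D E F G", "C B A D"));
        long b = orderFreeHash(List.of("C B A D", "A B C D", "D E F G"));
        System.out.println(a == b); // prints true
    }
}
```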
If the lines are fairly long, you could just keep a list of the hashes of each line -- sort those and compare with previous outputs.
If you don't need a 100% fool-proof solution, you could store the hash of each line in a Bloom filter (look it up on Wikipedia) and compare the Bloom filters at the end of processing. This can give you false positives (i.e. you think you have the same output but it isn't really the same) but you can tweak the error rate by adjusting the size of the Bloom filter...
If you add up the ASCII values of each character, you'd get the same result regardless of order.
(This may be a bit too simplified, but perhaps it sparks an idea for you.
See Programming Pearls, section 2.8, for an interesting back story.)
Any of the hash-based methods may produce bad results because more than one string can produce the same hash. (It's not likely, but it's possible.) This is particularly true of the suggestion to add the hashes, since you would essentially be taking a particularly bad hash of the hash values.
A hash method should only be attempted if it's not critical that you miss a change or spot a change where none exists.
The most accurate way would be to keep a Map using the line strings as keys and the count of each as the value. (If each string can only appear once, you don't need the count.) Compute this for the expected set of lines, then duplicate the collection to examine the incoming lines, decrementing the count for each line as you see it.
If you encounter a line with a zero count (or no map entry at all), you've seen a line you didn't expect.
If you end with non-zero entries remaining in the Map, you didn't see something you expected.
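The counting approach above can be sketched like this (the class and method names are my own):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExpectedLines {
    // True iff 'actual' contains exactly the same lines as 'expected',
    // counting duplicates, regardless of order.
    static boolean sameLines(List<String> expected, List<String> actual) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : expected) {
            counts.merge(line, 1, Integer::sum);
        }
        for (String line : actual) {
            Integer left = counts.get(line);
            if (left == null || left == 0) return false; // unexpected line
            counts.put(line, left - 1);
        }
        // Non-zero counts remaining mean an expected line never showed up.
        return counts.values().stream().allMatch(c -> c == 0);
    }

    public static void main(String[] args) {
        System.out.println(sameLines(List.of("a", "b", "a"), List.of("a", "a", "b"))); // true
        System.out.println(sameLines(List.of("a", "b"), List.of("a", "a")));           // false
    }
}
```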
Well, the problem specification is a bit limited.
As I understand it, you wish to see whether several strings contain the same elements regardless of order.
For example:
A B C
C B A
are the same.
The way to do this is to create a set of the values and then compare the sets. To create a set:
Set<String> set = new HashSet<String>();
for (String item : items) {
    set.add(item);
}
Then just compare the contents of the sets by running through one of the sets and checking each element against the other. The execution time will be O(N) instead of the O(N log N) of the sorting approach.
