Map with custom object as key to DataFrame in Apache Spark - java

I'm having trouble with creating a DataFrame from an RDD.
To start off, I'm using Spark to create the data I'm using (via simulations on the workers) and in return I get Report objects.
These Report objects consist of two HashMaps whose keys are custom-made and nearly identical between the maps, and whose values are Integer / Double. It's worth noting that I currently need these keys and maps to efficiently add and update values during the simulations, so changing this to a "flat" object may cost a lot of efficiency.
public class Key implements Serializable, Comparable<Key> {
private final States states;
private final String event;
private final double age;
...
}
And the States are
public class States implements Serializable, Comparable<States> {
private String stateOne;
private String stateTwo;
...
}
The states used to be Enums, but as it turns out, DataFrame doesn't like that. (The Strings are still set from Enums to ensure the values are correct.)
The problem is that I want to convert these maps to DataFrames so that I can use SQL etc to manipulate/filter the data.
I am able to create DataFrames by creating a Bean like so
public class Event implements Serializable {
private String stateOne;
private String stateTwo;
private String event;
private Double age;
private Integer value;
...
}
with getters and setters, but is there a way to just use Tuple2 (or something similar) to create my DataFrame, ideally one that also gives me a nice structure for the DB?
I have tried using Tuple2 like this
JavaRDD<Report> reports = dataSet.map(new SimulationFunction(REPLICATIONS_PER_WORKER)).cache();
JavaRDD<Tuple2<Key, Integer>> events = reports.flatMap(new FlatMapFunction<Report, Tuple2<Key, Integer>>() {
@Override
public Iterable<Tuple2<Key, Integer>> call(Report t) throws Exception {
List<Tuple2<Key, Integer>> list = new ArrayList<>(t.getEvents().size());
for(Entry<Key, Integer> entry : t.getEvents().entrySet()) {
list.add(new Tuple2<>(entry.getKey(), entry.getValue()));
}
return list;
}
});
DataFrame schemaEvents = sqlContext.createDataFrame(events, ????);
But I don't know what to put where the question marks are.
Hopefully I've made myself clear enough and that you'll be able to shed some light on this. Thank you in advance!

As zero323 says, it's not possible to do what I'm trying to do. I'll just stick with the beans from now on.
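Since the thread settles on beans, here is a minimal sketch of the flattening step that would run inside the flatMap: each entry of one report's Map<Key, Integer> becomes one flat Event bean. States, Key and EventFlattener below are simplified, hypothetical versions written for illustration (the original classes are elided). The resulting JavaRDD<Event> could then be passed to sqlContext.createDataFrame(rdd, Event.class), which infers the columns from the bean's getters; check that overload against your Spark version.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the question's States and Key classes
// (the originals are elided and also implement Comparable, equals, hashCode, ...).
class States implements Serializable {
    final String stateOne;
    final String stateTwo;

    States(String stateOne, String stateTwo) {
        this.stateOne = stateOne;
        this.stateTwo = stateTwo;
    }
}

class Key implements Serializable {
    final States states;
    final String event;
    final double age;

    Key(States states, String event, double age) {
        this.states = states;
        this.event = event;
        this.age = age;
    }
}

// Flat JavaBean mirroring the question's Event class: one getter/setter pair per column.
class Event implements Serializable {
    private String stateOne;
    private String stateTwo;
    private String event;
    private Double age;
    private Integer value;

    public String getStateOne() { return stateOne; }
    public void setStateOne(String stateOne) { this.stateOne = stateOne; }
    public String getStateTwo() { return stateTwo; }
    public void setStateTwo(String stateTwo) { this.stateTwo = stateTwo; }
    public String getEvent() { return event; }
    public void setEvent(String event) { this.event = event; }
    public Double getAge() { return age; }
    public void setAge(Double age) { this.age = age; }
    public Integer getValue() { return value; }
    public void setValue(Integer value) { this.value = value; }
}

class EventFlattener {
    // Copies each (Key -> count) entry of one report into a flat Event row.
    static List<Event> toEvents(Map<Key, Integer> events) {
        List<Event> rows = new ArrayList<>(events.size());
        for (Map.Entry<Key, Integer> entry : events.entrySet()) {
            Key key = entry.getKey();
            Event row = new Event();
            row.setStateOne(key.states.stateOne);
            row.setStateTwo(key.states.stateTwo);
            row.setEvent(key.event);
            row.setAge(key.age);
            row.setValue(entry.getValue());
            rows.add(row);
        }
        return rows;
    }
}
```

The maps stay as they are during the simulation; the flattening only happens once, when the results are handed to Spark SQL.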

Related

How can I make a java table where all rows can be used as keys?

This is hard for me to explain as I'm not a native English speaker, so I will try setting up an example.
I am trying to save some data about a player in a class called PlayerData. It has three variables with getters and setters.
public class PlayerData {
private String player;
private String username;
private UUID uuid;
public String getPlayer() {
return player;
}
public void setPlayer(String player) {
this.player = player;
}
public String getUsername() {
return username;
}
public void setUsername(String username) {
this.username = username;
}
public UUID getUuid() {
return uuid;
}
public void setUuid(UUID uuid) {
this.uuid = uuid;
}
}
For each player in the game, a PlayerData object will be generated. Normally I would store these in a Map, so I can get the data about a player from, e.g., the UUID. However, I would like to be able to use any variable in the PlayerData object as the "key", so I don't need the UUID to get the PlayerData. One way to do this (and my usual approach) would be to have multiple maps, something like this.
Map<String, PlayerData> playerMap;
Map<String, PlayerData> usernameMap;
Map<UUID, PlayerData> uuidMap;
The problem is that when this scales up to more variables, it gets annoying, and perhaps it even eats up RAM? I'm not entirely sure, since the maps only store references.
It's similar to SQL, where you can query rows based on the contents of any column. That's what I'm looking for, but without an SQL database.
I made a table explanation below in an attempt to explain it further:
Player  | Username      | UUID
Peter   | Peter1234     | 657f6c48-655f-11eb-ae93-0242ac130002
Stephen | DogLover69    | 657f6efa-655f-11eb-ae93-0242ac130002
Joshua  | XxFlowerPotxX | 657f6fea-655f-11eb-ae93-0242ac130002
Short version
I'm looking for a way to store multiple objects of the same type where, unlike a Map (which only takes a single object as its key), I can use any of several assigned variables as keys.
I hope the explanation was clear; I have absolutely no idea how to explain it, which is probably also why I can't solve it by googling.
Thank you for your time.
As far as I understand, you need to store various data for a specific user (and not just update old values).
One way is through a custom map. Since you only need one unique key, you can assume the username provides it (e.g. the login name). MyData can be customized further with whatever you want to store.
Each key/username will contain a distinct list where new data is added.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
public class TestPData {
public static void main(String[] args)
{
TestPData t = new TestPData();
MyMap m = t.new MyMap();
//key can be just user name, if unique is assured
m.putMyData("player_1", t.new MyData("p1_data1"));
m.putMyData("player_1", t.new MyData("p1_data2"));
m.putMyData("player_2", t.new MyData("p2_data3"));
m.putMyData("player_3", t.new MyData("p2_data4"));
m.putMyData("player_3", t.new MyData("p2_data5"));
m.putMyData("player_3", t.new MyData("p2_data6"));
m.forEach((k,v)->{for(MyData d: v) {System.out.println(k+":"+d);}});
}
class MyData
{
String s;
public MyData(String s)
{
this.s = s;
}
public String toString()
{
return s;
}
}
class MyMap extends HashMap<String, List<MyData>>
{
private static final long serialVersionUID = 1L;
public void putMyData(String k, MyData d)
{
// Create the list the first time a key is seen, then append
this.computeIfAbsent(k, key -> new ArrayList<MyData>()).add(d);
}
}
}
Output
player_1:p1_data1
player_1:p1_data2
player_3:p2_data4
player_3:p2_data5
player_3:p2_data6
player_2:p2_data3
If you are dealing with few records (some thousands), you can use a list and an iterative search as suggested by @gilbert-le-blanc, but if you need to manage huge numbers of records/attributes, it is better to use a database anyway. You can also use an in-memory database like Derby or H2.
https://www.h2database.com/
https://db.apache.org/derby/
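The list-plus-iterative-search option can be as small as the sketch below. It assumes an immutable variant of PlayerData with a constructor (the question's class only has setters) and a hypothetical PlayerSearch helper; every lookup is a linear O(n) scan, but any getter can serve as the "key". The search value is assumed to be non-null.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

// Minimal immutable version of the question's PlayerData; the constructor is added for brevity.
class PlayerData {
    private final String player;
    private final String username;
    private final UUID uuid;

    PlayerData(String player, String username, UUID uuid) {
        this.player = player;
        this.username = username;
        this.uuid = uuid;
    }

    String getPlayer() { return player; }
    String getUsername() { return username; }
    UUID getUuid() { return uuid; }
}

class PlayerSearch {
    // Linear scan: returns the first player whose chosen field equals the value.
    static <K> Optional<PlayerData> findBy(List<PlayerData> players,
                                           Function<PlayerData, K> field, K value) {
        return players.stream()
                .filter(p -> value.equals(field.apply(p)))
                .findFirst();
    }

    public static void main(String[] args) {
        List<PlayerData> players = Arrays.asList(
                new PlayerData("Peter", "Peter1234",
                        UUID.fromString("657f6c48-655f-11eb-ae93-0242ac130002")),
                new PlayerData("Stephen", "DogLover69",
                        UUID.fromString("657f6efa-655f-11eb-ae93-0242ac130002")));

        findBy(players, PlayerData::getUsername, "DogLover69")
                .ifPresent(p -> System.out.println(p.getPlayer())); // prints Stephen
    }
}
```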
With some effort you can create a custom collection with multi-indexed properties also, but it is not worth the pain.
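For comparison, this is roughly what such a multi-indexed collection could look like. MultiIndex and Player are hypothetical names, not from any library; the sketch keeps one HashMap per indexed property and updates all of them in add/remove, so they cannot drift apart. It assumes each indexed property is unique, non-null, and not changed while the item is stored.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical helper: one HashMap per indexed property, kept in sync by add/remove.
class MultiIndex<T> {
    private final Map<String, Function<T, ?>> extractors = new HashMap<>();
    private final Map<String, Map<Object, T>> indexes = new HashMap<>();

    // Register indexes before adding items; items added earlier are not back-filled.
    MultiIndex<T> addIndex(String name, Function<T, ?> extractor) {
        extractors.put(name, extractor);
        indexes.put(name, new HashMap<>());
        return this;
    }

    // Puts the item into every index under the value of that index's property.
    void add(T item) {
        extractors.forEach((name, f) -> indexes.get(name).put(f.apply(item), item));
    }

    // Removes the item from every index, not just one.
    void remove(T item) {
        extractors.forEach((name, f) -> indexes.get(name).remove(f.apply(item)));
    }

    // Returns null if the index name or the key is unknown.
    T get(String indexName, Object key) {
        Map<Object, T> index = indexes.get(indexName);
        return index == null ? null : index.get(key);
    }
}

// Small stand-in for the question's PlayerData.
class Player {
    final String name;
    final String username;
    final UUID uuid;

    Player(String name, String username, UUID uuid) {
        this.name = name;
        this.username = username;
        this.uuid = uuid;
    }
}
```

Usage would look like new MultiIndex<Player>().addIndex("username", p -> p.username), after which get("username", "Peter1234") returns the stored object. Whether this is "worth the pain" compared to an in-memory database depends on how many properties and records you have.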
I would use a map of maps, with the first mapping by the name of the property and the second map by its value.
In code:
Map<String, Map<String, PlayerData>> index = new HashMap<>();
To add a mapping:
PlayerData peterData = new PlayerData();
peterData.setPlayer("Peter");
peterData.setUsername("Peter1234");
peterData.setUuid(UUID.fromString("657f6c48-655f-11eb-ae93-0242ac130002"));
index.computeIfAbsent("player", k -> new HashMap<>())
.put("Peter", peterData);
index.computeIfAbsent("username", k -> new HashMap<>())
.put("Peter1234", peterData);
index.computeIfAbsent("uuid", k -> new HashMap<>())
.put("657f6c48-655f-11eb-ae93-0242ac130002", peterData);
This navigates to the different inner maps (one per indexed property) by means of the Map.computeIfAbsent method, which creates an empty inner map and puts it into the outer map if it doesn't exist, or returns it if already present. Then, we add the mapping to the inner map by using Map.put as usual.
To remove a mapping:
index.computeIfAbsent("username", k -> new HashMap<>()).remove("Peter1234");
This is completely dynamic, as you don't have to change the data structure when you need to map by more properties. Instead, all you have to do is add mappings as needed.
The downside of this approach is that you'd need to use strings for the keys of the inner maps, but I think this is a reasonable trade-off.

Maintain java mapping from two different types of values in a single data structure

I have a collection of objects that look something like
class Widget {
String name;
int id;
// Intuitive constructor omitted
}
Sometimes I want to look up an item by name, and sometime I want to look it up by id. I can obviously do this by
Map<String, Widget> mapByName;
Map<Integer, Widget> mapById;
However, that requires maintaining two maps, and at some point I (or another developer who is unfamiliar with the double map) will make a change to the code and update only one of the maps.
The obvious solution is to make a class to manage the two maps. Does such a class already exist, probably in a third party package?
I am looking for something that lets me do something along the lines of
DoubleMap<Integer, String, Widget> map = new DoubleMap<>();
Widget w = new Widget(3, "foo");
map.put(w.id, w.name, w);
map.get1(3); // returns w
map.get2("foo"); // returns w
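A simple solution could be to write your own key class that includes both keys.
import java.util.Objects;
class WidgetKey {
final int id;
final String name;
WidgetKey(int id, String name) { this.id = id; this.name = name; }
@Override public boolean equals(Object o) {
if (!(o instanceof WidgetKey)) return false;
WidgetKey k = (WidgetKey) o;
return id == k.id && Objects.equals(name, k.name);
}
@Override public int hashCode() { return Objects.hash(id, name); }
}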
Map<WidgetKey, Widget> yourMap;
Beware that you have to implement equals and hashCode in the WidgetKey class. Otherwise put/get and other map methods wouldn't work properly.
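The thread does not name an existing library class for the DoubleMap asked for in the question. As a hedged sketch, a hypothetical DoubleMap can wrap the maps and always update them together; unlike a composite key, each lookup needs only one of the two keys. The type parameters are ordered so that put(w.id, w.name, w) type-checks with <Integer, String, Widget>, and keys are assumed to be non-null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class, not from any library: every mutation goes through put/remove,
// so the two lookup maps can never be updated independently.
class DoubleMap<K1, K2, V> {
    private final Map<K1, V> byFirst = new HashMap<>();
    private final Map<K2, V> bySecond = new HashMap<>();
    // Cross-links so removing by one key also removes the matching other key.
    private final Map<K1, K2> firstToSecond = new HashMap<>();
    private final Map<K2, K1> secondToFirst = new HashMap<>();

    void put(K1 k1, K2 k2, V value) {
        // Drop any stale pairing first so the two views never disagree.
        remove1(k1);
        remove2(k2);
        byFirst.put(k1, value);
        bySecond.put(k2, value);
        firstToSecond.put(k1, k2);
        secondToFirst.put(k2, k1);
    }

    V get1(K1 k1) { return byFirst.get(k1); }

    V get2(K2 k2) { return bySecond.get(k2); }

    V remove1(K1 k1) {
        K2 k2 = firstToSecond.remove(k1);
        if (k2 != null) {
            secondToFirst.remove(k2);
            bySecond.remove(k2);
        }
        return byFirst.remove(k1);
    }

    V remove2(K2 k2) {
        K1 k1 = secondToFirst.remove(k2);
        if (k1 != null) {
            firstToSecond.remove(k1);
            byFirst.remove(k1);
        }
        return bySecond.remove(k2);
    }
}
```

Re-putting id 3 with a new name automatically drops the old name mapping, which is exactly the kind of bookkeeping that is easy to forget with two loose maps.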

Java: creating several instances of object in class itself or how to restructure

I'm a Java beginner and have a question about how best to structure a cooking program.
I have a class called Ingredient, this class currently looks like this:
public class Ingredient {
private String identifier;
private double ingredientFactor;
private String titleInterface;
public Ingredient(String identifier, double ingredientFactor,String titleInterface) {
this.identifier = identifier;
this.ingredientFactor = ingredientFactor;
this.titleInterface = titleInterface;
}
}
I want to initialize several objects (about 40) with certain values as instance variables and save them in a Map, for example
Map<String, Ingredient> allIngredients = new HashMap<String, Ingredient>();
allIngredients.put("Almonds (ground)", new Ingredient("Almonds (ground)", 0.7185, "Almonds (ground)"));
Later on I want to retrieve all these objects in the form of a Map/HashMap in a different class.
I'm not sure how best to proceed: should I initialize all these objects in the Ingredient class itself, provide a method that initializes them, or would it be better to create a super class (AllIngredients or something like that) that has a Map with Ingredients as an instance variable?
Happy for any suggestions, thanks in advance :)
Please do not initialize all these objects in the Ingredient class itself. That would be poor OOP practice.
Think of your class as a template from which you create copies (objects) with different attribute values. In the real world, if your class were the model for a toy plane, you would use it to build multiple toy planes, each with a different name and color. Think about how such a system would be designed: you have a model (a class), and then a system (another class) that picks the required color and name from the available options (stored in a database, files, a properties file, etc.).
Regarding your situation:
1. If the values are predetermined, store them in a text file, properties file, database, constants in a class, etc., depending on the sensitivity of the data.
2. Create the Ingredient class with constructors.
3. Create a class with methods that initialize Ingredient objects from the predetermined values, update the values if required, save them to a text file or database, etc., and, in your case, return them as a map.
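One possible shape for the separate class that creates the Ingredient objects and returns them as a map is sketched below. IngredientCatalog and its methods are hypothetical names; the values are hard-coded here only for brevity, where the answer suggests loading them from a file or database.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain model class: it describes one ingredient and knows nothing about the full list.
class Ingredient {
    private final String identifier;
    private final double ingredientFactor;
    private final String titleInterface;

    Ingredient(String identifier, double ingredientFactor, String titleInterface) {
        this.identifier = identifier;
        this.ingredientFactor = ingredientFactor;
        this.titleInterface = titleInterface;
    }

    String getIdentifier() { return identifier; }
    double getIngredientFactor() { return ingredientFactor; }
    String getTitleInterface() { return titleInterface; }
}

// Hypothetical catalog class that owns creation of the predefined ingredients.
class IngredientCatalog {
    // LinkedHashMap keeps the insertion order of the ~40 entries.
    private final Map<String, Ingredient> byIdentifier = new LinkedHashMap<>();

    IngredientCatalog() {
        // Hard-coded for brevity; could instead be read from a properties file,
        // text file, or database.
        register(new Ingredient("Almonds (ground)", 0.7185, "Almonds (ground)"));
    }

    private void register(Ingredient ingredient) {
        byIdentifier.put(ingredient.getIdentifier(), ingredient);
    }

    // Read-only view: other classes can look ingredients up but not add or remove them.
    Map<String, Ingredient> asMap() {
        return Collections.unmodifiableMap(byIdentifier);
    }
}
```

Another class then only needs new IngredientCatalog().asMap(), and the Ingredient class stays a simple template.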
Also check the links below
http://www.tutorialspoint.com/design_pattern/data_access_object_pattern.htm
http://www.oracle.com/technetwork/java/dataaccessobject-138824.html
Sounds to me like you are looking for a static Map.
public class Ingredient {
private String identifier;
private double ingredientFactor;
private String titleInterface;
public Ingredient(String identifier, double ingredientFactor, String titleInterface) {
this.identifier = identifier;
this.ingredientFactor = ingredientFactor;
this.titleInterface = titleInterface;
}
static Map<String, Ingredient> allIngredients = new HashMap<String, Ingredient>();
static {
// Build my main set.
allIngredients.put("Almonds (ground)", new Ingredient("Almonds (ground)", 0.7185, "Almonds (ground)"));
}
}

java best data structure for two to many relations

So I have three important factors: filenames, of which there are many (including duplicates); violation types, of which there are 6; and the data relating to them.
I was thinking of using a Map for this, but it only accepts two types. I want to sort the data by filename, and for every entry under that filename retrieve the violation type, and from that retrieve all matching data. If it were a map, I could say map.get(filename, violation) and it would retrieve all the results that match.
Is there a data structure that allows me to do this, or am I being lazy and should I just sort the data myself when it comes to outputting it?
One other way to approach this would be to use a custom Class for holding the needed data. Essentially 'building' your own node that you can iterate over.
For example, you could create the following class (Node.java):
import java.util.*;
public class Node
{
private String violationType;
private String dataInside;
public Node()
{
this("", "");
}
public Node(String violationType)
{
this(violationType, "");
}
public Node(String violationType, String dataInside)
{
this.violationType = violationType;
this.dataInside = dataInside;
}
public void setViolationType(String violationType)
{
this.violationType = violationType;
}
public void setDataInside(String dataInside)
{
this.dataInside = dataInside;
}
public String getViolationType()
{
return violationType;
}
public String getDataInside()
{
return dataInside;
}
}
OK, great, so we have this 'node' thing with some setters, some getters, and some constructors for ease of use. Cool. Now let's see how to use it:
import java.util.*;
public class main{
public static void main(String[] args){
Map<String, Node> customMap = new HashMap<String, Node>();
customMap.put("MyFilename", new Node("Violation 1", "Some Data"));
System.out.println("This is a test of the custom Node: " + customMap.get("MyFilename").getViolationType());
}
}
Now we have a map that relates all of the data you need it to. You'll get a lot of people saying "Don't reinvent the wheel" when it comes to things like this, because built-in libraries are far more optimized. That is true! If you can find a data structure built into Java that suits your needs, USE IT. That's always a good policy to follow. That being said, a very custom situation sometimes calls for a custom approach. Don't be afraid to make your own objects like this; it's easy to do in Java, and it could save you a lot of time and headaches!
EDIT
So, after re-reading the OP's question, I realize you want an entire list of associated data for a given violation of a given filename. In that case, you would change private String dataInside to something like private ArrayList<String> dataInside, which lets you associate as much data as you want, still inside the node, just inside an ArrayList. You'd also have to adjust the getters/setters a little to accommodate a list, but that's not too bad.
You could use a custom class as the map key, containing the two fields filename and violation type. When doing so, you need to implement the equals() and hashCode() methods to ensure instances of that class work correctly as map keys.
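A minimal sketch of that composite-key suggestion, with hypothetical class names FileViolationKey and ViolationIndex: each (filename, violation type) pair maps to a list of data records, which gives roughly the map.get(filename, violation) lookup the question asks for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical composite key: equal when both filename and violation type match.
final class FileViolationKey {
    private final String filename;
    private final String violationType;

    FileViolationKey(String filename, String violationType) {
        this.filename = filename;
        this.violationType = violationType;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FileViolationKey)) return false;
        FileViolationKey other = (FileViolationKey) o;
        return Objects.equals(filename, other.filename)
                && Objects.equals(violationType, other.violationType);
    }

    // Must be consistent with equals, otherwise HashMap lookups silently miss.
    @Override
    public int hashCode() {
        return Objects.hash(filename, violationType);
    }
}

class ViolationIndex {
    private final Map<FileViolationKey, List<String>> data = new HashMap<>();

    // Appends one data record under (filename, violationType), creating the list on first use.
    void add(String filename, String violationType, String record) {
        data.computeIfAbsent(new FileViolationKey(filename, violationType), k -> new ArrayList<>())
            .add(record);
    }

    // Returns all records for the pair, or an empty list if there are none.
    List<String> get(String filename, String violationType) {
        return data.getOrDefault(new FileViolationKey(filename, violationType),
                                 Collections.emptyList());
    }
}
```

If the output must be sorted by filename, the HashMap could be swapped for a TreeMap, but then the key class would also need to implement Comparable.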
You can use TreeMap. TreeMap is sorted according to the natural ordering of its keys.
TreeMap<String, List<String>> map = new TreeMap<String, List<String>>();

How to build a complex, hierarchic immutable data structure in Java?

I'm building a Java library for a customer, and one of the things they want is a data representation of a particular set of standards they work with. I don't want to reveal my customer's interests, but if he were an alchemist, he might want the following:
Elements
Fire
Name="Fire"
Physical
Temperature=451
Color="Orange"
Magical
Domain="Strength"
Water
Name="Water"
Physical
Color="Blue"
Earth
Name="Earth"
Magical
Domain="Stability"
Ordinality=1
I need to be able to access various data elements by name, such as:
Elements.Earth.Name
Elements.Water.Physical.Color
I also need to be able to iterate through attributes, as:
for (MagicalType attrib : Elements.Fire.Magical)
{
...
}
I have actually been able to create this data structure, and I can do everything I've asked for above -- though I had to create separate arrays for the iteration, so really what I do looks more like:
for (MagicalType attrib : Elements.Fire.MagicalAuxArray)
Unfortunately I haven't been able to meet my last requirement: the entire data structure must be immutable. I have tried repeatedly, and scoured the web looking for examples, but so far I haven't been able to accomplish this in any reasonable manner. Note that the final data structure will be quite large; I'm really hoping to avoid a solution that is too repetitious or creates too many public symbols.
I am a very experienced programmer, less experienced with Java. Can anyone suggest how I might represent the above data to meet all my requirements?
A few ways that come to mind immediately:
Don't provide setter methods for your object. Your users can only create the object via a constructor, and once created, it cannot be modified. The same goes for other state-modifying methods. If you want to avoid a very large parameter list in your constructor, you can use the Builder pattern (described in Effective Java by Joshua Bloch, 2nd ed.).
When returning collections, make defensive copies. In this case, use a List instead of an array. That way you can return new ArrayList<MagicalType>(MagicalAuxList) instead of MagicalAuxList itself, and people who use the class won't be able to modify your internal collection. One caveat: if the collection contains complex objects, they must be immutable as well.
For immutable collections, you can also try using the unmodifiableCollection static method (there are similar static-methods for lists, sets, etc. - use whichever one is appropriate for you) to convert your collection when you return it. This is an alternative to defensive copying.
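These suggestions combine naturally. The sketch below uses a hypothetical ImmutableElement class (not taken from the question): a final class with only final fields and no setters, a Builder to avoid a long constructor, and a list that is defensively copied and then wrapped with Collections.unmodifiableList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class: final, all fields final, no setters.
final class ImmutableElement {
    private final String name;
    private final List<String> domains;

    private ImmutableElement(Builder builder) {
        this.name = builder.name;
        // Copy first, then wrap: later changes to the Builder cannot leak in,
        // and callers cannot modify the list through the getter.
        this.domains = Collections.unmodifiableList(new ArrayList<>(builder.domains));
    }

    String getName() { return name; }

    // Callers can iterate; add/remove throw UnsupportedOperationException.
    List<String> getDomains() { return domains; }

    // Builder avoids a long constructor parameter list.
    static final class Builder {
        private String name;
        private final List<String> domains = new ArrayList<>();

        Builder name(String name) { this.name = name; return this; }
        Builder addDomain(String domain) { domains.add(domain); return this; }
        ImmutableElement build() { return new ImmutableElement(this); }
    }
}
```

For example, new ImmutableElement.Builder().name("Fire").addDomain("Strength").build() yields an object whose state cannot change afterwards, as long as the list elements themselves are immutable (Strings are).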
Why do you use arrays? Wouldn't immutable collections (e.g. from Google Guava) do a better job?
You can use Iterable in your public API. It's cleaner than exposing Collections with all the mutators you'd have to suppress (unfortunately, Iterator has a remove() method, but that's just one).
public final Iterable<MagicalType> magics;
for(MagicalType magic : magics) ...
You could try the code below, which uses final fields, enums, and unmodifiable maps. However, it does not let you access every attribute by name without doing a get from the map. You could probably do that in Groovy.
import java.util.*;
enum Color {
red, green, blue;
}
class Physical {
Physical(final Double temperature, final Color color) {
this.temperature = temperature;
this.color = color;
final Map<String, Object> map=new LinkedHashMap<String, Object>();
map.put("temperature", temperature);
map.put("color", color);
this.map=Collections.unmodifiableMap(map);
}
final Double temperature;
final Color color;
final Map<String, Object> map;
}
class Magical {
Magical(final String domain, final Integer ordinality) {
this.domain = domain;
this.ordinality = ordinality;
final Map<String, Object> map=new LinkedHashMap<String, Object>();
map.put("domain", domain);
map.put("ordinality", ordinality);
this.map=Collections.unmodifiableMap(map);
}
final String domain;
final Integer ordinality;
final Map<String, Object> map;
}
public enum Elements {
earth("Earth", new Magical("Stability", 1), null),
air("Air", null, null),
fire("Fire", new Magical("Strength", null), new Physical(451., Color.red)),
water("Water", null, new Physical(null, Color.blue));
Elements(final String name, final Magical magical, final Physical physical) {
this.name = name;
this.magical = magical;
this.physical = physical;
}
public static void main(String[] arguments) {
System.out.println(Elements.earth.name);
System.out.println(Elements.water.physical.color);
for (Map.Entry<String, Object> entry : Elements.water.physical.map.entrySet())
System.out.println(entry.getKey() + '=' + entry.getValue());
for (Map.Entry<String, Object> entry : Elements.earth.magical.map.entrySet())
System.out.println(entry.getKey() + '=' + entry.getValue());
}
final String name;
final Magical magical;
final Physical physical;
}
