Shortest path problem with DB and Java

I have a movie database (PostgreSQL). One of the tables contains actors and movie titles. The assignment I have to solve in Java is as follows: two actors (A and B) are connected when they play in the same movie. Further, two actors A and B are also connected when there is a third actor C who plays with both of them in different movies (A and B don't play together!), and so on... I hope you get the idea :) Now I have to find the shortest connection (= path) between two actors.
Now to the implementation: fetching the data from the DB (prepared statements) and saving the names (as strings) in a linked list is working, as well as simple connections between actors like A -> B (= both play in the same movie). I'm hitting a wall trying to include more complicated connections (like A -> B -> C).
I am storing the actor names in a HashMap like this:
Map<String, List<String>> actorHashMap = new HashMap<String, List<String>>();
So when I load the first actor (Johnny Depp) I have his name as key, and other actors playing with him in a list referenced by the key. Checking, whether another actor played with him is easy:
List<String> connectedActors = actorHashMap.get(sourceActor);
if(connectedActors.contains(actor)) {
found = true; }
But... what do I do if the actor I'm looking for is not in the HashMap (i.e. when I have to go one level deeper to find him)? I assume I would have to pick the first actor's name from the connectedActors list, insert it as a new key into the HashMap, fetch all actors he played with, insert them into his list, and then search that list. But that's exactly the part which I can't figure out. I already tried storing the names in graph nodes and using BFS to search for them, but I hit the same problem: I just don't know how to go "one level down" without creating an infinite loop... Does anyone have an idea how I can solve this? I am just starting with Java as well as programming in general, so it's probably simple, but I just can't see it :/

First I would use a Set to store the actors someone played with:
Map<String, Set<String>> actorHashMap = ...
instead of a List to avoid duplicate names.
In order to find the number of degrees of separation between two actors, I would start with one actor and generate all actors that are separated by 1, 2, 3... degrees. I would fix a maximum search depth; however, that might not be necessary if the number of actors is not too large.
String actor = "Johnny Depp";
String targetActor = "John Travolta";
Map<String, Integer> connectedActorsAndDepth = new HashMap<String, Integer>();
Integer depth = 1;
Set<String> actorsAddedAtPrecedingDepth = actorHashMap.get(actor);
for (String otherActor : actorsAddedAtPrecedingDepth) {
    if (otherActor.equals(targetActor)) return depth;
    connectedActorsAndDepth.put(otherActor, depth);
}
Integer maxDepth = 10;
while (++depth < maxDepth) {
    Set<String> actorsAddedAtCurrentDepth = new HashSet<String>();
    for (String knownActor : actorsAddedAtPrecedingDepth) {
        // expand one level: everyone who played with an actor found at the previous depth
        for (String otherActor : actorHashMap.get(knownActor)) {
            if (otherActor.equals(targetActor)) return depth;
            if (!connectedActorsAndDepth.containsKey(otherActor)) {
                actorsAddedAtCurrentDepth.add(otherActor);
                connectedActorsAndDepth.put(otherActor, depth);
            }
        }
    }
    actorsAddedAtPrecedingDepth = actorsAddedAtCurrentDepth;
}
I do not claim it is the most efficient algorithm. There might also be bugs in the code above.
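For reference, the same breadth-first idea can be packaged as a complete method. This is a sketch, not the poster's actual code; the class name, method name, and the tiny hand-built co-actor map are made up for illustration:

```java
import java.util.*;

public class ActorSeparation {
    // BFS over the co-actor map; returns degrees of separation, or -1 if unreachable
    static int degreesOfSeparation(Map<String, Set<String>> actorMap,
                                   String source, String target) {
        Set<String> visited = new HashSet<>();
        visited.add(source);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(source);
        int depth = 0;
        while (!queue.isEmpty()) {
            depth++;
            int levelSize = queue.size();
            // process exactly one BFS level per iteration
            for (int i = 0; i < levelSize; i++) {
                String current = queue.poll();
                for (String neighbor : actorMap.getOrDefault(current, Collections.<String>emptySet())) {
                    if (neighbor.equals(target)) return depth;
                    // visited.add returns false for already-seen actors -> no infinite loop
                    if (visited.add(neighbor)) queue.add(neighbor);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> actorMap = new HashMap<>();
        actorMap.put("A", new HashSet<>(Arrays.asList("B")));
        actorMap.put("B", new HashSet<>(Arrays.asList("A", "C")));
        actorMap.put("C", new HashSet<>(Arrays.asList("B")));
        System.out.println(degreesOfSeparation(actorMap, "A", "C")); // 2
    }
}
```

The visited set is what prevents the infinite loop the question asks about: an actor is enqueued at most once, at the shallowest depth where he is first reached.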

How to process Iterables.partition(...) results in parallel for use with BatchGetItem API?

I am trying to call BatchGetItem to retrieve items from DynamoDB. As input we can get a list of up to 1000 keys (or as little as 1 key). These keys coincide with the hashKey for our DynamoDB table.
Since the BatchGetItem API only takes in up to 100 items per call, I am trying to split up the request into batches of only 100 items each, make the calls in parallel, and then merge the results into a single Set again.
For those unfamiliar with DynamoDB who could still give advice on an extremely stripped-down version (1st example), I'd appreciate it! Otherwise, please see the second, more accurate example below.
1st Example - extremely stripped down
public Set<SomeResultType> retrieveSomething(Set<String> someSet) {
ImmutableSet.Builder<SomeResultType> resultBuilder = ImmutableSet.builder();
// FIXME - how to parallelize?
for (List<String> batch : Iterables.partition(someSet, 100)) {
result = callSomeLongRunningAPI(batch);
resultBuilder.addAll(result.getItems());
}
return resultBuilder.build();
}
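One way to fill in that FIXME in the stripped-down version is a parallel stream over the partitions. The sketch below replaces Guava and the real API with stand-ins so it is self-contained: callSomeLongRunningAPI is a dummy, and the partition helper mimics Iterables.partition:

```java
import java.util.*;
import java.util.stream.*;

public class BatchFetcher {
    // stand-in for the real long-running batch call (hypothetical)
    static Set<String> callSomeLongRunningAPI(List<String> batch) {
        return batch.stream().map(k -> "result-" + k).collect(Collectors.toSet());
    }

    // minimal equivalent of Guava's Iterables.partition for lists
    static <T> List<List<T>> partition(List<T> input, int size) {
        List<List<T>> parts = new ArrayList<>();
        for (int i = 0; i < input.size(); i += size) {
            parts.add(input.subList(i, Math.min(i + size, input.size())));
        }
        return parts;
    }

    public static Set<String> retrieveSomething(Set<String> someSet) {
        // run each batch on the common fork-join pool and merge into one set
        return partition(new ArrayList<>(someSet), 100).parallelStream()
                .map(BatchFetcher::callSomeLongRunningAPI)
                .flatMap(Set::stream)
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> ids = new HashSet<>();
        for (int i = 0; i < 250; i++) ids.add("id" + i);
        System.out.println(retrieveSomething(ids).size()); // 250
    }
}
```

For I/O-bound calls like BatchGetItem you would likely want a dedicated executor (e.g. CompletableFuture.supplyAsync with a fixed thread pool) rather than the shared fork-join pool, but the merge-into-one-set shape stays the same.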
2nd Example - closer to my actual problem -
Below is a stripped down, dummy version of what I'm currently doing (as such, please forgive formatting / style issues). It currently works and gets all the items, but I can't figure out how to get the batches (see FIXME) executed in parallel and merged into a single set. Since performance is pretty important in the system I'm trying to build, any tips on making this code more efficient would be appreciated!
public Set<SomeResultType> retrieveSomething(Set<String> someIds) {
if (someIds.isEmpty()) {
// handle this here
}
Collection<Map<String, AttributeValue>> keyAttributes = someIds.stream()
.map(id -> ImmutableMap.<String, AttributeValue>builder()
.put(tableName, new AttributeValue().withS(id)).build())
.collect(ImmutableList.toImmutableList());
ImmutableSet.Builder<SomeResultType> resultBuilder = ImmutableSet.builder();
Map<String, KeysAndAttributes> itemsToProcess;
BatchGetItemResult result;
// FIXME - make parallel?
for (List<Map<String, AttributeValue>> batch : Iterables.partition(keyAttributes, 100)) {
KeysAndAttributes keysAndAttributes = new KeysAndAttributes()
.withKeys(batch)
.withAttributesToGet(...// some attribute names);
itemsToProcess = ImmutableMap.of(tableName, keysAndAttributes);
result = this.dynamoDB.batchGetItem(itemsToProcess);
resultBuilder.addAll(extractItemsFromResults(tableName, result));
}
return resultBuilder.build();
}
Help with either the super stripped down case or the 2nd example would be greatly appreciated! Thanks!

How can I aggregate elements on a flux by group / how to reduce groupwise?

Assume you have a flux of objects with the following structure:
class Element {
String key;
int count;
}
Now imagine those elements flow in a predefined sort order, always in groups of a key, like
{ key = "firstKey", count=123}
{ key = "firstKey", count=1 }
{ key = "secondKey", count=4 }
{ key = "thirdKey", count=98 }
{ key = "thirdKey", count=5 }
.....
What I want to do is create a flux which returns one element for each distinct key and summed count for each key-group.
So basically like a classic reduce for each group, but using the reduce operator does not work, because it only returns a single element and I want to get a flux with one element for each distinct key.
Using bufferUntil might work, but has the drawback, that I have to keep a state to check if the key has changed in comparison to the previous one.
Using groupBy is an overkill, as I know that each group has come to an end once a new key is found, so I don't want to keep anything cached after that event.
Is such an aggregation possible using Flux, without keeping a state outside of the flow?
This is currently (as of 3.2.5) not possible without keeping track of state yourself. distinctUntilChanged could have fit the bill with minimal state but doesn't emit the state, just the values it considered as "distinct" according to said state.
The most minimalistic way of solving this is with windowUntil and compose + an AtomicReference for state-per-subscriber:
Flux<Tuple2<T, Integer>> sourceFlux = ...; //assuming key/count represented as `Tuple2`
Flux<Tuple2<T, Integer>> aggregated = sourceFlux.compose(source -> {
//having this state inside a compose means it will not be shared by multiple subscribers
AtomicReference<T> last = new AtomicReference<>(null);
return source
//use "last seen" state to split into windows, much like a `groupBy` but with earlier closing
.windowUntil(i -> !i.getT1().equals(last.getAndSet(i.getT1())), true)
//reduce each window
.flatMap(window -> window.reduce((i1, i2) -> Tuples.of(i1.getT1(), i1.getT2() + i2.getT2())))
});
That really worked for me! Thanks for that post.
Please note that in the meantime the "compose" method was renamed. You need to use transformDeferred instead.
In my case I have a "Dashboard" object which has an id (stored as UUID) on which I want to group the source flux:
Flux<Dashboard> sourceFlux = ... // could be a DB query. The Flux must be sorted according the id.
sourceFlux.transformDeferred(dashboardFlux -> {
// this stores the dashboardId's as the Flux publishes. It is used to decide when to open a new window
// having this state inside transformDeferred means it will not be shared by multiple subscribers
AtomicReference<UUID> last = new AtomicReference<>(null);
return dashboardFlux
//use "last seen" state to split into windows, much like a `groupBy` but with earlier closing
.windowUntil(i -> !i.getDashboardId().equals(last.getAndSet(i.getDashboardId())), true)
//reduce each window
.flatMap(window -> window.reduce(... /* reduce one window here */));
});

SQLite relative complement on combined key

First some background about my Problem:
I am building a crawler and I want to monitor some highscore lists.
The highscore lists are defined by two parameters: a category and a collection (together unique).
After a successful download I create a new stats entry (category, collection, createdAt, ...)
Problem: I want to query the highscore list only once per day. So I need a query that will return category and collection that haven't been downloaded in 24h.
The stats Table should be used for this.
I have a List of all possible categories and of all possible collections. They work like a cross join.
So basically I need the relative complement of the cross join with the entries from the last 24h.
My idea: cross join categories and collections and 'subtract' all pairs (category, collection) from stats entries created during the last 24h.
Question 1: Is it possible to define categories and collections inside the query and cross join them or do I have to create a table for them?
Question 2: Is my Idea the correct approach? How would you do this in Sqlite?
OK, I realize that this might sound confusing, so I drew an image of what I actually want.
I am interested in C.
Here is my current code in java, maybe it helps to understand the problem:
public List<Pair<String, String>> getCollectionsToDownload() throws SQLException {
long threshold = System.currentTimeMillis() - DAY;
QueryBuilder<TopAppStatistics, Long> query = queryBuilder();
List<TopAppStatistics> collectionsNotToQuery = query.where().ge(TopAppStatistics.CREATED_AT, threshold).query();
List<Pair<String, String>> toDownload = crossJoin();
for (TopAppStatistics stat : collectionsNotToQuery) {
toDownload.remove(new Pair<>(stat.getCategory(), stat.getCollection()));
}
return toDownload;
}
private List<Pair<String, String>> crossJoin() {
String[] categories = PlayUrls.CATEGORIES;
String[] collections = PlayUrls.COLLECTIONS;
List<Pair<String, String>> toDownload = new ArrayList<>();
for (String ca : categories) {
for (String co : collections) {
toDownload.add(new Pair<>(ca, co));
}
}
return toDownload;
}
The easiest solution to your problem is an EXCEPT. Say you have a subquery
that computes A and another one that computes B. These queries
can be very complex. The key is that both should return the same number of columns and comparable data types.
In SQLite you can then do:
<your subquery 1> EXCEPT <your subquery 2>
As simple as that.
For example:
SELECT a, b FROM T where a > 10
EXCEPT
SELECT a,b FROM T where b < 5;
Remember, both subqueries must return the same number of columns.
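To answer Question 1 directly: yes, in SQLite you can define both lists inline with VALUES-based CTEs and cross join them, so no extra tables are needed. A sketch only, assuming a stats table (category, collection, createdAt) with createdAt in milliseconds; the category/collection values are made up:

```sql
WITH categories(category) AS (VALUES ('GAME'), ('SOCIAL')),
     collections(collection) AS (VALUES ('topselling_free'), ('topselling_paid'))
SELECT category, collection
FROM categories CROSS JOIN collections
EXCEPT
SELECT category, collection
FROM stats
WHERE createdAt >= (strftime('%s', 'now') - 86400) * 1000;
```

The result is exactly the set C from the drawing: every (category, collection) pair that has not been downloaded in the last 24 hours, computed in the database instead of in Java.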

Create a HashMap with a fixed Key corresponding to a HashSet. point of departure

My aim is to create a hashmap with a String as the key, and the entry values as a HashSet of Strings.
OUTPUT
This is what the output looks like now:
[Hudson+(surname)=[Q2720681], Hudson,+Quebec=[Q141445], Hudson+(given+name)=[Q5928530], Hudson,+Colorado=[Q2272323], Hudson,+Illinois=[Q2672022], Hudson,+Indiana=[Q2710584], Hudson,+Ontario=[Q5928505], Hudson,+Buenos+Aires+Province=[Q10298710], Hudson,+Florida=[Q768903]]
According to my idea, it should look like this:
[Hudson+(surname)=[Q2720681,Q141445,Q5928530,Q2272323,Q2672022]]
The purpose is to store a particular name in Wikidata and then all of the Q values associated with its disambiguation, so for example:
This is the page for "Bush".
I want Bush to be the Key, and then for all of the different points of departure, all of the different ways that Bush could be associated with a terminal page of Wikidata, I want to store the corresponding "Q value", or unique alpha-numeric identifier.
What I'm actually doing is trying to scrape the different names, values, from the wikipedia disambiguation and then look up the unique alpha-numeric identifier associated with that value in wikidata.
For example, with Bush we have:
George H. W. Bush
George W. Bush
Jeb Bush
Bush family
Bush (surname)
Accordingly the Q values are:
George H. W. Bush (Q23505)
George W. Bush (Q207)
Jeb Bush (Q221997)
Bush family (Q2743830)
Bush (Q1484464)
My idea is that the data structure should be constructed in the following way:
Key:Bush
Entry Set: Q23505, Q207, Q221997, Q2743830, Q1484464
But the code I have now doesn't do that.
It creates a separate entry for each name and Q value, i.e.
Key:Jeb Bush
Entry Set: Q221997
Key:George W. Bush
Entry Set: Q207
and so on.
The full code in all its glory can be seen on my github page, but I'll summarize it below also.
This is what I'm using to add values to my data structure:
// add Q values to their arrayList in the hash map at the index of the appropriate entity
public static HashSet<String> put_to_hash(String key, String value)
{
if (!q_valMap.containsKey(key))
{
return q_valMap.put(key, new HashSet<String>() );
}
HashSet<String> list = q_valMap.get(key);
list.add(value);
return q_valMap.put(key, list);
}
This is how I fetch the content:
while ((line_by_line = wiki_data_pagecontent.readLine()) != null)
{
// if we can determine it's a disambig page we need to send it off to get all
// the possible senses in which it can be used.
Pattern disambig_pattern = Pattern.compile("<div class=\"wikibase-entitytermsview-heading-description \">Wikipedia disambiguation page</div>");
Matcher disambig_indicator = disambig_pattern.matcher(line_by_line);
if (disambig_indicator.matches())
{
//off to get the different usages
Wikipedia_Disambig_Fetcher.all_possibilities( variable_entity );
}
else
{
//get the Q value off the page by matching
Pattern q_page_pattern = Pattern.compile("<!-- wikibase-toolbar --><span class=\"wikibase-toolbar-container\"><span class=\"wikibase-toolbar-item " +
"wikibase-toolbar \">\\[<span class=\"wikibase-toolbar-item wikibase-toolbar-button wikibase-toolbar-button-edit\"><a " +
"href=\"/wiki/Special:SetSiteLink/(.*?)\">edit</a></span>\\]</span></span>");
Matcher match_Q_component = q_page_pattern.matcher(line_by_line);
if ( match_Q_component.matches() )
{
String Q = match_Q_component.group(1);
// 'Q' should be appended to an array, since each entity can hold multiple
// Q values on that basis of disambig
put_to_hash( variable_entity, Q );
}
}
}
and this is how I deal with a disambiguation page:
public static void all_possibilities( String variable_entity ) throws Exception
{
System.out.println("this is a disambig page");
//if it's a disambig page we know we can go right to the wikipedia
//get it's normal wiki disambig page
Document docx = Jsoup.connect( "https://en.wikipedia.org/wiki/" + variable_entity ).get();
//this can handle the less structured ones.
Elements linx = docx.select( "p:contains(" + variable_entity + ") ~ ul a:eq(0)" );
for (Element linq : linx)
{
System.out.println(linq.text());
String linq_nospace = linq.text().replace(' ', '+');
Wikidata_Q_Reader.getQ( linq_nospace );
}
}
I was thinking maybe I could pass the Key value around, but I really don't know. I'm kind of stuck. Maybe someone can see how I can implement this functionality.
I'm not clear from your question what isn't working, or if you're seeing actual errors. But, while your basic data structure idea (HashMap of String to Set<String>) is sound, there's a bug in the "add" function.
public static HashSet<String> put_to_hash(String key, String value)
{
if (!q_valMap.containsKey(key))
{
return q_valMap.put(key, new HashSet<String>() );
}
HashSet<String> list = q_valMap.get(key);
list.add(value);
return q_valMap.put(key, list);
}
In the case where a key is seen for the first time (if (!q_valMap.containsKey(key))), it vivifies a new HashSet for that key, but it doesn't add value to it before returning. (And the returned value is the old value for that key, so it'll be null.) So you're going to be losing one of the Q-values for every term.
For multi-layered data structures like this, I usually special-case just the vivification of the intermediate structure, and then do the adding and return in a single code path. I think this would fix it. (I'm also going to call it valSet because it's a set and not a list. And there's no need to re-add the set to the map each time; it's a reference type and gets added the first time you encounter that key.)
public static HashSet<String> put_to_hash(String key, String value)
{
if (!q_valMap.containsKey(key)) {
q_valMap.put(key, new HashSet<String>());
}
HashSet<String> valSet = q_valMap.get(key);
valSet.add(value);
return valSet;
}
Also be aware that the Set you return is a reference to the live Set for that key, so you need to be careful about modifying it in callers, and if you're doing multithreading you're going to have concurrent access issues.
Or just use a Guava Multimap so you don't have to worry about writing the implementation yourself.
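If Guava isn't an option, Java 8's Map.computeIfAbsent gives you the same "vivify once, then add" behavior from the standard library. A minimal runnable sketch; the class name and sample data are illustrative, not from the original code:

```java
import java.util.*;

public class QValMap {
    static final Map<String, Set<String>> qValMap = new HashMap<>();

    // computeIfAbsent creates the set on first sight of the key, then we always add
    static Set<String> putToHash(String key, String value) {
        Set<String> valSet = qValMap.computeIfAbsent(key, k -> new HashSet<>());
        valSet.add(value);
        return valSet;
    }

    public static void main(String[] args) {
        putToHash("Bush", "Q23505");
        putToHash("Bush", "Q207");
        putToHash("Bush", "Q221997");
        System.out.println(qValMap.get("Bush").size()); // 3
    }
}
```

Because the set is created and populated in a single code path, the "empty set on first insert" bug described above cannot occur.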

Check value inside Map

I have a Map where I save values with the form NAME-GROUP.
Before doing some operations, I need to know if the Map contains a specific group,
for example: I need to check for values containing group1 like Mark-group1.
I'm trying to get it this way:
if (checkList.containsValue(group1)) {
exists = true;
}
I can't provide the name when searching because there could be different names with the same group.
But it isn't finding the value; it seems this function only matches the entire value string, not part of it.
So, is there any way of achieving this, or do I need to change the way I'm approaching my code?
Update--
This is what my Map looks like:
Map<Integer, String> checkList = new HashMap<Integer, String>();
I load some values from a database and I set them into the Map:
if (c.moveToFirst()) {
int checkKey = 0;
do {
checkKey++;
checkList.put(checkKey, c.getString(c.getColumnIndex(TravelOrder.RELATION)));
}while(c.moveToNext());
}
The relation column, has values like: mark-group1, jerry-group1, lewis-group2, etc...
So, the Map will have a structure like [1, mark-group1], etc...
What I need is to check if there is any value inside the map that contains the string group1 for example, I don't care about the name, I just need to know if that group exists there.
If you want to check any value contain your string as a substring you have to do the following:
for (String value : yourMap.values()) {
if (value.contains(subString)) {
return true;
}
}
return false;
By the way, if the values in your map really have two distinct parts, I suggest storing them in a structure with two fields so they can be searched easily.
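Wrapped in a small helper method and fed the sample data from the question, the loop above looks like this (a runnable sketch; method and class names are made up):

```java
import java.util.*;

public class GroupCheck {
    // returns true if any map value contains the given substring
    static boolean containsGroup(Map<Integer, String> checkList, String group) {
        for (String value : checkList.values()) {
            if (value.contains(group)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, String> checkList = new HashMap<>();
        checkList.put(1, "mark-group1");
        checkList.put(2, "jerry-group1");
        checkList.put(3, "lewis-group2");
        System.out.println(containsGroup(checkList, "group1") + " "
                + containsGroup(checkList, "group3")); // true false
    }
}
```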
