My English is not very good, but I will try my best to explain my issue here.
I am working on an application in which I have to create a graph. For now I am using GraphStream.
The requirements for my graph are quite complicated:
I have a table named CDR (Call Data Record) with two columns, ANUMBER and BNUMBER. The structure is straightforward: each row means that ANUMBER called BNUMBER. There is also a DATETIME column showing the date and time of the call, but here I only need the two number columns.
Let's say we have 4 numbers: 123, 456, 789, 000, and the table looks like this:
Anumber   Bnumber
-------   -------
123       456
123       789
456       789
789       000
456       000
My table clearly shows that 123 didn't call 000, but 123 called 456 and 789, and those two numbers called 000. So I have to show the directed paths between 123 and 000, which would look like 123 -> 456 -> 000 and 123 -> 789 -> 000.
So the issue is that I don't know how to find these paths between 123 and 000. There may be more than 2 numbers involved, say 5 or 6, and I have to find the hidden numbers between all of the given numbers, just as 456 and 789 are the hidden numbers between 123 and 000 in the scenario above.
One more thing: my table contains more than 20 million rows, and the number of rows will obviously grow very quickly as users call each other.
WHAT I HAVE DONE SO FAR:
I have done some R&D on this issue but couldn't find any good library or solution for it. So far, I think Dijkstra's algorithm fits my scenario best, and luckily GraphStream provides this algorithm.
What I want from you guys is an idea: will this algorithm give me the required result, or do I have to find another algorithm or graph library that suits my problem better? I am not good at algorithms, which is why I am here for any help or guidance you can give me.
Thanks
You don't need Dijkstra's algorithm at all, since you don't have costs on the edges. You need a simple BFS (breadth-first search).
Here is a simple implementation, but you should add a 'labels' array to mark visited nodes. After BFS you can then restore the path from each node back to the source node (123 in your case), or conclude that a node cannot be reached from the source (if its label remains 0).
You should set the labels in the following way:
all labels equal 0 at the start
when you visit a new node, set its label to current_node_label + 1
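As a concrete illustration, here is a minimal BFS sketch in Java; the class, method, and adjacency map names are my own placeholders, and it assumes the CDR rows have already been loaded into an adjacency list keyed by phone number. It tracks a parent for each visited node, which plays the same role as the labels above and lets you walk the path back to the source:

import java.util.*;

public class CallPathFinder {

    // adjacency: for each ANUMBER, the list of BNUMBERs it called (built from the CDR table)
    public static List<String> shortestPath(Map<String, List<String>> adjacency,
                                            String source, String target) {
        Map<String, String> parent = new HashMap<>(); // visited nodes and how we reached them
        Deque<String> queue = new ArrayDeque<>();
        parent.put(source, null);
        queue.add(source);

        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(target)) {
                // walk the parent links back to the source to reconstruct the path
                List<String> path = new LinkedList<>();
                for (String n = target; n != null; n = parent.get(n)) {
                    path.add(0, n);
                }
                return path; // e.g. [123, 456, 000]
            }
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) { // not visited yet
                    parent.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList(); // target is not reachable from the source
    }
}

This returns one shortest path; if you need every intermediate number rather than a single route, you can store a list of parents per node instead of just one.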
But Dijkstra's algorithm can also work if you set the cost of each edge to 1; it's just not an efficient way to solve your problem.
I am trying to make a genetic algorithm for optimizing a seating problem where people have preferences about who they want to sit next to. The problem I'm having is with the crossover stage. I have two members of a population that I want to breed to produce a child member. Typically, one would take one 'gene' from one parent and a different gene from the other. The issue is that placement matters in the context of seating at a table.
Here is a randomly generated example:
i  j  Child Value  Mother Value  Father Value
0  0               P08           P00
0  1               P11           P06
0  2               P02           P05
0  3               P12           P09
0  4               P07           P04
1  0               P09           P10
1  1               P01           P01
1  2               P03           P13
1  3               P06           P03
1  4               P13           P11
2  0               P00           P12
2  1               P10           P07
2  2               P04           P02
2  3               P05           P08
2  4               P14           P14
I want to generate a child that has characteristics of both parents but also has no duplicate objects (PXX shouldn't occur twice in the child).
i : represents the table number; here there are 3 tables.
j : represents the seat number; here there are 5 seats.
15 people in total.
Is there a good way to do this? Sorry if my explanation is confusing.
Note I can't use any genetic algorithm libraries.
For this kind of problem the solution is often to "repair" the child. You need a repair function that takes an infeasible child (one with duplicates) and makes it feasible (removes the duplicates and replaces them with the missing values).
For your particular case, another way of fixing this issue would be to take all non-overlapping tables from the parents, put them in the child, and fill the child's remaining tables with the rest of the people (e.g., in your example, take table 1 from the Mother and table 2 from the Father, which have no people in common, and put the rest of the people in table 0).
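A rough sketch of that second idea in Java, assuming the seating plan is stored as a 2-D array of person IDs; the class and method names are placeholders, and the caller is responsible for picking two parent tables that don't share any people:

import java.util.*;

public class SeatingCrossover {

    // seating[i][j] = person at table i, seat j
    public static String[][] crossover(String[][] mother, String[][] father,
                                       int motherTable, int fatherTable) {
        int tables = mother.length, seats = mother[0].length;
        String[][] child = new String[tables][seats];

        // copy the chosen (non-overlapping) tables straight from each parent
        child[motherTable] = mother[motherTable].clone();
        child[fatherTable] = father[fatherTable].clone();

        // everyone already seated in the child
        Set<String> placed = new HashSet<>();
        Collections.addAll(placed, child[motherTable]);
        Collections.addAll(placed, child[fatherTable]);

        // the remaining people, in the order they appear in the mother
        Deque<String> remaining = new ArrayDeque<>();
        for (String[] table : mother) {
            for (String person : table) {
                if (!placed.contains(person)) {
                    remaining.add(person);
                }
            }
        }

        // fill the child's empty tables with the remaining people
        for (int i = 0; i < tables; i++) {
            if (i == motherTable || i == fatherTable) continue;
            for (int j = 0; j < seats; j++) {
                child[i][j] = remaining.poll();
            }
        }
        return child;
    }
}

In the example above you would call crossover(mother, father, 1, 2), and the five people not yet seated would fill table 0.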
I'm trying to implement a dynamic programming solution to the travelling salesman problem. I've found many resources online discussing this and related topics, but I haven't found anything that really helps me know where to start with implementing the solution.
To clarify, starting from city 0, I need to visit every other city once, and return back to city 0.
I have an array such as this one:
0 1129 1417 1240 1951
1129 0 1100 800 2237
1417 1100 0 1890 3046
1240 800 1890 0 1558
1951 2237 3046 1558 0
Along with finding the optimal route, I need to also find the optimal partial routes along the way. For example, I'd start with routes of length 2, and end up printing out something like this:
S = {0,1}
C({0,1},1) = 1129
S = {0,2}
C({0,2},2) = 1417
S = {0,3}
C({0,3},3) = 1240
S = {0,4}
C({0,4},4) = 1951
Then I'd go to routes of length 3, and print something like this:
S = {0,1,2}
C({0,1,2},1) = 2517
C({0,1,2},2) = 2229
and so on...
To make this a dynamic programming solution, I assume I should be saving the shortest distance between any n nodes. The best way I've thought of to do that is with a HashMap, where the key would be an integer built from every node included in that path, in ascending order (a path going through nodes 0>1>3>4 or 0>1>4>3 could both be stored under the key '134'), and each value would be a pair holding the path order as a List and the total distance as an integer.
At this point I would think I'd want to calculate all paths of distance 2, then all of distance 3, and then take the smallest few and use the hashmap to find the shortest path back for each, and compare.
Does this seem like it could work? Or am I completely on the wrong track?
You're sort of on track. Dynamic programming isn't really a practical way to calculate a TSP. What you're sort of close to is calculating a minimum spanning tree: a tree that connects all nodes using the smallest possible sum of edge weights. Two algorithms are frequently used: Prim's and Kruskal's. They produce something similar to your optimal partial routes list. I'd recommend you look at Prim's algorithm: https://en.wikipedia.org/wiki/Prim%27s_algorithm
The easiest way of approximating a TSP solution is to find the minimum spanning tree and then do a pre-order walk over it. This gives you an approximate travelling salesman tour and is known as the triangle inequality approximation. It's guaranteed to be no more than twice as long as the optimal tour, but it can be computed much faster. This web page explains it fairly well: http://www.personal.kent.edu/~rmuhamma/Algorithms/MyAlgorithms/AproxAlgor/TSP/tsp.htm
If you want a more optimal solution, you'll need to look at Christofides' algorithm, which is more complicated.
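A rough sketch of that MST-plus-preorder-walk heuristic in Java, assuming a complete, symmetric distance matrix like the one in the question (this is an illustration of the approach, not code from either poster):

import java.util.*;

public class TspApprox {

    // dist is a symmetric distance matrix; returns an approximate tour starting and ending at 0
    public static List<Integer> approximateTour(int[][] dist) {
        int n = dist.length;

        // Prim's algorithm: grow an MST from node 0, remembering each node's parent
        int[] parent = new int[n];
        int[] best = new int[n];
        boolean[] inTree = new boolean[n];
        Arrays.fill(best, Integer.MAX_VALUE);
        best[0] = 0;
        parent[0] = -1;
        for (int step = 0; step < n; step++) {
            int u = -1;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && (u == -1 || best[v] < best[u])) u = v;
            }
            inTree[u] = true;
            for (int v = 0; v < n; v++) {
                if (!inTree[v] && dist[u][v] < best[v]) {
                    best[v] = dist[u][v];
                    parent[v] = u;
                }
            }
        }

        // build children lists and do a pre-order walk of the MST
        List<List<Integer>> children = new ArrayList<>();
        for (int i = 0; i < n; i++) children.add(new ArrayList<>());
        for (int v = 1; v < n; v++) children.get(parent[v]).add(v);

        List<Integer> tour = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(0);
        while (!stack.isEmpty()) {
            int u = stack.pop();
            tour.add(u);
            List<Integer> kids = children.get(u);
            for (int i = kids.size() - 1; i >= 0; i--) stack.push(kids.get(i));
        }
        tour.add(0); // return to the starting city
        return tour;
    }
}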
You are on the right track.
I think you're getting at the fact that the DP recursion itself only tells you the optimal cost for each S and j, not the node that attained that cost. The classical way to handle this is through "backtracking": once you find out that, for example, C({0,1,2,3,4},-) = 10000, you figure out which node attained that cost; let's say it's 3. Then you figure out which node attained C({0,1,2,3,4},3); let's say it's 1. Then you figure out which node attained C({0,1,2,4},1); let's say it's 2, and so on. This is the classical way, and it avoids having to store a lot of intermediate data.
For DPs with a small state space, it's easier to just store all of those optimizers along the way. In your case you have an exponentially large state space, so it might be expensive to store all of them, but you already have to store an equally large data structure (C), so most likely you can store the optimizers as well. Your intermediate idea of storing only a few of the best, and falling back to the classical recomputation if the backtracking routine needs one you didn't store, seems reasonable, but I'm not sure it's worth the extra coding compared with a pure backtracking approach.
One clarification: You don't actually want to "calculate all paths of distance 2, then all of distance 3, ...", but rather you want to enumerate all sets of size 2, then all sets of size 3, etc. Still exponential, but not as bad as enumerating all paths.
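To make the backtracking idea concrete, here is a rough Held-Karp sketch in Java that stores the predecessor of each (S, j) state while filling C, then backtracks through it to recover the tour. The bitmask/array layout and names are my own choices, not something prescribed by the posters:

import java.util.*;

public class HeldKarp {

    // dist[i][j] = distance from city i to city j; city 0 is the fixed start/end
    public static List<Integer> solve(int[][] dist) {
        int n = dist.length;
        int full = 1 << n;
        // cost[S][j] = length of the shortest path that starts at 0, visits exactly
        // the cities in bitmask S (S always contains 0 and j), and ends at j
        int[][] cost = new int[full][n];
        int[][] prev = new int[full][n];       // the city visited just before j (for backtracking)
        for (int[] row : cost) Arrays.fill(row, Integer.MAX_VALUE / 2);
        cost[1][0] = 0;                        // only city 0 visited, standing at 0

        for (int S = 1; S < full; S++) {
            if ((S & 1) == 0) continue;        // every subset must contain city 0
            for (int j = 1; j < n; j++) {
                if ((S & (1 << j)) == 0) continue;
                int withoutJ = S ^ (1 << j);
                for (int k = 0; k < n; k++) {
                    if ((withoutJ & (1 << k)) == 0) continue;
                    int candidate = cost[withoutJ][k] + dist[k][j];
                    if (candidate < cost[S][j]) {
                        cost[S][j] = candidate;
                        prev[S][j] = k;
                    }
                }
            }
        }

        // close the tour: pick the end city j minimizing cost plus the edge back to 0
        int all = full - 1, last = -1, bestTotal = Integer.MAX_VALUE;
        for (int j = 1; j < n; j++) {
            int total = cost[all][j] + dist[j][0];
            if (total < bestTotal) { bestTotal = total; last = j; }
        }

        // backtrack through prev[][] to recover the visiting order
        LinkedList<Integer> tour = new LinkedList<>();
        tour.add(0);
        int S = all, j = last;
        while (j != 0) {
            tour.add(1, j);                    // insert right after the leading 0
            int k = prev[S][j];
            S ^= (1 << j);
            j = k;
        }
        tour.add(0);                           // return to the start
        return tour;                           // e.g. a tour of the form [0, ..., 0]
    }
}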
I need to figure out how to solve this problem. I am using Java, but that doesn't matter for now. I don't need any code since I have to write it myself; I just need some advice about the algorithm, since I can't find any fast way to do this. (My current solution takes far too long to run.)
Question is:
There are 4 cities, each with different routes to the others (16 routes in total).
Going from City 1 to City 4 is different from going from City 4 to City 1 (all routes are one-way), which is why they have different values.
I have the list of required times for each route, 16 in total. The list will actually be typed in by the user when the program starts, but you can assume I already have it.
After the travel times are entered, the user chooses a starting and an ending city, and the program has to find the minimum duration for that trip.
Example:
0 18 15 8
18 0 7 3
7 16 0 19
10 14 19 0
This is a table of travel durations, indexed i (row) x j (column); each value is the travel duration from city i to city j.
When the user inputs "4 2", which means from city 4 to city 2, the output should be 14.
But when the user inputs "2 1", the output should be 13 (3 + 10): first from city 2 to city 4, which takes 3 hours, and then from city 4 to city 1, which takes 10 hours, 13 hours in total.
So the chosen route doesn't need to be the direct one; any number of intermediate cities can be used, as long as the overall route is the fastest.
This 4x4 table was just an example for 4 cities (which is all I have as well). The algorithm should work for up to 100 cities. The user will type the number of cities before filling in the table of travel durations between them.
I may be able to find a solution for 4 cities, but it doesn't scale to 100. I also tried a permutation approach in Java, but as I said it took far too long to run, and the program mustn't take longer than 4 seconds.
Sorry for the long and boring question of mine but I hope someone can make a useful suggestion.
I have figured out how to do it, so I am answering this myself. I managed to make it work by adapting the code I found here. It works really fast and flawlessly.
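For reference, a standard way to handle this for up to 100 cities is the Floyd-Warshall all-pairs shortest path algorithm, which runs well within the 4-second limit. The sketch below illustrates that approach and is not necessarily the exact code the answer refers to:

public class FastestRoute {

    // time[i][j] = direct travel duration from city i to city j (0 on the diagonal)
    // returns a matrix of minimum travel durations between every pair of cities
    public static int[][] allPairsFastest(int[][] time) {
        int n = time.length;
        int[][] best = new int[n][n];
        for (int i = 0; i < n; i++) best[i] = time[i].clone();

        // Floyd-Warshall: allow city k as an intermediate stop, one city at a time
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (best[i][k] + best[k][j] < best[i][j]) {
                        best[i][j] = best[i][k] + best[k][j];
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] time = {
            { 0, 18, 15,  8},
            {18,  0,  7,  3},
            { 7, 16,  0, 19},
            {10, 14, 19,  0}
        };
        int[][] best = allPairsFastest(time);
        System.out.println(best[3][1]); // city 4 to city 2 (0-indexed): prints 14
        System.out.println(best[1][0]); // city 2 to city 1: prints 13 (via city 4)
    }
}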
Sorry for the noobish question; I've been trying to figure this out to no avail.
I'm making an Android application which has a ListView. I want each row in the list to show its corresponding row number next to it, but in reverse order.
The thing I seem to have trouble with is reversing the numbers when all I have is a single row number and the maximum number of rows.
Example:
I pull things out of my database in reverse order.
So this is how things look on the list view: Row count | Name
0 - John
1 - Jez
2 - Jen
When I add a value to the list view, it becomes
0 - NewName
1 - John
2 - Jez
3 - Jen
What I really want is the reverse order, so Jen would be 1, Jez 2, John 3, and any new additions would get the next numbers, but I'm not sure how to do it, as the only things I have are the row count and the number of rows.
Is there a formula which can do this?
Any help will be greatly appreciated.
Edit: Solved by Loki and Richard - DOH, it's just the max number minus the row number.
You've not added any code, but generally you could do something like this:
int id = (maxRows - rowIndex);
Then just add that to the list view text with something like:
setText(person + " " + String.valueOf(id));
It may not be exactly what you want, but without code, we're running a bit blind. Should point you in the right direction though.
Hope this helps! :)
I am doing some linguistic research that depends on being able to query a corpus of 100 million sentences. The information I need from that corpus is along these lines: how many sentences had "john" as the first word, "went" as the second word, and "hospital" as the fifth word, etc. So I just need the count and don't need to actually retrieve the sentences.
The idea I had was to split these sentences into words and store them in a database, where the columns would be the word positions (word-1, word-2, word-3, etc.) and the sentences would be the rows. So it looks like:
Word1      Word2      Word3   Word4    Word5  ...
Congress   approved   a       new      bill
John       went       to      school
...
My purpose would then be fulfilled by a query along the lines of SELECT COUNT(*) ... WHERE Word1 = 'John' AND Word4 = 'school'. But I am wondering: can this be better achieved using Lucene (or some other tool)?
The program I am writing (in Java) will be doing tens of thousands of such queries on that 100-million-sentence corpus, so look-up speed is important.
Thanks for any advice,
Anas
Assuming that the queries are as simple as you have indicated, a simple SQL db (Postgres, MySQL, possibly H2) would be perfect for this.
I suppose you already have the infrastructure to create tokens from a given sentence. You can create a Lucene document with one field for each word in the sentence, naming the fields field1, field2, and so on. Since Lucene doesn't have a schema like a DB, you can define as many fields on the fly as you wish. You can add an additional identifier field if you want to know which sentences matched a query.
While searching, your typical lucene query will be
+field1:John +field4:school
Since you don't care about relevance ranking, you can write a custom Collector that ignores scores. (That will also return results significantly faster.)
Since you don't plan to retrieve the matching sentences or words, you should only index these fields, not store them. That should push performance up another notch.
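Putting those pieces together, a minimal sketch of the one-field-per-position approach might look like this (written against a Lucene 7/8-style API; exact constructors and package names shift a little between major Lucene versions):

import java.nio.file.Paths;

import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TotalHitCountCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SentenceCounter {

    // index one sentence: one StringField per word position, indexed but not stored
    static void addSentence(IndexWriter writer, String[] words) throws Exception {
        Document doc = new Document();
        for (int i = 0; i < words.length; i++) {
            doc.add(new StringField("field" + (i + 1), words[i].toLowerCase(), Field.Store.NO));
        }
        writer.addDocument(doc);
    }

    // count sentences with "john" at position 1 and "school" at position 4
    static int countMatches(IndexSearcher searcher) throws Exception {
        BooleanQuery.Builder q = new BooleanQuery.Builder();
        q.add(new TermQuery(new Term("field1", "john")), Occur.MUST);
        q.add(new TermQuery(new Term("field4", "school")), Occur.MUST);
        TotalHitCountCollector collector = new TotalHitCountCollector();
        searcher.search(q.build(), collector);   // no scoring, just counting
        return collector.getTotalHits();
    }

    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("corpus-index"));
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new WhitespaceAnalyzer()))) {
            addSentence(writer, "John went to school".split(" "));
            addSentence(writer, "Congress approved a new bill".split(" "));
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            System.out.println(countMatches(new IndexSearcher(reader))); // prints 1
        }
    }
}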
Lucene span queries can implement positional search. Use SpanFirst to find a word in the first N positions of a document, and combine it with SpanNot to rule out the first N-1.
Your example query would look like this:
<BooleanQuery: +(+spanFirst(john, 1) +spanFirst(went, 2)) +spanNot(spanFirst(hospital, 5), spanFirst(hospital, 4))>
Lucene also of course allows getting the total hit count of a search result without iterating all the docs.
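A rough sketch of building that query programmatically, assuming the whole sentence is indexed in a single analyzed field called "text" here (the span query classes live in org.apache.lucene.search.spans in older Lucene versions and have moved to a queries module in recent ones):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanNotQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

public class PositionalQueries {

    // word exactly at position n (1-based): within the first n positions but not the first n-1
    static SpanQuery atPosition(String field, String word, int n) {
        SpanQuery withinN = new SpanFirstQuery(new SpanTermQuery(new Term(field, word)), n);
        if (n == 1) {
            return withinN;
        }
        SpanQuery withinNMinus1 = new SpanFirstQuery(new SpanTermQuery(new Term(field, word)), n - 1);
        return new SpanNotQuery(withinN, withinNMinus1);
    }

    // "john" as the first word, "went" as the second, "hospital" as the fifth
    static Query exampleQuery() {
        BooleanQuery.Builder b = new BooleanQuery.Builder();
        b.add(atPosition("text", "john", 1), Occur.MUST);
        b.add(atPosition("text", "went", 2), Occur.MUST);
        b.add(atPosition("text", "hospital", 5), Occur.MUST);
        return b.build();
    }
}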
I suggest you read Search Engine versus DBMS. From what I gather, you do need a database rather than a full text search library.
In any case, I suggest you preprocess your text and replace every word/token with a number using a dictionary. This turns every sentence into an array of word codes. I would then store each word position in a separate database column, which simplifies the counts and makes them quicker.
For example:
A boy and a girl drank milk
translates into:
120 530 14 120 619 447 253
(I chose arbitrary word codes), which leads to storing the row
120 530 14 120 619 447 253 0 0 0 0 0 0 0 ....
(padded with zeros up to the number of words you allocate per sentence).
This is a somewhat sparse matrix, so maybe this question will help.
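A small sketch of that dictionary step in Java (the padding width and code assignment scheme are arbitrary choices for illustration):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private int nextCode = 1;                 // 0 is reserved as the "no word" padding value

    // assign a stable numeric code to each distinct word
    public int codeFor(String word) {
        return codes.computeIfAbsent(word.toLowerCase(), w -> nextCode++);
    }

    // encode a sentence as a fixed-width row of word codes, padded with zeros
    public List<Integer> encode(String sentence, int width) {
        List<Integer> row = new ArrayList<>(width);
        for (String word : sentence.split("\\s+")) {
            if (row.size() == width) break;   // truncate overly long sentences
            row.add(codeFor(word));
        }
        while (row.size() < width) {
            row.add(0);
        }
        return row;
    }
}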
Look at Apache Hadoop and MapReduce. They were developed for things like this.
Or you can do it by hand, using only Java:
// slide a window of three consecutive words over the input and write each triple out
List<String> triple = new ArrayList<>(3);
for (String word : inputFileWords) {
    if (triple.size() == 3) {
        resultFile.println(String.join(" ", triple));
        triple.remove(0);
    }
    triple.add(word);
}
if (triple.size() == 3) {              // don't lose the final triple
    resultFile.println(String.join(" ", triple));
}
Then sort this file and count the duplicate lines (manually, or with command-line utilities such as sort and uniq -c); that will be about as fast as it gets.