Is there any DSL for streams/iterators? - java

I wonder (and am close to despair) whether there is any well-developed DSL for streams/iterators over ordered series of objects.
The sources are ordered streams of (id, time, key, value) instances, and the requirement is to join and analyse those streams. This has to be done by collecting combinations of keys and applying metrics to values within certain (definable) time constraints (count distinct keys or sum values within a day, within the same second, and so on). There are some DSLs that work on time series (ESP), but they mostly use relatively simple time windows and do not seem able to handle ordering and joining by id, time, etc. (and, in consequence, the computation of combinations by id).
What I have to do is something like "compute the combinations of A and (B or C), count distinct D within the same second, sum E with the same id".
The results should contain all available combinations of A, (B or C), with the count of distinct values for key D that fall in the same second as A, (B or C) for each distinct id, and the sum of the values for key E for each id (that is, the sum over all values of E for ids having A, (B or C)).
Not an easy question. I'm just looking for a possibly helpful, already thought-out DSL for such problems. I do not think SQL will do.
Thanks a lot!

I think you can't find such methods because streams and iterators are not designed around ordered data (although they can contain it). If you can't rely on the data being sorted, such methods are of little use: you would have to read all the data from the stream/iterator, at which point it loses its main purpose as a data structure. So why not use a list?
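As a rough sketch of the list-based approach suggested above, the following computes two of the metrics from the question (distinct values of one key per id and second, and the sum of another key per id) over an in-memory list. The Event class, its field names, and the "id@second" bucket label are assumptions made for illustration; the sketch does not cover the A and (B or C) combination logic.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class SameSecondMetrics {
    // Hypothetical event type; the field names are assumptions for illustration.
    static final class Event {
        final long id;
        final long timeMillis;
        final String key;
        final double value;

        Event(long id, long timeMillis, String key, double value) {
            this.id = id;
            this.timeMillis = timeMillis;
            this.key = key;
            this.value = value;
        }
    }

    // Counts the distinct values seen for `key` in each (id, second) bucket.
    // Buckets are labelled "id@second".
    static Map<String, Integer> countDistinctPerSecond(List<Event> events, String key) {
        Map<String, Set<Double>> buckets = new TreeMap<>();
        for (Event e : events) {
            if (!e.key.equals(key)) continue;
            String bucket = e.id + "@" + (e.timeMillis / 1000);
            buckets.computeIfAbsent(bucket, b -> new HashSet<>()).add(e.value);
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<Double>> entry : buckets.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().size());
        }
        return counts;
    }

    // Sums the values recorded for `key`, grouped by id.
    static Map<Long, Double> sumPerId(List<Event> events, String key) {
        Map<Long, Double> sums = new TreeMap<>();
        for (Event e : events) {
            if (e.key.equals(key)) sums.merge(e.id, e.value, Double::sum);
        }
        return sums;
    }
}
```

Because both methods make a single pass over the list, they would work equally well on an iterator; the list only matters once several passes or joins are needed.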

Related

Data structure used to perform the union operation on two disjoint sets

What basic data structure would be best to use for the union operation on two disjoint sets?
Are there any algorithms that would run in O(1) time?
I'm thinking some variety of Hash Table, but I'm kind of stuck.
This is for a study guide in Algorithms and Data Structures.
The full question:
The set operation UNION takes two disjoint sets S1 and S2 as input, and returns a
set S = S1 ∪ S2 consisting of all the elements of S1 and S2 (the sets S1 and S2 are
usually destroyed by this operation). Explain how you can support UNION operation
in O(1) time using a suitable data structure. Discuss what data structure you would
use and describe the algorithm for the UNION operation.
If the sets are disjoint, a linked list (with a head and tail) will be enough. The union in this case is only a concatenation of the lists. In C++:
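struct LL {
    Value *val;
    LL *next;
};
struct LList {
    LL *head;
    LL *tail;
};
and the union operation will be:
void unify(LList* list1, LList* list2) {
    if (list2->head == nullptr) return;   // nothing to append
    if (list1->head == nullptr)
        list1->head = list2->head;        // list1 was empty: adopt list2's nodes
    else
        list1->tail->next = list2->head;  // splice list2 after list1's tail
    list1->tail = list2->tail;
    list2->head = list2->tail = nullptr;  // list2 is consumed by the union
}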
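For comparison, the same head-and-tail idea could look roughly like this in Java. ConcatList and its method names are made up for this sketch; the point is only that unionWith touches a constant number of references regardless of list size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal singly linked list with head and tail references.
class ConcatList<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private Node<T> head;
    private Node<T> tail;

    // Appends one element at the tail in O(1).
    void add(T value) {
        Node<T> node = new Node<>(value);
        if (head == null) head = node; else tail.next = node;
        tail = node;
    }

    // O(1) union of two disjoint sets: splices `other` onto the end of this list.
    // `other` is emptied, matching the "inputs are destroyed" convention.
    void unionWith(ConcatList<T> other) {
        if (other == this || other.head == null) return;
        if (head == null) head = other.head; else tail.next = other.head;
        tail = other.tail;
        other.head = null;
        other.tail = null;
    }

    // Copies the elements into a java.util.List (O(n), for inspection only).
    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (Node<T> n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }
}
```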
An interesting technique that sometimes applies to that problem (not always though, as you will see), is to use an array of "cycles", each cycle storing a set. The cycles are stored as a bunch of "next element" links, so next[i] will give an integer that represents the next item. In the end the links loop back, so the sets are necessarily disjoint.
The nice thing there is that you can union two sets together by swapping two items. If you have indexes s1 and s2 that belong to different sets, then the sets they are in (s1 and s2 are not special representatives, you can refer to a set by any of its elements) can be unioned by swapping those positions (swapping two positions within the same cycle would split that set instead):
int temp = next[s1];
next[s1] = next[s2];
next[s2] = temp;
Or however you swap in your language. As far as I know, Java doesn't have a nice equivalent of C++'s std::swap(next[s1], next[s2]).
This is obviously related to cyclic linked lists, but more compact. The downside is that you have to prepare your "universe" in advance. With linked lists you can arbitrarily add items. Also if your items are not the integers 0 to n then you will have an array on the side to do the mapping, but that's not really a pure downside or upside, it depends on what you need to do with it.
A bonus upside is that because you can refer to an item by index, it goes together more easily with other data structures. For example, it cooperates well with the Union Find structure (which is also an array of integers, well, two of them), inheriting the O(1) Union that both structures offer, keeping the amortized O(α(n)) Find of Union Find, and also (from the cycles structure) keeping the O(m) set enumeration for a set of size m. So you mostly get the best of both worlds.
In case it wasn't obvious, you can initialize the "universe" with "all singletons" like this:
for (int i = 0; i < next.length; i++)
next[i] = i;
The same as in Union Find.
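Put together, the cycles structure might be wrapped up as follows. The class and method names are invented for this sketch, and it does not check the precondition that the two indexes are in different sets.

```java
import java.util.ArrayList;
import java.util.List;

class CycleSets {
    private final int[] next;

    // Starts with n singleton sets: every element links to itself.
    CycleSets(int n) {
        next = new int[n];
        for (int i = 0; i < n; i++) next[i] = i;
    }

    // Merges the cycles containing a and b in O(1) by swapping their links.
    // Precondition: a and b are in different sets; otherwise the cycle is split.
    void union(int a, int b) {
        int temp = next[a];
        next[a] = next[b];
        next[b] = temp;
    }

    // Enumerates the set containing `start` in O(m) by walking its cycle once.
    List<Integer> members(int start) {
        List<Integer> result = new ArrayList<>();
        int i = start;
        do {
            result.add(i);
            i = next[i];
        } while (i != start);
        return result;
    }
}
```

Pairing this with a Union Find structure would let the caller test the precondition before calling union.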

Record Matching - Efficient Iteration

I have to perform record matching on 70K records in Java. Each record is about 200 bytes. The matching process compares every record against every other record. My question is: how can I iterate and perform the comparisons efficiently?
First of all, you don't need to compare every record with every other in both directions. Since comparing A with B is the same as comparing B with A, you only need to compare each record with its successors. For example, given { A, B, C, D }, you compare A with B, C and D, then B with C and D, and finally C with D. This cuts the number of comparisons from n^2 to n(n-1)/2.
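A minimal sketch of that loop, assuming the match test is supplied by the caller as a BiPredicate (the class and method names are illustrative): the inner index starts at i + 1, so each unordered pair is visited exactly once, n(n-1)/2 pairs in total.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

class PairwiseCompare {
    // Number of unordered pairs among n records.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Compares each record only with its successors and returns the
    // index pairs {i, j} (with i < j) for which isMatch holds.
    static <T> List<int[]> matchingPairs(List<T> records, BiPredicate<T, T> isMatch) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                if (isMatch.test(records.get(i), records.get(j))) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

For 70,000 records, pairCount gives 2,449,965,000 comparisons, which is still a lot of work and is why reducing the candidate pairs matters.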
You can optimize the algorithm further by building search blocks. Put everyone with the same first and last name in one block, everyone with the same email in another block, and so on. Then process each block, comparing its records as described above. Depending on the number of records you have, this can reduce the processing time dramatically.
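The blocking idea can be sketched as a group-by on a caller-supplied key. This is a simplified illustration with a single blocking key and invented names; running it once per key (name, email, ...) would approximate the multi-block scheme described above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class Blocking {
    // Groups records into blocks that share the same blocking key.
    static <T, K> Map<K, List<T>> block(List<T> records, Function<T, K> blockingKey) {
        Map<K, List<T>> blocks = new HashMap<>();
        for (T record : records) {
            blocks.computeIfAbsent(blockingKey.apply(record), k -> new ArrayList<>()).add(record);
        }
        return blocks;
    }

    // Pairwise comparisons needed when records are compared only within their block.
    static <T, K> long comparisonsNeeded(Map<K, List<T>> blocks) {
        long total = 0;
        for (List<T> members : blocks.values()) {
            long m = members.size();
            total += m * (m - 1) / 2;
        }
        return total;
    }
}
```

The trade-off is that records which should match but land in different blocks are never compared, so the choice of key matters.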
Use Duke [https://github.com/larsga/Duke].
Not perfect, but it's free and written in Java.
We have a .NET version that is better and faster, but it's an in-house thing, not OSS yet.

Genetic Algorithms: Genes values should sum up to one

I want to implement a genetic algorithm (I'm not sure about the language/framework yet, maybe Watchmaker) to optimize the mixing ratio of some fluids.
Each mix consists of up to 5 ingredients a, b, c, d, e, which I would model as genes with changing values. As the chromosome represents a mixing ratio, there are (at least) two additional conditions:
(1) a + b + c + d + e = 1
(2) a, b, c, d, e >= 0
I'm still in the stage of planning my project, therefore I can give no sample code, however I want to know if and how these conditions can be implemented in a genetic algorithm with a framework like Watchmaker.
[edit]
As this doesn't seem to be straight forward some clarification:
The problem is condition (1): if each gene a, b, c, d, e is chosen randomly and independently, the probability of it being satisfied is approximately 0. I would therefore need to implement the mutation in a way where a, b, c, d, e are chosen depending on each other (see Random numbers that add to 100: Matlab as an example).
However, I don't know whether this is possible and whether it would be in accordance with evolutionary algorithms in general.
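The first condition (a+b+c+d+e=1) can be satisfied by having shorter chromosomes, with only a,b,c,d. The e value can then be represented (in the fitness function or for later use) by e:=1-a-b-c-d. Note that this only works while a+b+c+d <= 1; otherwise e becomes negative and violates the second condition.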
EDIT:
Another way to satisfy the first condition would be to normalize the values:
sum:= a+b+c+d+e
a:= a/sum;
b:= b/sum;
c:= c/sum;
d:= d/sum;
e:= e/sum;
The new sum will then be 1 (provided the original sum is not zero).
For the second condition (a,b,c,d,e>=0), you can add an approval phase for the new offspring chromosomes (generated by mutation and/or crossover) before throwing them into the gene pool (and allowing them to breed), and reject those that don't satisfy the condition.
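The two steps above (normalize, then approve or reject) could be sketched framework-independently like this. The class and method names are hypothetical and nothing here uses the Watchmaker API; how such a repair step would be plugged into a particular framework is left open.

```java
class MixRepair {
    // Condition (2): every gene must be non-negative.
    static boolean isNonNegative(double[] genes) {
        for (double g : genes) {
            if (g < 0) return false;
        }
        return true;
    }

    // Condition (1): returns a rescaled copy whose genes sum to 1.
    // Rejects inputs whose sum is not positive, since they cannot be rescaled.
    static double[] normalize(double[] genes) {
        double sum = 0;
        for (double g : genes) sum += g;
        if (sum <= 0) throw new IllegalArgumentException("sum must be positive");
        double[] out = new double[genes.length];
        for (int i = 0; i < genes.length; i++) out[i] = genes[i] / sum;
        return out;
    }
}
```

An offspring would first be checked with isNonNegative and then passed through normalize before it enters the gene pool.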

Algorithm for Graph/Data Structure on Java

I have been working on the following problem: I have a CSV file with two columns; say both field names are "Friends". Both columns contain letters from A to Z.
e.g.
A B
B C
A E
D F
E F
Each row has two different letters (no duplication within a row). A is a friend of B, B is a friend of C, etc. If person A talks to person B and person B talks to person C, then A and C become acquaintances. Acquaintances are people who share a common friend. I need to find out who has the most acquaintances.
I have been trying two different methods: one using different data structures like HashMap, ArrayList, Stack etc., and another using graph theory (the JGraphT library).
But I am stuck on the logic if I use data structures, and I am stuck on traversal if I use the graph approach.
I have the following questions:
Which is the better approach, data structures or a graph? Or is there any other better approach/logic/algorithm than these?
Does anyone know how to traverse a graph in the JGraphT library? I am not able to do this; the library has very limited documentation.
Any help would really be appreciated.
Generally, HashMaps are among the fastest and easiest structures to use. I would recommend using them rather than a custom library, unless you are sure that the library easily does something you need that would take you a long time to code yourself.
In your case, you can just use each person as a key and the list of their friends as the value. Parsing your .csv file and filling the HashMap accordingly will solve your issue, as a previous comment pointed out.
You can have a hash table first that maps every letter to the set of its friends, e.g. A maps to { B }, B maps to { C }, and if Z has two friends Y and W then Z maps to { Y, W }. This is a hash map from letters to sets-of-letters.
To calculate the acquaintances, iterate through the hash map. When you are at entry A -> { B, C, F } then iterate in an inner loop through the set B, C, F. Collect their friends (three sets) into a single set (just insert the elements into a temp set) and remove A from that set if A was found. The size of that set is then the number of acquaintances for A. Rinse and repeat.
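The procedure above might be written as follows. This sketch assumes friendship is symmetric (each row is stored in both directions), follows the description literally by removing only the person themselves from the collected set, and uses invented class and method names.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class Acquaintances {
    // Builds a symmetric friend map from rows such as {"A", "B"}.
    static Map<String, Set<String>> buildFriends(List<String[]> rows) {
        Map<String, Set<String>> friends = new TreeMap<>();
        for (String[] row : rows) {
            friends.computeIfAbsent(row[0], k -> new HashSet<>()).add(row[1]);
            friends.computeIfAbsent(row[1], k -> new HashSet<>()).add(row[0]);
        }
        return friends;
    }

    // For each person, collects the friends of their friends into one set,
    // removes the person themselves, and records the size of that set.
    static Map<String, Integer> countAcquaintances(Map<String, Set<String>> friends) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : friends.entrySet()) {
            Set<String> reached = new HashSet<>();
            for (String friend : entry.getValue()) {
                reached.addAll(friends.getOrDefault(friend, Collections.<String>emptySet()));
            }
            reached.remove(entry.getKey());
            counts.put(entry.getKey(), reached.size());
        }
        return counts;
    }
}
```

If direct friends should not also count as acquaintances, the person's own friend set could be removed from the collected set as well; the description leaves that open.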

The efficient approach to calculate the intersection or union of two sets

I have two ArrayLists, each of which stores a set of elements. I would like to compute and output the intersection of these two sets. Is there an efficient and elegant way to achieve this? How about the union?
HashSet s0 = new HashSet(arraylist0);
s0.retainAll(arraylist1);
System.out.println("Intersection: " + s0);
s0 = new HashSet(arraylist0);
s0.addAll(arraylist1);
System.out.println("Union: " + s0);
If the data is logically a set, it should be stored in a Set such as a HashSet rather than a List. If you don't mind creating a new copy set and/or modifying one of the existing sets, addAll and retainAll can be used to get the union/intersection.
Another option is to use Guava's Sets class to create views of the union, intersection, etc. of the two sets:
Set<Foo> union = Sets.union(firstSet, secondSet);
Such views are very efficient to create (constant time) and to do most operations (particularly contains) on, but may have to iterate over the elements for other operations such as size. They also reflect the state of their input sets even as those sets are modified after creation.
The simplest way to compute an intersection/union is to do it lazily. This takes a constant amount of time and memory. For example, to take a union of two sets which are described by some point membership classifier (aka a PMC), you could do the following:
def union(pmc_a, pmc_b):
    return lambda x: pmc_a(x) or pmc_b(x)
Of course to avoid such trivialities, you must define the union and intersection relative to the type of sets you are interested in and the type of data structure you wish to use.
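In Java, the same lazy construction could be sketched with java.util.function.Predicate playing the role of the point membership classifier. The class and method names are illustrative; the standard Predicate.or and Predicate.and methods would do the same job.

```java
import java.util.function.Predicate;

class LazySets {
    // Lazy union: x is a member if either classifier accepts it.
    static <T> Predicate<T> union(Predicate<T> a, Predicate<T> b) {
        return x -> a.test(x) || b.test(x);
    }

    // Lazy intersection: x is a member only if both classifiers accept it.
    static <T> Predicate<T> intersection(Predicate<T> a, Predicate<T> b) {
        return x -> a.test(x) && b.test(x);
    }
}
```

Nothing is enumerated or copied here, which is why construction is constant time; the cost is paid on each membership query instead.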
For example, if the sets are discrete, then you should use a hash set, as both Marcin and Pajton suggest.
If they are continuous sets formed by intersections, unions and complements of closed halfspaces (ie Nef polyhedra), then a BSP tree is a better data structure giving you linear time boolean operations (for fixed dimension).
On the other hand, if they are arbitrary algebraic sets (in other words given by the zeros of a set of polynomial equations), then you would be stuck using Buchberger's algorithm to compute the Gröbner basis.
Finally, for general semialgebraic sets (ie sets defined by polynomial inequalities), the best you can do is use Tarski-Seidenberg and the cylindrical algebraic decomposition. These latter methods are decidable in principle but can be prohibitively expensive in practice (cylindrical algebraic decomposition is doubly exponential in the number of variables in the worst case).
There are of course many more types of algorithms which are specialized to various types of sets and their representations. So, the bottom line is that in order to compute these operations you have to first describe what sorts of objects you are working with, and how they are to be represented.
Union: add them to a hashset.
Intersection: Add one of them to a hashset. When adding the members of the second list, record which ones are a clash.
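A small generic sketch of those two recipes, with invented names: the union adds both lists to one set, and the intersection loads the first list into a hash set and keeps the elements of the second list that clash with it.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ListSetOps {
    // Union: add the elements of both lists to one set.
    static <T> Set<T> union(List<T> first, List<T> second) {
        Set<T> result = new LinkedHashSet<>(first);
        result.addAll(second);
        return result;
    }

    // Intersection: put the first list in a hash set, then record every
    // element of the second list that is already present (a "clash").
    static <T> Set<T> intersection(List<T> first, List<T> second) {
        Set<T> seen = new HashSet<>(first);
        Set<T> clashes = new LinkedHashSet<>();
        for (T element : second) {
            if (seen.contains(element)) clashes.add(element);
        }
        return clashes;
    }
}
```

Both methods run in expected time linear in the combined size of the lists and leave the inputs unmodified.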
