I am working on a microflow engine (backend), which is a process flow executed at runtime.
Consider the following diagram, where each process is a Java class. Variables flow out of one process and into another. Since the flow is dynamic in nature, very complicated flows are possible, with many gateways (GW) and processes.
Is DFS/BFS a good choice for implementing the runtime engine? Any ideas, guys?
As far as the given example is concerned, it is solved via Depth First Search (DFS), using the output node as the "root" of the tree.
This is because:
For the output to obtain a value, it needs the output of Process4
For Process4 to produce an output, it needs the outputs of Process2 and Process3
For Process2 / Process3 to produce an output, they need the output of GW
For GW to produce an output, it needs the output of Process1
So, the general idea would be to do a DFS from each output, all the way back to the inputs.
This will work almost as described for anything that looks like a Directed Acyclic Graph (DAG, or in fact a Tree), from the point of view of the output.
If a workflow ends up having "cycle edges" or "feedback loops", that is, if it now looks like a Graph, then additional consideration will need to be given to avoid infinite traversals and re-evaluation of a Process output.
Finally, if a workflow needs to be aware of the concept of "Time" (in general) then additional consideration will need to be given so that it is ensured that although the graph is evaluated progressively, node-by-node, in the end, it has produced the right output for time instance (n). That is, you want to avoid some Processes producing output AHEAD of the current time instance just because they were called more frequently.
A trivial example of this is already present in the question. Due to DFS, GW will be evaluated for Process2 (or Process3) but it doesn't have to be re-evaluated (for the same time instance) for Process3 (or Process2). When dealing with DAGs, you can simply add an "Evaluated" flag on each Process which is cleared at the beginning of the traversal. Then, DFS would decide to descend down the branch of a node if it finds that it is not yet evaluated. Otherwise, it simply obtains the output of some Process that was evaluated during a previous traversal. (This is why I mention "almost as described" earlier). But, this trivial trick will not work with multiple feedback loops. In that case, you really need to make the nodes "aware" about the passage of time.
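To make the "Evaluated" flag idea concrete, here is a minimal Java sketch (all class and method names are made up for illustration): each node pulls its inputs via DFS, and a memo ensures a shared dependency like GW is evaluated only once per traversal. The toy "processing" just sums the inputs and adds 1; a real engine would clear the memo at the start of each traversal / time instance.

```java
import java.util.*;

public class FlowEngine {
    // adjacency: node -> the input nodes it depends on
    private final Map<String, List<String>> inputsOf = new HashMap<>();
    // the "Evaluated" flag, stored as a memo of computed values;
    // a real engine would clear this at the start of every traversal
    private final Map<String, Integer> evaluated = new HashMap<>();

    public void connect(String node, String... inputs) {
        inputsOf.put(node, Arrays.asList(inputs));
    }

    // Toy "process": each node sums its inputs and adds 1.
    public int evaluate(String node) {
        Integer memo = evaluated.get(node);
        if (memo != null) return memo;            // already evaluated this traversal
        int sum = 1;
        for (String in : inputsOf.getOrDefault(node, List.of()))
            sum += evaluate(in);                  // DFS into the dependencies
        evaluated.put(node, sum);
        return sum;
    }

    public static void main(String[] args) {
        FlowEngine e = new FlowEngine();
        // topology from the question: Process1 -> GW -> {Process2, Process3} -> Process4 -> output
        e.connect("GW", "Process1");
        e.connect("Process2", "GW");
        e.connect("Process3", "GW");
        e.connect("Process4", "Process2", "Process3");
        e.connect("output", "Process4");
        System.out.println(e.evaluate("output"));
    }
}
```

Note that when `Process3` asks for GW's output, the memo is hit and GW is not re-evaluated, which is exactly the behaviour described above for DAGs.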
For more information and for a really thorough exposition of related issues, I would strongly recommend that you go through Bruno Preiss' Y logic simulator. Although it is in C++ and is a logic simulator, it goes through exactly the same considerations that are faced by any similar system of interconnected "abstract nodes" that are supposed to be carrying out some form of "processing".
Hope this helps.
I am working on a thing which should be "live", i.e. use WebSockets or SSE to show current data in the browser. I have two sources of data, and they should be combined with a bit of business logic. The data can be retrieved using HTTP GET, and it also arrives as webhook notifications.
I am able to code the needed thing in Java + Spring, but readability would suffer. I have discovered that using RethinkDB would make my task much easier, but it seems that the project is no longer actively developed.
I would like a Java-idiomatic approach / library / external software (like a database) that makes it easy (maintainable ~= less code) to implement an algorithm which would, for example, do something like this:
2 inputs:
filesystem tree (git repo)
list of trees with some processing info in them. Each tree in the list contains:
a root node with some irrelevant info
some number of child nodes
leaf nodes with a filename from the filesystem (with path), the duration of the action on the file, and the status of the file processing
Note: the second input can contain, for example, 20 trees, each with info about processing a single file from the filesystem tree. I.e. to get the info about a particular file, we may need to crawl the whole list of trees, and there is no guarantee that the file will have any matching processing info in the second input. In that case, we output "N/A" for that file in the resulting tree.
I would like to transform these two inputs into another tree, which will have the structure of the first input and will contain, per file, the last status (last element of the array) and the sum of the durations.
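For what it's worth, the folding step just described (sum of durations, last status, "N/A" when nothing matches) can be sketched in plain Java, assuming the leaf nodes of the second input have already been flattened into a per-file list of runs. All names here are hypothetical:

```java
import java.util.*;

public class ProcessingSummary {
    // one processing run of a file, as found in a leaf node of the second input
    record Run(long durationMillis, String status) {}
    // the per-file result that goes into the output tree
    record Summary(long totalMillis, String lastStatus) {}

    // files: every path in the filesystem tree (first input)
    // runs:  path -> ordered runs gathered from the processing trees (second input)
    static Map<String, Summary> summarize(List<String> files, Map<String, List<Run>> runs) {
        Map<String, Summary> out = new LinkedHashMap<>();
        for (String f : files) {
            List<Run> rs = runs.get(f);
            if (rs == null || rs.isEmpty()) {
                out.put(f, new Summary(0, "N/A"));  // no matching processing info
            } else {
                long total = rs.stream().mapToLong(Run::durationMillis).sum();
                out.put(f, new Summary(total, rs.get(rs.size() - 1).status()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Run>> runs = Map.of(
            "src/A.java", List.of(new Run(100, "OK"), new Run(250, "FAILED")));
        System.out.println(summarize(List.of("src/A.java", "src/B.java"), runs));
    }
}
```

The reactive part would then be about re-running (or incrementally updating) this fold whenever either input changes, which is a separate concern from the fold itself.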
My current approach was not reactive. It involved a lot of the Java Stream API and HTTP GETs to fetch the actual data from the two sources. It worked OK and was fast enough, but not fast enough to introduce polling and make the user feel it is real time.
Making this reactive while keeping the current algorithms in place would involve a lot of spaghetti code, so I have started another approach from scratch.
I have started to write some nice OOP classes which receive changes from both inputs and produce "observable" changes as output. This would be relatively nice if my "query", which computes the output, were immutable. It is not, due to design changes in the business logic.
Can you point me to some approach making this problem implementation easy to maintain?
PS: I was considering using Spring's cache mechanism for receiving changes (caching the methods which make the HTTP GET calls for the inputs and return parsed, partly processed input data). But this part of the code is a bit too small to make any difference.
I have been working with the TinkerPop Gremlin graph, and lately I have been able to do a lot with it. Now I'm stuck at one point: I'm trying to process many thousands of vertices and edges, and it takes around one hour to complete. How can I apply a parallelStream() operation to the following part:
for (String s : somelist) {
    String[] ss = s.split(",");
    graphTraversal().addEdge(ss[0], ss[1]);
}
That "somelist" contains the source and target vertices for each edge (~65,000 entries).
TinkerGraph technically isn't completely thread-safe for writes. You might hit some problems depending on what you're loading and how you are loading it. I can't say exactly what those problems are and what you might need to do to avoid them, but we definitely haven't tested TinkerGraph that way.
That said, 65,000 edges in the format you're specifying in your sample code should not take an hour to load into TinkerGraph even in a single threaded mode of operation. That sounds a bit excessive. I assume your sample code is not what you are actually executing as that is not valid Gremlin syntax, so it's hard to say what the problem might be.
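One pattern that sidesteps the thread-safety concern: parallelize only the CPU-bound parsing of the strings, and keep the graph mutations on a single thread. A rough sketch, where `addEdge` is a stand-in for the real Gremlin mutation (not actual TinkerPop API):

```java
import java.util.*;
import java.util.stream.*;

public class EdgeLoader {
    // stand-in for the graph: records which edges were "added"
    static final List<String[]> applied = new ArrayList<>();

    // stand-in for the real (not thread-safe) graph mutation
    static void addEdge(String src, String dst) {
        applied.add(new String[] { src, dst });
    }

    static void load(List<String> someList) {
        applied.clear();
        // parallel part: pure parsing, no shared mutable state
        List<String[]> parsed = someList.parallelStream()
                .map(s -> s.split(","))
                .collect(Collectors.toList());   // encounter order is preserved
        // sequential part: all writes happen on this one thread
        parsed.forEach(ss -> addEdge(ss[0], ss[1]));
    }

    public static void main(String[] args) {
        load(List.of("a,b", "b,c", "c,a"));
        System.out.println(applied.size());
    }
}
```

For 65,000 edges the parsing is unlikely to be the bottleneck anyway, so profiling the actual mutation code first would be a good idea.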
I want to make a Java program to help people with basic discrete mathematics (that is to say, checking the truth values of statements). To do this, I need to be able to detect how many variables the user inputs, what operators there are, and what quantifiers there are, if any (∃ and ∀). Is there a good algorithm for being able to do all these things?
Just so you know, I don't just want a result; I want full control over their input, so I can show them the logical proof. (so doing something like passing it to JavaScript won't work).
Okay, so, your question is a bit vague, but I think I understand what you'd like to do: an educational aid that processes first-order logic formulas, showing the user step by step how to work with such formulas, right? I think the idea has merit, and it's perfectly doable, even as a one-man project, but it's not terribly easy, and you'll have to learn a lot of new things -- but they're all very interesting things, so even if nothing at all comes out of it, you'd certainly gain some valuable knowledge.
I'd suggest you start small. I'd begin by building a recursive descent parser to recognize zero-order logic formulas (a machine that decides whether a formula is valid, i.e. it'd accept "A ^ B" but reject "^ A ^"). Next you'd have to devise a way to store the formula, and then you'd be able to actually work on it. Then again, start small: a little machine that accepts valid zero-order logic formulas like TRUE AND NOT (TRUE AND FALSE), and successfully reduces them step by step to TRUE, is already something people can learn from, and it's not too hard to write. If you're feeling adventurous, add variables and make equations: A AND TRUE = TRUE -- it's easy to work these out with reductions and truth tables.
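As a concrete illustration of that "little machine", here is a minimal recursive descent evaluator for such formulas. The grammar is my own assumption (OR binds loosest, then AND, then NOT); a teaching tool would build a tree instead of evaluating directly, but the parsing structure is the same:

```java
public class LogicParser {
    // Assumed grammar:
    //   expr   := term ("OR" term)*
    //   term   := factor ("AND" factor)*
    //   factor := "NOT" factor | "(" expr ")" | "TRUE" | "FALSE"
    private final String[] tokens;
    private int pos = 0;

    LogicParser(String input) {
        // crude tokenizer: pad parentheses with spaces, then split on whitespace
        tokens = input.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
    }

    static boolean eval(String input) { return new LogicParser(input).expr(); }

    private boolean expr() {                       // expr := term ("OR" term)*
        boolean v = term();
        while (match("OR")) { boolean r = term(); v = v || r; }
        return v;
    }

    private boolean term() {                       // term := factor ("AND" factor)*
        boolean v = factor();
        while (match("AND")) { boolean r = factor(); v = v && r; }
        return v;
    }

    private boolean factor() {                     // factor := NOT factor | ( expr ) | literal
        if (match("NOT")) return !factor();
        if (match("(")) { boolean v = expr(); match(")"); return v; }
        if (match("TRUE")) return true;
        if (match("FALSE")) return false;
        throw new IllegalArgumentException("unexpected token at position " + pos);
    }

    private boolean match(String t) {              // consume the token if it is t
        if (pos < tokens.length && tokens[pos].equals(t)) { pos++; return true; }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(eval("TRUE AND NOT (TRUE AND FALSE)")); // true
    }
}
```

Note how each grammar rule became one method; adding variables later means adding one more case to `factor()` plus an environment mapping names to values.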
Things get tricky with quantifiers that bind variables; that's where automated theorem proving may come into play. But then, it all depends on exactly what you'd like to do: implementing transformations into the various normal forms, and showing the process step by step to the student, would be fairly easy, and rather useful.
At any rate, I think it's a decent personal project, and you could learn a lot from it. If you're in a university, you could even get some credit for it eventually.
The technique I have used is to parse the input string using a context-free grammar. There are many frameworks to help you do this; I have personally used ANTLR in the past to parse an input string into a discrete logic tree. ANTLR allows you to define a CFG which you can map to Java types. This allows you to map to a data structure to store and evaluate the truth value of the expression. Of course, you would also be able to pull out the variables contained in the data structure.
I'm writing a biological evolution simulator. Currently, all of my code is written in Python. For the most part, this is great and everything works sufficiently well. However, there are two steps in the process which take a long time and which I'd like to rewrite in Scala.
The first problem area is sequence evolution. Imagine you're given a phylogenetic tree which relates a large set of proteins. The length of each branch represents the evolutionary distance between the parent and child. The root of the tree is seeded with a single sequence, and then an evolutionary model (e.g. http://en.wikipedia.org/wiki/Models_of_DNA_evolution) is used to evolve the sequence along the tree structure, taking into account the branch lengths. PyCogent takes a long time to perform this step, and I believe that a reasonable Java/Scala implementation would be significantly faster. Do you know of any libraries that implement this type of functionality? I want to write the application in Scala, so, due to interoperability, any Java library will suffice.
The second problem area is the comparison of the generated sequences. The problem is: given a set of sequences for the proteins in a number of different extant species, use the sequences to reconstruct the phylogenetic tree which relates the species. This problem is inherently computationally demanding, because one must basically do a pairwise comparison between all sequences in the extant species. Here again, however, I feel like a Java/Scala implementation would perform significantly faster than a Python one, if for nothing else than the unfortunately slow speed of looping in Python. This part I could write from scratch more easily than the sequence evolution part, but I'd be willing to use a library for it as well if a good one exists.
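Just to illustrate the pairwise structure of that comparison (not the biology): a naive symmetric distance matrix over pre-aligned, equal-length sequences, using plain Hamming distance. A real reconstruction would use a proper evolutionary distance, not raw mismatch counts; this only shows the O(n^2/2) loop shape that dominates the cost.

```java
public class PairwiseDistances {
    // number of mismatching positions; assumes a and b are aligned and equal length
    static int hamming(String a, String b) {
        int d = 0;
        for (int i = 0; i < a.length(); i++)
            if (a.charAt(i) != b.charAt(i)) d++;
        return d;
    }

    static int[][] distanceMatrix(String[] seqs) {
        int n = seqs.length;
        int[][] m = new int[n][n];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                m[i][j] = m[j][i] = hamming(seqs[i], seqs[j]); // symmetric: compute once
        return m;
    }

    public static void main(String[] args) {
        int[][] m = distanceMatrix(new String[] { "ACGT", "ACGA", "TCGA" });
        System.out.println(m[0][1] + " " + m[0][2] + " " + m[1][2]); // 1 2 1
    }
}
```

A matrix like this is exactly what distance-based methods (e.g. neighbour-joining) take as input, which connects to the answer below about when such methods are and aren't adequate.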
Thanks,
Rob
For the second problem, why not use an existing program for comparing sequences and inferring phylogenetic trees, like RAxML or MrBayes, and call that? Maximum likelihood and Bayesian inference are very sophisticated models for these problems, and using them seems a far better idea than implementing it yourself. Something like a maximum parsimony or a neighbour-joining tree, which probably could be written from scratch for such a project, is not sufficient for evolutionary analysis. Unless you just want a very quick and dirty topology (and trees inferred via MP or NJ are really quite often wrong), in which case you can probably use something like this
Well, I need to make a simulator for a non-deterministic push-down automaton.
Everything is okay; I know I need to use recursion or something similar, but I do not know how to write the function that would simulate the automaton.
I have everything else under control: the automaton generator, the stack ...
I am doing it in Java, so that is perhaps the only obstacle one could bump into, and I have dealt with it.
So if anyone has done something similar, I could use some advice.
This is my current organisation of the code:
Classes:
class transit:
    list<transit> - contains the non-deterministic transitions
    state
    input sign
    stack sign
class generator:
    generates the automaton from a file
class NPA:
    public boolean start() - this is the function I am having trouble with
Of course there is the problem of separate stacks, and of the input, for every branch.
I tried to solve it with a collection of NPA objects, trying to start every object, but it doesn't work.
Okay, think about the definition of the automaton. You have states and a state transition function. You have the stack. What makes life exciting is the non-determinism.
However, it is a theorem (look it up) that every nondeterministic finite automaton has an equivalent deterministic FSA.
One approach you could try is to construct the equivalent DFA. That's exponential space in the worst case, though: every state in the DFA corresponds to an element of the powerset of the NFA states, i.e. a subset of them.
So you could try it "on line" instead. Now, instead of constructing the equivalent DFA, you simulate the NFA: at each state transition you construct all the next states you can reach and put them in some data structure; then go back and see what happens next for each such state.
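Here is a rough Java sketch of that online simulation, adapted to the PDA case: the "data structure" is a set of live configurations, and each configuration carries its own copy of the stack, so branches don't interfere (which addresses the "separate stacks for every branch" problem from the question). Epsilon moves are omitted for brevity, and the example language (even-length palindromes, which genuinely needs non-determinism to guess the middle) and all names are my own illustration:

```java
import java.util.*;

public class NpdaSim {
    record Config(int state, String stack) {}   // stack as a string, top = first char
    record Key(int state, char input, char top) {}
    record Move(int next, String push) {}       // push string replaces the popped top

    static final Map<Key, List<Move>> delta = new HashMap<>();

    static void add(int state, char in, char top, int next, String push) {
        delta.computeIfAbsent(new Key(state, in, top), k -> new ArrayList<>())
             .add(new Move(next, push));
    }

    // Even-length palindromes w·w^R over {a,b}; 'Z' is the bottom-of-stack marker.
    static void demoRules() {
        for (char x : new char[] { 'a', 'b' })
            for (char top : new char[] { 'Z', 'a', 'b' })
                add(0, x, top, 0, x + String.valueOf(top)); // phase 1: keep pushing
        add(0, 'a', 'a', 1, "");                            // non-deterministic guess:
        add(0, 'b', 'b', 1, "");                            // "we just passed the middle"
        add(1, 'a', 'a', 1, "");                            // phase 2: pop while matching
        add(1, 'b', 'b', 1, "");                            // the reversed first half
    }

    static boolean accepts(String input) {
        Set<Config> frontier = Set.of(new Config(0, "Z"));  // all live branches
        for (char c : input.toCharArray()) {
            Set<Config> next = new HashSet<>();
            for (Config cfg : frontier) {
                if (cfg.stack().isEmpty()) continue;
                char top = cfg.stack().charAt(0);
                for (Move m : delta.getOrDefault(new Key(cfg.state(), c, top), List.of()))
                    next.add(new Config(m.next(), m.push() + cfg.stack().substring(1)));
            }
            frontier = next;                                // each branch kept its own stack
        }
        // accept if some branch matched everything and is back to the bottom marker
        return frontier.stream().anyMatch(c -> c.state() == 1 && c.stack().equals("Z"));
    }

    public static void main(String[] args) {
        demoRules();
        System.out.println(accepts("abba") + " " + accepts("ab"));
    }
}
```

The key point is that the frontier set plays the role recursion would play: dead branches simply disappear from the set, and no explicit backtracking is needed.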
JFLAP is open source and does this (and much more!) - why not check it out?