My own graph representation for Neo4j - java

I have a graph that I want to explore in different ways. It will be explored by users, and I cannot know in advance what information they will want to retrieve from it. I like Cypher very much, and I was wondering whether I can use it as a front-end on top of my own representation of the graph.
To explain: for performance reasons, I cannot convert my graph into a Neo4j graph. So I was thinking that I could use Cypher and a modified Neo4j to explore the graph through my own representation of nodes, labels, properties and so on.
I think this solution would be good because I can:
Reuse the parser and semantic checker of the language
Partially reuse the optimization engine, at least the platform-independent part.
I have been exploring the source code on GitHub, and it seems tightly coupled to a specific implementation.
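To make the idea concrete: what the question asks for amounts to a query layer that talks only to a small read-only graph interface, with one adapter per storage backend. The sketch below assumes that shape. GraphView, NodeView, SimpleNode, InMemoryGraph and QueryDemo are hypothetical names, this is not Neo4j's internal API, and the Cypher pattern in the comment is evaluated by hand rather than by a real Cypher engine (a real planner would also need relationship types, indexes and statistics).

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical read-only view of a node; NOT a Neo4j API.
interface NodeView {
    long id();
    Set<String> labels();
    Object property(String key); // null if the property is absent
}

// Hypothetical read-only view of the whole graph that a query layer could target.
interface GraphView {
    Stream<NodeView> nodesWithLabel(String label);
    Stream<NodeView> neighbours(NodeView node);
}

// Hypothetical node type of the "existing system".
record SimpleNode(long id, Set<String> labels, Map<String, Object> props) implements NodeView {
    public Object property(String key) { return props.get(key); }
}

// Hypothetical adapter over an in-memory representation (no copy into Neo4j).
class InMemoryGraph implements GraphView {
    private final Map<Long, SimpleNode> nodes = new HashMap<>();
    private final Map<Long, List<Long>> adjacency = new HashMap<>();

    void addNode(SimpleNode n) {
        nodes.put(n.id(), n);
        adjacency.putIfAbsent(n.id(), new ArrayList<>());
    }

    void addEdge(long from, long to) { adjacency.get(from).add(to); }

    public Stream<NodeView> nodesWithLabel(String label) {
        return nodes.values().stream().filter(n -> n.labels().contains(label)).map(n -> (NodeView) n);
    }

    public Stream<NodeView> neighbours(NodeView node) {
        return adjacency.getOrDefault(node.id(), List.of()).stream().map(id -> (NodeView) nodes.get(id));
    }
}

class QueryDemo {
    // Hand-written equivalent of: MATCH (a:Person {name: $name})-->(b) RETURN b
    // (returns the ids of b, sorted), using only the GraphView interface.
    static List<Long> neighboursOfPerson(GraphView g, String name) {
        return g.nodesWithLabel("Person")
                .filter(n -> name.equals(n.property("name")))
                .flatMap(g::neighbours)
                .map(NodeView::id)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        InMemoryGraph g = new InMemoryGraph();
        g.addNode(new SimpleNode(1, Set.of("Person"), Map.of("name", "Ann")));
        g.addNode(new SimpleNode(2, Set.of("City"), Map.of("name", "Oslo")));
        g.addNode(new SimpleNode(3, Set.of("Person"), Map.of("name", "Bob")));
        g.addEdge(1, 2);
        g.addEdge(1, 3);
        System.out.println(neighboursOfPerson(g, "Ann")); // [2, 3]
    }
}
```

Any backend that can answer these two calls (label lookup and neighbour expansion) could sit behind such a layer; the difficulty the question describes is that Neo4j's own runtime does not appear to be written against an abstraction this small.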
My questions:
Are you aware of some project using Cypher/Neo4j like this?
Are you aware of another graph database with a good query language that can be used like that?
Any suggestions on how to approach the modifications to Neo4j?
Just to explain a little bit why I cannot copy the graph: it is produced by another system. It changes a lot and easily has 10,000 nodes. I cannot monitor the graph's modifications to keep a copy up to date because that is, once again, too time-consuming. Even worse, I have to provide a mechanism to query the graph every five seconds.

Related

Can vertices and edges in JGraphT be added in parallel?

I'm creating a graph in Java using JGraphT and adding vertices and edges from a list using a stream.
My question is:
Can I use stream().parallel() to add them faster?
No, at least not as far as I'm aware. Essentially, adding a vertex or edge boils down to two steps: (a) check whether the edge/vertex already exists and, if not, (b) add the edge/vertex. Depending on the type of graph, step (b) involves adding the object to the appropriate container that stores the edges/vertices. Both steps read and modify shared containers, and JGraphT's standard graph implementations are not thread-safe, so parallel adds would at least need locking around each add. I'm not an expert on concurrent programming, but I don't see how a parallel stream can do the above faster.
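The reasoning above can be made concrete with a small sketch. Because "check whether it exists, then add" is a compound action, a graph that can safely be filled from a parallel stream has to hold a lock across both steps. SyncGraph and ParallelAddDemo below are hypothetical classes built on plain java.util collections, not JGraphT's API (JGraphT's own AsSynchronizedGraph wrapper also guards graph operations with a lock). With a single lock, as here, the parallel stream produces a correct graph, but the adds still run one at a time.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical minimal undirected graph (not JGraphT), safe for concurrent adds.
class SyncGraph<V> {
    private final Map<V, Set<V>> adjacency = new HashMap<>();

    // Step (a) "check" and step (b) "add" must be atomic together, so the whole
    // method holds the object's lock; concurrent callers wait for each other.
    synchronized boolean addVertex(V v) {
        if (adjacency.containsKey(v)) return false; // (a) already present
        adjacency.put(v, new HashSet<>());          // (b) store it
        return true;
    }

    // Intrinsic locks are reentrant, so calling addVertex from here is fine.
    synchronized boolean addEdge(V a, V b) {
        if (a.equals(b)) throw new IllegalArgumentException("self-loops not supported in this sketch");
        addVertex(a);
        addVertex(b);
        boolean added = adjacency.get(a).add(b);
        adjacency.get(b).add(a);
        return added;
    }

    synchronized int vertexCount() { return adjacency.size(); }

    // Each undirected edge appears in two adjacency sets.
    synchronized int edgeCount() {
        return adjacency.values().stream().mapToInt(Set::size).sum() / 2;
    }
}

class ParallelAddDemo {
    public static void main(String[] args) {
        SyncGraph<Integer> g = new SyncGraph<>();
        // Path 0-1-2-...-999 added from a parallel stream: correct because every
        // mutation is locked, but the lock serializes the actual work.
        IntStream.range(0, 999).parallel().forEach(i -> g.addEdge(i, i + 1));
        System.out.println(g.vertexCount() + " vertices, " + g.edgeCount() + " edges");
    }
}
```

Removing the synchronized modifiers and running the same parallel stream against the plain HashMap can lose entries or throw, which is exactly why the lock (and with it the serialization) is needed.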
I don't know exactly what your use case is, or what you are trying to accomplish. There are, however, some optimized, special graph types in the jgrapht-opt module that might benefit you. The graph functionality doesn't change (i.e. you can run the same algorithms on them); only the way the graph is stored changes. Some storage mechanisms are more memory-efficient, allowing you to store massive graphs using little memory. Others, such as the sparse graphs, can be created faster and also offer faster access operations, but these graphs are typically immutable, i.e. once created they cannot be changed. What you need really depends on your use case.
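To illustrate why an immutable sparse graph can be both compact and fast, here is a minimal sketch of compressed sparse row (CSR) storage: the edges are grouped by source once (a counting sort), then kept in two flat int arrays, so a neighbour lookup is just an array slice and nothing can be added afterwards. CsrGraph is a hypothetical class showing the general storage idea; I am not claiming it mirrors jgrapht-opt's internals.

```java
import java.util.*;

// Hypothetical immutable directed graph over vertices 0..n-1 in CSR form.
final class CsrGraph {
    private final int[] offsets; // targets[offsets[v] .. offsets[v+1]) are v's successors
    private final int[] targets; // all edge targets, grouped by source vertex

    CsrGraph(int vertexCount, int[][] edges) {
        offsets = new int[vertexCount + 1];
        for (int[] e : edges) offsets[e[0] + 1]++;                          // count out-degrees
        for (int v = 0; v < vertexCount; v++) offsets[v + 1] += offsets[v]; // prefix sums = start indices
        targets = new int[edges.length];
        int[] next = Arrays.copyOf(offsets, vertexCount);                   // write cursor per source
        for (int[] e : edges) targets[next[e[0]]++] = e[1];
    }

    int outDegree(int v) { return offsets[v + 1] - offsets[v]; }

    // Returns a copy so callers cannot modify the shared storage.
    int[] successors(int v) { return Arrays.copyOfRange(targets, offsets[v], offsets[v + 1]); }

    public static void main(String[] args) {
        CsrGraph g = new CsrGraph(4, new int[][] {{0, 1}, {2, 3}, {0, 2}, {1, 2}});
        System.out.println(Arrays.toString(g.successors(0))); // [1, 2]
        System.out.println(g.outDegree(3));                   // 0
    }
}
```

The trade-off matches the answer: building is a single pass over the edge list and lookups touch contiguous memory, but adding an edge later would mean rebuilding both arrays.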

Sorted Array vs Hashtable: Which data structure would be more efficient in searching over a range of dates in a calendar app?

I have about a year of experience coding in Java. To hone my skills, I'm trying to write a calendar/journal desktop app in Java. I've realized that I have no experience with data persistence yet and don't really understand what the persistence options for this program would be, so perhaps I'm jumping the gun, and the design choices I'm hoping to implement won't even apply once I get into the nitty-gritty.
I mainly want to write a calendar app that lets you log daily journal entries with associated activity logs for time spent on daily tasks. For adding, editing and viewing journal entries, a hash table with the entry dates as keys and the entries themselves as values seems the most Big-O efficient (O(1) average case for each operation).
However, I'm also hoping to implement a feature that, given a range of dates, provides a simple analysis of the average time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that a sorted array might be more Big-O efficient? Especially since entries are generally expected to be added in date order.
Or perhaps there's another option I'm unaware of?
The reason I'm asking is the answers to this question: Why not use hashing/hash tables for everything?
And the reason I'm unsure whether I'm even asking the right question is the answers to this question: What's the best data structure for a calendar / day planner?
If that's the case, I would really appreciate being pointed to other resources on data persistence in Java.
Thank you for the help!
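For comparison, the sorted-array idea from the question can be sketched as follows: with entries kept in date order, a binary search finds the first date of the range in O(log n), and the matching entries are then read sequentially, giving O(log n + k) for k results. A hash table keyed by date has no ordering, so it can only look up each day of the range individually or scan all entries. DailyLog, SortedArrayRange and the minute counts are hypothetical; the range is treated as half-open [from, to).

```java
import java.time.LocalDate;
import java.util.*;

// Hypothetical daily record: minutes spent on one task that day.
record DailyLog(LocalDate date, int minutes) {}

class SortedArrayRange {
    // Binary search: index of the first log whose date is >= from.
    // Assumes logs are sorted by date.
    static int lowerBound(List<DailyLog> logs, LocalDate from) {
        int lo = 0, hi = logs.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (logs.get(mid).date().isBefore(from)) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Average minutes per logged day in the half-open range [from, to).
    static double averageMinutes(List<DailyLog> logs, LocalDate from, LocalDate to) {
        int total = 0, days = 0;
        for (int i = lowerBound(logs, from); i < logs.size() && logs.get(i).date().isBefore(to); i++) {
            total += logs.get(i).minutes();
            days++;
        }
        return days == 0 ? 0.0 : (double) total / days;
    }

    public static void main(String[] args) {
        List<DailyLog> logs = new ArrayList<>(); // ArrayList as the "sorted array"
        LocalDate start = LocalDate.of(2024, 1, 1);
        for (int d = 0; d < 10; d++) logs.add(new DailyLog(start.plusDays(d), 30 + 10 * d)); // appended in date order
        // Jan 3 (inclusive) to Jan 6 (exclusive): 50, 60 and 70 minutes
        System.out.println(averageMinutes(logs, LocalDate.of(2024, 1, 3), LocalDate.of(2024, 1, 6))); // 60.0
    }
}
```

Appending stays cheap only because entries arrive in date order; inserting an older date into the middle of an ArrayList costs O(n), which is one reason a tree-based sorted map is a common alternative.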
Use the NavigableMap interface (implemented by TreeMap, a red-black tree).
This allows you to easily and efficiently select date ranges and traverse over events in key order.
As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.
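A short sketch of this suggestion, assuming one journal entry per day keyed by LocalDate (the String values stand in for real entry objects, and JournalRangeDemo is a hypothetical name). NavigableMap.subMap(from, true, to, false) returns a view of the half-open range [from, to) in key order, and TreeMap keeps get and put at O(log n), so the same map serves single-day lookups and range queries.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.NavigableMap;
import java.util.TreeMap;

class JournalRangeDemo {
    // Entries in the half-open interval [first day of month, first day of next month).
    static NavigableMap<LocalDate, String> entriesIn(NavigableMap<LocalDate, String> journal, YearMonth month) {
        return journal.subMap(month.atDay(1), true, month.plusMonths(1).atDay(1), false);
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, String> journal = new TreeMap<>();
        journal.put(LocalDate.of(2024, 3, 1), "Started the project");
        journal.put(LocalDate.of(2024, 3, 5), "Wrote tests");
        journal.put(LocalDate.of(2024, 3, 31), "Month review");
        journal.put(LocalDate.of(2024, 4, 1), "New month");

        // Consecutive months tile the timeline with no gaps or overlaps:
        // April 1 belongs to April's range only, never to March's.
        entriesIn(journal, YearMonth.of(2024, 3))
                .forEach((date, entry) -> System.out.println(date + ": " + entry));
    }
}
```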
It depends on how serious you want your project to be. In all cases, be careful of premature optimization: trying too hard to make your code "efficient" and sacrificing readability/maintainability in the process. For example, there is likely a way to do manual memory management with native code to build a more efficient data structure for your calendar, but that likely does not outweigh the benefits of using familiar APIs, etc. It might, but you only find out when you run and measure your code.
Write readable code
Run it, test for performance issues
Use a profiler (e.g. JProfiler) to identify the code that is responsible for poor performance
Optimise that code
Repeat
For code that will "work" but will not be very scalable, a simple List will usually do fine. You can store your objects as JSON, using a library such as Jackson Databind to map between the List and JSON, and then simply save it to a file for persistence.
For an application that you want to be more robust and protected against data corruption, a database is probably better. With it, you can guarantee that, for example, data is not partially written, concurrent access to the same data will not result in corruption, and a whole host of other benefits. However, you will typically need a database server running alongside your application. You can use JDBC and a suitable driver for your database vendor (e.g. MySQL) to connect to, read from and write to the database.
For a serious application, you will probably want to create an API for your persistence layer. A framework like Spring is very helpful here, as it lets you declare REST endpoints using annotations and introduces useful programming concepts such as containers, IoC/dependency injection, testing support (unit and integration tests), JPA/ORM and more.
As I said, this is all context-dependent, but above all else, avoid premature optimization.
This thread might give you some ideas about which data structure to use for range queries:
Data structure for range query
It might even be easier to use a database and query it for the desired range through an API.
If you are using (or are able to use) Guava, you might consider using RangeMap (*).
This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".
One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).
(*) I work for Google, Guava is Google's open-sourced Java library. This is the library I would use, but others with similar range map offerings are available.
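If adding Guava is not an option, a similar "what is happening at time T" lookup can be approximated with the JDK's TreeMap, under the same no-overlap limitation the answer mentions: key each event by its start instant, use floorEntry(t) to find the latest event starting at or before t, and check that t falls before that event's end. This is a hand-rolled substitute with hypothetical Booking and Schedule types, not Guava's RangeMap API.

```java
import java.time.Instant;
import java.util.*;

// Hypothetical event occupying the half-open interval [start, end).
record Booking(String title, Instant start, Instant end) {}

class Schedule {
    private final TreeMap<Instant, Booking> byStart = new TreeMap<>();

    // Rejects overlapping bookings, since this structure can hold only one event per instant.
    void add(Booking b) {
        Map.Entry<Instant, Booking> before = byStart.floorEntry(b.start());
        Map.Entry<Instant, Booking> after = byStart.ceilingEntry(b.start());
        if ((before != null && before.getValue().end().isAfter(b.start()))
                || (after != null && after.getKey().isBefore(b.end()))) {
            throw new IllegalArgumentException("overlaps an existing booking: " + b.title());
        }
        byStart.put(b.start(), b);
    }

    // The booking in progress at time t, if any: O(log n).
    Optional<Booking> at(Instant t) {
        Map.Entry<Instant, Booking> e = byStart.floorEntry(t);
        if (e == null || !t.isBefore(e.getValue().end())) return Optional.empty();
        return Optional.of(e.getValue());
    }

    public static void main(String[] args) {
        Schedule s = new Schedule();
        Instant nine = Instant.parse("2024-05-06T09:00:00Z");
        s.add(new Booking("Stand-up", nine, nine.plusSeconds(15 * 60)));
        s.add(new Booking("Design review", nine.plusSeconds(3600), nine.plusSeconds(2 * 3600)));
        System.out.println(s.at(nine.plusSeconds(600)).map(Booking::title).orElse("free"));  // Stand-up
        System.out.println(s.at(nine.plusSeconds(1800)).map(Booking::title).orElse("free")); // free
    }
}
```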

Real-time updating framework

I am trying to implement a real-time updating framework in which changing input data automatically triggers recalculation of all dependent results. So I need a subscription mechanism, but a clever one, as I have to handle enormous amounts of data. I like to think of the mechanism as a "calculation tree" or directed graph, with the nodes representing the results and the edges representing the functions.
Something similar must be implemented in MS Excel, with the cells as the nodes, but Excel will not meet my needs, as it cannot handle large amounts of data and is not flexible enough.
While in principle I want to be able to browse through the complete calculation tree (including all results at every depth of the tree), I know that this could mean storing several terabytes of data. So I need to be able to forget or skip nodes if the computer runs out of memory, and then recalculate them as needed. And not to forget: while programming the (short!) functions, I don't want to be bothered with endless subscription boilerplate (ideally, the framework should take care of this automatically).
Do you think it's doable, and if so, how would you approach it? Do you know of any component/library that could be used for this kind of thing? I have thought about publish/subscribe mechanisms and message brokers, but I fear they will slow down my calculations.
Thx in advance for your responses!
Calle
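The core mechanism described in the question (results as nodes, functions as edges, automatic recalculation, and the freedom to drop cached results and recompute them later) can be sketched without a message broker: each node caches its value, dependencies are recorded automatically while formulas run, and a change only marks dependents as stale, so work happens when a stale value is next read. Cell and CellDemo are a hypothetical minimal design, not an existing library; the sketch is single-threaded and assumes the graph has no cycles.

```java
import java.util.*;
import java.util.function.Supplier;

// Hypothetical lazily recomputed node of a calculation graph (single-threaded, no cycles).
final class Cell<T> {
    // Cells whose formulas are currently running; the top one is the current reader.
    private static final Deque<Cell<?>> evaluating = new ArrayDeque<>();

    private final Supplier<T> formula;                        // null for input cells
    private final Set<Cell<?>> dependents = new HashSet<>(); // cells whose formulas read this one
    private T cached;
    private boolean stale;
    int recomputations = 0;                                   // counts formula runs, for illustration

    private Cell(Supplier<T> formula, T value, boolean stale) {
        this.formula = formula;
        this.cached = value;
        this.stale = stale;
    }

    static <V> Cell<V> input(V value) { return new Cell<>(null, value, false); }

    static <V> Cell<V> formula(Supplier<V> f) { return new Cell<>(f, null, true); }

    T get() {
        // Whoever is computing right now depends on this cell: record it (no explicit subscribe call).
        if (!evaluating.isEmpty()) dependents.add(evaluating.peek());
        if (stale) {
            evaluating.push(this);
            try {
                cached = formula.get();
                recomputations++;
            } finally {
                evaluating.pop();
            }
            stale = false;
        }
        return cached;
    }

    // Change an input; dependents are only marked stale, not recomputed (lazy).
    void set(T value) {
        if (formula != null) throw new IllegalStateException("only input cells can be set");
        cached = value;
        invalidateDependents(new HashSet<>());
    }

    // Drop a cached result (e.g. under memory pressure); it is recomputed on the next get().
    void forget() {
        if (formula != null) { cached = null; stale = true; }
    }

    // Marks every transitive dependent stale, visiting each cell once.
    private void invalidateDependents(Set<Cell<?>> seen) {
        for (Cell<?> d : dependents) {
            if (seen.add(d)) {
                d.stale = true;
                d.invalidateDependents(seen);
            }
        }
    }
}

class CellDemo {
    public static void main(String[] args) {
        Cell<Integer> price = Cell.input(10);
        Cell<Integer> qty = Cell.input(3);
        // Formulas just read other cells; dependencies are recorded automatically.
        Cell<Integer> total = Cell.formula(() -> price.get() * qty.get());
        Cell<Integer> withTax = Cell.formula(() -> total.get() + total.get() / 10);

        System.out.println(withTax.get()); // 33
        price.set(20);                     // total and withTax are now stale, nothing recomputed yet
        System.out.println(withTax.get()); // 66
        total.forget();                    // evicted; withTax keeps its cached 66
        System.out.println(total.get());   // 60
    }
}
```

Because invalidation only flips flags and recomputation happens on read, changing an input costs almost nothing until an affected result is requested. A real system would still have to deal with concurrency, cycles, and dependencies that disappear when a formula takes a different branch (this sketch never removes recorded dependencies, so it can over-invalidate but not under-invalidate).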

Data structure for a graph in Java

I am looking to implement an algorithm for the Travelling Salesman Problem. I want to model the solution using a graph, in which a vertex represents a city and an edge represents the cost of travelling from one city to another.
At any point, I have to compute the cost from one city to another, like:
(cityA, cityB)--->cost
What data structure in Java should I use for the graph?
And what kind of GUI tool or library can I use to display a graph?
If you are free to use external libraries, JGraphT is easy to use.
Why, a graph, of course. Take a look at JUNG.
Depending on the size of your problem and your hardware, you might want to take a look at NoSQL graph databases such as Neo4j (http://neo4j.org/).
Most of them are easy to handle, and some (like Neo4j) provide graph visualization, which is nice for debugging purposes.
If you want as much performance as possible, you might have to create your own simple graph format. Most of the libraries mentioned here carry a lot of overhead (generics and the boxing that comes with them, etc.), which can make them slow.
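For the (cityA, cityB) to cost lookup in the question, a simple home-grown format in the spirit of the last answer is a square double[][] cost matrix indexed by city number, plus a map from city names to indices. Primitive arrays avoid per-entry objects and boxing, and a TSP usually needs a cost for every pair anyway. CostMatrix is a hypothetical class, the costs are assumed to be symmetric, and the cities and numbers are made up for illustration.

```java
import java.util.*;

// Complete weighted graph stored as a cost matrix; cities are indices 0..n-1.
final class CostMatrix {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> names = new ArrayList<>();
    private final double[][] matrix;

    CostMatrix(List<String> cities) {
        for (String c : cities) { index.put(c, names.size()); names.add(c); }
        matrix = new double[cities.size()][cities.size()];
    }

    // Symmetric TSP assumed: travelling a->b costs the same as b->a.
    void setCost(String a, String b, double c) {
        int i = index.get(a), j = index.get(b);
        matrix[i][j] = c;
        matrix[j][i] = c;
    }

    // O(1) lookup of (cityA, cityB) ---> cost.
    double cost(String a, String b) { return matrix[index.get(a)][index.get(b)]; }

    // Total cost of visiting the cities in the given order and returning to the start.
    double tourCost(List<String> tour) {
        double total = 0;
        for (int k = 0; k < tour.size(); k++) {
            total += cost(tour.get(k), tour.get((k + 1) % tour.size()));
        }
        return total;
    }

    public static void main(String[] args) {
        CostMatrix m = new CostMatrix(List.of("A", "B", "C"));
        m.setCost("A", "B", 5);
        m.setCost("B", "C", 7);
        m.setCost("A", "C", 4);
        System.out.println(m.cost("C", "A"));                    // 4.0
        System.out.println(m.tourCost(List.of("A", "B", "C"))); // 16.0
    }
}
```

A matrix costs O(n^2) memory, which suits the dense, every-pair graphs of the TSP; for sparse graphs an adjacency list would be the usual choice.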

Java Graph Visualisation Library: Nodes with multiple connect points

Can anyone recommend a Java Graph Visualisation library in which graph nodes can be rendered with multiple connect points?
For example, suppose a graph node represents a processor that takes input from two sources and produces output. In a plain graph this would be drawn as a node with 3 edges. However, each edge clearly has a defined role in the workflow, so ideally my node would appear with 3 distinct connection points that the user could attach edges to.
I've taken a look at JUNG but I don't think it will suit my needs.
Any recommendations welcome; either on specific libraries or alternative approaches I can take.
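Whichever library ends up being used, it may help to model the connection points explicitly in the data model first: each node owns named input and output ports, and an edge always joins an output port to an input port, so the renderer only has to draw each port at a fixed position on the node. Direction, Port, Wire, ProcessorNode and PortDemo below are hypothetical names, not taken from any visualisation library.

```java
import java.util.*;

enum Direction { IN, OUT }

// Hypothetical named connection point on a node.
record Port(String node, String name, Direction direction) {}

// Hypothetical edge: must run from an OUT port to an IN port.
record Wire(Port from, Port to) {
    Wire {
        if (from.direction() != Direction.OUT || to.direction() != Direction.IN)
            throw new IllegalArgumentException("wires must go from an OUT port to an IN port");
    }
}

final class ProcessorNode {
    final String name;
    final Map<String, Port> ports = new LinkedHashMap<>(); // insertion order = drawing order

    ProcessorNode(String name, List<String> inputs, List<String> outputs) {
        this.name = name;
        for (String in : inputs) ports.put(in, new Port(name, in, Direction.IN));
        for (String out : outputs) ports.put(out, new Port(name, out, Direction.OUT));
    }

    Port port(String portName) {
        Port p = ports.get(portName);
        if (p == null) throw new NoSuchElementException(name + " has no port " + portName);
        return p;
    }
}

class PortDemo {
    public static void main(String[] args) {
        ProcessorNode source1 = new ProcessorNode("source1", List.of(), List.of("out"));
        ProcessorNode source2 = new ProcessorNode("source2", List.of(), List.of("out"));
        ProcessorNode merge = new ProcessorNode("merge", List.of("left", "right"), List.of("result"));

        List<Wire> wires = List.of(
                new Wire(source1.port("out"), merge.port("left")),
                new Wire(source2.port("out"), merge.port("right")));
        System.out.println(merge.ports.keySet()); // [left, right, result]
        System.out.println(wires.size());         // 2
    }
}
```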
You could try JGraph's Java library.
It has a good amount of functionality, and I have used it successfully before. The only drawback is that the documentation is a bit lacking, but if you read through some of the examples and code, it's pretty good once you get the hang of it.
Take a look at JGraph (http://www.jgraph.com/). I used jgraph-5.14.0.0 for a similar project before. Here are the graphs that I made for another project: https://github.com/eamocanu/spellcheck.graph/tree/master/graph%20photos
