Java collections over arangoDB

I am relatively new to ArangoDB. Is there a library that implements Java collections over ArangoDB,
i.e. one that stores the collection's values in an ArangoDB server and fetches them as and when needed? I am looking for something similar to Redisson (https://github.com/mrniko/redisson), which is implemented over Redis.

Sadly, it is not possible at the moment. But if you want to, you can modify the ArangoDB Java Driver (github.com/arangodb/arangodb-java-driver).
Everyone can contribute to the project, and if you need any help with the work, just ask the ArangoDB team.

It's important to note that Redis is an in-memory (but persistent-on-disk) database. This makes for screaming fast read/write operations, but at a high memory cost. ArangoDB, on the other hand, compromises some speed to limit the memory footprint, and does so quite well.
However, because of this difference, it does not necessarily make sense to do for ArangoDB what Redisson does for Redis - that is, expose its own Java Collection implementations which allow more direct interaction with the in-memory entities. You would most likely run into unwanted memory issues. Since memory optimization is an important (and nice!) feature of ArangoDB, I would avoid going down this path.
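To make "Java collections over a database" concrete, here is a minimal sketch of a java.util.Map whose entries live in an external store. The names KeyValueStore, InMemoryStore and StoreBackedMap are hypothetical, invented for this illustration; they are not part of the ArangoDB Java driver or of Redisson. The in-memory store only stands in for a real server so the sketch runs on its own.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical minimal storage interface; a real version would wrap database calls.
interface KeyValueStore {
    String get(String key);
    void put(String key, String value);
    void remove(String key);
    Set<String> keys();
}

// In-memory stand-in so the sketch runs without a database server.
class InMemoryStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();
    public String get(String key) { return data.get(key); }
    public void put(String key, String value) { data.put(key, value); }
    public void remove(String key) { data.remove(key); }
    public Set<String> keys() { return new HashSet<>(data.keySet()); }
}

// A Map view whose entries live in the store, not in the map object itself.
class StoreBackedMap extends AbstractMap<String, String> {
    private final KeyValueStore store;

    StoreBackedMap(KeyValueStore store) { this.store = store; }

    @Override
    public String get(Object key) {
        return key instanceof String ? store.get((String) key) : null;
    }

    @Override
    public String put(String key, String value) {
        String previous = store.get(key);
        store.put(key, value);
        return previous;
    }

    @Override
    public String remove(Object key) {
        if (!(key instanceof String)) return null;
        String previous = store.get((String) key);
        store.remove((String) key);
        return previous;
    }

    // Builds a snapshot by reading every key and value from the store.
    // With a remote database this is one round trip per entry, which is
    // the hidden cost of making a store look like a local collection.
    @Override
    public Set<Map.Entry<String, String>> entrySet() {
        Set<Map.Entry<String, String>> entries = new HashSet<>();
        for (String key : store.keys()) {
            entries.add(new AbstractMap.SimpleImmutableEntry<>(key, store.get(key)));
        }
        return entries;
    }
}
```

Calling code would simply use `Map<String, String> map = new StoreBackedMap(new InMemoryStore());` and never see the store. Note that size() and iteration go through entrySet(), so they touch every entry; that is the kind of cost the answer above warns about.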
That being said, there are newer Java libraries available to help you easily integrate with ArangoDB.
JNoSQL is a solid "JPA or ORM-like" framework written specifically for NoSQL databases. ArangoDB is one of the many databases it supports. It exposes convenient annotations and easily supports classic DAO/Repository patterns. There are some good code examples that'll help point you in the right direction.

Related

How to implement Data Access Layer using raw JDBC without using Object Relational Model (ORM)

I am working on a web application, to be written in Java using the Play Framework. This application will have high traffic, so performance will be a major concern. That concern is keeping me from using an object-relational mapper (ORM). While searching, I found that an ORM can be replaced architecturally by a Data Access Layer (DAL) that accesses the database via "raw" JDBC. However, since I am a novice, I don't understand what "raw" JDBC is. Is this similar to the one in this tutorial? Moreover, how can we implement a modular and manageable DAL using this pattern?

Your best bet would be to initially implement your program as quickly as possible (a lightweight object-relational mapper may be a good choice). Your aim is speed of creation, but make sure you "hide" elements of your program so that the use of an ORM or direct JDBC is not "known" by much of your program.
Once you have your program running, measure where your performance blockers are ... you may be surprised at what you find. Reducing the time to get "running" pays great dividends by focusing your improvement efforts on actual issues rather than "expected" ones.
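One way to "hide" the persistence choice is a small DAO interface, sketched below. "Raw" JDBC here simply means hand-written SQL sent through the standard java.sql API, with no ORM in between. The User class, the UserDao interface and the users table with its id and name columns are all assumptions made up for this illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import javax.sql.DataSource;

// Hypothetical domain object, used only for illustration.
final class User {
    final long id;
    final String name;
    User(long id, String name) { this.id = id; this.name = name; }
}

// The rest of the program depends only on this interface.
interface UserDao {
    Optional<User> findById(long id);
    void save(User user);
}

// "Raw" JDBC implementation: hand-written SQL, no ORM.
// Table and column names are assumed.
class JdbcUserDao implements UserDao {
    private final DataSource dataSource;

    JdbcUserDao(DataSource dataSource) { this.dataSource = dataSource; }

    @Override
    public Optional<User> findById(long id) {
        String sql = "SELECT id, name FROM users WHERE id = ?";
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next()
                        ? Optional.of(new User(rs.getLong("id"), rs.getString("name")))
                        : Optional.empty();
            }
        } catch (SQLException e) {
            throw new IllegalStateException("findById failed", e);
        }
    }

    @Override
    public void save(User user) {
        String sql = "INSERT INTO users (id, name) VALUES (?, ?)";
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement(sql)) {
            ps.setLong(1, user.id);
            ps.setString(2, user.name);
            ps.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException("save failed", e);
        }
    }
}

// In-memory implementation, handy for tests and early prototyping.
class InMemoryUserDao implements UserDao {
    private final Map<Long, User> users = new ConcurrentHashMap<>();

    @Override
    public Optional<User> findById(long id) {
        return Optional.ofNullable(users.get(id));
    }

    @Override
    public void save(User user) { users.put(user.id, user); }
}
```

Because callers only see UserDao, swapping JdbcUserDao for an ORM-backed implementation later (or the reverse, once measurements justify it) should not ripple through the rest of the application.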

Tools to do data processing from Java

I've got a legacy system that uses SAS to ingest raw data from the database, cleanse and consolidate it, and then score the outputted documents.
I want to move to a Java or similar object-oriented solution, so I can implement unit testing and generally get better code control. (I'm not talking about overhauling the whole system, but injecting Java where I can.)
In terms of data size, we're talking about around 1 TB of data being both ingested and created. In terms of scaling, this might increase by a factor of around 10, but isn't likely to increase on massive scale like a worldwide web project might.
The question is - what tools would be most appropriate for this kind of project?
Where would I find this information - what search terms should be used?
Is doing processing on an SQL database (creating and dropping tables, adding columns, as needed) an appropriate, or awful, solution?
I've had a quick look at Hadoop - but due to the small scale of this project, would Hadoop be an unnecessary complication?
Are there any Java packages that offer functionality similar to SAS or SQL in terms of merging, joining, sorting, and grouping datasets, as well as modifying data?

It's hard for me to prescribe exactly what you need given your problem statement.
It sounds like a good database API (i.e. native JDBC) with a good open source database backend might be all you need.
However, I think you should take some time to check out Lucene. It's a fantastic tool and may meet your scoring needs very well. Taking a search engine indexing approach to your problem may be fruitful.

I think the questions you need to ask yourself are:
What is the nature of your data set, and how often will it be updated?
What workload will you run on this 1 TB or more of data in the future? Will it be mainly offline read and analysis operations, or will there also be a lot of random write operations?
Here is an article discussing whether or not to choose Hadoop, which I think is worth reading.
Hadoop is a better choice if your data set is only updated daily or weekly and the major operations on the data are read-only, followed by further data analysis. For the merging, joining, sorting, and grouping operations you mentioned, Cascading is a Java library running on top of Hadoop that supports them well.
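For a slice of the data small enough to fit in memory, the JDK's own streams API already covers grouping and sorting, and that also makes the logic easy to unit test. The sketch below is only an illustration of that idea, not a substitute for Hadoop or Cascading at the 1 TB scale; the ScoreRow class and its region and score fields are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical row type standing in for one record of a dataset.
final class ScoreRow {
    final String region;
    final int score;
    ScoreRow(String region, int score) { this.region = region; this.score = score; }
}

class DatasetOps {
    // Roughly: SELECT region, SUM(score) FROM rows GROUP BY region ORDER BY region
    static Map<String, Integer> totalByRegion(List<ScoreRow> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> r.region,
                TreeMap::new,                          // sorted by region name
                Collectors.summingInt(r -> r.score)));
    }

    // Roughly: SELECT * FROM rows ORDER BY score DESC
    static List<ScoreRow> sortedByScoreDesc(List<ScoreRow> rows) {
        return rows.stream()
                .sorted(Comparator.comparingInt((ScoreRow r) -> r.score).reversed())
                .collect(Collectors.toList());
    }
}
```

Both methods are pure functions of their input list, so they can be tested with a handful of hand-built rows and no database at all.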

Best way to develop Java with a DB

I have experience using TopLink to translate objects to the database and vice versa. But this was all part of a JSP site, and now I have done some EJB work with it too. My question is: is it good to work with something like TopLink in a Java desktop application, or is it more common to use native SQL from Java?
Some experience from professional developers would be helpful. I need to develop a serious application for a client. I'm doing it in Java and I'm going to store the data in a database.
Thanks

ORM is nice if your data model is well structured, not overly complex and, most of all, if you have control over it.
Legacy databases or poorly modelled ones are harder to represent with an ORM, and doing so is strongly discouraged, as your application would add further complexity on top of what the model itself implies.
If you are comfortable with some ORM tool such as Hibernate and your database is fairly well done, go for it. They sure save you a lot of boilerplate code and have some nice query optimization code under the hood. Otherwise, you may want to use JDBC directly or some other framework to simplify JDBC use but still using plain SQL. For such situations I recommend MyBatis.

TopLink (and EclipseLink/JPA) work just as well in a desktop application as in a server side application. In fact TopLink has been around since the 90s with client-server Smalltalk apps before the server side was popular.

It depends on your use cases.
ORM technologies can nicely abstract away database specifics and allow you to concentrate on the domain model. However, there are circumstances where using an ORM layer is not appropriate (extremely large data sets can cause performance issues, for example; database schemas that are difficult to map to objects are another).
I would recommend using a JPA-compliant technology such as Hibernate. That way you're using an ORM that implements a Java standard, and you can more or less swap implementations in and out.
For everything else, JDBC is a flexible friend.

It depends on database volume too. For databases with huge amounts of data, try using Hibernate. It might be of great help compared with writing JDBC code.

Recommendations for an in memory database vs thread safe data structures

TLDR: What are the pros/cons of using an in-memory database vs locks and concurrent data structures?
I am currently working on an application that has many (possibly remote) displays that collect live data from multiple data sources and render it on screen in real time. One of the other developers has suggested using an in-memory database instead of doing it the standard way our other systems behave, which is to use concurrent hashmaps, queues, arrays, and other objects to store the graphical objects, handling them safely with locks if necessary. His argument is that the DB will lessen the need to worry about concurrency, since it will handle read/write locks automatically, and that the DB will offer an easier way to structure the data into as many tables as we need, instead of having to create hashmaps of hashmaps of lists, etc., and keep track of it all.
I do not have much DB experience myself, so I am asking fellow SO users what experiences they have had and what the pros and cons are of inserting the DB into the system.

Well a major con would be the mismatch between Java and a DB. That's a big headache if you don't need it. It would also be a lot slower for really simple access. On the other hand, the benefits would be transactions and persistence to the file system in case of a crash. Also, depending on your needs, it allows for querying in a way that might be difficult to do with a regular Java data structure.
For something in between, I would take a look at Neo4j. It is a pure Java graph database. This means that it is easily embeddable, handles concurrency and transactions, scales well, and does not have all of the mismatch problems that relational DBs have.
Update: If your data structure is simple enough - a map of lists, a map of maps, something like that - you can probably get away with either the concurrent collections in the JDK or Google Collections, but much beyond that, you will likely find yourself recreating an in-memory database. And if your query constraints are even remotely difficult, you're going to have to implement all of those facilities yourself. And then you'll have to make sure that they work concurrently, etc. If this requires any serious complexity or scale (large datasets), I would definitely not roll your own unless you really want to commit to it.
If you do decide to go with an embedded DB, there are quite a few choices. You might want to start by considering whether you want to go the SQL or the NoSQL route. Unless you see real benefits to going SQL, I think it would also greatly add to the complexity of your app. Hibernate is probably your easiest route with the least actual SQL, but it's still kind of a headache. I've done it with Derby without serious issues, but it's still not straightforward. You could try db4o, which is an object database that can be embedded and doesn't require mapping. This is a good overview. Like I said before, if it were me I would likely try Neo4j, but that could just be me wanting to play with new and shiny things ;) I just see it as being a very transparent library that makes sense. Hibernate/SQL and db4o just seem like too much hand waving to feel lightweight.

You could use something like Space4J and get the benefits of both a collections-like interface and an in-memory database. In practical terms, something as basic as a Collection is an in-memory database with no index. A List is an in-memory database with a single int index. A Map is an in-memory database with a single index on its key type, and no concurrency control unless it is synchronized or a java.util.concurrent.* implementation.
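As a minimal sketch of the "map of lists" case handled with JDK concurrent collections alone, the class below groups readings by source and needs no explicit locks. ReadingsBySource and its method names are hypothetical, chosen for this illustration; CopyOnWriteArrayList is assumed to be acceptable here, which holds when reads greatly outnumber writes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// A thread-safe "map of lists": readings grouped by the id of their source.
class ReadingsBySource {
    private final Map<String, List<Double>> readings = new ConcurrentHashMap<>();

    void add(String sourceId, double value) {
        // computeIfAbsent is atomic on ConcurrentHashMap, so two threads
        // cannot both create a list for the same source.
        readings.computeIfAbsent(sourceId, id -> new CopyOnWriteArrayList<>())
                .add(value);
    }

    // Returns a copy, so callers can iterate without seeing later writes.
    List<Double> snapshot(String sourceId) {
        List<Double> values = readings.get(sourceId);
        return values == null ? Collections.emptyList() : new ArrayList<>(values);
    }
}
```

This covers a single "index" (the source id). As soon as a second lookup path is needed, for example readings by time range, a second structure has to be kept consistent with the first by hand, and that is the point where an embedded database starts to earn its place.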

I once worked on a project that used Oracle TimesTen. This was back in early 2006, when Java 5 had only recently been released and the java.util.concurrent classes were barely known. The system we developed had reasonably big scalability and throughput requirements (it was one of the core telco boxes for SMS/MMS messaging).
Briefly speaking, the reasoning for TimesTen was fair: "let's outsource our concurrency/scalability problems to somebody else and focus on our business domain", and it made perfect sense then. But this was back in 2006. I don't think such a decision would be made today.
Concurrency is hard, but so is handling in-memory databases. To free yourself of concurrency problems, you'd have to become an expert in the in-memory database world. Fine-tuning TimesTen for replication is hard (we had to hire a professional consultant from Oracle to do this). Licenses don't come for free. You also need to worry about an additional layer which is not open source and/or might be written in a different language than the one you understand.
But it is really hard to make any judgement without knowing your experience, budget, time requirements, etc. Shop around, spend some time looking into decent concurrency frameworks (such as http://akkasource.org/) ...and let us know what you have decided ;)

Below are a few questions that could help with the decision.
Queries - do you need to query/reproject/aggregate your data in different forms?
Transactions - do you ever need to rollback added data?
Persistence - do you only need to present the gathered data or do you also need to store it in some way?
Scalability - will your data always fit in the memory?
Performance - how fast should it be?

It is unclear to me why you feel that an in memory database cannot be thread safe.
Why don't you look at JDO and DataNucleus? They support a lot of different datastores, and you get to plug in your back-end persistence provider at run time as a configuration step. Your application code is dependent on an ORM, but that ORM might be plugged into an RDBMS, db4o, NeoDatis, LDAP, etc. If one backend doesn't work for you, then switch to another.

Easy way to store and retrieve objects in Java without using a relational DB? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Do you know of an "easy" way to store and retrieve objects in Java without using a relational DB / ORM like Hibernate?
[Note that I am not considering serialization as-is for this purpose, as it won't allow retrieving arbitrary objects from the middle of an object graph. Neither am I considering DB4O because of its restrictive license. Thanks.]
"Easy" meaning: not having to handle low-level details such as key/value pairs to rebuild an object graph (as with BerkeleyDB or traditional caches). The same applies for rebuilding objects from a document- or column-oriented DB (CouchDB, HBase, ..., even Lucene).
Perhaps there are interesting projects out there that provide a layer of integration between the mentioned storage systems and the object model (like ORM would be for RDBMSs) that I am not aware of.
Anyone successfully using those in production, or experimenting with persistence strategies other than relational DBs? How about RDF stores?
Update: I came across a very interesting article: A list of distributed key-value stores

Object Serialization (aka storing things to a file)
Hibernate (uses a relational database but it is fairly transparent to the developer)
I would suggest Hibernate because it will deal with most of the ugly details that bog developers down when using a database while still allowing for the optimizations that have been made to database software over the years.

NeoDatis looks interesting. It is licensed under the LGPL, so not quite as restrictive as the GPL proper.
Check out their 1 minute tutorial to see if it will work for your needs.

I would like to recommend XStream which simply takes your POJOs and creates XML out of them so you can store it on disk. It is very easy to use and is also open source.

I'd recommend Hibernate (or, more generally, OR-mapping) like Matt, but there is also an RDBMS at the backend, and I'm not so sure what you mean by
"...without using a relational DB?..."
It would also be interesting to know more about the application, because OR-mapping is not always a good idea (development performance vs. runtime performance).
Edit: I recently learned about Terracotta, and there is a good Stack Overflow discussion here about replacing DBs with that tool. Still experimental, but worth reading.

I still think you should consider paying for db4o.
If you want something else, add "with an MIT-style license" to the title.

Check out the comments on Prevayler on this question. Prevayler is a transactional wrapper around object serialization - roughly, you use objects in plain Java and persist them to disk through a Java API without SQL, which is a bit neater than writing your own serialization.
Caveats - with serialization as a persistence mechanism, you run the risk of invalidating your saved data when you update the class. Even with a wrapper library you'll probably want to customize the serialization/deserialization handling. It also helps to declare serialVersionUID explicitly in the class, so that you, rather than the JVM's default computation, decide when a class change makes previously saved data unreadable.
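A minimal sketch of the serialVersionUID caveat, using only java.io: the Customer class and the SerializationDemo helper are hypothetical names made up for this illustration. With an explicit serialVersionUID, compatible changes such as adding a field do not invalidate data that was saved earlier; without it, the JVM derives the id from the class structure, and many edits to the class would make old data fail to load with an InvalidClassException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical persistent class, used only for illustration.
class Customer implements Serializable {
    // Declared explicitly so the id stays stable across compatible class changes.
    // Bump it by hand only when old saved data must be rejected.
    private static final long serialVersionUID = 1L;

    private final String name;

    Customer(String name) { this.name = name; }

    String name() { return name; }
}

class SerializationDemo {
    // Serializes an object to bytes; a file stream would work the same way.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
        return buffer.toByteArray();
    }

    // Reads the object back; fails if the stored serialVersionUID
    // does not match the one in the currently loaded class.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }
}
```

A round trip looks like `Customer copy = (Customer) SerializationDemo.fromBytes(SerializationDemo.toBytes(new Customer("Ada")));`.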

Hmm... without serialization, and without an ORM solution, I would fall back to some sort of XML based implementation? You'd still have to design it carefully if you want to pull out only some of the objects from the object graph - perhaps a different file for each object, where object relationships are referenced by a URI to another file?
I would have said that wasn't "easy" because I've always found designing the mapping of XML to objects to be somewhat time consuming, but I was really inspired by a conversation on Apache Betwixt that has me feeling hopeful that I'm just out of date, and easier solutions are now available.

Terracotta provides a highly available, highly scalable, persistent-to-disk object store. You can use it for just this feature alone - or you can use its breadth of features to implement a fully clustered application - your choice.
Terracotta:
does not break object identity giving you the most natural programming interface
does not require Serialization
clusters (and persists) nearly all Java classes (Maps, Locks, Queues, FutureTask, CyclicBarrier, and more)
persists objects to disk at memory speeds
moves only object deltas, giving very high performance
Here's a case study about how gnip uses Terracotta for in-memory persistence - no database. Gnip takes in all of the events on Facebook, Twitter, and the like and produces them for consumers in a normalized fashion. Their current solution is processing in excess of 50,000 messages / second.
It's OSS and has a high degree of integration with many other 3rd party frameworks including Spring and Hibernate.

I guess I have found a sort of answer to my question.
Getting into the document-oriented mindset is no easy task when you have always thought of your data in terms of relationships, normalization and joins.
CouchDB seems to fit the bill. It can still act as a key-value store, but its great querying capabilities (map/reduce, view collations), concurrency readiness and language-agnostic HTTP access make it my choice.
The only glitch is having to correctly define and map JSON structures to objects, but I'm confident I will come up with a simple solution for usage with relational models from Java and Scala (and worry about caching later on, as contention is moved away from the database). Terracotta could still be useful, but certainly not in the way it would be in an RDBMS scenario.
Thank you all for your input.
