I have recently started using the Jackcess library in Java for dealing with MS Access databases. The library is pretty good but I have a question regarding searching rows.
Suppose I have "Jack loves apples" in a row of a column named X. What code would I use to find all rows where X contains the word "apples"? I know this can be easily done using wildcards in SQL, but since there is no way to run SQL queries through Jackcess, that's not a valid option.
I considered using UCanAccess, but I have issues with the library: even if I use the "memory=false" option while loading the database, it still takes almost 1.4 GB of memory.
@centic's answer was accurate until Jackcess version 3.5.0. As of the 3.5.0 release, you can use the new PatternColumnPredicate class to do various wildcard/pattern/regex searches using Cursors.
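Roughly, that looks like the sketch below. I'm writing the IterableBuilder/PatternColumnPredicate calls from memory, so treat the exact method names as an assumption and check the 3.5.0 javadoc; the file, table, and column names are placeholders.

import java.io.File;
import com.healthmarketscience.jackcess.*;
import com.healthmarketscience.jackcess.util.PatternColumnPredicate;

public class PatternSearch {
    public static void main(String[] args) throws Exception {
        try (Database db = DatabaseBuilder.open(new File("my.accdb"))) {
            Table table = db.getTable("MyTable");
            Cursor cursor = table.getDefaultCursor();
            // Assumed 3.5.0 API: an SQL LIKE-style predicate on column X.
            for (Row row : cursor.newIterable()
                    .setMatchPattern("X", PatternColumnPredicate.forSqlLike("%apples%"))) {
                System.out.println(row.getString("X"));
            }
        }
    }
}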
With Jackcess you need to iterate the rows and apply the filter yourself. As long as your filter is fairly static, this should be fairly easy to build.
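For older Jackcess versions, a minimal sketch of that iterate-and-filter approach could look like this (file, table, and column names are placeholders):

import java.io.File;
import com.healthmarketscience.jackcess.*;

public class ContainsSearch {
    public static void main(String[] args) throws Exception {
        try (Database db = DatabaseBuilder.open(new File("my.accdb"))) {
            Table table = db.getTable("MyTable");
            for (Row row : table) {                  // Table is Iterable<Row>
                String x = row.getString("X");
                if (x != null && x.toLowerCase().contains("apples")) {
                    System.out.println(row);         // matching row
                }
            }
        }
    }
}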
I'm trying to convert some Oracle queries using PL/SQL to jOOQ. Most package queries, stored procedures, etc. are easy, using the code generator. However there's one feature used in several places for which I haven't found a jOOQ alternative:
begin
MY_SCHEMA.MY_PACKAGE.MY_QUERY(some_param => MY_SCHEMA.MY_PACKAGE.SOME_CONSTANT);
-- more code
end;
I can call the query just fine, but I'm not sure how to pass the MY_SCHEMA.MY_PACKAGE.SOME_CONSTANT value into it. The jOOQ code generator doesn't seem to generate anything for the constant (at least, I can't find anything similarly named). Do I need to enable a feature on the generator? Or do I need to query those constants? If so, how?
Enabling PL/Scope for this to work
jOOQ can generate code for your package constants if it can find them in your ALL_IDENTIFIERS dictionary view. That's only the case if you enable PLSCOPE_SETTINGS when compiling your packages, e.g. using:
ALTER SESSION SET PLSCOPE_SETTINGS='IDENTIFIERS:ALL'
With that in place, jOOQ will generate expressions for your package constants, which you can use in routine calls, or other procedural logic.
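As a rough illustration, a call could then look like the sketch below. All generated names here (the target package, the MyPackage class, the myQuery method, the SOME_CONSTANT field) are hypothetical; the real names depend on your schema and generator configuration.

import org.jooq.Configuration;
// Hypothetical generated class; adjust to whatever your generator actually
// produces for MY_SCHEMA.MY_PACKAGE.
import com.example.generated.my_schema.packages.MyPackage;

public class CallWithPackageConstant {
    public static void run(Configuration configuration) {
        // The generated constant expression is passed like any other parameter value.
        MyPackage.myQuery(configuration, MyPackage.SOME_CONSTANT);
    }
}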
PL/Scope independence
What Simon Martinelli referred to in the comments is issue https://github.com/jOOQ/jOOQ/issues/6504, which aims to enable this code generation support even without the above PL/Scope setting turned on, because relying on PL/Scope can be unreliable depending on your environment.
As of jOOQ 3.15, there's no solution yet that works on any Oracle environment. But you could use testcontainers to generate your jOOQ code from a Docker image that has PL/Scope enabled.
I need to handle a big CSV file with around 750,000+ rows of data. Each line has around 1000+ characters and ~50 columns, and I am really not sure what's the best (or at least a good and sufficient) way to handle and manipulate this kind of data.
I need to do the following steps:
Compare the values of two columns and write the result to a new column (this one seems easy)
Compare values of two lines and do stuff (e.g. delete if one value is duplicated.)
Compare values of two different files.
My problem is that this is currently done with PHP and/or Excel; the limits are nearly exceeded, it takes a long time to process, and it will no longer be possible when the files get even bigger.
I have 3 different possibilities in mind:
Use MySQL, create a table (or two) and do the comparing, adding or deleting part. (I am not really familiar with SQL and would have to learn it; also, it should happen automatically, so there is the problem that you can't create tables from CSV files automatically.)
Use Java, creating objects in an ArrayList or LinkedList, and do "the stuff" (the operations would be easy, but handling that much data will probably be the problem).
(Is it even possible to hold that much data in Java, or does it crash? Is there a good tool for it, etc.?)
Use Clojure along with MongoDB: add the data from the CSV files to MongoDB and read it back using Mongo.
(Name additional possibilities if you have another idea.)
All in all, I am not a pro in any of these, but I would like to solve this problem and get some hints, or even your opinion.
Thanks in advance
Since we work a lot with huge CSV files in our company, here are some ideas:
Because these files are, in our case, always exported from some other relational database, we always use PostgreSQL, MySQL or golang + SQLite so we can run plain simple SQL queries, which are the simplest and most reliable solution in these cases.
The number of rows you describe is quite low from the point of view of any of these databases, so do not worry.
All of them have a native solution for CSV import/export, which works much quicker than anything written manually.
For repeated standard checks I use golang + SQLite with a :memory: database; this is definitely the quickest solution (a Java/JDBC equivalent is sketched below, since the question also considers Java).
MySQL is definitely very good and quick for the checks you described, but the choice of database also depends on how sophisticated an analysis you need to do later. For example, MySQL up to 5.7 still does not have window functions, which you might need, so consider using PostgreSQL in some cases too...
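Here is the Java/JDBC sketch of the same :memory: SQLite idea mentioned above. It assumes the xerial sqlite-jdbc driver on the classpath, and the table and column names are made up; the CSV loading step is only indicated.

import java.sql.*;

public class CsvCheckExample {
    public static void main(String[] args) throws Exception {
        // In-memory SQLite database; requires org.xerial:sqlite-jdbc on the classpath.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:");
             Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE data (id TEXT, col_a TEXT, col_b TEXT)");
            // ... load the CSV rows here (batch INSERTs, or the sqlite3 CLI's .import) ...

            // Step 1 from the question: compare two columns into a new column.
            st.execute("ALTER TABLE data ADD COLUMN a_equals_b INTEGER");
            st.execute("UPDATE data SET a_equals_b = (col_a = col_b)");

            // Step 2: find values duplicated across rows.
            try (ResultSet rs = st.executeQuery(
                    "SELECT col_a, COUNT(*) FROM data GROUP BY col_a HAVING COUNT(*) > 1")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " occurs " + rs.getInt(2) + " times");
                }
            }
        }
    }
}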
I normally use PostgreSQL for this kind of task. PostgreSQL COPY allows importing CSV data easily. Then you get a table with your CSV data and the power of SQL (and a reasonable database) to do basically anything you want with the data.
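If you need to drive the import from Java rather than psql (while still doing the actual processing in SQL), the PostgreSQL JDBC driver exposes COPY through its CopyManager API. A rough sketch, with placeholder connection details, table name, and CSV options:

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CopyCsvExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "password")) {
            CopyManager copyManager = new CopyManager((BaseConnection) conn);
            try (FileReader reader = new FileReader("data.csv")) {
                // Streams the CSV straight into the table via COPY.
                long rows = copyManager.copyIn(
                        "COPY my_table FROM STDIN WITH (FORMAT csv, HEADER true)", reader);
                System.out.println("Imported " + rows + " rows");
            }
        }
    }
}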
I am pretty sure MySQL has similar capabilities for importing CSV; I just generally prefer PostgreSQL.
I would not use Java for the CSV processing itself. It would be too much code and, unless you take care of indices, the processing will not be performant. An SQL database is much better equipped for tabular data processing (which should not be a surprise).
I wouldn't use MongoDB, my impression is that it is less powerful in update operations compared to an SQL database. But this is just an opinion, take it with a grain of salt.
You should try Python with the pandas package. On a machine with enough memory (say 16GB) it should be able to handle your CSV files with ease. The main thing is - anyone with some experience with pandas will be able to develop a quick script for you and tell you in a few minutes if your job is doable or not. To get you started:
import pandas
df = pandas.read_csv('filename.csv')
You might need to specify the column types if you run into memory issues.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
I'd suggest using Spark. Even on a standalone machine the performance is impressive. You can use Scala or Python to handle your data. It's flexible, and you can do processing that would be very hard to do in plain Java or in a relational database.
The other choices are also good, but I'd consider Spark for all analytics needs from now on.
I'm looking for easy-to-use graph DB + ORM solution. The requirements are:
Fluent Java interfaces, no need to use any XMLs.
Ease of graph traversal: "give me all entities of these types, starting from this one, traverse only using this set of relation types".
Full-text search out of the box: point 2, plus "only consider entities where this field contains this text"
No need to operate on graph level: Neo4j is great, but I'd like to avoid working with setProperty/getProperty directly.
I've already checked these:
ogrm - not supported anymore.
jo4neo - looks like it doesn't cover points 2 and 3
Spring Data Graph - seems to be a great thing, but it's too immature; I spent a week trying to make it work properly in Eclipse, with no success.
Are there any other similar tools I need to check?
Spring Data Graph is the most actively developed, with a recently released version 1.1.0 and lots of work planned before SpringOne in October.
However, it does create a challenge for IDEs because of the AspectJ-enhanced POJOs. Have a look at the documentation for some help getting that going.
Cheers,
Andreas
As of January 2015, Hibernate has started supporting Neo4j:
http://hibernate.org/ogm/
Obviously, you can't query using HQL, but they support Cypher queries.
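As a rough sketch of what that looks like, assuming a persistence unit configured for Hibernate OGM's Neo4j backend (the persistence unit name, entity, and property names are made up), a native Cypher query can be run through the standard JPA API:

import java.util.List;
import javax.persistence.*;

@Entity
class Person {                 // minimal mapped entity, just for the sketch
    @Id Long id;
    String name;
}

public class CypherQueryExample {
    public static void main(String[] args) {
        // Assumes a persistence unit named "ogm-neo4j" configured for Hibernate OGM with Neo4j.
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("ogm-neo4j");
        EntityManager em = emf.createEntityManager();

        // A native Cypher query, mapped back onto the JPA entity.
        @SuppressWarnings("unchecked")
        List<Person> people = em.createNativeQuery(
                "MATCH (p:Person) WHERE p.name = 'Jack' RETURN p", Person.class)
                .getResultList();
        for (Person p : people) {
            System.out.println(p.name);
        }

        em.close();
        emf.close();
    }
}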
There is also the very new spring-data-gremlin, which does everything you want with the power of Spring Data.
It also allows native queries, spatial indexes and a bunch of other cool stuff.
Note: It is quite immature, but still worth a look.
I have a query which does ILIKE on some 11 string or text fields of a table which is not big (500,000 rows), but obviously too big for ILIKE: the search query takes around 20 seconds. The database is Postgres 8.4.
I need to implement this search to be much faster.
What came to my mind:
I made an additional TSVECTOR column assembled from all the columns that need to be searched, and created a full-text index on it. The full-text search was quite fast. But... I cannot map this TSVECTOR type in my .hbm files, so this idea fell through (in any case, I thought of it more as a temporary solution).
Hibernate Search. (I heard about it for the first time today.) It seems promising, but I need an experienced opinion on it, since I don't want to get into a new API, possibly not the simplest one, for something which could be done in a simpler way.
Lucene
In any case, this has happened now with this table, but I would like the solution to be more generic and applicable to future cases related to full-text search.
All advice appreciated!
Thanx
I would strongly recommend Hibernate Search, which provides a very easy-to-use bridge between Hibernate and Lucene. Remember, you will be using both here. You simply annotate the properties on your domain classes which you wish to be able to search over. Then, when you update/insert/delete an entity which is enabled for searching, Hibernate Search simply updates the relevant indexes. This only happens if the transaction in which the database changes occur is committed, i.e. if it's rolled back, the indexes will not be broken.
So to answer your questions:
Yes, you can index specific columns on specific tables. You also have the ability to tokenize the contents of a field so that you can match on parts of the field.
It's not hard to use at all: you simply work out which properties you wish to search on, tell Hibernate where to keep its indexes, and then use the EntityManager/Session interfaces to load the entities you have searched for.
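To make that concrete, here's a minimal sketch in the Hibernate Search 4/5-era annotation and query-DSL style. The entity, its fields, and the searched text are made up for illustration.

import java.util.List;
import javax.persistence.*;
import org.apache.lucene.search.Query;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.jpa.FullTextEntityManager;
import org.hibernate.search.jpa.Search;
import org.hibernate.search.query.dsl.QueryBuilder;

@Entity
@Indexed                       // this entity gets its own Lucene index
class Product {
    @Id @GeneratedValue Long id;
    @Field String name;        // tokenized and indexed
    @Field String description; // tokenized and indexed
}

public class FullTextExample {
    @SuppressWarnings("unchecked")
    static List<Product> search(EntityManager em, String text) {
        FullTextEntityManager ftem = Search.getFullTextEntityManager(em);
        QueryBuilder qb = ftem.getSearchFactory()
                .buildQueryBuilder().forEntity(Product.class).get();
        Query luceneQuery = qb.keyword()
                .onFields("name", "description")
                .matching(text)
                .createQuery();
        // Returns managed Product entities loaded back through Hibernate.
        return ftem.createFullTextQuery(luceneQuery, Product.class).getResultList();
    }
}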
Since you're already using Hibernate and Lucene, Hibernate Search is an excellent choice.
What Hibernate Search primarily provides is a mechanism to have your Lucene indexes updated when data is changed, and the ability to leverage what you already know about Hibernate to simplify your searches against the Lucene indexes.
You'll be able to specify which fields in each entity you want indexed, as well as add multiple types of indexes as needed (e.g., stemmed and full text). You'll also be able to manage the indexed object graph for associations, so you can make fairly complex queries through Search/Lucene.
I have found that it's best to rely on Hibernate Search for the text heavy searches, but revert to plain old Hibernate for more traditional searching and for hydrating complex object graphs for result display.
I recommend Compass. It's an open-source project built on top of Lucene that provides a simpler API (than Lucene). It integrates nicely with many common Java libraries and frameworks such as Spring and Hibernate.
I have used Lucene in the past to index database tables. The solution works great, but remember that you need to maintain the index. Either you update the index every time your objects are persisted, or you have a daemon indexer that dumps the database tables into your Lucene index.
Have you considered Solr? It's built on top of Lucene and offers automatic indexing from a DB and a REST API.
A year ago I would have recommended Compass. It was good at what it did, and technically it still happily runs along in the application I developed and maintain.
However, there's no more development on Compass, with efforts having switched to ElasticSearch. From that project's website I cannot quite determine if it's ready for the Big Time yet or even actually alive.
So I'm switching to Hibernate Search which doesn't give me that good a feeling but that migration is still in its initial stages, so I'll reserve judgement for a while longer.
All of these projects are based on Lucene. If you want to implement very advanced features, I advise you to use Lucene directly. If not, you may use Solr, which is a powerful API on top of Lucene that can help you index and search a DB.
During a lecture my professor gave examples of several actions involving databases and the java.sql package. These examples were supposed to be uploaded online in a PDF file, but for some reason the names of all the functions and classes aren't displaying in my PDF reader.
I would like to know the equivalents of the following PHP functions in Java:
mysql_connect
mysql_query
mysql_fetch_row
mysql_fetch_assoc
mysql_close
Thanks!
If you consult the Java API docs appropriate for the version you're using (I'm using JDK 1.5, so it's http://java.sun.com/j2se/1.5.0/docs/api/) and click on java.sql, you can see all the classes for Java JDBC access.
Basically, you create a new Connection to a database with DriverManager, and do a query with Connection.prepareStatement, PreparedStatement.execute() and PreparedStatement.executeQuery() and loop through the resultant ResultSet with ResultSet.next() and pull the results out with ResultSet.getXXXXX.
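Roughly, that whole flow maps onto the PHP functions above like this (the connection URL, table, and column names are just placeholders):

import java.sql.*;

public class JdbcExample {
    public static void main(String[] args) throws SQLException {
        // mysql_connect(...)
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
             // mysql_query(...), with a parameter placeholder instead of string concatenation
             PreparedStatement ps = conn.prepareStatement(
                "SELECT id, name FROM users WHERE name = ?")) {

            ps.setString(1, "Jack");
            try (ResultSet rs = ps.executeQuery()) {
                // mysql_fetch_row / mysql_fetch_assoc: loop with next(), read by index or name
                while (rs.next()) {
                    int id = rs.getInt("id");
                    String name = rs.getString("name");
                    System.out.println(id + ": " + name);
                }
            }
        } // mysql_close: try-with-resources closes everything automatically
    }
}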
If you're just getting started with JDBC, consider working your way through Sun's tutorial at: http://java.sun.com/docs/books/tutorial/jdbc/basics/
Working directly with JDBC (java.sql) is verbose and error-prone, especially for beginners, because you need to manually do very repetitive steps, and "finally" close so many database objects (Connections, Statements, ResultSets).
If you do not mind pulling in an extra dependency, Apache Commons has a nice little wrapper library called DbUtils that makes it easy to run queries and updates (while still staying at the SQL level, as opposed to object-relational mappers, which go to a higher level of abstraction).
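For illustration, a query with DbUtils' QueryRunner looks roughly like this (the DataSource setup, table, and column names are placeholders):

import java.sql.SQLException;
import java.util.List;
import java.util.Map;
import javax.sql.DataSource;
import org.apache.commons.dbutils.QueryRunner;
import org.apache.commons.dbutils.handlers.MapListHandler;

public class DbUtilsExample {
    static void printUsers(DataSource dataSource) throws SQLException {
        QueryRunner runner = new QueryRunner(dataSource);
        // One call runs the query, walks the ResultSet, maps each row to a Map,
        // and closes the Connection/Statement/ResultSet for you.
        List<Map<String, Object>> rows = runner.query(
                "SELECT id, name FROM users WHERE name = ?",
                new MapListHandler(),
                "Jack");
        for (Map<String, Object> row : rows) {
            System.out.println(row.get("id") + ": " + row.get("name"));
        }
    }
}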