Disable primary key index in MongoDB and enable it after bulk insterting? - java

Is it possible to disable primary key index in MongoDB and enable it after bulk inserting?
The reason is performance during heavy bulk insert.
I'm using JAVA.

No, you cannot drop _id index. See official documentation:
db.collection.dropIndex(index)
Drops or removes the specified index from a collection. The db.collection.dropIndex() method provides a wrapper around the dropIndexes command.
Note
You cannot drop the default index on the _id field.

Apart from what zero323 has mentioned, it is not always a good idea to disable/enable idexes on the already populated collection even if you are doing bulk inserts.
The reasons that in the beginning you have to remove all the indexes that were previously there, insert the data and then redo indexes for previous data and also for the data which was just added with bulk insert.

Related

Oracle: Result set in insertion order

In the Oracle database, the select statement, select * from tablename, does not give output in the order of insertion. In few articles, we have found that the Oracle database stores the row information based on Rowid.
We are using Oracle in a web application based on Java and there is a requirement to display the data in order of insertion in each module. So, applying an order by clause on each table is not feasible and can degrade the application's performance.
Is there any other way that the select statement returns data in insertion order?
Oracle version used is "Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production Version 19.3.0.0.0"
Oracle is a relational database. In it, rows don't have any particular order which means that select statement might return result in different order when you run it several times. Usually it doesn't, but - if there are a lot of inserts/deletes - sooner or later you'll notice such a behavior. Therefore, the only certain way to return rows in desired order is to use - ta-daaa! - order by clause.
Also, you'll have to maintain your own order of insertion. A simple way to do that is to use a column whose source is a sequence.
I would check here first: previous-post
I would recommend not relying on any ordering unless you specify order by.
But what is the deficiency of adding something like ORDER BY ROWNUM ASC; to your queries? You can trim your result sets ( paginate ) or even only do it to the entities you want to 'maintain insertion order'.
Are you using anything for entity management? Hibernate has some defaults you could use as well. - Post some code examples and can provide additional help.

Is it possible in Hibernate Search to predefined Query or conditions before the indexing process?

I am using MassIndexer with #Indexed interceptor and it works just fine I am able to filter the entities. but the problem is that I have thousands of soft-deleted records, I don't want these objects to be in the indexing process since they are not important anymore.
so, is it possible in Hibernate Search to predefined query or conditions before the indexing process?
You can't do indexing with predefined HQL. Rather you can intercept the indexing process and instruct indexer whether it should index, skip or remove index for entity.
Please refer to Conditional Indexing topic in Reference Guilde.
Under your conditions: when > 95% of data is not to be indexed I would suggest the following:
Consider manual reindexing by running query and pushing items to index as described in Manual index changes
Consider splitting full data table and only active data table. This is a bit of data duplication but should give you considerable performance gains when working with active records only.

Most efficient way to determine if a row EXISTS and INSERT into MySQL using java JDBC

I'm looking at trying to query a table in a MySQL database (I have the primary key, which is comprised of two categories, a name and a number but string comparison), such that this table could have anywhere from very few rows to upwards of hundreds of millions. Now, for efficiency, I'm not exactly sure how costly it is to actually do an INSERT query but I have a few options as to go about it:
I could query the database to see if the element EXISTS and then call an INSERT query if it doesn't.
I could try to brute force INSERT into the database and if it succeeds or fails, so be it.
I could initially on program execution, create a cache/store, grab the primary key columns and store them in a Map<String, List<Integer>> and then search the key for if the name exists, then if it does, does the key and value combination in the List<Integer> exists, if it doesn't, then INSERT query the database.
?
Option one really isn't on the table for what I would really implement, just on the list of possible choices. Option two would most likely average better for unique occurrences such that it isn't in the table already. Option three would favour if common occurrences are the case such that a lot are in the cache.
Bearing in mind that option chosen will be iterated over potentially millions of times. Memory usage aside (From option 3), from my calculations it's nothing significant in respect to the capacity available.
Let the database do the work.
You should do the second method. If you don't want to get a failure, you can use on duplicate key update:
insert into t(pk1, pk2, . . . )
values ( . . . )
on duplicate key update set pk1 = values(pk1);
The only purpose of on duplicate key update is to do nothing useful but not return an error.
Why is this the best solution? In a database, a primary key (or columns declared unique) have an index structure. This is efficient for the database to use.
Second, this requires only one round-trip to the database.
Third, there are no race conditions, if you have multiple threads or applications that might be attempting to insert the same record(s).
Fourth, the method with on duplicate key update will work for inserting multiple rows at once. (Without on duplicate key insert, then a multi-value statement would fail if a single row is duplicated.) Combining multiple inserts into a single statement can be another big efficiency.
Your second option is really the right way to go.
Rather than fetching all your result in the third option , you could try using Limit 1 , given the fact that the combination of your name and number form a primary key thus , using limit 1 to fetch the result and then if the result is empty then you can probably insert your desired data. It would lot faster that way.
MySQL has a neat way to perform an special insertion. The INSERT ON DUPLICATE KEY UPDATE is a MySQL extension to the INSERT statement. If you specify the ON DUPLICATE KEY UPDATE option in the INSERT statement and the new row causes a duplicate value in the UNIQUE or PRIMARY KEY index, MySQL performs an update to the old row based on the new values:
INSERT INTO table(column_list)
VALUES(value_list)
ON DUPLICATE KEY UPDATE column_1 = new_value_1, column_2 = new_value_2;

How does Hibernate Search (Lucene indexing) work?

I am using Hibernate Search built on top of Lucene indexing. If indexes are created against database table the performance will be good in returning the results.
My question is, once indexes are created, if we query for the results does Hibernate Search fetch results from the original database table using the created indexes? or does it not need to hit the database to fetch the results?
Thanks!
Unless you use Projections the indexes are used only to identify the set of primary keys matching the query, these are then used to load the entities from the Database.
There are many good reasons for this:
As you pointed out, we don't store all data in the index: a larger index is a slower index
Adding all needed metadata to the index would make indexing a very expensive operation
Value extraction from the index is not efficient at all: it's good at queries, no more
Relational databases are very good at loading data by primary key
If you DB isn't good enough, second level cache is excellent to load by primary key
By loading from the DB we guarantee consistency especially with async indexing
By loading from the DB you have entities participate in Transactions and isolation
That said, if you don't need fully managed entities you can use Projections to load the fields you annotated as Stored.YES. A common pattern is to provide preview of matches using projections, and then when the user clicks for details to load the full entity matching that result.
By default, every time an object is inserted, updated or deleted through Hibernate, Hibernate Search updates the according Lucene index as per documentation
Hence, the further searches will yeild the data through lucene indexes only.
Another Question explaining how Indexes work

Generate encoding String according to creation order

I need to generate encoding String for each item I inserted into the database. for example:
x00001 for the first item
x00002 for the sencond item
x00003 for the third item
The way I chose to do this is counting the rows. Before I insert the third item, I count against the database, I know there're already 2 rows, so the next encoding is ended with 3.
But there is a problem. If I delete the second item, the forth item will not be the x00004,but x00003.
I can add additional columns to table, to store the next encoding, I don't know if there's other better solutions ?
Most databases support some sort of auto incrementing identity field. This field is normally also setup to be unique, so duplicate ids do not occur.
Consult your database documentation to see how it is done in your database and use that - don't reinvent the wheel when you have a good mechanism in place already.
What you want is SELECT MAX(id) or SELECT MAX(some_function(id)) inside the transaction.
As suggested in Oded's answer a lot of databases have their own methods of providing sequences which are more efficient and depending on the DBMS might support non numeric ids.
Also you could have id broken down into Y and 00001 as separate columns and having both columns make up primary key; then most databases would be able to provide the sequence.
However this leads to the question if your primary key should have a meaning or not; Y suggest that there is some meaning in the part of the key (otherwise you would be content with a plain integer id).

Categories

Resources