Spring Data Elasticsearch bulk index and delete - Java

I'm new to the community, so I apologise if I do something wrong.
I'm using Spring Data Elasticsearch (2.0.4/2.4)
and I would like to do a bulk insert and delete.
But ElasticsearchTemplate only contains a method bulkIndex:
@Override
public void bulkIndex(List<IndexQuery> queries) {
    BulkRequestBuilder bulkRequest = client.prepareBulk();
    for (IndexQuery query : queries) {
        bulkRequest.add(prepareIndex(query));
    }
    BulkResponse bulkResponse = bulkRequest.execute().actionGet();
    if (bulkResponse.hasFailures()) {
        Map<String, String> failedDocuments = new HashMap<String, String>();
        for (BulkItemResponse item : bulkResponse.getItems()) {
            if (item.isFailed())
                failedDocuments.put(item.getId(), item.getFailureMessage());
        }
        throw new ElasticsearchException(
                "Bulk indexing has failures. Use ElasticsearchException.getFailedDocuments() for detailed messages ["
                        + failedDocuments + "]", failedDocuments);
    }
}
So I have created a bulk method to handle both, but I can't access the prepareIndex method, which is private.
Are you aware of any way to index and delete documents in a single bulk request, or should I use reflection to change the visibility of the prepareIndex method?
Or is there an easy way to create an IndexRequest from a model/POJO?

Not sure which versions you mean with
(2.0.4/2.4)
Currently there is no support for bulk deletes, and no way to combine different operations like index/update in one request.
Can you file an issue in Jira to add support for bulk deletes and for combining different operations in one call? Though this won't make it into the next release, I'm afraid.
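In the meantime, a possible workaround (a sketch only, not an official API): in 2.x, ElasticsearchTemplate exposes the underlying Client via getClient(), so you can assemble a mixed bulk request yourself. The index/type names, the Entity POJO, and the Jackson ObjectMapper serialization below are illustrative assumptions:

public void bulkIndexAndDelete(List<Entity> toIndex, List<String> toDelete) throws JsonProcessingException {
    Client client = elasticsearchTemplate.getClient();
    ObjectMapper mapper = new ObjectMapper();
    BulkRequestBuilder bulk = client.prepareBulk();
    for (Entity e : toIndex) {
        // serialize the POJO ourselves instead of calling the private prepareIndex()
        bulk.add(client.prepareIndex("my_index", "my_type", e.getId())
                .setSource(mapper.writeValueAsString(e)));
    }
    for (String id : toDelete) {
        bulk.add(client.prepareDelete("my_index", "my_type", id));
    }
    BulkResponse response = bulk.execute().actionGet();
    if (response.hasFailures()) {
        throw new IllegalStateException(response.buildFailureMessage());
    }
}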

Related

Spring PagingAndSortingRepository delete entry during processing

I'm using Spring's PagingAndSortingRepository to paginate database entries.
During processing I need to delete some entries.
When I call the repository to delete, the entry is deleted, but then there is a problem with the next page: I don't get the expected number of elements from the next Pageable (pageRequest.next()).
Is there any way to iterate with pagination and perform CRUD operations at the same time?
Part of the code
while (!onePage.isEmpty()) {
    while (pageIterator.hasNext()) {
        MyEntity nextElement = pageIterator.next(); // the repository's entity type
        if (!falseCondition) {
            log.info("sending message with Id {}", nextElement.getId());
            repository.deleteById(nextElement.getId());
        } else {
            log.info("Lost connection");
            return;
        }
    }
    pageRequest = pageRequest.next();
    onePage = repository.findAll(pageRequest);
    pageIterator = onePage.iterator();
}
Many thanks.
As @ruba pointed out in the example, it is not a Hibernate issue. Even if you used the JDBC API directly, you would have to handle this situation. I can propose a solution:
You can implement a custom spring-data-jpa repository method where the service passes the pageRequest, but you translate it into an offset and a limit. So instead of calling pageRequest.next(), you do the following, which takes into account the items deleted in the current page.
long nextPageNumber = pageRequest.getPageNumber() + 1;
long nextOffset = nextPageNumber * pageRequest.getPageSize()
        - itemsDeletedInCurrentPage;
int limit = pageRequest.getPageSize();
List<Item> itemsInNextPage = em.createQuery(query)
        .setFirstResult((int) nextOffset)
        .setMaxResults(limit)
        .getResultList();
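An alternative that avoids the offset arithmetic entirely (a sketch, and only valid when every element you process is also deleted): each deletion shifts the remaining rows forward, so the "next" page is always the current first page, and you can keep requesting page 0 until it comes back empty. If only some elements are deleted, stick with the offset correction above.

// Sketch: re-request the first page until nothing is left.
// PageRequest.of(...) is Spring Data 2.x; older versions use new PageRequest(...).
Pageable firstPage = PageRequest.of(0, pageSize);
Page<MyEntity> onePage = repository.findAll(firstPage);
while (!onePage.isEmpty()) {
    for (MyEntity element : onePage) {
        repository.deleteById(element.getId());
    }
    onePage = repository.findAll(firstPage);
}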

Extract Object Data into usable format

I am a real newbie so go easy on me and my terminology, I am still learning!
I have a Backendless database I would like to show in my app.
I have successfully connected it to my Android Studio app, queried it and returned the data in the following method:
Backendless.Data.of( "database" ).find( queryBuilder, new AsyncCallback<List<Map>>(){ public void handleResponse( List<Map> response ){
The narrative on the Backendless SDK says "the "response" object is a collection of java.util.Map objects"
I then used an iterator:
Iterator itr = response.iterator();
And a while loop to 'get' the object:
Object element = itr.next();
I am happy up until this point, the next step is to extract the useful data from element.
I have tried many options, but the only one I have working is element.toString(), then using various methods to pick out what I want. This seems so inefficient that I thought I would ask the experts for a better option!
Your question is rather about working with the Java Map interface, so I'd advise you to look into its documentation and maybe some tutorials on the topic.
As to your Backendless question, it looks like you got the request part right. Here is the extended example from the docs, which shows you how to retrieve the object fields:
Backendless.Persistence.of( "Contact" ).find( new AsyncCallback<List<Map<String, Object>>>() {
    @Override
    public void handleResponse( List<Map<String, Object>> foundContacts )
    {
        Iterator<Map<String, Object>> contactsIterator = foundContacts.iterator();
        while( contactsIterator.hasNext() )
        {
            Map<String, Object> contact = contactsIterator.next();
            String name = (String) contact.get( "name" ); // in case you have a STRING field 'name' in your Backendless database
            Integer age = (Integer) contact.get( "age" ); // in case you have an INT field 'age' in your Backendless database
            // etc.
        }
    }

    @Override
    public void handleFault( BackendlessFault fault )
    {
        System.err.println( "Failed find: " + fault );
    }
});
As you may see, the main concern is to retrieve a Map instead of an Object from the response List.
Also, your question would be more useful with code samples of what you tried and maybe a direct link to the docs you used as an example.
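As a side note, the Iterator boilerplate can be replaced with an enhanced for loop; a minimal sketch, assuming the same 'name' and 'age' columns as above:

for (Map<String, Object> contact : foundContacts) {
    String name = (String) contact.get("name");
    Integer age = (Integer) contact.get("age");
    // work with name and age here
}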

How Do You Check For DELETE/UPDATE Without A WHERE Clause

I currently have a listener that we use to do a few different monitoring-type activities (like log a warning if a query takes more than 5 seconds), but it also watches for and kills "silly bugs" -- especially UPDATE and DELETE queries that are missing a WHERE clause.
In the past we did the following (note that we are using com.foundationdb.sql):
/**
 * Hook into the query execution lifecycle before rendering queries. We are checking for silly mistakes,
 * pure SQL, etc.
 */
@Override
public void renderStart(final @NotNull ExecuteContext ctx) {
    if (ctx.type() != ExecuteType.WRITE)
        return;
    String queryString = ctx.sql();
    try (final Query query = ctx.query()) {
        // Is our Query object empty? If not, let's run through it
        if (!ValidationUtils.isEmpty(query)) {
            queryString = query.getSQL(ParamType.INLINED);
            final SQLParser parser = new SQLParser();
            try {
                final StatementNode tokens = parser.parseStatement(query.getSQL());
                final Method method = tokens.getClass().getDeclaredMethod("getStatementType");
                method.setAccessible(true);
                switch (((Integer) method.invoke(tokens)).intValue()) {
                    case StatementType.UPDATE:
                        SelectNode snode = ConversionUtils.as(SelectNode.class,
                                ((DMLStatementNode) tokens).getResultSetNode());
                        // check if we are a mass delete/update (which we don't allow)
                        if ((Objects.isNull(snode)) || (Objects.isNull(snode.getWhereClause())))
                            throw new RuntimeException("A mass update has been detected (and prevented): "
                                    + DatabaseManager.getBuilder().renderInlined(ctx.query()));
                        break;
                    case StatementType.DELETE:
                        snode = ConversionUtils.as(SelectNode.class,
                                ((DMLStatementNode) tokens).getResultSetNode());
                        // check if we are a mass delete/update (which we don't allow)
                        if ((Objects.isNull(snode)) || (Objects.isNull(snode.getWhereClause())))
                            throw new RuntimeException("A mass delete has been detected (and prevented): "
                                    + DatabaseManager.getBuilder().renderInlined(ctx.query()));
                        break;
                    default:
                        if (__logger.isDebugEnabled()) {
                            __logger.debug("Skipping query because we don't need to do anything with it :-): {}",
                                    queryString);
                        }
                }
            } catch (@NotNull StandardException | IllegalAccessException
                    | IllegalArgumentException | InvocationTargetException | NoSuchMethodException
                    | SecurityException e) {
                // logger.error(e.getMessage(), e);
            }
        }
        // If the query object is empty AND the SQL string is empty, there's something wrong
        else if (ValidationUtils.isEmpty(queryString)) {
            __logger.error("The ctx.sql and ctx.query.getSQL were empty");
        } else
            throw new RuntimeException(
                    "Someone is trying to send pure SQL queries... we don't allow that anymore (use jOOQ): "
                            + queryString);
    }
}
I really don't want to use yet another tool -- especially since most SQL parsers can't handle UPSERTs or the wide variety of queries that jOOQ can, so a lot just get cut out -- and would love to use jOOQ's constructs. But I'm having trouble. Ideally I could just check the query class and, if it's an Update or Delete (or a subclass), scream if it isn't an instance of UpdateConditionStep or DeleteConditionStep. That doesn't work, though, because the queries come back as UpdateQueryImpl... and without crazy reflection I can't see whether a condition is in use.
So... right now I'm doing:
/**
 * Hook into the query execution lifecycle before rendering queries. We are checking for silly mistakes, pure SQL,
 * etc.
 */
@Override
public void renderStart(final @NotNull ExecuteContext ctx) {
    if (ctx.type() != ExecuteType.WRITE)
        return;
    try (final Query query = ctx.query()) {
        // Is our Query object empty? If not, let's run through it
        if (!ValidationUtils.isEmpty(query)) {
            // Get rid of nulls
            query.getParams().entrySet().stream().filter(entry -> Objects.nonNull(entry.getValue()))
                    .filter(entry -> CharSequence.class.isAssignableFrom(entry.getValue().getDataType().getType()))
                    .filter(entry -> NULL_CHARACTER.matcher((CharSequence) entry.getValue().getValue()).find())
                    .forEach(entry -> query.bind(entry.getKey(),
                            NULL_CHARACTER.matcher((CharSequence) entry.getValue().getValue()).replaceAll("")));
            if (Update.class.isInstance(query)) {
                if (!UpdateConditionStep.class.isInstance(query)) {
                    if (!WHERE_CLAUSE.matcher(query.getSQL(ParamType.INDEXED)).find()) {
                        final String queryString = query.getSQL(ParamType.INLINED);
                        throw new RuntimeException(
                                "Someone is trying to run an UPDATE query without a WHERE clause: " + queryString);
                    }
                }
            } else if (Delete.class.isInstance(query)) {
                if (!DeleteConditionStep.class.isInstance(query)) {
                    if (!WHERE_CLAUSE.matcher(query.getSQL(ParamType.INDEXED)).find()) {
                        final String queryString = query.getSQL(ParamType.INLINED);
                        throw new RuntimeException(
                                "Someone is trying to run a DELETE query without a WHERE clause: " + queryString);
                    }
                }
            }
        } else
            throw new RuntimeException(
                    "Someone is trying to send pure SQL queries... we don't allow that anymore (use jOOQ): "
                            + ctx.sql());
    }
}
This lets me get rid of the third-party SQL parser, but now I'm using a regular expression on the non-inlined query looking for \\s[wW][hH][eE][rR][eE]\\s, which isn't ideal either.
Is there a way to use jOOQ to tell me whether an UPDATE or DELETE has a WHERE clause?
Similarly, is there a way that lets me see which table the query is acting against (so that I can limit the tables someone can perform mutating actions against -- obviously that check wouldn't care whether it's an UPDATE or DELETE, instead using the ExecuteType)?
That's an interesting idea and approach. One problem I can see with it is performance. Rendering the SQL string a second time and then parsing it again sounds like a bit of overhead. Perhaps, this ExecuteListener should be active in development and integration test environments only, not in production.
Regarding your questions
Is there a way to use jOOQ to tell me whether an UPDATE or DELETE has a WHERE clause?
Since you seem to be open to using reflection to access a third-party library's internals: of course, you could check whether ctx.query() is of type org.jooq.impl.UpdateQueryImpl or org.jooq.impl.DeleteQueryImpl. In version 3.10.1, both of them have a private condition member, which you could check.
This will obviously break any time the internals are changed, but it might be a pragmatic solution for now.
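A minimal sketch of that reflection hack (hedged: the "condition" field is the 3.10.1 internal mentioned above, and the hasWhere() accessor on it is an assumption that may break in any release):

// Fragile by design: pokes at jOOQ 3.10.1 internals, as described above.
private static boolean hasWhereClause(Query query) {
    try {
        // UpdateQueryImpl / DeleteQueryImpl keep their WHERE state in this private member
        Field condition = query.getClass().getDeclaredField("condition");
        condition.setAccessible(true);
        Object provider = condition.get(query); // internally a ConditionProviderImpl
        Method hasWhere = provider.getClass().getDeclaredMethod("hasWhere");
        hasWhere.setAccessible(true);
        return (Boolean) hasWhere.invoke(provider);
    } catch (ReflectiveOperationException e) {
        return true; // internals changed; fail open rather than block legitimate queries
    }
}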
Similarly, is there a way that lets me see what table the query is acting against
A more general and more robust approach would be to implement a VisitListener, which is jOOQ's callback that is called during expression tree traversal. You can hook into the generation of the SQL string and the collection of bind variables, and throw your errors as soon as you encounter:
An UPDATE or DELETE statement
... without a WHERE clause
... updating a table from a specific set of tables
You "just" have to implement a stack machine that remembers all of the above things prior to throwing the exception. An example of how VisitListener can be implemented is given here:
https://blog.jooq.org/2015/06/17/implementing-client-side-row-level-security-with-jooq
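To make this concrete, here is a rough sketch of such a VisitListener. It is hedged on a few assumptions: the Clause values UPDATE_WHERE / DELETE_WHERE, the fact that jOOQ visits the WHERE clause even when it is empty (which the blog post above relies on), and simplified per-rendering state (real code should scope the flag via ctx.data() or use a fresh listener per rendering):

public class NoWhereClauseGuard extends DefaultVisitListener {

    // Simplification: assumes one rendering at a time per listener instance
    private boolean whereHasContent;

    @Override
    public void clauseStart(VisitContext ctx) {
        if (ctx.clause() == Clause.UPDATE_WHERE || ctx.clause() == Clause.DELETE_WHERE)
            whereHasContent = false;
    }

    @Override
    public void visitStart(VisitContext ctx) {
        // Anything visited while inside UPDATE_WHERE / DELETE_WHERE is WHERE content
        for (Clause clause : ctx.clauses())
            if (clause == Clause.UPDATE_WHERE || clause == Clause.DELETE_WHERE)
                whereHasContent = true;
    }

    @Override
    public void clauseEnd(VisitContext ctx) {
        if ((ctx.clause() == Clause.UPDATE_WHERE || ctx.clause() == Clause.DELETE_WHERE)
                && !whereHasContent)
            throw new RuntimeException("UPDATE or DELETE without a WHERE clause detected");
    }
}

You would register it on your Configuration via a VisitListenerProvider.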
New feature in the future
This kind of feature has been discussed a couple of times on the mailing list as well. It's low-hanging fruit for jOOQ to support natively. I've created a feature request for jOOQ 3.11:
https://github.com/jOOQ/jOOQ/issues/6771

Delete all files in 'folder' or with prefix in Google Cloud Bucket from Java

I know the idea of 'folders' is sort of non-existent or different in Google Cloud Storage, but I need a way to delete all objects in a 'folder' or with a given prefix from Java.
The GcsService has a delete function, but as far as I can tell it only takes one GcsFilename object and does not honor wildcards (i.e., "folderName/**" did not work).
Any tips?
The API only supports deleting a single object at a time. You can only request many deletions using many HTTP requests or by batching many delete requests. There is no API call to delete multiple objects using wildcards or the like. In order to delete all of the objects with a certain prefix, you'd need to list the objects, then make a delete call for each object that matches the pattern.
The command-line utility, gsutil, does exactly that when you ask it to delete the path "gs://bucket/dir/**". It fetches a list of objects matching that pattern, then makes a delete call for each of them.
If you need a quick solution, you could always have your Java program exec gsutil.
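For instance (a sketch only; the path is illustrative, gsutil must be on the PATH, and exception handling for start()/waitFor() is omitted):

// Shell out to gsutil, which expands the ** wildcard itself;
// -m parallelizes the per-object delete calls.
Process process = new ProcessBuilder("gsutil", "-m", "rm", "gs://bucket/dir/**")
        .inheritIO()
        .start();
int exitCode = process.waitFor(); // non-zero means some deletions failed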
Here is the code that corresponds to the first answer above (list, then delete each object), in case anyone else wants to use it:
public void deleteFolder(String bucket, String folderName) throws CouldNotDeleteFile {
    try
    {
        ListResult list = gcsService.list(bucket, new ListOptions.Builder().setPrefix(folderName).setRecursive(true).build());
        while (list.hasNext())
        {
            ListItem item = list.next();
            gcsService.delete(new GcsFilename(bucket, item.getName()));
        }
    }
    catch (IOException e)
    {
        // Error handling
    }
}
Extremely late to the party, but here's for current Google searches. We can delete multiple blobs efficiently by leveraging com.google.cloud.storage.StorageBatch.
Like so:
public static void rmdir(Storage storage, String bucket, String dir) {
    StorageBatch batch = storage.batch();
    Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.currentDirectory(),
            Storage.BlobListOption.prefix(dir));
    for (Blob blob : blobs.iterateAll()) {
        batch.delete(blob.getBlobId());
    }
    batch.submit();
}
This should run MUCH faster than deleting one by one when your bucket/folder contains a non-trivial number of items.
Edit: since this is getting a little attention, I'll demo error handling:
public static boolean rmdir(Storage storage, String bucket, String dir) {
    List<StorageBatchResult<Boolean>> results = new ArrayList<>();
    StorageBatch batch = storage.batch();
    try {
        Page<Blob> blobs = storage.list(bucket, Storage.BlobListOption.currentDirectory(),
                Storage.BlobListOption.prefix(dir));
        for (Blob blob : blobs.iterateAll()) {
            results.add(batch.delete(blob.getBlobId()));
        }
    } finally {
        batch.submit();
    }
    // The results can only be inspected after submit() has run
    return results.stream().allMatch(r -> r != null && r.get());
}
This method will delete every blob in the given folder of the given bucket, returning true if it succeeded and false otherwise. One can look into the return value of batch.delete() for a better understanding and error-proofing.
To ensure ALL items are deleted, you could call this like:
boolean success = false;
while (!success) {
    success = rmdir(storage, bucket, dir);
}
I realise this is an old question, but I just stumbled upon the same issue and found a different way to resolve it.
The Storage class in the Google Cloud Java Client for Storage includes a method to list the blobs in a bucket, which can also accept an option to set a prefix to filter results to blobs whose names begin with the prefix.
For example, deleting all the files with a given prefix from a bucket can be achieved like this:
Storage storage = StorageOptions.getDefaultInstance().getService();
Iterable<Blob> blobs = storage.list("bucket_name", Storage.BlobListOption.prefix("prefix")).iterateAll();
for (Blob blob : blobs) {
    blob.delete(Blob.BlobSourceOption.generationMatch());
}

ATG Repository API

I'm trying to update multiple records via an ATG class extending GenericService.
However, I'm running into a roadblock.
How do I do a multiple-row insert where I can keep adding all the items/rows to the cached object and then sync with the table in a single command using item.add()?
Sample code:
The first part clears out the rows in the table before insertion happens (it would be mighty helpful if anyone knows of a way to clear all rows in a table without having to loop through and delete them one by one).
MutableRepository repo = (MutableRepository) feedRepository;
RepositoryView view = null;
try {
    view = getFeedRepository().getView(getFeedRepositoryFeedDataDescriptorName());
    RepositoryItem[] items = null;
    if (view != null) {
        QueryBuilder qb = view.getQueryBuilder();
        Query getFeedsQuery = qb.createUnconstrainedQuery();
        items = view.executeQuery(getFeedsQuery);
    }
    if (items != null && items.length > 0) {
        // remove all items in the repository
        for (RepositoryItem item : items) {
            repo.removeItem(item.getRepositoryId(), getFeedRepositoryFeedDataDescriptorName());
        }
    }
    for (RSSFeedObject rfo : feedEntries) {
        MutableRepositoryItem feedItem = repo.createItem(getFeedRepositoryFeedDataDescriptorName());
        feedItem.setPropertyValue(DB_COL_AUTHOR, rfo.getAuthor());
        feedItem.setPropertyValue(DB_COL_FEEDURL, rfo.getFeedUrl());
        feedItem.setPropertyValue(DB_COL_TITLE, rfo.getTitle());
        // assuming a DB_COL_PUBLISHED_DATE constant; the feed-URL column shouldn't receive the published date
        feedItem.setPropertyValue(DB_COL_PUBLISHED_DATE, rfo.getPublishedDate());
        RepositoryItem item = repo.addItem(feedItem);
    }
The way I interpret your question is that you want to add multiple repository items to your repository but you want to do it fairly efficiently at a database level. I suggest you make use of the Java Transaction API as recommended in the ATG documentation, like so:
TransactionManager tm = ...
TransactionDemarcation td = new TransactionDemarcation();
try {
    try {
        td.begin(tm);
        // ... do repository item work ...
    }
    finally {
        td.end();
    }
}
catch (TransactionDemarcationException exc) {
    // ... handle the exception ...
}
Assuming you are using a SQL repository in your example, the SQL INSERT statements will be issued after each call to addItem but will not be committed until/if the transaction completes successfully.
ATG does not provide support for deleting multiple records in a single SQL statement. You can use transactions, as @chrisjleu suggests, but there is no way to do the equivalent of a DELETE ... WHERE ID IN ('1', '2', ...). Your code looks correct.
It is possible to invoke stored procedures or execute custom SQL through an ATG Repository, but that isn't generally recommended for portability/maintenance reasons. If you did that, you would also need to flush the appropriate portions of the item/query caches manually.
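For reference, a minimal sketch of the manual cache flush mentioned above, assuming the repository is a GSA repository (atg.adapter.gsa.GSARepository); invalidateCaches() drops the item and query caches wholesale, which is blunt but simple:

// After running custom SQL outside the repository, drop the repository's
// item and query caches so stale entries aren't served.
GSARepository gsaRepository = (GSARepository) feedRepository;
gsaRepository.invalidateCaches();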
