Preprocessing input text before calling ElasticSearch API

Preprocessing input text before calling ElasticSearch API - java

I have a Java client that allows indexing documents on a local ElasticSearch server.
I now want to build a simple Web UI that allows users to query the ES index by typing in some text in a form.
My problem is that, before calling ES APIs to issue the query, I want to preprocess the user input by calling some Java code.
What is the easiest and "cleanest" way to achieve this?
Should I create my own APIs so that the UI can access my Java code?
Should I build the UI with JSP so that I can directly call my Java
code?
Can I somehow make ElasticSearch execute my Java code before
the query is executed? (Perhaps by creating my own ElasticSearch plugin?)

In the end, I opted for the simple solution of using Json-based RESTful APIs. Time proved this to be quite flexible and effective for my case, so I thought I should share it:
My Java code exposes its ability to query an ElasticSearch index, by running an HTTP server and responding to client requests with Json-formatted ES results. I created the HTTP server with a few lines of code, by using sun.net.HttpServer. There are more serious/complex HTTP servers out there (such as Tomcat), but this was very quick to adopt and required zero configuration headaches.
My Web UI makes HTTP GET requests to the Java server, receives Json-formatted data and consumes it happily. My UI is implemented in PHP, but any web language does the job, as long as you can issue HTTP requests.
This solution works really well in my case, because it allows to have no dependencies on ES plugins. I can do any sort of pre-processing before calling ES, and even post-process ES output before sending the results back to the UI.

Depending on the type of pre-processing, you can create an Elasticsearch plugin as custom analyser or custom filter: you essentially extend the appropriate Lucene class(es) and wrap everything into an Elasticsearch plugin. Once the plugin is loaded, you can configure the custom analyser and apply it to the related fields. There are a lot of analysers and filters already available in Elasticsearch, so you might want to have a look at those before writing your own.
Elasticsearch plugins: https://www.elastic.co/guide/en/elasticsearch/reference/1.6/modules-plugins.html (a list of known plugins at the end)
Defining custom analysers: https://www.elastic.co/guide/en/elasticsearch/guide/current/custom-analyzers.html

Related

Does Streams API functionality overlap with Spring Integration?

When introducing parallel processing to an application where multiple save entity calls are being made, I see prior dev has chosen to do it via Spring Integration using split().channel(MessageChannels.executor(Executors.newfixedThreadPool(10))).handle("saveClass","saveMethod").aggregate().get() - where this method is mapped to a requestChannel using #Gateway annotation. My question is this task seems to be simpler to do using the parallelStream() and forEach() method. Does IntergrationFlow provide any benefit in this case?

If you really do a plain in-memory data processing where Java's Stream API is enough, then indeed you don't need the whole messaging solution like Spring Integration. But if you deal with distributed requirements to process data from different systems, like from HTTP to Apache Kafka or DB, then it is better to use some tool which allows you smoothly to connect everything together. Also: no one stops you to use Stream API in the Spring Integration application. In the end all your code is Java anyway. Please, learn more what is EIP and why we would need a special framework to implement this messaging-based solutions: https://www.enterpriseintegrationpatterns.com/

How to send data from a java application to elastic search without using the elasticsearch libraries

I saw already answered questions and seems they are old enough that I couldn't use them. I tried an example given at https://www.elastic.co/blog/found-java-clients-for-elasticsearch which has the code written but not in an organized manner that would help me. The libraries are old and code gives me error.
I saw Spring Data project but that only allow a specific type of document/class to be indexed and needs the model to be predefined which is not my usecase. My goal is to build a java web application which could ingest any data documents fed to elasticsearch and we could analyze it with Kibana. I would need to know how can i fire a rest call or curl for bulk data. Can anyone state an example with complete parts please.

Use rest client.
The Java High Level REST Client works on top of the Java Low Level
REST client. Its main goal is to expose API specific methods, that
accept request objects as an argument and return response objects, so
that request marshalling and response un-marshalling is handled by the
client itself.
To upload data from java application to ES , you can use bulk Api.
To check list of APIs check link

Backup and restore entities from Google App engine

I am looking to implement a way to transfer data from one application to another programmatically in Google app engine.
I know there is a way to achieve this with database_admin console but that process is very time inefficient.
I am currently implementing this with the use of Google Cloud Storage(GCS) but that involves querying data, saving it to GCS and then reading from GCS from different app and restoring it.
Please let me know if anyone knows a simpler way of transferring data between two applications programmatically.
Thanks!

Haven't tried this myself but it sounds like it should work: Use the data_store admin to backup your objects to GCS from one app, then use your other app to restore that file from GCS. This should be a good method if you only require a one time sync.
If you need to constantly replicate data from one app to another, introducing REST endpoints at one or both sides could help:
https://code.google.com/p/appengine-rest-server/ (this is in Python, I know, but just define a version of your app for the REST endpoint)
You just need to make sure your model definitions match on both sides (pretty much update the app at both sides with the same deployment code) and only have the side that needs to sync data track time of last sync and use the REST endpoints to pull in new data. Cron Jobs can do this.
Alternatively, create a PostPut callback on all of your models to make a POST call every time a model is written to your datastore to the proper REST endpoint on the other app.
You can batch update with one method, or keep a constantly updated version with the other method (at the expense of more calls).

Are you trying to move data between two App Engine applications or trying to export all your data from App Engine so you can move to a different hosting system? Your question doesn't have enough information to understand what you're attempting to do. Based on the vague requirements I would say typically this would be handled with a web service that you write in one application that exposes the data and the other application calls that service to consume the data. I'm not sure why Cloud Endpoints was down voted because that provides a nice way to expose your data as a JSON based web service with a minimum of coding fuss.
I'd recommend adding some more details into your question like exactly what are you trying to accomplish and maybe a mock data sample.

You could create a backup of your data using bulkloader and then restore it on another application.
Details here:
https://developers.google.com/appengine/docs/python/tools/uploadingdata?csw=1#Python_Downloading_and_uploading_all_data
Refer to this post if you are using Java:
Downloading Google App Engine Database

I don't know if this could be suitable for your concrete scenario, but Google Cloud Endpoints are definitely a simple way of transferring data programmatically from Google App Engine.
This is kind of the Google implementation of REST web services, so they allow you to share resources using URLs. This is still an experimental technology, but as long as I've worked with them, they work perfectly. Moreover they are very well integrated with GAE and the Google Plugin for Eclipse.
You can automatically generate an endpoint from a JDO persistent class and I think you can automatically generate the client libraries as well (although I didn't try that).

Java REST Interface

I have a PHP web application environment. I am using Slim Framework as REST interface for my application. My application front-end is written using Backbone.js and jQuery.
There is a utility (.jar file) which when I use command line makes a remote call (I guess this is a Web Service) which returns me the data.
how do I best incorporate this into my webapplication described on top?
My application front end will have a Button that should make an AJAX call to the REST Interface and fetch the data as JSON.
My approach:
PHP-REST interface url is: /api/phprestapi.php exists
Add a JAVA-REST interface at url: /api/javarestapi.java (Perhaps) to separate these two
Existing Environment: LAMP Stack on Ubuntu
How do I achieve this? What is the kind of effort involved?
Thanks for your pointers

If I understand you correctly, you need to be able to return the data being output from the jar into php. If that is the case then you should start looking at the different ways to execute a program from php [1]. exec is probably the most well known.
If you want further control, I would recommend learning more about the web service being called by the jar and doing the call to the web service in php. However, this would take a lot more time than the first option above.
[1] http://php.net/manual/en/ref.exec.php

CMIS vs REST. Which client would be easier to implement from scratch?

Im working on a Java project that needs to upload files using either REST or CMIS (both services are available). I'm completely new to these APIs and would like to ask which one would be easiest and more straightforward to implement.
I can't use external libraries in the project, so I will need to implement the client from scratch.
Note: the only requirement is to upload files.
Thanks in advance.

The goal of the Content Management Interoperability Services (CMIS) specification is to provide a set of services for working with rich content repositories. It provides a entire specification for ECM applications, that can be REST or SOAP.
CMIS provides specification to operations that control folders,documents,relationships and policy.
I think that for your upload, using CMIS would be like killing a fly with a bomb.

While I admit that I don't know CMIS, file upload with REST is just classic HTTP file upload where you interpret the path name as indicative of the resource to update or replace. Basic REST usage would have you do (an HTTP) GET (method) as “read the file”, POST as “create the file while picking a new name” (typically with a redirect afterwards so that the client can find out what name was picked), PUT as “create the file with the given name or replace that file's contents”, and DELETE as “delete the file”. Moreover, you don't need to support all those methods; do as few as you want (but it's a good idea to support some GET requests, even if just to let people that their uploads worked).
However, when implementing you want in all cases to try to avoid holding much of the file's data in memory; that doesn't scale. Better to take the time to implement streaming transfers so that you never actually need to buffer more than a few kilobytes. You can certainly do this with REST/HTTP. (You could even do it with SOAP using MTOM, but that's probably out of scope for you…)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Preprocessing input text before calling ElasticSearch API - java

Related

Does Streams API functionality overlap with Spring Integration?

How to send data from a java application to elastic search without using the elasticsearch libraries

Backup and restore entities from Google App engine

Java REST Interface

CMIS vs REST. Which client would be easier to implement from scratch?

Categories

Resources