I'm writing a custom Solr search component. In the component's prepare method I execute a query that is passed in as a custom parameter (inside req.params); it is not the q parameter query, but a separate input query of my own. I use the documents returned by that custom query to do some preparation work in the prepare method.
The problem is that since my index is distributed into several shards, the documents returned by the custom query are only the ones residing on one of the shards. In other words, the search performed in my prepare method is not distributed, and I'm getting partial results. This is more or less how I perform the search in my prepare method:
rb.req.getSearcher().getDocList(customQuery, null, null, offset, len, 0);
Is there a way to make a distributed search in the prepare method and get the matched documents from all the shards?
EDIT:
My current solution is to execute a query using Solrj roughly as follows:
SolrServer server = new HttpSolrServer(url);
SolrQuery request = new SolrQuery(customQuery);
NamedList<Object> queryResponse = server.query(request).getResponse();
Then I parse the response to get the content of the returned documents. I don't like this solution for several reasons: one is that I have to parse the response, but the main one is that I have to pass the Solr server url as a parameter (I currently put it in the solrconfig.xml file). Is it possible to construct a SolrServer instance without explicitly stating the Solr server url, perhaps through ZooKeeper?
The Easy Way
Use CloudSolrServer to execute the distributed query. Feed it the ZooKeeper address and the collection name (both available through the response builder):
CoreDescriptor coreDescriptor = rb.req.getCore().getCoreDescriptor();
String collectionName = coreDescriptor.getCloudDescriptor().getCollectionName();
ZkController zkController = coreDescriptor.getCoreContainer().getZkController();
String zookeeperUrl = zkController.getZkServerAddress();
CloudSolrServer server = new CloudSolrServer(zookeeperUrl);
server.setDefaultCollection(collectionName);
server.connect();
SolrRequest request = ... // initialize the solr request to execute the query (see the sketch below)
NamedList<Object> solrResponse = server.request(request);
// do whatever you like with the returned response
server.shutdown();
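For instance, the elided request initialization can be a QueryRequest wrapping the custom query; a minimal sketch (the q and rows values here are illustrative):

ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", customQuery); // the custom input query taken from req.params
params.set("rows", 100);      // illustrative page size
SolrRequest request = new QueryRequest(params);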
The Right Way
Do not perform a distributed search (or any index query) inside the prepare method. Instead, first decide at which stage of the execution you want your distributed query to run. The stages are STAGE_START, STAGE_PARSE_QUERY, STAGE_TOP_GROUPS, STAGE_EXECUTE_QUERY, STAGE_GET_FIELDS and STAGE_DONE. If you need it to run between two of these stages, create a new intermediate stage (such as EXECUTE_PREPARING_QUERY).
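A custom stage is just an int constant; Solr runs the stages in ascending numeric order, so pick a value between the two built-in stages you want to sit between. A sketch (the name and value are illustrative):

// Illustrative custom stage slotted just before query execution.
public static final int MY_STAGE = ResponseBuilder.STAGE_EXECUTE_QUERY - 10;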
Override the distributedProcess method so that, when the current stage is your stage, it sets up the shard request with the right parameters:
@Override
public int distributedProcess(ResponseBuilder rb) {
    ...
    if (rb.stage == MY_STAGE) {
        ShardRequest sreq = new ShardRequest();
        sreq.purpose = ShardRequest.PURPOSE_PRIVATE;
        sreq.params = new ModifiableSolrParams();
        // set the parameters for the shard request
        rb.addRequest(this, sreq);
    }
    ...
    // return the next stage at which this component wants to run,
    // e.g. ResponseBuilder.STAGE_DONE once there is nothing left to do
}
Now, at stage MY_STAGE, each shard will execute the request defined by the params you've set against its own core. You still have to handle the shards' responses, combine them, and use them. The right place to do that is the component's handleResponses method, so override it and process the shard responses when you're in the right stage. You will probably need to save them somewhere so you can refer to them later in the finishStage method; see the sketch after the following snippet.
@Override
public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {
    ...
    if (rb.stage == MY_STAGE) {
        List<ShardResponse> responses = sreq.responses;
        for (ShardResponse response : responses) {
            // do something with the response, maybe save it somewhere
        }
        // remove the request so no other component processes it again
        rb.finished.remove(sreq);
    }
    ...
}
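One common place to save the responses is the request context, which finishStage can read later. A sketch to go inside the loop above (the context key "myComponentResults" is illustrative, not part of the Solr API):

// Save each shard's response under a made-up context key.
Map<Object, Object> ctx = rb.req.getContext();
@SuppressWarnings("unchecked")
List<NamedList<Object>> saved = (List<NamedList<Object>>) ctx.get("myComponentResults");
if (saved == null) {
    saved = new ArrayList<>();
    ctx.put("myComponentResults", saved);
}
saved.add(response.getSolrResponse().getResponse());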
Now you have to override the finishStage method and do whatever you need to do with the combined results.
@Override
public void finishStage(ResponseBuilder rb) {
    ...
    if (rb.stage == MY_STAGE) {
        // do whatever you need to do with the results
    }
    ...
}
The important message is to use the response builder stages to control the component's execution flow in relation to the other components. You don't have to put the code in the prepare method to have it run before the actual query executes; you just have to create or use a stage that falls between STAGE_START and STAGE_EXECUTE_QUERY.
Related
By following the Java Quickstart example, I am able to create a new Google Apps Script project and retrieve the scriptId. According to the REST API documentation, the script can then be executed using Method: scripts.run. However, I don't know how to retrieve the return value using com.google.api.services.script.Script in Java.
I've tried:
Script scriptService = getScriptService();
Script.Scripts scripts = scriptService.scripts();
Script.Scripts.Run run = scripts.run(scriptId, request);
and the decompiled run function:
public Script.Scripts.Run run(String var1, ExecutionRequest var2) throws IOException {
Script.Scripts.Run var3 = new Script.Scripts.Run(var1, var2);
Script.this.initialize(var3);
return var3;
}
The function doesn't return an ExecutionResponse object which I am looking for.
Per the REST API documentation, calling scripts.run does not immediately return an ExecutionResponse object, but an Operation object that may contain an ExecutionResponse:
{
  "done": boolean,
  // Union field result can be only one of the following:
  "error": {
    object(Status)
  },
  "response": object(ExecutionResponse)
  // End of list of possible types for union field result.
}
If we look at the Java API client library, we see that the method Script.Scripts.run takes the script ID and an ExecutionRequest as arguments, and returns a Script.Scripts.Run request that must be .execute()d:
Create a request for the method "scripts.run". This request holds the parameters needed by the script server. After setting any optional parameters, call the AbstractGoogleClientRequest.execute() method to invoke the remote operation.
The request referred to by the quoted documentation is Script.Scripts.Run; it has methods like .setAccessToken() for additional configuration, and several execution methods like .execute() and .executeMedia() that actually submit the execution request and return the Operation.
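Putting it together, a minimal sketch of unwrapping the Operation (assuming the scriptService, scriptId and request from the question, and a script that has already finished when the call returns):

Operation op = scriptService.scripts().run(scriptId, request).execute();
if (op.getError() != null) {
    // the script failed; op.getError() holds the Status object
} else {
    // getResponse() returns a Map; the script's return value is under "result"
    Object result = op.getResponse().get("result");
}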
I am writing in Java on the Vert.x framework, and I have an architecture question regarding blocking code.
I have a JsonObject which consists of 10 objects, like so:
{
"system":"CD0",
"system":"CD1",
"system":"CD2",
"system":"CD3",
"system":"CD4",
"system":"CD5",
"system":"CD6",
"system":"CD7",
"system":"CD8",
"system":"CD9"
}
I also have a synchronous function that takes an object from the JsonObject and sends it to a SOAP web service it consumes.
The SOAP web service receives the content (e.g. CD0) and, after a few seconds, returns an enum.
I then want to take the returned enum value and save it in some data structure (like a hash table).
What I ultimately want is a function that iterates over all the JsonObject's entries and, for each one, runs the blocking code in parallel.
I want it to run in parallel so that even if one of the calls has to wait 20 seconds, it won't hold up the other calls.
How can I do such a thing in Vert.x?
P.S.: I would appreciate it if you corrected any mistakes in my writing.
Why not use RxJava and "zip" the separate calls? Vert.x has great support for RxJava too. Assuming that you are calling the same method 10 times with a different String argument and returning another String, you could do something like this:
private Single<String> callWs(String arg) {
    return Single.fromCallable(() -> {
        // DO CALL WS (the blocking SOAP call goes here)
        return "yourResult";
    })
    // subscribe each call on a worker thread (rx.schedulers.Schedulers);
    // otherwise zip would execute the blocking calls one after another
    .subscribeOn(Schedulers.io());
}
and then just use it with some array of arguments:
String[] array = new String[10]; // get your arguments
List<Single<String>> wsCalls = new ArrayList<>();
for (String s : array) {
    wsCalls.add(callWs(s));
}
Single.zip(wsCalls, r -> r).subscribe(allYourResults -> {
    // do whatever you like with the results
});
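If you'd rather stay with the plain Vert.x API, executeBlocking plus CompositeFuture gives the same fan-out. A sketch, where callWsBlocking is a hypothetical stand-in for your synchronous SOAP call:

List<Future> futures = new ArrayList<>();
for (String s : array) {
    Future<String> f = Future.future();
    vertx.<String>executeBlocking(fut -> {
        // callWsBlocking is a hypothetical synchronous SOAP call
        fut.complete(callWsBlocking(s));
    }, false, ar -> { // ordered=false lets the worker pool run the calls in parallel
        if (ar.succeeded()) {
            f.complete(ar.result());
        } else {
            f.fail(ar.cause());
        }
    });
    futures.add(f);
}
CompositeFuture.all(futures).setHandler(ar -> {
    // all calls are done; collect futures.get(i).result() into your map
});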
More about zip function and reactive programming in general: reactivex.io
I want to create a URL structure for my Vert.x Web powered API that makes it clear how some entities are "contained" inside other entities and how you "traverse the entity path" to find child entities. So I'm thinking of using something like this to address a "grandchild" (I don't expect anything deeper than a grandchild):
GET /entity/:parent_id/sub-entity/:child_id/subsub-entity/:grandchild_id
So normally my Router configuration will look something like this:
router.get("/entity/:parent_id/sub-entity/:child_id/subsub-entity/:grandchild_id")
.handler(r -> {
HttpServerRequest req = r.request();
Minipart p = Entities.get(req.getParameter("parent_id"))
.getPart(req.getParameter("child_id"))
.getMinipart(req.getParameter("grandchild_id"));
// do something with p
});
When I add a lot of operations (each entity class at each level has catalog and create operations, and each level entity instance has get, update and delete operations, as well as a few other tidbits), my router class gets really large.
I was thinking of using sub-routers to offload the sub-entity management down the line, so the Entities Router configuration might do:
router.mountSubRouter("/entity/:parent_id/sub-entity", PartsRouter.init(vertx));
and then PartsRouter can do:
router.get("/:child_id").handler(r -> {
String parentEntityId = r.request().getParameter("parent_id");
Entity parent = Entities.get(parentEntityId);
String myid = r.request().getParameter("child_id");
Part myself = parent.getPart(myid);
// do something with myself
});
But when I do try that and try to access the sub-router operations, I get a 404 error from Vert.x...
Update:
Apparently Vert.x explicitly does not support this - it threw an exception that my wrapper code just logged and ignored, saying:
java.lang.IllegalArgumentException: Can't use patterns in subrouter mounts
So, is there another way to achieve what I'm trying to do (split a large router configuration into a proper class hierarchy)?
I can imagine two ways of solving your problem.
The first one would be to have an initial handler that processes the common part of the request and calls next(), so the following handler continues where the first one stopped, e.g.:
router.route("/entity/:parent_id/sub-entity", ctx -> {
// common part here...
ctx.next();
});
And then:
router.route("/entity/:parent_id/sub-entity/:child_id", ctx -> {
String parentEntityId = r.request().getParameter("parent_id");
Entity parent = Entities.get(parentEntityId);
String myid = r.request().getParameter("child_id");
Part myself = parent.getPart(myid);
// do something with myself
});
Alternatively, you can use internal redirects: you handle the initial code as before, but instead of calling next() you reroute to another URL. In that case you should store in the context whatever you want to reuse, since the request will be restarted at the new location.
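A minimal sketch of such a reroute, assuming a flat route "/parts/:child_id" is registered on the same router (the target path is illustrative):

router.route("/entity/:parent_id/sub-entity/:child_id").handler(ctx -> {
    // stash whatever the target handler needs; the reroute restarts the request
    ctx.put("parent_id", ctx.request().getParam("parent_id"));
    ctx.reroute("/parts/" + ctx.request().getParam("child_id"));
});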
This question is about Rational ClearCase CM API programming in Java. We need to get the list of modified files of a particular stream; to do this, we want to get the activity list from the UCM VOB object which we got from ClearQuest. We are trying to get the stream information using the function below:
StpLocation streamLoc = provider.userFriendlySelector(Domain.CLEAR_CASE, Namespace.STREAM, login + "Auto.03.38.00.00.dev.stream", "AlnCode_09_PVOB");
But by doing this we are getting the following error:
CRVAP0226E (internal-error): Internal error: Status 500; Condition null; Message: CRVSV0135E Internal error: bad length.
What parameters should we pass to the userFriendlySelector function in order to get the required result?
I only know about the documentation and this example:
public CcStream getUcmStream(String streamName) throws Exception {
    // get the resource location from the provider
    StpLocation streamLoc = m_provider.userFriendlySelector(
        Domain.CLEAR_CASE,
        // the namespace category is a stream
        Namespace.STREAM,
        // stream name, passed in as a parameter
        streamName,
        // VOB tag, initialized in the constructor
        m_pvobTag);
    return m_provider.ccStream(streamLoc);
}
Depending on the platform, the Vob tag might vary from \myvob to /vobs/myvob, so check your Vob tag parameter.
And check the stream name: it usually does not include your login (unless it is a development stream named after a user).
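With the names from the question, the call would then look roughly like this (the pvob tag format is an assumption and depends on your platform):

// Hypothetical usage: the stream name without the login prefix; m_pvobTag
// would be e.g. "\\AlnCode_09_PVOB" on Windows or "/vobs/AlnCode_09_PVOB" on Unix.
CcStream stream = getUcmStream("Auto.03.38.00.00.dev.stream");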
I've got an in-memory instance of Elasticsearch running, and I'm doing some exploratory coding to learn the Java search API. I am able to submit documents to the index and retrieve them using GET, but when I try a simple search query, I am not getting any results.
// first, try a get request, to make sure there is something in the index
GetResponse results = client.prepareGet(INDEX_NAME, INDEX_TYPE, testID)
.execute()
.actionGet();
// this assertion succeeds, as we expect it to.
assertThat(results.getId()).isEqualTo(testID);
// next, try the simplest possible search
SearchResponse s1 = client.prepareSearch(INDEX_NAME).setQuery(matchAllQuery())
.execute()
.actionGet();
// this assertion fails. why? answer: when we have an in-memory node, we have to
// manually call refresh on the indexing, after submitting a document.
assertThat(s1.getHits().totalHits()).isGreaterThanOrEqualTo(1);
After some testing, I think the problem is in how I am setting up my node and associated client (in memory):
@BeforeMethod
public void setup() {
// set up elastic search to run locally. since the transaction
// log needs a filesystem, we can't run it as purely in memory,
// but we can set the data directories into "target", so that maven will
// clean up after the fact: http://bit.ly/OTN7Qf
Settings settings = ImmutableSettings.settingsBuilder()
.put("node.http.enabled", true)
.put("path.logs","target/elasticsearch/logs")
.put("path.data","target/elasticsearch/data")
.put("gateway.type", "none")
.put("index.store.type", "memory")
.put("index.number_of_shards", 1)
.put("index.number_of_replicas", 1).build();
node = NodeBuilder.nodeBuilder().local(true).settings(settings).node();
client = node.client();
}
Someone on the Elasticsearch Google group was kind enough to help me out here. After submitting a document to an in-memory node, I need to refresh the index:
node.client().admin().indices().prepareRefresh().execute().actionGet();
Calling refresh fixed the problem.
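Alternatively (in this era of the API), the refresh can be requested on the index call itself, so the document is searchable as soon as the call returns. A sketch assuming the same client, where jsonDocument is a hypothetical document source:

client.prepareIndex(INDEX_NAME, INDEX_TYPE, testID)
    .setSource(jsonDocument) // jsonDocument is a hypothetical document source
    .setRefresh(true)        // refresh the index as part of this request
    .execute()
    .actionGet();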