How to enforce valid JSON in Elasticsearch? - java

Consider the following code:
function add_document() {
index=$1
_type=$2
json="$3"
curl -s -X PUT "$NODE_ADDRESS/index/_type?pretty" -d "$json"
}
add_document users documents '{user_name : "kshitiz"}'
The above function runs just fine and adds the document to the index. The problem, however, is that {user_name : "kshitiz"} isn't valid JSON: the key user_name is not enclosed in double quotes.
I can validate the JSON in my code before sending it to Elasticsearch, but the problem is that this instance will be shared among a large team. A better solution would be to make the node reject invalid JSON documents, so that the add operation fails and forces developers to code properly.
How can I enable strict JSON validation in Elastic?
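For the client-side check mentioned in the question, a strict parser is enough, since it refuses unquoted keys. A minimal sketch, assuming no JSON library is available: the hypothetical class StrictJson below is a small recursive-descent checker for the RFC 8259 grammar that only answers whether a string is well-formed JSON. In a real project, a JSON library with strict parsing defaults would do the same job.

```java
import java.util.Objects;

/** Minimal strict JSON syntax checker (RFC 8259 grammar); it only reports well-formedness. */
final class StrictJson {
    private final String s;
    private int i; // current read position

    private StrictJson(String s) { this.s = s; }

    /** Returns true if the whole input is exactly one valid JSON value. */
    static boolean isValid(String text) {
        StrictJson p = new StrictJson(Objects.requireNonNull(text));
        try {
            p.ws(); p.value(); p.ws();
            return p.i == p.s.length(); // no trailing garbage allowed
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    private IllegalArgumentException err() { return new IllegalArgumentException("unexpected input at index " + i); }
    private char peek() { if (i >= s.length()) throw err(); return s.charAt(i); }
    private void expect(char c) { if (peek() != c) throw err(); i++; }
    private void ws() { while (i < s.length() && " \t\n\r".indexOf(s.charAt(i)) >= 0) i++; }

    private void value() {
        char c = peek();
        if (c == '{') container('}', true);
        else if (c == '[') container(']', false);
        else if (c == '"') string();
        else if (c == '-' || (c >= '0' && c <= '9')) number();
        else if (s.startsWith("true", i)) i += 4;
        else if (s.startsWith("false", i)) i += 5;
        else if (s.startsWith("null", i)) i += 4;
        else throw err();
    }

    /** Objects hold "key": value pairs, arrays hold plain values; trailing commas are rejected. */
    private void container(char close, boolean isObject) {
        i++; // skip the opening '{' or '[' already seen by value()
        ws();
        if (peek() == close) { i++; return; }
        while (true) {
            ws();
            if (isObject) {
                string(); // keys must be double-quoted strings
                ws();
                expect(':');
                ws();
            }
            value();
            ws();
            if (peek() != ',') break;
            i++;
        }
        expect(close);
    }

    private void string() {
        expect('"');
        while (true) {
            char c = peek();
            i++;
            if (c == '"') return;
            if (c < 0x20) throw err(); // raw control characters are not allowed
            if (c != '\\') continue;
            char e = peek();
            i++;
            if (e == 'u') {
                for (int k = 0; k < 4; k++, i++) {
                    if ("0123456789abcdefABCDEF".indexOf(peek()) < 0) throw err();
                }
            } else if ("\"\\/bfnrt".indexOf(e) < 0) {
                throw err();
            }
        }
    }

    private void number() {
        if (peek() == '-') i++;
        if (peek() == '0') i++; // a leading zero must stand alone
        else digits();
        if (i < s.length() && s.charAt(i) == '.') { i++; digits(); }
        if (i < s.length() && (s.charAt(i) == 'e' || s.charAt(i) == 'E')) {
            i++;
            if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
            digits();
        }
    }

    private void digits() {
        int start = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        if (i == start) throw err();
    }
}
```

With this sketch, StrictJson.isValid("{user_name : \"kshitiz\"}") returns false because the key is unquoted, while the quoted form {"user_name" : "kshitiz"} is accepted. A shared client wrapper could run such a check before sending any document to the node.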

You should implement this functionality in a plugin. Have a look at these examples:
Validate data before indexing them
Implementing data validation in ElasticSearch

Use mappings, which are basically a schema for the documents of an Elasticsearch type. A mapping only type-checks fields and their associated values.
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#_explicit_mappings
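Building on the mapping suggestion: an explicit mapping can also set "dynamic": "strict", which makes Elasticsearch reject documents containing fields that are not declared in the mapping. A minimal sketch, assuming Elasticsearch 7 or later (typeless mappings) and a node at the hypothetical address http://localhost:9200; the class StrictMapping and its methods are illustrative. It only builds the request with the JDK's java.net.http API. Note that this constrains field names and types, not the leniency of the JSON parser itself.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

/** Builds a "create index" request whose mapping rejects undeclared fields. */
final class StrictMapping {
    /** {"mappings":{"dynamic":"strict","properties":{...}}}; names and types are assumed to need no escaping. */
    static String body(Map<String, String> fieldTypes) {
        StringBuilder props = new StringBuilder();
        for (Map.Entry<String, String> e : fieldTypes.entrySet()) {
            if (props.length() > 0) props.append(',');
            props.append('"').append(e.getKey()).append("\":{\"type\":\"")
                 .append(e.getValue()).append("\"}");
        }
        return "{\"mappings\":{\"dynamic\":\"strict\",\"properties\":{" + props + "}}}";
    }

    /** PUT /<index> with the mapping above; the request is only built here, not sent. */
    static HttpRequest createIndexRequest(String nodeAddress, String index, Map<String, String> fieldTypes) {
        return HttpRequest.newBuilder(URI.create(nodeAddress + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(fieldTypes)))
                .build();
    }
}
```

Sending the request with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) would create the users index; afterwards, indexing a document with an undeclared field should fail with a strict_dynamic_mapping_exception.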

Related

Azure Function (Java) add attribute to CustomDimensions

I am desperately searching for how to add a new attribute to the customDimensions collection of the requests table in Log Analytics, from within a Java function.
I know that in C# you can do it using Activity and ITelemetryInitializer, but in Java I cannot get it to work.
Any hints are appreciated. ;)
In Java, you can use span attributes to add optional (extra) fields to the Application Insights schema; they appear in customDimensions in tables such as traces, exceptions and dependencies.
This requires the OpenTelemetry API dependency (opentelemetry-api-1.jar).
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
AttributeKey<String> newAttributeKey = AttributeKey.stringKey("resourcecustomDimension");
Span.current().setAttribute(newAttributeKey, "customDimValue");
This adds the custom attribute to the customDimensions of the telemetry data.
Refer to this Microsoft doc for more information on adding attributes to customDimensions. See also GitHub issue 13310, in which a user asked for Java code equivalent to ITelemetryInitializer; user @dhaval24 provided sample code there.

Need external version support in OpenSearch Update operation

I am using OpenSearch to index JSON documents and make them searchable. All documents have an update-timestamp field in epoch format. The problem is that I can receive an update request whose document body contains an older update time. My application should skip the update if the update time in the request is older than the update time of the existing document stored in OpenSearch.
To fulfil the requirement, I added an external version to the HTTP request: /test_index/_update/123?version=1674576432910&version_type=external.
But I get the following error:
Validation Failed: 1: internal versioning can not be used for optimistic concurrency control. Please use if_seq_no and if_primary_term instead
I have read about the if_seq_no and if_primary_term fields, but they can't be used to solve my problem. Has anyone else encountered this problem and solved it? Please share. Or, if anyone knows of a plugin that I can install to support this, please share.
Sadly, neither OpenSearch nor Elasticsearch supports an external version in update requests, and I don't see the feature being added in the near future. You can solve your specific problem using scripting. OpenSearch supports multiple scripting languages, including Painless. You can write a script that compares a specific field (in your case, the update timestamp) and, if the condition is true, updates the fields with the new values.
{
  "script": {
    "lang": "painless",
    "source": "if (params.updateTimestamp > ctx._source.updateTimestamp) {for (entry in params.entrySet()) {ctx._source[entry.getKey()] = entry.getValue();}} else {ctx.op = 'noop';}",
    "params": {
      "updateTimestamp": 1674576432910
    }
  }
}
The sample script above silently skips any update whose new document has an older (or equal) timestamp; the script reads the incoming values from params. You can also throw an exception instead and handle it in your application. That way you can track the number of requests with this issue.
You can also save a similar script as a stored script and reference it in your update request. More details, including a sample HTTP request and Java code, are in this article.
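A minimal Java sketch of sending such a scripted update, assuming a node at the hypothetical address http://localhost:9200 and using only the JDK's java.net.http API rather than the OpenSearch client from the article. The class ScriptedUpdate and its methods are illustrative. It reuses the index test_index and document id 123 from the question, passes the incoming fields as params, and adds an else branch that sets ctx.op = 'noop' so the skipped case is explicitly marked as a no-op.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

/** Builds the scripted _update request that skips stale updates. */
final class ScriptedUpdate {
    // Condition from the answer, with an else branch that marks the skipped case as a no-op.
    static final String SCRIPT =
            "if (params.updateTimestamp > ctx._source.updateTimestamp) {"
            + " for (entry in params.entrySet()) { ctx._source[entry.getKey()] = entry.getValue(); }"
            + " } else { ctx.op = 'noop'; }";

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder b = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') b.append('\\').append(c);
            else if (c < 0x20) b.append(String.format("\\u%04x", (int) c));
            else b.append(c);
        }
        return b.append('"').toString();
    }

    /** {"script":{"lang":"painless","source":...,"params":{...}}}; values must be Numbers or Strings here. */
    static String body(Map<String, Object> params) {
        StringBuilder p = new StringBuilder();
        for (Map.Entry<String, Object> e : params.entrySet()) {
            if (p.length() > 0) p.append(',');
            Object v = e.getValue();
            p.append(quote(e.getKey())).append(':')
             .append(v instanceof Number ? v.toString() : quote(String.valueOf(v)));
        }
        return "{\"script\":{\"lang\":\"painless\",\"source\":" + quote(SCRIPT)
                + ",\"params\":{" + p + "}}}";
    }

    /** POST /<index>/_update/<id>; the request is only built here, not sent. */
    static HttpRequest request(String node, String index, String id, Map<String, Object> params) {
        return HttpRequest.newBuilder(URI.create(node + "/" + index + "/_update/" + id))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(params)))
                .build();
    }
}
```

Send it with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); when the stored document is newer, the response's result field should read noop.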
You should use the "if_seq_no" and "if_primary_term" parameters to perform optimistic concurrency control.
To solve your problem, you could first retrieve the existing document from OpenSearch using the document ID, and check the update timestamp field. If the existing timestamp is newer than the one in the update request, you can skip the update. Otherwise, you can include the "if_seq_no" and "if_primary_term" parameters in your update request, along with the updated document. The "if_seq_no" parameter should be set to the sequence number of the existing document, and the "if_primary_term" parameter should be set to the primary term of the existing document.
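The read-check-write flow described above can be sketched as follows, assuming the caller has already fetched the document (GET /<index>/_doc/<id> returns _seq_no and _primary_term alongside _source) and extracted the stored timestamp. The record Existing, the class ConditionalWrite and the node address are hypothetical. If another writer changes the document between the read and the write, the conditional update fails with a version conflict (HTTP 409), and the caller can re-read and retry.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Optional;

/** Read-check-write with optimistic concurrency control via if_seq_no / if_primary_term. */
final class ConditionalWrite {
    /** Values taken from a prior GET /<index>/_doc/<id>; the record itself is hypothetical. */
    record Existing(long seqNo, long primaryTerm, long updateTimestamp) {}

    /**
     * Returns empty when the stored document is at least as new as the incoming one (skip),
     * otherwise an _update request that only succeeds if nobody wrote the document since the read.
     */
    static Optional<HttpRequest> guardedUpdate(String node, String index, String id,
                                               Existing existing, long incomingTimestamp,
                                               String partialDocJson) {
        if (existing.updateTimestamp() >= incomingTimestamp) {
            return Optional.empty();
        }
        URI uri = URI.create(node + "/" + index + "/_update/" + id
                + "?if_seq_no=" + existing.seqNo()
                + "&if_primary_term=" + existing.primaryTerm());
        String body = "{\"doc\":" + partialDocJson + "}"; // partial document to merge
        return Optional.of(HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build());
    }
}
```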
You can use the Update API for this... or the Optimistic Concurrency Control (OCC) mechanism, which is based on a combination of _seq_no and _primary_term fields.

Gremlin: getting json response in Java with gremlin-driver

I have the following query:
g
.V("user-11")
.repeat(bothE().subgraph("subGraph").outV())
.times(2)
.cap("subGraph")
.next()
When I run it using gremlin-python, I receive the following response:
{'#type': 'tinker:graph',
'#value': {'vertices': [v[device-3], v[device-1], v[user-11], v[card-1]],
'edges': [e[68bad734-db2b-bffc-3e17-a0813d2670cc][user-11-uses_device->device-1],
e[14bad735-2b70-860f-705f-4c0b769a7849][user-11-uses_device->device-3],
e[f0bb3b6d-d161-ec60-5e6d-068272297f24][user-11-uses_card->card-1]]}}
This is a GraphSON representation of the subgraph obtained by the query.
I want to get the same response using Java and gremlin-driver, but I haven't been able to figure out how.
My best try was:
ObjectMapper mapper = GraphSONMapper.build().version(GraphSONVersion.V3_0).create().createMapper();
Object a = graphTraversalSource
.V(nodeId)
.repeat(bothE().subgraph("subGraph").outV())
.times(2)
.cap("subGraph")
.next();
return mapper.writeValueAsString(a);
But that gave me the following error:
io.netty.handler.codec.DecoderException: org.apache.tinkerpop.gremlin.driver.ser.SerializationException: org.apache.tinkerpop.shaded.kryo.KryoException: Encountered unregistered class ID: 65536
I am using AWS Neptune, but I doubt that makes a difference given that I receive the answer I want through gremlin-python.
I appreciate any help you can give! Thanks
As mentioned in the comments:
When using Java, what you get back is an actual TinkerGraph.
Using the GraphBinary or GraphSON 3.0 serializer is recommended.
The Gryo serializer is older and is likely the cause of the error you saw, if you did not specify one of the other serializers.
Note that even if you use one of the other serializers, to serialize the graph into JSON you will need to use the specific TinkerGraph serializer (see the end of this answer for an example). Otherwise you will just get {} returned.
However, with the Java Gremlin client you may not need to produce JSON at all.
Because you have an actual TinkerGraph back, you can run real Gremlin queries against the in-memory subgraph: just create a new traversal source for it. You can also use the graph.io classes to write the graph to a file should you wish to. The TinkerGraph includes properties as well as edges and vertices.
You can also access the TinkerGraph object directly (after casting it to TinkerGraph) using methods such as
((TinkerGraph) a).vertices() and ((TinkerGraph) a).edges()
As a concrete example, if you have a query of the form
TinkerGraph tg = (TinkerGraph)g.V().bothE().subgraph("sg").cap("sg").next();
Then you can do
GraphTraversalSource g2 = tg.traversal();
Long cv = g2.V().count().next();
Long ce = g2.E().count().next();
Or you can just access the TinkerGraph data structure directly using statements of the form:
Vertex v = tg.vertices(<some-id>).next();
Or
Iterator<VertexProperty<Object>> properties = tg.vertices(<some-id>).next().properties();
This actually means you have a lot more power available to you in the Java client when working with subgraphs.
If you still feel that you need a JSON version of your subgraph, the IO reference is a handy bookmark to have: https://tinkerpop.apache.org/docs/3.4.9/dev/io/#_io_reference
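EDIT: to save you a lot of reading of the docs, this code serializes a TinkerGraph to a JSON string:
import org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONMapper;
import org.apache.tinkerpop.gremlin.structure.io.graphson.GraphSONVersion;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerIoRegistryV3d0;
import org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper;
ObjectMapper mapper = GraphSONMapper.build().addRegistry(TinkerIoRegistryV3d0.instance()).version(GraphSONVersion.V3_0).create().createMapper();
String json = mapper.writeValueAsString(tg); // may throw JsonProcessingException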

Elasticsearch: send JSON query string via Java client?

I am new to Elasticsearch. I have read about Elasticsearch's Java client APIs and am able to build a query and send it to the Elasticsearch server via the transport client.
My query is quite complex, with multi-level filters, and I notice that it is cumbersome to build it via the Java client. I feel that it is much simpler to build a JSON query string and then send it to the Elasticsearch server via the Java client.
Is this something Elasticsearch offers?
I do like what the Elasticsearch Java API can do after receiving results, such as scrolling over them. I want to keep these features.
Thanks for any input and links!
Regards.
I did further research on the Elasticsearch API and found that Elasticsearch does offer this capability. Here is how:
SearchResponse scrollResp = client.prepareSearch("my-index")
.setTypes("my-type")
.setSearchType(SearchType.SCAN)
.setQuery(query) // <-- query string in JSON format
.execute().actionGet();
In newer versions you can no longer pass a string to .setQuery; however, you can use a WrapperQueryBuilder like this:
WrapperQueryBuilder builder = QueryBuilders.wrapperQuery(searchQuery);
SearchRequestBuilder sr = client.prepareSearch().setIndices(index).setTypes(mapping).setQuery(builder);
I'd recommend using the Java API; it is very good once you get used to it, and in most cases it is less cumbersome. If you look through the Elasticsearch source code, you will see that the Java API builds the JSON under the hood. Here is an example from MatchAllQueryBuilder:
@Override
public void doXContent(XContentBuilder builder, Params params) throws IOException {
builder.startObject(MatchAllQueryParser.NAME);
if (boost != -1) {
builder.field("boost", boost);
}
if (normsField != null) {
builder.field("norms_field", normsField);
}
builder.endObject();
}
Elasticsearch has built-in capabilities to do exactly what you need, in an organized manner.
To answer your question, please see this link (the material is gone from Elastic's site, so this is an archived copy and might no longer work):
https://web.archive.org/web/20150906215934/https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/search.html
All you have to do is build a simple file that contains your search template, i.e. your complex search query.
It can be a simple JSON file or a text file.
Then you simply pass in your parameters through your Java code.
See the example in the link, it makes things amply clear.
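For comparison, here is a sketch of running a stored template over plain REST instead of the Java client from the archived page, assuming Elasticsearch 6 or later, where POST /<index>/_search/template takes a body with the template id and its params. The template id my-search-template, the parameter query_string, the index name and the node address are hypothetical; the template itself would have been stored beforehand (for example with PUT _scripts/my-search-template and a Mustache source).

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Map;

/** Runs a stored search template by id, passing only the parameters from Java. */
final class TemplateSearch {
    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder b = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') b.append('\\').append(c);
            else if (c < 0x20) b.append(String.format("\\u%04x", (int) c));
            else b.append(c);
        }
        return b.append('"').toString();
    }

    /** {"id":"<templateId>","params":{...}}; parameter values are strings in this sketch. */
    static String body(String templateId, Map<String, String> params) {
        StringBuilder p = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (p.length() > 0) p.append(',');
            p.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
        }
        return "{\"id\":" + quote(templateId) + ",\"params\":{" + p + "}}";
    }

    /** POST /<index>/_search/template; the request is only built here, not sent. */
    static HttpRequest request(String node, String index, String templateId, Map<String, String> params) {
        return HttpRequest.newBuilder(URI.create(node + "/" + index + "/_search/template"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(templateId, params)))
                .build();
    }
}
```

The complex query lives once in the stored template; the Java side only supplies the parameter values.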
Bhargav.

PHP how to consume SOAP web services?

I'm very new to using web services. I'd appreciate it if anyone can help me with this.
In my PHP code, I'm trying to use the SOAP web services of another server (JIRA, Java). The JIRA SOAP API is documented here.
$jirasoap = new SoapClient($jiraserver['url']);
$token = $jirasoap->login($jiraserver['username'], $jiraserver['password']);
$remoteissue = $jirasoap->getIssue($token, "issuekey");
I found that my code has no problem calling the functions listed on that page. However, I don't know how to use the objects returned by the API calls.
My questions are:
In my PHP code, how can I use the methods of the Java class objects returned by SOAP API calls?
For example, the function $remoteissue = $jirasoap->getIssue($a, $b) will return a RemoteIssue. Based on this (http://docs.atlassian.com/rpc-jira-plugin/latest/com/atlassian/jira/rpc/soap/beans/RemoteIssue.html), there are methods like getSummary, getKey, etc. How can I use these functions in my codes?
Based on some PHP examples I found on the internet, it seems that everyone is using something like this:
$remoteissue = $jirasoap->getIssue($token, "issuekey");
$key = $remoteissue->key;
They are not using the object's methods.
Judging by this example, someone is able to do this in other languages. Can it be done in PHP too?
The problem I'm facing is that I am trying to get the ID of an attachment. However, it seems that we can't get the attachment ID this way: $attachmentid = $remoteattachment->id;. I am trying to use the $remoteattachment->getId() method instead.
In PHP code, after we make a SOAP API call and receive the returned objects, how do we know which data fields are available in that object?
For example,
$remoteissue = $jirasoap->getIssue($token, "issuekey");
$summary = $remoteissue->summary;
How do we know ->summary is available in $remoteissue?
When I refer to this document (http://docs.atlassian.com/rpc-jira-plugin/latest/com/atlassian/jira/rpc/soap/beans/RemoteIssue.html), I don't see any mention of data fields in RemoteIssue. How do we know we can get key, summary, etc. from this object? How do we know it is ->summary, not ->getsummary? Do we need to open the WSDL URL in a web browser?
Thanks.
This question is over one year old, but to share knowledge and provide an answer to people who have this same question and found this page, here are my findings.
The document mentioned in the question is an overview of the JiraSoapService interface. This is a good reference for what functions can be called with which arguments and what they return.
If you use Java for your Jira SOAP client, the returned objects are instances of the documented classes, but if you use PHP, the returned objects aren't of the types stated in this documentation and do not have any of the methods mentioned. The returned objects are instances of the internal PHP class stdClass, which is a placeholder for undefined objects. The best way to know what is returned is to use var_dump() on the objects returned from the SOAP calls.
$jirasoap = new SoapClient($jiraserver['url']);
$token = $jirasoap->login($jiraserver['username'], $jiraserver['password']);
$remoteissue = $jirasoap->getIssue($token, "PROJ-1");
var_dump($remoteissue);
/* -- You will get something like this ---
object(stdClass)#2 (21) {
["id"]=> string(3) "100"
["affectsVersions"]=> array(0) { }
["assignee"]=> string(4) "user"
...
["created"]=> string(24) "2012-12-13T09:27:49.934Z"
...
["description"]=> string(17) "issue description"
....
["key"]=> string(6) "PROJ-1"
["priority"]=> string(1) "3"
["project"]=> string(4) "PROJ"
["reporter"]=> string(4) "user"
["resolution"]=> NULL
["status"]=> string(1) "1"
["summary"]=> string(15) "Project issue 1"
["type"]=> string(1) "3"
["updated"]=> string(24) "2013-01-21T16:11:43.073Z"
["votes"]=> int(0)
}
*/
// You can access data like this:
$jiraKey = $remoteissue->key;
$jiraProject = $remoteissue->project;
The document you referred to in #2 describes a Java implementation and really doesn't give you any help with PHP. If they do not publish a public API for their service (which would be unusual), then using the WSDL as a reference will tell you which objects and methods the service accepts, and you can plan your method calls accordingly.
The technique you used to call getIssue(...) seems fine, although you should consider wrapping it in try...catch in case a SoapFault (the exception PHP's SoapClient throws) is raised.
I have used Jira SOAP in a .NET project, and IntelliSense showed me which fields are available on the returned objects.
If you are using Visual Studio, you can use something like VS.Php for Visual Studio or Php for Visual Studio.
Or you can choose one of the IDEs from here that support IntelliSense.
