JSON Path conditional index of element based on value of another field - java

I want to select an array element based on the value of another field (not part of the array):
{
  "array": [
    {
      "id": 1
    },
    {
      "id": 2
    }
  ],
  "conditionalField": "ab"
}
I want to select the array element based on the value of $.conditionalField. I need something like this:
$.array[($.conditionalField == "ab" ? 0 : 1)]
Does JSONPath support this? I am using Jayway JSONPath.

Unfortunately, this cannot be done in JSONPath as of now. Generally, current implementations lack the fluency of XPath 1.0 (let alone v2.0/3+).
A (well known) trick for mimicking a conditional (if-else) in a language like XPath 1.0, which does not come with such a feature, is "Becker's method": concatenate two mutually exclusive strings, where one of the two strings becomes empty depending on a condition. In (XPath) pseudo-code:
concat(
  substring('foo', 1, number(true) * string-length('foo')),
  substring('bar', 1, number(not(true)) * string-length('bar'))
)
We might be able to do this using JSONPath in JavaScript, leveraging script eval; I think it should look like this (split up for better readability):
$[?(@.conditionalField && @.array[0].id && @.array[1].id && @.temp.concat(
    @.array[0].id.substring(0, (@.conditionalField === 'ab') * @.array[0].id.length()) )
  .concat( @.array[1].id.substring(0, (@.conditionalField === 'xy') * @.array[1].id.length()) )
)]
but it does not work. So, as of now, there seems to be no way to mimic an if-else construct in JSONPath directly.
A practical approach is to select both properties separately and make the decision in your host environment:
$[?(@.conditionalField == 'ab')].array[0].id
$[?(@.conditionalField == 'xy')].array[1].id
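For completeness, here is a minimal Java sketch of that host-side decision, assuming Jayway JsonPath is on the classpath (the index mapping "ab" → 0, else 1, is taken from the question):
import com.jayway.jsonpath.DocumentContext;
import com.jayway.jsonpath.JsonPath;

// Read the conditional field first, then compute the index in Java.
DocumentContext ctx = JsonPath.parse(json);   // json = the document above
String cond = ctx.read("$.conditionalField");
int index = "ab".equals(cond) ? 0 : 1;        // mapping from the question
Integer id = ctx.read("$.array[" + index + "].id");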

Try something like this:
$.[?(#.conditionalField=="ab")].array[1].id
Output should be:
[
2
]
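For reference, evaluating that path with Jayway JsonPath from Java looks like this (a sketch; since a filter is an indefinite path, the result always comes back as a list):
import java.util.List;
import com.jayway.jsonpath.JsonPath;

// A filter is an indefinite path, so Jayway returns a list even for one hit.
List<Integer> ids = JsonPath.read(json, "$[?(@.conditionalField == 'ab')].array[1].id");
System.out.println(ids);   // [2]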


How to get value with an underscore inside a string from Elasticsearch using QueryBuilder in Java?

I'm using Elasticsearch 3.2.7 and ElasticsearchRepository.search(), which takes a QueryBuilder as an argument (doc).
I have a BoolQueryBuilder and use it like this:
boolQuery.must(termQuery("myObject.code", value));
var results = searchRepository.search(boolQuery);
The definition of the field code is as follows:
"myObject": {
"properties": {
"code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
The issue is, when I search with a value that has an underscore inside, for example FOO_BAR, it doesn't return any results. When I search with values that have only a leading or trailing underscore, it works fine.
I've read that ES may ignore special characters and split words on them, so an exact-match search is needed. But I also read that the keyword setting guarantees exactly that. So right now I'm confused.
Yes, you are correct: using the keyword sub-field you can achieve an exact match. You need to use the query below:
boolQuery.must(termQuery("myObject.code.keyword", value)); // note the addition of .keyword
var results = searchRepository.search(boolQuery);
You can use the _analyze API to see the tokens for your indexed documents and for your search term; the tokens in the index must match the search term's tokens in order for ES to return a match :)

MongoDB TextCriteria split on specific characters [duplicate]

Example:
> db.stuff.save({"foo":"bar"});
> db.stuff.find({"foo":"bar"}).count();
1
> db.stuff.find({"foo":"BAR"}).count();
0
You could use a regex.
In your example that would be:
db.stuff.find( { foo: /^bar$/i } );
I must say, though, maybe you could just downcase (or upcase) the value on the way in rather than incurring the extra cost every time you query it. Obviously this won't work for people's names and such, but it may be fine for use cases like tags.
UPDATE:
The original answer is now obsolete. Mongodb now supports advanced full text searching, with many features.
ORIGINAL ANSWER:
It should be noted that searching with a case-insensitive regex (/i) means that MongoDB cannot search by index, so queries against large datasets can take a long time.
Even with small datasets, it's not very efficient. You take a far bigger CPU hit than your query warrants, which could become an issue if you are trying to achieve scale.
As an alternative, you can store an uppercase copy and search against that. For instance, I have a User table that has a username which is mixed case, but the id is an uppercase copy of the username. This ensures case-sensitive duplication is impossible (having both "Foo" and "foo" will not be allowed), and I can search by id = username.toUpperCase() to get a case-insensitive search for username.
If your field is large, such as a message body, duplicating data is probably not a good option. I believe using an extraneous indexer like Apache Lucene is the best option in that case.
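For Java users, here is a minimal sketch of this uppercase-copy technique with the MongoDB Java driver (collection and field names are illustrative):
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Store an uppercased copy alongside the mixed-case original, then
// query the copy for a case-insensitive, index-friendly lookup.
MongoCollection<Document> users = MongoClients.create()
        .getDatabase("test").getCollection("users");
String username = "Foo";
users.insertOne(new Document("username", username)
        .append("id", username.toUpperCase()));
Document found = users.find(
        new Document("id", "foo".toUpperCase())).first();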
Starting with MongoDB 3.4, the recommended way to perform fast case-insensitive searches is to use a Case Insensitive Index.
I personally emailed one of the founders to please get this working, and he made it happen! It was an issue on JIRA since 2009, and many have requested the feature. Here's how it works:
A case-insensitive index is made by specifying a collation with a strength of either 1 or 2. You can create a case-insensitive index like this:
db.cities.createIndex(
  { city: 1 },
  {
    collation: {
      locale: 'en',
      strength: 2
    }
  }
);
You can also specify a default collation per collection when you create them:
db.createCollection('cities', { collation: { locale: 'en', strength: 2 } } );
In either case, in order to use the case-insensitive index, you need to specify the same collation in the find operation that was used when creating the index or the collection:
db.cities.find(
  { city: 'new york' }
).collation(
  { locale: 'en', strength: 2 }
);
This will return "New York", "new york", "New york" etc.
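The same case-insensitive index and query can be built with the MongoDB Java driver; a sketch (database and collection names follow the shell example above):
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Collation;
import com.mongodb.client.model.CollationStrength;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

// strength 2 (secondary) = case-insensitive comparisons
Collation ci = Collation.builder()
        .locale("en")
        .collationStrength(CollationStrength.SECONDARY)
        .build();
MongoCollection<Document> cities = MongoClients.create()
        .getDatabase("test").getCollection("cities");
cities.createIndex(Indexes.ascending("city"),
        new IndexOptions().collation(ci));
Document nyc = cities.find(new Document("city", "new york"))
        .collation(ci)   // must match the index's collation to use it
        .first();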
Other notes
The answers suggesting to use full-text search are wrong in this case (and potentially dangerous). The question was about making a case-insensitive query, e.g. username: 'bill' matching BILL or Bill, not a full-text search query, which would also match stemmed words of bill, such as Bills, billed etc.
The answers suggesting to use regular expressions are slow, because even with indexes, the documentation states:
"Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes."
$regex answers also run the risk of user input injection.
If you need to create the regexp from a variable, this is a much better way to do it: https://stackoverflow.com/a/10728069/309514
You can then do something like:
var string = "SomeStringToFind";
var regex = new RegExp(["^", string, "$"].join(""), "i");
// Creates a regex of: /^SomeStringToFind$/i
db.stuff.find( { foo: regex } );
This has the benefit of being more programmatic, and you can get a performance boost by compiling the regex ahead of time if you're reusing it a lot.
Keep in mind that the previous example:
db.stuff.find( { foo: /bar/i } );
will cause every entry containing bar to match the query (bar1, barxyz, openbar), which could be very dangerous for a username search in an auth function...
You may need to make it match only the search term by using the appropriate regexp syntax as:
db.stuff.find( { foo: /^bar$/i } );
See http://www.regular-expressions.info/ for syntax help on regular expressions
db.company_profile.find({ "companyName" : { "$regex" : "Nilesh" , "$options" : "i"}});
db.zipcodes.find({city : "NEW YORK"}); // Case-sensitive
db.zipcodes.find({city : /NEW york/i}); // Note the 'i' flag for case-insensitivity
TL;DR
The correct way to do this in Mongo:
Do not use RegExp.
Use MongoDB's built-in text indexing and search.
Step 1:
db.articles.insert(
[
{ _id: 1, subject: "coffee", author: "xyz", views: 50 },
{ _id: 2, subject: "Coffee Shopping", author: "efg", views: 5 },
{ _id: 3, subject: "Baking a cake", author: "abc", views: 90 },
{ _id: 4, subject: "baking", author: "xyz", views: 100 },
{ _id: 5, subject: "Café Con Leche", author: "abc", views: 200 },
{ _id: 6, subject: "Сырники", author: "jkl", views: 80 },
{ _id: 7, subject: "coffee and cream", author: "efg", views: 10 },
{ _id: 8, subject: "Cafe con Leche", author: "xyz", views: 10 }
]
)
Step 2:
You need to create a text index on whichever field you want to search; without a text index, a $text query will fail.
db.articles.createIndex( { subject: "text" } )
Step 3:
db.articles.find( { $text: { $search: "coffee", $caseSensitive: true } } )  // case-sensitive search
db.articles.find( { $text: { $search: "coffee", $caseSensitive: false } } ) // case-insensitive search (the default)
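For Java users, roughly the same thing with the MongoDB Java driver, as a sketch (articles is assumed to be a MongoCollection<Document>):
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.TextSearchOptions;

// Create the text index once, then run a case-insensitive $text search.
articles.createIndex(Indexes.text("subject"));
articles.find(Filters.text("coffee",
        new TextSearchOptions().caseSensitive(false)));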
One very important thing to keep in mind when using a regex-based query: when you are doing this for a login system, escape every single character you are searching for, and don't forget the ^ and $ operators. Lodash has a nice function for this, should you be using it already:
db.stuff.find({ foo: new RegExp('^' + _.escapeRegExp(bar) + '$', 'i') })
Why? Imagine a user entering .* as his username. That would match all usernames, enabling a login by just guessing any user's password.
Suppose you want to search for "column" in "Table" and you want a case-insensitive search. The best and most efficient way is:
// create an empty JSON object
mycolumn = {};
// only add the condition if column has a valid value
if (column) {
    mycolumn.column = { $regex: new RegExp(column), $options: "i" };
}
Table.find(mycolumn);
It just adds your search value as a regex and searches case-insensitively via the "i" option.
Mongo (current version 2.0.0) doesn't allow case-insensitive searches against indexed fields - see their documentation. For non-indexed fields, the regexes listed in the other answers should be fine.
For searching a variable and escaping it:
const escapeStringRegexp = require('escape-string-regexp')
const name = 'foo'
db.stuff.find({name: new RegExp('^' + escapeStringRegexp(name) + '$', 'i')})
Escaping the variable protects the query against injection of '.*' or other regex syntax.
escape-string-regexp
The best method is, in your language of choice, when creating a model wrapper for your objects, to have your save() method iterate through a set of fields that you will be searching on and that are also indexed; those fields should have lowercase counterparts that are then used for searching.
Every time the object is saved again, the lowercase properties are checked and updated with any changes to the main properties. This lets you search efficiently while hiding the extra work needed to update the lowercase fields on each save.
The lowercase fields could be a key:value object store or just the field name with an lc_ prefix. I use the second to simplify querying (deep object querying can be confusing at times).
Note: you want to index the lc_ fields, not the main fields they are based on.
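Here is a sketch of this idea with the MongoDB Java driver (all names are illustrative, not from the original answer):
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

class UserStore {
    private final MongoCollection<Document> users;

    UserStore(MongoCollection<Document> users) {
        this.users = users;
        // index the lowercase shadow field, not the original
        users.createIndex(Indexes.ascending("lc_username"));
    }

    // On every save, refresh the shadow field from the original.
    void save(Document user) {
        user.put("lc_username", user.getString("username").toLowerCase());
        users.insertOne(user);
    }

    // Case-insensitive lookup goes through the shadow field.
    Document findByUsername(String input) {
        return users.find(new Document("lc_username", input.toLowerCase())).first();
    }
}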
Using Mongoose this worked for me:
var find = function(username, next){
User.find({'username': {$regex: new RegExp('^' + username, 'i')}}, function(err, res){
if(err) throw err;
next(null, res);
});
}
If you're using MongoDB Compass:
Go to the collection, in the filter type -> {Fieldname: /string/i}
For Node.js using Mongoose:
Model.find({FieldName: {$regex: "stringToSearch", $options: "i"}})
The aggregation framework was introduced in MongoDB 2.2. You can use the string operator $strcasecmp to make a case-insensitive comparison between strings. It's recommended and easier than using a regex.
Here's the official document on the aggregation command operator: https://docs.mongodb.com/manual/reference/operator/aggregation/strcasecmp/#exp._S_strcasecmp .
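As a sketch, here is one way to express that from Java with raw Document stages (the field name and search value are placeholders; wrapping $strcasecmp in $expr/$match requires MongoDB 3.6+):
import static java.util.Arrays.asList;

import com.mongodb.client.AggregateIterable;
import org.bson.Document;

// Match documents where "username" equals "bill" ignoring case:
// $strcasecmp returns 0 when the two strings compare as equal.
AggregateIterable<Document> matches = collection.aggregate(asList(
        new Document("$match", new Document("$expr",
                new Document("$eq", asList(
                        new Document("$strcasecmp", asList("$username", "bill")),
                        0))))));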
You can use Case Insensitive Indexes:
The following example creates a collection with no default collation, then adds an index on the name field with a case-insensitive collation (collation is based on International Components for Unicode, ICU).
/* strength: CollationStrength.Secondary
 * Secondary level of comparison. Collation performs comparisons up to
 * secondary differences, such as diacritics. That is, collation performs
 * comparisons of base characters (primary differences) and diacritics
 * (secondary differences). Differences between base characters take
 * precedence over secondary differences.
 */
db.users.createIndex( { name: 1 }, { collation: { locale: 'tr', strength: 2 } } )
To use the index, queries must specify the same collation.
db.users.insert( [ { name: "Oğuz" },
{ name: "oğuz" },
{ name: "OĞUZ" } ] )
// does not use index, finds one result
db.users.find( { name: "oğuz" } )
// uses the index, finds three results
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 2 } )
// does not use the index, finds three results (different strength)
db.users.find( { name: "oğuz" } ).collation( { locale: 'tr', strength: 1 } )
or you can create a collection with default collation:
db.createCollection("users", { collation: { locale: 'tr', strength: 2 } } )
db.users.createIndex( { name : 1 } ) // inherits the default collation
I'm surprised nobody has warned about the risk of regex injection when using /^bar$/i if bar is a password or an account-id search (e.g. bar => .*@myhackeddomain.com). My suggestion: use the \Q \E regex special characters provided in Perl-compatible regex engines:
db.stuff.find( { foo: /^\Qbar\E$/i } );
You should also escape the bar variable's \ characters with \\ to avoid an \E exploit, e.g. when bar = '\E.*@myhackeddomain.com\Q'.
Another option is to use a regex-escape strategy like the one described here: Javascript equivalent of Perl's \Q ... \E or quotemeta()
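For Java users, this \Q ... \E wrapping is exactly what java.util.regex.Pattern.quote does, and it handles an embedded \E safely:
import java.util.regex.Pattern;

// Pattern.quote returns a \Q...\E-wrapped literal, neutralizing any
// metacharacters (and any \E sequences) inside the user input.
String bar = ".*@myhackeddomain.com";
Pattern p = Pattern.compile("^" + Pattern.quote(bar) + "$",
        Pattern.CASE_INSENSITIVE);
System.out.println(p.matcher("joe@example.com").matches()); // false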
Use RegExp.
In case the other options do not work for you, RegExp is a good option. It makes the string case-insensitive:
var username = new RegExp("^" + "John" + "$", "i");
Use username in your queries, and then it's done.
I hope it will work for you too. All the best.
If there are special characters in the query, a plain regex simply will not work. You will need to escape those special characters.
The following helper function can help without installing any third-party library:
const escapeSpecialChars = (str) => {
return str.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, "\\$&");
}
And your query will be like this:
db.collection.find({ field: { $regex: escapeSpecialChars(query), $options: "i" }})
Hope it will help!
Using a filter works for me in C#.
string s = "searchTerm";
var filter = Builders<Model>.Filter.Where(p => p.Title.ToLower().Contains(s.ToLower()));
var list = collection.Find(filter).ToList();
It may even use the index because I believe the methods are called after the return happens but I haven't tested this out yet.
This also avoids the problem of
var filter = Builders<Model>.Filter.Eq(p => p.Title.ToLower(), s.ToLower());
where MongoDB will think p.Title.ToLower() is a property and won't map it properly.
I had faced a similar issue and this is what worked for me:
const flavorExists = await Flavors.findOne({
'flavor.name': { $regex: flavorName, $options: 'i' },
});
Yes, it is possible.
You can use $expr like this:
db.collection.find({
  $expr: {
    $eq: [
      { $toLower: '$STRING_KEY' },
      { $toLower: 'VALUE' }
    ]
  }
})
Please do not use a regex, because it can cause a lot of problems, especially if the string is coming from the end user.
I've created a simple Func for the case insensitive regex, which I use in my filter.
private Func<string, BsonRegularExpression> CaseInsensitiveCompare = (field) =>
BsonRegularExpression.Create(new Regex(field, RegexOptions.IgnoreCase));
Then you simply filter on a field as follows.
db.stuff.find({"foo": CaseInsensitiveCompare("bar")}).count();
These have been tested for string searches:
{'_id': /.*CM.*/}          // find where _id contains "CM"
{'_id': /^CM/}             // find where _id starts with "CM"
{'_id': /CM$/}             // find where _id ends with "CM"
{'_id': /.*UcM075237.*/i}  // contains "UcM075237", ignoring case
{'_id': /^UcM075237/i}     // starts with "UcM075237", ignoring case
{'_id': /UcM075237$/i}     // ends with "UcM075237", ignoring case
For anyone using Golang who wishes to have case-insensitive search with MongoDB and the globalsign mgo library:
collation := &mgo.Collation{
    Locale:   "en",
    Strength: 2,
}
// e.g. collect the matches; error handling elided
err := collection.Find(query).Collation(collation).All(&results)
As you can see in the Mongo docs, since version 3.2 the $text index is case-insensitive by default: https://docs.mongodb.com/manual/core/index-text/#text-index-case-insensitivity
Create a text index and use the $text operator in your query.

Need a regex to find a specific number of a specific attribute from JSON

In JMeter, I have a Regular Expression Extractor to extract the customerId from the response JSON, in order to use that customerId in subsequent requests.
I have following json:
{
  "customerList": {
    "Customer": {
      "customerName": "Test1",
      "id": "0215236",
      "customerContactNo": "655659856"
    },
    "Customer": {
      "customerName": "Test2",
      "id": "99925236",
      "customerContactNo": "7458622"
    },
    "Customer": {
      "customerName": "Test3",
      "id": "1521865",
      "customerContactNo": "7984443613"
    }
  },
  "productList": {
    "product": {
      "productName": "TestProduct1",
      "id": "0215236"
    },
    "product": {
      "productName": "TestProduct2",
      "id": "452698"
    },
    "product": {
      "productName": "TestProduct3",
      "id": "14567892"
    }
  }
}
I want to extract the customerId of each customer using a regex.
I am trying following regex:
\"customer\":\{.+\"customerId\":\"([0-9]+)\"
It's not the best idea to use a regex for field extraction from JSON. You should consider using a native/built-in JSON parser for your language. Since you haven't mentioned which language you are using, here is a link where you can check a list of tools for your language (at the bottom of the page). Also, as @Andy G mentioned, this is not a valid format.
Why don't you just go for the "customerId":"(\d+)" regular expression?
References:
JMeter Regular Expressions
Perl 5 Regex Cheat sheet
Using RegEx (Regular Expression Extractor) with JMeter
Be aware that most JSON implementations don't allow duplicate keys, therefore it might be a bug in your application that there are multiple Customer objects on the same level. As per the JSON specification:
An object is an unordered collection of zero or more name/value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
where "unordered" implies that key names are unique. I would recommend raising this point with your application developers; if it is considered a bug and fixed, you will be able to use the JSON Extractor instead of regular expressions.
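Once the response is fixed so that Customer is an array, extraction becomes a one-liner in any JSON-aware tool; for example, a sketch with Jayway JsonPath in Java (the fixed structure shown in the comment is an assumption):
import java.util.List;
import com.jayway.jsonpath.JsonPath;

// Assumes the duplicate "Customer" keys were turned into an array:
// { "customerList": { "Customer": [ { "id": "0215236", ... }, ... ] } }
List<String> ids = JsonPath.read(fixedJson, "$.customerList.Customer[*].id");
// ids => ["0215236", "99925236", "1521865"]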
This JSON is invalid; check it at https://jsonchecker.com/ and you'll get:
Duplicate key 'Customer' on line 8
Same at https://jsonformatter.curiousconcept.com/:
Warning:Duplicate key, names should be unique.[Code 23, Structure 21]
Warning:Duplicate key, names should be unique.[Code 23, Structure 37]
Warning:Duplicate key, names should be unique.[Code 23, Structure 69]
Warning:Duplicate key, names should be unique.[Code 23, Structure 81]
So fix the JSON before trying such extraction.

JsonPath Fetch string after filter

I am using Jayway JsonPath 2.2 in Java. I have a few questions about it.
Sample JSON:
{
  "l5": [
    {
      "type": "type1",
      "result": [
        "res1"
      ]
    },
    {
      "type": "type2",
      "result": [
        "r1"
      ]
    }
  ]
}
I want to fetch a string after a filter is applied.
Eg:
The path used to fetch the string is l5[?(@.type == 'type2')].type
Expected result: type2 (a string), but I am getting ["type2"] (an array).
Please correct the path if I am doing something wrong in fetching it as a string.
I am unable to index the resultant array after the filter is applied. How can I achieve this?
Eg:
If I use the path l5[?(@.type == 'type2')][0], instead of returning the first JSONObject it returns []
Is it possible to extract a substring from a string using JsonPath?
Something like l5[0].type[0,2] => res
Note: I cannot do any post-processing in Java, as I want to keep the library usage generic.
The return value of jsonPath is always an array (quoted from here).
So, operation 1 you are trying is not possible, i.e. a String value will never be returned.
Operations 2 and 3 don't look feasible with Jayway JsonPath either.
A good read on Jayway JsonPath.
The only possible way to do operations 1 and 2 looks something like this:
String jsonpath = "l5[?(@.type == 'type2')].type";
DocumentContext jsonContext = JsonPath.parse(jsonString);
List<String> typeDataList = jsonContext.read(jsonpath);
System.out.println(typeDataList.get(0)); // type2

Retrieve data from Elasticsearch using aggregations where the values contains hyphen

I have been working with Elasticsearch for quite some time now, and I have been facing a problem recently.
I want to group by a particular field in an Elasticsearch index. The values of that field contain hyphens and other special characters.
SearchResponse res1 = client.prepareSearch("my_index")
.setTypes("data")
.setSearchType(SearchType.QUERY_AND_FETCH)
.setQuery(QueryBuilders.rangeQuery("timestamp").gte(from).lte(to))
.addAggregation(AggregationBuilders.terms("cat_agg").field("category").size(10))
.setSize(0)
.execute()
.actionGet();
Terms termAgg = res1.getAggregations().get("cat_agg");
for (Bucket item : termAgg.getBuckets()) {
    cat_number = item.getKey();
    System.out.println(cat_number + " " + item.getDocCount());
}
This is the query I have written in order to get the data grouped by the "category" field in "my_index".
The output I expected after running the code is:
category-1 10
category-2 9
category-3 7
But the output I am getting is :
category 10
1 10
category 9
2 9
category 7
3 7
I have already gone through some questions like this one, but couldn't solve my issue with those answers.
That's because your category field has a default string mapping and is analyzed, hence category-1 gets tokenized as two tokens, namely category and 1, which explains the results you're getting.
In order to prevent this, you can update your mapping to include a sub-field category.raw which is going to be not_analyzed with the following command:
curl -XPUT localhost:9200/my_index/data/_mapping -d '{
  "properties": {
    "category": {
      "type": "string",
      "fields": {
        "raw": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}'
After that, you need to re-index your data and your aggregation will work and return you what you expect.
Just make sure to change the following line in your Java code (note the added .raw):
.addAggregation(AggregationBuilders.terms("cat_agg").field("category.raw").size(10))
When you index "category-1" you will get (by default) two terms, "category", and "1". Therefore when you aggregate you will get back two results for that.
If you want it to be considered a single term, then you need to change the analyzer used on that field when indexing: set it to use the keyword analyzer.
