Spring-data-elasticsearch search for specific fields in multiple indices

I am trying to search specific fields in multiple indices. I have two indices, country and region, and both have a field called name.
I am able to specify the field name and the indices in my query using elasticsearchTemplate:
@Override
public Page<SearchHit> searchAllTest(String text) {
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            .should(QueryBuilders.queryStringQuery(text).field("name"));
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .withIndices("region", "country").build();
    ResultsExtractor<Page<SearchHit>> rs = new ResultsExtractor<Page<SearchHit>>() {
        @Override
        public Page<SearchHit> extract(SearchResponse response) {
            List<SearchHit> hits = Arrays.asList(response.getHits().getHits());
            return new PageImpl<SearchHit>(hits, PageRequest.of(0, 10), response.getHits().getTotalHits());
        }
    };
    return elasticsearchTemplate.query(nativeSearchQuery, rs);
}
This code works and searches the name field in both indices. However, I would like to treat the name field differently per index, for example by boosting it in region only.
In simple words:
If the name field belongs to index region, it gets a boost.
If the name field belongs to index country, it gets no boost.
Is there a way to specify a field for a particular index?

Try the withIndicesBoost method:
NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
        .withQuery(queryBuilder)
        .withIndicesBoost(Arrays.asList(new IndexBoost[] {new IndexBoost("region", 2.0f)}))
        .withIndices("region", "country").build();

Related

How to do pagination with DynamoDBMapper?

I'm developing an application in Quarkus that integrates with the DynamoDB database. I have a query method that returns a list and I'd like this list to be paginated, but it would have to be done manually by passing the parameters.
I chose to use DynamoDBMapper because it gives more possibilities to work with lists of objects and the level of complexity is lower.
Does anyone have any idea how to do this pagination manually in the function?

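DynamoDBScanExpression scanExpression = new DynamoDBScanExpression()
        .withLimit(pageSize)
        // Map<String, AttributeValue>; pass null to start from the first page
        .withExclusiveStartKey(paginationToken);
// scanPage returns a single page; the lazy list returned by scan() does not expose the last evaluated key
ScanResultPage<YourModel> result = mapper.scanPage(YourModel.class, scanExpression);
Map<String, AttributeValue> nextPaginationToken = result.getLastEvaluatedKey();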
You can pass the pageSize and paginationToken as parameters to your query method. The nextPaginationToken can be returned along with the results, to be used for the next page.
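The token contract described above can be tried out without the AWS SDK. The following is only an in-memory sketch: a TreeMap stands in for the table, and TokenPaging, Page, fetchPage and sampleTable are hypothetical names that are not part of DynamoDBMapper. It shows the shape of "pass the previous token in, get the next token out". One simplification to be aware of: this sketch returns a null token as soon as no rows remain, whereas with DynamoDB you should simply keep going until the returned key is null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory stand-in for a table scanned page by page (not the AWS SDK).
final class TokenPaging {

    // One page of results plus the token to pass to the next call (null = no more pages).
    static final class Page {
        final List<String> items;
        final String nextToken;

        Page(List<String> items, String nextToken) {
            this.items = items;
            this.nextToken = nextToken;
        }
    }

    // Returns up to pageSize values whose keys come strictly after exclusiveStartKey.
    static Page fetchPage(NavigableMap<String, String> table, String exclusiveStartKey, int pageSize) {
        NavigableMap<String, String> remaining = (exclusiveStartKey == null)
                ? table
                : table.tailMap(exclusiveStartKey, false);
        List<String> items = new ArrayList<>();
        String lastKey = null;
        for (Map.Entry<String, String> entry : remaining.entrySet()) {
            if (items.size() == pageSize) {
                break;
            }
            items.add(entry.getValue());
            lastKey = entry.getKey();
        }
        // Hand out a token only when rows remain after the last one returned.
        boolean more = lastKey != null && table.higherKey(lastKey) != null;
        return new Page(items, more ? lastKey : null);
    }

    // Hypothetical sample data: five rows keyed k1..k5.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("k1", "Ana");
        table.put("k2", "Ben");
        table.put("k3", "Cy");
        table.put("k4", "Dee");
        table.put("k5", "Eli");
        return table;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        String token = null;
        do {
            Page page = fetchPage(table, token, 2);
            System.out.println(page.items + " next=" + page.nextToken);
            token = page.nextToken;
        } while (token != null);
    }
}
```

In a real endpoint the caller would send the token back with the next request instead of looping, which is the same idea as returning nextPaginationToken along with the results.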

DynamoDB Mapper paginates by iterating over the results, by lazily loading the dataset:
By default, the scan method returns a "lazy-loaded" collection. It initially returns only one page of results, and then makes a service call for the next page if needed. To obtain all the matching items, iterate over the result collection.
For example:
List<Customer> result = mapper.scan(Customer.class, scanExpression);
for (Customer cust : result) {
    System.out.println(cust.getId());
}
To scan manually page by page, you can use scanPage:
final DynamoDBScanExpression scanPageExpression = new DynamoDBScanExpression()
        .withLimit(limit);
do {
    ScanResultPage<MyClass> scanPage = mapper.scanPage(MyClass.class, scanPageExpression);
    scanPage.getResults().forEach(System.out::println);
    System.out.println("LastEvaluatedKey=" + scanPage.getLastEvaluatedKey());
    // Continue from where this page stopped; a null key means the scan is complete
    scanPageExpression.setExclusiveStartKey(scanPage.getLastEvaluatedKey());
} while (scanPageExpression.getExclusiveStartKey() != null);

Jest sort results by name

I have a Person index in my Elasticsearch database. I get all the persons via this method:
public List<Person> findAll() {
    SearchResult result = null;
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    Search search = new Search.Builder(searchSourceBuilder.toString()).addIndex(PERSON_INDEX_NAME)
            .addType(PERSON_TYPE_NAME).build();
    try {
        result = client.execute(search);
    } catch (IOException e) {
        // Fail loudly instead of swallowing the error and hitting a NullPointerException below
        throw new IllegalStateException("Search failed", e);
    }
    List<SearchResult.Hit<Person, Void>> hits = result.getHits(Person.class);
    return hits.stream().map(this::getPerson).collect(Collectors.toList());
}
But I want the results sorted alphabetically by name (a person has a String id and a String name), and I can't figure out how.
Any help is appreciated.

What type is your name field?
The problem may be that Elasticsearch splits the name into words and can then sort on any of those words, which can give some pretty random-looking results (e.g. the name "Zachary A. Zincstein" would sort near the top, because it contains an "A").
A solution is to add a second field, set its type to keyword in the mapping, and sort on that.

Well, in the end I took the list returned by Elasticsearch and sorted it using the Collections sort method.
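For reference, client-side sorting along those lines could look roughly like the sketch below. The nested Person class is a hypothetical stand-in for the one in the question (String id, String name), and the case-insensitive ordering is an assumption about what "alphabetically" should mean here. Keep in mind that this only orders the hits that were actually returned, so it is not a substitute for sorting in the query when the result set is larger than one page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class PersonSorting {

    // Hypothetical stand-in for the question's Person class.
    static final class Person {
        private final String id;
        private final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }

        String getId() {
            return id;
        }

        String getName() {
            return name;
        }
    }

    // Returns a new list ordered by name, ignoring case; the input list is left untouched.
    static List<Person> sortByName(List<Person> persons) {
        List<Person> sorted = new ArrayList<>(persons);
        sorted.sort(Comparator.comparing(Person::getName, String.CASE_INSENSITIVE_ORDER));
        return sorted;
    }

    public static void main(String[] args) {
        List<Person> persons = Arrays.asList(
                new Person("1", "zoe"),
                new Person("2", "Adam"),
                new Person("3", "bob"));
        for (Person person : sortByName(persons)) {
            System.out.println(person.getId() + " " + person.getName());
        }
    }
}
```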

Creating Pagination in Spring Data JPA

I am trying to implement a pagination feature in Spring Data JPA.
I am following this blog.
My controller contains the following code:
@RequestMapping(value="/organizationData", method = RequestMethod.GET)
public String list(Pageable pageable, Model model){
    Page<Organization> members = this.OrganizationRepository.findAll(pageable);
    model.addAttribute("members", members.getContent());
    int nrOfPages = members.getTotalPages();
    model.addAttribute("maxPages", nrOfPages);
    return "members/list";
}
My DAO is as follows:
@Query(value="select m from Member m", countQuery="select count(m) from Member m")
Page<Organization> findMembers(Pageable pageable);
I am able to show the first 20 records. How do I show the next 20?
Is there any other pagination example that I can refer to?

The constructors of PageRequest are deprecated; use PageRequest.of() instead:
Pageable pageable = PageRequest.of(0, 20);

I've seen similar problem last week, but can't find it so I'll answer directly.
Your problem is that you specify the parameters too late. Pageable works the following way: you create Pageable object with certain properties. You can at least specify:
Page size,
Page number,
Sorting.
So let's assume that we have:
PageRequest p = new PageRequest(2, 20);
Passed to the query, this returns only the 41st to 60th results, because page numbers are zero-based (page 0 holds results 1 to 20).
You don't apply Pageable on result. You pass it with the query.
Edit:
Constructors of PageRequest are deprecated. Use Pageable pageable = PageRequest.of(2, 20);
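To make the zero-based page arithmetic concrete, here is a small plain-Java sketch with no Spring dependency. PageWindow, offset and describe are hypothetical helper names used only for illustration. The point is that the number of rows skipped is page * size, so page 0 starts at the first row.

```java
final class PageWindow {

    // Number of rows skipped before the given zero-based page starts.
    static long offset(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size must be >= 1");
        }
        return (long) page * size;
    }

    // Human-readable 1-based row range covered by the page.
    static String describe(int page, int size) {
        long first = offset(page, size) + 1;
        long last = offset(page, size) + size;
        return "page " + page + " -> rows " + first + ".." + last;
    }

    public static void main(String[] args) {
        for (int page = 0; page < 3; page++) {
            System.out.println(describe(page, 20));
        }
    }
}
```

With a controller that takes a Pageable parameter, the same numbers usually arrive as the page and size request parameters, so asking for the next 20 records means requesting the next page index.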

By default, a Pageable object comes with size 20, page 0, and no sorting.
So if you want the next page from the front end, send the URL with the query parameters page, size, and sort; you can test this in Postman.

You can use Page, List or Slice.
If you don't need the number of pages and only need to know whether a next page exists, use Slice, since it does not run the count query:
for (int i = 0; ; i++) {
    // Request page i; a fixed page index of 0 would fetch the first page forever
    Slice<Organization> pageOrganization = organizationRepository.find(PageRequest.of(i, 100));
    List<Organization> organizationList = pageOrganization.getContent();
    for (Organization org : organizationList) {
        // code
    }
    if (!pageOrganization.hasNext()) {
        break;
    }
}
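A slice can answer hasNext() without a count query by reading one row more than the page size and checking whether that extra row exists. As far as I know this is roughly how Spring Data handles Slice results, but treat that as an assumption. The sketch below shows the technique on a plain list; SliceSketch, Chunk and fetch are hypothetical names, not Spring types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SliceSketch {

    // Minimal stand-in for a slice: the page content plus a "more data" flag.
    static final class Chunk<T> {
        final List<T> content;
        final boolean hasNext;

        Chunk(List<T> content, boolean hasNext) {
            this.content = content;
            this.hasNext = hasNext;
        }
    }

    // Reads up to size + 1 rows; the extra row only signals that another page exists.
    static <T> Chunk<T> fetch(List<T> rows, int page, int size) {
        int from = Math.min(page * size, rows.size());
        int to = Math.min(from + size + 1, rows.size());
        List<T> fetched = rows.subList(from, to);
        boolean hasNext = fetched.size() > size;
        List<T> content = new ArrayList<>(hasNext ? fetched.subList(0, size) : fetched);
        return new Chunk<>(content, hasNext);
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e");
        for (int page = 0; ; page++) {
            Chunk<String> chunk = fetch(rows, page, 2);
            System.out.println("page " + page + ": " + chunk.content);
            if (!chunk.hasNext) {
                break;
            }
        }
    }
}
```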

How to query lucene for 2 index fields?

I'd like to execute queries with lucene. But the lookup should not only be based on the input, but also on a 2nd parameter.
Example: imagine the lucene index should contain citynames and countrycodes.
Now, during lookup I already know which country the desired cityname should be in.
So I want to query the Lucene index by cityname, but tell Lucene to look only at the citynames where the countrycode matches.
Is it possible? If so, how?
For a single attribute I would just set up the following:
QueryParser queryParser = new QueryParser(matchVersion, f, a);
Query q = queryParser.parse(input);
But how do I do this for 2 attributes?

Something like this should work. Untested but you should get the idea:
String countryCode = ....; // known in advance
QueryParser queryParser = new QueryParser(matchVersion, f, a);
Query cityNameQuery = queryParser.parse(inputWithCityName);
Query countryCodeQuery = queryParser.parse("+countrycode:" + countryCode);
BooleanQuery result = new BooleanQuery();
// Both clauses are required, so only cities in the given country can match
result.add(cityNameQuery, BooleanClause.Occur.MUST);
result.add(countryCodeQuery, BooleanClause.Occur.MUST);

Using Lucene to count results in categories

I am trying to use Lucene Java 2.3.2 to implement search on a catalog of products. Apart from the regular fields for a product, there is field called 'Category'. A product can fall in multiple categories. Currently, I use FilteredQuery to search for the same search term with every Category to get the number of results per category.
This results in 20-30 internal search calls per query to display the results. This is slowing down the search considerably. Is there a faster way of achieving the same result using Lucene?

Here's what I did, though it's a bit heavy on memory:
What you need is to create in advance a bunch of BitSets, one for each category, containing the doc id of all the documents in a category. Now, on search time you use a HitCollector and check the doc ids against the BitSets.
Here's the code to create the bit sets:
public BitSet[] getBitSets(IndexSearcher indexSearcher,
                           Category[] categories) {
    BitSet[] bitSets = new BitSet[categories.length];
    for (int i = 0; i < categories.length; i++) {
        Query query = categories[i].getQuery();
        final BitSet bitSet = new BitSet();
        indexSearcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                // Mark every document that belongs to this category
                bitSet.set(doc);
            }
        });
        bitSets[i] = bitSet;
    }
    return bitSets;
}
This is just one way to do this. You could probably use TermDocs instead of running a full search if your categories are simple enough, but this should only run once when you load the index anyway.
Now, when it's time to count categories of search results you do this:
public int[] getCategoryCount(IndexSearcher indexSearcher,
                              Query query,
                              final BitSet[] bitSets) {
    final int[] count = new int[bitSets.length];
    indexSearcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
            // Each hit increments the counter of every category it belongs to
            for (int i = 0; i < bitSets.length; i++) {
                if (bitSets[i].get(doc)) count[i]++;
            }
        }
    });
    return count;
}
What you end up with is an array containing the count of every category within the search results. If you also need the search results, you should add a TopDocCollector to your hit collector (yo dawg...). Or, you could just run the search again. 2 searches are better than 30.
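The counting step can also be written with plain java.util.BitSet operations, independent of Lucene. The sketch below is a variation on the approach above, not the same code: it assumes you have already collected the hit doc ids into a BitSet of their own, then intersects that set with each category set and reads the cardinality. CategoryCounts, countPerCategory and bits are hypothetical names, and the doc ids mirror a small made-up catalog.

```java
import java.util.Arrays;
import java.util.BitSet;

final class CategoryCounts {

    // For each category, counts how many hit doc ids are also in that category.
    static int[] countPerCategory(BitSet hits, BitSet[] categories) {
        int[] counts = new int[categories.length];
        for (int i = 0; i < categories.length; i++) {
            BitSet overlap = (BitSet) hits.clone(); // copy so the hit set is not modified
            overlap.and(categories[i]);
            counts[i] = overlap.cardinality();
        }
        return counts;
    }

    // Convenience helper that builds a BitSet from doc ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int docId : docIds) {
            set.set(docId);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet[] categories = {
                bits(1, 2, 4), // fruit
                bits(1, 3),    // computer
                bits(2)        // colour
        };
        BitSet hits = bits(1, 2, 4); // doc ids matched by the user's query
        System.out.println(Arrays.toString(countPerCategory(hits, categories)));
    }
}
```

Whether this beats the per-hit loop depends on the number of categories and the size of the index, so it is worth measuring both on real data.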

I don't have enough reputation to comment (!) but in Matt Quail's answer I'm pretty sure you could replace this:
int numDocs = 0;
td.seek(terms);
while (td.next()) {
    numDocs++;
}
with this:
int numDocs = terms.docFreq();
and then get rid of the td variable altogether. This should make it even faster.

You may want to consider looking through all the documents that match categories using a TermDocs iterator.
This example code goes through each "Category" term, and then counts the number of documents that match that term.
public static void countDocumentsInCategories(IndexReader reader) throws IOException {
    TermEnum terms = null;
    TermDocs td = null;
    try {
        terms = reader.terms(new Term("Category", ""));
        td = reader.termDocs();
        do {
            Term currentTerm = terms.term();
            if (!currentTerm.field().equals("Category")) {
                break;
            }
            int numDocs = 0;
            td.seek(terms);
            while (td.next()) {
                numDocs++;
            }
            System.out.println(currentTerm.field() + " : " + currentTerm.text() + " --> " + numDocs);
        } while (terms.next());
    } finally {
        if (td != null) td.close();
        if (terms != null) terms.close();
    }
}
This code should run reasonably fast even for large indexes.
Here is some code that tests that method:
public static void main(String[] args) throws Exception {
    RAMDirectory store = new RAMDirectory();
    IndexWriter w = new IndexWriter(store, new StandardAnalyzer());
    addDocument(w, 1, "Apple", "fruit", "computer");
    addDocument(w, 2, "Orange", "fruit", "colour");
    addDocument(w, 3, "Dell", "computer");
    addDocument(w, 4, "Cumquat", "fruit");
    w.close();
    IndexReader r = IndexReader.open(store);
    countDocumentsInCategories(r);
    r.close();
}
private static void addDocument(IndexWriter w, int id, String name, String... categories) throws IOException {
    Document d = new Document();
    d.add(new Field("ID", String.valueOf(id), Field.Store.YES, Field.Index.UN_TOKENIZED));
    d.add(new Field("Name", name, Field.Store.NO, Field.Index.UN_TOKENIZED));
    for (String category : categories) {
        d.add(new Field("Category", category, Field.Store.NO, Field.Index.UN_TOKENIZED));
    }
    w.addDocument(d);
}

Sachin, I believe you want faceted search. It does not come out of the box with Lucene. I suggest you try Solr, which has faceting as a major and convenient feature.

So let me see if I understand the question correctly: Given a query from the user, you want to show how many matches there are for the query in each category. Correct?
Think of it like this: your query is actually originalQuery AND (category1 OR category2 OR ...), except that, as well as an overall score, you want a count for each of the categories. Unfortunately, the interface for collecting hits in Lucene is very narrow and only gives you an overall score for a query. But you could implement a custom Scorer/Collector.
Have a look at the source for org.apache.lucene.search.DisjunctionSumScorer. You could copy some of that to write a custom scorer that iterates through category matches while your main search is going on. And you could keep a Map<String,Long> to keep track of matches in each category.
