sphinx facet search: 'with_all' analogue for php/java

I have a document set with an MVA (multi-value attribute), and I need to filter documents that have all of the required attribute values (say, I need all 'news' documents having both the 'java' and 'oracle' tags; assume I have the tag ids).
In ThinkingSphinx (http://pat.github.com/ts/en/searching.html#filters) I found a useful notation:
For matching multiple values in a multi-value attribute, :with doesn’t
quite do what you want. Give :with_all a try instead:
Article.search 'pancakes', :with_all => {:tag_ids => @tags.collect(&:id)}
That, as far as I can tell, filters for documents having ALL of the provided attribute values, instead of documents having ANY value from the provided list, which is what SetFilterRange usage gives.
Can anyone suggest a solution, at least in terms of the standard PHP interface? I hope I'll be able to translate it to Java.

Multiple calls to setFilter are ANDed, whereas, as you note, multiple ids specified in one call are ORed. To require every tag, call it once per tag id:
$cl->setFilter('tag_ids', array($tag_id1));
$cl->setFilter('tag_ids', array($tag_id2));
$cl->setFilter('tag_ids', array($tag_id3));
Sorry, I can't help with the Java syntax.
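For the Java side, the AND/OR semantics described above can be modelled in plain Java before wiring up a real client. This is only a sketch of the behaviour, not the Sphinx client API: `MvaFilterSketch`, `matchesAny`, `matchesAll` and the tag ids 1 and 2 are hypothetical names and values chosen for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of multi-value attribute (MVA) filtering; not the Sphinx client API.
class MvaFilterSketch {

    // One filter carrying several ids: a document passes if it has ANY of them (OR).
    static boolean matchesAny(Set<Integer> docTagIds, List<Integer> wanted) {
        for (int id : wanted) {
            if (docTagIds.contains(id)) {
                return true;
            }
        }
        return false;
    }

    // One single-id filter per wanted tag: a document must pass EVERY filter (AND).
    static boolean matchesAll(Set<Integer> docTagIds, List<Integer> wanted) {
        for (int id : wanted) {
            if (!matchesAny(docTagIds, Collections.singletonList(id))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Integer> javaOnly = new HashSet<>(Arrays.asList(1));         // assumed id 1 = 'java'
        Set<Integer> javaAndOracle = new HashSet<>(Arrays.asList(1, 2)); // assumed id 2 = 'oracle'
        List<Integer> wanted = Arrays.asList(1, 2);

        System.out.println(matchesAny(javaOnly, wanted));      // true
        System.out.println(matchesAll(javaOnly, wanted));      // false
        System.out.println(matchesAll(javaAndOracle, wanted)); // true
    }
}
```

With a real client, the same effect would come from issuing one single-id filter call per tag rather than one call with the whole list.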

Related

javax pathparam is being ignored [duplicate]

I am not asking the question that has already been asked here:
What is the difference between @PathParam and @QueryParam
This is a "best practices" or convention question.
When would you use @PathParam vs @QueryParam?
My thinking is that the choice between the two could be used to differentiate kinds of information. Let me illustrate with my LTPO (less than perfect observation) below.
@PathParam could be reserved for the information category, which falls nicely into a branch of an information tree. @PathParam could be used to drill down the entity class hierarchy.
@QueryParam, by contrast, could be reserved for specifying attributes that locate an instance of a class.
For example,
/Vehicle/Car?registration=123
/House/Colonial?region=newengland
/category?instance
@GET
@Path("/employee/{dept}")
Patient getEmployee(@PathParam("dept") Long dept, @QueryParam("id") Long id);
vs /category/instance
@GET
@Path("/employee/{dept}/{id}")
Patient getEmployee(@PathParam("dept") Long dept, @PathParam("id") Long id);
vs ?category+instance
@GET
@Path("/employee")
Patient getEmployee(@QueryParam("dept") Long dept, @QueryParam("id") Long id);
I don't think there is a standard convention of doing it. Is there? However, I would like to hear of how people use PathParam vs QueryParam to differentiate their information like I exemplified above. I would also love to hear the reason behind the practice.

REST may not be a standard as such, but reading up on general REST documentation and blog posts should give you some guidelines for a good way to structure API URLs. Most REST APIs tend to have only resource names and resource IDs in the path, such as:
/departments/{dept}/employees/{id}
Some REST APIs use query strings for filtering, pagination and sorting, but since REST isn't a strict standard I'd recommend checking out some existing REST APIs, such as those of GitHub and Stack Overflow, to see what could work well for your use case.
I'd recommend putting any required parameters in the path, and any optional parameters should certainly be query string parameters. Putting optional parameters in the path will end up getting really messy when trying to write URL handlers that match different combinations.

This is what I do.
If you need to retrieve a record by id, for example the details of the employee whose id is 15, you can have a resource with @PathParam.
GET /employee/{id}
If you need to get the details of all employees, but only 10 at a time, you may use query parameters.
GET /employee?start=1&size=10
This says: starting from employee id 1, get ten records.
To summarize, use @PathParam for retrieval based on id. Use @QueryParam for filtering, or if you have a fixed list of options that the user can pass.

I think that if the parameter identifies a specific entity you should use a path variable. For example, to get all the posts on my blog I request
GET: myserver.com/myblog/posts
to get the post with id = 123, I would request
GET: myserver.com/myblog/posts/123
but to filter my list of posts, and get all posts since Jan 1, 2013, I would request
GET: myserver.com/myblog/posts?since=2013-01-01
In the first example "posts" identifies a specific entity (the entire collection of blog posts). In the second example, "123" also represents a specific entity (a single blog post). But in the last example, the parameter "since=2013-01-01" is a request to filter the posts collection not a specific entity. Pagination and ordering would be another good example, i.e.
GET: myserver.com/myblog/posts?page=2&order=backward
Hope that helps. :-)
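The split described above can be made concrete with the standard `java.net.URI` class. The following is a minimal sketch, assuming no percent-decoding and no repeated query keys; `UriPartsSketch`, `pathSegments` and `queryParams` are hypothetical helper names, not part of any framework.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: separates the identifying path from the refining query.
class UriPartsSketch {

    // Path segments identify the entity, e.g. [myblog, posts, 123].
    static List<String> pathSegments(URI uri) {
        List<String> segments = new ArrayList<>();
        for (String s : uri.getPath().split("/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        return segments;
    }

    // Query parameters filter or order the result, e.g. {since=2013-01-01}.
    // Simplification: values are not percent-decoded and a repeated key overwrites.
    static Map<String, String> queryParams(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, "");
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        URI single = URI.create("http://myserver.com/myblog/posts/123");
        URI filtered = URI.create("http://myserver.com/myblog/posts?since=2013-01-01");
        System.out.println(pathSegments(single));  // [myblog, posts, 123]
        System.out.println(queryParams(single));   // {}
        System.out.println(queryParams(filtered)); // {since=2013-01-01}
    }
}
```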

I personally use the approach of "if it makes sense for the user to bookmark a URL which includes these parameters, then use PathParam".
For instance, if the URL for a user profile includes some profile id parameter, since this can be bookmarked by the user and/or emailed around, I would include that profile id as a path parameter. Another consideration is that the page denoted by a URL which includes the path param doesn't change: the user will set up his/her profile, save it, and then be unlikely to change it much from there on; this means web crawlers/search engines/browsers/etc. can cache this page nicely based on the path.
If a parameter passed in the URL is likely to change the page layout/content, then I'd use that as a query param. For instance, if the profile URL supports a parameter which specifies whether to show the user's email or not, I would consider that to be a query param. (I know, arguably, you could say that the &noemail=1 or whatever parameter it is can be used as a path param and generates 2 separate pages, one with the email on it and one without it, but logically that's not the case: it is still the same page with or without certain attributes shown.)
Hope this helps -- I appreciate the explanation might be a bit fuzzy :)

You can use query parameters for filtering and path parameters for grouping. The following link has good info on this: When to use pathParams or QueryParams

Before talking about QueryParam and PathParam, let's first understand the URL and its components. A URL consists of endpoint + resource + queryParam/PathParam.
For example,
URL: https://app.orderservice.com/orders?order=12345678
or
URL: https://app.orderservice.com/orders/12345678
where
endpoint: https://app.orderservice.com
resource: orders
queryParam: order=12345678
PathParam: 12345678
@QueryParam:
QueryParam is used when the requirement is to filter the request based on one or more criteria. The criteria are specified after a ? following the resource in the URL. Multiple filter criteria can be combined in the query string using the & symbol.
For example:
https://app.orderservice.com/orders?order=12345678&customername=X
@PathParam:
PathParam is used when the requirement is to select a particular order based on its GUID/id. The PathParam is part of the resource path in the URL.
For Example:
https://app.orderservice.com/orders/12345678

It's a very interesting question.
You can use both of them; there is no strict rule on this subject, but using URI path variables has some advantages:
Cache:
Most web cache services on the internet don't cache GET requests when they contain query parameters.
They do that because there are a lot of RPC systems that use GET requests to change data on the server (a mistake: GET must be a safe method).
But if you use path variables, all of these services can cache your GET requests.
Hierarchy:
The path variables can represent hierarchy:
/City/Street/Place
It gives the user more information about the structure of the data.
But if your data doesn't have any hierarchical relation you can still use path variables, separated by a comma or semicolon:
/City/longitude,latitude
As a rule, use a comma when the ordering of the parameters matters, and a semicolon when the ordering doesn't matter:
/IconGenerator/red;blue;green
Apart from those reasons, there are some cases when it's very common to use query string variables:
When you need the browser to automatically put HTML form variables into the URI
When you are dealing with an algorithm. For example, the Google search engine uses query strings:
http://www.google.com/search?q=rest
To sum up, there's no strong reason to use one of these methods over the other, but whenever you can, use URI path variables.

From Wikipedia: Uniform Resource Locator
A path, which contains data, usually organized in hierarchical form, that appears as a sequence of segments separated by slashes.
An optional query, separated from the preceding part by a question mark (?), containing a query string of non-hierarchical data.
In accordance with the conceptual design of the URL, we might implement a PathParam for hierarchical data/directives/locator components, or implement a QueryParam when the data are not hierarchical. This makes sense because paths are naturally ordered, whereas queries contain variables which may be ordered arbitrarily (unordered variable/value pairs).
A previous commenter wrote,
I think that if the parameter identifies a specific entity you should use a path variable.
Another wrote,
Use @PathParam for retrieval based on id. Use @QueryParam for filtering, or if you have a fixed list of options that the user can pass.
Another,
I'd recommend putting any required parameters in the path, and any optional parameters should certainly be query string parameters.
— However, one might implement a flexible, non-hierarchical system for identifying specific entities! One might have multiple unique indexes on an SQL table, and allow entities to be identified using any combination of fields that comprise a unique index! Different combinations (perhaps also ordered differently), might be used for links from various related entities (referrers). In this case, we might be dealing with non-hierarchical data, used to identify individual entities — or in other cases, might only specify certain variables/fields — certain components of unique indexes — and retrieve a list/set of records. In such cases, it might be easier, more logical and reasonable to implement the URLs as QueryParams!
Could a long hexadecimal string dilute/diminish the value of keywords in the rest of the path? It might be worth considering the potential SEO implications of placing variables/values in the path, or in the query, and the human-interface implications of whether we want users to be able to traverse/explore the hierarchy of URLs by editing the contents of the address bar. My 404 Not Found page uses SSI variables to automatically redirect broken URLs to their parent! Search robots might also traverse the path hierarchy.
On the other hand, personally, when I share URLs on social media, I manually strip out any private unique identifiers — typically by truncating the query from the URL, leaving only the path: in this case, there is some utility in placing unique identifiers in the path rather than in the query. Whether we want to facilitate the use of path components as a crude user-interface, perhaps depends on whether the data/components are human-readable or not. The question of human-readability relates somewhat to the question of hierarchy: often, data that may be expressed as human-readable keywords are also hierarchical; while hierarchical data may often be expressed as human-readable keywords. (Search engines themselves might be defined as augmenting the use of URLs as a user-interface.) Hierarchies of keywords or directives might not be strictly ordered, but they are usually close enough that we can cover alternative cases in the path, and label one option as the "canonical" case.
There are fundamentally several kinds of questions we might answer with the URL for each request:
1. What kind of record/thing are we requesting/serving?
2. Which one(s) are we interested in?
3. How do we want to present the information/records?
Q1 is almost certainly best covered by the path, or by PathParams.
Q3 (which is probably controlled via a set of arbitrarily ordered optional parameters and default values) is almost certainly best covered by QueryParams.
Q2: It depends…

PATH PARAMETER -
Path Parameter is a variable in URL path that helps to point some specific resource.
Example - https://sitename.com/questions/115
Here, 115 is a path parameter; it can be replaced with another valid number to fetch/point to some other resource in the same application.
QUERY PARAMETER -
Query parameters are variables in the URL query string that filter particular resources from a list.
Example - https://sitename.com/questions/115?qp1=val1&qp2=val2&qp3=val3
Here qp1, qp2 and qp3 are query variables with the values val1, val2 and val3. They can be applied as filters while fetching/saving data. Query variables are always appended to the URL after a question mark (?).

As theon noted, REST is not a standard. However, if you are looking to implement a standards-based URI convention, you might consider the OData URI convention. Version 4 has been approved as an OASIS standard, and OData libraries exist for various languages, including Java via Apache Olingo. Don't let the fact that it originated at Microsoft put you off, since it has gained support from other industry players as well, including Red Hat, Citrix, IBM, Blackberry, Drupal, Netflix, Facebook and SAP.
More adopters are listed here

You can support both query parameters and path parameters, e.g., in the case of aggregation of resources -- when the collection of sub-resources makes sense on its own.
/departments/{id}/employees
/employees?dept=id
Query parameters can support hierarchical and non-hierarchical subsetting; path parameters are hierarchical only.
Resources can exhibit multiple hierarchies. Support short paths if you will be querying broad sub-collections that cross hierarchical boundaries.
/inventory?make=toyota&model=corolla
/inventory?year=2014
Use query parameters to combine orthogonal hierarchies.
/inventory/makes/toyota/models/corolla?year=2014
/inventory/years/2014?make=toyota&model=corolla
/inventory?make=toyota&model=corolla&year=2014
Use only path parameters in the case of composition -- when a resource doesn't make sense divorced from its parent, and the global collection of all children is not a useful resource in itself.
/words/{id}/definitions
/definitions?word=id // not useful

I prefer the following:
@PathParam
When the parameter is required, such as an ID or productNo
GET /user/details/{ID}
GET /products/{company}/{productNo}
@QueryParam
When you need to pass optional parameters, such as filters or online state, which can be null
GET /user/list?country=USA&status=online
GET /products/list?sort=ASC
When both are used
GET /products/{company}/list?sort=ASC

The reason is actually very simple. When using a query parameter you can take in characters such as "/" and your client does not need to URL-encode them. There are other reasons, but that is a simple example. As for when to use a path variable, I would say whenever you are dealing with ids, or if the path variable is a direction for a query.

I am giving one example to show when to use @QueryParam and @PathParam.
As an example, I am taking one resource, the CarResource class.
If you want to make an input of your resource method mandatory, declare it as @PathParam; if an input of your resource method should be optional, declare it as @QueryParam.
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;

@Path("/car")
public class CarResource {
    @GET
    @Produces("text/plain")
    @Path("/search/{carmodel}")
    public String getCarSearch(@PathParam("carmodel") String model,
                               @QueryParam("carcolor") String color) {
        // Placeholder for the real lookup by carmodel and color;
        // color is null when the carcolor query parameter is omitted.
        String cars = (color == null) ? model : model + " (" + color + ")";
        return cars;
    }
}
For this resource, send the request
Request URI: //address:2020/carWeb/car/search/swift?carcolor=red
With a request like this, the resource returns cars based on both model and color.
Request URI: //address:2020/carWeb/car/search/swift
With a request like this, the resource method returns only cars of the swift model.
Request URI: //address:2020/carWeb/car/search?carcolor=red
With a request like this, we get a ResourceNotFound exception, because in the CarResource class carmodel is declared as @PathParam; that is, you must supply the carmodel in the request URI, otherwise the request is not passed to the resource. If you omit the color, however, the request is still passed to the resource, because color is a @QueryParam and therefore optional in the request.

@QueryParam can be conveniently used with the @DefaultValue annotation, so that you can avoid a null pointer exception if no query parameter is passed.
When you want to parse query parameters from a GET request, you can simply define the respective parameters on the method that handles the GET request and annotate them with @QueryParam.
@PathParam extracts values from the URI by matching them against the template in @Path, and passes them in as input parameters.
There can be more than one @PathParam, each bound to a method argument:
@Path("/rest")
public class Abc {
    @GET
    @Path("/msg/{p0}/{p1}")
    @Produces("text/plain")
    public String add(@PathParam("p0") Integer param1, @PathParam("p1") Integer param2) {
        return String.valueOf(param1 + param2);
    }
}
In the above example,
http://localhost:8080/Restr/rest/msg/{p0}/{p1},
p0 matches param1 and p1 matches param2. So for the URI
http://localhost:8080/Restr/rest/msg/4/6,
we get the result 10.
In a REST service, JAX-RS provides both @QueryParam and @FormParam for accepting data from an HTTP request. An HTTP form can be submitted by different methods, such as GET and POST.
@QueryParam: reads data from the query string, as used by a form submitted with GET.
@FormParam: reads data from an HTML form submitted with POST, or any request of the media

In a nutshell,
@PathParam reads a value passed as part of the resource path
/user/1
@QueryParam reads a value passed only in the query string
/user?id=1

For resource names and IDs, I use @PathParam. For optional variables, I use @QueryParam.

As per my understanding:
Use @PathParam - when it is a mandatory item such as an id
GET /balloon/{id}
Use @QueryParam - when you have the exact resource but need to filter it on some optional traits such as color, size, etc.
GET /balloon/123?color=red&size=large

Jayway JSONPath: how to select terminal nodes

I'm using Jayway JSONPath.
Given a JSON document having nodes with the same name at different structure levels, how would I select only those nodes that are terminal nodes, i.e. having only text or no content?
XPath would allow not(child::*) as a predicate, but I can't see a JSONPath equivalent.

Unfortunately, no JSONPath implementation (as of now) offers such an operation. However, some of the more advanced implementations that expanded on Goessner's reference have operations that get close to this.
One workaround is to check the type of a node, if possible. For instance, this is possible in the JavaScript JSONPath-Plus implementation using type selectors for JSON types, e.g. @null(), @boolean(), @number(), @string(), @array(), @object(), @integer() and others. This allows us, for instance, to get only numeric values:
$..*@number()
Combined with a more meaningful path selection we might get close. Nonetheless, this will not yield terminal values only, but at least it avoids array and object type properties.
Another workaround that should work with basic data types is to use the regex matcher available in quite a few implementations (like Jayway's, many JavaScript implementations, etc.) to interpret the type of a node, e.g., again, numeric values:
$..[?(@.price =~ /[0-9]+\.?[0-9]*/)]
Again, this will not give you terminal values only, but it avoids array and object type properties.
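If post-processing outside the path expression is acceptable, terminal nodes can also be picked out with a plain recursive walk. The sketch below assumes the document has already been parsed into nested `java.util.Map` and `java.util.List` values (how a particular JSON provider represents a parsed document is an assumption here); `TerminalNodesSketch` and `terminalValues` are hypothetical names, and the sample document is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical walker over an already-parsed JSON tree (Maps, Lists, scalars, null).
class TerminalNodesSketch {

    // Collects values stored under `name` at any depth that are neither Map nor List.
    static List<Object> terminalValues(Object node, String name) {
        List<Object> out = new ArrayList<>();
        collect(node, name, out);
        return out;
    }

    private static void collect(Object node, String name, List<Object> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                Object value = e.getValue();
                boolean terminal = !(value instanceof Map) && !(value instanceof List);
                if (terminal && name.equals(e.getKey())) {
                    out.add(value);
                }
                collect(value, name, out); // no-op for scalars and null
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collect(item, name, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("title", "inner");                 // terminal: plain text
        Map<String, Object> wrapper = new LinkedHashMap<>();
        wrapper.put("text", "nested object");
        Map<String, Object> child = new LinkedHashMap<>();
        child.put("title", wrapper);                 // not terminal: has children
        child.put("items", Arrays.asList(inner));
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "outer");                   // terminal: plain text
        doc.put("child", child);

        System.out.println(terminalValues(doc, "title")); // [outer, inner]
    }
}
```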

How to retrieve list of AWS AutoScalingGroups in Java filtered by a specified String?

In AWS console, you can search for all autoscalinggroups and filter by a string if the name contains that string. Is it possible to do the same in Java?
I see that I can do the following through the Java API:
AmazonAutoScalingClient scalingClient = new AmazonAutoScalingClient(awsCredentials);
DescribeAutoScalingGroupsResult autoScalingGroups = scalingClient.describeAutoScalingGroups();
But, is there a way to say "only return autoscalinggroups if name contains specified string" ?
Thanks

I'm pretty sure there is not a direct way, since AWS Java APIs are usually a direct mapping to the REST API of AWS:
http://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_DescribeAutoScalingGroups.html
And that API does not offer the feature you mention.
Also, the Java API does not let you specify that:
http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/autoscaling/model/DescribeAutoScalingGroupsRequest.html
Therefore, I'd recommend getting all the auto scaling groups using the existing functions and filtering the result on the client side, for example with regular expressions. That list should not be large.
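The client-side filtering step could look roughly like the following. It is a sketch on plain strings only: the group names are invented, `GroupNameFilterSketch` and `namesContaining` are hypothetical, and in real code the names would first be read out of the describe result shown in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical client-side filter over group names already fetched from AWS.
class GroupNameFilterSketch {

    // Keeps only the names that contain the given fragment (case-sensitive).
    static List<String> namesContaining(List<String> names, String fragment) {
        return names.stream()
                .filter(name -> name.contains(fragment))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented names standing in for the fetched auto scaling group names.
        List<String> names = Arrays.asList("web-prod-asg", "web-test-asg", "batch-prod-asg");
        System.out.println(namesContaining(names, "prod")); // [web-prod-asg, batch-prod-asg]
    }
}
```

A plain `contains` is used here instead of a regular expression because the question only asks for substring matching; `String.matches` or `java.util.regex.Pattern` would slot into the same filter.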

Hibernate search - '%like%' type query

I'm using hibernate-search in my Spring MVC project and I would like to accomplish something but I'm not sure if it's possible. Here is the problem:
I'm using NGramFilterFactoryClass for this and have configured minGramSize=3 and maxGramSize=3.
Let's say my search term is "Keyword"
If I type anything like this:
"ywo", "key", "ord", "blablaordblabla"
query will return "Keyword". This is fine and I understand how this works but what I wanna do is when I type something like:
"bkey", "blablaordblabla"
I don't want to return "Keyword". "Keyword" should be returned only when search term is something like:
"key", "ord", "ywo", "eywo", "word" etc...
So, I guess I'm looking for a '%like%' type query. How can I accomplish this with hibernate-search?

I don't know if this is what you are looking for, but maybe you need what are called "wildcard queries".
Try to have a look at this link as reference.
Also have a look at this stackoverflow topic

If you analyze your input with NGrams you won't be able to perform exact "Like%" queries.
You probably want a SimpleAnalyzer or something similar which doesn't break your keywords into smaller pieces, or you might want to skip analysis for this field and index it as-is.
You then combine this with a wildcard query; note how the example in the reference docs uses the keyword element to build the query, which inherently disables the analyzer on the input. (Make sure you scroll down to the Wildcard queries section in the docs.)
I assume you're using NGrams because you need them for another use case. Remember you can use the @Fields annotation to index the same property in several different ways, so you could index it with n-grams and also in another form more suited to wildcard queries.
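To see why an n-gram analyzed query behaves differently from a '%like%' match, here is a simplified model in plain Java. It assumes lowercased trigrams and that any shared gram produces a hit; the real analyzer chain and query type decide the actual behaviour, and `NGramOverlapSketch` with its methods is a hypothetical illustration, not Hibernate Search API.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified model: n-gram overlap versus plain substring ('%like%') matching.
class NGramOverlapSketch {

    // All lowercased substrings of length n, in order of appearance.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new LinkedHashSet<>();
        String lower = text.toLowerCase();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }

    // N-gram style match (assumed OR semantics): one shared gram is enough.
    static boolean sharesGram(String indexed, String term, int n) {
        Set<String> shared = ngrams(indexed, n);
        shared.retainAll(ngrams(term, n));
        return !shared.isEmpty();
    }

    // '%like%' style match: the whole term must occur inside the indexed value.
    static boolean containsTerm(String indexed, String term) {
        return indexed.toLowerCase().contains(term.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Keyword", 3));               // [key, eyw, ywo, wor, ord]
        System.out.println(sharesGram("Keyword", "bkey", 3));   // true, via the gram "key"
        System.out.println(containsTerm("Keyword", "bkey"));    // false
        System.out.println(containsTerm("Keyword", "eywo"));    // true
    }
}
```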

Generate valid XML name in Java

Are there any helpers that will transform/escape a string to be a valid XML name ?
Example, I have the string max(OfAll) and need to generate some XML like e.g.
<max(OfAll)>SomeText</max(OfAll)>
That's obviously not a valid name, are there some helper methods that can transform the string to be a valid xml name ?
(For comparison, .NET has some methods with which the above XML fragment would become:
<max_x0028_OfAll_x0029_>SomeText</max_x0028_OfAll_x0029_>)

The encoding in your .NET example looks like the one defined in ISO9075. I don't think there is a built-in implementation in the JDK, but this encoding is also used by content repositories like Alfresco or Jackrabbit for their XML import/export and query APIs. A quick search turned up these two implementations, both available under open source licenses:
http://www.docjar.com/html/api/org/apache/jackrabbit/util/ISO9075.java.html
http://kickjava.com/src/org/alfresco/util/ISO9075.java.htm

One class which may be of use in other situations is StringEscapeUtils in the Apache commons-lang project. It can escape text for use in XML documents, but I'm not aware of anything that escapes XML element names.
Could you not generate something more readable, such as
<aggregation type="max(OfAll)">SomeText</aggregation>
There are lots of libraries available to marshal/unmarshal objects to XML and back, including JAXB (part of the JDK), JiBX, Castor and XStream.

I don't know of any helper methods for that, but the rules here http://www.w3.org/TR/REC-xml/#NT-Name are pretty straightforward, so it should be easy to implement one.
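A hand-rolled helper along those lines might look like the sketch below. It is deliberately simplified and is not a full ISO 9075 or XML Name implementation: it treats only ASCII letters, digits, '_', '-' and '.' as valid (the real Name production admits many more Unicode ranges, and ':' is valid but left out here because of namespaces), it does not escape literal "_x" sequences already present in the input, and it does not handle the empty string. `XmlNameSketch` and `encodeName` are hypothetical names; the `_xHHHH_` form mirrors the shape of the .NET example in the question.

```java
// Simplified sketch: replaces characters that are not allowed in an XML name
// with _xHHHH_, where HHHH is the character's hexadecimal code unit.
class XmlNameSketch {

    static String encodeName(String raw) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // The first character has stricter rules than the rest.
            boolean ok = (i == 0) ? isNameStart(c) : isNameChar(c);
            if (ok) {
                sb.append(c);
            } else {
                sb.append(String.format("_x%04X_", (int) c));
            }
        }
        return sb.toString();
    }

    // ASCII-only approximation of NameStartChar.
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_';
    }

    // ASCII-only approximation of NameChar.
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '.';
    }

    public static void main(String[] args) {
        System.out.println(encodeName("max(OfAll)")); // max_x0028_OfAll_x0029_
        System.out.println(encodeName("1st"));        // _x0031_st
    }
}
```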

As should be clear, normal XML escaping (replacing inappropriate characters with character entities) does not result in a valid XML identifier.
For the record, what you are doing is frequently called "name mangling".
