how to index polygon data in Lucene - java

How can I add polygon data to a Lucene index? Below is the code snippet I use:
private SpatialContext ctx;
ctx.readShapeFromWkt("POLYGON((-10 30,-40 40,-10 -20,40 20,0 0,-10 30))");
But it throws an exception with the message
Unknown Shape definition [POLYGON((-10 30,-40 40,-10 -20,40 20,0 0,-10 30))]
The same format works fine in Solr. What format or approach should I use instead?

SpatialContext does not support "POLYGON" (see WktShapeParser).
JtsSpatialContext adds support for polygons.
You may need to add the JTS Topology Suite to your classpath first. Then set your spatialContextFactory to com.spatial4j.core.context.jts.JtsSpatialContextFactory.

YamlBeans: Turning an object into a hashmap

I have a YAML file that looks something like this:
rules:
  - p_table:
      ["p_event/Name",
       "p_fault/Name"]
  - s_table:
      ["s_event/Name",
       "s_fault/Name"]
  - r_table:
      ["r_event/Name",
       "r_fault/Name"]
I can already take the .yml file above, parse it with YamlBeans, and print it out with code like this:
System.out.println(map.get("rules"));
This gives this kind of result:
[{p_table=[p_event/Name, p_fault/Name]},
{s_table=[s_event/Name, s_fault/Name]},
{r_table=[r_event/Name, r_fault/Name]}]
What I would like to do goes a step further: store it in a HashMap so that I can actually use the individual entries, with something like this:
HashMap<String, ArrayList<String>> Policies = (HashMap)(map.get("rules"));
But when I do that, either an exception is thrown or it just returns null. Is there a solution for this? Should I not be using HashMaps, or can objects simply not be converted this way? I plan on replacing the String with another type from a different library that uses Strings, but wanted to start at the bottom and work up from there.
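In your file, rules is a YAML sequence (each "- " starts a list item), so it loads as a list of single-entry maps, and a list cannot be cast to a HashMap. The obvious solution would be to remove the sequence from the YAML file:
rules:
  p_table:
    ["p_event/Name",
     "p_fault/Name"]
  s_table:
    ["s_event/Name",
     "s_fault/Name"]
  r_table:
    ["r_event/Name",
     "r_fault/Name"]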
If you can't change the YAML file, you need to transform the data after loading it.
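One way to do that transformation is sketched below, assuming the loaded value has the shape shown in the printed output (a list of single-entry maps from table name to a list of strings). RuleFlattener and flatten are hypothetical names, not part of YamlBeans, and the sample data in main is built by hand to stand in for the value of map.get("rules"), which would need an unchecked cast first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of YamlBeans.
final class RuleFlattener {

    // Merges a list of maps into one map keyed by table name.
    // If a key occurs more than once, the later entry replaces the earlier one.
    static Map<String, List<String>> flatten(List<Map<String, List<String>>> rules) {
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> item : rules) {
            for (Map.Entry<String, List<String>> e : item.entrySet()) {
                // Copy the list so the result does not alias the parsed data.
                merged.put(e.getKey(), new ArrayList<>(e.getValue()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for the parsed "rules" value.
        Map<String, List<String>> p =
                Collections.singletonMap("p_table", Arrays.asList("p_event/Name", "p_fault/Name"));
        Map<String, List<String>> s =
                Collections.singletonMap("s_table", Arrays.asList("s_event/Name", "s_fault/Name"));
        List<Map<String, List<String>>> rules = Arrays.asList(p, s);

        Map<String, List<String>> policies = flatten(rules);
        System.out.println(policies.get("p_table"));
    }
}
```

A LinkedHashMap is used here only to keep the tables in file order; a plain HashMap works equally well if order does not matter.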

jcrfsuite training file format

From what I understand of the POS tagging example shipped with jcrfsuite, the training file is tab-separated and the first token is the label. But I do not understand the BigCluster| part. Can somebody explain how to specify tokens in the training file?
Example below:
O BigCluster|00 BigCluster|0000 BigCluster|000000 BigCluster|00000000 BigCluster|0000000000 BigCluster|000000000000 BigCluster|00000000000000 BigCluster|0000000000000000 NextBigCluster|0100 NextBigCluster|01000101 NextBigCluster|010001011111 POSTagDict|D POSTagDict|N POSTagDict|^ POSTagDict|$ POSTagDict|G NextPOSTag|V 1gramSuff|i 1gramPref|i prevword| prevcurr||i nextword|predict nextword|predict currnext|i|predict Word|I Lower|i Xxdshape|X charclass|1, first-shortcap prevnext||predict t=0
Test file format:
! BigCluster|01 BigCluster|0110 BigCluster|011011 BigCluster|01101100 BigCluster|0110110011 BigCluster|011011001100 BigCluster|01101100110000 BigCluster|0110110011000000 NextBigCluster|1000 NextBigCluster|10001000 NextBigCluster|100010000000 POSTagDict|V NextPOSTag|, metaph_POSDict|N 1gramSuff|n 2gramSuff|nn 3gramSuff|mnn 4gramSuff|mmnn 5gramSuff|mmmnn 6gramSuff|ammmnn 7gramSuff|aammmnn 8gramSuff|aaammmnn 9gramSuff|daaammmnn 1gramPref|d 2gramPref|da 3gramPref|daa 4gramPref|daaa 5gramPref|daaam 6gramPref|daaamm 7gramPref|daaammm 8gramPref|daaammmn 9gramPref|daaammmnn prevword| prevcurr||daaammmnn nextword|. nextword|. currnext|daaammmnn|. Word|Daaammmnn Lower|daaammmnn Xxdshape|Xxxxxxxxx charclass|1,2,2,2,2,2,2,2,2, first-initcap prevnext||. t=0
What follows the label is a list of feature-name and feature-value pairs.
It is a sparse representation rather than a tabular one.
BigCluster is just one of the features, and it is relevant to that specific example only. You should create your own features if you are training from scratch.
I have noticed that CRFsuite does not care about the naming convention or feature design of labels and attributes, because it treats them as strings.
CRFsuite learns the weights of associations (feature weights) between attributes and labels without knowing what the labels and attributes mean. In other words, you can design and use arbitrary features just by writing label and attribute names in the data set. Find the best possible attributes for your task, run some experiments with different sets of attributes and features, and you will be good to go.
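As a rough illustration of that point, the sketch below builds training lines in the same label-then-attributes layout as the example above. The class name, the choice of attributes (Word, Lower, prevword, nextword), and the labels in main are all hypothetical; the "name|value" spelling is only a naming convention, since the whole string is one attribute. The escape step assumes CRFsuite's text format reserves ':' for an optional attribute weight, so check that against the CRFsuite manual before relying on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not part of jcrfsuite.
final class CrfSuiteLineSketch {

    // Assumption: ':' separates an attribute from its weight in CRFsuite's
    // text format, so ':' and '\' inside attribute names are escaped.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Builds a few illustrative attributes for the word at position i.
    static List<String> attributesFor(String[] words, int i) {
        String word = words[i];
        String prev = i > 0 ? words[i - 1] : "";
        String next = i + 1 < words.length ? words[i + 1] : "";
        List<String> attrs = new ArrayList<>();
        attrs.add(escape("Word|" + word));
        attrs.add(escape("Lower|" + word.toLowerCase(Locale.ROOT)));
        attrs.add(escape("prevword|" + prev));
        attrs.add(escape("nextword|" + next));
        return attrs;
    }

    // One item per line: the label first, then tab-separated attributes.
    static String formatLine(String label, List<String> attrs) {
        return label + "\t" + String.join("\t", attrs);
    }

    public static void main(String[] args) {
        String[] words = {"I", "predict"};
        String[] labels = {"O", "V"}; // made-up labels for illustration
        for (int i = 0; i < words.length; i++) {
            System.out.println(formatLine(labels[i], attributesFor(words, i)));
        }
    }
}
```

Swapping attributes in or out of attributesFor is then just a matter of adding or removing strings, which is what makes experimenting with different feature sets cheap.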

how to list all the indices' name of elasticsearch using java?

In my Elasticsearch cluster I want to get the names of all the indices. How can I do this using Java?
I searched the internet, but there is not much useful information.
You can definitely do it with the following simple Java code:
ImmutableOpenMap<String, IndexMetaData> indices = client.admin().cluster()
.prepareState().get().getState()
.getMetaData().getIndices();
The map you obtain is keyed by index name and contains the details of all the indices available in your ES cluster.
You can use:
client.admin().indices().prepareGetIndex().setFeatures().get().getIndices();
Use setFeatures() without parameters to get just the index names. Otherwise, other data, such as the MAPPINGS and SETTINGS of each index, is also returned by default.
Thanks for @Val's answer. Following your method, I used it in my project; the code is:
ClusterStateResponse response = transportClient.admin().cluster().prepareState()
.execute().actionGet();
String[] indices=response.getState().getMetaData().getConcreteAllIndices();
This puts all the index names into a String array, and it works.
There is another method that I think would work but have not tried:
ImmutableOpenMap<String, IndexMetaData> mappings = node.client().admin().cluster()
.prepareState().execute().actionGet().getState().getMetaData().getIndices();
Then we can take the keys of mappings to get all the index names.
Thanks again!

weka wrapper attribute selection random forest java

protected static void attSelection_w(Instances data) throws Exception {
AttributeSelection fs = new AttributeSelection();
WrapperSubsetEval wrapper = new WrapperSubsetEval();
wrapper.buildEvaluator(data);
wrapper.setClassifier(new RandomForest());
wrapper.setFolds(10);
wrapper.setThreshold(0.001);
fs.SelectAttributes(data);
fs.setEvaluator(wrapper);
fs.setSearch(new BestFirst());
System.out.println(fs.toResultsString());
}
Above is my code for wrapper based attribute selection using random forest + bestfirst search. However, this somehow spits out a result using cfs, like below.
Search Method:
Greedy Stepwise (forwards).
Start set: no attributes
Merit of best subset found: 0.287
Attribute Subset Evaluator (supervised, Class (nominal): 9 class):
CFS Subset Evaluator
Including locally predictive attributes
There is no other code using CFS in the whole class, and I'm pretty much stuck. I would appreciate any help. Thanks!
You inverted the order: SelectAttributes(data) runs before the evaluator and search are set, so the defaults (the CFS evaluator and Greedy Stepwise search shown in your output) are used. Set the parameters first, then call the selection:
//first
fs.setEvaluator(wrapper);
fs.setSearch(new BestFirst());
//then
fs.SelectAttributes(data);
Just set the class index by adding this line after creating the Instances data:
data.setClassIndex(data.numAttributes() - 1);
I checked and it worked fine.

Apache Commons Configuration get property based on parent with attribute equal to value

I'm using Apache Commons Configuration with XML for the configuration of my application, and I'm having trouble getting a property that I want.
I have the following XML structure (minimal structure just to get point across):
<?xml version="1.0" encoding="UTF-8"?>
<settings>
<planets>
<planet name="mars">
<terrain>terrain/mars.terrain</terrain>
</planet>
<planet name="earth">
<terrain>terrain/earth.terrain</terrain>
</planet>
</planets>
</settings>
As I understand it, the best I can do here to get the mars terrain setting is:
config.getProperty("planets.planet(0).terrain");
Unfortunately, my app doesn't/shouldn't have any concept of the indexes of these planets. At least, the indexes aren't going to be consistent. They could change at any time, so it's unreliable to refer to them by index as is done above.
So, what I want to be able to do is something like this:
config.getProperty("planets.planet[@name=mars].terrain");
That doesn't work, as you might have guessed. The only other way I've found to do it is obtuse and unacceptable:
List<Object> planets = config.getList("planets.planet[@name]");
String terrain = "";
for (int i = 0; i < planets.size(); i++) {
if (planets.get(i).toString().equals("mars")) {
terrain = config.getString("planets.planet(" + i + ").terrain");
break;
}
}
I'm sure you can see why that's undesirable. Now, I'm to the point where I'm considering just wrapping/extending the Apache Commons Configuration library in order to add this type of functionality, but I'm just unwilling to accept that there isn't an easier way to do this.
Question Revisited
What am I overlooking, and how can this be accomplished in a simpler manner? Does this functionality simply not exist?
I found out that I could replace the DefaultExpressionEngine with an XPathExpressionEngine.
XMLConfiguration.setDefaultExpressionEngine(new XPathExpressionEngine());
This allowed me to use XPath to get my properties, and I could now do:
String terrainFile = config.getString("planets/planet[@name='mars']/terrain");
One thing to note is that you need the Apache Commons JXPath lib to use the XPathExpressionEngine, or you will get an error when you try to create it.
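The attribute predicate itself can be tried out without Commons Configuration. The sketch below uses the JDK's built-in javax.xml.xpath API against the same XML, purely to illustrate how [@name='mars'] selects a planet by attribute instead of by index. PlanetXPathDemo and terrainFor are hypothetical names, and the expression here starts at the document root (/settings/...), whereas the Commons Configuration key above is relative to the configuration root.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses only the JDK, not Commons Configuration.
final class PlanetXPathDemo {

    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<settings><planets>"
            + "<planet name=\"mars\"><terrain>terrain/mars.terrain</terrain></planet>"
            + "<planet name=\"earth\"><terrain>terrain/earth.terrain</terrain></planet>"
            + "</planets></settings>";

    // Returns the terrain of the planet whose name attribute matches,
    // or an empty string if no such planet exists.
    // The name is concatenated into the expression, so pass trusted values only.
    static String terrainFor(String planetName) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/settings/planets/planet[@name='" + planetName + "']/terrain";
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(terrainFor("mars"));
        System.out.println(terrainFor("earth"));
    }
}
```

Because the lookup is by attribute value, reordering the planet elements in the file does not change the result.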
