Avoiding duplicate entries in a mongoDB with Java and JSON objects - java

I'm developing an analysis program for Twitter data.
I'm using MongoDB, and at the moment I'm trying to write a Java program that gets tweets from the Twitter API and puts them into the database.
Getting the tweets already works very well, but I have a problem when I want to put them into the database. As the Twitter API often returns the same tweets again, I have to place some kind of index on the database.
First of all, I connect to the database and get the collection related to the search term, or create this collection if it doesn't exist.
public void connectdb(String keyword)
{
    try {
        // on constructor load initialize MongoDB and load collection
        initMongoDB();
        items = db.getCollection(keyword);
        BasicDBObject index = new BasicDBObject("tweet_ID", 1);
        items.ensureIndex(index);
    } catch (MongoException ex) {
        System.out.println("MongoException :" + ex.getMessage());
    }
}
Then I get the tweets and put them in the database:
public void getTweetByQuery(boolean loadRecords, String keyword) {
    if (cb != null) {
        TwitterFactory tf = new TwitterFactory(cb.build());
        Twitter twitter = tf.getInstance();
        try {
            Query query = new Query(keyword);
            query.setCount(50);
            QueryResult result;
            result = twitter.search(query);
            System.out.println("Getting Tweets...");
            List<Status> tweets = result.getTweets();
            for (Status tweet : tweets) {
                BasicDBObject basicObj = new BasicDBObject();
                basicObj.put("user_name", tweet.getUser().getScreenName());
                basicObj.put("retweet_count", tweet.getRetweetCount());
                basicObj.put("tweet_followers_count", tweet.getUser().getFollowersCount());
                UserMentionEntity[] mentioned = tweet.getUserMentionEntities();
                basicObj.put("tweet_mentioned_count", mentioned.length);
                basicObj.put("tweet_ID", tweet.getId());
                basicObj.put("tweet_text", tweet.getText());
                if (mentioned.length > 0) {
                    // System.out.println("Mentioned length " + mentioned.length + " Mentioned: " + mentioned[0].getName());
                }
                try {
                    items.insert(basicObj);
                } catch (Exception e) {
                    System.out.println("MongoDB Connection Error : " + e.getMessage());
                    loadMenu();
                }
            }
            // Printing fetched records from DB.
            if (loadRecords) {
                getTweetsRecords();
            }
        } catch (TwitterException te) {
            System.out.println("te.getErrorCode() " + te.getErrorCode());
            System.out.println("te.getExceptionCode() " + te.getExceptionCode());
            System.out.println("te.getStatusCode() " + te.getStatusCode());
            if (te.getStatusCode() == 401) {
                System.out.println("Twitter Error : \nAuthentication credentials (https://dev.twitter.com/pages/auth) were missing or incorrect.\nEnsure that you have set valid consumer key/secret, access token/secret, and the system clock is in sync.");
            } else {
                System.out.println("Twitter Error : " + te.getMessage());
            }
            loadMenu();
        }
    } else {
        System.out.println("MongoDB is not Connected! Please check mongoDB instance running..");
    }
}
But as I mentioned before, the same tweets often come back, so there are duplicates in the database.
I think the tweet_ID field is a good candidate for an index and should be unique in the collection.

Set the unique option on your index to have MongoDB enforce uniqueness:
items.ensureIndex(index, new BasicDBObject("unique", true));
Note that you'll need to manually drop the existing index and remove all duplicates or you won't be able to create the unique index.
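For example, a minimal sketch (assuming the same legacy 2.x Java driver used above, version 2.12 or later where DuplicateKeyException exists; older versions throw MongoException.DuplicateKey instead) of creating the unique index and then tolerating rejected duplicates on insert:
// Recreate the index as unique so MongoDB rejects a second document with the same tweet_ID.
BasicDBObject index = new BasicDBObject("tweet_ID", 1);
BasicDBObject options = new BasicDBObject("unique", true);
items.ensureIndex(index, options);

// In the insert loop, a duplicate tweet is then no longer a fatal error - just skip it.
try {
    items.insert(basicObj);
} catch (DuplicateKeyException e) {
    System.out.println("Skipping duplicate tweet " + basicObj.get("tweet_ID"));
}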

This question is already answered, but I would like to contribute a bit, since MongoDB API 2.11 offers a method which receives the unique option as a parameter:
public void ensureIndex(DBObject keys, String name, boolean unique)
A minor reminder to anyone who would like to store JSON documents in MongoDB: uniqueness must be applied to a BasicDBObject key and not to values nested inside a value. For example:
BasicDBObject basicObj = new BasicDBObject();
basicObj.put("user_name", tweet.getUser().getScreenName());
basicObj.put("retweet_count", tweet.getRetweetCount());
basicObj.put("tweet_ID", tweet.getId());
basicObj.put("tweet_text", tweet.getText());
basicObj.put("a_json_text", "{"info_details":{"info_id":"1234"},"info_date":{"year":"2012"}, {"month":"12"}, {"day":"10"}}");
In this case, you can create a unique index only on the basic object keys:
BasicDBObject index = new BasicDBObject();
int directionOrder = 1;
index.put("tweet_ID", directionOrder);
boolean isUnique = true;
items.ensureIndex(index, "unique_tweet_ID", isUnique);
Any index on a value inside the JSON string, like "info_id", would not work since it's not a BasicDBObject key.
Using indexes on MongoDB is not as easy as it sounds. You may also check the MongoDB docs for more details: Mongo Indexing Tutorials and Mongo Index Concepts. Direction order can be quite important to understand once you need a compound index, which is well explained here: Why Direction order matters.
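Alternatively, instead of letting duplicate inserts fail, you can upsert on tweet_ID so that a re-fetched tweet simply replaces the document that is already stored. A rough sketch with the same legacy DBCollection API (the query document here is illustrative):
BasicDBObject query = new BasicDBObject("tweet_ID", tweet.getId());
// upsert = true inserts the document if no tweet with this ID exists yet,
// multi = false updates at most one document
items.update(query, basicObj, true, false);
With this approach you never get a duplicate key error in the first place, at the cost of one update round trip per tweet.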

Related

Trying to get more than 20 followers to display, how can i loop this to give me more? [duplicate]

I have been wondering if there is a way to access the full list of a user's Twitter followers.
We have tried a call to the REST API via twitter4j:
public List<User> getFriendList() {
    List<User> friendList = null;
    try {
        friendList = mTwitter.getFollowersList(mTwitter.getId(), -1);
    } catch (IllegalStateException e) {
        e.printStackTrace();
    } catch (TwitterException e) {
        e.printStackTrace();
    }
    return friendList;
}
But it returns only a list of 20 followers.
I tried using the same call in a loop, but that causes a rate limit exception - it says we are not allowed to make too many requests in a small interval of time.
Do we have a way around this?
You should definitely use getFollowersIDs. As the documentation says, this returns a list of ID objects, broken into pages of around 5,000 IDs at a time. To begin paging, provide a value of -1 as the cursor. The response from the API will include a previous_cursor and a next_cursor to allow paging back and forth.
The tricky part is to handle the cursor. If you can do this, then you will not have the problem of getting only 20 followers.
The first call to getFollowersIDs will need to be given a cursor of -1. For subsequent calls, you need to update the cursor value, by getting the next cursor, as done in the while part of the loop.
long cursor = -1L;
IDs ids;
do {
    ids = twitter.getFollowersIDs(cursor);
    for (long userID : ids.getIDs()) {
        friendList.add(userID);
    }
} while ((cursor = ids.getNextCursor()) != 0);
Here is a very good reference:
https://github.com/yusuke/twitter4j/blob/master/twitter4j-examples/src/main/java/twitter4j/examples/friendsandfollowers/GetFriendsIDs.java
Now, if the user has more than around 75,000 followers, you will have to do some waiting (see Vishal's answer).
The first 15 calls will yield around 75,000 IDs. Then you will have to sleep for 15 minutes, then make another 15 calls, and so on until you get all the followers. This can be done using a simple Thread.sleep(time_in_milliseconds) outside the for loop.
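Putting the paging and the waiting together, a rough sketch (twitter4j calls as used above, inside a method that declares throws TwitterException; the sleep-every-15-calls policy is just an illustration of the limits described):
long cursor = -1L;
IDs ids;
int calls = 0;
do {
    ids = twitter.getFollowersIDs(cursor);
    for (long userID : ids.getIDs()) {
        friendList.add(userID);
    }
    // After 15 calls (~75,000 IDs) the rate limit window is exhausted, so wait 15 minutes.
    if (++calls % 15 == 0) {
        try {
            Thread.sleep(15 * 60 * 1000L);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            break;
        }
    }
} while ((cursor = ids.getNextCursor()) != 0);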
Just change it like this and try; this is working for me:
try {
    Log.i("act twitter...........", "ModifiedCustomTabBarActivity.class");
    // final JSONArray twitterFriendsIDsJsonArray = new JSONArray();
    long cursor = -1;
    IDs ids;
    do {
        ids = mTwitter.mTwitter.getFriendsIDs(cursor);
        for (long id : ids.getIDs()) {
            String ID = "followers ID #" + id;
            String[] firstname = ID.split("#");
            String first_Name = firstname[0];
            String Id = firstname[1];
            Log.i("split...........", first_Name + Id);
            String Name = mTwitter.mTwitter.showUser(id).getName();
            String screenname = mTwitter.mTwitter.showUser(id).getScreenName();
            // Log.i("id.......", "followers ID #" + id);
            // Log.i("Name..", mTwitter.mTwitter.showUser(id).getName());
            // Log.i("Screen_Name...", mTwitter.mTwitter.showUser(id).getScreenName());
            // Log.i("image...", mTwitter.mTwitter.showUser(id).getProfileImageURL());
        }
        // advance the cursor, otherwise the loop keeps re-reading the same page forever
        cursor = ids.getNextCursor();
    } while (ids.hasNext());
} catch (Exception e) {
    e.printStackTrace();
}
Try This...
ConfigurationBuilder confbuilder = new ConfigurationBuilder();
confbuilder.setOAuthAccessToken(accessToken)
        .setOAuthAccessTokenSecret(secretToken)
        .setOAuthConsumerKey(TwitterOAuthActivity.CONSUMER_KEY)
        .setOAuthConsumerSecret(TwitterOAuthActivity.CONSUMER_SECRET);
Twitter twitter = new TwitterFactory(confbuilder.build()).getInstance();
PagableResponseList<User> followersList;
ArrayList<String> list = new ArrayList<String>();
long cursor = -1L; // start paging from the first page
try
{
    followersList = twitter.getFollowersList(screenName, cursor);
    for (int i = 0; i < followersList.size(); i++)
    {
        User user = followersList.get(i);
        String name = user.getName();
        list.add(name);
        System.out.println("Name" + i + ":" + name);
    }
    listView.setAdapter(new ArrayAdapter<String>(this, android.R.layout.simple_list_item_1, list));
    listView.setVisibility(View.VISIBLE);
    friend_list.setVisibility(View.INVISIBLE);
    post_feeds.setVisibility(View.INVISIBLE);
    twit.setVisibility(View.INVISIBLE);
}
catch (TwitterException e)
{
    e.printStackTrace();
}
This is a tricky one.
You should specify whether you're using application or per user tokens and the number of users you're fetching followers_ids for.
You get just 15 calls per 15 minutes in case of an application token. You can fetch a maximum of 5000 followers_ids per call. That gives you a maximum of 75K followers_ids per 15 minutes.
If any of the users you're fetching followers_ids for has over 75K followers, you'll get the rate_limit error immediately. If you're fetching for more than 1 user, you'll need to build strong rate_limit handling in your code with sleeps and be very patient.
The same applies for friends_ids.
I've not had to deal with fetching more than 75K followers/friends for a given user but come to think of it, I don't know if it's even possible anymore.
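If you do hit the limit anyway, twitter4j tells you how long to wait. A rough sketch of reacting to it (method names from twitter4j; the single retry is only an illustration, and the surrounding method is assumed to declare throws TwitterException, InterruptedException):
try {
    ids = twitter.getFollowersIDs(cursor);
} catch (TwitterException te) {
    if (te.exceededRateLimitation() && te.getRateLimitStatus() != null) {
        // getSecondsUntilReset() says when the current 15-minute window ends.
        int wait = te.getRateLimitStatus().getSecondsUntilReset();
        Thread.sleep((wait + 1) * 1000L);
        ids = twitter.getFollowersIDs(cursor); // retry once after the window resets
    } else {
        throw te;
    }
}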

MongoDB - find() calls getting stuck sometimes - Timeout

We are using MongoDB for saving and fetching data.
All calls that put data into collections work fine and go through a common method.
All calls that fetch data from collections also go through a common method and work fine most of the time.
But sometimes, for one particular collection only, my fetch calls get stuck forever and consume CPU. I have to kill the threads manually, otherwise they consume my whole CPU.
Mongo Connection
MongoClient mongo = new MongoClient(hostName , Integer.valueOf(port));
DB mongoDb = mongo.getDB(dbName);
Code To fetch
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject dbObject = new BasicDBObject("_id" , key);
DBCursor cursor = collection.find(dbObject);
Though I have figured out which collection is causing the issue, how can I improve upon this, since it occurs only for this particular collection, and only sometimes?
EDIT
Code to save
DBCollection collection = mongoDb.getCollection(collectionName);
DBObject query = new BasicDBObject("_id" , key);
DBObject update = new BasicDBObject();
update.put("$set" , JSON.parse(value));
collection.update(query , update , true , false);
Bulk Write / collection
DB mongoDb = controllerFactory.getMongoDB();
DBCollection collection = mongoDb.getCollection(collectionName);
BulkWriteOperation bulkWriteOperation = collection.initializeUnorderedBulkOperation();
Map<String, Object> dataMap = (Map<String, Object>) JSON.parse(value);
for (Entry<String, Object> entrySet : dataMap.entrySet()) {
    BulkWriteRequestBuilder bulkWriteRequestBuilder = bulkWriteOperation.find(new BasicDBObject("_id",
            entrySet.getKey()));
    DBObject update = new BasicDBObject();
    update.put("$set", entrySet.getValue());
    bulkWriteRequestBuilder.upsert().update(update);
}
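Note that the snippet above never calls execute() on the bulk operation; nothing is sent to the server until it does. If that is also true in the real code, the upserts never run. A minimal sketch of completing it (same driver API as above):
// Nothing is written until execute() is called on the bulk operation.
BulkWriteResult result = bulkWriteOperation.execute();
System.out.println("Matched: " + result.getMatchedCount() + ", upserted: " + result.getUpserts().size());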
How can I set a timeout for fetch calls?
A different approach is to use the API of the MongoDB 3.2 driver. Keep in mind that you have to update your .jar libraries (if you haven't already) to the latest version.
public final MongoClient connectToClient(String hostName, String port) {
    try {
        MongoClient client = new MongoClient(hostName, Integer.valueOf(port));
        return client;
    } catch (MongoClientException e) {
        System.err.println("Cannot connect to Client.");
        return null;
    }
}

public final MongoDatabase connectToDB(MongoClient client, String databaseName) {
    try {
        MongoDatabase db = client.getDatabase(databaseName);
        return db;
    } catch (Exception e) {
        System.err.println("Error in connecting to database " + databaseName);
        return null;
    }
}

public final void closeConnection(MongoClient client) {
    client.close();
}

public final void findDoc(MongoDatabase db, String collectionName) {
    MongoCollection<Document> collection = db.getCollection(collectionName);
    try {
        // 'key' is assumed to be defined elsewhere (the _id value you are looking for)
        FindIterable<Document> iterable = collection
                .find(new Document("_id", key));
        Document doc = iterable.first();
        // For an Int64 field named 'special_id'
        long specialId = doc.getLong("special_id");
    } catch (MongoException e) {
        System.err.println("Error in retrieving document.");
    } catch (NullPointerException e) {
        System.err.println("Document with _id " + key + " does not exist.");
    }
}

public final void insertToDB(MongoDatabase db, String collectionName) {
    try {
        db.getCollection(collectionName).insertOne(new Document()
                .append("special_id", 5)
                // Append anything
        );
    } catch (MongoException e) {
        System.err.println("Error in inserting new document.");
    }
}

public final void updateDoc(MongoDatabase db, String collectionName, long id) {
    MongoCollection<Document> collection = db.getCollection(collectionName);
    try {
        collection.updateOne(new Document("_id", id),
                new Document("$set",
                        new Document("special_id",
                                7)));
    } catch (MongoException e) {
        System.err.println("Error in updating new document.");
    }
}

public static void main(String[] args) {
    // (assuming the helper methods above live on the same class / an instance of it)
    String hostName = "myHost";
    String port = "myPort";
    String databaseName = "myDB";
    String collectionName = "myCollection";
    MongoClient client = connectToClient(hostName, port);
    if (client != null) {
        MongoDatabase db = connectToDB(client, databaseName);
        if (db != null) {
            findDoc(db, collectionName);
        }
        closeConnection(client);
    }
}
EDIT: As the others suggested, check from the command line whether finding the document by its ID is slow there too. If it is, maybe this is a problem with your hard drive. The _id field is supposed to be indexed, but for better or for worse, try re-creating the index on the _id field.
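Regarding the original question of how to set a timeout on fetch calls: with the 2.x driver you can bound how long a find is allowed to take, either at the connection level or per cursor. A rough sketch (option and method names from the Java driver, 2.12 or later for maxTime; the millisecond values are arbitrary):
// Connection-level timeouts, applied to every operation on this client.
MongoClientOptions options = MongoClientOptions.builder()
        .connectTimeout(10000)   // fail if a connection cannot be established within 10 s
        .socketTimeout(30000)    // fail any socket read/write that takes longer than 30 s
        .build();
MongoClient mongo = new MongoClient(new ServerAddress(hostName, Integer.valueOf(port)), options);

// Per-query server-side limit: abort this find if it runs longer than 5 seconds.
DBCursor cursor = collection.find(new BasicDBObject("_id", key))
        .maxTime(5, TimeUnit.SECONDS);
The socket timeout makes a stuck call fail fast on the client instead of hanging the thread; maxTime additionally tells the server to kill the query.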
The answers posted by others are great, but they did not solve my problem.
The actual issue was in my existing code itself: my cursor was waiting in a while loop for an infinite time.
I was missing a few checks, which have now been resolved.
Just some possible explanations/thoughts.
In general, "query by id" has to be fast, since _id is supposed to be indexed, always. The code snippet looks correct, so probably the reason is in mongo itself. This leads me to a couple of suggestions:
Try to connect to mongo directly from the command line and run the "find" from there. The chances are that you'll still be able to observe occasional slowness.
In this case:
Maybe it's about the disks (maybe this particular server is deployed on a slow disk, or at least there is a correlation with some slowness in accessing the disk).
Maybe you have a sharded configuration and one shard is slower than the others.
Maybe it's a network issue that occurs sporadically. If you run mongo locally/on a staging env with the same collection, does this reproduce?
Maybe (although I hardly believe it) the query runs in a sub-optimal way. In this case you can use explain(), as someone has already suggested here (see the sketch after this list).
If you happen to have a replica set, please figure out what the Read Preference is. Who knows, maybe you prefer to read this id from a sub-optimal server.
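For reference, a minimal sketch of running explain() from the Java driver on the query in question (same 2.x DBCollection API as in the question):
DBObject query = new BasicDBObject("_id", key);
// explain() returns the query plan instead of the results,
// e.g. which index was used and how many documents were scanned.
DBObject plan = collection.find(query).explain();
System.out.println(plan);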

No file BLOB, through JSON from ORACLE to SQL Server

I am a student and right now I'm doing an internship with a local library, and I have the following problem:
In the project I'm working on, I need to retrieve image data from a temporary table, built in Oracle, that receives its data from some triggers in an Informix DB, pass it through a monitor made in Java, in JSON format, to a web service published in C#, and insert that image into a SQL Server DB.
I looked around and found that it is possible to pass images through JSON using Base64 encoding, but the examples all assume that you have the image file's path and encode the file. As you may have realized by now, I can't use that route because I don't have those files; best case scenario, the triggers are able to feed some BLOB data (from what I've been told), but I have to insert the images into the SQL Server DB as VARBINARY(MAX).
To summarize:
--> The Informix DB has the images --> triggers feed an Oracle temp_table (images sent probably as BLOB, or CLOB at most) --> the monitor made in Java must read those BLOBs or CLOBs and send them through JSON
--> The web service made in C# must receive that JSON and insert the images into a SQL Server DB (where they need to be visible, without a physical file to refer to).
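What I'm considering for the Java side is to read the BLOB column directly and Base64-encode the bytes, so no file on disk is needed. A rough sketch of a helper (the column name and method are illustrative, not part of the actual schema, and java.util.Base64 requires Java 8):
import java.sql.Blob;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Base64;

// Hypothetical helper: read one BLOB column from the current ResultSet row
// and return it as a Base64 string ready to be put into the JSON document.
static String blobToBase64(ResultSet rs, String column) throws SQLException {
    Blob blob = rs.getBlob(column);
    if (blob == null) {
        return "";
    }
    byte[] bytes = blob.getBytes(1, (int) blob.length()); // BLOB positions are 1-based
    return Base64.getEncoder().encodeToString(bytes);
}
The C# side would then decode the string (Convert.FromBase64String) back to a byte[] before binding it to the VARBINARY(MAX) parameter, rather than passing the raw string.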
The schema I'm using (it has been IMPOSED on me, I didn't have a say in this) is something similar to this (it's really long and tedious code, so I'll try to make it as neat and clean as possible):
This is the part of the Java monitor that specifies which fields from the temp_table feed which fields in the JSON structure:
public static BookRecordList viewBookRecordTable(Connection connection) throws ExceptionToOracleConcurrent
{
BookRecordList bookRecordList = new BookRecordList();
BookRecord bookRecord = new BookRecord();
Statement stmt = null;
String query = "SELECT operacion,"
+ "UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(img_logo,32670,1))"
+ "x_logo,"
+ "UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(img_logoGris,32760,1))"
+ "UTL_RAW.CAST_TO_VARCHAR2(DBMS_LOB.SUBSTR(r_firma,32670,1)),"
+ " FROM "
+ dataBaseConnectionData.getDB_SHCHEMA() + "."+ dataBaseConnectionData.getDB_TABLE_COLA()
+ " WHERE (some condition)";
try
{
stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery(query);
while(rs.next())
{
try
{
bookRecord = new BookRecord();
bookRecord.setOperacion(rs.getInt("operacion"));
bookRecord.setImg_logo(rs.getString("img_logo"));
bookRecord.setImg_logoGris(rs.getString("img_logoGris"));
bookRecord.setR_firma(rs.getString("r_firma"));
bookRecord.print();
bookRecordList.getBookRecordList().add(bookRecord);
}
catch (Exception e)
{
logger.error("Some exception " + dataBaseConnectionData.getDB_TABLE_COLA() + ": " + e.toString());
e.printStackTrace();
//Process next order
continue;
}
}
}
catch (SQLException e )
{
logger.fatal("Some exception " + dataBaseConnectionData.getDB_TABLE_COLA() + ": " + e.toString());
throw new ExceptionToOracleConcurrent("exception definition " + dataBaseConnectionData.getDB_TABLE_COLA() + ": " + e.toString());
}
finally
{
if (stmt != null)
{
try
{
stmt.close();
}
catch (SQLException e)
{
logger.fatal("another exception " + e.toString());
}
}
}
return bookRecordList;
}
This is the part of the Java monitor that generates the JSON (the empty cases contain other stuff that goes into the JSON, but I sorted that out):
private static String GenerateJSON(SomeClass someClass) throws IOException
{
int operation = someClass.getOperation();
JSONObject obj = new JSONObject();
String jsonText = "";
switch (operation)
{
case 0:
//obligatory fields
obj.put("img_logo",someClass.getImg_logo());
break;
case 1:
break;
case 2:
//Obligatory fields
obj.put("img_foto",someClass.getC_empleado());
obj.put("img_firma",someClass.getC_empleado());
break;
case 3:
obj.put("r_firma",someClass.getR_firma());
break;
case 4:
break;
case 5:
break;
}
StringWriter out = new StringWriter();
obj.writeJSONString(out);
jsonText = out.toString();
String newJson = jsonText.replace("\\/", "/");
logger.info("JSON a enviar: " + newJson);
return newJson;
}
The web service is made in C#; it's another case-based program, structured according to the operation number received in the JSON. It calls a number of functions and, in the end, it comes down to these two:
This part of the WS receives the parameters from the parsed JSON:
public int ActualizarFichaLibro( String img_foto, String r_firma)
{
try
{
//Define query to insert
Cmd.CommandText = QueryCFA.ActualizarFicha();
//Define parameters types to insert
Cmd.Parameters.Add("#img_foto", SqlDbType.VarBinary, -1);
Cmd.Parameters.Add("#r_firma", SqlDbType.VarBinary, -1);
//Define parameters values to insert
Cmd.Parameters["#img_foto"].Value = img_foto;
Cmd.Parameters["#r_firma"].Value = r_firma;
int rowCount = Cmd.ExecuteNonQuery();
CerrarConexionBd();
return rowCount;
}
catch (Exception)
{
return 0;
}
}
and finally it invokes a simple query; in this particular case, this one:
public string ActualizarFicha()
{
Query = "UPDATE dbo.fichaEmpleado SET( CASE WHEN #img_foto = '' THEN NULL ELSE img_foto = CONVERT(VARBINARY(MAX), #img_foto, 2) END,"
+ "CASE WHEN #r_firma = '' THEN NULL ELSE img_firma = CONVERT(VARBINARY(MAX), #r_firma, 2) END,"
+"WHERE (some conditions)";
return Query;
}
My questions are:
Is there a way to do this (sending images from one DB to another) through JSON, specifically with this massive schema these people have got going on? If not, is there another way to do it?
Are the queries for reading a (possible) BLOB and inserting a VARBINARY well implemented?
I'm sorry for the extremely long explanation; I've been working on this for a week and I can't seem to find a proper way to do it (at least not with this schema, but the bosses don't want to change it).

Indexing of document in elastic search, JAVA API

We are indexing resume documents using the Elasticsearch Java API. It works fine: when we search for a keyword, it returns the accurate response (the documents that contain that keyword).
But we want to index the documents in more depth. For example, a resume has 'Skills' and their 'Skill Months'. A skill might have 13 months in a document, so if I search for that skill and set skill months between 10 and 15 months in the Elasticsearch query, we want that record (document) back.
How can we do this?
Here is the code for indexing:
IndexResponse response = client
.prepareIndex(userName, document.getType(),
document.getId())
.setSource(extractDocument(document)).execute()
.actionGet();
public XContentBuilder extractDocument(Document document) throws IOException, NoSuchAlgorithmException {
// Extracting content with Tika
int indexedChars = 100000;
Metadata metadata = new Metadata();
String parsedContent;
try {
// Set the maximum length of strings returned by the parseToString method, -1 sets no limit
parsedContent = tika().parseToString(new BytesStreamInput(
Base64.decode(document.getContent().getBytes()), false), metadata, indexedChars);
} catch (Throwable e) {
logger.debug("Failed to extract [" + indexedChars + "] characters of text for [" + document.getName() + "]", e);
System.out.println("Failed to extract [" + indexedChars + "] characters of text for [" + document.getName() + "]" +e);
parsedContent = "";
}
XContentBuilder source = jsonBuilder().startObject();
if (logger.isTraceEnabled()) {
source.prettyPrint();
}
// File
source
.startObject(FsRiverUtil.Doc.FILE)
.field(FsRiverUtil.Doc.File.FILENAME, document.getName())
.field(FsRiverUtil.Doc.File.LAST_MODIFIED, new Date())
.field(FsRiverUtil.Doc.File.INDEXING_DATE, new Date())
.field(FsRiverUtil.Doc.File.CONTENT_TYPE, document.getContentType() != null ? document.getContentType() : metadata.get(Metadata.CONTENT_TYPE))
.field(FsRiverUtil.Doc.File.URL, "file://" + (new File(".", document.getName())).toString());
if (metadata.get(Metadata.CONTENT_LENGTH) != null) {
// We try to get CONTENT_LENGTH from Tika first
source.field(FsRiverUtil.Doc.File.FILESIZE, metadata.get(Metadata.CONTENT_LENGTH));
} else {
// Otherwise, we use our byte[] length
source.field(FsRiverUtil.Doc.File.FILESIZE, Base64.decode(document.getContent().getBytes()).length);
}
source.endObject(); // File
// Path
source
.startObject(FsRiverUtil.Doc.PATH)
.field(FsRiverUtil.Doc.Path.ENCODED, SignTool.sign("."))
.field(FsRiverUtil.Doc.Path.ROOT, ".")
.field(FsRiverUtil.Doc.Path.VIRTUAL, ".")
.field(FsRiverUtil.Doc.Path.REAL, (new File(".", document.getName())).toString())
.endObject(); // Path
// Meta
source
.startObject(FsRiverUtil.Doc.META)
.field(FsRiverUtil.Doc.Meta.AUTHOR, metadata.get(Metadata.AUTHOR))
.field(FsRiverUtil.Doc.Meta.TITLE, metadata.get(Metadata.TITLE) != null ? metadata.get(Metadata.TITLE) : document.getName())
.field(FsRiverUtil.Doc.Meta.DATE, metadata.get(Metadata.DATE))
.array(FsRiverUtil.Doc.Meta.KEYWORDS, Strings.commaDelimitedListToStringArray(metadata.get(Metadata.KEYWORDS)))
.endObject(); // Meta
// Doc content
source.field(FsRiverUtil.Doc.CONTENT, parsedContent);
// Doc as binary attachment
source.field(FsRiverUtil.Doc.ATTACHMENT, document.getContent());
// End of our document
source.endObject();
return source;
}
The below code is used for getting the response:
QueryBuilder qb;
if (query == null || query.trim().length() <= 0) {
qb = QueryBuilders.matchAllQuery();
} else {
qb = QueryBuilders.queryString(query);//query is a name or string
}
org.elasticsearch.action.search.SearchResponse searchHits = node.client()
.prepareSearch()
.setIndices("ankur")
.setQuery(qb)
.setFrom(0).setSize(1000)
.addHighlightedField("file.filename")
.addHighlightedField("content")
.addHighlightedField("meta.title")
.setHighlighterPreTags("<span class='badge badge-info'>")
.setHighlighterPostTags("</span>")
.addFields("*", "_source")
.execute().actionGet();
Elasticsearch indexes all the fields by default to provide better search capabilities. Before you put your JSON documents under some type, it would be good to define your mappings (refer: https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-analysis.html).
When you want to search data by exact keyword, you may need to skip analysis for a particular field. While indexing a document, the field values are analyzed and then indexed. You can tell Elasticsearch not to do this by marking the field "not_analyzed"; then your field value will be indexed as it is. This way you can get better search results.
As for defining your JSON documents, it would be good to use a library to build the JSON. I prefer the Jackson library for handling JSON; this will reduce the lines of code in your project.
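To actually filter on the months range, a rough sketch of what the query side could look like, assuming a (hypothetical) nested "skills" object in the mapping with a not_analyzed "name" field and an integer "months" field (QueryBuilder names from the Elasticsearch Java API used in the question; the exact builder methods depend on your Elasticsearch version):
// Match documents that have a skill named "java" with between 10 and 15 months of experience.
QueryBuilder skillQuery = QueryBuilders.nestedQuery("skills",
        QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("skills.name", "java"))
                .must(QueryBuilders.rangeQuery("skills.months").gte(10).lte(15)));

SearchResponse response = node.client()
        .prepareSearch("ankur")
        .setQuery(skillQuery)
        .execute().actionGet();
The important part is that "skills" is mapped as a nested type, so that the name and the months of the same skill entry are matched together rather than across different skills.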

How do I extract properties from Entities in Google App Engine Datastore using Java

I am using Google App Engine and trying to query / pull data from the Datastores. I have followed nearly 20 different tutorials without any luck.
Here is a picture of my Datastore and the respective sample data I have stored in there:
Here is some of the code I have to pull the data:
//To obtain the keys
final DatastoreService dss = DatastoreServiceFactory.getDatastoreService();
final Query query = new Query("Coupon");
List<Key> keys = new ArrayList<Key>();
//Put the keys into a list for iteration
for (final Entity entity : dss.prepare(query).asIterable(FetchOptions.Builder.withLimit(100000))) {
    keys.add(entity.getKey());
}
try {
    for (int i = 0; i < keys.size(); i++) {
        Entity myEntity = new Entity("Coupon", keys.get(i));
        System.out.println("Size of the Keys array = " + keys.size());
        String description = (String) myEntity.getProperty("desc");
        String endDate = (String) myEntity.getProperty("endDate");
        System.out.println("Description = " + description);
        System.out.println("End Date: " + endDate);
        //Map here is empty...
        Map<String, Object> test = myEntity.getProperties();
        System.out.println("MAP SIZE = " + test.size());
    }
} catch (Exception e) {
    e.printStackTrace();
}
OUTPUT:
Size of the Keys array = 2
Description = null
End date = null
MAP SIZE = 0
I have no clue why the description and end date are null. It is clearly pulling in the right entities, as the size shows 2, which matches the picture shown. Also, when I print the keys out, they match as well
(something like this for keys.get(i).toString(): Entity [!global:Coupon(123)/Coupon(no-id-yet)],
or: Key String = !global:Coupon(5730827476402176)).
I have followed the documentation (here) and some examples (here) to the best of my ability, but I cannot seem to figure it out. Does anyone have any recommendations or experience with obtaining the properties from entities once you have them, without them returning null?
I have gone through the following Stack Overflow questions without any success, so please do not close this with a simple duplicate-question marker on it:
1) How do i get all child entities in Google App Engine (Low-level API)
2) Storing hierarchical data in Google App Engine Datastore?
3) How do you use list properties in Google App Engine datastore in Java?
4) Mass updates in Google App Engine Datastore
5) Checking if Entity exists in google app engine datastore
Have you tried this?
//Put the keys into a list for iteration
for (final Entity entity : dss.prepare(query).asIterable(FetchOptions.Builder.withLimit(100000))) {
    String description = (String) entity.getProperty("desc");
    String endDate = (String) entity.getProperty("endDate");
    System.out.println("Description = " + description);
    System.out.println("End Date: " + endDate);
}
In your example, you are creating a new entity, so it is expected that its properties will be empty.
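If you do want to go through the keys you collected, the low-level API can also fetch the stored entity by key instead of constructing a new one. A small sketch using the dss and keys variables from the question (DatastoreService.get() throws EntityNotFoundException, which has to be handled):
try {
    // get() returns the entity that is stored in the Datastore;
    // new Entity("Coupon", key) only creates an empty one in memory.
    Entity stored = dss.get(keys.get(i));
    String description = (String) stored.getProperty("desc");
    String endDate = (String) stored.getProperty("endDate");
    System.out.println("Description = " + description);
    System.out.println("End Date: " + endDate);
} catch (EntityNotFoundException e) {
    e.printStackTrace();
}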
Eureka! Many thanks to all who answered. Patrice and user2649908, thank you especially, as you led me to the answer.
Patrice was entirely correct: I was querying to get the keys, building a new entity, and then trying to read properties from the newly created (empty) entity.
The solution was to use the PersistenceManager to load the data and then use getter/accessor methods. The link for the persistence manager (which I more or less copied from directly, as it worked perfectly) is here:
How to use JDO persistence manager?
Once I setup the persistence manager, I was able to get it to pull the data using this code:
try {
    for (int i = 0; i < keys.size(); i++) {
        //See the link for How to use JDO persistence manager on how to use this
        PersistenceManager pm = MyPersistenceManagerClass.getPM();
        //Need to cast it here because it returns an object
        Coupon coupon = (Coupon) pm.getObjectById(Coupon.class, keys.get(i));
        System.out.println("Created by = " + coupon.getCreatedBy());
        System.out.println("Description = " + coupon.getDesc());
        System.out.println("Modified by = " + coupon.getModifiedBy());
    }
} catch (Exception e) {
    e.printStackTrace();
}
