I am trying to build a simple application that persists an object every time a GET request is made. In the code below, I use a servlet named Put to accomplish this.
public class Put extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        PersistenceManagerFactory PMF = JDOHelper
                .getPersistenceManagerFactory("transactions-optional");
        PersistenceManager pm = PMF.getPersistenceManager();
        String id = req.getParameter("id");
        String name = req.getParameter("name");
        String email = req.getParameter("email");
        String productId = req.getParameter("productid");
        String timeStamp = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        String mailSent = req.getParameter("mailsent");
        Product product = new Product(id, name, email, productId, timeStamp, mailSent);
        /*
         * Get number of objects persisted till now
         * Increment the count and use that value as key
         */
        Key key = KeyFactory.createKey(Product.class.getSimpleName(),
                "1001"); // ??
        product.setKey(key);
        try {
            pm.makePersistent(product);
        } finally {
            pm.close();
        }
    }
}
To retrieve all objects I use a Get servlet:
public class Get extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        resp.setContentType("text/plain");
        PersistenceManagerFactory PMF = JDOHelper
                .getPersistenceManagerFactory("transactions-optional");
        PersistenceManager pm = PMF.getPersistenceManager();
        /*
         * Get number of objects stored
         * loop from 0 to the count and print all objects
         */
        Product e = pm.getObjectById(Product.class, req.getParameter("id"));
        resp.getWriter().println(e);
    }
}
My problem is: how do I get the number of objects stored in the datastore?
You should be very careful about using counts in the datastore. Datastore operations are designed to scale with the size of the result set, not with the size of the stored data, so there is no efficient way to count all the entities of a kind. In a large distributed system it is also difficult to maintain a strongly consistent count; the sharded counters pattern shows what is required to implement one.
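If you really need a running total, the usual workaround is a sharded counter: keep N counter entities, increment one of them at random on each write, and sum them all to read the total. Below is a minimal sketch of the idea; the class and field names (CounterShard, NUM_SHARDS) are assumptions, and in real code each increment should run inside a transaction.
// Sharded counter sketch: N shard entities; increment one at random,
// sum all of them to read the total. Names here are illustrative only.
@PersistenceCapable
public class CounterShard {
    @PrimaryKey
    private String shardName;          // e.g. "ProductCounter-7"
    @Persistent
    private long count;
    // constructor, getters and setters omitted
}

private static final int NUM_SHARDS = 20;
private static final Random RANDOM = new Random();

public void incrementCount(PersistenceManager pm) {
    // should be wrapped in a transaction in production code
    String shardName = "ProductCounter-" + RANDOM.nextInt(NUM_SHARDS);
    CounterShard shard;
    try {
        shard = pm.getObjectById(CounterShard.class, shardName);
    } catch (JDOObjectNotFoundException e) {
        shard = new CounterShard();
        shard.setShardName(shardName);
        shard.setCount(0);
    }
    shard.setCount(shard.getCount() + 1);
    pm.makePersistent(shard);
}

public long getCount(PersistenceManager pm) {
    long total = 0;
    Query q = pm.newQuery(CounterShard.class);
    try {
        for (CounterShard shard : (List<CounterShard>) q.execute()) {
            total += shard.getCount();
        }
    } finally {
        q.closeAll();
    }
    return total;
}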
Additionally, you should not store your data using sequential keys: writing entities in sequential key order can itself create write hotspots and hurt performance. This is why the default id allocation policy in Datastore switched to scattered (non-sequential) ids.
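Rather than deriving the key from a running count at all, you can let the datastore allocate it for you. A minimal sketch, assuming your Product class is mapped with JDO annotations:
// Assumed JDO mapping for Product: the datastore assigns the key, so no
// counting and no KeyFactory.createKey(...) call is needed when persisting.
@PersistenceCapable
public class Product {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;                 // filled in by the datastore on makePersistent()

    // remaining fields, constructor and accessors as in your class
}

// In the servlet, simply:
Product product = new Product(id, name, email, productId, timeStamp, mailSent);
pm.makePersistent(product);          // key is allocated automatically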
In order to loop over all of your entities, you should issue a query over your Product kind.
Query q = pm.newQuery(Product.class);
try {
    List<Product> results = (List<Product>) q.execute();
    if (!results.isEmpty()) {
        for (Product p : results) {
            // Process result p
        }
    } else {
        // Handle "no results" case
    }
} finally {
    q.closeAll();
}
Note that as you get more entities, you'll eventually have too many entities to display on a single page. You should plan for this by setting limits and using cursors to implement paging.
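A small sketch of capping one page with a JDO range (the page size of 20 is an arbitrary choice; App Engine cursors are the sturdier option for deep paging):
// Limit the result set to a single page of 20 entities.
Query q = pm.newQuery(Product.class);
q.setRange(0, 20);                   // first page: results 0..19
try {
    List<Product> page = (List<Product>) q.execute();
    // render this page; keep a cursor if you need to fetch the next one
} finally {
    q.closeAll();
}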
If you want your results in date order, you'll have to order by your timestamp:
Query q = pm.newQuery(Product.class);
q.setOrdering("timestamp");
Also be aware that your query will be eventually consistent, which means you may not see an entity in query results for some time after you put it. If you need strongly consistent reads, you will have to rethink your data design, for example by grouping related entities into entity groups and using ancestor queries.
Related
We created a program to make using the database easier for other programs, so the code I'm showing here is used by multiple other programs.
One of those programs receives about 10,000 records from one of our clients and has to check whether they are already in our database. If not, we insert them (they can also change and then have to be updated).
To make this easy, we load every entry from the whole table (currently 120,000 rows), create an object for each entry, and put them all into a HashMap.
Loading the whole table this way takes around 5 minutes. We also sometimes have to restart the program because we run into a GC overhead error, since we work on limited hardware. Do you have an idea of how we can improve the performance?
Here is the code that loads all entries (we have a global limit of 10,000 entries per query, so we use a loop):
public Map<String, IMasterDataSet> getAllInformationObjects(ISession session) throws MasterDataException {
    IQueryExpression qe;
    IQueryParameter qp;
    // our main SDP class
    Constructor<?> constructorForSDPbaseClass = getStandardConstructor();
    SimpleDateFormat itaTimestampFormat = new SimpleDateFormat("yyyyMMddHHmmssSSS");
    // search in standard time range (modification date!)
    Calendar cal = Calendar.getInstance();
    cal.set(2010, Calendar.JANUARY, 1);
    Date startDate = cal.getTime();
    Date endDate = new Date();
    Long startDateL = Long.parseLong(itaTimestampFormat.format(startDate));
    Long endDateL = Long.parseLong(itaTimestampFormat.format(endDate));
    IDescriptor modDesc = IBVRIDescriptor.ModificationDate.getDescriptor(session);
    // count once before to determine initial capacities for hash map/set
    IBVRIArchiveClass SDP_ARCHIVECLASS = getMasterDataPropertyBag().getSDP_ARCHIVECLASS();
    qe = SDP_ARCHIVECLASS.getQueryExpression(session);
    qp = session.getDocumentServer().getClassFactory()
            .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
    qp.setExpression(qe);
    qp.setHitLimitThreshold(0);
    qp.setHitLimit(0);
    int nrOfHitsTotal = session.getDocumentServer().queryCount(session, qp, "*");
    int initialCapacity = (int) (nrOfHitsTotal / 0.75 + 1);
    // MD sets; and objects already done (here: document ID)
    HashSet<String> objDone = new HashSet<>(initialCapacity);
    HashMap<String, IMasterDataSet> objRes = new HashMap<>(initialCapacity);
    qp.close();
    // do queries until hit count is smaller than 10.000
    // use modification date
    boolean keepGoing = true;
    while (keepGoing) {
        // construct query expression
        // - basic part: Modification date & class type
        // a. doc. class type
        qe = SDP_ARCHIVECLASS.getQueryExpression(session);
        // b. ID
        qe = SearchUtil.appendQueryExpressionWithANDoperator(session, qe,
                new PlainExpression(modDesc.getQueryLiteral() + " BETWEEN " + startDateL + " AND " + endDateL));
        // 2. Query Parameter: set database; set expression
        qp = session.getDocumentServer().getClassFactory()
                .getQueryParameterInstance(session, new String[] {SDP_ARCHIVECLASS.getDatabaseName(session)}, null, null);
        qp.setExpression(qe);
        // order by modification date; hitlimit = 0 -> no hitlimit, but the usual 10.000 max
        qp.setOrderByExpression(session.getDocumentServer().getClassFactory().getOrderByExpressionInstance(modDesc, true));
        qp.setHitLimitThreshold(0);
        qp.setHitLimit(0);
        // Do not sort by modification date;
        qp.setHints("+NoDefaultOrderBy");
        keepGoing = false;
        IInformationObject[] hits = null;
        IDocumentHitList hitList = null;
        hitList = session.getDocumentServer().query(qp, session);
        IDocument doc;
        if (hitList.getTotalHitCount() > 0) {
            hits = hitList.getInformationObjects();
            for (IInformationObject hit : hits) {
                String objID = hit.getID();
                if (!objDone.contains(objID)) {
                    // do something with this object and the class
                    // here: construct a new SDP sub class object and give it back via interface
                    doc = (IDocument) hit;
                    IMasterDataSet mdSet;
                    try {
                        mdSet = (IMasterDataSet) constructorForSDPbaseClass.newInstance(session, doc);
                    } catch (Exception e) {
                        // cause for this
                        String cause = (e.getCause() != null) ? e.getCause().toString() : MasterDataException.ERRMSG_PART_UNKNOWN;
                        throw new MasterDataException(MasterDataException.ERRMSG_NOINSTANCE_POSSIBLE, this.getClass().getSimpleName(), e.toString(), cause);
                    }
                    objRes.put(mdSet.getID(), mdSet);
                    objDone.add(objID);
                }
            }
            doc = (IDocument) hits[hits.length - 1];
            Date lastModDate = ((IDateValue) doc.getDescriptor(modDesc).getValues()[0]).getValue();
            startDateL = Long.parseLong(itaTimestampFormat.format(lastModDate));
            keepGoing = (hits.length >= 10000 || hitList.isResultSetTruncated());
        }
        qp.close();
    }
    return objRes;
}
Loading 120,000 rows (and more) each time will not scale well, and your solution may stop working as the number of records grows. Instead, let the database server handle the problem.
Your table needs a primary key or unique key based on the columns of the records. Iterate through the 10,000 incoming records and perform a JDBC SQL UPDATE for each one, modifying all field values with a WHERE clause that exactly matches the primary/unique key.
update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?; // ... AND PKCOL2 =? ...
This modifies an existing row or does nothing at all, and JDBC executeUpdate() will return 0 or 1 to indicate the number of rows changed. If the number of rows changed was zero, you have detected a new record that does not exist yet, so perform an INSERT for that record only.
insert into BLAH (COL1, COL2, ... PKCOL) values (?,?, ..., ?);
You can decide whether to run 10,000 updates followed by however many inserts are needed, or to do an update plus an optional insert per record. Also remember that JDBC batch statements and turning auto-commit off may help speed things up.
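For illustration, a hedged sketch of the update-then-insert loop; the connection variable, the Record type and its getters are assumptions, and the table/column names are the placeholders from above:
// Update each incoming record; fall back to INSERT when no row was changed.
String updateSql = "update BLAH set COL1 = ?, COL2 = ? where PKCOL = ?";
String insertSql = "insert into BLAH (COL1, COL2, PKCOL) values (?, ?, ?)";
connection.setAutoCommit(false);
try (PreparedStatement update = connection.prepareStatement(updateSql);
     PreparedStatement insert = connection.prepareStatement(insertSql)) {
    for (Record r : incomingRecords) {          // the ~10,000 client records
        update.setString(1, r.getCol1());
        update.setString(2, r.getCol2());
        update.setString(3, r.getPk());
        if (update.executeUpdate() == 0) {      // no existing row -> queue an insert
            insert.setString(1, r.getCol1());
            insert.setString(2, r.getCol2());
            insert.setString(3, r.getPk());
            insert.addBatch();
        }
    }
    insert.executeBatch();
    connection.commit();
}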
I created a Lucene index in Geode with the code provided in the documentation, then put a couple of objects into the region and queried it with a Lucene query, which the documentation also shows how to do. But the query result is always empty. Here is my code:
Starting a Geode server and creating a Lucene index in it:
public static void startServerAndLocator() throws InterruptedException {
    ServerLauncher serverLauncher = new ServerLauncher.Builder()
            .setMemberName("server1")
            .setServerPort(40404)
            .set("start-locator", "127.0.0.1[10334]")
            .build();
    ServerLauncher.ServerState state = serverLauncher.start();
    _logger.info(state.toString());
    Cache cache = new CacheFactory().create();
    createLuceneIndex(cache);
    cache.createRegionFactory(RegionShortcut.PARTITION).create("test");
}
public static void createLuceneIndex(Cache cache) throws InterruptedException {
    LuceneService luceneService = LuceneServiceProvider.get(cache);
    luceneService.createIndexFactory()
            .addField("fullName")
            .addField("salary")
            .addField("phone")
            .create("employees", "/test");
}
Putting objects into the region and querying it:
public static void testGeodeServer() throws LuceneQueryException, InterruptedException {
    ClientCache cache = new ClientCacheFactory()
            .addPoolLocator("localhost", 10334)
            .create();
    Region<Integer, Person> region = cache
            .<Integer, Person>createClientRegionFactory(ClientRegionShortcut.CACHING_PROXY).create("test");
    List<Person> persons = Arrays.asList(
            new Person("John", 3000, 5556644),
            new Person("Jane", 4000, 6664488),
            new Person("Janet", 3500, 1112233));
    for (int i = 0; i < persons.size(); i++) {
        region.put(i, persons.get(i));
    }
    LuceneService luceneService = LuceneServiceProvider.get(cache);
    LuceneQuery<Integer, Person> query = luceneService.createLuceneQueryFactory()
            .setLimit(10)
            .create("employees", "/test", "fullName:John AND salary:3000", "salary");
    Collection<Person> values = query.findValues();
    System.out.println("Query results:");
    for (Person person : values) {
        System.out.println(person);
    }
    cache.close();
}
Person is a basic POJO class with three fields (name, salary, phone).
What am I doing wrong here? Why is the query result empty?
If you do a query with just fullName, do you still get no results?
I think the issue is that salary and phone are getting stored as IntPoint fields. You could make them String fields in your Person class so they get stored as strings, or you could use an integer query, e.g.:
luceneService.createLuceneQueryFactory()
        .create("employees", "test",
                index -> IntPoint.newExactQuery("salary", 3000))
The events are probably still in the AsyncEventQueue and not flushed into the index yet; this can take 10+ milliseconds, since the AsyncEventQueue's default flush interval is 10 ms.
You need to add the following code before doing the query:
luceneService.waitUntilFlushed("employees", "/test", 30000, TimeUnit.MILLISECONDS);
Another issue in the program is:
The salary field is an integer, but the query tries to do a string query on the salary field, mixed with another string field.
To query an integer field together with a string field, you need to create a LuceneQueryProvider that combines a StringQueryParser query with an IntPoint.newExactQuery (or other IntPoint queries).
If you just want to try the basic functionality, you can use only String fields for the time being (i.e. change the salary field to String).
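For example, here is a hedged sketch of such a provider, assuming the LuceneQueryFactory overload that accepts a LuceneQueryProvider lambda; the lowercase "john" reflects the default StandardAnalyzer, and the query classes come from org.apache.lucene.search / org.apache.lucene.document:
LuceneService luceneService = LuceneServiceProvider.get(cache);
LuceneQuery<Integer, Person> query = luceneService.createLuceneQueryFactory()
        .setLimit(10)
        .create("employees", "/test", index -> {
            // AND a term query on the string field with an exact integer query
            BooleanQuery.Builder builder = new BooleanQuery.Builder();
            builder.add(new TermQuery(new Term("fullName", "john")), BooleanClause.Occur.MUST);
            builder.add(IntPoint.newExactQuery("salary", 3000), BooleanClause.Occur.MUST);
            return builder.build();
        });
Collection<Person> values = query.findValues();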
I have a requirement to read user information from two different sources (databases) per userId and store the consolidated information in a Map keyed by userId. The number of users can vary based on the period they have opted for; groups of users may belong to different periods of the year, e.g. daily, weekly, or monthly users.
I used HashMap and LinkedHashMap to get this done. Since it slows down the process, I thought of using threading to make it faster.
After reading some tutorials and examples I am now using ConcurrentHashMap and an ExecutorService.
In some cases, based on validation, I want to skip the current iteration and move on to the next user's info, but the compiler does not allow the continue keyword inside the Callable. Is there a way to achieve the same thing in multithreaded code?
Moreover, although the code below works, it is not significantly faster than the code without threading, which makes me doubt whether the ExecutorService is implemented correctly.
Also, how do we debug errors in multithreaded code? Execution stops at a breakpoint, but not consistently, and it does not move to the next line with F6.
Can someone point out if I am missing something in the code? Any other example of a similar use case would also be of great help.
public void getMap() throws UserException
{
    long startTime = System.currentTimeMillis();
    Map<String, Map<Integer, User>> map = new ConcurrentHashMap<String, Map<Integer, User>>();
    //final String key = "";
    try
    {
        final Date todayDate = new Date();
        List<String> applyPeriod = db.getPeriods(todayDate);
        for (String period : applyPeriod)
        {
            try
            {
                final String key = period;
                List<UserTable1> eligibleUsers = db.findAllUsers(key);
                Map<Integer, User> userIdMap = new ConcurrentHashMap<Integer, User>();
                ExecutorService executor = Executors.newFixedThreadPool(eligibleUsers.size());
                CompletionService<User> cs = new ExecutorCompletionService<User>(executor);
                int userCount = 0;
                for (UserTable1 eligibleUser : eligibleUsers)
                {
                    try
                    {
                        cs.submit(
                            new Callable<User>()
                            {
                                public User call()
                                {
                                    int userId = eligibleUser.getUserId();
                                    List<EmployeeTable2> empData = db.findByUserId(userId);
                                    EmployeeTable2 emp = null;
                                    if (null != empData && !empData.isEmpty())
                                    {
                                        emp = empData.get(0);
                                    } else {
                                        String errorMsg = "No record found for given User ID in emp table";
                                        logger.error(errorMsg);
                                        //continue;
                                        // conitnue does not work here.
                                    }
                                    User user = new User();
                                    user.setUserId(userId);
                                    user.setFullName(emp.getFullName());
                                    return user;
                                }
                            }
                        );
                        userCount++;
                    }
                    catch (Exception ex)
                    {
                        String errorMsg = "Error while creating map :" + ex.getMessage();
                        logger.error(errorMsg);
                    }
                }
                for (int i = 0; i < userCount; i++) {
                    try {
                        User user = cs.take().get();
                        if (user != null) {
                            userIdMap.put(user.getUserId(), user);
                        }
                    } catch (ExecutionException e) {
                    } catch (InterruptedException e) {
                    }
                }
                executor.shutdown();
                map.put(key, userIdMap);
            }
            catch (Exception ex)
            {
                String errorMsg = "Error while creating map :" + ex.getMessage();
                logger.error(errorMsg);
            }
        }
    }
    catch (Exception ex) {
        String errorMsg = "Error while creating map :" + ex.getMessage();
        logger.error(errorMsg);
    }
    logger.info("Size of Map : " + map.size());
    Set<String> periods = map.keySet();
    logger.info("Size of periods : " + periods.size());
    for (String period : periods)
    {
        Map<Integer, User> mapOfuserIds = map.get(period);
        Set<Integer> userIds = mapOfuserIds.keySet();
        logger.info("Size of Set : " + userIds.size());
        for (Integer userId : userIds) {
            User inf = mapOfuserIds.get(userId);
            logger.info("User Id : " + inf.getUserId());
        }
    }
    long endTime = System.currentTimeMillis();
    long timeTaken = (endTime - startTime);
    logger.info("All threads are completed in " + timeTaken + " milisecond");
    logger.info("******END******");
}
You really don't want to create a thread pool with as many threads as there are users read from the db. That rarely makes sense, because threads have to run somewhere: there are not many servers out there with 10, 100, or even 1,000 cores reserved for your application. A much smaller value, maybe 5, is often enough, depending on your environment.
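A small sketch of a bounded pool in place of the per-user pool above; the size of 8 is an arbitrary assumption that should be tuned to your hardware and database connection limit:
// Cap the pool size instead of creating one thread per user.
int poolSize = Math.min(8, Math.max(1, eligibleUsers.size()));
ExecutorService executor = Executors.newFixedThreadPool(poolSize);
CompletionService<User> cs = new ExecutorCompletionService<User>(executor);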
And, as always with performance topics: first measure where your actual bottleneck is. Your application may simply not benefit from threading because, for example, you are reading from a db that only allows 5 concurrent connections at the same time. In that case all your other 995 threads will simply wait.
Another thing to consider is network latency: reading single users from multiple threads may even increase the total round-trip time needed to get the data from the database. An alternative approach might be to read the data for all 10,000 users at once rather than one user at a time. That way your (maybe available) 10 GBit Ethernet connection to your database might really speed things up, because the communication overhead is small and a single answer can deliver all the data you need.
So, in short: in my opinion your question is about performance optimization in general, but you don't yet know enough about where the time is spent to decide which way to go.
You could try something like this:
List<String> periods = db.getPeriods(todayDate);
Map<String, Map<Integer, User>> hm = new ConcurrentHashMap<>();
periods.parallelStream().forEach(s -> {
    List<UserTable1> eligibleUsers = db.findAllUsers(s);   // or however you fetch the eligible users
    hm.put(s, eligibleUsers.parallelStream().collect(
            Collectors.toMap(UserTable1::getUserId, u -> createUserForId(u.getUserId()))));
});
And in createUserForId you do your db reading:
private User createUserForId(Integer id) {
    List<EmployeeTable2> empData = db.findByUserId(id);
    EmployeeTable2 emp = (empData != null && !empData.isEmpty()) ? empData.get(0) : null;
    User user = new User();
    user.setUserId(id);
    if (emp != null) {
        user.setFullName(emp.getFullName());
    }
    return user;
}
I'm developing a system in which a teacher can edit an exam he/she has previously created, but the problem is that saving the questions and answers depends on how many questions the exam has.
When I pass variables from the "editing servlet" to the "saving servlet", I can't find a way to create String variables for every question and answer so they can be saved into the MySQL database.
public void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException {
    String numpreguntas = req.getParameter("x");
    int Nump = Integer.parseInt(numpreguntas);
    String numrespuestas = req.getParameter("y");
    int Numr = Integer.parseInt(numrespuestas);
    String nombre = req.getParameter("strNombre");
    for (int i = 1; i < Nump; i++) {
        String resp = req.getParameter("answeri-j");
    }
    actualizarBD(nombre, Nump, Numr, resp);
    devolverPaginaHTML(res, nombre);
}
ServletRequest#getParameterMap() will give you all the parameters and values passed in the request. You can then iterate over that map rather than looking up individual parameters.
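For illustration, a hedged sketch that assumes the form fields are named "answer<i>-<j>" and reuses the variable names from the question:
// Collect every answer parameter by building the expected field names.
Map<String, String[]> params = req.getParameterMap();
for (int i = 1; i <= Nump; i++) {
    for (int j = 1; j <= Numr; j++) {
        String[] values = params.get("answer" + i + "-" + j);   // hypothetical field naming
        if (values != null && values.length > 0) {
            String resp = values[0];
            // save resp for question i / answer j, e.g. collect into a List and pass to actualizarBD
        }
    }
}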
Currently I have a HashMap implemented like this:
private static Map<String, Item> cached = new HashMap<String, Item>();
and Item is an object with the properties Date expirationTime and byte[] data.
This map is accessed when multiple threads concurrently start hitting it.
The check I do is
1.
public static final byte[] getCachedData(HttpServletRequest request) throws ServletException
{
    String url = getFullURL(request);
    Map<String, Item> cache = getCache(request); // this chec
    Item item = null;
    synchronized (cache)
    {
        item = cache.get(url);
        if (null == item)
            return null;
        // Make sure that it is not over an hour old.
        if (item.expirationTime.getTime() < System.currentTimeMillis())
        {
            cache.remove(url);
            item = null;
        }
    }
    if (null == item)
    {
        log.info("Expiring Item: " + url);
        return null;
    }
    return item.data;
}
2. If the returned data is null, then we create the data and cache it in the HashMap:
public static void cacheDataX(HttpServletRequest request, byte[] data, Integer minutes) throws ServletException
{
    Item item = new Item(data);
    String url = getFullURL(request);
    Map<String, Item> cache = getCache(request);
    log.info("Caching Item: " + url + " - Bytes: " + data.length);
    synchronized (cache)
    {
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.MINUTE, minutes);
        item.expirationTime = cal.getTime();
        cache.put(url, item);
    }
}
It seems that if multiple threads access the same key (the url in this case), the data gets added to the cache more than once for that key, because getCachedData returns null for several threads while the first thread has not yet finished writing its data.
Any suggestions on how to solve this issue?
In cacheDataX add a check for the existence of the item before you add (inside of the synchronized block).
synchronized (cache)
{
    if (cache.get(url) == null) {
        Calendar cal = Calendar.getInstance();
        cal.add(Calendar.MINUTE, minutes);
        item.expirationTime = cal.getTime();
        cache.put(url, item);
    }
}
This will ensure that multiple threads that have already done a lookup and got null cannot all add the same data to the cache: one will add it, and the other threads will silently skip the put since the cache has already been updated.
You need one synchronized block that covers both getting something from the cache and inserting into the cache. As the code stands you have a check-then-act race condition: multiple threads can execute step 1 (and see a miss) before any of them executes step 2.
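A minimal sketch of that idea, assuming a caller-supplied loader that produces the bytes on a miss (the Supplier parameter and the method name are hypothetical):
// Lookup and insert happen under the same lock, so the data is built only once per key.
public static byte[] getOrCreateCachedData(HttpServletRequest request,
                                           Supplier<byte[]> loader,
                                           int minutes) throws ServletException
{
    String url = getFullURL(request);
    Map<String, Item> cache = getCache(request);
    synchronized (cache)
    {
        Item item = cache.get(url);
        // miss or expired: build the data once, inside the critical section
        if (item == null || item.expirationTime.getTime() < System.currentTimeMillis()) {
            item = new Item(loader.get());
            Calendar cal = Calendar.getInstance();
            cal.add(Calendar.MINUTE, minutes);
            item.expirationTime = cal.getTime();
            cache.put(url, item);
        }
        return item.data;
    }
}
Note the trade-off: building the data inside the synchronized block means other threads wait while one entry is created. If creation is expensive, a ConcurrentHashMap with computeIfAbsent may be preferable, since it locks per key rather than the whole map.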