Creating entities using task queues - not all entities get created - Java

I have a task that simply creates an entity in the datastore. I queue up many of these tasks into a named push queue and let it run. When it completes, I see in the log that all of the task requests were run. However, the number of entities created is actually lower than expected.
The following is an example of the code I used to test this. I ran 10,000 tasks and the final result only has around 9,200 entities in the datastore.
I use RESTEasy to expose URLs for the task queues.
queue.xml
<queue-entries>
    <queue>
        <name>testQueue</name>
        <rate>5/s</rate>
    </queue>
</queue-entries>
Test Code
@GET
@Path("/queuetest/{numTimes}")
public void queueTest(@PathParam("numTimes") int numTimes) {
    for (int i = 1; i <= numTimes; i++) {
        Queue queue = QueueFactory.getQueue("testQueue");
        TaskOptions taskOptions = TaskOptions.Builder.withUrl("/queuetest/worker/" + i).method(Method.GET);
        queue.add(taskOptions);
    }
}
@GET
@Path("/queuetest/worker/{index}")
public void queueTestWorker(@PathParam("index") String index) {
    DateFormat df = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss");
    Date today = Calendar.getInstance().getTime();
    String timestamp = df.format(today);
    Entity tObj = new Entity("TestObj");
    tObj.setProperty("identifier", index);
    tObj.setProperty("timestamp", timestamp);
    DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    Key key = datastore.put(tObj);
}
I have run this a few times, and not once have I seen all of the entities created.
Is it possible that tasks can be discarded if there is too much contention on the queue?
Is this the expected behavior for a task queue?
#### EDIT
I followed Mitch's suggestion to log the entity IDs that are created and found that they are indeed created as expected. But the logs themselves displayed some weird behavior: log lines from some tasks appear in another task's log. And when that happens, some tasks show two entity IDs in a single request.
For the tasks that display two entity IDs, the first ID logged corresponds to one of the entities missing from the datastore. Does this mean there is a problem with a high number of puts to the datastore? (The entities I'm creating are NOT part of a larger entity group, i.e. they have no parent key.)

Why don't you add a log statement after each datastore.put() call which logs the ID of the newly created entity? Then you can compare the log to the datastore contents, and you will be able to tell whether the problem is that datastore.put() is not being invoked successfully 10,000 times, or that some of the successful put calls are not resulting in entities that you see in the datastore.
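As a rough illustration of that diagnostic (the class and method names here are hypothetical, not App Engine API), the set difference between the IDs you logged and the IDs actually present in the datastore tells you exactly which puts went missing:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PutAudit {
    // Given the entity IDs logged after each datastore.put() and the IDs
    // actually present in the datastore, return the IDs that were logged
    // but never showed up as stored entities.
    public static Set<Long> missingIds(List<Long> loggedIds, Set<Long> storedIds) {
        Set<Long> missing = new HashSet<>(loggedIds);
        missing.removeAll(storedIds); // logged but not stored
        return missing;
    }
}
```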


Avoid overlapping bookings in MySql with Spring Boot

I am trying to build a booking portal. A booking has a check-in datetime and a check-out datetime. The Spring Boot application will run in many replicas.
My problem is to ensure that no overlapping bookings are possible.
First of all, here is my repository method to check if the time is blocked:
@Lock(LockModeType.PESSIMISTIC_READ)
@QueryHints({@QueryHint(name = "javax.persistence.lock.timeout", value = "1000")})
Optional<BookingEntity> findFirstByRoomIdAndCheckInBeforeAndCheckOutAfter(Long roomId, LocalDateTime checkIn, LocalDateTime checkOut);
As you can see, I am using findFirstBy and not findBy because the request could return more than one result.
With this I can check whether the time is blocked (note that in the call I swap the requested check-in and requested check-out):
    findFirstByWorkplaceIdAndCheckInBeforeAndCheckOutAfter(workplaceId, requestCheckOut, requestCheckIn).isPresent();
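For reference, the interval-overlap predicate this derived query encodes can be sketched in plain Java (a hypothetical helper, not part of the question's code): an existing booking conflicts with the requested range iff it starts before the requested check-out AND ends after the requested check-in, which is exactly why the call passes the requested check-out and check-in in swapped positions.

```java
import java.time.LocalDateTime;

public class OverlapCheck {
    // Two half-open ranges [in, out) overlap iff each starts before the other ends.
    public static boolean overlaps(LocalDateTime existingIn, LocalDateTime existingOut,
                                   LocalDateTime requestIn, LocalDateTime requestOut) {
        return existingIn.isBefore(requestOut) && existingOut.isAfter(requestIn);
    }
}
```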
Everything happens in the controller:
@PostMapping("")
@Transactional
public String myController(LocalDateTime checkIn, LocalDateTime checkOut, long roomId) {
    try {
        if (myService.bookingBlocked(checkIn, checkOut, roomId)) {
            Log.warning("Booking blocked");
            return "no!";
        }
        bookingService.createBooking(checkIn, checkOut, roomId);
        return "good job";
    } catch (Exception exception) {
        return "something went wrong";
    }
}
Everything works fine, but I can simulate an overlapping booking if I set a breakpoint after the bookingBlocked check in two replicas. The following happens:
Replica 1 checks if the booking slot is free (it is)
Replica 2 checks if the booking slot is free (it is)
Replica 1 creates a new entity
Replica 2 creates a new entity
Now I have an overlapping booking.
My idea was to create a @Constraint, but that is not possible with Hibernate's MySQL dialect. Then I tried to create a @Check, but that also seems to be impossible.
My next idea was to use a lock with the transaction (already in the code at the top). This seems to work, but I am not sure if I have implemented it correctly. If I repeat the breakpoint experiment, the following happens:
Replica 1 checks if the booking slot is free (it is)
Replica 2 checks if the booking slot is free (it is)
Replica 1 creates a new entity
Replica 2 creates a new entity (throws an exception and is silently rolled back)
The controller returns a 500.
I am not sure where the exception is thrown. I am not sure what happens if the replica is shut down during the lock. I am not sure whether I can create a deadlock in the database. And it is still possible to manipulate the database via an SQL query to create an overlap.
Could anybody tell me if this is the correct way? Is there a possibility to create a constraint in the database with Hibernate (I don't use migration scripts, and adding them only for one constraint would not be great)? Should I use optimistic locking? Every idea is welcome.

How can I get JPA/EntityManager to make parallel queries instead of lumping them into one batch?

Inside the doGet method in my servlet I'm using a JPA TypedQuery to retrieve my data. I'm able to get the data I want through an HTTP GET request. The method to get the data takes roughly 10 seconds, and when I make a single request all is good. The problem occurs when I get multiple requests at the same time. If I make 4 requests at the same time, all 4 queries are lumped together and they take 40 seconds to return the data for all of them. How can I get JPA to make 4 separate queries in parallel? Is this something that needs to be set in persistence.xml, or is it a code-related issue? Note: I've also tried executing this code in a thread. A link and some appropriate terminology to increase my understanding would be appreciated.
Thanks!
EntityManager em = null;
try {
    String sequenceNo = request.getParameter("sequenceNo");
    // Note: creating the EntityManagerFactory per request is expensive;
    // it should normally be created once and reused.
    EntityManagerFactory emf = Persistence.createEntityManagerFactory("mydbcon");
    em = emf.createEntityManager();
    long startTime = System.currentTimeMillis();
    List<Myeo> returnData = methodToGetData(em);
    System.out.println(sequenceNo + " " + (System.currentTimeMillis() - startTime));
    String myJson = new Gson().toJson(returnData);
    resp.getOutputStream().print(myJson);
    resp.getOutputStream().flush();
} finally {
    resp.getOutputStream().close();
    if (em != null && em.isOpen())
        em.close();
}
4 simultaneous request samples
localhost/myservlet/mycodeblock?sequenceNo=A
localhost/myservlet/mycodeblock?sequenceNo=B
localhost/myservlet/mycodeblock?sequenceNo=C
localhost/myservlet/mycodeblock?sequenceNo=D
resulting print statements
A 38002
B 38344
C 38785
D 39065
What I want
A 9002
B 9344
C 9785
D 10065
If you do 4 separate GET requests, they should be executed in parallel; they are not lumped together, since they run in different transactions.
If that is not what you observe, you should check whether you have defined a database connection pool size or a servlet thread pool size that serializes the calls to the DBMS.
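The pool-size effect is easy to reproduce outside JPA. In this sketch (plain java.util.concurrent, nothing JPA-specific), four 100 ms tasks finish in roughly one task's time on a 4-thread pool, but are serialized on a 1-thread pool, mirroring the 10 s vs 40 s symptom above:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolDemo {
    // Runs `tasks` sleep-based tasks on a fixed pool and returns wall-clock time.
    public static long runTasks(int poolSize, int tasks, long taskMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.currentTimeMillis();
        for (int i = 0; i < tasks; i++) {
            pool.submit(() -> {
                try { Thread.sleep(taskMillis); } catch (InterruptedException ignored) {}
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        return System.currentTimeMillis() - start;
    }
}
```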

Bulk insert/update using Stateless session - Hibernate

I have a requirement to insert/update more than 15,000 rows in 3 tables, so that's 45k total inserts.
I used StatelessSession in Hibernate after reading online that it is best for batch processing, as it doesn't have a context cache.
session = sessionFactory.openStatelessSession();
for (Employee e : emplList) {
    session.insert(e);
}
transaction.commit();
But this code takes more than an hour to complete.
Is there a way to save all the entity objects in one go?
Save the entire collection rather than doing it one by one?
Edit: Is there any other framework that offers a quicker insert?
Cheers!!
You should read this article by Vlad Mihalcea:
How to batch INSERT and UPDATE statements with Hibernate
You need to make sure that you have set the Hibernate property:
    hibernate.jdbc.batch_size
so that Hibernate can batch these inserts; otherwise they'll be done one at a time.
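For example, the property can be set in persistence.xml (the property names are the documented Hibernate ones; the value 50 is only an arbitrary starting point to tune from):

```xml
<persistence-unit name="mydbcon">
  <properties>
    <property name="hibernate.jdbc.batch_size" value="50"/>
    <!-- grouping statements by table lets Hibernate batch mixed inserts/updates -->
    <property name="hibernate.order_inserts" value="true"/>
    <property name="hibernate.order_updates" value="true"/>
  </properties>
</persistence-unit>
```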
There is no way to insert all entities in one go. Even if you could do something like session.save(emplList), internally Hibernate would still save them one by one.
According to the Hibernate User Guide, StatelessSession does not use the batch feature:
The insert(), update(), and delete() operations defined by the StatelessSession interface operate directly on database rows. They cause the corresponding SQL operations to be executed immediately. They have different semantics from the save(), saveOrUpdate(), and delete() operations defined by the Session interface.
Instead, use a normal Session and clear the cache from time to time. Actually, I suggest you measure your code first and then make changes like using hibernate.jdbc.batch_size, so you can see how much each tweak improves your load.
Try to change it like this:
session = sessionFactory.openSession();
Transaction transaction = session.beginTransaction();
int count = 0;
int step = 0;
int stepSize = 1_000;
long start = System.currentTimeMillis();
for (Employee e : emplList) {
    session.save(e);
    count++;
    if (++step == stepSize) {
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        // multiply before dividing so integer division doesn't truncate to 0
        long linesPerSecond = stepSize * 1_000L / elapsed;
        StringBuilder msg = new StringBuilder();
        msg.append("Step time: ");
        msg.append(elapsed);
        msg.append(" ms Lines: ");
        msg.append(count);
        msg.append("/");
        msg.append(emplList.size());
        msg.append(" Lines/Seconds: ");
        msg.append(linesPerSecond);
        System.out.println(msg.toString());
        start = System.currentTimeMillis();
        step = 0;
        session.flush();  // push pending inserts as JDBC batches
        session.clear();  // then detach them to keep the persistence context small
    }
}
transaction.commit();
About hibernate.jdbc.batch_size - you can try different values, including some very large ones, depending on the underlying database and network configuration. For example, I use a value of 10,000 on a 1 Gbps network between the app server and the database server, giving me 20,000 records per second.
Change stepSize to the same value as hibernate.jdbc.batch_size.

Identifying a scheduled Business Objects report

I am building a Java application that has to download only scheduled reports from a Business Objects server. To schedule the reports I am using InfoView, as follows:
1) Click on the report
2) Action --> Schedule
3) Set Recurrence, Format and Destinations
The report then has a number of instances, as opposed to non-scheduled reports, which have zero instances.
In the code, to separate the scheduled reports I am using com.crystaldecisions.sdk.occa.infostore.ISchedulingInfo:
    IInfoObject ifo = (IInfoObject) result.get( i );
    ISchedulingInfo sche = ifo.getSchedulingInfo();
This should give info about the scheduling, right? But for some reason it returns an object (not null, as I would expect) for non-scheduled reports as well.
And the info returned by its methods (say getBeginDate, getEndDate, etc.) is similar for both kinds.
I tried to filter the reports using SI_CHILDREN > 0 with the query:
    "SELECT * FROM CI_INFOOBJECTS WHERE SI_PROGID = 'CrystalEnterprise.Webi'"
    + " AND SI_CHILDREN > 0 AND SI_PARENTID = " + String.valueOf( privateFolderId )
    + " ORDER BY SI_NAME ASC "
Is this the right way to filter the scheduled reports?
So Webi, Crystal, etc. implement the ISchedulable interface. This means that your non-instance InfoObject WILL return an ISchedulingInfo, regardless of whether or not it has been scheduled.
If an object is scheduled, an instance is created with SI_SCHEDULE_STATUS = 9 (ISchedulingInfo.ScheduleStatus.PENDING).
The job then runs (SI_SCHEDULE_STATUS = 0), and either completes (SI_SCHEDULE_STATUS = 1) or fails (SI_SCHEDULE_STATUS = 3). It can also be paused (SI_SCHEDULE_STATUS = 8).
So to find all instances that are scheduled, you need a query like:
select * from ci_infoObjects where si_instance=1 and si_schedule_status not in (1,3)
This will get you anything that isn't a success or a failure.
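If it helps readability, the status codes listed above can be wrapped in a small helper (the code-to-name mapping is taken from this answer; verify it against your SDK's ISchedulingInfo.ScheduleStatus constants before relying on it):

```java
public class ScheduleStatus {
    // Maps an SI_SCHEDULE_STATUS value to a readable name, per the codes above.
    public static String name(int siScheduleStatus) {
        switch (siScheduleStatus) {
            case 9:  return "PENDING";
            case 0:  return "RUNNING";
            case 1:  return "COMPLETED";
            case 3:  return "FAILED";
            case 8:  return "PAUSED";
            default: return "UNKNOWN(" + siScheduleStatus + ")";
        }
    }
}
```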
A scheduled report will have a child instance which holds the scheduling information and has the scheduled report as its parent. (You can see this instance in the history list in BI Launch Pad.)
You can retrieve recurrently scheduled child instances from the CMS like this:
SELECT * FROM CI_INFOOBJECTS WHERE SI_PROGID = 'CrystalEnterprise.Webi'
and si_recurring = 1
This will isolate the reports which are scheduled for execution (or, to be more precise, the child "scheduling" instances described above). You can then call getSchedulingInfo() on the child instance to get further info about the scheduling.
Bear in mind that the SI_PARENTID field, not the SI_ID field, returned by the above query gives you the ID of the initial WebI report.

How can I do asynchronous messaging while reading and processing database records in Java

Here I have a thread pool and another polling class that implements polling and reads the messages from the database. The problem is that I have to avoid reading redundant messages while updating, and still process the other waiting messages at the same time, since a large number of messages are waiting.
// the code for the poll method
public void poll() throws Exception {
    // Method which polls the data entries and counts their size.
    st = conn.createStatement();
    int count = 1;
    long waitInMillisec = 1 * 60 * 125; // Wait for 7.5 seconds.
    for (int i = 0; i < count; i++) {
        System.out.println("Wait for " + waitInMillisec + " millisec");
        Thread.sleep(waitInMillisec);
        java.util.Date date = new java.util.Date();
        Timestamp start = new Timestamp(date.getTime());
        rs = st.executeQuery("select * from msg_new_to_bde where ACTION=804");
        java.util.Date date1 = new java.util.Date();
        Timestamp end = new Timestamp(date1.getTime());
        System.out.print("Query count: ");
        System.out.println(end.getTime() - start.getTime());
        Collection<KpiMessage> pojoCol = new ArrayList<KpiMessage>();
        while (rs.next()) {
            KpiMessage filedClass = convertRecordsetToPojo(rs);
            pojoCol.add(filedClass);
        }
I don't know if you have a choice in how your messages are stored, but they appear to be inserted into a table that you're polling. You might add a database trigger to this table that in turn pushes a message into an Oracle AQ with the same data plus a correlation ID.
If you can do without the table, I would suggest just defining the Oracle AQ in the same schema to store the messages, and dequeuing by partial correlation ID using pattern matching like corrid = '804%'. The full correlation ID for the AQ message might be "804" plus the unique primary key of the message. You could then reuse the same queue for multiple actions, for example, and define a Java action-804 worker class that waits on messages of that particular action (the 804 correlation-ID prefix on the AQ messages).
Oracle's documentation for AQ is pretty good. The package you would use to create the queue is dbms_aqadm, and the package you would use to enqueue/dequeue is dbms_aq. There are a few privileges/grants you'll need before the AQ can be created and the dbms_aq packages can be used. dbms_aq should be easily callable from Java.
Go to docs.oracle.com to look up the details of the dbms_aqadm and dbms_aq packages. Once you create the AQ (which will create an AQ table that backs the queue), I suggest you add an index on corrid to the AQ table for performance.
If you can't avoid the current table architecture or don't want to get into AQ technology, another option is to create a lock in Oracle (the dbms_lock package) and call it in your polling class to obtain the lock or block/wait. That way you synchronize all your polling classes and prevent multiple threads from picking up the same message. So the first thing the polling class would do is try to obtain the lock; if successful, it pulls a message out of the table, processes it, marks it as processed, and releases the lock. The dbms_lock package can block/wait for the lock or return immediately, and based on success/failure you can take further action. Oracle's docs are pretty good on this package too.
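A minimal sketch of the correlation-ID scheme suggested above (the ':' separator and method names are my own assumptions, not an Oracle API): the ID combines the action code with the message's primary key, and a worker selects its messages by prefix, mirroring the corrid = '804%' pattern match:

```java
public class CorrelationIds {
    // Build a correlation ID from the action code and the message's primary key.
    // A separator keeps action 80 / pk 4 distinct from action 804.
    public static String build(int action, long pk) {
        return action + ":" + pk;
    }

    // A worker for a given action filters AQ messages by correlation-ID prefix.
    public static boolean matchesAction(String corrId, int action) {
        return corrId.startsWith(action + ":");
    }
}
```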
