Search optimization when data owner is someone else - java

In my project, we have 2 REST calls which take too much time, so we are planning to optimize that. Here is how it works currently: we make the first call to system A and then pass the response to system B for further processing. Once we get the response from system B, we have to manipulate it further before passing it to the UI layer, and this entire process takes a lot of time. We planned on using Solr/Lucene, but since we are not the data owners, we can't implement that. Can someone please shed some light on how best this can be handled? We are using Spring MVC and Spring Web Flow. Thanks in advance!!
[EDIT:] This is not the actual scenario; I am writing this as an example for better understanding. Think of it as making a store locator call for a particular zip code to get a list of 100 stores, and then sending those 100 stores to another call to get their inventory, etc. So this list of stores would change for every zip code, and so would the inventory.

If your query parameters to system A / system B are frequently the same, you can add a caching framework to your code. If you use Spring 3, you can enable caching easily with a @Cacheable annotation on the code calling system A. See:
http://static.springsource.org/spring/docs/3.1.0.M1/spring-framework-reference/html/cache.html
The cache subsystem will cache the result, including any processing your method performs before returning it.
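For illustration, here is a minimal sketch of what that could look like, assuming a "storesByZip" cache and a hypothetical SystemAClient and Store type standing in for your own code:

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class StoreLookupService {

    // SystemAClient and Store are placeholders for your own types.
    private final SystemAClient systemAClient;

    @Autowired
    public StoreLookupService(SystemAClient systemAClient) {
        this.systemAClient = systemAClient;
    }

    // The first call for a given zip code hits system A and runs the expensive
    // post-processing; later calls with the same zip are served from the cache.
    @Cacheable("storesByZip")
    public List<Store> findStores(String zipCode) {
        List<Store> stores = systemAClient.lookupStores(zipCode);
        // ...manipulate the response here; the processed result is what gets cached
        return stores;
    }
}

Caching still has to be switched on (for example with <cache:annotation-driven/> in XML configuration) and a CacheManager configured, as described in the reference documentation linked above.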

Related

How to Monitor/inspect data/attribute flow in Java code

I have a use case where I need to capture the data flow from one API to another. For example, my code reads data from a database using Hibernate, and during processing I convert one POJO to another, perform some more processing, and then finally convert it into the final Hibernate result object. In a nutshell, something like POJO1 to POJO2 to POJO3.
In Java, is there a way I can deduce that an attribute of POJO3 was made/transformed from a given attribute of POJO1? I am looking for something that can capture the data flow from one model to another. The tool can work at either compile time or runtime; I am OK with both.
I am looking for a tool which can run in parallel with the code and provide data lineage details on each run.
Now, instead of POJOs I will call them states! You have a start position, and you iterate and transform your model through different states. At the end you have a final, terminal state that you would like to persist to the database:
stream(A).map(P1).map(P2).map(P3)....-> set of B
If you use a technique known as event sourcing, you can deduce it, yes. How would this look? Instead of mapping A directly to state P1 and state P1 to state P2, you queue all the operations that are necessary and sufficient to map A to P1, P1 to P2, and so on... If you want to recover P1 or P2 at any time, it will simply be the product of the queued operations. You can rewind forward or backward at any point, as long as you have not yet changed your DB state. P1, P2, P3 can act as snapshots.
This way you will be able to rebuild the exact mapping flow for each attribute. How fine-grained you queue your operations, whether as fine as attribute level or more coarse-grained, is up to you.
Here is a good article that describes event sourcing and how it works: https://kickstarter.engineering/event-sourcing-made-simple-4a2625113224
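As a very rough illustration of the idea (generic names, not tied to any framework): instead of applying each mapping step directly, record it as an event, and replay the log to recover any intermediate state.

import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Minimal sketch: each transformation is queued as an event; replaying the
// queue reproduces any intermediate state (P1, P2, ...) from the start state.
class EventLog<T> {

    private final List<UnaryOperator<T>> events = new ArrayList<>();

    void record(UnaryOperator<T> event) {
        events.add(event);
    }

    // Replay the first n events from the start state to recover a snapshot.
    T replay(T start, int n) {
        T state = start;
        for (int i = 0; i < n && i < events.size(); i++) {
            state = events.get(i).apply(state);
        }
        return state;
    }

    // Usage sketch (Pojo is a placeholder for your own model class):
    // EventLog<Pojo> log = new EventLog<>();
    // log.record(p -> { p.setFullName(src.getFirst() + " " + src.getLast()); return p; });
    // Pojo p1 = log.replay(new Pojo(), 1);   // snapshot P1
    // Pojo p3 = log.replay(new Pojo(), 3);   // final state P3
}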
UPDATE:
I can think of one more technique to capture the attribute changes: you can instrument your POJOs. It is pretty much the same technique Hibernate uses to enhance POJOs, and the same technique profilers use for tracing. You can then capture and react to each setter invocation on POJO1, POJO2 and POJO3. Not sure I would go that way, though...
Here is some detailed reading about bytecode instrumentation: https://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
I can imagine two reasons: either the code was not developed by you and you therefore want to understand the flow of data, along with the combinations that convert input to output, OR your code is behaving in a way that you are not expecting.
I think you need to log the values of all the POJOs, inputs and outputs, to some place that you can inspect later for each run.
For example, a database table if you might need the data after hundreds of runs, or a log in an appropriate form if it is a one-time exercise. Then you need to manually work through those data values, layer by layer, to map each one to the next layer. With the code available, that should be easy. If you have a different need, please explain.
There are "time travelling debuggers". For Java, a quick search did only spill this out:
Chronon Time Travelling Debugger, see this screencast how it might help you .
Since your transformations probably use setters and getters this tool might also be interesting: Flow
Writing your own java agent for tracking this is probably not what you want. You might be able to use AspectJ to add some stack trace logging to getters and setters. See here for a quick introduction.
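If you go the AspectJ route, a minimal aspect along these lines could log every setter call on your model classes (the package name com.example.model is an assumption, and compile-time or load-time weaving still has to be configured):

import java.util.Arrays;
import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;

@Aspect
public class SetterTraceAspect {

    // Matches any setter on the POJOs under the (assumed) model package.
    @Before("execution(* com.example.model..*.set*(..))")
    public void logSetter(JoinPoint joinPoint) {
        System.out.println("Setter called: " + joinPoint.getSignature()
                + " with args " + Arrays.toString(joinPoint.getArgs())
                + " on " + joinPoint.getTarget());
        // Dump a stack trace to see which mapping step triggered the call.
        new Exception("setter trace").printStackTrace();
    }
}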

Is it possible to get a deep copy of objects using the VersionOne Java SDK?

Let's say I want to calculate the cumulative estimate of my defects. I do
double estimate = 0.0;
Double tEstimate = 0.0;
Collection<Defect> defects = project.getDefects(null);
for (Defect d : defects) {
    tEstimate = d.getEstimate();
    if (tEstimate != null) {
        estimate += tEstimate;
    }
}
Here each call to d.getEstimate() does a callback to the server, meaning this code runs extremely slowly. I would like to take the one-time performance hit up front and download all the info along with the Defect objects, probably including some information I won't use, but avoid the latency of a server callback on each iteration of the loop.
You are using the VersionOne Object Model SDK. It does lack robustness, for the very reason you are complaining about. One of its inefficiencies is how it handles a request for a list of assets: it first fetches all of the assets with a predetermined set of attributes, such as AssetState, and checks whether each asset is dead. After this, it makes another call to get the same list of assets again, but with the attributes you specified. This could be remedied by applying a greedy algorithm that grabs a set of attributes such that each member of the set is returned regardless of which attributes are requested in your .get_() method. Why? This already (sort of) happens in the REST-based VersionOne API as it stands. If the query returned all attributes, it would probably be a little wasteful, especially for humongous backlogs.
Anyway, VersionOne will be deprecating the Object Model in the near future, so if you plan on doing a lot of coding against the OM, take this into consideration.
Here are some ways to circumvent this problem:
1) Rewrite your code to use the VersionOne APIClient SDK (see the sketch after this list). It has XML plumbing that will save you a lot of time writing your own. It is a little more verbose, but it is more powerful, fast and efficient. The Object Model is actually built upon the APIClient.
2) Rewrite your code using Java and the raw VersionOne REST API. This requires that you understand HTTP and the VersionOne REST API.
3) If you cannot change from the Object Model, you can mix the two SDKs. When you need to read large amounts of data, use APIClient code for that segment. This is kind of pointless when you could just learn the APIClient and use it exclusively, unless you have a huge investment in the Object Model and can't change. The code gets mucky really fast. Not recommended.
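As a rough sketch of option 1 (the server URL and credentials are placeholders, and the class names are from the Java APIClient SDK as I recall them, so double-check against the SDK docs): the key point is that the Estimate attribute is selected up front, so the loop never calls back to the server.

import com.versionone.apiclient.*;

public class DefectEstimateSum {
    public static void main(String[] args) throws Exception {
        IMetaModel metaModel = new MetaModel(
                new V1APIConnector("https://your-host/VersionOne/meta.v1/"));
        IServices services = new Services(metaModel,
                new V1APIConnector("https://your-host/VersionOne/rest-1.v1/", "username", "password"));

        IAssetType defectType = metaModel.getAssetType("Defect");
        IAttributeDefinition estimateAttr = defectType.getAttributeDefinition("Estimate");

        Query query = new Query(defectType);
        query.getSelection().add(estimateAttr);   // fetch Estimate in the same request

        QueryResult result = services.retrieve(query);

        double estimate = 0.0;
        for (Asset defect : result.getAssets()) {
            Object value = defect.getAttribute(estimateAttr).getValue();
            if (value != null) {
                estimate += Double.parseDouble(value.toString());
            }
        }
        System.out.println("Cumulative estimate: " + estimate);
    }
}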
The rest-1.v1 API endpoint exposes operations on assets, including DeepCopy. There is no client code that enumerates all of the operations, so you must first explore the asset using the meta.v1 API endpoint. Using the API Client backdoor from the Object Model, you can get to the classes that will allow you to call an operation once you know its name.

Which persistence method?

I'm building an application that downloads a set of images from a website, extracts some features from them and then allows a user to compare an image she submits to the downloaded set, to see which one is the closest. At the moment the application downloads the images and extracts the features from them. Then each image and its features get wrapped in an object and stored in a map, with the name of the image as the key and the wrapped object as the value.
Because this is stored in memory, each time I start the application it has to go through the quite expensive process of downloading and feature extraction. It would be much quicker if it could just load this info from disk, but I'm not sure of the best way to go about it. I've thought about these options:
RDBMS: something like Postgres or SQLite
NoSQL: something like Voldemort or Redis
Serialisation: use built-in Java methods to write objects to a file (this could also be used in conjunction with a DB though...)
I want it to be really lightweight; I want to keep the application as small as possible and keep configuration to a minimum. For this reason serialisation seems like the way to go, but I'd like a second (or more) opinion on that, because something about doing it that way just feels wrong. I can't quite put my finger on why I feel like that...
I should also say that users can add images to the set while the application is running, and I'd like to save those images too.
I wouldn't recommend serialisation - just too many pitfalls.
If what you have is really just a map, then I think any of the key-value stores (like Redis) would be appropriate; a rough sketch follows.
If you have more complex data, then you might want to consider a database (whether SQL or NoSQL).
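For example, if the map values boil down to a feature vector per image, a key-value approach could look roughly like this, using the Jedis client for Redis (the double[] encoding is an assumption about what your wrapper object holds; adapt it to your own type):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import redis.clients.jedis.Jedis;

public class FeatureStore {

    private final Jedis jedis = new Jedis("localhost");

    // Store the feature vector under the image name.
    public void save(String imageName, double[] features) {
        ByteBuffer buffer = ByteBuffer.allocate(features.length * Double.BYTES);
        for (double f : features) {
            buffer.putDouble(f);
        }
        jedis.set(imageName.getBytes(StandardCharsets.UTF_8), buffer.array());
    }

    // Load the feature vector, or null if the image has not been stored yet.
    public double[] load(String imageName) {
        byte[] bytes = jedis.get(imageName.getBytes(StandardCharsets.UTF_8));
        if (bytes == null) {
            return null;
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        double[] features = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < features.length; i++) {
            features[i] = buffer.getDouble();
        }
        return features;
    }
}

At startup the application would only recompute features for images not yet present in the store, and newly added images would be saved the same way.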

Using A BlockingQueue With A Servlet To Persist Objects

First, this may be a stupid question, but I'm hoping someone will tell me so, and why. I also apologize if my explanation of what/why is lacking.
I am using a servlet to upload a HUGE (247MB) file, which is pipe (|) delimited. I grab about 5 of its 20 fields, create an object, then add it to a list. Once this is done, I pass the list to an OpenJPA transactional method called persistList().
This would be okay, except for the size of the file. It's taking forever, so I'm looking for a way to improve it. An idea I had was to use a BlockingQueue in conjunction with the persist/persistList method in a new thread. Unfortunately, my skills in Java concurrency are a bit weak.
Does what I want to do make sense? If so, has anyone done anything like it before?
Servlets should respond to requests within a short amount of time. In this case, persisting the file contents needs to be an asynchronous job, so:
The servlet should respond with some text about the upload job, expected time to complete or something like that.
The uploaded content should be written to some temp space in binary form, rather than kept entirely in memory. This is the usual way the multipart POST libraries do their work.
You should have a separate service that blocks on a queue of pending jobs (see the sketch after this list). Once it gets a job, it processes it.
The 'job' is simply some handle to the temporary file that was written when the upload happened, plus any metadata like who uploaded it, a job id, etc.
The persisting service needs to insert a large number of rows but make the result appear 'atomic': either model the intermediate state as part of the table model(s), or write to temp spaces.
If you are writing to temp tables, and then copying all the content to the live table, remember to have enough log space and temp space at the database level.
If you have a full J2EE stack, consider modelling the job queue as a JMS queue, so recovery makes sense. Once again, remember to have proper XA boundaries, so all the row persists fall within an outer transaction.
Finally, consider also having a status check API and/or UI, where you can determine the state of any particular upload job: Pending/Processing/Completed.
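To make the queue idea concrete, a minimal sketch could look like this (the names are illustrative; the job type T would carry the temp-file handle and metadata, and the handler would wrap your OpenJPA persistList() call):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

public class PersistJobQueue<T> {

    private final BlockingQueue<T> jobs = new LinkedBlockingQueue<>();

    public PersistJobQueue(Consumer<T> handler) {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    handler.accept(jobs.take()); // blocks until a job is available
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "persist-worker");
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the servlet's doPost(): returns immediately so the request
    // can respond with a job id / status message instead of waiting.
    public void submit(T job) {
        jobs.add(job);
    }
}

The servlet's doPost() would then write the upload to a temp file, call submit(...), and return a job id right away; the status check API mentioned above would report Pending/Processing/Completed for that id.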

Wicket: how to handle long running tasks

I've set up a Wicket + Hibernate + Spring web application that involves gathering some data (having some files generated and returned), storing this in a database, creating some images and displaying all of this on a web page.
This all works fine for short runs, but sometimes gathering the data (which involves some remote number crunching) takes too long (20+ minutes) and times out. I've tried to resolve this using two approaches, but both of them show some problems.
The first approach was using AjaxLazyLoadPanels and just doing everything within the getLazyLoadComponent. This worked fine for the short runs, but for the 20+ minute runs the LazyLoadComponents would not load (nice oxymoron there) due to timeouts.
The second approach involved creating an intermediate Fragment with an added AjaxSelfUpdatingTimerBehavior with a duration set to 10 seconds, that polled for the files that are created in the number crunching. This seems to make the tasks run in the background without problems, but fails when the returned data needs to be stored in the database. I'm using the Open Session in View pattern, but maybe this fails when attempting to store data after 20 minutes?? (Solution could lie in resolving this..).
Due to the above problems I'm now reading up on alternate approaches to handle these long running tasks and came across:
org.apache.wicket.util.time.Task
org.apache.wicket.util.watch.ModificationWatcher
I'm now wondering if either of these might be better suited to solve the time-out problems I'm having in both running the tasks and storing the data in the database afterwards, or if anyone has any other solutions that might help in this situation.
I'd really like to know if a new approach is viable before I spend another day implementing something that might turn out not to work after all.
Best regards,
Tim
I know we have had success using a Panel with an attached AjaxSelfUpdatingTimerBehavior. The task and the results are separated from the view logic but are made accessible to the view via a service you create. The service implementation we used is responsible for starting a thread pool or ExecutorService to run the individual tasks. The service can provide a way to monitor the progress/status of the particular job/call that is taking place. Once it is complete, it should also make the data available to the view. Injecting a SessionFactory into the service implementation (or an injected DAO) should be sufficient to create the Hibernate Session outside of a web session.
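A rough sketch of such a service (all names are illustrative, and the Hibernate/DAO plumbing is left out): the Wicket panel submits the job once, and its AjaxSelfUpdatingTimerBehavior polls isDone(jobId) every few seconds until the result can be rendered.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CrunchingService {

    private final ExecutorService executor = Executors.newFixedThreadPool(4);
    private final Map<String, Future<Result>> jobs = new ConcurrentHashMap<>();

    // Kicks off the long-running work and returns immediately with a job id.
    public String submit(final Input input) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, executor.submit(() -> {
            Result result = crunchNumbers(input);   // the 20+ minute remote crunching
            persist(result);                        // store via an injected DAO/SessionFactory
            return result;
        }));
        return jobId;
    }

    // Polled by the AjaxSelfUpdatingTimerBehavior.
    public boolean isDone(String jobId) {
        Future<Result> future = jobs.get(jobId);
        return future != null && future.isDone();
    }

    // Only call once isDone() returns true.
    public Result getResult(String jobId) throws Exception {
        return jobs.get(jobId).get();
    }

    // Placeholders for the application-specific pieces.
    public static class Input { }
    public static class Result { }
    private Result crunchNumbers(Input input) { return new Result(); }
    private void persist(Result result) { }
}

Because the persist happens inside the background task with its own session, it does not depend on the Open Session in View filter or the lifetime of the original request.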
