Impact of using Thread.currentThread - java

Would there be any performance impact from using the snippet below in a commonly used code path that receives frequent application control?
boolean hasElements = Thread.currentThread().getStackTrace().length > 0;
and
boolean fromBatch = Arrays.stream(Thread.currentThread().getStackTrace())
        .anyMatch(elt -> "XXXXX.java".equals(elt.getFileName()));
I need to identify whether control reaches this point from the internal batch class (XXXXX.java) in order to perform the subsequent custom processing. Will the above snippet make any difference in a clustered environment?

Can we create a bucket, block public access, and enable bucket versioning in a single Java SDK call in Amazon S3?

I'm trying to create a bucket with the additional functionality of the "Block public access" setting and enabling/disabling bucket versioning through the Java SDK. The problem I'm facing is that these functionalities live in different classes of the Java SDK, which is problematic because if one of the requests is executed and another is not, the operation ends up only half performed.
So my question is: can we perform all of the above functionalities in a single Java SDK call?
For creating a bucket on Amazon S3, I'm using the following code:
CreateBucketRequest createBucketRequest = new CreateBucketRequest(bucket);
s3Object.createBucket(createBucketRequest.withObjectLockEnabledForBucket(Boolean.parseBoolean(objectLock)));
Please note that the Object Lock functionality is already included in that call.
Thanks in advance.
No, you cannot do that. The underlying S3 CreateBucket API (which each and every SDK uses) simply does not offer that.
You may want to "outsource" the bucket creation, e.g. into a dedicated Lambda function that can be retried, and write your code in a way that it can skip the bucket creation if the bucket already exists and then try to set the versioning, etc.
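For illustration, a minimal sketch of that pattern with the AWS SDK for Java v1, assuming an existing AmazonS3 client named s3 and a bucket name in bucketName; the specific block-public-access flags are placeholders you would adjust:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.*;

// Skip creation if the bucket already exists, so the whole sequence is retryable.
if (!s3.doesBucketExistV2(bucketName)) {
    s3.createBucket(new CreateBucketRequest(bucketName));
}

// Separate call: block public access.
s3.setPublicAccessBlock(new SetPublicAccessBlockRequest()
        .withBucketName(bucketName)
        .withPublicAccessBlockConfiguration(new PublicAccessBlockConfiguration()
                .withBlockPublicAcls(true)
                .withIgnorePublicAcls(true)
                .withBlockPublicPolicy(true)
                .withRestrictPublicBuckets(true)));

// Separate call: enable versioning.
s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
        bucketName,
        new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));

Each of these calls can still fail independently, which is exactly why wrapping the sequence in something idempotent and retryable (like the dedicated Lambda suggested above) helps.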

Running existing production Java applications in Spark

I've been reading up on Spark and am very interested in the ability to distribute computation across scalable compute clusters. We have production stream-processing code (5K lines, written in Java 9) that handles AMQP message processing, which we would like to run in a Spark cluster.
However, I feel like I must misunderstand the basic premise of Spark. On the one hand, it runs Java and we should be able to run our applications on it; on the other hand, it seems (from the documentation) that all code must be rewritten against the Spark API (using Dataframes/Datasets). Is this true? Can Java applications be used as-is with Spark, or must they be rewritten? This seems like a major limitation, or rather a showstopper, for us.
I think, ideally, we would want to use Spark to handle high-level message routing (using the Structured Streaming API), which would hand each message off to our Java application for computation, database writes, etc. The core part of our code is a single-class interface, and Spark could map each message to an instance of that class. Hence, there would likely be many, many instances processing messages in parallel, both within each machine instance and distributed across the cluster.
Am I missing something here?
Regarding your question "Can Java applications be used as-is with Spark, or must they be rewritten?":
Yes, you have to rewrite the data-interaction layer.
Spark reads the source data in the form of an RDD/DataFrame; in your case that means streaming DataFrames/Datasets.
Spark's parallel processing and job scheduling are based on these Datasets/DataFrames.
A DataFrame/Dataset is conceptually an array whose data is stored across multiple nodes.
So if you have logic in Java that iterates over a list and writes to a file:
conn = openFile(..)
Array[value].foreach { value ->
    updatedValue = /** your business logic on the value **/
    conn.write(updatedValue)
}
In Spark, you instead have to deal with the DataFrame:
dataframe[value].map { value ->
    updatedValue = /** your business logic on the value **/   // <-- reuse your logic here
}.saveToFile(/** file path **/)
Hopefully you can see the difference: you can reuse your business logic, but Spark has to handle the data flow, both the read and the write side (recommended).
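As a more concrete sketch in the Spark Java API, assuming your existing logic is reachable through a static method MyBusinessLogic.process(String) (a hypothetical name) and ds is a (batch) Dataset<String> obtained from a Spark source:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;

// Spark distributes this map across the cluster; only the per-record
// transformation is your code, the data movement is Spark's.
Dataset<String> updated = ds.map(
        (MapFunction<String, String>) value -> MyBusinessLogic.process(value),
        Encoders.STRING());

// Let Spark handle the write as well (recommended).
updated.write().text("/output/path"); // placeholder path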

Is it possible to get a deep copy of objects using the VersionOne Java SDK?

Let's say I want to calculate the cumulative estimate of my defects. I do
double estimate = 0.0;
Double tEstimate = 0.0;
Collection<Defect> defects = project.getDefects(null);
for (Defect d : defects) {
    tEstimate = d.getEstimate();
    if (tEstimate != null) {
        estimate += tEstimate;
    }
}
Here each call to d.getEstimate() does a callback to the server, meaning this code runs extremely slowly. I would like to take the one-time performance hit up front and download all the info along with the Defect object, probably including getting some information I won't use, but avoid hitting the latency of a server callback during each iteration of the loop.
You are using the VersionOne Object Model SDK. It does lack robustness, for the very reason you are complaining about. One of its inefficiencies is how it handles a request for a list of assets: it first fetches all of the assets with a predetermined set of attributes, such as AssetState, and checks whether each one is a dead asset. After this, it makes another call to get the same list of assets again, this time with your specified attributes. This could be remedied by a greedy strategy that grabs a set of attributes such that each member of this set is returned regardless of which attributes are requested in your .get_() method. Why? This already (sort of) happens in the REST-based VersionOne API as it stands. If the query returned all attributes, it would probably be a little wasteful, especially for humongous backlogs.
Anyway, VersionOne will be deprecating the Object Model in the near future, so if you plan on doing a lot of coding against the OM, take this into consideration.
Here are some ways to work around this problem:
1) Rewrite your code to use the VersionOne APIClient SDK. It has XML plumbing that will save you a lot of time writing your own. This is a little more verbose, but it is more powerful, fast, and efficient; the Object Model is actually built on top of the APIClient (see the sketch after this list).
2) Rewrite your code using Java and the raw VersionOne REST API. This requires that you understand HTTP and the VersionOne REST API.
3) If you cannot move away from the Object Model, you can mix the two SDKs: when you need to read large amounts of data, use APIClient code for that segment. This is rather pointless when you could just learn the APIClient and use it exclusively, unless you have a huge investment in the Object Model and cannot change. The code gets mucky real fast. Not recommended.
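A minimal sketch of option 1, assuming the standard VersionOne Java APIClient classes, placeholder connection details, and checked exception handling omitted; the point is that Estimate is selected in the query itself, so the whole list comes back in one round trip instead of one callback per defect:

import com.versionone.apiclient.*;

// Placeholder URL and credentials; adjust to your instance.
IMetaModel metaModel = new MetaModel(new V1APIConnector("https://host/instance/meta.v1/"));
IServices services = new Services(metaModel,
        new V1APIConnector("https://host/instance/rest-1.v1/", "user", "password"));

IAssetType defectType = metaModel.getAssetType("Defect");
IAttributeDefinition estimateAttr = defectType.getAttributeDefinition("Estimate");

// Ask for Estimate up front, so no per-asset server callbacks are needed later.
Query query = new Query(defectType);
query.getSelection().add(estimateAttr);

QueryResult result = services.retrieve(query);
double estimate = 0.0;
for (Asset defect : result.getAssets()) {
    Object value = defect.getAttribute(estimateAttr).getValue();
    if (value != null) {
        estimate += Double.parseDouble(value.toString());
    }
}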
The rest-1.v1 API endpoint exposes operations for assets, including DeepCopy. There is no client code that enumerates all of the operations, so you must first explore the asset using the meta.v1 API endpoint. Using the API Client backdoor from the Object Model, you can get to the classes that will allow you to call an operation once you know its name.
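For example, a hedged sketch of invoking such an operation through the APIClient, reusing the metaModel and services from the sketch above; the operation name "DeepCopy" should first be confirmed against meta.v1 for your instance, and the Oid token is hypothetical:

// Look up and execute a named operation on a specific asset.
IAssetType defectType = metaModel.getAssetType("Defect");
IOperation deepCopy = defectType.getOperation("DeepCopy");
Oid copiedOid = services.executeOperation(deepCopy, services.getOid("Defect:1234")); // hypothetical Oid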

Search optimization when data owner is someone else

In my project, we have two REST calls that take too much time, so we are planning to optimize them. Here is how it works currently: we make the first call to system A and then pass the response to system B for further processing. Once we get the response from system B, we have to manipulate it further before passing it to the UI layer, and this entire process takes a lot of time. We planned on using Solr/Lucene, but since we are not the data owners, we can't implement that. Can someone please shed some light on how best this can be handled? We are using Spring MVC and Spring Web Flow. Thanks in advance!
[EDIT:] This is not the actual scenario; I am writing this as an example for better understanding. Think of it as making a store-locator call for a particular ZIP code to get a list of 100 stores, and then sending those 100 stores to another call to get their inventory, etc. So the list of stores would change for every ZIP code, and so would the inventory.
If your query parameters to system A / system B are frequently the same, you can add a cache framework to your code. If you use Spring 3, you can enable caching easily with an @Cacheable annotation on the code calling system A. See:
http://static.springsource.org/spring/docs/3.1.0.M1/spring-framework-reference/html/cache.html
The cache subsystem will cache the result, including the work done by your processing code.
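A minimal sketch, assuming Spring 3.1+ with @EnableCaching (or the XML equivalent) configured; SystemAClient, Store, and findStores are hypothetical names standing in for your own code:

import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class SystemAClient {

    // The first call for a given ZIP code hits system A; subsequent calls
    // with the same key are served from the "stores" cache.
    @Cacheable("stores")
    public List<Store> findStores(String zipCode) {
        return callSystemA(zipCode);
    }

    private List<Store> callSystemA(String zipCode) {
        // ... the actual, expensive REST call to system A goes here ...
        throw new UnsupportedOperationException("not implemented in this sketch");
    }
}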

Retrieving Large Lists of Objects Using Java EE

Is there a generally-accepted way to return a large list of objects using Java EE?
For example, if you had a database ResultSet that had millions of objects how would you return those objects to a (remote) client application?
Another example -- that is closer to what I'm actually doing -- would be to aggregate data from hundreds of sources, normalize it, and incrementally transfer it to a client system as a single "list".
Since all the data cannot fit in memory, I was thinking that a combination of a stateful SessionBean and some sort of custom Iterator that called back to the server would do the trick.
So, in other words, if I have an API like Iterator<Data> getData() then what's a good way to implement getData() and Iterator<Data>?
How have you successfully solved this problem in the past?
Definitely don't duplicate the entire DB into Java's memory. That makes no sense and only makes things unnecessarily slow and memory-hogging. Rather, introduce pagination at the database level: query only the data you actually need to display on the current page, like Google does.
If you're actually having a hard time implementing this properly and/or figuring out the SQL query for your specific database, then have a look at this answer. For the JPA/Hibernate equivalent, have a look at this answer.
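For illustration, a minimal sketch of database-level pagination with plain JPA; Item, em, and pageIndex are placeholder names:

// Fetch one page of 100 rows; the database does the limiting, not Java.
List<Item> page = em.createQuery("SELECT i FROM Item i ORDER BY i.id", Item.class)
        .setFirstResult(pageIndex * 100) // offset of the requested page
        .setMaxResults(100)              // page size
        .getResultList();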
Update as per the comments (which actually changes the entire question subject...), here's a basic (pseudo) kickoff example:
List<Source> inputSources = createItSomehow();
Source outputSource = createItSomehow();

for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        outputSource.write(inputSource.read());
    }
}
This way you effectively end up with a single entry in Java's memory instead of the entire collection as in the following (inefficient) example:
List<Source> inputSources = createItSomehow();
List<Entry> entries = new ArrayList<Entry>();

for (Source inputSource : inputSources) {
    while (inputSource.next()) {
        entries.add(inputSource.read());
    }
}

Source outputSource = createItSomehow();
for (Entry entry : entries) {
    outputSource.write(entry);
}
Pagination is a good solution when working with a web-based UI. Sometimes, however, it is much more efficient to stream everything in one call. The rmiio library was written explicitly for this purpose, and is already known to work in a variety of app servers.
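A minimal sketch of the rmiio idea, assuming the remote interface hands back a RemoteInputStream rather than an Iterator; the class names are rmiio's, while the method name and data source are placeholders:

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import com.healthmarketscience.rmiio.RemoteInputStream;
import com.healthmarketscience.rmiio.RemoteInputStreamClient;
import com.healthmarketscience.rmiio.RemoteInputStreamServer;
import com.healthmarketscience.rmiio.SimpleRemoteInputStream;

// Server side: wrap a local stream so it can be consumed over RMI.
public RemoteInputStream getDataStream() throws IOException {
    RemoteInputStreamServer server = new SimpleRemoteInputStream(
            new BufferedInputStream(new FileInputStream("data.bin"))); // placeholder source
    return server.export();
}

// Client side: unwrap the remote handle back into a plain InputStream
// and read incrementally, never holding the whole dataset in memory.
InputStream in = RemoteInputStreamClient.wrap(remoteStream);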
If your list is huge, you must assume that it can't fit in memory, or at least that, if your server has to handle it under heavy concurrent access, you run a high risk of an OutOfMemoryError.
So basically, what you do is paging combined with batch reading: say you load a thousand objects from your database, send them to the client in the request/response, and loop until you have processed all the objects. (See the response from BalusC.)
The problem is the same on the client side, and you'll likely need to stream the data to the file system to prevent out-of-memory errors.
Please also note: it is okay to load millions of objects from a database as an administrative task, like performing a backup or an export in some exceptional case. But you should not offer it as a request any user could make; it will be slow and drain server resources.
