Clustering: Finding an Average Reading [closed] - java

I am looking for an algorithm in the area of clustering or machine learning that will help create a typical data reading for a group of readings. The catch is that it must handle time-series data, so some traditional techniques (such as k-means) are not as useful.
Can anyone recommend places to look, or particular algorithms that would produce a typical reading and be relatively simple to implement (in Java), manipulate, and understand?

As an idea: try to convert all data types into time; then you will have vectors of the same type (time), and any clustering strategy will work fine.
By converting to time I mean that any measurement or data type we know about has time in its nature. Time is not a fourth dimension, as many think! Time is actually the zeroth dimension. Even a point with no physical dimensions, which may not exist in space, exists in time.
Distance, weight, temperature, pressure, direction, speed... every measure we take can be converted into some function of time.
I have tried this approach on several projects and it paid back with really nice solutions.
Hope this might help you here as well.

For most machine learning problems in Java, Weka usually works pretty well.
See, for example: http://facweb.cs.depaul.edu/mobasher/classes/ect584/weka/k-means.html
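As a rough sketch of what that looks like in code (assuming Weka is on the classpath; the file name and cluster count are placeholders), you can run k-means and treat each cluster centroid as the "typical reading" for its group:

    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class KMeansSketch {
        public static void main(String[] args) throws Exception {
            // Load readings from an ARFF file (file name is a placeholder)
            Instances data = new DataSource("readings.arff").getDataSet();

            SimpleKMeans kMeans = new SimpleKMeans();
            kMeans.setNumClusters(3); // assumed number of groups
            kMeans.buildClusterer(data);

            // Each centroid can serve as the "typical reading" of its cluster
            Instances centroids = kMeans.getClusterCentroids();
            System.out.println(centroids);
        }
    }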

Related

What's the most efficient technique to constantly read JSONs from an API? [closed]

I'm about to start a program that constantly gets information from a web API, so I want it to read the API's JSON as fast as possible in order not to lose any time (I want to check some values that are constantly updating).
As I'm coding in Java, I thought I could use Gson to parse that JSON, but I don't know if it's the most efficient way.
Also, is Java a good language for reading APIs efficiently?
If it's about performance, there are several stages to look at and optimize:
- query speed for the data (e.g. the HTTP or network client used)
- speed of parsing the result
- speed of outputting the data
- ...
If you want to improve the performance, I would suggest you first analyze where in that chain the most time is lost, for example by logging the duration of each part.
Once you have found the bottleneck, you can improve it and measure again to see whether your program became faster.
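A minimal sketch of that kind of measurement (assuming Java 11+ for java.net.http and Gson on the classpath; the URL is a placeholder):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import com.google.gson.Gson;
    import com.google.gson.JsonObject;

    public class ApiTiming {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/api/values")) // placeholder URL
                    .build();

            long t0 = System.nanoTime();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            long t1 = System.nanoTime();

            JsonObject parsed = new Gson().fromJson(response.body(), JsonObject.class);
            long t2 = System.nanoTime();

            System.out.printf("fetch: %.1f ms, parse: %.1f ms (%d fields)%n",
                    (t1 - t0) / 1e6, (t2 - t1) / 1e6, parsed.entrySet().size());
        }
    }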
For example, you could perhaps use UDP instead of TCP, or HTTP/2 instead of the older HTTP/1.1 protocol, and so on.
If it's really the parsing part that takes the critical time, you could exploit the fact that the data always has the same structure. For example, you could look for "keywords" in your JSON and extract the text right before or after them, so your program doesn't have to parse (or "understand") the whole structure and can (possibly) operate faster.
Or you could extract the facts you are looking for by position (for example, if the information always comes after the sixth opening curly brace).
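A less brittle variant of the same idea is a streaming parser, which reads tokens in order and skips everything it doesn't need instead of building a full object tree. A minimal sketch using Gson's JsonReader (the "price" field name is a made-up example, and this only scans the top-level object):

    import com.google.gson.stream.JsonReader;
    import java.io.StringReader;

    public class StreamingExtract {
        // Pull a single field without building the whole object tree.
        // The field name "price" is a made-up example.
        static double extractPrice(String json) throws Exception {
            try (JsonReader reader = new JsonReader(new StringReader(json))) {
                reader.beginObject();
                while (reader.hasNext()) {
                    if ("price".equals(reader.nextName())) {
                        return reader.nextDouble();
                    }
                    reader.skipValue(); // ignore fields we don't care about
                }
                reader.endObject();
            }
            throw new IllegalArgumentException("no price field found");
        }

        public static void main(String[] args) throws Exception {
            System.out.println(extractPrice("{\"symbol\":\"ABC\",\"price\":42.5}"));
        }
    }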
But you should only optimize if there is a real performance gain (see the first part of the answer), because it's quite likely that your code gets less readable when you optimize it for the sake of performance. That's often the tradeoff one has to choose.

Design for storing large objects in a list [closed]

I need a hint on an interview question that I came across. I tried to find a solution, but I need advice from the experts here. What different strategies would you employ had you come across this situation? The question and my thoughts are as follows:
Q. You want to store a huge number of objects in a list in Java. The number of objects is very large and gradually increasing, but you have very limited memory available. How would you do that?
A. I answered by saying that, once the number of elements in the list got over a certain threshold, I would dump them to a file. I would then build a cache-like data structure that holds the most frequently or most recently added elements. I gave the analogy of page swapping employed by the OS.
Q. But this would involve disk access and it would be slower and affect the execution.
I did not know the solution to this and could not think properly during the interview. I answered:
A. In this case, I would think of horizontally scaling the system or adding more RAM.
Immediately after I answered this, my telephone interview ended. I felt that the interviewer was not happy with the answer. But then, what should the answer have been?
Moreover, I am not curious only about the answer; I would like to learn about the different ways this can be handled.
I am not sure, but this sounds somewhat like the Flyweight pattern. It is the same pattern that is used in the String pool, and an efficient implementation of it is a must. Apart from that, we could focus on database-backed persistence once the threshold limit is surpassed. Another technique is to serialize the objects, but as you said, the interviewer was not satisfied and wanted some other explanation.
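To make the Flyweight idea concrete, here is a minimal sketch (the Unit and Reading class names are made up for illustration): identical immutable state is interned in a pool, so a huge list of objects stores each distinct shared part only once.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Shared, immutable part of the objects, interned in a pool so each
    // distinct value exists only once in memory.
    final class Unit {
        private static final Map<String, Unit> POOL = new ConcurrentHashMap<>();
        private final String name;

        private Unit(String name) { this.name = name; }

        static Unit of(String name) {
            return POOL.computeIfAbsent(name, Unit::new); // reuse if already pooled
        }
    }

    // The per-object state stays small; the bulky shared part is a flyweight.
    final class Reading {
        final double value; // unique per object
        final Unit unit;    // shared flyweight

        Reading(double value, Unit unit) {
            this.value = value;
            this.unit = unit;
        }
    }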

What would be a real-life example of when insertion/deletion is not common, but searching is? [closed]

In Data Structures and Algorithms in Java, the advantages of ordered arrays are stated. One of these advantages is something I wish I had a real example for. This is not homework, just self-clarification: what are some real cases where insertion/deletion is infrequent, but searches are frequent? Anything would help, even a pointer to a GitHub repository. Thank you.
An example would be a dictionary. After it is built, it can be looked up millions of times. Like your paper dictionary, the words in it had better be sorted.
While I like leeyuiwah's answer, a more common domain, one you can see in any commercial context, is a database table for some entity, for example customers or employees, over which you normally create a view. That's why we index such tables (to make retrieval faster): after inserting some records, most of the operations are retrievals, each of which involves a search (based on complicated conditions or on a simple identifier).
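The sorted-array case is exactly what binary search exploits: build once, search many times. A minimal sketch (the word list is made up):

    import java.util.Arrays;

    public class DictionaryLookup {
        public static void main(String[] args) {
            // Built (and sorted) once, searched many times
            String[] words = {"apple", "banana", "cherry", "date", "fig"};

            // Arrays.binarySearch requires a sorted array: O(log n) per lookup
            int idx = Arrays.binarySearch(words, "cherry");
            System.out.println(idx >= 0 ? "found at index " + idx : "not found");
        }
    }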

Does it cost a huge amount of time because the numbers are huge? [closed]

I have a program that involves a bunch of huge numbers (I have to put them into a big-number type such as BigInteger). The running time is unexpectedly huge too. So I was wondering: do these two factors have a connection? Any comments are greatly appreciated.
Do they have a connection to each other? Probably not.
You can have a high-complexity algorithm working on small numbers (such as calculating the set of all subsets of ten thousand numbers all in the range 0..30000), and you can have very efficient algorithms working on large numbers (such as simply adding up ten thousand BigInteger variables).
However, they'll both probably have a compounding effect on the time it takes your program to run. Large numbers will add a bit, and a high-complexity algorithm will add a bit more. I say "add", but the effect is likely to be multiplicative, which is much worse. For example, an inefficient algorithm may make your code take 30% longer, and the use of BigInteger may add 30% on top of that, giving you a 69% overall hit:
t * 1.3 * 1.3 = 1.69t
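As a rough illustration of the "large numbers add a bit" part, here is a micro-benchmark sketch (the loop count and operand sizes are arbitrary, and timings from a bare loop like this are indicative at best): the same number of additions gets slower as the operands grow, because BigInteger arithmetic costs grow with the number of digits.

    import java.math.BigInteger;

    public class BigOperandCost {
        public static void main(String[] args) {
            BigInteger small = BigInteger.valueOf(12345);
            BigInteger huge = BigInteger.TEN.pow(10_000); // a 10,001-digit number

            // Same number of additions; only the operand size differs
            System.out.printf("small operands: %.1f ms%n", time(small));
            System.out.printf("huge operands:  %.1f ms%n", time(huge));
        }

        static double time(BigInteger operand) {
            long t0 = System.nanoTime();
            BigInteger sum = BigInteger.ZERO;
            for (int i = 0; i < 10_000; i++) {
                sum = sum.add(operand);
            }
            return (System.nanoTime() - t0) / 1e6;
        }
    }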
Sorry for the general answer, but without more specifics in the question, a general answer is probably the best you'll get. In any case, I believe (or at least hope) it answers the question you asked.

Implementation of Autoencoder [closed]

I'm trying to implement an auto-encoder on my own in Java. From the theory, I understand that an auto-encoder is basically a symmetric network.
So, if I choose to have 5 layers in total, do I have to use 9 layers in the training (back-propagation) phase, or are 5 layers enough?
I've been reading the theory, but it is too abstract and full of math formulas, and I could not find any implementation details via Google.
What's the usual way of doing this?
During the training phase, an auto-encoder uses back-propagation to try to make the output similar to the input, with the goal of minimizing the error. (The original post illustrated this with a diagram, not reproduced here: 7 layers during training, while the actual network after training has 4.) So, while training, can I implement the back-propagation with just the 4 layers? If so, how can I do this?
Simple back-propagation won't work with so many layers. Due to the so-called vanishing-gradient phenomenon, networks with more than two hidden layers won't learn anything reasonable this way. In fact, the best results are obtained with one hidden layer. So in the case of an autoencoder you should have an INPUT layer, a HIDDEN layer, and an OUTPUT layer. There is no need for more; the Universal Approximation Theorem clearly shows that this is enough for any problem.
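To make the INPUT -> HIDDEN -> OUTPUT structure concrete, here is a minimal forward-pass sketch. Training is omitted, and tying the decoder weights to the transpose of the encoder weights is an assumption (a common choice), not something the answer above requires.

    import java.util.Random;

    // A minimal INPUT -> HIDDEN -> OUTPUT autoencoder forward pass with
    // tied weights (decoder = transpose of encoder). Training is omitted.
    public class TinyAutoencoder {
        final int nIn, nHidden;
        final double[][] w;           // encoder weights [nHidden][nIn]
        final double[] bHidden, bOut; // biases

        TinyAutoencoder(int nIn, int nHidden, long seed) {
            this.nIn = nIn;
            this.nHidden = nHidden;
            Random rnd = new Random(seed);
            w = new double[nHidden][nIn];
            for (double[] row : w)
                for (int i = 0; i < nIn; i++)
                    row[i] = rnd.nextGaussian() * 0.1; // small random init
            bHidden = new double[nHidden];
            bOut = new double[nIn];
        }

        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

        double[] encode(double[] x) {
            double[] h = new double[nHidden];
            for (int j = 0; j < nHidden; j++) {
                double s = bHidden[j];
                for (int i = 0; i < nIn; i++) s += w[j][i] * x[i];
                h[j] = sigmoid(s);
            }
            return h;
        }

        double[] decode(double[] h) {
            double[] y = new double[nIn];
            for (int i = 0; i < nIn; i++) {
                double s = bOut[i];
                for (int j = 0; j < nHidden; j++) s += w[j][i] * h[j]; // tied weights
                y[i] = sigmoid(s);
            }
            return y;
        }

        double[] reconstruct(double[] x) { return decode(encode(x)); }

        public static void main(String[] args) {
            TinyAutoencoder ae = new TinyAutoencoder(4, 2, 42L);
            double[] x = {0.9, 0.1, 0.8, 0.2};
            System.out.println(java.util.Arrays.toString(ae.reconstruct(x)));
        }
    }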
From the OOP point of view, it depends on whether you plan to reuse this code with different types of neurons, and by a type of neuron I mean something deeper than just a different activation function: different behaviour (stochastic neurons?) or different topologies (not fully connected networks). If not, modeling each neuron as a separate object is completely redundant.
