Suppose I am sampling a number of signals at a fixed rate (say once per second) and extracting some metrics from the signals, such as the ratio of one to another, the rate of change, the relative rate of change, etc.
I've heard that Neural Networks can be of use in discovering relationships. Is this true?
If so, what books/internet resources can I use to learn more about how to do this?
The processing is being done in Java, so a Java slant on all your answers would be most appreciated.
Thanks
Most likely you would need to determine some sort of a "window", such as the last 10 samples. You would normalize your signal into an array of 10 doubles scaled between -1 and 1. This would form the input to your neural network, so you would have 10 input neurons. Then you have to decide what you want the output to be. Perhaps you have 100 different classifications that you want to sort the signals into. If this is the case, you would have 100 output neurons, each trained to produce a higher output than the other output neurons when it recognizes a specific signal.
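As a rough sketch of that normalization step (plain Java; the window size of 10 and the sample values are just placeholders), scaling the most recent samples into the -1..1 range might look like this:

```java
// Sketch: scale a window of raw samples into the range [-1, 1].
// The window size (10) and the raw values are placeholders.
public class WindowNormalizer {
    public static double[] normalize(double[] window) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : window) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double[] out = new double[window.length];
        double range = max - min;
        for (int i = 0; i < window.length; i++) {
            // Guard against a flat window (all samples equal).
            out[i] = (range == 0) ? 0.0 : 2.0 * (window[i] - min) / range - 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] lastTenSamples = {3.1, 3.4, 2.9, 3.0, 3.8, 4.1, 3.7, 3.2, 3.0, 2.8};
        double[] input = normalize(lastTenSamples); // feed this to the 10 input neurons
        System.out.println(java.util.Arrays.toString(input));
    }
}
```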
Between the input and output layers neural networks usually have one or more hidden layers. These just provide additional capability to the neural network.
For Java neural network programming, you might try the Encog project. There is a DotNet version of Encog as well.
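A minimal sketch of such a network in Encog 3 (the class names are from the Encog 3 API; the 10 input and 100 output neurons are just the example numbers above, and the 25-neuron hidden layer is an arbitrary starting point):

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;

public class SignalNetwork {
    public static BasicNetwork build() {
        BasicNetwork network = new BasicNetwork();
        // 10 input neurons: one per normalized sample in the window.
        network.addLayer(new BasicLayer(null, true, 10));
        // One hidden layer; 25 neurons is an arbitrary guess to tune later.
        network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 25));
        // 100 output neurons: one per classification.
        network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 100));
        network.getStructure().finalizeStructure();
        network.reset(); // randomize the initial weights
        return network;
    }
}
```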
That is true: you can discover relationships with a NN. The problem is that it's very hard to interpret the weights after you calibrate (train) the network, so they are a bit of a black box (more so than other data mining algorithms).
I would actually recommend exploring the neural net algorithm that comes with MS Analysis Services. It's a good way to learn about NNets before you start programming anything (and since it's a server service you can call it from Java).
I am currently doing my dissertation, which involves two people: a professional athlete and an amateur. First, using image-processing skeletonization, I would like to record the professional athlete while performing the squat exercise; then, when the amateur performs the exercise, I want to be able to compare the professional's skeleton with that of the amateur to see if the movement is properly formed.
I'm open to any suggestions and opinions, and would gladly appreciate some help.
Here lies your question:
properly formed.
What does "properly formed" actually mean? How can this be quantified?
Bear in mind I'm not an athlete/experienced in this field.
If I were given the task I would, counter-intuitively, go in the opposite direction:
moving away from Processing 3/Kinect/computer. I would instead:
find a professional athlete
find a trainer skilled in functional mobility training.
find an amateur (probably easiest)
Item 2 will be trickier. For example, FMS seems to put a lot of emphasis on correct exercising and mobility (to enhance performance and reduce the risk of injuries). I'm not sure if that's the only approach or the best one. You might want to check opinions on Physical Fitness, consult with people studying/teaching exercise science, etc. Do check credentials, as it feels like a field where everyone has an opinion/preference.
The idea is to understand how a professional, educated trainer assesses correct movement. Take note of how that works in the real world and try to systemise it.
What are the cues for a correct execution ?
is it the key poses?
the motion in between?
how the skeletal and muscular systems work together / the weights/forces applied / etc.?
Having a better understanding of how this works in the real world should lead you to things you can start quantifying/comparing numerically on a computer.
Try to make a checklist/score system manually using a pen and paper based on the information you gather. If this works you already have a system you can start programming.
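For instance, a paper checklist might later translate into a very simple weighted score (the cue names and weights below are purely hypothetical placeholders; a real list would come from the trainer/research step described above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a manual squat checklist turned into a weighted score in [0, 1].
public class SquatChecklist {
    public static double score(Map<String, Boolean> cues, Map<String, Double> weights) {
        double total = 0, max = 0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            max += e.getValue();
            if (cues.getOrDefault(e.getKey(), false)) {
                total += e.getValue();
            }
        }
        return max == 0 ? 0 : total / max; // higher is closer to "correct"
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("knees track over toes", 2.0);
        weights.put("back stays neutral", 3.0);
        weights.put("hips reach below knee level", 1.0);

        Map<String, Boolean> observed = new LinkedHashMap<>();
        observed.put("knees track over toes", true);
        observed.put("back stays neutral", false);
        observed.put("hips reach below knee level", true);

        System.out.println("Score: " + score(observed, weights));
    }
}
```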
The next step is acquiring the data.
This is probably where the Kinect comes in, but bear in mind:
the second version of the Kinect is more precise than the first
there is a Kinect2 SDK wrapper for Processing 3: use that if you can (Windows only). There is a way to get libfreenect2 working with OpenNI on OS X/Linux, and therefore with SimpleOpenNI in Processing, but it's not straightforward and you won't get the same precision from the skeleton tracking algorithm
use data that is as precise as possible:
you can get the accuracy of a tracked skeleton joint
use an environment that doesn't contain a complex background (this makes it easy to segment users and detect/track skeletons with little chance of mistaking something else for them). Prefer artificial non-incandescent light (less of a problem with Kinect v2, but you still want as little IR interference as possible).
comparing orientation matrices or joints on single poses might not be enough to get the full picture: how do you capture/quantify motion, taking into account the things that the Kinect can't easily see (muscles flexing, forces applied, the moving centre of gravity, etc.)? A single-pose starting point is sketched after the images below.
try to use a grid system that will make it simple to pair the digital values with real world measurements. Check out how people used to study motion in the past, for example Étienne-Jules Marey or Eadweard Muybridge
Motion capture by Étienne-Jules Marey
Motion study by Eadweard Muybridge (notice the grid)
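As mentioned in the Kinect notes above, here is one hedged way a single-pose comparison could start: compute joint angles from 3D joint positions and compare them between the two skeletons. The coordinates below are placeholder values; with the Kinect2 wrapper or SimpleOpenNI you would fill them from the tracked skeleton instead.

```java
// Sketch: compare a knee angle between two skeletons using plain 3D vectors.
// The joint positions (hip, knee, ankle) are placeholders for tracked data.
public class JointAngleComparison {

    static double angleAt(double[] a, double[] b, double[] c) {
        // Angle at joint b formed by segments b->a and b->c, in degrees.
        double[] u = {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
        double[] v = {c[0] - b[0], c[1] - b[1], c[2] - b[2]};
        double dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
        double nu = Math.sqrt(u[0] * u[0] + u[1] * u[1] + u[2] * u[2]);
        double nv = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        double cos = Math.max(-1.0, Math.min(1.0, dot / (nu * nv)));
        return Math.toDegrees(Math.acos(cos));
    }

    public static void main(String[] args) {
        // Placeholder hip/knee/ankle positions for the professional...
        double proAngle = angleAt(new double[]{0, 1.0, 0}, new double[]{0, 0.5, 0.1}, new double[]{0, 0, 0});
        // ...and for the amateur.
        double amateurAngle = angleAt(new double[]{0, 1.0, 0}, new double[]{0, 0.5, 0.25}, new double[]{0, 0, 0});
        System.out.printf("Knee angle difference: %.1f degrees%n", Math.abs(proAngle - amateurAngle));
    }
}
```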
It's a pretty full-on project to get right, involving bits of anatomy/physics/kinematics/etc.
Start with the research first:
how did people study this in the past ?
what are the current developments ?
how does it work in the real world (without computers) ?
Take your constraints into account:
what resources (people/gear/etc.) can you use ?
how much time do you have available ?
Given the above, decide what topic/section of the project can realistically be tackled to get useful results.
Overall probably something along these lines:
background research
real world studies
a comparison system whose features can be measured both with the Kinect and by a person
record data (real-world data + mobility comparison evaluation, and Kinect data + mobility comparison)
compare data
write evaluation of findings (how effective is the system? what are limitations ? what could be improved (future work) ? etc.)
In short, be aware of the Kinect's limitations: skeleton tracking is probability-based, so it's not 100% accurate. Use data that's as clean/correct as possible to begin with (it's easier to acquire good data if you can control the capture environment). Of the things a real trainer would track, what could you track with a Kinect? Do a comparison of the intersecting measurements.
I am reading about the Received Signal Strength Indicator (RSSI). This could be used for our particular case: a rough estimation of the distance between devices. But maybe there is something it could be combined with in order to improve the accuracy.
There are too many variables at play to accurately determine such data.
It's completely impractical to obtain a reliable distance figure just looking at the RSSI. It may give you an order of magnitude but nothing remotely accurate.
Most notable is the variety of devices and underlying hardware used. But take into account this simple example: you have 2 pairs of devices; one pair reports a good signal across a range of 30 m with no obstacles in between, while the other pair reports the same "good signal" value at 1 m apart but with an obstacle causing heavy interference. Any interpretation from such empirical data would be awfully inaccurate.
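To see why it is so rough, here is the standard log-distance path-loss estimate that most RSSI-to-distance attempts boil down to. The txPower and path-loss exponent below are assumed values; the exponent alone swings from roughly 2 in free space to 4+ indoors, which is exactly why the result is little more than an order of magnitude:

```java
// Sketch: log-distance path-loss model, d = 10^((txPower - rssi) / (10 * n)).
// txPower (RSSI measured at 1 m) and n (path-loss exponent) are guesses here,
// and in practice they vary per device and per environment.
public class RssiDistance {
    static double estimateDistanceMetres(double rssi, double txPowerAt1m, double pathLossExponent) {
        return Math.pow(10.0, (txPowerAt1m - rssi) / (10.0 * pathLossExponent));
    }

    public static void main(String[] args) {
        double rssi = -70.0;     // measured value
        double txPower = -40.0;  // assumed RSSI at 1 m
        // Same reading, very different answers depending on the environment:
        System.out.println(estimateDistanceMetres(rssi, txPower, 2.0)); // ~31.6 m (free space)
        System.out.println(estimateDistanceMetres(rssi, txPower, 4.0)); // ~5.6 m (indoors, obstacles)
    }
}
```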
The best chance you have is to pinpoint each device's location by GPS and communicate & compare coordinates. But again, the underlying hardware comes into play, as does the length of time it takes to get a fix with better accuracy. We're talking about 5-50 metres of error, so it's basically mostly noise.
Go beyond that and you are out of Wi-Fi Direct / peer-to-peer range.
As such, trying to estimate distances between 2 Android devices at this time would, in my view, be an utter waste of time (unless you are building a huge network of interconnected Wi-Fi Direct devices, in which case you could come up with some cool applications, or at least visualisation stuff - though that's a bit of a stretch considering the one-to-many limitations).
I am using the OpenNLP TokenNameFinder for parsing unstructured data. I have created a corpus (training set) of 4MM records, but as I create a model out of this corpus using the OpenNLP APIs in Eclipse, the process takes around 3 hrs, which is very time consuming. The model is built with the default parameters, i.e. 100 iterations and a cutoff of 5.
So my question is: how can I speed up this process, i.e. reduce the time taken to build the model?
Size of the corpus could be the reason for this but just wanted to know if someone came across this kind of problem and if so, then how to solve this.
Please provide some clue.
Thanks in advance!
Usually the first approach to handle such issues is to split the training data into several chunks and let each one create a model of its own. Afterwards you merge the models. I am not sure this is valid in this case (I'm not an OpenNLP expert); there's another solution below. Also, as it seems that the OpenNLP API provides only single-threaded train() methods, I would file an issue requesting a multi-threaded option.
For a slow single-threaded operation the two main slowing factors are IO and CPU, and both can be handled separately:
IO - which hard drive do you use? Regular (magnetic) or SSD? Moving to an SSD should help.
CPU - which CPU are you using? Moving to a faster CPU will help. Don't pay attention to the number of cores, as here you want the raw speed.
An option you may want to consider is to get a high-CPU server from Amazon Web Services or Google Compute Engine and run the training there - you can download the model afterwards. Both offer high-CPU servers using Xeon (Sandy Bridge or Ivy Bridge) CPUs and local SSD storage.
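For reference, here is roughly how those parameters (100 iterations, cutoff 5) are set explicitly when training, assuming OpenNLP 1.6+ and a hypothetical corpus file train.txt in name-sample format; lowering the iteration count is one of the few purely software knobs for training time, at some cost in model quality:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.MarkableFileInputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class TrainNameFinder {
    public static void main(String[] args) throws Exception {
        // train.txt is a hypothetical corpus in the OpenNLP name-sample format.
        ObjectStream<String> lines = new PlainTextByLineStream(
                new MarkableFileInputStreamFactory(new File("train.txt")), StandardCharsets.UTF_8);
        ObjectStream<NameSample> samples = new NameSampleDataStream(lines);

        TrainingParameters params = TrainingParameters.defaultParams();
        params.put(TrainingParameters.ITERATIONS_PARAM, "100"); // try lowering, e.g. "70"
        params.put(TrainingParameters.CUTOFF_PARAM, "5");

        TokenNameFinderModel model = NameFinderME.train(
                "en", "default", samples, params, new TokenNameFinderFactory());
        samples.close();
    }
}
```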
I think you should make algorithm related changes before upgrading the hardware.
Reducing the sentence size
Make sure you don't have unnecessarily long sentences in the training sample. Such sentences don't improve performance but have a huge impact on computation (I'm not sure of the order). I generally put a cutoff at 200 words/sentence. Also look at the features closely; these are the default feature generators:
two kinds of WindowFeatureGenerator with a default window size of only two
OutcomePriorFeatureGenerator
PreviousMapFeatureGenerator
BigramNameFeatureGenerator
SentenceFeatureGenerator
These feature generators produce the following features for the word "Robert" in the given sentence.
Sentence: Robert, creeley authored many books such as Life and Death, Echoes and Windows.
Features:
w=robert
n1w=creeley
n2w=authored
wc=ic
w&c=robert,ic
n1wc=lc
n1w&c=creeley,lc
n2wc=lc
n2w&c=authored,lc
def
pd=null
w,nw=Robert,creeley
wc,nc=ic,lc
S=begin
ic is Initial Capital, lc is lower case
Of these features, S=begin is the only sentence-dependent feature; it marks that "Robert" occurred at the start of the sentence.
My point is to explain the role of a complete sentence in training. You can actually drop the SentenceFeatureGenerator and reduce the sentence size further, to accommodate only a few words in the window around the desired entity. This will work just as well.
I am sure this will have a huge impact on complexity and very little on performance.
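As a sketch, the default set minus SentenceFeatureGenerator could be built like this (these class names are the feature generators listed above, from opennlp.tools.util.featuregen; how the aggregate is plugged in, via a NameFinderME.train() overload or a TokenNameFinderFactory, depends on your OpenNLP version):

```java
import opennlp.tools.util.featuregen.AdaptiveFeatureGenerator;
import opennlp.tools.util.featuregen.AggregatedFeatureGenerator;
import opennlp.tools.util.featuregen.BigramNameFeatureGenerator;
import opennlp.tools.util.featuregen.OutcomePriorFeatureGenerator;
import opennlp.tools.util.featuregen.PreviousMapFeatureGenerator;
import opennlp.tools.util.featuregen.TokenClassFeatureGenerator;
import opennlp.tools.util.featuregen.TokenFeatureGenerator;
import opennlp.tools.util.featuregen.WindowFeatureGenerator;

public class ReducedFeatureGenerators {
    // The default generators listed above, minus SentenceFeatureGenerator.
    public static AdaptiveFeatureGenerator create() {
        return new AggregatedFeatureGenerator(
                new WindowFeatureGenerator(new TokenFeatureGenerator(), 2, 2),
                new WindowFeatureGenerator(new TokenClassFeatureGenerator(true), 2, 2),
                new OutcomePriorFeatureGenerator(),
                new PreviousMapFeatureGenerator(),
                new BigramNameFeatureGenerator());
    }
}
```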
Have you considered sampling?
As I have described above, the features are a very sparse representation of the context. Maybe you have many sentences that look like duplicates to the feature generators. Try to detect these and sample in a way that represents sentences with diverse patterns, i.e. it should be impossible to write only a few regular expressions that match them all. In my experience, training samples with diverse patterns did better than those that represented only a few patterns, even though the former had a much smaller number of sentences. Sampling this way should not affect the model performance at all.
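One hedged way to do that detection (the crude word-shape signature below is only a stand-in for whatever pattern notion fits your data) is to keep at most a few sentences per signature:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: keep at most K sentences per "pattern signature" to favour diversity.
public class DiversitySampler {
    static String signature(String sentence) {
        StringBuilder sb = new StringBuilder();
        for (String tok : sentence.split("\\s+")) {
            if (tok.matches("[A-Z].*")) sb.append("ic ");      // initial capital
            else if (tok.matches("\\d.*")) sb.append("num ");   // starts with a digit
            else sb.append("lc ");                               // lower case / other
        }
        return sb.toString();
    }

    public static List<String> sample(List<String> sentences, int perPattern) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String s : sentences) {
            String sig = signature(s);
            int count = seen.getOrDefault(sig, 0);
            if (count < perPattern) {
                kept.add(s);
                seen.put(sig, count + 1);
            }
        }
        return kept;
    }
}
```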
Thank you.
I would like to classify a dataset automatically into several classes. Is it possible to train a neural net without coding any descriptors?
I am classifying a set of fixed-size pictures. I do not really want to write a set of descriptors for them, though. Is there a way I can classify my set with little effort?
I have a large dataset and only 7-8 classes in which to classify.
I would be extremely happy if I could snag some sample code along the way :)
There is a very broad class of neural networks. For what you're doing, you will want to look for one based on unsupervised learning. Typically, these are based on Hebb's rule.
For your case, you might find competitive learning to be suitable. Essentially, you set up one output neuron for each class (so the 7-8 you're expecting), and strengthen the weights to the most active output neuron for a given input pattern. This results in clustering: input patterns that are similar activate the same output neuron, strengthening those connections and causing the neurons to specialize for the different classes.
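A minimal winner-take-all sketch of that idea in plain Java (the class count, learning rate and the way you flatten pictures into a double[] are all placeholder assumptions):

```java
import java.util.Random;

// Sketch of winner-take-all competitive learning: one weight vector per class,
// and the closest ("most active") unit is pulled towards each input.
public class CompetitiveLearning {
    final double[][] weights;   // one prototype per output neuron
    final double learningRate;

    CompetitiveLearning(int numClasses, int inputSize, double learningRate, long seed) {
        this.weights = new double[numClasses][inputSize];
        this.learningRate = learningRate;
        Random rnd = new Random(seed);
        for (double[] w : weights)
            for (int i = 0; i < inputSize; i++)
                w[i] = rnd.nextDouble();
    }

    int winner(double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < weights.length; k++) {
            double d = 0;
            for (int i = 0; i < x.length; i++) {
                double diff = x[i] - weights[k][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = k; }
        }
        return best;
    }

    // Strengthen the winning neuron's weights towards the input pattern.
    int train(double[] x) {
        int k = winner(x);
        for (int i = 0; i < x.length; i++)
            weights[k][i] += learningRate * (x[i] - weights[k][i]);
        return k;
    }
}
```

You would flatten each picture into a double[] (for example, normalized pixel values), call train() on every sample for a few passes, and then use winner() as the cluster/class label.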
Hello, I want to develop an intrusion detection system using a neural network.
I know there are 41 inputs (I know this from the dataset which I used to train the neural network).
I need help capturing these 41 inputs from a live connection. Please, somebody help me, or at least guide me in the correct direction.
Thank you for your answers in advance...
What you are trying to do is feature extraction or reduction on your input data.
As input data I could imagine logs from a firewall, captured packets, ...
And as features you could have things like failed login attempts per time unit, number of connections, ...
But if you want your system to work with the training you feed it, the features in the data you process need to have the same distribution as the data you trained on (or at least a very similar one).
So to make matters short and simple: if you want to use the training data you cite, you need to find out exactly which data was gathered to build it, and exactly how that data was preprocessed.
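As an illustrative sketch only, here is how two of the feature types mentioned above (failed logins per time unit, connections in a sliding window) could be derived from a stream of events. The event shape and window length are hypothetical; KDD99's real 41 features are defined in the Lee & Stolfo paper referenced below and must be reproduced exactly for the trained model to be usable.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch: compute simple sliding-window counts from observed network events.
public class ConnectionFeatures {
    static final long WINDOW_MS = 2_000; // 2-second window, as an example

    private final Deque<Long> connectionTimes = new ArrayDeque<>();
    private final Deque<Long> failedLoginTimes = new ArrayDeque<>();

    void onConnection(long timestampMs) { connectionTimes.addLast(timestampMs); }
    void onFailedLogin(long timestampMs) { failedLoginTimes.addLast(timestampMs); }

    // Two example features: connections and failed logins within the window.
    double[] features(long nowMs) {
        evict(connectionTimes, nowMs);
        evict(failedLoginTimes, nowMs);
        return new double[] { connectionTimes.size(), failedLoginTimes.size() };
    }

    private static void evict(Deque<Long> times, long nowMs) {
        while (!times.isEmpty() && nowMs - times.peekFirst() > WINDOW_MS) {
            times.removeFirst();
        }
    }
}
```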
I have answered your other question (http://stackoverflow.com/questions/7587657/building-intrusion-detection-system-but-from-where-to-begin) more thoroughly, but I'll repeat the key point here.
Read this article to learn more about how it (KDD99) is constructed
Lee, W. & Stolfo, S. J. (2000). A framework for constructing features and models for intrusion detection systems. ACM Trans. Inf. Syst. Secur., 3, 227-261.