predicting user actions while using GUI/application - java

I am using Java (but am open to solutions in other languages as well). I am looking at open source predictive modeling solutions for guessing what GUI/application features a user is interested in (I will have the specific user behavior data on the GUI/application). Instead of just looking at most used actions etc, should I possibly look at incorporating SVM or decision trees? I am looking at weka, mahout and jahmm - is there any other resource I can look at (specifically for GUI behavior - which hopefully returns results fast enough even if accuracy is reduced). Since I am not extremely knowledgeable about this field, please inquire about any information I may have left out to better ascertain a working solution. Thanks!

It's incredibly difficult to say given that we don't know what data you're using (I don't know of existing software to do this, but it may very well exist). With respect to support vector machines, they are binary or one-versus-all classifiers, so I don't think they would be applicable here, if I understand your intentions correctly.
If you're unfamiliar with machine learning, Weka may be a good place for you to start. If you have supervised data, then you can feed all of your feature vectors with associated classification data into Weka and use cross-validation to see what type of technique suits you best. Additionally, you can use Weka to see if certain features are more important than others and do manual dimensionality reduction. Or of course, you can use one of Weka's dimensionality reduction techniques, but it may be difficult to decide which one if you don't know the assumptions that they make or how your data is related (this also applies to whatever prediction technique you try/use). That said, if you have enough time, you can simply experiment and see what works best.
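If the data turns out to be a log of action sequences, one cheap baseline to compare any Weka model against is a first-order transition count: for each action, remember which action most often came next. The sketch below is only an illustration under that assumption; the class name NextActionPredictor and the action names are made up, and it is not taken from Weka, Mahout or jahmm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical baseline: counts which action tends to follow which. */
class NextActionPredictor {
    // transitions.get(prev).get(next) = how often `next` directly followed `prev`
    private final Map<String, Map<String, Integer>> transitions = new HashMap<>();

    /** Records one logged session, given as the ordered list of action names. */
    void observe(List<String> session) {
        for (int i = 0; i + 1 < session.size(); i++) {
            transitions
                .computeIfAbsent(session.get(i), k -> new HashMap<>())
                .merge(session.get(i + 1), 1, Integer::sum);
        }
    }

    /** Most frequent follower of `action`, or empty if nothing ever followed it. */
    Optional<String> predict(String action) {
        Map<String, Integer> followers = transitions.get(action);
        if (followers == null) {
            return Optional.empty();
        }
        return followers.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        NextActionPredictor p = new NextActionPredictor();
        p.observe(List.of("open", "edit", "save"));
        p.observe(List.of("open", "edit", "print"));
        p.observe(List.of("open", "search"));
        System.out.println(p.predict("open"));   // "edit" followed "open" twice, "search" once
        System.out.println(p.predict("export")); // never observed, so empty
    }
}
```

Lookups are two hash-map reads, so it answers quickly, which matters if the prediction has to keep up with the GUI. If a learned model cannot beat this on held-out sessions, the extra machinery is probably not paying for itself.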

Related

Java Grammar syntax analyzer (ASCII to graph)

I am developing an assistant to type database commands for DBAs, because these commands have many parameters, and an assistant will help a lot with their job. For this assistant, I need the grammar of the commands, but database vendors (Oracle, DB2) do not provide that information in any format; the only source is the documentation.
One example of a DB2 command is: http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.admin.cmd.doc/doc/r0001933.html
For this reason, I am trying to analyze the grammar diagram or railroad diagram (http://en.wikipedia.org/wiki/Syntax_diagram), but I have not found anything in Java that could help me. I would like some re-engineering (reverse) tool that takes the ASCII (textual representation) of the grammar, and creates a graph in Java. Then, with the graph in Java, the assistant could propose options of the current typed command.
One example of the assistant: http://www.youtube.com/watch?v=5sBoUHJupvs
If you have information about how to analyze grammar diagrams with Java (not generate) I will appreciate that information.
The closest tool I've seen is Grammar Recovery System by Ralf Lammel. It depends on accessibility of railroad diagrams as text strings. That's generally not how they are found. You appear to be lucky in the DB2 case; Ralf's work points in the right direction.
Considering that such diagrams are usually rendered as just a set of pixels (PLSQL's are like this in the PDF files provided for documentation), you have several sets of problems: recognizing graphical entities from pixels, assembling them into actual representations of the railroad diagrams, and then using those in your assistant.
I think this is a long, hard, impractical approach. If you got it to work, you'd discover the diagrams are slightly wrong in many places (read Ralf's paper or find out the hard way), and therefore unusable for a tool that is supposed to produce the "right" stuff to help your DBAs.
Of course, you are objecting to the other long, hard, "impractical" approach of reading the documentation and producing grammars that match, and then validating those grammars against the real world. Yes, this is a tough slog too, but it actually does produce useful results. You need to find vendors that have done this and will make it available to you.
ANTLR.org offers a variety of grammars. Have you checked there?
My company offers grammars and tools for processing them. We have done this for PLSQL and SQL2011 but not yet DB2.
Given a grammar, you now need to use it to provide "advice" to your users. Your users aren't going to type in a complete "program"; they want to generate fragments (e.g., SELECT statements). Now you need a parser that will process grammar fragments and at least say "legal" or "not". Most won't do that. Our DMS Software Reengineering Toolkit will do that.
To provide advice, you need to be able to walk the grammar (much as you considered for railroad diagrams) to compute "what is legal next". That's actually pretty hard (and in fact it is roughly equivalent to what an LR/GLR parser generator does when building tables). Our DMS engine does that during syntax error repair by traversing its GLR parse tables (since that work is already encoded in the tables!). That's not easy to do, as it is a peculiar variant of the GLR parsing algorithm. You might do better with an Earley parser, which keeps around all possible parses as a set of choices; you could simply inspect each one.
But this looks like quite a lot of work, and I think you'll be surprised by the amount of machinery you need.
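To make "what is legal next" concrete, here is a minimal sketch for the easy case only: a single railroad diagram with no nested or recursive parts, stored as a graph whose edges carry tokens. It is not how DMS, GLR or Earley parsers work, and the command shape in main is made up for illustration, not copied from the DB2 documentation. The class name SyntaxGraph and the <...> placeholder convention are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy syntax graph: states are ints, edges carry the token that moves between them. */
class SyntaxGraph {
    private static final class Edge {
        final int from;
        final String token;
        final int to;

        Edge(int from, String token, int to) {
            this.from = from;
            this.token = token;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    SyntaxGraph edge(int from, String token, int to) {
        edges.add(new Edge(from, token, to));
        return this;
    }

    /** Edge tokens written as <...> are placeholders that accept any typed word. */
    private static boolean matches(String edgeToken, String typed) {
        return edgeToken.startsWith("<") || edgeToken.equalsIgnoreCase(typed);
    }

    /** Follows the typed tokens from state 0 and returns what may legally come next. */
    Set<String> legalNext(List<String> typed) {
        Set<Integer> states = new LinkedHashSet<>();
        states.add(0);
        for (String word : typed) {
            Set<Integer> next = new LinkedHashSet<>();
            for (Edge e : edges) {
                if (states.contains(e.from) && matches(e.token, word)) {
                    next.add(e.to);
                }
            }
            states = next; // becomes empty if the input is not a legal prefix
        }
        Set<String> options = new LinkedHashSet<>();
        for (Edge e : edges) {
            if (states.contains(e.from)) {
                options.add(e.token);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        // Made-up command shape: BACKUP DATABASE <name> [ONLINE] TO <dir>
        SyntaxGraph g = new SyntaxGraph()
            .edge(0, "BACKUP", 1)
            .edge(1, "DATABASE", 2)
            .edge(2, "<name>", 3)
            .edge(3, "ONLINE", 4)
            .edge(3, "TO", 5)
            .edge(4, "TO", 5)
            .edge(5, "<dir>", 6);
        System.out.println(g.legalNext(List.of("BACKUP", "DATABASE", "sales"))); // [ONLINE, TO]
    }
}
```

Tracking a set of states rather than a single state lets the same code cope with diagrams where two branches start with the same token. As soon as one diagram refers to another, or to itself, a flat graph like this stops being enough and you are back to the parser machinery described above.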
The best work in this area is Harmonia, which produces incremental editors for code. Our DMS engine's parser is based on earlier work done by this project, because we are interested in the incrementality aspect.
You can try using ANTLR http://www.antlr.org/
It will not be able to understand an ASCII representation of the grammar, but it is powerful enough to do anything else you need, if you don't mind spending the time to learn the software.

Designing APIs in Java with top-down approach - Is writing up the Javadoc the best starting point?

Whenever I have the need to design an API in Java, I normally start off by opening up my IDE, and creating the packages, classes and interfaces. The method implementations are all dummy, but the javadocs are detailed.
Is this the best way to go about things? I am beginning to feel that the API documentation should be the first to be churned out - even before the first .java file is written up. This has a few advantages:
The API designer can complete the design & specification and then split up the implementation among several implementors.
More flexible - change in design does not require one to bounce around among java files looking for the place to edit the javadoc comment.
Are there others who share this opinion? And if so, how do you go about starting off with the API design?
Further, are there any tools out there which might help? Probably even some sort of annotation-based tool which generates documentation and then the skeleton source (kind of like model-to-code generators)? I came across Eclipse PDE API tooling - but this is specific to Eclipse plugin projects. I did not find anything more generic.
For an API (and for many types of problems IMO), a top-down approach for problem partitioning and analysis is the way to go.
However (and this is just my 2c based on my own personal experience, so take it with a grain of salt), focusing on the Javadoc part of it is a good thing to do, but that is still not sufficient, and cannot reliably be the starting point. In fact, that is very implementation oriented. So what happened to the design, the modeling and reasoning that should take place before that (however brief that might be)?
You have to do some sort of modeling to identify the entities (the nouns, roles and verbs) that make up your API. And no matter how "agile" one would like to be, such things cannot be prototyped without having a clear picture of the problem statement (even if it is just a 10K foot view of it.)
The best starting point is to specify what you are trying to implement, or more precisely, what type of problems your API is trying to address. BDD might be of help (more of that below). That is, what is it that your API will provide (datum elements), and to whom, performing what actions (the verbs) and under what conditions (the context). That leads then to identify what entities provide these things and under what roles (interfaces, specifically interfaces with a single, clear role or function, not as catch-all bags of methods). That leads to an analysis on how they are orchestrated together (inheritance, composition, delegation, etc.)
Once you have that, then you might be in a good position to start doing some preliminary Javadoc. Then you can start working on the implementation of those interfaces, of those roles. More Javadoc follows (in addition to other documentation that might not fall within Javadoc, i.e., tutorials, how-tos, etc.)
You start your implementation with use cases and verifiable requirements and behavioral descriptions of what each thing should do alone or in collaboration. BDD would be extremely helpful here.
As you go, you continuously refactor, ideally guided by some metrics (cyclomatic complexity and some variant of LCOM). These two tell you where you should refactor.
A development of an API should not be inherently different from the development of an application. After all, an API is a utilitarian application for a user (who happens to have a development role.)
As a result, you should not treat API engineering any differently from general software-intensive application engineering. Use the same practices, tune them according to your needs (which everyone who works with software should), and you'll do fine.
Google has been uploading its "Google Tech Talk" video lecture series on YouTube for quite some time. One of them is an hour-long lecture titled "How To Design A Good API and Why it Matters". You might want to check it out also.
Some links for you that might help:
Google Tech Talk's "Beyond Test Driven Development: Behaviour Driven Development" : http://www.youtube.com/watch?v=XOkHh8zF33o
Behavior Driven Development : http://behaviour-driven.org/
Website Companion to the book "Practical API Design" : http://wiki.apidesign.org/wiki/Main_Page
Going back to the Basics - Structured Design#Cohesion and Coupling : http://en.wikipedia.org/wiki/Structured_Design#Structured_Design
Defining the interface first is the programming-by-contract style of declaring preconditions, postconditions and invariants. I find it combines well with Test-Driven-Development (TDD), because the invariants and postconditions you write first are the behaviours that your tests can check for.
As an aside, the Behaviour-Driven-Development elaboration of TDD seems to have come about because of programmers who did not habitually think of the interface first.
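As a small illustration of writing the contract first and letting the tests check it, here is a sketch. The BoundedStack interface is invented for the example; the point is only that each precondition, postcondition and invariant in the Javadoc turns directly into a check that any later implementation has to pass.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Written first: the contract, with no implementation in sight. */
interface BoundedStack<T> {
    /**
     * Precondition: size() < capacity(), otherwise IllegalStateException.
     * Postcondition: size() grows by one and peek() returns item.
     */
    void push(T item);

    /** Precondition: size() > 0, otherwise IllegalStateException. */
    T peek();

    /** Invariant: 0 <= size() <= capacity(). */
    int size();

    int capacity();
}

/** Written second: one possible implementation of the contract. */
class ArrayBoundedStack<T> implements BoundedStack<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    ArrayBoundedStack(int capacity) {
        this.capacity = capacity;
    }

    @Override
    public void push(T item) {
        if (items.size() >= capacity) {
            throw new IllegalStateException("stack is full");
        }
        items.push(item);
    }

    @Override
    public T peek() {
        if (items.isEmpty()) {
            throw new IllegalStateException("stack is empty");
        }
        return items.peek();
    }

    @Override
    public int size() {
        return items.size();
    }

    @Override
    public int capacity() {
        return capacity;
    }
}

class BoundedStackContractCheck {
    /** The postcondition and invariant from the Javadoc, restated as executable checks. */
    static void check(BoundedStack<String> stack) {
        int before = stack.size();
        stack.push("x");
        if (stack.size() != before + 1) {
            throw new AssertionError("push must grow size by one");
        }
        if (!"x".equals(stack.peek())) {
            throw new AssertionError("peek must return the last pushed item");
        }
        if (stack.size() < 0 || stack.size() > stack.capacity()) {
            throw new AssertionError("size must stay within 0..capacity");
        }
    }

    public static void main(String[] args) {
        check(new ArrayBoundedStack<>(2));
        System.out.println("contract holds");
    }
}
```

Because check only sees the interface, the same method can be pointed at every implementation that the different implementors hand back.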
As for myself, I always prefer starting with writing the interfaces along with their documentation and only then start with the implementation.
In the past I took another approach which was starting with the UML and then using the automatic code generation.
The best tool I encountered for this matter was Rational Rose which is not free but I'm sure there are plenty of free plugins and utils.
The advantage of Rational Rose over other designers I bumped into was that you can "attach" the design to your code and then modify on either code or design and the other will update.
I jump right in with the coding with a prototype. Any required interfaces soon pop out at you and you can mould your proto into a final product. Get feedback along the way from whomever is going to be using your API if you can.
There is no 'best way' of approaching API design; do whatever works for you. Domain knowledge also has a large part to play.
I'm a great fan of programming to the interface. It forms a contract between the implementors and the users of your code.
Rather than diving straight into code, I usually start with a basic model of my system (UML diagrams etc, depending on the complexity). Not only does this serve as good documentation, it provides a visual clarification of the system structure. Having this makes the coding part much easier to do. This kind of design documentation also makes it easier to understand the system when you come back to it in 6 months, or try to fix bugs :)
Prototyping also has its merits, but be prepared to throw it away and start again.

Developing a game with complex (and undefined) rules?

How would you develop a game that could end up with complex rules but you need to work on the rules (add, tweak, balance) a lot? I have looked at rule-based languages but I haven't found enough useful information about them.
UI etc. will be developed later, first I'd need to iteratively develop the rules and formulas and test them out between iterations. The game in question would be a tactical two-player game where players select "troops" and a large part of the game is choosing the correct troop setup. So the rules could be something like
If attacker's skill A is greater than defender's skill B and defender does not have extra skill Z then ...
That's obviously a very simple rule, I expect there to be dozens, if not hundreds of rules, with paths (if A then if B....).
For testing I would write a test framework that can run the rule sets through a large number of iterations, with logging that allows me to see how the latest changes affected balance. It would also be useful to be able to define acceptable values and to see the changes easily. What tools are there for this?
The language of choice is either Python or Java (depending on whether I want to target Android or not - probably I will).
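For reference, the kind of harness described in the question can be sketched in plain Java by keeping each rule as data (a name, a condition and an effect) and replaying many simulated attacks with a fixed random seed. Everything here is an assumption made for illustration: the Troop fields, the 0.5 base hit chance, the bonus values and the 0.4 to 0.6 "acceptable" band.

```java
import java.util.List;
import java.util.Random;
import java.util.function.BiPredicate;

class BalanceHarness {
    /** Hypothetical troop model: two numeric skills and one optional extra skill. */
    static final class Troop {
        final String name;
        final int skillA;
        final int skillB;
        final boolean hasZ;

        Troop(String name, int skillA, int skillB, boolean hasZ) {
            this.name = name;
            this.skillA = skillA;
            this.skillB = skillB;
            this.hasZ = hasZ;
        }
    }

    /** A rule is data: a name, a condition on (attacker, defender), and a hit-chance bonus. */
    static final class Rule {
        final String name;
        final BiPredicate<Troop, Troop> applies;
        final double bonus;

        Rule(String name, BiPredicate<Troop, Troop> applies, double bonus) {
            this.name = name;
            this.applies = applies;
            this.bonus = bonus;
        }
    }

    /** Fraction of simulated attacks that hit; a fixed seed keeps runs comparable between tweaks. */
    static double hitRate(Troop attacker, Troop defender, List<Rule> rules, int trials, long seed) {
        double chance = 0.5; // assumed base chance
        for (Rule r : rules) {
            if (r.applies.test(attacker, defender)) {
                chance += r.bonus;
            }
        }
        chance = Math.max(0.0, Math.min(1.0, chance));
        Random rng = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            if (rng.nextDouble() < chance) {
                hits++;
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("A beats B, no Z", (att, def) -> att.skillA > def.skillB && !def.hasZ, 0.2));
        Troop knight = new Troop("knight", 7, 3, false);
        Troop archer = new Troop("archer", 4, 5, true);
        for (Troop[] pair : new Troop[][] {{knight, archer}, {archer, knight}}) {
            double rate = hitRate(pair[0], pair[1], rules, 100_000, 42L);
            // Assumed acceptable band; outside it the pairing is flagged for review.
            String verdict = (rate >= 0.4 && rate <= 0.6) ? "ok" : "REVIEW";
            System.out.printf("%s -> %s: %.3f %s%n", pair[0].name, pair[1].name, rate, verdict);
        }
    }
}
```

Since rules live in a list, a tweak is a one-line change, and logging the per-pairing rates before and after gives the balance diff asked about. Whether lambdas stay manageable at hundreds of interacting rules is a separate question.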
Whatever you do, in this sort of program you'll end up implementing half of Prolog anyway. Check out the forward-chaining, backward-chaining, and backtracking algorithms.
A pure-Java version of Prolog called Jekejeke was recently released. I can't comment on its quality.
Python makes it much easier than Java to implement backtracking using generators and yield statements.
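To give an idea of what forward chaining amounts to, here is a minimal sketch in Java: rules fire whenever all their premises are known facts, and the loop repeats until nothing new is derived. It handles plain string facts only, with no variables or unification, so it is a long way from Prolog; the fact names are made up.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ForwardChainer {
    /** If every premise is a known fact, the conclusion becomes a fact too. */
    static final class Rule {
        final String conclusion;
        final Set<String> premises;

        Rule(String conclusion, String... premises) {
            this.conclusion = conclusion;
            this.premises = new HashSet<>(Arrays.asList(premises));
        }
    }

    /** Fires rules repeatedly until a full pass adds nothing new (a fixpoint). */
    static Set<String> infer(Set<String> initialFacts, List<Rule> rules) {
        Set<String> facts = new LinkedHashSet<>(initialFacts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules) {
                // Set.add returns true only when the conclusion was not known yet.
                if (facts.containsAll(r.premises) && facts.add(r.conclusion)) {
                    changed = true;
                }
            }
        }
        return facts;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("defender_retreats", "attack_hits", "defender_morale_low"),
            new Rule("attack_hits", "attacker_A_exceeds_B", "defender_lacks_Z"));
        Set<String> facts = infer(
            Set.of("attacker_A_exceeds_B", "defender_lacks_Z", "defender_morale_low"), rules);
        System.out.println(facts.contains("defender_retreats")); // true, derived on the second pass
    }
}
```

The order of the rules does not change the result, only the number of passes, which is part of what makes this style convenient for rules that get added and tweaked a lot.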
As with larsmans I'm going to suggest that you use Prolog for your rules development. It's just the best language in semi-common use for doing exactly the kind of thing you want to do. I will, however, instead recommend tuProlog as looking like the ideal environment for your needs given that you want to write the game in Java. TuProlog is intended as an embeddable Prolog environment (can be embedded in Java or in .NET) with very finely-tuned library inclusion so you don't have to carry around a whole, bloated Prolog world with you when you're using just small parts of it.
Here's the blurb from the web site:
tuProlog is a light-weight Prolog system for distributed applications and infrastructures, intentionally designed around a minimal core (containing only the most essential properties of a Prolog engine), to be later configured by (statically and dynamically) loading/unloading libraries of predicates. tuProlog also natively supports multi-paradigm programming, providing a clean, seamless integration model between Prolog and mainstream object-oriented languages -- namely Java, for tuProlog Java version, and any .NET-based language (C#, F#..), for tuProlog .NET version. It is also easily deployable, just requiring the presence of a Java/CLR virtual machine and an invocation upon a single self-contained archive file. Interoperability is further developed along the two main lines of Internet standard patterns and coordination models.
It's an interesting question, though a bit broad and generic. I'll try to answer it according to my interpretation of what you're asking ;)
Your main concern seems to be about being able to express the rules in a concise, readable way that would allow you to always keep the definition clear and change it rapidly. Since you say the rules are probably going to get complex, I believe your best bet would be to write a DSL for them.
Try writing a grammar that would be sufficient for describing your rules and then see how to plug it into your game. ANTLR could be very helpful there, especially because it supports both Java and Python.
Try looking at the A.I. of F.E.A.R. There, an FSM (finite state machine) was combined with A* pathfinding: instead of finding a path through terrain, the engine found chains of goals, generating "intelligent" agent behaviour on the fly. Maybe there is something inspiring in there for you.

Experiences with "language converters"?

I have read a few articles mentioning converters from one language to another.
I'm more than a bit skeptical about the use of such tools. Does anyone know of, or have experience with, say, Visual Basic to Java or vs converters? Just one example to pick:
http://www.tvobjects.com/products/products.html claims to be the "world leader" or so in that aspect. However, if you read this:
http://dev.mysql.com/tech-resources/articles/active-grid.html
There the author states:
"The consensus of MySQL users is that automated conversion tools for MS Access do not work. For example, tools that translate existing Access applications to Java often result in 80% complete solutions where finishing the last 20% of the work takes longer than starting from scratch."
Well we know we need 80% of the time to implement the first 80% functionality and another 80% of the time for the other 20 %....
So has anyone tried such tools and found them to be worthwhile?
Tried? No, actually built (more than one) language converter.
Here's one I (and my coworkers) built for the B2 Spirit Stealth Bomber to convert the mission software, coded in a legacy language, JOVIAL, into maintainable C code, with 100% automated conversion. One of the requirements was that we were NOT allowed to see the actual source code. No joke.
You are right: if you get only a medium high conversion rate (e.g., 70-80%), the effort to finish the conversion is still very significant if indeed you can do it at all. We target 95%+ and do better when told to try harder as was the case for the B2. The only reason people accept medium high rate converters is because they can't find (or won't fund!) a better one, insist on starting now, and accept the fact that converting it this way may be painful (usually they don't know how much) but is in fact less painful than rebuilding it from scratch. (I happen to agree with this assessment: in general, projects that try to recode a large system from scratch usually fail and conversions using medium high conversion rate tools don't have as high a failure rate.)
There are lots of bad conversion tools out there, something slapped together with a mountain of PERL code doing regexes on text strings, or some YACC-based parser with code generation essentially one-to-one for each statement in the compilation unit. The former are built by people who had a conversion dropped on them out of the sky. The latter are often built by well-intentioned engineers that don't have decent compiler background.
For a singularly bad example, see my response to this SO question about COBOL migration: Experience migrating legacy Cobol/PL1 to Java, which is exactly a direct statement translator... producing the stuff that gave rise to the term "JOBOL".
To get such high-accuracy conversion rates, you need high-quality parsers, and means to build high-quality translation rules that preserve semantics, and optimize for target-language properties and special cases. In essence, you need what amounts to configurable compiler technology. The reason we succeed, IMHO, is our DMS Software Reengineering Toolkit, which was designed to do this job. (I'm the architect; check out my SO icon/bio).
Lots of careful testing helps, too.
DMS "knows" what the compiler knows about code, by virtue of having a compiler-like front end for the language of interest, and having the ability to build ASTs, symbol tables, control and data flows, call graphs. It uses much of the compiler technology that the compiler community spent the last half-century inventing, because that stuff has been proven to be useful in translation!
DMS knows more than most compilers know, because it can read/analyze/transform the entire application at once; most compilers stick to single compilation units. Thus one can code translation rules that depend on the entire application as opposed to just the current statement. We often add problem- or application-specific knowledge to improve the translation. This often shows up when converting special features of a language, or calls on libraries, where one must recognize the library calls as special idioms, and translate them to calls on compositions of target libraries and language constructs.
This capability is used to build translators (e.g., the JOVIAL translator), or domain-specific code generators.
More often we build complex automated software engineering tools that solve problems specific to customers, such as program analysis tools (dead code, duplicate code, style-broken code, metrics, architecture extraction, ...), and mass change tools (platform [not language] migrations, data layer insertion, API replacement, ...)
It seems to me, as is almost always the case with MS-ACCESS questions having tags that attract the wider StackOverflow population, that the people answering are missing the key question here, which I read as:
Are there any tools that can successfully convert an Access application to any other platform?
And the answer is
ABSOLUTELY NOT
The reason for that is simply that tools in the same family that use similar models for the UI objects (e.g., VB6) lack so many things that Access provides by default (how do you convert an Access continuous subform to VB6 and not lose functionality?). And other platforms don't even share the same core model as VB6 and Access, so those have even more hurdles to clear.
The cited MySQL article is quite interesting, but it really confuses the problems that come with incompetently-developed apps vs. the problems that come with the development tools being used. A bad data schema is not inherent to Access -- it's inherent to [most] novice database users. But the article seems to attribute this problem to Access.
And entirely overlooks the possibility of fixing the schema, upsizing it to MySQL and keeping the front end in Access, which is by far the easiest approach to the problem.
This is exactly what I expect from people who just don't get Access -- they don't even consider that Access as front end to a securable, large-capacity server database engine can be a superior solution to the problem.
That article doesn't even really consider conversion of an Access app, and there's good reason for that. All the tools that I've seen that claim to convert Access applications (to whatever platform) either convert nothing but data (in which case they don't convert the app at all -- morons!), or convert the front end structure slavishly, with a 1:1 correspondence between UI objects in the Access application and in the target app.
This doesn't work.
Access's application design is specific to itself, and other platforms don't support the same set of features. Thus, there has to be translation of Access features into a working substitute for the original feature in the converted application. This is not something that can be done in an automated fashion, in my opinion.
Secondly, when contemplating converting an Access app for deployment in the web browser, the whole application model is different, i.e., from stateful to stateless, and so it's not just a matter of a few Access features that are unsupported, but of a completely different fundamental model of how the UI objects interact with the data. Perhaps a 100% unbound Access app could relatively easily be converted to a browser-based implementation, but how many of those are there? It would mean an Access app that uses no subforms whatsoever (since they can't be unbound), and an app that uses only a handful of events from the rich event model (most of which work only with bound forms/controls). In short, a 100% unbound Access app would be one that fights against the whole Access development paradigm. Anyone who thinks they want to build an unbound app in Access really shouldn't be using Access in the first place, as the whole point of Access is the bound forms/controls! If you eliminate that, you've thrown out the majority of Access's RAD advantage over other development platforms, and gained almost nothing in return (other than enormous code complexity).
To build an app for deployment in the web browser that accomplishes the same tasks as an Access application requires from-the-ground-up redesign of the application UI and workflow. There is no conversion or translation that will work because the successful Access application model is antithetical to the successful web application model.
Of course, all of this changes with Access 2010 and Sharepoint Server 2010 with Access Services. In that case, you can build your app in Access (using web objects) and deploy on Sharepoint for users to run it in the browser. The results are functionally 100% equivalent (and 90% visually), and run on all browsers (no IE-specific dependencies here).
So, starting this June, the cheapest way to convert an Access app for deployment in the browser may very well be to upgrade to A2010, convert the design to use all web objects, and then deploy with Sharepoint. That's not a trivial project, as Access web objects have a limited set of features in comparison to client objects (and no VBA, for instance, so you have to learn the new macros, which are much more powerful and safe than the old ones, so that's not the terrible hardship it may seem for those familiar with Access's legacy macros), but it would likely be much less work than a full-scale redesign for deployment on the web.
The other thing is that it won't require any retraining for end users (insofar as the web-object version is the same as the original client version), as it will be the same in the Access client as in the web browser.
So, in short, I'd say conversion is a chimera, and almost always not worth the effort. I'm agreeing with the cited sentiment, in fact (even if I have a lot of problems with the other comments from that source). But I'd also caution that the desire for conversion is often misguided and misses out on cheaper, easier and better solutions that don't require wholesale replacement of the Access app from top to bottom. Very often the dissatisfaction with Jet/ACE as data store confuses people into thinking they have to replace the Access application as well. And it's true that many user-developed Access apps are filled with terrible, unmaintainable compromises and are held together with chewing gum and baling wire. But a badly-designed Access application can be improved in conjunction with the back-end upsizing and revision of the data schema -- it doesn't have to be discarded.
That doesn't mean it's easy -- it's very often not. As I tell clients all the time, it's usually easier to build a new house than to remodel an old one. But one of the reasons we remodel old houses is because they have irreplaceable characteristics that we don't want to lose. It's very often the case that an Access app implicitly includes a lot of business rules and modelling of workflows that should not be lost in a new app (the old Netscape conundrum, pace Joel Spolsky). These things may not be obvious to the outside developer trying to port to a different platform, but for the end user, if the app produces results that are off by a penny in comparison to the old app, they'll be unhappy (and probably should be, since it may mean that other aspects of the app are not producing reliable results, either).
Anyway, I've rambled on for too long, but my opinion is that conversion never works except for the most trivial apps (or for ones that were designed to be converted, e.g., a 100% unbound Access app). I'm all for revision in place of replacement.
But, of course, that's how I make my living, i.e., fixing Access apps.
A couple of issues that affect the success or failure of cross-language conversion are the relative semantic richness of the languages, and their semantic models.
Translation from C++ to C should be relatively easy, but translation of C to idiomatic C++ would be next to impossible because it is next to impossible to automatically turn a procedural program into an OO program.
Translation of Java to C would be relatively simple, though handling storage management would be messy. Translation of C into Java would be next to impossible if the C program did funky pointer arithmetic or casting between integers and different kinds of pointer.
Translation of a functional language to an imperative language would be much easier, though the result would probably be inefficient and non-idiomatic. Translation of an imperative language to a functional language is probably beyond the state of the art ... unless you implement an interpreter for the imperative language in the functional language.
What this means is that some translators are necessarily going to be more successful than others in terms of:
completeness and accuracy of translation, and
readability and maintainability of the resulting code.
Things You Should Never Do, Part I by Joel Spolsky
"....They did it by making the single worst strategic mistake that any software company can make:
They decided to rewrite the code from scratch."
I have a list of MS Access converters on my website. I've never heard anything good about any of them in any postings in the Access related newsgroups I read on a daily basis. And I read a lot of postings on a daily basis.
Also note that there is a significant amount of functionality in Access, such as bound continuous forms or subforms, that is more work to reproduce in other systems. Not necessarily a lot of work but more work. And more troubles when it comes time to distribute and install the app.
I've used an automated converter from C# to Visual Basic.NET. It worked pretty well except for adding some unnecessary If True statements.
I've also attempted to use Shed Skin to convert Python-to-C++, but it didn't work because of its lack of support for new-style division.
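I don't know exactly what failed inside Shed Skin, but division is a good example of why a statement-by-statement translation is not enough: the same `/` means different things in different languages. The sketch below shows it from the Java side (integer division in C++ truncates toward zero in the same way). The helper names pythonTrueDiv and pythonFloorDiv are made up for the illustration.

```java
class DivisionSemantics {
    /** What Python's new-style `a / b` computes for ints: true division, giving a float. */
    static double pythonTrueDiv(int a, int b) {
        return (double) a / b;
    }

    /** What Python's `a // b` computes for ints: rounds toward negative infinity. */
    static int pythonFloorDiv(int a, int b) {
        return Math.floorDiv(a, b);
    }

    public static void main(String[] args) {
        System.out.println(7 / 2);                 // 3    Java int division truncates
        System.out.println(pythonTrueDiv(7, 2));   // 3.5  what new-style `7 / 2` means
        System.out.println(-7 / 2);                // -3   truncation toward zero
        System.out.println(pythonFloorDiv(-7, 2)); // -4   what `-7 // 2` means
    }
}
```

A converter that copies the operator across gets the first and third lines; a correct one has to know the operand types and emit something like the second and fourth.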
I've used tools for converting a VB6 Project into VB.Net - which you would hope would be perhaps one of the simpler examples of this sort of thing. My experience was that everything had to be checked, in fine detail, and half the stuff was missing / wrong.
Certainly I would recommend a migration by hand, or depending on the language you're targeting, I would consider a complete rewrite if this gives you a chance to make major improvements to your codebase.
Martin
I have only tried free and basic paid for converters. But the main problem is that it is very very hard to have confidence that the conversion is entirely successful.
Usually they are best used to hand-convert code a section at a time, where you review each piece of code. Often in my experience a rewrite instead of a conversion turns out to be a better option.

Machine learning challenge: diagnosing program in java/groovy (datamining, machine learning)

I'm planning to develop a program in Java which will provide a diagnosis. The data set is divided into two parts, one for training and the other for testing. My program should learn to classify from the training data and then make predictions on the testing part. (The data contains answers to 30 questions, each in its own column, with one record per line; the last column is the diagnosis, 0 or 1. In the testing part the diagnosis column is empty. The data set contains about 1000 records.)
I've never done anything similar so I'll appreciate any advice or information about solution to similar problem.
I was thinking about Java Machine Learning Library or Java Data Mining Package but I'm not sure if it's right direction... ? and I'm still not sure how to tackle this challenge...
Please advise.
All the best!
I strongly recommend you use Weka for your task.
It's a collection of machine learning algorithms with a user-friendly front end that facilitates many different kinds of feature and model selection strategies.
You can do a lot of really complicated stuff with it without really having to do any coding or math.
The makers have also published a pretty good textbook that explains the practical aspects of data mining.
Once you get the hang of it, you can use its API to integrate any of its classifiers into your own Java programs.
Hi. As Gann Bierner said, this is a classification problem. The best classification algorithm for your needs that I know of is Ross Quinlan's decision tree algorithm (ID3, or its successor C4.5). It's conceptually very easy to understand.
For off-the-shelf implementations of classification algorithms, the best bet is Weka: http://www.cs.waikato.ac.nz/ml/weka/. I have studied Weka but not used it, as I discovered it a little too late.
I used a much simpler implementation called jaDTi. It works pretty well for smaller data sets such as yours. I have used it quite a bit, so I can say that with confidence. jaDTi can be found at:
http://www.run.montefiore.ulg.ac.be/~francois/software/jaDTi/
Having said all that, your challenge will be building a usable interface over the web. For that, the data set on its own will be of limited use: working from it directly assumes that you already have the training set, feed in the new test data in one step, and get the answer(s) immediately.
But my application, and probably yours too, was a step-by-step user discovery process, with features to go back and forth between the decision tree nodes.
To build such an application, I created a PMML document from my training set and built a Java engine that traverses each node of the tree, asking the user for an input (text/radio/list) and using the values as inputs to the next possible node predicate.
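A minimal sketch of that kind of traversal, assuming a hand-built in-memory tree rather than a parsed PMML document. The `TreeNode` class, its factory methods, and the question names are hypothetical and are not part of PMML or jaDTi; a real engine would prompt the user at each node instead of reading from a prepared map of answers.

```java
import java.util.Map;

/** Hypothetical in-memory decision tree node; not the PMML or jaDTi API. */
final class TreeNode {
    final String question;                 // question asked at this node; null for a leaf
    final String diagnosis;                // result stored at a leaf; null for an inner node
    final Map<String, TreeNode> children;  // answer value -> next node

    private TreeNode(String question, String diagnosis, Map<String, TreeNode> children) {
        this.question = question;
        this.diagnosis = diagnosis;
        this.children = children;
    }

    static TreeNode leaf(String diagnosis) {
        return new TreeNode(null, diagnosis, Map.of());
    }

    static TreeNode ask(String question, Map<String, TreeNode> children) {
        return new TreeNode(question, null, children);
    }

    boolean isLeaf() {
        return question == null;
    }

    /** Walks from this node down to a leaf, following the supplied answers. */
    String classify(Map<String, String> answers) {
        TreeNode node = this;
        while (!node.isLeaf()) {
            String answer = answers.get(node.question);
            // Guard against null first: immutable maps reject null keys.
            TreeNode next = (answer == null) ? null : node.children.get(answer);
            if (next == null) {
                throw new IllegalArgumentException(
                        "No branch for answer '" + answer + "' to question '" + node.question + "'");
            }
            node = next;
        }
        return node.diagnosis;
    }
}
```

Keeping the current node in a variable, as `classify` does, is also what makes going back and forth possible: an interactive version could push each visited node onto a stack and pop it when the user steps back.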
The PMML standard can be found here: http://www.dmg.org/ Here you need the TreeModel only. NetBeans XML Plugin is a good schema-aware editor for PMML authoring. Altova XML can do a better job, but costs $$.
It is also possible to use an RDBMS to store your dataset and create the PMML automagically! I have not tried that.
Good luck with your project, and please feel free to let me know if you need further input.
There are various algorithms that fall into the category of "machine learning", and which is right for your situation depends on the type of data you're dealing with.
If your data essentially consists of mappings from a set of questions to a set of diagnoses, each of which can be yes/no, then I think methods that could work include neural networks and methods for automatically building a decision tree from the training data.
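As an illustration of how a tree can be built automatically, one common criterion for choosing which question to split on is information gain, which is what Quinlan's ID3 uses. The sketch below computes it for a single yes/no question against 0/1 diagnoses. The `InfoGain` class and its method names are made up for this example, and it assumes binary answers; a tree builder would evaluate every question this way and split on the one with the highest gain.

```java
/** Hypothetical helper: information gain of one yes/no question for 0/1 labels. */
final class InfoGain {

    private static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Shannon entropy, in bits, of a group with the given number of positive labels. */
    static double entropy(int positives, int total) {
        if (total == 0 || positives == 0 || positives == total) {
            return 0.0; // empty or pure group carries no uncertainty
        }
        double p = (double) positives / total;
        double q = 1.0 - p;
        return -(p * log2(p) + q * log2(q));
    }

    /** Entropy of all labels minus the weighted entropy after splitting on the answers. */
    static double gain(boolean[] answers, int[] labels) {
        int n = labels.length;
        if (n == 0) {
            return 0.0;
        }
        int yes = 0, yesPositives = 0, positives = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i] == 1) positives++;
            if (answers[i]) {
                yes++;
                if (labels[i] == 1) yesPositives++;
            }
        }
        int no = n - yes;
        int noPositives = positives - yesPositives;
        double remainder = ((double) yes / n) * entropy(yesPositives, yes)
                         + ((double) no / n) * entropy(noPositives, no);
        return entropy(positives, n) - remainder;
    }
}
```

A question whose answers separate the diagnoses perfectly yields the full entropy of the labels as its gain, while a question unrelated to the diagnosis yields a gain near zero.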
I'd have a look at some of the standard texts such as Russell & Norvig ("Artificial Intelligence: A Modern Approach") and other introductions to AI/machine learning and see if you can easily adapt the algorithms they mention to your particular data. See also O'Reilly's "Programming Collective Intelligence" for some sample Python code of one or two algorithms that might be adaptable to your case.
If you can read Spanish, the Mexican publishing house Alfaomega have also published various good AI-related introductions in recent years.
This is a classification problem, not really data mining. The general approach is to extract features from each data instance and let the classification algorithm learn a model from the features and the outcome (which for you is 0 or 1). Presumably each of your 30 questions would be its own feature.
There are many classification techniques you can use. Support vector machines are popular, as is maximum entropy. I haven't used the Java Machine Learning Library, but at a glance I don't see either of these. The OpenNLP project has a maximum entropy implementation. LibSVM has a support vector machine implementation. You'll almost certainly have to convert your data into something that the library can understand.
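As a rough sketch of that conversion step, the code below turns one record into LibSVM's sparse text format (`label index:value ...`, with feature indices starting at 1). It assumes the records are comma-separated with numeric answers and the diagnosis in the last column, which the question does not actually specify; `RecordConverter` and `toLibSvmLine` are hypothetical names. For test records with an empty diagnosis it writes a placeholder label of 0, since LibSVM's prediction tool only uses test labels to report accuracy.

```java
/** Hypothetical converter from "a1,a2,...,aN,label" to LibSVM's sparse line format. */
final class RecordConverter {

    static String toLibSvmLine(String csvLine) {
        // Limit -1 keeps the trailing empty column of unlabeled test records.
        String[] cols = csvLine.split(",", -1);
        int last = cols.length - 1;
        String label = cols[last].trim();

        // Assumption: unlabeled records get a placeholder label of 0.
        StringBuilder line = new StringBuilder(label.isEmpty() ? "0" : label);

        for (int i = 0; i < last; i++) {
            double value = Double.parseDouble(cols[i].trim());
            if (value != 0.0) { // sparse format: zero-valued features may be omitted
                line.append(' ').append(i + 1).append(':').append(value);
            }
        }
        return line.toString();
    }
}
```

With this sketch, a record such as `1,0,2,1` becomes `1 1:1.0 3:2.0`: the label first, then only the non-zero answers with their 1-based column positions.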
Good luck!
Update: I agree with the other commenter that Russell and Norvig is a great AI book which discusses some of this. Bishop's "Pattern Recognition and Machine Learning" discusses classification issues in depth if you're interested in the down and dirty details.
Your task is a classic one for neural networks, which are intended first of all to solve exactly this kind of classification task. A neural network is fairly simple to implement in any language, and it is the "mainstream" of machine learning, closer to AI than anything else.
You just implement (or get an existing implementation of) a standard neural network, for example a multilayer network trained by error back propagation, and feed it the training examples in a loop. After enough training, it will work on real examples.
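To give a feel for the training loop, here is a deliberately reduced sketch: a single sigmoid neuron trained by gradient descent, not a multilayer network. It shows the same "present the examples repeatedly and nudge the weights against the error" idea that back propagation applies layer by layer. The `SigmoidNeuron` class and the epoch count and learning rate in the usage note are illustrative assumptions, not tuned values.

```java
/** Hypothetical single sigmoid neuron; a simplification, not a multilayer network. */
final class SigmoidNeuron {
    private final double[] weights;
    private double bias;

    SigmoidNeuron(int inputs) {
        this.weights = new double[inputs]; // all weights start at zero
    }

    /** Estimated probability that the input belongs to class 1. */
    double predict(double[] x) {
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-sum));
    }

    /** Presents every example once per epoch and adjusts the weights after each one. */
    void train(double[][] xs, int[] ys, int epochs, double learningRate) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int n = 0; n < xs.length; n++) {
                // Gradient of the cross-entropy loss with respect to the weighted sum.
                double error = predict(xs[n]) - ys[n];
                for (int i = 0; i < weights.length; i++) {
                    weights[i] -= learningRate * error * xs[n][i];
                }
                bias -= learningRate * error;
            }
        }
    }
}
```

For example, `new SigmoidNeuron(2)` trained with `train(xs, ys, 2000, 0.5)` on four records where the diagnosis is 1 only when both answers are 1 learns to separate them. A single neuron can only learn linearly separable rules, which is the reason real networks add hidden layers.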
You can read more about neural networks starting from here:
http://en.wikipedia.org/wiki/Neural_network
http://en.wikipedia.org/wiki/Artificial_neural_network
Also you can get links to many ready implementations here:
http://en.wikipedia.org/wiki/Neural_network_software
