I'm interested in building a Texas Hold 'Em AI engine in Java. This is a long-term project, one in which I plan to invest at least two years. I'm still in college, haven't built anything ambitious yet, and want to tackle a problem that will hold my interest in the long term. I'm new to the field of AI. From my data structures class at college, I know basic building blocks like BFS and DFS, backtracking, DP, trees, graphs, etc. I'm learning regex, studying for the SCJP and the SCJD, and I'll shortly take a (dense) statistics course.
Questions:
Where do I get started? What books should I pick? What kind of AI do poker-playing programs run on? What open source projects can I take a page from? Are there any good AI resources in Java? I'm interested in learning Lisp as well; is Jatha good?
The following may prove useful:
The University of Alberta Computer Poker Research Group
OpenHoldem
Poker Hand Recognition, Comparison, Enumeration, and Evaluation
The Theory of Poker
The Mathematics of Poker
SpecialKPokerEval
Poker AIs are notoriously difficult to get right because humans bet unpredictably. The problem is usually broken into two parts.
1) Calculate the odds of your hand being the winner.
2) Formulate betting strategy based on 1.
I'd recommend starting with lots of statistics reading for part 1. It seems easy at first blush, but it's actually very complicated (and getting it wrong will doom your AI). Then move on to genetic algorithms for part 2. Betting strategies are mostly genetic algorithms: they adjust themselves based on past successes and failures, plus some randomization so as not to become predictable.
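For part 1, a Monte Carlo estimate is the usual starting point: deal out random opponent hole cards and remaining board cards many times, and count how often your hand wins. Here is a minimal sketch in Java; `HandEvaluator` is a placeholder interface you would back with a real evaluator such as SpecialKPokerEval from the list above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EquityEstimator {

    /** Plug in a real evaluator here (e.g. SpecialKPokerEval); higher = stronger. */
    public interface HandEvaluator {
        int rank(List<Integer> sevenCards); // cards encoded 0..51
    }

    /** Estimates the probability that `hole` wins heads-up, given `board` so far. */
    public static double equity(List<Integer> hole, List<Integer> board,
                                HandEvaluator eval, int trials) {
        List<Integer> deck = new ArrayList<>();
        for (int c = 0; c < 52; c++)
            if (!hole.contains(c) && !board.contains(c)) deck.add(c);

        double wins = 0;
        for (int t = 0; t < trials; t++) {
            Collections.shuffle(deck);
            List<Integer> oppHole = deck.subList(0, 2);
            List<Integer> fullBoard = new ArrayList<>(board);
            fullBoard.addAll(deck.subList(2, 2 + (5 - board.size())));

            List<Integer> mine = new ArrayList<>(hole);
            mine.addAll(fullBoard);
            List<Integer> theirs = new ArrayList<>(oppHole);
            theirs.addAll(fullBoard);

            int cmp = Integer.compare(eval.rank(mine), eval.rank(theirs));
            if (cmp > 0) wins += 1;          // win
            else if (cmp == 0) wins += 0.5;  // split pot counts as half
        }
        return wins / trials;
    }
}
```

The estimate's error shrinks roughly with the square root of the number of trials, so a few thousand deals per decision is usually plenty.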
I wrote a Texas Hold'em video poker engine in Java. This code is a core engine for Texas Hold'em, without views or other UI components:
http://github.com/phstc/javapokertexasholdem
Another approach is to let a genetic algorithm adjust the weights of a neural network that determines the decision logic. This approach is very well suited to poker AI.

I made my own AI like this. At first, I created ~1000 players who didn't know how to play the game at all. Based on their initial luck during the hands, their fitness was weighted and a new generation was created. Each new generation of "brains" played better than the previous one.

Eventually, the best individuals played very well.
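A minimal sketch of the loop this answer describes, with the fitness function left as a stub (in practice it would play many simulated hands and return average chips won); the population size, elite fraction, and mutation rate are illustrative guesses:

```java
import java.util.Arrays;
import java.util.Random;

public class Evolution {
    static final Random RNG = new Random();
    static final int POP = 1000, N_WEIGHTS = 20, ELITE = POP / 10;

    // Hypothetical fitness: a real version would play many simulated hands
    // and return the average chips won by a player using these NN weights.
    static double fitness(double[] brain) {
        return 0.0; // plug in your game simulator here
    }

    public static void main(String[] args) {
        double[][] pop = new double[POP][N_WEIGHTS];
        for (double[] b : pop)
            for (int i = 0; i < N_WEIGHTS; i++) b[i] = RNG.nextGaussian();

        for (int gen = 0; gen < 200; gen++) {
            // Score everyone, then sort indices best-first.
            double[] fit = new double[POP];
            Integer[] idx = new Integer[POP];
            for (int i = 0; i < POP; i++) { fit[i] = fitness(pop[i]); idx[i] = i; }
            Arrays.sort(idx, (a, b) -> Double.compare(fit[b], fit[a]));

            // Next generation: keep the elite, refill with mutated copies of them.
            double[][] next = new double[POP][];
            for (int i = 0; i < ELITE; i++) next[i] = pop[idx[i]];
            for (int i = ELITE; i < POP; i++) {
                double[] child = pop[idx[RNG.nextInt(ELITE)]].clone();
                for (int w = 0; w < N_WEIGHTS; w++)
                    child[w] += 0.1 * RNG.nextGaussian(); // small random mutation
                next[i] = child;
            }
            pop = next;
        }
        // pop[0] now holds the best weights found; load them into the decision NN.
    }
}
```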
As already recommended, the book The Theory of Poker is a truly invaluable source of information, both for playing the game and for building an AI. You should probably buy it, as it does not cost that much.
The University of Alberta research group produces the state of the art at the moment, though stiff competition emerges every now and then. (Not all poker bots and AI research in the field is public, because of the temptation to use one's results in internet poker, even though that's forbidden.)
First, you should decide what sort of poker you are going to tackle. Two-player hold'em is pretty much solved, though the best humans still put up a real fight against the best AIs available. The AI's main advantages over humans are an unlimited, flawless memory of past hands, flawless analysis of the patterns in those hands, and, being a machine, no tilt, which almost all humans occasionally suffer from.
Fixed-limit hold'em is probably the easiest to crack, so you might want to start with heads-up fixed-limit hold'em and then decide what you want to do next.
Here are some aspects which change the correct strategy (and your AI):
- A cash game is different from a tournament.
- The number of players makes the decisions different.
- Hold'em is not the only poker. Omaha, Stud, and others exist and are widely played.
- Fixed Limit is different from Pot Limit, which is different from No Limit.
To beat the best you need to cover a lot of very subtle things the best players think about when they play. To beat a low-stakes amateur game, none of these things count.
If you decide to go for No Limit Hold'em, you might want to check out the three-book series Harrington on Hold'em and the book No Limit Poker: Theory and Practice. Having read quite a few books on poker, I can say that these, combined with The Theory of Poker, are quite enough.
I'm not sure which exact game you are interested in, but the typical approach is to create a much smaller abstract version of the game, solve that smaller game, and then map real game situations back to the abstract game to generate advice. Most academic papers skip over the details of this process in favor of presenting convergence, exploitation, and competition results.
However, there are some publicly available code bases which present a complete implementation. One of the best ones is Fell Omen:
http://www.deducer.org/pmwiki/pmwiki.php?n=Main.ArtificialIntelligencePoker
This is a basic complete strategy bot that uses fictitious play to optimize the strategy for the abstract game. It's a good starting point because it is fairly straightforward, complete, and represents a good presentation of the abstract game approach.
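Fictitious play itself is simple: each player repeatedly best-responds to the opponent's empirical mixed strategy, and in two-player zero-sum games the time-averaged play converges to an equilibrium. This toy sketch (not Fell Omen's code) runs it on rock-paper-scissors; the abstract poker game just swaps in a much bigger payoff structure:

```java
import java.util.Arrays;

public class FictitiousPlay {
    // Row player's payoffs for rock-paper-scissors (column player gets the negative).
    static final double[][] A = { { 0, -1, 1 }, { 1, 0, -1 }, { -1, 1, 0 } };

    public static void main(String[] args) {
        double[] rowCounts = new double[A.length], colCounts = new double[A[0].length];
        rowCounts[0] = colCounts[0] = 1; // arbitrary initial play

        for (int t = 0; t < 100_000; t++) {
            // Each player best-responds to the opponent's empirical mixture so far.
            rowCounts[bestRow(colCounts)]++;
            colCounts[bestCol(rowCounts)]++;
        }
        System.out.println("row strategy ~ " + Arrays.toString(normalize(rowCounts)));
        System.out.println("col strategy ~ " + Arrays.toString(normalize(colCounts)));
    }

    static int bestRow(double[] colCounts) {
        int best = 0; double bestVal = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < A.length; i++) {
            double v = 0;
            for (int j = 0; j < A[0].length; j++) v += A[i][j] * colCounts[j];
            if (v > bestVal) { bestVal = v; best = i; }
        }
        return best;
    }

    static int bestCol(double[] rowCounts) {
        int best = 0; double bestVal = Double.POSITIVE_INFINITY;
        for (int j = 0; j < A[0].length; j++) {
            double v = 0;
            for (int i = 0; i < A.length; i++) v += A[i][j] * rowCounts[i];
            if (v < bestVal) { bestVal = v; best = j; } // column player minimizes
        }
        return best;
    }

    static double[] normalize(double[] c) {
        double s = 0; for (double x : c) s += x;
        double[] p = new double[c.length];
        for (int i = 0; i < c.length; i++) p[i] = c[i] / s;
        return p;
    }
}
```

For rock-paper-scissors the averaged strategies approach (1/3, 1/3, 1/3), which is the game's equilibrium.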
If you are interested in developing poker AI, I would suggest reading everything from 2007 and on from the UA poker group and Tuomas Sandholm's students:
http://www.cs.cmu.edu/~sandholm/
http://poker.cs.ualberta.ca/publications.html
I wrote a Hold'em AI in my undergrad. It wasn't particularly advanced; I used a Q-value learner that traversed a number of states and updated the Q-value for each state.
I found the University of Alberta's AI Poker project an invaluable source of info for avoiding pitfalls.
As one poster above states, the first step is to nail down a couple of well-defined poker rules; one-on-one poker can be developed programmatically.
One pitfall I fell into was not building in reconfigurability early on, for instance being able to switch the grade of learning/playing.

I'd be interested to hear how you get on; drop me a mail at stevekeogh at gmail.com.
Just to add to the links above: one of the important things to implement would be the Kelly criterion (http://en.wikipedia.org/wiki/Kelly_criterion), which helps figure out the optimal size of bets given the expected odds in a series of bets.
With humans there can be errors in the judgement of odds, but if your AI can produce normalized expected odds from whatever algorithm it uses, then this bet-sizing technique, which balances both risk and reward for the advantage gambler, would be a good, cheap solution.
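The Kelly formula itself is a one-liner: with net odds b and win probability p, bet the fraction f* = (bp - q)/b of your bankroll, where q = 1 - p. A quick sketch:

```java
public class Kelly {
    /**
     * Kelly fraction: the share of the bankroll to wager.
     * b = net odds received on the bet (e.g. 2.0 means win 2 for every 1 staked),
     * p = estimated probability of winning, q = 1 - p.
     */
    static double kellyFraction(double b, double p) {
        double q = 1 - p;
        return (b * p - q) / b; // a negative result means: don't bet
    }

    public static void main(String[] args) {
        // Example: 55% to win a pot paying even money (b = 1).
        System.out.println(kellyFraction(1.0, 0.55)); // 0.10 -> bet 10% of bankroll
    }
}
```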
One interesting result I've heard is that if you restrict the betting options to fold, check and all-in, you can write an AI that wins one-on-ones with probability at least 49%, and 49.5% if it's (IIRC) not going first.
I don't know whether this AI is easier to write than one that knows how much to bet, but it's food for thought: choosing bet amounts accounts for only 1.5% of the probability of winning.
About a year ago I tried to make a simple text-based football simulation program, but I abandoned it; I want to start from the beginning since it wasn't really good. My problem is finding a good algorithm for calculating the scores.

I don't want this program to be just a big Math.random() mess. Instead, I want to use a combination of Math.random() and team strength (each team was given attack and defense ratings from 1-20, though that could be improved too, since I'm not sure it's the optimal approach) to calculate the scores, and maybe throw home advantage and form/morale into the mix later (since they're slightly more complicated).
In my previous program, the algorithm compared the attack and defense ratings of the two teams and then decided how many goals to randomize. So, for example, if a team with a maximum 20 attack rating played against a team with a 5 defense rating, they would most likely score a few goals. A few problems with this algorithm were:

I think it will be really hard to incorporate home advantage and form using this algorithm. It was a lot of ifs and elses with a random integer between 1 and 100 to get different chance percentages.
There were not a lot of surprise results, despite the fact that I gave smaller teams a chance to score a few goals.
The scores were insane, with teams winning 5-1 week in, week out.
It didn't technically check whether a team was better overall; it just decided the number of goals a team would score based on attack vs. defense. So theoretically, attack could be overpowered (I might be wrong, though).

Let's say a team that has to win the league is playing against a team that has already secured a mid-table finish: that wouldn't have had any effect on the game. I suppose this is something to add alongside form, but I don't know.
Considering these points, do you have a good idea for a better algorithm? I don't know machine learning and all that stuff, I want this program to be somewhat simple. I also don't need you to write the whole algorithm for me of course, just give me some general ideas. Thanks a lot :)
My suggestion (just thinking out loud, I have no experience at all on football scores simulation).
I would assign every team a relative "goal potential": roughly, how many goals that team scores on average (against the whole population of teams). The simulation would then be based on a normal distribution with a mean equal to the difference between the two teams' goal potentials, and some standard deviation.
To get an idea of appropriate values, you could take historical data of a few teams and compute their average number of goals (per match played), and the global standard deviation.
You could refine the model by assigning the teams different standard deviations, which will reflect their "regularity". Again you can compute these on actual data to get an idea of realistic values (and check if using different values makes any sense).
The home advantage can be modelled as just a fixed increment of the goal potential. You can also measure actual home advantage by computing the two average goal-per-match of a given team, both when home or visitor, and taking the difference (this is a way to see if home advantage is real or just folklore).
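A sketch of one way to realize this model, drawing each team's goal count from a normal distribution around its goal potential (plus a home bonus), rounded and clamped at zero; all the numbers are illustrative and would be replaced by values fitted from historical data:

```java
import java.util.Random;

public class MatchSim {
    static final Random RNG = new Random();

    /**
     * Each team's goals come from a normal distribution around its
     * "goal potential" (plus a home bonus), rounded and clamped at zero.
     */
    static int[] simulate(double homePotential, double awayPotential,
                          double homeBonus, double sd) {
        int home = (int) Math.max(0, Math.round(homePotential + homeBonus + sd * RNG.nextGaussian()));
        int away = (int) Math.max(0, Math.round(awayPotential + sd * RNG.nextGaussian()));
        return new int[] { home, away };
    }

    public static void main(String[] args) {
        // A strong side (1.9 goals/match) at home to a weaker one (1.1); sd from historical data.
        int[] score = simulate(1.9, 1.1, 0.3, 1.0);
        System.out.println(score[0] + " - " + score[1]);
    }
}
```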
I've recently begun trying to create a mobile app (iOS/Android) that will automatically beat match (http://en.wikipedia.org/wiki/Beatmatching) two songs.
I know that this exists out there, and there have been others who have had some success, but I'm running into issues related to the accuracy of the players.
Specifically, I run into "sync" issues where the "beats" don't line up. The various methods used to date are:
Calculate the BPM in advance, identify a "beat" (using something like sonicapi.com), try to line the tracks up appropriately, and begin the mix with the incoming track's playback rate adjusted (tempo adjustment)
Utilizing a bunch of meta data to trigger specific starts and stops
What does NOT work:
Leveraging echonest's API (it beat matches on the server, we want to do it on the client)
Something like pydub (does not do it in realtime)
Who uses this algorithm today:
iwebdj
Traktor
Does anyone have any suggestions on how to solve this problem? I've seen lots of people do it, but doing it in real time on a mobile device seems to be an issue.
There are lots of methods for solving this problem, some of which work better than others. Matthew Davies has published several papers on the matter, among many others. This article breaks down some of the steps necessary for doing this. I built a beat tracker in Matlab (unfortunately...) with a fellow student; our goal was to create an outro/intro between two songs so that the tempo was seamless between them. We wanted to do this for songs whose BPM differed by a small amount (within ±7 or so BPM of each other). Our method went sort of like this:
Find two songs in our database that had an overlapping "key center". So let's say two songs, both in Am.
Find this particular overlap of key centers between the two. Say 30 seconds into song 1 and 60 seconds into song 2
Now create a beat map, using an onset-detection algorithm with peak picking; Also, this was helpful for us.
Pick the first 'beat' for each track, and overlap the two tracks at that point. Now, since they are slightly different BPM from each other, the beats won't really line up with each other.
From this, we created a sort of map that gave us the sample offsets between the beats of song A and the beats of song B. We then wanted to time-stretch the fade-in region of song B so that each of its onsets (beats, in this case) lined up at the correct sample index with the onsets of song A over its fade-out region. For example, if onset 2 of song B was 5,000 samples ahead of onset 2 of song A, we simply stretched that 5,000-sample region so that onset 2 matched exactly between both songs.
This seems like it would sound weird, but it actually sounded pretty good. Although this was done entirely offline in Matlab, I am also looking for a way to do this in real-time in a mobile app. Not entirely sure about libraries you can use for this in Android world, but I imagine that it would be most efficient in C++.
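In code, that per-beat map boils down to offsets and stretch ratios between two onset lists. A small sketch with made-up onset positions; the stretch ratio for each interval is what you would hand to a time-stretching routine:

```java
public class BeatMap {
    /**
     * Given onset positions (in samples) for the outgoing and incoming tracks,
     * compute the per-beat offset and the stretch ratio needed to make each
     * inter-onset interval of track B match the corresponding one of track A.
     */
    public static void main(String[] args) {
        long[] onsetsA = { 0, 22050, 44100, 66150 };   // ~120 BPM at 44.1 kHz
        long[] onsetsB = { 0, 21000, 42000, 63000 };   // slightly faster track

        int n = Math.min(onsetsA.length, onsetsB.length) - 1;
        for (int i = 0; i < n; i++) {
            long intervalA = onsetsA[i + 1] - onsetsA[i];
            long intervalB = onsetsB[i + 1] - onsetsB[i];
            long offset = onsetsB[i + 1] - onsetsA[i + 1]; // negative = B runs early
            double stretch = (double) intervalA / intervalB; // feed to a time-stretcher
            System.out.printf("beat %d: offset %d samples, stretch %.4f%n",
                              i + 1, offset, stretch);
        }
    }
}
```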
A couple of libraries I have come across would be good for prototyping something, or at least studying the source code to get a better understanding of how you could do this in a mobile app:
Essentia (great community, open-source)
Aubio (also seems to be maintained pretty well, open-source)
Additional things to read up on for doing this kind of stuff in iOS land:
vDSP Programming guide
This article may also help
I came across this project that does some beat detection. Although it seems pretty outdated, unfortunately, it may offer some additional insights.
Unfortunately it isn't as simple as just 'pressing play' at the same time to align beats, unless you are assuming very specific aspects about them (exact tempos, etc.).
If you reallllly have some time on your hands, you should check out Tristan Jehan's (founder of Echonest) thesis; it is jam packed with algorithms and methods for beat detection, etc.
How can the tempo/BPM of a song be determined programmatically? What algorithms are commonly used, and what considerations must be made?
This is challenging to explain in a single Stack Overflow post. In general, the simplest beat-detection algorithms work by locating peaks in sound energy, which are easy to detect. More sophisticated methods use comb filters and other statistical/waveform methods. For a detailed explanation including code samples, check out this GameDev article.
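To make the energy-peak idea concrete, here is a minimal sketch along the lines of the article's simplest detector: flag a beat whenever a short block's energy clearly exceeds the recent average. The block size, history length, and 1.3 threshold are typical starting values, not universal constants:

```java
public class EnergyBeats {
    /** `samples` is mono PCM in [-1, 1]. */
    public static void detect(float[] samples, int sampleRate) {
        int block = 1024, history = 43; // 43 blocks of 1024 ~= 1 second at 44.1 kHz
        double[] recent = new double[history];
        int idx = 0, filled = 0;

        for (int start = 0; start + block <= samples.length; start += block) {
            // Energy of the current block.
            double e = 0;
            for (int i = start; i < start + block; i++) e += samples[i] * samples[i];

            // Average energy over roughly the last second.
            double avg = 0;
            for (int i = 0; i < filled; i++) avg += recent[i];
            if (filled > 0) avg /= filled;

            if (filled == history && e > 1.3 * avg)
                System.out.printf("beat at %.2fs%n", (double) start / sampleRate);

            recent[idx] = e;                 // update the rolling history
            idx = (idx + 1) % history;
            if (filled < history) filled++;
        }
    }
}
```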
The keywords to search for are "Beat Detection", "Beat Tracking" and "Music Information Retrieval". There is lots of information here: http://www.music-ir.org/
There is a (maybe) annual contest called MIREX where different algorithms are tested on their beat detection performance.
http://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/
That should give you a list of algorithms to test.
A classic algorithm is Beatroot (google it), which is nice and easy to understand. It works like this:
Short-time FFT the music to get a sonogram.
Sum the increases in magnitude over all frequencies for each time step (ignore the decreases). This gives you a 1D time-varying function called the "spectral flux" (sketched in code after this list).
Find the peaks using any old peak detection algorithm. These are called "onsets" and correspond to the start of sounds in the music (starts of notes, drum hits, etc).
Construct a histogram of inter-onset-intervals (IOIs). This can be used to find likely tempos.
Initialise a set of "agents" or "hypotheses" for the beat-tracking result. Feed these agents the onsets one at a time in order. Each agent tracks the list of onsets that are also beats, and the current tempo estimate. The agents can either accept the onsets, if they fit closely with their last tracked beat and tempo, ignore them if they are wildly different, or spawn a new agent if they are in-between. Not every beat requires an onset - agents can interpolate.
Each agent is given a score according to how neat its hypothesis is - if all its beat onsets are loud it gets a higher score. If they are all regular it gets a higher score.
The highest scoring agent is the answer.
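A sketch of step 2 (the spectral flux), which is the easiest piece to show in code; it assumes you already have a magnitude spectrogram from any FFT library:

```java
public class SpectralFlux {
    /**
     * Sum the positive magnitude increases across all frequency bins for each
     * pair of adjacent STFT frames. spectrogram[t][k] is the magnitude of
     * bin k at frame t.
     */
    static double[] flux(double[][] spectrogram) {
        double[] out = new double[spectrogram.length];
        for (int t = 1; t < spectrogram.length; t++) {
            double sum = 0;
            for (int k = 0; k < spectrogram[t].length; k++) {
                double diff = spectrogram[t][k] - spectrogram[t - 1][k];
                if (diff > 0) sum += diff; // ignore decreases (half-wave rectification)
            }
            out[t] = sum;
        }
        return out;
    }
    // Onsets (step 3) are then local maxima of this function above a threshold.
}
```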
Downsides to this algorithm in my experience:
The peak-detection is rather ad-hoc and sensitive to threshold parameters and whatnot.
Some music doesn't have obvious onsets on the beats. Obviously it won't work with those.
Difficult to know how to resolve the 60bpm-vs-120bpm issue, especially with live tracking!
Throws away a lot of information by only using a 1D spectral flux. I reckon you can do much better by having a few band-limited spectral fluxes (and maybe one broadband one for drums).
Here is a demo of a live version of this algorithm, showing the spectral flux (black line at the bottom) and onsets (green circles). It's worth considering the fact that the beat is extracted from only the green circles. I've played back the onsets just as clicks, and to be honest I don't think I could hear the beat from them, so in some ways this algorithm is better than people at beat detection. I think the reduction to such a low-dimensional signal is its weak step though.
Annoyingly I did find a very good site with many algorithms and code for beat detection a few years ago. I've totally failed to refind it though.
Edit: Found it!
Here are some great links that should get you started:
http://marsyasweb.appspot.com/
http://www.vamp-plugins.org/download.html
Beat extraction involves the identification of cognitive metric structures in music. Very often these do not correspond to physical sound energy - for example, in most music there is a level of syncopation, which means that the "foot-tapping" beat that we perceive does not correspond to the presence of a physical sound. This means that this is a quite different field to onset detection, which is the detection of the physical sounds, and is performed in a different way.
You could try the Aubio library, which is a plain C library offering both onset and beat extraction tools.
There is also the online Echonest API, although this involves uploading an MP3 to a website and retrieving XML, so it might not be so suitable.
EDIT: I came across this last night - a very promising looking C/C++ library, although I haven't used it myself. Vamp Plugins
The general area of research you are interested in is called MUSIC INFORMATION RETRIEVAL
There are many different algorithms that do this but they all are fundamentally centered around ONSET DETECTION.
Onset detection measures the start of an event; the event in this case is a note being played. You can look for changes in the weighted Fourier transform (high-frequency content), or for large changes in spectral content (spectral difference); there are a couple of papers further down that I recommend. Once you apply an onset detection algorithm, you pick off where the beats are via thresholding.
There are various algorithms you can use once you have that time localization of the beats. You can turn it into a pulse train (create a signal that is zero for all time and 1 only when your beat happens), then apply an FFT to that, and the largest peak gives you the frequency of onsets.
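A sketch of that pulse-train idea: since the train is just a sum of impulses at the onset times, you can evaluate its spectrum directly at candidate tempo frequencies instead of running a full FFT; the 60-180 BPM range and 0.5 BPM step here are illustrative choices:

```java
public class OnsetTempo {
    /** Returns the candidate BPM whose frequency has the largest spectral magnitude. */
    static double tempoBpm(double[] onsetTimesSec) {
        double bestBpm = 0, bestMag = -1;
        for (double bpm = 60; bpm <= 180; bpm += 0.5) {
            double f = bpm / 60.0; // candidate beat frequency in Hz
            double re = 0, im = 0;
            for (double t : onsetTimesSec) { // pulse train = sum of unit impulses
                re += Math.cos(2 * Math.PI * f * t);
                im += Math.sin(2 * Math.PI * f * t);
            }
            double mag = Math.hypot(re, im);
            if (mag > bestMag) { bestMag = mag; bestBpm = bpm; }
        }
        return bestBpm;
    }

    public static void main(String[] args) {
        double[] onsets = { 0.0, 0.5, 1.0, 1.5, 2.0, 2.5 }; // clicks at 120 BPM
        System.out.println(tempoBpm(onsets)); // ~120.0
    }
}
```

Note this inherits the usual octave ambiguity (60 vs 120 BPM) mentioned elsewhere on this page; restricting the search range is the crude fix.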
Here are some papers to lead you in the right direction:
https://web.archive.org/web/20120310151026/http://www.elec.qmul.ac.uk/people/juan/Documents/Bello-TSAP-2005.pdf
https://adamhess.github.io/Onset_Detection_Nov302011.pdf
Here is an extension to what some people are discussing:
Someone mentioned applying a machine learning algorithm: basically, collect a bunch of features from the onset detection functions (mentioned above), combine them with the raw signal in a neural network/logistic regression, and learn what makes a beat a beat.
Look into Dr. Andrew Ng; he has free machine learning lectures from Stanford University online (not the long-winded video lectures; there is actually an online distance course).
If you can manage to interface with Python code in your project, the Echo Nest Remix API is a pretty slick API for Python:

There's a method, analysis.tempo, which will give you the BPM. It can do a whole lot more than simple BPM detection, as you can see from the API docs or this tutorial.
Perform a Fourier transform, and find peaks in the power spectrum. You're looking for peaks below the 20 Hz cutoff for human hearing. I'd guess typically in the 0.1-5ish Hz range to be generous.
SO question that might help: Bpm audio detection Library
Also, here is one of several "peak finding" questions on SO: Peak detection of measured signal
Edit: Not that I do audio processing. It's just a guess based on the fact that you're looking for a frequency domain property of the file...
Another edit: It is worth noting that lossy compression formats like MP3 store frequency-domain data rather than time-domain data in the first place. With a little cleverness, you can save yourself some heavy computation... but see the thoughtful comment by cobbal.
To repost my answer: the easy way to do it is to have the user tap a button in rhythm with the beat, and count the number of taps divided by the time.
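That method fits in a dozen lines; a sketch:

```java
public class TapTempo {
    private long firstTapMs = -1;
    private int taps = 0;

    /** Call this on every button tap; returns the current BPM estimate (or 0). */
    public double tap() {
        long now = System.currentTimeMillis();
        if (firstTapMs < 0) firstTapMs = now;
        taps++;
        if (taps < 2) return 0; // need at least one full interval
        double elapsedMin = (now - firstTapMs) / 60000.0;
        return (taps - 1) / elapsedMin; // intervals per minute = BPM
    }
}
```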
Others have already described some beat-detection methods. I want to add that there are some libraries available that provide techniques and algorithms for this sort of task.
Aubio is one of them, it has a good reputation and it's written in C with a C++ wrapper so you can integrate it easily with a cocoa application (all the audio stuff in Apple's frameworks is also written in C/C++).
There are several methods to get the BPM but the one I find the most effective is the "beat spectrum" (described here).
This algorithm computes a similarity matrix by comparing each short sample of the music with every other. Once the similarity matrix is computed, you can take, for each lag L, the average similarity between sample pairs {S(T), S(T+L)}: this is the beat spectrum. The first high peak in the beat spectrum usually corresponds to the beat duration. The best part is that you can also do things like music structure or rhythm analysis.
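A sketch of the beat spectrum computation; instead of materializing the full similarity matrix, it averages the cosine similarity along each diagonal (lag) directly, which is equivalent. The feature frames could be, for instance, spectral magnitudes per short window:

```java
public class BeatSpectrum {
    /**
     * frames[t] is the feature vector for short sample t. The returned array
     * holds the average pairwise similarity at each lag; its first strong
     * peak suggests the beat period (in frames).
     */
    static double[] beatSpectrum(double[][] frames, int maxLag) {
        int n = frames.length;
        double[] spectrum = new double[maxLag];
        for (int lag = 1; lag < maxLag; lag++) {
            double sum = 0;
            for (int t = 0; t + lag < n; t++) sum += cosine(frames[t], frames[t + lag]);
            spectrum[lag] = sum / (n - lag); // average similarity at this lag
        }
        return spectrum;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12); // guard against zero frames
    }
}
```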
I'd imagine this will be easiest in 4-4 dance music, as there should be a single low frequency thud about twice a second.
At last I am working on my final year project, an intelligent web-based career guidance system. The core functionality of my system is a
Recommendation System
Basically, our recommendation system will carefully examine user preferences by administering interest tests and looking at the user's academic record; on the basis of this information it will give the user the best career options, i.e. a course like BS Computer Science.

The input of the recommendation system will be the student's credentials and the interest test. In the interest test, questions are chosen according to the user's academic history and the answers he has already given, so the test will not ask everyone the same questions; it will decide in real time what to ask each user, according to rules defined by the system.

Its output will be a set of field options decided on the basis of the interest test.
Problem
When I was defending my scope in front of the committee, they said "this is simple if-else; this system is not intelligent."

My question is: which AI technique or algorithm could be used to make this system intelligent? I have searched a lot, but the papers related to my system are superficial; they emphasize the idea, not the methodology.

I want to do all my work in Java. It would be great if the answer is technology-specific.

Feel free to move my question to another Stack Exchange site if it does not fit SO's Q&A criteria.
Edit
After getting some ideas from the answers, I want to implement an expert system with a rule engine and inference engine. Now I want to be clearer on the technology for implementing the rule engine. After searching, I found Drools to be the best fit, but is it also compatible with web applications? I also found Tohu to be the best dynamic form generator (which is also a need of my project). Can I use Tohu with Drools to build my web application? Is this type of system easy to implement or not?
If you have a large number of questions, each of them can represent a feature. Assuming you are going to have a LOT of features, finding the series of if-else statements that fulfills the criteria is hard. (Recall that a full tree over n yes/no questions is going to have 2^n "leaves", representing the 2^n possible combinations of answers.)
Since hard-coding the above is not possible for a large enough (and probably realistic) n, there is a place for heuristic solutions. One of those is machine learning, specifically the classification problem. You can have a sample of people answer your survey, with an "expert" saying what the best career is for each of them, and let an algorithm find a classifier for the general problem. (If you want to convert it into a series of yes/no questions automatically, that can be done with a decision tree and an algorithm like C4.5 to create the tree.)
It could also be important to determine which questions are actually relevant. Is gender relevant? Is height relevant? These questions can also be answered using ML feature selection algorithms (one example is PCA).
Regarding the "technology" aspect: there is a nice library in Java called Weka which implements many of the classification algorithms out there.
One question you could ask (and try to answer in your project) is which classification algorithm will be best for this problem. Some possibilities are the above-mentioned C4.5, Naive Bayes, linear regression, neural networks, KNN, or SVM (which usually turned out best for me). You can back your decision about which algorithm to use with statistical research and a statistical test of which is better; the Wilcoxon test is the standard for this.
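For instance, training and cross-validating Weka's C4.5 implementation (J48) takes only a few lines; `careers.arff` is a hypothetical data file with one row per surveyed person, the answers as attributes, and the expert-assigned career as the class attribute:

```java
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CareerClassifier {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("careers.arff");
        data.setClassIndex(data.numAttributes() - 1); // career is the last attribute

        J48 tree = new J48(); // Weka's C4.5 implementation
        tree.buildClassifier(data);

        // 10-fold cross-validation to estimate real-world accuracy.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(tree); // the learned tree doubles as readable "rules"
    }
}
```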
EDIT: more details on point 2:
Here an "expert" can be a human classifier from the field of HR who reads the features and classifies the answers. Obtaining this data (usually called the "training data") is sometimes hard and expensive; if your university has an IE or HR faculty, maybe they will be willing to help.
The idea is: gather a bunch of people who first answer your survey. Then give their answers to a human classifier ("expert") who will choose the best career for each person, based on his answers. The data together with the classifications given by the expert is the input of the learning algorithm; its output will be a classifier.

A classifier is itself a function that, given the answers to a survey, predicts the "classification" (suggested career) for the person who took the survey.

Note that once you have a classifier, you no longer need to maintain the training data; the classifier alone is enough. However, you should keep your list of questions, since the answers to these questions are the features provided to the classifier.
All you have to do to satisfy them is create a simple learning system:
Change your thesis terminology so it is described as "learning the best career" instead of using the word "intelligent". Learning is a form of artificial intelligence.
Create a training regime. Do this by giving the questionnaire to people that already have careers and also ask questions to find out how satisfied they are with their career. That way your system can train on what makes a good career match and what makes a bad one.
Choose a learning system to absorb the data from (2). For example, one source of ideas might be this recent paper: http://journals.cluteonline.com/index.php/RBIS/article/download/4405/4493. Sum-product networks are cutting-edge in AI and apply well to expert-system-like problems.
Finally, try to give a twist to whatever your technology is to make it specific to your problem.
In my final project, I had some experience with the Jena RDF inference engine. Basically, what you do with it is create a sort of knowledge base with rules like "if the user chose this answer, he has that quality" and "if the user has those qualities, he might be good for that job". Adding answers into the system lets you query the user's current status and adjust questions accordingly. It's pretty easy to create a proof of concept with it, easier than a bunch of if-else, and if your professors worship prolog-ish style things, they'll like it.
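A rough sketch of what that looks like with Jena's general-purpose rule engine; the vocabulary (the `careers#` namespace and its properties) is entirely hypothetical, and the rules use the GenericRuleReasoner syntax:

```java
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;

public class CareerRules {
    public static void main(String[] args) {
        String ns = "http://example.org/careers#"; // hypothetical vocabulary
        String rules =
            "[r1: (?u <" + ns + "answered> <" + ns + "likesMath>) -> " +
            "     (?u <" + ns + "hasQuality> <" + ns + "analytical>)] " +
            "[r2: (?u <" + ns + "hasQuality> <" + ns + "analytical>) -> " +
            "     (?u <" + ns + "suggestedCareer> <" + ns + "computerScience>)]";

        // Base model holds the facts gathered from the user's answers.
        Model base = ModelFactory.createDefaultModel();
        Resource user = base.createResource(ns + "student1");
        user.addProperty(base.createProperty(ns + "answered"),
                         base.createResource(ns + "likesMath"));

        // Wrap the facts in an inference model driven by the rules.
        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, base);

        // Query the inferred career suggestions for this user.
        StmtIterator it = inf.listStatements(
            user, inf.getProperty(ns + "suggestedCareer"), (RDFNode) null);
        while (it.hasNext()) System.out.println(it.next().getObject());
    }
}
```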
As @amit suggested, Bayesian analysis can guide the choice of the next question to ask. Another pitfall of dynamic tests is artificial thresholds ("if your score is 28, you are in this category; if your score is 27, you are not"), a problem which fuzzy logic can help address. Another benefit of fuzzy logic is that adding a new category is relatively easy, since the domain expert is only asked to contribute qualitative assessments, not quantitative thresholds.
A program is never more intelligent than the person who wrote it. So, I would first use the collective intelligence that has been built and open sourced already.
Pass your set of known data points as input to Apache Mahout's PearsonCorrelationSimilarity and use the output to predict which course is the best match. In addition to being open source and scalable, you can also record the outcome and feed it back into the system to improve accuracy over time. It is very hard to match this level of performance, because it is a lot easier to tweak an out-of-the-box algorithm or replace it with your own than it is to deal with a bunch of if-else conditions.
I would suggest reading this book. It contains an example of how to use PearsonCorrelationSimilarity.

Mahout also has built-in recommender algorithms like NearestNeighborClusterSimilarity that can simplify your solution further.

There's good starter code in the book; you can build on it.
Student credentials and interest test questions and answers are inputs. Career choice is the output that you can correlate to the input. That's a very simplistic approach, but it might be OK to start with. Eventually, you will have to apply the classifier techniques that Amit has suggested, and Mahout can help you with that as well.
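As a sketch, the classic Mahout "Taste" user-based recommender is only a few lines; `student_careers.csv` is a hypothetical userID,itemID,preference file built from past students and their career outcomes:

```java
import java.io.File;
import java.util.List;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class CareerRecommender {
    public static void main(String[] args) throws Exception {
        DataModel model = new FileDataModel(new File("student_careers.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        NearestNUserNeighborhood neighborhood =
            new NearestNUserNeighborhood(10, similarity, model); // 10 most similar users
        Recommender recommender =
            new GenericUserBasedRecommender(model, neighborhood, similarity);

        // Top 3 career suggestions for the student with ID 42.
        List<RecommendedItem> recs = recommender.recommend(42, 3);
        for (RecommendedItem r : recs)
            System.out.println(r.getItemID() + " score " + r.getValue());
    }
}
```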
Drools can be used via the web, but watch out; it can be a bit of a beast to configure and is likely serious overkill for your application. It is an "enterprise" type of solution focused on rule management rather than rule execution.
Drools is an "IF-THEN" system, and pretty much all rules engines use the Rete algorithm (http://en.wikipedia.org/wiki/Rete_algorithm), so if your original question is about how not to use an IF-THEN system, Drools is not the right choice. There are Solver and Planner parts of Drools that are not IF-THEN algorithms, but they are not the main Drools algorithm.
That said, it seems like a reasonable choice for your application. Just don't expect it to be considered an "intelligent" system by those who deem themselves experts. Rules engines are typically used to codify (that is, turn into software) the rules and regulations of business, such as "should you be approved for a mortgage" or "how much is your car insurance", and so on. "What job should you do" is a reasonable application of the same.
If you want to add more AI-like intelligence, here are a few ideas:
Use machine learning to get feedback from the user about earlier recommendations. If someone likes or hates a suggestion, add that back in as a feature of the person. You are now doing basic feedback/reinforcement learning (Bayes, neural nets) to better classify the person to a career.
Consider the questions you ask the person. Do you need to ask all of the questions? If you can alter the flow of questions based on their responses (by estimating what kind of person they are) then you are trying to learn the series of questions that gives the most useful knowledge for a recommendation.
If you want specific software, look at Weka http://www.cs.waikato.ac.nz/ml/weka/ - it has many great algorithms for classifying. And it is a Java library, so you can easily use it within a web application.
Good luck.
I'm coming to the end of my first year of CS and I thought a great way to consolidate all the things I've learnt this year would be a personal game project.
I would like to implement a 2D-based RTS, along the lines of StarCraft I, WarCraft II, or even Command & Conquer. I will have about 3 months without interruptions to implement the game.
So to anyone experienced with java game programming, I have a few questions:
Is it realistic to design a 2D RTS engine from scratch in 3 months?
If so what are some good books/resources to get started?
Would it be better to modify some existing project? I would think the experience of having to work with a lot of someone else's code would be good, since exposure to such topics in an undergrad CS degree seems very rare, if not non-existent.

Are there any decent open source 2D RTS projects that anyone could recommend? I've looked through a few, but most seem to be written in C/C++.
My humble thanks
Edit: Thanks for the quick responses. I think it was perhaps a bad idea to post this in a rush, since I misrepresented what I want to do.
When I say "along the lines of WarCraft II etc." I mean that style of RTS using sprites. I don't intend to implement a game nearly that complex, more like just a basic prototype.
My goal would be something more like a flat textured map with some basic obstacles like trees, and a single unit-producing structure like a barracks. I'd like the units to have health bars and be able to move, attack, and die (and possibly morph into another unit).
Far-off goals would be to implement some basic pathing using a modified version of Dijkstra's shortest path algorithm (a minimal sketch follows), ranged units with missile attacks, etc.
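For reference, plain Dijkstra on a tile grid is quite compact; a minimal sketch, where the tile costs, start position, and 4-connected neighborhood are illustrative choices:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class GridPath {
    static final int[][] DIRS = { { 1, 0 }, { -1, 0 }, { 0, 1 }, { 0, -1 } };

    /** Dijkstra over a tile map: cost[y][x] is the cost to step onto that tile. */
    static int[][] dijkstra(int[][] cost, int sx, int sy) {
        int h = cost.length, w = cost[0].length;
        int[][] dist = new int[h][w];
        for (int[] row : dist) Arrays.fill(row, Integer.MAX_VALUE);
        dist[sy][sx] = 0;

        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> Integer.compare(a[2], b[2]));
        pq.add(new int[] { sx, sy, 0 }); // entries are {x, y, distance}
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            if (cur[2] > dist[cur[1]][cur[0]]) continue; // stale queue entry
            for (int[] d : DIRS) {
                int nx = cur[0] + d[0], ny = cur[1] + d[1];
                if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                int nd = cur[2] + cost[ny][nx];
                if (nd < dist[ny][nx]) {
                    dist[ny][nx] = nd;
                    pq.add(new int[] { nx, ny, nd });
                }
            }
        }
        return dist; // walk from any goal back along strictly decreasing dist for the path
    }
}
```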
I don't plan to implement any opponents, AI, networking, or anything like that.
I'm thinking along the lines of starcraft I, warcraft II or even command and conquer
Make sure you purge your mind of matching the full scope of any of those. They took large teams of developers multiple years to make, with multi-million dollar budgets, so you can't even hope to approach those. They're called AAA for a reason. That being said, there's no reason you can't very minimally ape their design, or make a tiny game in their genre, assuming you have previous experience making small games.
A sub-genre of RTS that might be doable in that amount of time is a Tower Defense game. Plants vs Zombies is a good example. The reason I suggest this sub-genre is that you can avoid implementing any sort of AI or path-finding, which are notoriously difficult to get working, and I think technically impossible to implement "perfectly", especially with a limited CPU budget.
Make sure to rein in your scope. Favor a "complete" game over new features, because you can then call it "done" at any time. Get your game playable ASAP, and don't sweat the polish or details until you have to. Add one enemy type and one type of player unit (with only one ability, if you were thinking of implementing multiple abilities per unit). Make a title screen, menus (even if the menu is just "click screen to play"), a game over screen, level complete or stat screens, cross-level player statistics, etc. Once you have all that ironed out, spend equal time adding new features and polishing the gameplay/graphics/bugs.
Once you have a playable, "complete" game ready (no matter how small in scope), find a real artist to do graphics for you. A shiny game always draws an audience, no matter how simple the gameplay.
It is very unrealistic to think you could implement a 2D RTS engine anywhere even close to the complexity of those kinds of games. You could maybe get something very rough if you were experienced, but with only one year of study I think it's doubtful.
I can't help but feel like it would be much better for you if you used an existing engine or framework and built off of it. Like you said, working with other code would probably be a good learning experience as well. It would allow you to experiment without getting bogged down in having to do everything.
Keep it simple, or you will simply drown in complexity before getting around to having anything playable. Since you have not tried this before, you will have a lot of nuts to crack, and you don't know how long they will take.
Also remember that report writing and documentation takes time too.
The idea is good, and I think you can pull off a whole game if you find good building blocks. I would suggest discussing this with your teacher to hear what is acceptable for you to use. Would it e.g. be ok to do a game on an open source engine if you add some non-trivial functionality?
Update: Seems to be several engines available from Java at http://www.devmaster.net/engines/list.php?fid=6&sid=1
People often forget that creating games is MUCH MORE than just coding the technical side. It's about content creation, game design, sound and music, the "fun factor". If you make heavy use of existing APIs or engines, it will be possible, but writing it from scratch with no experience in 3 months is like asking yourself whether you can write 100,000 LOC in that time, which means about 1,111 LOC per day. That might be possible, but not if you also have to design and think, and code alone makes no game.
Perhaps it would make sense to look at some existing efforts to get a feel for the scope of what you are looking at. These should give you some ideas or even code to build on:
http://www.duncanjauncey.com/btinternet/old/javagame/game.html
http://en.wikipedia.org/wiki/Lightweight_Java_Game_Library
http://www.ardor3d.com/
http://en.wikipedia.org/wiki/JMonkeyEngine
It would be a lot for me to bite off (from scratch) in the time given that is for sure. That is about all I can say.
EDIT: I thought maybe JOGRE was not what you are looking for. Then I thought about it and it seems like it would have all the right kinds of plumbing for what you are trying to do.
EDIT AGAIN: After my answer, one of the related questions links on the side seemed relevant: Java Game Programming: JOGL vs LWJGL?
Well, if it gives you any hope at all, my team and I are currently working on an RTS game called "The Genesis Project". We call ourselves MotherBoard Games, or MBG for short. If you would like, I am always looking for more coders. You can email me at mpmn5891#gmail.com; I can give you some advice and tips from my 6 years of experience, 2 of which have been spent making this game (to give you an idea of the scope).