Related
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop. I doubt I will be doing any thing else in Java. So, I would like a list of things I would need to pick up. Of course, I would need to learn the core language, but what else?
I did Google around for this and this could be seen as a possible duplicate of "I want to learn Java. Show me how?" but it's not. Java is a huge programming language with lots, of libraries and what I need to learn will depend largely on what I am using Hadoop for. But I suppose it is possible to say something like don't bother learning this. This will be quite useful too.
In my day job, I've just spent some time helping a C++ person to pick up enough Java to use some Java libraries via JNI (Java Native Interface) and then shared memory into their primarily C++ application. Here are some of the key things I noticed:
You cannot manage for anything beyond a toy project without an IDE. The very first thing you should do is download a popular Java IDE (Eclipse is a fine choice, but there are also alternatives including Netbeans and IntelliJ). Do not be tempted to try and manage with vi / emacs and javac / make. You will be living in a cave and not realising it. Once you're up to speed with even basic IDE functions you will be literally dozens of times more poductive than without an IDE.
Learn how to layout a simple project structure and packages. There will be simple walkthroughs of how to do this on the Eclipse site or elsewhere. Never put anything into the default package.
Java has a type system whereby the reference and primitive types are relatively separate for historic / performance reasons.
Java's generics are not the same as C++ templates. Read up on "type erasure".
You may wish to understand how Java's GC works. Just google "mark and sweep" - at first, you can just settle for the naivest mental model and then learn the details of how a modern production GC would do it later.
The core of the Collections API should be learned without delay. Map / HashMap, List / ArrayList & LinkedList and Set should be enough to get going.
Learn modern Java concurrency. Thread is an assembly-language level primitive compared to some of the cool stuff in java.util.concurrent. Learn ConcurrentHashMap, Atomic*, Lock, Condition, CountDownLatch, BlockingQueue and the threadpools from Executors. Good books here are those by Brian Goetz and Doug Lea.
As soon as you want to use 3rd party libraries, you'll need to learn how the classpath works. It's not rocket science, but it is a bit verbose.
If you're a low-level C++ guy, then you may find some of this interesting also:
Java has virtual dispatch by default. The keyword static on a Java method is used to indicate a class method. private Java methods use invokespecial dispatch, which is a dispatch onto the exact type in use.
On an Oracle VM at least, objects comprise two machine words of header (the mark word and the class word). The mark word is a bunch of flags the VM uses - notably for thread synchronization. The class word you can think of as a pointer to the VM's representation of the Class object (which is where the vtables for methods live). Following the class word are the member fields of the instance of the object.
Java .class files are an intermediate language, and not really that similar to x86 object code. In particular there are lots more useful tools for .class files (including the javap disassembler which ships with the JVM)
The Java equivalent of the symbol table is called the Constant Pool. It's typed and it has a lot of information in it - arguably more than the x86 object code equivalent.
Java virtual method dispatch consists of looking up the correct method to be called in the Constant Pool and then converting that to an offset into a vtable. Then walking up the class hierarchy until a not-null value is found at that vtable offset.
Java starts off interpreted and then goes compiled (for Oracle and some other VMs anyway). The switch to compiled mode is done method-by-method on a as-need basis. When benchmarking and perf tuning you need to make sure that you've warmed the system up before you start, and that you should typically profile at the method level to start with. The optimizations that are made can be quite aggressive / optimistic (with a check and a fallback if the assumptions are violated) - so perf tuning is a bit of an art.
Hopefully there's some useful stuff in there to be going on with - please comment / ask followup questions.
Learning "just enough" Java is learning Java. Either you learn all the core principles and language design decisions, or you suffer along making easily avoidable mistakes. Considering that you already know how to program, a lot of the information can be skimmed (with an eye for where it differs from other languages you are intimately familiar).
so you need to learn:
How to get started
The language itself
The core, essential classes
The major Collections
And if you don't have a build framework in place, how to package your compiled code.
Beyond that, nearly every other item you might need to learn depends heavily on what you intend to do. Don't discount the on-line tutorials from Oracle/Sun, they are quite good (compared to other online tutorials).
Hadoop can use C++ : WordCount example in C++
You can't really use Java without knowing these packages in the standard API:
java.lang
java.util
java.io
And, to a lesser degree:
java.text
java.math
java.net
java.lang.reflect
java.util.concurrent
They contain a lot of classes you'll need to use constantly for pretty much any application, and it's a good idea to look through them until you know which classes they contain and what those are good for, lest you end up reinventing wheels.
Take it easy, learning Java could be
pleasant and fast if you already know
C++
Buy these two books:
The JavaTM Programming Language, (4th Edition) Ken Arnold, James
Gosling, Davis Holmes
Effective Java (2nd Edition), Joshua Bosh
You will soon be mastering Java, You will not regret. Good Luck.
Since C++ and Java share common roots, the core language shouldn't give you too much trouble. You will need to become familar with the java SDK, particularly java.lang and the Collections framework (java.util.)
But perhaps learning java is overkill if you don't see yourself using it elsewhere. Hadoop also has bindings to Python - perhaps learning python would be a better alternative? See Java vs Python on Hadoop.
Here is the quickstart for all you will need
I suggest Eclipse (java) to start working, see this for that
Maybe you don't even need to know Java to use Hadoop.
Pig is far enough from simple to advanced usage of Hadoop.
I don't know how familiar are you with other higher level programming languages. Garbage collection is an important function in Java. It would be important to read a bit about the GC in your VM of choice.
Besides the obvious packages, check out the java.util packages for the collection framework. You might want to check out the source of some classes. I suggest HashMap to get the idea of the computing/memory cost of these operations.
Java likes to use streams instead of buffers when processing large amounts of data. That may take some time getting used to.
Java has no unsigned types. Depending on the packets of data you need to process at once you can either use larger variables and streight arythetics (if we're talking about relatively small packets), or you have to (b[i] & 0xff) every time you read for example unsigned bytes. Also note that Java uses network byte order (msbf) when serializing multibyte numbers.
The most beloved design patterns by the API are Singleton, Decorator and Factory. Check the source of JFC itself for best practices, how these patterns are achieved in the language.
... and you can still post more concrete questions on SO :)
Answer 1 :
It is very desirable to know Java. Hadoop is written in Java. Its popular Sequence File format is dependent on Java.
Even if you use Hive or Pig, you'll probably need to write your own UDF someday. Some people still try to write them in other languages, but I guess that Java has more robust and primary support for them.
Most Hadoop tools are not mature enough (like Sqoop, HCatalog and so on), so you'll see many Java error stack traces and probably you'll want to hack the source code someday
Answer 2
It is not required for you to know Java.
As the others said, it would be very helpful depending on how complex your processing may be. However, there is an incredible amount you can do with just Pig and say Hive.
I would agree that it is fairly likely you will eventually need to write a user defined function (UDF), however, I've written those in Python, and it is very easy to write UDFs in Python.
Granted, if you have very stringent performance requirements, then a Java based MapReduce program would be the way to go. However, great advancements in performance are being made all of the time in both Pig and Hive.
So, the short answer to your question is, "No", it is not required for you to know Java in order to perform Hadoop development.
Source :
http://www.linkedin.com/groups/Is-it-must-Hadoop-Developer-988957.S.141072851
Most of the stuff should be pretty familiar to you. I'd just download eclipse and google a tutorial site. Familiarize yourself with classloading, keywords. One tricky thing a lot of C++ guys run into is how to run a java app so that it finds its library classes(sort of analogous to dynamic linking). Learn the difference between the JRE and JDK. If you can get a few hello world type apps working you ought to be able to get a start on hadoop if you follow the tutorials.
You dont need to learn java to use hadoop.
You need to know linux to installand configure hadoop
then you can write your map reduce jobs using the stream line api on any language which understand standard input/output
further you can do more complex map reduce using other libraries like hive etc
even other components of hadoop like hbase/ cassandra also has clients on most of the languages
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
Improve this question
I have an application written in Java. In is stored in several files. It uses different classes with different methods. The code is big and complicated. I think it would be easier to understand the code if I have a graphical model of the code (some kind of directed graph). Are there some standard methods for visualization of code. I am thinking about usage of UML (not sure it is a correct choice). Can anybody recommend me something?
ADDED:
I consider two possibilities:
Creating the graph by hands (explicitly).
Creating graph in an automatic way. For example to use some tools that read the available code and generate some graph describing the structure of the code.
ADDED 2:
It would be nice to have something for free.
I tried using a number of UML tools and found that the reverse-engineering capabilities in most UML tools were not helpful for understanding code. They focus on designing needs and reverse-engineering capabilities often just ends up showing huge pictures of lots of useless information. When I was working on the Microsoft Office codebase, I found using a pen-and-paper more helpful that the typical design/modelling tools.
You typically want to think about doing this in a number of ways:
Use your brain: Someone else mentioned it - there is no substitute to actually trying to understand a code base. You might need to take notes down and refer back to it later. Can tools help? Definitely. But don't expect them to do most of the work for you.
Find documentation and talk to co-workers: There is no better way than having some source describe the main concepts in a codebase. If you can find someone to help you, take a pen and paper, go to him and take lots of notes. How much to bug the other person? In the beginning - as much as is practical for your work, but no amount is too little.
Think about tools: If you are new to a part of a project - you are going to be spending a significant amount of time understanding the code, so see how much help you can get automatically. There are good tools and bad tools. Try to figure out which tools have capabilities that might be helpful for you first. As I mentioned above, the average UML tool focuses more on modeling and does not seem to not be the right fit for you.
Time vs Cost: Sure, free is great. But if a free tool is not being used by many people - it might be that the tool does not work. There are many tools that were create just as an exploration of what could be done, but are not really helpful and therefore just made available for free in hopes that someone else will adopt it. Another way to think about it, decide how much your time is worth - it might make sense to spend a day or two to get a tool to work for you.
Once there, keep these in mind when going trying to understand the project:
The Mile High View: A layered architectural diagram can be really helpful to know how the main concepts in a project are related to one another. Tools like Lattix and Architexa can be really helpful here.
The Core: Try to figure out how the code works with regards to the main concepts. Class diagrams are exceptionally useful here. Pen-and-paper works often enough here, but tools can not only speed up the process but also help you save and share such diagrams. I think AgileJ and Architexa are your best bets here, but your average UML tool can often be good enough.
Key Use Cases: I would suggest tracing atleast one key use case for your app. You likely can get the most important use cases from anyone on your team, and stepping through it will be really helpful. Most IDE's are really helpful here. If you try drawing them, then sequence diagrams arethe most appropriate. For tools here I think MaintainJ, JDeveloper and Architexa are your best bets here.
Note: I am the founder of Architexa - we build tools to help you understand and document Java code, but I have tried to be unbiased above. My intention is to suggest tools and options given that this is what I focused on as part of my PhD.
The most important tool you should use is your brain, and it's free.
There's no reason why you have to use any sort of standard method of visualization, and you can use whatever media you like. Paper, whiteboard, photoshop, visio, powerpoint, notepad: all of these can be effective. Draw a diagram of classes, objects, methods, properties, variables - whatever you think is useful to see in order to understand the application. The audience is not only other members of your team, but also yourself. Create diagrams that are useful for you to look at and quickly understand. Post them around your workspace and look at them regularly to remind yourself of the overall system architecture as you build it.
UML and other code documentation standards are good guidelines for the types of diagrams you can do and the information you should consider including. However, it is overkill for most applications and basically exists for people who can't take personal responsibility for documenting without standards. If you follow UML to the letter, you'll end up spending way too much time on your documentation instead of creating your application.
It is stored in several files. It uses different classes with different methods. The code is big and complicated.
All Java code written outside the school is like that, particularly for a new developer starting on a project.
This is an old question, but as this is coming up in Google searches, I am adding my response here so that it could be useful to the future visitors. Let me also disclose that I am the author of MaintainJ.
Don't try to understand the whole application
Let me ask you this - why do you want to understand the code? Most probably you are fixing a bug or enhancing a feature of the application. The first thing you should not try to do is to understand the whole application. Trying to understand the entire architecture while starting afresh on a project will just overwhelm you.
Believe me when I say this - developers with 10+ years of solid coding experience may not understand how certain parts of the application work even after working on the same project for more than a year (assuming they are not the original developers). They may not understand how the authentication works or how the transaction management works in the application. I am talking about typical enterprise applications with 1000 to 2000 classes and using different frameworks.
Two important skills required to maintain large applications
Then how do they survive and are paid big bucks? Experienced developers usually understand what they are doing; meaning, if they are to fix a bug, they will find the location of the bug, then fix it and make sure that it does not break the rest of the app. If they need to enhance a feature or add a new feature, most of the time, they just have to imitate an existing feature that does a similar thing.
There are two important skills that help them to do this.
They are able to analyze the impact of the change(s) they do while fixing a bug. First they locate the problem, change the code and test it to make sure that it works. Then, because they know the Java language well and the frameworks 'well enough', they can tell if it will break any other parts of the app. If not, they are done.
I said that they simply need to imitate to enhance the application. To imitate effectively, one needs to know Java well and understand the frameworks 'well enough'. For example, when they are adding a new Struts Action class and adding to the configuration xml, they will first find a similar feature, try to follow the flow of that feature and understand how it works. They may have to tweak a bit of the configuration (like the 'form' data being in 'request' than in 'session' scope). But if they know the frameworks 'well enough', they can easily do this.
The bottom line is, you don't need to understand what all the 2000 classes are doing to fix a bug or enhance the app. Just understand what's needed.
Focus on delivering immediate value
So am I discouraging you from understanding the architecture? No, not at all. All I am asking you is to deliver. Once you start on a project and once you have set up the development environment on your PC, you should not take more than a week to deliver something, however small it may be. If you are an experienced programmer and don't deliver anything after 2 weeks, how can a manager know if you really working or reading sports news?
So, to make life easier for everyone, deliver something. Don't go with the attitude that you need to understand the whole application to deliver something valuable. It's completely false. Adding a small and localized Javascript validation may be very valuable to the business and when you deliver it, the manager feels relieved that he has got some value for his money. Moreover, it gives you the time to read the sports news.
As time passes by and after you deliver 5 small fixes, you would start to slowly understand the architecture. Do not underestimate the time needed to understand each aspect of the app. Give 3-4 days to understand the authentication. May be 2-3 days to understand the transaction management. It really depends on the application and your prior experience on similar applications, but I am just giving the ballpark estimates. Steal the time in between fixing the defects. Do not ask for that time.
When you understand something, write notes or draw the class/sequence/data model diagram.
Diagrams
Haaa...it took me so long to mention diagrams :). I started with the disclosure that I am the author of MaintainJ, the tool that generates runtime sequence diagrams. Let me tell you how it can help you.
The big part of maintenance is to locate the source of a problem or to understand how a feature works.
MaintainJ generated sequence diagrams show the call flow and data flow for a single use case. So, in a simple sequence diagram, you can see which methods are called for a use case. So, if you are fixing a bug, the bug is most probably in one of those methods. Just fix it, ensure that it does not break anything else and get out.
If you need to enhance a feature, understand the call flow of that feature using the sequence diagram and then enhance it. The enhancement may be like adding an extra field or adding a new validation, etc. Usually, adding new code is less risky.
If you need to add a new feature, find some other feature similar to what you need to develop, understand the call flow of that feature using MaintainJ and then imitate it.
Sounds simple? It is actually simple, but there will be cases where you will be doing larger enhancements like building an entirely new feature or something that affects the fundamental design of the application. By the time you are attempting something like that, you should be familiar with the application and understand the architecture of the app reasonably well.
Two caveats to my argument above
I mentioned that adding code is less risky than changing existing code. Because you want to avoid changing, you may be tempted to simply copy an existing method and add to it rather than changing the existing code. Resist this temptation. All applications have certain structure or 'uniformity'. Do not ruin it by bad practices like code duplication. You should know when you are deviating from the 'uniformity'. Ask a senior developer on the project to review the changes. If you must do something that does not follow the conventions, at least make sure that it's local to a small class (a private method in a 200 line class would not ruin the application's esthetics).
If you follow the approach outlined above, though you can survive for years in the industry, you run the risk of not understanding the application architectures, which is not good in the long run. This can be avoided by working on bigger changes or by just less Facebook time. Spend time to understand the architecture when you are a little free and document it for other developers.
Conclusion
Focus on immediate value and use the tools that deliver that, but don't be lazy. Tools and diagrams help, but you can do without them too. You can follow my advice by just taking some time of a senior developer on the project.
Some plugins I know for Eclipse:
Architexa
http://www.architexa.com/
nWire
http://www.nwiresoftware.com/
If you want to reverse engineer the code, you should try Enterprise Architect
have you tried Google CodePro Analytix ?
it can for example display dependencies and is free (screenshot from cod.google.com):
Here is a non UML Tool which has very nice visualization features.
You can mapping the lines of code per class / method to colors / side lenght of rectangles.
You can also show the dependencies between the classes.
http://www.moosetechnology.org/
The nice thing is, you can use Smalltalk scripting for displaying what you need:
http://www.moosetechnology.org/docs/faq/JavaModelManipulation
Here you can see how such a visualization looks like:
http://www.moosetechnology.org/tools/moosejee/casestudy
JUDE Community UML used to be able to import Java, but it is no longer the case. It is a good, free tool.
If your app is really complex I think that diagrams won't carry you very far. When diagrams become very complex they become hard to read and lose their power. Some well chosen diagrams, even if generated by hand, might be enough.
You don't need every method, parameter, and return value spelled out. Usually it's just the relationships and interactions between objects or packages that you need.
Here is a another tool that could do the trick:
http://xplrarc.massey.ac.nz/
You can use JArchitect tool, a pretty complete tool to visualize your code structure using the dependency graph, and browse you source code like a database using CQlinq.
JArchitect is free for open source contributors
Some great tools I use -
StarUML (allows code to diagram conversion)
MS Visio
XMind (very very useful for overview of the system)
Pen and Paper!
My organization is considering PDFlib for dynamically creating PDF files (http://www.pdflib.com/) in our Java (Spring/Tomcat) environment.
Does anyone have experiences that they can share about the pro/cons of this Library?
We've been using PDFlib for a few years but we switched to DynaPDF recently (we are not using Java but C++). There never were any issues with the PDFlib - it always worked stable and reliable (and we really used all features including spot colors and importing of other PDFs).
It contains very good documentation and their support is fine, too.
Unfortunately, depending on what features of PDFlib you need, it is very expensive. We requested a 3-platform license without royalties (the PDI-enabled version), and were offered a licence for around 20,000 €. This is a bit expensive for a small company like ours.
So eventually we moved on to DynaPDF, which is less expensive and creates PDF files just as reliable. We got a license including source code for about €1000. I'm not sure if they provide Java wrappers, though.
Also this question might be interesting for you.
Hope that helps.
Iv been using pdfLib for about 3 years now and its been great for me. i guess it really depends on what you want to use it for but for me its been really good. I do a lot of file maniuplations and so far its been able to do everything i need very well. Support can be better but overall its not too bad but the software itself is great.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I find the nature of this question to be quite suited for the practical-minded people on Stack Overflow.
I'm setting out on making a rather large-scale project in Java. I'm not going to go into specifics, but it is going to involve data management, parsing of heterogeneous formats, and need to have an appealing interface with editor semantics. I'm a undergraduate student and think it would evolve into a good project to show my skill for employment -- heck, ideally it would even be the grounds for a startup.
I write to ask you what shortcuts I might not be thinking about that will help with a complicated project in Java. Of course, I am planning to go at it in Eclipse, and will probably use SWT for the GUI. However, I know that Java has the unfortunately quality of overcomplicating everything, and I don't want to get stuck.
Before you tell me that I want to do it in Python, or the like, I just want to reiterate why I would choose Java:
Lots more experience with algorithms in Java, and there will be quite a bit of those.
Want a huge library of APIs to expand functionality. ANTLR, databases, libraries for dealing with certain formats
Want to run it anywhere with decent performance
I am open-minded to all technologies (most familiar with Java, perl, sql, a little functional).
EDIT: For the moment, I am giving it to djna (although the low votes). I think all of your answers are definitely helpful in some respect.
I think djna hit better the stuff I need to watch out for as novice programmer, recognizing that I'm not taking shortcuts but rather trying not to mess up. As for the suggestions of large frameworks, esp. J2EE, that is way too much in this situation. I am trying to offer the simplest solution and one in which my API can be extended by someone who is not a J2EE/JDBC expert.
Thanks for bringing up Apache Commons, although I was already aware. Still confused over SWT vs. Swing, but every Swing program I've used has been butt ugly. As I alluded to in the post, I am going to want to focus most on the file interchange and limited DB features that I have to implement myself (but will be cautious -- I am aware of the concurrency and ACID problems).
Still a community wiki to improve.
Learn/use Spring, and create your project so it will run without a Spring config (those things tend to run out of control), while retaining the possibility to configure the parameters of your environment in a later stage. You also get a uniform way to integrate other frameworks like Hibernate, Quartz, ...
And, in my experience, stay away from the GUI builders. It may seem like a good deal, but I like to retain control of my code all the time.
Google-collections:
http://code.google.com/p/google-collections/
Joda Time for all date and time manipulations.
One of Hibernate or iBATIS to manipulate data in a database
Don't forget the IDE: Eclipse, Netbeans or IDEA if you have some cash and like it
Apache Commons has a lot of time saving code that will you most likely need and can reuse.
http://commons.apache.org/
A sensible build and configuration platform will help you along the way, Ant:
http://ant.apache.org/
or Maven (preferably):
http://maven.apache.org/
Especially as the size of the project, and the number of modules in the project increase.
There's getting the "project" completed efficiently and there are "short cuts". I suspect the following may fall into the "avoiding wasted effort" category rather be truly short cuts but if any of them get you to then end more quickly then I perhaps they help.
1). Decomposition and separation of concerns. You've already identified high-level chunks (UI, persistence layer, parser etc.). Define the interfaces for the provider classes as soon as possible and have all dependent classes work against those interfaces. Pay a lot of attention to the usability of those interfaces, make them easy to understand - names matter. Even something as simple as the difference between setShutdownFlag(true) and requestShutdown(), the second has explicit meaning and hence I prefer it.
Benefits: Easier maintenance. Even during initial development "maintenance" happens. You will want to change code you've already written. Make it easy to get that right by strong encapsulation.
2). Expect iterative development, to be refining and redesigning. Don't expect to "finish" any one component in isolation before you use it. In other words don't take a purely bottom up approach to developing your componenets. As you use them you find out more information, as you implement them you find out more about what's possible.
So enable development of higher level components especially the UI by mocking and stubbing low level components. Something such as JMock is a short-cut.
3). Test early, test often. Use JUnit (or equivalent). You've got mocks for your low level components so you can test them.
Subjectively, I feel that I write better code when I've got a "test hat" on.
4). Define your error handling strategy up front. Produce good diagnostics when errors are detected.
Benefits: Much easier to get the bugs out.
5). Following on from error handling - use diagostic debugging statements. Sprinkle them liberally throughout your code. Don't use System.out.println(), instead use the debugging facilities of your logging library - use java.util.logging. Interactive debuggers are useful but not the only way of analysing problems.
Don't discount Swing for the GUI. There are a lot of good 3rd party Swing libraries available (open source and commercial) depending on your needs e.g.
JGoodies Form Layout
SwingX
JFreeChart
Use logging framework (i.e. Log4J) and plan for it. Reliable logging system saves a lot of time during bug fixing.
For me big time-savers are:
use version-control (Subversion/Git)
use automatic builds (Ant/Make/Maven)
use Continuous-integration (Hudson/CruiseControl)
the apache-commons-libraries are very useful
Test Driven Development
Figure out the functionality you want and write tests to demonstrate the functionality. Then write just enough code to pass the tests. This will have some great side effects:
By writing just enough code to satisfy your requirements you will not be tempted to overbuild, which should reduce complexity and make your code cleaner.
Having tests will allow you to "refactor with confidence" and make changes to the code knowing you're not breaking another part of the system.
You're building quality in. Not only will you have assurance that you code "works, but if you really want to use this code as a sort of "resume" for potential employers, this will show them that you place a lot of value on code quality. This alone will set you apart from the majority of the developers out there.
Aside from that, I would agree that other big time savers are Spring, having an automated build (Maven), and using some form of source control.
For data persistency, if you're not certain that you must use SQL, you should take a look at alternate data-storage libraries:
Prevayler: an in memory database systems, using developing ideas like "crash only components", that make it easy to use and very performatic. May be used even for simple applications where you just need to save some state. It has a BSD License.
BerkleyDB Java Edition: a storage system with different layers of abstraction, so one can use it as a basic key-value storage (almost an in-disk hashtable) to a fully transactional ACID system. Take a look at it's licensing information because even some commercial software may use it for free.
Be aware that they have trade-offs when compared with each other or with a standard SQL database! And, of course, there may be other similar options around, but these are the ones I know and like. :)
PS: I was not allowed to post the respective links because I'm a new user. Just google for them; first hits are the right ones.
You can save time between restarts by using JavaRebel (commercial). Briefly, this tool allows you to write your code in Eclipse and have the code changes picked up instantly.
Spring Roo can not only get your project quickly kickstarted (using a sound architecture) but also reduce your ongoing maintenance. If you're thinking of using Spring, servlets, and/or JPA, you should definitely consider it. It's also easy to layer on things like security.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I've used MyGeneration, and I love it for generating code that uses Data Access Applicaiton Blocks from Microsoft for my Data Access Layer, and keeping my database concepts in sync with the domain I am modeling. Although, it took a steeper than expected learning curve one weekend to make it productive.
I'm wondering what others are doing related to code generation.
http://www.mygenerationsoftware.com
http://www.codesmithtools.com/
Others?
Back in 2000, or so, the company I worked for used a product from Veritas Software (I believe it was) to model components and generate code that integrated components (dlls). I didn't get a lot of experience with it, but it seems that code generation has been the "holy grail" for a long time. Is it practical? How are others using it?
Thanks!
T4 is the CodeSmith killer for Microsoft!!!!
Go check it out. Microsoft doesn't want to destroy their partners so they don't advertise it, but it is a thing to be reckoned with and ITS FREE and comes installed in Visual Studio 2008.
www.olegsych.com
codeplex.com/t4toolbox
www.t4editor.net
I have used LLBLGen and nHibernate successfully to generate Entity and DAL layers.
We use Codesmith and have had great success with it. I am now constantly trying to find where we can implement templates to speed up mundane processes.
I've done work with CSLA and used codesmith to generate my code using the CSLA templates.
codesmithtools.com
If your database is your model, SubSonic has an excellent code generator that as of v2.1, no longer requires ActiveRecord (you can use the Repository Pattern instead). It's less flexible than others, but there are customizations that can be made in the stock templates.
I have used CodeSmith and MyGeneration, wasn't overly keen on either, felt somewhat terse to use, learning template languages etc.
SubSonic is what we sometimes use here to generate a Data Access Layer. Used in the right size projects, it is a fantastic time saving tool. clicky
I see code generation harmfull as well, but only if you use 3rd party tools like codesmith and mygeneration. I have 2 stored procedures that generate my domain objects and domain interfaces
Example
GenerateDomainInterface 'TableName'
Then I just copy and paste it into visual studio. Works pretty awesome for those tasks I hate to do.
Two framworks I use often.
Ragel
Something worth checking out is Ragel. It's used to generate code for state machines.
You just add some simple markup to your source code, then run a generator on
Ragel generates code for C, C++, Objective-C, D, Java and Ruby, and it's easy to mix it with your regular source.
Ragel even allow you to execute code on state transitions and such. It makes it easy to create file format and protocol parsers.
Some notable projects that user Ragel are, Mongrel, a great ruby web server. And Hpricot, a ruby based html-parser, sort of inspired by jQuery.
Another great feature of Ragel is how it can generate graphviz-based charts that visualize your state machines. Below is an example taken from Zed Shaw's article on ragel state charts.
(source: zedshaw.com)
XMLBeans
XMLBeans is a java-based xml-binding. It's got a great workflow and I use it often.
XMLBeans processen an xml-schema that describes your model, into a set of java-classes that represents that model. You can programmatically create models then serialise them to and from xml.
I have used CodeSmith. Was pretty helpful.
I love to use
SubSonic. Open source is the way to go with code generation I think because it is very easy to modify the templates and the core as they always tend to have bugs or one or two things you want to do that is not built in.
I've used code generation for swizzle functions in a vector math library. I used a custom PERL script for it. None of the FLOSS generators I looked at seemed well-suited to creating swizzle functions
I generally use C++ templates, rather than code generation.
I've primarily used LLBLGen Pro to generate code. It offers a variety of patterns to use for generation and you can supply your own patters, just like CodeSmith. The customer support has been excellent.
Essentially, I generate my business objects and DAL using LLBLGen and keep them up to date. The code templates have sections where you can add your own logic that won't be wiped out during regeneration. It's definitely worth taking a look.
We custom build our code generation using linq and XML literals (VB).
We haven't found a way to break the solutions into templates yet; however, those two technologies make this task so trivial, I don't think we will.
I'd consider code generation harmful as it bloats the codebase without adding new logic or insight. Ideally one should raise the level of abstraction, use data files, templates or macros etc. to avoid generating large amounts of boiler plate code. It helps you get things done quickly but can hurt maintainability in the long run.
If your chosen programming language becomes much less painful by generating it from some template language, that seems indicate you'd save even more time by doing the higher level work in another, perhaps more dynamic language. YMMV.
LLBLGen Pro is an excellent tool which allows you to write a database agnostic solution. It's really quick to pick up the basic features. Advanced features aren't much more challenging. I highly recommend you check it out.
I worked for four years as the main developer in a web agency, as I wrote from ground-up my first two or three websites, I soon realized that it was going to be a very boring task to do it all the times. So I started writing my own web site generator engine.
My starting point was this site http://www.codegeneration.net/. I took one of their examples for a simple crud generation and extended to the level that i was generating entire sites with it.
I used xml for the definition of various parts of the website (pages, datalists, joins, tables, form management). The generated web sites were completely detached from the generator, so the generated website could also be modified by hand.
Here is their article http://www.codegeneration.net/tiki-read_article.php?articleId=19.
I've done several one-off's of code generation using Castor to create Java source code based on XSD's. The latest use was to create Java classes for an Open Travel Association implementation. The OTA Schema is pretty hairy and would have been a bear to do by hand. Castor did a pretty good job given the complexity of the schema.
Python.
I have used MyGeneration which uses C# to write your code templates. However, I started using Python and I found that I can write code that generates other code faster in that language than I would if written in C#. Subsequently, I have used Python to code gen C#, TSQL, and VB.
Generally, code that generates other code tends to be harder to follow by its very nature. Python's cleaner syntax helps tremendously by making it more readable and more maintainable than the equivalent in C#.
codesmith for .net
I wrote a utility where you specify a table and it generates an Oracle trigger which records all changes to that table. Makes logging really simple.
There's another one I wrote that generates a Delphi class that models any database table you give it, but I consider it a code smell to do that, so I rarely use it.
At the company we've written our own to generate most of our entity/dalc/business classes and the related stored procedures as it took only a little time and we had some special requirements. Although I'm sure we could've achieved the same thing using an existing generator, it was a fun little project to work on.
Codesmith's been recommended by many people and it does seem to be a good one. Personally all I need from a code generator is to make it easy to amend templates.
I use the hibernate tools in myEclipse to generate domain models and DAO code from my data model. It seems to work pretty well (there are some issues if you write custom methods in your DAO's, these seem to get lost on over-writes), but generally it seems to work pretty well- especially in conjunction with Spring.
SubSonic is great!! The query capability is easy to grasp, and the stored procedure implementation is truly awesome. I could go on and on. It makes you productive instantly.
I mainly code in C# and when i need code generation I do it in XLST when the source could be simply converted to XML or a ruby script when it's more complex.
If the code generation part need frequent modifications by more than a few developers CodeSmith works pretty well (And is easier to learn than XSLT or ruby by new developers).
Outsystems' Agile Platform can be used to generate open source, well documented C# and Java applications. Because it has also several features related to deploying, managing and changing, most people end up using it not just to generate the code but actually to manage the full life-cycle of web applications.
For some time, I've used a home-grown script/template language for code generation. (I've used that languge mostly for no other reason than to find use for my little pet project)
Recently, I've created some SQL*PLUS scripts to create database access code (no Hibernate for us...)
MyGeneration all the way!
MyGeneration is an extremely flexible template based code generator written in Microsoft.NET. MyGeneration is great at generating code for ORM architectures. The meta-data from your database is made available to the templates through the MyMeta API.