Generate unique ID

Generate unique ID - java

I need to generate unique IDs for my application. When I use UUID.randomUUID().toString(), I get a code that I assume is unique, but it is very long.
I am not sure how unique the codes would be if I generated them from a Java timestamp or a random string.
I need unique alphanumeric codes that are only 8-10 characters long. How can I get them?
I am using a MySQL database.
Is generating the unique code on the database side the best way, or can such short (but unique) codes be generated in Java?
Any suggestions with example code would be very helpful.

I use the RandomStringUtils.randomAlphanumeric() method from commons-lang for this:
import org.apache.commons.lang.RandomStringUtils;

public class IdGenerator {
    public static final int ID_LENGTH = 10;

    // Returns a random alphanumeric string; collisions are unlikely but still possible.
    public String generateUniqueId() {
        return RandomStringUtils.randomAlphanumeric(ID_LENGTH);
    }
}
If you are using Maven, make sure commons-lang is among the project's dependencies:
<dependency>
    <groupId>commons-lang</groupId>
    <artifactId>commons-lang</artifactId>
    <version>2.6</version>
</dependency>
Is generating the unique code on the database side the best way, or can such short (but unique) codes be generated in Java?
That depends on you and your project. Is ID generation part of the business logic? If it is, and all the logic is written in Java, generate the ID in Java. If all or part of the logic is delegated to the database, generate the ID there (but then you will depend strongly on that particular database).

Do you have any specific constraints to take into account, such as cross-application uniqueness? If not, MySQL is quite capable of generating IDs by itself: define an AUTO_INCREMENT column and leave it out at insert time (or insert NULL for it). MySQL then fills it with the next available ID, which is unique and requires no work from you.
It won't be an alphanumeric string (I'm not sure whether you meant that as a requirement or a restriction), but if all you require is uniqueness, it's more than enough. 8-10 randomly generated alphanumeric characters cannot guarantee uniqueness on their own, so you would have to check for collisions when inserting into the database.
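To make the "insert check" concrete, here is a minimal sketch of random 10-character IDs with a retry on collision. The class name ShortIdGenerator and the in-memory Set are assumptions for illustration only; in a real application the set would be replaced by a UNIQUE column in MySQL, retrying the insert when the database reports a duplicate key.

```java
import java.security.SecureRandom;
import java.util.Set;

// Hypothetical helper: random alphanumeric IDs with a retry on collision.
class ShortIdGenerator {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds one random ID; uniqueness is NOT guaranteed by this method alone.
    static String randomId(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    // Retries until the ID is new. 'taken' stands in for a UNIQUE column:
    // Set.add returns false when the value is already present.
    static String nextUniqueId(Set<String> taken, int length) {
        String id;
        do {
            id = randomId(length);
        } while (!taken.add(id));
        return id;
    }
}
```

With 62 possible characters per position the retry loop should rarely run more than once, but the check is what actually provides the uniqueness.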

Is generating the unique code on the database side the best way, or can such short (but unique) codes be generated in Java?
Databases are designed to be able to generate unique IDs where needed. I doubt anything you (or I) could code would be a 'better' variant of that.
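If the database hands out a numeric ID and a short alphanumeric code is still wanted, one possible approach (an assumption on my part, not something the question specifies) is to render the number in base 36. The sketch below uses a hypothetical IdCodec class and a fixed width of 8 characters.

```java
// Hypothetical helper: turns a numeric database ID into a fixed-width code.
class IdCodec {
    private static final int WIDTH = 8; // 36^8 is roughly 2.8 trillion values

    // Encodes a non-negative ID as upper-case base 36, left-padded with zeros.
    static String encode(long id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        String base36 = Long.toString(id, 36).toUpperCase();
        if (base36.length() > WIDTH) {
            throw new IllegalArgumentException("id does not fit in " + WIDTH + " characters");
        }
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = base36.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(base36).toString();
    }

    // Reverses encode(); Long.parseLong accepts upper-case base-36 digits.
    static long decode(String code) {
        return Long.parseLong(code, 36);
    }
}
```

The codes inherit uniqueness from the database sequence, but they are also sequential and therefore guessable, which may or may not matter for the application.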

I have written a simple service that generates semi-unique, non-sequential 64-bit long numbers. It can be deployed on multiple machines for redundancy and scalability, and it uses ZeroMQ for messaging. For more information on how it works, see the GitHub page: zUID

Take a look at: UIDGenerator.java
You can customize it (unique within the process only, or globally); it is easy to use and fast:
private static final UIDGenerator SCA_GEN = new UIDGenerator(new ScalableSequence(0, 100));
.......
SCA_GEN.next();
You can change the implementation to reduce the size of the ID (at the cost of other tradeoffs).
See my benchmarking results at:
http://zoltran.com/roller/zoltran/entry/generating_a_unique_id
or run them yourself.

On whether ID generation should be done in the database or on the Java side:
You have to answer this yourself, depending on the requirements of your application:
1) One way is to use System.currentTimeMillis(). But if your application runs in a clustered environment, you may end up with duplicate values.
http://www2.sys-con.com/itsg/virtualcd/java/archives/0512/Westra/index.html
2) Another way is to use a UUID generator. It helps if you have different databases that need to be merged: with this method you don't have to worry about duplicate IDs when merging databases.
https://marketplace.informatica.com/solutions/mapping_uuid_using_java
There may be other factors you want to consider.
Given your question, the UUID method is the way to go.
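The difference between the two options can be seen even on a single machine. The following sketch (the class name IdSourceComparison is made up for illustration) counts how many distinct values each source yields when called repeatedly in a tight loop.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical comparison of two ID sources discussed above.
class IdSourceComparison {
    // Reads the millisecond clock n times and counts the distinct values seen.
    static int distinctTimestamps(int n) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(System.currentTimeMillis());
        }
        return seen.size();
    }

    // Generates n random UUIDs and counts the distinct values seen.
    static int distinctUuids(int n) {
        Set<UUID> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(UUID.randomUUID());
        }
        return seen.size();
    }
}
```

Many calls land in the same millisecond, so the timestamp count comes out far below n, while a repeated random UUID is extremely unlikely.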

Related

How to retrieve all SQL-queries from Java source code?

We have many Java Spring projects which use Sybase database.
We want to migrate it to MSSQL.
One of the tasks is to develop a script that finds all SQL queries used in the projects' source code. In addition, the projects make broad use of stored procedures.
What is an appropriate approach for this?
@Override
public void update(int id, Entity entity) {
    jdbcTemplate.update(
        "UPDATE exclusion SET [enabled] = :enabled WHERE [id] = :id",
        HashMapBuilder.<String, Object>builder()
            .put("id", id)
            .put("enabled", entity.enabled)
            .build()
    );
}
This is the easiest case.
As a first step, we want to search the source code with a regex built from a list of SQL keywords.

In essence, you want to find any (SQL) string being fed to a jdbc call.
This means your tool must know what the jdbc methods are (e.g., "jdbcTemplate.update"), and which argument of each method is the string intended to be SQL. That's fairly easy, since it is documented.
What is hard is to find the string, because you assemble it dynamically; there's no guarantee that the entire SQL string is actually sitting as a direct argument to the function call. It might be computed by combining SQL string fragments using "+" and arbitrary function calls.
This means you have to parse the Java in a compiler sense, know what the meaning of each symbol is, and trace values through the dataflows in the code.
There's no way on earth a regex can do this reliably. (You can do it badly, and maybe that's good enough for you; I suggest hunting for all jdbc method call names.)
There's a worse problem: once you've figured out what the SQL string is, you now need to know whether it is MSSQL-compliant. That requires parsing the abstract string (remember, it is assembled from a bunch of fragments) with an MSSQL-compliant parser (again, no regex can do context-free parsing) and complaining about the ones that don't parse.
Even that may not be enough if MSSQL has statements that look identical to Sybase statements but mean different things.
This is a really hard problem to solve well using automation. (There are research papers that describe all of the above activities.)
I think what you will have to do is find all SQL calls, and hand-inspect each for compatibility.
Next time, you should build your application with a database access layer. Then all the SQL calls are in one place.
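For the "do it badly" route of hunting for jdbc method call names, a crude sketch might look like the following. It assumes the template variable is literally named jdbcTemplate, and the list of method names is illustrative, not exhaustive; it only reports call sites for hand inspection and does not recover the SQL itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical scanner: lists lines that contain a call on 'jdbcTemplate'.
class JdbcCallFinder {
    // Illustrative method names; extend the list for your own code base.
    private static final Pattern JDBC_CALL = Pattern.compile(
            "\\bjdbcTemplate\\s*\\.\\s*(update|batchUpdate|execute|query\\w*)\\s*\\(");

    // Returns "lineNumber: trimmed text" for every line with a matching call.
    static List<String> findCalls(String source) {
        List<String> hits = new ArrayList<>();
        String[] lines = source.split("\\R"); // split on any line break
        for (int i = 0; i < lines.length; i++) {
            if (JDBC_CALL.matcher(lines[i]).find()) {
                hits.add((i + 1) + ": " + lines[i].trim());
            }
        }
        return hits;
    }
}
```

Calls made through differently named variables, wrappers, or SQL assembled elsewhere will be missed, which is exactly the limitation described above.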

JAVA: How to automatically write unique ID number to a .CSV file

I'm writing a Java desktop application that writes "ID, Name, Address, Phone number" into a .CSV file, then reads the file and shows it in a JTable. The problem is that the ID needs to be a unique integer that is written automatically: each new ID must differ from every previously written ID. I tried a method that increases the ID by 1 every time a button is clicked, but if I exit the program and run it again, the ID starts again from 0, the value I initialized it to.
Edit: I'm new to programming.

The best option is an out-of-the-box solution: use the UUID.randomUUID() method. It gives you a unique ID (a 128-bit value, usually handled as a string rather than as a plain integer).
Second option: write the last used ID to persistent storage (a file, a DB, or similar). When the program starts, read the last used ID and generate the next value from it. This way you can use a numeric sequence. If thread safety is an issue, you can hold the value in an AtomicLong (but that won't help if you run the app as two separate processes).
Third: use the timestamp, which you can get as a long (a simple solution, no tracking of previous values needed).

There are essentially two approaches to this:
Use a UUID:
UUIDs are big random numbers. There is a chance that you'll get the same
one twice, but the probability is so low as to be negligible, because
the number space is so unimaginably huge
get one with java.util.UUID.randomUUID()
Use an atomic identifier source.
This is just something with a lock to prevent concurrent access, that
emits unique numbers on request
A very simple identifier generator uses synchronized to ensure atomicity:
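public class UniqueIdGenerator {
    private long currentId;

    // The constructor name must match the class name.
    public UniqueIdGenerator(long startingId) {
        this.currentId = startingId;
    }

    // synchronized makes the read-and-increment atomic; returns long to match the field.
    public synchronized long getUniqueId() {
        return currentId++;
    }
}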
(You can also use AtomicLong, or let a database engine take care of atomicity for you)
There are more advanced strategies for making this work in a distributed system -- for example, the generator could be accessible as a web service. But this is probably beyond the scope of your question.
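For comparison, the AtomicLong variant mentioned above could be sketched like this. The class name AtomicIdGenerator is made up for the example, and it has the same limitation as any in-memory counter: it only protects against concurrent threads within one process.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free counterpart of a synchronized ID generator.
class AtomicIdGenerator {
    private final AtomicLong nextId;

    AtomicIdGenerator(long startingId) {
        this.nextId = new AtomicLong(startingId);
    }

    // getAndIncrement returns the current value and advances it atomically.
    long getUniqueId() {
        return nextId.getAndIncrement();
    }
}
```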

You have to persist the last written ID and there are many different ways you could think of.
Writing ID to a file
Writing ID to User-Preferences (maybe a windows-registry entry?)
You have to think about how unique the ID needs to be. What if you run the program as two different users on the same machine? What if you run the program on two different machines?

At application start, and every time you write to your .csv file, you could set your ID counter to max(IDs in your .csv), then add 1 each time you create a new entry.
You might consider using a small embedded database (e.g. Apache Derby) instead of writing .csv files. That might be a "cleaner" solution, because you can rely on database operations to guarantee that behaviour.
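The max(ID) + 1 idea can be sketched as follows. This is a minimal version under two assumptions: the ID is the first comma-separated field on each line, and any line whose first field is not a number (a header row, a blank line) can be ignored. The class name CsvIdAllocator is invented for the example.

```java
import java.util.List;

// Hypothetical helper: derives the next ID from the rows already in the CSV.
class CsvIdAllocator {
    // Returns max(existing IDs) + 1, or 1 when no numeric ID is found.
    static long nextId(List<String> csvLines) {
        long max = 0;
        for (String line : csvLines) {
            String firstField = line.split(",", 2)[0].trim();
            try {
                max = Math.max(max, Long.parseLong(firstField));
            } catch (NumberFormatException e) {
                // Header row or blank line: skip it.
            }
        }
        return max + 1;
    }
}
```

At startup the lines could come from something like Files.readAllLines(Paths.get("contacts.csv")), where the file name is only a placeholder.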
Best regards!

If the ID is required as a long and the environment is not multi-threaded, System.nanoTime() can be used.
Otherwise, for multi-threaded environments, there are several options:
java.security.SecureRandom
java.util.UUID.randomUUID() --> internally uses SecureRandom
File.createTempFile(prefix, suffix).getName() --> internally uses SecureRandom
If a numeric output is required, String.hashCode() can be applied to the result, but note that it returns a 32-bit int, so collisions become much more likely.

MarkLogic document identity column

Coming from the relational world, I am looking at using some sort of identity / auto-increment mechanism for the id field of my documents. Is there anything in MarkLogic that sets this?
Secondly, is it then appropriate to use said id field in the URI when the document is PUT into ML (i.e. /books/123)?

A document's unique identifier is its URI. A URI can be (almost) any string. However there are several conventions that most people use when implementing a strategy to define URIs. For example, you can use slashes like UNIX paths, to define (implicit) directories. It's also helpful to add an extension, like .xml or .json to indicate the mime type of the document.
I'd advise you to stay away from sequential identifiers. You'll need to lock on whatever is generating the numbers. In a distributed system, like MarkLogic, that guarantees consistency, this is expensive. Sequential numbers also make it more difficult to integrate data. What if another collection of books starts at the same seed?
I'd suggest using something like a UUID in your URIs to provide global uniqueness without locking.
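On the client side, building such a URI needs no coordination at all. A minimal Java sketch, assuming a "/books/" prefix and a ".json" extension purely for illustration (the class name DocumentUris is also made up):

```java
import java.util.UUID;

// Hypothetical helper: builds a document URI from a random UUID.
class DocumentUris {
    // Directory-style prefix plus an extension that hints at the content type.
    static String newBookUri() {
        return "/books/" + UUID.randomUUID() + ".json";
    }
}
```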

Best approach for automatically generating I18N key strings in Java

I've taken a look at the following post to grasp a basic idea of how to deal with I18N:
Where to place i18n key strings in Java
But more than that, I have an intention of automatically generating a Java class with constants storing key strings of I18n property files. I know it could be feasibly done using Java itself, however, I wonder which is the best approach (using Ant, Batch or something else...)?

The best approach is to give the identifiers yourself.
The ID can give the translator a hint about what the string is for.
In fact, some consistency in naming convention is best:
dialog.print.title = Print
menu.file.print = Print
The two entries have to be translated differently in some languages (for instance in French it would be "Imprimer" for a button and "Impression" for a label / title).
Think of it this way: would you like to maintain code in which all variable names are machine generated?
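Hand-written keys and a generated constants class are not mutually exclusive: the keys stay meaningful, and only the Java constants are derived from them. Below is a rough sketch of such a generator in plain Java. The class name I18nKeysGenerator and the naming rule (upper-case, non-alphanumerics replaced by underscores) are my own assumptions, and keys containing quotes or backslashes are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical generator: emits Java source with one constant per property key.
class I18nKeysGenerator {
    static String generate(String className, String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder out = new StringBuilder("public final class " + className + " {\n");
        // TreeSet gives a stable, sorted order for the generated constants.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            String constant = key.toUpperCase(Locale.ROOT).replaceAll("[^A-Z0-9]", "_");
            out.append("    public static final String ").append(constant)
               .append(" = \"").append(key).append("\";\n");
        }
        return out.append("}\n").toString();
    }
}
```

The returned source text could then be written to a file by an Ant task or any other build step; that part is left out here.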

Simple properties to string conversion in Java

Using Java, I need to encode a Map<String, String> of name value pairs to store into a String, and be able to decode it again. These will be stored in a database column, and will probably usually be short and simple, so the common case should produce a simple nice looking line, but shouldn't corrupt the data, even if it contains unexpected characters, etc.
How would you choose to do it such that:
The encoded form is a single, human readable line
It doesn't require a big library or much context to encode / decode
Any delimiters are properly escaped
Url encoding? JSON? Do it yourself? Please specify any helper libraries or methods you'd use.
(Edited to specify more context and requirements as requested.)

As @Uri says, additional context would be good. I think your primary concern should not be the particular encoding scheme, since rolling your own is pretty easy for a simple Map<String, String>.
An interesting question is: what will this intermediate string encoding be used for?
if it's purely internal, an ad-hoc format is fine eg simple concatenation:
key1|value1|key2|value2
if humans might read it, a format like Ruby's map declaration is nice:
{ first_key => first_value,
second_key => second_value }
if the encoding is to send a serialised map over the wire to another application, the XML suggestion makes a lot of sense as it's standard-ish and reasonably self-documenting, at the cost of XML's verbosity.
<map>
<entry key='foo' value='bar'/>
<entry key='this' value='that'/>
</map>
if the map is going to be flushed to file and read back later by another Java application, @Cletus' suggestion of the Properties class is a good one, and has the additional benefit of being easy to open and inspect by human beings.
Edit: you've added the information that this is to store in a database column - is there a reason to use a single column, rather than three columns like so:
CREATE TABLE StringMaps
(
    map_id NUMBER NOT NULL, -- ditch this if you only store one map...
    key VARCHAR2(255) NOT NULL, -- lengths are illustrative; VARCHAR2 needs one
    value VARCHAR2(4000)
);
As well as letting you store more semantically meaningful data, this moves the encoding/decoding into your data access layer more formally, and allows other database readers to easily see the data without having to understand any custom encoding scheme you might use. You can also easily query by key or value if you want to.
Edit again: you've said that it really does need to fit into a single column, in which case I'd either:
use the first pipe-separated encoding (or whatever exotic character you like, maybe some unprintable-in-English unicode character). Simplest thing that works. Or...
if you're using a database like Oracle that recognises XML as a real type (and so can give you XPath evaluations against it and so on) and need to be able to read the data well from the database layer, go with XML. Writing XML parsers for decoding is never fun, but shouldn't be too painful with such a simple schema.
Even if your database doesn't support XML natively, you can just throw it into any old character-like column-type...

Why not just use the Properties class? That does exactly what you want.
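A minimal round trip through java.util.Properties might look like the sketch below (the class name PropertiesCodec is invented). One caveat against the stated requirements: store() always writes a leading comment line with the current date and puts each entry on its own line, so the result is safely escaped but not a single-line encoding.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical wrapper around Properties.store / Properties.load.
class PropertiesCodec {
    // Serialises the map; Properties escapes separators and line breaks itself.
    static String encode(Map<String, String> map) {
        Properties props = new Properties();
        props.putAll(map);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // null: no extra comment besides the date line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Parses the text back into a plain map.
    static Map<String, String> decode(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            map.put(key, props.getProperty(key));
        }
        return map;
    }
}
```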

I have been contemplating a similar need of choosing a common representation for the conversations (transport content) between my clients and servers via a facade pattern. I want a representation that is standardized, human-readable (brief), robust, fast. I want it to be lightweight to implement and run, easy to test, and easy to "wrap". Note that I have already eliminated XML by my definition, and by explicit intent.
By "wrap", I mean that I want to support other transport content representations such as XML, SOAP, possibly Java properties or Windows INI formats, comma-separated values (CSV) and that ilk, Google protocol buffers, custom binary formats, proprietary binary formats like Microsoft Excel workbooks, and whatever else may come along. I would implement these secondary representations using wrappers/decorators around the primary facade. Each of these secondary representations is desirable, especially to integrate with other systems in certain circumstances, but none of them is desirable as a primary representation due to various shortcomings (failure to meet one or more of my criteria listed above).
Therefore, so far, I am opting for the JSON format as my primary transport content representation. I intend to explore that option in detail in the near future.
Only in cases of extreme performance considerations would I skip translating the underlying conventional format. The advantages of a clean design include good performance (no wasted effort, ease of maintainability) for which a decent hardware selection should be the only necessary complement. When performance needs become extreme (e.g., processing forty thousand incoming data files totaling forty million transactions per day), then EVERYTHING has to be revisited anyway.
As a developer, DBA, architect, and more, I have built systems of practically every size and description. I am confident in my selection of criteria, and eagerly await confirmation of its suitability. Indeed, I hope to publish an implementation as open-source (but don't hold your breath quite yet).
Note that this design discussion ignores the transport medium (HTTP, SMTP, RMI, .Net Remoting, etc.), which is intentional. I find that it is much more effective to treat the transport medium and the transport content as completely separate design considerations, from each other and from the system in question. Indeed, my intent is to make these practically "pluggable".
Therefore, I encourage you to strongly consider JSON. Best wishes.

Some additional context for the question would help.
If you're going to be encoding and decoding at the entire-map granularity, why not just use XML?

As @DanVinton says, if you need this for internal use (by "internal use" I mean it is used only by my components, not components written by others), you can concatenate key and value.
I prefer to use one separator between a key and its value, and a different one between one entry and the next:
Instead of
key1+SEPARATOR+value1+SEPARATOR+key2 etc
I code
key1+SEPARATOR_KEY_AND_VALUE+value1+SEPARATOR_KEY(n)_AND_KEY(N+1)+key2 etc
If you have to debug, this form is clearer (by design too).
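Here is a sketch of the two-separator idea with escaping added, so that keys or values containing a separator survive the round trip. The separator characters (';' and '='), the backslash escape and the class name MapCodec are all choices made for this example, and malformed input is not validated.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-line codec: key=value;key=value with backslash escapes.
class MapCodec {
    private static final char PAIR_SEP = ';'; // between entries
    private static final char KV_SEP = '=';   // between a key and its value
    private static final char ESCAPE = '\\';

    static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) sb.append(PAIR_SEP);
            appendEscaped(sb, e.getKey());
            sb.append(KV_SEP);
            appendEscaped(sb, e.getValue());
        }
        return sb.toString();
    }

    // Escapes separators and the escape character; line breaks become \n
    // so the encoded form stays on one line.
    private static void appendEscaped(StringBuilder sb, String s) {
        for (char c : s.toCharArray()) {
            if (c == '\n') {
                sb.append(ESCAPE).append('n');
                continue;
            }
            if (c == PAIR_SEP || c == KV_SEP || c == ESCAPE) sb.append(ESCAPE);
            sb.append(c);
        }
    }

    static Map<String, String> decode(String text) {
        Map<String, String> map = new LinkedHashMap<>();
        if (text.isEmpty()) return map;
        StringBuilder current = new StringBuilder();
        String key = null;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == ESCAPE && i + 1 < text.length()) {
                char next = text.charAt(++i);          // take the escaped char literally
                current.append(next == 'n' ? '\n' : next);
            } else if (c == KV_SEP && key == null) {   // end of key
                key = current.toString();
                current.setLength(0);
            } else if (c == PAIR_SEP) {                // end of entry
                map.put(key, current.toString());
                key = null;
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        map.put(key, current.toString());              // last entry
        return map;
    }
}
```

In the common case the output is as plain as a=1;b=2, and the escapes only show up when the data actually contains a separator.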

Check out the apache commons configuration package. This will allow you to read/save a file as XML or properties format. It also gives you an option of automatically saving the property changes to a file.
Apache Configuration

I realise this is an old "deadish" thread, but I've got a solution not posited previously which I think is worth throwing into the ring.
We store "arbitrary" attributes (i.e. created by the user at runtime) of geographic features in a single CLOB column in the DB in the standard XML attributes format. That is:
name="value" name="value" name="value"
To create an XML element you just "wrap up" the attributes in an xml element. That is:
String xmlString = "<arbitraryAttributes " + arbitraryAttributesString + " />";
"Serialising" a Properties instance to an xml-attributes-string is a no-brainer... it's like ten lines of code. We're lucky in that we can impose on the users the rule that all attribute names must be valid xml-element-names; and we xml-escape (i.e. &quot; etc) each "value" to avoid problems from double-quotes and whatever in the value strings.
It's effective, flexible, fast (enough) and simple.
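The "ten lines of code" could look roughly like the sketch below. It is not our production code: the class name XmlAttributeCodec is invented, attribute names are assumed to be valid XML names already, and only the five predefined XML entities are escaped.

```java
import java.util.Map;

// Hypothetical serialiser: turns a map into name="value" name="value" ...
class XmlAttributeCodec {
    // Escapes the characters that are unsafe inside a quoted attribute value.
    static String escape(String value) {
        return value.replace("&", "&amp;") // must run first
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;")
                    .replace("'", "&apos;");
    }

    // Names are written as-is; they are assumed to be valid XML names.
    static String toAttributeString(Map<String, String> attributes) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append("=\"").append(escape(e.getValue())).append('"');
        }
        return sb.toString();
    }

    // Wraps the attribute string in a single empty element.
    static String wrap(String attributeString) {
        return "<arbitraryAttributes " + attributeString + " />";
    }
}
```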
Now, having said all that... if we had the time again, we'd just totally divorce ourselves from the whole "metadata problem" by storing the complete unadulterated uninterpreted metadata xml-document in a CLOB and use one of the open-source metadata editors to handle the whole mess.
Cheers. Keith.
