Generate a range of promotion codes that are not guessable - Java

I'm looking for a way to generate a range of promotion codes. It would be trivial if it weren't for these two requirements: it needs to be a range (not saving every single promotion code in a database) so it stays fast, and the codes must not be guessable, so it cannot generate codes like 000-000-001, 000-000-002, 000-000-003, and so on.
Is there an algorithm to solve this problem? I could try to solve it with some sort of hashing, but trying to solve this security problem myself might leave the service open to exploits that I didn't think of.

I think your first requirement (not saving every promotional code in a database) is problematic.
The question is, is it allowed to redeem a single promotional code multiple times?
If this is not allowed, then you have to store the already-redeemed codes in some persistent data store anyway, so why not store the generated codes there from the beginning, together with a flag indicating whether each has been redeemed or not?
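If you store the codes anyway, they no longer need to be reproducible, so you can draw them from a cryptographically strong source. A minimal sketch of that approach, assuming a hypothetical saveCode(code, redeemed) persistence method:
import java.security.SecureRandom;

SecureRandom random = new SecureRandom();
random.longs(numCodes, 100_000_000_000_000L, 1_000_000_000_000_000L)
      .forEach(code -> saveCode(code, false)); // persist each code with a "redeemed = false" flag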
If you don't want to store all codes / can't store all codes, you could still use a Random with a seed unique to your current campaign:
long seed = 20190921065347L; // identifies your current campaign
Random r = new Random(seed);
for (int i = 0; i < numCodes; i++) {
    System.out.println(r.nextLong());
}
or
long seed = 20190921065347L; // identifies your current campaign
Random r = new Random(seed);
r.longs(numCodes, 100_000_000_000_000L, 1_000_000_000_000_000L)
    .forEach(System.out::println);
To find out whether a code is valid you can generate the same codes again:
long seed = 20190921065347L; // identifies your current campaign
Random r = new Random(seed);
System.out.println(
    r.longs(numCodes, 100_000_000_000_000L, 1_000_000_000_000_000L)
        .anyMatch(l -> l == 350160558695557L));

Would something like this work?
Random r = new Random();
long start = 1_000_000_000;
long end = 10_000_000_000L;
long n = r.longs(1, start, end).reduce(0, (a, b) -> b); // keep the single generated value
String s = String.format("%,d", n).replace(",", "-"); // digit grouping, with commas swapped for dashes
System.out.println(s);
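With those bounds the generated value always has ten digits, so the printed code comes out as dash-separated groups such as 1-234-567-890, close to the format shown in the question.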


Ektorp CouchDb: Query for pattern with multiple contains

I want to query multiple candidates for a search string which could look like "My sear foo".
Now I want to look for documents that have a field containing one (or more) of the entered strings (split on whitespace).
I found some code which allows me to do a search by pattern:
@View(name = "find_by_serial_pattern", map = "function(doc) { var i; if(doc.serialNumber) { for(i=0; i < doc.serialNumber.length; i+=1) { emit(doc.serialNumber.slice(i), doc);}}}")
public List<DeviceEntityCouch> findBySerialPattern(String serialNumber) {
    String trim = serialNumber.trim();
    if (StringUtils.isEmpty(trim)) {
        return new ArrayList<>();
    }
    ViewQuery viewQuery = createQuery("find_by_serial_pattern").startKey(trim).endKey(trim + "\u9999");
    return db.queryView(viewQuery, DeviceEntityCouch.class);
}
which works quite nicely when looking for a single pattern. But how do I have to modify my code to get multiple "contains" matches on doc.serialNumber?
EDIT:
This is the current workaround, but I guess there must be a better way.
It also only implements OR logic: an entry matching term1 or term2 ends up in the list.
@View(name = "find_by_serial_pattern", map = "function(doc) { var i; if(doc.serialNumber) { for(i=0; i < doc.serialNumber.length; i+=1) { emit(doc.serialNumber.slice(i), doc);}}}")
public List<DeviceEntityCouch> findBySerialPattern(String serialNumber) {
    String trim = serialNumber.trim();
    if (StringUtils.isEmpty(trim)) {
        return new ArrayList<>();
    }
    String[] split = trim.split(" ");
    List<DeviceEntityCouch> list = new ArrayList<>();
    for (String s : split) {
        ViewQuery viewQuery = createQuery("find_by_serial_pattern").startKey(s).endKey(s + "\u9999");
        list.addAll(db.queryView(viewQuery, DeviceEntityCouch.class));
    }
    return list;
}
Looks like you are implementing a full-text search here. That's not going to be very efficient in CouchDB (I guess the same applies to other databases).
Correct me if I am wrong, but from your code it looks like you are trying to search a list of serial numbers for a pattern. CouchDB (or any other database) is quite efficient if you can somehow index the data you will be searching for.
Otherwise you must fetch every single record and perform a string comparison on it.
The only way I can think of to optimize this in CouchDB would be something like the following, with a few assumptions:
Your serial numbers are not very long (say 20 chars?)
You force the search term to always be 5 characters
Generate a view that emits every single 5-char-long substring of your serial number - more or less like this (it could be optimized, and I'm not sure I got the indices right):
...
for (var i = 0; doc.serialNo.length >= 5 && i <= doc.serialNo.length - 5; i++) {
    emit([doc.serialNo.substring(i, i + 5), doc._id]);
}
...
Use the built-in _count reduce function.
Now the following URL:
http://localhost:5984/test/_design/serial/_view/complex-key?startkey=["01234"]&endkey=["01234",{}]&group=true
will return a list of documents with a hit count for the key "01234".
If you don't group and you set the reduce option to false, you will get a list of all matches, including duplicates if a single doc has multiple hits.
Refer to http://ryankirkman.com/2011/03/30/advanced-filtering-with-couchdb-views.html for the information about complex keys lookups.
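For reference, the same complex-key lookup through Ektorp might look roughly like this (a sketch, assuming a design document named "serial" that contains the "complex-key" view above):
import org.ektorp.ComplexKey;
import org.ektorp.ViewQuery;
import org.ektorp.ViewResult;

ViewQuery query = new ViewQuery()
        .designDocId("_design/serial")
        .viewName("complex-key")
        .startKey(ComplexKey.of("01234"))
        .endKey(ComplexKey.of("01234", ComplexKey.emptyObject()))
        .group(true);
ViewResult result = db.queryView(query); // each row carries a key and its hit count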
I am not sure how efficient CouchDB is in terms of updating that view. It depends on how many records you have and how many new entries appear between queries of the view (I understand CouchDB rebuilds the view's B-tree on demand).
I have generated a view like that which splits doc IDs into 5-char-long keys. Out of just over 1K docs it generated over 30K results - with IDs being 32 chars long, it's simple maths really: (serialNo.length - key.length + 1) * docCount.
Generating the view took a while, but the lookups were fast.
You could generate keys of multiple lengths, etc. It all comes down to your record count vs. lookup speed.

I'm getting different results every time I run my code

I'm using ELKI to cluster my data with KMeansLloyd<NumberVector> and k=3. Every time I run my Java code I get totally different clustering results. Is this normal, or is there something I should do to make my output nearly stable? Here is my code, which I got from the ELKI tutorials:
DatabaseConnection dbc = new ArrayAdapterDatabaseConnection(a);
// Create a database (which may contain multiple relations!)
Database db = new StaticArrayDatabase(dbc, null);
// Load the data into the database (do NOT forget to initialize...)
db.initialize();
// Relation containing the number vectors:
Relation<NumberVector> rel = db.getRelation(TypeUtil.NUMBER_VECTOR_FIELD);
// We know that the ids must be a continuous range:
DBIDRange ids = (DBIDRange) rel.getDBIDs();
// K-means should be used with squared Euclidean (least squares):
//SquaredEuclideanDistanceFunction dist = SquaredEuclideanDistanceFunction.STATIC;
CosineDistanceFunction dist= CosineDistanceFunction.STATIC;
// Default initialization, using global random:
// To fix the random seed, use: new RandomFactory(seed);
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(RandomFactory.DEFAULT);
// Textbook k-means clustering:
KMeansLloyd<NumberVector> km = new KMeansLloyd<>(dist, //
        3 /* k - number of partitions */, //
        0 /* maximum number of iterations: no limit */, init);
// K-means will automatically choose a numerical relation from the data set:
// But we could make it explicit (if there were more than one numeric
// relation!): km.run(db, rel);
Clustering<KMeansModel> c = km.run(db);
// Output all clusters:
int i = 0;
for(Cluster<KMeansModel> clu : c.getAllClusters()) {
    // K-means will name all clusters "Cluster" in lack of noise support:
    System.out.println("#" + i + ": " + clu.getNameAutomatic());
    System.out.println("Size: " + clu.size());
    System.out.println("Center: " + clu.getModel().getPrototype().toString());
    // Iterate over objects:
    System.out.print("Objects: ");
    for(DBIDIter it = clu.getIDs().iter(); it.valid(); it.advance()) {
        // To get the vector use:
        NumberVector v = rel.get(it);
        // Offset within our DBID range: "line number"
        final int offset = ids.getOffset(it);
        System.out.print(v + " " + offset);
        // Do NOT rely on using "internalGetIndex()" directly!
    }
    System.out.println();
    ++i;
}
I would say it is because you are using RandomlyGeneratedInitialMeans, which will "initialize k-means by generating random vectors (within the data set's value range)":
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(RandomFactory.DEFAULT);
Yes, it is normal.
K-Means is supposed to be initialized randomly. It is desirable to get different results when running it multiple times.
If you don't want this, use a fixed random seed.
From the code you copied and pasted:
// To fix the random seed, use: new RandomFactory(seed);
That is exactly what you should do...
long seed = 0;
RandomlyGeneratedInitialMeans init = new RandomlyGeneratedInitialMeans(
new RandomFactory(seed));
This was too long for a comment. As @Idos stated, you are initializing your data randomly; that's why you're getting random results. Now the question is: how do you ensure the results are robust? Try this:
Run the algorithm N times. Each time, record the cluster membership for each observation. When you are finished, classify an observation into the cluster which contained it most often. For example, suppose you have 3 observations, 3 classes, and run the algorithm 3 times:
obs  R1  R2  R3
1    A   A   B
2    B   B   B
3    C   B   B
Then you should classify obs1 as A since it was most often classified as A. Classify obs2 as B since it was always classified as B. And classify obs3 as B since it was most often classified as B by the algorithm. The results should become increasingly stable the more times you run the algorithm.
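A minimal sketch of that voting step, assuming labels[run][obs] holds the cluster label of each observation per run, and that the labels have already been matched across runs (raw k-means cluster numbers are arbitrary, so corresponding clusters must be aligned first):
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

static int[] consensus(int[][] labels, int numObs) {
    int[] result = new int[numObs];
    for (int obs = 0; obs < numObs; obs++) {
        // count one vote per run for the label that run assigned
        Map<Integer, Integer> votes = new HashMap<>();
        for (int[] run : labels) {
            votes.merge(run[obs], 1, Integer::sum);
        }
        // keep the label with the most votes
        result[obs] = Collections.max(votes.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
    return result;
}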

Need some help with deeplearning4j single RBM usage

I have a bunch of sensors and I really just want to reconstruct the input.
So what I want is this:
after I have trained my model I will pass in my feature matrix
get the reconstructed feature matrix back
I want to investigate which sensor values are completely different from the reconstructed value
Therefore I thought an RBM would be the right choice, and since I am used to Java, I tried deeplearning4j. But I got stuck very early. When I run the following code, I face 2 problems.
The result is far from a correct prediction; most of the values are simply [1.00, 1.00, 1.00].
I would expect to get back 4 values (which is the number of inputs expected to be reconstructed).
So what do I have to tune to get a) a better result and b) get the reconstructed inputs back?
public static void main(String[] args) {
    // Customizing params
    Nd4j.MAX_SLICES_TO_PRINT = -1;
    Nd4j.MAX_ELEMENTS_PER_SLICE = -1;
    Nd4j.ENFORCE_NUMERICAL_STABILITY = true;
    final int numRows = 4;
    final int numColumns = 1;
    int outputNum = 3;
    int numSamples = 150;
    int batchSize = 150;
    int iterations = 100;
    int seed = 123;
    int listenerFreq = iterations / 5;
    DataSetIterator iter = new IrisDataSetIterator(batchSize, numSamples);
    // Loads data into generator and format consumable for NN
    DataSet iris = iter.next();
    iris.normalize();
    //iris.scale();
    System.out.println(iris.getFeatureMatrix());
    NeuralNetConfiguration conf = new NeuralNetConfiguration.Builder()
            // Gaussian for visible; Rectified for hidden
            // Set contrastive divergence to 1
            .layer(new RBM.Builder()
                    .nIn(numRows * numColumns) // Input nodes
                    .nOut(outputNum) // Output nodes
                    .activation("tanh") // Activation function type
                    .weightInit(WeightInit.XAVIER) // Weight initialization
                    .lossFunction(LossFunctions.LossFunction.XENT)
                    .updater(Updater.NESTEROVS)
                    .build())
            .seed(seed) // Locks in weight initialization for tuning
            .iterations(iterations)
            .learningRate(1e-1f) // Backprop step size
            .momentum(0.5) // Speed of modifying learning rate
            .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT) // ^^ Calculates gradients
            .build();
    Layer model = LayerFactories.getFactory(conf.getLayer()).create(conf);
    model.setListeners(Arrays.asList((IterationListener) new ScoreIterationListener(listenerFreq)));
    model.fit(iris.getFeatureMatrix());
    System.out.println(model.activate(iris.getFeatureMatrix(), false));
}
For b), when you call activate(), you get a list of "nlayers" arrays. Each array in the list is the activation for one layer. Each array is composed of rows, one row per input vector, and each column contains the activation of one neuron in that layer for that observation (input).
Once all layers have been activated with some input, you can get the reconstruction with the RBM.propDown() method.
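A rough sketch of that round trip for the single layer built above (an assumption: this relies on the propUp()/propDown() pair exposed by the RBM layer implementation class of that deeplearning4j version, so the generic Layer must be cast):
// cast the generic Layer to the RBM implementation class
org.deeplearning4j.nn.layers.feedforward.rbm.RBM rbm =
        (org.deeplearning4j.nn.layers.feedforward.rbm.RBM) model;
INDArray hidden = rbm.propUp(iris.getFeatureMatrix()); // visible -> hidden activations
INDArray reconstruction = rbm.propDown(hidden);        // hidden -> reconstructed visible (4 columns)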
As for a), I'm afraid it's very tricky to train an RBM correctly. So you really want to play with every parameter and, more importantly, monitor various metrics during training that will give you some hint about whether it's training correctly or not. Personally, I like to plot:
The score() on the training corpus, which is the reconstruction error after every gradient update; check that it decreases.
The score() on another development corpus: useful to be warned when overfitting occurs;
The norm of the parameter vector: it has a large impact on the score
Both activation maps (an XY rectangular plot of the activated neurons of one layer over the corpus), just after initialization and after N steps: this helps detect unreliable training (e.g. when everything is black/white, or when a large share of the neurons are never activated, etc.)

How to synchronize System Time access in a class in Java

I am writing a class with a method that uses the system time to generate a unique 8-character alphanumeric reference ID. But I fear that at some point multiple calls might be made in the same millisecond, resulting in the same reference ID. How can I protect this call to the system time from multiple threads that might call this method simultaneously?
System time is an unreliable source for unique IDs. That's it. Don't use it.
You need some form of a permanent source (UUID uses secure random whose seed is provided by the OS).
The system time may jump backwards even a few milliseconds and break your logic entirely. If you can tolerate 64-bit IDs, you can either use a High/Low generator, which is a very good compromise, or cook your own recipe: say, 18 bits of days since the beginning of 2012 (you have over 700 years to go) followed by 46 bits of randomness coming from SecureRandom. Not the best scheme, and technically it may fail, but it doesn't require external persistence.
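A minimal sketch of that recipe (an illustration under those assumptions, not a hardened implementation):
import java.security.SecureRandom;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public final class DayRandomId {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final LocalDate EPOCH = LocalDate.of(2012, 1, 1);

    public static long nextId() {
        // 18 bits of days since the beginning of 2012 (over 700 years of room)
        long days = ChronoUnit.DAYS.between(EPOCH, LocalDate.now());
        // 46 bits of randomness from SecureRandom
        long random46 = RANDOM.nextLong() & ((1L << 46) - 1);
        return (days << 46) | random46;
    }
}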
I'd suggest adding the thread ID to the reference ID. This makes the reference more unique. However, even within a thread, consecutive calls to a time source may deliver identical values. Even calls to the highest-resolution source (QueryPerformanceCounter) may return identical values on certain hardware. A possible solution is to test the collected time value against its predecessor and add an increment to the "time stamp". You may need more than 8 characters if this should stay human readable.
The most efficient source for a timestamp is the GetSystemTimeAsFileTime API. I wrote some details in this answer.
You can use the UUID class to generate the bits for your ID, then use some bitwise operators and Long.toString to convert it to base-36 (alpha-numeric).
public static String getId() {
    UUID uuid = UUID.randomUUID();
    // This is the time-based long, and is predictable
    long msb = uuid.getMostSignificantBits();
    // This contains the variant bits, and is random
    long lsb = uuid.getLeastSignificantBits();
    long result = msb ^ lsb; // XOR
    String encoded = Long.toString(result, 36);
    // Remove sign if negative
    if (result < 0)
        encoded = encoded.substring(1, encoded.length());
    // Trim extra digits or pad with zeroes
    if (encoded.length() > 8) {
        encoded = encoded.substring(encoded.length() - 8, encoded.length());
    }
    while (encoded.length() < 8) {
        encoded = "0" + encoded;
    }
    return encoded;
}
Since your character space is still smaller than a UUID's, this isn't foolproof. Test it with this code:
public static void main(String[] args) {
    Set<String> ids = new HashSet<String>();
    int count = 0;
    for (int i = 0; i < 100000; i++) {
        if (!ids.add(getId())) {
            count++;
        }
    }
    System.out.println(count + " duplicate(s)");
}
For 100,000 IDs, the code performs consistently well and is very fast. I start getting duplicate IDs when I increase by another order of magnitude to 1,000,000. I modified the trimming to take the end of the encoded string instead of the beginning, and this greatly improved the duplicate rate. Now generating 1,000,000 IDs doesn't produce any duplicates for me.
Your best bet may still be to use a synchronized counter like AtomicInteger or AtomicLong and encode the number from that in base-36 using the code above, especially if you plan on having lots of IDs.
Edit: Counter approach, in case you want it:
private final AtomicLong counter;

public IdGenerator(int start) {
    // start could also be initialized from a file or other
    // external source that stores the most recently used ID
    counter = new AtomicLong(start);
}

public String getId() {
    long result = counter.getAndIncrement();
    String encoded = Long.toString(result, 36);
    // Remove sign if negative
    if (result < 0)
        encoded = encoded.substring(1, encoded.length());
    // Trim extra digits or pad with zeroes
    // (keep the end of the string: the low-order digits are the ones that change)
    if (encoded.length() > 8) {
        encoded = encoded.substring(encoded.length() - 8, encoded.length());
    }
    while (encoded.length() < 8) {
        encoded = "0" + encoded;
    }
    return encoded;
}
This code is thread-safe and can be accessed concurrently.

Java: How do I simulate probability?

I have a set of over 100 different probabilities, ranging from 0.007379 all the way to 0.913855 (these probabilities were collected from an actuarial table: http://www.ssa.gov/oact/STATS/table4c6.html). In Java, how can I use these probabilities to determine whether something will happen or not? Something along these lines...
public boolean prob(double probability) {
    if (you get lucky)
        return true;
    return false;
}
The Random class allows you to create a repeatable sequence of random numbers, so that every time you run the program the same sequence of values is generated. You can also generate normally distributed random values with the Random class. I doubt you need any of that.
For what you describe, I would just use Math.random. So, given the age of a man we could write something like:
double prob = manDeathTable[age];
if (Math.random() < prob)
    virtualManDiesThisYear();
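Filling in the method from the question along those lines (a sketch; Math.random() returns a uniform double in [0, 1), so the comparison is true with the given probability):
public boolean prob(double probability) {
    return Math.random() < probability;
}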
First you need to create an instance of Random somewhere sensible in your program - for example when your program starts.
Random random = new Random();
Use this code to see whether an event happens:
boolean happens = random.nextDouble() < prob;
I'm not sure where that range came from. If you have a distribution in mind, I'd recommend using a Random to generate a value and get on with it.
public class ProbabilityGenerator {
    private double[] yourValuesHere = { 0.007379, 0.5, 0.913855 };
    private Random random = new Random(System.currentTimeMillis());

    public synchronized double getProbability() {
        return this.yourValuesHere[this.random.nextInt(yourValuesHere.length)];
    }
}
