Generating 4 digits of multiple unique IDs per millisecond in Java

I am inserting into a database with batchUpdate. I need to generate 10-digit IDs of which the first six digits stay the same, i.e. a yyMMdd block. I am trying to append four unique digits to this yyMMdd block, taken from the local date.
The problem is that it generates lots of duplicates, because the FOR loop runs many iterations within the same millisecond.
Expected pattern: 210609xxxx, where 210609 is taken from the yyMMdd pattern of LocalDate in Java and xxxx needs to be unique even if the FOR loop calls this method multiple times per millisecond.
public Long getUniquedeltaId() {
    final long LIMIT = 10000L;
    // millis % LIMIT is not zero-padded and repeats within the same millisecond,
    // which is the source of the duplicates
    final Long deltaId = Long.parseLong(java.time.LocalDate.now()
            .format(DateTimeFormatter.ofPattern("yyMMdd"))
            .concat(Long.toString(System.currentTimeMillis() % LIMIT)));
    System.out.println("deltaId" + deltaId);
    return deltaId;
}
I tried using System.nanoTime(), but it's returning only one unique ID.

If you simply need a plain list of unique IDs at one point in time, you can use the following method:
package example;

import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.Collectors;

public class Main {
    public static void main(String[] args) {
        final String prefix = "20210609"; // hardcoded prefix for the example
        final List<String> uniqueIds = ThreadLocalRandom.current().ints(0, 10_000) // ints from 0 to 10000 (exclusive) -> every possible 4-digit number
                .distinct()                              // only distinct numbers
                .limit(1000L)                            // exactly 1000 (up to 10000 possible)
                .mapToObj(v -> String.format("%04d", v)) // always 4 digits (format as string, left-pad with 0s)
                .map(v -> prefix + v)                    // add our prefix
                .collect(Collectors.toList());
        System.out.println(uniqueIds);
    }
}
If you need a component that provides you with one unique ID at a time, you can use this class:
package example;

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class IdGenerator {
    private static final int LENGTH = 10_000;
    private static final DateTimeFormatter DTF = DateTimeFormatter.ofPattern("yyMMdd"); // unlike SimpleDateFormat, this is thread-safe

    private final Object monitor;
    private final AtomicInteger offset;
    private final AtomicBoolean generationInProgress;
    private volatile int[] ids;
    private volatile LocalDate lastGeneratedDate;

    public IdGenerator() {
        this.monitor = new Object();
        this.offset = new AtomicInteger(0);
        this.generationInProgress = new AtomicBoolean(false);
        this.ids = new int[LENGTH];
        this.lastGeneratedDate = LocalDate.MIN;
    }

    public String nextId() throws InterruptedException {
        final LocalDate currentDate = LocalDate.now();
        while (this.lastGeneratedDate.isBefore(currentDate)) {
            if (this.generationInProgress.compareAndSet(false, true)) {
                this.ids = ThreadLocalRandom.current().ints(0, LENGTH)
                        .distinct()
                        .limit(LENGTH)
                        .toArray();
                this.offset.set(0);
                this.lastGeneratedDate = currentDate;
                this.generationInProgress.set(false);
                synchronized (this.monitor) {
                    this.monitor.notifyAll();
                }
            }
            while (this.generationInProgress.get()) {
                synchronized (this.monitor) {
                    this.monitor.wait();
                }
            }
        }
        final int myIndex = this.offset.getAndIncrement();
        if (myIndex >= this.ids.length) {
            throw new IllegalStateException("no more ids today");
        }
        return currentDate.format(DTF) + String.format("%04d", this.ids[myIndex]);
    }
}
Note that your pattern allows only 10,000 unique IDs per day, since you limit yourself to 4 digits (10^4 = 10000).
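If randomness is not required, a minimal sketch of an alternative (the class name SequentialIdGenerator is made up for this example) is to hand out the 4-digit suffixes sequentially, which guarantees no duplicates within a day until the 10,000 budget is exhausted:
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class SequentialIdGenerator { // illustrative sketch, not from the original question
    private static final DateTimeFormatter DTF = DateTimeFormatter.ofPattern("yyMMdd");

    private LocalDate day = LocalDate.now();
    private int sequence = 0;

    // Returns yyMMdd + a 4-digit sequence; the counter resets when the date changes.
    public synchronized long nextId() {
        LocalDate today = LocalDate.now();
        if (!today.equals(day)) {
            day = today;
            sequence = 0;
        }
        if (sequence >= 10_000) {
            throw new IllegalStateException("no more ids today");
        }
        return Long.parseLong(day.format(DTF) + String.format("%04d", sequence++));
    }
}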

Related

Spark Dataset Foreach function does not iterate

Context
I want to iterate over a Spark Dataset and update a HashMap for each row.
Here is the code I have:
// At this point, I have a my_dataset variable containing 300 000 rows and 10 columns
// - my_dataset.count() == 300 000
// - my_dataset.columns().length == 10
// Declare my HashMap
HashMap<String, Vector<String>> my_map = new HashMap<String, Vector<String>>();
// Initialize the map
for (String col : my_dataset.columns()) {
    my_map.put(col, new Vector<String>());
}
// Iterate over the dataset and update the map
my_dataset.foreach((ForeachFunction<Row>) row -> {
    for (String col : my_map.keySet()) {
        my_map.get(col).add(row.get(row.fieldIndex(col)).toString());
    }
});
Issue
My issue is that the foreach doesn't iterate at all: the lambda is never executed, and I don't know why.
I implemented it as indicated here: How to traverse/iterate a Dataset in Spark Java?
At the end, all the inner Vectors remain empty (as they were initialized) even though the Dataset is not (take a look at the first comments in the given code sample).
I know that the foreach never iterates because I did two tests:
Add an AtomicInteger to count the iterations and increment it right at the beginning of the lambda with the incrementAndGet() method. => The counter value remains 0 at the end of the process.
Print a debug message right at the beginning of the lambda. => The message is never displayed.
I'm not used to Java (even less to Java lambdas), so maybe I missed an important point, but I can't find what.
I am probably a little old school, but I never liked lambdas too much, as they can get pretty complicated.
Here is a full example of a foreach():
package net.jgp.labs.spark.l240_foreach.l000;

import java.io.Serializable;

import org.apache.spark.api.java.function.ForeachFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ForEachBookApp implements Serializable {
    private static final long serialVersionUID = -4250231621481140775L;

    private final class BookPrinter implements ForeachFunction<Row> {
        private static final long serialVersionUID = -3680381094052442862L;

        @Override
        public void call(Row r) throws Exception {
            System.out.println(r.getString(2) + " can be bought at " + r.getString(4));
        }
    }

    public static void main(String[] args) {
        ForEachBookApp app = new ForEachBookApp();
        app.start();
    }

    private void start() {
        SparkSession spark = SparkSession.builder().appName("For Each Book").master("local").getOrCreate();
        String filename = "data/books.csv";
        Dataset<Row> df = spark.read().format("csv").option("inferSchema", "true")
                .option("header", "true")
                .load(filename);
        df.show();
        df.foreach(new BookPrinter());
    }
}
As you can see, this example reads a CSV file and prints a message from the data. It is fairly simple.
The foreach() is passed a new instance of the class where the work is done:
df.foreach(new BookPrinter());
The work is done in the call() method of the class:
private final class BookPrinter implements ForeachFunction<Row> {
    @Override
    public void call(Row r) throws Exception {
        ...
    }
}
As you are new to Java, make sure you have the right signature (for classes and methods) and the right imports.
You can also clone the example from https://github.com/jgperrin/net.jgp.labs.spark/tree/master/src/main/java/net/jgp/labs/spark/l240_foreach/l000. This should help you with foreach().
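One caveat that likely explains the empty map in the question: foreach runs on Spark's executors, so even when the lambda does execute, mutations to a driver-side HashMap are applied to serialized copies and never show up on the driver. If the rows genuinely need to land in a local map and fit in driver memory, a sketch along these lines (the helper name toColumnMap is ours, not from the question) collects them first:
import java.util.HashMap;
import java.util.List;
import java.util.Vector;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class CollectToMap {
    // Hypothetical helper: collects the rows to the driver, then fills the map locally.
    // Only reasonable when the dataset fits in driver memory.
    static HashMap<String, Vector<String>> toColumnMap(Dataset<Row> ds) {
        HashMap<String, Vector<String>> map = new HashMap<>();
        for (String col : ds.columns()) {
            map.put(col, new Vector<>());
        }
        List<Row> rows = ds.collectAsList();
        for (Row row : rows) {
            for (String col : ds.columns()) {
                map.get(col).add(String.valueOf(row.get(row.fieldIndex(col))));
            }
        }
        return map;
    }
}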

Realize for loops in Java Stream for ArrayList?

Question: Can I implement the method private void fillingArrayList() using the Java Stream API (that is, in one line)? The variable i is needed to define the length of the String.
I tried a for-each loop but it doesn't work. I need a ranged for loop.
import org.apache.commons.lang3.StringUtils;

public class Tolls {
    public static String digitsConcatenation(short number, long times) {
        return StringUtils.repeat(Character.forDigit(number, 10), Long.valueOf(times).intValue());
    }
}
public class Progression {
    public Progression(Digit digit) {
        this.digit = digit;
        this.numbers = new ArrayList<>(Long.valueOf(digit.getTimes()).intValue());
        this.fillingArrayList();
    }

    public Optional<Long> getProgressionSum() {
        return this.numbers.stream().reduce(Long::sum);
    }

    public List<Long> getNumbers() {
        return Collections.unmodifiableList(this.numbers);
    }

    private void fillingArrayList() {
        for (int i = 1; i <= digit.getTimes(); i++)
            this.numbers.add(Long.valueOf(Tolls.digitsConcatenation(digit.getNumber(), i)));
    }

    private final List<Long> numbers;
    private final Digit digit;
}
My try:
private void fillingArrayList() {
    Arrays.stream(this.numbers.toArray())
            .forEach(i -> this.numbers.add(
                    Long.valueOf(Tolls.digitsConcatenation(
                            digit.getNumber(), (Long) i))));
}
There are some weird things in your code, like in Progression's constructor, writing an expression like Long.valueOf(digit.getTimes()).intValue() instead of just digit.getTimes(). The constructor of ArrayList expects an int and digit.getTimes() returns an int (or a type implicitly convertible to int), as demonstrated with the loop condition i <= digit.getTimes().
Likewise, the expression Long.valueOf(times).intValue() within the digitsConcatenation method, which is a cast from long to int in disguise, is only necessary because you declared the second parameter of digitsConcatenation as long even though you actually need an int and the caller's argument is an int, so you could declare it as int in the first place.
But the entire approach of using string concatenation (incorporating a 3rd-party library) and parsing it back into a number is unnecessarily complicated and inefficient. Since both conversions implicitly use the decimal system, the operation's result is the same as multiplying the number by ten and adding the value of the digit.
So you could just use
private void fillingArrayList() {
    int n = digit.getNumber();
    LongStream.iterate(n, current -> current * 10 + n)
            .limit(digit.getTimes()).forEach(numbers::add);
}
without any string operation.
Even better would be to change the constructor to
public Progression(Digit digit) {
    this.digit = digit;
    int n = digit.getNumber();
    this.numbers = LongStream.iterate(n, current -> current * 10 + n)
            .limit(digit.getTimes()).boxed()
            .collect(Collectors.toList());
}
letting the stream produce the List<Long> instead of constructing it manually and modify it after construction.
The following should work:
IntStream.rangeClosed(1, digit.getTimes())
        .forEach(i -> this.numbers.add(Long.valueOf(Tolls.digitsConcatenation(digit.getNumber(), i))));

Java: counting number of times data appears in a class

I know how to count most things when it comes to Java, but this one has either stumped me or my brain is dying. Anyway, I have a class called "Jobs", and within that class is a String variable called "day". Multiple Jobs have already been created (the exact number is unknown), and now I need to query how many Jobs are on day x. I assume it would be easy enough with a while loop, but I don't know how to write one that looks through Jobs as a whole rather than one specific instance.
The Job data was created by reading a file (the name of which is jobFile) via a Scanner.
public class Job_16997761 {
    private int jobID;           // unique job identification number
    private int customerID;      // unique customer identification number
    private String registration; // registration number for vehicle for this job
    private String date;         // when the job is carried out by mechanic
    private String day;          // day of the week that job is booked for
    private double totalFee;     // total price for the Job
    private int[] serviceCode;   // the service codes to be carried out on the vehicle for the job

    // Constructor
    public Job_16997761(int jobID, int customerID, String registration,
                        String date, String day, double totalFee, int[] serviceCode) {
        this.jobID = jobID;
        this.customerID = customerID;
        this.registration = registration;
        this.date = date;
        this.day = day;
        this.totalFee = totalFee;
        this.serviceCode = serviceCode;
    }
}
Not sure why you are creating a dynamic instance of a job (e.g. Job_16997761; it seems that each job has its own class). But when creating the jobs you can maintain a map that holds the number of jobs per day. Something like:
Map<String, Long> jobsPerDay = new HashMap<String, Long>();
Then when creating a new job you can simply increment the counter for each day:
jobsPerDay.put(day, jobsPerDay.get(day) != null ? jobsPerDay.get(day) + 1 : 1L);
This way you will be able to get the number of jobs for a day by using: jobsPerDay.get(day)
Please note that you can use java.time.DayOfWeek instead of a String.
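For illustration, here is a small sketch of that counting approach using java.time.DayOfWeek and Map.merge (the class and method names are made up):
import java.time.DayOfWeek;
import java.util.EnumMap;
import java.util.Map;

public class JobCounter { // illustrative sketch
    private final Map<DayOfWeek, Long> jobsPerDay = new EnumMap<>(DayOfWeek.class);

    // Call this whenever a job is created.
    public void recordJob(DayOfWeek day) {
        jobsPerDay.merge(day, 1L, Long::sum); // insert 1, or add 1 to the existing count
    }

    public long jobsOn(DayOfWeek day) {
        return jobsPerDay.getOrDefault(day, 0L);
    }
}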
It's hard to tell you the correct solution unless you give more details. You are saying you can write a while loop, so I will assume you have a collection of Job already.
int count = 0;
List<Job> jobs = readJobsFromFile();
for (Job job : jobs) {
    if (job.getDay().equals(inputDay)) { // inputDay is the day you have to find the number of jobs on
        count++;
    }
}
System.out.println(count);
This is just one of many ways and may not be that efficient, but it is one way you may consider (before you edited your last post): use an ArrayList to contain all the Job objects and iterate through the objects.
import java.util.*;

public class SomeClass {
    public static void main(String[] args) {
        Jobs job1 = new Jobs(1);
        Jobs job2 = new Jobs(1);
        Jobs job3 = new Jobs(2);
        Jobs job4 = new Jobs(2);
        Jobs job5 = new Jobs(2);
        ArrayList<Jobs> jobList = new ArrayList<Jobs>();
        jobList.add(job1);
        jobList.add(job2);
        jobList.add(job3);
        jobList.add(job4);
        jobList.add(job5);
        System.out.println(numOfJobOnDayX(jobList, 2)); // Jobs which fall on day 2
    }

    public static int numOfJobOnDayX(ArrayList<Jobs> jobList, int specifiedDay) {
        int count = 0;
        for (int x = 0; x < jobList.size(); x++) // May use a for-each loop instead
            if (jobList.get(x).days == specifiedDay)
                count++;
        return count;
    }
}
OUTPUT: 3
The Jobs class:
class Jobs {
    int days;

    public Jobs(int days) {
        this.days = days;
    }
}
For simplicity, I am not using any getter and setter methods. You may want to think about what data structure you want to use to hold your objects. Once again, I need to re-emphasize that this may not be an efficient way, but it gives you some ideas of the possibilities for doing the count.

Java: Filtering lots of data

I have ~10M rows of data, each containing ~1000 columns (String & Numeric). What I need is to be able to apply simple filters (>, <, RANGE, ==) to this data set as quickly as possible (less than a second to get a 10K slice of this data).
What kind of production-ready technology that can be used from Java exists for this?
Where is your data coming from? This sounds like a task for a database.
A SQL database with an index on the fields you're filtering. The index can be based on the numeric value, which will make range and equality queries pretty quick.
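To make the database route concrete, here is a hedged sketch using JDBC against an in-memory H2 database (the table and column names are invented, and the H2 driver must be on the classpath; any SQL database with an index behaves similarly):
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class IndexedFilter { // illustrative sketch, hypothetical schema
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE items (id BIGINT PRIMARY KEY, amount DOUBLE)");
                st.execute("CREATE INDEX idx_amount ON items(amount)"); // makes range filters cheap
            }
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM items WHERE amount BETWEEN ? AND ? LIMIT 10000")) {
                ps.setDouble(1, 10.0);
                ps.setDouble(2, 20.0);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // process rs.getLong("id") here
                    }
                }
            }
        }
    }
}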
If it's not from a database,
you can do it in a few threads and then combine the results in order to improve performance.
Like here, where AMOUNT is the number of elements in your map:
package com.stackoverflow.test;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class Test6 {
    private static final int AMOUNT = 10000000;
    private static final int CORES = Runtime.getRuntime().availableProcessors();
    private static final int PART = AMOUNT / CORES;

    private static final class MapFilterTask implements Callable<Map<String, Number>> {
        private Integer fromElement;
        private Integer toElement;
        private Map<String, Number> map;

        private MapFilterTask(Map<String, Number> map, Integer fromElement, Integer toElement) {
            this.map = map;
            this.fromElement = fromElement;
            this.toElement = toElement;
        }

        public Map<String, Number> call() throws Exception {
            Map<String, Number> filtered = new HashMap<String, Number>();
            for (int i = fromElement; i <= toElement; i++) {
                // filter your part of the map and put the matching entries into 'filtered'
            }
            return filtered;
        }
    }

    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Map<String, Number> yourMap = new HashMap<String, Number>();
        ExecutorService taskExecutor = Executors.newFixedThreadPool(CORES);
        List<Callable<Map<String, Number>>> tasks = new ArrayList<Callable<Map<String, Number>>>();
        for (int i = 0; i < CORES; i++) {
            tasks.add(new MapFilterTask(yourMap, i * PART, (i + 1) * PART));
        }
        List<Future<Map<String, Number>>> futures = taskExecutor.invokeAll(tasks);
        Map<String, Number> newMap = new HashMap<String, Number>();
        for (Future<Map<String, Number>> future : futures) {
            newMap.putAll(future.get());
        }
    }
}
And for me it works 4 times faster, but only with the VM args -Xms2048M -Xmx2048M.
Without VM args I got a 1.7x improvement on my laptop with a 4-core processor and Linux Mint OS.
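As a hedged, more modern alternative to the hand-rolled executor above: a parallel stream does the partitioning and combining for you. A minimal sketch (the class and method names are ours, and the filter predicate is a placeholder):
import java.util.Map;
import java.util.stream.Collectors;

public class ParallelFilter { // illustrative sketch
    // Filters the map on all available cores via the common fork/join pool.
    static Map<String, Number> filterGreaterThan(Map<String, Number> map, double threshold) {
        return map.entrySet().parallelStream()
                .filter(e -> e.getValue().doubleValue() > threshold) // placeholder predicate
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}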

non-locking threading code using atomic types when implementing a sliding window class for time

I am trying to understand this code from Yammer metrics. The confusion starts with the trim method and the call to trim in both update and getSnapshot. Could someone explain the logic here, say for a 15-minute sliding window? Why would you want to clear the map before passing it into Snapshot (this is where the stats of the window are calculated)?
package com.codahale.metrics;

import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class SlidingTimeWindowReservoir implements Reservoir {
    // allow for this many duplicate ticks before overwriting measurements
    private static final int COLLISION_BUFFER = 256;
    // only trim on updating once every N
    private static final int TRIM_THRESHOLD = 256;

    private final Clock clock;
    private final ConcurrentSkipListMap<Long, Long> measurements;
    private final long window;
    private final AtomicLong lastTick;
    private final AtomicLong count;

    public SlidingTimeWindowReservoir(long window, TimeUnit windowUnit) {
        this(window, windowUnit, Clock.defaultClock());
    }

    public SlidingTimeWindowReservoir(long window, TimeUnit windowUnit, Clock clock) {
        this.clock = clock;
        this.measurements = new ConcurrentSkipListMap<Long, Long>();
        this.window = windowUnit.toNanos(window) * COLLISION_BUFFER;
        this.lastTick = new AtomicLong();
        this.count = new AtomicLong();
    }

    @Override
    public int size() {
        trim();
        return measurements.size();
    }

    @Override
    public void update(long value) {
        if (count.incrementAndGet() % TRIM_THRESHOLD == 0) {
            trim();
        }
        measurements.put(getTick(), value);
    }

    @Override
    public Snapshot getSnapshot() {
        trim();
        return new Snapshot(measurements.values());
    }

    private long getTick() {
        for (; ; ) {
            final long oldTick = lastTick.get();
            final long tick = clock.getTick() * COLLISION_BUFFER;
            // ensure the tick is strictly incrementing even if there are duplicate ticks
            final long newTick = tick > oldTick ? tick : oldTick + 1;
            if (lastTick.compareAndSet(oldTick, newTick)) {
                return newTick;
            }
        }
    }

    private void trim() {
        measurements.headMap(getTick() - window).clear();
    }
}
Two bits of information from the documentation:
ConcurrentSkipListMap is sorted according to the natural ordering of its keys.
That's the data structure that holds all measurements. The key here is a long which is basically the current time -> measurements indexed by time are sorted by time.
.headMap(K toKey) returns a view of the portion of this map whose keys are strictly less than toKey.
The magic code in getTick makes sure that one time value is never used twice (it simply takes oldTick + 1 if that would happen). COLLISION_BUFFER is a bit tricky to understand, but it basically ensures that even though Clock#getTick() returns the same value, you get new values that don't collide with the next tick from the clock.
E.g.
Clock.getTick() returns 0 -> modified to 0 * 256 = 0
Clock.getTick() returns 1 -> modified to 1 * 256 = 256
-> 256 values room in between.
Now trim() does
measurements.headMap(getTick() - window).clear();
This calculates the "current time", subtracts the time window, and uses that time to get the portion of the map that is older than "window ticks ago". Clearing that portion also clears it in the original map. It's not clearing the whole map, just that part.
-> trim removes values that are too old.
Each time you update, you need to remove old values or the map gets too large. When creating the Snapshot the same thing happens, so those old values are not included.
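A tiny demo of that live-view behaviour of headMap() (the keys and values are made up):
import java.util.concurrent.ConcurrentSkipListMap;

public class HeadMapDemo { // illustrative sketch
    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, Long> measurements = new ConcurrentSkipListMap<>();
        for (long tick = 0; tick < 10; tick++) {
            measurements.put(tick, tick * 100); // fake measurement per tick
        }
        // headMap(5) is a live view of all entries with key < 5;
        // clearing the view removes those entries from the backing map
        measurements.headMap(5L).clear();
        System.out.println(measurements.keySet()); // prints [5, 6, 7, 8, 9]
    }
}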
The endless for loop in getTick is another trick, using the atomic compare-and-set method to ensure that, once you are ready to update the value, nothing has changed it in between. If that happens, the whole loop starts over and refreshes its starting value. The basic schema is:
for (; ; ) {
    long expectedOldValue = atomic.get();
    // other threads can change the value of atomic here..
    long modified = modify(expectedOldValue);
    // we can only set the new value if the old one is still the same
    if (atomic.compareAndSet(expectedOldValue, modified)) {
        return modified;
    }
}
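As a concrete, runnable variant of that schema, here is the strictly-increasing-tick idea from getTick expressed as a standalone class (the class and method names are ours):
import java.util.concurrent.atomic.AtomicLong;

public class CasDemo { // illustrative sketch, modeled on getTick above
    private final AtomicLong lastTick = new AtomicLong();

    // Returns a strictly increasing value even if callers race:
    // losers of the compareAndSet simply retry with a fresh snapshot.
    public long nextTick(long candidate) {
        for (;;) {
            long oldTick = lastTick.get();
            long newTick = candidate > oldTick ? candidate : oldTick + 1;
            if (lastTick.compareAndSet(oldTick, newTick)) {
                return newTick;
            }
        }
    }
}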
