Multiple keys pointing to a single value in Redis (Cache) with Java - java

I want to store multiple keys with a single value using jedis (Redis cache) with Java.
I have three keys, e.g. user_1, driver_10, admin_5, and one value, e.g. "this is user", and I want to be able to get the value using any one of those three keys.

Having multiple keys point to the same value is not supported in Redis for now; see issue #2668.
You would need a workaround.
Some ideas below, possibly obvious or stupid :)
Maybe have an intermediate key:
- user_10 → id_123
- driver_5 → id_123
- id_123 → data_that_you_dont_want_to_duplicate
You could implement that logic in your client code, or in custom Lua scripts on the server, and have your client code use those scripts (but I don't know enough about that to provide details).
If you implement the indirection logic on the client side, and if accesses are unbalanced (for example, you access the data via the user key 99% of the time and via the driver key 1% of the time), it might be worth avoiding two client-server round trips for the 99% case. For this you can encode redirections: for example, if the first character is * then the rest is the data; if the first character is # then the rest is the actual key.
user_10 → *data_that_you_dont_want_to_duplicate
driver_5 → #user_10
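A rough sketch of how that client-side resolution could look with Jedis (the */# markers are just the convention suggested above, not anything built into Redis):

import redis.clients.jedis.Jedis;

public class RedirectResolver {
    // Resolves a value stored with the (hypothetical) prefix convention above:
    // a leading '*' means "the rest is the data", a leading '#' means
    // "the rest is the actual key", so the hot path costs a single GET.
    static String resolve(Jedis jedis, String key) {
        String value = jedis.get(key);
        if (value == null) {
            return null;                                   // key does not exist
        }
        if (value.startsWith("*")) {
            return value.substring(1);                     // inline data, one round trip
        }
        if (value.startsWith("#")) {
            return resolve(jedis, value.substring(1));     // follow the redirection
        }
        return value;                                      // unmarked plain value
    }
}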

Here is a Lua script that can save on traffic and pull the data in one call:
eval "return redis.call('get',redis.call('get',KEYS[1]))" 1 user_10
The above will return the requested data.
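For completeness, a possible way to run that script from Java via Jedis (assuming the intermediate-key layout above, i.e. user_10 holds the name of the key, such as id_123, that actually holds the data):

import redis.clients.jedis.Jedis;

public class OneCallLookup {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // One round trip: the script dereferences user_10 and returns the data it points to.
            Object data = jedis.eval(
                "return redis.call('get', redis.call('get', KEYS[1]))",
                1, "user_10");
            System.out.println(data);
        }
    }
}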

Related

Using Amazon S3 as Key Value Store (in production)

I am using Wasabi as S3 storage for my project, and I've been thinking of utilizing S3 as key-value storage. Wasabi does not charge for API requests, as noted here: https://wasabi.com/cloud-storage-pricing/
And anyone could easily (in just about any programming language) implement such a simple interface on top of Amazon S3:
value = store.get(key)
store.put(key, value)
store.delete(key)
Here the key is a string and the value is binary data, effectively using S3 as a highly distributed and elastic key-value store.
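For illustration, a minimal sketch of that three-method interface on top of the AWS SDK for Java v2 (bucket name, credentials and error handling are assumed or omitted):

import software.amazon.awssdk.core.ResponseBytes;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.*;

public class S3KeyValueStore {
    private final S3Client s3 = S3Client.create();
    private final String bucket;   // hypothetical bucket acting as the "store"

    public S3KeyValueStore(String bucket) { this.bucket = bucket; }

    // value = store.get(key)
    public byte[] get(String key) {
        ResponseBytes<GetObjectResponse> bytes =
            s3.getObjectAsBytes(GetObjectRequest.builder().bucket(bucket).key(key).build());
        return bytes.asByteArray();
    }

    // store.put(key, value)
    public void put(String key, byte[] value) {
        s3.putObject(PutObjectRequest.builder().bucket(bucket).key(key).build(),
                     RequestBody.fromBytes(value));
    }

    // store.delete(key)
    public void delete(String key) {
        s3.deleteObject(DeleteObjectRequest.builder().bucket(bucket).key(key).build());
    }
}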
So one can store a User object for example with
userid:1234567890:username -> johnsmith
userid:1234567890:username:johnsmith:password -> encrypted_password
userid:1234567890:username:johnsmith:profile_picture -> image_binary
userid:1234567890:username:johnsmith:fav_color -> red
Values are serialized into binary.
And so on.
I have a few questions. For those who have tried to use S3 as either a database or a datastore: what's the best strategy for using Amazon S3 as a key-value store? Although I think it's fairly easy to retrieve the whole user object described here by querying keys with the prefix userid:1234567890 and doing the needed logic in code, the obvious downside is that you can't search by value.
What algorithm can be used here to implement a simple key search function, e.g. searching for a user with a username starting with "j", or a user with fav_color "red"? Looking at the very basic get/put key-value interface, I think this is impossible, but maybe someone knows a workaround?
What kind of serialization strategy for both primitive data types (String, Number, Boolean, etc.) and blob data (images, audio, video, and any sort of file) is best for this kind of key-value store? Also, this simple key-value interface does not have a way to indicate what type of value is stored under a key (is it a string, a number, binary data, etc.?); how can that be solved?
How can transactions be achieved in this kind of scenario? Like in the example above, store the username johnsmith if and only if the other keys are also stored, or not at all. Is an S3 batch operation enough to solve this?
What are the main design considerations when planning to use this as the main database for applications (and for production use), both from an algorithmic perspective and considering the limitations of S3 itself?

How does TOTP work if I need to store the OTP in DB?

I have a requirement
A third-party provider (TPP) wants to access a REST endpoint using an OTP.
So the TPP requests service1, which in turn calls service2, which generates an OTP, stores user-specific data related to this request in the DB against that OTP, and returns the OTP to the TPP. These OTPs are valid for some time n, say 6 minutes. So far so good. Now my questions are below.
I can only generate 6-digit OTPs. I am using javax.crypto.Mac, and I am getting many duplicate OTPs. What is the best algorithm so that the probability of getting duplicates is reduced? I took a hint from https://github.com/jchambers/java-otp/blob/master/src/main/java/com/eatthepath/otp/HmacOneTimePasswordGenerator.java and used the same logic. I tested with JMeter, single thread, 5000 iterations, and I am getting almost 500 duplicate OTPs.
I have read that TOTP works in a client-server approach. What I don't understand is that in my scenario there is no client as such. Is there a way that I do not have to store the OTP in the DB?
Also, at some point all the OTPs would be exhausted if I keep them in the DB.
I have read almost all the articles about XOR128, TOTP, and HOTP, but there is something I am missing. Please help me solve this problem.
I see several possible approaches.
The first one is TOTP: you don't need to store OTPs at all; all you need to store is a secret key, which will be used to generate the OTP.
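For illustration only, a minimal RFC 6238-style generator using javax.crypto.Mac; the 360-second time step mirrors the 6-minute validity from the question, and secret management is left out. Treat it as a sketch, not production code:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.time.Instant;

public class Totp {
    // Generates a 6-digit TOTP (HMAC-SHA1, RFC 4226 dynamic truncation).
    // Example: generate(secretBytes, 360) for a 6-minute time step.
    static String generate(byte[] secret, long timeStepSeconds) throws Exception {
        long counter = Instant.now().getEpochSecond() / timeStepSeconds;
        byte[] msg = new byte[8];
        for (int i = 7; i >= 0; i--) {          // big-endian counter
            msg[i] = (byte) (counter & 0xff);
            counter >>= 8;
        }
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(new SecretKeySpec(secret, "HmacSHA1"));
        byte[] hash = mac.doFinal(msg);
        int offset = hash[hash.length - 1] & 0x0f;          // dynamic truncation
        int binary = ((hash[offset] & 0x7f) << 24)
                   | ((hash[offset + 1] & 0xff) << 16)
                   | ((hash[offset + 2] & 0xff) << 8)
                   |  (hash[offset + 3] & 0xff);
        return String.format("%06d", binary % 1_000_000);
    }
}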
The second option (commonly used) is to provide your clients with an API key; each time they send a request, they generate an access key using some hash algorithm. The input to the hash function should be a string which contains the API key and the current time step.
Here is the API manual from one of the top OTP providers; you can find an example of the described approach on page 5, "Authorization": https://www.protectimus.com/images/pdf/Protectimus_API_Manual_en.pdf

Ways to buffer REST response

There's a REST endpoint, which serves large (tens of gigabytes) chunks of data to my application.
The application processes the data at its own pace, and as incoming data volumes grow, I'm starting to hit the REST endpoint timeout.
Meaning, processing speed is less than network throughput.
Unfortunately, there's no way to raise processing speed enough, as there's no "enough" - incoming data volumes may grow indefinitely.
I'm thinking of a way to store incoming data locally before processing, in order to release REST endpoint connection before timeout occurs.
What I've come up with so far is downloading the incoming data to a temporary file and reading (processing) that file simultaneously using an OutputStream/InputStream.
Sort of buffering, using a file.
This brings its own problems:
- what if processing speed becomes faster than downloading speed for some time and I get EOF?
- the file parser operates with ObjectInputStream and it behaves weirdly in cases of an empty file/EOF
- and so on
Are there conventional ways to do such a thing?
Are there alternative solutions?
Please provide some guidance.
Upd:
I'd like to point out: the HTTP server is out of my control.
Consider it to be a vendor data provider. They have many consumers and refuse to alter anything for just one.
Looks like we're the only ones to use all of their data, as our client app's processing speed is far greater than their sample clients' performance metrics. Still, we cannot match our app's performance with the network throughput.
Server does not support http range requests or pagination.
There's no way to divide data in chunks to load, as there's no filtering attribute to guarantee that every chunk will be small enough.
In short: we can download all the data within the given time before the timeout occurs, but we cannot process it that fast.
Having an adapter between the InputStream and OutputStream, to perform as a blocking queue, will help a ton.
You're using something like new ObjectInputStream(new FileInputStream(...)), and the solution for EOF could be wrapping the FileInputStream first in a WriterAwareStream, which would block when hitting EOF as long as the writer is still writing.
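As a minimal sketch of that blocking-adapter idea using only the JDK, piped streams give you a bounded in-memory buffer between a downloader thread and the parser. Note the writer blocks when the buffer fills, so this alone won't release the HTTP connection early; a disk-backed queue (mentioned below) avoids that:

import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class BufferedTransfer {
    public static void main(String[] args) throws IOException {
        // The downloader thread writes into the pipe; the parser thread reads from it
        // and blocks (instead of seeing EOF) while data is still being written.
        PipedOutputStream downloadSide = new PipedOutputStream();
        PipedInputStream parserSide = new PipedInputStream(downloadSide, 1 << 20); // 1 MB buffer

        Thread downloader = new Thread(() -> {
            try {
                downloadSide.write(new byte[]{1, 2, 3}); // replace with the real HTTP download loop
            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                try { downloadSide.close(); } catch (IOException ignored) {} // close signals EOF to the reader
            }
        });
        downloader.start();

        int b;
        while ((b = parserSide.read()) != -1) { // blocks while the writer still has the pipe open
            // feed b (or a larger byte[] chunk) into the real parser here
        }
        parserSide.close();
    }
}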
Anyway, in case latency doesn't matter much, I would not bother starting processing before the download has finished. Oftentimes, there isn't much you can do with an incomplete list of objects.
Maybe some memory-mapped-file-based queue like Chronicle-Queue may help you. It's faster than dealing with files directly and may be even simpler to use.
You could also implement a HugeBufferingInputStream internally using a queue, which reads from its input stream, and, in case it has a lot of data, it spits them out to disk. This may be a nice abstraction, completely hiding the buffering.
There's also FileBackedOutputStream in Guava, automatically switching from using memory to using a file when getting big, but I'm afraid it's optimized for small sizes (with tens of gigabytes expected, there's no point in trying to use memory).
Are there alternative solutions?
If your consumer (the http client) is having trouble keeping up with the stream of data, you might want to look at a design where the client manages its own work in progress, pulling data from the server on demand.
RFC 7233 describes Range Requests:
devices with limited local storage might benefit from being able to request only a subset of a larger representation, such as a single page of a very large document, or the dimensions of an embedded image
HTTP Range requests on the MDN Web Docs site might be a more approachable introduction.
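As a rough illustration (the URL is hypothetical, and as noted above the vendor's server reportedly doesn't support ranges), a range request with the Java 11 HttpClient looks like this:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RangeDownload {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Ask the server for only the first 1 MB of the representation.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://example.com/bigdata"))
                .header("Range", "bytes=0-1048575")
                .build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        System.out.println("Status: " + response.statusCode() + " (206 = partial content)");
        System.out.println("Bytes received: " + response.body().length);
    }
}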
This is the sort of thing that queueing servers are made for. RabbitMQ, Kafka, Kinesis, any of those. Perhaps KStream would work. With everything you get from the HTTP server (given your constraint that it cannot be broken up into units of work), you could partition it into chunks of bytes of some reasonable size, maybe 1024kB. Your application would push/publish those records/messages to the topic/queue. They would all share some common series ID so you know which chunks match up, and each would need to carry an ordinal so they can be put back together in the right order; with a single Kafka partition you could probably rely upon offsets. You might publish a final record for that series with a "done" flag that would act as an EOF for whatever is consuming it. Of course, you'd send an HTTP response as soon as all the data is queued, though it may not necessarily be processed yet.
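A hedged sketch of that chunk-publishing idea with the Kafka producer API (topic name, chunk size and the empty-record "done" marker are assumptions; ordering relies on all records for one series landing in the same partition because they share a key):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.InputStream;
import java.util.Arrays;
import java.util.Properties;
import java.util.UUID;

public class ChunkPublisher {
    // Reads the HTTP body as a stream, publishes ~1 MB records keyed by a series ID,
    // and finishes with an empty record acting as the EOF marker for the consumer.
    public static void publish(InputStream httpBody) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        String seriesId = UUID.randomUUID().toString();
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] buffer = new byte[1024 * 1024];
            int read;
            while ((read = httpBody.read(buffer)) != -1) {
                byte[] chunk = Arrays.copyOf(buffer, read);
                producer.send(new ProducerRecord<>("big-payload", seriesId, chunk));
            }
            producer.send(new ProducerRecord<>("big-payload", seriesId, new byte[0])); // "done" marker
        }
    }
}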
not sure if this would help in your case because you haven't mentioned what structure & format the data are coming to you in, however, i'll assume a beautifully normalised, deeply nested hierarchical xml (ie. pretty much the worst case for streaming, right? ... pega bix?)
i propose a partial solution that could allow you to sidestep the limitation of your not being able to control how your client interacts with the http data server -
deploy your own webserver, in whatever contemporary tech you please (which you do control) - your local server will sit in front of your locally cached copy of the data
periodically download the output of the webservice using a built-in http querying library, a command-line util such as aria2c, curl, wget et al., an etl tool (or whatever you please) directly onto a local device-backed .xml file - this happens as often as it needs to
point your rest client to your own-hosted 127.0.0.1/modern_gigabyte_large/get... 'smart' server, instead of the old api.vendor.com/last_tested_on_megabytes/get... server
some thoughts:
you might need to refactor your data model to indicate that the xml webservice data that you and your clients are consuming was dated at the last successful run^ (ie. update this date when the next ingest process completes)
it would be theoretically possible for you to transform the underlying xml on the way through to better yield records in a streaming fashion to your webservice client (if you're not already doing this) but this would take effort - i could discuss this more if a sample of the data structure was provided
all of this work can run in parallel to your existing application, which continues on your last version of the successfully processed 'old data' until the next version 'new data' are available
^
in trade you will now need to manage a 'sliding window' of data files, where each 'result' is a specific instance of your app downloading the webservice data and storing it on disc, then successfully ingesting it into your model:
last (two?) good result(s) compressed (in my experience, gigabytes of xml packs down a helluva lot)
next pending/ provisional result while you're streaming to disc/ doing an integrity check/ ingesting data - (this becomes the current 'good' result, and the last 'good' result becomes the 'previous good' result)
if we assume that you're ingesting into a relational db, the current (and maybe previous) tables with the webservice data loaded into your app, and the next pending table
switching these around becomes a metadata operation, but now your database must store at least webservice data x2 (or x3 - whatever fits in your limitations)
... yes you don't need to do this, but you'll wish you did after something goes wrong :)
Looks like we're the only ones to use all of their data
this implies that there is some way for you to partition or limit the webservice feed - how are the other clients discriminating so as not to receive the full monty?
You can use in-memory caching techniques OR you can use Java 8 streams. Please see the following link for more info:
https://www.conductor.com/nightlight/using-java-8-streams-to-process-large-amounts-of-data/
Camel could maybe help you regulate the network load between the REST producer and consumer.
You might for instance introduce a Camel endpoint acting as a proxy in front of the real REST endpoint, and apply some throttling policy before forwarding to the real endpoint:
from("http://localhost:8081/mywebserviceproxy")
.throttle(...)
.to("http://myserver.com:8080/myrealwebservice);
http://camel.apache.org/throttler.html
http://camel.apache.org/route-throttling-example.html
My 2 cents,
Bernard.
If you have enough memory, maybe you can use an in-memory data store like Redis.
When you get data from your REST endpoint, you can save it into a Redis list (or any other data structure that is appropriate for you).
Your consumer will consume data from the list.
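A rough sketch with Jedis, assuming a simple list used as the buffer (key name and chunk format are made up):

import redis.clients.jedis.Jedis;
import java.util.List;

public class RedisBuffer {
    public static void main(String[] args) {
        try (Jedis producer = new Jedis("localhost", 6379);
             Jedis consumer = new Jedis("localhost", 6379)) {

            // Downloader side: push raw chunks as they arrive from the REST endpoint.
            producer.rpush("incoming-chunks", "chunk-1", "chunk-2");

            // Consumer side: pop chunks at its own pace (blocking pop with a 5 s timeout).
            List<String> item = consumer.blpop(5, "incoming-chunks");
            if (item != null) {
                // item.get(0) is the key, item.get(1) is the popped chunk
                System.out.println("Processing " + item.get(1));
            }
        }
    }
}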

JAVA: How to automatically write unique ID number to a .CSV file

I'm writing a Java desktop application which writes "ID, Name, Address, Phone number" into a .CSV file, then reads it and shows it in a JTable. The problem is that the ID needs to be a unique integer that is written automatically; every time you write, it must not be the same as any previously written ID. I tried creating a method that increases the ID number by 1 every time you click a button, but if you exit the program and run it again, the ID number starts from 0, as I initialized it to.
Edit: I'm new to programming.
The best option is to use an out-of-the-box solution: use the UUID.randomUUID() method. It gives you a unique ID.
Second option: you will have to write your last used ID into persistent storage (a file, DB or another option). So when your program starts, you read your last used ID and generate the next value. This way you can use a numeric sequence. If thread safety is an issue, you can use the class AtomicLong to store your value. (But it won't help if you run your app twice as two separate processes.)
Third option: use a timestamp; you can get it as a long. (Simple solution, no tracking of previous values needed.)
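A small sketch of the second option, persisting the last used ID to a file and handing out values through an AtomicLong (the file name is an assumption; as noted, this still doesn't protect against two separate processes):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicLong;

public class PersistentIdGenerator {
    private final Path file = Paths.get("last-id.txt"); // hypothetical counter file next to the CSV
    private final AtomicLong counter;

    public PersistentIdGenerator() throws IOException {
        // Resume from the last persisted value, or start at 0 on first run.
        long last = Files.exists(file)
                ? Long.parseLong(Files.readAllLines(file, StandardCharsets.UTF_8).get(0).trim())
                : 0L;
        counter = new AtomicLong(last);
    }

    public synchronized long nextId() throws IOException {
        long id = counter.incrementAndGet();
        Files.write(file, String.valueOf(id).getBytes(StandardCharsets.UTF_8)); // persist immediately
        return id;
    }
}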
There are essentially two approaches to this:
Use a UUID:
UUIDs are big random numbers. There is a chance that you'll get the same one twice, but the probability is so low as to be negligible, because the number space is so unimaginably huge.
Get one with java.util.UUID.randomUUID().
Use an atomic identifier source.
This is just something with a lock to prevent concurrent access, that emits unique numbers on request.
A very simple identifier generator uses synchronized to ensure atomicity:
public class UniqueIdGenerator {

    private long currentId;

    public UniqueIdGenerator(long startingId) {
        this.currentId = startingId;
    }

    public synchronized long getUniqueId() {
        return currentId++;
    }
}
(You can also use AtomicLong, or let a database engine take care of atomicity for you)
There are more advanced strategies for making this work in a distributed system -- for example, the generator could be accessible as a web service. But this is probably beyond the scope of your question.
You have to persist the last written ID, and there are many different ways you could do that.
Writing ID to a file
Writing ID to User-Preferences (maybe a windows-registry entry?)
You have to think about the uniqueness of the ID. What if you run the program as two different users on the same machine? What if you run your program on two different machines?
At the start of your application, and every time you manipulate (write) your .csv file, you could update your ID to start from max(IDs in your .csv) and then add 1 every time you create a new entry.
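A possible sketch of that approach, assuming the ID is the first comma-separated column and the file has no header row:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class CsvIdHelper {
    // Scans the CSV, finds the largest existing ID and returns max + 1.
    static long nextIdFrom(String csvPath) throws IOException {
        try (Stream<String> lines = Files.lines(Paths.get(csvPath))) {
            long maxId = lines
                    .filter(line -> !line.trim().isEmpty())
                    .mapToLong(line -> Long.parseLong(line.split(",", 2)[0].trim()))
                    .max()
                    .orElse(0L);
            return maxId + 1;
        }
    }
}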
You might consider using a small embedded database (e.g. Apache Derby) instead of writing .csv files. That might be a "cleaner" solution because you can use database operations to ensure that behaviour.
Best regards!
If the ID is required as a long and the environment is not multi-threaded, then System.nanoTime() can be used.
Otherwise for multi-threaded environments, there could be multiple solutions:
java.security.SecureRandom
java.util.UUID.randomUUID--> internally uses SecureRandom
File.createTempFile().getName() --> internally uses SecureRandom
If a numeric output is required, then String.hashCode() can be applied to the result of the above.

Optimizing RMI Service Response

I am writing a server-client application where best performance is a must. I am using RMI for server-client communication, and the server uses a MySQL database.
Now, on the client side, I have a method called
getLinks()
which invokes the same method on the server. The problem is that this method returns about 700 MB of data, which takes some time to get and some more time to analyse.
And then I'm setting some values for each Link:
for (Link l : myService.getLinks()) l.setSelected(false);
What I have in mind right now is just getting the Link IDs first (since this would be less data) and then using an asynchronous method to get each Link by ID (each Link needs one service call), and then setting the Link values.
Is this the best approach? Is there another way of getting RMI data one by one (one method call and more than one return)?
Is there something like (yield return) in C#?
You can also make a pagination method, which receives the initial ID (or position, if the IDs are not consecutive) and the length; this way you will not send all the IDs twice.
Are the Link objects remote objects? If not, I don't really see the point of the code, as it only sets something locally in the client object which is immediately thrown away.
Assuming they are remote objects, it would be better to ship the entire update to the server and tell it to update the whole collection, something like setLinksSelected(boolean), where the server does the iteration.
But I would also be wary of updating, or even transporting, 700 MB of data via RMI whichever way you do it. That's a lot of data.
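If the Links are remote objects, the remote interface could be extended along these lines (method names are hypothetical; Link is the domain class from the question):

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

public interface LinkService extends Remote {

    // Server-side bulk update: one small call instead of transferring every Link to the client.
    void setLinksSelected(boolean selected) throws RemoteException;

    // Pagination-style access if the client really does need the objects:
    // fetch a window at a time instead of the whole 700 MB collection.
    List<Link> getLinks(long fromId, int maxResults) throws RemoteException;
}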
