
Better Performance hget vs get || Using Redis
Option 1:
hset key field value  -- here the field ("dept") is always the same (constant) and the key could be 20 chars
hset "user1" "dept" 1
hset "user2" "dept" 2
hset "user3" "dept" 2
Option 2:
set key value  -- here the key could be 20 chars
set "{user1}dept" 1
set "{user2}dept" 2
set "{user3}dept" 3
Q1. In both cases, which get command will run faster, considering our database has millions of key-value pairs?
hget "user2" "dept" vs get "{user2}dept"
Q2. Is hset "user1" "dept" 1 equivalent to {"user1" : {"dept" : 1}} or to {"dept" : {"user1" : 1}}?
Q3. I want to set an expiry on both the key and the field, which is not possible with hset. Is there any alternative?
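For reference, here is how the two layouts look from Java; a minimal sketch assuming the Jedis client:

import redis.clients.jedis.Jedis;

public class DeptLayouts {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Option 1: one hash per user, with a constant "dept" field
            jedis.hset("user2", "dept", "2");
            String dept1 = jedis.hget("user2", "dept");

            // Option 2: one string key per user/field combination
            jedis.set("{user2}dept", "2");
            String dept2 = jedis.get("{user2}dept");

            System.out.println(dept1 + " " + dept2);
        }
    }
}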

Related

Can I perform a regex search on Redis values?

I tried using RediSearch, but there you can only perform a fuzzy search, whereas I need to perform a regex search like:
key: "12345"
value: { name: "Maruti"}
Searching "aru" should return "Maruti"; basically the regex formed is *aru*. Can anyone help me out with how I can achieve this using Redis?
This can be done, but I do not recommend it - performance will be greatly impacted.
If you must, however, you can use RedisGears for ad-hoc regex queries like so:
127.0.0.1:6379> HSET mykey name Maruti
(integer) 1
127.0.0.1:6379> HSET anotherkey name Moana
(integer) 1
127.0.0.1:6379> RG.PYEXECUTE "import re\np = re.compile('.*aru.*')\nGearsBuilder().filter(lambda x: p.match(x['value']['name'])).map(lambda x: x['key']).run()"
1) 1) "mykey"
2) (empty array)
Here's the Python code for readability:
import re
p = re.compile('.*aru.*')
GearsBuilder() \
    .filter(lambda x: p.match(x['value']['name'])) \
    .map(lambda x: x['key']) \
    .run()
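If RedisGears is not an option, the same filtering can be done client-side. Below is a minimal sketch assuming the Jedis client and that every scanned key is a hash with a name field; like the RedisGears approach, it walks the entire keyspace, so it is just as costly:

import java.util.regex.Pattern;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.ScanParams;
import redis.clients.jedis.ScanResult;

public class RegexScan {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(".*aru.*");
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            String cursor = ScanParams.SCAN_POINTER_START;
            do {
                // SCAN walks the keyspace incrementally without blocking the server
                ScanResult<String> page = jedis.scan(cursor, new ScanParams().count(100));
                for (String key : page.getResult()) {
                    String name = jedis.hget(key, "name");
                    if (name != null && p.matcher(name).matches()) {
                        System.out.println(key);
                    }
                }
                cursor = page.getCursor();
            } while (!ScanParams.SCAN_POINTER_START.equals(cursor));
        }
    }
}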

How to query directly from a kafka topic?

I've looked into Interactive Queries and KSQL, but I can't seem to figure out whether querying for specific records based on key is possible.
Say I have a record in a topic as shown:
{
  key: 12314,
  value: {
    id: "1",
    name: "bob"
  }
}
Would it be possible to search for key 12314 in a topic? Also, do KSQL and Interactive Queries consume the entire topic to run queries?
Assuming your value is valid JSON (i.e. the field names are also quoted) then you can do this easily with KSQL/ksqlDB:
Examine the Kafka topic in ksqlDB:
ksql> PRINT test3;
Format:JSON
1/9/20 12:11:35 PM UTC , 12314 , {"id": "1", "name": "bob" }
Declare the stream:
ksql> CREATE STREAM FOO (ID VARCHAR, NAME VARCHAR)
WITH (KAFKA_TOPIC='test3',VALUE_FORMAT='JSON');
Filter the stream as data arrives:
ksql> SELECT ROWKEY, ID, NAME FROM FOO WHERE ROWKEY='12314' EMIT CHANGES;
+----------------------------+----------------------------+----------------------------+
|ROWKEY |ID |NAME |
+----------------------------+----------------------------+----------------------------+
|12314 |1 |bob |
Everyone always forgets to add that you can use an interactive query only if the underlying dataset is small and can be materialized.
For example, you cannot efficiently find a message by key in a huge topic; at least I cannot find a way to do so.
Yes, you can do it with Interactive Queries.
You can create a Kafka Streams application that reads the input topic and builds a state store (in memory or RocksDB, kept in sync with Kafka).
This state store is queryable by key (ReadOnlyKeyValueStore).
There are multiple examples in the official documentation:
https://kafka.apache.org/10/documentation/streams/developer-guide/interactive-queries.html
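A minimal sketch of that approach, assuming Kafka Streams 2.5+ and treating the key and value as plain strings (topic and store names are made up for illustration):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class KeyLookup {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "key-lookup");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Materialize the topic into a local state store (RocksDB by default)
        StreamsBuilder builder = new StreamsBuilder();
        builder.table("test3",
                Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("records-store"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Thread.sleep(5000); // in real code, wait for the RUNNING state instead

        // Point lookup by key against the materialized store
        ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("records-store",
                        QueryableStoreTypes.keyValueStore()));
        System.out.println(store.get("12314"));
    }
}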

Data structure to store HashMap in Druid

I am a newbie with Druid. My problem is how to store and query a HashMap in Druid, using Java to interact with it.
I have a network table as follows:
Network  f1  f2  f3  ...  fn
value    1   3   2   ...  2
Additionally, I have a range-time table:
time           impression
2016-08-10-00  1000
2016-08-10-00  3000
2016-08-10-00  4000
2016-08-10-00  2000
2016-08-10-00  8000
In Druid, can I store the range-time table as a HashMap and query both of the tables above with a statement like:
Filter f1 = 1 and f2 = 1 and range-time between [t1, t2].
Can anyone help me? Thanks so much.
#VanThaoNguye,
Yes, you can store the HashMaps in Druid, and you can query them with bound filters.
You can read more about bound filters here: http://druid.io/docs/latest/querying/filters.html#bound-filter
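For illustration, the predicate from the question could be expressed in Druid's native JSON query language roughly like this. This is a hypothetical sketch: the dimension names f1, f2 and range_time are assumed from the question, and t1/t2 stand in for the real bounds:

{
  "type": "and",
  "fields": [
    { "type": "selector", "dimension": "f1", "value": "1" },
    { "type": "selector", "dimension": "f2", "value": "1" },
    { "type": "bound", "dimension": "range_time", "lower": "t1", "upper": "t2" }
  ]
}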

Smallest possible value to add to array?

From what I've read, you can store bytes in an array in PHP with this sort of command:
$array = [1,2,14,10];
I have four basic flags I want to add to each array value, something like 0000. If the user performed an action that unlocked the 3rd flag, the value should look like 0010. If all flags are set, the value would look like 1111.
I plan on having a lot of these types of array values, so I was wondering: what is the smallest possible value I could put into an array that is also Java-friendly? After the data is stored in PHP, I'll need to get the array in Java and be able to retrieve these flags. That might look something like:
somevar array = array_from_php;
if (array[0][flag3] == 1) // Player has unlocked this flag
    /* do something */
Any advice is greatly appreciated.
I think you don't want an array but a byte (8 bit), a word (16 bit) or a dword (32 bit) to store your flags in RAM, or persistently in a DB or text file.
Since PHP is not a type-safe language, you cannot declare those types as far as I know.
But you inspired me: PHP's error_reporting value is stored like this, although I think it is a full integer rather than just a byte, word or dword.
I did a little test and it seems to work:
<?php
// Flag definitions
$defs = array(
    "opt1" => 1,
    "opt2" => 2,
    "opt3" => 4,
    "opt4" => 8,
    "opt5" => 16,
    "opt6" => 32
);

// Enable flags 1, 3 and 4 using the bitwise "OR" operator
$test = $defs["opt1"] | $defs["opt3"] | $defs["opt4"];
displayFlags($test, $defs);

// Enable flag 6, too
$test |= $defs["opt6"];
displayFlags($test, $defs);

// Disable flag 3
$test &= ~$defs["opt3"];
displayFlags($test, $defs);

// Little improvement: the enableFlag/disableFlag functions
enableFlag($test, $defs["opt5"]);
displayFlags($test, $defs);
disableFlag($test, $defs["opt5"]);
displayFlags($test, $defs);

function displayFlags($storage, $defs) {
    echo "The current storage value is: ".$storage;
    echo "<br />";
    foreach ($defs as $k => $v) {
        $isset = (($storage & $v) === $v);
        echo "Flag \"$k\" : ".(($isset) ? "Yes" : "No");
        echo "<br />";
    }
    echo "<br />";
}

function enableFlag(&$storage, $def) {
    $storage |= $def;
}

function disableFlag(&$storage, $def) {
    $storage &= ~$def;
}
The output is:
The current storage value is: 13
Flag "opt1" : Yes
Flag "opt2" : No
Flag "opt3" : Yes
Flag "opt4" : Yes
Flag "opt5" : No
Flag "opt6" : No
The current storage value is: 45
Flag "opt1" : Yes
Flag "opt2" : No
Flag "opt3" : Yes
Flag "opt4" : Yes
Flag "opt5" : No
Flag "opt6" : Yes
The current storage value is: 41
Flag "opt1" : Yes
Flag "opt2" : No
Flag "opt3" : No
Flag "opt4" : Yes
Flag "opt5" : No
Flag "opt6" : Yes
The current storage value is: 57
Flag "opt1" : Yes
Flag "opt2" : No
Flag "opt3" : No
Flag "opt4" : Yes
Flag "opt5" : Yes
Flag "opt6" : Yes
The current storage value is: 41
Flag "opt1" : Yes
Flag "opt2" : No
Flag "opt3" : No
Flag "opt4" : Yes
Flag "opt5" : No
Flag "opt6" : Yes
Conclusion:
I think this is the most efficient way to store flags in a minimum of space. But if you store the value like this in a database, you may have trouble querying those flags efficiently: bitwise operators can be used in queries (e.g. WHERE flags & 4 in MySQL), but such a predicate cannot use an ordinary index. However, I love this way of saving data.
Java also has a byte[], which will be the smallest storage as well.
With that said, I believe you can find what you are looking for in this post: Store binary sequence in byte array?
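On the Java side, reading the same integer back and testing individual bits is straightforward. A minimal sketch; the constant names are made up, and the values must match the PHP flag definitions:

public class PlayerFlags {
    // Must mirror the PHP definitions: opt1 = 1, opt2 = 2, opt3 = 4, opt4 = 8
    static final int OPT1 = 1;
    static final int OPT2 = 1 << 1;
    static final int OPT3 = 1 << 2;
    static final int OPT4 = 1 << 3;

    public static void main(String[] args) {
        int storage = 13; // value received from PHP, i.e. opt1 | opt3 | opt4

        if ((storage & OPT3) != 0) {
            // Player has unlocked this flag
            System.out.println("opt3 is set");
        }
    }
}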

Hadoop - Analyze log file (Java)

Logfile looks like this:
Time stamp,activity,-,User,-,id,-,data
--
2013-01-08T16:21:35.561+0100,reminder,-,User1234,-,131235467,-,-
2013-01-02T15:57:24.024+0100,order,-,User1234,-,-,-,{items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
2013-01-08T16:21:35.561+0100,login,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,reminder,-,User45687,-,143435467,-,-
2013-01-08T16:21:35.561+0100,order,-,User45687,-,-,-,{items:[{"prd":"1315467","count": 5, "amount": 11.6},{"prd": "133545", "count": 1, "amount": 55.99}], oid: 5556}
...
...
Edit
A concrete example from this log:
User1234 got a reminder with id=131235467; after this he made an order with the following data: {items:[{"prd":"131235467","count": 5, "amount": 11.6},{"prd": "13123545", "count": 1, "amount": 55.99}], oid: 5556}
In this case the id and the prd in the data are the same, so I want to sum up count*amount (in this case 5*11.6 = 58) and output it like:
User1234 Prdsum: 58
User45687 also made an order, but he didn't receive a reminder, so his data is not summed up.
Output:
User45687 Prdsum: 0
Final output for this log:
User1234 Prdsum: 58
User45687 Prdsum: 0
My question is: how can I compare these values (the id and the prd in the data)?
The key is the user. Would a custom Writable be useful (value = (id, data))? I need some ideas.
I recommend getting the raw output sum as you are doing as the result of the first pass of one Hadoop job, so at the end of the Hadoop job, you have a result like this:
User1234 Prdsum: 58
User45687 Prdsum: 0
and then have a second Hadoop job (or standalone job) that compares the various values and produces another report.
Do you need "state" as part of the first Hadoop job? If so, then you will need to keep a HashMap or HashTable in your mapper or reducer that stores the values of all the keys (users in this case) to compare - but that is not a good setup, IMHO. You are better off just doing an aggregate in one Hadoop job, and doing the comparison in another.
One way to achieve this is by using a composite key.
The mapper output key is a combination of userid and an event id (reminder -> 0, order -> 1). Partition the data by userid, and write your own comparator.
Here is the gist.
Mapper
for every event, check the event type
    if event type is "reminder"
        emit : <User1234, 0> <reminder id>
    if event type is "order"
        split if you have multiple orders
        for every order
            emit : <User1234, 1> <prd, count*amount, other interesting fields>
Partition by userid so that all entries with the same userid go to the same reducer.
Reducer
At the reducer, all entries are grouped by userid and sorted by event id (i.e. first you get all reminders for a given userid, followed by its orders).
If `eventid` is 0
    add the reminder id to a set (`reminderSet`)
If `eventid` is 1 && prd is in `reminderSet`
    emit : `<userid> <prdsum>`
else
    emit : `<userid> <0>`
More details on composite keys can be found in 'Hadoop: The Definitive Guide' or here.
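For the composite key itself, here is a minimal sketch of a custom WritableComparable; the class and field names are made up for illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

// Composite key: userid plus event id (0 = reminder, 1 = order)
public class UserEventKey implements WritableComparable<UserEventKey> {
    private String userId;
    private int eventId;

    public UserEventKey() {
        // no-arg constructor required by Hadoop for deserialization
    }

    public UserEventKey(String userId, int eventId) {
        this.userId = userId;
        this.eventId = eventId;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(userId);
        out.writeInt(eventId);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        userId = in.readUTF();
        eventId = in.readInt();
    }

    // Sort by userId first, then eventId, so reminders (0) reach the
    // reducer before orders (1) for the same user.
    @Override
    public int compareTo(UserEventKey other) {
        int cmp = userId.compareTo(other.userId);
        return cmp != 0 ? cmp : Integer.compare(eventId, other.eventId);
    }
}

You would pair this with a Partitioner that hashes on userId only, and a grouping comparator that also compares only userId, so that all events for a user arrive in a single reduce call.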
