Generate a random number within a range (0-100k) in a cluster environment - Java

I have to generate random numbers within a range (0-100,000) in a cluster environment (many stateless Java-based app servers + MongoDB), so every user request will get some unique number and will keep it over the next few requests.
As I understand it, I have a few options:
1. Have some number persisted in Mongo and incrementAndGet it - but it's not atomic - bad choice.
2. Use Redis - it's atomic and supports counters.
3. Any other idea? Is it safe to use a UUID and restrict it to a range?
4. Hazelcast?
Any other thoughts?
Thanks

I would leverage the existing MongoDB infrastructure and use the MongoDB findAndModify command to do an atomic increment and get operation.
In the shell, the command would look like this:
var result = db.ids.findAndModify( {
    query: { _id: "counter" },
    sort: { rating: 1 },
    new: true,
    update: { $inc: { counter: 1 } },
    upsert: true
} );
The 'new: true' option returns the document after the update; 'upsert: true' creates the document if it is missing.
The 10gen supported driver and the Asynchronous Driver both contain helper methods/builders for the find and modify command.
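For example, with a current MongoDB Java driver the same atomic increment-and-get can be done with findOneAndUpdate. A minimal sketch, assuming the counter document from the shell example above; the connection string and database name are placeholders:

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.FindOneAndUpdateOptions;
import com.mongodb.client.model.ReturnDocument;
import com.mongodb.client.model.Updates;
import org.bson.Document;

public class CounterExample {
    public static void main(String[] args) {
        // Connection string and database name are placeholders.
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> ids = client.getDatabase("test").getCollection("ids");

            // Atomically increment the counter and return the updated document,
            // creating it on first use (mirrors the shell's new/upsert flags).
            Document result = ids.findOneAndUpdate(
                    Filters.eq("_id", "counter"),
                    Updates.inc("counter", 1L),
                    new FindOneAndUpdateOptions()
                            .upsert(true)
                            .returnDocument(ReturnDocument.AFTER));

            long next = ((Number) result.get("counter")).longValue();
            System.out.println("Next id: " + next);
        }
    }
}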

Changing alias in ElasticSearch returns 200 and acknowledged but does not change alias

Using Elasticsearch 8.4.3 with Java 17 and a cluster of 3 nodes, all 3 master-eligible, we start with the following situation:
an index products-2023-01-12-0900 which has the alias current-products.
We then start a job that creates a new index products-2023-01-12-1520 and, at the end, using elastic-rest-client on the client side and the alias API, we make this call:
At 2023-01-12 16:27:26,893:
POST /_aliases
{
  "actions": [
    {
      "remove": {
        "alias": "current-products",
        "index": "products-*"
      }
    },
    {
      "add": {
        "alias": "current-products",
        "index": "products-2023-01-12-1520"
      }
    }
  ]
}
And we get the following response 26 ms later, with HTTP response code 200:
{"acknowledged":true}
But looking at what we end up with, the old index still has the current-products alias.
I don't understand why this happens, and it does not happen 100% of the time (it happened 2 times out of around 10 indexations).
Is this a known bug, or regular behaviour?
Edit for #warkolm:
GET /_cat/aliases?v before indexation as of now:
alias index filter routing.index routing.search is_write_index
current-products products-2023-01-13-1510 - - - -
It appears that there might be an issue with the way you are updating the alias. When you perform a POST request to the _aliases endpoint with the "remove" and "add" actions, Elasticsearch will update the alias based on the current state of the indices at the time the request is executed.
However, it is possible that there are other processes or actions that are also modifying the indices or aliases at the same time, and this can cause conflicts or inconsistencies. Additionally, when you use the wildcard character (*) in the "index" field of the "remove" action, it will remove the alias from all indices that match the pattern, which may not be the intended behavior.
To avoid this issue, keep the single _aliases request (the aliases API applies all actions in one atomic update, so the alias only changes if every action succeeds), but instead of using the wildcard character, explicitly specify the index that you want to remove the alias from.
Here is an example of how you could use the Indices Aliases API to update the alias:
POST /_aliases
{
  "actions": [
    { "remove": { "index": "products-2023-01-12-0900", "alias": "current-products" } },
    { "add": { "index": "products-2023-01-12-1520", "alias": "current-products" } }
  ]
}
This way, the alias will only be removed from the specific index "products-2023-01-12-0900" and added to the specific index "products-2023-01-12-1520". This can help avoid any conflicts or inconsistencies that may be caused by other processes or actions that are modifying the indices or aliases at the same time.
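Since the indexing job runs on Java 17 and already talks to the cluster through the REST client, the same explicit swap can be issued from code. A rough sketch using the low-level RestClient (host and port are placeholders; the index names follow the example above):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class AliasSwap {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request request = new Request("POST", "/_aliases");
            // Both actions are applied as a single atomic cluster-state update:
            // the alias moves from the old index to the new one, or not at all.
            request.setJsonEntity("""
                {
                  "actions": [
                    { "remove": { "index": "products-2023-01-12-0900", "alias": "current-products" } },
                    { "add":    { "index": "products-2023-01-12-1520", "alias": "current-products" } }
                  ]
                }
                """);
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}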
Additionally, it is worth staying on a recent Elasticsearch release; later patch versions contain bug fixes that could be relevant if the problem persists.
In conclusion, what you are encountering may not be a known bug; it can be regular behaviour when multiple processes modify the indices or aliases at the same time, and specifying the exact index to remove the alias from (rather than a wildcard) can help avoid the issue.

Ripple XRP Ledger - Can't get transaction validated (Testnet)

I'm using the testnet to validate my transaction. Transaction:
{"transaction":"ECAB482EB34177FA1B1E6C724F038C42308004B1F307A169FAEA88C825E11642","command":"tx","id":0}
Response:
{id=0, status='success', errorMessage='null', result=TxResult{validated=false}}
I'm using a websocket with the 'tx' method to check. What is the best course of action to figure out the problem? Is there a way to see the reason this is not validated on some of the testnet validators?
I'm connected to wss://s.altnet.rippletest.net:51233, and the address I use is rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF. Can someone help?
Fee is at 1 000 000 drops. This is the transaction blob 1200002200000000240000000061D4838D7EA4C680000000000000000000000000005553440000000000C882FD6AB9862C4F90E57E1BA15C248CABAD5BF96840000000000F42407321033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D15247446304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A498114C882FD6AB9862C4F90E57E1BA15C248CABAD5BF983148DC6B336C7D3BE007297DB086B1D3483DEA24C2A
Is my transaction faulty? Then why was it correctly submitted to the network? It seems valid, but why is it not validated, and hence finalized, in the ledger?
Note: responses use my internal model to represent some properties, which is why names might be slightly different and some properties are omitted.
Result from 'submit' call :
Result :SubmitResult{engineResult='tefPAST_SEQ', engineResultCode=-190, engineResultMessage='This sequence number has already passed.', txBlob='1200002200000000240000000061D4838D7EA4C680000000000000000000000000005553440000000000C882FD6AB9862C4F90E57E1BA15C248CABAD5BF96840000000000F42407321033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D15247446304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A498114C882FD6AB9862C4F90E57E1BA15C248CABAD5BF983148DC6B336C7D3BE007297DB086B1D3483DEA24C2A', txJson=TxJson{transactionType='Payment', account='rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF', destination='rDveJyEotoUp9jCD1Ghi2ktEBnhHiA6RBB', amount=Amount{currency='USD', value=1, issuer='rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF'}, fee='1000000', flags=0, sequence=0, signingPubKey='033BF063167F21FF6C01045B4E2F03F519879B552D2611F0E885E01F08C88D1524', txnSignature='304402202E90609AAFBF4C105408CFF2377D48085879BEE3C7DE57AF125F73926284362A022002D7A487F5929F9A3E1050FC2B5D6AE1DD5384647AD1ABF6D322765F0ABE0A49', hash='ECAB482EB34177FA1B1E6C724F038C42308004B1F307A169FAEA88C825E11642'}}
I submitted it a few times, so 'tefPAST_SEQ' is present.
Looks like your transaction object has a sequence field set in it.
According to THIS, the sequence can be auto-filled. It can be set manually in case you want to submit multiple transactions at once, incrementing it yourself.
That gives you control over the order in which the transactions are executed.
If that doesn't matter, you can simply go without setting the sequence.
In your case your account looks like this (using account_info):
{
  "result": {
    "account_data": {
      "Account": "rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF",
      "Balance": "10000000000",
      "Flags": 0,
      "LedgerEntryType": "AccountRoot",
      "OwnerCount": 0,
      "PreviousTxnID": "12CA4E5AAF4198155FF3F16E53D35353B051F4AB5E01749833202339B48D187A",
      "PreviousTxnLgrSeq": 11450559,
      "Sequence": 1,
      "index": "169B6BA91A54B2EC86EFB618995A59E76F07853BB88AF231776118339FFD7268"
    },
    "ledger_hash": "449E3420C6B1C6959FA794066264432EF4E98543B0C6582B00D6CD28DE33B8F8",
    "ledger_index": 11523855,
    "status": "success",
    "validated": true
  }
}
See the result.account_data.Sequence being 1?
The reason you're seeing 'This sequence number has already passed' is that you've set sequence=0 in your transaction (as shown in the result of the 'submit' call).
On a side note, I see you've set currency='USD', which means you have to open a trust line first; your account currently has 0 account_lines.
Either way, good luck using XRP ;)
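As an aside, if you want to check the account's current Sequence programmatically before submitting, you can send an account_info command over the same testnet websocket. A minimal Java 11+ sketch (the request id and "current" ledger_index are just illustrative choices):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CountDownLatch;

public class AccountInfoCheck {
    public static void main(String[] args) throws Exception {
        CountDownLatch done = new CountDownLatch(1);

        WebSocket.Listener listener = new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                // result.account_data.Sequence in this response is the value the
                // next Payment's "Sequence" field has to use.
                System.out.println(data);
                if (last) {
                    done.countDown();
                }
                ws.request(1);
                return null;
            }
        };

        WebSocket ws = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create("wss://s.altnet.rippletest.net:51233"), listener)
                .join();

        ws.sendText("{\"id\":1,\"command\":\"account_info\","
                + "\"account\":\"rKHDh61BpcojAoiATgJgDaVwdSJ64fGNwF\","
                + "\"ledger_index\":\"current\"}", true);

        done.await();
        ws.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
    }
}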

Saving a JavaRDD in Elasticsearch using the ES-Hadoop connector

Currently working on a transformation project where I need to feed data from Oracle into Elasticsearch. My workflow goes like this:
1. Sqoop - from Oracle
2. Java Spark - DataFrame joins, then saving the results into Elasticsearch
And my Elasticsearch document will look like:
{
  Field 1: Value
  Field 2: value
  Field 3: Value
  Field 4: [          -- array of maps
    {
      Name: Value
      Age: Value
    },
    {
      Name: Value
      Age: Value
    }
  ]
  Field 5: {          -- map
    Code: Value
    Key: Value
  }
}
So I would like to know how to form a JavaRDD for the above structure.
I have coded up to the DataFrame join and got stuck; I am unable to proceed from there.
So I want my data in normalized form.
My Spark code:
DataFrame esDF = df.select(
    df.col("Field1"), df.col("Field2"), df.col("Field3"),
    df.col("Name"), df.col("Age"),
    df.col("Code"), df.col("Key")
);
Please help.
A few options:
1 - Use the saveToEs method on the DataFrame itself. (This may not be supported in older versions; it works for elasticsearch-spark-20_2.11-5.1.1.jar.)
import org.apache.spark.sql.SQLContext._
import org.apache.spark.sql.functions._
import org.elasticsearch.spark.sql._

dataFrame.saveToEs("<index>/<type>", Map("es.nodes" -> "<ip:port>"))
2 - Create a case class and save an RDD of it. (Works for older versions as well.)
import org.elasticsearch.spark._

case class ESDoc(...)
val rdd = df.map(row => ESDoc(...))
rdd.saveToEs("<index>/<type>", Map("es.nodes" -> "<ip:port>"))
3 - With older versions of Scala (< 2.11), you will be stuck with the 22-field limit on case classes. Note that you can use a Map instead of a case class:
import org.elasticsearch.spark._

val rdd = df.map(row => Map(<key> -> <value>, ...))
rdd.saveToEs("<index>/<type>", Map("es.nodes" -> "<ip:port>")) // saves RDD[Map[K, V]]
For all of the above methods, you may want to set es.batch.write.retry.count to an appropriate value, or to -1 (infinite retries) if you have another way of controlling the lifecycle of the EMR cluster (making sure it won't run forever):
val esOptions = Map("es.nodes" -> "<host>:<port>", "es.batch.write.retry.count" -> "-1")
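Since the original question is specifically about Java Spark, here is a rough Java sketch of the same idea using the connector's JavaEsSpark helper. It maps each joined row into a java.util.Map shaped like the document in the question (Field4 as a list of maps, Field5 as a nested map) and saves the resulting JavaRDD. The index/type name and node address are placeholders, and grouping several child rows per parent into Field4 (e.g. via a groupBy/collect_list step before this) is elided:

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class SaveJoinedRows {

    // Shape one joined row like the document in the question.
    static Map<String, Object> toDoc(Row row) {
        Map<String, Object> person = new HashMap<>();
        person.put("Name", row.getAs("Name"));
        person.put("Age", row.getAs("Age"));
        // Field4 is an array of maps; only one child entry is shown here.
        List<Map<String, Object>> field4 = Arrays.asList(person);

        Map<String, Object> field5 = new HashMap<>();
        field5.put("Code", row.getAs("Code"));
        field5.put("Key", row.getAs("Key"));

        Map<String, Object> doc = new HashMap<>();
        doc.put("Field1", row.getAs("Field1"));
        doc.put("Field2", row.getAs("Field2"));
        doc.put("Field3", row.getAs("Field3"));
        doc.put("Field4", field4);
        doc.put("Field5", field5);
        return doc;
    }

    static void save(Dataset<Row> joined) {
        JavaRDD<Map<String, Object>> docs = joined.javaRDD().map(SaveJoinedRows::toDoc);

        Map<String, String> esConf = new HashMap<>();
        esConf.put("es.nodes", "localhost:9200");      // placeholder
        esConf.put("es.batch.write.retry.count", "3"); // see the note above

        JavaEsSpark.saveToEs(docs, "myindex/mytype", esConf);
    }
}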

How to change SortOrder to avoid "unsupported collating sort order" error?

I've been working on a program with a .mdb database from a third-party client. Everything was fine until I tried to update elements in the database. The sort order field is not correct; I've tried to change it to General with MS Access, with no luck. The message I get when I execute the update query is:
java.lang.IllegalArgumentException: Given index Index#150ab4ed[
name: (EXART) PrimaryKey
number: 2
isPrimaryKey: true
isForeignKey: false
data: IndexData#3c435123[
dataNumber: 2
pageNumber: 456
isBackingPrimaryKey: true
isUnique: true
ignoreNulls: false
columns: [
ReadOnlyColumnDescriptor#50fe837a[
column: Column#636e8cc[
name: (EXART) ARCodArt
type: 0xa (TEXT)
number: 0
length: 30
variableLength: true
compressedUnicode: true
textSortOrder: SortOrder[3082(0)]
]
flags: 1
]
]
initialized: false
pageCache: IndexPageCache#3a62c01e[
pages: (uninitialized)
]
]
] is not usable for indexed lookups due to unsupported collating sort order SortOrder[3082(0)] for text index
at com.healthmarketscience.jackcess.impl.IndexCursorImpl.createCursor(IndexCursorImpl.java:111)
at com.healthmarketscience.jackcess.CursorBuilder.toCursor(CursorBuilder.java:302)
at net.ucanaccess.commands.IndexSelector.getCursor(IndexSelector.java:150)
at net.ucanaccess.commands.CompositeCommand.persist(CompositeCommand.java:83)
at net.ucanaccess.jdbc.UcanaccessConnection.flushIO(UcanaccessConnection.java:268)
at net.ucanaccess.jdbc.UcanaccessConnection.commit(UcanaccessConnection.java:169)
at cultifortgestio.EntradaEixidaDades.Insercio(EntradaEixidaDades.java:76)
As you can see, Access does not change the sort order at all; I think it should be 1033, but it remains 3082. Is there a way to change this? As I said, changing it in Access and performing a Compact and Repair Database didn't work for me.
As with other similar situations, the solution was to change the sort order of the affected database. This is usually done by
opening the database in Access,
changing the "New database sort order" (see screenshot below) to "General - Legacy", and then
performing a Compact and Repair Database operation.
However, the wrinkle in this case was that the Windows locale was set to "Spanish", so the "General" sort options in Access do not map to a value that UCanAccess (Jackcess, actually) can update. The solution for the asker was to temporarily change their Windows locale to "English ...", perform the above steps to change the database sort order, and then change the Windows locale back.
For those who would prefer not to mess with their Windows locale settings, an alternative solution would be to have UCanAccess create a new empty database file via the newDatabaseVersion option, e.g.,
String connStr = "jdbc:ucanaccess://C:/someplace/new.accdb;newDatabaseVersion=V2010";
try (Connection conn = DriverManager.getConnection(connStr)) {
    // Simply opening the connection creates the new (empty) database file.
}
open the new database in Access, and then transfer the tables from the old database file into the new one using the Import feature. The database file created by UCanAccess will have a sort order that is compatible with update operations.

Inserting Analytic data from Spark to Postgres

I have a Cassandra database from which I analyzed the data using Spark SQL through Apache Spark. Now I want to insert that analyzed data into PostgreSQL. Is there any way to achieve this directly, apart from using the PostgreSQL driver? (I achieved it using postREST and the driver; I want to know whether there are any methods like saveToCassandra().)
At the moment there is no native implementation of writing the RDD to any DBMS. Here are the links to the related discussions in the Spark user list: one, two
In general, the most performant approach would be the following:
Validate the number of partitions in the RDD; it should be neither too low nor too high. 20-50 partitions should be fine; if the number is lower, call repartition with 20 partitions, and if higher, call coalesce to 50 partitions.
Call the mapPartition transformation and, inside it, call a function that inserts the records into your DBMS using JDBC. In this function you open the connection to your database and use the COPY command with this API; it eliminates the need for a separate command for each record, so the insert is processed much faster.
This way you would insert the data into Postgres in a parallel fashion, utilizing up to 50 parallel connections (depending on your Spark cluster size and its configuration). The whole approach might be implemented as a Java/Scala function accepting the RDD and the connection string.
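A rough Java sketch of that per-partition COPY idea, using foreachPartition (the action counterpart of mapPartitions, since the write is a side effect). The JDBC URL, credentials, and table name are placeholders, and the RDD is assumed to already contain CSV-formatted lines:

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyToPostgres {

    // One connection and one COPY per partition; url, credentials and table are placeholders.
    static void save(JavaRDD<String> csvLines) {
        csvLines.foreachPartition((Iterator<String> rows) -> {
            StringBuilder buffer = new StringBuilder();
            while (rows.hasNext()) {
                buffer.append(rows.next()).append('\n');
            }
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/mydb", "user", "password")) {
                CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
                // Streams the whole partition into the table in a single round trip.
                copy.copyIn("COPY my_table FROM STDIN WITH CSV", new StringReader(buffer.toString()));
            }
        });
    }
}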
You can use the Postgres copy API to write it; it's much faster that way. See the following two methods: one iterates over the RDD to fill a buffer that can be saved by the copy API. The only thing you have to take care of is creating a correct statement in CSV format that will be used by the copy API.
import java.io.StringReader
import java.sql.SQLException

import scala.collection.mutable

import org.apache.spark.rdd.RDD
import org.postgresql.PGConnection

def saveToDB(rdd: RDD[Iterable[EventModel]]): Unit = {
  val sb = mutable.StringBuilder.newBuilder
  val now = System.currentTimeMillis()
  // Collect the rows on the driver and render each event as one CSV line.
  rdd.collect().foreach(itr => {
    itr.foreach(_.createCSV(sb, now).append("\n"))
  })
  copyIn("myTable", new StringReader(sb.toString), "statement")
  sb.clear
}

def copyIn(tableName: String, reader: java.io.Reader, columnStmt: String = "") = {
  val conn = connectionPool.getConnection()
  try {
    // COPY ... FROM STDIN streams the CSV buffer into the table in one round trip.
    conn.unwrap(classOf[PGConnection]).getCopyAPI.copyIn(s"COPY $tableName $columnStmt FROM STDIN WITH CSV", reader)
  } catch {
    case se: SQLException => logWarning(se.getMessage)
    case t: Throwable => logWarning(t.getMessage)
  } finally {
    conn.close()
  }
}
The answer by 0x0FFF is good. Here is an additional point that would be useful.
I use foreachPartition to persist to an external store. This is also in line with the "Design Patterns for using foreachRDD" section of the Spark documentation:
https://spark.apache.org/docs/1.3.0/streaming-programming-guide.html#output-operations-on-dstreams
Example:
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection) // return to the pool for future reuse
  }
}
The answers above refer to old Spark versions; in Spark 2.x there is a JDBC connector that enables writing directly to an RDBMS from a DataFrame.
example:
jdbcDF2.write.jdbc("jdbc:postgresql:dbserver", "schema.tablename",
properties={"user": "username", "password": "password"})
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
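The snippet above is the Python form (note the properties= keyword argument); a roughly equivalent Java version, with the URL, table, and credentials as placeholders, would be:

import java.util.Properties;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

public class WriteJdbc {
    static void save(Dataset<Row> df) {
        Properties props = new Properties();
        props.setProperty("user", "username");
        props.setProperty("password", "password");
        props.setProperty("driver", "org.postgresql.Driver");

        // Appends the DataFrame's rows to schema.tablename over JDBC.
        df.write()
          .mode(SaveMode.Append)
          .jdbc("jdbc:postgresql://dbserver:5432/mydb", "schema.tablename", props);
    }
}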
