I have a blacklist stored in a table that contains approximately 1.5 billion records. My goal is to load the records into a HashSet so my program can later check if domain names are blacklisted (this is not the entire functionality of the program, just a piece). I currently have the following code to load the records:
HashSet<String> list = new HashSet<String>();
try {
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(100000000);
    try {
        ResultSet rs = stmt.executeQuery("SELECT DNname FROM " + table);
        try {
            while (rs.next()) {
                list.add(rs.getString(1));
            }
        } finally {
            rs.close();
        }
    } finally {
        stmt.close();
    }
} catch (SQLException e) {
    System.out.println("Error loading blacklist from DB");
    e.printStackTrace();
}
However, this takes incredibly long to complete. Is there a more efficient way to accomplish my goal?
"this takes incredibly long to complete" - what takes long to complete? How long does the query take if you run it from a DB client console?
You've not provided any information on what indexes are present in the table, so that's something you'll want to look into.
setFetchSize is only a hint to the JDBC driver; if you want to cap how much is fetched, you really want a WHERE clause that limits the records returned. Oracle can do that using ROWNUM (see the sketch below); for other databases, you'll have to look up the equivalent yourself.
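For example, a capped fetch on Oracle could look like the following (a sketch only; the blacklist table name and the cutoff of 100,000 rows are illustrative, the DNname column is from the question):

SELECT DNname
  FROM blacklist
 WHERE ROWNUM <= 100000;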
That said, I concur with @ScaryWombat; keeping 1.5 billion records in memory is not a scalable design. An alternative would be to build up a client-side cache by caching the database lookup for each domain (sketched below); in practice, the cache will hold far fewer than 1.5 billion entries.
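A minimal sketch of that idea, assuming a Connection like the one in the question and a hypothetical blacklist table with a DNname column; the LRU bound of one million entries is an arbitrary assumption to tune against available heap:

import java.sql.*;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlacklistCache {
    private static final int MAX_ENTRIES = 1000000; // arbitrary cap, tune to heap size

    private final Connection conn;
    // simple LRU cache: access-ordered map that evicts the eldest entry past the cap
    private final Map<String, Boolean> cache =
            new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    public BlacklistCache(Connection conn) {
        this.conn = conn;
    }

    public boolean isBlacklisted(String domain) throws SQLException {
        Boolean cached = cache.get(domain);
        if (cached != null) {
            return cached;
        }
        // cache miss: ask the database about this single domain only
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT 1 FROM blacklist WHERE DNname = ?")) {
            ps.setString(1, domain);
            try (ResultSet rs = ps.executeQuery()) {
                boolean hit = rs.next();
                cache.put(domain, hit);
                return hit;
            }
        }
    }
}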
Usually a SQLite db is read from a file. Each operation can require a file I/O operation if the db is large enough, and this can be slow.
However, SQLite provides a way to load the db into memory, where it can be accessed using a JDBC url which looks something like jdbc:memory...
For example, this question describes how to achieve this using python: How to load existing db file to memory in Python sqlite3?
However I can't figure out how to achieve the same thing in Java or Scala using JDBC.
Any ideas?
I just tried with Xerial's sqlite-jdbc-3.27.2.1.jar and it appears that they let us restore from a native (binary) SQLite database file into a :memory: database like so:
try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
    try (Statement st = conn.createStatement()) {
        st.execute("restore from C:\\__tmp\\thing.sqlite");
        // let's see if we can read one of the tables
        try (ResultSet rs = st.executeQuery("SELECT * FROM Table1")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
} catch (Throwable e) {
    e.printStackTrace(System.err);
}
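If I remember correctly, the same driver also supports the reverse direction, so after working in memory you should be able to persist the database back to a file (the path here is illustrative):

st.execute("backup to C:\\__tmp\\thing-copy.sqlite");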
So, for a school project, I am building a Discord bot. One of the features I have built in is that it can retrieve gif links from a MySQL database and send them in a message. My issue is that I can only retrieve one record from the database, and no others. If I put the same query into MySQL Workbench and run it, it retrieves those records.
This is the method for retrieving the gifs:
public static ArrayList<Gif> GetGifsFromDB(String msg){
    ArrayList<Gif> gifs = new ArrayList<>();
    try(Connection conn = (Connection)DriverManager.getConnection(url, userName, password)){
        Class.forName("com.mysql.jdbc.Driver").newInstance();
        Statement stmnt = conn.createStatement();
        String sql = "Select * from gif WHERE Type = '" + msg + "'";
        stmnt.execute(sql);
        try(ResultSet rs = stmnt.getResultSet()){
            while(rs.next()){
                Gif g = new Gif();
                g.setID(rs.getInt("GifID"));
                g.setURL(rs.getString("GifURL"));
                System.out.println(g.getID() + g.getURL());
                gifs.add(g);
            }
            rs.close();
            conn.close();
        }
    }
    catch(SQLException ex){
        System.err.println(ex.getMessage());
    }
    catch(Exception ex){
        System.err.println(ex.getMessage());
    }
    return gifs;
}
The "Type" in the database it just a category. With the test data I have in there, the 3 types are no, surprised and lonely. Only no returns a gif.
Remove the lines closing the ResultSet and Connection:
rs.close();
conn.close();
You are already closing them via try-with-resources.
The issue ended up being MySQL not committing records to the database. Once Workbench was refreshed, the added records disappeared. Rather strange that even though the records weren't in the database, they could be retrieved.
Most likely your msg is not exactly matching any of the values in the database Type column.
Test by running
SELECT COUNT(*) FROM gif WHERE Type = '... put msg content here ...'
Do this manually directly on the database.
You can also try putting the following line of code at the end:
System.out.println("Number of Selected Gifs: " + gifs.size());
If either of those returns zero, it means that msg did not exactly match Type. Maybe an uppercase/lowercase issue?
Also, to avoid SQL injection and other issues, please strongly consider using bind variables with a PreparedStatement, for example:
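A sketch of the gif lookup rewritten with a bind variable (table, column and variable names are taken from the question; this would replace the Statement block inside the existing try-with-resources):

String sql = "SELECT GifID, GifURL FROM gif WHERE Type = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    ps.setString(1, msg);            // the driver handles quoting and escaping
    try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
            Gif g = new Gif();
            g.setID(rs.getInt("GifID"));
            g.setURL(rs.getString("GifURL"));
            gifs.add(g);
        }
    }
}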
I'm having a problem with mysql queries taking a little too long to execute.
private void sqlExecute(String sql, Map<Integer, Object> params) {
    try (Connection conn = dataSource.getConnection();
         PreparedStatement statement = conn.prepareStatement(sql)) {
        if (params.size() > 0) {
            for (Integer key : params.keySet()) {
                statement.setObject(key, params.get(key));
            }
        }
        statement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
I've narrowed the problem down to the executeUpdate() line specifically. Everything else runs smoothly, but this particular line (and when I run executeQuery() as well) takes around 70ms to execute. This may not seem like an unreasonable amount of time, but currently this is a small test db table with under 100 rows. Columns are indexed, so a typical query is only looking at around 15 rows.
Ultimately however, we'll need to scan much larger tables with thousands of rows. Additionally, we're running numerous queries at a time (they can't really be batched because the results of each query are used for future queries), so all of those queries together are taking more like 7s.
Here's an example of a method for running a mysql query:
public void addRating(String db, int user_id, int item_id) {
    parameters = new HashMap<>();
    parameters.put(1, user_id);
    parameters.put(2, item_id);
    sql = "INSERT IGNORE INTO " + db + " (user_id, item_id) VALUES (?, ?)";
    sqlExecute(sql, parameters);
}
A couple of things to note:
The column indexing is probably not the problem. When running the same mysql statements in our phpMyAdmin console, execution time is more like 0.3ms.
Also of note is that the execution time is consistently 70ms, regardless of the actual mysql statement.
I'm using connection pooling and wonder if this is possibly a source of the problem. In other tests, dataSource.getConnection() also takes about 70ms. I'm running this code locally.
The above example is for an INSERT using executeUpdate(), but the same problem happens for SELECT statements using executeQuery().
I have tried using /dev/urandom per Oracle's suggestion, but this made no difference.
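A diagnostic sketch for separating the two costs (assuming the dataSource from the code above): time connection acquisition and statement execution independently, to tell whether the 70ms is pool overhead or a server round trip.

private void timeOneQuery() {
    long t0 = System.nanoTime();
    try (Connection conn = dataSource.getConnection()) {
        long t1 = System.nanoTime();
        try (PreparedStatement ps = conn.prepareStatement("SELECT 1")) {
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
            }
        }
        long t2 = System.nanoTime();
        System.out.printf("getConnection: %d ms, execute: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}

If getConnection dominates, the pool may be creating a new physical connection per request instead of reusing one; in that case the pool's minimum idle and validation settings would be worth checking.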
I have a user table (Oracle 11g DB) with more than 1 million rows, holding all user passwords in plain text, which I am trying to hash (hash and salt) using the SHA-512 algorithm. To start with, below is my Java class that reads all the records from the user table, hashes each password and updates it back to the user table.
I am using prepared statement for both SELECT and UPDATE queries
I have set the prepared statement fetch size to 1000 (setFetchSize(1000))
I have set the auto commit property to false
Using batch method to do bulk update
int count = 0;          // assumed declarations, not shown in the original excerpt
int batchSize = 5000;   // commit every 5000 records, per the comment below
try {
    ps = con.prepareStatement("update user set password=? where ID=?");
    psSel = con.prepareStatement("select ID, password from user");
    psSel.setFetchSize(1000);
    rs = psSel.executeQuery();
    String hashPassword = null;
    while (rs.next()) {
        long id = rs.getLong(1);
        String pwd = rs.getString(2);
        hashPassword = <<CALL TO PASSWORD HASHING UTIL>>;
        ps.setString(1, hashPassword);
        ps.setLong(2, id);
        ps.addBatch();
        // Every 5000 records update and commit
        if (++count % batchSize == 0) {
            ps.executeBatch();
            con.commit();
        }
    }
    ps.executeBatch();
    con.commit();
} catch (SQLException e) {
    e.printStackTrace();
}
To update 100,000 records the above method takes close to 8 minutes which I feel is quite high.
Database used: Oracle 11g
Java Version: 1.6
Environment: Windows 7
I am not sure if I am missing something. Can you advise or recommend a better way to process such bulk updates?
UPDATE
I took a second look at the temp table USER I created before and saw that there was no primary key constraint on the ID column. I added the PK constraint on the ID column and re-ran my utility. Now it took just 36 seconds to process 100,000 rows.
To be doubly sure, I also created another temp table USER_TMP2 without the PK constraint and ran my utility; it took 8 minutes, as before, for 100,000 rows.
Moral of the story: When investigating poor performance the first thing to do is investigate the indexing of the tables involved – either by simple inspection or by looking at the execution plans of the queries – to ensure that you are not doing a lot of unnecessary table scans.
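For reference, the kind of check and fix involved looks roughly like this (I've written USERS here since USER is a reserved word in Oracle; substitute your actual table name, and the constraint name is illustrative):

-- add the missing primary key so the per-row UPDATE can use an index lookup
ALTER TABLE users ADD CONSTRAINT users_pk PRIMARY KEY (id);

-- inspect the execution plan of the update to confirm it no longer full-scans
EXPLAIN PLAN FOR
  UPDATE users SET password = 'x' WHERE id = 1;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);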
Make a view of the user table and fetch the data from that view. This may improve your query execution time; it might be helpful in your case.
I am working on an Airsoft application.
I'm trying to add records to a MS Access Database via SQL in Java. I have established a link to the database, with the following:
try
{
    //String Driver = "sun.java.odbc.JdbcOdbcDriver";
    Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
    Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
    Statement stmt = conn.createStatement();
    System.out.println("Connection Established!");
    ResultSet rs = stmt.executeQuery("SELECT * FROM AirsoftGunRentals");
    tblRent.setModel(DbUtils.resultSetToTableModel(rs));
}
catch(Exception ex)
{
    JOptionPane.showMessageDialog(null, "Error");
}
I am using UCanAccess to access my MS Access database. It reads the database and displays it in a JTable. However, I need to create three JButtons to add, delete and update records in the table. I have coded the add button and tried to add a record, but it crashes and gives me errors.
try
{
    //String Driver = "sun.java.odbc.JdbcOdbcDriver";
    Class.forName("net.ucanaccess.jdbc.UcanaccessDriver");
    Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
    Statement stmt = conn.createStatement();
    System.out.println("Connection Established!");
    String Query = "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent)"+
        "VALUES('"+pName+"','"+pBrand+"','"+pTypeOfGun+"','"+pNumMags+"','"+pExtras+"','"+pNumberAvail+"','"+pRent+"');";
    ResultSet rs = stmt.executeQuery(Query);
    JOptionPane.showMessageDialog(null, "Success!");
}
catch(Exception ex)
{
    JOptionPane.showMessageDialog(null, "Error");
}
I have attempted all three, hoping for a result, but am still getting big errors. The only difference between the buttons is that one adds, one deletes and one updates the table. Other than that, the code is the same, minus variables.
As Brahim mentioned, you should use stmt.executeUpdate(Query) whenever you update, insert or delete data. Also, with this particular query, given your String concatenation (see the end of the first line), there is no space between the ")" and the "VALUES", which probably causes a malformed query.
However, I can see from your code that you are not very experienced with such use-cases, and I'd like to add some pointers before all hell breaks loose in your project:
Use PreparedStatement instead of Statement, and replace the concatenated variables with placeholders, to prevent SQL injection.
The code you are using here is extremely prone to SQL injection: if any user has any control over any of the variables, this could lead to a full database dump (theft), destruction of data (vandalism), or even machine takeover if other conditions are met.
A good rule of thumb is to never use the Statement class for parameterized queries; better safe than sorry :)
Respect Java Conventions (or be coherent).
In your example you define the String Query, while all the other variables start with lower-case (as per Java conventions), instead of String query. Over time, such little mistakes (which won't break a build) lead to bugs due to mistaking variables for class names, etc. :)
Good luck on your road to mastering this wonderful language ! :)
First, add a space before the closing quotation mark, like this:
String Query= "INSERT INTO AirsoftGunRentals(NameOfGun, Brand, TypeOfGuns, NumberOfMagazines,Extras,NumberAvailable,UnitRent) "+
" VALUES('"+pName+"','"+pBrand+"','"+pTypeOfGun+"','"+pNumMags+"','"+pExtras+"','"+pNumberAvail+"','"+pRent+"');";
And use stmt.executeUpdate(Query); instead of stmt.executeQuery(Query); in your insert, update and delete queries. For SELECT queries you can keep executeQuery.
I managed to find an answer on how to add, delete and update records in an MS Access DB. This is what I found, after I declared the connection and the prepared statement. I will try to explain as best I can. I had to set the values individually using the prepared statement's setters:
(pstmt = PreparedStatement variable)
pstmt.setWhatever(1, variable);
And it works fine now. I use the same method to delete and update records.
This is the basic query format:
String SQLInsert = "INSERT INTO Tbl VALUES(NULL,?,?,?,?)";
The NULL in the statement is for the AutoNumber column in the table, and the .setWhatever() calls replace the question marks with values of the appropriate type. That is how the statement manipulates the database.
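Putting it together for the table from the question (a sketch only: the setter types are assumptions based on the column names, and the leading NULL assumes the AutoNumber is the first column, as above):

String SQLInsert = "INSERT INTO AirsoftGunRentals VALUES(NULL,?,?,?,?,?,?,?)";
try (Connection conn = DriverManager.getConnection("jdbc:ucanaccess://" + URL, "", "");
     PreparedStatement pstmt = conn.prepareStatement(SQLInsert)) {
    pstmt.setString(1, pName);
    pstmt.setString(2, pBrand);
    pstmt.setString(3, pTypeOfGun);
    pstmt.setInt(4, pNumMags);       // assuming a numeric column
    pstmt.setString(5, pExtras);
    pstmt.setInt(6, pNumberAvail);   // assuming a numeric column
    pstmt.setDouble(7, pRent);       // assuming a currency/double column
    pstmt.executeUpdate();           // not executeQuery: this is an INSERT
}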
Thank you everyone for all your contributions. It helped a lot, and made this section a lot more understandable.