I am having memory issues because I am trying to read a huge ResultSet from a Netezza database. Does Netezza support any kind of "streaming" ResultSet like MySQL does? If not, will limiting the fetch size like this work instead?:
stmt.setFetchSize(50);
conn.setAutoCommit(false);
If you want to pull the rows to store in a file, then your best bet is to use a remote external table.
Here is an example that creates a transient remote external table over JDBC. This will invoke the bulk export/load functionality provided with the JDBC driver, and create a pipe-delimited text file.
create external table 'c:\mytest.txt'
USING (DELIMITER '|' REMOTESOURCE 'JDBC' ) as
select *
from table1;
You can call this using conn.createStatement().execute, and you will likely have to change the file specification to c:\\mytest.txt in the Java string to escape the existing backslash.
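Calling it from Java might look something like this (a minimal sketch, assuming conn is an open Netezza JDBC connection; the path and table name are the ones from the example above):
// Double the backslash so that the SQL statement sees c:\mytest.txt.
try (Statement stmt = conn.createStatement()) {
    stmt.execute(
        "create external table 'c:\\mytest.txt' " +
        "USING (DELIMITER '|' REMOTESOURCE 'JDBC') as " +
        "select * from table1");
}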
You can read more about external tables in the documentation here.
You can use setFetchSize, by the way. I'm not sure that it would solve your memory issue though.
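For reference, the usual fetch-size pattern looks like this (just a sketch, assuming conn is an open connection; whether the Netezza driver actually streams rows on this hint is driver-specific):
conn.setAutoCommit(false);                // some drivers only chunk results outside auto-commit
try (Statement stmt = conn.createStatement()) {
    stmt.setFetchSize(50);                // hint: fetch 50 rows at a time, not the whole result set
    try (ResultSet rs = stmt.executeQuery("select * from table1")) {
        while (rs.next()) {
            // process one row at a time
        }
    }
}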
The following URL has information about ROWSET_LIMIT; with this setting you can limit query results to your requirements and create streams as needed:
https://www-01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.adm.doc/c_sysadm_user_rowset_limits.html?lang=en
"You can place a limit on the number of rows a query can return and thus restrict resources for large result sets. Specifying a rowset limit when you create a user or a group automatically limits the rows that are returned so that users do not have to append a limit clause to their SQL queries."
I have a JPA method in my repository trying to find entities with a where clause. The problem is that I have a huge data set, and when I try to send more than 32k elements in the in-list clause, I receive an error. I found that this is a PostgreSQL driver limitation, but I can't find a workaround.
I tried a Pageable request, but it is hard to send only 30k at a time for 8 million records. Is there any possibility to send more than 30k objects in my in-list where clause?
List<Object> findAllByIdIn(List<Long> ids)
No, you don't want to do it especially if you plan to send 8 million identifiers. Working around the IN statement or bind parameter limit is inefficient. Consider the following:
Thousands of bind parameters will result in megabytes of SQL. It will take considerable time to send the SQL text to the database. In fact the database might take longer to read the SQL text than execute the query as per Tom's answer to "Limit and conversion very long IN list: WHERE x IN ( ,,, ...)" question.
SQL parsing will be inefficient. Not only do megabytes of SQL text take time to read, but each query will usually have a distinct number of bound parameters. That distinct parameter count results in each query being parsed and planned separately (see this article which explains it).
There is a hard limit of bound parameters in a SQL statement. You just discovered it, 32760.
For those types of queries it's usually better to create temporary tables. Create a new temporary table before your query, insert all the identifiers into it and join it with the entity table. This join will be equivalent to IN condition except SQL text will be short.
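A rough plain-JDBC sketch of that approach (with JPA you could do the same through a native query), assuming PostgreSQL, an open connection conn, and an entity table called entity whose names are placeholders:
conn.setAutoCommit(false);
try (Statement ddl = conn.createStatement()) {
    // Temporary table is private to this session and dropped when the transaction commits.
    ddl.execute("CREATE TEMPORARY TABLE tmp_ids (id bigint PRIMARY KEY) ON COMMIT DROP");
}
try (PreparedStatement insert = conn.prepareStatement("INSERT INTO tmp_ids (id) VALUES (?)")) {
    int count = 0;
    for (Long id : ids) {
        insert.setLong(1, id);
        insert.addBatch();
        if (++count % 10_000 == 0) {
            insert.executeBatch();    // flush in chunks to keep memory bounded
        }
    }
    insert.executeBatch();
}
try (Statement query = conn.createStatement();
     ResultSet rs = query.executeQuery("SELECT e.* FROM entity e JOIN tmp_ids t ON t.id = e.id")) {
    while (rs.next()) {
        // map rows to entities
    }
}
conn.commit();
The join behaves like the IN condition, but the SQL text stays a few lines long regardless of how many identifiers are involved.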
It's important to understand where these 8 million identifiers are loaded from. If you are pulling them from the database in a previous query just to pass them back to the next query, you most likely want to write a stored procedure instead. There is possibly a flaw in your current approach; JPA is not always the right tool for the job.
I want to use a result set to hold my user data. Just one wrinkle - the data is in a CSV (comma delimited) format, not in a database.
The CSV file contains a header row at the top, then data rows. I would like to dynamically create a result set but not using an SQL query. The input would be the CSV file.
I am expecting (from my Java experience) that there would be methods like rs.newRow, rs.newColumn, rs.updateRow and so on, so I could use the structure of a result set without having any database query first.
Are there such methods on result sets? I read the docs and did not find any way. It would be highly useful to just build a result set from the CSV and then use it as if I had run an SQL query. I find it difficult to use a Java bean, an ArrayList of beans, and such, because I want to read in CSV files with different structures: the equivalent of SELECT * FROM in SQL.
I don't know what the names or numbers of columns will be in advance. So other SO questions like "Manually add data to a Java ResultSet" do not answer my needs. In that case, he already has an SQL created result set, and wants to add rows.
In my case, I want to create a result set from scratch, and add columns as the first row is read in. Now that I am thinking about it, I could create a query from the first row, using a statement like SELECT COL1, COL2, COL3 ... FROM DUMMY. Then use the INSERT ROW statement to read and insert the rest. (Later:) But on trying that, it fails because there is no real table associated.
The CsvJdbc open source project should do just that:
// Load the driver.
Class.forName("org.relique.jdbc.csv.CsvDriver");
// Create a connection using the directory containing the file(s)
Connection conn = DriverManager.getConnection("jdbc:relique:csv:/path/to/files");
// Create a Statement object to execute the query with.
Statement stmt = conn.createStatement();
// Query the table. The name of the table is the name of the file without ".csv"
ResultSet results = stmt.executeQuery("SELECT col1, col2 FROM myfile");
Just another idea to complement @Mureinik's, which I find really good.
Alternatively, you could use any CSV Reader library (many of them out there), and load the file into an in-memory table using any in-memory database such as H2, HyperSQL, Derby, etc. These ones offer you a full/complete SQL engine, where you can run high end/complex queries.
It requires more work but you get a lot of flexibility to use the data afterwards.
After you try the in-memory solution, switching to a persistent database is really easy (just change the URL). This way you could load the CSV file only once into the database. From the second execution on, the database would be ready to use; no need to load the CSV again.
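With H2, for example, you do not even need a separate CSV reader, because it ships with a CSVREAD table function. A small sketch (the file path and table name are placeholders):
// In-memory database; change the URL to e.g. jdbc:h2:./csvdb to make it persistent.
try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:csvdb");
     Statement stmt = conn.createStatement()) {

    // H2 takes the column names from the CSV header row.
    stmt.execute("CREATE TABLE users AS SELECT * FROM CSVREAD('/path/to/users.csv')");

    try (ResultSet rs = stmt.executeQuery("SELECT * FROM users")) {
        ResultSetMetaData md = rs.getMetaData();
        while (rs.next()) {
            for (int i = 1; i <= md.getColumnCount(); i++) {
                System.out.print(rs.getString(i) + "\t");
            }
            System.out.println();
        }
    }
}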
I have indexed some columns in my MS Access database, and I am using Java to query the database.
Before indexing, I used this code:
ResultSet rs = statement.executeQuery("Select * from Employees where FirstName = 'Sarah'");
After indexing some columns in the database, should I make any changes to the code? Is there something like this needed/possible:
statement.getIndexes();
I am asking this because my MS Access database has 300,000+ records. Fetching records was too slow because of the size. After indexing, fetching records did not speed up at all. I think I might still be accessing the unindexed version of that column.
(I am writing the code for an Android app, if that matters)
No. The SQL command tells it to return a certain result; how it finds that result (use of indexes and the like) is an implementation detail of the DB. Now you may need to do something on the database itself to get it to use the index. Although you really ought to think about moving to a real database; Access is just not meant for large amounts of data.
It's likely that your issue is the query. You should never use select * from a table. Always specify your columns. Have a look here.
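For example, with the index in place, a query that names its columns and binds the value might look like this (a sketch, assuming connection is an open java.sql.Connection; the column names other than FirstName are assumptions):
// If the index does not already exist, it can also be created once via SQL:
// statement.executeUpdate("CREATE INDEX idx_firstname ON Employees (FirstName)");

PreparedStatement ps = connection.prepareStatement(
    "SELECT EmployeeID, FirstName, LastName FROM Employees WHERE FirstName = ?");
ps.setString(1, "Sarah");
try (ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        // process row
    }
}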
I am parsing an XML file consisting of ~600K lines. Parsing and inserting the data from the XML into the database is not a problem, as I am using SAX to parse and LOAD DATA INFILE (from a .txt file) to insert into the database. The .txt file is populated in Java using JDBC. All of this takes a good 5 seconds to populate the database.
My bottleneck is now executing multiple SELECT queries. Basically, each time I hit a certain XML tag, I call a SELECT query to grab data from another DB table. Adding these SELECT queries brings my population time to 2 minutes.
For example:
I am parsing through an XML consisting of books, articles, thesis, etc.
Each book/article has child elements such as isbn, title, author, editor, publisher.
At each author/editor/publisher, I need to query a table in a database.
Let's say I encountered the author tag with value Tolkien.
I need to query a table that already exist in the database called author_table
The query is [select author_id from author_table where name = 'Tolkien']
This is where the bottleneck is happening.
Now my question is: Is there a way to speed this up?
BTW, the reason why I think 2 minutes is long is because this is a homework assignment and I am not yet finished with populating the database. I would estimate that the whole DB population would take 5 minutes. Thus the reason why I am seeking advice for performance optimization.
There are a few things you can consider:
Use connection pooling so you don't create/close a new connection every time you execute a query; doing so is expensive.
Cache whatever data you are obtaining via the SELECT query. Is it possible to prefetch all the data beforehand so you don't have to query it on the spot? (See the sketch after this list.)
If your SELECT is slow, ensure the query is optimized and you have an appropriate index in place to avoid scanning the whole table.
Ensure you use buffered IO in Java
Can you subdivide the work into multiple threads? If so, create multiple worker threads to run multiple instances of your job in parallel.
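For the author lookup from the question, the prefetch idea could be as simple as this (a sketch, assuming conn is an open connection and author_table fits in memory):
// Load the whole lookup table once, before parsing the XML.
Map<String, Integer> authorIdsByName = new HashMap<>();
try (Statement stmt = conn.createStatement();
     ResultSet rs = stmt.executeQuery("select author_id, name from author_table")) {
    while (rs.next()) {
        authorIdsByName.put(rs.getString("name"), rs.getInt("author_id"));
    }
}

// Inside the SAX handler the lookup is then a map access instead of a round trip to the database.
Integer authorId = authorIdsByName.get("Tolkien");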
I am writing a DAO layer in Java for my Tomcat server application.
I wish to use PreparedStatement to wrap my queries (1. to parse queries only once, 2. to defend against SQL injection).
My DB design contains one MyISAM table per data-source system, and most of the queries through the DAO are selects that use different table names as arguments.
Some of these tables may be created on the fly.
I already went through many posts that explain that I may not use a table name as an argument to a prepared statement.
I have found solutions that suggest using some type of function (e.g. mysql_real_escape_string) to process this argument and append the result as a string to the query.
Is there any built-in Java library function that does this in the best optimized way, or maybe you can suggest doing something else in the DAO layer (I would prefer not to add any routines to the DB itself)?
Are you able to apply restrictions to the table names? That may well be easier than quoting. For example, if you could say that all table names had to match a regex of [0-9A-Za-z_]+ then I don't think you'd need any quoting. If you need spaces, you could probably get away with always using `table name` - but again, without worrying about "full" quoting.
Restricting what's available is often a lot simpler than handling all the possibilities :)
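A small sketch of that restriction in the DAO layer (the SELECT itself and the column name are placeholders; only the table name check matters here):
private static final Pattern SAFE_TABLE_NAME = Pattern.compile("[0-9A-Za-z_]+");

String buildSelect(String tableName) {
    if (!SAFE_TABLE_NAME.matcher(tableName).matches()) {
        throw new IllegalArgumentException("Illegal table name: " + tableName);
    }
    // The table name is interpolated only after validation; all values still go through ? parameters.
    return "SELECT * FROM `" + tableName + "` WHERE source_id = ?";
}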
If you want to be extra safe, then you can prepare a query and call it with the supplied table name to check whether it really exists:
PreparedStatement ps = conn.prepareStatement("SHOW TABLES WHERE tables = ?");
ps.setString(1, nameToCheck);
if(!ps.executeQuery().next())
throw new RuntimeException("Illegal table name: " + nameToCheck);
(The WHERE condition might need some correction because I don't have mysql under my fingers at the moment).
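In case it does, a variant of the same check that takes a plain bind parameter is to query information_schema instead (a sketch, same intent as above):
PreparedStatement ps = conn.prepareStatement(
    "SELECT 1 FROM information_schema.tables WHERE table_schema = DATABASE() AND table_name = ?");
ps.setString(1, nameToCheck);
if(!ps.executeQuery().next())
    throw new RuntimeException("Illegal table name: " + nameToCheck);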