I have the following SQL code:
create table cross_links(sid varchar,tid varchar,snd int)
as
select * from csvread('csvfile')
I want to read csvfile twice. The second time, the positions of sid and tid are swapped before the rows are inserted into the table. But reading the file twice costs performance, so I want to read it only once and still get the same result as reading it twice.
How can I do it?
I think this requires changing the source code of H2.
First, you don't need to do this. You can just write a simple CSV reader yourself that swaps or renames the columns as it reads them in.
Also, with your approach, you would need to modify csvread to support different types of data - it currently only supports VARCHAR. That is going to be even more work!
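For illustration, here is a minimal sketch of that approach in plain Java (the split(",") parsing is an assumption that works only for simple, unquoted CSV fields; use a real CSV parser otherwise):

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSwapReader {
    // For each CSV line "sid,tid,snd", produce both the original row and
    // the row with sid and tid swapped, so the file is only read once.
    public static List<String[]> bothDirections(String line) {
        String[] f = line.split(",", -1); // sid, tid, snd
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] { f[0], f[1], f[2] }); // original order
        rows.add(new String[] { f[1], f[0], f[2] }); // swapped order
        return rows;
    }
}
```

Each String[] can then be bound to a PreparedStatement for INSERT INTO cross_links VALUES (?, ?, ?) and added to a batch. Alternatively, after the initial csvread load you could run INSERT INTO cross_links SELECT tid, sid, snd FROM cross_links in H2 itself, which re-reads the table rather than the file.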
I have a parquet file stored in AWS S3 that I want to query. I want to retrieve a certain row of data given that it equals a value. Almost like I would in SQL:
SELECT * FROM file.parquet WHERE id = '1234';
I am using parquet-mr to load it into memory directly from S3 and read it, and I have it set up with an AvroParquetReader to read the rows.
For now I've copied every row into a Map for easy querying; however, is there a better way to do this? The documentation for parquet-mr is not great, and most tutorials use deprecated methods.
Here is some example code of what I've got:
final ParquetReader<GenericRecord> reader = AvroParquetReader
.<GenericRecord>builder(internalPath)
.withConf(parquetConfiguration).build();
You can use reader.read() to get the next row in the file (which is what I've used to put it into a HashMap), but I can't find any methods in parquet-mr that allow you to query a file without loading the entire file into memory.
The feature you are looking for is called predicate pushdown. You can read about it and find examples here.
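As a sketch of what that looks like with parquet-mr's FilterApi (the column name id and the internalPath/parquetConfiguration variables are taken from the question; verify the builder methods against the parquet-mr version you use):

```java
import org.apache.avro.generic.GenericRecord;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.filter2.compat.FilterCompat;
import org.apache.parquet.filter2.predicate.FilterApi;
import org.apache.parquet.filter2.predicate.FilterPredicate;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.io.api.Binary;

// Build a predicate equivalent to: WHERE id = '1234'
FilterPredicate predicate =
        FilterApi.eq(FilterApi.binaryColumn("id"), Binary.fromString("1234"));

ParquetReader<GenericRecord> reader = AvroParquetReader
        .<GenericRecord>builder(internalPath)
        .withConf(parquetConfiguration)
        .withFilter(FilterCompat.get(predicate)) // pushed down to row groups/pages
        .build();

GenericRecord record;
while ((record = reader.read()) != null) {
    // only rows matching the predicate are returned
}
```

The filter lets the reader skip row groups and pages whose statistics rule out a match, so much less of the file is read into memory.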
Is it possible to get the result set data into a single string (for writing to Notepad)? Using a record set, we need to loop through each field.
Is there any other way to get a single string without looping through each field? I am able to do this in VBA by copying the entire recordset to an Excel sheet.
There isn't really something in standard Java that does this for you, except maybe using javax.sql.rowset.WebRowSet and one of its writeXml methods, but that is a very specific and verbose format.
If you want to output a result set in a specific format, you will need to do this yourself, or find a library that does this for you.
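For example, a small helper built only on java.sql (a sketch; the comma separator and header line are formatting assumptions, not a standard):

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;

public final class ResultSetText {
    // Render the whole ResultSet as one string: a header line followed
    // by one comma-separated line per row. The per-field loop is
    // unavoidable in plain JDBC, but it is hidden inside this helper.
    public static String toText(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int cols = md.getColumnCount();
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= cols; i++) {
            if (i > 1) sb.append(',');
            sb.append(md.getColumnLabel(i));
        }
        sb.append(System.lineSeparator());
        while (rs.next()) {
            for (int i = 1; i <= cols; i++) {
                if (i > 1) sb.append(',');
                sb.append(rs.getString(i));
            }
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }
}
```

The caller then sees a single string and can write it straight to a file, even though the loop still happens internally.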
Well, in general you have to convert the result set into whatever DTO class you want and then implement a toString() method that formats it the way you want.
There are some ways to achieve this. Here is one:
Mapping a JDBC ResultSet to an object
I want to use a result set to hold my user data. Just one wrinkle - the data is in a CSV (comma delimited) format, not in a database.
The CSV file contains a header row at the top, then data rows. I would like to dynamically create a result set but not using an SQL query. The input would be the CSV file.
I am expecting (from my Java experience) that there would be methods like rs.newRow, rs.newColumn, rs.updateRow and so on, so I could use the structure of a result set without having any database query first.
Are there such methods on result sets? I read the docs and did not find any way. It would be highly useful to just build a result set from the CSV and then use it like I had done an SQL query. I find it difficult to use a Java bean, an ArrayList of beans, and the like, because I want to read in CSV files with different structures - the equivalent of SELECT * FROM in SQL.
I don't know what the names or numbers of columns will be in advance. So other SO questions like "Manually add data to a Java ResultSet" do not answer my needs. In that case, he already has an SQL created result set, and wants to add rows.
In my case, I want to create a result set from scratch and add columns as the first row is read in. Now that I am thinking about it, I could create a query from the first row, using a statement like SELECT COL1, COL2, COL3 ... FROM DUMMY, and then use ResultSet.insertRow() to read and insert the rest. (Later:) But on trying that, it fails because there is no real table associated.
The CsvJdbc open source project should do just that:
// Load the driver.
Class.forName("org.relique.jdbc.csv.CsvDriver");
// Create a connection using the directory containing the file(s)
Connection conn = DriverManager.getConnection("jdbc:relique:csv:/path/to/files");
// Create a Statement object to execute the query with.
Statement stmt = conn.createStatement();
// Query the table. The name of the table is the name of the file without ".csv"
ResultSet results = stmt.executeQuery("SELECT col1, col2 FROM myfile");
Just another idea to complement Mureinik's, which I find really good.
Alternatively, you could use any CSV reader library (there are many out there) and load the file into an in-memory table using an in-memory database such as H2, HyperSQL, or Derby. These give you a full SQL engine, where you can run complex queries.
It requires more work, but you get a lot of flexibility to use the data afterwards.
After you try the in-memory solution, switching to a persistent database is really easy (just change the URL). This way you could load the CSV file only once into the database. From the second execution on, the database would be ready to use; there would be no need to load the CSV again.
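With H2, for instance, you don't even need a separate CSV reader: its built-in CSVREAD function can populate the in-memory table directly. A sketch (the file path, table name, and query are placeholders, and the H2 driver must be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// In-memory database: it disappears when the JVM exits.
// Switching the URL to "jdbc:h2:/path/to/db" would persist it instead.
try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:csvdb");
     Statement st = conn.createStatement()) {
    // Column names and types are inferred from the CSV header row.
    st.execute("CREATE TABLE data AS SELECT * FROM CSVREAD('/path/to/myfile.csv')");
    try (ResultSet rs = st.executeQuery("SELECT * FROM data WHERE col1 = 'x'")) {
        // full SQL is now available over the CSV contents
    }
}
```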
I have a use case where I need to read rows from a file, transform them using an engine, and then write the output to a database (which can be configured).
While I could write a query builder of my own, I was interested in knowing if there's already an available solution (library).
I searched online and found the jOOQ library, but it is type-safe and has a code-gen tool, so it is probably suited to static database schemas. In my use case, databases can be configured dynamically, and the metadata is programmatically read and made available for write purposes (so a list of tables would be made available, the user can select the columns to write, and the insert script for those columns needs to be created dynamically).
Is there any library that could help me with the use case?
If I understand correctly, you need to query the database structure, display the result via a GUI, and have the user map data from a file to that structure?
Assuming this is the case, you're not looking for a 'library', you're looking for an ETL tool.
Alternatively, if you're set on writing something yourself, the (very) basic way to do this is:
Read the structure of the database using Connection.getMetaData(). The exact usage can vary between drivers, so you'll need to create an abstraction layer that meets your needs - I'd assume you're just interested in the table structure here.
Map the format of the file to a structure similar to the tables.
Provide a GUI that allows the user to connect elements from the file to columns in the table, including any type mapping that is needed.
Create a parametrized insert statement based on the file-element-to-column mapping - this is just a simple bit of string concatenation.
Loop through the rows in the file, performing a batch insert for each.
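The parametrized-insert step really is simple string building. A minimal sketch, assuming the table and column names have already been validated against the database metadata (never concatenate user-supplied values into the SQL itself):

```java
import java.util.List;

public class InsertBuilder {
    // Build "INSERT INTO t (a, b) VALUES (?, ?)" from a table name and
    // the columns the user mapped. The ? placeholders keep the values
    // out of the SQL string, so a PreparedStatement handles escaping.
    public static String buildInsert(String table, List<String> columns) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" (");
        sql.append(String.join(", ", columns));
        sql.append(") VALUES (");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) sql.append(", ");
            sql.append('?');
        }
        sql.append(')');
        return sql.toString();
    }
}
```

The resulting string is passed to Connection.prepareStatement(), and each file row is bound to the placeholders and added to the batch.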
My advice: get an ETL tool. This sounds like a simple problem, but it's full of idiosyncrasies - getting even an 80% solution will be tough and time-consuming.
jOOQ (the library you referenced in your question) can be used without code generation as indicated in the jOOQ manual:
http://www.jooq.org/doc/latest/manual/getting-started/use-cases/jooq-as-a-standalone-sql-builder
http://www.jooq.org/doc/latest/manual/sql-building/plain-sql
When searching through the user group, you'll find other users leveraging jOOQ in the way you intend.
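For example, jOOQ's DSL can build dynamic SQL from plain strings, with no generated classes (a sketch; the table and column names are placeholders, and SQLDialect.MYSQL is an arbitrary choice):

```java
import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

DSLContext create = DSL.using(SQLDialect.MYSQL);

// Table and column names can come from metadata read at runtime,
// e.g. from the user's column selection in your GUI.
String sql = create
        .insertInto(table("person"), field("name"), field("age"))
        .values("Alice", 30)
        .getSQL();
```

Because table() and field() accept strings, the statement can be assembled entirely from the dynamically discovered schema.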
The steps you need to take are:
read the rows
build each row into an object
transform the above object into the target object
insert the target object into the database
Among the above 4 steps, the only thing you need to do is step 3.
And for the above purpose, you can use Transmorph, EZMorph, Commons-BeanUtils, Dozer, etc.
I am editing Java code which stores data in a YAML file, but I need to make it use MySQL instead, and I'm not sure how to go about doing this. The code makes requests to read and write data, such as SQLset("top.middle.nameleaf", "Joe") or SQLget("top.middle.ageleaf"). These functions are defined by me. This would be simple with YAML, but I'm not sure how to implement it with SQL. Thanks in advance. Another thing is that if top.middle were set to null, then top.middle.nameleaf would be removed, as it would in YAML.
SQL doesn't work in the same way as YAML; you cannot blindly replace a YAML solution with a SQL one. You will have to actually think about what you want to do.
Get a basic understanding of how SQL works, with tables and columns, and the relationships between them.
Define a set of tables that match the data you have in YAML (it might be one table for each structure, with a foreign key linking tables that are nested in YAML).
Work out how best to adapt your code to use SQL. One approach might be to work with YAML until the data are "ready" and then translate the final YAML structure to SQL. Alternatively, you may want to replace all your YAML routines with SQL routines, but without doing the above it is hard to say exactly how that would work.
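One concrete (hypothetical) mapping that preserves the dotted-path API is to store each leaf as a row in a single key-value table, e.g. config(path VARCHAR PRIMARY KEY, value VARCHAR). The helpers below only build the SQL strings; in real code, use PreparedStatement parameters instead of concatenation to avoid SQL injection:

```java
public class PathSql {
    // SQLset("top.middle.nameleaf", "Joe") -> upsert one row (MySQL REPLACE).
    public static String sqlSet(String path, String value) {
        return "REPLACE INTO config (path, value) VALUES ('" + path + "', '" + value + "')";
    }

    // SQLget("top.middle.ageleaf") -> read one leaf value.
    public static String sqlGet(String path) {
        return "SELECT value FROM config WHERE path = '" + path + "'";
    }

    // Setting a node to null deletes it and its whole subtree,
    // mirroring the YAML behaviour described in the question.
    public static String sqlDelete(String path) {
        return "DELETE FROM config WHERE path = '" + path + "' OR path LIKE '" + path + ".%'";
    }
}
```

This loses the relational structure the answer above recommends, but it is a pragmatic first step when the code only ever accesses data by dotted path.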