Flyway Java Migrations: insert BLOB into Postgresql - java

I am using Flyway for all database migrations. It is time to handle binary data (images) when migrating. I am using Postgresql and Spring Data JPA.
First I had this field, which results in a db column photo of type oid in PostgreSQL:
@Entity
public class Person {
    // omitted
    @Lob
    private byte[] photo;
}
My migration scripts look something like this
V1__CREATE_DB.sql
V2__INSERT_PERSON.sql
V3__INSERT_PHOTO.java
At first I did not manage to migrate (update) a person's photo successfully using JdbcTemplate. Later I found out that I could change the column type from oid to bytea like this:
@Lob
@Type(type = "org.hibernate.type.BinaryType")
private byte[] photo;
I then made the migration code look like this:
public void migrate(Context context) throws IOException {
    JdbcTemplate template = ...
    List<String> locations = ... // photo physical locations/paths
    for (String location : locations) {
        InputStream image = ... // from location
        Long id = ... // get id from image name
        template.update("UPDATE person SET photo = ? where id = " + id,
                new Object[] { new SqlLobValue(image.readAllBytes(), new DefaultLobHandler()) },
                new int[] { Types.BLOB }
        );
    }
}
This V3__ migration works as expected. However:
Is there a better way to implement this migration, and should I also be able to do this for an oid column? If so, how?
Is there any reason not to choose bytea over oid, apart from the obvious storage capacity differences?

After almost breaking Google I finally managed to find a solution for updating the photo oid column with JdbcTemplate:
DefaultLobHandler lobHandler = new DefaultLobHandler();
lobHandler.setWrapAsLob(true);
jdbcTemplate.execute("UPDATE person SET photo = ? WHERE id = ?",
        new AbstractLobCreatingPreparedStatementCallback(lobHandler) {
            @Override
            protected void setValues(PreparedStatement ps, LobCreator lobCreator) throws SQLException {
                lobCreator.setBlobAsBinaryStream(ps, 1, image, image.available());
                ps.setLong(2, id);
            }
        });
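For the record, here is a minimal sketch of how that oid-based update could be wired into the V3 Java migration itself, assuming Flyway's BaseJavaMigration and Spring's SingleConnectionDataSource; the way the photo paths and person ids are resolved is left as placeholders, exactly as elided in the question:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

import org.flywaydb.core.api.migration.BaseJavaMigration;
import org.flywaydb.core.api.migration.Context;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.support.AbstractLobCreatingPreparedStatementCallback;
import org.springframework.jdbc.datasource.SingleConnectionDataSource;
import org.springframework.jdbc.support.lob.DefaultLobHandler;
import org.springframework.jdbc.support.lob.LobCreator;

public class V3__INSERT_PHOTO extends BaseJavaMigration {

    @Override
    public void migrate(Context context) throws Exception {
        // Reuse the connection Flyway provides so the update runs inside the migration.
        JdbcTemplate jdbcTemplate = new JdbcTemplate(
                new SingleConnectionDataSource(context.getConnection(), true));

        DefaultLobHandler lobHandler = new DefaultLobHandler();
        lobHandler.setWrapAsLob(true); // needed for the oid column

        List<String> locations = List.of(); // placeholder: photo physical locations/paths
        for (String location : locations) {
            File photoFile = new File(location);
            long id = extractIdFromFileName(photoFile.getName()); // placeholder: get id from image name
            try (InputStream image = new FileInputStream(photoFile)) {
                jdbcTemplate.execute("UPDATE person SET photo = ? WHERE id = ?",
                        new AbstractLobCreatingPreparedStatementCallback(lobHandler) {
                            @Override
                            protected void setValues(PreparedStatement ps, LobCreator lobCreator)
                                    throws SQLException {
                                lobCreator.setBlobAsBinaryStream(ps, 1, image, (int) photoFile.length());
                                ps.setLong(2, id);
                            }
                        });
            }
        }
    }

    private long extractIdFromFileName(String fileName) {
        return 0L; // placeholder, as in the question
    }
}

The setWrapAsLob(true) call is what makes this work against the oid column, since the LobCreator then goes through the JDBC Blob API instead of setBytes.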

Related

NamedParameterJdbcTemplate.query for returning only one element

I'm writing a method to retrieve an element from the DB given a Long id parameter.
This id is unique, so the method should return just one element, and I want to create a class instance from the element retrieved.
I've made the following method that works perfectly fine:
@Override
public ElementEntity getElement(final long id)
{
    final MapSqlParameterSource paramSource = new MapSqlParameterSource();
    paramSource.addValue("element_id", id);
    final List<ElementEntity> listOfElements =
            namedParameterJdbcOperations.query(SQL_RETURN_ELEMENT_BY_ID, paramSource, ROW_MAPPER_ELEMENT);
    return !listOfElements.isEmpty() ? listOfElements.get(0) : null;
}
The ROW_MAPPER is implemented this way:
private static final RowMapper<ElementEntity> ROW_MAPPER_ELEMENT =
        (rs, rowNum) ->
                new ElementEntityBuilder().setElement(rs.getBytes("ELEMENT")).build();
The element is a byte array. I repeat, it works perfectly. But I would like to avoid using a list and retrieving the first position, and instead create the ElementEntity directly. So I tried the following approach:
@Override
public ElementEntity getElement(final long id)
{
    final MapSqlParameterSource paramSource = new MapSqlParameterSource();
    paramSource.addValue("element_id", id);
    return namedParameterJdbcOperations.query(SQL_RETURN_ELEMENT_BY_ID, paramSource,
            (ResultSet rs) -> new ElementEntityBuilder().setElement(rs.getBytes("ELEMENT")).build());
}
Although I haven't made any other changes, it gives me the following error: org.h2.jdbc.JdbcSQLNonTransientException: No data is available.
That's the full error (I changed some words to avoid showing the real SQL query):
org.springframework.dao.DataIntegrityViolationException: PreparedStatementCallback; SQL [SELECT SOMETHING FROM SOMEWHERE WHERE id = ?]; No data is available [2000-200]; nested exception is org.h2.jdbc.JdbcSQLNonTransientException: No data is available [2000-200]
I'm pretty new to all this JDBC, but it seems to me that the paramSource is not working properly when I apply these changes, and I don't have the foggiest idea why, because I'm only changing the result extractor.
Well, I figured it out myself. I will share my answer to help the community. I used NamedParameterJdbcOperations.queryForObject instead of namedParameterJdbcOperations.query and used the same ROW_MAPPER:
@Override
public ElementEntity getElement(final long id)
{
    final MapSqlParameterSource paramSource = new MapSqlParameterSource();
    paramSource.addValue("id", id);
    try
    {
        return namedParameterJdbcOperations.queryForObject(SQL_RETURN_ELEMENT, paramSource, ROW_MAPPER_ELEMENT);
    }
    catch (EmptyResultDataAccessException e)
    {
        return null;
    }
}
In my particular case I'm interested in catching the case where the SQL query attempts to find an Element that doesn't exist in the DB. For this purpose I catch and handle the EmptyResultDataAccessException that is thrown.
More info could be found here: https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/jdbc/core/namedparam/NamedParameterJdbcTemplate.html#queryForObject-java.lang.String-org.springframework.jdbc.core.namedparam.SqlParameterSource-org.springframework.jdbc.core.RowMapper-
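As a side note, the error in the original attempt happens because a ResultSetExtractor receives the cursor positioned before the first row, so the extractor has to call rs.next() itself before reading any column; calling rs.getBytes(...) right away is what triggers "No data is available". A minimal sketch of that variant, keeping the (already anonymized) names from the question:

@Override
public ElementEntity getElement(final long id)
{
    final MapSqlParameterSource paramSource = new MapSqlParameterSource();
    paramSource.addValue("element_id", id);
    return namedParameterJdbcOperations.query(SQL_RETURN_ELEMENT_BY_ID, paramSource,
            (ResultSet rs) -> rs.next()
                    ? new ElementEntityBuilder().setElement(rs.getBytes("ELEMENT")).build()
                    : null); // advance the cursor first; return null when no row matches
}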

What is the most efficient way to persist thousands of entities?

I have fairly large CSV files which I need to parse and then persist into PostgreSQL. For example, one file contains 2_070_000 records which I was able to parse and persist in ~8 minutes (single thread). Is it possible to persist them using multiple threads?
public void importCsv(MultipartFile csvFile, Class<T> targetClass) {
    final var headerMapping = getHeaderMapping(targetClass);
    File tempFile = null;
    try {
        final var randomUuid = UUID.randomUUID().toString();
        tempFile = File.createTempFile("data-" + randomUuid, "csv");
        csvFile.transferTo(tempFile);
        final var csvFileName = csvFile.getOriginalFilename();
        final var csvReader = new BufferedReader(new FileReader(tempFile, StandardCharsets.UTF_8));
        Stopwatch stopWatch = Stopwatch.createStarted();
        log.info("Starting to import {}", csvFileName);
        final var csvRecords = CSVFormat.DEFAULT
                .withDelimiter(';')
                .withHeader(headerMapping.keySet().toArray(String[]::new))
                .withSkipHeaderRecord(true)
                .parse(csvReader);
        final var models = StreamSupport.stream(csvRecords.spliterator(), true)
                .map(record -> parseRecord(record, headerMapping, targetClass))
                .collect(Collectors.toUnmodifiableList());
        // How to save such a large list?
        log.info("Finished import of {} in {}", csvFileName, stopWatch);
    } catch (IOException ex) {
        ex.printStackTrace();
    } finally {
        if (tempFile != null) {
            tempFile.delete();
        }
    }
}
models contains a lot of records. The parsing into records is done with a parallel stream, so it's quite fast. I'm afraid to call SimpleJpaRepository.saveAll, because I'm not sure what it will do under the hood.
The question is: What is the most efficient way to persist such a large list of entities?
P.S.: Any other improvements are greatly appreciated.
You have to use batch inserts.
Create an interface for a custom repository SomeRepositoryCustom
public interface SomeRepositoryCustom {
    void batchSave(List<Record> records);
}
Create an implementation of SomeRepositoryCustom
@Repository
class SomesRepositoryCustomImpl implements SomeRepositoryCustom {

    private JdbcTemplate template;

    @Autowired
    public SomesRepositoryCustomImpl(JdbcTemplate template) {
        this.template = template;
    }

    @Override
    public void batchSave(List<Record> records) {
        final String sql = "INSERT INTO RECORDS(column_a, column_b) VALUES (?, ?)";
        template.execute(sql, (PreparedStatementCallback<Void>) ps -> {
            for (Record record : records) {
                ps.setString(1, record.getA());
                ps.setString(2, record.getB());
                ps.addBatch();
            }
            ps.executeBatch();
            return null;
        });
    }
}
Extend your JpaRepository with SomeRepositoryCustom
@Repository
public interface SomeRepository extends JpaRepository<Record, Long>, SomeRepositoryCustom { // assuming Long ids
}
To save:
someRepository.batchSave(records);
Notes
Keep in mind that even if you send batch inserts, the database driver may not actually execute them as batches. For MySQL, for example, it is necessary to add the parameter rewriteBatchedStatements=true to the database URL; for PostgreSQL the analogous parameter is reWriteBatchedInserts=true.
So it is better to enable driver-level SQL logging (not Hibernate's) to verify what is actually sent. Debugging the driver code can also be useful.
You will also need to decide whether to split the records into chunks inside the loop (for (Record record : records) { ... }), as in the sketch below. Some drivers can do this for you, so you may not need it, but it is better to verify that too.
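A minimal sketch of that chunking, reusing the batchSave method from above (the batch size of 1,000 is an arbitrary value you would need to tune):

@Override
public void batchSave(List<Record> records) {
    final String sql = "INSERT INTO RECORDS(column_a, column_b) VALUES (?, ?)";
    final int batchSize = 1_000; // assumption: tune this for your rows and driver
    template.execute(sql, (PreparedStatementCallback<Void>) ps -> {
        int count = 0;
        for (Record record : records) {
            ps.setString(1, record.getA());
            ps.setString(2, record.getB());
            ps.addBatch();
            if (++count % batchSize == 0) {
                ps.executeBatch(); // flush a full chunk to the server
            }
        }
        ps.executeBatch(); // flush the remainder
        return null;
    });
}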
P. S. Don't use var everywhere.

Problem with DAO save method when using dbunit

I have a class that tests adding a group to a database:
class GroupDAOTest extends TestCase {

    private IDatabaseTester databaseTester;
    private GroupDao groupDao;

    @BeforeEach
    protected void setUp() throws Exception {
        databaseTester = new JdbcDatabaseTester("org.postgresql.Driver",
                "jdbc:postgresql://localhost:5432/database_school", "principal", "school");
        String file = getClass().getClassLoader().getResource("preparedDataset.xml").getFile();
        IDataSet dataSet = new FlatXmlDataSetBuilder().build(new File(file));
        databaseTester.setDataSet(dataSet);
        databaseTester.setSetUpOperation(DatabaseOperation.CLEAN_INSERT);
        databaseTester.onSetup();
        groupDao = new GroupDao();
    }

    @Test
    void add() throws Exception {
        groupDao.save(new Group("NEW_GROUP"));
        IDataSet databaseDataSet = databaseTester.getConnection().createDataSet();
        ITable actualTable = databaseDataSet.getTable("groups");
        String file = getClass().getClassLoader().getResource("GroupDao/add.xml").getFile();
        IDataSet expectedDataSet = new FlatXmlDataSetBuilder().build(new File(file));
        ITable expectedTable = expectedDataSet.getTable("groups");
        Assertion.assertEquals(expectedTable, actualTable);
    }
}
The call groupDao.save(new Group("NEW_GROUP")) is supposed to add a group with id = 4 and name = "NEW_GROUP". The test passed once, but when I ran it again and again, the group was added, yet for some reason the id grew by one with each run.
I checked groupDao.save() and everything is fine there; I also tried changing databaseTester.setSetUpOperation(DatabaseOperation ***), but it didn't help.
Can you tell me where the problem is? Maybe I'm just not cleaning something up?
And, just in case, my DAO method:
@Override
public void save(Group group) {
    try (Connection connection = connectionProvider.getConnection();
         PreparedStatement statement = connection.prepareStatement(SAVE_NEW_RECORD)) {
        statement.setString(1, group.getName());
        statement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
And table schema:
CREATE TABLE groups
(
group_id serial PRIMARY KEY,
group_name VARCHAR(10) UNIQUE NOT NULL
);
Once the test passed, but when I ran it again and again, the group was added, but for some reason the id grew by one.
The issue is not with your code or configuration. The PostgreSQL serial type is backed by a sequence that auto-increments on every insert and is not reset when rows are deleted (for example by dbUnit's CLEAN_INSERT), so the generated id keeps growing across test runs.
Use the dbUnit ValueComparer assertion instead, which lets you compare with "greater than or equal to", rather than the assertion method you are currently using, which compares only on equality. An alternative workaround is sketched below.
http://dbunit.sourceforge.net/datacomparisons/valuecomparer.html
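If you would rather keep the plain equality assertion, another common workaround (an alternative to the ValueComparer approach above) is to exclude the sequence-generated id column from both tables with dbUnit's DefaultColumnFilter before comparing, so only the data your DAO actually writes is checked:

// Inside the add() test, replacing the final assertion:
ITable filteredActual = DefaultColumnFilter.excludedColumnsTable(actualTable, new String[] {"group_id"});
ITable filteredExpected = DefaultColumnFilter.excludedColumnsTable(expectedTable, new String[] {"group_id"});
Assertion.assertEquals(filteredExpected, filteredActual);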

Implementing Spring + Apache Flink project with Postgres

I have a Spring Boot Gradle project using Apache Flink to process a datastream of signals. When a new signal comes through the datastream, I would like to look up its details by ID (i.e. findById()) in a Postgres table that already exists, in order to get additional information about the signal and enrich the data. I would like to avoid using Spring dependencies for the lookup (i.e. autowiring a repository) and want to stick with a Flink implementation for the lookup.
Where can I specify the Postgres connection configuration such as port, database, URL, username, password, etc.? (For simplicity, assume the Postgres DB is local on my machine.) Is it as simple as adding the configuration to the application.properties file? If so, how can I write the query method to look up the record in the Postgres table when searching by a non-primary-key value?
Some online sources suggest the skeleton code below, but I am not sure how/if it fits my use case. (I have an EventEntity model created which contains all the params/columns from the table I'm looking up.)
public class DatabaseMapper extends RichFlatMapFunction<String, EventEntity> {
    // Declare DB connection & query statements

    public void open(Configuration parameters) throws Exception {
        // Initialize DB connection
        // Prepare query statements
    }

    @Override
    public void flatMap(String value, Collector<EventEntity> out) throws Exception {
    }
}
Your sample code is correct. You can put all your custom initialization and preparation code for PostgreSQL in the open() method, and then use the pre-configured fields in your flatMap() function.
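For the PostgreSQL lookup specifically, a minimal sketch of that idea could look like the following. It uses plain JDBC; the connection details, the event_details table, the signal_code column used for the non-primary-key lookup, and the EventEntity constructor are all assumptions standing in for your actual schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class DatabaseMapper extends RichFlatMapFunction<String, EventEntity> {

    private transient Connection connection;
    private transient PreparedStatement lookup;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Plain JDBC connection; URL and credentials are placeholders.
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        // Lookup by a non-primary-key column (signal_code is an assumed column name).
        lookup = connection.prepareStatement(
                "SELECT id, name, details FROM event_details WHERE signal_code = ?");
    }

    @Override
    public void flatMap(String value, Collector<EventEntity> out) throws Exception {
        lookup.setString(1, value);
        try (ResultSet rs = lookup.executeQuery()) {
            while (rs.next()) {
                // Hypothetical EventEntity constructor; adapt to your model.
                out.collect(new EventEntity(
                        rs.getLong("id"), rs.getString("name"), rs.getString("details")));
            }
        }
    }

    @Override
    public void close() throws Exception {
        if (lookup != null) lookup.close();
        if (connection != null) connection.close();
    }
}

You can also pass the URL and credentials in through the constructor, exactly as the Redis example below does with its Configuration object.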
Here is one sample for Redis operations.
I have used RichAsyncFunction here and I suggest you do the same, as it is considered best practice. Read more here: https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/asyncio.html
You can pass configuration parameters in the constructor and use them in your initialization process:
public static class AsyncRedisOperations extends RichAsyncFunction<Object, Object> {

    private JedisPool jedisPool;
    private Configuration redisConf;

    public AsyncRedisOperations(Configuration redisConf) {
        this.redisConf = redisConf;
    }

    @Override
    public void open(Configuration parameters) {
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
        jedisPoolConfig.setMaxTotal(this.redisConf.getInteger("pool", 8));
        jedisPoolConfig.setMaxIdle(this.redisConf.getInteger("pool", 8));
        jedisPoolConfig.setMaxWaitMillis(this.redisConf.getInteger("maxWait", 0));
        JedisPool jedisPool = new JedisPool(jedisPoolConfig,
                this.redisConf.getString("host", "192.168.10.10"),
                this.redisConf.getInteger("port", 6379), 5000);
        try {
            this.jedisPool = jedisPool;
            this.logger.info("Redis connected: " + jedisPool.getResource().isConnected());
        } catch (Exception e) {
            this.logger.error(BaseUtil.append("Exception while connecting Redis"));
        }
    }

    @Override
    public void asyncInvoke(Object in, ResultFuture<Object> out) {
        try (Jedis jedis = this.jedisPool.getResource()) {
            // Look up the incoming element and complete the future with the result.
            String value = jedis.get(in.toString());
            this.logger.info("Redis value: " + value);
            out.complete(Collections.singleton(value));
        }
    }
}

Upsert/Read into/from Cassandra database using Datastax API (using new Binary protocol)

I have started working with the Cassandra database. I am planning to use the Datastax API to upsert/read into/from Cassandra. I am totally new to this Datastax API (which uses the new binary protocol) and I could not find much documentation with proper examples either. I created the column family below:
create column family profile
    with key_validation_class = 'UTF8Type'
    and comparator = 'UTF8Type'
    and default_validation_class = 'UTF8Type'
    and column_metadata = [
        {column_name : crd, validation_class : 'DateType'},
        {column_name : lmd, validation_class : 'DateType'},
        {column_name : account, validation_class : 'UTF8Type'},
        {column_name : advertising, validation_class : 'UTF8Type'},
        {column_name : behavior, validation_class : 'UTF8Type'},
        {column_name : info, validation_class : 'UTF8Type'}
    ];
Below is the singleton class I created for connecting to the Cassandra database using the Datastax API (which uses the new binary protocol):
public class CassandraDatastaxConnection {

    private static CassandraDatastaxConnection _instance;
    protected static Cluster cluster;
    protected static Session session;

    public static synchronized CassandraDatastaxConnection getInstance() {
        if (_instance == null) {
            _instance = new CassandraDatastaxConnection();
        }
        return _instance;
    }

    /**
     * Creating Cassandra connection using Datastax API
     */
    private CassandraDatastaxConnection() {
        try {
            cluster = Cluster.builder().addContactPoint("localhost").build();
            session = cluster.connect("my_keyspace");
        } catch (NoHostAvailableException e) {
            throw new RuntimeException(e);
        }
    }

    public static Cluster getCluster() {
        return cluster;
    }

    public static Session getSession() {
        return session;
    }
}
First question: let me know if I am missing anything in the above singleton class while making the connection to Cassandra using the Datastax API (new binary protocol).
Second question: now I am trying to upsert and read data into/from the Cassandra database.
These are the methods in my DAOs which will use the above singleton class:
public Map<String, String> getColumnNames(final String userId, final Collection<String> columnNames) {
    //I am not sure what I am supposed to do here?
    //Given a userId, I need to retrieve those columnNames from the Cassandra database
    //And then put it in the map with column name and its value and then finally return the map
    Map<String, String> attributes = new ConcurrentHashMap<String, String>();
    for (String col : columnNames) {
        attributes.put(col, colValue);
    }
    return attributes;
}

/**
 * Performs an upsert of the specified attributes for the specified id.
 */
public void upsertAttributes(final String userId, final Map<String, String> columnNameAndValue) {
    //I am not sure what I am supposed to do here to upsert the data in Cassandra database.
    //Given a userId, I need to upsert the columns values into Cassandra database.
    //columnNameAndValue is the map which will have column name as the key and corresponding column value as the value.
}
Can anyone help me with this? I am totally new to this Datastax API, which uses the new binary protocol, so I'm having a lot of problems with it.
Thanks for the help.
In your cassandra.yaml file look for the start_native_transport setting; it is disabled by default, so enable it.
Working with the Datastax Java Driver is quite similar to working with a JDBC driver.
Insertion code:
String query = "insert into test(key, col1, col2) values('1', 'value1', 'value2')";
session.execute(query);
Reading from Cassandra:
String query = "select * from test;";
ResultSet result = session.execute(query);
for (Row row : result) {
    System.out.println(row.getString("key"));
}
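Building on that, here is a rough sketch of how the two DAO methods from the question could be written with the driver and prepared statements. It assumes the profile data is exposed as a CQL table with a text key column (named key) plus the attribute columns defined above; in CQL an INSERT is already an upsert:

public Map<String, String> getColumnNames(final String userId, final Collection<String> columnNames) {
    CassandraDatastaxConnection.getInstance(); // make sure cluster/session are initialized
    Session session = CassandraDatastaxConnection.getSession();

    Map<String, String> attributes = new ConcurrentHashMap<String, String>();
    // e.g. SELECT account, advertising FROM profile WHERE key = ?
    String cql = "SELECT " + String.join(", ", columnNames) + " FROM profile WHERE key = ?";
    Row row = session.execute(session.prepare(cql).bind(userId)).one();
    if (row != null) {
        for (String col : columnNames) {
            String value = row.getString(col);
            if (value != null) {
                attributes.put(col, value);
            }
        }
    }
    return attributes;
}

public void upsertAttributes(final String userId, final Map<String, String> columnNameAndValue) {
    CassandraDatastaxConnection.getInstance();
    Session session = CassandraDatastaxConnection.getSession();

    // Build e.g. INSERT INTO profile (key, account, info) VALUES (?, ?, ?)
    List<String> cols = new ArrayList<String>(columnNameAndValue.keySet());
    List<Object> values = new ArrayList<Object>();
    values.add(userId);
    StringBuilder columnsPart = new StringBuilder("key");
    StringBuilder valuesPart = new StringBuilder("?");
    for (String col : cols) {
        columnsPart.append(", ").append(col);
        valuesPart.append(", ?");
        values.add(columnNameAndValue.get(col));
    }
    String cql = "INSERT INTO profile (" + columnsPart + ") VALUES (" + valuesPart + ")";
    session.execute(session.prepare(cql).bind(values.toArray()));
}

In a real DAO you would prepare these statements once and cache them rather than re-preparing on every call.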
