I'm trying to integrate Calcite with Kafka; I referenced CsvStreamableTable.
Each ConsumerRecord is converted to an Object[] using the following code:
static class ArrayRowConverter extends RowConverter<Object[]> {
    private final List<Schema.Field> fields;

    public ArrayRowConverter(List<Schema.Field> fields) {
        this.fields = fields;
    }

    @Override
    Object[] convertRow(ConsumerRecord<String, GenericRecord> consumerRecord) {
        Object[] objects = new Object[fields.size() + 1];
        int i = 0;
        objects[i++] = consumerRecord.timestamp();
        for (Schema.Field field : this.fields) {
            Object obj = consumerRecord.value().get(field.name());
            if (obj instanceof Utf8) {
                objects[i++] = obj.toString();
            } else {
                objects[i++] = obj;
            }
        }
        return objects;
    }
}
The Enumerator is implemented as follows: one thread constantly polls records from Kafka and puts them into a queue, and the getRecord() method polls from that queue:
public E current() {
    return current;
}

public boolean moveNext() {
    for (;;) {
        if (cancelFlag.get()) {
            return false;
        }
        ConsumerRecord<String, GenericRecord> record = getRecord();
        if (record == null) {
            try {
                Thread.sleep(200L);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            continue;
        }
        current = rowConvert.convertRow(record);
        return true;
    }
}
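For reference, the polling side described above could look like the following; a minimal sketch, where the consumer setup and the name pollLoop are assumptions (cancelFlag is the same flag checked in moveNext()):

// uses java.time.Duration and java.util.concurrent.LinkedBlockingQueue
private final BlockingQueue<ConsumerRecord<String, GenericRecord>> queue =
        new LinkedBlockingQueue<>();

// background thread body: keep polling Kafka and enqueue every record
private void pollLoop(KafkaConsumer<String, GenericRecord> consumer) {
    while (!cancelFlag.get()) {
        for (ConsumerRecord<String, GenericRecord> rec : consumer.poll(Duration.ofMillis(100))) {
            queue.offer(rec);
        }
    }
}

// moveNext() consumes from this queue via getRecord(); null means "no data yet"
private ConsumerRecord<String, GenericRecord> getRecord() {
    return queue.poll();
}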
I tested SELECT STREAM * FROM KAFKA.clicks, and it works fine.
rowtime is the first column, explicitly added, and its value is the record timestamp from Kafka.
But when I tried
SELECT STREAM FLOOR(rowtime TO HOUR)
AS rowtime,ip,COUNT(*) AS c FROM KAFKA.clicks GROUP BY FLOOR(rowtime TO HOUR), ip
it threw this exception:
java.sql.SQLException: Error while executing SQL "SELECT STREAM FLOOR(rowtime TO HOUR) AS rowtime,ip,COUNT(*) AS c FROM KAFKA.clicks GROUP BY FLOOR(rowtime TO HOUR), ip": From line 1, column 85 to line 1, column 119: Streaming aggregation requires at least one monotonic expression in GROUP BY clause
at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
You need to declare that the "ROWTIME" column is monotonic. In MockCatalogReader, note how "ROWTIME" is declared monotonic in the "ORDERS" and "SHIPMENTS" streams. That's why some queries in SqlValidatorTest.testStreamGroupBy() are valid and others are not. The key method relied upon by the validator is SqlValidatorTable.getMonotonicity(String columnName).
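In a custom adapter you can surface that monotonicity through the table's Statistic; a minimal sketch, assuming the Kafka table extends Calcite's AbstractTable (RelOptTableImpl derives getMonotonicity() from the statistic's collations; verify the overloads against your Calcite version):

@Override
public Statistic getStatistic() {
    // an ascending collation on field 0 (the ROWTIME column) lets the
    // validator report that column as monotonically increasing
    return Statistics.of(null, ImmutableList.of(),
            ImmutableList.of(RelCollations.of(new RelFieldCollation(0))));
}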
If an entity has 10 fields, and 5 of them have data while 5 have no data in the database record, then that entity record is 50% complete.
How can we calculate this using Spring Data JPA or any existing Java libraries?
You could try using reflection in a transient field of the entity class.
@Transient
public float getPercentDone() {
    var propDesc = BeanUtilsBean2.getInstance().getPropertyUtils().getPropertyDescriptors(this);
    var allProps = Arrays.stream(propDesc)
            .filter(prop -> !"percentDone".equals(prop.getName()))
            .collect(Collectors.toList());
    var countNotNull = allProps.stream().filter(prop -> {
        try {
            return BeanUtilsBean2.getInstance().getProperty(this, prop.getName()) != null;
        } catch (Exception e) {
            return false;
        }
    }).count();
    return (countNotNull * 100.0f) / allProps.size();
}
I used BeanUtils from Apache Commons for this, but if you can't use that, you can do the same with out-of-the-box reflection (it is just longer), as sketched below.
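If Commons BeanUtils is not available, a minimal sketch of the same idea with plain java.lang.reflect, counting declared instance fields rather than bean properties:

@Transient
public float getPercentDone() {
    java.lang.reflect.Field[] fields = getClass().getDeclaredFields();
    int total = 0, notNull = 0;
    for (java.lang.reflect.Field f : fields) {
        if (java.lang.reflect.Modifier.isStatic(f.getModifiers())) {
            continue; // ignore constants and other static members
        }
        f.setAccessible(true);
        total++;
        try {
            if (f.get(this) != null) {
                notNull++;
            }
        } catch (IllegalAccessException ignored) {
            // treat unreadable fields as empty
        }
    }
    return total == 0 ? 0f : (notNull * 100.0f) / total;
}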
Skipping Fields
To skip fields like IDs and join fields, you can create a list and replace the filter with one that checks the list of skipped properties. If you put this in a @MappedSuperclass, the entity children only need to override the list.
Note: percentDone and skippedProperties must both be skipped fields themselves.
List<String> skippedProperties = List.of("percentDone", "skippedProperties", "id", "user");
...
@Transient
public float getPercentDone() {
    var propDesc = BeanUtilsBean2.getInstance().getPropertyUtils().getPropertyDescriptors(this);
    var allProps = Arrays.stream(propDesc)
            .filter(prop -> !skippedProperties.contains(prop.getName()))
            .collect(Collectors.toList());
    var countNotNull = allProps.stream().filter(prop -> {
        try {
            return BeanUtilsBean2.getInstance().getProperty(this, prop.getName()) != null;
        } catch (Exception e) {
            return false;
        }
    }).count();
    // allProps already excludes the skipped properties, so divide by its size directly
    return (countNotNull * 100.0f) / allProps.size();
}
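One caveat: Java fields are hidden rather than overridden, so a child's skippedProperties field would not be seen by code in the superclass. Exposing the list through an overridable method avoids that; a sketch with assumed class names:

@MappedSuperclass
public abstract class BaseEntity {
    @Transient
    protected List<String> getSkippedProperties() {
        return List.of("percentDone", "skippedProperties", "id");
    }
    // getPercentDone() filters on getSkippedProperties() instead of a field
}

@Entity
public class Order extends BaseEntity {
    @Override
    protected List<String> getSkippedProperties() {
        return List.of("percentDone", "skippedProperties", "id", "user");
    }
}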
I wrote a simple test to validate that duplicates do not exist, like this:
@Test
public void testSameDataNotPushedTwice() throws Exception {
    // Do some logic
    // index contains the ES index name
    // adding this line fails the test
    // deleteOldData(esPersistence.getESClient(), index);
    esPersistence.insert(cdrData);
    esPersistence.insert(cdrData);
    SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);
    assertThat(searchResponse.getHits().getHits().length).isEqualTo(1);
}
As you can see, I push data to ES and check that the hits length equals 1.
The test passes when the delete line is commented out.
Now I want to make sure there is no data from other tests, so I want to delete the index before the insert. The delete method works, but the search response returns 0 hits after the insert.
The delete index method:
public static void deleteOldData(RestHighLevelClient client, String index) throws IOException {
    GetIndexRequest request = new GetIndexRequest(index);
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    if (exists) {
        DeleteIndexRequest deleteRequest = new DeleteIndexRequest(index);
        client.indices().delete(deleteRequest, RequestOptions.DEFAULT);
    }
}
Highlights:
ES 7.6.2
The data exists in ES.
Adding a sleep does not solve the problem (even for 10 seconds).
The search works (the document is found) while debugging.
Bottom line: how can I perform delete index --> insert --> search and find the documents?
EDIT:
Added an insert to ES and a GetSettingsRequest:
deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(testData);
GetSettingsRequest request = new GetSettingsRequest().indices(index);
GetSettingsResponse getSettingsResponse = esPersistence.getESClient().indices().getSettings(request, RequestOptions.DEFAULT);
esPersistence.insert(testData);
Insert methods:
public boolean insert(List<ProjectData> projDataList) {
    // Relevant lines
    BulkRequest bulkRequest = prepareBulkRequests(projDataList, esConfiguration.getCdrDataIndexName());
    return insertBulk(bulkRequest);
}
private BulkRequest prepareBulkRequests(List<ProjectData> data, String indexName) {
    BulkRequest bulkRequest = new BulkRequest();
    for (ProjectData projectData : data) {
        String json = jsonParser.parsePojo(projectData);
        bulkRequest.add(new IndexRequest(indexName)
                .id(projectData.getId())
                .source(json, XContentType.JSON));
    }
    return bulkRequest;
}
private boolean insertBulk(BulkRequest bulkRequest) {
    try {
        BulkResponse bulkResponse = rhlClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        if (bulkResponse.hasFailures()) {
            logger.error(buildCustomBulkFailedMessage(bulkResponse));
            return false;
        }
    } catch (IOException e) {
        logger.warn("Failed to insert csv fields. Error: {}", e.getMessage());
        return false;
    }
    return true;
}
With special thanks to David Pilato (from the ES forum): you need to refresh the index after the insert operation, like this:
client.indices().refresh(new RefreshRequest(index), RequestOptions.DEFAULT);
link.
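Putting it together, the test flow becomes delete index --> insert --> refresh --> search; a sketch reusing the names from the question:

deleteOldData(esPersistence.getESClient(), index);
esPersistence.insert(cdrData);
esPersistence.insert(cdrData);
// force a refresh so the just-indexed documents are visible to search
esPersistence.getESClient().indices().refresh(new RefreshRequest(index), RequestOptions.DEFAULT);
SearchResponse searchResponse = getDataFromElastic(esPersistence.getESClient(), index);
assertThat(searchResponse.getHits().getHits().length).isEqualTo(1);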
I'm developing a JavaFX application that synchronizes some data between two different databases.
In the call method I get all the data and store it in an ArrayList. Then I loop through the ArrayList and try to get that same data from the second database.
If it exists, I compare it for differences, and if there are differences I update it. Otherwise, if it doesn't exist, I insert it via a DAO object method.
The problem is that sometimes the second database takes some time to provide the response, so the process continues its execution and the new data gets compared with old data.
My question is: how can I pause the process until the data has all been fetched, and then proceed with the synchronization logic?
@Override
protected Map call() throws Exception {
    Map<String, Integer> m = new HashMap<>();
    updateTitle("getting the data ...");
    int i, updated = 0, inserted = 0;
    // creating the first database instance
    DAOFactory db1Dao = DAOFactory.getInstance("db1");
    // creating the first database dataObject instance
    Db1EmployerDAO empDb1Dao = db1Dao.getDAODb1Employer();
    // creating the second database instance
    DAOFactory db2Dao = DAOFactory.getInstance("db2");
    // creating the second database dataObject instance
    Db2EmployeurDAO empDb2Dao = db2Dao.getDAODb2Employer();
    Employer emp;
    // getting all the objects
    List<Employer> LEmpDb1 = empDb1Dao.getAll();
    updateTitle("Data processing ...");
    // for each data in the list
    for (i = 1; i <= LEmpDb1.size(); i++) {
        if (isCancelled())
            break;
        updateMessage("Processing employer : " + LEmpDb1.get(i - 1).getNemploy() + " " + LEmpDb1.get(i - 1).getRaison());
        // trying to get the object from the second database; the process
        // sometimes continues before the result arrives, which is my problem
        emp = empDb2Dao.getEmployerByNo(LEmpDb1.get(i - 1).getNemploy());
        if (emp != null) {
            if (!LEmpDb1.get(i - 1).equals(emp))
                if (empDb2Dao.update(LEmpDb1.get(i - 1))) {
                    updated++;
                    LOG.log("MAJ employeur : " + LEmpDb1.get(i - 1).getNemploy() + " => " + LEmpDb1.get(i - 1).getDifferences(emp));
                }
        } else {
            if (empDb2Dao.insert(LEmpDb1.get(i - 1)))
                inserted++;
        }
        updateProgress(i, LEmpDb1.size());
    }
    m.put("upd", updated);
    m.put("ins", inserted);
    m.put("all", LEmpDb1.size());
    return m;
}
The getEmployerByNo method:
public synchronized Employer getEmployerByNo(String no_emp) throws DAOException {
    Employer emp = null;
    Connection con = null;
    PreparedStatement stm = null;
    ResultSet res = null;
    try {
        con = dao.getConnection();
        stm = preparedRequestInitialisation(con, GET_BY_NO_SQL, no_emp);
        res = stm.executeQuery();
        if (res.next()) {
            // map() maps the database result set data onto the object properties
            emp = map(res);
            LOG.info("getting the employer : " + no_emp);
        }
    } catch (SQLException e) {
        throw new DAOException(e.getLocalizedMessage());
    } finally {
        silentClose(res, stm, con);
    }
    return emp;
}
Look into using an ExecutorService and Future.get() as needed to wait for completion. See the documentation here and here. Here is a more-or-less complete example:
public class Application implements Runnable {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    public void run() {
        Dao firstDao = new DaoImpl();
        Dao secondDao = new AnotherDaoImpl();
        FetchAllTask fetchAll = new FetchAllTask(firstDao);
        Future<?> fetchAllFuture = pool.submit(fetchAll);
        try {
            fetchAllFuture.get();
        } catch (Exception e) {
            // TODO handle
            System.out.println("An exception occurred!");
            e.printStackTrace();
        }
        ConcurrentSkipListSet<AnObject> items = fetchAll.getItems();
        Iterator<AnObject> it = items.iterator();
        while (it.hasNext()) {
            // insert your cancellation logic here
            // ...
            AnObject daoObj = it.next();
            FetchOneTask fetchOne = new FetchOneTask(secondDao, daoObj.getId());
            Future<?> fetchOneFuture = pool.submit(fetchOne);
            try {
                fetchOneFuture.get();
                AnObject anotherDaoObj = fetchOne.getAnObject();
                if (anotherDaoObj == null) {
                    // the object retrieved by the first dao (first datasource)
                    // is not in the second; it needs to be inserted into the second
                    System.out.println(String.format("Inserting %s", daoObj));
                    secondDao.insert(daoObj);
                } else {
                    System.out.println(String.format("Updating %s to %s", anotherDaoObj, daoObj));
                    secondDao.update(daoObj);
                }
            } catch (Exception e) {
                System.out.println("An exception occurred!");
                e.printStackTrace();
            }
        }
        Set<AnObject> itemsInSecondDb = secondDao.fetchAll();
        for (AnObject o : itemsInSecondDb) {
            System.out.println(o);
        }
        pool.shutdown();
    }
    // ... invoke the app thread from somewhere else
}
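For instance, a hypothetical launcher that runs the app on its own thread:

public static void main(String[] args) {
    // the blocking Future.get() calls inside run() do the actual waiting
    new Thread(new Application()).start();
}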
I have a user form where the user specifies the search criteria, and I must apply them to get the right data from the database using ORMLite:
boolean set = false;
QueryBuilder<Client, Integer> builder = clientsDao.queryBuilder();
Where<Client, Integer> builderWhere = builder.where();
if (!tfSearchName.getText().equals("")) {
    builderWhere.like("name", tfSearchName.getText().trim());
    builderWhere.and();
    set = true;
}
if (!tfSearchBalanceMin.getText().equals("")) {
    // compare against the field's numeric value, not the TextField object itself
    builderWhere.gt("balance", Double.parseDouble(tfSearchBalanceMin.getText().trim()));
    builderWhere.and();
    set = true;
}
if (!tfSearchBalanceMax.getText().equals("")) {
    builderWhere.lt("balance", Double.parseDouble(tfSearchBalanceMax.getText().trim()));
    set = true;
}
clientTable.setItems(FXCollections.observableArrayList(
        set ? clientsDao.query(builderWhere.prepare())
            : clientsDao.queryForAll()));
The problem with the query builder is that there is always an AND clause left at the end, which always throws an exception.
I want to know a good way to build my SQL statement conditionally, like I do in my code.
You could try using public Where<T, ID> and(int numClauses).
For example:
int andClauses = 0; // number of clauses that should be connected with an "and" operation
QueryBuilder<Client, Integer> builder = clientsDao.queryBuilder();
Where<Client, Integer> builderWhere = builder.where();
if (!tfSearchName.getText().trim().isEmpty()) {
    builderWhere.like("name", tfSearchName.getText().trim());
    andClauses++;
}
if (!tfSearchBalanceMin.getText().trim().isEmpty()) {
    builderWhere.gt("balance", Double.parseDouble(tfSearchBalanceMin.getText().trim()));
    andClauses++;
}
if (!tfSearchBalanceMax.getText().trim().isEmpty()) {
    builderWhere.lt("balance", Double.parseDouble(tfSearchBalanceMax.getText().trim()));
    andClauses++;
}
clientTable.setItems(FXCollections.observableArrayList(
        andClauses > 0 ? clientsDao.query(builderWhere.and(andClauses).prepare())
                       : clientsDao.queryForAll()));
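Note that and(int) ANDs together the given number of preceding clauses, so the single-clause case is worth checking against your ORMLite version; a slightly more defensive variant only calls and() when at least two clauses exist:

// build the prepared query lazily so prepare() is never called on an empty Where
List<Client> results = (andClauses == 0)
        ? clientsDao.queryForAll()
        : clientsDao.query((andClauses > 1 ? builderWhere.and(andClauses) : builderWhere).prepare());
clientTable.setItems(FXCollections.observableArrayList(results));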
I need help with the case below.
I have two methods:
private void calculateTime(Map.Entry<List<String>, List<LogRecord>> entry, List<LogProcess> processList) {
    List<List<LogRecord>> processSpentTime = new ArrayList<List<LogRecord>>();
    processSpentTime = subListProcess(entry, processSpentTime);
    for (List<LogRecord> item : processSpentTime) {
        processList = parse(item, DEFAULT_START_LEVEL);
    }
}
and the second method:
private List<LogProcess> parse(List<LogRecord> recordList, int level) {
    List<LogProcess> processList = new ArrayList<LogProcess>();
    if (!recordList.isEmpty()) {
        LogProcess process = findProcess(recordList, level);
        if (!(process instanceof NullLogProcess)) {
            if (!(process instanceof IncompleteLogProcess)) {
                processList.add(process);
            }
            int fromIndex = recordList.indexOf(process.returnStartIndexOfNextProcess()) + 1;
            processList.addAll(parse(recordList.subList(fromIndex, recordList.size()), level));
        }
    }
    return processList;
}

public LogProcess findProcess(List<LogRecord> recordList, int level) {
    LogRecord endRecord = null;
    LogRecord startRecord = findStartRecord(recordList);
    if (startRecord instanceof NullLogRecord) {
        return new NullLogProcess();
    }
    List<LogRecord> startEndRecord = findStartEndRecord(startRecord, recordList);
    startRecord = startEndRecord.get(0);
    endRecord = startEndRecord.get(1);
    LogProcess process = returnLogProcess(startRecord, endRecord);
    process.setLevel(level);
    process.setChildren(findChildProcess(recordList, startRecord, endRecord, level + 1));
    return process;
}

private List<LogProcess> findChildProcess(List<LogRecord> recordList, LogRecord startRecord, LogRecord endRecord, int level) {
    int fromIndex = recordList.indexOf(startRecord) + 1;
    int toIndex = recordList.indexOf(endRecord);
    if (toIndex > fromIndex) {
        List<LogRecord> recordSubList = recordList.subList(fromIndex, toIndex);
        return parse(recordSubList, level);
    } else {
        return new ArrayList<LogProcess>();
    }
}

private List<LogRecord> findStartEndRecord(LogRecord startRecord, List<LogRecord> recordList) {
    List<LogRecord> startEndRecord = new ArrayList<LogRecord>();
    if (!recordList.isEmpty()) {
        startEndRecord.add(startRecord);
        for (LogRecord record : recordList) {
            boolean isStartRecord = record.isStartPoint() && record.hasSameActionName(startRecord);
            if (isStartRecord) {
                startEndRecord = new ArrayList<LogRecord>();
                startEndRecord.add(record);
                continue;
            }
            boolean isEndRecord = record.isEndPoint() && record.hasSameActionName(startRecord);
            if (isEndRecord) {
                startEndRecord.add(record);
                return startEndRecord;
            }
        }
        return startEndRecord;
    }
    return startEndRecord;
}

private LogRecord findStartRecord(List<LogRecord> recordList) {
    for (LogRecord record : recordList) {
        if (record.isStartPoint()) {
            recordList.remove(record);
            return record;
        }
    }
    return new NullLogRecord();
}
In the calculateTime method's for loop I only get the result for the first item, and after that I get the error from the title (ConcurrentModificationException). Please help me and explain this case in more detail.
The name of this exception is a bit confusing, because it isn't related to multithreading.
What happens is that you are iterating over a collection that is being modified while you are iterating over it.
If performance is not your highest concern, a simple way out is to copy the list, iterate over the copy, and modify the original list.
My guess is it's related to recordList.subList():
Returns a view of the portion of this list. [..] The returned list is backed by this list. [..] The semantics of the list returned by this method become undefined if the backing list (i.e., this list) is structurally modified in any way other than via the returned list. [..] All methods first check to see if the actual modCount of the backing list is equal to its expected value, and throw a ConcurrentModificationException if it is not.
I don't see any modification during the iteration itself, so it probably happens in findProcess(), which (via findStartRecord()) removes a record from the backing list. Consider creating a copy of that sublist:
new ArrayList<>(recordList.subList(fromIndex, toIndex))
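In context, in findChildProcess from the question:

// Detach the sublist from its backing list so the later remove in
// findStartRecord() cannot invalidate the view we recurse over.
List<LogRecord> recordSubList = new ArrayList<>(recordList.subList(fromIndex, toIndex));
return parse(recordSubList, level);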
You are getting the exception because of this :
for (LogRecord record : recordList) {
    if (record.isStartPoint()) {
        recordList.remove(record); // <--- This is the cause
        return record;
    }
}
Use an Iterator Instead
Iterator<LogRecord> iterator = recordList.iterator();
while (iterator.hasNext()) {
    LogRecord logRecord = iterator.next();
    if (logRecord.isStartPoint()) {
        iterator.remove();
        return logRecord;
    }
}
return new NullLogRecord();
Check if this works.
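Alternatively, on Java 8+ you can locate the record first and remove it after the traversal, so no live iteration is modified; a sketch:

// find the first start record, then remove it by reference
Optional<LogRecord> first = recordList.stream()
        .filter(LogRecord::isStartPoint)
        .findFirst();
if (first.isPresent()) {
    recordList.remove(first.get());
    return first.get();
}
return new NullLogRecord();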