On GAE with Spring/JDO, after saving two entities in a transaction:
calling getById fetches the entities from the datastore,
calling getCount() returns "0",
and calling getAll() returns an empty collection.
@Override
public Long getCount() {
    return ((Integer) getJdoTemplate().execute(new JdoCallback() {
        @Override
        public Object doInJdo(PersistenceManager pm) throws JDOException {
            Query q = pm.newQuery(getPersistentClass());
            q.setResult("count(this)");
            return q.execute();
        }
    })).longValue();
}

@Override
public void saveOrUpdate(T entity) {
    getJdoTemplate().makePersistent(entity);
}

@Override
public List<T> getAll() {
    return new ArrayList<T>(getJdoTemplate().find(getPersistentClass()));
}
Google's implementation of JDO does not currently support aggregates, as far as I know. Try keeping track of the count yourself by updating another entity every time you persist a new entity. If you are doing frequent writes, you'll want a "sharded" counter.
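A minimal sketch of the simple (non-sharded) variant, assuming a JDO counter entity; EntityCounter, counterName and increment() are illustrative names, not from the original post:

import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable
public class EntityCounter {

    @PrimaryKey
    private String counterName; // key name, e.g. "myEntityCounter"

    @Persistent
    private long count;

    public EntityCounter(String counterName) {
        this.counterName = counterName;
    }

    public void increment() {
        count++;
    }

    public long getCount() {
        return count;
    }
}

// In the same transaction that persists the new entity:
EntityCounter counter = pm.getObjectById(EntityCounter.class, "myEntityCounter");
counter.increment();
pm.makePersistent(counter);

For high write rates you would split this into several shard entities and sum them when reading, to avoid contention on a single datastore entity.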
Your question is pretty close to this one, so reading those answers may help.
count() is actually implemented in GAE/J's DataNucleus plugin, as seen here:
http://code.google.com/p/datanucleus-appengine/source/browse/trunk/src/org/datanucleus/store/appengine/query/DatastoreQuery.java#341
If you have a problem with it, I suggest you provide a test case to Google and raise an issue on the issue tracker for their GAE/J DataNucleus plugin ("Issues" on the linked page).
Background
I have a Spring Batch job where:
FlatFileItemReader - reads one row at a time from the file
ItemProcessor - transforms the row from the file into a List<MyObject> and returns the list. That is, each row in the file is broken down into a List<MyObject> (one row in the file is transformed into many output rows).
ItemWriter - writes the List<MyObject> to a database table. (I used this implementation to unpack the list received from the processor and delegate to a JdbcBatchItemWriter.)
Question
At point 2), the processor can return a List of 100,000 MyObject instances.
At point 3), the delegate JdbcBatchItemWriter will end up writing the entire List of 100,000 objects to the database.
My question is: the JdbcBatchItemWriter does not allow a custom batch size. For all practical purposes, the batch size equals the commit interval for the step. With this in mind, is there another ItemWriter implementation available in Spring Batch that writes to the database and allows a configurable batch size? If not, how do I go about writing a custom writer myself to achieve this?
I see no obvious way to set the batch size on the JdbcBatchItemWriter. However, you can extend the writer and use a custom BatchPreparedStatementSetter to specify the batch size. Here is a quick example:
public class MyCustomWriter<T> extends JdbcBatchItemWriter<T> {

    @Override
    public void write(List<? extends T> items) throws Exception {
        namedParameterJdbcTemplate.getJdbcOperations().batchUpdate("your sql", new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                // set values on your sql
            }

            @Override
            public int getBatchSize() {
                return items.size(); // or any other value you want
            }
        });
    }
}
The StagingItemWriter in the samples is an example of how to use a custom BatchPreparedStatementSetter as well.
The answer from Mahmoud Ben Hassine and the comments pretty much cover all aspects of the solution, and it is the accepted answer.
Here is the implementation I used, in case anyone is interested:
public class JdbcCustomBatchSizeItemWriter<W> extends JdbcDaoSupport implements ItemWriter<W> {

    private int batchSize;
    private ParameterizedPreparedStatementSetter<W> preparedStatementSetter;
    private String sqlFileLocation;
    private String sql;

    public void initReader() {
        this.setSql(FileUtilities.getFileContent(sqlFileLocation));
    }

    public void write(List<? extends W> arg0) throws Exception {
        getJdbcTemplate().batchUpdate(sql, Collections.unmodifiableList(arg0), batchSize, preparedStatementSetter);
    }

    public void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    public void setPreparedStatementSetter(ParameterizedPreparedStatementSetter<W> preparedStatementSetter) {
        this.preparedStatementSetter = preparedStatementSetter;
    }

    public void setSqlFileLocation(String sqlFileLocation) {
        this.sqlFileLocation = sqlFileLocation;
    }

    public void setSql(String sql) {
        this.sql = sql;
    }
}
Note:
The use of Collections.unmodifiableList prevents the need for any explicit casting.
I use sqlFileLocation to specify an external file that contains the SQL, and FileUtilities.getFileContent simply returns the contents of that file. This can be skipped; the SQL can also be passed directly to the class when creating the bean.
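For reference, a rough wiring sketch; the data source, SQL, column names and MyObject accessors are placeholders, and the SQL is set directly rather than through sqlFileLocation:

JdbcCustomBatchSizeItemWriter<MyObject> writer = new JdbcCustomBatchSizeItemWriter<>();
writer.setDataSource(dataSource); // inherited from JdbcDaoSupport
writer.setSql("INSERT INTO my_table (col_a, col_b) VALUES (?, ?)");
writer.setBatchSize(1000);
writer.setPreparedStatementSetter((ps, item) -> {
    ps.setString(1, item.getColA());
    ps.setString(2, item.getColB());
});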
I wouldn't do this. It presents issues for restartability. Instead, modify your reader to produce individual items rather than having your processor take in an object and return a list.
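A rough sketch of that suggestion, assuming a delegate reader for the raw file rows and an expand() method that holds the transformation currently done in the processor (FileRow, MyObject and expand are illustrative names):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import org.springframework.batch.item.ItemReader;

public class FlatteningItemReader implements ItemReader<MyObject> {

    private final ItemReader<FileRow> delegate; // e.g. the existing FlatFileItemReader
    private final Deque<MyObject> buffer = new ArrayDeque<>();

    public FlatteningItemReader(ItemReader<FileRow> delegate) {
        this.delegate = delegate;
    }

    @Override
    public MyObject read() throws Exception {
        while (buffer.isEmpty()) {
            FileRow row = delegate.read();
            if (row == null) {
                return null; // end of input
            }
            buffer.addAll(expand(row)); // one file row -> many MyObject instances
        }
        return buffer.poll();
    }

    // the transformation previously done in the ItemProcessor
    private List<MyObject> expand(FileRow row) {
        return new ArrayList<>();
    }
}

Buffering inside the reader still needs some thought around restartability, since the buffered items are not stored in the execution context.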
My goal is to cache data in memory for 60 seconds. As soon as an entry is read from the cache, I want to remove it (permit single reads only).
If the 60 seconds have expired in the meantime and the entry is still in the cache, I want to write the entry to a database (write-behind).
Is there any existing technology/Spring/Apache framework that already offers such a cache?
(Side note: I don't want to use complex libraries like Redis or Ehcache for such a simple use case.)
If set up manually, I'd do it as follows. But probably there are better options?
@Service
public class WriteBehindCache {

    static class ObjectEntry {
        Object data;
        LocalDateTime timestamp;

        public ObjectEntry(Object data) {
            this.data = data;
            timestamp = LocalDateTime.now();
        }
    }

    Map<String, ObjectEntry> cache = new ConcurrentHashMap<>();

    DatabaseService databaseService; // assumed persistence collaborator (pseudocode)

    // batch every minute
    @Scheduled(fixedRate = 60000)
    public void writeBehind() {
        LocalDateTime now = LocalDateTime.now();
        List<Map.Entry<String, ObjectEntry>> outdated = cache.entrySet().stream()
                .filter(entry -> entry.getValue().timestamp.plusSeconds(60).isBefore(now))
                .collect(Collectors.toList());
        databaseService.persist(outdated);
        outdated.forEach(entry -> cache.remove(entry.getKey()));
    }

    // always keep most recent entry
    public void add(String key, Object data) {
        cache.put(key, new ObjectEntry(data));
    }

    // single read: remove from cache, fall back to the database if not cached
    public Object get(String key) {
        ObjectEntry entry = cache.remove(key);
        if (entry != null) {
            return entry.data;
        }
        Object fromDb = databaseService.query(key);
        if (fromDb != null) {
            databaseService.remove(fromDb);
        }
        return fromDb;
    }
}
Your solution has two problems:
You are doing a sequential scan to find entries to persist, which will get costly when there are a lot of entries.
The code has race conditions.
Because of the race conditions, the code does not satisfy your requirements: it is possible to construct a concurrent access sequence in which an entry is removed from the cache (read once) yet is also written to the database, for example when a reader removes an entry right after the scheduled job has collected it for persisting.
Is there any existing technology/Spring/Apache framework that already offers such a cache? (Side note: I don't want to use complex libraries like Redis or Ehcache for such a simple use case.)
I think you can solve the concurrency issues with ConcurrentHashMap alone, but I don't know an elegant way to handle the timeout. Still, a possible solution is to use a caching library. I'd like to offer an example based on cache2k, which is not heavyweight (about a 400 kB jar) and has other nice use cases as well. As an extra, there is also good support for the Spring caching abstraction.
public static class WriteBehindCache {

    Cache<String, Object> cache = Cache2kBuilder.of(String.class, Object.class)
            .addListener((CacheEntryExpiredListener<String, Object>) (cache, entry)
                    -> persist(entry.getKey(), entry.getValue()))
            .expireAfterWrite(60, TimeUnit.SECONDS)
            .build();

    public void add(String key, Object data) {
        cache.put(key, data);
    }

    public Object get(String key) {
        return cache.invoke(key, e -> {
            if (e.exists()) {
                Object v = e.getValue();
                e.remove();
                return v;
            }
            return loadAndRemove(e.getKey());
        });
    }

    // stubs
    protected void persist(String key, Object value) {
    }

    protected Object loadAndRemove(String key) {
        return null;
    }
}
With this wiring, the cache blocks concurrent operations on a single entry, so it is guaranteed that only one database operation runs for a given entry at a time.
You can do it in similar ways with other caching libraries. Using the JCache/JSR107 API the code would look almost identical.
A lighter approach is to use jhalterman's expiringmap.
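For illustration, a minimal sketch along those lines, with the API taken from the expiringmap (net.jodah.expiringmap) README; databaseService.persist(key, value) is a placeholder persistence call:

ExpirationListener<String, Object> writeBehind =
        (key, value) -> databaseService.persist(key, value); // placeholder persistence call

Map<String, Object> cache = ExpiringMap.builder()
        .expirationListener(writeBehind)
        .expiration(60, TimeUnit.SECONDS)
        .build();

cache.put("someKey", data);              // add or refresh an entry
Object value = cache.remove("someKey");  // single read: fetch and evict in one step

An entry that is read (removed) before the 60 seconds are up should never reach the listener, which matches the single-read requirement.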
Personally, I believe a cache should be in every developer's toolbox. However, I am the author of cache2k, so of course I would say that.
Does anyone know how, in Spring Batch (3.0.7), I can flatten the result of a processor that returns a list of entities?
Example:
I have a processor that returns a List<Entity>:
public class MyProcessor implements ItemProcessor<Long, List<Entity>> {
    public List<Entity> process(Long id) {
        // ...
    }
}
Now all the following processors/writers need to work on List<Entity>. Is there any way to flatten the result to single Entity items so that the further processors in the step can work on single entities?
The only way I can see is to persist the list somehow with a writer and then create a separate step that reads from the persisted data.
Thanks in advance!
As you know, processors in Spring Batch can be chained with a composite processor. Within the chain, you can change the processing type from processor to processor, but of course the input and output types of two neighbouring processors have to match.
However, the input and output types are always treated as one item. Therefore, if the output type of a processor is a List, this list is regarded as one item. Hence, the following processor needs to have List as its input type, and if a writer follows, the writer's write method needs to take a list of lists.
Moreover, a processor cannot multiply its elements: there can be only one output item for every input item.
Basically, there is nothing wrong with having a chain like
Reader<Integer>
ProcessorA<Integer,List<Integer>>
ProcessorB<List<Integer>,List<Integer>>
Writer<List<Integer>> (which leads to a write method write(List<List<Integer>> items))
Depending on the context, there could be a better solution.
You could mitigate the impact (for instance on reusability) by using a wrapper processor and a wrapper writer like the following code examples:
public class ListWrapperProcessor<I, O> implements ItemProcessor<List<I>, List<O>> {

    private ItemProcessor<I, O> delegate;

    public void setDelegate(ItemProcessor<I, O> delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<O> process(List<I> itemList) throws Exception {
        List<O> outputList = new ArrayList<>();
        for (I item : itemList) {
            O outputItem = delegate.process(item);
            if (outputItem != null) {
                outputList.add(outputItem);
            }
        }
        if (outputList.isEmpty()) {
            return null;
        }
        return outputList;
    }
}
public class ListOfListItemWriter<T> implements InitializingBean, ItemStreamWriter<List<T>> {

    private ItemStreamWriter<T> itemWriter;

    @Override
    public void write(List<? extends List<T>> listOfLists) throws Exception {
        if (listOfLists.isEmpty()) {
            return;
        }
        List<T> all = listOfLists.stream().flatMap(Collection::stream).collect(Collectors.toList());
        itemWriter.write(all);
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Assert.notNull(itemWriter, "The 'itemWriter' may not be null");
    }

    public void setItemWriter(ItemStreamWriter<T> itemWriter) {
        this.itemWriter = itemWriter;
    }

    @Override
    public void close() {
        this.itemWriter.close();
    }

    @Override
    public void open(ExecutionContext executionContext) {
        this.itemWriter.open(executionContext);
    }

    @Override
    public void update(ExecutionContext executionContext) {
        this.itemWriter.update(executionContext);
    }
}
Using such wrappers, you can still implement "normal" processors and writers and then use the wrappers to move the List handling out of them, as in the wiring sketch below.
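A short wiring sketch along those lines; existingEntityProcessor and existingEntityWriter stand for whatever single-item components you already have:

ListWrapperProcessor<Entity, Entity> listProcessor = new ListWrapperProcessor<>();
listProcessor.setDelegate(existingEntityProcessor); // ItemProcessor<Entity, Entity>

ListOfListItemWriter<Entity> listWriter = new ListOfListItemWriter<>();
listWriter.setItemWriter(existingEntityWriter);     // ItemStreamWriter<Entity>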
Unless you can provide a compelling reason, there's no reason to send a List of Lists to your ItemWriter. That is not how the ItemProcessor was intended to be used. Instead, you should create/configure an ItemReader that returns one object containing the relevant related objects.
For example, if you're reading from the database, you could use the HibernateCursorItemReader and a query that looks something like this:
"from ParentEntity parent left join fetch parent.childrenEntities"
Your data model SHOULD have a parent table with the Long id that you're currently passing to your ItemProcessor, so leverage that to your advantage. The reader would then pass back ParentEntity objects, each with a collection of ChildEntity objects that go along with it.
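A rough configuration sketch of such a reader; sessionFactory is assumed to be available, and the entity and association names come from the example query above:

HibernateCursorItemReader<ParentEntity> reader = new HibernateCursorItemReader<>();
reader.setSessionFactory(sessionFactory);
reader.setQueryString("from ParentEntity parent left join fetch parent.childrenEntities");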
I need to apply a custom filter to make sure that the user who executed the search has permission to view the documents returned by the searcher. I have extended SimpleCollector. However, the Javadoc recommends against calling IndexSearcher or IndexReader document lookups inside the collector:
Note: This is called in an inner search loop. For good search performance, implementations of this method should not call IndexSearcher.doc(int) or org.apache.lucene.index.IndexReader.document(int) on every hit. Doing so can slow searches by an order of magnitude or more.
I need to fetch the documents to get the id term and check it against data held in the DB. Is there another way to filter the results other than using a collector?
Is there a more efficient way of obtaining the documents without calling IndexSearcher.doc()?
My code currently is the following:
public class PermittedResultsCollector extends SimpleCollector {

    private IndexSearcher searcher;

    public PermittedResultsCollector(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    public boolean needsScores() {
        return false;
    }

    @Override
    public void collect(int doc) throws IOException {
        Document document = searcher.doc(doc);
        if (callToExternalService(document.get("id"))) {
            throw new CollectionTerminatedException();
        }
    }

    public IndexSearcher getSearcher() {
        return searcher;
    }
}
I'm having trouble trying to cache data from Parse.com.
I've been reading the Parse API documentation on caching, but I'm still having trouble understanding it. How do I extract and cache data with this?
query.setCachePolicy(ParseQuery.CachePolicy.NETWORK_ELSE_CACHE);
query.findInBackground(new FindCallback<ParseObject>() {
    public void done(List<ParseObject> scoreList, ParseException e) {
        if (e == null) {
            // Results were successfully found, looking first on the
            // network and then on disk.
        } else {
            // The network was inaccessible and we have no cached data
            // for this query.
        }
    }
});
The data is cached automatically in internal storage if you specify a CachePolicy. The default is CachePolicy.IGNORE_CACHE, so no data is cached. Since you are interested in getting results from the cache, it makes more sense to use CachePolicy.CACHE_ELSE_NETWORK, so the query looks in the cache first. In your case, the data you are looking for ends up in the variable scoreList.
Maybe it is difficult for you to understand how your code works because you're using a callback (because of findInBackground()). Consider the following code:
ParseQuery<Person> personParseQuery = new ParseQuery<Person>(Person.class);
personParseQuery.setCachePolicy(ParseQuery.CachePolicy.CACHE_ELSE_NETWORK);
personParseQuery.addAscendingOrder("sort_order");
List<Person> people = personParseQuery.find();
As you can see, the result of the query is returned by the find() method. From the Parse API documentation:
public List<T> find() throws ParseException -
Retrieves a list of ParseObjects that satisfy this query. Uses the network and/or the cache, depending on the cache policy.
The Person class may look like this:
@ParseClassName("Person")
public class Person extends ParseObject {

    public Person() {}

    public String getPersonName() {
        return getString("personName");
    }

    public void setPersonName(String personName) {
        put("personName", personName);
    }
}
And of course, don't forget to initialize Parse first and register the Person class:
Parse.initialize(this, "appID", "clientID");
ParseObject.registerSubclass(Person.class);
I hope my explanation can help you.
PS: You can see that the data is cached by looking inside the data/data/<your application package name>/cache/com.parse folder on your emulator after executing the code.