Caching with Parse Service - java

I'm having trouble trying to cache data from Parse.com.
I've been reading the Parse API documentation on caching, but I'm still having trouble understanding it. How do I extract data and cache it with this?
query.setCachePolicy(ParseQuery.CachePolicy.NETWORK_ELSE_CACHE);
query.findInBackground(new FindCallback<ParseObject>() {
    public void done(List<ParseObject> scoreList, ParseException e) {
        if (e == null) {
            // Results were successfully found, looking first on the
            // network and then on disk.
        } else {
            // The network was inaccessible and we have no cached data
            // for this query.
        }
    }
});

The data is cached automatically on internal storage if you specify a CachePolicy. The default is CachePolicy.IGNORE_CACHE, so no data is cached. Since you are interested in getting results from the cache, it would make more sense to use CachePolicy.CACHE_ELSE_NETWORK, so the query looks inside the cache first. In your case, the data you are looking for is stored in the variable scoreList.
Maybe it is difficult for you to see how your code works because you're using a callback (because of findInBackground()). Consider the following code:
ParseQuery<Person> personParseQuery = new ParseQuery<Person>(Person.class);
personParseQuery.setCachePolicy(ParseQuery.CachePolicy.CACHE_ELSE_NETWORK);
personParseQuery.addAscendingOrder("sort_order");
List<Person> personList = personParseQuery.find();
As you can see, the result of the query is returned by the find() method. From the Parse API documentation:
public List<T> find() throws ParseException -
Retrieves a list of ParseObjects that satisfy this query. Uses the network and/or the cache, depending on the cache policy.
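Since find() is synchronous and declares ParseException, a minimal sketch of calling it (off the main thread) might look like this; the personList variable name is just illustrative:
try {
    List<Person> personList = personParseQuery.find();
    // personList now holds the cached results, or fresh ones from the network
} catch (ParseException e) {
    // neither the cache nor the network could satisfy the query
    e.printStackTrace();
}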
The Person class may look like this:
@ParseClassName("Person")
public class Person extends ParseObject {

    public Person() {}

    public String getPersonName() {
        return getString("personName");
    }

    public void setPersonName(String personName) {
        put("personName", personName);
    }
}
And of course, don't forget to initialize Parse first and register the Person class:
Parse.initialize(this, "appID", "clientID");
ParseObject.registerSubclass(Person.class);
I hope my explanation can help you.
PS: You can see that the data is cached by looking inside the /data/data/<your application package name>/cache/com.parse folder on your emulator after executing the code.

Related

Apache Drill: Write general-purpose array_agg UDF

I would like to create an array_agg UDF for Apache Drill to be able to aggregate all values of a group to a list of values.
This should work with any major type (required, optional) and minor type (varchar, dict, map, int, etc.).
However, I get the impression that Apache Drill's UDF API does not really make use of inheritance and generics. Each type has its own writer and holder, and they cannot be abstracted to handle any type. E.g., the ValueHolder interface seems to be purely cosmetic and cannot be used to hook a UDF to any type in a type-agnostic way.
My current implementation
I tried to solve this by using Java reflection so I could use the ListHolder's write function independent of the holder of the original value.
However, I then ran into the limitations of the @FunctionTemplate annotation.
I cannot create a general UDF annotation for any value (I tried it with the ValueHolder interface: @Param ValueHolder input).
So it seems to me that the only way to support different types is to have a separate class for each type. But I can't even abstract much and work on any @Param input, because input is only visible in the class where it's defined (i.e. it is type specific).
I based my implementation on https://issues.apache.org/jira/browse/DRILL-6963
and created the following two classes for required and optional varchars (how can this be unified in the first place?):
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class VarChar_Agg implements DrillAggFunc {

    @Param org.apache.drill.exec.expr.holders.VarCharHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        listWriter.varChar().write(input);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class NullableVarChar_Agg implements DrillAggFunc {

    @Param NullableVarCharHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        if (input.isSet != 1) {
            return;
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        org.apache.drill.exec.expr.holders.VarCharHolder outHolder =
                new org.apache.drill.exec.expr.holders.VarCharHolder();
        outHolder.start = input.start;
        outHolder.end = input.end;
        outHolder.buffer = input.buffer;
        listWriter.varChar().write(outHolder);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
Interestingly, I can't import org.apache.drill.exec.vector.complex.writer.BaseWriter to make the whole thing easier, because then Apache Drill would not find it.
So I have to spell out the entire package path for everything in org.apache.drill.exec.vector.complex.writer in the code.
Furthermore, I'm using the deprecated ObjectHolder. Is there a better solution?
Anyway, these work so far, e.g. with this query:
SELECT
MIN(tbl.`timestamp`) AS start_view,
MAX(tbl.`timestamp`) AS end_view,
array_agg(tbl.eventLabel) AS label_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
WHERE tbl.data.slug IS NOT NULL
GROUP BY tbl.data.slug
However, when I use ORDER BY, I get this:
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: UnsupportedOperationException: NULL
Fragment 0:0
Additionally, I tried more complex types, namely maps/dicts.
Interestingly, when I call SELECT sqlTypeOf(tbl.data) FROM tbl, I get MAP.
But when I write UDFs, the query planner complains about having no UDF array_agg for type dict.
Anyway, I wrote a version for dicts:
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class Map_Agg implements DrillAggFunc {

    @Param MapHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        //listWriter.copyReader(input.reader);
        input.reader.copyAsValue(listWriter);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
@FunctionTemplate(
        name = "array_agg",
        scope = FunctionScope.POINT_AGGREGATE,
        nulls = NullHandling.INTERNAL
)
public static class Dict_agg implements DrillAggFunc {

    @Param DictHolder input;
    @Workspace ObjectHolder agg;
    @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;

    @Override
    public void setup() {
        agg = new ObjectHolder();
    }

    @Override
    public void reset() {
        agg = new ObjectHolder();
    }

    @Override
    public void add() {
        if (agg.obj == null) {
            // Initialise list object for output
            agg.obj = out.rootAsList();
        }
        org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter =
                (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
        //listWriter.copyReader(input.reader);
        input.reader.copyAsValue(listWriter);
    }

    @Override
    public void output() {
        ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList();
    }
}
But here, I get an empty list in the field data_agg for my query:
SELECT
MIN(tbl.`timestamp`) AS start_view,
MAX(tbl.`timestamp`) AS end_view,
array_agg(tbl.data) AS data_agg
FROM `dfs.root`.`/path/to/avro/folder` AS tbl
GROUP BY tbl.data.viewSlag
Summary of questions
Most importantly: How do I create an array_agg UDF for Apache Drill?
How do I make UDFs type-agnostic/general purpose? Do I really have to implement an entire class for each Nullable, Required and Repeated version of every type? That's a lot of work and quite tedious. Isn't there a way to handle values in a UDF agnostic of the underlying types?
I wish Apache Drill would just use what Java offers here: generic types, specialised overloading and inheritance within its own type system. Am I missing something on how to do that?
How can I fix the NULL problem when I use ORDER BY on my varchar version of the aggregate?
How can I fix the problem where my aggregate of maps/dicts is an empty list?
Is there an alternative to using the deprecated ObjectHolder?
To answer your question: unfortunately you've run into one of the limits of the Drill aggregate UDF API, which is that it can only return simple data types. It would be a great improvement to Drill to fix this, but that is the current status. If you're interested in discussing that further, please start a thread on the Drill user group and/or Slack channel. I don't think it is impossible, but it would require some modification to the Drill internals. IMHO it would be well worth it, because there are a few other UDFs I'd like to implement that need this feature.
The second part of your question is how to make UDFs type-agnostic, and once again... you've found yet another bit of ugliness in the UDF API. :-) If you do some digging in the codebase, you'll see that most of the math functions have versions that accept FLOAT, INT, etc.
Regarding the aggregate of null or empty lists, I actually have some good news here: the current way of handling that is to provide two versions of the function, one which accepts regular holders and a second which accepts nullable holders and returns an empty list or map if the inputs are null. Yes, this sucks, but the additional good news is that I'm working on cleaning this up and hopefully will have a PR submitted soon that will eliminate the need to do this.
Regarding the ObjectHolder: I wrote a median function that uses a few Stacks to compute a streaming median, and I used the ObjectHolder for that. I think it will be with us for some time, as there is no alternative at the moment.
I hope this answers your questions.

Iterate Fields of Fields continuously with reflection

Please avoid answers that are Kotlin-only or that require an Android API level higher than 21.
I'm trying to build an API parser that uses a class-hierarchy structure to represent the API hierarchy itself. With this structure I can parse the API in an uncomplicated fashion, and I have already achieved this, but I'd like to improve it further.
I'll begin explaining what I already have implemented.
This is an example URL that my app will receive via GET, parse and dispatch internally:
http://www.example.com/news/article/1105
In the app the base domain is irrelevant, but what comes after is the API structure.
In this case we have a mixture of commands and variables:
news (command)
article (command)
1105 (variable)
To establish what is a command and what is a variable I built the following class structures:
public class API {
    public static final class News extends AbstractNews {}
}

public class AbstractNews {
    public static final class Article extends AbstractArticle {}
}

public class AbstractArticle {
    public static void GET(String articleId) {
        // ...
    }
}
Then I iterate through each class after splitting the URL, matching each command to each class (or nested class), starting from the API class. Until I reach the end of the split URL, any matches that fail are stored in a separate list as variables.
The process is as follows for the example provided above:
Split the URL at each forward slash (ignoring the base domain):
/news/article/1105
List<String> stringList = Arrays.asList(
    "news",
    "article",
    "1105"
);
Iterate each item in the split list and match it against the API class structure (the following is just a sample; it is not 100% of what I currently have implemented):
List<String> variableList = new ArrayList<>();
Class lastClass = API.class;
for (String stringItem : stringList) {
    if ((lastClass = classHasSubClass(lastClass, stringItem)) != null) {
        continue;
    }
    variableList.add(stringItem);
}
Once the end of the list is reached, I check whether the last class contains the request method (in this case GET) and invoke it with the variable list.
Like I said before, this is working perfectly fine, but it leaves every class directly exposed, and as a result they can be accessed directly and incorrectly by anyone else working on the project, so I am trying to make the hierarchy more contained.
I want to keep the ability to access the methods via hierarchy as well, so the following can still be possible:
API.News.Article.GET(42334);
While at the same time I don't want it to be possible to do the following as well:
AbstractArticle.GET(42334);
I have tried making each subclass into a class instance field instead
public class API {
    // this one is static on purpose to avoid having to instantiate
    // the API class before accessing its fields
    public static final AbstractNews News = new AbstractNews();
}

public class AbstractNews {
    public final AbstractArticle Article = new AbstractArticle();
}

public class AbstractArticle {
    public void GET(String articleId) {
        // ...
    }
}
This works well for the two points I wanted to achieve before; however, I am not able to find a way to iterate the class fields that allows me to invoke the final methods correctly.
For the previous logic, all I needed to iterate was the following:
private static Class classHasSubClass(Class<?> currentClass, String fieldName) {
    Class[] classes = currentClass.getClasses();
    for (final Class classItem : classes) {
        if (classItem.getSimpleName().toLowerCase().equals(fieldName)) {
            return classItem;
        }
    }
    return null;
}
But for the second attempt with instance fields, I was not able to invoke the final method correctly, probably because the resulting logic was in fact trying to do the following:
AbstractArticle.GET(42334);
Instead of
API.News.Article.GET(42334);
I suspect it is because the first parameter of the invoke method can no longer be null, like it was before, and has to be the instance equivalent of API.News.Article in API.News.Article.GET(42334);
Is there a way to make this work or is there a better/different way of doing this?
I discovered that I was on the right path with the instance fields, but was missing part of the necessary information to invoke the method correctly at the end.
When iterating the fields I was only using the Class of each field, which worked perfectly fine before with the static class references since those weren't instances, but now I need the instance held by the field in order for it to work correctly.
In the end the iterating method used in place of classHasSubClass that got this to work is as follows:
private static Object getFieldClass(Class<?> currentClass, Object currentObject, final String fieldName) {
    Field[] fieldList = currentClass.getDeclaredFields();
    for (final Field field : fieldList) {
        if (field.getName().toLowerCase().equals(fieldName)) {
            try {
                return field.get(currentObject);
            } catch (IllegalAccessException e) {
                e.printStackTrace();
                break;
            }
        }
    }
    return null;
}
With this I always keep an object reference to the instance whose final method I want to invoke, so I can pass it as the first parameter (someMethod.invoke(objectInstance)) instead of null.
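For completeness, here is a minimal sketch of how the dispatch loop can look with getFieldClass; the dispatch method name and the GET signature are illustrative, not part of the original code (requires java.lang.reflect.Method, java.util.ArrayList, java.util.List):
private static void dispatch(List<String> stringList) throws Exception {
    List<String> variableList = new ArrayList<>();
    Class<?> currentClass = API.class;
    Object currentObject = null; // field.get(null) works for the static News field

    for (String stringItem : stringList) {
        Object next = getFieldClass(currentClass, currentObject, stringItem);
        if (next != null) {
            // matched a command: descend into the hierarchy
            currentObject = next;
            currentClass = next.getClass();
        } else {
            // no matching field: treat it as a variable
            variableList.add(stringItem);
        }
    }

    // invoke GET on the instance we ended up with, not on null
    Method getMethod = currentClass.getMethod("GET", String.class);
    getMethod.invoke(currentObject, variableList.get(0));
}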

What data structure or design pattern can I use to resolve this issue

I have the following design issue that I hope you can help me resolve.
Below is a simplified look at what the code looks like:
class DataProcessor {
    public List<Record> processData(DataFile file) {
        List<Record> recordsList = new ArrayList<Record>();
        for (Line line : file.getLines()) {
            String processedData = processData(line);
            recordsList.add(new Record(processedData));
        }
        return recordsList;
    }

    private String processData(String rawLine) {
        // code to process line
    }
}
class DatabaseManager {
    public void saveRecords(List<Record> recordsList) {
        // code to insert record objects in the database
    }
}
class Manager {
    public static void main(String[] args) {
        DatabaseManager dbManager = new DatabaseManager("e:\\databasefile.db");
        DataFile dataFile = new DataFile("e:\\hugeRawFile.csv");
        DataProcessor dataProcessor = new DataProcessor();
        dbManager.saveRecords(dataProcessor.processData(dataFile));
    }
}
As you can see, the processData method of the DataProcessor class takes a DataFile object, processes the whole file, creates a Record object for each line, and then returns a list of Record objects.
My problem with processData: when the raw file is really huge, the list of Record objects takes a lot of memory and sometimes the program fails. I need to change the current design so that memory usage is minimized. DataProcessor should not have direct access to DatabaseManager.
I was thinking of passing a queue to processData, where one thread runs processData to insert Record objects into the queue, while another thread removes Record objects from the queue and inserts them into the database. But I'm not sure about the performance implications of this.
Put the responsibility of driving the process into the most constrained resource (in your case the DataProcessor); this will make sure the constraints are best obeyed rather than pushed to the breaking point.
Note: don't even think of multithreading; it is not going to do you any good for processing files. Threads are a solution when your data comes over the wire and you don't know when the next data chunk is going to arrive, and perhaps you have better things to do with your CPU time than to wait "until the cows come home" (grin). But with files? You know the job has a start and an end, so get on with it as fast as possible.
class DataProcessor {
    public List<Record> processData(DataFile file) {
        List<Record> recordsList = new ArrayList<Record>();
        for (Line line : file.getLines()) {
            String processedData = processData(line);
            recordsList.add(new Record(processedData));
        }
        return recordsList;
    }

    private String processData(String rawLine) {
        // code to process line
    }

    public void processAndSaveData(DataFile dataFile, DatabaseManager db) {
        int maxBuffSize = 1024;
        ArrayList<Record> buff = new ArrayList<Record>(maxBuffSize);
        for (Line line : dataFile.getLines()) {
            String processedData = processData(line);
            buff.add(new Record(processedData));
            if (buff.size() == maxBuffSize) {
                db.saveRecords(buff);
                buff.clear();
            }
        }
        // some records may still be unsaved here, fewer than maxBuffSize
        if (buff.size() > 0) {
            db.saveRecords(buff);
            // help the GC, let it recycle the records without
            // needing to ask "is buff still reachable?"
            buff.clear();
        }
    }
}
class Manager {
    public static void main(String[] args) {
        DatabaseManager dbManager = new DatabaseManager("e:\\databasefile.db");
        DataFile dataFile = new DataFile("e:\\hugeRawFile.csv");
        DataProcessor dataProcessor = new DataProcessor();
        // So... do we need another stupid manager to tell us what to do?
        // dbManager.saveRecords(dataProcessor.processData(dataFile));
        // Hell, no, the most constrained resource knows better
        // how to deal with the job!
        dataProcessor.processAndSaveData(dataFile, dbManager);
    }
}
[edit] Addressing the "but we settled on what and how, and now you are coming to tell us we need to write extra code?" objection:
Build an AbstractProcessor class and ask your mates just to derive from it.
abstract class AbstractProcessor {
    // sorry, these need to be protected so the base class can call them
    abstract protected Record processData(String rawLine);
    abstract protected Class<? extends Record> getRecordClass();

    public void processAndSaveData(DataFile dataFile, DatabaseManager db) {
        Class<? extends Record> recordType = this.getRecordClass();
        if (recordType.equals(MyRecord1.class)) {
            // buffered read and save MyRecord1 types specifically
        }
        else if (recordType.equals(YourRecord.class)) {
            // buffered read and save YourRecord types specifically
        }
        // etc...
    }
}
Now, all they need to do is extend AbstractProcessor, make their processData(String) protected, and write a trivial method declaring their record type (it may as well be an enum). It's not a huge effort to ask of them, and it turns what would have been a costly (or even impossible, for a TB input file) operation into an "as fast as possible" one.
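For illustration, a concrete processor under that scheme could be as small as this (MyRecord1 and the per-line parsing are assumptions, not code from the question):
class MyRecord1Processor extends AbstractProcessor {

    @Override
    protected Record processData(String rawLine) {
        // whatever per-line parsing this concrete format needs
        return new MyRecord1(rawLine.trim());
    }

    @Override
    protected Class<? extends Record> getRecordClass() {
        return MyRecord1.class;
    }
}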
You should be able to use streaming to do this in one thread, one record at a time in memory. The implementation depends on the technology your DatabaseManager is using.
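As a rough sketch of that idea with Java 8 streams, reading the CSV lazily and assuming a hypothetical DatabaseManager.saveRecord(Record) that persists (or internally batches) one record at a time:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

class StreamingManager {
    public static void main(String[] args) throws IOException {
        DatabaseManager dbManager = new DatabaseManager("e:\\databasefile.db");
        try (Stream<String> lines = Files.lines(Paths.get("e:\\hugeRawFile.csv"))) {
            lines.map(line -> new Record(processData(line)))
                 .forEach(dbManager::saveRecord);   // nothing accumulates in memory
        }
    }

    private static String processData(String rawLine) {
        // code to process line
        return rawLine;
    }
}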

Filtering Lucene query results based on external data

I need to apply a custom filter to make sure that the user who executed the search has permission to view the documents returned by the searcher. I have extended SimpleCollector. However, the Javadocs recommend against using IndexSearcher and IndexReader in it:
Note: This is called in an inner search loop. For good search performance, implementations of this method should not call IndexSearcher.doc(int) or org.apache.lucene.index.IndexReader.document(int) on every hit. Doing so can slow searches by an order of magnitude or more.
I need to fetch the documents to get the id term and check it against data held in the DB. Is there another way to filter the results other than using a collector?
Is there a more efficient way of obtaining the documents without calling IndexSearcher?
My code is currently the following:
public class PermittedResultsCollector extends SimpleCollector {

    private IndexSearcher searcher;

    public PermittedResultsCollector(IndexSearcher searcher) {
        this.searcher = searcher;
    }

    public boolean needsScores() {
        return false;
    }

    @Override
    public void collect(int doc) throws IOException {
        Document document = searcher.doc(doc);
        if (callToExternalService(document.get("id"))) {
            throw new CollectionTerminatedException();
        }
    }

    public IndexSearcher getSearcher() {
        return searcher;
    }
}
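One common way to avoid IndexSearcher.doc(int) per hit is to store the id as a doc values field and read it per segment inside the collector. A rough sketch, assuming Lucene 7+ and that "id" was indexed as a SortedDocValuesField (not the original indexing scheme, which isn't shown):
public class PermittedResultsCollector extends SimpleCollector {

    private SortedDocValues idValues;

    @Override
    protected void doSetNextReader(LeafReaderContext context) throws IOException {
        // columnar per-segment access, much cheaper than loading stored fields per hit
        idValues = DocValues.getSorted(context.reader(), "id");
    }

    @Override
    public void collect(int doc) throws IOException {
        if (idValues.advanceExact(doc)) {
            String id = idValues.binaryValue().utf8ToString();
            // check the id against the external permission data here
        }
    }

    @Override
    public boolean needsScores() {
        return false;
    }
}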

GAE, JDO, count() doesn't work?

On GAE with Spring/JDO, after saving 2 entities (in a transaction):
calling getById fetches the entities from the data store;
calling getCount() returns "0";
and calling getAll() returns an empty collection.
@Override
public Long getCount() {
    return ((Integer) getJdoTemplate().execute(new JdoCallback() {
        @Override
        public Object doInJdo(PersistenceManager pm) throws JDOException {
            Query q = pm.newQuery(getPersistentClass());
            q.setResult("count(this)");
            return q.execute();
        }
    })).longValue();
}

@Override
public void saveOrUpdate(T entity) {
    getJdoTemplate().makePersistent(entity);
}

@Override
public List<T> getAll() {
    return new ArrayList<T>(getJdoTemplate().find(getPersistentClass()));
}
Google's implementation of JDO does not currently support aggregates AFAIK. Try keeping track of the count by updating some other entity every time you persist a new entity. If you are doing frequent writes, you'll want a "sharded" counter.
Your question is pretty close to this one, so reading those answers may help.
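As an illustration of the counter-entity idea (the class and field names are made up, not from your code), a single counter entity could look like the sketch below; for frequent writes you would split it into several shards and sum them on read:
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

// Bump this in the same transaction that persists a new entity,
// then have getCount() read the stored value instead of running count(this).
@PersistenceCapable
public class EntityCounter {

    @PrimaryKey
    @Persistent
    private String kind;   // e.g. the simple name of the persistent class

    @Persistent
    private long count;

    public EntityCounter(String kind) {
        this.kind = kind;
    }

    public void increment() {
        count++;
    }

    public long getCount() {
        return count;
    }
}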
count() is actually implemented in GAE/J's plugin, as can be seen here:
http://code.google.com/p/datanucleus-appengine/source/browse/trunk/src/org/datanucleus/store/appengine/query/DatastoreQuery.java#341
If you have a problem with it, then I suggest you provide a test case to Google and raise an issue on the issue tracker for their GAE/J DataNucleus plugin ("Issues" on the linked page).
