How to expand code/description to a complex object? - java

I want to present a list of the names/basic attributes of some complex objects (i.e. they are comprised of multiple collections of other objects) in a recycler view, then get the full object on user selection. For example, the top level objects are "Play Scripts", and each contains a number of "Spoken Lines" spoken by one of the "Actors" associated with the Play Script.
I'm trying to use the Android Architecture components to do this and have (using Florian's tutorials at codinginflow.com) successfully used Room to create a simplified Play_Script class, DAO and Repository. I've also created some basic REST web services in ASP.Net which can serve up data from a MySQL db.
It strikes me that the path that I am going down will perform poorly and use excessive network bandwidth getting lots of data that I won't use. I'm getting every Play Script (including its Spoken Lines etc) just so that I have the Play Script "Name" and "Description" attributes to populate the Recycler.
In the olden days, I'd just "SELECT ID, Name, Description FROM Play_Script" and once the user had made their choice, I'd use the ID as the key to get everything else that I needed. I suspect that I'm missing something fundamental in the design of my data entities but can't come up with any keywords that would let me search for examples of this common sort of task being done well (/at all).
Please can you help this SO noob with his 1st question?
Cheers,
Z
Update 15 May:
Though I haven't had a response, from what I've been reading in recent weeks (e.g. re Dependency Injection) I suspect that there is no blanket approach for this sort of thing in Android development. It appears that people generally either retrieve extensive data and then use what they require or else build multiple Web Service APIs to return sparse data that includes keys that the client can use to expand when required. So, for example you might make both a "plays_light" and a "plays_detail" Get API.

My solution has been exactly as my May update - i.e. to extend the web API and offer a number of similar calls that return varying granularities of information. It's not particularly elegant and I suspect there may be better ways but it works. In general, I'm finding that the user tends to need less detail in the parent entities and more as we get to individual children/grandchildren.
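For anyone in the same position, the "light" API call can also be mirrored on the Room side, so the local cache never loads the children at all: Room can map a SELECT of a few columns onto a slim POJO. A minimal sketch, assuming hypothetical entity and field names rather than the actual code from my app:

```java
// Slim POJO holding only the columns the RecyclerView needs
public class PlayScriptSummary {
    public int id;
    public String name;
    public String description;
}

@Dao
public interface PlayScriptDao {
    // Room verifies at compile time that the selected columns map onto the POJO
    @Query("SELECT id, name, description FROM play_script")
    LiveData<List<PlayScriptSummary>> getSummaries();

    // The full object (children loaded via @Relation) is fetched only on selection
    @Transaction
    @Query("SELECT * FROM play_script WHERE id = :id")
    LiveData<PlayScriptWithDetails> getFull(int id);
}
```

Here `PlayScriptWithDetails` would be a @Relation-annotated wrapper class; the point is that the summary query never touches the Spoken Lines tables.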
I do now realise why some apps are so slow, though: it's easy to be lazy in the web service design and just return loads of data - only a fraction of which will be used by the client - and justify this by convincing yourself that a single API will be universally applicable and thus easier for whoever picks up the code down the line to understand.
Again, it could be my inexperience, but I find the local caching of relational data on the Android side retrieved through the API calls quite clunky - lots of storing foreign keys and then re-parsing JSON to get data into the SQLite tables. I'd hoped Dagger would be more useful in simplifying this than it has turned out to be so far. I actually unravelled a whole load of Dagger-related code just to preserve my sanity. Not sure I was entirely successful!
Better answers are still very much welcome.
Z

Related

Sorted Array vs Hashtable: Which data structure would be more efficient in searching over a range of dates in a calendar app?

I have about a year of experience in coding in Java. To hone my skills I'm trying to write a Calendar/journal entry desktop app in Java. I've realized that I still have no experience in data persistence and still don't really understand what the data persistence options would be for this program -- So perhaps I'm jumping the gun, and the design choices that I'm hoping to implement aren't even applicable once I get into the nitty gritty.
I mainly want to write a calendar app that allows you to log daily journal entries with associated activity logs for time spent on daily tasks. In terms of adding, editing and viewing the journal entries, using a hash table with the dates of the entries as keys and the entries themselves as the values seems most Big-Oh efficient (O(1) average case for each using a hash table).
However, I'm also hoping to implement a feature that could, given a certain range of dates, provide a simple analysis of average amount of time spent on certain tasks per day. If this is one of the main features I'm interested in, am I wrong in thinking that perhaps a sorted array would be more Big-Oh efficient? Especially considering that the data entries are generally expected to already be added date by date.
Or perhaps there's another option I'm unaware of?
The reason I'm asking is because of the answer provided by this following question: Why not use hashing/hash tables for everything?
And the reason I'm unsure if I'm even asking the right question is because of the answer to the following question: What's the best data structure for a calendar / day planner?
If so, I would really appreciate being directed to other resources on data persistence in Java.
Thank you for the help!
Use a NavigableMap interface (implemented by TreeMap, a red-black tree).
This allows you to easily and efficiently select date ranges and traverse over events in key order.
As an aside, if you consider time or date intervals to be "half-open" it will make many problems easier. That is, when selecting events, include the lower bound in results, but exclude the upper. The methods of NavigableMap, like subMap(), are designed to work this way, and it's a good practice when you are working with intervals of any quantity, as it's easy to define a sequence of intervals without overlap or gaps.
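A minimal sketch of the half-open pattern with `TreeMap` (the entry contents are made up):

```java
import java.time.LocalDate;
import java.util.NavigableMap;
import java.util.TreeMap;

public class JournalIndex {
    // Half-open range [start, start + days): lower bound included, upper excluded
    public static NavigableMap<LocalDate, String> range(
            NavigableMap<LocalDate, String> entries, LocalDate start, int days) {
        return entries.subMap(start, true, start.plusDays(days), false);
    }

    public static void main(String[] args) {
        NavigableMap<LocalDate, String> entries = new TreeMap<>();
        entries.put(LocalDate.of(2023, 5, 1), "Gym: 1h");
        entries.put(LocalDate.of(2023, 5, 3), "Writing: 2h");
        entries.put(LocalDate.of(2023, 5, 8), "Gym: 1h");

        // The entry on May 8 is excluded: the week is [May 1, May 8)
        System.out.println(range(entries, LocalDate.of(2023, 5, 1), 7).size()); // prints 2
    }
}
```

Because the upper bound of one week is exactly the lower bound of the next, consecutive weeks tile the calendar with no overlap and no gaps.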
It depends on how serious you want your project to be. In all cases, be careful of premature optimization. This is when you try too hard to make your code "efficient" and sacrifice readability/maintainability in the process. For example, there is likely a way of doing manual memory management with native code to make a more efficient implementation of a data structure for your calendar, but it likely does not outweigh the benefits of using familiar APIs etc. It might, but you only know once you run your code.
1. Write readable code
2. Run it, and test for performance issues
3. Use a profiler (e.g. JProfiler) to identify the code responsible for poor performance
4. Optimise that code
5. Repeat
For code that will "work", but will not be very scalable, a simple List will usually do fine. You can use JSON to store your objects, and a library such as Jackson Databind to map between your List and JSON. You could then simply save it to a file for persistence.
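The List-plus-JSON option might look something like this with Jackson Databind, where `Entry` stands in for whatever journal-entry POJO you define:

```java
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.File;
import java.util.List;

public class JsonStore {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void save(List<Entry> entries, File file) throws Exception {
        MAPPER.writerWithDefaultPrettyPrinter().writeValue(file, entries);
    }

    public static List<Entry> load(File file) throws Exception {
        // TypeReference preserves the generic element type when deserialising
        return MAPPER.readValue(file, new TypeReference<List<Entry>>() {});
    }
}
```

Note that if `Entry` uses java.time types like LocalDate, you'll also need the jackson-datatype-jsr310 module registered on the ObjectMapper.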
For an application that you want to be more robust and protected against data corruption, a database is probably better. With this, you can guarantee that, for example, data is not partially written, concurrent access to the same data will not result in corruption, and a whole host of other benefits. However, you will need to have a database server running alongside your application. You can use JDBC and suitable drivers for your database vendor (e.g. MySQL) to connect to, read from and write to the database.
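A sketch of the JDBC route (the table, column names and credentials are made up, and the MySQL Connector/J driver would need to be on the classpath):

```java
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.time.LocalDate;

public class DbStore {
    // Placeholder URL/credentials
    private static final String URL = "jdbc:mysql://localhost:3306/journal";

    public static void insertEntry(LocalDate date, String body) throws SQLException {
        try (Connection conn = DriverManager.getConnection(URL, "user", "secret");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO entries (entry_date, body) VALUES (?, ?)")) {
            ps.setDate(1, Date.valueOf(date));
            ps.setString(2, body);
            ps.executeUpdate(); // try-with-resources closes the connection either way
        }
    }
}
```

Using a PreparedStatement rather than string concatenation also protects you against SQL injection from the start.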
For a serious application, you will probably want to create an API for your persistence. A framework like Spring is very helpful for this, as it allows you to declare REST endpoints using annotations, and introduces useful programming concepts, such as containers, IoC/Dependency Injection, Testing (unit tests and integration tests), JPA/ORM systems and more.
Like I say, this is all context dependent, but above all else, avoid premature optimization.
This thread might give you some ideas about what data structures to use for range queries.
Data structure for range query
It might even be easier to use a database and an API to query for the desired range.
If you are using (or are able to use) Guava, you might consider using RangeMap (*).
This would allow you to use, say, a RangeMap<Instant, Event>, which you could then query to say "what event is occurring at time T".
One drawback is that you wouldn't be able to model concurrent events (e.g. when you are double-booked in two meetings).
(*) I work for Google, Guava is Google's open-sourced Java library. This is the library I would use, but others with similar range map offerings are available.

What action should a client application take after executing a command?

Background
This question is best illustrated using an example. Say I have a client application (e.g. desktop application, mobile app, etc.) that consumes information from a web service. One of the screens has a list of products that are queried from the web service when the client application starts up and are bound to the UI element. Now, the user creates a new product. This causes the client application to send a command to the web service to add that product to a database.
Question
In the client application, what should happen after the command is issued and is successful? Do you:
Query the full product list from the service and refresh the entire product list in the client application?
Query just the newly added product(s) and add them to the product list?
Don't query, and instead just use the information available in the client application to create the new products in the GUI, and then add them to the list?
The same questions apply to update too. If you update a product, do you get confirmation of a successful update on the service, and then just let the GUI update the product without further requests to the service?
Edit - Additional details added
From initial feedback, the takeaway appears to be: go with the simplest approach unless it:
Leads to performance concerns
Negatively impacts user experience
There is a major/significant portion of my application where the main way to interact with the application is to drag grid records between a number of different grids. For example, dragging a product onto another grid would create a new order, which would need to be sent to the service. Some of these grids are more complex than your standard grid. Records can be grouped, and each group can be collapsed/expanded (see here). In this case, while the grid can be refreshed from the service very quickly, this would probably lead to usability concerns. When a grid is refreshed with all new data, if the user had any groups expanded/collapsed, this would be lost.
So, while most grids in my application could probably just all be refreshed at once, the more complex ones will need to be updated more carefully. I would think this would lend to option 1 or 2 (at least for creating new records). One thought I had was that the client application could create GUIDs for new records to be sent with the application. That way, no follow-up query would need to be made to the service, as the client application would already have the unique ID. Then, the client application would just wait for a successful response from the service prior to showing the user the new record.
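The client-generated-ID idea at the end is cheap to implement; a sketch:

```java
import java.util.UUID;

public class ClientIds {
    // The client mints the unique key itself, so no follow-up query is needed
    // to learn a server-generated ID; the record is only shown in the grid
    // once the service acknowledges the save.
    public static String newRecordId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        System.out.println("POST /orders with id " + newRecordId());
    }
}
```

The service would then have to accept (and uniqueness-check) client-supplied keys rather than generating them in the database.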
Get the whole list
I guess it depends how costly the request/response are. If possible and efficient, I would always choose your first option (get the whole list) until there is a performance concern.
As the saying goes:
The First Rule of Program Optimization: Don't do it.
The Second Rule of Program Optimization – For experts only: Don't do it yet.
There are simply fewer scenarios to cover, less code to write, and less code to maintain, since you'll need the "get the whole list" service no matter what.
It also returns the "most up to date list of products" in case another client added products simultaneously.
Only pros, until there is a performance concern, in my opinion. These last 3 words would imply that this question will only lead to opinions and should be closed...
I don't think there's any definitive right answer; these kinds of questions need to be thought of on a case by case basis. #3 by itself is often not an option - for example, if you need the client to have a database-generated field like an ID, it's gotta get from point A to point B somehow. You also need to think about how you're exposing any errors to your user, because it's a terrible experience if you make it appear that everything succeeded, but you actually had an error and the product didn't really save.
Beyond that, I'd look at usability as my next criteria. What's the experience like for your users if you refresh the list versus adding just a couple of products? Is there a significant difference? A lot comes down to your specific application, and also the workflow being done. If adding products is something that is the main part of someone's job, where they may spend hours a day doing this, shaving even a second off the time is a real win for your users, while if it's an uncommon workflow that people do from time to time, the performance expectations are somewhat lower.
And last I'd look at code maintenance and complexity. If two paths are giving relatively similar experiences, pick the one that's easier to build and maintain.
There are other options, too. You can go with a hybrid approach - for example, maybe on the client you add the data to the product list immediately (perhaps showing some kind of "saving" indicator), while also asynchronously querying the database so you can refresh the product listing and report any errors. Such approaches tend to be the most complex, but you might go down that route if usability demands it.

How to automate the retrieving process from a website

Here is a biological database, http://www.genecards.org/index.php?path=/GeneDecks
Usually, if I type in a gene name (string) (ex. TF53) and submit it, it comes back with a result on the webpage. Users can also choose to save the result as a tab-delimited/XML file. However, I have a list containing more than a thousand gene names. How can I automate this series of steps with a Java program?
I know this question can be quite broad and there are probably various ways to do it. With only a little experience in Java programming, I would really appreciate it if someone could suggest an easier way to do it. Thanks.
One possibility is to read gene names sequentially from your list and, for each one, send a request like this:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/<your gene name>/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
(so basically mimic what the site does).
For example:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/TNFRSF10B/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
However, they might not like people using their site in such a way (submitting a lot of automated requests), so you might want to check their policy on that. Another thing to check is whether they have an official API that can be used for batch retrieval of gene information.
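A sketch of the URL-building part in plain Java (the query suffix is copied verbatim from the example request above; actually sending the requests, e.g. with `java.net.http.HttpClient`, should wait until you've checked the site's policy):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class GeneQuery {
    private static final String BASE =
            "http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/";
    // Copied from the example request shown above
    private static final String SUFFIX = "/100/{%22Sequence_Paralogs%22:%221%22,"
            + "%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,"
            + "%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,"
            + "%22Compounds%22:%221%22,%22Disorders%22:%221%22,"
            + "%22Gene_Ontologies%22:%221%22}";

    public static String urlFor(String gene) {
        return BASE + URLEncoder.encode(gene, StandardCharsets.UTF_8) + SUFFIX;
    }

    public static void main(String[] args) {
        // One request per gene name read from your list
        for (String gene : List.of("TF53", "TNFRSF10B")) {
            System.out.println(urlFor(gene));
            // e.g. HttpClient.newHttpClient().send(...) and save the body per gene
        }
    }
}
```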

What is a good framework to implement data transformation rules through UI

Let me describe the problem. A lot of suppliers send us data files in various formats (with various headers). We do not have any control on the data format (what columns the suppliers send us). Then this data needs to be converted to our standard transactions (this standard is constant and defined by us).
The challenge here is that we do not have any control on what columns suppliers send us in their files. The destination standard is constant. Now I have been asked to develop a framework through which the end users can define their own data transformation rules through UI. (say field A in destination transaction is equal to columnX+columnY or first 3 characters of columnZ from input file). There will be many such data transformation rules.
The goal is that users should be able to add all these supplier files and convert their data to our standard from the front-end UI with minimal code change. Please suggest some frameworks for this (preferably Java-based).
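The rule idea in the question can be sketched in plain Java by treating each input row as a map of column name to value; the column names below (columnX etc.) are the hypothetical ones from the question:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class RowTransformer {
    // A rule computes one destination field from an input row
    interface Rule extends Function<Map<String, String>, String> {}

    private final Map<String, Rule> rules = new LinkedHashMap<>();

    public RowTransformer define(String destField, Rule rule) {
        rules.put(destField, rule);
        return this;
    }

    public Map<String, String> apply(Map<String, String> row) {
        Map<String, String> out = new LinkedHashMap<>();
        rules.forEach((field, rule) -> out.put(field, rule.apply(row)));
        return out;
    }

    public static void main(String[] args) {
        // "field A = columnX + columnY", "field B = first 3 chars of columnZ"
        RowTransformer t = new RowTransformer()
                .define("A", row -> row.get("columnX") + row.get("columnY"))
                .define("B", row -> row.get("columnZ").substring(0, 3));

        Map<String, String> row = new HashMap<>();
        row.put("columnX", "foo");
        row.put("columnY", "bar");
        row.put("columnZ", "abcdef");
        System.out.println(t.apply(row)); // prints {A=foobar, B=abc}
    }
}
```

A UI-driven framework would build these rules from user input (parsed expressions) rather than hard-coded lambdas, but the transformation core stays the same.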
Worked in a similar field before. Not sure I would trust customers/suppliers to use such a tool correctly and design 100% bulletproof transformations. Mapping columns is one thing, but what about formatting problems in dates, monetary values and the like? You'd probably need to manually check their creations anyway, or you'll end up with some really nasty data consistency issues. Errors caused by faulty data transformation are little beasts hiding in the dark and jumping at you when you need them the least.
If all you need is a relatively simple, graphical way to design data conversions, check out something like Talend Open Studio (just google it). It calls itself an ETL tool, but we used it for all kinds of stuff.

Is tagging a form of data mining?

I am implementing a small CRM system, and the concept of data mining to predict and find opportunities and trends is essential for such systems. One data mining approach is clustering. This is a very small CRM project, using Java to provide the interface for information retrieval from the database.
My question is that when I insert a customer into database, I have a text field which allows customers to be tagged on their way into the database i.e. registration point.
Would you regard tagging technique as clustering? If so, is this a data mining technique?
I am sure there are complex APIs, such as the Java Data Mining API, which allow data mining. But for the sake of my project I just wanted to know whether tagging users with keywords - the way Stack Overflow allows tagging of keywords when posting a question - is a form of data mining, since through those tagged words one can find trends and patterns easily through searching.
To make it short, yes, tags are additional information that will make data mining easier to conduct later on.
They probably won't be enough though. Tags are linked to entities and, depending on how you compute them, they might not show interesting relations between different entities. With your tagging system, the only usable relation I see is 'has same tag', and it might not be enough.
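For illustration, that single relation can be materialised as an inverted index from tag to customers (all names here are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TagIndex {
    record Customer(String name, List<String> tags) {}

    // Invert customer->tags into tag->customers: the "has same tag" relation
    public static Map<String, List<String>> byTag(List<Customer> customers) {
        return customers.stream()
                .flatMap(c -> c.tags().stream().map(t -> Map.entry(t, c.name())))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        var groups = byTag(List.of(
                new Customer("Ann", List.of("vip", "retail")),
                new Customer("Bob", List.of("vip"))));
        System.out.println(groups.get("vip")); // prints [Ann, Bob]
    }
}
```

Every group here is one you chose in advance by assigning the tag, which is exactly the limitation the answer below describes.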
Clustering your data can be done using community detection techniques on graphs built using your data and relations between entities.
This example is in Python and uses the networkx library but it might give you an idea of what I'm talking about: http://perso.crans.org/aynaud/communities/
Yes, tagging is definitely one way of grouping your users. However, it’s different than ‘clustering.’ Here’s why: you’re making a conscious decision on how you want to group them, but there may be better/ different user groups based on ranging behaviors that may not be obvious to you.
Clustering methods are unsupervised learning methods that can help you uncover these patterns. These methods are “unsupervised” because you don’t have a specific target variable; rather, you want to find groups/ patterns that are most prominent in the data. You can feed CRM data to clustering algorithms to uncover ‘hidden’ relationships.
Also, if you're using 'tagging', it's more of a descriptive analytics problem: you have well-defined groups in the data, and you're identifying their behavior. Clustering would be a predictive analytics problem: algorithms will try to predict groups based on the user behavior they recognize in the data.
