How to automate the retrieving process from a website

Here is a biological database: http://www.genecards.org/index.php?path=/GeneDecks
Usually, if I type in a gene name (a string, e.g. TF53) and submit it, the result comes back on the web page. Users can also choose to save it as a tab-delimited or XML file. However, I have a list containing several thousand gene names. How can I automate this series of steps with a Java program?
I know this question is quite broad and there are probably various ways to do it. With only a little experience in Java programming, I would really appreciate it if someone could suggest an easy way to do it. Thanks.

One possibility is to read the gene names sequentially from your list and, for each one, send a request like this:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/<your gene name>/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
(so basically mimic what the site does).
For example:
http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/TNFRSF10B/100/{%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22}
However, they might not like people using their site in this way (submitting a lot of automated requests), so you might want to check their policy on that. Another thing to check is whether they have an official API that can be used for batch retrieval of gene information.
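A minimal Java sketch of this approach, assuming Java 11 or newer (for java.net.http.HttpClient). The class name, the input file with one gene name per line, the output directory, and the two-second pause between requests are all assumptions made for illustration, not anything the site specifies. The braces of the option block are written as %7B and %7D because java.net.URI rejects raw braces; this is assumed to be equivalent as far as the server is concerned.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class GeneDecksFetcher {
    // URL prefix and option block copied from the example request above.
    static final String BASE =
        "http://www.genecards.org/index.php?path=/GeneDecks/ParalogHunter/";
    static final String OPTIONS =
        "%7B%22Sequence_Paralogs%22:%221%22,%22Domains%22:%221%22,"
        + "%22Super_Pathways%22:%221%22,%22Expression_Patterns%22:%221%22,"
        + "%22Phenotypes%22:%221%22,%22Compounds%22:%221%22,"
        + "%22Disorders%22:%221%22,%22Gene_Ontologies%22:%221%22%7D";

    /** Builds the request URI for one gene name. */
    static URI buildUri(String geneName) {
        String encoded = URLEncoder.encode(geneName.trim(), StandardCharsets.UTF_8);
        return URI.create(BASE + encoded + "/100/" + OPTIONS);
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: GeneDecksFetcher <gene-list.txt> <output-dir>");
            return;
        }
        List<String> genes = Files.readAllLines(Path.of(args[0]));
        Path outDir = Files.createDirectories(Path.of(args[1]));
        HttpClient client = HttpClient.newHttpClient();

        for (String gene : genes) {
            if (gene.isBlank()) {
                continue; // skip empty lines in the list
            }
            HttpRequest request = HttpRequest.newBuilder(buildUri(gene)).GET().build();
            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

            // Save the raw response body, one file per gene.
            String fileName = URLEncoder.encode(gene.trim(), StandardCharsets.UTF_8) + ".html";
            Files.writeString(outDir.resolve(fileName), response.body());

            Thread.sleep(2000); // assumed polite delay between requests
        }
    }
}
```

Run without arguments, the program only prints a usage line and sends nothing. The saved pages would still need to be parsed to pull out the fields of interest.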

Related

How to expand code/description to a complex object?

I want to present a list of the names/basic attributes of some complex objects (i.e. they are comprised of multiple collections of other objects) in a recycler view, then get the full object on user selection. For example, the top level objects are "Play Scripts", and each contains a number of "Spoken Lines" spoken by one of the "Actors" associated with the Play Script.
I'm trying to use the Android Architecture Components to do this and have (using Florian @ codinginflow.com's tutorials) successfully used Room to create a simplified Play_Script class, DAO and Repository. I've also created some basic REST web services in ASP.NET which can serve up data from a MySQL db.
It strikes me that the path that I am going down will perform poorly and use excessive network bandwidth getting lots of data that I won't use. I'm getting every Play Script (including its Spoken Lines etc) just so that I have the Play Script "Name" and "Description" attributes to populate the Recycler.
In the olden days, I'd just "SELECT ID, Name, Description FROM Play_Script" and once the user had made their choice, I'd use the ID as the key to get everything else that I needed. I suspect that I'm missing something fundamental in the design of my data entities but can't come up with any keywords that would let me search for examples of this common sort of task being done well (/at all).
Please can you help this SO noob with his 1st question?
Cheers,
Z
Update 15 May:
Though I haven't had a response, from what I've been reading in recent weeks (e.g. re Dependency Injection) I suspect that there is no blanket approach for this sort of thing in Android development. It appears that people generally either retrieve extensive data and then use what they require or else build multiple Web Service APIs to return sparse data that includes keys that the client can use to expand when required. So, for example you might make both a "plays_light" and a "plays_detail" Get API.

My solution has been exactly as described in my May update, i.e. to extend the web API and offer a number of similar calls that return varying granularities of information. It's not particularly elegant and I suspect there may be better ways, but it works. In general, I'm finding that the user tends to need less detail in the parent entities and more as we get to individual children/grandchildren.
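The shape of those two calls can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical, and an in-memory map stands in for the MySQL database and web service.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PlayScriptApiSketch {
    /** Sparse row for the list screen: the ID, Name and Description columns only. */
    static final class PlaySummary {
        final int id;
        final String name;
        final String description;

        PlaySummary(int id, String name, String description) {
            this.id = id;
            this.name = name;
            this.description = description;
        }
    }

    /** Full object, fetched only after the user has picked a play. */
    static final class PlayDetail {
        final PlaySummary summary;
        final List<String> spokenLines;

        PlayDetail(PlaySummary summary, List<String> spokenLines) {
            this.summary = summary;
            this.spokenLines = spokenLines;
        }
    }

    // Stand-in for the server-side database, keyed by play ID.
    private final Map<Integer, PlayDetail> store = new LinkedHashMap<>();

    void add(int id, String name, String description, List<String> spokenLines) {
        store.put(id, new PlayDetail(new PlaySummary(id, name, description), spokenLines));
    }

    /** "plays_light": summaries only, no child collections. */
    List<PlaySummary> playsLight() {
        List<PlaySummary> out = new ArrayList<>();
        for (PlayDetail detail : store.values()) {
            out.add(detail.summary);
        }
        return out;
    }

    /** "plays_detail": everything for one play, or null if the ID is unknown. */
    PlayDetail playsDetail(int id) {
        return store.get(id);
    }

    public static void main(String[] args) {
        PlayScriptApiSketch api = new PlayScriptApiSketch();
        api.add(1, "Hamlet", "A tragedy", List.of("Who's there?"));
        for (PlaySummary s : api.playsLight()) {
            System.out.println(s.id + ": " + s.name); // what the list screen needs
        }
        System.out.println(api.playsDetail(1).spokenLines); // loaded on selection
    }
}
```

The list screen only ever sees PlaySummary objects, and the ID in each summary is the key used to expand to the full object when the user selects one.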
I do now realise why some apps are so slow, though: it's easy to be lazy in the web service design and just return loads of data, only a fragment of which will be used by the client, and to justify this by convincing yourself that a single API will be universally applicable and thus easier to understand for whoever picks up your code down the line.
Again, it could be my inexperience, but I find the local caching of relational data retrieved through the API calls quite clunky on the Android side: lots of storing foreign keys and then re-parsing JSON to get data into the SQLite tables. I'd hoped Dagger would be more useful in simplifying this than it has turned out to be so far. I actually unravelled a whole load of Dagger-related code just to preserve my sanity. Not sure I was entirely successful!
Better answers are still very much welcome.
Z

Which is the better way to extract the account number and balance from an SMS body?

I am planning a task that reads all the bank-related SMS messages from the user's Android inbox and extracts their account number and balance. I am guessing this could be done in two ways:
1.) Use a regex to extract the data from the SMS body, as described in the linked answer. This certainly has the advantage of giving a generic representation of any bank balance message.
2.) Store a template message for every bank in the database and compare it with the SMS that was read to extract the data.
I would like to know which path is more efficient, or whether there is another way to do it.

The two approaches have different qualities:
Option 1 might lead to many different, complex regular expressions. Just glancing at the answer you linked made my head spin. In other words, maintaining such a list of regular expressions will not be an easy undertaking from the developer's perspective.
For option 2, you of course have to keep track of your collection of "templates", but once your infrastructure is in place, the only work required is adding new templates or adapting existing ones.
So, from a development-effort point of view I would lean towards option 2, because such "templates" are easier for you to manage. You don't even need much understanding of the Java language to deal with them. They are just text containing some defined keywords here and there.
One could even think about letting your users define templates themselves! They know what the SMS from their bank looks like, so you could think about some "import" mechanism where your app pulls the SMS text, and then the user tells the app (once) where the relevant parts can be found in it.
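One way the template idea could work is sketched below. The placeholder keywords {ACCOUNT} and {BALANCE}, the class name, and the sample message are assumptions for illustration; real bank messages will need their own templates and possibly looser number formats. Each template is plain text, and the code turns it into a pattern by quoting everything literally except the placeholders.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SmsTemplateMatcher {
    /**
     * Compiles a plain-text template into a pattern. The text is matched
     * literally; only the hypothetical {ACCOUNT} and {BALANCE} keywords
     * become named capture groups.
     */
    static Pattern compile(String template) {
        String regex = Pattern.quote(template)
            // Close the literal quote, capture, then reopen the quote.
            .replace("{ACCOUNT}", "\\E(?<account>[0-9Xx*]+)\\Q")
            .replace("{BALANCE}", "\\E(?<balance>[0-9][0-9,]*(?:\\.[0-9]+)?)\\Q");
        return Pattern.compile(regex);
    }

    /** Returns {account, balance} if the whole SMS matches the template. */
    static Optional<String[]> extract(Pattern template, String sms) {
        Matcher m = template.matcher(sms);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { m.group("account"), m.group("balance") });
    }

    public static void main(String[] args) {
        Pattern template = compile(
            "Your a/c {ACCOUNT} has an available balance of Rs. {BALANCE} as of today.");
        String sms = "Your a/c XX1234 has an available balance of Rs. 12,345.67 as of today.";

        extract(template, sms).ifPresent(fields ->
            System.out.println("account=" + fields[0] + ", balance=" + fields[1]));
    }
}
```

Adding a bank then means adding one line of template text rather than writing a new regular expression, although under the hood this sketch still uses the regex engine to do the matching.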
Regarding runtime efficiency: I wouldn't rely on people's guesses here. Instead, run experiments with real-world data and see whether matching SMS text against a larger set of complex regular expressions is cheaper or more expensive than matching it against much simpler "templates".

Storing a template for each bank costs more memory (if you load them at start-up for efficiency) and file system storage. Also, as you stated, there is the downside of having to know each bank's template in advance and set up the user's application for it.
Using regexes costs neither file system storage nor extra memory, but it could produce false positives for something that looks like a bank message but is not. On the other hand, you don't need to know every bank out there in order to do it properly.

When do I need to use in-memory datastructures instead of an SQL query?

What I want as an end result is for the user to be able to type into something like a textbox and have a variety of options appear as they type.
As an example, the user could start typing a name, e.g. Jam, and the nearest suggestions, e.g. James, Jameson, Jamie etc., would appear underneath.
I know that SQL's SELECT ... LIKE '%Jam%' could do the job (although I guess the user would have to stop typing for the query to be executed, so the user could not actually see suggestions as they type. Right?).
But for my scope I would not need to use an RDBMS since the data are small and don't have real relationships between them.
I thought of using some NoSQL solution so that I can look into this technology (I have never used it before), but in this case I am not sure how to approach a design such as this.
Should I build a tree pre-loading all the data in the storage? So I have the data both in-memory and in-storage? But if I use a NoSQL solution isn't that usually in-memory? So I have the same data in-memory twice?
Any help on how I could approach a problem like this?

I'm guessing you need some asynchronous Ajax call or the jQuery Autocomplete widget to fetch the matching items from your DB.
If it's Java Swing, have a look at this.

Based on your comments to your original question, I think your goals are in conflict.
If you want to learn by doing, you risk reinventing the wheel. You also risk trying to reinvent the wheel, but ending up instead with a square wheel that won't roll.
This isn't necessarily a bad thing. You can learn a lot this way, but this kind of learning conflicts with your following statement.
...I still am not sure how should I not re-implement things and follow a standard approach. E.g. is a NoSQL a good approach for this?
The only thing that comes close to a standard approach here is to store the data on a server, not on the client. There are a lot of ways to do that, and if you object to picking "a random library that seems to do the trick", you ought also to object to picking "a random database technology that seems to do the trick".
I think the best way to learn in this case is to build your software in such a way that it knows nothing about how the data is stored on the back end. That way, you can swap out text files, a SQL dbms, a NoSQL dbms, and anything else you can dream up without affecting the front end code at all.
That's good software design--loose coupling between front end and back end.
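A small Java sketch of that separation, under assumptions: the interface name SuggestionSource and the in-memory implementation are hypothetical, and the lookup here is prefix-only and case-insensitive (closer to LIKE 'Jam%' than LIKE '%Jam%'). The front end would depend only on the interface, so a file-backed, SQL-backed or NoSQL-backed implementation could be swapped in later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

public class AutocompleteSketch {
    /** The only thing the front end knows about the data store. */
    interface SuggestionSource {
        List<String> suggest(String prefix, int limit);
    }

    /** One possible back end: all names held in a sorted in-memory set. */
    static final class InMemorySource implements SuggestionSource {
        private final TreeSet<String> names = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);

        InMemorySource(Collection<String> data) {
            names.addAll(data);
        }

        @Override
        public List<String> suggest(String prefix, int limit) {
            List<String> out = new ArrayList<>();
            // In sorted order, every name starting with the prefix sits in one
            // contiguous run beginning at the first element >= prefix.
            for (String name : names.tailSet(prefix, true)) {
                boolean matches = name.regionMatches(true, 0, prefix, 0, prefix.length());
                if (!matches || out.size() >= limit) {
                    break;
                }
                out.add(name);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        SuggestionSource source =
            new InMemorySource(List.of("Jack", "James", "Jameson", "Jamie", "John"));
        // Called on every keystroke with the text typed so far.
        System.out.println(source.suggest("Jam", 10));
    }
}
```

Because the lookup runs against memory, it is cheap enough to call on each keystroke, which addresses the concern about the user having to stop typing.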

How to get the list of zipcodes/theaterid of theaters listed in moviefone.com

Hi, I need to find the showtimes for all theaters in the US. I gather that I can get the information for a particular theater using http://gateway.moviefone.com/movies/pox/closesttheaters.xml?zip=zipcode but I don't have the list of zip codes and theater IDs.
Can someone help me with that?

I don't understand exactly what it is you are doing, but I suspect that you are using an unpublished web API to snarf content from the moviefone services.
Beware.
If that is what you are doing, it is likely to fall foul of the AOL Terms of Service.
I will use this data for a feature in a site (which is going to be published soon. Sorry, I cannot provide much information about this site because my client has asked me to keep it confidential).
The fact that you are doing this for a client does not absolve you of legal responsibility for your actions. And, in fact, it potentially exposes your client to legal risk as well. If AOL decide to sue someone over this, you both could be named as defendants.
Another possible outcome is that AOL could use technical means to prevent your systems from snarfing the data.
Can you please tell me any other site or list of sites which provide API for theater showtimes or from where I can snarf data on a weekly basis.
No I can't.
The point is that this data you are snarfing costs someone a significant amount of money to create / assemble / manage. AOL will be paying at least part of that bill, one way or another, and they won't take kindly to someone (like your client) freeloading off them and (potentially) taking away their business as well.

Writing a program that will automate searches on finra.org for brokers

I have two possible solutions that I want to implement, but first I will state my problem. The task I have been assigned requires me to go to a website called finra.org and do broker checks to see whether the brokers in my Excel sheet (which gives the name and company, among other things) (a) still work as brokers, and (b) if they do, whether they still work for the company in the Excel sheet. If they do, I move on to the next one; if they don't, I delete them from the sheet. The issue is that I have 37k names to check. Doing it individually, which is annoying and takes the whole day, I calculated that I can do a maximum of 1400 a day, and that is on a productive day when I don't have other things to do. So I figured a better use of my time (I am an intern) would be to write a program. Here are my two suggested solutions:
1.) Automatically, with minimal keystrokes, copy the data and paste it into the search box on the page. I'll still have to click and search, but at least I would eliminate the copying, pasting and switching between screens, which takes the majority of the time.
2.) Completely automate the process. I was thinking of copying the names into a text file and then somehow writing a program that takes each name and submits a query to the website, which would then show me the result. Perhaps it could send the result text to a text file, and then I could just grep the text file for the data that I need.
Any idea if any of this is possible?
Thanks,
Kevin

Definitely possible. I'm doing something similar with a database and an Excel spreadsheet of values, using AutoHotKey to automate queries, Chrome console commands and JavaScript bookmarklets to scrape data into the clipboard, and Ruby/Nokogiri for more complex and/or structured parsing tasks.
Either of your methods will work. If you have little programming background, I would suggest starting with AutoHotKey, since it mimics keyboard and mouse commands, so the programming is much more straightforward and easier to understand. If you have some object-oriented programming skills, learning Ruby/Nokogiri might be your solution, depending on how FINRA's page is structured.
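Whichever tool does the lookups, the bookkeeping step from the question (keep a broker only if they are still registered and still at the listed company) can be sketched on its own. The Java below is an illustration under assumptions: the lookup function is a hypothetical stand-in for whatever scraping or query step is used and is not a FINRA API, and firm names are compared ignoring case and surrounding spaces, which may be too strict or too loose for real data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class BrokerListFilter {
    /** One spreadsheet row: broker name and the company recorded in the sheet. */
    static final class Row {
        final String name;
        final String company;

        Row(String name, String company) {
            this.name = name;
            this.company = company;
        }
    }

    /**
     * Keeps a row only if the lookup reports a current firm for the broker
     * and that firm equals the company in the sheet (ignoring case).
     * An empty lookup result means "no longer registered".
     */
    static List<Row> keepCurrent(List<Row> rows,
                                 Function<String, Optional<String>> currentFirmLookup) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            Optional<String> firm = currentFirmLookup.apply(row.name);
            if (firm.isPresent() && firm.get().trim().equalsIgnoreCase(row.company.trim())) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real lookup would come from the search results.
        Map<String, String> lookupResults = Map.of(
            "Alice Example", "Acme Securities",
            "Bob Example", "Other Firm LLC");
        List<Row> rows = List.of(
            new Row("Alice Example", "Acme Securities"), // unchanged, kept
            new Row("Bob Example", "Acme Securities"),   // moved firms, dropped
            new Row("Carol Example", "Acme Securities")); // not found, dropped

        for (Row row : keepCurrent(rows, n -> Optional.ofNullable(lookupResults.get(n)))) {
            System.out.println(row.name + " @ " + row.company);
        }
    }
}
```

Keeping the filter separate from the lookup means the same logic works whether the results come from saved text files, clipboard scrapes, or a fully automated query.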
