Download Pandora source with Java? - java

I'm trying to download www.pandora.com/profile/stations/olin_d_kirkland HTML with Java to match what I get when I select 'view page source' from the context menu of the webpage in Chrome.
Now, I know how to download webpage HTML source code with Java. I have done it with downloads.nl and tested it on other sites. However, Pandora is being a mystery. My ultimate goal is to parse the 'Stations' from a Pandora account.
Specifically, I would like to grab the Station names from a site such as www.pandora.com/profile/stations/olin_d_kirkland
I have attempted using the selenium library and the built in URL getter in Java, but I only get ~4700 lines of code when I should be getting 5300. Not to mention that there is no personalized data in the code, which is what I'm looking for.
I figured it was that I wasn't grabbing the JavaScript or letting the JavaScript execute first, but even though I waited for it to load in my code, I would only always get the same result.
If at all possible, I should have a method called 'grabPageSource()' that returns a String. It should return the source code when called upon.
public class PandoraStationFinder {
public static void main(String[] args) throws IOException, InterruptedException {
String s = grabPageSource();
String[] lines = s.split("\n\r");
String t;
ArrayList stations = new ArrayList();
for (int i = 0; i < lines.length; i++) {
t = lines[i].trim();
Pattern p = Pattern.compile("[\\w\\s]+");
Matcher m = p.matcher(t);
if (m.matches() ? true : false) {
Station someStation = new Station(t);
stations.add(someStation);
// System.out.println("I found a match on line " + i + ".");
// System.out.println(t);
}
}
}
public static String grabPageSource() throws IOException {
String fullTxt = "";
// Get HTML from www.pandora.com/profile/stations/olin_d_kirkland
return fullTxt;
}
}
It is irrelevant how it's done, but I'd like, in the final product, to grab a comprehensive list of ALL songs that have been liked by a user on Pandora.

The Pandora pages are heavily constructed using ajax, so many scrapers struggle. In the case you've shown above, looking at the list of stations, the page actually puts through a secondary request to:
http://www.pandora.com/content/stations?startIndex=0&webname=olin_d_kirkland
If you run your request, but point it to that URL rather than the main site, I think you will have a lot more luck with your scraping.
Similarly, to access the "likes", you want this URL:
http://www.pandora.com/content/tracklikes?likeStartIndex=0&thumbStartIndex=0&webname=olin_d_kirkland
This will pull back the liked tracks in groups of 5, but you can page through the results by increasing the 'thumbStartIndex' parameter.

Not an answer exactly, but hopefully this will get you moving in the correct direction:
Whenever I get into this sort of thing, I always fall back on an HTTP monitoring tool. I use firefox, and I really like the Live HTTP Headers extension. Check out what the headers are that are going back and forth, then tailor your http requests accordingly. As an absolute lowest level test, grab the header from a successful request, then send it to port 80 using telnet and see what comes back.

Related

Twilio using Gather to get UserID and Password

Currently, I'm testing out features of Twilio. I simply wanted to get two numbers from the user representing a username and password. What happens is because I don’t know what address to put in action, it defaults to the same time and loops the first gather over and over again. Currently, I’m using Spark and ngrok to test my code.
package com.example;
import com.twilio.twiml.VoiceResponse;
import com.twilio.twiml.voice.Gather;
import com.twilio.twiml.voice.Say;
import static spark.Spark.*;
public class Example {
public static void main(String[] args) {
post("/", (request, response) -> {
Say say = new Say.Builder(
"Enter UserID.")
.build();
Gather gather = new Gather.Builder().action("/").say(say).build();
Say say2 = new Say.Builder(
"Enter Password.")
.build();
Gather gather2 = new Gather.Builder().action("/").say(say2).build();
VoiceResponse voiceResponse = new VoiceResponse.Builder()
.gather(gather)
.gather(gather2)
.build();
return voiceResponse.toXml();
});
}
}
I tried changing the action attribute to a random address like /test, I changed the method to POST or GET, but neither seemed to have any effect except in the ngrok dashboard. The problem I'm currently having is I have no clue how to use Spark and the Gather verb together.
It repeats the actions over and over again because of the properties of the gather TwiMl element.
First, you should point to a different endpoint of your application and handle the response at this endpoint. Once you parse the user input at this endpoint, you can return another TwiML response that asks for the second input, which you then need to process at a third endpoint.
With this in mind, I see a second problem with the code above. You ask for both inputs sequentially, meaning you only ask for the second input if the caller ignores the first prompt and remains quiet. Look at "Scenario 1-3" of this page for more details.
I understand that implementing so many endpoints for a simple IVR might be confusing at first. The good news is that this can be much simpler by using Studio.

automated testing webdrive webpage translation

I am trying to write an app using automated testing with webdriver in Java (I am really new to this), I can already log in and crawl the data I need from a website, the problem is that the page is in chinese and I am trying to display it in English in my app. I have found information about using right click but only on a WebElement, is there anyway I can right click on the page and translate to English or any other method to achieve this?
Thanks
Personally I would continue to retrieve the information in Chinese.
Store each type for example an id as a String. Then use an external library such as the Google Cloud Translate and then pass that id as the following:
public static void translateText(id, String sourceLang, String targetLang, PrintStream out)
{
Translate trans = createTranslateService();
TranslateOption srcLang = TranslateOption.sourceLanguage(sourceLang);
TranslateOption targLang = TranslateOption.sourceLanguage(targetLang);
TranslateOption model = TranslateOption.model("nmt");
Translation translation = translate.translate(id, srcLang, targLang, model);
translation.getTranslatedText());
// Then you can save this into a new variable and pass it onto your website as you need to.
}

I am coding in Android Studio, and I need to fetch and display a specific line of data from a specific webpage

I am very new to coding in Java/Android Studio. I have everything setup that I have been able to figure out thus far. I have a button, and I need to put code inside of the button click event that will fetch information from a website, convert it to a string and display it. I figured I would have to use the html source code in order to do this, so I have installed Jsoup html parser. All of the help with Jsoup I have found only leads me up to getting the HTML into a "Document". And I am not sure if that is the best way to accomplish what I need. Can anyone tell me what code to use to fetch the html code from the website, and then do a search through the html looking for a specific match, and convert that match to a string. Or can anyone tell me if there is a better way to do this. I only need to grab one piece of information and display it.
Here is the piece of html code that contains the value I want:
writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All',' ',' ',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);
I need to grab and display whatever value represents the quotes['KEH15'], in that html code.
Thank you in advance for your help.
Keith
Grabbing raw HTML is an extremely tedious way to access information from the web, bad practice, and difficult to maintain in the case that wherever you are fetching the info from changes their HTML.
I don't know your specific situation and what the data is that you are fetching, but if there is another way for you to fetch that data via an API, use that instead.
Since you say you are pretty new to Android and Java, let me explain something I wish had been explained to me very early on (although I am mostly self taught).
The way people access information across the Internet is traditionally through HTML and JavaScript (which is interpreted by your browser like Chrome or Firefox to look pretty), which are transferred over the internet using the protocol called HTTP. This is a great way for humans to communicate with computers that are far away, and the average person probably doesn't realize that there is more to the internet than this--your browser and the websites you can go to.
Although there are multiple methods, for the purpose of what I think you're looking for, applications communicate over the internet a slightly different way:
When an android application asks a server for some information, rather than returning HTML and JavaScript which is intended for human consumption, the server will (traditionally) return what's called JSON (or sometimes XML, which is very similar). JSON is a very simple way to get information about an object, and put it into a form that is readable easily by both humans (developers) and computers, and can be transmitted over the internet easily. For example, let's say you ask a server for some kind of "Video" object for an app that plays video, it may give you something like this:
{
"name": "Gangnam Style",
"metadata": {
"url": "https://www.youtube.com/watch?v=9bZkp7q19f0",
"views": 2000000000,
"ageRestricted": false,
"likes": 43434
"dislikes":124
},
"comments": [
{
"username": "John",
"comment": "10/10 would watch again"
},
{
"username": "Jane",
"number": "12/10 with rice"
}
]
}
That is very readable by us humans, but also by computers! We know the name is "Gangnam Style", the link of the video, etc.
A super helpful way to interact with JSON in Java and Android is Google's GSON library, which lets you cast a Java object as JSON or parse a JSON object to a Java object.
To get this information in the first place, you have to make a network call to an API, Application Programming Interface. Just a fancy term for communication between a server and a client. One very cool, free, and easy to understand API that I will use for this example is the OMDB API, which just spits back information about movies from IMDB. So how do you talk to the API? Well luckily they've got some nice documentation, which says that to get information on a movie we need to use some parameters in the url, like perhaps
http://www.omdbapi.com/?t=Interstellar
They want a title with the parameter "t". We could put a year, or return type, but this should be good to understand the basics. If you go to that URL in your browser, it spits back lots of information about Interstellar in JSON form. That stuff we were talking about! So how would you get this information from your Android application?
Well, you could use Android's built in HttpUrlConnection classes and research for a few hours on why your calls aren't working. But doesn't essentially every app now use networking? Why reinvent the wheel when virtually every valuable app out there has probably done this work before? Perhaps we can find some code online to do this work for us.
Or even better, a library! In particular, an open source library developed by Square, retrofit. There are multiple libraries like it (go ahead and research that out, it's best to find the best fit for your project), but the idea is they do all the hard work for you like low level network programming. Following their guides, you can reduce a lot of code work into just a few lines. So for our OMDB API example, we can set up our network calls like this:
//OMDB API
public ApiClient{
//an instance of this client object
private static OmdbApiInterface sOmdbApiInterface;
//if the omdbApiInterface object has been instantiated, return it, but if not, build it then return it.
public static OmdbApiInterface getOmdbApiClient() {
if (sOmdbApiInterface == null) {
RestAdapter restAdapter = new RestAdapter.Builder()
.setEndpoint("http://www.omdbapi.com")
.build();
sOmdbApiInterface = restAdapter.create(OmdbApiInterface.class);
}
return sOmdbApiInterface;
}
public interface OmdbApiInterface {
#GET("/")
void getInfo(#Query("t") String title, Callback<JsonObject> callback);
}
}
After you have researched and understand what's going on up there using their documentation, we can now use this class that we have set up anywhere in your application to call the API:
//you could get a user input string and pass it in as movieName
ApiClient.getOmdbApiClient().getInfo(movieName, new Callback<List<MovieInfo>>() {
//the nice thing here is that RetroFit deals with the JSON for you, so you can just get information right here from the JSON object
#Override
public void success(JsonObject movies, Response response) {
Log.i("TAG","Movie name is " + movies.getString("Title");
}
#Override
public void failure(RetrofitError error) {
Log.e("TAG", error.getMessage());
}
});
Now you've made an API call to get info from across the web! Congratulations! Now do what you want with the data. In this case we used Omdb but you can use anything that has this method of communication. For your purposes, I don't know exactly what data you are trying to get, but if it's possible, try to find a public API or something where you can get it using a method similar to this.
Let me know if you've got any questions.
Cheers!
As #caleb-allen said, if an API is available to you, it's better to use that.
However, I'm assuming that the web page is all you have to work with.
There are many libraries that can be used on Android to get the content of a URL.
Choices range from using the bare-bones HTTPUrlConnection to slightly higher-level HTTPClient to using robust libraries like Retrofit. I personally recommend Retrofit. Whatever you do, make sure that your HTTP access is asynchronous, and not done on the UI thread. Retrofit will handle this for you by default.
For parsing the results, I've had good results in the past using the open-source HTMLCleaner library - see http://htmlcleaner.sourceforge.net
Similar to JSoup, it takes a possibly-badly-formed HTML document and creates a valid XML document from it.
Once you have a valid XML document, you can use HTMLCleaner's implementation of the XML DOM to parse the document to find what you need.
Here, for example, is a method that I use to parse the names of 'projects' from a <table> element on a web page where projects are links within the table:
private List<Project> parseProjects(String html) throws Exception {
List<Project> parsedProjects = new ArrayList<Project>();
HtmlCleaner pageParser = new HtmlCleaner();
TagNode node = pageParser.clean(html);
String xpath = "//table[#class='listtable']".toString();
Object[] tables = node.evaluateXPath(xpath);
TagNode tableNode;
if(tables.length > 1) {
tableNode = (TagNode) tables[0];
} else {
throw new Exception("projects table not found in html");
}
TagNode[] projectLinks = tableNode.getElementsByName("a", true);
for(int i = 0; i < projectLinks.length; i++) {
TagNode link = projectLinks[i];
String projectName = link.getText().toString();
String href = link.getAttributeByName("href");
String projectIdString = href.split("=")[1];
int projectId = Integer.parseInt(projectIdString);
Project project = new Project(projectId, projectName);
parsedProjects.add(project);
}
return parsedProjects;
}
If you have permission to edit the webpage to add hyper link to specified line of that page you can use this way
First add code for head of line that you want to go there in your page
head your text if wanna
Then in your apk app on control click code enter
This.mwebview.loadurl("https:#######.com.html#target")
in left side of # enter your address of webpage and then #target in this example that your id is target.
Excuse me if my english lang. isn't good

Including variables in URL, returns error page

I'm trying to access a URL in java using HTMLUnit. The way the website I'm using works is that for search results on the website, it draws the first page of search results initially and then changes to the selected page. What I want to do is access a specific page, say, 21. The URL would have to have a variable appended to it (E.g. http://www.thomsonlocal.com/Electricians/UK/#||25). Using it on my browser gets me the 25th page after the first page loads initially and then a method kicks in. (javascript or JQuery?)
I have tried to encode the URL to escape the vertical bar character but that returns an error page on the site.
page = webClient.getPage("http://www.thomsonlocal.com/Electricians/UK/"+URLEncoder.encode("#||" , "UTF-8")+ 21);
My question is what am I doing wrong here? And is there a way to find out what method is being used which the variables in the URL are passed to?
The part after the # is a URI fragment. It does not obey the same escaping rules as form data which is what URLEncoder.encode() does (which means it does not work for URLs, contrary to popular belief).
What you want is a URI template here (RFC 6570). Sample using this library:
public static void main(final String... args)
throws URITemplateException, MalformedURLException
{
final URITemplate template
= new URITemplate("http://www.thomsonlocal.com/Electricians/UK/#{+var}");
final VariableMap map = VariableMap.newBuilder()
.addScalarValue("var", "||25")
.freeze();
System.out.println(template.toURL(map));
}
This will (correctly) print:
http://www.thomsonlocal.com/Electricians/UK/#%7C%7C25
Another solution, though not as flexible, is to use the URI constructor:
final URI uri = new URI("http", "www.thomsonlocal.com",
"/Electricians/UK/", "||25");
System.out.println(uri.toURL());
This will also print the correct result.

Alert if webpage has been updated

I am creatin an app in Java that checks if a webpage has been updated.
However some webpages dont have a "last Modified" header.
I even tried checking for a change in content length but this method is not reliable as sometimes the content length changes without any modification in the webpage giving a false alarm.
I really need some help here as i am not able to think of a single foolproof method.
Any ideas???
If you connect the whole time to the webpage like this code it can help:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
public class main {
String updatecheck = "";
public static void main(String args[]) throws Exception {
//Constantly trying to load page
while (true) {
try {
System.out.println("Loading page...");
// connecting to a website with Jsoup
Document doc = Jsoup.connect("URL").userAgent("CHROME").get();
// Selecting a part of this website with Jsoup
String pick = doc.select("div.selection").get(0);
// printing out when selected part is updated.
if (updatecheck != pick){
updatecheck = pick;
System.out.println("Page is changed.");
}
} catch (Exception e) {
e.printStackTrace();
System.out.println("Exception occured.... going to retry... \n");
}
}
}
}
How to get notified after a webpage changes instead of refreshing?
Probably the most reliable option would be to store a hash of the page contet.
If you are saying that content-length changes then probably the webpages your are trying to check are dynamically generated and or not whatsoever a static in nature. If that is the case then even if you check the 'last-Modified' header it won't reflect the changes in content in most cases anyway.
I guess the only solution would be a page specific solution working only for a specific page, one page you could parse and look for content changes in some parts of this page, another page you could check by last modified header and some other pages you would have to check using the content length, in my opinion there is no way to do it in a unified mode for all pages on the internet. Another option would be to talk with people developing the pages you are checking for some markers which will help you determine if the page changed or not but that of course depends on your specific use case and what you are doing with it.

Categories

Resources