Fetching Articles from Google News and Downloading Them in Java

How would I go about writing a program that can take articles from Google News and download them to my computer?
I've found that Google News already has a built in RSS feature, but I need to actually download the entire article (text and all) rather than just a headline.
Preferably, I'd like to download these articles as PDFs or HTML files, but for starters just fetching some URLs would be amazing.
There have been some questions on here about fetching articles from Google News, but nothing I've found so far has been particularly helpful. Any help would be massively appreciated.
Thanks!

Legal issues aside, this is possible, see Apache HttpComponents. Here is an example (taken from here) of how to use it:
DefaultHttpClient httpclient = new DefaultHttpClient();
if (useProxy) {
    // Route requests through an HTTP proxy listening on port 80.
    HttpHost proxy = new HttpHost(proxyStr, 80, "http");
    httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);
}
HttpGet httpget = new HttpGet(urlStr);
// HTTP Basic authentication: encodedAuth is expected to hold the Base64 form of "user:password".
httpget.addHeader("Authorization", "Basic " + encodedAuth);
HttpResponse response = httpclient.execute(httpget);
But be aware of Google TOS before you do anything like this.
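Since the question mentions the RSS feed, here is a minimal sketch of the "get the URLs first, then download" idea using only the JDK (Java 11+) instead of HttpComponents. It assumes the feed is plain RSS 2.0 with one `<link>` per `<item>`; the class and method names (`FeedArticleFetcher`, `extractLinks`, `download`) are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class FeedArticleFetcher {

    /** Returns the text of the first <link> inside every <item> of an RSS 2.0 document. */
    static List<String> extractLinks(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Reject DOCTYPE declarations so an untrusted feed cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            List<String> links = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                NodeList linkNodes = ((Element) items.item(i)).getElementsByTagName("link");
                if (linkNodes.getLength() > 0) {
                    links.add(linkNodes.item(0).getTextContent().trim());
                }
            }
            return links;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }

    /** Downloads one URL and saves the response body (usually HTML) to the given file. */
    static Path download(String url, Path target) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofFile(target)).body();
    }
}
```

You would fetch the feed text once, pass it to `extractLinks`, and call `download` for each link. Whether the saved HTML contains the full article text depends on the publisher's site, and the terms-of-service caveat above still applies.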

Related

How to authenticate with MS SharePoint to use its API with Java?

I'm struggling to authenticate with MS SharePoint to use its API. I've been googling and playing around with the problem for a while, but I can't figure out a solution. The most promising approach so far is based on this answer.
The dependencies I'm using are:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.4.1</version>
</dependency>
This is my code:
public static void callRestEasyService() throws Exception {
    CredentialsProvider credsProvider = new BasicCredentialsProvider();
    credsProvider.setCredentials(
            new AuthScope(AuthScope.ANY),
            new NTCredentials("user", "password", "https://workstation.de", "domain.de"));
    CloseableHttpClient httpclient = HttpClients.custom()
            .setDefaultCredentialsProvider(credsProvider)
            .build();
    try {
        HttpGet httpget = new HttpGet("https://adress/_api/web/lists");
        System.out.println("Executing request " + httpget.getRequestLine());
        CloseableHttpResponse response = httpclient.execute(httpget);
        try {
            System.out.println("----------------------------------------");
            System.out.println(response.getStatusLine());
            EntityUtils.consume(response.getEntity());
        } finally {
            response.close();
        }
    } finally {
        httpclient.close();
    }
}
The code is fairly self-explanatory, I think; it is basically what the linked answer suggests. My investigation showed that NTCredentials are the best way to approach this problem. I also tried some alternatives such as Basic authentication, but in all cases I receive:
HTTP/1.1 401 Unauthorized
I also tried using Samba JCIFS as an alternative NTLM engine.
Furthermore, I'm a little worried that I may have filled in the workstation parameter (the 3rd parameter) or the domain incorrectly. The documentation says:
workstation - The workstation the authentication request is originating from. Essentially, the computer name for this machine.
So I tried filling in the name of my machine, but many examples on the web suggest using the URL you are trying to authenticate against. This confused me a little, and I could not get it working with either option.
Does anyone know why that is, or have a possible solution or workaround? Is it possible that SharePoint restricts access from a programmed client? As far as I know, it is at least not possible to disable the API in SharePoint. Any ideas or thoughts on this?
I did not manage to get it working with the Apache HTTP client. Instead, I ran a cURL request from my Java code against the SharePoint API, and that worked fine. In case anyone else has a similar problem and wants to try a workaround, this is my source code for running the cURL request:
ProcessBuilder pb = new ProcessBuilder(
        "curl",
        "https://mysharepoint/_api/web/lists/getbytitle('listName')/items",
        "-v",
        "--ntlm",
        "--negotiate",
        "-u",
        "user:password"
);
Process p = pb.start();
InputStream is = p.getInputStream();
return createHashMapForStaff(convertStreamToString(is));
The return line, createHashMapForStaff(convertStreamToString(is)), just adapts the XML retrieved from SharePoint to my needs. The InputStream from the Process p is basically what you need.
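For anyone copying this workaround: the snippet does not show how the stream is read. Below is a rough sketch of one way to do it with the JDK only (Java 9+). `CurlRunner`, `buildCommand` and `run` are hypothetical names of my own, and the flag set is trimmed to `--silent --ntlm -u`, which is an assumption about what your server needs rather than a tested recipe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper class, not part of any library.
class CurlRunner {

    /** Builds the argument list for an NTLM-authenticated curl GET. */
    static List<String> buildCommand(String url, String user, String password) {
        return List.of("curl", "--silent", "--ntlm", "-u", user + ":" + password, url);
    }

    /** Runs the command and returns everything it wrote to standard output. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD) // drop curl's diagnostics
                .start();
        String output;
        try (InputStream in = process.getInputStream()) {
            // Read the whole stream before waiting, so the child cannot block on a full pipe.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("curl exited with code " + exitCode);
        }
        return output;
    }
}
```

Passing the arguments as a list avoids shell quoting problems with URLs like `getbytitle('listName')`. Keep in mind that a password given on the command line can be visible to other users on the same machine through the process list.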

GET /tasks does not work

I'm using the Query Tasks method (https://asana.com/developers/api-reference/tasks#query) with the following code snippet:
String url = API_BASE+"/tasks?completed_since=now";
System.out.println(url);
HttpGet httpget = new HttpGet(url);
httpget.addHeader( BasicScheme.authenticate(creds, "US-ASCII", false) );
ResponseHandler<String> responseHandler = new BasicResponseHandler();
String responseBody = httpclient.execute(httpget, responseHandler);
ERROR:
https://app.asana.com/api/1.0/tasks?completed_since=now
null
org.apache.http.client.HttpResponseException: Bad Request
at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:67)
at org.apache.http.impl.client.BasicResponseHandler.handleResponse(BasicResponseHandler.java:54)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:735)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:709)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:700)
I work at Asana.
Yes, the underlying message from the server is:
"Must specify exactly one of project, tag, or assignee + workspace"
We'll take a look at updating the documentation, since it does not appear to be explicit about this requirement.
I highly recommend using the URL as indicated in the examples.
Also, we have a Java client library that you may find useful: https://github.com/Asana/java-asana
Thanks for bringing up the documentation issue.
It looks like project is a required parameter that is missing from the documentation.
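To make the server message concrete, here is a small sketch of building the query URL so that it carries either a project, or an assignee plus a workspace. The parameter names come from the error message and the question. The class `TaskQueryUrl` and the ID values are placeholders I made up, and I have not run this against the live API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper class, not part of the Asana client library.
class TaskQueryUrl {

    /** Appends URL-encoded query parameters to <apiBase>/tasks, keeping insertion order. */
    static String build(String apiBase, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return apiBase + "/tasks?" + query;
    }

    /** Variant 1: filter by project. */
    static String forProject(String apiBase, String projectId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("project", projectId);
        params.put("completed_since", "now");
        return build(apiBase, params);
    }

    /** Variant 2: filter by assignee, which must be combined with a workspace. */
    static String forAssignee(String apiBase, String assignee, String workspaceId) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("assignee", assignee);
        params.put("workspace", workspaceId);
        params.put("completed_since", "now");
        return build(apiBase, params);
    }
}
```

The resulting string can be passed to `new HttpGet(url)` exactly as in the question's code.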

Capture the URI and resource name, as seen in Developer Tools, from GET/POST requests in Java

I am doing this as part of enhancing a Selenium Webdriver script.
I have tried using httpclient with Java and a lot of other things but I am not able to get anywhere.
Please help!
This is the scenario:
After a certain action is performed in a webpage like a button click,
GET/POST methods can be seen in the Developer Tools in Chrome.
I have taken the example of Google here.
What I need is to collect all the resource names until a certain resource appears. (If you open the Developer Tools in Chrome and navigate to google.com, the leftmost column under the Network tab shows tia.png, to give just one example.)
There are two things that should be achieved:
ensure that a certain resource was loaded
make sure the page is completely loaded (all GET / POST methods have been transferred) before any other action is taken.
HttpClient and HttpURLConnection only capture one request each, but a page sends many requests. How do we capture all of them?
Using the Apache HTTP client, you can do something like this:
To solve your problem, you can read the response code with response.getStatusLine().getStatusCode(), compare it with the expected code, and also inspect the response body.
URI url = new URI("String URI"); // placeholder: replace with the target URI
HttpPost post = new HttpPost(url);
HttpResponse response = HttpClientBuilder.create().build().execute(post);
System.out.println("Response Code : " + response.getStatusLine().getStatusCode());
// Read the response body line by line into a single string.
BufferedReader rd = new BufferedReader(new InputStreamReader(response.getEntity().getContent()));
StringBuilder result = new StringBuilder();
String line;
while ((line = rd.readLine()) != null) {
    result.append(line);
}
System.out.println("Response 1 >>>>" + result);
If you need to send parameters in your POST request, add them like this:
List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
urlParameters.add(new BasicNameValuePair("username", "yourusername"));
urlParameters.add(new BasicNameValuePair("password", "yourPassword"));
post.setEntity(new UrlEncodedFormEntity(urlParameters));
This is the answer to my question:
https://kenneth.io/blog/2014/12/28/taking-chrome-devtools-outside-the-browser/
https://github.com/auchenberg/chrome-devtools-app
From the blog:
It’s a standalone app that runs Chrome DevTools in its own process.
It’s powered by node-webkit, and it’s able to run on Windows, Mac and
Linux, completely independently of Chrome.
I wanted to capture all the requests between the application and the server, so that I could verify that all the critical resources and requests had been transferred and the page had loaded completely.
I was unable to reproduce programmatically what the Network tab of Chrome DevTools shows.
Kenneth's work lets one achieve this to a degree.

URLFetchService with Google App Engine

I used to have this code working with my Tomcat server:
HttpRequestBase targetRequest = ...;
HttpResponse targetResponse = httpclient.execute(targetRequest);
HttpEntity entity = targetResponse.getEntity();
However, when I migrated to Google App Engine, I couldn't use this code anymore. I read a bit and found that I need different code to achieve this.
So I have this code:
URLFetchService fetcher = URLFetchServiceFactory.getURLFetchService();
HTTPResponse targetResponse = fetcher.fetch(targetRequest); // Error
HttpEntity entity = targetResponse.getEntity();
However, it's obvious that there's an error in the fetcher.fetch call.
All I need is to end up with the same HttpEntity using the App Engine approach. Is there any way to make this work?
org.apache.http.HttpRequest and com.google.appengine.api.urlfetch.HTTPRequest are two totally different classes from two different libraries, so you can not just exchange one for the other.
If you'd like to use Apache HttpClient on GAE, it can be done with some workarounds: see here and here.

What is the preferred method for uploading to a server from Android?

I've been trying to find a way to upload a video from an Android device to an API, but I haven't found a good way to do it. It seems most of the information I've found online is fairly out of date (a lot of it being from last year). Most of them are using a method like this: http://getablogger.blogspot.com/2008/01/android-how-to-post-file-to-php-server.html
What's the easiest/preferred way to upload something to an API with a multipart POST?
I have an Android app I'm developing against the Campfire chat service's "API". The code here uploads a file through multipart POST:
http://github.com/klondike/android-campfire/blob/master/src/com/github/klondike/java/campfire/Room.java#L175
Everything after the "dos.close()" line is related to checking the response to detect whether the post was successful.
Not everything in there is necessary for every multi-part post; for example, the X-Requested-With header is specific to Campfire, the User-Agent is optional, and the Cookie is because I have to stay logged in. Also, the "OH MY GOD" comment about spacing is probably Campfire-specific.
I've heard that the latest version of the HttpClient library from Apache has more convenient built-in multi-part support, but the last sync Google performed against it to Android didn't include those features, so here I am doing it manually.
Hope that's of some help.
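For readers who cannot follow the link, this is roughly what "doing it manually" amounts to: writing the multipart/form-data body yourself. The sketch below is not the Campfire code. It is a stripped-down illustration using only JDK classes, with a hypothetical `MultipartBodyBuilder` class, one text field and one file field.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class MultipartBodyBuilder {
    private static final String CRLF = "\r\n";

    /** Builds a multipart/form-data body with one text part and one file part. */
    static byte[] build(String boundary, String textName, String textValue,
                        String fileField, String fileName, String contentType,
                        byte[] fileBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Text part: headers, blank line, value.
        write(out, "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + textName + "\"" + CRLF
                + CRLF
                + textValue + CRLF);
        // File part: headers, blank line, then the raw bytes.
        write(out, "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + fileField
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF);
        out.write(fileBytes, 0, fileBytes.length);
        // Closing delimiter: the boundary followed by two extra dashes.
        write(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

To send it, open an HttpURLConnection, call setDoOutput(true), set the Content-Type header to "multipart/form-data; boundary=" plus the same boundary string, and write the bytes to the connection's output stream. For a large video you would stream the file in chunks rather than hold it in memory as this sketch does.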
You could use the HttpClient from the Apache Software Foundation. It is part of the Android API:
HttpClient httpclient = new DefaultHttpClient();
// The URL needs an explicit scheme, otherwise the request has no target host.
HttpPost httppost = new HttpPost("http://www.somewebpage.com/site-that-can-handle-post");
try {
    // MultipartEntity, StringBody and FileBody come from the separate Apache httpmime library.
    MultipartEntity entity = new MultipartEntity();
    entity.addPart("timestamp", new StringBody("1311789946"));
    entity.addPart("image", new FileBody(new File("/foo/bar/video.mpeg")));
    httppost.setEntity(entity);
    HttpResponse response = httpclient.execute(httppost);
} catch (Exception e) {
    Log.v(MyActivity.TAG, "doh!", e);
}
Hope that helps. :)
