I'm trying to fetch a single document using a value held in a variable.
Here I am trying to find the document whose name field equals siteName.
String baseUrl = String.format("https://api.mlab.com/api/1/databases/%s/collections/%s?q=",DB_NAME,COLLECTION_SITE_NAME );
StringBuilder stringBuilder = new StringBuilder(baseUrl);
try {
String first = URLEncoder.encode("{","UTF-8");
String second = URLEncoder.encode("}","UTF-8");
String point = URLEncoder.encode(":","UTF-8");
String URL = first+"\"name\""+point+siteName+second;
stringBuilder.append(URL);
stringBuilder.append("&apiKey="+API_KEY);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return stringBuilder.toString();
}
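A minimal sketch of one way to build this URL, assuming the q parameter expects a JSON filter such as {"name":"value"} with the value in quotes. The whole JSON text is encoded in one call instead of piece by piece. The class and method names (MlabQueryUrl, build) are made up for illustration, and siteName is not JSON-escaped here, so a value containing quotes or backslashes would need extra handling.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class MlabQueryUrl {
    // Builds the collection URL with a URL-encoded q={"name":"<siteName>"} filter.
    // Assumption: siteName contains no characters that need JSON escaping.
    static String build(String dbName, String collection, String siteName, String apiKey) {
        String json = "{\"name\":\"" + siteName + "\"}";
        try {
            return String.format(
                    "https://api.mlab.com/api/1/databases/%s/collections/%s?q=%s&apiKey=%s",
                    dbName, collection, URLEncoder.encode(json, "UTF-8"), apiKey);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

Calling MlabQueryUrl.build(DB_NAME, COLLECTION_SITE_NAME, siteName, API_KEY) would then replace the manual string concatenation.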
So I have this method:
public static String ip (String url){
try {
String webPagea = url;
URL urla = new URL(webPagea);
URLConnection urlConnectiona = urla.openConnection();
InputStream isa = urlConnectiona.getInputStream();
InputStreamReader isra = new InputStreamReader(isa);
int numCharsReada;
char[] charArraya = new char[1024];
StringBuffer sba = new StringBuffer();
while ((numCharsReada = isra.read(charArraya)) > 0) {
sba.append(charArraya, 0, numCharsReada);
}
String resulta = sba.toString();
return resulta;
} catch (Exception e)
{
}
**(compile error)
}
and I want it to return the resulta string when called from another class, like below:
private class t1 implements Runnable{
public void run() {
String getip= ip("http://google.com");
}
But I get a compile error saying that I didn't add a return statement, at the spot marked with the two stars above.
Also, in general, when I define a String inside a try/catch like above, I can't access it outside the try/catch. What am I doing wrong?
Example:
public void haha(String data)
{
try {
string test="test6";
} catch (Exception e)
}
string vv=test; <--test cannot be found
}
I want to emphasize that I want the output of the page, not the source code.
If the website outputs text, I want just the text, not the HTML code.
Cheers
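The string resulta is declared inside the try block, so its scope ends with that block, and the method has no return statement on the path taken after an exception is caught. Modify your code to declare resulta outside the try block and return it after the catch, like this: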
public static String ip (String url){
String resulta = "";
try {
String webPagea = url;
URL urla = new URL(webPagea);
URLConnection urlConnectiona = urla.openConnection();
InputStream isa = urlConnectiona.getInputStream();
InputStreamReader isra = new InputStreamReader(isa);
int numCharsReada;
char[] charArraya = new char[1024];
StringBuffer sba = new StringBuffer();
while ((numCharsReada = isra.read(charArraya)) > 0) {
sba.append(charArraya, 0, numCharsReada);
}
resulta = sba.toString();
} catch (Exception e) {
// Log the failure; the method then returns the empty string.
e.printStackTrace();
}
return resulta;
}
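The same pattern in a small self-contained form, as a sketch: the variable that has to survive the try block is declared before it, assigned inside it, and returned after the catch, so every path through the method ends in a return. The class name ReadAllDemo and the use of a generic Reader instead of a URL connection are assumptions made so that the example runs without network access.

```java
import java.io.IOException;
import java.io.Reader;

class ReadAllDemo {
    // Reads everything from the reader; returns "" if reading fails.
    static String readAll(Reader reader) {
        String result = ""; // declared outside the try block so it is visible at the return
        try {
            StringBuilder sb = new StringBuilder();
            char[] buffer = new char[1024];
            int numCharsRead;
            while ((numCharsRead = reader.read(buffer)) > 0) {
                sb.append(buffer, 0, numCharsRead);
            }
            result = sb.toString();
        } catch (IOException e) {
            System.err.println("Read failed: " + e.getMessage());
        }
        return result; // reached on both the success path and the failure path
    }
}
```

For the original method, the reader would be the InputStreamReader created from the URL connection.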
I created a Java Lambda function and deployed it behind Amazon API Gateway.
I want to return a JSONObject that contains an inner JSONArray.
But I get an empty JSON object, { }, in the response.
If I return the result of the JSONObject's toString() instead, it works perfectly.
But if I return the JSONObject itself, the response is an empty {}.
Am I missing something?
JSONObject mainJsonObject;
@Override
public Object handleRequest(Object input, Context context) {
inputHashMap = (LinkedHashMap) input;
responseJSON = new ResponseJSON();
mainJsonObject = new JSONObject();
saveDataToDynamoDB(inputHashMap);
return mainJsonObject;
}
public void saveDataToDynamoDB(LinkedHashMap inHashMap){
// read from the parameter rather than reaching back to the inputHashMap field
String login_id = (String) inHashMap.get("login_id");
String first_name = (String) inHashMap.get("first_name");
String last_name = (String) inHashMap.get("last_name");
try{
DynamoDB dynamoDB = new DynamoDB(new AmazonDynamoDBClient());
Table tableUserDetails = dynamoDB.getTable(USER_PROFILE_TABLE);
Item userProfileTableItem = new Item().withPrimaryKey("login_id", login_id)
.withString("first_name", first_name).withString("last_name", last_name);
tableUserDetails.putItem(userProfileTableItem);
mainJsonObject.put("status", "Success");
mainJsonObject.put("message", "Profile saved successfully.");
mainJsonObject.put("login_id", login_id);
mainJsonObject.put("first_name", first_name);
mainJsonObject.put("last_name", last_name);
}catch(Exception e){
try {
mainJsonObject.put("status", "Failed");
mainJsonObject.put("message", "Failed to saved profile data.");
} catch (JSONException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
}
}
If you call this code through an HTTP request to a web service and read the response in your application, the response should be a String (the String representation of the JSON object), for example the result of mainJsonObject.toString().
Once you have the response as a String, parse it back into a JSON object and continue with your logic.
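An alternative that may be worth trying, offered as a sketch rather than a confirmed fix: build the response from plain Java collections instead of JSONObject. This assumes the Lambda Java runtime serializes Map and List return values to JSON, which is how it treats the standard handler output types. The class name ResponseBuilder and the "items" key are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ResponseBuilder {
    // Builds a response whose "items" entry plays the role of the inner JSON array.
    static Map<String, Object> build(String status, String message, List<String> items) {
        Map<String, Object> response = new LinkedHashMap<>();
        response.put("status", status);
        response.put("message", message);
        response.put("items", items); // a List is the plain-Java counterpart of a JSONArray
        return response;
    }
}
```

The handler would then return this Map instead of mainJsonObject.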
The context is as follows:
I've got objects that represent Tweets (from Twitter). Each object has an id, a date and the id of the original tweet (if there was one).
I receive a file of tweets (where each tweet is in the format 05/04/2014 12:00:00, tweetID, originalID and is on its own line) and I want to save them as an XML file where each field has its own tag.
I want to then be able to read the file and return a list of Tweet objects corresponding to the Tweets from the XML file.
After writing the XML parser that does this I want to test that it works correctly. I've got no idea how to test this.
The XML Parser:
public class TweetToXMLConverter implements TweetImporterExporter {
//there is a single file used for the tweets database
static final String xmlPath = "src/main/resources/tweetsDataBase.xml";
//some "defines", as we like to call them ;)
static final String DB_HEADER = "tweetDataBase";
static final String TWEET_HEADER = "tweet";
static final String TWEET_ID_FIELD = "id";
// XML element and attribute names cannot contain spaces
static final String TWEET_ORIGIN_ID_FIELD = "originalTweet";
static final String TWEET_DATE_FIELD = "tweetDate";
static File xmlFile;
static boolean initialized = false;
@Override
public void createDB() {
try {
Element tweetDB = new Element(DB_HEADER);
Document doc = new Document(tweetDB);
doc.setRootElement(tweetDB);
XMLOutputter xmlOutput = new XMLOutputter();
// pretty-print the output (indentation and line breaks)
xmlOutput.setFormat(Format.getPrettyFormat());
xmlOutput.output(doc, new FileWriter(xmlPath));
xmlFile = new File(xmlPath);
initialized = true;
} catch (IOException io) {
System.out.println(io.getMessage());
}
}
@Override
public void addTweet(Tweet tweet) {
if (!initialized) {
//TODO throw an exception? should not come to pass!
return;
}
SAXBuilder builder = new SAXBuilder();
try {
Document document = (Document) builder.build(xmlFile);
Element newTweet = new Element(TWEET_HEADER);
newTweet.setAttribute(new Attribute(TWEET_ID_FIELD, tweet.getTweetID()));
newTweet.setAttribute(new Attribute(TWEET_DATE_FIELD, tweet.getDate().toString()));
if (tweet.isRetweet())
newTweet.addContent(new Element(TWEET_ORIGIN_ID_FIELD).setText(tweet.getOriginalTweet()));
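document.getRootElement().addContent(newTweet);
// write the updated document back to the file; otherwise the new tweet is lost
try (FileWriter writer = new FileWriter(xmlFile)) {
new XMLOutputter(Format.getPrettyFormat()).output(document, writer);
}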
} catch (IOException io) {
System.out.println(io.getMessage());
} catch (JDOMException jdomex) {
System.out.println(jdomex.getMessage());
}
}
//break glass in case of emergency
@Override
public void addListOfTweets(List<Tweet> list) {
for (Tweet t : list) {
addTweet(t);
}
}
@Override
public List<Tweet> getListOfTweets() {
if (!initialized) {
//TODO throw an exception? should not come to pass!
return null;
}
try {
SAXBuilder builder = new SAXBuilder();
Document document;
document = (Document) builder.build(xmlFile);
List<Tweet> $ = new ArrayList<Tweet>();
for (Object o : document.getRootElement().getChildren(TWEET_HEADER)) {
Element rawTweet = (Element) o;
String id = rawTweet.getAttributeValue(TWEET_ID_FIELD);
String original = rawTweet.getChildText(TWEET_ORIGIN_ID_FIELD);
Date date = new Date(rawTweet.getAttributeValue(TWEET_DATE_FIELD));
$.add(new Tweet(id, original, date));
}
return $;
} catch (JDOMException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
return null;
}
}
Some usage:
private TweetImporterExporter converter;
List<Tweet> tweetList = converter.getListOfTweets();
for (String tweetString : lines)
converter.addTweet(new Tweet(tweetString));
How can I make sure that the XML file I read (which contains tweets) corresponds to the file I receive (in the form stated above)?
How can I make sure the tweets I add to the file correspond to the ones I tried to add?
Assuming that you have the following model:
public class Tweet {
private Long id;
private Date date;
private Long originalTweetid;
//getters and setters
}
The process would be the following:
create an instance of TweetToXMLConverter
create a list of Tweet instances that you expect to receive after parsing the file
feed the converter the list you generated
compare the list returned by parsing the file with the list you created at the start of the test
public class MainTest {
private TweetToXMLConverter converter;
private List<Tweet> tweets;
@Before
public void setup() {
// initialize the list; otherwise tweets.add(...) throws a NullPointerException
tweets = new ArrayList<Tweet>();
Tweet tweet = new Tweet(1, "05/04/2014 12:00:00", 2);
Tweet tweet2 = new Tweet(2, "06/04/2014 12:00:00", 1);
Tweet tweet3 = new Tweet(3, "07/04/2014 12:00:00", 2);
tweets.add(tweet);
tweets.add(tweet2);
tweets.add(tweet3);
converter = new TweetToXMLConverter();
// the converter ignores addTweet() calls until the database file has been created
converter.createDB();
converter.addListOfTweets(tweets);
}
@Test
public void testParse() {
List<Tweet> parsedTweets = converter.getListOfTweets();
// JUnit expects the arguments in the order (expected, actual)
Assert.assertEquals(tweets.size(), parsedTweets.size());
for (int i=0; i<parsedTweets.size(); i++) {
//assuming that both lists are in the same order and that Tweet overrides equals()
Assert.assertEquals(tweets.get(i), parsedTweets.get(i));
}
}
}
I am using JUnit for the actual testing.
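One thing the element-by-element comparison relies on: assertEquals on two objects calls equals(), and the default implementation only compares references, so a freshly parsed Tweet would never equal the one you created in the test. A minimal sketch of value-based equality follows, assuming the three fields of the model above; the constructor signature is an assumption, since the real Tweet class is not shown in full.

```java
import java.util.Date;
import java.util.Objects;

class Tweet {
    private final Long id;
    private final Date date;
    private final Long originalTweetId; // null when the tweet is not a retweet

    // Hypothetical constructor; the real class may take different arguments.
    Tweet(Long id, Date date, Long originalTweetId) {
        this.id = id;
        this.date = date;
        this.originalTweetId = originalTweetId;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Tweet)) return false;
        Tweet that = (Tweet) other;
        // Objects.equals handles null fields safely
        return Objects.equals(id, that.id)
                && Objects.equals(date, that.date)
                && Objects.equals(originalTweetId, that.originalTweetId);
    }

    @Override
    public int hashCode() {
        // must be consistent with equals()
        return Objects.hash(id, date, originalTweetId);
    }
}
```

With this in place, two Tweet objects holding the same id, date and original id compare as equal even though they are separate instances.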
I have the code below, which encodes a URL before it is sent over the wire (by email):
private static String urlFor(HttpServletRequest request, String code, String email, boolean forgot) {
try {
URI url = forgot
? new URI(request.getScheme(), null, request.getServerName(), request.getServerPort(), createHtmlLink(),
"code="+code+"&email="+email+"&forgot=true", null)
: new URI(request.getScheme(), null, request.getServerName(), request.getServerPort(), createHtmlLink(),
"code="+code+"&email="+email, null);
String s = url.toString();
return s;
} catch (URISyntaxException e) {
throw new RuntimeException(e);
}
}
/**
* Create the part of the URL taking into consideration if
* its running on dev mode or production
*
 * @return
*/
public static String createHtmlLink(){
if (GAEUtils.isGaeProd()){
return "/index.html#ConfirmRegisterPage;";
} else {
return "/index.html?gwt.codesvr=127.0.0.1:9997#ConfirmRegisterPage;";
}
}
The problem with this is that the generated email looks like this:
http://127.0.0.1:8888/index.html%3Fgwt.codesvr=127.0.0.1:9997%23ConfirmRegisterPage;?code=fdc12e195d&email=demo@email.com
The ? and # characters are replaced with %3F and %23, so the link is incorrect and does not open in the browser.
What is the correct way to do this?
You need to pass the path, the query and the fragment to the URI constructor as separate arguments, so that the ? and # separators are not escaped as part of the path.
Something like this should work:
private static String urlFor(HttpServletRequest request, String code, String email, boolean forgot) {
try {
URI htmlLink = new URI(createHtmlLink());
String query = htmlLink.getQuery();
String fragment = htmlLink.getFragment();
fragment += "code="+code+"&email="+email;
if(forgot){
fragment += "&forgot=true";
}
URI url = new URI(request.getScheme(), null, request.getServerName(), request.getServerPort(), htmlLink.getPath(),
query, fragment);
String s = url.toString();
return s;
} catch (URISyntaxException e) {
throw new RuntimeException(e);
}
}
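To see the difference in isolation, here is a small sketch that passes the three parts separately to the seven-argument java.net.URI constructor. The class name ConfirmLink and the hard-coded dev-mode query are assumptions for illustration; in the real code the parts would come from createHtmlLink() as above.

```java
import java.net.URI;
import java.net.URISyntaxException;

class ConfirmLink {
    // Builds the confirmation link with path, query and fragment as separate arguments.
    static String build(String scheme, String host, int port, String code, String email) {
        try {
            URI uri = new URI(scheme, null, host, port,
                    "/index.html",                    // path only, no ? or # in it
                    "gwt.codesvr=127.0.0.1:9997",     // query, without the leading ?
                    "ConfirmRegisterPage;code=" + code + "&email=" + email); // fragment, without #
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Because the constructor inserts the ? and # separators itself, they appear literally in the result instead of being percent-encoded.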
You can use the Java API method URLEncoder#encode() to encode the query parameter values.
A better API for doing this is the UriBuilder.
Below is the code in my MyCrawler.java. It crawls all the links that match the prefixes I have given to href.startsWith. But suppose I do not want to crawl one particular page, http://inv.somehost.com/people/index.html. How can I do this in my code?
public MyCrawler() {
}
public boolean shouldVisit(WebURL url) {
String href = url.getURL().toLowerCase();
if (href.startsWith("http://www.somehost.com/") || href.startsWith("http://inv.somehost.com/") || href.startsWith("http://jo.somehost.com/")) {
//And If I do not want to crawl this page http://inv.somehost.com/data/index.html then how it can be done..
return true;
}
return false;
}
public void visit(Page page) {
int docid = page.getWebURL().getDocid();
String url = page.getWebURL().getURL();
String text = page.getText();
List<WebURL> links = page.getURLs();
int parentDocid = page.getWebURL().getParentDocid();
try {
URL url1 = new URL(url);
System.out.println("URL:- " +url1);
URLConnection connection = url1.openConnection();
Map responseMap = connection.getHeaderFields();
Iterator iterator = responseMap.entrySet().iterator();
while (iterator.hasNext())
{
String key = iterator.next().toString();
if (key.contains("text/html") || key.contains("text/xhtml"))
{
System.out.println(key);
// Content-Type=[text/html; charset=ISO-8859-1]
if (filters.matcher(key) != null){
System.out.println(url1);
try {
final File parentDir = new File("crawl_html");
parentDir.mkdir();
final String hash = MD5Util.md5Hex(url1.toString());
final String fileName = hash + ".txt";
final File file = new File(parentDir, fileName);
boolean success =file.createNewFile(); // Creates file crawl_html/abc.txt
System.out.println("hash:-" + hash);
System.out.println(file);
// Create file if it does not exist
// File did not exist and was created
FileOutputStream fos = new FileOutputStream(file, true);
PrintWriter out = new PrintWriter(fos);
// Also could be written as follows on one line
// Printwriter out = new PrintWriter(new FileWriter(args[0]));
// Write text to file
Tika t = new Tika();
String content= t.parseToString(new URL(url1.toString()));
out.println("===============================================================");
out.println(url1);
out.println(key);
//out.println(success);
out.println(content);
out.println("===============================================================");
out.close();
fos.flush();
fos.close();
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TikaException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
// http://google.com
}
}
}
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
System.out.println("=============");
}
And this is my Controller.java code, from which MyCrawler is called:
public class Controller {
public static void main(String[] args) throws Exception {
CrawlController controller = new CrawlController("/data/crawl/root");
controller.addSeed("http://www.somehost.com/");
controller.addSeed("http://inv.somehost.com/");
controller.addSeed("http://jo.somehost.com/");
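// configure the controller before start(); settings applied afterwards come too late for this crawl
controller.setPolitenessDelay(200);
controller.setMaximumCrawlDepth(2);
controller.start(MyCrawler.class, 20);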
}
}
Any suggestions will be appreciated..
How about adding a property that holds the URLs you want to exclude?
Add every page that you don't want crawled to the exclusions list.
Here is an example:
public class MyCrawler extends WebCrawler {
List<Pattern> exclusionsPatterns;
public MyCrawler() {
exclusionsPatterns = new ArrayList<Pattern>();
//Add all your exclusions here, as regular expressions
exclusionsPatterns.add(Pattern.compile("http://investor\\.somehost\\.com.*"));
}
/*
* You should implement this function to specify
* whether the given URL should be visited or not.
*/
public boolean shouldVisit(WebURL url) {
String href = url.getURL().toLowerCase();
//Iterate the patterns to find if the url is excluded.
for (Pattern exclusionPattern : exclusionsPatterns) {
Matcher matcher = exclusionPattern.matcher(href);
if (matcher.matches()) {
return false;
}
}
if (href.startsWith("http://www.ics.uci.edu/")) {
return true;
}
return false;
}
}
In this example, every URL that starts with http://investor.somehost.com is excluded from the crawl.
So these won't be crawled:
http://investor.somehost.com/index.html
http://investor.somehost.com/something/else
I recommend reading up on regular expressions.
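If you only want to skip the single page from the question rather than a whole host, the same idea works with an exact-match pattern. This is a sketch that keeps the filtering logic in a plain class so it can be tried without the crawler library; UrlFilter is a made-up name, and in MyCrawler the body of shouldVisit(WebURL) would call url.getURL() first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class UrlFilter {
    private final List<Pattern> exclusions = new ArrayList<>();

    UrlFilter() {
        // Pattern.quote makes the dots and slashes match literally,
        // so only this exact page is excluded.
        exclusions.add(Pattern.compile(
                Pattern.quote("http://inv.somehost.com/people/index.html")));
    }

    boolean shouldVisit(String url) {
        String href = url.toLowerCase();
        for (Pattern exclusion : exclusions) {
            if (exclusion.matcher(href).matches()) {
                return false; // explicitly excluded
            }
        }
        // otherwise apply the original prefix rules
        return href.startsWith("http://www.somehost.com/")
                || href.startsWith("http://inv.somehost.com/")
                || href.startsWith("http://jo.somehost.com/");
    }
}
```

Other pages under http://inv.somehost.com/ are still visited; only the quoted URL is rejected.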