I am trying to update a document with the MongoDB Async Java Driver. My code is below:
// jsonString is a string of "{_id=5715e426ed3522391f106e68, name=Alex}"
final Document document = Document.parse(jsonString);
Document newDocument = document.append("status", "processing");
mongoDbCollection.replaceOne(document, newDocument, (updateResult, throwable) -> {
if (updateResult != null) {
log.info("UPDATED DOC ::::::>>> " + newDocument.toJson());
log.info("UPDATED RESULT ::::>> "+updateResult.toString());
} else {
throwable.printStackTrace();
log.error(throwable.getMessage());
}
});
According to the log, the updated document looks like this:
INFO: UPDATED DOC ::::::>>> { "_id" : { "$oid" : "5715e426ed3522391f106e68" }, "status":"processing"}
INFO: UPDATED RESULT ::::>> AcknowledgedUpdateResult{matchedCount=0, modifiedCount=0, upsertedId=null}
But when I look at the collection in Robomongo, I do not see the updated document; it still shows the old one. I have double-checked that I am looking at the same collection, and there are no exceptions. Am I doing something wrong here?
The problem here is this line:
Document newDocument = document.append("status", "processing");
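You expected this to create a new copy of the document, but append() modifies the document object itself and returns that same object, so document and newDocument refer to one and the same instance.
As a result, the query filter now also contains the status field and does not match the stored document, just as your output indicates: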
{matchedCount=0, modifiedCount=0, upsertedId=null}
// ^ Right here! See 0 matched
So what you want is a "clone". It's not straightforward with a Document, but can be done with this slightly "hacky" method:
Document document = Document.parse(jsonString);
Document newDocument = Document.parse(document.toJson()).append("status","processing");
System.out.println(newDocument);
System.out.println(document);
Now newDocument contains the addition while document remains unaltered. With your current code that is not the case, which is why the query does not match anything to update.
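To make the aliasing problem concrete, here is a minimal sketch that runs without the driver. FluentDoc is a hypothetical stand-in for org.bson.Document, written on the assumption that append() mutates the receiver and returns this, which is what the behaviour described above implies. Its copy constructor plays the role of the clone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AliasingSketch {
    public static void main(String[] args) {
        FluentDoc original = new FluentDoc().append("name", "Alex");

        // append() returns the same object, so both names point at one instance
        // and the "filter" now carries the status field as well.
        FluentDoc aliased = original.append("status", "processing");
        System.out.println(aliased == original);

        FluentDoc filter = new FluentDoc().append("name", "Alex");
        // Copy first, then append: the filter stays unchanged.
        FluentDoc replacement = new FluentDoc(filter).append("status", "processing");
        System.out.println(filter.containsKey("status"));
        System.out.println(replacement.containsKey("status"));
    }
}

// Hypothetical stand-in for org.bson.Document (not the real class).
class FluentDoc extends LinkedHashMap<String, Object> {
    FluentDoc() {
        super();
    }

    // Shallow copy: top-level keys are copied, nested values are shared.
    FluentDoc(Map<String, Object> source) {
        super(source);
    }

    // Mutates this instance and returns it, like a fluent builder.
    FluentDoc append(String key, Object value) {
        put(key, value);
        return this;
    }
}
```

With the real driver, the analogous call should be new Document(document).append("status", "processing"), since Document can be constructed from a Map. Treat that as a suggestion to verify against your driver version; note that it is a shallow copy.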
I'm using JSoup to grab content from web pages.
I want to get all the links on a page that contain some text. It doesn't matter what the text is; it just needs to be non-empty (for example, not only an image).
Example of links I want:
Link to Some Page
Since it contains the text "Link to Some Page"
Links I don't want:
<img src="someimage.jpg"/>
My code looks like this. How can I modify it to only get the first type of link?
Document document = // I get my document object
Elements linksOnPage = document.select("a[href]");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}
You could do something like this.
It does its job, though it's probably not the fanciest solution out there.
Note: the function text() returns clean text, so any HTML markup inside the element is not included.
Document doc = // get the doc
Elements linksOnPage = doc.select("a");
for (Element pageElem : linksOnPage) {
    // Skip links whose visible text is empty (e.g. image-only links)
    if (pageElem.text().trim().isEmpty())
        continue;
    String link = pageElem.attr("abs:href");
    // do something with the link
}
I am using this and it's working fine:
Document document = // I get my document object
Elements linksOnPage = document.select("a:matches(([^\\s]+))");
for (Element page : linksOnPage) {
String link = page.attr("abs:href");
// I do stuff with the link
}
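The selector above keeps only anchors whose text matches the regular expression [^\s]+, that is, text containing at least one non-whitespace character. The sketch below applies the same pattern to a few sample link texts using only java.util.regex. It assumes that Jsoup's :matches applies the pattern with find-style semantics to the element's text; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class LinkTextFilterSketch {
    // Same pattern as in the selector: one or more non-whitespace characters.
    private static final Pattern HAS_TEXT = Pattern.compile("[^\\s]+");

    // True when the text contains at least one non-whitespace character.
    static boolean hasVisibleText(String linkText) {
        return HAS_TEXT.matcher(linkText).find();
    }

    public static void main(String[] args) {
        System.out.println(hasVisibleText("Link to Some Page")); // true
        System.out.println(hasVisibleText(""));                  // false
        System.out.println(hasVisibleText("   \n\t"));           // false
    }
}
```

An image-only link has empty text, so it falls into the second case and is filtered out.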
The MongoDB 2.5 driver has a DBCollection.findAndModify() method for this, but MongoCollection lacks it. After some searching, I found that findOneAndUpdate() now plays the same role.
But this method has a different signature, and I don't understand how to use it. Here is the command I want to execute:
db.COL1.findAndModify({
query: { id: 2 },
update: {
$setOnInsert: { date: new Date(), reptype: 'EOD' }
},
new: true, // return new doc if one is upserted
upsert: true // insert the document if it does not exist
})
The documentation for the findOneAndUpdate method states:
Returns:
the document that was updated. Depending on the value of the returnOriginal property, this will either be the document as it was before the update or as it is after the update.
but I cannot find anything about this returnOriginal property. Does anyone know how to set it correctly?
A Java equivalent of your query should go roughly like this:
Document query = new Document("id", 2);
Document setOnInsert = new Document();
setOnInsert.put("date", new Date());
setOnInsert.put("reptype", "EOD");
Document update = new Document("$setOnInsert", setOnInsert);
FindOneAndUpdateOptions options = new FindOneAndUpdateOptions();
options.returnDocument(ReturnDocument.AFTER);
options.upsert(true);
db.getCollection("COL1").findOneAndUpdate(query, update, options);
Regarding the returnOriginal property: you're right, there is no such thing, and the Javadoc is misleading on this point. However, there is a returnDocument property in FindOneAndUpdateOptions. You can set it to ReturnDocument.AFTER or ReturnDocument.BEFORE, which correspond to new: true and new: false respectively.
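If the interplay of upsert, $setOnInsert and the returned document is unclear, the following toy model may help. It is an in-memory simulation, not the driver API: UpsertSketch, its Return enum and findOneAndSetOnInsert are hypothetical names, matching is reduced to top-level equality, and the generated _id is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UpsertSketch {
    // Stand-in for ReturnDocument.BEFORE / ReturnDocument.AFTER.
    enum Return { BEFORE, AFTER }

    // Toy in-memory "collection".
    private final List<Map<String, Object>> docs = new ArrayList<>();

    // Models findOneAndUpdate with upsert=true and an update made only of $setOnInsert.
    Map<String, Object> findOneAndSetOnInsert(Map<String, Object> query,
                                              Map<String, Object> setOnInsert,
                                              Return mode) {
        for (Map<String, Object> doc : docs) {
            if (doc.entrySet().containsAll(query.entrySet())) {
                // Match found: $setOnInsert changes nothing, so before == after.
                return doc;
            }
        }
        // No match: insert a document built from the query fields plus $setOnInsert.
        Map<String, Object> inserted = new LinkedHashMap<>(query);
        inserted.putAll(setOnInsert);
        docs.add(inserted);
        // Nothing existed before the upsert, so BEFORE yields null.
        return mode == Return.AFTER ? inserted : null;
    }

    public static void main(String[] args) {
        UpsertSketch col = new UpsertSketch();
        Map<String, Object> query = Map.of("id", 2);
        Map<String, Object> fields = Map.of("reptype", "EOD");

        // First call inserts; with BEFORE there is no earlier document to return.
        System.out.println(col.findOneAndSetOnInsert(query, fields, Return.BEFORE));
        // Second call finds the existing document and leaves it untouched.
        System.out.println(col.findOneAndSetOnInsert(query, fields, Return.AFTER));
    }
}
```

This is why ReturnDocument.AFTER is the right choice for the command in the question: with BEFORE, the call that actually performs the insert would hand back null.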
I have this web page, https://rrtp.comed.com/pricing-table-today/, and from it I need to get only the Time (Hour Ending) and Day-Ahead Hourly Price columns. I tried the following code:
Document doc = Jsoup.connect("https://rrtp.comed.com/pricing-table-today/").get();
for (Element table : doc.select("table.prices three-col")) {
for (Element row : table.select("tr")) {
Elements tds = row.select("td");
if (tds.size() > 2) {
System.out.println(tds.get(0).text() + ":" + tds.get(1).text());
}
}
}
Unfortunately, I am unable to get the data I need.
Is there something wrong with the code, or can this page not be crawled?
I need some help.
As I said in my comment:
You should request https://rrtp.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=20150717, because that is the source from which the page you pointed to loads its data.
The data behind this link is not a valid HTML document (which is why it is not working for you), but you can easily make it valid enough.
All you have to do is get the response, wrap it in <table>..</table> tags, and then parse it as an HTML document.
Connection.Response response = Jsoup.connect("https://rrtp.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=20150717").execute();
Document doc = Jsoup.parse("<table>" + response.body() + "</table>");
for (Element element : doc.select("tr")) {
System.out.println(element.html());
}
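The URL above hard-codes one day. Here is a small sketch for building it for an arbitrary date, assuming (from the 20150717 in the example) that the date parameter is the requested day in yyyyMMdd form. FeedUrlSketch, feedUrl and wrapInTable are hypothetical helper names, and the sample row is invented for the demonstration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class FeedUrlSketch {
    private static final String BASE =
            "https://rrtp.comed.com/rrtp/ServletFeed?type=pricingtabledual&date=";

    // Builds the feed URL for a given day; assumes the parameter format is yyyyMMdd.
    static String feedUrl(LocalDate day) {
        return BASE + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    // Wraps the raw rows in <table> tags so an HTML parser keeps the tr/td elements.
    static String wrapInTable(String rawRows) {
        return "<table>" + rawRows + "</table>";
    }

    public static void main(String[] args) {
        System.out.println(feedUrl(LocalDate.of(2015, 7, 17)));
        // Invented sample row, only to show the wrapping step.
        System.out.println(wrapInTable("<tr><td>1:00 AM</td><td>2.1</td></tr>"));
    }
}
```

The result of feedUrl can be passed to Jsoup.connect(...) and the response body to wrapInTable before parsing, as in the snippet above.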
In Mongo, is there a built-in way to update a document so that, instead of replacing the entire stored document, it updates the fields that already exist and appends those that do not exist in the original document?
For example, imagine I insert the following document into my collection:
{
"name" : "Goku",
"level" : 9000
}
Now, at some later point, I wish to update my existing document with the following document I received:
{
"name" : "Goku",
"son" : "Gohan"
}
Ideally, I would like a way to perform an update and produce the following document:
{
"name" : "Goku",
"level" : 9000,
"son" : "Gohan"
}
The standard case is to overwrite the existing document with the new document (as it should be). However, is there a built-in or clever way to achieve the result above without first finding the first document, appending onto it, and then performing an update?
Thanks.
-- EDIT --
@pennstatephil has the correct answer below. In case it helps anyone, here's an implementation of this example in Java as of driver version 2.12.0:
String json = "{'name' : 'Goku', 'level' : 9000 }";
DBObject document = (DBObject) JSON.parse(json);
BasicDBObject update = new BasicDBObject("$set", document);
BasicDBObject query = new BasicDBObject().append("name", document.get("name"));
collection.findAndModify(query, null, null, false, update, false, true);
json = "{'name' : 'Goku', 'son' : 'Gohan'}";
document = (DBObject) JSON.parse(json);
update = new BasicDBObject("$set", document);
query = new BasicDBObject().append("name", document.get("name"));
collection.findAndModify(query, null, null, false, update, false, true);
I believe findAndModify and $set (on the update clause) is what you're looking for.
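To illustrate why $set gives the merged result from the question, here is a minimal sketch that models a top-level $set with plain Java maps. It is not driver code: SetMergeSketch and applySet are hypothetical names, and the model only covers top-level fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SetMergeSketch {
    // Models a top-level $set: listed fields are overwritten or added, all others are kept.
    static Map<String, Object> applySet(Map<String, Object> stored, Map<String, Object> set) {
        Map<String, Object> result = new LinkedHashMap<>(stored);
        result.putAll(set);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "Goku");
        stored.put("level", 9000);

        Map<String, Object> incoming = new LinkedHashMap<>();
        incoming.put("name", "Goku");
        incoming.put("son", "Gohan");

        System.out.println(applySet(stored, incoming)); // {name=Goku, level=9000, son=Gohan}
    }
}
```

One caveat: a nested subdocument passed to $set replaces the stored value as a whole, so merging inside nested documents requires dot-notation keys.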
I did some research, and it seems that Jsoup makes this change by default. Is there a way to configure this, some other parser whose output can be converted to a Jsoup document, or some other way to fix it?
Unfortunately not, the constructor of Tag class changes the name to lower case:
private Tag(String tagName) {
this.tagName = tagName.toLowerCase();
}
But there are two ways to change this behaviour:
1. If you want a clean solution, you can clone or download the JSoup Git repository and change this line.
2. If you want a dirty solution, you can use reflection.
Example for #2:
Field tagName = Tag.class.getDeclaredField("tagName"); // Get the field which contains the tagname
tagName.setAccessible(true); // Set accessible to allow changes
for( Element element : doc.select("*") ) // Iterate over all tags
{
Tag tag = element.tag(); // Get the tag of the element
String value = tagName.get(tag).toString(); // Get the value (= name) of the tag
if( !value.startsWith("#") ) // You can ignore all tags starting with a '#'
{
tagName.set(tag, value.toUpperCase()); // Set the tagname to the uppercase
}
}
tagName.setAccessible(false); // Revert to false
Here is a code sample (version >= 1.11.x):
Parser parser = Parser.htmlParser();
parser.settings(new ParseSettings(true, true));
Document doc = parser.parseInput(html, baseUrl);
The ParseSettings class was introduced in version 1.9.3.
It comes with options to preserve case for tags and attributes.
You must use xmlParser instead of htmlParser and the tags will remain unchanged. One line does the trick:
String html = "<camelCaseTag>some text</camelCaseTag>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
I am using version 1.11.1-SNAPSHOT, which does not have this piece of code:
private Tag(String tagName) {
this.tagName = tagName.toLowerCase();
}
So I checked ParseSettings as suggested above and changed this piece of code from:
static {
htmlDefault = new ParseSettings(false, false);
preserveCase = new ParseSettings(true, true);
}
to:
static {
htmlDefault = new ParseSettings(true, true);
preserveCase = new ParseSettings(true, true);
}
and skipped the test cases while building the JAR.