I recently found out that the listFolder/listFolderContinue API returns only 2,000 entries per call. I have also tried recursion and while loops to fetch all the metadata and add it to an existing top-level result.getEntries() list, but none of these approaches work: I still see only 2,000 metadata entries in the final result.
Has anyone tried this and made it work in Java?
Each page of listFolder results, i.e., each response from a single call to listFolder or listFolderContinue, can contain up to about 2,000 entries. (Note that this number is not guaranteed, so you should not rely on it.)
To make sure you get all of the items in a folder, you need to use both listFolder and listFolderContinue, calling listFolderContinue with the latest cursor whenever hasMore on the previous call was true. There's a working example here:
https://github.com/dropbox/dropbox-sdk-java/blob/c6aeb4bf3011e6b803eaa325ea20a52f8412ee0f/examples/tutorial/src/main/java/com/dropbox/core/examples/tutorial/Main.java#L32
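For reference, here is a minimal, untested sketch of that loop with the Dropbox Java SDK v2. It assumes you already have a configured DbxClientV2; the class and method names here are illustrative, and the folder path is a placeholder.

import com.dropbox.core.DbxException;
import com.dropbox.core.v2.DbxClientV2;
import com.dropbox.core.v2.files.ListFolderResult;
import com.dropbox.core.v2.files.Metadata;
import java.util.ArrayList;
import java.util.List;

public class ListAllEntries {
    // Collects every entry in the folder by following the cursor until hasMore is false.
    public static List<Metadata> listAll(DbxClientV2 client, String path) throws DbxException {
        List<Metadata> all = new ArrayList<>();
        ListFolderResult result = client.files().listFolder(path);
        while (true) {
            all.addAll(result.getEntries());   // add this page (up to ~2,000 entries)
            if (!result.getHasMore()) {
                break;                         // no more pages left
            }
            result = client.files().listFolderContinue(result.getCursor());
        }
        return all;
    }
}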
I have an RDF file that is about 7 MB and contains ~80k statements.
When starting the application, I have the following code, which retrieves a list of items I need to show to the user:
NodeIterator iterator = technologyModel.listObjectsOfProperty(subject);
while (iterator.hasNext()) {
    RDFNode node = iterator.nextNode();
    myCollection.add(node.asLiteral().getString().trim());
}
Note: This code works just fine, returns about 3k results, and is the first time the "technologyModel" is accessed.
Obviously, before doing that, I have to load the dataset/model, and here is the problem.
Case (1) When I load the dataset/model from an RDF file, doing this:
InputStream in = FileManager.get().open(ParamsHelper.sourceRDF);
technologyModel.read(in, "RDF/XML-ABBREV");
the technologyModel seems instantly loaded and the first code posted runs in less than a second.
Case (2) However, when I try to load the model from a TDB database (previously loaded with the same RDF file used in the first case), with this code:
dataset = TDBFactory.createDataset(ParamsHelper.tdbBaseDir);
dataset.begin(ReadWrite.READ);
technologyModel = dataset.getNamedModel("http://a.example.biz/technology");
dataset.end();
the technologyModel doesn't seem to be loaded instantly, and even though the first code posted returns the expected results, it takes about 30 seconds on the first call.
If I call that same code again after the first time, or, for example, run another operation such as technologyModel.listSubjects() before calling it for the first time, it runs immediately, as expected.
It seems to me that in the second case, the model is really loaded only after the first operation performed on it. Does that make any sense?
I don't want to keep my data in an RDF file, but rather have a TDB database storing the triples. That's why the second option seems to fit me better.
Can anyone help me with this? I hope I have explained the problem clearly.
Thanks in advance.
There are two effects here:
TDBFactory.createDataset doesn't load any data - it connects to the database. Data is loaded into memory (cached) as it is used, so when you run listObjectsOfProperty for the first time, all caches are cold and the database may well be slow. Performance at this point is quite sensitive to the hardware you are running on.
The second effect is that Model API calls can have access patterns that are database-unfriendly. It is better to use SPARQL on the dataset.
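For illustration, here is a minimal, untested sketch of the SPARQL approach run directly against the TDB dataset. The graph URI comes from the question; the TDB directory, property URI and variable names are placeholders, and the imports assume a recent Apache Jena release (older releases use the com.hp.hpl.jena packages).

import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.query.ResultSet;
import org.apache.jena.tdb.TDBFactory;
import java.util.ArrayList;
import java.util.List;

public class SparqlLookup {
    public static void main(String[] args) {
        // Placeholder TDB directory; the graph URI is the one from the question.
        Dataset dataset = TDBFactory.createDataset("/path/to/tdb");
        List<String> values = new ArrayList<>();
        dataset.begin(ReadWrite.READ);
        try {
            String q = "SELECT ?o WHERE { GRAPH <http://a.example.biz/technology> "
                     + "{ ?s <http://example.org/someProperty> ?o } }";
            QueryExecution qexec = QueryExecutionFactory.create(q, dataset);
            try {
                ResultSet rs = qexec.execSelect();
                while (rs.hasNext()) {
                    QuerySolution row = rs.next();
                    values.add(row.getLiteral("o").getString().trim());
                }
            } finally {
                qexec.close();
            }
        } finally {
            dataset.end();
        }
        System.out.println(values.size() + " values loaded");
    }
}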
By the way: listObjectsOfProperty does not take a subject - it takes a property, and it can touch a lot of the database. If myCollection is a set, then you may be adding a lot more than 3k items.
I'm new to MarkLogic and trying to implement the following scenario with its Java API:
For each user I'll have two directories, something like:
1.1. user1/xmls/recent/
1.2. user1/xmls/archived/
When a user does something with his XML, it's put into the "recent" directory;
When a user does something with his next XML and the "recent" directory is full (e.g. has a certain number of documents, let's say 20), the oldest document is moved to the "archived" directory;
A user can request all documents from the "recent" directory and should get no more than 20 records;
A user can remove something from the "recent" directory manually; in this case, if it had 20 documents, after deleting one it must have 19;
Users can work with their XMLs simultaneously, and the "recent" directory should never grow beyond 20 entries.
Questions are:
In order to properly handle simultaneous additions of XMLs to the "recent" directory, should I lock the whole "recent" directory when adding a new entry (that is, add it, check whether there are more than 20 records afterwards, select the oldest, 21st document and move it to the "archived" directory, and do all of these steps atomically)? How can I do that?
Any suggestions on how to implement this via the Java API?
Is it possible to change a document's URI (e.g. replace "recent" with "archived" in my case)?
Should I consider using MarkLogic's collections here?
I'm open to any suggestions and comments (as I said, I'm new to MarkLogic and my thoughts on how to handle the described scenario may be completely wrong).
You can achieve atomicity for a sequence of operations using multi-statement transactions (MST).
It is possible to use MST from the Java API: http://docs.marklogic.com/guide/java/transactions#id_79848
It's not possible to change a URI. However, it is possible to use an MST to delete the old document and reinsert a new one under the new URI in one atomic step, which has the same effect.
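As a rough, untested sketch of that delete-and-reinsert "rename" using a multi-statement transaction with a current MarkLogic Java Client API (the connection details and URIs are placeholders, and older client versions use a slightly different newClient signature):

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.Transaction;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.io.StringHandle;

public class MoveToArchived {
    public static void main(String[] args) {
        // Placeholder connection details.
        DatabaseClient client = DatabaseClientFactory.newClient("localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));
        XMLDocumentManager docMgr = client.newXMLDocumentManager();

        String oldUri = "/user1/xmls/recent/doc1.xml";
        String newUri = "/user1/xmls/archived/doc1.xml";

        Transaction txn = client.openTransaction();
        try {
            // Read the content, reinsert it under the new URI, drop the old URI,
            // and commit everything as one atomic unit.
            StringHandle content = docMgr.read(oldUri, new StringHandle(), txn);
            docMgr.write(newUri, content, txn);
            docMgr.delete(oldUri, txn);
            txn.commit();
        } catch (RuntimeException e) {
            txn.rollback();
            throw e;
        } finally {
            client.release();
        }
    }
}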
Possibly. Judging from your use case, unless you must have the recent/archived information as part of the URI, it may be simpler to store this information in collections. However, you should read the documentation and evaluate for yourself: http://docs.marklogic.com/guide/search-dev/collections#chapter
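As an untested sketch of the collections alternative (reusing the client and document manager from the previous example; the collection name and URI are placeholders), you would tag the document instead of encoding its state in the URI:

import com.marklogic.client.io.DocumentMetadataHandle;
import com.marklogic.client.io.StringHandle;

// "docMgr" is the XMLDocumentManager from the previous sketch.
DocumentMetadataHandle metadata = new DocumentMetadataHandle();
metadata.getCollections().add("recent");   // collection membership instead of /recent/ in the URI

docMgr.write("/user1/xmls/doc1.xml", metadata, new StringHandle("<doc>example</doc>"));

Moving a document from "recent" to "archived" then becomes a metadata update rather than a reinsert under a new URI, and the most recent documents can be found with a collection-constrained search.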
Personally, I would skip all the hassle with separate directories as well as collections. You would endlessly have to move files around or change their properties. It would be much easier not to calculate anything up front, and simply use a lastModified property, or something similar, to determine the most recent items at run time.
HTH!
I'm new to Java and working on a simple application that monitors a URL and notifies me when a table is updated with new items. Looking at the entire page will not work, as there are ads that change all the time and would give false positives.
My thought was to fetch the URL line by line, looking for the elements. For each element I will check whether it is already in an ArrayList; if not, the element is added to the ArrayList and a notification is sent.
What I need help with is not the exact code but advice on whether this is a good approach, and whether I should store the elements in an ArrayList or in a file instead, as each element contains two lines of text.
It would also be good to get recommendations on which methods and libraries to look at.
Thanks in advance
Sebastian
To check the site, it would probably be more robust to parse the HTML and work with an object representation of the DOM. I've never had to do this myself, but in a question about how to do that another user suggested JTidy; maybe you could have a look at it.
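As a rough, untested illustration of that idea (the URL is a placeholder, and the cell-level handling would need to match the actual table structure), JTidy can give you a standard DOM to walk:

import java.io.InputStream;
import java.net.URL;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.tidy.Tidy;

public class TableWatcher {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; replace with the page you want to monitor.
        try (InputStream in = new URL("http://example.com/page.html").openStream()) {
            Tidy tidy = new Tidy();
            tidy.setQuiet(true);
            tidy.setShowWarnings(false);
            Document doc = tidy.parseDOM(in, null);

            // Walk the table cells and print their text; in your case you would
            // compare each entry against what you have already seen.
            NodeList cells = doc.getElementsByTagName("td");
            for (int i = 0; i < cells.getLength(); i++) {
                Node text = cells.item(i).getFirstChild();
                if (text != null && text.getNodeValue() != null) {
                    System.out.println(text.getNodeValue().trim());
                }
            }
        }
    }
}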
As for storing the information (what you currently do in your ArrayList): this really depends on what you use your application for. If you only want to be notified of changes that occur during the runtime of your program, this is perfectly fine. If you want the information to persist, you should store it in the file system or in a database.
I'm querying data in the Facebook Graph API explorer:
access_token="SECRET"
GET https://graph.facebook.com/me/home?limit=20&until=1334555920&fields=id
result:
{
  "data": [
  ]
}
I was shocked, since there are many posts in my "home" feed.
Then I tried setting the limit to 100, and I got a list of posts.
What's going on here? Does the "limit" parameter affect the Graph API's result?
I tried setting the limit to 25 and querying again; that returned one post.
So what's the relationship between "limit" and "until"?
Facebook's API can be a little weird sometimes because of the data you're trying to access, and there are a few parts to this question.
Limits
The limits are applied when data is returned, but before permission and access-control checks are applied, which is explained in this blog post from last year: Limits in the Graph API.
Permissions
More importantly, even if you give yourself a token with every Facebook permission possible, you still won't be able to access everything that you created. Say you post something on a friend's feed, but their feed's privacy is not set to Public: any queries against that friend's feed with your token will never return that data (or at least that was the case around a year ago).
API Itself
One of the most awesome bugs I found in the Graph API when I was working with it last year is the way it handles paging. The Graph API allows three filters: limit, offset, and since/until. Somewhere Facebook recommends (and rightly so) that you use the since/until dates exclusively for paging whenever possible. Ignoring debates as to why you would do that versus offsets on a theoretical basis, on a practical one the following query used to degrade over time:
// This obviously isn't valid as written, but the params change as described
limit=fixed-value&offset=programmatic-increase&since=some-fixed-date-here
The reason: date ranges and offsets don't behave well with each other. As an example, say I made the following initial query:
// My example query
limit=20&since=1334555920
--> {#1,#2, ... #20}
Naturally you would want to page through more data. The result would be something like this (I can't remember the exact pattern, but the top n results would be repeats, and the list of results would be truncated by n/2 or something similar):
// My example query
limit=20&since=1334555920&offset=20
--> {#10, #11, ... #25}
I never figured out why it happened, but eventually the query would taper off and return nothing, and you would only get around 50-100 unique values. If you paged using dates exclusively, however, you could go on for as long as the data would let you.
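For illustration, in the same pseudo-query style as above (the values are placeholders), date-only paging feeds the oldest timestamp from each page back in as the next "until" instead of increasing an offset:
// First page
limit=20&until=1334555920
--> {#1, #2, ... #20}
// Next page: no offset, just move "until" back to the created_time of post #20
limit=20&until=<created_time of #20>
--> {#21, #22, ... #40}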
The caveat is that this was a bug, and it was a while ago. The main lesson is that I never would have found it without writing queries that should have come out exactly the same (a particular date range covering posts #10-30, compared with limit=20 and offset=10) and seeing that the results were quite different.
I'm using JavaHL to connect to a Subversion 1.6 repository. While I managed to list the contents of the repository, I'm not able to get the item history (the comments made on the check-ins as well as the dates and the authors).
As far as I can see, SVNClient.logMessages is the right method, but the callback method is never executed. I used Revision.HEAD for the path revision and a revision range object holding Revision.START and Revision.HEAD; the limit is set to 0 (which means no limit according to the documentation). I'm trying to fetch the revision, the date, the author and the comment.
If someone knows of example code for using JavaHL, I may be able to find my mistake by comparing that code to mine.
BTW: I know about SVNKit, but management decided not to buy it, so I have to use JavaHL, where next to no sample programs exist (and the docs merely list the classes and interfaces without very detailed descriptions). So please don't point me in the direction of SVNKit, as that is not an option for me.
Any pointers appreciated.
Gnarf
The issue has been solved. The problem was the call to SVNClient.logMessages(), specifically the revision range used.
The start revision had been Revision.START, which, according to the documentation, describes the "first existing revision".
The problem disappeared when I used Revision.getInstance(1) instead. As it is reasonable to assume that any item has at least one revision (the initial one) with that number, it should be safe to use.
Hopefully this will save someone else from spending two and a half days figuring it out!
Gnarf