How to fix recommendations failing on setFetchSize() with Mahout - java

I have a functioning recommender that I wanted to make faster, so I decided to connect it directly to my database. However, every time I try to recommend things to people I get an error that the setFetchSize() parameter must be >= 0. Here is my code:
MySQLJDBCDataModel dataModel = null;
try {
    Class.forName("net.sourceforge.jtds.jdbc.Driver");
    net.sourceforge.jtds.jdbcx.JtdsDataSource ds = new net.sourceforge.jtds.jdbcx.JtdsDataSource();
    ds.setServerName("xxxxx");
    ds.setDatabaseName("xxxxx");
    ds.setUser("xxxxx");
    ds.setPassword("xxxxx");
    ds.setDomain("xxxxx");
    //net.sourceforge.jtds.jdbc.JtdsStatement.setFetchSize(10);
    dataModel = new MySQLJDBCDataModel(ds, "test_tbl", "user_id", "item_id", "preference", null);
} catch (Exception e) {
    System.out.println("can't connect");
}
ArrayList<String> itemList = new ArrayList<String>();
ItemSimilarity similarity = new FileItemSimilarity(new File("output/part-r-00000"));
ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);
//List<RecommendedItem> recommendedItems = recommender.recommend(userid, 10);
Recommender cachingRecommender = new CachingRecommender(recommender);
List<userRecData> allUserRecs = new ArrayList<userRecData>();
List<RecommendedItem> uRec = cachingRecommender.recommend(userobjectid, 10);
And I get the error:
java.sql.SQLException: The setFetchSize method requires a parameter value >= 0.
    at net.sourceforge.jtds.jdbc.JtdsStatement.setFetchSize(JtdsStatement.java:998)
    at org.apache.mahout.cf.taste.impl.model.jdbc.AbstractJDBCDataModel.getNumThings(AbstractJDBCDataModel.java:584)
    at org.apache.mahout.cf.taste.impl.model.jdbc.AbstractJDBCDataModel.getNumUsers(AbstractJDBCDataModel.java:560)
    at org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.<init>(CachingRecommender.java:63)
    at mia.recommender.RecommenderIntro.getRecommendations(RecommenderIntro.java:79)
    at mia.recommender.RecommenderIntro.main(RecommenderIntro.java:43)
It fails in the CachingRecommender constructor, or, if I take that out, on recommender.recommend(). I thought Mahout automatically set the fetch size to 1000.

You're using MySQLJDBCDataModel, but your database is SQL Server. The MySQL-specific implementation sets a negative fetch size, because that is the MySQL driver's convention for streaming results; jTDS rejects any value below 0, which is exactly the exception you see. You need to customize AbstractJDBCDataModel to work with SQL Server -- by not overriding getFetchSize() with a negative value, for example.
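One possible workaround -- a sketch only, assuming your Mahout version lets subclasses override the protected getFetchSize() defined in AbstractJDBCDataModel (verify against your version's source) -- is to extend MySQLJDBCDataModel and return a non-negative fetch size that jTDS accepts:

import javax.sql.DataSource;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;

class SqlServerFriendlyDataModel extends MySQLJDBCDataModel {
    SqlServerFriendlyDataModel(DataSource ds) throws TasteException {
        // same table and column names as in the question
        super(ds, "test_tbl", "user_id", "item_id", "preference", null);
    }

    @Override
    protected int getFetchSize() {
        return 1000; // jTDS requires >= 0; MySQL's negative streaming value does not apply here
    }
}

Then construct SqlServerFriendlyDataModel instead of MySQLJDBCDataModel in the code above.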

Related

How many roundtrips are made to a MongoDB server when using transactions?

I wonder how many roundtrips are made to the server when using transactions in MongoDB. For example, if the Java driver is used like this:
ClientSession clientSession = client.startSession();
TransactionOptions txnOptions = TransactionOptions.builder()
        .readPreference(ReadPreference.primary())
        .readConcern(ReadConcern.LOCAL)
        .writeConcern(WriteConcern.MAJORITY)
        .build();
TransactionBody<String> txnBody = new TransactionBody<String>() {
    public String execute() {
        MongoCollection<Document> coll1 = client.getDatabase("mydb1").getCollection("foo");
        MongoCollection<Document> coll2 = client.getDatabase("mydb2").getCollection("bar");
        coll1.insertOne(clientSession, new Document("abc", 1));
        coll2.insertOne(clientSession, new Document("xyz", 999));
        return "Inserted into collections in different databases";
    }
};
try {
    clientSession.withTransaction(txnBody, txnOptions);
} catch (RuntimeException e) {
    // some error handling
} finally {
    clientSession.close();
}
In this case two documents are stored in a transaction:
    coll1.insertOne(clientSession, new Document("abc", 1));
    coll2.insertOne(clientSession, new Document("xyz", 999));
Are the "insert operations" stacked up and sent to the server in one roundtrip or are two calls (or more?) actually made to the server?
Each insert is sent separately; you can use bulk writes to batch write operations together. The commit at the end is a separate round trip as well.
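Note that a bulk write targets a single collection, so the question's two inserts (which go to different collections) could not share one batch. As a sketch, though, two inserts into the same collection can travel in one round trip (coll1, clientSession, and the Document values as in the question):

// Requires: import java.util.Arrays; and import com.mongodb.client.model.InsertOneModel;
// Both inserts are sent to the server in a single bulkWrite round trip.
coll1.bulkWrite(clientSession, Arrays.asList(
        new InsertOneModel<>(new Document("abc", 1)),
        new InsertOneModel<>(new Document("xyz", 999))));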

Azure Document DB - Java 1.9.5 | Authorization Error

I have a collection with some documents in it. In my application I create this collection first and then insert documents. Based on the requirements I also need to truncate (delete all documents in) the collection. Using the DocumentDB Java API, I have written the following code for this purpose:
DocumentClient documentClient = getConnection(masterkey, server, portNo);
List<Database> databaseList = documentClient.queryDatabases("SELECT * FROM root r WHERE r.id='" + schemaName + "'", null).getQueryIterable().toList();
DocumentCollection collection = null;
Database databaseCache = (Database) databaseList.get(0);
List<DocumentCollection> collectionList = documentClient.queryCollections(databaseCache.getSelfLink(), "SELECT * FROM root r WHERE r.id='" + collectionName + "'", null).getQueryIterable().toList();
// truncate logic
if (collectionList.size() > 0) {
    collection = (DocumentCollection) collectionList.get(0);
    if (truncate) {
        try {
            documentClient.deleteDocument(collection.getSelfLink(), null);
        } catch (DocumentClientException e) {
            e.printStackTrace();
        }
    }
} else { // create logic
    RequestOptions requestOptions = new RequestOptions();
    requestOptions.setOfferType("S1");
    collection = new DocumentCollection();
    collection.setId(collectionName);
    try {
        collection = documentClient.createCollection(databaseCache.getSelfLink(), collection, requestOptions).getResource();
    } catch (DocumentClientException e) {
        e.printStackTrace();
    }
}
With the above code I am able to create a new collection successfully, and I am able to insert documents into it as well. But while truncating the collection I get the error below:
com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'delete
colls
eyckqjnw0ae=
I am using Azure DocumentDB Java API version 1.9.5.
It would be of great help if you could point out the error in my code, or suggest a better way of truncating a collection. I would really appreciate any kind of help here.
According to your description and code, I think the issue is caused by the code below.
try {
    documentClient.deleteDocument(collection.getSelfLink(), null);
} catch (DocumentClientException e) {
    e.printStackTrace();
}
It seems that you want to delete a document via the code above, but you pass a collection link as the documentLink argument. If your real intention is to delete the collection, use the method DocumentClient.deleteCollection(collectionLink, options) instead.
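Applied to the truncate branch above, with the same variables, that would be:

try {
    // delete the whole collection via its self link
    documentClient.deleteCollection(collection.getSelfLink(), null);
} catch (DocumentClientException e) {
    e.printStackTrace();
}

Keep in mind this drops the collection itself, so to "truncate" you would recreate it afterwards, as your create branch already does.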

creating a mongodb healthcheck (in dropwizard)

Not necessarily specific to Dropwizard, but for the life of me I can't figure out how to easily create a health check for MongoDB. This is in Java, using version 3.3.0 of MongoDB's own Java driver.
I was hoping there would be a method that doesn't change the state of the database if it succeeds, but also throws an exception when the query (or connection, or whatever) fails, in order to report a healthy or unhealthy state. Ideally I'd perform a find, but this doesn't throw an exception as far as I can tell.
I would just list all collections in the database, like:
MongoClient client = new MongoClient(addr, opts);
MongoDatabase db = client.getDatabase(database);
try {
    MongoIterable<String> allCollections = db.listCollectionNames();
    for (String collection : allCollections) {
        System.out.println("MongoDB collection: " + collection);
    }
} catch (Exception me) {
    // problems with mongodb
}
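Wrapped in a Dropwizard health check, that idea could look like the sketch below (class name, constructor, and database name are illustrative, not from the question):

import com.codahale.metrics.health.HealthCheck;
import com.mongodb.MongoClient;

public class MongoHealthCheck extends HealthCheck {
    private final MongoClient client;
    private final String database;

    public MongoHealthCheck(MongoClient client, String database) {
        this.client = client;
        this.database = database;
    }

    @Override
    protected Result check() {
        try {
            // read-only probe: fetching the first collection name forces a server round trip
            client.getDatabase(database).listCollectionNames().first();
            return Result.healthy();
        } catch (Exception e) {
            return Result.unhealthy(e);
        }
    }
}

You would then register it in your Application's run() method, e.g. environment.healthChecks().register("mongo", new MongoHealthCheck(client, "mydb")).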

Correct way to get Region Name by using hbase API

I am trying to fetch the "region name" for a table using the HBase API.
The setup is as follows:
HBase pseudo-distributed installation (version 0.98.7).
Hadoop 2.5.1 installation.
The HBase instance contains only a few tables for testing purposes. The web UI shows information about the available regions, with the "region name" corresponding to the table "test_table" highlighted purposefully.
Now, I have been trying to get this region information from the Java API of HBase using the code below.
void scanTable(String tabName) {
    org.apache.hadoop.conf.Configuration config = HBaseConfiguration.create();
    try {
        HTable table = new HTable(config, tabName);
        org.apache.hadoop.hbase.TableName tn = table.getName();
        HRegionInfo hr = new HRegionInfo(tn);
        System.out.println(hr.getRegionNameAsString());
        table.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
Whenever I pass a table name, say "test_table", the region name is returned differently on every run.
RUN 1:
test_table,,1419247657866.77b98d085239ed8668596ea659a7ad7d.
RUN 2:
test_table,,1419247839479.d3097b0f4b407ca827e9fa3773b4d7c7.
RUN 3:
test_table,,1419247859921.e1e39678fa724d7168cd4100289c4234.
I assume that I am using the wrong method to generate the "region name", or that my approach is wrong.
Please help me get the region information for a given table name.
There is a getTableRegions() method in HBaseAdmin which returns all the region info for the table name you want:
List<HRegionInfo> getTableRegions(final TableName tableName)
Below is a method that outputs the region name(s) for a given table name.
void getRegionOfTable(String tabName) {
    org.apache.hadoop.hbase.TableName tn = org.apache.hadoop.hbase.TableName.valueOf(tabName);
    org.apache.hadoop.conf.Configuration config = HBaseConfiguration.create();
    HRegionInfo ob;
    try {
        HBaseAdmin hba = new HBaseAdmin(config);
        List<HRegionInfo> lr = hba.getTableRegions(tn);
        Iterator<HRegionInfo> ir = lr.iterator();
        while (ir.hasNext()) {
            ob = ir.next();
            System.out.println(ob.getRegionNameAsString());
        }
        hba.close();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
Your code produces a different result every time because new HRegionInfo(tn) constructs a brand-new region descriptor (stamped with the current time) instead of looking up the existing one. That code also assumes your table has a single region.

Using MongoDb with Java Servlet

I am facing an issue using MongoDB from a Java servlet.
My servlet has many methods (~20) that access the database to retrieve and add data. A very brief example of one:
public static String getSomething(String s) {
    String json = "[]";
    JSONArray jsonArray = new JSONArray();
    DBCollection table;
    try {
        Mongo mongo = new Mongo("localhost", 27017);
        DB db = mongo.getDB("myDb");
        table = db.getCollection("myCollection"); // collection name elided in the original
        BasicDBObject quoteQuery = new BasicDBObject("abc", s);
        DBCursor cursor = table.find(quoteQuery);
        try {
            while (cursor.hasNext()) {
                jsonArray.put(cursor.next());
            }
        } finally {
            cursor.close();
        }
        // ...
Now the problem is that when this Java servlet is deployed on the Linux server, it works fine for 10 days or so.
After that it crashes.
When I look at mongodb.log in the /var/log directory, I get the following repetitive output:
"connection refused because too many open connections"
I am not sure where to edit things now or how to deal with this. I have tried raising the server's open-connection limit but still get the same results.
Any suggestions?
From the API doc: http://api.mongodb.org/java/2.11.3/
public class Mongo extends Object
A database connection with internal connection pooling. For most applications, you should have one Mongo instance for the entire JVM.
You should create Mongo objects very sparingly, ideally only one per classloader at any time. To reduce the number of Mongo objects, you could create one in the servlet's init method and re-use that instance on every call.
EDIT: I just had a look at our code; we manage the Mongo instance using a classic singleton class (and always fetch a Mongo via that class's getInstance() method), because if you have multiple servlets / entry points in your app, just using init() will still create one instance per servlet, which still won't satisfy the manual section cited by @FredClose.
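For reference, a minimal sketch of such a singleton holder (class name, host, and port are illustrative):

import com.mongodb.Mongo;

public final class MongoHolder {
    private static final Mongo INSTANCE;

    static {
        try {
            // one Mongo (with its internal connection pool) for the whole JVM
            INSTANCE = new Mongo("localhost", 27017);
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private MongoHolder() {
    }

    public static Mongo getInstance() {
        return INSTANCE;
    }
}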
You may create the Mongo object once instead of creating it on each getSomething call.
public class SomeClass {
    static Mongo mongo;
    static DB db;
    static {
        try {
            mongo = new Mongo("localhost", 27017);
            db = mongo.getDB("myDb");
        } catch (Exception e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static String getSomething(String s) {
        JSONArray jsonArray = new JSONArray();
        DBCollection table = db.getCollection("myCollection"); // as in the question
        try {
            BasicDBObject quoteQuery = new BasicDBObject("abc", s);
            DBCursor cursor = table.find(quoteQuery);
            while (cursor.hasNext()) {
                jsonArray.put(cursor.next());
            }
        }
        // ...
Actually, the ideal approach is not to use static access at all, but to inject the DB object from a central controller.
You are creating connections to MongoDB, but you are not closing them. For any database it is very important to close connections; otherwise the server will reach its maximum limit and you won't be able to run your program properly. I hope the following code is helpful:
public static String getSomething(String s) {
    String json = "[]";
    JSONArray jsonArray = new JSONArray();
    try {
        MongoClient mongoClient = new MongoClient("localhost", 27017);
        DB db = mongoClient.getDB("myDb");
        DBCollection collection = db.getCollection("NAME OF YOUR COLLECTION");
        BasicDBObject quoteQuery = new BasicDBObject("abc", "VARIABLE THAT YOU WANT TO FIND");
        DBCursor cursor = collection.find(quoteQuery);
        try {
            while (cursor.hasNext()) {
                jsonArray.put(cursor.next());
            }
        } finally {
            cursor.close();
        }
        mongoClient.close();
    } catch (Exception e) {
        // log or handle the failure
    }
    return jsonArray.toString();
}
In this code, the MongoClient is closed once it has served its purpose.
Arun Gupta (@arungupta) tweeted: "New sample shows how to use Mongo within a #JavaEE7 app" -- a sample showing basic usage of Mongo in a Java EE application.
As per the issue mentioned above, it looks like you are creating a Mongo object for every request. I suggest using a single object throughout your application; for this, look up "MongoClient and connection pooling".
MongoClient will handle connection pooling for you automatically.
    mongoClient = new MongoClient(URI, connectionOptions);
Here the mongoClient object holds your connection pool, and will give your app connections as needed. You should strive to create this object once as your application initializes and re-use this object throughout your application to talk to your database. The most common connection pooling problem we see results from applications that create a MongoClient object way too often, sometimes on each database request. If you do this you will not be using your connection pool as each MongoClient object maintains a separate pool that is not being reused by your application.
