Pros and Cons of DFC and DFS?


I am new to Documentum and have to migrate some code from Documentum Foundation Classes (DFC) to Documentum Foundation Services (DFS). Can someone provide the pros and cons of each, and a good source of information to get started?
By the way, I am writing the code in Java to get information from Documentum.

DFS is an abstraction layer on top of DFC.
SourceRebels is partially right, except for the detail that EMC is now treating DFS as a primary model of integration for external applications (API). You no longer need to use a compiled language (Java or .NET), since you can do everything via SOAP web service calls. DFC remains available for low-level interaction, but more services are added to DFS with every Documentum release.
One of the key differences is the object model. In DFS, you can create a batch of operations to send to the server for execution (for instance, create 10 objects). There are also some complex operations in DFS that would take much more code to accomplish using DFC. DFS also allows you to deploy your code to machines without the DFC installed.
Your best resource for Documentum-related questions is http://developer.emc.com.

IMHO they are not comparable because they do not focus on the same thing. DFC is an API to access Documentum, while DFS is a service framework with some predefined services that provide functionality to interact with Documentum.
Important caveat: I have never used DFS :-)
DFC = Do-it-yourself. Traditional client-server programming. Faster.
DFS = Use predefined services, or do it yourself for non-trivial tasks. SOA. You will probably need to deploy your services on a new server or purchase more Documentum licenses (not sure about that). Slower, but I would feel more comfortable using it if I wanted to access Documentum from legacy systems.
That's my two cents; I hope you find it useful.

DFS is the newer API for Documentum (built on the web services concept). Read the DFS documentation, which is fairly self-explanatory. In addition, you need a basic understanding of web service calls (exposing a service, WSDL, building remote clients).

Related

Is it possible to demand load jars into a running java process? Are there frameworks for this?

In the attempt to design & implement & test a distributed capabilities system, Remote Promises[1][2][3], bit identical between Squeak & Java, there are shortcomings. I am seeking work-arounds.
With Remote Promises, proxies can change state, which changes the class implementing the proxy. In Squeak this is done with #becomeForward:, while in Java it requires a secondary proxy, one that can change its implementation. This does work.
Exceptions should be non-blocking to allow the event loop to continue, yet also display the problem stack for debugging, out of a quarantine. This is good in Squeak but an open issue with Java. I suppose the answer is do all your logging and then close the exception, allowing the event loop to proceed: it is server-style log debugging.
Using a meta repository, it should be possible to demand load consumers of a particular event type: dynamically load the latest released code into the consumer servers and spread out the load to speed up throughput, updating the system at runtime for continuous, seamless operation. I suppose the solution here is to build a dynamic jar ClassLoader system. Are there any examples of this? An Apache project perhaps?
[1] Remote Promises in Squeak
[2] Cryptography in Squeak
[3] Remote Promises in Java, called Raven

Use cloud technologies made for this kind of use case
I would say that in today's world, you don't get the latest version of your code through a class loader or any other advanced capability of your programming language. You would more likely use some kind of cloud service.
That may be a serverless cloud implementation or a container/Kubernetes (https://kubernetes.io/) implementation. When a new release is loaded, you can then control whether to do a canary, blue/green, or progressive rollout, or even implement your own strategy.
Because it works with containers, it does not matter what the language is: C++, Java, Python, shell, Squeak, or anything else.
That layer also provides autoscaling of your various services, redundancy, and load balancing, and distributes the workload across your cluster.
You can go a step further with GitOps: a PR merge in git automatically triggers the rollout of the new version to production (https://www.weave.works/technologies/gitops/).
Dynamic loading of jars in Java
That said, Java's class loader API does allow classes to be loaded dynamically. This is what web servers do, and several implementations exist, such as OSGi; also check dimo414's answer.
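As a rough illustration of the class loader route, the sketch below packs one class into a temporary jar and then loads it at runtime through a URLClassLoader that cannot see the application classpath. The names JarOnDemand and Greeter are made up for this example, and a real system would fetch an already built jar from its repository instead of packing one on the fly. It assumes Java 11 or later and that the compiled class files are reachable as classpath resources.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

final class JarOnDemand {

    /** Hypothetical stand-in for code that would normally ship in a separately released jar. */
    public static final class Greeter {
        public Greeter() { }
        public String greet(String name) { return "hello ".concat(name); }
    }

    /** Copies the bytecode of one already compiled class into a temporary jar (demo helper only). */
    static Path packIntoJar(Class<?> type) {
        String entry = type.getName().replace('.', '/') + ".class";
        try {
            Path jar = Files.createTempFile("release-", ".jar");
            try (InputStream in = type.getClassLoader().getResourceAsStream(entry);
                 JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                if (in == null) {
                    throw new IllegalStateException("bytecode not found on classpath: " + entry);
                }
                out.putNextEntry(new JarEntry(entry));
                in.transferTo(out);
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Loads className from the jar at runtime and calls its public greet(String) method. */
    static String callGreet(Path jar, String className, String arg) {
        // The platform loader is the parent, so the class must come from the jar,
        // not from the application classpath.
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] { jar.toUri().toURL() }, ClassLoader.getPlatformClassLoader())) {
            Class<?> loaded = loader.loadClass(className);
            Object instance = loaded.getConstructor().newInstance();
            return (String) loaded.getMethod("greet", String.class).invoke(instance, arg);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not load " + className, e);
        }
    }

    public static void main(String[] args) {
        Path jar = packIntoJar(Greeter.class);
        System.out.println(callGreet(jar, Greeter.class.getName(), "world"));
    }
}
```

In a real plugin setup the host and the plugin usually share an interface that the parent class loader provides, so reflection is only needed to instantiate the plugin, not to call it.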
Conclusion
It would seem that the Java route makes more sense for a generic plugin system like Eclipse's (OSGi), and that the container solution makes more sense for a globally distributed system that needs autoscaling and resilience in clusters.
Kubernetes scales to thousands of nodes and provides a whole ecosystem for dealing with distributed systems; it can scale and operate any Linux or Windows process. It is the de facto standard, pushed by Google and used by thousands of companies around the world.

demand load consumers of a particular event type.
This is typically done via the ServiceLoader API. See the AutoService project to simplify working with services.
This may not be what you need; your question is still very broad, and there are many plausible approaches. Searching for [dynamically load jars] finds existing posts such as "Load jar dynamically at runtime?" that may be of interest.
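To make the ServiceLoader suggestion a little more concrete, here is a minimal sketch that looks up consumers for a given event type. EventConsumer, OrderConsumer, and the event type names are hypothetical. To stay self-contained, the example writes the META-INF/services registration file into a temporary directory at runtime; normally that file is packaged inside the provider's jar (AutoService generates it at compile time).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

final class EventConsumerLookup {

    /** Hypothetical service interface: one consumer per event type. */
    public interface EventConsumer {
        String eventType();
        String handle(String payload);
    }

    /** Hypothetical provider; in a real system it would live in its own jar. */
    public static final class OrderConsumer implements EventConsumer {
        public OrderConsumer() { }
        @Override public String eventType() { return "order"; }
        @Override public String handle(String payload) { return "handled order: " + payload; }
    }

    /** Returns every registered consumer visible to the loader that handles the event type. */
    static List<EventConsumer> consumersFor(String eventType, ClassLoader loader) {
        List<EventConsumer> matches = new ArrayList<>();
        for (EventConsumer consumer : ServiceLoader.load(EventConsumer.class, loader)) {
            if (consumer.eventType().equals(eventType)) {
                matches.add(consumer);
            }
        }
        return matches;
    }

    /**
     * Demo helper: builds a class loader whose only extra content is the
     * registration file META-INF/services/<interface name> listing the provider.
     * The temporary directory is not cleaned up in this sketch.
     */
    static ClassLoader loaderWithRegistration() {
        try {
            Path root = Files.createTempDirectory("providers");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));
            Files.write(services.resolve(EventConsumer.class.getName()),
                    OrderConsumer.class.getName().getBytes(StandardCharsets.UTF_8));
            return new URLClassLoader(new URL[] { root.toUri().toURL() },
                    EventConsumerLookup.class.getClassLoader());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = loaderWithRegistration();
        for (EventConsumer consumer : consumersFor("order", loader)) {
            System.out.println(consumer.handle("42"));
        }
    }
}
```

Pointing the URLClassLoader at a freshly downloaded jar instead of the temporary directory would combine this lookup with loading new code on demand.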

How to capture save or update events in Couchbase

I would like to be able to do some data manipulation when documents are updated or created in Couchbase.
Documents can arrive in our database either via Sync Gateway or our own code which streams data in from an http service. It would be great to have one place where I can intercept all updates.
We are running a Spring Boot REST API against this data so this would be the good place to have the interceptor/listener. Either way my preference would be for a Java solution.
The data is written as JSON rather than through Spring entities, so I can't use ApplicationListener, which only listens to events on entity classes. Correct me if I'm wrong: I can find precious few examples of setting up ApplicationListeners, and I can't seem to get it working.
I see that there is an Eventing service where you write Javascript but for a number of reasons I'm not keen to go that way. I'm not keen on fragmenting our API code across platforms and languages, not sure I can run the eventing service on our systems etc. Again, I'm open to debate though.
That leaves only DCP, as far as I can tell, which seems very low level (https://blog.couchbase.com/couchbases-history-everything-dcp/) but looks like the tool for the job.
The QUESTION: Is there an alternative, less low-level way than DCP to catch update events in Couchbase for JSON objects (NOT entities)?

Disclaimer: I work for Couchbase and develop the Java DCP client.
If you've already evaluated the Eventing service and decided it doesn't meet your requirements, the Java DCP client might be worth looking into even though it's not officially supported. It's used by the official Couchbase connectors for Kafka, Spark, and Elasticsearch (all of which are open source) and is actively maintained.
If you only care about events that happened since your app started up, usage can be as simple as registering a callback and starting the event stream. Things get a bit more complicated if you need to remember your place in the stream and resume later (to process events that occurred while you were offline, for example), but there's example code for that case too.
The DCP protocol itself is well documented. If you decide to go this route, it might be good to read at least the Architecture section of that documentation. Also be aware that because the Java DCP Client is unsupported, the API can change without notice. (Officially supporting the library and providing a friendlier API are among our long-term goals, but we haven't committed to anything yet.)

Like David, I also work for Couchbase as a product manager for the Eventing service.
I would like to be able to do some data manipulation when documents are updated or created in Couchbase.
Eventing certainly allows anyone to respond to and perform data manipulation on mutations (inserts or upserts) via tiny JavaScript fragments. Just take a look at couchbase-eventing-small-scripts-that-solve-big-problems for a quick introduction and also the eventing-examples from the documentation.
If you do go the Eventing service route on an SGW-enabled bucket, you will need to suppress a duplicate mutation via the crc64() function built into Eventing (for details, go to eventing-language-constructs and search for: Sync Gateway). In addition, if you want Eventing to update the source bucket directly while SGW is enabled on that bucket, there is a more involved workaround (just reach out to me and I will be happy to provide it).
Next you stated:
not sure I can run the Eventing service on our systems
The Eventing service is bundled with the Couchbase Enterprise offering. It provides scalable infrastructure to run simple JavaScript fragments on data or documents as they change or mutate, without the overhead of an SDK. You either add standalone Eventing node(s) to your Couchbase cluster or colocate the Eventing service with other existing nodes.

What is the best way to build and expose a Machine Learning model REST api?

I have been designing REST APIs using the Spring Framework and deploying them on web servers like Tomcat. I have also built machine learning models and used them to make predictions with sklearn in Python.
Now I have a use case in which I want to expose one REST API that builds a machine learning model and another REST API that makes predictions. What architecture would help me achieve this? (An example is Amazon Machine Learning, which exposes REST APIs for generating a model and making predictions.)
I searched round the internet and found following ways:
Write the whole thing in Java - ML model + REST api
Write the whole thing in Python - ML model + REST api
But experimenting with machine learning models and predictions is much easier and better supported in Python, with libraries like sklearn, than in Java. I would really like to use Python for the machine learning part.
I was thinking about an approach in which I write the REST API in Java but use a subprocess to make the Python ML calls. Will that work?
Can someone help me with the probable architectural approaches I could take? Please also suggest the most feasible solution.
Thanks in advance.

As others mentioned, using AzureML is an easy way to deploy an ML model as a web service/REST service. However, you need to build the model on the Azure platform using the graphical interface (drag and drop, configure). People who have already built a model with Python sklearn code may not like this approach. AzureML does have an option to include R and Python scripts, but I did not like it much.
Another option is to store the Python ML model as a .pkl file and deploy it using Flask or Django REST framework. Client apps can then consume the REST service. Here is an excellent tutorial on YouTube:
https://www.youtube.com/watch?v=s-i6nzXQF3g

From what I've done in the past, I suggest 2 options (maybe there are more, but these are the ones that I have implemented):
If you have access to cloud services and the budget for them, Azure ML is an excellent choice: a great ML framework and environment. Creating your REST API takes about two clicks, and you can then consume it using JSON from any language.
Use scikit-learn and code your REST API in Python; it can still be consumed from any language. This option is not as easy or user friendly as Azure ML, because you have to code everything by hand and work with scikit-learn's model persistence functions, but once exposed, you can use it from Java (or anything else). I used this as a reference: https://loads.pickle.me.uk/2016/04/04/deploying-a-scikit-learn-classifier-to-production/
Spark MLlib: I haven't tried this option, but I asked a question about it here on Stack Overflow and got some interesting answers: How to serve a Spark MLlib model?

Well, it depends on the situation in which you use Python for ML.
For classification models like random forest, use your training dataset to build the tree structures and export them as nested dicts. Whatever language you use, once you transform the model object into a plain data structure, you can use it anywhere.
BUT if your situation involves large-scale, real-time, distributed datasets, then as far as I know the best way may be to deploy the whole ML process on servers.
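The nested-dict export idea could look roughly like this on the Java side. This is only a sketch: the key names (feature, threshold, left, right, label), the feature names, and the thresholds are assumptions made for the example, not the output format of any particular library. Each tree is a nested Map, and the forest predicts by majority vote.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ExportedTree {

    /** Builds a leaf node: {"label": ...}. */
    static Map<String, Object> leaf(String label) {
        return Map.of("label", label);
    }

    /** Builds a split node: go left when the feature value is <= threshold. */
    static Map<String, Object> split(String feature, double threshold,
                                     Map<String, Object> left, Map<String, Object> right) {
        return Map.of("feature", feature, "threshold", threshold, "left", left, "right", right);
    }

    /** Walks one tree from the root to a leaf and returns the leaf's label. */
    static String predict(Map<String, Object> node, Map<String, Double> features) {
        while (!node.containsKey("label")) {
            String feature = (String) node.get("feature");
            double threshold = ((Number) node.get("threshold")).doubleValue();
            Double value = features.get(feature);
            if (value == null) {
                throw new IllegalArgumentException("missing feature: " + feature);
            }
            @SuppressWarnings("unchecked")
            Map<String, Object> next =
                    (Map<String, Object>) node.get(value <= threshold ? "left" : "right");
            node = next;
        }
        return (String) node.get("label");
    }

    /** Majority vote over all trees; ties are broken arbitrarily. */
    static String predictForest(List<Map<String, Object>> trees, Map<String, Double> features) {
        Map<String, Integer> votes = new HashMap<>();
        for (Map<String, Object> tree : trees) {
            votes.merge(predict(tree, features), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        // Illustrative tree with made-up thresholds.
        Map<String, Object> tree = split("petal_length", 2.5,
                leaf("setosa"),
                split("petal_width", 1.7, leaf("versicolor"), leaf("virginica")));
        System.out.println(predict(tree, Map.of("petal_length", 5.0, "petal_width", 2.0)));
    }
}
```

In practice the nested structure would be parsed from JSON written by the Python training job, which keeps training in Python and serving in Java without a runtime dependency between them.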

I'm using Node.js for my REST service, and I just call out to the system to interact with the Python script that holds the stored model. You could do the same if you are more comfortable writing your services in Java: just call Runtime.exec or use ProcessBuilder to run the Python script and get the reply back.
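A minimal sketch of the ProcessBuilder variant is shown below. The command is supplied by the caller, so the script name in the usage text (predict.py) and the python3 executable are assumptions for illustration; the helper just runs a command, captures its combined output, and fails on a non-zero exit code. It assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalModelCall {

    /**
     * Runs the command, returns its trimmed output (stdout and stderr merged),
     * and throws if the process exits with a non-zero code.
     * Note: output is drained first, so the timeout only covers the wait
     * after the process has closed its output stream.
     */
    static String run(List<String> command, long timeoutSeconds) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true) // merge stderr into stdout
                    .start();
            process.getOutputStream().close(); // nothing is sent on stdin
            String output = new String(
                    process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IllegalStateException("timed out: " + command);
            }
            if (process.exitValue() != 0) {
                throw new IllegalStateException(
                        "exit code " + process.exitValue() + ": " + output.trim());
            }
            return output.trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for " + command, e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            // Hypothetical invocation; predict.py is not part of this example.
            System.out.println("usage: java ExternalModelCall python3 predict.py <features>");
            return;
        }
        System.out.println(run(List.of(args), 30));
    }
}
```

Starting a Python interpreter per request is slow when the model has to be unpickled each time, so this fits low request rates better than high ones.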

By far, the fastest way to get your sklearn model into an API is FlashAI.io , the service was made for this purpose specifically – I came into this when I was facing the same dilemma recently as I had trained a Scikit-learn model on my local PC using Python, and I wanted to quickly expose it in an API that could be called via an HTTP POST request.
There are other options that were mentioned, all of which require some learning curve, cost in time and effort to simply expose your model. FlashAI lets you expose your model within a couple minutes. Just save your .pkl file and upload it. Your model gets assigned a unique model ID and you just use that to make API requests without any limit. Done and done :)

I have been experimenting with this same task and would like to add another option that does not use a REST API: the format of Apache Spark models is compatible between the Python and Java implementations of the framework. So you could train and build your model in Python (using PySpark), export it, and import it on the Java side for serving/predictions. This works well.
There are, however, some downsides to this approach:
Spark has two separate ML packages, ML and MLlib, for different data formats (DataFrames and RDDs, respectively)
The algorithms for training models in each of these packages are not the same (no model parity)
The models and training classes don't have uniform interfaces. So, you have to be aware of what the expected format is and might have to transform your data accordingly for both training and inference.
Pre-processing for both training and inference has to be the same, so you either need to do this on the Python side for both stages or somehow replicate the pre-processing on the Java side.
So, if you want to avoid the downsides of a REST API solution (availability, network latency) and can live with the ones listed above, this might be the preferable solution.

Feedback on different backends for GWT

I have to re-design an existing application which uses Pylons (Python) on the backend and GWT on the frontend.
In the course of this re-design I can also change the backend system.
I tried to read up on the advantages and disadvantages of various backend systems (Java, Python, etc) but I would be thankful for some feedback from the community.
Existing application:
The existing application was developed with GWT 1.5 (runs now on 2.1) and is a multi-host-page setup.
The Pylons MVC framework defines a set of controllers/host pages in which GWT widgets are embedded ("classical website").
Data is stored in a MySQL database and accessed by the backend with SQLAlchemy/Elixir. Server/client communication is done with RequestBuilder (JSON).
The application is not a typical business application with complex CRUD functionality (transactions, locking, etc.) or a sophisticated permission system (though a simple ACL is required).
The application is used for visualization (charts, tables) of scientific data. The client interface is primarily used to display data in read-only mode. There might be some CRUD functionality but it's not the main aspect of the app.
Only a subset of the scientific data is going to be transferred to the client interface, but this subset is generated from large datasets.
The existing backend uses numpy/scipy to read data from db/files, create matrices and filter them.
The numbers of users accessing or using the app is relatively small, but the burden on the backend for each user/request is pretty high because it has to read and filter large datasets.
Requirements for the new system:
I want to move away from the multi-host-page setup to the MVP architecture (one single host page).
So the backend only serves one host page and acts as data source for AJAX calls.
Data will be still stored in a relational database (PostgreSQL instead of MySQL).
There will be a simple ACL (defines who can see what kind of data) and maybe some CRUD functionality (but it's not a priority).
The size of the datasets is going to increase, so the burden on the backend is probably going to be higher. There won't be many concurrent requests but the few ones have to be handled by the backend quickly. Hardware (RAM and CPU) for the backend server is not an issue.
Possible backend solutions:
Python (SQLAlchemy, Pylons or Django):
Advantages:
Rapid prototyping.
Re-Use of parts of the existing application
Numpy/Scipy for handling large datasets.
Disadvantages:
Dynamically typed language -> debugging can be painful
Server/Client communication (JSON parsing or using 3rd party libraries).
Python GIL -> scaling with concurrent requests ?
Server language (python) <> client language (java)
Java (Hibernate/JPA, Spring, etc)
Advantages:
One language for both client and server (Java)
"Easier" to debug.
Server/Client communication (RequestFactory, RPC) easier to implement.
Performance, multi-threading, etc
Object graph can be transferred (RequestFactory).
CRUD "easy" to implement
Multitier architecture (features)
Disadvantages:
Multitier architecture (complexity, requires a lot of configuration)
Handling of arrays/matrices (not sure if there is a counterpart to numpy/scipy in Java).
Not all features of the Java web application layers/frameworks used (overkill?).
I didn't mention any other backend systems (RoR, etc) because I think these two systems are the most viable ones for my use case.
To be honest I am not new to Java but relatively new to Java web application frameworks. I know my way around Pylons though in the new setup not much of the Pylons features (MVC, templates) will be used because it probably only serves as AJAX backend.
If I go with a Java backend I have to decide whether to do a RESTful service (and clearly separate client from server) or use RequestFactory (tighter coupling). There is no specific requirement for "RESTfulness". In case of a Python backend I would probably go with a RESTful backend (as I have to take care of client/server communication anyways).
Although mainly scientific data is going to be displayed (not part of any Domain Object Graph) also related metadata is going to be displayed on the client (this would favor RequestFactory).
In case of python I can re-use code which was used for loading and filtering of the scientific data.
In case of Java I would have to re-implement this part.
Both backend systems have their advantages and disadvantages.
I would be thankful for any further feedback.
Maybe somebody has experience with both backend and/or with that use case.
thanks in advance

We had the same dilemma in the past.
I was involved in designing and building a system that had a GWT frontend and Java (Spring, Hibernate) backend. Some of our other (related) systems were built in Python and Ruby, so the expertise was there, and a question just like yours came up.
We decided on Java mainly so we could use a single language for the entire stack. Since the same people worked on both the client and server side, working in a single language reduced the need to context-switch when moving from client to server code (e.g. when debugging). In hindsight I feel that we were proven right and that that was a good decision.
We used RPC, which as you mentioned yourself definitely eased the implementation of c/s communication. I can't say that I liked it much though. REST + JSON feels more right, and at the very least creates better decoupling between server and client. I guess you'll have to decide based on whether you expect you might need to re-implement either client or server independently in the future. If that's unlikely, I'd go with the KISS principle and thus with RPC which keeps it simple in this specific case.
Regarding the disadvantages for Java that you mention, I tend to agree on the principle (I prefer RoR myself), but not on the details. The multitier and configuration architecture isn't really a problem IMO - Spring and Hibernate are simple enough nowadays. IMO the advantage of using Java across client and server in this project trumps the relative ease of using python, plus you'll be introducing complexities in the interface (i.e. by doing REST vs the native RPC).
I can't comment on Numpy/Scipy and any Java alternatives. I've no experience there.

Connect PHP code to Java backend

I am implementing a website using PHP for the front end and a Java service as the back end. The two parts are as follows:
PHP front end listens to http requests and interacts with the database.
The Java back end run continuously and responds to calls from the front end.
More specifically, the back end is a daemon that connects and maintain the link to several IM services (AOL, MSN, Yahoo, Jabber...).
Both of the layers will be deployed on the same system (a CentOS box, I suppose) and introducing a middle layer (for instance: using XML-RPC) will reduce the performance (the resource is also rather limited).
Question: Is there a way to link the two layers directly? (no more web services in between)

Since this is communication between two separate running processes, a "direct" call (as in JNI) is not possible. The easiest ways to do such interprocess communication are probably named pipes and network sockets. In both cases, you'll have to define a communication protocol and implement it on both sides. Using a standard protocol such as XML-RPC makes this easier, but is not strictly necessary.

There are generally four patterns for application integration:
via Filesystem, i.e. one producer writes data to a directory monitored by the consumer
via Database, i.e. two applications share a schema or table and use it to swap data
via RMI/RPC/web service/any blocking, sync call from one app to another. For PHP to Java you can pick from the various integration libraries listed above, or use some web services standards like SOAP.
via messaging/any non-blocking, async operation where one app sends a message to another app.
Each of these patterns has pros and cons, but a good rule of thumb is to pick the one with the loosest coupling that you can get away with. For example, if you selected #4 your Java app could crash without also taking down your PHP app.
I'd suggest before looking at specific libraries or technologies listed in the answers here that you pick the right pattern for you, then investigate your specific options.

I have tried the PHP-Java bridge (php-java-bridge.sourceforge.net/pjb/) and it works quite well. Basically, you run a jar file (JavaBridge.jar) that listens on a port (several options are available, such as a local socket or port 8080). Your Java class files must be available to the JavaBridge on the classpath. You then include the file Java.inc in your PHP code and can access the Java classes.

Sure, there are lots of ways, but you said about the limited resource...
IMHO, define your own lightweight RPC-like protocol and use TCP/IP sockets to communicate. In this case there is no need for the full feature set of an RPC framework: you only need to define an API for this particular case and implement it on both sides. That way you can serialize your packets quite compactly. You can even assign a kind of GUID to each remote method and use those to save traffic and speed up the communication.
The advantage of sockets usage is that your solution will be pretty scalable.
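A lightweight socket protocol of the kind described here could be sketched as follows on the Java side. The commands (PING, SEND) and the reply strings are invented for the example; the daemon binds to the loopback address only, so it does not accept connections from other hosts. The request() helper stands in for the PHP front end, which would do the same with fsockopen(), fwrite(), and fgets().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class LineProtocolDaemon implements AutoCloseable {

    private final ServerSocket server;

    /** Binds to the loopback interface; port 0 lets the OS pick a free port. */
    LineProtocolDaemon(int port) {
        try {
            server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Thread acceptor = new Thread(this::acceptLoop, "line-protocol-daemon");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() {
        return server.getLocalPort();
    }

    /** Serves one client at a time: one request line in, one reply line out. */
    private void acceptLoop() {
        while (!server.isClosed()) {
            try (Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         client.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.println(handle(line));
                }
            } catch (IOException e) {
                // Server closed or client dropped; the loop re-checks isClosed().
            }
        }
    }

    /** Hypothetical command set for the example. */
    static String handle(String request) {
        String[] parts = request.trim().split(" ", 3);
        switch (parts[0]) {
            case "PING":
                return "PONG";
            case "SEND":
                return parts.length == 3
                        ? "OK queued for " + parts[1]
                        : "ERR usage: SEND <account> <text>";
            default:
                return "ERR unknown command";
        }
    }

    /** Client helper standing in for the PHP side: sends one line, reads one reply. */
    static String request(int port, String line) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = new PrintWriter(new OutputStreamWriter(
                     socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(line);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            server.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        try (LineProtocolDaemon daemon = new LineProtocolDaemon(0)) {
            System.out.println(request(daemon.port(), "PING"));
        }
    }
}
```

A production daemon would hand each accepted connection to a thread pool instead of serving clients one by one.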

You could try the PHP/Java integration.
Also, if the communication is one-way (something like "sendmail for IM"), you could write out the PHP requests to a file and monitor that in your Java app.

I was also faced with this problem recently. The Resin solution above is actually a complete re-write of PHP in Java along the lines of JRuby, Jython and Rhino. It is called Quercus. But I'm guessing for you as it was for me, tossing out your Apache/PHP setup isn't really an option.
And there are more problems with Quercus besides: the free version is GPL, which is tricky if you're developing commercial software (though not as tricky as Resin would like you to believe, but IANAL), and on top of that the free version doesn't support compiling to byte code, so it's basically an interpreter written in Java.
What I decided on in the end was to just exchange simple messages over HTTP. I used PHP's json_encode()/json_decode() and Java's json-lib to encode the messages in JSON (simple, text-based, good match for data model).
Another interesting and light-weight option would be to have Java generate PHP code and then use PHP include() directive to fetch that over HTTP and execute it. I haven't tried this though.
If it's the actual HTTP calls you're concerned about (for performance), neither of these solutions will help there. All I can say is that I haven't had problems with PHP and Java on the same LAN. My feeling is that it won't be a problem for the vast majority of applications as long as you keep your RPC calls fairly coarse-grained (which you really should do anyway).
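The JSON-over-HTTP exchange described above could be sketched like this on the Java side, using only classes that ship with the JDK (Java 11 or later). The /status path and the payload fields are made up, and the tiny toJson() helper is a stand-in for a real JSON library such as the json-lib mentioned above; it only handles flat maps of strings, numbers, and booleans. On the PHP side, the response would be read with something like json_decode(file_get_contents($url)).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class JsonStatusEndpoint {

    /** Minimal JSON encoder for flat maps; does not escape control characters. */
    static String toJson(Map<String, Object> fields) {
        StringBuilder json = new StringBuilder("{");
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (json.length() > 1) {
                json.append(',');
            }
            Object value = entry.getValue();
            json.append(quote(entry.getKey())).append(':')
                .append(value instanceof Number || value instanceof Boolean
                        ? String.valueOf(value)
                        : quote(String.valueOf(value)));
        }
        return json.append('}').toString();
    }

    static String quote(String text) {
        return '"' + text.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    /** Starts an HTTP server on 127.0.0.1; port 0 lets the OS pick a free port. */
    static HttpServer start(int port) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
            server.createContext("/status", exchange -> {
                Map<String, Object> payload = new LinkedHashMap<>();
                payload.put("service", "jabber");   // hypothetical fields
                payload.put("connected", true);
                byte[] body = toJson(payload).getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "application/json");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client helper standing in for the PHP front end. */
    static String fetch(int port, String path) {
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://127.0.0.1:" + port + path)).GET().build();
        try {
            return HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString())
                    .body();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during request", e);
        }
    }

    public static void main(String[] args) {
        HttpServer server = start(0);
        try {
            System.out.println(fetch(server.getAddress().getPort(), "/status"));
        } finally {
            server.stop(0);
        }
    }
}
```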

Sorry, this is a bit of a quick answer, but I heard the Resin app server has support for integrating Java and PHP.
They claim they can smash PHP and Java together: http://www.caucho.com/resin-3.0/quercus/
I've used Resin for serving J2EE applications, but not for its PHP support.
I'd be interested to hear of such adventures.

Why not use web service?
Build a Java layer, expose it through a web service (Axis, Spring WS, etc.), and have the PHP code access the Java layer using a WS client.
I think it's simple and useful.

I've come across this page which introduces a means to link the two layers. However, it still requires a middle layer (TCP/IP). Moreover, other services may exploit the Java service as well because it accepts all incoming connections.
http://www.devx.com/Java/Article/20509
[Researching...]
