How do I update a Pentaho transformation's connection data using Java?

I’m using Java to build a restful API around the functionality of Pentaho data integration.
I’ve implemented several endpoints, such as creating a repository
containing jobs and transformations, running the jobs and transformations, displaying the image of a job or transformation, and listing the database connections within a repository, plus quite a few more.
I’m trying to build an endpoint that allows for the data sources to be changed, such as the hostname, database name, etc. But I’ve run into an issue when it comes to saving the new connection details.
Here’s a snippet of code I’ve got. I’ve hard coded the values simply for testing purposes.
I loop through an array list containing the DatabaseMeta and then change the values of the fields.
for (DatabaseMeta meta : databaseMeta) {
    meta.setHostName("test_host");
    meta.setDBPort("test_port");
    meta.setDBName("test_database");
    repositoryService.updateDataSource(meta);
}
The updateDataSource() method simply invokes repository.save() (which is part of the org.pentaho.di.repository package) and passes in the DatabaseMeta.
When this method executes, it creates a .kdb file in my repository, with the values I set above, and making a GET request to the endpoint returns the connection details from the new file.
However, I simply want to overwrite the values in the existing transformation connection and return them in the GET request.
Is there any way that this can be achieved?
Any help will be greatly appreciated.
Thanks.

I don't know about the Java integration part, but as far as pure Pentaho goes, the database connection defined in the KTR/KJB needs to reference parameters that are declared in the KTR/KJB (for example, ${DB_HOSTNAME} in the connection's host name field).
This way, whatever parameter values you pass to the KTR/KJB will be swapped into the connection.
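From the Java side, a rough sketch of running such a parameterised transformation (untested, based on the PDI client API; the parameter names DB_HOSTNAME, DB_PORT and DB_NAME are only examples and must match the parameters declared in the KTR itself):

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunParameterizedTrans {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // The KTR's database connection fields reference ${DB_HOSTNAME},
        // ${DB_PORT} and ${DB_NAME} instead of hard-coded values.
        TransMeta transMeta = new TransMeta("my_transformation.ktr");
        Trans trans = new Trans(transMeta);

        // Values supplied by the REST endpoint are substituted into the
        // connection at runtime, so nothing has to be saved back to the repo.
        trans.setParameterValue("DB_HOSTNAME", "test_host");
        trans.setParameterValue("DB_PORT", "test_port");
        trans.setParameterValue("DB_NAME", "test_database");

        trans.execute(null);
        trans.waitUntilFinished();
    }
}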

Related

Is it possible to get lineage metadata from the pipeline in my Data Fusion Action plugin?

I'm trying to get data lineage metadata like data source/schema and data target/schema in a custom Action plugin which gets executed after the successful run of the other steps in the pipeline.
I have a basic Action plugin that executes but I'm having trouble finding a way to get the metadata I'm after.
The use case I'm working on is pushing data lineage into a third party data governance tool.
I would very much appreciate if someone could point me in the right direction!
As suggested in my comment, you might consider using the CDAP system metadata inventory to extract the property you need for the desired entity via CDAP's existing RESTful API, by sending the appropriate HTTP request as explained in the CDAP Metadata Microservices documentation. These entity properties can also describe the lineage of dataset fields, returning the result in JSON format.
However, the appropriate HTTP method depends on the particular use case, so feel free to contribute further and share what you discover.
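For illustration only, a minimal sketch of such an HTTP request from Java (the host, port, namespace, dataset name and time range are placeholders; the exact path should be checked against the CDAP Metadata Microservices documentation):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CdapLineageClient {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint: lineage for one dataset over the last day.
        String endpoint = "http://cdap-host:11015/v3/namespaces/default/"
                + "datasets/myDataset/lineage?start=now-1d&end=now";

        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("GET");

        StringBuilder json = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                json.append(line);
            }
        }
        // The JSON response describes the lineage; from here it can be
        // pushed to the data governance tool.
        System.out.println(json);
    }
}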

Watson Conversation Java SDK how to get from ExportEntity to CreateEntity?

We are trying to automate some tasks in the chatbot/conversation creation process.
A step in this automation is to take an existing conversation (intents, entities and dialogs) and copy it to a newly created Conversation.
While working with the API, I see that getting a workspace (https://www.ibm.com/watson/developercloud/conversation/api/v1/java#get_workspace )
returns different types EntityExport, IntentExport etc...
(http://watson-developer-cloud.github.io/java-sdk/docs/java-sdk-4.2.0/com/ibm/watson/developer_cloud/conversation/v1/model/EntityExport.html )
than what the UpdateWorkspace expects:
CreateEntity, CreateIntent etc...
(http://watson-developer-cloud.github.io/java-sdk/docs/java-sdk-4.2.0/com/ibm/watson/developer_cloud/conversation/v1/model/CreateEntity.html)
Before I start writing a copyTo function, I thought I would ask for any pitfalls? There must be a reason why the objects retrieved via GET are different from the objects you need to provide for an update/create?
These classes are generated to match parameters of REST API endpoints.
It would be much simpler to use an HTTP client to fetch the JSON of the workspace, remove a few unnecessary attributes (workspace_id, status, created, updated, etc.) and send it to the create or update endpoint.
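A rough sketch of that approach (the service URL, version date, credentials and workspace id are placeholders, and the exact set of attributes to strip should be verified against the Conversation v1 API reference):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

public class WorkspaceCopy {
    public static void main(String[] args) throws Exception {
        String auth = Base64.getEncoder()
                .encodeToString("username:password".getBytes(StandardCharsets.UTF_8));
        String base = "https://gateway.watsonplatform.net/conversation/api/v1/workspaces";

        // 1. Fetch the source workspace with export=true.
        HttpURLConnection get = (HttpURLConnection) new URL(
                base + "/SOURCE_WORKSPACE_ID?export=true&version=2018-02-16").openConnection();
        get.setRequestProperty("Authorization", "Basic " + auth);
        String json;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(get.getInputStream(), StandardCharsets.UTF_8))) {
            json = reader.lines().reduce("", String::concat);
        }

        // 2. Remove the read-only attributes the create endpoint will not accept.
        JsonObject ws = new JsonParser().parse(json).getAsJsonObject();
        for (String field : new String[] {"workspace_id", "status", "created", "updated"}) {
            ws.remove(field);
        }

        // 3. Post the cleaned JSON to create the new workspace.
        HttpURLConnection post = (HttpURLConnection) new URL(
                base + "?version=2018-02-16").openConnection();
        post.setRequestMethod("POST");
        post.setRequestProperty("Authorization", "Basic " + auth);
        post.setRequestProperty("Content-Type", "application/json");
        post.setDoOutput(true);
        post.getOutputStream().write(ws.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("Create returned HTTP " + post.getResponseCode());
    }
}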

Optimizing RMI Service Response

I am writing a client-server application where best performance is a must; I am using RMI for client-server communication, and the server uses a MySQL database.
Now on the client side I have a method called
getLinks()
which invokes the same method on the server. The problem is that this method returns about 700 MB of data, which takes some time to get, and some more time to analyse.
And then I'm setting some values for each Link:
for (Link l : myService.getLinks()) l.setSelected(false);
What I have in mind right now is just getting the Link ids first (since this would be a smaller amount of data) and then using an asynchronous method to get each Link by id (each link needs one service call), and then setting the Link values.
Is this the best approach, is there another way of getting RMI data one by one (one method call and more than one return)?
Is there something like (yield return) in C#?
You can also make a pagination method which receives the initial id (or position, if the ids are not consecutive) and the length; this way you will not send all the ids twice.
Are the Link objects remote objects? If not I don't really see the point of the code, as it only sets something locally in the client object which is immediately thrown away.
Assuming they are remote objects, it would be better to ship the entire update to the server and tell it to update the whole collection, something like setLinksSelected(boolean), where the server does the iteration.
But I would also be wary of updating, or even transporting, 700 MB of data via RMI whichever way you do it. That's a lot of data.
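For illustration, a rough sketch that combines both suggestions above (all names are made up for the example): a paginated fetch so the client never pulls 700 MB at once, and a bulk update that runs on the server so only a boolean crosses the wire.

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;

public interface LinkService extends Remote {
    // Page through the links: 'offset' (or a starting id when ids are not
    // consecutive) and 'length' bound the size of each response.
    List<Link> getLinks(int offset, int length) throws RemoteException;

    // Let the server do the iteration; the client only sends a boolean.
    void setLinksSelected(boolean selected) throws RemoteException;
}

// Link instances travel over RMI by value, so they must be Serializable.
class Link implements Serializable {
    private boolean selected;
    public void setSelected(boolean selected) { this.selected = selected; }
    public boolean isSelected() { return selected; }
}

The client would then loop over service.getLinks(offset, pageSize) until an empty page comes back, instead of pulling everything in one call.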

Programmatically sending mule message (formerly: Accessing mule http endpoint from java)

Scroll towards the end for the solution to the topic's problem. The original question was asking for a somewhat different thing.
As a part of a larger process, I need to fetch and link two related sets of data together. The way that the data is retrieved (Dynamics CRM, n:n relationships..) forces us to retrieve the second set of data again so that it will have all the necessary information. During a part of a larger transformation of this data, I would like to access the HTTP endpoint that is used to fetch the data from the CRM, retrieve the second set of data and process it. I can get the endpoint through DefaultEndpointFactory like so:
DefaultEndpointFactory def = new DefaultEndpointFactory();
def.getInboundEndpoint("uri").getConnector();
But there is no method to actually send the MuleMessage.
Solved:
The problem is that you can not set inbound properties on the MuleMessage, and the flow depends on some of those to function (path, query params, etc.).
It seems you are able to set inbound-scoped properties with this:
m.setProperty("test", (Object)"test", PropertyScope.INBOUND);
Is there a way to make this approach work, or an alternative way to access the flow? I tried using the MuleContext to get the flow:
muleContext.getRegistry().lookupFlowConstruct("myflow");
But it did not contain anything that looked useful.
Solution:
As David Dossot suggested in a comment on his answer, I was able to solve this with the MuleClient's request method.
muleContext.getClient().request(url, timeout);
Then I construct the URL as usual with GET parameters etc.
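As an illustration only, a sketch of that solution wrapped in a Java component (Mule 3 API; the flow URL and query parameters are placeholders for whatever the CRM endpoint expects):

import org.mule.api.MuleEventContext;
import org.mule.api.MuleMessage;
import org.mule.api.lifecycle.Callable;

public class FetchSecondDataSet implements Callable {
    @Override
    public Object onCall(MuleEventContext eventContext) throws Exception {
        // Build the URL with the GET parameters the target flow expects.
        String url = "http://localhost:8081/crm/related?entity=contact&page=1";
        long timeoutMs = 30000;

        // The Mule client creates the endpoint behind the scenes.
        MuleMessage response =
                eventContext.getMuleContext().getClient().request(url, timeoutMs);
        return response.getPayload();   // second data set, ready for processing
    }
}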
I'm not 100% sure about what you're trying to achieve but anyway, the correct way of using Mule transports from Java code is to use the MuleClient, which you can access with muleContext.getClient().
For example, the send method allows you to pass a properties map whose entries are automatically added to the inbound scope. Behind the scenes, Mule takes care of creating the endpoint needed for the operation.
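A minimal, hedged fragment of that send variant (Mule 3 API; the URL, payload and property name are only examples, not from the original flow):

import java.util.HashMap;
import java.util.Map;
import org.mule.api.MuleContext;
import org.mule.api.MuleException;
import org.mule.api.MuleMessage;

public class SendWithProperties {
    public MuleMessage call(MuleContext muleContext) throws MuleException {
        Map<String, Object> properties = new HashMap<String, Object>();
        properties.put("entityType", "contact");   // example property the flow reads

        return muleContext.getClient()
                .send("http://localhost:8081/crm/related", "request-body", properties);
    }
}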
Regarding the flow: what are you trying to do with it? Invoke it?

Search optimization when data owner is someone else

In my project, we have 2 REST calls which take too much time, so we are planning to optimize that. Here is how it works currently: we make the 1st call to system A and then pass the response to system B for further processing. Once we get the response from system B, we have to manipulate it further before passing it to the UI layer, and this entire process takes a lot of time. We planned on using Solr/Lucene, but since we are not the data owners, we can't implement that. Can someone please shed some light on how best this can be handled? We are using Spring MVC and Spring Web Flow. Thanks in advance!!
[EDIT:] This is not the actual scenario and I am writing this as an example for better understanding. Think of this as making a store locator call for a particular zip to get a list of 100 stores and then sending those 100 stores to another call to get a list of inventory etc. So, this list of stores would change for every zip code and also the inventory there.
If your query parameters to System A / System B are frequently the same, you can add a cache framework to your code. If you use Spring 3, you can use the cache easily with a @Cacheable annotation on the code calling System A. See:
http://static.springsource.org/spring/docs/3.1.0.M1/spring-framework-reference/html/cache.html
The cache abstraction will cache the result of the annotated method, including any processing done inside it, so repeat requests skip both the remote call and that processing.
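A minimal sketch, assuming the Spring 3.1+ cache abstraction is enabled (a CacheManager bean plus <cache:annotation-driven/>); the service, method and cache names are illustrative, not from the original project:

import java.util.Collections;
import java.util.List;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class StoreLocatorService {

    // The first call for a given zip code goes out to system A; repeat calls
    // with the same zip are answered from the "storesByZip" cache.
    @Cacheable("storesByZip")
    public List<String> findStoreIds(String zipCode) {
        return callSystemA(zipCode);
    }

    private List<String> callSystemA(String zipCode) {
        // Placeholder for the slow REST call to system A.
        return Collections.emptyList();
    }
}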
