Wikidata Toolkit: Is it possible to access properties of entities? - java

First of all, I want to clarify that my experience working with Wikidata is very limited, so feel free to correct me if any of my terminology is wrong.
I've been playing with Wikidata Toolkit, more specifically its wdtk-wikibaseapi module. This allows you to get entity information and its various properties like so:
WikibaseDataFetcher wbdf = WikibaseDataFetcher.getWikidataDataFetcher();
EntityDocument q42 = wbdf.getEntityDocument("Q42");
List<StatementGroup> groups = ((ItemDocument) q42).getStatementGroups();
for (StatementGroup g : groups) {
    List<Statement> statements = g.getStatements();
    for (Statement s : statements) {
        System.out.println(s.getMainSnak().getPropertyId().getId());
        System.out.println(s.getValue());
    }
}
The above gets me the entity Douglas Adams and all the properties on its page: https://www.wikidata.org/wiki/Q42
Now Wikidata Toolkit also has the ability to load and process dump files, meaning you can download a dump to your local machine and process it using the DumpProcessingController class in the wdtk-dumpfiles library. I'm just not sure what is meant by processing.
Can anyone explain what processing means in this context?
Can you do something similar to what was done with wdtk-wikibaseapi in the example above, but using a local dump file and wdtk-dumpfiles, i.e. get an entity and its respective properties? I don't want to get the info from an online source, only from the dump (offline).
If this is not possible using Wikidata Toolkit, could you point me to something that can get me started on getting entities and their properties from a Wikidata dump file? I am using Java.
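For what it's worth, "processing" here appears to mean streaming the dump through a listener you register: you implement EntityDocumentProcessor and the DumpProcessingController invokes it once per entity in the dump, so you can filter for the entity you want entirely offline. Below is a minimal sketch modeled on the patterns in the wdtk-examples module; exact method names may differ between Toolkit versions, and offline mode assumes the dump file is already present in the local download directory.

import org.wikidata.wdtk.datamodel.interfaces.EntityDocumentProcessor;
import org.wikidata.wdtk.datamodel.interfaces.ItemDocument;
import org.wikidata.wdtk.datamodel.interfaces.PropertyDocument;
import org.wikidata.wdtk.datamodel.interfaces.Statement;
import org.wikidata.wdtk.datamodel.interfaces.StatementGroup;
import org.wikidata.wdtk.dumpfiles.DumpProcessingController;

public class DumpSketch {
    public static void main(String[] args) {
        DumpProcessingController controller = new DumpProcessingController("wikidatawiki");
        controller.setOfflineMode(true); // never download; use the local dump only

        controller.registerEntityDocumentProcessor(new EntityDocumentProcessor() {
            @Override
            public void processItemDocument(ItemDocument item) {
                // Called for every item in the dump; filter for the one we want.
                if ("Q42".equals(item.getItemId().getId())) {
                    for (StatementGroup g : item.getStatementGroups()) {
                        for (Statement s : g.getStatements()) {
                            System.out.println(s.getMainSnak().getPropertyId().getId());
                        }
                    }
                }
            }

            @Override
            public void processPropertyDocument(PropertyDocument property) {
                // not needed for this sketch
            }
        }, null, true);

        controller.processMostRecentJsonDump();
    }
}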

Related

Connecting to an embedded OrientDB server in Java

I'm looking to run a Java process on several machines, each of which will need to start a local OrientDB server, load a graph, perform our processes, then close. As such, I need to be able to start the embedded OServer process from within Java.
There is plenty of advice on how to do so, including SO questions, however most of it seems to be out of date (so please don't mark this as a duplicate prematurely). The most directly relevant seems to be this, however it doesn't work - at least for me. With the code below, I get the subsequent error:
try {
    final OServer server = OServerMain.create();
    server.startup(server.getClass().getResourceAsStream("/orientdb-server-config.xml"));
    server.activate();
} catch (Exception e) {
    e.printStackTrace();
    System.exit(-1);
}
2021-12-07 21:47:39:323 INFO Loading configuration from input stream [OServerConfigurationLoaderXml]
2021-12-07 21:47:39:633 INFO OrientDB Server v3.2.3 (build dc98198215aa57baf29b32adb657dc3733acdb55, branch develop) is starting up... [OServer]
java.lang.NullPointerException
at com.orientechnologies.orient.core.Orient.onEmbeddedFactoryInit(Orient.java:957)
at com.orientechnologies.orient.core.db.OrientDBEmbedded.<init>(OrientDBEmbedded.java:97)
at com.orientechnologies.orient.core.db.OrientDBInternal.embedded(OrientDBInternal.java:119)
at com.orientechnologies.orient.server.OServer.startupFromConfiguration(OServer.java:388)
at com.orientechnologies.orient.server.OServer.startup(OServer.java:314)
at ems.definitions.instance.Graph.<init>(Graph.java:47)
I am using OrientDB version 3.2.3, the 'ALL' .jar downloaded from here. Note that this jar does not contain the configuration file orientdb-server-config.xml, so I downloaded it directly from the source on GitHub.
Is there an issue with my specific implementation, with my approach in general, or with the default config file I'm using? I look forward to hearing your thoughts.
The issue was three-fold:
I was using the 'ALL' .jar provided by the website. Instead I needed to use the libraries provided in the full source.
I did not account for the fact that when the code failed, it did not delete the half-created database, so subsequent runs could not execute. I had to implement a temporary fail-safe that drops the database prior to initialisation to avoid this.
I was using the wrong(?) strategy in general.
My working method is as below.
orientDB = new OrientDB("embedded:/tmp/", "admin", "adminpwd", OrientDBConfig.defaultConfig());

/** THIS IS VERY MUCH ONLY FOR LOCAL TESTING **/
if (orientDB.exists(name))
    orientDB.drop(name);

// if the database does not already exist, create it
if (!orientDB.exists(name))
    orientDB.execute("create database " + name
            + " PLOCAL users ( admin identified by 'adminpwd' role admin)");

db = orientDB.open(name, "admin", "adminpwd");

How can we get the forest data directory in MarkLogic?

I am trying to get the forest data directory in MarkLogic. I used the following method, running queries as admin via the Server Evaluation Call interface. Is this the right approach? If not, please let me know how I can get the forest data directory.
// Note: the Admin API library module must be imported explicitly in an eval'd query.
ServerEvaluationCall forestDataDirCall = client.newServerEval()
    .xquery("import module namespace admin = 'http://marklogic.com/xdmp/admin' at '/MarkLogic/admin.xqy'; "
            + "admin:forest-get-data-directory(admin:get-configuration(), "
            + "admin:forest-get-id(admin:get-configuration(), '" + forestName + "'))");
for (EvalResult forestDataDirResult : forestDataDirCall.eval()) {
    String forestDataDir = forestDataDirResult.getString();
    System.out.println("forestDataDir is " + forestDataDir);
}
I see no reason to hit the server evaluation endpoint to ask the server this question. MarkLogic comes with a robust REST-based Management API, including getters for almost all items of interest.
Knowing that, you can use what is documented here:
http://yourserver:8002/manage/v2/forests
Results can be in JSON, XML or HTML
It is the getter for forest configurations. Which forests you care about can be found by iterating over all forests, or by going through the database config and from there to its forests. It all depends on what you already know from the outside.
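For example, here is a minimal Java sketch against that endpoint (host, port and credentials are placeholders; the Manage app server uses digest authentication by default, which java.net.Authenticator handles for HttpURLConnection). The data directory itself shows up under /manage/v2/forests/{name}/properties:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.Authenticator;
import java.net.HttpURLConnection;
import java.net.PasswordAuthentication;
import java.net.URL;

public class ForestList {
    public static void main(String[] args) throws Exception {
        // Placeholder credentials for the Manage app server.
        Authenticator.setDefault(new Authenticator() {
            @Override
            protected PasswordAuthentication getPasswordAuthentication() {
                return new PasswordAuthentication("admin", "admin".toCharArray());
            }
        });
        URL url = new URL("http://yourserver:8002/manage/v2/forests?format=json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // JSON list of forests with their URIs
            }
        }
    }
}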
References:
Management API
Scripting Administrative Tasks

Running Talend Job from within Java application

I am developing a web app using Spring MVC. Simply put, a user uploads a file, which can be of different types (.csv, .xls, .txt, .xml), and the application parses this file and extracts data for further processing. The problem is that the format of the file can change frequently, so there must be some way for quick and easy customization. Being a bit familiar with Talend, I decided to give it a shot and use it as the ETL tool for my app. This short tutorial shows how to run a Talend job from within a Java app - http://www.talendforge.org/forum/viewtopic.php?id=2901
However, jobs created using Talend read from/write to physical files, directories or databases. Is it possible to modify a Talend job so that it can be given a Java object as a parameter and return a Java object, just like a usual Java method?
For example something like:
String[] param = new String[]{"John Doe"};
String talendJobOutput = teaPot.myjob_0_1.myJob.main(param);
where teaPot.myjob_0_1.myJob is the Talend job integrated into my app.
I did something similar, I guess. I created a mapping in Talend using tMap and exported it as a Talend job (a Java SE program). If you include the libraries of that job, you can run the Talend job as described by others.
To pass arbitrary Java objects you can use the following methods, which are present in every Talend job:
public Object getValueObject() {
    return this.valueObject;
}

public void setValueObject(Object valueObject) {
    this.valueObject = valueObject;
}
In your job you have to cast this object, e.g. you can pass in a List of HashMaps and use Java reflection to populate rows. Use tJavaFlex or a custom component for that, as in the sketch below.
Using this method I can adjust the mapping of my data visually in Talend, but still use the generated code as a library in my Java application.
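To make that concrete, here is a hedged sketch of the calling side; the generated job's package and class name (myjob_0_1.MyJob) are assumptions from the export, and the cast back happens inside the job, e.g. in a tJavaFlex begin section:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TalendCaller {
    public static void main(String[] args) {
        // Build the payload to hand to the job.
        List<Map<String, Object>> rows = new ArrayList<>();
        Map<String, Object> row = new HashMap<>();
        row.put("name", "John Doe");
        rows.add(row);

        // Generated job class; package and class name are assumptions.
        myjob_0_1.MyJob job = new myjob_0_1.MyJob();
        job.setValueObject(rows);  // hand the object to the job
        job.runJob(new String[0]); // execute

        // Inside the job (e.g. a tJavaFlex begin section), cast it back:
        // List<Map<String, Object>> input = (List<Map<String, Object>>) getValueObject();
    }
}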
Now that I better understand what you want, I think this is NOT possible, because Talend's architecture is that of a standalone app, with a "main" entry point much like the Java main() method:
public String[][] runJob(String[] args) {
    int exitCode = runJobInTOS(args);
    String[][] bufferValue = new String[][] { { Integer.toString(exitCode) } };
    return bufferValue;
}
That is to say: the Talend execution entry point only accepts a String array as input and doesn't return anything as output (except a system return code).
So you won't be able to link to the Talend (generated) code as a library, only use it as an isolated tool that you can parameterize (using context vars, see my other response) before launching.
You can see that in the Talend help center and forum, the only integration described is as an "external" job execution:
Talend knowledge base "Calling a Talend Job from an external Java application" article
Talend Community Forum "Java Object to Talend" topic
Maybe you have to rethink the architecture of your application if you want to use Talend as the ETL tool for your purpose.
Now from the Talend ETL point of view: if you want to parameterize the execution environment of your jobs (for example the physical directory of the uploaded files), you should use context variables, which can be loaded at execution time from a configuration file, as mentioned here:
https://help.talend.com/display/TalendOpenStudioforDataIntegrationUserGuide53EN/2.6.6+Context+settings
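For instance, context variables can also be overridden per run via --context_param arguments when invoking the exported job. A minimal sketch, where the variable name "uploadDir" and the generated job class are assumptions:

public class TalendContextCaller {
    public static void main(String[] args) {
        // "uploadDir" must be defined as a context variable in the job design;
        // the job class name is an assumption from the export.
        String[] jobArgs = { "--context_param", "uploadDir=/data/uploads" };
        myjob_0_1.MyJob job = new myjob_0_1.MyJob();
        job.runJob(jobArgs);
    }
}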

Generating BPEL files programmatically?

Is there a way to generate BPEL programmatically in Java?
I tried using the BPEL Eclipse Designer API to write this code:
Process process = null;
try {
    Resource.Factory.Registry reg = Resource.Factory.Registry.INSTANCE;
    Map<String, Object> m = reg.getExtensionToFactoryMap();
    m.put("bpel", new BPELResourceFactoryImpl()); // it works with XMLResourceFactoryImpl()

    // create resource
    URI uri = URI.createFileURI("myBPEL2.bpel");
    ResourceSet rSet = new ResourceSetImpl();
    Resource bpelResource = rSet.createResource(uri);

    // create/populate process
    process = BPELFactory.eINSTANCE.createProcess();
    process.setName("myBPEL");
    Sequence mySeq = BPELFactory.eINSTANCE.createSequence();
    mySeq.setName("mainSequence");
    process.setActivity(mySeq);

    // save resource
    bpelResource.getContents().add(process);
    Map<String, String> map = new HashMap<String, String>();
    map.put("bpel", "http://docs.oasis-open.org/wsbpel/2.0/process/executable");
    map.put("tns", "http://matrix.bpelprocess");
    map.put("xsd", "http://www.w3.org/2001/XMLSchema");
    bpelResource.save(map);
}
catch (Exception e) {
    e.printStackTrace();
}
but I received an error:
INamespaceMap cannot be attached to an eObject ...
I read this message by Simon:
I understand that using the BPEL model outside of eclipse might be desirable, but it was never intended by us. Thus, this isn't supported
Is there any other API that can help?
You might want to give JAXB a try. It helps you to transform the official BPEL XSD into Java classes. You use those classes to construct your BPEL document and output it.
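A rough sketch of that route, under the assumption that you first run xjc on the BPEL executable-process XSD; the generated package and class names (org.example.bpel, TProcess, ObjectFactory.createProcess) are assumptions, since they depend on the xjc run:

// Generate classes first, e.g.:
//   xjc -p org.example.bpel ws-bpel_executable.xsd
import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;

import org.example.bpel.ObjectFactory; // assumed generated package
import org.example.bpel.TProcess;      // assumed generated class

public class BpelJaxbSketch {
    public static void main(String[] args) throws Exception {
        ObjectFactory factory = new ObjectFactory();
        TProcess process = factory.createTProcess();
        process.setName("myBPEL");

        JAXBContext ctx = JAXBContext.newInstance(ObjectFactory.class);
        Marshaller m = ctx.createMarshaller();
        m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        // Wrap in the root element and write the document out.
        m.marshal(factory.createProcess(process), new File("myBPEL2.bpel"));
    }
}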
I had exactly the same problem with BPELUnit [1], so I started a module in BPELUnit that has the first things necessary for generating and reading BPEL models [2], although it is far from complete. Only BPEL 2.0 is supported (1.1 will follow later) and handlers are currently not supported (but will be added). It is under active development, because BPELUnit's code-coverage component will be based on it, so it will become BPEL-feature-complete over time. You are warmly invited to contribute if you need gaps closed earlier.
You can check it out from GitHub or grab the Maven artifact.
As of now there is no documentation, but you can have a look at the JUnit tests that read and write processes.
If this is not suitable for you, I'd like to share some experiences:
Do not use JAXB: You will need to read and write XML Namespaces which are not preserved with JAXB. That's why I have chosen XMLBeans. DOM would be the other alternative that I can think of.
The inheritance in the XML Schema is not really developer-friendly. That's why there are dedicated interface structures and wrappers around the XMLBeans-generated classes.
Daniel
[1] http://www.bpelunit.net
[2] https://github.com/bpelunit/bpelunit/tree/master/net.bpelunit.model.bpel
This has been solved using the unify framework API, after adding the necessary classes to handle correlation. BPELUnit, mentioned by @Daniel, seems to be another alternative.
The Eclipse BPEL API is based on an EMF Model. So you could generate your own artifacts using JET or Xpand based on that. This way there is no requirement to run inside Eclipse.
Although you may not be able to use the BPEL model outside of Eclipse, have you considered moving parts of your application inside it?
The BPEL XML Schemas are listed in the appendix of the spec. So you could also base your work on that and integrate with existing BPEL applications where necessary.
In case anyone is looking to solve the above problem while still running inside the Eclipse environment:
The problem can be resolved as stated by Luca Pino here by adding:
AdapterRegistry.INSTANCE.registerAdapterFactory(BPELPackage.eINSTANCE, BasicBPELAdapterFactory.INSTANCE);
before the resource creation line i.e.
Resource bpelResource = rSet.createResource(uri);
Note: Another solution, to the same problem, also stating how to resolve the dependencies to make this code work, can be found in my other answer here.

How to copy a schema in MySQL using Java

In my application I need to copy a schema with its tables and stored procedures from a base schema to a new schema.
I am looking for a way to implement this.
I looked into executing mysqldump from the command line, however it is not a good solution because I have a client-side application, and this would require an installation of the server tools on the client side.
The other option is my own implementation using SHOW queries.
The problem here is that I need to implement it all from scratch, and the most problematic part is that I will need to arrange the order of the tables according to their foreign keys (because if there is a foreign key in a table, the table it points to needs to be created first).
I also thought of creating a stored procedure to do this, but stored procedures in MySQL can't access the disk.
Perhaps someone has an idea of how this can be implemented another way?
You can try using Apache DdlUtils. It provides a way to export the DDL from a database to an XML file and re-import it (note: as far as I know, this covers tables but not stored procedures).
The API usage page has examples of how to export a schema to an XML file, read it back, and apply it to a new database. I have reproduced those functions below, along with a small snippet on how to use them to accomplish what you are asking for. You can use this as a starting point and optimize it further.
DataSource sourceDb;
DataSource targetDb;

writeDatabaseToXML(readDatabase(sourceDb), "database-dump.xml");
changeDatabase(targetDb, readDatabaseFromXML("database-dump.xml"));

public Database readDatabase(DataSource dataSource)
{
    Platform platform = PlatformFactory.createNewPlatformInstance(dataSource);
    return platform.readModelFromDatabase("model");
}

public void writeDatabaseToXML(Database db, String fileName)
{
    new DatabaseIO().write(db, fileName);
}

public Database readDatabaseFromXML(String fileName)
{
    return new DatabaseIO().read(fileName);
}

public void changeDatabase(DataSource dataSource, Database targetModel)
{
    Platform platform = PlatformFactory.createNewPlatformInstance(dataSource);
    platform.createTables(targetModel, true, false);
}
You can use information_schema to fetch the foreign key information and build a dependency tree. Here is an example, and a sketch of the query appears below.
But I think you are trying to solve something that has been solved many times before. I'm not familiar with Java, but there are ORM tools (for Python at least) that can inspect your current database and create a corresponding model in Java (or Python). Then you can deploy that model into another database.
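As a rough illustration of the information_schema route, this JDBC sketch lists which table references which, so CREATE TABLE statements can be ordered accordingly (connection URL, credentials and schema name are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class FkDependencies {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/information_schema", "user", "pass")) {
            String sql = "SELECT TABLE_NAME, REFERENCED_TABLE_NAME "
                       + "FROM KEY_COLUMN_USAGE "
                       + "WHERE TABLE_SCHEMA = ? AND REFERENCED_TABLE_NAME IS NOT NULL";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, "base_schema"); // the source schema (placeholder)
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // TABLE_NAME depends on REFERENCED_TABLE_NAME:
                        // create the referenced table first.
                        System.out.println(rs.getString(1) + " -> " + rs.getString(2));
                    }
                }
            }
        }
    }
}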
