I have a couple of lines of (I think) RDF data:
<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class>
<http://www.test.com/meta#0002> <http://www.test.com/meta#CONCEPT_hasType> "BEAR"^^<http://www.w3.org/2001/XMLSchema#string>
Each line has 3 items in it. I want to pull out the part after the # in each item, so that would result in:
0001, type, Class
0002, CONCEPT_hasType, (BEAR, string)
Is there a library out there (java or scala) that would do this split for me? Or do I just need to shove string.splits and assumptions in my code?
Most RDF libraries will have something to facilitate this. For example, if you parse your RDF data using Eclipse RDF4J's Rio parser, you will get back each line as an org.eclipse.rdf4j.model.Statement, with a subject, predicate and object value. The subject in both your lines will be an org.eclipse.rdf4j.model.IRI, which has a getLocalName() method you can use to get the part after the last #. See the Javadocs for more details.
Assuming your data is in N-Triples syntax (which it seems to be given the example you showed us), here's a simple bit of code that does this and prints it out to STDOUT:
// imports: java.io.*, org.eclipse.rdf4j.model.*, org.eclipse.rdf4j.rio.RDFFormat, org.eclipse.rdf4j.rio.Rio

// parse the file into a Model object
InputStream in = new FileInputStream(new File("/path/to/rdf-data.nt"));
Model model = Rio.parse(in, "", RDFFormat.NTRIPLES); // the second argument is the base URI, which can stay empty here

for (Statement st : model) {
    Resource subject = st.getSubject();
    if (subject instanceof IRI) {
        // an IRI has a local name: the part after the last #
        System.out.print(((IRI) subject).getLocalName());
    } else {
        // blank nodes have no local name, so fall back to the full string value
        System.out.print(subject.stringValue());
    }
    // ... etc for predicate and object (the 2nd and 3rd elements in each RDF statement)
}
Update: if you don't want to read the data from a file but simply use a String, you can just use a java.io.StringReader instead of an InputStream:
StringReader r = new StringReader("<http://www.test.com/meta#0001> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Class> .");
Model model = Rio.parse(r, "", RDFFormat.NTRIPLES);
Alternatively, if you don't want to parse the data at all and just want to do String processing, there is an org.eclipse.rdf4j.model.util.URIUtil class to which you can just feed a URI string, and it gives you back the index of the local name part:
String uri = "http://www.test.com/meta#0001";
String localpart = uri.substring(URIUtil.getLocalNameIndex(uri)); // will be "0001"
(disclosure: I am on the RDF4J development team)
Related
I am writing a small java method that needs to read test data from a file on my win10 laptop.
The test data has not been formed yet but it will be text based.
I need to write a method that reads the data and analyses it character by character.
My questions are:
What is the simplest format to create and read the file? I was looking at JSON, something that does not look particularly complex, but is it the best for a very simple application?
My second question (and I am a novice): if the file is in a text file on my laptop, how do I tell my Java code where to find it? How do I ask Java to navigate the win10 operating system?
You can also map the text file into Java objects (it depends on your text file's structure).
For example, say we have a text file that contains a person's name and family name on each line, like:
Foo,bar
John,doe
So to parse the above text file and map it into a Java object we can:
1- Create a Person Object
2- Read and parse the file (line by line)
Create Person Class
public class Person {
    private String name;
    private String family;
    // setters and getters
}
Read The File and Parse line by line
// imports: java.io.IOException, java.nio.file.*, java.util.List, java.util.stream.Collectors,
// and com.google.common.base.Splitter (from Guava)

public static void main(String[] args) throws IOException {
    // Read the file, parse it line by line and map each line into a Person object
    List<Person> personList = Files
            .lines(Paths.get("D:\\Project\\Code\\src\\main\\resources\\person.txt"))
            .map(line -> {
                // Split the line on "," into its words, e.g. "John,doe" -> [John, doe]
                List<String> nameAndFamily = Splitter.on(",").trimResults().omitEmptyStrings().splitToList(line);
                // Create a new Person from those words
                Person person = new Person();
                person.setName(nameAndFamily.get(0));
                person.setFamily(nameAndFamily.get(1));
                return person;
            })
            .collect(Collectors.toList());

    // Process the person list
    personList.forEach(person -> {
        // You can do whatever you want with each person; here we just print the fields
        System.out.println(person.getName());
        System.out.println(person.getFamily());
    });
}
Regarding your first question, I can't say much without knowing anything about the data you would like to write/read.
For your second question, you would normally do something like this:
String pathToFile = "C:/Users/SomeUser/Documents/testdata.txt";
InputStream in = new FileInputStream(pathToFile);
As your data gains more complexity, you should probably think about using a defined format such as JSON or YAML, if that is possible.
Hope this helps a bit. Good luck with your project.
As for the format the text file needs to take, you should elaborate a bit on the kind of data. So I can't say much there.
But to navigate the file system, you just need to write the path a bit differently:
keep the drive letter and its colon ":" at the beginning of the path
replace each backslash with a forward slash (or escape each backslash as "\\")
then you should be set.
So for example...
C:\users\johndoe\documents\projectfiles\mydatafile.txt
becomes
C:/users/johndoe/documents/projectfiles/mydatafile.txt
With this path, you can use all the IO classes for file manipulation.
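For example, a minimal sketch of reading such a file line by line (the path and file name here are just placeholders, not from the question):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadTestData {
    public static void main(String[] args) throws IOException {
        // forward slashes work fine on Windows; keep the drive letter and its colon
        List<String> lines = Files.readAllLines(Paths.get("C:/users/johndoe/documents/projectfiles/mydatafile.txt"));
        for (String line : lines) {
            // analyse the line character by character
            for (char c : line.toCharArray()) {
                System.out.print(c + " ");
            }
            System.out.println();
        }
    }
}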
I am writing an application/class that will take in a template text file and a JSON value and return interpolated text back to the caller.
The format of the input template text file needs to be determined. For example: my name is ${fullName}
Example of the JSON:
{"fullName": "Elon Musk"}
Expected output:
"my name is Elon Musk"
I am looking for a widely used library/formats that can accomplish this.
What format should the template text file be?
What library would support the template text file format defined above and accept JSON values?
It's easy to build my own parser, but there are many edge cases that need to be taken care of, and I do not want to reinvent the wheel.
For example, if we have a slightly complex JSON object with lists, nested values etc. then I will have to think about those as well and implement it.
I have always used the org.json library, found at http://www.json.org/.
It makes it really easy to go through JSON Objects.
For example if you want to make a new object:
JSONObject person = new JSONObject();
person.put("fullName", "Elon Musk");
person.put("phoneNumber", 3811111111);
The JSON Object would look like:
{
"fullName": "Elon Musk",
"phoneNumber": 3811111111
}
Retrieving a value from the object is similar:
String name = person.getString("fullName");
You can read out the file with BufferedReader and parse it as you wish.
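For example, a minimal sketch of that approach (the file name values.json is just a placeholder):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.json.JSONObject;

public class ReadJsonFile {
    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new FileReader("values.json"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
        }
        // parse the whole file content as a JSON object
        JSONObject person = new JSONObject(sb.toString());
        System.out.println(person.getString("fullName")); // prints "Elon Musk"
    }
}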
Hopefully I helped out. :)
This is how we do it.
Map inputMap = ["fullName": "Elon Musk"]
String finalText = StrSubstitutor.replace("my name is \${fullName}", inputMap)
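That snippet is Groovy; in plain Java the same idea looks roughly like this (a sketch assuming StrSubstitutor from Apache Commons Lang, which the snippet above appears to use):

import java.util.HashMap;
import java.util.Map;
import org.apache.commons.lang3.text.StrSubstitutor;

public class TemplateExample {
    public static void main(String[] args) {
        Map<String, String> inputMap = new HashMap<>();
        inputMap.put("fullName", "Elon Musk");
        // replaces ${fullName} in the template with the value from the map
        String finalText = StrSubstitutor.replace("my name is ${fullName}", inputMap);
        System.out.println(finalText); // my name is Elon Musk
    }
}

In newer code the equivalent class is StringSubstitutor from Apache Commons Text.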
You can try this:
https://github.com/alibaba/fastjson
Fastjson is a Java library that can be used to convert Java Objects into their JSON representation. It can also be used to convert a JSON string to an equivalent Java object. Fastjson can work with arbitrary Java objects including pre-existing objects that you do not have source-code of.
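For example, a minimal sketch of mapping the JSON from the question onto a Java object with Fastjson (the Person class here is a hypothetical holder, not part of the library):

import com.alibaba.fastjson.JSON;

public class FastjsonExample {
    // hypothetical holder class for the JSON in the question
    public static class Person {
        private String fullName;
        public String getFullName() { return fullName; }
        public void setFullName(String fullName) { this.fullName = fullName; }
    }

    public static void main(String[] args) {
        Person p = JSON.parseObject("{\"fullName\": \"Elon Musk\"}", Person.class);
        System.out.println("my name is " + p.getFullName());
        // and back to a JSON string
        System.out.println(JSON.toJSONString(p));
    }
}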
My current Android project consumes many Json web services.
I have no control over the Json content.
I wish to persist this Json directly into my applications local Realm database.
The issue is the Json Field Names Are All Capitalised.
I do not wish my Realm DTO objects to have capitalised field names, as that's just WRONG.
How can I transform the Capitalised field names to acceptable Java field name format?
Is there any Json pre processing libraries that will perform the required transformation of Capitalised field names?
I realise I can use Jackson/GSON-type libraries to solve this issue; however, that means transforming the JSON to Java POJOs before I can persist the data.
The JSON field names are like "ThisIsAFieldName".
What I want is to transform them to "thisIsAFieldName".
I think you should really consider letting your JSON deserializer handle this, but if that really isn't a possibility, you can always use good old string manipulation:
// imports: java.util.regex.Matcher, java.util.regex.Pattern
String input; // your JSON input
Pattern p = Pattern.compile("\"([A-Z])([^\"]*\"\\s*:)"); // matches '"Xxxx" :'
Matcher m = p.matcher(input);
StringBuffer output = new StringBuffer();
while (m.find()) {
    // lowercase the first letter of the field name and keep the rest ($2) as-is
    m.appendReplacement(output, String.format("\"%s$2", m.group(1).toLowerCase()));
}
m.appendTail(output);
Ideone test.
I get a stream of values as CSV; based on some condition I need to generate an XML document including only a subset of the values from the CSV. For example:
Input : a:value1, b:value2, c:value3, d:value4, e:value5.
if (condition1)
XML O/P = <Request><ValueOfA>value1</ValueOfA><ValueOfE>value5</ValueOfE></Request>
else if (condition2)
XML O/P = <Request><ValueOfB>value2</ValueOfB><ValueOfD>value4</ValueOfD></Request>
I want to externalize the process in a way that, given a template, the output XML is generated accordingly. String manipulation is the easiest way of implementing this, but I do not want to mess up the XML if some special characters appear in the input, etc. Please suggest.
Perhaps you could benefit from a templating engine, something like Apache Velocity.
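For example, a rough sketch of how Velocity could fill such a template (the variable and tag names are placeholders, and the template would normally live in an external file):

import java.io.StringWriter;
import org.apache.velocity.VelocityContext;
import org.apache.velocity.app.Velocity;

public class VelocityExample {
    public static void main(String[] args) {
        Velocity.init();

        VelocityContext context = new VelocityContext();
        context.put("valueOfA", "value1");
        context.put("valueOfE", "value5");

        String template = "<Request><ValueOfA>$valueOfA</ValueOfA><ValueOfE>$valueOfE</ValueOfE></Request>";

        StringWriter writer = new StringWriter();
        Velocity.evaluate(context, writer, "csv-to-xml", template);
        System.out.println(writer.toString());
    }
}

Note that Velocity does not escape XML special characters by itself, so for untrusted input you would still need something like the EscapeTool from velocity-tools.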
I would suggest creating an XSD and using JAXB to create the Java binding classes that you can use to generate the XML.
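For example, a minimal sketch of the JAXB approach, assuming the javax.xml.bind API that ships with Java 8 (here the Request class is written by hand rather than generated from an XSD, just to illustrate the idea):

import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbExample {

    @XmlRootElement(name = "Request")
    public static class Request {
        private String valueOfA;
        private String valueOfE;

        @XmlElement(name = "ValueOfA")
        public String getValueOfA() { return valueOfA; }
        public void setValueOfA(String valueOfA) { this.valueOfA = valueOfA; }

        @XmlElement(name = "ValueOfE")
        public String getValueOfE() { return valueOfE; }
        public void setValueOfE(String valueOfE) { this.valueOfE = valueOfE; }
    }

    public static void main(String[] args) throws Exception {
        Request request = new Request();
        request.setValueOfA("value1");
        request.setValueOfE("value5");

        Marshaller marshaller = JAXBContext.newInstance(Request.class).createMarshaller();
        StringWriter writer = new StringWriter();
        marshaller.marshal(request, writer);
        // JAXB takes care of escaping special characters in the values
        System.out.println(writer.toString());
    }
}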
I recommend my own templating engine (JATL, http://code.google.com/p/jatl/). Although it's geared to (X)HTML, it's also very good at generating XML.
I didn't bother solving the whole problem for you (that is, the double splitting of the input on "," and then ":"), but this is how you would use JATL.
final String a = "stuff";
HtmlWriter html = new HtmlWriter() {
    @Override
    protected void build() {
        // If condition1
        start("Request").start("ValueOfA").text(a).end().end();
    }
};
// Now write.
StringWriter writer = new StringWriter();
String results = html.write(writer).getBuffer().toString();
Which would generate
<Request><ValueOfA>stuff</ValueOfA></Request>
All the correct escaping is handled for you.
Working with a Lucene index, I have a standard document format that looks something like this:
Name: John Doe
Job: Plumber
Hobby: Fishing
My goal is to append a payload to the job field that would hold additional information about Plumbing, for instance, a wikipedia link to the plumbing article. I do not want to put payloads anywhere else. Initially, I found an example that covered what I'd like to do, but it used Lucene 2.2, and has no updates to reflect the changes in the token stream api.
After some more research, I came up with this little monstrosity to build a custom token stream for that field.
public static TokenStream tokenStream(final String fieldName, Reader reader, Analyzer analyzer, final String item) {
    final TokenStream ts = analyzer.tokenStream(fieldName, reader);
    TokenStream res = new TokenStream() {
        CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        PayloadAttribute payAtt = addAttribute(PayloadAttribute.class);

        public boolean incrementToken() throws IOException {
            while (true) {
                boolean hasNext = ts.incrementToken();
                if (hasNext) {
                    termAtt.append("test");
                    payAtt.setPayload(new Payload(item.getBytes()));
                }
                return hasNext;
            }
        }
    };
    return res;
}
When I take the token stream and iterate over all the results, prior to adding it to a field, I see it successfully paired the term and the payload. After calling reset() on the stream, I add it to a document field and index the document. However, when I print out the document and look at the index with Luke, my custom token stream didn't make the cut. The field name appears correctly, but the term value from the token stream does not appear, and neither view indicates the successful attachment of a payload.
This leads me to 2 questions. First, did I use the token stream correctly, and if so, why doesn't it tokenize when I add it to the field? Secondly, if I didn't use the stream correctly, do I need to write my own analyzer? This example was cobbled together using the Lucene standard analyzer to generate the token stream and write the document. I'd like to avoid writing my own analyzer if possible because I only wish to append the payload to one field!
Edit:
Calling code
TokenStream ts = tokenStream("field", new StringReader("value"), a, docValue);
CharTermAttribute cta = ts.getAttribute(CharTermAttribute.class);
PayloadAttribute payload = ts.getAttribute(PayloadAttribute.class);
while (ts.incrementToken()) {
    System.out.println("Term = " + cta.toString());
    System.out.println("Payload = " + new String(payload.getPayload().getData()));
}
ts.reset();
It's very hard to tell why the payloads are not saved; the reason may lie in the code that uses the method you presented.
The most convenient way to set payloads is in a TokenFilter -- I think that taking this approach will give you much cleaner code and in turn make your scenario work correctly. I think that it's most illustrative to take a look at some filter of this type in Lucene source, e.g. TokenOffsetPayloadTokenFilter. You can find an example of how it should be used in the test for this class.
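For illustration, here is a rough sketch of such a filter, written against the same Lucene 3.x-era API as the code in the question (the class name is made up, and this is not taken from the Lucene sources):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PayloadAttribute;
import org.apache.lucene.index.Payload;

public final class LinkPayloadTokenFilter extends TokenFilter {
    private final PayloadAttribute payAtt = addAttribute(PayloadAttribute.class);
    private final byte[] payload;

    public LinkPayloadTokenFilter(TokenStream input, String item) {
        super(input);
        this.payload = item.getBytes();
    }

    @Override
    public boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // attach the same payload to every token produced by the wrapped stream
        payAtt.setPayload(new Payload(payload));
        return true;
    }
}

You would then wrap this filter around the analyzer's token stream for the job field only, for example in a small custom Analyzer or by passing the filtered stream directly to the Field constructor.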
Please also consider whether there is a better place to store these hyperlinks than in payloads. Payloads have a very specific application, e.g. boosting some terms depending on their location or formatting in the original document, or their part of speech. Their main purpose is to affect how the search is performed, so they are normally numeric values, efficiently packed to cut down the index size.
I might be missing something, but...
You don't need a custom tokenizer to associate additional information to a Lucene document. Just store it as an unanalyzed field.
doc.Add(new Field("fname", "Joe", Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("job", "Plumber", Field.Store.YES, Field.Index.ANALYZED));
doc.Add(new Field("link","http://www.example.com", Field.Store.YES, Field.Index.NO));
You can then get the "link" field just like any other field.
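For instance (a small sketch, assuming searcher and scoreDoc come from an ordinary search):

Document hit = searcher.doc(scoreDoc.doc);
String link = hit.get("link"); // "http://www.example.com"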
Also, if you did need a custom tokenizer, then you would definitely need a custom analyzer to implement it, for both the index building and searching.