I have the following CSV file (in production the number of records can range from 20k–100k, with many fields):
id,firstname,lastname,email,profession
100,Betta,Wandie,Betta.Wandie@gmail.com,developer
101,Janey,Firmin,Janey.Firmin@gmail.com,doctor
I need to convert this to JSON and do further processing:
CSV -> JSON -> PROCESS FURTHER
I am able to convert it to JSON directly using the code given here:
directly convert CSV file to JSON file using the Jackson library
But I want to do validations on the JSON, e.g. if lastname has a null value, ignore that record, or if id is missing, ignore that record.
How can I handle the validation? I am using Java 8 and the latest version of Spring Boot.
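To make the rules concrete, this is the kind of filtering I am after (a rough sketch using Jackson's tree model on the already-converted JSON; the sample data is from the CSV above):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ArrayNode;

public class CsvJsonValidation {
    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        // JSON as produced by the CSV->JSON step
        String json = "[{\"id\":\"100\",\"firstname\":\"Betta\",\"lastname\":\"Wandie\"},"
                + "{\"id\":\"101\",\"firstname\":\"Janey\",\"lastname\":null}]";

        ArrayNode records = (ArrayNode) mapper.readTree(json);
        ArrayNode valid = mapper.createArrayNode();
        for (JsonNode record : records) {
            // keep only records with a non-null id and lastname
            if (record.hasNonNull("id") && record.hasNonNull("lastname")) {
                valid.add(record);
            }
        }
        System.out.println(valid); // the record with the null lastname is dropped
    }
}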
I have done something similar by using JavaScript (Nashorn). Yes, that is nasty, but it works, and it is astonishingly fast!
Unfortunately, I do not have the source code at hand …
Why I did it that way had the same reasons as @chrylis-on strike implies in their comment: the validation is much easier if you have an object for each JSON record. But as I was lazy, and there was otherwise definitely no need for the Java objects I would have had to create, I had the idea of running a JavaScript script inside my Java program.
Basically, a JSON String is the source for a JavaScript program; you can assign it directly to a JavaScript variable and then you can access the records as array elements and the fields by their name. So your JavaScript code walks through the records and drops those that do not match your validation rules.
Now the Java part: the keyword here is JSR223; it is an API that allows you to execute scripts inside your Java environment. In your case, you have to provide the converted JSON in the context, then you start the script that writes the modified JSON back to the context.
If the converted JSON is too large, you can even use the same technique to check record by record; if you compile the script, it is nearly as fast as native Java.
You can even omit Jackson and let JavaScript do the conversion …
Sorry that I cannot provide code samples; I will try to get hold of them and add them here if I do.
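In the meantime, the rough shape of the approach, reconstructed from memory (an untested sketch, not the original code):

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class JsonFilterDemo {
    public static void main(String[] args) throws Exception {
        // Nashorn ships with Java 8 and is looked up via JSR223
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");

        String json = "[{\"id\":100,\"lastname\":\"Wandie\"},{\"id\":101,\"lastname\":null}]";
        engine.put("json", json);

        // parse, drop invalid records, serialize back -- all inside the script
        String filtered = (String) engine.eval(
                "var records = JSON.parse(json);"
                + "var valid = records.filter(function(r) {"
                + "    return r.id != null && r.lastname != null;"
                + "});"
                + "JSON.stringify(valid);");

        System.out.println(filtered); // [{"id":100,"lastname":"Wandie"}]
    }
}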
We are using Vorto now mainly as a normalized format, and we are starting to look into using the mapping engine for mapping different payload formats to the Vorto model as well. I more or less understand how to map function block properties from JSON or binary payloads using XPath and the conversion functions. However, I'm not clear on how to support parsing of non-fixed-format binary payloads using this method.
For instance, we have an off-the-shelf LoRaWAN sensor which transmits in the following format:
<length><frame type>[<sensor-id><sensor-value>], where length is the total frame length and sensor-id (e.g. temperature, humidity, battery, ...) describes how to parse the sensor-value (i.e. its length and datatype). In one frame, multiple of these readings may be present, in random order.
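For concreteness, the iteration needed looks roughly like this in plain Java (a hypothetical sketch; the sensor-id to value-length table is invented for illustration and would come from the sensor's datasheet):

import java.util.HashMap;
import java.util.Map;

public class FrameParser {
    // hypothetical sensor-id -> value length in bytes
    private static final Map<Integer, Integer> VALUE_LENGTH = new HashMap<>();
    static {
        VALUE_LENGTH.put(0x01, 2); // temperature, int16
        VALUE_LENGTH.put(0x02, 1); // humidity, uint8
        VALUE_LENGTH.put(0x03, 2); // battery, uint16
    }

    public static Map<Integer, Integer> parse(byte[] frame) {
        Map<Integer, Integer> readings = new HashMap<>();
        int length = frame[0] & 0xFF; // <length>: total frame length
        int pos = 2;                  // skip <length> and <frame type>
        while (pos < length) {
            int sensorId = frame[pos++] & 0xFF;
            Integer valueLen = VALUE_LENGTH.get(sensorId);
            if (valueLen == null) {
                break; // unknown sensor-id: value length cannot be determined
            }
            int value = 0;
            for (int i = 0; i < valueLen; i++) {
                value = (value << 8) | (frame[pos++] & 0xFF); // assume big-endian
            }
            readings.put(sensorId, value);
        }
        return readings;
    }
}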
Parsing this can be done easily in, for instance, loraserver.io using a small JavaScript function which iterates over all the bytes and returns the parsed properties. The same approach works in the Eclipse Ditto payload mapping engine, AFAIK.
However, I currently don't see how to do something similar in a Vorto mapping. This is just one specific sensor example, of course, but more examples exist on the market using similarly dynamic payload formats. I know there is already an open issue (#1535) to improve the documentation, but it would already be helpful to know whether such flexible parsing is possible using the mapping DSL.
I tried passing the raw payload as a byte array to the JavaScript function. To test this, I duplicated org.eclipse.vorto.mapping.engine.converter.binary.BinaryMappingTest#testMappingBinaryContaining2DataPoints and adapted the model to use a custom JavaScript function like this:
evaluator.addScriptFunction(new ScriptClassFunction("extractTemperature",
        "function extractTemperature(value) { " +
        "    print(\"parameter of type \" + typeof value + \", value = \" + value);" +
        "    print(value[1]);" +
        "}"));
The output of this function is
parameter of type number, value = 1
undefined
Here the value 1 is the first element of the byte array used.
So the function does not seem to receive the parameter as a byte array.
The model is configured with .withXPathStereotype("custom:extractTemperature(data)", "demo"), so the payload is passed (as BinaryData) in the same way as in the testMappingBinaryContaining2DataPoints test (.withXPathStereotype("custom:convert(vorto_conversion1:byteArrayToInt(data,0,0,0,2))", "demo")). The only difference I see is that in the testMappingBinaryContaining2DataPoints test, the byte array parameter is passed to a Java function instead of a JavaScript function. Or am I missing something?
Also, I noticed that loop keywords like for and while are not allowed in the JavaScript code. So even if I could access the byte array parameter in the JavaScript function, I currently see no way to iterate over it.
On Gitter I received the following reply (together with the suggestion to move the discussion to SO):
You are right. We restricted the JavaScript function usage to a very rudimentary set of language keywords, excluding for loops, as nasty stuff could be implemented there. What you could do instead is register a Java function in your own namespace with the mapping engine. That function can take a byte array. Later this function could be contributed to the mapping engine as a standard function that extracts a certain value, for other developers to reuse.
I don't think this is a solution to the problem, however. As mentioned above, this is just one example of an off-the-shelf sensor payload format, and I don't see how it can be generalized enough to include as a generic function in the mapping engine. Nor do I think it should be required to implement a sensor-specific conversion in Java, since (as an end-user of an IoT platform wanting to deploy a new sensor type) that is more complex to develop and deploy than a little JavaScript function which can be altered at runtime in the mapping spec. I see a lot of value in being able to do simple mappings in JavaScript, just as this can be done in, for example, loraserver.io and Eclipse Ditto.
I think being able to pass a byte array to JavaScript is a first step. I also wonder where exactly the risk is in allowing loops in the JavaScript. For example, Ditto also has some restrictions in its JavaScript sandbox (see here), but it allows loops and only prevents endless looping and recursion.
They state the following:
Using Rhino instead of Nashorn, the newer JavaScript engine shipped with Java, has the benefit that sandboxing can be applied in a better way.
Sandboxing of different payload scripts is required as Ditto is intended to be run as cloud service where multiple connections to different endpoints are managed for different tenants at the same time. This requires the isolation of each single script to avoid interference with other scripts and to protect the JVM executing the script against harmful code execution.
Would using Rhino in Vorto as well make it possible to control the risks you see and allow loop constructs in Vorto mappings?
PS: can someone with enough SO reputation points add the tag eclipse-vorto please?
I created an issue for your request to support this in the JavaScript converters: https://github.com/eclipse/vorto/issues/2029
As stated in the issue, as a current workaround you can register your own custom converter function with Java and re-use this function across your mappings. In these Java converter functions, you have the full power of the Java language to extract the right property from the arbitrary list.
In order to find out how to implement your own custom converter function with Java, take a look here: https://github.com/eclipse/vorto/tree/master/mapping-engine#Advanced-Usage
Since the Eclipse Vorto 0.12.3 release, a fix for your request is available. With this, it is possible to pass array objects to JavaScript converters as well as to use for loops inside JavaScript functions. You might want to give it a try.
See the release notes: https://github.com/eclipse/vorto/blob/master/docs/release-notes.md
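With the fix in place, the script from the test above can loop over the array parameter, along these lines (an untested sketch against the same test API as earlier):

evaluator.addScriptFunction(new ScriptClassFunction("extractTemperature",
        "function extractTemperature(value) {" +
        "    var raw = 0;" +
        "    for (var i = 0; i < value.length; i++) {" +
        "        raw = raw * 256 + (value[i] & 0xFF);" + // mask negative Java bytes
        "    }" +
        "    return raw;" +
        "}"));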
I am receiving JSON from a producer application, and when I try to convert it into a JSONObject and save it into MarkLogic as .json, I get the error below. I do not have a POJO for the incoming unknown data, so there is no field to annotate with @Id.
@Autowired
MarkLogicOperations ops;

@StreamListener(MultiInputSink.INPUT)
public synchronized void handle(String consumerContents) {
    JSONObject jsonObj = new JSONObject(consumerContents);
    ops.write(jsonObj, "Consumer");
    logger.info("Consumer Data " + jsonObj.toString());
}
Below is the error:
nested exception is java.lang.IllegalArgumentException: Your entity of type org.json.JSONObject does not have a method or field annotated with org.springframework.data.annotation.Id
I need to save the JSON exactly as it comes. Is there a way around this? Thank you.
Note: I am using this Spring Data abstraction for MarkLogic: https://github.com/malteseduck/spring-data-marklogic
A common problem with new MarkLogic developers is using a high-level interface when a 'low'-level one is more appropriate.
In this case the Spring API exposes operations based on Spring entities, which would work fine if you had Spring entities to begin with, but you don't. You have 'plain JSON', a.k.a. text. In that case you do NOT want to convert it to entities just to store it in MarkLogic. Try single-stepping through one of the high-level APIs sometime: the amount of work involved in translating from string JSON to an entity and back to a string to send to MarkLogic is mind-boggling, not just with respect to performance but also data fidelity. Java to JSON is not 1:1 lossless or unambiguous. If you start out with an entity, the API makes some sense, but you don't want to convert your real JSON into an entity just to write it back out.
Instead look for one of the other APIs. For example executeWithClient will give you access to the next 'lower' API.
From there you can write out JSON as JSON (text) with less fuss.
With the DatabaseClient interface you still have to choose among a range of abstractions, all of which still require you to transform your perfectly good JSON text into an object of some form so the API can transform it back to text.
For the DocumentManager (part of DatabaseClient) you need a WriteHandle; a StringHandle should work for this.
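A minimal sketch with the standard MarkLogic Java Client API (the connection details and URI are placeholders):

import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.document.JSONDocumentManager;
import com.marklogic.client.io.Format;
import com.marklogic.client.io.StringHandle;

public class RawJsonWriter {
    public static void main(String[] args) {
        // placeholder connection details
        DatabaseClient client = DatabaseClientFactory.newClient(
                "localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));

        String json = "{\"id\":100,\"firstname\":\"Betta\"}";

        // the JSON string goes in as-is; no entity mapping involved
        JSONDocumentManager docMgr = client.newJSONDocumentManager();
        docMgr.write("/consumer/100.json",
                new StringHandle(json).withFormat(Format.JSON));

        client.release();
    }
}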
A more direct (and more efficient) method is to execute the document insert via an XQuery eval. This project has some examples:
https://github.com/DALDEI/mlperf
One example (Kotlin source):
val repo = client.newServerEval()
repo.addVariable("url", getURI("evalstring", id))
repo.addVariable("content", str)
repo.xquery(
    "declare variable \$url as xs:string external ;" +
    "declare variable ${'$'}content external ;" +
    "xdmp:document-insert( \$url, xdmp:unquote( \$content ) )"
)
repo.eval().forEach {
    println(it.string)
}
If performance is important for this app, I recommend experimenting with different approaches. Performance can vary dramatically (by 100x or more) with seemingly minor differences in choice of API and methodology. The above example is from a small program that exercises different APIs writing out large numbers of small JSON files. Performance can vary from 5 docs/sec to 2000 docs/sec with roughly the same amount of code.
I would like to store all the gherkin feature files created by a user on the front end as GherkinDocuments on the back end using the gherkin parser. Once saved, I would also like to be able to display the raw gherkin document on the front end. I have read through the documentation and cannot find anything built-in that converts the GherkinDocument back to a raw text. The toString() method is also not overloaded to print out. Is there a way to convert a GherkinDocument object to raw text within the gherkin parser?
I want to be able to keep as much of the original formatting as possible. Normally I would just write my own utility to perform this, however the structure of the GherkinDocument object renders it tedious. I would prefer to use existing capabilities if they exist.
I talked to Aslak, a Cucumber developer, on the Cucumber help Gitter. He told me:
Hi @tramstheman, have you considered storing it as text instead of serialising the GherkinDocument AST? It is very quick to parse that text back into an AST when you need to.
There isn't currently a renderer/prettifier that will turn an AST back into source, as @mattwynne suggested. The tests don't do round trips; they just perform approval testing on various outputs (parser tokens, ASTs as JSON, pickles as JSON).
What I have done instead is extend the GherkinDocument object and have it store the raw text inside it, as Aslak similarly suggested.
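In code this amounts to something like the following (a sketch using composition rather than inheritance; the parser classes are from the gherkin Java library, though constructor details vary between versions):

import gherkin.AstBuilder;
import gherkin.Parser;
import gherkin.ast.GherkinDocument;

// keeps the original source next to the parsed AST, so the raw
// text can be shown on the front end without needing a renderer
public class SourcedGherkinDocument {
    private final String source;
    private final GherkinDocument document;

    public SourcedGherkinDocument(String source) {
        this.source = source;
        this.document = new Parser<>(new AstBuilder()).parse(source);
    }

    public String getSource() { return source; }
    public GherkinDocument getDocument() { return document; }
}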
What about reading the feature files as-is and displaying them? They are available on your test class path. Move them to your production class path and they can be read from any class, test or production. This allows you to open a stream for each file and display it without any modification.
So I'm working on a Windows app to practice my coding and expand my knowledge, and I'm having a big issue working with the JSON from Riot Games. I'm not really sure of the terminology to look up, because I've never learned how to work with JSON, so I've been making fair progress using Google-fu and Stack Overflow as references, along with various documentation.
I received all the following data using an input stream and an output stream, but it's all unformatted JSON, with no indents, and really hard to read. I assume I used the wrong tool for the task here, and it just read characters and printed them.
My question is this: what syntax should I be looking at to import JSON from a URL into a JSON file? And then, what should I look at to convert or read that JSON as multiple different Java objects, so that I can use the data in my code?
Using Gson is an option I've explored already, but I'm not comfortable with its syntax, and I'd rather steer clear of more third-party dependencies if I can help it.
Any ideas or discussion are welcome; I'm a little over my head with the JSON here, so any discussion I can learn from would help.
Despite your aversion to third-party libraries, I would strongly consider looking into the Jackson library for Java and utilizing its serialization/deserialization features.
Essentially, you create Java objects that can be mapped from JSON. For example, for the JSON:
{
    "email": "email@gmail.com",
    "googleId": "43243243242432",
    "name": "some name"
}
Create a java class:
public class User {
    private String email;
    private String googleId;
    private String name;

    // Getters and Setters...
}
You can then map the JSON String to the object, using Jackson's object mapper, e.g.:
ObjectMapper mapper = new ObjectMapper();
User user = mapper.readValue(myJsonString, User.class);
This allows you to easily access the fields:
String name = user.getName(); // some name
String email = user.getEmail(); // email@gmail.com
and so on.
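Note that ObjectMapper can also read directly from a URL, so you don't have to download to a file first (the URL below is a placeholder; a real Riot API call would also need your API key):

ObjectMapper mapper = new ObjectMapper();
// readValue(URL, Class) opens the stream and parses in one step
User user = mapper.readValue(new java.net.URL("https://example.com/user.json"), User.class);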
If you want to do this entirely in Java, just skip to the bottom. I was thinking of using JavaScript at first, but realized halfway through that you might not want to do that.
To get a JSON file from a URL, you will most likely need to use JavaScript. In that vein, jQuery, a JavaScript library, has a function for this: see .getJSON(). If you really don't want to use any third parties, it can be done in native JavaScript; see https://stackoverflow.com/a/2499647/1361042. However, using jQuery is probably much easier.
$.getJSON('url', function(data) {
    // data is the JSON
});
Then you write the data (a JSON object) to a JSON file. This file can then be used in your Java code, where the JSON file will be imported and used as a JSON object that you can convert into Java objects.
JSON objects are basically JavaScript objects with slightly different formatting, and Java objects use a different format again, so you will need to do some conversion. This question touches on that:
Convert a JSON string to object in Java ME?
Java
However, it might be easier for you to do this entirely in Java, if that's your strong point. If you do it entirely in Java, the top part is pretty irrelevant.
This is what I found on doing it in Java:
simplest way to read json from a URL in java
I want to parse an HTML file, read its content, organize it, and then use it in MATLAB.
Since I have a background in Java and have used Jsoup to parse HTML files, I decided to go that way: parse the HTML file from Java and then send the results to MATLAB.
The problem is that my result will be an object that I will create, called "seizureList", which has the following entries: classification, onset, pattern, vigilance, and origin.
How am I supposed to convert this object from Java to MATLAB?
A simple but working solution would be to write the result to a file from Java and then read and parse that file in MATLAB, but I want a more efficient way.
Note that I've gone through the other questions related to this, but they only deal with a string return value or other simple cases, not a user-defined object.
Any help is appreciated.
You can run Java directly from within MATLAB. The resulting Java object then appears as a variable in the MATLAB workspace, and you can access its public fields, call its methods, etc.
For more information, have a look at Call Java Libraries in the MATLAB documentation.
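On the Java side, that just means exposing the results through plain public getters, which MATLAB can call directly once the class is on its Java class path (a hypothetical sketch of a seizureList entry, using the fields from the question):

// hypothetical entry type for seizureList; MATLAB can call the getters
// directly, e.g. entry.getClassification()
public class SeizureRecord {
    private final String classification;
    private final String onset;
    private final String pattern;
    private final String vigilance;
    private final String origin;

    public SeizureRecord(String classification, String onset,
                         String pattern, String vigilance, String origin) {
        this.classification = classification;
        this.onset = onset;
        this.pattern = pattern;
        this.vigilance = vigilance;
        this.origin = origin;
    }

    public String getClassification() { return classification; }
    public String getOnset() { return onset; }
    public String getPattern() { return pattern; }
    public String getVigilance() { return vigilance; }
    public String getOrigin() { return origin; }
}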