Sandboxed java scripting replacement for Nashorn

I've been using Nashorn for awk-like bulk data processing. The idea is that a lot of data arrives row by row, and each row consists of named fields. The rows are processed by user-defined scripts that are stored externally and can be edited by users. The scripts are simple, like if (c > 10) a = b + 3, where a, b and c are fields of the incoming rows. The amount of data is really huge. The code looks like this (an example to show the use case):
// Create a restricted engine: strict mode, no Java access, no syntax extensions.
ScriptEngine engine = new NashornScriptEngineFactory().getScriptEngine(
    new String[]{"-strict", "--no-java", "--no-syntax-extensions", "--optimistic-types=true"},
    null,
    scr -> false); // class filter: expose no Java classes to scripts
CompiledScript cs;
Invocable inv = (Invocable) engine;
// Remove built-ins that could load other code or stop the JVM.
Bindings bd = engine.getBindings(ScriptContext.ENGINE_SCOPE);
bd.remove("load");
bd.remove("loadWithNewGlobal");
bd.remove("exit");
bd.remove("eval");
bd.remove("quit");
String scriptText = readScriptText();
// Wrap the user script in a function and compile it once.
cs = ((Compilable) engine).compile("function foo() {\n" + scriptText + "\n}");
cs.eval();
// Invoke the compiled function for every incoming row.
Map params = readIncomingData();
while (params != null) {
    Map<String, Object> res = (Map) inv.invokeFunction("foo", params);
    writeProcessedData(res);
    params = readIncomingData();
}
Now Nashorn is deprecated and I'm looking for alternatives. I have been googling for a few days but didn't find an exact match for my needs. The requirements are:
Speed. There's a lot of data, so it must be really fast. I therefore assume precompilation is a must.
Must work under Linux/OpenJDK.
Must support sandboxing, at least for data access/code execution.
Nice to have:
Simple, C-like syntax (not Lua ;)
Support for sandboxing CPU usage
So far I have found that Rhino is still alive (last release dated 13 Jan 2020), but I'm not sure whether it is still supported or how fast it is. As I remember, one of the reasons Java switched to Nashorn was speed, and speed is very important in my case. I also found J2V8, but Linux is not supported. GraalVM looks like a bit of overkill, and I haven't yet figured out how to use it for such a task. Maybe I need to explore further whether it is suitable, but it looks like a complete JVM replacement that cannot be used as a library.
It doesn't necessarily have to be JavaScript; maybe there are other alternatives.
Thank you.

GraalVM's JavaScript can be used as a library, with the dependencies obtained like any other Maven artifact. While the recommended way to run it is to use the GraalVM distribution, there are explanations of how to run it on OpenJDK.
You can restrict what the script has access to, such as Java classes, creating threads, etc.:
From the documentation:
The following access parameters may be configured:
* Allow access to other languages using allowPolyglotAccess.
* Allow and customize access to host objects using allowHostAccess.
* Allow and customize host lookup to host types using allowHostLookup.
* Allow host class loading using allowHostClassLoading.
* Allow the creation of threads using allowCreateThread.
* Allow access to native APIs using allowNativeAccess.
* Allow access to IO using allowIO and proxy file accesses using fileSystem.
And it is several times faster than Nashorn. Some measurements can be found, for example, in this article:
GraalVM CE provides performance comparable or superior to Nashorn with the composite score being 4 times higher. GraalVM EE is even faster.

Related

Parsing non-fixed format binary payload with a custom javascript conversion in Vorto

We are now using Vorto mainly as a normalized format and are starting to look into using the mapping engine for mapping different payload formats to the Vorto model as well. I more or less understand how to map function block properties from a JSON or binary payload using XPath and the conversion functions. However, it is not clear to me how to support parsing of a non-fixed-format binary payload using this method.
For instance, we have an off-the-shelf LoRaWAN sensor which transmits in the following format:
<length><frame type>[<sensor-id><sensor-value>] where length is the total frame length and sensor-id (e.g. temperature, humidity, battery, ...) describes how to parse the sensor-value (i.e. its length and datatype). A single frame may contain several of these readings in arbitrary order.
Parsing this is easy in, for instance, loraserver.io, using a small JavaScript function that iterates over all the bytes and returns the parsed properties. As far as I know, the same approach works in the Ditto payload mapping engine.
However, currently I don't see how to do something similar in Vorto mapping. This is just one specific sensor example of course, but more examples exist on the market using similar dynamic payload format. I know there is already an open issue (#1535) to improve the documentation, but it would already be helpful to know if such flexible parsing would be possible using the mapping DSL.
I tried passing the raw payload as bytearray to the javascript function. In order to test this I duplicated the org.eclipse.vorto.mapping.engine.converter.binary.BinaryMappingTest#testMappingBinaryContaining2DataPoints and adapted the model to use a custom javascript function like this
evaluator.addScriptFunction(new ScriptClassFunction("extractTemperature",
    "function extractTemperature(value) { " +
    " print(\"parameter of type \" + typeof value + \", value = \" + value);" +
    " print(value[1]);" +
    "}"));
The output of this function is
parameter of type number, value = 1
undefined
Here the value 1 is the first element of the byte array used.
So the function does not seem to receive the parameter as a byte array.
The model is configured with .withXPathStereotype("custom:extractTemperature(data)", "demo") so the payload is passed (as BinaryData) in the same way as in the testMappingBinaryContaining2DataPoints test (.withXPathStereotype("custom:convert(vorto_conversion1:byteArrayToInt(data,0,0,0,2))", "demo")). The only difference I see is that in the testMappingBinaryContaining2DataPoints test the byte array parameter is passed to a Java function instead of a JavaScript function. Or am I missing something?
Also, I noticed that loop keywords like for and while are not allowed in the JavaScript code. So even if I could access the byte array parameter in the JavaScript function, I currently see no way to iterate over it.
On gitter I received following reply (together with the suggestion to move discussion to SO)
You are right. We restricted the Javascript function usage to very rudimentary set of language keywords excluding for loops as nasty stuff can be implemented there. What you could do Instead is to register a java function In your own namespace to the mapping engine. That function can hold a byte array. Later this function can be contributed to the mapping engine as a standard function to extract a certain value out for other developers to reuse.
I don't think this is a solution to the problem, however. As mentioned above, this is just one example of an off-the-shelf sensor payload format, and I don't see how it can be generalized enough to include as a generic function in the mapping engine. And I don't think it should be required to implement a sensor-specific conversion in Java, since (for an end user of an IoT platform wanting to deploy a new sensor type) this is more complex to develop and deploy than a little JavaScript function which can be altered at runtime in the mapping spec. I see a lot of value in being able to do simple mappings in JavaScript, just as this can be done in, for example, loraserver.io and Eclipse Ditto.
I think being able to pass a byte array to javascript is a first step. Also I wonder where exactly the risk is in allowing loops in the javascript? For example Ditto also has some restrictions in the javascript sandbox (see here) but this allows loops and only prevents endless looping and recursion.
They state the following:
Using Rhino instead of Nashorn, the newer JavaScript engine shipped with Java, has the benefit that sandboxing can be applied in a better way.
Sandboxing of different payload scripts is required as Ditto is intended to be run as cloud service where multiple connections to different endpoints are managed for different tenants at the same time. This requires the isolation of each single script to avoid interference with other scripts and to protect the JVM executing the script against harmful code execution.
Would using Rhino in Vorto as well make it possible to control the risks you see and to allow loop constructs in Vorto mappings?
PS: can someone with enough SO reputation points add the tag eclipse-vorto please?

I created an issue for your request to support this in the JavaScript converters: https://github.com/eclipse/vorto/issues/2029
As stated in the issue, as a current workaround you can register your own custom converter function in Java and reuse this function across your mappings. In these Java converter functions, you have all the power of the Java language to extract the right property from the arbitrary list.
In order to find out how to implement your own custom converter function with Java, take a look here: https://github.com/eclipse/vorto/tree/master/mapping-engine#Advanced-Usage
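As a rough illustration of the parsing logic such a Java converter could contain, here is a minimal sketch for the frame layout described in the question. It deliberately leaves out the registration with the Vorto mapping engine (see the link above for that). The class name, the sensor ids and the value encodings are all hypothetical; a real sensor's datasheet would define them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SensorFrameParser {
    // Hypothetical sensor ids and encodings, chosen only for this sketch.
    static final int TEMPERATURE = 0x01; // int16, big-endian, tenths of a degree Celsius
    static final int HUMIDITY = 0x02;    // uint8, percent
    static final int BATTERY = 0x03;     // uint16, big-endian, millivolts

    /** Parses <length><frame type>[<sensor-id><sensor-value>]* into named readings. */
    static Map<String, Double> parse(byte[] frame) {
        if (frame.length < 2) {
            throw new IllegalArgumentException("frame too short");
        }
        int length = frame[0] & 0xFF; // total frame length, including this byte
        if (length != frame.length) {
            throw new IllegalArgumentException("length byte does not match frame size");
        }
        // frame[1] is the frame type; this sketch does not interpret it.
        Map<String, Double> readings = new LinkedHashMap<>();
        int pos = 2;
        while (pos < length) {
            int id = frame[pos++] & 0xFF; // the id decides how many value bytes follow
            switch (id) {
                case TEMPERATURE:
                    require(frame, pos, 2);
                    // The cast to short sign-extends, so negative temperatures work.
                    readings.put("temperature", (short) uint16(frame, pos) / 10.0);
                    pos += 2;
                    break;
                case HUMIDITY:
                    require(frame, pos, 1);
                    readings.put("humidity", (double) (frame[pos] & 0xFF));
                    pos += 1;
                    break;
                case BATTERY:
                    require(frame, pos, 2);
                    readings.put("battery", (double) uint16(frame, pos));
                    pos += 2;
                    break;
                default:
                    throw new IllegalArgumentException("unknown sensor id " + id);
            }
        }
        return readings;
    }

    private static int uint16(byte[] frame, int pos) {
        return ((frame[pos] & 0xFF) << 8) | (frame[pos + 1] & 0xFF);
    }

    private static void require(byte[] frame, int pos, int count) {
        if (pos + count > frame.length) {
            throw new IllegalArgumentException("truncated value at offset " + pos);
        }
    }

    public static void main(String[] args) {
        // Humidity, temperature and battery readings in arbitrary order.
        byte[] frame = {10, 1, 2, 55, 1, 0, (byte) 0xD7, 3, 0x0E, 0x10};
        System.out.println(parse(frame));
    }
}
```

Because the loop dispatches on the sensor id, the readings can appear in any order and any number within one frame, which is the part that a fixed-offset function such as byteArrayToInt cannot express.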

Since the Eclipse Vorto 0.12.3 release, a fix for your request is available. With this it is possible to pass an array object to a JavaScript converter as well as to use for loops inside JavaScript functions. You might wanna give it a try.
See release notes https://github.com/eclipse/vorto/blob/master/docs/release-notes.md

Access .net DLL from Java

I am new to Java and DLLs, so go easy on me.
I need to access a DLL's methods from Java.
I have tried using JNA to access the DLL. Here is what I have done:
import com.sun.jna.Library;
import com.sun.jna.Native;

public class mapper {
    // JNA maps each interface method to an exported native function of the same name.
    public interface mtApi extends Library {
        public boolean IsStopped();
    }

    public static void main(String[] args) {
        mtApi lib = (mtApi) Native.loadLibrary("MtApi", mtApi.class);
        boolean test = lib.IsStopped();
        System.out.println(test);
    }
}
When I run the code, I am getting the following error:
Exception in thread "main" java.lang.UnsatisfiedLinkError: Error looking up function 'IsStopped': The specified procedure could not be found.
I understand that this error is saying it cannot find the function, but I have no idea how to fix it.
I am trying to use this API: mt4api,
and here is the method I am attempting to access: MQL4
Can anyone tell me what I am doing wrong?
I have looked at other alternatives, like jni4net, but I cannot get this working either.
If anyone can link me to a tutorial that shows how to set this up, or knows how to do it, I would be grateful.

Trading? Hunting for milliseconds to shave off? Rather go into distributed processing... Definitely safer than relying on an API!
While your question was about how to bend Java to call .NET DLL functions,
let me sketch a much more future-proof solution.
Using AI/ML-regression based predictors for FOREX trading, I was hunting in the same forest. The best solution found over roughly the last 12 years, drawing on a few hundred man-years of experience, was set up in the following manner:
Host A executes trades: it operates MetaTrader Terminal 4, with both a Script and an EA. The distributed-processing system communicates using the ZeroMQ low-latency messaging/signalling framework ( a few tens of microseconds needed )
Host B executes AI/ML processing of predictions for a traded instrument ( a few hundreds of microseconds apply )
Cluster C executes continuous AI/ML predictor re-trainings and HyperParameterSPACE model selections ( many CPU-hours indeed needed, a continuous self-adapting model process running 24/7 )
The signalling / messaging layer with ZeroMQ has ports and/or bindings available and ready for most mainstream and many niche programming languages, including Java.
Hidden dangers of going just against a published API:
While the effort needed for system integration and testing is immense, API specifications are always at risk of specification creep.
That said, add the countless man-months consumed on debugging after a silent change in the MT4 language specification derails your previous tools and libraries. Why? Just imagine. Some time ago, MQL4 stopped being MQL4 and was silently shifted towards MQL5, under the name New-MQL4. Among other changes in compilation, there were many small and big nails in the coffin: string surprisingly ceased to be a string and became an internal struct, and one can guess what that does to all DLL calls.
So, beware of API creep.
Does it hurt a distributed processing solution?
No.
With a wise message-layout design, there are no adverse effects of MetaTrader Terminal 4 behaviour and all the logic ( incl. the strategy decision ) is put outside this creeping platform.
Doable. Fast and smart. Also could use remote-GPU-cluster processing, if your budget allows.
Does it work even in Strategy Tester?
Yes, it does.
If anyone has the guts to rely on the built-in Strategy Tester, the distributed-processing model still works there. Performance depends on the preferred style of modelling; a full one-year, tick-by-tick simulation with quite complex AI/ML components took a few days on common COTS desktop PCs ( after years of Quant R&D, we do not use Strategy Tester internally at all, but the request was to batch-test the y/y tick data, so it can be commented on here ).

Is it possible to get a deep copy of objects using the VersionOne Java SDK?

Let's say I want to calculate the cumulative estimate of my defects. I do
double estimate = 0.0;
Double tEstimate = 0.0;
Collection<Defect> defects = project.getDefects(null);
for (Defect d : defects) {
    tEstimate = d.getEstimate();
    if (tEstimate != null) {
        estimate += tEstimate;
    }
}
Here each call to d.getEstimate() does a callback to the server, meaning this code runs extremely slowly. I would like to take the one-time performance hit up front and download all the info along with the Defect object, probably including getting some information I won't use, but avoid hitting the latency of a server callback during each iteration of the loop.

You are using the VersionOne Object Model SDK. It lacks robustness because of the very thing you are complaining about. One of the inefficiencies is that when you request a list of assets, it first fetches all of the assets with a predetermined set of attributes, such as AssetState, and checks whether each one is a dead asset. After this, it makes another call to get the same list of assets again, this time with your specified attributes. This could be remedied by a greedy approach that grabs a fixed set of attributes up front, so that each member of this set is returned regardless of which attributes are requested in your .get_() method. Why? This already (sort of) happens in the REST-based VersionOne API as it stands. If the query returned all attributes, it would probably be a little wasteful, especially for humongous backlogs.
Anyway, VersionOne will be deprecating the Object Model in the near future, so if you plan on doing a lot of coding with the OM, consider this.
Here are some ways to circumvent this problem:
1) Rewrite your code to use the VersionOne APIClient SDK. It has the XML plumbing built in, which will save you a lot of time compared with writing your own. It is a little more verbose, but it is more powerful, fast and efficient. The Object Model is actually built upon the APIClient.
2) Rewrite your code using Java and the raw VersionOne REST API. This requires that you understand HTTP and the VersionOne REST API.
3) If you cannot move away from the Object Model, you can mix the two SDKs. When you need to read large amounts of data, just use APIClient code to manage that segment of the code. This is kind of pointless when you can just learn the APIClient and use it exclusively, unless you have a huge investment in the Object Model that you can't change. The code gets mucky real fast. Not recommended.

The rest-1.v1 API endpoint exposes operations for assets, including DeepCopy. There is no client code that enumerates all of the operations, so you must first explore the asset using the meta.v1 API endpoint. Using the API Client backdoor from the Object Model, you can get to the classes that will allow you to call an operation once you know the name.

Fastest way to export keys from cassandra

What is the fastest way to export all the rowkeys from a column family in cassandra (0.7.x and later versions) with Java APIs or other tools ?
Currently I am using the Java Pelops API and paging through all records, but I'm wondering if there is a better mechanism.
I am specifically interested in exporting only the row keys (no columns/subcolumns), so I'm wondering if there is a part of the Cassandra direct storage APIs that could be used to do this as quickly as possible (bypassing Thrift).

What about using the Java Hector client? Sample taken from
https://github.com/rantav/hector/wiki/User-Guide
RangeSlicesQuery<String, String, String> rangeSlicesQuery =
    HFactory.createRangeSlicesQuery(keyspace, stringSerializer,
        stringSerializer, stringSerializer);
rangeSlicesQuery.setColumnFamily("Standard1");
rangeSlicesQuery.setKeys("fake_key_", "");
rangeSlicesQuery.setReturnKeysOnly(); // use this
rangeSlicesQuery.setRowCount(5);
Result<OrderedRows<String, String, String>> result = rangeSlicesQuery.execute();
Thrift is the API interface for Cassandra. Going directly to storage would require you to read the data files in binary. The code above should give you good performance.
If you need this for a one-time export, then I would say it's OK. If you need this for production, you should reconsider your data model; you may be doing something wrong.
You may need to split the query into multiple key ranges in case you need to scan many rows.
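The key-range splitting could look roughly like the following sketch. It pages through the keys by using the last key of each page as the start key of the next query. The fetchPage hook is a hypothetical stand-in for a keys-only range query such as the Hector one above (start key plus row count); KeyPager and inMemory are names made up for illustration, and the in-memory version exists only so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.function.BiFunction;

class KeyPager {
    /**
     * Collects all row keys with consecutive range queries.
     * fetchPage.apply(startKey, rowCount) must return up to rowCount keys,
     * beginning at startKey (inclusive); "" means "from the first key".
     */
    static List<String> allKeys(BiFunction<String, Integer, List<String>> fetchPage, int pageSize) {
        if (pageSize < 2) {
            // With a page size of 1 every page would only repeat the start key.
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> keys = new ArrayList<>();
        String start = "";
        boolean first = true;
        while (true) {
            List<String> page = fetchPage.apply(start, pageSize);
            // The start key is inclusive, so every page after the first
            // repeats the last key of the previous page; skip that one.
            for (int i = first ? 0 : 1; i < page.size(); i++) {
                keys.add(page.get(i));
            }
            if (page.size() < pageSize) {
                return keys; // a short page means the range is exhausted
            }
            start = page.get(page.size() - 1);
            first = false;
        }
    }

    /** In-memory stand-in for the cluster, only for trying out the sketch. */
    static BiFunction<String, Integer, List<String>> inMemory(NavigableSet<String> store) {
        return (start, count) -> {
            List<String> page = new ArrayList<>();
            for (String key : store.tailSet(start, true)) {
                if (page.size() == count) {
                    break;
                }
                page.add(key);
            }
            return page;
        };
    }

    public static void main(String[] args) {
        NavigableSet<String> store = new TreeSet<>();
        for (int i = 0; i < 10; i++) {
            store.add(String.format("k%02d", i));
        }
        System.out.println(allKeys(inMemory(store), 4));
    }
}
```

Note that a real cluster returns rows in the partitioner's order, which is not necessarily the lexical order used by the in-memory stand-in; the paging logic itself does not depend on which order it is, as long as it is stable between queries.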

Static Analysis tool to detect Internationalization issues

Are there any tools (free/commercial) that can audit an application for internationalization? (or localization-readiness, if you prefer)
Primarily interested in:
Multilingual Implementation tests
Examples:
* [javascript] alert('Oops wrong choice!');
* [java] String msg = resourcebundle.getString("key.x").concat("4");
* [jdbc] String query=".. order by abc"; //should be NLS_SORT or equiv.
Date Implementation tests
Examples:
* SimpleDateFormat used without Locale
* Apache's DateFormatUtils used
Numeric Implementation tests
Examples:
* NumberFormat used without Locale
javascript-validation tests
Examples:
* [javascript] checkIsDecimal { //decimal point checked against "." }
* [javascript] hardcoded character range [A-z]
Cheers.

Have a look at Globalyzer - http://lingoport.com/globalyzer - as it is just that, a tool for performing static analysis on code specifically for internationalization. It works with a variety of programming languages too. Supports detection and correction for embedded strings (string externalization capabilities too), potential locale-limiting methods/functions/classes depending upon the programming language and requirements, as well as other issues like programming patterns and embedded images. There are default "rule sets" which get you a good start, and then you can customize your rules for both detection and filtering of issues. Plus there's an underlying database that helps you tag or keep track of i18n issues as you work with them. There's a server component, where you create and share your rule sets with your team members, then desktop and command line clients which run locally on your machine to analyze your source, so you're not sending any code or reporting off your local machine.

Based on your examples, you mostly want to diagnose
functions that produce output, whose input isn't somehow
internationalized.
So for the alert case, you want to find any print call
that acquires a string that is not produced by
one of possibly several well-known translation routines.
For the jdbc case, you want to identify ordering constraints
that are not locale specific.
For the various date cases, you want date routines that
are known to produce locale-specific answers.
For the javascript validation it is harder to guess at intent;
presumably you want to diagnose functions that are known
to be wired to a particular locale; this seems a lot like
the date case. For range checks, you want to capture anything
that compares a character to another for less or greater than.
For the wired-locale functions, it seems just knowing their
name would be enough (although perhaps there has to be some overload resolution,
e.g., by number of arguments), so NumberFormat(?,?) is bad,
and NumberFormat(?,?,?) is OK.
Why can't you write a regular expression to look (heuristically) for the bad cases?
For the range case, you just need to recognize expressions
of the form of [exp] < [literal-char] or [exp] < [literal-string].
A regexp to look for just "< '.+" would seem adequate.
Are there common cases that these would miss?
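To make the regexp idea concrete, here is a small sketch of such a heuristic scanner. The two patterns and the names I18nGrep and scan are my own illustration, not part of any existing tool; they are line-based, so they will produce false positives (for example inside comments) and miss anything split across lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class I18nGrep {
    // A relational operator followed by a single-character literal,
    // e.g. c >= 'a' or ch < 'Z' (escaped characters like '\n' included).
    static final Pattern CHAR_COMPARE =
        Pattern.compile("[<>]=?\\s*'(\\\\.|[^'\\\\])'");

    // new SimpleDateFormat("...") with only a string literal argument,
    // i.e. the constructor form that takes no Locale.
    static final Pattern DATE_NO_LOCALE =
        Pattern.compile("new\\s+SimpleDateFormat\\(\\s*\"[^\"]*\"\\s*\\)");

    /** Returns one "lineNumber: message" entry per suspicious match. */
    static List<String> scan(List<String> lines) {
        List<String> findings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (CHAR_COMPARE.matcher(line).find()) {
                findings.add((i + 1) + ": character range comparison");
            }
            if (DATE_NO_LOCALE.matcher(line).find()) {
                findings.add((i + 1) + ": SimpleDateFormat without Locale");
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "if (c >= 'a' && c <= 'z') {",
            "DateFormat f = new SimpleDateFormat(\"yyyy-MM-dd\");",
            "DateFormat g = new SimpleDateFormat(\"yyyy-MM-dd\", Locale.US);");
        for (String finding : scan(source)) {
            System.out.println(finding);
        }
    }
}
```

Checks like "the string passed to alert did not come from a translation routine" are where this style runs out of steam, since that needs to know where a value came from rather than what a line looks like.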
EDIT (from comment below: "I've been using regexp but...")
If you want a tool that is deeper than regexp, you pretty much
have to go to language parsing, name/type resolution, and having
data flow analysis would be helpful. Since you want to process
multiple (computer) languages, the tool has to be multi-lingual capable.
And it appears you want to be able to customize it to check for
the specific cases relevant to your application.
The DMS Software Reengineering Toolkit
has all these properties, including
parsers for Java, JavaScript and SQL. It is designed to be customized,
so you have to do that in advance of using it.

I have studied IntelliJ IDEA's code analyzers, and it does have the ones you requested. It's a commercial IDE, specialized in Java, but it knows other languages as well.
http://www.jetbrains.com/idea/
