I am trying to use Scriptella in my project to copy data from one database to another. The application has a frontend where users create mappings between tables and build dynamic queries. When the user submits the form, the queries are passed through a query engine and a Scriptella XML file is generated from a FreeMarker template.
However, the executor expects a file rather than an XML string. Currently I work around this by writing the XML to the temp directory and deleting it after the query has run. Is there any way to skip the file creation and execute the XML directly from a string?
You can create a custom URLStreamHandler that serves streams directly from memory, similar to what is done in AbstractTestCase. It can be registered by calling URL.setURLStreamHandlerFactory. See "Registering and using a custom java.net.URL protocol" or "Is it possible to create an URL pointing to an in-memory object?"
After that, call
EtlExecutor.newExecutor(java.net.URL) with the new URL, e.g. new URL("memory://file")
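As a rough illustration of the in-memory handler idea, here is a minimal sketch using only the JDK. The class name InMemoryUrl, the method forXml, and the "memory" scheme are hypothetical names chosen for this example. Instead of the JVM-wide URL.setURLStreamHandlerFactory, it passes the handler straight to the URL constructor (that constructor is deprecated since Java 20 but still available); this is a swapped-in variant, not what AbstractTestCase does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds URLs whose content lives in memory.
final class InMemoryUrl {

    /** Returns a URL like "memory:etl.xml" that serves the given XML string. */
    static URL forXml(String name, String xml) throws IOException {
        final byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);

        URLStreamHandler handler = new URLStreamHandler() {
            @Override
            protected URLConnection openConnection(URL u) {
                return new URLConnection(u) {
                    @Override
                    public void connect() {
                        connected = true; // nothing to connect to
                    }

                    @Override
                    public InputStream getInputStream() {
                        // A fresh stream per call, so the URL can be read repeatedly.
                        return new ByteArrayInputStream(bytes);
                    }
                };
            }
        };

        // Binding the handler to this one URL avoids registering a global factory.
        return new URL(null, "memory:" + name, handler);
    }
}
```

The resulting URL could then be handed to EtlExecutor.newExecutor(url) as described above; whether Scriptella resolves relative resources against such a URL is not verified here.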
I had a similar use case. I downloaded the source and made a small change in the core; because some of the methods involved are private, I had no other choice.
In the class scriptella.configuration.ConfigurationFactory I added the following method:
public ConfigurationEl createConfigurationFromTxt(String xml, final ParametersCallback externalParameters) {
    try {
        DocumentBuilder db = DBF.newDocumentBuilder();
        db.setEntityResolver(ETL_ENTITY_RESOLVER);
        db.setErrorHandler(ETL_ERROR_HANDLER);
        // Parse from a character stream so the platform default charset is not involved
        // (needs org.xml.sax.InputSource and java.io.StringReader).
        final Document document = db.parse(new InputSource(new StringReader(xml)));
        HierarchicalParametersCallback params = new HierarchicalParametersCallback(
                externalParameters == null ? NullParametersCallback.INSTANCE : externalParameters, null);
        PropertiesSubstitutor ps = new PropertiesSubstitutor(params);
        return new ConfigurationEl(new XmlElement(
                document.getDocumentElement(), resourceURL, ps), params);
    } catch (IOException e) {
        throw new ConfigurationException("Unable to load document: " + e, e);
    } catch (Exception e) {
        throw new ConfigurationException("Unable to parse document: " + e, e);
    }
}
Then from my code I can do something like this:
ConfigurationFactory cf = new ConfigurationFactory();
ConfigurationEl conf = cf.createConfigurationFromTxt(FETCH_ETLS, p);
EtlExecutor exec = new EtlExecutor(conf);
I am making a Minecraft plugin and I was wondering how some plugins manage to keep information (without using a database) even after the server has been shut down.
For example, suppose we make a rank plugin and create several lists that hold the players belonging to each rank. When the server shuts down and restarts, the lists are empty again (as I initialized them).
So I wanted to know if anyone has an idea how to persist this information.
If a plugin only needs to save information for itself, and the data does not have to be accessible from elsewhere (a PHP website, for example), you can use the YAML format.
Create the config file:
File usersFile = new File(plugin.getDataFolder(), "user-data.yml");
if (!usersFile.exists()) { // the file does not exist yet
    plugin.getDataFolder().mkdirs(); // createNewFile() fails if the folder is missing
    usersFile.createNewFile(); // throws IOException, so call this inside a try/catch
    // OR copy a default file; the plugin jar must then contain one
    /*try (InputStream in = plugin.getResource("user-data.yml");
         OutputStream out = new FileOutputStream(usersFile)) {
        ByteStreams.copy(in, out);
    } catch (Exception e) {
        e.printStackTrace();
    }*/
}
Load the file as Yaml content :
YamlConfiguration config = YamlConfiguration.loadConfiguration(usersFile);
Edit content :
config.set(playerUUID, myVar);
Save content :
config.save(usersFile);
Also, I suggest doing the file I/O (reads and writes) asynchronously with the scheduler.
Bonus:
If you want ONE config file per user, each starting from a default config, do it like this:
File oneUsersFile = new File(plugin.getDataFolder(), playerUUID + ".yml");
if (!oneUsersFile.exists()) { // the file does not exist yet
    try (InputStream in = plugin.getResource("my-def-file.yml");
         OutputStream out = new FileOutputStream(oneUsersFile)) {
        ByteStreams.copy(in, out); // copy the default file to the user's file
    } catch (Exception e) {
        e.printStackTrace();
    }
}
YamlConfiguration userConfig = YamlConfiguration.loadConfiguration(oneUsersFile);
PS: the variable plugin is the instance of your plugin, i.e. the class which extends "JavaPlugin".
You can use a PersistentDataContainer.
To read data from a player, use
PersistentDataContainer p = player.getPersistentDataContainer();
// getOrDefault avoids a NullPointerException from unboxing when the key has never been set
int blocksBroken = p.getOrDefault(new NamespacedKey(plugin, "blocks_broken"), PersistentDataType.INTEGER, 0); // You can also use DOUBLE, STRING, etc.
The NamespacedKey is the name under which the data is stored. The PersistentDataType is the type of the stored data, for example a Java primitive type or String. To write data to a player, use
p.set(new NamespacedKey(plugin, "blocks_broken"), PersistentDataType.INTEGER, blocksBroken + 1);
What is the difference between those two queries:
SELECT my_fun(col_name) FROM my_table;
and
CREATE TABLE new_table AS SELECT my_fun(col_name) FROM my_table;
Here my_fun is a Java UDF.
I'm asking because when I create a new table (the second query) I receive a Java error:
Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: Map operator initialization failed
...
Caused by: org.apache.hadoop.hive.ql.exec.UDFArgumentException: Unable to instantiate UDF implementation class com.company_name.examples.ExampleUDF: java.lang.NullPointerException
I found that the source of the error is this line in my Java file:
encoded = Files.readAllBytes(Paths.get(configPath));
But the question is: why does it work when no table is created, and fail when a table is created?
The problem might be in the way you read the file. Try passing the file path as the second argument to the UDF, then read it as follows:
private BufferedReader getReaderFor(String filePath) throws HiveException {
    try {
        Path fullFilePath = FileSystems.getDefault().getPath(filePath);
        Path fileName = fullFilePath.getFileName();
        if (Files.exists(fileName)) {
            return Files.newBufferedReader(fileName, Charset.defaultCharset());
        } else if (Files.exists(fullFilePath)) {
            return Files.newBufferedReader(fullFilePath, Charset.defaultCharset());
        } else {
            throw new HiveException("Could not find \"" + fileName + "\" or \"" + fullFilePath + "\" in inersect_file() UDF.");
        }
    } catch (IOException exception) {
        throw new HiveException(exception);
    }
}

private void loadFromFile(String filePath) throws HiveException {
    set = new HashSet<String>();
    try (BufferedReader reader = getReaderFor(filePath)) {
        String line;
        while ((line = reader.readLine()) != null) {
            set.add(line);
        }
    } catch (IOException e) {
        throw new HiveException(e);
    }
}
The full code for a different generic UDF that uses this file reader can be found here.
Several points are unclear, so this answer is based on assumptions.
First of all, it is important to understand that Hive currently optimizes several kinds of simple queries. Depending on the size of your data, the query that works for you, SELECT my_fun(col_name) FROM my_table;, is most likely running locally on the client where you submit the job. That is why your UDF can access the config file, which is available locally. A CTAS statement triggers a job regardless of the input size; this job runs distributed across the cluster, where each worker fails to access your config file.
It looks like you are reading your configuration file from the local file system, not from HDFS (Files.readAllBytes(Paths.get(configPath))). This means the configuration file either has to be replicated on all worker nodes or has to be added to the distributed cache beforehand (you can use ADD FILE for this, doc here). You can find other questions here about accessing files in the distributed cache from UDFs.
One additional problem is that you are passing the location of your config file through an environment variable, which is not propagated to the worker nodes as part of your Hive job. You should pass this setting as a Hive config instead; there is an answer about accessing the Hive config from a UDF here, assuming that you are extending GenericUDF.
Basically, I'm writing my first Spring Boot program, and I have to get a list of products stored in a JSON file so that I can display each product using Vue.js (I know how to use Vue; I just need to get the JSON data into the web page somehow).
I have spent the last 3.5 hours looking at tutorials about consuming JSON and POST requests, and none of them helped.
Let's call your file config.json.
In a typical Maven project, keep the file at
src/main/resources/config.json
In your code, read it like this:
String json;
try {
    ClassPathResource configFile = new ClassPathResource("config.json");
    // IOUtils comes from Apache Commons IO
    json = IOUtils.toString(configFile.getInputStream(), StandardCharsets.UTF_8);
} catch (IOException e) {
    String errMsg = "unexpected error while reading config file";
    logger.error(errMsg, e);
    throw new Exception(errMsg, e);
}
After this, use Jackson or Gson to read the JSON into an object. From there you can either reference it directly as a static attribute or as an attribute of a component, depending on your use case.
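If Apache Commons IO is not on the classpath, the same read can be done with the JDK alone. The following is a minimal sketch, assuming Java 9 or later (for InputStream.readAllBytes); the class name ResourceText and its method names are hypothetical and not part of Spring.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for reading text resources with the JDK only.
final class ResourceText {

    /** Reads the whole stream as UTF-8 text and closes it afterwards. */
    static String readUtf8(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Loads a classpath resource such as "/config.json"; fails if it is missing. */
    static String readResource(String path) throws IOException {
        InputStream in = ResourceText.class.getResourceAsStream(path);
        if (in == null) {
            throw new IOException("Resource not found on classpath: " + path);
        }
        return readUtf8(in);
    }
}
```

A call such as ResourceText.readResource("/config.json") would return the file content as a String, which can then be parsed with Jackson or Gson as described above.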
Hopefully this code works for you:
public class JsonReader {
    public static Map<String, String> readFromJson() throws IOException {
        // try-with-resources closes the stream even if parsing fails
        try (InputStream inStream = JsonReader.class.getResourceAsStream("/" + "your_config_file.json")) {
            // return the parsed map so the caller can actually use it
            return new ObjectMapper().readValue(inStream, new TypeReference<Map<String, String>>() {});
        }
    }
}
You might need to add the Maven dependency for Jackson (jackson-databind), which provides ObjectMapper.
I have a Velocity template that represents an XML file. I populate the text between the tags using data passed in a VelocityContext object, which is then accessed inside the template.
Here is an example, let's call it myTemplate.vm:
<text>$myDocument.text</text>
This is how I pass the data to the Velocity template and render it to a String:
private String buildXml(Document pIncomingXml)
{
    // setup environment
    Properties lProperties = new Properties();
    lProperties.put("file.resource.loader.class", "org.apache.velocity.runtime.resource.loader.ClasspathResourceLoader");
    VelocityContext lVelocityContext = new VelocityContext();
    lVelocityContext.put("myDocument", pIncomingXml.getRootElement());
    StringWriter lOutput = new StringWriter();
    try
    {
        Velocity.init(lProperties);
        Velocity.mergeTemplate("myTemplate.vm", "ISO-8859-1", lVelocityContext, lOutput);
    }
    catch (Exception lEx)
    {
        throw new RuntimeException("Problems running velocity template, underlying error is " + lEx.getMessage(), lEx);
    }
    return lOutput.toString();
}
The problem is that when I access myDocument.text inside the template, the output text is not escaped for XML.
I found a workaround for this by also adding an escape tool to the VelocityContext, like so:
lVelocityContext.put("esc", new EscapeTool());
and then wrapping the reference in the template with it:
<text>$esc.xml($myDocument.text)</text>
In reality I have a very large template, and manually wrapping each reference in $esc.xml(...) would be time consuming. Is there a way to tell Velocity to escape for XML on every access to myDocument without editing the template file at all?
Yes, it's possible.
What you need is EscapeXmlReference, which implements the ReferenceInsertionEventHandler interface:
lProperties.put("eventhandler.referenceinsertion.class",
"org.apache.velocity.app.event.implement.EscapeXmlReference");
It is pretty simple to load data from a URL using the Jena provider for Virtuoso. The following code does the job:
VirtGraph graph = new VirtGraph ("foaf", "jdbc:virtuoso://localhost:1111", "dba", "dba");
/* Load data to Virtuoso */
System.out.print ("Begin read from 'http://xmlns.com/foaf/0.1/index.rdf' ");
graph.read("http://xmlns.com/foaf/0.1/index.rdf", "RDF/XML");
However, things are different when you want to load data from a local file. I tried this:
VirtGraph graph = new VirtGraph ("foaf", "jdbc:virtuoso://localhost:1111", "dba", "dba");
graph.read("/tmp/index.rdf", "RDF/XML");
graph.close();
But I end up with the following Exception:
com.hp.hpl.jena.shared.JenaException: virtuoso.jdbc4.VirtuosoException: HC001: Connection Error in HTTP Client
Does anyone have a clue how to load RDF from a file using the Jena provider?
Configuration:
virt_jena2.jar
virtjdbc4.jar
Rather than providing just the filename, use a file URI instead. E.g.:
graph.read("file:///tmp/index.rdf", "RDF/XML");
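When the path is only known at runtime, the file URI can be derived from it rather than typed by hand. A minimal sketch using only the JDK; the class name FileUris and the method toFileUri are hypothetical names for this example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns a local path into a file: URI string.
final class FileUris {

    /** Converts a local path (relative or absolute) to an absolute file: URI string. */
    static String toFileUri(String localPath) {
        Path absolute = Paths.get(localPath).toAbsolutePath().normalize();
        // On Unix-like systems "/tmp/index.rdf" becomes "file:///tmp/index.rdf".
        return absolute.toUri().toString();
    }
}
```

It could be used as graph.read(FileUris.toFileUri("/tmp/index.rdf"), "RDF/XML"). If the load is executed on the server side, the path presumably has to be readable by the Virtuoso server rather than by the client.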
About your first example:
VirtGraph graph = new VirtGraph ("foaf", "jdbc:virtuoso://localhost:1111", "dba", "dba");
graph.read("http://xmlns.com/foaf/0.1/index.rdf", "RDF/XML");
It is converted to the following SPARQL command, which is executed on the server side (see Sources):
load "http://xmlns.com/foaf/0.1/index.rdf" into graph <foaf>
This command cannot load files from your local machine into the DBMS.
You could use the Jena Model methods to load local files instead, like this (see the Jena Model reference):
Model model = VirtModel.openDatabaseModel("load:test", "jdbc:virtuoso://localhost:1111", "dba", "dba");
// nfile is the path of the local file to load
InputStream in = FileManager.get().open(nfile);
if (in == null) {
    throw new IllegalArgumentException("File: " + nfile + " not found");
}
// read with an explicit charset instead of the platform default
model.read(new InputStreamReader(in, StandardCharsets.UTF_8), null, "N-TRIPLE");
model.close();