My cluster should read some input files that are located in my Azure storage. I am submitting my .jar to the cluster through Livy, but the job always dies because it cannot locate my files -> User class threw exception: java.io.FileNotFoundException. What am I missing? I don't want to use sc.textFile to open the files because it would turn them into RDDs, and I need to preserve their structure.
val Inputs : String = scala.io.Source.fromFile("wasbs:///inputs.txt").mkString
I believe that I am trying to read from the wrong location or with the wrong method. Any ideas?
Thanks!
Based on your description, I understand that you want to load a plain text file from Azure Storage using Scala running on HDInsight.
In my experience, there are two ways you can try to implement this:
1. Use the Azure Storage SDK for Java from Scala to get the content of the text blob. Please refer to the tutorial How to use Blob storage from Java; rewriting the sample code from the tutorial in Scala is straightforward.
2. Use the Hadoop FileSystem API via the Hadoop Azure Support library to load the file data. Please refer to the Hadoop wiki example https://wiki.apache.org/hadoop/HadoopDfsReadWriteExample and adapt it to Scala, as in the sketch below.
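Here is a minimal Scala sketch of the second approach, assuming it runs on an HDInsight cluster where the wasbs:// container is configured as the default filesystem (the hadoop-azure support is on the classpath there by default):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import scala.io.Source

// Resolve the cluster's default filesystem (the wasbs:// container on HDInsight)
val conf = new Configuration()
val fs = FileSystem.get(conf)

// Open the blob as a stream and read it as plain text
val stream = fs.open(new Path("wasbs:///inputs.txt"))
val inputs: String =
  try Source.fromInputStream(stream).mkString
  finally stream.close()
Unlike sc.textFile, this reads the blob as one plain string on the driver, so the file's structure is preserved.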
Related
I want to store images in ArangoDB as image files. I want to know if there is any API, or a Java API, for this. Thanking you in advance.
Storing binary data inside ArangoDB has been a long-standing feature request.
Currently it's not possible out of the box.
One can, however, do this by creating a Foxx service that handles the data.
The recommended way is to create a file and reference that file name inside the database, as sketched below.
A detailed description and an example Foxx app can be found in the cookbook.
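To illustrate the recommended approach, here is a hedged Scala sketch using the ArangoDB Java driver (com.arangodb:arangodb-java-driver); the database, collection, and path names are made up, and it assumes the database and collection already exist:
import com.arangodb.ArangoDB
import com.arangodb.entity.BaseDocument
import java.nio.file.{Files, Paths, StandardCopyOption}

// Copy the image to a well-known location on disk
val source = Paths.get("photo.jpg")
val target = Paths.get("/var/data/images/photo.jpg")
Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING)

// Store only a reference to the file inside ArangoDB
val arango = new ArangoDB.Builder().host("127.0.0.1", 8529).build()
val doc = new BaseDocument()
doc.addAttribute("path", target.toString)
doc.addAttribute("mimeType", "image/jpeg")
arango.db("mydb").collection("images").insertDocument(doc)
arango.shutdown()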
I have a web application running on an AIX server, and the requirement is to read an IDML file, get the coordinates of every piece of text in the file, and write some custom information into a PDF based on those coordinates.
I have gone through various documents and forums on how to set this up and what is required to achieve it, but I am confused. I need some information on what is required from a software and licensing perspective in order to meet this requirement.
To run a Java program that can access the IDML file on the AIX server, do I have to buy an InDesign Server license, or can I extract IDMLTools.jar from the SDK and place it on my classpath?
Where do I find the IDML SDK? I am unable to access the IDMLToolsLib.com site.
Any help is appreciated.
Thanks,
Satish.
There is a Java lib, IDMLLib, which aims to ease IDML file exploration. I have never used it myself, but it seems like a great tool.
Video:
https://www.youtube.com/watch?v=LQqd9NgH8W4
Site:
http://idmllib.com/
Why not unzip the IDML and parse the resulting XML files?
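Since an .idml file is just a ZIP archive of XML parts (designmap.xml, Stories/*.xml, Spreads/*.xml, and so on), a rough Scala sketch of this approach could look like the following; the file name is made up, and it assumes Scala 2.13+ with the scala-xml module available:
import java.util.zip.ZipFile
import scala.jdk.CollectionConverters._
import scala.xml.XML

val zip = new ZipFile("document.idml")
try {
  // Each story part holds the document's text runs in <Content> elements
  for (entry <- zip.entries().asScala if entry.getName.startsWith("Stories/")) {
    val story = XML.load(zip.getInputStream(entry))
    (story \\ "Content").foreach(c => println(c.text))
  }
} finally zip.close()
The geometry needed for coordinates lives in the Spreads/*.xml parts, which can be parsed the same way.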
As the title says: in the Java API, there are several methods in org.apache.hadoop.conf.Configuration to get details about what we have configured in the HDFS configuration files, such as hdfs-site.xml and core-site.xml. But I want to get this using the C API, libhdfs.so. Could anybody help me?
For an example program using libhdfs, the C library for working with HDFS (Hadoop Distributed File System), see the following link:
libhdfs
I am trying to use a Java uploader in a ROR app (for its ease of uploading entire directories). The selected uploader comes with some PHP code that saves the files to the server. I am trying to translate this code to Ruby, but am stumped on this point:
PHP has a very convenient superglobal, $_FILES, that contains a hash of all files uploaded to the current script via the HTTP POST method. It appears Ruby does not have a similar resource. Lacking that, what is the best way to access and save the uploaded files?
I am using the JavaPowUpload uploader (http://www.element-it.com/OnlineHelpJavaPowUpload/index.html).
Ruby on Rails lets you use the application root directory, via #{RAILS_ROOT}, to get at the stored file (wherever you have decided to put it).
Check out this tutorial. Not the prettiest method, but it should give you an idea of what needs to be done. Once the file is uploaded, it's just a matter of getting the right path and doing your processing from there.
I am not able to figure out how to upload bulk data to Google's servers while getting around the 10 MB upload limit and 30-second request timeout. I want to design an application that takes my standard SQL data and pushes it to Google's servers.
I might sound naive, but your help is most valuable for my project.
There's not currently a native Java bulkloader, so what you need to do is use the Python one. The process goes like this:
First, you'll need to download the Python SDK and extract it. Then, create an empty directory, and in it create a file called app.yaml, containing the following:
application: yourappid
version: bulkload
runtime: python
api_version: 1
handlers:
- url: /remote_api
  script: $PYTHON_LIB/google/appengine/ext/remote_api/handler.py
  login: admin
Now, run "appcfg.py update yourdir" from the Python SDK, and enter your credentials when prompted. appcfg will upload a new version of your app, which will run side-by-side with your main version, and allow you to bulkload.
Now, to do the actual bulkloading, you need to use the Python Bulkloader. Follow the instructions here. You'll need to know a (very) little bit of Python, but it's mostly copy-and-paste. When you're done, you can run the bulkloader as described in the article, but add the "-s bulkload.latest.yourapp.appspot.com" argument to the command line, like this:
appcfg.py upload_data --config_file=album_loader.py --filename=album_data.csv --kind=Album -s bulkload.latest.yourapp.appspot.com <app-directory>
Finally, to load data directly from an SQL database instead of from a CSV file, follow the instructions in my blog post here.
I want to do the same thing. So here's my naive concept for achieving the goal.
Web Server Preparation
Create a servlet that will receive the uploaded data (e.g. XML or JSON); see the sketch after this list
(Optional) Store it in the Blobstore
Parse the data using JAXB/JSoup and/or GSON
Dynamically interpret the data structure
Store it using the Datastore
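As a rough illustration of the servlet step, here is a hedged Scala sketch against the App Engine low-level Datastore Java API; the servlet name, entity kind, and property names are made up:
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}
import com.google.appengine.api.datastore.{DatastoreServiceFactory, Entity, Text}
import scala.io.Source

class BulkUploadServlet extends HttpServlet {
  override def doPost(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
    // Read the uploaded XML/JSON payload from the request body
    val payload = Source.fromInputStream(req.getInputStream, "UTF-8").mkString
    // Parsing with JAXB/JSoup/GSON would go here; this sketch stores the raw payload
    val entity = new Entity("UploadChunk")
    entity.setProperty("data", new Text(payload)) // Text allows values over the string property limit
    DatastoreServiceFactory.getDatastoreService.put(entity)
    resp.setStatus(HttpServletResponse.SC_OK)
  }
}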
Client Uploader Preparation
Using a local computer, create a Java/C++/PHP program that generates the XML/JSON files and stores them locally
Create a shell script (Linux) or batch file (Windows) to programmatically upload the files using cURL; a Scala alternative is sketched below
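If you'd rather stay in one language than shell out to cURL, here is an illustrative Scala uploader; the URL and file names are made up:
import java.net.{HttpURLConnection, URL}
import java.nio.file.{Files, Paths}

val files = Seq("chunk1.json", "chunk2.json")
for (name <- files) {
  // POST each generated file to the upload servlet
  val conn = new URL("http://yourapp.appspot.com/bulkupload")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setDoOutput(true)
  conn.setRequestProperty("Content-Type", "application/json")
  conn.getOutputStream.write(Files.readAllBytes(Paths.get(name)))
  println(s"$name -> HTTP ${conn.getResponseCode}")
  conn.disconnect()
}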
Please drop a comment on this if you have a better idea, guys.