I want to access an Amazon S3 bucket from a Scala application. I have set up the Scala IDE in my Eclipse, but when I try to run the application locally (Run As --> Scala Application), it gives the following error on the console:
Error: Could not find or load main class org.test.spark1.test
I am trying to run a simple word count application in which I access a file stored in my S3 bucket and store the results in another file. Please help me understand what the problem might be.
Note: I am using an Eclipse Maven project. My Scala application code is:
package org.test.spark1
import com.amazonaws._
import com.amazonaws.auth._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.amazonaws.services.s3._
import com.amazonaws.services.s3.model.GetObjectRequest
import java.io.File;
object test extends App {
  def main(args: Array[String]) {
    val myAccessKey = "here is my key"
    val mySecretKey = "here is my secret key"
    val bucket = "nlp.spark.apps"

    val conf = new SparkConf().setAppName("sample")
    val sc = new SparkContext(conf)

    val yourAWSCredentials = new BasicAWSCredentials(myAccessKey, mySecretKey)
    val amazonS3Client = new AmazonS3Client(yourAWSCredentials)
    // This will create a bucket for storage
    amazonS3Client.createBucket("nlp-spark-apps2")

    val s3data = sc.textFile("here is my url of text file")
    s3data.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _) // sum the counts for each word
      .saveAsTextFile("/home/hadoop/cluster-code2.txt")
  }
}
A possible solution I came across is that the Scala IDE does not automatically detect your main class:
Go to Menu --> "Run" --> "Run configurations"
Click on "Scala application" and on the icon for "New launch configuration"
For the project, select your project, and for the main class (which for some reason is not auto-detected) manually enter (in your case) org.test.spark1.test
Apply and Run
OR
You could also try to run your Spark job locally, without Eclipse, using spark-submit:
spark-submit --class org.test.spark1.test --master local[8] {path to assembly jar}
Another thing: you should never hardcode your AWS credentials. I suggest you use InstanceProfileCredentialsProvider. These credentials live in the instance metadata associated with the IAM role of the EC2 instance.
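For example, a minimal sketch (Java, but the same SDK calls work from Scala; the class name is just for illustration):
import com.amazonaws.auth.InstanceProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;
public class S3FromInstanceProfile {
    public static void main(String[] args) {
        // No hardcoded keys: the provider pulls temporary credentials for the
        // instance's IAM role from the EC2 instance metadata service.
        AmazonS3Client s3 = new AmazonS3Client(new InstanceProfileCredentialsProvider());
        System.out.println(s3.listBuckets());
    }
}
If you are not running on EC2 (e.g. testing locally), DefaultAWSCredentialsProviderChain is a reasonable alternative, since it also checks environment variables and the ~/.aws credentials file.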
I am trying to understand whether what I want to achieve is feasible.
We want to write a shared library step that allows us to produce a Kafka message.
To do this we use
@Grab(group = 'org.apache.kafka', module = 'kafka-clients', version = '3.2.0')
...
...
def producer = new KafkaProducer([
    "bootstrap.servers": bootstrapServers,
    // serializers
    "value.serializer" : "org.apache.kafka.common.serialization.StringSerializer",
    "key.serializer" : "org.apache.kafka.common.serialization.StringSerializer",
    // acknowledgement control
    "acks" : "all",
    // TLS config
    "ssl.truststore.type": "JKS",
    "security.protocol": "SSL",
    "ssl.enabled.protocols": "TLSv1.2",
    "ssl.protocol": "TLSv1.2",
    "ssl.truststore.location" : "<cacerts_location>",
    "ssl.truststore.password" : "changeit"
])
The method gets all of its parameters from outside, except ssl.truststore.location, which is provided through the node volume.
The problem, I realized, is that Java commands are executed in a different workspace. Say ssl.truststore.location is "/etc/pki/java/cacerts": using the readFile step I am able to read it, but when I try to read it with a plain Java call I get a NoSuchFileException.
I found that when I execute a Java command like
File folder = new File("").getAbsoluteFile();
my working dir is just "\", as if the code were executed in a completely empty sandbox unrelated to the Jenkins workspace.
My question is whether what I'm trying to achieve is doable within the Jenkins pipeline scope, and how to get it working.
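A hypothetical workaround sketch (plain Java, which is also valid Groovy), assuming the truststore bytes can be obtained some other way (e.g. via the pipeline's readFile step with Base64 encoding): write them to a temp file on the JVM that actually runs the producer and pass that path as ssl.truststore.location.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
public class TruststoreToTemp {
    // Writes the truststore bytes to a temp file and returns its path,
    // which can then be passed as the ssl.truststore.location value.
    public static Path materialize(byte[] truststoreBytes) throws IOException {
        Path tmp = Files.createTempFile("cacerts", ".jks");
        Files.write(tmp, truststoreBytes);
        return tmp;
    }
}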
I couldn't find a way to create a polyglot source out of multiple files in GraalVM.
What exactly I want to achieve:
I have a python project:
my-project:
  .venv/
    ...libs
  __main__.py
  src/
    __init__.py
    Service.py
Example sourcecode:
# __main__.py
from src.Service import Service
lambda url: Service(url)
# src/Service.py
import requests
class Service:
    def __init__(self, url):
        self.url = url
    def invoke(self):
        return requests.get(self.url)
This is a very simple example: there is an entry-point script, the project is structured in packages, and there is one external library (requests).
It works when I run it from the command line with python3 __main__.py, but I can't get it to work when embedding it in Java (it can't resolve the imports).
Example usage in Java:
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.graalvm.polyglot.Value;
import java.io.File;
import java.io.IOException;
public class Runner {
    public static void main(String[] args) throws IOException {
        Context context = Context.newBuilder("python")
            .allowExperimentalOptions(true)
            .allowAllAccess(true)
            .allowIO(true)
            .build();
        try (context) {
            // load lambda reference:
            Value reference = context.eval(Source.newBuilder("python", new File("/path/to/my-project/__main__.py")).build());
            // invoke lambda with `url` argument (returns `Service` object)
            Value service = reference.execute("http://google.com");
            // invoke `invoke` method of `Service` object and print response
            System.out.println("Response: " + service.getMember("invoke").execute());
        }
    }
}
It fails with Exception in thread "main" ModuleNotFoundError: No module named 'src'.
The same approach works for a JavaScript project (with an index.js similar to __main__.py): there it is able to resolve imports, i.e. GraalVM "sees" the other project files, but somehow it doesn't when using Python.
I also found out that Python is able to run a zip package with the project inside, but this doesn't work with GraalVM either.
Is there any chance to accomplish this? If not, maybe there is a tool similar to webpack for Python (if I could create a single-file bundle, it should also work).
Btw, I don't know Python at all, so I may be missing something.
Thanks for any help!
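One thing that might help (an untested sketch, and it assumes the GraalPy option python.PythonPath is available in the GraalVM build in use) is to add the project root to the embedded interpreter's module search path when building the context:
import org.graalvm.polyglot.Context;
public class PythonPathSketch {
    public static void main(String[] args) {
        // Assumption: "python.PythonPath" behaves like PYTHONPATH for GraalPy, so adding the
        // project root lets "from src.Service import Service" resolve inside the embedded context.
        try (Context context = Context.newBuilder("python")
                .allowExperimentalOptions(true)
                .allowAllAccess(true)
                .allowIO(true)
                .option("python.PythonPath", "/path/to/my-project")
                .build()) {
            context.eval("python", "import src.Service; print('src package resolved')");
        }
    }
}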
I am using the AWS Java SDK and trying to run some tests; getting:
Unable to load AWS credentials from the /AwsCredentials.properties file on the classpath
The credentials file at ~/.aws/ is correct per the AWS specs; I 777'd it to rule out access issues.
I am not using any IDE plug-ins; per the AWS docs, having a credentials file at ~/.aws/ should suffice. Does anyone have this working with just the SDK installed? If I hard-code the file path into the ClasspathPropertiesFileCredentialsProvider() request, it spits the same error back with that path instead of the AwsCredentials.properties string, which doesn't exist anywhere (yes, I tried making one of those in ~/.aws/ as well).
Thanks much for any insights; the code below is straight from Amazon:
import com.amazonaws.auth.ClasspathPropertiesFileCredentialsProvider;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sns.AmazonSNSClient;
import com.amazonaws.services.sns.model.PublishRequest;
import com.amazonaws.services.sns.model.PublishResult;
public class SNS {
    public static void main(String[] args) {
        AmazonSNSClient snsClient = new AmazonSNSClient(new ClasspathPropertiesFileCredentialsProvider());
        snsClient.setRegion(Region.getRegion(Regions.US_EAST_1));
        String msg = "this is a test";
        PublishRequest publishRequest = new PublishRequest("my arn", msg);
        PublishResult publishResult = snsClient.publish(publishRequest);
        System.out.println("MessageId - " + publishResult.getMessageId());
    }
}
If you use DefaultAWSCredentialsProviderChain instead of ClasspathPropertiesFileCredentialsProvider, it will automatically check various default locations for AWS credentials. (Documentation)
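For example, the snippet from the question only needs the provider swapped (a rough sketch; the class name is illustrative):
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.regions.Region;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.sns.AmazonSNSClient;
public class SNSWithDefaultChain {
    public static void main(String[] args) {
        // The default chain checks several locations, typically environment variables, Java
        // system properties, the shared ~/.aws credentials file and the EC2 instance profile.
        AmazonSNSClient snsClient = new AmazonSNSClient(new DefaultAWSCredentialsProviderChain());
        snsClient.setRegion(Region.getRegion(Regions.US_EAST_1));
    }
}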
Have you verified that your $HOME environment variable is set for the process you are running? The AWS SDK relies on $HOME to determine the proper location of your .aws folder.
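A quick, hypothetical way to check is to print what the process actually sees:
public class HomeCheck {
    public static void main(String[] args) {
        // print the home-related values mentioned above for the running process
        System.out.println("HOME      = " + System.getenv("HOME"));
        System.out.println("user.home = " + System.getProperty("user.home"));
    }
}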
Well, that didn't work the way I'd planned it; I couldn't get the .aws path onto the classpath (I tried adding it as an external class folder).
I ran the code below to find the actual classpath entries in my project:
import java.net.URL;
import java.net.URLClassLoader;
public class PrintClasspath {
    public static void main(String[] args) {
        // list every URL on the system classpath to find a directory to drop the properties file into
        ClassLoader cl = ClassLoader.getSystemClassLoader();
        URL[] urls = ((URLClassLoader) cl).getURLs();
        for (URL url : urls) {
            System.out.println(url.getFile());
        }
    }
}
and then dropped my AWS credentials into a new AwsCredentials.properties file in one of the directories from above (I had one directory; the rest were jar files).
I changed the property keys in the file to "accessKey" and "secretKey" from what was there (aws_access_key, aws_secret_access_key) and it worked.
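For reference, ClasspathPropertiesFileCredentialsProvider expects an AwsCredentials.properties on the classpath with exactly these two keys (placeholder values shown):
accessKey=YOUR_ACCESS_KEY_ID
secretKey=YOUR_SECRET_ACCESS_KEY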
Thanks to everyone for their inputs.
I am learning Magnolia CMS and am trying to use the Resources module. I actually have two problems.
1. I cannot upload a bunch of files. I have a few files now, but over time I will have to upload more. The module's import feature wants me to upload an XML file, but I don't know how to generate it properly. I tried importing through JCR, but after that I can't see those files in the Resources app. I also tried configuring the module to search for files on the file system: I set fileSystemLoader to the class info.magnolia.module.resources.loaders.FileSystemResourceLoader and set a path, but that did not work either. Maybe I just don't understand at what point the file upload feature is supposed to be activated; at application start-up it did not work.
2. How do I properly use these resources in my template? Which FTL tag should I use?
I don't use STK module.
Thanks for your patience if you decide to help me.
Magnolia version: 5.2 CE
JDK (IcedTea): 1.7.0_51
OS: Linux/OpenSUSE 12.3
I previously used the script below (on 4.5.x) to perform this task via the Groovy module. It should work on 5.2 as well.
import static groovy.io.FileType.FILES
import info.magnolia.jcr.util.NodeUtil
import org.apache.commons.lang.StringUtils
import info.magnolia.cms.util.ContentUtil
class Globals {
    static def folderName = '//some/folder/in/filesystem/on/server'
}

def loadImageFolder() {
    session = ctx.getJCRSession("resources")
    parentFolder = session.getNode("/templating-kit/jelinek-image/obrazky-produkty")
    new File(Globals.folderName).eachFileRecurse(FILES) {
        name = it.name
        // set file name
        extension = StringUtils.substringAfterLast(name, '.')
        name = StringUtils.substringBeforeLast(name, '.')
        // persist
        resource = NodeUtil.createPath(parentFolder, name, "mgnl:content")
        // persistResource
        resource.setProperty("mgnl:template", "resources:binary")
        resource.setProperty("extension", extension)
        binary = resource.addNode("binary", "mgnl:resource")
        binary.setProperty("jcr:data", new FileInputStream(it.absolutePath))
        binary.setProperty("extension", extension)
        binary.setProperty("fileName", name)
        binary.setProperty("jcr:mimeType", "image/" + extension)
        binary.setProperty("size", it.length())
    }
    session.save()
}

loadImageFolder()
return 'done'
I'm new to Cascading/Hadoop and am trying to run a simple example in local mode (i.e. in memory). The example just copies a file:
import java.util.Properties;
import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.flow.FlowDef;
import cascading.flow.local.LocalFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextLine;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;
public class CascadingTest {
    public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass( properties, CascadingTest.class );
        FlowConnector flowConnector = new LocalFlowConnector();
        // create the source tap
        Tap inTap = new Hfs( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\rain.txt" );
        // create the sink tap
        Tap outTap = new Hfs( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\out.txt" );
        // specify a pipe to connect the taps
        Pipe copyPipe = new Pipe( "copy" );
        // connect the taps, pipes, etc., into a flow
        FlowDef flowDef = FlowDef.flowDef()
            .addSource( copyPipe, inTap )
            .addTailSink( copyPipe, outTap );
        // run the flow
        Flow flow = flowConnector.connect( flowDef );
        flow.complete();
    }
}
Here is the error I'm getting:
09-25-12 11:30:38,114 INFO - AppProps - using app.id: 9C82C76AC667FDAA2F6969A0DF3949C6
Exception in thread "main" cascading.flow.planner.PlannerException: could not build flow from assembly: [java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf]
at cascading.flow.planner.FlowPlanner.handleExceptionDuringPlanning(FlowPlanner.java:515)
at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:84)
at cascading.flow.FlowConnector.connect(FlowConnector.java:454)
at com.x.y.CascadingTest.main(CascadingTest.java:37)
Caused by: java.lang.ClassCastException: java.util.Properties cannot be cast to org.apache.hadoop.mapred.JobConf
at cascading.tap.hadoop.Hfs.sourceConfInit(Hfs.java:78)
at cascading.flow.local.LocalFlowStep.initTaps(LocalFlowStep.java:77)
at cascading.flow.local.LocalFlowStep.getInitializedConfig(LocalFlowStep.java:56)
at cascading.flow.local.LocalFlowStep.createFlowStepJob(LocalFlowStep.java:135)
at cascading.flow.local.LocalFlowStep.createFlowStepJob(LocalFlowStep.java:38)
at cascading.flow.planner.BaseFlowStep.getFlowStepJob(BaseFlowStep.java:588)
at cascading.flow.BaseFlow.initializeNewJobsMap(BaseFlow.java:1162)
at cascading.flow.BaseFlow.initialize(BaseFlow.java:184)
at cascading.flow.local.planner.LocalPlanner.buildFlow(LocalPlanner.java:78)
... 2 more
Just to provide a bit more detail: You can't mix local and hadoop classes in Cascading, as they assume different and incompatible environments. What's happening in your case is that you're trying to create a local flow with hadoop taps, the latter expecting a hadoop JobConf instead of the Properties object used to configure local taps.
Your code will work if you use cascading.tap.local.FileTap instead of cascading.tap.hadoop.Hfs.
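For illustration, a sketch of just the taps rewritten with the local classes (assuming cascading.tap.local.FileTap and cascading.scheme.local.TextLine are on the classpath; the rest of the code from the question stays the same):
import cascading.scheme.local.TextLine;
import cascading.tap.Tap;
import cascading.tap.local.FileTap;
public class LocalTaps {
    public static void main(String[] args) {
        // local-mode equivalents of the Hfs taps: FileTap plus the local TextLine scheme,
        // which are configured with Properties rather than a Hadoop JobConf
        Tap inTap = new FileTap( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\rain.txt" );
        Tap outTap = new FileTap( new TextLine(), "D:\\git_workspace\\Impatient\\part1\\data\\out.txt" );
        System.out.println( inTap + " -> " + outTap );
    }
}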
Welcome to Cascading -
I just answered on the Cascading user list, but in brief the problem is a mix of local and Hadoop mode classes: this code uses a LocalFlowConnector, but then uses Hfs taps.
When I revert to the classes used in the "Impatient" tutorial, it runs correctly:
https://gist.github.com/3784194
Yes, you need to use an LFS (Local File System) tap instead of HFS (Hadoop File System).
You can also test your code using JUnit test cases (with the cascading-unittest jar) in local mode itself, from Eclipse:
http://www.cascading.org/2012/08/07/cascading-for-the-impatient-part-6/