How to solve the following in Apache Spark - Java

Consider a retail scenario where an array of (K,V) input holds (product name, price) pairs, as shown below. The value of every key needs to be reduced by 500 for a discount offer.
Use Spark logic to achieve the above requirement.
Input
{(Jeans,2000),(Smart phone,10000),(Watch,3000)}
Expected output
{(Jeans,1500),(Smart phone,9500),(Watch,2500)}
I have tried the code below, but I'm getting errors. Please help me fix them.
import java.util.Arrays;
import java.util.Iterator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class PairRDDAgg {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        SparkConf conf = new SparkConf().setAppName("Line_Count").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        JavaRDD<String> input = sc.textFile("C:/Users/xxxx/Documents/retail.txt");
        JavaPairRDD<String, Integer> counts = input.mapValues(new Function() {
            /**
             *
             */
            private static final long serialVersionUID = 1L;

            public Integer call(Integer i) {
                return (i - 500);
            }
        });
        System.out.println(counts.collect());
        sc.close();
    }
}

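Use the mapValues() function. It is only defined on pair RDDs (JavaPairRDD in Java), so the lines read from the text file first have to be turned into (name, price) pairs, for example with mapToPair().
An example for your scenario (in Scala syntax) would be:
rdd.mapValues(x => x - 500)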
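In the question's Java code, mapValues is called on a JavaRDD<String>, which has no such method, and the anonymous new Function() is a raw type. A minimal sketch of the same idea with Java 8 lambdas, assuming Spark's Java API is on the classpath and that each line of the text file looks like "Jeans,2000" (that file format and the parseLine helper are assumptions, not part of the question):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class PairRDDAgg {
    // Discount subtracted from every price (from the question).
    static final int DISCOUNT = 500;

    // Hypothetical helper: parses one line of the assumed "name,price" format.
    static Tuple2<String, Integer> parseLine(String line) {
        String[] parts = line.split(",");
        return new Tuple2<>(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("PairRDDAgg").setMaster("local");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Default: build the pair RDD directly from the sample data.
            List<Tuple2<String, Integer>> data = Arrays.asList(
                    new Tuple2<>("Jeans", 2000),
                    new Tuple2<>("Smart phone", 10000),
                    new Tuple2<>("Watch", 3000));
            JavaPairRDD<String, Integer> prices = sc.parallelizePairs(data);

            // If a file path is given, read it and turn each line into a
            // (name, price) pair first; mapValues only exists on pair RDDs.
            if (args.length > 0) {
                JavaRDD<String> lines = sc.textFile(args[0]);
                prices = lines.mapToPair(PairRDDAgg::parseLine);
            }

            // mapValues keeps each key and transforms only its value.
            JavaPairRDD<String, Integer> discounted = prices.mapValues(price -> price - DISCOUNT);
            System.out.println(discounted.collect());
        }
    }
}
```

Run it without arguments to use the sample data from the question, or pass the path of the text file as the first argument. The typed lambda (a Function<Integer, Integer>) replaces the raw anonymous class, which is where the original compile errors came from.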

You can try this:
scala> val dataset = spark.createDataset(Seq(("Jeans",2000),("Smart phone",10000),("Watch",3000)))
dataset: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
scala> dataset.map ( x => (x._1, x._2 - 500) ).show
+-----------+----+
|         _1|  _2|
+-----------+----+
|      Jeans|1500|
|Smart phone|9500|
|      Watch|2500|
+-----------+----+

Related

How to run a particular test step of SoapUI in Java

I want to run a particular test step of my SoapUI test case from Java code. My problem is that when I try to run at the test step level, run() needs a TestCaseRunner argument, which is an anonymous inner type, and a TestCaseRunContext, which is an interface. Do I have to implement both to run the step? If yes, can anyone please share a sample of how to do that?
Here's my code:
package com.testauto.soaprunner.soap.impl;

import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.eviware.soapui.SoapUI;
import com.eviware.soapui.StandaloneSoapUICore;
import com.eviware.soapui.impl.wsdl.WsdlProject;
import com.eviware.soapui.impl.wsdl.WsdlTestSuite;
import com.eviware.soapui.impl.wsdl.testcase.WsdlTestCase;
import com.eviware.soapui.impl.wsdl.testcase.WsdlTestCaseRunner;
import com.eviware.soapui.impl.wsdl.teststeps.WsdlTestStep;
import com.eviware.soapui.model.TestPropertyHolder;
import com.eviware.soapui.model.iface.MessageExchange;
import com.eviware.soapui.model.propertyexpansion.PropertyExpansionUtils;
import com.eviware.soapui.model.testsuite.TestCase;
import com.eviware.soapui.model.testsuite.TestCaseRunContext;
import com.eviware.soapui.model.testsuite.TestProperty;
import com.eviware.soapui.model.testsuite.TestStepResult;
import com.eviware.soapui.model.testsuite.TestSuite;
import com.eviware.soapui.support.types.StringToObjectMap;
import com.eviware.soapui.support.types.StringToStringsMap;
import com.testauto.soaprunner.data.InputData;
import com.testauto.soaprunner.data.ReportData;

public class RunTestImpl {
    static Logger logger = LoggerFactory.getLogger(RunTestImpl.class);
    List<ReportData> reportDatList = new ArrayList<ReportData>();

    public List<ReportData> process(Map<String, String> readDataMap, InputData input, Map<List<String>, String> configurationMap, List<String> configuration, WsdlTestSuite testSuite)
    {
        List<ReportData> report = new ArrayList<ReportData>();
        logger.info("Into the Class for running test cases");
        try {
            report = getTestSuite(readDataMap, input, configurationMap, configuration, testSuite);
        }
        catch (Exception e)
        {
            logger.info(e.getMessage());
        }
        return report;
    }

    private List<ReportData> getTestSuite(Map<String, String> readDataMap, InputData input, Map<List<String>, String> configurationMap, List<String> configuration, WsdlTestSuite testSuite) throws Exception {
        ReportData report = new ReportData();
        logger.info("Into the Class for running test cases");
        String suiteName = "";
        String reportStr = "";
        List<String> testCaseNameList = setPropertyValues(readDataMap, input);
        WsdlTestCaseRunner runner = null;
        List<TestSuite> suiteList = new ArrayList<TestSuite>();
        List<TestCase> caseList = new ArrayList<TestCase>();
        SoapUI.setSoapUICore(new StandaloneSoapUICore(true));
        System.out.println("testcase name " + configurationMap.get(configuration));
        // WsdlTestCase testCase = testSuite.getTestCaseByName(input.getApiName() + "_" + testCaseName + "_TestCase");
        WsdlTestCase testCase = testSuite.getTestCaseByName("my_TESTCASE");
        WsdlTestStep tesStep = testCase.getTestStepByName(configurationMap.get(testCaseNameList));
        System.out.println("test case name:" + testCase.getName());
        report.setTestCase(testCase.getName());
        suiteList.add(testSuite);
        runner = tesStep.run(?, ?);
        return reportDatList;
    }

    private List<String> setPropertyValues(Map<String, String> readDataMap, InputData input) {
        String testCaseName = "";
        TestPropertyHolder holder = PropertyExpansionUtils.getGlobalProperties();
        List<String> dataConfigurationList = new ArrayList<String>();
        Iterator entries = readDataMap.entrySet().iterator();
        while (entries.hasNext()) {
            Entry thisEntry = (Entry) entries.next();
            String key = (String) thisEntry.getKey();
            String value = (String) thisEntry.getValue();
            testCaseName += key;
            holder.setPropertyValue(key, value);
            dataConfigurationList.add(key);
        }
        System.out.println("testCaseName" + testCaseName);
        return dataConfigurationList;
    }
}

After trying different things, I got something like this:
MockTestRunner runner = new MockTestRunner(testStep.getTestCase());
TestCaseRunContext context = new MockTestRunContext(runner, testStep);
TestStepResult testStepResult = testStep.run(runner, context);
I don't know why it works, but this trick worked for me. If someone knows the reason behind it, please share.

How to display the current accumulator value updated in a DStream?

I am running a Java JAR. The accumulator adds up the stream values. The problem is that I want to display the value in my UI every time it increments, or at a specific periodic interval.
But since the accumulator's value can only be read from the driver program, I am not able to access it until the process finishes its execution. Any idea how I can access this value periodically?
My code is given below:
package com.spark;

import java.util.HashMap;
import java.util.Map;
import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class KafkaSpark {
    /**
     * @param args
     */
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("Simple Application");
        conf.setMaster("local");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(5000));
        final Accumulator<Integer> accum = jssc.sparkContext().accumulator(0);
        Map<String, Integer> topicMap = new HashMap<String, Integer>();
        topicMap.put("test", 1);
        JavaPairDStream<String, String> lines = KafkaUtils.createStream(jssc,
                "localhost:2181", "group1", topicMap);
        JavaDStream<Integer> map = lines
                .map(new Function<Tuple2<String, String>, Integer>() {
                    public Integer call(Tuple2<String, String> v1)
                            throws Exception {
                        if (v1._2.contains("the")) {
                            accum.add(1);
                            return 1;
                        }
                        return 0;
                    }
                });
        map.print();
        jssc.start();
        jssc.awaitTermination();
        System.out.println("*************" + accum.value());
        System.out.println("done");
    }
}
I am streaming data using Kafka.

In Spark Streaming, the actual processing only starts when jssc.start() is called. From then on Spark is in control and runs the batch loop, so the System.out.println calls in your main method are executed only once; they are not re-executed for every batch.
For output operations, check the documentation. You can use:
print()
foreachRDD()
saveAsObjectFiles(), saveAsTextFiles() or saveAsHadoopFiles()
Hope this helps.

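jssc.start();
// Use this loop instead of jssc.awaitTermination(); because of Thread.sleep,
// main must also declare "throws InterruptedException".
while (true) {
    System.out.println("current:" + accum.value());
    Thread.sleep(1000);
}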
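A variant of the loop above that also exits when the streaming context stops: a sketch assuming Spark 1.3 or later, where JavaStreamingContext.awaitTerminationOrTimeout(long) is available, and reusing the question's Accumulator<Integer>. The class and method names are hypothetical.

```java
import org.apache.spark.Accumulator;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public final class AccumulatorPoller {
    private AccumulatorPoller() {
    }

    /**
     * Prints the accumulator's value on the driver every intervalMs milliseconds
     * until the streaming context is stopped. Call it after jssc.start(),
     * in place of jssc.awaitTermination().
     */
    public static void printUntilStopped(JavaStreamingContext jssc,
                                         Accumulator<Integer> accum,
                                         long intervalMs) throws InterruptedException {
        // awaitTerminationOrTimeout blocks for up to intervalMs and returns
        // true only once the context has terminated.
        while (!jssc.awaitTerminationOrTimeout(intervalMs)) {
            // value() can be read here because this loop runs on the driver.
            System.out.println("current: " + accum.value());
        }
        System.out.println("final: " + accum.value());
    }
}
```

In the question's main this would be AccumulatorPoller.printUntilStopped(jssc, accum, 5000); right after jssc.start(), with main declared as throws InterruptedException. Note also that the question uses setMaster("local"): a receiver-based Kafka stream occupies that single thread, and the Spark Streaming programming guide recommends local[n] with n greater than the number of receivers (for example local[2]) so batches actually get processed.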

Jahmm's KMeansLearner

I'm new to the Jahmm package, and I'm also new to Java.
I'm getting an error in KMeansLearner that says:
Incompatible types: List<ObservationVector> cannot be converted to
List<? extends ObservationVector>
What does this mean? So far I only have observation vectors, and I imported the class at the top of the file. Can anyone tell me how to fix this? And if I want to use <ObservationReal> instead, how does that affect the code?
Here is my code:
package jahmm;

import be.ac.ulg.montefiore.run.jahmm.*;
import be.ac.ulg.montefiore.run.jahmm.ForwardBackwardCalculator;
import be.ac.ulg.montefiore.run.jahmm.Hmm;
import be.ac.ulg.montefiore.run.jahmm.KMeansCalculator;
import be.ac.ulg.montefiore.run.jahmm.ObservationVector;
import be.ac.ulg.montefiore.run.jahmm.OpdfDiscrete;
import be.ac.ulg.montefiore.run.jahmm.OpdfMultiGaussian;
import be.ac.ulg.montefiore.run.jahmm.ViterbiCalculator;
import be.ac.ulg.montefiore.run.jahmm.draw.GenericHmmDrawerDot;
import be.ac.ulg.montefiore.run.jahmm.io.ObservationReader;
import be.ac.ulg.montefiore.run.jahmm.io.ObservationSequencesReader;
import be.ac.ulg.montefiore.run.jahmm.io.ObservationVectorReader;
import be.ac.ulg.montefiore.run.jahmm.learn.KMeansLearner;
import be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchLearner;
import be.ac.ulg.montefiore.run.jahmm.learn.BaumWelchScaledLearner;
import be.ac.ulg.montefiore.run.jahmm.toolbox.MarkovGenerator;
import be.ac.ulg.montefiore.run.jahmm.ObservationReal;
import be.ac.ulg.montefiore.run.jahmm.OpdfInteger;
import java.io.*;
import java.lang.*;
import java.util.*;

/**
 *
 * @author
 */
public class Jahmm {
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        // TODO code application logic here
        //Instances instances;
        Reader reader;
        int i, j, k;
        try {
            //String filename = argv[0];
            String filex = "dta_eat.seq";
            //filex = "Desktop\R\dt_junto.csv";
            String csvFileToRead = filex;
            reader = new FileReader(filex);
            List<ObservationVector> sequences =
                    ObservationSequencesReader.readSequence(new ObservationVectorReader(),
                            reader);
            reader.close();
            OpdfMultiGaussianFactory gMix = new OpdfMultiGaussianFactory(3);
            KMeansLearner<ObservationVector> kml;
            kml = new KMeansLearner<ObservationVector>(6,
                    gMix, sequences);
            Hmm<ObservationVector> initHmm = kml.iterate();
            //Hmm<ObservationVector> fittedHmm = kml.learn();
            //Hmm<ObservationVector> initHmm = kml.iterate();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
I would really appreciate your help.

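KMeansLearner expects a list of sequences, so you need a List of Lists. Read the file with readSequences() instead of readSequence():
Reader reader = new FileReader("vectors.seq");
List<List<ObservationVector>> v = ObservationSequencesReader.readSequences(new ObservationVectorReader(2), reader);
reader.close();
See this example.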
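The compile error comes from the shape of the list, not from the observation type. A stdlib-only sketch of the same type check, assuming KMeansLearner's constructor declares its sequences parameter as List<? extends List<? extends O>>; Obs and countObservations are hypothetical stand-ins for ObservationVector and that constructor:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequenceShapes {
    // Hypothetical stand-in for an observation type such as ObservationVector.
    static final class Obs {
        final double[] values;

        Obs(double... values) {
            this.values = values;
        }
    }

    // Hypothetical method with the parameter shape assumed for KMeansLearner:
    // a list of sequences, each sequence being a list of observations.
    static <O> int countObservations(List<? extends List<? extends O>> sequences) {
        int total = 0;
        for (List<? extends O> sequence : sequences) {
            total += sequence.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // A single sequence: a flat list of observations.
        List<Obs> oneSequence = Arrays.asList(new Obs(1, 2), new Obs(3, 4), new Obs(5, 6));

        // countObservations(oneSequence); // compile error: Obs is not a List

        // A list of sequences, which is what the learner's parameter expects.
        List<List<Obs>> sequences = new ArrayList<>();
        sequences.add(oneSequence);
        System.out.println(countObservations(sequences)); // 3 observations in 1 sequence
    }
}
```

The same shape would apply to ObservationReal: the learner would still need a List<List<ObservationReal>>, one inner list per sequence, and presumably only the reader and the Opdf factory change to their real-valued counterparts.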

MapReduce-Cassandra wordcount compilation error: ConfigHelper not found

I am trying to run a WordCount MapReduce program that reads and counts data stored in a Cassandra table (column family), but when I compile my program I get the same error repeatedly. Below are my source code and the errors I got. Can anyone help me solve this issue? Thanks in advance.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.*;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.*;
import org.apache.cassandra.utils.ByteBufferUtil;

/**
 * This sums the word count stored in the input_words_count ColumnFamily for the key "key-if-verse1".
 *
 * Output is written to a text file.
 */
public class WordCountCounters extends Configured implements Tool
{
    private static final Logger logger = LoggerFactory.getLogger(WordCountCounters.class);
    static final String COUNTER_COLUMN_FAMILY = "input_words";
    private static final String OUTPUT_PATH_PREFIX = "/Users/Deepu/Documents/dse-3.2.4/dse-data/word_count_counters";

    public static void main(String[] args) throws Exception
    {
        // Let ToolRunner handle generic command-line options
        ToolRunner.run(new Configuration(), new WordCountCounters(), args);
        System.exit(0);
    }

    public static class SumMapper extends Mapper<ByteBuffer, SortedMap<ByteBuffer, IColumn>, Text, LongWritable>
    {
        public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context) throws IOException, InterruptedException
        {
            long sum = 0;
            for (IColumn column : columns.values())
            {
                logger.debug("read " + key + ":" + column.name() + " from " + context.getInputSplit());
                sum += ByteBufferUtil.toLong(column.value());
            }
            context.write(new Text(ByteBufferUtil.string(key)), new LongWritable(sum));
        }
    }

    public int run(String[] args) throws Exception
    {
        Job job = new Job(getConf(), "wordcountcounters");
        job.setJarByClass(WordCountCounters.class);
        job.setMapperClass(SumMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH_PREFIX));
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setPartitioner(job.getConfiguration(), "org.apache.cassandra.dht.RandomPartitioner");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), WordCount.KEYSPACE, WordCountCounters.COUNTER_COLUMN_FAMILY);
        SlicePredicate predicate = new SlicePredicate().setSlice_range(
                new SliceRange().
                        setStart(ByteBufferUtil.EMPTY_BYTE_BUFFER).
                        setFinish(ByteBufferUtil.EMPTY_BYTE_BUFFER).
                        setCount(100));
        ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);
        job.waitForCompletion(true);
        return 0;
    }
}
The compilation errors are:

Perhaps because you commented out these two lines:
//import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
//import org.apache.cassandra.hadoop.ConfigHelper;
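An error saying ConfigHelper cannot be found can also mean that the Cassandra JAR is simply not on the classpath, even when the import lines are present. A small stdlib-only check shows whether the classes can be loaded; the class name ClasspathCheck is hypothetical, and note that it checks the runtime classpath it is launched with, not the one javac was given:

```java
public class ClasspathCheck {
    // Returns true if the named class can be loaded, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The two Cassandra classes the WordCountCounters program imports.
        String[] required = {
            "org.apache.cassandra.hadoop.ConfigHelper",
            "org.apache.cassandra.hadoop.ColumnFamilyInputFormat"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
        }
    }
}
```

Run it with the same -cp value you pass to javac; if either class is reported as NOT found, the Cassandra JAR (and its dependencies) needs to be added to that classpath.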

Running a MapReduce class in another Java program

I wrote a MapReduce class and created a JAR file from it. Now I want to use this JAR in another Java program.
Can anyone help me with how to do this?
Thanks.
Here is my MapReduce program:
package org.apache.cassandra.com;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.Map.Entry;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class CassandraSumLib extends Configured implements Tool
{
    public CassandraSumLib() {
    }

    static final String KEYSPACE = "weather";
    static final String COLUMN_FAMILY = "momentinfo1";
    static final String OUTPUT_PATH = "/tmp/OutPut";
    // Refer to this class, not CassandraSum, so the library JAR does not depend on the caller
    private static final Logger logger = LoggerFactory.getLogger(CassandraSumLib.class);

    public int CassandraSum(String[] args) throws Exception
    {
        return ToolRunner.run(new Configuration(), new CassandraSumLib(), args);
    }

    ///////////////////////////////////////////////////////////
    // CqlPagingInputFormat delivers both the keys and the columns as Map<String, ByteBuffer>
    public static class Summap extends Mapper<Map<String, ByteBuffer>, Map<String, ByteBuffer>, Text, DoubleWritable>
    {
        Text word = new Text("SUM");
        float temp;

        public void map(Map<String, ByteBuffer> keys, Map<String, ByteBuffer> columns, Context context) throws IOException, InterruptedException
        {
            for (Entry<String, ByteBuffer> column : columns.entrySet())
            {
                if (!"column".equals(column.getKey()))
                    continue;
                temp = ByteBufferUtil.toFloat(column.getValue());
                //System.out.println(temp);
                context.write(word, new DoubleWritable(temp));
                //System.out.println(word + " " + temp);
            }
        }
    }

    ///////////////////////////////////////////////////////////
    public static class Sumred extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
    {
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context) throws IOException, InterruptedException
        {
            Double sum = 0.0;
            for (DoubleWritable val : values) {
                // System.out.println(val.get());
                sum += val.get();
            }
            context.write(key, new DoubleWritable(sum));
        }
    }

    ///////////////////////////////////////////////////////////
    public int run(String[] args) throws Exception
    {
        Job job = new Job(getConf(), "SUM");
        // The mapper and reducer live in this JAR, so locate the job JAR by this class
        job.setJarByClass(CassandraSumLib.class);
        job.setMapperClass(Summap.class);
        JobConf conf = new JobConf(getConf(), CassandraSumLib.class);
        // conf.setNumMapTasks(1000);
        // conf.setNumReduceTasks(900);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setCombinerClass(Sumred.class);
        job.setReducerClass(Sumred.class);
        job.setOutputKeyClass(Text.class);
        job.setNumReduceTasks(900);
        job.setOutputValueClass(DoubleWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(OUTPUT_PATH));
        job.setInputFormatClass(CqlPagingInputFormat.class);
        ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
        ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), KEYSPACE, COLUMN_FAMILY);
        ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner");
        CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "3");
        job.waitForCompletion(true);
        return 0;
    }
}
I want to call this class from another program. Here is my second program, which calls the first one:
package org.apache.cassandra.com;

import org.apache.hadoop.util.RunJar;
import org.apache.cassandra.com.CassandraSumLib;

public class CassandraSum {
    public static void main(String[] args) throws Exception {
        CassandraSumLib CSL = new CassandraSumLib();
        CSL.??? // which method should I call here?
    }
}
Thanks.

Steps to add a JAR file in Eclipse:
1. Right-click the project.
2. Click Build Path > Configure Build Path.
3. Click Java Build Path.
4. Click the Libraries tab.
5. Click Add External JARs.
6. Choose the JAR file.
7. Click OK.

Add the JAR to the classpath of the second program. If you are compiling or running from the command line, use the -cp option.
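Once the JAR containing CassandraSumLib is on the classpath, the second program does not need a special method: CassandraSumLib implements Tool, so it can be handed to Hadoop's ToolRunner, which parses the generic Hadoop options and then calls the tool's run(String[]) method. The question's own CSL.CassandraSum(args) wraps exactly this call. A minimal sketch, assuming the Hadoop and Cassandra JARs are on the classpath as well:

```java
package org.apache.cassandra.com;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class CassandraSum {
    public static void main(String[] args) throws Exception {
        // ToolRunner applies generic options such as -D key=value or -conf file
        // to the Configuration and passes the remaining arguments to
        // CassandraSumLib.run(String[]).
        int exitCode = ToolRunner.run(new Configuration(), new CassandraSumLib(), args);
        // run() in the question returns 0 after job.waitForCompletion(true).
        System.exit(exitCode);
    }
}
```

Using the exit code from run() lets shell scripts or schedulers that launch this program detect a failed job, once run() is changed to return a non-zero value on failure.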
