Apache Apex: How to write data from a Kafka topic to the HDFS filesystem? - java

I am trying to read data from a Kafka topic and write it to the HDFS filesystem. I built my project using Apache Apex Malhar from https://github.com/apache/apex-malhar/tree/master/examples/kafka.
Unfortunately, after setting up the Kafka properties and the Hadoop configuration, no data is created in my HDFS 2.6.0 system.
PS: the console doesn't show any error and everything seems to work fine.
Here is the code I am using for my app:
public class TestConsumer {
public static void main(String[] args) {
Consumer consumerThread = new Consumer(KafkaProperties.TOPIC);
consumerThread.start();
ApplicationTest a = new ApplicationTest();
try {
a.testApplication();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Here is the ApplicationTest class, taken from the Apex Malhar example:
package org.apache.apex.examples.kafka.kafka2hdfs;
import org.apache.log4j.Logger;
import javax.validation.ConstraintViolationException;
import org.junit.Rule;
import org.apache.apex.malhar.kafka.AbstractKafkaInputOperator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.net.NetUtils;
import com.datatorrent.api.LocalMode;
import info.batey.kafka.unit.KafkaUnitRule;
/**
* Test the DAG declaration in local mode.
*/
public class ApplicationTest
{
private static final Logger LOG = Logger.getLogger(ApplicationTest.class);
private static final String TOPIC = "kafka2hdfs";
private static final int zkPort = NetUtils.getFreeSocketPort();
private static final int brokerPort = NetUtils.getFreeSocketPort();
private static final String BROKER = "localhost:" + brokerPort;
private static final String FILE_NAME = "test";
private static final String FILE_DIR = "./target/tmp/FromKafka";
// broker port must match properties.xml
@Rule
private static KafkaUnitRule kafkaUnitRule = new KafkaUnitRule(zkPort, brokerPort);
public void testApplication() throws Exception
{
try {
// run app asynchronously; terminate after results are checked
LocalMode.Controller lc = asyncRun();
lc.shutdown();
} catch (ConstraintViolationException e) {
LOG.error("constraint violations: " + e.getConstraintViolations());
}
}
private Configuration getConfig()
{
Configuration conf = new Configuration(false);
String pre = "dt.operator.kafkaIn.prop.";
conf.setEnum(pre + "initialOffset", AbstractKafkaInputOperator.InitialOffset.EARLIEST);
conf.setInt(pre + "initialPartitionCount", 1);
conf.set(pre + "topics", TOPIC);
conf.set(pre + "clusters", BROKER);
pre = "dt.operator.fileOut.prop.";
conf.set(pre + "filePath", FILE_DIR);
conf.set(pre + "baseName", FILE_NAME);
conf.setInt(pre + "maxLength", 40);
conf.setInt(pre + "rotationWindows", 3);
return conf;
}
private LocalMode.Controller asyncRun() throws Exception
{
Configuration conf = getConfig();
LocalMode lma = LocalMode.newInstance();
lma.prepareDAG(new KafkaApp(), conf);
LocalMode.Controller lc = lma.getController();
lc.runAsync();
return lc;
}
}

After runAsync() and before shutdown(), you need to wait for the expected results (otherwise the DAG will exit immediately). That is actually what happens in the example.
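For example, a minimal sketch of such a wait (not the exact Malhar test code; it assumes it lives inside ApplicationTest so FILE_DIR and the LocalMode import are in scope) could poll the output directory before shutting down:

// Sketch: poll the output directory until a part file shows up (or a timeout
// expires), then shut the local cluster down. Adjust the timeout as needed.
private void runAndWaitForOutput() throws Exception
{
  LocalMode.Controller lc = asyncRun();
  java.io.File outDir = new java.io.File(FILE_DIR);
  long deadline = System.currentTimeMillis() + 60_000; // wait up to 60 seconds
  while (System.currentTimeMillis() < deadline) {
    java.io.File[] parts = outDir.listFiles();
    if (parts != null && parts.length > 0) {
      break; // the fileOut operator has started writing
    }
    Thread.sleep(500);
  }
  lc.shutdown();
}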

Related

How to get a recorded wav file using Twilio Java API

Could somebody show me how, using the Twilio Java API (not raw REST requests), I can get the recorded file (.wav) of a specific call?
I have read all the articles related to recording (https://support.twilio.com/hc/en-us/sections/205104748-Recording), but none of them shows how to do that with the Java API.
I use this code as a starting point, assuming the CALL_SID is known:
import com.twilio.Twilio;
import com.twilio.base.ResourceSet;
import com.twilio.rest.api.v2010.account.Recording;
import com.twilio.rest.api.v2010.account.RecordingReader;
public class DeleteRecordings1 {
private static final String ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXX";
private static final String AUTH_TOKEN = "999aa999aaa999aaaa999";
private static final String CALL_SID = "CA83837718818gdgdg";
public static void main(String[] args) {
try {
Twilio.init(ACCOUNT_SID, AUTH_TOKEN);
RecordingReader recordingReader = Recording.reader();
recordingReader.setCallSid(CALL_SID);
ResourceSet<Recording> recordings = recordingReader.read();
String recordingSid;
for (Recording recording: recordings) {
recordingSid = recording.getSid();
//HERE! I want to retrieve the .wav file associated with that RECORD_SID
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Here is the final code, in case it helps somebody:
import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import com.twilio.Twilio;
import com.twilio.base.ResourceSet;
import com.twilio.rest.api.v2010.account.Recording;
import com.twilio.rest.api.v2010.account.RecordingReader;
public class GetCallRecordings {
private static final String ACCOUNT_SID = "ACXXXXXXXXXXXXXXXXX";
private static final String AUTH_TOKEN = "999aa999aaa999aaaa999";
private static final String CALL_SID = "CA83837718818gdgdg";
private static final String TWILIO_RES_URL = "https://api.twilio.com/2010-04-01/Accounts";
private static final String REC_EXT = ".mp3";
private static final String RUTA_RECS = "C:/recursos/grabaciones/";
public static void main(String[] args) {
try {
Twilio.init(ACCOUNT_SID, AUTH_TOKEN);
RecordingReader recordingReader = Recording.reader();
recordingReader.setCallSid(CALL_SID);
ResourceSet<Recording> recordings = recordingReader.read();
String recordingSid;
String urlGrabacion;
String locGrabacion;
InputStream in;
for (Recording recording : recordings) {
recordingSid = recording.getSid();
urlGrabacion = TWILIO_RES_URL + "/" + ACCOUNT_SID + "/Recordings/" + recordingSid + REC_EXT;
locGrabacion = RUTA_RECS + CALL_SID + "_" + recordingSid + REC_EXT;
System.out.println("Recuperando grabacion " + recordingSid);
System.out.println("Ubicacion remota " + urlGrabacion);
if (!new File(RUTA_RECS).exists()) {
new File(RUTA_RECS).mkdirs();
}
in = new URL(urlGrabacion).openStream();
Files.copy(in, Paths.get(locGrabacion), StandardCopyOption.REPLACE_EXISTING);
System.out.println("Ubicacion local " + locGrabacion);
in.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Once you know the recording SID, for example RE557ce644e5ab84fa21cc21112e22c485,
you can get a .wav file at https://api.twilio.com/2010-04-01/Accounts/ACXXXXX.../Recordings/RE557ce644e5ab84fa21cc21112e22c485.wav
and a .mp3 file at https://api.twilio.com/2010-04-01/Accounts/ACXXXXX.../Recordings/RE557ce644e5ab84fa21cc21112e22c485.mp3,
where ACXXXXX... is your Twilio account SID (ACCOUNT_SID).
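For completeness, downloading the .wav variant with plain JDK I/O looks roughly like this (a minimal sketch; the account and recording SIDs below are placeholders):

import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
public class DownloadRecordingWav {
    public static void main(String[] args) throws Exception {
        String accountSid = "ACXXXXXXXXXXXXXXXXX"; // your ACCOUNT_SID (placeholder)
        String recordingSid = "RE557ce644e5ab84fa21cc21112e22c485"; // SID obtained from the RecordingReader
        String wavUrl = "https://api.twilio.com/2010-04-01/Accounts/" + accountSid
                + "/Recordings/" + recordingSid + ".wav";
        // Stream the remote .wav straight into a local file
        try (InputStream in = new URL(wavUrl).openStream()) {
            Files.copy(in, Paths.get(recordingSid + ".wav"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}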

StreamingFileSink bulk writer results in some checkpoint error when running in AWS EMR

Unable to use StreamingFileSink and store incoming events in compressed fashion.
I am trying to use StreamingFileSink to write unbounded event stream to S3. In the process, I would like to compress the data to make better use of storage size available.
I wrote a compressed string writer by borrowing some code from Flink's SequenceFileWriterFactory. It fails with the exception described below.
If I try to use BucketingSink instead, it works great.
With BucketingSink, I approached the compressed string write as shown below (again, code borrowed from some other pull request):
import org.apache.flink.streaming.connectors.fs.StreamWriterBase;
import org.apache.flink.streaming.connectors.fs.Writer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import java.io.IOException;
public class CompressionStringWriter<T> extends StreamWriterBase<T> implements Writer<T> {
private static final long serialVersionUID = 3231207311080446279L;
private String codecName;
private String separator;
public String getCodecName() {
return codecName;
}
public String getSeparator() {
return separator;
}
private transient CompressionOutputStream compressedOutputStream;
public CompressionStringWriter(String codecName, String separator) {
this.codecName = codecName;
this.separator = separator;
}
public CompressionStringWriter(String codecName) {
this(codecName, System.lineSeparator());
}
protected CompressionStringWriter(CompressionStringWriter<T> other) {
super(other);
this.codecName = other.codecName;
this.separator = other.separator;
}
@Override
public void open(FileSystem fs, Path path) throws IOException {
super.open(fs, path);
Configuration conf = fs.getConf();
CompressionCodecFactory codecFactory = new CompressionCodecFactory(conf);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
if (codec == null) {
throw new RuntimeException("Codec " + codecName + " not found");
}
Compressor compressor = CodecPool.getCompressor(codec, conf);
compressedOutputStream = codec.createOutputStream(getStream(), compressor);
}
@Override
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
} else {
super.close();
}
}
@Override
public void write(Object element) throws IOException {
getStream();
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
#Override
public CompressionStringWriter<T> duplicate() {
return new CompressionStringWriter<>(this);
}
}
BucketingSink<DeviceEvent> bucketingSink = new BucketingSink<>("s3://"+ this.bucketName + "/" + this.objectPrefix);
bucketingSink
.setBucketer(new OrgIdBasedBucketAssigner())
.setWriter(new CompressionStringWriter<DeviceEvent>("Gzip", "\n"))
.setPartPrefix("file-")
.setPartSuffix(".gz")
.setBatchSize(1_500_000);
The version with BucketingSink works.
My code using StreamingFileSink involves the following classes:
import org.apache.flink.api.common.serialization.BulkWriter;
import java.io.IOException;
public class CompressedStringBulkWriter<T> implements BulkWriter<T> {
private final CompressedStringWriter compressedStringWriter;
public CompressedStringBulkWriter(final CompressedStringWriter compressedStringWriter) {
this.compressedStringWriter = compressedStringWriter;
}
@Override
public void addElement(T element) throws IOException {
this.compressedStringWriter.write(element);
}
@Override
public void flush() throws IOException {
this.compressedStringWriter.flush();
}
@Override
public void finish() throws IOException {
this.compressedStringWriter.close();
}
}
import org.apache.flink.api.common.serialization.BulkWriter;
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
public class CompressedStringBulkWriterFactory<T> implements BulkWriter.Factory<T> {
private SerializableHadoopConfiguration serializableHadoopConfiguration;
public CompressedStringBulkWriterFactory(final Configuration hadoopConfiguration) {
this.serializableHadoopConfiguration = new SerializableHadoopConfiguration(hadoopConfiguration);
}
@Override
public BulkWriter<T> create(FSDataOutputStream out) throws IOException {
return new CompressedStringBulkWriter(new CompressedStringWriter(out, serializableHadoopConfiguration.get(), "Gzip", "\n"));
}
}
import org.apache.flink.core.fs.FSDataOutputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;
import org.apache.flink.runtime.fs.hdfs.HadoopFileSystem;
import org.apache.flink.util.Preconditions;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CodecPool;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.Serializable;
public class CompressedStringWriter<T> implements Serializable {
private static final Logger LOG = LoggerFactory.getLogger(CompressedStringWriter.class);
private static final long serialVersionUID = 2115292142239557448L;
private String separator;
private transient CompressionOutputStream compressedOutputStream;
public CompressedStringWriter(FSDataOutputStream out, Configuration hadoopConfiguration, String codecName, String separator) {
this.separator = separator;
try {
Preconditions.checkNotNull(hadoopConfiguration, "Unable to determine hadoop configuration using path");
CompressionCodecFactory codecFactory = new CompressionCodecFactory(hadoopConfiguration);
CompressionCodec codec = codecFactory.getCodecByName(codecName);
Preconditions.checkNotNull(codec, "Codec " + codecName + " not found");
LOG.info("The codec name that was loaded from hadoop {}", codec);
Compressor compressor = CodecPool.getCompressor(codec, hadoopConfiguration);
this.compressedOutputStream = codec.createOutputStream(out, compressor);
LOG.info("Setup a compressor for codec {} and compressor {}", codec, compressor);
} catch (IOException ex) {
throw new RuntimeException("Unable to compose a hadoop compressor for the path", ex);
}
}
public void flush() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.flush();
}
}
public void close() throws IOException {
if (compressedOutputStream != null) {
compressedOutputStream.close();
compressedOutputStream = null;
}
}
public void write(T element) throws IOException {
compressedOutputStream.write(element.toString().getBytes());
compressedOutputStream.write(this.separator.getBytes());
}
}
import org.apache.hadoop.conf.Configuration;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
public class SerializableHadoopConfiguration implements Serializable {
private static final long serialVersionUID = -1960900291123078166L;
private transient Configuration hadoopConfig;
SerializableHadoopConfiguration(Configuration hadoopConfig) {
this.hadoopConfig = hadoopConfig;
}
Configuration get() {
return this.hadoopConfig;
}
// --------------------
private void writeObject(ObjectOutputStream out) throws IOException {
this.hadoopConfig.write(out);
}
private void readObject(ObjectInputStream in) throws IOException {
final Configuration config = new Configuration();
config.readFields(in);
if (this.hadoopConfig == null) {
this.hadoopConfig = config;
}
}
}
My actual Flink job:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties kinesisConsumerConfig = new Properties();
...
...
DataStream<DeviceEvent> kinesis =
env.addSource(new FlinkKinesisConsumer<>(this.streamName, new DeviceEventSchema(), kinesisConsumerConfig)).name("source")
.setParallelism(16)
.setMaxParallelism(24);
final StreamingFileSink<DeviceEvent> bulkCompressStreamingFileSink = StreamingFileSink.<DeviceEvent>forBulkFormat(
path,
new CompressedStringBulkWriterFactory<>(
BucketingSink.createHadoopFileSystem(
new Path("s3a://"+ this.bucketName + "/" + this.objectPrefix),
null).getConf()))
.withBucketAssigner(new OrgIdBucketAssigner())
.build();
deviceEventDataStream.addSink(bulkCompressStreamingFileSink).name("bulkCompressStreamingFileSink").setParallelism(16);
env.execute();
I expect the data to be saved in S3 as multiple files. Unfortunately, no files are being created.
In the logs, I see the exception below:
2019-05-15 22:17:20,855 INFO org.apache.flink.runtime.taskmanager.Task - Sink: bulkCompressStreamingFileSink (11/16) (c73684c10bb799a6e0217b6795571e22) switched from RUNNING to FAILED.
java.lang.Exception: Could not perform checkpoint 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:595)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.notifyCheckpoint(BarrierBuffer.java:396)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.processBarrier(BarrierBuffer.java:292)
at org.apache.flink.streaming.runtime.io.BarrierBuffer.getNextNonBlocked(BarrierBuffer.java:200)
at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:209)
at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:105)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not complete snapshot 1 for operator Sink: bulkCompressStreamingFileSink (11/16).
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:422)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1113)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1055)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:729)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:641)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpointOnBarrier(StreamTask.java:586)
... 8 more
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
at org.apache.flink.streaming.api.functions.sink.filesystem.PartFileWriter.closeForCommit(PartFileWriter.java:71)
at org.apache.flink.streaming.api.functions.sink.filesystem.BulkPartWriter.closeForCommit(BulkPartWriter.java:63)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.closePartFile(Bucket.java:239)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.prepareBucketForCheckpointing(Bucket.java:280)
at org.apache.flink.streaming.api.functions.sink.filesystem.Bucket.onReceptionOfCheckpoint(Bucket.java:253)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotActiveBuckets(Buckets.java:244)
at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.snapshotState(Buckets.java:235)
at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.snapshotState(StreamingFileSink.java:347)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.trySnapshotFunctionState(StreamingFunctionUtils.java:118)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.snapshotFunctionState(StreamingFunctionUtils.java:99)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.snapshotState(AbstractUdfStreamOperator.java:90)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:395)
So I am wondering what I am missing.
I am using the latest AWS EMR (5.23).
In CompressedStringBulkWriter#finish() (which delegates to CompressedStringWriter#close()), you are calling the close() method on the CompressionOutputStream, which also closes the underlying stream, i.e. Flink's FSDataOutputStream. That stream has to remain open so that Flink's internals can perform checkpointing properly and guarantee a recoverable stream. That is why you are getting:
Caused by: java.io.IOException: Stream closed.
at org.apache.flink.fs.s3.common.utils.RefCountedFile.requireOpened(RefCountedFile.java:117)
at org.apache.flink.fs.s3.common.utils.RefCountedFile.write(RefCountedFile.java:74)
at org.apache.flink.fs.s3.common.utils.RefCountedBufferingFileStream.flush(RefCountedBufferingFileStream.java:105)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeAndUploadPart(S3RecoverableFsDataOutputStream.java:199)
at org.apache.flink.fs.s3.common.writer.S3RecoverableFsDataOutputStream.closeForCommit(S3RecoverableFsDataOutputStream.java:166)
So instead of compressedOutputStream.close(), use compressedOutputStream.finish(), which flushes everything that is in the buffer to the output stream without closing it. By the way, there is a built-in HadoopCompressionBulkWriter available in the latest versions of Flink; you can also use that.
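Concretely, the changed part of CompressedStringWriter could look like this (a sketch of only the affected method; the rest of the class stays as posted above):

// finish() flushes the compressor's remaining output but leaves the underlying
// FSDataOutputStream open, so Flink can still close it for commit during checkpointing.
public void close() throws IOException {
    if (compressedOutputStream != null) {
        compressedOutputStream.finish();
        compressedOutputStream = null;
    }
}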

java save http post requests hourly

I'm trying to set up a server on AWS with the simple HTTP server and save each HTTP POST request's headers and payload.
It works locally.
My steps after connecting via SSH to the EC2 server:
javac Server.java
sudo nohup java Server
It saves the headers to the log file but not the payload, and it doesn't return a 204 response.
Server.java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import com.sun.net.httpserver.HttpServer;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
public class Server {
private static final int PORT = 80;
private static final String FILE_PATH = "/home/ec2-user/logs/";
private static final String UTF8 = "UTF-8";
private static final String DELIMITER = "|||";
private static final String LINE_BREAK = "\n";
private static final String FILE_PREFIX = "dd_MM_YYYY_HH";
private static final SimpleDateFormat simpleDateFormat = new SimpleDateFormat(FILE_PREFIX);
private static final String FILE_TYPE = ".txt";
public static void main(String[] args) {
try {
HttpServer server = HttpServer.create(new InetSocketAddress(PORT), 0);
server.createContext("/", new HttpHandler() {
@Override
public void handle(HttpExchange t) throws IOException {
System.out.println("Req\t" + t.getRemoteAddress());
InputStream initialStream = t.getRequestBody();
byte[] buffer = new byte[initialStream.available()];
initialStream.read(buffer);
File targetFile = new File(FILE_PATH + simpleDateFormat.format(new Date()) + FILE_TYPE);
OutputStream outStream = new FileOutputStream(targetFile, true);
String prefix = LINE_BREAK + t.getRequestHeaders().entrySet().toString() + LINE_BREAK + System.currentTimeMillis() + DELIMITER;
outStream.write(prefix.getBytes());
Map<String, String> queryPairs = new HashMap<>();
String params = new String(buffer);
String[] pairs = params.split("&");
for (String pair : pairs) {
int idx = pair.indexOf("=");
String key = pair.substring(0, idx);
String val = pair.substring(idx + 1);
String decodedKey = URLDecoder.decode(key, UTF8);
String decodeVal = URLDecoder.decode(val, UTF8);
queryPairs.put(decodedKey, decodeVal);
}
outStream.write(queryPairs.toString().getBytes());
t.sendResponseHeaders(204, -1);
t.close();
}
});
server.setExecutor(Executors.newCachedThreadPool());
server.start();
} catch (Exception e) {
System.out.println("Exception: " + e.getMessage());
e.printStackTrace();
}
}
}
Consider these changes to your handle method. As a starting point, two things are changed:
It reads the complete input and copies it into your file (initialStream.available() might not be the full truth).
It catches, logs and rethrows IOExceptions (you didn't see your 204, after all).
Also consider redirecting your output into files, so you can check later what happened on the server:
sudo nohup java Server > server.log 2> server.err &
If you described the desired target file structure in more detail, we could figure something out there as well, I guess.
@Override
public void handle(HttpExchange t) throws IOException {
try {
System.out.println("Req\t" + t.getRemoteAddress());
InputStream initialStream = t.getRequestBody();
File targetFile = new File(FILE_PATH + simpleDateFormat.format(new Date()) + FILE_TYPE);
OutputStream outStream = new FileOutputStream(targetFile, true);
// This will copy the ENTIRE input stream into your target file
// (IOUtils here is org.apache.commons.io.IOUtils from Apache Commons IO)
IOUtils.copy(initialStream, outStream);
outStream.close();
t.sendResponseHeaders(204, -1);
t.close();
} catch(IOException e) {
e.printStackTrace();
throw e;
}
}
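If you prefer not to add the Apache Commons IO dependency, the same full copy can be done with plain JDK streams, for example:

// Plain-JDK replacement for IOUtils.copy: read the request body until EOF
byte[] buf = new byte[8192];
int n;
while ((n = initialStream.read(buf)) != -1) {
    outStream.write(buf, 0, n);
}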

Kryo serialization of a class (task object) in Apache Spark returns null during deserialization

I am using the Java Spark API to write a test application. I am using a class that does not implement the Serializable interface, so to make the application work I am using the Kryo serializer to serialize the class. The problem I observed while debugging is that during deserialization the returned class object becomes null, which in turn throws a NullPointerException. It seems to be a closure problem where things are going wrong, but I am not sure. Since I am new to this kind of serialization, I don't know where to start digging.
Here is the code I am testing:
package org.apache.spark.examples;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
/**
* Spark application to test the Serialization issue in spark
*/
public class Test {
static PrintWriter outputFileWriter;
static FileWriter file;
static JavaSparkContext ssc;
public static void main(String[] args) {
String inputFile = "/home/incubator-spark/examples/src/main/scala/org/apache/spark/examples/InputFile.txt";
String master = "local";
String jobName = "TestSerialization";
String sparkHome = "/home/test/Spark_Installation/spark-0.7.0";
String sparkJar = "/home/test/TestSerializationIssesInSpark/TestSparkSerIssueApp/target/TestSparkSerIssueApp-0.0.1-SNAPSHOT.jar";
SparkConf conf = new SparkConf();
conf.set("spark.closure.serializer","org.apache.spark.serializer.KryoSerializer");
conf.set("spark.kryo.registrator", "org.apache.spark.examples.MyRegistrator");
// create the Spark context
if(master.equals("local")){
ssc = new JavaSparkContext("local", jobName,conf);
//ssc = new JavaSparkContext("local", jobName);
} else {
ssc = new JavaSparkContext(master, jobName, sparkHome, sparkJar);
}
JavaRDD<String> testData = ssc.textFile(inputFile).cache();
final NotSerializableJavaClass notSerializableTestObject= new NotSerializableJavaClass("Hi ");
@SuppressWarnings({ "serial", "unchecked"})
JavaRDD<String> classificationResults = testData.map(
new Function<String, String>() {
@Override
public String call(String inputRecord) throws Exception {
if(!inputRecord.isEmpty()) {
//String[] pointDimensions = inputRecord.split(",");
String result = "";
try {
FileWriter file = new FileWriter("/home/test/TestSerializationIssesInSpark/results/test_result_" + (int) (Math.random() * 100));
PrintWriter outputFile = new PrintWriter(file);
InetAddress ip;
ip = InetAddress.getLocalHost();
outputFile.println("IP of the server: " + ip);
result = notSerializableTestObject.testMethod(inputRecord);
outputFile.println("Result: " + result);
outputFile.flush();
outputFile.close();
file.close();
} catch (UnknownHostException e) {
e.printStackTrace();
}
catch (IOException e1) {
e1.printStackTrace();
}
return result;
} else {
System.out.println("End of elements in the stream.");
String result = "End of elements in the input data";
return result;
}
}
}).cache();
long processedRecords = classificationResults.count();
ssc.stop();
System.out.println("sssssssssss"+processedRecords);
}
}
Here is the KryoRegistrator class
package org.apache.spark.examples;
import org.apache.spark.serializer.KryoRegistrator;
import com.esotericsoftware.kryo.Kryo;
public class MyRegistrator implements KryoRegistrator {
public void registerClasses(Kryo kryo) {
kryo.register(NotSerializableJavaClass.class);
}
}
Here is the class I am serializing :
package org.apache.spark.examples;
public class NotSerializableJavaClass {
public String testVariable;
public NotSerializableJavaClass(String testVariable) {
super();
this.testVariable = testVariable;
}
public String testMethod(String vartoAppend){
return this.testVariable + vartoAppend;
}
}
This is because spark.closure.serializer only supports the Java serializer. See http://spark.apache.org/docs/latest/configuration.html for details about spark.closure.serializer.
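In practice that means anything captured by a closure still has to be Java-serializable; Kryo (via spark.serializer / spark.kryo.registrator) only applies to the data, not to the closures. One workaround, sketched below under that assumption, is to construct the helper object inside the function so the closure never captures the outer non-serializable instance (alternatively, make the class implement Serializable):

// Sketch: build NotSerializableJavaClass inside call(), so the closure does not
// capture the outer non-serializable field.
JavaRDD<String> classificationResults = testData.map(
    new Function<String, String>() {
        @Override
        public String call(String inputRecord) throws Exception {
            NotSerializableJavaClass local = new NotSerializableJavaClass("Hi ");
            return local.testMethod(inputRecord);
        }
    });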

How to write a java.util.logging logger that is thread-safe?

I have a MyLogger class which contains:
public class MyLogger {
private final static Logger logger = Logger.getLogger(MyLogger.class.getName());
private static FileHandler fileHandler;
private static String loggerFileName;
public MyLogger() {
}
public static void createMyLogger(String filename){
loggerFileName = filename;
try {
File loggerFile = new File(filename);
boolean fileExists = loggerFile.exists();
if(fileExists){
loggerFile.delete();
File lockFile = new File(filename+".lck");
if(lockFile.exists())
lockFile.delete();
}
fileHandler = new FileHandler(filename,true);
} catch (IOException e) {
e.printStackTrace();
}
logger.addHandler(fileHandler);
SimpleFormatter simpleFormatter = new SimpleFormatter();
fileHandler.setFormatter(simpleFormatter);
}
public static void log(String msg) {
logger.log(Level.INFO, msg);
}
public static void log(Exception ex) {
logger.log(Level.SEVERE, "Exception",ex);
}
public static String getLoggerFileName() {
return loggerFileName;
}
public static void setLoggerFileName(String loggerFileName) {
MyLogger.loggerFileName = loggerFileName;
}
}
My further execution happens in threads: when I start the first process, a logger file is created and logs are recorded, but when I start another process, another thread is created and a new logger file is created as well. Because of the static methods and static references, the logs of both processes get mixed up in both logger files.
When I start a process, the following method is called for every thread:
public void start(String process) {
try{
String filename = process.replace(".com", "");
MyLogger.createMyLogger(filename.concat(WordUtils.capitalize(type))+ ".log");
MyLogger.log("got parameters ===>> process : "+process);
}catch (Exception e) {
MyLogger.log("Exception In main() method....");
MyLogger.log("*****"+process+" process failed In main() method.*****");
MyLogger.log(e);
}
}
So what can I do about this? How can I do thread-safe logging?
Thanks in advance.
The idea here won't really work. Only one logger gets created, and you add each handler to that same logger.
What you really need is the concept of a log context. You need to isolate the creation of the logger on a per-thread basis, which AFAIK you can't do with plain java.util.logging.
This is a very rough-around-the-edges example of using JBoss Log Manager to do this:
import java.io.File;
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import org.jboss.logmanager.LogContext;
public class MyLogger {
private static final ThreadLocal<MyLogger> LOCAL_LOGGER = new ThreadLocal<MyLogger>();
private final Logger logger;
private MyLogger(final Logger logger) {
this.logger = logger;
}
public static MyLogger createMyLogger(final String filename) {
MyLogger result = LOCAL_LOGGER.get();
if (result == null) {
final LogContext logContext = LogContext.create();
final Logger logger = logContext.getLogger(MyLogger.class.getName());
final FileHandler fileHandler;
try {
final File loggerFile = new File(filename);
if (loggerFile.exists()) {
loggerFile.delete();
File lockFile = new File(filename + ".lck");
if (lockFile.exists())
lockFile.delete();
}
fileHandler = new FileHandler(filename, true);
} catch (IOException e) {
throw new RuntimeException("Could not create file handler", e);
}
logger.addHandler(fileHandler);
fileHandler.setFormatter(new SimpleFormatter());
final MyLogger myLogger = new MyLogger(logger);
LOCAL_LOGGER.set(myLogger);
result = myLogger;
}
return result;
}
public void log(String msg) {
logger.log(Level.INFO, msg);
}
public void log(Exception ex) {
logger.log(Level.SEVERE, "Exception", ex);
}
}
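Usage from your start(...) method would then look roughly like this (a sketch; the file-name handling is simplified compared to your original code):

// Each thread gets its own MyLogger, backed by its own LogContext and FileHandler,
// so logs from different processes no longer end up in each other's files.
public void start(String process) {
    String filename = process.replace(".com", "");
    MyLogger myLogger = MyLogger.createMyLogger(filename + ".log");
    myLogger.log("got parameters ===>> process : " + process);
}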
