I'm trying to run the following code to get Twitter information live:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._
import twitter4j.auth.Authorization
import twitter4j.Status
import twitter4j.auth.AuthorizationFactory
import twitter4j.conf.ConfigurationBuilder
import org.apache.spark.streaming.api.java.JavaStreamingContext
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.mllib.feature.HashingTF
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.api.java.function.Function
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.api.java.JavaDStream
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream
val consumerKey = "xxx"
val consumerSecret = "xxx"
val accessToken = "xxx"
val accessTokenSecret = "xxx"
val url = "https://stream.twitter.com/1.1/statuses/filter.json"
val sparkConf = new SparkConf().setAppName("Twitter Streaming")
val sc = new SparkContext(sparkConf)
val documents: RDD[Seq[String]] = sc.textFile("").map(_.split(" ").toSeq)
// Twitter Streaming
val ssc = new JavaStreamingContext(sc,Seconds(2))
val conf = new ConfigurationBuilder()
conf.setOAuthAccessToken(accessToken)
conf.setOAuthAccessTokenSecret(accessTokenSecret)
conf.setOAuthConsumerKey(consumerKey)
conf.setOAuthConsumerSecret(consumerSecret)
conf.setStreamBaseURL(url)
conf.setSiteStreamBaseURL(url)
val filter = Array("Twitter", "Hadoop", "Big Data")
val auth = AuthorizationFactory.getInstance(conf.build())
val tweets : JavaReceiverInputDStream[twitter4j.Status] = TwitterUtils.createStream(ssc, auth, filter)
val statuses = tweets.dstream.map(status => status.getText)
statuses.print()
ssc.start()
But when it reaches this command: val sc = new SparkContext(sparkConf), the following error appears:
17/05/09 09:08:35 WARN SparkContext: Multiple running SparkContexts
detected in the same JVM! org.apache.spark.SparkException: Only one
SparkContext may be running in this JVM (see SPARK-2243). To ignore
this error, set spark.driver.allowMultipleContexts = true.
I have tried to add the following parameters to the sparkConf value, but the error still appears:
val sparkConf = new SparkConf().setAppName("Twitter Streaming").setMaster("local[4]").set("spark.driver.allowMultipleContexts", "true")
If I ignore the error and continue running commands, I get this other error:
17/05/09 09:15:44 WARN ReceiverSupervisorImpl: Restarting receiver
with delay 2000 ms: Error receiving tweets 401:Authentication
credentials (https://dev.twitter.com/pages/auth) were missing or
incorrect. Ensure that you have set valid consumer key/secret, access
token/secret, and the system clock is in sync.

Error 401 Unauthorized
HTTP ERROR: 401
Problem accessing '/1.1/statuses/filter.json'. Reason: Unauthorized
Any kind of contribution is appreciated. Regards, and have a good day.
A spark-shell already prepares a Spark session or Spark context for you to use, so you don't have to (and can't) initialize a new one. Usually there is a line at the end of the spark-shell launch process telling you under which variable it is available to you.
allowMultipleContexts exists only for testing some functionalities of Spark, and shouldn't be used in most cases.
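If this code runs as a standalone application instead (where no context exists yet), the usual pattern is to create exactly one context and build the streaming context on top of it, rather than constructing two. A minimal sketch in Java (the class name and the local[4] master are my own, mirroring the question's setup):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SingleContextExample {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("Twitter Streaming")
                .setMaster("local[4]");

        // The one and only SparkContext for this JVM, wrapped for the Java API
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Build the streaming context from the existing context instead of
        // letting it create a second SparkContext behind the scenes
        JavaStreamingContext ssc = new JavaStreamingContext(sc, Durations.seconds(2));

        // ... define your DStreams here ...

        ssc.start();
        ssc.awaitTermination();
    }
}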
I am trying to retrieve a userCertificate associated with a domain name from Windows Active Directory, but I am having difficulties using the Java API.
For example, when I use the ldapsearch command-line tool, I am able to retrieve the certificate, as you can see below:
ldapsearch -h 192.xx.2.xx -D "CN=Administrator,CN=Users,DC=mmo,DC=co,DC=ca" -w Password -b "CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca" "userCertificate"
# extended LDIF
#
# LDAPv3
# base <CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca> with scope subtree
# filter: (objectclass=*)
# requesting: userCertificate
#
# rsa0, Users, mmo.co.ca
dn: CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca
userCertificate:: MIIDbTCCAlWgAwIBAgIEFbvHazANBgkqhkiG9w0BAQsFADBnMQswCQYDVQQG
EwJ1azEQMA4GA1UECBMHVW5rbm93bjEWMBQGA1UEBxMNcmlja21hbnN3b3J0aDERMA8GA1UEChMId
m9jYWxpbmsxDDAKBgNVBAsTA2lwczENMAsGA1UEAxMEcnNhMDAeFw0xOTExMjExNDUwNDNaFw0yOT
ExMTgxNDUwNDNaMGcxCzAJBgNVBAYTAnVrMRAwDgYDVQQIEwdVbmtub3duMRYwFAYDVQQHEw1yaWN
rbWFuc3dvcnRoMREwDwYDVQQKEwh2b2NhbGluazEMMAoGA1UECxMDaXBzMQ0wCwYDVQQDEwRyc2Ew
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA0R0yCr0uU80oFG3Zg0vTbR4NSR2St+w4f
DOmoHQ27z1Q2JwhiNh1XkwC8MtVeGzRJw0pe+jXc2fMVbIqONHImOZuX6p1UMWof7fxMAIEfWq98u
OqVbvbXVLeCE9+BJGsOaiJ70Q76e8tDNTH3vg1orXAvb0O7R0Vz9I0iXjJKzUtmFEBju/m3eoa+WI
6OaBr64hJw7oz1CzPIKj0OcapFypFjr4+QKpRsHA4Nn21XrYSsT00Dk9SVK3NTjHm661crvTR6jSx
j1GrCpVdQGCQ25a2RrHIi0cmclNJmy81PngW0cpdO3p9ZsZ2vPUy5/CNbVwqPEPSlIjJtVa0Xf9O1
QIDAQABoyEwHzAdBgNVHQ4EFgQU1U7VOM/vAHL0pqZgi6TS1f0SAt8wDQYJKoZIhvcNAQELBQADgg
EBAC7fK81BHDbF8PSQO2YznZtnzCMs46TwCezyqIFzQljwYA5wxKIytV7GtV4aEUrfIFIeQIMW812
pMol9xIotULSl1I/2WI18QTIJfRAnkCZZPJIa9MU6nEGCouF1LwW9bzQzHOeI07NgCIyBryojXaxc
L/epJtVxYialdI9mBWB8KDytINrylOcP9sXYaUtkOOiU7h0sBF9XBfzXgtTkF8pB7ObX9YJnyvzTn
y2zVfeZD8Q7BtDL7AvIDcUjoHtYx5B0oD86aCNTSShmtB/ZEyqt8Kynqf+QUYQIWA3wVFjgZjCCwc
NxiXuf6H8KGW8hP+ETKnc7u9XP9GCHINf9K0I=
# search result
search: 2
result: 0 Success
# numResponses: 2
# numEntries: 1
However, when I try to use the Java program, I am unable to retrieve it. Below is the sample Java program:
package CertStore;

import javax.naming.AuthenticationException;
import javax.naming.AuthenticationNotSupportedException;
import javax.naming.Context;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.security.auth.x500.X500Principal;
import java.security.cert.*;
import java.util.*;
import java.io.*;

class CertStoreTest {
    CertStoreTest() {
        try {
            LDAPCertStoreParameters lcsp =
                    new LDAPCertStoreParameters("192.xx.2.xx", 389);
            String referenceID = "CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca";
            X509CertSelector xcs = new X509CertSelector();
            xcs.setSubject(referenceID);
            CertStore cs = CertStore.getInstance("LDAP", lcsp);
            Collection<? extends Certificate> certificates = cs.getCertificates(xcs);
            System.out.println("size: " + certificates.size());
            Iterator<? extends Certificate> certificate = certificates.iterator();
            while (certificate.hasNext()) {
                System.out.println(certificate.next());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        System.out.println("main() called.");
        CertStoreTest test = new CertStoreTest();
    }
}
When I run this program, I get the size as 0, where I am expecting 1.
main() called.
size: 0
I also have OpenLDAP running on a Linux system, and if I point the above Java program at that server with the appropriate domain name information, Java is able to pull the certificate associated with that domain name.
I am not sure what I am missing when I try to retrieve the certificate from Windows Active Directory.
Can anyone shed some light on this, as I have been stuck for a few days now?
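If it helps to narrow this down, here is a debugging sketch (not a confirmed fix; the host, bind DN, and password are copied from the ldapsearch call above) that bypasses CertStore and reads the attribute directly over JNDI. Active Directory often exposes the value as userCertificate;binary, which is worth requesting explicitly:

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

public class LdapCertFetch {
    public static void main(String[] args) throws Exception {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://192.xx.2.xx:389");
        // Same bind credentials as the ldapsearch example above
        env.put(Context.SECURITY_AUTHENTICATION, "simple");
        env.put(Context.SECURITY_PRINCIPAL, "CN=Administrator,CN=Users,DC=mmo,DC=co,DC=ca");
        env.put(Context.SECURITY_CREDENTIALS, "Password");

        DirContext ctx = new InitialDirContext(env);
        // Ask for both spellings of the attribute; AD frequently needs ";binary"
        Attributes attrs = ctx.getAttributes(
                "CN=rsa0,CN=Users,DC=mmo,DC=co,DC=ca",
                new String[]{"userCertificate", "userCertificate;binary"});
        Attribute cert = attrs.get("userCertificate;binary");
        if (cert == null) {
            cert = attrs.get("userCertificate");
        }
        System.out.println(cert == null
                ? "no certificate attribute returned"
                : "got " + cert.size() + " certificate value(s)");
        ctx.close();
    }
}

If this returns the certificate while CertStore does not, the problem is in how the CertStore LDAP provider queries AD (for example, it binds anonymously), not in the directory itself.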
I am trying to follow the instructions here (Azure API reference) to manage Azure API Management through their API.
It looks like this (Groovy):
import groovy.json.JsonSlurper
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClientBuilder
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
@Grab(group = 'org.apache.httpcomponents', module = 'httpclient', version = '4.5.2')
final def serviceName = 'my-api'
final def url = "https://${serviceName}.management.azure-api.net"
final String identifier = 'integration'
final byte[] primaryKey = Base64.decoder.decode('<key copy pasted from Azure web console > "API Management Service"')
final String expiry = '2018-03-01T12:26:00.0000000Z'
// SAS generation
def hmacSha256 = Mac.getInstance("HmacSHA256")
hmacSha256.init(new SecretKeySpec(primaryKey, "HmacSHA256"))
def toSign = "$identifier\n$expiry"
def signature = new String(Base64.encoder.encode(hmacSha256.doFinal(toSign.bytes)))
def sas = "SharedAccessSignature uid=${identifier}&ex=$expiry&sn=${signature}"
// URL Request
def getUsers = new HttpGet("$url/users?api-version=2017-03-01")
getUsers.setHeader('Authorization', sas)
def client = HttpClientBuilder.create().build()
def response = client.execute(getUsers)
println response
if (response.statusLine.statusCode == 200) {
    println "Users: " + new JsonSlurper().parse(response.entity.content)
} else {
    println "Error: ${response.entity.content.readLines()}"
}
Which results in:
HttpResponseProxy{HTTP/1.1 401 Unauthorized [Content-Length: 0, Strict-Transport-Security: max-age=31536000; includeSubDomains, WWW-Authenticate: SharedAccessSignature realm="", error="invalid_token", error_description="User is not found or signature is invalid.", Date: Wed, 14 Feb 2018 14:33:14 GMT] [Content-Length: 0,Chunked: false]}
Note: when I use a manually generated token, it does work. The issue is in the signature generation.
Can anyone give me some direction or a working code sample (in Java)?
For those who hit the same issue and are lucky enough to find this answer, there were two problems:
the signing algorithm is HmacSHA512, not HmacSHA256
primaryKey is not to be Base64 decoded; just use it as is
Working code (Groovy):
import groovy.json.JsonSlurper
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClientBuilder
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
@Grab(group = 'org.apache.httpcomponents', module = 'httpclient', version = '4.5.2')
final def serviceName = '<your service name>'
final def url = "https://${serviceName}.management.azure-api.net"
final String identifier = '<your identifier>'
final byte[] primaryKey = '<copy paste of primaryKey>'.bytes // do not base64 decode!!!
final String expiry = LocalDateTime.now().plusDays(1).format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'0000Z'"))
// SAS generation
def hmacSha512 = Mac.getInstance("HmacSHA512")
hmacSha512.init(new SecretKeySpec(primaryKey, "HmacSHA512"))
def dataToSign = "$identifier\n$expiry"
def signature = new String(Base64.encoder.encode(hmacSha512.doFinal(dataToSign.bytes)))
def sas = "SharedAccessSignature uid=${identifier}&ex=$expiry&sn=${signature}"
println "SAS=$sas"
// URL Request
def getUsers = new HttpGet("$url/users?api-version=2017-03-01")
getUsers.setHeader('Authorization', sas)
def client = HttpClientBuilder.create().build()
def response = client.execute(getUsers)
println response
if (response.statusLine.statusCode == 200) {
    println "Users: " + new JsonSlurper().parse(response.entity.content)
} else {
    println "Error: ${response.entity.content.readLines()}"
}
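Since the question asked for a sample in Java, here is a rough translation of the SAS generation above using only JDK classes (identifier, primary key, and expiry are placeholders, exactly as in the Groovy version):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ApimSas {
    public static void main(String[] args) throws Exception {
        String identifier = "<your identifier>";
        String primaryKey = "<copy paste of primaryKey>"; // do not Base64 decode
        String expiry = "2018-03-01T12:26:00.0000000Z";

        // HMAC-SHA512 over "identifier\nexpiry", keyed with the raw key bytes
        Mac hmac = Mac.getInstance("HmacSHA512");
        hmac.init(new SecretKeySpec(primaryKey.getBytes(StandardCharsets.UTF_8), "HmacSHA512"));
        byte[] digest = hmac.doFinal((identifier + "\n" + expiry).getBytes(StandardCharsets.UTF_8));
        String signature = Base64.getEncoder().encodeToString(digest);

        String sas = "SharedAccessSignature uid=" + identifier
                + "&ex=" + expiry + "&sn=" + signature;
        System.out.println(sas); // send this as the Authorization header
    }
}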
I am trying to compile a sample Spark Scala file through sbt, having built a Maven project in the Eclipse IDE.
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
object simpleSpark {
  def main(args: Array[String]) {
    val logFile = "C:\\spark-1.6.1-bin-hadoop2.6\\spark-1.6.1-bin-hadoop2.6\\README.md"
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]").set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numHadoops = logData.filter(line => line.contains("Hadoop")).count()
    val numSparks = logData.filter(line => line.contains("Spark")).count()
    println("Lines with Hadoop: %s, Lines with Spark: %s".format(numHadoops, numSparks))
  }
}
The error says you have an illegal start of expression at set("spark.executor.memory",). Are you sure you set spark.executor.memory correctly in your actual code?
If yes, can you show what you wrote in your .sbt file?
Here's a [Python code][1] that I would like to know whether it can also be used for GAE Java (when the code is migrated). So the question is: can the Python code below be converted to Java without any Python "dependencies" that Java can't have?
# stdlib
from collections import defaultdict
from datetime import datetime, timedelta
import os
import time

# 3p
import simplejson as json

# google api
from google.appengine.api import app_identity, logservice, memcache, taskqueue
from google.appengine.ext.db import stats as db_stats

# framework
import webapp2


class DatadogStats(webapp2.RequestHandler):
    def get(self):
        api_key = self.request.get('api_key')
        if api_key != os.environ.get('DATADOG_API_KEY'):
            self.abort(403)

        FLAVORS = ['requests', 'services', 'all']
        flavor = self.request.get('flavor')
        if flavor not in FLAVORS:
            self.abort(400)

        def get_task_queue_stats(queues=None):
            if queues is None:
                queues = ['default']
            else:
                queues = queues.split(',')
            task_queues = [taskqueue.Queue(q).fetch_statistics() for q in queues]
            q_stats = []
            for q in task_queues:
                stats = {
                    'queue_name': q.queue.name,
                    'tasks': q.tasks,
                    'oldest_eta_usec': q.oldest_eta_usec,
                    'executed_last_minute': q.executed_last_minute,
                    'in_flight': q.in_flight,
                    'enforced_rate': q.enforced_rate,
                }
                q_stats.append(stats)
            return q_stats

        def get_request_stats(after=None):
            if after is None:
                one_minute_ago = datetime.utcnow() - timedelta(minutes=1)
                after = time.mktime(one_minute_ago.timetuple())
            else:
                # cast to float
                after = float(after)
            logs = logservice.fetch(start_time=after)
            stats = defaultdict(list)
            for req_log in logs:
                stats['start_time'].append(req_log.start_time)
                stats['api_mcycles'].append(req_log.api_mcycles)
                stats['cost'].append(req_log.cost)
                stats['finished'].append(req_log.finished)
                stats['latency'].append(req_log.latency)
                stats['mcycles'].append(req_log.mcycles)
                stats['pending_time'].append(req_log.pending_time)
                stats['replica_index'].append(req_log.replica_index)
                stats['response_size'].append(req_log.response_size)
                stats['version_id'].append(req_log.version_id)
            return stats

        stats = {
            'project_name': app_identity.get_application_id()
        }
        if flavor == 'services' or flavor == 'all':
            stats['datastore'] = db_stats.GlobalStat.all().get()
            stats['memcache'] = memcache.get_stats()
            stats['task_queue'] = get_task_queue_stats(self.request.get('task_queues', None))
        if flavor == 'requests' or flavor == 'all':
            stats['requests'] = get_request_stats(self.request.get('after', None))

        self.response.headers['Content-Type'] = 'application/json'
        self.response.write(json.dumps(stats))


app = webapp2.WSGIApplication([
    ('/datadog', DatadogStats),
])
[1]: https://github.com/DataDog/gae_datadog/blob/master/datadog.py
Yes, the code can be converted and will work in Java, but you will have to do it manually (I don't know of any tools to "translate" from Python to Java).
Looking at all the imports you have, there's nothing there that can't be used in Java.
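As an illustration, here is a hedged sketch of what part of the Java equivalent could look like (the servlet name is mine; it covers only the API-key check, the memcache statistics, and the default-queue statistics, and leaves JSON serialization to whatever library you choose):

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.google.appengine.api.memcache.MemcacheServiceFactory;
import com.google.appengine.api.memcache.Stats;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.QueueStatistics;

public class DatadogStatsServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Same API-key gate as the Python handler
        String apiKey = req.getParameter("api_key");
        if (apiKey == null || !apiKey.equals(System.getenv("DATADOG_API_KEY"))) {
            resp.sendError(403);
            return;
        }

        Map<String, Object> stats = new HashMap<>();

        // Rough equivalent of memcache.get_stats()
        Stats memcacheStats = MemcacheServiceFactory.getMemcacheService().getStatistics();
        stats.put("memcache_hits", memcacheStats.getHitCount());
        stats.put("memcache_misses", memcacheStats.getMissCount());

        // Rough equivalent of taskqueue.Queue(q).fetch_statistics()
        QueueStatistics qs = QueueFactory.getQueue("default").fetchStatistics();
        stats.put("default_queue_tasks", qs.getNumTasks());

        resp.setContentType("application/json");
        // Serialize `stats` with your JSON library of choice; toString() is a stand-in
        resp.getWriter().print(stats.toString());
    }
}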
I'm trying to write a remote HBase client using Java. Here is the code for reference:
package ttumdt.app.connector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.FilterList;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HBaseClusterConnector {
    private final String MASTER_IP = "10.138.168.185";
    private final String ZOOKEEPER_PORT = "2181";

    final String TRAFFIC_INFO_TABLE_NAME = "TrafficLog";
    final String TRAFFIC_INFO_COLUMN_FAMILY = "TimeStampIMSI";
    final String KEY_TRAFFIC_INFO_TABLE_BTS_ID = "BTS_ID";
    final String KEY_TRAFFIC_INFO_TABLE_DATE = "DATE";
    final String COLUMN_IMSI = "IMSI";
    final String COLUMN_TIMESTAMP = "TIME_STAMP";

    private final byte[] columnFamily = Bytes.toBytes(TRAFFIC_INFO_COLUMN_FAMILY);
    private final byte[] qualifier = Bytes.toBytes(COLUMN_IMSI);

    private Configuration conf = null;

    public HBaseClusterConnector() throws MasterNotRunningException, ZooKeeperConnectionException {
        conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", MASTER_IP);
        conf.set("hbase.zookeeper.property.clientPort", ZOOKEEPER_PORT);
        HBaseAdmin.checkHBaseAvailable(conf);
    }

    /**
     * This filter will return the list of IMSIs for a given btsId and time interval.
     * @param btsId     : btsId for which the query has to run
     * @param date      : date for which the query has to run
     * @param startTime : start time for which the query has to run
     * @param endTime   : end time for which the query has to run
     * @return IMSIs as a set of Strings
     * @throws IOException
     */
    public Set<String> getInfoPerBTSID(String btsId, String date,
                                       String startTime, String endTime)
            throws IOException {
        Set<String> imsis = new HashSet<String>();
        // ToDo: better exception handling
        HTable table = new HTable(conf, TRAFFIC_INFO_TABLE_NAME);
        Scan scan = new Scan();
        scan.addColumn(columnFamily, qualifier);
        scan.setFilter(prepFilter(btsId, date, startTime, endTime));
        Result result = null;
        ResultScanner resultScanner = table.getScanner(scan);
        while ((result = resultScanner.next()) != null) {
            byte[] obtainedColumn = result.getValue(columnFamily, qualifier);
            imsis.add(Bytes.toString(obtainedColumn));
        }
        resultScanner.close();
        return imsis;
    }

    // ToDo: figure out how valid this filter code is, and how the comparison
    // happens with equal, greater-than-or-equal, etc.
    private Filter prepFilter(String btsId, String date,
                              String startTime, String endTime) {
        byte[] tableKey = Bytes.toBytes(KEY_TRAFFIC_INFO_TABLE_BTS_ID);
        byte[] timeStamp = Bytes.toBytes(COLUMN_TIMESTAMP);

        // filter to build -> where BTS_ID = <<btsId>> and DATE = <<date>>
        RowFilter keyFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
                new BinaryComparator(Bytes.toBytes(btsId + date)));
        // filter to build -> where timeStamp >= startTime
        SingleColumnValueFilter singleColumnValueFilterStartTime =
                new SingleColumnValueFilter(columnFamily, timeStamp,
                        CompareFilter.CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(startTime));
        // filter to build -> where timeStamp <= endTime
        SingleColumnValueFilter singleColumnValueFilterEndTime =
                new SingleColumnValueFilter(columnFamily, timeStamp,
                        CompareFilter.CompareOp.LESS_OR_EQUAL, Bytes.toBytes(endTime));

        FilterList filterList = new FilterList(FilterList.Operator.MUST_PASS_ALL, Arrays
                .asList((Filter) keyFilter,
                        singleColumnValueFilterStartTime, singleColumnValueFilterEndTime));
        return filterList;
    }

    public static void main(String[] args) throws IOException {
        HBaseClusterConnector flt = new HBaseClusterConnector();
        Set<String> imsis = flt.getInfoPerBTSID("AMCD000784", "26082013", "104092", "104095");
        System.out.println(imsis.toString());
    }
}
I'm currently using the Cloudera QuickStart VM to test this.
The problem is: if I run this very code on the VM, it works absolutely fine, but it fails with the error below if it is run from outside. I suspect it has something to do with the VM settings rather than anything else. Please note that I've already checked that I can connect to the node manager / job tracker of the VM from the host machine, and that works absolutely fine. When I run the code from my host OS instead of running it on the VM, I get the error below:
2013-10-15 18:16:04.185 java[652:1903] Unable to load realm info from SCDynamicStore
Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: Retried 1 times
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:138)
at org.apache.hadoop.hbase.client.HBaseAdmin.checkHBaseAvailable(HBaseAdmin.java:1774)
at ttumdt.app.connector.HBaseClusterConnector.<init>(HBaseClusterConnector.java:47)
at ttumdt.app.connector.HBaseClusterConnector.main(HBaseClusterConnector.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Process finished with exit code 1
Please note that the master node is actually running. The ZooKeeper log shows that it has established a connection with the host OS:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
6:16:03.274 PM INFO org.apache.zookeeper.server.ZooKeeperServer
Client attempting to establish new session at /10.138.169.81:50567
6:16:03.314 PM INFO org.apache.zookeeper.server.ZooKeeperServer
Established session 0x141bc2487440004 with negotiated timeout 60000 for client /10.138.169.81:50567
6:16:03.964 PM INFO org.apache.zookeeper.server.PrepRequestProcessor
Processed session termination for sessionid: 0x141bc2487440004
6:16:03.996 PM INFO org.apache.zookeeper.server.NIOServerCnxn
Closed socket connection for client /10.138.169.81:50567 which had sessionid 0x141bc2487440004
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
But I see no trace of any activity in the Master or RegionServer logs.
Please note that my host OS is Mac OS X 10.7.5.
As per the resources available, this should work fine, though some suggest the simple HBase Java client never works. I'm confused, and eagerly waiting for pointers!
Start your HiveServer2 on a different port and then try connecting.
Command to start HiveServer2 on a different port (make sure hive is on the path):
hive --service hiveserver2 --hiveconf hive.server2.thrift.port=13000
The HBase Java client certainly does work!
The most likely explanation is that your client can't see the machine that the master is running on for some reason.
One possible explanation is that, although you are connecting to Zookeeper using an IP address, the HBase client is attempting to connect to the master using its hostname.
So, if you ensure that you have entries in your hosts file (on the client) that match the hostname of the machine running the master, this may resolve the problem.
Check that you can access the master Web UI at <hostname>:60010 from your client machine.
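One quick way to verify the resolution part from the client side is a tiny check like the following (the hostname below is a placeholder; use the name your master actually reports):

import java.net.InetAddress;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // Hypothetical hostname: substitute the one your HBase master reports
        String masterHost = "quickstart.cloudera";
        InetAddress addr = InetAddress.getByName(masterHost);
        System.out.println(masterHost + " resolves to " + addr.getHostAddress());
        // If this throws UnknownHostException, add an entry for the master
        // to /etc/hosts on the client, for example:
        // 10.138.168.185   quickstart.cloudera
    }
}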