Hadoop: how to write to HDFS from Java through a proxy server

I am trying to write to HDFS from Java through a proxy server, but it fails.
mypc ---- squid proxy ---- hadoop/192.168.100.101:8020
The proxy (Squid) is at 127.0.0.1:8123; Hadoop is at 192.168.100.101:8020.
The write to HDFS fails. What is the problem?
Please look at my test code below. I need help: how can I set the proxy server info?
private void test() throws Exception {
    String hdfsUrl = "hdfs://192.168.100.101:8020/data";

    Configuration conf = new Configuration();
    conf.set("hadoop.socks.server", "127.0.0.1:8123"); // <== proxy server
    conf.set("hadoop.rpc.socket.factory.class.default",
             "org.apache.hadoop.net.SocksSocketFactory");

    FileSystem fs = FileSystem.get(URI.create(hdfsUrl), conf);

    String fp = "/DATA/dirs.txt"; // local file on my PC
    String tp = "dirs.txt";       // destination path on HDFS
    fs.copyFromLocalFile(new Path(fp), new Path(tp));
}
02-18 16:28 INFO org.apache.hadoop.ipc.Client.handleConnectionTimeout(Client.java:906) Retrying connect to server: 192.168.100.101:8020. Already tried 0 time(s); maxRetries=45
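For reference, here is a minimal sketch of writing bytes to HDFS directly through an output stream, reusing the same two SOCKS-related settings from the test code above. The file path and contents are only illustrative, and SocksSocketFactory assumes the proxy at 127.0.0.1:8123 actually speaks the SOCKS protocol, which is worth verifying for a Squid setup.

Configuration conf = new Configuration();
conf.set("hadoop.socks.server", "127.0.0.1:8123"); // same proxy setting as above
conf.set("hadoop.rpc.socket.factory.class.default",
         "org.apache.hadoop.net.SocksSocketFactory");

FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.100.101:8020"), conf);

// Write bytes straight into an HDFS file instead of copying a local file.
try (FSDataOutputStream out = fs.create(new Path("/data/dirs.txt"))) {
    out.write("example content".getBytes(StandardCharsets.UTF_8));
}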

Related

Getting "InputStream: cannot search backwards." when reading compressed Avro file from S3 on Flink

I am currently learning Flink, and my goal is to read an Avro-encoded, gzipped file from S3 and then process its contents.
I managed to make it work for uncompressed files (.avro extension), but when I point it at a .gz file I get the following exception:
2022-08-12 15:51:21,418 WARN org.apache.flink.runtime.taskmanager.Task [] - Split Reader: Custom File Source -> Map -> Sink: Print to Std. Out (1/1)#0 (bcd4507e1eac498d2c2fea1c4785679e) switched from RUNNING to FAILED with failure cause: java.lang.IllegalArgumentException: Wrapped InputStream: cannot search backwards.
The code that I am using is this:
public class StreamingJob {

    public static void main(String[] args) throws Exception {
        // String path = "s3a://my-test-bucket/tweets.avro"; // <-- this works fine
        String path = "s3a://my-test-bucket/tweets.gz";

        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        AvroInputFormat<TwitterSchema> inputFormat = new AvroInputFormat<>(
                new Path(path),
                TwitterSchema.class
        );

        DataStreamSource<TwitterSchema> ds = env.readFile(
                inputFormat,
                path,
                FileProcessingMode.PROCESS_CONTINUOUSLY,
                10000L, // 10 seconds
                TypeInformation.of(TwitterSchema.class)
        );

        ds
            .map(t -> t.getTweet())
            .print();

        env.execute("s3 job example");
    }
}
I assumed that Flink would be able to decompress the file just fine, based on the "Read compressed files" section of their documentation.
Versions I am using:
<flink.version>1.14.5</flink.version>
<target.java.version>1.8</target.java.version>
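For comparison, the transparent decompression described in that documentation section is straightforward to exercise with a plain-text source; here is a minimal sketch, assuming a hypothetical gzipped text object in the same bucket:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// Flink's file input formats decompress .gz transparently for delimited/text input,
// so a gzipped text file can be read without extra configuration.
env.readTextFile("s3a://my-test-bucket/tweets.txt.gz")
   .print();

env.execute("gzip text example");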

jBPM: persist a process to a file and to KIE Server

I have the following code and would like to save the process to a file and also deploy it to KIE Server. I took the reference from https://github.com/kiegroup/jbpm/blob/84c98129de79b5dcd38a3fd6645b3807ef0cce3e/jbpm-bpmn2/src/test/java/org/jbpm/bpmn2/ProcessFactoryTest.java#L228 and, to save locally, changed the path to C://dev//processFactory.bpmn2, but it is not working. Also, how do I persist the process to the KIE Server running at http://localhost:8080/kie-server/docs/?
@Test(timeout = 10000)
public void testBoundaryTimerTimeDuration() throws Exception {
    NodeLeftCountDownProcessEventListener countDownListener =
            new NodeLeftCountDownProcessEventListener("BoundaryTimerEvent", 1);

    RuleFlowProcessFactory factory = RuleFlowProcessFactory.createProcess("org.jbpm.process");
    factory
        // header
        .name("My process").packageName("org.jbpm")
        // nodes
        .startNode(1).name("Start").done()
        .humanTaskNode(2).name("Task").actorId("john").taskName("MyTask").done()
        .endNode(3).name("End1").terminate(false).done()
        .boundaryEventNode(4).name("BoundaryTimerEvent").attachedTo(2)
            .timeDuration("1s").cancelActivity(false).done()
        .endNode(5).name("End2").terminate(false).done()
        // connections
        .connection(1, 2)
        .connection(2, 3)
        .connection(4, 5);

    RuleFlowProcess process = factory.validate().getProcess();

    Resource res = ResourceFactory.newByteArrayResource(
            XmlBPMNProcessDumper.INSTANCE.dump(process).getBytes());
    // res.setSourcePath("/tmp/processFactory.bpmn2"); // source path or target path must be set to be added into kbase
    res.setSourcePath("C://dev//processFactory.bpmn2");

    KieBase kbase = createKnowledgeBaseFromResources(res);
    StatefulKnowledgeSession ksession = createKnowledgeSession(kbase);

    TestWorkItemHandler testHandler = new TestWorkItemHandler();
    ksession.getWorkItemManager().registerWorkItemHandler("Human Task", testHandler);
    ksession.addEventListener(countDownListener);

    ProcessInstance pi = ksession.startProcess("org.jbpm.process");
    assertProcessInstanceActive(pi);

    countDownListener.waitTillCompleted(); // wait for the boundary timer to fire

    assertNodeTriggered(pi.getId(), "End2");
    assertProcessInstanceActive(pi); // still active because cancelActivity = false

    ksession.getWorkItemManager().completeWorkItem(testHandler.getWorkItem().getId(), null);
    assertProcessInstanceCompleted(pi);
    ksession.dispose();
}
setSourcePath does not save the process to a file; you can do that with a FileOutputStream or any other way of writing a String or byte[] to a file:
try (FileOutputStream outputStream = new FileOutputStream("your-file-name")) {
    outputStream.write(XmlBPMNProcessDumper.INSTANCE.dump(process).getBytes());
}
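An equivalent sketch with java.nio.file.Files, using the target location from the question (written here with single slashes):

// Write the dumped BPMN2 XML straight to disk in one call.
Files.write(Paths.get("C:/dev/processFactory.bpmn2"),
        XmlBPMNProcessDumper.INSTANCE.dump(process).getBytes());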
Could you elaborate more on the second question?

Bigtable emulator: could not find an appropriate constructor

Recently I have been trying to develop something with the Bigtable emulator in Java (Spring Boot) in IntelliJ IDEA.
What I have done:
The Bigtable emulator works well on my computer (macOS 10.15.6).
cbt works normally against the Bigtable emulator running on my Mac.
I've checked that running the Bigtable emulator doesn't need real gcloud credentials.
A unit test I wrote in IDEA works fine.
I have added the environment variable in the IDEA settings.
My unit test code:
I. Connect init:
Configuration conf;
Connection connection = null;
conf = BigtableConfiguration.configure("fake-project", "fake-instance");
String host = "localhost";
String port = "8086";
II. Constant data to be written into the table:
final byte[] TABLE_NAME = Bytes.toBytes("Hello-Bigtable");
final byte[] COLUMN_FAMILY_NAME = Bytes.toBytes("cf1");
final byte[] COLUMN_NAME = Bytes.toBytes("greeting");
final String[] GREETINGS = {
"Hello World!", "Hello Cloud Bigtable!", "Hello!!"
};
III. Connecting (using the conf, host, and port from I):
if (!Strings.isNullOrEmpty(host)) {
    conf.set(BigtableOptionsFactory.BIGTABLE_HOST_KEY, host);
    conf.set(BigtableOptionsFactory.BIGTABLE_PORT_KEY, port);
    conf.set(BigtableOptionsFactory.BIGTABLE_USE_PLAINTEXT_NEGOTIATION, "true");
}
connection = BigtableConfiguration.connect(conf);
IV. Write & Read data:
Admin admin = connection.getAdmin();
Table table = connection.getTable(TableName.valueOf(TABLE_NAME));

if (!admin.tableExists(TableName.valueOf(TABLE_NAME))) {
    HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf(TABLE_NAME));
    descriptor.addFamily(new HColumnDescriptor(COLUMN_FAMILY_NAME));
    System.out.print("Create table " + descriptor.getNameAsString());
    admin.createTable(descriptor);
}

for (int i = 0; i < GREETINGS.length; i++) {
    String rowKey = "greeting" + i;
    Put put = new Put(Bytes.toBytes(rowKey));
    put.addColumn(COLUMN_FAMILY_NAME, COLUMN_NAME, Bytes.toBytes(GREETINGS[i]));
    table.put(put);
}

Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
for (Result row : scanner) {
    byte[] valueBytes = row.getValue(COLUMN_FAMILY_NAME, COLUMN_NAME);
    System.out.println('\t' + Bytes.toString(valueBytes));
}
V. Output
Hello World!
Hello Cloud Bigtable!
Hello!!
The problem appeared after I moved this code into my project.
When I run the code in debug mode, it fails while trying to connect to Bigtable.
It seems that it cannot create a new instance based on the config I create.
Eventually, it shows me an error like:
Could not find an appropriate constructor for com.google.cloud.bigtable.hbase1_x.BigtableConnection
P.S. I have tried launching IntelliJ IDEA from the command line, because the environment variable was missing when I ran the unit test.
In my .zshrc:
My terminal is iTerm2 with oh-my-zsh.
Any help is appreciated, thanks a lot.
It seems that you are missing the constructor for BigtableConnection: BigtableConnection(org.apache.hadoop.conf.Configuration conf).
I would suggest trying to create a Connection object by following the steps mentioned in the Google documentation:
private static Connection connection = null;

public static void connect() throws IOException {
    Configuration config = BigtableConfiguration.configure(PROJECT_ID, INSTANCE_ID);

    // Include the following line if you are using app profiles.
    // If you do not include the following line, the connection uses the
    // default app profile.
    config.set(BigtableOptionsFactory.APP_PROFILE_ID_KEY, APP_PROFILE_ID);

    connection = BigtableConfiguration.connect(config);
}
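For the emulator case specifically, here is a minimal sketch that combines the documented connection steps with the emulator overrides already used in the question (the fake project/instance IDs and localhost:8086 endpoint are taken from the question):

Configuration config = BigtableConfiguration.configure("fake-project", "fake-instance");

// Point the HBase client at the local emulator instead of the real service.
config.set(BigtableOptionsFactory.BIGTABLE_HOST_KEY, "localhost");
config.set(BigtableOptionsFactory.BIGTABLE_PORT_KEY, "8086");
config.set(BigtableOptionsFactory.BIGTABLE_USE_PLAINTEXT_NEGOTIATION, "true");

Connection connection = BigtableConfiguration.connect(config);

If the client version supports it, the BIGTABLE_EMULATOR_HOST environment variable (e.g. localhost:8086) is another way to point the client at the emulator.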

EMR cluster hangs in Step state 'Running/Pending'

I am launching an EMR cluster through the Java SDK with a custom JAR step. The cluster launch is successful, but after bootstrapping, while the step is in the pending/running state, the cluster gets stuck.
I am not even able to SSH into the machine.
Following is my code to launch the cluster with the custom JAR step:
String dataTransferJar = "s3://test/testApplication.jar";
if (dataTransferJar == null || dataTransferJar.isEmpty())
    throw new InvalidS3ObjectException(
            "EMR custom jar file path is null/empty. Please provide a valid jar file path");

HadoopJarStepConfig customJarConfig = new HadoopJarStepConfig().withJar(dataTransferJar);
StepConfig customJarStep = new StepConfig("Mongo_to_S3_Data_Transfer", customJarConfig)
        .withActionOnFailure(ActionOnFailure.CONTINUE);

AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
        .withCredentials(awsCredentialsProvider)
        .withRegion(region)
        .build();

Application spark = new Application().withName("Spark");
String clusterName = "my-cluster-" + System.currentTimeMillis();

RunJobFlowRequest request = new RunJobFlowRequest()
        .withName(clusterName)
        .withReleaseLabel("emr-6.0.0")
        .withApplications(spark)
        .withVisibleToAllUsers(true)
        .withSteps(customJarStep)
        .withLogUri(loggingS3Bucket)
        .withServiceRole("EMR_DefaultRole")
        .withJobFlowRole("EMR_EC2_DefaultRole")
        .withInstances(new JobFlowInstancesConfig()
                .withEc2KeyName(key_pair)
                .withInstanceCount(instanceCount)
                .withEc2SubnetIds(subnetId)
                .withAdditionalMasterSecurityGroups(securityGroup)
                .withKeepJobFlowAliveWhenNoSteps(true)
                .withMasterInstanceType(instanceType));

RunJobFlowResult result = emr.runJobFlow(request);
The emr-6.0.0 release is still in development. Can you try the same with emr-5.29.0?
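If it helps, the only change needed in the snippet above would be the release label; a one-line sketch, assuming the rest of the RunJobFlowRequest stays as posted:

request.withReleaseLabel("emr-5.29.0"); // overrides the "emr-6.0.0" label; call before emr.runJobFlow(request)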

Jobtracker API error - Call to localhost/127.0.0.1:50030 failed on local exception: java.io.EOFException

I am trying to connect to my JobTracker using Java.
Shown below is the program I am trying to execute:
public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/home/user/hadoop-1.0.3/conf/core-site.xml"));
    conf.addResource(new Path("/home/user/hadoop-1.0.3/conf/hdfs-site.xml"));
    conf.addResource(new Path("/home/user/hadoop-1.0.3/conf/mapred-site.xml"));

    InetSocketAddress jobtracker = new InetSocketAddress("localhost", 50030);
    JobClient jobClient = new JobClient(jobtracker, conf);
    jobClient.setConf(conf);

    JobStatus[] jobs = jobClient.jobsToComplete();
    for (int i = 0; i < jobs.length; i++) {
        JobStatus js = jobs[i];
        if (js.getRunState() == JobStatus.RUNNING) {
            JobID jobId = js.getJobID();
            System.out.println(jobId);
        }
    }
}
This is the exception I get.
Even though I tried replacing localhost with 127.0.0.1, it doesn't work; I get the same error.
Exception in thread "main" java.io.IOException: Call to localhost/127.0.0.1:50030 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:534)
at com.tcs.nextgen.searchablemetadata.executor.factory.JobChecker.main(JobChecker.java:34)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:811)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)
I added all the JARs related to Hadoop.
I can't understand why "/" appears in the middle of localhost/127.0.0.1:50030.
Have you tried the actual JobTracker port number, rather than the HTTP port (50030)?
Try the port number listed in your $HADOOP_HOME/conf/mapred-site.xml under the mapred.job.tracker property. Here's the conf from my pseudo-distributed mapred-site.xml:
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
If you look at the JobTracker.getAddress(Configuration) method, you can see it uses this property if you don't explicitly specify the jobtracker host / port:
public static InetSocketAddress getAddress(Configuration conf) {
    String jobTrackerStr = conf.get("mapred.job.tracker", "localhost:8012");
    return NetUtils.createSocketAddr(jobTrackerStr);
}
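Applied to the question's code, a minimal sketch of the change (assuming mapred-site.xml is added as a resource, as in the original snippet):

// Resolve the JobTracker RPC address from mapred.job.tracker (e.g. localhost:9001)
// instead of hard-coding the HTTP/web UI port 50030.
InetSocketAddress jobtracker = JobTracker.getAddress(conf);
JobClient jobClient = new JobClient(jobtracker, conf);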
