I'm writing a simple Java client to add values to an HBase table. I'm using put.add(byte[] columnFamily, byte[] columnQualifier, byte[] value), but this method is deprecated in the new HBase API. Can anyone please help with how to do this using the new Put API?
Using Maven I have downloaded the jar for HBase version 1.2.0.
I'm using the following code:
package com.NoSQL;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
public class PopulatingData {
public static void main(String[] args) throws IOException{
String table = "Employee";
Logger.getRootLogger().setLevel(Level.WARN);
Configuration conf = HBaseConfiguration.create();
Connection con = ConnectionFactory.createConnection(conf);
Admin admin = con.getAdmin();
if(admin.tableExists(TableName.valueOf(table))) {
Table htable = con.getTable(TableName.valueOf(table));
/*********** adding a new row ***********/
// adding a row key
Put p = new Put(Bytes.toBytes("row1"));
p.add(Bytes.toBytes("ContactDetails"), Bytes.toBytes("Mobile"), Bytes.toBytes("9876543210"));
p.add(Bytes.toBytes("ContactDetails"), Bytes.toBytes("Email"), Bytes.toBytes("abhc#gmail.com"));
p.add(Bytes.toBytes("Personal"), Bytes.toBytes("Name"), Bytes.toBytes("Abhinav Rawat"));
p.add(Bytes.toBytes("Personal"), Bytes.toBytes("Age"), Bytes.toBytes("21"));
p.add(Bytes.toBytes("Personal"), Bytes.toBytes("Gender"), Bytes.toBytes("M"));
p.add(Bytes.toBytes("Employement"), Bytes.toBytes("Company"), Bytes.toBytes("UpGrad"));
p.add(Bytes.toBytes("Employement"), Bytes.toBytes("DOJ"), Bytes.toBytes("11:06:2018"));
p.add(Bytes.toBytes("Employement"), Bytes.toBytes("Designation"), Bytes.toBytes("ContentStrategist"));
htable.put(p);
/**********************/
System.out.print("Table is Populated");`enter code here`
}else {
System.out.println("The HBase Table named "+table+" doesn't exists.");
}
System.out.println("Returnning Main");
}
}
Use the addColumn() method:
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(NAME_FAMILY, NAME_COL_QUALIFIER, name);
Please refer to the javadoc below for more details:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html
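For example, applied to the code in the question (same row key, column families and qualifiers as above), each deprecated p.add(...) call maps one-for-one onto addColumn(byte[] family, byte[] qualifier, byte[] value):

Put p = new Put(Bytes.toBytes("row1"));
p.addColumn(Bytes.toBytes("ContactDetails"), Bytes.toBytes("Mobile"), Bytes.toBytes("9876543210"));
p.addColumn(Bytes.toBytes("ContactDetails"), Bytes.toBytes("Email"), Bytes.toBytes("abhc@gmail.com"));
p.addColumn(Bytes.toBytes("Personal"), Bytes.toBytes("Name"), Bytes.toBytes("Abhinav Rawat"));
p.addColumn(Bytes.toBytes("Personal"), Bytes.toBytes("Age"), Bytes.toBytes("21"));
p.addColumn(Bytes.toBytes("Personal"), Bytes.toBytes("Gender"), Bytes.toBytes("M"));
p.addColumn(Bytes.toBytes("Employement"), Bytes.toBytes("Company"), Bytes.toBytes("UpGrad"));
p.addColumn(Bytes.toBytes("Employement"), Bytes.toBytes("DOJ"), Bytes.toBytes("11:06:2018"));
p.addColumn(Bytes.toBytes("Employement"), Bytes.toBytes("Designation"), Bytes.toBytes("ContentStrategist"));
htable.put(p);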
Problem statement: I want to create an AWS Lambda in Java that gets a CSV file from an S3 bucket, inserts the data into a Postgres table, and also generates a corresponding log file and bad file in the S3 bucket, containing the logs and the bad records respectively.
What I am able to achieve:
Using the COPY command I am able to insert the data from the CSV file that I get from S3 into the Postgres table.
Here is the COPY command I am using; below is my code:
package com.copyData;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.LambdaLogger;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3Object;
public class mmlLoader3 implements RequestHandler<LinkedHashMap<String,String>,Object> {
private Connection con = null;
private LambdaLogger logger;
@Override
/* Lambda entry point: fetches the CSV from S3 and copies it into Postgres. */
public Object handleRequest(LinkedHashMap<String, String> input, Context context) {
try {
logger = context.getLogger();
AmazonS3 s3 = AmazonS3ClientBuilder
.standard()
.withRegion(Regions.****)
.build();
// Retrieve the file from S3
S3Object s3Object = s3.getObject("bucket_name","file.csv");
InputStream objectData = s3Object.getObjectContent();
Class.forName("org.postgresql.Driver");
con = DriverManager.getConnection("jdbc:postgresql://url.rds.amazonaws.com:port/dbName", "username", "password"); // assign to the field so the catch block can close it
CopyManager copyManager = new CopyManager((BaseConnection) con);
copyManager.copyIn("COPY tableName FROM STDIN WITH CSV HEADER DELIMITER ','", objectData);
logger.log("Data Entered in db:: ");
if (con != null) { con.close(); logger.log("con closed"); }
} catch (Exception e) {
e.printStackTrace();
try {
if(con!=null) {con.close(); logger.log("con closed");}
} catch (SQLException e1) {
e1.printStackTrace();
}
return "got an error";
}
return "Executed";
}
}
Expectation: I need guidance or a code snippet that helps me create the log file and bad file for that COPY command in the S3 bucket.
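A rough sketch of one way to do this (not from the post; the bucket name, object keys and the column-count check are placeholders): instead of streaming objectData straight into copyIn, read the CSV once, COPY only the rows that pass validation, and upload a log file and a bad-records file back to S3 with putObject. It reuses the s3, objectData and copyManager variables from the handler above.

// Hypothetical sketch: validate rows, COPY the good ones, upload log and bad file to S3.
StringBuilder goodRows = new StringBuilder();
StringBuilder badRows = new StringBuilder();
StringBuilder log = new StringBuilder();
int expectedColumns = 5; // assumption: set to the real number of columns
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(objectData, StandardCharsets.UTF_8))) {
    String header = reader.readLine();
    if (header != null) {
        goodRows.append(header).append('\n');
    }
    String line;
    int lineNo = 1;
    while ((line = reader.readLine()) != null) {
        lineNo++;
        if (line.split(",", -1).length == expectedColumns) {
            goodRows.append(line).append('\n');
        } else {
            badRows.append(line).append('\n');
            log.append("Line ").append(lineNo).append(": wrong column count\n");
        }
    }
}
// COPY only the validated rows into Postgres.
copyManager.copyIn("COPY tableName FROM STDIN WITH CSV HEADER DELIMITER ','",
        new java.io.StringReader(goodRows.toString()));
// Upload the log and the bad records; putObject(String bucket, String key, String content)
// is available on the AWS SDK v1 AmazonS3 client.
s3.putObject("bucket_name", "logs/file.csv.log", log.toString());
s3.putObject("bucket_name", "bad/file.csv.bad", badRows.toString());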
I have a use case where I need to load thousands of tables from Oracle to BigQuery using Apache Beam (Dataflow). I have written the code below, which works when I create the tables manually and use CreateDisposition.CREATE_NEVER, but creating all the tables manually will not be feasible. So I have written code to fetch the schema from the source (JdbcIO) and pass it to BigQuery writeTableRows().
But the code gives the error below.
Exception in thread "main" java.lang.IllegalArgumentException: schema can not be null
at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
at org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.withSchema(BigQueryIO.java:2256)
at org.example.Main.main(Main.java:109)
Code
package org.example;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
public class Main {
private static final Logger LOG = LoggerFactory.getLogger(Main.class);
public static TableSchema schema;
public static void main(String[] args) {
// Read from JDBC
Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).withValidation().create());
String query2= "select * from Test.emptable";
PCollection<TableRow> rows = p.apply(JdbcIO.<TableRow>read()
.withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
"oracle.jdbc.OracleDriver", "jdbc:oracle:thin:#//localhost:1521/ORCL")
.withUsername("root")
.withPassword("password"))
.withQuery(query2)
.withCoder(TableRowJsonCoder.of())
.withRowMapper(new JdbcIO.RowMapper<TableRow>() {
@Override
public TableRow mapRow(ResultSet resultSet) throws Exception {
schema = getSchemaFromResultSet(resultSet);
TableRow tableRow = new TableRow();
List<TableFieldSchema> columnNames = schema.getFields();
for(int i =1; i<= resultSet.getMetaData().getColumnCount(); i++) {
tableRow.put(columnNames.get(i-1).get("name").toString(), String.valueOf(resultSet.getObject(i)));
}
return tableRow;
}
})
);
rows.apply(BigQueryIO.writeTableRows()
.to("project:SampleDataset.emptable")
.withSchema(schema)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
.withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
);
p.run().waitUntilFinish();
}
private static TableSchema getSchemaFromResultSet(ResultSet resultSet) {
FieldSchemaListBuilder fieldSchemaListBuilder = new FieldSchemaListBuilder();
try {
ResultSetMetaData rsmd = resultSet.getMetaData();
for(int i=1; i <= rsmd.getColumnCount(); i++) {
fieldSchemaListBuilder.stringField(resultSet.getMetaData().getColumnName(i));
}
}
catch (SQLException ex) {
LOG.error("Error getting metadata: " + ex.getMessage());
}
return fieldSchemaListBuilder.schema();
}
}
I have tried assigning a dummy schema to get past this error and setting the schema variable to it later, but that creates the table with the dummy schema, not the actual schema.
Can someone help me understand where the flow is going wrong and how I can get the schema from JdbcIO and pass it to the BigQuery sink?
To load a schema within the pipeline itself as you're suggesting here, you can use BigQueryIO.write() and specify withSchemaFromView. In that case, you'd need to fetch the schema from the source database and wrap that in a PCollectionView (see Side inputs in the Beam programming guide).
You're using the storage write API, which likely requires a schema be specified. Note that the BigQuery API for file loads can allow inferring schema from the file contents at load time, although I'm not completely sure if Beam supports this. I would encourage you to try using file loads and setting withSchemaUpdateOption(SchemaUpdateOption.ALLOW_FIELD_ADDITION) to see if that leads to the table creation behavior you're looking for.
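A rough sketch of the withSchemaFromView approach, adapted from the pipeline in the question (the JDBC URL, credentials and table spec are the question's own placeholders, every column is mapped to STRING as in getSchemaFromResultSet, and the Storage Write API setting is dropped so the default batch load method applies). It needs a few extra imports: org.apache.beam.sdk.transforms.Create, org.apache.beam.sdk.transforms.View, org.apache.beam.sdk.values.KV and org.apache.beam.sdk.values.PCollectionView.

// Build a map of table spec -> JSON-formatted TableSchema inside the pipeline.
PCollectionView<java.util.Map<String, String>> schemaView = p
    .apply("SchemaSeed", Create.of("project:SampleDataset.emptable"))
    .apply("FetchSchema", ParDo.of(new DoFn<String, KV<String, String>>() {
        @ProcessElement
        public void process(@Element String tableSpec, OutputReceiver<KV<String, String>> out) throws Exception {
            // Query zero rows just to read the column metadata from Oracle.
            try (java.sql.Connection c = java.sql.DriverManager.getConnection(
                    "jdbc:oracle:thin:@//localhost:1521/ORCL", "root", "password");
                 ResultSet rs = c.createStatement()
                         .executeQuery("select * from Test.emptable where 1=0")) {
                ResultSetMetaData md = rs.getMetaData();
                // JSON-serialized TableSchema, all columns as STRING.
                StringBuilder json = new StringBuilder("{\"fields\":[");
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    if (i > 1) json.append(',');
                    json.append("{\"name\":\"").append(md.getColumnName(i))
                        .append("\",\"type\":\"STRING\"}");
                }
                json.append("]}");
                out.output(KV.of(tableSpec, json.toString()));
            }
        }
    }))
    .apply("SchemaAsMap", View.asMap());

rows.apply(BigQueryIO.writeTableRows()
    .to("project:SampleDataset.emptable")
    .withSchemaFromView(schemaView)
    .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
    .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));

Per the withSchemaFromView javadoc, the keys of the side-input map must be table specs in the same format accepted by to(String), which is why the same string is used for the seed element and the destination.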
I have the following class to perform PCA on an ARFF file. I have added the Weka jar to my project, but I am still getting an error saying DataSource cannot be resolved, and I don't know how to fix it. Can anyone suggest what could be wrong?
package project;
import weka.core.Instances;
import weka.core.converters.ArffLoader;
import weka.core.converters.ConverterUtils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.converters.TextDirectoryLoader;
import weka.gui.visualize.Plot2D;
import weka.gui.visualize.PlotData2D;
import weka.gui.visualize.VisualizePanel;
import java.awt.BorderLayout;
import java.io.File;
import java.util.ArrayList;
import javax.swing.JFrame;
import org.math.plot.FrameView;
import org.math.plot.Plot2DPanel;
import org.math.plot.PlotPanel;
import org.math.plot.plots.ScatterPlot;
import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
public class PCA {
public static void main(String[] args) {
try {
// Load the Data.
DataSource source = new DataSource("../data/ingredients.arff");
Instances data = source.getDataSet();
// Perform PCA.
PrincipalComponents pca = new PrincipalComponents();
pca.setVarianceCovered(1.0);
//pca.setCenterData(true);
pca.setNormalize(true);
pca.setTransformBackToOriginal(false);
pca.buildEvaluator(data);
// Show transform data into eigenvector basis.
Instances transformedData = pca.transformedData();
System.out.println(transformedData);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I am trying to create an EMR cluster using Java. I have created the jar file and put it into a Lambda function, which I call from AWS Step Functions. I created the Maven package including the AWS Java SDK dependencies and imported all the packages.
import java.io.IOException;
import com.amazonaws.auth.*;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.PropertiesCredentials;
import com.amazonaws.services.elasticmapreduce.*;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsRequest;
import com.amazonaws.services.elasticmapreduce.model.AddJobFlowStepsResult;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.*;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;
import com.amazonaws.AmazonServiceException;
public class CreateCluster {
public static void main(String[] args) {
AWSCredentials credentials = new BasicAWSCredentials("access key", "secret key");
// myApp={[Hadoop]};
AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);
String COMMAND_RUNNER = "command-runner.jar";
String DEBUGGING_COMMAND = "state-pusher-script";
String DEBUGGING_NAME = "Setup Hadoop Debugging";
StepFactory stepFactory = new StepFactory();
StepConfig enabledebugging = new StepConfig()
.withName(DEBUGGING_NAME)
.withActionOnFailure(ActionOnFailure.TERMINATE_CLUSTER)
.withHadoopJarStep(new HadoopJarStepConfig()
.withJar(COMMAND_RUNNER)
.withArgs(DEBUGGING_COMMAND));
RunJobFlowRequest request = new RunJobFlowRequest()
.withName("REMR")
.withReleaseLabel("emr-5.16.0")
.withSteps(enabledebugging)
// .withApplications(myApp)
.withLogUri("s3n://r.base.ihm/emr-log/")
.withServiceRole("service_role")
.withJobFlowRole("jobflow_role")
.withInstances(new JobFlowInstancesConfig()
.withEc2KeyName("emr")
.withEc2SubnetId("subnet-d1fbb8ee")
.withInstanceCount(3)
.withKeepJobFlowAliveWhenNoSteps(false)
.withMasterInstanceType("m4.large")
.withSlaveInstanceType("m4.large"));
RunJobFlowResult result = emr.runJobFlow(request);
}
}
but I'm still getting the error:
java.lang.NoClassDefFoundError
{
"errorMessage": "Error loading class com.ihm.base.spark.CreateCluster: com/amazonaws/auth/AWSCredentials",
"errorType": "java.lang.NoClassDefFoundError"
}
Any ideas on what I am missing here?
I am trying to make a Java app which connects to Facebook. I am using Facebook4J to achieve this. I made an app on Facebook Developers and got the key and ID for it, but when I pass them to get an access token it returns an exception. Please help me.
Java code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import net.sf.json.JSONArray;
import net.sf.json.JSONObject;
import facebook4j.Facebook;
import facebook4j.FacebookException;
import facebook4j.FacebookFactory;
import facebook4j.Post;
import facebook4j.ResponseList;
import facebook4j.auth.AccessToken;
import facebook4j.auth.OAuthAuthorization;
import facebook4j.auth.OAuthSupport;
import facebook4j.conf.Configuration;
import facebook4j.conf.ConfigurationBuilder;
public class Fbsample {
public static Configuration createConfiguration()
{
ConfigurationBuilder confBuilder = new ConfigurationBuilder();
confBuilder.setDebugEnabled(true);
confBuilder.setOAuthAppId("*****");
confBuilder.setOAuthAppSecret("*****");
confBuilder.setUseSSL(true);
confBuilder.setJSONStoreEnabled(true);
Configuration configuration = confBuilder.build();
return configuration;
}
public static void main(String[] argv) throws FacebookException {
Configuration configuration = createConfiguration();
FacebookFactory facebookFactory = new FacebookFactory(configuration );
Facebook facebookClient = facebookFactory.getInstance();
AccessToken accessToken = null;
try{
OAuthSupport oAuthSupport = new OAuthAuthorization(configuration );
accessToken = oAuthSupport.getOAuthAppAccessToken();
}catch (FacebookException e) {
System.err.println("Error while creating access token " + e.getLocalizedMessage());
}
facebookClient.setOAuthAccessToken(accessToken);
//results in an error says {An active access token must be used to query information about the current user}
}
}
For now I have shown my token and ID as *. When running, it returns 'Error while creating access token graph.facebook.com'. Thanks in advance.