Connect from Hive to HDFS (JSON Files) - java

I have already built a Hadoop cluster with Apache Flume to import Twitter data, and it works fine.
Now I want to start analytics with Apache Hive on the Twitter data. On the web I found the following example from Cloudera:
https://github.com/cloudera/cdh-twitter-example
But now, when creating the table, Hive returns the following error message:
java.net.URISyntaxException: Relative path in absolute URI: text:STRING, Query returned non-zero code: 1,
cause: java.net.URISyntaxException: Relative path in absolute URI: text:STRING,
On the web I didn't find anything about this (only about starting Hive); maybe someone here can help me!
Thanks!

Okay, I solved the first problem myself: I forgot a semicolon at the end of the command. Sorry about that.
But now I get another error message after starting jobs through Hive. All query jobs in Hive abort after a few seconds. In the log I found only this:
2015-03-25 14:47:40,680 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_1427105751169_0006_01_000030
Any ideas here?

Related

Problems connecting to SFTP through Camel after moving SFTP to AWS

I'm having problems connecting to the SFTP server through a Spring Boot Camel app. This started happening after we moved our SFTP to AWS. Now I have a temporary server host which looks like this: s-add03ac9b.server.transfer.eu-west-1.amazonaws.com. I can connect to it using, for instance, FileZilla, but if I try to connect using the app, this is the error I get:
Caused by: org.apache.camel.NoSuchEndpointException: No endpoint could be found for: s-add03ac9b.server.transfer.eu-west-1.amazonaws.com/testFolder?username=myUser&password=myPassword&disconnect=true&maxMessagesPerPoll=50&initialDelay=1s&delay=1s&timeout=3000&move=done&moveFailed=failed, please check your classpath contains the needed Camel component jar.
And here is the route itself (I changed it a bit to be more readable):
from("s-add03ac9b.server.transfer.eu-west-1.amazonaws.com/testFolder?username=myUser&password=myPassword&disconnect=true&maxMessagesPerPoll=50&initialDelay=1s&delay=1s&timeout=3000&move=done&moveFailed=failed")
.setHeader(Headers.CONFIGURATION.name(), constant(routeConfiguration))
.setHeader("filenameModify").constant(modifyFileNames).setHeader("fileExtension")
.constant(fileExtension).choice().when(PredicateBuilder.and(header("filenameModify").isEqualTo(true), header("fileExtension").isNotNull()))
.setHeader(Exchange.FILE_NAME,
simple("${file:name.noext}-${date:in.header.CamelFileLastModified:ddMMyyyy-HHmmss}-${file:length}.${in.header.fileExtension}"))
.end().idempotentConsumer(simple("${file:name}-${file:length}"), MemoryIdempotentRepository.memoryIdempotentRepository(1000))
.log("Processing ${file:name}")
.process(rawDataProcessor)
.to((String) routeConfiguration.get(ConfigKey.END)).otherwise().log("File ${file:name} processed.").stop().end();
Do I need to add something else, maybe some dependency or...?
If anyone is having the same issue: I fixed it by adding sftp:// as a prefix in the from part.
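For reference, here is a minimal sketch of the fixed consumer (only the endpoint and a log step; the endpoint options are copied from the route above). The sftp scheme is resolved by Camel's FTP component, so the camel-ftp jar (or the matching Spring Boot starter for your Camel version) also has to be on the classpath:

import org.apache.camel.builder.RouteBuilder;

public class SftpRoute extends RouteBuilder {
    @Override
    public void configure() {
        // The "sftp://" scheme is what lets Camel map the URI to the
        // camel-ftp component; without it, endpoint resolution fails with
        // the NoSuchEndpointException shown above.
        from("sftp://s-add03ac9b.server.transfer.eu-west-1.amazonaws.com/testFolder"
                + "?username=myUser&password=myPassword&disconnect=true"
                + "&maxMessagesPerPoll=50&initialDelay=1s&delay=1s"
                + "&timeout=3000&move=done&moveFailed=failed")
            .log("Processing ${file:name}");
    }
}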

How to solve InvokeHTTP error?

I am trying to get data over HTTPS.
For that I have configured the GetHTTP processor with a StandardSSLContextService.
I created a JKS using keytool and imported the website's security certificate (.CER file) into it.
I can get data from the HTTPS URL using GetHTTP, but it does not work with InvokeHTTP.
I used GetHTTP to download one file from the server, but I need to download 100 files from the server in a loop, and in that case I shouldn't have to drag out a GetHTTP processor for each of the 100 files.
That's why I built a loop in NiFi with InvokeHTTP, to download the 100 files by index 1 to 100 with a single processor.
InvokeHTTP throws the following exception, although the same configuration works fine in GetHTTP:
ERROR [Timer-Driven Process Thread-8] o.a.nifi.processors.standard.InvokeHTTP InvokeHTTP[id=4aa3eb07-8dce-4037-867b-2d36b0b6fab8] Routing to Failure due to exception: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No X509TrustManager implementation available
Can anyone guide me to solve this?
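The usual cause of this InvokeHTTP-only failure is that the StandardSSLContextService has only the keystore properties filled in: InvokeHTTP also needs the truststore (filename, password, type) configured so it can build an X509TrustManager for the handshake. As an illustration of what that trust material is, here is a minimal Java sketch (the truststore path and password are placeholders) that builds an SSLContext from a JKS truststore the same way:

import java.io.FileInputStream;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class TrustStoreCheck {
    public static void main(String[] args) throws Exception {
        // Load the JKS truststore holding the imported .CER certificate
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (FileInputStream in = new FileInputStream("/path/to/truststore.jks")) {
            trustStore.load(in, "changeit".toCharArray());
        }

        // Derive X509TrustManagers from it; an SSLContext built without
        // this step has no trust material, which is exactly what the
        // "No X509TrustManager implementation available" error points at
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, tmf.getTrustManagers(), null);
        System.out.println("Trust managers: " + tmf.getTrustManagers().length);
    }
}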

HiveServer2 doesn't start: "'ascii' codec can't encode character"

I built a cluster with a NameNode, a Secondary NameNode, and 3 DataNodes. I installed HDP via Ambari + HUE, and now I am configuring XA Secure policies for HDFS, Hive, and HBase. It works fine for every component except Hive. The problem is that when I change hive.security.authorization to true (in Ambari -> Hive configs), HiveServer2 fails at startup with this problem:
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 115, in action_create
fp.write(content)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 990: ordinal not in range(128)
I tried to edit that Python file, but any change I make only makes things worse. It probably tries to encode a Unicode character using the wrong codec and save it to the file, but I am a poor programmer and I don't know how to edit it correctly. I can't figure out what that file is, where it is, or what it contains.
When I set security authorization to false, the server starts but crashes after ~3 minutes with an error:
12:02:43,523 ERROR [pool-1-thread-648] JMXPropertyProvider:540 - Caught exception getting JMX metrics : Server returned HTTP response code: 500 for URL: http://localhost.localdomain:8745/api/cluster/summary
12:02:50,604 INFO [qtp677995254-4417] HeartBeatHandler:428 - State of service component HIVE_SERVER of service HIVE of cluster testING has changed from STARTED to INSTALLED at host localhost.localdomain
12:02:53,624 ERROR [pool-1-thread-668] JMXPropertyProvider:540 - Caught exception getting JMX metrics : Read timed out
Any suggestions? Thank you in advance.
#EDIT
Here is the line of Python code that causes the problem:
fp.write(content)
I tried adding .decode("utf-8") at the end, but then:
'NoneType' object has no attribute 'decode' occurs
For the first problem, try adding
# -*- coding: UTF-8 -*-
as the first line of your file. Note that the error is a UnicodeEncodeError raised during the write, so an explicit fix goes the other way from what you tried: fp.write(content.encode("utf-8")) rather than .decode(...), and only when content is not None.

Hive query returned non-zero code

We are seeing different kinds of error messages (error codes 1, 2, 3) for the same query using Hive. Can someone explain what this error code is and what the different error codes mean? Please share if there is proper documentation for these error messages. Thanks in advance.
Error:
java.sql.SQLException: Query returned non-zero code: 2, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
I found a similar kind of post in the archive; here it says "That's not the real error" and here's how to find it:
Go to the Hadoop JobTracker web dashboard, find the Hive MapReduce jobs that failed, and look at the logs of the failed tasks. That will show you the real error.
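In other words, the code in "Query returned non-zero code: N" is just the exit status Hive's driver reports for the failed stage (return code 2 from MapRedTask, for instance, means the underlying MapReduce job itself failed), so the JDBC exception rarely carries the root cause. A hedged sketch of how it surfaces through JDBC, assuming a HiveServer2 on localhost:10000 and a placeholder tweets table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveQueryRunner {
    public static void main(String[] args) throws Exception {
        // Requires hive-jdbc (org.apache.hive.jdbc.HiveDriver) on the classpath;
        // host, port, database and credentials are placeholders
        String url = "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            stmt.execute("SELECT COUNT(*) FROM tweets");
        } catch (SQLException e) {
            // Only the generic return code arrives here; the real error is in
            // the logs of the failed task on the JobTracker/ResourceManager UI
            System.err.println("Query failed (code " + e.getErrorCode() + "): "
                    + e.getMessage());
        }
    }
}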

Neo4j Server fails to start embedded database

I've created a Neo4j database embedded in my Java application. Creating nodes, relationships, and properties and querying all of them looks fine, but now I want to visualize the database just to check that everything is okay. So I tried to load the test.db inside my Neo4j server edition (running on the same machine), but I keep getting the following error:
Starting Neo4j Server failed: Error starting org.neo4j.kernel.EmbeddedGraphDatabase, C:\Users\user\workspace\neo4j_emb_test\target\test.db
I don't know what's going wrong here. Does anybody have suggestions?
Thanks in advance!
Julian
edit:
Checking the logs returned the following results:
2014-05-26 14:56:30.988+0000 ERROR [o.n.k.EmbeddedGraphDatabase]: Startup failed: Component 'org.neo4j.kernel.impl.transaction.XaDataSourceManager#7f180826' was successfully initialized, but failed to start. Please see attached cause exception.: Component 'org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaDataSource#71fc9ad0' was successfully initialized, but failed to start. Please see attached cause exception.: 'neostore' has a store version number that we cannot upgrade from. Expected 'NeoStore v0.A.0' but file is version 'NeoStore v0.A.3'.
2014-05-26 14:56:30.988+0000 INFO [o.n.k.EmbeddedGraphDatabase]: Shutdown started
You cannot run two embedded instances against the same Neo4j database at the same time; you need to run Neo4j in stand-alone mode for that. Then you only have access to the REST API it provides, not the Java API.
I had the exact same experience a little while ago; it was answered here: Disable locking of Neo4j graph database?
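Note also what the attached cause in the log says: the server expects 'NeoStore v0.A.0' but the file is 'NeoStore v0.A.3', i.e. test.db was created by a newer Neo4j release than the server runs, and the server cannot read the newer store format. So besides making sure only one process opens the store at a time, the neo4j library version in the application should match the server version. A minimal sketch against the Neo4j 2.x embedded API (the store path is taken from the error above) that releases the store cleanly so the server can open it afterwards:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class EmbeddedNeo4j {
    public static void main(String[] args) {
        // Only one process may hold this store directory open at a time
        final GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("C:/Users/user/workspace/neo4j_emb_test/target/test.db");

        // Shut the database down cleanly on JVM exit so the server
        // can open the same store afterwards
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                db.shutdown();
            }
        });
    }
}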
