I am switching from python pandas to Java Spark for data analytics and new to it. In my project I'm not using hadoop, just fetching data from mysql db.
The code is like this:
List<QtStkDailyRehabilitation> qtStkDailyRehabilitations = qtStkDailyRehabilitationSelfMapper.getStockDailyKData(stockCode);
SparkSession spark = SparkSession.builder().appName("sparkTest").config("spark.master","local")
.getOrCreate();
StructField field1 = DataTypes.createStructField("date", DataTypes.DateType, false);
StructField field2 = DataTypes.createStructField("close", DataTypes.FloatType, false);
StructType stockInfo = DataTypes.createStructType(Lists.newArrayList(field1, field2));
...
So I didn't install a hadoop on my computer. But I found when I create a SparkSession, it takes a long time - around 20 seconds, and it comes out with a warning.
The log shows below:
2020-05-26 18:45:02.507 DEBUG 15880 --- [nio-8031-exec-1] c.e.d.q.d.Q.getStockDailyKData : ==> Preparing: SELECT * FROM qt_stk_daily_rehabilitation WHERE (stock_code='002241') ORDER BY trade_date DESC
2020-05-26 18:45:02.520 DEBUG 15880 --- [nio-8031-exec-1] c.e.d.q.d.Q.getStockDailyKData : ==> Parameters:
2020-05-26 18:45:02.873 DEBUG 15880 --- [nio-8031-exec-1] c.e.d.q.d.Q.getStockDailyKData : <== Total: 2921
2020-05-26 18:45:12.199 INFO 15880 --- [nio-8031-exec-1] org.apache.spark.SparkContext : Running Spark version 2.4.3
2020-05-26 18:45:21.335 WARN 15880 --- [nio-8031-exec-1] org.apache.hadoop.util.NativeCodeLoader : Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-05-26 18:45:21.406 INFO 15880 --- [nio-8031-exec-1] org.apache.spark.SparkContext : Submitted application: sparkTest
2020-05-26 18:45:21.456 INFO 15880 --- [nio-8031-exec-1] org.apache.spark.SecurityManager : Changing view acls to: edz
As you can see. Running Spark version 2.4.3 shows 10s after sql returns results, and then after 9s it prints a warning about hadoop.
Following codes works ok and I can use DataFrame to calculate, but every time I run into this session create code it will take 20s. I guess it tries to access hadoop server which I don't have one. Is it normal and how to make it faster? Or can I disable hadoop for now just to enjoy it's MLlib features?
Thanks.
Related
I have a Java Spring Boot application (Java 8 JDK) that makes calls to a REST service. For the delete resource case, I need to specify the path as follows: /api/v4/projects/<URL encoded project path>, where the project path is typically represented as a parent "group" followed by the project name. For example: my-group/my-project. So, when I invoke the delete case, the example path needs to be my-group%2Fmy-project.
I am using java.net.URLEncoder.encode(value, StandardCharsets.UTF_8.toString()) to do the encoding and it converts the example path to what it needs to be (the group is test and the project is test-cold-storage):
https://<removed>/api/v4/projects/test%2Ftest-cold-storage
When I invoke the call, however, it fails. I have an interceptor that prints out the details of the request being made. It's showing something unexpected.
2022-07-18 15:49:18.030 INFO 21888 --- [ main] c.b.d.c.c.CustomRequestInterceptor : Request:
2022-07-18 15:49:18.030 INFO 21888 --- [ main] c.b.d.c.c.CustomRequestInterceptor : URI: https://<removed>/api/v4/projects/test%252Ftest-cold-storage
2022-07-18 15:49:18.030 INFO 21888 --- [ main] c.b.d.c.c.CustomRequestInterceptor : Method: DELETE
It looks like it's being encoded again (maybe?). 0x25 = '%' and 0x2F = '/'. If I do it without encoding the group/path, no encoding occurs and it fails again.
2022-07-19 07:47:02.146 INFO 5200 --- [ main] c.b.d.c.c.CustomRequestInterceptor : Request:
2022-07-19 07:47:02.147 INFO 5200 --- [ main] c.b.d.c.c.CustomRequestInterceptor : URI: https://<removed>/api/v4/projects/test/test-cold-storage
2022-07-19 07:47:02.147 INFO 5200 --- [ main] c.b.d.c.c.CustomRequestInterceptor : Method: DELETE
Has anyone else run into this? Is there some setting in the configuration of the RestTemplate object that affects this?
UPDATE
I have managed to trace execution in the debugger and found that it is encoding the URL. This is happening in org.springframework.web.util.DefaultUriBuilderFactory.createURI().
I don't know if this information is helpful.
Figured it out. I needed to pass the project path (i.e. test/test-cold-storage) as a URI variable instead of tacking it on the end of the URL. The endpoint URL need to change as follows:
String endpointURL = baseURL + GITLAB_REST_API + "projects/{path}";
and the delete call changed to add URI variable (projectPath):
template.delete(endpointURL, projectPath);, where in this example projectPath is test/test-cold-storage.
I am trying to start the JMS listener using jmsListenerEndpointRegistry.start() which was stopped using jmsListenerEndpointRegistry.stop(). But looks like it is not getting started. When I am trying to consume the messages it is not allowing me to do so as it is still stopped. Please help me how to start it back using start method.
In application.properties I have spring.jms.listener.auto-startup=true
Using Apache ActiveMQ(Version-5.16.3)
2022-01-06 16:27:54.699 INFO 28804 --- [nio-9091-exec-1] o.s.web.servlet.DispatcherServlet : FrameworkServlet 'dispatcherServlet': initialization completed in 22 ms
2022-01-06 16:27:54.726 ERROR 28804 --- [nio-9091-exec-1] com.jms.poc.controller.JmsController : --------- Trying to start JMS using jmsListenerEndpointRegistry.start()----------
2022-01-06 16:27:54.727 ERROR 28804 --- [nio-9091-exec-1] com.jms.poc.controller.JmsController : ----------jmsListenerEndpointRegistry.isRunning()-------- : false
There is no need to change the autoStartup property in startJmsListener - that property only applies when the application context is initialized.
You have no #JmsListeneners - the only one is commented out.
registry.isRunning() only returns true when at least one of its containers is running.
I am trying to run my project tests in a docker container. All of the tests work just fine when running locally. Errors started occurring when I tried to move my testing to docker container.
Here is the error message:
java.lang.IllegalStateException: Failed to load ApplicationContext
[...]
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'flywayInitializer' defined in class path resource [org/springframework/boot/autoconfigure/flyway/FlywayAutoConfiguration$FlywayConfiguration.class]: Invocation of init method failed; nested exception is org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:
Migration V1__initial_user.sql failed
-------------------------------------
SQL State : 42601
Error Code : 0
Message : ERROR: syntax error at or near "GENERATED"
Position: 45
Location : db/migration/V1__initial_user.sql (/Users/villemossip/Desktop/GRP/GRP-SAS/application/build/resources/main/db/migration/V1__initial_user.sql)
Line : 36
Statement : CREATE TABLE revinfo
(
rev INTEGER GENERATED BY DEFAULT AS IDENTITY ( START WITH 1 ),
revtstmp BIGINT,
PRIMARY KEY (rev)
)
From the log we can see that container image was created, but it fails to migrate the sql schema:
[...]
2019-10-10 10:36:18.768 INFO 49547 --- [ main] o.f.c.internal.license.VersionPrinter : Flyway Community Edition 5.2.4 by Boxfuse
2019-10-10 10:36:18.777 INFO 49547 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Starting...
2019-10-10 10:36:18.795 INFO 49547 --- [ main] 🐳 [postgres:9.6.12] : Creating container for image: postgres:9.6.12
2019-10-10 10:36:19.001 INFO 49547 --- [ main] 🐳 [postgres:9.6.12] : Starting container with ID: a32dd0850baf34770cce9bdc81918cd4db40502188b85dfaa90f74e2900f9fa7
2019-10-10 10:36:19.547 INFO 49547 --- [ main] 🐳 [postgres:9.6.12] : Container postgres:9.6.12 is starting: a32dd0850baf34770cce9bdc81918cd4db40502188b85dfaa90f74e2900f9fa7
2019-10-10 10:36:23.342 INFO 49547 --- [ main] 🐳 [postgres:9.6.12] : Container postgres:9.6.12 started
2019-10-10 10:36:23.426 INFO 49547 --- [ main] com.zaxxer.hikari.HikariDataSource : HikariPool-1 - Start completed.
2019-10-10 10:36:23.431 INFO 49547 --- [ main] o.f.c.internal.database.DatabaseFactory : Database: jdbc:postgresql://localhost:32834/test (PostgreSQL 9.6)
2019-10-10 10:36:23.488 INFO 49547 --- [ main] o.f.core.internal.command.DbValidate : Successfully validated 6 migrations (execution time 00:00.024s)
2019-10-10 10:36:23.501 INFO 49547 --- [ main] o.f.c.i.s.JdbcTableSchemaHistory : Creating Schema History table: "public"."flyway_schema_history"
2019-10-10 10:36:23.519 INFO 49547 --- [ main] o.f.core.internal.command.DbMigrate : Current version of schema "public": << Empty Schema >>
2019-10-10 10:36:23.520 INFO 49547 --- [ main] o.f.core.internal.command.DbMigrate : Migrating schema "public" to version 1 - initial user
2019-10-10 10:36:23.542 ERROR 49547 --- [ main] o.f.core.internal.command.DbMigrate : Migration of schema "public" to version 1 - initial user failed! Changes successfully rolled back.
2019-10-10 10:36:23.546 WARN 49547 --- [ main] o.s.w.c.s.GenericWebApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'flywayInitializer' defined in class path resource [org/springframework/boot/autoconfigure/flyway/FlywayAutoConfiguration$FlywayConfiguration.class]: Invocation of init method failed; nested exception is org.flywaydb.core.internal.command.DbMigrate$FlywayMigrateException:
Migration V1__initial_user.sql failed
-------------------------------------
[...]
Here is part of the sql script (app/src/main/resources/db/migration):
[...]
constraint user_aud_pkey primary key (id, rev)
);
CREATE TABLE revinfo
(
rev INTEGER GENERATED BY DEFAULT AS IDENTITY ( START WITH 1 ),
revtstmp BIGINT,
PRIMARY KEY (rev)
);
CREATE SEQUENCE hibernate_sequence INCREMENT 1 MINVALUE 1
MAXVALUE 9223372036854775807
START 1
CACHE 1;
Here is "application.properties" (app/test/java/resources):
spring.datasource.driver-class-name=org.testcontainers.jdbc.ContainerDatabaseDriver
spring.datasource.url=jdbc:tc:postgresql://localhost:5433/test
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
spring.jpa.show-sql=true
spring.jpa.hibernate.ddl-auto=validate
spring.jackson.default-property-inclusion=NON_NULL
spring.flyway.baselineOnMigrate=true
spring.flyway.check-location=true
spring.flyway.locations=classpath:db/migration
spring.flyway.schemas=public
spring.flyway.enabled=true
Also in the same directory I have container-license-acceptance.txt file.
Inside "build.gradle" I added the following lines (app/build.gradle):
dependencies {
[...]
testImplementation "org.testcontainers:junit-jupiter:1.11.3"
testImplementation "org.testcontainers:postgresql:1.11.3"
}
Inside BaseInitTest file, I have the following lines (app/test/java/com):
#Testcontainers
#SpringBootTest
public class BaseIntTest {
#Container
private static final PostgreSQLContainer<?> container = new PostgreSQLContainer<>();
[...]
I don't understand, how can the same tests pass at first, but fail when I move them to docker container?
It looks like the test container with the database has started successfully, so no issue there, you're getting an empty database.
Then you try running the flyway and this fails.
Flyway in spring boot works during the initialization of the spring application context,
so the actual migration runs while the application context gets initialized, so the migration failure looks like a spring failure.
The reason, however, is logged: the migration file has an invalid content:
Migration V1__initial_user.sql failed
-------------------------------------
SQL State : 42601
Error Code : 0
Message : ERROR: syntax error at or near "GENERATED"
Position: 45
Location : db/migration/V1__initial_user.sql (/Users/villemossip/Desktop/GRP/GRP-
SAS/application/build/resources/main/db/migration/V1__initial_user.sql)
Line : 36
Statement : CREATE TABLE revinfo
(
rev INTEGER GENERATED BY DEFAULT AS IDENTITY ( START WITH 1 ),
revtstmp BIGINT,
PRIMARY KEY (rev)
)
This GENERATED BY is unsupported.
Why? Probably your docker image includes the version of RDBMS that doesn't support this syntax. So it differs from the DB that you use in a local environment without docker.
In any case it's not about docker, spring or flyway but about the DB and the migration code.
In terms of resolution, I suggest running the docker image of the DB directly (without java, testcontainers and flyway).
When it runs, just run this migration "manually" in pgadmin or something. You're expected to see the same error.
Thank you #M. Deinum and Mark Bramnik!
I found out that the issue is with Postgres version. For some reason by default docker image is created with old version 9.6.12, but sql script GENERATED BY DEFAULT was added to Postgres with version 10.
Solution 1 (Update the sql script to older version):
CREATE TABLE revinfo
(
rev INTEGER PRIMARY KEY NOT NULL,
revtstmp BIGINT
);
Solution 2:
Changed docker image version to 11.2 by creating CustomPostgreSQLContainer file in the project.
import org.testcontainers.containers.PostgreSQLContainer;
public class CustomPostgreSQLContainer extends PostgreSQLContainer<CustomPostgreSQLContainer> {
private static final String IMAGE_VERSION = "postgres:11.2";
private static CustomPostgreSQLContainer container;
CustomPostgreSQLContainer() {
super(IMAGE_VERSION);
}
public static CustomPostgreSQLContainer getInstance() {
if (container == null) {
container = new CustomPostgreSQLContainer();
}
return container;
}
#Override
public void start() {
super.start();
System.setProperty("spring.datasource.url", container.getJdbcUrl());
System.setProperty("spring.datasource.username", container.getUsername());
System.setProperty("spring.datasource.password", container.getPassword());
}
#Override
public void stop() {
//do nothing, JVM handles shut down
}
}
And updating BaseIntTest file:
#Testcontainers
#SpringBootTest
public class BaseIntTest {
#Container
private static final PostgreSQLContainer<?> container = CustomPostgreSQLContainer.getInstance();
And last removing two lines from test application.properties file:
spring.datasource.driver-class-name=org.testcontainers.jdbc.ContainerDatabaseDriver
spring.datasource.url=jdbc:tc:postgresql://localhost:5433/test
In my case I added in my application.properties file
spring.flyway.baselineOnMigrate = true
and I had also to start from version 1.1 instead of 1 otherwise flyway throws an error (flyway 8.0.5)
The problem in my case was because of using spring.datasource.url=jdbc:postgres://localhost:5432/todoListDb
instead of
spring.datasource.url=jdbc:postgresql://localhost:5432/todoListDb
(postgres instead of PostgreSQL) in my application.properties file
So try to use the right URL for your database and verify if there is a typo in your url
One reason for this issue was missing docker DB version: <embedded-database-spring-test.version>2.0.1</embedded-database-spring-test.version>
By adding this specific version, it worked for me.
As mentioned GENERATED BY DEFAUL needs a newer version of postgres image. In your case the postgres image defaults to 9.6.12.
Simple solution is to just update the datasource url in application.properties and point to a newer postgres image there.
spring.datasource.url=jdbc:tc:postgresql:11.2://localhost:5433/test
I am trying to use kafka with camel and set up the following route:
public class WorkflowEventConsumerRoute extends RouteBuilder {
private static final String KAFKA_ENDPOINT =
"kafka:payments-bus?brokers=localhost:9092";
...
#Override
public void configure() {
from(KAFKA_ENDPOINT)
.routeId(format(KAFKA_CONSUMER))
.to("mock:end);
}
}
When I start my spring boot application I can see the route gets started but immediately after this it shuts down without any reasons given in the logs:
2018-12-21 12:06:45.012 INFO 12184 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2018-12-21 12:06:45.013 INFO 12184 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2018-12-21 12:06:45.014 INFO 12184 --- [ main] o.a.camel.spring.SpringCamelContext : Route: kafka-consumer started and consuming from: kafka://payments-bus?brokers=localhost%3A9092
2018-12-21 12:06:45.015 INFO 12184 --- [r[payments-bus]] o.a.camel.component.kafka.KafkaConsumer : Subscribing payments-bus-Thread 0 to topic payments-bus
2018-12-21 12:06:45.015 INFO 12184 --- [ main] o.a.camel.spring.SpringCamelContext : Total 1 routes, of which 1 are started
2018-12-21 12:06:45.015 INFO 12184 --- [ main] o.a.camel.spring.SpringCamelContext : Apache Camel 2.23.0 (CamelContext: camel-1) started in 0.234 seconds
2018-12-21 12:06:45.019 INFO 12184 --- [ main] a.c.n.t.p.workflow.WorkflowApplication : Started WorkflowApplication in 3.815 seconds (JVM running for 4.513)
2018-12-21 12:06:45.024 INFO 12184 --- [ Thread-10] o.a.camel.spring.SpringCamelContext : Apache Camel 2.23.0 (CamelContext: camel-1) is shutting down
On the other hand if create an unit test and point to the same kafka endpoint I am able to read the kafka topic content using the org.apache.camel.ConsumerTemplate instance provided by the CamelTestSupport
Ultimately if I replace the kafka endpoint in my route with an activemq one the route starts OK and the application stays up.
Obviously I am missing something but I cannot figure out what.
Thank you in advance for your help.
Do your spring-boot app have a -web-starter or not. If not then you should turn on the camel run-controller to keep the boot application running.
In the application.properties add
camel.springboot.main-run-controller = true
Using scala and spark, i try to build simple apps. To avoid too many logs, i have setLevel Log to Level.ERROR, but it seems all log still appear. Here're my codes :
import org.apache.log4j.{BasicConfigurator, Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
/**
* Created by hduser on 16/08/16.
*/
object ALS_Test {
def main(command : Array[String]): Unit =
{
Logger.getRootLogger
Logger.getLogger(this.getClass).setLevel(Level.ERROR)
Logger.getLogger("org.spark_project").setLevel(Level.ERROR)
BasicConfigurator.configure()
val sparkConf = new SparkConf().setAppName("AppName").setMaster("local[4]")
val sc = new SparkContext(sparkConf)
println("test 123")
}
}
The Output when running still have many confused logs :
0 [main] INFO org.apache.spark.SparkContext - Running Spark version 2.0.0
163 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of successful kerberos logins and latency (milliseconds)], valueName=Time)
175 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[Rate of failed kerberos logins and latency (milliseconds)], valueName=Time)
176 [main] DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation #org.apache.hadoop.metrics2.annotation.Metric(always=false, about=, sampleName=Ops, type=DEFAULT, value=[GetGroups], valueName=Time)
177 [main] DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl - UgiMetrics, User and group related metrics
555 [main] DEBUG org.apache.hadoop.util.Shell - Failed to detect a valid hadoop home directory
java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.