Setting up a sharded OrientDB - java

I'm trying to set this up on 3 servers. For the purpose of an example, I'm
trying to set up a class "client" with 3 clusters: "client_1",
"client_2", and "client_3". My servers are called node1, node2, and
node3. I want the clusters arranged such that I have 2 copies of each
cluster, so if 1 node goes down I still have access to all the data. For
example:
node1 is master for client_1 and has a copy of client_2.
node2 is master for client_2 and has a copy of client_3.
node3 is master for client_3 and has a copy of client_1.
I've tried setting this up with the following steps:
1. Download OrientDB 2.1.1 Community and extract onto the 3 servers.
2. Delete the GratefulDeadConcerts database from the databases directory on
each server.
3. Edit default-distributed-db-config.json on node1 as follows:
{
  "autoDeploy": true,
  "hotAlignment": false,
  "executionMode": "undefined",
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": {
    },
    "index": {
    },
    "client_1": {
      "servers": [ "node1", "node2" ]
    },
    "client_2": {
      "servers": [ "node2", "node3" ]
    },
    "client_3": {
      "servers": [ "node3", "node1" ]
    },
    "*": {
      "servers": [ "<NEW_NODE>" ]
    }
  }
}
4. Start node1 with dserver.sh.
5. Create a database using the console on node1:
connect remote:localhost root password
create database remote:localhost/testdb root password plocal graph
6. Create a class and rename the default cluster:
create class client extends v
alter cluster client name client_1
7. Start node2 with dserver.sh, wait for the database to auto-deploy, then
start node3 and wait for it to deploy.
At this point I have a database on 3 nodes, with a class called "client"
with only one cluster "client_1".
On node2, add the client_2 cluster:
alter class client addcluster client_2
Similarly, on node3:
alter class client addcluster client_3
If I reconnect all console sessions and execute "list clusters" I now see
all 3 clusters of the client class on each node. I also see the .cpm and
.pcl files for each of the 3 clusters on each node. However, it appears
that my intention in default-distributed-db-config.json is being taken into
account: if I wait a couple of minutes and then insert a record from each
node, I see that the timestamps and file sizes only change on the files
relating to the clusters that are supposed to be present on each node
(it would be nice and less confusing if the files didn't exist on the wrong
nodes, but it's not the end of the world).
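(For reference, the per-node test inserts were plain console commands along the lines of the following; the record content here is made up.)
insert into client cluster client_1 set name = 'from-node1'
insert into client cluster client_2 set name = 'from-node2'
insert into client cluster client_3 set name = 'from-node3'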
So... now it appears that I have the database set up the way I intended, but
the point of doing this is so that we can survive a server going down, so I
shut down node3 with Ctrl-C. I can still see each of the records (I inserted
3, one per cluster) from both node1 and node2 - so far so good.
If I take a look at the contents of distributed-db.json on node1 or node2,
I now see that my "client" class clusters have been reconfigured - there's no
node3 in the config any longer:
"client_3": { "servers": [ "node1" ], "#version": 0, "#type": "d" },
"client_2": { "servers": [ "node2" ], "#version": 0, "#type": "d" },
"client_1": { "servers": [ "node1", "node2" ], "#version": 0,
"#type": "d" }
Now I restart node3. The config does not get updated to include node3 again:
"client_3": { "servers": [ "node1" ], "#version": 0,
"#type": "d" },
"client_2": { "servers": [ "node2" ], "#version": 0, "#type": "d" },
"client_1": { "servers": [ "node1", "node2" ], "#version": 0,
"#type": "d" }
Is there something wrong in the way I've created/configured the database or is this a bug?

I think the issue here is that "hotAlignment" needs to be set to "true" in the file "default-distributed-db-config.json". Per the OrientDB 2.2.x sharding doc: "If hotAlignment=false is set, when a node re-joins the cluster (after a failure or simply unreachability) the full copy of database from a node could have no all information about the shards." Note, though, this bullet from the changes between 2.1.x and 2.2.x: "Removed hotAlignment setting: servers, once they join the cluster, remain always in the configuration until they are manually removed."
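On 2.1.x that would mean flipping the flag in default-distributed-db-config.json before the first startup, e.g. (only hotAlignment changes; the rest of the file stays as in the question):
{
  "autoDeploy": true,
  "hotAlignment": true,
  "executionMode": "undefined",
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": { ... }
}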

Related

Windowing is not triggered when we deployed the Flink application into Kinesis Data Analytics

We have an Apache Flink POC application which works fine locally but after we deploy into Kinesis Data Analytics (KDA) it does not emit records into the sink.
Used technologies
Local
Source: Kafka 2.7
1 broker
1 topic with partition of 1 and replication factor 1
Processing: Flink 1.12.1
Sink: Managed ElasticSearch Service 7.9.1 (the same instance as in case of AWS)
AWS
Source: Amazon MSK Kafka 2.8
3 brokers (but we are connecting to one)
1 topic with partition of 1, replication factor 3
Processing: Amazon KDA Flink 1.11.1
Parallelism: 2
Parallelism per KPU: 2
Sink: Managed ElasticSearch Service 7.9.1
Application logic
1. The FlinkKafkaConsumer reads messages in JSON format from the topic
2. The JSONs are mapped to domain objects, called Telemetry
private static DataStream<Telemetry> SetupKafkaSource(StreamExecutionEnvironment environment) {
    Properties kafkaProperties = new Properties();
    kafkaProperties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "BROKER1_ADDRESS.amazonaws.com:9092");
    kafkaProperties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "flink_consumer");

    FlinkKafkaConsumer<String> consumer = new FlinkKafkaConsumer<>("THE_TOPIC", new SimpleStringSchema(), kafkaProperties);
    consumer.setStartFromEarliest(); // Just for repeatable testing

    return environment
            .addSource(consumer)
            .map(new MapJsonToTelemetry());
}
3. The Telemetry's timestamp is used as the event timestamp.
3.1. With forMonotonousTimestamps
4. Telemetry's StateIso is used for keyBy.
4.1. The two-letter ISO code of the US state
5. A 5-second tumbling window strategy is applied
private static SingleOutputStreamOperator<StateAggregatedTelemetry> SetupProcessing(DataStream<Telemetry> telemetries) {
    WatermarkStrategy<Telemetry> wmStrategy =
            WatermarkStrategy
                    .<Telemetry>forMonotonousTimestamps()
                    .withTimestampAssigner((event, timestamp) -> event.TimeStamp);

    return telemetries
            .assignTimestampsAndWatermarks(wmStrategy)
            .keyBy(t -> t.StateIso)
            .window(TumblingEventTimeWindows.of(Time.seconds(5)))
            .process(new WindowCountFunction());
}
6. A custom ProcessWindowFunction is called to perform some basic aggregation.
6.1. We calculate a single StateAggregatedTelemetry
7. ElasticSearch is configured as the sink.
7.1. StateAggregatedTelemetry data are mapped into a HashMap and pushed into the sink.
7.2. All setBulkFlushXYZ methods are set to low values
private static void SetupElasticSearchSink(SingleOutputStreamOperator<StateAggregatedTelemetry> telemetries) {
    List<HttpHost> httpHosts = new ArrayList<>();
    httpHosts.add(HttpHost.create("https://ELKCLUSTER_ADDRESS.amazonaws.com:443"));

    ElasticsearchSink.Builder<StateAggregatedTelemetry> esSinkBuilder = new ElasticsearchSink.Builder<>(
            httpHosts,
            (ElasticsearchSinkFunction<StateAggregatedTelemetry>) (element, ctx, indexer) -> {
                Map<String, Object> record = new HashMap<>();
                record.put("stateIso", element.StateIso);
                record.put("healthy", element.Flawless);
                record.put("unhealthy", element.Faulty);
                ...
                LOG.info("Telemetry has been added to the buffer");
                indexer.add(Requests.indexRequest()
                        .index("INDEXPREFIX-" + from.format(DateTimeFormatter.ofPattern("yyyy-MM-dd")))
                        .source(record, XContentType.JSON));
            }
    );

    // Using low values to make sure that the flush will happen
    esSinkBuilder.setBulkFlushMaxActions(25);
    esSinkBuilder.setBulkFlushInterval(1000);
    esSinkBuilder.setBulkFlushMaxSizeMb(1);
    esSinkBuilder.setBulkFlushBackoff(true);
    esSinkBuilder.setRestClientFactory(restClientBuilder -> {});

    LOG.info("Sink has been attached to the DataStream");
    telemetries.addSink(esSinkBuilder.build());
}
Excluded things
We managed to put Kafka, KDA and ElasticSearch under the same VPC and same subnets to avoid the need to sign each request
From the logs we could see that Flink can reach the ES cluster.
Request
{
"locationInformation": "org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge.verifyClientConnection(Elasticsearch7ApiCallBridge.java:135)",
"logger": "org.apache.flink.streaming.connectors.elasticsearch7.Elasticsearch7ApiCallBridge",
"message": "Pinging Elasticsearch cluster via hosts [https://...es.amazonaws.com:443] ...",
"threadName": "Window(TumblingEventTimeWindows(5000), EventTimeTrigger, WindowCountFunction) -> (Sink: Print to Std. Out, Sink: Unnamed, Sink: Print to Std. Out) (2/2)",
"applicationARN": "arn:aws:kinesisanalytics:...",
"applicationVersionId": "39",
"messageSchemaVersion": "1",
"messageType": "INFO"
}
Response
{
"locationInformation": "org.elasticsearch.client.RequestLogger.logResponse(RequestLogger.java:59)",
"logger": "org.elasticsearch.client.RestClient",
"message": "request [HEAD https://...es.amazonaws.com:443/] returned [HTTP/1.1 200 OK]",
"threadName": "Window(TumblingEventTimeWindows(5000), EventTimeTrigger, WindowCountFunction) -> (Sink: Print to Std. Out, Sink: Unnamed, Sink: Print to Std. Out) (2/2)",
"applicationARN": "arn:aws:kinesisanalytics:...",
"applicationVersionId": "39",
"messageSchemaVersion": "1",
"messageType": "DEBUG"
}
We could also verify that the messages had been read from the Kafka topic and sent for processing by looking at the Flink Dashboard
What we have tried without luck
We implemented a RichParallelSourceFunction which emits 1_000_000 messages and then exits
This worked well in the local environment
The job finished in the AWS environment, but there was no data on the sink side
We implemented another RichParallelSourceFunction which emits 100 messages each second
Basically we had two loops: an outer while(true) and an inner for
After the inner loop we called Thread.sleep(1000)
This worked perfectly fine in the local environment
But in AWS we could see that the checkpoints' size grew continuously and no message appeared in ELK
We tried to run the KDA application with different parallelism settings
But there was no difference
We also tried to use different watermarking strategies (forBoundedOutOfOrderness, withIdle, noWatermarks)
But there was no difference
We added logs to the ProcessWindowFunction and to the ElasticsearchSinkFunction
Whenever we ran the application from IDEA, these logs appeared on the console
Whenever we ran the application with KDA, there were no such logs in CloudWatch
The logs that were added to main do appear in the CloudWatch logs
We suppose that we don't see data on the sink side because the window processing logic is not triggered. That's why we don't see processing logs in CloudWatch.
Any help would be more than welcome!
Update #1
We tried to downgrade the Flink version from 1.12.1 to 1.11.1
There was no change
We tried a processing-time window instead of event time
It did not even work in the local environment
Update #2
The average message size is around 4kb. Here is an excerpt of a sample message:
{
"affiliateCode": "...",
"appVersion": "1.1.14229",
"clientId": "guid",
"clientIpAddr": "...",
"clientOriginated": true,
"connectionType": "Cable/DSL",
"countryCode": "US",
"design": "...",
"device": "...",
...
"deviceSerialNumber": "...",
"dma": "UNKNOWN",
"eventSource": "...",
"firstRunTimestamp": 1609091112818,
"friendlyDeviceName": "Comcast",
"fullDevice": "Comcast ...",
"geoInfo": {
"continent": {
"code": "NA",
"geoname_id": 120
},
"country": {
"geoname_id": 123,
"iso_code": "US"
},
"location": {
"accuracy_radius": 100,
"latitude": 37.751,
"longitude": -97.822,
"time_zone": "America/Chicago"
},
"registered_country": {
"geoname_id": 123,
"iso_code": "US"
}
},
"height": 720,
"httpUserAgent": "Mozilla/...",
"isLoggedIn": true,
"launchCount": 19,
"model": "...",
"os": "Comcast...",
"osVersion": "...",
...
"platformTenantCode": "...",
"productCode": "...",
"requestOrigin": "https://....com",
"serverTimeUtc": 1617809474787,
"serviceCode": "...",
"serviceOriginated": false,
"sessionId": "guid",
"sessionSequence": 2,
"subtype": "...",
"tEventId": "...",
...
"tRegion": "us-east-1",
"timeZoneOffset": 5,
"timestamp": 1617809473305,
"traits": {
"isp": "Comcast Cable",
"organization": "..."
},
"type": "...",
"userId": "guid",
"version": "v1",
"width": 1280,
"xb3traceId": "guid"
}
We are using ObjectMapper to parse only some of the fields of the JSON. Here is what the Telemetry class looks like:
public class Telemetry {
    public String AppVersion;
    public String CountryCode;
    public String ClientId;
    public String DeviceSerialNumber;
    public String EventSource;
    public String SessionId;
    public TelemetrySubTypes SubType; // enum
    public String TRegion;
    public Long TimeStamp;
    public TelemetryTypes Type; // enum
    public String StateIso;
    ...
}
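The MapJsonToTelemetry mapper referenced in SetupKafkaSource is not shown in the post; purely as an illustration (a hypothetical sketch using Jackson, with field names taken from the excerpt above), it could look something like this:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.flink.api.common.functions.MapFunction;

public class MapJsonToTelemetry implements MapFunction<String, Telemetry> {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public Telemetry map(String json) throws Exception {
        // Parse the raw ~4 kB payload once and copy only the fields Telemetry declares.
        JsonNode root = MAPPER.readTree(json);
        Telemetry t = new Telemetry();
        t.AppVersion = root.path("appVersion").asText();
        t.CountryCode = root.path("countryCode").asText();
        t.ClientId = root.path("clientId").asText();
        t.DeviceSerialNumber = root.path("deviceSerialNumber").asText();
        t.EventSource = root.path("eventSource").asText();
        t.SessionId = root.path("sessionId").asText();
        t.TRegion = root.path("tRegion").asText();
        t.TimeStamp = root.path("timestamp").asLong();
        // ... remaining fields (enums, StateIso, etc.) omitted in this sketch
        return t;
    }
}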
Update #3
Source - Subtasks tab:
ID | Bytes received | Records received | Bytes sent | Records sent | Status
0  | 0 B            | 0                | 0 B        | 0            | RUNNING
1  | 0 B            | 0                | 2.83 MB    | 15,000       | RUNNING
Source - Watermarks tab: No Data
Window - Subtasks tab:
ID | Bytes received | Records received | Bytes sent | Records sent | Status
0  | 1.80 MB        | 9,501            | 0 B        | 0            | RUNNING
1  | 1.04 MB        | 5,499            | 0 B        | 0            | RUNNING
Window - Watermarks tab:
SubTask | Watermark
1       | No Watermark
2       | No Watermark
According to the comments and the additional information you have provided, it seems that the issue is that two Flink consumers cannot consume from the same Kafka partition. So, in your case, only one parallel instance of the source operator will consume from the Kafka partition and the other one will be idle.
In general, a downstream Flink operator selects the MIN over the watermarks of all of its parallel inputs. In your case one Kafka consumer will produce normal watermarks and the other will never produce anything (Flink assumes Long.MIN_VALUE in that case), so Flink will select the lower one, which is Long.MIN_VALUE. The window will therefore never be fired: even though data is flowing, one of the watermarks is never generated. Good practice is to use the same parallelism as the number of Kafka partitions when working with Kafka.
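If the source parallelism does have to stay higher than the partition count, one option (a sketch on my part, not something from the original post) is to declare idleness on the watermark strategy so an empty subtask stops holding back the operator watermark:

WatermarkStrategy<Telemetry> wmStrategy =
        WatermarkStrategy
                .<Telemetry>forMonotonousTimestamps()
                .withTimestampAssigner((event, timestamp) -> event.TimeStamp)
                // java.time.Duration; subtasks that see no data for 30s are marked idle
                // and ignored when the downstream operator computes its MIN watermark.
                .withIdleness(Duration.ofSeconds(30));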
After having a support session with the AWS folks, it turned out that we had missed setting the time characteristic on the streaming environment.
In 1.11.1 the default value of TimeCharacteristic was ProcessingTime, not EventTime.
Since 1.12 (see the related release notes) the default value is EventTime:
"In Flink 1.12 the default stream time characteristic has been changed to EventTime, thus you don't need to call this method for enabling event-time support anymore."
So, after we set EventTime explicitly, it started to generate watermarks like a charm:
streamingEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
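For completeness, a rough sketch of where that call sits in our job (the wiring below reuses the methods shown earlier; the job name is a placeholder, not an official sample):

StreamExecutionEnvironment streamingEnv = StreamExecutionEnvironment.getExecutionEnvironment();
// Explicitly enable event time; on Flink 1.11.1 the default characteristic is not EventTime.
streamingEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

DataStream<Telemetry> telemetries = SetupKafkaSource(streamingEnv);
SetupElasticSearchSink(SetupProcessing(telemetries));
streamingEnv.execute("telemetry-aggregation"); // placeholder job name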

Why can't a user with the 'read' role on the database list the collections?

Given a normal user ('simpleROUser', with only the 'read' role on the database), an error is thrown when attempting to list collections.
The error message is:
Exception in thread "main" com.mongodb.MongoCommandException: Command failed with error 13 (Unauthorized): 'not authorized on wmMonitoring to execute command { listCollections: 1, cursor: {}, $db: "wmMonitoring", ...' on server xxxxxxx:27001. The full response is {"operationTime": {"$timestamp": {"t": 1614169303, "i": 1}}, "ok": 0.0, "errmsg": "not authorized on wmMonitoring to execute command { listCollections: 1, cursor: {}, $db: \"wmMonitoring\", ...
However, changing only the user credentials to one with the 'root' role, it works (it lists all the collections under the database 'wmMonitoring').
I've checked the 'simpleROUser' privileges, and 'listCollections' is there.
rs-dev-00:PRIMARY> grants = db.getUser( "simpleROUser", { showCredentials: true, showPrivileges: true, showAuthenticationRestrictions: true } )
rs-dev-00:PRIMARY> grants.user
simpleROUser
rs-dev-00:PRIMARY> grants.inheritedPrivileges
[
  {
    "resource" : {
      "db" : "wmISMonitoring",
      "collection" : ""
    },
    "actions" : [
      ...
      "listCollections",
      ...
    ]
  },
  ...
]
rs-dev-00:PRIMARY>
So... what am I missing?
More info:
MongoDB server: Percona distribution, v4.4.1-3
Mongo Java Driver: v4.2.1
Found the issue.
In Brazilian Portuguese, this is also referred to as "dedo gordo" (fat finger).
There was a typo in the grant: the actual database name is wmMonitoring, not wmISMonitoring.
Once the database name in the grant command was fixed, everything else worked.
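For reference, a sketch of the kind of correction involved (the exact commands below are mine, not from the original answer, and assume the user was created in the admin database):

// assuming simpleROUser was created in the admin database
use admin
db.revokeRolesFromUser("simpleROUser", [ { role: "read", db: "wmISMonitoring" } ])
db.grantRolesToUser("simpleROUser", [ { role: "read", db: "wmMonitoring" } ])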

Unable to push docker image to Nexus

I am running Nexus OSS version 3.29.2-02 and I am experiencing some weird behavior. I am building various images at a CI level (GitLab) and I am pushing them to a custom repository.
For the most part everything works OK and I have no issues tagging and pushing my produced images. Lately though, one of the projects that produces Docker images fails to push, with the following error:
$ TAGGED=${NEXUS_DOCKER_URL}/${BASE_IMAGE_NAME}:snapshot-MR${CI_MERGE_REQUEST_IID}
$ docker tag ${BASE_IMAGE_NAME}:latest ${TAGGED}
$ docker push ${TAGGED}
The push refers to repository [<custom-repository-url>/<image-name:tag>]
cade37b0f9c9: Preparing
578ec024f17c: Preparing
fe0b994190e8: Preparing
b24d08ca4359: Preparing
9a14db3b513b: Preparing
777b2c648970: Preparing
777b2c648970: Waiting
b24d08ca4359: Layer already exists
9a14db3b513b: Layer already exists
777b2c648970: Layer already exists
fe0b994190e8: Pushed
cade37b0f9c9: Pushed
578ec024f17c: Pushed
[DEPRECATION NOTICE] registry v2 schema1 support will be removed in an upcoming release. Please contact admins of the <custom-repository-url> registry NOW to avoid future disruption.
errors:
blob unknown: blob unknown to registry
blob unknown: blob unknown to registry
ERROR: Job failed: exit code 1
I have tried debugging this behavior as well as searching online for a solution, but I have yet to find anything. It seems that for some reason this specific Docker image cannot be uploaded. I have tried the same procedure from both a local machine and from stateless CI builders, and the behavior is consistent, i.e. I was able to push it only once and then the process kept failing.
For reference my Dockerfile is the following:
FROM <custom-repository-url>/adoptopenjdk/openjdk11:jre-11.0.10_9-alpine
WORKDIR /home/app
COPY build/libs/email-service.jar application.jar

# Set the appropriate timezone
RUN apk add --no-cache tzdata && \
    cp /usr/share/zoneinfo/America/New_York /etc/localtime && \
    echo "America/New_York" > /etc/timezone

EXPOSE 8080
CMD java -jar ${OPTS} application.jar
This is quite straightforward and does not hide anything complicated. I initially thought that the problem could be attributed to using a proxied base image (i.e. the FROM line), but this is done in several other projects without any issues.
I have also tried checking Nexus's logs, and the only thing I see is the following:
2021-02-05 17:12:27,441+0000 ERROR [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.orient.V2ManifestUtilImpl - Manifest refers to missing layer: sha256:66db482b5034f8eda0b18533d4eddb0012f4940bf3d348b08ac3bac8486bb2ee for: fts/marketing/email-service/snapshot-MR40 in repository RepositoryImpl$$EnhancerByGuice$$4d5af99c{type=hosted, format=docker, name='docker-hosted-s3'}
2021-02-05 17:12:27,443+0000 ERROR [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.orient.V2ManifestUtilImpl - Manifest refers to missing layer: sha256:2ec25ba939258edb2e85293896c5126478d79fe416d3b60feb20426755bcea5a for: fts/marketing/email-service/snapshot-MR40 in repository RepositoryImpl$$EnhancerByGuice$$4d5af99c{type=hosted, format=docker, name='docker-hosted-s3'}
2021-02-05 17:12:27,445+0000 WARN [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.V2Handlers - Error: PUT /v2/fts/marketing/email-service/manifests/snapshot-MR40: 400 - org.sonatype.nexus.repository.docker.internal.V2Exception: Invalid Manifest
So my questions are:
What does this error really mean? I don't find it very useful:
errors:
blob unknown: blob unknown to registry
blob unknown: blob unknown to registry
What is really causing this behavior and how can I address the problem?
Note (not that it should make any difference), the image is a dockerized Micronaut application, using the latest version of the framework.
For reference, the output of docker inspect for said image is the following:
[{
"Id": "sha256:fec226a68e3b744fc792e47d3235e67f06b17883e60df52c8ae82c5a7ba9750f",
"RepoTags": [
"<custom-repository-url>/fts/marketing/email-service:mes-33-3",
"test-mes-33:latest"
],
"RepoDigests": [],
"Parent": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Comment": "",
"Created": "2021-02-06T21:06:59.987108458Z",
"Container": "8ab70692b75aac21d0866816aa52af5febf620744282d71a39dce55f81fe3e44",
"ContainerConfig": {
"Hostname": "8ab70692b75a",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=en_US.UTF-8",
"LANGUAGE=en_US:en",
"LC_ALL=en_US.UTF-8",
"JAVA_VERSION=jdk-11.0.10+9",
"JAVA_HOME=/opt/java/openjdk"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"CMD [\"/bin/sh\" \"-c\" \"java -jar ${OPTS} application.jar\"]"
],
"Image": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Volumes": null,
"WorkingDir": "/home/app",
"Entrypoint": null,
"OnBuild": null,
"Labels": {}
},
"DockerVersion": "19.03.13",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=en_US.UTF-8",
"LANGUAGE=en_US:en",
"LC_ALL=en_US.UTF-8",
"JAVA_VERSION=jdk-11.0.10+9",
"JAVA_HOME=/opt/java/openjdk"
],
"Cmd": [
"/bin/sh",
"-c",
"java -jar ${OPTS} application.jar"
],
"Image": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Volumes": null,
"WorkingDir": "/home/app",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"Architecture": "amd64",
"Os": "linux",
"Size": 220998577,
"VirtualSize": 220998577,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/78561c2e477b099a547bead4ea17b677bb01376fc1ed1ce1cd942157d35c0329/diff:/var/lib/docker/overlay2/af8ac4feace0cbecd616e2a02850ec366715aaa5ac8ad143cb633f52b0f6fbe2/diff:/var/lib/docker/overlay2/211a8e68c833f664de5d304838b8cd98b8e5e790f79da8b8839a4d52d02a8d66/diff:/var/lib/docker/overlay2/cbc98e7274ff8266425aed31989066ff7c5f7a46d9334b84110fc57d8b1d942c/diff:/var/lib/docker/overlay2/c773dedbc53b81c2e68ad61811445c0377271db3af526dbf5a6aa6671d0b2b71/diff",
"MergedDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/merged",
"UpperDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/diff",
"WorkDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:777b2c648970480f50f5b4d0af8f9a8ea798eea43dbcf40ce4a8c7118736bdcf",
"sha256:9a14db3b513b928759670c6a9b15fd89a8ad9bf222c75e0998c21bcb04e25e48",
"sha256:b24d08ca43598c9ea44f73c3f5dfca2b4897c475b2cc480bac98cccc42dce10f",
"sha256:11d1fa1ad1ef523c60369c11b1096baf89c8d43afa53813e84c73d0926848598",
"sha256:30001f69fd3b3b08fdbf6d843e38d0a16d0e46e84923f92480ac88603c0eb680",
"sha256:b2d3c5f57d1d626a7501b8871f548fd7e1f7625fe05c1885c27ec415b14e9915"
]
},
"Metadata": {
"LastTagTime": "2021-02-06T23:08:30.440032169+02:00"
}
}]
A Docker registry (in your case Nexus) throws that error whenever it encounters a missing or invalid layer in the image.
Nexus used to have difficulty with foreign layers, but that shouldn't be a problem since you are running quite a recent version.
I would think that you only need to enable "foreign layer caching" in Nexus to get this working.
It would be helpful to include the output of docker manifest inspect <custom-repository-url>/adoptopenjdk/openjdk11:jre-11.0.10_9-alpine and docker manifest inspect ${TAGGED}.
Docker registry API spec
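As a rough cross-check (my addition, not part of the original answer), you can also ask the registry directly whether one of the layers Nexus reports as missing exists, using the registry HTTP API's blob HEAD endpoint; the credential variables and hostname below are placeholders:

# Digest taken from the Nexus log above; 200 means the blob exists, 404 matches "blob unknown".
curl -sS -o /dev/null -w '%{http_code}\n' -u "$NEXUS_USER:$NEXUS_PASS" -I \
  "https://<custom-repository-url>/v2/fts/marketing/email-service/blobs/sha256:66db482b5034f8eda0b18533d4eddb0012f4940bf3d348b08ac3bac8486bb2ee"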

Elasticsearch local node has no active shards

I have a java/dropwizard application using elasticsearch (running version 1.3.2, but I've also tested with 1.4.0).
For local development/testing I use elasticsearch in local mode, starting it with the following commands:
Node node = nodeBuilder().local(true).node();
Client client = node.client();
According to the documentation (http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html, section on Node Client) this should be sufficient, but I'm not able to run any queries. Checking the health status, I get the following result:
{
"cluster_name" : "elasticsearch",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 10
}
I assume the problem is that I have no active shards. Does anyone know how to activate shards for this kind of setup?

Failed to restore snapshot - IndexShardRestoreFailedException file not found in elasticsearch

I am using a 3-node cluster with elasticsearch 1.3.1. I have 17 indices, each holding between 0.5 M documents (1 GiB) and 1.4 M documents (3 GiB). Now I would like to try the snapshot and restore process in my cluster. I used the following REST calls to do it...
To create a repository:
curl -XPUT 'http://host.name:9200/_snapshot/es_snapshot_repo' -d '{
"type": "fs",
"settings": {
"location": "/data/es_snapshot_bkup_repo/es_snapshot_repo"
}
}'
Verified the repository:
curl -XGET 'http://host.name:9200/_snapshot/es_snapshot_repo?pretty'
The response is:
{
"es_snapshot_repo" : {
"type" : "fs",
"settings" : {
"location" : "/data/es_snapshot_bkup_repo/es_snapshot_repo"
}
}
}
Took the snapshot using:
curl -XPUT "http://host.name:9200/_snapshot/es_snapshot_repo/snap_001" -d '{
"indices": "index_01",
"ignore_unavailable": "true",
"include_global_state": false,
"wait_for_completion": true
}'
The response is:
{
"accepted": true
}
Then I tried to restore the snapshot with the request:
curl -XPOST "http://host.name:9200/_snapshot/es_snapshot_repo/snap_001/_restore" -d '{
"indices": "index_01",
"ignore_unavailable": "true",
"include_global_state": false,
"rename_pattern": "index_01",
"rename_replacement": "index_01_bk",
"include_aliases": false
}'
ISSUE:
As I mentioned, I have 3 nodes. The index which I am trying to snapshot & restore has 6 shards and 2 replicas.
Most of the shards and their replicas are restored properly, but sometimes 1, sometimes 2 primary shards and their replicas are not restored. Those primary shards are stuck in the INITIALIZING state. I allowed the cluster to relocate them for more than an hour, but the shards are not relocating to the correct node... I got the following exception on my node.
The restore process tries to place the shard on the other 2 nodes... but that isn't possible...
[2014-08-27 07:10:35,492][DEBUG][cluster.service ] [node_01] processing [
shard-failed (
[snap_001][4],
node[r4UoA7vJREmQfh6lz634NA],
[P],
restoring[es_snapshot_repo:snap_001],
s[INITIALIZING]),
reason [Failed to start shard,
message [IndexShardGatewayRecoveryException[[snap_001][4] failed recovery];
nested: IndexShardRestoreFailedException[[snap_001][4] restore failed];
nested: IndexShardRestoreFailedException[[snap_001][4] failed to restore snapshot [snap_001]];
nested: IndexShardRestoreFailedException[[snap_001][4] failed to read shard snapshot file];
nested: FileNotFoundException[/data/es_snapshot_bkup_repo/es_snapshot_repo/indices/index_01/4/snapshot-snap_001 (No such file or directory)]; ]]]:
done applying updated cluster_state (version: 56391)
Could anyone help me to overcome this issue, and please correct me if I have made any mistake in this process?
FYI: I am using the master node to send the curl requests.
We need to provide a shared file system location which is accessible by all the elasticsearch nodes with read & write permission.
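If the repository location is just a local path on each machine, every node writes its own shard files locally, and a restore then fails with the FileNotFoundException shown above as soon as a shard is assigned to a node that doesn't hold those files. As an illustration only (the mount point below is made up), the location should resolve to the same shared mount, e.g. an NFS export mounted as /mnt/es_snapshots on all three nodes, and the repository registered against it:

curl -XPUT 'http://host.name:9200/_snapshot/es_snapshot_repo' -d '{
  "type": "fs",
  "settings": {
    "location": "/mnt/es_snapshots/es_snapshot_repo"
  }
}'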
