I'm using Solr as the search engine in my web application, with the DataImportHandler automatically importing data from my database into the search index. When the DataImportHandler adds new data, the data is added to the index successfully, but it isn't returned when I query the index using SolrJ: I have to restart my application server before SolrJ finds the new data. Is there some kind of caching going on? I use SolrJ in embedded mode. Here's my SolrJ code:
private static CoreContainer coreContainer;
private static final SolrServer solrServer = initSolrServer();

private static SolrServer initSolrServer() {
    try {
        // Bootstrap the cores defined in solr.xml and wrap them in an embedded server
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        coreContainer = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "");
        return server;
    } catch (Exception ex) {
        logger.log(Level.SEVERE, "Error initializing SOLR server", ex);
        return null;
    }
}
Then to query I do the following:
SolrQuery query = new SolrQuery(keyword);
QueryResponse response = solrServer.query(query);
As you can see, my SolrServer is declared as static. Should I create a new EmbeddedSolrServer for each query instead? I'm afraid that will incur a big performance penalty.
The standard Solr configuration doesn't enable auto-commit. If you have a solrconfig.xml file, look for the commented-out "autoCommit" tag. Otherwise, you can call server.commit(); after each document is added, although with a large stream of documents this can become a serious performance problem (commit is a relatively heavy operation).
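For illustration, here is a minimal SolrJ sketch of an explicit commit after adding a document (the field names are made up; in your setup the DataImportHandler does the adding, so the key point is that new documents only become searchable after a commit):
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

// Exception handling omitted for brevity
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "42");          // hypothetical field
doc.addField("title", "Example");  // hypothetical field
solrServer.add(doc);

// Until a commit opens a new searcher, the added document is not visible to queries
solrServer.commit();

QueryResponse response = solrServer.query(new SolrQuery("title:Example"));
System.out.println("Found: " + response.getResults().getNumFound());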
If you are using it in a web application, I'd advise deploying solr-x.x.war alongside your application instead of using EmbeddedSolrServer. This gives you a rich HTTP interface for updating, administering and searching the index.
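If you go that route, SolrJ can then talk to the deployed Solr over HTTP; a minimal sketch, assuming a SolrJ version that provides HttpSolrServer and a Solr instance reachable at the default URL:
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HttpSolrExample {
    public static void main(String[] args) throws Exception {
        // URL of the Solr webapp deployed from solr-x.x.war (adjust host/port/core to your setup)
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        QueryResponse response = server.query(new SolrQuery("keyword"));
        System.out.println("Hits: " + response.getResults().getNumFound());
    }
}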
I have a function app: func1 (HttpTrigger) -> blob storage -> func2 (BlobTrigger). In Application Insights, two separate request telemetry items are generated with different operation IDs, and each gets its own end-to-end transaction trace.
To get an end-to-end trace for the whole app, I would like to correlate the two functions by setting the parent ID and operation ID of func2 to the request ID and operation ID of func1, so that both show up in Application Insights as one end-to-end trace.
I have tried the following code, but it didn't have any effect, and in general there is a lack of documentation about how to use the Application Insights Java SDK to customize telemetry.
#FunctionName("Create-Thumbnail")
#StorageAccount(Config.STORAGE_ACCOUNT_NAME)
#BlobOutput(name = "$return", path = "output/{name}")
public byte[] generateThumbnail(
#BlobTrigger(name = "blob", path = "input/{name}")
byte[] content,
final ExecutionContext context
) {
try {
TelemetryConfiguration configuration = TelemetryConfiguration.getActive();
TelemetryClient client = new TelemetryClient(configuration);
client.getContext().getOperation().setParentId("MY_CUSTOM_PARENT_ID");
client.flush();
return Converter.createThumbnail(content);
} catch (Exception e) {
e.printStackTrace();
return content;
}
}
Can anyone with knowledge in this area provide some tips?
I'm afraid this can't be achieved; as the official doc says:
In C# and JavaScript, you can use an Application Insights SDK to write
custom telemetry data.
To set custom telemetry, you would need to add an Application Insights Java SDK to your function, but I haven't found any such SDK... If there's any progress, I'll update here.
I am using Java to connect to JanusGraph via Gremlin, and the following code to create a vertex and an edge. Currently I am committing via g.tx().commit() as part of client.submit(), as shown in the code below:
try {
    String sessionId = UUID.randomUUID().toString();
    Client client = cluster.connect(sessionId);
    client.submit("graph.tx().open()");
    client.submit("g.addV('Person').property('Name', 'Justin').next()");
    client.submit("graph.tx().commit()");
    List<Result> rs = client.submit("g.V().count()").all().join();
    System.out.println("Result size is " + rs.size());
    System.out.println(rs.get(0).getString());
    client.closeAsync();
} catch (Exception e) {
    e.printStackTrace();
}
So I want to know whether there is another, more appropriate way to handle transactions from Java, or whether this is the only way to do so.
Thanks,
Atul.
If you are submitting requests to a remote JanusGraph server, then that is the way to do it: you use connect(<sessionId>) to create a session and then submit scripts against it. In the recently released TinkerPop 3.5.0, however, there are changes to that rule. You can now do bytecode-based sessions as well as script-based sessions, which means the transaction API is now unified for both embedded and remote use cases. You can see more in the 3.5.0 Upgrade Documentation found here.
The 3.5.0 release is quite recent, having only been announced a couple of weeks ago. As a result, at the time of this answer JanusGraph does not yet support it (though work has started on it here). Until you are on a release of JanusGraph that supports TinkerPop 3.5.0, you have two options for transactions:
The one you are already using, for remote use cases, or
Use JanusGraph in the embedded style.
For the latter, as taken from the documentation in the link provided:
graph = JanusGraphFactory.open("berkeleyje:/tmp/janusgraph")
juno = graph.addVertex() //Automatically opens a new transaction
juno.property("name", "juno")
graph.tx().commit() //Commits transaction
With TinkerPop 3.5.0+, transactions can also be handled through a Transaction object obtained from g.tx(), for example:
public boolean transactionExample() {
    System.out.println("Begin Transaction");
    Transaction tx = g.tx();
    String id = "123321";
    GraphTraversalSource gtx = tx.begin();
    try {
        gtx.addV("T").property(T.id, id).next();
        System.out.println("Searching before commit ==> " + gtx.V().hasId(id).elementMap().next());
        if (2 / 0 == 0) { // deliberately triggers an ArithmeticException to exercise the rollback path
            throw new TransactionException("throwing exception");
        }
        tx.commit();
        System.out.println("Committed Transaction");
    } catch (Exception ex) {
        System.out.println("Catching exception " + ex);
        System.out.println(gtx);
        tx.rollback();
        System.out.println("Rolled back Transaction");
    }
    System.out.println(gtx.tx().isOpen());
    return true;
}
For more information, refer to https://github.com/m-thirumal/gremlin-dsl
I created an AWS Lambda package (Java) with a function that reads some files from Amazon S3 and pushes the data to the AWS Elasticsearch Service. Since I'm using AWS Elasticsearch, I can't use the Transport client, so I'm working with the Jest client to push the data via REST. The issue is with the Jest client.
Here's my Jest client instance:
public JestClient getClient() throws InterruptedException {
    final Supplier<LocalDateTime> clock = () -> LocalDateTime.now(ZoneOffset.UTC);
    DefaultAWSCredentialsProviderChain awsCredentialsProvider = new DefaultAWSCredentialsProviderChain();
    final AWSSigner awsSigner = new AWSSigner(awsCredentialsProvider, REGION, SERVICE, clock);

    JestClientFactory factory = new JestClientFactory() {
        @Override
        protected HttpClientBuilder configureHttpClient(HttpClientBuilder builder) {
            builder.addInterceptorLast(new AWSSigningRequestInterceptor(awsSigner));
            return builder;
        }

        @Override
        protected HttpAsyncClientBuilder configureHttpClient(HttpAsyncClientBuilder builder) {
            builder.addInterceptorLast(new AWSSigningRequestInterceptor(awsSigner));
            return builder;
        }
    };
    factory.setHttpClientConfig(
            new HttpClientConfig.Builder(URL)
                    .discoveryEnabled(true)
                    .multiThreaded(true)
                    .build());
    JestClient jestClient = factory.getObject();
    return jestClient;
}
Since the AWS Elasticsearch domain is protected by an IAM access policy, I sign the requests so that they are authorized by AWS (example here). I use POJOs to index documents.
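For context, indexing a single POJO with Jest in this setup looks roughly like this (the Article class, index name and document type are made-up placeholders):
import io.searchbox.client.JestClient;
import io.searchbox.core.DocumentResult;
import io.searchbox.core.Index;

// Assumes a simple POJO, e.g. a class Article with public String id and title fields;
// Jest serializes it to JSON internally
Article article = new Article();
article.id = "1";
article.title = "Hello";

JestClient client = getClient();
DocumentResult result = client.execute(
        new Index.Builder(article).index("articles").type("article").id(article.id).build());
System.out.println("Indexed OK: " + result.isSucceeded());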
The problem I face is that I am not able to execute more than one action with the Jest client instance. For example, if I create the index first:
client.execute(new CreateIndex.Builder(indexName).build());
and later on I wanted to, for example, do some bulk indexing:
for (Object object : listOfObjects) {
    bulkIndexBuilder.addAction(new Index.Builder(object)
            .index(INDEX_NAME).type(DOC_TYPE).build());
}
client.execute(bulkIndexBuilder.build());
only the first action will be executed and the second will fail. Why is that? Is it possible to execute more than one action?
Moreover, using the provided code, I'm not able to execute more than about 20 bulk operations when indexing documents. Around 20 is fine, but anything more than that and client.execute(bulkIndexBuilder.build()); simply does not execute, and the client shuts down.
Any help or suggestion would be appreciated.
UPDATE:
It seems that AWS Elasticsearch does not allow connecting to individual nodes. Simply turning off node discovery in the Jest client with .discoveryEnabled(false) solved all the problems. This answer helped.
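In other words, the only change needed to the factory shown above is roughly this (same placeholder URL constant as before):
factory.setHttpClientConfig(
        new HttpClientConfig.Builder(URL)
                // AWS Elasticsearch sits behind a managed endpoint, so node discovery
                // must stay off; discovered node addresses are not reachable directly
                .discoveryEnabled(false)
                .multiThreaded(true)
                .build());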
I am trying to log some of my data to the Azure Table storage service, and it was working fine until now. I am logging to Azure Table storage from Java.
Suddenly, since yesterday, I have been getting the error below:
Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
com.microsoft.azure.storage.table.TableServiceException
I read some articles I found on Google (e.g. https://github.com/Azure/azure-storage-net/issues/171), but they talk about blobs; I could not find anything related to Table storage.
Can someone help me with this one? The code used to access the table is the usual, shown below.
private CloudTable GetCloudTableContainer(String tableName) {
    CloudTable table = null;
    try {
        // Retrieve storage account from connection string.
        CloudStorageAccount storageAccount = CloudStorageAccount.parse(config.CONNECTION_STRING);
        // Create the table client.
        CloudTableClient tableClient = storageAccount.createCloudTableClient();
        // Retrieve a reference to a table.
        table = tableClient.getTableReference(tableName);
        table.createIfNotExists();
    } catch (Exception ex) {
        ex.printStackTrace();
    }
    return table;
}
Are you using a shared access signature (SAS)? Maybe it has expired?
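If your connection string carries a SAS instead of an account key, a quick way to check is to look at its se (signed expiry) parameter; a rough sketch, assuming the same CONNECTION_STRING constant used in the question:
// Very rough check: if the connection string carries a SharedAccessSignature,
// its "se" parameter is the UTC expiry time of the SAS token.
String connectionString = config.CONNECTION_STRING; // same constant as in the question
for (String part : connectionString.split(";")) {
    if (part.startsWith("SharedAccessSignature=")) {
        String sas = part.substring("SharedAccessSignature=".length());
        for (String param : sas.split("&")) {
            if (param.startsWith("se=")) {
                // Compare this timestamp against the current UTC time; if it is in the
                // past, regenerate the SAS (or switch to the account key) in your config.
                System.out.println("SAS expiry (se): " + param.substring(3));
            }
        }
    }
}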
I have written a Java app that synchronises Google Groups on our Google Apps for Education domain (similar in function to Google Apps School Directory Sync, but customised for some of our specific needs).
The synchronisation works, but it is slow because it performs each task individually. I know that there are API interfaces for batching operations, but I can't find any examples of how this is done with the Java API.
The code I'm using looks similar to this (authentication and other setup is taken care of elsewhere):
try
{
    Member m = new Member ();
    m.setEmail (member);
    m.setRole ("MEMBER");
    service.members ().insert (group, m).execute ();
}
catch (Exception e)
{
    // ERROR handling
}
Instead of executing these operations one-by-one, I would like to batch them instead. Can anyone tell me how?
Look here: Batch Java API
For example:
BatchRequest batch = new BatchRequest(httpTransport, httpRequestInitializer);
batch.setBatchUrl(new GenericUrl(/*your customized batch URL goes here*/));
batch.queue(httpRequest1, dataClass, errorClass, callback);
batch.queue(httpRequest2, dataClass, errorClass, callback);
batch.execute();
Remember that:
The body of each part is itself a complete HTTP request, with its own
verb, URL, headers, and body. The HTTP request must only contain the
path portion of the URL; full URLs are not allowed in batch requests.
UPDATE
Look also here for how to build a batch with the Google Batch API:
https://github.com/google/google-api-java-client
UPDATE 2
Try something like this:
// Create the Storage service object
Storage storage = new Storage(httpTransport, jsonFactory, credential);
// Create a new batch request
BatchRequest batch = storage.batch();
// Add some requests to the batch request
storage.objectAccessControls().insert("bucket-name", "object-key1",
new ObjectAccessControl().setEntity("user-123423423").setRole("READER"))
.queue(batch, callback);
storage.objectAccessControls().insert("bucket-name", "object-key2",
new ObjectAccessControl().setEntity("user-guy#example.com").setRole("READER"))
.queue(batch, callback);
storage.objectAccessControls().insert("bucket-name", "object-key3",
new ObjectAccessControl().setEntity("group-foo#googlegroups.com").setRole("OWNER"))
.queue(batch, callback);
// Execute the batch request. The individual callbacks will be called when requests finish.
batch.execute();
From here: Batch request with Google Storage Json Api (JAVA)
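Applied to the member insertions from the original question, the same pattern would look roughly like this (a sketch, assuming service is the Directory API client, group is the group key from the question, and memberEmails is a placeholder for your own list of addresses):
import com.google.api.client.googleapis.batch.BatchRequest;
import com.google.api.client.googleapis.batch.json.JsonBatchCallback;
import com.google.api.client.googleapis.json.GoogleJsonError;
import com.google.api.client.http.HttpHeaders;
import com.google.api.services.admin.directory.model.Member;

// Queue all member insertions into one batch instead of calling execute() per member
BatchRequest batch = service.batch();

JsonBatchCallback<Member> callback = new JsonBatchCallback<Member>() {
    @Override
    public void onSuccess(Member member, HttpHeaders responseHeaders) {
        System.out.println("Added " + member.getEmail());
    }

    @Override
    public void onFailure(GoogleJsonError error, HttpHeaders responseHeaders) {
        System.err.println("Failed: " + error.getMessage());
    }
};

for (String email : memberEmails) { // memberEmails is a placeholder for your own data
    Member m = new Member();
    m.setEmail(email);
    m.setRole("MEMBER");
    service.members().insert(group, m).queue(batch, callback);
}

// One HTTP round trip executes all queued insertions; the callbacks fire per request
batch.execute();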