Adding an attachment on Azure CosmosDB - java

I am looking for some help on how to add an attachment in Cosmos DB. Here is a little background.
Our application currently runs on IBM Bluemix and uses CloudantDB. We use CloudantDB to store attachments (PDF files). We are now moving to the Azure PaaS App Service and planning to use Cosmos DB. I am looking for help on how to create an attachment in Cosmos DB using the Java API. Which API do I need to use? I want to do a small POC.
Thanks,

Personally, I feel that in Azure, if you want to put files into DocumentDB, you will pay a high query cost. Instead, the normal practice is to store the file in Azure Blob Storage and save the link in a field, then return the URL if it's public, or the binary data if you want it to be secured.
However, you could store it using:
var myDoc = new { id = "42", Name = "Max", City = "Aberdeen" }; // this is the document you are trying to save
var attachmentStream = File.OpenRead("c:/Path/To/File.pdf");   // this is the document stream you are attaching
var client = await GetClientAsync();
var createUrl = UriFactory.CreateDocumentCollectionUri(DatabaseName, CollectionName);
Document document = await client.CreateDocumentAsync(createUrl, myDoc);
await client.CreateAttachmentAsync(document.SelfLink, attachmentStream, new MediaOptions()
{
    ContentType = "application/pdf", // your content type
    Slug = "78"                      // this is actually the attachment ID
});
See WORKING WITH ATTACHMENTS; I have answered a similar question here.
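If you go the blob-plus-link route instead, a minimal Java sketch could look like the following. It assumes the classic com.microsoft.azure.storage SDK, and the names here (the connection string, the attachments container, the helper itself) are hypothetical, chosen for illustration:
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;
import java.io.File;

public class BlobLinkSketch {
    // Uploads the PDF to Blob Storage and returns the URL to store in a Cosmos DB document field.
    public static String uploadPdf(String connectionString, File pdf) throws Exception {
        CloudStorageAccount account = CloudStorageAccount.parse(connectionString);
        CloudBlobClient blobClient = account.createCloudBlobClient();
        CloudBlobContainer container = blobClient.getContainerReference("attachments"); // hypothetical container
        container.createIfNotExists();
        CloudBlockBlob blob = container.getBlockBlobReference(pdf.getName());
        blob.uploadFromFile(pdf.getAbsolutePath());
        return blob.getUri().toString(); // save this in the document instead of the raw bytes
    }
}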

What client API can I use?
You could use the Cosmos DB Java SDK to CRUD attachments.
import com.microsoft.azure.documentdb.*;
import java.util.UUID;

public class CreateAttachment {
    // Replace with your DocumentDB endpoint and master key.
    private static final String END_POINT = "***";
    private static final String MASTER_KEY = "***";

    public static void main(String[] args) throws Exception {
        DocumentClient documentClient = new DocumentClient(END_POINT,
                MASTER_KEY, ConnectionPolicy.GetDefault(),
                ConsistencyLevel.Session);

        String uuid = UUID.randomUUID().toString();
        Attachment attachment = getAttachmentDefinition(uuid, "application/text");
        RequestOptions options = new RequestOptions();
        // getDocumentLink() should return the link of the document the attachment
        // belongs to, e.g. "dbs/{db}/colls/{coll}/docs/{doc}".
        ResourceResponse<Attachment> attachmentResourceResponse =
                documentClient.createAttachment(getDocumentLink(), attachment, options);
    }

    private static Attachment getAttachmentDefinition(String uuid, String type) {
        return new Attachment(String.format(
                "{" +
                "  'id': '%s'," +
                "  'media': 'http://xstore.'," +
                "  'MediaType': 'Book'," +
                "  'Author': 'My Book Author'," +
                "  'Title': 'My Book Title'," +
                "  'contentType': '%s'" +
                "}", uuid, type));
    }
}
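To verify the POC you could read the attachments back. A minimal sketch, reusing the documentClient and getDocumentLink() from above (readAttachments is from the same SDK; null FeedOptions for brevity):
// List the document's attachments and print their ids and content types.
FeedResponse<Attachment> attachments = documentClient.readAttachments(getDocumentLink(), null);
for (Attachment a : attachments.getQueryIterable()) {
    System.out.println(a.getId() + " -> " + a.getContentType());
}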
In the documentation it says the total file size we can store is 2 GB:
"Azure Cosmos DB allows you to store binary blobs/media either with Azure Cosmos DB (maximum of 2 GB per account)"
Is that the maximum we can store?
Yes. The size of attachments is limited in DocumentDB. However, there are two methods for creating an Azure Cosmos DB document attachment.
1. Store the file as an attachment to a document.
The raw attachment is included as the body of the POST, and two headers must be set:
Slug – the name of the attachment.
contentType – set to the MIME type of the attachment.
2. Store the URL for the file in an attachment to a document.
The body of the POST includes the following:
id – the unique name that identifies the attachment, i.e. no two attachments will share the same id. The id must not exceed 255 characters.
media – the URL link or file path where the attachment resides.
The following is an example:
{
    "id": "device\A234",
    "contentType": "application/x-zip-compressed",
    "media": "www.bing.com/A234.zip"
}
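Translated to the Java SDK used above, creating such a media-link attachment could look like this sketch (the id, content type, and URL are the illustrative values from the JSON above, not real resources):
Attachment linkAttachment = new Attachment(
        "{" +
        "  'id': 'A234'," +
        "  'contentType': 'application/x-zip-compressed'," +
        "  'media': 'www.bing.com/A234.zip'" +
        "}");
// Same createAttachment call as before; only the attachment body differs.
documentClient.createAttachment(getDocumentLink(), linkAttachment, new RequestOptions());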
If your files exceed that limit, you could store them using the second method. For more details, please refer to this blog.
In addition, note that Cosmos DB attachments support a garbage collection mechanism: the media is garbage collected when all of the outstanding references are dropped.
Hope it helps you.

Related

Cassandra With Spark connector - How Insert list Of Items to Cassandra

Working with Cassandra and Spark 2.12 (3.2.0) with Java, Cassandra connector 3.1.0.
My goal is to fetch from S3, preprocess, and insert into Cassandra in parallel.
The problem I ran into is that preprocessing each S3 file yields a list of items to insert into Cassandra, i.e. a JavaRDD<List<SearchEntity>>.
How should I pass it to Cassandra (as in the code example below)? I see the connector supports only a single object with mapToRow.
Maybe I am missing something?
I am using the following code:
public static void main(String[] args) throws Exception {
    SparkConf conf = new SparkConf()
            .setAppName("Example Spark App")
            .setMaster("local[1]")
            .set("spark.cassandra.connection.host", "127.0.0.1");

    JavaSparkContext sparkContext = new JavaSparkContext(conf);
    sparkContext.hadoopConfiguration().set("fs.s3a.access.key", "XXXX");
    sparkContext.hadoopConfiguration().set("fs.s3a.secret.key", "YYYY");
    sparkContext.hadoopConfiguration().set("fs.s3a.endpoint", "XXXXX");
    sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
    sparkContext.hadoopConfiguration().set("mapreduce.input.fileinputformat.input.dir.recursive", "true");

    JavaPairRDD<String, PortableDataStream> javaPairRDD = sparkContext.binaryFiles("s3a://root/folder/");
    File ROOT = createTempFolder().getAbsoluteFile();

    JavaRDD<List<SearchEntity>> listJavaRDD = javaPairRDD.map(rdd -> {
        System.out.println("Working on TAR: " + rdd._1);
        DataInputStream stream = rdd._2.open();
        // some preprocess
        List<SearchEntity> toCassandraList = new WorkerTest(ROOT, stream).run();
        return toCassandraList;
    });

    // here I want to take List<SearchEntity> toCassandraList and save them,
    // but I don't see how, as it supports only a single object ...
    CassandraJavaUtil.javaFunctions(listJavaRDD)
            .writerBuilder("demoV2", "simple_search",
                    CassandraJavaUtil.mapToRow(List<SearchEntity> list objects ...)) // here is the problem
            .saveToCassandra();

    System.out.println("Finish run s3ToCassandra:");
    sparkContext.stop();
}
The schema was configured manually beforehand, for test purposes only.
CREATE TABLE simple_search (
    engine text,
    term text,
    time bigint,
    rank bigint,
    url text,
    domain text,
    pagenum bigint,
    descr text,
    display_url text,
    title text,
    type text,
    PRIMARY KEY ((engine, term), time, url, domain, pagenum)
) WITH CLUSTERING ORDER BY (time DESC, url DESC, domain DESC, pagenum DESC);
Both Java and Scala solutions are welcome.
To write the data, you need to work on the SearchEntity, not on the list of SearchEntity. To do that, use flatMap instead of an ordinary map (note that in the Java API, flatMap must return an Iterator, not a List):
JavaRDD<SearchEntity> entriesRDD = javaPairRDD.flatMap(rdd -> {
    System.out.println("Working on TAR: " + rdd._1);
    DataInputStream stream = rdd._2.open();
    // some preprocess
    List<SearchEntity> toCassandraList = new WorkerTest(ROOT, stream).run();
    return toCassandraList.iterator();
});
and then you can just write the data as per the documentation:
javaFunctions(entriesRDD).writerBuilder("demoV2", "simple_search",
        mapToRow(SearchEntity.class)).saveToCassandra();
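Note that for mapToRow(SearchEntity.class) to work, SearchEntity must be a serializable JavaBean whose property names line up with the table's columns. A partial sketch (the class name comes from the question; only a few of the columns are spelled out, the rest is assumed):
// Bean expected by mapToRow; getters/setters must correspond to column names.
public class SearchEntity implements java.io.Serializable {
    private String engine;
    private String term;
    private long time;
    private String url;
    // ... rank, domain, pagenum, descr, display_url, title, type omitted for brevity

    public String getEngine() { return engine; }
    public void setEngine(String engine) { this.engine = engine; }
    public String getTerm() { return term; }
    public void setTerm(String term) { this.term = term; }
    public long getTime() { return time; }
    public void setTime(long time) { this.time = time; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
}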
P.S. But beware that if your tar is too big, it may cause memory errors on the workers when you're creating the List<SearchEntity>. Depending on the file format inside the tar file, it could be more optimal to unpack the data first and then read it using Spark.

Unable to delete document in Azure Cosmos DB with Document Link - Java Async V2

I'm trying to delete a document in one of my Azure Cosmos DB collections using the Java Cosmos DB Async V2 library with the document link. Every time I try to perform the delete operation on the document, I run into a "Resource not found" issue. I'm not sure what I missed.
Here is what I have so far:
public Document deleteDocument(String docId, String collectionName) {
    RequestOptions options = new RequestOptions();
    PartitionKey key = new PartitionKey(docId);
    options.setPartitionKey(key);
    String documentLink = String.format("/dbs/%s/colls/%s/docs/%s", this.databaseName, collectionName, docId);
    LOGGER.info("DOCUMENT LINK:" + documentLink);
    return this.client.deleteDocument(documentLink, options).next().block().getResource();
}
Here is the document information from Azure Cosmos DB:
Input:
this.deleteDocument("beff44de-914a-4250-80c3-108b71989720", "SravanCollection");
Thank you
You can do something like this in V4:
public Mono<CosmosAsyncItemResponse> deleteBookByID(String id, String partitionKey) {
    return container.deleteItem(id, new PartitionKey(partitionKey));
}
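The second argument must be the value of the document's partition key field, which is not necessarily its id. A hypothetical call for the document from the question (the partition key value here is made up for illustration):
// id from the question; replace "somePartitionKeyValue" with the document's actual partition key field value
deleteBookByID("beff44de-914a-4250-80c3-108b71989720", "somePartitionKeyValue").block();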
You can find a full sample application here:
https://github.com/RaviTella/SpringBootWebFluxCosmosSQL/tree/master/ReadingListWebApp
This should work in V2. Make sure you pass the correct values for the partition key and id:
private void deleteBookObservable(String id, String partitionKey) {
    String documentLink = String.format("/dbs/%s/colls/%s/docs/%s", databaseName,
            collectionName, id);
    RequestOptions reqOpts = new RequestOptions();
    reqOpts.setPartitionKey(new PartitionKey(partitionKey));
    ResourceResponse<Document> response =
            getClient().deleteDocument(documentLink, reqOpts).toBlocking().first();
}
It's highly recommended to use V4 for all new development. There are breaking changes from V2 to V4; V4 uses the Reactor Core implementation of the reactive extensions.

Java - DocumentDB Unauthorized access

I am trying to write functionality to store an object in Azure DocumentDB.
I have the following piece of code:
public void saveEvent(Event event) throws DocumentClientException {
    Document document = new Document(JsonCreator.createJson(event));
    // check if the document already exists
    FeedOptions feedOptions = new FeedOptions();
    feedOptions.setEnableCrossPartitionQuery(true);
    FeedResponse<Document> eventDocument = documentClient.queryDocuments(COLLECTION_LINK,
            String.format(SELECT_DOCUMENT, event.getId()), feedOptions);
    // if there is a document with the ID then replace
    if (eventDocument.getQueryIterator().hasNext()) {
        documentClient.replaceDocument(COLLECTION_LINK, document, null);
    } else {
        documentClient.createDocument(COLLECTION_LINK, document, null, false);
    }
}
If the event does not exist (meaning there is no record in the database with the id of the event), then createDocument is called. If the record already exists in the database, then replaceDocument is called.
createDocument is called without problems and the document is created in the database.
replaceDocument throws a StatusCode: Unauthorized exception:
com.microsoft.azure.documentdb.DocumentClientException: The input authorization token can't serve the request. Please check that the expected payload is built as per the protocol, and check the key being used. Server used the following payload to sign: 'put colls dbs/sporteventsdb/colls/sportevents sun, 26 mar 2017 08:32:41 gmt
Full stack: http://pastebin.com/YVGwqLkH
Constants used in the code:
COLLECTION_LINK = "dbs/" + DATABASE_ID + "/colls/" + COLLECTION_ID;
SELECT_DOCUMENT = "SELECT * FROM " + DATABASE_ID + " WHERE " + DATABASE_ID + ".id = \"%d\"";
I'm developing with Spring framework on IntelliJ IDEA on Ubuntu with Java 8.
You're getting the error because DocumentDB's replaceDocument takes the documentLink as an argument, not the collectionLink. The Javadoc is here: http://azure.github.io/azure-documentdb-java/
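A sketch of the fix, reusing the query result from the question's code (the self link of the already-stored document is what replaceDocument expects):
// Replace by document link, not by collection link.
Document existing = eventDocument.getQueryIterator().next();
documentClient.replaceDocument(existing.getSelfLink(), document, null);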

How can I set pageToken to get item lists from Google Cloud Storage via Java SDK?

I want to set a pageToken to get items stored in Google Cloud Storage. I'm using the Google API Client Library for Java v1.19.x.
I have no idea how to generate a pageToken from a file path (or file name).
2 files are stored in the bucket:
my-bucket
    /test.csv
    /test2.csv
When I tried the Google APIs Explorer with the following parameters, I got the nextPageToken Cgh0ZXN0LmNzdg==, and I found that decoding it with Base64 yields the string test.csv.
bucket: my-bucket
pageToken:
prefix: test
maxResults: 1
{"kind": "storage#objects", "nextPageToken": "Cgh0ZXN0LmNzdg==", ...}
But how can I get Cgh0ZXN0LmNzdg== from test.csv?
I tried Base64 encoding, but the result didn't match:
import com.google.api.client.repackaged.org.apache.commons.codec.binary.Base64;

String lastFile = "test.csv";
String token = Base64.encodeBase64String(lastFile.getBytes());
String bucket = "my-bucket";
String prefix = "test";

Storage.Objects.List listObjects = client.objects().list(bucket);
listObjects.setPrefix(prefix);
listObjects.setPageToken(token);
long maxResults = 1;
listObjects.setMaxResults(maxResults);

do {
    Objects objects = listObjects.execute();
    List<StorageObject> items = objects.getItems();
    token = objects.getNextPageToken();
    listObjects.setPageToken(token);
} while (token != null);
I figured out how to derive the next token from the file path string myself.
How to get nextToken from a path string:
String nextToken = base64encode(0x0a + asciiCode + pathString)
The asciiCode byte takes a value between 0x01 (SOH) and 0x7f (DEL); it corresponds to the length of the path.
my-bucket/
a/a (3 bytes) 0x03
a/ab (4 bytes) 0x04
test.txt (8 bytes) 0x08
Notice: if the path is longer than 1024 bytes, another rule seems to apply, but I couldn't figure out what it is.
See also Object Name Requirements.
import com.google.common.base.Charsets;
import com.google.common.io.BaseEncoding;

String lastFile = "test.csv";
String token = base64Encode(lastFile);
String bucket = "my-bucket";
String prefix = "test";

Storage.Objects.List listObjects = client.objects().list(bucket);
listObjects.setPrefix(prefix);
listObjects.setPageToken(token);
long maxResults = 1;
listObjects.setMaxResults(maxResults);

do {
    Objects objects = listObjects.execute();
    List<StorageObject> items = objects.getItems();
    token = objects.getNextPageToken();
    listObjects.setPageToken(token);
} while (token != null);

private String base64Encode(String path) {
    // token bytes = 0x0a, <byte length of path>, <path bytes>
    byte[] utf8 = path.getBytes(Charsets.UTF_8);
    byte[] encoding = new byte[utf8.length + 2];
    encoding[0] = 0x0a;
    encoding[1] = (byte) utf8.length;
    System.arraycopy(utf8, 0, encoding, 2, utf8.length);
    return BaseEncoding.base64().encode(encoding);
}
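As a quick sanity check against the token observed in the APIs Explorer earlier, encoding test.csv this way reproduces it:
// "test.csv" is 8 bytes, so the token bytes are 0x0a 0x08 followed by the path bytes
String check = base64Encode("test.csv");
System.out.println(check); // prints Cgh0ZXN0LmNzdg==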
I know this question is already answered and applies to Java, but I'd like to mention that it applies to PHP as well.
With the help of the approved post from sakama above, I figured out a PHP version of his solution.
The PHP equivalent for generating the token is as follows:
base64_encode(pack('c', 0x0a) . pack('c', $path_string_length) . pack('a*', $path_string));
The byte pattern seems indeed (as sakama already mentioned) to be:
<line feed><line data length><line data>

createUserDefinedFunction : if already exists?

I'm using the azure-documentdb Java SDK in order to create and use user-defined functions (UDFs).
From the official documentation I finally found the way (with a Java client) to create a UDF:
String regexUdfJson = "{"
        + "id:\"REGEX_MATCH\","
        + "body:\"function (input, pattern) { return input.match(pattern) !== null; }\","
        + "}";
UserDefinedFunction udfREGEX = new UserDefinedFunction(regexUdfJson);
getDC().createUserDefinedFunction(
        myCollection.getSelfLink(),
        udfREGEX,
        new RequestOptions());
And here is a sample query:
SELECT * FROM root r WHERE udf.REGEX_MATCH(r.name, "mytest_.*")
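Running that query through the Java client might look like this sketch (queryDocuments is from the same SDK; everything else reuses names from the snippet above):
// Invoke the UDF from the Java SDK once it has been created.
FeedResponse<Document> results = getDC().queryDocuments(
        myCollection.getSelfLink(),
        "SELECT * FROM root r WHERE udf.REGEX_MATCH(r.name, \"mytest_.*\")",
        null);
for (Document d : results.getQueryIterable()) {
    System.out.println(d.getId());
}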
I have to create the UDF only once, because I get an exception if I try to recreate an existing UDF:
DocumentClientException: Message: {"Errors":["The input name presented is already taken. Ensure to provide a unique name property for this resource type."]}
How can I tell whether the UDF already exists?
I tried to use the readUserDefinedFunctions function without success. Any example / other ideas?
Maybe, for the long term, we should suggest a createOrReplaceUserDefinedFunction(...) on Azure feedback.
You can check for existing UDFs by running a query using queryUserDefinedFunctions.
Example:
List<UserDefinedFunction> udfs = client.queryUserDefinedFunctions(
        myCollection.getSelfLink(),
        new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
                new SqlParameterCollection(new SqlParameter("@id", myUdfId))),
        null).getQueryIterable().toList();
if (udfs.size() > 0) {
    // Found UDF.
}
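Building on that, a small createUserDefinedFunctionIfNotExists-style helper is easy to sketch using only the query above and the createUserDefinedFunction call from the question (the helper name is made up; error handling omitted):
// Sketch: create the UDF only if no UDF with the same id exists yet.
void createUdfIfNotExists(DocumentClient client, DocumentCollection myCollection,
                          UserDefinedFunction udf) throws DocumentClientException {
    List<UserDefinedFunction> existing = client.queryUserDefinedFunctions(
            myCollection.getSelfLink(),
            new SqlQuerySpec("SELECT * FROM root r WHERE r.id=@id",
                    new SqlParameterCollection(new SqlParameter("@id", udf.getId()))),
            null).getQueryIterable().toList();
    if (existing.isEmpty()) {
        client.createUserDefinedFunction(myCollection.getSelfLink(), udf, new RequestOptions());
    }
}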
An answer for .NET users:
var collectionAltLink = documentCollections["myCollection"].AltLink; // target collection's AltLink
var udfLink = $"{collectionAltLink}/udfs/{sampleUdfId}"; // sampleUdfId is your UDF id
var result = await _client.ReadUserDefinedFunctionAsync(udfLink);
var resource = result.Resource;
if (resource != null)
{
    // The UDF with udfId exists
}
Here _client is Azure's DocumentClient and documentCollections is a dictionary of your DocumentDB collections.
If there is no such UDF in the mentioned collection, _client throws a NotFound exception.
