Snowplow Data Processing from PubSub to Java API - java

I am using Snowplow for behavioral data tracking. I can load the data from Pub/Sub into BigQuery using the open-source Snowplow loader (and mutator) (https://docs.snowplowanalytics.com/docs/getting-started-on-snowplow-open-source/setup-snowplow-on-gcp/setup-bigquery-destination/), but I would like to consume the data from Pub/Sub directly in a Java API.
However, the data arriving from Pub/Sub is an unstructured string with no schema. It uses "\t" as the delimiter and also contains "{}" blocks that hold some schemas, so formatting the data seems to require string processing.
Is there a better way to decode the data from Pub/Sub in a Java API than writing complex string processing? Thank you!

Snowplow maintains a number of so-called 'analytics SDKs' that let you transform the hybrid TSV + JSON format of enriched events into plain JSON that can then be used in downstream applications.
For Java, your best bet would probably be the Scala Analytics SDK, since Scala runs on the JVM: https://github.com/snowplow/snowplow-scala-analytics-sdk.
There are also SDKs for .NET, Go, JavaScript and Python: https://github.com/snowplow/snowplow/tree/master/5-data-modeling/analytics-sdk.
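If pulling in an SDK is not an option, the very first step (cutting the line into fields) can be approximated by hand. This is only a minimal sketch, assuming each Pub/Sub message payload is one tab-separated line. The class name, the sample line, and its fields are made up for illustration and do not reflect the real enriched event column layout:

```java
import java.util.ArrayList;
import java.util.List;

public class EnrichedLineSplitter {

    // Splits one tab-separated line into fields.
    // The -1 limit keeps trailing empty fields, which split("\t") alone would drop.
    public static String[] splitFields(String line) {
        return line.split("\t", -1);
    }

    // Returns the indexes of the fields that look like embedded JSON objects.
    public static List<Integer> jsonFieldIndexes(String[] fields) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < fields.length; i++) {
            String field = fields[i].trim();
            if (field.startsWith("{") && field.endsWith("}")) {
                indexes.add(i);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not a real enriched event.
        String line = "my-app\tweb\t{\"schema\":\"example\",\"data\":{}}\t\t";
        String[] fields = splitFields(line);
        System.out.println(fields.length + " fields, JSON at " + jsonFieldIndexes(fields));
    }
}
```

The JSON fields found this way still need a real JSON parser, and the mapping from position to field name is exactly the part the analytics SDKs take care of.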

Related

How to retrieve output of a JSON request in Java?

I am looking to use this JSON interface:
https://translate.yandex.net/api/v1.5/tr.json/translate ?
key=<API key>
& text=<text to translate>
& lang=<translation direction>
& [format=<text format>]
& [options=<translation options>]
& [callback=<name of the callback function>]
It returns a JSON object. How do I get that JSON object in Java?
I know there already is an implementation for that exact API, but it's old and not working anymore.
I've had good results using Google's Gson library. Really, this depends on what you are doing with the JSON data: is it a REST payload, a JMS message, or something else? A lot of tooling understands JSON natively now, so don't reinvent the wheel.
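Fetching the JSON text itself needs nothing beyond the JDK. The following is a minimal sketch, assuming Java 11 or later for java.net.http. The class and method names are made up for illustration, and whether the service still accepts requests at the address from the question is not verified here:

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class TranslateRequest {

    // Endpoint copied from the question.
    public static final String ENDPOINT =
            "https://translate.yandex.net/api/v1.5/tr.json/translate";

    // Builds the request URL from the three required parameters,
    // URL-encoding each value so spaces and special characters are safe.
    public static String buildUrl(String key, String text, String lang) {
        return ENDPOINT
                + "?key=" + URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&lang=" + URLEncoder.encode(lang, StandardCharsets.UTF_8);
    }

    // Sends a GET request and returns the raw response body (the JSON text).
    public static String fetchJson(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

The string returned by fetchJson can then be handed to a JSON library such as Gson to turn it into an object.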
Yandex.Translate API documentation can be found here:
With the API, you can access the online machine translation service Yandex. It supports more than 60 languages and can translate single words and whole texts. This API allows you to embed Yandex.Translate in a mobile application or web service for end users. Or, to translate large volumes of text - such as technical documentation.

What should be the return type of a WCF service for a large amount of data and with different clients?

I have created a WCF service whose return type is DataSet, which only works for .NET Framework clients. But now my requirements have changed and the clients can be on any platform, i.e. the service can be consumed by Java, Android phones and .NET applications.
My questions are:
Which data type should I use that is compatible with all clients? (Java doesn't have DataSet as a type, and I don't have much knowledge of Java.)
The service I've created is the default one provided by the .NET Framework (not REST, and not using SOAP manually).
The data will be thousands of lines. Which return type will be better?
Do I have to use REST or SOAP for a large amount of data?
How can I achieve this?
please don't mark this question as DUPLICATE!
For a large amount of data you should consider the following:
Try to use some kind of compression. I usually use 7-Zip's open source compression to reduce the data transfer.
Use and share well-defined DTOs (Data Transfer Objects) on both the server and the clients.
Use streams to transfer your data and parse them correctly.
If you have fewer DTOs and large data in the objects, you can use SOAP; otherwise stick to REST.
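On the Java client side, the compression and stream points can be sketched with the JDK alone. Note that this uses GZIP from java.util.zip rather than the 7-Zip compression mentioned above, simply because it ships with the JDK. The class name and the payload format are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedPayload {

    // Compresses a text payload (for example serialized DTOs) with GZIP.
    public static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the compressed bytes back through a stream and restores the text.
    public static String decompress(byte[] data) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Repetitive tabular data of a few thousand lines usually shrinks considerably this way, and GZIP is readable from .NET, Java and Android alike.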

Generate and parse text files in Java

I'm looking for a library/framework to generate/parse TXT files from/into Java objects.
I'm thinking of something like Castor or JAXB, where the mapping between the file and the objects can be defined programmatically or with XML/annotations. The TXT file is not homogeneous and has no separators (fixed positions). The file is not big, so DOM-like handling is acceptable and no streaming is required.
For instance:
TextWriter.write(Collection objects) -> FileOutputStream
TextReader.read(FileInputStream fis) -> Collection
I suggest you use Google's Protocol Buffers:
Protocol buffers are a flexible, efficient, automated mechanism for
serializing structured data – think XML, but smaller, faster, and
simpler. You define how you want your data to be structured once, then
you can use special generated source code to easily write and read
your structured data to and from a variety of data streams and using a
variety of languages. You can even update your data structure without
breaking deployed programs that are compiled against the "old" format.
Protobuf messages can be exported/read in binary or text format.
Other solutions would depend on what you call a text file: if Base64 is texty enough for you, you could simply use standard Java serialization with Base64 encoding of the binary stream.
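The Base64 idea can be sketched with the JDK alone. This is a minimal illustration, not a recommendation: the class and method names are made up, and the resulting text is only readable by another Java program that has the same classes on its classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

public class Base64Serialization {

    // Serializes the object with standard Java serialization,
    // then encodes the binary stream as Base64 text.
    public static String toBase64(Serializable object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Decodes the Base64 text and deserializes the object again.
    public static Object fromBase64(String text) {
        byte[] bytes = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only deserialize text from a trusted source, because Java deserialization of untrusted input is a known security risk.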
You can do this using Jackson: serialize to JSON and back.
http://jackson.codehaus.org/
Just generate and parse it with XML or JSON formats, there's a whole load of libraries out there that will do all the work for you.

How to deserialize in PHP an object serialized in Java

Is there any way to deserialize in PHP an object serialized in Java? I.e., if I have a Java class that implements Serializable and I use an ObjectOutputStream to write the object, and convert the result to a string, is there a way in PHP to take that string and create a similar object representation from it?
What does the Java Serialized data look like?
Response:
���sr�com.site.entity.SessionV3Data���������xpsr�java.util.HashMap���`��F�
loadFactorI� thresholdxp?#�����w������t� sessionIdt�0NmViMzUxYWItZDRmZC00MWY4LWFlMmUtZjg2YmZjZGUxNjg5xx
:)
I would heavily recommend you don't do this. Java serialization is meant for a Java instance to both save and load the data (for either transmission to another Java application or persistence between invocations of the same application). It was not at all meant to be a cross-platform protocol.
I would advise you to make an API adapter layer between the two. Output the contents of your Java object to a format you can work with in PHP, be it XML, YAML, or even a binary format (where you could use DataOutputStream).
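The binary variant of such an adapter could look like the following minimal sketch. The class name and the two fields (a session id and a visit count) are hypothetical. The point is only that each field is written explicitly in a documented order, instead of relying on Java's own serialization format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SessionExport {

    // Writes the fields in a fixed order: a string with a 2-byte length
    // prefix (writeUTF), followed by a 4-byte big-endian int (writeInt).
    public static byte[] write(String sessionId, int visitCount) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(sessionId);
            out.writeInt(visitCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the same layout back; used here only to check the format.
    public static String describe(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String sessionId = in.readUTF();
            int visitCount = in.readInt();
            return sessionId + "/" + visitCount;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On the PHP side, a layout like this can be read with unpack, using the big-endian format codes n (16-bit) and N (32-bit). Keep in mind that writeUTF uses Java's modified UTF-8, which matches standard UTF-8 for plain ASCII text but not for every character.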
What is the easiest way to eat soup with chopsticks when the soup was put in a bowl with a ladle? Put the soup in a cup and discard your chopsticks, because chopsticks are a poor choice for aiding in the consumption of soup. A cup (ubiquitous) eliminates external dependencies except for "mouth" and "opposable thumbs", both of which come with the standard library of humans.
A more elegant solution would be to encode that Java object with a JSON serializer or XML serializer. Protocol Buffers or any other intentionally cross-language serialization technique would also work fine, and Protocol Buffers can efficiently encode binary data.
Some time ago I did something similar. However, I didn't make PHP read the "Java serialize" format. I did the opposite, that is, made Java serialize itself to the "PHP serialize" format. This is actually quite easy. Have a look at the PHPSerializedResponseWriter class that is part of the Solr package:
https://github.com/terrancesnyder/solr-analytics/blob/master/solr/core/src/java/org/apache/solr/response/PHPSerializedResponseWriter.java
...then all you have to do is just read the string and call:
$result = unserialize($string);
From comments in the online PHP manual, there is a Java class that serializes to the PHP serialization format that you can look into. Then you can unserialize the data using the standard PHP functionality.
Is it possible to use one of the more common cross platform data formats like JSON to communicate between your Java app and PHP? PHP has plenty of parsers for those formats. Check out json_decode for an example.
Is there any way to deserialize in PHP
an object serialized in Java?
Yes. The question is, should you? Exporting the Java object as XML or JSON probably makes more sense.
The following SO question might also help.
Dynamically create PHP object based on string

understanding json

JSON stands for JavaScript Object Notation. But how come languages like PHP, Java, C, etc. can also communicate with each other using JSON?
What I want to know is: am I correct to say that JSON is not limited to JS only, but serves as a protocol for applications to communicate with each other over the network, which is the same purpose as XML?
JSON cannot handle complex data hierarchies like XML can (attributes, namespaces, etc.), but on the other hand you don't get the same overhead with JSON as you get with XML (if you don't need the complex data structures).
Since JSON is plain text with a special notation for JS to interpret, it's an easy protocol to adopt in other languages.
It is easy for a JS script to parse JSON, since it can be done using 'eval', which lets the JS engine use its full power.
On the other hand, it is more complicated to generate JSON from within JS. Usually one uses the JSON package from www.json.org, in which an object can easily be serialised using JSON.stringify, but it is implemented in JS, so it does not run with optimal performance.
So serialising JSON is about the same complexity in JS as in Java, PHP or any other server-side language.
Therefore, in my opinion, JSON is best suited when there is asymmetry between producer and consumer, e.g. a web server that generates a lot of data that is consumed by the web application. Not the other way around.
But! When one chooses JSON as the data format, it should be used in both directions, not XML<>JSON. The exception is when simple GET requests are used to retrieve JSON data.
Yes, JSON is also widely used as a data exchange protocol, much like XML.
Typically a program (not written in JavaScript) needs a JSON library to parse and create JSON objects (although you can probably create them even without one).
You're right: it's a lightweight data interchange format. More details at: http://www.json.org
You are completely correct. JSON is a definition of how data should be formatted. It is more lightweight than XML and therefore well suited to things like AJAX, where you want to send data back and forth to the server quickly.
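To illustrate the "even without one" remark, here is a minimal Java sketch that writes a flat JSON object of string values by hand. The class and method names are made up, and it deliberately covers only strings. A real library remains the better choice, especially for parsing:

```java
import java.util.Map;

public class TinyJson {

    // Wraps a string in double quotes and escapes the characters
    // that JSON does not allow unescaped inside a string.
    public static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Writes a flat JSON object whose keys and values are all strings.
    public static String object(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append('}').toString();
    }
}
```

Because the output is plain text, any other language can read it with its own JSON parser, which is the point the answers above are making.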
