Funnel analysis using MongoDB? - java

I have a collection named 'event' that tracks events from mobile applications.
The structure of an event document is:
{
    eventName: "eventA",
    screenName: "HomeScreen",
    timeStamp: NumberLong("135698658"),
    tracInfo: {
        ...,
        "userId": "user1",
        "sessionId": "123cdasd2123",
        ...
    }
}
I want to create a report that displays a particular funnel,
e.g.:
funnel: event1 -> event2 -> event3
I want to find the count of:
event1
event1 then event2
event1 then event2 and then event3
The session also matters, i.e. the events must occur within a single session.
Note: to be clear, I want to be able to define any funnel and create a report for it.

Your solution is likely to revolve around an aggregation like this:
db.event.aggregate([
    { $group: { _id: '$tracInfo.sessionId', events: { $push: '$eventName' } } }
])
where every resulting document will contain a sessionId and a list of eventNames. Add other fields to the $group results as needed. If the order of events matters, as it does for a funnel, consider adding a $sort on timeStamp before the $group so that the names are pushed in chronological order. I imagine the logic for detecting your desired sequences in-pipeline would be pretty hairy, so you might consider saving the results to a different collection which you can inspect at your leisure. MongoDB 2.6 added the $out stage for just such occasions.
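Once the events are grouped per session, the sequence check may be easier to do in application code than in the pipeline. Here is a minimal Java sketch, assuming the grouped results have already been loaded into a map from sessionId to a time-ordered list of event names. FunnelCounter and countFunnel are hypothetical names, and "then" is interpreted as "occurs later in the same session", not necessarily immediately after.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: counts how many sessions reach each step of a funnel.
final class FunnelCounter {

    // Returns counts[i] = number of sessions that completed funnel steps 0..i in order.
    // Assumes each session's event list is already sorted by timeStamp.
    static int[] countFunnel(Map<String, List<String>> eventsBySession, List<String> funnel) {
        int[] counts = new int[funnel.size()];
        for (List<String> events : eventsBySession.values()) {
            int step = 0; // index of the next funnel step this session still has to reach
            for (String event : events) {
                if (step < funnel.size() && funnel.get(step).equals(event)) {
                    counts[step]++; // each session is counted at most once per step
                    step++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> sessions = Map.of(
                "s1", List.of("event1", "event2", "event3"),
                "s2", List.of("event1", "other", "event2"),
                "s3", List.of("event2", "event1"));
        int[] counts = countFunnel(sessions, List.of("event1", "event2", "event3"));
        System.out.println(Arrays.toString(counts)); // prints [3, 2, 1]
    }
}
```

Because the funnel is just a list of event names, any funnel you define can be passed in without changing the aggregation.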


GraphQL Query taking in multiple lists as arguments

This is about adding GraphQL to an existing Java api which takes in multiple lists as input.
Background:
I have an existing Java-based REST API, getFooInformationByRecognizer, which takes a list of recognizers, where each recognizer object contains an id and its type, and returns information corresponding to each id.
The only 3 types possible are A, B or C. The input can be any combination of these types.
Eg:
[{"id": "1", "type": "A" }, {"id": "2", "type": "B"},{"id": "3", "type": "C"}, {"id": "4", "type": "A"}, {"id":"5", "type": "B"}]
Here is its Java representation:
class FooRecognizer{
    String id;
    String FooType;
}
This API does a bit of processing.
First, it extracts all the input ids of type A and fetches the information corresponding to those ids.
Similarly, it extracts the ids of type B and fetches the information corresponding to those ids, and likewise for C.
So it fetches data from 3 different sources, and finally collates them into a single map and returns it.
Eg:
ids of type A --> A SERVICE --> <DATA SOURCE FOR A>
ids of type B --> B SERVICE --> <DATA SOURCE FOR B>
ids of type C --> C SERVICE --> <DATA SOURCE FOR C>
Finally does this:
A information + B information + C information and puts this in a Java Hashmap.
The Java representation of the request to this service is:
class FooRequest{
    private Bar bar;
    List<FooRecognizer> list;
}
The Java representation of the response object from the service is:
class FooInformationResponse{
    private Map<String, FooRecognizer> fooInformationCollated;
}
Sample JSON output of the response is:
"output":{
    "fooInformationCollated":{
        "1":{
            "someProperty": "somePropertyValue",
            "listOfNestedProperties": [{"x": "xValue", "y": "yValue", "z": "zValue"}],
            "nestedProperty":{
                "anotherProperty":{
                    "furtherNestedProperty": "value"
                }
            }
        },
        "2":{
            "someProperty": "somePropertyValue",
            "listOfNestedProperties": [{"a": "aValue", "b": "bValue", "c": "cValue"}],
            "nestedProperty":{
                "anotherProperty":{
                    "furtherNestedProperty": "value"
                }
            }
        }
    }... and so on for other ids in the input
Now, I want to convert this service to GraphQL and here is my query.
query{
    getFooInformationByRecognizer(filterBy:{
        fooRecognizer: [{
            id: "1",
            fooType: A
        }],
        bar: {
            barId: "someId",
            ...<other bar info>
        }
    }){
        fooInformationCollated{
            id
            fooInformation{
                someProperty
                listOfNestedProperties
                nestedProperty{
                    anotherProperty{
                        furtherNestedProperty
                    }
                }
            }
        }
    }
}
Here is my GraphQL schema:
type Query{
    getFooInfoByRecognizer (filterBy: getFooByRecognizerTypeFilter!):getFooByRecognizerTypeFilterResponse
}
input getFooByRecognizerTypeFilter{
    bar: Bar!
    fooIdentifiers: [FooIdentifier!]!
}
input Bar{
    barId: String!
    ....
}
input FooIdentifier{
    id: String!
    fooIdType: fooIdType!
}
enum fooIdType{
    A
    B
    C
}
I have a few questions here:
Would this be the best way / best practice to represent this query? Or should I model my query to take 3 separate lists, e.g. query getFooInformationByRecognizer(barId, listOfAs, listOfBs, listOfCs)? Are there any other ways to model this query?
I found a complex input type to be the easiest. In general, is there any specific reason to choose a complex input type over the other choices, or vice versa?
Is there anything related to query performance that I should be concerned about? I've tried looking into DataLoader / BatchLoading, but that doesn't quite seem to fit the case. I don't think the N+1 problem should be an issue, as I will also create separate individual resolvers for A, B and C, but the query, as can be seen, does not make further calls to the back end once the JSON is returned in the response.
The question is too broad to answer concretely, but here's my best attempt.
While there isn't a definitive answer on 1 complex input argument vs multiple simpler arguments, 1 complex argument is generally more desirable, as it's easier for clients to pass a single variable, and it keeps the GraphQL files smaller. This may be more interesting for mutations, but it is a good heuristic regardless. The reasoning is explained in more detail in, e.g., this article.
The logic explained above echoes your own observations.
For the specific scenario you listed, I don't see anything important for performance. You seem to fetch the whole list in one go (no N+1), so it is not much different from what you're doing for your REST endpoint. Now, I can't say how expensive it is to fetch the lower-level fields (e.g. whether you need JOINs or network calls or whatever), but if there's any non-trivial logic, you may want to optimize it by looking ahead into the sub-selection before resolving your top-level fields.
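To illustrate the single-argument option: one mixed list of typed recognizers can be split per type on the server, so the client does not need to send three separate lists. This is a minimal Java sketch, not the asker's actual code. FooRecognizer and FooType here are simplified stand-ins for the classes in the question, and RecognizerGrouping and idsByType are hypothetical names.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class RecognizerGrouping {

    enum FooType { A, B, C }

    // Simplified stand-in for the question's FooRecognizer class.
    static final class FooRecognizer {
        final String id;
        final FooType type;

        FooRecognizer(String id, FooType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Splits one mixed list into per-type id lists, keeping input order within each type.
    static Map<FooType, List<String>> idsByType(List<FooRecognizer> recognizers) {
        Map<FooType, List<String>> byType = new EnumMap<>(FooType.class);
        for (FooRecognizer r : recognizers) {
            byType.computeIfAbsent(r.type, t -> new ArrayList<>()).add(r.id);
        }
        return byType;
    }

    public static void main(String[] args) {
        List<FooRecognizer> input = List.of(
                new FooRecognizer("1", FooType.A),
                new FooRecognizer("2", FooType.B),
                new FooRecognizer("3", FooType.C),
                new FooRecognizer("4", FooType.A),
                new FooRecognizer("5", FooType.B));
        System.out.println(idsByType(input)); // prints {A=[1, 4], B=[2, 5], C=[3]}
    }
}
```

Each per-type list can then be handed to the corresponding A, B or C service in one batch, which is the same shape of work the REST endpoint already does.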

Count specific enum values with JPA rather than manually

I have a list of status enum values which I am currently iterating over, using a basic counter to store how many items in my list have the specific value I am looking for. I want to improve greatly on this, however, and think there may be a way to use some kind of JPA query on a paging and sorting repository to accomplish the same thing.
My current version, which isn't as optimized as I would like, is as follows.
public enum MailStatus {
    SENT("SENT"),
    DELETED("DELETED"),
    SENDING("SENDING");

    private final String value;

    MailStatus(String value) { this.value = value; }
}
var mailCounter = 0
val mails = mailService.getAllMailForUser(userId).toMutableList()
mails.forEach { mail ->
    if (mail.status === MailStatus.SENT) {
        mailCounter++
    }
}
With a paging and sorting JPA repository, is there some way to query this instead and get a count of only the mail that has a status of SENT?
I tried the following, but I seem to be getting everything rather than just the 'SENT' status.
fun countByUserIdAndMailStatusIn(userId: UUID, mailStatus: List<MailStatus>): Long

SugarCRM custom field

I'm writing software for data synchronization between a custom application and SugarCRM, so I need an updateOrCreate() function. My problem is that the custom software uses different UUIDs than SugarCRM, so I can't look up the UUID to decide between update and create. So I want to save the custom UUID in a custom field in SugarCRM.
But I have no idea how to do that over the REST API of SugarCRM.
By the way, I wrote a Java application.
Thank you for your help!
As far as I'm aware there is no update-or-create API (see https://your-sugarsite/rest/v10/help); however, if you just want to use the API (rather than customize it) you could sync data like this:
1) Fetch all ids of records that have a custom uuid by using the POST /rest/v10/<module>/filter endpoint and a payload similar to:
{
    "offset": 0,
    "max_num": 1000,
    "fields": ["id", "custom_uuid_c"],
    "filter": [{"custom_uuid_c": {"$not_empty": ""}}]
}
or if you just need a specific custom uuid at a time:
{
    "offset": 0,
    "max_num": 1000,
    "fields": ["id"],
    "filter": [{"custom_uuid_c": {"$equals": "example-custom-uuid"}}]
}
The response will look something like this:
{
    "next_offset": -1,
    "records": [
        {"id": "example-sugar-uuid", "custom_uuid_c": "example-custom-uuid"},
        ...
    ]
}
Notes:
Make sure to evaluate next_offset as even with a high max_num you may not get all records at once because of server limits. As long as next_offset isn't -1 you should use its value as offset in a new request to get the remaining records.
You can supply all field names you need to sync in the fields array, so that you get that information early and can check whether or not an update is required at all (maybe data is still up-to-date?)
Sugar also always includes certain fields in the response, whether or not they were requested (e.g. id and date_modified). I did not include them all in the response snippets for the sake of simplicity.
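The next_offset handling from the notes above could be sketched in Java roughly like this. It is only a sketch under assumptions: fetchPage is a hypothetical function that performs the POST /rest/v10/<module>/filter request for a given offset and parses the response, and Page and SugarPaging are made-up names. No HTTP code is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

final class SugarPaging {

    // One parsed filter response: the records plus the next_offset value.
    static final class Page {
        final List<Map<String, String>> records;
        final int nextOffset;

        Page(List<Map<String, String>> records, int nextOffset) {
            this.records = records;
            this.nextOffset = nextOffset;
        }
    }

    // Calls fetchPage repeatedly, feeding next_offset back in as the new offset,
    // until the server reports next_offset == -1. Returns all collected records.
    static List<Map<String, String>> fetchAll(IntFunction<Page> fetchPage) {
        List<Map<String, String>> all = new ArrayList<>();
        int offset = 0;
        while (offset != -1) {
            Page page = fetchPage.apply(offset);
            all.addAll(page.records);
            offset = page.nextOffset;
        }
        return all;
    }

    public static void main(String[] args) {
        // Fake two-page server used only for demonstration.
        IntFunction<Page> fake = offset -> offset == 0
                ? new Page(List.of(Map.of("id", "a"), Map.of("id", "b")), 2)
                : new Page(List.of(Map.of("id", "c")), -1);
        System.out.println(fetchAll(fake).size()); // prints 3
    }
}
```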
2) Based on the information received in the previous step, you know which Sugar ID belongs to which custom UUID, and you can detect/prepare data for updates.
If you need to sync everything and retrieve the complete list first, I suggest you create a lookup table custom-uuid => sugar-id, so that you do not have to loop through the data array and compare fields when looking for a specific one. Don't forget to consider the possibility of a custom UUID being present in more than one Sugar record at a time, unless you enforce uniqueness on the server/database side.
3) Now that you have all the information you need, you can update and create records as needed:
Update existing record: PUT /rest/v10/<module>/<record_id>
Create missing record: POST /rest/v10/<module>
If you want to send a lot of creates and/or updates in a single request, have a look at the POST /rest/v10/bulk API, if your version of Sugar has it.
Final notes:
The filter operator definitions on /rest/v10/help seem incomplete; for more info you can check the filter docs.
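Steps 2) and 3) could look roughly like this in Java. This is a sketch under assumptions: the records are taken to be already parsed into maps with the "id" and "custom_uuid_c" keys from the response above, SugarSyncPlan and its methods are hypothetical names, and "Accounts" is used only as an example module. The sketch decides which request to send; it does not send it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SugarSyncPlan {

    // Builds the lookup table custom-uuid -> sugar-id from the filter records.
    // Fails fast if one custom uuid appears on more than one Sugar record.
    static Map<String, String> buildLookup(List<Map<String, String>> records) {
        Map<String, String> lookup = new HashMap<>();
        for (Map<String, String> record : records) {
            String customUuid = record.get("custom_uuid_c");
            String previous = lookup.put(customUuid, record.get("id"));
            if (previous != null) {
                throw new IllegalStateException("Duplicate custom uuid: " + customUuid);
            }
        }
        return lookup;
    }

    // Returns the method and path to use for a record with the given custom uuid:
    // PUT on the existing Sugar record if it is known, otherwise POST to create it.
    static String requestFor(String module, String customUuid, Map<String, String> lookup) {
        String sugarId = lookup.get(customUuid);
        return sugarId == null
                ? "POST /rest/v10/" + module
                : "PUT /rest/v10/" + module + "/" + sugarId;
    }

    public static void main(String[] args) {
        Map<String, String> lookup = buildLookup(List.of(
                Map.of("id", "example-sugar-uuid", "custom_uuid_c", "example-custom-uuid")));
        System.out.println(requestFor("Accounts", "example-custom-uuid", lookup));
        System.out.println(requestFor("Accounts", "unknown-custom-uuid", lookup));
    }
}
```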

Camel / MongoDB - $in operator with reference to another collection/document array

I came across this blog post in looking for a way to organize relationships. What I'm getting confused on is the syntax behind the following statement. I realize by virtue of the javascript variables, the following is possible..
var party = {
    _id: "chessparty",
    name: "Chess Party!",
    attendees: ["seanhess", "bob"]
}
var user = { _id: "seanhess", name: "Sean Hess", events: ["chessparty"]}
db.events.save(party)
db.users.save(user)
db.events.find({_id: {$in: user.events}}) // events for user
db.users.find({_id: {$in: party.attendees}}) // users for event
What is throwing me for a spin is the last two lines, since what I'm trying to do is something like this in Java. I understand the idea, but I want to accomplish it in Java, more specifically with the Camel MongoDB component.
I've been referencing the following documentation and looking at the "findAll" operation. So would I need to first run a query to get the array, for example the "user.events" and then run a second query to find the list of events? Or is there a way to reference the field "events" in collection "db.user" as part of the query on "db.events"?
Something to the tune of the following with a single query..
pseudo idea: db.events.find({_id: {$in: [db.user.events]}})
Ultimately I'm looking to translate this into something like the following..
from("direct:findAll")
    .setBody().constant("{ \"_id\": {$in :\"user.events\" }}")
    .to("mongodb:myDb?database=sample&collection=events&operation=findAll")
    .to("mock:resultFindAll");
I'm a bit new to the MongoDB Camel component, so I'm wondering if there are any gurus who have already done this sort of thing and have any advice on the subject. Or can I find out, without 2 days of trial and error, that this simply isn't possible?
Thanks!
I thought I'd wrap this question up; it has been some time now, and a few weeks ago I was able to work past this.
Basically I wound up storing an array of user ids in the events collection.
Example:
{
    _id : "22bjh2345j2k3v235",
    eventName : "something",
    eventDate : ISODate(...),
    attendees : [
        "abc123",
        "def098",
        "etc..."
    ]
}
This essentially assigns users to events. This way I could find all events a user was participating in, and I wound up with a list of users per event.
If I wanted to find all events for a user:
from("direct:findAll")
    .setBody().simple("{ \"attendees\": \"${header.userId}\" }")
    .to("mongodb:myDb?database=sample&collection=events&operation=findAll")
    .to("mock:resultFindAll");
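For anyone who keeps the original layout instead (event ids stored on the user document), the two-query idea from the question would need the second query's body to be built from the ids returned by the first. This is a minimal sketch of just that string-building step in plain Java. InFilterBuilder is a hypothetical helper, not part of Camel or the MongoDB driver, and it only escapes backslashes and double quotes, so it assumes simple id values.

```java
import java.util.List;
import java.util.stream.Collectors;

final class InFilterBuilder {

    // Wraps a value in double quotes, escaping backslashes and double quotes.
    // Assumption: ids contain no control characters, which would need more escaping.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds { "_id": { "$in": [ ... ] } } from the ids returned by a first query.
    static String idInFilter(List<String> ids) {
        String joined = ids.stream()
                .map(InFilterBuilder::quote)
                .collect(Collectors.joining(", "));
        return "{ \"_id\": { \"$in\": [" + joined + "] } }";
    }

    public static void main(String[] args) {
        // Ids as they might come back from the user's "events" array.
        System.out.println(idInFilter(List.of("chessparty", "gonight")));
        // prints { "_id": { "$in": ["chessparty", "gonight"] } }
    }
}
```

The resulting string could then be set as the message body in the same way as the constant body in the question's route.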

Elasticsearch multiple fields autosuggestion

I want to implement autosuggestion functionality using Elasticsearch. I can use nGram filters to match partial words on multiple fields, and that works as expected. The search returns the full document with multiple fields, as required. Now my problem is: how do I give autosuggestions to the user based on the matching field? E.g. I have 5 fields:
{userId:'rakesh',firstName:'Rakesh','lastName':'Goyal','mobileNo':'123-123-1234','alternativeMobileNo':'123-123-1235'}
{userId:'goyal',firstName:'Goyal','lastName':'Rakshit','mobileNo':'123-123-1236','alternativeMobileNo':'123-123-1237'}
In the above example, if the user types 123, I want to return 123-123-1234, 123-123-1235, 123-123-1236, 123-123-1237 (4 autosuggestions).
Similarly, if the user types Rak, I want to return Rakesh, Rakshit (2 autosuggestions).
How do I know that a match exists in the mobileNo and alternativeMobileNo fields for the first example, and return results accordingly?
How do I know that a match exists in the firstName and lastName fields for the second example, and return results accordingly?
How do I give autosuggestions to the user based on the matching field?
When the user types 123, store it in a Java variable, build a query like the one below with that variable inserted, and send the request to Elasticsearch.
{
    "query" : {
        "query_string" : {
            "query" : "*123*"
        }
    }
}
The above query checks both the mobileNo and alternativeMobileNo fields.
Similarly, if the user types Rak, the query will look like the previous one:
{
    "query" : {
        "query_string" : {
            "query" : "*Rak*"
        }
    }
}
And I think you want to use the highlighting API to answer your last question; it lets you highlight search results on one or more fields, which shows which field matched.
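Building the request body from the typed text could be sketched like this in Java. It is a sketch under assumptions: SuggestQueryBuilder is a hypothetical helper, the body is assembled by hand rather than with a client library, and characters that query_string treats as operators are backslash-escaped so that input such as 123-123 is matched literally. The < and > characters cannot be escaped in query_string, so the sketch drops them. Note that leading-wildcard queries can be slow on large indices.

```java
final class SuggestQueryBuilder {

    // Characters that query_string treats as operators; each gets a backslash prefix.
    private static final String RESERVED = "+-=&|!(){}[]^\"~*?:\\/";

    // Escapes query_string operators in user input and drops '<' and '>'.
    static String escapeQueryString(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c == '<' || c == '>') {
                continue; // cannot be escaped in query_string, so remove them
            }
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Escapes backslashes and double quotes so the pattern is a valid JSON string value.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Wraps the escaped input in wildcards and builds the request body.
    static String buildQuery(String userInput) {
        String pattern = "*" + escapeQueryString(userInput) + "*";
        return "{\"query\":{\"query_string\":{\"query\":\"" + jsonEscape(pattern) + "\"}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("Rak"));
        // prints {"query":{"query_string":{"query":"*Rak*"}}}
    }
}
```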
