Aggregate document in MongoDB from two CSV files - java

I am writing a Java program to insert two CSV files into a single MongoDB document containing a subdocument, but I do not know how to do it. Let me explain:
I have an SNP file containing the fields rsid, chr, has_sig and a LOCUS file containing the fields rsid, mrna_acc, gene, class, sap_id, where in the LOCUS file each rsid can correspond to several mrna_acc values, so there are several rows with the same rsid.
I would like a Mongo document like this:
{
    _id: ObjectId("7264958211f41a0c647c47b1"),
    rsid: rs530,
    chr: 21,
    has_sig: false,
    locus: [
        {
            mrna_acc: NM_00125,
            gene: ETS2,
            class: utr_variant
        },
        {
            mrna_acc: NM_00126,
            gene: ETS2,
            class: utr_variant
        },
        ...
    ]
}
I tried to read the two CSV files with a BufferedReader and insert them into the document like this:
Document d = new Document();
Document d1 = new Document();

FileSnp fs = new FileSnp("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/snp.csv");
fs.readFile();
long startTime = System.currentTimeMillis();
while (fs.line != null) {
    fs.line = fs.reader.readLine();
    if (fs.line != null && fs.line.length() > 0) {
        fs.obj = fs.line.split("\\s+");
        fs.readSingleObj();
        d.append("rsid", fs.rsid);
        d.append("chr", fs.chr);
        d.append("has_sig", fs.has_sig);
    }
}

FileLocus fl = new FileLocus("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/locus.csv");
fl.readFile();
while (fl.line != null) {
    fl.line = fl.reader.readLine();
    if (fl.line != null && fl.line.length() > 0) {
        fl.obj = fl.line.split("\\s+");
        fl.readSingleObj();
        d1.append("mrna_acc", fl.mrna_acc);
        d1.append("gene", fl.gene);
        d1.append("class", fl.classe);
    }
}

d.put("locus", d1);
list.add(d);
coll.insertMany(list);
But the result is the insertion of a single document with all the fields of both the snp file and the locus file.
Can you help me? I really do not know how to do it.
Thank you very much.

In your target document structure the locus attribute contains an array of subdocuments ...
locus: [
    {
        mrna_acc: NM_00125,
        gene: ETS2,
        class: utr_variant
    },
    {
        mrna_acc: NM_00126,
        gene: ETS2,
        class: utr_variant
    }
]
This suggests that the FileLocus reader should produce a Document instance for each line in locus.csv, and that each of these documents should be added to a collection in the outer document d, which is created by the FileSnp reader.
If so, then you should replace the FileLocus block with the following:
// this will contain the collection of documents, one for each line in `locus.csv`
List<Document> locusDocuments = new ArrayList<>();

FileLocus fl = new FileLocus("/Users/valentinafratini/Documents/Progetto Tesi/FactoryMethodDb/locus.csv");
fl.readFile();
while (fl.line != null) {
    fl.line = fl.reader.readLine();
    if (fl.line != null && fl.line.length() > 0) {
        fl.obj = fl.line.split("\\s+");
        fl.readSingleObj();

        // create and populate a sub document for the current line
        Document locusDocument = new Document();
        locusDocument.append("mrna_acc", fl.mrna_acc);
        locusDocument.append("gene", fl.gene);
        locusDocument.append("class", fl.classe);

        // assign the current sub document to the collection of locus documents
        locusDocuments.add(locusDocument);
    }
}

// add the collection of locus documents to the outer document
d.append("locus", locusDocuments);
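Note that the SNP loop has the same overwriting problem: it keeps appending to the one Document d, so every line replaces the previous values. Below is a minimal sketch (not tested) of one way to emit one document per SNP row and attach only the matching locus rows; it assumes FileLocus also exposes the rsid of each line as a hypothetical fl.rsid field, mirroring fs.rsid:

// requires java.util.Map, java.util.HashMap, java.util.List, java.util.ArrayList
Map<String, List<Document>> locusByRsid = new HashMap<>();
while (fl.line != null) {
    fl.line = fl.reader.readLine();
    if (fl.line != null && fl.line.length() > 0) {
        fl.obj = fl.line.split("\\s+");
        fl.readSingleObj();
        Document locusDocument = new Document()
                .append("mrna_acc", fl.mrna_acc)
                .append("gene", fl.gene)
                .append("class", fl.classe);
        // fl.rsid is an assumption: locus.csv has an rsid column, so the
        // reader is expected to expose it just like fs.rsid
        locusByRsid.computeIfAbsent(fl.rsid, k -> new ArrayList<>()).add(locusDocument);
    }
}
while (fs.line != null) {
    fs.line = fs.reader.readLine();
    if (fs.line != null && fs.line.length() > 0) {
        fs.obj = fs.line.split("\\s+");
        fs.readSingleObj();
        // a fresh Document per SNP line instead of reusing d
        Document snpDocument = new Document()
                .append("rsid", fs.rsid)
                .append("chr", fs.chr)
                .append("has_sig", fs.has_sig)
                .append("locus", locusByRsid.getOrDefault(fs.rsid, new ArrayList<>()));
        list.add(snpDocument);
    }
}
coll.insertMany(list);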

Related

How to create full text search query in mongodb with spring-data?

I have a spring-data-mongodb application in Java or Kotlin, and I need to create a text search request to MongoDB through the Spring template.
In the mongo shell it looks like this:
db.stores.find(
    { $text: { $search: "java coffee shop" } },
    { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
I already tried to do something, but it is not exactly what I need:
override fun getSearchedFiles(searchQuery: String, pageNumber: Long, pageSize: Long, direction: Sort.Direction, sortColumn: String): MutableList<SystemFile> {
    val matching = TextCriteria.forDefaultLanguage().matching(searchQuery)
    val match = MatchOperation(matching)
    val sort = SortOperation(Sort(direction, sortColumn))
    val skip = SkipOperation((pageNumber * pageSize))
    val limit = LimitOperation(pageSize)
    val aggregation = Aggregation
            .newAggregation(match, skip, limit)
            .withOptions(Aggregation.newAggregationOptions().allowDiskUse(true).build())
    val mappedResults = template.aggregate(aggregation, "files", SystemFile::class.java).mappedResults
    return mappedResults
}
Maybe someone has already worked with text search on MongoDB in Java; please share your knowledge with us.
Set up text indexes
First you need to set up text indexes on the fields on which you want to perform your text query.
If you are using Spring Data Mongo to insert your documents into your database, you can use the @TextIndexed annotation and the indexes will be built while your documents are inserted.
@Document
class MyObject {
    @TextIndexed(weight = 3) String title;
    @TextIndexed String description;
}
If your documents are already inserted in your database, you need to build the text indexes manually:
TextIndexDefinition textIndex = new TextIndexDefinitionBuilder()
        .onField("title", 3F)
        .onField("description")
        .build();
Once your mongoTemplate is built and configured, you can create the text indexes:
template.indexOps(MyObject.class).ensureIndex(textIndex);
Building your text query
List<MyObject> getSearchedFiles(String searchQuery) {
    TextQuery textQuery = TextQuery.queryText(TextCriteria.forDefaultLanguage().matchingAny(searchQuery)).sortByScore();
    List<MyObject> result = mongoTemplate.find(textQuery, MyObject.class, "myCollection");
    return result;
}
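If you also need the paging from your original snippet, note that TextQuery is just a Query, so you can attach a Pageable to it instead of building an aggregation. A sketch in Java, assuming Spring Data 2.x (on older versions use new PageRequest(...) instead of PageRequest.of):

TextQuery textQuery = TextQuery.queryText(TextCriteria.forDefaultLanguage().matching(searchQuery))
        .sortByScore();
// PageRequest applies the skip and the limit in one go
textQuery.with(PageRequest.of(pageNumber, pageSize));
List<SystemFile> page = mongoTemplate.find(textQuery, SystemFile.class, "files");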

Nested loop in BIRT report

I'm using Eclipse BIRT to generate a report from a JSON file.
My JSON file looks like this:
{
"cells":[
{
"type":"basic.Sensor",
"custom":{
"identifier":[
{
"name":"Name1",
"URI":"Value1"
},
{
"name":"Name4",
"URI":"Value4"
}
],
"classifier":[
{
"name":"Name2",
"URI":"Value2"
}
],
"output":[
{
"name":"Name3",
"URI":"Value3"
}
],
"image":{
"width":50,
"height":50,
"xlink:href":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIAAAACACAYAAADDPmHLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAABEJAAARCQBQGfEVAAAABl0RVh0U29mdHdhcmUAd3Vi8f+k/EREURQtsda2Or/+nFLqP6T5Ecdi0aJFL85msz2Qxyf4JIumMAx/ClmWt23GmL1kO54CXANAVH+WiN4Sx7EoNVkU3Z41BDHMeXAxjvOxNr7RJjzHX7S/jAflwBxkJr/RwiOpWZ883Nzd+Wpld7tkBr/SJr7ZHZbHZeuVweSnPfniocMAWYwcGBafH0OoPamFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY4ZBZjmmFGAaY7/B94QnX08zxKLAAAAAElFTkSuQmCC"
}
}
},
{
"type":"basic.Sensor",
"custom":{
"identifier":[
{
"name":"Name1",
"URI":"Value1"
},
{
"name":"Name4",
"URI":"Value4"
}
],
"classifier":[
{
"name":"Name2",
"URI":"Value2"
}
],
"output":[
{
"name":"Name3",
"URI":"Value3"
}
],
"image":{
"width":50,
"height":50,
"xlink:href":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAYAAAD0eNT6AAAACXBIWXMAAAsTAAALEwEAmpwYAAAKT2lDQ1BQaG90b3Nob3AgSUNDIHByb2ZpbGUAAHjanVNnVFPpFj333vRCS4iAlEtvUhUIIFJCi4AUkSYqIQkQSoghodkVUcERRUUEG8igiAOOjoCMFVEsDIoK2AfkIaKOg6OIisr74Xuja9igAQAAAgAIoAEAAAIACKABAAACAAigAQAAAgAIoAEAAAIACKABAAACAAigAQAAAgAIoAEAAAIADqhvprADeSsau00l5NAAAAAElFTkSuQmCC"
}
}
},
{
"type":"basic.Platform",
"custom":{
"identifier":[
{
"name":"Name1",
"URI":"Value1"
}
],
"classifier":[
{
"name":"Name2",
"URI":"Value2"
}
],
"output":[
{
"name":"Name3",
"URI":"Value3"
}
],
"image":{
"width":50,
"height":50,
"xlink:href":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAQAAAA6TH0jqtg6U8EsCdnm3SpevSK7Pb85xABEMBuLAn2hxjRve7SFzYEaB/HhytLQ4ABRwCWBPvBKnRk6U8EkBeOD9f7iwAGHAGEYEmwDxLvzNKfCCDP8NGLQd3lY7D0JwI4kmlwfHhX6dTSXxsRAAHsWR7aUjc7uM5Wg=="
}
}
}
]
}
I have 3 cells and each one contains 1 image, 1 name, 1 type and 3 tables. This is what I have done so far.
What I'm struggling with is the nested loop: for each object (cell) in my JSON I want a numbered paragraph like this:
2.x Component cell's Name :
Image
Output Table
Identifier Table
Classifier Table
So to do this I need to iterate over each cell and then iterate over each of the tables (output, identifier and classifier), and I have no idea how to do this nested loop: something like a list representing the cells, where each cell contains 3 tables, one image and one name.
Edit:
This is the open method for the data set:
// Grab the JSON file and place it in a string
fisTargetFile = new FileInputStream(new File("C:/Users/Sample Reports/moe.json"));
input = IOUtils.toString(fisTargetFile, "UTF-8");
// Store the contents in a variable
jsonData = input;
// Convert the String to a JSON object
myJSONObject = eval( '(' + jsonData + ' )' );
// Get the length of the object
len = myJSONObject.cells.length;
// Counter
count = 0;
And the fetch method:
if (count < len) {
    var name = myJSONObject.cells[count].attrs.text["text"];
    var type = myJSONObject.cells[count].type;
    var icon = myJSONObject.cells[count].attrs.image["xlink:href"];
    icon = icon.split(",");
    icon = icon[1];
    imageDataBytes = icon;
    row["name"] = name;
    row["type"] = type;
    row["icon"] = Base64ToBlob.toBytes(icon);
    Logger.getAnonymousLogger().info(row["icon"]);
    count++;
    return true;
}
return false;
You want to use nested tables; there is a good tutorial showing how to link nested tables to an outer table. Please watch that demo carefully first; in particular, see how the sub-table is linked to the outer table through a dataset parameter.
Of course your case is more challenging because you need to do this with scripted datasets and multiple sub-tables. I have already done something similar; you have to create one scripted dataset for each sub-table. The key points are:
In the "parameters" section of each sub-dataset, create one input parameter and name it, for instance, "systemID".
Create your sub-tables by dragging and dropping each dataset within the outer table.
In the "bindings" section of each sub-table, link the parameter "systemID" to the ID field of the outer table.
In the "open" event of the sub-datasets, access the value of the parameter with the expression inputParams["systemID"]. Thus you can filter the related rows in "myJSONObject" (see the sketch below).
It is important to make sure "myJSONObject" is initialized once and for all, otherwise performance could dramatically decrease if it is evaluated on each iteration. For example, evaluate it in the "initialize" event of the report.
That's it. It won't be easy, but these elements should help you achieve this report.
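For illustration, here is a rough sketch (untested) of the "open" and "fetch" scripts of one sub-dataset, say the identifier table. It assumes each cell carries some unique id field that is bound to the "systemID" parameter; adapt the property names to your real JSON:

// "open" event of the identifier sub-dataset;
// myJSONObject is assumed to be initialized in the report's "initialize" event
subRows = [];
for (var i = 0; i < myJSONObject.cells.length; i++) {
    var cell = myJSONObject.cells[i];
    // "id" is a hypothetical unique field used to match the outer table row
    if (cell.id == inputParams["systemID"]) {
        subRows = cell.custom.identifier;
    }
}
subCount = 0;

// "fetch" event of the identifier sub-dataset
if (subCount < subRows.length) {
    row["name"] = subRows[subCount].name;
    row["URI"] = subRows[subCount].URI;
    subCount++;
    return true;
}
return false;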

How to create a table in a Word doc using docx4j at a specific bookmark without overwriting the doc

I need to create a table at the location of a particular bookmark, i.e. I need to find the bookmark and insert the table. How can I do this using docx4j?
Thanks in advance.
Sorry Jason, I am new to Stack Overflow so I couldn't state my problem clearly. Here is my situation and problem.
I made changes to that code as you suggested and to fit my needs; the code is here:
// loop through the bookmarks
for (CTBookmark bm : rt.getStarts()) {
    // do we have data for this one?
    String bmname = bm.getName();
    // find the right bookmark (in this case I have only one bookmark, so just check it is not null)
    if (bmname != null) {
        String value = "some text for testing run";
        //if (value==null) continue;
        List<Object> theList = null;
        // get the content list of the bookmark's parent
        theList = ((ContentAccessor) (bm.getParent())).getContent();
        // I set the range as 1 (I assume this start range says where to start inserting the table)
        int rangeStart = 1;
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
        // create the table
        Tbl table = factory.createTbl();
        // add borders to the table
        addBorders(table);
        for (int rows = 0; rows < 1; rows++) {
            // create a row
            Tr row = factory.createTr();
            for (int colm = 0; colm < 1; colm++) {
                // create a cell
                Tc cell = factory.createTc();
                // add the content to the cell
                cell.getContent().add(wordPackage.getMainDocumentPart()
                        .createParagraphOfText("cell" + colm));
                // add the cell to the row
                row.getContent().add(cell);
            }
            // add the row to the table
            table.getContent().add(row);
            // now add a run (to test whether a run is working or not)
            org.docx4j.wml.R run = factory.createR();
            org.docx4j.wml.Text t = factory.createText();
            run.getContent().add(t);
            t.setValue(value);
            // add the table to the list
            theList.add(rangeStart, table);
            // add the run to the list
            //theList.add(rangeStart, run);
        }
    }
}
I don't need to delete the text in the bookmark, so I removed that part.
I don't know what the problem is: the program compiles, but I cannot open the Word doc; it says "unknown error". When I tested writing the string "value", it was written into the bookmark perfectly and the document opened, but not in the case of the table. Please help me.
Thanks in advance.
You can adapt the sample code BookmarksReplaceWithText.java
In your case:
line 89: the parent won't be p, it'll be body or tc. You could remove the test.
line 128: instead of adding a run, you want to insert a table
You can use TblFactory to create your table, or the docx4j webapp to generate code from a sample docx.
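As a minimal sketch of the TblFactory route (class and package names as of docx4j 3.x, so verify them against your version): a Tbl is only valid at body or table-cell level, and inserting one inside a paragraph or run is the kind of thing that makes Word report an "unknown error", which matches the symptom described above.

import java.io.File;
import java.util.List;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.TblFactory;

public class TableAtBodyLevel {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
        // 2 rows x 3 columns, each cell 3000 twips wide
        Tbl table = TblFactory.createTable(2, 3, 3000);
        // add the table at body level, as a sibling of paragraphs, never inside a P
        List<Object> bodyContent = wordPackage.getMainDocumentPart().getContent();
        bodyContent.add(table);
        wordPackage.save(new File("table-demo.docx"));
    }
}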
For some reason bookmark replacement with a table didn't work out for me, so I relied on text replacement with a table. I created my tables from HTML using the XHTML importer for my use case:
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String xhtml = <your table HTML>;
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);

int ct = 0;
List<Integer> tableIndexes = new ArrayList<>();
List<Object> documentContents = documentPart.getContent();
for (Object o : documentContents) {
    if (o.toString().contains("PlaceholderForTable1")) {
        tableIndexes.add(ct);
    }
    ct++;
}

for (Integer i : tableIndexes) {
    documentPart.getContent().remove(i.intValue());
    documentPart.getContent().addAll(i.intValue(), XHTMLImporter.convert(xhtml, null));
}
In my input word doc, I defined text 'PlaceholderForTable1' where I want to insert my table.

Getting the metadata of columns from SOQL Query

I have a simple SOQL query in Java for extracting a Salesforce standard object, as follows:
String soqlQuery = "SELECT FirstName, LastName FROM Contact";
QueryResult qr = connection.query(soqlQuery);
I want to get the datatype of the object fields.
I have written a small function below that will provide the list of Phone fields and their labels present in a custom or standard object of your Salesforce org. I hope this might help you in writing the business logic for your code.
public list<String> getFieldsForSelectedObject(){
    selectedPhoneNumber = ''; // to reset home number field
    list<String> fieldsName = new list<String>();
    selectedObject = 'Object Name'; // the object name for which we want to get the field types
    schemaMap = Schema.getGlobalDescribe(); // populating the schema map
    try{
        if(selectedObject != null && selectedObject != '' && selectedObject != '--Select Object--'){
            Map<String, Schema.SObjectField> fieldMap = schemaMap.get(selectedObject).getDescribe().fields.getMap();
            for(Schema.SObjectField sfield : fieldMap.Values()){
                schema.describefieldresult dfield = sfield.getDescribe();
                schema.Displaytype disfield = dfield.getType();
                system.debug('#######' + dfield);
                // find all the PHONE type fields in the object (both custom and standard)
                if(dfield.getType() == Schema.displayType.Phone){
                    fieldsName.add('Name:' + dfield.getName() + ' Label:' + dfield.getLabel());
                }
            }
        }
    }catch(Exception ex){
        apexPages.addMessage(new ApexPages.message(ApexPages.severity.ERROR,'There is no Phone or Fax Field Exist for selected Object!'));
    }
    return fieldsName;
}
Sample output (a list of strings):
Name: Home_Phone__c Label: Home Phone
Name: Office_Phone__c Label: Office Phone
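Since the question asks for Java rather than Apex: the partner API offers an equivalent describe call, so you can read each field's type directly. A sketch reusing the connection from the question:

import com.sforce.soap.partner.DescribeSObjectResult;
import com.sforce.soap.partner.Field;

// describe the same object the SOQL query targets
DescribeSObjectResult dsr = connection.describeSObject("Contact");
for (Field field : dsr.getFields()) {
    // getType() returns a FieldType enum value describing the field's datatype
    System.out.println(field.getName() + " -> " + field.getType());
}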
Say that we have the SOQL below.
select FirstName, LastName from Contact limit 2
The query result in the QueryResult object looks like the following:
{
[2]XmlObject
{
name={urn:partner.soap.sforce.com}records, value=null, children=
[
XmlObject{name={urn:sobject.partner.soap.sforce.com}type, value=Contact, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}Id, value=null, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}FirstName, value=Bill, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}LastName, value=Gates, children=[]}
]
},
XmlObject
{
name={urn:partner.soap.sforce.com}records, value=null, children=
[
XmlObject{name={urn:sobject.partner.soap.sforce.com}type, value=Contact, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}Id, value=null, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}FirstName, value=Alan, children=[]},
XmlObject{name={urn:sobject.partner.soap.sforce.com}LastName, value=Donald, children=[]}
]
},
}
In order to parse the QueryResult and extract the column names, I have implemented the method below, which returns the column names as a comma-separated String. The logic is explained in the comments.
public String getColumnNames(QueryResult soqlResponse)
{
    String columns = "";
    try
    {
        // we loop in order to pick the 1st record from the QueryResult
        for (SObject record : soqlResponse.getRecords())
        {
            Iterator<XmlObject> xmlList = record.getChildren();
            int counterXml = 0;
            while (xmlList.hasNext())
            {
                XmlObject xObj = xmlList.next();
                // since the first 2 nodes contain the type and Id metadata, we start from the 3rd node only
                if (counterXml > 1)
                {
                    columns += xObj.getName().getLocalPart() + ",";
                }
                counterXml++;
            }
            // since we can get the column names from the 1st record, we break the loop once it has been read
            break;
        }
        // remove the trailing comma from the string
        columns = columns.substring(0, columns.length() - 1);
    }
    catch (Exception ex)
    {
        // ignore and return whatever has been collected so far
    }
    return columns;
}
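For example, a quick usage sketch with the query from the question:

QueryResult qr = connection.query("select FirstName, LastName from Contact limit 2");
String columns = getColumnNames(qr);
System.out.println(columns); // prints: FirstName,LastName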

Why does Lucene not return results based on whole word match?

I am using Lucene to match keywords against a list of words within an application. The whole process is automated, without any human intervention. The best-matched result (the one on top, with the highest score) is picked from the results list returned by Lucene.
The following code demonstrates this functionality; the results are printed on the console.
Problem:
The problem is that Lucene searches for the keyword and returns a word that partially matches it, while the fully matched result also exists but is not ranked in the first position.
For example, suppose I have a Lucene RAM index that contains the words 'Test' and 'Test Engineer'. If I search the index for 'AB4_Test Eng_AA0XY11', then the results would be:
Test
Test Engineer
'Eng' in 'AB4_Test Eng_AA0XY11' matches Engineer (that is why it is listed in the results), but it does not get the top position. I want to optimize my solution to bring 'Test Engineer' to the top, because it is the best match when the whole keyword is considered. Can anyone help me solve this problem?
public class LuceneTest {

    private static void search(Set<String> keywords) {
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        try {
            // 1. create the index
            Directory luceneIndex = buildLuceneIndex(analyzer);
            int hitsPerPage = 5;
            IndexReader reader = IndexReader.open(luceneIndex);
            for (String keyword : keywords) {
                // 2. create the query string: replace underscore, hyphen, comma, ( , ), {, }, . with a plus sign
                StringBuilder querystr = new StringBuilder(128);
                String[] splitName = keyword.split("[\\-_,/(){}:. ]");
                // after tokenizing, also add a plus sign between the tokens
                for (String token : splitName) {
                    querystr.append(token + "+");
                }
                // 3. search
                IndexSearcher searcher = new IndexSearcher(reader);
                TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
                Query q = new QueryParser(Version.LUCENE_36, "name", analyzer).parse(querystr.toString());
                searcher.search(q, collector);
                ScoreDoc[] hits = collector.topDocs().scoreDocs;
                System.out.println();
                System.out.println(keyword);
                System.out.println("----------------------");
                for (ScoreDoc scoreDoc : hits) {
                    Document d = searcher.doc(scoreDoc.doc);
                    System.out.println("Found " + d.get("id") + " : " + d.get("name"));
                }
                // the searcher can be closed once the results have been read
                searcher.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    /**
     * Builds an in-memory index containing the sample data.
     */
    private static Directory buildLuceneIndex(Analyzer analyzer) throws CorruptIndexException, LockObtainFailedException, IOException {
        Map<Integer, String> map = new HashMap<Integer, String>();
        map.put(1, "Test Engineer");
        map.put(2, "Test");
        Directory index = new RAMDirectory();
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        IndexWriter w = new IndexWriter(index, config);
        for (Map.Entry<Integer, String> entry : map.entrySet()) {
            try {
                Document doc = new Document();
                doc.add(new Field("id", entry.getKey().toString(), Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("name", entry.getValue(), Field.Store.YES, Field.Index.ANALYZED));
                w.addDocument(doc);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        w.close();
        return index;
    }
    public static void main(String[] args) {
        Set<String> list = new TreeSet<String>();
        list.add("AB4_Test Eng_AA0XY11");
        list.add("AB4_Test Engineer_AA0XY11");
        search(list);
    }
}
You can have a look at the Lucene query syntax rules to see how you can enforce the search for Test Engineer.
Basically, using a query such as
AB4_Test AND Eng_AA0XY11
could work, though I am not sure of it. The page linked above is quite concise and you will quickly be able to find a query that fulfills your needs.
If these two results (test, test engineer) have the same ranking score, then you will see them in the order they come up.
You should try using the length filter and also boosting the terms; maybe then you can come up with a solution. A sketch follows.
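For instance, here is a sketch (untested, Lucene 3.6 API) that boosts an exact phrase match above the individual terms, so a document matching the whole phrase ranks first. The phrase terms are lower-cased because StandardAnalyzer lower-cases tokens at index time:

// parse the individual terms as before
Query terms = new QueryParser(Version.LUCENE_36, "name", analyzer).parse(querystr.toString());
// a phrase query over the analyzed (lower-cased) tokens, boosted above the term query
PhraseQuery phrase = new PhraseQuery();
phrase.add(new Term("name", "test"));
phrase.add(new Term("name", "engineer"));
phrase.setBoost(5.0f);
// documents matching either clause are returned; whole-phrase matches score higher
BooleanQuery combined = new BooleanQuery();
combined.add(terms, BooleanClause.Occur.SHOULD);
combined.add(phrase, BooleanClause.Occur.SHOULD);
searcher.search(combined, collector);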
See also:
what is the best lucene setup for ranking exact matches as the highest
