How to create nested object and array in parquet file? - java

How do I create a parquet file with nested fields? I have the following:
public static void main(String[] args) throws IOException {
    int fileNum = 10; //num of files constructed
    int fileRecordNum = 50; //record num of each file
    int rowKey = 0;
    for (int i = 0; i < fileNum; ++i) {
        Map<String, String> metas = new HashMap<>();
        metas.put(HConstants.START_KEY, genRowKey("%10d", rowKey + 1));
        metas.put(HConstants.END_KEY, genRowKey("%10d", rowKey + fileRecordNum));
        ParquetWriter<Group> writer = initWriter("pfile/scanner_test_file" + i, metas);
        for (int j = 0; j < fileRecordNum; ++j) {
            rowKey++;
            Group group = sfg.newGroup().append("rowkey", genRowKey("%10d", rowKey))
                    .append("cf:name", "wangxiaoyi" + rowKey)
                    .append("cf:age", String.format("%10d", rowKey))
                    .append("cf:job", "student")
                    .append("timestamp", System.currentTimeMillis());
            writer.write(group);
        }
        writer.close();
    }
}
I want to create two fields:
Hobbies which contains a list of hobbies ("Swimming", "Kickboxing")
A teacher object that contains subfields like:
{
'teachername': 'Rachel',
'teacherage':50
}
Can someone provide an example of how to do this in Java?

Parquet is a columnar storage format: each leaf field is stored as its own column. The format itself can represent nested data, because a schema may contain group fields (for an object such as the teacher) and repeated fields (for a list such as the hobbies). It also supports logical types, where a value is stored in a primitive or binary column (a byte array plus metadata that tells readers which conversion to apply).
If you go the conversion route, I'm not sure how your converter should be implemented, but basically you would work with the Binary class as the data container and write a converter for it. The converter for the String data type is a sample you can start from.

Related

Apache Ignite SqlFieldQuery on top of cache storing BinaryObject

I cannot seem to get this to work, and have scoured the net for documentation or examples to no avail
Goal
To run a simple aggregation query on an Ignite Cache backed by BinaryObject values with UUID as the key
Put Operation Code
IgniteBinary binary = ignite.binary();
IgniteCache<UUID, BinaryObject> rowCache = ignite.getOrCreateCache(CACHE_NAME).withKeepBinary();
// put
final int NUM_ROW = 100000;
final int NUM_COL = 100;
for (int i = 0; i < NUM_ROW; i++) {
    BinaryObjectBuilder builder = binary.builder(ROW);
    for (int j = 0; j < NUM_COL; j++) {
        builder.setField("col" + j, Math.random(), Double.class);
    }
    BinaryObject obj = builder.build();
    rowCache.put(UUID.randomUUID(), obj);
}
Read Operation Code
IgniteCache<UUID, BinaryObject> cache = ignite.cache(CACHE_NAME).withKeepBinary();
final SqlFieldsQuery sqlFieldsQuery = new SqlFieldsQuery("SELECT COUNT(col1)" + cache.getName());
FieldsQueryCursor<List<?>> result = cache.query(sqlFieldsQuery);
Error
org.h2.jdbc.JdbcSQLException: Column "COL1" not found; SQL statement
EDIT
I've since added a QueryEntity to the cache configuration to make the problem disappear
final QueryEntity queryEntity = new QueryEntity();
queryEntity.setTableName(CACHE_NAME);
queryEntity.setKeyFieldName("key");
queryEntity.setKeyType(String.class.getName());
queryEntity.setValueType(Row.class.getName());
LinkedHashMap<String, String> fields = new LinkedHashMap<>();
fields.put("key", String.class.getName());
for (int i = 0; i < 55; i++) {
    fields.put("col" + i, Double.class.getName());
}
queryEntity.setFields(fields);
return queryEntity;
However, it is unclear to me what QueryEntity's setValueType and setValueFieldName do. My value type is an arbitrary binary object with arbitrary keys and values.
I would like to declare these via fields.put(<colName>, <colType>); ...
I am able to get everything to work using POJOs, but not BinaryObject as the value type
Is there anything I am doing wrong?
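new SqlFieldsQuery("SELECT COUNT(col1)" + cache.getName())
This concatenation contains no FROM keyword, so the cache name is glued directly onto COUNT(col1). Beyond that, the cache name is the schema name and, by default, the value class name (Row) is the table name, so it looks like the query uses an incorrect table name.
Also make sure that ROW in binary.builder(ROW) is equal to QueryEntity.valueType.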

writing data into .dat file in 3 columns using java

I want to write data generated in loops in my Java code into a .dat file in three columns, so that I can use this .dat file to draw plots in MATLAB or gnuplot. Could you help me, please?
public static initialpop evolalgorithm(initialpop Population, File results) throws IOException {
    initialpop newPopulation = new initialpop(mu+landa,false);
    double AverageFit = Population.chooseBests(mu);
    double LeastParentsFitness= Population.getroute(mu-1).fitness();
    int Alived=mu;
    for(int i=0;i<Alived;i++)
    {
        newPopulation.saveroute(i, Population.getroute(i));
    }
    for (int i = Alived; i < newPopulation.populationSize(); i++) {
        route child = inheritance(Population);
        newPopulation.saveroute(i, child);
    }
    for (int i = Alived; i < newPopulation.populationSize(); i++) {
        mutate(newPopulation.getroute(i));
    }
    double bestIndividualFitness=newPopulation.getFittest().fitness();
    return newPopulation;
}
I want a column of AverageFit values, a column of LeastParentsFitness values and another column for bestIndividualFitness in my data file. These values should be appended to the file each time the function evolalgorithm is called.
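One possible way to do this, as a minimal sketch rather than a definitive answer: open the file in append mode on every call and write one tab-separated row. The class name ResultsLog and the method appendRow are hypothetical names introduced for illustration; inside evolalgorithm the call would presumably be appendRow(results.toPath(), AverageFit, LeastParentsFitness, bestIndividualFitness). Locale.ROOT is used so the decimal separator is always a period, which is what gnuplot and MATLAB expect.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.Locale;

// Hypothetical helper, not part of the original code.
final class ResultsLog {
    /** Appends one row of three tab-separated columns, creating the file if needed. */
    static void appendRow(Path file, double averageFit, double leastParentsFitness,
                          double bestIndividualFitness) throws IOException {
        String row = String.format(Locale.ROOT, "%.6f\t%.6f\t%.6f",
                averageFit, leastParentsFitness, bestIndividualFitness);
        // CREATE + APPEND: earlier rows are kept, the new row goes to the end.
        Files.write(file, Collections.singletonList(row), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("results", ".dat");
        appendRow(file, 12.5, 10.0, 9.75);   // one call per generation
        appendRow(file, 11.0, 9.5, 9.25);
        Files.readAllLines(file).forEach(System.out::println);
    }
}
```

Each call adds exactly one line, so the file grows by one row per generation and can be plotted column by column.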

Data Analysis and Processing of Variables and their Concreteness

I have a variable_text file with 100K variables (some unique, some not) and an excel file with (unique) variables in one column and their respective concreteness values in another column. I already wrote some code to read the variables from the text file and search their concreteness value from the excel file and spit out the results in another result_text file.
My problem is that I need an appropriate data structure to store the variables and their concreteness, and to count the frequency of repeated variables from the variable_text file. I've looked at Hashtable and HashMap but don't know whether I should choose one of these or whether there's another viable option.
This data structure must represent a sort of table or map:
Variable | Frequency | Concreteness
Here is some code that can help you with that. I recommend using a HashMap like this:
public static void main(String[] args)
{
    Map<String, Data> map = new HashMap<String, Data>();
    String[] variables = {"variable1", "variable2", "variable3", "variable4", "variable4", "variable1", "variable1"};
    int Concreteness = 5; // for this example every variable has the same concreteness
    for (int i = 0; i < variables.length; i++)
    {
        // look up the variable; increment its count if it was seen before
        Data variable_exists = map.get(variables[i]);
        if (variable_exists != null)
            variable_exists.setFrecuency(variable_exists.getFrecuency() + 1);
        else
            map.put(variables[i], new Data(Concreteness, 1));
    }
    for (Map.Entry<String, Data> entry : map.entrySet())
    {
        System.out.println("variable = " + entry.getKey() + ", Frecuency = " + entry.getValue().getFrecuency() + ", Concreteness = " + entry.getValue().getConcreteness());
    }
}
the output for this example would be
variable = variable4, Frecuency = 2, Concreteness = 5
variable = variable1, Frecuency = 3, Concreteness = 5
variable = variable2, Frecuency = 1, Concreteness = 5
variable = variable3, Frecuency = 1, Concreteness = 5
and here is the Data class i used
public class Data
{
    private int frecuency;
    private int Concreteness;

    Data(int Concreteness, int frecuency)
    {
        setFrecuency(frecuency);
        setConcreteness(Concreteness);
    }

    public int getFrecuency()
    {
        return frecuency;
    }

    public void setFrecuency(int frecuency)
    {
        this.frecuency = frecuency;
    }

    public int getConcreteness()
    {
        return Concreteness;
    }

    public void setConcreteness(int Concreteness)
    {
        this.Concreteness = Concreteness;
    }
}
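If the concreteness values already live in their own lookup (as with the Excel file in the question), the counting step can be kept separate from them. The following is only a sketch under that assumption: the class VariableTable, the method countFrequencies and the concreteness numbers are all made up for illustration. It uses Map.merge, which inserts 1 for a new key and otherwise adds 1 to the existing count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the answer above.
final class VariableTable {
    /** Counts how often each variable occurs, keeping first-seen order. */
    static Map<String, Integer> countFrequencies(List<String> variables) {
        Map<String, Integer> frequency = new LinkedHashMap<>();
        for (String variable : variables) {
            frequency.merge(variable, 1, Integer::sum);
        }
        return frequency;
    }

    public static void main(String[] args) {
        List<String> variables = Arrays.asList("variable1", "variable2", "variable1");
        // Invented concreteness values standing in for the Excel lookup.
        Map<String, Double> concreteness = new HashMap<>();
        concreteness.put("variable1", 4.2);
        concreteness.put("variable2", 1.7);
        // One row per unique variable: Variable | Frequency | Concreteness
        for (Map.Entry<String, Integer> e : countFrequencies(variables).entrySet()) {
            System.out.println(e.getKey() + " | " + e.getValue() + " | " + concreteness.get(e.getKey()));
        }
    }
}
```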

reading all attributes of xml elements at once

I have xml file as below
<dashboard DASHBOARD_ID="1" DASHBOARD_IMAGE="" DASHBOARD_NAME="TestDashboard">
<linkedpages>
<pages page_id=1212 pagename=""/>
<report reportid=212 reportname=""/>
</linkedpages>
I need to import these tag attribute values into their respective tables, say a page table, a report table, a dashboard table and so on.
I currently get the elements and their attributes with
String attribute = child.getAttribute("report_id");
but I would need to write n such lines, and that is not generic, since each tag can have a variable number of attributes.
So I need to be able to read all attributes of each tag.
How can this be done? Any idea is appreciated.
Thank you
Try this method : getAttributes()
And an example:
List<String> attributNames = new ArrayList<String>();
if(child.getAttributes() != null){
    for (int i = 0; i < child.getAttributes().getLength(); i++) {
        attributNames.add(child.getAttributes().item(i).getNodeName());
    }
}
String[] attributes = new String[child.getAttributes().getLength()];
for (int i = 0; i < attributes.length; i++) {
    attributes[i] = child.getAttributes().item(i).getNodeValue();
}
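The same idea can be wrapped in a small generic helper that returns all attributes of an element as a name-to-value map, so no attribute name has to be hard-coded. This is a sketch, not the only way to do it: the class AttributeReader and its methods are hypothetical names, and the XML in main is a well-formed variant of the snippet from the question (attribute values quoted, root tag closed). The order in which a DOM parser returns attributes is not guaranteed, so the code does not rely on it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the answer above.
final class AttributeReader {
    /** Returns every attribute of the element as name -> value. */
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            result.put(attr.getNodeName(), attr.getNodeValue());
        }
        return result;
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<dashboard DASHBOARD_ID=\"1\" DASHBOARD_IMAGE=\"\" DASHBOARD_NAME=\"TestDashboard\">"
                + "<linkedpages><pages page_id=\"1212\" pagename=\"\"/>"
                + "<report reportid=\"212\" reportname=\"\"/></linkedpages></dashboard>";
        Element root = parseRoot(xml);
        System.out.println(root.getTagName() + " -> " + attributesOf(root));
        NodeList all = root.getElementsByTagName("*"); // every descendant element
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            System.out.println(e.getTagName() + " -> " + attributesOf(e));
        }
    }
}
```

The tag name can then decide which table a map is written to, and the map keys can serve as column names.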

incompatible type of double array and properties string.split()

public static void main(String[] args)
{
    String input="jack=susan,kathy,bryan;david=stephen,jack;murphy=bruce,simon,mary";
    String[][] family = new String[50][50];
    //assign family and children to data by ;
    StringTokenizer p = new StringTokenizer (input,";");
    int no_of_family = input.replaceAll("[^;]","").length();
    no_of_family++;
    System.out.println("family= "+no_of_family);
    String[] data = new String[no_of_family];
    int i=0;
    while(p.hasMoreTokens())
    {
        data[i] = p.nextToken();
        i++;
    }
    for (int j=0;j<no_of_family;j++)
    {
        family[j][0] = data[j].split("=")[0];
        //assign child to data by commas
        StringTokenizer v = new StringTokenizer (data[j],",");
        int no_of_child = data[j].replaceAll("[^,]","").length();
        no_of_child++;
        System.out.println("data from input = "+data[j]);
        for (int k=1;k<=no_of_child;k++)
        {
            family[j][k]= data[j].split("=")[1].split(",");
            System.out.println(family[j][k]);
        }
    }
}
I have a list of families in the input string. I separate it into individual families and want to store the result in a two-dimensional array family[i][j].
My goal is:
family[0][0]=1st father's name
family[0][1]=1st child name
family[0][2]=2nd child name and so on...
family[0][0]=jack
family[0][1]=susan
family[0][2]=kathy
family[0][3]=bryan
family[1][0]=david
family[1][1]=stephen
family[1][2]=jack
family[2][0]=murphy
family[2][1]=bruce
family[2][2]=simon
family[2][3]=mary
but I got the error in the title: incompatible types
found:java.lang.String[]
required:java.lang.String
family[j][k]= data[j].split("=")[1].split(",");
What can I do? I need help.
Does anyone know how to use StringTokenizer for this input?
Trying to understand why you can't just use split for your nested operation as well.
For example, something like this should work just fine
for (int j=0;j<no_of_family;j++)
{
    String[] familySplit = data[j].split("=");
    family[j][0] = familySplit[0];
    String[] childrenSplit = familySplit[1].split(",");
    for (int k=0;k<childrenSplit.length;k++)
    {
        family[j][k+1]= childrenSplit[k];
    }
}
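Put together as a self-contained program, the loop above could look like the following sketch. The class FamilyParser and the method parse are hypothetical names; instead of a fixed 50x50 array, each row is sized to one parent plus that parent's children, which is an assumption about what is wanted rather than something the question requires.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the answer above.
final class FamilyParser {
    /** Splits "parent=child,child;parent=child" into rows of [parent, child1, child2, ...]. */
    static String[][] parse(String input) {
        String[] families = input.split(";");
        String[][] family = new String[families.length][];
        for (int j = 0; j < families.length; j++) {
            String[] parentAndChildren = families[j].split("=");
            String[] children = parentAndChildren[1].split(",");
            family[j] = new String[children.length + 1];
            family[j][0] = parentAndChildren[0];                      // parent in column 0
            System.arraycopy(children, 0, family[j], 1, children.length); // children from column 1
        }
        return family;
    }

    public static void main(String[] args) {
        String input = "jack=susan,kathy,bryan;david=stephen,jack;murphy=bruce,simon,mary";
        for (String[] row : parse(input)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```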
You are trying to assign an array of strings to a string. Maybe this will make it more clear?
String[] array = data[j].split("=")[1].split(",");
Now, if you want the first element of that array you can then do:
family[j][k] = array[0];
I always avoid using arrays directly. They are harder to manipulate than a dynamic list. I implemented the solution using a Map from each parent to a list of children, Map<String, List<String>> (read Map<Parent, List<Children>>).
public static void main(String[] args) {
    String input = "jack=susan,kathy,bryan;david=stephen,jack;murphy=bruce,simon,mary";
    Map<String, List<String>> parents = new Hashtable<String, List<String>>();
    for ( String family : input.split(";")) {
        final String parent = family.split("=")[0];
        final String allChildrens = family.split("=")[1];
        List<String> childrens = new Vector<String>();
        for (String children : allChildrens.split(",")) {
            childrens.add(children);
        }
        parents.put(parent, childrens);
    }
    System.out.println(parents);
}
The output is this:
{jack=[susan, kathy, bryan], murphy=[bruce, simon, mary], david=[stephen, jack]}
With this method you can directly access a parent using the map:
System.out.println(parents.get("jack"));
and this output:
[susan, kathy, bryan]
