I need to process big Excel files. My problem is heap space, especially with the XLS format. Even though I have the Aspose LightCells API to parse the file sequentially, I still need to retrieve the file from the database in chunks.
So far I have the file stored in the database in chunks of 40 KB. I have an Import table (storing general information related to the import, for instance start and end time, data type, etc.) and an ImportData table (containing the chunks of data as blobs). I have a one-to-many relation between Import and ImportData:
<hibernate-mapping>
<class name="com.company.import.pojos.Import" table="IMPORT_TABLE">
<id name="id" type="integer">
<column name="ID" />
<generator class="some.id.generator.IdGenerator"></generator>
</id>
<property name="startTime" type="timestamp">
<column name="START" />
</property>
<property name="endTime" type="timestamp">
<column name="END" />
</property>
<property lazy="false" name="datatzpe" type="com.company.import.enums.ImportDataType">
<column name="DATATYPE" />
</property>
<bag name="importDataList" table="IMPORT_DATA" lazy="true" cascade="all" inverse="false">
<key column="IMPORT_TABLE_ID"/>
<one-to-many class="com.company.import.pojos.ImportData"/>
</bag>
</class> </hibernate-mapping>
<hibernate-mapping> <class
name="com.company.import.pojos.ImportData" table="IMPORT_DATA">
<id name="id" type="integer">
<column name="ID" />
<generator class="some.id.generator.IdGenerator"></generator>
</id>
<property name="importTableID" type="integer">
<column name="IMPORT_TABLE_ID" />
</property>
<property name="data" type="binary">
<column name="DATA" />
</property>
<property name="order" type="integer">
<column name="ORDER" />
</property> </class> </hibernate-mapping>
I use Hibernate (4.2.3.Final), but even lazy loading is not an option, because once the collection is initialized I have the whole list of ImportData objects in memory, and that is exactly what causes the OutOfMemoryError I want to avoid.
Providing more memory is not an option either (the application runs in a container that hosts other applications as well, and several users might process files at the same time, so the heap space problem would eventually come back). Therefore I am looking for a way to always read the next chunk of data and feed it to the stream that is given to the Aspose API.
I create my workbook in Aspose simply like this:
InputStream inStream = ....;// get the input stream somehow
Workbook workbook = new Workbook(inStream);
You may need to implement your own InputStream that presents multiple input streams as one: when it reaches the end of one stream, it closes it and picks the next one to fill the rest of the read buffer.
I do not know of a ready-made implementation that also fetches the chunks from the database for you.
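For the concatenation part alone, the JDK's java.io.SequenceInputStream already behaves this way: it reads each stream to its end, closes it, and moves on to the next one. Below is a minimal sketch, assuming the chunks arrive through an Iterator<byte[]>. The class name ChunkedInput and that iterator are hypothetical; a real iterator would load one ImportData row at a time, following the ORDER column. Whether Aspose's Workbook(InputStream) constructor consumes the stream incrementally or buffers it internally is not confirmed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

/** Presents a lazily fetched series of byte[] chunks as one InputStream. */
class ChunkedInput {

    /**
     * Wraps the chunk iterator in a SequenceInputStream. The next chunk is
     * requested only when the current one has been read to its end.
     */
    static InputStream open(final Iterator<byte[]> chunks) {
        Enumeration<InputStream> streams = new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return chunks.hasNext();
            }

            @Override
            public InputStream nextElement() {
                // Assumption: chunks.next() is where a real implementation
                // would fetch the next blob from the database.
                return new ByteArrayInputStream(chunks.next());
            }
        };
        return new SequenceInputStream(streams);
    }

    /** Demo helper only: drains a stream fully, which a real import should avoid. */
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<byte[]> parts = Arrays.asList(new byte[] {1, 2, 3}, new byte[] {4, 5});
        byte[] all = readAll(open(parts.iterator()));
        System.out.println(all.length + " bytes read from " + parts.size() + " chunks");
    }
}
```

The stream returned by open(...) could then be passed to new Workbook(inStream) as in the question.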
How big is your file?
I have two tables: patient_data and patient_diagnosis.
patient_data contains the patient's personal data, such as pid (pkey), gender and birth_date.
patient_diagnosis contains the diagnosis data of the registered patients. It has fields such as record_id (pkey), pid (fkey to patient_data(pid)), diagnosis_date and other related fields.
Now I want to join these two tables on pid and have all these fields in a single type of object.
Here is the mapping file:
<hibernate-mapping>
<class catalog="emr" name="in.Models.Emr" table="patient_diagnosis">
<id name="recordid" type="long">
<column name="record_id"/>
</id>
<property name="diagnosisDate" type="timestamp">
<column length="19" name="diagnosis_date" not-null="true"/>
</property>
<property name="snomedTermPrimary" type="long">
<column name="snomed_term_primary" not-null="true" />
</property>
<property name="snomedTermSecondary" type="string">
<column name="snomed_term_secondary" />
</property>
<property name="episodeNo" type="long">
<column name="episode_no" not-null="true" />
</property>
<property name="pid" type="long">
<column name="pid" not-null="true" />
</property>
<join table="patient_data">
<key column="pid"/>
<property name="gender" type="string">
<column name="gender" not-null="true"/>
</property>
<property name="birthDate" type="timestamp">
<column length="19" name="birth_date" not-null="true"/>
</property>
</join>
</class>
</hibernate-mapping>
However, the join is applied as patient_diagnosis.record_id = patient_data.pid instead of patient_diagnosis.pid = patient_data.pid; that is, Hibernate joins the primary key of the first table to the key column named for the second table.
Is there a way to make the join use the pid column of the first table and the pid column of the second table? Or is there another way out?
Please note that I did not create classes for patient_data or patient_diagnosis. I only created the Emr class, which combines fields from both tables.
Try using a foreign key generator:
<id name="pid" type="java.lang.Long">
<column name="pid" />
<generator class="foreign">
<param name="property">patient_data</param>
</generator>
</id>
I am not sure, but this may work.
And:
<one-to-one name="patient_data" class="in.Models.Emr"
cascade="save-update"></one-to-one>
Similarly, in the joined class:
<one-to-one name="patient_diagnosis" class="in.Models.Emr"
cascade="save-update"></one-to-one>
I hope this helps you.
With the Hibernate mapping file shown below:
<hibernate-mapping>
<class name="pojo.Ghazal" table="ghazal">
<id name="s_no">
<generator class="increment" />
</id>
<property name="poem" />
<property name="poet" />
<map name="map" table="linked">
<key column="s_no" />
<index column="key_" type="string" />
<element column="val_" type="string" />
</map>
</class>
</hibernate-mapping>
What type of mapping is it?
The POJO named Ghazal has the following properties:
s_no
poem
poet
map
I have heard of many types of mapping, such as many-to-one, one-to-one, etc.
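This is a collection of values (the Map named "map" in your case) rather than an association between entities: the <element> child stores plain strings, and each Ghazal row owns many key/value rows in the linked table through the s_no key. A many-to-many mapping would use <many-to-many> instead of <element>. The Hibernate documentation discusses collections of values together with many-to-many associations; you can find the detailed docs here.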
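As an illustration, the class behind such a mapping probably looks roughly like the sketch below. The field types are assumptions (the mapping does not state them, and long is only one of the types the increment generator supports), and the package declaration of pojo.Ghazal is left out so that the snippet stands alone. The point is that map holds plain String values, not other entities.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of the POJO implied by the mapping; types are assumed. */
class Ghazal {
    private long s_no;      // identifier, filled by the "increment" generator
    private String poem;
    private String poet;
    // Stored in the "linked" table: one row per entry (s_no, key_, val_).
    private Map<String, String> map = new HashMap<String, String>();

    public long getS_no() { return s_no; }
    public void setS_no(long s_no) { this.s_no = s_no; }

    public String getPoem() { return poem; }
    public void setPoem(String poem) { this.poem = poem; }

    public String getPoet() { return poet; }
    public void setPoet(String poet) { this.poet = poet; }

    public Map<String, String> getMap() { return map; }
    public void setMap(Map<String, String> map) { this.map = map; }

    public static void main(String[] args) {
        Ghazal ghazal = new Ghazal();
        ghazal.setPoet("some poet");
        ghazal.getMap().put("some key", "some value");
        System.out.println(ghazal.getMap().size() + " entry in the map");
    }
}
```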
I have a Java/Spring/Hibernate project that uses two different schemas on the same database server (the second schema is read-only). I've set up a database.properties file that defines the names of both schemas:
default.catalog=testdb1
db2.catalog=testdb2
And in the ApplicationContext.xml I can set the default schema from the properties file as such:
<property name="hibernateProperties">
<props>
<prop key="hibernate.dialect">org.hibernate.dialect.MySQLInnoDBDialect</prop>
<prop key="hibernate.default_catalog">${default.catalog}</prop>
<prop key="hibernate.jdbc.batch_size">50</prop>
</props>
</property>
But the problem is that I currently have a few Hibernate objects that need to access both schemas. To achieve this, I've used the 'catalog' attribute to name the second schema in the Hibernate mappings, like this:
DBObjectXX.hbm.xml
<class name="fi.company.project.object.DBObjectXX" table="TestTableFromDB1">
<id name="id" type="int">
<column name="ID"/>
<generator class="native"/>
</id>
<join table="TestTableFromDB2" catalog="testdb2"> <!-- Change this ... -->
<key column="ID" not-null="true" unique="true"/>
<property name="propertyYY" type="string">
<column name="PropertyYY" not-null="true"/>
</property>
</join>
</class>
This works fine as it is. But every now and then I need to change the schema names (in testing, for example), and I have to do it by search-and-replace across every mapping file. What I would like to do is have the schema name come from the database.properties file, like this:
<class name="fi.company.project.object.DBObjectXX" table="TestTableFromDB1">
<id name="id" type="int">
<column name="ID"/>
<generator class="native"/>
</id>
<join table="TestTableFromDB2" catalog="${db2.catalog}"> <!-- ... to this or something similar -->
<key column="ID" not-null="true" unique="true"/>
<property name="propertyYY" type="string">
<column name="PropertyYY" not-null="true"/>
</property>
</join>
</class>
Any ideas on how to do this?
Unfortunately, you cannot have the same SessionFactory with two different schemas. There is no other option; you have to use two SessionFactories.
I also don't understand why you cannot use two different SessionFactories if you can have two different schemas.
If this really is the case, then, dude, you seriously need to reconsider your application design and architecture...
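On the placeholder idea from the question: if the mapping files are loaded programmatically, one possible approach is to substitute the ${...} tokens in the mapping text before Hibernate parses it. The sketch below shows only that substitution step, using the JDK alone; the class name MappingPlaceholders is hypothetical. The resolved text would still have to be handed to Hibernate, for example through Configuration.addInputStream, and how to wire that into a Spring-managed session factory is not shown here and would need checking.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Replaces ${key} tokens in mapping XML with values from a Properties object. */
class MappingPlaceholders {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String resolve(String xml, Properties props) {
        Matcher matcher = PLACEHOLDER.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            String value = props.getProperty(key);
            if (value == null) {
                // Fail early rather than pass an unresolved token to Hibernate.
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("db2.catalog", "testdb2");
        String mapping = "<join table=\"TestTableFromDB2\" catalog=\"${db2.catalog}\">";
        System.out.println(resolve(mapping, props));
    }
}
```

Failing on an unknown key is a design choice of this sketch; leaving the token untouched would be the alternative.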
I have three entities, of which I currently try to save only one. All three entities are shown below:
1. Student Entity
<class name="com.school.Student" table="TABLE_STUDENT">
<id name="id" type="long">
<column name="ST_ID" />
<generator class="native" />
</id>
<property name="name" type="string" column="ST_NAME"/>
<many-to-one name="studentSection" class="com.school.Section" fetch="select">
<column name="SECTION_ID" not-null="true" />
</many-to-one>
<many-to-one name="studentSportsTeam" class="com.school.SportsTeam" fetch="select">
<column name="SPORTS_TEAM" not-null="true" />
</many-to-one>
</class>
2. Section Entity
<class name="com.school.Section" table="TABLE_SECTION">
<id name="sectionId" type="string">
<column name="SECTION_ID" />
<generator class="assigned" />
</id>
<property name="floor" type="string" column="SEC_FLOOR"/>
<property name="capcacity" type="int" column="SEC_CAPACITY"/>
<set name="studentDetails" inverse="true" lazy="true" table="TABLE_STUDENT" fetch="select">
<key>
<column name="SECTION_ID" not-null="true" />
</key>
<one-to-many class="com.school.Student" />
</set>
</class>
3. SportsTeam Entity
<class name="com.school.SportsTeam" table="TABLE_SPORTS">
<id name="sportsTeamId" type="string">
<column name="SPORTS_TEAM" />
<generator class="assigned" />
</id>
<property name="noOfPlayers" type="int" column="SPORTS_PLAYER_NUM"/>
<property name="captainName" type="string" column="SPORTS_CAPTAIN_NAME"/>
<set name="playerDetails" inverse="true" lazy="true" table="TABLE_STUDENT" fetch="select">
<key>
<column name="SPORTS_TEAM" not-null="true" />
</key>
<one-to-many class="com.school.Student" />
</set>
</class>
Now if I try to save a Student entity with proper Section and SportsTeam details, it takes a lot of time to persist it to the database. Currently I am running it for around 10,000 students, and this process (persisting only) takes around 15 minutes. I added some logging to measure the complete time.
I need to reduce this time, as we will shortly move from 10,000 to 1 million records, and at the current rate that would take far too long. How can I do that?
As requested, the schema is as below:
TABLE_STUDENT :
ST_ID NUMBER,
ST_NAME VARCHAR(40),
SECTION_ID VARCHAR(10),
SPORTS_TEAM VARCHAR(10)
TABLE_SECTION :
SECTION_ID VARCHAR(10),
SEC_FLOOR VARCHAR(2),
SEC_CAPACITY NUMBER
TABLE_SPORTS :
SPORTS_TEAM VARCHAR(10),
SPORTS_PLAYER_NUM NUMBER,
SPORTS_CAPTAIN_NAME VARCHAR(40)
Please help
Consider batch inserts.
Moreover, if this bulk insert is a one-off, i.e. part of a migration process, then IMO you can consider dropping all primary keys and any other indices on the tables, inserting the records, and re-creating them afterwards. Hopefully you would notice a significant improvement.
I think you are trying to import the Student entity together with Section and SportsTeam. If you set the values from the input data on the entities and call save on the Student entity, this results in the creation of Student, Section and SportsTeam records in the database. This means that if you save 10,000 Student entities, you are effectively creating 10,000 SportsTeam records and 10,000 Section records.
Instead I would suggest you follow these steps:
1. Read a line from the input data.
2. Query the database (through HQL) for the already existing SportsTeam and Section.
3. If no records are found for SportsTeam and Section, create them.
4. Create a new Student record and set the SportsTeam and Section from steps 2 and 3.
5. Save the Student record.
I would also suggest saving the Student objects in batches of, say, 1000 per transaction. Committing the Hibernate transaction after each batch and releasing the objects would help improve DB/network utilization as well as memory use.
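The control flow of these suggestions can be sketched in plain Java as follows. This is only an outline under stated assumptions: the class BatchedImport and both helper methods are hypothetical, and the persistence work is passed in as callbacks. In a Hibernate version, the batch callback would typically call session.save(...) for each item, then session.flush() and session.clear(), and commit the transaction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Outline of "look up parents once, save children in batches". */
class BatchedImport {

    /** Returns the cached parent for this id, loading or creating it at most once. */
    static <P> P getOrCreate(Map<String, P> cache, String id, Function<String, P> loadOrCreate) {
        return cache.computeIfAbsent(id, loadOrCreate);
    }

    /** Hands the items to saveBatch in groups of at most batchSize; returns the group count. */
    static <T> int saveInBatches(Iterable<T> items, int batchSize, Consumer<List<T>> saveBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<T>(batchSize);
        int batches = 0;
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                // Real code: save each item, flush, clear, commit.
                saveBatch.accept(new ArrayList<T>(batch));
                batch.clear();
                batches++;
            }
        }
        if (!batch.isEmpty()) {
            saveBatch.accept(new ArrayList<T>(batch)); // final partial batch
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        Map<String, String> sections = new HashMap<String, String>();
        List<String> students = new ArrayList<String>();
        for (int i = 0; i < 2500; i++) {
            // Only a few distinct section ids, so only a few sections get created.
            String section = getOrCreate(sections, "SEC" + (i % 3), id -> "section " + id);
            students.add("student " + i + " in " + section);
        }
        int batches = saveInBatches(students, 1000, batch -> { /* persist here */ });
        System.out.println(batches + " batches, " + sections.size() + " sections");
    }
}
```

The in-memory cache replaces the per-row HQL lookup from step 2; that is reasonable only while the number of distinct Section and SportsTeam rows stays small.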
I would specify a length for string properties; otherwise they may be implemented as CLOBs.
I just coded my first Hibernate examples.
The database connection works and I understand how I can map a String from a POJO to a database field:
private String firstName;
And in the mapping file:
<property name="firstName" type="java.lang.String">
<column name="FIRSTNAME" />
</property>
But how can I map an ArrayList to the database? A simple example from the mapping XML file would be appreciated.
Cheers
UPDATE
I switched to List instead of ArrayList and found an example. Now I map as follows:
<list name="test" inverse="false" table="CONTACT" lazy="true">
<key>
<column name="ID" />
</key>
<list-index></list-index>
<element type="java.lang.String">
<column name="TEST" />
</element>
</list>
Unfortunately, I get an exception that I do not understand:
Exception in thread "main" org.hibernate.MappingException: Foreign key (FK6382B0003257FF7F:CONTACT [ID])) must have same number of columns as the referenced primary key (CONTACT [ID,idx])
Any ideas?
Cheers
I notice that you are using XML to map your POJOs. You will find some information about that here.
For example:
<list name="myArrayListProperty" cascade="all">
<key column="parent_id"/>
<index column="idx"/>
<one-to-many class="WhatIsInTheList"/>
</list>
However, using annotations has some advantages. This link explains how to map any collection using annotations.
See the collection mapping section of the docs. There are multiple ways to map a list (one-to-many, many-to-many, a collection of elements). You can map it as a list or as a bag, so read the whole section.
You have a small error in the XML configuration.
A list of values has to be stored in an additional table linked to the owning table, so instead of:
<list name="test" inverse="false" table="CONTACT" lazy="true">
<key>
<column name="ID" />
</key>
<list-index></list-index>
<element type="java.lang.String">
<column name="TEST" />
</element>
</list>
you should map it to a new table that holds the list values:
<list name="test" inverse="false" table="CONTACT_test" lazy="true">
<key>
<column name="ID" />
</key>
<list-index></list-index>
<element type="java.lang.String">
<column name="TEST" />
</element>
</list>
Hibernate can create the new table for you if schema generation (hibernate.hbm2ddl.auto) is enabled.