HBase - delete columns of rows with range of timestamp without scanning - java

I was wondering if I could delete some columns of some rows by timestamp, without scanning the whole table.
My code is like below:
public static final void deleteBatch(long date, String column, String... ids) throws Exception {
    Connection con = null; // connection instance
    HTable table = null;   // htable instance
    List<Delete> deletes = new ArrayList<Delete>(ids.length);
    for (String id : ids) {
        Delete delete = new Delete(Bytes.toBytes(id));
        delete.addColumn(/* CF */, Bytes.toBytes(column));
        // also tried:
        // delete.addColumn(/* CF */, Bytes.toBytes(column), date);
        delete.setTimestamp(date);
        deletes.add(delete);
    }
    table.delete(deletes);
    table.close();
}
This works, but it deletes all columns prior to the given date.
I want something like this:
Delete delete = new Delete(Bytes.toBytes(id));
delete.setTimestamp(date - 1, date); // no such overload exists; this is the behavior I want
I don't want to delete everything before or after a specific date; I want to delete exactly the time range I give.
Also, MaxVersions (on the HColumnDescriptor) is set to Integer.MAX_VALUE to keep all changes.
As mentioned in the Delete API documentation:
Specifying timestamps, deleteFamily and deleteColumns will delete all
versions with a timestamp less than or equal to that passed
so it deletes all columns whose timestamps are less than or equal to the given date.
How can I achieve that?
Any answer is appreciated.

After struggling for weeks, I found a solution to this problem.
Apache HBase has a feature called coprocessors, which host and manage the core execution of data-level operations (get, delete, put, ...) and can be overridden (extended) for custom computations such as data aggregation and bulk processing against the data, outside the client scope.
There are basic implementations for common problems, such as bulk delete.
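For illustration, here is a minimal client-side sketch of the semantics such a bulk-delete endpoint implements server-side (the coprocessor avoids shipping the cells to the client; this sketch still scans, but only over the requested time range). Classes are from org.apache.hadoop.hbase and org.apache.hadoop.hbase.client; the key point is that Delete.addColumn(family, qualifier, ts) deletes exactly the version at that timestamp rather than everything older:

public static void deleteExactTimeRange(Table table, byte[] cf, byte[] qualifier,
        long from, long to) throws IOException {
    Scan scan = new Scan();
    scan.addColumn(cf, qualifier);
    scan.setTimeRange(from, to);   // only cell versions in [from, to)
    scan.setMaxVersions();         // consider every stored version, not just the latest
    List<Delete> deletes = new ArrayList<Delete>();
    ResultScanner scanner = table.getScanner(scan);
    try {
        for (Result result : scanner) {
            for (Cell cell : result.rawCells()) {
                Delete delete = new Delete(CellUtil.cloneRow(cell));
                // deletes exactly this version, not all versions <= timestamp
                delete.addColumn(cf, qualifier, cell.getTimestamp());
                deletes.add(delete);
            }
        }
    } finally {
        scanner.close();
    }
    table.delete(deletes);
}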

Related

How to add key and value dynamically and call the procedure in java (Oracle ADF)

ViewObject VO = getViewObjectFromAMImpl("EOView2", "AppModuleDataControl");
Row[] selectedRows = VO.getFilteredRows("tSelect", true);
int counter = 0;
ADFContext adfCtx = ADFContext.getCurrent();
SecurityContext secCntx = adfCtx.getSecurityContext();
String _user = secCntx.getUserName();
//Date vDate = getMinDate();
java.sql.Timestamp startDate = null;
for (Row r : selectedRows) {
    startDate = (java.sql.Timestamp) r.getAttribute("StartDate");
    if ("E".equals(r.getAttribute("SrcType"))) {
        r.setAttribute("Type", "S");
        r.setAttribute("UpdatedBy", _user); // was new Date(); UpdatedBy should hold the user name
        r.setAttribute("LastUpdateDate", new Date());
        counter++;
    }
}
System.out.println("printing count " + counter);
if (counter == 0) {
    JSFUtils.addFacesErrorMessage((String) JSFUtils.resolveExpression("No records Approved."));
} else {
    Commit();
    JSFUtils.addFacesInformationMessage((String) JSFUtils.resolveExpression(" records Approved successfully."));
    AdfFacesContext.getCurrentInstance().addPartialTarget(hearderTableBind);
}
approvePopup.cancel();
From the above code I get the selected rows as key and value pairs. I want to add those rows (key and value) to a list, and then I need to call the procedure. Could you please tell me the best possible way to achieve this?
I want to call the procedure with key and value pairs (multiple values will come).
You should read the doc at
https://docs.oracle.com/en/middleware/developer-tools/adf/12.2.1.4/develop/extending-business-components-functionality1.html#GUID-B93C7B79-73C9-4434-B12E-A7E23479969A
However, I fail to understand why you need to call a PL/SQL procedure at all.
You should be able to do everything in ADF, or call a procedure directly, without iterating over the data just to set some values.
It's not a good idea to change values in ADF, then call a procedure and assume that the framework somehow knows about the changes. The procedure runs in the DB in a different transaction; ADF doesn't know about changes done in the procedure, and the procedure doesn't know about the changes done in ADF until you post them to the DB.
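If you do end up calling the procedure, the pattern from that doc is a CallableStatement obtained from the application module's transaction. A minimal sketch, assuming a hypothetical procedure my_pkg.approve_rows(p_key, p_value) and that the code lives in your ApplicationModuleImpl; loop over it for multiple key/value pairs:

public void callApproveProcedure(String key, String value) {
    CallableStatement st = null;
    try {
        // my_pkg.approve_rows and its parameters are placeholders for your own procedure
        st = getDBTransaction().createCallableStatement(
                "BEGIN my_pkg.approve_rows(?, ?); END;", 0);
        st.setString(1, key);
        st.setString(2, value);
        st.executeUpdate();
    } catch (SQLException e) {
        throw new JboException(e);
    } finally {
        try {
            if (st != null) st.close();
        } catch (SQLException e) {
            // ignore failures on close
        }
    }
}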

DAO Class in java taking too much time to fetch Data

I have the following method in my DAO class:
public List<Environment> fetchMiddlewareVersions() throws SQLException {
    System.out.println("reached version");
    Environment environment;
    List<Environment> environments = new ArrayList<Environment>();
    try {
        connection = DBConnectionUtil.getConnection();
        preparedStatement = connection.prepareStatement(
                "select * from middleware_version_details order by application, environment");
        preparedStatement.setFetchSize(100); // was set on a separate, unused Statement before
        resultSet = preparedStatement.executeQuery();
        while (resultSet.next()) {
            environment = new Environment();
            environment.setAppName(resultSet.getString("APPLICATION"));
            environment.setHostName(resultSet.getString("HOSTNAME"));
            environment.setSoftwareComponent(resultSet.getString("SOFTWARE_COMPONENT"));
            environment.setVersion(resultSet.getString("VERSION"));
            environment.setInstallPath(resultSet.getString("INSTALL_PATH"));
            environment.setRemarks(resultSet.getString("REMARKS"));
            environment.setEnvironmental(resultSet.getString("ENVIRONMENT"));
            environments.add(environment);
        }
    }
By the time the data reaches the JSP page, 20-30 seconds have passed. How do I increase the speed of the fetch? I tried DynaCache and it hasn't helped.
So, barring any sort of connectivity issues, it almost always comes down to the number of records you're fetching. If you're fetching A TON of records, the method will not return until it has gone through every row and created an object for it.
I would try adding LIMIT and OFFSET clauses to your SQL statement to retrieve records, say, 25 at a time. setFetchSize(int) does not affect the number of overall records, only the number of rows the underlying transport will fetch at a time from your server. Also, move your SQL query into a static final variable:
private static final String SQL_FETCH_MIDDLEWARE_VERSION =
    "SELECT * FROM middleware_version_details ORDER BY application, environment " +
    "LIMIT ? OFFSET ?";
then set the limit and the offset in your prepared statement like so:
preparedStatement.setInt( 1, <RECORD COUNT> );
preparedStatement.setInt( 2, <RECORD START> );
Third, do you have an index on application and environment? If not, and you will constantly be ordering, filtering, and joining on those columns, you should add one.
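For example (the index name is made up; this is one-time DDL you would normally run from a SQL console or migration script rather than application code):

// one-time DDL; idx_mwv_app_env is a hypothetical name
Statement ddl = connection.createStatement();
ddl.execute("CREATE INDEX idx_mwv_app_env ON middleware_version_details (application, environment)");
ddl.close();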
Fourth, and it's a minor point but one that I adhere to: calling resultSet.getString("<COLUMN NAME>") causes another lookup of the column index. It's not usually a huge deal, but if you're trying to be as performant as possible, use the numeric index (note that JDBC column indexes are 1-based). You can do this by creating private static variables holding the index:
private static final int INDEX_ENVIRONMENT = 6;
or you can use a counter and just ensure that the columns are read in the order they appear in the query, something like this:
while (resultSet.next()) {
    int iC = 1; // JDBC column indexes start at 1, not 0
    environment = new Environment();
    environment.setAppName(resultSet.getString(iC++));
    environment.setHostName(resultSet.getString(iC++));
    environment.setSoftwareComponent(resultSet.getString(iC++));
    environment.setVersion(resultSet.getString(iC++));
    environment.setInstallPath(resultSet.getString(iC++));
    environment.setRemarks(resultSet.getString(iC++));
    environment.setEnvironmental(resultSet.getString(iC++));
    environments.add(environment);
}
Just ensure that you're reading the columns in the correct order and it will be slightly more performant. I like this counter approach as well because it allows me to easily adapt to changing schemas.

ATG Repository API

I'm trying to update multiple records via an ATG class extending GenericService.
However, I'm running into a roadblock.
How do I do a multiple-row insert where I can keep adding all the items/rows to the cached object and then sync with the table in a single command using item.add()?
Sample code
The first part clears out the rows in the table before insertion happens (it would be mighty helpful if anyone knows of a way to clear all rows in a table without having to loop through and delete them one by one).
MutableRepository repo = (MutableRepository) feedRepository;
RepositoryView view = null;
try {
    view = getFeedRepository().getView(getFeedRepositoryFeedDataDescriptorName());
    RepositoryItem[] items = null;
    if (view != null) {
        QueryBuilder qb = view.getQueryBuilder();
        Query getFeedsQuery = qb.createUnconstrainedQuery();
        items = view.executeQuery(getFeedsQuery);
    }
    if (items != null && items.length > 0) {
        // remove all items in the repository
        for (RepositoryItem item : items) {
            repo.removeItem(item.getRepositoryId(), getFeedRepositoryFeedDataDescriptorName());
        }
    }
    for (RSSFeedObject rfo : feedEntries) {
        MutableRepositoryItem feedItem = repo.createItem(getFeedRepositoryFeedDataDescriptorName());
        feedItem.setPropertyValue(DB_COL_AUTHOR, rfo.getAuthor());
        feedItem.setPropertyValue(DB_COL_FEEDURL, rfo.getFeedUrl());
        feedItem.setPropertyValue(DB_COL_TITLE, rfo.getTitle());
        feedItem.setPropertyValue(DB_COL_FEEDURL, rfo.getPublishedDate()); // note: reuses DB_COL_FEEDURL; probably meant a published-date property
        RepositoryItem item = repo.addItem(feedItem);
    }
The way I interpret your question is that you want to add multiple repository items to your repository, and you want to do it fairly efficiently at the database level. I suggest you make use of the Java Transaction API, as recommended in the ATG documentation, like so:
TransactionManager tm = ...
TransactionDemarcation td = new TransactionDemarcation();
try {
    try {
        td.begin(tm);
        ... do repository item work ...
    } finally {
        td.end();
    }
} catch (TransactionDemarcationException exc) {
    ... handle the exception ...
}
Assuming you are using a SQL repository in your example, the SQL INSERT statements will be issued after each call to addItem but will not be committed until/if the transaction completes successfully.
ATG does not provide support for deleting multiple records in a single SQL statement. You can use transactions, as @chrisjleu suggests, but there is no way to do the equivalent of a DELETE WHERE ID IN ("1", "2", ...). Your code looks correct.
It is possible to invoke stored procedures or execute custom SQL through an ATG Repository, but that isn't generally recommended for portability/maintenance reasons. If you did that, you would also need to flush the appropriate portions of the item/query caches manually.

Hibernate memory management

I have an application that uses Hibernate. In one part I am trying to retrieve documents. Each document has an account number. The model looks something like this:
private Long _id;
private String _acct;
private String _message;
private String _document;
private String _doctype;
private Date _review_date;
I then retrieve the documents with a document service. A portion of the code is here:
public List<Doc_table> getDocuments(int hours_, int dummyFlag_, List<String> accts) {
    List<Doc_table> documents = new ArrayList<Doc_table>();
    Session session = null;
    Criteria criteria = null;
    try {
        // Create a cutoff Date by adding the (negative) number of hours passed in.
        session = HibernateUtil.getSession();
        session.beginTransaction();
        if (accts == null) {
            Calendar cutoffTime = Calendar.getInstance();
            cutoffTime.add(Calendar.HOUR_OF_DAY, hours_);
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.gt("dbcreate_date", cutoffTime.getTime()))
                    .add(Restrictions.eq("dummyflag", dummyFlag_));
        } else {
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.in("acct", accts));
        }
        documents = criteria.list();
        for (int x = 0; x < documents.size(); x++) {
            Doc_table document = documents.get(x);
            // ......... more stuff here
        }
This works great if I'm retrieving a small number of documents, but when the document count is large I get a heap-space error, probably because the documents take up a lot of space and, when you retrieve several thousand of them, bad things happen.
All I really want to do is find each document that fits my criteria, grab its account number, and return a list of account numbers (a far smaller object than a list of document objects). If this were JDBC, I would know exactly what to do.
But in this case I'm stumped. I guess I'm looking for a way to bring back just the account numbers of the Doc_table objects.
Or alternatively, some way to retrieve the documents that fit my criteria one at a time from the database using Hibernate (instead of bringing back the whole list of objects, which uses too much memory).
There are several ways to deal with the problem:
Loading the docs in smaller batches (see the sketch after this list)
(The way you noticed) not querying for the whole Document, but only for the account numbers:
List accts = session.createQuery("SELECT d._acct FROM Doc d WHERE ...").list();
or
List<String> accts = session.createCriteria(Doc.class).
setProjection(Projections.property("_acct")).
list();
When there is a special field in your Document class that contains the huge amount of document byte data, you could map that field as a lazily loaded field.
Create a second, read-only entity class that contains only the fields you need and map it to the same table.
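A minimal sketch of the first option (batched loading) with the Criteria API, reusing the Doc_table mapping and "acct" property from the question; the page size is an arbitrary choice:

int pageSize = 100;
int offset = 0;
List<Doc_table> page;
do {
    page = session.createCriteria(Doc_table.class)
            .add(Restrictions.in("acct", accts))
            .setFirstResult(offset)
            .setMaxResults(pageSize)
            .list();
    for (Doc_table doc : page) {
        // ... collect the account number from each document ...
    }
    session.clear(); // evict the processed entities so heap usage stays flat
    offset += pageSize;
} while (page.size() == pageSize);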
Instead of fetching all documents, i.e. all records at once, try to limit the rows being fetched. Also, consider a strategy where you store documents temporarily as flat files and fetch them later, deleting them after usage. Though it is a longer process, it is an efficient way of handling and delivering documents from a database.

Transaction issue in java with hibernate - latest entries not pulled from database

I'm having what seems to be a transactional issue in my application. I'm using Java 1.6 and Hibernate 3.2.5.
My application runs a monthly process where it creates billing entries for every user in the database based on their monthly activity. These billing entries are then used to create a Monthly Bill object. The process is:
Get users who have activity in the past month
Create the relevant billing entries for each user
Get the set of billing entries that we've just created
Create a Monthly Bill based on these entries
Everything works fine until Step 3 above. The Billing Entries are correctly created (I can see them in the database if I add a breakpoint after the Billing Entry creation method), but they are not pulled out of the database. As a result, an incorrect Monthly Bill is generated.
If I run the code again (without clearing out the database), new Billing Entries are created and Step 3 pulls out the entries created in the first run (but not the second run). This, to me, is very confusing.
My code looks like the following:
for (User user : usersWithActivities) {
    createBillingEntriesForUser(user.getId());
    userBillingEntries = getLastMonthsBillingEntriesForUser(user.getId());
    createXMLBillForUser(user.getId(), userBillingEntries);
}
The methods called look like the following:
@Transactional
public void createBillingEntriesForUser(Long id) {
    UserManager userManager = ManagerFactory.getUserManager();
    User user = userManager.getUser(id);
    List<AccountEvent> events = getLastMonthsAccountEventsForUser(id);
    BillingEntry entry = new BillingEntry();
    if (null != events) {
        for (AccountEvent event : events) {
            if (event.getEventType().equals(EventType.ENABLE)) {
                Calendar cal = Calendar.getInstance();
                Date eventDate = event.getTimestamp();
                cal.setTime(eventDate);
                double startDate = cal.get(Calendar.DATE);
                double numOfDaysInMonth = cal.getActualMaximum(Calendar.DAY_OF_MONTH);
                double numberOfDaysInUse = numOfDaysInMonth - startDate;
                double fractionToCharge = numberOfDaysInUse / numOfDaysInMonth;
                BigDecimal amount = BigDecimal.valueOf(fractionToCharge * Prices.MONTHLY_COST);
                amount.scale();
                entry.setAmount(amount);
                entry.setUser(user);
                entry.setTimestamp(eventDate);
                userManager.saveOrUpdate(entry);
            }
        }
    }
}
@Transactional
public Collection<BillingEntry> getLastMonthsBillingEntriesForUser(Long id) {
    if (log.isDebugEnabled())
        log.debug("Getting all the billing entries for last month for user with ID " + id);
    //String queryString = "select billingEntry from BillingEntry as billingEntry where billingEntry>=:firstOfLastMonth and billingEntry.timestamp<:firstOfCurrentMonth and billingEntry.user=:user";
    String queryString = "select be from BillingEntry as be join be.user as user where user.id=:id and be.timestamp>=:firstOfLastMonth and be.timestamp<:firstOfCurrentMonth";
    // This parameter will be the start of the last month, i.e. the start of the billing cycle
    SearchParameter firstOfLastMonth = new SearchParameter();
    firstOfLastMonth.setTemporalType(TemporalType.DATE);
    // This parameter holds the start of the CURRENT month, i.e. the end of the billing cycle
    SearchParameter firstOfCurrentMonth = new SearchParameter();
    firstOfCurrentMonth.setTemporalType(TemporalType.DATE);
    Query query = super.entityManager.createQuery(queryString);
    query.setParameter("firstOfCurrentMonth", getFirstOfCurrentMonth());
    query.setParameter("firstOfLastMonth", getFirstOfLastMonth());
    query.setParameter("id", id);
    List<BillingEntry> entries = query.getResultList();
    return entries;
}
public MonthlyBill createXMLBillForUser(Long id, Collection<BillingEntry> billingEntries) {
    BillingHistoryManager manager = ManagerFactory.getBillingHistoryManager();
    UserManager userManager = ManagerFactory.getUserManager();
    MonthlyBill mb = new MonthlyBill();
    User user = userManager.getUser(id);
    mb.setUser(user);
    mb.setTimestamp(new Date());
    Set<BillingEntry> entries = new HashSet<BillingEntry>();
    entries.addAll(billingEntries);
    String xml = createXmlForMonthlyBill(user, entries);
    mb.setXmlBill(xml);
    mb.setBillingEntries(entries);
    MonthlyBill bill = (MonthlyBill) manager.saveOrUpdate(mb);
    return bill;
}
Help with this issue would be greatly appreciated, as it's been racking my brain for weeks now!
Thanks in advance,
Gearoid.
Is your top-level method also transactional? If yes: most of the time I've encountered that kind of problem, it was a flush that was not done at the right time by Hibernate.
Try adding a call to session.flush() at the beginning of the getLastMonthsBillingEntriesForUser method and see if it addresses your problem.
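In the posted code, the JPA equivalent would be a flush at the top of that method, assuming it shares its persistence context with createBillingEntriesForUser:

@Transactional
public Collection<BillingEntry> getLastMonthsBillingEntriesForUser(Long id) {
    super.entityManager.flush(); // push pending BillingEntry INSERTs to the DB before querying
    ...
}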
Call session.flush() AND session.close() before getLastMonthsBillingEntriesForUser gets called.
Please correct my assumptions if they are not correct...
As far as I can tell, the relationship between entry and user is a many to one.
So why is your query doing a "one to many" type join? You should rather make your query:
select be from BillingEntry as be where be.user=:user and be.timestamp >= :firstOfLastMonth and be.timestamp < :firstOfCurrentMonth
And then pass in the User object, not the user id. This query will be a little lighter in that it will not have to fetch the details for the user, i.e. it will not have to do a select on user.
Unfortunately this is probably not causing your problem, but it's worth fixing nevertheless.
Move the declaration of BillingEntry entry = new BillingEntry(); to within the for loop. That code looks like it's updating one entry over and over again; a sketch of the fix follows.
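That is, with the amount calculation elided:

for (AccountEvent event : events) {
    if (event.getEventType().equals(EventType.ENABLE)) {
        BillingEntry entry = new BillingEntry(); // a fresh entity per event
        // ... compute amount as before ...
        entry.setAmount(amount);
        entry.setUser(user);
        entry.setTimestamp(event.getTimestamp());
        userManager.saveOrUpdate(entry); // each iteration now inserts a new row
    }
}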
I'm guessing here, but what you've coded goes against what I think I know about Java persistence and Hibernate.
Are you certain those entries are being persisted properly? In my mind, what is happening is that a new BillingEntry is created and persisted once; each subsequent iteration of the loop simply changes the values of that same entry and calls merge. It doesn't look like you're doing anything to create a new BillingEntry after the first one, so no new IDs are generated, which is why you can't retrieve the later entries.
That being said, I'm not convinced the timing of the flush isn't a culprit here either, so I'll wait with bated breath for the downvotes.
