Performance difference in SSAS query in MSS Studio vs Java MDX

For context, I am not an SSAS expert, or even an avid user, I'm primarily a Java developer. We have a data science team that uses SSAS to write, develop and test various models.
In order to integrate the output of these models with other non-Microsoft systems, I am building a Java-based service that can query certain fields from the cube using Olap4j/XMLA to run an MDX query. But the performance (or lack thereof) is confusing me.
If I open MSS Studio, "Browse" the cube, drag a number of measures into the measures pane, toggle "Show Empty Cells" (otherwise for some reason I get no results), and hit execute, I get the expected results almost instantly. If I click on the red square to turn off "design mode", it takes me to the MDX code that looks something like:
SELECT { } ON COLUMNS, { (
[Main].[Measure01].[Measure01].ALLMEMBERS *
[Main].[Measure02].[Measure02].ALLMEMBERS *
[Main].[Measure03].[Measure03].ALLMEMBERS *
[Main].[Measure04].[Measure04].ALLMEMBERS
) } DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS FROM (
SELECT ( {
[Main].[Measure01].&[1]
} ) ON COLUMNS FROM [Model])
CELL PROPERTIES VALUE
If I take this MDX query and paste it into my Java application, and run it, it takes over 30 seconds to return the results, using the following code:
import java.sql.Connection;
import java.sql.DriverManager;
import org.olap4j.CellSet;
import org.olap4j.OlapConnection;
import org.olap4j.OlapStatement;

// cubeUrl and catalog are defined elsewhere in the service
Class.forName("org.olap4j.driver.xmla.XmlaOlap4jDriver");
Connection connection = DriverManager.getConnection(cubeUrl);
OlapConnection olapConnection = connection.unwrap(OlapConnection.class);
olapConnection.setCatalog(catalog);
OlapStatement statement = olapConnection.createStatement();
LOG.info("Running Cube query");
CellSet cellSet = statement.executeOlapQuery("<<The MDX Query here>>");
And the more measures I add, the slower it gets. I've tried putting some logging and debug breakpoints in my code, but it really seems like it's SSAS itself that is being slow in returning my data.
Bearing in mind I know very little about SSAS, what can I try? Does Olap4j have some config options I haven't set? Does MSS Studio do some optimization behind the scenes that is impossible for me to replicate?
EDIT 1:
On a hunch I installed Wireshark to monitor my network traffic, and while my query is running I see hundreds of thousands, if not millions, of packets going between my laptop and the SSAS server. Network packets are hard to interpret, but a lot of them seem to be sending HTTP data with the measure values in them, things like:
<Member Hierarchy="[Main].[Measure01]">
<UName>[Main].[Measure01].&[0]</UName>
<Caption>0.00</Caption>
<LName>[Main].[Measure01].[Measure01]</LName>
<LNum>1</LNum>
<DisplayInfo>131072</DisplayInfo>
<MEMBER_CAPTION>0.00</MEMBER_CAPTION>
<MEMBER_UNIQUE_NAME>[Main].[Measure01].&[0]</MEMBER_UNIQUE_NAME>
<MEMBER_NAME>0</MEMBER_NAME>
<MEMBER_VALUE xsi:type="xsd:double">0</MEMBER_VALUE>
</Member>
<Member Hierarchy="[Main].[Measure02]">
<UName>[Main].[Measure02].&[0]</UName>
<Caption>0</Caption>
<LName>[Main].[Measure02].[Measure02]</LName>
<LNum>1</LNum>
<DisplayInfo>131072</DisplayInfo>
<MEMBER_CAPTION>0</MEMBER_CAPTION>
<MEMBER_UNIQUE_NAME>[Main].[Measure02].&[0]</MEMBER_UNIQUE_NAME>
<MEMBER_NAME>0</MEMBER_NAME>
<MEMBER_VALUE xsi:type="xsd:double">0</MEMBER_VALUE>
</Member>
<Member Hierarchy="[Main].[Measure03]">
<UName>[Main].[Measure03].&</UName>
<Caption/>
<LName>[Main].[Measure03].[Measure03]</LName>
<LNum>1</LNum>
<DisplayInfo>131072</DisplayInfo>
<MEMBER_UNIQUE_NAME>[Main].[Measure03].&</MEMBER_UNIQUE_NAME>
<MEMBER_VALUE xsi:nil="true"/>
</Member>
So it seems like the slowness might actually all be network traffic! Is there a way to get olap4j/IIS/SSAS to compress the traffic, so that I can get similar performance in olap4j as I get with MSS Studio, where the same volume of data is retrieved in under a second?

Related

XML node replace failure

I am trying to use Corb to search for and update a node in a large number of documents:
Sample input:
<hcmt xmlns="http://horn.thoery">
  <susceptible>X</susceptible>
  <reponsible>foresee–intervention</reponsible>
  <intend>Benefit Protagonist</intend>
  <justified>Goal Outwiegen</justified>
</hcmt>
XQuery:
(: let $resp := "foresee–intervention" :)
let $docs :=
  cts:search(doc(),
    cts:and-query((
      cts:collection-query("hcmt"),
      cts:path-range-query("/horn:hcmt/horn:responsible", "=", $resp)
    ))
  )
return
  for $doc in $docs
  return
    xdmp:node-replace($doc/horn:hcmt/horn:responsible, "Foresee Intervention")
Expected output:
<hcmt xmlns="http://horn.thoery">
  <susceptible>X</susceptible>
  <reponsible>Foresee Intervention</reponsible>
  <intend>Benefit Protagonist</intend>
  <justified>Goal Outwiegen</justified>
</hcmt>
But the node-replace doesn't happen in Corb, and no error is returned. Other queries work fine in Corb. How can I get node-replace to work correctly in Corb?
Thanks in advance for any help.
I create functions to reconcile encoding issues. This not only mitigates potential API transaction failures but is also a prerequisite for validating and encoding parameter, element, property, and URI names.
That said, a sample MarkLogic Java API implementation is:
Create a dynamic query construct in the filesystem; in my case, product-query-option.xml (using the query value directly: Chooser–Option):
<search xmlns="http://marklogic.com/appservices/search">
  <query>
    <and-query>
      <collection-constraint-query>
        <constraint-name>Collection</constraint-name>
        <uri>proto</uri>
      </collection-constraint-query>
      <range-constraint-query>
        <constraint-name>ProductType</constraint-name>
        <value>Chooser–Option</value>
      </range-constraint-query>
    </and-query>
  </query>
</search>
Deploy the persistent query options to the modules database; in my case, search-lexis.xml. The options file looks like:
<options xmlns="http://marklogic.com/appservices/search">
  <constraint name="Collection">
    <collection prefix=""/>
  </constraint>
  <constraint name="ProductType">
    <range type="xs:string" collation="http://marklogic.com/collation/en/S1">
      <path-index xmlns:prod="schema://fc.fasset/product">/prod:requestProduct/prod:_metaData/prod:productType</path-index>
    </range>
  </constraint>
</options>
Follow on with the dynamic Java search:
File file = new File("src/main/resources/queryoption/product-query-option.xml");
FileHandle fileHandle = new FileHandle(file);
RawCombinedQueryDefinition rcqDef = queryMgr.newRawCombinedQueryDefinition(fileHandle, queryOption);
You can, of course, combine the query and the options as one handle in the QueryDefinition.
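For illustration, here is a hypothetical follow-through that executes the combined definition and lists the matching URIs; SearchHandle and MatchDocumentSummary are standard MarkLogic Java API types, but this fragment is not part of the original answer:
import com.marklogic.client.io.SearchHandle;
import com.marklogic.client.query.MatchDocumentSummary;

SearchHandle results = queryMgr.search(rcqDef, new SearchHandle());
for (MatchDocumentSummary summary : results.getMatchResults()) {
    System.out.println(summary.getUri()); // URI of each matched document
}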
Your original node-replace translates to a Java partial update; make sure the DocumentPatchBuilder sets namespaces with the correct NamespaceContext.
For batch data operations, the performant approach is MarkLogic Data Movement: instantiate a QueryBatcher with the searched URIs, supply the replacement value or data fragment via PatchBuilder.replaceValue, and complete the batch with
dbClient.newXMLDocumentManager().patch(uri, patchHandle);
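Pulling those pieces together, here is a minimal sketch of that Data Movement approach. The host, port, and credentials are invented; the collection, path, and replacement value come from the question:
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.QueryBatcher;
import com.marklogic.client.document.DocumentPatchBuilder;
import com.marklogic.client.document.XMLDocumentManager;
import com.marklogic.client.io.marker.DocumentPatchHandle;
import com.marklogic.client.query.StructuredQueryBuilder;
import com.marklogic.client.util.EditableNamespaceContext;

public class BatchReplace {
    public static void main(String[] args) {
        DatabaseClient dbClient = DatabaseClientFactory.newClient("localhost", 8000,
                new DatabaseClientFactory.DigestAuthContext("user", "password"));
        XMLDocumentManager docMgr = dbClient.newXMLDocumentManager();

        // Register the prefix used in the patch path (namespace from the sample input).
        EditableNamespaceContext namespaces = new EditableNamespaceContext();
        namespaces.put("horn", "http://horn.thoery");

        DocumentPatchBuilder patchBuilder = docMgr.newPatchBuilder();
        patchBuilder.setNamespaces(namespaces);
        patchBuilder.replaceValue("/horn:hcmt/horn:responsible", "Foresee Intervention");
        DocumentPatchHandle patchHandle = patchBuilder.build();

        // Query the collection and patch each matched document as its URI arrives.
        DataMovementManager dmm = dbClient.newDataMovementManager();
        StructuredQueryBuilder sqb = new StructuredQueryBuilder();
        QueryBatcher batcher = dmm.newQueryBatcher(sqb.collection("hcmt"))
                .onUrisReady(batch -> {
                    for (String uri : batch.getItems()) {
                        docMgr.patch(uri, patchHandle);
                    }
                });
        dmm.startJob(batcher);
        batcher.awaitCompletion();
        dmm.stopJob(batcher);
    }
}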
MarkLogic Data Services: if the above succeeds, you could then move to a more robust and scalable enterprise SOA approach; please review Data Services.
The implementation with Gradle is like this (note: all of the transformation metrics should be parameters, including path/element/property name, namespace, value, etc.; nothing is hardcoded). One proxy service declared in service.json can serve multiple endpoints (under /root/df-ds/fxd) with different types of modules, which gives you free rein to develop in pure Java or extend the development platform to handle complex data operations.
If these operations are persistent node updates, you should consider an in-memory node transform before ingestion. Besides the MarkLogic data transformation tools, you can harness the power of XSLT 2.0+.
Saxon's XPathFactory could be a serviceable vehicle for querying and transforming nodes. I'm not sure whether the relationship is reciprocal, but the MarkLogic Java API implements XPath compilation to split large paths and stream transactions. XSLT/Saxon is not my forte, so I can't comment on how comparable it is with this encode/decode particularity or how it handles transaction (insert, update, etc.) streaming.

Why is a blank Java Icon appearing when I parse a PDF file using Tabula?

I am working on an integration with Apache Drill which enables users to query PDF files directly using SQL. I'm about 80% done and really impressed with how well Tabula works for this.
However, when I execute the first Drill query that uses the Tabula libraries a Java icon pops up and I get the following text in the command line:
2020-10-25 15:06:55.770 java[71188:7121498] Persistent UI failed to open file file://localhost/Users/******/Saved%20Application%20State/net.java.openjdk.cmd.savedState/window_1.data: Permission denied (13)
I changed the permissions on that directory but I'm still getting the Java popup.
This is not normal behavior for Drill and my goal here was to integrate Tabula programmatically. Is Tabula trying to open a window or something like that and if so, is there a way to disable this? I noted that this does not occur in my unit tests.
Here are some relevant code snippets:
public static List<Table> extractTablesFromPDF(PDDocument document, ExtractionAlgorithm algorithm) {
    NurminenDetectionAlgorithm detectionAlgorithm = new NurminenDetectionAlgorithm();
    ExtractionAlgorithm algExtractor;
    SpreadsheetExtractionAlgorithm extractor = new SpreadsheetExtractionAlgorithm();
    ObjectExtractor objectExtractor = new ObjectExtractor(document);
    PageIterator pages = objectExtractor.extract();
    List<Table> tables = new ArrayList<>();
    while (pages.hasNext()) {
        Page page = pages.next();
        algExtractor = algorithm;
        /*if (extractor.isTabular(page)) {
            algExtractor = new SpreadsheetExtractionAlgorithm();
        } else {
            algExtractor = new BasicExtractionAlgorithm();
        }*/
        List<Rectangle> tablesOnPage = detectionAlgorithm.detect(page);
        for (Rectangle guessRect : tablesOnPage) {
            Page guess = page.getArea(guessRect);
            tables.addAll(algExtractor.extract(guess));
        }
    }
    return tables;
}
Thanks in advance for your help!
Because some code is executed that does an operation that is usually, but technically not necessarily, involved in things that require so-called 'headful' mode (well, that's perhaps not really a term, but the opposite, 'headless' certainly is). This causes a few things to happen, including that icon showing up.
One easy way out of this is to force headless mode. But note that when you do this, any of these 'usually but technically not necessarily headful' operations may either [1] work fine and no longer show that icon, or [2] crash with a HeadlessException. Which one you end up with depends not just on which operation you're doing, but also on which VM you are doing it on; as a rule, once one of these ops works fine and no longer throws, later versions won't revert back to throwing (in other words, newer versions of Java offer more things that work in headless mode).
To force headless mode, run java with java -Djava.awt.headless=true.
If you must do it from within java code, run System.setProperty("java.awt.headless", "true"); at least once, and before you do any of these 'usually causes headful mode' operations.
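For instance, a minimal sketch; the class name is invented, and the point is simply that the property is set before any AWT-touching code runs:
public class PdfTableReader {
    static {
        // Set before any graphics code (e.g. PDF image rendering) executes.
        System.setProperty("java.awt.headless", "true");
    }

    public static void main(String[] args) {
        // Confirms the VM is headless; no dock icon should appear.
        System.out.println(java.awt.GraphicsEnvironment.isHeadless());
    }
}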
Presumably, the thing that causes headful mode to occur is something graphics-related, such as rendering a JPG or PNG into a BufferedImage. It's not surprising that Apache Drill is doing this to 'read' images, for example.
Another option is to just upgrade your VM; maybe that helps. As a general rule, features 'move downwards' on this line:
1. Requires headful mode: running it makes the VM go headful (icon appears); if java.awt.headless is set, the operation fails with a HeadlessException.
2. Causes headful mode: running it makes the VM go headful. However, if headless is set, it works fine and won't do that.
3. Completely freed: running the code works fine and does not cause the VM to go headful. The headless flag has no bearing whatsoever on how the code operates.

Getting CPU 100 percent when I am trying to download CSV in Spring

I am getting a CPU performance issue on the server when I try to download a CSV in my project: CPU goes to 100%, but SQL returns the response within 1 minute. We are writing around 600K records to the CSV. For one user it works fine, but for concurrent users we are getting this issue.
Environment
Spring 4.2.5
Tomcat 7/8 (RAM 2GB Allocated)
MySQL 5.0.5
Java 1.7
Here is the Spring controller code:
@RequestMapping(value = "csvData")
public void getCSVData(HttpServletRequest request,
        HttpServletResponse response,
        @RequestParam(value = "param1", required = false) String param1,
        @RequestParam(value = "param2", required = false) String param2,
        @RequestParam(value = "param3", required = false) String param3) throws IOException {
    List<Log> logs = service.getCSVData(param1, param2, param3);
    response.setHeader("Content-type", "application/csv");
    response.setHeader("Content-disposition", "inline; filename=logData.csv");
    PrintWriter out = response.getWriter();
    out.println("Field1,Field2,Field3,.......,Field16");
    for (Log row : logs) {
        out.println(row.getField1() + "," + row.getField2() + "," + row.getField3() + "......" + row.getField16());
    }
    out.flush();
    out.close();
}
Persistence code (I am using Spring JdbcTemplate):
@Override
public List<Log> getCSVLog(String param1, String param2, String param3) {
    String sql = SqlConstants.CSV_ACTIVITY.toString();
    List<Log> csvLog = jdbcTemplate.query(sql, new Object[]{param1, param2, param3},
            new RowMapper<Log>() {
                @Override
                public Log mapRow(ResultSet rs, int rowNum) throws SQLException {
                    Log log = new Log();
                    log.setField1(rs.getInt("field1"));
                    log.setField2(rs.getString("field2"));
                    log.setField3(rs.getString("field3"));
                    // ... fields 4 through 15 mapped the same way ...
                    log.setField16(rs.getString("field16"));
                    return log;
                }
            });
    return csvLog;
}
I think you need to be specific on what you meant by "100% CPU usage" whether it's the Java process or MySQL server. As you have got 600K records, trying to load everything in to memory would easily end up in OutOfMemoryError. Given that this works for one user means that you've got enough heap space to process this number of records for just one user and symptoms surface when there are multiple users trying to use the same service.
The first issue I can see in your posted code is that you load everything into one big list, and the size of the list varies based on the content of the Log class. Using a list like this also means you have to have enough memory to process the JDBC result set and generate a new list of Log instances. This can be a major problem with a growing number of users. These short-lived objects cause frequent GC, and once GC cannot keep up with the amount of garbage being created, the application obviously fails. To solve this major issue, my suggestion is to use a scrollable ResultSet. Additionally, you can make the result set read-only; for example, below is a code fragment for creating a scrollable result set. Take a look at the documentation for how to use it.
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
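For illustration, here is a hypothetical continuation of that fragment; conn, sql, and the output writer are assumed from the surrounding code, and the column names are invented:
Statement st = conn.createStatement(ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_READ_ONLY);
ResultSet rs = st.executeQuery(sql);
while (rs.next()) {
    // Write each row out immediately; nothing is accumulated in memory.
    out.println(rs.getInt("field1") + "," + rs.getString("field2"));
}
rs.close();
st.close();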
The above option is suitable if you're using plain JDBC or the Spring JDBC template. If Hibernate is already used in your project, you can still achieve the same with the code fragment below. Again, please check the documentation for more information, especially if you have a different JPA provider.
StatelessSession session = sessionFactory.openStatelessSession();
Query query = session.createSQLQuery(queryStr).setCacheable(false).setFetchSize(Integer.MIN_VALUE).setReadOnly(true);
query.setParameter(query_param_key, query_paramter_value);
ScrollableResults resultSet = query.scroll(ScrollMode.FORWARD_ONLY);
This way you're not loading all the records into the Java process in one go; instead they're loaded on demand, and you will have a small memory footprint at any given time. Note that the JDBC connection will be open until you're done processing the entire record set. This also means that your DB connection pool can be exhausted if many users are going to download CSV files from this endpoint. You need to take measures to overcome this problem (e.g. use an API manager to rate-limit the calls to this endpoint, read from a read replica, or whatever viable option).
My other suggestion is to stream the data, which you have already partly done, so that any records fetched from the DB are processed and sent to the client before the next set of records is processed. Again, I would suggest using a CSV library such as Super CSV to handle this, as these libraries are designed to handle a good load of data.
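As a rough sketch of that combination (streaming rows straight to the response rather than building a 600K-element list first), assuming a JdbcTemplate and invented column names; a CSV library would replace the hand-rolled println:
import java.io.PrintWriter;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.servlet.http.HttpServletResponse;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;

public class CsvStreamer {
    static void streamCsv(JdbcTemplate jdbcTemplate, HttpServletResponse response,
                          String sql, Object[] params) throws java.io.IOException {
        jdbcTemplate.setFetchSize(Integer.MIN_VALUE); // MySQL Connector/J: row-by-row streaming
        final PrintWriter out = response.getWriter();
        out.println("Field1,Field2,Field3");
        jdbcTemplate.query(sql, params, new RowCallbackHandler() {
            @Override
            public void processRow(ResultSet rs) throws SQLException {
                // Each row is written and discarded, keeping the heap footprint small.
                out.println(rs.getInt("field1") + "," + rs.getString("field2") + "," + rs.getString("field3"));
            }
        });
        out.flush();
    }
}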
Please note that this answer may not exactly answer your question, as you haven't provided necessary parts of your source (such as how you retrieve data from the DB), but it should give you the right direction to solve this issue.
Your problem is in loading all the data onto the application server from the database at once. Try running the query with limit and offset parameters (with a mandatory ORDER BY), push the loaded records to the client, and load the next part of the data with a different offset. This helps you decrease the memory footprint and does not require keeping the database connection open the whole time. Of course, the database will be loaded a bit more, but maybe the whole situation will be better. Try different limit values, for example 5K-50K, and monitor CPU usage on both the app server and the database.
If you can afford to keep many open connections to the database, @Bunti's answer is very good.
http://dev.mysql.com/doc/refman/5.7/en/select.html
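As a minimal sketch of that limit/offset loop; the table name, ordering column, and CSV handling are invented, and each page is flushed to the client before the next is fetched:
import java.io.PrintWriter;
import java.util.List;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowMapper;

public class PagedCsvWriter {
    static <T> void writePaged(JdbcTemplate jdbcTemplate, RowMapper<T> rowMapper, PrintWriter out) {
        final int limit = 10000;
        for (int offset = 0; ; offset += limit) {
            List<T> page = jdbcTemplate.query(
                    "SELECT * FROM log_table ORDER BY id LIMIT ? OFFSET ?",
                    new Object[]{limit, offset}, rowMapper);
            if (page.isEmpty()) {
                break; // no more rows
            }
            for (T row : page) {
                out.println(row); // assumes a CSV-friendly toString(); purely illustrative
            }
            out.flush(); // push this chunk to the client before fetching the next page
        }
    }
}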

SpreadsheetAddRows failing on moderate size query

Edit: I changed the name, as there is a similar SO question, How do I fix SpreadSheetAddRows function crashing when adding a large query?, that describes my issue, so I phrased mine more succinctly. The issue is that spreadsheetAddRows for my query result bombs the entire server at what I consider a moderate size (1600 rows, 27 columns), but that sounds considerably less than his 18,000 rows.
I am using an Oracle stored procedure, accessed via ColdFusion 9.0.1 cfstoredproc, that on completion creates a spreadsheet for the user to download.
The issue is that result sets greater than, say, 1200 rows return a 500 Internal Server Error, while 700 rows return fine, so I am guessing it is a memory problem?
The only message I received, other than the 500 Internal Server Error in the standard ColdFusion look, was "gc overhead limit exceeded" in small print (and that only once, on a page refresh), which refers to the underlying Java JVM.
I am not even sure how to go about diagnosing this.
Here is the end of the cfstoredproc and the spreadsheet object:
<!--- variables assigned correctly above --->
<cfprocresult name="RC1">
</cfstoredproc>
<cfset sObj = spreadsheetNew("reconcile","yes")>
<cfset SpreadsheetAddRow(sObj, "Column_1, ... , Column27")>
<cfset SpreadsheetFormatRow(sObj, {bold=TRUE, alignment="center"}, 1)>
<cfset spreadsheetAddRows(sObj, RC1)>
<cfheader name="content-disposition" value="attachment; filename=report_#Dateformat(NOW(),'MMDDYYYY')#.xlsx">
<cfcontent type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" variable="#spreadsheetReadBinary(sObj)#">
My answer lies with ColdFusion and one simple fact: do not use SpreadsheetAddRows or any of those related functions like SpreadsheetFormatRows.
My solution was to execute the query, create an xls file, use the cfspreadsheet tag to write to the newly created xls file, then serve it to the browser, deleting it after serving.
Using SpreadsheetAddRows: the server crashed on 1000+ rows, and 700 rows took 5+ minutes.
Using the method outlined above: 1-1.5 seconds.
If you are interested in more code I can provide it, just comment; I am using the ColdBox framework, so I didn't think the specifics would help, just the new workflow.

Get table names from Lotus Notes database

I'm trying to write a program that would dump a whole Lotus Notes database to a file via the NotesSQL driver. I'm connecting via jdbc:odbc, and I can execute selects and get data from the Lotus Notes database.
Here is the code:
try {
    System.out.print("Connecting... ");
    Connection con = DriverManager.getConnection("jdbc:odbc:NRC", "UserName", "Passw0rd1337");
    System.out.println("OK");
    DatabaseMetaData dmd = con.getMetaData();
    String[] tableTypes = new String[] {"TABLE", "VIEW"};
    ResultSet rs = dmd.getTables(null, null, "%", tableTypes);
    ResultSetMetaData rsd = rs.getMetaData();
    while (rs.next()) {
        for (int i = 1; i <= rsd.getColumnCount(); i++)
            System.out.println(i + " - " + rsd.getColumnName(i) + " - " + rs.getString(1));
    }
    con.close();
    System.out.println("Connection closed");
} catch (Exception e) {
    System.out.println(e);
}
And is there a better way to connect to Lotus Notes databases via NotesSQL? Because with my code I get only null values for the names...
I know you are trying to use JDBC and NotesSQL. But, depending on your needs, and using Eclipse, you can access Notes databases natively via Java, which frankly is a lot easier than trying to use JDBC; it's a bit of a square peg in a round hole when you're using JDBC with Domino. Even if you don't have Lotus Notes installed on the host machine, you can still write and deploy Java applets and servlets to get into the data.
You will need to get the relevant Lotus Domino jars, though. So, my recommendation is an alternative approach to JDBC.
Providing you have the Lotus Domino jar files in your Eclipse project, you should be able to code any kind of extract from a view, or even run ad hoc searches on a database.
Setup
If this sounds like the direction you need to go, then firstly have a look at setting up Eclipse with the relevant Notes jars here. There are only a few. (Sometimes you'll read about using CORBA and/or IIOP. Try to avoid that; it's just a world of hurt.)
Samples and snippets
This developerWorks article (although 6 years old) still works for Domino and is a sound foundation for the approach I am advocating. That article starts to address the initialization of the NotesFactory and Session classes to get you into the Notes API. There is more online help here for the NotesFactory class.
If you have a Lotus Notes client available you can have a look through code snippets here. A classic example for accessing documents via Views in Java can be found here.
After that you can easily access views and documents, with examples from here, and learn from the guru (Bob Balaban) about memory management here.
If you're processing high volumes or running servlets, then memory management is important; otherwise, don't stress about it too much. You can execute native searches on a Notes database by writing the query in formula language and then using the "search" methods to execute it natively.
Iterating through documents or views?
The easiest approach is to traverse documents via views and/or use "getDocumentByKey" methods to get a collection and work on that. In Domino, "Views" are the equivalent of tables. You can also get a list of views via the Database.Views property.
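As a rough illustration of this approach; the server name, database path, view name, and item name are all invented, and Notes.jar must be on the classpath:
import lotus.domino.*;

public class NotesDump {
    public static void main(String[] args) throws NotesException {
        NotesThread.sinitThread(); // required for local (non-DIIOP) sessions
        try {
            Session session = NotesFactory.createSession();
            Database db = session.getDatabase("ServerName", "names.nsf");
            View view = db.getView("People"); // a view plays the role of a table
            Document doc = view.getFirstDocument();
            while (doc != null) {
                System.out.println(doc.getItemValueString("FullName"));
                Document next = view.getNextDocument(doc);
                doc.recycle(); // release the backing native handle
                doc = next;
            }
        } finally {
            NotesThread.stermThread();
        }
    }
}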
Native Queries
It's difficult to find definitive instructions on native queries for Notes, but I have managed to find some online here.
