I am exploring Impala for a POC, but I am not seeing the performance I expected. I can't reach 5,000 inserts/sec; the best I have managed is a mere 200/sec, which is very slow by the standards of any database.
I tried two different methods but both are slow:
Using Cloudera
First, I installed Cloudera on my system and added the latest CDH 6.2 cluster. I wrote a Java client that inserts data using the ImpalaJDBC41 driver. I am able to insert records, but the speed is terrible. I tried tuning Impala by increasing the Impala daemon limit and my system RAM, but it didn't help. Finally, I suspected that something was wrong with my installation, so I switched to another method.
Using Cloudera VM
Cloudera also ships a ready-made VM for testing purposes. I tried it to see whether it gives better performance, but there was no big improvement. I still can't insert data at 5k/sec.
I don't know where I need to improve things. I have pasted my code below in case it can be improved.
What is the ideal Impala configuration to achieve a rate of 5k-10k inserts/sec? Even that is far below what Impala should be capable of.
private static Connection connectViaDS() throws Exception {
Connection connection = null;
Class.forName("com.cloudera.impala.jdbc41.Driver");
connection = DriverManager.getConnection(CONNECTION_URL);
return connection;
}
private static void writeInABatchWithCompiledQuery(int records) {
int protocol_no = 233,s_port=20,d_port=34,packet=46,volume=58,duration=39,pps=76,
bps=65,bpp=89,i_vol=465,e_vol=345,i_pkt=5,e_pkt=54,s_i_ix=654,d_i_ix=444,_time=1000,flow=989;
String s_city = "Mumbai",s_country = "India", s_latt = "12.165.34c", s_long = "39.56.32d",
s_host="motadata",d_latt="29.25.43c",d_long="49.15.26c",d_city="Damouli",d_country="Nepal";
long e_date= 1275822966, e_time= 1370517366;
PreparedStatement preparedStatement;
int total = 1000*1000;
int counter =0;
Connection connection = null;
try {
connection = connectViaDS();
preparedStatement = connection.prepareStatement(sqlCompiledQuery);
Timestamp ed = new Timestamp(e_date);
Timestamp et = new Timestamp(e_time);
while(counter <total) {
for (int index = 1; index <= 5000; index++) {
counter++;
preparedStatement.setString(1, "s_ip" + String.valueOf(index));
preparedStatement.setString(2, "d_ip" + String.valueOf(index));
preparedStatement.setInt(3, protocol_no + index);
preparedStatement.setInt(4, s_port + index);
preparedStatement.setInt(5, d_port + index);
preparedStatement.setInt(6, packet + index);
preparedStatement.setInt(7, volume + index);
preparedStatement.setInt(8, duration + index);
preparedStatement.setInt(9, pps + index);
preparedStatement.setInt(10, bps + index);
preparedStatement.setInt(11, bpp + index);
preparedStatement.setString(12, s_latt + String.valueOf(index));
preparedStatement.setString(13, s_long + String.valueOf(index));
preparedStatement.setString(14, s_city + String.valueOf(index));
preparedStatement.setString(15, s_country + String.valueOf(index));
preparedStatement.setString(16, d_latt + String.valueOf(index));
preparedStatement.setString(17, d_long + String.valueOf(index));
preparedStatement.setString(18, d_city + String.valueOf(index));
preparedStatement.setString(19, d_country + String.valueOf(index));
preparedStatement.setInt(20, i_vol + index);
preparedStatement.setInt(21, e_vol + index);
preparedStatement.setInt(22, i_pkt + index);
preparedStatement.setInt(23, e_pkt + index);
preparedStatement.setInt(24, s_i_ix + index);
preparedStatement.setInt(25, d_i_ix + index);
preparedStatement.setString(26, s_host + String.valueOf(index));
preparedStatement.setTimestamp(27, ed);
preparedStatement.setTimestamp(28, et);
preparedStatement.setInt(29, _time);
preparedStatement.setInt(30, flow + index);
preparedStatement.addBatch();
}
preparedStatement.executeBatch();
preparedStatement.clearBatch();
}
} catch (Exception e) {
e.printStackTrace();
} finally {
try {
// connection is still null if connectViaDS() threw, so guard the close
if (connection != null) connection.close();
} catch (SQLException e) {
e.printStackTrace();
}
}
}
Data is being written at a snail's pace. I tried increasing the batch size, but that made it even slower. I don't know whether my code is wrong or whether I need to tune Impala for better performance. Please guide me.
I am using a VM for testing. Here are the other details:
System.
OS - Ubuntu 16
RAM - 12 GB
Cloudera - CDH 6.2
Impala daemon limit - 2 GB
Java heap size of Impala daemon - 500 MB
HDFS Java Heap Size of NameNode in Bytes - 500 MB
Please let me know if more details are required.
You can't meaningfully benchmark on a VM with 12 GB. Look at Impala's hardware requirements and you'll see that 128 GB of memory or more is recommended.
Memory
128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query processing on a particular node exceed the amount of memory available to Impala on that node, the query writes temporary work data to disk, which can lead to long query times. Note that because the work is parallelized, and intermediate results for aggregate queries are typically smaller than the original data, Impala can query and join tables that are much larger than the memory available on an individual node.
Also, the VM is meant for familiarizing yourself with the toolset; it is not powerful enough even to serve as a development environment.
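When you do move to bigger hardware, it helps to compare runs using the same measurement. The following is only a minimal sketch of such a timing helper; the class and method names are hypothetical (they are not part of Impala or the JDBC driver), and the sleep in main is a stand-in for a real executeBatch() call.

```java
/** Hypothetical timing helper for comparing insert runs; not part of any Impala API. */
class InsertThroughput {

    /** Rows per second for a run that wrote the given rows in elapsedNanos nanoseconds. */
    static double rowsPerSecond(long rows, long elapsedNanos) {
        if (elapsedNanos <= 0) {
            throw new IllegalArgumentException("elapsedNanos must be positive");
        }
        return rows * 1_000_000_000.0 / elapsedNanos;
    }

    /** Seconds needed to write the given number of rows at a fixed rate. */
    static double secondsFor(long rows, double rowsPerSecond) {
        if (rowsPerSecond <= 0) {
            throw new IllegalArgumentException("rowsPerSecond must be positive");
        }
        return rows / rowsPerSecond;
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(50); // stand-in for the real work, e.g. one batch of 5000 rows
        long elapsed = System.nanoTime() - start;
        System.out.printf("measured: %.0f rows/sec%n", rowsPerSecond(5000, elapsed));

        // Time to load the 1,000,000 rows from the question at the observed and target rates.
        System.out.printf("at 200 rows/sec:  %.0f s%n", secondsFor(1_000_000, 200));
        System.out.printf("at 5000 rows/sec: %.0f s%n", secondsFor(1_000_000, 5000));
    }
}
```

Timing each batch with System.nanoTime() rather than the whole program keeps JVM startup and connection setup out of the number.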
References
Impala Requirements: Hardware Requirements
Tuning Impala for Performance
Can we add a PowerModel to a virtual machine, as we do for a host in CloudSim (the simulation tool), so that we can track the power consumption of each virtual machine?
Using CloudSim Plus, you can compute the CPU usage and power consumption of a VM by adding the following code to your example:
private void printVmsCpuUtilizationAndPowerConsumption() {
for (Vm vm : vmList) {
System.out.println("Vm " + vm.getId() + " at Host " + vm.getHost().getId() + " CPU Usage and Power Consumption");
double vmPower; //watt-sec
double utilizationHistoryTimeInterval, prevTime = 0;
final UtilizationHistory history = vm.getUtilizationHistory();
for (final double time : history.getHistory().keySet()) {
utilizationHistoryTimeInterval = time - prevTime;
vmPower = history.powerConsumption(time);
final double wattsPerInterval = vmPower*utilizationHistoryTimeInterval;
System.out.printf(
"\tTime %8.1f | Host CPU Usage: %6.1f%% | Power Consumption: %8.0f Watt-Sec * %6.0f Secs = %10.2f Watt-Sec\n",
time, history.vmCpuUsageFromHostCapacity(time) *100, vmPower, utilizationHistoryTimeInterval, wattsPerInterval);
prevTime = time;
}
System.out.println();
}
}
You don't implement a specific PowerModel for VMs. A VM's power consumption is determined by its CPU utilization and the Host's PowerModel.
You can get the complete example here.
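To make the arithmetic behind that loop concrete, here is a standalone sketch that does not use CloudSim Plus at all. Everything in it is an assumption for illustration: the class and method names are hypothetical, the utilization history is a plain sorted map of time to CPU utilization (0 to 1), and the linear power function with 100 W idle and 200 W at full load is made up. How idle power should be apportioned among VMs is a modelling choice that this sketch does not settle.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

/** Standalone illustration; these names are hypothetical and are not CloudSim Plus API. */
class VmEnergySketch {

    /**
     * Sums power * interval over a (time -> utilization) history.
     * powerModel maps a utilization in [0, 1] to watts; the result is in watt-seconds.
     */
    static double energyWattSec(SortedMap<Double, Double> history, DoubleUnaryOperator powerModel) {
        double prevTime = 0;
        double total = 0;
        for (Map.Entry<Double, Double> entry : history.entrySet()) {
            double interval = entry.getKey() - prevTime;
            total += powerModel.applyAsDouble(entry.getValue()) * interval;
            prevTime = entry.getKey();
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumed linear model: 100 W when idle, 200 W at 100% utilization.
        DoubleUnaryOperator linear = u -> 100 + (200 - 100) * u;

        SortedMap<Double, Double> history = new TreeMap<>();
        history.put(10.0, 0.5); // 50% utilization during the first 10 s
        history.put(20.0, 1.0); // 100% utilization during the next 10 s

        System.out.printf("Energy: %.1f watt-seconds%n", energyWattSec(history, linear));
    }
}
```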
I'm trying to get the total RAM in Java using the Sigar library. I do the following:
return String.valueOf(sigar.getMem().getRam());
My total RAM is 4GB, so I was expecting 4 or 4.00 but the result is 4008.
I tried the following:
long x = sigar.getMem().getTotal();
final String[] units = new String[] { "B", "KB", "MB", "GB", "TB" };
int digitGroups = (int) (Math.log10(x) / Math.log10(1024));
return new DecimalFormat("#,##0.##")
.format(x / Math.pow(1024, digitGroups)) + " " + units[digitGroups];
but I still get a result of 3.91 GB.
Any ideas?
You probably have 3.91 gigabytes physically available, and the math from Sigar is correct.
Check if your BIOS is reserving some RAM for the integrated graphics device or some other devilry.
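The two numbers in the question are consistent with each other if 4008 is a value in megabytes (as far as I know, that is the unit getRam() reports): 4008 / 1024 is about 3.91. The sketch below reruns the question's formatting logic on that figure without needing Sigar. The class name is hypothetical, and 4008 MB is assumed as the input only because it is the number quoted above; the real byte count from getTotal() may differ slightly.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical helper reproducing the question's unit conversion without Sigar. */
class RamUnits {
    private static final String[] UNITS = {"B", "KB", "MB", "GB", "TB"};

    /** Formats a byte count using 1024-based units, with at most two decimals. */
    static String humanReadable(long bytes) {
        if (bytes <= 0) {
            return "0 B";
        }
        int group = (int) (Math.log10(bytes) / Math.log10(1024));
        group = Math.min(group, UNITS.length - 1);
        // Fixed symbols so the decimal separator does not depend on the default locale.
        DecimalFormat format = new DecimalFormat("#,##0.##", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(bytes / Math.pow(1024, group)) + " " + UNITS[group];
    }

    public static void main(String[] args) {
        long reported = 4008L * 1024 * 1024;   // 4008 MB expressed in bytes
        long nominal = 4L * 1024 * 1024 * 1024; // a full 4 GB for comparison
        System.out.println(humanReadable(reported)); // 3.91 GB
        System.out.println(humanReadable(nominal));  // 4 GB
    }
}
```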
I have a block of code that lets me retrieve all the apps/services running on my Android device, including the app that I am building. I am not entirely sure I am on the right path, but because I am debugging on Android 4.3 I would like to use ActivityManager.RunningServiceInfo.activeSince (per service/app) and subtract it from SystemClock.elapsedRealtime(), which I understand is the total number of milliseconds since boot. So, for example, if the device was rebooted at 10:00, WhatsApp was started at 10:15, and the current time is 10:30, I want to be able to use these values to get a close estimate of the amount of time spent on WhatsApp. I have a feeling that this is not the most elegant way to achieve this, so I am very open to any advice. My code so far is below. For now I am using Android 4.3.
ActivityManager am = (ActivityManager)this.getSystemService(Context.ACTIVITY_SERVICE);
List<ActivityManager.RunningServiceInfo> services = am.getRunningServices(Integer.MAX_VALUE);
for (ActivityManager.RunningServiceInfo info : services) {
cal.setTimeInMillis(currentMillis-info.activeSince);
long millisSinceBoot = SystemClock.elapsedRealtime();
long appStartTime = info.activeSince;
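// Elapsed running time: "now since boot" minus "started since boot" (the reverse order is negative)
long appDuration = millisSinceBoot - appStartTime;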
//long time = ((millisSinceBoot - values.get(position).activeSince)/1000);
//long time = ((millisSinceBoot - currentMillis-info.activeSince)/1000);
//Log.i("HRHHRHRHRHR", "%%%%%%%%%%%%%%%%"+time);
//String time1 = String.valueOf(time);
int seconds = (int) (appDuration / 1000) % 60 ;
int minutes = (int) ((appDuration / (1000*60)) % 60);
int hours = (int) ((appDuration / (1000*60*60)) % 24);
String time11 = hours+":"+minutes+":"+seconds;
Log.i("Time", "Secs:- " + seconds + " " + "Mins:- " + minutes + " " + "Hours:- " + hours);
Log.i(TAG, String.format("Process %s with component %s has been running since %s (%d milliseconds)",
info.process, info.service.getClassName(), cal.getTime().toString(), info.activeSince ));
}
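The subtraction described above can be tried off-device with plain Java. This is only a sketch under stated assumptions: the class and method names are hypothetical, both inputs are taken to be milliseconds since boot (the same time base as SystemClock.elapsedRealtime()), and the result is how long the service has been active, which is not necessarily the time the user spent in the app's foreground.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; plain Java so it can be run without an Android device. */
class ServiceUptime {

    /** Milliseconds a service has been active, given two readings on the since-boot clock. */
    static long elapsedMillis(long nowSinceBootMillis, long activeSinceMillis) {
        return Math.max(0, nowSinceBootMillis - activeSinceMillis);
    }

    /** Formats a duration in milliseconds as H:MM:SS. */
    static String format(long millis) {
        long hours = TimeUnit.MILLISECONDS.toHours(millis);
        long minutes = TimeUnit.MILLISECONDS.toMinutes(millis) % 60;
        long seconds = TimeUnit.MILLISECONDS.toSeconds(millis) % 60;
        return String.format(Locale.ROOT, "%d:%02d:%02d", hours, minutes, seconds);
    }

    public static void main(String[] args) {
        // Example from the question: boot at 10:00, service started at 10:15, now 10:30.
        long activeSince = TimeUnit.MINUTES.toMillis(15);
        long now = TimeUnit.MINUTES.toMillis(30);
        System.out.println(format(elapsedMillis(now, activeSince))); // 0:15:00
    }
}
```

Hours are deliberately not wrapped with % 24 here, so a service that has been running for more than a day still reports its full duration.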
How can I get the RAM size and hard disk size of a PC using Java? And is it possible to get the name of the currently logged-in user through Java?
Disk size:
long diskSize = new File("/").getTotalSpace();
User name:
String userName = System.getProperty("user.name");
I'm not aware of a reliable way to determine total system memory in Java. On a Unix system you could parse /proc/meminfo. You can of course find the maximum memory available to the JVM:
long maxMemory = Runtime.getRuntime().maxMemory();
Edit: for completeness (thanks Suresh S), here's a way to get total memory with the Oracle JVM only:
long memorySize = ((com.sun.management.OperatingSystemMXBean) ManagementFactory
.getOperatingSystemMXBean()).getTotalPhysicalMemorySize();
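For the /proc/meminfo route mentioned above, a minimal sketch might look like the following. It assumes the usual Linux layout, where the file has a line such as "MemTotal:  4104192 kB" with the value in units of 1024 bytes; the class and method names are hypothetical, and on systems without /proc/meminfo the program just says so.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical helper that reads MemTotal from /proc/meminfo-style text (Linux only). */
class MemInfo {

    /** Returns MemTotal in bytes, or an empty result if no such line is present. */
    static OptionalLong memTotalBytes(List<String> lines) {
        for (String line : lines) {
            if (line.startsWith("MemTotal:")) {
                // Expected shape: "MemTotal:       4104192 kB"
                String[] parts = line.trim().split("\\s+");
                return OptionalLong.of(Long.parseLong(parts[1]) * 1024L);
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        if (Files.isReadable(meminfo)) {
            OptionalLong total = memTotalBytes(Files.readAllLines(meminfo));
            System.out.println(total.isPresent()
                    ? "Total RAM: " + total.getAsLong() + " bytes"
                    : "MemTotal line not found");
        } else {
            System.out.println("/proc/meminfo is not available on this system");
        }
    }
}
```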
For RAM size, if you are using Java 1.5 or later, use the java.lang.management package:
com.sun.management.OperatingSystemMXBean mxbean = (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
System.out.println(mxbean.getTotalPhysicalMemorySize() + " Bytes ");
import java.lang.management.*;
import java.io.*;
class max
{
public static void main(String... a)
{
long diskSize = new File("/").getTotalSpace();
String userName = System.getProperty("user.name");
long maxMemory = Runtime.getRuntime().maxMemory();
long memorySize = ((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getTotalPhysicalMemorySize();
System.out.println("Disk size="+diskSize+" Bytes"); // "/" is the root of the current drive, not necessarily C:
System.out.println("User Name="+userName);
System.out.println("RAM Size="+memorySize+" Bytes");
}
}
Have a look at this topic, which goes into detail of how to get OS information such as this.
For RAM capacity:
// Total physical RAM in bytes (com.sun.management extension, so not available on every JVM)
long ram = ((com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean()).getTotalPhysicalMemorySize();
// Integer division by 1000 gives decimal units and drops any fractional part
long sizekb = ram / 1000;
long sizemb = sizekb / 1000;
long sizegb = sizemb / 1000;
System.out.println("System RAM = " + sizegb + " GB");
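Two things are easy to trip over with the cast-and-divide approach: the cast fails on JVMs that do not provide the com.sun.management extension, and integer division discards the fraction. The sketch below guards the cast with instanceof and converts with floating-point division using 1024-based units. The class and method names are hypothetical, and returning -1 when the extension is missing is just a convention chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper; relies on the com.sun.management extension when it is present. */
class SystemRam {

    /** Converts bytes to 1024-based gigabytes without truncating the fraction. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    /** Total physical memory in bytes, or -1 if this JVM does not expose it. */
    static long totalPhysicalBytes() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.OperatingSystemMXBean) {
            return ((com.sun.management.OperatingSystemMXBean) os).getTotalPhysicalMemorySize();
        }
        return -1;
    }

    public static void main(String[] args) {
        long bytes = totalPhysicalBytes();
        if (bytes < 0) {
            System.out.println("Total physical memory is not exposed by this JVM");
        } else {
            System.out.printf("System RAM = %.2f GB (%d bytes)%n", toGiB(bytes), bytes);
        }
    }
}
```

On newer JDKs getTotalPhysicalMemorySize() is marked deprecated in favor of getTotalMemorySize(), so expect a compiler warning there.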