Writing a collector for OpenTSDB

Writing a collector for OpenTSDB - java

I don't understand from the official OpenTSDB documentation how to create a collector and how to get it running. In addition, I would like to write a collector in Java.
I'm also a bit new to Unix systems, but I know the basics.

Writing a collector for OpenTSDB is quite simple. If you have cloned tcollector from its git repository, you will see the startstop executable. Once launched, this daemon executes all files stored inside ./tcollector/collectors/NUMBER, where NUMBER is the interval in seconds between runs (the 0 directory is for long-running collectors that keep running and printing on their own).
That said, what you need to do is write the scripts that are stored inside the collectors folder. When tcollector executes those scripts, it expects each line of output to have the following format:
<METRIC> <UNIX_TIMESTAMP> <VALUE>
So, in your case, imagine you want to report the temperature of your PC every 5 minutes. You would follow these steps:
Write your script, for example in Java, that gets the temperature of your PC (using SNMP, from the OS, or using any other method). When you run the script manually, it should output a line such as: pc.temperature 1371075574 40
Put the script under ./tcollector/collectors/300/ so that tcollector launches it every 300 seconds, that is, every 5 minutes.
Launch the collector by invoking startstop (OpenTSDB must be running).
A more detailed explanation here.
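As a rough sketch of the Java side, a collector only has to print one line in the format above to standard output. The class name, the metric name and the readTemperature() placeholder below are illustrative assumptions, not part of tcollector. Since tcollector runs executable files, you would presumably also need a small wrapper script in the collectors folder that calls java on this class.

```java
import java.time.Instant;

public class PcTemperatureCollector {
    // Metric name taken from the example above; pick whatever fits your setup.
    static final String METRIC = "pc.temperature";

    // Builds one data point as "<METRIC> <UNIX_TIMESTAMP> <VALUE>".
    static String formatLine(String metric, long epochSeconds, long value) {
        return metric + " " + epochSeconds + " " + value;
    }

    // Placeholder (assumption): replace with a real reading via SNMP, OS sensors, etc.
    static long readTemperature() {
        return 40;
    }

    public static void main(String[] args) {
        long now = Instant.now().getEpochSecond(); // Unix timestamp in seconds
        System.out.println(formatLine(METRIC, now, readTemperature()));
    }
}
```

Running it by hand should print a single line shaped like the pc.temperature example, with the current timestamp.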

Related

Is it better to launch a Java app once and sleep or repeat launching and killing?

I have a Java application that needs to run several times. Every time it runs, it checks if there's data to process and if so, it processes the data.
I'm trying to figure out what's the best approach (performance, resource consumption, etc.) to do this:
1.- Launch it once, and if there's nothing to process make it sleep (All Java).
2.- Using a bash script to launch the Java app, and when it finishes, sleep (the script) and then relaunch the java app.
I was wondering if it is best to keep the Java app alive (sleeping) or relaunching every time.

It's hard to answer your question without the specific context. On the face of it, your question sounds like it could be a premature optimization.
Generally, I suggest you do what's easier for you to do (and to maintain), unless you have good reasons not to. Here are some possible good reasons, pick the ones appropriate to your situation:
For sleeping in Java:
The check of whether there's new data is easier in Java
Starting the Java program takes time or other resources, for example if on startup, your program needs to load a bunch of data
Starting the Java process from bash is complex for some reason - maybe it requires you to fiddle with a bunch of environment variables, files or something else.
For re-launching the Java program from bash:
The check of whether there's new data is easier in bash
Getting the Java process to sleep is complex - maybe your Java process is a complex multi-threaded beast, and stopping, and then re-starting the various threads is complicated.
You need the memory in between Java jobs - killing the Java process entirely would free all of its memory.
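For the "sleep in Java" option, one possible sketch uses a ScheduledExecutorService instead of a hand-written sleep loop. The PollingWorker class and its in-memory queue are hypothetical stand-ins for whatever data source you actually check. The demo uses a very short delay and stops after half a second so it terminates; a real service would use a longer delay and keep running.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PollingWorker {
    // Hypothetical work source; a real program might poll a directory or a database.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();

    void submit(String item) {
        pending.add(item);
    }

    // Drains whatever is pending and returns how many items were handled.
    int processPending() {
        int processed = 0;
        while (pending.poll() != null) {
            processed++; // real processing would happen here
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        PollingWorker worker = new PollingWorker();
        worker.submit("job-1");
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Check for work, then stay idle for the delay before checking again.
        scheduler.scheduleWithFixedDelay(
                () -> System.out.println("processed " + worker.processPending()),
                0, 200, TimeUnit.MILLISECONDS);
        Thread.sleep(500);       // let a few checks run for the demo
        scheduler.shutdownNow(); // a long-running service would skip this
    }
}
```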

I would not keep it alive.
Instead, you can use a job that runs at defined intervals. You can use Jenkins, or you can use the Windows Task Scheduler and configure it to run every 5 minutes (or whatever interval you wish).
Run a batch file with Windows task scheduler
And from your batch file you can do following:
rem Compile the source file
javac JavaFileName.java
rem Run the compiled class
java JavaFileName
See here how to run a Java file from cmd:
How do I run a Java program from the command line on Windows?

I personally would decide based on where the application runs.
If it were my personal computer, I would use the second option with the bash script (resources on my local machine can change a lot due to heavy use of other programs, and at some point I might run out of memory, for example).
If it goes to the cloud (Amazon, Google, whatever), I know exactly what kind of processes are running there (it should not change as dynamically as on my local PC), and a long-running Java process with some scheduler would be fine for me.

disk I/O of a command line java program

I have a simple question, I've read up online but couldn't find a simple solution:
I'm running a java program on the command line as follows which accesses a database:
java -jar myProgram.jar
I would like a simple mechanism to see the number of disk I/Os performed by this program (on OSX).
So far I've come across iotop but how do I get iotop to measure the disk I/O of myProgram.jar?
Do I need a profiler like JProfiler to get this information?

iotop is a utility which gives you top n processes in descending order of IO consumption/utilization.
Most importantly, it is a live monitoring utility, which means its output changes every n seconds (or at whatever interval you specify). Although you can redirect its output to a file, you then need to parse that file and plot the data to get anything meaningful out of it.
I would recommend using sar. You can read more about it here.
It is a low-level system monitoring utility on Linux/Unix and gives you much more data than iotop.
The best thing about sar is that you can collect the data with a daemon while your program is running and analyze it later using ksar.
In my opinion, you can follow the approach below:
Start sar monitoring and collect sar data every n seconds. The value of n depends on the approximate execution time of your program.
Example: if your program takes 10 seconds to execute, then sampling every second is good, but if it takes 1 hour, sample every 30 seconds or every minute. This minimizes the overhead of the sar process while keeping your data meaningful.
Wait for some time (so that you get data before your program starts) and then start your program
Wait for your program to finish executing.
Wait for some time again (so that you get data after your program finishes).
Stop sar.
Monitor/visualize the sar data using ksar. To start with, check disk utilization and then IOPS for the disk.
You can use profilers for the same thing, but they have a few drawbacks:
They need their own agents (which have their own overhead).
Some of them are not free.
Some of them are not easy to set up.
They may or may not provide the data you need.
Besides this, IMHO, using built-in system-level utilities is always beneficial.
I hope this was helpful.

Your Java program ultimately runs as a process on the host system, so you need to filter the monitoring tool's output for your own process ID. Refer to the Scripts section of this blog post.
Also, even though you have tagged the question with OS X, do mention in the question itself that you are using OS X.
If you are looking for offline data, that is provided by the proc filesystem on Unix-based systems, but unfortunately it is missing on OS X. See: Where is the /proc folder on Mac OS X?
/proc on Mac OS X
You might choose to write a small script that dumps data from disk and process monitoring tools for your process ID. The script can find your process ID by process name: put it in a loop that looks for that process name, and start the script before you execute your Java program. When the script finds the process, it keeps dumping the relevant data from the commands you chose, at the intervals you decided on. Once your program ends, the log-dumping script terminates as well.
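The "find the process ID by name" step could also be done from Java itself. This is only a sketch, assuming Java 9 or later for the ProcessHandle API. The ProcessFinder class name is made up, and the default fragment myProgram.jar is taken from the question. The command line of a process may not be visible for processes owned by other users, in which case the lookup simply finds nothing.

```java
import java.util.Optional;

public class ProcessFinder {
    // Returns the PID of the first visible process (other than this JVM)
    // whose command line contains the given fragment.
    static Optional<Long> findPid(String commandLineFragment) {
        long self = ProcessHandle.current().pid();
        return ProcessHandle.allProcesses()
                .filter(p -> p.pid() != self)
                .filter(p -> p.info().commandLine()
                        .map(cmd -> cmd.contains(commandLineFragment))
                        .orElse(false))
                .map(ProcessHandle::pid)
                .findFirst();
    }

    public static void main(String[] args) {
        String fragment = args.length > 0 ? args[0] : "myProgram.jar";
        Optional<Long> pid = findPid(fragment);
        if (pid.isPresent()) {
            // Hand this PID to whichever monitoring command you chose.
            System.out.println("found pid " + pid.get());
        } else {
            System.out.println("no process matching " + fragment);
        }
    }
}
```

A monitoring script would call this lookup in a loop until the process appears, then start dumping data for that PID.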

Executing a shell script in java as a thread

I need to execute a shell script from a Java program. I figured out that I can use ProcessBuilder or Runtime.exec(), but my web server times out every 180 seconds and my script takes longer than that to execute. I do not want to use a process for this approach. Is there any other way in which I can use a thread for this execution?
thanks.

I'm assuming that the response from the script is intended for humans to read.
Good interface design and human nature suggest that if your script takes over 180 seconds to run, it should be run separately from the web server. On Linux, I would suggest putting it into cron and letting it run on a regular basis. You would then only serve the results of the script via the web server, with a response time in seconds instead of minutes.
If your script depends on parameters from the http request, or other information that is only available from within the web server's environment, you have the following choices.
1. If you can figure out the likely combinations of parameters, run the script automatically for each combination of parameters, again only serving the results through the web.
2. If the majority of the time is spent in a single command, and the results of that command don't change much between runs, move that command into a separate script that runs automatically, and use the results of that separate script to build the web response.
3. Break the response up into segments, only showing a portion of the data for each request, allowing the user to page through the response. The script would be rewritten to only request the necessary data for the current page, reducing the amount of time needed to obtain that data.
4. Rewrite the script in a compilable language, which might gain you enough time to make running it for every request reasonable. However, if the problem is a database query, this won't do you any good. You'd have to go with option (3), whether you rewrote it in a compilable language or not.
Without additional information, like an example of the script, or a description of where you're getting the results from, that's the best I can do.

A process can run several threads, but they still are parts of the process.
So, all threads inside a java program are the threads of the java process, and a thread cannot run another program's threads.
A shell script is run by a program: the shell (/bin/bash or /bin/sh).
In any case, a shell script will mostly run other programs in several other processes.
So no, you cannot run a shell inside a Java thread.
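What a Java thread can do is start the shell as a child process and wait for it, so the thread handling the request is not blocked while the script runs. The following is a minimal sketch of that idea, not a way around the separate process. The BackgroundScript class name and the sh -c "sleep 1" command are stand-ins for your real script, and Redirect.DISCARD assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BackgroundScript {
    // Starts the command as a separate OS process and returns its exit code.
    static int run(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // output is ignored in this sketch
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the script", e);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // The worker thread blocks on the script; the calling thread does not.
        Future<Integer> exitCode = executor.submit(() -> run("sh", "-c", "sleep 1"));
        System.out.println("caller is free while the script runs");
        System.out.println("exit code: " + exitCode.get());
        executor.shutdown();
    }
}
```

In a web application, the request would return right after submitting the task, and a later request would check the Future (or a stored result) instead of calling get() immediately.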

In general, if you have code that lives outside your Java program, such as in a separate script, it is hard to justify having the program execute that script when the code could instead be integrated into the program. It is insecure at best: you're basically allowing arbitrary code to be executed by your program, since the outside script is editable. What you are doing sounds to me almost like it should be confined to either a unit test or a build task.
As a unit test task, you could use a threaded JUnit runner to run your outside script during the test phase of your project.
Separately from your program, you could also execute it from a Gradle task, using Gradle's maxParallelForks option.

Program execution in Eclipse is very slow when compared to command prompt

I have created a Java program which reads encrypted files from local system and does some processing. Actually I have 20 files to read so I have used threading mechanism to speed up the program execution.
When I run the program in Eclipse it takes more than 30 minutes to complete the execution, whereas if I make a runnable jar and execute the program using command prompt, it takes less than a minute.
Why does running programs in Eclipse take more time than running them in command prompt?

Eclipse's Console view that captures System.out is notoriously slow compared to the regular stdout of the command line. Whenever there is a lot of printing happening in the program, it is to be expected that the program will run significantly slower from Eclipse.
But anyway, unless you are writing a program designed to integrate with other programs via Unix pipes, you should minimize the printing as it will kill performance even at the command line.

There are some typical mistakes:
Maybe you are executing your program in Debug mode.
Try to use Run (play symbol inside a green circle) instead of Debug (a green bug)
Maybe you are executing your program with a different JVM
Take a look in Project Properties->Java compiler, Window->Preferences->Java->Compiler and Window->Preferences->Java->Installed JREs
Input and output through the Eclipse JDT console perform differently than through the standard console.

Ensure that you use the Run action in Eclipse, and not Debug, as the latter really has measurable difference, especially if you use conditional breakpoints.
However, I remember seeing less significant differences caused by using Debug.

I just ran an experiment for you and did not see such a significant difference.
I created a class that calculates sin() 100000000 times.
This program ran ~15 seconds under eclipse and ~14 seconds via command prompt.
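The original test class was not posted, so the following is only a guess at what such an experiment might look like. The SinBenchmark name and the summing of results (so the work cannot be optimized away) are my assumptions. Timings will differ from machine to machine.

```java
public class SinBenchmark {
    // Sums sin(i) for i in [0, iterations) so the result is actually used.
    static double sumOfSines(int iterations) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += Math.sin(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Iteration count from the experiment above; pass a smaller number to shorten the run.
        int iterations = args.length > 0 ? Integer.parseInt(args[0]) : 100_000_000;
        long start = System.nanoTime();
        double result = sumOfSines(iterations);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " elapsed=" + elapsedMs + " ms");
    }
}
```

Run the same class once from Eclipse and once from the command prompt and compare the elapsed times.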
So, here are the reasons for the slowness on your system that I can think of off the top of my head:
Be sure that you are not running under debug. Use the Run option, not Debug.
Be sure that you do not have coverage or monitoring developer tools enabled in Eclipse, for example YourKit or Emma.
Be sure that your program does not print a significant amount to the console.
Check that you have enough heap memory when running under Eclipse.

Changing jdk 6 to jdk 7 worked perfectly for me.
Window->Preferences->Java->Installed JREs

PHP exec a unit testing with memory usage information

I have a question related to PHP and Java programming.
I am going to develop a web application that performs unit testing.
PHP is the language I'll use for the web side, and it will call the exec() function.
As I read, the function returns the output of the execution, which I do need.
But that's not enough: I would also like to know how much memory is used during the execution.
The web application will run on an Apache web server on a native Linux operating system (preferably Ubuntu).
This is the second case:
If there is a Java source file containing a program that requires user input during execution, how can I execute it via the web server and also pass all the lines that act as the user input?
The next problem is that the exec() function only accepts parameters inline.
How if I want
So, are there any ideas on how to do these things?

The /usr/bin/time program (documented in time(1)) can return the amount of memory used during execution:
$ /usr/bin/time echo hello
hello
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 2512maxresident)k
0inputs+0outputs (0major+205minor)pagefaults 0swaps
You can see that the echo(1) program required 2.5 megabytes of memory and ran very quickly. Larger programs will be more impressive:
$ /usr/bin/time jacksum --help
Unknown argument. Use -h for help. Exit.
Command exited with non-zero status 2
0.08user 0.03system 0:00.87elapsed 12%CPU (0avgtext+0avgdata 57456maxresident)k
25608inputs+64outputs (92major+4072minor)pagefaults 0swaps
jacksum is a Java-based program, so it took 57 megabytes to tell me I screwed up the command line arguments. That's more like it.
You might also find the BSD process account system worthwhile. See lastcomm(1), sa(8), and dump-acct(8) for more information.
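Whatever captures the output of /usr/bin/time still has to pull the memory figure out of it. As a sketch, here is one way to extract the maxresident value from output shaped like the samples above. It is written in Java; the same regular expression idea would carry over to PHP. The TimeOutputParser name is made up, and the pattern assumes the GNU time default format shown above.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeOutputParser {
    // Matches the "<number>maxresident)k" field of GNU time's default output.
    private static final Pattern MAX_RESIDENT = Pattern.compile("(\\d+)maxresident\\)k");

    // Returns the peak resident set size in kilobytes, if the field is present.
    static OptionalLong maxResidentKb(String timeOutput) {
        Matcher m = MAX_RESIDENT.matcher(timeOutput);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        String sample = "0.08user 0.03system 0:00.87elapsed 12%CPU "
                + "(0avgtext+0avgdata 57456maxresident)k";
        System.out.println(maxResidentKb(sample).getAsLong()); // prints 57456
    }
}
```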
