I have a problem with my HTML scraper. The scraper is a multithreaded application written in Java using HtmlUnit; by default it runs with 128 threads. In short, it works as follows: it takes a site URL from a big text file, pings the URL and, if it is accessible, parses the site, finds specific HTML blocks, saves all URL and block info (including the HTML code) into the corresponding tables in the database, and moves on to the next site. The database is MySQL 5.1; there are 4 InnoDB tables and 4 views. The tables have numeric indexes on the fields used in joins. I also have a web interface for browsing and searching the parsed data (for searching I use Sphinx with delta indexes), written in CodeIgniter.
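Roughly, each worker thread does something like the following (a simplified sketch only; the urls/blocks table names, the XPath and the block-matching logic here are placeholders, not my real schema):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

import java.sql.Connection;
import java.sql.PreparedStatement;

// Simplified sketch of one worker iteration; the "urls"/"blocks" tables and
// the XPath are placeholders, not the real schema.
public class ScraperWorker {

    void processSite(String url, WebClient webClient, Connection db) throws Exception {
        HtmlPage page = webClient.getPage(url);   // fetch; a failure here means the site is not accessible

        PreparedStatement saveUrl = db.prepareStatement("INSERT INTO urls (url) VALUES (?)");
        saveUrl.setString(1, url);
        saveUrl.executeUpdate();
        saveUrl.close();

        // find the specific HTML blocks and store their markup
        PreparedStatement saveBlock = db.prepareStatement("INSERT INTO blocks (url, html) VALUES (?, ?)");
        for (Object o : page.getByXPath("//div[@class='target-block']")) {
            HtmlElement block = (HtmlElement) o;
            saveBlock.setString(1, url);
            saveBlock.setString(2, block.asXml());
            saveBlock.executeUpdate();
        }
        saveBlock.close();
    }
}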
Server configuration:
CPU: Type Xeon Quad Core X3440 2.53GHz
RAM: 4 GB
HDD: 1TB SATA
OS: Ubuntu Server 10.04
Some mysql config:
key_buffer = 256M
max_allowed_packet = 16M
thread_stack = 192K
thread_cache_size = 128
max_connections = 400
table_cache = 64
query_cache_limit = 2M
query_cache_size = 128M
The JVM runs with default parameters except for the following options: -Xms1024m -Xmx1536m -XX:-UseGCOverheadLimit -XX:NewSize=500m -XX:MaxNewSize=500m -XX:SurvivorRatio=6 -XX:PermSize=128M -XX:MaxPermSize=128m -XX:ErrorFile=/var/log/java/hs_err_pid_%p.log
When the database was empty, the scraper processed 18 URLs per second and was stable enough. But after 2 weeks, when the urls table contained 384929 records (~25% of all processed URLs) and took 8.2 GB, the Java application began to work very slowly and crash every 1-2 minutes. I guess the reason is MySQL, which cannot handle the growing load (the parser performs 2 + 4*BLOCK_NUMBER queries for every processed URL; Sphinx updates its delta indexes every 10 minutes; I don't count the web interface, because it's used by only one person). Maybe it rebuilds indexes very slowly? But the MySQL and scraper logs (which also contain all uncaught exceptions) are empty. What do you think about it?
I'd recommend running the following just to check a few status things; putting that output here would help as well:
dmesg
top (check the resident vs. virtual memory per process)
So the application becomes non-responsive? (Not the same as a crash at all.) I would check that all your resources are free, e.g. do a jstack to check whether any threads are tied up.
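If attaching jstack is inconvenient, a rough in-process equivalent is to dump all thread states yourself; a sketch (not part of your scraper, just an illustration):

import java.util.Map;

// Rough in-process alternative to jstack: print the state and stack of
// every live thread. Purely illustrative.
public class ThreadDumper {
    public static void dump() {
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            System.out.println(t.getName() + " [" + t.getState() + "]");
            for (StackTraceElement frame : e.getValue()) {
                System.out.println("    at " + frame);
            }
        }
    }
}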
Check in MySQL that you have the expected number of connections. If you continuously create connections in Java and don't clean them up, the database will run slower and slower.
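For example, with plain JDBC every statement and connection should be released in a finally block (or handed back to a pool); a sketch with placeholder connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: always release JDBC resources, even when parsing a site throws.
// The connection URL and credentials are placeholders.
public class DbHelper {

    void saveUrl(String url) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/scraper", "user", "password");
        try {
            PreparedStatement ps = con.prepareStatement("INSERT INTO urls (url) VALUES (?)");
            try {
                ps.setString(1, url);
                ps.executeUpdate();
            } finally {
                ps.close();   // statement is closed even on failure
            }
        } finally {
            con.close();      // connection is closed (or returned to the pool) even on failure
        }
    }
}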
Thank you all for your advice; MySQL was indeed the cause of the problem. By enabling the slow query log in my.cnf I saw that one of the queries, executed on every iteration, took 300 s (one field used for searching was not indexed).
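For anyone hitting the same thing, the fix itself was just the one missing index; sketched below against a placeholder table and column, since the real names don't matter:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// One-off fix with placeholder names: index the column used by the
// per-iteration search query so MySQL no longer scans the whole table.
public class AddMissingIndex {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/scraper", "user", "password");
        Statement st = con.createStatement();
        st.executeUpdate("ALTER TABLE urls ADD INDEX idx_search_field (search_field)");
        st.close();
        con.close();
    }
}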
I have a problem with my Oracle DB network speed.
First of all, the essence of the problem: there is a Java application on my computer and an Oracle DB on a remote server. The connection speed between them is about 2.5 MB/s. In my Java app I execute a very simple query like "select id, name from table_name"; the result set contains ~60K rows (about 1.5 MB) and takes ~80 seconds to transfer to my app. According to the profiler, the application spends most of that time in the oracle.net.Packet.recieve method.
For comparison, the same query executes in SQL Developer in 0.5-0.7 seconds for 5000 rows. Extrapolating to 60K rows, that would be about 6-8 seconds.
Running tcpdump against my application shows that the data is transferred in chunks of about 200 bytes. For SQL Developer, on the other hand, tcpdump shows packet sizes of more than 2000 bytes.
The official Oracle documentation suggests increasing the SDU and TDU parameters. Unfortunately I can't change the configuration of the database, so I tried to set them on the client side like this:
jdbc:oracle:thin:@(DESCRIPTION=(SDU=11280)(TDU=11280)(ADDRESS=(PROTOCOL=tcp)(HOST=<host>)(PORT=1521)(SEND_BUF_SIZE=11784)(RECV_BUF_SIZE=11784))(CONNECT_DATA=(SERVICE_NAME=<db>)))
But this didn't bring any changes. Can the database or the ojdbc driver ignore these parameters? Or am I on the wrong track?
As it turned out, the reason was the fetch size. Increasing its value reduced the execution time by roughly a factor of 100.
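Concretely, something along these lines (the value 500 is only an illustration; the right number depends on your row size):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Sketch: raise the JDBC fetch size so the driver pulls many rows per
// round trip instead of the thin driver's default of 10 rows.
public class FetchSizeExample {
    public static void main(String[] args) throws Exception {
        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//<host>:1521/<db>", "user", "password");
        PreparedStatement ps = con.prepareStatement("select id, name from table_name");
        ps.setFetchSize(500);                 // illustration only; tune for your row size
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            // process rs.getLong("id") and rs.getString("name")
        }
        rs.close();
        ps.close();
        con.close();
    }
}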
I've followed instructions from here and here to increase my SOLR memory allocation. I've done this because the SOLR server has shut down periodically during some high-frequency, high-volume indexing activity.
I'm a little new to using SOLR and Ubuntu, so bear with me, but I've found several locations where the SOLR_JAVA_MEM parameter exists:
/opt/solr-6.2.0/bin/solr.in.sh
/opt/solr-6.2.0/bin/solr.in.cmd
/opt/solr-6.2.0/bin/solr.cmd
The same set of files in this directory: /home/deploy/.rbenv/versions/2.2.4/lib/ruby/gems/2.2.0/gems/sunspot_solr-2.2.5/solr/bin
And this directory:
/home/deploy/solr-6.2.0/bin
And finally, in this file: /etc/default/solr.in.sh
Anywhere I've seen a SOLR_JAVA_MEM or SOLR_HEAP param with a number, I've replaced it with a larger value, for example in /opt/solr-6.2.0/bin/solr.in.sh:
# Increase Java Heap as needed to support your indexing / query needs
SOLR_HEAP="1500m"
# Expert: If you want finer control over memory options, specify them directly
# Comment out SOLR_HEAP if you are using this though, that takes precedence
#SOLR_JAVA_MEM="-Xms1512m -Xmx1512m"
If I'm measuring it correctly, I still only see about 500MB of memory allocated to SOLR, as seen by the following command:
root@ip-xxx:~# service solr status
Found 1 Solr nodes:
Solr process 15259 running on port 8989
{
"solr_home":"/var/solr/data",
"version":"6.2.0 764d0f19151dbff6f5fcd9fc4b2682cf934590c5 - mike - 2016-08-20 05:41:37",
"startTime":"2016-09-28T15:01:18.001Z",
"uptime":"0 days, 0 hours, 12 minutes, 28 seconds",
"memory":"100 MB (%20.4) of 490.7 MB"}
Am I doing something wrong? Or am I just measuring the memory incorrectly? Please let me know if I can provide add'l info. Thanks!
I'll answer my own question. It turned out that I had to edit the /etc/default/solr.in.sh file. I changed SOLR_HEAP="512M" to SOLR_HEAP="1500m", ran sudo service solr status, and saw the memory showing 1.5G!
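If you want to double-check what heap a Java process actually got, rather than trusting the status output, a tiny diagnostic like this sketch (my own test class, nothing that ships with Solr), run with the same -Xmx value, shows the JVM's own view:

// Tiny diagnostic (my own test class, not something Solr ships): prints the
// heap limits the JVM actually got. Run it with the same -Xmx value, e.g.
// java -Xmx1500m HeapCheck, and the max value comes out close to 1.5 GB.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap:  " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("committed: " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("used:      " + (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024) + " MB");
    }
}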
Need some help from the experts!
We have a project here (still in dev) that needs to run 50 Java processes (for now; it will probably be doubled or tripled in the future) at the same time every 5 minutes. I set -Xmx50m for every process and our server has only 4 GB of RAM, which I know will really slow our server down. What I have in mind is to upgrade our RAM. My question is: do I have other options to prevent our server from being slow when running that number of Java processes?
Since you have 50 processes with -Xmx50m each, your assumption is about right: 50 × 50 MB is roughly 2.5 GB of heap alone, and each JVM adds its own overhead on top of that.
To prevent your server from being slow you can follow some best practices for the Java memory parameters, e.g. set -Xms and -Xmx to the same value and choose those values based on what each process actually uses. You can also profile the processes at runtime to make sure everything is OK.
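For the runtime check, even something as small as this sketch, logged periodically from inside each process, will show whether the 50m heaps are actually used up (the interval and output format here are just for illustration):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: log heap usage every 10 seconds so you can see whether -Xmx50m is
// actually needed, too small, or oversized for a given process.
public class HeapLogger implements Runnable {
    public void run() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (!Thread.currentThread().isInterrupted()) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            System.out.println("heap used=" + heap.getUsed() / 1024 + "K"
                    + " committed=" + heap.getCommitted() / 1024 + "K"
                    + " max=" + heap.getMax() / 1024 + "K");
            try {
                Thread.sleep(10000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}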
We have a Solr server (Solr 4.5) with a custom schema and configuration, set up for a project under development. I have observed that, while running our integration tests, the memory usage of the Solr server grows continually. These tests (JUnit) each post a set of 100 randomly generated records to the server, query around a bit, and delete them.
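Each test essentially does the following (sketched here with SolrJ; the URL and field names are placeholders, our real schema is custom):

import java.util.UUID;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch of one test run: index 100 random documents, query them, then
// delete everything again. URL and field names are placeholders.
public class SolrRoundTripTest {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8080/solr/core1");

        for (int i = 0; i < 100; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", UUID.randomUUID().toString());
            doc.addField("title_t", "random title " + i);
            solr.add(doc);
        }
        solr.commit();

        solr.query(new SolrQuery("title_t:random"));   // "query around a bit"

        solr.deleteByQuery("*:*");                     // remove all test documents
        solr.commit();
        solr.shutdown();
    }
}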
The deletion policy is set to
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="maxCommitsToKeep">1</str>
<str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
Even when the index no longer contains any documents, no memory is freed. Every run of the tests increases the used memory by a certain amount (about 40 MB, while the index itself is about 7 kB), until the whole server dies with an OutOfMemoryError.
The Solr installation runs on Tomcat 6.0.35.0, with Java 1.7.0_17 and -Xmx12g. The OS is Linux.
How can that be? Where can I tweak the memory handling of Solr?
As it turns out, I had set the cache values and -Xmx too high for the test machine, since a Java process uses more memory than the heap it is assigned (overhead). With reduced sizes, Solr has now been running stably for two days and many unit test runs (which it didn't before).
The test scenario (fill the index with random values and then clean it completely) filled the caches to their maximum with records. It seems that removing a record from the index does not necessarily remove it from the caches...
I have a question about faults metric in mongostat.
I'm running mongo 2.0 on Ubuntu, with 2 disks (32 GB each) in a RAID-0 configuration.
The test is to load 5 million user profiles into mongo.
I run the process in a single thread and use inserts in bulks of 1000 entries.
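The loading code is essentially this (a sketch using the 2.x Java driver; database, collection and field names are placeholders):

import java.util.ArrayList;
import java.util.List;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

// Sketch of the loader: single thread, batches of 1000 inserts.
// Database, collection and field names are placeholders.
public class ProfileLoader {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost");
        DB db = mongo.getDB("test");
        DBCollection profiles = db.getCollection("profiles");

        List<DBObject> batch = new ArrayList<DBObject>(1000);
        for (int i = 0; i < 5000000; i++) {
            batch.add(new BasicDBObject("_id", i).append("name", "user" + i));
            if (batch.size() == 1000) {
                profiles.insert(batch);   // one bulk insert per 1000 documents
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            profiles.insert(batch);
        }
        mongo.close();
    }
}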
When I set up mongo for the first time and load the profiles into it, I see many faults in mongostat (2, 5, and even 15) during the loading.
Then I run the loading again: first I drop the old collection, then run the loading.
On the following runs, faults=0 almost all the time.
Why is that?
MongoDB delegates memory management to the OS via the memory-mapped files mechanism. Basically, this mechanism allows a program to open files much larger than the amount of installed RAM. When the program tries to access a portion of such a file, the OS checks whether that portion (page) is in RAM. If it is not, a page fault happens and the page is loaded from disk. The faults/s metric in mongostat shows exactly this: how many page faults are occurring per second.
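The same mechanism is available directly from Java, which may make the behaviour easier to picture; a toy sketch (nothing to do with mongod's internals):

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Toy illustration of memory mapping (assumes a small file): the file is
// mapped into the address space, and the OS pulls a page from disk only when
// it is first touched; that first touch is the page fault.
public class MmapDemo {
    public static void main(String[] args) throws Exception {
        RandomAccessFile file = new RandomAccessFile("data.bin", "r");
        FileChannel channel = file.getChannel();
        MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

        // The first read of this position may fault the page in from disk;
        // reading the same page again is served straight from RAM.
        byte b = map.get((int) (channel.size() / 2));
        System.out.println("byte in the middle: " + b);

        channel.close();
        file.close();
    }
}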
Now, when you start mongo and load data into it, the data files are not yet mapped into memory and have to be loaded from disk (page faults). When you drop a collection, it is deleted logically, but the corresponding physical files are not deleted and will be reused. Since they are already in RAM, there are no page faults.
If you drop the database instead, the files are removed along with it, so you should see page faults again the next time.
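With the 2.x Java driver the two operations look like this, if you want to compare them in your test (a sketch; names are placeholders):

import com.mongodb.DB;
import com.mongodb.Mongo;

// Sketch: dropping a collection keeps the database's data files on disk
// (they are reused), while dropping the database removes the files, so the
// next load has to fault its pages back in from disk.
public class DropComparison {
    public static void main(String[] args) throws Exception {
        Mongo mongo = new Mongo("localhost");
        DB db = mongo.getDB("test");

        db.getCollection("profiles").drop();   // logical delete, files stay
        // ...reload here: few page faults expected...

        db.dropDatabase();                     // files are removed
        // ...reload here: page faults expected again...

        mongo.close();
    }
}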