I am woking with AWS Kinesis and CouldWatch. How can I fetch many metrics of one shard with one request? This is how I get one metric:
GetMetricStatisticsRequest request = new GetMetricStatisticsRequest();
request.withNamespace(namespace)
.withDimensions(dimensions)
.withPeriod(duration)
.withStatistics(statistic)
.withMetricName(metricName)
.withStartTime(startTime)
.withEndTime(endTime);
You can't fetch data for multiple metrics with one call to GetMetricStatistics.
GetMetricStatistics API takes metric name and a list of dimensions, which together define exactly one metric. To get data for multiple metrics you'll have to make multiple GetMetricStatistics calls.
We use java mail api with imap and fetch messages of folders containing millions of messages. There are some rules and limitiations:
We do not have always open connections to mail server and therefore we can not add listeners.
The messages will be stored in a local database with all properties, subject, body, receive date, from etc.
Can not use multiple threads
To keep the performance at acceptable levels and prevent out of memory crashes, I am planning:
1.During inital fetch, where all messages have to be fetched, store only the message headers and bypass body and attachments. Getting the body and attachment of a message will be done when requested by the client. The initialization can take hours, it is not a problem.
2.When fetching all messages at start, use a appropriate fetch profile to make it faster, but process in blocks, for example:
Message m1[] = f.getMessages(1, 10000);
f.fetch(m1, fp);
//process m1 array
Message m2[] = f.getMessages(10001, 20000);
f.fetch(m2, fp);
//process m2 array
instead of
Message m_all[] = f.getMessages(1, NUMALLMESSAGES);
f.fetch(m_all, fp);
//process m_all array, may throw out of memory errors
3.And after we have all the messages, store the UID of recent message in the DB and on the next fetch perform:
f.getMessagesByUID(LASTUIDREADFROMDB, UIDMAX)
Do you have additional suggestions, or see any points we have to care of (memory, performance)
I am having a use-case in which I am consuming twitter filter stream in a multithreaded environment using twitter HBC.
I receive keywords to track from user on per request that I receive, process them and save them to the database.
Example
Request 1: track keywords [apple, google]
Request 2: track keywords [yahoo, microsoft]
What I am currently doing is, opening a new thread for each request and processing them.
I am making connection for every end-pints a below (I am following this official HBC example)
endpoint.trackTerms(Lists.newArrayList("twitterapi", "#yolo"));
Authentication auth = new OAuth1(consumerKey, consumerSecret, token, secret);
// Create a new BasicClient. By default gzip is enabled.
Client client = new ClientBuilder()
.hosts(Constants.STREAM_HOST)
.endpoint(endpoint)
.authentication(auth)
.processor(new StringDelimitedProcessor(queue))
.build();
// Establish a connection
client.connect();
The problem that I am facing is twitter gives me warning of having more that one connection.
But as according to the above code and my use case I have to make a new instance of Client object for ever request that I receive as my end points
(trackTerms) are different for every request that I receive.
What is the proper way to connect to twitter for my use-case (multithreaded system) to avoid to many connections warning and rate limit warnings
Current warnings from twitter:
2017-02-01 13:47:33 INFO TwitterStreamImpl:62 - 420:Returned by the Search and Trends API when you are being rate limited (https://dev.twitter.com/docs/rate-limiting).
Returned by the Streaming API:
Too many login attempts in a short period of time.
Running too many copies of the same application authenticating with the same account name.
Exceeded connection limit for user
Thanks.
I use javamail to read mails from an exchage account using IMAP protocol. Those mails are in plain format and its contents are XMLs.
Almost all those mails have short size (usually under 100Kb). However, sometimes I have to deal with large mails (about 10Mb-15Mb). For example, yesterday I received an email which was 13Mb size. It took more than 50min just to read it. Is it normal? Is there a way to increase its performance?
The code is:
Session sesion = Session.getInstance(System.getProperties());
Store store = sesion.getStore("imap");
store.connect(host, user, passwd);
Folder inbox = store.getFolder("INBOX");
inbox.open(Folder.READ_WRITE);
Message[] messages = inbox.search(new FlagTerm(new Flags(Flags.Flag.SEEN), false));
for (int i = 0 ; i< messages.length ; i++){
Object contents = messages[i].getContent(); // Here it takes 50 min on 13Mb mail
// ...
}
Method that takes such a long time is messages[i].getContent(). What am I doing wrong? Any hint?
Thanks a lot and sorry for my english! ;)
I finally solved this issue and wanted to share.
The solution, at least the one that worked to me, was found in this site: http://www.oracle.com/technetwork/java/faq-135477.html#imapserverbug
So, my original code typed in my first post becomes to this:
Session sesion = Session.getInstance(System.getProperties());
Store store = sesion.getStore("imap");
store.connect(host, user, passwd);
Folder inbox = store.getFolder("INBOX");
inbox.open(Folder.READ_WRITE);
// Convert to MimeMessage after search
MimeMessage[] messages = (MimeMessage[]) carpetaInbox.search(new FlagTerm(new Flags(Flags.Flag.SEEN), false));
for (int i = 0 ; i< messages.length ; i++){
// Create a new message using MimeMessage copy constructor
MimeMessage cmsg = new MimeMessage(messages[i]);
// Use this message to read its contents
Object obj = cmsg.getContent();
// ....
}
The trick is, using MimeMessage() copy constructor, create a new MimeMessage and read its contents instead of original message.
You should note that such object is not really connected to server, so any changes you make on it, like setting flags, won't take effect. Any change on message, have to be done on original message.
To sum up: This solution works reading large Plain Text mails (up to 15Mb) connecting to an Exchange Server using IMAP protocol. The times lowered from 51-55min to read a 13Mb mail, to 9seconds to read same mail. Unbelievable.
Hope this helps someone and sorry for English mistakes ;)
It would always be messages[i].getContent() that would be the slowest part of the code. The reason is normally IMAP server would not cache this part of message data. Nevertheless, you can try this:
FetchProfile fp = new FetchProfile();
fp.add(FetchProfile.Item.ENVELOPE);
fp.add(FetchProfileItem.FLAGS);
fp.add(FetchProfileItem.CONTENT_INFO);
fp.add("X-mailer");
and after you have specified the fetch profile then you do your search/fetch of messages.
Basically the concept is that the IMAP provider fetches the data for a message from the server only when necessary. (The javax.mail.FetchProfile is used to optimize this). The header and body structure information, once fetched, is always cached within the Message object. However, the content of a bodypart is not cached. So each time the content is requested by the client (either using getContent() or using getInputStream()), a new FETCH request is issued to the server. The reason for this is that the content of a message could be potentially large, and if we cache this content for a large number of messages, there is the possibility that the system may run out of memory soon since the garbage collector cannot free the referenced objects. Clients should be aware of this and must hold on to the retrieved content themselves if needed.
By using the above mentioned code snippet you could 'hope' for some speed improvement but it solely depends on your SMTP server if this would work or not. All the big SMTP server do not support this behaviour because of the load issue mentioned in the previous paragraph and hence you may not gain any speed.
Using the Folder.fetch method you can prefetch in one operation the metadata for multiple messages. That will reduce the time to process each message, but won't help that much with a huge message.
The handle huge message parts efficiently, you'll generally want to use the getInputStream method to process the data incrementally, rather than using the getContent method to read all the data in and create a huge String object with all the data.
You can also tune the fetching by specifying the "mail.imap.fetchsize" property, which defaults to 16384. If most of your messages are less than 100K, and you always need to read all of the data in the message, you might set the fetchsize to 100K. That will make small messages much faster and larger message more efficient.
I had a similar issue. Fetching mails via IMAP were very slow. Furthermore I had another issue downloading large attachments. After a look in the JavaMail FAQ I found the solution for the later issue in this question that advises to set the mail.imap.partialfetch (respectively mail.imaps.partialfetch) to false. This not only fixes the download issue but the slow reading of the messages as well.
In the referenced JavaMail notes.txt it is said.
Due to a problem in the Microsoft Exchange IMAP server, insufficient
number of bytes may be retrieved when reading big messages. There
are two ways to workaround this Exchange bug:
(a) The Exchange IMAP server provides a configuration option called
"fast message retrieval" to the UI. Simply go to the site, server
or recipient, click on the "IMAP4" tab, and one of the check boxes
is "enable fast message retrieval". Turn it off and the octet
counts will be exact. This is fully described at
http://support.microsoft.com/default.aspx?scid=kb;EN-US;Q191504
(b) Set the "mail.imap.partialfetch" property to false. You'll have
to set this property in the Properties object that you provide to
your Session.
Certain IMAP servers do not implement the IMAP Partial FETCH
functionality properly. This problem typically manifests as corrupt
email attachments when downloading large messages from the IMAP
server. To workaround this server bug, set the
"mail.imap.partialfetch"
property to false. You'll have to set this property in the Properties
object that you provide to your Session.
I have four db servers with the same db structure but different data.
Currently when new data are inserted to database my application get this data, create template and send email.
I would like to separate sending email from my applications.
For example some thread which will start once per 10 minutes. It selects data from my four db servers, connect to mail server and send email to users.
It's possible with using JMS or something similar ?
Thanks for replys !
I did the same by creating a mail table (likely one per DB) and save the Template and data (or subject/body) in it. A separate process could be Quartz or your own pooling thread read that table and connect to mail server and send email and update email status.
In this way you can check status of any email at any given time and even you can resend any email. The table needs to purge/archive after some time may be after 1 day or 1 week depends on table size.