I have a mailbox file containing over 50 megs of messages separated by something like this:
From - Thu Jul 19 07:11:55 2007
I want to build a regular expression for this in Java to extract each mail message one at a time, so I tried using a Scanner, using the following pattern as the delimiter:
public boolean ParseData(DataSource data_source) {
boolean is_successful_transfer = false;
String mail_header_regex = "^From\\s";
LinkedList<String> ip_addresses = new LinkedList<String>();
ASNRepository asn_repository = new ASNRepository();
try {
Pattern mail_header_pattern = Pattern.compile(mail_header_regex);
File input_file = data_source.GetInputFile();
//parse out each message from the mailbox
Scanner scanner = new Scanner(input_file);
while(scanner.hasNext(mail_header_pattern)) {
String current_line = scanner.next(mail_header_pattern);
Matcher mail_matcher = mail_header_pattern.matcher(current_line);
//read each mail message and extract the proper "received from" ip address
//to put it in our list of ip's we can add to the database to prepare
//for querying.
while(mail_matcher.find()) {
String message_text = mail_matcher.group();
String ip_address = get_ip_address(message_text);
//empty ip address means the line contains no received from
if(!ip_address.trim().isEmpty())
ip_addresses.add(ip_address);
}
}//next line
//add ip addresses from mailbox to database
is_successful_transfer = asn_repository.AddIPAddresses(ip_addresses);
}
//error reading file--unsuccessful transfer
catch(FileNotFoundException ex) {
is_successful_transfer = false;
}
return is_successful_transfer;
}
This seems like it should work, but whenever I run it the program hangs, probably because it never finds the pattern. The same regular expression works in Perl on the same file, but in Java it always hangs on the line String current_line = scanner.next(mail_header_pattern);
Is this regular expression correct or am I parsing the file incorrectly?
I'd be leaning toward something much simpler, by just reading lines, something like this:
while(scanner.hasNextLine()) {
String line = scanner.nextLine();
if (line.matches("^From\\s.*")) {
// it's a new email
} else {
// it's still part of the email body
}
}
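As a rough sketch of that line-based approach, the loop can collect everything between two "From " separator lines into one message. The class and method names here (MboxSplitter, split) are made up for illustration, and the sketch keeps every message in memory, which may not be what you want for a 50 MB mailbox; you could just as well process each message when the next separator shows up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits mbox-style text into messages, line by line.
final class MboxSplitter {

    // Returns one string per message; the "From " separator lines are dropped.
    static List<String> split(BufferedReader reader) {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // stays null until the first separator is seen
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith("From ")) {
                    // a separator line: close the previous message, start a new one
                    if (current != null) {
                        messages.add(current.toString());
                    }
                    current = new StringBuilder();
                } else if (current != null) {
                    // still part of the current message
                    current.append(line).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        String sample = "From - Thu Jul 19 07:11:55 2007\nSubject: a\n\nbody a\n"
                + "From - Thu Jul 19 07:12:10 2007\nSubject: b\n\nbody b\n";
        List<String> messages = split(new BufferedReader(new StringReader(sample)));
        System.out.println(messages.size() + " messages");
    }
}
```

For a real mailbox you would pass in a BufferedReader over the file instead of the StringReader used in main.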
I want to find the geolocation of a player from just the IP address.
My aim is to save the city, country, postal code, and other information.
CraftPlayer cp = (CraftPlayer)p;
String adress = cp.getAddress();
Is there a short way to find this out using only the IP?
I recommend using http://ip-api.com/docs/api:newline_separated
You can then choose which information you need and build your HTTP link like this:
http://ip-api.com/line/8.8.8.8?fields=49471
The result in this example would be:
success
United States
US
VA
Virginia
Ashburn
20149
America/New_York
So you can create a method in Java to read HTTP and split it at \n to get the lines:
private void whatever(String ip) {
String ipinfo = getHttp("http://ip-api.com/line/" + ip + "?fields=49471");
if (ipinfo == null || !ipinfo.startsWith("success")) {
// TODO: failed
return;
}
String[] lines = ipinfo.split("\n");
// TODO: now you can get the info
String country = lines[1];
/*
...
*/
}
private static String getHttp(String url) {
try {
BufferedReader br = new BufferedReader(new InputStreamReader(new URL(url).openStream()));
String line;
StringBuilder sb = new StringBuilder();
while ((line = br.readLine()) != null) {
// append '\n' (not the platform separator) so the later split("\n") works everywhere
sb.append(line).append('\n');
}
br.close();
return sb.toString();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
Just make sure not to send too many queries in a short amount of time, since ip-api.com will ban you for it.
There are a lot of websites that provide free databases for IP geolocation.
Examples include:
MaxMind
IP2Location
At the plugin startup you could download one of these databases and then query it locally during runtime.
If you choose to download the .bin format you will have to initialize a local database and then import the data. Otherwise you could just use the CSV file with a Java library like opencsv.
From the documentation of opencsv:
For reading, create a bean to harbor the information you want to read,
annotate the bean fields with the opencsv annotations, then do this:
List<MyBean> beans = new CsvToBeanBuilder<MyBean>(new FileReader("yourfile.csv"))
.withType(MyBean.class).build().parse();
Link to documentation: http://opencsv.sourceforge.net
I am writing a small client-server program that sends the text one client writes to a file to the other clients with the same filename, and I got the following error
But I am just sending an integer and no other characters...
Here's the code:
Server
String[] splitter = scanText.split("\n");
String length = splitter.length + "";
//sending scanText to clients
for (PrintWriter pw2 : userMap.get(filename) ) {
if(!pw2.equals(pw))
{
pw2.println(length + "\n" + scanText);
}
}
Client
class "UpdateInBackground" is a class which is in the Client-class
class UpdateInBackground extends Thread {
@Override
public void run() {
int lines; //number of lines the server is about to send
String scanText;
while (!this.isInterrupted()) {
scanText = "";
lines = Integer.parseInt(sc.nextLine()); //here I get the error
while (lines-- > 0) {
scanText += sc.nextLine() + "\n";
}
output.setText(scanText);
}
}
}
@asparagus, please show how sc is defined for the call sc.nextLine(). Assuming it is a Scanner, I need to know what input it reads. The question should be self-explanatory, including the variable definitions and the inputs.
In class UpdateInBackground:
lines = Integer.parseInt(sc.nextLine()); // nextLine() returns any String, see the documentation
Reason for the NumberFormatException: you are converting the value to an int without knowing what the input actually contains.
Use exception handling to find out which errors occur and to keep the program from getting stuck.
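To illustrate the exception handling suggested above, here is a minimal sketch. The helper name parseLineCount and the -1 return value are assumptions for this example, not part of the original client; the point is only that the offending input gets printed, so you can see what the server really sent.

```java
// Hypothetical helper: parses the "number of lines" header defensively.
final class LineCountParser {

    // Returns the parsed count, or -1 if the line is not a non-negative integer.
    static int parseLineCount(String line) {
        if (line == null) {
            return -1;
        }
        try {
            int count = Integer.parseInt(line.trim());
            return count >= 0 ? count : -1;
        } catch (NumberFormatException e) {
            // print the offending input so the real cause becomes visible
            System.err.println("Expected a line count but got: \"" + line + "\"");
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLineCount("3"));
        System.out.println(parseLineCount("some text"));
    }
}
```

In the client loop you would call this on sc.nextLine() and skip the update (or resynchronize) whenever it returns -1.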
I've been making an instant chat program, and wanted to make it possible for users to "whisper" or private message one another. The way I implemented that is the user would type:
/w [username] [message]
I then send it to the server, which sends it to all the users. The users then have to check whether it was sent to them. This is the method that does that:
if (message.startsWith("/msg/")) {
message = message.trim();
message = message.substring(5);
String[] words = message.split("//s");
String UserName = null;
try {
String username = words[2];
UserName = username;
System.out.println(UserName);
} catch (ArrayIndexOutOfBoundsException e) {
System.out.println("Error reading the whisper command!");
}
if (UserName == client.getName()) {
List<String> list = new ArrayList<String>(Arrays.asList(words));
list.removeAll(Arrays.asList(2));
words = list.toArray(words);
String text = words.toString();
text = "You've been whispered! " + message;
console(text);
}
Every time I send a /w while testing, it gives the ArrayIndexOutOfBoundsException. I also modify the message when sending. Here's that method:
else if (message.toLowerCase().startsWith("/w")) {
message = "/msg/" + client.getName() + message;
message = "/m/" + message + "/e/";
txtMessage.setText("");
}
I also added a bunch more options for the users in the actual code (/whisper, /m, /msg, and /message), but those are all just copies of this with a different input. Why is it giving me an ArrayIndexOutOfBoundsException when the 3rd place in the words array SHOULD be the username the sender is trying to reach? This probably isn't the best way to send private messages, so if any of you have a simpler way I can implement on my server, please let me know! Just know that I am a young, new programmer, so I will probably have a lot of questions.
The slashes in your split() regex are backwards. You need
String[] words = message.split("\\s");
You can also just use a space
String[] words = message.split(" ");
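To make the difference concrete: "//s" is a regex for the literal characters //s, so the message is never split and the array has a single element. A small sketch of parsing the command with the corrected regex follows; the WhisperParser class and its parse method are hypothetical names for this example, and it assumes the raw "/w [username] [message]" form rather than the prefixed form the client builds.

```java
// Hypothetical helper: parses "/w <username> <message>".
final class WhisperParser {

    // Returns {username, message}, or null if the command is malformed.
    static String[] parse(String input) {
        // limit 3: command, username, and the rest of the line as the message
        String[] words = input.trim().split("\\s+", 3);
        if (words.length < 3 || !words[0].equalsIgnoreCase("/w")) {
            return null;
        }
        return new String[] { words[1], words[2] };
    }

    public static void main(String[] args) {
        String[] parsed = parse("/w bob hello there");
        System.out.println(parsed[0] + " <- " + parsed[1]);
        // the original regex never matches, so nothing is split
        System.out.println("/w bob hello".split("//s").length);
    }
}
```

When comparing the parsed username with the client's name, use equals() rather than ==, since == on strings compares references, not contents.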
I am using a terminal emulator library to create a terminal, and I use it to send the data entered over serial to a serial device. When the data is sent back I want to parse it and show the most important information to the user in an editText. Currently I receive byte arrays/chunks and convert them to a string. When I get a \r or a \n I create a new string and the process repeats. This is fine for most commands, but some commands return results over multiple lines, like "show vlan" here:
When I loop through this I get a string for each line. The first would contain VLAN Name Status and Ports, as an example. So now I have a problem: how can I report that VLAN 1 has x ports active when the pieces are in different strings? Here is the code and screenshot for a current, easier command where I am interested in only one line:
Handler viewHandler = new Handler();
Runnable updateView = new Runnable() {
@Override
public void run() {
mEmulatorView.invalidate();
if (statusBool == true) {
for (int i = 0; i < dataReceived.length(); i++) {
parseCommand = parseCommand + dataReceived.charAt(i);
if (dataReceived.charAt(i) == '\n' || dataReceived.charAt(i) == '\r'){
if(parseCommand.contains("KlasOS"))
{
String[] tokens = parseCommand.split("\\s{1,}");
final String ReceivedText = mReceiveBox.getText().toString() + " "
+ new String("Software Version: " + tokens[1] + "\n" );
runOnUiThread(new Runnable() {
public void run() {
mReceiveBox.setText(ReceivedText);
mReceiveBox.setSelection(ReceivedText.length());
}
});
}
parseCommand = "";
}
}
statusBool = false;
viewHandler.postDelayed(updateView, 1000);
}
}
};
Now I would like to change this so I can deal with multiple lines. Would the best way be to store strings if they contain certain information?
I need this outputted on the right hand editText:
"The following ports are on vlan 1: Fa1/0, fa1/1, fa1/2, fa1/3, fa1/4, fa1/5, fa1/6, fa1/7, fa1/8, fa1/9, fa1/10, fa1/11, Gi0"
Basically, you need a way to reliably detect the end of a command result. Then it boils down to sending your command, reading data from the device until you encounter the end of result, and finally parsing that result.
I would scan for the prompt (switch#) as you do in your own answer. Maybe you are even able to force the device to use a more peculiar character sequence, which is unlikely to occur in the regular output of commands and makes it easier to detect the end of a result. For example, you could try to configure the prompt to include a control character like ^G or ^L. Or if your users don't mind, you could always send a second command that emits such a sequence, for example, "show vlan; echo ^G".
You should also be prepared for command errors, which result in different output, for example more or fewer lines than expected or a totally different output format. A result may even contain both regular output and a warning or an error.
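A minimal sketch of that idea: append every received chunk to a buffer and only hand the result to the parser once the prompt has arrived. PromptBuffer and feed are hypothetical names, and the prompt string ("switch#" here) is passed in, since it depends on the device.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: buffers serial chunks until the prompt marks the end of a result.
final class PromptBuffer {
    private final String prompt;
    private final StringBuilder pending = new StringBuilder();

    PromptBuffer(String prompt) {
        this.prompt = prompt;
    }

    // Feeds one received chunk. Returns the non-blank lines of the finished
    // result once the prompt has been seen, or null while the result is incomplete.
    List<String> feed(String chunk) {
        pending.append(chunk);
        int end = pending.indexOf(prompt);
        if (end < 0) {
            return null; // prompt not seen yet, keep buffering
        }
        String result = pending.substring(0, end);
        pending.delete(0, end + prompt.length()); // keep anything after the prompt
        List<String> lines = new ArrayList<>();
        for (String line : result.split("\\R")) { // \R matches \r\n, \r and \n
            if (!line.trim().isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        PromptBuffer buffer = new PromptBuffer("switch#");
        // made-up sample output, split across two chunks
        System.out.println(buffer.feed("VLAN Name Status Ports\r\n1 default "));
        System.out.println(buffer.feed("active Fa1/0, Fa1/1\r\nswitch#"));
    }
}
```

The returned lines can then be parsed as a whole, so a value that spans several lines (like the port list of one VLAN) is available in one place. Handling error output would still need separate checks on those lines.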
I solved this in a messy way with a boolean and a few strings. I made a method for appending strings.
if((parseCommand.contains("VLAN Name") && parseCommand.contains("Status")&& parseCommand.contains("Ports"))
|| ((ShowVlanAppend.contains("VLAN Name")&& ShowVlanAppend.contains("Status")&& ShowVlanAppend.contains("Ports"))))
{
commandParse();
if(finalCommandBool == true){
runOnUiThread(new Runnable() {
public void run() {
mReceiveBox.setText(finalCommand);
mReceiveBox.setSelection(finalCommand.length());
ShowVlanAppend = "";
finalCommand = "";
finalCommandBool = false;
}
});
}
}
public void commandParse()
{
if (!parseCommand.contains("switch#")){
ShowVlanAppend = ShowVlanAppend + parseCommand;
}
else{
finalCommand = ShowVlanAppend;
finalCommandBool = true;
}
}
public class Parser {
public static void main(String[] args) {
Parser p = new Parser();
p.matchString();
}
parserObject courseObject = new parserObject();
ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
ArrayList<String> courseNames = new ArrayList<String>();
String theWebPage = " ";
{
try {
URL theUrl = new URL("http://ocw.mit.edu/courses/");
BufferedReader reader =
new BufferedReader(new InputStreamReader(theUrl.openStream()));
String str = null;
while((str = reader.readLine()) != null) {
theWebPage = theWebPage + " " + str;
}
reader.close();
} catch (MalformedURLException e) {
// do nothing
} catch (IOException e) {
// do nothing
}
}
public void matchString() {
// this is my regex that I am using to compare strings on input page
String matchRegex = "#\\w+(-\\w+)+";
Pattern p = Pattern.compile(matchRegex);
Matcher m = p.matcher(theWebPage);
int i = 0;
while (!m.hitEnd()) {
try {
System.out.println(m.group());
courseNames.add(i, m.group());
i++;
} catch (IllegalStateException e) {
// do nothing
}
}
}
}
What I am trying to achieve with the above code is to get the list of departments on the MIT OpenCourseWare website. I am using a regular expression that matches the pattern of the department names as in the page source. And I am using a Pattern object and a Matcher object and trying to find() and print these department names that match the regular expression. But the code is taking forever to run, and I don't think reading in a webpage using BufferedReader takes that long. So I think I am either doing something horribly wrong or parsing websites takes a ridiculously long time. I would appreciate any input on how to improve performance or correct a mistake in my code, if any. I apologize for the badly written code.
The problem is with the code
while ((str = reader.readLine()) != null)
theWebPage = theWebPage + " " +str;
The variable theWebPage is a String, which is immutable. For each line read, this code creates a new String with a copy of everything that's been read so far, with a space and the just-read line appended. This is an extraordinary amount of unnecessary copying, which is why the program is running so slowly.
I downloaded the web page in question. It has 55,000 lines and is about 3.25MB in size. Not too big. But because of the copying in the loop, the lines read so far are recopied on every iteration, for a total of about 1.5 billion line copies (1/2 of 55,000 squared). The program is spending all its time copying and garbage collecting. I ran this on my laptop (2.66GHz Core2Duo, 1GB heap) and it took 15 minutes to run when reading from a local file (no network latency or web crawling countermeasures).
To fix this, make theWebPage into a StringBuilder instead, and change the line in the loop to be
theWebPage.append(" ").append(str);
You can convert theWebPage to a String using toString() after the loop if you wish. When I ran the modified version, it took a fraction of a second.
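Put together, the reading loop might look like the following sketch. PageReader and readAll are names made up for this example, and the IOException is rethrown unchecked only to keep the sketch short.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper: reads all lines into one string without quadratic copying.
final class PageReader {

    static String readAll(BufferedReader reader) {
        StringBuilder page = new StringBuilder();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                // appends in place instead of copying everything read so far
                page.append(' ').append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return page.toString(); // convert once, after the loop
    }

    public static void main(String[] args) {
        String page = readAll(new BufferedReader(new StringReader("first\nsecond\nthird")));
        System.out.println(page);
    }
}
```

For the real page you would wrap theUrl.openStream() in an InputStreamReader and BufferedReader, as the original code does, and pass that reader in.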
BTW your code is using a bare code block within { } inside a class. This is an instance initializer (as opposed to a static initializer). It gets run at object construction time. This is legal, but it's quite unusual. Notice that it misled other commenters. I'd suggest converting this code block into a named method.
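If the distinction between the two kinds of initializer is unfamiliar, this small sketch shows when each one runs. InitOrderDemo and its LOG list are invented for the demonstration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: records the order in which initializers run.
final class InitOrderDemo {
    static final List<String> LOG = new ArrayList<>();

    static {
        // static initializer: runs once, when the class is initialized
        LOG.add("static");
    }

    {
        // instance initializer: runs for every new object, before the constructor body
        LOG.add("instance");
    }

    InitOrderDemo() {
        LOG.add("constructor");
    }

    public static void main(String[] args) {
        new InitOrderDemo();
        System.out.println(LOG);
    }
}
```

Running main records "static", then "instance", then "constructor". In the Parser class above, this means the page download happens inside new Parser(), before matchString() is ever called.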
Is this your whole program? Where is the declaration of parserObject?
Also, shouldn't all of this code be in your main() prior to calling matchString()?
parserObject courseObject = new parserObject();
ArrayList<parserObject> courseObjects = new ArrayList<parserObject>();
ArrayList<String> courseNames = new ArrayList<String>();
String theWebPage=" ";
{
try {
URL theUrl = new URL("http://ocw.mit.edu/courses/");
BufferedReader reader = new BufferedReader(new InputStreamReader(theUrl.openStream()));
String str = null;
while((str = reader.readLine())!=null)
{
theWebPage = theWebPage+" "+str;
}
reader.close();
} catch (MalformedURLException e) {
} catch (IOException e) {
}
}
You are also catching exceptions and not displaying any error messages. You should always display an error message and do something when you encounter an exception. For example, if you can't download the page, there is no reason to try to parse an empty string.
From your comment I learned about static blocks in classes (thank you, I didn't know about them). However, from what I've read, you need to put the keyword static before the opening { of the block. Also, it might just be better to put the code into your main; that way you can exit if you get a MalformedURLException or IOException.
You can, of course, solve this assignment with the limited JDK 1.0 API, and run into the issue that Stuart Marks helped you solve in his excellent answer.
Or, you just use a popular de-facto standard library, like for instance, Apache Commons IO, and read your website into a String using a no-brainer like this:
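// using these...
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;
// run this...
try (InputStream is = new URL("http://ocw.mit.edu/courses/").openStream()) {
// assumes the page is UTF-8; the one-argument toString(is) is deprecated in newer Commons IO
theWebPage = IOUtils.toString(is, StandardCharsets.UTF_8);
}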