I have a text file that consists of strings. What I want to do is separate the strings containing "[ham]" and the strings containing "[spam]" into different arrays. How can I do that? I thought about using a regex to recognize the pattern (ham & spam), but I have no idea where to start. Please help me.
Strings in the text file:
good [ham]
very good [ham]
bad [spam]
very bad [spam]
very bad, very bad [spam]
And I want the output to look like this:
Ham array:
good
very good
Spam array:
bad
very bad
very bad, very bad
Help me please.
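Instead of using an array, I think you should use an ArrayList:
import java.util.ArrayList;
import java.util.List;
List<String> ham = new ArrayList<String>();
List<String> spam = new ArrayList<String>();
// for each line read from the file:
if (line.contains("[ham]")) {
    // keep only the text before the label; trim() drops the trailing space
    ham.add(line.substring(0, line.indexOf("[ham]")).trim());
}
if (line.contains("[spam]")) {
    spam.add(line.substring(0, line.indexOf("[spam]")).trim());
}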
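Putting those pieces together, a minimal runnable sketch might look like the following. It assumes the file is UTF-8 and that each label appears once per line; the class name HamSpamSplitter, the helper extract and the default path input.txt are illustrative placeholders, not part of the original answer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class HamSpamSplitter {

    // Returns the text before "[label]" for every line that contains that label.
    public static String[] extract(List<String> lines, String label) {
        String tag = "[" + label + "]";
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            int idx = line.indexOf(tag);
            if (idx >= 0) {
                // trim() removes the space between the text and the label
                result.add(line.substring(0, idx).trim());
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) throws IOException {
        // "input.txt" is a placeholder; pass your own path as the first argument
        String path = args.length > 0 ? args[0] : "input.txt";
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);

        System.out.println("Ham array:");
        for (String s : extract(lines, "ham")) System.out.println(s);
        System.out.println("Spam array:");
        for (String s : extract(lines, "spam")) System.out.println(s);
    }
}
```

Collecting into a List first and converting to an array only at the end avoids having to know the number of matching lines in advance.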
If you really need to do it that way (with a regex and arrays as output), you could write code like this:
import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class StringResolve {
    public static void main(String[] args) {
        try {
            // read data from some source (here: a resource on the classpath)
            URL exampleTxt = StringResolve.class.getClassLoader().getResource("me/markoutte/sandbox/_25989334/example.txt");
            Path path = Paths.get(exampleTxt.toURI());
            List<String> strings = Files.readAllLines(path, StandardCharsets.UTF_8);
            // init all my patterns & lists
            Pattern ham = getPatternFor("ham");
            List<String> hams = new LinkedList<>();
            Pattern spam = getPatternFor("spam");
            List<String> spams = new LinkedList<>();
            // check all of them
            for (String string : strings) {
                Matcher hamMatcher = ham.matcher(string);
                if (hamMatcher.matches()) {
                    // we choose only text without label here
                    hams.add(hamMatcher.group(1));
                }
                Matcher spamMatcher = spam.matcher(string);
                if (spamMatcher.matches()) {
                    // we choose only text without label here
                    spams.add(spamMatcher.group(1));
                }
            }
            // output data through arrays
            String[] hamArray = hams.toArray(new String[hams.size()]);
            System.out.println("Ham array");
            for (String s : hamArray) {
                System.out.println(s);
            }
            System.out.println();
            String[] spamArray = spams.toArray(new String[spams.size()]);
            System.out.println("Spam array");
            for (String s : spamArray) {
                System.out.println(s);
            }
        } catch (URISyntaxException | IOException e) {
            e.printStackTrace();
        }
    }
    private static Pattern getPatternFor(String label) {
        // Regex pattern for strings of the form: some text [label]
        // group(1) is the text before the label
        return Pattern.compile(String.format("(.+?)\\s(\\[%s\\])", label));
    }
}
If the file is somewhere on your drive rather than on the classpath, you can use Paths.get("some/path/to/file") instead.
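As a variation on the regex answer, a single pattern with a capturing group for the label can handle both cases in one pass and groups lines by whatever label they carry. This is a sketch, assuming every label is a single word in square brackets at the end of its line; LabelRegexSplit and groupByLabel are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LabelRegexSplit {

    // Group 1: the text before the label; group 2: the label name inside [...].
    private static final Pattern LABELED = Pattern.compile("^(.*?)\\s*\\[(\\w+)\\]\\s*$");

    // Groups lines by label, keeping labels in the order they first appear.
    public static Map<String, List<String>> groupByLabel(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LABELED.matcher(line);
            if (m.matches()) {
                groups.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(m.group(1));
            }
            // lines without a trailing [label] are ignored
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "good [ham]", "very good [ham]", "bad [spam]",
                "very bad [spam]", "very bad, very bad [spam]");
        Map<String, List<String>> groups = groupByLabel(lines);
        String[] hamArray = groups.getOrDefault("ham", Collections.emptyList()).toArray(new String[0]);
        String[] spamArray = groups.getOrDefault("spam", Collections.emptyList()).toArray(new String[0]);
        System.out.println("Ham array: " + Arrays.toString(hamArray));
        System.out.println("Spam array: " + Arrays.toString(spamArray));
    }
}
```

Because the label is captured rather than hard-coded, a new label such as [eggs] would get its own group without changing the pattern.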
Related
I have written code that reads a .txt file and places each line in its own array element (not very hard). It works, but this was not my original intention: I want every word placed in a new array element, not every line. Here is my current code; can someone help? Thank you!
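import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Scanner;
public static ArrayList<String> read_file() {
    try {
        ArrayList<String> data_base = new ArrayList<String>();
        Scanner s1 = new Scanner(new File("C:\\Users\\Jcool\\OneDrive\\A Levels\\Computer Science\\CSV files\\data convert\\convert.txt"));
        // nextLine() returns a whole line, so each element is one line, not one word
        while (s1.hasNextLine()) {
            data_base.add(s1.nextLine());
        }
        return data_base;
    } catch (FileNotFoundException e) {
        // the exception is silently ignored, so a missing file just yields null
    }
    return null;
}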
Read the whole file at once and split it into an array.
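import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
private static String readAllBytes(String filePath)
{
    String content = "";
    try
    {
        // decode as UTF-8 (assumed file encoding) instead of the platform default charset
        content = new String(Files.readAllBytes(Paths.get(filePath)), StandardCharsets.UTF_8);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
    return content;
}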
Create a method named readAllBytes and call it like this:
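/* String to split. */
String stringToSplit = readAllBytes(filePath);
String[] tempArray;
/* delimiter: one or more whitespace characters (spaces, tabs, line breaks),
   so words on different lines are separated too */
String delimiter = "\\s+";
/* given string will be split by the argument delimiter provided;
   trim() avoids an empty first element when the file starts with whitespace */
tempArray = stringToSplit.trim().split(delimiter);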
If you mean that you want to split each line into an array, check this answer.
Take a look at the String.split(String) method. It returns a String[]; note that its argument is a regular expression. For example:
String string = "AAA-BBB";
String[] parts = string.split("-");
String part1 = parts[0]; // AAA
String part2 = parts[1]; // BBB
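For the original word-per-element question, the asker's own loop can also be adapted directly: Scanner.next() returns one whitespace-separated token at a time, whereas nextLine() returns a whole line. A minimal sketch, where the class name WordReader and the default path convert.txt are placeholders:

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class WordReader {

    // Scanner.next() returns the next whitespace-separated token, so each word
    // (not each line) becomes its own list element.
    public static List<String> readWords(Scanner scanner) {
        List<String> words = new ArrayList<>();
        while (scanner.hasNext()) {
            words.add(scanner.next());
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // "convert.txt" is a placeholder path; substitute your own file
        String path = args.length > 0 ? args[0] : "convert.txt";
        try (Scanner s1 = new Scanner(new File(path))) {
            System.out.println(readWords(s1));
        }
    }
}
```

Taking a Scanner as the parameter keeps the word-splitting logic independent of where the text comes from, and try-with-resources closes the file afterwards.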
I have the following code:
String[] stringArray = new String[] { "One,", "Two", "Three" };
System.out.println(Arrays.toString(stringArray));
which produces the following string:
[One,, Two, Three]
Right now it is impossible to convert this string back into the same String[] with three elements, because the comma inside "One," produces two consecutive commas (,,).
How can I do this conversion correctly?
UPDATE
Arrays.toString(stringArray)
is just one particular case, and I'm not limited to this approach. I need an approach where converting from String[] to String and back from String to String[] is lossless, so that the round trip always gives back the original array.
You state that "Arrays.toString is absolutely not required."
I suggest you serialize the array with Java serialization and encode the resulting bytes as Base64:
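import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.Base64;
// static so it can be called from the static main method below
public static String serializeArray(final String[] data) {
    try (final ByteArrayOutputStream boas = new ByteArrayOutputStream();
         final ObjectOutputStream oos = new ObjectOutputStream(boas)) {
        oos.writeObject(data);
        oos.flush(); // make sure all buffered bytes reach boas before reading them
        return Base64.getEncoder().encodeToString(boas.toByteArray());
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}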
Then deserialize the Base64 to an array:
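import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.Base64;
// static so it can be called from the static main method below;
// only deserialize data you produced yourself, never untrusted input
public static String[] deserializeArray(final String data) {
    try (final ByteArrayInputStream bias = new ByteArrayInputStream(Base64.getDecoder().decode(data));
         final ObjectInputStream ois = new ObjectInputStream(bias)) {
        return (String[]) ois.readObject();
    } catch (IOException | ClassNotFoundException e) {
        throw new RuntimeException(e);
    }
}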
This requires Java 8 (java.util.Base64 was added in Java 8).
Example:
public static void main(String args[]) throws Exception {
    String[] stringArray = new String[]{"One,", "Two", "Three"};
    String serialized = serializeArray(stringArray);
    String[] deserialized = deserializeArray(serialized);
    System.out.println(Arrays.toString(stringArray));
    System.out.println(serialized);
    System.out.println(Arrays.toString(deserialized));
}
Output
[One,, Two, Three]
rO0ABXVyABNbTGphdmEubGFuZy5TdHJpbmc7rdJW5+kde0cCAAB4cAAAAAN0AARPbmUsdAADVHdvdAAFVGhyZWU=
[One,, Two, Three]
Note that this works for any object that implements Serializable, not just String[].
As a simple alternative, you could replace , with \, before joining the array, and replace \, with , again after splitting it. This relies on the common "escaped delimiter" pattern (RFC 4180 CSV uses quoting instead, but many CSV-like formats use backslash escapes). But it will fail if an element already contains \, or ends with a backslash, and Pattern.splitAsStream also drops trailing empty elements, so it is less robust: YMMV.
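import static java.util.stream.Collectors.joining;
import java.util.Arrays;
import java.util.regex.Pattern;
public static String serializeArray(final String[] data) {
    return Arrays.stream(data)
            .map(s -> s.replace(",", "\\,"))   // escape commas inside elements
            .collect(joining(","));
}
public static String[] deserializeArray(final String data) {
    // split only on commas that are not preceded by a backslash
    return Pattern.compile("(?<!\\\\),").splitAsStream(data)
            .map(s -> s.replace("\\,", ","))   // unescape
            .toArray(String[]::new);
}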
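The failure modes of the escaping approach can be avoided by also escaping the escape character itself (\ becomes \\ before , becomes \,) and by parsing character by character instead of splitting with a lookbehind. The sketch below additionally writes a comma after every element rather than between elements, so an empty array and an array holding one empty string stay distinguishable. This is an illustrative encoding, not a standard format, and EscapedArrayCodec is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EscapedArrayCodec {

    // Each element is escaped and followed by an unescaped ',' terminator.
    // Backslashes are escaped first, then commas, so decoding is unambiguous.
    public static String serialize(String[] data) {
        StringBuilder sb = new StringBuilder();
        for (String s : data) {
            sb.append(s.replace("\\", "\\\\").replace(",", "\\,"));
            sb.append(',');
        }
        return sb.toString();
    }

    public static String[] deserialize(String encoded) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '\\' && i + 1 < encoded.length()) {
                // escaped character: take the next char literally
                current.append(encoded.charAt(++i));
            } else if (c == ',') {
                // unescaped comma terminates the current element
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] original = {"One,", "Two", "Three"};
        String encoded = serialize(original);
        System.out.println(encoded);
        System.out.println(Arrays.toString(deserialize(encoded)));
        System.out.println(Arrays.equals(original, deserialize(encoded)));
    }
}
```

For the question's array this prints One\,,Two,Three, then [One,, Two, Three] and finally true.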
Convert it to a format intended for this, like JSON. Using Jackson it would be something like this:
ObjectMapper objectMapper = new ObjectMapper();
String out = objectMapper.writeValueAsString(Arrays.asList(array));
And back:
@SuppressWarnings("unchecked")
List<String> strings = (List<String>) objectMapper.readValue(out, List.class);
String[] array2 = strings.toArray(new String[0]); // toArray() without an argument returns Object[]
I'm not sure exactly what you want to do, but the array separator , also appears inside your strings, so the simplest way to avoid the problem is not to build the string with the default array separator at all, like this:
String[] stringArray = new String[] { "One,", "Two", "Three" };
StringBuilder string = new StringBuilder();
string.append("[");
for (int i = 0; i < stringArray.length; i++) {
    string.append(stringArray[i] + (i == (stringArray.length - 1) ? "" : "; "));
}
string.append("]");
System.out.println(string);
// Arrays.toString is needed: printing a String[] directly only shows its type and hash code
System.out.println(Arrays.toString(string.toString().substring(1, string.length() - 1).split("; ")));
You could surely do more to make it work with the default array separator, but that depends on what you want to do; I just chose the simplest way. Note that this still breaks if an element itself contains "; ".
As part of a project I'm working on, I'd like to remove duplicate line entries from a file I generate. These duplicates often won't occur near each other, however. I came up with a method in Java that is supposed to find the duplicates in the file: I stored two strings from each line in two ArrayLists and iterated over them, but it doesn't work, because with the nested for loops I end up entering the condition many times anyway.
I need an integrated solution for this, preferably in Java. Any ideas?
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;
public class duplicates {
    static BufferedReader reader = null;
    static BufferedWriter writer = null;
    static String currentLine;
    public static void main(String[] args) throws IOException {
        int count = 0, linecount = 0;
        String fe = null, fie = null, pe = null;
        File file = new File("E:\\Book.txt");
        ArrayList<String> list1 = new ArrayList<String>();
        ArrayList<String> list2 = new ArrayList<String>();
        reader = new BufferedReader(new FileReader(file));
        while ((currentLine = reader.readLine()) != null)
        {
            StringTokenizer st = new StringTokenizer(currentLine, "/"); //splits data into strings
            while (st.hasMoreElements()) {
                count++;
                fe = (String) st.nextElement();
                //System.out.print(fe+"/// ");
                //System.out.println("count="+count);
                if (count == 1) { //stores 1st string
                    pe = fe;
                    // System.out.println("first element "+fe);
                }
                else if (count == 5) {
                    fie = fe; //stores 5th string
                    // System.out.println("fifth element "+fie);
                }
            }
            count = 0;
            if (linecount > 0) {
                for (String s1 : list1)
                {
                    for (String s2 : list2) {
                        // checking condition: s1 and s2 can come from *different* earlier lines,
                        // so this also fires for mismatched pairs and may fire several times per line
                        if (pe.equals(s1) && fie.equals(s2)) {
                            System.out.println("duplicate found");
                            //System.out.println(s1+ " "+s2);
                        }
                    }
                }
            }
            list1.add(pe);
            list2.add(fie);
            linecount++;
        }
    }
}
Input:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
Expected output:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
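Use a Set<String> instead of an ArrayList<String>.
Duplicates aren't allowed in a Set, so if you just add every line to it and then get them back out, you'll have all the distinct strings.
Performance-wise it's also quicker than your nested for loops, since adding to or looking up in a HashSet takes constant time on average instead of a scan over every previous line.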
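Applied to the actual file, the Set approach could look like this sketch, which reads every line, keeps the first occurrence of each in its original order, and writes the result out. It assumes UTF-8 input and treats two lines as duplicates only if they are identical; DedupFile and dedupLines are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupFile {

    // Reads every line, keeps only the first occurrence of each line
    // (LinkedHashSet preserves insertion order) and writes the result to `out`.
    public static void dedupLines(Path in, Path out) throws IOException {
        List<String> lines = Files.readAllLines(in, StandardCharsets.UTF_8);
        Set<String> unique = new LinkedHashSet<>(lines);
        Files.write(out, unique, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DedupFile <input> <output>");
            return;
        }
        dedupLines(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Since all lines are read into memory before anything is written, the output path may even be the same as the input path.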
import java.util.ArrayList;
import java.util.Arrays;
public static void removeDups() {
    String[] input = new String[] { // Let's say you read the whole file into this string array
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/jangeer/_cwc/Crj_200/customer/plots/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
        "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
        "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
    };
    ArrayList<String> outPut = new ArrayList<>(); // the list that stores the output, i.e. the distinct lines
    Arrays.stream(input).distinct().forEach(x -> outPut.add(x)); // Java 8 streams: distinct() keeps the first occurrence of each line, in order
    outPut.forEach(System.out::println); // Just as an example I print everything; you can write the output back to the file with your own implementation.
}
The output when I ran this method was:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
EDIT
Non-Java 8 answer:
import java.util.Arrays;
import java.util.LinkedHashSet;
public static void removeDups() {
    String[] input = new String[] {
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/book1/_cwc/B737/customer/Special_Reports/",
        "/jangeer/_cwc/Crj_200/customer/plots/",
        "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
        "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
        "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
        "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
    };
    // output is your set of unique strings, in their original order
    LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input));
    for (String line : output) {
        System.out.println(line);
    }
}
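Note that both whole-line answers keep /jangeer/_cwc/ERJ170/customer/01_Highlights/, while the expected output in the question drops it. The question's own code compares only the first and fifth path segments, and by that rule the ERJ170 line duplicates /jangeer/_cwc/Crj_200/customer/01_Highlights/. A sketch that deduplicates on that key, assuming this is the intended rule (DedupByKey and removeDups are hypothetical names):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupByKey {

    // Keeps the first line for each (1st segment, 5th segment) pair, in input order.
    // Lines with fewer than five segments use the whole line as their key.
    public static List<String> removeDups(List<String> lines) {
        Set<List<String>> seenKeys = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // "/a/b/".split("/") yields a leading empty string, so drop empty segments
            String[] parts = Arrays.stream(line.split("/"))
                    .filter(p -> !p.isEmpty())
                    .toArray(String[]::new);
            List<String> key = parts.length >= 5
                    ? Arrays.asList(parts[0], parts[4])
                    : Collections.singletonList(line);
            // Set.add returns false if an equal key was already present
            if (seenKeys.add(key)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        removeDups(input).forEach(System.out::println);
    }
}
```

With the question's nine input lines this prints exactly the six lines of the expected output. Using a two-element list as the key avoids the false matches of the nested loops, because both segments must come from the same line.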
I have an array of strings that I load throughout my application, and it contains different words. I have a simple if statement to check whether a string contains letters or numbers, but not whether it is a real word.
I mean I only want words like AB2CD5X, and I want to remove all other words such as Hello 3, 3 word, or anything else that is an English word. Is it possible to keep only alphanumeric words and exclude those that contain a real English word?
I know how to check whether a string contains only alphanumeric characters:
Pattern p = Pattern.compile("[\\p{Alnum},.']*");
I also know:
// String.contains() takes literal text, so a regex check needs String.matches()
if (string.matches(".*[a-zA-Z].*") || string.matches(".*[0-9].*"))
What you need is a dictionary of English words. Then you basically scan your input and check if each token exists in your dictionary.
You can find text files of dictionary entries online, such as in Jazzy spellchecker. You might also check Dictionary text file.
Here is a sample code that assumes your dictionary is a simple text file in UTF-8 encoding with exactly one (lower case) word per line:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
public static void main(String[] args) throws IOException {
    final Set<String> dictionary = loadDictionary();
    final String text = loadInput();
    final List<String> output = new ArrayList<>();
    // by default splits on whitespace
    final Scanner scanner = new Scanner(text);
    while (scanner.hasNext()) {
        final String token = scanner.next().toLowerCase();
        if (!dictionary.contains(token)) output.add(token);
    }
    System.out.println(output);
}
private static String loadInput() {
    return "This is a 5gse5qs sample f5qzd fbswx test";
}
private static Set<String> loadDictionary() throws IOException {
    final Set<String> dictionaryWords = new HashSet<>();
    String line;
    // try-with-resources closes the reader even if reading fails
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(
            new FileInputStream("path_to_your_flat_dic_file"), StandardCharsets.UTF_8))) {
        while ((line = reader.readLine()) != null) dictionaryWords.add(line);
    }
    return dictionaryWords;
}
If you need more accurate results, you may need to reduce your words to their stems first, so that inflected forms still match a dictionary entry. See Apache's Lucene and its EnglishStemmer.
You can use the Cambridge Dictionaries API to check whether a word is a real (human) word; if you find a valid word, you can skip it.
As the documentation says, to use the library, you need to initialize a request handler and an API object:
DefaultHttpClient httpClient = new DefaultHttpClient(new ThreadSafeClientConnManager());
SkPublishAPI api = new SkPublishAPI(baseUrl + "/api/v1", accessKey, httpClient);
api.setRequestHandler(new SkPublishAPI.RequestHandler() {
    public void prepareGetRequest(HttpGet request) {
        System.out.println(request.getURI());
        request.setHeader("Accept", "application/json");
    }
});
To use the "api" object:
try {
    System.out.println("*** Dictionaries");
    JSONArray dictionaries = new JSONArray(api.getDictionaries());
    System.out.println(dictionaries);
    JSONObject dict = dictionaries.getJSONObject(0);
    System.out.println(dict);
    String dictCode = dict.getString("dictionaryCode");
    System.out.println("*** Search");
    System.out.println("*** Result list");
    JSONObject results = new JSONObject(api.search(dictCode, "ca", 1, 1));
    System.out.println(results);
    System.out.println("*** Spell checking");
    JSONObject spellResults = new JSONObject(api.didYouMean(dictCode, "dorg", 3));
    System.out.println(spellResults);
    System.out.println("*** Best matching");
    JSONObject bestMatch = new JSONObject(api.searchFirst(dictCode, "ca", "html"));
    System.out.println(bestMatch);
    System.out.println("*** Nearby Entries");
    JSONObject nearbyEntries = new JSONObject(api.getNearbyEntries(dictCode,
            bestMatch.getString("entryId"), 3));
    System.out.println(nearbyEntries);
} catch (Exception e) {
    e.printStackTrace();
}
ANTLR might help you.
ANTLR stands for ANother Tool for Language Recognition.
Hibernate, for example, uses ANTLR to parse its query language HQL (SELECT, FROM, etc.).
if(string.contains("[a-zA-Z]+") || string.contains([0-9]+])
I think this is a good starting point, but since you're looking for strings that contain both letters and numbers, you probably want && rather than ||. Also note that String.contains() searches for literal text, not a regular expression, so use String.matches() instead:
if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*"))
I guess you might also want to check for spaces, right? A space could indicate separate words or a sequence like 3 word. So in the end you could use:
if (string.matches(".*[a-zA-Z].*") && string.matches(".*[0-9].*") && !string.contains(" "))
Hope this helps.
You may try this:
First, tokenize the string using a StringTokenizer with the default delimiters. Discard each token that contains only digits or only letters; the remaining tokens are the words that combine digits and letters. You can use regular expressions to identify the digits-only and letters-only tokens.
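A minimal sketch of that tokenize-and-filter idea, using one regular expression with two lookaheads to require at least one letter and at least one digit. It only checks the letters-and-digits condition; excluding real English words would still need a dictionary as described earlier. AlnumCodeFilter and mixedTokens are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class AlnumCodeFilter {

    // Only letters and digits, with at least one of each:
    // (?=.*[A-Za-z]) requires a letter, (?=.*[0-9]) requires a digit.
    private static final Pattern MIXED =
            Pattern.compile("(?=.*[A-Za-z])(?=.*[0-9])[A-Za-z0-9]+");

    public static List<String> mixedTokens(String text) {
        List<String> result = new ArrayList<>();
        // default delimiters: space, tab, newline, carriage return, form feed
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (MIXED.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mixedTokens("AB2CD5X Hello 3 word 5gse5qs f5qzd fbswx"));
    }
}
```

For the sample input it prints [AB2CD5X, 5gse5qs, f5qzd]: Hello, word and fbswx have no digit, and 3 has no letter.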
Hi, I have a project that requires me to write the code in Java. Let's say I have this txt file:
GoodTitle Description
Gold The shiny stuff
Wheat What wheaties are made of
Wood To make more ships
Spices To disguise the taste of rotten food
Tobacco Smoko time
Coal To make them steam ships go
Coffee Wakes you up
Tea Calms you down
All I want to do is put the left side of the text (GoodTitle, Gold, Wheat, Wood, etc.) into one ArrayList and the right side of the text (Description, The shiny stuff, etc.) into another ArrayList. This is my current code:
import java.io.File;
import java.util.ArrayList;
import java.util.Scanner;
private Scanner x; // field shared by the methods below
public void openFile(){
    try{
        x = new Scanner(new File("D://Shipping.txt"));
    }
    catch (Exception e){
        System.out.println("File could not be found");
    }
}
public void readFile(){
    while (x.hasNextLine()){
        String a = x.next();
        x.nextLine();
        ArrayList<String> list = new ArrayList<String>();
        while (x.hasNext()){
            list.add(x.next());
        }
        System.out.printf("%s \n", list);
    }
}
public void closeFile(){
    x.close();
}
It probably needs some modification in readFile, as I am still confused about how to do it.
NOTE: I am not allowed to change the content of the txt file.
In my current code I still put everything into one ArrayList because I am unable to split the two columns.
Do I need a toString method? I have no idea how to do it.
Thanks in advance...
You have to read the left side into one list and the right side into another.
This code isn't right because x.next() doesn't actually return one column. How could it know what one column is? But it should give you an idea of how to do it.
ArrayList<String> listL = new ArrayList<String>();
ArrayList<String> listR = new ArrayList<String>();
while (x.hasNextLine()){
    x.nextLine();
    if (x.hasNext()){
        listL.add(x.next());
    } else {
        listL.add("");
    }
    if (x.hasNext()){
        listR.add(x.next());
    } else {
        listR.add("");
    }
}
System.out.println(listL);
System.out.println(listR);
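A version that does split the columns reads each whole line and splits it once, at the first run of whitespace, with String.split(regex, 2), so multi-word descriptions stay intact. This sketch assumes every title is a single word, as in the sample file; TwoColumnReader and its fields are hypothetical names.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class TwoColumnReader {

    public final List<String> titles = new ArrayList<>();
    public final List<String> descriptions = new ArrayList<>();

    // Splits each line at the first run of whitespace: the first word goes to
    // titles, everything after it (possibly several words) goes to descriptions.
    public void read(Scanner x) {
        while (x.hasNextLine()) {
            String line = x.nextLine().trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\s+", 2);
            titles.add(parts[0]);
            descriptions.add(parts.length > 1 ? parts[1] : "");
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        TwoColumnReader reader = new TwoColumnReader();
        try (Scanner x = new Scanner(new File("D://Shipping.txt"))) {
            reader.read(x);
        }
        System.out.println(reader.titles);
        System.out.println(reader.descriptions);
    }
}
```

The limit argument 2 is what keeps "The shiny stuff" together as one description instead of splitting it into three words.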
If you are willing to use a Map<String, String>, perhaps you could try something like this:
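import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
public static Map<String, String> getContents() throws IOException {
    // LinkedHashMap keeps the entries in file order (a HashMap would not)
    final Map<String, String> content = new LinkedHashMap<>();
    try (Scanner reader = new Scanner(new File("D://Shipping.txt"), "UTF-8")) {
        while (reader.hasNextLine()) {
            final String line = reader.nextLine();
            // limit 2: split only at the first run of spaces, so the whole description stays together
            final String[] split = line.split(" +", 2);
            content.put(split[0], split.length > 1 ? split[1] : "");
        }
    }
    return content;
}
public static void main(String args[]) throws IOException {
    final Map<String, String> content = getContents();
    content.keySet().forEach(k -> System.out.printf("%s -> %s%n", k, content.get(k)));
}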
Note that this solution uses Java 8 features (the lambda passed to forEach); you should be able to adapt it to a lower JDK level.