Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need to search a text file or a large string to check whether it contains any of a set of keywords (possibly millions). If it does, I have to highlight every keyword that matched. What approach should I take? Does Lucene provide a solution for this?
You've tagged your question with Elasticsearch. If you're open to using ES, I think percolation with highlighting may fit what you need. You could register each keyword as a separate query with the percolator and then run each document or string through it. It returns a list of the queries that matched, and you can combine it with highlighting.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-percolate.html
http://blog.qbox.io/elasticsesarch-percolator
You can use Lucene's ShingleFilter.
You will find lots of examples on the net; here is one: http://www.massapi.com/class/sh/ShingleFilter.html
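To make the shingle idea concrete, here is a minimal plain-Java sketch. It does not use Lucene or ShingleFilter itself; it only imitates the technique by building word n-grams ("shingles") and looking each one up in a hash set of keywords. The class name KeywordHighlighter, the [[ ]] markers, and the maxWords parameter are all hypothetical, and keywords are assumed to be lower-case with single spaces between words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordHighlighter {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Wraps the longest keyword match starting at each word in [[ ]]. */
    static String highlight(String text, Set<String> keywords, int maxWords) {
        // Record the start/end offsets of every word token.
        List<int[]> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            tokens.add(new int[] {m.start(), m.end()});
        }

        StringBuilder out = new StringBuilder();
        int copied = 0; // offset up to which the text has been emitted
        int i = 0;
        while (i < tokens.size()) {
            int matchedEnd = -1; // index of the last token of the longest match
            StringBuilder shingle = new StringBuilder();
            // Grow the shingle one word at a time, up to maxWords words.
            for (int j = i; j < tokens.size() && j < i + maxWords; j++) {
                if (j > i) {
                    shingle.append(' ');
                }
                shingle.append(text, tokens.get(j)[0], tokens.get(j)[1]);
                if (keywords.contains(shingle.toString().toLowerCase())) {
                    matchedEnd = j;
                }
            }
            if (matchedEnd >= 0) {
                int start = tokens.get(i)[0];
                int end = tokens.get(matchedEnd)[1];
                out.append(text, copied, start).append("[[").append(text, start, end).append("]]");
                copied = end;
                i = matchedEnd + 1; // continue after the match
            } else {
                i++;
            }
        }
        return out.append(text, copied, text.length()).toString();
    }

    public static void main(String[] args) {
        Set<String> keywords = Set.of("lucene", "text file");
        System.out.println(highlight("Search a text file with Lucene.", keywords, 2));
    }
}
```

Each lookup is a constant-time hash probe, so the cost depends on the text length and maxWords rather than on the number of keywords; with millions of keywords the main concern is the memory the set occupies.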
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I'm looking for a way to filter by verb on feed.getActivities() in the GetStream API, beyond the existing options such as lg_te. I want to keep only the activities whose verb equals something like writeArticle. Is there any way? Unfortunately, I couldn't find anything in the docs.
Thank you for all of your help.
You can't.
Instead, put these activities into a separate feed, read from it directly, and make your original feed follow this new feed.
Your current setup: add to X.
Your new setup: add to the article feed, and X follows the article feed, so everything works as before.
But now you can filter: just read the article feed.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm trying to match a word against some hard-coded values. Let's say I have the word
'revenue', but 'revenues'
should also be a match. In the same way,
'liability' > 'liabilities'.
What approach should we take here? Thanks in advance.
I have tried using my own algorithm, but it is very difficult to maintain a word library with the respective plural or singular forms.
If you don't want to maintain a full dictionary, you might implement some general rules plus a dictionary of exceptions to those rules.
But these are all quick and hacky solutions. Depending on how good the matching must be, other approaches are available, such as machine learning or language services on clouds like AWS or Azure.
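As a rough illustration of the rules-plus-exceptions idea, the sketch below reduces each word to a singular form before comparing. The class name PluralMatcher, the handful of suffix rules, and the tiny exception table are all assumptions chosen for the example; real English needs many more of both.

```java
import java.util.Map;

final class PluralMatcher {
    // Irregular forms that the suffix rules below would get wrong
    // (illustrative only, far from exhaustive).
    private static final Map<String, String> EXCEPTIONS = Map.of(
            "children", "child",
            "people", "person",
            "analyses", "analysis",
            "series", "series");

    /** Returns a best-effort singular form of the given word. */
    static String singular(String word) {
        String w = word.toLowerCase();
        String irregular = EXCEPTIONS.get(w);
        if (irregular != null) {
            return irregular;
        }
        // liabilities -> liability
        if (w.length() > 3 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // boxes -> box, classes -> class, matches -> match
        if (w.endsWith("ches") || w.endsWith("shes") || w.endsWith("sses") || w.endsWith("xes")) {
            return w.substring(0, w.length() - 2);
        }
        // revenues -> revenue, but leave words like "class", "status", "basis" alone
        if (w.length() > 2 && w.endsWith("s")
                && !w.endsWith("ss") && !w.endsWith("us") && !w.endsWith("is")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    /** True when both words reduce to the same singular form. */
    static boolean sameWord(String a, String b) {
        return singular(a).equals(singular(b));
    }

    public static void main(String[] args) {
        System.out.println(sameWord("revenue", "revenues"));
        System.out.println(sameWord("liability", "liabilities"));
    }
}
```

Comparing normalized forms means only the exception table has to be maintained by hand, rather than a full singular/plural pair for every word.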
You might want to look at Lucene's PorterStemmer. The idea is to compare the stems of both words instead of comparing singulars and plurals. You can read more about it here.
Here is the Maven dependency and below is an example:
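// Assumed to be the Snowball stemmer class bundled with Lucene's analysis module.
import org.tartarus.snowball.ext.PorterStemmer;

PorterStemmer stemmer = new PorterStemmer();
// Stem the singular form.
stemmer.setCurrent("liability");
stemmer.stem();
System.out.println(stemmer.getCurrent());
// Stem the plural form; it should print the same stem as above.
stemmer.setCurrent("liabilities");
stemmer.stem();
System.out.println(stemmer.getCurrent());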
The above returns the same stem for both words.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
The image below describes what I want to do: I need to add many values to these three tables.
I'm using the docx4j library.
You can use content control databinding for this; docx4j's OpenDoPE convention allows you to repeat table rows. And more recent versions of Word have a concept of repeating content controls; see https://www.docx4java.org/blog/2015/01/word-2013-repeatingsection-content-controls-ready-for-prime-time/
In principle, docx4j supports both, but it'll be easier to get help with the OpenDoPE approach.
To get started, try invoice.docx from https://github.com/plutext/docx4j/tree/master/sample-docs/word/databinding which is an example of repeating table rows.
To merge invoice-data.xml (from the same dir) into it, use https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ContentControlsMergeXML.java
If you like this approach, you'll need to author your own input document; to do this, you can try the "friendly" Word AddIn at https://opendope.org/implementations.html
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I am trying to extract an object from a JS text string in Java.
The regex I am using (the most accurate one so far):
PJ\s?[=:]\s?\{(.*\s*\})
This is the demo:
https://regex101.com/r/hlkEUc/3
As you can see, the full code appears at the end in single-line form. That copy is captured without problems, but in the middle of the text the regex tries to capture the same object and fails because of the line breaks.
Object to extract:
var PJ={yF:function(a,b){var c=a[0];a[0]=a[b%a.length];a[b]=c},It:function(a){a.reverse()},yp:function(a,b){a.splice(0,b)}};
I think you probably want this: PJ\s?[=:]\s?\{(.*[\r\n].*?)*?\};
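For use from Java, one alternative (not the pattern above) is to keep the original prefix and rely on the Pattern.DOTALL flag with a lazy quantifier, so that "." also crosses line breaks and the match stops at the first "};". This is only a sketch: the class name PjExtractor and the method extractBody are hypothetical, and it assumes that no function body inside the object itself ends with "};".

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PjExtractor {
    // DOTALL lets '.' match line terminators; the lazy '.*?' stops at the first "};".
    private static final Pattern PJ =
            Pattern.compile("PJ\\s?[=:]\\s?\\{(.*?)\\};", Pattern.DOTALL);

    /** Returns the text between the outer braces of the PJ object, if found. */
    static Optional<String> extractBody(String js) {
        Matcher m = PJ.matcher(js);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String js = "var PJ={yF:function(a,b){var c=a[0];\n"
                + "a[0]=a[b%a.length];a[b]=c},It:function(a){a.reverse()}};";
        System.out.println(extractBody(js).orElse("no match"));
    }
}
```

Because the match is lazy, it handles both the single-line and the multi-line form of the object, but a regex cannot balance nested braces in general; a JavaScript parser is the safer choice if the input varies.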
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I'm stuck on a simple question. I want to display formatted text in a Swing control and keep adding new values to it. I don't want to use .setText(.getText + text) for personal reasons (something like the append method of a text area is what I am looking for). I've tried JEditorPane and JTextPane, but neither has an append method. Which Swing control should I use?
While JEditorPane has no append method, you can certainly add text to its Document via its insertString(...) method, and I suggest that you look into doing this.
Edit
You ask:
It worked, but it seems to behave like setText: all the previous data vanishes. How do I keep the previous data?
Are you correctly passing in the first parameter, the offset? This should be the length of the current Document.
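A minimal sketch of appending this way, assuming a small helper of your own (DocumentAppender, append, and textOf are hypothetical names, not Swing API). It passes the Document's current length as the offset, so existing content stays in place:

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.Document;
import javax.swing.text.SimpleAttributeSet;
import javax.swing.text.StyleConstants;

final class DocumentAppender {
    /** Inserts text at the end of the document, keeping everything already there. */
    static void append(Document doc, String text, AttributeSet style) {
        try {
            // The offset is the current length, i.e. the position just past the last character.
            doc.insertString(doc.getLength(), text, style);
        } catch (BadLocationException e) {
            // Should not happen: getLength() is always a valid insert position.
            throw new IllegalStateException(e);
        }
    }

    /** Returns the full text currently held by the document. */
    static String textOf(Document doc) {
        try {
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = new DefaultStyledDocument();
        SimpleAttributeSet bold = new SimpleAttributeSet();
        StyleConstants.setBold(bold, true);

        append(doc, "first line\n", null);  // plain text
        append(doc, "second line\n", bold); // formatted text
        System.out.print(textOf(doc));
    }
}
```

With a real component, the same helper can be given the result of JTextPane.getStyledDocument() or JEditorPane.getDocument() instead of a standalone DefaultStyledDocument.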