I am working with the com.sencha.gxt (3.0) grid widget. I am trying to write a Java function that searches for a string in a row (across all columns) and only displays the rows that contain this string. Is there a built-in or easy method with which I can do this?
Thanks
No.
(I'm pretty sure you didn't actually mean to write a "yes or no" question...)
In short, it would be silly for GXT to assume that all data in your grid is a string or string-like, or only makes sense to compare in string-y ways. Plus, you might have all your data locally, so it makes sense to do a simple filter on the client, or you might really be saying 'please search all ten million rows, on the server/db, and only send back the ones that match to render on the client'.
Client-side filtering
The Store (probably a ListStore for a grid, or a TreeStore for the tree grid, but your question doesn't specify) has a built-in method for local filtering - you give it a function that takes an item (and some other possibly-relevant data), and returns true/false indicating whether the item should remain visible to the user. More than one filter can be applied, and if any filter says the item should be hidden, then it is gone. In other words, this can be thought of as a SQL where clause, where all statements are joined with and - if you want or, you can implement a filter that checks if any single filter wants the item visible.
This filtering mechanism operates over the entire row item, the T in Store<T>. It doesn't know what a column is, since you could be using the store in any number of ways - items in a combobox, points on a chart, etc. It lets you build your own tools to perform the filtering. If your data is generalized in a way that makes it easy to know what all possible columns are, then you could easily write a loop over those columns that checks if any one of them contains the string, in which case return true so that the item stays visible. Or, since you already built the ColumnModel full of ColumnConfigs, each of which has a ValueProvider, and you know that it makes sense to compare each value with .toString().contains(queryText), you could iterate over the columns that way.
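A rough sketch of that column-iterating approach (MyRow, store, columnModel and queryText are placeholder names, not from the question; check the exact StoreFilter signature against your GXT version):

store.setEnableFilters(true);
store.addFilter(new StoreFilter<MyRow>() {
    @Override
    public boolean select(Store<MyRow> store, MyRow parent, MyRow item) {
        for (ColumnConfig<MyRow, ?> column : columnModel.getColumns()) {
            Object value = column.getValueProvider().getValue(item);
            // keep the row visible if any column's text contains the query
            if (value != null && value.toString().contains(queryText)) {
                return true;
            }
        }
        return false;
    }
});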
Server-side filtering
In this case, we have very little control on the client over what is happening, since you are very likely off-loading that work to your backend database. We still know the list of columns in the column model, but we either send a query for cells matching that text, or send a single query for the entire row. Either way, we modify the paging load config object that is being used and send a new query for the data we want, and the actual work of doing the filtering is the server's job. It sounds from your question like this is not what you want, but without a lot more detail, it's hard to say for sure.
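One hedged sketch of the client side of that: define your own load config bean that carries the search text to the server (SearchPagingLoadConfig is not a GXT class, just a hypothetical subclass of GXT's PagingLoadConfigBean), and let the server apply it in its query:

// Hypothetical load config carrying the search text; the server-side
// handler reads getSearchText() and adds it to its WHERE clause.
public class SearchPagingLoadConfig extends PagingLoadConfigBean {
    private String searchText;

    public String getSearchText() {
        return searchText;
    }

    public void setSearchText(String searchText) {
        this.searchText = searchText;
    }
}

The actual filtering then happens entirely on the server; the client just re-runs the loader with the new config whenever the query text changes.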
You can achieve this through filtering.
The grid takes a store as a constructor argument, and modifications to the store are directly reflected in the user interface.
To enable filtering:
store.setEnableFilters(true);
To filter the grid, use the following commands:
store.addFilter(new StoreFilter<M>()
{
    @Override
    public boolean select(Store<M> store, M parent, M item)
    {
        // return true if you want to display the row, else false
        return true;
    }
});
Related
So I'm porting my Swing Java database application to Java FX (still a beginner here, I recently just learned the basics of FXML and the MVC pattern, so please bear with me).
I intend to load the data from my existing database to the "students" ObservableList so I can show it on a TableView, but on my original Swing code, I have a search TextField, and when the user clicks on a button or presses Enter, the program:
Executes an SQLite command that searches for specific records, and retrieves the RecordSet.
Creates a DefaultTableModel based on the RecordSet contents
And throws that TableModel to the JTable.
However, Java FX is a completely different beast (or at least it seems so to me--don't get me wrong, I love Java FX :D ) so I'm not sure what to do.
So, my question is, do I have to load ALL the students in the database, then use some Java code to filter out students that don't fit the search criteria (and display all students when the search text is blank), or do I still use SQLite for filtering and retrieving records (which means I need to clear the list and then add students every time a search is performed, and maybe it will also mess up the bindings? Maybe there will be a speed penalty with this method too? Besides that, it will also reset the currently selected record because I clear the list--basically, bad UI design that will negatively impact usability)
Depending on the right approach, there is also a follow-up question (sorry, I really can't find the answer to these even after Googling):
If I get ALL students from database and implement a search feature in Java, won't it use up more RAM than it should, because I am storing ALL the database data in RAM, instead of just the ones searched for? I mean, sure, even my lowly laptop has 4GB RAM, but the feeling of using more memory than I should makes me feel somewhat guilty LOL
If I choose to just update the contents of the ObservableList every time a new search has been performed, will it mess up the bindings? Do I have to set up the bindings again? How do I clear the contents of the ObservableList before adding the new contents?
I also have the idea of just setting the selected table item to the first record that matches the search string but I think it will be difficult to use, since only one record can be highlighted per search. Even if we highlight multiple rows, it'd be difficult to browse all selected items.
Please give me the proper way, not the "easy" way. This is my first time implementing a pattern (MVC or am I actually doing MVP, I don't know) and I realized how unmaintainable and ugly my previous programs are because I used my own style. This is a relatively big project that I need to support and improve for several years so having clean code and doing stuff the right way should help in maintaining the functionality of this program.
Thank you very much in advance for your help, and I hope I don't come off as a "dumb person who can't even Google" in asking these questions. Please bear with me here.
Basic design tradeoffs
You can, of course, do this either of the ways you describe. The basic tradeoffs are:
If you load everything from the database, and filter the table in Java, you use more memory (though not as much as you might think, as explained below)
If you filter from the database and reload every time the user changes the filter, there will be a bigger latency (delay) in displaying the data, as a new query will be executed on the database, with (usually) network communication between the database and the application being the biggest bottleneck (though there are others).
Database access and concurrency
In general, you should perform database queries on a background thread (see Using threads to make database requests); if you are frequently making database queries (i.e. filtering via the database), this gets complex and involves frequently disabling controls in the UI while a background task is running.
TableView design and memory management
The JavaFX TableView is a virtualized control. This means that the visual components (cells) are created only for visible elements (plus, perhaps, a small amount of caching). These cells are then reused as the user scrolls around, displaying different "items" as required. The visual components are typically quite memory-consumptive (they have hundreds of properties - colors, font properties, dimensions, layout properties, etc etc - most of which have CSS representations), so limiting the number created saves a lot of memory, and the memory consumption of the visible part of the table view is essentially constant, no matter how many items are in the table's backing list.
General memory consumption computations
The items observable list that forms the table's backing list contains only the data: it is not hard to ballpark-estimate the amount of memory consumed by a list of a given size. Strings use 2 bytes per character, plus a small fixed overhead, doubles use 8 bytes, ints use 4 bytes, etc. If you wrap the fields in JavaFX properties (which is recommended), there will be a few bytes overhead for each; each object has an overhead of ~16 bytes, and references themselves typically use up to 8 bytes. So a typical Student object that stores a few string fields will usually consume of the order of a few hundred bytes in memory. (Of course, if each has an image associated with it, for example, it could be a lot more.) Thus if you load, say 100,000 students from a database, you would use up of the order of 10-100MB of RAM, which is pretty manageable on most personal computer systems.
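For instance (rough arithmetic, not from the question): a Student with five short String fields wrapped in properties might cost roughly 5 x (about 50 bytes of character data + about 30 bytes of property wrapper, object header and reference overhead) plus the object's own header, i.e. somewhere around 400-500 bytes per Student, so 100,000 Students come to roughly 40-50 MB.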
Rough general guidelines
So normally, for the kind of application you describe, I would recommend loading what's in your database and filtering it in memory. In my usual field of work (genomics), where we sometimes need 10s or 100s of millions of entities, this can't be done. (If your database contains, say, all registered students in public schools in the USA, you may run into similar issues.)
As a general rule of thumb, though, for a "normal" object (i.e. one that doesn't have large data objects such as images associated with it), your table size will be prohibitively large for the user to comfortably manage (even with filtering) before you seriously stretch the memory capacity of the user's machine.
Filtering a table in Java (all objects in memory)
Filtering in code is pretty straightforward. In brief, you load everything into an ObservableList, and wrap the ObservableList in a FilteredList. A FilteredList wraps a source list and a Predicate, which returns true if an item should pass the filter (be included) or false if it is excluded.
So the code snippets you would use might look like:
ObservableList<Student> allStudents = loadStudentsFromDatabase();
FilteredList<Student> filteredStudents = new FilteredList<>(allStudents);
studentTable.setItems(filteredStudents);
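If you also want column sorting to keep working on the filtered items, the usual companion step (standard JavaFX API, not specific to this question) is to wrap the FilteredList in a SortedList, bind its comparator to the table's, and pass the SortedList to setItems instead:

SortedList<Student> sortedStudents = new SortedList<>(filteredStudents);
sortedStudents.comparatorProperty().bind(studentTable.comparatorProperty());
studentTable.setItems(sortedStudents);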
And then you can modify the predicate based on a text field with code like:
filterTextField.textProperty().addListener((obs, oldText, newText) -> {
    if (newText.isEmpty()) {
        // no filtering:
        filteredStudents.setPredicate(student -> true);
    } else {
        filteredStudents.setPredicate(student ->
            // whatever logic you need:
            student.getFirstName().contains(newText) || student.getLastName().contains(newText));
    }
});
This tutorial has a more thorough treatment of filtering (and sorting) tables.
Comments on implementing "filtering via queries"
If you don't want to load everything from the database, then you skip the filtered list entirely. Querying the database will almost certainly not work fast enough to filter (using a new database query) as the user types, so you would need an "Update" button (or action listener on the text field) which recomputed the new filtered data. You would probably need to do this in a background thread too. You would not need to set new cellValueFactorys (or cellFactorys) on the table's columns, or reload the columns; you would just call studentTable.setItems(newListOfStudents); when the database query finished.
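A hedged sketch of that flow with a javafx.concurrent.Task (studentDao.searchStudents is an assumed DAO method, not something from the question):

String searchText = filterTextField.getText();  // read on the FX Application Thread
Task<List<Student>> searchTask = new Task<List<Student>>() {
    @Override
    protected List<Student> call() throws Exception {
        // runs on a background thread; do the database query here
        return studentDao.searchStudents(searchText);
    }
};
searchTask.setOnSucceeded(e ->
    // back on the FX Application Thread once the query has finished
    studentTable.setItems(FXCollections.observableArrayList(searchTask.getValue())));
new Thread(searchTask).start();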
Suppose, I would like to implement data validation in JavaFX TableView.
Once the user has changed data in a row, I would like to validate it when the user tries to move to a different row. If the data is valid, I would like to allow the row change; if the data is invalid, I want to disallow it.
How to implement this?
Currently, I am trying to add a listener
getSelectionModel().selectedIndexProperty().addListener(indexFromTableToModel);
but any operations inside it cause bad table view behavior. Using
Platform.runLater
makes the situation better, but it is still imperfect.
Is there a convenient place, to perform row data validation and perform different operation depending on it?
You can write your own selection model that can be locked (if the current value is invalid). In that case the model won't change the internal selected index.
I'm working with a pretty niche requirement to model a relational structure within Solr and thought that a custom polyfield would be the most suitable solution to my problem. In short, each record in the index will have a number of embargo and expiry dates for when the content should be considered 'available'. These dates are grouped with another kind of categorisation (let's say by device), so for example, any given item in the index may be available for mobile users between two dates, but only available for desktop users between another two dates.
Much like the currency and the latlon types, I would index the values as a comma separated list representing each availability window, for example:
mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z
So, a single index record could look like
{
    id: "1234",
    text: ["foobarbaz"],
    availability: [
        "mobile,2013-09-23T00:00:00Z,2013-09-30T00:00:00Z",
        "pc,2013-09-22T00:00:00Z,2013-09-30T00:00:00Z"
    ]
}
The custom type would do the job of parsing the incoming value and storing it accordingly. Is this a viable solution? How would I approach the custom logic required at query time to filter by device and then make sure that NOW is within the provided dates?
My attempt so far has been based on the Currency field type, but now I've dialled it back to just storing the string in its un-parsed state. If I could prove that the filtering I want is even possible before using the polyfield features, then I'll know if it's worth continuing.
Does anybody else have any experience writing custom (poly)fields, or doing anything similar to what I'm doing?
Thanks!
If you want to be able to filter and search on these ranges, I don't think you'll have much luck storing records like that. It would make more sense to me to have a more structured document, something like:
id: "1234",
text: ["foobarbaz"],
mobileavailabilitystart: "mobile,2013-09-23T00:00:00Z",
mobileavailabilityend: "2013-09-30T00:00:00Z",
pcavailabilitystart: "2013-09-22T00:00:00Z",
pcavailabilityend: "2013-09-30T00:00:00Z"
Indexing the full contents of a CSV line in Lucene/Solr, in a single field, would allow you to perform full-text searches on it, but would not be a good way to support querying for a specific element of it.
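For example, with the flattened field names suggested above, a filter for content currently available on mobile could look like this (a SolrJ sketch; solr stands in for your SolrJ client instance):

SolrQuery query = new SolrQuery("foobarbaz");
query.addFilterQuery("mobileavailabilitystart:[* TO NOW]");  // embargo date has passed
query.addFilterQuery("mobileavailabilityend:[NOW TO *]");    // expiry date not yet reached
QueryResponse response = solr.query(query);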
I wanted to ask for a clarification about the use of RowFilter and its effects on performances.
I implemented a filter through the method include(Entry) that, for every row, simply checks whether its corresponding value in the model has a boolean flag set: if so, it returns true, else false.
Now, the JTable I have can be pretty big (1000000+ rows), and I wasn't sure if this simple filtering, applied to such a big input set, could be costly.
How does the mapping between filtered rows and underlying data work exactly? I mean, does it store any extra data, or does it just draw the rows that match the filter "on the fly"?
Thank you very much for the clarifications.
No component, in any programming language, is designed to display such a huge matrix of data on screen, so you have two choices:
The faster way is to leave this job to the SQL engine, which is designed for searching and filtering rows in huge database tables.
The slower way is to hold the data in a Map, apply a Comparator there, and have the JTable display only the result of that Comparator.
Expanding on @mKorbel's second point, a TableModel for a very large data set may contain a List<Record>, as suggested here. The list may be sorted using a suitable Comparator and dynamically partitioned using the subList() method. At any one time, the corresponding JTable can only see one such partition, and separate controls will be required to change partitions.
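A rough sketch of that idea (Record, the single column and the partition size are placeholders, not from the original answer):

class PartitionedTableModel extends AbstractTableModel {
    private final List<Record> allRecords;  // the full, already sorted data set
    private List<Record> partition;         // the slice currently visible to the JTable

    PartitionedTableModel(List<Record> allRecords, int partitionSize) {
        this.allRecords = allRecords;
        this.partition = allRecords.subList(0, Math.min(partitionSize, allRecords.size()));
    }

    // called by separate paging controls to switch partitions
    void showPartition(int from, int to) {
        partition = allRecords.subList(from, to);
        fireTableDataChanged();
    }

    @Override public int getRowCount() { return partition.size(); }
    @Override public int getColumnCount() { return 1; }
    @Override public Object getValueAt(int row, int col) { return partition.get(row); }
}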
I have a table called Token in my database that represents tokenized texts.
Each row has attributes like textblock, sentence and position (for identifying which text the token is from) and logical fields like text, category, chartype, etc.
What I want to do is iterate over all tokens to find patterns and perform some operations. For example, merging two adjacent tokens that both have the category Name into one (and, after this, resetting the positions). I think I will need some kind of list.
What is the best way to do this? With SQL queries to find the patterns, or by iterating over all tokens in the table? I think the queries will get quite complex, and maybe iterating over a list will be simpler, but I don't know which way to go (for example, retrieving into a Java list, or using a language in which I can iterate and make changes right on the database).
So that this question doesn't get closed: what I want to know is the most recommended way to do this. I'm using Java, but if another language is better, no problem; I think I will need to use R to do some statistical calculations.
Edit: The table is large, millions of rows; loading it entirely into memory is not possible.
If you are working with a small table, or proving out a merge strategy, then just set up a query that finds all of the candidate duplicate lines and dump the relevant columns out to a table. Then view that table in a text editor or spreadsheet to see if your hypothesis about the duplication is correct.
Keep in mind that any time you try to merge two rows into one, you will be deleting data. Worst case is that you might merge ALL of your rows into one. Proceed with caution!
This is an engineering decision to be made, based mostly on the size of the corpus you want to maintain, and the kind of operations you want to perform on them.
If the size gets bigger than "what fits in the editor", you'll need some kind of database. That may or may not be an SQL database. But there is also the code part: if you want to perform non-trivial operations on the data, you might need a real programming language (could be anything: C, Java, Python, anything goes). In that case, the communication with the database will become a bottleneck: you need to generate queries that produce results that fit in the application program's memory. SQL is powerful enough to represent and store N-grams and do some calculations on them, but that is about as far as you are going to get. In any case the database has to be fully normalised, and that will cause it to be more difficult to understand for non-DBAs.
My own toy project, http://sourceforge.net/projects/wakkerbot/ used a hybrid approach:
the data was obtained by a python crawler
the corpus was stored as-is in the database
the actual (modified MegaHal) Markov code stores its own version of the corpus in a (binary) flatfile, containing the dictionary, N-grams, and the associated coefficients.
the training and text generation is done by a highly optimised C program
the output was picked up by another python script, and submitted to the target.
[in another life, I would probably have done some more normalisation, and stored N-grams or trees in the database. That would possibly cause the performance to drop to only a few generated sentences per second. It now is about 4000/sec]
My gut feeling is that what you want is more like a "linguistic workbench" than a program that does exactly one task efficiently (like wakkerbot). In any case you'll need to normalise a bit more: store the tokens as {tokennumber,tokentext} and refer to them only by number. Basically, a text is just a table (or array) containing a bunch of token numbers. An N-gram is just a couple of tokennumbers+the corresponding coefficients.
This is not the most optimized method but it's a design that allows you to write the code easily.
write an entity class that represents a row in your table.
write a factory method that allows you to get the entity object for a given row id, i.e. a method that creates an object of the entity class with the values from the specified row.
write methods that remove and insert a given row object into the table.
write a row counting method.
now, you can try to iterate over your table using your Java code. Remember that if you merge two rows, you need to correctly adjust the next index.
This approach uses little memory, but you will be issuing a lot of queries to create the row objects.
The concept is very similar or identical to ORM (Object-Relational Mapping). If you know how to use Hibernate or another ORM, then try those libraries.
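A rough sketch of the entity class and factory method described above, using plain JDBC (the table and column names - token, id, text, category, position - are assumptions):

class Token {
    long id;
    String text;
    String category;
    int position;
}

class TokenDao {
    private final Connection connection;

    TokenDao(Connection connection) {
        this.connection = connection;
    }

    // factory method: build the entity object for a given row id
    Token findById(long id) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(
                "SELECT id, text, category, position FROM token WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                Token t = new Token();
                t.id = rs.getLong("id");
                t.text = rs.getString("text");
                t.category = rs.getString("category");
                t.position = rs.getInt("position");
                return t;
            }
        }
    }

    // row counting method
    int countRows() throws SQLException {
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM token")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}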
IMO it'd be easier, and likely faster overall, to load everything into Java and do your operations there to avoid continually re-querying the DB.
There are some pretty strong numerical libs for Java and statistics, too; I wouldn't dismiss it out-of-hand until you're sure what you need isn't available (or is too slow).
This sounds like you're designing a text search engine. You should first see if pgsql's full text search engine is right for you.
If you do it without full text search, loading PL/R into PostgreSQL and learning to drive it is likely to be the fastest and most efficient solution. It'll allow you to put all this work into a few well-thought-out lines of R, and do it all in the db where access to the data is closest. The only time to avoid such a plan is when it would make the database server work VERY hard, like holding the dataset in memory and cranking a single cpu core across it. Then it's OK to do it app side.
Whether you use PL/R or not, access large data sets with a cursor; it's by far the most efficient way to get either single rows or smaller subsets of rows. If you do it with a select with a where clause for each thing you want to process, then you don't have to hold all those rows in memory at once. You can grab and discard parts of result sets while doing things like computing running averages etc.
Think about scale here. If you had a 5 TB database, how would you access it to do this the fastest? A poor scaling solution will come back to bite you even if it's only accessing 1% of the data set. And if you're already starting on a pretty big dataset today, it'll just get worse with time.
pl/R http://www.joeconway.com/plr/