I'm building an application that downloads a set of images from a website, extracts some features from them and then allows a user to compare an image she submits to the downloaded set, to see which one is the closest. At the moment the application downloads the images and extracts the features from them. Then the image and the feature get wrapped in an object and stored in a map, with the key as the name of the image, and the value as the aforementioned wrapped object.
Because this is stored in memory, each time I start the application it has to go through the quite expensive process of downloading and feature extraction. It would be much quicker if it could just load this info from disk, but I'm not sure on the best way to go about it - I've thought about these options:
RDBMS: something like Postgres or SQLite
NoSQL: something like Voldemort or Redis
Serialisation: use built-in Java methods to write objects to a file (could also be used in conjunction with a DB though...)
I want it to be really lightweight; I want to keep the application as small as possible and keep configuration down to a minimum. For this reason serialisation seems like the way to go, but I'd like a second (or more) opinion on that, because something about doing it that way just feels wrong. I can't quite put my finger on why I feel like that...
I should also say that users can add images to the set while the application is running, and I'd like to save these images too.
I wouldn't recommend serialization - just too many pitfalls.
If what you have is really just a map, then I think any of the key-value stores (like Redis) would be appropriate.
If you have more complex data, then you might want to consider a database (whether SQL or NoSQL).
I'm building an automation framework in Selenium using the Page Object design pattern.
The following are some of the kinds of data I'm using and where I have stored them:
PageObjects (xpath, id etc) - In the Page Classes itself
Configuration Data (wait-times, browser type , the URL etc) - In a properties file.
Other data - In a class as static variables.
Once the framework starts growing, it will be hard to store and organize all the data. I did some research on how others store data in their frameworks. Here is what I found:
Storing data (mostly page objects) in classes itself
Storing data in JSON
And some even suggested storing data in a database so that it would reduce reading times
Since there are a lot of options out there, I would like some feedback on the best way to store data and on how others have stored their data.
JSON or any temporary data store is the best option, since this is a framework and its purpose is to be reused across different projects.
I don't see any problem with the way you have stored your data.
Locators (by POM definition) should be stored in the page objects themselves.
Config data can be stored in some sort of config file... whatever you find convenient. You can use plain text, JSON, XML, etc. We use XML but that really comes down to personal preference.
I think this is fine also.
The framework doesn't really grow, the automation suite does. As long as you keep the data stored in the 3 places above consistently, I think you should be fine. The only issue I've run into with this approach is that sometimes certain pages have a LOT of functionality on them so the page objects grow quite large. In those cases, we found a way to divide the page into smaller chunks, e.g. one page had 22 tabs, each consisting of a different panel. In that case, we broke the page object into 22 different class files to keep the size more manageable and then hooked them all back into the main page as properties, e.g. mainPage.Panel1.someMethodOnPanel1();
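The split described above can be sketched in plain Java. All class and method names here are hypothetical, and the WebDriver plumbing a real page object needs is left out so the example stays self-contained:

```java
/** Sketch: a large page object split into per-panel classes (all names hypothetical). */
final class PagePanelsDemo {
    /** One tab of the large page, kept in its own class. */
    static final class GeneralPanel {
        String heading() { return "General"; }
    }

    /** Another tab; a real suite would have one such class per panel. */
    static final class BillingPanel {
        String heading() { return "Billing"; }
    }

    /** The main page only wires the panels together and exposes them. */
    static final class MainPage {
        final GeneralPanel general = new GeneralPanel();
        final BillingPanel billing = new BillingPanel();
    }

    public static void main(String[] args) {
        MainPage mainPage = new MainPage();
        // Tests reach panel behaviour through the main page.
        System.out.println(mainPage.billing.heading());
    }
}
```

Each panel class stays small, while test code still starts from a single main page object.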
I advise using an interface per device type to store the different kinds of selectors, for example:
import org.openqa.selenium.By;
import static org.openqa.selenium.By.cssSelector;
import static org.openqa.selenium.By.id;
import static org.openqa.selenium.By.linkText;
import static org.openqa.selenium.By.xpath;
// Fields declared in an interface are implicitly public static final
public interface DesktopMainPageSelector {
By FIRST_ELEMENT = cssSelector("selector_here");
By SECOND_ELEMENT = xpath("selector_here");
By THIRD_ELEMENT = id("selector_here");
}
Then just implement these selector interfaces wherever you need them.
You can also use enums for a more complex structure.
I found this to be the best solution, because it's easy to manage large numbers of selectors.
I want to store multiple values (String, Int and Date) in a file via Java in Android Studio.
I don't have much experience in this area, so I tried to google a bit, but I didn't find the solution I was looking for. So, maybe you can recommend something?
What I've tried so far:
Android offers a SharedPreferences feature, which allows a user to save a primitive value for a key. But I have multiple values for a key, so that won't work for me.
Another option is saving the data as a file on an external storage medium. So far, so good. But I want to keep the file size to a minimum and load the file as fast as possible. That's where I'm stuck: if I save all the values directly as plain text, I would need to parse the .txt file by hand to load the data, which will take time with many entries.
Is there a possibility to save multiple entries with multiple values for a particular key in an efficient way?
No need to reinvent the wheel. Most probably the best option for your case is a database. Look into SQLite or Realm.
You don’t divulge enough details about your data structure or volume, so it is difficult to give a specific solution.
Generally speaking, you have these three choices.
Serialize a collection
I have multiple values for a key
You could use a Map with a List or Set as its value. This has been discussed countless times on Stack Overflow.
Then use Serialization to write and read to storage.
Text file
Write a text file.
Use Tab-delimited or CSV format if appropriate. I suggest using the Apache Commons CSV library for that.
Database
If you have much data, or concurrency issues with multiple threads, use a database such as the H2 Database Engine.
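The first option above (a map with a list as its value, written with Java serialization) might look roughly like this. The `Entry` class and its three fields are assumptions based on the "String, Int and Date" values mentioned in the question, and the temp file stands in for wherever the app actually keeps its data:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MultiValueStore {
    /** One stored entry; the field names are hypothetical. */
    static final class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        final int number;
        final Date date;

        Entry(String text, int number, Date date) {
            this.text = text;
            this.number = number;
            this.date = date;
        }
    }

    /** Writes the whole map to the file in one go. */
    static void save(Map<String, List<Entry>> data, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeObject(new HashMap<>(data));
        }
    }

    /** Reads back a map previously written by save(). */
    @SuppressWarnings("unchecked")
    static Map<String, List<Entry>> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            return (Map<String, List<Entry>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<Entry>> data = new HashMap<>();
        // Several entries can live under the same key.
        data.computeIfAbsent("key1", k -> new ArrayList<>()).add(new Entry("first", 1, new Date()));
        data.computeIfAbsent("key1", k -> new ArrayList<>()).add(new Entry("second", 2, new Date()));

        File file = File.createTempFile("entries", ".ser");
        save(data, file);
        System.out.println(load(file).get("key1").size());
    }
}
```

This rewrites the whole file on every save, which is usually acceptable for small data sets; with larger volumes the database option becomes more attractive.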
I have a RuneScape Private Server project coded in Java, and am trying to code a personal "tag" that players can use. I have managed to do this, but every time the server restarts, their "tag" gets reset to "null".
Their "tag" is initialized with the command ";;settag [name]". Their tag is then set to whatever they want. I have done this through a string:
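if (command[0].equals("settag")) {
    // Everything after the command name is the tag text
    newTag = getCompleteString(command, 1);
    // Store the tag on the player so that ::mytag can read it later
    player.yellTag = newTag;
    player.sendMessage("Your tag is now: " + newTag);
}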
I am unsure what the most efficient way to fix this would be. I am thinking of just loading and saving through .xml/.txt files. By the way, player.yellTag is where the next command (::mytag) reads the tag from, which works fine until the server restarts.
It all depends on the context of your application. If you are planning on having fewer than a few hundred players, then an XML file may be OK. You should look at JAXB, which is, as far as I can tell, the standard way to store your objects as XML in Java. You can also store them as JSON files using Gson, which is way simpler to use and implement than the XML stuff.
But if you get to thousands of players or more, you may want a more efficient way to persist your tags by putting them in a database, and an ORM library like Hibernate could help you do that.
You may also want to make your own stuff, like a tag directory full of files named after the unique ids of your players, each containing that player's tag... It's a lot more "hackish" but still quite efficient.
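A minimal sketch of that last "tag directory" idea, assuming each player has a numeric unique id. The `TagDirectory` class, the `.tag` file extension and the directory location are all made up for illustration:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: one small file per player, named after the player's id (hypothetical layout). */
final class TagDirectory {
    private final Path dir;

    TagDirectory(Path dir) throws IOException {
        // Creates the directory if it does not exist yet.
        this.dir = Files.createDirectories(dir);
    }

    /** Overwrites the player's tag file with the new tag. */
    void save(long playerId, String tag) throws IOException {
        Files.write(fileFor(playerId), tag.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the stored tag, or null if the player never set one. */
    String load(long playerId) throws IOException {
        Path file = fileFor(playerId);
        if (!Files.exists(file)) {
            return null;
        }
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    private Path fileFor(long playerId) {
        // A numeric id keeps user-supplied text out of the file name.
        return dir.resolve(playerId + ".tag");
    }

    public static void main(String[] args) throws IOException {
        TagDirectory tags = new TagDirectory(Files.createTempDirectory("tags"));
        tags.save(42L, "Champion");
        System.out.println(tags.load(42L));
    }
}
```

The server would call `save` when the tag command runs and `load` when the player logs in, so the tag survives a restart.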
I have a binary file with the following format:
[N bytes identifier & record length] [n1 bytes data]
[N bytes identifier & record length] [n2 bytes data]
[N bytes identifier & record length] [n3 bytes data]
As you can see, the records have different lengths. Each record starts with a fixed N bytes that contain an id and the length of the data in the record.
This file is very big and can contain 3 million records.
I want to open this file in an application and let the user browse and edit the records.
(Insert / Update / Delete records)
My initial plan is to create an index file from the original file and, for each record, keep the addresses of the next and previous records so I can navigate forward and backward easily (some sort of linked list, but in a file rather than in memory).
Is there a library (a Java library) to help me implement this requirement?
Any recommendation or experience that you think is useful?
----------------- EDIT ----------------------------------------------
Thanks for the guidance and suggestions.
Some more info:
The original file and its format are out of my control (it's a third-party file) and I can't change the file format. But I have to read it, let the user navigate over the records and edit some of them (insert a new record / update an existing record / delete a record), and at the end save it back in the original file format.
Do you still recommend a database instead of a normal index file?
----------------- SECOND EDIT ----------------------------------------------
The record size in update mode is fixed. That means an updated (edited) record has the same length as the original record, unless the user deletes the record and creates another record with a different format.
Many Thanks
Seriously, you should NOT be using a binary file for this. You should use a database.
The problems with trying to implement this as a regular file stem from the fact that operating systems do not allow you to insert extra bytes into the middle of an existing file. So if you need to insert a record (anywhere but the end), update a record (with a different size) or remove a record, you would need to:
rewrite other records (after the insertion/update/deletion point) to make or reclaim space, or
implement some kind of free space management within the file.
All of this is complicated and / or expensive.
Fortunately, there is a class of software that implements this kind of thing. It is called database software. There is a wide range of options, from a full-scale RDBMS to lightweight solutions like BerkeleyDB files.
In response to your 1st and 2nd edits, a database will still be simpler.
However, here's an alternative that might perform better for this use-case than using a DB... without doing complicated free-space management.
1. Read the file and build an in-memory index that maps ids to file locations.
2. Create a second file to hold new and updated records.
3. Perform the record adds/updates/deletes:
3.1. An addition is handled by writing the new record to the end of the second file, and adding an index entry for it.
3.2. An update is handled by writing the updated record to the end of the second file, and changing the existing index entry to point to it.
3.3. A delete is handled by deleting the index entry for the record's key.
4. Compact the file as follows:
4.1. Create a new file.
4.2. Read each record in the old file in order, and check the index for the record's key. If the entry still points to the location of the record, copy the record to the new file. Otherwise skip it.
4.3. Repeat step 4.2 for the second file.
4.4. If we completed all of the above successfully, delete the old file and second file.
Note this relies on being able to keep the index in memory. If that is not feasible, then the implementation is going to be more complicated ... and more like a database.
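The scheme above could be sketched as follows. The real header layout is not given in the question, so this assumes a 4-byte big-endian int id followed by a 4-byte int data length. The `OverlayStore` name and its methods are hypothetical, and error handling and the final delete of the old files are left out:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Sketch only. Assumed record layout: int id, int length, then `length` data bytes. */
final class OverlayStore implements AutoCloseable {
    /** Location of the live copy of a record: which file, and at what offset. */
    private static final class Location {
        final boolean inSecondFile;
        final long offset;
        Location(boolean inSecondFile, long offset) {
            this.inSecondFile = inSecondFile;
            this.offset = offset;
        }
    }

    private final RandomAccessFile original;
    private final RandomAccessFile second;
    private final Map<Integer, Location> index = new HashMap<>();

    OverlayStore(String originalPath, String secondPath) throws IOException {
        original = new RandomAccessFile(originalPath, "r");
        second = new RandomAccessFile(secondPath, "rw");
        second.setLength(0); // the second file starts out empty
        // Read only the headers and remember where each record starts.
        for (long pos = 0; pos < original.length(); ) {
            original.seek(pos);
            int id = original.readInt();
            int length = original.readInt();
            index.put(id, new Location(false, pos));
            pos += 8 + length;
        }
    }

    /** Add or update: append to the second file and point the index at the new copy. */
    void put(int id, byte[] data) throws IOException {
        long pos = second.length();
        second.seek(pos);
        second.writeInt(id);
        second.writeInt(data.length);
        second.write(data);
        index.put(id, new Location(true, pos));
    }

    /** Delete: only the index entry goes away; the bytes stay until compaction. */
    void delete(int id) {
        index.remove(id);
    }

    /** Returns the live data for the id, or null if it does not exist. */
    byte[] get(int id) throws IOException {
        Location loc = index.get(id);
        if (loc == null) return null;
        RandomAccessFile file = loc.inSecondFile ? second : original;
        file.seek(loc.offset + 4); // skip the id, land on the length field
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        return data;
    }

    /** Compaction: copy live records from the old file, then from the second file. */
    void compactTo(String newPath) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(newPath, "rw")) {
            out.setLength(0);
            copyLive(original, false, out);
            copyLive(second, true, out);
        }
    }

    private void copyLive(RandomAccessFile in, boolean isSecond, RandomAccessFile out)
            throws IOException {
        for (long pos = 0; pos < in.length(); ) {
            in.seek(pos);
            int id = in.readInt();
            int length = in.readInt();
            Location loc = index.get(id);
            // Copy only if the index still points at exactly this copy.
            if (loc != null && loc.inSecondFile == isSecond && loc.offset == pos) {
                byte[] data = new byte[length];
                in.readFully(data);
                out.writeInt(id);
                out.writeInt(length);
                out.write(data);
            }
            pos += 8 + length;
        }
    }

    @Override
    public void close() throws IOException {
        original.close();
        second.close();
    }
}
```

Typical use would be to open the store on the third-party file plus a scratch file, call `put` and `delete` as the user edits, and call `compactTo` on save. Note that, as in the steps above, updated records end up after the untouched ones in the compacted file, so this only fits if record order does not matter.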
Having a data file and an index file would be the general base idea for such an implementation, but you'd pretty much find yourself dealing with data fragmentation upon repeated data updates/deletion, etc. This kind of project, in itself, should be a separate project and should not be part of your main application. However, essentially, a database is what you need as it is specifically designed for such operations and use cases and will also allow you to search, sort, and extend (alter) your data structure without having to refactor an in-house (custom) solution.
May I suggest you download Apache Derby and create a local embedded database (Derby does it for you when you create a new embedded connection at run-time). It will not only be faster than anything you'll write yourself, but will also make your application easier to maintain.
Apache Derby is a single jar file that you can simply include and distribute with your project (check the license if any legal issue may apply in your app). There is no need for a database server or third party software; it's all pure Java.
Bottom line is that it all depends on how large your application is, whether you need to share the data across many clients, whether speed is a critical aspect of your app, etc.
For a stand-alone, single-user project, I recommend Apache Derby. For an n-tier application, you might want to look into MySQL, PostgreSQL or (hrm) even Oracle. Using ready-made and tested solutions is not only smart, but will cut down your development time (and maintenance efforts).
Cheers.
Generally you are better off letting a library or database do the work for you.
You may not want to have an SQL database and there are plenty of simple databases which don't use SQL. http://nosql-database.org/ lists 122 of them.
At a minimum, if you are going to write this I suggest you read the source for one of these databases to see how they work.
Depending on the size of the records, 3 million isn't that much and I would suggest you keep as much in memory as possible.
The first problem you are likely to have is ensuring the data is consistent and recovering the data when corruption occurs. The second problem is dealing with fragmentation efficiently (something the brightest minds working on the GC deal with). The third problem is likely to be maintaining the index transactionally with the source data to ensure there are no inconsistencies.
While this may appear simple at first, there are significant complexities in making sure the data is reliable, maintainable and can be accessed efficiently. This is why most developers use an existing database/datastore library and concentrate on the features which are unique to their application.
(Note: My answer is about the problem in general, not considering any Java libraries or - like the other answers also proposed - using a database (library), which might be better than reinventing the wheel)
The idea to create an index is good and will be very helpful performance-wise (although you wrote "index file", I think it should be kept in memory). Generating the index should be quite fast if you read the ID and record length for each entry and then just skip the data with a file seek.
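As a rough illustration of that scan, the sketch below collects the start offset of every record by reading only the headers. The header layout is an assumption (a 4-byte id followed by a 4-byte big-endian data length), and `RecordOffsets` is a hypothetical name. Keeping the offsets in a list also gives next/previous navigation by list position:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class RecordOffsets {
    /** Assumed header: 4-byte id followed by a 4-byte big-endian data length. */
    static final int HEADER_BYTES = 8;

    /** Returns the start offset of every record, in file order. */
    static List<Long> scan(String path) throws IOException {
        List<Long> offsets = new ArrayList<>();
        byte[] header = new byte[HEADER_BYTES];
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            long size = file.length();
            long pos = 0;
            while (pos + HEADER_BYTES <= size) {
                file.seek(pos);
                file.readFully(header); // one read per record header
                int dataLength = ByteBuffer.wrap(header).getInt(4);
                if (dataLength < 0) {
                    throw new IOException("Corrupt length at offset " + pos);
                }
                offsets.add(pos);
                pos += HEADER_BYTES + dataLength; // skip the data without reading it
            }
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("records", ".bin");
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.writeInt(7);
            out.writeInt(2);
            out.write(new byte[2]);
            out.writeInt(8);
            out.writeInt(3);
            out.write(new byte[3]);
        }
        System.out.println(scan(file.getPath()));
    }
}
```

With 3 million records the list holds 3 million offsets, which comfortably fits in memory on a typical desktop JVM.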
You should also think about the edit functionality. Inserting and deleting in particular can be very slow on such a big file if you do it wrong (e.g. deleting and then moving all the following entries to close the gap).
The best option would be to only mark deleted entries as deleted. When inserting, you can overwrite one of those or append to the end of the file.
Insert / Update / Delete records
Inserting (rather than merely appending) and deleting records to a file is expensive because you have to move all the following content of the file to create space for the new record or to remove the space it used. Updating is similarly expensive if the update changes the length of the record (you say they are variable length).
The file format you propose is fundamentally unsuitable for the kinds of operations you want to perform. Others have suggested using a database. If you don't want to go that far, adding an index file (as you suggest) is the way to go. I recommend making the index records all the same length.
As others have stated, a database would seem a better solution. The following Java SQL databases could be used: H2, Derby or HSQLDB.
If you want to use an index file, look at Berkeley DB or a NoSQL store.
If there is some reason for using a file, look at JRecord. It has:
Several classes for reading/writing files with variable-length binary records (they were written for Cobol VB files). Any of the Mainframe / Fujitsu / Open Cobol VB file structures should do the job.
An editor for editing JRecord files. The latest version of the editor can handle large files (it uses compression / a spill file). The editor suffers from having to load the whole file, and only one user can edit the file at a time.
The JRecord solution will only work if:
There is a limited number of users (preferably one), all located in the one location
The infrastructure is fast
I have a hobby project, which is basically to maintain 'todo' tasks in the way I like.
One task can be described as:
public class TodoItem {
private String subject;
private Date dueBy;
private Date startBy;
private Priority priority;
private String category;
private Status status;
private String notes;
}
As you can imagine I would have 1000s of todo items at a given time.
What is the best strategy to store a todo item (currently in an XML file) such that all the items are loaded quickly upon app startup (the application shows a kind of dashboard of all the items at startup)?
What is the best way to design its back-end so that it can be ported to Android or a J2ME-based phone? Currently this is done using Java Swing. What should I concentrate on so that it works efficiently on a device where memory is limited?
The application opens a form to enter a new todo task. For now, I would like to save the newly added task to my-todos.xml once the user presses the "save" button. What are the common ways to append such a change to an existing XML file? (Note that I don't want to read the whole file again and then persist it.)
For storing: SQLite seems like a good solution for things such as searching and cross platform support. Android and many other devices support SQLite.
As with any programming question, there are a lot of ways to do things. However, by specifying that you intend to go to a phone, your list of considerations changes. Firstly, you need to look at your intended phones to see what they support, especially in terms of data storage.
XML or some other flat file format will work fine if you don't have too much data and don't want to enable searching and other functions which access the data in random ways.
But if you want to store larger amounts of data or do random access, you need to look into data storage techniques that are more database-like. This is where your intended target platforms are likely to impose limits in terms of performance or storage.
The other alternative is to design the application so that its storage is decoupled from the core program. This means that you can apply different types of data storage, depending on whether it's a PC or a phone, without having to recode everything else.
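One way to picture that decoupling is a small storage interface that the rest of the application codes against. Everything below is a hypothetical sketch: `Todo` is a cut-down stand-in for the question's TodoItem, and the in-memory store stands in for a real XML or SQLite backend:

```java
import java.util.ArrayList;
import java.util.List;

final class TodoStorageDemo {
    /** Minimal stand-in for the question's TodoItem. */
    static final class Todo {
        final String subject;
        Todo(String subject) { this.subject = subject; }
    }

    /** Storage contract the rest of the application depends on. */
    interface TodoStore {
        List<Todo> loadAll();
        void save(Todo item);
    }

    /** One possible backend; an XML or SQLite class would implement the same interface. */
    static final class InMemoryTodoStore implements TodoStore {
        private final List<Todo> items = new ArrayList<>();

        @Override
        public List<Todo> loadAll() {
            return new ArrayList<>(items); // defensive copy
        }

        @Override
        public void save(Todo item) {
            items.add(item);
        }
    }

    /** Core code sees only the interface, never a concrete backend. */
    static final class Dashboard {
        private final TodoStore store;
        Dashboard(TodoStore store) { this.store = store; }
        int itemCount() { return store.loadAll().size(); }
    }

    public static void main(String[] args) {
        TodoStore store = new InMemoryTodoStore();
        store.save(new Todo("Buy milk"));
        System.out.println(new Dashboard(store).itemCount());
    }
}
```

Porting to a phone would then mean writing one new `TodoStore` implementation for that platform and leaving the dashboard code alone.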
One option that comes to mind is an in-memory DB, which exists in various flavors. I've yet to use one of these, so I can't tell you about memory usage or platform constraints. Still, it's worth looking at.
Another option that comes to mind is to maintain a large collection of TodoItem objects, and write your own code to read from and persist this collection to the XML file. Essentially, build a class that contains the large Map (or whatever you decide to use) and have this class implement Externalizable.
Both of these options will allow you to read the XML file to its in-memory representation, search and alter the state, and eventually write the final state back to XML when the app goes down (or at fixed intervals, whatever you decide).
You might be able to use java.util.prefs.Preferences.