I'm looking for an all-in-one solution that I can distribute with my Java application to store a small amount of data as records. The data will change over time and will be updated frequently. It stores book information: author, genre, title, rating, etc.
I would like something that supports full-text search on a particular table; searching will be crucial.
The only real requirement beyond search is minimal setup. I don't expect MySQL etc. to be running as a service, so the solution needs minimal attention from the end user.
So far I have been looking at SQLite.
As for searching, it's totally new to me; one of the reasons I am doing this project is so I can learn about it.
Any feedback is appreciated.
You can look at HSQLDB, which is the closest thing to SQLite and is very simple (there are three storage modes: in memory, in a local file, or on a remote server).
It can work with Apache Lucene to provide full-text search.
Related
I am taking a data structures course and am developing a project in Java. The project is pretty much complete except for one aspect: the implementation of a cache. My professor has been deliberately vague (as he is, and should be, about everything) on how to implement this. The only hint he has given is that our operating system has its own file system, which in and of itself is a map, and we can use it as a way to create a cache. I will paste the assignment details below. Any help would be greatly appreciated.
Almost forgot: my OS is Windows 10.
Requirements
This assignment asks you to create a web page categorization program.
The program reads 20 (or more) web pages. The URLs for some of these web pages can be maintained in a control file that is read when the program starts. The others should be links from these pages. (Wikipedia is a recommended source.) For each page, the program maintains frequencies of words along with any other related information that you choose.
The user can enter any other URL, and the program reports which other known page is most closely related, using a similarity metric of your choosing.
The implementation restrictions are:
Create a cache based on a custom hash table class you implement to keep track of pages that have not been modified since accessed; keep them in local files.
Use library collections or your own data structures for all other data stores. Read through the Collections tutorial.
Establish a similarity metric. This must be in part based on word-frequencies, but may include other attributes. If you follow the recommended approach of hash-based TF-IDF, create a hash table storing these.
A GUI allows a user to indicate one entity, and displays one or more similar ones.
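For the similarity-metric requirement above, here is a minimal sketch of one possible word-frequency metric: cosine similarity over raw term counts. It is not the hash-based TF-IDF the assignment recommends (there is no IDF weighting here), and it uses java.util.HashMap rather than a custom hash table. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: cosine similarity over raw word counts (no IDF weighting).
class WordSimilarity {

    // Counts how often each lower-cased alphanumeric token occurs in the text.
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> freq = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    // Cosine of the angle between two frequency vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> p1 = wordFrequencies("Hash tables map keys to values");
        Map<String, Integer> p2 = wordFrequencies("A hash table stores keys and values");
        System.out.println("similarity = " + cosine(p1, p2));
    }
}
```

The "most closely related" page would then be the known page with the highest score against the page the user entered.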
I'm working on (essentially) a calendar application written in Java, and I need a way to store calendar events. This is the first "real" application I've written, as opposed to simple projects (usually for classes) that either don't store information between program sessions or store it as text or .dat files in the same directory as the program, so I have a few very basic questions about data storage.
How should the event objects and other data be stored? (.dat files, database of some type, etc)
Where should they be stored?
I'm guessing it's not good to load all the objects into memory when the program starts and not update them on the hard drive until the program closes. So what do I do instead?
If there's some sort of tutorial (or multiple tutorials) that covers the answers to my questions, links to those would be perfectly acceptable answers.
(I know there are somewhat similar questions already asked, but none of them I could find address a complete beginner perspective.)
EDIT: Like I said in one of the comments, in general with this, I'm interested in using it as an opportunity to learn how to do things the "right" (reasonably scalable, reasonably standard) way, even if there are simpler solutions that would work in this basic case.
For a quick solution, if your data structures (and of course the way you access them) are sufficiently simple, reading and writing the data to files, using your own format (e.g. binary, XML, ...), or perhaps standard formats such as iCalendar might be more suited to your problem. Libraries such as iCal4J might help you with that.
Taking into account the more general aspects of your question, this is a broader topic, but you may want to read about databases (relational or not). Whether you want to use them or not will depend on the overall complexity of your application.
A number of relational databases can be used in Java using JDBC. This should allow you to connect to the relational database (SQL) of your choice. Some of them run within their own server application (e.g. MS SQL, Oracle, MySQL, PostgreSQL), but some of them can be embedded within your Java application, for example: JavaDB (a variant of Apache Derby DB), Apache Derby DB, HSQLDB, H2 or SQLite.
These embeddable SQL databases will essentially store the data in files on the same machine the application is running on (in a format specific to them), but allow you to use the data using SQL queries.
The benefits include a certain structure to your data (which you build when designing your tables and possible constraints) and (when supported by the engine) the ability to handle concurrent access via transactions. Even in a desktop application, this may be useful.
This may imply a learning curve if you have to learn SQL, but it should save you the trouble of handling the details of defining your own file format. Giving structure to your data via SQL (often known by other developers) can be better than defining your own data structures that you would have to save into and read from your own files anyway.
In addition, if you want to deal with objects directly, without knowing much about SQL, you may be interested in Object-Relational Mapping frameworks such as Hibernate. Their aim is to hide the SQL details from you by being able to store/load objects directly. Not everyone likes them and they also come with their own learning curve (which may entail learning some details of how SQL works too). Their pros and cons could be discussed at length (there are certainly questions about this on StackOverflow or even DBA.StackExchange).
There are also other forms of databases, for example XML databases or Semantic-Web/RDF databases, which may or may not suit your needs.
How should the event objects and other data be stored? (.dat files, database of some type, etc)
It depends on the size of the data to be stored (and loaded), and if you want to be able to perform queries on your data or not.
Where should they be stored?
A file in the user directory (or in a subdirectory of the user directory) is a good choice. Use System.getProperty("user.home") to get it.
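As a rough sketch of that suggestion, the following resolves a data file in a subdirectory of the user's home directory. The directory name ".mycalendar" and the file name "events.txt" are placeholders I made up, not anything from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for locating application data under the user's home directory.
class AppDataLocation {

    // Builds <user.home>/<appDirName>/<fileName> without touching the disk.
    static Path dataFile(String appDirName, String fileName) {
        return Paths.get(System.getProperty("user.home"), appDirName, fileName);
    }

    // Creates the parent directory if it does not exist yet, then returns the file path.
    static Path ensureParentExists(Path file) throws IOException {
        Files.createDirectories(file.getParent());
        return file;
    }

    public static void main(String[] args) {
        // ".mycalendar" and "events.txt" are assumed names for illustration only.
        Path events = dataFile(".mycalendar", "events.txt");
        System.out.println("Events would be stored at: " + events);
    }
}
```

Calling ensureParentExists(...) before the first write avoids a missing-directory error on a fresh install.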
I'm guessing it's not good to load all the objects into memory when the program starts and not update them on the hard drive until the program closes. So what do I do instead?
It might be a perfectly valid thing to do, unless the amount of data is so great that it would eat far too much memory. I don't think it would be a problem for a simple calendar application. If you don't want to do that, then store the events in a database and perform queries to only load the events that must be displayed.
A simple sequential file should suffice. Basically, each line in your file represents a record, or in your case an event. Separate the fields in each record with a field delimiter; something like the pipe (|) symbol works nicely. Remember to store each record in the same format, for example:
date|description|etc
This way you can read back each line in the file as a record, extract the fields by splitting the string on your delimiter (|) symbol, and use the data.
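A minimal sketch of that record format in Java might look like the following. The two-field layout (date, description) and the class name are assumptions for illustration, and this version does not handle a pipe character inside a field.

```java
import java.util.regex.Pattern;

// Hypothetical helper for the pipe-delimited "one record per line" format.
class EventRecords {
    static final String DELIMITER = "|";

    // Joins the fields into one line. Fields must not contain the delimiter themselves.
    static String toLine(String date, String description) {
        return date + DELIMITER + description;
    }

    // Splits a line back into its fields.
    // String.split takes a regex, and "|" is a regex metacharacter, so it is quoted.
    // The limit of -1 keeps trailing empty fields instead of dropping them.
    static String[] parseLine(String line) {
        return line.split(Pattern.quote(DELIMITER), -1);
    }

    public static void main(String[] args) {
        String line = toLine("2024-01-15", "Dentist appointment");
        String[] fields = parseLine(line);
        System.out.println(fields[0] + " -> " + fields[1]);
    }
}
```

Reading the whole file is then a matter of calling parseLine on each line, for example from Files.readAllLines.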
Storing the data in the same folder as your application should be fine.
The best way I find to handle the objects (for the most part), is to determine whether or not the amount of data you are storing is going to be large enough to have consequences on the user's memory. Based on your description, it should be fine in this program.
The right answer depends on details, but probably you want to write your events to a database. There are several good free databases out there, like MySQL and Postgres, so you can (relatively) easily grab one and play with it.
Learning to use a database well is a big subject, bigger than I'm going to answer in a forum post. (I could recommend that you read my book, "A Sane Approach to Database Design", but making such a shameless plug on a forum would be tacky!)
Basically, though, you want to read the data from the database when you need it, and update it when it changes. Don't read everything at start up and write it all back at shut-down.
If the amount of data is small and rarely changes, keeping it all in memory and writing it to a flat file is simpler and faster. But most applications don't fit that description.
I'm trying to implement part of the Facebook Ads API, the autocomplete function ads.getAutoCompleteData.
Basically, Facebook supplies a 39MB file, which is updated weekly and which contains ad-targeting data including colleges, college majors, workplaces, locales, countries, regions and cities.
Our application needs to access all of those objects and supply auto completion using this file's data.
I'm trying to decide on the best way to solve this. I was thinking about one of the following options:
Loading it into memory using a trie (Patricia trie); the disadvantage, of course, is that it will take too much memory on the server.
Using a dedicated search platform such as Solr on a different machine; the disadvantage is that it is perhaps over-engineering (though the file size will probably increase significantly in the future).
(Fill in a cool, easy, speed-of-light option here)?
Well, what do you think?
I would stick with a service-oriented architecture (especially if the product is supposed to handle high volumes) and go with Solr. That being said, 39 MB is not a lot to hold in memory if it's going to be a singleton. With indexes and all, this will get up to what? 400 MB? This of course depends on what your product does and what kind of hardware you wish to run it on.
I would go with Solr or write your own service that reads the file into a fast DB like a MySQL MyISAM table (or even an in-memory table) and use MySQL's text search feature to serve up results. Barring that, I would try to use Solr as a service.
The benefit of writing my own service is that I know what is going on; the downside is that it'll be nowhere near as powerful as Solr. However, I suspect writing my own service will take less time to implement.
Consider writing your own service that serves requests asynchronously (if your product is a website, then using Ajax). The trouble with Solr or Lucene is that if you get stuck, there is not a lot of help out there.
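For a feel of what the in-memory option from the question involves, here is a minimal sketch of prefix completion with a plain trie. Note that this is an ordinary character trie, not the compressed Patricia trie the question mentions, so it will use more memory than a Patricia trie would. The class name and sample entries are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain (uncompressed) trie for prefix autocompletion.
class AutoCompleteTrie {

    private static final class Node {
        // TreeMap keeps children sorted, so completions come back in alphabetical order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Adds one entry, creating nodes along the path as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns up to 'limit' stored entries that start with the given prefix.
    List<String> complete(String prefix, int limit) {
        List<String> results = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return results; // no entry has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), results, limit);
        return results;
    }

    // Depth-first walk below 'node', appending complete entries until the limit is hit.
    private void collect(Node node, StringBuilder current, List<String> out, int limit) {
        if (out.size() >= limit) {
            return;
        }
        if (node.isWord) {
            out.add(current.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            current.append(e.getKey());
            collect(e.getValue(), current, out, limit);
            current.setLength(current.length() - 1);
        }
    }

    public static void main(String[] args) {
        AutoCompleteTrie trie = new AutoCompleteTrie();
        trie.insert("boston");
        trie.insert("boise");
        trie.insert("berlin");
        System.out.println(trie.complete("bo", 10));
    }
}
```

Loading the real file into something like this and measuring heap usage would settle whether the memory cost is actually a problem.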
Just my 2 cents.
I am working on my Master's project and I am looking for a substantial amount of financial data about a particular company.
Example: let's say "Apple". I want the historic prices, current market price / ratios, quarterly results and the analyst calls.
I saw a couple of posts on StackOverflow about YQL. I think I can get the current price and various ratios from Yahoo Finance for free. However, for other data, there are companies like Thomson Reuters, Bloomberg, etc., but they seem to have a closed system.
Where can I get an API to fetch various data? Is there anything which will help me get that data? I am fine with raw data as well in any format. Whatever I can get. Could you guys please suggest any API?
A Java library under development is IdylFin, which has convenience methods to download historical data.
Disclaimer: I am the author of this library.
Stephen is right on the money: if you really want a real wealth of data, you're probably gonna have to pay for it.
However, I've been successful on my own private projects by using the "API" spelled out here:
http://www.gummy-stuff.org/Yahoo-data.htm
I've pulled down all the stocks from the S&P 500 quite often, but if you ever publish that data, talk with Yahoo. You'll probably have to license it.
BTW, all this data is in CSV format, so get a CSV reader/converter etc. They're easy to find.
This is the Yahoo Finance historical data page for "Apple":
http://in.finance.yahoo.com/q/hp?s=AAPL
There is a link at the bottom to download the data. Maybe this could help.
I will suggest a couple of APIs that have financial data that is sometimes hard to find (e.g. quarterly results, analyst calls):
1) http://www.zacksdata.com/zacks-data-api
2) http://www.mergent.com/servius
Both have free trials available.
(Disclosure: My company manages both of these APIs)
A Java example that fetches data from Yahoo Finance is given here: Obba Tutorial: Using a Java class which fetches stock quotes from finance.yahoo.com
I have tackled this problem in the past.
For price history data, I used Yahoo's API. When I say API, I mean I was making an HTTP GET request for a CSV file of price history data. Unfortunately, that only gets you data for one company at a time, for a time span you specify. So I first made a list of all the ticker symbols, and iterated over that, calling Yahoo's API for each. You might be able to find a website that lists ticker symbols too, and just periodically download that list.
Do this too often and too fast, and their website just might block you. I added some code to limit how frequently I made HTTP requests. I also persisted my data so I would not have to get it again. I would always persist the raw/unprocessed form of the data, because your code could change in ways that make it tough to use anything else. Avro/Thrift might be an exception, since those support schema evolution.
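The code I used for limiting requests is not shown here, but a minimal sketch of the idea could look like this: a small throttle that makes the caller wait until a minimum interval has passed since the previous request. The class name and the 50 ms interval are made up for illustration; a polite interval for a real site would be much longer.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical throttle: enforces a minimum gap between successive requests.
class RequestThrottle {
    private final long minIntervalNanos;
    private long lastRequestNanos;
    private boolean hasRequested = false;

    RequestThrottle(long minIntervalMillis) {
        this.minIntervalNanos = TimeUnit.MILLISECONDS.toNanos(minIntervalMillis);
    }

    // Blocks until at least the minimum interval has passed since the previous call.
    // The first call returns immediately. If the thread is interrupted while waiting,
    // the interrupt flag is restored and the method returns early.
    synchronized void acquire() {
        if (hasRequested) {
            long remaining = minIntervalNanos - (System.nanoTime() - lastRequestNanos);
            while (remaining > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                remaining = minIntervalNanos - (System.nanoTime() - lastRequestNanos);
            }
        }
        lastRequestNanos = System.nanoTime();
        hasRequested = true;
    }

    public static void main(String[] args) {
        RequestThrottle throttle = new RequestThrottle(50); // assumed interval
        for (int i = 1; i <= 3; i++) {
            throttle.acquire();
            // The actual HTTP request for the next ticker symbol would go here.
            System.out.println("request " + i);
        }
    }
}
```

Calling acquire() right before each HTTP request in the ticker loop is enough; adding a random extra delay on top would make the traffic look less mechanical.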
For other kinds of data, you may not have any API that gives you nice CSV files. I had to cope with that problem many times. Here is my advice.
Sometimes a website calls a RESTful web service behind the scenes; you can discover that by using Firebug. Sometimes it will also require certain headers, which you can also discover using Firebug.
If you are forced to work with HTML, there are several Java libraries that can help you. Apache Commons HttpClient is a library you can use to easily make HTTP requests and handle their responses. Google has an http-client jar too, which is probably worth investigating.
The jsoup API is excellent at parsing HTML data, even when it is poorly formatted and not XHTML. It works with XML too. Instead of traversing or visiting nodes in the jsoup hierarchy by hand, learn a selector syntax (jsoup's CSS-style selectors, or XPath in recent jsoup versions) and use that to select what you want. The website may periodically change the format of its web page; that should be easy to cope with and fix if you're using jsoup, and tough to cope with otherwise.
If you have to work with JSON, use the Jackson library to parse it.
If you have to work with CSV, use the OpenCSV library to parse and handle it.
Also, always store the data in its raw form, and avoid making unnecessary HTTP requests so you don't get blocked. I have been blocked by Google Finance a couple of times; they can do it. Fortunately the block does expire. You might even want to add a random wait period between requests.
Have you tried the Google Finance API? (Please google it ;).) I am using it for tracking my portfolio. Could you try http://code.google.com/apis/finance/docs/finance-gadgets.html? There is an example of a custom widget, and it might tell you if you are barking up the right tree.
You are really asking about a free financial data service ... rather than an API.
The problem is that the data is a valuable commodity. It probably has cost the providers a lot of money to set up their systems, and it costs them even more money to keep those systems running. Naturally, they want a return on their investment, and they do this (in part) by selling their data / services.
(In the case of Yahoo, Google, etc, the data is bought from someone else, and Yahoo/Google will be subject to restrictions on how they can use it. Those restrictions will be reflected in respective ToS; e.g. you are only allowed to access the services "for personal use".)
I think your best bet would be to approach a number of the financial data providers, and ask if they can provide you with free access (subject to whatever restrictions they might want to impose) to their data services. You could get lucky ...
Good data is not free. It's as simple as that. The reason is that all data is ultimately licensed from an exchange like NYSE or NASDAQ.
If you can get some money, high-resolution historical data is available from Automated Trader.
You should also talk to the business school at your school. If they have finance master's/PhD students or a master's program in financial engineering, they should have large repositories of high-resolution data for their students.
If you make your question more detailed I can provide a more detailed answer.
This is something that I kick myself for at least once a week. Way back when the internet consisted of Gopher and all that, you were able to log into FTP servers at the NASDAQ and NYSE, and download all kinds of stock history files for free. I had done it, even had it imported to a database and did some stuff with it... but that was probably 10 computers ago; it's LONG gone now.
First of all, I'm sorry for a long explanation.
I need to develop a desktop app that will enable users to enter patients' lab results for tracking diabetes treatment.
I usually code in PHP/MySQL, but I'm not allowed to use it (my first choice was to use LAMP / XAMPP, but I was told not to do so). I was told to use MS Access, as the easiest tool, but since I've never used it before, it gave me a lot of headaches. Its wizards are great, but I needed to use data from several tables in my forms, and was simply unable to do so efficiently.
So, here is my question: What sort of db can I use with Java to create such an Application? As I said, I'm most comfortable with PHP/MYSQL combination, but during my studies I also used Java quite a lot. If I could find some sort of 'template' for such an application, I could probably create this easily.
I found this http://netbeans.org/kb/docs/java/gui-db.html using google. How complicated is this?
I need to create this ASAP, since I need to get this App to a team of medical staff, since I need this data to use as datasets for my MSc thesis (related to case based reasoning). I'm really behind schedule here :). And someone will probably break my neck if I don't develop it in a few days and start working on a theoretical part of my thesis.
Thank you in advance.
Okay, this one is near and dear having just been diagnosed as pre-diabetic.
What's not completely clear from your description is if you are keeping a separate database for each instance of the app, or are you sharing the data in one big remote database?
And how much data are you expecting?
If it's just for each instance, then any old database will work; the lightest-weight one is actually probably sqlite3, but since you're in a hurry and used to LAMP, MySQL will certainly work.
For remote data, go with MySQL.
Either way around, there are plenty of ways to connect to the database, almost all of them based on JDBC. It seems like you basically need two tables: patient and labs. Patient will be something like <name,dob,ssan,address>, and labs something like <ssan, labtype, value, units>. SSAN would be the key, and a record would be something like
123-45-6789|A1C| 6.0|pct glucolated
In that case, I'd just make the tables, have two forms with the data, and don't try for anything too fancy.
Update
Okay, from your comments, yes, either use Derby or just use text files. Do you need any particularly complicated queries? In the Old Days we would just have made something like a CSV file and read it into memory.
Oh, and look into Java Web Start for distributing the app.
You can use Java to connect to a database via JDBC. Check out this tutorial: http://download.oracle.com/javase/tutorial/jdbc/index.html. You can use MySQL as your database. You can also use embedded databases such as Derby, which don't require an external database application.
Here are two tutorials on how to build a CRUD application on top of the NetBeans Platform (a Swing-based RCP):
http://platform.netbeans.org/tutorials/nbm-maven-crud.html
http://platform.netbeans.org/tutorials/nbm-crud.html