Hi, I've got one table in a DB with 4k notes and 2 columns (id and title), and one array with 4k elements. Both hold the same data. What is the best way to sort it and insert it in sorted order into another table in the DB? Which takes fewer resources: sorting with SQL, or coding it yourself using quicksort on the array? Which is the better strategy, the array or the DB?
If you query for the data, then use order by and load it in the correct order in the first place.
If that data is already available in your application without querying the table, then skip the query and sort it in your application. You will save your database some work and it will probably be faster, especially since you're using Java.
Using built-in DB operations should be faster. Sorting externally and writing the data back involves I/O operations, which are slow in themselves, and it also depends on the language you are using.
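If the data is already in memory, sorting it in the application could look roughly like the sketch below. The `Note` class and the `sortByTitle` method are made-up names for illustration. Note that this uses the library sort rather than a hand-written quicksort: `List.sort` on objects is a stable merge sort variant (TimSort), which is usually a better choice than rolling your own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical row type standing in for the (id, title) table from the question.
class Note {
    final int id;
    final String title;

    Note(int id, String title) {
        this.id = id;
        this.title = title;
    }
}

class SortNotes {
    // Returns a new list sorted by title; the input list is left untouched.
    static List<Note> sortByTitle(List<Note> notes) {
        List<Note> sorted = new ArrayList<>(notes);
        sorted.sort(Comparator.comparing((Note n) -> n.title));
        return sorted;
    }

    public static void main(String[] args) {
        List<Note> notes = Arrays.asList(
                new Note(1, "pear"), new Note(2, "apple"), new Note(3, "mango"));
        for (Note n : sortByTitle(notes)) {
            System.out.println(n.id + " " + n.title);
        }
    }
}
```

The sorted list can then be inserted into the target table in iteration order.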
Our DB currently holds around 400k strings (usernames), and we need to check on request whether a given one exists. Because of the large number of checks needed per second (and because the data in MySQL is only updated periodically), it will probably be better to do this in Java instead of with a MySQL query. So my thought is to load the data from MySQL into Java and check whether specific usernames exist there, instead of overloading MySQL.
So my question is: which structure should I load the records into (an array, ArrayList, some other list or collection, a HashMap, or something else you can think of) for the best speed, when all I need is to check, per request, whether one username is in the list or not?
You can find details on how each is implemented in the Java docs. I believe ArrayList.contains does a linear search, so HashMap is most likely better.
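A minimal sketch of the hash-based lookup, assuming the usernames have already been read from MySQL into a collection. It uses a `HashSet` rather than a `HashMap`, since only membership matters here (`HashSet` is itself backed by a `HashMap`). The `UsernameIndex` class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

class UsernameIndex {
    private final Set<String> names;

    // Copies the usernames (e.g. rows read from MySQL) into a hash-based set.
    UsernameIndex(Collection<String> usernames) {
        this.names = new HashSet<>(usernames);
    }

    // Expected constant-time lookup, versus a linear scan for ArrayList.contains.
    boolean exists(String username) {
        return names.contains(username);
    }

    public static void main(String[] args) {
        UsernameIndex index = new UsernameIndex(Arrays.asList("alice", "bob"));
        System.out.println(index.exists("alice"));
        System.out.println(index.exists("carol"));
    }
}
```

Since the data only changes periodically, one option is to build a fresh index after each update and swap it in, rather than mutating the set while requests are reading it.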
I am writing an application that needs to look up data from a table (20x200) for calculation inputs. The table is filled with constants (i.e. I do not need to write to the table). I am still a novice programmer and have not had a lot of experience with databases, and so prior to proceeding I would like to know the best way to achieve this.
I had intended to place the data in an array and simply perform the lookup with 2 loops (one row lookup and one column lookup), but I feel this is very inefficient. Is it worth looking into a database such as SQLite? Or is that overkill for a relatively small data set with no requirement for editing?
As often, the answer is: It depends.
Do you need some advanced querying, like the sum of all values in the x column for which the value in the y column is greater than 23? If so, an in-memory SQL database comes in handy. Otherwise it would just be overkill.
Assuming the database is out of the discussion, the next questions are: Do you need single values, complete (or large parts of) columns or rows? And what are the natural "names" of your columns and rows.
Here are some options:
"names" are consecutive integers: Use a 2D array. (I wouldn't use arrays very often in Java, but in a read-only situation with fixed lengths everything else sounds like too much overhead.) By choosing the order of the indices, i.e. rows first vs. columns first, you can get complete columns or rows easily and efficiently.
"names" are not consecutive, or are Strings or other objects: Use a Map of Maps if you need access to complete rows or columns. If you only need single values, create a Pair type and use it as the key for the map.
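The two single-value options above might look roughly like this. It is only a sketch: the `TableLookup` and `Key` names and the sample values are invented for illustration. With integer indices no search loop is needed at all, because the array is indexed directly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class TableLookup {
    // Option 1: rows and columns are consecutive integers, so index directly.
    static double cell(double[][] table, int row, int col) {
        return table[row][col];
    }

    // Option 2: hypothetical pair type used as a map key when the
    // row/column names are not consecutive integers.
    static final class Key {
        final String row;
        final String col;

        Key(String row, String col) {
            this.row = row;
            this.col = col;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return row.equals(k.row) && col.equals(k.col);
        }

        @Override
        public int hashCode() {
            return Objects.hash(row, col);
        }
    }

    public static void main(String[] args) {
        double[][] table = new double[20][200]; // 20x200 as in the question
        table[3][7] = 1.5;                      // made-up sample value
        System.out.println(cell(table, 3, 7));

        Map<Key, Double> byName = new HashMap<>();
        byName.put(new Key("r3", "c7"), 1.5);   // made-up names and value
        System.out.println(byName.get(new Key("r3", "c7")));
    }
}
```

The pair type has to implement `equals` and `hashCode`, otherwise two keys built from the same names would not find each other in the map.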
1) You can use an in-memory database like the H2 Database Engine.
For this you just need to include a jar, and data retrieval will be very fast.
It can hardly be considered overhead for your application.
2) Or you can use a Map<Key, Map<String, String>> for the lookup.
For the main Map, key will be your record id, and for inner Map key will be your column name.
Whether to make it static or not, I leave to you to decide.
3) You can also explore caching options like Ehcache.
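Option 2 could be sketched as follows, assuming integer record ids and String column names. The `RecordTable` class and its method names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class RecordTable {
    // Outer key: record id. Inner key: column name.
    private final Map<Integer, Map<String, String>> rows = new HashMap<>();

    void put(int recordId, String column, String value) {
        rows.computeIfAbsent(recordId, id -> new HashMap<>()).put(column, value);
    }

    // Returns null when the record or the column is unknown.
    String get(int recordId, String column) {
        Map<String, String> row = rows.get(recordId);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        RecordTable table = new RecordTable();
        table.put(1, "title", "first");
        System.out.println(table.get(1, "title"));
        System.out.println(table.get(2, "title"));
    }
}
```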
I need to run 'n' SELECT queries at fixed time intervals and store the results. These results need to be pulled later upon a client's demand.
My question is:
1) Is it okay to store it as CSV files? Or could you suggest another format?
2) Or should it be stored as a CLOB in a DB?
Please suggest any compression techniques to store these query results; also, is it possible to store only revisions of previous resultsets instead of storing the whole resultset?
note:
The minimum time interval is hourly.
The number of queries (n) will be varying (currently 10 to 200 queries.)
The resultset size of each query is also varying (say 10 to 1,000,000 but mostly around 10k.)
The resultset data fetched in consecutive time intervals doesn't differ much. (Row values are not updated frequently.)
I am new to computer science and programming and also not very familiar with storage or DB design.
It sounds like you should be building a data warehouse.
Performance-wise I suppose it would be better to have a table whose purpose is to store the query results.
I think you need to store the data in a database. A SQL database would serve you best.
As for storing the data at fixed intervals, you only need to record the changes to the data set instead of storing the whole data set again and again. I don't know what your requirements are or how much infrastructure you can afford. If you have such huge queries, I recommend working with a distributed system. Use a NoSQL database for better performance.
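Storing only the changes between two runs could be sketched like this. It assumes each row can be identified by a stable key and serialized to a non-null String; both are assumptions, as are the `SnapshotDelta` name and its methods. It is one possible way to compute a delta, not a complete storage design.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class SnapshotDelta {
    final Map<String, String> upserts = new HashMap<>(); // new or changed rows
    final Set<String> deletes = new HashSet<>();         // keys that disappeared

    // Compares two snapshots keyed by a (hypothetical) row key;
    // the values are the serialized rows.
    static SnapshotDelta between(Map<String, String> previous, Map<String, String> current) {
        SnapshotDelta d = new SnapshotDelta();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                d.upserts.put(e.getKey(), e.getValue());
            }
        }
        for (String key : previous.keySet()) {
            if (!current.containsKey(key)) {
                d.deletes.add(key);
            }
        }
        return d;
    }

    // Rebuilds the current snapshot from the previous one plus the delta.
    static Map<String, String> apply(Map<String, String> previous, SnapshotDelta d) {
        Map<String, String> result = new HashMap<>(previous);
        result.keySet().removeAll(d.deletes);
        result.putAll(d.upserts);
        return result;
    }
}
```

With this shape you would store one full snapshot and then only the deltas for later intervals, replaying them with `apply` when a client asks for a particular run.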
I am making a Java desktop application for billing customers that will use a MySQL database (so I can make a PHP frontend using the same database later). I was wondering if I should make a class that loads all the MySQL data into arrays on startup so I can work with the arrays, or if I should just query the database when I need to access data.
I was wondering which is the most efficient, fastest, etc. Has anyone got any good pointers?
You should query the database when you need the data. That's what databases are for. If you bring all the data into Java arrays, then you will end up building querying methods on those arrays, or limiting yourself to simplistic ways of accessing the data.
If your data is small enough to fit easily into RAM, then MySQL will cache it all anyway, and it will go just as fast as if you had pulled it into arrays first.
Putting data into arrays might make sense if it's static - I'd call that caching.
But billing data seems more dynamic to me, depending on how you define it. In that case, I'd query the database each time.
Query as needed rather than pre-loading all the information. This will use potentially a lot less memory. Some of your data may need to be cached while working, but odds are most of it doesn't. The RDBMS is already designed and optimized to store and retrieve data as needed, so it is best allowed to do its job.
I want to write the "random" output from my result set (about 1.5 million rows) to a file in sorted order. I know I can use ORDER BY in my query, but that is "expensive".
Can you tell me whether there is an algorithm for writing result set rows to a file so that the content ends up sorted, and whether I can gain performance this way?
I'm using Java 1.6, and the query has multiple joins.
Define an index for the sort criteria in your table. Then you can use the ORDER BY clause without problems and write the file as the rows come from the resultset.
If your query has multiple joins, create the proper indexes for the joins and for the sort criteria. You can sort the data on your program but you'd be wasting time. That time will be a lot more valuable when employed learning how to properly tune/use your database rather than reinventing sorting algorithms already present in the database engine.
Grab your database's profiler and check the query's execution plan.
In my experience, sorting on the database side is usually as fast or faster, certainly if the column you sort on is indexed.
If you're reading from a database, getting sorted output shouldn't be so 'expensive' if you have appropriate indexes.
But, sometimes with complex queries it's very hard for the SQL optimiser to apply indexes. In that case, the DB simply accumulates the results in a temporary table and sorts it for you, transparently.
It's very unlikely that you could match the level of optimisations put into your DB engine; but if your problem arises because you're doing some postprocessing of the data that negates any sorting done by the DB, then you have no alternative other than sorting it yourself.
Again, the easiest would be to use the DB: simply write to a temporary table with an appropriate index and dump from there.
If you're certain that the data will always fit in RAM, you can sort it in memory. It's the only case in which you might be able to beat the DB engine, just because you know you won't need HD access.
But that's a lot of 'ifs'. Better to stay with your DB.
If you need the data sorted, someone has to do it - either you or the database. It's certainly easier effort-wise to add the ORDER BY to the query. But there's no reason you can't sort it in-memory on your side. The easiest way is to chunk the data in a sorted collection (TreeSet, TreeMap) using a Comparator to sort on the column you need. Then write out the sorted data.
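A rough sketch of the sorted-collection approach, assuming each row arrives as a String array and should be written comma-separated (the `SortedExport` class, the row format, and the method name are all assumptions). One caveat: a `TreeSet` treats rows that compare as equal as duplicates and silently drops them, so the comparator below tie-breaks on the whole row. Fully identical rows would still collapse into one; if that matters, collecting into a list and sorting it avoids the problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class SortedExport {
    // Returns the rows as comma-separated lines, ordered by the given column.
    static List<String> sortedLines(Iterable<String[]> rows, int sortColumn) {
        // Tie-break on the whole row so rows with equal sort keys are not
        // dropped as duplicates by the TreeSet.
        Comparator<String[]> byColumn = Comparator
                .comparing((String[] r) -> r[sortColumn])
                .thenComparing(r -> String.join(",", r));
        TreeSet<String[]> sorted = new TreeSet<>(byColumn);
        for (String[] row : rows) {
            sorted.add(row);
        }
        List<String> lines = new ArrayList<>();
        for (String[] row : sorted) {
            lines.add(String.join(",", row));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"2", "b"}, new String[] {"1", "c"}, new String[] {"3", "a"});
        // In a real program these lines would go to a file writer instead.
        for (String line : sortedLines(rows, 1)) {
            System.out.println(line);
        }
    }
}
```

Keep in mind that this holds all rows in memory at once, which for 1.5 million rows only works if they comfortably fit in the heap.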