How to read WAL files in the pg_xlog directory through Java

I am trying to read the WAL files of PostgreSQL. Can anybody tell me how to do that, and what type of binary encoding is used in the WAL files?

Use pg_xlogdump to read WAL files (this contrib program was added in PostgreSQL 9.3 - see the PG 9.3 release notes).
This utility can only be run by the user who installed the server,
because it requires read-only access to the data directory.
pg_xlogdump --help
pg_xlogdump decodes and displays PostgreSQL transaction logs for debugging.
Usage:
pg_xlogdump [OPTION]... [STARTSEG [ENDSEG]]
Options:
-b, --bkp-details output detailed information about backup blocks
-e, --end=RECPTR stop reading at log position RECPTR
-f, --follow keep retrying after reaching end of WAL
-n, --limit=N number of records to display
-p, --path=PATH directory in which to find log segment files
(default: ./pg_xlog)
-r, --rmgr=RMGR only show records generated by resource manager RMGR
use --rmgr=list to list valid resource manager names
-s, --start=RECPTR start reading at log position RECPTR
-t, --timeline=TLI timeline from which to read log records
(default: 1 or the value used in STARTSEG)
-V, --version output version information, then exit
-x, --xid=XID only show records with TransactionId XID
-z, --stats[=record] show statistics instead of records
(optionally, show per-record statistics)
-?, --help show this help, then exit
For example: pg_xlogdump 000000010000005A00000096
For more details, see the PostgreSQL documentation.

You can't really do that. It's easy enough to read the bytes from a WAL archive, but it sounds like you want to make sense of them. You will struggle with that.
WAL archives are a binary log showing what blocks changed in the database. They aren't SQL-level or row-level change logs, so you cannot just examine them to get a list of changed rows.
You probably want to investigate trigger-based replication or audit triggers instead.
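For example, a minimal audit-trigger sketch driven from Java over JDBC. The audit_log table, the audit_row() function, the watched table accounts, and the connection details are all hypothetical, and the PostgreSQL JDBC driver is assumed to be on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AuditTriggerSetup {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "myuser", "mypassword");
             Statement st = conn.createStatement()) {

            // A simple audit table recording every change as text.
            st.execute("CREATE TABLE IF NOT EXISTS audit_log ("
                    + " changed_at timestamptz NOT NULL DEFAULT now(),"
                    + " table_name text NOT NULL,"
                    + " operation  text NOT NULL,"
                    + " row_data   text)");

            // PL/pgSQL trigger function that copies the affected row into audit_log.
            st.execute("CREATE OR REPLACE FUNCTION audit_row() RETURNS trigger AS $$"
                    + " BEGIN"
                    + "   IF TG_OP = 'DELETE' THEN"
                    + "     INSERT INTO audit_log(table_name, operation, row_data)"
                    + "     VALUES (TG_TABLE_NAME, TG_OP, OLD::text);"
                    + "     RETURN OLD;"
                    + "   ELSE"
                    + "     INSERT INTO audit_log(table_name, operation, row_data)"
                    + "     VALUES (TG_TABLE_NAME, TG_OP, NEW::text);"
                    + "     RETURN NEW;"
                    + "   END IF;"
                    + " END; $$ LANGUAGE plpgsql");

            // Attach the trigger to the table you want to watch ('accounts' is an example).
            st.execute("DROP TRIGGER IF EXISTS accounts_audit ON accounts");
            st.execute("CREATE TRIGGER accounts_audit"
                    + " AFTER INSERT OR UPDATE OR DELETE ON accounts"
                    + " FOR EACH ROW EXECUTE PROCEDURE audit_row()");
        }
    }
}

Every insert, update, and delete on the watched table then shows up as a row in audit_log, which is far easier to query than the WAL.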

The format is complicated and low-level as other answers imply.
However, if you have time to learn and understand the data that is stored, and know how to build the binary from source, there is a published reader for versions 8.3 to 9.2: xlogdump
The usual way to build it is as a contrib (Postgres add-on):
First get the source for the version of Postgres that you wish to view WAL data for.
Run ./configure and make, but there is no need to install.
Then copy the xlogdump folder to the contrib folder (a git clone in that folder works fine)
Run make for xlogdump - it should find the parent postgres structure and build the binary
You can copy the binary to your path, or use it in situ. Be warned, a lot of internal knowledge of Postgres is still required before you will understand what you are looking at. If you have the database available, it is possible to attempt to reverse out SQL statements from the log.
To perform this in Java, you could either wrap the executable, link the C library as a hybrid, or figure out how to do the parsing you need from source. Any of those options are likely to involve a lot of detailed work.
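If you take the wrapping route, a rough sketch could look like the following (shown here wrapping pg_xlogdump from the earlier answer, since its arguments are documented above; for xlogdump, substitute that binary and its arguments). The pg_xlog path and segment name are only examples:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class WalDumpWrapper {
    public static void main(String[] args) throws Exception {
        // Path and segment name are examples; adjust them for your installation.
        ProcessBuilder pb = new ProcessBuilder(
                "pg_xlogdump", "-p", "/var/lib/postgresql/9.3/main/pg_xlog",
                "000000010000005A00000096");
        pb.redirectErrorStream(true);   // merge stderr into stdout
        Process p = pb.start();

        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Each line is one decoded WAL record; parse or filter as needed.
                System.out.println(line);
            }
        }
        System.out.println("exit code: " + p.waitFor());
    }
}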

The WAL files are in the same format as the actual database files themselves, and that format depends on the exact version of PostgreSQL you are using. You will probably need to examine the source code for your particular version to determine the exact format.

How to store a small piece of information in a file without messing with its content?

My question:
How to store the current version of my software in a dump file generated by PostgreSQL?
The reason for my question:
I've developed a Java application that uses a PostgreSQL database. The software is installed locally on each user's computer, and the database is also local and individual for each user.
I've created a feature so that users can back up their databases and restore them. For this, my Java code runs pg_dump to generate the backup file and pg_restore to restore it. That is, the backup is nothing more than a dump of the database generated by the command below:
pg_dump.exe -U myuser -h localhost -p 5432 -Fc -f bkpname.bkp mydb
The problem is that I usually launch software updates. New versions of the software are always compatible with dumps from previous versions. However, older versions of the software are not compatible with dumps generated by a newer version.
Sometimes a user attempts to restore a dump from a recent version into an old version of the software, which is not compatible.
I would like the dump file to record which version of the software generated it. That way, I could simply display a message informing the user that they need to download the most current version of the software in order to restore the backup.
I thought of the two approaches below, but I think neither is appropriate:
Save the software version in the dump file name. This would be bad because the user could rename the file.
Concatenate the version inside the dump file content. I'm afraid the dump file might somehow be corrupted in the process of inserting text into it or removing text from it (before restoring the dump).
Is there a better way to add this information to the dump file?
One idea would be to store the information in a special table inside the database.
The table is not used normally, and you write the correct version into it right before you perform a dump.
Before you restore the whole dump, you first restore only that table:
pg_restore --table dump_version -d mydatabase dumpfile.dmp
Then you check what is in the table and proceed accordingly.
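A rough Java sketch of that flow, using the dump_version table from above (connection details, file names, the version column, and the naive version comparison are all placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DumpVersionCheck {
    // Version of the currently running software (example value).
    static final String CURRENT_VERSION = "2.5.0";

    public static void main(String[] args) throws Exception {
        String dumpFile = "bkpname.bkp";

        // 1. Restore only the version table from the dump (--clean drops it first
        //    if it already exists in the target database).
        Process p = new ProcessBuilder("pg_restore", "--clean", "--table", "dump_version",
                "-U", "myuser", "-h", "localhost", "-p", "5432", "-d", "mydb", dumpFile)
                .inheritIO().start();
        if (p.waitFor() != 0) {
            System.err.println("Restoring dump_version failed");
            return;
        }

        // 2. Read the version that was written into the table right before pg_dump ran.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "myuser", "mypassword");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT version FROM dump_version")) {
            // Naive string comparison; a real check should parse the version numbers.
            if (rs.next() && rs.getString(1).compareTo(CURRENT_VERSION) > 0) {
                System.err.println("This backup was made by a newer version of the software;"
                        + " please update before restoring it.");
                return;
            }
        }

        // 3. Version is acceptable: run the full pg_restore here.
    }
}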

Spark: reading local file, file should exist on all nodes?

I have a spark cluster with 2 machines say mach-1 and mach-2.
I write the code on my local machine, export it to a JAR, and copy it to mach-1.
Then I run the code on mach-1 using spark-submit.
The code tries to read a local file, which exists on mach-1.
It works well most of the time, but sometimes it gives me errors like "File does not exist". So I copied the file to mach-2 as well, and now the code works.
Similarly, while writing the output to a local file, it sometimes worked when the output folder was only available on mach-1, but then it gave an error, so I created the output folder on mach-2 as well. Now it creates the output on both mach-1 and mach-2 (some parts on mach-1 and some parts on mach-2).
Is this expected behavior? Any pointers to texts explaining this?
P.S.: I do not collect my RDDs before writing to the local file (I do it in foreach). If I do that, the code works fine with the output folder only being present on mach-1.
Your input data has to exist on every node. You can achieve this by copying the data to the nodes, or by using NFS or HDFS.
For your output you can write to NFS or HDFS. Or you can call collect(), but only do that when your dataset fits into the driver's memory; when it doesn't fit, call rdd.toLocalIterator() or take(n) instead (see the sketch below).
Is it possible that you are running your code in cluster mode rather than client mode?
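A minimal Java sketch of the safer patterns (the hdfs:// paths are examples; it assumes the standard Spark Java API):

import java.util.Iterator;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadWriteExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ReadWriteExample");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A shared filesystem (HDFS, NFS) is visible to every executor;
        // a file:// path would have to exist on every node that runs a task.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt");
        JavaRDD<String> upper = lines.map(String::toUpperCase);

        // Option 1: write to the shared filesystem; each executor writes its own part files.
        upper.saveAsTextFile("hdfs:///data/output");

        // Option 2: stream the results to the driver one partition at a time
        // (unlike collect(), this does not need the whole dataset in driver memory),
        // so only the driver machine needs the local output path.
        Iterator<String> it = upper.toLocalIterator();
        while (it.hasNext()) {
            System.out.println(it.next());   // or append to a local file on the driver
        }

        sc.stop();
    }
}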

Revision number with SVN [duplicate]

I'm using Visual SVN on my Windows Box.
I have Repository Application, which has Framework as an svn:external. All well and good.
When I make a checkout of Application, I'd like to have the version of Application and Framework for inclusion in a footer file. This way I could have something like:
Application Version $ApplicationVersion$, Framework Version $FrameworkVersion$
Ordinarily, I understand I could use svn:keywords and add the revision - but as I understand it, svn:keywords apply on a per-file basis. A few sites have suggested using svnversion to produce the output for each variable, but I'm not entirely sure how to go about this.
Once again, on a Windows Box, using VisualSVN. I also develop on a Mac using Versions.app if it provides a more familiar interface for people to answer :)
Edit - my application is a PHP web application. As such, there is no compiling.
Thanks!
To use svnversion, you need to integrate it into the build process. If you run it on a Subversion checkout, it will output a string like 73597:73598, indicating what revision your tree has (notice that different files may have different revisions, and files may also have local modifications). You put something like
CFLAGS+=-DSVNVERSION="\"`svnversion`\""
into your Makefile, and then put
#define VERSION_STRING "Application version" SVNVERSION ", Framework version" FRAMEWORK_VERSION
into the code. If you don't use Make, or cannot readily have your build process run a command whose output becomes a compiler command-line option, you can also use the subwcrev utility that comes with TortoiseSVN. Use it as a pre-build step to transform a file containing placeholders into a copy with the placeholders replaced by the actual version; your build then compiles and links the generated file.
Edit: For the PHP case, it is not possible to have the revision written automatically into a file on checkout or update. Instead, you could run svnversion on every PHP request and put its output into the HTML response. If that gets too expensive, you can cache the svnversion result in a file and only regenerate it when it is older than, say, one hour, leaving it up to the user to remember to delete the cache file after an update so it is recomputed right away.

Elasticsearch fails to start, gives an error

When I run the command to start Elasticsearch:
/elasticsearch -f
it gives a bunch of errors like
ElasticSearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/home/anish/elasticsearch/data/elasticsearch]]
IOException[failed to obtain lock on /home/anish/elasticsearch/data/elasticsearch/nodes/49]
IOException[Cannot create directory: /home/anish/elasticsearch/data/elasticsearch/nodes/49]
I don't know how to get rid of it. Please help.
Look in /etc/sysconfig/elasticsearch for the values of ES_USER and ES_GROUP -- the default setting for both is elasticsearch. If that user cannot write to your data directory, then you will see this error. Since the data directory appears to be under your home directory, this could be the issue. You can either modify the settings for ES_USER and ES_GROUP or change the permissions for the data directory.
There could be multiple causes of this error. Elasticsearch stores its data in the file system at a location specified in elasticsearch.yml. On startup it checks whether it can write to that location and, if it can, acquires a lock on it so that no other process can write there, which prevents data loss and corruption.
This also depends on the version of Elasticsearch: in the current major version, 7.x, Elasticsearch changed how it handles multiple installations writing to the same location on one machine; before that, it used node.max_local_storage_nodes to allow multiple installations on the same machine.
Tips to fix the issue
Check whether multiple Elasticsearch processes are running and whether that is intentional.
Check whether Elasticsearch has write access to the data location, as mentioned in the previous answer.
Check your Elasticsearch version and make sure you are using the configuration appropriate for it.

Store data between Program Runs Java

Short version: I need to store some data between runs of a Java program. The data will be in the form of a table. Is there anything that lets me do something like an SQL query in Java? THE SOLUTION MUST BE ABLE TO RUN ON AN OFFLINE COMPUTER.
Long version: The user will be entering some data daily, and I want something like an SQL table in Java. The program will run on a computer that is NOT CONNECTED TO THE INTERNET, so I need a truly local way to store data (lots of it). Also, the data should preferably be stored in such a way that it is not easily accessible to the end user (as in, he should not be able to double-click the file and simply read its contents).
Major constraint: On searching online I found many people were using localhost to solve similar problems, but that facility is not available to me as I CANNOT INSTALL ANYTHING on the target computer.
If a simple data file is not good enough, how about using SQLite through a JDBC driver? It will allow you to have an SQL database stored in a regular file with no dependency on any kind of server. Alternatively, there are many other embedded DB engines that you could use, depending on your needs.
EDIT:
By the way, most (if not all) DB engines that I know of do not obfuscate the data before storing them in the filesystem. The data will be fragmented, but parts of it will be visible if you force an editor to open the file (e.g. using "Open with..." in Windows).
There is also nothing to stop your user from accessing the data using the command line utility of the selected DB engine. If you want to obfuscate the data you have to do it in your code. Keep in mind that this will not stop a determined person - if your application can read it offline, so can everyone else.
Use an embedded database (like Apache Derby, HSQLDB, H2) so that you don't have to run a database server on the machine. The data will be stored locally on the target machine and it won't be human readable.
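For example, a minimal sketch using embedded H2 (assuming the H2 driver jar is bundled with the application; the file name, table, and columns are just examples):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class LocalStore {
    public static void main(String[] args) throws Exception {
        // The whole database lives in a local file next to the application; no server needed.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:./appdata", "sa", "")) {
            try (Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS entries("
                        + "id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,"
                        + "entered DATE, note VARCHAR(255))");
            }

            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO entries(entered, note) VALUES (CURRENT_DATE, ?)")) {
                ps.setString(1, "daily record");
                ps.executeUpdate();
            }

            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT id, entered, note FROM entries")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1) + " " + rs.getDate(2) + " " + rs.getString(3));
                }
            }
        }
    }
}

H2 can also encrypt the database file (for example with CIPHER=AES in the JDBC URL) if casual inspection of the file is a concern, though, as noted above, that will not stop a determined user.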
You have several options:
Store it in an XML file
Store it in a locally installed database
You can install a database like MySQL, or use an embedded database such as SQLite or Apache Derby, which has shipped with the JDK since Java 6 (as Java DB)
