Getting started

Intro

This tutorial demonstrates how to use the Postgres workspace to explore and interact with Postgres.

To start, open the Quickstart page at http://localhost:8020/ for quick access to all the tools. Use the terminal to execute the commands in this tutorial.

Postgres

The workspace includes a running Postgres 14 instance. You can use it to experiment and try things out before applying them to your application database. Connection parameters: host - localhost, user - abc (no password), database - abc.
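
A quick sanity check, using the connection parameters above:

# connect to the local Postgres and print the server version
psql -h localhost -U abc -c "SELECT version();" abc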

You can load a sample database to check out the workspace features and applications. For example, load the "usda" database

git clone https://github.com/morenoh149/postgresDBSamples /home/pg_dbs
cd /home/pg_dbs/usda-r18-1.0
createdb usda
psql -f usda.sql usda
# Explore data
pgcli usda
SELECT * FROM data_src;

Load the Chinook database

cd /home/pg_dbs/chinook-1.4
createdb -E UTF8 chinook
psql -f Chinook_PostgreSql_utf8.sql -d chinook   
# Explore data
pgcli chinook
SELECT * FROM "Track";

Load the Adventureworks database

cd /home/pg_dbs/adventureworks/
unzip data.zip # this will unzip all CSV files to the current directory
psql -c "CREATE DATABASE \"adventureworks\";"
psql -d adventureworks < install.sql
# Explore data
pgcli adventureworks
SELECT * FROM sales.salesorderdetail;

Further examples will use this local Postgres instance, but they can easily be applied to another Postgres database.

VS-code & plugins

Eclipse Theia, a browser-based version of VS Code, is part of the workspace. It has extensions to query Postgres.

If you loaded the sample databases into the local Postgres, you can connect and explore the data directly in the IDE.

Postgres CLI tools

Querying from the command line can be awesome!

psql

psql is the PostgreSQL interactive terminal

# connect to local Postgres
psql
# connect to database chinook (local Postgres)
psql chinook
# general psql connection pattern
psql postgresql://dbmaster:5432/mydb
# with password prompt
psql -U postgres -W postgresql://my-postgres-host:5432/mydb
# without password prompt
PGPASSWORD=xxxxxxxxxxx psql -U postgres postgresql://my-postgres-host:5432/mydb

pgcli

pgcli is a Postgres client that does auto-completion and syntax highlighting.

# pgcli to the local Postgres 
pgcli 
# connect to database usda (local Postgres)
pgcli usda
# pgcli to the remote Postgres 
pgcli -h my-postgres-host -p 5432 -U postgres -W -d mydb

pspg

Pspg is a unix pager (with very rich functionality) designed for working with tables. Pspg is enabled by default and is automatically applied to psql and pgcli when the result of a SELECT statement is large. You can disable pspg by deleting the (last) line PAGER='pspg -s 4 --interactive --no-mouse' from the file /home/abc/.zshrc (or you can modify the configuration).

Performance and load testing

Monitor live activity, collect and analyse metrics, perform load tests.

pgmetrics

pgmetrics is an open-source, zero-dependency, single-binary tool that can collect 350+ metrics from a running PostgreSQL server and display them in an easy-to-read text format, or export them as JSON and CSV for scripting.

# Try on local Postgres database "usda" (upon password prompt just hit Enter) 
pgmetrics --no-pager -h localhost -U abc usda 
# Remote Postgres 
pgmetrics --no-pager -h my-postgres-host -p 5432 -U postgres mydb

Hint: write the output to a text file with pgmetrics -h localhost -U abc usda >> /home/static-server/pgmetrics-usda.txt and open the static file server at http://localhost:8022/

pgCenter

pgCenter is a command-line admin tool for observing and troubleshooting Postgres.

# for local Postgres 
pgcenter top -h localhost -p 5432 -U abc -d usda
# for remote Postgres (password prompt)
pgcenter top -h my-postgres-host -p 5432 -U postgres -d mydb
# for remote Postgres (no password prompt)
PGPASSWORD=xxxxxxxxxxx pgcenter top -h my-postgres-host -p 5432 -U postgres -d mydb

pgbench

pgbench is a simple program for running benchmark tests on PostgreSQL.

# initialize 
pgbench -h localhost -U abc -i -s 50 abc
# test
pgbench -h localhost -U abc -c 10 -j 2 -t 10000 abc

Example use on remote Postgres:

# initialize 
pgbench -h my-postgres-host -U postgres -i -s 50 mydb
# test
pgbench -h my-postgres-host -U postgres -c 10 -j 2 -t 10000 mydb
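
pgbench also ships built-in test scripts; for example, the -S flag runs the built-in select-only script for a read-only test (a sketch against the local instance):

# read-only (SELECT-only) test against the local Postgres
pgbench -h localhost -U abc -c 10 -j 2 -t 10000 -S abc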

Backups and restore

Create backup dumps with pg_dump or pg_dumpall, push to S3 with rclone and schedule with Cronicle.

pg_dump

The purpose of pg_dump is to generate a file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump.

mkdir /home/dumps/
# Dump from local Postgres
pg_dump -h localhost -p 5432 -U abc usda > /home/dumps/local_postgres_dump
# Dump from remote Postgres
PGPASSWORD=xxxxxxxxxxx pg_dump -h my-postgres-host -p 5432 -U postgres mydb > /home/dumps/remote_postgres_dump
# Restore from dump to local Postgres (create the target database first)
createdb dbname
psql dbname < /home/dumps/remote_postgres_dump

Other versions of pg_dump: the default pg_dump is for Postgres 14. If you see the error error: aborting because of server version mismatch, use the pg_dump build for your Postgres version (Postgres versions 12, 13 and 14 are currently supported)

pg_dump_12 -h localhost -p 5432 -U abc abc > /home/dumps/local_postgres_dump_12
pg_dump_13 -h localhost -p 5432 -U abc abc > /home/dumps/local_postgres_dump_13

pg_dumpall

pg_dumpall is a utility for writing out ("dumping") all PostgreSQL databases of a cluster into one script file.

pg_dumpall -h localhost -p 5432 -U abc > /home/dumps/local_postgres_dumpall
# Dump from remote Postgres
PGPASSWORD=xxxxxxxxxxx pg_dumpall -h my-postgres-host -p 5432 -U postgres > /home/dumps/remote_postgres_dumpall
# Restore the full cluster dump to local Postgres
psql -f /home/dumps/remote_postgres_dumpall postgres

The default pg_dumpall is for Postgres 14; if you need an earlier version, use pg_dumpall_12 or pg_dumpall_13.

rclone

rclone is a command line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors' web storage interfaces.
Copy a dump to S3: create the file ~/.config/rclone/rclone.conf with the following content

[remote]
type = s3
provider = AWS
access_key_id = XXXXXXXXXXXXXXXXXXXXXX
secret_access_key = XXxxXXxXXXxxxXXxXXXxxXXXxxxXXXxXXXX
region = xx-xxxx-x

Use rclone to copy the dumps to S3 and delete the local copies

rclone move /home/dumps/ remote:my-s3-bucket/dumps/

Restore from S3 to local: when you need to restore a database from a dump, copy the dump from S3 or mount the S3 bucket to a local folder. In case of mounting, the file will not be physically copied; database restoration will happen directly from the dump on S3.

# copy & restore
rclone copy remote:my-s3-bucket/dumps/remote_postgres_dump /home/s3-dumps/
psql dbname < /home/s3-dumps/remote_postgres_dump
# mount & restore
rclone mount remote:my-s3-bucket/dumps /home/s3-dumps --daemon
psql dbname < /home/s3-dumps/remote_postgres_dump

Scheduling backups

Schedule with Cronicle: open the Cronicle web UI from the main workspace page, or directly at http://localhost:8023/ (user/pass: admin/admin), and create a scheduled event (plugin: shell script). Provide the commands to be executed.
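
For example, a nightly backup event could run a shell script like the following (a sketch reusing the commands above; the host, bucket and password are placeholders):

#!/bin/sh
# dump the remote database, then move the dump to S3
PGPASSWORD=xxxxxxxxxxx pg_dump -h my-postgres-host -p 5432 -U postgres mydb > /home/dumps/mydb_$(date +%F).sql
rclone move /home/dumps/ remote:my-s3-bucket/dumps/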

Database sample

Who hasn't faced the need to make a sample database for testing/development purposes? :-) Use pg_sample

# Create sample from database "usda" in local Postgres
pg_sample -h localhost -U abc --limit=500 --file=/home/project/sample_db.sql usda
# create new database (local Postgres)
createdb usda_sample
# restore sample to the new database
psql usda_sample < /home/project/sample_db.sql 

Data import/export

Import data from various files, export to CSV, JSON, or HTML. Download with Filebrowser, upload to S3 with Rclone, schedule with Cronicle.

pgclimb

pgclimb is a PostgreSQL utility to export data into different data formats with support for templates.

# From local Postgres instance
pgclimb --host localhost --port 5432 --dbname abc --username abc --query "SELECT * FROM distributors" -o /home/project/distributors.csv csv
# From remote Postgres instance
pgclimb --host my-postgres-host --port 5432 --dbname mydb --username postgres --password xxxxxxxxxxx --query "SELECT * FROM phonebook" -o /home/project/phone.csv csv
pgclimb --host my-postgres-host --port 5432 --dbname mydb --username postgres --password xxxxxxxxxxx --query "SELECT * FROM phonebook" -o /home/static-server/phone.html html

pgfutter

pgfutter helps to import CSV and line-delimited JSON into PostgreSQL the easy way.
For example, create the file /home/project/friends.json with the following content

{"name": "Jacob", "age": 26, "friends": ["Anthony"]}
{"name": "Anthony", "age": 25, "friends": []}
{"name": "Emma", "age": 28, "friends": ["Jacob", "Anthony"]}

Import data to Postgres

# to the local Postgres
pgfutter --host localhost --port 5432 --dbname abc --username abc --table friends json /home/project/friends.json
# to the remote Postgres
pgfutter --host my-postgres-host --port 5432 --dbname mydb --username postgres --pass admin --schema imports --table friends json /home/project/friends.json
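
To verify the local import (pgfutter creates tables in the "import" schema by default):

echo "SELECT * FROM import.friends;" | psql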

Another example: create the file /home/project/friends2.csv with the following content

name,age,friends
Jacob,26,"Anthony"
Anthony,25,""
Emma,28,"Jacob,Anthony"

Import data to Postgres

pgfutter --host my-postgres-host --port 5432 --dbname mydb --username postgres --pass admin --schema public --table friends2 csv /home/project/friends2.csv

PGLoader

PGLoader is a very versatile data loading tool for PostgreSQL. Load data from files, or migrate entire databases to Postgres.

Example of use: create a test file /home/project/test.csv with the following content

Header, with a © sign
"2.6.190.56","2.6.190.63","33996344","33996351","GB","United Kingdom"
"3.0.0.0","4.17.135.31","50331648","68257567","US","United States"
"4.17.135.32","4.17.135.63","68257568","68257599","CA","Canada"
"4.17.135.64","4.17.142.255","68257600","68259583","US","United States"
"4.17.143.0","4.17.143.15","68259584","68259599","CA","Canada"
"4.17.143.16","4.18.32.71","68259600","68296775","US","United States"

Create the PGLoader command file /home/project/test.load with the following content (use your Postgres connection arguments)

LOAD CSV
     FROM '/home/project/test.csv' (x, y, a, b, c, d)
     -- local Postgres
     INTO postgresql://abc@localhost:5432/abc?csv (a, b, d, c)
     -- remote Postgres
     -- INTO postgresql://user:password@my-postgres-host:5432/mydb?csv (a, b, d, c)

     WITH truncate,
          skip header = 1,
          fields optionally enclosed by '"',
          fields escaped by double-quote,
          fields terminated by ','

      SET client_encoding to 'latin1',
          work_mem to '12MB',
          standard_conforming_strings to 'on'

   BEFORE LOAD DO
    $$ drop table if exists csv; $$,
    $$ create table csv (
        a bigint,
        b bigint,
        c char(2),
        d text
       );
  $$;

Load the data into a new table called "csv"

pgloader /home/project/test.load
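
Then check that the rows landed in the new table (the load file above targets the local database "abc", the default for user abc):

echo "SELECT * FROM csv LIMIT 5;" | psql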

Data generators/mocking

Generate realistic data for existing tables, create new mock tables or databases.

Synth

Synth is a tool for generating realistic data using a declarative data model. Synth is database agnostic and can scale to millions of rows of data.

Create a directory with mkdir /home/project/synth and create the file /home/project/synth/companies.json with the following contents

{
    "type": "array",
    "length": {
        "type": "number",
        "constant": 1
    },
    "content": {
        "type": "object",
        "company_id": {
            "type": "number",
            "id": {}
        },
        "company_name": {
            "type": "string",
            "faker": {
                "generator": "company_name"
            }
        }
    }
}

Generate data (2 entries only) as JSON with synth generate /home/project/synth/ --size 2 | jq. Generate data directly into the (local) Postgres

# Create table companies in the local Postgres database
echo "CREATE TABLE companies(company_id SERIAL PRIMARY KEY, company_name VARCHAR(255) NOT NULL);" | psql
# generate random data into table companies
synth generate /home/project/synth/ --size 200 --to postgres://abc@localhost:5432/abc
# check 10 records
echo "SELECT * FROM companies LIMIT 10" | psql

mock-data

The idea behind mock-data is to allow users to test database queries with sets of fake data in any pre-defined table.

# Create table in (local) Postgres
echo "CREATE TABLE distributors (did integer, name varchar(40));" | psql
# Populate (local) Postgres with fake data (2000 rows)
mock tables -t "distributors" -r 2000 --uri="postgres://abc@localhost:5432/abc?sslmode=disable"
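
Check the generated rows:

echo "SELECT * FROM distributors LIMIT 10;" | psql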

Database change management

Implement database change management in your team and drive higher software delivery and organizational performance.

sqitch

sqitch is a database change management application. To demonstrate (very briefly) how Sqitch works, create a new database called 'flipr_test' in the local Postgres:

createdb flipr_test

Initialize and configure a Sqitch project called 'flipr'

mkdir /home/project/flipr && cd /home/project/flipr
sqitch init flipr --engine pg

Open the file /home/project/flipr/sqitch.conf and update the configuration for the "pg" engine to connect to the local Postgres as follows

[core]
	engine = pg
[engine "pg"]
	target = db:pg:
	client = psql

Create the changes to be managed by Sqitch

# create sqitch change
sqitch add appschema -n 'Add schema for all flipr objects.'
# add SQL command to change file
echo "CREATE SCHEMA flipr;" >> /home/project/flipr/deploy/appschema.sql
# add SQL command to change-revert file
echo "DROP SCHEMA flipr;" >> /home/project/flipr/revert/appschema.sql

Apply the database change with Sqitch

sqitch deploy db:pg:flipr_test

Connect to the local Postgres database 'flipr_test' and check that the schema 'flipr' was created

pgcli flipr_test
SELECT schema_name FROM information_schema.schemata;

Now you can open the terminal again and revert this change with the command sqitch revert db:pg:flipr_test.
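
A minimal sketch (the -y flag skips the confirmation prompt; sqitch status shows the current deployment state):

# revert the change
sqitch revert db:pg:flipr_test -y
# check what is currently deployed
sqitch status db:pg:flipr_test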

yuniql

Yuniql is a database DevOps tool that uses a migration-based, database-first delivery model: each changeset to the schema and seed data is a set of carefully prepared scripts controlled with a version number.
To briefly demonstrate how Yuniql works, clone the example Yuniql project

git clone https://github.com/rdagumampan/yuniql.git /home/project/yuniql

Create a new database in the local Postgres instance

createdb helloyuniql

Apply the Yuniql project in /home/project/yuniql/samples/basic-postgresql-sample to the local Postgres. First, set the connection parameters

export YUNIQL_PLATFORM="postgresql"
export YUNIQL_WORKSPACE="/home/project/yuniql/samples/basic-postgresql-sample"
export YUNIQL_CONNECTION_STRING="Host=localhost;Port=5432;Username=abc;Password=;Database=helloyuniql"

Finally, run migrations

cd /home/project/yuniql/samples/basic-postgresql-sample
yuniql run -a --platform postgresql

Check the results

pgcli helloyuniql
SELECT * FROM regions;

migra

migra is a schema comparison tool for PostgreSQL. Find differences in database schemas as easily as running a diff on two text files. Migra makes schema changes almost automatic, and management of database migration deployments becomes much easier, faster, and more reliable. For example, compare two local Postgres databases

migra postgresql://abc:@localhost:5432/abc postgresql://abc:@localhost:5432/helloyuniql --unsafe

Migra will produce a set of SQL commands to make database "abc" have the same DDL as database "helloyuniql".
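
To apply the generated statements, you could save them to a file and run them after review (a sketch; always review migra output before applying it):

# save the diff, review it, then apply it to database "abc"
# note: migra exits non-zero when differences are found
migra postgresql://abc:@localhost:5432/abc postgresql://abc:@localhost:5432/helloyuniql --unsafe > /tmp/migration.sql
psql abc -f /tmp/migration.sql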

Visualization tools

dbdesigner

dbdesigner is a visual tool to create entity relationship diagrams and generate Postgres DDL code. Open dbdesigner from the Quickstart page.

schemaspy

schemaspy generates complete database documentation that looks great.

schemaspy -h localhost -p 5432 -d usda -u abc -o /home/static-server/usda-schemaspy

Then open the static file server at http://localhost:8022/ and browse the usda-schemaspy folder

Postgres explain visualizer

Pev helps to understand Postgres execution plans.
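
Pev consumes a query plan in JSON format; you can produce one like this and paste it into the Pev UI (a sketch reusing the usda sample database):

# produce a JSON plan to paste into Pev
psql usda -qAtc 'EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT * FROM data_src;'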

tbls

tbls is a tool for documenting a database, written in Go. For example, generate Markdown docs

tbls doc "postgres://abc@localhost:5432/usda?sslmode=disable" /home/static-server/usda-tbls

Now you can push these docs to GitHub.

pg_flame

pg_flame is a flamegraph generator for Postgres EXPLAIN ANALYZE output.

psql usda -qAtc 'EXPLAIN (ANALYZE, FORMAT JSON) SELECT *
   FROM nut_data N
   JOIN food_des F ON F.ndb_no = N.ndb_no
   JOIN datsrcln D ON D.nutr_no = N.nutr_no
   LEFT JOIN data_src ON data_src.datasrc_id = D.datasrc_id
   WHERE N.nutr_val > 20
       AND data_src.year > 1975 AND data_src.year < 1986;' \
    | pg_flame > /home/static-server/flamegraph.html

Other useful tools

pg_insights

pg_insights provides convenient SQL for monitoring Postgres database health. Clone the repo and use its scripts to analyse Postgres databases

git clone https://github.com/lob/pg_insights /home/pg_insights
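
The repo contains ready-made SQL scripts; run one against a database like this (the file name below is a placeholder, check the cloned repo for the actual scripts):

# run one of the bundled queries against the usda database
psql usda -f /home/pg_insights/<script>.sql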

sqlfluff

Sqlfluff is a modular SQL linter for humans.

echo "select * from t where id=1" >/tmp/test.sql
sqlfluff lint /tmp/test.sql

Results in:

== [/tmp/test.sql] FAIL
L:   1 | P:   1 | L044 | Query produces an unknown number of result columns.
L:   1 | P:  27 | L006 | Missing whitespace before =
L:   1 | P:  27 | L006 | Missing whitespace after =
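
Sqlfluff can also fix many of the violations it finds (a sketch; the --force flag applies fixes without a confirmation prompt):

sqlfluff fix /tmp/test.sql --force
cat /tmp/test.sql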