Ingres Installations Training Notes
From PTAGISWiki
Trainer: Mary Schulte
- Ingres trainer for North America for four years based in Dallas
- flew in for our training
- previously with Informix and IBM
Contents |
Day 1
Planning and performing installation
- Assigning major components to resources for best performance
- installation methods
- different ingstop options (force, kill, immediate)
- force or kill are probably what we want in a script, immediate is probably a bad idea.
- force is the least violent, immediate is the most violent
- force allows rollbacks when it terminates transactions
Architecture
- general communication facility
- shared cache size is influenced by page size
- basic unit of i/o is a page
- communication server is used by distributed db features
- II_EMBED_SET=printgca (captures GCA statements to log, including SQL)
- name server can do round robin load balancing
- ipm, netutil
- mkvalidpw (required for visual dba)
- isql group1db
- ingprenv
Query tools
- isql, qbf, rbf, tables, visual dba
- new visual tools from a company in France are coming (translation work still going on)
- isolation level is very strict by default in Ingres (serializability)
- takes a lock even on reads while they are in progress
- "help" can lock the system
- set readlock=nolock lets you do a dirty read
- autocommit is not on by default
- utilities will do an implicit commit when you exit the tool
- unloaddb does a copydb plus the system catalogs
Day 2
Tables
- copydb is a good way to generate documentation on a database. It creates scripts that can be used to export a database to a file or import a database from files. But if you just look at the generated scripts, it will give you a good description of the database.
- copydb can operate on a single table or set of tables.
- copydb can generate a script that will drop a table and recreate it without loading data
- Doug has wrapper scripts around copydb:
- tools/propogate_schema_data
- tools/archive_schema_data
- copyout and copyin may mess up sequence generator keys
Storage structures
- row by row insert is the slowest way to get data in
- can we do a copy statement to load?
- is referential integrity a problem with using copy? orphans?
- what is faster? load into an empty heap? load into an empty b-tree?
- make sure to turn off journalling
- remove secondary indexes
- then load
how do we load?
- preprocess and validate to a temp csv file
- modify to isam and search for unique tags and sites
- use this to drive the rest of the transaction
- copy setwise into obs_data
- would a ping-pong loader help us?
- all our reads are dirty-reads, so not really
Secondary indexes
- dropped when a table is modified, but can be defined as persistent to make them rebuilt automatically
- "set io_trace" shows every I/O action
- "set trace point dm421;" clears the server's cache so microbenchmarks will be repeatable
- fillfactor tuning
Logging and Recovery
- consistency points happen regularly
- the time between consistency points determines the amount of time required for recovery after a crash
- frequent consistency points reduce performance
- if the log file fills up, user activity is suspended so the system can rollback transactions
- the archiver process will execute a script when it fails (acpexit.def)
- the script by default sends mail to user ingres with the contests of the iiacp.log
- the script can be modified to attempt to restart the archiver
Data Integrity
- table definitions
- constraints
- unique
- referential (foreign key constraints)
- check (domain of possible values, arbitrary logic)
- constraints impose performance penalty
- database procedures
- procedures invoked by rules
Tuning
Day 3
Locking
- page locking (useful in the rare instance that row locking is causing unnecessary overhead)
- row locking (default unless you are using 2K pages)
- set lock_trace (to trace locks)
- readlock=nolock
- Isolation levels
Interlude
- Dan Koren Director of Performance Team gave an idea about how to do ping pong loading (always load to an offline database)
Security
- only users known to ingres can connect
- databases are public by default
- tables are private by default
grant priv on object to username;- views are private by default
- can grant access to procedures and events as well
- help permit emp; to view permissions
- can use copydb to view permissions
- revoke counters grant
- a group can contain many users
- a user can be a member of multiple groups
- a user has one default group that they are a member of
- an application or session can be established with a given role
- privileges can be granted to roles
- could grant query_row_limit to certain users
- could grant connect_time_limit to certain users
- if they exceed the limit, the query fails with no results probably determined by optimizer
- the "set maxrow x" restriction can be applied to a session (see page 11-29)
- specific privileges override general privileges
- table security overrides database security overrides installation security
- role overrides user overrides group overrides public overrides default
Other database objects
- views
- updateable or not updateable
- synonyms (shorthand for fully qualified table names)
- comments on tables or columns
help comment table tablename;
- reports
- forms
Backup and Recovery
- copydb
- relocatedb
- ckpdb
- components in the checkpoint example
- transaction log buffer
- db
- journal
- dump file
- transaction log file
- shared memory
- user session
- checkpoint file
- when you turn on journaling, a new checkpoint is forced
- alterdb manages checkpoints
- infodb shows checkpoint history
Day 4
DBMS
- write_behind threads can be increased to make consistency points take less time
- activated and deactivated according to high-water limits
- components in the DBMS
- GCA receives requests
- SCF scheduler that coordinates other facilities
- PSF converts text to query trees
- OPF converts query tree to QEP
- QEF manages execution of QEP (set qef_sort_mem low (64K?) to make the faster DMF sort do more of the work)
- DMF talks to file system/storage, manages shared memory cache
- RDF
- QSF workspace used by all server facilities
- query text, query trees, short term plans, long term plans
- setting the DMF cache smaller leaves more room for the OS file system cache and may improve performance
- Ingres uses files instead of raw disk, so double buffering is always going on, but the file system cache is not under ingres control
- start ingres from a shell that was started with rtprio (real time priority)
- don't start the client with rtprio
- modern OS scheduling is pessimal for database servers
- solaris makes it easier than hpux or linux
- solaris implemented threads with LWP, not with pthreads
- when they implemented pthreads, they were just put a wrapper on LWPs
- hpux has trouble scaling above 8 processors because of their implementation of pthreads
- on a T2000 has small caches per core, so it should not have multiple instances
- ZFS is bad for database performance, it performs checksums redundantly
- veritas database edition gives a 20% performance increase
- each gcc can handle about 200 sessions, add more gccs if necessary
Maintenance and Optimization
- statistics are used by the optimizer
- default is 4K page sizes
- tables must be monitored for overflow
- two types of pages (main, overflow)
- sysmod reduces overflow on system tables
is the default page size 4K or 2K?
- optimizedb generates statistics
optimizedb -zk dbname
- statdump displays statistics
- benchmark queries with dbmsinfo
- clear the buffer cache
- query dbmsinfo
- run query
- query dbmsinfo
- calculate the difference
- work space can be defined at the session level:
set work locations add (loc8);
- For $1000 you could buy a 256G SSD drive that would make our I/O fly
- wait for reliability problems to be solved
Day 5
Visual DBA
- create a node
- run mkvalidpw on server
- ingstop/ingstart
- Ingres net utility config:
user: ingres password: ingres node: 10.3.241.115 protocol: tcp_ip listen address: II
- test node: successful
- start up Visual DBA
- navigate through tree
System Maintenance
- iidbdb
- system catalogs
- master catalogs
- user data
- locations
- extending a database
- tip: "union" sorts and removes duplicates, but "union all" does not
- environment variables
- unix
- ingres (ingprenv)
- TERM_INGRES
- II_SYSTEM
- II_INSTALLATION
- ING_SET
- startup commands for a session (could go in unix .profile)
- II_EMBED_SET
- allows tracing a user session (printqry, printgca, printtrace, eventdisplay, prefetchrows)
Items for Doug
- How do we do backups? Answer: splitting mirrors
- Issue: Dan Koren says ZFS is very bad because it redundantly does checksums
- Issue: Dan Koren says that Veritas database edition is the best filesystem for Ingres on Solaris
- Issue: Dan Koren says that if our database will fit within 500GB we can put it on an SSD drive and score great performance gains
- If I were to graph a dozen (or two) Ingres system performance metrics, do you have any suggestions for what those should be? And could this be done without providing undue load on the system?
- trace point dm420 (ratio of single buffer reads to group buffer reads)
- overflow pages (total and per-table)
- I/O wait (total and per-table)
- thread count
- start ingres from a shell that was started with rtprio (real time priority) (but the Solaris utility is priocntl). Here is a good article on Solaris scheduling: http://www.princeton.edu/~unix/Solaris/troubleshoot/schedule.html.
- don't start the client with rtprio
Items from Doug
- Can we use a partition plan that includes ISAM storage for the old partition of obs_data and b-tree for the recent partition of obs_data?
