Ingres Installations Training Notes

From PTAGISWiki

Jump to: navigation, search

Trainer: Mary Schulte

Ingres trainer for North America for four years based in Dallas
flew in for our training
previously with Informix and IBM

Contents

Day 1

Planning and performing installation

  • Assigning major components to resources for best performance
  • installation methods
  • different ingstop options (force, kill, immediate)
    • force or kill are probably what we want in a script, immediate is probably a bad idea.
    • force is the least violent, immediate is the most violent
    • force allows rollbacks when it terminates transactions

Architecture

  • general communication facility
  • shared cache size is influenced by page size
  • basic unit of i/o is a page
  • communication server is used by distributed db features
  • II_EMBED_SET=printgca (captures GCA statements to log, including SQL)
  • name server can do round robin load balancing
  • ipm, netutil
  • mkvalidpw (required for visual dba)
  • isql group1db
  • ingprenv

Query tools

  • isql, qbf, rbf, tables, visual dba
  • new visual tools from a company in France are coming (translation work still going on)
  • isolation level is very strict by default in Ingres (serializability)
    • takes a lock even on reads while they are in progress
    • "help" can lock the system
  • set readlock=nolock lets you do a dirty read
  • autocommit is not on by default
  • utilities will do an implicit commit when you exit the tool
  • unloaddb does a copydb plus the system catalogs

Day 2

Tables

  • copydb is a good way to generate documentation on a database. It creates scripts that can be used to export a database to a file or import a database from files. But if you just look at the generated scripts, it will give you a good description of the database.
  • copydb can operate on a single table or set of tables.
  • copydb can generate a script that will drop a table and recreate it without loading data
  • Doug has wrapper scripts around copydb:
    • tools/propogate_schema_data
    • tools/archive_schema_data
  • copyout and copyin may mess up sequence generator keys

Storage structures

  • row by row insert is the slowest way to get data in
  • can we do a copy statement to load?
  • is referential integrity a problem with using copy? orphans?
  • what is faster? load into an empty heap? load into an empty b-tree?
  • make sure to turn off journalling
  • remove secondary indexes
  • then load
how do we load?
  • preprocess and validate to a temp csv file
  • modify to isam and search for unique tags and sites
    • use this to drive the rest of the transaction
  • copy setwise into obs_data
  • would a ping-pong loader help us?
    • all our reads are dirty-reads, so not really

Secondary indexes

  • dropped when a table is modified, but can be defined as persistent to make them rebuilt automatically
  • "set io_trace" shows every I/O action
  • "set trace point dm421;" clears the server's cache so microbenchmarks will be repeatable
  • fillfactor tuning

Logging and Recovery

  • consistency points happen regularly
  • the time between consistency points determines the amount of time required for recovery after a crash
  • frequent consistency points reduce performance
  • if the log file fills up, user activity is suspended so the system can rollback transactions
  • the archiver process will execute a script when it fails (acpexit.def)
    • the script by default sends mail to user ingres with the contests of the iiacp.log
    • the script can be modified to attempt to restart the archiver

Data Integrity

  • table definitions
  • constraints
    • unique
    • referential (foreign key constraints)
    • check (domain of possible values, arbitrary logic)
  • constraints impose performance penalty
  • database procedures
  • procedures invoked by rules

Tuning

Day 3

Locking

  • page locking (useful in the rare instance that row locking is causing unnecessary overhead)
  • row locking (default unless you are using 2K pages)
  • set lock_trace (to trace locks)
  • readlock=nolock
  • Isolation levels

Interlude

  • Dan Koren Director of Performance Team gave an idea about how to do ping pong loading (always load to an offline database)

Security

  • only users known to ingres can connect
  • databases are public by default
  • tables are private by default
  • grant priv on object to username;
  • views are private by default
  • can grant access to procedures and events as well
  • help permit emp; to view permissions
  • can use copydb to view permissions
  • revoke counters grant
  • a group can contain many users
  • a user can be a member of multiple groups
  • a user has one default group that they are a member of
  • an application or session can be established with a given role
  • privileges can be granted to roles
  • could grant query_row_limit to certain users
  • could grant connect_time_limit to certain users
  • if they exceed the limit, the query fails with no results probably determined by optimizer
  • the "set maxrow x" restriction can be applied to a session (see page 11-29)
  • specific privileges override general privileges
  • table security overrides database security overrides installation security
  • role overrides user overrides group overrides public overrides default

Other database objects

  • views
    • updateable or not updateable
  • synonyms (shorthand for fully qualified table names)
  • comments on tables or columns
    • help comment table tablename;
  • reports
  • forms

Backup and Recovery

  • copydb
  • relocatedb
  • ckpdb
  • components in the checkpoint example
    • transaction log buffer
    • db
    • journal
    • dump file
    • transaction log file
    • shared memory
    • user session
    • checkpoint file
  • when you turn on journaling, a new checkpoint is forced
  • alterdb manages checkpoints
  • infodb shows checkpoint history

Day 4

DBMS

  • write_behind threads can be increased to make consistency points take less time
    • activated and deactivated according to high-water limits
  • components in the DBMS
    • GCA receives requests
    • SCF scheduler that coordinates other facilities
    • PSF converts text to query trees
    • OPF converts query tree to QEP
    • QEF manages execution of QEP (set qef_sort_mem low (64K?) to make the faster DMF sort do more of the work)
    • DMF talks to file system/storage, manages shared memory cache
    • RDF
    • QSF workspace used by all server facilities
      • query text, query trees, short term plans, long term plans
  • setting the DMF cache smaller leaves more room for the OS file system cache and may improve performance
    • Ingres uses files instead of raw disk, so double buffering is always going on, but the file system cache is not under ingres control
  • start ingres from a shell that was started with rtprio (real time priority)
  • don't start the client with rtprio
  • modern OS scheduling is pessimal for database servers
  • solaris makes it easier than hpux or linux
  • solaris implemented threads with LWP, not with pthreads
  • when they implemented pthreads, they were just put a wrapper on LWPs
  • hpux has trouble scaling above 8 processors because of their implementation of pthreads
  • on a T2000 has small caches per core, so it should not have multiple instances
  • ZFS is bad for database performance, it performs checksums redundantly
  • veritas database edition gives a 20% performance increase
  • each gcc can handle about 200 sessions, add more gccs if necessary


Maintenance and Optimization

  • statistics are used by the optimizer
  • default is 4K page sizes
  • tables must be monitored for overflow
  • two types of pages (main, overflow)
  • sysmod reduces overflow on system tables

is the default page size 4K or 2K?

  • optimizedb generates statistics

optimizedb -zk dbname

  • statdump displays statistics
  • benchmark queries with dbmsinfo
    • clear the buffer cache
    • query dbmsinfo
    • run query
    • query dbmsinfo
    • calculate the difference
  • work space can be defined at the session level:

set work locations add (loc8);

  • For $1000 you could buy a 256G SSD drive that would make our I/O fly
  • wait for reliability problems to be solved

Day 5

Visual DBA

  • create a node
  • run mkvalidpw on server
  • ingstop/ingstart
  • Ingres net utility config:
user: ingres
password: ingres
node: 10.3.241.115
protocol: tcp_ip
listen address: II
  • test node: successful
  • start up Visual DBA
  • navigate through tree

System Maintenance

  • iidbdb
    • system catalogs
    • master catalogs
    • user data
  • locations
  • extending a database
  • tip: "union" sorts and removes duplicates, but "union all" does not
  • environment variables
    • unix
    • ingres (ingprenv)
  • TERM_INGRES
  • II_SYSTEM
  • II_INSTALLATION
  • ING_SET
    • startup commands for a session (could go in unix .profile)
  • II_EMBED_SET
    • allows tracing a user session (printqry, printgca, printtrace, eventdisplay, prefetchrows)

Items for Doug

  • How do we do backups? Answer: splitting mirrors
  • Issue: Dan Koren says ZFS is very bad because it redundantly does checksums
  • Issue: Dan Koren says that Veritas database edition is the best filesystem for Ingres on Solaris
  • Issue: Dan Koren says that if our database will fit within 500GB we can put it on an SSD drive and score great performance gains
  • If I were to graph a dozen (or two) Ingres system performance metrics, do you have any suggestions for what those should be? And could this be done without providing undue load on the system?
    • trace point dm420 (ratio of single buffer reads to group buffer reads)
    • overflow pages (total and per-table)
    • I/O wait (total and per-table)
    • thread count

Items from Doug

  • Can we use a partition plan that includes ISAM storage for the old partition of obs_data and b-tree for the recent partition of obs_data?
Personal tools