sql Package

This package provides an interface to an SQL database to store image features for steganalysis.

Module:pysteg.sql
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

The tables represented by SQLObject classes are visible directly in the package. The different submodules provide functionality:

setup:functions to create tables and enter standard feature definitions
imageset:enter images in the db and create test and training sets
features:establish new feature vectors
extract:extracting features from images and enter them in the db
stats:statistical analysis of features in the database
svmodel:managing SVM classifiers using features from the db
tools:reporting, feature dumps, and other output
latex:LaTeX formatted output
coverselect:cover selection
errors:error analysis for steganalysers
stegocompare:performance analysis for steganalysers
exceptions:exceptions and errors

A couple of private modules are important, but their contents are exposed via the main package:

tables:the core database tables
_queue:the class and SQL table for the job queue
_scaling:scaling models for use with learning classifiers
pysteg.sql.sqlConnect()[source]

Connect to the database, using connection data from the config object.

pysteg.sql.getImageSet(name)[source]

Look up an image set by its key. The ImageSet table is tried first, and the TestSet table if that fails. If the argument is itself an SQLObject, it is returned as is. getImageSet() should be used by any function intended to take a polymorphic image set argument.
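
The lookup order can be pictured with a small stand-alone sketch. It does not use pysteg or SQLObject: the two dictionaries, the keys in them, and the name resolve_image_set are hypothetical stand-ins for the ImageSet and TestSet tables, and the real function may signal a failed lookup differently.

```python
# Hypothetical stand-ins for the ImageSet and TestSet tables.
IMAGE_SETS = {"clean": {"kind": "ImageSet", "name": "clean"}}
TEST_SETS = {"clean_train": {"kind": "TestSet", "name": "clean_train"}}


def resolve_image_set(arg):
    """Return an image set for a key, or pass a non-key object through."""
    if not isinstance(arg, str):
        # Already an object: hand it back unchanged.
        return arg
    if arg in IMAGE_SETS:       # first choice: the ImageSet table
        return IMAGE_SETS[arg]
    if arg in TEST_SETS:        # fallback: the TestSet table
        return TEST_SETS[arg]
    raise KeyError("no image set or test set named %r" % arg)
```

A function written this way can accept either a key or an already loaded object, which is the polymorphic behaviour the description asks callers to rely on.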

class pysteg.sql.Queue(**kw)

Bases: sqlobject.main.SQLObject

Table to record pending jobs. Each entry concerns one image and one or more feature sets. A worker node should use a transaction to select one item where assigned is null, and then set this field with the current date and time before the transaction is released.

Three modes:
  1. image set, svmodel=None: normal feature calculation
  2. image=None, svmodel set, testset=None: SVM training
  3. image=None, svmodel and testset set: SVM testing
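
The claim-inside-a-transaction pattern described for worker nodes might look roughly like the sketch below. It uses the standard sqlite3 module rather than SQLObject, and the cut-down table schema and the names make_queue and claim_job are assumptions made for illustration; it is not the code behind getJob().

```python
import sqlite3
from datetime import datetime


def make_queue(conn):
    """Create a cut-down queue table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE queue ("
        "id INTEGER PRIMARY KEY, entered TEXT, "
        "assigned TEXT, assignee TEXT)"
    )


def claim_job(conn, worker):
    """Claim one unassigned job inside a transaction; return its id or None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM queue WHERE assigned IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            # Stamp the job with the current time and the worker's name.
            conn.execute(
                "UPDATE queue SET assigned = ?, assignee = ? WHERE id = ?",
                (datetime.now().isoformat(), worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
make_queue(conn)
conn.executemany("INSERT INTO queue (entered) VALUES (?)", [("t1",), ("t2",)])
first = claim_job(conn, "node-a")
second = claim_job(conn, "node-b")
third = claim_job(conn, "node-a")  # nothing left to claim
```

Because the select and the update happen under one lock, two workers cannot both see the same row as unassigned.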

addFeatureSet(obj)
addListTo(L)

Add a list (or any other iterator) of feature sets to the queue item. The elements must be FeatureSet objects (neither id-s nor keys are acceptable).

addTo(fset)

Add feature sets to the queue item.

classmethod addToImage(img, fset, verbosity=2)

Add a new image with one or more feature sets to the queue.

Parameters :

img : an Image or TestImage object

fset : an iterator of FeatureSet objects

assigned
assignee
destroy(force=False)

Delete the job. Unless force is True, an assigned job will not be deleted. Normally, releaseJob() is used to release and delete a processed job. Calling destroy() directly is not safe under concurrency; a transaction should be used to lock the record while deleting.

entered
features
classmethod getJob(worker=None, SVM=True, verbosity=0, **kw)

Get a job from the queue. Transactions are used to make this safe under concurrent access.

If SVM is false, only feature extraction tasks will be accepted. This is useful if some compute nodes are used without access to the filesystem holding SVM model files.

image
imageID
j = queue
q = queue
releaseJob(worker=None, success=True)

Notify the Queue that the job has been completed.

removeFeatureSet(obj)
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'testsetID': <ForeignKey 2a12cd0 testset>, 'svmodelID': <ForeignKey 2a12c90 svmodel>, 'imageID': <ForeignKey 2a12c50 image>, 'assigned': <DateTimeCol 2654150 assigned>, 'assignee': <StringCol 2654390 assignee>, 'entered': <DateTimeCol 262fb50 entered>}
columnList = [<SODateTimeCol entered>, <SODateTimeCol assigned default=None>, <SOStringCol assignee default=None>, <SOForeignKey imageID connected to Image>, <SOForeignKey svmodelID default=None connected to SVModel>, <SOForeignKey testsetID default=None connected to TestSet>]
columns = {'testsetID': <SOForeignKey testsetID default=None connected to TestSet>, 'svmodelID': <SOForeignKey svmodelID default=None connected to SVModel>, 'imageID': <SOForeignKey imageID connected to Image>, 'assigned': <SODateTimeCol assigned default=None>, 'assignee': <SOStringCol assignee default=None>, 'entered': <SODateTimeCol entered>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.SQLRelatedJoin object at 0x2a127d0>]
joins = [<sqlobject.joins.SOSQLRelatedJoin object at 0x2a19610>]
soClass

alias of Queue

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'queue'
Queue.svmodel
Queue.svmodelID
Queue.testset
Queue.testsetID
class pysteg.sql.Image(**kw)

Bases: sqlobject.main.SQLObject

An Image is an Image Object to be analysed. It may be an identical copy of a Source Image, or it may be a modified version obtained by stego embedding, compression, down sampling, etc.

addFeatureMatrix(key, *a, **kw)

Add feature values from a numpy array M. The given key is the prefix, to which indices are appended. If symindex is True, the indices are symmetric around 0, otherwise they range from 0 upwards.
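
As an illustration of the two indexing schemes, the sketch below generates one key per matrix cell from a prefix. The key format prefix(i,j), the treatment of even-length axes, and the function names are all assumptions; the format pysteg actually stores is not documented here.

```python
from itertools import product


def index_range(n, symindex):
    """Indices for an axis of length n."""
    if symindex:
        half = n // 2
        return range(-half, n - half)  # e.g. n=3 gives -1, 0, 1
    return range(n)                    # e.g. n=3 gives 0, 1, 2


def matrix_keys(prefix, shape, symindex=True):
    """Return one key per matrix cell, in row-major order."""
    axes = [index_range(n, symindex) for n in shape]
    return [
        "%s(%s)" % (prefix, ",".join(str(i) for i in idx))
        for idx in product(*axes)
    ]
```

For a 3x3 matrix with symindex set, the centre cell would get the index (0,0) under this scheme, which suits features such as difference histograms that are centred on zero.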

addFeatures(**kw)

Add feature values for the image. The features are given as a dictionary with keys as used in the database and a floating point value. (Not tested!)

addFeaturesNamed(vals, names)

Add feature values from a list vals. The keys of the features should be given in a list names.

addTestSet(obj)
classmethod byPath(path)

Look up an image by its path name.

delta(feature)

Compare this image with its cover or source image with respect to the given feature. The return value is the difference between the feature values. None is returned if the image does not have a known source image.

destroyValues(verbosity=2)
featureValueObjects(key=None)

Return an iterator of FeatureValue objects defined by the given key. If key is None, all features are included.

features
filename
getBasename()

Return the basename of the file, stripping any extension off.

getCoverFeature(key)

Obtain the given feature value recursively from the source image.

getFeatures(key=None, featureSet=False)

Return a feature vector as a list of floating point values.

getOneFeature(key, verbosity=2)

Return the given feature value.

getPath()

Return the full path name for the image.

getSource()

Return the source image, or self if no source is defined.

idx = <sqlobject.index.SODatabaseIndex object at 0x29fbf10>
imageset
imagesetID
j = image
log
msgfrac
msglen
q = image
removeTestSet(obj)
source
sourceID
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'imagesetID': <ForeignKey 29fb110 imageset>, 'msgfrac': <IntCol 29fb290 msgfrac>, 'sourceID': <ForeignKey 29fb210 source>, 'msglen': <IntCol 29fb250 msglen>, 'filename': <StringCol 2a0e250 filename>}
columnList = [<SOStringCol filename>, <SOForeignKey imagesetID connected to ImageSet>, <SOForeignKey sourceID default=None connected to Image>, <SOIntCol msglen default=None>, <SOIntCol msgfrac default=None>]
columns = {'imagesetID': <SOForeignKey imagesetID connected to ImageSet>, 'msgfrac': <SOIntCol msgfrac default=None>, 'sourceID': <SOForeignKey sourceID default=None connected to Image>, 'msglen': <SOIntCol msglen default=None>, 'filename': <SOStringCol filename>}
idName = 'id'
indexDefinitions = [<DatabaseIndex 29fb1d0 {'unique': True, 'name': 'idx', 'columns': ('imageset', 'filename')}>]
indexes = [<sqlobject.index.SODatabaseIndex object at 0x29fbf10>]
joinDefinitions = [<sqlobject.joins.SQLMultipleJoin object at 0x29fb2d0>, <sqlobject.joins.SQLMultipleJoin object at 0x29fb310>, <sqlobject.joins.RelatedJoin object at 0x29fb350>]
joins = [<sqlobject.joins.SOSQLMultipleJoin object at 0x29fbc90>, <sqlobject.joins.SORelatedJoin object at 0x29fbd10>, <sqlobject.joins.SOSQLMultipleJoin object at 0x29fbdd0>]
soClass

alias of Image

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'image'
Image.testsets
class pysteg.sql.Feature(**kw)

Bases: sqlobject.main.SQLObject

A feature is a function of an image. The database table stores a unique key (ID) and a description.

addFeatureVector(obj)
addValue(image, value)

Add a calculated feature value giving the image and its value.

classmethod byKey(val, connection=None)
cat
catID
description
destroy()

Delete the feature including all calculated feature values.

fv
j = feature
key
q = feature
removeFeatureVector(obj)
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'key': <StringCol 29fbf90 key>, 'description': <StringCol 29fbfd0 description>, 'catID': <ForeignKey 29fb510 cat>}
columnList = [<SOForeignKey catID connected to FeatureSet>, <SOStringCol key alternate ID>, <SOStringCol description default=None>]
columns = {'key': <SOStringCol key alternate ID>, 'description': <SOStringCol description default=None>, 'catID': <SOForeignKey catID connected to FeatureSet>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.RelatedJoin object at 0x29ff050>, <sqlobject.joins.SQLMultipleJoin object at 0x29ff090>]
joins = [<sqlobject.joins.SORelatedJoin object at 0x29ff5d0>, <sqlobject.joins.SOSQLMultipleJoin object at 0x29ff7d0>]
soClass

alias of Feature

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'feature'
Feature.val
class pysteg.sql.FeatureValue(**kw)

Bases: sqlobject.main.SQLObject

A Feature Value is a Feature calculated for a particular Image. The database table stores references to the Feature and Image as foreign keys (one-to-one), and a floating point value.

feature
featureID
getFID()

Return the ID of the feature. The ID is currently an integer, and one can assume that it is comparable. It can be used to give a canonical ordering of features. It is provided as a method for compatibility with decorator patterns and other objects mimicking the interface.

getValue()

Accessor for the value field.

idx = <sqlobject.index.SODatabaseIndex object at 0x29ffe90>
idximg = <sqlobject.index.SODatabaseIndex object at 0x29ffed0>
image
imageID
j = feature_value
q = feature_value
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'featureID': <ForeignKey 29ff110 feature>, 'value': <FloatCol 29ff8d0 value>, 'imageID': <ForeignKey 29ff850 image>}
columnList = [<SOForeignKey featureID connected to Feature>, <SOForeignKey imageID connected to Image>, <SOFloatCol value>]
columns = {'featureID': <SOForeignKey featureID connected to Feature>, 'value': <SOFloatCol value>, 'imageID': <SOForeignKey imageID connected to Image>}
idName = 'id'
indexDefinitions = [<DatabaseIndex 29ff910 {'unique': True, 'name': 'idx', 'columns': ('feature', 'image')}>, <DatabaseIndex 29ff950 {'unique': False, 'name': 'idximg', 'columns': ('image',)}>]
indexes = [<sqlobject.index.SODatabaseIndex object at 0x29ffe90>, <sqlobject.index.SODatabaseIndex object at 0x29ffed0>]
joinDefinitions = []
joins = []
soClass

alias of FeatureValue

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'feature_value'
FeatureValue.value
class pysteg.sql.FeatureSet(**kw)

Bases: sqlobject.main.SQLObject

A Feature Set is a collection of Features with a common description. Fields to be set in the constructor:

Key :human-readable, unique key
Description :longer description of the features
Func :python function to extract the feature. The function is stored as a string and interpreted using eval().
Jpeg(bool) :flag to indicate that the extraction function takes a jpeg object instead of a pixmap matrix.
Matrix(bool) :flag to indicate a feature set represented by a matrix. If set, the addFeatureMatrix() method applies.
Symidx(bool) :(assumes matrix) flag to indicate that individual elements should be indexed symmetrically around 0.

Relational fields:

Features (SelectResult):
 the included features
Queues (SelectResult):
 queue jobs asking to extract the feature set
addQueue(obj)
classmethod byKey(val, connection=None)
count(check=False)

Return the number of features in the set.

credit
description
destroy()

Delete the object including constituent features and feature values.

classmethod destroyKey(key)

Delete the object with the given key.

features
func
j = feature_set
jpeg
key
matrix
q = feature_set
queues
removeQueue(obj)
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'key': <StringCol 29ffa50 key>, 'description': <StringCol 29fff10 description>, 'jpeg': <BoolCol 29fffd0 jpeg>, 'credit': <StringCol 29fff50 credit>, 'symidx': <BoolCol 2a04090 symidx>, 'func': <StringCol 29fff90 func>, 'matrix': <BoolCol 2a04050 matrix>}
columnList = [<SOStringCol key alternate ID>, <SOStringCol description>, <SOStringCol credit default=None>, <SOStringCol func default=None>, <SOBoolCol jpeg default=False>, <SOBoolCol matrix default=True>, <SOBoolCol symidx default=True>]
columns = {'key': <SOStringCol key alternate ID>, 'description': <SOStringCol description>, 'jpeg': <SOBoolCol jpeg default=False>, 'credit': <SOStringCol credit default=None>, 'symidx': <SOBoolCol symidx default=True>, 'func': <SOStringCol func default=None>, 'matrix': <SOBoolCol matrix default=True>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.SQLMultipleJoin object at 0x2a040d0>, <sqlobject.joins.SQLRelatedJoin object at 0x2a04150>]
joins = [<sqlobject.joins.SOSQLRelatedJoin object at 0x2a04c50>, <sqlobject.joins.SOSQLMultipleJoin object at 0x2a04d10>]
soClass

alias of FeatureSet

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'feature_set'
FeatureSet.symidx
FeatureSet.theFeatures(image=None, verbosity=0)

Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.

class pysteg.sql.FeatureVector(**kw)

Bases: sqlobject.main.SQLObject

A Feature Vector is a vector where each element is a Feature. The database table stores Feature Vectors which form the basis for classifiers. Where Feature Sets contain Features with common descriptions, Feature Vectors contain Features which are used together.

addFeature(obj)
classmethod byKey(val, connection=None)
count()

Return the dimensionality of the feature vector.

credit
description
destroy()

Delete the object including corresponding objects in the relation table VectorFeature.

classmethod destroyKey(key)

Delete the object with the given key.

dim
features
j = feature_vector
key
q = feature_vector
removeFeature(obj)
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'dim': <IntCol 2a0b210 dim>, 'description': <StringCol 2a0b290 description>, 'key': <StringCol 2a04e90 key>, 'credit': <StringCol 2a0b250 credit>}
columnList = [<SOStringCol key alternate ID>, <SOIntCol dim>, <SOStringCol credit default=None>, <SOStringCol description default=None>]
columns = {'dim': <SOIntCol dim>, 'description': <SOStringCol description default=None>, 'key': <SOStringCol key alternate ID>, 'credit': <SOStringCol credit default=None>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.SQLRelatedJoin object at 0x2a0b2d0>]
joins = [<sqlobject.joins.SOSQLRelatedJoin object at 0x2a0b890>]
soClass

alias of FeatureVector

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'feature_vector'
FeatureVector.theFeatures(image=None, verbosity=0)

Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.

class pysteg.sql.ImageSet(**kw)

Bases: sqlobject.main.SQLObject

Image Set is a collection of images from the same source and which have been subject to similar processing. It may be an original image base, or a collection of Images processed from an image base.

classmethod byName(val, connection=None)
classmethod byPath(val, connection=None)
colour
conv
description
destroy()

Delete the object, including constituent images.

classmethod destroyKey(key)

Delete the object with the given key.

destroyValues(*a, **kw)
extension
fileformat
getBasename(base)

Look up an image by its base filename (excluding extension).

getPath()

Get the full path to the image set directory.

images
imgformat
j = image_set
name
path
q = image_set
source
sourceID
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'imgformat': <StringCol 2a0e110 imgformat>, 'name': <StringCol 2a0e050 name>, 'conv': <StringCol 2a0e210 conv>, 'sourceID': <ForeignKey 29d7f50 source>, 'stego': <StringCol 2a0e1d0 stego>, 'colour': <BoolCol 2a0e150 colour>, 'extension': <StringCol 2a0e0d0 extension>, 'path': <StringCol 29d7fd0 path>, 'fileformat': <StringCol 2a0e090 fileformat>, 'description': <StringCol 2a0e190 description>}
columnList = [<SOForeignKey sourceID default=None connected to ImageSet>, <SOStringCol path alternate ID>, <SOStringCol name alternate ID>, <SOStringCol fileformat>, <SOStringCol extension default=None>, <SOStringCol imgformat>, <SOBoolCol colour default=False>, <SOStringCol description>, <SOStringCol stego default=None>, <SOStringCol conv default=None>]
columns = {'imgformat': <SOStringCol imgformat>, 'name': <SOStringCol name alternate ID>, 'conv': <SOStringCol conv default=None>, 'sourceID': <SOForeignKey sourceID default=None connected to ImageSet>, 'stego': <SOStringCol stego default=None>, 'colour': <SOBoolCol colour default=False>, 'extension': <SOStringCol extension default=None>, 'path': <SOStringCol path alternate ID>, 'fileformat': <SOStringCol fileformat>, 'description': <SOStringCol description>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.SQLMultipleJoin object at 0x29d7f90>]
joins = [<sqlobject.joins.SOSQLMultipleJoin object at 0x29fb0d0>]
soClass

alias of ImageSet

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'image_set'
ImageSet.stego
class pysteg.sql.TestSet(**kw)

Bases: sqlobject.main.SQLObject

A TestSet is a collection of images used for training or testing of a classifier.

addImage(obj)
classmethod byName(val, connection=None)
count()

Return the number of images in the set.

destroy()

Delete the object, including dependent SVMPerformance objects and TestImage objects.

getClass(label=1)

Return an iterator of Test Image objects restricted to the given class.

getFeatures(fv)

Return a pair (l,v) where l is a list of labels and v is a list of feature vectors for the individual images. This is designed to be compatible with libSVM.
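
The shape of the returned pair can be shown with plain Python data. In this sketch the records, the feature keys f1 and f2, and the function name are made up for illustration; the point is only that labels and vectors come back as two parallel lists with a fixed coordinate order.

```python
def labels_and_vectors(records, feature_keys):
    """Split records into a label list and a list of feature vectors."""
    labels = []
    vectors = []
    for label, values in records:
        labels.append(label)
        # Fix the coordinate order so every vector lines up.
        vectors.append([values[k] for k in feature_keys])
    return labels, vectors


# Hypothetical test images: (class label, {feature key: value}).
records = [
    (+1, {"f1": 0.25, "f2": 1.5}),
    (-1, {"f1": 0.75, "f2": 0.5}),
]
l, v = labels_and_vectors(records, ["f1", "f2"])
```

Here l[i] is the label of the image whose feature vector is v[i], so the two lists can be handed to a trainer side by side.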

getOneFeature(f)

Return an unsorted list of feature values for the given feature f which can be a Feature object or a key.

This appears to be exceedingly slow. TODO: It should be optimised to use a single query to the server.

images
j = test_set
name
perf
q = test_set
removeImage(obj)
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'name': <StringCol 2a0b0d0 name>}
columnList = [<SOStringCol name alternate ID>]
columns = {'name': <SOStringCol name alternate ID>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = [<sqlobject.joins.SQLRelatedJoin object at 0x2a0b3d0>, <sqlobject.joins.SQLMultipleJoin object at 0x2a0b950>, <sqlobject.joins.SQLMultipleJoin object at 0x2a0ba50>]
joins = [<sqlobject.joins.SOSQLMultipleJoin object at 0x2a0bd10>, <sqlobject.joins.SOSQLMultipleJoin object at 0x2a0bd90>, <sqlobject.joins.SOSQLRelatedJoin object at 0x2a0be50>]
soClass

alias of TestSet

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'test_set'
TestSet.testimg
class pysteg.sql.TestImage(**kw)

Bases: sqlobject.main.SQLObject

TestImage is a relational table marking a given Image as included in a Test or Training Set. It includes additional fields, where label is used for classification and response for regression. Clearly, these numbers could be derived from Image data on the fly, but because that depends on both the Image and ImageSet tables it seems cumbersome, and it is preferable at this stage to hardcode them in the relational table.

The TestImage class is a decorator for the Image class, so all methods of Image are supported. See the Image class for details.

For any Image or TestImage object img, the call img() returns the appropriate Image object. This should be used polymorphically whenever the type is unknown and the Image (or Image ID) is required.

addFeatureMatrix(*a, **kw)
addFeatures(*a, **kw)
copy(imageset)

Copy this image into the TestSet imageset, with the same settings.

delta(*a, **kw)
featureValueObjects(*a, **kw)
getBasename()
getCoverFeature(*a, **kw)
getFeatures(*a, **kw)
getOneFeature(*a, **kw)
getPath()
getSource(*a, **kw)
image
imageID
imageset
imagesetID
j = test_image
label
q = test_image
response
class sqlmeta(instance)

Bases: sqlobject.main.sqlmeta

childName = None
columnDefinitions = {'imagesetID': <ForeignKey 2a0bf50 imageset>, 'label': <IntCol 2a12090 label>, 'response': <FloatCol 2a120d0 response>, 'imageID': <ForeignKey 2a0bb50 image>}
columnList = [<SOForeignKey imageID connected to Image not null>, <SOForeignKey imagesetID connected to TestSet>, <SOIntCol label default=None>, <SOFloatCol response default=None>]
columns = {'imagesetID': <SOForeignKey imagesetID connected to TestSet>, 'label': <SOIntCol label default=None>, 'response': <SOFloatCol response default=None>, 'imageID': <SOForeignKey imageID connected to Image not null>}
idName = 'id'
indexDefinitions = []
indexes = []
joinDefinitions = []
joins = []
soClass

alias of TestImage

style = <sqlobject.styles.MixedCaseUnderscoreStyle object at 0x256d850>
table = 'test_image'

svmodel Module

This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.

Module:pysteg.sql.svmodel
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
class pysteg.sql.svmodel.SVModel(**kw)[source]

Bases: sqlobject.main.SQLObject

An SVModel object defines a classifier (using SVM or another
algorithm). It consists of the following elements:
  1. testset: a TestSet object used as training set,
  2. fvector: a FeatureVector object identifying the features used
  3. feature: an associated Feature object representing the classifier output (classification score)
  4. classifier: integer to identify the classifier algorithm. Default: 1 for pysteg.ml.SVM.
  5. scalemethod: integer to identify the scaling method. Default: 1 for scaling into the interval (-1,+1).
  6. scaling: a ScaleModel object. The ScaleModel object is calculated from the training set prior to training, and is used to scale all test objects prior to testing.
  7. model: the classification model. This is not persistent. Instead the model is serialised either in a file identified by the modfile column, or by pickling in the pickle column.
  8. modfile: filename for the classification model.
  9. param: classifier parameters (for arbitrary classifiers). This is a string which must evaluate to a python dict object.

The classifier is defined by the training set (testset), the feature vector (fvector), and a scaling strategy (scalemethod), which is used to calculate the scaling model (scaling). The model is derived from these three. The feature is a hook for the classifier output, so that it can be used as an input feature in a fused classifier.

Unfortunately, the libsvm classifier model relies on the ctypes library and cannot be pickled. It has to be stored on file and not in the database.
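
The param element is a string that must evaluate to a dict. One cautious way to turn such a string into a dict is sketched below. It uses ast.literal_eval from the standard library, which only accepts Python literals; whether pysteg itself uses eval() or something stricter is not stated here, and the helper name and the example parameter names are assumptions.

```python
import ast


def parse_params(param):
    """Parse a parameter string into a dict, rejecting anything else."""
    if not param:
        return {}  # no parameters stored
    value = ast.literal_eval(param)  # literals only, no function calls
    if not isinstance(value, dict):
        raise ValueError(
            "param must evaluate to a dict, got %s" % type(value).__name__
        )
    return value


# Hypothetical parameter string as it might be stored in the column.
params = parse_params("{'C': 10.0, 'gamma': 0.5}")
```

The resulting dict can then be unpacked as keyword arguments for whichever classifier constructor is in use.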

classmethod byKey(key)[source]

Retrieve an SVM model by key (feature key).

classify(img)

Classify the given images img, which may be an Image, a TestSet, or a list of Image objects. The return value is a list of (image,score) pairs where score is the soft information classification heuristic.

In addition to returning the scores, they are also entered in the database as feature values.

destroy(values=False)[source]

Delete the model from the database. Note that any external model file is not removed.

classmethod destroyAll()[source]

Delete all SVM models.

getModel()[source]

Return the SVM model. The type is a libsvm ctypes object.

getPerformance(training=False, prune=False)[source]

Get the canonical SVMPerformance objects for this model. If training is True, then the performance on the training set is returned.

getScaleModel()[source]

Get the scaling model from the database. The return value is a pair (factor,addterm) of lists. The addterm should be subtracted from the feature vector and the result then multiplied by the factor to get a scaled feature vector.
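
The order of operations (subtract addterm, then multiply by factor) can be checked with a small stand-alone sketch. The fitting rule shown, which maps the training minimum and maximum of each coordinate onto -1 and +1, is an assumption about what scaling into (-1,+1) means; the function names are hypothetical.

```python
def fit_scale_model(vectors):
    """Derive (factor, addterm) mapping each coordinate onto [-1, +1]."""
    factor, addterm = [], []
    for column in zip(*vectors):
        lo, hi = min(column), max(column)
        addterm.append((hi + lo) / 2.0)
        # A constant feature gets factor 1 to avoid division by zero.
        factor.append(2.0 / (hi - lo) if hi > lo else 1.0)
    return factor, addterm


def scale(vector, factor, addterm):
    """Subtract addterm, then multiply by factor, coordinate by coordinate."""
    return [(x - a) * f for x, f, a in zip(vector, factor, addterm)]


training = [[0.0, 2.0], [10.0, 4.0]]
factor, addterm = fit_scale_model(training)
scaled = [scale(v, factor, addterm) for v in training]
```

The same (factor, addterm) pair is applied unchanged to test vectors, so test values outside the training range can fall outside the interval.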

getTestSet()[source]

Get the canonical test set for this SVM model. The canonical test set is found by taking the name of the training set and appending the string “_test”.

loadModel()[source]

Load the model from file. The filename is stored in the database.

TODO: add support for the pickle column.

classmethod new(classifier='svm', **kw)

Create a new instance of an SVModel using the given classifier algorithm. Every configuration parameter of the classifier can be passed as a keyword argument, and so can key settings for the SVModel object.

saveModel(filename=None)[source]

Save the model to file. The filename can be specified as an argument, overriding normal behaviour. Otherwise it is sought in the database or constructed from the key using a standard formula.

TODO: add support for the pickle column.

train()[source]

Train the classifier.

The training set is scaled before training and the scaling model stored.

class pysteg.sql.svmodel.SVMPerformance(**kw)[source]

Bases: sqlobject.main.SQLObject

Class to record performance statistics for a particular SVM model on a particular test set.

display()[source]

Pretty-print the performance entry.

run(recalculate=True)[source]

Calculate the performance.

By default, every test object is reclassified. If the recalculate parameter is set to False, the classification scores are retrieved from the database.

TODO: Check for missing classification scores and recalculate as required.
class pysteg.sql.svmodel.ScaleModel(**kw)

Bases: sqlobject.main.SQLObject

This is a complete scaling model, with scaling formulæ for each feature. It implements part of the interface of FeatureVector and can be passed to the getFeatures() methods of Image, ImageSet, and TestSet to return complete scaled feature vectors with canonical coordinate ordering.

destroy(values=False)

Delete the model from the database.

getDim()

Return the feature space dimensionality.

pysteg.sql.svmodel.newModel(fskey, key, fvkey, tset, fsdesc=None, desc=None, **kw)[source]

Create a new SVModel object and necessary related records.

pysteg.sql.svmodel.newSVM(trainingset, fvlist, fsdesc=None, **kw)[source]

Add a new SVModel for the given trainingset and every feature vector in fvlist. Queue all the new models for training. If a model already exists, no new model is created with the same parameters.

pysteg.sql.svmodel.testSVM(training=False, key=None)[source]

Queue a performance test of every SVModel object in the database.

If key is given, every model is tested on the corresponding TestSet object. Otherwise, it is tested on the training set if the training argument is set to True, or on the canonical test set if it is set to False (default).

pysteg.sql.svmodel.perfQueue(T, m)[source]

Queue a performance test of model m on TestSet T, unless a performance record already exists.

setup Module

Functions to create the database tables.

This module is only needed when the database is initialised and should otherwise be ignored.

pysteg.sql.setup.dropTables(**kw)[source]

Drop all the tables.

pysteg.sql.setup.createTables(**kw)[source]

Create all the tables.

exceptions Module

Exceptions for the pysteg.sql package

Module:pysteg.sql.exceptions
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
exception pysteg.sql.exceptions.ConfigError[source]

Bases: exceptions.Exception

Error in the configuration file.

exception pysteg.sql.exceptions.DataIntegrityException[source]

Bases: exceptions.Exception

Integrity error in the database contents.

exception pysteg.sql.exceptions.MissingDataException[source]

Bases: exceptions.Exception

This exception is raised when prerequisite data are found to be missing from the database during calculations. Catching it allows client processes to proceed to the next task in the queue.

Data entry and Feature extraction

imageset Module

Load image sets into the database and define test and training sets.

Module:pysteg.sql.imageset
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

This is rather crude and it may be better to consult the scripts to see how the functions are used.

pysteg.sql.imageset.loadImages(fn)[source]

Define image sets based on a config file with the given filename fn.

pysteg.sql.imageset.similarTestSet(base, stego, name, incomplete=False)[source]

Create a new TestSet based on base, but using stego images from stego instead. The same random selection is used as in base. If images are missing from the new stego set, an exception will be raised unless the incomplete argument is set to True, in which case the missing images will just be omitted.

The current approach is not ideal. It is difficult to queue feature extraction tasks for the new images without requeueing old images as well. A new approach is needed.

pysteg.sql.imageset.makeTestSets(clean, stego, name, testname=None, testsize=None, trainsize=None, skew=0.5)[source]

Given two image sets for clean images and steganograms respectively, training and test sets are constructed randomly. It is assumed that both clean and stego contain corresponding images with the same basename, and if a clean image is included, the corresponding stego image is excluded, and vice versa.

TODO: Create intermediate subsets to make it easier to queue feature extraction of just the necessary images.
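
The constraint that a clean image and its steganogram never both appear can be sketched independently of the database. In the code below the interpretation of skew as the probability of picking the steganogram, the 0/1 labels, the seed argument, and the function name are all assumptions; makeTestSets() itself may choose differently.

```python
import random


def split_sets(basenames, trainsize, testsize, skew=0.5, seed=0):
    """Return (training, test) lists of (basename, label) pairs.

    label is 1 for a steganogram and 0 for a clean image. Each basename
    is used at most once, so a clean image and its steganogram never
    both appear.
    """
    rng = random.Random(seed)
    # Draw distinct basenames for both sets in one go.
    chosen = rng.sample(sorted(basenames), trainsize + testsize)
    # For each basename pick either the stego or the clean version.
    labelled = [(b, 1 if rng.random() < skew else 0) for b in chosen]
    return labelled[:trainsize], labelled[trainsize:]


names = ["img%03d" % i for i in range(20)]
training, test = split_sets(names, trainsize=8, testsize=4)
```

Sampling basenames without replacement before choosing a version is what keeps the training and test sets disjoint at the level of source images.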
pysteg.sql.imageset.groupTestSet(set, name, feature, min=None, max=None, create=True, **kw)[source]

Return a new TestSet object with the given name, created by taking the images from set which satisfy min <= feature < max. If min or max is None, it poses no constraint.
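
The half-open range test, with None meaning no bound, can be written as follows. This is a sketch on plain tuples rather than database objects; lower and upper stand in for the min and max arguments (renamed here to avoid shadowing the built-ins), and the sample feature name is only an example.

```python
def in_range(value, lower=None, upper=None):
    """True if lower <= value < upper, where None means no bound."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value >= upper:
        return False
    return True


def group(images, feature, lower=None, upper=None):
    """Return names of images whose feature value falls in the range."""
    return [
        name for name, feats in images
        if in_range(feats[feature], lower, upper)
    ]


# Hypothetical images: (name, {feature key: value}).
images = [
    ("a", {"msglen": 10}),
    ("b", {"msglen": 20}),
    ("c", {"msglen": 30}),
]
```

Because the upper bound is exclusive, consecutive groups such as [10,20) and [20,30) partition the images without overlap.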

pysteg.sql.imageset.dummyTestSet(name, L)[source]

Create a dummy TestSet by combining all images from every image set in L. All the test images are given the label 1. This is mainly intended to form a set of images for which classification scores can be calculated in bulk, and not as a test or training set as such. The elements of L may be any iterable over images, including TestSet or ImageSet objects.

pysteg.sql.imageset.mergeTestSet(name, L, verbosity=1)

Create a dummy TestSet by combining all images from every image set in L. All the test images are given the label 1. This is mainly intended to form a set of images for which classification scores can be calculated in bulk, and not as a test or training set as such. The elements of L may be any iterable over images, including TestSet or ImageSet objects.

features Module

Define new feature sets and feature vectors.

Module:pysteg.sql.features
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

This module provides functions to define new features, feature vectors, and feature sets, including feature level fusion. The functions fsconfig() and fvconfig() read definitions from a config file and enter them into the database.

pysteg.sql.features.fsconfig(fn)[source]

Define feature sets/vectors from a config file with the given filename fn.

The file assumes feature sets by default, but any section with vector = True will be handled as a feature vector.

pysteg.sql.features.fvconfig(fn=None, cfg=None, fvlist=None, verbosity=1)[source]

Define feature vectors based on a config file with the given filename fn.

extract Module

Methods to load images, extract features, and enter them into the database. This module provides the glue between the SQL object library and the features module.

Module:pysteg.sql.extract
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

Only two methods are exported, one to enter tasks in the queue and one to process the queue.

pysteg.sql.extract.queueSet(imgset, fv, stegonly=False, checkLog=True)[source]

Queue new tasks for feature extraction. The given feature set or list of feature sets fv is queued for every image in imgset, which can be either an ImageSet or a TestSet.

pysteg.sql.extract.worker(*a, **kw)[source]

Process jobs from the queue until the queue is empty, or the SIGUSR2 signal is received.

It may be suboptimal to configure the signal handler in the API. It might be better to move the loop to the script defining the UI.
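
The stopping rule can be sketched independently of the database. This is a minimal illustration, not the pysteg implementation: StopFlag, process_queue, and run are hypothetical names, and the jobs are plain list items rather than Queue records. The signal handler only sets a flag, so the job in progress finishes before the loop exits. SIGUSR2 is available on Unix-like systems only.

```python
import signal


class StopFlag:
    """Callable signal handler that records that a stop was requested."""

    def __init__(self):
        self.stop = False

    def __call__(self, signum, frame):
        self.stop = True


def process_queue(jobs, handle, flag):
    """Handle jobs in order until the list is empty or flag.stop is set."""
    done = []
    while jobs and not flag.stop:
        job = jobs.pop(0)
        handle(job)
        done.append(job)
    return done


def run(jobs, handle):
    """Install the SIGUSR2 handler, process the queue, restore the handler."""
    flag = StopFlag()
    previous = signal.signal(signal.SIGUSR2, flag)
    try:
        return process_queue(jobs, handle, flag)
    finally:
        signal.signal(signal.SIGUSR2, previous)
```

Keeping process_queue separate from run reflects the remark above: the loop can live in the script that defines the UI, with the signal handling configured there.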

Analysis and reporting

stats Module

Module for statistical analysis and comparison of features.

Module:pysteg.sql.stats
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
pysteg.sql.stats.ccount(L, fl=['1', '2'])[source]
pysteg.sql.stats.compareClassifiers(imgset, fl)[source]
pysteg.sql.stats.corrcoef(imgset, features)[source]

Returns the correlation coefficient matrix of the given features, calculated from the images in imgset. The features argument can be a list of Feature objects or feature keys. The imgset object can be a list of Image objects, an ImageSet object, or a TestSet object.
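
For reference, the quantity being returned can be computed as below. This is a plain-Python sketch of a Pearson correlation matrix over lists of feature values; the name correlation_matrix and the input layout (one list per feature, one entry per image) are assumptions and do not describe how pysteg fetches or stores the data.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length value lists."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    # Centre each feature, then normalise inner products by the vector norms.
    centred = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(x * x for x in c)) for c in centred]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centred, norms)]
        for ci, ni in zip(centred, norms)
    ]


# Three hypothetical features observed on four images.
R = correlation_matrix([[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]])
```

A constant feature has zero norm and would cause a division by zero in this sketch.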

pysteg.sql.stats.deltaMoments(imgset, feature, label=None)[source]

Consider the difference in the given feature between a steganogram and its corresponding cover image. Return the first four statistical moments (mean, variance, skewness, and kurtosis) of this difference in the given image set (imgset).

If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.

Images which do not have a source (cover) image recorded in the database will be tacitly ignored.
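
The calculation can be sketched as follows. The data layout and the names moments and delta_moments are hypothetical, and the normalisation is an assumption: the sketch uses the population variance and the non-excess kurtosis, which may differ from what the package returns.

```python
def moments(xs):
    """Mean, population variance, skewness, and non-excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def delta_moments(stego, cover):
    """stego maps image -> (source or None, value); cover maps image -> value.

    Steganograms without a recorded source are silently skipped.
    """
    deltas = [value - cover[src] for src, value in stego.values()
              if src is not None and src in cover]
    return moments(deltas)


stego = {"s1": ("c1", 3.0), "s2": ("c2", 5.0), "s3": (None, 9.0), "s4": ("c4", 4.0)}
cover = {"c1": 1.0, "c2": 1.0, "c4": 1.0}
mean, var, skew, kurt = delta_moments(stego, cover)
```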

pysteg.sql.stats.featureMedian(imgset, feature, label=None)[source]

Return the median of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
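
The effect of the label argument, which is shared by several functions in this module, can be illustrated with a small sketch. The function feature_median and the (label, value) pairs are hypothetical stand-ins for TestImage objects and database lookups.

```python
from statistics import median


def feature_median(images, label=None):
    """images is an iterable of (label, value) pairs.

    With label=None every image counts; otherwise only matching labels do.
    """
    values = [v for lab, v in images if label is None or lab == label]
    return median(values)


images = [(0, 1.0), (0, 3.0), (1, 10.0), (1, 20.0), (1, 30.0)]
overall = feature_median(images)
class0 = feature_median(images, label=0)
class1 = feature_median(images, label=1)
```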

pysteg.sql.stats.featureMoments(imgset, feature, label=None)[source]

Return the first four statistical moments (mean, variance, skewness, and kurtosis) of the given feature in the given image set imgset.

If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.

pysteg.sql.stats.featurePerc(imgset, feature, bins=10, label=None)[source]

Return the percentile points of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
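
As an illustration of what percentile points with bins=10 could mean, the sketch below uses statistics.quantiles from the standard library (Python 3.8 or later). Whether featurePerc returns the interior cut points only, or also the minimum and maximum, is not stated here; the sketch assumes the interior points, and the name percentile_points is hypothetical.

```python
from statistics import quantiles


def percentile_points(values, bins=10):
    """Return the bins - 1 interior cut points dividing values into bins groups."""
    return quantiles(values, n=bins)


# Hypothetical feature values 1, 2, ..., 99.
points = percentile_points(list(range(1, 100)), bins=10)
```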

pysteg.sql.stats.reclass(v)[source]

Translate +/-1 labels to 0/1.
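
A one-line sketch of such a translation is given below. It assumes that -1 maps to 0 and +1 maps to 1, which the description above does not state explicitly, and the name reclass_sketch is hypothetical.

```python
def reclass_sketch(v):
    """Map labels -1 -> 0 and +1 -> 1 for every element of v."""
    return [(int(x) + 1) // 2 for x in v]


labels = reclass_sketch([-1, 1, 1, -1])
```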

pysteg.sql.stats.scatterPlot(imgset, f1, f2, outfile=None)[source]

Plot two features against each other in the form of a scatter plot. The first argument is a TestSet object using the class labels 0 and 1, where 0 is plotted red and 1 is plotted blue. The second and third arguments are features, given as Feature objects or as keys. If the optional outfile is given, the plot is written to the given file.

latex Module

Selection of functions to create reports in LaTeX format.

pysteg.sql.latex.texPerformance()[source]

Return the entire SVMPerformance table in LaTeX format. The return value is a string.

pysteg.sql.latex.texModels()[source]

Return a LaTeX table showing all SVM models with testing where available.

tools Module

Reporting, feature dumps, and other output.

Module:pysteg.sql.tools
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

Selection of functions to produce reports and other output from the feature database.

pysteg.sql.tools.calculated()[source]

Print a report for each image set showing which features have been calculated and recorded in the database. This is a slow operation.

pysteg.sql.tools.saveScaledFeatures(fn, imgset, model, *a, **kw)[source]

Save scaled features in a text file, taking features from the imageset imgset and SVM model model. The output filename is given by fn. The output is libSVM’s sparse format by default, but specifying the keyword argument libsvm=False gives comma separated values instead.

TODO: fix this - avoid call to non-existent function scaling.applyScale.

pysteg.sql.tools.savefeatures(fn, imgset, fv, *a, **kw)[source]

Save features in a text file, taking features from the imageset imgset and feature vector fv. The output filename is given by fn. The output is libSVM’s sparse format by default, but specifying the keyword argument libsvm=False gives comma separated values instead.
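
The two output formats differ as sketched below. The helpers libsvm_line and csv_line are hypothetical and only illustrate the line layout: libSVM's sparse format is a label followed by index:value pairs with indices starting at 1 and zero entries omitted. Placing the label in the first CSV column is an assumption.

```python
def libsvm_line(label, values):
    """Format one sample as 'label index:value ...', omitting zero entries."""
    pairs = ["%d:%g" % (i, x) for i, x in enumerate(values, start=1) if x != 0]
    return " ".join([str(label)] + pairs)


def csv_line(label, values):
    """Format one sample as comma separated values, label first (assumed)."""
    return ",".join([str(label)] + ["%g" % x for x in values])


sparse = libsvm_line(1, [0.5, 0.0, 2.0])
dense = csv_line(1, [0.5, 0.0, 2.0])
```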

coverselect Module

Classification error analysis for different cover selection groups.

Module:pysteg.sql.coverselect
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

The main feature of this module is the cStat() function which plots bar charts of accuracy and/or FP/FN rates for subgroups of the test set divided according to some given feature. The charts make a basis for assessing the feature as a cover selection heuristic.

The mcStat() function is similar to cStat() but allows a joint plot for multiple classification scores.

There is also an under-documented iStat() function which is used to check cover selections created as an intersection of two or more existing selections.

The other methods are auxiliaries, but may be useful for variations on the theme.

pysteg.sql.coverselect.cStat(testset, feature, score, bins=10, reverse=False, aplot=None, eplot=None)[source]

Make bar charts of accuracy and error rates of the classification score score for different groups of covers. The covers are divided into bins bins according to the cover heuristic feature. Error rates are plotted to the file eplot and accuracies to the file aplot.
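
The grouping behind such a chart can be sketched without plotting. This is an illustration under assumptions, not the pysteg implementation: the name accuracy_by_bin and the (feature, label, predicted) triples are hypothetical, and the sketch uses equal-count bins over the samples sorted by feature value, whereas the package may bin differently.

```python
def accuracy_by_bin(samples, bins=10):
    """samples is a list of (feature, label, predicted) triples.

    Returns one accuracy per bin, or None for an empty bin.
    """
    ordered = sorted(samples, key=lambda s: s[0])
    n = len(ordered)
    result = []
    for b in range(bins):
        # Equal-count slices of the samples ordered by the heuristic feature.
        chunk = ordered[b * n // bins:(b + 1) * n // bins]
        if not chunk:
            result.append(None)
            continue
        correct = sum(1 for _, label, pred in chunk if label == pred)
        result.append(correct / len(chunk))
    return result


samples = [(0.4, 1, 0), (0.1, 0, 0), (0.3, 0, 1), (0.2, 1, 1)]
accuracies = accuracy_by_bin(samples, bins=2)
```

A feature is a promising cover selection heuristic when the accuracy differs markedly between bins.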

pysteg.sql.coverselect.iStat(*a, **kw)[source]

Test intersection of multiple cover selections.

pysteg.sql.coverselect.mcStat(testset, feature, score, bins=10, reverse=False, aplot='/tmp/test.pdf')[source]

Make a bar chart of accuracies for different groups of covers. The covers are divided into bins bins according to the cover heuristics feature. The accuracy is plotted for each of the classifier scores in the list score. The plot is saved in the file aplot.

stegocompare Module

pysteg.sql.stegocompare.cbar(L, T, score, feature, outfile=None)[source]
pysteg.sql.stegocompare.ccount(L, bn, score, feature=None, verbosity=1)[source]

Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.

pysteg.sql.stegocompare.ccount2d(L1, L2, bn, s1, s2, verbosity=1)[source]

Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.

pysteg.sql.stegocompare.cdata2d(L1, L2, T, s1, s2)[source]
pysteg.sql.stegocompare.chist(L, T, score, outfile=None)[source]
pysteg.sql.stegocompare.cscatter(L, T, score, feature, outfile=None)[source]
pysteg.sql.stegocompare.cstat(L, T, score, feature, outfile=None)[source]

errors Module

Error profiling for steganalysers. Very experimental and undocumented.

Module:pysteg.sql.errors
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
class pysteg.sql.errors.ErrorProfiler(L)[source]

Bases: object

cat(k1, k2)[source]
pairProfile(k1, k2)[source]
pie(outfile, k1, k2)[source]
class pysteg.sql.errors.ImgList(imgset=None)[source]

Bases: list

This class represents a list of images with feature values downloaded from the SQL server and managed in local memory.

bar3d(outfile=None, **kw)[source]
erates(key)[source]
get(key, score=None, ecat=None)[source]
getBars(k1, k2, key=None, bins=5)[source]
histogram(key, bins=5)[source]

Make a histogram of the feature values given by feature key.

histogram2d(k1, k2, bins=5, score=None, ecat=None)[source]

Make a 2D histogram of the feature values given by the two features k1 and k2.

loadCoverFeatures(L)[source]
loadFeatures(L)[source]
pysteg.sql.errors.scatterPlot(outfile, imgset, L, f1, f2)[source]

Private modules

_aux Module

Auxiliary functions. Used internally in the package; not intended for export.

pysteg.sql.aux.getFeatureObject(f)[source]
pysteg.sql.aux.isDuplicateError(e)[source]
pysteg.sql.aux.matrix2dict(M, centre=False)[source]

Return a list of (index,value) pairs where value is an entry in the matrix M and index its index. If centre is True, the indices are offset to be centred at 0.
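
A sketch of the described behaviour for a matrix given as a list of rows follows. The name matrix_to_pairs is hypothetical, and the centring rule (subtracting half the size along each axis, rounded down) is an assumption about what centred at 0 means.

```python
def matrix_to_pairs(M, centre=False):
    """Return ((row, column), value) pairs for every entry of M."""
    rows, cols = len(M), len(M[0])
    # With centre=True the middle entry of an odd-sized matrix gets index (0, 0).
    di, dj = (rows // 2, cols // 2) if centre else (0, 0)
    return [((i - di, j - dj), M[i][j])
            for i in range(rows) for j in range(cols)]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
plain = matrix_to_pairs(M)
centred = matrix_to_pairs(M, centre=True)
```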

pysteg.sql.aux.splitFilename(fn)
pysteg.sql.aux.tailType(obj)[source]

Return the class name of an object, stripping any prefixing package names. This is used to recognise exceptions returned from different database backends. The exception names have been standardised (DataError, IntegrityError, etc.), but each backend has its own definition.
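
The idea can be sketched as follows. The name tail_type and the two stand-in exception classes are hypothetical; they imitate two database backends that each define their own IntegrityError in a different module.

```python
def tail_type(obj):
    """Return the class name of obj without any module or package prefix."""
    cls = type(obj)
    full = cls.__module__ + "." + cls.__name__
    return full.rsplit(".", 1)[-1]


# Two distinct classes with the same name, as different backends would define.
ErrA = type("IntegrityError", (Exception,), {"__module__": "backend_a.errors"})
ErrB = type("IntegrityError", (Exception,), {"__module__": "backend_b"})

same_name = tail_type(ErrA()) == tail_type(ErrB())
```

Comparing names rather than classes lets one except clause body treat both backends alike, at the cost of also matching any unrelated class that happens to share the name.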

_config Module

This module defines the cp class which is used to manage global configuration. It should not be imported directly; instead, an instance, config, is exposed by the pysteg.sql package. The cp class decorates the OptionParser and should be used to parse options in scripts. Some command line options are defined to override the config file.

Module:pysteg.sql.config
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

Modules providing objects exposed elsewhere

The following modules provide core classes for the data model. Members are exposed via other modules and documented there. Only very rarely would one need to import any of these explicitly.

tables Module

This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.

Module:pysteg.sql.tables
Date:$Date$
Revision:$Revision$
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

_queue Module

This module defines the Queue class and the associated SQL table to maintain the job queue. All the necessary functionality is provided by methods.

All the necessary objects are exposed by the main package, so this module should not normally be explicitly imported.

_scaling Module

This module defines a scaling model, to scale features prior to classification. It is used by the SVModel class, but is designed with loose coupling to facilitate reuse with other classification algorithms.

The ScaleModel class implements some of the interface of FeatureVector and can be used in lieu thereof when getting feature values from images.

The implementation is slow. Each feature value depends on three tables and three records are queried separately from Feature, FeatureValue, and Scaling. Combining the three in one view to be queried in one operation is expected to be faster.
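
The single-view approach could look like the sketch below, which uses an in-memory SQLite database from the standard library. Everything specific in it is an assumption made for illustration: the table and column names, the view name scaled_value, and the scaling formula (value minus a shift, times a factor) are not taken from the pysteg schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feature (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature_value (image_id INTEGER, feature_id INTEGER, val REAL);
CREATE TABLE scaling (feature_id INTEGER, shift REAL, factor REAL);
INSERT INTO feature VALUES (1, 'f1'), (2, 'f2');
INSERT INTO feature_value VALUES (10, 1, 4.0), (10, 2, 6.0);
INSERT INTO scaling VALUES (1, 2.0, 0.5), (2, 1.0, 2.0);

-- One view joining the three tables, so a single query returns scaled values.
CREATE VIEW scaled_value AS
    SELECT v.image_id, f.name, (v.val - s.shift) * s.factor AS scaled
    FROM feature_value AS v
    JOIN feature AS f ON f.id = v.feature_id
    JOIN scaling AS s ON s.feature_id = v.feature_id;
""")

# All scaled features of one image in a single round trip.
rows = conn.execute(
    "SELECT name, scaled FROM scaled_value WHERE image_id = ? ORDER BY name",
    (10,),
).fetchall()
```

The join is evaluated by the database server, which replaces three queries per feature value with one query per image.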

This module will auto-connect to the database and must be loaded after options have been processed, to ensure correct connection. The reason for this is that it depends on views defined server side.