sql Package

This package provides an interface to an SQL database to store image features for steganalysis.

Module:pysteg.sql
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

The tables represented by SQLObject classes are visible directly in the package. The different submodules provide functionality:

setup:functions to create tables and enter standard feature definitions
imageset:enter images in the db and create test and training sets
features:establish new feature vectors
extract:extract features from images and enter them in the db
queue:the class and SQL table for the job queue
stats:statistical analysis of features in the database
svmodel:managing SVM classifiers using features from the db
scaling:scaling models for use with learning classifiers
exception pysteg.sql.ConfigError[source]

Bases: exceptions.Exception

Error in the configuration file.

exception pysteg.sql.DataIntegrityException[source]

Bases: exceptions.Exception

Integrity error in the database contents.

exception pysteg.sql.MissingDataException[source]

Bases: exceptions.Exception

This exception is raised when prerequisite data are found to be missing from the database during calculations. Catching it allows client processes to proceed to the next task in the queue.
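
The pattern described above can be sketched as follows, in Python 3 and without a database. The MissingDataException class here is a local stand-in for pysteg.sql.MissingDataException, and run_task, process_all, and the task dictionaries are hypothetical names introduced only for illustration.

```python
class MissingDataException(Exception):
    """Local stand-in for pysteg.sql.MissingDataException."""


def run_task(task):
    # Hypothetical task: it needs a value that may not be stored yet.
    if task.get("value") is None:
        raise MissingDataException("no value for %s" % task["name"])
    return task["value"] * 2


def process_all(tasks):
    """Run every task, skipping those whose prerequisites are missing."""
    done, skipped = [], []
    for task in tasks:
        try:
            done.append(run_task(task))
        except MissingDataException:
            # Prerequisite data are missing: proceed to the next task.
            skipped.append(task["name"])
    return done, skipped


tasks = [
    {"name": "a", "value": 1.5},
    {"name": "b", "value": None},
    {"name": "c", "value": 2.0},
]
done, skipped = process_all(tasks)
```

Only the one exception type is caught, so unrelated errors still stop the worker.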

pysteg.sql.getImageSet(name)[source]

Look up an image set by its key. The ImageSet table is tried first, and the TestSet table if that fails. If the argument is itself an SQLObject, this is returned as is. It should be used by any function intended to take a polymorphic image set argument.
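
A minimal sketch of this lookup order, with two dictionaries standing in for the ImageSet and TestSet tables. The Record class, the dictionary contents, and the behaviour when both lookups fail (a KeyError here) are assumptions made for the sketch, not taken from the package.

```python
class Record:
    """Hypothetical stand-in for an SQLObject row."""

    def __init__(self, name):
        self.name = name


# Stand-ins for the ImageSet and TestSet tables (hypothetical contents).
IMAGE_SETS = {"clean": Record("clean")}
TEST_SETS = {"clean-train": Record("clean-train")}


def get_image_set(arg):
    # An object that is already a record is returned as is.
    if isinstance(arg, Record):
        return arg
    # Try the ImageSet table first, then fall back to TestSet.
    try:
        return IMAGE_SETS[arg]
    except KeyError:
        return TEST_SETS[arg]


by_key = get_image_set("clean")
by_fallback = get_image_set("clean-train")
by_object = get_image_set(by_key)
```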

pysteg.sql.sqlConnect()[source]

Connect to the database, using connection data from the config object.

The data model

The database tables and corresponding Python objects are defined in three modules. Normally, one should not import tables or queue directly, as all their elements are exposed by importing just the pysteg.sql package.

The svmodel module must be imported if needed though, and it also includes helper functions in addition to the data structure.

tables Module

This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.

Module:pysteg.sql.tables
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
class pysteg.sql.tables.Image(**kw)[source]

Bases: sqlobject.main.SQLObject

An Image is an Image Object to be analysed. It may be an identical copy of a Source Image, or it may be a modified version obtained by stego embedding, compression, down sampling, etc.

addFeatureMatrix(key, M)[source]

Add feature values from a numpy array M. The given key is the prefix, to which indices are appended. If symindex is True, the indices are symmetric around 0, otherwise they range from 0 upwards.

addFeatures(**kw)[source]

Add feature values for the image. The features are given as a dictionary with keys as used in the database and a floating point value. (Not tested!)

addFeaturesNamed(vals, names)[source]

Add feature values from a list vals. The keys of the features should be given in a list names.

classmethod byPath(path)[source]

Look up an image by its path name.

delta(feature)[source]

Compare this image with its cover or source image with respect to the given feature. The return value is the difference between the feature values. None is returned if the image does not have a known source image.

featureValueObjects(key=None)[source]

Return an iterator of FeatureValue objects defined by the given key. If key is None, all features are included.

getBasename()[source]

Return the basename of the file, stripping any extension off.

getCoverFeature(key)[source]

Obtain the given feature value recursively from the source image.

getFeatures(key=None, featureSet=False)[source]

Return a feature vector as a list of floating point values.

getOneFeature(key, verbosity=2)[source]

Return the given feature value.

getPath()[source]

Return the full path name for the image.

getSource()

Return the source image, or self if no source is defined.
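
The relationship between getSource(), getCoverFeature(), and delta() can be sketched with a small in-memory class. SketchImage and its attributes are hypothetical, and the sketch assumes that delta() compares against the value found by following the chain of sources to its end; the package may compare against the immediate source instead.

```python
class SketchImage:
    """Stand-in for an image record with an optional source image."""

    def __init__(self, features, source=None):
        self.features = features  # dict: feature key -> float
        self.source = source      # another SketchImage, or None

    def get_source(self):
        # Fall back to the image itself when no source is defined.
        return self if self.source is None else self.source

    def get_cover_feature(self, key):
        # Follow the chain of sources until an image without a source.
        if self.source is None:
            return self.features[key]
        return self.source.get_cover_feature(key)

    def delta(self, key):
        # No known source image: there is nothing to compare with.
        if self.source is None:
            return None
        return self.features[key] - self.get_cover_feature(key)


cover = SketchImage({"mean": 0.25})
stego = SketchImage({"mean": 0.75}, source=cover)
```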

class pysteg.sql.tables.Feature(**kw)[source]

Bases: sqlobject.main.SQLObject

A feature is a function of an image. The database table stores a unique key (ID) and a description.

addValue(image, value)[source]

Add a calculated feature value giving the image and its value.

destroy()[source]

Delete the feature including all calculated feature values.

class pysteg.sql.tables.FeatureValue(**kw)[source]

Bases: sqlobject.main.SQLObject

A Feature Value is a Feature calculated for a particular Image. The database table stores references to the Feature and Image as foreign keys (one-to-one), and a floating point value.

getFID()[source]

Return the ID of the feature. The ID is currently an integer, and one can assume that it is comparable. It can be used to give a canonical ordering of features. It is provided as a method for compatibility with decorator patterns and other objects mimicking the interface.

getValue()[source]

Accessor for the value field.
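
As an illustration of the canonical ordering mentioned for getFID(), the sketch below sorts value objects by feature ID before building a vector. SketchValue is a hypothetical stand-in that only mimics the two accessors; canonical_vector is not a function of the package.

```python
class SketchValue:
    """Stand-in mimicking the getFID()/getValue() interface."""

    def __init__(self, fid, value):
        self._fid = fid
        self._value = value

    def getFID(self):
        return self._fid

    def getValue(self):
        return self._value


def canonical_vector(values):
    # Sort by feature ID so every image yields the same coordinate order.
    ordered = sorted(values, key=lambda fv: fv.getFID())
    return [fv.getValue() for fv in ordered]


vector = canonical_vector(
    [SketchValue(3, 0.3), SketchValue(1, 0.1), SketchValue(2, 0.2)]
)
```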

class pysteg.sql.tables.FeatureSet(**kw)[source]

Bases: sqlobject.main.SQLObject

A Feature Set is a collection of Features with a common description. Fields to be set in the constructor:

Key :human-readable, unique key
Description :longer description of the features
Func :python function to extract the feature. The function is stored as a string and interpreted using eval().
Jpeg(bool) :flag to indicate that the extraction function takes a jpeg object instead of a pixmap matrix.
Matrix(bool) :flag to indicate a feature set represented by a matrix. If set, the addFeatureMatrix() method applies.
Symidx(bool) :(assumes matrix) flag to indicate that individual elements should be indexed symmetrically around 0.

Relational fields:

Features (SelectResult):
 the included features
Queues (SelectResult):
 queue jobs asking to extract the feature set
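
To illustrate the Func field, the following sketch stores an extraction function as a string and recovers a callable with eval(). The lambda expression and the variable names are hypothetical; the strings actually stored by the package may look quite different. Note that eval() executes arbitrary code, so it should only be applied to trusted strings.

```python
# The extraction function as it might be stored in a text column
# (hypothetical content: the mean of all pixel values).
func_source = (
    "lambda pixels: sum(map(sum, pixels)) / float(sum(map(len, pixels)))"
)

# eval() turns the stored string back into a callable.
extract = eval(func_source)

pixels = [[1, 2], [3, 6]]
value = extract(pixels)
```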
count(check=False)[source]

Return the number of features in the set.

destroy()[source]

Delete the object including constituent features and feature values.

classmethod destroyKey(key)[source]

Delete the object with the given key.

theFeatures(image=None, verbosity=0)[source]

Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.

class pysteg.sql.tables.FeatureVector(**kw)[source]

Bases: sqlobject.main.SQLObject

A Feature Vector is a vector where each element is a Feature. The database table stores Feature Vectors which form the basis for classifiers. Where Feature Sets contain Features with common descriptions, Feature Vectors contain Features which are used together.

count()[source]

Return the dimensionality of the feature vector.

destroy()[source]

Delete the object including corresponding objects in the relation table VectorFeature.

classmethod destroyKey(key)[source]

Delete the object with the given key.

theFeatures(image=None, verbosity=0)[source]

Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.

class pysteg.sql.tables.ImageSet(**kw)[source]

Bases: sqlobject.main.SQLObject

Image Set is a collection of images from the same source and which have been subject to similar processing. It may be an original image base, or a collection of Images processed from an image base.

destroy()[source]

Delete the object, including constituent images.

classmethod destroyKey(key)[source]

Delete the object with the given key.

getBasename(base)[source]

Look up an image by its base filename (excluding extension).

getPath()[source]

Get the full path to the image set directory.

class pysteg.sql.tables.TestSet(**kw)[source]

Bases: sqlobject.main.SQLObject

A TestSet is a collection of images used for training or testing of a classifier.

count()[source]

Return the number of images in the set.

destroy()[source]

Delete the object, including dependent SVMPerformance objects and TestImage objects.

getClass(label=1)[source]

Return an iterator of Test Image objects restricted to the given class.

getFeatures(fv)[source]

Return a pair (l,v) where l is a list of labels and v is a list of feature vectors for the individual images. This is designed to be compatible with libSVM.
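
The shape of the returned pair can be sketched as below: two parallel lists, one of labels and one of feature vectors, which is the layout libSVM's Python interface accepts. The dictionaries and the function features_for_svm are hypothetical stand-ins, not the package's implementation.

```python
def features_for_svm(test_images, feature_keys):
    """Return (labels, vectors) as two parallel lists."""
    labels, vectors = [], []
    for img in test_images:
        labels.append(img["label"])
        # One vector per image, with coordinates in a fixed key order.
        vectors.append([img["features"][k] for k in feature_keys])
    return labels, vectors


sample_images = [
    {"label": 1, "features": {"f1": 0.5, "f2": 2.0}},
    {"label": -1, "features": {"f1": 0.1, "f2": 1.0}},
]
labels, vectors = features_for_svm(sample_images, ["f1", "f2"])
```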

getOneFeature(f)[source]

Return an unsorted list of feature values for the given feature f which can be a Feature object or a key.

This appears to be exceedingly slow. TODO: It should be optimised to use a single query to the server.

class pysteg.sql.tables.TestImage(**kw)[source]

Bases: sqlobject.main.SQLObject

TestImage is a relational table marking a given Image as included in a Test or Training Set. It includes additional fields, where label is used for classification and response for regression. Clearly, these numbers could be derived from Image data on the fly, but because this depends on both the Image and ImageSet tables, that seems cumbersome, and it is preferable at this stage to hardcode them in the relational table.

The TestImage class is a decorator for the Image class, so all methods of Image are supported. See the Image class for details.

For any Image or TestImage object img, the call img() returns the appropriate Image object. This should be used polymorphically whenever the type is unknown and the Image (or Image ID) is required.
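
A minimal sketch of the decorator arrangement, assuming plain Python classes rather than SQLObject tables. SketchImage and LabelledImage are hypothetical names; the sketch only shows how a call such as img() can return the underlying image for both types, and how method calls can be delegated.

```python
class SketchImage:
    """Stand-in for the Image class."""

    def __init__(self, path):
        self.path = path

    def getPath(self):
        return self.path

    def __call__(self):
        # An image returns itself.
        return self


class LabelledImage:
    """Stand-in for TestImage: wraps an image and adds a label."""

    def __init__(self, image, label):
        self.image = image
        self.label = label

    def __call__(self):
        # The wrapper returns the wrapped image.
        return self.image

    def __getattr__(self, name):
        # Reached only when normal lookup fails: delegate to the image.
        return getattr(self.image, name)


img = SketchImage("/data/a.png")
wrapped = LabelledImage(img, label=1)
```

Code that needs the underlying image can then call obj() without checking which of the two types it received.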

copy(imageset)[source]

Copy this image into the TestSet imageset, with the same settings.

queue Module

This module defines the Queue class and the associated SQL table to maintain the job queue. All the necessary functionality is provided by methods.

class pysteg.sql.queue.Queue(**kw)[source]

Bases: sqlobject.main.SQLObject

Table to record pending jobs. Each entry concerns one image and one or more feature sets. A worker node should use a transaction to select one item where assigned is null, and then set this field with the current date and time before the transaction is released.

Three modes:

1. image set, svmodel=None: normal feature calculation
2. image=None, svmodel set, testset=None: SVM training
3. image=None, svmodel and testset set: SVM testing
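
The three modes can be told apart from which fields are set. The helper below is a hypothetical sketch of that decision; job_mode and the returned strings are not part of the package.

```python
def job_mode(image, svmodel, testset):
    """Classify a queue entry by which of its fields are set."""
    if image is not None and svmodel is None:
        return "extract"   # normal feature calculation
    if image is None and svmodel is not None:
        # With a test set the model is tested, otherwise it is trained.
        return "train" if testset is None else "test"
    raise ValueError("field combination matches none of the three modes")


modes = [
    job_mode("a.png", None, None),
    job_mode(None, "model-1", None),
    job_mode(None, "model-1", "testset-1"),
]
```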

addTo(fset)[source]

Add feature sets to the queue item.

classmethod addToImage(img, fset)[source]

Add a new image with one or more feature sets to the queue.

destroy(force=False)[source]

Delete the job. Unless force is True, an assigned job will not be deleted. Normally, releaseJob() is used to release and delete a processed job. This is not safe; a transaction should be used to lock the record while deleting.

classmethod getJob(worker=None, SVM=True, verbosity=0)[source]

Get a job from the queue. Transactions are used to make this safe under concurrent access.

If SVM is false, only feature extraction tasks will be accepted. This is useful if some compute nodes are used without access to the filesystem holding SVM model files.
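
The idea of claiming a job inside a transaction can be sketched with the standard sqlite3 module. This is not the package's implementation, which goes through SQLObject; the table layout, the column names other than assigned, and the function claim_job are assumptions made for the sketch.

```python
import sqlite3
from datetime import datetime

# isolation_level=None lets the sketch issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE queue ("
    "id INTEGER PRIMARY KEY, image TEXT, assigned TEXT, worker TEXT)"
)
conn.executemany(
    "INSERT INTO queue (image) VALUES (?)", [("a.png",), ("b.png",)]
)


def claim_job(conn, worker):
    """Claim one unassigned job; return its id, or None if none is left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id FROM queue WHERE assigned IS NULL "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            # Mark the job with the current date and time.
            conn.execute(
                "UPDATE queue SET assigned = ?, worker = ? WHERE id = ?",
                (datetime.now().isoformat(), worker, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


first = claim_job(conn, "node-1")
second = claim_job(conn, "node-2")
third = claim_job(conn, "node-3")
```

Because the select and the update happen under one lock, two workers cannot claim the same entry.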

releaseJob(worker=None, success=True)[source]

Notify the Queue that the job has been completed.

svmodel Module

scaling Module

This module defines a scaling model, to scale features prior to classification. It is used by the SVModel class, but is designed with loose coupling to facilitate reuse with other classification algorithms.

The ScaleModel class implements some of the interface of FeatureVector and can be used in lieu thereof when getting feature values from images.

The implementation is slow. Each feature value depends on three tables and three records are queried separately from Feature, FeatureValue, and Scaling. Combining the three in one view to be queried in one operation is expected to be faster.

This module will auto-connect to the database and must be loaded after options have been processed, to ensure correct connection. The reason for this is that it depends on views defined server side.
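
The single-query idea can be sketched with sqlite3: a view joins the three tables so that scaled values come back in one operation. All column names and the linear scaling formula (value minus a shift, times a factor) are assumptions made for this sketch; the actual schema and formula are not given in this documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feature (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE featurevalue (feature_id INTEGER, image_id INTEGER,
                               value REAL);
    CREATE TABLE scaling (feature_id INTEGER, shift REAL, factor REAL);

    INSERT INTO feature VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO featurevalue VALUES (1, 1, 3.0), (2, 1, 10.0);
    INSERT INTO scaling VALUES (1, 1.0, 0.5), (2, 2.0, 0.25);

    -- One view combining the three tables (hypothetical formula).
    CREATE VIEW scaledvalue AS
        SELECT fv.image_id AS image_id,
               f.id AS feature_id,
               f.name AS name,
               (fv.value - s.shift) * s.factor AS value
        FROM featurevalue fv
        JOIN feature f ON f.id = fv.feature_id
        JOIN scaling s ON s.feature_id = f.id;
    """
)

# A single query now returns the complete scaled vector for one image.
rows = conn.execute(
    "SELECT value FROM scaledvalue WHERE image_id = ? ORDER BY feature_id",
    (1,),
).fetchall()
scaled = [r[0] for r in rows]
```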

class pysteg.sql.scaling.ScaleModel(**kw)[source]

Bases: sqlobject.main.SQLObject

This is a complete scaling model, with scaling formulæ for each feature. It implements part of the interface of FeatureVector and can be passed to the getFeatures() methods of Image, ImageSet, and TestSet to return complete scaled feature vectors with canonical coordinate ordering.

destroy(values=False)[source]

Delete the model from the database.

getDim()[source]

Return the feature space dimensionality.

class pysteg.sql.scaling.ScaledFeatureValue(**kw)[source]

Bases: sqlobject.main.SQLObject

This is an attempt to start on an object to fetch multiple SQL records in one operation to save time. NOT COMPLETE.

getValue()[source]

Return the scaled feature value.

class pysteg.sql.scaling.Scaling(**kw)[source]

Bases: sqlobject.main.SQLObject

The Scaling object holds the model to scale a particular feature.

Data entry and Feature extraction

imageset Module

Functions to load image sets into the database and define test and training sets.

Module:pysteg.sql.imageset
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>

This is rather crude and it may be better to consult the scripts to see how the functions are used.

pysteg.sql.imageset.loadImages(fn)[source]

Define image sets based on a config file with the given filename fn.

pysteg.sql.imageset.similarTestSet(base, stego, name, incomplete=False)[source]

Create a new TestSet based on base, but using stego images from stego instead. The same random selection is used as in base. If images are missing from the new stego set, an exception will be raised unless the incomplete argument is set to True, in which case the missing images will just be omitted.

The current approach is not ideal. It is difficult to queue feature extraction tasks for the new images without requeueing old images as well. A new approach is needed.

pysteg.sql.imageset.makeTestSets(clean, stego, name, testname=None, testsize=None, trainsize=None, skew=0.5)[source]

Given two image sets for clean images and steganograms respectively, training and test sets are constructed randomly. It is assumed that both clean and stego contain corresponding images with the same basename, and if a clean image is included, the corresponding stego image is excluded, and vice versa.

TODO: Create intermediate subsets to make it easier to queue feature extraction of just the necessary images.
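
The exclusivity rule of makeTestSets() can be sketched by drawing each basename at most once and then deciding whether its clean or its stego version is used. The function below is hypothetical, and it assumes that skew is the fraction of stego images in each set, which this documentation does not state.

```python
import random


def make_test_sets(basenames, trainsize, testsize, skew=0.5, seed=0):
    """Return (train, test) lists of (basename, label) pairs.

    Label 1 means the stego version is used, label 0 the clean one.
    Each basename is drawn once, so the clean and stego versions of
    the same image never both occur.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(basenames), trainsize + testsize)

    def label(names):
        # Assumption: skew is the fraction of stego images in the set.
        nstego = int(skew * len(names))
        return [(bn, 1 if i < nstego else 0) for i, bn in enumerate(names)]

    return label(chosen[:trainsize]), label(chosen[trainsize:])


basenames = ["img%02d" % i for i in range(10)]
train, held_out = make_test_sets(basenames, trainsize=4, testsize=2)
```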
pysteg.sql.imageset.groupTestSet(set, name, feature, min=None, max=None, create=True, **kw)[source]

Return a new TestSet object with the given name, created by taking the images from set which satisfy min <= feature < max. If min or max is None, it poses no constraint.
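
The selection rule, with None meaning no constraint, can be sketched as follows. The function select_by_range and the (name, features) pairs are hypothetical stand-ins for the database objects.

```python
def select_by_range(images, feature, lo=None, hi=None):
    """Return names of images with lo <= feature value < hi."""
    selected = []
    for name, features in images:
        value = features[feature]
        if lo is not None and value < lo:
            continue  # below the lower bound
        if hi is not None and value >= hi:
            continue  # the upper bound is exclusive
        selected.append(name)
    return selected


images = [("a", {"size": 1.0}), ("b", {"size": 2.0}), ("c", {"size": 3.0})]
upper_part = select_by_range(images, "size", lo=2.0)
lower_part = select_by_range(images, "size", hi=2.0)
middle = select_by_range(images, "size", lo=1.0, hi=3.0)
everything = select_by_range(images, "size")
```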

pysteg.sql.imageset.dummyTestSet(name, L)[source]

Create a dummy TestSet by combining all images from every image set in L. All the test images are given the label 1. This is mainly intended to form a set of images for which classification scores can be calculated in bulk, and not as a test or training set as such. The elements of L may be any iterable over images, including TestSet or ImageSet objects.

features Module

This module provides functions to define new features, feature vectors, and feature sets, including feature level fusion. The functions fsconfig() and fvconfig() read definitions from a config file and enter them into the database.

pysteg.sql.features.fsconfig(fn)[source]

Define feature sets based on a config file with the given filename fn.

pysteg.sql.features.fvconfig(fn=None, cfg=None, fvlist=None)[source]

Define feature vectors based on a config file with the given filename fn.

extract Module

Analysis and reporting

stats Module

Module for statistical analysis and comparison of features.

Module:pysteg.sql.stats
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
pysteg.sql.stats.ccount(L, fl=['1', '2'])[source]
pysteg.sql.stats.compareClassifiers(imgset, fl)[source]
pysteg.sql.stats.corrcoef(imgset, features)[source]

Returns the correlation coefficient matrix of the given features, calculated from the images in imgset. The features argument can be a list of Feature objects or feature keys. The imgset object can be a list of Image objects, an ImageSet object, or a TestSet object.

pysteg.sql.stats.deltaMoments(imgset, feature, label=None)[source]

Consider the difference in the given feature between a steganogram and its corresponding cover image. Return the first four statistical moments (mean, variance, skewness, and kurtosis) of this difference in the given image set (imgset).

If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.

Images which do not have a source (cover) image recorded in the database will be tacitly ignored.

pysteg.sql.stats.featureMedian(imgset, feature, label=None)[source]

Return the median of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.

pysteg.sql.stats.featureMoments(imgset, feature, label=None)[source]

Return the first four statistical moments (mean, variance, skewness, and kurtosis) of the given feature in the given image set imgset.

If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
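
For reference, the four moments can be computed from a plain list of feature values as sketched below. The sketch assumes the population variance (division by n) and the non-excess kurtosis; the package may use other conventions, and the function name moments is hypothetical.

```python
def moments(xs):
    """Return (mean, variance, skewness, kurtosis) of a list of floats."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess: a normal distribution gives 3
    return mean, m2, skewness, kurtosis


mean, variance, skewness, kurtosis = moments([1.0, 2.0, 3.0, 4.0, 5.0])
```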

pysteg.sql.stats.featurePerc(imgset, feature, bins=10, label=None)[source]

Return the percentile points of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.

pysteg.sql.stats.reclass(v)[source]

Translate +/-1 labels to 0/1.

pysteg.sql.stats.scatterPlot(imgset, f1, f2, outfile=None)[source]

Plot two features against each other in the form of a scatter plot. The first argument is a TestSet object using the class labels 0 and 1, where 0 is plotted red and 1 is plotted blue. The second and third arguments are features, given as Feature objects or as keys. If the optional outfile is given, the plot is written to the given file.

latex Module

tools Module

coverselect Module

The main feature of this module is the cStat() function which plots bar charts of accuracy and/or FP/FN rates for subgroups of the test set divided according to some given feature. The charts make a basis for assessing the feature as a cover selection heuristic.

There is also an under-documented iStat() function which is used to check cover selections created as an intersection of two or more existing selections.

The other functions are auxiliaries, but may be useful for variations on the theme.

pysteg.sql.coverselect.cStat(testset, feature, score, bins=10, reverse=False, aplot=None, eplot=None)[source]

Make bar charts of accuracy and error rates of the classification score score for different groups of covers. The covers are divided into bins bins according to the cover heuristics feature. Error rates are plotted on the file eplot and accuracies on aplot.
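
The grouping step behind such a chart can be sketched without plotting. The sketch assumes bins of equal size in terms of image count, ordered by the feature value; whether cStat() uses equal-count or equal-width bins is not stated here. The function accuracy_by_bin and the record layout are hypothetical.

```python
def accuracy_by_bin(records, bins):
    """Return one accuracy per bin.

    records is a list of (feature value, true label, predicted label).
    Images are sorted by feature value and cut into equal-count bins.
    """
    ordered = sorted(records, key=lambda r: r[0])
    n = len(ordered)
    result = []
    for b in range(bins):
        group = ordered[b * n // bins:(b + 1) * n // bins]
        correct = sum(1 for _, label, pred in group if label == pred)
        # An empty bin has no defined accuracy.
        result.append(correct / len(group) if group else None)
    return result


records = [(0.9, 0, 0), (0.1, 0, 0), (0.8, 1, 1), (0.2, 1, 0)]
accuracies = accuracy_by_bin(records, bins=2)
```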

pysteg.sql.coverselect.iStat(*a, **kw)[source]

Test intersection of multiple cover selections.

pysteg.sql.coverselect.mcStat(testset, feature, score, bins=10, reverse=False, aplot='/tmp/test.pdf')[source]

Make a bar chart of accuracies for different groups of covers. The covers are divided into bins bins according to the cover heuristics feature. The accuracy is plotted for each of the classifier scores in the list score. The plot is saved in the file aplot.

stegocompare Module

pysteg.sql.stegocompare.cbar(L, T, score, feature, outfile=None)[source]
pysteg.sql.stegocompare.ccount(L, bn, score, feature=None, verbosity=1)[source]

Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.

pysteg.sql.stegocompare.ccount2d(L1, L2, bn, s1, s2, verbosity=1)[source]

Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.

pysteg.sql.stegocompare.cdata2d(L1, L2, T, s1, s2)[source]
pysteg.sql.stegocompare.chist(L, T, score, outfile=None)[source]
pysteg.sql.stegocompare.cscatter(L, T, score, feature, outfile=None)[source]
pysteg.sql.stegocompare.cstat(L, T, score, feature, outfile=None)[source]

errors Module

Error profiling for steganalysers. Very experimental and undocumented.

Module:pysteg.sql.errors
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
class pysteg.sql.errors.ErrorProfiler(L)[source]

Bases: object

cat(k1, k2)[source]
pairProfile(k1, k2)[source]
pie(outfile, k1, k2)[source]
class pysteg.sql.errors.Img(im)[source]

Bases: dict

eType(key)[source]
loadCoverFeatures(L)[source]
loadFeatures(L)[source]
class pysteg.sql.errors.ImgList(imgset=None)[source]

Bases: list

This class represents a list of images with feature values downloaded from the SQL server and managed in local memory.

bar3d(outfile=None, **kw)[source]
erates(key)[source]
get(key, score=None, ecat=None)[source]
getBars(k1, k2, key=None, bins=5)[source]
histogram(key, bins=5)[source]

Make a histogram of the feature values given by feature key.

histogram2d(k1, k2, bins=5, score=None, ecat=None)[source]

Make a 2D histogram of the feature values given by the two features k1 and k2.

loadCoverFeatures(L)[source]
loadFeatures(L)[source]
pysteg.sql.errors.getFeatures(imgset, L, f1, f2)[source]
pysteg.sql.errors.scatterPlot(outfile, imgset, L, f1, f2)[source]

Auxiliary modules

aux Module

Auxiliary functions. Used internally in the package; not intended for export.

pysteg.sql.aux.getFeatureObject(f)[source]
pysteg.sql.aux.isDuplicateError(e)[source]
pysteg.sql.aux.matrix2dict(M, centre=False)[source]

Return a list of (index,value) pairs where value is an entry in the matrix M and index its index. If centre is True, the indices are offset to be centred at 0.
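
The indexing can be sketched for a matrix given as nested lists. The sketch assumes that centring subtracts half the size along each axis (integer division); the package works on numpy arrays and may centre differently. The name matrix_to_pairs is hypothetical.

```python
def matrix_to_pairs(M, centre=False):
    """Return a list of ((row, column), value) pairs for a 2D matrix."""
    rows, cols = len(M), len(M[0])
    # With centre=True the indices are shifted to be centred at 0.
    r0 = rows // 2 if centre else 0
    c0 = cols // 2 if centre else 0
    return [
        ((i - r0, j - c0), M[i][j])
        for i in range(rows)
        for j in range(cols)
    ]


M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
plain = matrix_to_pairs(M)
centred = matrix_to_pairs(M, centre=True)
```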

pysteg.sql.aux.tailType(obj)[source]

Return the class name of an object, stripping any prefixing package names. This is used to recognise exceptions returned from different database backends. The exception names have been standardised (DataError, IntegrityError, etc.), but each backend has its own definition.
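
A sketch of the idea using the standard sqlite3 backend: the qualified class name of a raised exception is reduced to its last component, which can then be compared with the standardised names. The function tail_type below is a hypothetical reimplementation, not the package's code.

```python
import sqlite3


def tail_type(obj):
    """Return the class name of obj without any package prefix."""
    cls = type(obj)
    qualified = cls.__module__ + "." + cls.__name__
    return qualified.split(".")[-1]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feature (name TEXT PRIMARY KEY)")
conn.execute("INSERT INTO feature VALUES ('f1')")

try:
    # Inserting the same primary key twice violates integrity.
    conn.execute("INSERT INTO feature VALUES ('f1')")
    error_name = None
except Exception as e:
    error_name = tail_type(e)
```

Comparing names rather than classes avoids importing every backend's exception module.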

setup Module

config Module

This module defines the cp class which is used to manage global configuration. It should not be imported directly; instead, an instance, config, is exposed by the pysteg.sql package. The cp class decorates the OptionParser and should be used to parse options in scripts. Some command line options are defined to override the config file.

Module:pysteg.sql.config
Author:© 2012: Hans Georg Schaathun <georg@schaathun.net>
class pysteg.sql.config.cp[source]

Bases: ConfigParser.SafeConfigParser

This class represents a configuration and supports option parsing both from config files and from command line options. The former is implemented by inheriting SafeConfigParser and the latter by an instance of OptionParser.

add_option(*a, **kw)[source]

Define a command line option. This is passed to the OptionParser class.

getVerbosity(section='DEFAULT')[source]

Get the verbosity level as an integer.

parse_args(*a, **kw)[source]

Parse command line arguments. This is first passed to the OptionParser class before known options are interpreted and used to override the config files.
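
The combination of config file and command line parsing can be sketched in Python 3 as below. The original class inherits the Python 2 SafeConfigParser; this sketch uses configparser.ConfigParser instead. The class name SketchConfig, the -v/--verbosity option, and the verbosity key are assumptions made for illustration.

```python
from configparser import ConfigParser
from optparse import OptionParser


class SketchConfig(ConfigParser):
    """Config file values that command line options can override."""

    def __init__(self):
        super().__init__()
        self.optparser = OptionParser()
        # Hypothetical option; default None means "not given".
        self.optparser.add_option(
            "-v", "--verbosity", dest="verbosity", type="int", default=None
        )

    def add_option(self, *a, **kw):
        # Pass option definitions through to the OptionParser.
        return self.optparser.add_option(*a, **kw)

    def parse_args(self, *a, **kw):
        opts, args = self.optparser.parse_args(*a, **kw)
        # Known options override values read from the config file.
        if opts.verbosity is not None:
            self.set("DEFAULT", "verbosity", str(opts.verbosity))
        return opts, args

    def getVerbosity(self, section="DEFAULT"):
        return self.getint(section, "verbosity")


config = SketchConfig()
config.read_string("[DEFAULT]\nverbosity = 1\n")
before = config.getVerbosity()
config.parse_args(["-v", "3"])
after = config.getVerbosity()
```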