This package provides an interface to an SQL database to store image features for steganalysis.
Module: | pysteg.sql |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
The tables represented by SQLObject classes are visible directly in the package. The submodules provide the following functionality:
setup: | functions to create tables and enter standard feature definitions |
---|---|
imageset: | enter images in the db and create test and training sets |
features: | establish new feature vectors |
extract: | extract features from images and enter them in the db |
stats: | statistical analysis of features in the database |
svmodel: | managing SVM classifiers using features from the db |
tools: | reporting, feature dumps, and other output |
latex: | LaTeX formatted output |
coverselect: | cover selection |
errors: | error analysis for steganalysers |
stegocompare: | performance analysis for steganalysers |
exceptions: | exceptions and errors |
A couple of private modules are important, but their contents are exposed via the main package:
tables: | the core database tables |
---|---|
_queue: | the class and SQL table for the job queue |
_scaling: | scaling models for use with learning classifiers |
Connect to the database, using connection data from the config object.
Look up an image set by its key. The ImageSet table is tried first, and the TestSet table if that fails. If the argument is itself an SQLObject, this is returned as is. It should be used by any function intended to take a polymorphic image set argument.
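The lookup order can be illustrated with a minimal sketch. Plain dictionaries stand in for the ImageSet and TestSet tables, and the names `Record`, `IMAGESETS`, `TESTSETS` and `lookup_imageset` are hypothetical; the real function queries the database through SQLObject.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Record:
    """Hypothetical stand-in for an SQLObject row (ImageSet or TestSet)."""
    table: str
    key: str


# Hypothetical in-memory "tables", indexed by the human-readable key.
IMAGESETS: Dict[str, Record] = {"raw": Record("ImageSet", "raw")}
TESTSETS: Dict[str, Record] = {"raw_train": Record("TestSet", "raw_train")}


def lookup_imageset(arg: Union[str, Record]) -> Record:
    """Resolve a polymorphic image-set argument to a record."""
    if isinstance(arg, Record):
        # Already an object: return it as is.
        return arg
    # Try the ImageSet table first, then fall back to TestSet.
    if arg in IMAGESETS:
        return IMAGESETS[arg]
    if arg in TESTSETS:
        return TESTSETS[arg]
    raise KeyError(f"no ImageSet or TestSet with key {arg!r}")
```

With these sample tables, `lookup_imageset("raw_train").table` is `"TestSet"`, and passing a `Record` returns that same object.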
Bases: sqlobject.main.SQLObject
Table to record pending jobs. Each entry concerns one image and one or more feature sets. A worker node should use a transaction to select one item where assigned is null, and then set this field with the current date and time before the transaction is released.
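The claim-in-a-transaction pattern can be sketched with SQLite from the standard library. This is only an assumption-laden illustration: the table layout is a hypothetical minimum, and `claim_job` is not the package's API. On a client-server backend such as PostgreSQL the same idea is usually expressed with `SELECT ... FOR UPDATE`, which locks the selected row until the transaction ends.

```python
import sqlite3
from datetime import datetime
from typing import Optional


def claim_job(conn: sqlite3.Connection) -> Optional[int]:
    """Atomically pick one unassigned job and mark it as assigned.

    Returns the job id, or None if there is no unassigned job.
    """
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can claim the same row between our SELECT and UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM queue WHERE assigned IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE queue SET assigned = ? WHERE id = ?",
            (datetime.now().isoformat(), row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# Demo: isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, image_id INTEGER, assigned TEXT)"
)
conn.executemany("INSERT INTO queue (image_id) VALUES (?)", [(10,), (11,)])
first = claim_job(conn)
second = claim_job(conn)
third = claim_job(conn)
```

Here `first` and `second` are the two job ids, and `third` is `None` because the queue has no unassigned jobs left.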
Three modes are supported:
1. image set, svmodel=None: normal feature calculation;
2. image=None, svmodel set, testset=None: SVM training;
3. image=None, svmodel and testset set: SVM testing.
Add a list (or any other iterator) of feature sets to the queue item. The elements must be FeatureSet objects (neither IDs nor keys are acceptable).
Add feature sets to the queue item.
Add a new image with one or more feature sets to the queue.
Parameters : | img : an Image or TestImage object; fset : an iterator of FeatureSet objects |
---|---|
Delete the job. Unless force is True, an assigned job will not be deleted. Normally, releaseJob() is used to release and delete a processed job. This method is not concurrency-safe; a transaction should be used to lock the record while deleting.
Get a job from the queue. Transactions are used to make this safe under concurrent access.
If SVM is false, only feature extraction tasks will be accepted. This is useful if some compute nodes are used without access to the filesystem holding SVM model files.
Notify the Queue that the job has been completed.
Bases: sqlobject.main.sqlmeta
Bases: sqlobject.main.SQLObject
An Image is an Image Object to be analysed. It may be an identical copy of a Source Image, or it may be a modified version obtained by stego embedding, compression, downsampling, etc.
Add feature values from a numpy array M. The given key is the prefix, to which indices are appended. If symindex is True, the indices are symmetric around 0, otherwise they range from 0 upwards.
Add feature values for the image. The features are given as a dictionary with keys as used in the database and a floating point value. (Not tested!)
Add feature values from a list vals. The keys of the features should be given in a list names.
Look up an image by its path name.
Compare this image with its cover or source image with respect to the given feature. The return value is the difference between the feature values. None is returned if the image does not have a known source image.
Return an iterator of FeatureValue objects defined by the given key. If key is None, all features are included.
Return the basename of the file, stripping any extension off.
Obtain the given feature value recursively from the source image.
Return a feature vector as a list of floating point values.
Return the given feature value.
Return the full path name for the image.
Return the source image, or self if no source is defined.
Bases: sqlobject.main.sqlmeta
Bases: sqlobject.main.SQLObject
A feature is a function of an image. The database table stores a unique key (ID) and a description.
Add a calculated feature value, given the image and its value.
Delete the feature including all calculated feature values.
Bases: sqlobject.main.sqlmeta
Bases: sqlobject.main.SQLObject
A Feature Value is a Feature calculated for a particular Image. The database table stores references to the Feature and Image as foreign keys (one-to-one), and a floating point value.
Return the ID of the feature. The ID is currently an integer, and one can assume that it is comparable. It can be used to give a canonical ordering of features. It is provided as a method for compatibility with decorator patterns and other objects mimicking the interface.
Accessor for the value field.
Bases: sqlobject.main.sqlmeta
alias of FeatureValue
Bases: sqlobject.main.SQLObject
A Feature Set is a collection of Features with a common description. Fields to be set in the constructor:
Key : | human-readable, unique key |
---|---|
Description : | longer description of the features |
Func : | python function to extract the feature. The function is stored as a string and interpreted using eval(). |
Jpeg(bool) : | flag to indicate that the extraction function takes a jpeg object instead of a pixmap matrix. |
Matrix(bool) : | flag to indicate a feature set represented by a matrix. If set, the addFeatureMatrix() method applies. |
Symidx(bool) : | (assumes matrix) flag to indicate that individual elements should be indexed symmetrically around 0. |
Relational fields:
Features (SelectResult): | the included features |
---|---|
Queues (SelectResult): | queue jobs asking to extract the feature set |
Return the number of features in the set.
Delete the object including constituent features and feature values.
Delete the object with the given key.
Bases: sqlobject.main.sqlmeta
alias of FeatureSet
Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.
Bases: sqlobject.main.SQLObject
A Feature Vector is a vector where each element is a Feature. The database table stores Feature Vectors, which form the basis for classifiers. Where Feature Sets contain Features with common descriptions, Feature Vectors contain Features which are used together.
Return the dimensionality of the feature vector.
Delete the object including corresponding objects in the relation table VectorFeature.
Delete the object with the given key.
Bases: sqlobject.main.sqlmeta
alias of FeatureVector
Return an SQLResult of FeatureValue objects. If image is given, the result is filtered to include just the given image.
Bases: sqlobject.main.SQLObject
An Image Set is a collection of images from the same source which have been subject to similar processing. It may be an original image base, or a collection of Images processed from an image base.
Delete the object, including constituent images.
Delete the object with the given key.
Look up an image by its base filename (excluding extension).
Get the full path to the image set directory.
Bases: sqlobject.main.sqlmeta
Bases: sqlobject.main.SQLObject
A TestSet is a collection of images used for training or testing of a classifier.
Return the number of images in the set.
Delete the object, including dependent SVMPerformance objects and TestImage objects.
Return an iterator of TestImage objects restricted to the given class.
Return a pair (l,v) where l is a list of labels and v is a list of feature vectors for the individual images. This is designed to be compatible with libSVM.
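A minimal sketch of building such a pair, assuming each test image carries a label and a mapping from feature key to value; `TestImageRecord` and `get_labelled_vectors` are hypothetical names. The list-of-labels plus list-of-vectors shape is what libsvm's Python interface accepts, for example `svm_problem(y, x)`.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class TestImageRecord:
    """Hypothetical stand-in for a TestImage: a class label plus feature values."""
    label: int
    features: Dict[str, float]


def get_labelled_vectors(
    images: Sequence[TestImageRecord], fv_keys: Sequence[str]
) -> Tuple[List[int], List[List[float]]]:
    """Return (labels, vectors) with vector coordinates in fv_keys order."""
    labels: List[int] = []
    vectors: List[List[float]] = []
    for img in images:
        labels.append(img.label)
        # A fixed coordinate order makes every vector line up with fv_keys.
        vectors.append([img.features[k] for k in fv_keys])
    return labels, vectors


imgs = [
    TestImageRecord(0, {"f1": 0.5, "f2": 1.0}),
    TestImageRecord(1, {"f2": 3.0, "f1": 2.5}),
]
l, v = get_labelled_vectors(imgs, ["f1", "f2"])
```

The second image stores its features in a different order, but its vector still comes out as `[2.5, 3.0]` because the coordinates follow `fv_keys`.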
Return an unsorted list of feature values for the given feature f which can be a Feature object or a key.
This appears to be exceedingly slow. TODO: It should be optimised to use a single query to the server.
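The single-query approach suggested in the TODO could look roughly like the following. This is a sketch against a hypothetical SQLite schema (the table and column names are invented for illustration, not taken from the package): one JOIN fetches every value of a feature for a test set, instead of one query per image.

```python
import sqlite3
from typing import List


def feature_values_for_testset(
    conn: sqlite3.Connection, testset_id: int, feature_key: str
) -> List[float]:
    """Fetch all values of one feature for one test set in a single query."""
    rows = conn.execute(
        """
        SELECT fv.value
        FROM test_image AS ti
        JOIN feature_value AS fv ON fv.image_id = ti.image_id
        JOIN feature AS f ON f.id = fv.feature_id
        WHERE ti.testset_id = ? AND f.key = ?
        """,
        (testset_id, feature_key),
    ).fetchall()
    # Like the original, the result is unsorted.
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feature (id INTEGER PRIMARY KEY, key TEXT UNIQUE);
    CREATE TABLE test_image (testset_id INTEGER, image_id INTEGER, label INTEGER);
    CREATE TABLE feature_value (feature_id INTEGER, image_id INTEGER, value REAL);
    INSERT INTO feature VALUES (1, 'mean'), (2, 'var');
    INSERT INTO test_image VALUES (7, 100, 0), (7, 101, 1), (8, 102, 0);
    INSERT INTO feature_value VALUES
        (1, 100, 0.25), (1, 101, 0.75), (1, 102, 9.0), (2, 100, 1.5);
    """
)
vals = sorted(feature_values_for_testset(conn, 7, "mean"))
```

Only images 100 and 101 belong to test set 7, so the value for image 102 and the values of the other feature are excluded.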
Bases: sqlobject.main.sqlmeta
Bases: sqlobject.main.SQLObject
TestImage is a relational table marking a given Image as included in a Test or Training Set. It includes additional fields, where label is used for classification and response for regression. Clearly, these numbers could be derived from Image data on the fly, but because that would depend on both the Image and ImageSet tables it seems cumbersome, and it is preferable at this stage to hardcode them in the relational table.
The TestImage class is a decorator for the Image class, so all methods of Image are supported. See the Image class for details.
For any Image or TestImage object img, the call img() returns the appropriate Image object. This should be used polymorphically whenever the type is unknown and the Image (or Image ID) is required.
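The decorator and the `img()` convention can be sketched in plain Python. The classes below are hypothetical minimal stand-ins, not the SQLObject classes, and `path_name()` is an invented accessor. `__getattr__` forwards any attribute that the wrapper does not define, and `__call__` unwraps to the underlying Image.

```python
class Image:
    """Hypothetical minimal Image: stores only a path."""

    def __init__(self, path: str) -> None:
        self.path = path

    def path_name(self) -> str:
        # Hypothetical accessor standing in for the real path method.
        return self.path

    def __call__(self) -> "Image":
        # A plain Image is already the underlying Image object.
        return self


class TestImage:
    """Hypothetical decorator: adds a class label, delegates the rest to Image."""

    def __init__(self, image: Image, label: int) -> None:
        self._image = image
        self.label = label

    def __getattr__(self, name: str):
        # Only called when normal lookup fails, so label and _image are
        # found directly and everything else is forwarded to the Image.
        return getattr(self._image, name)

    def __call__(self) -> Image:
        # Unwrap: return the decorated Image object.
        return self._image


img = Image("/data/img001.pgm")
timg = TestImage(img, label=1)
```

With these objects, `timg.path_name()` returns the Image's path, and both `timg()` and `img()` return `img`. This lets code that receives either type call `x()` to obtain the Image.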
Copy this image into the TestSet imageset, with the same settings.
Bases: sqlobject.main.sqlmeta
This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.
Module: | pysteg.sql.svmodel |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: sqlobject.main.SQLObject
The classifier is defined by the training set (1), feature vector (2), and a scaling strategy which is used to calculate (3). The model (4) is derived from the first three. The feature (5) is a hook for the classifier output to use it as input feature in a fused classifier.
Unfortunately, the libsvm classifier model relies on the ctypes library and cannot be pickled. It has to be stored on file and not in the database.
Classify the given images img, which may be an Image, a TestSet, or a list of Image objects. The return value is a list of (image,score) pairs where score is the soft information classification heuristic.
In addition to returning the scores, they are also entered in the database as feature values.
Delete the model from the database. Note that any external model file is not removed.
Get the canonical SVMPerformance objects for this model. If training is True, then the performance on the training set is returned.
Get the scaling model from the database. The return value is a pair (factor,addterm) of lists. To get a scaled feature vector, subtract the addterm from the feature vector and then multiply the result by the factor.
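The following sketch shows how a (factor, addterm) pair is applied. The text does not say how the pair is computed. The `fit_minmax` helper below assumes a min-max mapping of each training column onto [-1, 1], similar to libsvm's svm-scale tool; that choice and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_minmax(
    train: Sequence[Sequence[float]], lo: float = -1.0, hi: float = 1.0
) -> Tuple[List[float], List[float]]:
    """Hypothetical fit: choose (factor, addterm) so that (x - addterm) * factor
    maps each training column's [min, max] onto [lo, hi]."""
    factor: List[float] = []
    addterm: List[float] = []
    for col in zip(*train):
        cmin, cmax = min(col), max(col)
        if cmax == cmin:
            # Constant feature: every value is mapped to 0.
            factor.append(0.0)
            addterm.append(cmin)
        else:
            f = (hi - lo) / (cmax - cmin)
            factor.append(f)
            # Chosen so that (x - addterm) * f == lo + (x - cmin) * f.
            addterm.append(cmin - lo / f)
    return factor, addterm


def apply_scaling(
    x: Sequence[float], factor: Sequence[float], addterm: Sequence[float]
) -> List[float]:
    """Subtract addterm, then multiply by factor, coordinate by coordinate."""
    return [(xi - a) * f for xi, f, a in zip(x, factor, addterm)]


factor, addterm = fit_minmax([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
scaled = [apply_scaling(x, factor, addterm) for x in ([0.0, 10.0], [4.0, 30.0])]
```

The column minima map to -1 and the maxima to 1, so `scaled` is approximately `[[-1.0, -1.0], [1.0, 1.0]]`.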
Get the canonical test set for this SVM model. The canonical test set is found by taking the name of the training set and appending the string “_test”.
Load the model from file. The filename is stored in the database.
TODO: add support for the pickle column.
Create a new instance of an SVModel using the given classifier algorithm. Every configuration parameter of the classifier can be passed as a keyword argument, and so can key settings for the SVModel object.
Bases: sqlobject.main.SQLObject
Class to record performance statistics for a particular SVM model on a particular test set.
Bases: sqlobject.main.SQLObject
This is a complete scaling model, with scaling formulæ for each feature. It implements part of the interface of FeatureVector and can be passed to the getFeatures() methods of Image, ImageSet, and TestSet to return complete scaled feature vectors with canonical coordinate ordering.
Delete the model from the database.
Return the feature space dimensionality.
Create a new SVModel object and necessary related records.
Add a new SVModel for the given trainingset and every feature vector in fvlist. Queue all the new models for training. If a model with the same parameters already exists, no new model is created.
Queue a performance test of every SVModel object in the database.
If key is given, every model is tested on the corresponding TestSet object. Otherwise, it is tested on the training set if the training argument is set to True, or on the canonical test set if it is set to False (default).
Functions to create the database tables.
This module is only needed when the database is initialised and should otherwise be ignored.
Exceptions for the pysteg.sql package
Module: | pysteg.sql.exceptions |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: exceptions.Exception
Error in the configuration file.
Load image sets into the database and define test and training sets.
Module: | pysteg.sql.imageset |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
This is rather crude and it may be better to consult the scripts to see how the functions are used.
Define image sets based on a config file with the given filename fn.
Create a new TestSet based on base, but using stego images from stego instead. The same random selection is used as in base. If images are missing from the new stego set, an exception will be raised unless the incomplete argument is set to True, in which case the missing images will simply be omitted.
The current approach is not ideal. It is difficult to queue feature extraction tasks for the new images without requeueing old images as well. A new approach is needed.
Given two image sets for clean images and steganograms respectively, training and test sets are constructed randomly. It is assumed that clean and stego contain corresponding images with the same basename; if a clean image is included, the corresponding stego image is excluded, and vice versa.
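The exclusion rule can be illustrated with a sketch that works on basenames only. The text does not specify set sizes or class balance. The hypothetical `paired_split` below assumes that labels alternate over a shuffled list, so each set is balanced. Label 0 selects the clean image and label 1 the stego image with that basename, so no basename is used twice.

```python
import random
from typing import List, Sequence, Tuple


def paired_split(
    basenames: Sequence[str], n_train: int, seed: int = 0
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split basenames into training and test lists of (basename, label).

    Each basename appears exactly once, either as clean (label 0) or as
    stego (label 1), never both.
    """
    rng = random.Random(seed)
    names = list(basenames)
    rng.shuffle(names)
    # Alternating labels over the shuffled order keep both sets balanced
    # while the choice of clean or stego per basename stays random.
    labelled = [(bn, i % 2) for i, bn in enumerate(names)]
    return labelled[:n_train], labelled[n_train:]


all_names = [f"img{i:03d}" for i in range(10)]
train, test = paired_split(all_names, n_train=6, seed=1)
```

Each (basename, label) pair would then be resolved to the image with that basename in clean or stego, depending on the label.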
Return a new TestSet object with the given name, created by taking the images from set which satisfy min <= feature < max. If min or max is None, it poses no constraint.
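A minimal sketch of the half-open selection rule, assuming the feature values are already available as a dictionary from image key to value. The function name is hypothetical. The bounds are called `lo` and `hi` here so that Python's built-in `min` and `max` are not shadowed.

```python
from typing import Dict, List, Optional


def select_by_feature(
    values: Dict[str, float], lo: Optional[float] = None, hi: Optional[float] = None
) -> List[str]:
    """Return the image keys whose feature value v satisfies lo <= v < hi.

    A bound of None imposes no constraint on that side.
    """
    return [
        img
        for img, v in values.items()
        if (lo is None or lo <= v) and (hi is None or v < hi)
    ]


values = {"a": 0.1, "b": 0.5, "c": 0.9}
```

For example, `select_by_feature(values, 0.1, 0.9)` keeps `"a"` and `"b"` but not `"c"`, because the upper bound is exclusive.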
Create a dummy TestSet by combining all images from every image set in L. All the test images are given the label 1. This is mainly intended to form a set of images for which classification scores can be calculated in bulk, and not as a test or training set as such. The elements of L may be any iterable over images, including TestSet or ImageSet objects.
Define new feature sets and feature vectors.
Module: | pysteg.sql.features |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
This module provides functions to define new features, feature vectors, and feature sets, including feature-level fusion. The functions fsconfig() and fvconfig() read definitions from a config file and enter them into the database.
Methods to load images, extract features, and enter them into the database. This module provides the glue between the SQL object library and the features module.
Module: | pysteg.sql.extract |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Only two functions are exported: one to enter tasks in the queue and one to process the queue.
Module for statistical analysis and comparison of features.
Module: | pysteg.sql.stats |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Returns the correlation coefficient matrix of the given features, calculated from the images in imgset. The features argument can be a list of Feature objects or feature keys. The imgset object can be a list of Image objects, an ImageSet object, or a TestSet object.
Consider the difference in the given feature between a steganogram and its corresponding cover image. Return the first four statistical moments (mean, variance, skewness and kurtosis) of this difference in the given image set (imgset).
If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Images which do not have a source (cover) image recorded in the database will be tacitly ignored.
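A sketch of the computation follows, assuming the values are already available as (stego value, cover value) pairs. A cover value of `None` marks an image with no recorded source, and such pairs are skipped. The text does not say which estimators are used. This version uses population (1/n) central moments and plain (non-excess) kurtosis. Note that `scipy.stats.kurtosis` reports excess kurtosis by default. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def moments(xs: List[float]) -> Tuple[float, float, float, float]:
    """Mean, variance, skewness and kurtosis from population central moments."""
    n = len(xs)
    if n == 0:
        raise ValueError("no data")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    if m2 == 0:
        # Skewness and kurtosis are undefined for constant data.
        return mean, 0.0, float("nan"), float("nan")
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2


def difference_moments(
    pairs: Iterable[Tuple[float, Optional[float]]]
) -> Tuple[float, float, float, float]:
    """Moments of stego minus cover, tacitly skipping images without a cover."""
    diffs = [s - c for s, c in pairs if c is not None]
    return moments(diffs)


stats = difference_moments([(2, 1), (4, 2), (6, 3), (8, 4), (10, 5), (99, None)])
```

The last pair has no cover and is ignored. The differences 1 to 5 give mean 3, variance 2, skewness 0 and kurtosis 1.7.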
Return the median of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Return the first four statistical moments (mean, variance, skewness, and kurtosis) of the given feature in the given image set imgset.
If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Return the percentile points of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Plot two features against each other in the form of a scatter plot. The first argument is a TestSet object using the class labels 0 and 1, where 0 is plotted red and 1 is plotted blue. The second and third arguments are features, given as Feature objects or as keys. If the optional outfile is given, the plot is written to the given file.
Selection of functions to create reports in LaTeX format.
Reporting, feature dumps, and other output.
Module: | pysteg.sql.tools |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Selection of functions to produce reports and other output from the feature database.
Print a report for each image set showing which features have been calculated and recorded in the database. This is a slow operation.
Save scaled features in a text file, taking features from the imageset imgset and SVM model model. The output filename is given by fn. The output is libSVM’s sparse format by default, but specifying the keyword argument libsvm=False gives comma separated values instead.
TODO: fix this - avoid call to non-existent function scaling.applyScale.
Save features in a text file, taking features from the imageset imgset and feature vector fv. The output filename is given by fn. The output is libSVM’s sparse format by default, but specifying the keyword argument libsvm=False gives comma separated values instead.
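The two output formats can be sketched as follows. The original writes to a file named by fn; for brevity, this version writes to any text stream. The function name and the choice of label-first CSV columns are assumptions. In libSVM's sparse format, each line is the label followed by 1-based `index:value` pairs, and zero entries may be omitted.

```python
import csv
import io
from typing import Sequence, TextIO


def write_features(
    out: TextIO,
    labels: Sequence[int],
    vectors: Sequence[Sequence[float]],
    libsvm: bool = True,
) -> None:
    if libsvm:
        for y, x in zip(labels, vectors):
            # Sparse libSVM line: "label index:value ..." with 1-based
            # indices; zero entries are left out, repr keeps full precision.
            pairs = " ".join(
                f"{i}:{float(v)!r}" for i, v in enumerate(x, start=1) if v != 0
            )
            out.write(f"{y} {pairs}".rstrip() + "\n")
    else:
        # Comma separated values: label first, then every coordinate.
        writer = csv.writer(out, lineterminator="\n")
        for y, x in zip(labels, vectors):
            writer.writerow([y, *x])


data = [[0.5, 0.0, 2.0], [0.0, 0.0, 0.0]]
buf = io.StringIO()
write_features(buf, [0, 1], data)
sparse_text = buf.getvalue()
buf = io.StringIO()
write_features(buf, [0, 1], data, libsvm=False)
csv_text = buf.getvalue()
```

An all-zero vector becomes a line with just its label, which is still a valid sparse instance.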
Classification error analysis for different cover selection groups.
Module: | pysteg.sql.coverselect |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
The main feature of this module is the cStat() function which plots bar charts of accuracy and/or FP/FN rates for subgroups of the test set divided according to some given feature. The charts make a basis for assessing the feature as a cover selection heuristic.
The mcStat() function is similar to cStat() but allows a joint plot for multiple classification scores.
There is also an under-documented iStat() function which is used to check cover selections created as an intersection of two or more existing selections.
The other functions are auxiliaries, but may be useful for variations on the theme.
Make bar charts of accuracy and error rates of the classification score score for different groups of covers. The covers are divided into a number of groups, given by bins, according to the cover heuristic feature. Error rates are plotted on the file eplot and accuracies on aplot.
Make a bar chart of accuracies for different groups of covers. The covers are divided into a number of groups, given by bins, according to the cover heuristic feature. The accuracy is plotted for each of the classifier scores in the list score. The plot is saved in the file aplot.
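The statistics behind such bar charts can be sketched without the plotting step. The text does not say how the groups are formed or how a score becomes a decision. This sketch assumes equal-count groups after sorting by the heuristic, treats stego (label 1) as the positive class, and classifies an image as stego when its score is positive. `binned_rates` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def binned_rates(
    heuristic: Sequence[float],
    labels: Sequence[int],
    scores: Sequence[float],
    bins: int,
) -> List[Tuple[float, float, float]]:
    """Return (accuracy, FP rate, FN rate) for each group of covers."""
    order = sorted(range(len(heuristic)), key=lambda i: heuristic[i])
    n = len(order)
    nan = float("nan")
    result = []
    for b in range(bins):
        # Equal-count groups over the images sorted by the heuristic.
        idx = order[b * n // bins:(b + 1) * n // bins]
        pred = [1 if scores[i] > 0 else 0 for i in idx]
        true = [labels[i] for i in idx]
        correct = sum(p == t for p, t in zip(pred, true))
        neg = sum(1 for t in true if t == 0)
        pos = len(true) - neg
        fp = sum(1 for p, t in zip(pred, true) if p == 1 and t == 0)
        fn = sum(1 for p, t in zip(pred, true) if p == 0 and t == 1)
        result.append((
            correct / len(idx) if idx else nan,
            fp / neg if neg else nan,  # false positives among clean images
            fn / pos if pos else nan,  # false negatives among stego images
        ))
    return result


rates = binned_rates([1, 2, 3, 4], [0, 1, 0, 1], [-1.0, 1.0, 1.0, -1.0], bins=2)
```

In this toy example, the low-heuristic group is classified perfectly and the high-heuristic group is entirely wrong. That contrast is the kind of pattern the charts are meant to expose.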
Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.
Error profiling for steganalysers. Very experimental and undocumented.
Module: | pysteg.sql.errors |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: list
This class represents a list of images with feature values downloaded from the SQL server and managed in local memory.
Auxiliary functions. Used internally in the package; not intended for export.
Return a list of (index,value) pairs where value is an entry in the matrix M and index its index. If centre is True, the indices are offset to be centred at 0.
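A sketch of this helper using NumPy's `ndenumerate`; the function name is hypothetical. Centring is assumed to subtract `n // 2` along each axis of length `n`. This makes the indices symmetric around 0 when every dimension has odd length; for even lengths they run from `-n/2` to `n/2 - 1`.

```python
from typing import List, Tuple

import numpy as np


def matrix_entries(M, centre: bool = False) -> List[Tuple[Tuple[int, ...], float]]:
    """Return (index, value) pairs for every entry of M."""
    M = np.asarray(M)
    # Offset per axis: n // 2 when centring, otherwise 0.
    offset = tuple(n // 2 for n in M.shape) if centre else (0,) * M.ndim
    return [
        (tuple(int(i) - o for i, o in zip(idx, offset)), value.item())
        for idx, value in np.ndenumerate(M)
    ]


M = np.arange(9).reshape(3, 3)
plain = matrix_entries(M)
centred = matrix_entries(M, centre=True)
```

For the 3×3 example, centred indices run from `(-1, -1)` to `(1, 1)`, and the middle entry gets index `(0, 0)`.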
Return the class name of an object, stripping any prefixing package names. This is used to recognise exceptions returned from different database backends. The exception names have been standardised (DataError, IntegrityError, etc.), but each backend has its own definition.
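A minimal sketch of the idea. PEP 249 standardises the DB-API exception names (DataError, IntegrityError, and so on), but each driver defines its own classes. Comparing bare class names therefore works across backends where `isinstance` checks against one driver's classes would not. `class_name` is a hypothetical name, and sqlite3 stands in for whichever backend is configured.

```python
import sqlite3


def class_name(obj) -> str:
    """Return the bare class name, without any module or package prefix."""
    return type(obj).__name__


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k TEXT UNIQUE)")
conn.execute("INSERT INTO t VALUES ('a')")
name = None
try:
    # Violates the UNIQUE constraint, so the driver raises its IntegrityError.
    conn.execute("INSERT INTO t VALUES ('a')")
except Exception as e:
    name = class_name(e)
```

After the failed insert, `name` is `"IntegrityError"`. A check on that string would match a PostgreSQL or MySQL driver's exception of the same name.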
This module defines the cp class, which is used to manage global configuration. It should not be imported directly; instead, an instance, config, is exposed by the pysteg.sql package. The cp class decorates the OptionParser and should be used to parse options in scripts. Some command-line options are defined to override the config file.
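The following sketch shows an OptionParser wrapper whose command-line options override config-file values. It does not reproduce the cp class. The class name `Config`, the `--db-url` option and the `[db] url` setting are hypothetical. For testability, the config text is passed in as a string instead of being read from a file.

```python
import configparser
import optparse
from typing import List, Optional


class Config:
    """Hypothetical sketch: wrap an OptionParser and let selected
    command-line options override values from an INI-style config."""

    def __init__(self) -> None:
        self.parser = optparse.OptionParser()
        # Hypothetical override option; None means "not given on the command line".
        self.parser.add_option(
            "--db-url", dest="db_url", default=None,
            help="database URL (overrides the config file)",
        )
        self.cfg = configparser.ConfigParser()

    def add_option(self, *args, **kwargs):
        # Scripts register their own options through the wrapper.
        return self.parser.add_option(*args, **kwargs)

    def parse_args(self, argv: Optional[List[str]] = None, ini_text: str = ""):
        options, args = self.parser.parse_args(argv)
        self.cfg.read_string(ini_text)
        # A command-line value, if given, replaces the config-file value.
        if options.db_url is not None:
            if not self.cfg.has_section("db"):
                self.cfg.add_section("db")
            self.cfg.set("db", "url", options.db_url)
        return options, args

    def get(self, section: str, key: str) -> str:
        return self.cfg.get(section, key)


INI = "[db]\nurl = sqlite:///a.db\n"
from_file = Config()
from_file.parse_args([], INI)
overridden = Config()
overridden.parse_args(["--db-url", "sqlite:///b.db"], INI)
```

Without the option, the value comes from the config text; with `--db-url` given, the command-line value wins.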
Module: | pysteg.sql.config |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
The following modules provide core classes for the data model. Members are exposed via other modules and documented there. Only very rarely would one need to import any of these explicitly.
This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.
Module: | pysteg.sql.tables |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
This module defines the Queue class and the associated SQL table to maintain the job queue. All the necessary functionality is provided by methods.
All the necessary objects are exposed by the main package, so this module should not normally be explicitly imported.
This module defines a scaling model, to scale features prior to classification. It is used by the SVModel class, but is designed with loose coupling to facilitate reuse with other classification algorithms.
The ScaleModel class implements some of the interface of FeatureVector and can be used in lieu thereof when getting feature values from images.
The implementation is slow. Each feature value depends on three tables and three records are queried separately from Feature, FeatureValue, and Scaling. Combining the three in one view to be queried in one operation is expected to be faster.
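A combined view of the kind suggested above might look like this. The sketch uses SQLite so that it runs self-contained, while the package itself relies on views defined on its server. All table, column and view names are hypothetical. Ordering by feature id gives a canonical coordinate order.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE feature (id INTEGER PRIMARY KEY, key TEXT UNIQUE);
CREATE TABLE feature_value (feature_id INTEGER, image_id INTEGER, value REAL);
CREATE TABLE scaling (model_id INTEGER, feature_id INTEGER, factor REAL, addterm REAL);

-- One view joining all three tables, so a scaled value needs a single query.
CREATE VIEW scaled_value AS
SELECT s.model_id, fv.image_id, fv.feature_id, f.key,
       (fv.value - s.addterm) * s.factor AS scaled
FROM feature_value AS fv
JOIN feature AS f ON f.id = fv.feature_id
JOIN scaling AS s ON s.feature_id = fv.feature_id;
"""


def scaled_vector(conn: sqlite3.Connection, model_id: int, image_id: int) -> List[float]:
    """Scaled feature vector of one image under one scaling model."""
    rows = conn.execute(
        "SELECT scaled FROM scaled_value WHERE model_id = ? AND image_id = ? "
        "ORDER BY feature_id",
        (model_id, image_id),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO feature VALUES (?, ?)", [(1, "f1"), (2, "f2")])
conn.executemany(
    "INSERT INTO feature_value VALUES (?, ?, ?)", [(1, 5, 3.0), (2, 5, 10.0)]
)
conn.executemany(
    "INSERT INTO scaling VALUES (?, ?, ?, ?)", [(1, 1, 0.5, 1.0), (1, 2, 0.1, 20.0)]
)
vec = scaled_vector(conn, 1, 5)
```

Each coordinate is computed as (value - addterm) * factor inside the database, so `vec` is approximately `[1.0, -1.0]`.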
This module will auto-connect to the database and must be loaded after options have been processed, to ensure correct connection. The reason for this is that it depends on views defined server-side.