This package provides an interface to an SQL database to store image features for steganalysis.
Module: | pysteg.sql |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
The tables represented by SQLObject classes are visible directly in the package. The submodules provide the following functionality:
setup: | functions to create tables and enter standard feature definitions |
---|---|
imageset: | enter images in the db and create test and training sets |
features: | establish new feature vectors |
extract: | extract features from images and enter them in the db |
queue: | the class and SQL table for the job queue |
stats: | statistical analysis of features in the database |
svmodel: | managing SVM classifiers using features from the db |
scaling: | scaling models for use with learning classifiers |
Bases: exceptions.Exception
Error in the configuration file.
Bases: exceptions.Exception
Integrity error in the database contents.
Bases: exceptions.Exception
This exception is raised when prerequisite data are found to be missing from the database during calculations. Catching it allows client processes to proceed to the next task in the queue.
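A minimal sketch of the catch-and-continue pattern described above. The exception name `MissingDataError` and the `run_queue()` helper are hypothetical stand-ins, not names taken from the package.

```python
class MissingDataError(Exception):
    """Hypothetical stand-in for the package's missing-data exception."""


def run_queue(tasks):
    """Run each (name, callable) task in order, skipping those that lack data."""
    done, skipped = [], []
    for name, task in tasks:
        try:
            task()
            done.append(name)
        except MissingDataError:
            # Prerequisite data are absent: note it and move to the next task.
            skipped.append(name)
    return done, skipped


def ok():
    return None


def needs_features():
    raise MissingDataError("feature values not yet extracted")


done, skipped = run_queue([("a", ok), ("b", needs_features), ("c", ok)])
print(done, skipped)
```

Only the missing-data exception is caught, so any other error still stops the worker.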
The database tables and corresponding python objects are defined in three modules. Normally, one should not import tables or queue as all the elements are exposed by importing just the pysteg.sql package.
The svmodel module, however, must be imported explicitly when needed. It includes helper functions in addition to the data structure.
This module defines SQLObject classes for the image and feature datasets. The SQL database tables are defined through the SQLObject definitions.
Module: | pysteg.sql.tables |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: sqlobject.main.SQLObject
An Image is an Image Object to be analysed. It may be an identical copy of a Source Image, or it may be a modified version obtained by stego embedding, compression, downsampling, etc.
Add feature values from a numpy array M. The given key is the prefix, to which indices are appended. If symindex is True, the indices are symmetric around 0, otherwise they range from 0 upwards.
Add feature values for the image. The features are given as a dictionary with keys as used in the database and a floating point value. (Not tested!)
Add feature values from a list vals. The keys of the features should be given in a list names.
Compare this image with its cover or source image with respect to the given feature. The return value is the difference between the feature values. None is returned if the image does not have a known source image.
Return an iterator of FeatureValue objects defined by the given key. If key is None, all features are included.
Return a feature vector as a list of floating point values.
Return the source image, or self if no source is defined.
Bases: sqlobject.main.SQLObject
A feature is a function of an image. The database table stores a unique key (ID) and a description.
Bases: sqlobject.main.SQLObject
A Feature Value is a Feature calculated for a particular Image. The database table stores references to the Feature and Image as foreign keys (one-to-one), and a floating point value.
Bases: sqlobject.main.SQLObject
A Feature Set is a collection of Features with a common description. Fields to be set in the constructor:
Key : | human-readable, unique key |
---|---|
Description : | longer description of the features |
Func : | Python function to extract the feature. The function is stored as a string and interpreted using eval(). |
Jpeg(bool) : | flag to indicate that the extraction function takes a jpeg object instead of a pixmap matrix. |
Matrix(bool) : | flag to indicate a feature set represented by a matrix. If set, the addFeatureMatrix() method applies. |
Symidx(bool) : | (assumes matrix) flag to indicate that individual elements should be indexed symmetrically around 0. |
Relational fields:
Features (SelectResult): | the included features |
---|---|
Queues (SelectResult): | queue jobs asking to extract the feature set |
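The Func field is described as a string interpreted with eval(). A minimal sketch of that idea follows, assuming the stored string is an expression that evaluates to a callable. The example string and the toy pixmap are hypothetical.

```python
# Hypothetical stored value of the Func field: an expression yielding a callable.
func_source = (
    "lambda pixels: sum(sum(row) for row in pixels)"
    " / sum(len(row) for row in pixels)"
)

# eval() turns the stored string back into a function object.
extract = eval(func_source)

pixmap = [[0, 2], [4, 6]]  # toy 2x2 "pixmap matrix"
value = extract(pixmap)    # mean pixel value
print(value)
```

Because eval() executes whatever the string contains, this approach is only appropriate when the database contents are trusted.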
Bases: sqlobject.main.SQLObject
A Feature Vector is a vector where each element is a Feature. The database table stores Feature Vectors which form the basis for classifiers. Where Feature Sets contain Features with common descriptions, Feature Vectors contain Features which are used together.
Bases: sqlobject.main.SQLObject
An Image Set is a collection of images from the same source which have been subject to similar processing. It may be an original image base, or a collection of Images processed from an image base.
Bases: sqlobject.main.SQLObject
A TestSet is a collection of images used for training or testing of a classifier.
Delete the object, including dependent SVMPerformance objects and TestImage objects.
Bases: sqlobject.main.SQLObject
TestImage is a relational table marking a given Image as included in a Test or Training Set. It includes additional fields, where label is used for classification and response for regression. These numbers could be derived from Image data on the fly, but because they depend on both the Image and ImageSet tables, that seems cumbersome, and it is preferable at this stage to hardcode them in the relational table.
The TestImage class is a decorator for the Image class, so all methods of Image are supported. See the Image class for details.
For any Image or TestImage object img, the call img() returns the appropriate Image object. This should be used polymorphically whenever the type is unknown and the Image (or Image ID) is required.
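The polymorphic `img()` call can be illustrated with two plain classes. This is a sketch of the calling convention only. `SketchImage` and `SketchTestImage` are hypothetical stand-ins and do not reproduce the SQLObject classes.

```python
class SketchImage:
    """Stand-in for Image: calling the object returns the object itself."""

    def __init__(self, image_id):
        self.id = image_id

    def __call__(self):
        return self


class SketchTestImage:
    """Stand-in for TestImage: wraps an image and adds a class label."""

    def __init__(self, image, label):
        self.image = image
        self.label = label

    def __call__(self):
        # Calling the wrapper returns the underlying image.
        return self.image

    def __getattr__(self, name):
        # Attributes not found on the wrapper are looked up on the image.
        return getattr(self.image, name)


img = SketchImage(7)
timg = SketchTestImage(img, label=1)

# The same expression works whichever type is at hand.
ids = [x().id for x in (img, timg)]
print(ids)
```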
This module defines the Queue class and the associated SQL table to maintain the job queue. All the necessary functionality is provided by methods.
Bases: sqlobject.main.SQLObject
Table to record pending jobs. Each entry concerns one image and one or more feature sets. A worker node should use a transaction to select one item where assigned is null, and then set this field with the current date and time before the transaction is released.
Three modes:
1. image set, svmodel=None: normal feature calculation
2. image=None, svmodel set, testset=None: SVM training
3. image=None, svmodel and testset set: SVM testing
Add a new image with one or more feature sets to the queue.
Delete the job. Unless force is True, an assigned job will not be deleted. Normally, releaseJob() is used to release and delete a processed job. This is not safe; a transaction should be used to lock the record while deleting.
Get a job from the queue. Transactions are used to make this safe under concurrent access.
If SVM is false, only feature extraction tasks will be accepted. This is useful if some compute nodes are used without access to the filesystem holding SVM model files.
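The claiming procedure described for the queue table (select one row where assigned is null, then stamp it before the transaction ends) might look roughly as follows. This sketch uses the standard sqlite3 module rather than SQLObject. The table layout and the `claim_job()` name are assumptions made for illustration.

```python
import sqlite3
from datetime import datetime


def claim_job(conn):
    """Pick one unassigned job and stamp it in one transaction; return id or None."""
    try:
        conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
        row = conn.execute(
            "SELECT id FROM queue WHERE assigned IS NULL ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE queue SET assigned = ? WHERE id = ?",
            (datetime.now().isoformat(), row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


# isolation_level=None lets the code manage transactions explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, image TEXT, assigned TEXT)"
)
conn.executemany("INSERT INTO queue (image) VALUES (?)", [("a.pgm",), ("b.pgm",)])

first = claim_job(conn)
second = claim_job(conn)
third = claim_job(conn)  # the queue is exhausted
print(first, second, third)
```

Holding the lock across both the SELECT and the UPDATE is what prevents two workers from claiming the same row.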
This module defines a scaling model, to scale features prior to classification. It is used by the SVModel class, but is designed with loose coupling to facilitate reuse with other classification algorithms.
The ScaleModel class implements some of the interface of FeatureVector and can be used in lieu thereof when getting feature values from images.
The implementation is slow. Each feature value depends on three tables and three records are queried separately from Feature, FeatureValue, and Scaling. Combining the three in one view to be queried in one operation is expected to be faster.
This module will auto-connect to the database and must be loaded after options have been processed, to ensure correct connection. The reason for this is that it depends on views defined server side.
Bases: sqlobject.main.SQLObject
This is a complete scaling model, with scaling formulæ for each feature. It implements part of the interface of FeatureVector and can be passed to the getFeatures() methods of Image, ImageSet, and TestSet to return complete scaled feature vectors with canonical coordinate ordering.
Functions to load image sets into the database and define test and training sets.
Module: | pysteg.sql.imageset |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
This is rather crude and it may be better to consult the scripts to see how the functions are used.
Define image sets based on a config file with the given filename fn.
Create a new TestSet based on base, but using stego images from stego instead. The same random selection is used as in base. If images are missing from the new stego set, an exception will be raised unless the incomplete argument is set to True, in which case the missing images will simply be omitted.
The current approach is not ideal. It is difficult to queue feature extraction tasks for the new images without requeueing old images as well. A new approach is needed.
Given two image sets for clean images and steganograms respectively, training and test sets are constructed randomly. It is assumed that both clean and stego contain corresponding images with the same basename, and if a clean image is included, the corresponding stego image is excluded, and vice versa.
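One way to build such sets is to draw basenames at random and use each basename exactly once, either as a clean image or as a steganogram. The sketch below works under that assumption. The function name, the half-and-half class balance, and the 0/1 labels are illustrative choices, not the package's documented behaviour.

```python
import random


def split_by_basename(basenames, n_train, n_test, seed=0):
    """Return (train, test) lists of (basename, label); 0 = clean, 1 = stego."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(basenames), n_train + n_test)

    def label(chunk):
        # First half taken as clean, second half as stego, so that no
        # basename contributes both versions.
        half = len(chunk) // 2
        return [(bn, 0) for bn in chunk[:half]] + [(bn, 1) for bn in chunk[half:]]

    return label(chosen[:n_train]), label(chosen[n_train:])


names = ["img%03d" % i for i in range(10)]
train, test = split_by_basename(names, n_train=6, n_test=4)
print(train)
print(test)
```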
Return a new TestSet object with the given name, created by taking the images from set which satisfy min <= feature < max. If min or max is None, it poses no constraint.
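The half-open range test min <= feature < max, with None meaning no constraint, can be sketched as below. The helper name `in_range()` and the sample values are hypothetical.

```python
def in_range(value, lo=None, hi=None):
    """True when lo <= value < hi; a bound of None imposes no constraint."""
    return (lo is None or lo <= value) and (hi is None or value < hi)


values = {"img1": 0.2, "img2": 0.5, "img3": 0.9}

# Keep images whose feature value is at least 0.5, with no upper bound.
selected = [name for name, v in sorted(values.items()) if in_range(v, 0.5, None)]
print(selected)
```

The upper bound is exclusive, so adjacent ranges such as [0, 0.5) and [0.5, 1) partition the images without overlap.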
Create a dummy TestSet by combining all images from every image set in L. All the test images are given the label 1. This is mainly intended to form a set of images for which classification scores can be calculated in bulk, and not as a test or training set as such. The elements of L may be any iterable over images, including TestSet or ImageSet objects.
This module provides functions to define new features, feature vectors, and feature sets, including feature level fusion. The functions fsconfig() and fvconfig() read definitions from a config file and enter them into the database.
Module for statistical analysis and comparison of features.
Module: | pysteg.sql.stats |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Returns the correlation coefficient matrix of the given features, calculated from the images in imgset. The features argument can be a list of Feature objects or feature keys. The imgset object can be a list of Image objects, an ImageSet object, or a TestSet object.
Consider the difference in the given feature between a steganogram and its corresponding cover image. Return the first four statistical moments (mean, variance, skewness, and kurtosis) of this difference in the given image set (imgset).
If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Images which do not have a source (cover) image recorded in the database will be tacitly ignored.
Return the median of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Return the first four statistical moments (mean, variance, skewness, and kurtosis) of the given feature in the given image set imgset.
If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
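For reference, the four moments can be computed from a list of feature values as sketched below. The conventions here are assumptions, since the documentation does not state which ones the module uses: population (biased) variance, and kurtosis without subtracting 3. The input must contain at least two distinct values.

```python
def four_moments(xs):
    """Return (mean, variance, skewness, kurtosis) of a sequence of floats."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in dev) / n  # population variance
    m3 = sum(d ** 3 for d in dev) / n  # third central moment
    m4 = sum(d ** 4 for d in dev) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2                # non-excess kurtosis
    return mean, m2, skew, kurt


mean, var, skew, kurt = four_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(mean, var, skew, kurt)
```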
Return the percentile points of the given feature within imgset. If label is given, imgset should be a TestSet or other iterable over TestImage objects, and only images with the given class label will be considered.
Plot two features against each other in the form of a scatter plot. The first argument is a TestSet object using the class labels 0 and 1, where 0 is plotted red and 1 is plotted blue. The second and third arguments are features, given as Feature objects or as keys. If the optional outfile is given, the plot is written to the given file.
The main feature of this module is the cStat() function which plots bar charts of accuracy and/or FP/FN rates for subgroups of the test set divided according to some given feature. The charts make a basis for assessing the feature as a cover selection heuristic.
There is also an under-documented iStat() function which is used to check cover selections created as an intersection of two or more existing selections.
The other methods are auxiliaries, but may be useful for variations on the theme.
Make bar charts of accuracy and error rates of the classification score score for different groups of covers. The covers are divided into bins bins according to the cover heuristics feature. Error rates are plotted on the file eplot and accuracies on aplot.
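The per-bin accuracy behind such a chart can be sketched without any plotting. The version below assumes equal-width bins over the observed range of the heuristic feature. The actual binning rule is not documented here, and `accuracy_by_bin()` is a hypothetical name.

```python
def accuracy_by_bin(records, bins):
    """records: (feature_value, true_label, predicted_label) triples.

    Returns one accuracy per bin, or None for a bin with no images.
    """
    values = [v for v, _, _ in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width if all values coincide
    hits = [0] * bins
    totals = [0] * bins
    for v, truth, pred in records:
        k = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        totals[k] += 1
        hits[k] += int(truth == pred)
    return [h / t if t else None for h, t in zip(hits, totals)]


records = [(0.0, 0, 0), (0.1, 1, 0), (0.6, 1, 1), (1.0, 0, 0)]
acc = accuracy_by_bin(records, bins=2)
print(acc)
```

Error rates per bin follow the same pattern, counting false positives and false negatives separately instead of hits.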
Make a bar chart of accuracies for different groups of covers. The covers are divided into bins bins according to the cover heuristics feature. The accuracy is plotted for each of the classifier scores in the list score. The plot is saved in the file aplot.
Given a list L of ImageSet objects and a basename bn, check the images corresponding to bn from each ImageSet and return the number of such images which are classified as stego by the given classifier score.
Error profiling for steganalysers. Very experimental and undocumented.
Module: | pysteg.sql.errors |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: list
This class represents a list of images with feature values downloaded from the SQL server and managed in local memory.
Auxiliary functions. Used internally in the package; not intended for export.
Return a list of (index,value) pairs where value is an entry in the matrix M and index its index. If centre is True, the indices are offset to be centred at 0.
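A sketch of this enumeration for a two-dimensional matrix given as a list of rows. The centring rule (subtract floor(n/2) along each axis) is an assumption, as is the name `matrix_entries()`. The package may handle even dimensions or numpy arrays differently.

```python
def matrix_entries(M, centre=False):
    """Return [((i, j), value), ...] for a 2-D matrix given as a list of rows."""
    rows = len(M)
    cols = len(M[0]) if rows else 0
    # Assumed centring rule: shift indices by floor(n/2) on each axis.
    di, dj = (rows // 2, cols // 2) if centre else (0, 0)
    return [((i - di, j - dj), M[i][j]) for i in range(rows) for j in range(cols)]


M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]

pairs = matrix_entries(M, centre=True)
print(pairs[0], pairs[4], pairs[8])
```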
Return the class name of an object, stripping any prefixing package names. This is used to recognise exceptions returned from different database backends. The exception names have been standardised (DataError, IntegrityError, etc.), but each backend has its own definition.
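In current Python the bare class name is available as `type(obj).__name__`, which makes the comparison across backends straightforward. The sketch below uses sqlite3's IntegrityError and a locally defined class of the same name as stand-ins for two backends. The helper name `class_name()` is hypothetical.

```python
import sqlite3


def class_name(obj):
    """Return the bare class name of obj, without module or package prefix."""
    return type(obj).__name__


class IntegrityError(Exception):
    """Stand-in for another backend's exception of the same name."""


names = [
    class_name(sqlite3.IntegrityError("duplicate key")),
    class_name(IntegrityError("duplicate key")),
]
print(names)
```

The two exception classes are unrelated types, yet both are recognised by the same name.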
This module defines the cp class which is used to manage global configuration. It should not be imported directly; instead, an instance, config, is exposed by the pysteg.sql package. The cp class decorates the OptionParser and should be used to parse options in scripts. Some command line options are defined to override the config file.
Module: | pysteg.sql.config |
---|---|
Date: | $Date$ |
Revision: | $Revision$ |
Author: | © 2012: Hans Georg Schaathun <georg@schaathun.net> |
Bases: ConfigParser.SafeConfigParser
This class represents a configuration and supports option parsing both from config files and from command line options. The former is implemented by inheriting from SafeConfigParser and the latter by an instance of OptionParser.
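The combination of an inherited config parser and an embedded option parser can be sketched in Python 3 as follows. `ConfigParser` is used because `SafeConfigParser` is the Python 2 name. The class name, the `sql` section, and the `--dburi` option are hypothetical and not taken from the package.

```python
from configparser import ConfigParser
from optparse import OptionParser


class SketchConfig(ConfigParser):
    """Config-file values, optionally overridden from the command line."""

    def __init__(self):
        super().__init__()
        # The command-line side is delegated to an OptionParser instance.
        self.option_parser = OptionParser()
        self.option_parser.add_option("--dburi", dest="dburi", default=None)

    def parse_args(self, argv):
        opts, args = self.option_parser.parse_args(argv)
        if opts.dburi is not None:
            # A command-line value takes precedence over the config file.
            if not self.has_section("sql"):
                self.add_section("sql")
            self.set("sql", "dburi", opts.dburi)
        return opts, args


cfg = SketchConfig()
cfg.read_string("[sql]\ndburi = sqlite:/:memory:\n")
cfg.parse_args(["--dburi", "sqlite:/tmp/steg.db"])
print(cfg.get("sql", "dburi"))
```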