CRM114 Wrapper
Python wrapper for the CRM114 Classifier (http://crm114.sourceforge.net/).
The original code was given life over at Elegant Chaos by Sam Deane.
From here you can view the modified source or download it from svn with:
svn co http://www.openvest.com/svn/crm/trunk/crm.py
This is bleeding edge so be careful.
crm.py requires the crm command to be installed on your command path, or its location to be specified in the cfg file (the cmdpath setting).
Uses an ini style config file.
To use the module, create an instance of the Classifier class, giving it a path to the config file. Alternatively, a space-delimited list of categories can be passed in, and a crm.cfg file will be loaded from or created in the current directory.
c = Classifier("/path/to/mycrm.cfg")  # to load a config file
# or
c = Classifier("good bad ugly")  # to create a config in the current dir with defaults
To teach the classifier object about some text, call the learn method, passing in a category (one of the ones that you provided originally OR a new category) and the text.
c.learn("good", "some good text")
c.learn("bad", "some bad text")
c.learn("ugly", "SoMee Uggly tExT")
To find out what the classifier thinks about some text, call the classify method, passing in the text. The result of this method is a tuple of:
- the category best matching the text,
- the probability of the match,
- the pR (see crm114 docs).
(classification, probability, pR) = c.classify("pretty good text")
Crm Config file
[crm]
# command path where the crm executable is found
cmdpath = crm
# directory where all classification (css) files are
# %(here)s is replaced with the directory of this file
#dir = %(here)s/data
dir = %(here)s
# classifier to use; if this changes, the css files need to be recreated
classifier = osb unique microgroom
extension = .css
# space delimited list of possible classes
#classes = spam ham
logfile = %(here)s/learning.log
User path expansion is performed so ~/mycsspath/crm.cfg should be OK.
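As a rough illustration of how a config file like this can be read, here is a minimal sketch using Python's standard configparser module. It is not the code in crm.py: the function name load_crm_config and the sample values are made up for this example, and it only assumes that %(here)s is supplied as a default holding the config file's directory and that ~ is expanded before the file is opened.

```python
import configparser
import os
import tempfile

# Sample settings mirroring the keys shown above (values are illustrative).
SAMPLE_CFG = """\
[crm]
cmdpath = crm
dir = %(here)s
classifier = osb unique microgroom
extension = .css
classes = spam ham
logfile = %(here)s/learning.log
"""


def load_crm_config(path):
    """Hypothetical helper: return the [crm] section as a dict.

    Expands ~ in the path and makes %(here)s resolve to the
    directory that contains the config file.
    """
    path = os.path.abspath(os.path.expanduser(path))
    here = os.path.dirname(path)
    # Values passed as defaults are available for %(name)s interpolation.
    parser = configparser.ConfigParser(defaults={"here": here})
    with open(path) as handle:
        parser.read_file(handle)
    settings = dict(parser.items("crm"))
    # The class list is space delimited; an absent key gives an empty list.
    settings["classes"] = settings.get("classes", "").split()
    return settings


if __name__ == "__main__":
    cfg_dir = tempfile.mkdtemp()
    cfg_path = os.path.join(cfg_dir, "crm.cfg")
    with open(cfg_path, "w") as handle:
        handle.write(SAMPLE_CFG)
    for key, value in sorted(load_crm_config(cfg_path).items()):
        print(key, "=", value)
```

With this approach, moving the config file to another directory moves the css files and the log file along with it, since both are expressed relative to %(here)s.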
Crm Log file
In addition to the config file for easier management, crm.py now includes a logging capability. The log file holds all text learned, with a timestamp. Using that log file, different classifiers can be tested by feeding the logged learning events back through a new classification configuration.
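The replay idea can be sketched as follows. The on-disk log format is not described here, so this example assumes the events have already been parsed into (timestamp, category, text) tuples. WordCountClassifier is a toy stand-in that only mimics the learn and classify methods described above; it is not CRM114 and does not model pR. The replay function is likewise hypothetical.

```python
from collections import Counter


class WordCountClassifier:
    """Toy stand-in with the same learn/classify interface (not CRM114)."""

    def __init__(self):
        self.counts = {}  # category -> Counter of words seen in that category

    def learn(self, category, text):
        self.counts.setdefault(category, Counter()).update(text.lower().split())

    def classify(self, text):
        words = text.lower().split()
        scores = {cat: sum(seen[w] for w in words)
                  for cat, seen in self.counts.items()}
        if not scores:
            return (None, 0.0, 0.0)
        best = max(scores, key=scores.get)
        total = sum(scores.values())
        probability = scores[best] / total if total else 0.0
        return (best, probability, 0.0)  # pR is a placeholder here


def replay(events, classifier):
    """Classify each logged event before learning it; return the hit rate."""
    correct = 0
    total = 0
    for _timestamp, category, text in events:
        guess, _probability, _pr = classifier.classify(text)
        if guess == category:
            correct += 1
        total += 1
        classifier.learn(category, text)
    return correct / total if total else 0.0


if __name__ == "__main__":
    logged = [
        ("t1", "good", "nice good text"),
        ("t2", "bad", "awful bad text"),
        ("t3", "good", "good stuff"),
        ("t4", "bad", "bad awful stuff"),
    ]
    print(replay(logged, WordCountClassifier()))
```

Running the same event list through two differently configured classifiers and comparing the returned hit rates gives a simple way to choose between them.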
