## How to use it

### Load the dataset

```python
from preimage.datasets.loader import load_ocr_letters

train_dataset, test_dataset = load_ocr_letters(fold_id=0)
```
### Create the inference model

Pick one of the following inference models.

#### Weighted degree

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.weighted_degree_model import WeightedDegreeModel

inference_model = WeightedDegreeModel(Alphabet.latin, n=3)
```

#### N-Gram

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.n_gram_model import NGramModel

inference_model = NGramModel(Alphabet.latin, n=3)
```

#### Generic String

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.generic_string_model import GenericStringModel

inference_model = GenericStringModel(Alphabet.latin, n=3, sigma_position=10)
```

#### Eulerian Path

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.eulerian_path_model import EulerianPathModel

inference_model = EulerianPathModel(Alphabet.latin, n=3)
```
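All of the models above are parameterized by an n-gram length `n`. As a rough illustration of what an n-gram representation of a string looks like, here is a hypothetical helper (not part of preimage, which builds its features internally):

```python
from collections import Counter

def ngram_counts(word, n=3):
    """Count the contiguous n-grams of a string.

    Hypothetical helper, for illustration only.
    """
    return Counter(word[i:i + n] for i in range(len(word) - n + 1))

# The word "letter" decomposes into four 3-grams:
# 'let', 'ett', 'tte', 'ter'.
print(ngram_counts("letter"))
```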
### Create the learner

```python
from preimage.kernels.polynomial import PolynomialKernel
from preimage.learners.structured_krr import StructuredKernelRidgeRegression

poly_kernel = PolynomialKernel(degree=2)
alpha = 0.001  # regularization parameter
learner = StructuredKernelRidgeRegression(alpha, poly_kernel, inference_model)
```
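The learner combines kernel ridge regression with the chosen inference model. As a rough sketch of the ridge-regression part only (not preimage's actual implementation; the polynomial kernel's bias and scale below are one common convention, assumed here), fitting amounts to solving `(K + alpha * I) A = Y` for the dual coefficients `A`:

```python
import numpy as np

def polynomial_kernel(X1, X2, degree=2):
    """K(x, z) = (x . z + 1)^degree, one common polynomial-kernel convention."""
    return (X1 @ X2.T + 1.0) ** degree

def krr_fit(X, Y, alpha=0.001, degree=2):
    """Solve (K + alpha * I) A = Y for the dual coefficients A."""
    K = polynomial_kernel(X, X, degree)
    return np.linalg.solve(K + alpha * np.eye(len(X)), Y)

def krr_predict(X_train, A, X_test, degree=2):
    """Predictions are kernel combinations of the training points."""
    return polynomial_kernel(X_test, X_train, degree) @ A

# Tiny example: y = x^2 lies in the degree-2 feature space,
# so the fit is near-exact for a small alpha.
X = np.linspace(0, 1, 10).reshape(-1, 1)
Y = X ** 2
A = krr_fit(X, Y)
print(krr_predict(X, A, X))
```

In the structured setting, the regression output is then mapped back to a string by the inference model (the pre-image step), which is where the models of the previous section differ.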
### Train and predict

```python
learner.fit(train_dataset.X, train_dataset.Y, train_dataset.y_lengths)
Y_predictions = learner.predict(test_dataset.X, test_dataset.y_lengths)
```
### Evaluate the predictions

```python
from preimage.metrics.structured_output import zero_one_loss, hamming_loss, levenshtein_loss

print('zero_one_loss', zero_one_loss(test_dataset.Y, Y_predictions))
print('levenshtein_loss', levenshtein_loss(test_dataset.Y, Y_predictions))
print('hamming_loss', hamming_loss(test_dataset.Y, Y_predictions))
```
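These three metrics compare the predicted strings to the true ones at different granularities: whole string, aligned characters, and edit operations. A self-contained sketch of what each loss computes (plain-Python approximations, not preimage's implementations; details such as normalization are assumptions):

```python
def zero_one(y_true, y_pred):
    """Fraction of predictions that are not exactly the true string."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming(y_true, y_pred):
    """Mean fraction of positions where equal-length strings disagree."""
    return sum(sum(a != b for a, b in zip(t, p)) / len(t)
               for t, p in zip(y_true, y_pred)) / len(y_true)

def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[-1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

Y_true = ["cat", "dog"]
Y_pred = ["cat", "dig"]
print(zero_one(Y_true, Y_pred))          # 0.5: one of two strings is wrong
print(hamming(Y_true, Y_pred))           # 1/6: one wrong character out of three, averaged
print(levenshtein("kitten", "sitting"))  # 3
```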
## Dataset

Ben Taskar's original OCR letters dataset can be found at: http://www.seas.upenn.edu/~taskar/ocr/
## Authors and Contributors

Amélie Rolland and Sébastien Giguère