## How to use it

### Load the dataset

```python
from preimage.datasets.loader import load_ocr_letters

train_dataset, test_dataset = load_ocr_letters(fold_id=0)
```

### Create the inference model

#### Weighted degree

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.weighted_degree_model import WeightedDegreeModel

inference_model = WeightedDegreeModel(Alphabet.latin, n=3)
```
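The weighted degree model works with position-dependent n-gram features: substrings only count as matches when they occur at the same position in both strings. A self-contained sketch of that idea (an illustration, not the library's implementation):

```python
def weighted_degree_similarity(s, t, n=3):
    """Count positions where s and t share the same k-mer, for k = 1..n.

    Toy sketch of the weighted degree idea: a substring only matches
    when it occurs at the same position in both strings.
    """
    assert len(s) == len(t)
    score = 0
    for k in range(1, n + 1):
        for i in range(len(s) - k + 1):
            if s[i:i + k] == t[i:i + k]:
                score += 1
    return score

# identical strings score highest; a single substitution also breaks
# every k-mer that overlaps the changed position
print(weighted_degree_similarity("hello", "hello", n=3))  # 12
print(weighted_degree_similarity("hello", "yello", n=3))  # 9
```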

#### N-Gram

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.n_gram_model import NGramModel

inference_model = NGramModel(Alphabet.latin, n=3)
```
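In contrast, n-gram features are position-independent: only the counts of the length-n substrings matter. A toy illustration of such a feature map (not the library's code):

```python
from collections import Counter

def n_gram_counts(s, n=3):
    """Counter of all contiguous n-grams in s, ignoring positions."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

print(n_gram_counts("banana", n=3))  # 'ana' appears twice
```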

#### Generic String

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.generic_string_model import GenericStringModel

inference_model = GenericStringModel(Alphabet.latin, n=3, sigma_position=10)
```
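The extra `sigma_position` parameter presumably controls how strictly n-gram positions must align; a common choice in generic string kernels is a Gaussian penalty on the gap between two positions. A minimal sketch under that assumption (the library's exact formula may differ):

```python
import math

def position_weight(i, j, sigma_position=10):
    """Gaussian penalty on the distance between positions i and j:
    n-grams that match far from their expected position contribute less."""
    return math.exp(-((i - j) ** 2) / (2.0 * sigma_position ** 2))

print(position_weight(0, 0))   # 1.0, same position gets full weight
print(position_weight(0, 10))  # ~0.61, distant positions are damped
```

A larger `sigma_position` makes the penalty flatter, so matches far from their expected position still contribute.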

#### Eulerian Path

```python
from preimage.utils.alphabet import Alphabet
from preimage.models.eulerian_path_model import EulerianPathModel

inference_model = EulerianPathModel(Alphabet.latin, n=3)
```
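The Eulerian path model, roughly, assembles a string from predicted n-grams by threading them through a graph whose nodes are (n-1)-grams and whose edges are the n-grams. A toy sketch of that assembly step using Hierholzer's algorithm (an illustration, not the library's algorithm):

```python
from collections import defaultdict

def assemble_from_n_grams(n_grams):
    """Assemble a string whose n-grams are exactly `n_grams`, by walking an
    Eulerian path in the graph whose nodes are (n-1)-grams and whose edges
    are the n-grams (prefix -> suffix). Assumes such a path exists."""
    graph = defaultdict(list)
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for g in n_grams:
        u, v = g[:-1], g[1:]
        graph[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
    # start at the node whose out-degree exceeds its in-degree,
    # falling back to any node that has outgoing edges
    start = next((u for u in list(graph) if out_deg[u] - in_deg[u] == 1),
                 next(iter(graph)))
    # Hierholzer's algorithm: walk edges until stuck, backtrack, repeat
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if graph[u]:
            stack.append(graph[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # glue: first node, then the last character of each following node
    return path[0] + "".join(v[-1] for v in path[1:])

print(assemble_from_n_grams(["hel", "ell", "llo"]))  # -> "hello"
```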

### Create the learner

```python
from preimage.kernels.polynomial import PolynomialKernel
from preimage.learners.structured_krr import StructuredKernelRidgeRegression

poly_kernel = PolynomialKernel(degree=2)
alpha = 0.001
learner = StructuredKernelRidgeRegression(alpha, poly_kernel, inference_model)
```
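`PolynomialKernel` computes a polynomial similarity between input feature vectors. Assuming the common form k(x, z) = (x . z + bias)^degree (the library's exact bias and normalization options may differ), a minimal sketch:

```python
def polynomial_kernel(x, z, degree=2, bias=1.0):
    """Common polynomial kernel k(x, z) = (<x, z> + bias) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (dot + bias) ** degree

print(polynomial_kernel([1.0, 2.0], [3.0, 0.5], degree=2))  # (4 + 1) ** 2 = 25.0
```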

### Train and predict

```python
learner.fit(train_dataset.X, train_dataset.Y, train_dataset.y_lengths)
Y_predictions = learner.predict(test_dataset.X, test_dataset.y_lengths)
```
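In kernel ridge regression, fitting amounts to solving the linear system (K + alpha * I) a = y, and predicting combines kernel values with the learned weights; the structured learner additionally decodes its predictions into strings through the inference model. A tiny self-contained illustration of the closed form on two scalar training points (a toy sketch, not the library's solver):

```python
def fit_krr_2pts(x1, x2, y1, y2, alpha, kernel):
    """Closed-form kernel ridge regression on two training points:
    solves (K + alpha * I) a = y by inverting the 2x2 system."""
    k11 = kernel(x1, x1) + alpha
    k12 = kernel(x1, x2)
    k22 = kernel(x2, x2) + alpha
    det = k11 * k22 - k12 * k12
    a1 = (k22 * y1 - k12 * y2) / det
    a2 = (k11 * y2 - k12 * y1) / det
    return a1, a2

def predict_krr(x, train, weights, kernel):
    """A prediction is a kernel-weighted sum over training points."""
    return sum(a * kernel(x, xi) for xi, a in zip(train, weights))

lin = lambda u, v: sum(p * q for p, q in zip(u, v))  # linear kernel for the toy
a1, a2 = fit_krr_2pts([1.0], [2.0], 1.0, 2.0, alpha=0.001, kernel=lin)
pred = predict_krr([1.5], [[1.0], [2.0]], (a1, a2), lin)
print(pred)  # close to 1.5, the small alpha barely shrinks the fit
```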

### Evaluate the predictions

```python
from preimage.metrics.structured_output import zero_one_loss, hamming_loss, levenshtein_loss

print('zero_one_loss', zero_one_loss(test_dataset.Y, Y_predictions))
print('levenshtein_loss', levenshtein_loss(test_dataset.Y, Y_predictions))
print('hamming_loss', hamming_loss(test_dataset.Y, Y_predictions))
```
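Roughly, the three losses measure exact-match error, per-character error, and edit-distance error. The library's exact normalization may differ; a self-contained sketch of what each one computes:

```python
def zero_one_loss(Y_true, Y_pred):
    """Fraction of predicted strings not exactly equal to the truth."""
    return sum(t != p for t, p in zip(Y_true, Y_pred)) / len(Y_true)

def hamming_loss(Y_true, Y_pred):
    """Average fraction of mismatched characters (equal-length strings)."""
    return sum(sum(a != b for a, b in zip(t, p)) / len(t)
               for t, p in zip(Y_true, Y_pred)) / len(Y_true)

def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) via DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

Y_true, Y_pred = ["word", "cat"], ["ward", "cat"]
print(zero_one_loss(Y_true, Y_pred))   # 0.5: one of two strings is wrong
print(hamming_loss(Y_true, Y_pred))    # 0.125: one wrong character in eight
print(levenshtein("word", "ward"))     # 1: a single substitution
```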

## Dataset

The original OCR letters dataset by Ben Taskar is available at: http://www.seas.upenn.edu/~taskar/ocr/

## Authors and Contributors

Amélie Rolland and Sébastien Giguère