Common Schedule

Week 1: 5/12 - 5/16

- Set up computer; install software
- Familiarize myself with Matlab, having no previous experience
- Create Matlab exercise Q/A
- Become familiar with Adriane's progress and with Emma's thesis, "Automated Writer Identification for Syriac Scribes"

Week 2: 5/19 - 5/23

In Adriane's function processSyriacAnnotations.m, we saved the two structure arrays lftr and ldoc. Each has 14 fields, one for each of 14 letters of the Syriac alphabet. In lftr, each field is a cell array of varying length in which every cell holds the feature vector of one sample of that letter found in a manuscript. In each feature vector, the first two elements give the character's global skew, and the remaining elements, taken in groups of 7, describe each feature/cavity's 7 transformation parameters (translation, rotation, elongation, and shear). In ldoc, the cell at the same index holds a string naming the source manuscript the sample came from.
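The layout described above can be sketched in Python as an illustration only; the actual code is MATLAB. In this sketch, Python dicts stand in for the lftr/ldoc structure arrays and lists stand in for the cell arrays. The helper names split_feature_vector and samples_with_sources are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

SKEW_LENGTH = 2        # first two elements: the character's global skew
PARAMS_PER_CAVITY = 7  # each later group of 7: one feature/cavity's transformation parameters


def split_feature_vector(vec: Sequence[float]) -> Tuple[List[float], List[List[float]]]:
    """Split one sample's feature vector into its global skew and per-cavity groups."""
    if len(vec) < SKEW_LENGTH or (len(vec) - SKEW_LENGTH) % PARAMS_PER_CAVITY:
        raise ValueError("expected 2 skew values followed by groups of 7 parameters")
    skew = list(vec[:SKEW_LENGTH])
    cavities = [list(vec[i:i + PARAMS_PER_CAVITY])
                for i in range(SKEW_LENGTH, len(vec), PARAMS_PER_CAVITY)]
    return skew, cavities


def samples_with_sources(lftr: Dict[str, List[Sequence[float]]],
                         ldoc: Dict[str, List[str]],
                         letter: str) -> List[Tuple[str, Sequence[float]]]:
    """Pair each feature vector of a letter with the manuscript name at the same index in ldoc."""
    return list(zip(ldoc[letter], lftr[letter]))
```

A vector with 2 + 7k elements yields the skew pair and k cavity groups; any other length is rejected rather than silently truncated.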

Here are screenshots from Matlab:

• lftr

• lftr.Alaph

• lftr.Alaph{1,1}

Functions written:

• runSyriacStyleComparison.m compares test features against sample features and ranks all sample sources by their proximity to each test source. It takes four fixed parameters (sampFtr, sampSrc, testFtr, testSrc) and three modification parameters (method, weights, letters). The four fixed parameters and the output rank are all structure arrays. The method parameter selects one of three voting schemes: simple vote, rank vote, or weighted rank vote. Users can thus choose the method, the set of weights, and the letters to use.
• filterManuscripts.m filters a set of sample features and their sources and returns a subset of features and sources corresponding to the given subset of sources.
• identifyRepresentativeFeatureSample.m takes a cell array of sample features and returns the most representative sample feature and its index in the given array.
• identifyRepresentativeSample.m takes a cell array of samples and returns the most representative sample and its index in the given array.
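The notes do not spell out how the three voting methods of runSyriacStyleComparison.m score sources, so the following Python sketch is only one plausible reading. It assumes each letter yields a ranking of sample sources, closest first. A simple vote gives one point to each letter's top source. A rank vote awards Borda-style points (n for first place down to 1 for last). A weighted rank vote scales those points by a per-letter weight. The function name and the method strings are hypothetical.

```python
from typing import Dict, List, Optional


def combine_letter_rankings(rankings: Dict[str, List[str]], method: str = "simple",
                            weights: Optional[Dict[str, float]] = None) -> List[str]:
    """Combine per-letter rankings of sample sources (closest first) into one ranking."""
    weights = weights or {}
    scores: Dict[str, float] = {}
    for letter, ranked in rankings.items():
        # Only the weighted method uses the per-letter weight; missing weights default to 1.
        w = weights.get(letter, 1.0) if method == "weighted" else 1.0
        n = len(ranked)
        for position, source in enumerate(ranked):
            if method == "simple":
                points = 1.0 if position == 0 else 0.0   # only the closest source gets a vote
            elif method in ("rank", "weighted"):
                points = w * (n - position)             # Borda-style: closer sources score more
            else:
                raise ValueError(f"unknown method: {method}")
            scores[source] = scores.get(source, 0.0) + points
    # Highest score first; ties broken alphabetically so the output is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

With a simple vote, a source that is a close second for every letter can still lose to one that wins a few letters outright; rank voting rewards that consistency, and weights let more discriminative letters count for more.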
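filterManuscripts.m, identifyRepresentativeSample.m, and identifyRepresentativeFeatureSample.m can be sketched the same way. "Most representative" is not defined in the notes. This Python sketch assumes it means the medoid, the sample with the smallest total Euclidean distance to all the others. It also assumes all feature vectors have the same length, which math.dist requires. The names are hypothetical, and indices are 0-based, unlike MATLAB's.

```python
import math
from typing import List, Sequence, Tuple


def filter_manuscripts(features: Sequence[Sequence[float]], sources: Sequence[str],
                       keep: Sequence[str]) -> Tuple[List[Sequence[float]], List[str]]:
    """Keep only the features (and matching sources) whose source is in `keep`."""
    keep_set = set(keep)
    pairs = [(f, s) for f, s in zip(features, sources) if s in keep_set]
    return [f for f, _ in pairs], [s for _, s in pairs]


def representative_sample(features: Sequence[Sequence[float]]) -> Tuple[Sequence[float], int]:
    """Return the medoid feature vector and its (0-based) index."""
    best_index = min(range(len(features)),
                     key=lambda i: sum(math.dist(features[i], other) for other in features))
    return features[best_index], best_index
```

The medoid is always one of the actual samples, which suits a "representative sample" page better than a mean vector that may not correspond to any real character.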

Functions in progress:

• cleanSamples.m cleans character samples passed in as a cell array. The cleaning method may be specified as ccTouch, selectLetters, predefined, or predefinedFromFile. With the selectLetters method, users can save the generated masks to a given directory (the default, an empty string, saves nothing) so that the predefined method can later reuse the most recently saved masks. Default settings: method = selectLetters; mask = '' (no mask); saveMasks = '' (don't save)
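The option handling described for cleanSamples.m can be sketched in Python as follows. The four method names and the defaults come from the notes. Everything else is an assumption: the CleanOptions class, the saves_masks rule (masks are saved only by selectLetters), and the helper latest_mask_file with its "*.mat" pattern, which picks the most recently modified mask file as "the latest masks saved".

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

CLEANING_METHODS = ("ccTouch", "selectLetters", "predefined", "predefinedFromFile")


@dataclass
class CleanOptions:
    method: str = "selectLetters"
    mask: str = ""        # '' means no mask
    save_masks: str = ""  # '' means don't save; otherwise a directory for the masks

    def __post_init__(self) -> None:
        if self.method not in CLEANING_METHODS:
            raise ValueError(f"unknown cleaning method: {self.method}")

    @property
    def saves_masks(self) -> bool:
        # Assumption: only selectLetters generates masks, so only it can save them.
        return self.method == "selectLetters" and bool(self.save_masks)


def latest_mask_file(directory: str, pattern: str = "*.mat") -> Optional[Path]:
    """Most recently modified mask file in `directory` (hypothetical file pattern)."""
    candidates = list(Path(directory).glob(pattern))
    return max(candidates, key=lambda p: p.stat().st_mtime) if candidates else None
```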

Week 5: 6/9 - 6/13

• Display raw manuscripts on the Representative Samples webpage
• Functions written/revised:
• processSyriacAnnotations.m: This function processes Syriac annotations by creating a struct array of all given samples with their source information and images. The user can add padding to the extracted samples and choose which image formats to include ('raw', 'binarized', 'grayscale', 'blackOnWhite') by passing true or false after each format name.
@param dbFile a database text file containing annotations
@param varargin user can specify the manuscript directory, the padding in pixels added to the image coordinates, and which image formats (raw, binarized, grayscale, blackOnWhite) are stored as fields
defaults: raw = true,
binarized = false,
grayscale = false,
blackOnWhite = false,
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns a struct array of character sample source information and images, with fields: source, page, url, letter, coordinates, plus one field per requested image format
ex. sample(1).source = 'VatSyr1'
ex. sample(1).page = '01'
ex. sample(1).url = 'http://.../VatSyr1-01.png'
ex. sample(1).letter = 'Alaph'
ex. sample(1).coordinates = [515 65 80 105]
ex. sample(1).raw = %raw image
ex. sample(1).binarized = %binarized image
• getSampleSources.m: This function reads a database file and returns a struct array (samples) of the character sample information it contains.
@param dbFile The database file containing info about the character samples
ex. 'C:\MATLAB\Handwriting\Summer2014\FullDB-2014-06-09.txt'
@returns a struct array of character sample source information with fields: source, page, url, letter, coordinates
ex. sample(1).source = 'VatSyr1'
ex. sample(1).page = '01'
ex. sample(1).url = 'http://.../VatSyr1-01.png'
ex. sample(1).letter = 'Alaph'
ex. sample(1).coordinates = [515 65 80 105]
• getSamples.m: This function takes a struct array of source information about character samples, saves each manuscript image into the specified local directory, extracts each character sample, and stores its image in the original struct. If no manuscript is found or a sample does not meet the dimension requirements, nothing is saved into the character sample field(s) for the affected sample(s).
@param samples the struct containing source info about character samples; the struct must have fields: source, url, letter, coordinates
@param directory the local directory where the manuscript images are saved
@param varargin user can specify the padding in pixels added to the image coordinates, the output format(s) of the images (raw, binarized, grayscale, blackOnWhite), and the directory where manuscripts will be retrieved and downloaded
default: raw = true,
binarized = false,
grayscale = false,
blackOnWhite = false,
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns the passed in struct with new field(s) containing the corresponding character sample(s)
• retrieveManuscriptImage.m: This function retrieves the desired manuscript(s)/image(s) specified in the given scalar struct recursively and saves the manuscript(s) into a local directory. If no manuscript is found online, returns NaN. The user may specify the output format: raw or binarized.
@param sample a scalar struct containing only the info about the desired sample image. The struct must have fields: letter, source, page, url, coordinates
@param varargin user can specify the output format (raw or binarized) and the local directory where the image files will be saved
default: outputFormat = raw
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns the desired manuscript image, or NaN if none is found
• extractSampleFromManuscript.m: This function extracts a character (subimage) from a larger manuscript image. Any sample whose coordinates yield an image larger than the specified maximum number of pixels is downscaled, and any sample smaller than the specified minimum is not returned.
@param sampleInfo One layer of the original sample struct containing only the info about the desired sample image. The struct must have fields: letter, source, page, url, coordinates
ex. sample(3)
@param manuscript the image file of the desired manuscript
@param varargin user can specify the padding added to the image coordinates, the minimum and maximum number of pixels in the sample, and whether the manuscript is binarized, which sets the fill color used where the sample extends beyond the boundaries of the manuscript page
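The varargin handling of processSyriacAnnotations.m (name/value pairs overriding defaults) can be sketched in Python. The format names and the directory default come from the notes. The padding default of 0 is an assumption because the notes list none. The helper names parse_name_value_pairs and requested_formats are hypothetical.

```python
from typing import Any, Dict, List, Sequence

IMAGE_FORMATS = ("raw", "binarized", "grayscale", "blackOnWhite")

DEFAULTS: Dict[str, Any] = {
    "raw": True,
    "binarized": False,
    "grayscale": False,
    "blackOnWhite": False,
    "padding": 0,  # assumption: the notes do not list a padding default
    "directory": r"C:\MATLAB\Handwriting\Summer2014\ImageDirectory",
}


def parse_name_value_pairs(args: Sequence[Any],
                           defaults: Dict[str, Any] = DEFAULTS) -> Dict[str, Any]:
    """Mimic MATLAB varargin name/value parsing: start from defaults, override by name."""
    if len(args) % 2:
        raise ValueError("options must come in name/value pairs")
    opts = dict(defaults)  # copy so the shared defaults are never mutated
    for name, value in zip(args[0::2], args[1::2]):
        if name not in opts:
            raise ValueError(f"unknown option: {name}")
        opts[name] = value
    return opts


def requested_formats(opts: Dict[str, Any]) -> List[str]:
    """Image formats switched on, i.e. the image fields each sample struct receives."""
    return [fmt for fmt in IMAGE_FORMATS if opts[fmt]]
```

For example, passing ["binarized", True, "padding", 5] keeps raw on by default, so each sample would receive both a raw and a binarized field.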
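getSampleSources.m can be sketched in Python. The layout of the database text file is not documented in these notes. This sketch assumes one tab-separated record per line, with columns source, page, url, letter, and the four coordinate values, and it treats lines starting with "#" as comments. Both assumptions should be checked against the real file. SampleSource, parse_db_lines, and get_sample_sources are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SampleSource:
    source: str             # e.g. 'VatSyr1'
    page: str               # kept as text so '01' keeps its leading zero
    url: str
    letter: str             # e.g. 'Alaph'
    coordinates: List[int]  # e.g. [515, 65, 80, 105]


def parse_db_lines(lines: Iterable[str]) -> List[SampleSource]:
    """Parse assumed tab-separated records into SampleSource entries."""
    samples = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        source, page, url, letter, *coords = line.split("\t")
        if len(coords) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(coords)}: {line!r}")
        samples.append(SampleSource(source, page, url, letter, [int(c) for c in coords]))
    return samples


def get_sample_sources(db_file: str) -> List[SampleSource]:
    """Read the database file and return one entry per annotated character sample."""
    with open(db_file, encoding="utf-8") as fh:
        return parse_db_lines(fh)
```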
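The getSamples.m loop can be sketched in Python, assuming the download and extraction steps are passed in as functions. The fetch and extract callables are hypothetical stand-ins for retrieveManuscriptImage.m and extractSampleFromManuscript.m. None plays the role of an empty MATLAB field. Caching manuscripts by URL is a design choice of this sketch, not something the notes state: many samples come from the same page, so each page is fetched only once.

```python
from typing import Any, Callable, Dict, List, Optional


def get_samples(samples: List[dict],
                fetch: Callable[[str], Optional[Any]],
                extract: Callable[[Any, List[int]], Optional[Any]],
                field: str = "raw") -> List[dict]:
    """Fill `field` of each sample with its extracted image, or None if unavailable."""
    cache: Dict[str, Optional[Any]] = {}
    for sample in samples:
        url = sample["url"]
        if url not in cache:
            cache[url] = fetch(url)  # each manuscript page is fetched at most once
        manuscript = cache[url]
        # Missing manuscript or rejected sample: leave the field empty (None).
        sample[field] = None if manuscript is None else extract(manuscript, sample["coordinates"])
    return samples
```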
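The retrieve-and-save step of retrieveManuscriptImage.m can be sketched with Python's standard urllib. Unlike the MATLAB function, this sketch returns the local file path rather than a decoded image, and it returns None where the MATLAB code returns NaN. Reusing an already-downloaded file and naming the local copy after the last URL component are assumptions. retrieve_manuscript is a hypothetical name.

```python
import os
import shutil
import urllib.request
from typing import Optional


def retrieve_manuscript(url: str, directory: str) -> Optional[str]:
    """Download the manuscript image into `directory` (if not already there); None if unavailable."""
    os.makedirs(directory, exist_ok=True)
    local_path = os.path.join(directory, os.path.basename(url))
    if os.path.exists(local_path):
        return local_path  # already saved locally; skip the download
    try:
        with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
            shutil.copyfileobj(response, out)
    except OSError:  # urllib.error.URLError is a subclass of OSError
        if os.path.exists(local_path):
            os.remove(local_path)  # don't leave a partial file behind
        return None
    return local_path
```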
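extractSampleFromManuscript.m can be sketched in plain Python with images as lists of pixel rows. Several details here are assumptions, not taken from the notes. Coordinates are read as [x y width height], 1-based, as in MATLAB's imcrop rectangle. The minimum and maximum pixel limits are applied to the padded area. Oversized samples are shrunk by nearest-neighbour sampling. The fill value is 1 for a binarized page (white background) and 255 for an 8-bit page. extract_sample is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence

Image = List[List[int]]


def extract_sample(manuscript: Image, coords: Sequence[int], padding: int = 0,
                   min_pixels: int = 0, max_pixels: Optional[int] = None,
                   binarized: bool = False) -> Optional[Image]:
    """Crop coords = [x, y, width, height] (1-based) plus padding from a manuscript image."""
    x, y, w, h = coords
    fill = 1 if binarized else 255  # assumed white background value
    top, left = y - 1 - padding, x - 1 - padding  # convert to 0-based and apply padding
    out_h, out_w = h + 2 * padding, w + 2 * padding
    if out_h * out_w < min_pixels:
        return None  # too small: nothing is returned
    rows, cols = len(manuscript), len(manuscript[0])
    # Pixels outside the page are replaced with the fill color.
    sample = [[manuscript[r][c] if 0 <= r < rows and 0 <= c < cols else fill
               for c in range(left, left + out_w)]
              for r in range(top, top + out_h)]
    if max_pixels is not None and out_h * out_w > max_pixels:
        # Nearest-neighbour downscale so the area fits within max_pixels.
        scale = math.sqrt(max_pixels / (out_h * out_w))
        new_h, new_w = max(1, int(out_h * scale)), max(1, int(out_w * scale))
        sample = [[sample[r * out_h // new_h][c * out_w // new_w] for c in range(new_w)]
                  for r in range(new_h)]
    return sample
```

For instance, with a 4×4 page and coords [1 1 1 1] plus 1 pixel of padding, the top row and left column of the 3×3 result fall outside the page and come back as the fill color.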