Common Schedule

From CSclasswiki
Revision as of 17:00, 24 June 2014 by Rhuang (Week6: 6/16 - 6/20)

Alice Yang & Rui Huang


Week 1: 5/12 - 5/16

- Set up computer; install software
- Familiarize myself with Matlab, having no previous experience
- Create Matlab exercise Q/A
- Become familiar with Adriane's progress and Emma's thesis, "Automated Writer Identification for Syriac Scribes"

Week 2: 5/19 - 5/23

In Adriane's function processSyriacAnnotations.m, we saved the two structure arrays lftr and ldoc. Both contain 14 fields, each corresponding to a different letter of the Syriac alphabet. Each field is a cell array of varying length, in which each cell describes one sample of that letter found in a manuscript. In lftr, the first two entries of a sample's feature vector give the character's global skew, and the remaining entries, in groups of 7, describe each feature/cavity's 7 transformation parameters (translation, rotation, elongation, and shear). At the same index in ldoc, a string names the source manuscript the sample came from.
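
The layout above can be sketched in Python. This is not the project's MATLAB code. It assumes each cell of lftr holds one flat numeric feature vector, and it uses a dict keyed by letter name as a stand-in for the MATLAB structure array. split_feature_vector and samples_from are hypothetical helpers. The first separates the 2 global-skew entries from the 7-parameter groups. The second pairs lftr with ldoc by index to pick out one manuscript's samples.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the MATLAB structure arrays lftr and ldoc:
# lftr[letter][i] is one sample's feature vector and ldoc[letter][i]
# names the manuscript that sample came from.
FeatureVector = List[float]

SKEW_LEN = 2    # first two entries: the character's global skew
GROUP_LEN = 7   # each feature/cavity is described by 7 transformation parameters


def split_feature_vector(vec: FeatureVector) -> Tuple[List[float], List[List[float]]]:
    """Split one sample's vector into (global skew, list of 7-parameter groups)."""
    if len(vec) < SKEW_LEN or (len(vec) - SKEW_LEN) % GROUP_LEN != 0:
        raise ValueError("expected 2 skew entries followed by groups of 7")
    skew = vec[:SKEW_LEN]
    rest = vec[SKEW_LEN:]
    groups = [rest[i:i + GROUP_LEN] for i in range(0, len(rest), GROUP_LEN)]
    return skew, groups


def samples_from(lftr: Dict[str, List[FeatureVector]],
                 ldoc: Dict[str, List[str]],
                 letter: str, manuscript: str) -> List[FeatureVector]:
    """Return the feature vectors for `letter` whose source (same index in ldoc) is `manuscript`."""
    return [vec for vec, src in zip(lftr[letter], ldoc[letter]) if src == manuscript]
```

Because lftr and ldoc share indices, a sample's features and its source always travel together. In this sketch that relationship is expressed with zip.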

Here are screenshots from Matlab:

  • lftr (screenshot: Lftr.png)
  • lftr.Alaph (screenshot: Lftr.Alaph.png)
  • lftr.Alaph{1,1} (screenshot: Lftr.Alaphcell.png)

Functions written:

  • runSyriacStyleComparison.m compares test features against sample features and ranks all sample sources by their proximity to each test source. It takes 4 fixed parameters (sampFtr, sampSrc, testFtr, testSrc) and 3 modifier parameters (method, weights, letters). The four fixed parameters and the output rank are all structure arrays. The method parameter selects one of three voting functions: simple vote, rank vote, or weighted rank vote. Users can choose the method, the set of weights, and the letters the program uses.
  • filterManuscripts.m filters a set of sample features and their sources and returns a subset of features and sources corresponding to the given subset of sources.
  • identifyRepresentativeFeatureSample.m takes a cell array of sample features and returns the most representative sample feature and its index in the given array.
  • identifyRepresentativeSample.m takes a cell array of samples and returns the most representative sample and its index in the given array.
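
The page does not define "most representative". The Python sketch below assumes the medoid: the sample with the smallest total Euclidean distance to all other samples. It also assumes every feature vector has the same length. filter_manuscripts follows the description of filterManuscripts.m. Both function names and signatures are hypothetical, and the returned index is 0-based, unlike MATLAB's 1-based indexing.

```python
import math
from typing import List, Sequence, Tuple


def filter_manuscripts(features: Sequence[List[float]], sources: Sequence[str],
                       keep: Sequence[str]) -> Tuple[List[List[float]], List[str]]:
    """Keep only the features (and their sources) whose source is in `keep`."""
    wanted = set(keep)
    pairs = [(f, s) for f, s in zip(features, sources) if s in wanted]
    return [f for f, _ in pairs], [s for _, s in pairs]


def identify_representative_sample(features: Sequence[List[float]]) -> Tuple[List[float], int]:
    """Return (medoid, index): the sample with the smallest summed distance to all samples.

    Assumes equal-length vectors; math.dist raises ValueError otherwise.
    """
    if not features:
        raise ValueError("need at least one sample")
    totals = [sum(math.dist(a, b) for b in features) for a in features]
    best = min(range(len(features)), key=totals.__getitem__)
    return features[best], best
```

A medoid is always one of the actual samples, unlike a mean vector. That fits a function that must return an existing sample and its index.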

Functions in progress:

  • cleanSamples.m cleans samples of characters passed in as a cell array. The cleaning method may be specified: ccTouch, selectLetters, predefined, or predefinedFromFile. With the selectLetters method, users can save the generated masks to a given directory (the default, an empty string, saves nothing) so they can be reused later by the predefined method, which uses the most recently saved masks. Default settings: method = selectLetters; mask = '' (no mask); saveMasks = '' (don't save).
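
The cleanSamples.m options above, like the Week 5 functions, follow MATLAB's name/value-pair convention with defaults. The Python sketch below illustrates that convention using the cleanSamples defaults listed above. parse_options and clean_sample_options are hypothetical names. The sketch only resolves options; it performs no cleaning.

```python
from typing import Any, Dict

# Defaults as listed for cleanSamples.m; an empty string means "none".
CLEAN_DEFAULTS: Dict[str, Any] = {
    "method": "selectLetters",
    "mask": "",        # no mask
    "saveMasks": "",   # don't save; otherwise a directory for later 'predefined' use
}

VALID_METHODS = {"ccTouch", "selectLetters", "predefined", "predefinedFromFile"}


def parse_options(defaults: Dict[str, Any], *varargin: Any) -> Dict[str, Any]:
    """Merge MATLAB-style ('name', value, ...) pairs into a copy of `defaults`."""
    if len(varargin) % 2 != 0:
        raise ValueError("options must come in name/value pairs")
    opts = dict(defaults)
    for name, value in zip(varargin[::2], varargin[1::2]):
        if name not in opts:
            raise ValueError(f"unknown option: {name!r}")
        opts[name] = value
    return opts


def clean_sample_options(*varargin: Any) -> Dict[str, Any]:
    """Resolve cleanSamples-style options and check the cleaning method name."""
    opts = parse_options(CLEAN_DEFAULTS, *varargin)
    if opts["method"] not in VALID_METHODS:
        raise ValueError(f"unknown cleaning method: {opts['method']!r}")
    return opts
```

For example, clean_sample_options('method', 'predefined', 'saveMasks', 'masks/') overrides two defaults and leaves mask as the empty string.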

Week 3 & 4: 5/26 - 6/6

Please refer to Rui's page

Week 5: 6/9 - 6/13

  • Display raw manuscripts on the Representative Samples webpage
  • Functions written/revised:
    • processSyriacAnnotations.m: This function processes Syriac annotations by creating a struct array of all given samples with their source information and images. The user can add padding to the extracted samples and choose which data formats to store ('raw', 'binarized', 'grayscale', 'blackOnWhite') by passing true or false after each format name.
@param dbFile a database text file containing annotations
@param varargin user can specify the directory of manuscripts, the padding in pixels added to the image coordinates, and which image formats (raw, binarized, grayscale, blackOnWhite) to store as fields
ex: 'padding', 3, 'raw', true
defaults: raw = true,
binarized = false,
grayscale = false,
blackOnWhite = false,
padding = 0,
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns a struct array of character sample source information and images with fields: source, page, url, letter, coordinates, plus one field per requested image format
ex. sample(1).source = 'VatSyr1'
ex. sample(1).page = '01'
ex. sample(1).url = 'http://.../VatSyr1-01.png'
ex. sample(1).letter = 'Alaph'
ex. sample(1).coordinates = [515 65 80 105]
ex. sample(1).raw = %raw image
ex. sample(1).binarized = %binarized image
    • getSampleSources.m: This function returns a struct array (samples) of the data read from a database file.
@param dbFile The database file containing info about the character samples
ex. 'C:\MATLAB\Handwriting\Summer2014\FullDB-2014-06-09.txt'
@returns a struct array of character sample source information with fields: source, page, url, letter, coordinates
ex. sample(1).source = 'VatSyr1'
ex. sample(1).page = '01'
ex. sample(1).url = 'http://.../VatSyr1-01.png'
ex. sample(1).letter = 'Alaph'
ex. sample(1).coordinates = [515 65 80 105]
    • getSamples.m: This function takes in a struct containing source information about character samples, and saves the manuscript image file into the specified local directory, retrieves each character sample and stores the image into the original struct. If no manuscript is found or the sample dimension requirements are not met, nothing is saved into the character sample field(s) for that source.
@param samples the struct containing source info about character samples; the struct must have fields: source, url, letter, coordinates
@param directory the local directory where the manuscript image files are saved
@param varargin user can specify the padding in pixels added to the image coordinates, and the output format (raw, binarized, grayscale, blackOnWhite) of the images and directory where manuscripts will be retrieved and downloaded
ex: 'padding', 3, 'raw', true
default: raw = true,
binarized = false,
grayscale = false,
blackOnWhite = false,
padding = 0,
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns the passed in struct with new field(s) containing the corresponding character sample(s)
    • retrieveManuscriptImage.m: This function retrieves the desired manuscript(s)/image(s) specified in the given scalar struct recursively and saves the manuscript(s) into a local directory. If no manuscript is found online, returns NaN. The user may specify the output format: raw or binarized.
@param sample a scalar struct containing only the info about the desired sample image. The struct must have fields: letter, source, page, url, coordinates
@param varargin user can specify the output format (raw or binarized) and the local directory where the image files will be saved
default: outputFormat = raw
directory = 'C:\MATLAB\Handwriting\Summer2014\ImageDirectory'
@returns desired manuscript image, if none is found, returns NaN
    • extractSampleFromManuscript.m: This function extracts a character/subimage from a larger manuscript/image. Any sample whose coordinates yield an image larger than the specified maximum pixels is downscaled, and any sample whose image would be smaller than the specified minimum pixels is not returned.
@param sampleInfo One layer of the original sample struct containing only the info about the desired sample image. The struct must have fields: letter, source, page, url, coordinates
ex. sample(3)
@param manuscript the image file of the desired manuscript
@param varargin user can specify the padding added to the image coordinates, the minimum pixels of the sample and the maximum pixels of the sample, and can pass in whether or not the manuscript is binarized, which updates the fill color if the sample exceeds the boundaries of the manuscript page
ex. 'padding',3,'binarized',true,'minpixels',50,'maxpixels',100
@returns the extracted sample image; if the sample coordinates yield an image smaller than the minimum pixels, or an image that is all one color, returns NaN
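
The cropping rules described for extractSampleFromManuscript.m can be sketched in pure Python. This is a sketch under several assumptions, not the project's MATLAB code:
- coordinates are [x, y, width, height] with 0-based indices;
- minpixels applies to each side of the padded crop;
- pixels outside the page are filled with white (1 if binarized, else 255);
- "downscaled" means nearest-neighbour resampling so the longer side fits in maxpixels;
- None stands in for MATLAB's NaN, and extract_sample is a hypothetical name.

```python
from typing import List, Optional, Sequence

Image = List[List[int]]  # rows of pixel values; stand-in for a MATLAB image matrix


def extract_sample(manuscript: Image, coords: Sequence[int], padding: int = 0,
                   binarized: bool = False, minpixels: int = 0,
                   maxpixels: Optional[int] = None) -> Optional[Image]:
    """Crop coords = [x, y, width, height] (assumed 0-based) plus padding.

    Returns None (standing in for NaN) when the padded crop is smaller than
    `minpixels` on either side or is a single color.
    """
    x, y, w, h = coords
    x0, y0 = x - padding, y - padding
    w, h = w + 2 * padding, h + 2 * padding
    if w < minpixels or h < minpixels:
        return None
    # Assumed fill color for pixels outside the page: white in either format.
    fill = 1 if binarized else 255
    n_rows = len(manuscript)
    n_cols = len(manuscript[0]) if n_rows else 0
    crop = [[manuscript[r][c] if 0 <= r < n_rows and 0 <= c < n_cols else fill
             for c in range(x0, x0 + w)]
            for r in range(y0, y0 + h)]
    if len({v for row in crop for v in row}) <= 1:
        return None  # empty or all one color
    if maxpixels is not None and max(w, h) > maxpixels:
        # Nearest-neighbour downscale so the longer side fits in `maxpixels`.
        scale = maxpixels / max(w, h)
        new_w, new_h = max(1, int(w * scale)), max(1, int(h * scale))
        crop = [[crop[int(r / scale)][int(c / scale)] for c in range(new_w)]
                for r in range(new_h)]
    return crop
```

Filling out-of-page pixels with the background color keeps a sample near the page edge at its requested size without inventing ink. This mirrors the fill-color behavior described in the varargin entry.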

Week 6: 6/16 - 6/20

  • Run processSyriacAnnotations.m to generate a struct array of samples with 7 fields: letter, source, page, url, coordinates, raw, and binarized (screenshot: Samples2.png).

  • Functions written:
    • compareSamples.m: This function takes two samples (or two struct arrays of samples) and returns a number (or matrix) describing how similar they are.
@param sample1 a struct containing the first character or set of characters to compare
@param sample2 a struct containing the second character or set of characters to compare
@param method the method used to compare the samples: chamferDistance, congealFeatures, or inkballDifference
@param varargin
if user selects a method involving congealing features, user can specify which distance calculation to use
options: 'Euclidean', 'Manhattan', or 'EarthMovers'
default: congealMethod = 'Euclidean'
ex: 'congealMethod', 'Manhattan'
user can pass in cell arrays of the sources along the rows and columns for the current letter
default: 'rowSources' = cell(1); 'colSources' = cell(1);
ex: 'rowSources',letterRowsources,'colSources',letterColsources
@returns
distances a number/matrix of the similarity between two sets of samples
rowSources updated sources along the row after deleting samples with empty feature vectors
colSources updated sources along the column after deleting samples with empty feature vectors
    • voteOnSimilarity.m: This function performs a vote on a matrix of distances/difference scores.
@param simMatrix a matrix of difference scores
@param method the desired voting method: 'simple', 'ranked' or 'weighted'
@param rowSource a cell array specifying the manuscript of the sample in the corresponding row index of the passed-in matrix
@param colSource a cell array specifying the manuscript of the sample in the corresponding column index of the passed-in matrix
@param m1Sources a cell array of all of the manuscripts in the first set of samples to compare
@param m2Sources a cell array of all of the manuscripts in the second set of samples to compare
@param varargin user can specify the weight value of the current letter to be applied in the weighted rank vote
ex: 'weight', .35
default: weight = 0.5
@return a matrix of votes indicating the similarity between two sets of manuscripts; the higher the vote, the more similar the manuscripts
    • compareManuscripts.m: This function compares two sets of manuscripts and rates their similarity relative to the other manuscripts in each set, and returns a value/matrix describing their similarity. The higher the score, the more similar the two manuscripts are.
@param mSet1 a struct array containing samples from a manuscript, or a cell array of struct arrays containing samples from manuscripts; each index in the cell holds samples from a different manuscript
@param mSet2 a struct array containing samples from a manuscript, or a cell array of struct arrays containing samples from manuscripts; each index in the cell holds samples from a different manuscript
@param varargin
user can specify the method to calculate the difference between characters
options: 'chamferDistance', 'congealFeatures' or 'inkballDifference'
default: weight = 0.5
ex: 'diffMethod', 'chamferDistance'
if user selects a method involving congealing features, user can specify which distance calculation to use
options: 'Euclidean', 'Manhattan' or 'EarthMovers'
default: congealMethod = 'Euclidean'
ex: 'congealMethod', 'Manhattan'
user can specify the voting method for calculating which manuscripts are most similar
options: 'simple', 'ranked' or 'weighted'
default: votingMethod = 'ranked'
ex: 'votingMethod', 'simple'
@returns a value/matrix describing the similarities between the two manuscripts/two sets of manuscripts; the higher the number, the more similar the two manuscripts are relative to the other manuscripts in their sets
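
This page does not define the three voting rules of voteOnSimilarity.m. The Python sketch below is one plausible reading, offered as an assumption rather than the authors' method:
- each row sample's closest sample from each column manuscript ranks the column manuscripts;
- 'simple' gives one vote to the top-ranked manuscript;
- 'ranked' gives Borda-style points (n for first place, n-1 for second, and so on);
- 'weighted' scales the ranked points by the letter's weight (default 0.5, as listed above).

distance_matrix covers only the 'Euclidean' and 'Manhattan' options of compareSamples.m, on equal-length feature vectors; 'EarthMovers' is not sketched. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def distance_matrix(set1: Sequence[Sequence[float]], set2: Sequence[Sequence[float]],
                    congeal_method: str = "Euclidean") -> Matrix:
    """Pairwise distances between two sets of equal-length feature vectors."""
    if congeal_method == "Euclidean":
        dist = math.dist
    elif congeal_method == "Manhattan":
        def dist(a, b):
            return sum(abs(p - q) for p, q in zip(a, b))
    else:
        raise NotImplementedError(congeal_method)  # e.g. 'EarthMovers'
    return [[dist(a, b) for b in set2] for a in set1]


def vote_on_similarity(sim: Matrix, row_sources: Sequence[str], col_sources: Sequence[str],
                       m1_sources: Sequence[str], m2_sources: Sequence[str],
                       method: str = "ranked", weight: float = 0.5) -> Matrix:
    """votes[i][j]: support for m1_sources[i] resembling m2_sources[j] (higher = more similar)."""
    r_idx = {m: i for i, m in enumerate(m1_sources)}
    c_idx = {m: j for j, m in enumerate(m2_sources)}
    votes = [[0.0] * len(m2_sources) for _ in m1_sources]
    for row, src in zip(sim, row_sources):
        i = r_idx[src]
        # Closest sample from each column manuscript to this row sample.
        best: Dict[str, float] = {}
        for d, csrc in zip(row, col_sources):
            best[csrc] = min(d, best.get(csrc, math.inf))
        ranked = sorted(best, key=best.__getitem__)  # most similar manuscript first
        if method == "simple":
            votes[i][c_idx[ranked[0]]] += 1
        elif method in ("ranked", "weighted"):
            scale = weight if method == "weighted" else 1.0
            for k, csrc in enumerate(ranked):
                votes[i][c_idx[csrc]] += scale * (len(ranked) - k)
        else:
            raise ValueError(f"unknown voting method: {method!r}")
    return votes
```

Taking each column manuscript's closest sample means one stray sample cannot drag a whole manuscript down the ranking. A mean over samples would be an equally reasonable choice.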

Week 7: 6/23 - 6/27