Sunday, October 2, 2016

Homework 2 - Segmentation

Overview
Following our previous assignment on feature extraction and classification, we now move on to additional sketch-specific operations.  Over the next two assignments, we will cover stroke segmentation (corner finding) and a couple of simple recognition algorithms.  Homework 2 focuses on segmentation.

Instructions
Languages and Data
While there is no "required" language for this assignment, it is strongly recommended that you use JavaScript.  The primary reason is ready access to the data.  As we move from a couple hundred files of small, single-stroke sketches to more complicated tasks, we will be dealing with thousands of sketches of substantial size, each with multiple strokes and shapes.  Rather than downloading large datasets to your local computer, all data for this homework is provided via the Sketch Recognition Library database API (srlib_db), which was an optional data source in the first homework.

The data handlers for srlib_db are available in JavaScript via srlib_js, which is included in a limited form in the Starter code available on the class drive under "Homework 2".  The documentation for srlib_js is also included; it is a JSDoc directory, so download it and open it in a web browser.

The specification of SketchML::JSON, which is the format of all sketches on srlib_db, is also available in the homework's folder.  If you wish to use another language, that will be a useful reference, and any language that handles JSON-like objects natively will be easy to adapt.  In that case, make an HTTP GET request to http://srl-prod1.cs.tamu.edu:7750/getSketches?domain=MechanixCleaned&interpretation=force to retrieve the data.  That particular request simply grabs the first 50 sketches, while the Starter code grabs 200 between 4000 and 4200.  The exact numbers do not matter; the main goal is to collect a pool of data to use with your segmentation algorithm.
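For those not using srlib_js, the request above could be issued along the following lines.  This is only a sketch: the helper names buildSketchUrl and fetchSketches are hypothetical, it assumes a runtime with a global fetch (a modern browser or Node 18+), and it assumes the response body parses as JSON, as the SketchML::JSON format suggests.  Only the two query parameters shown in the URL above are used.

```javascript
// Base endpoint taken from the assignment text.
const SRLIB_DB_BASE = 'http://srl-prod1.cs.tamu.edu:7750';

// Hypothetical helper: build the getSketches URL from a plain object
// of query parameters.
function buildSketchUrl(params) {
  const url = new URL('/getSketches', SRLIB_DB_BASE);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Hypothetical helper: fetch and parse the sketches.
// Assumption: the server answers with a JSON body.
async function fetchSketches(params) {
  const response = await fetch(buildSketchUrl(params));
  if (!response.ok) {
    throw new Error(`getSketches failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Calling fetchSketches({ domain: 'MechanixCleaned', interpretation: 'force' }) returns a promise for the parsed response; consult the SketchML::JSON specification for the structure of what comes back.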

Segmentation

The goal of the assignment is that, given a set of raw strokes, you can segment them into a set of substrokes.  You may implement any corner finding / segmentation algorithm you would like.  This can be an algorithm from class, a paper, or one of your own creation.  A few points will be awarded based on the creativity of your algorithm.
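As one possible starting point (not a required or recommended method), here is a minimal sketch that treats the vertices kept by Ramer-Douglas-Peucker polyline simplification as corners and splits the stroke there.  It assumes each stroke is an array of points shaped like { x, y }; that shape, the function names, and the epsilon tolerance are illustrative assumptions and not the SketchML::JSON field names.

```javascript
// Perpendicular distance from point p to the line through a and b.
function perpendicularDistance(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const len = Math.hypot(dx, dy);
  if (len === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  return Math.abs(dy * (p.x - a.x) - dx * (p.y - a.y)) / len;
}

// Ramer-Douglas-Peucker: return the sorted indices of the points kept
// when simplifying the stroke with tolerance epsilon (in the same units
// as the point coordinates). Endpoints are always kept.
function findCornerIndices(points, epsilon) {
  if (points.length < 3) return points.map((_, i) => i);
  const keep = new Set([0, points.length - 1]);
  const stack = [[0, points.length - 1]];
  while (stack.length > 0) {
    const [first, last] = stack.pop();
    let maxDist = 0;
    let maxIndex = -1;
    for (let i = first + 1; i < last; i++) {
      const d = perpendicularDistance(points[i], points[first], points[last]);
      if (d > maxDist) {
        maxDist = d;
        maxIndex = i;
      }
    }
    // Keep the farthest point and recurse on both halves only if it
    // deviates from the chord by more than epsilon.
    if (maxIndex !== -1 && maxDist > epsilon) {
      keep.add(maxIndex);
      stack.push([first, maxIndex], [maxIndex, last]);
    }
  }
  return [...keep].sort((a, b) => a - b);
}

// Split a stroke into substrokes at the given corner indices.
// Adjacent substrokes share their corner point.
function splitAtCorners(points, cornerIndices) {
  const substrokes = [];
  for (let i = 0; i + 1 < cornerIndices.length; i++) {
    substrokes.push(points.slice(cornerIndices[i], cornerIndices[i + 1] + 1));
  }
  return substrokes;
}
```

The result is sensitive to epsilon: too small and hand jitter produces spurious corners, too large and shallow corners are missed.  Resampling the stroke or scaling epsilon by the stroke's bounding box are common ways to make such a threshold less dependent on drawing size.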

Try to make an algorithm which performs fairly well on the data set.  All of the sketches in srlib_db for this domain (MechanixCleaned) feature a set of raw strokes and a set of segmented substrokes.  The substrokes are provided so you can compare them against your algorithm's output.  There are thousands of sketches, so you will not be able to use the provided substrokes to hardcode a solution.  Furthermore, some of the sketches are imperfect, with the occasional dropped stroke and mis-identified substroke, because they were generated by a sketch recognition software tool (Mechanix, in this case).  As such, you should try to make the best segmentation algorithm you can, knowing that some disagreements will exist between your algorithm and the original dataset.
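For an informal self-check while developing, one option is to count how many of your corners land near a corner in the provided substrokes.  The sketch below is an assumption-laden illustration and is not the evaluation metric for this assignment: the function name, the { x, y } point shape, the greedy nearest-neighbor matching, and the tolerance are all hypothetical choices.

```javascript
// Greedily match each predicted corner to the nearest unused reference
// corner within `tolerance` (same units as the coordinates).
// Returns counts of matched, unmatched predicted (falsePositives),
// and unmatched reference (missed) corners.
function compareCorners(predicted, reference, tolerance) {
  const used = new Array(reference.length).fill(false);
  let matched = 0;
  for (const p of predicted) {
    let bestIndex = -1;
    let bestDist = Infinity;
    for (let i = 0; i < reference.length; i++) {
      if (used[i]) continue;
      const d = Math.hypot(p.x - reference[i].x, p.y - reference[i].y);
      if (d <= tolerance && d < bestDist) {
        bestDist = d;
        bestIndex = i;
      }
    }
    if (bestIndex !== -1) {
      used[bestIndex] = true;
      matched++;
    }
  }
  return {
    matched,
    falsePositives: predicted.length - matched,
    missed: reference.length - matched,
  };
}
```

Because the reference substrokes are themselves imperfect, a disagreement here does not necessarily mean your algorithm is wrong; looking at the sketches with the most mismatches is usually more informative than the raw counts.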

Finally, Mechanix is designed to recognize trusses, so it also segments strokes into substrokes at truss nodes.  You may choose to ignore this form of segmentation for the assignment if you wish to focus on corner finding.  It will not impact you greatly on this homework; however, the subsequent homework will use the same data to implement a recognizer for arrows and trusses, so you may benefit from some additional work now on finding truss nodes as corners.

More details regarding evaluation metrics will be shared soon.  For now, focus on an algorithm which, working only from the raw stroke data, generates substrokes and adds them to a sketch object.

Starter Code

As with Homework 1, Starter code is provided in the directory for Homework 2.  This file already handles loading data and converting it into the appropriate format.  It also visualizes the original sketch data, prefixed "old", and your sketch data, prefixed "new", according to either the strokes or substrokes.  Note that Mechanix sketches may be large and can get cut off on the canvas.

Obtaining Credit
Again, the goal of this assignment is for you to write a corner finding / segmentation algorithm.  You will need to submit all of your source files to the grader.  Also, include a small report describing which algorithm you implemented, the intuition behind how it works, and the most difficult problems you encountered.  If you run the metrics code, include those numbers as well.

Please ZIP all your files together into a single file submission titled "HW2_<last name>_<first initial>.zip".  Since the university may block ZIP files, it is recommended that you upload your file to Google Drive and share the link via email.

More details will follow regarding the evaluation metrics.  Remember that the Mechanix data is imperfect, which will be accounted for during the grading process; however, srlib_db has a large sketch library against which your work may be checked, so you should still strive to write an algorithm which performs well on the supplied data.

Although this assignment seems large because of the size of the data repository, it really only consists of a single algorithm, likely meaning less code than the first homework.  Thus, you will have roughly a week and a half to complete the assignment.

Due Date
Oct. 16, Sunday @ Midnight (extended from Oct. 12)
25% deducted per day late
