FA Database

Annotated database of on-line sketched
diagrams from the finite automata domain.

The Database

This is a database of on-line sketched diagrams from the finite automata (also known as finite state machine) domain. Its purpose is to provide a benchmark for diagram recognition algorithms. It contains 300 diagrams drawn by 25 users. Each user was asked to draw 12 diagrams according to given patterns. The data were collected on a standard Lenovo X61 tablet PC and are stored in the InkML format. Individual strokes contain information about the position of the pen tip, time, and pressure. Annotation of symbols and the relations among them is provided. Additionally, each text block is assigned the meaning of its text, and each arrow has defined connection points.

Download

The database is divided into a training dataset (11 users, 132 diagrams), a validation dataset (7 users, 84 diagrams), and a test dataset (7 users, 84 diagrams). It is free to download.

file: FA_database_1.0.zip

Parsing the Database

The database is stored in the InkML format, so it should be easy to parse. Below we describe how symbols, connection points of arrows, and relations are annotated. The database contains the following symbols:

  • state
  • final state
  • arrow
  • label

Symbols are annotated in the following way. Note that the textMeaning annotation is defined for labels only, and the connectionPointFrom and connectionPointTo annotations are defined for arrows only.

<traceGroup id="[id]">
  <annotation type="truth">[symbol_name]</annotation>
  <annotation type="textMeaning">[text_meaning]</annotation>
  <annotation type="connectionPointFrom">[relative_stroke_id] [point_id]</annotation>
  <annotation type="connectionPointTo">[relative_stroke_id] [point_id]</annotation>
  <traceView traceDataRef="[stroke_1_id]" />
  <traceView traceDataRef="[stroke_2_id]" />
  .
  .
  .
</traceGroup>
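
The symbol annotation above could be read with a few lines of Python. The following is a minimal sketch, not code shipped with the database. It assumes the standard-library xml.etree.ElementTree parser and matches elements by local name, so it should work whether or not a file declares the InkML namespace. Connection points are kept as pairs of strings, because the exact type of [relative_stroke_id] and [point_id] is not stated here. The names parse_symbols and SAMPLE, and the sample document itself, are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


def _point(value):
    """Split '[relative_stroke_id] [point_id]' into a pair of strings."""
    return tuple(value.split()) if value else None


def parse_symbols(inkml_text):
    """Return one dict per <traceGroup> that carries a 'truth' annotation."""
    root = ET.fromstring(inkml_text)
    symbols = []
    for group in root.iter():
        if local_name(group.tag) != "traceGroup":
            continue
        annotations = {}
        strokes = []
        for child in group:
            name = local_name(child.tag)
            if name == "annotation":
                annotations[child.get("type")] = (child.text or "").strip()
            elif name == "traceView":
                strokes.append(child.get("traceDataRef"))
        if "truth" not in annotations:
            continue  # assumption: groups without a symbol name are wrappers
        symbols.append({
            "id": group.get("id"),
            "name": annotations["truth"],
            "text": annotations.get("textMeaning"),              # labels only
            "from": _point(annotations.get("connectionPointFrom")),  # arrows only
            "to": _point(annotations.get("connectionPointTo")),      # arrows only
            "strokes": strokes,
        })
    return symbols


# Hypothetical two-symbol document following the template above.
SAMPLE = """<ink>
  <traceGroup id="s1">
    <annotation type="truth">arrow</annotation>
    <annotation type="connectionPointFrom">0 3</annotation>
    <annotation type="connectionPointTo">0 41</annotation>
    <traceView traceDataRef="t7" />
  </traceGroup>
  <traceGroup id="s2">
    <annotation type="truth">label</annotation>
    <annotation type="textMeaning">a</annotation>
    <traceView traceDataRef="t8" />
  </traceGroup>
</ink>"""

symbols = parse_symbols(SAMPLE)
for symbol in symbols:
    print(symbol["id"], symbol["name"], symbol["strokes"])
```

Annotations that are absent for a given symbol type (for example textMeaning on an arrow) come back as None, which keeps the per-symbol records uniform.
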
Relations are binary (an initial arrow entering a state, or a labeling of a symbol) or ternary (an arrow connecting two symbols). In a binary relation, the third symbol reference is not defined, and the first reference is always the text (for a labeling) or the arrow (for an arrow entering an initial state). In a ternary relation, the first reference is the arrow, the second reference is the symbol the arrow leaves, and the third reference is the symbol the arrow enters. The database contains the following relations:

  • arrow_in
  • arrow_connection
  • text_inside
  • arrow_label

<symbolGroup id="[id]" arity="[arity_num]">
  <annotation type="truth">[relation_name]</annotation>
  <symbolView symbolDataRef="[symbol_1_id]" />
  <symbolView symbolDataRef="[symbol_2_id]" />
  <symbolView symbolDataRef="[symbol_3_id]" />
</symbolGroup>
Note that initial arrows are annotated as symbols in the same way as arrows connecting two symbols. However, they can easily be identified using the information about the relations.
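
As an illustration, the relation groups could be read and the initial arrows picked out roughly as follows. This is a sketch under stated assumptions, not the authors' tooling: it uses the standard-library xml.etree.ElementTree parser, skips symbolView elements whose reference is missing or empty (the undefined third reference of a binary relation), and assumes that arrow_in is the binary relation marking an initial arrow entering a state, which the relation list suggests but does not state outright. The names parse_relations, initial_arrows, and SAMPLE are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.w3.org/2003/InkML}'."""
    return tag.rsplit("}", 1)[-1]


def parse_relations(inkml_text):
    """Return (relation_name, [symbol ids in order]) per <symbolGroup>."""
    root = ET.fromstring(inkml_text)
    relations = []
    for group in root.iter():
        if local_name(group.tag) != "symbolGroup":
            continue
        name = None
        refs = []
        for child in group:
            tag = local_name(child.tag)
            if tag == "annotation" and child.get("type") == "truth":
                name = (child.text or "").strip()
            elif tag == "symbolView":
                ref = child.get("symbolDataRef")
                if ref:  # skip an undefined third reference
                    refs.append(ref)
        relations.append((name, refs))
    return relations


def initial_arrows(relations):
    """Ids of arrows that are the first reference of an arrow_in relation.

    Assumption: arrow_in is the binary 'initial arrow enters a state' relation.
    """
    return [refs[0] for name, refs in relations if name == "arrow_in" and refs]


# Hypothetical document with one initial arrow (a1) and one ordinary arrow (a2).
SAMPLE = """<ink>
  <symbolGroup id="r1" arity="2">
    <annotation type="truth">arrow_in</annotation>
    <symbolView symbolDataRef="a1" />
    <symbolView symbolDataRef="s1" />
  </symbolGroup>
  <symbolGroup id="r2" arity="3">
    <annotation type="truth">arrow_connection</annotation>
    <symbolView symbolDataRef="a2" />
    <symbolView symbolDataRef="s1" />
    <symbolView symbolDataRef="s2" />
  </symbolGroup>
  <symbolGroup id="r3" arity="2">
    <annotation type="truth">arrow_label</annotation>
    <symbolView symbolDataRef="l1" />
    <symbolView symbolDataRef="a2" />
  </symbolGroup>
</ink>"""

relations = parse_relations(SAMPLE)
initial = initial_arrows(relations)
print(initial)
```

If the relation names turn out to be used differently in the actual files, the same idea works by testing the number of references instead: an arrow that is the first reference of a two-reference relation, and never of a three-reference one, is an initial arrow.
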