Extracting Structured Data from Web Pages: Experiments



Collection Name: Baseball players' list
Source: RoadRunner Project
Number of Pages: 10
Automatically Extracted Template: template.xml
Automatically Extracted Schema: schema.xml
Manually Deduced Schema: manschema.txt
Equivalence Classes: eq.cls
Source Pages and Their Automatically Extracted Values
Page Source    Extracted Value
1.html         value
2.html         value
3.html         value
4.html         value
5.html         value
6.html         value
7.html         value
8.html         value
9.html         value
10.html        value