Extracting Structured Data from Web Pages: Experiments

Collection Name: Amazon Pop Artist
Source: RoadRunner Project
Number of Pages: 19
Automatically Extracted Template: template.xml
Automatically Extracted Schema: schema.xml
Manually Deduced Schema: manschema.txt
Equivalence Classes: eq.cls
Source Pages and Their Automatically Extracted Values
Page Source    Extracted Value
1.html         value
2.html         value
3.html         value
4.html         value
5.html         value
6.html         value
7.html         value
8.html         value
9.html         value
10.html        value
11.html        value
12.html        value
13.html        value
14.html        value
15.html        value
16.html        value
17.html        value
18.html        value
19.html        value