Extracting Structured Data from Web Pages: Experiments



Collection Name: Amazon Cars
Source: RoadRunner Project
Number of Pages: 21
Automatically Extracted Template: template.xml
Automatically Extracted Schema: schema.xml
Manually Deduced Schema: manschema.txt
Equivalence Classes: eq.cls
Source Pages and Their Automatically Extracted Values
Page Source Extracted Value
1.html value
2.html value
3.html value
4.html value
5.html value
6.html value
7.html value
8.html value
9.html value
10.html value
11.html value
12.html value
13.html value
14.html value
15.html value
16.html value
17.html value
18.html value
19.html value
20.html value
21.html value