Sources of data
Linked data
RDF vocabularies and ontologies used
- Public Contracts Ontology (further abbreviated with the prefix pc:)
- Schema.org
- GoodRelations
- Dublin Core Terms
- Simple Knowledge Organization System
- VCard
- Asset Description Metadata Schema
The data is provided in the Turtle serialization of RDF, compressed with gzip.
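Because the files are gzipped Turtle, it can be handy to check which prefixes (such as pc:) a file declares before loading millions of triples into an RDF library. The following is a minimal sketch using only the Python standard library. The function name `read_prefixes`, the `max_lines` cutoff and the file name in the usage note are hypothetical, and the line-based regular expression is a heuristic that assumes one prefix declaration per line; it is not a Turtle parser.

```python
import gzip
import re

# Heuristic: matches Turtle "@prefix pc: <iri> ." and SPARQL-style "PREFIX pc: <iri>"
# declarations that sit on a single line. This is an assumption about file layout.
PREFIX_RE = re.compile(
    r"^\s*(?:@prefix|PREFIX)\s+([^:\s]*):\s*<([^>]*)>", re.IGNORECASE
)


def read_prefixes(path, max_lines=1000):
    """Return {prefix: namespace IRI} found in the first max_lines lines."""
    prefixes = {}
    # "rt" decompresses on the fly and yields text lines; Turtle is UTF-8.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if line_number >= max_lines:
                break
            match = PREFIX_RE.match(line)
            if match:
                prefixes[match.group(1)] = match.group(2)
    return prefixes
```

A call such as `read_prefixes("training.ttl.gz")` (hypothetical file name) returns a dictionary of the declared prefixes without decompressing the whole file to disk. For actual querying, a full RDF parser is the appropriate tool.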
Tasks
Task A1
Training dataset
- Size of data (Gzipped): 61 MB
- Number of RDF triples: 6 223 018
- Number of instances of pc:Contract: 70 543
- Number of links to DBpedia: 41 468
- Number of links to OpenCorporates: 14 894
Testing dataset
- Size of data (Gzipped): 1.2 MB
- Number of RDF triples: 67 007
- Number of instances of pc:Contract: 788
- Number of links to DBpedia: 177
- Number of links to OpenCorporates: 0 (testing dataset does not contain information about bidders)
Format of results
The results for the task are to be delivered in CSV format with two columns. The first column contains the URI of an annotated public contract (an instance of pc:Contract); the second column contains the predicted number of tenders for that contract, given as a positive integer.
Example
contract,numberOfTenders
"http://linked.opendata.cz/resource/domain/fbo.gov/contract/AG-02NV-S-14-7000",13
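A results file in this shape could be produced along the following lines. This is a minimal sketch, not an official submission tool: the function name `write_results` is hypothetical, and quoting the URI while leaving the header and the count unquoted is an assumption made only to mirror the example above. "Positive integer" is read here as an integer of at least 1.

```python
import csv
import io


def write_results(predictions, out):
    """Write (contract URI, predicted number of tenders) pairs as CSV to out."""
    # Header written verbatim so it stays unquoted, as in the example.
    out.write("contract,numberOfTenders\n")
    # QUOTE_NONNUMERIC quotes the URI strings and leaves integers bare.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for contract_uri, tenders in predictions:
        # Reject booleans, non-integers and values below 1.
        if isinstance(tenders, bool) or not isinstance(tenders, int) or tenders < 1:
            raise ValueError(f"not a positive integer for {contract_uri}: {tenders!r}")
        writer.writerow([contract_uri, tenders])


buffer = io.StringIO()
write_results(
    [("http://linked.opendata.cz/resource/domain/fbo.gov/contract/AG-02NV-S-14-7000", 13)],
    buffer,
)
print(buffer.getvalue(), end="")
```

Passing an open file object instead of the `io.StringIO` buffer writes the same content to disk.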
Task A2
The dataset is the same as the training dataset for Task A1.
License information
Data extracted from USASpending.gov, FedBizOpps.gov and FAR Codes is in the public domain. The datasets also include data from DBpedia, OpenCorporates and the North American Industry Classification System (NAICS). Data from DBpedia is dual-licensed under the Creative Commons Attribution-ShareAlike license (CC-BY-SA) and the GNU Free Documentation License. Data from OpenCorporates.com is provided under the Open Database License, which is also the license used for the RDF version of NAICS.