Funded Projects at TH Köln
Smart Harvesting II
Automatically extracting and editing bibliographic data is one of the major problems associated with the maintenance of bibliographic databases. The successor project “Smart Harvesting II” aims to intensify the prolific collaboration between the database providers dblp (computer science) and GESIS (social science) in order to solve common problems.
The predecessor project focused on the development of a learning wrapper which uses the current database to automatically generate extraction rules. However, due to the multitude of technologies used on the web, this approach is not generally applicable. In particular, dynamically generated and updated content (e.g., via AJAX calls) still poses a substantial challenge.
Therefore, the current project prioritizes the development of a wrapper framework for rule-based data extraction that non-computer scientists can operate by means of simple extraction rules. Both navigation and extraction are to be carried out by parsing the underlying DOM trees of the HTML pages. In cooperation with the University of Oxford, we intend to integrate their addressing scheme OXPath (an extension of XPath) into the wrapper. Furthermore, we plan to create monitoring tools that enable non-programmers (e.g., librarians) to oversee the complete data extraction process and to tap new data sources.
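As a rough illustration of what rule-based extraction over a DOM tree can look like, the following Python sketch maps field names to path expressions and applies them to a parsed page. It is not the project's wrapper and does not use OXPath; it relies only on the limited XPath subset of Python's standard library, and the rule set, field names, and sample page are invented for this example. Real web pages are rarely well-formed XML, so a production wrapper would need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: field name -> path expression
# (ElementTree's XPath subset, not OXPath).
RULES = {
    "title": ".//h2[@class='title']",
    "authors": ".//span[@class='author']",
    "year": ".//span[@class='year']",
}

# Invented, well-formed sample page standing in for a publisher site.
PAGE = """
<html><body>
  <div class="record">
    <h2 class="title">A Sample Paper</h2>
    <span class="author">A. Author</span>
    <span class="author">B. Writer</span>
    <span class="year">2017</span>
  </div>
</body></html>
"""


def extract(page, rules):
    """Parse the page into a tree and apply every rule to it.

    Returns a dict mapping each field name to the list of text values
    found by the rule's path expression.
    """
    root = ET.fromstring(page.strip())
    record = {}
    for field, path in rules.items():
        record[field] = [(node.text or "").strip() for node in root.iterfind(path)]
    return record


record = extract(PAGE, RULES)
print(record)
```

The point of the design is that adding a new data source only requires a new rule set, not new code, which is what makes such a framework usable by non-programmers.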
At the same time, we shall revise and edit the existing data pool by means of author disambiguation in order to guarantee a more reliable data basis. The disambiguation software for new data, already established during the predecessor project, shall be enhanced by a further component that detects homonyms and synonyms in the existing data. Above all, the project is to address the discrepancies between the different publication cultures (computer science ↔ social science), which were revealed in the predecessor project, because they require the use of disparate strategies.
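To make the synonym half of this task concrete, here is a minimal Python sketch that groups different spellings of a name under a normalised key (last name plus first initial) and reports groups with more than one spelling as synonym candidates. This is a deliberately naive heuristic for illustration, not the project's disambiguation software; the function names and sample names are invented. Homonyms (different persons sharing one name) cannot be found from the name string alone and would need further evidence such as co-authors or venues.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Reduce 'First Last' or 'Last, First' to (last name, first initial).

    Diacritics are stripped and case is ignored, so 'Müller' and 'Muller'
    end up with the same key.
    """
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    if "," in ascii_name:
        last, first = [part.strip() for part in ascii_name.split(",", 1)]
    else:
        parts = ascii_name.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    return (last.lower(), first[:1].lower())


def synonym_candidates(names):
    """Return only those keys under which more than one spelling occurs."""
    groups = defaultdict(set)
    for name in names:
        groups[name_key(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


# Invented sample data.
names = ["Jörg Müller", "Müller, J.", "Anna Schmidt"]
candidates = synonym_candidates(names)
print(candidates)
```

Such a key over-merges on purpose: it yields candidates for later review rather than final decisions.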
Project site: https://ir.web.th-koeln.de/projects/
Timespan: February 2017 till August 2019
Funded Projects at GESIS (my former institute)
The objective of the Scientific Information Service for Sociology is to improve the sociological information infrastructure by providing three modules which are developed in response to current deficits formulated by the scientific community.
The first module is a newly developed discovery platform that will serve as a central entry point for the supply of information to sociological research. The second module is an academic social network built specifically for the sociological community to support professional communication and networking. The third module is the promotion and support of open access publishing. In this way, the Scientific Information Service aims to make sociological research widely accessible. By linking the subject repository (Social Science Open Access Repository, SSOAR) to the social network and providing a consultation service, the project will support scholars in republishing and self-archiving their research output in the most effective way.
Project site: http://fid-soziologie.de
Timespan: June 2016 till May 2019
Due to missing web interfaces and the insufficient interoperability of databases and metadata formats, the inclusion of full texts and metadata in repositories is a challenge, especially for medium-sized publishers, research units and editors. The project DDA aims to establish a self-deposit platform for open access full texts and metadata in order to reduce the labor-intensive manual integration of these into the repositories. As such, the DDA works as a broker between the content providers and the repositories, similar to the EU-financed PEER depot within the PEER project. Unlike the PEER project, the DDA aims to establish a sustainable deposit infrastructure and especially addresses small publishers and partners.
In order to import and index larger quantities of documents that cannot sensibly be integrated by individual self-upload, we plan to establish and run an interactive self-deposit portal which allows the automated integration of different data formats. The DDA consists of two components: (1) a web platform with an interactive questionnaire-based assistant (online wizard) that finds out which systems/databases the content provider uses and guides them through the possible ways of delivering their data, and (2) an upload platform and a converter to import and publish the data in the repositories.
Project site: http://www.gesis.org/forschung/drittmittelprojekte/projektuebersicht-drittmittel/dda-document-deposit-assistant/
Timespan: September 2015 till August 2017
The task of IRM is to introduce and evaluate value-added services (treatment of term vagueness and document re-ranking) for information retrieval within a heterogeneous digital library (DL) environment (like sowiport). The methods to be implemented focus on query construction and on result set re-ranking and are designed to positively influence each other. The goal of the project is to evaluate whether, and to what extent, search quality is improved by applying the services under study.
We focus on the following three value-added services:
- Search Term Recommender (STR, idea based on work at School of Information, University of California, Berkeley) for term vagueness treatment
- Bradfordizing and author centrality in co-authorship networks, two measures derived from scientometrics and social network analysis, respectively, for document re-ranking
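The re-ranking idea behind Bradfordizing can be sketched in a few lines of Python: documents from journals that contribute many hits to the result set (the "core" journals for the query) are moved to the top. This is a simplified illustration, not the IRM implementation; the record layout (`id`, `journal`) and the sample data are invented, and author centrality is not covered here.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank a result set by journal productivity within that set.

    Journals with more hits in the result set come first. Python's sort
    is stable, so the original order is kept within a journal and
    between journals with equal counts.
    """
    counts = Counter(doc["journal"] for doc in results)
    return sorted(results, key=lambda doc: -counts[doc["journal"]])


# Invented result set in its original ranking order.
results = [
    {"id": 1, "journal": "Journal B"},
    {"id": 2, "journal": "Journal A"},
    {"id": 3, "journal": "Journal C"},
    {"id": 4, "journal": "Journal A"},
    {"id": 5, "journal": "Journal A"},
    {"id": 6, "journal": "Journal B"},
]

reranked = bradfordize(results)
ids = [doc["id"] for doc in reranked]
print(ids)
```

In this sample, Journal A contributes three hits, Journal B two, and Journal C one, so the documents are reordered accordingly while the original input list is left unchanged.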
As an open-access full-text server, SSOAR’s goal is to implement the “green road” to open access by providing users with free electronic access to journal article preprints and postprints. SSOAR is especially committed to the archiving and dissemination of quality-controlled texts. The repository has been certified by DINI, the German Initiative for Networked Information (DINI certificate 2007). The DINI certificate confirms our compliance with formal and technical standards and quality criteria for open-access repositories.
I implemented SSOAR mainly using DBClear (a Java-based metadata repository software) and Typo3. The repository is still very active, with over 14,000 full texts and more than 20,000 unique visitors per month. It’s the only dedicated disciplinary Open Access Repository in the Social Sciences.
Project site: http://www.ssoar.info
Timespan: January 2007 till December 2008
Edited by Konrad Umlauf and Stefan Gradmann. So far I have contributed more than 100 articles on computer and information science topics, ranging from newsfeeds and Cascading Style Sheets to human-computer interaction and network topology.
Project description: http://www.ib.hu-berlin.de/~kumlau/LBI_1.Lieferung_U1-U4.pdf
Timespan: Since August 2009