Machine Learning for Resolving Researcher Affiliations
Authors:
Marjan Šterk, Daniel Vladušič, Eva Milošev: XLAB; Jure Ferlež, Dunja Mladenić, Marko Grobelnik: Jožef Stefan Institute
Abstract:
This paper describes the Institution Finder, a simple web mining procedure that finds the internet domain of the institution(s) with which a given researcher is affiliated. The Institution Finder issues several queries to public Web search engines and tries to extract from the hits the institution names and internet domains that are likely to be related to the given researcher. A simple procedure based on machine learning is used to improve the ranking of the hits. The system can also reject a researcher if the corresponding domain cannot be found reliably. Performance is quantified by accuracy, i.e. the conditional probability P(correct | not rejected), and by the reject rate. The hits obtained from the various queries can be combined in different ways, enabling a trade-off between reasonable accuracy with almost no rejection (about 44% of the 363 test examples are correctly classified) and high accuracy with high rejection (for example, 55% of the test examples rejected and 75% of the rest correctly classified).
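The two measures named in the abstract can be illustrated with a minimal Python sketch. The function name, the use of `None` to mark a rejected researcher, and the example domains are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Tuple


def accuracy_and_reject_rate(
    predictions: List[Optional[str]], truth: List[str]
) -> Tuple[float, float]:
    """Return (accuracy, reject_rate).

    predictions[i] is the predicted domain for researcher i, or None when
    the researcher is rejected (an assumed encoding, not the paper's).
    Accuracy is P(correct | not rejected); reject rate is the fraction
    of all examples that were rejected.
    """
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    total = len(predictions)
    # Keep only the examples for which the system gave an answer.
    answered = [(p, t) for p, t in zip(predictions, truth) if p is not None]
    correct = sum(1 for p, t in answered if p == t)
    reject_rate = (total - len(answered)) / total if total else 0.0
    accuracy = correct / len(answered) if answered else 0.0
    return accuracy, reject_rate


# Hypothetical example: four researchers, one rejected, two of the
# remaining three resolved correctly.
predicted = ["xlab.si", None, "ijs.si", "example.org"]
actual = ["xlab.si", "ijs.si", "ijs.si", "uni-lj.si"]
acc, rej = accuracy_and_reject_rate(predicted, actual)
```

Because accuracy is conditional on not being rejected, the share of all test examples that are resolved correctly is accuracy × (1 − reject rate). For the high-reject setting quoted in the abstract this gives 0.75 × 0.45, i.e. roughly 34% of all test examples.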
Full paper (published at SiKDD 2007)

Dual Proximity Neighbour Selection Method for Peer-to-peer-based Discovery Service
Authors:
Piotr Karwaczynski, Dariusz Konieczny: Wroclaw University of Technology; Jaka Močnik, Marko Novak: XLAB
Abstract:
In this paper, we propose a new, dual method for self-optimization of a pervasive, DHT-based discovery service. This method addresses the topology mismatch problem. On the one hand, it selects close neighbours based on static, readily available information (namely the IP addresses of the nodes) and thus does not involve costly periodic probing of many nodes. On the other hand, it enables an overlay to optimize its topology at run time in a cost-effective manner. We verify the effectiveness of our method statistically and experimentally.
Draft paper

Self-optimization of a DHT-based Discovery Service
Authors:
Piotr Karwaczynski: Wroclaw University of Technology; Jaka Močnik: XLAB
Abstract:
Discovery in large, highly dynamic service-oriented systems cannot be realized by traditional means, i.e. with a centralized index. Instead, distributed data structures are used, a typical example being a distributed hash table deployed on an overlay network. The efficiency of overlay networks built on top of the IP network commonly suffers from the mismatch between the topologies of the overlay and the underlying IP network, resulting in unnecessary traffic and increased latencies. Substantial improvement can be achieved by optimizing the logical links between overlay nodes. In this paper, we propose a new method for self-optimization of a pervasive, DHT-based discovery service. Our method has no need for active measurement of inter-node latencies, thus minimizing the network traffic costs of node insertion and topology maintenance. We verify our method by analysing large data sets of latency measurements between arbitrary nodes on the Internet, showing a correlation between the common IP prefix length of communicating nodes and the latency between them.
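To make the notion of common IP prefix length concrete, here is a minimal Python sketch that computes it for two IPv4 addresses and ranks candidate neighbours by it. The function names, the restriction to IPv4, and the simple ranking step are assumptions for illustration; the abstract does not specify how the authors use the prefix length when selecting neighbours. The addresses come from the reserved documentation ranges.

```python
import ipaddress
from typing import List


def common_prefix_length(a: str, b: str) -> int:
    """Number of leading bits (0..32) shared by two IPv4 addresses."""
    # XOR leaves a 1 at every bit position where the addresses differ;
    # the highest such position marks the end of the shared prefix.
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def rank_by_prefix(own: str, candidates: List[str]) -> List[str]:
    """Order candidates from longest to shortest shared prefix with `own`.

    Assumption: a longer shared prefix is treated as a proxy for lower
    latency, so no active probing of the candidates is needed.
    """
    return sorted(
        candidates,
        key=lambda c: common_prefix_length(own, c),
        reverse=True,
    )


# Hypothetical node and candidate neighbours.
own_address = "192.0.2.10"
ranked = rank_by_prefix(
    own_address, ["203.0.113.5", "192.0.2.200", "198.51.100.7"]
)
```

Here `192.0.2.200` shares 24 leading bits with the node's own address, while the other two candidates share only 5 and 4 bits, so it is ranked first.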
Draft paper

Distributed Directory Services for Grids Based on Peer-to-peer Technologies (in Slovene)
Author: Marko Novak: XLAB
Mentor: Borut Robič, Ph.D.: Faculty of Computer Science and Informatics, University of Ljubljana
Abstract:
The main concern of Grid computing is interoperability in heterogeneous environments. In this work we concentrate mostly on the Monitoring and Discovery Service (MDS), a directory service of Globus Toolkit 4, which is currently the most popular tool for building Grid systems. We design a structured peer-to-peer system for publishing and discovering various objects in an overlay network, based on improved Tapestry algorithms. We use it to extend the MDS service and address its deficiencies. We also conduct two types of experiments: one on the PlanetLab platform and the other on a local cluster. We analyse the results to compare the extended version of the MDS service with the original one with regard to consumed storage space and average query time.

Grid-Based Solution for Financial Modeling
Authors:
Eva Milošev, Marko Novak, Marko Pihlar, Gregor Pipan: XLAB
Abstract:
We present a grid-based application for econometric modeling in finance, where speed of computation, reliability and security are important concerns. Due to large input data sets, the prototype of the application running on a standalone machine proved too slow for practical purposes. Thus, the application was implemented as a service within Globus Toolkit 4 and distributed on a per-job basis. The grid infrastructure provides the standard front end and ensures the security necessary for financial industry applications.
Draft Paper