Towards building a collection of web archiving research articles
Journal title: | Proceedings of the American Society for Information Science and Technology |
---|---|
Persons and corporate bodies: | Ayala, Brenda Reyes; Caragea, Cornelia |
In: | Proceedings of the American Society for Information Science and Technology, vol. 51 (2014), no. 1, pp. 1-5 |
Format: | E-Article |
Language: | English |
Published: | Wiley |
Subjects: | Library and Information Sciences; Information Systems |
author_facet | Ayala, Brenda Reyes; Caragea, Cornelia; Ayala, Brenda Reyes; Caragea, Cornelia |
---|---|
author | Ayala, Brenda Reyes; Caragea, Cornelia |
spellingShingle | Ayala, Brenda Reyes; Caragea, Cornelia; Proceedings of the American Society for Information Science and Technology; Towards building a collection of web archiving research articles; Library and Information Sciences; Information Systems |
author_sort | ayala, brenda reyes |
spelling | Ayala, Brenda Reyes; Caragea, Cornelia; 0044-7870; 1550-8390; Wiley; Library and Information Sciences; Information Systems; http://dx.doi.org/10.1002/meet.2014.14505101150; Abstract: The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.; Towards building a collection of web archiving research articles; Proceedings of the American Society for Information Science and Technology |
doi_str_mv | 10.1002/meet.2014.14505101150 |
facet_avail | Online, Free |
finc_class_facet | General, Computer Science |
format | ElectronicArticle |
fullrecord | blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA |
id | ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA |
institution | DE-D275, DE-Bn3, DE-Brt1, DE-Zwi2, DE-D161, DE-Gla1, DE-Zi4, DE-15, DE-Pl11, DE-Rs1, DE-105, DE-14, FID-BBI-DE-23, DE-Ch1, DE-L229 |
imprint | Wiley, 2014 |
imprint_str_mv | Wiley, 2014 |
issn | 0044-7870, 1550-8390 |
issn_str_mv | 0044-7870, 1550-8390 |
language | English |
mega_collection | Wiley (CrossRef) |
match_str | ayala2014towardsbuildingacollectionofwebarchivingresearcharticles |
publishDateSort | 2014 |
publisher | Wiley |
recordtype | ai |
record_format | ai |
series | Proceedings of the American Society for Information Science and Technology |
source_id | 49 |
title | Towards building a collection of web archiving research articles |
title_unstemmed | Towards building a collection of web archiving research articles |
title_full | Towards building a collection of web archiving research articles |
title_fullStr | Towards building a collection of web archiving research articles |
title_full_unstemmed | Towards building a collection of web archiving research articles |
title_short | Towards building a collection of web archiving research articles |
title_sort | towards building a collection of web archiving research articles |
topic | Library and Information Sciences, Information Systems |
url | http://dx.doi.org/10.1002/meet.2014.14505101150 |
publishDate | 2014 |
physical | 1-5 |
description | Abstract: The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field. |
container_issue | 1 |
container_start_page | 1 |
container_title | Proceedings of the American Society for Information Science and Technology |
container_volume | 51 |
format_de105 | Article, E-Article |
format_de14 | Article, E-Article |
format_de15 | Article, E-Article |
format_de520 | Article, E-Article |
format_de540 | Article, E-Article |
format_dech1 | Article, E-Article |
format_ded117 | Article, E-Article |
format_degla1 | E-Article |
format_del152 | Book |
format_del189 | Article, E-Article |
format_dezi4 | Article |
format_dezwi2 | Article, E-Article |
format_finc | Article, E-Article |
format_nrw | Article, E-Article |
_version_ | 1792336255660326916 |
geogr_code | not assigned |
last_indexed | 2024-03-01T14:57:32.047Z |
geogr_code_person | not assigned |
openURL | url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=Towards+building+a+collection+of+web+archiving+research+articles&rft.date=2014-01-01&genre=article&issn=1550-8390&volume=51&issue=1&spage=1&epage=5&pages=1-5&jtitle=Proceedings+of+the+American+Society+for+Information+Science+and+Technology&atitle=Towards+building+a+collection+of+web+archiving+research+articles&aulast=Caragea&aufirst=Cornelia&rft_id=info%3Adoi%2F10.1002%2Fmeet.2014.14505101150&rft.language%5B0%5D=eng |
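The abstract in the `description` field outlines a pipeline: seed articles from a Web Archiving Bibliography, crawl the Web for more documents, represent each document by its title and abstract as a "bag of words", and classify it with machine learning. The abstract does not name a specific classifier, so the sketch below assumes multinomial Naive Bayes as an illustrative stand-in. The function `bag_of_words`, the class `NaiveBayes`, the labels `web_archiving`/`other`, and the toy training examples are all hypothetical; this is not the authors' code or data.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(title: str, abstract: str) -> Counter:
    """Represent a document as unordered token counts of its title plus abstract."""
    tokens = re.findall(r"[a-z0-9]+", f"{title} {abstract}".lower())
    return Counter(tokens)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing (assumed classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> token counts
        self.vocab = set()
        for bow, label in zip(docs, labels):
            self.word_counts[label].update(bow)
            self.vocab.update(bow)
        # Total token count per label, used as the smoothing denominator.
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, bow):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior of the label
            for token, freq in bow.items():
                if token not in self.vocab:
                    continue  # ignore tokens never seen during training
                p = (self.word_counts[label][token] + 1) / (self.totals[label] + v)
                score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set: (title, abstract, label).
train = [
    ("Archiving the web with crawlers",
     "Web archives preserve websites using crawler snapshots and WARC files", "web_archiving"),
    ("Preserving born-digital web content",
     "We study long-term preservation of archived web pages", "web_archiving"),
    ("Protein folding with deep networks",
     "We predict protein structure from amino acid sequences", "other"),
    ("Image segmentation benchmarks",
     "A survey of convolutional models for segmentation of images", "other"),
]
docs = [bag_of_words(t, a) for t, a, _ in train]
labels = [y for _, _, y in train]
model = NaiveBayes().fit(docs, labels)

# A hypothetical crawled document to classify.
candidate = bag_of_words("Quality of web archive crawls",
                         "Measuring completeness of archived websites")
print(model.predict(candidate))  # prints: web_archiving
```

Naive Bayes is used here only because it fits in a few lines of the standard library; any classifier that accepts bag-of-words vectors could take its place, and a faithful replication would train on the bibliography seed set and evaluate on held-out crawled documents.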