author_facet Ayala, Brenda Reyes
Caragea, Cornelia
author Ayala, Brenda Reyes
Caragea, Cornelia
spellingShingle Ayala, Brenda Reyes
Caragea, Cornelia
Proceedings of the American Society for Information Science and Technology
Towards building a collection of web archiving research articles
Library and Information Sciences
Information Systems
author_sort ayala, brenda reyes
spelling Ayala, Brenda Reyes Caragea, Cornelia 0044-7870 1550-8390 Wiley Library and Information Sciences Information Systems http://dx.doi.org/10.1002/meet.2014.14505101150 The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field. Towards building a collection of web archiving research articles Proceedings of the American Society for Information Science and Technology
doi_str_mv 10.1002/meet.2014.14505101150
facet_avail Online
Free
finc_class_facet Allgemeines
Informatik
format ElectronicArticle
fullrecord blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA
institution DE-D275
DE-Bn3
DE-Brt1
DE-Zwi2
DE-D161
DE-Gla1
DE-Zi4
DE-15
DE-Pl11
DE-Rs1
DE-105
DE-14
FID-BBI-DE-23
DE-Ch1
DE-L229
imprint Wiley, 2014
imprint_str_mv Wiley, 2014
issn 0044-7870
1550-8390
issn_str_mv 0044-7870
1550-8390
language English
mega_collection Wiley (CrossRef)
match_str ayala2014towardsbuildingacollectionofwebarchivingresearcharticles
publishDateSort 2014
publisher Wiley
recordtype ai
record_format ai
series Proceedings of the American Society for Information Science and Technology
source_id 49
title Towards building a collection of web archiving research articles
title_unstemmed Towards building a collection of web archiving research articles
title_full Towards building a collection of web archiving research articles
title_fullStr Towards building a collection of web archiving research articles
title_full_unstemmed Towards building a collection of web archiving research articles
title_short Towards building a collection of web archiving research articles
title_sort towards building a collection of web archiving research articles
topic Library and Information Sciences
Information Systems
url http://dx.doi.org/10.1002/meet.2014.14505101150
publishDate 2014
physical 1-5
description The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
container_issue 1
container_start_page 1
container_title Proceedings of the American Society for Information Science and Technology
container_volume 51
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Buch
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
_version_ 1792336255660326916
geogr_code not assigned
last_indexed 2024-03-01T14:57:32.047Z
geogr_code_person not assigned
openURL url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=Towards+building+a+collection+of+web+archiving+research+articles&rft.date=2014-01-01&genre=article&issn=1550-8390&volume=51&issue=1&spage=1&epage=5&pages=1-5&jtitle=Proceedings+of+the+American+Society+for+Information+Science+and+Technology&atitle=Towards+building+a+collection+of+web+archiving+research+articles&aulast=Caragea&aufirst=Cornelia&rft_id=info%3Adoi%2F10.1002%2Fmeet.2014.14505101150&rft.language%5B0%5D=eng
SOLR
_version_ 1792336255660326916
author Ayala, Brenda Reyes, Caragea, Cornelia
author_facet Ayala, Brenda Reyes, Caragea, Cornelia
author_sort ayala, brenda reyes
container_issue 1
container_start_page 1
container_title Proceedings of the American Society for Information Science and Technology
container_volume 51
description The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
doi_str_mv 10.1002/meet.2014.14505101150
facet_avail Online, Free
finc_class_facet Allgemeines, Informatik
format ElectronicArticle
format_de105 Article, E-Article
format_de14 Article, E-Article
format_de15 Article, E-Article
format_de520 Article, E-Article
format_de540 Article, E-Article
format_dech1 Article, E-Article
format_ded117 Article, E-Article
format_degla1 E-Article
format_del152 Buch
format_del189 Article, E-Article
format_dezi4 Article
format_dezwi2 Article, E-Article
format_finc Article, E-Article
format_nrw Article, E-Article
geogr_code not assigned
geogr_code_person not assigned
id ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA
imprint Wiley, 2014
imprint_str_mv Wiley, 2014
institution DE-D275, DE-Bn3, DE-Brt1, DE-Zwi2, DE-D161, DE-Gla1, DE-Zi4, DE-15, DE-Pl11, DE-Rs1, DE-105, DE-14, FID-BBI-DE-23, DE-Ch1, DE-L229
issn 0044-7870, 1550-8390
issn_str_mv 0044-7870, 1550-8390
language English
last_indexed 2024-03-01T14:57:32.047Z
match_str ayala2014towardsbuildingacollectionofwebarchivingresearcharticles
mega_collection Wiley (CrossRef)
physical 1-5
publishDate 2014
publishDateSort 2014
publisher Wiley
record_format ai
recordtype ai
series Proceedings of the American Society for Information Science and Technology
source_id 49
spelling Ayala, Brenda Reyes Caragea, Cornelia 0044-7870 1550-8390 Wiley Library and Information Sciences Information Systems http://dx.doi.org/10.1002/meet.2014.14505101150 The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field. Towards building a collection of web archiving research articles Proceedings of the American Society for Information Science and Technology
spellingShingle Ayala, Brenda Reyes, Caragea, Cornelia, Proceedings of the American Society for Information Science and Technology, Towards building a collection of web archiving research articles, Library and Information Sciences, Information Systems
title Towards building a collection of web archiving research articles
title_full Towards building a collection of web archiving research articles
title_fullStr Towards building a collection of web archiving research articles
title_full_unstemmed Towards building a collection of web archiving research articles
title_short Towards building a collection of web archiving research articles
title_sort towards building a collection of web archiving research articles
title_unstemmed Towards building a collection of web archiving research articles
topic Library and Information Sciences, Information Systems
url http://dx.doi.org/10.1002/meet.2014.14505101150