Towards building a collection of web archiving research articles

Bibliographic Details
Journal Title:	Proceedings of the American Society for Information Science and Technology
Authors and Corporations:	Ayala, Brenda Reyes; Caragea, Cornelia
In:	Proceedings of the American Society for Information Science and Technology, 51, 2014, 1, pp. 1-5
Format:	E-Article
Language:	English
Published:	Wiley
Subjects:	Library and Information Sciences, Information Systems

author_facet	Ayala, Brenda Reyes; Caragea, Cornelia; Ayala, Brenda Reyes; Caragea, Cornelia
author	Ayala, Brenda Reyes; Caragea, Cornelia
spellingShingle	Ayala, Brenda Reyes; Caragea, Cornelia; Proceedings of the American Society for Information Science and Technology; Towards building a collection of web archiving research articles; Library and Information Sciences; Information Systems
author_sort	ayala, brenda reyes
spelling	Ayala, Brenda Reyes Caragea, Cornelia 0044-7870 1550-8390 Wiley Library and Information Sciences Information Systems http://dx.doi.org/10.1002/meet.2014.14505101150 ABSTRACT: The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field. Towards building a collection of web archiving research articles Proceedings of the American Society for Information Science and Technology
doi_str_mv	10.1002/meet.2014.14505101150
facet_avail	Online, Free
finc_class_facet	Allgemeines, Informatik
format	ElectronicArticle
fullrecord	blob:ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA
id	ai-49-aHR0cDovL2R4LmRvaS5vcmcvMTAuMTAwMi9tZWV0LjIwMTQuMTQ1MDUxMDExNTA
institution	DE-D275, DE-Bn3, DE-Brt1, DE-Zwi2, DE-D161, DE-Gla1, DE-Zi4, DE-15, DE-Pl11, DE-Rs1, DE-105, DE-14, FID-BBI-DE-23, DE-Ch1, DE-L229
imprint	Wiley, 2014
imprint_str_mv	Wiley, 2014
issn	0044-7870, 1550-8390
issn_str_mv	0044-7870, 1550-8390
language	English
mega_collection	Wiley (CrossRef)
match_str	ayala2014towardsbuildingacollectionofwebarchivingresearcharticles
publishDateSort	2014
publisher	Wiley
recordtype	ai
record_format	ai
series	Proceedings of the American Society for Information Science and Technology
source_id	49
title	Towards building a collection of web archiving research articles
title_unstemmed	Towards building a collection of web archiving research articles
title_full	Towards building a collection of web archiving research articles
title_fullStr	Towards building a collection of web archiving research articles
title_full_unstemmed	Towards building a collection of web archiving research articles
title_short	Towards building a collection of web archiving research articles
title_sort	towards building a collection of web archiving research articles
topic	Library and Information Sciences, Information Systems
url	http://dx.doi.org/10.1002/meet.2014.14505101150
publishDate	2014
physical	1-5
description	ABSTRACT: The field of Web Archiving exists in a fluid, fragmented, and heterogeneous state. Part of the problem is that this field is relatively new and its literature is scattered across a wide range of journal and conference venues. This makes the state of Web Archiving as a discipline particularly difficult to ascertain. This paper presents an approach to building a collection of articles about the subject. We begin with a small dataset of articles taken from a Web Archiving Bibliography and then proceed to expand it by crawling the Web and collecting additional documents. The crawled documents are then classified using machine learning classification techniques. We show that by extracting the documents’ titles and abstracts and representing them using the “bag of words” approach, we are able to accurately identify documents from the Web crawler as documents that are about Web Archiving. We also discuss our results in the context of Web Archiving as an emerging field.
container_issue	1
container_start_page	1
container_title	Proceedings of the American Society for Information Science and Technology
container_volume	51
format_de105	Article, E-Article
format_de14	Article, E-Article
format_de15	Article, E-Article
format_de520	Article, E-Article
format_de540	Article, E-Article
format_dech1	Article, E-Article
format_ded117	Article, E-Article
format_degla1	E-Article
format_del152	Buch
format_del189	Article, E-Article
format_dezi4	Article
format_dezwi2	Article, E-Article
format_finc	Article, E-Article
format_nrw	Article, E-Article
_version_	1792336255660326916
geogr_code	not assigned
last_indexed	2024-03-01T14:57:32.047Z
geogr_code_person	not assigned
openURL	url_ver=Z39.88-2004&ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fvufind.svn.sourceforge.net%3Agenerator&rft.title=Towards+building+a+collection+of+web+archiving+research+articles&rft.date=2014-01-01&genre=article&issn=1550-8390&volume=51&issue=1&spage=1&epage=5&pages=1-5&jtitle=Proceedings+of+the+American+Society+for+Information+Science+and+Technology&atitle=Towards+building+a+collection+of+web+archiving+research+articles&aulast=Caragea&aufirst=Cornelia&rft_id=info%3Adoi%2F10.1002%2Fmeet.2014.14505101150&rft.language%5B0%5D=eng