I analyzed the metadata for the preceding 10 years of articles published in the Journal of Pediatric Surgery. The aim was to determine: a) where the articles came from geographically (country, ZIP code for US sources, and institution); b) authorship data (the average number of authors per paper per year, and the “top 25” authors with their number of publications); c) article content by keyword analysis (the most common keywords and the “top 25” topics/keywords); and d) content by title and abstract (the most common words found in both title and abstract, and their relative frequency). Across the ten-year period, I looked for trends in a) the number of articles per year, b) the number of authors per article per year, c) the site of origin of the articles, and d) “popular topics” as they changed over time.
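The per-year and “top 25” tallies above come down to grouping and counting. The Ruby sketch below shows one way to compute the average number of authors per article per year and the most frequent authors or keywords. The record layout (`year`, `authors`, `keywords`), the helper names, and the sample articles are hypothetical illustrations, not the program used in this study.

```ruby
# Sketch of the per-year and "top N" tallies. The Article layout, the helper
# names, and the sample data below are hypothetical; in practice the records
# would be parsed from the exported XML.

Article = Struct.new(:year, :authors, :keywords)

# Average number of authors per article, grouped by publication year.
def mean_authors_by_year(articles)
  articles.group_by(&:year).transform_values do |group|
    group.sum { |a| a.authors.size }.to_f / group.size
  end
end

# The n most frequent items (authors, keywords, or words),
# sorted by descending count with ties broken alphabetically.
def top_n(items, n)
  counts = Hash.new(0)
  items.each { |item| counts[item] += 1 }
  counts.sort_by { |item, count| [-count, item] }.first(n)
end

# Hypothetical sample records.
articles = [
  Article.new(1996, ["Smith A", "Jones B"], ["Hirschsprung disease"]),
  Article.new(1996, ["Smith A", "Lee C", "Kim D"], ["Gastroschisis"]),
  Article.new(1997, ["Jones B", "Smith A", "Lee C"], ["Gastroschisis", "Laparoscopy"]),
]

p mean_authors_by_year(articles)             # {1996=>2.5, 1997=>3.0}
p top_n(articles.flat_map(&:authors), 25)    # [["Smith A", 3], ["Jones B", 2], ["Lee C", 2], ["Kim D", 1]]
p top_n(articles.flat_map(&:keywords), 25)   # [["Gastroschisis", 2], ["Hirschsprung disease", 1], ["Laparoscopy", 1]]
```

The same `top_n` helper could be applied to words split from titles and abstracts to get the word-frequency counts.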
Methods
The United States National Library of Medicine “PubMed” database was queried via EndNote software (Windows version 7, Thomson ISI). The retrieved records were sorted by year and exported to XML (Extensible Markup Language) files. A text analysis program written in Ruby (a high-level scripting language) was used to search and parse the XML files. The queries used regular expressions (regex). For example, “/\d{5}-\d{4}/ =~ x” means ‘search the string x for five digits, followed by a hyphen, followed by four more digits’ (a ZIP+4 code search). The data were analyzed with Mathematica 5.1 (Wolfram Research, Champaign, IL).
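The ZIP code search can be illustrated with a short Ruby sketch. It applies the ZIP+4 pattern to affiliation strings, both with `=~` (which returns the match position or `nil`) and with `String#scan` (which returns every match). The helper names and sample affiliations are hypothetical; the study's actual parsing code is not reproduced here.

```ruby
# Sketch of the ZIP+4 search on affiliation text.
# Helper names and sample affiliations are hypothetical.

# \d matches a digit; written without the backslash, "d" would match
# only a literal letter d. \b keeps the match from starting or ending
# inside a longer run of digits.
ZIP_PLUS_FOUR = /\b\d{5}-\d{4}\b/

# True if the string contains a ZIP+4 code (=~ returns an index or nil).
def has_zip?(affiliation)
  !(ZIP_PLUS_FOUR =~ affiliation).nil?
end

# All ZIP+4 codes found in the string.
def zip_codes(affiliation)
  affiliation.scan(ZIP_PLUS_FOUR)
end

samples = [
  "Department of Surgery, Example Children's Hospital, Anytown, OH 45229-3039, USA",
  "Department of Paediatric Surgery, Example University, London, UK",
]

samples.each do |s|
  puts "#{has_zip?(s)} #{zip_codes(s).inspect}"
end
# Prints:
# true ["45229-3039"]
# false []
```

This pattern matches only ZIP+4 codes, so an address that gives a plain five-digit ZIP would not be counted. A looser pattern such as `/\b\d{5}(?:-\d{4})?\b/` would catch both forms, at the cost of also matching other five-digit numbers.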