A citation has a PubMed Unique Identifier (PMID), whose characteristics are described at It has a list of keywords, which are simple strings extracted from the published version of the article. It also has a publication type, which can take one of the values listed here. Some articles have a list of collaborators. Collaborator names are simple strings in the following format: the last name, a space, up to two initials, and, if applicable, a space and a suffix abbreviation, with no periods and no comma after the last name. For example, "McCrary SV" or "Smith AB 3rd".
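The collaborator-name format is strict enough to check mechanically, which is handy when loading or generating sample data. Below is a minimal Python sketch using a regular expression. The set of accepted suffixes (Jr, Sr, 2nd, 3rd, 4th through 9th, II, III, IV), the restriction of initials to ASCII capital letters, and treating the last name as a single word (letters joined by optional hyphens or apostrophes) are assumptions, not part of the stated format; multi-word last names would be rejected by this sketch.

```python
import re
from typing import Dict, Optional

# Assumed pattern for collaborator names such as "McCrary SV" or "Smith AB 3rd":
#   last name: one word of letters, optionally joined by hyphens or apostrophes
#   initials:  one or two ASCII capital letters, no periods
#   suffix:    optional; the accepted set below is an assumption
COLLABORATOR_RE = re.compile(
    r"(?P<last>[^\W\d_]+(?:[-'][^\W\d_]+)*)"              # last name, no comma
    r" (?P<initials>[A-Z]{1,2})"                          # up to two initials
    r"(?: (?P<suffix>Jr|Sr|2nd|3rd|[4-9]th|II|III|IV))?"  # optional suffix
)


def parse_collaborator(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a collaborator name into its parts, or return None if malformed."""
    match = COLLABORATOR_RE.fullmatch(name)
    return match.groupdict() if match else None
```

For example, `parse_collaborator("McCrary SV")` returns `{'last': 'McCrary', 'initials': 'SV', 'suffix': None}`, while `"Smith, AB"` (comma after the last name) and `"Smith A.B."` (periods) both return `None`.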
Some articles, especially clinical trial descriptions, may have an associated list of substance names, which are usually names of drugs. Clinical trial citations also have an associated trial ID, called an NCT number. Its format is "NCT" followed by an 8-digit number (for example, NCT00000419).

In PubMed, you can click a "similar articles" link for an article. Because the algorithm for computing which articles are similar is resource-intensive, similar articles are precomputed and stored. You can therefore think of each article as having an associated list of similar articles.

Authors have a first name, a last name, an email address, and an id. An author may write many articles listed in MEDLINE. Each author has an institutional affiliation, usually their employer. An institutional affiliation record has a department, an institution name, an id, a city, a state or province, a postal code, and a country.

An article is published in a publication. A publication has an id, a title, a publisher, a frequency, one or more MeSH terms, and a country. For example, the publisher of the journal The Lancet is Elsevier. Here is a publication record: [Title+Abbreviation]

A MeSH term has the following components: the term itself, an id, a description, and the year it was introduced into the database. You can see MeSH terms at These entries are more complex than our representation, but you can still draw data from them.

Finally, PubMed lets you log in and keep lists of favorite citations. Therefore, you will need to keep track of users, the citations each user has viewed, and the citations each user has saved as a "Favorite". Users have an NCBI user name, a password, and an email address.
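Before designing tables, it can help to pin the entities and relationships described above down as a small object model. The following is a minimal Python sketch using dataclasses. The class and field names, integer IDs for citations, authors, affiliations, and publications, string IDs for MeSH terms, modeling relationships as lists or sets of IDs, and storing a password hash instead of the password itself are all assumptions, not requirements of the description. The sketch also enforces two stated rules: an NCT number is "NCT" plus eight digits, and a publication has at least one MeSH term.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# "NCT" followed by exactly 8 ASCII digits, e.g. NCT00000419
NCT_RE = re.compile(r"NCT[0-9]{8}")


@dataclass
class Affiliation:
    id: int
    department: str
    institution: str
    city: str
    state_or_province: str
    postal_code: str
    country: str


@dataclass
class Author:
    id: int
    first_name: str
    last_name: str
    email: str
    affiliation_id: int  # each author has one institutional affiliation


@dataclass
class MeshTerm:
    id: str
    term: str
    description: str
    year_introduced: int


@dataclass
class Publication:
    id: int
    title: str
    publisher: str
    frequency: str
    country: str
    mesh_term_ids: List[str]  # one or more MeSH terms

    def __post_init__(self) -> None:
        if not self.mesh_term_ids:
            raise ValueError("a publication needs at least one MeSH term")


@dataclass
class Citation:
    pmid: int
    publication_id: int  # the publication the article appeared in
    publication_type: str
    author_ids: List[int] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    collaborators: List[str] = field(default_factory=list)
    substances: List[str] = field(default_factory=list)
    nct_number: Optional[str] = None  # only for clinical trial citations
    similar_pmids: List[int] = field(default_factory=list)  # precomputed list

    def __post_init__(self) -> None:
        if self.nct_number is not None and not NCT_RE.fullmatch(self.nct_number):
            raise ValueError(f"malformed NCT number: {self.nct_number!r}")


@dataclass
class User:
    ncbi_username: str
    password_hash: str  # assumption: store a hash, never the plaintext password
    email: str
    viewed: Set[int] = field(default_factory=set)     # PMIDs the user has viewed
    favorites: Set[int] = field(default_factory=set)  # PMIDs saved as "Favorite"

    def view(self, pmid: int) -> None:
        self.viewed.add(pmid)

    def save_favorite(self, pmid: int) -> None:
        self.favorites.add(pmid)
```

Holding relationships as ID references (for example `author_ids`, `similar_pmids`, and `favorites`) mirrors the foreign keys a relational schema would use; in such a schema the many-to-many links, such as citation to author or user to favorite citation, would typically become tables of their own.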

Create sample data for each important concept (citations, publications, authors, affiliations, users, MeSH terms, etc.).

