Gut bacteria promoting colorectal cancer (40 minute biostatistics)

A couple of studies have been published recently, and quite a bit has been written about them, linking the abundance of certain types of bacteria found in the mouth with the incidence of colorectal cancer.

These findings result from the observation that the oral bacterium Fusobacterium nucleatum is often found in high abundance in colorectal carcinoma tissue samples.

Gram-negative stained culture of F. nucleatum. Image courtesy of J. Michael Miller, Ph.D., (D)ABMM, of the National Center for Zoonotic, Vector-borne, and Enteric Diseases. Picture submitted by him to the American Society for Microbiology.

Image & caption nicked from: http://microbewiki.kenyon.edu/index.php/Fusobacterium_nucleatum

I've not read up on the science and microbiology in any depth, but I got the feeling that it would be interesting to plot and summarize some of the public health data available on two variables relating to oral health and colorectal cancer incidence.

Hence I pulled down some data from the web and made some charts.

US state-by-state data sets are often complete, up to date and freely available for many observable values.

After a bit of Google searching, I came up with these two sets from the CDC:

The US Centers for Disease Control and Prevention (CDC) has colorectal cancer data for 2009 here:
http://apps.nccd.cdc.gov/uscs/cancersrankedbystate.aspx

[Age-Adjusted Invasive Cancer Incidence Rates and 95% Confidence Intervals by State (Table 5.4.1M) *†Rates are per 100,000 persons and are age-adjusted to the 2000 U.S. standard population (19 age groups - Census P25-1130).]


and the data are available for download.

The CDC.gov site also carries summary data from the Behavioral Risk Factor Surveillance System (BRFSS) survey, which in 2008 asked adults aged 18+ whether they "had visited a dentist or dental clinic in the past year": http://apps.nccd.cdc.gov/nohss/ListV.asp?qkey=5&DataSet=2




The two data sets can be merged by state, and then a scatter plot, the correlation and a best-fit line can be produced (assuming a linear association, normally distributed data, etc.).
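For anyone wanting to repeat the merge, correlation and best-fit steps, here is a rough sketch in plain Python. It is not the code I used for the charts. The function names (merge_by_state, pearson_and_fit) are made up for illustration, and the numbers in the example are invented placeholders, not CDC or BRFSS values. It assumes each data set has been reduced to one value per state.

```python
import math

def merge_by_state(cancer_rates, dental_visits):
    """Inner-join two {state: value} mappings on the state name.

    Returns (state, dental_visit_pct, cancer_rate) tuples for states
    present in both data sets, sorted by state name.
    """
    shared = sorted(set(cancer_rates) & set(dental_visits))
    return [(s, dental_visits[s], cancer_rates[s]) for s in shared]

def pearson_and_fit(xs, ys):
    """Return (Pearson r, slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

# Invented placeholder values, NOT real data: cancer incidence per 100,000
# and percentage of adults who visited a dentist in the past year.
example_cancer = {"State A": 50.0, "State B": 45.0, "State C": 40.0, "State D": 47.0}
example_dental = {"State A": 60.0, "State B": 70.0, "State C": 80.0}

merged = merge_by_state(example_cancer, example_dental)  # State D drops out
xs = [row[1] for row in merged]  # dental visit percentage
ys = [row[2] for row in merged]  # cancer incidence rate
r, slope, intercept = pearson_and_fit(xs, ys)
print(f"n={len(merged)} r={r:.3f} slope={slope:.3f} intercept={intercept:.3f}")
```

The (x, y) pairs in merged are what would go into the scatter plot, with the fitted line drawn from the slope and intercept.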



Here is the summary data:





TODO:

Characterize the association, and consider any other ideas.




August 2013

http://www.cell.com/cell-host-microbe/abstract/S1931-3128(13)00255-2

http://www.sciencedirect.com/science/article/pii/S1931312813002606


2011


http://genome.cshlp.org/content/22/2/299.full

http://genome.cshlp.org/content/22/2/292.full




Genome sequence for Fusobacterium nucleatum subspecies polymorphum:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0000659