{"id":173057,"date":"2021-05-17T07:30:34","date_gmt":"2021-05-17T11:30:34","guid":{"rendered":"https:\/\/today.uconn.edu\/?p=173057"},"modified":"2022-06-22T14:48:49","modified_gmt":"2022-06-22T18:48:49","slug":"the-study-of-big-data-how-clas-researchers-use-data-science","status":"publish","type":"post","link":"https:\/\/today.uconn.edu\/2021\/05\/the-study-of-big-data-how-clas-researchers-use-data-science\/","title":{"rendered":"The Study of Big Data: How CLAS Researchers Use Data Science"},"content":{"rendered":"<p>When Anji Seth was in graduate school, she never thought of herself as a big data scientist.<\/p>\n<p>She just went to her engineering and atmospheric science classes, did the computer programming that was required, and learned as she went.<\/p>\n<p>\u201cAll of my classes required some kind of programming \u2013 it was a natural thing,\u201d she notes. \u201cBut we didn\u2019t train specifically on it \u2013 we just did it. Climate science is one of the original \u2018big data\u2019 problems, but we didn\u2019t always call it that.\u201d<\/p>\n<p>Now, as a professor in UConn\u2019s department of geography, she still doesn\u2019t refer to herself as a data scientist \u2013 she\u2019s a climate scientist, first and foremost, she says. But that\u2019s the beauty of data science, she notes: it\u2019s a \u201cbig umbrella.\u201d<\/p>\n<p>Seth is one of many scientists, social scientists, and even humanists across the College of Liberal Arts and Sciences whose work overlaps the realm of big data, a major component of the College\u2019s research portfolio.<\/p>\n<p>Their work is inherently interdisciplinary, team-focused, and constantly changing.<\/p>\n<p>\u201cOur work gets more and more complicated and computationally intensive over time,\u201d says Seth. 
\u201cSo the data is inherently big, and getting bigger.\u201d<\/p>\n<p><strong>Climate Challenge<\/strong><\/p>\n<p>For a place like Connecticut, with a relatively small geographic area, Seth\u2019s climate modeling work takes on special significance.<\/p>\n<figure id=\"attachment_173059\" aria-describedby=\"caption-attachment-173059\" style=\"width: 199px\" class=\"wp-caption alignleft\"><img decoding=\"async\" class=\"size-medium wp-image-173059 img-responsive lazyload\" data-src=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/Anji-Seth-199x300.jpg\" alt=\"Professor of Geography Anji Seth uses climate data to help steer UConn and Connecticut climate change policy.\" width=\"199\" height=\"300\" data-srcset=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/Anji-Seth-199x300.jpg 199w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/Anji-Seth-279x420.jpg 279w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/Anji-Seth.jpg 332w\" data-sizes=\"(max-width: 199px) 100vw, 199px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 199px; --smush-placeholder-aspect-ratio: 199\/300;\" \/><figcaption id=\"caption-attachment-173059\" class=\"wp-caption-text\">Professor of Geography Anji Seth uses climate data to help steer UConn and Connecticut climate change policy (UConn Photo).<\/figcaption><\/figure>\n<p>Climate model projections are done globally, using computer models of climate that simulate temperature, wind speed, precipitation, humidity, and dozens of other variables\u00a0 at regular intervals around the globe. The areas between simulated points, called grid cells, can be very large in size.<\/p>\n<p>\u201cConnecticut is only a few grid cells,\u201d Seth points out. 
\u201cSo how can we have confidence in detailed projections of climate change effects for the state of Connecticut?\u201d<\/p>\n<p>She says there are ways to analyze multiple climate models to provide more detailed data for smaller geographic scales, but running a single global model at a resolution of 10 kilometers per cell \u2013 instead of the usual 100 \u2013 requires an enormous amount of computer time.<\/p>\n<p>In addition to their own high-powered computers, she and her graduate students use UConn\u2019s <a href=\"https:\/\/hpc.uconn.edu\/\">High Performance Computing<\/a> facilities for their work. This centralized computing facility has more than 11,000 cores \u2013 each comparable to a traditional computer \u2013 and more than 200 data analysis programs for researcher use.<\/p>\n<p>Working with Governor Ned Lamont\u2019s Climate Change Council (GC3), and using the data analysis methods she\u2019s developed for her own research, Seth co-led a 2019 effort to produce a state climate change report. The GC3 report presented the results to Lamont in January 2021.<\/p>\n<p>The GC3 report spurred the development of three pieces of state legislation concerning transportation cap and trade systems, climate adaptation and reducing greenhouse gases. The first of these passed out of committee and will be heard at the legislative session in the coming weeks.<\/p>\n<p>At UConn, Seth has worked for a year and a half on the UConn President\u2019s Working Group on Sustainability and the Environment, an internal working group concerned with transforming UConn to a zero-carbon campus. 
The working group is a response to student protest surrounding the Fridays for Future movement that climate activist Greta Thunberg began in 2018.<\/p>\n<p>The committee made several recommendations to the President and the Board of Trustees in April 2021 on a path forward toward zero-carbon.<\/p>\n<p>\u201cI am steeped in the climate science, so I can give you all the reasons why this is so urgent,\u201d she says. \u201cWe must aim for zero [carbon emissions] by 2040. The science requires it. Environmental justice requires it. As a public university sustainability leader, we can help the state and the nation meet our commitments to the Paris Agreement.\u201d<\/p>\n<p><strong>Inner Space<\/strong><\/p>\n<p>Every year, a wealth of new questions arise about what is happening in outer space, says astronomer Cara Battersby. And each of those questions requires more data and more computing.<\/p>\n<p>\u201cAs our understanding of the Universe becomes more sophisticated, the questions we can ask become more complex, with each generation needing more and more data to ask the next big questions,\u201d the assistant professor of physics says.<\/p>\n<p>Battersby\u2019s work focuses on describing and studying the center of the Milky Way galaxy, which she calls an \u201cexperimental playground\u201d for the distant cosmos.<\/p>\n<p>She studies this area because it has properties similar to faraway galaxies, and can help us understand cosmic occurrences that would otherwise be more difficult to study.<\/p>\n<p>\u201cIt\u2019s denser, hotter and at a higher pressure than the rest of our galaxy,\u201d she says.<\/p>\n<p>Battersby works on data from the Submillimeter Array facility, a collection of eight powerful telescopes situated atop Mount Maunakea, the highest point in Hawaii. 
The telescope can collect up to a terabyte of data every day, and Battersby\u2019s project used 61 days of data.<\/p>\n<p>Her work described the spectroscopy of the galaxy\u2019s center, which analyzes imagery of the area to understand its chemical makeup, as well as its temperature and the velocity of objects.<\/p>\n<p>Importantly, she says, we can then compare these descriptors to other areas of the universe, to determine their similarities and understand how processes will work within them.<\/p>\n<p>\u201cPrevious models were formed using information from the disc of the galaxy,\u201d she says, where physical properties are very different from the center. \u201cOur survey is the first to be sensitive to all these star-forming cores.\u201d<\/p>\n<p>The star-forming cores, or precursors to stars, have turned out to produce stars about 10 times more slowly than cores in the disc of the galaxy. Battersby says this difference is crucial to getting models right to interpret information gathered from far-off ends of the universe.<\/p>\n<p>Her publication sets the stage for future predictive work done in the center of the galaxy.<\/p>\n<p>Battersby refers to her computer as \u201cher laboratory,\u201d and ensures the students in her classes do, too. In her courses she often assigns programming and analysis problems, like using a large data set to determine the material composition of the Sun.<\/p>\n<p>\u201cWe have a lot of the tools to train students in data science,\u201d she says. \u201cResearch is moving in that direction, and students in our programs are prepared for it.\u201d<\/p>\n<p><strong>When The Data are Missing<\/strong><\/p>\n<p>\u201cI was trained as a statistician, to do theoretical and methodological statistics,\u201d says associate professor Kun Chen. 
\u201cBut at the end of the day, I enjoy solving real world problems.\u201d<\/p>\n<figure id=\"attachment_173060\" aria-describedby=\"caption-attachment-173060\" style=\"width: 300px\" class=\"wp-caption alignright\"><img decoding=\"async\" class=\"wp-image-173060 size-medium img-responsive lazyload\" data-src=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-300x200.jpg\" alt=\"As a statistician, associate professor Kun Chen consults broadly on studies that use data science to address public health problems.\" width=\"300\" height=\"200\" data-srcset=\"https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-300x200.jpg 300w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-1024x683.jpg 1024w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-768x512.jpg 768w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-630x420.jpg 630w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-150x100.jpg 150w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web-998x665.jpg 998w, https:\/\/today.uconn.edu\/wp-content\/uploads\/2021\/05\/chen210427cb006-web.jpg 1500w\" data-sizes=\"(max-width: 300px) 100vw, 300px\" src=\"data:image\/svg+xml;base64,PHN2ZyB3aWR0aD0iMSIgaGVpZ2h0PSIxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciPjwvc3ZnPg==\" style=\"--smush-placeholder-width: 300px; --smush-placeholder-aspect-ratio: 300\/200;\" \/><figcaption id=\"caption-attachment-173060\" class=\"wp-caption-text\">As a statistician, associate professor Kun Chen consults broadly on studies that use data science to address public health problems. 
(Bri Diaz\/UConn Photo)<\/figcaption><\/figure>\n<p>One recent common link among projects on which he\u2019s consulted is that of uncertain, missing or inconsistent data.<\/p>\n<p>In a recent publication, he and colleagues present a new way of approaching large data sets with models to understand what\u2019s called heterogeneity, or how different factors influence the outcome of an analysis in different ways.<\/p>\n<p>For example, an imaging-genetics study may involve hundreds or thousands of possible genetic markers that may influence Alzheimer\u2019s disease, and a predictive model may identify 30 of those as the most useful and important to study. Of those 30 markers, Chen says, his new model can help determine which have what kinds of effects in different subgroups of patients \u2013 like, which are active or inactive at different stages of the disease.<\/p>\n<p>This work is in collaboration with researchers at UConn Health, Yale University, and University of California, Riverside.<\/p>\n<p>Chen\u2019s model can also work to understand social questions, such as the risk of suicide among students at particular school districts. Using demographic, socioeconomic and academic data, Chen and Robert Aseltine, professor and chair of behavioral science and community health, have worked with Connecticut schools to study the relative risk of suicide attempts among their student populations.<\/p>\n<p>In this case, the goal isn\u2019t always to match up with reality, he notes. 
If the actual rate of suicide attempts is higher or lower than predicted by the district-level factors, the \u201coutlying\u201d school could be a subject of future study.<\/p>\n<p>He is also working with Aseltine, a medical sociologist, and Fei Wang at Weill Cornell Medicine, a computer scientist, to determine suicide risks in clinical settings with big medical claims and electronic health records data.<\/p>\n<p>\u201cCan we actually predict the risk of suicide to improve suicide prevention?\u201d he asks. \u201cIt\u2019s a fascinating question. If I can do something to help figure it out, that makes me excited.\u201d<\/p>\n<p>Chen also collaborates with Board of Trustees Distinguished Professor of Psychological Sciences Blair Johnson and Professor of Sociology Mary Bernstein to understand community-level factors contributing to rates of gun violence. His new models may be applicable since gun violence data is collected primarily from qualitative sources, such as news reports, so tends to be inconsistent and have missing data points.<\/p>\n<p>Chen enjoys the diversity of research his background allows him to work on, and he sees collaborative work as the future not only of applied statistics, but of big data in general.<\/p>\n<p>\u201cThere&#8217;s no way I, as a statistician, can do all the work or can even understand the problem as deep as a domain expert,\u201d he says. \u201cIf you want to address big questions, you have to have a team of people.\u201d<\/p>\n<p>&nbsp;<\/p>\n<p><em>This article is the final story in a series about emerging research areas in UConn\u2019s College of Liberal Arts and Sciences. 
Learn more at #DiscoverUConnCLAS.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>UConn researchers are using big data to attack issues of climate, space, genetics and public health<\/p>\n","protected":false},"author":37,"featured_media":173058,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_crdt_document":"","wds_primary_category":0,"wds_primary_series":0,"wds_primary_attribution":0,"footnotes":""},"categories":[2226,2404,2076,2235,2225],"tags":[],"magazine-issues":[],"coauthors":[1860],"class_list":["post-173057","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-clas","category-data-science","category-research","category-today-homepage","category-uconn-storrs"],"pp_statuses_selecting_workflow":false,"pp_workflow_action":"current","pp_status_selection":"publish","acf":[],"publishpress_future_action":{"enabled":false,"date":"2026-04-22 04:53:13","action":"change-status","newStatus":"draft","terms":[],"taxonomy":"category","extraData":[]},"publishpress_future_workflow_manual_trigger":{"enabledWorkflows":[]},"_links":{"self":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/173057","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/users\/37"}],"replies":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/comments?post=173057"}],"version-history":[{"count":3,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/173057\/revisions"}],"predecessor-version":[{"id":173079,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/posts\/173057\/revisions\/173079"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/media\/173058"}],"wp:attachment":[{"href":"htt
ps:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/media?parent=173057"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/categories?post=173057"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/tags?post=173057"},{"taxonomy":"magazine-issue","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/magazine-issues?post=173057"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/today.uconn.edu\/wp-rest\/wp\/v2\/coauthors?post=173057"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}