Friday, November 30, 2018

Data Analytics in the Undergraduate Curriculum

By David Bressoud

You can now follow me on Twitter @dbressoud

The National Academies will be holding a Roundtable on Data Science Postsecondary Education: Motivating Data Science Education through Social Good on December 10, 2018.

If I had to choose the most common job title for students who have graduated from Macalester with a degree in Mathematics, it would be analyst. Our graduates seldom wind up in jobs where they have to find derivatives or integrals, solve differential equations, or even find eigenvalues. Instead, they are almost always working with and trying to make sense of the data that can inform and shape business decisions. The habits of mind intrinsic to mathematics have generally prepared them for this role. But as the data available to business and industry has exploded in quantity and complexity, there is a growing need for graduates familiar with the increasingly sophisticated tools available for its analysis. The challenge to our colleges and universities is to provide the education that will equip graduates to become the data analysts that we need today and for the future.

Cover of the National Academies' Report on Data Science for Undergraduates.

In response to this need, the National Academies have produced a report, Data Science for Undergraduates: Opportunities and Options, that provides a framework for building an undergraduate program in data science. Reflecting the necessarily interdisciplinary nature of such a program, the report is the joint work of the National Research Council’s Computer Science and Telecommunications Board, Board on Mathematical Sciences and Analytics, Committee on Applied and Theoretical Statistics, and Board on Science Education. The official rollout of the report will take place on December 10, 2018, at the roundtable described at the top of this column.

The need is immense. The report references an estimate that by 2020 the U.S. will have positions for 2.7 million data analysts (p. 1-2). Meeting this need is frustrated by many obstacles, not least of which is the fact that few students understand what data science means or entails. Data analysis is also necessarily highly interdisciplinary, requiring new undergraduate programs that draw on expertise in computer science, information science, mathematics, and statistics. As the report forcefully states, no single one of these fields adequately covers the core concepts of data science. It can only be taught as an interdisciplinary program. The breadth that is needed is reflected in this passage from the report:

Building on the work of De Veaux et al. (2017), the committee puts forth the following key concept areas for data science: mathematical foundations, computational foundations, statistical foundations, data management and curation, data description and visualization, data modeling and assessment, workflow and reproducibility, communication and teamwork, domain-specific considerations, and ethical problem solving. (p. 2-7)
The report goes into a detailed exploration of the necessary contributions from each of these concept areas.

It also briefly describes programs for majors in data science at the University of Michigan, Smith College, Virginia Tech, UC San Diego, University of Rochester, MIT, UC Irvine, and the NYU School of Professional Studies, programs that are variously housed within a business school, a department of mathematics or statistics, or a computer science department. The report describes a variety of data science minors and highlights the need to provide a basic understanding of data science for all undergraduates.

Macalester College has its own minor in data science. We are particularly well situated for such a program since we have a single department of Mathematics, Statistics, and Computer Science. This is a department that is strong in all three areas and has a long history of cooperation among these disciplines, including several cross-disciplinary faculty hires.

Our data science program begins with Introduction to Data Science, a course on the handling, analysis, and interpretation of big data sets that is intended to be accessible to all students. Students minoring in data science need two computer science courses, which could include our junior-level course in Database Management Systems. They also take Introduction to Statistical Modeling plus a course in Machine Learning, Survival Analysis, or Bayesian Statistics, and two courses in a single domain such as bioinformatics that provide an opportunity for the application of data science methods. A complete description of Macalester’s data science minor can be found here.


Most math departments lost their faculty who worked in computer science decades ago. Statistics has long been a separate department at many universities. Far too often applied mathematics has been spun off, leaving a department that is increasingly insular, isolated from some of the most important developments in the mathematical sciences today. Separate departments are not necessarily a bad idea provided they are able to work collaboratively and share the work that transcends existing boundaries. If they are to serve their students, today’s departments of mathematics must be engaged in the process of shaping and delivering programs in data science.

References

De Veaux, R., M. Agarwal, M. Averett, B.S. Baumer, A. Bray, T.C. Bressoud, L. Bryant, et al. 2017. Curriculum guidelines for undergraduate programs in data science. Annual Review of Statistics and Its Application 4:2.1-2.6.

National Academies of Sciences, Engineering, and Medicine. 2018. Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. doi.org/10.17226/25104.

National Academies of Sciences, Engineering, and Medicine. 2018. Roundtable on Data Science Postsecondary Education: Motivating Data Science Education through Social Good. www.eventbrite.com/e/motivating-data-science-education-through-social-good-registration-51307330607


Thursday, November 1, 2018

The Derivative is not the Slope of the Tangent Line




The title of this article is not intended to imply that one cannot use the derivative to find the slope of a tangent line. My point is that we cannot and should not expect students to base their understanding of the derivative on the slope of the tangent.

When I teach either the first or second semester of calculus, I always begin with a short problem set to assess student understanding of a few key ideas. One of the first questions I pose is to give the students a simple cubic polynomial, say $x^3 + 6x$, and ask for both the average rate of change over a given interval, say $[0, 2]$, and the instantaneous rate of change at a particular value, say $x = 1$. Invariably, almost everyone, even at the start of Calculus I, can calculate the instantaneous rate of change. Almost no one gives me the correct average rate of change.

The difficulty is that finding the instantaneous rate is formulaic. If students remember nothing else from calculus, they know that differentiation turns $x^3 + 6x$ into $3x^2 + 6$. Asking for the average rate of change requires that they know what it means. I am certain that my students all saw average rates of change in their precalculus courses. They probably saw them again when they were introduced to the derivative in high school calculus. But in a calculus class, the average rate is merely a step in the development of the derivative: something the teacher talks about, not something students need to know for the exam.
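To make the contrast concrete, here is a short Python sketch that computes both quantities for the example $f(x) = x^3 + 6x$. The function names `f`, `average_rate`, and `f_prime` are illustrative choices, not anything from the problem set itself. The average rate over $[0, 2]$ is $(f(2) - f(0))/2 = 20/2 = 10$, while the instantaneous rate at $x = 1$ is $f'(1) = 9$. The final loop shows average rates over shrinking intervals $[1, 1 + h]$ closing in on 9, which is the link between the two ideas that the formulaic approach hides.

```python
def f(x: float) -> float:
    """The example cubic: f(x) = x^3 + 6x."""
    return x**3 + 6 * x


def average_rate(func, a: float, b: float) -> float:
    """Average rate of change of func over [a, b]: (func(b) - func(a)) / (b - a)."""
    return (func(b) - func(a)) / (b - a)


def f_prime(x: float) -> float:
    """Derivative of f from the power rule: f'(x) = 3x^2 + 6."""
    return 3 * x**2 + 6


# Average rate of change over [0, 2]: (20 - 0) / 2
print(average_rate(f, 0, 2))  # 10.0

# Instantaneous rate of change at x = 1: 3 + 6
print(f_prime(1))  # 9

# Average rates over [1, 1 + h] equal 9 + 3h + h^2 exactly,
# so they approach f'(1) = 9 as h shrinks (up to floating-point rounding).
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, average_rate(f, 1, 1 + h))
```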

The belief that average rates of change are unimportant is reinforced when, as in Stewart’s calculus, the derivative is introduced as the slope of the tangent line. The trouble is that slope is a problematic concept for many students. Identifying the derivative with the slope of a tangent line is meant to give students a geometric understanding of the derivative. Too often it does no such thing; instead, it short-circuits the development of an understanding of the derivative as describing the multiplicative relationship between changes in two linked variables.

The problematic nature of slope and rate of change was nicely documented in a paper by Cameron Byerley and Pat Thompson that appeared last year in the Journal of Mathematical Behavior. In the summers of 2013 and 2014, they gave 251 high school mathematics teachers a diagnostic instrument that required written responses.

The following is an example of the kinds of questions that were asked. Part B appeared on a separate page, and answers were written in pen, so that teachers could not go back and change their answer to Part A after seeing Part B.

Part A. Mrs. Samber taught an introductory lesson on slope. In the lesson she divided 8.2 by 2.7 to calculate the slope of a line, getting 3.04. Convey to Mrs. Samber’s students what 3.04 means.

Part B. A student explained the meaning of 3.04 by saying, “It means that every time x changes by 1, y changes by 3.04.” Mrs. Samber asked, “What would 3.04 mean if x changes by something other than 1?” What would be a good answer to Mrs. Samber’s question?

The point that Byerley and Thompson were getting at was whether teachers recognized 3.04 as a multiplicative factor connecting the change in x to the change in y. Earlier interviews had revealed that many teachers have a “chunky” understanding of slope: a slope of ¾ means that if you go 4 units to the right and 3 units up, you will be back on the line. One sign of a chunky understanding is an inability to find the increase in y when x changes by something other than 4. Another is the belief that a slope of –5/6 is different from a slope of 5/–6, indicating that the teacher understands a slope of a/b as a sequence of actions rather than a single number.
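The multiplicative reading of slope can be made concrete in a few lines of Python. This is an illustrative sketch, not part of Byerley and Thompson’s instrument, and the helper `change_in_y` is a hypothetical name. Using exact fractions, it shows that a slope of ¾ converts any change in x, not only a change of 4, into a change in y; that Mrs. Samber’s 3.04 works the same way (a change of 2.5 in x gives a change of 3.04 × 2.5 = 7.6 in y); and that –5/6 and 5/–6 are one and the same number.

```python
from fractions import Fraction


def change_in_y(slope: Fraction, dx: Fraction) -> Fraction:
    """On a line with the given slope, a change dx in x produces a change slope * dx in y."""
    return slope * dx


slope = Fraction(3, 4)

# The "chunky" reading covers only dx = 4: go 4 right, 3 up.
print(change_in_y(slope, Fraction(4)))     # 3
# The multiplicative reading handles any change in x.
print(change_in_y(slope, Fraction(10)))    # 15/2
print(change_in_y(slope, Fraction(1, 3)))  # 1/4

# Mrs. Samber's slope of 3.04 with x changing by 2.5.
print(float(change_in_y(Fraction("3.04"), Fraction("2.5"))))  # 7.6

# -5/6 and 5/-6 are the same number, hence the same slope.
print(Fraction(-5, 6) == Fraction(5, -6))  # True
```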

A chunky explanation of Part A, similar to the student’s response described in Part B, was given by 78% of the teachers. Part B was included to give them a chance to extend their answer to a multiplicative explanation. Only 8% of the teachers who gave a chunky answer to Part A provided a multiplicative response to Part B.

Further teacher difficulties with the concept of slope and rate of change are illustrated in the following two problems (Figures 1 and 2).

Figure 1. Item Called Relative Rates.
© 2014 Arizona Board of Regents. Used with permission.

Most teachers interpreted the information in Figure 1 as describing a difference, with 54% choosing answer (a). Only 28% chose (e).

Figure 2. Item Called Slope from Blank Graph.
© 2014 Arizona Board of Regents. Used with permission.

Only 21% of teachers were able to provide a reasonable approximation to the slope for the problem in Figure 2. Most were unable to give any numerical value.

Given teacher difficulties with the concept of slope, we should expect most of our students to enter calculus with an inadequate understanding of what it tells us about the relationship between the variables. While mathematicians hear “slope” and associate it with the multiplicative relationship between changes in the two variables, most of our students interpret it as nothing more than an arbitrary numerical description of the degree of “slantiness.”

Consequently, when we define the derivative as the slope of the tangent, we fail to convey the meaning that makes the derivative so useful. If we want students to understand this meaning, the derivative must be introduced in terms of a multiplicative relationship between changes in the variables. It must be grounded in a thorough understanding of what average rates of change tell us and what a constant rate of change actually implies.
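One way to write the multiplicative framing down, using standard notation rather than anything specific to this column, is to treat the average rate as the factor that converts a change in x into a change in y, and the derivative as the limiting value of that factor:

```latex
\begin{aligned}
\frac{\Delta y}{\Delta x} &= \frac{f(a+\Delta x) - f(a)}{\Delta x}
  && \text{average rate of change over } [a, a+\Delta x] \\
\Delta y &= \frac{\Delta y}{\Delta x}\,\Delta x
  && \text{the rate is the factor that turns } \Delta x \text{ into } \Delta y \\
f'(a) &= \lim_{\Delta x \to 0} \frac{f(a+\Delta x) - f(a)}{\Delta x}
  && \text{so } \Delta y \approx f'(a)\,\Delta x \text{ for small } \Delta x
\end{aligned}
```

For example, with $f(x) = x^3 + 6x$, $a = 1$, and $\Delta x = 0.01$, the actual change is $f(1.01) - f(1) = 7.090301 - 7 = 0.090301$, close to the prediction $f'(1)\,\Delta x = 9 \times 0.01 = 0.09$. A constant rate of change is exactly the case in which $\Delta y = m\,\Delta x$ holds with the same $m$ for every $\Delta x$.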

Reference

Byerley, C., and P. Thompson. 2017. Secondary mathematics teachers’ meanings for measure, slope, and rate of change. Journal of Mathematical Behavior 48:168–193.