Docker and R – Talk for the Greater Cleveland R Group

I was able to give a talk recently to the Greater Cleveland R Group about using Docker and R together. Docker is a platform that uses OS-level virtualization to run software in containers. It’s pretty neat, so I wanted to share the talk here. Big thanks to Tim Hoolihan for organizing the R User Group in Cleveland, and for filming and nicely editing my video when part of the presentation unexpectedly did not work.

On Encryption, Part 2: Of Rung Bells and Cats Out of Bags

Last time, we discussed the basics of encryption, and talked about concepts like security through obscurity, Kerckhoffs’ principle, Linus’s law, and the fundamental challenge of encryption (the adversary and you). In this post, we’ll discuss how those concepts apply to ideas like backdoors and deliberate flaws in encryption software, and then I’ll end with a few observations about the modern computing landscape and where I think we ought to go.

For the previous post, click here.

On Encryption, Part 1: The Morality of Mathematics

I have been privileged throughout my life to be surrounded by many intelligent and articulate people. Between friends, colleagues, and family members, I regularly have a chance to engage in real, honest, and meaningful discussion on a wide range of subjects. I value and cherish this fact for numerous reasons, not the least of which is that it forces me to be better: to be more rigorous, to be more intellectually honest, to be more proactive in examining the world. A few days ago, when Apple released their letter to customers regarding the San Bernardino case, I had such an engagement with my friend James. Our conversation reaffirmed my desire to write this series of posts, discussing this subject and trying to both inform and advocate, in a way that hopefully distinguishes between the two.

This first post will be mostly a discussion of facts and terminology. A lot of people I know, especially my less technically-minded friends and colleagues, are not really aware of the state of modern cryptography, and the subject is woefully misconstrued in the popular press and often in our political system. People cannot have a meaningful discussion about this important topic when so many of us operate from a position of relative ignorance on encryption. Hopefully, I can do a little to change that.

My second post will finalize some technical distinctions, and will then offer some arguments about how we respond to encryption in our modern lives.

I’m going to explicitly avoid discussing the particulars of the Apple case, as it is a little different from some of the broader items that I’ll discuss below. I’ll see about possibly doing a follow-up on it at some point, or you could just consider reading these several items that I’ve found educational:

My R Coding Convention

It seems like many R programmers (probably, many programmers in general) end up writing a post of this type, so I decided to jump on the bandwagon. I recently switched jobs, so I am at a nice point to make a “fresh start” with my coding conventions: I am not facing the need to refactor years of my programs to be consistent.

On top of this new start, over the last few months, I’ve caught my R conventions evolving – and also becoming inconsistent. The genesis of this was my realization, thanks to Hadley Wickham’s style guide, that embedding dots in the names of my user-defined functions was actually rather bad practice, since S3 names its methods by joining the generic and the class with a dot (e.g. plot.function, plot.ts, and so on). Prior to that, I had been fairly consistent in naming my user-defined functions using the prefix func. (for instance, func.query_model_data).
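To make the S3 issue concrete, here is a minimal sketch of how a dotted name can be picked up as a method by accident. The helper summary.scores and the class "scores" are hypothetical names invented for this illustration; they are not from my actual code.

```r
# Hypothetical helper, named with a dot purely as a naming style.
# It was never meant to be an S3 method.
summary.scores <- function(x, ...) {
  "helper called"
}

# An unrelated object that happens to carry the class "scores".
exam <- structure(list(values = c(90, 85)), class = "scores")

# summary() is an S3 generic, so it looks for a function named
# summary.<class> and finds the helper defined above.
result <- summary(exam)
print(result)
```

Because summary() dispatches on the class of its argument, the helper gets called in place of the default summary method. A name without dots, such as summarise_scores, could not be mistaken for a method this way.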

Anyway – ever since I realized this issue, my programming style has been kind of fluctuating, because I never sat down and decided what to do now that my intuitive approach was not ideal. Today, I’m aiming to rectify that!

Intro to Shiny – Talk for the Greater Cleveland R Group

I was able to give a talk recently to the Greater Cleveland R Group about Shiny, the R package that lets you build web apps for data visualization and analysis using R. It’s a pretty neat package, and I wanted to share my slides, code, and presentation here.

My thanks to Gaurav Narain Saxena for recording the presentation for me!

We Stopped Dreaming (Part 2)

I posted the first part of this several years ago, but hadn’t realized until tonight that there was a second part, which I’m posting now. Take five minutes to listen to the wonderful Neil deGrasse Tyson talk about space and culture.

Write your own! On having better habits as an R programmer

I contribute responses to Stack Overflow pretty frequently. I like answering well-written questions and enjoy that it keeps my skills sharp. However, one area of annoyance for me on Stack Overflow is that many answers start with “You can do this using the <insert package name here> package” – even when the task at hand can be handled in base R. For many posters, that’s probably not a big deal, but I find myself getting those answers occasionally on my questions, even when I explicitly ask for base R solutions.

“So what?” I can hear you asking it already, and it’s a valid question. After all, one of the great benefits of R is that you can tap into the collective talent of thousands of statistical programmers across the globe. In part, that’s what makes R such a powerful tool for data scientists and statisticians – the fact that it is, for all intents and purposes, the “bleeding edge” of statistical methods development. If you want to find someone working with a new type of analysis, you look for their R code. You can know for sure that it won’t be included in SAS for at least five years, if ever. (That’s not a slam on SAS, per se – it’s a recognition that the two tools are used for different things.)

But I suggest that there are many reasons to limit use of third-party packages, and that in the context of Stack Overflow, relying on them is as much a detriment as it is a benefit. So, my proposal is this: the default position of all R programmers (and especially new R programmers) should be to “do it in base R” for a lot of bread-and-butter tasks, and external packages should be limited to a) specialized tasks that would take an inordinate amount of time to code manually, or b) analytic methods where the published packages are written by the people developing the methods. (To be clear, here, I define “base R” as including the packages that come with R in a clean install.) Forcing yourself to write your own solutions will make you a better R programmer, and will make your code more sustainable over the long run.

This One Chart Perfectly Sums Up Why Most Posts That Start This Way Are Total Lies!!!

How many times have you heard this line before? “This one graph perfectly sums up the current plight of Millennials!” “This one chart shows everything about global warming in a nutshell!” It’s one of the more common clickbait formulas, but as a data science professional and Edward Tufte fan, I just can’t take it anymore. I decided I’d make my own graph explaining this.

Public Health Informatics and the Future of Public Health

I wrote this essay almost six years ago, while I was in graduate school working on my MPH. I stumbled across it recently and thought to share it, especially given the direction in which my career has moved.

Public health informatics represents an exciting area of future growth in public health, where many different disciplines are applied to solving practical social problems. By attaching established public health practices to the rapidly expanding technological tools available in our modern society, whole new avenues of analysis and intervention can be developed for public health practice. Harnessing the information available thanks to this fusion of technology and practice will alter, and may fundamentally change, the function and nature of public health. New challenges will arise with new frontiers in informatics, and preparing for these challenges – by understanding the power of informatics, as well as its limits and dangers – is a necessary task for all public health practitioners.

Public health informatics can be defined as the “Systematic application of information and computer science and technology to public health practice, research and learning.” We can simplify the definition by stating that informatics deals with how to acquire, store, and utilize data in effective ways. A central issue in informatics is, therefore, data – more specifically, the problems associated with gathering data, presenting data, and using data.

Learn to Code

I love this video. I mean, I just love it. It inspires me about our future, and it truly moves me to see some of the titans of information technology talking about how they first discovered coding. Listening to them talk about the thrill of making a computer do something on their command reminds me of my first heady days of programming.