The TeenScreen Debrief

2017-03-19

    As part of my first job hunt, I’ve had to talk about previous technical projects I’ve done. I worked on my most significant project last summer, so in order to better collect my thoughts and generally reflect on the experience, I’d like to tell you about it here.

    Call to adventure

    Last year, I applied to work with CS + Mental Health, a student group that connects Stanford CS students with Stanford medical school professors/researchers to work on meaningful projects.

    I was assigned to TeenScreen, a mental health screening tool used to identify teenagers at risk for depression, anxiety, substance abuse, and other mental health conditions. The original Java software was developed 10 years ago and hadn’t been updated since. This outdated version was still being used on around 13,000 adolescents every year.

    At a high level, the screening tool was a multiple-choice and free-response survey broken down into 20+ sections (e.g. depression, eating disorders, ADHD, etc). Adolescents would complete the survey, and based on the numerical scores calculated for each section and overall, the administrator could decide whether or not to recommend the adolescent for further professional help.

    CS + Mental Health’s goal was to develop an updated, web-based version of the screening software. There were some significant benefits, which weren’t all obvious to me at the beginning. In increasing order of utility:

    1. No installation: Previously, administrators would physically install a copy of the software on all the required computers (the test was often administered in school computer labs to multiple students at a time).
    2. Automatic aggregate reports: The Java software could show one test result for one student at a time, and administrators told me on phone calls that they would print out these individual reports and tabulate group summaries by hand. Why, if only there existed some tool that could perform such mindless and repetitive computations for them. Some sort of computer, perhaps.
    3. Scientific validation: The Java software only stored test results locally, which meant each school’s data was siloed. With a web app and a single database, administrators wouldn’t have to worry about backing up data or passing along inconsistent, unorganized file directories of reports to successors or researchers. Additionally, researchers could access anonymized test results from tests administered around the country. They could track national and regional mental health trends over the years, validate the effectiveness of different sections of the test, etc.

    High-level goals

    I was assigned as a project lead for TeenScreen along with another senior, who had experience with databases and stats; together we led a team of four. We decided on the following roadmap:

    1. Design the overall system structure and decide how to technically implement the survey
    2. Muck through the old Java software to scrape out the questions and responses
    3. Create working survey frontend
    4. Design database schema and store survey data
    5. Create frontend to access data and visualize reports

    Overall system design

    The test has 20 separate sections, each testing for a different mental health condition.[1]

    • Section:
      • sectionID
      • Symptom Name (e.g. “Depression”)
      • isMandatory
      • Array of section questions

    After the administrator had set up the test for the student, the student would respond to questions one-by-one. An audio recording of the question text would automatically play along for students who weren’t literate. The user would not be allowed to progress if no response was entered, but they could go back to change previous responses. Besides entering a response, the three actions available to the user on the screen were:

    • Go back one question
    • Go forward one question
    • Replay the audio

    There were also some expository screens explaining what the student would be tested on next. These also had audio recordings, but did not have responses that students could select.

    Some questions had followups that would be skipped if the user answered no, which led to a deceptive amount of complexity (explained further below).

    • Question:
      • ID
      • Question text
      • Response type (including “No response” for expository screens)
      • hasFollowups

    Although most questions were yes/no, some were frequency-based (all of the time, some of the time, seldom, etc) and others were multiple selection (check all that apply). In addition, there was a demographics section with unique response types and several free response questions as well.

    In order to calculate the symptom scores, the numeric value of each response had to be tracked so the values could be summed up at the end (a rough sketch of this scoring follows the schema below).

    • Response:
      • Response type
      • Text for each response
      • Numeric values for each response
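
    As a minimal illustration of that scoring, here’s a sketch in the style of the app’s JavaScript; the answer fields here (sectionID, responseValue) are illustrative stand-ins rather than our exact schema.

        // Minimal scoring sketch (hypothetical field names, not the real schema).
        // Each answer records which section it belongs to and the numeric value of
        // the selected response; a section's symptom score is just the sum.
        function computeSectionScores(answers) {
          var scores = {};
          answers.forEach(function (answer) {
            scores[answer.sectionID] = (scores[answer.sectionID] || 0) + answer.responseValue;
          });
          return scores;
        }

        // e.g. two depression answers worth 2 and 3 points plus one anxiety answer worth 1
        // yield { depression: 5, anxiety: 1 }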

    Web tradeoffs

    There were some tradeoffs to creating a web-based app that we identified over the first few weeks. The major downside was that schools would need Internet connectivity, which isn’t ubiquitous or consistent around the country. In addition, students might accidentally (or deliberately) escape from the survey given that it was just administered in a web browser.

    Concurrency wouldn’t be a significant issue with this web app. There would never be a situation where two students would be sharing a screening session since each test was taken individually. Since the reports were all being stored to the same database, we could guarantee that each report would have a unique ID (unlike if two administrators used the same naming convention with their local copies of the Java software). In addition, we did not allow administrators to edit previously submitted surveys, so reports could only be accessed with read-only permissions.

    Overall, a web app provided many benefits over the existing software and these tradeoffs could be mitigated further down the road.

    Now that we had a general sense of what we were building, we could move forward.

    Recreating data

    There was no digital copy of the TeenScreen questions and responses, so we had to recreate them by going through the Java software. The reports that the Java software generated included the full question text along with the responses, so I was able to scrape a dummy report with Python and port it into a CSV file, which would allow future non-technical contributors to update or modify questions without mucking through the codebase.

    Meanwhile, the other team members determined there were 11 total response types. The response types and their respective text for the multiple-choice responses were also ported into CSV files.

    Create working survey

    Only one other team member had web development experience, so he and I worked together on implementing the survey frontend. Since he had more experience than I did, I deferred to his decision to use the MEAN stack. We decided that Angular wouldn’t be necessary since our application was mainly going to be form inputs, which HTML already took care of. What fools we were (explained further below).

    To address the risk of a student accidentally escaping from the survey, the web app would automatically expand to full-screen, and we blocked the default key bindings of the Escape and Backspace keys. This didn’t completely solve the problem since hovering the cursor at the top of the screen would bring down the browser bar, but it was a big improvement in terms of preventing user error.
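
    Here’s a minimal sketch of what that behavior looks like; it isn’t our exact code, and modern browsers additionally require the full-screen request to come from a user gesture (e.g. the click that starts the survey).

        // Expand to full-screen (vendor-prefixed fallback for older WebKit browsers).
        function enterFullscreen() {
          var el = document.documentElement;
          if (el.requestFullscreen) {
            el.requestFullscreen();
          } else if (el.webkitRequestFullscreen) {
            el.webkitRequestFullscreen();
          }
        }

        // Swallow Escape and Backspace so students can't accidentally leave the survey,
        // while still allowing Backspace inside free-response text fields.
        document.addEventListener('keydown', function (event) {
          var inTextField = /INPUT|TEXTAREA/.test(event.target.tagName);
          if (event.key === 'Escape' || (event.key === 'Backspace' && !inTextField)) {
            event.preventDefault();
          }
        });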

    By the end of the school year, we had implemented login authentication, survey initiation (i.e. inputting student ID, some other fields, and which sections to test for), and progression through survey questions with basic error checking. The survey was still buggy and the backend and report generation still had to be implemented. I decided to see this project through to completion, so I worked on it by myself for the summer (other team members had summer plans).

    Caching

    The followup question implementation was deceptively tricky. Let’s say there are 20 questions total in section A, with followup questions starting at question 15. If the user answered question 14 in a way that skipped the followups, it would auto-progress to section B. However, if the user decided to go back, the app couldn’t just serve question 20 from section A. It would have to remember which question was last answered, so the app would have to either hold all previous questions and responses in memory or retrieve them from the database (assuming there was a write after every section). In addition, should everything after the changed response be reset or preserved?

    First, let’s consider how often to write to the database. Per-section writes to the database would be better in case a user accidentally exited the test or just didn’t finish. However, if a user changed an answer to a question that also had followup questions, the app would have to know which followup responses to modify or clear in the overwrite. It would certainly add complexity to figure out how an administrator would find and retrieve an incomplete survey from the database and then repopulate the survey with the filled responses. Given my timeframe and priorities, I decided that the app would write to the database just once, at the end of the survey. If a student exited prematurely, they would just have to start over.

    I ended up using an array to keep track of which questions had been shown and answered. If a user clicked “previous,” it would pop off the last element to re-render. Re-rendering was easy because the app used jQuery’s toggle() function to hide every question except for the current one. It wasn’t the most theoretically efficient way, but it worked given the speed of modern browsers and the relatively small number of questions in the survey.[2]

    Clicking “next” would call a helper function to determine whether to proceed to the next sequential number or jump to the start of the next section.
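
    Here’s a simplified sketch of that navigation flow. The element IDs, the skipsFollowups helper, and the firstQuestionOfNextSection helper are hypothetical stand-ins for the logic described above, and it assumes questions are numbered sequentially across the whole survey.

        // Stack of question IDs the student has actually answered; this is what lets
        // "previous" skip over follow-ups that were never shown.
        var answered = [];

        function showQuestion(questionId) {
          $('.question').hide();               // hide every question...
          $('#question-' + questionId).show(); // ...and reveal only the current one
        }

        function goNext(currentId, response) {
          answered.push(currentId);
          // Either jump to the start of the next section (when follow-ups are skipped)
          // or simply advance to the next sequential question.
          var nextId = skipsFollowups(currentId, response)
            ? firstQuestionOfNextSection(currentId)
            : currentId + 1;
          showQuestion(nextId);
        }

        function goPrevious() {
          if (answered.length === 0) return;
          showQuestion(answered.pop()); // re-render whatever was last answered
        }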

    As for resetting versus preserving: aside from the followup questions themselves, I decided all other filled responses would be preserved, since they were in separate sections from the changed response.

    It was hard to say how crucial some of these concerns were without user testing, but I still had to make these decisions.

    Design database schema and store survey data

    Now I was getting into database territory, which I hadn’t really done before. My aforementioned co-project lead had other summertime obligations but had written some starter code during the school year for me to use as a jumping-off point.

    Using MongoDB was a decision we made early on, and I didn’t consider researching the alternatives. It was the “M” in MEAN, so it was legit, right? Over the summer, I discovered there’s quite a bit of backlash over MongoDB. After taking a databases class this fall quarter, I also see a lot of benefits to using SQL. For what it’s worth, the lack of schema requirements made it easy to iterate on the fly, and it was easy to get set up with Node, which was important to keep the project moving forward.

    The biggest question was whether to store data by student or by disorder. “By student” was the logical choice at first. At the end of the survey, just package everything up into JSON and store the test as one document. This made for easy retrieval for an individual report and was intuitive with MongoDB’s document database paradigm.

    However, if TeenScreen was to be used for aggregate reports, then storing data by section (and using a SQL-based database) would lead to better performance. If a researcher wanted to see how many students were at risk for depression in a certain month, simply do a sum on the Depression table for the given timeframe. If it was stored by student using MongoDB, summation would require O(n) time to access the Depression score of each corresponding survey.
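
    For illustration, here’s roughly what the two access patterns look like under the “by student” approach, in mongo-shell syntax. The collection name, field names, and the at-risk threshold of 6 are hypothetical, not our actual schema.

        // Hypothetical "by student" document: one document per completed survey.
        // {
        //   studentID: "12345",
        //   schoolID: "example-high",
        //   completedAt: ISODate("2016-08-01"),
        //   sectionScores: { depression: 7, anxiety: 3 }
        // }

        // Counting students at risk for depression in a given month means touching
        // every survey document from that month and reading one nested field:
        db.surveys.aggregate([
          { $match: {
              completedAt: { $gte: ISODate("2016-08-01"), $lt: ISODate("2016-09-01") },
              "sectionScores.depression": { $gt: 6 }
          } },
          { $group: { _id: null, atRiskCount: { $sum: 1 } } }
        ]);

    With a per-section table in a SQL database, the same question would reduce to an indexed filter and a count on a single table.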

    Another area I was unsure of was whether some data belonged on the client or server side. For example, should I have a field that indicates whether a student is at risk for a certain disorder (hasDepression = true), or is that something that the client-side should just visually represent based on the score (if (depressionScore > 6) { style = warningColor })?

    Overall, the database design required more insight on how data would be accessed and what the storage constraints would be. Obviously, the schema debate would be moot if we could just store both by student and by section, but because the data had to be stored in a HIPAA-compliant manner on an academic budget, there were concerns of accumulated cost over time.

    After I had finished my work during the summer, the professor received third-party advice to use REDCap, which honestly seems to cover a lot of the frontend and backend functionality I worked on. Had we known about it earlier, we would have implemented the backend in a totally different manner. As it stands, the app requires custom PHP hooks to properly route the data between server and client.

    Miscellaneous

    I initially set up the app to store test data by a combination of school and student ID, as this would uniquely identify a student. Thus, the app would check whether the key was already in MongoDB, and if it was, overwrite the old version of the test. This seemed like a reasonable way to handle the case where a student didn’t complete their test: the partial attempt would be overwritten when they started over, so the administrator would never see partial data.

    However, after learning that administrators actually tested students repeatedly to track changes, I changed the app to always create a new MongoDB entry. This actually simplified the record creation process because I no longer needed to do a key-existence check. On the flip side, the URL for a rendered report used to be the student ID, which was very clean. Now, the URL was a long alphanumeric string based on the MongoDB _id field (a minor tradeoff in the grand scheme of things).
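
    In mongo-shell terms, the change boiled down to something like the following (the collection and field names are hypothetical, and surveyResults stands in for the assembled survey JSON); the first approach is effectively an upsert keyed on the school/student pair, while the second is a plain insert.

        // Before: one record per (school, student), overwritten whenever a test is retaken.
        db.surveys.updateOne(
          { schoolID: schoolID, studentID: studentID },
          { $set: surveyResults },
          { upsert: true }
        );

        // After: every completed survey becomes its own document, so repeat screenings
        // are preserved, and the report URL is keyed off the generated _id instead.
        db.surveys.insertOne(surveyResults);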

    Based on my phone calls, administrators still wanted the option to print and save reports locally. In terms of UX, it’d be pretty lame to tell them to use the browser’s built-in “Print” and “Save as” functionality, so I looked into this after implementing the major functionalities above. The main hurdle here was that although the CSS formatting for the reports was really nice for the web,[3] it was unwieldy for print. Generous whitespace between rows and a column-based view meant an excessive number of pages to print. The workaround involved custom classes in the @page CSS selector to expand columns, strip fancy formatting and line spacing, and hide headers and sidebars when printing.

    Although it was easy to update the wording of the questions after we had ported them to CSV, there was no easy way to update the corresponding audio recordings. Those would have to be re-recorded (in both English and Spanish). I ended up using a dummy mp3 file[4] to test the audio replay functionality and documented a file-naming system for future recordings that mapped to the ID field in the questions CSV, reducing the need to wade through the codebase for future updates and coupling the audio with its corresponding question.
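
    As a hypothetical illustration of that naming scheme (the paths and IDs here are made up), the app only needs to derive the file name from the question’s CSV ID, so a re-recorded question just means dropping in a new file with the same name:

        // Build the audio path from the question's CSV ID and the chosen language.
        function audioPathFor(question, language) {
          return '/audio/' + language + '/' + question.id + '.mp3'; // e.g. /audio/en/dep-03.mp3
        }

        new Audio(audioPathFor(currentQuestion, 'en')).play();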

    There was also one frustrating day where nothing was rendering because it turns out the CSV files I had modified had to be saved specifically as Windows-formatted CSV.

    Big takeaways

    Over the next few months, I learned a tremendous amount by sheer necessity, from methodically tackling bugs to prioritizing design decisions on a deadline. It’s amazing how much complexity arises out of even a simple system. When I started the project, it seemed like a relatively easy task to combine a bunch of form inputs together and display the results, but scalable maintenance was trickier than expected. Turns out the devil is in the details.

    Constants

    Given the number of CSV files I was dealing with, there were a lot of arbitrary integers floating around that represented sections, questions, response types, etc. The mixture of integers and string representations of integers also made comparisons tricky. To my credit, I did declare these as constants, but it would have made sense to keep them all in a separate file to import. Same goes for the helper functions that weren’t directly tied to making the survey work. In general, I wasn’t familiar at the time with the concept of modules or bundling in JavaScript. I knew how to import .h files in C and C++ for school courses but didn’t think about how the same modularity could be applied to web development. I’ve since delved into the great debates over Grunt, Gulp, Webpack, Rollup, etc, and I’m blown away by how much choice and community support web developers have for their tools.
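
    A sketch of what that separate constants module could have looked like in our Node setup; the specific numeric codes here are invented, though the count of 11 mandatory sections and the no-response type for expository screens come from the test itself.

        // constants.js: shared magic numbers, importable from anywhere in the app.
        module.exports = {
          RESPONSE_TYPES: {
            NO_RESPONSE: 0,  // expository screens with nothing to select
            YES_NO: 1,
            FREQUENCY: 2,
            MULTI_SELECT: 3
          },
          MANDATORY_SECTION_COUNT: 11
        };

        // survey.js
        var constants = require('./constants');
        if (question.responseType === constants.RESPONSE_TYPES.NO_RESPONSE) {
          // render an expository screen with no inputs
        }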

    View library/framework

    Remember when I mentioned that we decided against using Angular and stuck with just basic HTML form inputs and jQuery? That made life a lot harder for me during the summer. Vanilla HTML was easy to work with at first, and the files were clean since we used Jade formatting.[5]

    Besides, why would we spend crucial weeks learning something new if jQuery would work? This early decision led to a big mess of spaghetti code by the end of the summer. As the complexity of the app grew, it became increasingly difficult to reason about the state of the UI and data, given their tight coupling. Modifying pre-existing code was like pulling a block out of the bottom of a Jenga tower — I was never sure what would topple from above.

    I began learning about React and other view libraries/frameworks that summer, but given that my main priority was to have a fully-functional deliverable by the end of the summer, I decided it was best to forge onward rather than start over with a framework I’d have to learn from scratch. I did find the time to work through several tutorials on various topics (Redux, Meteor, TypeScript) to prepare me for future projects.

    I say all this over half a year later, and I’ve learned and grown since then. It’s possible 2016-Kevin would have been overwhelmed with picking from so many choices on top of actually implementing the deliverables. At the end of that summer, I could at least say I delivered what I promised without the onset of JavaScript fatigue.

    Testing

    I’ve only recently begun learning about test-driven development (mainly from Sandi Metz), but I can see its benefits based on my own challenges (especially for the followup question implementation I mentioned above). Over the course of the summer, I wasted hours initializing dummy surveys and rapidly clicking through them just to reach a specific section for testing, when I could have created a test harness with something like Tape or Intern.
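
    A sketch of what such a test might look like with Tape, using the section-A example from earlier (20 questions, followups starting at question 15); nextQuestionId is a hypothetical pure helper extracted from the navigation logic so it can run outside the browser.

        var test = require('tape');
        var nextQuestionId = require('../lib/nextQuestionId'); // hypothetical module

        test('skips follow-ups when the screening question is answered no', function (t) {
          // Answering "no" on question 14 should jump past the follow-ups to section B.
          t.equal(nextQuestionId({ current: 14, response: 'no' }), 21,
            'jumps straight to the first question of the next section');
          // Answering "yes" should proceed to the first follow-up question.
          t.equal(nextQuestionId({ current: 14, response: 'yes' }), 15,
            'proceeds to the first follow-up');
          t.end();
        });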

    Aside from unit testing with TDD and BDD, cross-browser compatibility testing would have been helpful too since school computer labs across the nation use a wide variety of OS versions (both desktop and tablet) and screen resolutions.

    The entire framework

    I can’t think of any specific advantages a Node stack has for the TeenScreen app. As a team of students, we had heard of MEAN and were eager to learn it. From what I’ve read since then, Node shines for real-time I/O (e.g. chat), which this app certainly doesn’t require. If I wasn’t thinking about job marketability, I could probably redo the entire app with a couple Ruby on Rails generators.

    However, as a student, I now appreciate the opportunity to learn the nitty-gritty of how web applications are built rather than letting Rails magically abstract away a great deal of the heavy lifting. TeenScreen was a relatively straightforward project, so I was able to work with all aspects of the stack without getting too stuck in unexpected complexity.

    Mentorship

    This all speaks to a larger pitfall of working on my own — although the freedom and independence of solo work is nice, I didn’t have any mentors or peers to turn to for best practices or debugging help. Stack Overflow and Medium articles can only get you so far (I can’t Google for help if I don’t know what I don’t know). Someone once asked me if we had any technical validation on what we were doing during the school year, e.g. an industry developer who could provide feedback on our progress. We didn’t, and it never even occurred to me that we could have asked someone with experience to look at our codebase or system design.

    I would recommend that CS + MH look into such mentorship for future student projects. We had access to CS professors, but CS professors often stopped doing hands-on coding after their dissertations. So industry professionals then! From my experience, getting a group of Stanford students to meet is like herding a bunch of nerdy cats. Adding a full-time adult into the mix would certainly add to that complexity, but the extra effort could save untold man-hours down the road.

    Unexpected hurdles

    The Stanford professor spearheading the project had himself inherited TeenScreen and was non-technical. So, although he could provide us with feedback about user needs and HIPAA compliance, there were gaps in his knowledge. To give an example from the beginning, we were given access to a deeply-nested Box folder and knew the original Java-based app was in there somewhere, but it took the combined effort of several team members to actually find the correct directory.

    Later on, we realized we didn’t know how scores were calculated or what the thresholds were for a student to be considered at-risk for a mental health condition. The best we could do was run dummy tests and try to reverse-engineer the scoring. It wasn’t until much later that I found a manual (also deeply nested in the Box folder) that detailed the scoring, and even that was poorly-documented (turns out there are two different versions of the tests; the shorter one is used for younger children).

    Two of the team members were assigned to the group by CS + MH for their graphic design experience rather than technical ability. As we defined the scope of the project early on, we realized there wasn’t much need for graphic design on this project. Still, I made sure everyone was included in the system design discussions, and everyone was able to help out with data scraping from the Java software. As we moved into the frontend phase of the project, I asked them what they wanted to get out of their CS + MH experience and provided them with tutorials on basic web development and Git to work on.

    Students could get independent study units from participating in CS + MH, so it was also my responsibility to monitor and evaluate my team members at the end of the quarter. Out of privacy concerns, I won’t get into this too much, but I had to engage in some difficult conversations at the end of the quarter.

    My main takeaway from such experiences was that the technical challenges aren’t always the most difficult ones. At the very least, I had the agency to define the technical challenges we would tackle, whereas the real world doesn’t care much about my agency.

    Conclusion

    So that’s TeenScreen. I delivered the prototype as summer ended, with all of the promised functionality in place. I couldn’t commit to continuing work as I began taking classes again and TAing (and applying for jobs), but another group of students is currently working on the REDCap integration.

    I’m immensely grateful for the opportunity to work on TeenScreen and the responsibility I was entrusted with. I learned not only about full-stack web development, but also determining user needs, leading a team (with technical and non-technical members), and meeting deadlines.

    Moreover, I learned first-hand that there are still vast inefficiencies in a variety of industries that can be reduced or eliminated with computer science. The new-and-improved TeenScreen will not only save administrators countless man-hours from printing out and tallying scores but also provide researchers with invaluable data to further validate and improve TeenScreen’s utility. The conversation around mental health is growing in the United States, and I hope my contributions will add to this conversation, improving early detection and rehabilitation efforts.


    1. The original Java software allowed the administrator to un-select any of the sections, which I found baffling once I learned that 11 of the sections were mandatory for the final symptom score calculations.
    2. Another area which could have been addressed more elegantly with a view framework.
    3. I included a table-of-contents to jump between sections, sortable tables for each section, color-coded recommendations for further evaluation, and more!
    4. I used an actual soundbite at first, but changed it to a snippet of silence after hearing the same phrase one too many times during testing.
    5. Now known as pug due to dumb trademark issues.