Back to school — 2nd master's-thesis learnings (after a few years in the real world)

Getting back into thesis-writing mode, combining it with some of the knowledge gained outside academia 👨‍🎓, has been… a ride 🎢

Jonas Tillman
7 min read · Apr 2, 2021


Photo by Ben White on Unsplash

TL;DR

I finished all the courses for my M.Sc. in Geographic Information Systems (GIS) many years ago. Finding a thesis topic took time and many iterations.

The planning and writing process, my first since my Master's thesis in Econometrics/Statistics in 2013, has been tough, but now I'm seeing the light.

Since a thesis is basically a lonely solo project where you get to decide everything, you can learn… a lot.

Very few learnings were the ones I expected, or even related to the thesis topic. Many came from combining work learnings and tools (some from software engineering, some not), and from putting academic work processes in perspective.

  1. True “Agile” — Don't force your first direction; adjust. Sometimes that means accepting that a project isn't possible within an agile organization.
  2. Accept “good enough” sometimes — Ugly things that work can save a lot of time
  3. Processes are key — tools can force processes, and “force” is not bad (a.k.a. “Git everything”)
  4. Plan for no results and do some “future proofing” — while academic honesty is a breath of fresh air
  5. Take advantage of an excuse to learn and combine learnings
    • Work experience ⇾ the user/use cases, software dev, and problem solving = no stress
    • Earlier academics ⇾ visual inspection, old rusty statistics and R-skills
    • Where they intersect ⇾ LaTeX, reproducibility

Background

My thesis is not yet published, and the topic is not of key interest for this article, but in short:

Every day, people around the world use maps on their phones or on webpages to find information on businesses, traffic, and other data. Displaying data that is tied to an area but has no cartographic representation of its own (unlike, for example, roads or houses), such as average population or COVID-19 rates, is difficult, because the data can cover up the geographic context. A common method is to make the data see-through, so that part of the map background remains visible. This study looks at how users can be supported in understanding the values this type of data represents, using different legend designs.
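As a rough sketch of the technique (hypothetical code, not the study's implementation): on an HTML5 canvas, the data layer can simply be drawn with reduced opacity on top of the basemap.

```javascript
// Hypothetical sketch: draw a see-through thematic layer over a
// basemap so the geographic context stays visible underneath.
const canvas = document.querySelector("canvas");
const ctx = canvas.getContext("2d");

function drawOverlay(basemap, dataLayer) {
  ctx.globalAlpha = 1.0;          // basemap at full opacity
  ctx.drawImage(basemap, 0, 0);
  ctx.globalAlpha = 0.6;          // data layer is see-through
  ctx.drawImage(dataLayer, 0, 0); // roads/houses still show through
  ctx.globalAlpha = 1.0;          // reset for whatever draws next
}
```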

Participants were shown example map images in a web browser, where only the legend type changed between images. A marker, similar to the kind used in Google Maps, was placed at a location in the data, and participants were asked to guess the value at that location using the legend. All responses were saved, and after doing this for 10 images (2 for every legend design) the users were asked how useful each legend type was for guessing the value.

Hm… GIF colours a bit limited….
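In sketch form, each recorded trial boiled down to something like the object below (field names invented for illustration; the real schema lives in the repos linked at the end).

```javascript
// Hypothetical shape of one recorded trial (illustrative names only).
function recordGuess({ legendType, imageId, trueValue, guessedValue }) {
  return {
    legendType,                      // which legend design was shown
    imageId,                         // which of the 10 stimulus images
    trueValue,                       // actual value at the marker
    guessedValue,                    // the respondent's estimate
    error: guessedValue - trueValue, // signed estimation error
    answeredAt: Date.now(),
  };
}
```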

Learnings — Many many many!

(1) True “Agile”

Photo by Eden Constantino on Unsplash

(a) Don’t try to force a thesis topic into agile organizations

In short: If it’s too damn hard, accepting and adjusting is actually agile.

For the longest time, I tried to come up with a GIS topic that would let me conduct the study within the frame of my day job.

From finding coordinates in image frames from aerial photos, to HD map representations and accuracy requirements, to map updates… nothing remained clear enough to be the focus of a longer study.

If the timeframe for a study needs to be ≥ 1 year, agile company priorities will change underneath it. Then it can be better to just go at it yourself 💪

(b) Small-scale testing can teach you a lot, and qualitative learnings can complement quantitative results

This could be long, so short version:

  1. I had planned to distribute the data collection tool online.
    Testing showed that user understanding needed to be checked first, in order not to get too much noise in the data.
    Adjusting to collect data on only one laptop, in a more qualitative way, was better.
  2. I had hoped to collect many proxy variables.
    Seeing people interact with the data collection tool made it clear they would be useless 😿
  3. Don’t overdo it (keep it simple) — I had planned to include a colour picker together with the value picker.
    This would have given feedback to the user (= learning effects) and made it impossible to isolate effects in the collected data.

These data points from early testing allowed me to adjust the way the study was conducted. Going back to collect data again would have been a nightmare 😅

(2) Accept “good enough” sometimes

Ugly code can save you time. If it’s not likely to survive and be changed, then just make it work ASAP.

Below is some of the worst code… I have ever written.
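(The embedded gist doesn't reproduce well here, so below is a made-up miniature in the same spirit: staggered setTimeouts that "just work", with closures keeping everything alive. It is not the actual code, which is in the repos linked at the end.)

```javascript
// Made-up miniature (NOT the actual gist): fire a timeout per frame
// and hope the delay is long enough for each canvas to have rendered.
const results = [];

function captureFrames(canvasIds) {
  canvasIds.forEach((id, i) => {
    setTimeout(() => {
      const canvas = document.getElementById(id);
      // Every callback closes over `results`, holding it (and all
      // captured frames) in memory until the last timeout fires.
      results.push(canvas ? canvas.toDataURL() : null);
    }, 500 * i); // magic number, fingers crossed 🤞
  });
}
```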

Those timeouts and likely memory hogging… ugh

But it does connect to this… so whatevs 😇

Also not great, but used once and stored forever on Github, and… slightly less awful

Other parts of the project were better, but the trade-off was always there (do I need to come back to this, or is it a one-off?).

(3) Processes are key — tools can force processes (a.k.a. “Git Everything!”)

I will not introduce what Git or version-control software is here — see https://www.atlassian.com/git/tutorials/what-is-version-control

Git is really a life-saver:

  • It forces you to organize your work packages in your head
  • You feel progress continuously
  • You are not afraid to test things (you can always go back)
  • Git keeps you on track! It is difficult to work with git without creating a workflow and breaking larger problems down into smaller pieces.
  • More areas need a tool as good as Git!

Photo by Kirill Pershin on Unsplash

(4) Plan for no results and do some “future proofing”

(a) P-fishing is for suckers (also known as “data dredging”, “data fishing”, “data snooping”, “data butchery”, “significance chasing”, “significance questing”, “selective inference”, or “p-hacking”)

Some key things:

  • Academic honesty is a breath of fresh air
  • Finding no statistically significant results is also an interesting result

(b) Instead, look elsewhere: future-proof for less “hard” results

In my data collection efforts, it became clear pretty early that there would be no strong results separating the different legend types…

Luckily, I also wanted to see how much users liked the different types and included that in the survey!

In this there are actually some more interesting results…

Photo by Tim Mossholder on Unsplash

(5) Take advantage of an excuse to learn and combine learnings

(a) Work Experience Learnings

  • The USER: not everyone learns and takes in information the same way.
    I am a combination of a visual and a reading/writing learner; that is not true for everyone. Some people rely more on listening, or on hands-on learning.
    (See the need for (1b) Small-scale testing)
  • Related to the USER.
    Academic wording can sometimes “irk” you.
    Your user is a “subject” 🥴.
    Nah, for me they will be “USERS” or “RESPONDENTS” 🥰
  • Things will happen, don’t freak out, stay agile-ish (see section 1)
  • Simple factory functions FTW! (unless you write… library code that needs real optimizing)
    Simply return an object with its own functions — readable, and easy for other developers to understand. If you don't have thousands of the same object… why optimize for memory? Chances are that where the browser stores the functions will be close to the data anyway. If you need more reusability… throw in a mixin 😊 (see the sketch after this list)
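A minimal sketch of that pattern (illustrative names, not code from the thesis repos):

```javascript
// Factory: return a plain object carrying its own functions.
function createMarker(x, y) {
  return {
    x,
    y,
    moveTo(nx, ny) { this.x = nx; this.y = ny; },
    describe() { return `marker at (${this.x}, ${this.y})`; },
  };
}

// For more reusability, spread a mixin onto the object.
const draggable = {
  startDrag() { this.dragging = true; },
  stopDrag() { this.dragging = false; },
};

const marker = { ...createMarker(10, 20), ...draggable };
marker.startDrag();
console.log(marker.describe()); // "marker at (10, 20)"
```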

(b) Old study instincts

  • Always do visual inspection; if you can't see the results, they are not there
  • You forget things you used to know… stats and modeling were a bit rusty, but they pick up quickly once you need those old skills again.

(c) Learn new and combine

  • Getting a chance to relearn and to learn more is not to be wasted. Most of it will be forgotten, but at least I know where to look if I need it again
  • This project has been a ride:
  • Scrolling-JS-libraries 📜
  • Perlin noise 💥
  • Perspective and CSS transitions ⤵
  • Clustering-algorithms for image data 🌠
  • Combining SVG into HTML5 Canvas (⌲ + 🌠)
  • Researching map libraries 🗺
  • Front end bundler debugging 🐞
  • Image URL-data encoding and automatic extraction (0️⃣1️⃣1️⃣0️⃣1️⃣)
  • Front-end storage 👩‍💻
  • Headless browser usage and custom events (🤯👩‍💻)
  • “Get things done” using Node.js 💪
  • Differences between debounce and throttle 🤷‍♀️ (see the sketch after this list)
  • Express, Mongoose, MongoDB ❓
  • LaTeX with Rmarkdown ❓
  • LaTeX bibliography management 📚
  • Realizing how difficult the setup is to replicate… may need to Dockerize… 🚢
    To make it a little easier for anyone wanting to check the results from https://github.com/Tille88/thesis-data-analysis… (TODO once the thesis course is passed…)
  • And some more, phew 😅
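For the curious, the debounce/throttle difference from the list above fits in a few lines (generic sketches, not tied to the thesis code):

```javascript
// Debounce: run fn only after `wait` ms of silence,
// e.g. "search once the user stops typing".
function debounce(fn, wait) {
  let timer;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), wait);
  };
}

// Throttle: run fn at most once per `wait` ms,
// e.g. "react to scrolling, but not for every pixel".
function throttle(fn, wait) {
  let last = 0;
  return (...args) => {
    const now = Date.now();
    if (now - last >= wait) {
      last = now;
      fn(...args);
    }
  };
}
```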

All quick-and-dirty commits are available in the following repos:

https://github.com/Tille88/thesis-data-analysis, https://github.com/Tille88/thesis-front-end, https://github.com/Tille88/thesis-back-end, https://github.com/Tille88/thesis-map-generation

Thesis link, and maybe a deployment of the data collection front end, to be added once the thesis writing and defence are fully done 😇


Jonas Tillman

Data analyst and javascripter. Dabbles in other things as well.