
Missing data


Missing data throws a monkey wrench into otherwise elegant plans. Yesterday’s post on genetic sequence data illustrates this point. DNA sequences consist of four bases, but we need to make provision for storing a fifth value for unknowns. If you know there’s a base in a particular position, but you don’t know what its value is, it’s important to record this unknown value to avoid throwing off the alignment of the sequence.
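Here is a minimal Python sketch of the alphabet problem. It assumes the IUPAC convention of `N` for an unidentified base; the post doesn't say which symbol the sequence format uses. Five symbols no longer fit in two bits, and the placeholder has to stay in the sequence, because deleting it shifts every later base out of register. The names `CODES`, `encode`, and `mismatches` are illustrative, not from any particular library.

```python
# Five symbols: the four bases plus 'N' (assumed IUPAC code) for a position
# that exists but whose base is unknown.
CODES = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}


def encode(seq: str) -> list[int]:
    """Map each symbol to an integer code; five codes need 3 bits, not 2."""
    return [CODES[base] for base in seq.upper()]


def mismatches(a: str, b: str) -> int:
    """Count aligned positions that disagree, skipping any position where either base is N."""
    return sum(1 for x, y in zip(a, b) if "N" not in (x, y) and x != y)


reference = "ACGTACGT"
read = "ACNTACGT"  # third base was sequenced but not identified

print(encode(read))                                  # [0, 1, 4, 3, 0, 1, 2, 3]
print(mismatches(reference, read))                   # 0: the N keeps later bases in register
print(mismatches(reference, read.replace("N", "")))  # 5: dropping the N shifts every later base
```

Recording the unknown turns one uncertain position into zero apparent errors. Silently dropping it turns the same position into five apparent mismatches.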

There are endless debates over how to handle missing data because missing data is a dilemma to be managed rather than a problem to be solved. (See Problems vs Dilemmas.)

It’s simply a fact of life that data will be incomplete. The debate is over how to represent and handle missingness. Maybe the lowest level of a software application represents missing data and the highest level uses complete data only. At what level the missing values are removed, and how they are removed, depends very much on context.

A naive approach to missing data is to not allow it. We’ve all used software that demands that we enter a value for some field whether a value exists or not. Maybe you have to enter a middle name, even though you don’t have a middle name. Or maybe you have to enter your grandfather’s name even though you don’t know his name.

Note that the two examples above illustrate two kinds of missing data: in one the value does not exist, while in the other it certainly exists but is unknown. In practice there are entire taxonomies of missing data. Is it unknown or nonexistent? If it is unknown, why is it unknown? If it does not exist, why doesn’t it?
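One way to keep this distinction in software, sketched in Python: store *why* a value is missing instead of a bare `None`. The `Missing` enum, the `Person` class, and its field names are hypothetical, and a real schema would likely need more categories than these two.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Missing(Enum):
    """Why a field has no value."""
    DOES_NOT_EXIST = "does not exist"  # e.g. someone with no middle name
    UNKNOWN = "unknown"                # exists, but we don't know it (e.g. a grandfather's name)


# A field holds either a real value or a reason it is missing.
Field = Union[str, Missing]


@dataclass
class Person:
    first_name: str
    middle_name: Field
    grandfather_name: Field


def describe(value: Field) -> str:
    """Render a field, making the kind of missingness explicit."""
    if isinstance(value, Missing):
        return f"<{value.value}>"
    return value


p = Person("Ada", Missing.DOES_NOT_EXIST, Missing.UNKNOWN)
print(describe(p.middle_name), describe(p.grandfather_name))  # <does not exist> <unknown>
```

A form built on this type can accept "no middle name" as a legitimate answer instead of forcing the user to invent one.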

There can be information in missing information. For example, suppose a clinical trial tracks how long people survive after a given treatment. You won’t have complete data until everyone in the study has died. In the meantime, their date of death is missing. If someone’s date of death is missing because they’re still alive, that’s information: you know they’ve survived at least until the current point in time. If someone’s date of death is missing because they were lost to follow-up, i.e. they dropped out of the study and you lost contact with them, that’s different: you only know they survived until the last time you were in contact with them.

The simplest approach to missing data is to throw it away. That can be acceptable in some circumstances, particularly if the amount of missing data is small. But simply discarding missing data can be disastrous. In wide data, data with many different fields per subject, maybe none of your records are complete: there are many columns and every row is missing something in at least one column.
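A small Python illustration of how wide data defeats complete-case analysis. The table is contrived: each row is missing a different single field, so 90% of the cells are present but no row survives. The last line assumes each cell is missing independently with probability p, in which case a row with k fields is complete with probability (1 - p)^k.

```python
def complete_cases(rows):
    """Keep only rows with no missing (None) fields."""
    return [row for row in rows if all(v is not None for v in row)]


n = 10
# Row i is missing only column i, so 90 of the 100 cells are present.
rows = [[None if j == i else float(i * n + j) for j in range(n)] for i in range(n)]

present = sum(v is not None for row in rows for v in row)
print(present / (n * n))          # 0.9
print(len(complete_cases(rows)))  # 0

# Independent missingness: 5% per cell across 50 columns leaves about 7.7% complete rows.
print(round((1 - 0.05) ** 50, 3))  # 0.077
```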

Throwing away incomplete data can be inefficient or misleading. In the survival study example above, throwing out missing data would give you a very pessimistic assessment of the treatment. The people who lived the longest would be excluded precisely because they’re still living! Your analysis would be based only on those who died shortly after treatment.
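To make the survival example concrete, here is a sketch of the Kaplan-Meier estimator, a standard method for censored survival data. The post itself doesn't name a method. Each subject has a time and a flag: `True` if the subject died at that time, `False` if the record is censored, i.e. the subject was alive when last seen. The data are made up: three subjects died at 2, 3, and 5 months, and three were still alive at 12 months.

```python
def kaplan_meier(times, died):
    """Return (time, estimated survival probability) at each distinct death time.

    died[i] is True if subject i died at times[i], False if censored (alive when last seen).
    """
    data = sorted(zip(times, died))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects whose time is exactly t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Censored subjects at t still count as at risk at t (the usual convention).
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


times = [2, 3, 5, 12, 12, 12]
died = [True, True, True, False, False, False]

deaths_only = [t for t, d in zip(times, died) if d]
print(sum(deaths_only) / len(deaths_only))                       # about 3.33: mean of deaths only
print([(t, round(s, 3)) for t, s in kaplan_meier(times, died)])  # [(2, 0.833), (3, 0.667), (5, 0.5)]
```

Keeping only the deaths gives a mean survival of about 3.3 months, which suggests nobody outlives 5 months. The estimator treats the censored subjects as "alive at least this long" and puts survival past 5 months at 50%. Kaplan-Meier assumes censoring is unrelated to prognosis, which is exactly where "still alive" and "lost to follow-up" can differ.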

Analysis of data with missing values is a world unto itself. It seems paradoxical at first to devise ways to squeeze information out of data that isn’t there. But there are many ways to do just that, each with pros and cons. There are subtle ways to infer the missing values, while also accounting for the fact that these values have been inferred. If done poorly, this can increase bias, but if done well it decreases bias.
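A simplified Python sketch of inferring missing values while accounting for the inference, using multiple imputation. This is a standard technique, although the post doesn't name it. Filling every gap with the observed mean treats guesses as data and understates uncertainty. Drawing several plausible fills and pooling them with Rubin's rules carries that uncertainty forward. Under those rules, total variance = mean within-imputation variance + (1 + 1/m) × between-imputation variance. For brevity, the fills here are random draws from the observed values (a hot-deck scheme). This is an assumption; a proper implementation would draw from a model of the data. The data and function names are hypothetical.

```python
import random
import statistics


def mean_and_var(xs):
    """Sample mean and the estimated variance of that mean (s^2 / n)."""
    return statistics.mean(xs), statistics.variance(xs) / len(xs)


def single_mean_imputation(values):
    """Fill every gap with the observed mean, then analyze as if the data were complete."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return mean_and_var([fill if v is None else v for v in values])


def multiple_imputation(values, m=20, seed=0):
    """Hot-deck multiple imputation of the mean, pooled with Rubin's rules (illustrative only)."""
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    estimates, variances = [], []
    for _ in range(m):
        completed = [rng.choice(observed) if v is None else v for v in values]
        est, var = mean_and_var(completed)
        estimates.append(est)
        variances.append(var)
    pooled = statistics.mean(estimates)
    within = statistics.mean(variances)       # uncertainty inside each completed dataset
    between = statistics.variance(estimates)  # extra uncertainty from not knowing the gaps
    return pooled, within + (1 + 1 / m) * between


data = [2.0, 4.0, None, 8.0, None, 6.0]
print(single_mean_imputation(data))
print(multiple_imputation(data))
```

On this toy data, single mean imputation reports a variance of 4/6 ≈ 0.67 for the mean. The pooled variance from multiple imputation is never smaller, because random fills only add spread around the mean and the between-imputation term is non-negative.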

Analysis techniques that account for missing data are more complicated than techniques that do not. But they are worth the effort if throwing away missing data would leave you with too little data or give you misleading results. If you’re not concerned about the former, perhaps you should be concerned about the latter. The bias introduced by discarding incomplete data could be hard to foresee until you’ve analyzed the data properly accounting for missing values.

The post Missing data first appeared on John D. Cook.
luizirber
35 days ago
"There are endless debates over how to handle missing data because missing data is a dilemma to be managed rather than a problem to be solved."
Davis, CA

cognates



 

Have you ever heard of "cognates" and "false cognates" when studying a language? Basically:
• Cognate: a similar-looking word with the same meaning (e.g. Sofa)
• False cognate: a word that looks like a Portuguese one but means something different (e.g. Pretend means "fingir", to make believe, not "pretender", to intend).
Know more examples? Comment below!

Anyway, I hope you enjoyed the ~content I added, because I was embarrassed to keep posting nothing but dumb puns HAHAHAH



luizirber
38 days ago
Davis, CA
iaravps
38 days ago
Rio de Janeiro, Brasil

Nancy by Olivia Jaimes for Mon, 06 Sep 2021



Source - Patreon

luizirber
42 days ago
Davis, CA

guerrillatech: Black panther poster. Just as relevant today



iaravps
42 days ago
Rio de Janeiro, Brasil
luizirber
45 days ago
Davis, CA
jepler
46 days ago
Earth, Sol system, Western spiral arm
vitormazzi
46 days ago
Brasil
sirshannon
46 days ago

My love-hate affair with technology


Ten years ago I would have considered myself someone who was excited about new technology. I always had the latest smartphone, I would read the reviews of new Android releases with a lot of interest, and I was delighted when things like Google Maps Navigation, speech-to-text, or keyboard swiping made my life easier.

Nowadays, to the average person I probably look like a technology curmudgeon. I don’t have a smart speaker, a smart watch, or any smart home appliances. My 4-year-old phone runs a de-Googled LineageOS that barely runs any apps other than Signal and F-Droid. My house has a Raspberry Pi running Nextcloud for file storage and Pi-hole for ad blocking. When I bought a new TV I refused to connect it to the Internet; instead, I hooked it up to an old PC running Ubuntu so I can watch Netflix, Hulu, etc.

My wife complains that none of the devices in our house work, and she’s right. The Pi-hole blocks a lot of websites, and it’s a struggle to unblock them. Driving the TV with a wireless keyboard is cumbersome. Nextcloud is clunky compared to something like Dropbox or Google Drive. I even tried cloudflared for a while, but I had to give up when DNS kept periodically failing.

One time – no joke – I had a dream that I was using some open-source alternative to a popular piece of software, and it was slow and buggy. I don’t even remember what it was, but I remember being frustrated. This is just what I’m used to nowadays – not using a technology because it’s the best-in-class or makes my life easier, but because it meets some high-minded criteria about how I think software should be: privacy-respecting, open-source, controlled by the user, etc.

To the average person, this is probably crazy. “Nolan,” they’d say. “You couldn’t order a Lyft because their web app didn’t work in Firefox for Android. Your files don’t sync away from home because you’re only running Nextcloud on your local network. Your friends can’t even message you on WhatsApp, Facebook, or Twitter because you don’t have an account and the apps don’t work on your phone. If you want to live in the eighteenth century so bad, why don’t you get a horse and buggy while you’re at it?”

Maybe this nagging voice in my head is right (and I do think these thoughts sometimes). Maybe what I’m practicing is a kind of tech veganism that, like real veganism, is a great idea in theory but really hard to stick to in practice. (And yes, I’ve tried real veganism too. Maybe I should join a monastery at this point.)

On the other hand, I have to remind myself that there are benefits to the somewhat ascetic lifestyle I’ve chosen. The thing that finally pushed me to switch from stock Android to de-Googled LineageOS was all the ads and notifications in Google Maps. I remember fumbling around with a dozen settings, but never being able to get rid of the “Hey, rate this park” message. (Because everything on Earth needs a star rating apparently.)

And now, I don’t have to deal with Google Maps anymore! Instead I deal with OsmAnd~, which broke down the other day and failed to give me directions. So it goes.

Maybe someday I’ll relent. Maybe I’ll say, “I’m too old for this shit” and start using technology that actually works instead of technology that meets some idealistic and probably antiquated notion of software purity. Maybe I’ll be forced to, because I need a pacemaker that isn’t open-source. Or maybe there will be some essential government service that requires a Google or Apple phone – my state’s contact tracing app does! I got jury duty recently and was unsurprised to find that they do everything through Zoom. At what point will it be impossible to be a tech hermit, without being an actual hermit?

That said, I’m still doing what I’m doing for now. It helps that I’m on Mastodon, where there are plenty of folks who are even more hardcore than me. (“I won’t even look at a computer if it’s running non-FLOSS software,” they smirk, typing from their BSD laptop behind five layers of Tor.) Complaining to this crowd about how I can’t buy a TV anymore without it spying on me makes me feel a little bit normal. Just a bit.

The thing that has always bothered me about this, and which continues to bother me, is that I’m only able to live this lifestyle because I have the technical know-how. The average person would neither know how to do any of the things I’m doing (installing a custom Android ROM, setting up Nextcloud, etc.), nor would they probably want to, given that it’s a lot of extra hassle for a sub-par experience.

And who am I, anyway? Edward Snowden? Why am I LARPing as a character in a spy novel when I could be focusing on any one of a million other hobbies in the world?

I guess the answer is: this is my hobby. Figuring out how to get my Raspberry Pi to auto-update is a hobby. Tinkering with my TV setup so that I can get Bluetooth headphones working while the TV is in airplane mode is a hobby. Like a gearhead who’s delighted when his car breaks down (“Hey! Now I can fix it!”), I don’t mind when the technology around me doesn’t work – it gives me something to do on the weekend! But I have no illusions that this lifestyle makes sense for most people. Or that it will even make sense for me, once I get older and probably bored of my hobby.

For the time being, though, I’m going to keep acting like technology is an enemy I need to subdue rather than a purveyor of joys and delights. So if you want to know how it’s going, subscribe to my blog via RSS or message me on Signal. Or if that fails, come visit me in a horse and buggy.



acdha
52 days ago
Washington, DC
luizirber
58 days ago
Davis, CA

18-04-2021



 

luizirber
87 days ago
Davis, CA