
Methods Included

1 Comment and 2 Shares

Standardizing computational reuse and portability with the Common Workflow Language.

biocrusoe
4 days ago
Our big paper is out!
Lansing, MI
luizirber
2 days ago
Davis, CA

Managing multiple architecture-specific installations of conda on Apple M1

1 Share
The new(ish) Mac M1 chips differ from the Intel chips that Macs used to use. They have a lot of benefits, including better battery life and access to GPUs. However, a lot of software, especially scientific research software, is not natively installable on the M1 architecture. To circumvent this issue, Apple built Rosetta, a translator that enables software built for an Intel chip (x86) to run on the M1 chip (arm64). This blog post describes how to install two side-by-side, non-conflicting versions of conda to manage arm64 and x86 installations separately.

This post was motivated by my own struggles with the M1 chip. I almost exclusively use (mini)conda to manage software installations because it deals with dependency issues, improves reproducibility and portability of software environments, and integrates seamlessly with workflow automation software that I use frequently. However, many of my favorite scientific software packages were not available in conda for the arm64 architecture (e.g., sourmash, DESeq2). At the same time, some packages have released arm64-compatible installations, and many of these have benefits such as offering access to GPUs (see here). This blog post covers how to get the best of both worlds by installing two versions of conda, one for arm64 and one for x86. I also provide links to other solutions at the bottom, including how to have one version of conda with separate arm64 and x86 environments.

This post assumes that zsh, xtools, and rosetta are already installed and configured…mostly because I had been using my M1 laptop for months before I embarked on finding a solution to my problems, and I already had these tools set up.

Running two separate installations of conda for different architectures

Step 1: Create separate terminal (or iTerm) application shortcuts for arm64 or x86

First, we’ll create two separate terminal entry points.
One terminal will run the default arm64 architecture, while the other will run the x86 architecture using Rosetta. We’ll do this by creating a duplicate shortcut for the terminal that will be opened using Rosetta. To do this, go to your “Applications” folder in Finder, right click Terminal (or iTerm) in the “Utilities” folder, and select Duplicate. Rename the new shortcut to “Rosetta Terminal” (or “Rosetta iTerm”), then right click it, select Get Info, and check the Open using Rosetta check box. This step was inspired by this post, which also provides a gif for the above process.

Step 2: Install miniforge3 arm64 and miniconda3 x86

Next, we’ll install miniforge3 to be used in the arm64 terminal, and miniconda3 to be used in the x86 terminal. The differences between miniforge and miniconda are described well in this stackoverflow post. Both miniforge and miniconda are minimal conda installers, but miniforge is a community-led effort (conda-forge) while miniconda is a company-led effort (anaconda). Originally, an arm64-compatible version of conda was only provided by miniforge. Now, both miniforge and miniconda provide one. However, we’ll still stick to using the miniforge installation for arm64, and the miniconda installation for x86.

We’ll start by installing miniforge for the arm64 terminal. First, make sure you’re running on the arm64 architecture by running:

    uname -m

It should return arm64. Then, install miniforge (check here for the latest version):

    curl -L https://github.com/conda-forge/miniforge/releases/download/4.12.0-0/Miniforge3-MacOSX-arm64.sh > Miniforge3-MacOSX-arm64.sh
    sh Miniforge3-MacOSX-arm64.sh

Follow the prompts to accept the license, install, and initialize conda. The initialization script, conda init, will add the following text to your .zshrc file:

    # >>> conda initialize >>>
    # !! Contents within this block are managed by 'conda init' !!
    __conda_setup="$('/Users/reitert/miniforge3/bin/conda' 'shell.zsh' 'hook' 2> /dev/null)"
    if [ $? -eq 0 ]; then
        eval "$__conda_setup"
    else
        if [ -f "/Users/reitert/miniforge3/etc/profile.d/conda.sh" ]; then
            . "/Users/reitert/miniforge3/etc/profile.d/conda.sh"
        else
            export PATH="/Users/reitert/miniforge3/bin:$PATH"
        fi
    fi
    unset __conda_setup
    # <<< conda initialize <<<

We’re going to extract this portion of the .zshrc file and place it in a new file. Using your favorite text editor, open your .zshrc file, cut the newly added conda initialization lines, and paste them into a new file. Save the file as ~/.start_miniforge3.sh. (After we install the x86 version of miniconda in our rosetta terminal, we’ll edit the .zshrc file to run either the miniforge initialization when the terminal is arm64, or the miniconda initialization when the terminal is x86.)

Next, we’ll install miniconda in the rosetta terminal. Switch terminal applications, and use the uname -m command again to check that you’re in the correct terminal.

    uname -m

It should return x86_64. Then run the following lines to download and install miniconda3.

    curl -L https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh > Miniconda3-latest-MacOSX-x86_64.sh
    sh Miniconda3-latest-MacOSX-x86_64.sh

Again, follow the prompts to accept the license, install, and initialize conda. The initialization script will add text that looks like this to your .zshrc:

    # >>> conda initialize >>>
    # !! Contents within this block are managed by 'conda init' !!
    __conda_setup="$('/Users/reitert/miniconda3/bin/conda' 'shell.zsh' 'hook' 2> /dev/null)"
    if [ $? -eq 0 ]; then
        eval "$__conda_setup"
    else
        if [ -f "/Users/reitert/miniconda3/etc/profile.d/conda.sh" ]; then
            . "/Users/reitert/miniconda3/etc/profile.d/conda.sh"
        else
            export PATH="/Users/reitert/miniconda3/bin:$PATH"
        fi
    fi
    unset __conda_setup
    # <<< conda initialize <<<

Repeat the process of extracting the new initialization text, this time placing it in a file called ~/.start_miniconda3.sh.
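
If you would rather not cut and paste by hand, the extraction step above could be scripted. The following is only a sketch: extract_conda_init is a hypothetical helper name (it is not part of conda), and it assumes the block is delimited by the exact "conda initialize" marker comments shown above. It copies the block into a new file, saves a .bak copy of the rc file, and then rewrites the rc file without the block.

```shell
# Hypothetical helper (not part of conda): move the block that 'conda init'
# wrote out of an rc file and into its own file.
# Usage: extract_conda_init <rc_file> <output_file>
extract_conda_init() {
    rc_file="$1"
    out_file="$2"
    start='^# >>> conda initialize >>>'
    end='^# <<< conda initialize <<<'

    # Copy the managed block (marker lines included) into the new file.
    sed -n "/$start/,/$end/p" "$rc_file" > "$out_file"

    # If nothing was copied, leave the rc file untouched and report it.
    if [ ! -s "$out_file" ]; then
        rm -f "$out_file"
        echo "No conda initialize block found in $rc_file" >&2
        return 1
    fi

    # Keep a backup, then rewrite the rc file without the block.
    cp "$rc_file" "$rc_file.bak"
    sed "/$start/,/$end/d" "$rc_file.bak" > "$rc_file"
}
```

Under those assumptions, running extract_conda_init ~/.zshrc ~/.start_miniforge3.sh after the first install, and extract_conda_init ~/.zshrc ~/.start_miniconda3.sh after the second, would mirror the manual steps. Check the resulting files before deleting the backup.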

Step 3: Edit .zshrc to automatically initialize the proper conda for each architecture

The last step is to edit the .zshrc file so that it automatically initializes the correct conda when a new terminal is started (miniforge for arm64 and miniconda for x86). We’ll use an if statement that evaluates the terminal architecture and then runs the corresponding conda initialization script. Add the text below to your .zshrc file. Depending on your configuration, your .zshrc file may have a lot of text in it already. Adding it to the bottom should (usually) work fine. I like to add a note when I manually edit my .zshrc file so that future me knows where the addition came from, but the commented lines are optional.

    # <<<<<< Added by TR 20220405 <<
    arch_name="$(uname -m)"
    if [ "${arch_name}" = "x86_64" ]; then
        echo "Running on Rosetta using miniconda3"
        source ~/.start_miniconda3.sh
    elif [ "${arch_name}" = "arm64" ]; then
        echo "Running on ARM64 using miniforge3"
        source ~/.start_miniforge3.sh
    else
        echo "Unknown architecture: ${arch_name}"
    fi
    # <<<<<<<< end <<<<<<<

This solution was inspired in part by this superuser post.

Exit out of all terminals. When you re-open them, everything should be ready to run!

The miniconda installation will still need channels to be added. You can do this by running the following lines of code:

    conda config --add channels defaults
    conda config --add channels bioconda
    conda config --add channels conda-forge

Running two separate installations of conda without creating duplicate terminal application shortcuts

If you’d rather not have two separate terminals, this youtube video by Jeff Heaton provides instructions for switching between conda installations within one terminal.

More low-key solution: only switch to x86 for a specific environment

It’s also possible to have a single conda installation, and to have individual environments run as arm64 or x86. This issue gives instructions for how to use the arm64 installation of miniforge by default, but set up x86-compatible environments.

Phrases I googled in search of a solution to do these things

• set up two terminals for m1 mac
• miniconda x86 arm64
• two miniconda installations arm64 and intel
• two conda installations one for rosetta and one for arm64
• managing miniconda m1 rosetta
• setting up m1 conda to run rosetta by default
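
For the single-installation approach mentioned above (one arm64 conda, with individual environments pinned to x86), a rough sketch might look like the following. This is an assumption about one common way to do it, not necessarily what the linked issue recommends: it relies on conda's CONDA_SUBDIR environment variable and its subdir configuration setting, and create_x86_env is a hypothetical helper name. It also assumes conda has already been initialized in the current shell so that conda activate works.

```shell
# Hypothetical helper: create a conda environment that resolves x86 (osx-64)
# packages from an arm64 conda installation.
# Usage: create_x86_env <env_name> [package ...]
create_x86_env() {
    if [ "$#" -lt 1 ]; then
        echo "usage: create_x86_env <env_name> [package ...]" >&2
        return 2
    fi
    env_name="$1"
    shift

    # CONDA_SUBDIR selects which platform's packages conda solves for
    # while the environment is being created.
    CONDA_SUBDIR=osx-64 conda create -n "$env_name" "$@" || return 1

    # Store the platform in the environment's own config so that later
    # installs into this environment also use osx-64 packages.
    conda activate "$env_name" || return 1
    conda config --env --set subdir osx-64
}
```

With this sketch, something like create_x86_env sourmash_x86 sourmash would be the intended usage (the environment name is made up). The environment would still run through Rosetta, so Rosetta needs to be installed either way.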
luizirber
48 days ago
Davis, CA

Max Siedentopf

1 Share



luizirber
170 days ago
Davis, CA

Missing data

1 Comment

Missing data throws a monkey wrench into otherwise elegant plans. Yesterday’s post on genetic sequence data illustrates this point. DNA sequences consist of four bases, but we need to make provision for storing a fifth value for unknowns. If you know there’s a base in a particular position, but you don’t know what its value is, it’s important to record this unknown value to avoid throwing off the alignment of the sequence.
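
A tiny sketch of why the placeholder matters. It assumes the unknown base is written as N (the usual convention, though the text above does not name a symbol); the sequence and both helper names are made up for illustration.

```shell
# Return the base at a 1-based position in a sequence.
base_at() {
    printf '%s\n' "$1" | cut -c "$2"
}

# Remove unknown bases entirely (the approach that breaks alignment).
drop_unknowns() {
    printf '%s\n' "$1" | tr -d 'N'
}

seq_with_unknowns="ACNNGT"
# With the N placeholders kept, G stays at position 5.
base_at "$seq_with_unknowns" 5
# With the unknowns dropped, the same G slides to position 3.
base_at "$(drop_unknowns "$seq_with_unknowns")" 3
```

Both calls print G, but at different positions, which is exactly the shift that would throw off a comparison against another sequence.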

There are endless debates over how to handle missing data because missing data is a dilemma to be managed rather than a problem to be solved. (See Problems vs Dilemmas.)

It’s simply a fact of life that data will be incomplete. The debate stems from how to represent and handle missingness. Maybe the lowest level of a software application represents missing data and the highest uses complete data only. At what level the missing values are removed, and how they are removed, depends very much on context.

A naive approach to missing data is to not allow it. We’ve all used software that demands that we enter a value for some field whether a value exists or not. Maybe you have to enter a middle name, even though you don’t have a middle name. Or maybe you have to enter your grandfather’s name even though you don’t know his name.

Note that the two examples above illustrate two kinds of missing data: one kind does not exist, while the other certainly exists but is unknown. In practice there are entire taxonomies of missing data. Is it unknown or non-existent? If it is unknown, why is it unknown? If it does not exist, why doesn’t it?

There can be information in missing information. For example, suppose a clinical trial tracks how long people survive after a given treatment. You won’t have complete data until everyone in the study has died. In the meantime, their date of death is missing. If someone’s date of death is missing because they’re still alive, that’s information: you know they’ve survived at least until the current point in time. If someone’s date of death is missing because they were lost to follow-up, i.e. they dropped out of the study and you lost contact with them, that’s different.

The simplest approach to missing data is to throw it away. That can be acceptable in some circumstances, particularly if the amount of missing data is small. But simply discarding missing data can be disastrous. In wide data, data with many different fields per subject, maybe none of your data is complete. Maybe there are many columns and every row is missing something in at least one column.
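
To make the wide-data case concrete, here is a small sketch with a made-up table in which every row is missing exactly one value (marked NA, an assumed convention). The function names are hypothetical.

```shell
# Made-up wide table: every row is missing (NA) in at least one column.
wide_data() {
    cat <<'EOF'
id,age,weight,height,bp
1,34,NA,170,120
2,NA,81,165,118
3,51,77,NA,130
4,45,69,180,NA
EOF
}

# Count data rows with no missing field: the rows that survive
# if incomplete rows are thrown away.
count_complete_rows() {
    awk -F, 'NR > 1 {
        complete = 1
        for (i = 1; i <= NF; i++) if ($i == "NA" || $i == "") complete = 0
        n += complete
    } END { print n + 0 }'
}

# Count observed (non-missing) measurements, skipping the id column:
# the information that is lost when whole rows are dropped.
count_observed_values() {
    awk -F, 'NR > 1 {
        for (i = 2; i <= NF; i++) if ($i != "NA" && $i != "") n++
    } END { print n + 0 }'
}
```

Running wide_data | count_complete_rows gives 0, while wide_data | count_observed_values gives 12: dropping incomplete rows here discards all twelve real measurements.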

Throwing away incomplete data can be inefficient or misleading. In the survival study example above, throwing out missing data would give you a very pessimistic assessment of the treatment. The people who lived the longest would be excluded precisely because they’re still living! Your analysis would be based only on those who died shortly after treatment.
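
The pessimism is easy to see with a few made-up numbers. The sketch below uses invented follow-up data and hypothetical function names; it is not a proper survival analysis, only a comparison of two averages.

```shell
# Made-up follow-up data: subject, months observed so far, status.
# Rows marked 'alive' have no date of death yet.
survival_data() {
    cat <<'EOF'
s1,3,dead
s2,5,dead
s3,24,alive
s4,30,alive
s5,4,dead
EOF
}

# Mean survival using only subjects with a recorded death,
# i.e. after throwing out the rows with a missing date of death.
mean_complete_cases() {
    awk -F, '$3 == "dead" { sum += $2; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Mean of the months observed for everyone. This is only a lower bound on
# mean survival, since the living subjects will survive at least this long.
mean_lower_bound() {
    awk -F, '{ sum += $2; n++ } END { if (n) printf "%.1f\n", sum / n }'
}
```

With these numbers, survival_data | mean_complete_cases prints 4.0 months, while survival_data | mean_lower_bound prints 13.2 months, and even that larger figure understates the truth.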

Analysis of data with missing values is a world unto itself. It seems paradoxical at first to devise ways to squeeze information out of data that isn’t there. But there are many ways to do just that, each with pros and cons. There are subtle ways to infer the missing values, while also accounting for the fact that these values have been inferred. If done poorly, this can increase bias, but if done well it decreases bias.

Analysis techniques that account for missing data are more complicated than techniques that do not. But they are worth the effort if throwing away missing data would leave you with too little data or give you misleading results. If you’re not concerned about the former, perhaps you should be concerned about the latter. The bias introduced by discarding incomplete data could be hard to foresee until you’ve analyzed the data properly accounting for missing values.

The post Missing data first appeared on John D. Cook.
luizirber
248 days ago
"There are endless debates over how to handle missing data because missing data is a dilemma to be managed rather than a problem to be solved."
Davis, CA

cognatos

2 Shares


 

Have you ever heard of "cognates" and "false cognates" when studying a language? Basically:
• Cognate: similar words with the same meaning (e.g., Sofa)
• False cognate: a word that looks like one in Portuguese but has a different meaning (e.g., Pretend, which means "fingir" (to feign), not "pretender" (to intend)).
Know more examples? Leave a comment!

Anyway, I hope you liked the ~content I added, because I was embarrassed to keep putting only silly puns in the posts HAHAHAH



luizirber
252 days ago
Davis, CA
iaravps
252 days ago
Rio de Janeiro, Brasil

Nancy by Olivia Jaimes for Mon, 06 Sep 2021

1 Share

Nancy by Olivia Jaimes on Mon, 06 Sep 2021

Source - Patreon

luizirber
256 days ago
Davis, CA