Scaling up and down

There’s a worn-out analogy in software development that you cannot build a skyscraper the same way you build a dog house. The idea is that techniques that will work on a small scale will not work on a larger scale. You need more formality to build large software systems.

The analogy is always applied in one direction: up. It’s always an exhortation to use techniques appropriate for larger projects.

But the analogy works in the other direction as well: it’s inappropriate to build a dog house the same way you’d build a skyscraper. It would be possible to build a dog house the way you’d build a skyscraper, but it would be very expensive. Amateur carpentry methods don’t scale up, but professional construction methods don’t scale down economically.

Bias for over-engineering

There’s a bias toward over-engineering because it works, albeit inefficiently, whereas under-engineering does not. You can use a sledgehammer to do a hammer’s job. It’ll be clumsy, and you might hurt yourself, but it can work. And there are tasks where a hammer just won’t get the job done.

Another reason for the bias toward over-engineering is asymmetric risk. If an over-engineered approach fails, you'll face less criticism than if a simpler approach fails. As the old saying goes, nobody ever got fired for buying IBM.

Context required

Simple solutions require context to appreciate. If you do something simple, you’re open to the criticism “But that won’t scale!” You have to defend your solution by explaining that it will scale far enough, and that it avoids costs associated with scaling further than necessary.

Suppose a group is debating whether to walk or drive to lunch. Someone advocating driving requires less context to make his point. He can simply say “Driving is faster than walking,” which is generally true. The burden is on the person advocating walking to explain why walking would actually be faster under the circumstances.

Writing prompt

I was using some database-like features in Emacs org-mode this morning, and that's what prompted me to write this post. I can just hear someone say "That won't scale!" I often get this reaction when I write about a simple, low-tech way to do something on a small scale.

Using a text file as a database doesn't scale. But I have 88 rows, so I think I'll be OK. A relational database would be better for storing millions of records, but that's not what I'm working on at the moment.

My guild RIP

My WoW Classic guild fell apart. Not sure what happened; no big drama. Everything seemed good, and then I went on holidays, and when I got back it was a ghost town. The guild officers are still in the guild, but none have logged on in close to a month. Maybe I was the secret glue that held it all together (joke).

So now I start looking for a new guild. I want a guild that actively runs 5-player dungeons and is raiding at 60 or pushing towards it. But I don't want to be in a guild that takes it all too seriously. All guilds say they want to help players get geared and learn dungeons and raids, but what I've often found is that they turn into cliques where the cool kids raid and the rest... well, I'm not sure.

There are a few guilds on my server where you have to "apply" and give them logs of your runs. Yeah... that sounds !fun.

I don't want to start my own guild. When it comes to WoW, I'm a follower, not a leader. I think that's why I love healing. The only decisions I want to make in a dungeon are who lives and who dies.

If your guild meets the above criteria and wants an amazing undead priest healer named Delores... hit me up.

Now, back to my pile of adventure games I'm speed running.

Top tech grads are increasingly unwilling to work for Big Tech, viewing it as a new, unethical Wall Street

About five years ago, I was trying to get a bunch of Big Tech companies to take the right side of an urgent online civil rights fight, and I called an old friend who was very senior at one of the biggest tech companies in the world; they told me that it wasn't going to work, in part because the kinds of people who were coming to tech were there because they wanted to get as rich as possible, no matter what they had to do. My friend contrasted this with earlier eras -- even the dotcom bubble -- when the financial motive was blended with a genuine excitement for the transformative potential of tech to make a fairer, more equitable world. Now, my friend said, the kind of kid who would have gotten an MBA was instead getting an electrical engineering or computer science degree -- not out of any innate love for the subject, but because that was a path to untold riches.

But things are changing. Not only are young people far more skeptical of capitalism and concerned that it will annihilate the human race, but the tech companies' masks have slipped, revealing their willingness to supply ICE and the Chinese government alike, to help the oil industry torch the planet, and to divert their fortunes to supporting white nationalist causes. Companies that tout their ethical center have harbored and even rewarded serial sexual predators and busted nascent union movements.

The tech worker uprisings of recent years have caught the attention of the best and brightest grads of America's elite universities. The New York Times reports that tech recruiters are getting the cold shoulder from students at job fairs, where they are quizzed about Salesforce and Palantir's work with ICE; Google's multimillion-dollar payout to a top exec who engaged in years of demeaning sexual misconduct; Tesla's union-busting; and Facebook's years and years of depraved fuckery (a former Facebook recruiter told the Times that the rate of job-offer acceptances has fallen by 40%!).

Two years ago, I wrote that the tech industry's scarcest resource was tech workers, who are increasingly disillusioned with their employers, and who no longer see themselves as possible CEOs of their own startups, thanks to the monopoly tactics and market concentration that have created a "kill zone" around Big Tech, precluding the possibility of unseating the incumbents with your scrappy startup.

Tech companies can hire new cafeteria workers and gig economy drivers, but talented software engineers, security researchers, and mathematicians are in short supply. They have incredible leverage over the Big Tech companies, and, as they start to build solidarity with their users and their more easily replaced co-workers, they have it within their power to bring Big Tech to its knees.

“It felt like in my freshman year Google, Palantir and Facebook were these shiny places everyone wanted to be. It was like, ‘Wow, you work at Facebook. You must be really smart,’” said Ms. Dogru, 23. “Now if a classmate tells me they’re joining Palantir or Facebook, there’s an awkward gap where they feel like they have to justify themselves.”

Palantir, in particular, has drawn the ire of students at Stanford for providing services to U.S. Immigration and Customs Enforcement (also known as ICE).

Last summer, a campus activist group, Students for the Liberation of All People, visited the company’s office, a 15-minute walk from campus, and hung a banner nearby that read: “Our software is so powerful it separates families.” Similar protests took place at the University of California, Berkeley, Brown and Yale, according to Recode. The protests, and the attitudes they reflected, were also covered in The Los Angeles Times.

Audrey Steinkamp, a 19-year-old sophomore at Yale, which sends about 10 percent of each graduating class into tech, said that taking a job in Silicon Valley is seen as “selling out,” no different from the economics majors going into consulting who are “lovingly and not-so-lovingly called ‘snakes.’”

‘Techlash’ Hits College Campuses [Emma Goldberg/New York Times]

(Image: No Tech For ICE)

Quino

Oxidizing sourmash: PR walkthrough

sourmash 3 was released last week, finally landing the Rust backend. But what changes when developing new features in sourmash? I was thinking about how to best document this process, and since PR #826 is a short example touching all the layers, I decided to do a small walkthrough.

Shall we?

The problem

The first step is describing the problem, and trying to convince reviewers (and yourself) that the changes bring enough benefits to justify a merge. This is the description I put in the PR:

Calling .add_hash() on a MinHash sketch is fine, but if you're calling it all the time it's better to pass a list of hashes and call .add_many() instead. Before this PR add_many just called add_hash for each hash it was passed, but now it will pass the full list to Rust (and that's way faster).

No changes for public APIs, and I changed the _signatures method in LCA to accumulate hashes for each sig first, and then set them all at once. This is way faster, but might use more intermediate memory (I'll evaluate this now).

There are many details that sound like jargon for someone not familiar with the codebase, but if I wrote something too long I would probably be wasting the reviewers' time too. The benefit of a very detailed description is extending the knowledge to other people (not necessarily the maintainers), but that also takes effort that might be better allocated to solving other problems. Or, more realistically, to putting out other fires =P

Nonetheless, some points I like to add in PR descriptions:

- Why is there a problem with the current approach?
- Is this the minimal viable change, or is it trying to change too many things at once? The former is way better, in general.
- What are the trade-offs? This PR is using more memory to lower the runtime, but I hadn't measured it yet when I opened it.
- Not changing public APIs is always good for convincing reviewers. If the project follows a semantic versioning scheme, changes to the public APIs are major version bumps, and that can bring other consequences for users.

Setting up for changing code

If this were a bug fix PR, the first thing I would do is write a new test triggering the bug, and then proceed to fix it in the code (hmm, maybe that would be another good walkthrough?). But this PR is making performance claims ("it's going to be faster"), and that's a bit hard to codify in tests.[1] Since it's also proposing to change a method (_signatures in LCA indices) that is better benchmarked with a real index (and not a toy example), I used the same data and command I run in sourmash_resources to check how memory consumption and runtime changed. For reference, this is the command:

sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

I'm using the benchmark feature from snakemake in sourmash_resources to track how much memory, runtime, and I/O each command (and version) of sourmash uses, and to generate the plots in the README in that repo. That is fine for a high-level view ("what's the maximum memory used?") but not so useful for digging into details ("which method is consuming the most memory?").
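
For context, a benchmark rule in snakemake looks roughly like this (a sketch, not the actual sourmash_resources rule; the file names are illustrative):

rule search:
    input:
        sig="HSMA33OT.fastq.gz.sig",
        db="genbank-k51.lca.json.gz"
    output: "out.csv"
    benchmark:
        # snakemake writes wall time, max RSS, and I/O counters to this file
        "benchmarks/search.tsv"
    shell:
        "sourmash search -o {output} --scaled 2000 -k 51 {input.sig} {input.db}"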

Another problem is the dual[2] language nature of sourmash, where we have Python calling into Rust code (via CFFI). There are great tools for measuring and profiling Python code, but they tend not to work with extension code...

So, let's bring two of my favorite tools to help!

Memory profiling: heaptrack

heaptrack is a heap profiler, and I first heard about it from Vincent Prouillet. Its main advantage over other solutions (like valgrind's massif) is the low overhead and... how easy it is to use: just stick heaptrack in front of your command, and you're good to go!

Example output:

$ heaptrack sourmash search -o out.csv --scaled 2000 -k 51 HSMA33OT.fastq.gz.sig genbank-k51.lca.json.gz

heaptrack stats:
        allocations:            1379353
        leaked allocations:     1660
        temporary allocations:  168984
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze heaptrack.sourmash.66565.gz

heaptrack --analyze is a very nice graphical interface for analyzing the results, but for this PR I'm mostly focusing on the Summary page (and overall memory consumption). Tracking allocations in Python doesn't give many details, because it shows the CPython functions being called, but being able to track allocations into the extension (Rust) code is amazing for finding bottlenecks (and memory leaks =P).[3]

CPU profiling: py-spy

Just as other solutions exist for profiling memory, there are many for profiling CPU usage in Python, including profile and cProfile in the standard library. Again, the issue is being able to analyze extension code, and bringing out the cannon (the perf command in Linux, for example) loses the benefit of tracking Python code properly (because we get back the CPython functions, not what you defined in your Python code).

Enter py-spy by Ben Frederickson, based on the rbspy project by Julia Evans. Both use a great idea: read the process maps for the interpreter and resolve the full stack trace information, with low overhead (because they use sampling). py-spy also goes further and resolves native Python extension stack traces, meaning we can get the complete picture all the way from the Python CLI to the Rust core library![4]

py-spy is also easy to use: stick py-spy record --output search.svg -n -- in front of the command, and it will generate a flamegraph in search.svg. The full command for this PR is

py-spy record --output search.svg -n -- sourmash search -o out.csv --scaled 2000 -k 51 HSMA.fastq.sig genbank-k51.lca.json.gz

Show me the code!

OK, OK, sheesh. But it's worth repeating: the code is important, but there are many other aspects that are just as important =]

Replacing add_hash calls with one add_many

Let's start at the _signatures() method on LCA indices. This is the original method:

@cached_property
def _signatures(self):
    "Create a _signatures member dictionary that contains {idx: minhash}."
    from .. import MinHash

    minhash = MinHash(n=0, ksize=self.ksize, scaled=self.scaled)

    debug('creating signatures for LCA DB...')
    sigd = defaultdict(minhash.copy_and_clear)

    for (k, v) in self.hashval_to_idx.items():
        for vv in v:
            sigd[vv].add_hash(k)

    debug('=> {} signatures!', len(sigd))
    return sigd

sigd[vv].add_hash(k) is the culprit. Each call to .add_hash has to go through CFFI to reach the extension code, and the overhead is significant. It's a similar situation to accessing array elements in NumPy: it works, but it is way slower than using operations that avoid crossing from Python to the extension code. What we want to do instead is call .add_many(hashes), which takes a list of hashes and processes it entirely in Rust (ideally; we will get there).
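
To make the NumPy analogy concrete, here is a rough standalone illustration (an aside, not code from the PR): summing elements one at a time crosses the Python/C boundary on every access, while a single vectorized call keeps the loop in C.

import numpy as np

a = np.arange(1_000_000, dtype=np.float64)

# Slow: every a[i] crosses from Python into NumPy's C code
total = 0.0
for i in range(len(a)):
    total += a[i]

# Fast: one call, and the loop runs entirely in C
total = a.sum()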

But, to have a list of hashes, there is another issue with this code.

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        sigd[vv].add_hash(k)

There are two nested for loops, and add_hash is being called with values from the inner loop. So... we don't have the list of hashes beforehand.

But we can change the code a bit to save the hashes for each signature in a temporary list, and then call add_many on the temporary list. Like this:

temp_vals = defaultdict(list)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].append(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)

There is a trade-off here: if we save the hashes in temporary lists, will the memory consumption be so high that it cancels out the runtime gains of calling add_many on those temporary lists?

Time to measure it =]

version   mem     time
original  1.5 GB  160s
list      1.7 GB  173s

Wait, it got worse?!?! Building temporary lists only takes time and memory, and brings no benefit!

This mystery goes away when you look at the add_many method:

def add_many(self, hashes):
    "Add many hashes in at once."
    if isinstance(hashes, MinHash):
        self._methodcall(lib.kmerminhash_add_from, hashes._objptr)
    else:
        for hash in hashes:
            self._methodcall(lib.kmerminhash_add_hash, hash)

The first check in the if statement is a shortcut for adding hashes from another MinHash, so let's focus on the else branch... And it turns out that add_many is lying! It doesn't process the hashes in the Rust extension; it just loops and calls add_hash for each hash in the list. That's not going to be any faster than what we were doing in _signatures.

Time to fix add_many!

Oxidizing add_many

The idea is to replace this loop in add_many:

for hash in hashes:
    self._methodcall(lib.kmerminhash_add_hash, hash)

with a call to a Rust extension function:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

self._methodcall is a convenience method defined in RustObject which translates a method-like call into a function call, since our C layer only has functions. This is the C prototype for this function:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

You can almost read it as a Python method declaration, where KmerMinHash *ptr plays the same role as self does in Python methods. The other two arguments are a common idiom for passing a pointer to data in C, with insize being how many elements we have in the list.[5] CFFI is very good at converting Python lists into pointers of a specific type, as long as it is a primitive type (uint64_t in our case, since each hash is a 64-bit unsigned integer).
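
As a minimal standalone sketch of that conversion (plain CFFI, outside sourmash):

from cffi import FFI

ffi = FFI()
hashes = [12, 34, 56]

# CFFI copies the Python list into a newly allocated C array of uint64_t;
# the pointer and the element count travel to the C function separately.
buf = ffi.new("uint64_t[]", hashes)
insize = len(hashes)
# buf and insize are what a function like kmerminhash_add_many receives.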

And the Rust code with the implementation of the function:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()> {
    let mh = {
        assert!(!ptr.is_null());
        &mut *ptr
    };

    let hashes = {
        assert!(!hashes_ptr.is_null());
        slice::from_raw_parts(hashes_ptr as *mut u64, insize)
    };

    for hash in hashes {
      mh.add_hash(*hash);
    }

    Ok(())
}
}

Let's break what's happening here into smaller pieces. Starting with the function signature:

ffi_fn! {
unsafe fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize,
  ) -> Result<()>

The weird ffi_fn! {} syntax around the function is a Rust macro: it changes the final generated code to convert the return value (Result<()>) into something that is valid C (in this case, void). What happens if there is an error, then? The Rust extension has code for passing an error code and message back to Python, as well as capturing panics (when things go horribly wrong and the program can't recover) in a way that Python can then deal with (raising exceptions and cleaning up). It also sets the #[no_mangle] attribute on the function, meaning that the final name of the function will follow C semantics (instead of Rust's name mangling), and so it can be called more easily from C and other languages. This ffi_fn! macro comes from symbolic, a big influence on the design of the Python/Rust bridge in sourmash.
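
On the Python side, that error-passing convention usually looks something like the sketch below. To be clear: the helper names here are hypothetical, not sourmash's actual internals; this is just the general shape of the pattern symbolic uses.

# Hypothetical sketch; lib.clear_last_error, lib.get_last_error_code, and
# lib.get_last_error_message are illustrative names, not sourmash's real API.
def rustcall(func, *args):
    lib.clear_last_error()                 # reset the error slot
    result = func(*args)
    if lib.get_last_error_code() != 0:     # the Rust side stored an error
        msg = ffi.string(lib.get_last_error_message()).decode("utf-8")
        raise SourmashError(msg)           # surface it as a Python exception
    return result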

unsafe is the keyword in Rust that disables some checks in the code to allow potentially dangerous things (like dereferencing a pointer), and it is required for interacting with C code. unsafe doesn't mean that the code is always unsafe to use: it's up to whoever calls it to verify that valid data is being passed and that invariants are being preserved.

If we remove the ffi_fn! macro and the unsafe keyword, we have

fn kmerminhash_add_many(
    ptr: *mut KmerMinHash,
    hashes_ptr: *const u64,
    insize: usize
  );

At this point we can pretty much map between Rust and the C function prototype:

void kmerminhash_add_many(
    KmerMinHash *ptr,
    const uint64_t *hashes_ptr,
    uintptr_t insize
  );

Some interesting points:

- We use fn to declare a function in Rust.
- The type of an argument comes after its name in Rust, while it's the other way around in C. The same goes for the return type (it is omitted in the Rust function, which means it is -> (), equivalent to a void return type in C).
- In Rust everything is immutable by default, so we need to say that we want a mutable pointer to a KmerMinHash item: *mut KmerMinHash. In C everything is mutable by default.
- u64 in Rust -> uint64_t in C.
- usize in Rust -> uintptr_t in C.

Let's check the implementation of the function now. We start by converting the ptr argument (a raw pointer to a KmerMinHash struct) into a regular Rust struct:

let mh = {
    assert!(!ptr.is_null());
    &mut *ptr
};

This block asserts that ptr is not a null pointer and, if the assert passes, dereferences it and stores it in a mutable reference. If it were a null pointer, the assert! would panic (which might sound extreme, but is way better than continuing to run, because dereferencing a null pointer is BAD). Note that function signatures always need explicit types for arguments and return values, but for variables in the body of a function Rust can figure out types most of the time, so there's no need to specify them.

The next block prepares our list of hashes for use:

let hashes = {
    assert!(!hashes_ptr.is_null());
    slice::from_raw_parts(hashes_ptr as *mut u64, insize)
};

We are again asserting that hashes_ptr is not a null pointer, but instead of dereferencing the pointer as before, we use it to create a slice, a dynamically-sized view into a contiguous sequence. The list we got from Python is a contiguous sequence of size insize, and the slice::from_raw_parts function creates a slice from a pointer to data and a size.

Oh, and can you spot the bug? I created the slice using *mut u64, but the data is declared as *const u64. Because we are in an unsafe block, Rust lets me change the mutability, but I shouldn't be doing that, since we don't need to mutate the slice. Oops.

Finally, let's add hashes to our MinHash! We need a for loop, and call add_hash for each hash:

for hash in hashes {
  mh.add_hash(*hash);
}

Ok(())

We finish the function with Ok(()) to indicate no errors occurred.

Why is calling add_hash here faster than what we were doing before in Python? Rust can optimize these calls and generate very efficient native code, while Python is an interpreted language and most of the time doesn't have the same guarantees that Rust can leverage to generate that code. And, again, calling add_hash here doesn't need to cross the FFI boundary or, in fact, do any dynamic evaluation at runtime, because it is all statically analyzed during compilation.
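
A quick way to see the difference (a sketch, not from the PR; it assumes a sourmash 3 install with this change merged):

import time
from sourmash import MinHash

hashes = list(range(1, 1_000_001))

mh = MinHash(n=0, ksize=31, scaled=1)
t0 = time.perf_counter()
for h in hashes:
    mh.add_hash(h)        # one FFI crossing per hash
t1 = time.perf_counter()

mh = MinHash(n=0, ksize=31, scaled=1)
t2 = time.perf_counter()
mh.add_many(hashes)       # one FFI crossing for the whole list
t3 = time.perf_counter()

print(f"add_hash loop: {t1 - t0:.2f}s, add_many: {t3 - t2:.2f}s")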

Putting it all together

And... that's the PR code. There are some other unrelated changes that should have gone into separate PRs, but since they were so small, splitting them out seemed like more work than necessary. OK, that's a lame excuse: it's confusing for reviewers to see these changes here, so avoid doing that if possible!

But, did it work?

version   mem     time
original  1.5 GB  160s
list      1.7 GB  73s

We are using 200 MB of extra memory but taking less than half the time it took before. I think this is a good trade-off, and so did the reviewer, and the PR was approved.

Hopefully this was useful, 'til next time!

Bonus: list or set?

The first version of the PR used a set instead of a list to accumulate hashes. Since a set doesn't have repeated elements, this could potentially use less memory. The code:

temp_vals = defaultdict(set)

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

for sig, vals in temp_vals.items():
    sigd[sig].add_many(vals)

The runtime was again half of the original, but...

version   mem     time
original  1.5 GB  160s
set       3.8 GB  80s
list      1.7 GB  73s

... memory consumption was almost 2.5 times the original! WAT

The culprit this time? The new kmerminhash_add_many call in the add_many method. This one:

self._methodcall(lib.kmerminhash_add_many, list(hashes), len(hashes))

CFFI doesn't know how to convert a set into something that C understands, so we need to call list(hashes) to convert it into a list. Since Python (and CFFI) can't know if the data is going to be used later,[6] it needs to keep both copies around (until the garbage collector eventually deallocates them). And that's how we end up with at least double the memory being allocated...
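
A quick way to see the duplication (standalone; exact sizes vary by Python version):

import sys

hashes = set(range(100_000))
as_list = list(hashes)          # full copy: set and list are now both alive

print(sys.getsizeof(hashes))    # the set's hash table
print(sys.getsizeof(as_list))   # plus a new list over the same elements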

There is another lesson here. If we look at the for loop again:

for (k, v) in self.hashval_to_idx.items():
    for vv in v:
        temp_vals[vv].add(k)

each k is already unique, because the ks are keys in the hashval_to_idx dictionary, so the initial assumption (that a set might save memory because it doesn't have repeated elements) is... irrelevant for this problem =]


Footnotes

  1. We do have https://asv.readthedocs.io/ set up for micro-benchmarks, and now that I think about it... I could have started by writing a benchmark for add_many, and then showing that it is faster. I will add this approach to the sourmash PR checklist =]

  2. or triple, if you count C

  3. It would be super cool to have the unwinding code from py-spy in heaptrack, and be able to see exactly what Python methods/lines of code were calling the Rust parts...

  4. Even if py-spy doesn't talk explicitly about Rust, it works very very well, woohoo!

  5. Let's not talk about lack of array bounds checks in C...

  6. something that the memory ownership model in Rust does, BTW

Weird Hill

I'm compromising by picking a weird hill to lie on.