
Want me to work on your distributed .NET applications?


Petabridge - We help .NET companies make distributed, realtime applications.

Time for a brief commercial interruption...

Recently Petabridge announced its professional services offerings around Akka.NET.

I've been approached over the past year by a lot of .NET developers who've needed help using Helios for socket programming, or Akka.NET for distributed systems programming in .NET, and sometimes just some help getting Cassandra set up on Windows Azure.

Well, as part of Petabridge's offerings my time is officially now on the market for your projects.

If you would like my personal help learning how to use Akka.Cluster to build scalable ASP.NET and Windows Service microservices, deploying Akka.NET on Azure, rewriting your in-house socket servers to use Helios, or whatever else you need - request a consultation with Petabridge.

Here's what I do:

Architecture Review

The most common things I get asked are questions like "is this the right way to do XYZ on top of Akka.Remote" or "we're trying to do X on top of Helios, can you take a look at it?" or "could you take a look at our Cassandra CQL schema for this time-series system?"

I consider all of this "architecture review," and that's something I can help you with. For a fixed price you can ask me, one of the guys who wrote Akka.NET and the dude who invented Helios, to review your production-ready Akka.NET and Helios apps. I'll give you the expert eye you need to de-risk your apps in production.

Consulting

If reviewing your architecture isn't enough, I can also help you write it.

In-person training

Want to learn how Akka.Cluster and Akka.Persistence can help you build high availability systems easily? And how to integrate them with other technologies your company currently uses like ASP.NET, SignalR, NServiceBus, or Windows Azure?

These are things I already talk about at conferences and on the Petabridge Blog, but at a high level. If you want hands-on, in-person training for you and your team that's something I can do.


Let's talk

I'd love to hear about what you're working on and how I can help! So let's start a conversation and see what we can do together!


Creating a Custom ETW EventSource for Debugging High-Performance Code in C#


One of the things I've been working on for both Helios and Akka.NET is a custom ThreadPool implementation that allows us to isolate mission-critical code from the noise of the CLR ThreadPool.

My second pass at implementing one of these instanceable thread pools had me borrow the work-stealing queue implementation that's been responsible for dramatic performance improvements in the CLR ThreadPool. The original ThreadPool code is very optimized and relies on native calls internally, so it's not always easy to follow.

My initial performance numbers for DedicatedThreadPool were absolutely awful, so I needed to find a high-performance way of measuring just how well this work queueing structure was working for me.

Enter Event Tracing for Windows (ETW). I spent way more time than I care to admit trying to figure out how to make this work for me. I'm going to spare you the agony of figuring all of this stuff out by yourself and give you an easy step-by-step guide on how to record and view your own events in ETW.

Step 1 - Install PerfView

PerfView is a free tool from Microsoft for doing software performance analysis, and it works great with ETW. Download a copy of it and unzip PerfView.exe somewhere where you can find it again easily.

Step 2 - Install the Microsoft.Diagnostics.Tracing.EventSource NuGet Package

I could not figure out, for the life of me, how to make the rest of this technique work using the System.Diagnostics.Tracing.EventSource library built into .NET - and after much researching I was told to just use this NuGet package.

PM> Install-Package Microsoft.Diagnostics.Tracing.EventSource -Pre

This package will give you compile-time validation if you're wiring up your EventSource classes correctly and even ships with a handy guide which explains how to use all of the EventSource features in their entirety.

Step 3 - Subclass Microsoft.Diagnostics.Tracing.EventSource

This is how you define your custom EventSource and all of the events that will show up in PerfView. Here's the one I used for profiling our DedicatedThreadPool.

[EventSource(Name = "DedicatedThreadPool")]internalsealedclassDedicatedThreadPoolSource:EventSource{internalstaticclassDebugCounters{publicstaticreadonlyAtomicCounterStealCounter=newAtomicCounter(0);publicstaticreadonlyAtomicCounterStealMissCounter=newAtomicCounter(0);publicstaticreadonlyAtomicCounterGlobalHitCounter=newAtomicCounter(0);publicstaticreadonlyAtomicCounterGlobalMissCounter=newAtomicCounter(0);publicstaticreadonlyAtomicCounterLocalHitCounter=newAtomicCounter(0);publicstaticreadonlyAtomicCounterLocalMissCounter=newAtomicCounter(0);}publicvoidMessage(stringmessage){WriteEvent(1,message);}publicvoidStealHit(){WriteEvent(2,DebugCounters.StealCounter.GetAndIncrement());}publicvoidStealMiss(){WriteEvent(3,DebugCounters.StealMissCounter.GetAndIncrement());}publicvoidGlobalQueueHit(){WriteEvent(4,DebugCounters.GlobalHitCounter.GetAndIncrement());}publicvoidGlobalQueueMiss(){WriteEvent(5,DebugCounters.GlobalMissCounter.GetAndIncrement());}publicvoidLocalQueueHit(){WriteEvent(6,DebugCounters.LocalHitCounter.GetAndIncrement());}publicvoidLocalQueueMiss(){WriteEvent(7,DebugCounters.LocalMissCounter.GetAndIncrement());}publicvoidThreadStarted(){WriteEvent(8);}publicstaticreadonlyDedicatedThreadPoolSourceLog=newDedicatedThreadPoolSource();}

When you make a call to WriteEvent, you must pass in a unique ID (int) for each different type of event you want to record.

For high-performance events like these (clocking thread queuing behavior) I recommend using small data types for your event state - integers.
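By the way, the AtomicCounter used in the event methods above is a small Helios utility class. If you want to reproduce this example without Helios, a minimal sketch looks something like this (my illustration, not the actual Helios source):

using System.Threading;

// Minimal thread-safe counter; returns the pre-increment value, like
// Java's AtomicInteger.getAndIncrement().
public class AtomicCounter
{
    private int _value;

    public AtomicCounter(int initialValue)
    {
        _value = initialValue;
    }

    public int GetAndIncrement()
    {
        // Interlocked.Increment returns the incremented value, so subtract one
        // to report the value the counter held before this call.
        return Interlocked.Increment(ref _value) - 1;
    }
}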

I also recommend using the EventSource attribute on your class like this, to give it a friendly name:

[EventSource(Name = "DedicatedThreadPool")]

You're going to have to use this name to look up ETW events in PerfView, so it's a little bit easier if you manually assign it a name rather than depending on the conventions generated by the compiler.

You might have also noticed this:

public static readonly DedicatedThreadPoolSource Log = new DedicatedThreadPoolSource();

I'm using a singleton here because this makes it a lot easier to actually log events throughout my code.

Step 4 - Record Events Using Your Custom EventSource

Now we just need to record some of the events we defined on the DedicatedThreadPoolSource class. It looks like this:

[SecurityCritical]
internal void MarkThreadRequestSatisfied()
{
#if HELIOS_DEBUG
    DedicatedThreadPoolSource.Log.ThreadStarted();
#endif
    // rest of method...
}

Easy!

Step 5 - Use PerfView to Profile Your App and Record Your Custom ETW Source

Now for the next part - actually recording your metrics - you can do this easily through PerfView.exe by doing the following.

In PerfView.exe, go to Collect --> Run and grant it Admin rights when asked.

Profiling an app with custom ETW EventSource in PerfView.

Set the path to your own app where you've defined your EventSource, but most importantly - you need to specify the following under Additional Providers:

*DedicatedThreadPool

This is the name we gave our DedicatedThreadPoolSource via the EventSource attribute earlier - and we must put the * in front of it in order for PerfView to pick up its events.

Once this is done, press the Run Command button and you should see the following:

Running PerfView for profiling.

The bottom-right corner will continue to flash for a bit until the profiling is complete. Once it's done you'll be able to view the results.

Step 6 - Review Your Results in PerfView

After PerfView is finished collecting its data, you should see the left-hand panel fill up with new items for you to review.

Results in PerfView

Your custom events will show up under Events and EventStats. Personally, I like to take a look at EventStats first just to gauge how frequently each custom event was hit.

PerfView EventStats for custom EventSource

You can see our DedicatedThreadPool events sprinkled throughout the list.

Now go give this a shot yourself!

On a Mission to Mars


This weekend I went through the Landmark Forum and had to confront a big secret that I've been keeping hidden for virtually my entire life: the true aim of my ambitions.

Ever since I was a middle schooler, I've wanted to help build the first human colony on Mars. I've spent multiple hours per day thinking about it for years and years and told no one. I was afraid that people would think it's a ridiculous goal, and that I wasn't successful enough to do it yet. And really, I was scared that I would fail. So I let these fears stop me from ever trying or ever sharing it.

The Forum helped me realize that these fears are bullshit.

I'm going to help put the first permanent human settlement on Mars. That's my mission, and today I'm committing to realizing the possibilities that go with it. Sure, it'll be really, really hard - but that's what makes it worth doing.

Let me tell you where this all started...

It all began with a shitty Val Kilmer movie

Red Planet movie poster

When I was in middle school, I saw Red Planet one night. The movie was entertaining enough - but one plot detail it contained stuck with me for years as a fascinating possibility.

In the movie, scientists had begun the process of "terraforming" Mars - the word "terraform" literally means "to shape something in Earth's image."

Terraforming is a process by which you take a body like a planet, moon, or asteroid and reshape its environment to mirror Earth's. Create a breathable atmosphere, sources of water, human-livable temperatures, Earth-like weather, and so forth where there was none before.

It's a powerful idea that's been around in science fiction for a long time, but the way it was done in Red Planet captured my imagination so completely that it's stuck with me for years.

Scientists bioengineered species of algae that could sustain themselves on the materials found on Mars' surface and gradually produce a breathable atmosphere over time, and the algae was bred to evolve with the planet as the atmosphere began to form. Rather than inventing massive terraforming machines that used traditional 20th century approaches to produce something (chemical input + combustion / electricity = results), the Red Planet writers exposed me to an entirely new possibility: design a simple organism that could use nature itself to achieve this goal without any further input from humans.

This idea gave me a profound new sense of possibility for myself and mankind.

I took from Red Planet that humanity is destined to become an interplanetary species if we look past what we've done before and consider new approaches to challenges that seem impossible. Because they only seem that way - and that's how many people choose to perceive them. But the reality is that most of them aren't impossible. Really, really, really, really hard and perhaps not possible today. But they're challenges that are all possible to overcome tomorrow.

My role

In my view, the mission to Mars consists of four distinct problem areas:

  1. Transportation - how do we move things from Earth to Mars? This is SpaceX's mission.
  2. Infrastructure - how do we reliably communicate between Earth and Mars? How do we leverage Mars' natural resources to create habitats, energy, food, and water?
  3. Survival - how do we enable life to survive on Mars? How do we enable life to survive the journey?
  4. Sustainability - what will humans actually do once they land on Mars? What will those societies look like?

Transportation is the first problem, but it's been solved - we've already been able to transport things to Mars.

However, we haven't been able to transport very much to Mars. Getting stuff off of the Earth's surface and into orbit is extremely expensive - Elon Musk, SpaceX's founder and CEO, said somewhere that 90% of the costs of launching a rocket are just to escape Earth's gravity.

This presents a really interesting challenge... Without a radical innovation in rocketry and space travel, it will never be economical for us to produce infrastructure materials on Earth and transport them to Mars.

It costs $22,000 per pound to send something into Earth's orbit. Shipping four tons of building supplies (8,000 lbs) into Earth's orbit costs $176,000,000 today - that doesn't cover any of the costs for transporting those materials the 54.6 million kilometers to Mars (when Mars is at its closest orbit to Earth) and landing them safely on the planet's surface. The full cost is some multiple of the launch costs.

Edit: a previous version of this article said it was $10,000 per lb. Had a friend who works in aerospace inform me that the costs have increased to roughly $22,000 per lb now.

Did I mention that we only get that "closest" possible distance to Mars once every 2 years or so? And that "closest" distance can be anywhere from 54.6 million km to over 100 million km?

From Universe Today: The last known closest approach was back in 2003, when Earth and Mars were only 56 million kilometers apart. And this was the closest they’d been in 50,000 years.

At Mars' longest orbit it's actually 401 million kilometers away, and the freaking Sun stands in the middle between Earth and Mars at that distance. You can read more about the relative distances and orbits of Mars and Earth here.

The point is: transportation, the strongest focal area of the Mars mission so far, only solves one part of the problem. And waiting for that problem to be "solved" without starting on infrastructure misses a massive opportunity to have some real breakthroughs. And even when it is "solved," there are periods where Mars will be on the opposite side of the sun and we simply won't be able to launch missions there.

The problem I want to solve is building infrastructure on Mars without waiting for a multiple order of magnitude decrease in the cost of transportation. And without needing people on Mars to build it.

Introducing Collaborative Computing

Software architects like me get really concerned about latency in networked applications - the amount of time it takes to get a response from an application running remotely on a separate machine. We freak out if our web pages take more than 250 ms to load.

Well, when Mars is at its closest orbit it takes 3 minutes just for a signal from Earth to reach Mars and another 3 minutes to get a response back. At its furthest point it takes over 22 minutes, and the sun sits in the middle and completely blocks any signal to or from Mars.

Light Travel Latencies by Mars Opposition

Here's how long it actually takes information to travel from Earth to Mars, per the oppositions listed here as well as the theoretical min and max.

Date            | Distance to Mars (km) | Light Travel Time (min)
Dec. 24, 2007   | 88,200,000            | 4.90
Jan. 29, 2010   | 99,300,000            | 5.52
Mar. 03, 2012   | 100,700,000           | 5.60
Apr. 08, 2014   | 92,400,000            | 5.14
May 22, 2016    | 75,300,000            | 4.19
Jul. 27, 2018   | 57,600,000            | 3.20
Oct. 13, 2020   | 62,100,000            | 3.45
MAX THEORETICAL | 401,000,000           | 22.29
MIN THEORETICAL | 54,600,000            | 3.04

This type of latency is unheard of in software!

If you're curious, I made a spreadsheet of possible "light travel time to Mars" calculations that you can try out yourself.
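If you'd rather check the math in code than in a spreadsheet, the calculation is just distance divided by the speed of light. A quick C# sketch, using the theoretical min and max distances from the table above:

using System;

class LightTravelTime
{
    const double SpeedOfLightKmPerSec = 299792.458;

    // One-way light travel time in minutes for a distance given in kilometers.
    static double MinutesToMars(double distanceKm)
    {
        return distanceKm / SpeedOfLightKmPerSec / 60.0;
    }

    static void Main()
    {
        Console.WriteLine("Closest (54.6M km):  {0:F2} min", MinutesToMars(54600000));
        Console.WriteLine("Farthest (401M km): {0:F2} min", MinutesToMars(401000000));
        // Prints roughly 3.04 and 22.29 minutes - matching the table.
    }
}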

Overcoming latency with autonomous, self-correcting systems

Trying to remotely control a machine deployed on Mars - a machine that takes 6 minutes of round-trip time per request on its very best day, a day that comes around roughly once every 50,000 years - is an approach that is utterly doomed to fail from the start, whether the controller is software or humans on Earth.

The solution is to build autonomous systems that are capable of making their own decisions in pursuit of specific goals, without needing any remote guidance or instruction from Earth. This is what the NASA Curiosity rover did, and all of the other machines before it.

I'm going to take this to the next level by introducing the field of Collaborative Computing.

Collaborative Computing consists of groups of machines that are able to work together as a team, self-replicate or invent new tools, and learn from each other in order to collaborate in pursuit of specific objectives without any human supervision.

Today when people hear "collaborative computing" they think of collaboration software - like editing a Google Doc. I'm going to take this as an opportunity to define something much larger - a type of technology that has the ability to change worlds.

Collaborative Computing on Mars

So, back to the economics of space travel - it's too damn expensive to ship building materials and infrastructure to Mars, and there are periods of time when it's simply not possible to even do it due to the relative orbits of Mars and Earth.

Thus, we need a way to ensure that infrastructure can be built without any human involvement and without any major transportation costs. Enter Collaborative Computing.

Imagine if we could send a small group of robots to Mars that were capable of doing the following:

  • Identify and extract iron deposits on Mars;
  • Refine raw materials into building materials;
  • Use building materials to construct infrastructure;
  • Use infrastructure to build more machines (or robots) that can build additional types of infrastructure.

No human involvement other than specifying the goals for these machines and providing them with initial designs and methodologies for accomplishing those objectives. That possibility literally has the ability to transform worlds, both Mars and Earth.

What needs to happen to make Collaborative Computing possible

Collaborative Computing is not possible today, otherwise we'd already have robots autonomously building roads and all sorts of other things here on Earth.

So what needs to happen in order to realize Collaborative Computing?

  • Develop artificial intelligence and machine learning software capable of self-directed evolution in order to accomplish specific goals with minimal direction;
  • Give that AI software the ability to distribute and share knowledge with other related systems;
  • Put that software onto hardware capable of carrying out said goals;
  • Make that hardware capable of constructing other machines, structures, and tools - including copies of itself; and
  • Most importantly: expose a framework for allowing anyone to build collaborative systems.

This is where I come in.

How Petabridge, Akka.NET, and Helios fit into this picture

Petabridge - helping developers discover entirely new ways to take on audacious challenges.

So what does any of this have to do with what I'm doing today at Petabridge? The answer is this: Petabridge is going to be the company that develops the software frameworks for accomplishing this and many other "impossible" challenges.

Petabridge's founders, Andrew and myself, believe that the future is something we create right now. And that in order to create a future where challenges like getting to Mars can be realized, we have to look past what we already know and consider new ways of doing things.

Petabridge helps people write their own futures by giving them visionary new technologies to use and including them in communities of users we build around them.

We just happen to build software development platforms for doing this today. So why don't you get started and try Akka.NET or Helios?

Collaborative Computing will be realized, and I invite you to join me in making it happen.

How to Start Contributing to Open Source Software


The Petabridge team (all two of us) just wrapped up a big two weeks. We launched Akka.NET V1.0 and then traveled to Portland to talk about .NET open source software at .NET Fringe.

.NET Fringe Conference

One of the central themes of .NET Fringe is open source communities - and there are two sides to this coin:

  • How does an open source project successfully attract contributors and make them effective?
  • How does your typical software developer become an OSS contributor?

I'm going to touch on the former topic in a subsequent post, as that appeals to a more niche audience than the latter.

Why do open source?

All OSS contributors start off like any other typical software developer... So let's start there.

You work in an office and you're initially really satisfied with what you do. But engineering is as much a creative endeavor as it is a technical one... And your day job doesn't really scratch your itch entirely.

You really have the languages, frameworks, tools, and design methodologies that you currently use down cold. At least you think you do. So, in your opinion, there's not a lot of new or exciting stuff happening at your work place.

After enough time passes, you start to feel unsatisfied. Bored. Unfulfilled.

And thus, you choose one of following:

  1. Switch jobs to someplace else and hope the environment is more stimulating; or
  2. Become complacent (the "default" option) and stop being curious about software development; or
  3. Take matters into your own hands and become responsible for your own happiness when it comes to software development.

Option 1 is the insanity option - repeating the same choices that led you to where you are now and expecting a different result. Getting a job at Facebook, Google, Microsoft, Cloudera, or whatever might seem more stimulating than writing line of business apps for a bank, but the honeymoon ends and the novelty wears off. Quickly.

Option 2 is the resignation option - the die is cast and you're just going to have to get used to whatever your current environment is. You create a little prison cell for yourself based on the confines of your current environment, and at some point you may even begin to love the cell walls.

Let this go on long enough and you become that fat bearded bastard who incredulously defends using Visual Basic in production in 2015. That's not a career worth celebrating or a life worth living if your resignation translates over to other areas of your life. As the saying goes, "the way you do anything is the way you do everything."

Option 3 is the empowerment option - take responsibility for your own career and happiness by satisfying your curious and creative urges. This is the key to a happy career and life - don't let your work environment be the sole determinant in your day-to-day programming experience and expression.

Contributing to open source is one of the most natural and common ways to take ownership of your self-expression as a developer and the fact that you get to share that experience with like-minded people, the other contributors and users, makes it even better.

Open source is a venue for discovery, self-expression, and most importantly: investing in yourself. It's one of the most powerful choices you can make as a software developer. You're not required to do anything - it's entirely your choice. And that's what makes you powerful when you do it.

And here's the big irony - the "great" work environments that developers dream about, where self-expression comes naturally... That comes about as a result of owning your self-expression and bringing that attitude with you to work.

So how do you get started?

Getting started

Ok, so you're ready to take ownership of your self-expression and blah blah. What are the literal, tactical steps for getting started?

Pick a project

The first step is to just find a project you want to work on.

Ideally it should be a project that's already active, meaning that it has at least one other participant, gets updated frequently, and does releases regularly.

It really doesn't matter what the project does or how big the project is - those are details that become more important when it's time to make your first contribution.

The only criteria that really matters is this question:

Is this project something that I would want to use in my ideal day job?

If the answer is "yes" then you've found a great fit!

Become an end-user (if you aren't already)

Before you can start making effective contributions, you should become an end-user of the project first.

Write a couple of small applications, maybe a blog post explaining how to use it, or whatever. You don't need to become an expert on every nit-picking detail right away and you don't need to do any production-grade deployments - but you need to get a feel for how the end-user is going to consume any of your contributions.

The whole point of contributing to this project is to be able to use it yourself, so have fun with it!

Start lurking the project (read-only participation)

Before you can contribute, you need to get your bearings on the project. Here are some questions you can get the answers to just by reading and listening:

  1. What are the goals of the project?
  2. How's the code organized?
  3. Does the project have any standards? What are they?
  4. Who are the key players responsible for X, Y, and Z?
  5. Does the project have a published roadmap for future versions?
  6. How do pull requests get submitted? What does the review process look like?
  7. How do the contributors communicate? Google Groups? Gitter? Github issues? IRC?

The goal of read-only participation is to learn the rules of the road for the project. Learn how the code is structured and organized, but more importantly: start learning how the other people involved work together.

Once you've done this for a little while, you'll have enough information to start actively contributing. This is where the real fun begins.

Working with other contributors

The hard part of any software project is working with other people, and open source is no different.

But learning how to do this well is a lot easier if you adopt the following ideas.

Assume good faith

When you're interacting with other contributors, assume that everyone is acting in good faith by default.

If you make a contribution and a veteran contributor declines to accept your changes because they don't meet some standard or don't align with some other planned changes that are in progress already, don't take it personally! Take the feedback at face value and see what you can learn from it.

Assume there's a good reason for previous designs

Don't come into a project and start refactoring everything without communicating with other contributors first - there's often a history to the design that you may not be privy to right away.

It's exhausting for project owners to regularly battle with new contributors who insist on imposing biases from previous and usually inapplicable experiences onto the project. Don't be that guy - be the person who seeks to understand why things are the way they are.

Assume there's a history to these changes and find out what it is - who knows? It might be sloppy code from early on in the project before clear standards were established, and you should refactor it. Or it may turn out that there's a previously unfamiliar programming principle at work or a weird environmental bug that contributors had to work around.

You will be treated with a great deal of respect if you ask thoughtful questions, even if the contributors have had to answer them dozens of times before.

Be coachable

The single most important thing you can do as a potential new OSS contributor is to become coachable. You're doing OSS for the sake of learning and becoming a more powerful developer and thus you must be open to suggestions from others in order to do this effectively.

Learn to acknowledge shortcomings with your own code without assigning any moral weight or attachment to them.

Accept that the tools and concepts that you know today aren't the best solution for all problems.

Explore ideas without making instant "right" or "wrong", "should" or "shouldn't," and "better" or "worse" judgments.

Actually listen to other contributors - don't just search for a reason to make yourself right or them wrong.

And review the Taxonomy of Terrible Programmers - avoid any behavior on that list.

Don't be afraid of being "wrong"

I'm an experienced OSS leader and contributor and I screw up all the time. Software is a process, not a product - and part of that process involves discovering bugs and design flaws, including your own!

Don't be afraid of making a mistake - whether it's in your code, your speech, or whatever. You're going to become a much more powerful contributor if you don't assign moral "right" or "wrong" weight to everything.

If you make a mistake, it's not a big deal. Apply that standard to everybody else too.

Follow the rules

If the project has standards and guidelines... Follow them. They're usually pretty small and not very complicated. If you don't understand why a rule exists or what it means, just ask.

Be a part of the conversation

Stay in touch with the other contributors on the mailing list, Gitter, IRC, Twitter, or whatever. We're all just people! Just relax and be yourself - ask questions, tell stories, make jokes, post animated gifs, or whatever.

Act like a person, get treated like one.

Dealing with jerks

A lot of developers are shy, introverted types - so the prospect of dealing with some belligerent asshole developer (The Agitator from the Taxonomy of Terrible Programmers) on a public forum can be pretty intimidating.

Here's a simple set of guidelines for dealing with jerks.

If the project is run by jerks, then it's a toxic project. Don't get involved.

Nothing sucks the fun out of open source like walking on eggshells around the core contributors; I recommend just avoiding projects like these.

OSS contributors who run successful projects and act like jerks are victims of their own success in a way - deeply insecure about their own code, which a lot of people now use thus putting them under even more pressure, so they overcompensate by trying to dominate and belittle others.

Just avoid these projects - no one has time for other people's drama.

"Jerks" usually aren't jerks 1 (cultural differences)

One of the things that will come up often in successful OSS projects is cultural differences. Americans and Germans are extremely direct when we communicate, so we can come across as giant assholes to people in other cultures when we're expressing ourselves the way we normally would.

Develop an awareness of how different cultures communicate and how you communicate with them. Many of the contributors on Akka.NET are Swedish, and they have a much more communal / softer tone when they communicate. Americans like me are hyper individualistic and direct.

Without being aware of the differences in how we communicate with each other, this could be a source of drama. So give people the benefit of the doubt; it might just be a cultural thing after all.

But take the time to account for those cultural differences in your own communication too, and if necessary - learn what those cultural differences are in their communication and educate them on yours. It's information that stands to benefit all parties.

"Jerks" usually aren't jerks 2 (perspectives)

In addition to the cultural differences, you should try to gauge the perspectives of other contributors who might be acting a little "jerky."

What if the contributor who's being a "jerk" is also responsible for a massive production deployment of this OSS project at his or her company, and is being criticized heavily at work for how it's performing? Do you think that's going to have an impact on their tone with other contributors on the project? Of course it will.

Again, operate on the assumption of good faith - try to figure out what's really going on with that contributor one on one. I guarantee you'll get better results from these contributors if they feel like they're understood - a little empathy goes a long way.

Actual jerks

They do exist, but they're exceedingly rare in my experience. Actual jerks are deeply insecure and take it out on other people by trying to make themselves right / others wrong.

Don't be afraid of them - pity them. They're more afraid of you than you are of them.

If an actual jerk comes after you, the best practice is to not engage. Just dismiss them.

If the harassment gets really bad or happens in private, make sure you let other contributors know. You shouldn't have to put up with assholes - so don't allow it.

Actively contributing

If you nail all of the above, actually contributing is as easy as just asking more experienced contributors "what can I do to help?"

Every OSS project needs help! It might be in the form of more documentation, fixing bugs, upgrading the build system, or building a new feature.

When it comes to open source the technology is easy - the people part less so.

So master and practice the above - enjoy the new things you'll learn from the project and the other contributors. And bring those new attitudes to your work. You'll be immensely more powerful and more satisfied with your career for it.

Talking about Akka.NET and the Actor Model on Hanselminutes and .NET Rocks


I've done a bit of a "press tour" for Akka.NET since we released Akka.NET v1.0 at the beginning of April and I wanted to share a couple of the interviews I've done in the .NET community since.

Hanselminutes: Inside the Akka.NET open source project and the Actor Model with Aaron Stannard

I did an interview with Scott Hanselman immediately after .NET Fringe and we covered the goals and scope of the Akka.NET project, but also really took a close look at the Actor model and why it's becoming a popular concept among .NET developers.

Listen to me on Hanselminutes! (33:04).

.NET Rocks: Akka.NET V1 with Aaron Stannard

Carl and Rich, the two hosts of .NET Rocks, did a better job with the synopsis than I could!

Akka.NET ships! Carl and Richard talk with Aaron Stannard about Akka.NET, a toolkit and runtime for building highly concurrent, distributed and fault tolerant event-driven applications. Akka.NET is a port of the original Akka framework in Java/Scala. Aaron talks about the reactive manifesto as the driver for Akka.NET, to provide tools for responsiveness, resiliency, elasticity and message driven.

Listen to me on .NET Rocks! (58:40).

Scalability Lessons we can Learn from Voat


Voat - have your say

Last weekend I found voat.co on the /r/dotnet subreddit, one of the places I frequent for news and happenings related to all things C# / F#.

Voat exists simultaneously as two different things:

  1. The Voat software - an open source, ASP.NET MVC + MSSQL implementation of Reddit AND
  2. voat.co - an instance of the software that is owned and operated by its creators.

I'm a huge fan of Reddit - had an account there for years, and seeing such a kick-ass implementation of it written in C# with SignalR, ASP.NET MVC, Web API, and lots of other cool toys gets me really excited!

Voat could be a huge opportunity to expose lots of curious onlookers to .NET and C#, and it will be an even better opportunity to get C# / ASP.NET developers into open source if the voat.co website successfully establishes a community.

The authors themselves explicitly state in the README of the Voat project on Github that they built Voat just for the sake of learning:

This was just a hobby project to help me get a better understanding of C# and ASP.NET MVC and Entity Framework.

So helping broaden the reach of .NET in OSS is my primary interest in getting excited about Voat, and last weekend I spent a bunch of time reviewing the Voat source code and looking for potential areas where I could contribute.

Voat's Gradually Increasing Popularity

The Voat software has been online for about a year, but it's been seeing a steady increase in activity over the past several months.

Voat's founders gave me some interesting usage and performance statistics in this thread I started on /v/voatdev, but unfortunately, due to the load the site is currently under, I can't retrieve them for you at the time of writing this.

The point being: Voat.co's usage has been increasing rapidly over the past few months, apparently as a result of dissatisfied users moving on from Reddit (based on my own limited observations.)

It became clear to me that Voat.co was going to have to start dealing with scaling issues sooner rather than later - I didn't think it would take much for Reddit to stir the pot and cause a mass exodus onto Voat given the state of the political climate at Reddit lately.

Having been in a situation where sudden onset traffic nearly crushed my business before, I immediately started looking through the Voat source code for anything that would indicate obvious scaling problems or bottlenecks.

Here's what I found:

  1. Everything uses the built-in CRUD tools from Microsoft: Entity Framework, ASP.NET Identity, and so on. All of those built-in tables are baked into the Voat SQL schemas as well.
  2. No SignalR Backplane: means that this software is currently only designed to run on a single box.
  3. SignalR on every page: page loads are going to get expensive, given that SignalR websocket calls will be used on every tab.

I suspected that these might be sources of problems if scalability became an issue.

EF, ASP.NET Identity, and Strong Coupling

The built-in Microsoft CRUD and ASP.NET Identity tools are designed for rapid application development, and they tightly couple your user identity schema to SQL Server.

The real danger I see in this is that when a write-heavy workload arrives, such as counting votes / comments / posts from thousands of concurrent users, SQL Server is going to implode - and without some abstraction between the two, decoupling the comment / voting store from the SQL store is going to require a major rewrite.

Fact of life: SQL Server and other relational DB stores are not intended for applications with high write / read ratios. They're heavily optimized for read-heavy workloads.
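To make the abstraction point concrete, here's the kind of seam I'd want between the domain logic and the storage engine. The interface and names here are hypothetical - a sketch, not Voat's actual code:

using System.Threading.Tasks;

// Hypothetical seam between domain logic and storage. Controllers depend on
// this interface rather than on EF or SQL Server directly.
public interface IVoteStore
{
    // Record an up/down vote (direction = +1 or -1) on a submission.
    Task RecordVoteAsync(string submissionId, string userId, int direction);

    // Read the current aggregate score for a submission.
    Task<long> GetScoreAsync(string submissionId);
}

Version 1 can implement this on top of Entity Framework and SQL Server; when the write-heavy load shows up, a counter-oriented store can be dropped in behind the same interface without rewriting every controller.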

SignalR

As for the SignalR issues, I was able to confirm that Voat is running on a single box. Not a big deal given that it's still a hobby project for the developers, but the fact that it's not designed to run on anything other than a single box is an obvious problem. Either remove SignalR or support a backplane for it.

But beyond that, SignalR is much more resource-intensive than serving up a single HTTP request. It keeps a connection open for each tab for as long as the tab is active. If you're serving up a large number of page views, this will get expensive in terms of memory utilization.
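For reference, wiring SignalR up to a backplane is only a few lines. Here's roughly what the Redis variant looks like in an OWIN startup class, with a hypothetical Redis host and event key (the standard Microsoft.AspNet.SignalR.Redis setup, not Voat's code):

using Microsoft.AspNet.SignalR;
using Owin;

public class Startup
{
    public void Configuration(IAppBuilder app)
    {
        // Publish every SignalR message through a shared Redis server so that
        // broadcasts reach clients connected to any web server in the farm.
        GlobalHost.DependencyResolver.UseRedis("redis.example.com", 6379, "", "Voat");
        app.MapSignalR();
    }
}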

"We have bigger problems to worry about than scalability"

I started asking questions about the current performance numbers for Voat.co, the network architecture, and so forth - mostly because .NET scalability is an area I know a lot about and love helping with! But I was more or less told that the voat.co team wasn't worried about it at the moment. Wouldn't be an issue.

"Ok, let me know how I can help" is how I left it.

72 Hours Later: Mass Exodus from Reddit Takes Down Voat.co's Servers

As fate would have it, Reddit's administrators banned a number of offensive subreddits with a large number of users a few days after I had this conversation with Voat's developers. Hundreds of thousands of regular users.

I don't really care about the politics of why that happened, but the point is - Voat.co was hit with an avalanche of traffic that they were not prepared for. Thus, Voat's homepage now looks like this:

Voat - HTTP 503

Scalability problems are great problems to have, because this means that the demand for your service is beyond your ability to provide it. But, as I've learned, they are also immensely scary and frustrating to deal with if you are not prepared.

What can we learn?

Voat.co is missing out on a fantastic opportunity to capture a ton of traffic, new users, and advertising revenue right now. It may have started out as a hobby project for the original developers, but now the platform has really taken on a life of its own.

So what can we learn from this? What can we learn about Voat as .NET developers?

  1. Don't prematurely optimize your code, but plan for scalability - what does this mean exactly? For starters, don't tightly couple your application to the implementation details of a particular database. I'm extremely skeptical of anyone who says "YAGNI" to this. Here's my personal foray into the hell known as "realizing you've picked the wrong database at exactly the wrong time." Design your system in a way that the components you launched with can be replaced by the right tool for the job later under different levels of stress.
  2. Don't use anything you can't scale beyond one machine - if your system can only run on one machine, you're screwed. You have no redundancy or resiliency against most types of network failures. Design your system to have more hardware thrown at it from day one.
  3. Use a monitoring / alerting service from day one - Pingdom has a free trial and starts at $15 / month. Don't be stuck outside golfing the day your server gets engulfed in precious, precious traffic. Make sure your servers can let you know if there's a problem!
  4. You never know when traffic will strike; plan for it happening any time. Have a contingency plan to scale your service at any given time. Don't assume that it's going to happen gradually. At MarkedUp we experienced 600% growth for 3 consecutive days without any warning, and we didn't have a plan. This happened right around Thanksgiving here in the US, so my holidays that year were pretty stressful to say the least.
  5. Measure your bottlenecks carefully, and if you're an OSS project - share that data with contributors! - I can't really do more than just make educated guesses as to where Voat's bottlenecks are, because I don't have any data to quantify the problems! So make sure you're actively monitoring CPU utilization and profiling your code - run traces on your SQL queries and try to tune up slow-running indexes and so forth. Solving scaling problems requires good data.

I'm excited for what the future has in store for Voat.co and the OSS project behind it! If you want to help make a difference for those guys, donate to them using the information below:

If you want to help us, you could donate via paypal to hello@voat.co or via bitcoin to 1C4Q1RvUb3bzk4aaLVgGccnSnaHYFdESzY and we'll make sure you get a badge on your voat profile as a small token of gratitude.

And if you want to get involved in Voat's OSS efforts, check out the Voat project on Github!

Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack


MarkedUp Logo

This is an archive of a blog post I wrote for the MarkedUp Analytics blog on February 19th, 2013. It's been a popular post and I'm posting it here in order to preserve it.

When we first made MarkedUp Analytics available on an invite-only basis back in September, we had no idea how quickly the service would be adopted. By the time we completely opened MarkedUp to the public in December, our business was going gangbusters.

But we ran into a massive problem by the end of November: it was clear that RavenDB, our chosen database while we were prototyping our service, wasn’t going to be able to keep growing with us.

So we had to find an alternative database and data analysis system, quickly!

The Nature of Analytic Data

The first place we started was by thinking about our data, now that we were moving out of the “validation” and into the "scaling" phase of our business.

Analytics is a weird business when it comes to read / write characteristics and data access patterns.

In most CRUD applications, mobile apps, and e-commerce software you tend to see read / write characteristics like this:

Read and Write characteristics in a traditional application

This isn’t a controversial opinion – it’s just a fact of how most networked applications work. Data is read far more often than it’s written.

That’s why all relational databases and most document databases are optimized to cache frequently read items into memory – because that’s how the data is used in the vast majority of use cases.

In analytics though, the relationship is inverted:

Read and Write characteristics in an analytics application

By the time a MarkedUp customer views a report on our dashboard, that data has been written to anywhere from 1,000 to 10,000,000 times since they viewed their report last. In analytics, data is written multiple orders of magnitude more frequently than it’s read.

So what implications does this have for our choice of database?

Database Criteria

Looking back to what went wrong with RavenDB, we determined that it was fundamentally flawed in the following ways:

  • Raven’s indexing system is very expensive on disk, which makes it difficult to scale vertically – even on SSDs Raven’s indexing system would keep indexes stale by as much as three or four days;
  • Raven’s map/reduce system requires re-aggregation once it’s written by our data collection API, which works great at low volumes but scales at an inverted ratio to data growth – the more people using us, the worse the performance gets for everyone;
  • Raven’s sharding system is really more of a hack at the client level which marries your network topology to your data, which is a really bad design choice – it literally appends the ID of your server to all document identifiers;
  • Raven’s sharding system actually makes read performance on indices orders of magnitude worse (has to hit every server in the cluster on every request to an index) and doesn’t alleviate any issues with writing to indexes – no benefit there;
  • Raven’s map/reduce pipeline was too simplistic, which stopped us from being able to do some more in-depth queries that we wanted; and
  • We had to figure out everything related to RavenDB on our own – we even had to write our own backup software and our own indexing-building tool for RavenDB; there’s very little in the way of a RavenDB ecosystem.
So based on all of this, we decided that our next database system needed to be capable of:
  1. Integrating with Hadoop and the Hadoop ecosystem, so we could get more powerful map/reduce capabilities;
  2. "Linear" hardware scale – make it easy for us to increase our service’s capacity with better / more hardware;
  3. Aggregate-on-write – eliminate the need to constantly iterate over our data set;
  4. Utilizing higher I/O – it’s difficult to get RavenDB to move any of its I/O to memory, hence why it’s so hard on disk;
  5. Fast setup time – need to be able to move quickly;
  6. Great ecosystem support – we don’t want to be the biggest company using whatever database we pick next.

The Candidates

Based on all of the above criteria, we narrowed down the field of contenders to the following:

  1. MongoDB
  2. Riak
  3. HBase
  4. Cassandra

Evaluation Process

The biggest factor to consider in our migration was time to deployment – how quickly could we move off of Raven and restore a high quality of service for our customers? We tested this in two phases:
  1. Learning curve of the database – how long would it take us to set up an actual cluster and a basic test schema?
  2. Acceptance test – how quickly could we recreate a median-difficulty query on any of these systems?
So we did this in phases, as a team – first up was HBase.

HBase

HBase was highly recommended to us by some of our friends on the analytics team at Hulu, so this was first on our list. HBase has a lot of attractive features and satisfied most of our technical requirements, save the most important one – time to deployment.

The fundamental problem with HBase is that cluster setup is difficult, particularly if you don’t have much JVM experience (we didn’t.) It also has a single point of failure (edit: turns out this hasn't been an issue since 0.9x,) is a memory hog, and has a lot of moving parts. That being said, HBase is a workhorse – it’s capable of handling immensely large workloads. Ultimately we decided that it was overkill for us at this stage in our company and the setup overhead was too expensive. We’ll likely revisit HBase at some point in the future though.

Riak

Riak

One of our advisors is a heavy Riak user, so we decided it was worth exploring. Riak, on the surface, is a very impressive database – it’s heinously easy to set up a cluster and the HTTP REST API made it possible for us to test it using only curl.

After getting an initial 4-node cluster setup and writing a couple of “hello world” applications, we decided that it was time to move onto phase 2: see how long it would take to port a real portion of our analytics engine over to Riak. I decided to use Node.JS for this since there’s great node drivers for both Raven and Riak and it was frankly a lot less work than C#. I should point out that CorrugatedIron is a decent C# driver for Riak though.

So, it took me about 6 hours to write the script to migrate a decent-sized data set into Riak – just enough to simulate a real query for a single MarkedUp app.

Once we had the data stuffed into our Riak cluster I wrote a simple map/reduce query using JavaScript and ran it – took 90 seconds to run a basic count query. Yeesh. And this map/reduce query even used key filtering and all of the other m/r best practices for Riak.

Turns out that map/reduce performance under the JavaScript VM is atrocious - a well-known issue in Riak. So, I tried a query using the embedded Erlang console using only standard modules - 50 seconds. Given the poor map/reduce performance and the fact that we'd all have to learn Erlang, Riak was out. Riak is a pretty impressive technology and it's easy to set up, but not good for our use case as is.

MongoDB

MongoDB

I’ve used MongoDB in production before and had good experiences with it. Mongo’s collections / document system is nearly identical to RavenDB, which gave it a massive leg up in terms of migration speed.

On top of that, Mongo has well-supported integration with Hadoop and its own aggregation framework.

Things were looking good for Mongo – I was able to use Node.JS to replicate the same query I used to test Riak and used the aggregation framework to get identical results within 3 hours of starting.

However, the issue with MongoDB was that it required us to re-aggregate all of our data regularly and introduced a lot of operational complexity for us. At small scale, it worked great, but under a live load it would be very difficult to manage Mongo’s performance, especially when adding new features to our analytics engine.

We didn’t write Mongo off, but we decided to take a look at Cassandra first before we made our decision.

Cassandra

Cassandra

We started studying Cassandra more closely when we were trying to determine if Basho had any future plans for Riak which included support for distributed counters.

Cassandra really impressed us from the get-go – it would require a lot more schema / data modeling than Riak or MongoDB, but its support for dynamic columns and distributed counters solved a major problem for us: being able to aggregate most statistics as they’re written, rather than aggregating them with map/reduce afterwards. On top of that, Cassandra’s slice predicate system gave us a constant-time lookup speed for reading time-series data back into all of our charts.
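The aggregate-on-write idea is simple enough to sketch independent of Cassandra: every incoming event increments a counter keyed by metric and time bucket at write time, so a dashboard read is a constant-time lookup rather than a map/reduce pass over raw events. A toy in-memory illustration of the shape (Cassandra's distributed counters do this durably, across a cluster):

using System;
using System.Collections.Concurrent;

// Toy illustration only - this just shows the read/write shape of the idea.
public class AggregateOnWriteStore
{
    // Key = (metric name, hour bucket); value = running count.
    private readonly ConcurrentDictionary<Tuple<string, DateTime>, long> _counters =
        new ConcurrentDictionary<Tuple<string, DateTime>, long>();

    // Writes do the aggregation: each event increments its bucket's counter.
    public void RecordEvent(string metric, DateTime timestamp)
    {
        var hour = new DateTime(timestamp.Year, timestamp.Month, timestamp.Day,
            timestamp.Hour, 0, 0);
        _counters.AddOrUpdate(Tuple.Create(metric, hour), 1, (key, count) => count + 1);
    }

    // Reads become constant-time lookups - no iteration over the raw data set.
    public long ReadCount(string metric, DateTime hourBucket)
    {
        long count;
        return _counters.TryGetValue(Tuple.Create(metric, hourBucket), out count)
            ? count
            : 0L;
    }
}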

But Cassandra didn’t have all of the answers – we still needed map/reduce for some queries (ones that can’t or shouldn’t be done with counters) and we also needed the ability to traverse the entire data set.

Enter DataStax Enterprise Edition – a professional Cassandra distribution which includes Hive, Hadoop, Solr, and OpsCenter for managing backups and cluster health. It eliminated a ton of setup overhead and complexity for us and dramatically shortened our timeline to going live.

Evaluating Long-Term Performance

Cassandra had MongoDB edged out on features, but we still needed to get a feel for Cassandra’s performance. eBay uses Cassandra for managing time-series data that is similar to ours (mobile device diagnostics) to the tune of 500 million events a day, so we were feeling optimistic.

Our performance assessment was a little unorthodox – after we had designed our schema for Cassandra we wrote a small C# driver using FluentCassandra and replayed a 100GB slice of our production data set (restored from backup on a new RavenDB XL4 EC2 machine with 16 cores, 64GB of RAM, and SSD storage) to the Cassandra cluster; this simulated four month’s worth of production data written to Cassandra in… a little under 24 hours.

We used DataStax OpsCenter to graph the CPU, Memory, I/O, and latency over all four of our writeable nodes over the entire migration. We set our write consistency to 1, which is what we use in production.

Here are some interesting benchmarks – all of our Cassandra servers are EC2 Large Ubuntu 12.04 LTS machines:

  1. During peak load, our cluster completed 422 write requests per second – all of these operations were large batch mutations with hundreds of rows / columns at once. We weren't bottlenecked by Cassandra though – we were bottlenecked by our read speed pulling data out of RavenDB.
  2. Cassandra achieved a max CPU utilization of 5%, with an average utilization of less than 1%.
  3. The amount of RAM consumed remained pretty much constant regardless of load, which tells me that our memory requirements never exceeded the pre-allocated buffer on any individual node (although we’ve spiked it since during large Hive jobs.)
  4. Cassandra replicated the contents of our 100GB RavenDB data set 3 times (replication factor of 3 is the standard) and our schema denormalized it heavily – despite both of those factors (which should contribute to data growth) Cassandra actually compressed our data set down to a slim 30GB, roughly a 10x reduction in our storage footprint! This is due to the fact that RavenDB saves its data as tokenized JSON documents, whereas everything is stored as byte arrays in Cassandra (layman's terms.)
  5. Maximum write latency for Cassandra was 70,731µs per operation, with an average write latency of 731µs. Under normal loads the average write latency is around 200µs.
Our performance testing tools ran out of gas long before Cassandra did. Based on our ongoing monitoring of Cassandra we’ve observed that our cluster is operating at less than 2% capacity under our production load.

We’ll see how that changes once we start driving up the amount of Hive queries we run on any given day.

We never bothered running this test with MongoDB – Cassandra already had a leg up feature-set wise and the performance improvements were so remarkably good that we just decided to move forward with a full migration shortly after reviewing the results.

Hive and Hadoop

The last major piece of our stack is our map/reduce engine, which is powered by Hive and Hadoop. Hadoop is notoriously slow, but that’s ok. We don’t serve live queries with it – we batch data periodically and use Hive to re-insert it back into Cassandra.

Hive is our tool of choice for most queries, because it’s an abstraction that feels intuitive to our entire team (lots of SQL experience) and is easy to extend and test on the fly. We’ve found it easy to tune and it integrates well with the rest of DataStax Enterprise Edition.

Conclusion

It’s important to think carefully about your data and your technology choices, and sometimes it can be difficult to do that in a data vacuum. Cassandra, Hive, and Hadoop ended up being the right tools for us at this stage, but we only arrived at that conclusion after actually doing live acceptance tests and performance tests.

Your mileage may vary, but feel free to ask us questions in the comments!

Helios 2.0 Development Diary 1 - Clean Slate


To my eternal shame, I've never blogged about one of the most important open source projects I'm involved in: Helios. Helios is for all intents and purposes a .NET port of Java's wildly successful Netty project, the reactive high-performance socket server that powers the internals of systems like Apache Cassandra and Typesafe's Akka project.

Helios has dutifully powered the remoting layer behind Akka.NET (my main OSS project) for the past couple of years, and is capable of being a powerful tool within the .NET distributed programming ecosystem in its own right.

Helios' Goals

Helios' job is to provide .NET developers with the following tools:

  1. An asynchronous, event-driven network application programming framework that works across both TCP and UDP, client or server;
  2. Message framing and buffering capabilities;
  3. An extensible, simple programming model; and
  4. Lots of other useful tools for working with sockets, such as serializers, loggers, throttling controls, and so forth.

The previous implementations of Helios (currently at version v1.4) have also included a number of other useful utilities such as circular buffer implementations and other types of useful collections.

However, Helios hasn't been able to provide .NET developers with the following thus far:

  1. Security options for socket communications using industry-standard protocols such as TLS and DTLS;
  2. A sufficiently flexible, intuitive programming model that can easily be picked up by a new user;
  3. Robust and predictable socket behavior under all circumstances (we currently have some race conditions when it comes to detecting malformed message frames;) and
  4. Extremely good performance out of the box.

I attribute most of these shortcomings to me having to learn the hard way how to work with sockets during a period of immense time pressure.

Helios 2.0: a Clean Slate

Thus I'm starting work on Helios 2.0 beginning by cleaning the slate. I deleted all code from the helios-2.0 branch that I wasn't 100% sure we would be using in Helios 2.0.

My goal for Helios is to have it truly become a C# port of Netty, leveraging the years of valuable production knowledge from Norman Maurer and the tens of thousands of other Netty users.

Looking for Contributors

I have a bit of baseline work that needs to be done on the core Helios 2.0 skeleton yet before it'll be actionable for other contributors to get involved, but you can start by reviewing the Helios 2.0 specs and forking / starring the Helios Github repository.

In the meantime, if you're interested in the project then subscribe to updates from this blog! I'll be regularly updating my development journal for Helios 2.0 as I work on it individually and as other contributors send pull requests too!


Helios 2.0 Development Diary 2 - Channels, Config, and the Curiously Recurring Template Pattern


Picking up where I left off in the previous Helios 2.0 diary entry... After clearing the decks of any code I wasn't 100% certain we'd be keeping, I began writing new code.

IChannel, IChannelConfig, and more

I immediately began by porting Netty's Channel interface to C# (IChannel interface,) as this defines the root element of work behind the Channel API in Helios.

The original Channel interface is chock-full of detailed, helpful comments written by the Netty team and I did my best to port most of those over onto the IChannel interface in Helios, but I got impatient and left a couple of them to be filled-in as TODOs (send me a PR!)

Next I got to work on porting all of the ChannelConfig options, which included porting the AbstractConstant and ChannelOption classes.

The Curiously Recurring Template Pattern

So I noticed something interesting when looking at the original ChannelOption Java classes in Netty...

ChannelOption.java

public final class ChannelOption<T> extends AbstractConstant<ChannelOption<T>> { /* ... */ }

Huh, the ChannelOption extends an AbstractConstant class but uses itself as a generic parameter... So I wonder what the deal with the AbstractConstant<T> class is...

AbstractConstant.java

public abstract class AbstractConstant<T extends AbstractConstant<T>> implements Constant<T> { /* ... */ }

So the AbstractConstant<T> takes a generic argument of type T, which is constrained to be of type... AbstractConstant<T>.

My braaaaaaaaaaaain

Ooooook. It hurt my brain to even figure out what this code is and why someone would write a generic constraint this way.

I don't know all of the intricacies of the JVM and all of the nuances of their type system, so my first thought in porting this code was to determine if it's even possible to implement this pattern in C#.

As it turns out: generic classes with self-referencing constraints are legal in C#.
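
To see what the pattern buys you, here's a minimal, hypothetical C# illustration (the SelfTyped and MyConstant names are mine, not Netty's): the constraint forces each subclass to name itself as T, which lets base-class members be typed as the concrete subclass.

public abstract class SelfTyped<T> where T : SelfTyped<T>
{
    // Returns this instance typed as the concrete subclass rather than the base class
    public T Self() { return (T)this; }
}

public sealed class MyConstant : SelfTyped<MyConstant> { }

// usage: no downcast needed at the call site
// MyConstant c = new MyConstant().Self();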

I took comfort in knowing that even the great Eric Lippert also experiences brain pain at trying to comprehend this code pattern:

But this really hurts my brain:

class Blah<T> where T : Blah<T>

That appears to be circular in (at least) two ways. Is this really legal?

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don't like it and I discourage its use.

This is a C# variation on what's called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

TL;DR; a bunch of evil C++ programmers, in a misguided effort to comply with the Liskov Substitution Principle, invented this smelly pattern which ended up not being able to enforce the LSP anyway.

The Ugliest Boxing

So I decided to design my C# ChannelOption class in Helios to avoid this pattern, mostly to help make the code more comprehensible.

/// <summary>
/// A <see cref="ChannelOption{T}"/> enables us to configure a <see cref="IChannelConfig"/> in a
/// type-safe way. Which <see cref="ChannelOption{T}"/> is supported depends on the actual implementation
/// of <see cref="IChannelConfig"/> and may depend on the nature of the transport it belongs to.
/// </summary>
public sealed class ChannelOption<T> : AbstractConstant
{
    public ChannelOption(int id, string name) : base(id, name, typeof(T)) { }

    /// <summary>
    /// Validate the value which is set for the <see cref="ChannelOption{T}"/>.
    /// </summary>
    /// <param name="value">The value that will be set for this option.</param>
    public void Validate(T value)
    {
        if (value == null) throw new ArgumentNullException("value");
    }

    #region Conversion

    // Cache of previously casted values, since template downcasting works a little differently in C#
    private static readonly ConcurrentDictionary<ChannelOption<object>, ChannelOption<T>> CastedValues =
        new ConcurrentDictionary<ChannelOption<object>, ChannelOption<T>>();

    public static implicit operator ChannelOption<T>(ChannelOption<object> obj)
    {
        return CastedValues.GetOrAdd(obj, new ChannelOption<T>(obj.Id, obj.Name));
    }

    #endregion
}

You'll notice the implicit cast between ChannelOption<object> and ChannelOption<T> - I had to implement this because the following is a perfectly valid cast in Java:

public final class ChannelOption<T> extends AbstractConstant<ChannelOption<T>> {

    private static final ConstantPool<ChannelOption<Object>> pool = new ConstantPool<ChannelOption<Object>>() {
        @Override
        protected ChannelOption<Object> newConstant(int id, String name) {
            return new ChannelOption<Object>(id, name);
        }
    };

    /**
     * Returns the {@link ChannelOption} of the specified name.
     */
    @SuppressWarnings("unchecked")
    public static <T> ChannelOption<T> valueOf(String name) {
        return (ChannelOption<T>) pool.valueOf(name);
    }

    // rest of the class
}

This cast (ChannelOption<T>) pool.valueOf(name); casts an object of type ChannelOption<object> into a ChannelOption<T> like it's no big deal. This cast isn't legal in C#, so in order to try to stick with Netty's design as close as I can I had to add implicit operator to ChannelOption<T> in C#.

Plus I had to write some evil-looking code inside DefaultChannelConfig like this, as the result of more usages of this weird cast in the original Java code:

public T GetOption<T>(ChannelOption<T> option)
{
    if (option == null) throw new ArgumentNullException("option");
    if (option == ChannelOption.CONNECT_TIMEOUT_MILLIS) return (T)(object)ConnectTimeoutMillis;
    if (option == ChannelOption.MAX_MESSAGES_PER_READ) return (T)(object)MaxMessagesPerRead;
    if (option == ChannelOption.WRITE_SPIN_COUNT) return (T)(object)WriteSpinCount;
    if (option == ChannelOption.ALLOCATOR) return (T)(object)Allocator;
    // TODO: RCVBUF_ALLOCATOR
    if (option == ChannelOption.AUTO_READ) return (T)(object)AutoRead;
    if (option == ChannelOption.WRITE_BUFFER_HIGH_WATER_MARK) return (T)(object)WriteBufferHighWaterMark;
    if (option == ChannelOption.WRITE_BUFFER_LOW_WATER_MARK) return (T)(object)WriteBufferLowWaterMark;
    // TODO: MESSAGE_SIZE_ESTIMATOR
    return default(T);
}

For clarification: the above is the C# code I wrote.

Take a closer look at this: return (T)(object)ConnectTimeoutMillis;

In this code we are:

  1. Casting an int to an object via boxing;
  2. Casting that object to type T, an int, via unboxing (see the sketch below).
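
Here's that round trip in isolation - a minimal sketch (variable names are mine):

int connectTimeoutMillis = 3000;
object boxed = connectTimeoutMillis;  // boxing: heap allocation
int unboxed = (int)boxed;             // unboxing: back to the value type
// generically, with T == int: return (T)(object)connectTimeoutMillis;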

No God! Please No!

Oh Java, you and your type system. You so crazy.

So yeah, that's some of the smelliest statically typed code I've ever written.

Thankfully, it's infrequently used even within Netty (most of these configuration properties are accessed via strongly typed getters and setters) so it doesn't pose any major performance risks.

Onto the next challenge!

Real-time Marketing Automation with Distributed Actor Systems and Akka.NET


MarkedUp Logo

This is an archive of a blog post I wrote for the MarkedUp Analytics blog on July 23rd, 2014. It's been a popular post and I'm posting it here in order to preserve it. I shut down MarkedUp in November, 2014.

The MarkedUp team has had experience developing SDKs, high-performance APIs, working with Cassandra / Hadoop, and service-oriented software for several years. However, when we started laying out the blueprint for MarkedUp In-app Marketing we knew that it would require a radically different set of tools than the traditional stateless HTTP APIs we’d been used to developing for some time.

Some Background

Before we dive into the guts of how the system works, it’s more important to understand what the MarkedUp team is trying to achieve and why.

Most marketing automation software is borderline terrible, ineffective, or totally onerous to use. This is largely because marketing automation is an afterthought and is hardly ever integrated into systems that software companies regularly use, such as product analytics.

We wanted to build a product that made it extremely easy to:

  1. Quickly and successfully set up drip campaigns of messages, especially for users who are non-technical;
  2. Allow customers to leverage existing MarkedUp Analytics events and data;
  3. Target and segment users based on behavior AND demographics; and
  4. Measure results quickly.

Our Analytics services have a reputation for being tremendously easy to use compared to the alternatives – we wanted to create a similarly good experience for in-app marketing automation.

With that background in mind, now we can talk about how In-app Marketing works.

How In-app Marketing Works

We included the following diagram in our public launch of MarkedUp In-app Marketing, and it’s a helpful launch point for discussing the technology behind it.

How MarkedUp In-app Marketing Works

The process for working with MarkedUp In-app Marketing looks like this, from the point of view of one of our users:

  1. Integrate the MarkedUp SDK and define events inside your application;
  2. Release your app with MarkedUp successfully integrated to your users and begin collecting data;
  3. Sign up for MarkedUp In-app Marketing and define campaigns of messages that are delivered based on an end-user behavior;
  4. MarkedUp In-app Marketing immediately begins filtering users based off of the behavior segments you defined and automatically subscribes them to any eligible campaigns.

And here’s an example of an actual behavior we can target:

"Select all users from the United States and Canada who installed after 6/1/2014, have run the app within the past two days, have had event foo, had event bar, and have viewed page Buy."

In this case events "foo" and "bar" are some arbitrary events that are specific to a MarkedUp customer’s application – they can mean anything.
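
To make that targeting rule concrete, here's roughly what it would look like written as a plain C# predicate - purely a sketch on my part (the UserProfile type and its members are illustrative; as described below, the real system evaluates these rules incrementally with state machines rather than as a single function):

private static bool MatchesCampaign(UserProfile u)
{
    return (u.Country == "US" || u.Country == "CA")
        && u.InstallDate >= new DateTime(2014, 6, 1)      // installed after 6/1/2014
        && u.LastRunUtc >= DateTime.UtcNow.AddDays(-2)    // active within two days
        && u.HasEvent("foo")
        && u.HasEvent("bar")
        && u.HasViewedPage("Buy");
}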

Technical Challenges

So what’s challenging about building this product? Let’s put this in list form:

  1. [Real-time] Customers get the best results on their messages when they’re delivered immediately after a user is "subscribed" to a campaign– therefore, our segmentation system for filtering subscribers must be capable of making decisions within seconds of receiving critical events. This eliminates traditional batch processing via Hive / Hadoop queries as a feasible option – we’re going to have to process data as streams instead.
  2. [Stateful] Overwhelmingly, most campaigns customers define will require us to observe multiple events per user– this means that we have to maintain state that is incrementally completed for each user over multiple successive events. Given that this needs to be done in real-time we’re not going to read/write from a database on every request – state will probably have to be kept in-memory somewhere.
  3. [Highly Available] The system must be capable of supporting millions of parallel data streams and must be able to recover from failures– we need to have a way of throwing hardware at the problem when demand on the system increases (happens suddenly when it does) and we need to be able to recover from software and hardware failures quickly.
  4. [Remoting] The targeting system needs to be able to use the data-stream from our API in a loosely coupled fashion– we need some way of quickly sharing the fire hose of data that our API servers collect, and to do it in a way that is loosely coupled enough that a downed in-app marketing server won’t prevent the API from serving its primary function: storing customers’ data.

The real-time component is what makes this entire project a challenge – and no, it is not optional. It’s a real competitive advantage and essential to the "it just works" promise we make to our customers. We’re in the business of delivering on the promise of a "better experience than everything else" at MarkedUp.

So how in the hell were we going to build a system that was highly-available, stateful, loosely-coupled, and able to respond in real-time?

Solution: Actor Model

As with 99% of challenging technical problems faced by today’s software developers, a solution was already invented in the early 1970s: the Actor model.

The Actor model’s premise is that every component of your system is an "actor" and all actors communicate with each other by passing immutable messages to each other. Each actor has a unique address inside an actor system, even across multiple physical computers, so it’s possible to route a message to one specific actor. Actors are also composed in hierarchies, and parent actors are responsible for supervising the child actors one level beneath them – if a child actor crashes suddenly the parent actor can make a decision about how to proceed next.

The actor model appealed to us for the following reasons:

  1. Actors are cheap – you can have 10s of millions of them with minimal overhead. This means that we can have an actor to track every user / campaign tuple, of which there might be millions. This localizes the "filtering" problem nicely for us – we can define a filter that operates at the per-user level, so it’s a tiny piece of code. Sure, there might be millions of these filters running at once – but the code is highly atomized into small pieces.
  2. Remoting and addressing between actor systems makes it easy to route data for each user to a specific location– using a technique like consistent hash routing (sketched after this list), we can push all of the state for each individual user to the same location in memory even across a large cluster of machines and do it in a way that avoids load-balancing hotspots.
  3. Actors only process one message in their inbox at a time– therefore it’s really easy for us to process streams for individual users, since it will be done serially. This allows us to process data streams for each individual user and each in-app marketing campaign with a simple Finite State Machine.
  4. Actor hierarchies and supervision allow our software to be self-healing and highly available– the supervision aspect of actor hierarchies is immensely powerful. It allows us to make local decisions about what to do in the event of failure, and we can simply "reboot" part of our actor system automatically if something goes wrong. In the event of hardware failure, we can re-route around an unavailable node and redistribute the work accordingly.
  5. Actor model offers a fantastically simple API for highly concurrent computing, which is exactly what we need. We’re handling thousands of parallel events for hundreds of different apps running on millions of different devices – our incoming data stream is already inherently concurrent. Being able to manage this workload in a stateful way is challenging, but the Actor model exposes a simple API that eliminates the need for us to worry about threads, locks, and the usual synchronization concerns.
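
As promised above, here's a sketch of point #2's consistent hash routing in Akka.NET (the UserWorker actor and IUserMessage interface are illustrative):

using Akka.Actor;
using Akka.Routing;

// given an existing ActorSystem `system`:
// route every message for the same user id to the same routee, so that
// user's in-memory state always lives in one place across the pool
var router = system.ActorOf(Props.Create<UserWorker>()
    .WithRouter(new ConsistentHashingPool(10)
        .WithHashMapping(msg => ((IUserMessage)msg).UserId)));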

There are certainly other ways we could have solved this problem and we evaluated them, but we were ultimately sold on the Actor model because of its simplicity relative to the others.

Distributed Actor Systems in .NET with Akka.NET

We teamed up with some other like-minded folks to develop Akka.NET – a distributed actor framework for .NET that closely follows Scala’s Akka project.

Akka.NET offers all of the important features of the Actor model in C# and F#, and a number of critical features that were essential to building MarkedUp In-app Marketing:

  1. A hefty collection of built-in router actors, such as the RoundRobinRouter and the ConsistentHashRouter – both of which include the ability to automatically scale up or down on-demand if needed (via the Resizer function.)
  2. Out of the box Finite State Machine actors, which are exactly what we need for segmenting users.
  3. Robust actor supervision and message scheduling APIs, which we use for self-terminating and remote-terminated actors.
  4. Remoting capabilities for distributing actor workloads across multiple physical systems.
  5. Highly extensible logging and actor system configuration APIs.
  6. And some pretty insane performance benchmarks (21 million messages per second) – bottom line is that the overhead of the Actor system itself probably isn’t going to be an issue for us.

We settled on Akka.NET as our framework and used it as the building blocks for the back-end of MarkedUp In-app Marketing.

MarkedUp In-app Marketing Network Topology

We’ve left out some details of our service-oriented architecture above, but the network topology shown above covers the In-app Marketing product in its entirety.

MarkedUp has two public services exposed directly to end-users via an HTTP API:

  1. The "MarkedUp API"– which is what our analytics SDKs communicate with inside your applications; it handles millions of HTTP requests per day and does 100m+ database writes per day. Most of those writes are counter updates which are inexpensive, but the bottom line is that there’s a lot of activity going on inside this API.
  2. The Win32 Mailbox Service– a brand new service that we released as part of our In-app Marketing launch for Windows Desktop applications; all of our Win32 messaging clients work via HTTP polling since there’s a number of tricky permissions issues related to keeping an open socket in the background on each version of Windows (a subject for a separate blog-post.) This is the endpoint these clients use to check for available messages.

The goal of our Targeting System is to take the streams of data directly from the MarkedUp API servers and populate mailbox messages for individual users in accordance with the app developer’s filtering rules, and we use Akka.NET to do this.

Filtering Messages, Users, and Campaigns with Actors, State Machines, and Routing

Success for MarkedUp In-app Marketing’s Targeting System is defined as "being able to subscribe a user into one or more campaigns within seconds of receiving the segmentation data specified by the customer" for N concurrent users per server, where N is a coefficient determined largely by the size of the hardware and number of potential campaigns per-user, which varies for each one of our customer’s apps.

Our product is designed to filter messages for specific users for campaigns that are specific to a customer’s app, so we reflected these relationships in our actor hierarchy.

Markedup IAM remote routers

Data arrives to the Targeting System from the MarkedUp API via Akka.NET’s built-in remoting over TCP sockets, and we’ll get into the details in a moment. For the time being, all you need to know is that the data is routed to our API Router, a round-robin pool router actor that specializes in concurrently load-balancing requests across a number of worker actors (API Router Agents) who actually respond to the requests.

The number of workers that exist at any given time can be hard-coded to a value of N workers, or it can be resizable based on load depending on how you configure the router.
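
Roughly, both flavors look like this in Akka.NET - a sketch (the ApiWorkerActor name is illustrative, and exact constructor overloads may vary by version):

using Akka.Actor;
using Akka.Routing;

// given an existing ActorSystem `system`:

// Fixed: always five workers behind the round-robin router
var fixedRouter = system.ActorOf(
    Props.Create<ApiWorkerActor>().WithRouter(new RoundRobinPool(5)));

// Elastic: a Resizer grows or shrinks the pool between 1 and 10 workers
var elasticRouter = system.ActorOf(
    Props.Create<ApiWorkerActor>().WithRouter(
        new RoundRobinPool(5, new DefaultResizer(1, 10))));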

Each API Router Agent is responsible for doing one thing: making sure that data for a specific user makes it to that user’s actor. Here’s what that process looks like:

Markedup IAM actor hierarchy

And here’s a rough idea of what the source code looks like:

public class MarkedUpApiRouterActor : UntypedActor
{
    private ActorSelection _appIndexActor;

    protected override void PreStart()
    {
        _appIndexActor = Context.ActorSelection(ActorNames.MarkedUpAppMasterActor.Path);
    }

    protected override void OnReceive(object message)
    {
        PatternMatch.Match(message)
            .With<IMKSession>(m => ForwardToRequestActor(m.AppId, m))
            .With<IUser>(m => ForwardToRequestActor(m.AppId, m))
            .With<ISessionEvent>(m => ForwardToRequestActor(m.AppId, m))
            .With<IMKLogMessage>(m => ForwardToRequestActor(m.AppId, m))
            .With<ICommercialTransaction>(m => ForwardToRequestActor(m.AppId, m))
            .Default(Unhandled);
    }

    private void ForwardToRequestActor<T>(string appId, T message)
    {
        _appIndexActor.Tell(new ForwardOntoApp<T>(appId, message));
    }
}

The PatternMatch.Match method is used to filter messages based off of their C# type – any messages that we don’t match are "unhandled" and logged. In Akka.NET, all messages are just objects.

In terms of where we’re sending messages, we have a fixed address scheme inside our in-app marketing product that makes it easy for us to locate individual users, campaigns, and apps. Suppose we have an app called "App1" and a user called "UserA" – we use Akka.NET’s built-in address scheme to make it really easy to determine if this user already exists.

Every single actor inside Akka.NET has a unique address – expressed via an ActorUri and ActorPath, like this:

akka.tcp://<hostname>:<port>@<actor-system-name>/user/{parent}/{child}

When you’re routing messages within the in-process actor system all you really care about is the ActorPath – the /user/parent/child part.

Markedup IAM user actor hierarchy

We constructed our actor hierarchy to include a single App Master actor (/user/apps), responsible for supervising every "App Actor" for our customer’s apps (/user/apps/{customer’s app ID}) – and every single user we ever observe inside MarkedUp is always associated with an app, so every App Actor supervises one or more "User Actors" who can be found at /user/apps/{customer’s app ID}/{userId}.

If App Master shuts down or restarts, all of its child actors shut down or restart with it – if a child actor dies on its own, it’s up to the App Master to decide what to do next. This is the essence of how actor supervision works.
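
In code, that parental decision-making is expressed as a supervision strategy. Here's a hedged sketch of what one might look like (the TransientException type is illustrative):

protected override SupervisorStrategy SupervisorStrategy()
{
    return new OneForOneStrategy(
        10,                       // give a failing child up to 10 retries...
        TimeSpan.FromSeconds(30), // ...within a 30-second window
        ex => ex is TransientException
            ? Directive.Restart   // "reboot" just the failed child
            : Directive.Stop);    // otherwise give up on that child
}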

So the API Router Agent forwards the message to App Master which kicks off a process of lazily creating App and User actors on-demand, but eventually the messages do arrive inside the inbox of the User Actor.

The User Actor implements an Akka.NET Finite State Machine to determine which campaigns this user should start filtering for – this is determined by (1) which, if any, campaigns are available for this app and (2) which campaigns this user has already been subscribed to.

Here’s what the User Actor’s initial state looks like in C#:

When(UserState.Initializing, fsmEvent =>
{
    State<UserState, UserStateData> nextState = null;
    fsmEvent.FsmEvent.Match()
        .With<RecycleIfIdle>(recycle =>
        {
            nextState = IsIdle() ? Stop(new Normal()) : Stay();
        })
        .With<UserLoadResponse>(load =>
        {
            SetLastReceive();
            nextState = DetermineUserLoadedState(load);
        })
        .With<AddCampaign>(add =>
        {
            SetLastReceive();
            nextState = HandleAddCampaign(add.CampaignId);
        })
        .With<RemoveCampaign>(remove =>
        {
            SetLastReceive();
            nextState = HandleRemoveCampaign(remove.CampaignId);
        })
        .Default(m =>
        {
            SetLastReceive();
            CurrentStash.Stash();
            nextState = Stay();
        });
    return nextState;
});

After a User Actor has determined that it’s eligible to be subscribed into at least 1 additional campaign, it’ll change its state into a "Ready" state where it begins sending messages to the "Campaign State Actors" responsible for filtering the rules for every possible campaign this user can belong to.

The User Actor has three jobs:

  1. Determine which campaigns a specific app user could possibly be subscribed to;
  2. Serialize all of the data for this user into a linear sequence, based on when the message arrived, and hand this data over to the Campaign State Actors for filtering; and
  3. Automatically shut down the User Actor if the user stops being active.

Items #1 and #2 are pretty generic, but item #3 is more interesting – how do we determine that an app user is no longer using their application?

We do this by setting a "Receive Time" that marks the UTC time a user last received a message from our API:

SetLastReceive();
SendToFilter(m);
nextState = Stay();

The SetLastReceive function sets this time value, and then we have a timer using the built-in scheduler in Akka.NET’s FSM that checks whether or not this user is still active once every 60 seconds:

StartWith(UserState.Initializing, new InitialUserStateData(_appId, _userId));
SetTimer(RecycleIfIdleTimerName, new RecycleIfIdle(), TimeSpan.FromMinutes(1), true);

If we receive a "RecycleIfIdle" message from this timer and the user is determined to be idle:

private bool IsIdle()
{
    return Math.Abs((DateTime.UtcNow - _lastReceivedMessage).TotalSeconds) >= SystemConstants.MaxUserIdleTime.TotalSeconds;
}

Then the User Actor stops itself and all of the Campaign State actors beneath it. This is how we free up memory and resources for future actors.

The Campaign State actor itself is another FSM and it communicates with a group of dedicated "Filter Actors" who process the campaign’s rules via a domain-specific language we invented for filtering. All the Campaign State actor does is process the results from the Filter Actors and save its state to Cassandra or send messages to the user if the user’s event stream satisfies all of the requirements for a campaign.

Communication between Remote Actor Systems with Akka.Remote

The MarkedUp API and the Targeting System communicate with each other via Akka’s Remoting features using a TCP socket and Google’s protocol buffers, and the MarkedUp API uses the ActorUri and ActorPath convention I showed you earlier to ensure that these messages are routed directly to the API Router Actor on the Targeting System.

public class MessagePublishingActor : UntypedActor
{
    private Config _routerConfig;
    private ActorRef _router;

    protected override void PreStart()
    {
        var config = @"routees.paths = [
            ""akka.tcp://markedup-notifications@$REPLACE$/user/api""
        ]";
        var notificationsEndpoint = ConfigurationManager.AppSettings["NotificationsEndpoint"] ?? "127.0.0.1:9991";
        config = config.Replace("$REPLACE$", notificationsEndpoint);
        _routerConfig = ConfigurationFactory.ParseString(config);
        _router = Context.ActorOf(Props.Empty.WithRouter(new RoundRobinGroup(_routerConfig)));
    }

    protected override void OnReceive(object message)
    {
        _router.Tell(message);
    }
}

The MessagePublishingActor lives inside the MarkedUp API and uses a RoundRobinGroup router to communicate with specific, named Actors on a remote system.

In production we can have several Targeting Systems serving as routees and we use a ConsistentHash router to make sure that all messages for the same user always arrive at the same server, but for the sake of brevity I rewrote this to use a single server and a RoundRobinGroup router.

A RoundRobinGroup is different from a RoundRobinPool in that the RoundRobinGroup doesn’t directly supervise or manage its routees – it forwards messages to actors that are pre-created, whereas the RoundRobinPool creates and manages the worker actors itself.
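
The difference in a nutshell - a sketch, with an illustrative Worker actor:

// given an existing ActorSystem `system`:

// Pool: the router creates and supervises its own two Worker routees
var pool = system.ActorOf(
    Props.Create<Worker>().WithRouter(new RoundRobinPool(2)));

// Group: the router merely forwards to actors you created yourself
var w1 = system.ActorOf<Worker>("w1");
var w2 = system.ActorOf<Worker>("w2");
var group = system.ActorOf(
    Props.Empty.WithRouter(new RoundRobinGroup("/user/w1", "/user/w2")));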

The important part, however, is the addressing – using the /user/api convention, which is the valid ActorPath for the API Router Actor on the Targeting System, Akka will automatically route my messages from the API server to the Targeting System via TCP, and then Akka’s remoting internals will ensure that these messages are correctly routed to this actor.

As for the messages themselves, and this is important – both the MarkedUp API and the Targeting System share a common assembly that defines all of the message types that can be exchanged over the network. Otherwise we couldn’t deserialize any of those messages at the other end of the network connection.
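
A sketch of what such a shared contract might look like (the members shown here are illustrative, not MarkedUp's actual schema):

using System;

// Lives in a shared assembly referenced by both the API and the
// Targeting System, so both sides can serialize and deserialize it
public interface ISessionEvent
{
    string AppId { get; }
    string UserId { get; }
    DateTime TimestampUtc { get; }
}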

Wrapping Up

Akka.NET has been a boon to our productivity, because of how simple its programming model is – instead of writing a piece of code that tries to determine campaign eligibility for 10,000 users in parallel, we can write a piece of code that makes that determination for a single user and run 10,000 instances of it with minimal overhead.

The actor model does an excellent job of atomizing code into very small parts, particularly because actors can only process one message in their inbox at a time (except for router actors.) The serial nature of message processing inherently makes everything inside an actor thread-safe, so you can store local state inside each actor instead of having to rely on a big synchronized cache or polling a distributed cache in Redis / Memcached.

We’ll have some more posts in the future about some of the other cool stuff we’re using Actors for and some of the integrations we’ve set up, such as our live debugger using SignalR and Akka.NET’s logging capabilities.

MarkedUp In-App Marketing Demo (Post-mortem)

I preserved the original MarkedUp In-App Marketing Demo on my personal YouTube channel, so you can see what this system actually did!

Visual Studio ProTip: Copying Binaries on Pre and Post-Build Macros


Last year I had to spend a fair amount of time working on C and C++ projects in Visual Studio 2013, and one of the tasks that I had to learn how to do was use Visual Studio's pre-build and post-build events to copy all of my dependent DLLs into the final output folder for my applications.

In C# we take the ability to do this for granted - if we need to add a reference to another project in our Visual Studio solution we just Add Reference --> Projects --> [Project Name] and boom, we're all set. Visual Studio will automatically copy our dependent binaries into our final /bin/[Release|Debug] folder automatically.

In C/C++ - hellllllllll no. You have to do this all yourself!

And as it turns out, there's lots of cases where you might need to do this yourself inside C# projects too. If your application uses run-time loading (Assembly.Load) or if you need to be able to call Process.Start to launch a custom application that you've written, then you might want to be able to easily debug your application by copying the dependent, non-reference binaries you need into your application's bin folder.
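
Both of those techniques resolve files relative to your application's base directory, which is why the copying matters. A quick, hypothetical sketch (the file names are illustrative):

using System;
using System.Diagnostics;
using System.IO;
using System.Reflection;

var binDir = AppDomain.CurrentDomain.BaseDirectory;

// run-time loading: the DLL must already be sitting in the bin folder
var tests = Assembly.LoadFrom(Path.Combine(binDir, "SomeTests.dll"));

// launching a dependent executable that was copied alongside our app
Process.Start(Path.Combine(binDir, "SomeDependentTool.exe"));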

I'm going to show you how to do that!

Using Pre-Build and Post-Build Visual Studio Macros

During the course of some routine work on Akka.NET today, I needed a way to test our MultiNodeTestRunner.exe - a custom unit test runner we developed to help test Akka.NET's high-availability modules and user-defined, distributed applications built on top of Akka.NET.

This test runner needs to be able to do both assembly run-time loading AND launch third-party processes, so it's a perfect example of a real-world application that can utilize pre and post-build macros for easy debugging.

So in Visual Studio, I have the following two projects inside the Akka.NET solution:

  • Akka.MultiNodeTestRunner - a project that defines the MultiNodeTestRunner.exe, used to run all MultiNodeSpecs.
  • Akka.Cluster.Tests.MultiNode - a .DLL that contains a bunch of MultiNodeSpec tests that will be run by the MultiNodeTestRunner.exe.

We're going to copy the output binaries from Akka.Cluster.Tests.MultiNode INTO the bin folder for Akka.MultiNodeTestRunner, so we can run all of the tests using the same working directory as the MultiNodeTestRunner.exe itself.

Adding a Pre-Build Macro

To add a pre-build macro for a Visual Studio project, right click on the project (in this case, the Akka.MultiNodeTestRunner project) and view Properties, then click on Build Events.

Blank pre-build event in project view in Visual Studio

Next, we're going to use some good-ole MS-DOS commands to XCOPY all of the binaries from Akka.Cluster.Tests.MultiNode into the bin directory of Akka.MultiNodeTestRunner. And we're using MS-DOS commands because PowerShell is for script kiddies, right! Actually... I have no idea if Visual Studio supports PowerShell :p

echo PREBUILDSTEP for $(ProjectName)

echo Copying files from $(SolutionDir)core\Akka.Cluster.Tests.MultiNode\$(OutDir) to $(ProjectDir)bin\$(Configuration)\Akka.Cluster

if not exist "$(ProjectDir)bin\$(Configuration)\Akka.Cluster" mkdir "$(ProjectDir)bin\$(Configuration)\Akka.Cluster"

xcopy "$(SolutionDir)core\Akka.Cluster.Tests.MultiNode\$(OutDir)*.dll" "$(ProjectDir)bin\$(Configuration)\Akka.Cluster" /i /d /y
if errorlevel 1 goto BuildEventFailed

REM Exit properly because the build will not fail 
REM unless the final step exits with an error code

goto BuildEventOK
:BuildEventFailed
echo PREBUILDSTEP for $(ProjectName) FAILED
exit 1
:BuildEventOK
echo PREBUILDSTEP for $(ProjectName) COMPLETED OK

This batch file uses XCOPY with wild-card matching to copy all binaries from Akka.Cluster.Tests.MultiNode\bin\[Build Configuration] to Akka.MultiNodeTestRunner\bin\[Build Configuration]\Akka.Cluster, and we need to paste this into the pre-build event under the Debug configuration to make sure that it runs.

Populated pre-build event in project view in Visual Studio

Now we just need to kick off a build, and we should see all of these echo statements log their output into the Output window in Visual Studio.

Visual Studio pre-build macro logs in Output view

All of our output was successfully copied, and we can verify that by looking for the Akka.Cluster folder inside Akka.MultiNodeTestRunner\bin\[Build Configuration].

Copied build output

So with all of the binaries in their proper place, we're almost ready to debug! Last thing we need to do is pass some command-line arguments to the MultiNodeTestRunner.exe via the Debug view in the Project Properties dialog.

Debug settings

And if we run the Akka.MultiNodeTestRunner project, voilà! We can observe it successfully loading the Akka.Cluster.Tests.MultiNode.dll and executing the tests!

Application output

You can use this technique in a lot of different cases. Macros are eternal!

Akka.NET Request for Contributors: Akka.Cluster


Akka.NET - Distributed Actor Model for .NET

The next major milestone for Akka.NET is Akka.NET v1.1, the primary focus of which is a stable, production-ready release of the Akka.Cluster module.

We've made some solid progress towards that goal in some of our maintenance releases, but I came to the realization over the past couple of months that the way we've been working with our contributors isn't the most effective way for the project to achieve these milestones.

In this post I'm going to invite you, a .NET developer who's interested in distributed computing, to work with me directly on the guts of Akka.Cluster and take on a project that's bigger than you or me. And it'll be awesome - you'll learn a lot and make a real difference.

But in the meantime, let's get some background for context.

What does Akka.Cluster do and why is it important?

Akka.Cluster is the foundation of Akka.NET's high-availability toolchain; it's what allows you to build elastic, distributed, fault-tolerant peer-to-peer networks of Akka.NET applications that don't have any single point of failure or bottleneck.

My Petabridge co-founder Andrew explained it well in Akka.NET: Introduction to Clustered Applications w/ Akka.Cluster (21:32), embedded below:

TL;DR; Akka.Cluster is what makes it feasible for .NET developers to build applications that are capable of being highly available and stateful in a way that's largely unheard of in the .NET ecosystem. It's capable of truly amazing stuff - just try out the WebCrawler Akka.Cluster demo we wrote and watch what happens when you spin up multiple Crawler nodes in the middle of a job.

All of the cool stuff depends on Akka.Cluster

Akka.Cluster is pretty cool in and of itself, but all of the really exciting modules depend on it. For example:

  • Akka.Cluster.Sharding - automatically persist application state to a durable store and maintain in-memory replicas of that state across the cluster; includes the ability to execute partition handoffs and other fun stuff.
  • Akka.Cluster.Tools - gives developers the ability to extend the EventBus to work across the entire cluster, rather than just within one ActorSystem; allows you to specify cluster "singletons" that when killed will be recreated elsewhere in the cluster; and also allows for the ClusterClient, which enables you to create read-only clients for Akka.Cluster clusters that don't actually participate in the cluster themselves as members.
  • Akka.Cluster.Metrics - extend Akka.Cluster's built-in Gossip mechanism to include MetricsGossip, information about the CPU and memory utilization of each node in the cluster, and allows you to load-balance work distribution across the cluster based on those metrics. Works great in combination with auto-scaling.
  • Akka.Cluster.DData - Distributed Data, an experimental module that makes it really easy to share data across Akka.Cluster nodes using Conflict-free Replicated Data Types (CRDTs).
  • Akka.Streams - yeah, you can technically use it without Akka.Cluster, but for 80% of real-world applications this capability is really meant to be used in conjunction with a distributed Akka.NET cluster.

Want those cool toys? Then let's ship Akka.Cluster first.

The current state of Akka.Cluster

Akka.Cluster has existed in some form since September, 2014 - so it's been around for about a year and users are running it under serious production workloads. The library has been mostly code complete for a while.

However, there's one major problem with Akka.Cluster: me. I've been the bottleneck on Akka.Cluster for most of its lifespan.

Historically I've created the impression that I have it all under control with Akka.Cluster, but the truth is that it's a project that's bigger than any one person. Today, if I have to take time away from Akka.Cluster to work on projects for Petabridge, speak at conferences, or whatever, then all progress on Akka.Cluster comes to a halt. That sucks and it's my fault.

So let's create a possibility where lots of people understand how the guts of Akka.Cluster work and are motivated to improve upon it and share the knowledge of how it works with others. That's what this is really all about.

I need your help

I have to come clean here - I've been reluctant to really ask for help on Akka.Cluster for a long time, because:

  • I believed that I had the bandwidth to do it all myself (how hard could it be?) and
  • I believed everyone else when they told me "I don't know if I know enough to work on that."

I don't believe either of those anymore. First and foremost, Akka.Cluster is a module that's too important to be the responsibility of just one contributor. If something happened to me then someone else would have to step up to take over that part of the project. Rather than have it come to that, I would rather coach some capable developers beginning this week on how that system works so others can always contribute and lead there.

Redundancy is just as important for people as it is for servers. Plus: I'm not perfect. I fuck stuff up all the time. And that's ok!

As for the second point: Akka.Cluster is not hard, it's just different - based on distributed programming concepts that are unfamiliar to most developers, but totally transformational once understood.

So here it is: if you're interested in distributed systems, .NET, and open source then I would like you to become a contributor with me on Akka.Cluster. I'm still going to be focusing most of my contributions in that area, and I'd like to do it alongside a team of contributors.

What needs to be done

The burden of this module is its high quality assurance requirements. Writing multi-node specs; ensuring that the MultiNode TestKit's behavior is correct; beating racy and inconsistent behavior out of the system; and thoroughly documenting the expected behavior of a cluster under lots of different scenarios.

Here's what needs to be done to complete our work on Akka.NET v1.1:

How to get involved

The first thing to do if you want to get involved is to fill out this interest form below!

We're asking people to fill this out so I can do the following to help you help us:

  • Schedule regular calls where we can coordinate on problems and work;
  • Train you on the internals and distributed programming concepts that power Akka.NET; and
  • Teach you how other parts of the Akka.NET development process work, such as our build system and the multi-node test runner.

I'm committed to doing that for you - and, frankly, pretty excited about it. I think this will be something huge.

Once you've filled this out, check out the outstanding issues for Akka.NET v1.1 on our Waffle board (in the "For Next Release" tab) and hop into the Akka.NET Gitter chat and introduce yourself if you haven't already! Looking forward to working with you!

Developers Who Can Build Things from Scratch


There's lots of different types of developers you're going to need to work with over the span of your career in the software business, but the one I want to talk about today is the kind you need when you're trying to build something new.

Finding a developer who can transform a set of ideas into something tangible is hard - they're out there, but it takes more than just knowing how to code. I gave a talk about startup product development (slides) to some entrepreneurs on Friday and none of the folks in the audience had deep technical backgrounds (no engineers, in other words.)

All of them had ideas for products they wanted to validate and were looking for ways on how to find the right types of engineers to work with early on.

A developer who can take a blank sheet of paper and turn it into a functioning product, under their own direction, is rare - they possess hard technical skills, excel at explaining technical concepts to non-technical team members, can work with abstract requirements, anticipate future changes and revisions on their own, and adapt on the fly.

What follows is the advice I give to entrepreneurs with non-technical backgrounds on how to recognize these types of developers - I'm going to call them "from scratch" developers.

Not attached to a particular way of doing things

An engineer who can build products from scratch is one who's not attached to a specific set of technologies or a specific way of building things. They'll objectively choose the right tools for the job depending on what it is.

In other words, you're not going to want any pet technologists or futurists from my "Taxonomy of Terrible Programmers."

Those are red flags right off of the bat - any engineer who's more focused on how they're building the product than what they're actually building lacks the maturity to do this job. They'd need to be managed by another engineer in order to provide the best ROI, so pass over these types for now.

Corollary: actively suggests using off-the-shelf software

A corollary to this property: if a developer actively suggests using pieces of off-the-shelf software, even when they have licensing costs associated with them, that's often a good indicator of actual go-to-market experience and an ability to focus on what's important: delivering the product.

There's a middle ground to this though - you have to make sure the developer isn't overly dependent on a specific piece of infrastructure in order to do their job.

For instance, if I as a .NET developer wasn't able to use Windows Azure because the product had a data sovereignty or compliance requirement that prevented me from being able to use Azure - I'd need to be able to adapt to a new hosting environment in order to be the right developer for the job. Which brings us to the next requirement...

Not derailed by new requirements or new things to learn

Building something from scratch is always going to involve new ways of doing things, because every product is different at some level. It also inherently involves learning and changing things on the fly - because you will always discover new requirements after you actually start getting into the meat of trying to implement a product. Anyone who tells you otherwise should read up on the Nirvana fallacy.

So any developer who can do the job must be able to learn new things on the fly. For instance, if you're able to do some level of customer and product development in parallel (which you always should) and discover a new thing about your customers that needs to be factored into the product, what happens if your developer isn't willing to learn whatever they need to learn to fulfill that requirement?

Turtle on its back

Happens more often than you'd think! The first sign that you're dealing with a developer who's intellectually rigid like this is when they start telling you that things are "impossible." Nothing is ever really "impossible" - expensive, different, and challenging maybe. But rarely are things actually "impossible."

A developer who can build something from scratch will figure it out - they might need some help picking from a number of different possible options that they discover, but ultimately they're not going to get stopped by not knowing what to do initially.

Structured way of thinking, communicating, and asking

The most important trait of a "from scratch" developer is their ability to communicate and think in a structured manner.

The "from scratch" developer begins in an environment where they are the only engineer - the other stakeholders they're working with typically don't know what "technical debt" is; which technology stack is better for X; or even which parts of their software product are "expensive" to build. So it's the responsibility of the "from scratch" developer to be able to communicate these issues clearly and concisely to those team members.

Here's a list of things this developer needs to be able to both do and explain:

  1. Visualize and describe a product build cycle end to end;
  2. Identify the core features in that product and estimate the relative cost of each;
  3. Break up the development of the product into discrete milestones;
  4. Sequence those milestones into a schedule;
  5. Identify areas where features could be cut or altered to help ship faster; and
  6. Be able to provide a business justification for every technical decision.

This is actually a lot of stuff! But it's important to do - if a developer can develop all of those distinctions and explain them effectively, then they're capable of building this product without a lot of direction.

How developers become "from scratch" developers

I'm a big believer in nurture over nature - I've become proficient at building things from scratch as a result of years of repetition, beginning with developing baseball card trading sites as a fourth grader all the way up to the OSS projects and startups I've founded.

If you're a developer and read this list and feel like you can't do this yet - don't worry. It takes practice, effort, and a willingness to really learn from your mistakes. So give yourself permission to fail and start getting your hands dirty - building an OSS project or a product from scratch is one of the best professional growth exercises I can recommend. You'll learn a lot, and not just the technical parts of it.

If you want to try your hand at open source software, make sure you read "How to Start Contributing to Open Source Software."

Or if you want to try to ship a commercial product, then make sure you read "What It Takes to Actually Ship a Piece of Commercial Software."

Introducing Access to {AI} Conference, November 12-13 2015


Access to {AI}

And now for something completely different - I'm hosting a conference on November 12-13th in Mountain View, California called Access to {AI}.

The goal of the conference is to have experienced practitioners of Machine Learning, Natural Language Processing, and Artificial Intelligence introduce the techniques and tools they use every day to programmers who've never done any of that before.

If you're a TL;DR; sort of person, just click here to register for updates about Access to {AI}.

Why is this important? Well...

Breaking into advanced fields is hard

I first learned how to do distributed computing in college, and I was overwhelmed learning the strange new lexicon that came with it. I barely knew how to write C++ and C#, even though I had spent thousands of hours doing both prior to coming to college. And I had to learn about "consistency," "wire formats," "serialization," and dozens of other concepts that were completely alien to me even as someone who's been programming since age 10.

Thankfully, I had great professors and other people who could help teach those concepts to me - and even with their help it still took me years of practicing on my own to really "get it."

But what if you or I need to learn an advanced field like AI or machine learning on the job today? How are we going to learn that technology's lexicon, algorithms, frameworks, and best practices completely on our own? What about the underlying math involved - is that something we'll need to know too?

Now imagine that English isn't our first language; we're almost entirely self-taught programmers; and we never went to college. The barriers to entry for these technologies start to look really intimidating!

What if we did something extraordinary to give programmers who have no background in these areas a boost... What if we held a conference where professionals who use these advanced technologies every day in production got together and shared their experiences with beginners? That's exactly what we are doing with Access to {AI}.

What if all developers could use AI, machine learning, or any other advanced field as readily as web programming?

When someone posed the following question to me, I committed to the possibility inherent in the answer.

What would our world look like if every programmer could put advanced technologies like AI, machine learning, distributed systems, concurrent programming, and others into action as effortlessly as web programming? What types of software would the average developer produce then?

The answer is obvious: we wouldn't recognize the world we live in if this was already possible, because the average piece of software produced would enable computers to do orders of magnitude more work for people than they do today.

And yet, the reality is that this can become possible - but it will require us to do something really different in our advanced computer science fields: actively make these technologies more accessible to developers who are self-taught, haven't gone to college for CS, or have otherwise not had access to these technologies before.

And that is why we are putting on Access to {AI}. Through this conference we want to connect developers who have no background in AI or machine learning to experts who use it every day.

Access to {AI} is a conference by experts for beginners - a rare opportunity in tech conferences.

Why AI and machine learning?

Of all of the advanced fields within computer science, why did we pick Artificial Intelligence and Machine Learning as the first subject for an Access to {_} conference?

Because I personally don't know it and neither do the other organizers. We're experienced professionals in distributed computing and other advanced fields, but none of us collectively have much experience with AI or Machine Learning.

And we're exactly the sort of people we intend to empower through Access to {AI} - largely self-taught programmers who've never been immersed in AI before. If we can get it, then everyone can.

We intend to host a series of Access to {_} conferences on different areas over time, but we're starting with one where we're the customer too - so we can gauge for ourselves how effectively this format gives us access to technologies we've had trouble picking up on our own.

I am requesting your help

I want you to help me stand for the possibility of giving every programmer access to AI, machine learning, and every other advanced technical field in computer science. Through this work we can create a world where every programmer has access to tools and technologies that will make them individually more powerful and expressive.

Here's how you can help realize this possibility:

Attend or Register for Updates

Easy, right? Buy a ticket to Access to {AI} OR register for updates about Access to {AI} here.

The conference will be held at Microsoft's Silicon Valley Campus from November 12th, 2015 through November 13th.

Submit a Talk

We have some amazing AI and ML experts lined up to speak already, but we are looking for more!

If you're interested in sharing your expertise and giving programmers access to your world as an experienced AI or machine learning professional, fill out our Access to {AI} Call for Presentations Form and get involved!

Volunteer

We're looking for more volunteers to help with reviewing CFP proposals, scheduling, promoting the conference, and lots of other things! Conferences are a lot of logistical work, and organizing them is a great growth opportunity for programmers.

If you want to get involved as a volunteer, send us an email at conference@accessto.io

Help Promote Access to {AI}

The easiest thing you can do to help is to let your friends know about it!

You can start by mentioning Access to {AI} on Twitter!

And you can follow Access to {AI} on:

Reminder: Access to {AI} is on November 12-13 in Mountain View, CA, at Microsoft's Silicon Valley Campus! Buy a ticket to Access to {AI} OR register for updates about Access to {AI} here.

The Beginner's Reference Guide to Startups


I was asked by a close friend earlier this week about whether or not I have any references, books, or recommended reading for anyone wanting to get into startups. I don't have a single source that I can point to, so I thought I'd write one!

Let's start from the beginning.

Getting Acquainted with Startups

Before you get into trying to do your own business, it's a great idea to see what all of the other entrepreneurs are up to across the world. Learn from their experiences, successes and failures both.

Start with Paul Graham's essays. Paul's one of the founders of Y Combinator, one of the most successful early stage startup investment vehicles on the planet. He's worked closely with hundreds of startups, met thousands of founders, and seen deals of all shapes, sizes, and trajectories.

His essays are pure gold - a culmination of dozens of man-years worth of experience. They're not really for learning tactical, detail-oriented parts of startup life (such as how to hire a developer) as much as they are for the higher-order strategies, trends, and put simply: ways of being in startup life.

While you're at it, consider adding Hacker News, Y Combinator's user-driven news website, to your list of regular reads. You'll discover a lot of great opportunities early if you check the site regularly and you'll also learn a lot by osmosis.

The next person worth reading is Ben Horowitz, a famous entrepreneur with an amazing story in his own right and one of the founders of the wildly successful Andreessen Horowitz venture fund. His stories are much more personal and gritty than Graham's, and if you find his blog compelling then you should definitely read his book The Hard Thing About Hard Things: Building a Business When There Are No Easy Answers; I've finished it and absolutely loved the anecdotes.

Picking Something to Work On

Over the past 6 years an area of entrepreneurship that's received a tremendous amount of attention is the concept of idea and business validation - an empirical method for determining whether or not a specific business concept is worth pursuing into various stages and at what scale.

The book that really blazed the trail in this area is Eric Ries's The Lean Startup, which defined many of the key concepts and methodologies even the most widely successful entrepreneurs in Silicon Valley employ to validate a business idea before investing heavily into it.

Mastering the Lean Startup methodology will save you tremendous amounts of wasted effort, dollars, and energy. And if you master the process of being able to simply validate an idea, you'll have breakthroughs in areas like talking to customers, product design, analytics, marketing, sales, and other key business areas where entrepreneurs generally struggle.

Thus the next resource I suggest you check out is the Lean Startup Machine - a 3-day workshop that will help you apply all of these techniques to a real business idea. And if you can't attend one of their courses, then at least try their validation board software and use that as a framework for employing the lean methodology.

Lastly, here are some other books in this area that are worth a read:

These are both in my personal Kindle collection and I've found them to be helpful.

Fund-raising

One area of running a startup that can be pretty intimidating is the prospect of pitching investors and raising venture capital - I know it certainly was for me the first time I raised.

A really simple introduction to venture capital fund raising that I've kept on my favorites bar for years is this 2012 TechCrunch article "How To Raise A $1M Seed Round" - I actually used this advice when I ended up raising in 2013 for MarkedUp, so speaking from personal experience there's real value to this.

However, if you're looking for a more general guide that explains how the entire fund-raising process works across all of the various funding sources such as accelerators, angel investors, micro VCs, growth funds, etc... Then I strongly recommend Paul Graham's "Startup Funding" essay.

And in general, if you want to hear the thoughts of someone who's crossed the divide from being a software entrepreneur to an investor then make sure you read Mark Suster's Both Sides of the Table. It's a popular blog and for good reason: Mark offers a unique perspective on the ecosystem and is both personal and authentic in sharing what he sees as trends in startups seeking to raise capital and where many founders misspend their time. Truly a great read.

Lastly, when it comes down to the tactics of actually pitching an investor - then you should read "Eleven Compelling Startup Pitch Archetypes"; it provides a great overview of the different ways to structure a pitch narrative for investors, using real examples from Y Combinator companies.

Legal and Accounting

When it comes to finding good legal and accounting advice for startups online the resources are somewhat scarce... Honestly the best article I can find on determining which legal structure you should pick for your company is this one from Entrepreneur magazine. I'll probably end up having to write one of my own here unless someone suggests a good one in the comments.

In terms of accounting, this list of "8 Accounting Pitfalls That Start-up Businesses Can Avoid" is worth its weight in gold.

Marketing and Sales

A book I've recently fallen in love with and given away copies of is Traction: How Any Startup Can Achieve Explosive Customer Growth by Gabriel Weinberg and Justin Mares. I've linked to the new hardback print, which isn't out yet, but I've had this book on my Kindle for the past year. In Traction the authors offer a concrete framework (the "bullseye") for testing and quantifying which marketing channels are and aren't working for your startup, and give you some guidance on how to focus your resources on a small number of successful channels rather than spreading yourself thin trying to be good at everything.

Beyond that though, the advice Traction offers on how to work with individual marketing channels like email marketing, search engine optimization, social media, and so on are tactically sound. You'll get lots of good ideas from this book.

On the sales side of the equation, you have to go with Predictable Revenue by Aaron Ross and Marylou Tyler. It's an instant classic and a must-read for any startup that'll have a direct sales component built into it.

Founders, Co-Founders, and Early Employees

One of the tricky areas of startups and the most important to get right early is managing your expectations on what the experience of running a startup will actually be like. I've written about this ad nauseam (1, 2, 3, 4, 5,) but I have some personal favorites in this area:

The only thing more critical and more complicated than managing your own expectations, life, and time as a startup founder is when you bring a co-founder or early employee into the mix. Here are some of the resources I've found helpful when I was first starting out:

Most of the good advice I've actually used in this area was delivered to me verbally by some great advisers. I'll codify this into some new posts in the not-too-distant-future.

Conclusion

I may expand this list over time with new references and sections (product development comes to mind) but for the time being this is a great list for anyone who's really interested in learning the ropes of tech startups.

Let me know if there's anything else you'd like to see in the comments!


Introducing NBench - an Automated Performance Testing Framework for .NET Applications


I originally posted this to the Petabridge blog earlier today. See the original here.

Not long ago in Akka.NET-land we had an issue occur where users noticed a dramatic drop in throughput in Akka.Remote's message processing pipeline - and to make matters worse, this occurred in a production release of Akka.NET!

Yikes, how did that happen?

The answer is that although you can use unit tests and code reviews to detect functional problems with code changes and pull requests, using those same mechanisms to detect performance problems with code is utterly ineffective. Even skilled developers who have detailed knowledge about the internals of the .NET framework and CLR are unable to correctly predict how changes to code will impact its performance.

That's why I developed NBench - a performance-testing, stress-testing, and benchmarking framework for .NET applications that works and feels a lot like a unit test.

An NBench Example

Here's a small sample from the NBench README: a simple test measuring the throughput of the built-in Counter class, which NBench uses to measure the throughput of user-defined code.

using NBench.Util;
using NBench;

/// <summary>
/// Test to see if we can achieve max throughput on a <see cref="AtomicCounter"/>
/// </summary>
public class CounterPerfSpecs
{
    private Counter _counter;

    [PerfSetup]
    public void Setup(BenchmarkContext context)
    {
        _counter = context.GetCounter("TestCounter");
    }

    [PerfBenchmark(Description = "Test to ensure that a minimal throughput test can be rapidly executed.",
        NumberOfIterations = 3, RunMode = RunMode.Throughput,
        RunTimeMilliseconds = 1000, TestMode = TestMode.Test)]
    [CounterThroughputAssertion("TestCounter", MustBe.GreaterThan, 10000000.0d)]
    [MemoryAssertion(MemoryMetric.TotalBytesAllocated, MustBe.LessThanOrEqualTo, ByteConstants.ThirtyTwoKb)]
    [GcTotalAssertion(GcMetric.TotalCollections, GcGeneration.Gen2, MustBe.ExactlyEqualTo, 0.0d)]
    public void Benchmark()
    {
        _counter.Increment();
    }

    [PerfCleanup]
    public void Cleanup()
    {
        // does nothing
    }
}

I compile this benchmark into a DLL (just like you would with an NUnit or XUnit unit test) and then use the NBench.Runner NuGet package to execute the benchmark according to the parameters I specified on the PerfBenchmark attribute, producing a report of the results.
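Invoking the runner from the command line looks roughly like this - a sketch, assuming a benchmark assembly named MyBenchmarks.dll (a hypothetical name) and the key=value argument style from the NBench README; check the README for the exact switches in your version:

NBench.Runner.exe MyBenchmarks.dll output-directory="C:\Perf"

The output-directory argument controls where the runner writes its report files.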

What NBench Can Measure

Currently, NBench is programmed to be able to measure and perform assertions against the following types of performance data:

  1. Throughput of code - measured in operations per second;
  2. GC overhead - measured in total collections per GC generation; and
  3. Memory allocations - measured in total bytes allocated per benchmark iteration.

In the not-too-distant future we will also be adding support for Windows Performance Counters, which will allow you to measure and collect data directly from the operating system, such as disk, CPU, or network utilization.
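All three of these metrics can also be recorded without asserting on them, which is handy when you just want to collect data before deciding where to draw the line. Here's a minimal sketch using the measurement-only attributes from the NBench README - the Dictionary-based subject under test is my own hypothetical example:

using System.Collections.Generic;
using NBench;

public class DictionaryAddSpecs
{
    private Dictionary<int, int> _dictionary;

    [PerfSetup]
    public void Setup(BenchmarkContext context)
    {
        _dictionary = new Dictionary<int, int>();
    }

    [PerfBenchmark(NumberOfIterations = 5, RunMode = RunMode.Iterations,
        TestMode = TestMode.Measurement)]
    [MemoryMeasurement(MemoryMetric.TotalBytesAllocated)]          // record allocations, no assertion
    [GcMeasurement(GcMetric.TotalCollections, GcGeneration.AllGc)] // record collections per generation
    public void AddItems()
    {
        // the work being measured: 1,000 dictionary writes per iteration
        for (var i = 0; i < 1000; i++)
            _dictionary[i] = i;
    }

    [PerfCleanup]
    public void Cleanup()
    {
        _dictionary.Clear();
    }
}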

NBench Assertions and Real-world Use

The output report produces some interesting and usable benchmark data (and we've expanded it since to include all of the raw data from each of the individual runs of the benchmark), but the most useful feature, in my opinion, is the performance assertions available in NBench.

Take a closer look at this attribute:

[MemoryAssertion(MemoryMetric.TotalBytesAllocated, MustBe.LessThanOrEqualTo, ByteConstants.ThirtyTwoKb)]

This MemoryAssertion attribute specifies that we want to measure the total number of bytes allocated on each iteration of this benchmark, and if that value ever exceeds 32kb then this performance test is considered a failure.

This is a tremendous leap forward in being able to correctly assess the performance impact of code before it's ever merged into production: you can set a floor (or ceiling) on the performance of a method, whether the metric is memory, throughput, or anything else that can be collected.

NBench makes it easy for developers and release managers to now do the following:

  1. Easily write and run benchmarks without complicated tools or expensive Visual Studio licenses;
  2. Integrate performance testing directly into your build pipeline;
  3. Collect and retain NBench performance reports, so you can see how the performance of measured code has changed over time; and
  4. Write assertions that can automatically fail pull requests and patches that negatively impact performance in critical areas, eliminating the possibility of accidentally including those changes in production code.

Introduction to NBench

If you want to learn more about how to use NBench, I suggest starting with the NBench README and watching the brief "Introduction to NBench" tutorial video below.

Related Links

Broken Windows: How Bad Software Releases Happen to Good Teams


One of my primary responsibilities with the Akka.NET project is release manager - I put together the release notes, press the big green button when we're ready to deploy, and make sure that each contributor signs off on the release.

The thing I take most seriously about my job is quality control - trying to ensure that no release ever does any of the following:

  1. Introduces a breaking change to a public interface;
  2. Introduces a game-changing bug that forces users to roll-back to a previous version;
  3. Causes a major degradation in performance or stability; or
  4. Significantly alters the behavior of a component in a manner that falls out of alignment with previous behavior, without giving users sufficient advance notice.

Unfortunately, within the past few months we've had all of the above happen at least once each. Akka.NET is an open source project that has seen a rapid increase in adoption, contribution, and deployment over the past six months in particular (since we released 1.0), so these issues aren't unexpected - they're growing pains.

However, I've observed what the root cause of this particular set of growing pains appears to be: broken windows theory at work.

Broken windows theory

The Broken windows theory is a criminological theory that essentially amounts to this: if you tolerate lesser crimes such as vandalism and public drinking, you create a negative feedback loop that results in social decay and an increase in more serious crimes such as robbery and theft.

Harkening this idea back to software development, broken windows theory amounts to neglect that accelerates code rot within a codebase. If you tolerate some failing unit tests, how long do you go before someone introduces a massive bug in that area of the code base? Much less time than if you refused to tolerate gaps in code coverage and test failures.

To my delight, I discovered when researching this post that Jeff Atwood covered the application of broken window theory to software ten years ago:

Programming is insanely detail oriented, and perhaps this is why: if you're not on top of the details, the perception is that things are out of control, and it's only a matter of time before your project spins out of control.

In the case of Akka.NET, our failure to achieve our release goals on multiple releases came about as the sum of the following root causes:

  1. No automation, standardization, or historical record of the project's performance over time;
  2. Asynchronous code is inherently more difficult to test than synchronous code; there were a small number of Heisenbugs which showed up in random test failures only on our build servers - crappy Azure boxes - and never on our high-end development machines. As a result we got conditioned to ignore some of those tests and dismiss the results as related to the CPU-sharing going on inside Azure. As it turns out, these tests revealed real faults in our code that eventually would show up under higher loads.
  3. No automation for testing the binary compatibility of Akka.NET (breaking .DLL changes).

None of these issues in and of themselves are that significant, until we see what they led to:

  1. Overlooking potential issues in tests that failed;
  2. Relying on manually-run benchmarks and examples to gauge the performance of our software between releases; and
  3. Inadvertently accepting contributions from well-intentioned contributors who made hard-breaking changes to APIs in ways which were not obvious (to us), such as adding an overload to an extension method.

These were all small details early on in the project, when we had bigger concerns like actually shipping a stable version of our core modules. But as the project grew and these broken windows were left unattended the project rotted in core areas that had a real impact on our users.

However, the users and contributors of Akka.NET are on top of their game so we didn't let things stay this way for long.

We added NBench, an NUnit-style performance testing framework for .NET, to bring stress testing and performance testing to all of Akka.NET's builds, which eliminates many of these issues (it spots performance problems and makes it easier to reproduce Heisenbugs.) We're working on adding an API diffing tool to our release process to help prevent issues related to breaking changes. And we're experimenting with changes to our review process to make it easier to catch potential bugs and issues sooner.

As Jeff put it, software is insanely detail-oriented - and some of the growing pains Akka.NET experienced over the course of 2015 came down to us having to pay attention to new types of details we never worried about before, such as performance and backwards-compatibility. As our projects and products grow, we need to sweat the small stuff.

Introducing the New .NET Stack


I’ve been a .NET developer for roughly 10 years now - since the summer after my freshman year in college in 2005 I’ve been developing in Visual Studio and .NET. I’ve founded three startups on .NET, worked for Microsoft, and founded multiple successful OSS projects in .NET - I say all of this as evidence of the depth of my commitment and investment in the .NET ecosystem.

But it’s time we, the .NET community, address a major elephant in the room - .NET is, and often has been, way behind other ecosystems in terms of overall innovation, openness to new ideas, and flexibility.

And you know what - we are addressing it! And this has created some exciting new possibilities for .NET developers that were never available to us before, to a point where we may actually be competing with platforms like Node.JS, Java, and others for mindshare on Linux.

And that’s the crux of my talk at the 2015 Cassandra Summit, “The New .NET Stack,” which I present to you now below. Enjoy!

Slides

Essay

Want to learn more about the new .NET stack? Take a look at the original blog post on the subject I wrote for Petabridge.

.NET Core is Boiling the Ocean


I get asked regularly in the Akka.NET Gitter Chat and elsewhere about “when will Akka.NET support .NET Core?”

TL;DR: .NET Core

Part of the issue I’ll address here is that .NET Core means different things to different people and there hasn’t been clear messaging on that from Microsoft as far as I know, so I’ll summarize how it’s relevant to me, Petabridge, and Akka.NET.

.NET Core is about decoupling .NET from Windows - allowing it to run in non-Windows environments without having to install a giant 400mb set of binaries.

This will allow .NET to run in containers on Linux and lots of other places it can’t today and presents a very exciting opportunity for those of us in the server-side .NET development space to start taking advantage of modern DevOps and deployment tools.

Because let’s face it: deployment tooling built specifically for Windows is complete and utter dogshit compared to what’s available everywhere else. Even the Windows versions of tools like Chef are pitiful imitations of the real thing. Windows Server is the red-headed stepchild of server operating systems, and even .NET developers are increasingly united in resenting its necessity in our day-to-day lives. .NET Core should liberate us from that.

New Week, New Story

My answer to this question for the better part of a year has been “whenever .NET Core is stable enough for us to use.” I had originally imagined that .NET Core would be “released” in a stable form this year, 2016.

Now I no longer have any certainty with anything in regard to .NET Core, because the roadmap has been changing rapidly.

Two weeks ago the .NET Core team abandoned the new project.json format for managing projects / dependencies and are reverting back to the MSBuild .csproj format, for reasons I understand and am ultimately sympathetic to.

I hated the thought of having to port the 80-something .csproj files in the core Akka.NET repo to a new format, because it’s trivial, tedious bullshit that offers no upside and a lot of downside for our development and release processes. So kudos for that.

But in a public meeting with the community yesterday, the following bombs were dropped:

  1. .NET Core will be expanding its API support to include the Mono API surface area, to support Unity / UWP apps
  2. .NET Core will be back in the web browser (for some reason) once WebAssembly finishes
  3. Mscorlib may, or may not, be coming back. Or maybe we’ll be using NuGet. #YOLO
  4. AppDomains and other excluded features may be coming back, but different

Update: Miguel de Icaza pointed out that WebAssembly support will actually be for Mono and not .NET Core, which leaves me with even more questions. So Mono is still going to be a thing? Are there going to be competing efforts between .NET Core and Mono for x-plat .NET developer support? Who knows? Miguel seems confident that their team’s communication has been sufficiently clear on this point. I disagree.

You can read the full Slack transcript on that thread, but the summary made by Jose Fajardo is excellent.

After going through this a second time, I’m left asking…. “so what do my projects and my business need to be doing with regard to .NET Core today?”

At this rate I’m fairly confident that even the .NET Core team themselves would answer with:

¯\_(ツ)_/¯

Ship Dates and Boiling the Ocean

I’m left with the impression that .NET Core is trying to do everything at once: static linking, new tooling, support for 3D games, web applications, cross-platform desktop applications, browser-based apps, and anything else that could be aptly labeled under the masthead of “panacea for .NET developers.”

There’s a term for this, “boiling the ocean,” and ironically Immo Landwerth, one of the PMs in charge of .NET Core, claims this is precisely what his team is trying to avoid:

Immo Landwerth on not boiling the ocean

Link to the original Tweet

Upon reading yesterday’s announcements, I’m left with the impression that the parts of .NET that aren’t being changed in .NET Core are in the minority.

Here’s the thing: the vision that Miguel and Immo paint for .NET’s role in the future is compelling. .NET everywhere?! YESSSSSSSSSSSS!

But you know what I and every other .NET developer have to deal with outside of the bubble in Redmond? Setting ship dates and expectations with our own users, management, and stakeholders.

The .NET Core team has done a fucking awful job at everything in regard to expectations management around .NET Core. It would be a disservice to amateurs to call their PR efforts “amateurish;” manic and frantic would be better terms.

As an ex-Microsoft employee I hate myself for saying this, but Immo and friends should consider going through Microsoft PR before they make any promises to end-users. They’ve proven to be too wildly inconsistent and too ready to make promises about software that isn’t on terra firma yet. Look at the damage Microsoft can do when they fuck up expectations with .NET developers.

.NET developers are betting their careers and livelihoods on .NET Core, and I genuinely don’t see Scott Hanselman, Damian Edwards, David Fowl, Immo Landwerth, Miguel de Icaza, or any other public-facing figure in the .NET Core effort genuinely taking into account the real impact that resetting expectations with .NET developers can have in terms of lost business, missed opportunities, or possibly even getting fired. If you’re of the opinion that I’m overblowing the impact that honoring expectations can have on something as big as platform choice, then you should read the post I linked in the previous paragraph. People lose their jobs and lose customers over less.

Case in point: after yesterday’s announcements I’m wondering if the .NET Core roadmap is even valid anymore. We had talked about starting work on Akka.NET for .NET Core in 2016. As of today, I’m thinking that .NET Core probably won’t be ready until much later than that.

And here’s where the rubber meets the road: if I start promising tools and features to end-users based on what’s been promised by the .NET Core team and .NET Core changes direction and pulls the rug out from under me, then I effectively screw up my users’ and customers’ timelines.

I can’t have that, and I’m fortunate enough to be in a position where I can and am choosing not to. I’m sticking with .NET 4.5.2 until .NET Core is solid.

Not everyone else is - what about the .NET developers who have to explain to management that they can’t ship a .NET app on Linux until next year, and all of the effort they spent on RC1 is going down the shitter? “Oh well, they shouldn’t have used an RC - screw them?”

Update: this is from the Event Store team:

Suggestions

I have a lot of sympathy for David and company - I know it can be tough working on such an important project out in public. So here’s what I would suggest:

  1. Stop. Stop boiling the ocean. WebAssembly, Unity support - that’s all great. And it can wait. WebAssembly isn’t even supported by any browsers yet, and Unity3D already works cross-platform. Meanwhile, your users are overwhelmingly ASP.NET developers and they’re stuck with Windows Server.
  2. Don’t mention anything not on the current .NET Core roadmap. Seriously, don’t even bring up WebAssembly or anything else until you’ve shipped something on this roadmap first. Everything that’s happened in the past two weeks has totally undermined the confidence I had in .NET Core’s near-term arrival, and I’m not alone in that camp. I’m wondering if this roadmap is even accurate anymore.
  3. Get PR training. Using the term “pay to play” to describe the cost of running extra NuGet packages shipping as part of .NET Core is not going to be interpreted generously by anyone who’s done business with Microsoft before. Have some self-awareness, and in lieu of that get a PR person involved. Obviously, extra NuGet packages that are included in an application will incur extra disk / memory / CPU overhead - you don’t need to appropriate a term that means “money changing hands” to explain this.

I love the direction .NET is going in and I’m genuinely excited for it, but the .NET Core team desperately needs some finesse when it comes to communicating with end-users. I’ve struggled with this issue myself - I’ve missed our own roadmap dates and deadlines and been called to task for it. It’s hard.

But this is Microsoft - they have hundreds of thousands of developers who’ve staked their careers on the future of .NET. They can and should do better.

Writing Better Tests Than Humans Can Part 1: FsCheck Property Tests in C#


This is the first post in a 3-part series on property-and-model based testing in FsCheck in C#.

  1. Writing Better Tests Than Humans Can Part 1: FsCheck Property Tests in C#
  2. Writing Better Tests Than Humans Can Part 2: Model-based Tests with FsCheck in C#

Subscribe to get the next posts as they’re published!

During my work on developing Akka.NET 1.1 I took a long look at the history of various bug reports over the lifespan of Akka.Remote, Akka.Cluster, and Helios over the past two years or so. The thing that I wanted to make sure we absolutely nailed in this release was ensuring the stability of the underlying transport and so-called “endpoint management” system.

If these sub-systems aren’t robust then Akka.Cluster can’t fulfill any of its behavior guarantees to end users.

For instance, I found that in the previous stable version of Helios we were using (1.4.1) there was an easily reproducible race condition that could result in a socket failing to open. In addition to that we (long ago) discovered that Helios didn’t treat message order correctly as a result of a fundamental design flaw with its concurrent message processing model. We had to patch that via a configuration change that severely limited Helios’ throughput inside Akka.Remote.

How is Akka.Cluster supposed to behave reliably during a network partition if its own internal software can easily create one?

This is the crux of the problem I tackled intensely over the course of three months.

Thus we eliminated these kinds of problems and graduated Akka.Cluster from beta (where it had been since August 2014) to a full, production-ready release. One of the tools that was essential to making this happen was FsCheck - a property- and model-based testing framework for .NET applications.

Where Unit Tests and Traditional Testing Practices Fall Short

In modern software development it’s standard practice for developers to cover their own code using individual unit tests like this:

public class MyModuleSpec
{
    [Fact]
    public void MyModule_should_add_item_to_collection()
    {
        var myModule = new MyModule();
        myModule.Add(1);
        Assert.True(myModule.Count == 1);
        Assert.True(myModule.First() == 1);
    }
}

These types of tests serve three purposes:

  1. Validating that your code works as you write it, especially if you’re working from the bottom-up.
  2. Protect against future regressions on this code. This is the most important function of tests - to scream, loudly, if the underlying behavior of a system inadvertently changes as the result of a software change.
  3. Defines a working specification for how code behaves. The tests also act as self-contained contracts for defining how each of the covered modules under test are supposed to behave during a variety of conditions.

Unit tests are ideally meant to target very small, isolated units of code. This makes them easier to write, easier to read (by another developer), and makes the test more “pure.”

Where the unit testing model really starts to fall apart, however, is that it’s designed to test the behavior of individual features in isolation. What happens if you need to test how multiple features in a given application interact with each other?

Unit Tests Don’t Scale

Let’s suppose a feature of our application is a custom IList<T> implementation called FixedSizeList<T> that stays at a fixed size and won’t accept any more items once its limit is reached.

var fixedSize = new FixedSizeList<int>(1);
bool addedFirst = fixedSize.Add(1);  // should be true
bool addedSecond = fixedSize.Add(2); // should be false
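For illustration, a minimal sketch of what such a collection might look like (just a sketch - a real implementation would flesh out the full IList<T> surface):

using System.Collections.Generic;

// Sketch only: a list that rejects adds once it reaches capacity.
public class FixedSizeList<T>
{
    private readonly int _maxSize;
    private readonly List<T> _items;

    public FixedSizeList(int maxSize)
    {
        _maxSize = maxSize;
        _items = new List<T>(maxSize);
    }

    public int Count => _items.Count;

    // Returns false instead of growing beyond the fixed capacity
    public bool Add(T item)
    {
        if (_items.Count >= _maxSize)
            return false;
        _items.Add(item);
        return true;
    }

    public bool Remove(T item) => _items.Remove(item);
}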

An experienced developer would manually write some unit tests that cover the following scenarios:

  1. Adding an item to an empty list;
  2. Adding an item to a full list;
  3. Removing an item from a full list and then adding a new item; and
  4. Removing an item from an empty list.

That covers all of the basics, maybe, for this feature. Now here’s where things get fun - imagine if we use this FixedSizeList<T> inside another feature: a BackoffThrottler.

public class BackoffThrottler
{
    private readonly double _throttleRate;
    private readonly FixedSizeList<IMessage> _messageBuffer;
    private IChannel _downstreamConsumer;
    private double observedBytes;
    private long nextReleaseTimeTicks;

    public BackoffThrottler(double throttleRate, FixedSizeList<IMessage> buf, IChannel consumer)
    {
        // assignments...
    }

    public bool Pass(IMessage next)
    {
        if (next.Size + observedBytes >= _throttleRate)
        {
            if (!_messageBuffer.Add(next))
            {
                throw new InvalidOperationException("buffer full!");
            }
            return false;
        }
        else
        {
            observedBytes += next.Size;
            _downstreamConsumer.Send(next);
            return true;
        }
    }

    public bool DrainBuffer(long nextReleaseTime)
    {
        // drain all buffered messages
        // calculate new release time
        // reset observed bytes to zero
        // etc...
    }
}

Manually writing unit tests for just the Pass method of this BackoffThrottler class is going to be complicated. We have to determine whether the Pass method will return true or false, or throw an InvalidOperationException, based on:

  1. The number and size of all messages observed previously;
  2. The size of the current message;
  3. Whether or not any DrainBuffer operations occurred previously; and
  4. The possible configuration settings for both the BackoffThrottler and the FixedSizeList<IMessage> objects.

How many tests would you have to write to exhaustively prove that these two classes both work correctly under all supported configurations? “More than a human can feasibly write” is the correct answer. Now imagine having to do this for three, four, or dozens of features all working together simultaneously. You simply can’t write manual tests to cover all of the scenarios that might occur in the real world. It’s not feasible.

Model and Property-Based Testing

Folks in the Haskell community developed a solution to this problem in the late 1990s, and it was called QuickCheck. This is the library that pioneered the concept of property and model-based testing. FsCheck, which we will be using, is an F# implementation of QuickCheck - although I’ll be using it with C# in this example.

The idea is simple: rather than try to test every possible combination of inputs with hand-written tests, you describe an abstract “model” for the feature(s) under test, generate random inputs for each possible value, and verify that the test still holds true for all of them (assuming the inputs satisfy some pre-conditions, which we’ll cover.)

What this gives you is a single test written by a developer that can cover thousands of randomly generated scenarios and verify that the code-under-test behaves correctly under all of them. This is much more powerful, efficient, and secure than hand-written tests.

Testing Properties of Features

Let me give you a real-world example from one of my open source libraries, Faker. Faker uses reflection to create a plan for randomly generating fully constructed POCO objects and can even handle rather complex message graphs. But one part of our system for generating accurate fakes includes randomizing array / list types - and so I wrote a shuffle function based on the well-known Fisher-Yates shuffle algorithm.

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

namespace Faker.Helpers
{
    /// <summary>
    ///     Extension methods for working with arrays
    /// </summary>
    public static class ArrayHelpers
    {
        /// <summary>
        /// Produces the exact same content as the input array, but in different orders.
        ///
        /// IMMUTABLE.
        /// </summary>
        /// <typeparam name="T">The type of entity in the array</typeparam>
        /// <param name="array">The target input</param>
        /// <returns>A randomized, shuffled copy of the original array</returns>
        public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> array)
        {
            var original = array.ToList();

            /*
             * https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
             */
            var newList = new T[original.Count];
            for (var i = 0; i < original.Count; i++)
            {
                var j = Faker.Generators.Numbers.Int(0, i);
                if (j != i)
                    newList[i] = newList[j];
                newList[j] = original[i];
            }
            return newList;
        }
    }
}

For a specific feature inside Faker, I have to be able to guarantee that for any IEnumerable<T> this shuffle function must:

  1. Never modify the original collection;
  2. Never return a collection with a different number of elements;
  3. Never return a collection with different items in its contents; and
  4. Must never return the original collection in its original order.

These are all properties of how this feature of my application works. And I can verify that all four of these properties hold true using property-based tests with FsCheck.

using System.Linq;
using Faker.Helpers;
using FsCheck;
using NUnit.Framework;

namespace Faker.Models.Tests
{
    [TestFixture(Description = "Validates our extension methods for working with arrays")]
    public class ArrayHelperSpecs
    {
        [Test(Description = "Ensure that our shuffle function works over a range of intervals")]
        public void Shuffled_lists_should_never_match_original()
        {
            Prop.ForAll<int[]>(original =>
            {
                var shuffle = original.Shuffle().ToArray();
                return (!original.SequenceEqual(shuffle))
                    .Label($"Expected shuffle({string.Join(",", shuffle)}) to be " +
                           $"different than original({string.Join(",", original)})")
                    .And(original.All(x => shuffle.Contains(x))
                        .Label($"Expected shuffle({string.Join(",", shuffle)}) to contain" +
                               $" same items as original({string.Join(",", original)})"));
            }).QuickCheckThrowOnFailure();
        }
    }
}

In the sample above I’m calling FsCheck from inside NUnit, and I’m going to specify that for any random array of integers (Prop.ForAll<int[]>) my Func<int[], bool> will hold true. Given that int is a primitive data type, FsCheck has a built-in randomizer that will generate incrementally more complex values and incrementally larger arrays.

The real pieces of code doing the work here, though, are my two explicit property checks - that the two sets contain the same items, but not in the same order. These cover all four of my properties from earlier.

Let’s see what happens when I run this sample as-is.

System.Exception : Falsifiable, after 1 test (1 shrink) (StdGen (1428853014,296185311)):
Label of failing property: Expected shuffle() to be different than original()
Original:
[|0|]
Shrunk:
[||]

Ouch, fails on the very first try - when we attempt to shuffle a zero-length collection. Well, you can’t technically shuffle a list with no items, so let’s add a precondition to this property test that states that these properties only hold if there’s at least 1 item.

Prop.ForAll<int[]>(original =>
{
    var shuffle = original.Shuffle().ToArray();
    return (!original.SequenceEqual(shuffle))
        .When(original.Length > 0) // pre-condition to filter out zero-length []s
        .Label($"Expected shuffle({string.Join(",", shuffle)}) to be " +
               $"different than original({string.Join(",", original)})")
        .And(original.All(x => shuffle.Contains(x))
            .Label($"Expected shuffle({string.Join(",", shuffle)}) to contain" +
                   $" same items as original({string.Join(",", original)})"));
}).QuickCheckThrowOnFailure();

And the results are….

System.Exception : Falsifiable, after 1 test (2 shrinks) (StdGen (1049246400,296185312)):
Label of failing property: Expected shuffle(0) to be different than original(0)
Original:
[|-1|]
Shrunk:
[|0|]

Doh! The spec fails if we have an array with a single entry in it too! I guess that’s because you also can’t shuffle a list with a single entry.

Automatically Finding the Smallest Reproduction Case Using FsCheck

Now notice this comment produced by FsCheck: Falsifiable, after 1 test (2 shrinks) in the last scenario. This process, called “shrinking,” allows FsCheck to work backwards through the series of random inputs it generated to determine what the minimal reproduction steps are for producing a test failure. This will come in handy once we start testing more complicated models.

I’ve had FsCheck shrink a test scenario it produced with 1000+ steps in it down to four steps before - it found a combination of operations that put a particular class into an illegal state. I would have never found that using manual testing.

Now back to our example - looks like we need to modify our precondition to specify that the length of the input collection must be greater than 1 in order to pass.

Prop.ForAll<int[]>(original =>
{
    var shuffle = original.Shuffle().ToArray();
    return (!original.SequenceEqual(shuffle))
        .When(original.Length > 1) // changed precondition
        .Label($"Expected shuffle({string.Join(",", shuffle)}) to be " +
               $"different than original({string.Join(",", original)})")
        .And(original.All(x => shuffle.Contains(x))
            .Label($"Expected shuffle({string.Join(",", shuffle)}) to contain" +
                   $" same items as original({string.Join(",", original)})"));
}).QuickCheckThrowOnFailure();

So we’ll try running FsCheck again…

System.Exception : Falsifiable, after 2 tests (0 shrinks) (StdGen (424319174,296185314)):
Label of failing property: Expected shuffle(1,1) to be different than original(1,1)
Original:
[|1; 1|]

Ugh! It still fails - this time because it inserted an identical item into multiple places inside the same collection. The output of our shuffle function for a case where all of the inputs have identical elements will be the same as the input each time. So let’s add another precondition for that as well.

Prop.ForAll<int[]>(original =>
{
    var shuffle = original.Shuffle().ToArray();
    return (!original.SequenceEqual(shuffle))
        .When(original.Length > 1 && original.Distinct().Count() > 1)
        .Label($"Expected shuffle({string.Join(",", shuffle)}) to be " +
               $"different than original({string.Join(",", original)})")
        .And(original.All(x => shuffle.Contains(x))
            .Label($"Expected shuffle({string.Join(",", shuffle)}) to contain" +
                   $" same items as original({string.Join(",", original)})"));
}).QuickCheckThrowOnFailure();

And finally, we should have a passing test.

Ok, passed 100 tests.

Yay! It passed! We can now rest assured that our shuffle function works for the 100 randomly generated scenarios FsCheck produced. You can increase the number of tests if you need to.
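How? Treat this as an assumption to verify against the FsCheck docs for your version, but in the FsCheck 2.x C# API you can replace QuickCheckThrowOnFailure() with a Check(...) call that takes a custom Configuration:

// Assumption: FsCheck 2.x C# API - verify against the docs for your version.
var config = Configuration.QuickThrowOnFailure; // same throw-on-failure behavior as before
config.MaxNbOfTest = 1000;                      // run 1,000 random cases instead of the default 100

Prop.ForAll<int[]>(original =>
{
    var shuffle = original.Shuffle().ToArray();
    return (!original.SequenceEqual(shuffle))
        .When(original.Length > 1 && original.Distinct().Count() > 1);
}).Check(config);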

Read the next post: Writing Better Tests Than Humans Can Part 2: Model-based Tests with FsCheck in C#.
