• Sun, Jan 28, 2018

    I noticed something today: In Chrome, accessing http://127.0.0.1/ will disable some extensions (like AdBlock and Ghostery), while http://localhost/ will not.

    Who knew?!

    Write me a note if this is the same in other browsers. Or alternatively, if I’m missing something obvious and this is totally wrong.

  • Thu, Jan 25, 2018

    So you have an SSH session that’s locked up. Ctrl-C doesn’t work. What to do, aside from closing your terminal?

    Indeed, there is a way. On a new line (press Enter first), type ~. (that is, a tilde followed by a period). Instagib.

    The more you know!

  • Wed, Jan 3, 2018

    “Ohai Azure Portal, how I’ve missed you!” – said no one ever.

  • jq is a Swiss Army knife for working with JSON. It is especially handy for processing the output of CLI tools, such as curl calls to JSON APIs, or the aws and az CLIs.

    I wanted to get a nice list of public IP addresses of my EC2 instances, together with instance names. I could have used boto for this, but the combo of AWS CLI and jq turned out to be a simple and effective one-liner (split for better wrapping).

    aws ec2 describe-instances | jq '.Reservations[].Instances[] |
      {(.Tags[] | select (.Key == "Name") | .Value): .PublicIpAddress}' |
      jq -s add
    

    produces:

    {
      "foo": "54.131.121.177",
      "bar": "52.75.8.58",
      "baz": "34.228.156.28"
    }
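
    For comparison, here is a rough sketch of the same transformation in plain Python, applied to the parsed JSON that aws ec2 describe-instances prints. The function name and the sample data are made up for illustration; only the Reservations / Instances / Tags / PublicIpAddress keys come from the one-liner above. This sketch simply skips instances that have no Name tag.

```python
import json


def names_to_public_ips(describe_instances):
    """Map each instance's Name tag to its public IP address."""
    result = {}
    for reservation in describe_instances.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for tag in instance.get("Tags", []):
                if tag.get("Key") == "Name":
                    # PublicIpAddress may be absent (e.g. a stopped instance)
                    result[tag["Value"]] = instance.get("PublicIpAddress")
    return result


# Hypothetical sample, shaped like the CLI output
sample = {
    "Reservations": [
        {"Instances": [{"PublicIpAddress": "54.131.121.177",
                        "Tags": [{"Key": "Name", "Value": "foo"}]}]},
        {"Instances": [{"PublicIpAddress": "52.75.8.58",
                        "Tags": [{"Key": "env", "Value": "prod"},
                                 {"Key": "Name", "Value": "bar"}]}]},
    ]
}

print(json.dumps(names_to_public_ips(sample), indent=2))
```

    To feed it real data, something like json.load(sys.stdin) on the piped CLI output should do.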
    
  • Fri, Oct 13, 2017

    Azure Functions can look at blob storage and react to things.

    But actually not really all that well.

    Excerpt from the Documentation:

    When you’re using a blob trigger on a Consumption plan, there can be up to a 10-minute delay in processing new blobs after a function app has gone idle. After the function app is running, blobs are processed immediately. To avoid this initial delay, consider one of the following options:

    • Use an App Service plan with Always On enabled.

    • Use another mechanism to trigger the blob processing, such as a queue message that contains the blob name. For an example, see Queue trigger with blob input binding.

    Let’s deconstruct this a bit.

    The important parts are the "Consumption Plan" vs "App Service", and how those relate to the Always On mode.

    See, Azure Functions have two methods of operation (“plans”). The “Consumption” plan executes the function only when triggered. So if nothing is calling it, the function will go to sleep. A Function runs ephemerally and you need not think of its underlying resources whatsoever, aside from paying per invocation.

    The App Service plan, on the other hand, launches a VM that will host your functions, and that VM remains running. You don’t need to directly manage it (nor can you), but you are being charged for all the minutes it’s humming away. Also, unlike the Consumption plan, you need to manage autoscaling yourself.

    Only on the App Service plan are you given the option to enable “Always On”, which will prevent your function apps from going to sleep.

    So in contrast to the probably familiar pattern of AWS Lambda being triggered by a change in an S3 bucket, Azure Blob storage doesn’t immediately trigger your function when a blob changes, unless the function app is already running. Otherwise, you are waiting for the scheduled wake-up window (feel free to correct me on Twitter if I am misunderstanding something). I personally find this behaviour to be super confusing, and inferior to what the rest of the cloud has come to expect of “serverless” patterns.

  • Sun, Oct 1, 2017

    Good morning. Today we will take the terms “domains”, “fault”, and “update”, and make it sound more sophisticateder because competitive advantage.
    - Azure marketing people, probably

    I mean, it’s good they have thought of this. It’s even on the exam. But really, as the user of Azure, I don’t need to care about how they power their racks and in what order they are restarted. I care about stability of my VMs, but it’s ok to leave the mechanics of fault-tolerance to be a black box. For the most part, it would suffice for me to know that if I launch a group of 3 machines, I’ll have almost 3 machines running most of the time. I don’t have any control over this anyway, so those “domains” are trivia and implementation details.

    That aside, Microsoft’s general aversion to visual presentation of data rears its ugly head here once again. They could have designed the UX around this as a nice grid, with current status of each slot in the fault/update domain, etc. Could’ve even put this next to each VM. But no. Everything must look like a spreadsheet.

    The important takeaway of the entire feature: for the best balance of availability and cost, try to scale your VMs horizontally in multiples of 5 (N % 5 == 0), because that’s how many update domains exist by default. With N < 5 you’re not utilizing the full fault-tolerance potential. With 5 < N < 10 you are overprovisioning some of those update domains.
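
    To make the arithmetic concrete, here is a small sketch that spreads N VMs across 5 update domains. The round-robin placement is my assumption for illustration, not a documented guarantee of how Azure assigns VMs, and the function name is made up.

```python
from typing import List

UPDATE_DOMAINS = 5  # the number of update domains this post works with


def vms_per_update_domain(n_vms: int, domains: int = UPDATE_DOMAINS) -> List[int]:
    """Place VMs round-robin and return how many land in each update domain."""
    counts = [0] * domains
    for vm in range(n_vms):
        counts[vm % domains] += 1
    return counts


for n in (3, 5, 7, 10):
    print(n, vms_per_update_domain(n))
# 3  -> [1, 1, 1, 0, 0]   two update domains sit empty
# 5  -> [1, 1, 1, 1, 1]   one VM per update domain
# 7  -> [2, 2, 1, 1, 1]   uneven: some domains carry twice the load
# 10 -> [2, 2, 2, 2, 2]   even again
```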

  • Tue, Sep 26, 2017

    – Hey, we need to do a deployment.

    5 developers swarm in to participate in the process. The fun begins with importing CSVs into Azure tables, a trivial task that we’ve yet to automate. Then off we go to deploy the application. Each deployment is a special snowflake - some services get updated, some not… We set the dials and hit the button. After all, “Those progress bars ain’t gonna watch themselves” (© Stan). All seems to be going well for 15 minutes…

    …until someone realizes that – apparently – something else had to be deployed first.

    – uh, can’t you cancel it?

    uh, NOPE.

    so we wait for this deployment to complete, because canceling a running deployment is bad luck (trust me). Then we deploy the prerequisite (it’s an ARM template, FYI). Finally, we’re ready to deploy the original application, and so we push the button and twiddle thumbs…

    until 20 minutes later:

    hey, did we change that variable?

    guess what ensues?… correct - all sorts of good times.

    fast forward, and the redeployment of the prerequisite is done. We’re into the 2nd hour of this extravaganza now, by the way. We go back to re-re-deploy the apps (3rd time if you’re keeping count). This is it, and then we’re done, right?

    riiight.

    As is tradition, there exist three post-deployment scripts, which must be manually run on a snowflake box, as a final sacrifice of tears to the great pool of entropy. This involves, uh, pasting the actual scripts into that thing over there, complete with executable paths and all. I don’t know but I’ve been told, this will take like 3 hours to run.

    And it would indeed… only if someone didn’t

    restart the remote-script-runner-service-thingamajig because it was being slow.

    Suddenly what happens? Pop quiz? I hope you said “those script processes are now disowned”, which they are, they bloody are. They are running, but apparently either doing nothing, or something useless. Logs? What if I told you: there are no logs.

    Long story short (not really): I get on chat with the server admin. He checks the box for me, does admin things. We have no visibility into what the scripts are doing. We leave them alone for the time being.

    And a few hours later, they are still humming along…


    There are some takeaways from here. something something.

  • Tue, Aug 8, 2017

    Some of these tech vendors need to grow a pair.

    I found an email of this nature in my inbox today:

    Dearest Eugene,

    Our sincerest apologies… (sob)… for previously mistakenly sending you this email about some industry event that we’re hosting. We realize you have not registered, and yet the message was dispatched to you. HOW COULD WE. For the inconvenience that we have caused you, and all the confusion - we are so, so sorry!.. We truly understand the rollercoaster of feels that you’re experiencing at this very moment! We want - nay, need! - to do better. You DESERVE better. We admit, this was a human error. We made a mistake. SUCH a mistake… And, indeed, we do so sincerely hope you still choose to remain on our mailing list. Please, please let’s stay friends! However, if you choose not to (ohnoes!!)…. We will be sad. So very sad, yet understanding of you clicking this unsubscribe link. ( pleeeezdontgoooooooooooo )

    (Of course I’m exaggerating, but only slightly).

    seriously, what IS this shit.

    You’d think this was an apology for poisoning my prized breeding ferret, or another unthinkable evil of similar magnitude. But alas, a mere mis-addressed email message had been the trigger of such profuse expressions of remorse.

    Now I face a dilemma, don’t you see? On one hand, some of their content is kind of interesting. But on the other, this type of submissive servility is a turn-off of unsubscribeable magnitude.

    Dear tech copywriters of the Web 2.x. We are not children, kk? Most of us don’t need a safe space to cry over your accidental emails. KTHXBAI.

    Though on second thought, if you really feel the need to apologise - nothing short of a phone call shall suffice. Ladies and gentlemen - start your Asterisks!

  • Accidentally discovered that EC2 security groups do not terminate an open connection (like SSH) when the security group rules or membership change. New connections will be prevented, but this will not terminate established ones.

    See for yourself:

    • create an EC2 instance and give it a security group
    • add an ingress rule on port 22
    • SSH into it
    • change the security group: remove the instance from that group altogether, or just change the ingress rule
    • observe how the SSH connection remains open

    Tested this for N hours and SSH connection did not get terminated. So if someone is in your boxen, you can’t kick them out that way.
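
    The behaviour is what you would expect from a stateful firewall that tracks connections. Below is a toy model of that idea, purely as an illustration: the class and method names are invented, and this is not a claim about how AWS implements security groups. Rules are consulted only when a connection is opened. Traffic on an already-tracked flow passes without looking at the rules again.

```python
class ToyStatefulFirewall:
    """Toy connection-tracking firewall (illustrative only)."""

    def __init__(self):
        self.allowed_ports = set()  # current ingress rules
        self.established = set()    # tracked (source, port) flows

    def allow(self, port):
        self.allowed_ports.add(port)

    def revoke(self, port):
        # Removing a rule does not touch flows that are already tracked
        self.allowed_ports.discard(port)

    def connect(self, source, port):
        """New connection: checked against the current rules."""
        if port in self.allowed_ports:
            self.established.add((source, port))
            return True
        return False

    def send(self, source, port):
        """Existing connection: only the tracking table is consulted."""
        return (source, port) in self.established


fw = ToyStatefulFirewall()
fw.allow(22)
print(fw.connect("10.0.0.5", 22))  # True: rule allows SSH
fw.revoke(22)
print(fw.send("10.0.0.5", 22))     # True: the old session keeps working
print(fw.connect("10.0.0.6", 22))  # False: new connections are refused
```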

    Heed the warning and plan accordingly.


    update 2017-11:

    Apparently Azure NSGs have the same flaw. Not even surprised.

  • Mon, Jan 2, 2017

    So you get an idea, and it’s an amazing one. You’re inspired to start working on it right away…. But what if someone has already made this? Does this mean your idea is dead?

    And so you rush to the interwebs, and prepare to search…. But wait!

    Until you’ve made the search to determine originality of your idea, it is simultaneously both dead and alive.

    Meow.

  • Fri, Dec 30, 2016

    I’ve been thinking of technical debt lately, and came up with some interesting (or at least cute) parallels:

    • Technical debt doesn’t just accumulate. It incurs interest, in the form of dependencies that will break and will then need to be refactored to repay this debt in full.

    • Perhaps you can also default on technical debt. That’s when things get so far out of hand that you scrap the whole thing and rewrite/rebuild it from scratch. Engineers often love the “let’s rebuild it, for realsies this time!” approach. Having seen some shit, I don’t always disagree.

    • Finally… you can go technically bankrupt. That’s when you lose it completely, give up, flip the table, and fire up the ol’ QBASIC.

      10 PRINT "LOL"
      20 GOTO 10
      

    (If that’s you - maybe get help. Maybe from someone like me.)

  • Fri, Dec 16, 2016

    It took me 3 years to grok callbacks in Node.js. Not something I’m proud to admit.

  • Wed, Dec 14, 2016

    Propensity for dogmatic brand loyalty in a person is indicative of their critical thinking skills.

  • Sat, Dec 10, 2016

    Recovering my better half’s system drive. Her OWC 480GB SSD was allowed to reach 100% capacity (only 1.5GB remain)… And all hell broke loose. It’s barely readable (takes 10 minutes just to mount on my laptop), and I can’t even delete anything (any attempt to modify the filesystem just returns an invalid argument).

    I’m currently rsync-ing all the things to a network volume, and will attempt to deal with this after the data is safe.

    That machine was long overdue for a refresh anyway.

    Lessons *:

    • When setting up an SSD, make sure to enable TRIM. Windows | macOS (≥10.10.4).
      • OWC seems to discourage the use of TRIM, citing “garbage collection”. I believe those are different things, but more research is needed.
    • Leave unpartitioned space on the SSD. The accepted guideline seems to be ≈10% of total capacity
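
    As a quick worked example of that ≈10% guideline, applied to the 480GB drive from this post (the function name is made up, and the 10% figure is the rule of thumb above, not a vendor spec):

```python
def usable_and_reserved_gb(capacity_gb, reserve_percent=10):
    """Split a drive into (partitioned, unpartitioned) sizes in GB."""
    reserved = capacity_gb * reserve_percent / 100
    return capacity_gb - reserved, reserved


usable, reserved = usable_and_reserved_gb(480)
print(usable, reserved)  # 432.0 48.0
```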

    * Disclaimer: I take no responsibility whatsoever for any effects, including but not limited to loss of data, caused directly or indirectly by this blog post.

  • Thu, Dec 8, 2016

    Today I was testing how a SaaS that we use for search handles failover. I clobbered our DNS to make our app use their secondary endpoint. The module in our app handled this very well, and switched over seamlessly.

    But then a thought crossed my mind: did this affect the Ops folks on the other end? Did someone see a blip on the charts, and ask the person next to them:

    hey dude, are you seeing weird traffic on node B13? dafuq is that?…

    Or… was it simply missed? unnoticed, lost in the shuffle - because let’s face it: ain’t nobody got time for this.

    I guess the former would be nice, because that would mean those packets of mine were just a little more special than the rest of the internet noise.

    Yet I do hope for the latter…

    Because Karma.
