Wednesday, November 5, 2014

Stability world map

While working on a tech talk about stability, I made a colorized version of the diagram of pattern and antipattern relationships from Michael Nygard's "Release It".


Thursday, October 16, 2014

Memento

Just to have something to remember when I'm old - this is how a COM component is invoked from .NET.


Friday, October 3, 2014

Frankenbot

A Frankenstein-style robot we made at the latest hackathon.

It is based on a Mirobot chassis, equipped with two stepper motors and a servo to handle a marker, and controlled by a Raspberry Pi.

Initially we planned for it to draw on a whiteboard - the magnets that hold the construction are under the bottom platform. However, the wheels are too big, and because of their friction against the chassis the motors couldn't turn them properly - the robot is heavier than we expected.

The idea was to replicate a user's drawing on a tablet directly on the whiteboard.

By the end of the day we managed to draw the company name without lifting the marker up :)

Friday, September 26, 2014

NullReferenceException in PipelineModuleStepContainer

One fine day a complex but pretty reliable application failed with the following exception report:


So I had to spend some days and nights searching for the reason.

The Internet was not much help - it mostly reports issues related to registering in Application_Start. That was not the case here, because the application had been running in production for some time.

(Actually, I should have made up my mind when I saw those pages - because the application in "global.asax.cs" is only a special kind of module).


OK, so I did some research. It included debugging during the failure, analyzing the dump file from top to bottom and reverse-engineering HttpApplication. I'll skip all those details and proceed to the results.


  1. In integrated mode, IIS manages the pipeline. The runtime just receives notifications from IIS, each of which says which module/handler pair should handle the event.
  2. During first-time app initialization, the runtime registers event handlers in IIS, specifically by calling webengine4.MgdRegisterEventSubscription.
  3. A registration has the form [moduleName, eventType, moduleIndex], where moduleIndex is the index of the PipelineModuleStepContainer instance in the HttpApplication._moduleContainers array.
  4. The runtime maintains a pool of HttpApplication instances (HttpApplicationFactory._freeList). If all app instances are busy processing requests, it creates a new instance, called a “normal” instance.
  5. During initialization of a normal instance, event handlers populate HttpApplication._moduleContainers, which is later used for event handling.
  6. When handling a request, HttpApplication relies on data integrity and doesn’t check for a missing handler. It basically evaluates this._moduleContainers[moduleIndex]._moduleSteps[eventIndex], which triggers a NullReferenceException when the handler is absent.



In the diagram above, IIS contains a registration for the BeginRequest and AuthenticateRequest handlers with module index = 6. At the same time, HttpApplication doesn't contain such handlers. So we have a bang.


This gives us a rule of thumb for this class of errors:


A NullReferenceException occurs if a module subscribed to an event during first-time initialization but, for some reason, did not subscribe during the initialization of another HttpApplication instance.

In my case, NewRelic's module caused the issue.
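
To make the rule concrete, here is a toy Python model of the mechanism. It is only a sketch under my own assumptions: the classes App, ModuleContainer and IisRegistry and the "Flaky" module are invented for illustration and are not the real System.Web types, and Python's AttributeError stands in for NullReferenceException.

```python
class ModuleContainer:
    def __init__(self):
        # event name -> list of handlers registered by the module
        self.steps = {}


class IisRegistry:
    """Holds (module_name, event, module_index) tuples, filled only once."""
    def __init__(self):
        self.subscriptions = []


class App:
    """Illustrative stand-in for an HttpApplication instance."""
    def __init__(self, modules, registry, first_time):
        # One slot per module; a slot stays None until the module subscribes.
        self.containers = [None] * len(modules)
        for index, (name, init) in enumerate(modules):
            for event, handler in init():
                if self.containers[index] is None:
                    self.containers[index] = ModuleContainer()
                self.containers[index].steps.setdefault(event, []).append(handler)
                if first_time:
                    # Only the first-time initialization registers with "IIS".
                    registry.subscriptions.append((name, event, index))

    def handle(self, event, index):
        # No check for a missing container, as in the description above.
        return [handler() for handler in self.containers[index].steps[event]]


def make_flaky_module():
    """A module that subscribes on the first init only (hypothetical)."""
    calls = {"count": 0}

    def init():
        calls["count"] += 1
        if calls["count"] == 1:
            return [("BeginRequest", lambda: "handled")]
        return []  # does not subscribe on later instances

    return init


registry = IisRegistry()
modules = [("Flaky", make_flaky_module())]
first = App(modules, registry, first_time=True)
second = App(modules, registry, first_time=False)

name, event, index = registry.subscriptions[0]
print(first.handle(event, index))
try:
    second.handle(event, index)
except AttributeError as exc:
    print("second instance failed:", exc)
```

The first instance handles the event, while the second one fails on the very same registration, because its container slot was never populated.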



Wednesday, July 16, 2014

The Case Of Lost Files


Overview


Recently a customer asked us to investigate a strange case in their system.

The system itself is a set of web services interacting with each other. Let's say there is a "Designer" service which maintains some design artifacts: a template in the form of an XML file plus related images and texts. A "Generator" service ingests those artifacts and generates a PDF file. When a user makes a particular 'download' request to "Designer", the latter calls "Generator", which produces a file, puts it into file storage and returns the full path to the generated file. "Designer" then takes that file and sends it to the user, completing the request.

Now to the point. Sometimes users encountered an intermittent bug: once "Designer" has made the call and received a path, it tries to access the file and gets a "FileNotFound" exception.

Some additional information: the system runs as IaaS in the Azure cloud, and the error appeared after moving to Chinese Azure.


Tuesday, November 19, 2013

Vim in Visual Studio tools

Here is what I use to launch vim from Visual Studio:


// ampersand allows me to press 
// Alt-T-1 to run vim
Title: &1 Vim 

// 1. servername allows all files 
//    to be edited inside the same vim instance
// 2. 'g' before the name: not sure 
//    if it's still needed, but at the 
//    time I wrote it, it helped to bring 
//    the vim window to the front (because the 
//    name starts with 'g', as in gvim).
// 3. we also set the cursor to the same position
// 4. ItemPath - also allows 
//    editing project/solution files
Arguments: --servername g$(SolutionFileName) --remote-silent  "+call cursor($(CurLine), $(CurCol))" $(ItemPath) 

// starting in the solution dir allows 
// opening all solution files/dirs in a 
// navigation plugin, like NERD tree.
Initial directory: $(SolutionDir)

Monday, July 15, 2013

Force git to show diverged branches log

It looks like it's not an easy task to make git show the history of two branches simultaneously, starting from the divergence point.

Consider the following diverging branches:

$ git log  --all --graph --format="%h %d %s"
* c4c7035  (HEAD, master) commit #10 master
* f41150e  commit #9 master
* 41af5d9  Commit #5 master
* 97a530f  commit #4 master
| * 271d605  (new_branch) commit #8 branch
| * 52409b4  commit #7 branch
| * ad3d575  commit #6 branch
|/
* 4a24c6d  commit #3 master
* e21ca0c  commit #2 master
* f02ecbd  commit #1 to master


Wednesday, October 31, 2012

Case of drop-down #2147483647

It began when our QA server stopped responding to any requests, including pings. The virtual machine console showed that the server was still running, which was weird. We restarted it and started an investigation.

It looked like a disaster, really. There were no crash dumps, no log files, nothing. It had just stopped working. So we decided to wait for the situation to recur.

And it did occur again in about 10 minutes: the w3wp.exe process started to consume memory with an eagerness no one could imagine or explain. Within a minute memory usage had grown from 300 MB to 1.2 GB and kept growing. That was good news from a certain point of view - we had reproduced the strange situation. On the other hand, we couldn't do a lot, because the computer became pretty unresponsive.

An attempt to create a crash dump using Process Explorer or Task Manager failed with a strange message:

Only part of a ReadProcessMemory or WriteProcessMemory 
request was completed.

WinDbg showed that there were about 1M SelectListItem objects. That is not something we see every day. It was obvious that the application had tried to create a long select list - but the question was why.

After some investigation we found the answer. The problem was caused by a form where new tasks (for students) were entered. It had a "Max mark" field, with a default value of 10, which specified the maximum mark for a particular task.


The author of the feature didn't know whether there should be an upper limit for this "Max mark", so, logically, he set it to Int32.MaxValue. And, logically, during testing the QA person saw a validation message that Max mark "Must be between 1 and 2147483647". And... entered 2147483647. Why, of course she should have entered that.

To simplify entering marks, all allowed values are presented as a drop-down list from 0 to the maximum value - and here the application tried to generate a drop-down list with 2147483647 items, consuming up to 2 GB of memory.

Although it was easy to fix, it was an interesting experience.
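
For a sense of scale, and as one possible guard, here is a small Python sketch. It is not the fix we actually applied: the mark_options function and the limit of 100 items are assumptions made up for this illustration.

```python
INT32_MAX = 2**31 - 1
MAX_DROPDOWN_ITEMS = 100  # hypothetical limit, chosen only for this example


def mark_options(max_mark, limit=MAX_DROPDOWN_ITEMS):
    """Return the drop-down values 0..max_mark, or None if the list would be too long."""
    if max_mark < 0:
        raise ValueError("max_mark must be non-negative")
    if max_mark + 1 > limit:
        return None  # the caller should render a plain numeric input instead
    return list(range(max_mark + 1))


# Values 0..INT32_MAX give 2**31 items; at just one byte per item that is 2 GiB.
print((INT32_MAX + 1) / 2**30)    # 2.0
print(len(mark_options(10)))      # 11
print(mark_options(INT32_MAX))    # None
```

The point is that the size check happens before anything is allocated, so an extreme "Max mark" degrades to a simpler input instead of exhausting memory.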







Thursday, June 7, 2012

HttpResponse.Flush()

When calling HttpResponse.Flush(), keep in mind:

  • it sets the "Transfer-Encoding: chunked" header if:
    • it's not a "finalFlush" and
    • the protocol is HTTP/1.1 (which is the norm today) and
    • transfer encoding is not set and
    • content-length is not set and
    • the status code is 200.
  • "finalFlush" is invoked only internally, after the handler has processed the request.

And what was most important for us: the output cache was disabled when the transfer encoding had been set by a Flush() call.
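
The conditions listed above can be written down as a single predicate. This is a Python restatement of the list, not the actual System.Web code; the function name and parameters are made up for the illustration.

```python
def flush_sets_chunked(final_flush, http_version, transfer_encoding,
                       content_length, status_code):
    """Return True when, per the list above, Flush() would switch to chunked encoding."""
    return (
        not final_flush
        and http_version == "HTTP/1.1"
        and transfer_encoding is None   # transfer encoding not set yet
        and content_length is None      # content-length not set
        and status_code == 200
    )


# An explicit Flush() in the middle of a plain 200 response.
print(flush_sets_chunked(False, "HTTP/1.1", None, None, 200))
# The same call once Content-Length has been set.
print(flush_sets_chunked(False, "HTTP/1.1", None, 1024, 200))
```

Read this way, setting Content-Length before calling Flush() is one of the conditions that keeps the chunked header from being added.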

Thursday, March 29, 2012

Clearing NLog records from the database

So we ended up logging application events to a database using NLog. It went pretty smoothly, but one sunny day it took more than 5 minutes to get something from the log table. The problem was that the development environment writes verbose information there. Adding a special job to SQL Server would probably be a good idea, but we don't really want to overload it.

As a solution for the development environment, a new table [NLogLastCleanup] was added to store the time of the last table cleanup. The command which inserts new records is now also responsible for cleaning up old ones. We also pass the "cleanup" and "removal" periods via NLog parameters.

   <!--Database-->  
   <target name="database" xsi:type="Database" connectionStringName="nlog">  
    <commandText>  
     <![CDATA[  
     --insert log message  
     insert into [dbo].[NLog] ([Date], [Level], [Logger], [User], [Message], [Exception], [MachineName], [Pid])  
     values (@date, @level, @logger, @user, @message, @exception, @machinename, @pid);  
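     --  
     -- Cleanup old records  
     -- Cleanup should run once per day (@cleanupPeriod)  
     -- Remove records which are older than a week (@removePeriod)  
     -- Note: [NLogLastCleanup] must already contain one row; if it is empty,  
     -- @lastCleanup stays NULL and the cleanup below never runs  
     --  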
     declare @lastCleanup datetime  
     select top 1 @lastCleanup = [LastCleanupDate] from [dbo].[NLogLastCleanup]  
     if @lastCleanup < DATEADD(Day, -CAST(@cleanupPeriod as int), GETDATE())   
     begin  
          update [dbo].[NLogLastCleanup] set [LastCleanupDate] = getdate()  
          delete [dbo].[NLog] where [Date] < DATEADD(Day, -CAST(@removePeriod as int), GETDATE())  
     end   
     ]]>  
    </commandText>  
    <parameter name="@cleanupPeriod" layout="1" size="4" scale="0"/>  
    <parameter name="@removePeriod" layout="7" size="4" scale="0"/>  
    <parameter name="@date" layout="${date:format=yyyy\-MM\-dd HH\:mm\:ss.fff}"/>  
    <parameter name="@machinename" layout="${machinename}" />  
    <parameter name="@pid" layout="${processid}" />  
    <parameter name="@level" layout="${level}"/>  
    <parameter name="@logger" layout="${logger}"/>  
    <parameter name="@user" layout="${identity:authType=false:isAuthenticated=false}" />  
    <parameter name="@message" layout="${message}"/>  
    <parameter name="@exception" layout="${plainExceptionFormat}"/>  
   </target>
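
The cleanup decision inside the command text above can be modelled in a few lines of Python. This is only a sketch of the rule, not something we run: the function name is hypothetical, a list of timestamps stands in for the [NLog] table, and the defaults mirror the @cleanupPeriod and @removePeriod values from the config.

```python
from datetime import datetime, timedelta


def insert_with_cleanup(records, last_cleanup, now,
                        cleanup_period_days=1, remove_period_days=7):
    """Append a record dated `now`, then apply the same cleanup rule as the SQL.

    Returns the new (records, last_cleanup) pair.
    """
    records = records + [now]
    # Cleanup runs only if the previous one is older than the cleanup period.
    if last_cleanup < now - timedelta(days=cleanup_period_days):
        cutoff = now - timedelta(days=remove_period_days)
        records = [date for date in records if date >= cutoff]
        last_cleanup = now
    return records, last_cleanup


now = datetime(2012, 3, 29, 12, 0, 0)
old, recent = now - timedelta(days=10), now - timedelta(days=1)

# Last cleanup was two days ago: the 10-day-old record is removed.
records, last = insert_with_cleanup([old, recent], now - timedelta(days=2), now)
print(len(records), last == now)

# Last cleanup was an hour ago: nothing is removed yet.
records, last = insert_with_cleanup([old, recent], now - timedelta(hours=1), now)
print(len(records), last == now)
```

Most inserts therefore pay only for one extra lookup of the last cleanup time, and the delete itself runs at most once per cleanup period.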