A lesson in being too clever

For a project I’m working on I need to move a dataset from a remote server to a local server for analysis. The dataset may contain several million rows.

My test dataset is 1.1 million rows with three fields of interest. (The real dataset will have many more fields.) All fields are assumed to be positive.

  • Field 1: an 8-digit integer, repeating or increasing slowly
  • Field 2: an integer likely between 1 and 20
  • Field 3: a decimal number with two digits of precision, likely below 10,000

We move a lot of data around in JSON format, so I started by implementing my transfer in JSON. I knew it would be an unacceptably large file. It was 62 MB. Next I gzipped the file to see how much that helped. My mental heuristic for gzipping JSON is that you get about a 90% reduction. This large file did slightly better, compressing about 93% down to 4.5 MB. Not too bad.
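The baseline step looks roughly like this (a sketch, assuming the rows are already loaded into a rows collection; this is a reconstruction, not the original code):

using System.IO;
using System.IO.Compression;
using System.Text;
using System.Text.Json;

// Serialize the rows to JSON, then gzip the bytes to a file.
byte[] jsonBytes = Encoding.UTF8.GetBytes(JsonSerializer.Serialize(rows));
using (var file = File.Create("data.json.gz"))
using (var gzip = new GZipStream(file, CompressionLevel.Optimal))
{
	gzip.Write(jsonBytes, 0, jsonBytes.Length);
}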

Then I got clever.

Since the decimal field only has two digits of precision, I multiplied it by 100 and treated it as an integer. Now, with three integers, I transformed my data into a byte array (14 MB) and compressed that (3.2 MB). Hah, I beat JSON compression!
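The packing step looks roughly like this (a sketch, assuming each record is a tuple of the three fields):

using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Pack each record as three 4-byte integers, scaling the decimal by 100,
// then gzip the stream. 1.1 million rows at 12 bytes each is roughly the 14 MB above.
static byte[] PackAndCompress(IEnumerable<(int f1, int f2, decimal f3)> rows)
{
	using var output = new MemoryStream();
	using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
	using (var writer = new BinaryWriter(gzip))
	{
		foreach (var (f1, f2, f3) in rows)
		{
			writer.Write(f1);
			writer.Write(f2);
			writer.Write((int)(f3 * 100)); // two digits of precision fit in an int
		}
	}
	return output.ToArray();
}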

Now, I know gzip performs better on repeated data, and I knew the first field in the dataset was a slowly increasing number: most records are either the same as the previous one or between 1 and 19 higher. To take advantage of this fact, I transformed the first field into a delta. So if the data was this:

  • 10191378
  • 10191385
  • 10191385
  • 10191392
  • 10191408

After my transform the dataset became:

  • 10191378
  • 7
  • 0
  • 7
  • 16

After compressing this transformed dataset, it was only 1.87 MB. Exciting!
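The transform itself is tiny; a sketch of one way to write it:

using System.Collections.Generic;

// Emit the first value as-is (previous starts at 0), then each subsequent value
// as its difference from the one before. Decoding is just the running sum.
static IEnumerable<long> DeltaEncode(IEnumerable<long> values)
{
	long previous = 0;
	foreach (var value in values)
	{
		yield return value - previous;
		previous = value;
	}
}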

Then I realized what I should have realized at the start. I have a tabular dataset, and there’s a well-known format that already exists for that: CSV.

So I rendered my dataset as CSV (25 MB) and compressed it (3.4 MB). That’s definitely better than JSON and very comparable to the byte array.

Then I applied my delta transform to the first column of the CSV and recompressed it. The result: 1.83 MB. That’s about 40 KB smaller than my “clever” solution, and CSV is far more adaptable than byte arrays. I wouldn’t even need to transform decimals into integers.
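The CSV version of the delta is just as simple (again a sketch, assuming the same rows collection; note the decimal column is written untouched):

using System.IO;

using var writer = new StreamWriter("data.csv");
long previous = 0;
foreach (var (f1, f2, f3) in rows)
{
	// Write the delta for field 1; fields 2 and 3 are unchanged.
	writer.WriteLine($"{f1 - previous},{f2},{f3}");
	previous = f1;
}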

So my lesson learned is that widely used file formats are worth using after all.

A small amount of time invested in thinking about the constraints of this test dataset reduced the compressed file size by almost 50%. On the real dataset with more columns, though, I suspect the gains from applying a delta transform to a single column will be much less significant.

Scripting database changes

On my team we use SQL Server Database Projects to version our databases. We previously used EF6 for our data layer but recently started using EF Core, which gave me the opportunity to optimize my workflow. Here are the steps we would manually take to modify the schema:

  1. Open the database project and alter the schema.
  2. Build the project.
  3. Publish the project to a local test database.
  4. Make the corresponding change in the EF Core model.

I decided to script these steps and add the script as a shortcut in Launchy. Here is the script:

"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\msbuild.exe" C:\code\prism\DB\PrismDB.sln /property:Configuration=Release

if %errorlevel% neq 0 exit /b %errorlevel%

"C:\Program Files\Microsoft SQL Server\150\DAC\bin\SqlPackage.exe" /Action:Publish /SourceFile:"C:\path\to\sql\project\bin\Release\MyDatabase.dacpac" /TargetServerName:localhost /TargetDatabaseName:MyDatabase /p:AllowIncompatiblePlatform=True /p:BlockOnPossibleDataLoss=False /TargetTimeout:120 /p:ScriptDatabaseOptions=False

if %errorlevel% neq 0 exit /b %errorlevel%

pushd C:\path\to\sql\project

dotnet ef dbcontext scaffold "Server=localhost;Database=MyDatabase;Trusted_Connection=True;" Microsoft.EntityFrameworkCore.SqlServer --output-dir ../Data/Entities --context Entities --force --project Data.Dummy

if %errorlevel% neq 0 exit /b %errorlevel%

My workflow with the script is:

  1. Open the database project and alter the schema.
  2. Press Alt+Spacebar and type database to execute my script.
  3. Go back to coding in Visual Studio and everything is up to date.

To improve the error experience, you can create a second script that calls the script above with cmd’s /k switch, which prevents the command window from closing at the end so you can read any errors.

cmd /k "Database Run All Inner.bat"

Custom metric in Application Insights

Tracking custom metrics in Application Insights is easy. I wanted to track how long our cash register takes to print receipts so I could compare performance across hardware, make better recommendations to our sales team, and diagnose customer issues related to printing speed.

You will need a TelemetryClient instance. Use the GetMetric() method to get or create a metric by name. You can use the overloads to provide names for additional custom dimensions. In this case I am tracking the receipt number and the number of images printed on the receipt.

Call TrackValue() to add a new measurement. The TelemetryClient will aggregate the metrics over time and report them to Application Insights. The default interval appears to be 54 seconds.

In my case, aggregation is not doing much since each printed receipt has unique dimensions and a register is not likely to print more than one receipt every 54 seconds.

var metric = _telemetry.GetMetric("PrintReceiptDurationMs", "ReceiptNumber", "ImageCount");
metric.TrackValue(sw.ElapsedMilliseconds, receipt.ReceiptNumber, imageCount.ToString());
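For context, here’s roughly how the pieces fit together (a sketch; the class shape, Receipt type, and print step are stand-ins, not from the original code):

using System.Diagnostics;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.Extensibility;

public class ReceiptPrinter
{
	private readonly TelemetryClient _telemetry = new TelemetryClient(TelemetryConfiguration.CreateDefault());

	public void Print(Receipt receipt, int imageCount)
	{
		var sw = Stopwatch.StartNew();
		// ... send the receipt to the printer ...
		sw.Stop();

		// Two custom dimensions: receipt number and image count.
		var metric = _telemetry.GetMetric("PrintReceiptDurationMs", "ReceiptNumber", "ImageCount");
		metric.TrackValue(sw.ElapsedMilliseconds, receipt.ReceiptNumber, imageCount.ToString());
	}
}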

In Log Analytics you can now query for the results.

customMetrics 
| where name == 'PrintReceiptDurationMs'
| extend receipt_number = tostring(customDimensions.ReceiptNumber)
| extend image_count = todouble(customDimensions.ImageCount)
| project value, receipt_number, image_count
| order by receipt_number desc

Or you could plot a chart.

customMetrics 
| where name == 'PrintReceiptDurationMs'
| summarize avg(value) by todouble(customDimensions.ImageCount)
| render barchart

You can query across all metrics if they share common custom dimensions.

customMetrics 
| where customDimensions.ReceiptNumber == 'RC-00092261-7'
| project name, value, timestamp 
| order by name

Prototype: Generating Vue forms from JSON

I’m fascinated with generated code. I love to write code, but when it comes to repetitive CRUD screens, nothing beats a template. Being able to quickly generate screens builds confidence with clients and gets you right into the meat of the application.

I used to build applications primarily in ASP.NET MVC. Recently I’ve started using Vue and immediately missed having input and form builders. Since I still use a C# Web API on the back end, I had to find a creative way to get the C# model from the server to the client. I did this using a modified JSON Schema. I tried several libraries but was not very happy with the extensibility of any of them. This proof of concept uses NJsonSchema.

You’re probably here for the client side. Here’s a demo. The code is in the second and third tabs.

This form requires two objects from the server: a schema object describing the form and an object containing the form data.

The type attribute comes from the C# property type. Where I was limited by NJsonSchema, I added an xtype attribute so I could pick how to render DateTimes and option lists. Select list options come from a property on the formData object, mapped from optionsFromProperty in the schema.
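On the server side, generating the schema and returning it with the data is only a few lines. A sketch with a recent NJsonSchema (the CustomerForm model, route, and repository are stand-ins):

using Microsoft.AspNetCore.Mvc;
using NJsonSchema;

[HttpGet("form/{id}")]
public IActionResult GetForm(int id)
{
	// Generate a JSON Schema from the C# model.
	var schema = JsonSchema.FromType<CustomerForm>();

	// Return the schema and the form data together.
	return Ok(new { schema = schema.ToJson(), formData = _repository.Get(id) });
}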

You can find the (currently) very ugly server side model here:
https://gist.github.com/ryanohs/781a62717325dd2897addaeb14459e98

Improvements:

For simplicity I published the demo as a single component, but I did break it into several components in my own code.

I will probably end up writing my own schema generator so I’m not constrained by the assumptions of existing ones. JSON Schema is designed for validating JSON data, not building UIs, so I’m really stretching the use case here. I would prefer to use the DataAnnotations attributes whenever possible since many tools, like EF, Newtonsoft, and ASP.NET data validation, are already capable of generating and interpreting them.

I couldn’t generate enum drop downs in this demo because NJsonSchema renders them as $ref properties which I didn’t want to interpret client-side.

It would also be great to have sensible default attributes so you can build a form directly from a plain class or EF table object without manually defining labels and enum/list data types.

In a production build scenario, you could precompile the schema to a static JSON file so that only the form data is downloaded at run time.

Thanks for reading! Let me know what features would be useful to you.

Discovering connections in code via Reflection, part 2

Ayende has a really neat post about using an AST visitor to generate workflow diagrams from code. I used that as inspiration to modify my previous pub/sub documentation generator to output GraphViz syntax. It was a trivial change.

Console.WriteLine($"{messageUsage.Publisher} -> {handler}");

I copy the output into the demo at https://www.planttext.com/ and it generates a diagram of all the messages passed in my application.

In the future, I may import the GraphViz NuGet package and generate the diagram inside my own code.

A Start menu alternative

I recently installed Launchy on my machine to automate common actions. Launchy is a program that lets you quickly execute shortcuts via the keyboard. To activate it, press Alt+Spacebar then type your shortcut. It has autocomplete and defaults to the last shortcut you ran.

It indexes your Start Menu and Quick Launch. I created an additional index and added frequently visited Chrome bookmarks as well as some batch and PowerShell scripts that I regularly use.

Some of my shortcuts:

  • JIRA (link to current sprint)
  • Backlog (link to backlog)
  • Pull Requests
  • Prod Insights (my production activity monitor)
  • Script Out DBs (batch file regenerates EF Core files from a SQL DB)
  • Vue Debug (launches a background Vue compiler and watches for changes)
  • ES6 cheat sheet

Discovering connections in code via Reflection

I wrote a .NET application that makes heavy use of the publish/subscribe pattern. In order to help other developers learn about the code base I wrote a unit test that finds all publishers and subscribers and describes how they are connected.

Each published message is a class implementing IMessage.

Each subscriber implements ISubscribeTo<TheMessageType>.
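The contracts might look something like this (a sketch; the handler method name is an assumption):

public interface IMessage { }

public interface ISubscribeTo<TMessage> where TMessage : IMessage
{
	void Handle(TMessage message);
}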

This code uses an IL reflector (source code here) to find each location where a message type is constructed (before it’s published) and type reflection to find all its subscribers. Then it builds a text document describing which methods publish each message type and what subscribes to it.

The output looks like this. One improvement would be to remove the return type from the method signature so it reads more naturally.

{Method} publishes  ==>  {message type}
	-> handled by {subscriber type}

AccountViewModel.Void Execute_NewAccountSelectedCmd()  ==>  AccountSelected
	-> CustomerDetailViewModel
AddCustomerAccountsViewModel.Void Execute_CloseCmd()  ==>  AccountSelected
	-> CustomerDetailViewModel
AppliedTenderViewModel.Void Execute_RemoveTenderCmd()  ==>  RemoveTender
	-> TransactionViewModel
AuthorizationService.User ValidateUser(System.String, System.String)  ==>  LogoutRequested
	-> RegisterViewModel
	-> TransactionViewModel
	-> HardwareService
BasketIdViewModel.Void Execute_ApplyCmd()  ==>  ApplyBasketId
	-> TransactionViewModel

The code:

public void ListOfAllPublishersAndSubscribers()
{
	Console.WriteLine("{Method} publishes  ==>  {message type}");
	Console.WriteLine("\t-> handled by {subscriber type}");
	Console.WriteLine();
	Console.WriteLine("Discovered via Reflection. Duplicates not removed.");
	Console.WriteLine();
	Console.WriteLine();

	var domain = typeof(App).Assembly;
	var pos = typeof(TransactionViewModel).Assembly;
	var assemblies = new List<Assembly>() { domain, pos };

	// Map each message type name to the names of the types that subscribe to it.
	var handlerType = typeof(ISubscribeTo<>);
	var handlersByType = assemblies
		.SelectMany(s => s.GetTypes())
		.SelectMany(s => s.GetInterfaces(), (t, i) => new { Type = t, Interface = i })
		.Where(p => p.Interface.IsGenericType && handlerType.IsAssignableFrom(p.Interface.GetGenericTypeDefinition()))
		.GroupBy(t => t.Interface.GetGenericArguments().First().Name)
		.ToDictionary(g => g.Key, g => g.Select(x => x.Type.Name));

	// Scan the IL of every method for constructor calls on types implementing
	// IMessage; each construction site is treated as a publisher of that message.
	var imessage = typeof(IMessage);
	foreach (var messageUsage in assemblies
		.SelectMany(s => s.GetTypes())
		.Where(type => type.IsClass)
		.SelectMany(cl => cl.GetMethods().OfType<MethodBase>(), (t, mb) => new { t, mb })
		.SelectMany(a => MethodBodyReader.GetInstructions(a.mb), (a, i) => new { Publisher = $"{a.t.Name}.{a.mb.ToString()}", op = i.Operand as ConstructorInfo })
		.Where(a => a.op != null)
		.Where(a => imessage.IsAssignableFrom(a.op.DeclaringType))
		.OrderBy(a => a.Publisher)
		.ThenBy(a => a.op.DeclaringType.Name))
	{
		Console.WriteLine($"{messageUsage.Publisher}  ==>  {messageUsage.op.DeclaringType.Name}");
		if (handlersByType.ContainsKey(messageUsage.op.DeclaringType.Name))
		{
			foreach (var handler in handlersByType[messageUsage.op.DeclaringType.Name])
			{
				Console.WriteLine($"\t-> {handler}");
			}
		}
		else
		{
			Console.WriteLine("\t-> NO HANDLERS");
		}
	}
}

1% Better: Budgeting

I don’t think about how I’m spending money very often. Every January I set up a spreadsheet with my projected income, expenses, and investments for the year. Each month I check my credit card balance. If it’s under $700 I know I was frugal. If it’s over $1,000 or, gasp, $2,000, I know to rein in my spending next month. Most of my major purchases are made based simply on whether I have enough cash in my checking account and whether it feels right in my gut. The following January, I compare my actual savings and investment balances with my projected values. Not surprisingly, I always come up short.

In this regard I’m probably an average member of the American middle class. But this year I was frustrated that, yet again, I came up short on my savings, so I decided to do something about it. I pay for almost everything with a credit card and I have a neglected Mint account, so I started digging through my spending history and built a spreadsheet of every penny I spent last year. What I found is that my budgeted categories were actually very accurate; however, my “one time” expenses went out of control. Spur-of-the-moment vacation trips, a nice new bed, an elective surgery, and other one-offs ate up a good chunk of my expected savings.

I made a new budget spreadsheet with realistic breakdowns by category. I also built in a “one time” expense budget that I’ll (try to) stick to. This will force me to weigh my big purchases and prevent impulse spending. It also encourages me to shop around for better deals.

One category I overspend on is food. Last year I spent about $300 per month on food. The USDA publishes a monthly food cost report, and according to it, my food budget puts me squarely in the “moderate cost” plan.

  • Thrifty plan: $183.70
  • Low-cost plan: $237.20
  • Moderate-cost plan: $297.30
  • Liberal plan: $365.30

Over the past two months, I’ve been planning menus and monitoring what I have on hand so I don’t end up in a situation where it feels easier to get fast food than to go to the grocery store before preparing dinner. It’s working. My January and February food expenses were $251 and $193. It’s one week into March and so far I’ve only spent $41. I have enough food on hand to prepare at least five or six days’ worth of meals. I can’t really tell that my eating habits have changed. I might even be eating healthier now that I’m paying attention to it.

Another step I’ve taken to control spending is paying myself first. I adjusted my 401k withholding so I’m investing as much pre-tax income as possible. This not only takes the money out of my paycheck before I ever see it, but it is also going to provide me with significant tax savings this year. I have a minimum monthly investment target I want to hit so I set up automated stock and bond purchases with my broker. Now if I want to go over my spending allowance, I’ll have to convince myself to sell assets and transfer funds around. Basically I’m increasing the friction necessary to spend money.


1% Better: Diet

It’s hard to make an abrupt change in your diet. You can stick with it for a while, but then you start to cheat a little, and then it all falls apart. Instead of an abrupt change, I’m making small changes as part of a system to improve my eating habits over the long term. I don’t have any rules like gluten free, paleo, no carb, or anything like that. I’m just starting to replace less healthy foods with healthier options. Here are a few of the meal changes I’ve made recently:

  • I gave up french fries. Today every restaurant offers an alternative sides menu. I don’t regularly eat out, but when I do, I’m only ordering an alternative side.
  • Finish that bag of salad. Whenever I buy a bag of lettuce it always goes bad before I finish it. They seem to have a shelf life of no more than three days in the fridge. Now when I buy lettuce, which I try to do on every trip to the store, I eat some at each meal to ensure I finish it.
  • I replaced my deli lunch sandwiches with pita pocket sandwiches. I generally use turkey, cheese, and mustard. When I ran out of pita pockets I made tortilla wraps.
  • I substituted ground turkey for hamburger in tacos. I use the 99% fat free turkey.
  • I used zucchini and squash as meat substitutes in a few dishes.
  • I upgraded the block cheese I snack on from the cheap stuff to the deli cheeses. They’re so much richer in flavor.
  • I know crackers are bad and loaded with simple carbs. I’m replacing some of my cheese and cracker snacks with baby carrots and hummus.
  • To get more variety of veggies (I eat lots of broccoli) I bought a bag of frozen stir fry veggies and made stir fry and rice.

Updating disconnected Entity Framework child collections

One pain point I have with Entity Framework is dealing with updating child collections on disconnected entities. The most common scenario I run into is in web APIs. I have a web page which allows a user to edit an entity which includes a child collection. New children can be added, existing children edited, and existing children deleted. All the actions are saved to the server at once. When I POST this to my API, I end up writing a lot of boilerplate to figure out what changed. To prevent that, I came up with this method. A sample usage is below. This is for EF 6.1.