MemoryBlocks For Speed: A Case Study

A few years ago I held a session about MemoryBlocks at the Xojo Developer Conference where I discussed how, generally, MemoryBlocks (and Ptrs) should be avoided except for cases where you must use them, e.g., Declare, or when speed is absolutely critical. I offered this advice because a MemoryBlock can be tedious to work with, and can lead to hard-to-trace bugs.

But when you do need that extra boost, it’s an option to consider, and I recently came across a scenario where it made a huge difference.

The (One Billion Row) Challenge

This came about from a discussion of the “One Billion Row Challenge” on the Xojo Forum, where a programmer is tasked with reading a billion rows of temperature data and consolidating it into statistics for each given city. Our friend Mike D started a project to demonstrate different techniques, and I eventually started my own project. Using MemoryBlock, Ptr, and preemptive threading, I was able to process a billion rows in roughly 8 seconds.

But that’s not the point of this post.

See, in order to process the data, you must first create it, which isn’t as straightforward as it seems.

Creating The Data

Each row of the data file takes the form “City;temp”, where “temp” is a single-place decimal between -99.9 and 99.9. To keep the set of cities bounded, I arbitrarily used 413 random cities out of a list of all cities. The original code looked something like this, where “bs” represents a BinaryStream:

Var r As New Random

For i As Integer = 1 To rowCount
  Var city As String = cities(r.InRange(0, cities.LastIndex))
  Var temp As Double = r.InRange(-999, 999) / 10.0

  bs.Write city + ";" + temp.ToString("#0.0") + EndOfLine
Next

This was simple, easy, and slow. Generating the full billion rows took about 2.5 hours.

Memory-Unblocking

Upon investigation, I rediscovered what I already knew: dealing with strings, both in conversion and concatenation, can be a bottleneck. I won’t go through all the iterations here, but nothing I tried made a significant difference. The only solution was to ditch strings entirely.

I started by creating a MemoryBlock “buffer” (“outMB”) of 1 MB with an associated Ptr (“outPtr”). (You can access the contents of a MemoryBlock through its methods, but those are function calls, which carry overhead. Ptr methods are operators that work with the bytes directly, so they are faster.) The plan was to fill the buffer as much as I could, write it to the file, then start again at the top of the buffer.
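As a minimal sketch of the two access paths, assuming a tiny 16-byte block (the names “mb” and “p” are only for illustration and do not come from the project):

```xojo
// Illustrative only: mb and p are hypothetical names.
Var mb As New MemoryBlock(16)
Var p As Ptr = mb // p refers to the same bytes that mb owns

// Through the MemoryBlock: a method call that is checked against mb.Size.
mb.Byte(0) = 65 // ASCII "A"

// Through the Ptr: a direct write with no bounds checking.
p.Byte(1) = 66 // ASCII "B"

// Both writes landed in the same memory.
Var s As String = mb.StringValue(0, 2) // the two bytes "AB"
```

The Ptr gives up the safety net in exchange for speed: it will happily write past the end of the block, and it is only valid while the MemoryBlock it came from is still alive.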

Keeping a position index, I started by writing the city using outMB.StringValue, since there is no equivalent Ptr method for this. Next, I plugged in the value of a semicolon (ASCII 59) with outPtr.Byte(outMBIndex) = 59.

Working with integers is faster than working with doubles, so I generated the temperature as an integer number of tenths, used a little integer math to split it into digits, and plugged those digits in directly using If statements and outPtr.Byte.
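To make the arithmetic concrete, here is a small worked example with a fixed value in place of the random one. The scratch buffer and the names “mb”, “p”, and “index” are hypothetical; “t1”, “t2”, and “t3” match the names used in the final code.

```xojo
// Worked example: write 23.7 as ASCII bytes without creating a String.
Const kDot As Integer = 46
Const kZero As Integer = 48

Var mb As New MemoryBlock(8) // hypothetical scratch buffer
Var p As Ptr = mb
Var index As Integer = 0

Var temp As Integer = 237 // tenths of a degree, i.e. 23.7
Var t1 As Integer = temp \ 100 // 2, the tens digit
Var t2 As Integer = (temp \ 10) Mod 10 // 3, the ones digit
Var t3 As Integer = temp Mod 10 // 7, the tenths digit

If t1 <> 0 Then // skip a leading zero so 5.4 is not written as 05.4
  p.Byte(index) = t1 + kZero // 50, ASCII "2"
  index = index + 1
End If

p.Byte(index) = t2 + kZero // 51, ASCII "3"
index = index + 1

p.Byte(index) = kDot // 46, ASCII "."
index = index + 1

p.Byte(index) = t3 + kZero // 55, ASCII "7"
index = index + 1

// The first four bytes of mb are now 50, 51, 46, 55, which reads as "23.7".
```

Adding kZero (48) to a digit from 0 to 9 yields its ASCII code, which is what lets the loop skip number-to-string conversion entirely.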

Finally, I used outPtr.Byte(outMBIndex) = 10 to plug in the linefeed (ASCII 10).

The final code looked something like this:

// ASCII codes for the bytes written directly through the Ptr
Const kEOL As Integer = 10
Const kHyphen As Integer = 45
Const kDot As Integer = 46
Const kZero As Integer = 48
Const kSemicolon As Integer = 59

Var r As New Random

// Output buffer and a Ptr to the same bytes for fast access
Var outMB As New MemoryBlock(1000000)
Var outPtr As Ptr = outMB

// Next free position in the buffer
Var outMBIndex As Integer = 0

For row As Integer = 1 To rowCount
  Var cityIndex As Integer = r.InRange(0, cities.LastIndex)
  Var city As String = cities(cityIndex)
  Var cityBytes As Integer = city.Bytes

  // Flush the buffer if this row might not fit. The rest of the row
  // after the city needs at most 7 bytes, so 10 leaves a small margin.
  If (outMBIndex + cityBytes + 10) >= outMB.Size Then
    bs.Write outMB.StringValue(0, outMBIndex)
    outMBIndex = 0
  End If

  // City name (there is no Ptr equivalent of StringValue)
  outMB.StringValue(outMBIndex, cityBytes) = city
  outMBIndex = outMBIndex + cityBytes

  outPtr.Byte(outMBIndex) = kSemicolon
  outMBIndex = outMBIndex + 1

  // Make roughly one in five temperatures negative
  If r.InRange(0, 4) = 0 Then
    outPtr.Byte(outMBIndex) = kHyphen
    outMBIndex = outMBIndex + 1
  End If

  // Temperature in tenths of a degree (0 to 999), split into digits
  Var temp As Integer = r.InRange(0, 999)
  Var t1 As Integer = temp \ 100
  Var t2 As Integer = (temp \ 10) Mod 10
  Var t3 As Integer = temp Mod 10

  // Tens digit, skipped when it would be a leading zero
  If t1 <> 0 Then
    outPtr.Byte(outMBIndex) = t1 + kZero
    outMBIndex = outMBIndex + 1
  End If

  // Ones digit
  outPtr.Byte(outMBIndex) = t2 + kZero
  outMBIndex = outMBIndex + 1

  outPtr.Byte(outMBIndex) = kDot
  outMBIndex = outMBIndex + 1

  // Tenths digit
  outPtr.Byte(outMBIndex) = t3 + kZero
  outMBIndex = outMBIndex + 1

  outPtr.Byte(outMBIndex) = kEOL
  outMBIndex = outMBIndex + 1
Next

// Write whatever is left in the buffer
If outMBIndex <> 0 Then
  bs.Write outMB.StringValue(0, outMBIndex)
End If

This code is far longer, harder to follow, and more difficult to maintain, which goes back to my original point about why MemoryBlocks should generally be avoided.

It also generates one billion rows of data in about a minute (as opposed to 2.5 hours).

It’s nice to have the option.

Kem Tekinay is a Mac consultant and programmer who has been using Xojo since its first release to create custom solutions for clients. He is the author of the popular utilities TFTP Client and RegExRX (both written with Xojo) and lives in Connecticut with his wife Lisa, and their cat.