Dividing Data Models – Objects vs Relationships

May 25th, 2011

Although I’m intrigued by the concept of NoSQL, an alternative to SQL that isn’t quite mainstream yet, I’ve come to believe that some parts of the concept are genuinely novel, while other parts make the technology unusable. We may never get it exactly right, but here’s hoping.

NoSQL starts off with what is often described as a schema-less design, although some implementations are hybrids that define each column family (not unlike a table). While I applaud the schema-less design (it would certainly speed up development), this is where NoSQL starts to get into trouble. Querying now involves inspecting each object, either during the search itself or when building an index. This is not necessarily a difficult task, and NoSQL implementations handle it quite well, but it starts to feel as though traditional SQL makes more logical sense when it comes to handling queries.

Well, to me it seems like they just missed the dream. Along with trying to do away with the schema, they also tried to do away with in-code manipulation of the data before it is persisted. Although this makes persistence easier in some ways (think REST), it makes data manipulation much more difficult in others. They replaced the standard practice of pushing data into rows with a more complex mode of querying via MapReduce functions.
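To make the querying point concrete, here is a minimal sketch that treats each schema-less document as a Java Map and answers a query in map/reduce style. It is only an in-process analogy built on Java streams; the class name, the method countByCity, and the field "city" are hypothetical and do not correspond to any NoSQL product's API.

```java
import java.util.List;
import java.util.Map;

// Hypothetical analogy: documents are schema-less maps, so a query must open
// every document and check whether the field it cares about even exists.
public class MapReduceQuery {

    // "map" step: extract the city of each document (null when absent).
    // "reduce" step: count how many documents match the requested city.
    static long countByCity(List<Map<String, Object>> documents, String city) {
        return documents.stream()
                .map(doc -> doc.get("city"))   // map: the field may be missing entirely
                .filter(city::equals)           // keep matches; null never matches
                .count();                       // reduce: aggregate into one value
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("name", "Ann", "city", "Oslo"),
                Map.of("name", "Bob"),           // no "city" field at all
                Map.of("name", "Cy", "city", "Oslo", "age", 41));
        System.out.println(countByCity(docs, "Oslo")); // 2
    }
}
```

Note that nothing stops a document from omitting the field or storing it under a different type; the query code has to tolerate both, which is the extra burden described above.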

The dream isn’t completely missed, though, as they are one step closer to making some novel concepts a reality. One of those is compiling related data into contiguous records rather than spreading it across tables and rows. This concept forces the understanding that data in a system requires forethought about the boundaries of autonomous records, rather than grouping all information into a single giant tree. For instance, a user’s contact information should not be saved separately from the user’s name, even if there is a one-to-many relationship between the user and their contact information. Rather, the information should be read from persistent storage as a single block. Once these groupings and boundaries are decided, the interface for retrieval becomes self-evident.

Additionally, since all related data is stored under a single key, basic operations such as copy and delete can be performed without missing extended elements.
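As a rough illustration of storing an aggregate under one key, the sketch below keeps a user and all of their contact entries in a single record, so copy and delete each touch exactly one entry. The in-memory HashMap stands in for a key-value store, and the names AggregateStore, User, and Contact are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a user and their one-to-many contact entries form one
// aggregate record that is stored, copied, and deleted under a single key.
public class AggregateStore {

    record Contact(String type, String value) {}

    record User(String name, List<Contact> contacts) {}

    private final Map<String, User> records = new HashMap<>(); // stands in for the store

    void put(String key, User user) { records.put(key, user); }

    User get(String key) { return records.get(key); }

    // One operation copies the whole aggregate, contacts included.
    void copy(String fromKey, String toKey) {
        User user = records.get(fromKey);
        if (user != null) {
            records.put(toKey, new User(user.name(), List.copyOf(user.contacts())));
        }
    }

    // One operation deletes the whole aggregate; no orphaned contact rows remain.
    void delete(String key) { records.remove(key); }

    public static void main(String[] args) {
        AggregateStore store = new AggregateStore();
        store.put("user:1", new User("Ann", List.of(
                new Contact("email", "ann@example.com"),
                new Contact("phone", "555-0100"))));
        store.copy("user:1", "user:2");
        store.delete("user:1");
        System.out.println(store.get("user:1"));                   // null
        System.out.println(store.get("user:2").contacts().size()); // 2
    }
}
```

In a table-per-entity layout the same delete would need a second statement for the contact rows; here there is nothing extra to forget.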

Software as a Self-Perpetuating Undeliverable

March 24th, 2011

If you are an executive at a company that relies on a software development cycle (virtually every company today), you are not alone in your frustration with an out-of-control development cycle.

The problem lies in retrofitting existing systems with new requirements. Since a system cannot be wholly redesigned, a certain amount of adjustment (hacking) must be done to get it to operate under the new conditions. Once a system has been fixed to work correctly in this manner, these changes naturally stay in place as long as they keep working. Attempting any redesign (counter to this convention) could interrupt service and cause problems that would not otherwise have cropped up. Therefore, every system evolves toward a state that is bogged down by hacks, which eventually delays the development cycle.

Structures and Persistence

February 23rd, 2011

Converting a structure between persistent media and memory is an essential methodological decision that will drive the design of a system, so it needs to be one of the earliest decisions made. While it may not be practical to follow the same mechanism in all circumstances, we would like this to be a discussion of the highlights and shortcomings of the ideal system.

One of our established rules of design is the (fairly standard) separation of code into control and data structures. Although breaking this rule may make solving the problem easier, we believe, in the end, it would make the system more difficult to understand, work with, and extend.

The example for discussion is a tree structure, chosen because of its complexity when converting between some types of persistent media and memory. For instance, when recalling items, it is sometimes necessary to make a separate call for each level or parent, which extends the time needed to read the tree into memory. Other times, memory size may be the limiting factor. More to the point, it may not be practical to read an entire tree into memory.

The question regarding this point is how data and control structures should be fashioned.

The first option for the data structure is to omit parent/child relationships. This would push determination of these relationships to the control structure:

Object parent = control.getParent(child);
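A minimal sketch of this first option, assuming a simple in-memory map stands in for persistent storage: the Node data structure carries no parent reference, and a separate control object records child-to-parent links and resolves only the one parent that is asked for. TreeControl, Node, and add are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the first option: Node holds no parent reference,
// so the control structure owns (and could lazily load) the relationships.
public class TreeControl {

    record Node(int id, String name) {}

    private final Map<Integer, Node> nodes = new HashMap<>();        // stands in for persistent storage
    private final Map<Integer, Integer> parentIds = new HashMap<>(); // child id -> parent id

    Node add(int id, String name, Integer parentId) {
        Node node = new Node(id, name);
        nodes.put(id, node);
        if (parentId != null) {
            parentIds.put(id, parentId);
        }
        return node;
    }

    // Only the requested parent is looked up; nothing else in the tree is read.
    Node getParent(Node child) {
        Integer parentId = parentIds.get(child.id());
        return parentId == null ? null : nodes.get(parentId);
    }

    public static void main(String[] args) {
        TreeControl control = new TreeControl();
        control.add(1, "root", null);
        Node child = control.add(2, "leaf", 1);
        System.out.println(control.getParent(child).name()); // root
    }
}
```

Because the relationship lives in the control structure, a real implementation could swap the maps for a database query without changing Node at all.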

The second option for the data structure is to include parent/child relationships. This would not require use of the control structure:

Object parent = child.getParent();

However, for this code to work without any control code, the entire tree must be loaded into memory, which, as already discussed, is not always practical. One alternative that retains the same data structure is to treat a null value as an unloaded object (see below). This isn’t considered a real solution, though, because every caller must remember the extra null check, and that additional code invites errors.


Object parent = child.getParent();
if(parent == null) {
    parent = control.getParent(child);
}

I’d like to consider the use of persistence placeholders rather than complete objects. Here is one piece of code along these lines that solves a common problem with the first data persistence option:


Object child = control.newObject();
Object parentPlaceholder = control.loadPlaceholder(999);
child.setParentPlaceholder(parentPlaceholder);
control.saveObject(child);
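One way to read the placeholder idea is sketched below, under the assumption that a placeholder is nothing more than the stored identity of an object. loadPlaceholder therefore performs no read, saving the child persists only the parent's id, and storage is touched only when a placeholder is explicitly resolved. Placeholder, Item, Control, resolveName, and the reads counter are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of persistence placeholders. A Placeholder carries only
// the identity of a stored object, so it can be created without a read.
public class PlaceholderDemo {

    record Placeholder(int id) {}

    static class Item {
        final int id;
        String name;
        Placeholder parent;                 // refers to the parent by identity only

        Item(int id) { this.id = id; }

        void setParentPlaceholder(Placeholder parent) { this.parent = parent; }
    }

    static class Control {
        final Map<Integer, String> storage = new HashMap<>();  // id -> name
        final Map<Integer, Integer> parents = new HashMap<>(); // id -> parent id
        int reads = 0;                                          // counts storage reads
        private int nextId = 1000;

        Item newObject() { return new Item(nextId++); }

        // No storage access: the caller vouches that object 'id' exists.
        Placeholder loadPlaceholder(int id) { return new Placeholder(id); }

        void saveObject(Item item) {
            storage.put(item.id, item.name);
            if (item.parent != null) {
                parents.put(item.id, item.parent.id());
            }
        }

        // Resolving a placeholder is the only point that reads storage.
        String resolveName(Placeholder p) {
            reads++;
            return storage.get(p.id());
        }
    }

    public static void main(String[] args) {
        Control control = new Control();
        control.storage.put(999, "existing parent");

        Item child = control.newObject();
        Placeholder parentPlaceholder = control.loadPlaceholder(999);
        child.setParentPlaceholder(parentPlaceholder);
        control.saveObject(child);

        System.out.println(control.parents.get(child.id)); // 999
        System.out.println(control.reads);                  // 0
    }
}
```

The trade-off is that a placeholder can point at an object that no longer exists; this sketch does not check for that.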

Subsystems and Context

September 22nd, 2010

Conceptually, running one system inside another may seem like one of the most standard constructs in programming. Yet surprisingly little attention is paid to how systems specify the resources available to a subsystem. To illustrate, consider a small calendar component of an application that needs a basic configuration resource. While it is fairly well understood that the interface for using the configuration should be reusable, what is often overlooked is that the interface for retrieving that configuration interface should also remain consistent.

While this may seem easy to solve, let’s look at a solution that will hopefully inspire future designs as it has inspired the design here at OpenSourceApi. First, a subsystem may perform several different functions, each of which may be called from a number of other systems. The resources available to that subsystem should be specific to the function called, but able to be overridden by the calling system. Therefore, we construct a Context that is passed to the subsystem when a function is executed. The following pseudocode describes the process of retrieving a resource:

get(key, defaultValue):
  Object value = internalMap.get(key);
  if(value != null):
    return value;
  Context parentContext = this.getParentContext();
  while(parentContext != null):
    value = parentContext.internalMap.get(key);
    if(value != null):
      return value;
    parentContext = parentContext.getParentContext();
  Context precursorContext = this.getPrecursorContext();
  while(precursorContext != null):
    value = precursorContext.internalMap.get(key);
    if(value != null):
      return value;
    precursorContext = precursorContext.getPrecursorContext();
  return defaultValue;
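A runnable Java rendering of that lookup might look like the sketch below. The post does not define exactly what distinguishes a parent context from a precursor context, so this sketch only reproduces the search order: the context's own entries, then each context up the parent chain, then each context up the precursor chain, then the default. Each step checks only that context's own map, since the loops already walk the chains. The constructor and the put method are assumptions added so the example can run.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java sketch of the Context lookup: own entries first, then the
// parent chain, then the precursor chain, falling back to the default value.
public class Context {

    private final Map<String, Object> internalMap = new HashMap<>();
    private final Context parentContext;
    private final Context precursorContext;

    public Context(Context parentContext, Context precursorContext) {
        this.parentContext = parentContext;
        this.precursorContext = precursorContext;
    }

    public Context put(String key, Object value) {
        internalMap.put(key, value);
        return this;
    }

    public Object get(String key, Object defaultValue) {
        Object value = internalMap.get(key);
        if (value != null) {
            return value;
        }
        // Walk the parent chain, checking only each context's own entries.
        for (Context c = parentContext; c != null; c = c.parentContext) {
            value = c.internalMap.get(key);
            if (value != null) {
                return value;
            }
        }
        // Then walk the precursor chain the same way.
        for (Context c = precursorContext; c != null; c = c.precursorContext) {
            value = c.internalMap.get(key);
            if (value != null) {
                return value;
            }
        }
        return defaultValue;
    }

    public static void main(String[] args) {
        Context precursor = new Context(null, null).put("dateFormat", "yyyy-MM-dd");
        Context parent = new Context(null, null).put("timeZone", "UTC");
        Context calendar = new Context(parent, precursor).put("firstDayOfWeek", "MONDAY");

        System.out.println(calendar.get("timeZone", "n/a"));   // UTC
        System.out.println(calendar.get("dateFormat", "n/a")); // yyyy-MM-dd
        System.out.println(calendar.get("locale", "en"));      // en
    }
}
```

The calendar component never needs to know which context supplied a value; it always asks the same get(key, defaultValue) interface, which is the consistency the post argues for.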

The Framework Trap

July 27th, 2010

Let’s say you’re a young, bright-eyed company on the cusp of building something revolutionary in your field. Maybe you already have a rough design, but the time has come to include all the features of an industry-leading piece of software. Without the time or money to build all the intricacies yourself, you decide to turn to a framework.

Frameworks have advantages that make them attractive, but they also have disadvantages that may not be apparent from the start. For instance, choosing to operate within the confines of a pre-built system means the support available to you shrinks dramatically: instead of drawing on the whole community around a language, you can rely only on the far smaller community that knows that particular framework. A related consequence is that current and future employees need increasingly specialized skills. Finally, a framework means that any future business decision to stop using it would require a massive commitment of money and time.

OpenSourceApi is not designed as a framework, but as individual components that can be used at the discretion of the architect. We do this because one of our direct goals is to be the antithesis of a framework: we aim for minimal integration (and dis-integration) time and maximum interoperability.

Timing Persistence

July 25th, 2010

One of the decisions nearly every designer will someday face, whether they realize it or not, is when to persist data relative to adding it to the in-memory model. The timing of these actions drastically affects the shape of the code, yet this decision is very often made without considering the possible options. To illustrate, consider the following variations in code structure.

In the first version, the memory data is changed first, then the persistent data:

object.setProperty("ABC");
dao.saveObject(object);

In the second version, the persistent data is changed first, then the memory data:

dao.updateObjectProperty(object, "ABC");

Aside from the obvious differences in coding structure, there are underlying consequences associated with each model. One is the differing priority they place on preserving data. In the first version, modified data remains in memory even if the save operation fails. In the second version, modified data remains saved even if the property cannot be set in memory.

The number of persistence operations also differs between the two variations. In the first, multiple pieces of data can be persisted in a single operation. The second, however, requires a separate operation for each item of data.
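Both consequences can be seen in a small, hypothetical harness. The Store class below is an in-memory stand-in that counts writes and can be told to fail the next one; memoryFirst mirrors the first version (set properties, then one save), and persistFirst mirrors the second (one persistence call per property, then the in-memory update). None of these names come from a real DAO library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical harness contrasting the two orderings against a store that can
// be told to fail, counting how many write operations each one issues.
public class PersistenceOrder {

    static class Customer {
        String name;
        String email;
    }

    static class Store {
        final Map<String, String> rows = new HashMap<>(); // field -> persisted value
        boolean failNextWrite = false;
        int writes = 0;

        void write(Map<String, String> fields) {
            writes++;
            if (failNextWrite) {
                failNextWrite = false;
                throw new IllegalStateException("write failed");
            }
            rows.putAll(fields);
        }
    }

    // Version 1: change memory first, then persist every field in one write.
    static void memoryFirst(Customer c, Store store, String name, String email) {
        c.name = name;
        c.email = email;
        store.write(Map.of("name", name, "email", email));
    }

    // Version 2: persist each field on its own, then change memory.
    static void persistFirst(Customer c, Store store, String name, String email) {
        store.write(Map.of("name", name));
        c.name = name;
        store.write(Map.of("email", email));
        c.email = email;
    }

    public static void main(String[] args) {
        Store store = new Store();
        Customer c = new Customer();
        memoryFirst(c, store, "Ann", "ann@example.com");
        System.out.println(store.writes); // 1

        store.writes = 0;
        persistFirst(c, store, "Bea", "bea@example.com");
        System.out.println(store.writes); // 2

        // A failed write in version 1 leaves memory changed but storage stale.
        store.failNextWrite = true;
        try {
            memoryFirst(c, store, "Cy", "cy@example.com");
        } catch (IllegalStateException e) {
            System.out.println(c.name + " vs " + store.rows.get("name")); // Cy vs Bea
        }
    }
}
```

The last case is exactly the divergence between memory and storage that the first version risks, which is why a cleanup rule on write failure matters.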

We have chosen our best practice at OpenSourceApi with these considerations in mind: use the first method, with the requirement that any cached versions of the object be cleared on a write failure. The following pseudocode shows an example of this procedure.

saveObject(Object object):
    try:
        databaseConnection.updateObject(object);
    catch exception:
        this.clearCache();
        rethrow exception;
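A minimal Java sketch of this rule follows. It assumes a hypothetical DatabaseConnection interface and a simple cache keyed by object id, and it evicts only the failed object's entry rather than clearing the whole cache; the exception is rethrown so the caller still learns that the save did not happen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the write-failure rule: persist the in-memory object,
// and if the write fails, evict its cached copy so it is never served as saved.
public class CachingDao {

    interface DatabaseConnection {
        void updateObject(String id, String value) throws Exception;
    }

    private final Map<String, String> cache = new HashMap<>();
    private final DatabaseConnection databaseConnection;

    CachingDao(DatabaseConnection databaseConnection) {
        this.databaseConnection = databaseConnection;
    }

    String cached(String id) { return cache.get(id); }

    void saveObject(String id, String value) throws Exception {
        cache.put(id, value);               // the in-memory copy changes first
        try {
            databaseConnection.updateObject(id, value);
        } catch (Exception e) {
            cache.remove(id);               // drop the unsaved version from the cache
            throw e;                        // and still report the failure
        }
    }

    public static void main(String[] args) throws Exception {
        CachingDao ok = new CachingDao((id, value) -> { });
        ok.saveObject("42", "ABC");
        System.out.println(ok.cached("42")); // ABC

        CachingDao failing = new CachingDao((id, value) -> { throw new Exception("disk full"); });
        try {
            failing.saveObject("42", "ABC");
        } catch (Exception e) {
            System.out.println(failing.cached("42")); // null
        }
    }
}
```

Evicting a single entry is a narrower choice than clearing everything; clearing the whole cache is simpler and safer when objects in the cache reference one another.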

The Complexity Problem

July 11th, 2010

Despite the prevalence of open source systems, experience has shown us that the more a business expands, the more monolithic and resistant to change its code base becomes. If you need convincing, just view the source code of the landing page of any major business (such as Vodafone) and notice how many small hacks work together to make up the whole. Where does the complexity in these systems come from? While some may blame poor programming, the reality is that new requirements on older systems make hacks the only way to satisfy them (barring a complete rewrite, of course).

In short, the original designers did not have the experience necessary to foresee how their system might need to be adapted, and even the best new designers won’t have the experience to foresee how theirs might need to be adapted in the future. So how do we develop software today that is flexible enough to withstand future modifications? By using the experience of the thousands of users who have helped to modify OpenSourceApi. These APIs are time-tested to provide the most adaptable base for future systems.

Our Philosophy

July 2nd, 2010

Back in the 80s and 90s, computers ran at a snail’s pace compared with even the simplest machines on today’s market. In that time, and to a degree even now, the most important aspect of computing was speed. The faster things ran, the happier people were. While that principle still holds today, hardware speed has changed dramatically. Technology has grown to fill the gap between people’s expectations and the speed at which applications can execute. With that growth comes the opportunity for a new paradigm in programming, one defined by the clarity of its API.

Getting back to the past, it was a time in which program design was based entirely around speed. In fact, many of the applications that stood apart did so because they used lower-level operations to achieve better performance. Because such low-level design required consideration from the start, those programs were designed for speed before anything else. While performance must always play a role in application development, we now have the freedom to escape the architecture that was once necessary.

The past also gave rise to many conventions that persist in programming today, some not so bad and some not so good. One such convention is for the creators of programming languages to dictate an ever broader spectrum of API for their language. While programming languages are powerful tools, at their core they need to support only a finite number of concepts: objects and pointers, functions and looping, streams, OS calls, arrays, locking, and a few others. Any additional structures in the API are often built from the language itself, as you can see in Java by reading (or decompiling) the HashMap class. There is no magic to the classes or constructs a programming language claims as part of its core.
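To show that there is no magic involved, here is a deliberately tiny, hypothetical hash map built only from an array, small objects, and loops. It is far simpler than java.util.HashMap (fixed capacity, no resizing, no null keys) and is not its actual implementation; it only illustrates that such a structure needs nothing beyond the language's core concepts.

```java
// Hypothetical, simplified illustration: a string-keyed hash map built only from
// an array, small objects, and loops (separate chaining, fixed 16 buckets).
public class TinyHashMap<V> {

    private static final class Entry<V> {
        final String key;
        V value;
        Entry<V> next;                      // chains entries whose keys collide

        Entry(String key, V value, Entry<V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<V>[] buckets = (Entry<V>[]) new Entry[16];

    private int indexFor(String key) {
        return (key.hashCode() & 0x7fffffff) % buckets.length; // non-negative bucket index
    }

    public void put(String key, V value) {
        int i = indexFor(key);
        for (Entry<V> e = buckets[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;            // replace an existing mapping
                return;
            }
        }
        buckets[i] = new Entry<>(key, value, buckets[i]); // prepend to the chain
    }

    public V get(String key) {
        for (Entry<V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHashMap<Integer> map = new TinyHashMap<>();
        map.put("one", 1);
        map.put("two", 2);
        map.put("one", 11);
        System.out.println(map.get("one"));   // 11
        System.out.println(map.get("three")); // null
    }
}
```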

With these constructs of the past in mind, I would like to introduce a new idea: opensourceapi.org, a repository of community-driven APIs designed with a unique set of goals in mind. The first of these is to represent models rather than actions. While most APIs are designed around specific tools, our APIs are designed to represent anything. By creating APIs that represent the full spectrum of businesses, we are free to build applications of ever-increasing complexity.

The second goal of this repository is to reduce the time it takes to develop an application. As the size of applications continues to increase, this has become one of the foremost aspects of design. The standards created by these APIs provide a core with which programmers can become familiar, enabling new developers to get up to speed on a new code base more quickly.

While this does not cover every aspect of opensourceapi.org, it serves as an introduction to the ideas that make us unique. Please check back for future updates as the APIs take shape.