Please specify the group
Archive | Technology RSS feed for this section

MongoDB – Introduction


MongoDB wasn’t designed in a lab. We built MongoDB from our own experiences building large scale, high availability, robust systems. We didn’t start from scratch, we really tried to figure out what was broken, and tackle that. So the way I think about MongoDB is that if you take MySql, and change the data model from relational to document based, you get a lot of great features: embedded docs for speed, manageability, agile development with schema-less databases, easier horizontal scalability because joins aren’t as important. There are lots of things that work great in relational databases: indexes, dynamic queries and updates to name a few, and we haven’t changed much there. For example, the way you design your indexes in MongoDB should be exactly the way you do it in MySql or Oracle, you just have the option of indexing an embedded field.
– Eliot Horowitz, 10gen CTO and Co-founder

Why MongoDB?

•Documents (objects) map nicely to programming language data types
•Embedded documents and arrays reduce need for joins
•Dynamically-typed (schemaless) for easy schema evolution
•No joins and no multi-document transactions for high performance and easy scalability
•High performance
•No joins and embedding makes reads and writes fast
•Indexes including indexing of keys from embedded documents and arrays
•Optional streaming writes (no acknowledgements)
•High availability
•Replicated servers with automatic master failover
•Easy scalability
•Automatic sharding (auto-partitioning of data across servers)
•Reads and writes are distributed over shards
•No joins or multi-document transactions make distributed queries easy and fast
•Eventually-consistent reads can be distributed over replicated servers
•Rich query language

Large MongoDB deployment

1. One or more shards, each shard holds a portion of the total data (managed automatically). Reads and writes are automatically routed to the appropriate shard(s). Each shard is backed by a replica set – which just holds the data for that shard.
A replica set is one or more servers, each holding copies of the same data. At any given time one is primary and the rest are secondaries. If the primary goes down one of the secondaries takes over automatically as primary. All writes and consistent reads go to the primary, and all eventually consistent reads are distributed amongst all the secondaries.
2. Multiple config servers, each one holds a copy of the meta data indicating which data lives on which shard.
3. One or more routers, each one acts as a server for one or more clients. Clients issue queries/updates to a router and the router routes them to the appropriate shard while consulting the config servers.
4. One or more clients, each one is (part of) the user’s application and issues commands to a router via the mongo client library (driver) for its language.
mongod is the server program (data or config). mongos is the router program.

Small deployment (no partitioning)
1. One replica set (automatic failover), or one server with zero or more slaves (no automatic failover).
2. One or more clients issuing commands to the replica set as a whole or the single master (the driver will manage which server in the replica set to send to).

Mongo data model
•A Mongo system (see deployment above) holds a set of databases
•A database holds a set of collections
•A collection holds a set of documents
•A document is a set of fields
•A field is a key-value pair
•A key is a name (string)
•A value is a
•basic type like string, integer, float, timestamp, binary, etc.,
•a document, or
•an array of values

Mongo query language
To retrieve certain documents from a db collection, you supply a query document containing the fields the desired documents should match. For example, {name: {first: ‘John’, last: ‘Doe’}} will match all documents in the collection with name of John Doe. Likewise, {name.last: ‘Doe’} will match all documents with last name of Doe. Also, {name.last: /^D/} will match all documents with last name starting with ‘D’ (regular expression match).
Queries will also match inside embedded arrays. For example, {keywords: ‘storage’} will match all documents with ‘storage’ in its keywords array. Likewise, {keywords: {$in: [‘storage’, ‘DBMS’]}} will match all documents with ‘storage’ or ‘DBMS’ in its keywords array.
If you have lots of documents in a collection and you want to make a query fast then build an index for that query. For example, ensureIndex({name.last: 1}) or ensureIndex({keywords: 1}). Note, indexes occupy space and slow down updates a bit, so use them only when the tradeoff is worth it.

Leave a comment Continue Reading →

Java Coffee Break – Java revolutionary language

Java – an island of Indonesia, a type of coffee, and a programming language. Three very different meanings, each in varying degrees of importance. Most programmers, though, are interested in the Java programming language. In just a few short years (since late 1995), Java has taken the software community by storm. Its phenomenal success has made Java the fastest growing programming language ever. There’s plenty of hype about Java, and what it can do. Many programmers, and end-users, are confused about exactly what it is, and what Java offers.

Java is a revolutionary language

The properties that make Java so attractive are present in other programming languages. Many languages are ideally suited for certain types of applications, even more so than Java. But Java brings all these properties together, in one language. This is a revolutionary jump forward for the software industry.

Let’s look at some of the properties in more detail: –

  • object-oriented

  • portable

  • multi-threaded

  • automatic garbage collection

  • secure

  • network and “Internet” aware

  • simplicity and ease-of-use


Many older languages, like C and Pascal, were procedural languages. Procedures (also called functions) were blocks of code that were part of a module or application. Procedures passed parameters (primitive data types like integers, characters, strings, and floating point numbers). Code was treated separately to data. You had to pass around data structures, and procedures could easily modify their contents. This was a source of problems, as parts of a program could have unforeseen effects in other parts. Tracking down which procedure was at fault wasted a great deal of time and effort, particularly with large programs.

In some procedural language, you could even obtain the memory location of a data structure. Armed with this location, you could read and write to the data at a later time, or accidentally overwrite the contents.

Java is an object-oriented language. An object-oriented language deals with objects. Objects contain both data (member variables) and code (methods). Each object belongs to a particular class, which is a blueprint describing the member variables and methods an object offers. In Java, almost every variable is an object of some type or another – even strings. Object-oriented programming requires a different way of thinking, but is a better way to design software than procedural programming.

There are many popular object-oriented languages available today. Some like Smalltalk and Java are designed from the beginning to be object-oriented. Others, like C++, are partially object-oriented, and partially procedural. In C++, you can still overwrite the contents of data structures and objects, causing the application to crash. Thankfully, Java prohibits direct access to memory contents, leading to a more robust system.


Most programming languages are designed for a specific operating system and processor architecture. When source code (the instructions that make up a program) are compiled, it is converted to machine code which can be executed only on one type of machine. This process produces native code, which is extremely fast.

Another type of language is one that is interpreted. Interpreted code is read by a software application (the interpreter), which performs the specified actions. Interpreted code often doesn’t need to be compiled – it is translated as it is run. For this reason, interpreted code is quite slow, but often portable across different operating systems and processor architectures.

Java takes the best of both techniques. Java code is compiled into a platform-neutral machine code, which is called Java bytecode. A special type of interpreter, known as a Java Virtual Machine (JVM), reads the bytecode, and processes it. Figure One shows a disassembly of a small Java application. The bytecode, indicated by the arrow, is represented in text form here, but when compiled it is represented as bytes to conserve space.

Figure One – Bytecode disassembly for “HelloWorld”

The approach Java takes offers some big advantages over other interpreted languages. Firstly, the source code is protected from view and modification – only the bytecode needs to be made available to users. Secondly, security mechanisms can scan bytecode for signs of modification or harmful code, complimenting the other security mechanisms of Java. Most of all though, it means that Java code can be compiled once, and run on any machine and operating system combination that supports a Java Virtual Machine (JVM). Java can run on Unix, Windows, Macintosh, and even the Palm Pilot. Java can even run inside a web browser, or a web server. Being portable means that the application only has to be written once – and can then execute on a wider range of machines. This saves a lot of time, and money.


If you’ve ever written complex applications in C, or PERL, you’ll probably have come across the concept of multiple processes before. An application can split itself into separate copies, which run concurrently. Each copy replicates code and data, resulting in increased memory consumption. Getting the copies to talk together can be complex, and frustrating. Creating each process involves a call to the operating system, which consumes extra CPU time as well.

A better model is to use multiple threads of execution, referred to as threads for short. Threads can share data and code, making it easier to share data between thread instances. They also use less memory and CPU overhead. Some languages, like C++, have support for threads, but they are complex to use. Java has support for multiple threads of execution built right into the language. Threads require a different way of thinking, but can be understood very quickly. Thread support in Java is very simple to use, and the use of threads in applications and applets is quite commonplace.

Automatic garbage collection

No, we’re not talking about taking out the trash (though a computer that could literally do that would be kind of neat). The term garbage collection refers to the reclamation of unused memory space. When applications create objects, the JVM allocates memory space for their storage. When the object is no longer needed (no reference to the object exists), the memory space can be reclaimed for later use. 

Languages like C++ force programmers to allocate and deallocate memory for data and objects manually. This adds extra complexity, but also causes another problem – memory leaks. When programmers forget to deallocate memory, the amount of free memory available is decreased. Programs that frequently create and destroy objects may eventually find that there is no memory left. In Java, the programmer is free from such worries, as the JVM will perform automatic garbage collection of objects.


Security is a big issue with Java. Since Java applets are downloaded remotely, and executed in a browser, security is of great concern. We wouldn’t want applets reading our personal documents, deleting files, or causing mischief. At the API level, there are strong security restrictions on file and network access for applets, as well as support for digital signatures to verify the integrity of downloaded code. At the bytecode level, checks are made for obvious hacks, such as stack manipulation or invalid bytecode. The strong security mechanisms in Java help to protect against inadvertent or intentional security violations, but it is important to remember that no system is perfect. The weakest link in the chain is the Java Virtual Machine on which it is run – a JVM with known security weaknesses can be prone to attack. It is also worth noting that while there have been a few identified weaknesses in JVMs, they are rare, and usually fixed quickly. 

Network and “Internet” aware

Java was designed to be “Internet” aware, and to support network programming. The Java API provides extensive network support, from sockets and IP addresses, to URLs and HTTP. It’s extremely easy to write network applications in Java, and the code is completely portable between platforms. In languages like C/C++, the networking code must be re-written for different operating systems, and is usually more complex. The networking support of Java saves a lot of time, and effort.

Java also includes support for more exotic network programming, such as remote-method invocation (RMI), CORBA and Jini. These distributed systems technologies make Java an attractive choice for large distributed systems.

Simplicity and ease-of-use

Java draws its roots from the C++ language. C++ is widely used, and very popular. Yet it is regarded as a complex language, with features like multiple-inheritance, templates and pointers that are counter-productive. Java, on the other hand, is closer to a “pure” object-oriented language. Access to memory pointers is removed, and object-references are used instead. Support for multiple-inheritance has been removed, which lends itself to clearer and simpler class designs. The I/O and network library is very easy to use, and the Java API provides developers with lots of time-saving code (such as networking and data-structures).  After using Java for awhile, most developers are reluctant to return to other languages, because of the simplicity and elegance of Java.


Java provides developers with many advantages. While most of these are present in other languages, Java combines all of these together into one language. The rapid growth of Java has been nothing short of phenomenal, and shows no signs (yet!) of slowing down.

Leave a comment Continue Reading →