loader from loading.io

awk && Regular Expressions For Finding Text

The History of Computing

Release Date: 03/04/2022

Lotus: From Yoga to Software show art Lotus: From Yoga to Software

The History of Computing

Nelumbo nucifera, or the sacred lotus, is a plant that grows in flood plains, rivers, and deltas. Their seeds can remain dormant for years and when floods come along, blossom into a colony of plants and flowers. Some of the oldest seeds can be found in China, where they’re known to represent longevity. No surprise, given their level of nitrition and connection to the waters that irrigated crops by then. They also grow in far away lands, all the way to India and out to Australia. The flower is sacred in Hinduism and Buddhism, and further back in ancient Egypt. Padmasana is a Sanskrit term...

info_outline
Section 230 and the Concept of Internet Exceptionalism show art Section 230 and the Concept of Internet Exceptionalism

The History of Computing

We covered computer and internet copyright law in a previous episode. That type of law began with interpretations that tried to take the technology out of cases so they could be interpreted as though what was being protected was a printed work, or at least it did for a time. But when it came to the internet, laws, case law, and their knock-on effects, the body of jurisprudence work began to diverge.  Safe Harbor mostly refers to the Online Copyright Infringement Liability Limitation Act, or OCILLA for short, was a law passed in the late 1980s that  shields online portals and internet...

info_outline
Bluetooth: From Kings to Personal Area Networks show art Bluetooth: From Kings to Personal Area Networks

The History of Computing

Bluetooth The King Ragnar Lodbrok was a legendary Norse king, conquering parts of Denmark and Sweden. And if we’re to believe the songs, he led some of the best raids against the Franks and the the loose patchwork of nations Charlemagne put together called the Holy Roman Empire.  We use the term legendary as the stories of Ragnar were passed down orally and don’t necessarily reconcile with other written events. In other words, it’s likely that the man in the songs sung by the bards of old are likely in fact a composite of deeds from many a different hero of the norse.   Ragnar...

info_outline
One History Of 3D Printing show art One History Of 3D Printing

The History of Computing

One of the hardest parts of telling any history, is which innovations are significant enough to warrant mention. Too much, and the history is so vast that it can't be told. Too few, and it's incomplete. Arguably, no history is ever complete. Yet there's a critical path of innovation to get where we are today, and hundreds of smaller innovations that get missed along the way, or are out of scope for this exact story. Children have probably been placing sand into buckets to make sandcastles since the beginning of time. Bricks have survived from round 7500BC in modern-day Turkey where humans made...

info_outline
Adobe: From Pueblos to Fonts and Graphics to Marketing show art Adobe: From Pueblos to Fonts and Graphics to Marketing

The History of Computing

The Mogollon culture was an indigenous culture in the Western United States and Mexico that ranged from New Mexico and Arizona to Sonora, Mexico and out to Texas. They flourished from around 200 CE until the Spanish showed up and claimed their lands. The cultures that pre-existed them date back thousands more years, although archaeology has yet to pinpoint exactly how those evolved. Like many early cultures, they farmed and foraged. As they farmed more, their homes become more permanent and around 800 CE they began to create more durable homes that helped protect them from wild swings in the...

info_outline
The Evolution of Fonts on Computers show art The Evolution of Fonts on Computers

The History of Computing

Gutenburg shipped the first working printing press around 1450 and typeface was born. Before then most books were hand written, often in blackletter calligraphy. And they were expensive.    The next few decades saw Nicolas Jensen develop the Roman typeface, Aldus Manutius and Francesco Griffo create the first italic typeface. This represented a period where people were experimenting with making type that would save space. The 1700s saw the start of a focus on readability. William Caslon created the Old Style typeface in 1734. John Baskerville developed Transitional typefaces in 1757....

info_outline
Flight Part II: From Balloons to Autopilot to Drones show art Flight Part II: From Balloons to Autopilot to Drones

The History of Computing

In our previous episode, we looked at the history of flight - from dinosaurs to the modern aircraft that carry people and things all over the world. Those helped to make the world smaller, but UAVs and drones have had a very different impact in how we lead our lives - and will have an even more substantial impact in the future. That might not have seemed so likely in the 1700s, though - when unmann Unmanned Aircraft Napoleon conquered Venice in 1797 and then ceded control to the Austrians the same year. He then took it as part of a treaty in 1805 and established the first Kingdom of Italy....

info_outline
Flight: From Dinosaurs to Space show art Flight: From Dinosaurs to Space

The History of Computing

Humans have probably considered flight since they found birds. As far as 228 million years ago, the Pterosaurs used flight to reign down onto other animals from above and eat them. The first known bird-like dinosaur was the Archaeopteryx, which lived around 150 million years ago. It’s not considered an ancestor of modern birds - but other dinosaurs from the same era, the theropods, are. 25 million years later, in modern China, the Confuciusornis sanctus had feathers and could have flown. The first humans wouldn’t emerge from Africa until 23 million years later. By the 2300s BCE, the...

info_outline
SABRE and the Travel Global Distribution System show art SABRE and the Travel Global Distribution System

The History of Computing

Computing has totally changed how people buy and experience travel. That process seemed to start with sites that made it easy to book travel, but as with most things we experience in our modern lives, it actually began far sooner and moved down-market as generations of computing led to more consumer options for desktops, the internet, and the convergence of these technologies. Systems like SABRE did the original work to re-think travel - to take logic and rules out of the heads of booking and travel agents and put them into a digital medium. In so doing, they paved the way for future...

info_outline
The Story of Intel show art The Story of Intel

The History of Computing

We’ve talked about the history of microchips, transistors, and other chip makers. Today we’re going to talk about Intel in a little more detail.  Intel is short for Integrated Electronics. They were founded in 1968 by Robert Noyce and Gordon Moore. Noyce was an Iowa kid who went off to MIT to get a PhD in physics in 1953. He went off to join the Shockley Semiconductor Lab to join up with William Shockley who’d developed the transistor as a means of bringing a solid-state alternative to vacuum tubes in computers and amplifiers. Shockley became erratic after he won the Nobel Prize and...

info_outline
 
More Episodes
Programming was once all about math. And life was good. Then came strings, or those icky non-numbery things. Then we had to process those strings. And much of that is looking for patterns that wouldn’t be a need with integers, or numbers. For example, a space in a string of text. Let’s say we want to print hello world to the screen in bash. That would be the echo command, followed by “Hello World!” Now let’s say we ran that without the quotes then it would simply echo out the word Hello to the screen, given that the interpreter saw the space and ended the command, or looked for the next operator or verb according to which command is being used.

Unix was started in 1969 at Bell Labs. Part of that work was The Thompson shell, the first Unix shell, which shipped in 1971. And C was written in 1972. These make up the ancestral underpinnings of the modern Linux, BSD, Android, Chrome, iPhone, and Mac operating systems.

A lot of the work the team at Bell Labs was doing was shifting from pure statistical and mathematical operations to connect phones and do R&D faster to more general computing applications. Those meant going from math to those annoying stringy things. Unix was an early operating system and that shell gave them new abilities to interact with the computer. People called files funny things. There was text in those files. And so text manipulation became a thing.

Lee McMahon developed sed in 1974, which was great for finding patterns and doing basic substitutions. Another team  at Bell Labs that included Finnish programmer Alfred Aho, Peter Weinberger, and Brian Kernighan had more advanced needs. Take their last name initials and we get awk. Awk is a programming language they developed in 1977 for data processing, or more specifically for text manipulation. Marc Rochkind had been working on a version management tool for code at Bell and that involved some text manipulation, as well as a good starting point for awk. 

It’s meant to be concise and given some input, produce the desired output. Nice, short, and efficient scripting language to help people that didn’t need to go out and learn C to do some basic tasks. AWK is a programming language with its own interpreter, so no need to compile to run AWK scripts as executable programs. 

Sed and awk are both written to be used as one0line programs, or more if needed. But building in an implicit loops and implicit variables made it simple to build short but power regular expressions. Think of awk as a pair of objects. The first is a pattern followed by an action to take in curly brackets. It can be dangerous to call if the pattern is too wide open.; especially when piping information For example,  ls -al at the root of a volume and piping that to awk $1 or some other position and then piping that into xargs to rm and a systems administrator could have a really rough day. Those $1, $2, and so-on represent the positions of words. So could be directories. 

Think about this, though. In a world before relational databases, when we were looking to query the 3rd column in a file with information separated by some delimiter, piping those positions represented a simple way to effectively join tables of information into a text file or screen output. Or to find files on a computer that match a pattern for whatever reason. 

Awk began powerful. Over time, improvements have enabled it to be used in increasingly  complicated scenarios. Especially when it comes to pattern matching with regular expressions. Various coding styles for input and output have been added as well, which can be changed depending on the need at hand. 

Awk is also important because it influenced other languages. After becoming part of the IEEE Standard 1003.1, it is now a part of the POSIX standard. And after a few years, Larry Wall came up with some improvements, and along came Perl. But the awk syntax has always been the most succinct and useable regular expression engines. Part of that is the wildcard, piping, and file redirection techniques borrowed from the original shells.

The AWK creators wrote a book called The AWK Programming Language for Addison-Wesley in 1988. Aho would go on to develop influential algorithms, write compilers, and write books (some of which were about compilers). Weinberger continued to do work at Bell before becoming the Chief Technology Officer of Hedge Fund Renaissance Technologies with former code breaker and mathematician James Simon and Robert Mercer. His face led to much love from his coworkers at Bell during the advent of digital photography and hopefully some day we’ll see it on the Google Search page, given he now works there. 

Brian Kernighan was a contributor to the early Multics then Unix work, as well as C. In fact, an important C implementation, K&R C, stands for Kernighan and Ritchie C. He coauthored The C Programming Language ands written a number of other books, most recently on the Go Programming Language. He also wrote a number of influential algorithms, as well as some other programming languages, including AMPL. His 1978 description of how to manage memory when working with those pesky strings we discussed earlier went on to give us the Hello World example we use for pretty much all introductions to programming languages today. He worked on ARPA projects at Stanford, helped with emacs, and now teaches computer science at Princeton, where he can help to shape the minds of future generations of programming languages and their creators.