Music and Architecture: A Cross between Inspiration and Method

  • Published: July 2009
  • Volume 11, pages 257–271 (2009)

  • Alessandra Capanna

This paper is one of a set of lessons prepared for the course of “Theory of Architecture” (Faculty of Architecture – “La Sapienza” University of Rome). The didactic aim was to present – to students attending the first year of courses – some methods for the beginning stages of design and their applicability to any kind of creative work. The brief multimedia hypertext quoted at the end of this paper was carried out in collaboration with the “LaMA” (Laboratorio Multimediale di Architettura) as a test for new educational tools applied to our first “e-learning” experiences.

Author information

Authors and Affiliations

Via della Bufalotta 67, 00139, Roma, Italy

Alessandra Capanna

Corresponding author

Correspondence to Alessandra Capanna.

About this article

Capanna, A. Music and Architecture: A Cross between Inspiration and Method. Nexus Netw J 11, 257–271 (2009). https://doi.org/10.1007/s00004-008-0092-z

Issue Date: July 2009

  • architecture
  • Daniel Libeskind
  • Béla Bartók
  • Steven Holl
  • Ernest Bloch
  • golden section

Black History Month 2024: African Americans and the Arts 

The national theme for Black History Month 2024 is “African Americans and the Arts.”

Black History Month 2024 is a time to recognize and highlight the achievements of Black artists and creators, and the role they played in U.S. history and in shaping our country today.  

To commemorate this year’s theme, we’ve gathered powerful quotes about learning, culture and equality from five historic Black American authors, teachers and artists who made a significant impact in the Arts, education ― and the nation.  

Making history

“Real education means to inspire people to live more abundantly, to learn to begin with life as they find it and make it better.” – Carter G. Woodson, Author, Journalist, Historian and Educator, 1875-1950  

Known as the “Father of Black History,” Carter G. Woodson was primarily self-taught in most subjects. In 1912, he became the second Black person to receive a Ph.D. from Harvard.   

He is the author of more than 30 books, including “The Mis-Education of the Negro.”

Carter G. Woodson dedicated his life to teaching Black History and incorporating the subject of Black History in schools. He co-founded what is now the Association for the Study of African American Life and History, Inc. (ASALH) . In February 1926, Woodson launched the first Negro History Week , which has since been expanded into Black History Month.  

Providing a platform  

“I have created nothing really beautiful, really lasting, but if I can inspire one of these youngsters to develop the talent I know they possess, then my monument will be in their work.” – Augusta Savage, Sculptor, 1892-1962

An acclaimed and influential sculptor of the Harlem Renaissance, Augusta Savage was a teacher and an activist who fought for African American rights in the Arts. She was one of only four women, and the only Black woman, commissioned for the 1939 New York World’s Fair. She exhibited one of her most famous works, “Lift Every Voice and Sing,” which she named after the hymn by James Weldon Johnson, sometimes referred to as the Black National Anthem. Her sculpture is also known as “The Harp,” the name given to it by the fair’s organizers.

Raising a voice  

“My mother said to me ‘My child listen, whatever you do in this world no matter how good it is you will never be able to please everybody. But what one should strive for is to do the very best humanly possible.’” – Marian Anderson, American Contralto, 1897-1993  

Marian Anderson broke barriers in the opera world. In 1939, she performed at the Lincoln Memorial in front of a crowd of 75,000 after the Daughters of the American Revolution (DAR) denied her access to the DAR Constitution Hall because of her race. And in 1955, Marian Anderson became the first African American to perform at the Metropolitan Opera, singing the role of Ulrica in Verdi’s Un Ballo in Maschera.

Influencing the world  

“The artist’s role is to challenge convention, to push boundaries, and to open new doors of perception.” – Henry Ossawa Tanner, Painter, 1859-1937  

Henry Ossawa Tanner is known as the first Black artist to gain worldwide fame and acclaim. In 1877, he enrolled at the Pennsylvania Academy of the Fine Arts, where he was the only Black student. In 1891, Tanner moved to Paris to escape the racism he was confronted with in America. There he painted two of his most recognized works, “The Banjo Lesson” and “The Thankful Poor” (1894).

In 1923, Henry O. Tanner was awarded the Chevalier of the Legion of Honor by the French government, France’s highest honor.  

Rising up  

“Wisdom is higher than a fool can reach.” – Phillis Wheatley, Poet, 1753-1784  

At about seven years old, Phillis Wheatley was kidnapped from her home in West Africa and sold into slavery in Boston. She started writing poetry around the age of 12 and published her first poem, “Messrs. Hussey and Coffin,” in Rhode Island’s Newport Mercury newspaper in 1767.

As her poetry spread in popularity, so did skepticism: some did not believe an enslaved woman could have authored the poems. She defended her work before a panel of town leaders and became the first African American woman to publish a book of poetry; the panel’s attestation was included in the preface of her book.

Phillis Wheatley corresponded with many artists, writers and activists, including a well-known 1774 letter to Reverend Samson Occom about freedom and equality.

Honoring Black History Month 2024  

Art plays a powerful role in helping us learn and evolve. Not only does it introduce us to a world of diverse experiences, but it helps us form stronger connections. These are just a few of the many Black creators who shaped U.S. history ― whose expressions opened many doors and minds.  

Black History Month is observed each year in February. To continue your learning, go on a journey with Dr. Jewrell Rivers, as he guides you through Black History in higher education. Read his article, “A Brief History: Black Americans in Higher Education.”  

MUSIC AS INSPIRATION FOR ARCHITECTURAL FORM

Viola Langat

A comparative analysis of Mijikenda traditional architecture and music

Related Papers

The Architecture of Music - Design by Analogy

Marko Petreski

Music has the potential to evoke some of our deepest emotions and drastically alter our moods simply through specific combinations of notes. Architecture too, should have this potential. This thesis aims at investigating the secret of emotion in music, as it is an attempt to draw an analogy that may help to create a deeper architectural experience. A great musical composition is like a dream. When it is convincing in its narrative it draws you in, and makes you believe in its reality, stirring emotions within you for no apparent reason. When you look deeper into the nature of a musical composition, as musicologists have been doing, you discover testable, psychological devices applied in order to create a greater emotional response from the listeners. By studying the work of Beethoven, I believe that I can translate these aural devices into visual and tactile ones, in the form of a housing development in Auckland city. Housing developments suffer from problems of monotony, due to their repetitive and modular nature. This project seeks to provide an alternative to these monotonous schemes by using a piece of music as a design brief. Highly architectural pieces of music are already perfectly balanced in their blend between variation and repetition, and are therefore great for regulating order. I have chosen my own piece of music, Beethoven’s Pathétique sonata, as inspiration because of its complex architecture, and because there are many professional analyses of it from which to study. My goal in this thesis is to reveal the wealth of architectural ideas hidden within highly structured pieces of music.

Sneha Trivedi

In this portfolio of compositions, I have explored the concept of musical ambiguity. I perceive this kind of ambiguity as an interesting and purposeful instability, created by employing a variety of techniques to arrange and manipulate complex compositional materials. Interrupting the narrative flow with unexpected, disturbing fragments should increase the expressive tension of the musical structure and narrative, thereby obliterating ‘habitual’ hierarchies of perception and eradicating a false sense of stability in the listener. My techniques include the simultaneous juxtaposition of several aural ‘viewpoints’, such as dense versus lucid textures, as well as the fragmentation and repositioning of chosen elements. In the analyses of my compositions I have made use of some ideas and terms from the visual arts, architecture and philosophy to clarify my arguments. Every musical element from tonality, texture and rhythm to structure and narrative is questioned and reinterpreted by means of fragmentation and juxtaposition with the aim of creating layers of textures and timbres. Prior to this treatment, a composition is generally begun with a musical idea, derived from my mental ‘sound library’, which already alludes to timbre, instrumentation, atmosphere, modality and provides motivic impetus. Furthermore, a lively collaboration with musicians plays an important role during the compositional process and has enabled me to find my own voice through the shaping of the experimental materials into the two orchestral works, a few ensemble pieces and solo works.

Ken-Albert Orwa

Architectural design in urban Africa has long been defined from an outsider’s point of view. It is high time that Africans began to craft their own identity, even in their built environment. African urban music is a metaphor of both local expression and global appreciation. Music and architecture have a resonance in the design elements and principles that are used in their composition. Therefore, this dissertation demonstrates the possibility of using African urban music as an inspiration and basis for the architectural landscape of urban Africa.

Marwa Morsi

At a time when technology has unleashed architects to design the most extravagant tangible forms, could questioning the concept of form in an abstract medium like music, which lies in the territory of idea, and incorporating its ephemeral qualities into the spatial realm lead to a reconsideration of architectural form? -- Isben Önen's insightful book offers a new view of the relationship between the histories of Western music and architecture. It demonstrates both the profound overlaps and divergences between the two fields, presenting a wide-ranging understanding of the development of each field. This is cultural history at its best: richly informative and deeply engaging. (Christopher Long) -- Throughout history there has been a constant fascination of architects with music. Isben Önen, an architect and a musician himself, sets out to explore the formal correspondence between his two fields of expertise, from Vitruvius to Leon Battista Alberti; from Gabrieli through the Enlightenment, to Alban Berg and all the way to Renzo Piano’s and Luigi Nono’s contemporary Prometeo. This uncommonly vast and profound piece of scholarship is of interest not only to musicians and architects but also to the non-specialized public. (Liane Lefaivre)

Different branches of art meet in many fields; their words and sentences are similar in their harmonic elements, unity and rhythm. This study focuses on the fields where architecture meets music, and on how each art affects the other. There has been tremendous development in architectural concepts: in the past, a complete symphony could be obtained through architectural works; today, flexible, dynamic architectural models are invented by integrating music wave frequencies in computer simulation programs, or by de-constructing forms into free separate pixels. Step 1 explains the mutual relation between music and its philosophical effect on architectural ideas. Step 2 analyzes architectural products designed from musical harmony inside a computer simulation program. Step 3 asks whether these new design methods will meet society’s desire and acceptance in the future.
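
The “computer simulation” step this abstract alludes to can be pictured with a small sketch. The following Python fragment is purely illustrative and is not the authors’ method: it extracts the dominant frequencies of a waveform with an FFT and maps their magnitudes onto a row of facade bay heights, one naive way to let “music wave frequencies” drive a built form.

```python
# Illustrative only: spectral peaks -> facade heights is our assumed mapping,
# not a published design method.
import numpy as np

def dominant_frequencies(signal, sample_rate, n_peaks=8):
    """Return the n_peaks strongest (frequency, magnitude) pairs of a mono signal."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    top = np.argsort(spectrum)[-n_peaks:]
    return sorted(zip(freqs[top], spectrum[top]))

def facade_heights(peaks, min_h=3.0, max_h=30.0):
    """Scale spectral magnitudes into bay heights (metres) for a facade profile."""
    mags = np.array([m for _, m in peaks])
    span = np.ptp(mags) or 1.0          # avoid dividing by zero for flat spectra
    norm = (mags - mags.min()) / span
    return min_h + norm * (max_h - min_h)

# Demo with a synthetic three-note chord instead of a real recording.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
chord = sum(np.sin(2 * np.pi * f * t) for f in (220.0, 277.2, 329.6))
print(facade_heights(dominant_frequencies(chord, sr)))
```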

Dancing Architecture: The parallel evolution of Bharatanatyam and South Indian Architecture

Tracey Eve Winton

In her book, "Indian Classical Dance", Kapila Vatsyayan describes dance as the highest order of spiritual discipline, the enactment of which is symbolic of a ritual sacrifice of one's being to a transcendental order. The Natya-Shastra, a treatise on drama and dance, reveals the status of the performing arts as equal to prayer and sacrificial rites in the pursuit of moksha, the release from cycles of rebirth. Both dance and dancer function as a vehicle for divine invocation and are mirrored in the architectural surroundings. To investigate this connection between dance and place, it is imperative to understand the mythical origins of architecture and temple dance. The Hindu philosophy of the cosmic man and its religious relationship with the Dravidian architecture of Tamil Nadu is the starting point of the discussion of a south Indian aesthetic. The Vastu-purusha mandala is a philosophical diagram that provides a foundation for Hindu aesthetics, linking physical distance, religious position and universal scale in both time and space. Used as an architectural diagram, it becomes a mediator between the human body and the cosmos. The temple, as a setting for dance performances, and constructed based on the mandala, shares this quality of immersing its participants in a multi-sensory spatial experience. However, while the link between architecture and dance culture was explicit up to the 18th century, it is less compelling in the context of modern south Indian architecture. With an increasingly unstable political landscape during the 20th century, architectural growth in south India during this period was almost stagnant. Unfortunately, this creates a break in the continuity and comparative evolution of dance and architecture, leading to the fragmentation and abstraction of dance in its modern form. South Indian dance has since transformed into a prominent cultural symbol and various incarnations of the dancer have become the isolated yet important link between tradition and modernity. As an evolving living embodiment of contemporary culture and identity, her transformation from Devadasi, to an icon of nationalism, to a choreographer of 'high art' provides the foundation for the reintegration of architecture in the cultural fabric. The culmination of this research aims to reinstate the importance of architecture as a cultural nexus in order to restring a fragmented dance, community and cultural identity.

Jocelyn Wolfe, Bruce Wolfe

We engage with a place in a multitude of ways, through all of the senses. We are all an extension of place, but in modern societies the visual often typifies our sense of belonging to place at the expense of other senses. The architecture of our surroundings, for example, is predominantly described in visual terms. Art and architecture are often conflated. Art in architecture is often reduced to the visual, and the integration with or application to the built form. Yet architecture gives an acoustic quality of material and space that is inseparable from the experience. Commonly, we are more touched by what we hear than what we see. How, then, do we know our musical selves through architecture? This paper discusses one way of knowing through “The Piano Mill Project”, a hybrid building and musical instrument, designed and purpose-built to house sixteen reclaimed pianos – vestiges of colonialism, post-colonialism, artistic hierarchies, and new beginnings.

Our next-generation model: Gemini 1.5

Feb 15, 2024

The model delivers dramatically enhanced performance, with a breakthrough in long-context understanding across modalities.

A note from Google and Alphabet CEO Sundar Pichai:

Last week, we rolled out our most capable model, Gemini 1.0 Ultra, and took a significant step forward in making Google products more helpful, starting with Gemini Advanced. Today, developers and Cloud customers can begin building with 1.0 Ultra too — with our Gemini API in AI Studio and in Vertex AI.

Our teams continue pushing the frontiers of our latest models with safety at the core. They are making rapid progress. In fact, we’re ready to introduce the next generation: Gemini 1.5. It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute.

This new generation also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.

Longer context windows show us the promise of what is possible. They will enable entirely new capabilities and help developers build much more useful models and applications. We’re excited to offer a limited preview of this experimental feature to developers and enterprise customers. Demis shares more on capabilities, safety and availability below.

Introducing Gemini 1.5

By Demis Hassabis, CEO of Google DeepMind, on behalf of the Gemini team

This is an exciting time for AI. New advances in the field have the potential to make AI more helpful for billions of people over the coming years. Since introducing Gemini 1.0, we’ve been testing, refining and enhancing its capabilities.

Today, we’re announcing our next-generation model: Gemini 1.5.

Gemini 1.5 delivers dramatically enhanced performance. It represents a step change in our approach, building upon research and engineering innovations across nearly every part of our foundation model development and infrastructure. This includes making Gemini 1.5 more efficient to train and serve, with a new Mixture-of-Experts (MoE) architecture.

The first Gemini 1.5 model we’re releasing for early testing is Gemini 1.5 Pro. It’s a mid-size multimodal model, optimized for scaling across a wide range of tasks, and performs at a similar level to 1.0 Ultra, our largest model to date. It also introduces a breakthrough experimental feature in long-context understanding.

Gemini 1.5 Pro comes with a standard 128,000 token context window. But starting today, a limited group of developers and enterprise customers can try it with a context window of up to 1 million tokens via AI Studio and Vertex AI in private preview.

As we roll out the full 1 million token context window, we’re actively working on optimizations to improve latency, reduce computational requirements and enhance the user experience. We’re excited for people to try this breakthrough capability, and we share more details on future availability below.

These continued advances in our next-generation models will open up new possibilities for people, developers and enterprises to create, discover and build using AI.

[Figure: Context lengths of leading foundation models]

Highly efficient architecture

Gemini 1.5 is built upon our leading research on Transformer and MoE architecture. While a traditional Transformer functions as one large neural network, MoE models are divided into smaller “expert” neural networks.

Depending on the type of input given, MoE models learn to selectively activate only the most relevant expert pathways in their network. This specialization massively enhances the model’s efficiency. Google has been an early adopter and pioneer of the MoE technique for deep learning through research such as Sparsely-Gated MoE, GShard-Transformer, Switch-Transformer, M4 and more.
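
As a concrete illustration of the routing idea described above, here is a toy top-k MoE layer in plain NumPy. It is a sketch of the general technique only; Gemini’s actual expert count, gating scheme and implementation details are not public, so every number below is an arbitrary assumption.

```python
# Toy Mixture-of-Experts routing sketch; all sizes are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a small linear layer; the gate scores experts per token.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))

def moe_layer(x):
    """Route each token to its top_k experts and mix their outputs."""
    logits = x @ gate_w                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)            # softmax gate
    out = np.zeros_like(x)
    for i, token in enumerate(x):
        chosen = np.argsort(probs[i])[-top_k:]       # only k experts run
        weights = probs[i, chosen] / probs[i, chosen].sum()
        for e, w in zip(chosen, weights):
            out[i] += w * (token @ experts[e])
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)   # (4, 16): same shape, sparser compute
```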

Our latest innovations in model architecture allow Gemini 1.5 to learn complex tasks more quickly and maintain quality, while being more efficient to train and serve. These efficiencies are helping our teams iterate, train and deliver more advanced versions of Gemini faster than ever before, and we’re working on further optimizations.

Greater context, more helpful capabilities

An AI model’s “context window” is made up of tokens, which are the building blocks used for processing information. Tokens can be entire parts or subsections of words, images, videos, audio or code. The bigger a model’s context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful.

Through a series of machine learning innovations, we’ve increased 1.5 Pro’s context window capacity far beyond the original 32,000 tokens for Gemini 1.0. We can now run up to 1 million tokens in production.

This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we’ve also successfully tested up to 10 million tokens.
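
Those capacity figures line up with the common rough heuristic of about 0.75 English words (roughly four characters) per token. Real tokenizers vary with language and content, so treat the sketch below as an order-of-magnitude check only, not Gemini’s actual tokenizer arithmetic.

```python
# Rough heuristic only (~0.75 words per token); real tokenizers differ.
WORDS_PER_TOKEN = 0.75

def tokens_for_words(n_words):
    """Rough token estimate for a given English word count."""
    return round(n_words / WORDS_PER_TOKEN)

print(tokens_for_words(700_000))   # ~933,000 tokens, close to the 1M window
```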

Complex reasoning about vast amounts of information

1.5 Pro can seamlessly analyze, classify and summarize large amounts of content within a given prompt. For example, when given the 402-page transcripts from Apollo 11’s mission to the moon, it can reason about conversations, events and details found across the document.

[Video demo — Reasoning across a 402-page transcript: Gemini 1.5 Pro can understand, reason about and identify curious details in the 402-page transcripts from Apollo 11’s mission to the moon.]

Better understanding and reasoning across modalities

1.5 Pro can perform highly sophisticated understanding and reasoning tasks for different modalities, including video. For instance, when given a 44-minute silent Buster Keaton movie, the model can accurately analyze various plot points and events, and even reason about small details in the movie that could easily be missed.

[Video demo — Multimodal prompting with a 44-minute movie: Gemini 1.5 Pro can identify a scene in a 44-minute silent Buster Keaton movie when given a simple line drawing as reference material for a real-life object.]

Relevant problem-solving with longer blocks of code

1.5 Pro can perform more relevant problem-solving tasks across longer blocks of code. When given a prompt with more than 100,000 lines of code, it can better reason across examples, suggest helpful modifications and give explanations about how different parts of the code work.

[Video demo — Problem solving across 100,633 lines of code: Gemini 1.5 Pro can reason across the whole codebase, giving helpful solutions, modifications and explanations.]

Enhanced performance

When tested on a comprehensive panel of text, code, image, audio and video evaluations, 1.5 Pro outperforms 1.0 Pro on 87% of the benchmarks used for developing our large language models (LLMs). And when compared to 1.0 Ultra on the same benchmarks, it performs at a broadly similar level.

Gemini 1.5 Pro maintains high levels of performance even as its context window increases. In the Needle In A Haystack (NIAH) evaluation, where a small piece of text containing a particular fact or statement is purposely placed within a long block of text, 1.5 Pro found the embedded text 99% of the time, in blocks of data as long as 1 million tokens.
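
A NIAH-style check is easy to sketch. In the fragment below, `ask_model` is a stand-in for whatever text-completion call is being evaluated, and the filler, needle and depths are illustrative choices; this is not Google’s evaluation code.

```python
# Illustrative needle-in-a-haystack harness; `ask_model` is a placeholder
# for a real completion API (prompt string in, answer string out).
def build_haystack(filler, needle, n_chunks, depth):
    """Bury `needle` at a relative `depth` (0.0=start, 1.0=end) in filler text."""
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def niah_trial(ask_model, depth, n_chunks=5000):
    needle = "The magic number for the experiment is 4172."
    prompt = build_haystack("The quick brown fox jumps over the lazy dog.",
                            needle, n_chunks, depth)
    prompt += "\n\nWhat is the magic number for the experiment?"
    return "4172" in ask_model(prompt)

# Sweep the needle through the context and record the retrieval rate:
# hits = [niah_trial(ask_model, d) for d in (0.0, 0.25, 0.5, 0.75, 1.0)]
```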

Gemini 1.5 Pro also shows impressive “in-context learning” skills, meaning that it can learn a new skill from information given in a long prompt, without needing additional fine-tuning. We tested this skill on the Machine Translation from One Book (MTOB) benchmark, which shows how well the model learns from information it’s never seen before. When given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content.
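
In-context learning of this kind amounts to shipping the entire reference text inside the prompt rather than fine-tuning the model. A minimal sketch, using a hypothetical file name and the same `ask_model` placeholder as above:

```python
# Sketch only: the manual file and prompt wording are illustrative assumptions.
with open("kalamang_grammar_manual.txt", encoding="utf-8") as f:
    manual = f.read()   # the whole book rides along in the prompt

prompt = (manual
          + "\n\nUsing only the grammar manual above, "
          + "translate this sentence into Kalamang: Where is the boat?")
# answer = ask_model(prompt)   # no fine-tuning step anywhere
```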

As 1.5 Pro’s long context window is the first of its kind among large-scale models, we’re continuously developing new evaluations and benchmarks for testing its novel capabilities.

For more details, see our Gemini 1.5 Pro technical report.

Extensive ethics and safety testing

In line with our AI Principles and robust safety policies, we’re ensuring our models undergo extensive ethics and safety tests. We then integrate these research learnings into our governance processes and model development and evaluations to continuously improve our AI systems.

Since introducing 1.0 Ultra in December, our teams have continued refining the model, making it safer for a wider release. We’ve also conducted novel research on safety risks and developed red-teaming techniques to test for a range of potential harms.

In advance of releasing 1.5 Pro, we've taken the same approach to responsible deployment as we did for our Gemini 1.0 models, conducting extensive evaluations across areas including content safety and representational harms, and will continue to expand this testing. Beyond this, we’re developing further tests that account for the novel long-context capabilities of 1.5 Pro.

Build and experiment with Gemini models

We’re committed to bringing each new generation of Gemini models to billions of people, developers and enterprises around the world responsibly.

Starting today, we’re offering a limited preview of 1.5 Pro to developers and enterprise customers via AI Studio and Vertex AI. Read more about this on our Google for Developers blog and Google Cloud blog.

We’ll introduce 1.5 Pro with a standard 128,000 token context window when the model is ready for a wider release. Coming soon, we plan to introduce pricing tiers that start at the standard 128,000 context window and scale up to 1 million tokens, as we improve the model.

Early testers can try the 1 million token context window at no cost during the testing period, though they should expect longer latency times with this experimental feature. Significant improvements in speed are also on the horizon.

Developers interested in testing 1.5 Pro can sign up now in AI Studio, while enterprise customers can reach out to their Vertex AI account team.
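
For orientation, a minimal Python call through the google-generativeai package looked roughly like the sketch below at the time of writing. The model identifier is an assumption; check AI Studio for the names and quotas available to your account.

```python
# Minimal sketch; model name and availability may differ for your account.
# pip install google-generativeai
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro-latest")  # assumed identifier

response = model.generate_content(
    "Summarize the key decisions in the following meeting transcript: ..."
)
print(response.text)
```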

Learn more about Gemini’s capabilities and see how it works.

OpenAI’s Sora video-generating model can render video games, too

OpenAI’s new — and first! — video-generating model, Sora, can pull off some genuinely impressive cinematographic feats. But the model’s even more capable than OpenAI initially made it out to be, at least judging by a technical paper published this evening.

The paper, titled “Video generation models as world simulators,” co-authored by a host of OpenAI researchers, peels back the curtain on key aspects of Sora’s architecture — for instance revealing that Sora can generate videos of arbitrary resolution and aspect ratio (up to 1080p). Per the paper, Sora’s able to perform a range of image and video editing tasks, from creating looping videos to extending videos forwards or backwards in time to changing the background in an existing video.

But most intriguing to this writer is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI fed Sora prompts containing the word “Minecraft” and had it render a convincingly Minecraft-like HUD and game — and the game’s dynamics, including physics — while simultaneously controlling the player character.

OpenAI Sora can simulate Minecraft I guess. Maybe next generation game console will be "Sora box" and games are distributed as 2-3 paragraphs of text. pic.twitter.com/9BZUIoruOV — Andrew White (@andrewwhite01) February 16, 2024

So how’s Sora able to do this? Well, as observed by senior Nvidia researcher Jim Fan (via Quartz), Sora’s more of a “data-driven physics engine” than a creative tool. It’s not just generating a single photo or video, but determining the physics of each object in an environment — and rendering a photo or video (or interactive 3D world, as the case may be) based on these calculations.

“These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the OpenAI co-authors write.

Now, Sora’s usual limitations apply in the video game domain. The model can’t accurately approximate the physics of basic interactions like glass shattering. And even with interactions it can model, Sora’s often inconsistent — for example rendering a person eating a burger but failing to render bite marks.

Still, if I’m reading the paper correctly, it seems Sora could pave the way for more realistic — perhaps even photorealistic — procedurally generated games from text descriptions alone. That’s in equal parts exciting and terrifying (consider the deepfake implications, for one) — which is probably why OpenAI’s choosing to gate Sora behind a very limited access program for now.

Here’s hoping we learn more sooner rather than later.

OpenAI teases an amazing new generative video model called Sora

The firm is sharing Sora with a small group of safety testers but the rest of us will have to wait to learn more.

By Will Douglas Heaven

OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long.

Based on four sample videos that OpenAI shared with MIT Technology Review ahead of today’s announcement, the San Francisco–based firm has pushed the envelope of what’s possible with text-to-video generation (a hot new research direction that we flagged as a trend to watch in 2024).

“We think building models that can understand video, and understand all these very complex interactions of our world, is an important step for all future AI systems,” says Tim Brooks, a scientist at OpenAI.

But there’s a disclaimer. OpenAI gave us a preview of Sora (which means sky in Japanese) under conditions of strict secrecy. In an unusual move, the firm would only share information about Sora if we agreed to wait until after news of the model was made public to seek the opinions of outside experts. [Editor’s note: We’ve updated this story with outside comment below.] OpenAI has not yet released a technical report or demonstrated the model actually working. And it says it won’t be releasing Sora anytime soon. [Update: OpenAI has now shared more technical details on its website.]

The first generative models that could produce video from snippets of text appeared in late 2022. But early examples from Meta, Google, and a startup called Runway were glitchy and grainy. Since then, the tech has been getting better fast. Runway’s Gen-2 model, released last year, can produce short clips that come close to matching big-studio animation in their quality. But most of these examples are still only a few seconds long.

The sample videos from OpenAI’s Sora are high-definition and full of detail. OpenAI also says it can generate videos up to a minute long. One video of a Tokyo street scene shows that Sora has learned how objects fit together in 3D: the camera swoops into the scene to follow a couple as they walk past a row of shops.

OpenAI also claims that Sora handles occlusion well. One problem with existing models is that they can fail to keep track of objects when they drop out of view. For example, if a truck passes in front of a street sign, the sign might not reappear afterward.  

In a video of a papercraft underwater scene, Sora has added what look like cuts between different pieces of footage, and the model has maintained a consistent style between them.

It’s not perfect. In the Tokyo video, cars to the left look smaller than the people walking beside them. They also pop in and out between the tree branches. “There’s definitely some work to be done in terms of long-term coherence,” says Brooks. “For example, if someone goes out of view for a long time, they won’t come back. The model kind of forgets that they were supposed to be there.”

Impressive as they are, the sample videos shown here were no doubt cherry-picked to show Sora at its best. Without more information, it is hard to know how representative they are of the model’s typical output.   

It may be some time before we find out. OpenAI’s announcement of Sora today is a tech tease, and the company says it has no current plans to release it to the public. Instead, OpenAI will today begin sharing the model with third-party safety testers for the first time.

In particular, the firm is worried about the potential misuses of fake but photorealistic video. “We’re being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public,” says Aditya Ramesh, a scientist at OpenAI, who created the firm’s text-to-image model DALL-E.

But OpenAI is eyeing a product launch sometime in the future. As well as safety testers, the company is also sharing the model with a select group of video makers and artists to get feedback on how to make Sora as useful as possible to creative professionals. “The other goal is to show everyone what is on the horizon, to give a preview of what these models will be capable of,” says Ramesh.

To build Sora, the team adapted the tech behind DALL-E 3, the latest version of OpenAI’s flagship text-to-image model. Like most text-to-image models, DALL-E 3 uses what’s known as a diffusion model. These are trained to turn a fuzz of random pixels into a picture.

Sora takes this approach and applies it to videos rather than still images. But the researchers also added another technique to the mix. Unlike DALL-E or most other generative video models, Sora combines its diffusion model with a type of neural network called a transformer.

Transformers are great at processing long sequences of data, like words. That has made them the special sauce inside large language models like OpenAI’s GPT-4 and Google DeepMind’s Gemini. But videos are not made of words. Instead, the researchers had to find a way to cut videos into chunks that could be treated as if they were. The approach they came up with was to dice videos up across both space and time. “It’s like if you were to have a stack of all the video frames and you cut little cubes from it,” says Brooks.

The transformer inside Sora can then process these chunks of video data in much the same way that the transformer inside a large language model processes words in a block of text. The researchers say that this let them train Sora on many more types of video than other text-to-video models, varied in terms of resolution, duration, aspect ratio, and orientation. “It really helps the model,” says Brooks. “That is something that we’re not aware of any existing work on.”
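
The “little cubes” idea is easy to picture in code: dice a (frames, height, width, channels) video tensor into spacetime patches and flatten each patch into one token vector. The patch sizes below are arbitrary assumptions for illustration; OpenAI has not published Sora’s actual patch dimensions.

```python
# Toy spacetime patchification; t and p are made-up patch sizes.
import numpy as np

def video_to_patches(video, t=4, p=16):
    """(frames, height, width, channels) -> (n_tokens, t*p*p*channels)."""
    F, H, W, C = video.shape
    F, H, W = F - F % t, H - H % p, W - W % p          # trim to whole patches
    v = video[:F, :H, :W]
    v = v.reshape(F // t, t, H // p, p, W // p, p, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)               # group each cube's axes
    return v.reshape(-1, t * p * p * C)

clip = np.random.rand(16, 64, 64, 3)                   # fake 16-frame RGB clip
print(video_to_patches(clip).shape)                    # (64, 3072) token vectors
```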

“From a technical perspective it seems like a very significant leap forward,” says Sam Gregory, executive director at Witness, a human rights organization that specializes in the use and misuse of video technology. “But there are two sides to the coin,” he says. “The expressive capabilities offer the potential for many more people to be storytellers using video. And there are also real potential avenues for misuse.” 

OpenAI is well aware of the risks that come with a generative video model. We are already seeing the large-scale misuse of deepfake images. Photorealistic video takes this to another level.

Gregory notes that you could use technology like this to misinform people about conflict zones or protests. The range of styles is also interesting, he says. If you could generate shaky footage that looked like something shot with a phone, it would come across as more authentic.

The tech is not there yet, but generative video has gone from zero to Sora in just 18 months. “We’re going to be entering a universe where there will be fully synthetic content, human-generated content and a mix of the two,” says Gregory.

The OpenAI team plans to draw on the safety testing it did last year for DALL-E 3. Sora already includes a filter that runs on all prompts sent to the model that will block requests for violent, sexual, or hateful images, as well as images of known people. Another filter will look at frames of generated videos and block material that violates OpenAI’s safety policies.

OpenAI says it is also adapting a fake-image detector developed for DALL-E 3 to use with Sora. And the company will embed industry-standard C2PA tags, metadata that states how an image was generated, into all of Sora’s output. But these steps are far from foolproof. Fake-image detectors are hit-or-miss. Metadata is easy to remove, and most social media sites strip it from uploaded images by default.

“We’ll definitely need to get more feedback and learn more about the types of risks that need to be addressed with video before it would make sense for us to release this,” says Ramesh.

Brooks agrees. “Part of the reason that we’re talking about this research now is so that we can start getting the input that we need to do the work necessary to figure out how it could be safely deployed,” he says.

Update 2/15: Comments from Sam Gregory were added.

IMAGES

  1. Music & Architecture: Architecture Dissertation by Radhika Ravindran
  2. Music and Architecture: How Buildings are the Outcome of These Two
  3. Music in Architecture
  4. Music in Architecture & A Conservatory of Music by Sam Jebadurai D
  5. (PDF) Holistic Architecture for Music Education: A proposal for
  6. (PDF) MUTUAL RELATION BETWEEN MUSIC AND ARCHITECTURE

COMMENTS

  1. PDF A Literature Review on The Use of Music in Architectural Design ...

    A Literature Review on The Use of Music in Architectural Design Education. Burcu Ölgen, Işık University, Turkey. Abstract: In order to improve creative thinking in architectural design education, it is useful to interact with other disciplines such as music.

  2. University of Massachusetts Amherst ScholarWorks@UMass Amherst

    After additional research, I found that I was but one among many who had explored this relationship. What is most fascinating about the history of the long- ... music and architecture, it is illuminating to first dissect and understand why music is a particularly rich source of inspiration for architecture. In essence, this hinges on the fact ...

  3. Music and Architecture: A Cross between Inspiration and Method

    Music and Architecture: A Cross between Inspiration and Method Authors: Alessandra Capanna Sapienza University of Rome Abstract and Figures This paper is one of a set of lessons prepared for...

  4. PDF Music + Architecture: The Spatial Translation of Schenkerian Analysis

    INTRODUCTION Both music and architecture are mediums through which creativity is expressed. Music is defined as the art of sound in time that expresses ideas and emotions in significant forms through the elements of rhythm, melody, harmony, and color. While the origin of these ideas cannot be defined, they can be expressed in many forms.

  5. Comparative Study of Music and Architecture from the Aesthetic View

    In this paper, some comparative aspects of aesthetics in music and architecture, which has achieved the high degree of beauty in the light of the total shares and coordination between the two...

  6. Music: A Source of Inspiration and Harmony in Architecture: An ...

    An Apologia for the Study of Music: New research reveals that children with music training develop a far better ... From these phenomena, this paper confirms that architecture without musical or dance-related movements, patterns, or elements such as metre, rhythm, harmony, tonality, plasticity and tactus, is arrhythmic, ascetic and dis- ...

  7. Musi-tecture: Seeking Useful Correlations Between Music and Architecture

    Gregory Young, Jerry Bancroft and Mark Sanderson. Abstract: This article explores some of the creative and educational potential available through the integration of music and architecture. ... likeness without connection. Our interest is in the influence and connection that the one can have upon and with the other.

  8. A methodology for the transformation of architectural forms into music

    Every architectural building design is a combination of various geometrical forms. These forms are generated from geometrical shapes such as circles, squares, and triangles, which undergo transformations to form substructures. These substructures are then organized to form an architectural design. Similarly, in music a substructure may involve one or more notes in the chosen musical scale ...

  9. MUTUAL RELATION BETWEEN MUSIC AND ARCHITECTURE

    Step 1, Explaining the mutual relation between music & its philosophical effect on the architectural ideas. Step 2, Analyzing created architectural products which are designed by the musical harmony inside the computer simulation program. Step 3 Finding if these new design methods will meet the society's desire & acceptance in the future, or not?

  10. Music and Architecture: A Cross between Inspiration and Method

    This paper is one of a set of lessons prepared for the course of "Theory of Architecture" (Faculty of Architecture - "La Sapienza" University of Rome). The didactic aim was to present - to students attending the first year of courses - some methods for the beginning stages of design and their applicability to any kind creative work. The brief multimedia hypertext quoted at the ...

  11. Sound and Architecture

    Open access. Abstract: Sound exists in architecture and architecture exists in sound. The process of how the two have influenced each other can be observed throughout history and has brought us the most surprising outcomes.

  12. (Pdf) Analogising Music and Architecture As Products of A Common

    Architecture and music share the unification of three realms: conceptual, objectual and perceptual. In the historical writings of ikhwãn al-safã; (The Brethren of Sincerity-Muslim philosophers in Basra, Iraq, in the 10th century), the relationship between arts, architecture, music, astronomy, mathematics and chemistry was clearly mentioned.

  13. PDF Study on the Relationship between Architecture and Music

    Relationship between Architecture and Music. The main role of music is to impress the soul through the center of emotions and feelings. The specific energy which lies in the nature of music can be formed. It can express the scenes which the composer has in his thoughts and intentions while composing the song.

  14. (PDF) A Literature Review on The Use of Music in ...

    A Literature Review on The Use of Music in Architectural Design Education International Journal of Technology and Design Education Authors: Burcu Olgen Concordia University Montreal Abstract and...

  15. The Interplay of Music and Architecture: Layering of Sound and Space

    Music and Architecture is an ongoing workshop series organised by Theatrum Mundi, a professional network of academics, architects, planners, performing and visual artists, hosted by LSE Cities.

  16. PDF Comparing And Evoking Architecture With Music

    Abstract: Architecture pleases the eye while music pleases the ear, and the very interesting part is that music and architecture have extreme similarities. The principles of architecture resemble those of music. Whether it is music or architecture, both need an artistic mind to create.

  17. (PDF) Music in Architecture: Lest we Forget

    This paper discusses one way of knowing through "The Piano Mill Project", a hybrid building and musical instrument, designed and purpose-built to house sixteen reclaimed pianos - vestiges of...

  18. Visions of Research in Music Education

    Visions of Research in Music Education Volume 31 Article 2 2018 Musical architects: Immersive learning through design thinking in ... The music/architecture metaphor provided a way to ground the experience in concrete means while allowing for complete creative freedom. Further, the curriculum became multifaceted as it developed into a

  19. Music and architecture by ruiduan

    Abstract: As two different categories of arts, music and architecture represent art in different ways. Architecture is the art of design in space and music is the art of design in time....

  20. A systematic review of artificial intelligence-based music generation

    For those papers that describe a generation system, we also include: • System name: Only if the paper uses a specific name for the system. • Dataset: Dataset used for training. • Music Representation: The type of music representation used in the paper (symbolic or audio). • Type of generation: Ex Nihilo, inpainting, harmonization, etc.

  21. Black History Month 2024: African Americans and the Arts

    Augusta Savage, Schomburg Center for Research in Black Culture, Photographs and Prints Division, The New York Public Library. (1930 - 1939). Raising a voice "My mother said to me 'My child listen, whatever you do in this world no matter how good it is you will never be able to please everybody.

  22. Gartner Emerging Technologies and Trends Impact Radar for 2024

    Use this year's Gartner Emerging Tech Impact Radar to: ☑️Enhance your competitive edge in the smart world ☑️Prioritize prevalent and impactful GenAI use cases that already deliver real value to users ☑️Balance stimulating growth and mitigating risk ☑️Identify relevant emerging technologies that support your strategic product roadmap Explore all 30 technologies and trends: www ...

  23. MUSIC AS INSPIRATION FOR ARCHITECTURAL FORM

    Music and Architecture: A Cross between Inspiration and Method Alessandra Capanna This paper is one of a set of lessons prepared for the course of "Theory of Architecture" (Faculty of Architecture — "La Sapienza" University of Rome).

  24. Introducing Gemini 1.5, Google's next-generation AI model

    Gemini 1.5 delivers dramatically enhanced performance. It represents a step change in our approach, building upon research and engineering innovations across nearly every part of our foundation model development and infrastructure. This includes making Gemini 1.5 more efficient to train and serve, with a new Mixture-of-Experts (MoE) architecture.

  25. Systematic Analysis of Risks in Industry 5.0 Architecture

    The purpose of this research is to examine the risks involved in the adoption of Industry 5.0's architecture. The paper discusses the significance of Industry 5.0 and the advanced technology needed for this industrial revolution, followed by a detailed discussion of Industry 5.0's human-centric strategy.

  26. Visualizing Music Compositions in Architectural Conceptual Design

    The study aims to achieve a new conceptual design-thinking module in architectural education by focusing on the principles and factors of music and the ability of applying it in the architectural ...

  27. OpenAI's Sora video-generating model can render video games, too

    The paper, titled "Video generation models as world simulators," co-authored by a host of OpenAI researchers, peels back the curtains on key aspects of Sora's architecture — for instance ...

  28. OpenAI teases an amazing new generative video model called Sora

    OpenAI has built a striking new generative video model called Sora that can take a short text description and turn it into a detailed, high-definition film clip up to a minute long. Based on four ...