What worries me a bit is that the two guys start with a dinner and with how important the telephone calls were, not with "I had this great code".
Considering the Apache 2.0 license the most freedom-supporting license, a free-of-charge ticket is a bad thing once you face the consequences of corporate rip-off. We live in a world in which TV shows teach you what cocaine-addicted CIA officers do to journalists who publish their activities on the basis of their constitutional rights and duties, while on a mission forbidden by no less than the U.S. Congress.
That is a whole different level of conspiracy-theory potential, and I am glad about my military training and gang attitude now more than ever!
Die Hard Smith. My name is Die Hard Smith. You can call me DH.
The Original Blueprint: Google's Transformer
OpenAI did not invent the core math behind their models. The "technical base" for GPT (Generative Pre-trained Transformer) is the Transformer architecture, which was introduced by Google researchers in the 2017 paper “Attention Is All You Need”.
GPT-1 (2018): OpenAI’s first major model took Google's Transformer and applied it to generative pre-training: a decoder-only variant trained to predict the next word on large amounts of unlabeled text, then fine-tuned for specific tasks.
Code Base: Early versions likely utilized or were inspired by open-source libraries like TensorFlow; OpenAI's published GPT-1 fine-tuning code was written in TensorFlow.
Transition to Proprietary Code
As OpenAI transitioned from a non-profit to a "capped-profit" entity, they stopped sharing their full source code.
Closed-Source Pivot: Starting with GPT-2, OpenAI began withholding full model code and weights, initially citing concerns about misuse, and released the model only in stages.
The "Black Box": For GPT-3, GPT-4, and GPT-5, the underlying code—the specific training procedures, datasets, and model weights—remains proprietary.
Recent Open-Source Models: In August 2025, OpenAI released gpt-oss-120b, their first open-weight model since GPT-2, under the Apache 2.0 license.
💡 Key Insight: The "MoE" Evolution
Recent findings suggest that OpenAI's modern technical base (like GPT-4 and the Enterprise models) moved away from a single massive model to a Mixture of Experts (MoE) setup.
This approach uses a "router" to send queries to smaller, specialized sub-models.
This allows the model to be more efficient, activating only a fraction of its total parameters for any given query.
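The routing idea described above can be sketched in a few lines. This is a toy illustration only, not OpenAI's proprietary implementation; all sizes and names here are invented:

```python
import numpy as np

# Toy Mixture-of-Experts layer: a "router" scores all experts for a token,
# then only the top-k experts actually run, so just a fraction of the
# total parameters is active per query.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

# Each "expert" is a tiny feed-forward network (a single weight matrix here).
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(token_vec):
    scores = softmax(token_vec @ router_w)   # router probabilities per expert
    chosen = np.argsort(scores)[-top_k:]     # indices of the top-k experts
    # Only the chosen experts compute anything; their outputs are blended
    # by the router's weights.
    out = sum(scores[i] * (token_vec @ experts[i]) for i in chosen)
    return out, chosen

token = rng.normal(size=d_model)
output, used = moe_forward(token)
print("experts used:", sorted(used.tolist()), "of", n_experts)
```

Real MoE models apply this per token inside each Transformer layer and add load-balancing tricks so no single expert dominates; the sketch keeps only the routing core.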
External Components
While the core LLM code is internal, OpenAI relies on external ecosystems for delivery:
Microsoft Azure: The "Enterprise" aspect of their code is heavily integrated with Microsoft’s Azure cloud infrastructure, which hosts both training and inference.
Public Datasets: The "intelligence" in the code comes from processing massive amounts of publicly available text, such as Common Crawl web data, books, and Wikipedia.
If you'd like, I can find more details on specific architectural differences between their early open models and the current GPT-5 series, or explore the legal history regarding their transition from non-profit to for-profit.
The authors were listed as "equal contributors" on the paper. Jakob Uszkoreit suggested the name "Transformer".
Ashish Vaswani: Designed and implemented the first models.
Noam Shazeer: Proposed multi-head attention and scaled dot-product attention.
Niki Parmar: Focused on model variants and tuning the codebase.
Jakob Uszkoreit: Proposed replacing RNNs with self-attention.
Llion Jones: Responsible for the initial codebase and inference.
Aidan N. Gomez: Co-implemented the tensor2tensor library.
Łukasz Kaiser: Co-implemented tensor2tensor and improved results.
Illia Polosukhin: Worked with Vaswani on the initial design and implementation.
Why It Was Revolutionary
Before the Transformer, AI models like LSTMs processed text sequentially, which was slow. The Transformer allowed for:
Parallel Processing: Reading entire sentences at once instead of one word at a time.
Self-Attention: Allowing each word in a sentence to "look" at every other word to judge which words matter most for its meaning.
Scalability: This efficiency made it possible to train the massive models, like those behind ChatGPT, that exist today.
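The self-attention step above can be sketched with the paper's scaled dot-product formula, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The dimensions below are toy values chosen for illustration:

```python
import numpy as np

# Scaled dot-product self-attention from "Attention Is All You Need".
# Every word (row of X) attends to every other word, all in parallel.

rng = np.random.default_rng(1)
seq_len, d_k = 5, 16                      # 5 "words", 16-dim queries/keys

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(d_k)       # all pairwise word-word scores at once
    weights = softmax(scores, axis=-1)    # each row: how much a word "looks" at the others
    return weights @ V, weights

X = rng.normal(size=(seq_len, d_k))       # stand-in for embedded words
out, attn = self_attention(X, X, X)
print(attn.shape)                         # (5, 5): every word vs. every other word
```

Note that the whole `(5, 5)` score matrix is computed in one matrix multiply, which is exactly the parallelism an RNN cannot offer.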
Where Are They Now?
As of 2024, all eight authors have left Google. Most have founded or joined high-profile AI startups, including:
Noam Shazeer: Co-founded Character.ai (later rejoined Google).
Aidan Gomez: Co-founded Cohere.
Illia Polosukhin: Co-founded NEAR Protocol.
Ashish Vaswani & Niki Parmar: Co-founded Essential AI.
The paper positions the Transformer as a direct alternative to the then-dominant Recurrent (RNN) and Convolutional (CNN) models.
Sequence-to-Sequence (Seq2Seq): Cites the Sutskever et al. (2014) paper, which established the encoder-decoder approach of mapping a whole input sequence to a whole output sequence.
ConvS2S: Cites Gehring et al. (2017), which used CNNs for translation, proving that recurrence was not strictly necessary for strong results.
ByteNet: Cites Kalchbrenner et al. (2016) as another parallelizable alternative to RNNs.
🧠 Early Attention Mechanisms
OpenAI and Google's later success relies heavily on the "Attention" concept, which the authors cite from earlier work:
Additive Attention: Cites the landmark Bahdanau et al. (2014) paper, which first introduced the attention mechanism for neural machine translation.
Self-Attention origins: Mentions work by Parikh et al. (2016) and Lin et al. (2017) regarding attention applied within a single sequence (intra-attention).
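The contrast between these two attention flavors can be shown side by side. Additive attention scores a pair of vectors with a small feed-forward network, while the Transformer's scaled dot product is just a (cheaper) inner product; all matrices below are random stand-ins:

```python
import numpy as np

# Additive ("Bahdanau") attention score, as cited from Bahdanau et al. (2014):
#   score(s, h) = v^T tanh(W1 @ s + W2 @ h)
# versus the Transformer's scaled dot product:
#   score(q, k) = (q . k) / sqrt(d_k)
# Toy dimensions throughout; weights are random for illustration only.

rng = np.random.default_rng(3)
d = 8
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)

def additive_score(s, h):
    return v @ np.tanh(W1 @ s + W2 @ h)   # a learned one-hidden-layer network

def dot_product_score(q, k):
    return (q @ k) / np.sqrt(d)           # no extra parameters at all

s, h = rng.normal(size=d), rng.normal(size=d)
print(additive_score(s, h), dot_product_score(s, h))
```

The dot-product form is what makes the Transformer's attention a pure matrix multiply, and hence highly optimizable on GPUs.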
⚙️ Technical Optimization
The Transformer wouldn't work without specific "tricks" to keep training stable, which are also sourced:
Adam Optimizer: Cites Kingma & Ba (2014) for the optimization algorithm used to train the network's weights.
Layer Normalization: Cites Ba et al. (2016), a critical component for stabilizing deep networks during training.
Residual Connections: Cites He et al. (2016) (the ResNet paper), which allows the model to stack many layers without gradients vanishing.
Label Smoothing: Cites Szegedy et al. (2016) as a regularization technique used during training to prevent overconfident predictions.
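Two of these tricks combine in the Transformer's sub-layer pattern, LayerNorm(x + Sublayer(x)): a residual connection followed by layer normalization. A minimal sketch, with a random linear map standing in for the attention or feed-forward sub-layer:

```python
import numpy as np

# The Transformer wraps every sub-layer as LayerNorm(x + Sublayer(x)):
# a residual connection (He et al. 2016) plus layer normalization
# (Ba et al. 2016). Toy sizes; the sub-layer here is just a linear map.

rng = np.random.default_rng(2)
d_model = 8
W = rng.normal(size=(d_model, d_model))

def layer_norm(x, eps=1e-5):
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def sublayer(x):
    return x @ W                          # stand-in for attention / feed-forward

def transformer_sublayer(x):
    return layer_norm(x + sublayer(x))    # residual add, then normalize

x = rng.normal(size=d_model)
y = transformer_sublayer(x)
print(y.mean(), y.var())                  # mean ≈ 0, variance ≈ 1 after normalization
```

The residual path lets the original signal `x` skip past the sub-layer, and the normalization keeps activations on a stable scale, which is what lets dozens of such blocks be stacked.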
📊 Evaluation Metrics
BLEU Score: Cites Papineni et al. (2002), the standard metric the authors used to prove the Transformer outperformed earlier translation systems.
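The core idea of BLEU can be shown with a deliberately simplified, 1-gram-only version. Real BLEU (Papineni et al. 2002) averages clipped 1- to 4-gram precisions; this sketch keeps only clipped unigram precision and the brevity penalty:

```python
from collections import Counter
import math

# BLEU-flavored score, simplified to clipped 1-gram precision plus the
# brevity penalty. Illustration only; real BLEU also uses 2- to 4-grams.

def simple_bleu1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    # Clipped counts: a candidate word only scores up to its count in the
    # reference, so repeating "the" ten times cannot inflate the score.
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    precision = overlap / len(cand)
    # Brevity penalty punishes translations shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(simple_bleu1("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

An exact match scores 1.0; degenerate outputs like "the the the" are capped by the clipping rule.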
To better understand the "technical base" you mentioned earlier, would you like to see how OpenAI's GPT-1 specifically modified this architecture, or would you prefer a list of the training datasets referenced in the original paper?
While the Transformer revolutionized how computers process this chain, the breakthrough that enabled human-comparable answers by predicting the next word in a sequence is found in these specific citations from the paper:
The most direct ancestor of your "word row logic" is:
Sutskever et al. (2014): "Sequence to Sequence Learning with Neural Networks"
The Idea: This paper proved that a neural network could map an entire input sequence (a sentence in one language) to an entire output sequence (its translation).
The Breakthrough: It moved computers away from translating word-for-word and toward encoding the meaning of the whole sequence first.
2. The "Context" Breakthrough
Before the Transformer, computers often "forgot" the beginning of a long chain. This citation solved how to keep the logic consistent over long texts:
Bahdanau et al. (2014): "Neural Machine Translation by Jointly Learning to Align and Translate"
The Idea: This introduced Attention.
The Breakthrough: It allowed the computer to "look back" at specific parts of the word row while producing each output word.
3. The Logic of Character/Word Relationships
Bengio et al. (2003): "A Neural Probabilistic Language Model"
Note: While not the primary focus of the 2017 paper, it is the fundamental root of LLMs. It introduced the idea of predicting the next word from the previous words using learned word representations and a neural network.
💡 The Wording Shift
In technical terms, your "word row logic" is referred to as:
Autoregression: The model predicts the next token based on all previous tokens in the "row."
Context Window: How many words in the "row" the model can "see" at once to maintain its coherence.
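Autoregression as defined above is just a loop: predict the most likely next token given the row so far, append it, repeat. The tiny bigram "model" below is invented purely for illustration; a real LLM conditions on the whole context window, not just the last word:

```python
# Toy autoregressive generation. The bigram table is a hand-made stand-in
# for a real language model's next-token probabilities.

bigram_probs = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.9, "ran": 0.1},
    "sat": {"down": 1.0},
    "dog": {"ran": 1.0},
}

def generate(prompt, max_new_tokens=3):
    row = prompt.split()
    for _ in range(max_new_tokens):
        last = row[-1]                  # context (here: just one token)
        if last not in bigram_probs:
            break
        nxt = max(bigram_probs[last], key=bigram_probs[last].get)
        row.append(nxt)                 # the "row" grows one token at a time
    return " ".join(row)

print(generate("the"))  # the cat sat down
```

Greedy picking of the argmax, as here, is the simplest decoding strategy; real systems often sample from the distribution instead to get varied output.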
The "Attention Is All You Need" paper took these existing ideas of "word chains" and made them parallel. Instead of reading the chain link-by-link (slow), the Transformer looks at the entire chain simultaneously to see how every word relates to every other word.
To see how this evolved into OpenAI's specific tech, would you like to know how GPT-1 specifically applied this "word row logic" to unsupervised learning (reading the whole internet), or are you interested in the mathematical formula for how it predicts the next word?
